[For Beginners] Running AmiVoice API from Edge and Chrome - Chrome Extension Edition

Introduction
Hello, I'm D, a member of the AmiVoice API infrastructure team.
In this article, we will introduce a sample Chrome extension that runs the AmiVoice API's WebSocket speech recognition API from Edge and Chrome, and how to create it.
Target audience
- Beginners and non-programmers who are about to try out the AmiVoice API and who have read "[For Beginners] Running the AmiVoice API from Edge and Chrome - Web Page Edition"
What you can do
- Perform voice recognition of microphone and system sounds on any web page and display the results.
Chrome extension sample
The sample is not published to the store. Download the folder that contains manifest.json, together with all of its files, and load it into your browser.
You can also download the source code archive of the main branch of the GitHub repository from the link below.
Download Zip

Warnings
- AmiVoice API is used for voice recognition.
- Performing voice recognition incurs AmiVoice API usage fees. See "Usage Fees" for details.
- It has only been tested on Microsoft Edge and Google Chrome on Windows 10.
- The APPKEY is stored locally in plain text. Where possible, use a One-time AppKey with a short expiration time and IP restrictions, and disable or delete it when you are not using it.
- Usage may be restricted by the policy of the web page.
- External library: Opus Recorder v8.0.5. lib/opus-recorder/encoderWorker.min.js is an Opus Recorder file.
- There are no restrictions on the use of JavaScript code in files other than those listed above, but we do not provide any guarantees. Use at your own risk. We cannot respond to inquiries.
Specifications
- Microphone and system sounds are converted to 16kHz DVI/IMA ADPCM (with proprietary header) or Ogg Opus and then sent to the AmiVoice API server.
- Usage may be restricted by the policy of the web page. If conversion to Ogg Opus is not possible due to restrictions, it will be converted to DVI/IMA ADPCM (with a proprietary header).
- Because the audio format is converted, the results may differ from those when using the AmiVoice API directly.
- Microphone and system sound recording and voice recognition are limited to a maximum of 1 hour. This is not a limitation of the AmiVoice API.
How to use
Microsoft Edge
- Open the "Manage Extensions" screen.
- Enable "Developer mode".
- Click "Load unpacked" and select the folder that contains manifest.json.
- Open the settings screen of the "AmiVoice API Speech Recognition Sample" extension.
- Enter the AmiVoice API "APPKEY" and click "Save". (We recommend using an APPKEY with an expiration time.)
- Open any web page whose URL starts with "https://".
- Open the "AmiVoice API Speech Recognition Sample" pop-up and click "Start speech recognition".
Google Chrome
- Open the "Extensions" screen.
- Click "Load unpacked" and select the folder that contains manifest.json.
- Open the settings screen of the "AmiVoice API Speech Recognition Sample" extension.
- Enter the AmiVoice API "APPKEY" and click "Save". (We recommend using an APPKEY with an expiration time.)
- Open any web page whose URL starts with "https://".
- Open the "AmiVoice API Speech Recognition Sample" pop-up and click "Start speech recognition".
How to create it
Using Chrome extensions, you can insert your own JavaScript and CSS into web pages created by others and change the display and behavior of the web pages.
However, you cannot do anything to other people's pages on their servers; you can only change how they appear and behave in your browser.
We will create a Chrome extension that runs the AmiVoice API's WebSocket speech recognition API and displays a screen similar to Chrome's automatic captioning function.
By the way, although it is called a "Chrome extension," it also works with Microsoft Edge.
Here are some points to keep in mind when creating a Chrome extension:
- The content script inserted into a web page by a Chrome extension and the script in the original web page cannot see each other's JavaScript variables.
- The content inserted into the HTML DOM by a Chrome extension can be read by JavaScript on the web page, so do not insert content that you would not want to be read by others. You should also be careful about conflicts with HTML element ids and names.
- This is true for all JavaScript that runs on a browser, but HTML and JavaScript code and variables are completely visible to users. If you embed an authentication key or similar, it can be easily extracted.
- To reference a file in the Chrome extension from a content script, you need to call chrome.runtime.getURL().
- Content scripts can be affected by the web page's CSP (Content Security Policy). Where possible, avoid implementations that may be restricted, such as dynamically generated scripts.
First, create manifest.json.
{
"name": "AmiVoice API Speech Recognition Sample",
"action": {
"default_title": "AmiVoice API Speech Recognition Sample",
"default_popup": "popup.html"
},
"manifest_version": 3,
"version": "0.1.0",
"description": "AmiVoice API Speech Recognition Sample",
"permissions": [
"activeTab",
"scripting",
"storage"
],
"options_page": "options.html",
"content_scripts": [
{
"matches": [
"https://*/*"
],
"js": [
"scripts/opus-encoder-wrapper.js",
"lib/wrp/recorder.js",
"lib/wrp/wrp.js",
"scripts/view.js"
]
}
],
"web_accessible_resources": [
{
"resources": [
"lib/wrp/processor.js",
"lib/opus-recorder/encoderWorker.min.js"
],
"matches": ["https://*/*"]
}
]
}
| Key | Description |
|---|---|
| manifest_version | The manifest version. The latest version is 3. |
| default_popup | The pop-up screen that appears when you run the Chrome extension. |
| options_page | The Chrome extension's options page. In this sample, it is the parameter settings screen. |
| content_scripts | The JavaScript and CSS to insert into the web page when the Chrome extension runs. In this sample, only JavaScript is used. |
| web_accessible_resources | Resources in the Chrome extension, such as image files, that are loaded from the web page. The AudioWorkletProcessor and Web Worker scripts are also specified here. |
| matches | The condition for adding content scripts and resources to a web page. This sample only works over https, so "https://*/*" is specified. |
| permissions | Declares the permissions needed to call the Chrome extension APIs. This sample needs three: activeTab, scripting, and storage. |
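A file that is referenced through chrome.runtime.getURL() but not listed under web_accessible_resources is an easy mistake to make. As a minimal sketch, the following checks a manifest object for such omissions. The helper name findUnlistedResources and the path lib/wrp/other.js are hypothetical and not part of the sample, and the check assumes exact paths without wildcards, which is what this manifest uses.

```javascript
// Hypothetical helper, not part of the sample: returns the paths that are
// NOT listed in any web_accessible_resources entry of a manifest object.
// Assumption: entries are exact paths (no wildcards), as in this sample.
function findUnlistedResources(manifest, usedPaths) {
  const listed = new Set(
    (manifest.web_accessible_resources || []).flatMap((entry) => entry.resources || [])
  );
  return usedPaths.filter((path) => !listed.has(path));
}

// The relevant part of this article's manifest.json.
const manifest = {
  manifest_version: 3,
  web_accessible_resources: [
    {
      resources: ["lib/wrp/processor.js", "lib/opus-recorder/encoderWorker.min.js"],
      matches: ["https://*/*"]
    }
  ]
};

// Paths we intend to pass to chrome.runtime.getURL().
const missing = findUnlistedResources(manifest, [
  "lib/wrp/processor.js",
  "lib/wrp/other.js" // made-up path, used only to show a miss
]);
console.log(missing); // only the made-up path is reported
```

Running a check like this before loading the extension can catch a missing entry earlier than a failed resource load on the page would.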
Next, create the settings screen options.html and its JavaScript code options.js.
First, options.html.
<!DOCTYPE html>
<html lang="ja">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
<title>Settings for AmiVoice API Chrome extensions Sample</title>
<link rel="stylesheet" href="./styles/style.css">
</head>
<body>
<div>
<h1>AmiVoice API Speech Recognition Sample</h1>
<table>
<tbody>
<tr>
<td colspan="2">
<h2>WebSocket Speech Recognition API Settings</h2>
</td>
</tr>
<tr>
<td><label for="appkey">APPKEY (Note: stored locally without encryption. Handle with care.)</label></td>
<td>
<input type="password" id="appkey" spellcheck="false" autocomplete="off">
</td>
</tr>
<tr>
<td><label for="grammarFileNames">Connection engine</label></td>
<td>
<select name="grammarFileNames" id="grammarFileNames">
<option value="-a-general">Conversation (General)</option>
<option value="-a-general-input">Voice input (General)</option>
<option value="-a-medgeneral">Conversation (Medical)</option>
<option value="-a-medgeneral-input">Voice input (Medical)</option>
<option value="-a-bizmrreport">Conversation (Pharmaceutical)</option>
<option value="-a-bizmrreport-input">Voice input (Pharmaceutical)</option>
<option value="-a-medkarte-input">Voice input (Electronic medical records)</option>
<option value="-a-bizinsurance">Conversation (Insurance)</option>
<option value="-a-bizinsurance-input">Voice input (Insurance)</option>
<option value="-a-bizfinance">Conversation (Finance)</option>
<option value="-a-bizfinance-input">Voice input (Finance)</option>
<option value="-a-general-en">English (General)</option>
<option value="-a-general-zh">Chinese (General)</option>
</select>
</td>
</tr>
<tr>
<td><label for="loggingOptOut">Do not provide audio and recognition results for service improvement (no log storage)</label></td>
<td><input type="checkbox" id="loggingOptOut" checked></td>
</tr>
<tr>
<td><label for="keepFillerToken">Keep filler words</label></td>
<td><input type="checkbox" id="keepFillerToken"></td>
</tr>
<tr>
<td><label for="profileWords">User-registered words</label></td>
<td><input type="text" id="profileWords" spellcheck="false" autocomplete="off"
title="Specify as {written form 1}{half-width space}{reading 1}|{written form 2}{half-width space}{reading 2}. Example: AmiVoice あみぼいす|猫 きかい"></td>
</tr>
<tr>
<td colspan="2">
<h2>Sample Program Settings</h2>
</td>
</tr>
<tr>
<td><label for="useDisplayMedia">Use the system or browser output audio</label></td>
<td>
<input type="checkbox" id="useDisplayMedia">
</td>
</tr>
<tr>
<td><label for="useUserMedia">Use the microphone input audio</label></td>
<td>
<input type="checkbox" id="useUserMedia">
</td>
</tr>
<tr>
<td><label for="useTrace">Output trace logs to the browser console</label></td>
<td>
<input type="checkbox" id="useTrace">
</td>
</tr>
<tr>
<td><label for="useNoTranslate">Output each recognition result twice so that both the original text and<br>the translation are visible when the browser's translation feature is enabled</label></td>
<td>
<input type="checkbox" id="useNoTranslate">
</td>
</tr>
<tr>
<td><label for="useTimestamp">Output the current time with each recognition result</label></td>
<td>
<input type="checkbox" id="useTimestamp">
</td>
</tr>
<tr>
<td><label for="useSpoken">Output the reading together with each recognition result</label></td>
<td>
<input type="checkbox" id="useSpoken">
</td>
</tr>
<tr>
<td><label for="useOpusRecorder">Compress audio data to Ogg Opus format before sending it to the server</label></td>
<td><input type="checkbox" id="useOpusRecorder" checked></td>
</tr>
<tr>
<td colspan="2">
<div style="font-size: smaller; margin-left: 20px;">
<div>The following program is used for compression to Ogg Opus format.</div>
<div>
Opus Recorder License (MIT)<br>
Original Work Copyright © 2013 Matt Diamond<br>
Modified Work Copyright © 2014 Christopher Rudmin<br>
<a href="https://github.com/chris-rudmin/opus-recorder/blob/v8.0.5/LICENSE.md" target="_blank"
rel="noopener noreferrer">https://github.com/chris-rudmin/opus-recorder/blob/v8.0.5/LICENSE.md</a>
</div>
</div>
</td>
</tr>
</tbody>
</table>
<button id="saveButton" type="button">Save</button>
</div>
<script src="./scripts/options.js"></script>
</body>
</html>
This is options.js.
/**
* Loads the settings.
*/
function loadOptions() {
chrome.storage.local.get(null, (options) => {
if (typeof options.authorization === 'undefined') {
options.authorization = "";
}
if (typeof options.grammarFileNames === 'undefined') {
options.grammarFileNames = "-a-general";
}
if (typeof options.loggingOptOut === 'undefined') {
options.loggingOptOut = true;
}
if (typeof options.useTrace === 'undefined') {
options.useTrace = false;
}
if (typeof options.useUserMedia === 'undefined') {
options.useUserMedia = false;
}
if (typeof options.useDisplayMedia === 'undefined') {
options.useDisplayMedia = true;
}
if (typeof options.keepFillerToken === 'undefined') {
options.keepFillerToken = false;
}
if (typeof options.profileWords === 'undefined') {
options.profileWords = "";
}
if (typeof options.useTimestamp === 'undefined') {
options.useTimestamp = false;
}
if (typeof options.useSpoken === 'undefined') {
options.useSpoken = false;
}
if (typeof options.useOpusRecorder === 'undefined') {
options.useOpusRecorder = true;
}
document.getElementById('appkey').value = options.authorization;
document.getElementById('grammarFileNames').value = options.grammarFileNames;
document.getElementById('loggingOptOut').checked = options.loggingOptOut;
document.getElementById('useTrace').checked = options.useTrace;
document.getElementById('useDisplayMedia').checked = options.useDisplayMedia;
document.getElementById('useUserMedia').checked = options.useUserMedia;
document.getElementById("useNoTranslate").checked = options.useNoTranslate;
document.getElementById("keepFillerToken").checked = options.keepFillerToken;
document.getElementById("profileWords").value = options.profileWords;
document.getElementById("useTimestamp").checked = options.useTimestamp;
document.getElementById("useSpoken").checked = options.useSpoken;
document.getElementById("useOpusRecorder").checked = options.useOpusRecorder;
});
}
/**
* Saves the settings.
*/
function saveOptions() {
const authorization = document.getElementById('appkey').value;
const grammarFileNames = document.getElementById('grammarFileNames').value;
const loggingOptOut = document.getElementById('loggingOptOut').checked;
const useTrace = document.getElementById('useTrace').checked;
const useDisplayMedia = document.getElementById('useDisplayMedia').checked;
const useUserMedia = document.getElementById('useUserMedia').checked;
const useNoTranslate = document.getElementById("useNoTranslate").checked;
const keepFillerToken = document.getElementById("keepFillerToken").checked;
const profileWords = document.getElementById("profileWords").value;
const useTimestamp = document.getElementById("useTimestamp").checked;
const useSpoken = document.getElementById("useSpoken").checked;
const useOpusRecorder = document.getElementById("useOpusRecorder").checked;
const options = {
authorization: authorization.trim(),
grammarFileNames: grammarFileNames.trim(),
loggingOptOut: loggingOptOut,
useTrace: useTrace,
useDisplayMedia: useDisplayMedia,
useUserMedia: useUserMedia,
useNoTranslate: useNoTranslate,
profileWords: profileWords,
keepFillerToken: keepFillerToken,
useTimestamp: useTimestamp,
useSpoken: useSpoken,
useOpusRecorder: useOpusRecorder
};
chrome.storage.local.set(options);
alert("Settings saved.");
}
document.addEventListener('DOMContentLoaded', loadOptions);
document.getElementById('saveButton').addEventListener('click', saveOptions);
It lets you specify the parameters, and simply saves and loads them with chrome.storage.local.set() and chrome.storage.local.get().
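The chain of typeof checks in loadOptions() can also be written as a single merge with a table of defaults. The following is a minimal sketch of that idea rather than the sample's actual code. DEFAULT_OPTIONS and withDefaults are hypothetical names, the default values are copied from loadOptions(), and a plain object stands in for the result of chrome.storage.local.get() so the snippet runs outside the browser.

```javascript
// Default values copied from loadOptions() in options.js.
const DEFAULT_OPTIONS = {
  authorization: "",
  grammarFileNames: "-a-general",
  loggingOptOut: true,
  useTrace: false,
  useUserMedia: false,
  useDisplayMedia: true,
  keepFillerToken: false,
  profileWords: "",
  useTimestamp: false,
  useSpoken: false,
  useOpusRecorder: true
};

// Hypothetical helper: fills in every option that has not been saved yet.
// Keys whose stored value is undefined fall back to the default.
function withDefaults(stored) {
  const merged = { ...DEFAULT_OPTIONS };
  for (const [key, value] of Object.entries(stored)) {
    if (typeof value !== "undefined") {
      merged[key] = value;
    }
  }
  return merged;
}

// Stand-in for what chrome.storage.local.get() might return after a first save.
const options = withDefaults({ authorization: "my-appkey", useTrace: true });
console.log(options.authorization);    // the stored value is kept
console.log(options.grammarFileNames); // the default is used
```

chrome.storage.local.get() also accepts an object whose values are used as defaults, so passing a table like DEFAULT_OPTIONS instead of null should give a similar result inside the extension.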
Next, create view.js, the JavaScript code for the screen to be inserted into the web page.
// Small size for font-size switching
const AMI_RESULTVIEW_FONTSIZE_SMALL = "16px";
// Medium size for font-size switching
const AMI_RESULTVIEW_FONTSIZE_MEDIUM = "24px";
// Large size for font-size switching
const AMI_RESULTVIEW_FONTSIZE_LARGE = "32px";
// HTML element for the entire result view
let amivoiceApiSampleResultViewDialogElement = null;
// HTML element for the result display area of the result view
let amivoiceApiSampleResultViewElement = null;
// HTML element for the interim-result display area of the result view
let amivoiceApiSampleResultUpdatedViewElement = null;
// Whether the result view scrolls automatically
const ResultViewSetting = {
isAutoScroll: true
}
/**
* Masks the authorization value in trace messages.
* @param {string} message The message
*/
function maskTraceMessage(message) {
return message.replace(/authorization=\w+/, "authorization=XXXX");
}
/**
* Returns the HTML element for the entire result view.
* @returns the HTML element
*/
function getResultViewDialog() {
return amivoiceApiSampleResultViewDialogElement;
}
/**
* Returns the HTML element for the result display area of the result view.
* @returns the HTML element
*/
function getResultViewElement() {
return amivoiceApiSampleResultViewElement;
}
/**
* Returns the HTML element for the interim-result display area of the result view.
* @returns the HTML element
*/
function getResultUpdatedElement() {
return amivoiceApiSampleResultUpdatedViewElement;
}
/**
* Creates the result view.
* @returns the HTML element for the entire result view
*/
function createResultViewDialog() {
amivoiceApiSampleResultViewDialogElement = document.createElement("div");
amivoiceApiSampleResultViewDialogElement.style.backgroundColor = "rgba(0,0,0,0.7)";
amivoiceApiSampleResultViewDialogElement.style.height = "0px";
amivoiceApiSampleResultViewDialogElement.style.width = "96%";
amivoiceApiSampleResultViewDialogElement.style.transform = "translateX(2%)";
amivoiceApiSampleResultViewDialogElement.style.position = "fixed";
amivoiceApiSampleResultViewDialogElement.style.zIndex = "99999";
amivoiceApiSampleResultViewDialogElement.style.border = "0px";
amivoiceApiSampleResultViewDialogElement.style.textAlign = "left";
amivoiceApiSampleResultViewDialogElement.style.color = "white";
amivoiceApiSampleResultViewDialogElement.style.fontSize = "24px";
amivoiceApiSampleResultViewDialogElement.style.fontFamily = "'Hiragino Kaku Gothic ProN', 'Helvetica', 'Verdana', 'Lucida Grande', 'ヒラギノ角ゴ ProN', sans-serif";
amivoiceApiSampleResultViewDialogElement.style.borderRadius = "10px";
amivoiceApiSampleResultViewDialogElement.style.height = "25%";
amivoiceApiSampleResultViewDialogElement.style.overflow = "hidden";
amivoiceApiSampleResultViewDialogElement.style.top = "70%";
amivoiceApiSampleResultViewDialogElement.style.resize = "both";
amivoiceApiSampleResultViewDialogElement.style.maxWidth = "100%";
amivoiceApiSampleResultViewDialogElement.style.maxHeight = "100%";
const headerElement = document.createElement("div");
headerElement.style.height = "12px";
amivoiceApiSampleResultViewDialogElement.appendChild(headerElement);
amivoiceApiSampleResultViewElement = document.createElement("div");
amivoiceApiSampleResultViewElement.style.overflow = "auto";
// Subtract the header height
amivoiceApiSampleResultViewElement.style.height = "calc(100% - 12px)";
amivoiceApiSampleResultUpdatedViewElement = document.createElement("div");
amivoiceApiSampleResultUpdatedViewElement.style.textDecoration = "underline";
amivoiceApiSampleResultUpdatedViewElement.style.textDecorationStyle = "dotted";
amivoiceApiSampleResultUpdatedViewElement.setAttribute("translate", "no");
amivoiceApiSampleResultViewElement.appendChild(amivoiceApiSampleResultUpdatedViewElement);
amivoiceApiSampleResultViewDialogElement.appendChild(amivoiceApiSampleResultViewElement);
document.body.appendChild(amivoiceApiSampleResultViewDialogElement);
setDraggableElement(amivoiceApiSampleResultViewDialogElement, headerElement);
return amivoiceApiSampleResultViewDialogElement;
}
/**
* Returns the color after switching its transparency.
* @param {string} color
* @returns the color after switching
*/
function getToggleBackgroudColorAlpha(color) {
if (typeof color === 'undefined' || color === null) {
return "rgba(0,0,0,0.7)";
}
if (color.startsWith("rgba")) {
let array = color.split(",");
if (array.length === 4) {
let alpha = parseFloat(array[3].trim());
alpha += 0.1;
if (alpha > 1) {
alpha = 0;
}
array[3] = alpha + ")";
return array.join(',');
}
} else if (color.startsWith("rgb")) {
let array = color.split(",");
if (array.length === 3) {
array[0] = array[0].replace("rgb", "rgba");
array[2] = parseFloat(array[2].trim());
return array.join(',') + ",0)";
}
}
return "rgba(0,0,0,0.7)";
}
/**
* Makes an HTML element draggable.
* @param {object} element The HTML element
* @param {object} headerElement The header element to drag when moving the HTML element
*/
function setDraggableElement(element, headerElement) {
let x = 0;
let y = 0;
headerElement.onmousedown = onDragStart;
headerElement.style.cursor = "move";
/**
* Handles the start of a drag.
* @param {object} event
*/
function onDragStart(event) {
event.preventDefault();
// Get the initial mouse position
x = event.clientX;
y = event.clientY;
document.addEventListener("mouseup", onDragEnd);
document.addEventListener("mousemove", onDragMove);
}
/**
* Handles the drag movement.
* @param {object} event
*/
function onDragMove(event) {
event.preventDefault();
// Change the position of the main element
element.style.left = (element.offsetLeft - (x - event.clientX)) + "px";
element.style.top = (element.offsetTop - (y - event.clientY)) + "px";
// Get the mouse position after the move
x = event.clientX;
y = event.clientY;
}
/**
* Handles the end of a drag.
*/
function onDragEnd() {
document.removeEventListener("mouseup", onDragEnd);
document.removeEventListener("mousemove", onDragMove);
}
}
HTML DOM ids and styles also affect web pages, so to minimize this impact, we avoid using ids and specify styles directly in JavaScript.
This creates an HTML element that looks like Chrome's automatic captioning screen.
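Setting each style property on its own line works, but the same id-free, inline-style approach can be written more compactly with Object.assign(). The following is a minimal sketch, not the sample's code. applyInlineStyles is a hypothetical helper, and a plain object stands in for the element returned by document.createElement("div") so that the snippet runs without a DOM.

```javascript
// Hypothetical helper: copies style properties onto element.style and
// returns the element. Works with any object that has a "style" property.
function applyInlineStyles(element, styles) {
  Object.assign(element.style, styles);
  return element;
}

// In the browser this would be document.createElement("div").
// A plain object stands in for it here so the sketch runs without a DOM.
const dialog = { style: {} };

// A few of the values used by createResultViewDialog() in view.js.
applyInlineStyles(dialog, {
  backgroundColor: "rgba(0,0,0,0.7)",
  position: "fixed",
  zIndex: "99999",
  width: "96%",
  height: "25%",
  top: "70%"
});
console.log(dialog.style.position); // "fixed"
```

Because no id or class is assigned, nothing here can collide with the ids and stylesheet rules of the host page.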
All that's left is to create a button and execution process for the WebSocket speech recognition API. Create a button in the Chrome extension popup.
<!DOCTYPE html>
<html lang="ja">
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="./styles/style.css">
</head>
<body>
<div>
<h2>AmiVoice API Speech Recognition Sample</h2>
</div>
<div>
<button id="startButton" type="button">Start speech recognition</button>
</div>
<div>
<button id="stopButton" type="button">Stop speech recognition</button>
</div>
<div>
<button id="showResultButton" type="button">Show/hide result view</button>
</div>
<div>
<button id="toggleFontsizeButton" type="button">Switch font size</button>
</div>
<div>
<button id="toggleAlphavalueButton" type="button">Switch transparency</button>
</div>
<div>
<button id="toggleAutoscrollButton" type="button">Toggle auto-scroll</button>
</div>
<div>
<button id="showOptionsButton" type="button">Settings</button>
</div>
<script src="./scripts/popup.js"></script>
</body>
</html>
It's just a row of buttons. Next, create popup.js.
/**
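* Toggles the result view between shown and hidden.
*/
function toggleResultView() {
// This script runs on the web page.
// If the Chrome extension is updated, or switched from disabled to enabled, after the page has loaded,
// the content scripts appear to be unavailable, so check for that here.
if (typeof Wrp === 'undefined') {
alert("The script has not been loaded. Please reload the page.");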
return;
}
const resultViewDialog = getResultViewDialog();
if (!resultViewDialog) {
createResultViewDialog();
return;
}
if (resultViewDialog.style.display !== "none") {
resultViewDialog.style.display = "none";
} else {
resultViewDialog.style.display = "";
}
}
/**
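* Switches the font size of the result view.
*/
function toggleResultViewFontSize() {
// This script runs on the web page.
// If the Chrome extension is updated, or switched from disabled to enabled, after the page has loaded,
// the content scripts appear to be unavailable, so check for that here.
if (typeof Wrp === 'undefined') {
alert("The script has not been loaded. Please reload the page.");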
return;
}
const resultViewDialog = getResultViewDialog();
if (!resultViewDialog) {
createResultViewDialog();
return;
}
if (resultViewDialog.style.fontSize === AMI_RESULTVIEW_FONTSIZE_SMALL) {
resultViewDialog.style.fontSize = AMI_RESULTVIEW_FONTSIZE_MEDIUM;
} else if (resultViewDialog.style.fontSize === AMI_RESULTVIEW_FONTSIZE_MEDIUM) {
resultViewDialog.style.fontSize = AMI_RESULTVIEW_FONTSIZE_LARGE;
} else {
resultViewDialog.style.fontSize = AMI_RESULTVIEW_FONTSIZE_SMALL;
}
}
/**
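* Switches the transparency of the result view.
*/
function toggleResultViewAlphaValue() {
// This script runs on the web page.
// If the Chrome extension is updated, or switched from disabled to enabled, after the page has loaded,
// the content scripts appear to be unavailable, so check for that here.
if (typeof Wrp === 'undefined') {
alert("The script has not been loaded. Please reload the page.");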
return;
}
const resultViewDialog = getResultViewDialog();
if (!resultViewDialog) {
createResultViewDialog();
return;
}
resultViewDialog.style.backgroundColor =
getToggleBackgroudColorAlpha(resultViewDialog.style.backgroundColor);
}
/**
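* Toggles auto-scroll of the result view on and off.
*/
function toggleAutoScroll() {
// This script runs on the web page.
// If the Chrome extension is updated, or switched from disabled to enabled, after the page has loaded,
// the content scripts appear to be unavailable, so check for that here.
if (typeof Wrp === 'undefined') {
alert("The script has not been loaded. Please reload the page.");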
return;
}
ResultViewSetting.isAutoScroll = !ResultViewSetting.isAutoScroll;
}
/**
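* Starts the WebSocket speech recognition API.
*/
function startRecognition() {
// This script runs on the web page.
// If the Chrome extension is updated, or switched from disabled to enabled, after the page has loaded,
// the content scripts appear to be unavailable, so check for that here.
if (typeof Wrp === 'undefined') {
alert("The script has not been loaded. Please reload the page.");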
return;
}
let resultViewElement = getResultViewElement();
if (!resultViewElement) {
createResultViewDialog();
resultViewElement = getResultViewElement();
}
const resultUpdatedElement = getResultUpdatedElement();
if (!resultUpdatedElement) {
return;
}
/**
* Outputs a system log message.
* @param {string} printMessage The content to output
*/
const printSystemMessage = function (printMessage) {
const date = new Date();
const systemElement = document.createElement("div");
const timestamp = "[" + date.toLocaleTimeString() + "] ";
systemElement.textContent = (timestamp + printMessage);
systemElement.style.color = "violet";
resultViewElement.insertBefore(systemElement, resultUpdatedElement);
resultUpdatedElement.textContent = "";
// Scroll so that the latest recognition result is visible.
if (ResultViewSetting.isAutoScroll) {
setTimeout(function () { resultViewElement.scrollTop = resultViewElement.scrollHeight; }, 200);
}
}
if (Wrp.isActive()) {
printSystemMessage("Already connected to the speech recognition server.");
return;
}
chrome.storage.local.get(null, (options) => {
if (typeof options.authorization === 'undefined') {
printSystemMessage("Please configure the parameters on the settings screen.");
return;
}
/**
* HTML-escapes a string.
* @param {string} s The string
* @returns the escaped string
*/
function sanitize_(s) {
return s.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/'/g, '&#39;')
.replace(/"/g, '&quot;');
}
/**
* Outputs a finalized recognition result.
* @param {string} printMessage The content to output
* @param {string} color The font color
* @param {boolean} isHtml If true, writes to innerHTML; otherwise writes to textContent
*/
const printResultFinalized = function (printMessage, color, isHtml) {
if (printMessage == "") {
return;
}
let message = printMessage;
if (options.useTimestamp) {
const date = new Date();
const timestamp = "[" + date.toLocaleTimeString() + "] ";
message = timestamp + message;
}
resultUpdatedElement.textContent = "";
const fragment = document.createDocumentFragment();
if (options.useNoTranslate) {
// Insert an element with translate=no so that Chrome excludes it from translation.
const noTranslateResultFinalizedElement = document.createElement("div");
noTranslateResultFinalizedElement.setAttribute("translate", "no");
if (isHtml) {
noTranslateResultFinalizedElement.innerHTML = message;
} else {
noTranslateResultFinalizedElement.textContent = message;
}
noTranslateResultFinalizedElement.style.color = color;
fragment.appendChild(noTranslateResultFinalizedElement);
}
const resultFinalizedElement = document.createElement("div");
if (isHtml) {
if (options.useNoTranslate) {
// Remove the ruby annotations
resultFinalizedElement.innerHTML = message.replaceAll(/<rt>[^<]*<\/rt>/g, '').replaceAll(/<[^>]*>/g, '');
} else {
resultFinalizedElement.innerHTML = message;
}
} else {
resultFinalizedElement.textContent = message;
}
resultFinalizedElement.style.color = color;
fragment.appendChild(resultFinalizedElement);
resultViewElement.insertBefore(fragment, resultUpdatedElement);
// Scroll so that the latest recognition result is visible.
if (ResultViewSetting.isAutoScroll) {
setTimeout(function () { resultViewElement.scrollTop = resultViewElement.scrollHeight; }, 200);
}
}
/**
* Outputs an interim recognition result.
* @param {string} printMessage The content to output
* @param {string} color The font color
*/
const printResultUpdated = function (printMessage, color) {
if (resultUpdatedElement.style.color !== color) {
resultUpdatedElement.style.color = color;
}
resultUpdatedElement.textContent = printMessage;
// Scroll so that the latest recognition result is visible.
if (ResultViewSetting.isAutoScroll) {
setTimeout(function () { resultViewElement.scrollTop = resultViewElement.scrollHeight; }, 200);
}
}
/**
* Builds HTML text with ruby annotations from the recognition result JSON of the WebSocket speech recognition API.
* @param {object} json
* @returns HTML
*/
const toTextWithRuby = function (json) {
if (!json.results || !json.results[0]) {
return json.text;
}
let lastWritten = "";
let resultText = "";
for (let token of json.results[0].tokens) {
// Remove the "%" before and after filler words
if (/^%.+%$/.test(token.written)) {
token.written = token.written.replace(/^%(.*)%$/, "$1");
}
if (lastWritten.length > 0) {
// Insert a space between a word ending in a digit, a letter, ?, ., ,, or ! and a word starting with a letter.
if (/[a-zA-Z0-9?.,!]$/.test(lastWritten) && /^[a-zA-Z]/.test(token.written)) {
resultText += " ";
}
}
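// Add ruby when there is a reading, the reading differs from the written form, the written form contains characters other than hiragana, katakana, half-width spaces, and "?", and it is not digits only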
let ruby = "";
if (typeof token.spoken !== 'undefined') {
ruby = token.spoken.replaceAll("_", " ").replaceAll(/ {2,}/g, " ").replaceAll(".", "").trim();
}
let written = token.written.replaceAll("_", " ").replaceAll(/ {2,}/g, " ").trim();
if ((ruby.length > 0) && (written !== ruby)
&& (written.search(/[^\u3040-\u309f\u30a0-\u30ff? ]/) !== -1)
&& !(/^[0-9]+$/.test(written))) {
resultText += ('<ruby>' + sanitize_(written) + '<rt>' + sanitize_(ruby) + '</rt></ruby>');
} else {
resultText += sanitize_(written);
}
lastWritten = token.written;
}
return resultText;
}
Wrp.serverURL = "wss://acp-api.amivoice.com/v1/";
if (options.loggingOptOut) {
Wrp.serverURL += "nolog/";
}
Wrp.grammarFileNames = options.grammarFileNames;
Wrp.authorization = options.authorization;
Wrp.profileWords = options.profileWords;
Wrp.keepFillerToken = options.keepFillerToken ? 1 : 0;
Wrp.resultUpdatedInterval = 400;
Wrp.checkIntervalTime = 600000;
Recorder.maxRecordingTime = 3600000;
Recorder.sampleRate = 16000;
Recorder.downSampling = true;
Recorder.adpcmPacking = true;
Recorder.useUserMedia = options.useUserMedia;
Recorder.useDisplayMedia = options.useDisplayMedia;
Recorder.useOpusRecorder = options.useOpusRecorder;
Wrp.TRACE = function (message) {
if (message.startsWith("ERROR:")) {
printSystemMessage(message);
} else if (options.useTrace) {
console.log(maskTraceMessage(message));
}
};
Wrp.connectStarted = function () {
printSystemMessage("Connecting to the speech recognition server...");
};
Wrp.connectEnded = function () {
printSystemMessage("Connected to the speech recognition server (ready for speech recognition).");
};
Wrp.disconnectStarted = function () {
printSystemMessage("Disconnecting from the speech recognition server...");
};
Wrp.disconnectEnded = function () {
printSystemMessage("Disconnected from the speech recognition server.");
};
Wrp.resultUpdated = function (result) {
printResultUpdated(JSON.parse(result).text, "white");
};
Wrp.resultFinalized = function (result) {
if (options.useSpoken) {
printResultFinalized(toTextWithRuby(JSON.parse(result)), "white", true);
} else {
printResultFinalized(JSON.parse(result).text, "white", false);
}
};
try {
Wrp.feedDataResume();
} catch (e) {
printSystemMessage(e.message);
}
});
}
/**
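* Stops the WebSocket speech recognition API.
*/
function stopRecognition() {
// This script runs on the web page.
// If the Chrome extension is updated, or switched from disabled to enabled, after the page has loaded,
// the content scripts appear to be unavailable, so check for that here.
if (typeof Wrp === 'undefined') {
alert("The script has not been loaded. Please reload the page.");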
return;
}
if (Wrp.isActive()) {
Wrp.feedDataPause();
}
}
/**
* Shows an alert saying that the content scripts cannot be used.
*/
function alertCantWorkScript() {
alert("This extension can only be used on sites whose URL starts with 'https://'.");
}
// Click event handler for the "Start speech recognition" button
document.getElementById("startButton").addEventListener("click", async () => {
chrome.tabs.query({ active: true, currentWindow: true }, function (tabs) {
if (tabs[0].url.startsWith("https://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: startRecognition
});
} else if (tabs[0].url.startsWith("http://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: alertCantWorkScript
});
}
});
});
// Click event handler for the "Stop speech recognition" button
document.getElementById("stopButton").addEventListener("click", async () => {
chrome.tabs.query({ active: true, currentWindow: true }, function (tabs) {
if (tabs[0].url.startsWith("https://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: stopRecognition
});
} else if (tabs[0].url.startsWith("http://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: alertCantWorkScript
});
}
});
});
// Click event handler for the "Show/hide result view" button
document.getElementById("showResultButton").addEventListener("click", async () => {
chrome.tabs.query({ active: true, currentWindow: true }, function (tabs) {
if (tabs[0].url.startsWith("https://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: toggleResultView,
});
} else if (tabs[0].url.startsWith("http://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: alertCantWorkScript
});
}
});
});
// Click event handler for the "Switch font size" button
document.getElementById("toggleFontsizeButton").addEventListener("click", async () => {
chrome.tabs.query({ active: true, currentWindow: true }, function (tabs) {
if (tabs[0].url.startsWith("https://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: toggleResultViewFontSize,
});
} else if (tabs[0].url.startsWith("http://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: alertCantWorkScript
});
}
});
});
// Click event handler for the "Toggle auto-scroll" button
document.getElementById("toggleAutoscrollButton").addEventListener("click", async () => {
chrome.tabs.query({ active: true, currentWindow: true }, function (tabs) {
if (tabs[0].url.startsWith("https://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: toggleAutoScroll,
});
} else if (tabs[0].url.startsWith("http://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: alertCantWorkScript
});
}
});
});
// Click event handler for the "Switch transparency" button
document.getElementById("toggleAlphavalueButton").addEventListener("click", async () => {
chrome.tabs.query({ active: true, currentWindow: true }, function (tabs) {
if (tabs[0].url.startsWith("https://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: toggleResultViewAlphaValue,
});
} else if (tabs[0].url.startsWith("http://")) {
chrome.scripting.executeScript({
target: { tabId: tabs[0].id },
function: alertCantWorkScript
});
}
});
});
// Click event handler for the "Settings" button
document.getElementById("showOptionsButton").addEventListener("click", async () => {
chrome.runtime.openOptionsPage(null);
});
popup.js simply lists the processing that runs when each button is clicked.
Once you understand chrome.storage.local, chrome.tabs.query(), and chrome.scripting.executeScript(), there should be no problems.
chrome.scripting.executeScript() runs a script on the currently active tab or web page.
Note that functions called with chrome.scripting.executeScript() cannot call other functions defined in popup.js.
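The reason is that the function handed to chrome.scripting.executeScript() is serialized and rebuilt in the page's context, so anything it referenced in popup.js is left behind. The following sketch imitates that with plain JavaScript so it can be run outside the browser. It is only an approximation for illustration, not how Chrome is implemented, and createPopupFunction, popupOnlyHelper, and simulateInjection are hypothetical names.

```javascript
// Builds a function the way popup.js might: it relies on a helper that is
// only defined in the popup's own scope.
function createPopupFunction() {
  const popupOnlyHelper = (text) => "[popup] " + text; // hypothetical helper
  return function injected() {
    return popupOnlyHelper("hello");
  };
}

// Rough simulation of injection: turn the function into source text and
// rebuild it in the global scope, where popupOnlyHelper does not exist.
function simulateInjection(func) {
  return new Function("return (" + func.toString() + ");")();
}

const original = createPopupFunction();
console.log(original()); // works in its original scope: "[popup] hello"

const rebuilt = simulateInjection(original);
let errorName = "";
try {
  rebuilt(); // the helper was not carried over with the source text
} catch (e) {
  errorName = e.name;
}
console.log(errorName); // "ReferenceError"
```

This is why helpers such as getResultViewDialog() and createResultViewDialog() live in view.js, which is loaded as a content script, rather than in popup.js. Values, as opposed to functions, can be handed over through the args property of chrome.scripting.executeScript() as long as they are JSON-serializable.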
That is basically it. However, recorder.js in the AmiVoice API client library passes dynamically generated script code (a string executed as script code) to AudioWorklet.addModule(), and depending on the web page this can fail with an execution error because of CSP restrictions. To avoid this, move the code into a separate file and pass that file to AudioWorklet.addModule().
Simply write the string that was registered with AudioWorklet.addModule() out to a file and reference it with chrome.runtime.getURL().
// Initialize the variables
async function initialize_() {
// Initialize the recording-related variables
audioContext_ = new AudioContext({ sampleRate: recorder_.sampleRate });
// Changed to pass a file to addModule().
/*
await audioContext_.audioWorklet.addModule(URL.createObjectURL(new Blob([
"registerProcessor('audioWorkletProcessor', class extends AudioWorkletProcessor {",
" constructor() {",
" super()",
" }",
" process(inputs, outputs, parameters) {",
" if (inputs.length > 0 && inputs[0].length > 0) {",
" if (inputs[0].length === 2) {",
" for (var j = 0; j < inputs[0][0].length; j++) {",
" inputs[0][0][j] = (inputs[0][0][j] + inputs[0][1][j]) / 2",
" }",
" }",
" this.port.postMessage(inputs[0][0], [inputs[0][0].buffer])",
" }",
" return true",
" }",
"})"
], {type: 'application/javascript'})));
*/
await audioContext_.audioWorklet.addModule(chrome.runtime.getURL('./lib/wrp/processor.js'));
This is the contents of processor.js.
registerProcessor('audioWorkletProcessor', class extends AudioWorkletProcessor {
constructor() {
super();
}
process(inputs, outputs, parameters) {
if (inputs.length > 0 && inputs[0].length > 0) {
if (inputs[0].length === 2) {
for (var j = 0; j < inputs[0][0].length; j++) {
inputs[0][0][j] = (inputs[0][0][j] + inputs[0][1][j]) / 2;
}
}
this.port.postMessage(inputs[0][0], [inputs[0][0].buffer]);
}
return true;
}
});
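The loop inside process() mixes a stereo input down to mono by averaging the two channels sample by sample. As a minimal sketch, the same step can be tried on its own outside an AudioWorklet. downmixToMono is a hypothetical name, and the sample values are made up for illustration.

```javascript
// Averages two channels sample by sample, as the loop in process() does.
// Like process(), it overwrites the first channel with the mono result.
function downmixToMono(channels) {
  if (channels.length === 2) {
    const [left, right] = channels;
    for (let j = 0; j < left.length; j++) {
      left[j] = (left[j] + right[j]) / 2;
    }
  }
  return channels[0];
}

// Made-up sample values, kept short for readability.
const left = new Float32Array([1.0, 0.5, -1.0]);
const right = new Float32Array([0.0, 0.5, 1.0]);
const mono = downmixToMono([left, right]);
console.log(Array.from(mono)); // [ 0.5, 0.5, 0 ]
```

A single-channel input passes through unchanged, which matches process() posting inputs[0][0] as-is when only one channel is present.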
That completes the extension.
Enter the AmiVoice API APPKEY in the settings screen, save it, and click the "Start speech recognition" button in the Chrome extension on any web page. Speech recognition should begin.
The maximum recording time for microphone and system audio is 1 hour. If you want to change it, change the value of Recorder.maxRecordingTime set in popup.js. However, we have not verified the effect of changing the value.
As an added bonus, we've added a function that allows you to simultaneously display recognition results that are not subject to translation by adding the translate=no attribute to the HTML element. By enabling the browser's translation function, you can check the text before and after translation. However, browser translation functions probably have a limit on the number of characters or words that can be translated, so they do not allow unlimited translation.
That's all. Thank you for reading until the end.
Person who wrote this article
-
AmiVoice API Infrastructure Team Member D
A budding infrastructure engineer and cat lover.
Class change from programmer to infrastructure engineer.
I wrote the "Web Page Edition" and the "Chrome Extension Edition".
