Register words that are not recognized by speech recognition! Tips for word registration in AmiVoice

Shogo Ando
Hello everyone.
Have you ever had the experience of your spoken words not being recognized correctly when using AmiVoice? You might think, "Maybe my pronunciation is bad," but if your voice still doesn't get recognized no matter how many times you try, it's possible that the words weren't registered in the first place.
In such cases, word registration is effective. This time, we will explain the word registration function.
- Introduction
- Conclusion: Tips for registering words on AmiVoice
- About word registration
- The mechanism of speech recognition is slightly different from the mechanism of IME kana-kanji conversion.
- What is a "class" when registering words?
- Details on how to register words in AmiVoice
- Keep the number of words you register to a minimum (ACP allows a maximum of 1000 words)
- Be careful of short readings (avoid readings of 1-2 characters if possible)
- It's OK to register multiple words with the same "notation"
- There is no point in registering multiple words with the same pronunciation.
- Words with the same pronunciation may be registered if they are in different classes.
- The pronunciation is automatically converted.
- You can specify the pronunciation in detail by using "." (half-width dot).
- "Pronunciation" is not about furigana, but about "how to read it".
- Supplementary information for using ACP word registration
- At the end
Introduction
- This article is about AmiVoice Cloud Platform (ACP), a service for developers, and assumes that you are using ACP. The concept is basically the same for other products equipped with AmiVoice, but please note that the system and names may differ depending on the product, as may specifications such as the maximum number of words that can be registered and the characters that cannot be used.
- This article is based on information current as of January 2022. Please note that some of the content may change in the future due to updates to the speech recognition engine, etc.
Conclusion: Tips for registering words on AmiVoice
First, the conclusion! Here are some bullet points on how to register words.
- Keep the number of words you register to a minimum (ACP allows a maximum of 1000 words)
- Be careful of short readings (avoid readings of 1-2 characters if possible)
- It's OK to register multiple words with the same "notation"
- There is no point in registering multiple words with the same pronunciation.
- Words with the same pronunciation may be registered if they are in different classes.
- The pronunciation is automatically converted.
- You can specify the pronunciation in detail by using "." (half-width dot).
- "Pronunciation" is not about furigana, but about "how to read it".
We'll go into more detail about these tips later in this article.
Before that, as a short detour, let me explain how word registration works in speech recognition.
If you don't need a detailed explanation, feel free to skip to the second half.
About word registration
To register a word in AmiVoice's speech recognition engine, you specify the word's "notation" and "pronunciation" and, if necessary, its "class" (classes are explained later).
Microsoft's kana-kanji conversion also has a word registration function, in which you specify the "phrase", the "pronunciation" and, if necessary, the "part of speech". It is very similar.


The word registration functions of AmiVoice and Microsoft's kana-kanji conversion are almost the same. With speech recognition, however, the input is voice, which is very ambiguous, so there is a knack to using word registration effectively.
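As a rough illustration of the three fields described above, a registration entry could be modeled as follows. This is only a minimal sketch: the `WordEntry` type and its field names are hypothetical and are not part of any AmiVoice API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordEntry:
    """One word to register (hypothetical structure, for illustration only)."""
    notation: str                     # text that should appear in the result
    pronunciation: str                # how the word is actually spoken
    word_class: Optional[str] = None  # optional class, e.g. "駅名"


# A station name registered with a class, and a word registered without one.
station = WordEntry(notation="池袋", pronunciation="いけぶくろ", word_class="駅名")
plain = WordEntry(notation="雰囲気", pronunciation="ふんいき")
print(station)
print(plain)
```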
The mechanism of speech recognition is slightly different from the mechanism of IME kana-kanji conversion.
The following article explains in detail how AmiVoice's speech recognition differs from kana-kanji conversion systems such as Microsoft's, and why speech recognition is difficult.
"Why is AmiVoice so accurate? Why does it have so many different voice recognition engines?" (AmiVoice Cloud Platform Tech Blog)
That article touches on the concept of a "language model". A language model is data that compiles information such as "this word is likely to appear before or after certain words or phrases", and it is necessary for improving the accuracy of speech recognition predictions.
For example, take "池袋", the name of a district and station in Tokyo. I searched Google for "池袋" and picked out some of the sentences that appeared in the results.

You can see that the word "池袋" appears between the preceding and following phrases shown above. A language model is a collection of large amounts of data like this, modeling what words and phrases are likely to appear around "池袋".
Typically, the number of words registered in a speech recognition engine will be in the tens of thousands to hundreds of thousands, and each word will have information such as "this word is likely to appear before or after certain words or phrases".
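To make the idea of "words that appear before or after" concrete, here is a toy sketch that counts the neighbors of "池袋" in a tiny corpus. The sentences are made up and already split into words by hand (they are not the search results mentioned above), and a real language model is far larger and more sophisticated than simple counts like these.

```python
from collections import Counter

# Made-up, pre-tokenized sentences (assumption: not real search results).
corpus = [
    ["池袋", "駅", "で", "待ち合わせ"],
    ["明日", "は", "池袋", "に", "行く"],
    ["池袋", "駅", "から", "徒歩", "5分"],
]


def neighbor_counts(sentences, target):
    """Count which words appear directly before and after `target`."""
    before, after = Counter(), Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != target:
                continue
            if i > 0:
                before[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                after[tokens[i + 1]] += 1
    return before, after


before, after = neighbor_counts(corpus, "池袋")
print("before:", before.most_common())
print("after:", after.most_common())
```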
What is a "class" when registering words?
As mentioned above, every word in AmiVoice's speech recognition engine carries information about which words and phrases it is likely to appear before or after. The question, then, is which words and phrases a user-registered word is likely to appear before or after.
This is where "classes" come in.
If you specify a class when registering a word, the word uses the information held by that class about which words and phrases its members are likely to appear before or after.
For example, if you specify "会話_汎用エンジン (-a-general)" as the connection engine name on the ACP word registration screen, you will be able to select from the following classes.

If you select the "駅名" class here and register a word, the word is registered so that it tends to appear before and after the same kinds of words and phrases as station names, as in the "池袋" example above. (The "駅名" class shares the context information that the speech recognition engine has learned for many station names, not just "池袋".)
By the way, if you specify "会話_医療エンジン (-a-medgeneral)", for example, you will be able to select from the following classes. Note that the available classes vary depending on the speech recognition engine.

Another important point is what happens if you do not specify a class. In that case, the word is registered so that it "appears before and after various words and phrases."
It may seem convenient that a registered word can appear in many different places, but this also raises the chance that the word is output by mistake when you did not say it. As explained in the article mentioned above, speech recognition converts voice, which is extremely ambiguous information, into text. You need to be careful, because a word with a similar pronunciation may be recognized even if you did not intend to say it.
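The effect of choosing a class, or of leaving it out, can be pictured with the toy sketch below. All scores, the `CLASS_FOLLOWERS` table, and the flat `NO_CLASS_SCORE` are invented for illustration; they only mimic the behavior described above and say nothing about how the engine actually computes probabilities.

```python
from typing import Optional

# Invented scores: how plausible a member of each class is before a given word.
CLASS_FOLLOWERS = {
    "駅名": {"駅": 0.6, "に": 0.3},
    "名前": {"さん": 0.7, "に": 0.2},
}

# Assumed flat score for a word without a class: it fits "anywhere" equally.
NO_CLASS_SCORE = 0.3


def follower_score(word_class: Optional[str], next_word: str) -> float:
    """Toy score for a registered word appearing right before `next_word`."""
    if word_class is None:
        return NO_CLASS_SCORE
    return CLASS_FOLLOWERS[word_class].get(next_word, 0.0)


# A station-class word fits before "駅" but not before "さん".
print(follower_score("駅名", "駅"), follower_score("駅名", "さん"))
# A class-less word gets the same score everywhere, including unlikely spots.
print(follower_score(None, "駅"), follower_score(None, "さん"))
```

In this picture, a class-less word is never ruled out by its context, which is why it is easier for it to be recognized by mistake.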
Details on how to register words in AmiVoice
Now, let's take a closer look at some tips for registering words.
Keep the number of words you register to a minimum (ACP allows a maximum of 1000 words)
Registering a word makes it recognizable, but at the same time it creates the possibility that speech with a pronunciation similar to the registered reading is mistakenly recognized as that word. This is unlikely if you have only a few registered words, so you don't need to worry too much about it, but you need to be careful when registering a large number of words.
Be careful of short readings (avoid readings of 1-2 characters if possible)
The shorter the pronunciation of a word, the more likely it is that homonyms or words with similar pronunciations already exist, so you need to be careful. On the other hand, if the pronunciation is long enough, you don't need to be too careful.
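The two cautions above (the 1000-word limit and short readings) are easy to check before registering a list. The following is a minimal sketch under a few assumptions: entries are plain `(notation, pronunciation)` tuples, length is counted in characters, and the function name and warning texts are made up for this example.

```python
MAX_WORDS = 1000      # ACP limit mentioned in this article
MIN_READING_LEN = 3   # readings of 1-2 characters are flagged


def check_entries(entries):
    """Return a list of warnings for (notation, pronunciation) tuples."""
    warnings = []
    if len(entries) > MAX_WORDS:
        warnings.append(f"too many words: {len(entries)} > {MAX_WORDS}")
    for notation, pronunciation in entries:
        if len(pronunciation) < MIN_READING_LEN:
            warnings.append(f"short reading: {notation} ({pronunciation})")
    return warnings


for warning in check_entries([("胃", "い"), ("洗濯機", "せんたくき")]):
    print(warning)
```

A warning here is only a prompt to reconsider the entry; a short reading may still be worth registering if you really need it.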
It's OK to register multiple words with the same "notation"
You can register multiple words with the same "notation". For example, the pronunciation of the word "雰囲気" is "ふんいき," but some people may read it as "ふいんき." In such cases, it would be good to register two words as shown below.
■Example of registering multiple words with different pronunciations
・Notation: 雰囲気 Pronunciation: ふんいき
・Notation: 雰囲気 Pronunciation: ふいんき
There is no point in registering multiple words with the same pronunciation.
There is no point in registering multiple words with the same pronunciation. For example, in addition to the familiar "昆布茶" (kelp tea), there is also a fermented "tea mushroom" drink called "Kombucha". If someone simply says "こんぶちゃを飲んだ", even a human would find it difficult to tell which one is meant. The same is true for speech recognition: if you register two words with the same pronunciation as shown below and say "こんぶちゃ", the speech recognition engine cannot determine which notation to output. Avoid registering multiple words with the same pronunciation.
■Example of registering words with the same pronunciation (it is unclear which one will be recognized)
・Notation: 昆布茶 Pronunciation: こんぶちゃ
・Notation: Kombucha Pronunciation: こんぶちゃ
*By the way, if you register these two and say "こんぶちゃ", which one appears in the speech recognition result is undefined in the specification.
Words with the same pronunciation may be registered if they are in different classes.
Even if words have the same pronunciation, registering them makes sense when their "class" is different, because they are treated as words in different contexts. For example, when you say "とよた", it could mean the car manufacturer "トヨタ", but it could also mean the name (surname) "豊田". In this case, it would be a good idea to register them as follows.
■Examples of words with the same pronunciation but different classes (words recognized differently depending on the context)
・Notation: 豊田 Pronunciation: とよた Class: 名前
・Notation: トヨタ Pronunciation: とよた Class: 会社名
If you register as above, when you say "とよた" in a context where a company name is likely to appear, it will be recognized as "トヨタ", and when you say "とよた" in a context where a name (surname) is likely to appear, it will be recognized as "豊田".
*Please note that the speech recognition engine may make incorrect judgments.
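The three rules above (the same notation may be repeated, the same pronunciation should not be, unless the classes differ) can be checked mechanically. The sketch below assumes entries are `(notation, pronunciation, class)` tuples with `None` for "no class"; the function name is hypothetical. It compares pronunciations exactly as written and does not take the automatic conversion described later into account.

```python
from collections import defaultdict


def find_ambiguous(entries):
    """Group notations that share both pronunciation and class."""
    groups = defaultdict(set)
    for notation, pronunciation, word_class in entries:
        groups[(pronunciation, word_class)].add(notation)
    # Only groups with two or more different notations are ambiguous.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


entries = [
    ("雰囲気", "ふんいき", None),      # same notation, different readings: OK
    ("雰囲気", "ふいんき", None),
    ("昆布茶", "こんぶちゃ", None),    # same reading, same class: ambiguous
    ("Kombucha", "こんぶちゃ", None),
    ("豊田", "とよた", "名前"),        # same reading, different classes: OK
    ("トヨタ", "とよた", "会社名"),
]
print(find_ambiguous(entries))
```

Only the "こんぶちゃ" pair is reported, because it is the only case where two different notations share both a pronunciation and a class.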
The pronunciation is automatically converted.
When registering the word "東京", you would probably write its reading as "とうきょう". When actually saying "東京", however, you lengthen the vowels as "とーきょー" rather than pronouncing each "う". In other words, the written reading and the actual pronunciation can differ.
To absorb this difference, the pronunciation you specify is automatically converted internally. For example, the following pronunciations are all treated as the same, so there is no need to register each of these patterns; it is sufficient to register just one.
"とうきょう"
"とうきょお"
"とうきょー"
"とおきょう"
"とおきょお"
"とおきょー"
"とーきょう"
"とーきょお"
"とーきょー"
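One way to picture why these nine spellings are equivalent is a small normalizer that turns "う" or "お" into a long-vowel mark when it follows an お-row kana. This is only a simplified sketch covering the pattern in the list above; the engine's real conversion rules are not documented in this article, and the `normalize` function and `O_ROW` set are assumptions made for illustration.

```python
# Kana whose vowel is "o" (including small "ょ"), for this simplified rule only.
O_ROW = set("おこそとのほもよろをごぞどぼぽょ")


def normalize(pronunciation: str) -> str:
    """Replace う/お that lengthens an o-row kana with the long-vowel mark."""
    out = []
    for ch in pronunciation:
        if ch in "うお" and out and out[-1] in O_ROW:
            out.append("ー")
        else:
            out.append(ch)
    return "".join(out)


variants = [
    "とうきょう", "とうきょお", "とうきょー",
    "とおきょう", "とおきょお", "とおきょー",
    "とーきょう", "とーきょお", "とーきょー",
]
# All nine spellings collapse to a single normalized form.
print({normalize(v) for v in variants})
```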
However, this automatic conversion may cause an unintended pronunciation to be registered. For example, if you register the word "幕の内" with the pronunciation "まくのうち", it is treated the same as "まくのーち". If the actual pronunciation is "まくのーち", there's no problem, but some people articulate the "う" clearly and pronounce it as "まくの 'う' ち".
The following section explains how to register in this case.
You can specify the pronunciation in detail by using "." (half-width dot).
By inserting a "." (half-width dot) in the pronunciation, you can suppress automatic conversion of the pronunciation. Specifically, it will be as follows.
・When pronounced as "まくのーち"
→ Register the pronunciation as either "まくのうち" or "まくのーち".
・When pronounced as "まくの 'う' ち"
→ Register the pronunciation as "まくの.うち".
In reality, examples like "まくのーち" and "まくの.うち" have very similar pronunciations, so in many cases speech recognition can work without problems even without distinguishing between them. It would be good to consider this in cases where you want to improve speech recognition accuracy even slightly, or when distinction is absolutely necessary.
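The role of the dot can be sketched with a toy converter in which "." blocks the merge of the kana that follows it. This is a simplified, self-contained illustration: it handles only "う" or "お" after an お-row kana, the function name `convert` is made up, and dropping the dot from the result is an assumption rather than documented engine behavior.

```python
# Kana whose vowel is "o" (including small "ょ"), for this simplified rule only.
O_ROW = set("おこそとのほもよろをごぞどぼぽょ")


def convert(pronunciation: str) -> str:
    """Toy automatic conversion in which "." suppresses the next merge."""
    out = []
    blocked = False  # True right after a ".", so the next kana is kept as-is
    for ch in pronunciation:
        if ch == ".":
            blocked = True
            continue
        if not blocked and ch in "うお" and out and out[-1] in O_ROW:
            out.append("ー")
        else:
            out.append(ch)
        blocked = False
    return "".join(out)


print(convert("まくのうち"))   # the "う" is merged into a long vowel
print(convert("まくの.うち"))  # the "う" is kept because of the dot
```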
"Pronunciation" is not about furigana, but about "how to read it".
Some people read the word "洗濯機" as "せんたくき" and others as "せんたっき", but if it is read as "せんたっき", it is preferable to register the pronunciation as "せんたっき". You can also register both as shown below.
・Notation: 洗濯機 Pronunciation: せんたくき
・Notation: 洗濯機 Pronunciation: せんたっき
By the way, in this example "せんたくき" and "せんたっき" are pronounced similarly, so speech recognition should work without problems even if you don't think too hard about it. If the registered reading and the actual pronunciation are significantly different, or if speech recognition is not working properly, it's a good idea to check the reading.
Supplementary information for using ACP word registration
The following article explains how to use the word registration function in AmiVoice Cloud Platform (ACP). In ACP, you need to specify a profile ID after registering a word. Please note that if you do not specify it, the registered words will not take effect.
AmiVoice Cloud Platform-Tech Blog
At the end
This time, I explained the mechanism and tips for registering words in AmiVoice. At first glance, it looks similar to registering words for kana-kanji conversion, but there are many differences. I think it will be easier to understand if you actually register words while referring to the tips, try speaking them, and get a feel for it. If you are interested in software development using speech recognition, I would be happy if you could try AmiVoice Cloud Platform.
Person who wrote this article
Shogo Ando
While researching speech recognition, I found a speech recognition company nearby and joined the company, where I continue to work to this day.
My hobbies are traveling abroad, eating delicious food, and saunas.
