What is "orthographic variation" in speech recognition?

Takashi Okura
Hello everyone!
In a previous article titled "Why speech recognition rates are not 100%?", I introduced "variations in spelling" as one of the reasons.
In this article, we will delve a little deeper into this "spelling variation." In addition to explaining what spelling variation is, I would also like to cover the following points:
- In what situations does spelling variation become a problem?
- How much spelling variation occurs in speech recognition?
Table of contents
- What is spelling variation?
- When spelling variations are a problem
- To what extent does spelling variation occur?
- Final thoughts
What is spelling variation?
"Spelling variation" refers to the fact that words have the same sound and meaning but are written differently. For example, there are the following patterns:
| Pattern | Example |
|---|---|
| Kanji and Kana | various / various |
| Differences in okurigana | Moving / Moving / Moving |
| Differences in kana spelling | Violin |
| Alphabet and Kana | AmiVoice |
| Kanji and Arabic numerals | 2000 yen / 2000 yen |
Furthermore, there is "shikuwasa," which Ando-san, who often appears in this blog, calls the "king of words with spelling variations." Just off the top of my head, it can be written as:
- Seeker
- Citrus depressa
- Shikuwasa
- Shiikuwasha
- ...
and so on, with no end in sight.*1
When spelling variations are a problem
In what cases does this "variation in spelling" cause problems? Here are three examples of problems.
Text search becomes difficult
I think the most common way to use speech recognition results is to convert the speech into text and save it, as with meeting minutes. One benefit of doing so is that you can search the content later.
Now consider what happens when the text of the recognition results contains spelling variations. For example, if you search the text for "AmiVoice" with an exact-match condition, passages in which the name was written in kana will not appear in the search results.
OR searches and regular expressions can handle slight spelling variations, but for a word like "shikuwasa" it becomes difficult to cover every pattern.*2
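As a rough illustration of the OR-search idea, here is a minimal Python sketch. The variant table `VARIANTS`, the kana spelling "アミボイス", and the helper names are assumptions made for this example, not part of any AmiVoice API. It folds full-width/half-width and case differences with Unicode NFKC normalization, then expands the query into every registered spelling.

```python
import unicodedata

# Hypothetical variant table: canonical spelling -> other spellings that
# should be treated as the same word. Entries are illustrative only.
VARIANTS = {
    "AmiVoice": ["アミボイス"],
}


def normalize(text: str) -> str:
    """Fold width differences (NFKC) and letter case."""
    return unicodedata.normalize("NFKC", text).casefold()


def expand_query(query: str, variants=VARIANTS) -> list[str]:
    """Return the normalized query plus every registered spelling of the same word."""
    key = normalize(query)
    for canonical, spellings in variants.items():
        group = [normalize(s) for s in [canonical, *spellings]]
        if key in group:
            return group
    return [key]


def search(query: str, documents: list[str], variants=VARIANTS) -> list[str]:
    """Return the documents that contain any spelling of the query."""
    terms = expand_query(query, variants)
    return [doc for doc in documents if any(t in normalize(doc) for t in terms)]


if __name__ == "__main__":
    docs = [
        "AmiVoice is used here",
        "アミボイスで文字起こし",
        "ＡｍｉＶｏｉｃｅ の設定",
        "unrelated text",
    ]
    for hit in search("AmiVoice", docs):
        print(hit)
```

A table like this only works for words whose variants can be listed by hand, which is exactly why a word with many possible spellings is hard to cover.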
May affect voice recognition rate
As I mentioned in a previous article, speech recognition rates are basically calculated on a character-by-character basis, so variations in spelling can sometimes be counted as errors.
Previous article: Regarding the "recognition accuracy (recognition rate)" of speech recognition (AmiVoice Cloud Platform Tech Blog)
For example, suppose the correct sentence and the recognition result write the same word with different okurigana, so that the recognition result contains extra kana characters. If scored as-is, those extra characters are counted as insertion errors, and the apparent speech recognition rate is lower.
We will take a closer look at the extent to which spelling variations occur and how they affect speech recognition rates in the next section.
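To make the effect concrete, here is a minimal sketch of character-level scoring. It assumes the commonly used definition, accuracy = (N - S - D - I) / N, where N is the number of characters in the correct sentence and S, D, I are substitutions, deletions, and insertions. The exact formula used in the evaluations in this article may differ. The example strings (the okurigana pair 行う / 行なう) are my own illustration, not taken from the test data.

```python
def edit_ops(ref: str, hyp: str) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # delete every reference character
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # insert every hypothesis character
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, k)          # match, no cost
            else:
                diag = (c + 1, s + 1, d, k)  # substitution
            c, s, d, k = dp[i - 1][j]
            delete = (c + 1, s, d + 1, k)
            c, s, d, k = dp[i][j - 1]
            insert = (c + 1, s, d, k + 1)
            dp[i][j] = min(diag, delete, insert)  # lowest cost wins
    _, subs, dels, ins = dp[n][m]
    return subs, dels, ins


def char_accuracy(ref: str, hyp: str) -> float:
    """Character accuracy under the assumed (N - S - D - I) / N definition."""
    subs, dels, ins = edit_ops(ref, hyp)
    return (len(ref) - subs - dels - ins) / len(ref)


if __name__ == "__main__":
    ref = "調査を行う"    # correct sentence (5 characters)
    hyp = "調査を行なう"  # same word, different okurigana (6 characters)
    print(edit_ops(ref, hyp))       # (0, 0, 1): one insertion
    print(char_accuracy(ref, hyp))  # 0.8
```

Even though both spellings are acceptable Japanese, the extra kana costs one insertion error out of five reference characters.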
Format mismatch
This is often the case with matters such as parliamentary minutes, where the format of numbers and other characters is strictly determined. For example, if the rule is that "numbers should be displayed in Arabic numerals," but the recognition result outputs "two thousand yen," this is unacceptable even if it is a variation in notation. Measures such as reviewing the training data and converting the Chinese numerals in the recognition results to Arabic numerals in post-processing are required.
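The post-processing step mentioned above could look something like the following Python sketch. It assumes the recognition result is Japanese text such as "二千円" and that numerals should only be converted when directly followed by a counter. The counter set `COUNTERS` and the function names are hypothetical, and real formatting rules would need many more cases.

```python
import re

DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8}

NUMERALS = "".join(DIGITS) + "".join(SMALL_UNITS) + "".join(LARGE_UNITS)
# Assumption: only convert numerals directly followed by one of these counters,
# so that ordinary words containing numeral kanji (e.g. 一般) are left alone.
COUNTERS = "円件回"
PATTERN = re.compile(rf"[{NUMERALS}]+(?=[{COUNTERS}])")


def kanji_to_int(numeral: str) -> int:
    """Convert a Chinese-numeral string such as 二千 or 一万二千三百四十五 to an int."""
    total = 0    # sum of completed 万 / 億 groups
    section = 0  # value accumulated inside the current group
    digit = 0    # digit(s) waiting for a unit
    for ch in numeral:
        if ch in DIGITS:
            digit = digit * 10 + DIGITS[ch]  # also handles positional 二〇〇〇
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = digit = 0
        else:
            raise ValueError(f"not a numeral character: {ch!r}")
    return total + section + digit


def to_arabic(text: str) -> str:
    """Replace Chinese numerals that precede a counter with Arabic numerals."""
    return PATTERN.sub(lambda m: str(kanji_to_int(m.group())), text)


if __name__ == "__main__":
    print(to_arabic("一般的に二千円です"))
```

Restricting the conversion to numerals followed by a counter is one possible design choice. Without some restriction like this, a naive replacement would also rewrite numeral kanji that are part of ordinary words.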
To what extent does spelling variation occur?
How much variation in spelling occurs in actual speech recognition?
In a verification using internal data, correcting spelling variations such as "differences in okurigana" and "differences between kanji and hiragana"*3 appears to raise the recognition rate by about 1 to 2 points.*4 Of course, this value will vary depending on the audio conditions and the engine, so it is only a reference value.
There were also cases like this:
We previously published an article comparing the speech recognition rates (recognition accuracy) of "general-purpose engines" and "domain-specific engines."
The speech recognition rates shown in that article are accompanied by the following note:
"Misrecognitions caused by spelling variations have been corrected by revising the correct sentences."
As this states, they are the values after correcting the spelling variations.
| Engine | Voice recognition rate |
|---|---|
| Voice Input_General-Purpose | 87.41% |
| Voice Input_Electronic Medical Record | 97.61% |
The table below also shows the speech recognition rates of these two engines before any spelling variations were corrected.
| Engine | Voice recognition rate (before correction → after correction) |
|---|---|
| Voice Input_General-Purpose | 75.73% → 87.41% |
| Voice Input_Electronic Medical Record | 93.77% → 97.61% |
What do you think? The general-purpose engine's recognition rate improved by more than 10 points once the spelling variations were corrected, and some people may be surprised that the difference is much larger than they expected. However, when I checked the recognition results, I found that the test set contained many spelling variations related to formatting, such as "differences between full-width and half-width characters" and "differences between Arabic numerals and Chinese numerals." If you are not getting the recognition rate you expect from speech recognition, we recommend comparing the recognition results with the correct sentences to see whether there are any spelling variations, including formatting differences.
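One simple way to start such a check is to compare each recognition result with its correct sentence both as-is and after folding format differences. The sketch below is only an assumption about how this could be done. It uses Unicode NFKC normalization, which folds full-width and half-width forms but does not convert Chinese numerals, and the function names are hypothetical.

```python
import unicodedata
from collections import Counter


def fold_format(text: str) -> str:
    """Fold full-width / half-width differences with NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


def classify(ref: str, hyp: str) -> str:
    """Label a (correct sentence, recognition result) pair."""
    if ref == hyp:
        return "exact"
    if fold_format(ref) == fold_format(hyp):
        return "format_only"  # differs only in character width or similar
    return "different"        # a real error or another kind of variation


def summarize(pairs: list[tuple[str, str]]) -> Counter:
    """Count how many pairs fall into each label."""
    return Counter(classify(ref, hyp) for ref, hyp in pairs)


if __name__ == "__main__":
    pairs = [
        ("2000円です", "2000円です"),
        ("２０００円です", "2000円です"),  # full-width vs half-width digits
        ("2000円です", "二千円です"),      # Arabic vs Chinese numerals
    ]
    print(summarize(pairs))
```

Pairs labeled `format_only` are candidates for being counted as errors purely because of formatting. Pairs labeled `different` still need a manual look, since they may be genuine misrecognitions or variations this simple check cannot detect.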
Final thoughts
In this article, we have looked in detail at "orthographic variation" in speech recognition. As mentioned in the article "Why speech recognition rates are not 100%?", there are many kinds of orthographic variation, and in many cases they cannot be converted mechanically. We hope this has shown that it is one of the difficult issues in working with speech recognition.
If this article has made any developers interested in voice recognition technology or the AmiVoice Cloud Platform, please give it a try at https://acp.amivoice.com/. All engines are free to use for up to 60 minutes of audio per month.
Thank you for reading this far!
Person who wrote this article

Takashi Okura
I joined Advanced Media as a new graduate.
My current job mainly involves research and development to improve the accuracy of speech recognition.
My hobbies include traveling (mainly trains), reading (mainly novels), and board games.
*1: Wikipedia has a good diagram (see the "Shikwasa" article on Wikipedia). Working out how many different ways there are to write it might make a nice little math problem.
*2: This is not a speech recognition result, but while writing this article I had trouble finding the text chat in which Ando-san had previously talked about "shikuwasa." Searching for a longer spelling of the word found nothing, and I finally found it by searching for only the first part of the word ("seek"). However, that search also returned "secret."
*3: Note that writing in hiragana a part that could be written in kanji is called "opening the kanji," and writing it in kanji is called "closing the kanji." For example, in this article, the "variation" part of "orthographic variation" is open kanji.
*4: This value is not the "error improvement rate" explained in the previous article. It means that a recognition rate of 60% becomes about 61-62%, and a recognition rate of 80% becomes about 81-82%. For more on error improvement rates, see: What is the difference between "error improvement rate" and "recognition accuracy (recognition rate)" in speech recognition? (AmiVoice Techblog)
