Comparing what "recognition accuracy (recognition rate) XX%" actually looks like

Takashi Okura
Hello everyone!
In my previous article, "Regarding the 'recognition accuracy (recognition rate)' of speech recognition", I introduced how to calculate the recognition accuracy (recognition rate) of speech recognition.
However, even if you are told that the recognition accuracy is XX%, it is difficult to know what the actual recognition results look like. AmiVoice users often ask, "They say the recognition accuracy is XX%, but is it actually usable?"
To be honest, it is difficult to give an absolute standard such as "If the recognition accuracy is XX%, it is usable!". Therefore, in this article, we will look at several recognition results with different recognition accuracies and compare them. I hope this gives you a feel for what "recognition accuracy XX%" looks like.
Recognition accuracy calculation formula
First, here is the formula for calculating recognition accuracy. This is the formula for character-level recognition accuracy.
Recognition accuracy = (N - D - S - I) / N
- N: Number of characters in the correct sentence
- D: Number of deletion-error characters (characters that exist in the correct sentence but do not exist in the recognition result)
- S: Number of substitution-error characters (characters that exist in the correct sentence but are replaced with different characters in the recognition result)
- I: Number of insertion-error characters (characters that do not exist in the correct sentence but appear in the recognition result)
If the recognition result matches the correct sentence perfectly, the recognition accuracy will be 100%. The more errors there are, the lower the recognition accuracy will be.
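As a rough illustration of the formula above, the following Python sketch counts D, S, and I with a standard edit-distance alignment and then applies the formula. The function names `recognition_accuracy` and `count_errors` and the sample strings are made up for this example, and this is not necessarily how AmiVoice's evaluation tools compute the counts. When several minimum-cost alignments exist, the split between D, S, and I can differ, although their total stays the same.

```python
def recognition_accuracy(n: int, d: int, s: int, i: int) -> float:
    """Recognition accuracy = (N - D - S - I) / N."""
    if n <= 0:
        raise ValueError("the correct sentence must contain at least one character")
    return (n - d - s - i) / n


def count_errors(reference: str, hypothesis: str) -> tuple[int, int, int]:
    """Return (D, S, I) for one minimum-edit-distance character alignment."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[r][c] = minimum edits needed to turn reference[:r] into hypothesis[:c]
    cost = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        cost[r][0] = r  # delete every reference character
    for c in range(cols):
        cost[0][c] = c  # insert every hypothesis character
    for r in range(1, rows):
        for c in range(1, cols):
            mismatch = 1 if reference[r - 1] != hypothesis[c - 1] else 0
            cost[r][c] = min(
                cost[r - 1][c - 1] + mismatch,  # match or substitution
                cost[r - 1][c] + 1,             # deletion
                cost[r][c - 1] + 1,             # insertion
            )

    # Walk back from the bottom-right corner and tally each error type.
    d = s = i = 0
    r, c = len(reference), len(hypothesis)
    while r > 0 or c > 0:
        mismatch = 1 if r > 0 and c > 0 and reference[r - 1] != hypothesis[c - 1] else 0
        if r > 0 and c > 0 and cost[r][c] == cost[r - 1][c - 1] + mismatch:
            s += mismatch
            r, c = r - 1, c - 1
        elif r > 0 and cost[r][c] == cost[r - 1][c] + 1:
            d += 1
            r -= 1
        else:
            i += 1
            c -= 1
    return d, s, i


if __name__ == "__main__":
    reference = "recognition"   # hypothetical correct sentence
    hypothesis = "recogmition"  # hypothetical recognition result
    d, s, i = count_errors(reference, hypothesis)
    print(d, s, i, recognition_accuracy(len(reference), d, s, i))
```

Note that with many insertion errors the value can drop below 0, because I is not bounded by N.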
Recognition accuracy and the "appearance" of the recognition results
In this example, the correct sentence is:
Our goal is to realize natural communication between humans and machines and create a prosperous future.
I took this sentence from "Company Profile > Business Contents" on our company website. To make the recognition accuracy come out as round numbers, I adjusted it so that it has exactly 40 characters, excluding punctuation marks (counted in the original Japanese text). In this test, punctuation marks are not included in the calculation of recognition accuracy.
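The article does not say how punctuation was removed before counting characters. One possible way, shown here as a minimal sketch, is to drop every character whose Unicode category is punctuation. The function name `strip_punctuation` is made up for this example, and dropping whitespace as well is an assumption.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Keep only the characters that would be scored.

    Assumption: characters in any Unicode punctuation category ("P...")
    and all whitespace are excluded from the character count.
    """
    return "".join(
        ch
        for ch in text
        if not unicodedata.category(ch).startswith("P") and not ch.isspace()
    )


if __name__ == "__main__":
    sample = "Hello, world!"  # hypothetical input
    cleaned = strip_punctuation(sample)
    print(cleaned, len(cleaned))
```

Applying the same function to both the correct sentence and the recognition result keeps N, D, S, and I consistent with each other.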
Recognition accuracy of 50%
First, let's take a look at an example of a recognition result with a recognition accuracy of 50%. Please note that the following recognition result example is fictitious and was created for this blog post, and is not the result of actual recognition using AmiVoice Cloud Platform or similar.
[Example of a recognition result with 50% recognition accuracy]
It is about building a line that realizes communication and understanding.
Let's calculate the recognition accuracy by comparing it with the correct sentence. The top is the correct sentence, and the bottom is the recognition result. Punctuation marks have been removed.
[Alignment results]
Correct sentence: Our goal is to realize natural communication between humans and machines and create a prosperous future
Recognition result: It is about building a line that realizes communication and understanding
Recognition accuracy = (40 - 9 - 7 - 4) / 40 = 0.5 = 50%
What do you think? The only word that is correct is "communication," and the meaning of the sentence is not conveyed. With this level of recognition accuracy, for example, even if the spoken content is recognized by speech recognition and displayed as subtitles, the meaning will not be conveyed correctly. Also, even if the speech data is converted into text by speech recognition and then manually corrected while listening to the audio to create minutes, the corrections take a considerable amount of time.
Recognition accuracy of 80%
So, what does a recognition accuracy of 80% look like?
[Example of a recognition result with 80% recognition accuracy]
We aim to create rich requests by realizing natural communication between people and understanding.
[Alignment results]
Correct sentence: Our goal is to realize natural communication between humans and machines and create a prosperous future
Recognition result: We aim to create rich requests by realizing natural communication between people and understanding
Recognition accuracy = (40 - 2 - 4 - 2) / 40 = 0.8 = 80%
The result is much closer to the correct sentence than at 50%. However, there are still misrecognitions such as "understanding" and "request", as well as some deleted and inserted particles.
Recognition accuracy of 90%
Let's also look at a higher recognition accuracy of 90%.
[Example of a recognition result with 90% recognition accuracy]
We aim to create rich requests by realizing natural communication between humans and machines.
[Alignment results]
Correct sentence: Our goal is to realize natural communication between humans and machines and create a prosperous future
Recognition result: We aim to create rich requests by realizing natural communication between humans and machines
Recognition accuracy = (40 - 1 - 2 - 1) / 40 = 0.9 = 90%
The number of errors has also decreased significantly. Even when correcting the results of speech recognition to create meeting minutes, it only requires making small corrections in three places, which saves a lot of time.
Recognition accuracy of 95%
Finally, here is an example with a recognition accuracy of 95%.
[Example of a recognition result with 95% recognition accuracy]
We aim to create a prosperous future by realizing natural communication between humans and machines.
[Alignment results]
Correct sentence: Our goal is to realize natural communication between humans and machines and create a prosperous future
Recognition result: We aim to create a prosperous future by realizing natural communication between humans and machines
Recognition accuracy = (40 - 1 - 0 - 1) / 40 = 0.95 = 95%
With this level of accuracy, it can be said that the results are almost the same as the correct sentence.
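To make the four figures easier to compare, the short Python sketch below recomputes them from the D, S, and I counts given in each example above (N = 40 in every case). The variable names are made up for this illustration.

```python
N = 40  # characters in the correct sentence, excluding punctuation

# (D, S, I) counts taken from the four examples in this article
examples = {
    "50%": (9, 7, 4),
    "80%": (2, 4, 2),
    "90%": (1, 2, 1),
    "95%": (1, 0, 1),
}

accuracies = {}
for label, (d, s, i) in examples.items():
    errors = d + s + i
    accuracies[label] = (N - errors) / N
    print(f"{label}: {errors} error characters out of {N} -> {accuracies[label]:.0%}")
```

Seen this way, the gap between 90% and 95% on this sentence is only two characters (4 errors versus 2), while 50% means that half of the 40 characters are affected.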
Ultimately, what level of recognition accuracy is needed for it to be usable?
At the beginning of this article, I wrote that it is difficult to give an absolute standard such as "If the recognition accuracy is XX%, it's usable!" However, I think there are still people who would like to know a "guideline for usability."
Considering the examples of recognition results we have seen so far, around 85 to 90% seems like a reasonable guideline.
However, the required recognition accuracy will vary depending on the situation. For example, if you are going to manually correct the recognition results after speech recognition, it may be acceptable for the accuracy to be somewhat lower, or there may be tasks that require higher accuracy.
Also, in this example, we calculated the recognition accuracy using only one sentence, but in actual speech recognition evaluation, it is not common to use only one sentence. The recognition accuracy is calculated using the entire test set, which includes a variety of speech sounds and sentences. For more details, see this article.
AmiVoice Cloud Platform-Tech Blog
Even if the overall recognition accuracy is high, there are often sentences that cannot be recognized well. If there is a tendency for such sentences to have poor sound quality or for certain words not to be recognized, this may lead to further improvements.
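As a hedged sketch of these two points, the Python example below computes one accuracy for a whole test set and lists the sentences that fall below a threshold. It assumes the overall figure is obtained by pooling N, D, S, and I over all sentences, which the article does not state explicitly. The class `SentenceResult`, the functions `pooled_accuracy` and `weakest_sentences`, and all counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SentenceResult:
    sentence_id: str
    n: int  # characters in the correct sentence
    d: int  # deletion errors
    s: int  # substitution errors
    i: int  # insertion errors

    @property
    def accuracy(self) -> float:
        return (self.n - self.d - self.s - self.i) / self.n


def pooled_accuracy(results: list[SentenceResult]) -> float:
    """Accuracy over the whole test set, pooling counts across sentences (assumption)."""
    total_n = sum(r.n for r in results)
    total_errors = sum(r.d + r.s + r.i for r in results)
    return (total_n - total_errors) / total_n


def weakest_sentences(results: list[SentenceResult], threshold: float) -> list[SentenceResult]:
    """Sentences whose own accuracy is below the threshold, worst first."""
    return sorted(
        (r for r in results if r.accuracy < threshold),
        key=lambda r: r.accuracy,
    )


# Made-up per-sentence counts, not real evaluation data
results = [
    SentenceResult("utt-001", 40, 1, 0, 1),
    SentenceResult("utt-002", 40, 9, 7, 4),
    SentenceResult("utt-003", 20, 0, 0, 0),
]

if __name__ == "__main__":
    print(f"overall: {pooled_accuracy(results):.0%}")
    for r in weakest_sentences(results, threshold=0.8):
        print(r.sentence_id, f"{r.accuracy:.0%}")
```

Pooling weights long sentences more heavily than averaging per-sentence accuracies would, so the two approaches can give different overall numbers.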
Final thoughts
This time, we compared the recognition accuracy figures with the appearance of the recognition results. We hope that this gives you an idea of what "recognition accuracy XX%" looks like!
There are still other topics surrounding "voice recognition accuracy," so I hope to continue introducing them little by little on this blog.
If you are a developer who has become interested in speech recognition technology or the AmiVoice Cloud Platform after reading this article, please feel free to give it a try at https://acp.amivoice.com/. You can use up to 60 minutes of audio for free each month.
Thank you for reading this far!
Person who wrote this article
Takashi Okura
I joined Advanced Media as a new graduate.
My current job mainly involves research and development to improve the accuracy of speech recognition.
My hobbies include traveling (mainly trains), reading (mainly novels), and board games.
