Comparing the appearance of "Recognition accuracy (recognition rate) XX%"

Published: 2021.06.28 Last updated: 2025.03.04

t-ookura Takashi Okura

Hello everyone!

In my previous article, "Regarding the 'recognition accuracy (recognition rate)' of speech recognition", I introduced how to calculate the recognition accuracy (recognition rate) of speech recognition.

However, even if you are told that the recognition accuracy is XX%, it is difficult to know what the actual recognition results look like. AmiVoice users often ask, "They say the recognition accuracy is XX%, but is it actually usable?"

To be honest, it is difficult to give an absolute standard such as "If the recognition accuracy is XX%, it is usable!". Therefore, in this article, we will compare several recognition results with different recognition accuracies. I hope this gives you a concrete picture: "So this is what XX% recognition accuracy looks like!"

Recognition accuracy calculation formula

First, here is the formula for calculating recognition accuracy. This article uses character-by-character recognition accuracy.

Recognition accuracy = (N - D - S - I) / N

  • N: Number of characters in the correct sentence
  • D: Number of deletion errors (characters that exist in the correct sentence but do not exist in the recognition result)
  • S: Number of substitution errors (characters in the correct sentence that are replaced with different characters in the recognition result)
  • I: Number of insertion errors (characters that do not exist in the correct sentence but appear in the recognition result)

If the recognition result matches the correct sentence perfectly, the recognition accuracy will be 100%. The more errors there are, the lower the recognition accuracy will be.
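As a minimal sketch, the formula above can be written as a small Python function. The function name and argument names are chosen here for illustration and are not part of any AmiVoice API.

```python
def recognition_accuracy(n: int, d: int, s: int, i: int) -> float:
    """Return (N - D - S - I) / N as a fraction between 0 and 1 in the usual case.

    n: number of characters in the correct sentence
    d: number of deletion errors
    s: number of substitution errors
    i: number of insertion errors
    """
    if n <= 0:
        raise ValueError("N must be a positive number of characters")
    return (n - d - s - i) / n


# A perfect match has no errors, so the accuracy is 100%.
print(f"{recognition_accuracy(40, 0, 0, 0):.0%}")
# Ten errors in a 40-character sentence.
print(f"{recognition_accuracy(40, 4, 4, 2):.0%}")
```

Because insertion errors are subtracted as well, the value can in principle drop below 0 when the recognition result contains many extra characters.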

Recognition accuracy and the "appearance" of the recognition results

In this example, the correct sentence is:

Our goal is to realize natural communication between humans and machines and create a prosperous future. 

I took this sentence from the "Company Profile > Business Contents" page of our company website. To make the recognition accuracy come out to round numbers, I adjusted the sentence to exactly 40 characters, excluding punctuation marks (counted in the original Japanese). In this test, punctuation marks are not included in the calculation of recognition accuracy.

Recognition accuracy of 50%

First, let's take a look at an example of a recognition result with a recognition accuracy of 50%. Please note that the following recognition result is fictitious and was created for this blog post. It is not the result of actual recognition using AmiVoice Cloud Platform or a similar service.

[Example of a recognition result with 50% recognition accuracy]
It is about building a line that realizes communication and understanding.

Let's calculate the recognition accuracy by comparing it with the correct sentence. The top is the correct sentence, and the bottom is the recognition result. Punctuation marks have been removed.

[Alignment results]
(Character-by-character alignment of the correct sentence and the recognition result, showing 9 deletion errors, 7 substitution errors, and 4 insertion errors.)

Recognition accuracy = (40 - 9 - 7 - 4) / 40 = 0.5 = 50%

What do you think? The only word that is correct is "communication," and the meaning of the sentence is not conveyed. With this level of recognition accuracy, for example, even if the spoken content is recognized by speech recognition and displayed as subtitles, the meaning will not be conveyed correctly. Also, even if the speech data is converted into text by speech recognition and then manually corrected while listening to the audio to create minutes, the corrections take a considerable amount of time.
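The comparison with the correct sentence can be sketched as a character-level alignment based on edit distance. This is a minimal sketch under assumptions, not the scoring tool actually used for AmiVoice: the function names, the punctuation set, and the toy strings are all hypothetical. When several alignments have the same minimum number of errors, the split between D, S, and I can depend on the tie-breaking order used in the backtrace.

```python
# Assumed punctuation set for illustration; adjust it to your own data.
PUNCTUATION = set("、。,.!?")


def strip_punctuation(text: str) -> str:
    """Remove punctuation so that it is not counted in the accuracy."""
    return "".join(ch for ch in text if ch not in PUNCTUATION)


def count_errors(reference: str, hypothesis: str) -> tuple[int, int, int]:
    """Return (D, S, I) for one minimum-edit-distance alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[r][h]: minimum edits to turn reference[:r] into hypothesis[:h]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        cost[r][0] = r  # delete every reference character
    for h in range(1, m + 1):
        cost[0][h] = h  # insert every hypothesis character
    for r in range(1, n + 1):
        for h in range(1, m + 1):
            diff = 0 if reference[r - 1] == hypothesis[h - 1] else 1
            cost[r][h] = min(
                cost[r - 1][h - 1] + diff,  # match or substitution
                cost[r - 1][h] + 1,         # deletion
                cost[r][h - 1] + 1,         # insertion
            )
    # Walk back from the end to count each error type.
    d = s = i = 0
    r, h = n, m
    while r > 0 or h > 0:
        if r > 0 and h > 0:
            diff = 0 if reference[r - 1] == hypothesis[h - 1] else 1
            if cost[r][h] == cost[r - 1][h - 1] + diff:
                s += diff
                r -= 1
                h -= 1
                continue
        if r > 0 and cost[r][h] == cost[r - 1][h] + 1:
            d += 1
            r -= 1
        else:
            i += 1
            h -= 1
    return d, s, i


# Toy example (not from the article): one character is missing in the result.
ref = strip_punctuation("recognition.")
hyp = strip_punctuation("recogniton")
d, s, i = count_errors(ref, hyp)
print(d, s, i, (len(ref) - d - s - i) / len(ref))
```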

Recognition accuracy of 80%

So, what does a recognition accuracy of 80% look like?

[Example of a recognition result with 80% recognition accuracy]
We aim to create rich requests by realizing natural communication between people and understanding.

[Alignment results]
(Character-by-character alignment of the correct sentence and the recognition result, showing 2 deletion errors, 4 substitution errors, and 2 insertion errors.)

Recognition accuracy = (40 - 2 - 4 - 2) / 40 = 0.8 = 80%

The result is much closer to the correct sentence than it was at 50%. However, there are still substitution errors such as "understanding" and "request", as well as some deletions and insertions of particles.

Recognition accuracy of 90%

Let's also look at a higher recognition accuracy of 90%.

[Example of a recognition result with 90% recognition accuracy]
We aim to create rich requests by realizing natural communication between humans and machines.

[Alignment results]
(Character-by-character alignment of the correct sentence and the recognition result, showing 1 deletion error, 2 substitution errors, and 1 insertion error.)

Recognition accuracy = (40 - 1 - 2 - 1) / 40 = 0.9 = 90%

The number of errors has also decreased significantly. Even when correcting the results of speech recognition to create meeting minutes, it only requires making small corrections in three places, which saves a lot of time.

Recognition accuracy of 95%

Finally, here is an example with a recognition accuracy of 95%.

[Example of a recognition result with 95% recognition accuracy]
We aim to create a prosperous future by realizing natural communication between humans and machines.

[Alignment results]
(Character-by-character alignment of the correct sentence and the recognition result, showing 1 deletion error, 0 substitution errors, and 1 insertion error.)

Recognition accuracy = (40 - 1 - 0 - 1) / 40 = 0.95 = 95%

With this level of accuracy, it can be said that the results are almost the same as the correct sentence.
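To put the four examples side by side, the following sketch recomputes each accuracy from the error counts used in the calculations above and also shows how many of the 40 characters are affected. The variable and function names are chosen for illustration only.

```python
# (D, S, I) for each fictitious example in this article; N is 40 in every case.
N = 40
EXAMPLES = {
    "50%": (9, 7, 4),
    "80%": (2, 4, 2),
    "90%": (1, 2, 1),
    "95%": (1, 0, 1),
}


def summarize(n, examples):
    """Return (label, total error characters, accuracy) for each example."""
    rows = []
    for label, (d, s, i) in examples.items():
        errors = d + s + i
        rows.append((label, errors, (n - errors) / n))
    return rows


for label, errors, accuracy in summarize(N, EXAMPLES):
    print(f"{label}: {errors} error characters out of {N} -> {accuracy:.0%}")
```

Seen this way, going from 80% to 90% halves the number of characters that need correcting, from 8 to 4.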

Ultimately, what level of recognition accuracy is needed for it to be usable?

At the beginning of this article, I wrote that it is difficult to give an absolute standard such as "If the recognition accuracy is XX%, it's usable!" However, I think there are still people who would like to know a "guideline for usability."
Considering the examples of recognition results we have seen so far, around 85 to 90% seems like a reasonable guideline.

However, the required recognition accuracy will vary depending on the situation. For example, if you are going to manually correct the recognition results after speech recognition, it may be acceptable for the accuracy to be somewhat lower, or there may be tasks that require higher accuracy.

Also, in this example, we calculated the recognition accuracy using only one sentence, but in actual speech recognition evaluation, it is not common to use only one sentence. The recognition accuracy is calculated using the entire test set, which includes a variety of speech and sentences. For more details, see the article "Is the voice recognition accuracy really accurate?" on the AmiVoice Cloud Platform Tech Blog.

Even if the overall recognition accuracy is high, there are often sentences that cannot be recognized well. If there is a tendency for such sentences to have poor sound quality or for certain words not to be recognized, this may lead to further improvements.
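As a rough sketch of this idea, the code below computes one accuracy over a whole test set and also lists the individual sentences that fall below a threshold. It assumes that the error counts are pooled over all sentences before applying the formula, which is one common convention and not necessarily how any particular evaluation is run. The sentence names, counts, and threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SentenceCounts:
    name: str
    n: int  # characters in the correct sentence
    d: int  # deletion errors
    s: int  # substitution errors
    i: int  # insertion errors

    def accuracy(self) -> float:
        return (self.n - self.d - self.s - self.i) / self.n


def pooled_accuracy(sentences: list[SentenceCounts]) -> float:
    """Apply the formula to counts summed over the whole test set (assumed convention)."""
    total_n = sum(x.n for x in sentences)
    total_errors = sum(x.d + x.s + x.i for x in sentences)
    return (total_n - total_errors) / total_n


def low_accuracy_sentences(sentences: list[SentenceCounts], threshold: float) -> list[str]:
    """Return the names of sentences whose own accuracy is below the threshold."""
    return [x.name for x in sentences if x.accuracy() < threshold]


# Hypothetical test set: two well-recognized sentences and one poor one.
test_set = [
    SentenceCounts("utt1", n=40, d=0, s=0, i=0),
    SentenceCounts("utt2", n=40, d=1, s=0, i=1),
    SentenceCounts("utt3", n=20, d=4, s=4, i=2),
]
print(f"overall: {pooled_accuracy(test_set):.0%}")
print("needs a closer look:", low_accuracy_sentences(test_set, threshold=0.8))
```

Listing the low-scoring sentences separately makes it easier to check whether they share poor sound quality or particular words.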

Final thoughts

This time, we compared the recognition accuracy figures with the appearance of the recognition results. We hope that this gives you an idea of what "recognition accuracy XX%" looks like!
There are still other topics surrounding "voice recognition accuracy," so I hope to continue introducing them little by little on this blog.

If you are a developer who has become interested in speech recognition technology or the AmiVoice Cloud Platform after reading this article, please give it a try at https://acp.amivoice.com/. You can use up to 60 minutes of audio for free each month.

Thank you for reading this far!

Person who wrote this article

  • Takashi Okura

    I joined Advanced Media as a new graduate.
    My current job mainly involves research and development to improve the accuracy of speech recognition.
    My hobbies include traveling (mainly trains), reading (mainly novels), and board games.
