Regarding the "recognition accuracy (recognition rate)" of speech recognition

Published: 2021.03.26 Last updated: 2025.03.04

t-ookura Takashi Okura

Hello everyone!
I'm Takashi Okura of Advanced Media Co., Ltd.

I have been appointed to write articles for AmiVoice Techblog.
I hope to cover a variety of topics on this blog, mainly focusing on voice recognition technology and the AmiVoice Cloud Platform. Thank you for your interest.

Today's topic: "Recognition accuracy (recognition rate)" of voice recognition

In this first article, we will cover the most important thing to understand when using voice recognition: "recognition accuracy"!
※A similar term is "recognition rate"; we will explain the difference later.

How is this "recognition accuracy" calculated?
If the recognition result and the correct sentence match perfectly, the recognition accuracy can be said to be 100%.
If not a single character is recognized, the recognition accuracy will be 0%.

So, in the following example, what is the recognition accuracy?

  • Correct sentence: 本日はご利用いただきありがとうございます ("Thank you for using our service today")
  • Recognition result: 本日はご利用ありがとうございました

Some parts are recognized correctly, but "いただき" (itadaki) is missing, and the final "す" (su) has become "した" (shita). It might seem enough to simply subtract the parts that were not recognized, but then how should recognition accuracy be calculated when the recognition result contains extra characters?

There are several indicators of recognition accuracy, but this time we will explain how to calculate "character-by-character recognition accuracy". We will introduce other metrics in future articles. First, compare the correct sentence with the recognition result to find the errors. There are three types of errors:

  • Deletion error (D): An error in which a character that exists in the correct sentence does not exist in the recognition result.
    In the example above, the four characters of "いただき" (itadaki) are deletion errors.
  • Substitution error (S): An error in which a character that exists in the correct sentence is replaced with a different character in the recognition result.
    In the example above, the "す" (su) at the end of "ありがとうございます" has been replaced with "し" (shi) in the recognition result.
  • Insertion error (I): An error in which a character that does not exist in the correct sentence appears in the recognition result.
    In the example above, the "た" (ta) at the end of the recognition result is an insertion error. "し" (shi) does not exist in the correct sentence either, but when an error can be treated as a substitution error, that interpretation takes priority.

Then, letting N be the number of characters in the correct sentence, and D, S, and I be the numbers of characters with deletion, substitution, and insertion errors respectively, the character-by-character recognition accuracy can be calculated using the following formula:

Recognition accuracy = (N - D - S - I) / N

In the example above, N = 20, D = 4, S = 1, and I = 1, so:
Recognition accuracy = (20 - 4 - 1 - 1) / 20 = 0.7 = 70%
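The counting procedure above can be sketched in Python. This is only a minimal illustration, not the scoring tool actually used for AmiVoice: it assumes a standard edit-distance (Levenshtein) alignment between the correct sentence and the recognition result, treats each Unicode character as one unit, and prefers a match or substitution over a deletion or insertion when tracing the alignment back. The names `ErrorCounts` and `count_errors` are made up for this example, and the two strings are the example sentences written in Japanese.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    n: int  # number of characters in the correct sentence
    d: int  # deletion errors
    s: int  # substitution errors
    i: int  # insertion errors

    @property
    def accuracy(self) -> float:
        # Recognition accuracy = (N - D - S - I) / N
        return (self.n - self.d - self.s - self.i) / self.n


def count_errors(reference: str, hypothesis: str) -> ErrorCounts:
    """Align two strings by edit distance and count D, S, and I (hypothetical helper)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("the correct sentence must not be empty")

    # dist[r][h] = minimum number of errors between reference[:r] and hypothesis[:h]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        dist[r][0] = r  # only deletions
    for h in range(1, m + 1):
        dist[0][h] = h  # only insertions
    for r in range(1, n + 1):
        for h in range(1, m + 1):
            mismatch = reference[r - 1] != hypothesis[h - 1]
            dist[r][h] = min(
                dist[r - 1][h - 1] + mismatch,  # match or substitution
                dist[r - 1][h] + 1,             # deletion
                dist[r][h - 1] + 1,             # insertion
            )

    # Trace back from the end, trying match/substitution first.
    d = s = i = 0
    r, h = n, m
    while r > 0 or h > 0:
        if r > 0 and h > 0:
            mismatch = reference[r - 1] != hypothesis[h - 1]
            if dist[r][h] == dist[r - 1][h - 1] + mismatch:
                s += mismatch
                r -= 1
                h -= 1
                continue
        if r > 0 and dist[r][h] == dist[r - 1][h] + 1:
            d += 1
            r -= 1
        else:
            i += 1
            h -= 1
    return ErrorCounts(n=n, d=d, s=s, i=i)


counts = count_errors(
    "本日はご利用いただきありがとうございます",  # correct sentence
    "本日はご利用ありがとうございました",        # recognition result
)
print(f"N={counts.n} D={counts.d} S={counts.s} I={counts.i} "
      f"accuracy={counts.accuracy:.0%}")
# N=20 D=4 S=1 I=1 accuracy=70%
```

The total number of errors D + S + I is the same for every minimum-cost alignment, so the accuracy value does not depend on how ties are broken. The split between S and D + I can depend on it, which is why the sketch tries substitution first.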

About the term "recognition rate"

Here, we will explain the difference between the terms "recognition accuracy" and "recognition rate".

"Recognition accuracy" corresponds to "Accuracy" in English and is given by the formula above.
On the other hand, another value used when evaluating recognition results is "Correct". In Japanese, this value is translated as "correct answer rate."

Correct (correct answer rate) = (N - D - S) / N

The difference from recognition accuracy is whether or not I (insertion errors) is included. Correct indicates the proportion of characters in the correct sentence that were recognized correctly.
However, in actual usage scenarios, a high correct answer rate is of little use if many unnecessary characters are inserted. For this reason, most recognition results are evaluated by "Accuracy."
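To make the difference concrete, here is a small Python sketch of the two formulas. The function names `accuracy` and `correct` are chosen for this illustration only, and the second set of counts is a made-up case, not a measured result.

```python
def accuracy(n: int, d: int, s: int, i: int) -> float:
    """Accuracy = (N - D - S - I) / N; insertion errors are penalized."""
    return (n - d - s - i) / n


def correct(n: int, d: int, s: int) -> float:
    """Correct = (N - D - S) / N; insertion errors are ignored."""
    return (n - d - s) / n


# Counts from the example in this article: N=20, D=4, S=1, I=1.
print(correct(20, 4, 1))      # 0.75
print(accuracy(20, 4, 1, 1))  # 0.7

# Hypothetical case: every correct character is recognized,
# but 12 unnecessary characters are inserted.
print(correct(10, 0, 0))       # 1.0
print(accuracy(10, 0, 0, 12))  # -0.2
```

In the hypothetical case, Correct reports 100% even though the output is full of extra characters, while Accuracy drops below zero. Accuracy has no lower bound of 0% once insertions outnumber the correctly recognized characters.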

As for the term "recognition rate", it seems to be used in many papers to mean "Correct". In business, however, "recognition rate" is easier to understand than "recognition accuracy", so in practice the term "recognition rate" is conventionally often used to mean "Accuracy."
Therefore, in subsequent articles, we will prioritize clarity and use "recognition rate" to mean "Accuracy" unless otherwise specified.

When the term "recognition rate" is used in speech recognition, it may be a good idea to check whether it means "Accuracy" or "Correct," i.e., whether it is counting insertion errors.

Final thoughts

I would like to continue posting articles like this in the future that will interest even those who don't know much about voice recognition.
In future articles, I would like to show how recognition accuracy figures compare with what actual recognition results look like, and to introduce other indicators of recognition accuracy.
And when new features are released for the AmiVoice Cloud Platform, I hope to provide technical explanations on this blog!

This is my first time writing a blog post like this, so I'm still feeling my way around, but I hope you'll continue to support me.
Thank you for reading this far!

Person who wrote this article

  • Takashi Okura

    I joined Advanced Media as a new graduate.
    My current job mainly involves research and development to improve the accuracy of speech recognition.
    My hobbies include traveling (mainly by train), reading (mainly novels), and board games.
