You can improve speech recognition accuracy (the recognition rate) by the way you speak. What are the tips for speaking well?

Shogo Ando
Hello everyone.
Just as there is no such thing as "you can lose weight by just eating this!", unfortunately there is no such thing as "you can improve voice recognition accuracy by just doing this!" However, it is possible to gradually improve voice recognition accuracy by paying attention to a few points.
In this series, based on the materials and content of the technical webinars I teach, I would like to explain five tips for improving voice recognition accuracy.
Recommended for:
- Those who want to improve the recognition rate of speech recognition
- Those who want to create a service using speech recognition or are concerned about the recognition rate
When is voice recognition accuracy high or low?
Before we get into tips for improving voice recognition accuracy, I'd like to talk about when voice recognition accuracy is likely to be high or low. This may be a bit sudden, but which of the video call partners in the images below do you think has the easiest voice to hear?

Do you get the impression that the woman on the left is easy to listen to, while the woman on the right is difficult to listen to?

This example is about video calls, but the same applies to voice recognition. Sounds that are easy for humans to hear are also sounds that are easy for a speech recognition engine to recognize.
Now, let's think about what makes the image above seem "easy to listen to" or "difficult to listen to." The following five points come to mind.
Five points that affect the accuracy of voice recognition
1. Speak appropriately
2. Select the appropriate microphone device
3. Use the microphone device appropriately
4. Use in a low-noise environment
5. Use in an appropriate communication environment
If all five of these elements are in good shape, voice recognition accuracy will be quite high. In practice, you may have to use voice recognition in a noisy environment, and it may be difficult to satisfy every element. Even in such cases, paying attention to the other elements lets you compensate for the weak one and keep voice recognition accuracy high.
Bonus: Speak about topics the speech recognition engine supports
There is one more important point that I will mention as a bonus. In human communication, mutual understanding is relatively easy because people usually talk about the same topic, but in the case of speech recognition, what matters is how familiar the engine is with that topic. Even if the pronunciation and sound quality are perfect, if the speech recognition engine does not support a specific topic or technical terminology, the recognition accuracy will be significantly reduced. I'd like to talk about this in a later article.
This time, I will talk about "1. Speak appropriately."
Tip 1) Speak appropriately
To improve speech recognition accuracy, it is important to first be aware of "speaking in a way that is easy for humans to understand." With that in mind, let's organize the specific points you should pay attention to.
Speak clearly
It is best to speak as clearly and articulately as possible.
Just as muttering makes it difficult for humans to understand what you are saying, it also makes it easy for voice recognition to make mistakes. It is best to speak clearly, with large mouth movements, and with each syllable as clear as possible.
It is also important to be careful not to skip any sounds. For example, in fast everyday speech it is common to drop a sound or two from a familiar phrase. Even if a few sounds are skipped, speech recognition often still works without any problems, but be careful, because each skipped sound makes a recognition error a little more likely.
Speak normally (don't speak unnaturally)
It's best to speak naturally, as if you were talking to someone.
When using voice recognition, you may see people who speak like an old-fashioned robot, separating each syllable like "Ko, n, ni, chi, wa" or emphasizing each syllable like "KO! N! NI! CHI! WA!" You might think that speaking like this makes it easier for the voice to be recognized, but with modern voice recognition, this is no longer necessary and can actually have the opposite effect. It's best to speak normally, as if you were talking to a person.
To go into a bit more detail, humans don't pronounce sounds like "a," "i," and "u" one by one in isolation. They also pronounce the transitional sounds between them, such as the sound that glides from "a" to "i." Voice recognition is designed to recognize these intermediate sounds as well, so it's better to speak in a way that includes them. In other words, it's better to speak normally.
Speak at an appropriate volume. Speak at a consistent volume as much as possible.
Extremely loud or soft voices are difficult for humans to hear, so it's a good idea to pay attention to the following two points.
- Your voice should not be too quiet (it should be recorded loudly enough for a person to hear it).
- Your voice should not be too loud (the recording should not be distorted).
However, this depends not only on the volume of the person's speaking voice, but also on factors such as the type of microphone used, the distance between the microphone and the mouth, and the sensitivity of the microphone. It is important to speak at an appropriate volume according to the microphone and settings used. We will explain microphones in another article.
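As a rough way to check the two points above on a recording, the sketch below measures the average level of 16-bit PCM samples and counts how many samples hit full scale. The function names and the thresholds (-40 dBFS for "too quiet", 0.1% of samples at full scale for "clipped") are assumptions chosen for illustration, not values from any particular speech recognition engine.

```python
import math

FULL_SCALE = 32767  # largest positive value of a 16-bit PCM sample


def sine(amplitude, n=1600, period=16):
    """Generate a test tone, clamped to the 16-bit range like a real recorder."""
    return [
        max(-32768, min(FULL_SCALE, round(amplitude * math.sin(2 * math.pi * i / period))))
        for i in range(n)
    ]


def level_report(samples, quiet_dbfs=-40.0, clip_ratio=0.001):
    """Return (label, rms_dbfs) where label is 'clipped', 'too quiet', or 'ok'.

    quiet_dbfs and clip_ratio are illustrative thresholds (assumptions).
    """
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    rms_dbfs = 20 * math.log10(rms / FULL_SCALE) if rms > 0 else float("-inf")
    # Samples stuck at full scale are a sign of distortion (clipping).
    clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE)
    if clipped / len(samples) > clip_ratio:
        return "clipped", rms_dbfs
    if rms_dbfs < quiet_dbfs:
        return "too quiet", rms_dbfs
    return "ok", rms_dbfs


if __name__ == "__main__":
    for amplitude in (100, 8000, 60000):
        label, dbfs = level_report(sine(amplitude))
        print(f"amplitude {amplitude}: {label} ({dbfs:.1f} dBFS)")
```

If a recording is reported as too quiet or clipped, the fix may lie in the microphone distance or sensitivity setting rather than in how loudly the person speaks.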
- Speak at a consistent volume as much as possible
Another important thing is to speak at a consistent volume as much as possible. The volume of your voice tends to drop at the beginning and end of an utterance, and when this happens, the speech recognition engine often fails to detect your voice or misrecognizes it. This can be prevented by increasing the microphone sensitivity a little or moving closer to the microphone, but it can also be improved by consciously keeping your volume consistent even at the beginning and end of an utterance.
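To see whether the beginning or end of an utterance is quieter than the rest, you could compare the level of the edges with the level of the middle. The following is a minimal sketch under assumed conditions: the samples hold exactly one utterance with silence already trimmed, and the 20% edge size is an arbitrary choice. The function name `edge_drop_db` is hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of PCM samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def edge_drop_db(samples, edge_fraction=0.2):
    """Return (start_db, end_db): level of the first and last part of the
    utterance relative to the middle part, in decibels.

    Negative values mean the edge is quieter than the middle.
    edge_fraction is an illustrative assumption.
    """
    n = len(samples)
    edge = int(n * edge_fraction)
    if edge == 0 or n - 2 * edge <= 0:
        raise ValueError("utterance too short")
    head = samples[:edge]
    middle = samples[edge:n - edge]
    tail = samples[n - edge:]
    floor = 1e-9  # avoids log10(0) for completely silent parts
    mid_level = max(rms(middle), floor)

    def relative_db(part):
        return 20 * math.log10(max(rms(part), floor) / mid_level)

    return relative_db(head), relative_db(tail)


if __name__ == "__main__":
    # Synthetic utterance: quiet start and end, louder middle.
    utterance = [1000] * 200 + [4000] * 600 + [1000] * 200
    start_db, end_db = edge_drop_db(utterance)
    print(f"start: {start_db:.1f} dB, end: {end_db:.1f} dB relative to the middle")
```

A large negative value at either edge suggests that the start or end of the utterance is at risk of being missed, which is the situation described above.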
Don't talk over other speakers
The difficulty of voice recognition increases dramatically when multiple people speak at the same time. It is best to have only one person speak at a time.
Some voice recognition systems allow for multiple channel voice input. In such systems, if each speaker is assigned a separate microphone, there is no problem with speaking simultaneously. However, if there is only one microphone and multiple speakers speak into it, the voice recognition accuracy will inevitably decrease.
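When each speaker has their own microphone, the audio is often stored as one interleaved multi-channel stream. The sketch below shows one way to separate such a stream into one sample list per speaker so that each can be recognized on its own. It assumes the samples are already decoded into integers and interleaved channel by channel. How each channel is then passed to a recognition engine depends on the engine and is not shown here.

```python
def split_channels(interleaved, num_channels):
    """Split interleaved PCM samples into one list per channel.

    For stereo input [L0, R0, L1, R1, ...] this returns [[L0, L1, ...], [R0, R1, ...]].
    """
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    if len(interleaved) % num_channels != 0:
        raise ValueError("sample count is not a multiple of num_channels")
    # Every num_channels-th sample, starting at offset ch, belongs to channel ch.
    return [list(interleaved[ch::num_channels]) for ch in range(num_channels)]


if __name__ == "__main__":
    # Two speakers recorded on separate microphones (channel 0 and channel 1).
    stereo = [10, -10, 20, -20, 30, -30]
    speaker_a, speaker_b = split_channels(stereo, 2)
    print("speaker A:", speaker_a)
    print("speaker B:", speaker_b)
```

With one channel per speaker, simultaneous speech ends up in different streams, which is why overlapping speech is less of a problem in that setup than with a single shared microphone.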
(In the case of AmiVoice) You don't need to worry about intonation or accent
Intonation and accent vary by region and by individual, and they also change with context, for example when the end of a word rises in a question. With AmiVoice, these differences in intonation and accent do not affect the accuracy of speech recognition.
By the way, I think there are other engines besides AmiVoice that handle differences in intonation and accent well. Since I haven't been able to do detailed testing with other engines, I've written "in the case of AmiVoice" here.
Summary
This time, we talked about the key points of speaking appropriately. Because speech recognition engines are trained on ordinary human speech, it is important to speak in a way that is easy for people to understand. You can further improve voice recognition accuracy by speaking at a consistent volume and taking care not to talk over other speakers. Next time, we will cover "The voice recognition rate changes depending on the device (microphone) you choose ~ How to choose a microphone ~".
Person who wrote this article
Shogo Ando
While researching speech recognition, I found a speech recognition company nearby and joined the company, where I continue to work to this day.
My hobbies are traveling abroad, eating delicious food, and saunas.
: @anpyan
