Is the Internet to blame for voice recognition not working? How to create a communication environment that allows voice recognition to work smoothly

Shogo Ando
Hello everyone.
This is the final installment of our series on tips for improving the accuracy of voice recognition. The final topic will be "the optimal communication environment for voice recognition."
Because speech recognition consumes a large amount of resources, such as CPU/GPU and memory, it is often processed on a cloud server rather than on a PC or smartphone. This makes the network communication environment important.
This time, we will explain in detail the network environment needed when using a voice recognition server, as well as wireless communication environments such as Bluetooth.
An "appropriate communication environment" is essential for voice recognition!
First, we will explain the amount of communication data required for voice recognition.
・Send (client → server)
  - Uncompressed audio data requires 32 kB/s per channel.
  - Speech recognition setting information may be sent when a request is made.
・Receive (server → client)
  - The voice recognition results are returned.
・Other
  - Overhead may occur due to protocol headers, etc.
The above is the main data exchanged when using the AmiVoice API, but other voice recognition engines may send and receive other information.
The point is that the majority of communication data is voice data sent from the client to the server. Apart from the audio data, the data is mainly text information, and the amount of data is generally not large.
For voice recognition, a sampling rate of 16 kHz and a quantization bit depth of 16 bits are common, and the resulting data volume per second is calculated as follows:
- 16kHz×16bit = 256kbit/s = 32kB/s
Please note that the amount of audio data will vary depending on the following factors:
- When processing multiple channels of sound simultaneously (for example, in the case of two channels, the amount of data is twice that of one channel)
- When transmitting higher quality audio data (for example, if the sampling rate is 24kHz, the amount of data is 1.5 times that of 16kHz)
- When transmitting low-quality audio data, such as landline voice (the sampling rate is often set to 8 kHz, which is half the data volume of 16 kHz)
- If the file format is compressed (the amount of data is reduced by the amount of compression)
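The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `audio_bytes_per_second` and the `compression_ratio` parameter (compressed size divided by uncompressed size) are assumptions made for this example and are not part of any AmiVoice API.

```python
def audio_bytes_per_second(sample_rate_hz, bit_depth=16, channels=1,
                           compression_ratio=1.0):
    """Approximate audio payload in bytes per second.

    compression_ratio is a hypothetical knob for this sketch:
    1.0 means uncompressed, 0.25 means the codec shrinks data to a quarter.
    """
    if not 0 < compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be in (0, 1]")
    bits_per_second = sample_rate_hz * bit_depth * channels
    return bits_per_second / 8 * compression_ratio


if __name__ == "__main__":
    cases = [
        ("16 kHz, 1 channel", 16000, 1),
        ("16 kHz, 2 channels", 16000, 2),
        ("24 kHz, 1 channel", 24000, 1),
        ("8 kHz, 1 channel", 8000, 1),
    ]
    for label, rate, ch in cases:
        kb = audio_bytes_per_second(rate, channels=ch) / 1000
        print(f"{label}: {kb:.0f} kB/s")
```

Running the script prints the rate for each of the cases listed above, which makes it easy to check how channel count and sampling rate scale the required bandwidth.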
Data other than voice data is usually not that large, but under certain conditions, large amounts of data may be sent or received. For example, with the AmiVoice API, it is possible to send a large word registration list or grammar definitions (rule grammar) for speech recognition when making a request. Please estimate the size of this data as needed.
With this in mind, it is necessary to have an appropriate network environment and sufficient bandwidth when using voice recognition. If the network environment is poor and the bandwidth is insufficient, the following two issues may occur:
- For streaming
  The sound transmission may be delayed or dropped, or a timeout error may occur.
- When sending files
  It takes time to send the audio, which increases the time it takes for the voice recognition results to be returned, and in some cases it may cause a timeout error.
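As a rough way to judge whether a given line is sufficient, the two cases can be estimated as follows. This is a minimal sketch, not a measurement tool: the function names and the 20% `overhead_factor` for protocol headers are assumptions chosen for illustration, and real overhead depends on the protocol in use.

```python
def can_stream_in_real_time(audio_bytes_per_sec, uplink_bits_per_sec,
                            overhead_factor=1.2):
    """True if the uplink can carry the audio stream as fast as it is produced.

    overhead_factor is an assumed allowance for protocol headers (1.2 = +20%).
    """
    required_bits_per_sec = audio_bytes_per_sec * 8 * overhead_factor
    return required_bits_per_sec <= uplink_bits_per_sec


def upload_seconds(audio_seconds, audio_bytes_per_sec, uplink_bits_per_sec,
                   overhead_factor=1.2):
    """Estimated time to upload a recording of the given length as a file."""
    total_bits = audio_seconds * audio_bytes_per_sec * 8 * overhead_factor
    return total_bits / uplink_bits_per_sec


if __name__ == "__main__":
    rate = 32000  # 16 kHz, 16-bit, 1 channel, uncompressed
    for uplink in (200_000, 1_000_000):
        ok = can_stream_in_real_time(rate, uplink)
        secs = upload_seconds(60, rate, uplink)
        print(f"uplink {uplink} bit/s: streaming ok={ok}, "
              f"60 s file upload takes about {secs:.1f} s")
```

If the estimated upload time approaches or exceeds the length of the recording itself, or the streaming check fails, delays and timeout errors like those described above become likely.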
In addition to network communication, if the communication environment for wireless microphones or Bluetooth microphones is poor, the accuracy of voice recognition may decrease. This may be due to crosstalk, radio interference, obstacles, etc. If the sound is interrupted or not transmitted properly, please check and try to improve the communication environment between the microphone and the receiving device.
What to do if the communication environment is poor
Here are some ways to deal with poor communication conditions.
First, if your network environment is unstable, one option is to build a speech recognition server within your local network instead of using a cloud server on the Internet. This can reduce or eliminate network delays and instability. However, only a limited number of voice recognition engines support this approach. In the case of AmiVoice, "AmiVoice API Private" is offered as a plan. Alternatively, you can build your own server by using OpenAI's "Whisper", for example.
Next, there is also a way to perform speech recognition on a PC or smartphone without using a server. In this case, you don't even need a local network, and speech recognition can be performed completely offline. In the case of AmiVoice, this offline speech recognition is possible by using "AmiVoice SDK".
When the network communication environment is poor
・Building a voice recognition server within a local network
▶Available in AmiVoice API Private
▶There are engines such as Whisper that allow you to build your own server.
・Offline voice recognition within the device
▶ Possible with AmiVoice SDK
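For the self-hosted route, local transcription with Whisper might look roughly like the sketch below. It assumes the open-source `openai-whisper` Python package and ffmpeg are installed on the machine. The function name `transcribe_locally`, the model size `"base"`, and the file name `meeting.wav` are placeholders chosen for this example. Wrapping this in a server for other clients on the local network is left out.

```python
def transcribe_locally(audio_path, model_name="base"):
    """Transcribe an audio file on this machine, with no cloud request.

    Assumes the openai-whisper package is installed; the import is done
    here so the rest of a program still loads when it is missing.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path)   # returns a dict including "text"
    return result["text"]


if __name__ == "__main__":
    # "meeting.wav" is a hypothetical file name for this sketch.
    print(transcribe_locally("meeting.wav"))
```

Because the audio never leaves the machine, Internet bandwidth no longer matters, but the CPU/GPU and memory cost mentioned at the start of this article now falls on your own hardware.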
Next, we will explain what to do if the communication environment for wireless microphones or Bluetooth microphones is poor.
First of all, the most reliable method is to use a wired microphone or the microphone built into the device whenever possible. However, if you need to use a wireless or Bluetooth microphone, the following measures can be considered.
- Avoid radio interference
  It is important to avoid interference by moving away from other wireless/Bluetooth devices, microwave ovens, and other devices that emit radio waves. Since interference is more likely to occur in crowded places, it is also effective to move away from such environments.
- Reduce the impact of obstacles
  Obstacles such as walls and furniture can increase the chance of communication being interrupted. You can improve the communication environment by reducing obstacles or by moving the device's transmitter and receiver closer together.
- Change the radio channel
  Depending on the device you are using, you may be able to change the radio channel, which may allow you to switch to a channel that does not interfere with other devices.
- Contact the manufacturer
  Depending on the microphone, there may be other measures available besides those listed above, so it may be a good idea to consider contacting the microphone manufacturer for detailed instructions.
When the communication environment for wireless microphones or Bluetooth microphones is poor
・Use a wired or on-device microphone
・Stay away from environments with radio interference
・Reduce obstacles or shorten the distance between the transmitter and receiver
・Change the radio channel you use
・For details, contact the microphone manufacturer
Summary of tips to improve the accuracy of voice recognition
So far, we have discussed tips for improving the accuracy of voice recognition in five articles. Let's summarize the key points once again.
Points that influence the accuracy of voice recognition
① Speak appropriately
② Select the right device
③ Use the device appropriately
④ Use in a low-noise environment
⑤ Use in an appropriate communication environment

We hope that you will put into practice the points introduced here and use your ingenuity to make the most of voice recognition in a better environment.
Person who wrote this article
Shogo Ando
While researching speech recognition, I found a speech recognition company nearby and joined the company, where I continue to work to this day.
My hobbies are traveling abroad, eating delicious food, and saunas.
: @anpyan
