Three Types of Speech Recognition APIs on the AmiVoice Cloud Platform (New: Asynchronous HTTP Speech Recognition API)

Shogo Ando
Hello everyone.
We provide a cloud speech recognition API for software developers (the AmiVoice Cloud Platform). Until now we have offered two types of API, each using a different protocol suited to a different purpose. We have now added a third, so there are three types in total.
In this article, I will explain all three, focusing on the newly added one.
What API has been added?
The newly added API is the "Asynchronous HTTP Speech Recognition API".
Until now, there were two APIs: the "WebSocket Speech Recognition API" for streaming (real-time) speech recognition, and the "HTTP Speech Recognition API" for batch speech recognition. The "HTTP Speech Recognition API" could only recognize audio files of up to 16 MB, but the newly added API can also recognize files larger than 16 MB.
Because the new API is called "Asynchronous HTTP", the existing "HTTP Speech Recognition API" has been renamed the "Synchronous HTTP Speech Recognition API".
In summary, the AmiVoice Cloud Platform now offers the following three APIs:
- WebSocket Speech Recognition API
- (Renamed) Synchronous HTTP Speech Recognition API
- (NEW!) Asynchronous HTTP Speech Recognition API
I will briefly explain each one.
WebSocket
The WebSocket API converts audio streams into text in real time, so it is suited to applications that need to use speech recognition results as they are produced, such as:
- Convert call center conversations into text in real time
- Real-time transcription of meeting remarks
- Voice control of smartphones and IoT devices
- Speech dialogue system
With the WebSocket API, you can obtain interim recognition results while the user is still speaking, and the final result immediately after the end of an utterance is detected. In exchange, you must send the audio data as a stream, which takes a bit more implementation effort: for example, handling the audio as binary data and controlling the recording device as needed.
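To make "sending the audio data as a stream" more concrete, here is a minimal Python sketch of the message sequence a client might send for one session: a text command to start, a series of binary audio frames, and a text command to end. The command letters s, p and e, the LSB16K audio format, the -a-general engine name and the authorization= field reflect our reading of the AmiVoice WebSocket interface and are assumptions to confirm in the official I/F specification; stream_messages and the chunk size are hypothetical. The sketch only produces the messages and does not open a connection, so it runs without any WebSocket library.

```python
from typing import Iterator, Union

# Assumed command format; verify against the AmiVoice WebSocket I/F specification.
START_TEMPLATE = "s {audio_format} {engine} authorization={appkey}"
END_COMMAND = "e"


def stream_messages(
    audio: bytes,
    appkey: str,
    audio_format: str = "LSB16K",  # assumed: 16 kHz, 16-bit little-endian PCM
    engine: str = "-a-general",    # assumed general-purpose engine name
    chunk_size: int = 3200,        # hypothetical: 100 ms of 16 kHz 16-bit mono audio
) -> Iterator[Union[str, bytes]]:
    """Yield, in order, the messages a client would send for one recognition session."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # 1. Text frame that starts the recognition session.
    yield START_TEMPLATE.format(audio_format=audio_format, engine=engine, appkey=appkey)
    # 2. Binary frames: a b"p" prefix followed by one slice of audio (assumed framing).
    for offset in range(0, len(audio), chunk_size):
        yield b"p" + audio[offset:offset + chunk_size]
    # 3. Text frame that ends the session so the server can finalize its results.
    yield END_COMMAND
```

In a real client, each item would be passed to your WebSocket library's send call as audio arrives from the recording device, while recognition results are read from the same connection in parallel.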
(Renamed) Synchronous HTTP
The Synchronous HTTP API converts audio files into text. It works very simply: you send an audio file, the server runs speech recognition on it, and the result is returned once processing is complete. It is suitable for transcribing short audio files, for example:
- Convert short audio files such as voice memos and voicemails into text
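- Proofs of concept (PoCs) of systems that use speech recognition, and evaluation of speech recognition accuracy
To compare it with the Asynchronous HTTP API explained next, here is the Synchronous HTTP processing sequence in brief: the application sends the audio file in a single request and receives the recognition result in the response to that same request.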

In this case, the application has to wait from the time it sends the audio file until speech recognition is complete. The session must also stay connected the whole time, and if it is disconnected midway you have to start over. For this reason, we set an upper limit of 16 MB on the size of audio files that can be sent.
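The sketch below shows, using only the Python standard library, what a single synchronous request might look like, including a client-side check against the 16 MB limit. The endpoint URL and the form field names u (APPKEY), d (parameters) and a (audio) are assumptions based on our understanding of the Synchronous HTTP I/F specification, and reading "16 MB" as 16 MiB is also an assumption; build_sync_request and _text_part are hypothetical helpers. The request is built but not sent.

```python
import urllib.request
import uuid

# Assumptions to verify against the official Synchronous HTTP I/F specification.
SYNC_ENDPOINT = "https://acp-api.amivoice.com/v1/recognize"
MAX_SYNC_BYTES = 16 * 1024 * 1024  # "16 MB" read as 16 MiB (assumption)


def _text_part(boundary: str, name: str, value: str) -> bytes:
    """Encode one plain-text multipart/form-data field."""
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
    ).encode()


def build_sync_request(audio: bytes, appkey: str, engine: str = "-a-general") -> urllib.request.Request:
    """Build, but do not send, a multipart POST for synchronous recognition."""
    if len(audio) > MAX_SYNC_BYTES:
        raise ValueError("audio exceeds the synchronous HTTP limit; use asynchronous HTTP instead")
    boundary = uuid.uuid4().hex
    body = (
        _text_part(boundary, "u", appkey)  # u: APPKEY (assumed field name)
        + _text_part(boundary, "d", f"grammarFileNames={engine}")  # d: parameters (assumed)
        # a: the audio file itself, sent as the last field (assumed)
        + (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="a"; filename="audio.wav"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio
        + f"\r\n--{boundary}--\r\n".encode()
    )
    return urllib.request.Request(
        SYNC_ENDPOINT,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it; that call blocks until recognition is complete, which is exactly the waiting period described above.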
(NEW!) Asynchronous HTTP
Like Synchronous HTTP, the Asynchronous HTTP API converts audio files into text, but the processing flow is slightly different. It is suited to transcribing long audio files such as those below, or a large number of audio files:
- Converting call center call recordings into text
- Converting meeting recordings into text
- Converting the audio of video files into text to create subtitles
The Asynchronous HTTP sequence works as follows.

When you send an audio file, a value called sessionid is returned. You can then use this sessionid to check the status of the speech recognition process and obtain the speech recognition results.
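Checking the status with a sessionid amounts to a polling loop. The sketch below assumes each status response carries a status field that ends as completed or error, with any other value (for example queued or processing) meaning "keep waiting"; those status names and the response shape are assumptions to verify in the Asynchronous HTTP I/F specification. fetch_status is a hypothetical callable you would implement as one HTTP GET on the job's status URL; injecting it, together with sleep, lets the loop run without network access.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal job states; check the Asynchronous HTTP I/F specification.
DONE, FAILED = "completed", "error"


def wait_for_result(
    sessionid: str,
    fetch_status: Callable[[str], Dict[str, Any]],  # hypothetical: one status request
    interval_s: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the job identified by sessionid until it finishes, then return the final response."""
    for _ in range(max_polls):
        job = fetch_status(sessionid)
        status = job.get("status")
        if status == DONE:
            return job  # the final response is assumed to carry the recognition result
        if status == FAILED:
            raise RuntimeError(f"recognition failed for session {sessionid}: {job}")
        sleep(interval_s)  # not finished yet: wait, then ask again
    raise TimeoutError(f"session {sessionid} did not finish after {max_polls} polls")
```

Because nothing is held open between polls, the client can even stop and resume later as long as it keeps the sessionid; choose interval_s with the length of the audio in mind.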
With Asynchronous HTTP, each API call returns a response immediately. There is therefore no need to keep a session open, and audio files larger than 16 MB can also be recognized.
Synchronous HTTP or Asynchronous HTTP: which should you use?
When recognizing an audio file, should you choose synchronous or asynchronous recognition? Use the following as a guide.
- Synchronous HTTP
It is relatively easy to implement, so it is suitable for small audio files or for trying things out. Note, however, that it does not support files larger than 16 MB.
- Asynchronous HTTP
It takes some effort to implement, but it supports large audio files, which was not possible before (*1), making it ideal if you want to transcribe long calls or meetings. Of course, it can also handle small audio files.
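The guidance above can be condensed into a small decision helper. The 16 MB threshold is from this article; treating it as 16 MiB, the long_recording flag and the name choose_file_api are illustrative assumptions. Only the two file-based APIs are compared here, since real-time use cases belong to the WebSocket API.

```python
# "16 MB" from the article, read as 16 MiB (assumption).
SYNC_LIMIT_BYTES = 16 * 1024 * 1024


def choose_file_api(file_size_bytes: int, long_recording: bool = False) -> str:
    """Suggest a file-based API for one audio file, following the guidance above."""
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    if file_size_bytes > SYNC_LIMIT_BYTES:
        return "asynchronous"  # synchronous HTTP cannot accept files over the limit
    if long_recording:
        return "asynchronous"  # long calls or meetings: avoid waiting on one open request
    return "synchronous"  # small files and trials: the simplest to implement
```

In practice, file_size_bytes would typically come from os.path.getsize on the audio file.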
Details
The documentation needed for actual development is listed below; please refer to it for details.
I/F Specifications Asynchronous HTTP Speech Recognition API Overview – AmiVoice Cloud Platform
In closing
In this article, I have explained the three current APIs, including the Asynchronous HTTP API added in October 2021. We will continue to provide APIs that are easier to use for each kind of application. If you have any opinions or requests, please let us know in the comments.
About the author

Shogo Ando
While researching speech recognition, I came across a speech recognition company nearby, joined it, and have worked there ever since.
My hobbies are traveling abroad, eating delicious food, and saunas.
*1: Strictly speaking, large audio files could already be processed by streaming them to the WebSocket Speech Recognition API. However, this can place an unexpectedly heavy load on the speech recognition servers, so we recommend using Asynchronous HTTP instead.
