[We Tested It!] How does speech recognition accuracy change with sampling rate and compression rate? 

Published: 2025.07.29 Last updated: 2025.07.29

"Won't the accuracy of speech recognition decrease unless the data is of high quality?"
"I want to reduce the file size, but how much compression is okay?"

Have you ever had any of these questions when using a speech recognition service?
Given limits on storage capacity and communication bandwidth, we want to keep data size small without sacrificing accuracy. To answer these questions, we tested how sampling rate and compression rate affect speech recognition accuracy with the AmiVoice API.

If you want to know the optimal format for audio data, or if you're struggling to balance data size and recognition accuracy, we hope you'll find this article useful.

Verification flow

This time, we will verify from the following two perspectives. 

  1. Differences in speech recognition accuracy due to sampling rate
  2. Differences in speech recognition accuracy due to compression ratio 

STEP 1: Prepare validation data

We created the audio with diversity in mind, to avoid a dataset biased toward specific speakers or content.
The data used for the verification is as follows:

  • A total of 10 patterns of speech content, including daily reports, news, meetings, readings, and instructions for voice input
  • The speakers were two men and three women, ranging in age from their 20s to their 40s.
  • To eliminate the influence of noise, all audio data was recorded in a quiet environment.

| No. | Genre | Overview | Character count | Audio length |
|---|---|---|---|---|
| 1 | Daily Report | Lunch menu details | 165 characters | 27 seconds |
| 2 | News | Weather news | 159 characters | 29 seconds |
| 3 | Command | Voice command | 168 characters | 53 seconds |
| 4 | Company Profile | Company profile of Advanced Media, Inc. | 416 characters | 1 minute 11 seconds |
| 5 | Parliament | City council recordings | 477 characters | 1 minute 19 seconds |
| 6 | Company Profile | Outlook of Advanced Media, Inc. | 563 characters | 1 minute 46 seconds |
| 7 | Literature | Story reading | 479 characters | 1 minute 47 seconds |
| 8 | News | Weather news | 512 characters | 1 minute 50 seconds |
| 9 | Parliament | Recording data of the Tokyo Metropolitan Assembly | 762 characters | 2 minutes 19 seconds |
| 10 | Literature | Prose reading | 827 characters | 2 minutes 45 seconds |

STEP 2: Speech recognition using AmiVoice API

We send the audio data, prepared at various sampling rates and compression rates, to our AmiVoice API with curl commands to perform speech recognition.

Replace {APP_KEY} with the APPKEY required to use the AmiVoice API. The API returns the speech recognition results in JSON format, so we format them with the jq command and write them to the "out" file.
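
As a rough sketch, the request described above could be assembled from Python so that it can be repeated over many audio files. The endpoint URL, the form-field names (`u`, `d`, `a`), the `-a-general` engine name, and the helper `build_recognize_command` are assumptions that do not come from this article; please check them against the AmiVoice API documentation before use.

```python
import shlex

# Assumption: the endpoint and form-field names below are not given in this
# article and must be verified against the AmiVoice API documentation.
ENDPOINT = "https://acp-api.amivoice.com/v1/recognize"


def build_recognize_command(app_key, audio_path, engine="-a-general", out_path="out"):
    """Return a shell pipeline: curl posts the audio, jq formats the JSON reply."""
    curl_args = [
        "curl", "-sS", ENDPOINT,
        "-F", f"u={app_key}",                  # APPKEY (assumed field name)
        "-F", f"d=grammarFileNames={engine}",  # engine selection (assumed)
        "-F", f"a=@{audio_path}",              # audio file, placed last
    ]
    # Quote each argument so keys or paths with special characters stay intact.
    curl_part = " ".join(shlex.quote(arg) for arg in curl_args)
    return f"{curl_part} | jq . > {shlex.quote(out_path)}"
```

Calling `build_recognize_command("{APP_KEY}", "sample_16k.wav")` returns a command string that can be pasted into a shell or passed to a runner of your choice.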

STEP 3: Calculating the results

We use a Python library called jiwer to compare the speech recognition results with the correct answer data for each data set and measure the accuracy. For more information on how to use "jiwer", please refer to this article.

For evaluation, the average speech recognition accuracy of 10 patterns of speech data is measured using the following code. 
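
A minimal sketch of such an evaluation is shown below. It is not the article's jiwer-based code: it uses a hand-written character-level edit distance so that it runs without extra packages, it assumes accuracy is defined as 1 minus the micro-averaged character error rate, and the sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters here)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def micro_average_accuracy(references, hypotheses):
    """1 - (total character errors / total reference characters)."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 1.0 - total_errors / total_chars


# Hypothetical reference and recognized texts, for illustration only.
refs = ["hello world", "good morning"]
hyps = ["hello word", "good morning"]
print(f"{micro_average_accuracy(refs, hyps):.1%}")
```

Summing errors and characters before dividing means that longer recordings weigh more than shorter ones, which is the difference from simply averaging per-file accuracies.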

Test result 1: Comparison by sampling rate

Sampling rate is one measure of sound quality: the number of audio samples captured per second. For example, a sampling rate of 8 kHz is used for telephones and 44.1 kHz for CDs. The higher the sampling rate, the higher the frequencies that can be represented, so a high sampling rate is important for music.

On the other hand, the frequency range of human speech is limited, so general speech recognition applications do not require such a high sampling rate.
The optimal sampling rate for the AmiVoice API is 16 kHz. When audio with a sampling rate higher than 16 kHz is input, it is internally downsampled to the equivalent of 16 kHz before speech recognition is performed. Therefore, there is almost no difference in accuracy between data with a sampling rate of 48 kHz and data with a sampling rate of 16 kHz.
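
The relationship between sampling rate and representable frequency mentioned above is the Nyquist limit: a signal sampled at a given rate can only carry frequencies up to half that rate. The short sketch below applies it to the rates discussed here; the helper name `nyquist_khz` is just an illustrative choice.

```python
def nyquist_khz(sample_rate_khz):
    """Highest frequency (kHz) representable at the given sampling rate."""
    return sample_rate_khz / 2


for name, rate in (("telephone", 8), ("16 kHz audio", 16), ("CD", 44.1), ("48 kHz audio", 48)):
    print(f"{name}: {rate} kHz sampling -> up to {nyquist_khz(rate)} kHz")
```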

Now, let's actually use data with sampling rates of 48 kHz and 16 kHz to verify whether there is any difference in speech recognition accuracy.

Targets for comparison

We compared audio data with sampling rates of 48kHz and 16kHz. The details of the audio data are as follows:

  • Format: Uncompressed Wave data
  • Bit depth: 16-bit
  • Channel: Mono

Comparison result

The bit rate (kbps) and speech recognition accuracy for each sampling rate are shown in the table below.

| | Sampling rate 48kHz | Sampling rate 16kHz |
|---|---|---|
| kbps | 768kbps | 256kbps |
| Average speech recognition accuracy (by micro-averaging) | 98.0% | 98.0% |

*kbps: Kilobits per second, the amount of audio data per second (bit rate)
*Micro-average method: A method of calculating the average error rate by adding up the character-level errors in all data sets and dividing by the total number of characters.
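
The kbps values in the table follow directly from the audio format listed earlier (16-bit, mono): for uncompressed PCM, bit rate is sampling rate times bit depth times channel count. A small sketch, with `pcm_bitrate_kbps` as an illustrative helper name:

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bit_depth * channels / 1000


for rate in (48_000, 16_000):
    print(rate, "Hz ->", pcm_bitrate_kbps(rate, bit_depth=16, channels=1), "kbps")
```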

With the AmiVoice API, we have confirmed that increasing the sampling rate above 16kHz does not affect the accuracy of speech recognition.

Please note that the AmiVoice API speech recognition engine is updated daily, and the amount of computation is adjusted slightly depending on the load on the speech recognition server, so sending the same audio data does not guarantee the same results. When conducting comparative testing, there may be slight variation in the speech recognition accuracy results.

Test result 2: Comparison by compression ratio

When designing an application, audio compression is extremely important for reducing communication data volume and storage capacity. However, excessive compression can degrade sound quality and affect recognition accuracy. So, how much compression can be applied while maintaining practical accuracy? We used the Opus codec to compare recognition accuracy while changing the compression rate (bit rate).

Targets for comparison

Using compressed audio data in Opus format, we compared six compression rates ranging from 6 kbps to 256 kbps.

Comparison result

The speech recognition accuracy for each bit rate is shown in the table below. The compression ratio relative to the original data is also listed.

| | 256kbps | 128kbps | 64kbps | 32kbps | 16kbps | 6kbps |
|---|---|---|---|---|---|---|
| Compression ratio | Approximately 1/1 | Approximately 1/2 | Approximately 1/4 | Approximately 1/8 | Approximately 1/16 | Approximately 1/43 |
| Average speech recognition accuracy (by micro-averaging) | 98.0% | 98.0% | 98.1% | 98.2% | 97.9% | 95.5% |
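
The compression ratios in the table can be reproduced by dividing a baseline bit rate by each Opus bit rate. The sketch below assumes the baseline is the 256 kbps of the 16 kHz, 16-bit mono WAV data, which is inferred from the "Approximately 1/1" entry rather than stated outright; `compression_denominator` is an illustrative helper name.

```python
def compression_denominator(opus_kbps, original_kbps=256):
    """Return N such that the compressed data is about 1/N of the original."""
    return round(original_kbps / opus_kbps)


for kbps in (256, 128, 64, 32, 16, 6):
    print(f"{kbps:>3} kbps -> approximately 1/{compression_denominator(kbps)}")
```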

Consideration

At 6 kbps, accuracy was lower than at other compression rates, but no significant difference was observed at 16 kbps or above. This shows that speech recognition can be performed with sufficient accuracy even with a certain level of compression. 

However, if the compression is so strong that the speech becomes difficult for the human ear to make out, it will affect recognition accuracy. We provide the following compression rate guidelines for each compression method.

  • Speex: quality 7 or higher 
  • Opus: Compression ratio of about 1:10 

Summary

This time, we used actual sample voice data to verify how speech recognition accuracy changes depending on the "sampling rate" and "compression rate."
With the AmiVoice API, when audio with a sampling rate higher than 16 kHz is input, it is downsampled to 16 kHz internally, so we were able to confirm that there is no difference in speech recognition accuracy even if audio data with an excessively high sampling rate is used.
We also found that a moderate level of compression does not affect speech recognition accuracy. However, strong compression can affect recognition accuracy, so care must be taken. If you are using compressed audio data and are not getting the recognition accuracy you expect, please check the compression rate of your audio data.

We also recommend that you try it out using your own company's voice data, referring to the verification procedures we have introduced.
AmiVoice API offers a free trial, so please try it out and use it to reduce data size and improve speech recognition accuracy.

Appendix

Below is a list of the file sizes of each audio file used in the verification.

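| Sample Audio Data No. | File size (kbytes) 48kHz | File size (kbytes) 16kHz |
|---|---|---|
| 1 | 2,634 | 878 |
| 2 | 2,814 | 938 |
| 3 | 5,098 | 1,699 |
| 4 | 6,855 | 1,699 |
| 5 | 7,630 | 2,543 |
| 6 | 10,232 | 3,411 |
| 7 | 10,324 | 3,441 |
| 8 | 10,609 | 3,536 |
| 9 | 13,399 | 4,466 |
| 10 | 15,914 | 5,305 |
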
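File size (kbytes) by Opus bit rate:

| Sample Audio Data No. | 256kbps | 128kbps | 64kbps | 32kbps | 16kbps | 6kbps |
|---|---|---|---|---|---|---|
| 1 | 885 | 445 | 224 | 114 | 56 | 22 |
| 2 | 946 | 475 | 239 | 124 | 61 | 25 |
| 3 | 1,713 | 860 | 433 | 208 | 102 | 38 |
| 4 | 2,303 | 1,156 | 581 | 299 | 153 | 58 |
| 5 | 2,563 | 1,287 | 647 | 338 | 167 | 68 |
| 6 | 3,436 | 1,725 | 867 | 425 | 208 | 81 |
| 7 | 3,467 | 1,741 | 875 | 430 | 213 | 79 |
| 8 | 3,563 | 1,789 | 899 | 455 | 222 | 87 |
| 9 | 4,500 | 2,259 | 1,136 | 585 | 295 | 113 |
| 10 | 5,344 | 2,683 | 1,348 | 667 | 325 | 119 |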
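
File sizes like these can be estimated in advance from bit rate and audio length. The sketch below assumes 1 kbyte = 1000 bytes and ignores file headers and container overhead, so it only gives a rough lower bound; `estimated_size_kbytes` is an illustrative helper name.

```python
def estimated_size_kbytes(bitrate_kbps, duration_s):
    """Approximate audio payload size in kbytes (1000 bytes), ignoring headers."""
    return bitrate_kbps * duration_s / 8  # 8 bits per byte


# Sample No. 1 is listed as 27 seconds long.
for label, kbps in (("48 kHz WAV", 768), ("16 kHz WAV", 256), ("Opus 6 kbps", 6)):
    print(label, "->", round(estimated_size_kbytes(kbps, 27)), "kbytes")
```

For sample No. 1 the estimates come out slightly below the listed sizes, which is plausible given that the listed audio lengths are rounded to whole seconds and the estimate leaves out any overhead.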