AmiVoice API
Price

Usage Fees

  • 60 minutes free per month on all engines

    All engines and the sentiment analysis option include 60 minutes of free use per month, so you can evaluate the API and develop with it at no cost.

  • Charges are per second and only for the spoken part

    Billing is in one-second increments. Also, unlike other companies' APIs, only the "speech segments" detected in the audio data are billed; silent segments are not charged.
    *Background music (BGM), TV audio, nearby voices, and some other noises may be detected as speech segments.

Price list (tax included)
Category         Interface                                     No logging                             Logging
General-purpose  WebSocket/Synchronous HTTP                    0.04125 yen/second (148.5 yen/hour)    0.0275 yen/second (99 yen/hour)
General-purpose  Asynchronous HTTP                             0.0275 yen/second (99 yen/hour)        0.022 yen/second (79.2 yen/hour)
Medical          WebSocket/Synchronous HTTP/Asynchronous HTTP  0.061875 yen/second (222.75 yen/hour)  0.04125 yen/second (148.5 yen/hour)
Finance          WebSocket/Synchronous HTTP/Asynchronous HTTP  0.04125 yen/second (148.5 yen/hour)    0.033 yen/second (118.8 yen/hour)
Insurance        WebSocket/Synchronous HTTP/Asynchronous HTTP  0.04125 yen/second (148.5 yen/hour)    0.033 yen/second (118.8 yen/hour)
Multilingual     WebSocket/Synchronous HTTP/Asynchronous HTTP  0.04125 yen/second (148.5 yen/hour)    0.033 yen/second (118.8 yen/hour)

For the differences between the WebSocket, Synchronous HTTP, and Asynchronous HTTP interfaces, see "Overview and Usage Scenarios".
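For rough cost estimates, the price list can be held as a lookup table. The following is a minimal sketch in Python; the table layout, the `ALL_INTERFACES` label, and the `hourly_price` helper are hypothetical, while the yen figures are copied from the price list above. It only shows that each per-hour price is the per-second price multiplied by 3,600, and it ignores the monthly free allowance and the fee-calculation rules.

```python
from decimal import Decimal

# Hypothetical label for rows that apply to every interface.
ALL_INTERFACES = "WebSocket/Synchronous HTTP/Asynchronous HTTP"

# Tax-included unit prices in yen per second, copied from the price list.
# Key: (category, interface, logging enabled).
PRICES_TAX_INCLUDED = {
    ("General-purpose", "WebSocket/Synchronous HTTP", False): Decimal("0.04125"),
    ("General-purpose", "WebSocket/Synchronous HTTP", True): Decimal("0.0275"),
    ("General-purpose", "Asynchronous HTTP", False): Decimal("0.0275"),
    ("General-purpose", "Asynchronous HTTP", True): Decimal("0.022"),
    ("Medical", ALL_INTERFACES, False): Decimal("0.061875"),
    ("Medical", ALL_INTERFACES, True): Decimal("0.04125"),
    ("Finance", ALL_INTERFACES, False): Decimal("0.04125"),
    ("Finance", ALL_INTERFACES, True): Decimal("0.033"),
    ("Insurance", ALL_INTERFACES, False): Decimal("0.04125"),
    ("Insurance", ALL_INTERFACES, True): Decimal("0.033"),
    ("Multilingual", ALL_INTERFACES, False): Decimal("0.04125"),
    ("Multilingual", ALL_INTERFACES, True): Decimal("0.033"),
}

SECONDS_PER_HOUR = 3600


def hourly_price(category: str, interface: str, logging: bool) -> Decimal:
    """Return the tax-included price for one hour of billed speech."""
    return PRICES_TAX_INCLUDED[(category, interface, logging)] * SECONDS_PER_HOUR
```

Decimal is used instead of float so that values such as 0.061875 × 3,600 come out exactly; for example, `hourly_price("Insurance", ALL_INTERFACES, True)` equals 118.8, matching the table.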

About categories

  • "General-purpose" refers to the following speech recognition engines: "Conversation_General-purpose", "Voice Input_General-purpose", "English_General-purpose", "Chinese_General-purpose", "Korean_General-purpose", "Japanese E2E_General-purpose", "Japanese E2E_General-purpose Batch", "Chinese E2E_General-purpose", and "Chinese E2E_General-purpose Batch".
  • "Medical" refers to the speech recognition engines for "Conversation_Medical" and "Voice Input_Medical."
  • "Finance" refers to the speech recognition engines for "Conversation_Finance" and "Voice Input_Finance."
  • "Insurance" refers to the speech recognition engines for "Conversation_Insurance" and "Voice Input_Insurance."
  • "Multilingual" refers to the "Multilingual E2E_General" and "Multilingual E2E_General Batch" speech recognition engines.
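When estimating costs, it can help to map each engine to its pricing category. The sketch below builds that mapping from the engine names listed above. These are the display names used on this page, and they may not match the identifiers the API itself expects; `ENGINES_BY_CATEGORY`, `CATEGORY_OF_ENGINE`, and `pricing_category` are hypothetical names.

```python
# Engine display names from this page, grouped by pricing category.
# Assumption: these are display names, not necessarily API engine identifiers.
ENGINES_BY_CATEGORY = {
    "General-purpose": [
        "Conversation_General-purpose",
        "Voice Input_General-purpose",
        "English_General-purpose",
        "Chinese_General-purpose",
        "Korean_General-purpose",
        "Japanese E2E_General-purpose",
        "Japanese E2E_General-purpose Batch",
        "Chinese E2E_General-purpose",
        "Chinese E2E_General-purpose Batch",
    ],
    "Medical": ["Conversation_Medical", "Voice Input_Medical"],
    "Finance": ["Conversation_Finance", "Voice Input_Finance"],
    "Insurance": ["Conversation_Insurance", "Voice Input_Insurance"],
    "Multilingual": ["Multilingual E2E_General", "Multilingual E2E_General Batch"],
}

# Reverse index: engine display name -> pricing category.
CATEGORY_OF_ENGINE = {
    engine: category
    for category, engines in ENGINES_BY_CATEGORY.items()
    for engine in engines
}


def pricing_category(engine_name: str) -> str:
    """Return the pricing category of an engine; raises KeyError if unknown."""
    return CATEGORY_OF_ENGINE[engine_name]
```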

*For details on each speech recognition engine, see "Speech Recognition Engine".

About logging function

  • If you are willing to provide data for research and development and for improving the quality of our products and services, you can receive a discount by selecting "Logging".
  • If you cannot provide data because of security requirements or other reasons, select "No logging".

*For details on the logging function, see "Logging".

Usage Amount

  • Usage is the length of the portions of audio data sent to the API endpoint that the AI identifies as speech segments; this is the amount subject to billing. Silent segments are not included in usage.
  • Usage for each engine is calculated in milliseconds. When the monthly usage fee is calculated, any fraction of a second is rounded down.
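The usage rule above (sum speech time in milliseconds, then drop the fractional second for the month) can be sketched as follows. The input format, a list of hypothetical `(start_ms, end_ms)` pairs for the detected speech segments, is an assumption for illustration only; in practice the service itself decides which parts of the audio count as speech.

```python
from typing import Iterable, Tuple

# Hypothetical representation of one detected speech segment: (start_ms, end_ms).
Segment = Tuple[int, int]


def monthly_billable_seconds(segments: Iterable[Segment]) -> int:
    """Sum speech-segment durations in milliseconds, then round down to whole seconds."""
    total_ms = 0
    for start_ms, end_ms in segments:
        if end_ms < start_ms:
            raise ValueError(f"segment ends before it starts: {(start_ms, end_ms)}")
        # Only the speech segments are summed; silence between them is never counted.
        total_ms += end_ms - start_ms
    return total_ms // 1000
```

For example, segments of 1,500 ms and 2,250 ms add up to 3,750 ms, which counts as 3 billable seconds.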

About fee calculation

  • The usage fee for each engine is calculated by subtracting the free allowance (3,600 seconds) from that engine's usage and multiplying the result by the engine's unit price (excluding tax).
  • The usage fee for each engine is calculated in units of 1 yen. Amounts of less than 1 yen are rounded (up or down) to the nearest yen.
  • The usage fee for the current month (excluding tax) is the total of the usage fees for all engines. If this total (excluding tax) is less than 50 yen, no charge is made.
  • Consumption tax is calculated by multiplying the current month's usage fee (excluding tax) by the consumption tax rate. Any fraction of less than 1 yen is rounded down.
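The fee-calculation steps above can be sketched in Python. This is an illustration under stated assumptions, not the official billing logic: per-engine amounts are rounded to the nearest yen with halves rounded up (one reading of "rounded up or down"), the consumption tax rate defaults to an assumed 10%, and the caller supplies each engine's unit price excluding tax. The names `engine_fee` and `monthly_bill` are hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_UP
from typing import Dict, Tuple

FREE_SECONDS_PER_ENGINE = 3600   # 60 free minutes per engine per month
MINIMUM_CHARGE_YEN = 50          # monthly totals (excluding tax) below this are not charged


def engine_fee(usage_seconds: int, unit_price_excl_tax: Decimal) -> int:
    """Fee for one engine in whole yen, excluding tax."""
    billable_seconds = max(0, usage_seconds - FREE_SECONDS_PER_ENGINE)
    fee = Decimal(billable_seconds) * unit_price_excl_tax
    # Assumption: "rounded up or down" is read as rounding to the nearest yen, halves up.
    return int(fee.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def monthly_bill(
    usage: Dict[str, Tuple[int, Decimal]],
    tax_rate: Decimal = Decimal("0.10"),  # assumed consumption tax rate
) -> Tuple[int, int]:
    """Return (usage fee excluding tax, consumption tax) for the month, in yen.

    `usage` maps an engine name to (billed seconds, unit price excluding tax).
    """
    subtotal = sum(engine_fee(seconds, price) for seconds, price in usage.values())
    if subtotal < MINIMUM_CHARGE_YEN:
        return 0, 0
    # Consumption tax: any fraction of less than 1 yen is rounded down.
    tax = int((Decimal(subtotal) * tax_rate).to_integral_value(rounding=ROUND_FLOOR))
    return subtotal, tax
```

As a worked example, assume tax-excluded unit prices of 0.0375 yen/second for a General-purpose engine and 0.05625 yen/second for a Medical engine (the tax-included list prices divided by 1.1). With 7,200 seconds on the first and 4,000 seconds on the second, `engine_fee` gives 135 yen and 23 yen (22.5 rounded up), so `monthly_bill` returns `(158, 15)`: 158 yen excluding tax plus 15 yen of tax (15.8 rounded down).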

Types of speech recognition engines

You can choose from general-purpose speech recognition engines that can be used in a wide range of situations, industry-specific engines that recognize domain terminology (such as medical terms) with high accuracy, and a multilingual engine that supports multiple languages. There are also "Conversation" engines, suited to transcribing conversations and phone calls, and "Voice Input" engines, optimized for voice input into electronic medical records and for operating smart devices.

Details

  • General-purpose (generation: End to End; languages: Japanese and Chinese): A versatile engine that can be used in a wide range of situations and businesses. This is the latest engine.
  • General-purpose (generation: Hybrid; languages: Japanese, English, Chinese, and Korean): A general-purpose engine that can be used in a wide range of situations and businesses. "Word Registration" improves recognition accuracy for specific words and phrases such as product names and proper nouns.
  • Medical (generation: Hybrid; language: Japanese): Recognizes speech that includes medical terminology, such as symptoms, disease names, drug names, and medical device names, with high accuracy.
  • Finance (generation: Hybrid; language: Japanese): Recognizes speech that includes technical terms related to financial products and markets, such as stocks, bonds, and foreign exchange, with high accuracy.
  • Insurance (generation: Hybrid; language: Japanese): Recognizes speech that includes technical terms related to insurance contracts, product details, and various special provisions, with high accuracy.
  • Multilingual (generation: End to End; languages: multilingual): Even when multiple languages are mixed within the speech, each part is transcribed in its own language. Supported languages are Japanese, English, and Chinese.

*"End to End" is the latest generation of speech recognition engine. "Hybrid" is a highly specialized engine that supports not only general conversation and voice input but also phrases and vocabulary from a variety of domains.

Option

If you use optional features, the following fees are charged in addition to the usage fees above.

Sentiment analysis

This enables more detailed voice analysis, such as capturing the emotional context of speech. It is available with the Asynchronous HTTP speech recognition API.

Sentiment analysis (tax included): 0.0440 yen/second (158.4 yen/hour)
