AmiVoice API

AmiVoice API allows you to integrate high-quality speech recognition functionality into your services and products.

What is AmiVoice API?

AmiVoice API is a speech-to-text API. Send audio to the API endpoint and you receive a transcript of the speech. You can use it to build speech-enabled applications such as meeting transcription tools and voice dialogue systems.
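As a rough illustration of "send audio, receive text", here is a minimal Python sketch of a synchronous HTTP request. Everything specific in it is an assumption to check against the official API reference: the endpoint URL, the multipart field names `u` (app key), `d` (engine selection), and `a` (audio data, sent last), the engine name `-a-general`, and the `text` field in the JSON response. The helper names `build_recognize_request` and `transcribe` are hypothetical.

```python
import json
import uuid
import urllib.request

# Assumed endpoint; confirm it in the official API reference before use.
RECOGNIZE_URL = "https://acp-api.amivoice.com/v1/recognize"


def build_recognize_request(app_key, audio, engine="-a-general", url=RECOGNIZE_URL):
    """Build (but do not send) a multipart/form-data POST request.

    The field names u, d, and a are assumptions, as is the engine name.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Text fields first: the app key and the engine selection.
    for name, value in (("u", app_key), ("d", f"grammarFileNames={engine}")):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio payload goes last.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="a"; filename="audio"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=b"".join(parts), headers=headers, method="POST")


def transcribe(app_key, audio, engine="-a-general"):
    """Send the audio and return the transcript (assumes a JSON 'text' field)."""
    request = build_recognize_request(app_key, audio, engine)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result.get("text", "")
```

To try it, read an audio file as bytes and pass it in together with your own app key, for example `transcribe("YOUR_APP_KEY", open("meeting.wav", "rb").read())`. Building the request separately from sending it makes the request format easy to inspect without a network call.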

Features of AmiVoice API

  • Choose the engine that best suits your industry and application

    In addition to the "general-purpose engine" that can be used in a variety of situations and businesses, we have engines that specialize in technical terms and industry terminology such as those in the medical field. By selecting the engine that best suits your usage scenario, the recognition rate can be significantly improved.

  • Free for up to 60 minutes per month. From 99 yen per hour

    There are no initial costs and it is available on a pay-as-you-go basis. You can use high-quality speech recognition engines at low prices starting from 79.2 yen per hour. Each engine can be used free of charge for 60 minutes per month.

  • Always updated via auto-training

    The Conversation_General-purpose engine is constantly updated to the latest version by the Auto-Training System (ATS). It can also recognize new words that have only recently appeared.

  • Free technical support

    We handle everything in-house, from speech recognition engine development to service provision. Our technical staff provide free, direct support for technical inquiries, whether you are introducing the API or troubleshooting a specific API issue after launch.

Real-time or batch

  • Real-time text conversion (WebSocket/Synchronous HTTP)

    Receive streamed speech recognition results in real time as the audio is processed.

    Usage scenario examples

    • Transcription of contact center calls
    • Transcription of meeting remarks
    • Smartphone applications and embedded devices
    • Voice control of IoT devices
    • Voice responses for automated telephone response systems

  • Batch text conversion (Asynchronous HTTP)

    Convert large audio files to text in a batch.

    Usage scenario examples

    • Contact center call recording transcription
    • Transcription of meeting recordings
    • Transcription of accumulated video archives

  • Recognize proper nouns and in-house terminology through word registration

    You can register words in the user dictionary, such as product names and proper nouns, depending on the usage scenario. This improves the recognition accuracy of specific words and phrases, such as company terminology and names.

  • Automatically add punctuation and question marks

    AmiVoice API automatically adds punctuation and question marks to the recognition results, making the transcribed text more accurate and easier to read.

  • Automatically remove fillers

    Fillers such as "ah," "um," and "hmm" are removed automatically (you can also configure the API to keep them). This reduces the effort required to correct recognition results.

  • Speaker diarization function

    For audio containing multiple speakers, this function identifies who spoke and when. It labels speakers without prior training.

  • Blocking of inappropriate terms

    Inappropriate terms (sexual language, discriminatory language, etc.) are not converted into text, so you can use the API with confidence in formal contexts such as business settings.

  • Display pronunciation

    Recognition results include a pronunciation estimated from the speech. This can be used not only for display in apps for children, but also for data matching in downstream systems such as CRM and product databases.

Optional Features (free for up to 60 minutes per month)

Sentiment analysis

The system analyzes the speaker's emotions from the voice and outputs 20 parameters, such as joy, anger, stress, dissatisfaction, and expectation, approximately once every two seconds. This enables more detailed voice analysis; for example, you can see the emotional ups and downs behind a simple "thank you." Sentiment analysis results can be used in a wide range of fields, including contact centers, marketing, human resources, and medicine.

The feature uses ESAS technology, which ES Japan developed on top of the latest sentiment analysis engine from Israel's Nemesysco through verification and research on 3 million audio samples in Japan.

Sentiment analysis is available with the asynchronous HTTP speech recognition API and is free for up to 60 minutes per month, so please feel free to try it out.

