TV Asahi Create Corporation
We have developed a product that enables accurate and fast subtitling of live broadcasts without the need for specialized skills.
From left: Hiroyuki Shimonagayoshi, Takeshi Yokoyama, Joosuk Lim
Creating subtitle text for live broadcast programs has traditionally required personnel with specialized skills, such as high-speed text input or highly accurate respeaking. TV Asahi Create combined its long-cultivated know-how in live subtitling with AI to develop "J-TAC Pro," a product that does not demand such high-level specialized skills yet still provides viewers with accurate, easy-to-read subtitles. We spoke with Hiroyuki Shimonagayoshi of the Real-Time Subtitling Department and the Development and Operations Department of the Subtitle Production Division, Takeshi Yokoyama of the Development and Operations Department of the Subtitle Production Division, and Joosuk Lim of the CG Systems Department of the CG Production Division about the background to the development and the effects of adopting AmiVoice.
Issues/background
Securing personnel with specialized skills is a challenge when producing subtitles for live broadcasts
There are three main methods for producing real-time subtitles:
- A person transcribes the broadcast audio and types it in (high-speed input method)
- A person respeaks the broadcast audio, which is converted into text by speech recognition
- The broadcast audio is converted directly into text by speech recognition
Until now, TV Asahi Create has relied on the high-speed input method, in which humans transcribe and type the broadcast audio. However, the workload on the high-speed typists who can handle this has been increasing year by year, and securing personnel with such specialized skills has also become an issue. To solve these problems, the company established a new production method, converting the broadcast audio directly into text via speech recognition and correcting any misrecognitions, and decided to develop J-TAC Pro in-house to realize it.
In 2012, we came across AmiVoice's cutting-edge speech recognition technology, which led us to release "J-TAC," a support system that partially automates packaged subtitling, in 2013. We have continued developing various systems since then; some reached practical use, while others were abandoned because we could not overcome operational issues, so the path to this point has been one of repeated trial and error.
We adopted AmiVoice because the dramatic evolution of speech recognition technology in recent years, together with the launch of the "AmiVoice API" service, which makes the speech recognition engine easy to use, showed us the potential to overcome the problems we had faced until then.
Introduction Results
Successfully drastically reduced the time lag between speech and subtitles
With the conventional high-speed input method, adding real-time subtitles involves transcribing the speech by hand, which inevitably introduces a delay between the utterance and the appearance of the subtitle.
Reducing this delay was also a challenge when developing a live subtitling system based on AI speech recognition. The original specification for "J-TAC Pro" output speech recognition results one utterance at a time, so if a speaker talked continuously without pausing, the results took a long time to appear. Since proofreading could only begin after that, the delay between the utterance and the subtitle was even larger than with the conventional method.
We spent a long time searching, through repeated trial and error, for a way to solve this delay problem. After discussing it with your engineers, we discovered a way to obtain intermediate confirmation information on the recognition results sent from the speech recognition engine. By adopting this method, we significantly reduced the time from utterance to the start of proofreading. We were able to keep the delay from speech to subtitle the same as with the conventional method, or even shorter, and the development of "J-TAC Pro" made great strides.
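The idea of starting proofreading from intermediate confirmations, rather than waiting for the end of an utterance, can be sketched as follows. This is a minimal illustration only: the event names and fields are hypothetical and do not represent the actual AmiVoice API.

```python
# Hypothetical sketch: consuming a streaming recognition feed that emits
# intermediate ("partial") confirmations before the final result, so that
# proofreading can begin before the utterance ends. Event kinds and fields
# are illustrative assumptions, not the real AmiVoice API.

from dataclasses import dataclass

@dataclass
class RecognitionEvent:
    kind: str   # "partial" (intermediate confirmation) or "final"
    text: str

def feed_to_proofreader(events):
    """Yield text chunks to the proofreading terminal as they stabilize.

    With utterance-level output, nothing is available until the "final"
    event; with intermediate confirmations, each confirmed segment can be
    handed to a proofreader immediately.
    """
    confirmed = []
    for ev in events:
        if ev.kind == "partial":
            confirmed.append(ev.text)   # stable so far; safe to proofread
            yield ev.text
        elif ev.kind == "final":
            already = "".join(confirmed)
            if ev.text.startswith(already):
                remainder = ev.text[len(already):]
                if remainder:
                    yield remainder     # only the not-yet-seen tail
            else:
                yield ev.text           # engine revised; resend in full

# Simulated engine output for one long, unpaused utterance
stream = [
    RecognitionEvent("partial", "The Nikkei average rose "),
    RecognitionEvent("partial", "320 points today, "),
    RecognitionEvent("final",
                     "The Nikkei average rose 320 points today, closing at 39,000."),
]

for chunk in feed_to_proofreader(stream):
    print(chunk)
```

The point of the sketch is that the first two chunks reach the proofreader while the speaker is still talking, instead of everything arriving at once when the utterance finally ends.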

Highly rated by users, with comments such as "The voice recognition of numbers is almost never wrong, it's amazing!"
Specifically, we have received the following responses from user companies:
- The accuracy of speech recognition for our target programs (news, etc.) is higher than that of other companies' products.
- Even when proper nouns are not known in advance, as in news reports, words can easily be registered in the speech recognition engine.
- Unlike other companies' products, the system can obtain confirmation information during speech recognition, allowing proofreading to start early and reducing the delay between speech and subtitle display.
- There is only a short time lag before proper nouns from high-profile news are reflected in the speech recognition engine.
- We use it in programs that handle a lot of numerical information, and the recognition results for numbers are almost never wrong. I thought that was amazing!
We have the same impression regarding the recognition accuracy and functionality.
We also find it helpful that the AmiVoice API has a function that allows us to check each user's usage status, and we feel reassured by the smooth exchange of technical inquiries and responses regarding bugs and requests.

Voice recognition device screen

Proofreading terminal screen
Future prospects
To contribute to labor savings at many TV broadcasting stations
Starting in April 2024, Shizuoka Asahi Television began adding real-time subtitles to live broadcast programs using "J-TAC Pro." Then, in January 2025, TV Tokyo also began adding real-time subtitles to some of its live broadcast programs using "J-TAC Pro." We will continue to improve "J-TAC Pro" to make it an even more convenient system so that it can contribute to labor savings at other television stations. Furthermore, we will actively consider developing new services and adding new functions so that the system can be used not only for television broadcasts but also for various video streaming services in the future.
Service Overview

The AI live subtitling system "J-TAC Pro" is a groundbreaking product that makes it easier, more efficient and faster to create subtitle text for live television broadcasts.
AI speech recognition converts broadcast audio into text in real time with high accuracy, and the AI automatic line break function even automatically inserts line breaks to make subtitles easier for viewers to read. This means that humans only need to correct any errors made by the AI speech recognition to complete the raw subtitle text.
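The goal of the automatic line-break function, wrapping recognized text into short lines that are easy to read on screen, can be illustrated with a simple rule-based sketch. Note that this character-count heuristic is purely illustrative; the actual J-TAC Pro function is AI-based, and the 16-character limit is an assumed example value.

```python
# Illustrative only: a rule-based line breaker for subtitle text. The real
# J-TAC Pro feature uses an AI model; this sketch just demonstrates the
# goal, i.e. wrapping recognized text into short, readable lines.

def break_subtitle(text: str, max_len: int = 16) -> list[str]:
    """Split text into lines of at most max_len characters,
    preferring to break after spaces or punctuation."""
    lines = []
    while len(text) > max_len:
        if text[max_len] == " ":
            cut = max_len                # wrap point falls exactly on a space
        else:
            # last natural break point within the line limit
            cut = max(text.rfind(c, 0, max_len) for c in " ,.;")
            if cut <= 0:
                cut = max_len - 1        # no natural break; hard-wrap
        lines.append(text[:cut + 1].rstrip())
        text = text[cut + 1:].lstrip()
    if text:
        lines.append(text)
    return lines

print(break_subtitle("Heavy rain is expected across the Kanto region tonight."))
```

A real broadcast implementation would also need to consider Japanese text (which has no spaces), phrase boundaries, and the two-line display limits typical of broadcast captions, which is where the AI model earns its keep.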

| Company name | TV Asahi Create Corporation |
|---|---|
| Business Content | Art production for TV programs, planning and production of events, subtitling, etc. |
| URL | http://www.tv-asahi-create.co.jp |