I tried to recognize microphone input using Python

Ichikawa-chan
Hello, this is Ichikawa-chan.
I work in development at a company called Advanced Media.
I created a Python speech recognition program for myself, so I'd like to share it with you.
The implementation is a bit rough, so please treat it as a reference only.
I run it in the following environment.
| OS | Ubuntu 18.04 |
| Architecture | AMD64 |
| GCC | 7.5.0 |
Procedure
1. Register with AmiVoice API
2. Download and run the program
How to run
1. Register for the AmiVoice API.
Once registered, copy your APPKEY.

2. Download and run the program
You can start speech recognition with the following commands. Replace XXX with the APPKEY you copied in step 1.
$ git clone https://github.com/r-ichikawa-amivoice/ami_speechrecognizer_py
$ bash run.sh a=XXX
After that, just say something into the microphone and the recognition results will appear.

Explanation
main.py supports the following parameters:
| a | Specify the APPKEY for the AmiVoice API site. |
| r | Specifies the input source. Specifying "mic" reads input from the microphone. Any other string is treated as an audio file name, and that file is recognized. |
| o | Specifies the log output destination. Specifying "console" outputs the log to the console. Specifying "date" outputs the log to a file named yyyy-MM-dd.txt. Any other string is treated as a file name, and the log is output to the file with that name. |
| l | Specifies the log level. Messages at or above the specified level are output. "0": DEBUG (debugging log); "1": INFO (normal log); "2": WARN (warning log); "3": ERROR (error log). |
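The key=value parameters in the table could be handled roughly like this. This is only a minimal sketch, not the actual main.py: the names parse_params, log_target, and build_logger, the logger name, and the default values are all my assumptions for illustration. Only the parameter names and their meanings come from the table.

```python
import logging
from datetime import date

# Level mapping taken from the table: "0" DEBUG, "1" INFO, "2" WARN, "3" ERROR.
LOG_LEVELS = {
    "0": logging.DEBUG,
    "1": logging.INFO,
    "2": logging.WARNING,
    "3": logging.ERROR,
}

# Hypothetical defaults; the real main.py may use different ones.
DEFAULTS = {"a": "", "r": "mic", "o": "console", "l": "1"}


def parse_params(argv):
    """Parse tokens such as ['a=XXX', 'r=mic'] into a dict of parameters."""
    params = dict(DEFAULTS)
    for token in argv:
        # Split on the first "=" only, so a value may itself contain "=".
        key, sep, value = token.partition("=")
        if not sep or key not in DEFAULTS:
            raise ValueError(f"unrecognized parameter: {token!r}")
        params[key] = value
    return params


def log_target(o, today=None):
    """Return None for console output, otherwise the log file name."""
    if o == "console":
        return None
    if o == "date":
        # isoformat() yields YYYY-MM-DD, matching the yyyy-MM-dd.txt pattern.
        return f"{(today or date.today()).isoformat()}.txt"
    return o


def build_logger(params, today=None):
    """Create a logger that honors the 'o' and 'l' parameters."""
    logger = logging.getLogger("speech_recognizer_sketch")
    logger.setLevel(LOG_LEVELS[params["l"]])
    target = log_target(params["o"], today)
    if target is None:
        handler = logging.StreamHandler()
    else:
        handler = logging.FileHandler(target, encoding="utf-8")
    logger.addHandler(handler)
    return logger
```

For example, parse_params(["a=XXX", "o=date", "l=2"]) keeps the assumed default r=mic, and the resulting logger would write warnings and errors to a file named after today's date.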
If you want to change the microphone
Currently, audio is captured from the input device named pulse.
If you want to change it, adjust the values around rec.audio_source.
To select a device, call rec.get_device() and specify the index of the device you want to use.
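Looking up a device index by name might look something like the sketch below. I am assuming the device list is available as (index, name) pairs; the actual return value of rec.get_device() may have a different shape, and find_device_index is a hypothetical helper, not part of the repository.

```python
def find_device_index(devices, name):
    """Return the index of the first device whose name contains `name`.

    `devices` is assumed to be an iterable of (index, device_name) pairs;
    adapt this to whatever rec.get_device() really returns.
    """
    wanted = name.lower()
    for index, device_name in devices:
        if wanted in device_name.lower():
            return index
    raise LookupError(f"no input device matching {name!r}")


# Example list, made up for illustration only.
example_devices = [(0, "HDA Intel PCH"), (5, "pulse"), (6, "default")]
pulse_index = find_device_index(example_devices, "pulse")
```

The index found this way is what you would then specify when adjusting the values around rec.audio_source.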
To change the number of channels
Change the value of rec.audio_format["CHANNELS"].
However, since the AmiVoice API currently supports only one channel, it is safer not to change it.
If you want to split the channels and pass only one of them, adding code like the following may be a good idea.
from functools import partial

def parse(audio, length, obj, amivoice):
    data = obj.channel_parse(audio)
    amivoice.write(data[0], int(length/obj.audio_format["CHANNELS"]))

rec.recorder_write_func = partial(parse, obj=rec, amivoice=stt)
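To give an idea of what splitting channels involves, here is a minimal sketch of de-interleaving PCM audio. It is not the repository's channel_parse, whose internals I have not shown here. I am assuming interleaved PCM with a fixed sample width (2 bytes, i.e. 16-bit, by default), and split_channels is a hypothetical name.

```python
def split_channels(audio, channels, sample_width=2):
    """Split interleaved PCM bytes into one bytes object per channel.

    Assumes frames are laid out as [ch0 sample][ch1 sample]... and that each
    sample is `sample_width` bytes. A trailing partial frame is dropped.
    """
    if channels < 1 or sample_width < 1:
        raise ValueError("channels and sample_width must be positive")
    frame_size = channels * sample_width
    usable = len(audio) - (len(audio) % frame_size)
    out = [bytearray() for _ in range(channels)]
    for start in range(0, usable, frame_size):
        for ch in range(channels):
            offset = start + ch * sample_width
            out[ch] += audio[offset:offset + sample_width]
    return [bytes(buf) for buf in out]
```

Each returned buffer is 1/channels of the input length, which is why the snippet above divides length by the number of channels before writing the first channel.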
Things to consider
I'm not sure whether the /amivoice folder contains the latest source.
If you want to update, you can get it from here:
Summary
It is now possible to perform speech recognition using Python.
