Running Speech Recognition on Linux 3 (ACP + PulseAudio)

Ichikawa-chan
Hello, this is Ichikawa-chan.
I work in development at a company called Advanced Media.
This is a three-part series on the theme of "trying to get speech recognition working on Linux."
This is the third installment, so please stay with us until the end.
Here's what each part covers:
- Recognizing speech from audio files using the AmiVoice Cloud Platform program
- Simple speech recognition from the microphone using the AmiVoice Cloud Platform program
- Recognizing speech from the microphone using the AmiVoice Cloud Platform program ← now
The goal this time is to combine the ACP C++ sample with the PulseAudio library and recognize speech from the microphone.
This article continues from the previous one, so please read that first if you haven't yet.
Running Speech Recognition on Linux 2 (Simple Microphone Input)
I am running it in the following environment.
| OS | Ubuntu 18.04 |
| Architecture | AMD64 |
| GCC | 7.5.0 |
Procedure
- Writing a program using PulseAudio
- Modifying the ACP program
1. Writing a program using PulseAudio
Before writing the program, here is a brief explanation of what PulseAudio is.
In the Linux sound stack, ALSA is responsible for talking to the sound card, but because it takes exclusive hold of the card, multiple applications cannot play or record at the same time.
That's where PulseAudio comes in. It sits between sound-using applications and ALSA, allowing multiple applications to share the sound card.
Now we will write a PulseAudio program.
The working directory is assumed to be laid out as follows:
work/
├ acp/
│ ├ Wrp
│ ├ Hrp
│ ├ audio
│ ├ curl-ca-bundle.crt
│ └ readme.txt
└ poco/
This time we will use a wrapper around PulseAudio.
You can build and run it with the following commands:
$ cd ~/work
$ git clone https://github.com/r-ichikawa-amivoice/ami_pulseaudio
$ cd ami_pulseaudio
$ make
$ bash run.sh
If the sound from the microphone comes out of the speaker, it's successful.
ami_pulseaudio is a wrapper for PulseAudio that allows multi-threading. I created it because I had a lot of fun writing a C-like wrapper.
The main program is as follows:
#include "ami_pulseaudio.h"
#include <stdio.h>
#include <unistd.h>

/* Callback passed to ami_pulseaudio_create(); it receives the recorded audio data. */
void callback(enum AMI_PULSEAUDIO_RESULT_STATE result_state, int data_size, char* data){
    switch(result_state){
    case AMI_PULSEAUDIO_RESULT_STATE_ACCEPTED:
        /* Pass the raw audio data through to standard output. */
        write(STDOUT_FILENO, data, data_size);
        break;
    default:
        break;
    }
}

int main(int argc, char** argv) {
    void* ap = 0;
    ap = ami_pulseaudio_create(callback);
    ami_pulseaudio_start(ap);
    getchar();  /* Keep recording until Enter is pressed. */
    ami_pulseaudio_stop(ap);
    ami_pulseaudio_free(ap);
    ap = 0;
    return 0;
}
For how to record without the wrapper, please see the official documentation and its simple API examples.
If you are going to implement it properly, the asynchronous API is the better choice because it lets you do more.
I have created a sample below for your reference.
2. Modifying the ACP program
The working directory is assumed to be laid out as follows:
work/
├ acp/
│ ├ Wrp
│ ├ Hrp
│ ├ audio
│ ├ curl-ca-bundle.crt
│ └ readme.txt
├ poco/
└ ami_pulseaudio/
Using an editor of your choice, modify acp/Wrp/cpp/WrpSimpleTester.cpp in two places: the includes (around line 19) and the file-opening code (around line 402).
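#include <atomic>
#include "ami_pulseaudio.h"

// Cleared by the callback when sending fails and polled by the main loop.
// Atomic because the callback may run on a different thread.
std::atomic<int> wrp_flag(1);
com::amivoice::wrp::Wrp* wrp_;

void callback(enum AMI_PULSEAUDIO_RESULT_STATE result_state, int data_size, char* data){
    switch(result_state){
    case AMI_PULSEAUDIO_RESULT_STATE_ACCEPTED:
        // Send the recorded audio data to the WebSocket speech recognition server
        if (!wrp_->feedData(data, 0, data_size)) {
            printf("%s\n", wrp_->getLastMessage());
            printf("Failed to send audio data to the WebSocket speech recognition server.\n");
            wrp_flag = 0;
        }
        break;
    default:
        break;
    }
}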
...
#if 0
    // Open the audio data file
    FILE* audioStream;
    if (fopen_s(&audioStream, audioFileName, "rb") == 0) {
        // Read audio data from the audio data file
        char audioData[4096];
        int audioDataReadBytes = (int)fread(audioData, 1, 4096, audioStream);
        while (audioDataReadBytes > 0) {
            // Sleep until the number of pending recognition results is 1 or less
            int maxSleepTime = 50000;
            while (wrp->getWaitingResults() > 1 && maxSleepTime > 0) {
                wrp->sleep(100);
                maxSleepTime -= 100;
            }
            // Send the audio data to the WebSocket speech recognition server
            if (!wrp->feedData(audioData, 0, audioDataReadBytes)) {
                print("%s", wrp->getLastMessage());
                print("Failed to send audio data to the WebSocket speech recognition server.");
                break;
            }
            // Read audio data from the audio data file
            audioDataReadBytes = (int)fread(audioData, 1, 4096, audioStream);
        }
        // Close the audio data file
        fclose(audioStream);
    } else {
        print("Failed to read audio data file %s.", audioFileName);
    }
#else
    // Around the part modified in part 2
    void* ap = 0;
    wrp_ = wrp;
    ap = ami_pulseaudio_create(callback);
    ami_pulseaudio_start(ap);
    // Wait while the callback feeds microphone audio; it clears wrp_flag on a send failure
    while(wrp_flag){
        wrp->sleep(100);
    }
    ami_pulseaudio_stop(ap);
    ami_pulseaudio_free(ap);
    ap = 0;
#endif
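The modified code relies on a simple handshake: the capture callback feeds each chunk to the recognizer, and when a send fails it clears a flag that the main thread is polling. The sketch below isolates that pattern so it can be run without ACP or PulseAudio. FakeRecognizer, on_audio, and run_until_failure are hypothetical names, a plain std::thread stands in for the capture thread, and the 320-byte chunk size is an arbitrary assumption. The flag is a std::atomic here because it is shared between two threads.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for Wrp::feedData(): succeeds a fixed number of times, then fails.
struct FakeRecognizer {
    int remaining;
    bool feedData(const char*, int, int) { return remaining-- > 0; }
};

std::atomic<bool> keep_running{true};  // cleared by the callback on failure
std::atomic<int> chunks_sent{0};       // number of chunks fed successfully
FakeRecognizer* recognizer = nullptr;

// Plays the role of the capture callback: forward one chunk to the recognizer.
void on_audio(const char* data, int size) {
    if (!recognizer->feedData(data, 0, size)) {
        keep_running = false;  // tell the main thread to stop waiting
        return;
    }
    ++chunks_sent;
}

// Feed silent chunks from a worker thread until the recognizer reports a failure.
// Returns the number of chunks that were sent successfully.
int run_until_failure(int chunks_before_failure) {
    FakeRecognizer fake{chunks_before_failure};
    recognizer = &fake;
    keep_running = true;
    chunks_sent = 0;

    std::thread capture([] {
        char chunk[320] = {0};  // assumed chunk size, silence only
        while (keep_running) {
            on_audio(chunk, static_cast<int>(sizeof(chunk)));
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    // Main thread: poll the flag, as the modified WrpSimpleTester loop does.
    while (keep_running) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    capture.join();
    recognizer = nullptr;
    return chunks_sent;
}
```

Calling run_until_failure(5) lets five chunks through and returns once the sixth send fails. Note that in this pattern the only way out of the loop is a failed send, which matches the behavior of the modified sample.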
Update WrpSimpleTester.makefile as follows:
PRJ = WrpSimpleTester

CPPC = g++ -std=c++11
LD = g++
LDD = ldd -d

SRC = \
	$(PRJ).cpp

LIB = \
	-lWrp \
	../../ami_pulseaudio/lib/libami_pulseaudio.so

CPPDEFINES = \
	-DLINUX \
	-DPOSIX \
	-D$(if $(debug),_DEBUG,NDEBUG)

CPPFLAGS = \
	-w \
	-O$(if $(debug),0 -g,3) \
	-Isrc \
	-I../../ami_pulseaudio/src

LDFLAGS = \
	-L$(OUTDIR)

OUTDIR = bin/linux64$(if $(debug),_debug,_release)
OBJDIR = obj/linux64$(if $(debug),_debug,_release)/$(PRJ)

OUT = $(OUTDIR)/$(PRJ)
OBJ = $(patsubst %.cpp,$(OBJDIR)/%.o,$(notdir $(SRC)))
DEP = $(patsubst %.cpp,$(OBJDIR)/%.d,$(notdir $(SRC)))

VPATH = $(sort $(dir $(SRC)))

build: $(OUT)

$(OUT): $(OBJ)
	@mkdir -p $(dir $@)
	$(LD) $(LDFLAGS) $(OBJ) $(LIB) -o $@

$(OBJ): $(OBJDIR)/%.o: %.cpp
	@mkdir -p $(dir $@)
	$(CPPC) $(CPPFLAGS) $(CPPDEFINES) -c -MMD $< -o $@

clean:
	-rm -f $(OUT) $(OBJ) $(DEP)

-include $(DEP)
Create run3.sh as shown below.
#!/bin/bash
read -p "Please enter AppKey: " AppKey
set -x
export SSL_CERT_FILE=../../curl-ca-bundle.crt
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:bin/linux64_release:../../../ami_pulseaudio/lib/
bin/linux64_release/WrpSimpleTester wss://acp-api.amivoice.com/v1/ ../../audio/test.wav 16K -a-general $AppKey
Now try building and running it.
$ cd ~/work/acp/Wrp/cpp/
$ bash build
$ bash run3.sh
After running it, try speaking into the microphone.
If you get results like the following, it has worked.
Please enter AppKey: [APPKEY]
+ export SSL_CERT_FILE=../../curl-ca-bundle.crt
+ SSL_CERT_FILE=../../curl-ca-bundle.crt
+ export LD_LIBRARY_PATH=/opt/ros/melodic/lib:bin/linux64_release:../../../ami_pulseaudio/lib/
+ LD_LIBRARY_PATH=/opt/ros/melodic/lib:bin/linux64_release:../../../ami_pulseaudio/lib/
+ bin/linux64_release/WrpSimpleTester wss://acp-api.amivoice.com/v1/ ../../audio/test.wav 16K -a-general [APPKEY]
{"results":[{"tokens":[{"written":"\u3053\u3093\u306b\u3061\u306f","confidence":0.98,"starttime":5506,"endtime":6306,"spoken":"\u3053\u3093\u306b\u3061\u306f"},{"written":"\u3002","confidence":0.84,"starttime":6306,"endtime":6642,"spoken":"_"}],"confidence":1.000,"starttime":5250,"endtime":6642,"tags":[],"rulename":"","text":"\u3053\u3093\u306b\u3061\u306f\u3002"}],"utteranceid":"20210329/ja_ja-amivoicecloud-16k-hon-ichikawa@01787d13deee0a301c5f8536-0329_172243","text":"\u3053\u3093\u306b\u3061\u306f\u3002","code":"","message":""}
-> こんにちは。
{"results":[{"tokens":[{"written":"\u3055\u3088\u3046\u306a\u3089","confidence":0.99,"starttime":7420,"endtime":8348,"spoken":"\u3055\u3088\u3046\u306a\u3089"},{"written":"\u3002","confidence":0.72,"starttime":8348,"endtime":8524,"spoken":"_"}],"confidence":0.996,"starttime":7100,"endtime":9436,"tags":[],"rulename":"","text":"\u3055\u3088\u3046\u306a\u3089\u3002"}],"utteranceid":"20210329/ja_ja-amivoicecloud-16k-hon-ichikawa@01787d13deee0a301c5f8536-0329_172245","text":"\u3055\u3088\u3046\u306a\u3089\u3002","code":"","message":""}
-> さようなら。
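In the output above, the recognized text appears in the JSON as \uXXXX escapes, and the line beginning with -> shows the same text in readable form. The sketch below shows roughly how such escapes map to UTF-8. decode_unicode_escapes is a hypothetical helper, not part of the ACP sample, and it is not a full JSON parser: it only converts \uXXXX sequences (including surrogate pairs) and copies everything else unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to out, encoded as UTF-8.
static void append_utf8(std::string& out, uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Parse four hex digits starting at s[pos]. Returns false if they are missing or invalid.
static bool parse_hex4(const std::string& s, size_t pos, uint32_t& value) {
    if (pos + 4 > s.size()) return false;
    value = 0;
    for (size_t i = pos; i < pos + 4; ++i) {
        char c = s[i];
        uint32_t digit;
        if (c >= '0' && c <= '9') digit = static_cast<uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = static_cast<uint32_t>(c - 'A' + 10);
        else return false;
        value = (value << 4) | digit;
    }
    return true;
}

// Hypothetical helper: replace \uXXXX escapes in a string with UTF-8 bytes.
std::string decode_unicode_escapes(const std::string& in) {
    std::string out;
    size_t i = 0;
    while (i < in.size()) {
        uint32_t cp;
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == 'u' && parse_hex4(in, i + 2, cp)) {
            i += 6;
            // Combine a high surrogate with the low surrogate that follows it.
            uint32_t low;
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
                in[i] == '\\' && in[i + 1] == 'u' && parse_hex4(in, i + 2, low) &&
                low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i += 6;
            }
            append_utf8(out, cp);
        } else if (in[i] == '\\' && i + 1 < in.size()) {
            // Any other escape (such as \\ or \") is copied as-is.
            out += in[i];
            out += in[i + 1];
            i += 2;
        } else {
            out += in[i++];
        }
    }
    return out;
}
```

For example, passing the "text" value of the first result, \u3053\u3093\u306b\u3061\u306f\u3002, yields こんにちは。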
Summary
This time, I modified the ACP sample to perform real-time recognition.
This is my first time writing an open source program, so I'm not very confident, but I hope this will be helpful to someone.
If you have any bug reports or advice, please let me know.
About the author
Ichikawa-chan
A monster you are likely to encounter when making inquiries about Linux or embedded systems.


