Running speech recognition on Linux 3 (ACP+pulseaudio)

Published: 2021.08.09 Last updated: 2025.03.04

Ichikawa Ichikawa-chan

Hello, this is Ichikawa-chan.

I work in development at a company called Advanced Media.

This is a three-part series on the topic of getting speech recognition working on Linux.

This is the third installment, so please stay with us until the end.

Here's what we'll talk about in each session:

・Recognize speech from audio files using the AmiVoice Cloud Platform program.

・Simple speech recognition from the microphone using the AmiVoice Cloud Platform program.

・Recognize speech from the microphone using the AmiVoice Cloud Platform program. ← this article

The goal this time is to use the ACP C++ sample together with the PulseAudio library to recognize speech from the microphone.

This is a continuation of the article, so please refer to the previous article if you haven't seen it yet.

Running Speech Recognition on Linux 2 (Simple Microphone Input)


Everything here was run in the following environment.

OS            Ubuntu 18.04
Architecture  AMD64
GCC           7.5.0

Procedure

  1. Writing programs using PulseAudio
  2. Modifying the ACP program

1. Write a program using PulseAudio

Before writing the program, you may be wondering what PulseAudio is, so I'll give you a brief explanation.

What is PulseAudio?

In the Linux sound stack, ALSA is responsible for talking to the sound card. However, an application that opens the ALSA hardware device directly holds it exclusively, so several applications cannot play or record at the same time.
That is where PulseAudio comes in. It sits between the applications that use sound and ALSA, and lets multiple applications share the sound card.

Now we will write a PulseAudio program.

The working directory is assumed to have the following structure:

work/
├ acp/
│ ├ Wrp
│ ├ Hrp
│ ├ audio
│ ├ curl-ca-bundle.crt
│ └ readme.txt
└ poco/

This time we will use ami_pulseaudio, a library that wraps PulseAudio. It is available on github.com.

You can build and run it with the following commands:

$ cd ~/work
$ git clone https://github.com/r-ichikawa-amivoice/ami_pulseaudio
$ cd ami_pulseaudio
$ make
$ bash run.sh

If the sound from the microphone comes out of the speaker, it's successful.

ami_pulseaudio is a wrapper for PulseAudio that allows multi-threading. I created it because I had a lot of fun writing a C-like wrapper.

The main program is as follows:


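#include "ami_pulseaudio.h"
#include <stdio.h>
#include <unistd.h>

/* Callback invoked by ami_pulseaudio; the ACCEPTED state carries recorded audio data. */
void callback(enum AMI_PULSEAUDIO_RESULT_STATE result_state, int data_size, char* data){
	switch(result_state){
		case AMI_PULSEAUDIO_RESULT_STATE_ACCEPTED:
			/* Pass the raw audio bytes straight through to standard output. */
			if (write(STDOUT_FILENO, data, data_size) < 0) {
				perror("write");
			}
			break;
		default:
			/* Ignore every other state. */
			break;
	}
}

int main(int argc, char** argv) {
	void* ap = 0;
	ap = ami_pulseaudio_create(callback);
	ami_pulseaudio_start(ap);
	getchar();	/* Keep recording until Enter is pressed. */
	ami_pulseaudio_stop(ap);
	ami_pulseaudio_free(ap);
	ap = 0;
	return 0;
}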

To record, all you have to do is call ami_pulseaudio_create and then ami_pulseaudio_start. It's pretty easy.
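
The create, start, stop, free sequence is a common shape for callback-based capture libraries. The following is a minimal sketch of how such a lifecycle can be built on a worker thread. It is not the ami_pulseaudio source: every name here (FakeCapture, the fake_capture_* functions, count_bytes, run_capture_demo) is hypothetical, and the worker hands out silent 320-byte chunks instead of reading from PulseAudio.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical callback type: receives one chunk of audio bytes.
typedef void (*ChunkCallback)(int data_size, char* data);

// Hypothetical capture object; a real wrapper would also hold a PulseAudio handle.
struct FakeCapture {
    ChunkCallback callback;
    std::atomic<bool> running;
    std::thread worker;
    explicit FakeCapture(ChunkCallback cb) : callback(cb), running(false) {}
};

FakeCapture* fake_capture_create(ChunkCallback cb) { return new FakeCapture(cb); }

// Start a worker thread that delivers chunks until stop is requested.
void fake_capture_start(FakeCapture* c) {
    c->running = true;
    c->worker = std::thread([c]() {
        std::vector<char> chunk(320, '\0');  // silence stands in for microphone data
        while (c->running) {
            c->callback(static_cast<int>(chunk.size()), chunk.data());
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    });
}

// Ask the worker to finish and wait for it, so no callback runs after this returns.
void fake_capture_stop(FakeCapture* c) {
    c->running = false;
    if (c->worker.joinable()) c->worker.join();
}

void fake_capture_free(FakeCapture* c) { delete c; }

// Example callback: count the bytes received (atomic, since it runs on the worker).
static std::atomic<long> g_total_bytes(0);
void count_bytes(int data_size, char* /*data*/) { g_total_bytes += data_size; }

// Run the whole lifecycle once and return the number of bytes delivered.
long run_capture_demo() {
    FakeCapture* c = fake_capture_create(count_bytes);
    fake_capture_start(c);
    while (g_total_bytes == 0) {  // wait for at least one chunk
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    fake_capture_stop(c);
    fake_capture_free(c);
    return g_total_bytes;
}
```

The point of joining the worker inside the stop function is that the caller can safely free the object right afterwards, which is the same order of calls used in the main program. On older toolchains this sketch may need to be built with -pthread.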

For the unwrapped simple API, see the official documentation and examples on freedesktop.org.

If you are going to implement this properly, it is better to use the asynchronous API, since it lets you do more.

I have also put a sample on github.com for your reference.

2. Modifying the ACP program

The working directory is assumed to have the following structure:

work/
├ acp/
│ ├ Wrp
│ ├ Hrp
│ ├ audio
│ ├ curl-ca-bundle.crt
│ └ readme.txt
├ poco/
└ ami_pulseaudio/

Open acp/Wrp/cpp/WrpSimpleTester.cpp in an editor and modify two places: the includes (around line 19) and the code that opens the audio file (around line 402).


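#include <atomic>
#include "ami_pulseaudio.h"
// Cleared by the callback when sending fails, which ends the wait loop in main.
// Atomic because the callback may run on a different thread from main.
std::atomic<int> wrp_flag(1);
// Set in main before recording starts so that the callback can reach the Wrp instance.
com::amivoice::wrp::Wrp* wrp_ = 0;
void callback(enum AMI_PULSEAUDIO_RESULT_STATE result_state, int data_size, char* data){
	switch(result_state){
		case AMI_PULSEAUDIO_RESULT_STATE_ACCEPTED:
			// Send the recorded audio data to the WebSocket speech recognition server
			if (!wrp_->feedData(data, 0, data_size)) {
				printf("%s\n", wrp_->getLastMessage());
				printf("Failed to send audio data to the WebSocket speech recognition server.\n");
				wrp_flag = 0;
			}
			break;
		default:
			break;
	}
}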
...
#if 0
	// Open the audio data file
	FILE* audioStream;
	if (fopen_s(&audioStream, audioFileName, "rb") == 0) {
		// Read audio data from the audio data file
		char audioData[4096];
		int audioDataReadBytes = (int)fread(audioData, 1, 4096, audioStream);
		while (audioDataReadBytes > 0) {
			// Sleep until the number of pending recognition results drops to 1 or fewer
			int maxSleepTime = 50000;
			while (wrp->getWaitingResults() > 1 && maxSleepTime > 0) {
				wrp->sleep(100);
				maxSleepTime -= 100;
			}

			// Send the audio data to the WebSocket speech recognition server
			if (!wrp->feedData(audioData, 0, audioDataReadBytes)) {
				print("%s", wrp->getLastMessage());
				print("Failed to send audio data to the WebSocket speech recognition server.");
				break;
			}

			// Read audio data from the audio data file
			audioDataReadBytes = (int)fread(audioData, 1, 4096, audioStream);
		}

		// Close the audio data file
		fclose(audioStream);
	} else {
		print("Failed to read audio data file %s.", audioFileName);
	}
#else
	// The part that was modified in the second article
	void* ap = 0;
	wrp_ = wrp;
	ap = ami_pulseaudio_create(callback);
	ami_pulseaudio_start(ap);
	// Wait here until the callback reports a send failure
	while(wrp_flag){
		wrp->sleep(100);
	}
	ami_pulseaudio_stop(ap);
	ami_pulseaudio_free(ap);
	ap = 0;
#endif
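
The modification above relies on two globals: a pointer that lets the callback reach the recognizer, and a flag that the callback clears when sending fails so that the waiting loop can exit. The sketch below isolates that pattern so it can be tried without the ACP library or a microphone. MockRecognizer, on_audio_chunk and simulate_capture are hypothetical names, and the callback is driven from the calling thread here, whereas in the real program the capture wrapper would invoke it.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the recognizer client; only feedData() matters here.
class MockRecognizer {
public:
    explicit MockRecognizer(int chunks_before_failure)
        : remaining_(chunks_before_failure), bytes_fed_(0) {}

    // Accepts chunks until the simulated connection "drops", then returns false.
    bool feedData(const char* /*data*/, int /*offset*/, int size) {
        if (remaining_ <= 0) return false;
        --remaining_;
        bytes_fed_ += size;
        return true;
    }

    long bytesFed() const { return bytes_fed_; }

private:
    int remaining_;
    long bytes_fed_;
};

static MockRecognizer* g_recognizer = nullptr;  // set before capture starts
static std::atomic<bool> g_keep_running(true);  // cleared by the callback on failure

// Same role as the audio callback: forward each chunk, request a stop on failure.
void on_audio_chunk(int data_size, char* data) {
    if (g_recognizer == nullptr || !g_keep_running) return;
    if (!g_recognizer->feedData(data, 0, data_size)) {
        std::fprintf(stderr, "feedData failed; stopping\n");
        g_keep_running = false;
    }
}

// Feed silent chunks through the callback until it clears the flag.
// Returns the number of bytes the mock recognizer accepted.
long simulate_capture(int chunks_before_failure, int chunk_size) {
    MockRecognizer recognizer(chunks_before_failure);
    g_recognizer = &recognizer;
    g_keep_running = true;
    std::vector<char> chunk(static_cast<std::size_t>(chunk_size), '\0');
    int guard = 0;  // safety limit so the loop always terminates
    while (g_keep_running && guard++ < 1000) {
        on_audio_chunk(chunk_size, chunk.data());
    }
    g_recognizer = nullptr;
    return recognizer.bytesFed();
}
```

For example, simulate_capture(3, 320) accepts three chunks (960 bytes) and the fourth call fails, which clears the flag and ends the loop. Using std::atomic for the flag is a design choice for the case where the callback and the waiting loop run on different threads.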

Update WrpSimpleTester.makefile as follows:


PRJ = WrpSimpleTester
CPPC = g++ -std=c++11
LD = g++
LDD = ldd -d
SRC = \
$(PRJ).cpp
LIB = \
-lWrp \
../../../ami_pulseaudio/lib/libami_pulseaudio.so
CPPDEFINES = \
-DLINUX \
-DPOSIX \
-D$(if $(debug),_DEBUG,NDEBUG)
CPPFLAGS = \
-w \
-O$(if $(debug),0 -g,3) \
-Isrc \
-I../../../ami_pulseaudio/src
LDFLAGS = \
-L$(OUTDIR)
OUTDIR = bin/linux64$(if $(debug),_debug,_release)
OBJDIR = obj/linux64$(if $(debug),_debug,_release)/$(PRJ)
OUT = $(OUTDIR)/$(PRJ)
OBJ = $(patsubst %.cpp,$(OBJDIR)/%.o,$(notdir $(SRC)))
DEP = $(patsubst %.cpp,$(OBJDIR)/%.d,$(notdir $(SRC)))
VPATH = $(sort $(dir $(SRC)))
build: $(OUT)
$(OUT): $(OBJ)
	@mkdir -p $(dir $@)
	$(LD) $(LDFLAGS) $(OBJ) $(LIB) -o $@
$(OBJ): $(OBJDIR)/%.o: %.cpp
	@mkdir -p $(dir $@)
	$(CPPC) $(CPPFLAGS) $(CPPDEFINES) -c -MMD $< -o $@
clean:
	-rm -f $(OUT) $(OBJ) $(DEP)
-include $(DEP)

Create run3.sh as shown below.


#!/bin/bash
read -p "Please enter AppKey: " AppKey
set -x
export SSL_CERT_FILE=../../curl-ca-bundle.crt
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:bin/linux64_release:../../../ami_pulseaudio/lib/
bin/linux64_release/WrpSimpleTester wss://acp-api.amivoice.com/v1/ ../../audio/test.wav 16K -a-general $AppKey

Now try building and running it.


$ cd ~/work/acp/Wrp/cpp/
$ bash build
$ bash run3.sh

After running it, try speaking into the microphone. If you get output like the following, it worked.


Please enter AppKey: [APPKEY]
+ export SSL_CERT_FILE=../../curl-ca-bundle.crt
+ SSL_CERT_FILE=../../curl-ca-bundle.crt
+ export LD_LIBRARY_PATH=/opt/ros/melodic/lib:bin/linux64_release:../../../ami_pulseaudio/lib/
+ LD_LIBRARY_PATH=/opt/ros/melodic/lib:bin/linux64_release:../../../ami_pulseaudio/lib/
+ bin/linux64_release/WrpSimpleTester wss://acp-api.amivoice.com/v1/ ../../audio/test.wav 16K -a-general [APPKEY]
{"results":[{"tokens":[{"written":"\u3053\u3093\u306b\u3061\u306f","confidence":0.98,"starttime":5506,"endtime":6306,"spoken":"\u3053\u3093\u306b\u3061\u306f"},{"written":"\u3002","confidence":0.84,"starttime":6306,"endtime":6642,"spoken":"_"}],"confidence":1.000,"starttime":5250,"endtime":6642,"tags":[],"rulename":"","text":"\u3053\u3093\u306b\u3061\u306f\u3002"}],"utteranceid":"20210329/ja_ja-amivoicecloud-16k-hon-ichikawa@01787d13deee0a301c5f8536-0329_172243","text":"\u3053\u3093\u306b\u3061\u306f\u3002","code":"","message":""}
-> こんにちは。
{"results":[{"tokens":[{"written":"\u3055\u3088\u3046\u306a\u3089","confidence":0.99,"starttime":7420,"endtime":8348,"spoken":"\u3055\u3088\u3046\u306a\u3089"},{"written":"\u3002","confidence":0.72,"starttime":8348,"endtime":8524,"spoken":"_"}],"confidence":0.996,"starttime":7100,"endtime":9436,"tags":[],"rulename":"","text":"\u3055\u3088\u3046\u306a\u3089\u3002"}],"utteranceid":"20210329/ja_ja-amivoicecloud-16k-hon-ichikawa@01787d13deee0a301c5f8536-0329_172245","text":"\u3053\u3093\u306b\u3061\u306f\u3002","code":"","message":""}
-> さようなら。
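
In the output above, the JSON carries the recognized text as \uXXXX escapes, and the line starting with "->" shows the same text as readable characters. As an illustration of that conversion, here is a minimal sketch that turns \uXXXX escapes into UTF-8. The function names are hypothetical and this is not taken from the ACP sample; it handles only code points in the Basic Multilingual Plane and leaves surrogate pairs and the other JSON escapes untouched.

```cpp
#include <cstddef>
#include <string>

// Append one code point (U+0000 to U+FFFF in this sketch) to out as UTF-8.
static void append_utf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Return the value of one hexadecimal digit, or -1 if c is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replace every well-formed \uXXXX escape in the input with its UTF-8 encoding.
// Anything else, including malformed escapes, is copied through unchanged.
std::string decode_unicode_escapes(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        bool is_escape = in[i] == '\\' && i + 6 <= in.size() && in[i + 1] == 'u';
        unsigned cp = 0;
        if (is_escape) {
            for (int k = 2; k < 6; ++k) {
                int h = hex_value(in[i + k]);
                if (h < 0) { is_escape = false; break; }
                cp = (cp << 4) | static_cast<unsigned>(h);
            }
        }
        if (is_escape) {
            append_utf8(out, cp);
            i += 6;
        } else {
            out += in[i];
            ++i;
        }
    }
    return out;
}
```

Passing the six escapes from the first "text" field (\u3053\u3093\u306b\u3061\u306f\u3002) through decode_unicode_escapes yields the UTF-8 bytes for こんにちは。. In a real program a JSON library would normally do this as part of parsing.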

Summary

This time, I modified the ACP sample to perform real-time recognition.

This is my first time writing an open source program, so I'm not very confident, but I hope this will be helpful to someone.

If you have any bug reports or advice, please let me know. 

Person who wrote this article

  • Ichikawa-chan

    This is a monster that you will likely encounter when making inquiries about Linux or embedded systems.