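Speaker Diarization
This section shows how to use speaker diarization with each interface.
1. Using speaker diarization with the synchronous interface
diarization1.wav contains the voices of one female and one male speaker.
(Female speaker A) AmiVoice APIに音声を送信すると、発話内容をテキストにした結果を返します。 ("When you send audio to the AmiVoice API, it returns the speech content as text.")
(Male speaker B) 会議の文字起こしや音声対話システムなどの音声対応アプリケーションを作成できます。 ("You can build voice-enabled applications such as meeting transcription and spoken dialogue systems.")
Run speech recognition through the synchronous HTTP interface with speaker diarization enabled.
Request conditions
Speech recognition engine: 日本語・会話_汎用 (Japanese, general-purpose conversation; -a-general)
Audio file: diarization1.wav
For details on using speaker diarization with the synchronous HTTP interface, see the corresponding reference page.
Request and response example
Request
Linux / Mac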
curl https://acp-api.amivoice.com/v1/recognize \
-F u=$APP_KEY \
-F d="grammarFileNames=-a-general segmenterProperties=useDiarizer=1" \
-F a=@diarization1.wav
Windows
curl https://acp-api.amivoice.com/v1/recognize -F u=%APP_KEY% -F d="grammarFileNames=-a-general segmenterProperties=useDiarizer=1" -F a=@diarization1.wav
Response
The label field of each token identifies the speaker (speaker0 or speaker1 in this example).
{
"results": [
{
"tokens": [
{
"written": "AmiVoice",
"confidence": 1.00,
"starttime": 488,
"endtime": 1176,
"spoken": "あみぼいす",
"label": "speaker0"
},
{
"written": "API",
"confidence": 1.00,
"starttime": 1176,
"endtime": 1688,
"spoken": "えーぴーあい",
"label": "speaker0"
},
{
"written": "に",
"confidence": 0.99,
"starttime": 1688,
"endtime": 1944,
"spoken": "に",
"label": "speaker0"
},
{
"written": "音声",
"confidence": 1.00,
"starttime": 1976,
"endtime": 2376,
"spoken": "おんせい",
"label": "speaker0"
},
{
"written": "を",
"confidence": 1.00,
"starttime": 2376,
"endtime": 2472,
"spoken": "を",
"label": "speaker0"
},
{
"written": "送信",
"confidence": 1.00,
"starttime": 2472,
"endtime": 2920,
"spoken": "そうしん",
"label": "speaker0"
},
{
"written": "する",
"confidence": 1.00,
"starttime": 2920,
"endtime": 3192,
"spoken": "する",
"label": "speaker0"
},
{
"written": "と",
"confidence": 1.00,
"starttime": 3192,
"endtime": 3432,
"spoken": "と",
"label": "speaker0"
},
{
"written": "、",
"confidence": 0.67,
"starttime": 3432,
"endtime": 3464,
"spoken": "_",
"label": "speaker0"
},
{
"written": "発話",
"confidence": 1.00,
"starttime": 3464,
"endtime": 3864,
"spoken": "はつわ",
"label": "speaker0"
},
{
"written": "内容",
"confidence": 1.00,
"starttime": 3864,
"endtime": 4280,
"spoken": "ないよう",
"label": "speaker0"
},
{
"written": "を",
"confidence": 1.00,
"starttime": 4280,
"endtime": 4344,
"spoken": "を",
"label": "speaker0"
},
{
"written": "テキスト",
"confidence": 1.00,
"starttime": 4344,
"endtime": 4856,
"spoken": "てきすと",
"label": "speaker0"
},
{
"written": "に",
"confidence": 1.00,
"starttime": 4856,
"endtime": 4984,
"spoken": "に",
"label": "speaker0"
},
{
"written": "した",
"confidence": 1.00,
"starttime": 4984,
"endtime": 5160,
"spoken": "した",
"label": "speaker0"
},
{
"written": "結果",
"confidence": 1.00,
"starttime": 5160,
"endtime": 5528,
"spoken": "けっか",
"label": "speaker0"
},
{
"written": "を",
"confidence": 1.00,
"starttime": 5528,
"endtime": 5624,
"spoken": "を",
"label": "speaker0"
},
{
"written": "返し",
"confidence": 0.86,
"starttime": 5624,
"endtime": 5976,
"spoken": "かえし",
"label": "speaker0"
},
{
"written": "ます",
"confidence": 0.86,
"starttime": 5976,
"endtime": 6376,
"spoken": "ます",
"label": "speaker0"
},
{
"written": "。",
"confidence": 0.92,
"starttime": 6728,
"endtime": 6792,
"spoken": "_",
"label": "speaker0"
},
{
"written": "会議",
"confidence": 1.00,
"starttime": 6792,
"endtime": 7160,
"spoken": "かいぎ",
"label": "speaker1"
},
{
"written": "の",
"confidence": 1.00,
"starttime": 7160,
"endtime": 7256,
"spoken": "の",
"label": "speaker1"
},
{
"written": "文字",
"confidence": 1.00,
"starttime": 7256,
"endtime": 7512,
"spoken": "もじ",
"label": "speaker1"
},
{
"written": "起こし",
"confidence": 1.00,
"starttime": 7512,
"endtime": 7880,
"spoken": "おこし",
"label": "speaker1"
},
{
"written": "や",
"confidence": 1.00,
"starttime": 7880,
"endtime": 8056,
"spoken": "や",
"label": "speaker1"
},
{
"written": "音声",
"confidence": 1.00,
"starttime": 8088,
"endtime": 8520,
"spoken": "おんせい",
"label": "speaker1"
},
{
"written": "対話",
"confidence": 1.00,
"starttime": 8520,
"endtime": 8808,
"spoken": "たいわ",
"label": "speaker1"
},
{
"written": "システム",
"confidence": 1.00,
"starttime": 8808,
"endtime": 9336,
"spoken": "しすてむ",
"label": "speaker1"
},
{
"written": "など",
"confidence": 0.81,
"starttime": 9336,
"endtime": 9512,
"spoken": "など",
"label": "speaker1"
},
{
"written": "の",
"confidence": 1.00,
"starttime": 9512,
"endtime": 9720,
"spoken": "の",
"label": "speaker1"
},
{
"written": "音声",
"confidence": 1.00,
"starttime": 9752,
"endtime": 10152,
"spoken": "おんせい",
"label": "speaker1"
},
{
"written": "対応",
"confidence": 1.00,
"starttime": 10152,
"endtime": 10680,
"spoken": "たいおう",
"label": "speaker1"
},
{
"written": "アプリケーション",
"confidence": 1.00,
"starttime": 10680,
"endtime": 11400,
"spoken": "あぷりけーしょん",
"label": "speaker1"
},
{
"written": "を",
"confidence": 1.00,
"starttime": 11400,
"endtime": 11464,
"spoken": "を",
"label": "speaker1"
},
{
"written": "作成",
"confidence": 1.00,
"starttime": 11464,
"endtime": 11864,
"spoken": "さくせい",
"label": "speaker1"
},
{
"written": "できます",
"confidence": 1.00,
"starttime": 11864,
"endtime": 12568,
"spoken": "できます",
"label": "speaker1"
},
{
"written": "。",
"confidence": 0.79,
"starttime": 12568,
"endtime": 12728,
"spoken": "_",
"label": "speaker1"
}
],
"confidence": 0.999,
"starttime": 200,
"endtime": 12728,
"tags": [],
"rulename": "",
"text": "AmiVoice APIに音声を送信すると、発話内容をテキストにした結果を返します。会議の文字起こしや音声対話システムなどの音声対応アプリケーションを作成できます。"
}
],
"utteranceid": "20250520/12/0196ebc929720a30375894c7_20250520_124231",
"text": "AmiVoice APIに音声を送信すると、発話内容をテキストにした結果を返します。会議の文字起こしや音声対話システムなどの音声対応アプリケーションを作成できます。",
"code": "",
"message": ""
}
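The token-level labels in the response above can be merged into speaker turns. The following is a minimal sketch, not part of the AmiVoice API: the function name `speaker_turns` and the abbreviated `sample` dictionary are assumptions made for illustration, and only the `results`, `tokens`, `written`, and `label` fields shown in the response are used.

```python
from itertools import groupby


def speaker_turns(response: dict) -> list[tuple[str, str]]:
    """Merge consecutive tokens that share a label into (label, text) turns."""
    turns = []
    for result in response.get("results", []):
        tokens = result.get("tokens", [])
        # groupby only merges adjacent tokens, so a speaker who talks twice
        # produces two separate turns.
        for label, group in groupby(tokens, key=lambda t: t.get("label", "")):
            turns.append((label, "".join(t["written"] for t in group)))
    return turns


# Abbreviated, hypothetical sample in the same shape as the response above.
sample = {
    "results": [
        {
            "tokens": [
                {"written": "音声", "label": "speaker0"},
                {"written": "を", "label": "speaker0"},
                {"written": "送信", "label": "speaker0"},
                {"written": "会議", "label": "speaker1"},
                {"written": "の", "label": "speaker1"},
            ]
        }
    ]
}

for label, text in speaker_turns(sample):
    print(f"{label}: {text}")
```

Tokens are concatenated without separators, which suits Japanese text; spacing between Latin-script words such as "AmiVoice API" is not restored by this sketch.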
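2. Using speaker diarization with the asynchronous interface
diarization2.wav contains the voices of two female speakers.
(Female speaker A) AmiVoice API は音声をテキストに変換する音声認識APIです。 ("AmiVoice API is a speech recognition API that converts audio to text.")
(Female speaker B) AmiVoice APIに音声を送信すると、発話内容をテキストにした結果を返します。 ("When you send audio to the AmiVoice API, it returns the speech content as text.")
Run speech recognition through the asynchronous HTTP interface with speaker diarization enabled.
Request conditions
Speech recognition engine: 日本語・会話_汎用 (Japanese, general-purpose conversation; -a-general)
Audio file: diarization2.wav
For details on using speaker diarization with the asynchronous HTTP interface, see the corresponding reference page.
Request and response example
Request
The first curl command submits the recognition job; the second retrieves the result for that job's session ID.
Linux / Mac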
curl https://acp-api-async.amivoice.com/v1/recognitions \
-F u=$APP_KEY \
-F d="grammarFileNames=-a-general speakerDiarization=True" \
-F a=@diarization2.wav
curl -H "Authorization: Bearer $APP_KEY" \
https://acp-api-async.amivoice.com/v1/recognitions/{session_id}
Windows
curl https://acp-api-async.amivoice.com/v1/recognitions -F u=%APP_KEY% -F d="grammarFileNames=-a-general speakerDiarization=True" -F a=@diarization2.wav
curl -H "Authorization: Bearer %APP_KEY%" https://acp-api-async.amivoice.com/v1/recognitions/{session_id}
Response
{
"status": "completed",
"session_id": "0196ebcc30480a305bc99c92",
"service_id": "ami-webinar",
"audio_size": 410330,
"audio_md5": "225fa71f97ab44c56db5d9a542ac2180",
"segments": [
{
"results": [
{
"tokens": [
{
"written": "AmiVoice",
"confidence": 1,
"starttime": 488,
"endtime": 1128,
"spoken": "あみぼいす",
"label": "speaker0"
},
{
"written": "API",
"confidence": 1,
"starttime": 1128,
"endtime": 1736,
"spoken": "えーぴーあい",
"label": "speaker0"
},
{
"written": "は",
"confidence": 1,
"starttime": 1736,
"endtime": 2040,
"spoken": "は",
"label": "speaker0"
},
{
"written": "、",
"confidence": 0.5,
"starttime": 2040,
"endtime": 2072,
"spoken": "_",
"label": "speaker0"
},
{
"written": "音声",
"confidence": 1,
"starttime": 2072,
"endtime": 2600,
"spoken": "おんせい",
"label": "speaker0"
},
{
"written": "を",
"confidence": 1,
"starttime": 2600,
"endtime": 2664,
"spoken": "を",
"label": "speaker0"
},
{
"written": "テキスト",
"confidence": 1,
"starttime": 2664,
"endtime": 3128,
"spoken": "てきすと",
"label": "speaker0"
},
{
"written": "に",
"confidence": 1,
"starttime": 3128,
"endtime": 3224,
"spoken": "に",
"label": "speaker0"
},
{
"written": "変換",
"confidence": 1,
"starttime": 3224,
"endtime": 3672,
"spoken": "へんかん",
"label": "speaker0"
},
{
"written": "する",
"confidence": 1,
"starttime": 3672,
"endtime": 3992,
"spoken": "する",
"label": "speaker0"
},
{
"written": "音声",
"confidence": 1,
"starttime": 4024,
"endtime": 4392,
"spoken": "おんせい",
"label": "speaker0"
},
{
"written": "認識",
"confidence": 1,
"starttime": 4392,
"endtime": 4856,
"spoken": "にんしき",
"label": "speaker0"
},
{
"written": "API",
"confidence": 1,
"starttime": 4856,
"endtime": 5384,
"spoken": "えーぴーあい",
"label": "speaker0"
},
{
"written": "です",
"confidence": 1,
"starttime": 5384,
"endtime": 5784,
"spoken": "です",
"label": "speaker0"
},
{
"written": "。",
"confidence": 0.93,
"starttime": 5784,
"endtime": 5960,
"spoken": "_",
"label": "speaker0"
}
],
"confidence": 1,
"starttime": 200,
"endtime": 6040,
"tags": [],
"rulename": "",
"text": "AmiVoiceAPIは、音声をテキストに変換する音声認識APIです。"
}
],
"text": "AmiVoiceAPIは、音声をテキストに変換する音声認識APIです。"
},
{
"results": [
{
"tokens": [
{
"written": "AmiVoice",
"confidence": 1,
"starttime": 6688,
"endtime": 7344,
"spoken": "あみぼいす",
"label": "speaker0"
},
{
"written": "API",
"confidence": 1,
"starttime": 7344,
"endtime": 7872,
"spoken": "えーぴーあい",
"label": "speaker0"
},
{
"written": "に",
"confidence": 0.99,
"starttime": 7872,
"endtime": 8128,
"spoken": "に",
"label": "speaker0"
},
{
"written": "音声",
"confidence": 1,
"starttime": 8160,
"endtime": 8560,
"spoken": "おんせい",
"label": "speaker0"
},
{
"written": "を",
"confidence": 1,
"starttime": 8560,
"endtime": 8656,
"spoken": "を",
"label": "speaker0"
},
{
"written": "送信",
"confidence": 1,
"starttime": 8656,
"endtime": 9120,
"spoken": "そうしん",
"label": "speaker0"
},
{
"written": "する",
"confidence": 1,
"starttime": 9120,
"endtime": 9392,
"spoken": "する",
"label": "speaker0"
},
{
"written": "と",
"confidence": 1,
"starttime": 9392,
"endtime": 9616,
"spoken": "と",
"label": "speaker0"
},
{
"written": "、",
"confidence": 0.67,
"starttime": 9616,
"endtime": 9648,
"spoken": "_",
"label": "speaker0"
},
{
"written": "発話",
"confidence": 0.99,
"starttime": 9648,
"endtime": 10048,
"spoken": "はつわ",
"label": "speaker0"
},
{
"written": "内容",
"confidence": 1,
"starttime": 10048,
"endtime": 10464,
"spoken": "ないよう",
"label": "speaker0"
},
{
"written": "を",
"confidence": 1,
"starttime": 10464,
"endtime": 10528,
"spoken": "を",
"label": "speaker0"
},
{
"written": "テキスト",
"confidence": 1,
"starttime": 10528,
"endtime": 11040,
"spoken": "てきすと",
"label": "speaker0"
},
{
"written": "に",
"confidence": 1,
"starttime": 11040,
"endtime": 11168,
"spoken": "に",
"label": "speaker0"
},
{
"written": "した",
"confidence": 0.99,
"starttime": 11168,
"endtime": 11344,
"spoken": "した",
"label": "speaker0"
},
{
"written": "結果",
"confidence": 1,
"starttime": 11344,
"endtime": 11728,
"spoken": "けっか",
"label": "speaker0"
},
{
"written": "を",
"confidence": 1,
"starttime": 11728,
"endtime": 11808,
"spoken": "を",
"label": "speaker0"
},
{
"written": "返し",
"confidence": 0.86,
"starttime": 11808,
"endtime": 12160,
"spoken": "かえし",
"label": "speaker0"
},
{
"written": "ます",
"confidence": 0.86,
"starttime": 12160,
"endtime": 12560,
"spoken": "ます",
"label": "speaker0"
},
{
"written": "。",
"confidence": 0.94,
"starttime": 12560,
"endtime": 12704,
"spoken": "_",
"label": "speaker0"
}
],
"confidence": 0.997,
"starttime": 6400,
"endtime": 12800,
"tags": [],
"rulename": "",
"text": "AmiVoiceAPIに音声を送信すると、発話内容をテキストにした結果を返します。"
}
],
"text": "AmiVoiceAPIに音声を送信すると、発話内容をテキストにした結果を返します。"
}
],
"utteranceid": "20250520/12/0196ebcceab80a3051d239d0_20250520_124637",
"text": "AmiVoiceAPIは、音声をテキストに変換する音声認識APIです。AmiVoiceAPIに音声を送信すると、発話内容をテキストにした結果を返します。",
"code": "",
"message": ""
}
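The speaker diarization result may differ from the actual audio. In the response above, both utterances are labeled speaker0 even though the audio contains two speakers. Proceed to section 3 below, specify the parameters intended to improve accuracy, and check the result again.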
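One way to spot this situation programmatically is to collect the distinct `label` values in the asynchronous result and compare their count with the number of speakers you expect. This is a sketch under stated assumptions: `distinct_speakers` and `job_result` are hypothetical names, and the structure (`segments` → `results` → `tokens` → `label`) follows the response shown above.

```python
def distinct_speakers(job: dict) -> set[str]:
    """Return the set of speaker labels found in an asynchronous job result."""
    labels = set()
    for segment in job.get("segments", []):
        for result in segment.get("results", []):
            for token in result.get("tokens", []):
                if "label" in token:
                    labels.add(token["label"])
    return labels


# Abbreviated, hypothetical result: two segments, both labeled speaker0.
job_result = {
    "segments": [
        {"results": [{"tokens": [{"written": "です", "label": "speaker0"}]}]},
        {"results": [{"tokens": [{"written": "ます", "label": "speaker0"}]}]},
    ]
}

expected_speakers = 2  # assumption: known from the recording
found = distinct_speakers(job_result)
if len(found) < expected_speakers:
    print(f"Only {len(found)} speaker label(s) found: {sorted(found)}")
```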
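3. Specifying the number of speakers
Speakers with similar voices, such as speakers of the same gender, may not be distinguished well. Specifying the number of speakers may improve accuracy.
Run speech recognition through the asynchronous HTTP interface with speaker diarization enabled, and specify the minimum and maximum number of speakers in the conversation.
Request conditions
Speech recognition engine: 日本語・会話_汎用 (Japanese, general-purpose conversation; -a-general)
Audio file: diarization2.wav
Minimum number of speakers: 2
Maximum number of speakers: 2
For details on specifying the number of speakers, see the corresponding reference page.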
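The request that follows passes these conditions as space-separated key=value pairs in the `d` parameter. As an illustration, such a string could be assembled as below. The helper name `build_d_parameter` and its range check are assumptions of this sketch; the parameter names are taken from the request in this section.

```python
def build_d_parameter(engine: str, min_speakers: int, max_speakers: int) -> str:
    """Build the d parameter for an asynchronous request with a speaker range."""
    # Sanity check chosen for this sketch; the API may enforce its own limits.
    if not 1 <= min_speakers <= max_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    params = {
        "grammarFileNames": engine,
        "speakerDiarization": "True",
        "diarizationMinSpeaker": str(min_speakers),
        "diarizationMaxSpeaker": str(max_speakers),
    }
    # Pairs are joined with single spaces, as in the curl examples.
    return " ".join(f"{key}={value}" for key, value in params.items())


print(build_d_parameter("-a-general", 2, 2))
```

Setting the minimum and maximum to the same value, as in this section, fixes the speaker count rather than giving a range.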
Request and response example
Request
Linux / Mac
curl https://acp-api-async.amivoice.com/v1/recognitions \
-F u=$APP_KEY \
-F d="grammarFileNames=-a-general speakerDiarization=True diarizationMinSpeaker=2 diarizationMaxSpeaker=2" \
-F a=@diarization2.wav
curl -H "Authorization: Bearer $APP_KEY" \
https://acp-api-async.amivoice.com/v1/recognitions/{session_id}
Windows
curl https://acp-api-async.amivoice.com/v1/recognitions -F u=%APP_KEY% -F d="grammarFileNames=-a-general speakerDiarization=True diarizationMinSpeaker=2 diarizationMaxSpeaker=2" -F a=@diarization2.wav
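curl -H "Authorization: Bearer %APP_KEY%" https://acp-api-async.amivoice.com/v1/recognitions/{session_id}
Response
With the number of speakers fixed at 2, the tokens of the second utterance are labeled speaker1.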
{
"status": "completed",
"session_id": "0196ebd4c7f00a306b8f9c91",
"service_id": "ami-webinar",
"audio_size": 410330,
"audio_md5": "225fa71f97ab44c56db5d9a542ac2180",
"segments": [
{
"results": [
{
"tokens": [
{
"written": "AmiVoice",
"confidence": 1,
"starttime": 488,
"endtime": 1128,
"spoken": "あみぼいす",
"label": "speaker0"
},
{
"written": "API",
"confidence": 1,
"starttime": 1128,
"endtime": 1736,
"spoken": "えーぴーあい",
"label": "speaker0"
},
{
"written": "は",
"confidence": 1,
"starttime": 1736,
"endtime": 2040,
"spoken": "は",
"label": "speaker0"
},
{
"written": "、",
"confidence": 0.5,
"starttime": 2040,
"endtime": 2072,
"spoken": "_",
"label": "speaker0"
},
{
"written": "音声",
"confidence": 1,
"starttime": 2072,
"endtime": 2600,
"spoken": "おんせい",
"label": "speaker0"
},
{
"written": "を",
"confidence": 1,
"starttime": 2600,
"endtime": 2664,
"spoken": "を",
"label": "speaker0"
},
{
"written": "テキスト",
"confidence": 1,
"starttime": 2664,
"endtime": 3128,
"spoken": "てきすと",
"label": "speaker0"
},
{
"written": "に",
"confidence": 1,
"starttime": 3128,
"endtime": 3224,
"spoken": "に",
"label": "speaker0"
},
{
"written": "変換",
"confidence": 1,
"starttime": 3224,
"endtime": 3672,
"spoken": "へんかん",
"label": "speaker0"
},
{
"written": "する",
"confidence": 1,
"starttime": 3672,
"endtime": 3992,
"spoken": "する",
"label": "speaker0"
},
{
"written": "音声",
"confidence": 1,
"starttime": 4024,
"endtime": 4392,
"spoken": "おんせい",
"label": "speaker0"
},
{
"written": "認識",
"confidence": 1,
"starttime": 4392,
"endtime": 4856,
"spoken": "にんしき",
"label": "speaker0"
},
{
"written": "API",
"confidence": 1,
"starttime": 4856,
"endtime": 5384,
"spoken": "えーぴーあい",
"label": "speaker0"
},
{
"written": "です",
"confidence": 1,
"starttime": 5384,
"endtime": 5784,
"spoken": "です",
"label": "speaker0"
},
{
"written": "。",
"confidence": 0.93,
"starttime": 5784,
"endtime": 5960,
"spoken": "_",
"label": "speaker0"
}
],
"confidence": 1,
"starttime": 200,
"endtime": 6040,
"tags": [],
"rulename": "",
"text": "AmiVoiceAPIは、音声をテキストに変換する音声認識APIです。"
}
],
"text": "AmiVoiceAPIは、音声をテキストに変換する音声認識APIです。"
},
{
"results": [
{
"tokens": [
{
"written": "AmiVoice",
"confidence": 1,
"starttime": 6688,
"endtime": 7344,
"spoken": "あみぼいす",
"label": "speaker1"
},
{
"written": "API",
"confidence": 1,
"starttime": 7344,
"endtime": 7872,
"spoken": "えーぴーあい",
"label": "speaker1"
},
{
"written": "に",
"confidence": 0.99,
"starttime": 7872,
"endtime": 8128,
"spoken": "に",
"label": "speaker1"
},
{
"written": "音声",
"confidence": 1,
"starttime": 8160,
"endtime": 8560,
"spoken": "おんせい",
"label": "speaker1"
},
{
"written": "を",
"confidence": 1,
"starttime": 8560,
"endtime": 8656,
"spoken": "を",
"label": "speaker1"
},
{
"written": "送信",
"confidence": 1,
"starttime": 8656,
"endtime": 9120,
"spoken": "そうしん",
"label": "speaker1"
},
{
"written": "する",
"confidence": 1,
"starttime": 9120,
"endtime": 9392,
"spoken": "する",
"label": "speaker1"
},
{
"written": "と",
"confidence": 1,
"starttime": 9392,
"endtime": 9616,
"spoken": "と",
"label": "speaker1"
},
{
"written": "、",
"confidence": 0.67,
"starttime": 9616,
"endtime": 9648,
"spoken": "_",
"label": "speaker1"
},
{
"written": "発話",
"confidence": 0.99,
"starttime": 9648,
"endtime": 10048,
"spoken": "はつわ",
"label": "speaker1"
},
{
"written": "内容",
"confidence": 1,
"starttime": 10048,
"endtime": 10464,
"spoken": "ないよう",
"label": "speaker1"
},
{
"written": "を",
"confidence": 1,
"starttime": 10464,
"endtime": 10528,
"spoken": "を",
"label": "speaker1"
},
{
"written": "テキスト",
"confidence": 1,
"starttime": 10528,
"endtime": 11040,
"spoken": "てきすと",
"label": "speaker1"
},
{
"written": "に",
"confidence": 1,
"starttime": 11040,
"endtime": 11168,
"spoken": "に",
"label": "speaker1"
},
{
"written": "した",
"confidence": 0.99,
"starttime": 11168,
"endtime": 11344,
"spoken": "した",
"label": "speaker1"
},
{
"written": "結果",
"confidence": 1,
"starttime": 11344,
"endtime": 11728,
"spoken": "けっか",
"label": "speaker1"
},
{
"written": "を",
"confidence": 1,
"starttime": 11728,
"endtime": 11808,
"spoken": "を",
"label": "speaker1"
},
{
"written": "返し",
"confidence": 0.86,
"starttime": 11808,
"endtime": 12160,
"spoken": "かえし",
"label": "speaker1"
},
{
"written": "ます",
"confidence": 0.86,
"starttime": 12160,
"endtime": 12560,
"spoken": "ます",
"label": "speaker1"
},
{
"written": "。",
"confidence": 0.94,
"starttime": 12560,
"endtime": 12704,
"spoken": "_",
"label": "speaker1"
}
],
"confidence": 0.997,
"starttime": 6400,
"endtime": 12800,
"tags": [],
"rulename": "",
"text": "AmiVoiceAPIに音声を送信すると、発話内容をテキストにした結果を返します。"
}
],
"text": "AmiVoiceAPIに音声を送信すると、発話内容をテキストにした結果を返します。"
}
],
"utteranceid": "20250520/12/0196ebd584720a30971139d0_20250520_125601",
"text": "AmiVoiceAPIは、音声をテキストに変換する音声認識APIです。AmiVoiceAPIに音声を送信すると、発話内容をテキストにした結果を返します。",
"code": "",
"message": ""
}
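To read a result like the one above as a transcript, each segment can be paired with the label that most of its tokens carry. The sketch below assumes the response structure shown in this section; `segment_speakers` and `job_result_v2` are hypothetical names, and taking the majority label per segment is a simplification chosen here, not something the API prescribes.

```python
from collections import Counter


def segment_speakers(job: dict) -> list[tuple[str, str]]:
    """Pair each segment's text with its most frequent speaker label."""
    rows = []
    for segment in job.get("segments", []):
        counts = Counter(
            token["label"]
            for result in segment.get("results", [])
            for token in result.get("tokens", [])
            if "label" in token
        )
        # Fall back to "unknown" when a segment has no labeled tokens.
        speaker = counts.most_common(1)[0][0] if counts else "unknown"
        rows.append((speaker, segment.get("text", "")))
    return rows


# Abbreviated, hypothetical result in the same shape as the response above.
job_result_v2 = {
    "segments": [
        {
            "results": [{"tokens": [{"written": "です", "label": "speaker0"}]}],
            "text": "音声認識APIです。",
        },
        {
            "results": [{"tokens": [{"written": "ます", "label": "speaker1"}]}],
            "text": "結果を返します。",
        },
    ]
}

for speaker, text in segment_speakers(job_result_v2):
    print(f"{speaker}: {text}")
```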