[AmiVoice API] Connection errors in synchronous HTTP and WebSocket interfaces (updated 2025-01-24 17:35)
On January 21, 2025, requests sent via the synchronous HTTP and WebSocket interfaces were unable to connect.
◆Date and time of occurrence
January 21, 2025 (Tuesday) 04:07 - 15:43
The problem was most pronounced during the following periods:
・Around 09:12 to 11:38
・Around 15:42 to 15:43
◆Phenomenon
During the above period, speech recognition requests that sent 8k audio to the "Conversational_General" speech recognition engine could not connect, or could connect only with great difficulty. In these cases, one of the following errors was returned to the client program. For details of the errors, please refer to [1] in the Reference section.
s can't connect to recognizer server
s can't connect to recognizer server (can't connect to server)
s can't connect to recognizer server (can't find available servers because maximum allowed clients has reached)
During this time, sessions that were already connected and performing speech recognition also received slow responses, and communication timeouts occurred between the client and server. As a result, client applications received only part of the speech recognition results.
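As an illustration of how a client program might react to the errors listed above, the following is a minimal sketch that retries the start of a session with exponential backoff. It assumes the error text arrives as the response to the s command exactly as shown above. The names `is_connect_failure`, `backoff_delays`, `start_with_retry`, and the injected `send_s_command` callable are hypothetical, and the retry counts and delays are illustrative values, not vendor recommendations.

```python
import time
from typing import Callable, Iterator

# Common prefix of the three error messages listed in this notice.
CONNECT_ERROR_PREFIX = "s can't connect to recognizer server"


def is_connect_failure(response: str) -> bool:
    """Return True if the s command response is one of the listed connection errors."""
    return response.startswith(CONNECT_ERROR_PREFIX)


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield exponentially growing delays (base, 2*base, 4*base, ...) capped at `cap`."""
    for i in range(attempts):
        yield min(cap, base * 2 ** i)


def start_with_retry(
    send_s_command: Callable[[], str],
    attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send_s_command up to `attempts` times, waiting between connection failures.

    `send_s_command` is a hypothetical callable that opens a connection,
    sends the s command, and returns the server's response text.
    The last response is returned whether or not it is an error.
    """
    response = send_s_command()
    for delay in backoff_delays(attempts - 1):
        if not is_connect_failure(response):
            break
        sleep(delay)
        response = send_s_command()
    return response
```

Spacing the retries out, rather than reconnecting immediately, avoids adding further load to servers that are already refusing connections.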
◆Details of the phenomenon and its cause
The cause of the problem was that the load balancer for the speech recognition servers was not functioning properly, causing the load to be concentrated on some of the servers. As a result, the servers with the concentrated load experienced delays in speech recognition processing and were unable to accept new connections. Approximately 4.7% of the servers were so overloaded that they were unable to function.
Details:
1. When access was concentrated, a specific speech recognition server (hereinafter referred to as DSRS) was not reporting its number of connections to the load balancer server (hereinafter referred to as DSRM).
2. DSRM selects a DSRS with fewer connections to process each new request. Therefore, when the above problem occurs, a DSRS that actually holds many connections appears to hold few, and requests are sent one after another to that specific DSRS.
3. More clients than expected connected to this particular DSRS, which either exhausted the instance's resources or reached the limit on the number of clients connected to a DSRS, so it could not accept new connections.
4. As a result, speech recognition processing on the affected DSRS was delayed, and responses to clients were significantly delayed. Sessions were then terminated by timeouts between the server and client, and results were only partially returned.
5. In addition, DSRS was sometimes forcibly terminated due to a lack of memory, which disconnected client sessions midway and returned only partial results.
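The failure mode in steps 1 to 3 can be reproduced with a small simulation. This is a simplified sketch, not the actual DSRM implementation: the `Server` class, the server names, and the starting counts are hypothetical, and the balancer is assumed to pick the server with the lowest reported connection count.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Server:
    name: str
    actual: int = 0          # connections the server really holds
    reported: int = 0        # last count the balancer received
    reporting: bool = True   # False models a server that stopped reporting


def pick_least_connections(servers: List[Server]) -> Server:
    """The balancer only sees the reported counts, never the actual ones."""
    return min(servers, key=lambda s: s.reported)


def simulate(servers: List[Server], requests: int) -> Dict[str, int]:
    """Dispatch `requests` new connections and return the actual load per server."""
    for _ in range(requests):
        target = pick_least_connections(servers)
        target.actual += 1
        if target.reporting:
            # A healthy server keeps the balancer's view up to date.
            target.reported = target.actual
    return {s.name: s.actual for s in servers}


healthy = [Server("dsrs-a", 2, 2), Server("dsrs-b", 5, 5), Server("dsrs-c", 5, 5)]
print(simulate(healthy, 30))   # {'dsrs-a': 14, 'dsrs-b': 14, 'dsrs-c': 14}

faulty = [Server("dsrs-a", 2, 2, reporting=False),
          Server("dsrs-b", 5, 5), Server("dsrs-c", 5, 5)]
print(simulate(faulty, 30))    # {'dsrs-a': 32, 'dsrs-b': 5, 'dsrs-c': 5}
```

In the second run the count for `dsrs-a` stays frozen at 2 in the balancer's view, so every new request is routed to it while the other servers remain idle.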
After the abnormal termination, auto-scaling started a new DSRS, so the system recovered automatically without a shortage of DSRS capacity. In addition, each DSRS periodically sends its connection information to DSRM. Once an affected DSRS had either terminated abnormally or finished its processing and returned to normal, the information held by DSRM was updated correctly, and the system recovered automatically over time.
◆Measures
As a short-term measure, if load concentrates on a speech recognition server, we will detach that server from the load balancer. The load balancer (DSRM) is a proprietary program developed by our company. We will also modify its algorithm so that load does not concentrate on a specific speech recognition server even when connection information cannot be obtained from that server.
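The notice does not describe how the algorithm will be changed. Purely as an illustration of one possible approach, the sketch below has the balancer count the connections it dispatched itself since the last report and prefer servers whose reports are recent. `ServerView`, `STALE_AFTER`, `record_report`, and `pick` are hypothetical names, and the threshold is an assumed value.

```python
from dataclasses import dataclass
from typing import List

STALE_AFTER = 10.0  # seconds without a report before a server is considered stale (assumed value)


@dataclass
class ServerView:
    """The balancer's view of one recognition server."""
    name: str
    reported: int = 0                  # last connection count received
    last_report: float = 0.0           # time of the last report, in seconds
    dispatched_since_report: int = 0   # connections sent since that report


def record_report(view: ServerView, count: int, now: float) -> None:
    """Apply a fresh report and reset the local dispatch counter."""
    view.reported = count
    view.last_report = now
    view.dispatched_since_report = 0


def estimated_load(view: ServerView) -> int:
    """Reported count plus what the balancer itself has sent since then."""
    return view.reported + view.dispatched_since_report


def pick(views: List[ServerView], now: float) -> ServerView:
    """Prefer servers with recent reports; fall back to all servers if none are fresh."""
    fresh = [v for v in views if now - v.last_report <= STALE_AFTER]
    candidates = fresh or views
    chosen = min(candidates, key=estimated_load)
    chosen.dispatched_since_report += 1
    return chosen
```

Because the estimate grows with every dispatch, a server that stops reporting can no longer look permanently idle, even in the fallback case where every report is stale.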
◆Reference
[1]: AmiVoice API Manual "List of server errors for s command packets/s command response packets": https://docs.amivoice.com/en/amivoice-api/manual/reference-websocket-s-command-packet#server-error
--------------
The article as it read before the above update is as follows:
(Published 2025-01-21 13:51 / Updated 2025-01-22 8:31)
Today, we encountered an issue where requests sent via the synchronous HTTP and WebSocket interfaces were unable to connect.
◆Phenomenon
During the following period, there were issues with connection when sending 8K audio to "Conversation_General".
(Added 2025-01-22 8:31)
Not only was it difficult to connect, but sessions that were already connected during this period also experienced slower responses and communication timeouts between the client and server, which meant that the client application could only partially receive speech recognition results.
◆Time when the connection error occurred
・2025/01/21 around 09:12 to around 11:38 (preliminary)
・2025/01/21 around 15:42 to around 15:43 (preliminary)
◆Current situation
Currently this problem is not occurring.
◆Cause
The cause is under investigation.
We deeply apologize for any inconvenience and concern this may have caused. We are currently investigating the cause and scope of the impact, and will update you as soon as we have more information.