Patent attributes
Systems, methods, and devices for outputting visual indications regarding voice-based interactions are described. A first speech-controlled device detects spoken audio corresponding to a voice message to a second speech-controlled device. The first device captures the audio and sends audio data corresponding to the captured audio to a server. The server performs speech processing on the audio data to determine a recipient and message content. The server then determines a second speech-controlled device associated with the recipient and sends the message content to the recipient's second speech-controlled device. Thereafter, the server receives an indication from the recipient's speech-controlled device that the second device is detecting speech, presumably in response to the original message. The server then causes a visual indication to be output by the first speech-controlled device, with the visual indication representing the recipient-speech controlled device is detecting speech.