Patent attributes
This disclosure describes, in part, techniques for determining device groupings, or clusters, for multiple voice-enabled devices. The device clusters may be determined based on metadata data for audio signals (or audio data) generated by each of the multiple voice-enabled devices. For example, a remote system may analyze timestamp data for the audio signals received from the devices, and determine that the devices detected the same voice command of a user based on the timestamp data indicating that the audio signals were received within a threshold period of time from each other. Additionally, the remote system may analyze other metadata of the audio data, such as signal-to-noise (SNR) values, and determine that the SNR values are within a threshold value. The remote system may determine device clusters for the voice-enabled devices of a user based on these, and potentially other, types of metadata of the audio signals.