Patent attributes
A speech interface device is configured to process user speech by: storing, in volatile memory of the speech interface device, audio data that represents the user speech; inputting first audio data, of the stored audio data, to an automatic speech recognition (ASR) component of the speech interface device; determining that a criterion is satisfied; and, based on the criterion being satisfied, maintaining second audio data in the volatile memory. The ASR component may generate text data based on the first audio data, and a natural language understanding (NLU) component of the speech interface device may generate NLU data based on the text data. If the NLU data corresponds to a recognized intent, the second audio data may be deleted. Otherwise, speech processing can be resumed by inputting the second audio data to the ASR component.
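
The control flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every name in it (process_speech, first_len, criterion_satisfied, fake_asr, fake_nlu) is hypothetical, a plain Python list stands in for the volatile memory, and two details are assumptions that the abstract does not state: that the second audio data is the remainder of the stored audio data, and that on resumption the text from the second audio data is appended to the text from the first.

```python
from typing import Callable, List, Optional, Tuple

Audio = List[bytes]  # stand-in for audio data held in volatile memory


def process_speech(
    stored_audio: Audio,
    first_len: int,
    asr: Callable[[Audio], str],
    nlu: Callable[[str], Optional[str]],
    criterion_satisfied: Callable[[], bool],
) -> Tuple[Optional[str], Audio]:
    """Return (recognized intent or None, second audio data still retained)."""
    # Assumption: the first `first_len` chunks are the "first audio data"
    # and the remaining chunks are the "second audio data".
    first_audio = stored_audio[:first_len]

    # Maintain the second audio data only when the criterion is satisfied.
    second_audio = stored_audio[first_len:] if criterion_satisfied() else []

    text = asr(first_audio)   # ASR: first audio data -> text data
    intent = nlu(text)        # NLU: text data -> intent, or None

    if intent is not None:
        # Recognized intent: the retained second audio data may be deleted.
        second_audio = []
        return intent, second_audio

    if second_audio:
        # No recognized intent: resume by inputting the second audio data
        # to ASR. Appending the new text is an assumption of this sketch.
        text += asr(second_audio)
        intent = nlu(text)
    return intent, second_audio


# Hypothetical stand-ins for the on-device ASR and NLU components.
def fake_asr(audio: Audio) -> str:
    return b"".join(audio).decode("ascii")


def fake_nlu(text: str) -> Optional[str]:
    return {"play music": "PlayMusic", "stop": "Stop"}.get(text.strip())
```

With these stand-ins, calling process_speech([b"play ", b"music"], 1, fake_asr, fake_nlu, lambda: True) finds no intent in the first audio data alone and resumes with the retained second audio data. An utterance whose first audio data already yields an intent, such as [b"stop", b" noise"], causes the second audio data to be discarded.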