Patent 11495210 was granted by the United States Patent and Trademark Office and assigned to Microsoft in November 2022.
A method and system for detecting one or more speech features in speech audio data include receiving the speech audio data, preprocessing it for use as input to one or more models that detect speech features, providing the preprocessed data to a stacked machine learning (ML) model, and analyzing the preprocessed data with the stacked ML model to detect the one or more speech features. The stacked ML model includes a feature aggregation model, a sequence-to-sequence model, and a decision-making model.
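The flow described above (preprocess, then pass the result through three stacked stages) can be illustrated with a minimal Python sketch. The abstract does not say how any stage is implemented, so everything here is an assumption made for illustration: the function names, the frame size, the smoothing factor, and the threshold are hypothetical, and each stage is a toy stand-in (mean amplitude, exponential smoothing, thresholding) rather than the trained models the patent refers to.

```python
from typing import List, Sequence

Frame = List[float]


def preprocess(samples: Sequence[float], frame_size: int = 4) -> List[Frame]:
    """Hypothetical preprocessing: peak-normalize, then cut into fixed-size frames."""
    peak = max((abs(s) for s in samples), default=0.0) or 1.0  # avoid dividing by zero
    normalized = [s / peak for s in samples]
    # Drop any trailing partial frame so every frame has the same length.
    return [
        normalized[i:i + frame_size]
        for i in range(0, len(normalized) - frame_size + 1, frame_size)
    ]


def aggregate_features(frames: List[Frame]) -> List[float]:
    """Stand-in for the feature aggregation model: mean absolute amplitude per frame."""
    return [sum(abs(x) for x in frame) / len(frame) for frame in frames]


def sequence_model(features: List[float], alpha: float = 0.5) -> List[float]:
    """Stand-in for the sequence-to-sequence model: exponential smoothing,
    which maps an input sequence to an output sequence of the same length."""
    outputs: List[float] = []
    state = 0.0
    for value in features:
        state = alpha * value + (1 - alpha) * state
        outputs.append(state)
    return outputs


def decide(scores: List[float], threshold: float = 0.3) -> List[bool]:
    """Stand-in for the decision-making model: flag frames whose score meets a threshold."""
    return [score >= threshold for score in scores]


def detect_speech_features(samples: Sequence[float]) -> List[bool]:
    """Run the stages in the order the abstract lists them."""
    frames = preprocess(samples)
    features = aggregate_features(frames)
    scores = sequence_model(features)
    return decide(scores)
```

With a quiet frame, a loud frame, and another quiet frame as input, `detect_speech_features([0.0] * 4 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 4)` flags only the middle frame. The point of the sketch is the staging, where each stage consumes the previous stage's output; in a real system each stand-in would be replaced by a trained model.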