Patent attributes
In embodiments, apparatuses, methods and storage media are described that are associated with training adaptive speech recognition systems (“ASR”) using audio and text obtained from captioned video. In various embodiments, the audio and caption may be aligned for identification, such as according to a start and end time associated with a caption, and the alignment may be adjusted to better fit audio to a given caption. In various embodiments, the aligned audio and caption may then be used for training if an error value associated with the audio and caption demonstrates that the audio and caption will aid in training the ASR. In various embodiments, filters may be used on audio and text prior to training. Such filters may be used to exclude potential training audio and text based on filter criteria. Other embodiments may be described and claimed.