Patent attributes
A neural network model, such as a deep neural network (DNN), is trained using many speech examples to perform beam selection in a microphone array-based speech processing system. The DNN is trained using many different speech examples that are labeled with position or direction information relative to a training microphone array. The DNN may then be trained to recognize a direction of incoming speech so that at runtime the trained DNN may process input audio data from a microphone array and may output to a beam selector an indicator of the desired beam that may be selected for further processing. The DNN may be configured to output a beam index and/or coordinates (or other position data) corresponding to an estimated location of the detected speech. The DNN may also be configured to output acoustic unit data corresponding to speech units (for example corresponding to phonemes, senons, etc. such as those of a detected wakeword or other word).