Patent attributes
Systems and methods can be implemented to conduct speaker-invariant training for speech recognition in a variety of applications. An adversarial multi-task learning scheme for speaker-invariant training can be implemented, aiming at actively curtailing the inter-talker feature variability, while maximizing its senone discriminability to enhance the performance of a deep neural network (DNN) based automatic speech recognition system. In speaker-invariant training, a DNN acoustic model and a speaker classifier network can be jointly optimized to minimize the senone (triphone state) classification loss, and simultaneously mini-maximize the speaker classification loss. A speaker invariant and senone-discriminative intermediate feature is learned through this adversarial multi-task learning, which can be applied to an automatic speech recognition system. Additional systems and methods are disclosed.