Patent attributes
A recurrent neural network-transducer (RNN-T) model improves speech recognition by processing sequential non-blank symbols at each time step after an initial one. The model's prediction network receives a sequence of symbols from a final Softmax layer and employs a shared embedding matrix to create and map embeddings to each symbol, associating them with unique position vectors. These embeddings are weighted according to their similarity to their matching position vector. Subsequently, a joint network of the RNN-T model uses these weighted embeddings to output a probability distribution for potential speech recognition hypotheses at each time step, enabling more accurate transcriptions of spoken language.