Patent attributes
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a hierarchical recurrent neural network (HRNN) having a plurality of parameters on a plurality of training acoustic sequences to generate phoneme representations of received acoustic sequences. One method includes, for each of the received training acoustic sequences: processing the received acoustic sequence in accordance with current values of the parameters of the HRNN to generate a predicted grapheme representation of the received acoustic sequence; processing an intermediate output generated by an intermediate layer of the HRNN during the processing of the received acoustic sequence to generate one or more predicted phoneme representations of the received acoustic sequence; and adjusting the current values of the parameters of the HRNN based on (i) the predicted grapheme representation and (ii) the one or more predicted phoneme representations.
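The training step described above can be illustrated with a small, dependency-free sketch. It is not the claimed implementation. The abstract does not specify layer types, loss functions, or the optimizer, so everything here is an assumption made for illustration: two stacked tanh recurrent layers stand in for the HRNN, frame-level softmax cross-entropy stands in for the unspecified loss, the phoneme loss weight of 0.5 is arbitrary, and finite-difference gradients replace backpropagation to keep the example short. All names (`rnn_layer`, `combined_loss`, `sgd_step`, `W_phone`, and so on) are hypothetical.

```python
import math
import random

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rnn_layer(W_in, W_rec, inputs):
    """Simple tanh recurrent layer; returns one hidden vector per frame."""
    h = [0.0] * len(W_rec)
    outputs = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
        h = [math.tanh(p) for p in pre]
        outputs.append(h)
    return outputs

def cross_entropy(W_out, hiddens, targets):
    """Mean frame-level softmax cross-entropy (assumed stand-in loss)."""
    total = 0.0
    for h, t in zip(hiddens, targets):
        logits = matvec(W_out, h)
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[t]
    return total / len(targets)

def combined_loss(params, frames, graphemes, phonemes, phoneme_weight=0.5):
    lower = rnn_layer(params["W_in1"], params["W_rec1"], frames)  # intermediate layer
    upper = rnn_layer(params["W_in2"], params["W_rec2"], lower)   # top layer
    grapheme_loss = cross_entropy(params["W_graph"], upper, graphemes)
    # The phoneme prediction is read from the intermediate layer's output.
    phoneme_loss = cross_entropy(params["W_phone"], lower, phonemes)
    return grapheme_loss + phoneme_weight * phoneme_loss

def sgd_step(params, frames, graphemes, phonemes, lr=0.05, eps=1e-5):
    """One parameter update from central finite-difference gradients (toy sizes only)."""
    grads = {}
    for name, W in params.items():
        g = [[0.0] * len(row) for row in W]
        for i, row in enumerate(W):
            for j in range(len(row)):
                original = row[j]
                row[j] = original + eps
                up = combined_loss(params, frames, graphemes, phonemes)
                row[j] = original - eps
                down = combined_loss(params, frames, graphemes, phonemes)
                row[j] = original
                g[i][j] = (up - down) / (2 * eps)
        grads[name] = g
    for name, W in params.items():
        for i, row in enumerate(W):
            for j in range(len(row)):
                row[j] -= lr * grads[name][i][j]

def init_params(n_feat, n_hidden, n_graphemes, n_phonemes, seed=0):
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_in1": mat(n_hidden, n_feat), "W_rec1": mat(n_hidden, n_hidden),
        "W_in2": mat(n_hidden, n_hidden), "W_rec2": mat(n_hidden, n_hidden),
        "W_graph": mat(n_graphemes, n_hidden), "W_phone": mat(n_phonemes, n_hidden),
    }

# Made-up toy data: 4 acoustic frames of 3 features, with per-frame label indices.
frames = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.4, 0.1, 0.2], [-0.3, 0.2, 0.0]]
graphemes = [0, 1, 1, 2]
phonemes = [1, 1, 0, 0]

params = init_params(n_feat=3, n_hidden=4, n_graphemes=3, n_phonemes=2)
before = combined_loss(params, frames, graphemes, phonemes)
for _ in range(20):
    sgd_step(params, frames, graphemes, phonemes)
after = combined_loss(params, frames, graphemes, phonemes)
```

Because the phoneme loss is computed from the lower layer's output, its gradient reaches only the lower-layer parameters and the phoneme head, while the grapheme loss reaches every layer. A practical system would likely use a sequence-level loss and automatic differentiation in place of the simplifications used here.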