Patent attributes
Machine classifiers in accordance with embodiments of the invention capture long-term temporal dependencies in dialogue data better than existing recurrent neural network-based architectures. Additionally, the machine classifiers may model the joint distribution of the context and response, rather than the conditional distribution of the response given the context that sequence-to-sequence frameworks model. Further, input data may be bidirectionally encoded using both forward and backward separators. The forward and backward representations of the input data may be used to train the machine classifiers using a single generative model and/or parameters shared between the encoder and decoder of the machine classifier. During inference, the backward model may be used to reevaluate previously generated output sequences, and the forward model may be used to generate an output sequence based on the previously generated output sequences.
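By way of illustration only, the bidirectional encoding and the reevaluation step may be sketched as follows. The sketch assumes that the forward representation places the context before a forward separator and the response after it, that the backward representation reverses that order around a backward separator, and that reevaluation amounts to ordering candidate responses by a backward score. The separator strings, the function names, and the toy scoring function are hypothetical and are not taken from the disclosure.

```python
from typing import Callable, List, Sequence

# Hypothetical separator tokens; the disclosure does not name them.
FWD_SEP = "<fwd>"
BWD_SEP = "<bwd>"


def forward_example(context: Sequence[str], response: Sequence[str]) -> List[str]:
    """Forward representation (assumed layout): context, separator, response."""
    return [*context, FWD_SEP, *response]


def backward_example(context: Sequence[str], response: Sequence[str]) -> List[str]:
    """Backward representation (assumed layout): response, separator, context."""
    return [*response, BWD_SEP, *context]


def rerank(
    context: Sequence[str],
    candidates: Sequence[Sequence[str]],
    backward_score: Callable[[List[str]], float],
) -> List[Sequence[str]]:
    """Order candidate responses by their backward score, best first."""
    return sorted(
        candidates,
        key=lambda response: backward_score(backward_example(context, response)),
        reverse=True,
    )


def toy_backward_score(tokens: List[str]) -> float:
    """Stand-in for a trained backward model: counts distinct tokens shared
    by the response and the context. Purely illustrative."""
    split = tokens.index(BWD_SEP)
    response, context = tokens[:split], tokens[split + 1:]
    return float(len(set(response) & set(context)))


context = ["where", "is", "the", "station"]
candidates = [
    ["i", "do", "not", "know"],
    ["the", "station", "is", "nearby"],
]
ranked = rerank(context, candidates, toy_backward_score)
```

Because both representations are ordinary token sequences that differ only in ordering and separator, a single generative model with shared parameters could in principle be trained on both. In practice, the toy scoring function would be replaced by the score that a trained backward model assigns to the backward sequence.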