Patent attributes
Audio signals representing a current utterance in a conversation and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can include predicting a dialog intent or overall topic of the conversation.