A system is provided for using voice characteristics in determining a user intent corresponding to an utterance. The system processes a natural language understanding (NLU) hypothesis and voice characteristics data, using a trained model, to determine an alternate NLU hypothesis based on the voice characteristics data. The voice characteristics data may indicate a user's level of uncertainty when speaking the utterance, an age group of the user, a sentiment of the user when speaking the utterance, and other data.
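
The flow described above can be illustrated with a minimal sketch. It is not the system's implementation: a small hand-written rule table stands in for the trained model, and every name here (`NLUHypothesis`, `VoiceCharacteristics`, `alternate_hypothesis`, the intent labels, the age-group labels, and the 0.7 uncertainty threshold) is a hypothetical chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NLUHypothesis:
    intent: str   # hypothetical intent label
    score: float  # confidence of the original NLU hypothesis


@dataclass(frozen=True)
class VoiceCharacteristics:
    uncertainty: float  # assumed scale: 0.0 (certain) to 1.0 (very uncertain)
    age_group: str      # assumed labels, e.g. "child" or "adult"
    sentiment: str      # assumed labels, e.g. "neutral"; unused in this sketch


# Hypothetical lookup standing in for a trained model: maps
# (original intent, age group) to an alternate intent.
ALTERNATES = {
    ("PlayMusicIntent", "child"): "PlayKidsMusicIntent",
}

UNCERTAINTY_THRESHOLD = 0.7  # assumed cutoff, not taken from the source


def alternate_hypothesis(hyp: NLUHypothesis,
                         voice: VoiceCharacteristics) -> NLUHypothesis:
    """Return an alternate hypothesis based on the voice characteristics."""
    # A highly uncertain speaker is routed to a clarification intent,
    # with the score discounted by the uncertainty level.
    if voice.uncertainty >= UNCERTAINTY_THRESHOLD:
        return NLUHypothesis("ClarifyIntent",
                             hyp.score * (1.0 - voice.uncertainty))
    # Otherwise, look for an age-group-specific alternate intent.
    alt_intent = ALTERNATES.get((hyp.intent, voice.age_group))
    if alt_intent is not None:
        return NLUHypothesis(alt_intent, hyp.score)
    # No alternate applies: keep the original hypothesis.
    return hyp
```

In this sketch, a `PlayMusicIntent` hypothesis paired with an age group of `"child"` is replaced by `PlayKidsMusicIntent`, while the same hypothesis from an `"adult"` speaker is returned unchanged. A real system would learn such mappings from data instead of hard-coding them.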