A speech-processing system receives both text data and natural-understanding data (e.g., a domain, intent, and/or entity) related to a command represented in the text data. The system uses the natural-understanding data to vary vocal characteristics in determining spectrogram data corresponding to the text data based on the natural-understanding data.