A device with a microphone acquires audio data of a user's speech. The speech comprises individual utterances that together make up a session. The audio data is processed to determine sentiment data indicative of the perceived emotional content conveyed by each individual utterance of the user. This per-utterance sentiment data is then used to determine the emotional content of the session as a whole. For example, the session-level information may include several words describing the overall emotions of the session as well as any outlying emotions. Numeric metrics may also be determined, such as activation and valence. A user interface may present the words and metrics to the user. The user may use this information to assess their state of mind, facilitate interactions with others, and so forth.
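
The following is a minimal sketch of how per-utterance sentiment data might be rolled up into a session-level summary of this kind. The data structures, field names, and aggregation rules (dominant labels as overall emotions, rarely occurring labels as outliers, averaged activation and valence) are illustrative assumptions, not the specific implementation described above.

```python
# Hypothetical sketch: aggregate per-utterance sentiment into a session-level
# summary with descriptive emotion words and numeric activation/valence metrics.
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class UtteranceSentiment:
    """Sentiment inferred for a single utterance (names are illustrative)."""
    emotion: str       # e.g. "calm", "excited", "frustrated"
    activation: float  # arousal level, assumed range -1.0 .. 1.0
    valence: float     # pleasantness, assumed range -1.0 .. 1.0


def summarize_session(utterances: list[UtteranceSentiment]) -> dict:
    """Roll per-utterance sentiment up into session-level emotional content."""
    counts = Counter(u.emotion for u in utterances)
    top_count = counts.most_common(1)[0][1]
    # Labels tied for the highest count describe the overall emotion of the
    # session; any other observed labels are treated here as outlying emotions.
    overall = [emotion for emotion, n in counts.items() if n == top_count]
    outliers = [emotion for emotion in counts if emotion not in overall]
    return {
        "overall_emotions": overall,
        "outlying_emotions": outliers,
        "mean_activation": mean(u.activation for u in utterances),
        "mean_valence": mean(u.valence for u in utterances),
    }


if __name__ == "__main__":
    session = [
        UtteranceSentiment("calm", activation=-0.2, valence=0.4),
        UtteranceSentiment("calm", activation=-0.1, valence=0.5),
        UtteranceSentiment("frustrated", activation=0.7, valence=-0.6),
    ]
    # A user interface could render these words and metrics to the user.
    print(summarize_session(session))
```

In this sketch the session summary pairs descriptive words (overall and outlying emotion labels) with numeric metrics (mean activation and valence), mirroring the combination of words and metrics that the user interface may present.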