A program is presented to a user, the program including a stream of content that includes a first auditory content and a first visual content related to a workout. A context state of the user participating in the workout is determined based on information from a sensor. Based on the context state of the user, a responsive content is selectively added to the stream of content in a manner that avoids conflict with the stream of content.
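
The flow described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the `Segment` type, the heart-rate thresholds, the `RESPONSES` table, and the rule that "conflict" means overlapping audio are all assumptions made for illustration. The sketch derives a context state from one sensor reading and, if a response is warranted, places a spoken cue in the first audio gap long enough to hold it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    channel: str  # "audio" or "visual"
    start: float  # seconds from the start of the workout
    end: float
    label: str


def context_state(heart_rate_bpm: float,
                  target_low: float = 120.0,
                  target_high: float = 160.0) -> str:
    """Classify the user from one sensor reading (thresholds are assumed)."""
    if heart_rate_bpm < target_low:
        return "below_target"
    if heart_rate_bpm > target_high:
        return "above_target"
    return "on_target"


# Hypothetical responsive content: state -> (spoken cue, duration in seconds).
RESPONSES = {
    "below_target": ("Pick up the pace", 3.0),
    "above_target": ("Ease off a little", 3.0),
}


def add_responsive_content(stream: List[Segment], state: str, now: float,
                           horizon: float = 30.0) -> Optional[Segment]:
    """Add a cue to the stream without overlapping scheduled audio.

    Returns the added segment, or None when no response is warranted or no
    free audio gap begins early enough to fit within the horizon.
    """
    if state not in RESPONSES:
        return None  # selective: nothing is added when the user is on target
    label, duration = RESPONSES[state]

    # Audio segments that are still playing or still to come, in time order.
    busy = sorted((s for s in stream if s.channel == "audio" and s.end > now),
                  key=lambda s: s.start)

    cursor = now
    for seg in busy:
        if seg.start - cursor >= duration:
            break  # the gap before this segment is long enough
        cursor = max(cursor, seg.end)

    if cursor + duration > now + horizon:
        return None  # the cue would arrive too late to be relevant

    cue = Segment("audio", cursor, cursor + duration, label)
    stream.append(cue)
    return cue
```

In this sketch the visual channel is left untouched; a fuller treatment might also check visual overlays or lower the volume of existing audio instead of waiting for a gap.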