Patent attributes
A live stream of a presenter, which includes a video stream and an audio stream, is monitored. The live stream is attended by an audience that includes one or more audience members. One or more stream content features of the live stream at a first window of time are transmitted to a multimodal machine learning model. One or more audience content features of the audience at the first window of time are transferred to the multimodal machine learning model. One or more feature results of the first window of time, based on the stream content features and on the audience content features, are obtained from the multimodal machine learning model. The feature results are sent to an auditory machine learning model. A first audio signal is received from the auditory machine learning model. An augmented stream of the first window of time is generated based on the first audio signal.
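The sequence of steps described above can be illustrated with a minimal Python sketch. Everything named here is hypothetical and is not taken from the patent: the `Window` container, the two stub model classes, their `infer` and `synthesize` methods, and the feature keys are all assumptions. The stubs use placeholder arithmetic in place of real machine learning models, and mixing the returned audio signal into the presenter audio by sample-wise addition is only one possible reading of "generated based on the first audio signal".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Window:
    """One window of time of the live stream (hypothetical container)."""
    stream_features: Dict[str, float]    # features of the presenter's video/audio
    audience_features: Dict[str, float]  # features of the audience members
    audio: List[float]                   # presenter audio samples in this window


class StubMultimodalModel:
    """Stand-in for the multimodal machine learning model (not a real model)."""

    def infer(self, stream_features: Dict[str, float],
              audience_features: Dict[str, float]) -> Dict[str, float]:
        # Placeholder logic: average all feature values into one score.
        values = list(stream_features.values()) + list(audience_features.values())
        score = sum(values) / len(values) if values else 0.0
        return {"score": score}


class StubAuditoryModel:
    """Stand-in for the auditory machine learning model (not a real model)."""

    def synthesize(self, feature_results: Dict[str, float],
                   num_samples: int) -> List[float]:
        # Placeholder logic: a constant signal scaled by the score.
        return [0.1 * feature_results["score"]] * num_samples


def augment_window(window: Window,
                   multimodal: StubMultimodalModel,
                   auditory: StubAuditoryModel) -> List[float]:
    """Run one window of time through both models and return augmented audio."""
    # Stream and audience content features go to the multimodal model,
    # which returns the feature results for this window.
    feature_results = multimodal.infer(window.stream_features,
                                       window.audience_features)
    # The feature results go to the auditory model, which returns an audio signal.
    audio_signal = auditory.synthesize(feature_results, len(window.audio))
    # Assumption: the augmented stream is formed by adding the signal
    # to the presenter audio sample by sample.
    return [a + b for a, b in zip(window.audio, audio_signal)]


if __name__ == "__main__":
    window = Window(
        stream_features={"speech_rate": 0.5},   # hypothetical feature
        audience_features={"attention": 1.0},   # hypothetical feature
        audio=[0.0, 0.5, -0.5],
    )
    print(augment_window(window, StubMultimodalModel(), StubAuditoryModel()))
```

In this sketch each window of time is processed independently, so a live system could call `augment_window` once per window as the stream advances.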