Patent attributes
Implementations are disclosed for predicting video start times that maximize user engagement. A method includes applying a machine-learned model to audio-visual content features of segments of a target content item, wherein the machine-learned model is trained on user interaction signals and audio-visual content features of a training set of content item segments; calculating, based on applying the machine-learned model, a salience score for each of the segments of the target content item; and selecting, based on the calculated salience scores, one of the segments of the target content item as a starting point for playback of the target content item.
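The three steps of the method (apply a trained model to per-segment features, compute a salience score per segment, pick the best-scoring segment as the start point) can be sketched as below. This is a minimal illustration under stated assumptions, not the claimed implementation: the abstract does not specify the model type, the features, or the engagement signal. The sketch assumes each segment is described by a fixed-length numeric feature vector, that the user interaction signal is a single number per training segment (for example, a retention rate), and that a plain linear model fitted by gradient descent stands in for the machine-learned model. The names `LinearSalienceModel`, `train_model`, and `select_start_segment` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LinearSalienceModel:
    """Hypothetical stand-in for the machine-learned model: a linear scorer."""
    weights: List[float]
    bias: float = 0.0

    def score(self, features: Sequence[float]) -> float:
        # Salience score = bias + dot(weights, audio-visual features).
        return self.bias + sum(w * x for w, x in zip(self.weights, features))


def train_model(
    features: Sequence[Sequence[float]],
    engagement: Sequence[float],
    lr: float = 0.1,
    epochs: int = 2000,
) -> LinearSalienceModel:
    """Fit the scorer by batch gradient descent on mean squared error, so its
    score approximates an assumed per-segment engagement signal."""
    n = len(features)
    n_features = len(features[0])
    model = LinearSalienceModel(weights=[0.0] * n_features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, engagement):
            err = model.score(x) - y
            for j in range(n_features):
                grad_w[j] += 2.0 * err * x[j] / n
            grad_b += 2.0 * err / n
        model.weights = [w - lr * g for w, g in zip(model.weights, grad_w)]
        model.bias -= lr * grad_b
    return model


def select_start_segment(
    model: LinearSalienceModel, segment_features: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Score every segment of the target item and return the index of the
    highest-scoring segment together with all scores."""
    scores = [model.score(x) for x in segment_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Toy training set: two made-up features per segment, e.g. [audio energy, motion].
    train_x = [[0.1, 0.2], [0.8, 0.9], [0.5, 0.4], [0.9, 0.1], [0.2, 0.7]]
    train_y = [0.3 * a + 0.6 * b for a, b in train_x]  # synthetic engagement labels
    model = train_model(train_x, train_y)

    target = [[0.1, 0.1], [0.3, 0.9], [0.7, 0.3]]  # segments of the target item
    best, scores = select_start_segment(model, target)
    print(best)  # expected: 1
```

If segments have a fixed duration, the playback offset follows directly as `best * segment_seconds` (a hypothetical parameter). A linear scorer keeps the sketch dependency-free; in practice any regressor that maps segment features to a predicted engagement value could fill the same role, since the selection step only needs a comparable score per segment.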