Patent attributes
Implementations are disclosed for predicting video start times that maximize user engagement. A method includes receiving a first content item comprising content item segments and processing the first content item using a trained machine learning model. The model is trained based on interaction signals and audio-visual content features of a training set of training segments of training content items. The method further includes obtaining, based on that processing, one or more outputs comprising salience scores for the content item segments. The salience scores indicate which of the content item segments is to be selected as the starting point for playback of the first content item.
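The following is a minimal sketch of the inference step, not the patented implementation. It assumes that each segment is represented by a start offset and a feature vector. It also assumes that the segment with the highest salience score becomes the playback starting point (an argmax rule). The names `Segment`, `make_linear_scorer`, `salience_scores` and `select_start` are hypothetical. The logistic scorer is a stand-in for the trained model. In the described method, its parameters would be learned from interaction signals and audio-visual features of training segments. Here they are supplied by hand.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical: a scorer maps one segment's feature vector to a salience score.
Scorer = Callable[[Sequence[float]], float]


@dataclass
class Segment:
    """One content item segment (hypothetical structure)."""
    start_sec: float           # offset of the segment within the content item
    features: Sequence[float]  # stand-in for audio-visual content features


def make_linear_scorer(weights: Sequence[float], bias: float = 0.0) -> Scorer:
    """Stand-in for the trained model: a logistic score over features.

    In the described method the model would be trained on interaction
    signals and audio-visual features; here the weights are given directly.
    """
    def score(features: Sequence[float]) -> float:
        z = bias + sum(w * x for w, x in zip(weights, features))
        return 1.0 / (1.0 + math.exp(-z))
    return score


def salience_scores(segments: Sequence[Segment], model: Scorer) -> List[float]:
    """Score every segment of the content item with the model."""
    return [model(seg.features) for seg in segments]


def select_start(segments: Sequence[Segment], model: Scorer) -> float:
    """Return the start offset of the highest-salience segment (assumed argmax rule)."""
    if not segments:
        raise ValueError("content item has no segments")
    scores = salience_scores(segments, model)
    best = max(range(len(scores)), key=scores.__getitem__)  # first index wins ties
    return segments[best].start_sec


segments = [
    Segment(0.0, [0.1, 0.2]),
    Segment(10.0, [0.9, 0.8]),
    Segment(20.0, [0.4, 0.3]),
]
model = make_linear_scorer([1.0, 1.0])
print(select_start(segments, model))  # 10.0
```

In this example, the second segment has the largest weighted feature sum (1.7), so it receives the highest score. Playback would therefore start at 10.0 seconds. A real system might replace the argmax with a smoothed or thresholded choice. The source does not specify that choice.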