A method for personalized video playback, performed by a video platform, includes parsing a video into segments based on the visual and audio content of the video. The platform creates multimodal fragments that represent the underlying segments of the video and then orders the multimodal fragments according to a preference of a target user. The platform thereby enables nonlinear playback of the segmented video in accordance with the ordered multimodal fragments.
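The abstract does not say how segmentation, fragment creation, or preference ordering are carried out. The following is a minimal sketch of the three steps under stated assumptions, not the claimed implementation. It assumes hypothetical per-second signals: a visual change score and an audio silence flag as cut points, a middle keyframe plus spoken keywords from a timed transcript as the two "modalities" of a fragment, and a keyword-weight dictionary as the target user's preference. All names (`parse_segments`, `make_fragments`, `order_fragments`, `playback_order`) are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int  # first second of the segment (inclusive)
    end: int    # end second of the segment (exclusive)


@dataclass(frozen=True)
class Fragment:
    segment: Segment
    keyframe: int        # visual modality (assumed): representative frame time
    keywords: frozenset  # audio modality (assumed): words spoken in the segment


def parse_segments(visual_change, audio_silence, threshold=0.5):
    """Cut the timeline wherever the hypothetical visual change score exceeds
    `threshold` or the hypothetical audio silence flag is set."""
    n = len(visual_change)
    cuts = [0] + [t for t in range(1, n)
                  if visual_change[t] > threshold or audio_silence[t]] + [n]
    return [Segment(a, b) for a, b in zip(cuts, cuts[1:])]


def make_fragments(segments, transcript):
    """Summarize each segment by its middle frame and the words spoken in it."""
    fragments = []
    for seg in segments:
        words = frozenset(w for t, w in transcript if seg.start <= t < seg.end)
        fragments.append(Fragment(seg, (seg.start + seg.end) // 2, words))
    return fragments


def order_fragments(fragments, preferences):
    """Rank fragments by the summed preference weight of their keywords.
    sorted() is stable even with reverse=True, so ties stay chronological."""
    def score(fragment):
        return sum(preferences.get(w, 0.0) for w in fragment.keywords)
    return sorted(fragments, key=score, reverse=True)


def playback_order(visual_change, audio_silence, transcript, preferences):
    """Return (start, end) intervals in the personalized, nonlinear play order."""
    segments = parse_segments(visual_change, audio_silence)
    fragments = make_fragments(segments, transcript)
    return [(f.segment.start, f.segment.end)
            for f in order_fragments(fragments, preferences)]


if __name__ == "__main__":
    visual = [0, 0, 0, 0.9, 0, 0, 0, 0, 0, 0]    # scene change at t=3
    silence = [False] * 7 + [True] + [False] * 2  # pause at t=7
    transcript = [(1, "intro"), (4, "goal"), (5, "soccer"), (8, "weather")]
    prefs = {"soccer": 2.0, "goal": 1.0, "weather": 0.5}
    print(playback_order(visual, silence, transcript, prefs))
    # → [(3, 7), (7, 10), (0, 3)]
```

Ranking by a simple weighted keyword sum keeps the example short. A real platform would more plausibly score fragments with learned visual and audio embeddings, but the pipeline shape (segment, summarize, rank, replay out of order) stays the same.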