Patent attributes
Systems, methods and computer program products related to aligning heterogeneous sequential data are disclosed. Video data in a media presentation and textual data corresponding to content of the media presentation are received. An action related to aligning the video data and the textual data is determined using an alignment neural network, such that the video data and the textual data are at least partially aligned following the action. The alignment neural network includes a first fully connected layer that receives as input the video data, the textual data, and data relating to a previously determined action by the alignment neural network related to aligning the video data and the textual data. The determined action related to aligning the video data and the textual data is performed.
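The determine-then-perform loop described above can be illustrated with a minimal sketch. It is not the disclosed implementation. The abstract states only that the first fully connected layer receives the video data, the textual data, and data relating to the previously determined action. Everything else here is an assumption made for illustration: the action names (`POP_CLIP`, `POP_SENTENCE`, `MATCH`), the feature dimensions, the second scoring layer, the untrained random weights, and the class and function names.

```python
import random

# Hypothetical action set; the source does not enumerate the actions.
ACTIONS = ["POP_CLIP", "POP_SENTENCE", "MATCH"]


def one_hot(index, size):
    """Encode an action index as a one-hot vector (all zeros if index is None)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


class FullyConnected:
    """Dense layer computing W x + b, optionally followed by ReLU.

    Weights are small random values, i.e. the layer is untrained.
    """

    def __init__(self, in_dim, out_dim, rng, relu=True):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim
        self.relu = relu

    def __call__(self, x):
        out = [sum(w * v for w, v in zip(row, x)) + b
               for row, b in zip(self.weights, self.bias)]
        return [max(0.0, o) for o in out] if self.relu else out


class AlignmentNetwork:
    def __init__(self, video_dim, text_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        in_dim = video_dim + text_dim + len(ACTIONS)
        # First fully connected layer: sees video, text, and previous action.
        self.fc1 = FullyConnected(in_dim, hidden_dim, rng)
        # Assumed output layer producing one score per action.
        self.out = FullyConnected(hidden_dim, len(ACTIONS), rng, relu=False)

    def determine_action(self, video_feat, text_feat, prev_action):
        """Return the index of the highest-scoring action."""
        x = video_feat + text_feat + one_hot(prev_action, len(ACTIONS))
        scores = self.out(self.fc1(x))
        return max(range(len(ACTIONS)), key=scores.__getitem__)


def align(net, clips, sentences):
    """Determine and perform actions until either sequence is exhausted."""
    pairs, i, j = [], 0, 0
    prev = None  # no previous action on the first step (assumption)
    while i < len(clips) and j < len(sentences):
        action = net.determine_action(clips[i], sentences[j], prev)
        if ACTIONS[action] == "MATCH":
            pairs.append((i, j))  # align clip i with sentence j
            i, j = i + 1, j + 1
        elif ACTIONS[action] == "POP_CLIP":
            i += 1  # discard an unmatched clip
        else:
            j += 1  # discard an unmatched sentence
        prev = action
    return pairs
```

Calling `align(AlignmentNetwork(video_dim=3, text_dim=2), clips, sentences)` with lists of 3- and 2-dimensional feature vectors returns a list of `(clip_index, sentence_index)` pairs. Because the weights are untrained, the pairs carry no meaning. The sketch shows only how the previous action is fed back into the first layer at each step.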