A computer receives multimedia data comprising a plurality of frames. The computer converts the multimedia data into a signal wave having a plurality of frequencies and a plurality of amplitudes. The computer determines a frame, from the plurality of frames, that contains a pronoun. The computer identifies a topic of the frame. The computer searches a media repository for a frame having a highest correlation coefficient with the topic of the frame, where the frame from the media repository comprises a bag of objects. The computer resolves the anaphora by substituting the pronoun with an object from the bag of objects.
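
The pronoun-detection, repository-search, and substitution steps can be illustrated with a minimal sketch. It is not the claimed implementation, and everything named in it is an assumption made for illustration: frames are taken to be already transcribed to text (the signal-wave conversion is omitted), a topic is modeled as a term-count vector over a hypothetical fixed vocabulary, the correlation coefficient is assumed to be Pearson's, and the first object in the bag is used as the substitute. The names `RepoFrame`, `topic_vector`, `find_pronoun`, and `resolve` are likewise hypothetical.

```python
import math
import re
from dataclasses import dataclass

# Assumption: a small fixed pronoun list stands in for real pronoun detection.
PRONOUNS = {"it", "they", "them", "he", "she", "him", "her"}


@dataclass
class RepoFrame:
    """Hypothetical repository frame: a topic vector plus its bag of objects."""
    topic: list
    bag_of_objects: list


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def topic_vector(text, vocabulary):
    """Assumption: the topic is modeled as term counts over a fixed vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    return [float(words.count(term)) for term in vocabulary]


def find_pronoun(text):
    """Return the first pronoun in the frame text, or None if there is none."""
    for word in re.findall(r"[A-Za-z']+", text):
        if word.lower() in PRONOUNS:
            return word
    return None


def resolve(frame_text, vocabulary, repository):
    """Substitute the frame's pronoun with an object from the best-matching frame."""
    pronoun = find_pronoun(frame_text)
    if pronoun is None or not repository:
        return frame_text
    topic = topic_vector(frame_text, vocabulary)
    # Pick the repository frame with the highest correlation coefficient.
    best = max(repository, key=lambda frame: pearson(topic, frame.topic))
    if not best.bag_of_objects:
        return frame_text
    # Simplification: take the first object in the bag as the antecedent.
    pattern = rf"\b{re.escape(pronoun)}\b"
    return re.sub(pattern, best.bag_of_objects[0], frame_text, count=1)


vocabulary = ["engine", "road", "goal", "match"]
repository = [
    RepoFrame(topic=[2.0, 1.0, 0.0, 0.0], bag_of_objects=["the car"]),
    RepoFrame(topic=[0.0, 0.0, 2.0, 1.0], bag_of_objects=["the ball"]),
]
resolved = resolve("Then it rolled into the goal during the match.",
                   vocabulary, repository)
print(resolved)  # Then the ball rolled into the goal during the match.
```

In this example the frame's topic vector correlates positively with the second repository frame and negatively with the first, so "it" is replaced with "the ball". How an object is chosen from a bag containing several candidates is left open here.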