Systems and methods are provided for generating for display an indication of a segment of media content relevant to a voice communication. A media guidance application monitors a voice communication between users and determines that a first user is describing media content. In response to that determination, the media guidance application retrieves the media asset viewing history of the first user. Based on the voice communication and on the metadata of each media asset in that viewing history, the media guidance application determines which media asset the first user is describing. Based on the metadata of that media asset, the media guidance application then determines which segment of the media asset the first user is describing and generates, for display, an indication of the segment.
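
The flow summarized above can be illustrated with a minimal sketch. It assumes the voice communication has already been transcribed to text, and it swaps in plain keyword overlap for whatever metadata comparison an actual embodiment would use. All names here (`MediaAsset`, `Segment`, `tokenize`, `best_match`, `indicate_segment`) and the sample data are hypothetical and are not taken from the disclosure. The step of determining that the user is describing media content is collapsed into the matching step: an utterance that overlaps no metadata is treated as not describing media content.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical segment record: a time range plus descriptive keywords."""
    start_s: int
    end_s: int
    keywords: set[str]


@dataclass
class MediaAsset:
    """Hypothetical viewing-history entry with asset-level and segment-level metadata."""
    title: str
    keywords: set[str]
    segments: list[Segment] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase the transcript and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def best_match(candidates, words, get_keywords):
    """Return the candidate sharing the most keywords with `words`, or None if none overlap."""
    best, best_score = None, 0
    for candidate in candidates:
        score = len(get_keywords(candidate) & words)
        if score > best_score:
            best, best_score = candidate, score
    return best


def indicate_segment(utterance: str, viewing_history: list[MediaAsset]):
    """Map a transcribed utterance to a displayable segment indication, or None."""
    words = tokenize(utterance)
    # Step 1: pick the asset in the user's viewing history that best matches the utterance.
    asset = best_match(viewing_history, words, lambda a: a.keywords)
    if asset is None:
        return None  # assumption: no overlap means the user is not describing media content
    # Step 2: pick the segment of that asset that best matches the utterance.
    segment = best_match(asset.segments, words, lambda s: s.keywords)
    if segment is None:
        return None
    # Step 3: build the indication to be generated for display.
    return f"{asset.title} [{segment.start_s}s-{segment.end_s}s]"


# Made-up viewing history for illustration only.
history = [
    MediaAsset(
        "Space Saga",
        {"space", "ship", "captain"},
        [Segment(0, 600, {"launch", "countdown"}),
         Segment(600, 1200, {"asteroid", "chase"})],
    ),
    MediaAsset("Cooking Hour", {"recipe", "chef", "kitchen"},
               [Segment(0, 300, {"pasta"})]),
]

utterance = "Remember when the captain flew the ship through the asteroid field?"
print(indicate_segment(utterance, history))  # Space Saga [600s-1200s]
```

Restricting the search to the first user's viewing history keeps the candidate set small, which is why even a crude overlap score can pick out a single asset here; a practical system would likely need a more robust similarity measure and a confidence threshold.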