In some embodiments, a method receives a first textual description of content and converts the first textual description to a first image representation. The method determines a similarity between the first image representation and a second image representation for candidate metadata, where the candidate metadata is associated with a second textual description of content. Based on the determined similarity, the method determines whether the first textual description of content is associated with the second textual description of content.
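
The flow described above can be illustrated with a minimal sketch. Nothing below is taken from the source beyond the three steps (convert text to an image, compare images, decide on association). The function names `text_to_image`, `image_similarity`, and `is_associated`, the 8 x 32 pixel grid, the pixel-agreement measure, and the 0.95 threshold are all hypothetical choices made for illustration. In particular, the rendering step is a toy stand-in: it draws each character as a column of bits from its code point rather than rendering glyphs with a font, which an actual embodiment would presumably do.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels

HEIGHT = 8  # assumed image height in pixels


def text_to_image(text: str, width: int = 32) -> Image:
    """Toy stand-in for font rendering (an assumption, not the source's method).

    Each character becomes one column of 8 pixels taken from the low 8 bits
    of its lowercased code point. Text is truncated or padded to `width`.
    """
    codes = [ord(ch) & 0xFF for ch in text.lower()[:width]]
    codes += [0] * (width - len(codes))  # pad with blank columns
    return [[(code >> row) & 1 for code in codes] for row in range(HEIGHT)]


def image_similarity(a: Image, b: Image) -> float:
    """Fraction of pixel positions at which two same-sized images agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def is_associated(first_text: str, second_text: str,
                  threshold: float = 0.95) -> bool:
    """Decide association from image similarity; the threshold is arbitrary."""
    first_image = text_to_image(first_text)
    second_image = text_to_image(second_text)  # image for candidate metadata
    return image_similarity(first_image, second_image) >= threshold


if __name__ == "__main__":
    print(is_associated("The Matrix", "the matrix"))
    print(is_associated("The Matrix", "Finding Nemo"))
```

Because short texts are padded with identical blank columns, the pixel-agreement score in this sketch is inflated for short inputs; a practical threshold would have to be tuned for whatever rendering and similarity measure are actually used.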