A processing device determines a plurality of visual concepts for visual data based on at least one of visual entities in the visual data or feature-level attributes in the visual data, wherein the visual entities are based on the feature-level attributes, and wherein each of the plurality of visual concepts comprises a subject visual entity related to an object visual entity by a predicate. The processing device further determines one or more visual semantics for the visual data based on the plurality of visual concepts, wherein the one or more visual semantics define relationships between the plurality of visual concepts.
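To make the two data structures in the abstract concrete, a minimal sketch follows. It represents each visual concept as a (subject, predicate, object) triple over visual entities, and each visual semantic as a relationship between two such concepts. Everything here is hypothetical and for illustration only. The names `VisualEntity`, `VisualConcept`, `VisualSemantic`, `spatial_predicate`, `determine_concepts`, and `determine_semantics` are assumptions. So is using a bounding box as the only feature-level attribute, and so are the simple spatial and entity-sharing rules. The abstract does not say how predicates or semantics are derived, so this is not the claimed method.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class VisualEntity:
    """Hypothetical visual entity built from a feature-level attribute (a box)."""
    label: str
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), y grows downward


@dataclass(frozen=True)
class VisualConcept:
    """A subject entity related to an object entity by a predicate."""
    subject: VisualEntity
    predicate: str
    obj: VisualEntity


@dataclass(frozen=True)
class VisualSemantic:
    """A relationship between two visual concepts."""
    first: VisualConcept
    relation: str
    second: VisualConcept


def spatial_predicate(s: VisualEntity, o: VisualEntity) -> str:
    """Assumed rule: derive a predicate from box geometry alone."""
    if s.box[2] <= o.box[0]:
        return "left_of"
    if s.box[0] >= o.box[2]:
        return "right_of"
    if s.box[3] <= o.box[1]:
        return "above"
    if s.box[1] >= o.box[3]:
        return "below"
    return "overlaps"


def determine_concepts(entities: list[VisualEntity]) -> list[VisualConcept]:
    """Form one concept per ordered pair of distinct entities (by position)."""
    return [
        VisualConcept(s, spatial_predicate(s, o), o)
        for i, s in enumerate(entities)
        for j, o in enumerate(entities)
        if i != j
    ]


def determine_semantics(concepts: list[VisualConcept]) -> list[VisualSemantic]:
    """Relate every pair of concepts that share an entity (assumed rules)."""
    semantics = []
    for a, b in combinations(concepts, 2):
        if a.subject == b.obj and a.obj == b.subject:
            relation = "inverse"
        elif a.subject == b.subject:
            relation = "shared_subject"
        elif a.obj == b.obj:
            relation = "shared_object"
        elif a.obj == b.subject or a.subject == b.obj:
            relation = "chained"
        else:
            continue  # no shared entity, so no semantic is recorded
        semantics.append(VisualSemantic(a, relation, b))
    return semantics
```

For example, two side-by-side entities yield the concepts "person left_of dog" and "dog right_of person". The single semantic linking those two concepts is `"inverse"`. Keeping concepts and semantics as separate immutable records mirrors the abstract's two-stage structure: semantics are computed only from already-determined concepts.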