Patent attributes
In some implementations, a method includes, for each of multiple hosts: identifying visual leaf pages hosted by the host, each visual leaf page being a web page that includes image data defining an image or a video that is prominently displayed relative to all other content of the web page; identifying a set of hub pages hosted by the host, each hub page linking to at least one of the visual leaf pages through an image-based link; and, for each hub page, generating cluster data representing the visual leaf pages to which the hub page links. Generating the cluster data includes determining, for each visual leaf page, a set of feature values that each indicate a pre-defined feature of the visual leaf page, and generating, from the sets of feature values, a set of central feature values as the cluster data for the hub page, each central feature value indicating a central tendency of the respective pre-defined feature.
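
The cluster-data step described above can be illustrated with a minimal sketch. It assumes the visual leaf pages and hub pages of a host have already been identified, and it uses the arithmetic mean as the measure of central tendency; the text does not specify which measure is used, so a median or another statistic may apply instead. The feature names, function names, and data layout are hypothetical and are not taken from the text.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical pre-defined features; the source does not enumerate them.
FEATURES = ("image_area_ratio", "image_count", "text_length")

FeatureValues = Dict[str, float]


def central_feature_values(leaf_features: List[FeatureValues]) -> FeatureValues:
    """Return one central value per feature across a hub page's leaf pages.

    Assumption: the mean stands in for "central tendency".
    """
    if not leaf_features:
        return {}
    return {
        name: mean(page[name] for page in leaf_features)
        for name in FEATURES
    }


def cluster_data_for_host(
    hub_links: Dict[str, List[str]],
    page_features: Dict[str, FeatureValues],
) -> Dict[str, FeatureValues]:
    """Build cluster data for every hub page of a single host.

    hub_links maps a hub page URL to the visual leaf pages it links to.
    page_features maps a visual leaf page URL to its feature values.
    """
    cluster: Dict[str, FeatureValues] = {}
    for hub, leaves in hub_links.items():
        # Skip linked pages that have no feature values recorded.
        vectors = [page_features[leaf] for leaf in leaves if leaf in page_features]
        cluster[hub] = central_feature_values(vectors)
    return cluster
```

In this sketch, each hub page ends up with one value per feature, so the cluster data summarizes what a typical visual leaf page linked from that hub page looks like.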

