Patent attributes
This disclosure describes systems and methods for reducing a data set that may be used to construct a node graph. For example, the data set may include collections, representations, and associations between the collections and the representations. A topic score may be determined for each representation, and a diversity score may be determined for each collection based on the topic scores of the representations associated with that collection. If a collection's diversity score is too high, the collection and its associations are excluded from the node graph that is subsequently constructed from the data set. Topic scores may also be determined for the collections in the data set, again based on the topic scores of the representations associated with each collection.
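The reduction described above can be illustrated with a minimal sketch. The disclosure does not specify how topic scores or diversity scores are computed, so everything concrete here is an assumption made for illustration: each representation's topic score is taken to be a vector of per-topic weights, a collection's topic score is taken to be the mean of its representations' vectors, diversity is taken to be the Shannon entropy of that mean vector, and "too high" is taken to mean exceeding a caller-supplied `max_diversity` value. The function names `collection_topic_scores`, `diversity`, and `prune_associations` are hypothetical.

```python
import math
from collections import defaultdict


def collection_topic_scores(associations, rep_topic_scores):
    """Assumed rule: a collection's topic score is the element-wise mean
    of the topic-score vectors of its associated representations."""
    members = defaultdict(list)
    for collection, representation in associations:
        members[collection].append(rep_topic_scores[representation])
    return {
        collection: [sum(column) / len(vectors) for column in zip(*vectors)]
        for collection, vectors in members.items()
    }


def diversity(topic_vector):
    """Assumed measure: Shannon entropy (natural log) of the normalized vector.
    A collection spread evenly over many topics scores higher."""
    total = sum(topic_vector)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log(w / total) for w in topic_vector if w > 0)


def prune_associations(associations, rep_topic_scores, max_diversity):
    """Drop every association whose collection has diversity above max_diversity
    (max_diversity is a hypothetical parameter, not from the disclosure)."""
    scores = collection_topic_scores(associations, rep_topic_scores)
    kept = {c for c, vector in scores.items() if diversity(vector) <= max_diversity}
    return [(c, r) for c, r in associations if c in kept]


# Illustrative data: two topics, three representations, two collections.
rep_topic_scores = {"r1": [1.0, 0.0], "r2": [0.9, 0.1], "r3": [0.0, 1.0]}
associations = [
    ("focused", "r1"), ("focused", "r2"),
    ("mixed", "r1"), ("mixed", "r3"),
]
reduced = prune_associations(associations, rep_topic_scores, max_diversity=0.5)
```

In this example the "mixed" collection spans both topics evenly, so under the assumed entropy measure it exceeds the cutoff and is removed along with its associations, while the "focused" collection is retained. Only the surviving associations would then be passed to whatever step constructs the node graph.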