Patent attributes
Provided is a process of modifying semantic similarity graphs representative of pair-wise similarity between documents in a corpus, the method comprising obtaining a semantic similarity graph that comprises more than 500 nodes and more than 1000 weighted edges, each node representing a document of a corpus, and each edge weight indicating an amount of similarity between a pair of documents corresponding to the respective nodes connected by the respective edge; obtaining an n-gram indicating that edge weights affected by the n-gram are to be increased or decreased; expanding the n-gram to produce a set of expansion n-grams; adjusting edge weights of edges between pairs of documents in which members of the expanded n-gram set co-occur.