Patent attributes
Techniques for prevalence-volume-based relevance are provided. Corresponding systems and methods may include ingesting a corpus of documents; receiving a search operator; segmenting the corpus of documents into (i) a first set of documents that match the search operator and (ii) a second set of documents that do not match the search operator; extracting a first token list from the first set of documents and a second token list from the second set of documents; calculating a prevalence-volume value for tokens included in the first and second token lists; generating a prevalence-volume ratio (PVR) matrix that associates tokens included in the first and/or second token lists with a PVR value, wherein the PVR value for a particular token is the ratio between the prevalence-volume value of the particular token for the first set of documents and the prevalence-volume value of the particular token for the second set of documents; and associating the search operator with the generated PVR matrix.
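The pipeline summarized above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not specify. The tokenizer (lowercase whitespace split) is assumed. The search operator is modeled as a single keyword that a document matches if it contains that token. The prevalence-volume value is defined here, purely hypothetically, as (share of documents in the set containing the token) multiplied by (the token's share of all token occurrences in the set). A small smoothing term `eps` is added so that ratios stay finite. The "matrix" is represented as a token-to-PVR dictionary. All function names (`segment`, `prevalence_volume`, `pvr_matrix`, `build_index`) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def tokenize(doc: str) -> List[str]:
    """ASSUMED tokenizer: lowercase and split on whitespace."""
    return doc.lower().split()


def segment(
    corpus: Iterable[str], matches: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split the corpus into documents that match / do not match the operator."""
    first: List[str] = []
    second: List[str] = []
    for doc in corpus:
        (first if matches(doc) else second).append(doc)
    return first, second


def prevalence_volume(docs: List[str]) -> Dict[str, float]:
    """HYPOTHETICAL prevalence-volume value per token.

    prevalence = fraction of documents in the set that contain the token
    volume     = token's share of all token occurrences in the set
    value      = prevalence * volume
    """
    doc_freq: Counter = Counter()
    volume: Counter = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        volume.update(tokens)        # every occurrence counts toward volume
        doc_freq.update(set(tokens))  # each document counts once toward prevalence
    n_docs = len(docs) or 1
    n_tokens = sum(volume.values()) or 1
    return {t: (doc_freq[t] / n_docs) * (volume[t] / n_tokens) for t in volume}


def pvr_matrix(
    first: List[str], second: List[str], eps: float = 1e-6
) -> Dict[str, float]:
    """Map each token in either token list to PV(first set) / PV(second set).

    eps is an ASSUMED smoothing term that keeps the ratio finite when a
    token is absent from one of the two sets.
    """
    pv1 = prevalence_volume(first)
    pv2 = prevalence_volume(second)
    return {
        t: (pv1.get(t, 0.0) + eps) / (pv2.get(t, 0.0) + eps)
        for t in set(pv1) | set(pv2)
    }


def build_index(corpus: List[str], operator: str) -> Dict[str, Dict[str, float]]:
    """Associate a HYPOTHETICAL single-keyword search operator with its PVR matrix."""
    first, second = segment(corpus, lambda d: operator in tokenize(d))
    return {operator: pvr_matrix(first, second)}
```

For example, `build_index(["solar panel cost", "solar panel install", "wind turbine cost", "wind farm"], "solar")` gives `panel`, which appears only in matching documents, a very large PVR. `cost`, which appears in both sets, scores about 0.83, and `wind` scores near zero. Under this assumed definition, a PVR above 1 marks tokens that are relatively more prevalent in documents matching the operator.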