Patent attributes
A computing system receives a collection comprising multiple sets of ordered terms, including a first set. The system generates a dataset indicating an association between each pair of terms within a same set of the collection by generating co-occurrence score(s) for the first set. The system generates computed probabilities based on the co-occurrence score(s) for the first set. The computed probabilities indicate a likelihood that one term in a given pair of terms of the collection appears in a given set of the collection given that another term in the given pair of terms of the collection occurs. The system smoothes the computed probabilities by adding one or more random observations. The system generates one or more association indications for the first set based on the smoothed computed probabilities. The system outputs an indication of the dataset. Additionally, or alternatively, based on association measure(s), the system generates a virtual term.