Patent attributes
Aspects directed to phrase generation are provided. A method is provided that includes identifying a plurality of phrase candidates from a plurality of text string entries in a corpus. For each phrase candidate, the method includes identifying a plurality of left contexts and a plurality of right contexts for the phrase candidate, each left context of the plurality of left contexts being a nearest unique feature to the left of the phrase candidate in a text string entry and each right context of the plurality of right contexts being a nearest unique feature to the right of the phrase candidate in a text string entry, and calculating a left context vector including a score for each left context feature and a right context vector including a score for each right context feature of the phrase candidate. A similarity is determined between pairs of phrase candidates using the respective left and right context vectors for each phrase candidate.
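The method summarized above can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not specify how features are extracted, how scores are computed, or which similarity measure is used. The sketch assumes that a feature is the single whitespace-delimited token adjacent to the phrase candidate, that a score is the relative frequency of that token among the candidate's contexts, and that similarity is the average of the cosine similarities of the left and right context vectors. All function names and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def find_occurrences(tokens, phrase):
    """Yield each start index at which `phrase` (a tuple of tokens) occurs."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == phrase:
            yield i


def normalize(counts):
    """Turn raw context counts into relative-frequency scores (assumed scoring)."""
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()} if total else {}


def context_vectors(corpus, phrase):
    """Return (left_vector, right_vector) for one phrase candidate.

    Assumption: the nearest feature is the token directly adjacent to the
    phrase within the same text string entry.
    """
    left, right = Counter(), Counter()
    n = len(phrase)
    for entry in corpus:
        tokens = entry.lower().split()
        for i in find_occurrences(tokens, phrase):
            if i > 0:
                left[tokens[i - 1]] += 1      # nearest token to the left
            if i + n < len(tokens):
                right[tokens[i + n]] += 1     # nearest token to the right
    return normalize(left), normalize(right)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(feature, 0.0) for feature, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def phrase_similarity(corpus, a, b):
    """Average of left-context and right-context cosine similarity (assumed)."""
    left_a, right_a = context_vectors(corpus, a)
    left_b, right_b = context_vectors(corpus, b)
    return 0.5 * (cosine(left_a, left_b) + cosine(right_a, right_b))


# Hypothetical corpus of text string entries.
corpus = [
    "cheap flights to paris",
    "cheap airfare to paris",
    "cheap flights to rome",
    "hotel deals in rome",
]

same = phrase_similarity(corpus, ("flights",), ("airfare",))
diff = phrase_similarity(corpus, ("flights",), ("hotel", "deals"))
```

In this toy corpus, "flights" and "airfare" share both their left context ("cheap") and their right context ("to"), so they score as highly similar, while "flights" and "hotel deals" share no contexts. A production system would likely need a richer notion of feature and a weighting scheme beyond raw relative frequency.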