Patent 11436241 was granted by the United States Patent and Trademark Office in September 2022 and assigned to FAIR ISAAC CORPORATION.
The patent covers computer-implemented methods, systems, and products for character string frequency analysis. The method includes a set of operations or steps: parsing a plurality of character strings into one or more tokens, categorizing the one or more tokens into one or more token frequency categories, and generating a first similarity score between one or more pairs of character strings of the plurality of character strings. The method further includes calculating one or more degrees of commonality or rarity of the plurality of character strings based on the categorizing, generating one or more penalties for token pairs of the one or more pairs of character strings associated with the first similarity score based on the one or more degrees of commonality or rarity and the categorizing, and generating a second similarity score based on the first similarity score and the one or more penalties.
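The sequence of steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace tokenizer, the two-way "common"/"rare" split, the 0.6 document-frequency threshold, the use of `difflib.SequenceMatcher` for the first score, and the penalty weights are all assumptions chosen for illustration, and every function name is hypothetical. The degree of commonality or rarity is reduced here to a per-token category weight.

```python
from collections import Counter
from difflib import SequenceMatcher


def tokenize(s):
    """Parse a character string into lowercase whitespace-separated tokens."""
    return s.lower().split()


def categorize(strings, common_share=0.6):
    """Assign each token to a frequency category ('common' or 'rare').

    A token is 'common' if it appears in at least `common_share` of the
    strings (an assumed threshold), otherwise 'rare'.
    """
    doc_freq = Counter(tok for s in strings for tok in set(tokenize(s)))
    n = len(strings)
    return {
        tok: "common" if count / n >= common_share else "rare"
        for tok, count in doc_freq.items()
    }


def first_score(a, b):
    """First similarity score: a plain character-level similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def penalty(a, b, categories, rare_weight=0.3, common_weight=0.05):
    """Sum a penalty over tokens that appear in only one of the two strings.

    Mismatched rare tokens cost more than mismatched common ones; tokens
    missing from `categories` are treated as rare.
    """
    mismatched = set(tokenize(a)) ^ set(tokenize(b))
    return sum(
        rare_weight if categories.get(tok, "rare") == "rare" else common_weight
        for tok in mismatched
    )


def second_score(a, b, categories):
    """Second similarity score: the first score minus the penalty, floored at 0."""
    return max(0.0, first_score(a, b) - penalty(a, b, categories))


if __name__ == "__main__":
    names = ["acme bank inc", "apex bank inc", "acme bank ltd", "zenith bank inc"]
    cats = categorize(names)
    for other in names[1:]:
        print(other, first_score(names[0], other), second_score(names[0], other, cats))
```

Under these assumptions, two strings that differ only in a frequent token (such as a shared suffix) keep most of their first score, while strings that differ in infrequent, distinguishing tokens are pushed apart even when they look alike character by character.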