Patent attributes
A method in one example implementation includes extracting a plurality of data elements from a record of a data file, tokenizing the data elements into tokens, and storing the tokens in a first tuple of a registration list. The method further includes selecting one of the tokens as a token key for the first tuple, where the token is selected because it occurs less frequently in the registration list than each of the other tokens in the first tuple. In specific embodiments, at least one data element is an expression element having a character pattern matching a predefined expression pattern that represents at least two words and a separator between the words. In other embodiments, at least one data element is a word defined by a character pattern of one or more consecutive essential characters. Other specific embodiments include determining an end of the record by recognizing a predefined delimiter.