Patent attributes
A computer-implemented method for processing data includes receiving an input from a user including positive examples and negative examples of a specified data type. The positive examples include first character strings that belong to the specified data type, and the negative examples include second character strings that do not belong to the specified data type. The first and second character strings are processed to create a set of attributes that characterize the positive examples. A decision tree is built, based on the attributes, which, when applied to the first and second character strings, distinguishes the positive examples from the negative examples. The decision tree is then applied to the data so as to identify occurrences of the specified data type.
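The flow described above can be illustrated with a minimal sketch. The abstract does not say which attributes are created or how the tree is built, so everything here is an assumption made for illustration: the attribute families (string length, character-class "shape", punctuation present), the Gini-based greedy split, and all function names (`shape`, `make_attributes`, `build_tree`, `classify`) are hypothetical and are not taken from the patent.

```python
from collections import Counter

def shape(s):
    """Map digits to '9' and letters to 'A'; keep other characters as-is."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in s)

def make_attributes(positives):
    """Derive boolean attributes (name -> predicate) from the positive examples.
    The three attribute families below are illustrative assumptions."""
    attrs = {}
    for n in {len(p) for p in positives}:
        attrs[f"len=={n}"] = lambda s, n=n: len(s) == n
    for sh in {shape(p) for p in positives}:
        attrs[f"shape=={sh}"] = lambda s, sh=sh: shape(s) == sh
    for ch in {c for p in positives for c in p if not c.isalnum()}:
        attrs[f"contains {ch!r}"] = lambda s, ch=ch: ch in s
    return attrs

def gini(labels):
    """Gini impurity of a list of labels (0.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values()) if n else 0.0

def build_tree(rows, attrs):
    """rows: list of (string, label). Returns a leaf label or a tuple
    (attribute_name, subtree_if_true, subtree_if_false)."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def cost(name):
        # Weighted Gini impurity of the two branches (an assumed criterion).
        yes = [y for s, y in rows if attrs[name](s)]
        no = [y for s, y in rows if not attrs[name](s)]
        return (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)

    best = min(sorted(attrs), key=cost)
    yes = [r for r in rows if attrs[best](r[0])]
    no = [r for r in rows if not attrs[best](r[0])]
    if not yes or not no:  # the best attribute does not separate anything
        return majority
    rest = {k: v for k, v in attrs.items() if k != best}
    return (best, build_tree(yes, rest), build_tree(no, rest))

def classify(tree, attrs, s):
    """Walk the tree until a leaf (a bool) is reached."""
    while isinstance(tree, tuple):
        name, if_true, if_false = tree
        tree = if_true if attrs[name](s) else if_false
    return tree

# User input: positive and negative examples of a date-like data type.
positives = ["2021-03-15", "1999-12-31", "2010-07-04"]
negatives = ["15/03/2021", "hello", "2021-3-15", "20210315"]

attrs = make_attributes(positives)
rows = [(s, True) for s in positives] + [(s, False) for s in negatives]
tree = build_tree(rows, attrs)

# Apply the tree to new data to identify occurrences of the data type.
data = ["2022-01-01", "01-01-2022", "n/a", "1980-06-30"]
matches = [s for s in data if classify(tree, attrs, s)]
print(tree)     # ('shape==9999-99-99', True, False)
print(matches)  # ['2022-01-01', '1980-06-30']
```

With these examples a single attribute separates the two sets, so the tree has one internal node. Examples that are harder to separate would produce deeper trees, and a practical system would likely need richer attributes than the three assumed here.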