A method for processing semi-structured data. The method includes receiving semi-structured data in a first format from a real business process. Preferably, the semi-structured data are machine-generated. The method includes tokenizing the semi-structured data into a second format, storing the data in the second format in one or more memories, and clustering the tokenized data to form a plurality of clusters. The method also includes identifying a selected low-frequency term in each of the clusters, and processing at least two of the clusters and their associated selected low-frequency terms to form a single template for those clusters. In a preferred embodiment, the method replaces the selected low-frequency term with a wildcard character.
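As an illustration only, the tokenize, cluster, and template steps might be sketched as follows for machine-generated log lines. This is not the claimed method; it rests on several assumptions made purely for the example: tokens are whitespace-separated words, a "cluster" is the set of lines with an identical token sequence, the "low-frequency term" of a cluster is its token that appears in the fewest clusters, and `*` is the wildcard. The function names (`tokenize`, `cluster`, `low_frequency_index`, `build_templates`) are hypothetical.

```python
from collections import Counter, defaultdict

WILDCARD = "*"  # hypothetical wildcard character


def tokenize(lines):
    """Second format (assumed): each non-empty line becomes a tuple of whitespace tokens."""
    return [tuple(line.split()) for line in lines if line.strip()]


def cluster(tokenized):
    """Naive clustering (assumed): identical token sequences share a cluster, with counts."""
    return Counter(tokenized)


def low_frequency_index(tokens, cluster_freq):
    """Position of the token found in the fewest clusters (first position wins ties)."""
    return min(range(len(tokens)), key=lambda i: cluster_freq[tokens[i]])


def build_templates(lines):
    """Map each template string to the number of input lines it covers."""
    clusters = cluster(tokenize(lines))
    # Number of distinct clusters that contain each token.
    cluster_freq = Counter(tok for toks in clusters for tok in set(toks))

    # Group clusters that become identical once their low-frequency term is wildcarded.
    groups = defaultdict(list)
    for toks in clusters:
        i = low_frequency_index(toks, cluster_freq)
        key = toks[:i] + (WILDCARD,) + toks[i + 1:]
        groups[key].append(toks)

    templates = {}
    for key, members in groups.items():
        if len(members) >= 2:
            # At least two clusters collapse into a single wildcard template.
            templates[" ".join(key)] = sum(clusters[m] for m in members)
        else:
            # A lone cluster keeps its literal tokens.
            only = members[0]
            templates[" ".join(only)] = clusters[only]
    return templates


logs = [
    "user alice logged in",
    "user bob logged in",
    "user alice logged in",
    "disk sda1 full",
    "disk sdb2 full",
    "service restarted",
]
print(build_templates(logs))
# {'user * logged in': 3, 'disk * full': 2, 'service restarted': 1}
```

Only clusters that actually merge receive a wildcard, which mirrors the abstract's requirement that at least two clusters be processed to form one template; a real system would likely use a more tolerant clustering step than exact sequence matching.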