A pattern-based data matching method generates a regular expression pattern for each input dataset and defines similarity measures between the generated patterns. The method analyzes an input dataset in terms of symbol classes, generalizing the input values into a general pattern so that overlap between input datasets can be identified or extrapolated. This aids in matching fields in databases that are being merged and in learning a pattern for an input dataset. For each sequence of data values, the method computes a compact pattern describing the sequence. Embodiments of the method comprise noise reduction and repetitive pattern discovery in the input dataset, and calculation of recall and precision of the generated pattern.
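The following is a minimal illustrative sketch of the idea, not the claimed implementation. It assumes a small hypothetical set of symbol classes (ASCII digits, ASCII letters, whitespace, and literal characters), treats run-length compression as a simple stand-in for repetitive pattern discovery, and treats a frequency threshold (`min_support`, a name introduced here) as a stand-in for noise reduction. The function names `symbol_class`, `value_pattern`, `dataset_pattern`, and `match_rate` are likewise assumptions made for illustration, and the input dataset is assumed to be non-empty.

```python
import re
from collections import Counter
from itertools import groupby


def symbol_class(ch):
    """Map one character to a regex symbol class (hypothetical class set)."""
    if "0" <= ch <= "9":
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    if ch.isspace():
        return r"\s"
    return re.escape(ch)  # any other character is kept as a literal


def value_pattern(value):
    """Generalize one value into a compact, run-length-compressed pattern."""
    parts = []
    for cls, run in groupby(symbol_class(c) for c in value):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def dataset_pattern(values, min_support=0.5):
    """Combine per-value patterns, dropping infrequent ones as noise."""
    counts = Counter(value_pattern(v) for v in values)
    kept = [p for p, c in counts.items() if c / len(values) >= min_support]
    return "|".join(sorted(kept))


def match_rate(pattern, values):
    """Fraction of values fully matched by the pattern."""
    regex = re.compile(pattern)
    return sum(1 for v in values if regex.fullmatch(v)) / len(values)


# Learn a pattern from one field, then score it against another field.
phones = ["555-1234", "555-9876", "123-4567", "n/a"]
pattern = dataset_pattern(phones)
print(pattern)
print(match_rate(pattern, phones))               # recall on the source dataset
print(match_rate(pattern, ["800-0000", "abc"]))  # overlap with a second dataset
```

In this sketch, the match rate on the source dataset plays the role of recall, and the match rate on a second dataset serves as a crude overlap score between two fields. Measuring precision would additionally require values known not to belong to the field, which this sketch does not model.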