Patent attributes
Systems and methods are disclosed to implement an outlier detection system for text records. In embodiments, the detection system generates a fingerprint for each incoming record so that similar records map to similar fingerprints. Each record is assigned to a closest cluster in a set of clusters based on computed distances between the record's fingerprint and respective cluster fingerprints of the clusters. Each cluster fingerprint is dynamically updated to maintain a representative fingerprint of the cluster's member records. When a new record is received that is not sufficiently close to any cluster, a new cluster is added to the set for the new record. In embodiments, the creation of the new cluster triggers an alert that the new record is a potential outlier. Advantageously, the disclosed detection system can be used to detect outliers in records in near real time, without the need to pre-specify outlier characteristics.
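
The flow described above (fingerprint, nearest-cluster assignment, cluster fingerprint update, new cluster on a miss) can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the abstract does not specify how fingerprints or distances are computed, so the hashed character trigram vector, the cosine distance, the running-mean update, and the names `fingerprint`, `cosine_distance`, `Cluster`, `OutlierDetector`, `DIM`, and `THRESHOLD` are all assumptions chosen for illustration.

```python
import hashlib
import math

DIM = 64          # fingerprint length (assumed, not from the source)
THRESHOLD = 0.5   # max distance to join an existing cluster (assumed)


def fingerprint(text, n=3):
    """Hash character n-grams into a fixed-length, L2-normalized count vector.

    Records that share many n-grams land in many of the same buckets,
    so similar records get similar fingerprints.
    """
    vec = [0.0] * DIM
    text = text.lower()
    for i in range(max(len(text) - n + 1, 1)):
        gram = text[i:i + n]
        # hashlib gives a hash that is stable across runs, unlike built-in hash().
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


class Cluster:
    """A cluster whose fingerprint is the running mean of its members."""

    def __init__(self, fp):
        self.fingerprint = list(fp)
        self.size = 1

    def add(self, fp):
        # Incremental mean update, so member records need not be stored.
        self.size += 1
        self.fingerprint = [
            c + (x - c) / self.size for c, x in zip(self.fingerprint, fp)
        ]


class OutlierDetector:
    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.clusters = []

    def process(self, record):
        """Assign a record to a cluster.

        Returns True if a new cluster was created (potential outlier).
        """
        fp = fingerprint(record)
        best, best_dist = None, float("inf")
        for cluster in self.clusters:
            dist = cosine_distance(fp, cluster.fingerprint)
            if dist < best_dist:
                best, best_dist = cluster, dist
        if best is not None and best_dist <= self.threshold:
            best.add(fp)
            return False
        self.clusters.append(Cluster(fp))
        return True


if __name__ == "__main__":
    detector = OutlierDetector()
    for line in ["user login succeeded from 10.0.0.1",
                 "user login succeeded from 10.0.0.1"]:
        if detector.process(line):
            print("potential outlier:", line)
```

In this sketch the first record always starts a new cluster, and a repeated identical record joins that cluster without raising an alert. The threshold and fingerprint length are placeholder values that would need tuning against real records.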