Patent attributes
Disclosed is a computer-implemented method to identify and anonymize personal information, the method comprising analyzing a first corpus with a personal information sniffer, wherein the first corpus includes unstructured text, wherein the personal information sniffer is configured to detect a set of types of personal information, and wherein the personal information sniffer produces a first set of results. The method comprises analyzing the first corpus with a set of annotators, wherein each annotator is configured to identify all instances of a type of personal information in the corpus, and wherein the set of annotators produces a second set of results. The method comprises comparing the first set of results and the second set of results, determining, the first set of results does not match the second set of results, and updating, based on the determining, the personal information sniffer.