Patent attributes
A method for anonymizing documents before publication is provided. The method includes identifying regular expressions configured to match strings to be anonymized in a document, selecting a readable identifier as an anonymized reference for a string replacement, searching the document for a match string that fits one of the regular expressions, hashing the match string using a collision-resistant, deterministic, non-inverting cryptographic hashing function, and comparing the cryptographic hash of the match string with a database that includes multiple previous hashes and their corresponding readable identifiers. When none of the previous hashes matches the cryptographic hash, the method includes creating a new database record that includes the cryptographic hash, incrementing a counter in the readable identifier, associating the readable identifier with the new database record, and replacing the match string with the readable identifier throughout the document. A system and a medium storing instructions to perform the above method are also provided.
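The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several things here are assumptions made only for illustration: the function name `anonymize`, the use of SHA-256 as the hashing function, plain dictionaries standing in for the database and the counter, the `EMAIL-<n>` identifier format, and the simplified e-mail pattern. The sketch also shows one plausible behavior for the case the abstract does not spell out, where a hash is already in the database and its stored identifier is reused.

```python
import hashlib
import re


def anonymize(text, pattern, prefix, database, counters):
    """Replace every match of `pattern` in `text` with a readable identifier.

    `database` maps hex digests to identifiers and `counters` maps a prefix
    to the last number issued. Both are plain dicts here (an assumption);
    a real system would presumably use persistent storage.
    """
    def replace(match):
        # Hash the matched string (SHA-256 is an assumed choice of function).
        digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()
        identifier = database.get(digest)
        if identifier is None:
            # No previous hash matches: increment the counter, build a new
            # readable identifier, and record it against the hash.
            counters[prefix] = counters.get(prefix, 0) + 1
            identifier = f"{prefix}-{counters[prefix]}"
            database[digest] = identifier
        return identifier

    # re.sub calls `replace` for each match, left to right, so every
    # occurrence in the document is rewritten.
    return re.sub(pattern, replace, text)


database, counters = {}, {}
doc = "Contact alice@example.com or bob@example.com; alice@example.com replies fastest."
result = anonymize(doc, r"[\w.+-]+@[\w-]+\.[\w.]+", "EMAIL", database, counters)
print(result)  # Contact EMAIL-1 or EMAIL-2; EMAIL-1 replies fastest.
```

Because the hash is deterministic, the same string maps to the same identifier across documents as long as the same `database` and `counters` are passed in, and the original string itself is never stored.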