US Patent 11449674 Utility-preserving text de-identification with privacy guarantees

One embodiment of the invention provides a method for utility-preserving text de-identification. The method comprises generating corresponding processed text for each text document by applying at least one natural language processor (NLP) annotator to the text document to recognize and tag privacy-sensitive personal information corresponding to an individual, and replacing some words in the text document with some replacement values. The method further comprises determining infrequent terms occurring across all processed texts, filtering out the infrequent terms from the processed texts, and selectively reinstating to the processed texts at least one of the infrequent terms that is innocuous. The method further comprises generating a corresponding de-identified text document for each processed text by anonymizing privacy-sensitive personal information corresponding to an individual in the processed text to an extent that preserves data utility of the processed text and conceals the individual's personal identity.
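The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The regex-based `annotate` function stands in for an NLP annotator. The `min_doc_freq` threshold, the `innocuous` whitelist, the tag names, and the `<REDACTED>` placeholder are all assumptions introduced for illustration; the abstract does not specify how infrequency or innocuousness is decided.

```python
import re
from collections import Counter

# Hypothetical stand-in for an NLP annotator: each regex recognizes one kind
# of privacy-sensitive value and replaces it with a generic tag.
PATTERNS = {
    "<PHONE>": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "<DATE>": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def annotate(text):
    """Tag recognized sensitive values by substituting a replacement tag."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(tag, text)
    return text


def tokenize(text):
    """Split into tags and words; words are lowercased, tags kept as-is."""
    tokens = re.findall(r"<[A-Z]+>|[A-Za-z0-9']+", text)
    return [t if t.startswith("<") else t.lower() for t in tokens]


def deidentify(documents, min_doc_freq=2, innocuous=frozenset()):
    """Sketch of the pipeline: annotate, filter rare terms, reinstate safe ones."""
    # Step 1: produce a processed text (token list) for each document.
    processed = [tokenize(annotate(doc)) for doc in documents]

    # Step 2: find terms that occur in fewer than min_doc_freq processed texts.
    # Replacement tags are already generic, so they are never treated as rare.
    doc_freq = Counter(term for tokens in processed for term in set(tokens))
    infrequent = {
        term for term, count in doc_freq.items()
        if count < min_doc_freq and not term.startswith("<")
    }

    # Step 3: filter out infrequent terms, except those reinstated as innocuous.
    suppressed = infrequent - set(innocuous)

    # Step 4: emit one de-identified text per processed text.
    return [
        " ".join("<REDACTED>" if t in suppressed else t for t in tokens)
        for tokens in processed
    ]


if __name__ == "__main__":
    notes = [
        "Patient reports mild headache, call 555-123-4567",
        "Patient reports severe headache since 2021-03-04",
        "Patient Zebulon reports mild nausea",
    ]
    safe_terms = {"call", "severe", "since", "nausea"}
    for line in deidentify(notes, innocuous=safe_terms):
        print(line)
```

In this sketch a rare name such as "Zebulon" is suppressed because it appears in only one document and is not whitelisted, while rare but harmless words such as "nausea" are reinstated, which is how the example tries to keep the text useful while hiding identifying terms.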


