Patent attributes
This disclosure relates generally to data anonymization and, more particularly, to risk-aware data anonymization. Conventional data anonymization systems either replace PII/sensitive attributes with random values or shuffle them, which causes substantial data distortion and reduces data utility. The goal of publishing data is best achieved when privacy is balanced with the utility of the data. Moreover, to ensure privacy, it is important to assess the risk of disclosure. The proposed system provides a pipeline that analyzes data patterns to determine the risk level of re-identification associated with each record. Further, based on the risks identified for the records, the system anonymizes the data using a pattern-based anonymization approach, wherein the data is clustered and distinct patterns are identified for each cluster such that information loss is minimal.
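The two stages described above, risk assessment followed by cluster-wise pattern generalization, can be illustrated with a minimal sketch. It is not the disclosed method: the risk measure (one divided by the number of records sharing the same quasi-identifier values), the sort-based clustering, the range-style pattern, and all names (`risk_scores`, `anonymize`, `max_risk`, the `age` and `zip` fields) are assumptions chosen for illustration only.

```python
from collections import Counter

def risk_scores(records, quasi_ids):
    """Assumed risk measure: 1 / size of the record's equivalence class."""
    keys = [tuple(r[f] for f in quasi_ids) for r in records]
    sizes = Counter(keys)
    return [1.0 / sizes[key] for key in keys]

def anonymize(records, quasi_ids, num_field, k=2, max_risk=0.5):
    """Generalize num_field only for records whose risk exceeds max_risk.

    Risky records are sorted by num_field and grouped into clusters of k
    neighbours, so each cluster's range pattern stays as narrow as possible.
    """
    risks = risk_scores(records, quasi_ids)
    risky = sorted(
        (i for i, risk in enumerate(risks) if risk > max_risk),
        key=lambda i: records[i][num_field],
    )
    clusters = [risky[j:j + k] for j in range(0, len(risky), k)]
    # Fold an undersized trailing cluster into the previous one.
    # Limitation: a single leftover cluster smaller than k stays as is.
    if len(clusters) > 1 and len(clusters[-1]) < k:
        clusters[-2].extend(clusters.pop())

    out = [dict(r) for r in records]  # leave the input untouched
    for cluster in clusters:
        values = [records[i][num_field] for i in cluster]
        pattern = f"{min(values)}-{max(values)}"  # per-cluster pattern
        for i in cluster:
            out[i][num_field] = pattern
    return out

# Hypothetical sample data; the diagnosis field is the published payload.
records = [
    {"age": 23, "zip": "10001", "diagnosis": "flu"},
    {"age": 23, "zip": "10001", "diagnosis": "cold"},
    {"age": 25, "zip": "10001", "diagnosis": "asthma"},
    {"age": 41, "zip": "10001", "diagnosis": "flu"},
    {"age": 27, "zip": "10001", "diagnosis": "flu"},
    {"age": 44, "zip": "10001", "diagnosis": "cold"},
]
quasi_ids = ["age", "zip"]
anonymized = anonymize(records, quasi_ids, "age")
for row in anonymized:
    print(row)
```

In this sketch, records that already share their quasi-identifier values with another record are left unchanged, and only the unique ones are generalized, which is one simple way to keep distortion low while reducing the assumed risk measure.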