Patent attributes
Privacy is defined in the context of a guessing game based on the so-called guessing inequality. The privacy of a sanitized record, i.e., guessing anonymity, is defined by the number of guesses an attacker needs to correctly guess an original record used to generate a sanitized record. Using this definition, optimization problems are formulated that optimize a second anonymization parameter (privacy or data distortion) given constraints on a first anonymization parameter (data distortion or privacy, respectively). Optimization is performed across a spectrum of possible values for at least one noise parameter within a noise model. Noise is then generated based on the noise parameter value(s) and applied to the data, which may comprise real and/or categorical data. Prior to anonymization, the data may have identifiers suppressed, whereas outlier data values in the noise perturbed data may be likewise modified to further ensure privacy.