Differential privacy is a system enabling the analysis of databases containing personal information without divulging the identity of the individuals.
February 28, 2022
Differential privacy is a system that enables the analysis of databases containing personal information, without divulging the identity of the individuals. Differential privacy provides a mathematically provable guarantee of privacy, protecting against a wide range of privacy attacks (including differencing attacks, linkage attacks, and reconstruction attacks).
The amount of sensitive data recorded digitally is increasing as people rely on digital services for applications ranging from payments, shopping, and health to transportation and navigation. While this data has many advantageous use cases, it also presents significant privacy challenges. Differential privacy aims to protect the privacy of an individual's data while enabling data scientists and researchers to continue the aggregate analysis of the data collected.
Companies have typically relied on data masking (also called de-identification) to protect privacy in datasets. Data masking removes personally identifiable information (PII) from each record within the dataset. However, research and real-life incidents have shown that simply removing PII from datasets doesn't guarantee the privacy of individuals. Combining anonymous datasets with auxiliary information allows for people's identities to be discovered. Examples include the following:
Differential privacy aims to prevent these types of attacks by introducing random noise into the shared data. It is possible to add a level of noise such that the output prevents an attacker from discovering anything statistically significant about individuals in the dataset while also ensuring the dataset remains useful to analysts. By introducing an appropriate level of random noise, the same output could come from a database with or without the target's information.
It is possible to apply differential privacy to a wide range of systems, including recommendation systems, social networks, and location-based services. Variations of differentially private algorithms are also utilized in machine learning, game theory and economic mechanism design, statistical estimation, and many other fields. The following are examples of differential privacy in use:
A variety of challenges are associated with implementing differential privacy:
Differential privacy is one of many privacy-enhancing technologies (PETs) available. Others include the following:
Minimum query set size is a constraint aiming to ensure the privacy of individuals during aggregate queries (when the returned value is calculated across a subset of records in a dataset). It blocks queries that do not include data from a minimum number of records, i.e., if the query calculates data from fewer than a defined threshold of records, the query is blocked.
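A minimal sketch of how such a constraint might be enforced is shown below; the threshold value, record format, and function name are illustrative assumptions rather than part of any particular system.

```python
MINIMUM_QUERY_SET_SIZE = 50  # assumed threshold; real systems choose their own


def mean_salary(matching_records):
    """Run an aggregate query only if enough records match.

    Queries covering fewer records than the minimum query set size are
    blocked, since their results could reveal details about individuals.
    """
    if len(matching_records) < MINIMUM_QUERY_SET_SIZE:
        raise ValueError("Query blocked: query set is below the minimum size")
    return sum(record["salary"] for record in matching_records) / len(matching_records)
```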
In 1979, Dorothy Denning, Peter J. Denning, and Mayer D. Schwartz published a paper titled "The tracker: a threat to statistical database security." The paper describes a type of attack proving it is possible to learn confidential information from a series of targeted queries. Therefore, minimum query set sizes do not ensure privacy.
Differential privacy was defined from years of research applying algorithmic ideas to the study of privacy. Many cite Cynthia Dwork's 2006 paper as the first definition of differential privacy. In the paper, Dwork proved Dalenius's definition failed, and that auxiliary information could always lead to re-identifying individuals when querying a dataset. Due to this fact, Dwork proposed a new definition known as differential privacy, stating techniques
can achieve any desired level of privacy under this measure. In many cases, extremely accurate information about the database can be provided while simultaneously ensuring very high levels of privacy.
During the 2010s, large tech companies, including Apple, Facebook, and Amazon, began implementing differential privacy to protect the users of their services. Google has released multiple open-source differential privacy libraries to aid developers. In 2020, the US census adopted differential privacy techniques to protect respondents' personal information.
Early differential privacy patents include patent #7698250B2, first filed on December 16th, 2005 by Microsoft with inventors Cynthia Dwork and Frank D. McSherry. The patent described differentially private systems and methods for controlling privacy loss during database participation by introducing an appropriate noise distribution based on the sensitivity of the query. The patent was granted and published on April 13th, 2010.
A number of companies and institutions have been granted patents related to differential privacy, including Apple, Microsoft, and NortonLifeLock. The table below shows a list of patents related to differential privacy.
By averaging multiple attempts, the attacker gets close to the real answer, and with more queries, they can uncover sensitive data and breach the privacy of the data set. From the "90% confidence interval," you can see it will take significantly more queries to be statistically confident in the real number of people with a bad credit rating.
While adding noise has concealed the real answer somewhat, this can be circumvented by repeatedly querying the database. One could increase the number of queries it takes by increasing the level of noise introduced to the results (higher standard deviation). To better defend sensitive data, one cannot simply add an arbitrary level of noise, such as the standard deviation of 2 from the example above. The level of noise needed to obscure the real answer is different for each function and depends on the function's sensitivity.
For data sets D1 and D2 differing by at most one element, the sensitivity of a function is the largest possible difference one row can have on the result of the whole function, for any dataset. For example, a counting function has a sensitivity of 1, as adding or removing a single row from any dataset changes the count by at most 1. If the dataset were grouped using multiples of 5 (i.e., 0, 5, 10, 15, etc.), then the sensitivity would increase to 5. Determining the sensitivity of an arbitrary function is more difficult and became an area of significant research.
For an attacker to not learn anything about an individual, they must be restricted to insignificantly small changes in their belief about an individual, i.e. there is no difference between using a dataset and an identical dataset minus a single person's records.
The algorithm, or mechanism K, satisfying this expression addresses concerns that any participant has about their personal information being leaked. Even if a participant's information is removed from the data set, no outputs would become significantly more or less likely.
The alternative approach is local differential privacy, in which the aggregator does not have access to the raw data. Instead, differentially private algorithms are applied locally to each user's data before transfer to the aggregator. The aggregator can compute statistics and publish results from this noisy data without further acting on the dataset. In theory, the aggregator could publish all the data they receive as it has already been anonymized locally.
Differential privacy is a system that enables the analysis of databases containing personal information, without divulging the identity of the individuals. This is achieved by adding randomized “noise” to an aggregate query result in order to protect individual entries without significantly changing the result. Differentially private algorithms prevent attackers from learning anything about specific individuals while also allowing researchers to obtain valuable information on the database as a whole. One of the simplest algorithms is the Laplace mechanism, which post-processes results of aggregate queries. Differentially private algorithms are an active field of research.
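A minimal sketch of the Laplace mechanism in Python is shown below, assuming NumPy for the noise draw; the function name and parameter values are illustrative, and the notions of sensitivity and the privacy loss parameter ε are discussed later in the article.

```python
import numpy as np


def laplace_mechanism(true_result, sensitivity, epsilon, rng=np.random.default_rng()):
    """Return a differentially private version of an aggregate query result.

    Noise is drawn from a zero-centered Laplace distribution whose scale is
    the query's sensitivity divided by the privacy loss parameter epsilon.
    """
    scale = sensitivity / epsilon
    return true_result + rng.laplace(loc=0.0, scale=scale)


# Example: privatize a counting query (sensitivity 1) with epsilon = 0.5.
noisy_count = laplace_mechanism(true_result=3, sensitivity=1, epsilon=0.5)
```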
Differential privacy aims to prevent these types of attacks by sharing data with random noise introduced. It is possible to add a level of noise such that the output prevents an attacker from discovering anything statistically significant about individuals in the dataset while also ensuring the dataset remains useful to analysts. Differentially private algorithms guarantee the attacker cannot learn anything statistically significant about a target. By introducing an appropriate level of random noise the same output could come from a database with or without the target's information.
The level of noise introduced to query results is determined by the privacy loss parameter ε (epsilon). The noise is typically drawn from the Laplace distribution, and ε determines how much deviation there is in results if a single piece of data is excluded from the dataset. The extent to which an attacker can change their belief about an individual is controlled by ε; it determines the boundary on the change in probability of any outcome.
ε determines the maximum difference between a query of the original data and the same query of a parallel database missing a single record.
A small value for ε means only a small deviation in the computation if any user's data is removed from the dataset, i.e., results are more random and an attacker can learn very little. If ε = 0, there is no difference in the query result if a record is removed. Higher values for ε result in more accurate but less private results. The optimal value of ε depends on the trade-off between privacy and accuracy for a given scenario and has not yet been determined.
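As a rough illustration of this trade-off, Laplace noise is commonly scaled to the query's sensitivity divided by ε, so smaller values of ε widen the noise distribution; the values below are assumed purely for demonstration.

```python
import numpy as np

sensitivity = 1  # a counting query: one record changes the count by at most 1
rng = np.random.default_rng(seed=0)

for epsilon in (0.1, 0.5, 1.0, 2.0):
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon
    noise = rng.laplace(0.0, scale, size=10_000)
    # Smaller epsilon -> larger scale -> noisier (more private, less accurate) answers.
    print(f"epsilon={epsilon}: typical |noise| ~ {np.mean(np.abs(noise)):.2f}")
```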
A randomized algorithm K is ε-differentially private if, for all data sets D1 and D2 (differing on at most one element) and for all possible sets of values S that K could output:
The amount of sensitive data recorded digitally is rapidly increasing with people relying on digital services for new applications, from payments, shopping, and health to transportation and navigation. While this data has many advantageous use cases, it also presents significant privacy challenges. Differential privacy aims to protect the privacy of an individual's data while enabling data scientists and researchers to continue the aggregate analysis of the data collected.
It is possible to apply differential privacy to a wide range of systems, including recommendation systems, social networks, and location-based services. Variations of differentially private algorithms are also utilized in machine learning, game theory and economic mechanism design, statistical estimation, and many more. Examples of differential privacy in use include:
In 1979, Dorothy Denning, Peter J. Denning, and Mayer D. Schwartz published a paper titled "The tracker: a threat to statistical database security." The paper describes a type of attack proving it is possible to learn confidential information from a series of targeted queries. Therefore, minimum query set sizes do not ensure privacy.
A number of companies and institutions have been granted patents related to differential privacy including Apple, Microsoft, and NortonLifeLock. The table below shows a list of patents related to differential privacy.
Differentially private algorithms incorporate random noise into query results. This decreases the influence of individual records, preventing attackers from breaching the privacy of people within the data set.
Imagine a database of credit ratings such that: 3 people have a bad rating, 10 have a normal rating, and 200 have a good rating. An attacker wants to know the number of people with a bad credit rating. Instead of returning the real answer (N = 3), a query of the database returns the truth (N) combined with some random noise (N+L). The random noise (L) is drawn from a zero-centered Laplace distribution with a standard deviation of 2.
The attacker begins querying the database, receiving a different result each time:
By averaging multiple attempts the attacker gets close to the real answer and with more queries, they can uncover sensitive data and breach the privacy of the data set. From the "90% confidence interval" you can see it will take significantly more queries to be statistically confident in the real number of people with a bad credit rating.
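The averaging attack described above can be simulated in a few lines; the true count of 3 and the standard deviation of 2 follow the example in the text, while the rest of the code is an illustrative sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

true_count = 3                # real number of people with a bad credit rating
std_dev = 2                   # noise standard deviation used in the example
scale = std_dev / np.sqrt(2)  # a Laplace distribution's std dev is scale * sqrt(2)


def noisy_query():
    # Each query returns the truth plus fresh zero-centered Laplace noise.
    return true_count + rng.laplace(0.0, scale)


# Repeated queries let the attacker average away the noise.
for n_queries in (1, 10, 100, 10_000):
    estimate = np.mean([noisy_query() for _ in range(n_queries)])
    print(f"{n_queries:>6} queries -> estimate ~ {estimate:.2f}")
```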
While adding noise has concealed the real answer somewhat, this can be circumvented by repeatedly querying the database. We could increase the number of queries it takes by increasing the level of noise introduced to the results (higher standard deviation). To better defend sensitive data, we cannot simply add an arbitrary level of noise, such as the standard deviation of 2 from our example above. The level of noise needed to obscure the real answer is different for each function and depends on the function's sensitivity.
Take the function:
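The specific function is not reproduced in the text; any real-valued aggregate query works for the argument that follows, for example a count over the credit-rating database used earlier, written here in LaTeX notation as an assumed illustration:

f(D) = \#\{\, r \in D : r \text{ has a bad credit rating} \,\}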
The sensitivity of the function is:
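The equation itself is not reproduced in the text; the standard definition, consistent with the explanation below, is:

\Delta f = \max_{D_1, D_2} \left| f(D_1) - f(D_2) \right|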
For data sets D1 and D2 differing by at most one element. The equation above states that the sensitivity of a function is the largest possible difference one row can have on the result of the whole function, for any dataset. For example, a counting function has a sensitivity of 1, as adding or removing a single row from any dataset changes the count by at most 1. If the dataset were grouped using multiples of 5 (i.e., 0, 5, 10, 15, etc.), then the sensitivity would increase to 5. Determining the sensitivity of an arbitrary function is more difficult and became an area of significant research.
For an attacker to not learn anything about an individual they must be restricted to insignificantly small changes in their belief about an individual, i.e. there is no difference between using a dataset and an identical dataset minus a single person's records.
The level of noise introduced to query results is determined by the privacy loss parameter ε (epsilon). The noise is typically drawn from the Laplace distribution, and ε determines how much deviation there is in results if a single piece of data is excluded from the dataset. The extent to which an attacker can change their belief about an individual is controlled by ε; it determines the boundary on the change in probability of any outcome.
A randomized algorithm K is ε-differentially private if, for all data sets D1 and D2 (differing on at most one element) and for all possible sets of outputs S:
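The defining inequality does not appear in the text; in standard notation, consistent with the description that follows, it reads:

\Pr[K(D_1) \in S] \le e^{\varepsilon} \cdot \Pr[K(D_2) \in S]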
The algorithm, or mechanism K, satisfying this expression addresses concerns that any participant has about their personal information being leaked. Even if a participant's information is removed from the data set no outputs would become significantly more or less likely.
Beyond guaranteeing privacy, differential privacy also has the following characteristics:
There are two common approaches to differential privacy: global (sometimes called central) and local. The main difference between them is who is granted access to the raw data.
With noise added to each individual's data, the total noise is higher, reducing accuracy and often leading to analysts needing a larger data set. However, the main advantage of local differential privacy is the removal of the trusted aggregator. Local differential privacy is a good alternative if the aggregates are too broad for the level of analysis required. With local differential privacy, individuals cannot deny participation in the data set but they can deny the contents of their records. A local approach to differential privacy also has great potential for supervised machine learning.
Companies have typically relied on data masking (also called de-identification) to protect individual privacy in datasets. Data masking removes personally identifiable information (PII) from each record within the dataset. However, research and real-life incidents have shown that simply removing PII from datasets doesn't guarantee the privacy of individuals. Combining anonymous datasets with auxiliary information allows for people's identities to be discovered. Examples include:
Differential privacy aims to prevent these types of attacks by sharing data with random noise introduced. It is possible to add a level of noise such that the output prevents an attacker from discovering anything statistically significant about individuals in the dataset while also ensuring the dataset remains useful to analysts. Differentially private algorithms guarantee the attacker cannot learn anything statistically significant about a target. By introducing an appropriate level of random noise, the same output could come from a database with or without the target's information.
Minimum query set size is a constraint aiming to ensure the privacy of individuals during aggregate queries (when the returned value is calculated across a subset of records in a dataset). It blocks queries that do not include data from a set minimum number of records, i.e., if the query calculates data from fewer than a defined threshold, the query is blocked.
In 1979, Dorothy Denning, Peter J. Denning, and Mayer D. Schwartz published a paper titled "The tracker: a threat to statistical database security." The paper describes a type of attack proving it is possible to learn confidential information from a series of targeted queries, and therefore minimum query set sizes do not ensure privacy.
Early differential privacy patents include patent #7698250B2, first filed on December 16th, 2005 by Microsoft with inventors Cynthia Dwork and Frank D. McSherry. The patent described differentially private systems and methods for controlling privacy loss during database participation by introducing an appropriate noise distribution based on the sensitivity of the query. The patent was granted and published on April 13th, 2010.
A number of companies and institutions have been granted patents related to differential privacy, including Apple, Microsoft, and NortonLifeLock. The table below shows a list of patents related to differential privacy.
There are two common approaches to differential privacy: global (sometimes called central) and local. The main difference between them is who is granted access to the raw input.
In global differential privacy a trusted central aggregator, or curator, has access to the raw data. Generally, this aggregator is a service or research organization collecting data about individuals. They receive user data without noise and are responsible for transforming it using a differentially private algorithm. The algorithm is only applied once at the end of the process before any analysis is published or shared with other parties.
When an individual's data is being queried, global differential privacy ensures they are able to deny their participation in the dataset used to produce the result, thereby reducing the likelihood of re-identification. Global differential privacy improves accuracy, reducing the level of noise needed to produce valuable results with a low ε. It also protects against post-processing (including from attackers with access to auxiliary information).
Global differential privacy does require individuals to entrust their information to the central aggregator. In addition, with all the information held by a single organization, the risk of cyberattacks and data leaks increases. Other downsides include limiting queries to ones that generate aggregates.
The alternative approach is local differential privacy where the aggregator does not have access to the raw data. Instead, differentially private algorithms are applied locally to each user's data before transfer to the aggregator. The aggregator can compute statistics and publish results from this noisy data without further acting on the dataset. In theory, the aggregator could publish all the data they receive as it has already been anonymized locally.
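One well-known way to add noise locally before data reaches the aggregator is randomized response, sketched below; this is an illustrative example rather than a description of any specific deployment, and the probabilities used are assumptions.

```python
import random


def randomized_response(true_answer: bool, p_truth: float = 0.75) -> bool:
    """Locally privatize a yes/no answer before it is sent to the aggregator.

    With probability p_truth the real answer is reported; otherwise the
    result of a fair coin flip is reported, giving plausible deniability.
    """
    if random.random() < p_truth:
        return true_answer
    return random.random() < 0.5


# The aggregator only ever sees noisy responses, yet can still estimate the
# true proportion of "yes" answers because the noise process is known:
# E[observed] = 0.75 * p + 0.25 * 0.5, so p = (observed - 0.125) / 0.75.
responses = [randomized_response(ans) for ans in [True] * 300 + [False] * 700]
observed = sum(responses) / len(responses)
print(f"observed rate {observed:.2f}, estimated true rate {(observed - 0.125) / 0.75:.2f}")
```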
2020
Differential privacy is a system enabling the analysis of databases containing people's personal information, without divulging the identity of the individuals.
Differential privacy is a system that enables the analysis of databases containing personal information, without divulging the identity of the individuals. This is achieved by adding randomized “noise” to an aggregate query result in order to protect individual entries without significantly changing the result. Differentially private algorithms prevent attackers from learning anything about specific individuals while also allowing researchers to obtain valuable information on the database as a whole. Differentially private algorithms are still an active field of research.
Companies have typically relied on data masking (also called de-identification) to protect individual privacy in datasets. Data masking removes personally identifiable information (PII) from each record within the dataset. However, research and real-life incidents have shown that simply removing PII from datasets doesn't guarantee the privacy of individuals. Combining anonymous datasets with auxiliary information allows for people's identities to be discovered. Examples include:
Differential privacy aims to prevent these types of attacks by sharing data with random noise introduced. It is possible to add a level of noise such that the output prevents an attacker from discovering anything statistically significant about individuals in the dataset while also ensuring the dataset remains useful to analysts. Differentially private algorithms guarantee the attacker cannot learn anything statistically significant about a target; introducing an appropriate level of random noise means the same output could come from a database with or without the target's information.
Challenges associated with implementing differential privacy include:
Differential privacy is one of many privacy-enhancing technologies (PETs) available. Others include:
Differentially private algorithms are the result of decades of research on technologies for privacy-preserving data analysis. Two earlier concepts that directly influenced differential privacy are:
In 1977, statistician Tore Dalenius proposed a strict definition of data privacy, stating it should be impossible to learn anything about an individual from a database that cannot be learned without access to the database. While later work would go on to disprove Dalenius's definition, it became a key building block for differential privacy.
Minimum query set size is a constraint aiming to ensure the privacy of individuals during aggregate queries (when the returned value is calculated across a subset of records in a dataset). It blocks queries that do not include data from a set minimum number of records, i.e., if the query calculates data from fewer than a defined threshold, that query is blocked.
In 1979, Dorothy Denning, Peter J. Denning, and Mayer D. Schwartz published a paper titled "The tracker: a threat to statistical database security." The paper described a type of attack showing it is possible to learn confidential information from a series of targeted queries, and therefore minimum query set sizes do not ensure privacy.
Differential privacy was defined from years of research applying algorithmic ideas to the study of privacy. Many cite Cynthia Dwork's 2006 paper as the first definition of differential privacy. In the paper, Dwork proved Dalenius's definition failed and that auxiliary information could always lead to re-identifying individuals when querying a dataset. Due to this fact, Dwork proposed a new definition known as differential privacy, stating the technique can:
achieve any desired level of privacy under this measure. In many cases, extremely accurate information about the database can be provided while simultaneously ensuring very high levels of privacy.
Differential privacy guarantees that an attacker can learn nothing more about an individual than they could if the target's information were removed from the dataset. While weaker than Dalenius’s definition of privacy, the guarantee means individual records are almost irrelevant to the output of the system and therefore the organization handling a participant's data will not violate their privacy.
2006
Dwork showed any access to sensitive data would violate Dalenius's definition of privacy.
March 1, 1979
The paper titled "The tracker: a threat to statistical database security" shows how it is possible to learn confidential information from a series of targeted queries. These attacks show minimum query set sizes cannot ensure privacy.
1977
Dalenius's definition states that nothing about an individual should be learned from the database that cannot be learned without access to the database.
Differential privacy is a system that enables the analysis of databases containing people's personal information, without divulging the identity of the individuals. This is achieved by adding randomized “noise” to an aggregate query result in order to protect individual entries without significantly changing the result. Differentially private algorithms prevent attackers from learning anything about specific individuals while also allowing researchers to obtain valuable information on the database as a whole. Differentially private algorithms are still an active field of research.
The amount of sensitive data recorded digitally is rapidly increasing with people relying on digital services in many more applications from payments, shopping, and health to transportation and navigation. While this data has many advantageous use cases it also presents significant privacy challenges. Differential privacy aims to protect the privacy of an individual's data while enabling data scientists and researchers to continue the aggregate analysis of the data collected.
It is possible to apply differential privacy to a wide range of systems such as recommendation systems, social networks, and location-based services. Examples of differential privacy include:
Companies have typically relied on data masking (also called de-identification) to protect individual privacy in datasets. Data masking removes personally identifiable information (PII) from each record within the dataset. However, research and real-life incidents have shown that simply removing PII from datasets doesn't guarantee the privacy of individuals. Combining anonymous datasets with auxiliary information allows for original identities to be re-identified. Examples include:
Differential privacy aims to prevent these types of attacks by sharing data combined with random noise. It is possible to add a level of noise such that the output prevents an attacker from discovering anything statistically significant about individuals in the dataset while also ensuring the dataset remains useful for analysts. Differentially private algorithms guarantee the attacker cannot learn anything statistically significant about a target; introducing an appropriate level of random noise means the same output could come from a database with or without the target's information.
February 28, 2022
January 28, 2022
The tool was developed in partnership with OpenMined, an organization of open-source developers.
January 23, 2020
The technique, based on differential privacy, replaces words in individual sentences to re-phrase customer-supplied text such that the analysis is not based on the original language.
September 5, 2019
October 30, 2014
Differential privacy is a system that enables the analysis of databases containing people's personal information, without divulging the personal identification of the individuals.
Differential privacy is a system that enables the analysis of databases containing people's personal information, without divulging the personal identification of the individuals. This is achieved by adding randomized “noise” to an aggregate query result in order to protect individual entries without significantly changing the result. Differentially private algorithms prevent attackers from learning anything about specific individuals while researchers can still obtain valuable data on the database as a whole. Differentially private algorithms are still an active field of research.
June 3, 2020
June 13, 2016
Apple's senior vice president of software engineering Craig Federighi made the announcement in the keynote address of Apple's Worldwide Developers' Conference (WWDC) in San Francisco.
March 2006
The research by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith, was presented at the Third Theory of Cryptography Conference (TCC 2006). It showed privacy can be preserved for general functions by calibrating the standard deviation of the noise according to the sensitivity of the function.
June 2005
The research by Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim, shows a strong form of privacy is possible with a small amount of noise, using the Sub-Linear Queries (SuLQ) primitive.
August 2004
The paper titled "Privacy-Preserving Datamining on Vertically Partitioned Databases" was presented at the 24th Annual International Cryptology Conference.
June 2003
The paper defines a method of preserving privacy and protecting against polynomial reconstruction algorithms by introducing a perturbation to the dataset.