Introducing Private and Secure Artificial Intelligence

Source: Deep Learning on Medium


{Warning: After reading you will change your career path}

Author: Archit

Introduction:

Differential Privacy is about ensuring that when our neural networks learn from sensitive data, they learn only what they are supposed to learn from it, without accidentally learning what they are not supposed to learn.

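The term ‘Differential’ in Differential Privacy refers to the difference that a single individual’s data makes to the output of a query: we compare the result on a database with the result on the same database with one person removed. Keeping that difference small usually means adding noise, and this is a trade-off: noise provides privacy but reduces accuracy.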

Foundational Principles of Differential Privacy

  1. How noise is applied
  2. How privacy is defined

What is Differential Privacy?

The general goal of differential privacy is to ensure that statistical analyses of a data set do not compromise the privacy of the people in it. Privacy is preserved if, after the analysis, the analyst has learned nothing about any individual in the data set. It also means that information which has been made public elsewhere should not become harmful to an individual because of the analysis.

A robust definition of privacy was proposed by Cynthia Dwork and Aaron Roth in The Algorithmic Foundations of Differential Privacy [1]:

“Differential Privacy” describes a promise, made by a data holder, or curator, to a data subject, and the promise is like this: “You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, data sets, or information sources, are available.”

Consider a database of people. To define privacy in this context, suppose we run some query against the database. If we remove one person from the database and the result of the query does not change, then that person’s privacy is fully protected: the person was not leaking any statistical information into the output of the query. To build more intuition, let us look at this in Python code.

Note: “parallel databases” are simply copies of the database with one entry removed. In the case below, for a database with n entries we create n parallel databases, each missing exactly one entry: the first parallel database (pdb) has the first entry removed, the second pdb has the second entry removed, and so on.

Note: Check out the comments in the code for explanation.

Parallel Database Creation
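
A minimal sketch of parallel database creation, assuming a toy database represented as a plain Python list with one 0/1 value per person. The helper names `create_db`, `get_parallel_db`, and `get_parallel_dbs` are illustrative choices, not part of any library.

```python
import random


def create_db(num_entries, seed=None):
    """Return a toy database: a list holding one random 0/1 value per person."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(num_entries)]


def get_parallel_db(db, remove_index):
    """Return a copy of db with the entry at remove_index removed."""
    return db[:remove_index] + db[remove_index + 1:]


def get_parallel_dbs(db):
    """Return one parallel database per entry, each missing exactly one entry."""
    return [get_parallel_db(db, i) for i in range(len(db))]


db = create_db(20, seed=0)
pdbs = get_parallel_dbs(db)

# 20 people give 20 parallel databases, each with 19 entries.
print(len(db), len(pdbs), len(pdbs[0]))  # 20 20 19
```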

Evaluating the Privacy of a Function

Now let us see how to query this database and then measure the privacy of that query. To do this, we compare the output of the query on the entire database with its output on each of the parallel databases. In this way we see how the query result changes when a person is removed from the database. If the result does not change, the removed person did not leak any private information.

Concretely, for every parallel database we compute the difference between the query’s output on that parallel database and its output on the entire database. These differences are what we use to evaluate sensitivity.

Sensitivity, or L1 sensitivity, is the maximum change in the output of the query when a single person is removed from the database. It lets us compare how much different queries do or do not leak information.
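
For reference, this matches the usual formal statement of L1 sensitivity, written here in the notation of Dwork and Roth [1]. The symbols are not used elsewhere in this article: f is the query, and D and D' are any two databases that differ in exactly one entry.

```latex
\Delta f \;=\; \max_{D,\,D' \,:\, D \text{ and } D' \text{ differ in one entry}} \bigl\lVert f(D) - f(D') \bigr\rVert_1
```

For a query that returns a single number, the L1 norm is just the absolute value of the difference.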

To evaluate the privacy of a function, we follow these steps:

  1. Create parallel databases.
  2. Define a function which queries a database.
  3. Now calculate the sensitivity.
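
The three steps above can be sketched as follows. This is a hedged illustration: the database is a hypothetical hard-coded list of 0/1 values, `sensitivity`, `sum_query`, and `mean_query` are illustrative names, and the value computed is an empirical sensitivity for this one database rather than a worst case over all possible databases.

```python
def sensitivity(query, db):
    """Largest change in the query output caused by removing any single entry."""
    full_result = query(db)
    max_change = 0
    for i in range(len(db)):
        # Step 1: build the parallel database with entry i removed.
        pdb = db[:i] + db[i + 1:]
        # Step 3: track the largest distance from the full-database result.
        max_change = max(max_change, abs(full_result - query(pdb)))
    return max_change


# Step 2: define functions that query a database.
def sum_query(db):
    return sum(db)


def mean_query(db):
    return sum(db) / len(db)


db = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

print(sensitivity(sum_query, db))   # 1
print(sensitivity(mean_query, db))  # much smaller than for the sum
```

On this data, removing one person can move the sum by a full unit but moves the mean only slightly, so the sum query is the more sensitive of the two.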

The main strategy that we’re going to take to protect an individual’s privacy is noise: we add random noise either to the data in the database or to the results of queries on the database, in order to protect people’s privacy.

Introducing Local and Global Differential Privacy

Local Differential Privacy adds noise to function inputs. In this case, each individual adds noise to their own data before putting it into the database, so everything that enters the database is already noised. In this setting users are most protected, as they do not have to trust the database owner to handle their data responsibly.
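
One classic way to add noise at the input is randomized response, where each person flips coins before answering. The article does not name a specific mechanism, so treat this as an assumed example; the function names and the 70% base rate are made up for illustration.

```python
import random


def randomized_response(true_value, rng):
    """Answer honestly on a first 'heads', otherwise answer with a second coin flip."""
    if rng.random() < 0.5:
        return true_value
    return 1 if rng.random() < 0.5 else 0


def estimate_true_mean(noisy_db):
    """Undo the skew: E[noisy] = 0.5 * true_mean + 0.25."""
    noisy_mean = sum(noisy_db) / len(noisy_db)
    return 2 * noisy_mean - 0.5


rng = random.Random(0)

# Hypothetical population in which roughly 70% of people hold the value 1.
true_db = [1 if rng.random() < 0.7 else 0 for _ in range(10000)]

# Each person noises their own answer before it ever reaches the database.
noisy_db = [randomized_response(value, rng) for value in true_db]

print(sum(true_db) / len(true_db))   # true mean (never seen by the database owner)
print(estimate_true_mean(noisy_db))  # estimate recovered from noisy answers only
```

Any single stored answer may be a coin flip, which gives each person plausible deniability, while the aggregate can still be estimated.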

Global Differential Privacy adds noise to the output of the query on the database. This means that the database itself contains all the private information and that it’s only the interface to the data which adds the noise necessary to protect each individual’s privacy.
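
A common way to add noise to a query output is the Laplace mechanism, sketched below under some assumptions. The privacy parameter `epsilon` is not introduced in this article, the sum query is assumed to have sensitivity 1 (0/1 data), and `laplace_noise` and `private_sum` are illustrative names.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(db, epsilon, rng, sensitivity=1.0):
    """Answer a sum query with Laplace noise of scale sensitivity / epsilon."""
    true_answer = sum(db)  # computed on the raw, un-noised data
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
db = [1] * 30 + [0] * 70  # the database stores the true private values

# Only the query interface adds noise; a smaller epsilon means more noise.
print(private_sum(db, epsilon=0.5, rng=rng))
print(private_sum(db, epsilon=5.0, rng=rng))
```

The raw values stay in the database, so this approach relies on the operator applying the noise correctly every time a query is answered.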

Conclusion

If the database operator is trustworthy, the only difference between the two approaches is that global differential privacy leads to more accurate results for the same level of privacy protection. This requires a trusted curator, that is, a database owner who adds the noise properly.


References

[1] Cynthia Dwork and Aaron Roth, The Algorithmic Foundations of Differential Privacy, Foundations and Trends in Theoretical Computer Science, Vol. 9, Nos. 3–4 (2014), pp. 211–407