Differential Privacy for Official Statistics: From Theory to Practice |
Course Leader | Raphaël de Fondeville (Swiss Federal Statistical Office) |
Target Group | Statisticians and staff responsible for statistical disclosure control and data protection. |
Entry Qualifications | Sound command of English. Participants should be able to make short interventions and to participate actively in discussions. Basic knowledge of Python. Basic knowledge of theoretical statistics (random variables, hypothesis testing, …). |
Objective(s) | The objective of this course is to give participants an overview of state-of-the-art attacks that aim to extract confidential information from data products, and an introduction to differential privacy, both as a formal mathematical framework for quantifying disclosure risk and in its practical implementation. |
Contents | Principles and examples of linkage, reconstruction, and membership inference attacks. Overview of input and output privacy-enhancing technologies. Motivation for and properties of differential privacy: the privacy-loss budget, the DP privacy guarantee, post-processing, and composition. Presentation of the Laplace, Gaussian, and exponential mechanisms. Local differential privacy, the randomized response mechanism, and differentially private synthetic data generators. Presentation of open-source libraries implementing DP for SQL queries, summary statistics, machine learning model training, and synthetic data generation. Introduction to Onyxia, the open-source data lab from INSEE (https://datalab.sspcloud.fr/). |
Expected Outcome | An understanding of state-of-the-art reidentification and attribute disclosure attacks. A basic understanding of privacy-enhancing technologies (PETs). A theoretical understanding of differential privacy and the major randomization mechanisms. Knowledge of how to implement differential privacy in Python data analysis pipelines using open-source libraries. Familiarity with the Onyxia Datalab developed by INSEE. |
Training Methods | The course includes both ex-cathedra lectures and hands-on sessions. The following training methods will be used: presentations and lectures; coding practicums in Python using the SSPCloud provided by INSEE. |
Required Reading | None |
Suggested Reading | |
Required Preparation | |
Trainer(s)/Lecturer(s) | Raphaël de Fondeville (Federal Statistical Office), Pauline Maury-Laribière (Federal Statistical Office), Damien Aymon (Federal Statistical Office), Lancelot Marti (Federal Statistical Office), Inès Hiverlet or Gaspard Ferey (INSEE) |