Microdata access
Microdata
Microdata consist of sets of records containing information on individual respondents. To protect the anonymity of respondents (persons, households, organisations), access to microdata is restricted, usually to researchers.
Types of microdata
- Public use files (PUF): files containing records on individual respondents (persons, households, business entities) that have been anonymised so that a respondent cannot be identified either directly (by name, address, social security number, etc.) or indirectly (by combining different, especially rare, characteristics of respondents such as age, occupation or education). PUF are not confidential and may in principle be used by the general public. Because of the extensive anonymisation (many variables are suppressed, modified or grouped together), PUF are of limited use for scientific purposes; they are most often used for training or testing. Public microdata are available via the Eurostat website.
- Scientific use files: files containing records on individual respondents (persons, households, business entities) that have been anonymised so that the risk of identifying a respondent is appropriately reduced but not eliminated completely. For this reason these files are considered confidential and access to them is restricted. Scientific use files are usually made available for download or transmitted to researchers and are used at the researchers' premises (off-site). The conditions of access to European microdata are described in the dedicated microdata section of the Eurostat website.
- Secure use files: files that contain no direct identifiers (e.g. name, address, social security number) but to which no further protection methods have been applied. Access to secure use files is restricted. Researchers can usually access the secure environment where the data are stored, but nothing may be downloaded until the results of the analysis have been checked (see Guidelines on output checking). Only safe (non-confidential) output is released to researchers. Researchers may access secure use files in different ways:
  - On-site, at the premises of the statistical office;
  - Remotely, by connecting to the servers holding the data from another, distant location (e.g. a university or research organisation).
Scientific use files and secure use files are confidential data. To get access to these data, researchers must fulfil certain conditions (established by the data owners) and sign the relevant contracts or licences. In addition, researchers must respect the publication guidelines delivered with the data and any other conditions imposed by the data owner.
See more: conditions of access to European microdata
Modes of access to microdata
- off-site access: the data are sent or transmitted to the user and can be analysed anywhere (PUF) or in agreed places (scientific use files);
- on-site access: the data (secure use files) can be consulted only in predefined locations (e.g. a safe centre in a statistical office); the results of the analysis are checked by the staff of the office before release, and the final results cannot contain any confidential data (see more: Guidelines on output checking);
- remote access: the data (usually secure use files) are accessed by researchers who connect to them from another, distant location; as with on-site access, the researcher cannot take the results of the analysis out until the output has been checked (output checking);
- remote execution: the user (usually a researcher) does not see the microdata but submits scripts (in a statistical language such as SAS, SPSS or Stata) or extracts data through a query system, and receives back cleared output (containing no confidential results); the output is checked either manually (as with on-site or remote access) or automatically.
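To make the idea of an automatic output check more concrete, the sketch below shows one simple rule that such a check might apply: withholding a frequency table if any cell is based on too few respondents. It is an illustration only. The function names, the table layout and the threshold of 3 are assumptions made for this example and are not taken from the Guidelines on output checking or from any Eurostat system.

```python
# Minimal sketch of an automatic output check (hypothetical rule and names).
MIN_CELL_COUNT = 3  # assumed threshold, not taken from any official guideline


def check_output(table, min_count=MIN_CELL_COUNT):
    """Return (cleared, failing_cells) for a frequency table given as a dict.

    A cell fails when its count is above zero but below min_count, i.e. when
    it describes so few respondents that one of them might be identified.
    """
    failing = sorted(cell for cell, n in table.items() if 0 < n < min_count)
    return (not failing, failing)


def release(table, min_count=MIN_CELL_COUNT):
    """Return the table only if it passes the check; otherwise raise."""
    cleared, failing = check_output(table, min_count)
    if not cleared:
        raise ValueError(f"output withheld, small cells: {failing}")
    return table


# Example: a table of respondent counts by (region, occupation).
result = {("North", "teacher"): 12, ("South", "astronaut"): 1}
cleared, failing = check_output(result)
```

In this example `cleared` is false because one cell describes a single respondent, so `release(result)` would raise instead of returning the table. A real check would apply several rules and would typically be combined with manual review.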
Microdata anonymisation
Anonymisation is the process of reducing or eliminating the risk of identification of respondents in microdata. The statistical methods used to anonymise the data are called statistical disclosure control methods. The term "anonymisation" is also sometimes used more narrowly, for the removal of direct identifiers from the data.
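The following sketch illustrates both senses of the term on a toy data set: first the direct identifiers are removed, then one variable is grouped, and finally the records that remain unique on a combination of characteristics are flagged as still carrying a risk of indirect identification. The records, field names, age bands and the threshold `k=2` are all invented for illustration; actual anonymisation relies on a wider range of statistical disclosure control methods.

```python
from collections import Counter

# Hypothetical records; names and values are invented for illustration.
records = [
    {"name": "A. Smith", "age": 34, "occupation": "teacher", "region": "North"},
    {"name": "B. Jones", "age": 37, "occupation": "teacher", "region": "North"},
    {"name": "C. Brown", "age": 82, "occupation": "astronaut", "region": "South"},
]

DIRECT_IDENTIFIERS = {"name"}  # assumed list of direct identifiers
QUASI_IDENTIFIERS = ("age_group", "occupation", "region")  # assumed


def remove_direct_identifiers(record):
    """Drop the fields that identify a respondent directly."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def group_age(record, width=10):
    """Replace the exact age with a band such as '30-39'."""
    out = dict(record)
    low = (out.pop("age") // width) * width
    out["age_group"] = f"{low}-{low + width - 1}"
    return out


def quasi_key(record):
    """Combination of characteristics that could identify someone indirectly."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def risky_records(rows, k=2):
    """Return rows whose combination of characteristics occurs fewer than k times."""
    counts = Counter(quasi_key(r) for r in rows)
    return [r for r in rows if counts[quasi_key(r)] < k]


anonymised = [group_age(remove_direct_identifiers(r)) for r in records]
flagged = risky_records(anonymised, k=2)
```

Here the two teachers share the same combination of characteristics after grouping, while the third record remains unique and is flagged. Such a record would need further treatment (for example suppressing or further grouping a variable) before the file could be considered protected.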
Statistical disclosure control
Statistical disclosure control (SDC) is the branch of statistics concerned with:
- microdata protection (anonymisation);
- tabular data protection (eliminating the risk of identification of respondents in the tables published by statistical offices).
See more: Handbook on SDC
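As a rough illustration of tabular data protection, the sketch below applies one well-known technique, suppression of small cells, to a frequency table. The table, the function name and the threshold of 3 are assumptions chosen for this example and do not come from the Handbook on SDC.

```python
def suppress_small_cells(table, min_count=3):
    """Replace counts between 1 and min_count - 1 with None (suppressed).

    The threshold is an assumed value for illustration only.
    """
    return {
        cell: (None if 0 < n < min_count else n)
        for cell, n in table.items()
    }


# Hypothetical table: number of respondents by occupation.
table = {"teacher": 40, "nurse": 17, "astronaut": 1}
protected = suppress_small_cells(table)
```

The single astronaut is hidden while the larger cells are published unchanged. Note that this covers only the first step: if row or column totals are also published, further cells usually have to be suppressed so that the hidden value cannot be recovered by subtraction.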
Access to European microdata
Details on access to European microdata can be found on the dedicated Eurostat website.