Microdata access

Microdata

Microdata are sets of records containing information on individual respondents. To protect the anonymity of respondents (persons, households, organisations), access to microdata is restricted and is usually limited to researchers.

Types of microdata

Public Use Files (PUF)

These are files containing records on individual respondents (persons, households, business entities), anonymised in such a way that the respondent cannot be identified either directly (by name, address, social security number, etc.) or indirectly (by combining different, especially rare, characteristics of respondents: age, occupation, education, etc.). PUF are not confidential and in principle may be used by the general public. Because of the extensive anonymisation (many variables are suppressed, modified, or grouped), PUF are of limited use for scientific purposes. They are very often used for training or testing. Public microdata of European countries are available via the Eurostat website.
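Indirect identification can be illustrated with a minimal Python sketch. It counts how often each combination of characteristics occurs and flags the records whose combination is rare. The records, the variable names and the threshold k=2 are made up for illustration; they are not taken from any real file or official rule.

```python
from collections import Counter

# Hypothetical, made-up records: no direct identifiers, only characteristics
# that could identify a respondent when combined (quasi-identifiers).
records = [
    {"age_group": "30-39", "occupation": "teacher", "region": "North"},
    {"age_group": "30-39", "occupation": "teacher", "region": "North"},
    {"age_group": "30-39", "occupation": "teacher", "region": "South"},
    {"age_group": "80-89", "occupation": "surgeon", "region": "North"},
    {"age_group": "30-39", "occupation": "teacher", "region": "South"},
]


def risky_records(rows, quasi_identifiers, k=2):
    """Return rows whose quasi-identifier combination occurs fewer than k times."""

    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] < k]


# The threshold k=2 (flag combinations that occur only once) is an assumption.
flagged = risky_records(records, ["age_group", "occupation", "region"])
for row in flagged:
    print(row)
```

In this made-up example only the 80-89 year old surgeon is flagged: each variable on its own is harmless, but the combination is unique in the file.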

Scientific Use Files (SUF)

These are files containing records on individual respondents (persons, households, business entities), anonymised in such a way that the risk of identifying respondents is appropriately reduced but not completely eliminated; that is why these files are considered confidential and access to them is restricted. SUF are usually made available for download or transmitted to the researchers, and then used at the researchers' premises (off-site). The conditions of access to European microdata are described in the dedicated microdata section of the Eurostat website.

Secure Use Files (SecUF)

These are files that do not contain any direct identifiers (name, address, social security number, etc.) but to which no further protection methods are applied. Access to SecUF is restricted. Usually, researchers may access the secure environment where the data are stored, but downloading the data is prohibited; the results of the analysis may be taken out only after they have been checked (see Guidelines on output checking). Only safe (non-confidential) output is released to researchers. Researchers may access SecUF in different ways:
•    On-site, at the premises of the statistical office;
•    Remotely, by connecting to servers with the data from another, distant location (e.g. university, research organisation).

SUF and SecUF are confidential data. To gain access to these data, researchers must fulfil certain conditions (established by the data owners) and sign the relevant contracts or licences. In addition, researchers need to respect the guidelines for publication that are delivered with the data and any other conditions imposed by the data owner.

Modes of access to microdata

Off-site access

The data are sent or transmitted to the user and can be analysed anywhere (PUF) or in the agreed places (SUF).

On-site access

The data (usually secure use files) can be consulted only at predefined locations (e.g. a safe centre in the statistical office). The results of the analysis are checked by the staff of the office before release, and the final results cannot contain any confidential data (see more: Guidelines on Output Checking).

Remote access

The data (usually SecUF) are accessed by researchers who connect to the data from a distant location. As with on-site access, the researcher cannot take out the results of the analysis until the output has been checked (Output Checking).

Remote execution

The user (usually the researcher) does not see the microdata but sends scripts (in a statistical language such as SAS, SPSS or Stata) or extracts the data via a querying system, and receives back cleared output (no confidential results). The output is checked manually (as in the case of on-site or remote access) or automatically.
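As an illustration of automatic output checking, the following Python sketch applies a simple minimum-frequency rule to a frequency table before release. The threshold of 3, the function name and the table are assumptions made for this example; the rules actually applied are set by the data owner and are usually more elaborate.

```python
# Assumed threshold, for illustration only: cells based on fewer than
# 3 respondents are treated as potentially disclosive.
MIN_CELL_COUNT = 3


def check_frequency_table(table, min_count=MIN_CELL_COUNT):
    """Split a frequency table into releasable cells and cells needing review.

    Empty cells (count 0) are treated as safe here, which is a simplification.
    """
    safe, unsafe = {}, {}
    for cell, count in table.items():
        if 0 < count < min_count:
            unsafe[cell] = count
        else:
            safe[cell] = count
    return safe, unsafe


# Hypothetical output produced by a researcher's script.
output = {
    ("North", "employed"): 120,
    ("North", "unemployed"): 2,
    ("South", "employed"): 95,
    ("South", "unemployed"): 14,
}

safe, unsafe = check_frequency_table(output)
print("released:", safe)
print("held back for review:", unsafe)
```

Here the cell ("North", "unemployed") is held back because it is based on only two respondents. A complete check would also have to make sure that a withheld cell cannot be recalculated from row or column totals.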

Microdata protection methods

Microdata anonymisation

Anonymisation is the process of reducing or eliminating the risk of identification of respondents in microdata. The statistical methods used to anonymise the data are called statistical disclosure control methods. The term "anonymisation" is also sometimes used more narrowly, for the process of removing direct identifiers from the data.
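The two senses of the term can be illustrated with a minimal Python sketch: direct identifiers are removed first, and one simple protection method (grouping exact ages into ten-year bands) is then applied. The field names, the band width and the record are hypothetical; real anonymisation combines several methods and assesses the remaining risk.

```python
# Hypothetical field names for the direct identifiers in a raw record.
DIRECT_IDENTIFIERS = {"name", "address", "social_security_number"}


def age_band(age, width=10):
    """Group an exact age into a band such as '30-39' (width is an assumption)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(row):
    """Drop direct identifiers, then replace the exact age by an age band."""
    kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    kept["age"] = age_band(kept["age"])
    return kept


# Made-up record, for illustration only.
raw = {
    "name": "A. Example",
    "address": "1 Example Street",
    "social_security_number": "000-00-0000",
    "age": 37,
    "occupation": "teacher",
}

print(anonymise(raw))
```

Removing the direct identifiers alone corresponds to the narrower sense of the term; the grouping step is one example of the further protection applied to files released more widely.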

Statistical disclosure control

Statistical disclosure control is the statistical domain aiming at:

  • microdata protection (anonymisation);
  • tabular data protection (elimination of the risk of identification of the respondents in the tables published by statistical offices).

See more: Handbook on SDC

Access to microdata

Details on access to European microdata (at Eurostat and at national level) can be found on the dedicated CIMES pages on CROS. CIMES (Centralising and Integrating Metadata from European Statistics) provides an overview of the microdata disseminated for research purposes by Eurostat and by national statistical institutes (NSIs) across Europe.