Validation levels

Fernando MORENTE-ORIA • 13 December 2023

Examining the practical implementation of the validation process means looking at it from a business perspective. In doing so, the focus is on the validation activities.

The amount and accessibility of information needed and the phases of the validation process are important for determining the validation levels. This approach is particularly useful when classifying and designing validation activities within an organisation.

Validations can be divided into structural validations and content validations.

  • Structural validations are linked to the definition of the data structure. In the SDMX (Statistical Data and Metadata eXchange) context, this also includes the definition of the code lists and the constraints on the use of specific codes. Structural validations correspond to validation level 0 and part of validation level 1, described below.
  • Content validations are linked to levels 1 to 5 described below. They also rely on a clear definition of the data structure.

 

Validation level 0: consistency with the expected IT structural requirements.

Check e.g. that:

  • The file has the expected number of columns (agreed format of the file);
  • Each column has the expected format (e.g. alphanumeric, numeric).

Check if:

  • the file has been sent/prepared by the authorised authority (data sender)
  • the column separator and the end of record symbol are correctly used
  • the first column is alphanumeric (format of each variable / column)
  • the first column is two characters long (length of the first column)
  • the second column fits a particular mask (date)
  • all the required information is included in the file (no missing data)
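
Some of the level 0 checks above can be sketched in Python. This is a minimal illustration, not a reference implementation: the semicolon separator, the four-column layout (country, year, sex, value), the four-digit year mask and the function name `check_level0` are all assumptions made for the example. The checks on the data sender and on the end-of-record symbol are not covered.

```python
import re

# Assumptions for this sketch (not taken from any agreed file format):
EXPECTED_COLUMNS = 4                 # country, year, sex, value
SEPARATOR = ";"                      # column separator
YEAR_MASK = re.compile(r"\d{4}")     # mask for the second column


def check_level0(text: str) -> list[str]:
    """Return a list of structural errors found in the file content."""
    errors = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split(SEPARATOR)
        if len(fields) != EXPECTED_COLUMNS:
            errors.append(
                f"line {line_no}: expected {EXPECTED_COLUMNS} columns, got {len(fields)}"
            )
            continue  # the remaining checks need the full set of columns
        if any(field.strip() == "" for field in fields):
            errors.append(f"line {line_no}: missing value")
        country, period = fields[0], fields[1]
        if not (country.isalnum() and len(country) == 2):
            errors.append(f"line {line_no}: first column must be 2 alphanumeric characters")
        if not YEAR_MASK.fullmatch(period):
            errors.append(f"line {line_no}: second column does not fit the date mask")
    return errors
```

A file that passes returns an empty list, so the caller can reject the transmission as soon as the list is non-empty, before any content validation is attempted.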

 

Validation level 1: consistency within the dataset.

Check e.g. that:

  • The content of the third column is one of the codes from the ‘Sex’ dictionary;
  • The content of the first column (reporting country) is consistent with the data sender;
  • Total inhabitants = male inhabitants + female inhabitants.

Check if:

  • the year in the second column is 2011, as in the file name
  • the number included in column 4 is a positive integer
  • total inhabitants = male inhabitants + female inhabitants
  • female inhabitants = (total inhabitants / 2) ±10%
  • there are no duplicate records (e.g. the same country recorded twice in a list of different countries)
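
The level 1 checks can be illustrated with a short Python sketch that works on the records of a single dataset. The record layout (country, year, sex, value), the code list `T`/`M`/`F` and the function name `check_level1` are assumptions for the example. The ±10% rule is read here as a tolerance of 10% of half the total, which is one possible interpretation.

```python
from collections import Counter

SEX_CODES = {"T", "M", "F"}  # assumed 'Sex' code list: total, male, female


def check_level1(rows, expected_year):
    """rows: iterable of (country, year, sex, value) tuples of strings."""
    rows = list(rows)
    errors = []

    # Duplicate records: the same (country, year, sex) key appears more than once.
    keys = Counter((c, y, s) for c, y, s, _ in rows)
    errors += [f"duplicate record {k}" for k, n in keys.items() if n > 1]

    values = {}
    for country, year, sex, value in rows:
        if sex not in SEX_CODES:
            errors.append(f"{country}: unknown sex code {sex!r}")
        if year != expected_year:
            errors.append(f"{country}: year {year} differs from {expected_year}")
        if not (value.isascii() and value.isdigit() and int(value) > 0):
            errors.append(f"{country}: value {value!r} is not a positive integer")
            continue
        values[(country, sex)] = int(value)

    for country in sorted({c for c, _ in values}):
        t, m, f = (values.get((country, s)) for s in ("T", "M", "F"))
        if None in (t, m, f):
            continue  # breakdown incomplete, nothing to compare
        if t != m + f:
            errors.append(f"{country}: total {t} != male {m} + female {f}")
        if abs(f - t / 2) > 0.10 * (t / 2):
            errors.append(f"{country}: female inhabitants outside (total / 2) ±10%")
    return errors
```

The expected year is passed in by the caller, for example after extracting it from the file name.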

 

Validation level 2: consistency with other datasets within the same domain and data source.

Check e.g. that:

  • New data referring to a new time period is not an outlier (does not vary by more than 10 % compared to data from the previous time period);
  • Annual data is consistent with data from the corresponding quarterly datasets.

Check if:

  • new data are not outliers compared with previous data (e.g. max. 10% difference from the corresponding data of the previous time period)
  • different but related datasets correlate (e.g. changes in the number of males in one dataset and number of females in another dataset during the same period)
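
Two of the level 2 checks can be sketched as small Python functions. The function names and the default thresholds are assumptions for illustration. The annual check assumes a flow variable, for which the annual figure is the sum of the four quarters. Stock variables would need a different rule.

```python
def is_outlier(new_value: float, previous_value: float, max_change: float = 0.10) -> bool:
    """True if new_value differs from previous_value by more than max_change (relative)."""
    if previous_value == 0:
        # Relative change is undefined; flag any non-zero value for review.
        return new_value != 0
    return abs(new_value - previous_value) / abs(previous_value) > max_change


def annual_matches_quarters(annual: float, quarters: list[float], tolerance: float = 0.0) -> bool:
    """True if the annual figure equals the sum of four quarterly figures, within tolerance."""
    return len(quarters) == 4 and abs(annual - sum(quarters)) <= tolerance
```

In practice the 10% threshold would be tuned per indicator, since volatile series can legitimately change by more than that between periods.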

Validation level 3: consistency within the same domain between different data sources (mirror checks).

Check e.g. that the export declared by country A to country B is the same as the import declared by country B from country A.
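
A mirror check of this kind can be sketched in Python as follows. The dictionary layout, the function name `mirror_discrepancies` and the `tolerance` parameter are assumptions for the example. Real mirror statistics rarely match exactly, so a non-zero tolerance is usually needed.

```python
def mirror_discrepancies(exports, imports, tolerance=0.0):
    """Compare each declared export with the partner's declared import.

    exports[(A, B)] is the export declared by country A to country B.
    imports[(B, A)] is the import declared by country B from country A.
    """
    issues = []
    for (exporter, importer), exported in sorted(exports.items()):
        imported = imports.get((importer, exporter))
        if imported is None:
            issues.append((exporter, importer, "no mirror declaration"))
        elif abs(exported - imported) > tolerance:
            issues.append((exporter, importer, f"export {exported} vs import {imported}"))
    return issues
```

For example, with `exports = {("ES", "FR"): 100}` and `imports = {("FR", "ES"): 100}` the function returns an empty list, because the two declarations agree.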

 

Validation level 4: consistency between separate domains available in the same organisation.

Check e.g. that the number of enterprises and employees in SBS and Business demography are consistent for the same time period.

 

Validation level 5: consistency with data available in other organisations.

Check e.g. that country data in the ESS are consistent with the data available in the World Trade Organisation, International Labour Organisation, World Bank etc.