
Principles

12 November 2023

In addition to those described in the European Statistics Code of Practice, six further principles were drawn up for the validation processes (see Annex A of the Business Architecture for ESS Validation). These six principles are fully compatible with those in the European Statistics Code of Practice, but they aim specifically to provide guidance on how to improve the validation processes. They are particularly relevant for designing the business and IT architecture for data validation.


1. THE SOONER, THE BETTER

Validation processes must be designed so that errors can be corrected as soon as possible, allowing data editing to be performed at the stage where the knowledge needed to do this properly and efficiently is available.

Rationale

This principle is at the core of any statistical validation process. There may be many reasons underlying a validation error. Finding the cause and fixing it might well include investigating the correctness of data, software, methodologies or statistical processes as a whole. This can only be done by people sufficiently familiar with the statistical domain and the way the data was produced. Hence, the sooner errors are detected in a statistical production chain, the easier and more efficient it is to correct them.

Implications

For the ESS this means that validation of national data should take place at the NSIs, which have sole responsibility for the correctness of the national data. The NSIs can only do so if the validation rules are well-defined and understood (see principle 3). If national data appear to violate validation rules after data exchange, Eurostat should inform the NSI so that the correction can be made in the right place. Where validation errors arise from rules involving multiple countries, data editing cannot be done by one NSI alone. In those cases it is up to Eurostat, which is responsible for the European figures, to come up with the best possible solution.


2. TRUST, BUT VERIFY

When exchanging data between organisations, data producers should be trusted to have checked the data beforehand, and data consumers should verify the data against the commonly agreed rules.

Rationale

Successful data exchange between organisations is a shared responsibility of data producers and data consumers. It cannot work without a reasonable amount of trust and an understanding of each other's duties and challenges. It is the duty of data producers to validate the data from their local perspective before providing it to others. It is the task of data consumers to validate the data from their broader perspective and to provide data producers with useful feedback.

Implications

For the ESS this means that Member States have a duty to provide Eurostat with data that conform to the agreed validation rules. Eurostat, which guarantees and monitors the quality of European statistics, has a duty to check that Member States' data comply with these same rules and to provide Member States with timely feedback on conformance.
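
As a loose illustration of this division of duties, the Python sketch below shows a producer and a consumer applying the same agreed rule set before and after the data exchange. The rule names, fields and the helper failed_rules are hypothetical, standing in for whatever rules a domain actually agrees upon.

    # Illustrative sketch of "trust, but verify": producer and consumer run
    # the same agreed rule set. Rule names and fields are hypothetical.
    from typing import Callable, Dict, List

    Record = Dict[str, float]

    # Hypothetical rules agreed between data producer and data consumer.
    RULES: Dict[str, Callable[[Record], bool]] = {
        "population_non_negative": lambda r: r["population"] >= 0,
        "employment_within_population": lambda r: r["employment"] <= r["population"],
    }

    def failed_rules(record: Record) -> List[str]:
        """Return the names of the agreed rules that the record violates."""
        return [name for name, rule in RULES.items() if not rule(record)]

    record = {"population": 1000.0, "employment": 1200.0}

    # Producer side: validate before sending the data.
    print("producer finds:", failed_rules(record))
    # Consumer side: verify the same rules on receipt and send feedback.
    print("consumer finds:", failed_rules(record))

Because both sides run an identical rule set, any discrepancy found by the consumer points to the data or to the rule definitions themselves, not to diverging local checks.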


3. WELL-DOCUMENTED AND APPROPRIATELY COMMUNICATED VALIDATION RULES

Validation rules must be clearly and unambiguously defined and documented in order to achieve a common understanding and implementation among the different actors involved.

Rationale

This principle seeks (1) to facilitate the development of sound and efficient validation processes, (2) to formalise them and achieve their harmonised implementation and (3) to raise awareness of each participant's role in the validation process.

Implications

For the ESS, two elements are needed to make this principle operational: a common and easily understandable validation language, and an effective communication mechanism. This means that a universal validation language must be chosen by the ESS and that domain specialists (statistical working groups) must agree upon the validation rules for their respective domains.
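
As a rough sketch of what well-documented rules in a common language could look like in practice, the fragment below keeps each rule as a small, documented definition with an identifier, an expression in the agreed language and a plain-language description. The Rule class, the identifiers and the expressions are illustrative assumptions, not an existing ESS syntax.

    # Illustrative sketch: rules stored as documented, machine-readable
    # definitions. Identifiers, expressions and fields are hypothetical.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Rule:
        rule_id: str      # identifier agreed in the statistical working group
        expression: str   # the rule written in the agreed validation language
        description: str  # plain-language documentation of the rule's intent
        domain: str       # statistical domain the rule applies to

    DEMOGRAPHY_RULES = [
        Rule("DEM_001", "births >= 0",
             "The number of births reported for a period cannot be negative.",
             "demography"),
        Rule("DEM_002",
             "pop_end == pop_start + births - deaths + net_migration",
             "The demographic balance must close for every reference year.",
             "demography"),
    ]

    for rule in DEMOGRAPHY_RULES:
        print(f"{rule.rule_id} ({rule.domain}): {rule.description}")

Keeping the formal expression and the plain-language description together makes the same definition usable both by the software that executes the rule and by the people who have to understand and agree on it.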


4. WELL-DOCUMENTED AND APPROPRIATELY COMMUNICATED VALIDATION ERRORS

The error messages related to the validation rules need to be clearly and unambiguously defined and documented, so that they can be communicated appropriately to ensure a common understanding of the result of the validation process.

Rationale

This will ensure (1) that errors can be properly corrected, (2) that their recurrence is minimised and (3) that the risk of false negatives is reduced.

Implications

For the ESS this principle requires the definition of a standard ESS validation report structure that is expressive enough to explain, at a minimum, the error, its type and its severity, and that is clear and unambiguous enough to be easily understood by the domain and data managers of the NSIs. A streamlined communication process between Eurostat and the NSIs is necessary to make this principle operational.
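
The Python sketch below shows one possible shape for such a report entry, carrying at least the rule concerned, the severity and a readable message. The class, field names and severity levels are assumptions made for illustration, not the actual ESS report standard.

    # Illustrative sketch of a structured validation report entry.
    # Field names and severity levels are hypothetical.
    from dataclasses import dataclass
    from enum import Enum

    class Severity(Enum):
        INFORMATION = "information"
        WARNING = "warning"
        ERROR = "error"

    @dataclass
    class ValidationReportEntry:
        rule_id: str            # which agreed rule was violated
        severity: Severity      # how serious the violation is
        message: str            # human-readable explanation for the NSI
        observation_keys: list  # observations that triggered the violation

    entry = ValidationReportEntry(
        rule_id="DEM_002",
        severity=Severity.ERROR,
        message="Demographic balance does not close for reference year 2022.",
        observation_keys=["XX.2022"],
    )
    print(f"[{entry.severity.value}] {entry.rule_id}: {entry.message}")

A structure along these lines lets the same report be read by a person at the NSI and processed automatically, which is what a streamlined communication process between Eurostat and the NSIs requires.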


5. COMPLY OR EXPLAIN

Validation rules must be satisfied or reasonably well explained.

Rationale

There may be situations in which even previously agreed validation rules cannot be satisfied. In that case it should be possible to deviate from them, but only with a well-described and understandable explanation that is accepted by the data consumer.

Implications

For the ESS this means the validation architecture should provide a mechanism to explain the exceptional case of non-conformance, and should define criteria for deciding when an explanation is sufficient. Criteria that are too strict may prove unworkable; criteria that are too relaxed will not deliver the necessary quality improvements. We advise compiling a set of best practices for explanations, based on how this principle is applied in practice. Repeated occurrences of non-conformance call for a joint re-evaluation of the previously agreed validation rules.
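
One way to make such an escape mechanism concrete is sketched below: a non-conformance is recorded together with its explanation, and the data consumer explicitly accepts or rejects it. The Exemption class and its fields are hypothetical, intended only to illustrate the kind of record the mechanism could keep.

    # Illustrative sketch of a "comply or explain" record. The class and
    # field names are hypothetical.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Exemption:
        rule_id: str                     # the agreed rule that is not satisfied
        explanation: str                 # the producer's documented reason
        accepted: Optional[bool] = None  # set by the data consumer after review

    request = Exemption(
        rule_id="DEM_002",
        explanation=("The demographic balance does not close for 2022 because "
                     "a census revision of the population stock is only "
                     "applied from 2023 onwards."),
    )

    # Consumer side: record the outcome of reviewing the explanation.
    request.accepted = True
    print("exemption accepted" if request.accepted else "exemption rejected")

Keeping the explanation and the consumer's decision in one record also makes it easier to spot repeated non-conformance and to trigger the joint re-evaluation of the rule.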


6. GOOD ENOUGH IS THE NEW PERFECT

Validation rules should be fit-for-purpose: they should balance data consistency and accuracy requirements with timeliness and feasibility constraints.

Rationale

It is well known and accepted that perfect data is a myth: errors always exist. The responsibility of the statistician is to manage them so that the final outcome represents a good compromise between all dimensions of data quality.

Implications

For the ESS this means that, when designing domain-specific validation rules in the statistical working groups, one should look for the right balance between:

  • Number of errors to be detected: detecting too many errors risks slowing down the process and making it inefficient; detecting too few creates the risk that important errors are left undetected.
  • Level of severity: rules that are too strict could slow down the process or lead to a high rate of false positives (see the sketch after this list).
  • Level of complexity: rules that are too complex could be a source of inconsistencies and therefore of flagging false errors.
  • Output orientation versus book-keeping: validation rules should have a clear purpose in the broader context of the statistical output to be created.
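
As a small illustration of how severity levels can keep rules fit-for-purpose, the sketch below treats only blocking errors as grounds for rejecting a file, while softer checks merely raise warnings. The severity split, the rule identifiers and the rejection behaviour are illustrative assumptions, not an ESS prescription.

    # Illustrative sketch: attach a severity to each rule so that only
    # blocking errors stop the process. Rule identifiers are hypothetical.
    BLOCKING = "error"        # violation must be corrected or explained
    NON_BLOCKING = "warning"  # violation is reported but does not stop the flow

    rule_results = [
        {"id": "DEM_001", "severity": BLOCKING, "passed": True},
        {"id": "DEM_003", "severity": NON_BLOCKING, "passed": False},
    ]

    blocking_failures = [r["id"] for r in rule_results
                         if not r["passed"] and r["severity"] == BLOCKING]
    warnings = [r["id"] for r in rule_results
                if not r["passed"] and r["severity"] == NON_BLOCKING]

    if blocking_failures:
        print("file rejected; correct or explain:", blocking_failures)
    else:
        print("file accepted; warnings to review:", warnings)

Distinguishing blocking from non-blocking rules is one practical way to balance accuracy requirements against timeliness: the process is not held up by minor issues, while serious errors still have to be corrected or explained.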