
Modelling and interoperability guidelines


For service support, please contact:

ESTAT-DATA-METADATA-SERVICES@ec.europa.eu

Last reviewed: 09-12-2024
ESS-MH web portal

 

1. Standards for Metadata and Quality reporting in the European Statistical System (ESS) 

The main standard for reference metadata and quality reporting, approved by the ESSC (European Statistical System Committee) in 2015, is the Single Integrated Metadata Structure (SIMS). 

Two reporting structures can be extracted from SIMS, one more user-oriented and the other more producer-oriented:

  • User-oriented reports, that allow users to correctly understand and interpret the data released;
  • Producer-oriented reports, that monitor the quality of the statistics produced.

Metadata reporting structures

SIMS

The Single Integrated Metadata Structure (SIMS) constitutes the dynamic inventory and conceptual framework for all ESS quality and reference metadata concepts. All statistical concepts of the two existing ESS reporting structures (ESMS and ESQRS) have been included in this structure and streamlined so that each concept appears, and is therefore reported on, only once (direct reusability of existing information). The 19 high-level concepts and most of the sub-concepts are derived from the SDMX cross-domain concepts, published in the SDMX Glossary.

 
ESMS

Euro-SDMX Metadata Structure (ESMS) reports are standardized, SDMX-compliant files used for describing the statistical data sets published by Eurostat on its website. These more user-oriented reports document the methodologies, the quality aspects and the statistical production processes in general.

 

ESQRS

The ESS Standard for Quality Reports Structure (ESQRS) is the standard for the production and dissemination of quality reports within the ESS. These more producer-oriented reports are aimed especially at designers of statistical processes and producers of statistics, and give them detailed information to assess the quality of the data sets released by Eurostat.

 

2. SDMX framework for metadata and quality reporting

Each metadata and quality report in the ESS Metadata Handler (ESS-MH) is described and defined by an SDMX Metadata Structure Definition (MSD), and all Metadata Structure Definitions are publicly available in the Euro-SDMX Registry:

  • The Metadata Structure Definition defines the Report Structure comprising a set of Metadata Attributes that can be defined as a hierarchy, where each Metadata Attribute identifies a Concept
  • Different Metadata Attributes in the same Report Structure can use Concepts from different Concept Schemes
  • The Metadata Attribute can be specified as having multiple occurrences via Max Occurs (1 or unbounded) and/or as being mandatory or conditional via Min Occurs (1 or more for mandatory, 0 for conditional).
  • Each Metadata Attribute can have a Representation specified (using the /localRepresentation association), e.g. String; Date; reference to a Code List, which is also stored in the Euro-SDMX Registry; Boolean.
  • The Metadata Attribute represented by a Code List can contain an Annotation specifying whether multiple values from the respective Code List can be selected. In that case the Annotation sets «MAX_OCCURS» (maximum occurrences) to «unbounded». If this Annotation does not exist on the Metadata Attribute (Concept), only one value of the referenced Code List will be accepted.
  • The Metadata Structure Definition is linked to a Metadataflow Definition, which is an abstract concept of a flow of metadata that providers will provide for different reference periods. 
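The rules in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ESS-MH or SDMX implementation: the class `MetadataAttribute`, its field names and the method `check_coded_values` are hypothetical, as is the concept ID `EXAMPLE_CONCEPT`. Only the «MAX_OCCURS» / «unbounded» annotation and the mandatory/conditional distinction come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataAttribute:
    """Hypothetical, simplified model of one Metadata Attribute in a Report Structure."""
    concept_id: str                          # the Concept this attribute identifies
    concept_scheme: str                      # e.g. "ESTAT+SDMX_CDC"
    representation: str = "String"           # "String", "Date", "Boolean" or "CodeList"
    code_list: Optional[frozenset] = None    # allowed codes when a Code List is referenced
    mandatory: bool = False                  # False stands for "conditional"
    annotations: dict = field(default_factory=dict)
    children: list = field(default_factory=list)   # attributes can form a hierarchy

    def allows_multiple_codes(self) -> bool:
        # Multiple code values are accepted only if the annotation says "unbounded".
        return self.annotations.get("MAX_OCCURS") == "unbounded"

    def check_coded_values(self, values: list) -> list:
        """Return a list of problems found in the selected code values."""
        problems = []
        if self.mandatory and not values:
            problems.append(f"{self.concept_id}: a value is required")
        if len(values) > 1 and not self.allows_multiple_codes():
            problems.append(f"{self.concept_id}: only one code may be selected")
        for value in values:
            if self.code_list is not None and value not in self.code_list:
                problems.append(f"{self.concept_id}: unknown code {value!r}")
        return problems


# Illustrative attribute; the concept ID and the codes are made up.
attr = MetadataAttribute(
    "EXAMPLE_CONCEPT", "ESTAT+ESTAT_ADD_CONCEPTS",
    representation="CodeList", code_list=frozenset({"A", "B", "C"}), mandatory=True,
)
print(attr.check_coded_values(["A", "B"]))   # rejected: no MAX_OCCURS annotation
attr.annotations["MAX_OCCURS"] = "unbounded"
print(attr.check_coded_values(["A", "B"]))   # accepted once the annotation is present
```

Keeping the annotation in a plain dictionary mirrors the idea that the single-value restriction is the default and the annotation is what lifts it.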

Most metadata and quality reports in the ESS use standard SIMS, ESMS or ESQRS MSDs. A major part of the standard concepts included there are derived from the SDMX Glossary and consequently reflected in the cross-domain Concept Scheme ESTAT+SDMX_CDC. The additional standard concepts not included in the SDMX Glossary refer to the Eurostat specific complementary, cross-domain Concept Scheme ESTAT+ESTAT_ADD_CONCEPTS.

Several reports, however, rely on customized MSDs. These are essentially extensions of the standard SIMS, ESMS and ESQRS MSDs in which domain-specific sub-concepts are integrated in addition to the standard cross-domain concepts. The sub-concepts are defined in complementary Concept Schemes. See for example the customized ESMS Report structure for the collection of metadata for the Census, where the standard ESMS MSD based on ESTAT+SDMX_CDC and ESTAT+ESTAT_ADD_CONCEPTS has been enriched by concepts stemming from the concept scheme ESTAT+CENSUS_ESMS_ADDED_CONCEPTS. Further information on the customization approach can be found below, particularly in chapters 3.3 Semantic interoperability and 3.4 Technical interoperability.

MSD and metadata concepts relationship

 

The ESS Metadata Handler (ESS-MH) is the web application developed by Eurostat for supporting the production, management, exchange and dissemination of European and national reference metadata files based on the above-mentioned standards. 

The ESS-MH retrieves from the Euro-SDMX Registry the SDMX artefacts needed for the structures of metadata and quality reporting, i.e. the respective MSDs with their references, information on the Concepts in the Concept Schemes used, the referenced Code Lists and the specifics of the Metadataflow.

The MSDs detail which concept IDs must be used, how those concepts must be represented, how referenced Code Lists can be handled (selection modes allowing single or multiple values) and which structure and hierarchy the SDMX-ML file (the actual metadata file) should follow to be considered acceptable. Both the SDMX-ML version of a metadata or quality report and the underlying MSD can be exported from ESS-MH; they are also disseminated as a bundle and presented in a user-friendly HTML version on the Eurostat website.

European and national reports are compiled by users in Eurostat and national competent authorities, respectively, either directly in the application ESS-MH using Editing wizards, or by importing metadata files to the application.

Important note on the support of SDMX 3.0

Euro SDMX Registry: SDMX artefacts have been migrated to the new instance of the Euro SDMX Registry to create, manage and access SDMX 3.0 MSDs.

ESS-MH: At the end of 2024, ESS-MH was upgraded to fetch and read SDMX 3.0 MSDs and to support SDMX 3.0 formats for input and output. The new Euro SDMX Registry does not support SDMX 2.0 MSDs, so MSDs will only be available in SDMX 3.0 (and 2.1) format. The SDMX 2.0 versions, which are used by several Member States to prepare metadata files, will remain available in the old instance of the Euro SDMX Registry for a transition period until the end of 2025.

Further technical information on export and import options is available.

SDMX registry, ESS-MH and users

 

2.1 Discrepancies between SIMS vs. SDMX concept names

When the Metadata Structure Definitions for SIMS, ESMS and ESQRS were created, their concepts were linked, when possible, to the SDMX cross-domain concepts listed in the SDMX Glossary. This was done in order to facilitate future global SDMX-based metadata sharing (e.g. exchange of metadata between international organizations).

In most cases, the names of the SDMX Glossary and of the SIMS concepts are the same. In a few cases, the concepts have the same meaning but slightly different names. There are therefore some slight discrepancies in the names (labels) of the concepts as they appear in the official SIMS documentation (e.g. the ESS handbook on quality and metadata reports) and as they appear in the ESS-MH and Euro-SDMX Registry. 

 

2.2 Implementation of QPIs in Metadata and Quality reporting

SIMS includes a set of Quality Performance Indicators (QPIs), which are meant to provide quantitative information to benchmark quality. The list of such indicators as well as detailed guidelines for how they can be computed can be found in part III, section C of the ESS handbook on quality and metadata reports (page 263).

As can be seen from the handbook, some QPIs have different guidelines for user reports and for producer reports. In many cases, the guidelines say that the information given to users should be reduced in scope / level of detail compared to the information provided to producers. For a couple of QPIs, the guidelines suggest using different calculation formulas for users and for producers. The list of QPIs which have different guidelines for users and producers is the following:

  • R1. Data completeness – rate
  • A1. Sampling error – indicators
  • A4. Unit non-response – rate
  • A5. Item non-response – rate
  • A6. Data revision - average size
  • TP2. Time lag - final results
  • TP3. Punctuality - delivery and publication
  • CC2. Length of comparable time series

In order to reflect the fact that the QPIs listed above have different definitions and guidelines for users and producers, they were integrated into SIMS in the following way:

  • The “user-oriented” information is to be provided as part of the relevant ESMS concept.
  • The “producer-oriented” information is to be provided in a separate concept.

The table below shows, for each of the 8 QPIs that have different user-oriented and producer-oriented guidelines, the Concept IDs to be used according to the SIMS MSD.

QPI | User-oriented concept ID | Producer-oriented concept ID
R1. Data completeness – rate | COMPLETENESS | COMPLETENESS_RATE
A1. Sampling error – indicators | SAMPLING_ERR | SAMPLING_ERR_IND
A4. Unit non-response – rate | NONRESPONSE_ERR | UNIT_NONRESPONSE_RATE
A5. Item non-response – rate | NONRESPONSE_ERR | ITEM_NONRESPONSE_RATE
A6. Data revision - average size | REV_PRACTICE | DATA_REV_AVGSIZE
TP2. Time lag - final results | TIMELINESS | TIMELAG_FINAL
TP3. Punctuality - delivery and publication | PUNCTUALITY | PUNCTUALITY_RELEASE
CC2. Length of comparable time series | COMPAR_TIME | COMPAR_LENGTH

 

Some structures (like SIMS) contain both the user-oriented and the producer-oriented concepts and therefore expect both kinds of information to be provided, while simpler structures (like ESMS) contain only the user-oriented concepts and accordingly expect only the user-oriented information.
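For systems that generate reports automatically, the mapping in the table above can be held as a simple lookup. The sketch below is illustrative only: the dictionary `QPI_CONCEPTS` and the function `expected_concepts` are hypothetical names, and only the two structures explicitly described above (SIMS and ESMS) are handled. The concept IDs are those listed in the table.

```python
# QPI code -> (user-oriented concept ID, producer-oriented concept ID), as in the table above.
QPI_CONCEPTS = {
    "R1": ("COMPLETENESS", "COMPLETENESS_RATE"),
    "A1": ("SAMPLING_ERR", "SAMPLING_ERR_IND"),
    "A4": ("NONRESPONSE_ERR", "UNIT_NONRESPONSE_RATE"),
    "A5": ("NONRESPONSE_ERR", "ITEM_NONRESPONSE_RATE"),
    "A6": ("REV_PRACTICE", "DATA_REV_AVGSIZE"),
    "TP2": ("TIMELINESS", "TIMELAG_FINAL"),
    "TP3": ("PUNCTUALITY", "PUNCTUALITY_RELEASE"),
    "CC2": ("COMPAR_TIME", "COMPAR_LENGTH"),
}


def expected_concepts(qpi: str, structure: str) -> list:
    """Return the concept IDs in which information on a QPI is expected."""
    user_id, producer_id = QPI_CONCEPTS[qpi]
    if structure == "SIMS":
        return [user_id, producer_id]   # both kinds of information
    if structure == "ESMS":
        return [user_id]                # user-oriented information only
    raise ValueError(f"structure not covered by this sketch: {structure}")


print(expected_concepts("A4", "SIMS"))
print(expected_concepts("A4", "ESMS"))
```

Note that A4 and A5 share the same user-oriented concept (NONRESPONSE_ERR), so a user-oriented report covers both rates in one place.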

 

3. Ensuring interoperability between national and European metadata systems

Eurostat's objective is to facilitate the reuse of national metadata at European level. A precondition for achieving "once for all purposes" reporting is to guarantee interoperability between national and European quality reporting. Eurostat uses the European Interoperability Framework (EIF) to describe systematically how interoperability is ensured at different levels and where gaps still persist.

The EIF, adopted by the European Commission in April 2017, is a framework that gives guidance on how to set up interoperable public services. It offers a layered interoperability model which organizes the different aspects to be addressed when designing interoperable systems into four layers.

 

3.1 Legal interoperability

EIF definition: Legal interoperability is about ensuring that organizations operating under different legal frameworks, policies and strategies are able to work together. This might require that legislation does not block the establishment of European public services within and between Member States and that there are clear agreements about how to deal with differences in legislation across borders, including the option of putting in place new legislation.

The European Statistics Code of Practice was adopted by the Statistical Programme Committee on 24 February 2005 and revised by the European Statistical System Committee in September 2011 and in November 2017. Quality and metadata reporting is enshrined in the ESS Code of Practice under Principle 15: "European Statistics are available and accessible with supporting metadata and guidance".

Article 12 of regulation 223/2009 on European statistics establishes the obligation for Member States to submit reports on the quality of the data transmitted to Eurostat. It therefore provides the legal basis for metadata and quality reporting in the ESS. Moreover, in November 2015, the ESSC decided that "SIMS will be the standard for quality reporting according to Article 12 of Regulation 223/2009 on European statistics". 

The specific arrangements for the implementation of the generic obligations established by regulation 223/2009 are often laid down in domain-specific implementing acts. Not all sectoral legislation consistently refers to SIMS, but Eurostat always maps sectoral requirements to the SIMS structure. 

 

3.2 Organizational interoperability

EIF definition: This refers to the way in which public administrations align their business processes, responsibilities and expectations to achieve commonly agreed and mutually beneficial goals. In practice, organizational interoperability means documenting and integrating or aligning business processes and relevant information exchanged.

The ESS has created and promoted standards for the alignment of practices in statistical production and in quality assurance processes. These include the European Code of Practice, the European Quality Assurance Framework and reference models such as the GSBPM. These frameworks provide a common language for the description of production processes and common recommendations for how they should be managed. However, the number and granularity of such processes varies across ESS members based on national circumstances. This means that there is not always a one-to-one correspondence between what is considered to be a single process or domain at European and national level. In order to make it easier for Member States to know which metadata flows they need to report and what data they are supposed to cover, Eurostat makes available the following information: 

 

National and European metadata and quality reporting

ESS-MH indicates which metadata flows are currently in active use for national metadata and quality reporting. The naming convention used to create the labels of Metadata Flows in ESS-MH is structured as below:

National and European metadata reporting structure

 

Data and metadata reporting

The ESS-MH’s list of metadataflows and providing organisations is synchronised with EDAMIS. This guarantees consistency in the information available on metadataflows in both applications and allows the EDAMIS reports for compliance monitoring to be used for metadata transmissions as well. ESS-MH National Administrators can also see which EDAMIS datasets each metadataflow refers to, using the relevant report in EDAMIS.

Additionally, ESS-MH National Administrators may obtain valuable information in the Metadata flows report in ESS-MH, particularly on the status of the Metadata flows, the organisations expected to provide metadata for them, and the node in the Eurostat Data Browser to which published metadata files would be attached.

 

3.3 Semantic interoperability

EIF definition: Semantic interoperability ensures that the precise format and meaning of exchanged data and information is preserved and understood throughout exchanges between parties, in other words ‘what is sent is what is understood’.

The SIMS 2.0 standard and its related reporting structures, ESMS and ESQRS, provide an inventory of the various concepts of interest for metadata and quality reporting in the ESS. By providing a common structure and common definitions, SIMS should act as the foundation for the semantic interoperability of metadata systems in the ESS.

The main issue currently affecting semantic interoperability is the fact that, over the course of the implementation of SIMS, various degrees of domain-specific customization have been introduced. Customizations respond to the need for detailed information on specific aspects of the statistical production process and may be needed to:

  • Ensure that sufficient information is available for compliance monitoring purposes (e.g. whether specific recommendations outlined in methodological manuals are followed);
  • Standardize the responses for certain concepts (e.g. by associating a code list to a concept to make sure that the answer provided comes from a pre-defined set of possible answers);
  • Receive information at the level of specific variables or datasets within a wider domain or production process.

Eurostat tries to limit the customization of SIMS, ESMS or ESQRS-based metadata files as customization may affect the semantic (and partly technical) interoperability of the reporting standards. For this reason, Eurostat has established internal ‘golden rules’ regarding customization policy for metadata implementations. 

 

3.3.1 General principles
  • Direct implementation of SIMS: If both ESMS and ESQRS are required for a particular domain, a single SIMS template will be implemented in the ESS-MH. This guarantees that the principle of "once for all purposes" reporting is respected. SIMS is also implemented when an ESQRS template is requested. This move toward SIMS applies not only to national implementations but also to European metadata. Examples include the domains COD, COMEXT, ENVPFLAC, IFS and DUBLINII.
  • SIMS concepts remain pure: No introduction of customization via HTML templates in standard SIMS concepts. Any customization must happen at the level of sub-concepts. This guarantees that information compiled at national level for the standard SIMS concepts can be directly reused at European level.
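The first principle amounts to a small decision rule, sketched below. The function name `choose_structure` is hypothetical, and the ESMS-only branch is an assumption: the text states what happens when ESQRS (alone or together with ESMS) is requested, but not explicitly what is implemented when only ESMS is needed.

```python
def choose_structure(needs_esms: bool, needs_esqrs: bool) -> str:
    """Pick the reporting structure to implement for a domain."""
    if needs_esqrs:
        # Covers both "ESMS and ESQRS required" and "ESQRS requested".
        return "SIMS"
    if needs_esms:
        # Assumption: a purely user-oriented request keeps the ESMS structure.
        return "ESMS"
    raise ValueError("no reporting structure requested")


print(choose_structure(needs_esms=True, needs_esqrs=True))
print(choose_structure(needs_esms=False, needs_esqrs=True))
```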

 

3.3.2 Customization approaches

When customizations are necessary, the Metadata team follows the customization approaches listed below, ordered by hierarchy and preference.

 

3.3.2.1 Customization - Guidelines

One of the most common ways to align domain-specific requests with SIMS is the customization of metadataflow-specific guidelines. Each metadata flow can have customized guidelines for each reference period. Instead of extending the SIMS MSD with a domain-specific sub-concept, guidelines are created or extended to advise what domain-specific information is to be provided in the SIMS concept. Good examples of customized guidelines can be found in the statistical domains related to modes of transport (AIR, ROAD, RAIL, MRTM, IWW), which have all implemented the same customized guidelines for the ESMS structure.

 

3.3.2.2 Customization - Annexes

In ESS-MH, it is possible to attach additional information as annexes at file level and at sub-concept level. An annex can be attached to an individual metadata file or at template level. When an annex is attached at template level, an identical annex is created every time a new metadata file is created. Where applicable, it is highly recommended to use such annexes instead of HTML tables.

The example below shows the extended guidelines on S.13.1. Accuracy-overall for Air Emission Accounts Statistics. The standard SIMS guidelines are kept unchanged and enriched with domain-specific guidelines in bold. Additionally, complementary instructions on how to treat and fill the pre-attached annexes are provided.

Extended guideline example

 

3.3.2.3 Customization - Metadata Structure

As a general rule, customization at structure level should be avoided, but when it is needed, the following rules should be followed:

Insertion of sub-concepts at the lowest hierarchical level of the SIMS structure: Sub-concepts allow for the clear separation of "standard" information and "domain-specific" information. Consequently, the standard SIMS MSD is extended without changing any attribute of the standard concepts. In order to increase reusability within the same domain, all domain-specific concepts are created in a specific Concept Scheme following the coding convention ‘SIMS parent concept code’_‘Domain-specific description’.
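The coding convention can be illustrated with a short helper. This is a sketch under assumptions: the function name `domain_concept_id` is hypothetical, and the normalization of the description (upper case, runs of non-alphanumeric characters replaced by an underscore) is assumed here for illustration and is not prescribed by the text. The example description is made up.

```python
import re


def domain_concept_id(parent_code: str, description: str) -> str:
    """Join a SIMS parent concept code and a domain-specific description with '_'."""
    # Assumed normalization: upper case, non-alphanumeric runs collapsed to '_'.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_").upper()
    if not parent_code or not slug:
        raise ValueError("both the parent code and the description are required")
    return f"{parent_code}_{slug}"


# Made-up description under the standard concept COMPLETENESS.
print(domain_concept_id("COMPLETENESS", "census topics"))
```

Prefixing the parent code keeps the link to the standard SIMS concept visible in the ID itself, which makes the domain-specific concepts easy to group and to filter out when only standard information is reused.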

Sub concept extension example

 

As a general principle, all domain-specific concepts can be added only to the lowest level of the SIMS hierarchy. In case the lowest level of the hierarchy is a QPI, domain-specific concepts may be added as siblings to the QPI (see illustrative examples below). This is because QPIs do not provide a full conceptual coverage of their parent concept. It is recommended to apply domain-specific guidelines to help the providers during the documentation process.
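The placement rule can be expressed as a check on a concept tree. The sketch below is one possible reading of the rule, not the ESS-MH logic: the class `Concept`, the function `may_add_domain_concept` and the node `EXAMPLE_TOP` are hypothetical, and the choice to disallow nesting under an existing domain-specific concept is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical node in a (customized) SIMS concept hierarchy."""
    concept_id: str
    is_qpi: bool = False
    domain_specific: bool = False
    children: list = field(default_factory=list)


def may_add_domain_concept(parent: Concept) -> bool:
    """Check whether a domain-specific sub-concept may be attached under `parent`."""
    # A QPI never gets children: new concepts become its siblings instead.
    # Assumption: no nesting below an existing domain-specific concept.
    if parent.is_qpi or parent.domain_specific:
        return False
    standard_children = [c for c in parent.children if not c.domain_specific]
    # Allowed if the parent is a lowest-level standard concept, or if its only
    # standard children are QPIs (the new concept is then a sibling of the QPI).
    return all(c.is_qpi for c in standard_children)


# Illustrative tree: EXAMPLE_TOP > SAMPLING_ERR > SAMPLING_ERR_IND (QPI).
qpi = Concept("SAMPLING_ERR_IND", is_qpi=True)
sampling = Concept("SAMPLING_ERR", children=[qpi])
top = Concept("EXAMPLE_TOP", children=[sampling])

print(may_add_domain_concept(sampling))  # sibling of the QPI
print(may_add_domain_concept(qpi))       # not below the QPI itself
print(may_add_domain_concept(top))       # not the lowest level
```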

HTML templates (currently unavoidable for tabular information) can only be inserted at the level of domain-specific sub-concepts and cannot be inserted in standard concepts. Publication can be restricted at country level for any concept in the metadata structure. The example below illustrates this rule in detail.

Customization of SIMS with domain-specific sub-concepts:

Customization with domain-specific concepts

Customization of SIMS with domain-specific sub-concepts (QPI case):

Customization with sub-concepts, QPI case

 

3.4 Technical interoperability 

EIF definition: This covers the applications and infrastructures linking systems and services. Aspects of technical interoperability include interface specifications, interconnection services, data integration services, data presentation and exchange, and secure communication protocols.

The SDMX standard provides the basis for the technical interoperability of metadata systems in the ESS. The expected structure of metadata files is described as an SDMX Metadata Structure Definition (SDMX-ML) and the files produced by the ESS Metadata Handler are SDMX-ML files. Any metadata system that is either based on or has an interface to the SDMX standard should be able to both read and produce metadata and quality reports transmitted to Eurostat.

In order to facilitate the transition to SDMX 3.0 and ensure compatibility with national metadata systems, ESS-MH still allows imports and exports of metadata files in SDMX 2.0 format. Support for SDMX 2.0 metadata files in ESS-MH is planned to be phased out at the end of 2025.

Over the years, Eurostat has introduced certain extensions of the SDMX standard to deal with the requirements of metadata and quality reporting. The most notable among these is:

  • “Grouping” functionality, which specifies how sub-concepts in the MSD can be grouped in an HTML template for easier visualization. In order to display these sub-concepts in the form of a table, an HTML table which refers to them is introduced in the metadata file. Referencing sub-concepts from within an HTML table in this way is atypical in SDMX and can cause interoperability issues with SDMX-based software. Consequently, the grouping functionality will not be used in any new implementation in ESS-MH, but current implementations may need to be supported until a suitable alternative is found.

     

4. Communication

In order to improve communication with National Statistical Offices (NSOs), the following actions are taken:

Increase transparency for new national implementations: All new implementations are published on Confluence together with relevant information such as:

  • Metadata flow: base structure, scope, dissemination
  • EDAMIS: domain and respective datasets (organizational interoperability)
  • Timeline: Availability in ESS-MH and expected date of transmission

Involve ESS-MH National Administrators in the process of project implementation: Before every new national metadata collection is launched, ESS-MH National Administrators are informed about:

  • SDMX artefacts: MSD and Metadata Flows
  • Template: if there are prefilled texts, annexes and HTML tables included
  • Guidelines: if there are customizations on guidelines level
  • Provider organizations for each country
  • Countries who have volunteered to pilot the implementation from the content point of view
  • Planned date for launching the national metadata collection

In order to check whether any interoperability issues arise, ESS-MH National Administrators are invited to test the new implementation either directly in ESS-MH or via their national metadata systems. All suggestions submitted to Eurostat are considered before any new national metadata implementation is launched. In some cases, the feedback from the piloting phase and the resulting changes are presented during domain-specific Working Groups.