
Modeling of Eurostat's statistical classifications in ShowVoc


 

For service support, please contact:

ESTAT-DATA-METADATA-SERVICES@ec.europa.eu

Last update: 28-09-2023

 

1. Introduction

Eurostat’s classifications are transformed into RDF (Resource Description Framework) and stored in a triplestore (Cellar). Users can browse and search the classifications online using ShowVoc. Additionally, the classifications can be queried and extracted through a SPARQL endpoint.

Eurostat’s statistical classifications included in ShowVoc have been converted to RDF using data models such as SKOS (Simple Knowledge Organization System) and XKOS (Extended Knowledge Organization System). These data models facilitate the representation and organization of knowledge, making the classifications more structured and easier to navigate.

 

1.1. URIs (Uniform Resource Identifiers)

RDF uses URIs (Uniform Resource Identifiers) to uniquely identify resources. Eurostat’s classifications and their items are assigned URIs, which enable direct referencing of the classifications and their individual items. Eurostat’s classifications are identified in the domain data.europa.eu. The namespace is made up of the domain, one identifier (the classification series) and the classification version. For example, NACE uses the base namespace http://data.europa.eu/ux2/, and a version identifier is appended to it to identify each version: http://data.europa.eu/ux2/nace2/ and http://data.europa.eu/ux2/nace2.1/.
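The URI structure described above can be sketched in a few lines of Python. This is a minimal illustration, not an official Eurostat tool: the function name item_uri and its parameters are assumptions made for this example, and only the NACE values taken from the text are used.

```python
def item_uri(domain: str, series_id: str, version: str, code: str) -> str:
    """Join domain, series identifier, version and item code into one URI.

    Hypothetical helper: it only mirrors the pattern
    domain + series identifier + version (+ item code) described in the text.
    """
    parts = [domain.rstrip("/"), series_id.strip("/"), version.strip("/"), code]
    return "/".join(parts)


# NACE Rev. 2.1, section A, built from the parts named in the text.
print(item_uri("http://data.europa.eu", "ux2", "nace2.1", "A"))
```

The result matches the URI of the NACE 2.1 classification item A used as an example in section 2.2.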

The following table displays the namespaces utilized for some of Eurostat's primary classifications:

Classification    Namespace
NACE              http://data.europa.eu/ux2/nace2/
CN                http://data.europa.eu/xsp/cn2023/
CPA               http://data.europa.eu/qw1/prodcom2023/
PRODCOM           http://data.europa.eu/qw1/prodcom2022/

Table 1: Namespaces in Eurostat's classifications

 

2. Statistical classification components

2.1  Classification dataset (Classification scheme)

 


Figure 1: Structure of a Classification Scheme [1]

 

Every version of the classification is represented as a SKOS Concept Scheme. A Statistical Classification Scheme is a concept scheme whose concepts have associated codes (numeric string labels), short textual names (also labels), definitions, and longer descriptions that include rules for their use. It can be flat (i.e., one level) or hierarchical.

Some main properties that are used to describe the metadata of a classification scheme are:

Classification title:
skos:prefLabel: The name of the classification; multiple languages are allowed. For NACE, the prefLabel would be Statistical Classification of Economic Activities in the European Community, Rev. 2.1 (NACE Rev. 2.1).

skos:altLabel: An alternative name (usually the acronym) of the classification; multiple languages are allowed. For NACE, the altLabel would be NACE Rev. 2.1.

Classification version:
owl:versionInfo: Indicates the version of the dataset. For NACE 2.1 the versionInfo would be NACE Rev. 2.1.

Conceptual basis:
dct:description: A short description of the classification and its use.

Custodian:
dct:creator: Eurostat is the custodian for classifications owned by Eurostat.

Members of Classification scheme:
skos:hasTopConcept: Concept schemes contain top concepts, which should be the broadest concepts in the hierarchy of the scheme.

Levels inside the classification:
xkos:levels and xkos:numberOfLevels: Document the levels of the classification and how many there are. For NACE, xkos:numberOfLevels would be 4 and xkos:levels would list Sections, Divisions, Groups and Classes.

Legal basis:
dct:conformsTo: The legal basis (the official publication of the EU legal act) of the classification is presented with this property.

xkos:covers: A classification refers to a specific domain, which could be economic activity, economic sector, occupations, transport and more. The domain being addressed is typically represented by a skos:Concept, often derived from an established thesaurus such as EuroVoc.

Other:
xkos:follows: Presents the links between the different versions of a classification, in other words the succession in time of classifications and classification schemes.

skos:notation: The code of the classification. For NACE 2.1 it would be NACE Rev. 2.1.

Figure 2 below shows the classification scheme of NACE 2.1:


Figure 2: NACE classification scheme in ShowVoc
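The scheme-level properties listed above can be read back with a SPARQL query. The following Python sketch only builds the query text; it does not contact any endpoint. The function name and the selection of variables are assumptions made for this example, and the namespace filter assumes that the concept scheme URI starts with the classification namespace.

```python
PREFIXES = """\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xkos: <http://rdf-vocabulary.ddialliance.org/xkos#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
"""


def scheme_metadata_query(namespace: str, lang: str = "en") -> str:
    """Build a SPARQL query listing concept schemes and some of their metadata.

    Assumption: the scheme URI begins with the classification namespace.
    """
    return PREFIXES + f"""
SELECT ?scheme ?title ?acronym ?version ?creator ?numberOfLevels
WHERE {{
  ?scheme a skos:ConceptScheme ;
          skos:prefLabel ?title .
  FILTER (STRSTARTS(STR(?scheme), "{namespace}"))
  FILTER (LANG(?title) = "{lang}")
  OPTIONAL {{ ?scheme skos:altLabel ?acronym . FILTER (LANG(?acronym) = "{lang}") }}
  OPTIONAL {{ ?scheme owl:versionInfo ?version }}
  OPTIONAL {{ ?scheme dct:creator ?creator }}
  OPTIONAL {{ ?scheme xkos:numberOfLevels ?numberOfLevels }}
}}
"""


print(scheme_metadata_query("http://data.europa.eu/ux2/nace2.1/"))
```

Most properties are wrapped in OPTIONAL so that a scheme is still returned when one of them is not filled in.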

 

2.2  Classification item

All items in the classifications are represented as RDF resources of type skos:Concept. The classification items are described using a range of properties that establish their attributes and relationships within the classification system.

Each item belongs to a specific ConceptScheme (skos:inScheme). For example, in NACE 2.1 the classification item http://data.europa.eu/ux2/nace2.1/A is described by the properties below:

Hierarchical relation:

skos:topConceptOf: Identifies the scheme of which the item is a top-level concept.

Name of the classification item:

skos:prefLabel: The code and the name for a concept in a specific language (multiple languages are allowed).

skos:altLabel: The label (without the code) for a concept in EN and other languages.

Coding structure:

skos:notation: An acronym or code that uniquely identifies a concept.

skos:definition: A description or explanation of the concept, provided for some items.


Figure 3: NACE classification item in ShowVoc

 

2.3  Hierarchical structure

The classification's hierarchical structure is established using the SKOS hierarchical properties.

SKOS has two direct hierarchy relations, which are inverses of one another: skos:broader (links a concept to its parent, the broader concept) and skos:narrower (links a concept to its child, the narrower concept). In the NACE 2.1 example shown in Figure 4 below, item A is the parent of item 01:
<http://data.europa.eu/ux2/nace2.1/01> skos:broader <http://data.europa.eu/ux2/nace2.1/A>


Figure 4: Hierarchical structure of a NACE classification item
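Because skos:broader and skos:narrower are inverses, one direction is enough to rebuild the whole hierarchy. The Python sketch below illustrates this with plain (child, parent) pairs, read as "child skos:broader parent". The URIs for A and 01 come from the text; the URI for division 02 is assumed by analogy, and the function names are invented for this example.

```python
from collections import defaultdict

NS = "http://data.europa.eu/ux2/nace2.1/"

# (child, parent) pairs, i.e. "child skos:broader parent".
# The 02 entry is an assumption added for illustration.
BROADER = [
    (NS + "01", NS + "A"),
    (NS + "02", NS + "A"),
]


def invert_broader(pairs):
    """Derive the skos:narrower view (parent -> children) from broader pairs."""
    narrower = defaultdict(list)
    for child, parent in pairs:
        narrower[parent].append(child)
    return dict(narrower)


def ancestors(concept, pairs):
    """Follow skos:broader upwards; assumes each item has at most one parent."""
    parent_of = dict(pairs)
    chain, seen = [], set()
    while concept in parent_of and concept not in seen:
        seen.add(concept)  # guard against accidental cycles in the input
        concept = parent_of[concept]
        chain.append(concept)
    return chain


print(invert_broader(BROADER))
print(ancestors(NS + "01", BROADER))
```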

 

2.4  Explanatory notes


Figure 5: Explanatory notes [1]

 

Classifications usually come with notes attached to the classification items. These notes describe the content of a classification item by describing what should be classified under this item and what should go elsewhere. Explanatory notes explain the content by giving examples of inclusions and exclusions, or provide rules or guidelines for how to use that category.

For the description contents we use the following:

skos:scopeNote: Usually holds the includes/excludes information, describing the inclusions and exclusions within a classification item. It has the following sub-properties:

xkos:inclusionNote: Provides a definition of what is included within a classification item (This item includes).

xkos:coreContentNote: Describes the inclusions (This item includes).

xkos:additionalContentNote: Describes the additional inclusions (This item includes also).

xkos:exclusionNote: Describes specific exclusions from the scope description of a classification item (This item excludes).

xkos:caseLaw: References classification objects that were not explicitly described in the core content note or in the additional content note; it usually refers to a ruling concerning the classification.


Figure 6: Explanatory notes in NACE2
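As an illustration of how the note properties map to the familiar "includes / includes also / excludes" wording, the following Python sketch renders a set of notes as text. It is a hypothetical helper: the function name is invented, and the sample note texts are placeholders rather than real NACE notes.

```python
# Headings follow the wording given in the property descriptions above.
NOTE_HEADINGS = {
    "xkos:inclusionNote": "This item includes",
    "xkos:coreContentNote": "This item includes",
    "xkos:additionalContentNote": "This item includes also",
    "xkos:exclusionNote": "This item excludes",
}


def render_notes(notes):
    """Render explanatory notes, inclusions first and exclusions last.

    `notes` maps a property name to a list of note texts.
    """
    lines = []
    for prop, heading in NOTE_HEADINGS.items():
        for text in notes.get(prop, []):
            lines.append(f"{heading}: {text}")
    return "\n".join(lines)


# Placeholder texts, invented for this example.
sample = {
    "xkos:exclusionNote": ["placeholder excluded activity"],
    "xkos:coreContentNote": ["placeholder core activity"],
    "xkos:additionalContentNote": ["placeholder additional activity"],
}
print(render_notes(sample))
```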

 

3. Statistical classification levels


Figure 7: Classification Levels [1]

 

Many statistical classifications are aggregated in levels. In the SKOS model, classification levels are represented as SKOS collections (skos:Collection) whose members are all the items at the same level. In the XKOS model, levels are represented as classification levels (xkos:ClassificationLevel), which aggregate the classification items by their level.

NACE, for example, is organised into four levels: sections (level 1), divisions (level 2), groups (level 3), and classes (level 4). The bottom level (classes, in this case) is the most detailed one.

The information about the levels (xkos:levels) and number of levels (xkos:numberOfLevels) of a classification is referenced in the description of the classification (skos:ConceptScheme).

Also, classification items of a given level usually have a code that conforms to a specific structure. For example, the first level of NACE (sections) is identified by an alphabetical code, the second level (divisions) by a two-digit numerical code, the third level (groups) by a three-digit numerical code, and the fourth level (classes) by a four-digit numerical code.
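The code structure described above is regular enough to infer the level of an item from its code. The Python sketch below is one way to do this under stated assumptions: the function name is invented, the patterns are written from the description in the text rather than read from xkos:notationPattern, and dots used as separators in published codes (for example "01.11") are assumed to be ignorable.

```python
import re

# (depth, level name, pattern) following the structure described in the text.
LEVEL_PATTERNS = [
    (1, "section", re.compile(r"[A-Z]")),
    (2, "division", re.compile(r"[0-9]{2}")),
    (3, "group", re.compile(r"[0-9]{3}")),
    (4, "class", re.compile(r"[0-9]{4}")),
]


def nace_level(code: str):
    """Return (depth, level name) for a NACE code, or None if it fits no level."""
    compact = code.replace(".", "")  # assumption: dots are only separators
    for depth, name, pattern in LEVEL_PATTERNS:
        if pattern.fullmatch(compact):
            return depth, name
    return None


for code in ["A", "01", "01.1", "01.11"]:
    print(code, nace_level(code))
```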

To summarize, the hierarchical levels are presented as:

skos:prefLabel: The name of the Level.

skos:member: The member of the collection.

xkos:depth: Defines the generality of the level (depth 1 is the most generic).

xkos:notationPattern: Describes the code structure of the items at the level (e.g. alphabetical code, numerical code).

xkos:organizedBy: Records the generic name of the items of a given level (e.g. “generic_section”).


Figure 8: NACE Levels in ShowVoc

 

4. Correspondence table


Figure 9: Concept Association [1]

 

Correspondences (also known as alignments) between classification items from different classifications are expressed using the SKOS properties skos:exactMatch, skos:narrowMatch, skos:closeMatch and skos:relatedMatch.


Figure 10: Alignments in NACE2

 

Additionally, XKOS gives a richer representation of the correspondences between classifications (Figure 9). In XKOS, a correspondence table is treated as a dataset (xkos:Correspondence) that contains a set of concept associations. The xkos:ConceptAssociation class represents the mapping between source and target resources (source skos:Concept(s) and target skos:Concept(s)).

A concept association links source concept(s) to target concept(s). A concept association may have one target (one to one), more than one target (one to many), or no target if there is no correspondence in the target classification. The xkos:madeOf property links the xkos:Correspondence to its constituent xkos:ConceptAssociation components. An xkos:Correspondence is linked to the skos:ConceptScheme resources it compares by the property xkos:compares.
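The three cases (one target, several targets, no target) can be told apart by counting the targets of each association. The Python sketch below shows the idea on hypothetical data: the association names and the S/T codes are placeholders, not rows of a real correspondence table.

```python
def association_cardinality(targets):
    """Classify a concept association by how many distinct targets it has."""
    count = len(set(targets))
    if count == 0:
        return "no target"
    if count == 1:
        return "one to one"
    return "one to many"


# Hypothetical associations; names and codes are placeholders.
ASSOCIATIONS = {
    "assoc-1": {"source": "S1", "targets": ["T1"]},
    "assoc-2": {"source": "S2", "targets": ["T2", "T3"]},
    "assoc-3": {"source": "S3", "targets": []},
}

for name, assoc in ASSOCIATIONS.items():
    print(name, assoc["source"], association_cardinality(assoc["targets"]))
```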

The correspondence table can be extracted only via a SPARQL query. Examples of how to extract a correspondence table using SPARQL queries are available in our SPARQL Queries - Short User Guide.

The extraction of the NACE 2 – NACE 2.1 correspondence table is shown in Figure 11 below. The concept association, source concept (NACE 2), target concept (NACE 2.1) and comments are presented.


Figure 11: NACE2-NACE2.1 Correspondence table.
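A query along the following lines could produce such a table. This Python sketch only builds the SPARQL text and is not the query from the Short User Guide: the function name is invented, the example correspondence URI is a placeholder, and the use of rdfs:comment for the comments column is an assumption. The xkos:sourceConcept and xkos:targetConcept properties come from the XKOS vocabulary.

```python
def correspondence_query(correspondence_uri: str) -> str:
    """Build a SPARQL query listing the associations of one correspondence."""
    return f"""\
PREFIX xkos: <http://rdf-vocabulary.ddialliance.org/xkos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?association ?source ?target ?comment
WHERE {{
  <{correspondence_uri}> a xkos:Correspondence ;
      xkos:madeOf ?association .
  ?association xkos:sourceConcept ?source .
  # An association may have no target, so the target is optional.
  OPTIONAL {{ ?association xkos:targetConcept ?target }}
  # Assumption: comments are stored as rdfs:comment.
  OPTIONAL {{ ?association rdfs:comment ?comment }}
}}
ORDER BY ?source ?target
"""


# Placeholder URI; replace it with the URI of the actual correspondence.
print(correspondence_query("http://example.org/correspondence/placeholder"))
```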

 

5. Useful material

SKOS Simple Knowledge Organization System

XKOS Extended Knowledge Organization System

XKOS Best Practices

SPARQL Queries - Short User Guide