
How to respect the dataset naming convention (DSNC)

When a data file is selected in the send data file screen, its file name is parsed against the DSNC (dataset naming convention) so that the form can be pre-filled automatically.

Dataset Naming Convention (DSNC) at a glance:

 

 

DATASET ID: 

 

DOMAIN ID_DATASET STRUCTURE ID_PERIODICITY

 

Field: DOMAIN ID
Length: 1..8
Description/Remark: Identifies the statistical domain (a group of datasets closely linked together). Only digits from 0 to 9 and capital letters can be used.

Field: DATASET STRUCTURE ID
Length: 1..7
Description/Remark: Identifies the Dataset Structure (associated with one or several statistical tables). Only digits from 0 to 9 and capital letters can be used.

  • If no Dataset Structure ID is defined, the “periodicity id for data” will be used as default value.
  • If 2 positions are used for the “periodicity or periodicities” field, then the Dataset Structure ID should be on 6 positions maximum.

Field: PERIODICITY OR PERIODICITIES
Length: 1
Description/Remark:

  • “A” for annual
  • “0” for multiannual > 10 years
  • “1” for every 10 years
  • “2” to “9” for every 2 to 9 years
  • “S” for Semester
  • “Q” for Quarterly
  • “M” for Month
  • “W” for Weekly
  • “D” for Daily
  • “O” for other periodicity
  • “N” for non-periodic
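The dataset ID rules above can be sketched as a small check. This is only an illustration, not the parser used by the tool: the function name `split_dataset_id` is hypothetical, and the sketch assumes the three parts are joined by single underscores, that all three are present, and that a periodicity field of two positions is made of two codes from the list.

```python
import re

# Periodicity codes from the list above: A, 0-9, S, Q, M, W, D, O (letter), N.
PERIODICITY_CODES = set("A0123456789SQMWDON")

# Only digits 0-9 and capital letters are allowed in the first two parts.
_ALNUM = re.compile(r"[0-9A-Z]+")


def split_dataset_id(dataset_id):
    """Return the three parts of a dataset ID as a dict, or None if a rule is broken."""
    parts = dataset_id.split("_")
    if len(parts) != 3:
        return None
    domain, structure, periodicity = parts
    if not (1 <= len(domain) <= 8 and _ALNUM.fullmatch(domain)):
        return None
    if not (1 <= len(structure) <= 7 and _ALNUM.fullmatch(structure)):
        return None
    if not 1 <= len(periodicity) <= 2:
        return None
    if any(code not in PERIODICITY_CODES for code in periodicity):
        return None
    # Two periodicity positions leave at most 6 positions for the structure ID.
    if len(periodicity) == 2 and len(structure) > 6:
        return None
    return {"domain": domain, "structure": structure, "periodicity": periodicity}


print(split_dataset_id("RAIL_E_Q"))
```

The case where no Dataset Structure ID is defined is deliberately left out, since the text does not say how such an ID is written in a file name.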

 

DATASET OCCURRENCE ID  

DATASET ID_FROM_YEAR_PERIOD_[TO]_[OPTION].FORMAT

 

Field: DATASET ID (mandatory)
Length: See above
Description/Remark: See above

Field: FROM (mandatory)
Length: 2
Description/Remark: The code of the country to which the primary data providing organisation belongs. ISO2 country codes are used, with several exceptions.

Field: YEAR (mandatory)
Length: 4
Description/Remark: Four-digit representation of the reporting year, “YYYY”. For non-periodic datasets: “0000” or the reporting year.

Field: PERIOD (mandatory)
Length: 4
Description/Remark: Four-digit representation of the period within the reporting year, or the sequence number for non-periodic datasets. Acceptable values depend on the periodicity:

  • “0000” for annual or multiannual,
  • “0001” to “0004” = the quarter for quarterly transmissions,
  • “0001” to “0012” = the month for monthly transmissions,
  • “0001” to “9999” = the sequence number for sequential.

Field: TO (optional)
Length: 2
Description/Remark: The code of the country to which the primary data receiving organisation belongs. The same rules apply as for FROM.

Note: This field is used mainly for transmissions sent from Eurostat.

Field: Optional field(s)
Length: 1…220
Description/Remark: Though not recommended, optional information given by the data sender (ignored by Eurostat tools during processing).

Field: FORMAT
Length: 20
Description/Remark: Examples: “XML”, “GES” (GESMES), “CSV”, “FLR” (“Fixed Length Records”), “DOCX”, “XLSX”, etc.

 

 

Composition constraints and limitations for fields:

  • “YEAR” and “PERIOD”: If a dataset occurrence contains data for several years or periods, use only the agreed year/period (the one specified in the calendar of the dataset).
  • Optional field(s): Only letters “A” to “Z”, digits “0” to “9” and “_” are allowed.
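As a rough illustration of the full pattern DATASET ID_FROM_YEAR_PERIOD_[TO]_[OPTION].FORMAT, the sketch below parses a complete file name with one regular expression and then checks PERIOD against the periodicity. The names `parse_file_name`, `sender` and `receiver` are hypothetical. It is assumed that country codes are two capital letters or digits (no country list is consulted), that a two-character token after PERIOD is the TO field rather than an option, and that only the period ranges stated in the table are enforced.

```python
import re

# Shape of a complete file name; each group maps to one field of the table above.
FILE_NAME = re.compile(
    r"(?P<dataset_id>[0-9A-Z]{1,8}_[0-9A-Z]{1,7}_[A0-9SQMWDON]{1,2})"
    r"_(?P<sender>[A-Z0-9]{2})"          # FROM
    r"_(?P<year>\d{4})"                  # YEAR
    r"_(?P<period>\d{4})"                # PERIOD
    r"(?:_(?P<receiver>[A-Z0-9]{2}))?"   # TO (optional)
    r"(?:_(?P<option>[A-Z0-9_]+))?"      # optional field(s)
    r"\.(?P<format>[A-Za-z0-9]+)"        # FORMAT
)

# Period ranges stated in the table for quarterly and monthly data.
PERIOD_RANGES = {"Q": (1, 4), "M": (1, 12)}


def period_is_plausible(periodicity, period):
    """Check PERIOD against the ranges given in the table; accept the rest."""
    if len(periodicity) != 1:
        return True  # several periodicities: no range check in this sketch
    if periodicity == "A" or periodicity.isdigit():
        return period == "0000"  # annual or multiannual
    if periodicity in PERIOD_RANGES:
        low, high = PERIOD_RANGES[periodicity]
        return low <= int(period) <= high
    return True  # ranges for the other codes are not given in the table


def parse_file_name(file_name):
    """Return the fields of a complete DSNC file name, or None if it does not fit."""
    match = FILE_NAME.fullmatch(file_name)
    if match is None:
        return None
    fields = match.groupdict()
    periodicity = fields["dataset_id"].rsplit("_", 1)[1]
    if not period_is_plausible(periodicity, fields["period"]):
        return None
    return fields


print(parse_file_name("RAIL_E_Q_UK_2003_0002_EU_V0002.GES"))
```

A single expression keeps the sketch short, but it rejects the whole name on the first mismatch, so it only covers file names that follow the convention completely.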

 

The file name can follow the DSNC either partially or completely:

  • Partially: a valid dataset ID is found at the beginning of the file name.

In this case, only the fields up to the first inconsistency are pre-filled. The other fields remain empty and must be filled in manually by the user.

Values extracted by the DSNC analysis must also be compatible with the filtered lists of elements available in the corresponding form fields.

  • Completely: the "Dataset ID", "From", "To", "Year" and "Period" fields are filled in automatically.

The naming convention relates to 3 levels:

  • Level 1: the dataset ID (e.g. RAIL_E_Q)
  • Level 2: the dataset occurrence ID (e.g. RAIL_E_Q_UK_2003_0002_EU)
  • Level 3: the datafile ID (e.g. RAIL_E_Q_UK_2003_0002_EU_V0002.GES).

The DSNC parsing is based on the second level, the dataset occurrence ID.

The form is pre-filled automatically up to the first point at which the file name is inconsistent with the DSNC.

For instance, "RAIL_E_Q_toto.csv" is sufficient to pre-fill the "Dataset ID" field, and "RAIL_E_Q_UK_2003_titi.csv" is sufficient to pre-fill the "Dataset ID", "From" and "Year" fields.
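The "pre-fill until the first inconsistency" behaviour can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `prefill` is hypothetical, country codes are assumed to be two capital letters or digits, and the additional check against the filtered lists of the form is not modelled.

```python
import re

# Shape of a dataset ID (level 1), assumed to span the first three "_"-separated tokens.
DATASET_ID = re.compile(r"[0-9A-Z]{1,8}_[0-9A-Z]{1,7}_[A0-9SQMWDON]{1,2}")

# Remaining form fields, in the order they appear in a dataset occurrence ID.
CHECKS = [
    ("From", re.compile(r"[A-Z0-9]{2}")),
    ("Year", re.compile(r"\d{4}")),
    ("Period", re.compile(r"\d{4}")),
    ("To", re.compile(r"[A-Z0-9]{2}")),
]


def prefill(file_name):
    """Return the form fields that can be pre-filled from a file name."""
    stem = file_name.rsplit(".", 1)[0]  # drop the extension, if any
    tokens = stem.split("_")
    filled = {}
    dataset_id = "_".join(tokens[:3])
    if not DATASET_ID.fullmatch(dataset_id):
        return filled  # no valid dataset ID: nothing is pre-filled
    filled["Dataset ID"] = dataset_id
    for (field, pattern), token in zip(CHECKS, tokens[3:]):
        if not pattern.fullmatch(token):
            break  # first inconsistency: stop, leave the rest empty
        filled[field] = token
    return filled


print(prefill("RAIL_E_Q_toto.csv"))
print(prefill("RAIL_E_Q_UK_2003_titi.csv"))
```

Checking the tokens one by one, rather than matching the whole name at once, is what allows a partially conforming name to fill some of the fields.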