
How can a user change the encoding of a file to UTF-8?

Eurostat's data validation systems expect CSV and SDMX-ML files to be encoded in UTF-8 without a BOM. If users send CSV or SDMX-ML data files in any other encoding (including "UTF-8 with BOM"), they may receive a validation error report similar to the one in the screenshot below.

Validation report including UTF-8 with BOM error
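BOM stands for "Byte Order Mark". It is an optional three-byte sequence (EF BB BF in hexadecimal) that some editors place at the very start of a UTF-8 file. Files that start with it are commonly labelled "UTF-8 with BOM", and Eurostat's data validation systems do not support them.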
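
Before resubmitting, it can help to check whether a file actually starts with a BOM or contains bytes that are not valid UTF-8. The following is a minimal sketch in Python, not part of Eurostat's tooling; the function names are hypothetical and chosen for illustration only.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

To check a file on disk, read it in binary mode (for example `open(path, "rb").read()`) and pass the result to both functions. A file is in the expected format only if `is_valid_utf8` returns True and `has_utf8_bom` returns False.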

In such cases, the provider will need to change the encoding of the file to UTF-8 (without BOM) and resubmit it. The encoding can be changed using one of the following two approaches:

Approach 1

Open the file in a text editor such as Notepad. Choose "Save as" and select "UTF-8" in the "Encoding" box before saving (if the list also offers "UTF-8 with BOM", do not choose that option). See the screenshot below.

Image displaying UTF-8 in the Encoding field when using the option Saving as

Approach 2

Open the file in Notepad++. In the "Encoding" menu, click on "UTF-8", then save the file. Note: if there is already a black dot next to "UTF-8", the file is already in the correct encoding and no change is needed.

Encoding file as UTF-8 using the Notepad++ text editor with the Encoding option
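
For providers who prepare many files, the same conversion can be scripted instead of done by hand in an editor. The sketch below is one possible way to do it in Python and is not an official Eurostat tool. The function names and the `source_encoding` parameter are hypothetical; the default `"utf-8-sig"` assumes the input is UTF-8 that may carry a BOM, and any other source encoding (for example `"cp1252"`) must be known and passed explicitly.

```python
from pathlib import Path


def to_utf8_without_bom(raw: bytes, source_encoding: str = "utf-8-sig") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8 without a BOM."""
    text = raw.decode(source_encoding)
    # "utf-8-sig" already drops a BOM; this covers codecs that leave U+FEFF in place.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "utf-8-sig") -> None:
    """Read src, convert its content, and write the result to dst."""
    # Binary read/write keeps the line endings of the original file unchanged.
    Path(dst).write_bytes(to_utf8_without_bom(Path(src).read_bytes(), source_encoding))
```

A call such as `convert_file("data.csv", "data_utf8.csv", source_encoding="cp1252")` (file names are placeholders) would write a UTF-8 copy without a BOM. If the wrong source encoding is given, the output may contain garbled characters, so it is worth opening the result once to check it.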