Deep learning methods have transformed the construction of text classifiers. However, hardly any approaches known to date take into account the hierarchies inherent in classification schemes during the training and evaluation of these models. Work Package 10, "From text to code - Experiences and potential of the use of AI/ML for classifying and coding", explores the use of these new methods in national statistical offices. Its goal is to harness AI and machine learning technologies to map textual descriptions from diverse sources to widely recognized classification schemes such as NACE, COICOP, and ISCO, which are pivotal in official statistics. This objective faces notable challenges, particularly the multilingual nature of the textual data and the constraints on data exchange.
To address these issues, the work package aims to deliver methodological investigations and practical implementations in Python and/or R code, tailored for efficient classification. It will also produce a comprehensive report detailing the methodologies employed, the evaluation of outcomes, and the lessons learned throughout the process.
To achieve these results, the work package is structured into several steps. The first step is a literature review and a systematic examination of the current experiences of participating countries and the challenges they face. This foundational work will form the basis for selecting the problems to tackle, from both a methodological and an implementation point of view (technique, classification scheme, data). Building on this selection, the second phase will comprise the methodological investigations and the practical implementation of several approaches, in order to identify the most effective solutions. The project will conclude with a final report, alongside active participation in dissemination events, both virtual and physical.
These efforts aim not only to share the successful methodologies and text2code pipelines developed, but also to foster a broader understanding and application of AI/ML in the classification and coding of textual data within official statistics.