
AIML4OS WP 8 Use cases: Editing focus - Statistically valid and efficient editing and imputation in official statistics by AI/ML – with a special focus on editing

WP 8 focuses on enhancing the quality of official statistics through the innovative use of AI/ML in data editing and imputation, with a special emphasis on editing. Data editing addresses issues that are essential to the quality of official statistics: it ensures reliable information for traditional statistical products and yields good training datasets for machine learning. Conversely, machine learning can help to improve data editing.
It can do so in two ways: by enabling methods that recognise erroneous observations as such, and by localising the specific error within an observation. Replacing incorrect values also falls within the scope of this work package and establishes the connection to the imputation focus of WP9.
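The two tasks named above, recognising an erroneous observation and localising the error within it, can be illustrated with a minimal Python sketch. It is not a method of this work package: as a deliberately simple stand-in for a trained model, it scores every value by its robust distance from the median (in units of the median absolute deviation, MAD), flags a record when its largest score exceeds a threshold, and attributes the error to the variable with that score. The function names, the example data and the threshold of 5 are hypothetical assumptions.

```python
from statistics import median


def robust_scores(records, variables):
    """Per record, score each variable by |value - median| / MAD across all records."""
    centre, spread = {}, {}
    for var in variables:
        values = [rec[var] for rec in records]
        centre[var] = median(values)
        mad = median([abs(x - centre[var]) for x in values])
        # A MAD of zero would cause division by zero; fall back to 1.0 (an assumption).
        spread[var] = mad if mad > 0 else 1.0
    return [
        {var: abs(rec[var] - centre[var]) / spread[var] for var in variables}
        for rec in records
    ]


def detect_and_localise(records, variables, threshold=5.0):
    """Return (record index, suspect variable) pairs for records judged erroneous."""
    flagged = []
    for idx, scores in enumerate(robust_scores(records, variables)):
        # Localisation: the variable with the largest score is the suspect.
        suspect = max(scores, key=scores.get)
        # Detection: the record is flagged only if that score exceeds the threshold.
        if scores[suspect] > threshold:
            flagged.append((idx, suspect))
    return flagged


# Hypothetical survey data: turnover in thousand euros; the last record
# contains an invented unit error (turnover reported in euros).
records = [
    {"turnover": 120, "employees": 10},
    {"turnover": 95, "employees": 8},
    {"turnover": 110, "employees": 9},
    {"turnover": 130, "employees": 11},
    {"turnover": 105, "employees": 9},
    {"turnover": 115000, "employees": 10},
]

print(detect_and_localise(records, ["turnover", "employees"]))
# → [(5, 'turnover')]
```

In an ML-based approach, the per-variable scores could instead come from a model that predicts each variable from the others, with large prediction residuals marking suspicious values; the two-stage structure (detect, then localise) would stay the same.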
The expected outputs of this work package will include methodological investigations and practical implementations, encapsulated in Python and/or R code. A standardised editing pipeline for all datasets and variable types would be of great benefit to official statistics, but seems hardly feasible. Different procedures and approaches must therefore be developed and analysed for different types of datasets, taking into account that data can probably not be shared between NSIs. The interplay between development, testing and standardisation leads to an iterative process that will characterise the work of this work package. Ideally, this will result in a methodological and implementation framework for editing. This framework will constitute the concrete result of this work package and is intended to further improve the quality of official statistics. Furthermore, this work package seeks to compile a comprehensive report detailing the methodologies employed, the evaluation of outcomes, and the lessons learned throughout the process.
To achieve these results, the work package is structured into several strategic steps. The first step will be to conduct a literature review and a systematic examination of the current experiences and challenges faced by participating countries. Insights gained from this review will guide the selection of specific problems to address, with a focus on techniques, degree of automation and data handling. Building on this foundational work, different methodological approaches to data editing will be developed and tested (usually in the form of prototypes) in a second phase.
The final step involves compiling the findings and methodologies into a detailed report. Additionally, the work package will participate in key dissemination events, such as the UNECE Data Editing Workshops and Expert Meetings in 2024 and 2026. These events, both virtual and physical, will serve as platforms to share the outcomes and insights gained, promoting broader adoption and understanding of the developed methodologies.