WIN Webinar: Lessons Learned from Eurostat’s Deduplication Challenge, 13 May 2024

This webinar showcased the work of two teams that took part in Eurostat’s first deduplication challenge, a component of the European Statistics Awards Programme Web Intelligence competition.

The webinar covered the following:

Different methods were used to identify duplicates in a multilingual dataset, with job advertisements as a case study. These included entity recognition, transformer-based approaches that compare the similarity of the offers’ vector embeddings, and MinHash experiments.
The webinar also shared best practices for conducting a data science project, illustrated with deduplication as an example. These included using the Kedro framework for Python and a presentation of the Onyxia Datalab.
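
To give a feel for the MinHash idea mentioned above, here is a minimal sketch in plain Python. It is not the teams' implementation: the function names, the word 3-gram shingling, and the 128 hash functions are illustrative assumptions, and it ignores the multilingual aspect of the challenge entirely. It only shows how the share of matching signature positions approximates the Jaccard similarity of two adverts.

```python
import hashlib


def shingles(text, k=3):
    """Split a text into a set of overlapping word k-grams (hypothetical helper)."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(items, num_perm=128):
    """For each of num_perm salted hash functions, keep the smallest hash value."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash function
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        item.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "big",
                )
                for item in items
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """The fraction of positions where two signatures agree estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Illustrative, made-up job adverts (not taken from the challenge dataset).
ad_a = "Data engineer wanted in Luxembourg to build statistical pipelines with Python"
ad_b = ad_a + " and SQL"
ad_c = "Experienced pastry chef needed for a busy bakery in central Lyon"

sig_a = minhash_signature(shingles(ad_a))
sig_b = minhash_signature(shingles(ad_b))
sig_c = minhash_signature(shingles(ad_c))

print("near-duplicate:", estimate_jaccard(sig_a, sig_b))
print("unrelated:", estimate_jaccard(sig_a, sig_c))
```

In practice, pairs whose estimated similarity exceeds a chosen threshold would be flagged as candidate duplicates and then checked with a more precise method. The threshold and the shingle size are tuning choices this sketch does not settle.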

Watch the webinar here

Download the slides here

Read the blog


