WIN Webinar: Lessons Learned from Eurostat’s Deduplication Challenge, 13 May 2024

Muhammad Suffian • 13 June 2024
Image of the webinar's first slide, highlighting the title and speakers

This webinar showcased the work of two teams that took part in Eurostat’s first deduplication challenge, a component of the European Statistics Awards Programme’s Web Intelligence competition.

The webinar covered the following:

  • The teams used different methods to identify duplicates in a multilingual dataset, with job advertisements as the case study. These included entity recognition, transformer-based approaches that compare the similarity of the offers’ vector embeddings, and experiments with MinHash.
  • The webinar also shared best practices for conducting a data science project, using deduplication as the example. These included using the Kedro framework for Python and a presentation of the Onyxia Datalab.
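
To give a feel for the MinHash idea mentioned above, here is a minimal, standard-library-only sketch. It is not the code presented by the teams: the sample advertisements, the 3-word shingle size, the 128 hash functions, and all function names are assumptions chosen for illustration. Each advertisement is split into overlapping word shingles, and the fraction of matching signature positions estimates the Jaccard similarity between two shingle sets.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, varied by an integer seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=128):
    """For each seeded hash function, keep the minimum hash over all tokens."""
    if not tokens:
        raise ValueError("cannot build a signature from an empty token set")
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Share of signature positions that agree; estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical advertisements, invented for this example.
ad_a = "Data scientist wanted in Berlin, Python and SQL experience required"
ad_b = "Data scientist wanted in Berlin, Python and SQL experience required. Apply now"
ad_c = "Truck driver needed for night shifts in Lyon"

sh_a, sh_b, sh_c = shingles(ad_a), shingles(ad_b), shingles(ad_c)
sig_a, sig_b, sig_c = (minhash_signature(s) for s in (sh_a, sh_b, sh_c))

est_ab = estimated_jaccard(sig_a, sig_b)  # near-duplicate pair
est_ac = estimated_jaccard(sig_a, sig_c)  # unrelated pair
print(f"a vs b: estimate={est_ab:.2f}, exact={exact_jaccard(sh_a, sh_b):.2f}")
print(f"a vs c: estimate={est_ac:.2f}, exact={exact_jaccard(sh_a, sh_c):.2f}")
```

Word-shingle MinHash of this kind only detects near-verbatim duplicates written in the same language. For a multilingual dataset it would likely need to be combined with something language-independent, such as the embedding-based comparison also mentioned above.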

Watch the webinar here

Download the slides here

Read the blog
