This webinar showcased the work of two teams that took part in Eurostat's first deduplication challenge, a component of the Web Intelligence competition of the European Statistics Awards Programme.
The webinar covered the following:
- Different methods were used to identify duplicates in a multilingual dataset, with job advertisements as the case study. These included entity recognition, transformer-based approaches that compare the similarity of the job offers' vector embeddings, and experiments with MinHash.
- The webinar also shared best practices for conducting a data science project, illustrated with deduplication as the running example. These covered using the Kedro framework for Python and included a presentation of the Onyxia Datalab.
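To make the MinHash idea more concrete: MinHash estimates the Jaccard similarity between the shingle sets of two texts by comparing short signatures instead of the full sets. The sketch below is a minimal, standard-library-only illustration and not either team's actual solution. It assumes character 3-shingles (which work across languages without tokenisation) and 128 salted BLAKE2b hashes standing in for random permutations. The function names, parameters and example advertisements are hypothetical.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Character k-shingles of a normalised text (lowercased, whitespace collapsed)."""
    norm = re.sub(r"\s+", " ", text.lower()).strip()
    if len(norm) < k:
        return {norm}  # very short texts become a single shingle
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def minhash_signature(items: set[str], num_perm: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over all shingles."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt simulates a different permutation
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions that agree: an unbiased estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, used here only to check the estimate."""
    return len(a & b) / len(a | b)


# Hypothetical job advertisements: two near-duplicates and one unrelated offer.
ad_a = "Data scientist wanted in Luxembourg, full-time, Python and SQL required"
ad_b = "Data scientist wanted in Luxembourg (full time), Python & SQL required."
ad_c = "Boulangerie cherche vendeur à mi-temps, Paris 11e"

sig_a, sig_b, sig_c = (minhash_signature(shingles(t)) for t in (ad_a, ad_b, ad_c))
print("near-duplicates:", estimate_jaccard(sig_a, sig_b))
print("unrelated:      ", estimate_jaccard(sig_a, sig_c))
```

Comparing every pair of signatures is still quadratic in the number of advertisements. At scale, MinHash is usually combined with locality-sensitive hashing (banding the signatures) so that only candidate pairs are compared. The threshold at which a pair counts as a duplicate is a tuning choice and is not specified here.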