WIN, the Hackathon Entries
Please submit your solution, with a link to your repo and the names of your team members, by email to ESSnet.project@ons.gov.uk by midnight (CET) on Thursday 14 November 2024.
WIN, the Hackathon!
A call to the Data Science community to help us improve Official Statistics
The Web Intelligence Network (WIN) has been developing ways to identify E-commerce and Social Media use from publicly available web data. We are offering you a unique opportunity to contribute to this cutting-edge field and develop new ways to work with web data from online business enterprise characteristics (OBEC). If you want to join in and have some data science skills, you can sign up with a team. Teams can have as many members as you like. You can use your own coding environment. You and your team need to come up with ideas and passion.
In this collaborative challenge, teams develop open-source software that automatically derives E-commerce and Social Media activity (see the Annex for details) for a set of company websites from four different countries. The results will be evaluated against a manually labelled subset of the data. This is a great opportunity to showcase your skills and contribute to the community.
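As an illustration of what deriving social media activity could look like, here is a minimal baseline sketch. It assumes that a link to a platform on a company page counts as use of that platform; that is an assumption made for the example, and the definitions in the Annex take precedence. The indicator names are those listed under Next steps, while the domain mapping and the function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical mapping from indicator name to platform domains (an assumption,
# not part of the challenge definition).
PLATFORM_DOMAINS = {
    "sm_fb": ("facebook.com",),
    "sm_linkedin": ("linkedin.com",),
    "sm_x": ("x.com", "twitter.com"),
    "sm_insta": ("instagram.com",),
    "sm_tiktok": ("tiktok.com",),
    "sm_yt": ("youtube.com", "youtu.be"),
}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def host_matches(host, domain):
    # Match the domain itself or any subdomain, so that "netflix.com"
    # is not mistaken for "x.com".
    return host == domain or host.endswith("." + domain)


def social_media_indicators(html):
    """Return a dict of 0/1 flags, one per social media indicator."""
    collector = LinkCollector()
    collector.feed(html)
    hosts = [(urlparse(h).hostname or "").lower() for h in collector.hrefs]
    return {
        name: int(any(host_matches(host, d) for host in hosts for d in domains))
        for name, domains in PLATFORM_DOMAINS.items()
    }


sample = (
    '<a href="https://www.facebook.com/acme">FB</a> '
    '<a href="https://x.com/acme">X</a> '
    '<a href="/contact">Contact</a>'
)
flags = social_media_indicators(sample)
print(flags)
```

A link-only heuristic like this will miss profiles loaded by scripts and will count share buttons as use, so it is best treated as a starting point to compare richer approaches against.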
Your approach to this challenge may involve scraping and interpreting texts, reading website structures, and/or automatically inspecting other data sources. However, the software you develop must comply with the highest ethical standards, including netiquette in web scraping.
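One common part of scraping netiquette is honouring a site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` on an inline sample so that it runs without network access. The user agent name and the sample rules are hypothetical, and this covers only one aspect of netiquette; it is not a complete compliance check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent name; pick one that identifies your team.
USER_AGENT = "win-hackathon-bot"


def build_parser(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Sample rules for illustration only; a real scraper would fetch
# <site>/robots.txt before requesting any other page.
sample_robots = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = build_parser(sample_robots)
allowed_home = parser.can_fetch(USER_AGENT, "https://example.com/")
allowed_private = parser.can_fetch(USER_AGENT, "https://example.com/private/page")
delay = parser.crawl_delay(USER_AGENT)  # seconds to wait between requests, or None

print(allowed_home, allowed_private, delay)
```

In a real run, a scraper would skip any URL for which `can_fetch` returns False and sleep for at least the crawl delay between requests to the same site.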
Who can take part in WIN the Hackathon?
Anyone not directly involved in this topic in the WIN project can participate. Teams can be of any size. To register for the Hackathon, please complete the Registration Form.
The challenge is now open and will close at midnight on 14 November 2024 (CET). During the challenge, we will hold online Q&A drop-in sessions on the following dates:
- 17 Oct 16:00 CET - Join here - Teams meeting ID: 314 471 579 30 Pass code: YFGHjh
- 24 Oct 16:00 CET - Join here - Teams meeting ID: 333 685 169 373 Pass code: 6dNBtq
- 31 Oct 16:00 CET - Join here - Teams meeting ID: 360 623 354 079 Pass code: KX6RQq
Any questions can be directed to ESSnet.project@ons.gov.uk.
Team results are scored against a manually labelled subset of the data. The two categories (e-commerce and social media use) carry equal weight, and the team with the highest score wins. In the unlikely event of a draw, the explanation in the README.md will be used to determine the winner.
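The exact scoring metric is not stated here, so the following is only a sketch of how equal weighting of the two categories could work. It assumes plain accuracy per category, which is an assumption of this example and not the organisers' confirmed method. The indicator names are those listed under Next steps; the function names are hypothetical.

```python
# Indicator columns grouped into the two equally weighted categories.
ECOMMERCE = ["ecommerce"]
SOCIAL_MEDIA = ["sm_fb", "sm_linkedin", "sm_x", "sm_insta", "sm_tiktok", "sm_yt"]


def accuracy(predicted, labelled, columns):
    """Share of labelled cells in `columns` that the prediction gets right.

    ASSUMPTION: accuracy is used as the metric; the real one may differ.
    """
    total = correct = 0
    for row_id, truth in labelled.items():
        guess = predicted.get(row_id, {})
        for col in columns:
            total += 1
            correct += int(guess.get(col) == truth[col])
    return correct / total if total else 0.0


def team_score(predicted, labelled):
    """Average the two category scores so that each has equal weight."""
    return 0.5 * accuracy(predicted, labelled, ECOMMERCE) + 0.5 * accuracy(
        predicted, labelled, SOCIAL_MEDIA
    )


def row(**ones):
    """Build a row with every indicator 0 except those passed as keywords."""
    base = {col: 0 for col in ECOMMERCE + SOCIAL_MEDIA}
    base.update(ones)
    return base


# Two made-up labelled websites and one team's predictions for them.
labelled = {"1": row(ecommerce=1, sm_fb=1), "2": row(sm_linkedin=1)}
predicted = {"1": row(ecommerce=1, sm_fb=1), "2": row(ecommerce=1, sm_linkedin=1, sm_yt=1)}

score = team_score(predicted, labelled)
print(round(score, 4))
```

Under this weighting, the single e-commerce indicator counts as much as the six social media indicators together, so an error on `ecommerce` costs more than an error on any one platform.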
How to enter
Download the file input.csv from the bottom of the page. It contains an id, a URL and columns for the indicators.
Next steps
- Create a publicly available GitHub or GitLab repo where the software can be found. It should contain an open-source license file recognized by GitHub.
- The README.md should contain a paragraph of at least 25 lines explaining the approach, its pros and cons, and anything else worth noting about the solution.
- The root of the repo should contain a file results.csv with the results. This is the input csv with the following binary indicators filled in:
  - ecommerce: [0,1]: indicating ecommerce activities
  - sm_fb: [0,1]: indicating Facebook use
  - sm_linkedin: [0,1]: indicating LinkedIn use
  - sm_x: [0,1]: indicating X use
  - sm_insta: [0,1]: indicating Instagram use
  - sm_tiktok: [0,1]: indicating TikTok use
  - sm_yt: [0,1]: indicating YouTube use
- You are free to choose any methods or technology. The only requirement is that the code that generates the classification is available in a public Git repo under an open-source license.
- More extended definitions can be found in the Annex of this document.
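Before submitting, it may help to check that results.csv has every indicator column and only binary values. The sketch below is one way to do that with the standard `csv` module. It assumes a comma-delimited file with a header row; the `id` and `url` header names in the sample data and the function name are hypothetical, and only the indicator names come from the list above.

```python
import csv
import io

# Indicator columns named in the list above.
INDICATORS = ["ecommerce", "sm_fb", "sm_linkedin", "sm_x", "sm_insta", "sm_tiktok", "sm_yt"]


def validate_results(handle):
    """Return a list of problems found; an empty list means the file passed."""
    reader = csv.DictReader(handle)
    missing = [col for col in INDICATORS if col not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in INDICATORS:
            if (row[col] or "").strip() not in {"0", "1"}:
                problems.append(f"line {line_no}: {col}={row[col]!r} is not 0 or 1")
    return problems


# Made-up sample data; the id and url header names are assumptions.
header = "id,url," + ",".join(INDICATORS) + "\n"
good = header + "1,https://example.com,1,0,0,0,0,0,1\n"
bad = header + "1,https://example.com,0.1,0,0,0,0,0,1\n"

good_problems = validate_results(io.StringIO(good))
bad_problems = validate_results(io.StringIO(bad))
print(good_problems)
print(bad_problems)
```

To check a real file, open it with `open("results.csv", newline="")` and pass the file object to `validate_results`.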
Submission
Submit your solution with a link to your repo and the names of your team members via email at ESSnet.project@ons.gov.uk.
Prize
The best three teams win the opportunity to present their solutions at the NTTS conference (11-13 March 2025, Brussels) in the session dedicated to the WIN hackathon results. Travel costs within the European Union and hotel accommodation for one representative of each team are covered by Eurostat (the conference organizer).