ESTP Course: Scraping online data: sources, tools and methodologies. Hands-on analysis of Online Job Advertisements

Muhammad Suffian • 2 January 2024


Course Leader: Mauro Pelucchi
Target Group: Official statisticians working on big data methodology, data science, and employment and education statistics, as well as in other statistical domains that can benefit from this data source.
Entry Qualifications

Sound command of English. Participants should be able to make short interventions and to participate actively in discussions.

Domain knowledge of Labour Market Intelligence

Preliminary Big Data knowledge

Familiarity with basic analytical techniques

Basic programming knowledge

Objective(s)

Understand how to collect and store web data on Online Job Vacancies (OJV)

Understand data processing techniques

Understand the challenges and issues of web data

Gain a basic understanding of data classification techniques on standard taxonomies and of advanced techniques for taxonomy improvement

Contents

Landscaping the online job market

OJV data ingestion (e.g. source selection, ingestion techniques)

Overview of web technology (HTML, CSS, JS, XPath, ...)

Scraping vs crawling vs search (including URL discovery via surveys, search engines and crowdsourcing)

Data extraction via API (HTTP messages, requests and response codes, POST, REST, JSON format, R package 'httr')

Data extraction via scraping tools

OJV data processing (e.g. pipeline, vacancy detection, deduplication)

Automatic classification of OJV data (e.g. multi-language environment, feature extraction, classifiers)

Text processing and multi-language environment

Classification processes, feature extraction and machine learning

Focus on occupation categorization

Focus on skill categorization

Analysis of OJV data with the Big Data Science Workbench tools

Expected Outcome: A sample script that extracts job vacancies and other data from a web source, cleans them, and prepares them for analysis.
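To give a rough idea of what such a script could look like, here is a minimal Python sketch that covers extraction, cleaning and deduplication. It is not course material. The HTML fragment, the class names (`job`, `title`, `location`) and all function names are assumptions made for this illustration, and a real script would download the page from a job portal instead of using an embedded sample.

```python
import hashlib
import re
from html.parser import HTMLParser

# Hypothetical page fragment (assumption); a real script would fetch it over HTTP.
SAMPLE_HTML = """
<div class="job"><h2 class="title"> Data  Analyst </h2><span class="location">Cologne</span></div>
<div class="job"><h2 class="title">Data Analyst</h2><span class="location"> cologne </span></div>
<div class="job"><h2 class="title">Statistician</h2><span class="location">Rome</span></div>
"""

FIELDS = ("title", "location")  # assumed CSS class names of the fields to extract


class JobParser(HTMLParser):
    """Collects the text of the assumed field elements, one dict per job block."""

    def __init__(self):
        super().__init__()
        self.jobs = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "job":
            self.jobs.append({})
        elif css_class in FIELDS and self.jobs:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            job = self.jobs[-1]
            job[self._field] = job.get(self._field, "") + data

    def handle_endtag(self, tag):
        self._field = None


def clean(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(jobs):
    """Keep the first vacancy for each case-insensitive (title, location) pair."""
    seen, unique = set(), []
    for job in jobs:
        record = {field: clean(job.get(field, "")) for field in FIELDS}
        raw_key = "|".join(record[field].lower() for field in FIELDS)
        key = hashlib.sha1(raw_key.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def extract_jobs(html):
    parser = JobParser()
    parser.feed(html)
    parser.close()
    return deduplicate(parser.jobs)


jobs = extract_jobs(SAMPLE_HTML)
for job in jobs:
    print(job)
```

Exact matching on normalised fields is the simplest possible deduplication rule. Near-duplicate advertisements posted on several portals would need fuzzier matching.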
Training Methods

Presentations and lectures

Exchange of views/experiences on national practices

Exercises/DataLab

Required Reading: None
Suggested Reading: None
Required Preparation: None
Trainer(s)/Lecturer(s)

Mauro Pelucchi

Colombo Ettore

Practical Information

When: 13–17.05.2024
Duration: 5 days
Where: Cologne, Germany
Organiser: ICON INSTITUTE Public Sector GmbH
Application via National Contact Point. Deadline: 25.03.2024

When: 07–11.10.2024
Duration: 5 days
Where: Cologne, Germany
Organiser: ICON INSTITUTE Public Sector GmbH
Application via National Contact Point. Deadline: 12.08.2024

 
