This report evaluates the feasibility of collecting data from Digital Labour Platforms (DLPs) through web scraping techniques. Conducted as part of the 'Feasibility study on work platforms web data retrieval,' the analysis covers a range of DLPs varying in size and geographic scope. The study employs both qualitative and quantitative criteria to assess the richness of information available on these platforms and the technical challenges associated with automated data retrieval.
Qualitatively, the report examines the availability and composition of professional resumes and job descriptions, as well as additional features such as reviews, ratings, and location information. Quantitatively, the study runs web scraping tests to determine the maximum number of pages that can be visited on each platform before the scraper is blocked. The results indicate varying levels of feasibility: some platforms allow extensive scraping, while others block bots immediately.
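The page-count test described above could be sketched roughly as follows. This is an illustrative sketch only, not the study's actual tooling: the function name `pages_before_block`, the `fetch` callable, and the choice of HTTP 403 and 429 as blocking signals are all assumptions, since the report summary does not state how a block was detected.

```python
import time
from typing import Callable, Iterable

# Assumption: a platform signals blocking with HTTP 403 (Forbidden) or
# 429 (Too Many Requests). Real platforms may instead serve CAPTCHAs or
# redirects, which this sketch does not detect.
BLOCK_STATUSES = {403, 429}


def pages_before_block(
    urls: Iterable[str],
    fetch: Callable[[str], int],
    max_pages: int = 1000,
    delay_seconds: float = 0.0,
) -> int:
    """Count how many pages are fetched successfully before a block.

    `fetch` is a hypothetical callable that requests a URL and returns
    its HTTP status code; passing it in keeps the counting logic
    independent of any particular HTTP library.
    """
    visited = 0
    for url in urls:
        if visited >= max_pages:
            break  # stop at the test's upper bound even if never blocked
        status = fetch(url)
        if status in BLOCK_STATUSES:
            break  # the platform has started refusing requests
        visited += 1
        if delay_seconds > 0:
            time.sleep(delay_seconds)  # optional pause between requests
    return visited


if __name__ == "__main__":
    # Simulated platform that blocks on the fourth request.
    calls = {"n": 0}

    def fake_fetch(url: str) -> int:
        calls["n"] += 1
        return 200 if calls["n"] <= 3 else 429

    test_urls = [f"https://example.com/profile/{i}" for i in range(10)]
    print(pages_before_block(test_urls, fake_fetch))  # prints 3
```

In a real test, `fetch` would wrap an HTTP client call, and a platform that blocks bots immediately would yield a count of 0.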
The report also explores the potential for formal agreements with DLPs to access data directly, thereby enhancing data quality. Initial outreach for such agreements has gone largely unanswered. In conclusion, while web scraping is technically feasible for most platforms, the optimal approach to data collection would involve formal agreements with DLPs to ensure comprehensive and high-quality data.