This work package is being carried out by 12 of the Web Intelligence Network (WIN) partners. STATA (Austria) leads, with contributions from NSI (Bulgaria), DESTATIS (Germany), FI (Finland), INSEE (France), DARES (France), ISTAT (Italy), CBS (Netherlands), GUS (Poland), PT (Portugal), ONS (UK), and FSO (Switzerland).
The aim of Work Package 4 (WP4) is to consolidate the knowledge gained within the WIN (and, to a limited extent, outside it, in academia and the non-ESS official statistics community) on methodology and quality when using web data in the statistical production process. Several key deliverables from the ESSnet Big Data II (especially WPF and WPK) will be extended and enhanced (E & E), with a focus on web data. In addition, WP4 will move from quality guidelines to quantitative quality indicators and will carry out the first cross-national quality assessment for online job advertisements (OJA) and online-based enterprise characteristics (OBEC).
WP4 will focus on the delivery of four main tasks:
- E & E of BREAL framework – The adoption of the Web Intelligence Hub (WIH) poses the challenge of aligning the WIH, through a top-down strategy, with BREAL (Big Data REference Architecture and Layers), a European reference architecture for Big Data. At the same time, input received from WP2 and WP3 will be used in a bottom-up approach to validate and extend BREAL where needed. In both cases, architecture work will help describe the deployment steps of WIH-based statistical pipelines, as well as the integration of new services in the Hub. The potential for sharing (both within the current WIH scope and for future use cases) will also be examined.
- E & E of Quality of web scraped data – Quality Guidelines for Big Data were produced during ESSnet Big Data I and II. While these guidelines give a very good overview of managing quality when working with big data, they concluded for some phases and types that “it was almost impossible to state generally applicable quality guidelines”. To overcome this issue, and at the same time give users clear instructions for using web data, quality indicators need to be formulated. WP2 and WP3 will provide input for extending the quality guidelines to new data sources and new use cases.
- Assessing the Quality – this sub-task will focus on quality assessments for OBEC and OJA, including a comparison of the statistical indicators derived from sample surveys or administrative data with those produced from web scraped data. This could reveal quality issues not only in the web scraped data but also in the sample data.
- E & E of Methodology for web scraped data – Methodology reports developed during ESSnet Big Data I and II will be reviewed, improved and updated so that they also cover web-based data.
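As an illustration of the comparison described under "Assessing the Quality", the following is a minimal Python sketch of how an indicator derived from a survey might be compared with its web scraped counterpart. It is not the WP4 method: the indicator names, the numeric values, the `IndicatorPair` class, and the 10% tolerance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class IndicatorPair:
    """One statistical indicator estimated from two sources (hypothetical structure)."""
    name: str
    survey_value: float  # estimate from sample survey or administrative data
    web_value: float     # estimate from web scraped data


def relative_difference(pair: IndicatorPair) -> float:
    """Relative difference of the web-based estimate with respect to the survey estimate."""
    if pair.survey_value == 0:
        raise ValueError(f"survey value for {pair.name!r} is zero; relative difference undefined")
    return (pair.web_value - pair.survey_value) / pair.survey_value


def flag_discrepancies(pairs: list[IndicatorPair], tolerance: float = 0.10) -> list[str]:
    """Return the names of indicators whose absolute relative difference exceeds the tolerance.

    The 10% default is an arbitrary assumption, not a recommended threshold.
    """
    return [p.name for p in pairs if abs(relative_difference(p)) > tolerance]


# Invented example values, not real results.
pairs = [
    IndicatorPair("share_enterprises_with_website", survey_value=0.80, web_value=0.76),
    IndicatorPair("share_enterprises_with_webshop", survey_value=0.20, web_value=0.26),
]
flagged = flag_discrepancies(pairs)
print(flagged)
```

A flagged indicator does not say which source is at fault. As noted above, a discrepancy may point to issues in the sample data as well as in the web scraped data.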
Experience gained in the pilots of WP2 and WP3 will be used as input to WP4. To achieve a high degree of acceptance, feedback on the key planned deliverables will be collected from relevant ESS bodies. The target group for the deliverables is mainly people involved in the statistical production process in NSOs who work directly or indirectly with web data, or are considering doing so, and, more broadly, statisticians in academia and the non-ESS official statistics community with a research interest in this area.