
Advanced Python for Official Statistics

Magda CHMIEL • 29 January 2026

Course Leader

Christian Kauth

Target Group

Staff of statistical production units and methodologists of NSIs with at least basic to intermediate knowledge of Python.

Applicants can demonstrate this knowledge by having attended the course "Basic Python for Official Statistics" or by showing equivalent experience.

Entry Qualifications

  • Sound command of English for discussions and presentations

  • Basic to intermediate Python knowledge

  • Familiarity with Jupyter notebooks, VS Code and Git version control

  • Experience with data manipulation and basic data analysis

  • Participants who completed "Basic Python for Official Statistics" are well-prepared

Objective(s)

Master advanced Python techniques for robust, reproducible statistical production through software engineering practices, advanced analytics, and modern AI integration. By course end, participants will:

  • Write robust and efficient Python programs using advanced programming concepts

  • Design, structure, and document reproducible statistical workflows following modern software engineering practices

  • Handle large and complex datasets efficiently using advanced pandas, NumPy, Polars, and database connectors

  • Automate data ingestion from APIs, web sources, and databases, ensuring compliance with statistical standards and reproducibility

  • Apply advanced statistical and machine learning methods in Python for official statistics applications

  • Build MCP servers to expose Python functions to AI agents (GitHub Copilot, Claude Desktop)

  • Understand AI agent orchestration basics and how LLMs can automate workflows

  • Use geospatial tools to analyze regional statistics and create interactive choropleth maps

  • Create interactive visualizations with Plotly to support statistical dissemination

  • Develop applied projects relevant to their institution, strengthening capacity for Python-based production

  • Implement solutions with proper testing, documentation, and reproducible reporting

Contents

Building on Python fundamentals, this course equips NSI staff with advanced skills for statistical production. Participants learn software engineering practices, advanced data handling, and modern analytics, from machine learning to geospatial applications, and develop robust, reproducible solutions for complex challenges in official statistics.

 

The course emphasizes practical application through real statistical challenges, ensuring participants can immediately apply learned techniques to their institutional work.

 

Day 1: Advanced Python Programming & Software Engineering

Build reusable classes with OOP and dataclasses, create data transformation pipelines with functional programming and decorators, organize code into packages with proper dependencies, write test suites with pytest and ensure code quality with black/ruff.
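As a small taste of these patterns, the sketch below combines a dataclass with a validating decorator in a one-step transformation pipeline. The `Observation` schema, the `validated` check, and the numbers are purely illustrative, not course material:

```python
from dataclasses import dataclass
from functools import wraps

@dataclass
class Observation:
    """A single statistical record (hypothetical schema)."""
    region: str
    value: float

def validated(func):
    """Decorator that rejects implausible records before transformation."""
    @wraps(func)
    def wrapper(obs):
        if obs.value < 0:
            raise ValueError(f"negative value for {obs.region}")
        return func(obs)
    return wrapper

@validated
def to_thousands(obs):
    """One pipeline step: rescale the value to thousands."""
    return Observation(obs.region, obs.value / 1000)

result = to_thousands(Observation("DE", 82_000))
print(result.value)  # 82.0
```

In a real pipeline, several such decorated steps would be chained, and pytest would exercise both the happy path and the `ValueError` branch.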

 

Day 2: Advanced Data Handling for Official Statistics 

Process large datasets with advanced pandas techniques, explore modern tools like Polars for performance, automate data collection through web scraping (BeautifulSoup, Selenium), and integrate PostgreSQL databases with SQLAlchemy ORM.
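One of the chunking techniques referred to above can be sketched as follows; the inline CSV stands in for a large file on disk, and the column names are invented:

```python
import io
import pandas as pd

# Hypothetical extract; in production this would be a large CSV on disk.
raw = io.StringIO("region,population\nA,100\nB,250\nC,175\n")

# Reading with chunksize keeps memory use bounded: each chunk is a small
# DataFrame, and we accumulate a running total instead of loading all rows.
total = 0
for chunk in pd.read_csv(raw, chunksize=2):
    total += chunk["population"].sum()

print(total)  # 525
```

The same loop shape applies to any aggregation that can be computed incrementally; for non-incremental operations, tools like Polars or Parquet (also covered on this day) are often the better fit.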

 

Day 3: Statistical Computing, Machine Learning & AI Integration

Apply numerical computing with NumPy and statistical methods with statsmodels, build ML pipelines with scikit-learn and interpret models with SHAP. Explore AI integration: create MCP servers to expose Python functions to LLM agents and understand AI agent orchestration with Semantic Kernel. Develop interactive Plotly visualizations and animated geospatial choropleth maps with geopandas.
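A minimal scikit-learn pipeline of the kind covered on this day might look like the sketch below. The data is a toy, linearly separable example; real course material would use actual statistical microdata:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy feature/label pairs standing in for unit records (illustrative only).
X = np.array([[1.0], [2.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])

# A Pipeline chains preprocessing and the model into one estimator, so
# scaling parameters are learned only from the data passed to fit().
model = Pipeline([("scale", StandardScaler()),
                  ("clf", LogisticRegression())])
model.fit(X, y)
print(model.predict([[1.5], [10.5]]))  # [0 1]
```

Keeping preprocessing inside the pipeline is what makes the model reusable and leak-free; SHAP can then be applied to the fitted `clf` step for interpretation.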

 

Day 4: Reproducible Research & Project Launch

Morning: Generate reproducible reports with Quarto, automate workflows with papermill, followed by technical Q&A covering all topics. Afternoon: Reverse-pitch hackathon—pitch ambitious projects (MCP tools, AI agents, interactive dashboards, ML systems, geospatial apps), form expert teams, and begin implementation.

 

Day 5: Advanced Project Development & Technical Showcase

Morning: Intensive development with architecture coaching—build production-ready demos with testing and documentation. Afternoon: Technical presentations (live system demos, architecture highlights, design decisions), peer voting for innovation and production-readiness, and celebration of achievements.

Expected outcome

Participants gain both technical proficiency and strategic understanding to leverage Python for modern statistical production:

 

Technical Skills:

  • Build reusable classes with OOP, dataclasses, and Pydantic for data validation

  • Create custom decorators, generators, and higher-order functions for advanced patterns

  • Organize code into packages with proper dependencies (pyproject.toml)

  • Write test suites with pytest (fixtures, parametrize, mocking) and ensure code quality with black/ruff

  • Process large datasets with chunking and optimized dtypes in pandas

  • Compare and use modern data tools (Polars, Parquet) for performance improvements

  • Scrape statistical websites with BeautifulSoup and automate JavaScript pages with Selenium

  • Connect to PostgreSQL databases with SQLAlchemy ORM and write queries

  • Apply NumPy for array operations and statsmodels for regression and time series

  • Build machine learning pipelines with scikit-learn and interpret models with SHAP

  • Create simple MCP servers to expose Python functions to AI agents (Claude Desktop, GitHub Copilot)

  • Understand AI agent orchestration basics with Semantic Kernel

  • Develop interactive visualizations with Plotly (dashboards, animated choropleth maps)

  • Perform geospatial analysis with geopandas (spatial joins, buffers, choropleth maps)

  • Generate reproducible reports with Quarto and automate notebook execution with papermill
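As one small illustration of the NumPy array operations listed above, a vectorized computation of period-on-period growth rates; the index values and variable names are made up:

```python
import numpy as np

# Hypothetical monthly index values (illustrative only).
values = np.array([100.0, 102.0, 99.0, 105.0])

# Vectorized growth rates in percent: np.diff gives consecutive
# differences, divided element-wise by the previous period's value.
growth = np.diff(values) / values[:-1] * 100

print(np.round(growth, 2))  # approx. 2.0, -2.94, 6.06
```

The same one-liner replaces an explicit Python loop over periods, which is the core habit the NumPy sessions aim to build.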

 

Practical Capabilities:

  • Evaluate when to use advanced pandas vs Polars for specific tasks

  • Understand tradeoffs between SQL (PostgreSQL) and NoSQL approaches

  • Assess how MCP servers and AI agents can enhance statistical workflows

  • Debug code with pdb and implement structured logging

  • Design and implement hackathon projects applying Day 1-3 technologies
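The structured-logging point above can be sketched in a few lines. The logger name `ingest`, the format string, and the record count are all hypothetical; the buffer only makes the output easy to inspect, where production code would log to a file or stderr:

```python
import io
import logging

# Capture log output in a buffer so the record format is visible here.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

log = logging.getLogger("ingest")  # hypothetical pipeline component
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("loaded %d records", 1250)  # illustrative count
print(buffer.getvalue().strip())  # INFO ingest loaded 1250 records
```

Consistent, machine-parsable log lines like this are what make a production run auditable after the fact.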

 

Professional Development:

  • Network with peers from other NSIs facing similar statistical production challenges

  • Present technical solutions through live demos and architecture explanations

  • Collaborate in teams using modern development practices

  • Build a foundation for continued learning in advanced Python topics

Training Methods

  • Interactive lectures with live coding demonstrations
    (Days 1-3: ~40%)

  • Hands-on coding exercises and labs embedded in theory sessions
    (Days 1-3: ~20%)

  • Hackathon-style collaborative project work
    (Days 4-5: ~30%)

  • Technical presentations and peer learning
    (Day 5: ~10%)


 

Required Reading

None

Suggested Reading

Required Preparation

Software to install (detailed instructions provided during course)

 

Free accounts to create

 

Recommended skills (not required but helpful)

  • Basic Git/GitHub for collaboration (clone, commit, push, pull); see suggested reading above

  • Teams will use GitHub for hackathon projects, so familiarity with basic version control is beneficial

 

Project preparation (critical for 2-day hackathon)

  • Identify an advanced statistical challenge from your institution that would benefit from Python automation

  • Consider projects suitable for AI integration: MCP-powered data access, agent-orchestrated workflows, ML classification, interactive dashboards, geospatial analysis, ETL pipelines, or whichever topic-related challenge you face

  • Prepare datasets or secure data access (APIs, databases, files) for your project

  • Think about technical architecture and which Day 1-3 technologies could help solve your challenge

  • The reverse-pitch format means YOU propose ambitious projects, so come prepared with ideas that excite you!

Trainer(s)/Lecturer(s)

Christian Kauth (Independent expert)

 

Practical Information

Start date: 22 June 2026

End date: 26 June 2026

Duration: 5 days

Where: ICON-INSTITUT Public Sector GmbH

Address: Von-Groote-Str. 28, 50968 Cologne, Germany

Application via: National Contact Point

Deadline for application: 22/04/2026