
Advanced Python for Official Statistics

Magda CHMIEL • 29 January 2026

Course Leader

Christian Kauth

Target Group

Staff of statistical production units and methodologists of NSIs with at least basic to intermediate knowledge of Python.

Applicants can demonstrate this knowledge by having attended the course "Basic Python for Official Statistics" or by showing equivalent experience.

Entry Qualifications

  • Sound command of English for discussions and presentations

  • Basic to intermediate Python knowledge

  • Familiarity with Jupyter notebooks, VS Code and Git version control

  • Experience with data manipulation and basic data analysis

  • Participants who completed "Basic Python for Official Statistics" are well-prepared

Objective(s)

Master advanced Python techniques for robust, reproducible statistical production through software engineering practices, advanced analytics, and modern AI integration. By course end, participants will:

  • Write robust and efficient Python programs using advanced programming concepts

  • Design, structure, and document reproducible statistical workflows following modern software engineering practices

  • Handle large and complex datasets efficiently using advanced pandas, NumPy, Polars, and database connectors

  • Automate data ingestion from APIs, web sources, and databases, ensuring compliance with statistical standards and reproducibility

  • Apply advanced statistical and machine learning methods in Python for official statistics applications

  • Build MCP servers to expose Python functions to AI agents (GitHub Copilot, Claude Desktop)

  • Understand AI agent orchestration basics and how LLMs can automate workflows

  • Use geospatial tools to analyze regional statistics and create interactive choropleth maps

  • Create interactive visualizations with Plotly to support statistical dissemination

  • Develop applied projects relevant to their institution, strengthening capacity for Python-based production

  • Implement solutions with proper testing, documentation, and reproducible reporting

Contents

Building on Python fundamentals, this course equips NSI staff with advanced skills for statistical production. Participants learn software engineering practices, advanced data handling, and modern analytics, from machine learning to geospatial applications, and develop robust, reproducible solutions for complex challenges in official statistics.

 

The course emphasizes practical application through real statistical challenges, ensuring participants can immediately apply learned techniques to their institutional work.

 

Day 1: Advanced Python Programming & Software Engineering

Build reusable classes with OOP and dataclasses, create data transformation pipelines with functional programming and decorators, organize code into packages with proper dependencies, write test suites with pytest and ensure code quality with black/ruff.
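As a small taste of these patterns, the sketch below combines a dataclass with a validating decorator in a one-step transformation pipeline. The `Observation` schema, the `validated` check, and the numbers are purely illustrative, not course material:

```python
from dataclasses import dataclass
from functools import wraps

@dataclass
class Observation:
    """A single statistical record (hypothetical schema)."""
    region: str
    value: float

def validated(func):
    """Decorator that rejects implausible records before transformation."""
    @wraps(func)
    def wrapper(obs):
        if obs.value < 0:
            raise ValueError(f"negative value for {obs.region}")
        return func(obs)
    return wrapper

@validated
def to_thousands(obs):
    """One pipeline step: rescale the value to thousands."""
    return Observation(obs.region, obs.value / 1000)

result = to_thousands(Observation("DE", 82_000))
print(result.value)  # 82.0
```

In a real pipeline, several such decorated steps would be chained, and pytest would exercise both the happy path and the `ValueError` branch.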

 

Day 2: Advanced Data Handling for Official Statistics 

Process large datasets with advanced pandas techniques, explore modern tools like Polars for performance, automate data collection through web scraping (BeautifulSoup, Selenium), and integrate PostgreSQL databases with SQLAlchemy ORM.
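One of the chunking techniques referred to above can be sketched as follows; the inline CSV stands in for a large file on disk, and the column names are invented:

```python
import io
import pandas as pd

# Hypothetical extract; in production this would be a large CSV on disk.
raw = io.StringIO("region,population\nA,100\nB,250\nC,175\n")

# Reading with chunksize keeps memory use bounded: each chunk is a small
# DataFrame, and we accumulate a running total instead of loading all rows.
total = 0
for chunk in pd.read_csv(raw, chunksize=2):
    total += chunk["population"].sum()

print(total)  # 525
```

The same loop shape applies to any aggregation that can be computed incrementally; for non-incremental operations, tools like Polars or Parquet (also covered on this day) are often the better fit.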

 

Day 3: Statistical Computing, Machine Learning & AI Integration

Apply numerical computing with NumPy and statistical methods with statsmodels, build ML pipelines with scikit-learn and interpret models with SHAP. Explore AI integration: create MCP servers to expose Python functions to LLM agents and understand AI agent orchestration with Semantic Kernel. Develop interactive Plotly visualizations and animated geospatial choropleth maps with geopandas.
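A minimal scikit-learn pipeline of the kind covered on this day might look like the sketch below. The data is a toy, linearly separable example; real course material would use actual statistical microdata:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy feature/label pairs standing in for unit records (illustrative only).
X = np.array([[1.0], [2.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])

# A Pipeline chains preprocessing and the model into one estimator, so
# scaling parameters are learned only from the data passed to fit().
model = Pipeline([("scale", StandardScaler()),
                  ("clf", LogisticRegression())])
model.fit(X, y)
print(model.predict([[1.5], [10.5]]))  # [0 1]
```

Keeping preprocessing inside the pipeline is what makes the model reusable and leak-free; SHAP can then be applied to the fitted `clf` step for interpretation.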

 

Day 4: Reproducible Research & Project Launch

Morning: Generate reproducible reports with Quarto, automate workflows with papermill, followed by technical Q&A covering all topics. Afternoon: Reverse-pitch hackathon—pitch ambitious projects (MCP tools, AI agents, interactive dashboards, ML systems, geospatial apps), form expert teams, and begin implementation.

 

Day 5: Advanced Project Development & Technical Showcase

Morning: Intensive development with architecture coaching—build production-ready demos with testing and documentation. Afternoon: Technical presentations (live system demos, architecture highlights, design decisions), peer voting for innovation and production-readiness, and celebration of achievements.

Expected outcome

Participants gain both technical proficiency and strategic understanding to leverage Python for modern statistical production:

 

Technical Skills:

  • Build reusable classes with OOP, dataclasses, and Pydantic for data validation

  • Create custom decorators, generators, and higher-order functions for advanced patterns

  • Organize code into packages with proper dependencies (pyproject.toml)

  • Write test suites with pytest (fixtures, parametrize, mocking) and ensure code quality with black/ruff

  • Process large datasets with chunking and optimized dtypes in pandas

  • Compare and use modern data tools (Polars, Parquet) for performance improvements

  • Scrape statistical websites with BeautifulSoup and automate JavaScript pages with Selenium

  • Connect to PostgreSQL databases with SQLAlchemy ORM and write queries

  • Apply NumPy for array operations and statsmodels for regression and time series

  • Build machine learning pipelines with scikit-learn and interpret models with SHAP

  • Create simple MCP servers to expose Python functions to AI agents (Claude Desktop, GitHub Copilot)

  • Understand AI agent orchestration basics with Semantic Kernel

  • Develop interactive visualizations with Plotly (dashboards, animated choropleth maps)

  • Perform geospatial analysis with geopandas (spatial joins, buffers, choropleth maps)

  • Generate reproducible reports with Quarto and automate notebook execution with papermill
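As one small illustration of the NumPy array operations listed above, a vectorized computation of period-on-period growth rates; the index values and variable names are made up:

```python
import numpy as np

# Hypothetical monthly index values (illustrative only).
values = np.array([100.0, 102.0, 99.0, 105.0])

# Vectorized growth rates in percent: np.diff gives consecutive
# differences, divided element-wise by the previous period's value.
growth = np.diff(values) / values[:-1] * 100

print(np.round(growth, 2))  # approx. 2.0, -2.94, 6.06
```

The same one-liner replaces an explicit Python loop over periods, which is the core habit the NumPy sessions aim to build.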

 

Practical Capabilities:

  • Evaluate when to use advanced pandas vs Polars for specific tasks

  • Understand tradeoffs between SQL (PostgreSQL) and NoSQL approaches

  • Assess how MCP servers and AI agents can enhance statistical workflows

  • Debug code with pdb and implement structured logging

  • Design and implement hackathon projects applying Day 1-3 technologies
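The structured-logging point above can be sketched in a few lines. The logger name `ingest`, the format string, and the record count are all hypothetical; the buffer only makes the output easy to inspect, where production code would log to a file or stderr:

```python
import io
import logging

# Capture log output in a buffer so the record format is visible here.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

log = logging.getLogger("ingest")  # hypothetical pipeline component
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("loaded %d records", 1250)  # illustrative count
print(buffer.getvalue().strip())  # INFO ingest loaded 1250 records
```

Consistent, machine-parsable log lines like this are what make a production run auditable after the fact.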

 

Professional Development:

  • Network with peers from other NSIs facing similar statistical production challenges

  • Present technical solutions through live demos and architecture explanations

  • Collaborate in teams using modern development practices

  • Build a foundation for continued learning in advanced Python topics

Training Methods

  • Interactive lectures with live coding demonstrations
    (Days 1-3: ~40%)

  • Hands-on coding exercises and labs embedded in theory sessions
    (Days 1-3: ~20%)

  • Hackathon-style collaborative project work
    (Days 4-5: ~30%)

  • Technical presentations and peer learning
    (Day 5: ~10%)


 

Required Reading

None

Suggested Reading

Required Preparation

Software to install (detailed instructions provided during course)

 

Free accounts to create

 

Recommended skills (not required but helpful)

  • Basic Git/GitHub for collaboration (clone, commit, push, pull); see suggested reading above

  • Teams will use GitHub for hackathon projects, so familiarity with basic version control is beneficial

 

Project preparation (critical for 2-day hackathon)

  • Identify an advanced statistical challenge from your institution that would benefit from Python automation

  • Consider projects suitable for AI integration: MCP-powered data access, agent-orchestrated workflows, ML classification, interactive dashboards, geospatial analysis, ETL pipelines, or whichever topic-related challenge you face

  • Prepare datasets or secure data access (APIs, databases, files) for your project

  • Think about technical architecture and which Day 1-3 technologies could help solve your challenge

  • The reverse-pitch format means YOU propose ambitious projects, so come prepared with ideas that excite you!

Trainer(s)/Lecturer(s)

Christian Kauth (Independent expert)

 

Practical Information

Start date: 22 June 2026

End date: 26 June 2026

Duration: 5 days

Where: ICON-INSTITUT Public Sector GmbH

Address: Von-Groote-Str. 28, 50968 Cologne, Germany

Application via: National Contact Point

Deadline for application: 22/04/2026