Transitions in the age of biomedical AI

FREIBURG | GERMANY

27.09. - 01.10.26

Pre-Conference Courses

The pre-conference courses offer interactive, hands-on training across topics in biostatistics, bioinformatics, epidemiology, and medical informatics, including cross-cutting themes related to biomedical AI. They will take place on Sunday, September 27, and can be booked through the registration system.

Beyond “Randomized”: Designing and Evaluating Randomization in Clinical Trials

Half-Day

Course Instructor(s):

Stefanie Schoenen
RWTH Aachen, Institute of Medical Statistics, Germany
Daniel Bodden
RWTH Aachen, Institute of Medical Statistics, Germany
Ralf-Dieter Hilgers
Sigmund Freud Private University Vienna, Austria
Franz König
Medical University of Vienna, Austria

General Outline and Main Topics

Randomized controlled trials are widely regarded as the gold standard in clinical research because randomization and allocation concealment together protect against multiple sources of bias. Randomization aims to prevent systematic differences between treatment groups, while concealment protects assignments from foreknowledge and manipulation during enrolment. In practice, however, the label “randomized” is often treated as a guarantee of validity while the implementation itself is overlooked. This is particularly problematic in open-label and increasingly complex trial designs, where randomization interacts with design elements such as interim analyses. This course provides a comprehensive overview of randomization in clinical trials. It begins with a historical introduction and clarifies what randomization can (and cannot) achieve. Participants are introduced to different types of randomization procedures, including their strengths, limitations, and the fundamental trade-off between predictability and imbalance. A practical session demonstrates a Shiny app for evaluating and visualizing the impact of bias on inference, along with the R package randomizeR and the Evaluation of Randomization procedures for Design Optimization (ERDO) framework for simulation-based planning and generation of randomization lists. The course concludes by emphasizing that labeling a study “randomized” does not guarantee valid results and provides guidance on selecting appropriate randomization procedures and addressing challenges in complex trial designs.
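The trade-off between balance and predictability can be seen in a minimal permuted-block sketch. This is plain Python for compactness rather than the course's R tooling, and the function below is illustrative only, not the randomizeR API:

```python
import random

def permuted_block_randomization(n_subjects, block_size, seed=None):
    """Generate a 1:1 two-arm allocation list using permuted blocks.

    Small blocks keep group sizes tightly balanced but make the last
    assignments of each block predictable; larger blocks reduce
    predictability at the cost of transient imbalance.
    `block_size` must be even for 1:1 allocation.
    """
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_subjects:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_subjects]

allocation = permuted_block_randomization(12, block_size=4, seed=1)
# Within every complete block of 4, the two arms are exactly balanced.
```

With block size 4, every completed block is exactly balanced, but once three assignments of a block are known the fourth is fully predictable, which is precisely the tension the course examines.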

Learning Objectives

Participants will learn about the history and rationale of randomization in clinical trials, different randomization procedures with their strengths and limitations, and how to generate randomization lists. The course also covers how randomization affects evidence reliability, common pitfalls in randomization, and practical criteria for selecting an appropriate randomization procedure for a given study design.

Format

A concise introduction to randomization theory and reporting standards, followed by a practical hands-on demonstration of randomization tools in R, and ending with a summary discussion of remaining challenges.

Appeal to Attendees

Randomized — but truly unbiased? Simply labeling a trial “randomized” does not guarantee valid results. As clinical trial design becomes more complex, the choice, implementation, and reporting of randomization become crucial. This session introduces the fundamentals of randomization, including a historical perspective, reviews different randomization procedures, and discusses how these choices affect predictability, balance, vulnerability to bias, and ultimately the level of evidence. Combining methodological insight with practical R-based tools (including a Shiny app), the session shows how to select randomization approaches that support more robust clinical trials.

Course Requirements

Basic knowledge of randomized clinical trials and access to a digital device.

Causal Inference for Time-to-Event Outcomes – With Practical Applications in R

Full-Day

Course Instructor(s):

Ruth Keogh
London School of Hygiene & Tropical Medicine, United Kingdom
Jon Michael Gran
University of Oslo, Oslo Centre for Biostatistics and Epidemiology, Norway

General Outline and Main Topics

Recent years have seen major developments in methods for causal inference using observational data. However, the practical application of the methods is challenging and lags behind methodological developments. This is especially true in the context of survival and other time-to-event outcomes, which are commonly of interest in applications in biostatistics and data science. This course will provide training on concepts and methods for estimating causal effects of treatments on time-to-event outcomes. It will begin with an introduction to causal estimands and assumptions required for their identification using observational data. Afterwards, the course will cover estimation methods for confounding adjustment, including inverse probability weighting, marginal structural models, g-formula, and censoring-weighting approaches. The initial focus will be on treatments given at a single time point, before extending to time-varying treatment strategies. Methods that incorporate machine learning techniques will be included, and extensions to settings with competing events will also be discussed. The course material will be presented with medical and epidemiological applications in mind, but the methods are equally relevant in other areas, such as social science and economics. The course will combine lectures and computer-practical sessions using openly accessible data sets.
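To give a flavour of the weighting approaches covered, here is a minimal sketch of an inverse-probability-weighted Kaplan-Meier estimate for a point treatment. It is written in Python for compactness (the course itself uses R), and the propensity scores are assumed to come from a separately fitted model:

```python
def ipw_weighted_km(records, arm, t_max):
    """Inverse-probability-weighted Kaplan-Meier survival estimate.

    `records`: (treated, time, event, propensity) tuples, where
    `propensity` estimates P(treated | confounders) from some earlier
    model. Weighting treated subjects by 1/ps and untreated subjects
    by 1/(1 - ps) emulates a trial in which treatment is independent
    of the measured confounders.
    """
    weighted = []
    for treated, time, event, ps in records:
        if treated != arm:
            continue
        w = 1.0 / ps if treated else 1.0 / (1.0 - ps)
        weighted.append((time, event, w))
    surv = 1.0
    for t in sorted({time for time, _, _ in weighted}):
        if t > t_max:
            break
        # Weighted risk set and weighted event count at time t.
        at_risk = sum(w for time, _, w in weighted if time >= t)
        events = sum(w for time, e, w in weighted if time == t and e)
        if at_risk > 0:
            surv *= 1.0 - events / at_risk
    return surv
```

With equal weights this reduces to the ordinary Kaplan-Meier estimator; unequal weights up-weight under-represented confounder patterns, which is the confounding-adjustment idea the course develops in detail.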

Learning Objectives

Select causal estimands suitable for time-to-event outcomes, including when there are competing events.

Apply methods for estimating the effects of point treatments and longitudinal treatment strategies in R, including inverse probability weighting, g-formula and doubly-robust methods using machine learning techniques.

Understand the assumptions, advantages and disadvantages of different estimation methods.

Format

The course will comprise interactive lectures alternating with practical hands-on computer segments, each supported by guided exercise sheets and solution discussions.

Appeal to Attendees

This course is aimed at researchers and students in biostatistics, epidemiology and data science. Causal inference methods are increasingly required in these fields, with recent developments emphasising the use of machine learning techniques. Time-to-event outcomes arise very commonly, yet focused training on this setting has been lacking. The course will be of interest to participants wishing to gain a comprehensive overview of methods in this area and to those focused on applications.

Course Requirements

Some knowledge of methods for survival analysis will be assumed, such as estimation of survival curves using the Kaplan-Meier estimator and fitting regression models. Knowledge of basic causal concepts such as confounding would be beneficial. Participants should have prior experience of using R, though not necessarily for time-to-event analysis or causal inference. In advance of the course participants will be provided with background reading on some fundamental methods for time-to-event analysis and on key causal concepts. We will make available a brief online tutorial enabling participants to familiarise themselves with the main functions for survival analysis in R.

Clinical Text Mining Under Real-World Constraints

Half-Day

Course Instructor(s):

Vittorio Torri
MOX – Modelling and Scientific Computing Lab, Department of Mathematics, Politecnico di Milano, Italy
Francesca Ieva
MOX – Modelling and Scientific Computing Lab, Department of Mathematics, Politecnico di Milano, Italy
HDS – Health Data Science Centre, Human Technopole, Italy

General Outline and Main Topics

Statistical and machine learning models for healthcare analytics increasingly rely on extracting structured variables from unstructured clinical documents such as discharge summaries and clinical notes. While Natural Language Processing (NLP) has advanced rapidly in recent years, its application to real-world medical data faces substantial domain-specific challenges. The limited availability of gold-standard annotated data, due to high annotation costs, and the constrained computational resources in hospital environments restrict the applicability of many state-of-the-art NLP models. Moreover, privacy regulations can complicate data sharing and large-scale model development. These constraints create a growing need for methodologies enabling reliable information extraction under realistic resource and data constraints. This course provides a practical overview of such techniques. The course will cover robust design and validation of rule-based approaches, unsupervised representation learning and clustering, weakly supervised classification techniques, and data augmentation methods based on Large Language Models (LLMs). For each topic, we will introduce the methodological foundations, present a real-world case study where we applied the approach, and provide a hands-on programming exercise. Exercises will use publicly available clinical text resources in English, including documents from the European Clinical Case Corpus (E3C) and from a dataset released for the CRF Filling Shared Task at CL4Health – LREC 2026 (FBK-CRF).
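To give a flavour of the rule-based strand, the sketch below extracts a single structured variable (smoking status) with regular expressions and a crude negation check. The patterns and labels are invented for illustration and are far simpler than the validated rule sets discussed in the course:

```python
import re

# Hypothetical rule-based extractor for smoking status from free text.
NEGATION = re.compile(r"\b(denies|no|never|non)[- ]?smok", re.IGNORECASE)
MENTION = re.compile(r"\bsmok(er|ing|es)\b", re.IGNORECASE)

def smoking_status(note):
    """Return 'smoker', 'non-smoker', or 'unknown' for a clinical note.

    Negation patterns are checked first, since a bare keyword match
    would misclassify phrases such as 'denies smoking'.
    """
    if NEGATION.search(note):
        return "non-smoker"
    if MENTION.search(note):
        return "smoker"
    return "unknown"
```

Even a toy extractor like this would, in practice, be validated against a small gold-standard annotated sample with documented guidelines and inter-annotator agreement, which is exactly the workflow the course teaches.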

Learning Objectives

By the end of the course, participants will understand the main methodological options for extracting structured information from clinical text when labelled data is limited. They will learn how to select and implement workflows ranging from rule-based systems to large language models, based on the problem, the availability of labelled and unlabelled data, and computational resources. They will also learn how to evaluate these methods using small but representative labelled datasets, including the definition of annotation guidelines and the measurement of inter-annotator agreement.

Format

Four technical sessions: rule-based methods, clustering, weak supervision, and LLM-based data augmentation, framed by a brief introduction, a short break, and a concluding Q&A.

Appeal to Attendees

The course directly addresses the growing need to integrate unstructured clinical data into statistical and epidemiological analyses. For ISCB attendees, it provides practical tools for cohort identification, phenotype classification, and extraction of structured variables usable in standard regression and survival models. For GMDS attendees, it is particularly relevant because it addresses clinical data integration challenges and practical deployment constraints that are common in medical informatics.

Course Requirements

Basic knowledge of programming (preferably Python), statistics, and machine learning. Participants should bring a laptop with internet access and a Google account to run Colab Notebooks.

Efficient Bayesian Analysis of Longitudinal and Survival Data Using R-INLA

Half-Day

Course Instructor(s):

Denis Rustand
University of Bordeaux, National Institute of Health and Medical Research, France

General Outline and Main Topics

Modern biostatistical research is increasingly characterized by the analysis of complex, high-dimensional data from sources like electronic health records and large observational studies. While joint models provide a powerful framework for these data, their application has been limited by the prohibitive computational cost of traditional methods like MCMC. This course introduces an efficient computational framework: Integrated Nested Laplace Approximations (INLA). As a highly efficient alternative for Bayesian inference, INLA provides results of comparable or superior accuracy to MCMC but reduces computation time from hours to seconds or minutes. The content is structured around the comprehensive framework of the book “Bayesian survival, longitudinal and joint models with INLA” (Chapman & Hall/CRC, May 2026). It showcases the framework’s flexibility, enabling participants to implement multivariate joint models by combining various longitudinal distributions and handling complex survival scenarios like competing risks and multi-state models.
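The computational idea underlying INLA is the Laplace approximation: replace an intractable integral by a Gaussian approximation around the mode of its log-integrand. Below is a toy one-dimensional illustration of that building block, not the INLA algorithm itself:

```python
import math

def laplace_integral(f, fpp, mode):
    """Laplace approximation to the integral of exp(f(x)):
    ∫ exp(f(x)) dx ≈ exp(f(m)) * sqrt(2π / |f''(m)|) at mode m."""
    return math.exp(f(mode)) * math.sqrt(2 * math.pi / abs(fpp(mode)))

# Toy example: n! = ∫ x^n e^{-x} dx. The log-integrand
# f(x) = n*log(x) - x peaks at x = n with f''(n) = -1/n, so the
# Laplace approximation reproduces Stirling's formula.
n = 10
approx = laplace_integral(lambda x: n * math.log(x) - x,
                          lambda x: -n / x**2,
                          mode=n)
exact = math.factorial(n)
# For n = 10 the approximation is within roughly 1% of the exact value.
```

INLA nests such approximations over the latent Gaussian field of the model, which is why it can replace hours of MCMC sampling with seconds of numerical optimization.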

Learning Objectives

Upon completion of this course, participants will be able to understand the fundamental principles of the INLA methodology and its advantages for biostatistical modeling. They will learn to formulate and fit a wide range of advanced Bayesian models for complex longitudinal, survival, and joint data structures. Participants will also be able to implement these models efficiently using the user-friendly INLAjoint R package. Finally, they will learn to generate and interpret dynamic predictions for personalized risk assessment, applying the fitted models for practical inference.

Format

Three sequential modules: an introductory overview of joint modeling and INLA, a hands-on session building univariate and then multivariate joint models, and a final segment on dynamic prediction, visualization, and a Q&A.

Appeal to Attendees

This course is highly relevant for anyone analyzing complex observational studies or modern clinical trials with longitudinal biomarkers. It addresses the growing need to answer complex scientific questions where traditional methods may fail due to computational costs. By introducing a method that scales to large data and reduces computation time from hours to minutes, it makes joint modeling accessible for challenging datasets such as registries. The free, open-source nature of the software and forthcoming book makes it accessible to researchers from any region.

Course Requirements

This is an intermediate level course. Participants are expected to have a background in applied statistics and R programming. Specifically, they should possess a solid understanding of mixed-effects models, prior experience with survival analysis (proportional hazards), and basic experience using R. No prior experience with Bayesian statistics or INLA is required. Participants must bring a personal laptop with R installed to participate in the hands-on exercises.

Genomic AI-Agents for Personal Genomics: Hands-On Analysis With Just-DNA-Lite and LLM Agents

Half-Day

Course Instructor(s):

Anton Kulaga
IBIMA, Institute for Biostatistics and Informatics in Medicine and Ageing Research, University Medical Centre Rostock, Germany
Institute of Biochemistry of the Romanian Academy (IBAR), Romania
Lucian Zaharia
HEALES (Healthy Life Extension Society), Belgium
Nikolay Usanov
HEALES (Healthy Life Extension Society), Belgium

General Outline and Main Topics

This course teaches participants to analyze genomes using open-source tools and AI-agents. We cover the pipeline from FASTQ through alignment and variant calling to VCF in theory, then focus hands-on on annotation and reporting, since alignment takes hours. Participants run Just-DNA-Lite locally to produce disease-risk and polygenic risk score (PRS) reports, filter annotations, then use AI-agents to build a new annotation module from scientific literature and apply it to real genomic data. Longevity genomics serves as the running example. If time permits, we will demonstrate variant effect prediction with genomic foundation models.

Learning Objectives

Participants will understand the genomic pipeline from FASTQ through VCF to annotated reports (theory only). They will run Just-DNA-Lite locally to produce disease-risk and polygenic risk score (PRS) reports, and filter and interpret the annotation outputs. They will further use AI-agents to create a new annotation module from published literature and polygenic risk scores, and finally apply that module to genomic data and critically evaluate the results.

Format

Five workshop sessions: a brief genomics background overview, a hands-on Just-DNA-Lite walkthrough, building a custom module with AI-agents, applying that module to real data, and an optional preview of variant effect prediction with foundation models.

Appeal to Attendees

The conference theme “Big Data, Small Data, Your Data” fits exactly: Participants will analyze their own genomic data using large-scale databases and AI. The course bridges the fields of biostatistics (PRS methodology), medical informatics (clinical variants), and bioinformatics (annotation pipelines). Participants do not just run pre-built analyses — they use AI-agents to create a new module from literature and test it on real genomes. All software is open-source.

Course Requirements

Researchers and students with basic biology knowledge and IT skills. A life-science background is not required — IT expertise alone suffices. No AI/LLM experience is needed. Participants should bring a laptop (Linux, macOS, or Windows with WSL) with Python 3.12+ and uv pre-installed. Those with sequenced genomes can bring their VCF file.

Performing Comparative Effectiveness Research Using Network Meta-Analysis: An Introduction to Theoretical and Practical Advances

Full-Day

Course Instructor(s):

Theodoros Evrenoglou
Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, Germany
Anna Chaimani
University of Oslo, Oslo Center for Biostatistics and Epidemiology, Department of Biostatistics, Norway
Guido Schwarzer
Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, Germany

General Outline and Main Topics

Network meta-analysis (NMA) is a comparative effectiveness research (CER) statistical method that enables the synthesis of evidence from multiple studies and the simultaneous comparison of multiple treatments for a given condition. By combining direct and indirect evidence, NMA can inform all possible treatment comparisons, including those for which head-to-head studies are unavailable. To date, NMA has become a standard tool in CER and is widely adopted by guideline developers, stakeholders, and major organizations such as the World Health Organization and the National Institute for Health and Care Excellence. Well-documented benefits of NMA include increased precision and reliability of treatment effect estimates, as well as the ability to translate evidence across multiple treatments into a single, coherent ranking. Recent methodological advances have extended NMA to more complex settings, including multiple-component interventions, multiple outcomes, and rare events, which commonly arise in the analysis of safety outcomes.
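The central idea of combining direct and indirect evidence can be shown with a Bucher-style indirect comparison through a common comparator. This is a minimal Python sketch for intuition; the course practicals use R:

```python
import math

def indirect_comparison(d_AB, se_AB, d_CB, se_CB):
    """Bucher-style indirect comparison of A vs C via common comparator B.

    Under the consistency assumption, d_AC = d_AB - d_CB, and with
    independent trials the variances of the two estimates add.
    """
    d_AC = d_AB - d_CB
    se_AC = math.sqrt(se_AB**2 + se_CB**2)
    return d_AC, se_AC

# Toy log-odds-ratio example: A vs B = -0.5 (SE 0.2), C vs B = -0.2 (SE 0.15)
d, se = indirect_comparison(-0.5, 0.2, -0.2, 0.15)
# d = -0.3, se = sqrt(0.04 + 0.0225) = 0.25
```

Full NMA generalizes this two-step logic to an entire network of trials at once, pooling all direct and indirect paths between every pair of treatments.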

Learning Objectives

The aim of this course is to introduce both basic and advanced topics in NMA from methodological and applied perspectives. The theoretical part of the course will cover the foundations of several NMA approaches and illustrate their use and interpretation through real-world applications across diverse medical fields. In parallel, a series of interactive practical sessions will provide participants with hands-on experience in applying NMA methods and familiarity with the R packages meta, netmeta, and mvnma.

Format

Interactive lectures on basic and advanced NMA concepts, each paired with hands-on labs, plus a demonstration of NMAstudio and AI-supported tools.

Appeal to Attendees

By the end of the course, participants will have developed a solid understanding of the theoretical foundations of NMA, together with familiarity with recent methodological developments and tools. Through a combination of theory and hands-on practice, participants will acquire skills that enable them to critically assess, implement, and interpret NMA in applied settings. These competencies will allow participants to contribute effectively as statistical experts to clinical and public health applications, supporting evidence-based decision-making and health technology assessment.

Course Requirements

This course is designed for an interdisciplinary audience, focusing on students and early career researchers with an understanding of biostatistics, as well as health professionals, policymakers, and epidemiologists interested in integrating evidence synthesis methods into decision-making. To fully engage with the practical sessions, participants are expected to have some experience with R.

Prediction Under Interventions

Half-Day

Course Instructor(s):

Nan van Geloven
Leiden University Medical Center, The Netherlands
Karla Diaz Ordaz
University College London, United Kingdom
Wouter van Amsterdam
Utrecht University Medical Center, The Netherlands
Doranne Thomassen
Leiden University Medical Center, The Netherlands

General Outline and Main Topics

A key aim of prediction algorithms is to provide individualized risk estimates that support end users in making decisions on interventions, for example personalizing a medical treatment. Most prediction models are, however, derived from observational training data in which some individuals already receive the interventions the model aims to inform. This makes standard prediction models unfit to provide actionable information. In this half-day course we focus on methods for developing and evaluating predictions under interventions. These are estimates of risk under the specific intervention options that users need to decide on, and can be obtained by combining prediction methods with causal inference techniques.
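The basic mechanics can be sketched as follows: a model that includes treatment as a covariate is queried by setting treatment to each option of interest rather than plugging in its observed value. The coefficients below are hypothetical; estimating them validly from observational data is exactly what the causal techniques in this course address:

```python
import math

# Hypothetical coefficients of a fitted logistic risk model that
# includes treatment as a covariate (with appropriate confounding
# control, which this sketch takes for granted).
INTERCEPT, B_AGE, B_TREAT = -4.0, 0.05, -0.8

def risk_under_intervention(age, treat):
    """Predicted P(outcome | age, treatment set to `treat`) under the
    toy model: treatment is *set*, not read from the data."""
    lp = INTERCEPT + B_AGE * age + B_TREAT * treat
    return 1.0 / (1.0 + math.exp(-lp))

# For a 60-year-old, compare predicted risk under each option:
risk_untreated = risk_under_intervention(60, treat=0)
risk_treated = risk_under_intervention(60, treat=1)
```

Presenting both counterfactual risks, rather than a single risk that silently mixes treated and untreated individuals, is what makes such predictions actionable for the decision at hand.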

Learning Objectives

Participants will learn how to identify the pitfalls of relying on regular predictions for decision support, implement methods for developing interventional prediction algorithms, and apply techniques for evaluating the counterfactual performance of those predictions.

Format

Two lectures introducing causal blind spots and interventional prediction methods, each paired with hands-on R exercises for training regular and causal models, followed by a break and a final lecture plus exercise on evaluating counterfactual performance.

Appeal to Attendees

Bridging the gap between predictive modeling and personalized medicine is a shared challenge for bioinformaticians, biostatisticians, and epidemiologists. This course explores the essential intersection of machine learning and causal inference. As such, it provides an excellent kick-off for further cross-disciplinary exchange among the diverse communities attending ISCB GMDS 2026.

Course Requirements

We assume participants are versed in regular prediction techniques: training a regression or machine learning model and assessing its performance with measures such as RMSE or AUC. We also assume familiarity with R (though code for the practical exercises will be provided). Only basic familiarity with causal inference concepts such as confounding is assumed; other necessary concepts will be introduced.

The Reference Terminology SNOMED CT

Half-Day

Course Instructor(s):

Joshua Wiedekopf
Section for Clinical Research IT, University Hospital Schleswig-Holstein & University of Luebeck, Germany

General Outline and Main Topics

In the changing landscape of data-driven medicine, precise, re-usable clinical documentation and interoperable systems are of paramount importance. SNOMED CT is the global reference terminology in healthcare and biomedical research and is widely used to pursue these goals, but it presents unique challenges in its deployment and utilization. Drawing on the speaker's core role within the German Medical Informatics Initiative as chief expert on terminology services and controlled terminology, the tutorial gives participants a comprehensive overview of what makes SNOMED CT so valuable. It will serve as a starting point for further learning and as an open forum for discussing this resource.
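A concrete taste of what the terminology's structure enables is subsumption testing over |is a| relationships: asking whether one concept is a kind of another by following the hierarchy. The sketch below uses a tiny, greatly simplified hierarchy; the codes are modelled on SNOMED CT-style identifiers, but the content shown is illustrative only:

```python
# Tiny illustrative |is a| hierarchy (child -> list of parents).
IS_A = {
    "22298006": ["57809008"],   # myocardial infarction |is a| myocardial disease
    "57809008": ["56265001"],   # myocardial disease |is a| heart disease
    "56265001": ["64572001"],   # heart disease |is a| disease
}

def subsumed_by(concept, ancestor, is_a=IS_A):
    """True if `ancestor` is reachable from `concept` via |is a| links,
    i.e. `concept` is a subtype of `ancestor` in this toy hierarchy."""
    stack = list(is_a.get(concept, []))
    seen = set()
    while stack:
        parent = stack.pop()
        if parent == ancestor:
            return True
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return False
```

Real SNOMED CT tooling (terminology servers, the Expression Constraint Language) answers such queries at the scale of hundreds of thousands of concepts, which is one reason the terminology is so valuable for research queries over coded data.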

Learning Objectives

Participants will appreciate the purpose of structured clinical terminologies in healthcare and biomedical research; differentiate between terminologies, ontologies, classifications, and other types of vocabularies; review the history and development of SNOMED CT and understand how it is maintained and distributed; grasp the structure and content of SNOMED CT, including concepts, descriptions, relationships, and reference sets; get started with accessing and using SNOMED CT via the SNOMED CT Browser and other tools; and identify personal pathways for contributing to the development and improvement of SNOMED CT within its community.

Format

An introductory session sets objectives, followed by a sequence of focused talks on terminology fundamentals, SNOMED CT history, its structure, access tools, and community involvement, concluding with a wrap-up.

Appeal to Attendees

SNOMED CT is a powerful resource, not only for precise coding of medical facts today but also for ensuring that data remain usable for future research. Using it in practice is, however, not without challenges. By attending this tutorial, you will learn about this resource in condensed form from a speaker who has in recent years become an expert in the use of terminology servers and controlled terminology, especially with regard to SNOMED CT. You will also be able to connect with like-minded researchers interested in improving data quality and their understanding of primary-care data.

Course Requirements

The tutorial will be geared towards novice users of controlled terminology and SNOMED CT. The talk will incorporate hands-on experience with the tools that are presented, so bringing a laptop is encouraged. Programming experience is not required. Attendees will receive a comprehensive (digital) hand-out serving as a starting point for further learning.