Recent & Upcoming Talks

2018

Predictive models for time-to-event data are suitable when only partial information about the outcome is known for a subset of the data – these records are censored. Right censoring is the most common form of censoring and arises frequently in clinical studies because patients can drop out or fail to complete follow-up. For these patients, it is unknown whether they experienced an adverse event after their last day of contact. Cox’s proportional hazards model is by far the most popular model for time-to-event data, but many alternative models exist. Many traditional machine learning models have been extended to deal with censored outcomes, e.g., support vector machines, gradient boosting, random forests, and multilayer perceptrons. In this talk, I will explain the underlying principles of modelling time-to-event data and how they extend to modern machine learning models, including deep learning.
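To make the notion of right censoring concrete, here is a minimal, self-contained sketch (not from the talk) that represents each subject as a (time, event) pair – event=False meaning the subject was censored at that time – and computes the classic Kaplan–Meier survival estimate from scratch. The data are made up for illustration.

```python
# Minimal sketch: right-censored observations and the Kaplan-Meier estimator.
# Each subject is a (time, event) pair; event=False means the subject was
# censored at that time (e.g. lost to follow-up), so we only know the true
# event time exceeds it.  Data below are invented for illustration.

def kaplan_meier(data):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(data)            # sort by observation time
    n_at_risk = len(data)          # subjects still under observation
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # events and total observations tied at this time point
        deaths = sum(1 for time, event in data if time == t and event)
        n_with_t = sum(1 for time, _ in data if time == t)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_with_t      # censored subjects leave the risk set too
        i += n_with_t
    return curve

observations = [(2, True), (3, False), (5, True), (5, True), (8, False), (9, True)]
print(kaplan_meier(observations))  # prints approximately [(2, 0.833), (5, 0.417), (9, 0.0)]
```

Note how the censored subjects at times 3 and 8 never trigger a drop in the curve, yet they shrink the risk set – discarding them entirely would bias the estimate, which is exactly why standard regression is a poor fit for such data.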

2017

The aim of survival analysis – also referred to as reliability analysis in engineering – is to analyse the time until one or more events happen. Examples from the medical domain are the time until death, until onset of a disease, or until pregnancy. In engineering, the time until the failure of a mechanical system is a common application. In a typical clinical study, the exact time of an event remains unknown for a subset of individuals, simply because some remained event-free when the study ended or decided to withdraw from it. For these patients, it is unknown whether they did or did not experience an event after termination of the study. The only valid information is that any (unobserved) event must have occurred after the study ended. This property needs to be considered when applying machine learning to this type of data.

In this talk, I will give an introduction to survival analysis and demonstrate how to analyse survival data using scikit-survival (https://github.com/sebp/scikit-survival): a Python module for survival analysis built on top of scikit-learn. I will introduce survival data from various domains and explain why traditional regression and classification methods are unsuitable. Using practical examples, I will demonstrate how scikit-survival can be used to estimate the time until an event and how additional variables can be used to improve prediction. Finally, I will give an outlook on more advanced methods, which are suitable to analyse high-dimensional clinical data.
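As a small illustration of the data format involved (a sketch, not taken from the talk): scikit-survival represents the outcome as a NumPy structured array pairing a boolean event indicator with the observed time. The field names and values below follow common convention but are my own illustrative choices; in practice `sksurv.util.Surv` constructs this array for you.

```python
import numpy as np

# Sketch: survival outcomes as a structured array pairing a boolean event
# indicator with an observed time, the representation scikit-survival uses.
event = np.array([True, False, True, False])  # False = right-censored
time = np.array([54.0, 120.0, 37.0, 365.0])   # days until event or censoring

y = np.empty(len(time), dtype=[("event", bool), ("time", float)])
y["event"] = event
y["time"] = time

# A traditional regression on `time` alone would treat the censored times
# (120 and 365 days) as exact event times, biasing estimates downward;
# survival models consume both fields together.
print(y)
```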

In recent years, Docker has become an essential tool for software development. We demonstrate that Docker containers together with the GitLab platform can be a useful tool for researchers too. It enables them to easily catch problematic code, automate analysis workflows, archive results, and share their software and its dependencies across platforms. While a Docker image bundles the whole development stack and enables its cross-platform sharing, it is often cumbersome and repetitive to build, run, and deploy an image. GitLab is a software development platform built on top of the Git version control system with built-in support for Docker. Using GitLab’s continuous integration pipelines, most tasks related to managing Docker images can be automated. In addition, utilising tools from software development, we can perform automatic code analysis to identify faulty or problematic code as early as possible. We explain how to set up a Docker-powered project in GitLab and how to automate certain tasks to ease the development workflow:

  1. How to automatically build a new Docker image once a project has been updated.
  2. How to identify faulty and problematic code.
  3. How to use GitLab to automatically run experiments and archive their results.
  4. How to share images with other researchers using GitLab’s Docker registry.
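The steps above could be wired together with a `.gitlab-ci.yml` along these lines. This is an illustrative sketch, not the configuration from the talk: stage and job names, the linter, and the `run_experiments.py` entry point are hypothetical, while the `CI_REGISTRY*` and `CI_COMMIT_SHA` variables are predefined by GitLab.

```yaml
# Illustrative pipeline: build an image, lint the code, run experiments,
# archive their results, and push the image to GitLab's container registry.
stages:
  - build
  - test
  - run
  - release

build-image:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"

lint:
  stage: test
  image: "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
  script:
    - flake8 .                       # any static-analysis tool of your choice

experiments:
  stage: run
  image: "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
  script:
    - python run_experiments.py      # hypothetical entry point
  artifacts:
    paths:
      - results/                     # archived and downloadable from GitLab

release-image:
  stage: release
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker pull "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
    - docker tag "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" "$CI_REGISTRY_IMAGE:latest"
    - docker push "$CI_REGISTRY_IMAGE:latest"
```

Tagging every image with the commit SHA keeps each experiment reproducible, while the final `latest` tag is what collaborators pull from the project's registry.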