I’m a researcher at the lab for Artificial Intelligence in Medical Imaging working on machine learning for biomedical applications. My research interests are time-to-event analysis (survival analysis) and using deep learning techniques to learn from non-Euclidean data such as graphs. Previously, I worked at The Institute of Cancer Research, London and was among the winners of the Prostate Cancer DREAM challenge. I’m the author of scikit-survival, a machine learning library for survival analysis built on top of scikit-learn.
PhD in Computer Science, 2016
Technische Universität München
MSc in Bioinformatics, 2011
Ludwig-Maximilians-Universität & Technische Universität München
BSc in Bioinformatics, 2008
Ludwig-Maximilians-Universität & Technische Universität München
I’m pleased to announce the release of scikit-survival 0.17.2. This release fixes several small issues with packaging scikit-survival and the documentation. For a full list of changes in scikit-survival 0.17.2, please see the release notes.
Most notably, binary wheels are now available for Linux, Windows, and macOS (Intel). This has been possible thanks to the cibuildwheel build tool, which makes it incredibly easy to use GitHub Actions to build wheels for multiple versions of Python. Therefore, you can now install scikit-survival with pip, without building anything from source, simply by running
pip install scikit-survival
As before, pre-built conda packages are also available and can be installed by running
conda install -c sebp scikit-survival
scikit-survival 0.17.0 adds support for scikit-learn 1.0, which includes support for feature names. If you pass a pandas DataFrame to fit, the estimator will set a feature_names_in_ attribute containing the feature names. When a DataFrame is passed to predict, its column names are checked for consistency with those passed to fit.
The example below illustrates this feature.
For a full list of changes in scikit-survival 0.17.0, please see the release notes.
I am proud to announce the release of version 0.16.0 of scikit-survival.
The biggest improvement in this release is that you can now change the evaluation metric used by an estimator's score method. This is particularly useful for hyper-parameter optimization using scikit-learn's GridSearchCV.
You can now use as_concordance_index_ipcw_scorer, as_cumulative_dynamic_auc_scorer, or as_integrated_brier_score_scorer to adjust the score method to your needs.
The example below illustrates how to use these in practice.
For a full list of changes in scikit-survival 0.16.0, please see the release notes.
I am proud to announce the release of version 0.15.0 of scikit-survival, which brings support for scikit-learn 0.24 and Python 3.9.
Moreover, if you fit a gradient boosting model with loss='coxph', you can now predict the cumulative hazard and survival functions using the predict_cumulative_hazard_function and predict_survival_function methods.
The other enhancement is that cumulative_dynamic_auc now supports evaluating time-dependent predictions. For instance, you can now evaluate the predicted time-dependent risk of a RandomSurvivalForest rather than just evaluating the predicted total number of events per instance, which is what RandomSurvivalForest.predict returns.
Today marks the release of version 0.14.0 of scikit-survival. The biggest change in this release is actually not in the code, but in the documentation. This release features a complete overhaul of the documentation. Most importantly, the documentation has a more modern feel to it, thanks to the visually pleasing pydata Sphinx theme, which also powers pandas.
Moreover, the documentation now contains a User Guide section that bundles several topics surrounding the use of scikit-survival. Some of these were previously available as separate Jupyter notebooks, such as the guide on Evaluating Survival Models. There are two new guides. The first one is on penalized Cox models. It provides a hands-on introduction to Cox's proportional hazards model with $\ell_2$ (Ridge) and $\ell_1$ (LASSO) penalties. The second guide is on Gradient Boosted Models and covers how gradient boosting can be used to obtain a non-linear proportional hazards model or a non-linear accelerated failure time model by using regression tree base learners. The second part of this guide covers a variant of gradient boosting that is most suitable for high-dimensional data and is based on component-wise least squares base learners.
To make it easier to get started, all notebooks can now be run in a Jupyter notebook, right from your browser, just by clicking on
scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while …