I am proud to announce the release of version 0.16.0 of scikit-survival.
The biggest improvement in this release is that you can now
change the evaluation metric that is used in estimators’ score method.
This is particularly useful
for hyper-parameter optimization using scikit-learn’s GridSearchCV.
You can now use as_concordance_index_ipcw_scorer,
as_cumulative_dynamic_auc_scorer, or
as_integrated_brier_score_scorer to adjust the
score method to your needs.
The example below illustrates how to use these in practice.
For a full list of changes in scikit-survival 0.16.0, please see the release notes.
I am proud to announce the release of version 0.15.0 of scikit-survival,
which brings support for scikit-learn 0.24 and Python 3.9.
Moreover, if you fit a gradient boosting model with loss='coxph',
you can now predict the survival function and cumulative hazard function using the
predict_survival_function and predict_cumulative_hazard_function methods.
The other enhancement is that cumulative_dynamic_auc now supports evaluating time-dependent predictions. For instance, you can now evaluate the predicted time-dependent risk of a RandomSurvivalForest rather than just evaluating the predicted total number of events per instance, which is what RandomSurvivalForest.predict returns.
Today marks the release of version 0.14.0 of scikit-survival. The biggest change in this release is actually not in the code, but in the documentation. This release features a complete overhaul of the documentation. Most importantly, the documentation has a more modern feel to it, thanks to the visually pleasing pydata Sphinx theme, which also powers pandas.
Moreover, the documentation now contains a User Guide section that bundles several topics surrounding the use of scikit-survival. Some of these were available as separate Jupyter notebooks previously, such as the guide on Evaluating Survival Models. There are two new guides: The first one is on penalized Cox models. It provides a hands-on introduction to Cox’s proportional hazards model with $\ell_2$ (Ridge) and $\ell_1$ (LASSO) penalties. The second guide is on Gradient Boosted Models and covers how gradient boosting can be used to obtain a non-linear proportional hazards model or a non-linear accelerated failure time model by using regression tree base learners. The second part of this guide covers a variant of gradient boosting that is most suitable for high-dimensional data and is based on component-wise least squares base learners.
To make it easier to get started, all notebooks can now be run in a Jupyter notebook, right from your browser, just by clicking on
Today, I released version 0.13.0 of scikit-survival. Most notably, this release adds sksurv.metrics.brier_score and sksurv.metrics.integrated_brier_score, an updated PEP 517/518 compatible build system, and support for scikit-learn 0.23.
For a full list of changes in scikit-survival 0.13.0, please see the release notes.
Pre-built conda packages are available for Linux, macOS, and Windows via
conda install -c sebp scikit-survival
Alternatively, scikit-survival can be installed from source following these instructions.
A while back, I posted the Survival Analysis for Deep Learning tutorial. This tutorial was written for TensorFlow 1 using the tf.estimators API. The changes between version 1 and the current TensorFlow 2 are quite significant, which is why the code does not run when using a recent TensorFlow version. Therefore, I created a new version of the tutorial that is compatible with TensorFlow 2. The text is basically identical, but the training and evaluation procedure changed.
The complete notebook is available on GitHub, or you can run it directly using Google Colaboratory.
Version 0.12 of scikit-survival adds support for scikit-learn 0.22 and Python 3.8 and comes with two notable improvements:
predict_cumulative_hazard_function and predict_survival_function if the underlying estimator supports it (see first example).
For a full list of changes in scikit-survival 0.12, please see the release notes.
Today, I released a new version of scikit-survival, which includes an implementation of Random Survival Forests. Like its popular counterparts for classification and regression, a Random Survival Forest is an ensemble of tree-based learners. A Random Survival Forest ensures that individual trees are de-correlated by 1) building each tree on a different bootstrap sample of the original training data, and 2) at each node, evaluating the split criterion only for a randomly selected subset of features and thresholds. Predictions are formed by aggregating the predictions of the individual trees in the ensemble.
For a full list of changes in scikit-survival 0.11, please see the release notes.
This release of scikit-survival adds two features that are standard in most software for survival analysis, but were missing so far:
ties parameter that allows you to choose between Breslow’s
and Efron’s likelihood for handling tied event times. Previously, only
Breslow’s likelihood was implemented and it remains the default.
If you have many tied event times in your data, you can now select
Efron’s likelihood with ties="efron" to get better estimates of the
model’s coefficients.
Most machine learning algorithms have been developed to perform classification or regression. However, in clinical research we often want to estimate the time to an event, such as death or recurrence of cancer, which leads to a special type of learning task that is distinct from classification and regression. This task is termed survival analysis, but is also referred to as time-to-event analysis or reliability analysis. Many machine learning algorithms have been adapted to perform survival analysis: Support Vector Machines, Random Forest, or Boosting. Only recently has survival analysis entered the era of deep learning, which is the focus of this post.
You will learn how to train a convolutional neural network to predict time to a (generated) event from MNIST images, using a loss function specific to survival analysis. The first part will cover some basic terms and quantities used in survival analysis (feel free to skip this part if you are already familiar with them). In the second part, we will generate synthetic survival data from MNIST images and visualize it. In the third part, we will briefly revisit the most popular survival model of them all and learn how it can be used as a loss function for training a neural network. Finally, we put all the pieces together, train a convolutional neural network on MNIST, and predict survival functions on the test data.
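To make the loss-function idea from the third part more concrete, here is a minimal NumPy sketch of the negative log partial likelihood of Cox’s proportional hazards model. It is computed on a vector of predicted risk scores, such as the outputs of a network. This is an illustration under stated assumptions, not the implementation used later in this post. The function name is hypothetical, tied event times are handled with Breslow’s approximation, and the loss is averaged over the uncensored samples.

```python
import numpy as np


def cox_neg_log_partial_likelihood(risk_score, time, event):
    """Negative log partial likelihood of Cox's model (Breslow ties).

    risk_score: predicted log-risk f(x_i) per sample, e.g. a network's output
    time: observed time per sample
    event: True if the event was observed, False if the sample is censored
    """
    risk_score = np.asarray(risk_score, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    if not event.any():
        return 0.0  # no uncensored samples: nothing to average over
    # at_risk[i, j] is True if sample j is still at risk at time t_i
    at_risk = time[np.newaxis, :] >= time[:, np.newaxis]
    # log of sum_{j in R(t_i)} exp(f(x_j)), shifted for numerical stability
    shift = risk_score.max()
    log_risk_set = shift + np.log(at_risk @ np.exp(risk_score - shift))
    # each uncensored sample contributes f(x_i) - log sum over its risk set
    return float(-np.mean((risk_score - log_risk_set)[event]))


# With constant risk scores, each event contributes log |R(t_i)|
loss = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [True, True, True])
print(round(loss, 4))  # 0.5973, i.e. (log 3 + log 2 + log 1) / 3
```

Censored samples never add a term of their own. They still enter the loss through the risk sets of the samples that experienced the event earlier. This is how the partial likelihood makes use of incomplete follow-up.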
This release of scikit-survival adds support for scikit-learn 0.21 and pandas 0.24, along with a couple of other smaller fixes. Please see the release notes for a full list of changes. If you are using scikit-survival in your research, you can now cite it using a Digital Object Identifier (DOI).