scikit-survival 0.12 Released

Version 0.12 of scikit-survival adds support for scikit-learn 0.22 and Python 3.8 and comes with two noticeable improvements:

  1. sklearn.pipeline.Pipeline will now be automatically patched to add support for predict_cumulative_hazard_function and predict_survival_function if the underlying estimator supports it (see first example ).
  2. The regularization strength of the ridge penalty in sksurv.linear_model.CoxPHSurvivalAnalysis can now be set per feature (see second example ).

For a full list of changes in scikit-survival 0.12, please see the release notes.

Pre-built conda packages are available for Linux, macOS, and Windows via

 conda install -c sebp scikit-survival

Alternatively, scikit-survival can be installed from source via pip:

 pip install -U scikit-survival

Using pipelines

You can now create a scikit-learn pipeline and directly call predict_cumulative_hazard_function and predict_survival_function if the underlying estimator supports it, such as RandomSurvivalForest below.

from sklearn.pipeline import make_pipeline
from sksurv.datasets import load_breast_cancer
from sksurv.ensemble import RandomSurvivalForest
from sksurv.preprocessing import OneHotEncoder

X, y = load_breast_cancer()

pipe = make_pipeline(OneHotEncoder(), RandomSurvivalForest()), y)

surv_fn = pipe.predict_survival_function(X, y)

Per-feature regularization strength

If you want to fit Cox’s proportional hazards model to a large set of features, but only shrink the coefficients for a subset of features, previously, you had to use CoxnetSurvivalAnalysis and set the penalty_factor parameter accordingly. This release adds a similar option to CoxPHSurvivalAnalysis, which only uses ridge regression.

For instance, consider the breast cancer data, which comprises 4 established markers (age, tumor size, tumor grade, and estrogen receptor status) and 76 genetic markers. It is sensible to fit a model where the established markers enter unpenalized and only the coefficients of the genetic markers get penalized. We can achieve this by creating an array for the regularization strength $\alpha$ where the entries corresponding to the established markers are zero.

import numpy as np
from sksurv.linear_model import CoxPHSurvivalAnalysis

X, y = load_breast_cancer()

# the last 4 features are: age, er, grade, size
num_genes = X.shape[1] - 4

# add 2, because after one-hot encoding grade becomes three features
alphas = np.ones(X.shape[1] + 2)
# do not penalize established markers
alphas[num_genes:] = 0.0

# fit the model
pipe = make_pipeline(OneHotEncoder(), CoxPHSurvivalAnalysis(alpha=alphas)), y)
Sebastian Pölsterl
Post-Doctoral Researcher

My research interests include machine learning for time-to-event analysis, non-Euclidean data, and biomedical applications.