scikit-survival 0.12 Released
Version 0.12 of scikit-survival adds support for scikit-learn 0.22 and Python 3.8 and comes with two noticeable improvements:
- sklearn.pipeline.Pipeline will now be automatically patched to add support for predict_survival_function if the underlying estimator supports it (see first example).
- The regularization strength of the ridge penalty in sksurv.linear_model.CoxPHSurvivalAnalysis can now be set per feature (see second example).
For a full list of changes in scikit-survival 0.12, please see the release notes.
Pre-built conda packages are available for Linux, macOS, and Windows via
conda install -c sebp scikit-survival
Alternatively, scikit-survival can be installed from source via pip:
pip install -U scikit-survival
Survival function prediction for pipelines
You can now create a scikit-learn pipeline and directly call predict_survival_function on it if the underlying estimator supports it, such as RandomSurvivalForest:
from sklearn.pipeline import make_pipeline
from sksurv.datasets import load_breast_cancer
from sksurv.ensemble import RandomSurvivalForest
from sksurv.preprocessing import OneHotEncoder

X, y = load_breast_cancer()

pipe = make_pipeline(OneHotEncoder(), RandomSurvivalForest())
pipe.fit(X, y)
surv_fn = pipe.predict_survival_function(X)
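The idea behind the automatic patching is simple: check whether the pipeline's final estimator provides predict_survival_function, and if so, expose a matching method on the pipeline itself. The toy classes below are a conceptual sketch of that mechanism, not the actual sksurv implementation:

```python
# Conceptual sketch (not sksurv internals): forward predict_survival_function
# to a pipeline's final step only if that step supports it.

class ToyModel:
    """Stand-in for an estimator that can predict survival functions."""
    def predict_survival_function(self, X):
        # return a constant survival probability for each sample
        return [0.5 for _ in X]

class ToyPipeline:
    """Minimal pipeline: a list of (name, estimator) steps."""
    def __init__(self, steps):
        self.steps = steps

    @property
    def _final_estimator(self):
        return self.steps[-1][1]

def patch_pipeline(pipe):
    """Attach predict_survival_function if the final step supports it."""
    final = pipe._final_estimator
    if hasattr(final, "predict_survival_function"):
        def predict_survival_function(X):
            # a real pipeline would first apply all transform steps to X
            return final.predict_survival_function(X)
        pipe.predict_survival_function = predict_survival_function
    return pipe

pipe = patch_pipeline(ToyPipeline([("model", ToyModel())]))
print(pipe.predict_survival_function([1, 2, 3]))  # [0.5, 0.5, 0.5]
```

A pipeline whose final step lacks the method is left unchanged, so calling predict_survival_function on it fails just as it would on the estimator itself.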
Per-feature regularization strength
If you want to fit Cox’s proportional hazards model to a large set of features, but only shrink the coefficients for a subset of features, you previously had to use sksurv.linear_model.CoxnetSurvivalAnalysis and set its penalty_factor parameter accordingly. This release adds a similar option to CoxPHSurvivalAnalysis, which uses a ridge (squared $\ell_2$) penalty.
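With a per-feature regularization strength, each coefficient gets its own penalty weight. Writing $\mathrm{PL}(\beta)$ for the Cox partial likelihood, the objective takes the standard ridge-penalized form below (the notation and the factor $\tfrac{1}{2}$ are my choice of convention, not taken from the release notes):

```latex
\max_{\beta} \; \log \mathrm{PL}(\beta) \;-\; \frac{1}{2} \sum_{j=1}^{p} \alpha_j \beta_j^2
```

Setting $\alpha_j = 0$ leaves the coefficient $\beta_j$ unpenalized, while larger values of $\alpha_j$ shrink it more strongly toward zero.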
For instance, consider the breast cancer data, which comprises 4 established markers (age, tumor size, tumor grade, and estrogen receptor status) and 76 genetic markers. It is sensible to fit a model where the established markers enter unpenalized and only the coefficients of the genetic markers get penalized. We can achieve this by creating an array for the regularization strength $\alpha$ where the entries corresponding to the established markers are zero.
import numpy as np
from sklearn.pipeline import make_pipeline
from sksurv.datasets import load_breast_cancer
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.preprocessing import OneHotEncoder

X, y = load_breast_cancer()

# the last 4 features are: age, er, grade, size
num_genes = X.shape[1] - 4
# add 2, because after one-hot encoding grade becomes three features
alphas = np.ones(X.shape[1] + 2)
# do not penalize established markers
alphas[num_genes:] = 0.0

# fit the model
pipe = make_pipeline(OneHotEncoder(), CoxPHSurvivalAnalysis(alpha=alphas))
pipe.fit(X, y)
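To sanity-check the shape of that array: after one-hot encoding, the design matrix has two more columns than X (grade expands into three indicators), and exactly the six columns belonging to the established markers (age, er, the three grade indicators, size) should get a zero entry. A quick numpy-only check with the same arithmetic, assuming the 80 columns of the breast cancer data:

```python
import numpy as np

n_features = 80              # columns of X before one-hot encoding (assumed)
num_genes = n_features - 4   # 76 genetic markers come first
# grade expands to 3 indicator columns, adding 2 columns overall
alphas = np.ones(n_features + 2)
alphas[num_genes:] = 0.0     # leave established markers unpenalized

print(alphas.shape)                   # (82,)
print(np.count_nonzero(alphas == 0))  # 6 unpenalized coefficients
```

The 76 genetic markers keep a regularization strength of 1.0, so only their coefficients are shrunk when the model is fit.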