scikit-survival 0.12 Released
Version 0.12 of scikit-survival adds support for scikit-learn 0.22 and Python 3.8 and comes with two noteworthy improvements:
- sklearn.pipeline.Pipeline is now automatically patched to add support for predict_cumulative_hazard_function and predict_survival_function if the underlying estimator supports them (see the first example below).
- The regularization strength of the ridge penalty in sksurv.linear_model.CoxPHSurvivalAnalysis can now be set per feature (see the second example below).
For a full list of changes in scikit-survival 0.12, please see the release notes.
Pre-built conda packages are available for Linux, macOS, and Windows via
conda install -c sebp scikit-survival
Alternatively, scikit-survival can be installed from source via pip:
pip install -U scikit-survival
Using pipelines
You can now create a scikit-learn pipeline and directly call predict_cumulative_hazard_function and predict_survival_function if the underlying estimator supports them, as RandomSurvivalForest does below.
from sklearn.pipeline import make_pipeline
from sksurv.datasets import load_breast_cancer
from sksurv.ensemble import RandomSurvivalForest
from sksurv.preprocessing import OneHotEncoder
X, y = load_breast_cancer()
pipe = make_pipeline(OneHotEncoder(), RandomSurvivalForest())
pipe.fit(X, y)
surv_fns = pipe.predict_survival_function(X)
Per-feature regularization strength
Previously, if you wanted to fit Cox’s proportional hazards model to a large set of features but shrink the coefficients of only a subset of them, you had to use CoxnetSurvivalAnalysis and set its penalty_factor parameter accordingly. This release adds a similar option to CoxPHSurvivalAnalysis, which uses a ridge penalty.
For instance, consider the breast cancer data, which comprises 4 established markers (age, tumor size, tumor grade, and estrogen receptor status) and 76 genetic markers. It is sensible to fit a model where the established markers enter unpenalized and only the coefficients of the genetic markers get penalized. We can achieve this by creating an array for the regularization strength $\alpha$ where the entries corresponding to the established markers are zero.
import numpy as np
from sklearn.pipeline import make_pipeline
from sksurv.datasets import load_breast_cancer
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.preprocessing import OneHotEncoder
X, y = load_breast_cancer()
# the last 4 features are: age, er, grade, size
num_genes = X.shape[1] - 4
# add 2, because after one-hot encoding grade becomes three features
alphas = np.ones(X.shape[1] + 2)
# do not penalize established markers
alphas[num_genes:] = 0.0
# fit the model
pipe = make_pipeline(OneHotEncoder(), CoxPHSurvivalAnalysis(alpha=alphas))
pipe.fit(X, y)
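A quick sanity check on the shape of the penalty array may help: the data has 80 columns (76 genetic markers plus 4 established markers), one-hot encoding grade adds 2 columns, and the 6 encoded columns for the established markers all receive a regularization strength of zero. A small sketch with these counts hard-coded:

```python
import numpy as np

n_features = 80              # 76 genetic markers + 4 established markers
num_genes = n_features - 4

# grade is one-hot encoded into 3 columns, adding 2 features
alphas = np.ones(n_features + 2)
alphas[num_genes:] = 0.0     # established markers are not penalized
```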