This release of scikit-survival adds two features that are standard in most software for survival analysis, but were missing so far:

- `CoxPHSurvivalAnalysis` now has a `ties` parameter that allows you to choose between Breslow’s and Efron’s likelihood for handling tied event times. Previously, only Breslow’s likelihood was implemented and it remains the default. If you have many tied event times in your data, you can now select Efron’s likelihood with `ties="efron"` to get better estimates of the model’s coefficients.
- A `compare_survival` function has been added. It can be used to assess whether survival functions across 2 or more groups differ.
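The two options for `ties` differ only at time points where several events coincide. The following is a minimal sketch of the textbook Breslow and Efron formulas for the contribution of one event time to the partial likelihood. It is not scikit-survival's implementation; the function name `tied_time_contribution` and the toy risk scores are made up for illustration.

```python
import math


def tied_time_contribution(risk_scores, event_idx, ties="breslow"):
    """Partial-likelihood contribution of a single event time.

    risk_scores: exp(x @ beta) for every subject still at risk.
    event_idx:   positions in risk_scores of the subjects whose event
                 occurs at this time (more than one means a tie).
    """
    d = len(event_idx)
    risk_total = sum(risk_scores)
    event_total = sum(risk_scores[i] for i in event_idx)
    numerator = math.prod(risk_scores[i] for i in event_idx)

    if ties == "breslow":
        # Every tied event is compared against the full risk set.
        denominator = risk_total ** d
    elif ties == "efron":
        # The k-th tied event sees the risk set with a fraction k/d of
        # the tied subjects' total risk already removed.
        denominator = math.prod(
            risk_total - (k / d) * event_total for k in range(d)
        )
    else:
        raise ValueError("ties must be 'breslow' or 'efron'")
    return numerator / denominator


# Four subjects at risk with identical risk scores; two of them tie.
scores = [1.0, 1.0, 1.0, 1.0]
breslow = tied_time_contribution(scores, [0, 1], ties="breslow")
efron = tied_time_contribution(scores, [0, 1], ties="efron")
single = tied_time_contribution(scores, [0], ties="efron")
print(breslow, efron, single)
```

With a single event at a time point, both expressions coincide. They diverge as ties accumulate, which is why the choice matters mostly for data with many tied event times.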

To illustrate the use of `compare_survival`, let’s consider the Veterans’ Administration Lung Cancer Trial. Here, we are considering the `Celltype` feature and we want to know whether the tumor type impacts survival. We can visualize the survival function for each subgroup using the Kaplan-Meier estimator.

```
import matplotlib.pyplot as plt
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.nonparametric import kaplan_meier_estimator

data_x, data_y = load_veterans_lung_cancer()
group_indicator = data_x.loc[:, "Celltype"]
groups = group_indicator.unique()

# Estimate and plot one Kaplan-Meier curve per cell type.
for group in groups:
    group_y = data_y[group_indicator == group]
    time, surv_prob = kaplan_meier_estimator(
        group_y["Status"],
        group_y["Survival_in_days"])
    plt.step(time, surv_prob, where="post",
             label="Celltype = {}".format(group))

plt.xlabel("time $t$")
plt.ylabel("est. probability of survival")
plt.ylim(0, 1)
plt.grid(True)
plt.legend()
```

The figure indicates that patients with adenocarcinoma (green line) do not survive beyond 200 days, whereas patients with squamous cell lung cancer (blue line) can survive several years. We can determine whether this difference is indeed statistically significant by performing a non-parametric log-rank test. It groups patients according to cell type and compares the estimated group-specific hazard rate with the pooled hazard rate. Under the null hypothesis, the hazard rates of all groups are equal at every time point. The alternative hypothesis is that the hazard rate of at least one group differs from the others at some time point.
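To make the comparison of group-specific and pooled hazard concrete, here is a minimal sketch of the log-rank test restricted to two groups, using only the standard library. The function name `logrank_two_groups` and the toy data are invented for illustration. It is not how scikit-survival implements the test, and it does not cover the case of more than two groups.

```python
import math


def logrank_two_groups(times, events, in_first_group):
    """Two-sample log-rank test; returns (observed, expected, chisq, pvalue).

    times:          observed time for each subject
    events:         True if the event occurred, False if censored
    in_first_group: True for members of the first group
    """
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        # Subjects still under observation just before time t.
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(in_first_group[i] for i in at_risk)
        # Events at exactly time t, pooled and in the first group.
        d = sum(events[i] and times[i] == t for i in at_risk)
        d1 = sum(events[i] and times[i] == t and in_first_group[i]
                 for i in at_risk)

        observed += d1
        # Under the null hypothesis, events are spread over the groups
        # in proportion to each group's share of the risk set.
        expected += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the first group's event count.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    chisq = (observed - expected) ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    pvalue = math.erfc(math.sqrt(chisq / 2))
    return observed, expected, chisq, pvalue


# Toy data: six uncensored subjects; the first group fails earliest.
times = [1, 2, 3, 4, 5, 6]
events = [True] * 6
in_first_group = [True, True, True, False, False, False]
observed, expected, chisq, pvalue = logrank_two_groups(
    times, events, in_first_group)
print(observed, expected, chisq, pvalue)
```

In the toy data, the first group has 3 observed events against roughly 1.15 expected under the null hypothesis, and it is this gap that drives the test statistic. With scikit-survival, `compare_survival` runs the test across all four cell types of the Veterans’ data: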

```
from sksurv.compare import compare_survival

chisq, pvalue, stats, covar = compare_survival(
    data_y, group_indicator, return_stats=True)
```

The resulting test statistic is $\chi^2 = 25.40$, which corresponds to a highly significant P-value of $1.3 \cdot 10^{-5}$. In addition, specifying `return_stats=True` lets us look at group-specific statistics:

| group | counts | observed | expected | statistic |
|---|---|---|---|---|
| adeno | 27 | 26 | 15.69 | 10.31 |
| large | 27 | 26 | 34.55 | -8.55 |
| smallcell | 48 | 45 | 30.10 | 14.90 |
| squamous | 35 | 31 | 47.65 | -16.65 |

The column *counts* lists the size of each group and
is followed by the number of *observed* and *expected*
events. The last column *statistic* is the
difference between the observed and expected number
of events from which the overall $\chi^2$ statistic
is computed.
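As a quick sanity check, the numbers reported above can be reproduced with the standard library alone. The sketch below recomputes the *statistic* column from the table and the P-value from $\chi^2 = 25.40$. It assumes the usual reference distribution for a log-rank test with $K$ groups, a chi-squared distribution with $K - 1$ degrees of freedom, so 3 here. The helper `chi2_sf_3df` is a hypothetical name, and its closed form is valid for 3 degrees of freedom only.

```python
import math

# Values copied from the table above: group -> (observed, expected).
table = {
    "adeno": (26, 15.69),
    "large": (26, 34.55),
    "smallcell": (45, 30.10),
    "squamous": (31, 47.65),
}

# The statistic column is observed minus expected events.
statistic = {g: round(obs - exp, 2) for g, (obs, exp) in table.items()}
print(statistic)


def chi2_sf_3df(x):
    """Upper tail of a chi-squared distribution with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


# Four groups give 4 - 1 = 3 degrees of freedom (assumption, see above).
pvalue = chi2_sf_3df(25.40)
print(f"{pvalue:.1e}")
```

The differences match the last column of the table and sum to approximately zero, because events observed in excess in one group must be missing in another. The recomputed P-value rounds to the $1.3 \cdot 10^{-5}$ reported earlier.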

## Download

The latest version of scikit-survival can be obtained via *conda* or *pip*. Pre-built conda packages are available for Linux, OSX and Windows:

```
conda install -c sebp scikit-survival
```

Alternatively, you can install it from source via pip:

```
pip install -U scikit-survival
```