Big-RT: Big Data Analysis to Identify Combinatorial Predictors of Radiotherapy Toxicity for Personalised Treatment in Prostate Cancer Patients


Background: Nearly two-thirds of cancer patients will receive radiotherapy (RT), and about 20% of them experience long-term RT-induced toxicity. Currently, patients cannot be stratified by their risk of side effects, which restricts treatment intensity for all patients. The pathogenesis of RT-induced toxicity is complex and multi-factorial, yet most predictive analyses to date have been restricted to isolated data types and standard statistical techniques, with limited success. We developed and applied bespoke, state-of-the-art machine learning techniques to large-scale, high-dimensional multidisciplinary data from the CHHiP prostate RT-fractionation trial (CRUK/06/016) to identify multi-parametric predictors of RT toxicities.

Method: We performed a fully integrative analysis of clinician- and patient-reported outcomes, co-morbidities, dosimetry and genetic data (via the RAPPER/PRACTICAL consortium) collected as part of the trial. Dosimetric data were represented by a functional fit to the dose-volume histogram (DVH). Genetic data were reduced from $\approx 15 \times 10^6$ measures per patient to 100 by Bayesian single-SNP tests of association, and then further refined in a hybrid functional-scalar multivariate model with an elastic-net penalty. Our final model was selected using K-fold cross-validation.
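
For readers unfamiliar with the elastic-net penalty mentioned above, its generic form for a binary outcome can be written as follows. This is the standard textbook formulation for penalised logistic regression, not necessarily the exact objective of the hybrid functional-scalar model used here. The symbols $n$, $x_i$, $y_i$, $\beta_0$, $\beta$, $\lambda$ and $\alpha$ are introduced for illustration only, and the binary coding of the toxicity outcome is an assumption.

$$
\hat{\beta}_0, \hat{\beta} = \arg\min_{\beta_0,\, \beta} \; -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \eta_i - \log\big(1 + e^{\eta_i}\big) \Big] + \lambda \Big( \alpha \lVert \beta \rVert_1 + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \Big), \qquad \eta_i = \beta_0 + x_i^\top \beta
$$

Here $y_i \in \{0, 1\}$ indicates whether patient $i$ experienced the toxicity, $x_i$ collects that patient's features, $\lambda \ge 0$ sets the overall penalty strength and $\alpha \in [0, 1]$ balances the sparsity-inducing $\ell_1$ term against the $\ell_2$ term. In this generic setting, $\lambda$ and $\alpha$ are the quantities typically chosen by K-fold cross-validation.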

Results: We applied our methodology to the 721 patients with complete data profiles (out of 3,212 recruited). Our final model was trained on a feature matrix consisting of 12 clinical, 100 germline and 21 dosimetric variables. We identified novel candidate predictive markers that integrate dosimetric, clinical and genetic variables to predict rectal bleeding; these have greater predictive power than isolated data types (mean ROC AUC of 0.713, vs. 0.524 for dosimetric data alone and 0.641 for genetic data alone).
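
As a reminder of what a mean ROC AUC summarises, the sketch below computes the AUC of each held-out fold as the probability that a randomly chosen patient with the outcome receives a higher score than a randomly chosen patient without it, and then averages over folds. It is a minimal illustration under stated assumptions, not the evaluation code used in this work: the function names `roc_auc` and `mean_auc` and the toy labels and scores are hypothetical, and the scores are assumed to be out-of-fold predictions from some already fitted model.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUC over a list of (labels, scores) pairs,
    one pair per held-out fold."""
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(aucs) / len(aucs)


# Toy out-of-fold predictions for two folds (illustrative values only).
folds = [
    ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),  # fold AUC = 0.75
    ([0, 1, 0, 1], [0.20, 0.90, 0.30, 0.70]),  # fold AUC = 1.0
]
print(mean_auc(folds))  # 0.875
```

An AUC of 0.5 corresponds to ranking patients at random, which puts the reported 0.524 for dosimetric data alone in context.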

Conclusion: We have demonstrated the predictive power of our data-rich, integrative machine learning analysis driven by a multidisciplinary team. The resulting novel combinatorial markers predicting RT-induced toxicity will need to be validated in an independent data set. The successful techniques developed in this project will allow similar approaches to be applied to other tumour types treated with RT.

National Cancer Research Institute (NCRI) Cancer Conference
Sebastian Pölsterl
Post-Doctoral Researcher
