Introduction:
Prediction of outcomes following allogeneic hematopoietic cell transplantation (HCT) remains a major challenge. Machine learning (ML) comprises computational methods that may facilitate the generation of HCT prediction models. We sought to investigate the prognostic potential of multiple ML algorithms applied to a large single-center allogeneic HCT database.
Methods:
Our registry included 2697 patients who underwent allogeneic HCT between January 1976 and December 2017. Forty-five pre-transplant baseline variables were included in the predictive assessment of each ML algorithm for overall survival (OS), as measured by the area under the receiver operating characteristic curve (AUC). The pre-transplant variables used in the EBMT machine learning study (Shouval et al., 2015) served as a benchmark for comparison.
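To make the evaluation procedure concrete, the following is a minimal sketch, not the study's actual pipeline, of how candidate models can be compared by cross-validated AUC in Python with scikit-learn; the feature matrix `X`, outcome vector `y`, fold count, and model hyperparameters are all illustrative assumptions.

```python
# Illustrative model comparison by cross-validated AUC, assuming a
# pre-processed feature matrix X (45 baseline variables, encoded
# numerically) and a binary OS outcome y; names and settings are
# assumptions for this sketch, not taken from the study.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def evaluate_auc(model, X, y, n_splits=5, seed=0):
    """Return mean and SD of cross-validated AUC for a candidate model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return scores.mean(), scores.std()

models = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "LR": LogisticRegression(max_iter=5000),
}
# X, y = ...  # load the registry data here
# for name, model in models.items():
#     mean_auc, sd_auc = evaluate_auc(model, X, y)
#     print(f"{name}: AUC {mean_auc:.2f} ± {sd_auc:.2f}")
```

Stratified folds are used in the sketch so that the event rate is preserved across splits, which keeps fold-to-fold AUC estimates comparable.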
Results:
On the entire dataset, the random forest (RF) algorithm performed best (AUC 0.71±0.04), outperforming the second-best model, logistic regression (LR) (AUC 0.69±0.04) (p<0.001). Both algorithms achieved higher AUC scores using all 45 variables than with the limited variable set examined in the EBMT study. RF-predicted survival at 100 days post-HCT on the full dataset stratified patients into prognostic groups with significantly different 2-year OS (p<0.0001). We then examined the ML methods that permit identification of individually significant variables, namely LR and RF, and identified, among others, matched related donor (HR=0.49, p<0.0001), increasing TBI dose (HR=1.60, p=0.006), increasing recipient age (HR=1.92, p<0.0001), higher baseline hemoglobin (HR=0.59, p=0.0002), and higher baseline FEV1 (HR=0.73, p=0.02).
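As an illustration of the stratification step, the sketch below (not the authors' code) splits patients at the median model-predicted risk and compares the survival of the resulting groups with a log-rank test using the lifelines library; the fitted model `rf`, the arrays `time` (months) and `event` (death indicator), the median split, and the 24-month horizon are all assumptions. Hazard ratios of the kind reported above would come from a separate Cox model, indicated in the trailing comments.

```python
# Hypothetical risk stratification: assumes a fitted classifier `rf`,
# held-out features X, and numpy arrays time/event for follow-up.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

def stratify_and_test(rf, X, time, event, horizon=24.0):
    """Split patients at median predicted risk; compare OS between groups."""
    risk = rf.predict_proba(X)[:, 1]       # predicted probability of death
    high = risk >= np.median(risk)         # high- vs low-risk groups
    for label, mask in (("high risk", high), ("low risk", ~high)):
        kmf = KaplanMeierFitter()
        kmf.fit(time[mask], event_observed=event[mask], label=label)
        print(f"{label}: estimated {horizon}-month OS = {kmf.predict(horizon):.2f}")
    res = logrank_test(time[high], time[~high],
                       event_observed_A=event[high],
                       event_observed_B=event[~high])
    return res.p_value

# A Cox model over the baseline variables yields per-variable hazard
# ratios (hypothetical DataFrame and column names):
# from lifelines import CoxPHFitter
# cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
# print(cph.summary[["exp(coef)", "p"]])   # HR and p-value per variable
```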
Conclusion:
The application of multiple ML techniques to single-center allogeneic HCT databases warrants further investigation and may provide a useful tool for identifying variables with prognostic potential.