Largest study using machine learning in neuroimaging sets new benchmark for major depression diagnosis

In a recent study published in Scientific Reports, researchers established a benchmark classification of major depressive disorder (MDD) using machine learning (ML) on cortical and subcortical measures.

Study: Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures. Image Credit: Elena Kalinicheva/Shutterstock.com

Background


MDD has a lifetime prevalence of 14%, with a great societal impact. It often increases suicide risk and reduces the quality of life of affected individuals.

Early diagnosis and treatment are critical, given the possibility of accelerated brain aging and therapeutic resistance. Moreover, reliable biomarkers to predict disease progression and therapeutic response are lacking.

To date, MDD diagnosis has relied exclusively on self-reported symptoms, which carries a high risk of misdiagnosis. Further, comorbidities, such as anxiety spectrum disorders, substance use disorders, and other diseases, make MDD harder to diagnose and treat correctly.

Notably, advanced neuroimaging methods, such as magnetic resonance imaging (MRI), have made examining MDD-associated cortical and subcortical changes possible. Nonetheless, small effect sizes and inference at the group level preclude clinical application.

Tools like ML allow individual-level inference and may provide better discrimination between patients with MDD and healthy individuals.

About the study

In the present study, researchers established a benchmark classification of MDD using ML on cortical and subcortical measures. They included MDD patients and healthy controls (HCs) from 30 cohorts participating in the ENIGMA MDD Working Group.

Individuals for whom fewer than 75% of the combined cortical and subcortical features were available, and those with missing clinical or demographic data, were excluded.

Structural T1-weighted three-dimensional (3D) brain MRI scans were acquired from each site and pre-processed per the ENIGMA Consortium protocols.

Cortical grey matter segmentation was based on the Desikan-Killiany atlas, while subcortical segmentation was based on the Aseg atlas. Data were split into training and test datasets using two strategies: by age/sex and by site (of MRI acquisition).

For both splitting strategies, data were split into ten folds; nine were used for training, and one was used for the test set. This was iteratively repeated until each fold was once used as a test set, thus performing the tenfold cross-validation.
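The two splitting strategies can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the participant records are synthetic, and the age-decade strata, the greedy site assignment, and the function names are all assumptions made for the example.

```python
from collections import defaultdict
from math import isqrt


def split_by_strata(participants, n_folds=10):
    """Spread each (age decade, sex) stratum evenly across the folds."""
    strata = defaultdict(list)
    for p in participants:
        # Assumed stratum definition: decade of age combined with sex.
        strata[(p["age"] // 10, p["sex"])].append(p["id"])
    folds = [[] for _ in range(n_folds)]
    position = 0
    for key in sorted(strata):
        for pid in strata[key]:
            folds[position % n_folds].append(pid)
            position += 1
    return folds


def split_by_site(participants, n_folds=10):
    """Keep every site inside a single fold, so test sites are unseen in training."""
    sites = defaultdict(list)
    for p in participants:
        sites[p["site"]].append(p["id"])
    folds = [[] for _ in range(n_folds)]
    # Assumed heuristic: largest sites first, each into the currently smallest fold.
    for site in sorted(sites, key=lambda s: (-len(sites[s]), s)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(sites[site])
    return folds


def cross_validation_rounds(folds):
    """Yield (train_ids, test_ids) pairs, using each fold once as the test set."""
    for i, test_ids in enumerate(folds):
        train_ids = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train_ids, test_ids


# Synthetic participants with deliberately uneven site sizes (not study data).
participants = [
    {
        "id": i,
        "age": 18 + (i * 7) % 50,
        "sex": "F" if i % 3 else "M",
        "site": f"site{isqrt(i):02d}",
    }
    for i in range(300)
]

print("age/sex fold sizes:", [len(f) for f in split_by_strata(participants)])
print("site fold sizes:   ", [len(f) for f in split_by_site(participants)])
```

Because whole sites cannot be divided, the site-based folds generally end up with unequal sizes, which mirrors the uneven distribution of participants reported for that strategy.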

The team used shallow linear and non-linear classification models, such as support vector machines (SVM), logistic regression with different types of regularization (LASSO, ridge, elastic net), and random forests.

Results


Overall, 2,288 MDD patients and 3,077 HCs were included in the analysis. There were substantial differences in sex and age distribution between cohorts. When split by age/sex, cohorts were evenly distributed across folds.

By contrast, splitting the data by site distributed participants unevenly across folds. Classification performance was comparable across models.

The highest balanced accuracy was 0.639 when split by age/sex. The performance of all models was reduced when data were harmonized with ComBat. When split by site, there were no significant changes in classification performance, regardless of harmonization.

The balanced accuracy was close to random chance, suggesting that models could not distinguish MDD patients from HCs.
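A short worked example shows why 0.5 is the chance level for balanced accuracy even when the groups differ in size. The sketch below uses only the group sizes reported in the article; the "always predict HC" classifier is a hypothetical baseline, not a model from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of all participants classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (MDD, label 1) and specificity (HC, label 0)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Group sizes from the article: 2,288 MDD patients and 3,077 HCs.
y_true = [1] * 2288 + [0] * 3077

# Hypothetical baseline that labels everyone as a healthy control.
always_hc = [0] * len(y_true)

print(round(accuracy(y_true, always_hc), 3))   # about 0.57, inflated by the larger HC group
print(balanced_accuracy(y_true, always_hc))    # 0.5, i.e. chance level
```

Plain accuracy rewards the baseline for the larger HC group, whereas balanced accuracy weights both groups equally, so values such as 0.53 sit only slightly above chance.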

No substantial improvements occurred even with more sophisticated harmonization algorithms (CovBat and ComBat-GAM). The team assessed the weights of the linear-kernel SVM to determine which regions contributed to the classification.

SVM performance was mainly driven by the same cortical features with and without ComBat harmonization. Cortical thickness features had greater weights than cortical surface areas.

Further, under the age/sex splitting strategy, the highest balanced accuracy was 0.632 for males and 0.585 for females, falling to 0.53 and 0.529, respectively, after ComBat harmonization.

When split by site, the balanced accuracy did not change after harmonization. Under the age/sex splitting strategy, balanced accuracy fell after harmonization from 0.564 to 0.529 for the subgroup of participants not using antidepressants and from 0.716 to 0.534 for the subgroup of antidepressant users.

Conclusions


The researchers benchmarked ML performance using cortical and subcortical measures to differentiate MDD patients from HCs. Balanced accuracy was around 62% and 51% when data were split into folds by age/sex and site, respectively.

Data harmonization evened out performance across both splitting strategies and yielded a balanced accuracy of up to 52%. This implied that the initial performance differences were due to site effects, likely arising from differences in MRI acquisition procedures.
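The idea of removing a site effect can be illustrated with a deliberately simplified stand-in for ComBat. The sketch below only re-centres and re-scales one feature per site; real ComBat additionally uses empirical Bayes estimation and preserves covariates of interest. The thickness values, site labels, and function name are made up for the illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def harmonize_location_scale(values, sites):
    """Give every site the same mean and spread for one feature (simplified, not ComBat)."""
    by_site = defaultdict(list)
    for value, site in zip(values, sites):
        by_site[site].append(value)
    grand_mean = mean(values)
    # Target spread: average of the per-site standard deviations.
    target_sd = mean([pstdev(vals) for vals in by_site.values()])
    site_stats = {s: (mean(vals), pstdev(vals)) for s, vals in by_site.items()}
    harmonized = []
    for value, site in zip(values, sites):
        site_mean, site_sd = site_stats[site]
        harmonized.append(grand_mean + target_sd * (value - site_mean) / site_sd)
    return harmonized


# Toy cortical thickness values (mm) from two hypothetical scanners.
thickness = [2.50, 2.55, 2.60, 2.65, 2.70, 2.76, 2.82, 2.88]
sites = ["A", "A", "A", "A", "B", "B", "B", "B"]

harmonized = harmonize_location_scale(thickness, sites)
print("site A mean before/after:", mean(thickness[:4]), mean(harmonized[:4]))
print("site B mean before/after:", mean(thickness[4:]), mean(harmonized[4:]))
```

If the proportion of patients differs between sites, a classifier can exploit the scanner offset as a shortcut when the same sites appear in training and test folds. Removing the offset takes that shortcut away, which is one plausible reading of why accuracy fell after harmonization under the age/sex split.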

Only minor differences occurred when the dataset was stratified by clinical and demographic parameters. The findings indicate that common ML algorithms cannot distinguish MDD patients from HCs based on brain structural morphometric data alone.

Moreover, classification performance did not improve in stratified groups that were demographically and clinically more homogeneous.

Therefore, further studies are needed to determine whether more sophisticated algorithms could achieve better performance.
