
Machine learning-based meta-analysis of colorectal cancer and inflammatory bowel disease



Meta-Analysis

PLoS One. 2023 Dec 22;18(12):e0290192. doi: 10.1371/journal.pone.0290192. eCollection 2023.

Aria Sardari et al.

Free PMC article

Abstract

Colorectal cancer (CRC) is a major global health concern, resulting in numerous cancer-related deaths. CRC detection, treatment, and prevention can be improved by identifying genes and biomarkers. Despite extensive research, the underlying mechanisms of CRC remain elusive, and previously identified biomarkers have not yielded satisfactory insights. This shortfall may be attributed to the predominance of univariate analysis methods, which overlook potential combinations of variants and genes contributing to disease development. Here, we address this knowledge gap by presenting a novel multivariate machine-learning strategy to pinpoint genes associated with CRC. Additionally, we applied our analysis pipeline to Inflammatory Bowel Disease (IBD), as IBD patients face substantial CRC risk. The importance of the identified genes was substantiated by rigorous validation across numerous independent datasets. Several of the discovered genes have been previously linked to CRC, while others represent novel findings warranting further investigation. A Python implementation of our pipeline can be accessed publicly at https://github.com/AriaSar/CRCIBD-ML.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures


Fig 1. Schematic representation of the research workflow.

(a) Raw datasets are retrieved from GEO, and tabular datasets are generated from the gene expression data and the probe-gene mapping. (b) The data are processed by discarding unassigned genes, imputing missing values, removing genes not common to all datasets, combining identical genes, and scaling each dataset. (c) The datasets are split into training and validation sets, the training sets are merged into a single set, and a 1000-iteration oversampling/feature selection process is applied to identify the most prominent genes. (d) An ensemble classifier comprising Random Forest, Support Vector Machine, and Logistic Regression is trained on the training set. (e) The trained model is evaluated on the validation sets. Four performance metrics (accuracy, F1-score, precision-recall curve, and confusion matrix) are used for case-control sets, and recall is used for case-only sets.
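Step (c) of the workflow can be illustrated with a minimal sketch. The caption does not name the oversampling method or the feature-selection score, so everything below is an assumption for illustration: random duplication of minority-class samples, a class-mean-difference score, and the hypothetical helper names `oversample`, `top_genes`, and `prominent_genes`. It is not the authors' implementation, which is available in their repository.

```python
import random
from collections import Counter
from statistics import mean


def oversample(rows, labels, rng):
    """Randomly duplicate minority-class samples until all classes are balanced."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def top_genes(rows, labels, genes, k):
    """Rank genes by |mean(cases) - mean(controls)|; an assumed stand-in score."""
    scores = {}
    for j, gene in enumerate(genes):
        case = [r[j] for r, y in zip(rows, labels) if y == 1]
        ctrl = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores[gene] = abs(mean(case) - mean(ctrl))
    return sorted(genes, key=lambda g: scores[g], reverse=True)[:k]


def prominent_genes(rows, labels, genes, k=2, iterations=1000, seed=0):
    """Count how often each gene lands in the top k across resampled iterations."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(iterations):
        bal_rows, bal_labels = oversample(rows, labels, rng)
        counts.update(top_genes(bal_rows, bal_labels, genes, k))
    return counts.most_common()


# Toy expression matrix (made-up values): 5 controls (0) and 2 cases (1).
genes = ["G1", "G2", "G3"]
rows = [
    [0.1, 0.4, 0.5], [0.3, 0.6, 0.4], [0.2, 0.5, 0.6],
    [0.0, 0.3, 0.5], [0.2, 0.7, 0.5],
    [5.0, 0.5, 2.0], [5.2, 0.6, 2.1],
]
labels = [0, 0, 0, 0, 0, 1, 1]

selection_counts = prominent_genes(rows, labels, genes, k=2, iterations=50)
print(selection_counts)
```

Genes that are selected in most iterations are the ones least sensitive to which minority samples happened to be duplicated, which is the motivation for repeating the selection many times rather than running it once.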


Fig 2. Evaluation of identified CRC genes on independent validation sets.

Accuracy and F1-score are plotted against the number of prominent genes used for training and validation. Confusion matrices and precision-recall curves (including AUC) are plotted using the top 40 prominent genes.
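For readers unfamiliar with the metrics named in these captions, the following sketch shows how accuracy, precision, recall, and F1-score follow from the four cells of a binary confusion matrix. The function names and the label vectors are hypothetical and the numbers are made up; they are not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = case and 0 = control."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Case-control example: 4 cases and 4 controls (illustrative labels only).
case_control = summarize([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])

# Case-only example: every true label is 1, so recall is the informative metric.
case_only = summarize([1, 1, 1, 1], [1, 1, 0, 1])

print(case_control)
print(case_only["recall"])
```

In a case-only dataset there are no controls, so false positives cannot occur and precision is uninformative; this is why recall alone is reported for those sets.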


Fig 3. Evaluation of the model trained on tumor and matched normal samples on case-only datasets.


Fig 4. Evaluation of identified IBD genes on independent validation sets.

Accuracy and F1-score are plotted against the number of prominent genes used for training and validation. Confusion matrices and precision-recall curves (including AUC) are plotted using the top 40 prominent genes.


Fig 5. IBD and CRC gene interaction networks generated by STRING for identified genes.

(a) Network generated from CRC and IBD genes without intermediary genes. (b) Network generated from CRC and IBD genes with intermediary genes.


Fig 6. Effect of different scalers on the CRC training and validation datasets.
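The caption does not say which scalers were compared. As a rough illustration of why the choice can matter, the sketch below applies three common scalers (standard, min-max, and a median/IQR "robust" scaler) to one made-up expression vector containing an outlier. The scaler selection and the function names are assumptions, not taken from the paper.

```python
from statistics import mean, median, pstdev, quantiles


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Map the minimum to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def robust_scale(values):
    """Subtract the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    center = median(values)
    return [(v - center) / (q3 - q1) for v in values]


# One gene measured in five samples; the last value is an outlier (made-up data).
expression = [2.0, 4.0, 6.0, 8.0, 100.0]

standard = standard_scale(expression)
min_max = min_max_scale(expression)
robust = robust_scale(expression)

for name, scaled in [("standard", standard), ("min-max", min_max), ("robust", robust)]:
    print(name, [round(v, 3) for v in scaled])
```

With min-max scaling the outlier compresses the other four values into a narrow band near 0, whereas the median/IQR scaler keeps them evenly spread. Because each dataset is scaled separately in the workflow, such differences can change how comparable the training and validation sets are.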



Grants and funding

This research was partially supported by the School of Graduate Studies of Memorial University and by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) (grant number RGPIN: 2019-05650).
