Improving generalization of machine learning-identified biomarkers using causal modelling with examples from immune receptor diagnostics



  • Frazer, K. A., Murray, S. S., Schork, N. J. & Topol, E. J. Human genetic variation and its contribution to complex traits. Nat. Rev. Genet. 10, 241–251 (2009).

  • Locke, W. J. et al. DNA methylation cancer biomarkers: translation to the clinic. Front. Genet. 10, 1150 (2019).

  • Byron, S. A., Van Keuren-Jensen, K. R., Engelthaler, D. M., Carpten, J. D. & Craig, D. W. Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat. Rev. Genet. 17, 257–271 (2016).

  • Huang, K., Wu, L. & Yang, Y. Gut microbiota: an emerging biological diagnostic and treatment approach for gastrointestinal diseases. JGH Open 5, 973–975 (2021).

  • Arnaout, R. A. et al. The future of blood testing is the immunome. Front. Immunol. 12, 626793 (2021).

  • Strimbu, K. & Tavel, J. A. What are biomarkers? Curr. Opin. HIV AIDS 5, 463–466 (2010).

  • Subbaswamy, A. & Saria, S. From development to deployment: dataset shift, causality and shift-stable models in health AI. Biostatistics 21, 345–352 (2020).

  • Castro, D. C., Walker, I. & Glocker, B. Causality matters in medical imaging. Nat. Commun. 11, 3673 (2020).

  • Whalen, S., Schreiber, J., Noble, W. S. & Pollard, K. S. Navigating the pitfalls of applying machine learning in genomics. Nat. Rev. Genet. 23, 169–181 (2021).

  • Dockès, J., Varoquaux, G. & Poline, J.-B. Preventing dataset shift from breaking machine-learning biomarkers. GigaScience 10, giab055 (2021).

  • Daumé, H. & Marcu, D. Domain adaptation for statistical classifiers. J. Artif. Intell. Res. 26, 101–126 (2006).

  • Kouw, W. M. & Loog, M. A review of domain adaptation without target labels. IEEE Trans. Pattern Anal. Mach. Intell. 43, 766–785 (2021).

  • Wang, J. et al. Generalizing to unseen domains: a survey on domain generalization. IEEE Trans. Knowl. Data Eng. 35, 8052–8072 (2023).

  • Gulrajani, I. & Lopez-Paz, D. In search of lost domain generalization. Preprint at https://arxiv.org/abs/2007.01434 (2020).

  • Liu, J. et al. Towards out-of-distribution generalization: a survey. Preprint at https://doi.org/10.48550/arXiv.2108.13624 (2023).

  • Pearl, J. Causality (Cambridge Univ. Press, 2009); https://doi.org/10.1017/CBO9780511803161

  • Peters, J., Janzing, D. & Schölkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms (MIT Press, 2017).

  • Hernán, M. & Robins, J. Causal Inference: What If (Chapman & Hall/CRC, 2020).

  • Rothenhäusler, D. & Bühlmann, P. Distributionally robust and generalizable inference. Stat. Sci. 38, 527–542 (2023).

  • Kaddour, J., Lynch, A., Liu, Q., Kusner, M. J. & Silva, R. Causal machine learning: a survey and open problems. Preprint at https://doi.org/10.48550/arXiv.2206.15475 (2022).

  • Heinze-Deml, C., Maathuis, M. H. & Meinshausen, N. Causal structure learning. Annu. Rev. Stat. Appl. 5, 371–391 (2018).

  • Squires, C. & Uhler, C. Causal structure learning: a combinatorial perspective. Found. Comput. Math. https://doi.org/10.1007/s10208-022-09581-9 (2022).

  • Peters, J., Bühlmann, P. & Meinshausen, N. Causal inference by using invariant prediction: identification and confidence intervals. J. R. Stat. Soc. B Stat. Methodol. 78, 947–1012 (2016).

  • Arjovsky, M., Bottou, L., Gulrajani, I. & Lopez-Paz, D. Invariant risk minimization. Preprint at https://doi.org/10.48550/arXiv.1907.02893 (2020).

  • Jiang, Y. & Veitch, V. Invariant and transportable representations for anti-causal domain shifts. Adv. Neural Inf. Process. Syst. 35, 20782–20794 (2022).

  • Magliacane, S. et al. Domain adaptation by using causal inference to predict invariant conditional distributions. Adv. Neural Inf. Process. Syst. 31, 10846–10856 (2018).

  • Schölkopf, B. et al. Toward causal representation learning. Proc. IEEE 109, 612–634 (2021).

  • Cui, P. & Athey, S. Stable learning establishes some common ground between causal inference and machine learning. Nat. Mach. Intell. 4, 110–115 (2022).

  • Bareinboim, E. & Pearl, J. Causal inference and the data-fusion problem. Proc. Natl Acad. Sci. USA 113, 7345–7352 (2016).

  • Richens, J. G., Lee, C. M. & Johri, S. Improving the accuracy of medical diagnosis with causal machine learning. Nat. Commun. 11, 3923 (2020).

  • Prosperi, M. et al. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nat. Mach. Intell. 2, 369–375 (2020).

  • Raita, Y., Camargo, C. A., Liang, L. & Hasegawa, K. Big data, data science and causal inference: a primer for clinicians. Front. Med. 8, 678047 (2021).

  • Schölkopf, B. et al. On causal and anticausal learning. In Proc. 29th International Conference on Machine Learning 459–466 (Omnipress, 2012).

  • Greiff, V., Yaari, G. & Cowell, L. Mining adaptive immune receptor repertoires for biological and clinical information using machine learning. Curr. Opin. Syst. Biol. https://doi.org/10.1016/j.coisb.2020.10.010 (2020).

  • Emerson, R. O. et al. Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire. Nat. Genet. 49, 659–665 (2017).

  • Kelly, C. J., Karthikesalingam, A., Suleyman, M., Corrado, G. & King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 17, 195 (2019).

  • Britanova, O. V. et al. Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling. J. Immunol. 192, 2689–2698 (2014).

  • Schneider-Hohendorf, T. et al. Sex bias in MHC I-associated shaping of the adaptive immune system. Proc. Natl Acad. Sci. USA 115, 2168–2173 (2018).

  • Slabodkin, A. et al. Individualized VDJ recombination predisposes the available Ig sequence space. Genome Res. 31, 2209–2224 (2021).

  • Dendrou, C. A., Petersen, J., Rossjohn, J. & Fugger, L. HLA variation and disease. Nat. Rev. Immunol. 18, 325–339 (2018).

  • Ishigaki, K. et al. HLA autoimmune risk alleles restrict the hypervariable region of T cell receptors. Nat. Genet. 54, 393–402 (2022).

  • Barennes, P. et al. Benchmarking of T cell receptor repertoire profiling methods reveals large systematic biases. Nat. Biotechnol. 39, 236–245 (2021).

  • Trück, J. et al. Biological controls for standardization and interpretation of adaptive immune receptor repertoire profiling. eLife 10, e66274 (2021).

  • Smirnova, A. O. et al. The use of non-functional clonotypes as a natural calibrator for quantitative bias correction in adaptive immune receptor repertoire profiling. eLife 12, e69157 (2023).

  • Krishna, C., Chowell, D., Gönen, M., Elhanati, Y. & Chan, T. A. Genetic and environmental determinants of human TCR repertoire diversity. Immun. Ageing 17, 26 (2020).

  • Klein, S. L. & Flanagan, K. L. Sex differences in immune responses. Nat. Rev. Immunol. 16, 626–638 (2016).

  • Castelo-Branco, C. & Soveral, I. The immune system and aging: a review. Gynecol. Endocrinol. 30, 16–22 (2014).

  • Hernán, M. A., Hsu, J. & Healy, B. A second chance to get causal inference right: a classification of data science tasks. Chance 32, 42–49 (2019).

  • Blaas, A., Miller, A., Zappella, L., Jacobsen, J.-H. & Heinze-Deml, C. Considerations for distribution shift robustness in health. In Proc. Machine Learning for Healthcare Workshop (ICLR, 2023).

  • Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739 (2010).

  • Bonaguro, L. et al. A guide to systems-level immunomics. Nat. Immunol. 23, 1412–1423 (2022).

  • Bareinboim, E. & Pearl, J. Controlling selection bias in causal inference. In Proc. 15th International Conference on Artificial Intelligence and Statistics Vol. 22 (eds Lawrence, N. et al.) 100–108 (PMLR, 2012).

  • Correa, J., Tian, J. & Bareinboim, E. Generalized adjustment under confounding and selection biases. In Proc. 32nd AAAI Conference on Artificial Intelligence Vol. 32, 6335–6342 (AAAI, 2018).

  • Laubach, Z. M., Murray, E. J., Hoke, K. L., Safran, R. J. & Perng, W. A biologist’s guide to model selection and causal inference. Proc. R. Soc. B Biol. Sci. 288, 20202815 (2021).

  • Hernán, M. A., Hernández-Díaz, S. & Robins, J. M. A structural approach to selection bias. Epidemiology 15, 615–625 (2004).

  • Zhang, K., Schölkopf, B., Muandet, K. & Wang, Z. Domain adaptation under target and conditional shift. In Proc. 30th International Conference on Machine Learning Vol. 28 (eds Dasgupta, S. et al.) 819–827 (PMLR, 2013).

  • Garg, S., Wu, Y., Balakrishnan, S. & Lipton, Z. C. A unified view of label shift estimation. Adv. Neural Inf. Process. Syst. 33, 3290–3300 (2020).

  • Pearl, J. & Bareinboim, E. External validity: from do-calculus to transportability across populations. Stat. Sci. 29, 579–595 (2014).

  • Degtiar, I. & Rose, S. A review of generalizability and transportability. Annu. Rev. Stat. Appl. 10, 501–524 (2023).

  • Sharon, E. et al. Genetic variation in MHC proteins is associated with T cell receptor expression biases. Nat. Genet. 48, 995–1002 (2016).

  • Jabri, B. & Sollid, L. M. T cells in celiac disease. J. Immunol. 198, 3005–3014 (2017).

  • Schaafsma, E., Fugle, C. M., Wang, X. & Cheng, C. Pan-cancer association of HLA gene expression with cancer prognosis and immunotherapy efficacy. Br. J. Cancer 125, 422–432 (2021).

  • Rappazzo, C. G. et al. Defining and studying B cell receptor and TCR interactions. J. Immunol. 211, 311–322 (2023).

  • Hendrycks, D., Lee, K. & Mazeika, M. Using pre-training can improve model robustness and uncertainty. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 2712–2721 (PMLR, 2019).

  • Pradier, M. F. et al. AIRIVA: a deep generative model of adaptive immune repertoires. Preprint at https://doi.org/10.48550/arXiv.2304.13737 (2023).

  • Gao, Y. et al. Pan-Peptide meta learning for T-cell receptor–antigen binding recognition. Nat. Mach. Intell. 5, 236–249 (2023).

  • Ostrovsky-Berman, M., Frankel, B., Polak, P. & Yaari, G. Immune2vec: embedding B/T cell receptor sequences in ℝ^N using natural language processing. Front. Immunol. 12, 680687 (2021).

  • Fang, Y., Liu, X. & Liu, H. Attention-aware contrastive learning for predicting T cell receptor–antigen binding specificity. Brief. Bioinform. 23, bbac378 (2022).

  • Gupta, G., Kapila, R., Gupta, K. & Raskar, R. Domain generalization in robust invariant representation. Preprint at https://doi.org/10.48550/arXiv.2304.03431 (2023).

  • Zhang, J. & Bottou, L. Learning useful representations for shifting tasks and distributions. In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) 40830–40850 (PMLR, 2023).

  • Walsh, I. et al. DOME: recommendations for supervised machine learning validation in biology. Nat. Methods 18, 1122–1127 (2021).

  • Wiles, O. et al. A fine-grained analysis on distribution shift. Preprint at https://arxiv.org/abs/2110.11328 (2021).

  • Byrd, J. & Lipton, Z. What is the effect of importance weighting in deep learning? In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 872–881 (PMLR, 2019).

  • Rubelt, F. et al. Adaptive Immune Receptor Repertoire Community recommendations for sharing immune-repertoire sequencing data. Nat. Immunol. 18, 1274–1278 (2017).

  • Vander Heiden, J. A. et al. AIRR community standardized representations for annotated immune repertoires. Front. Immunol. 9, 2206 (2018).

  • Peng, K. et al. Diversity in immunogenomics: the value and the challenge. Nat. Methods 18, 588–591 (2021).

  • Huang, Y.-N. et al. Ancestral diversity is limited in published T cell receptor sequencing studies. Immunity 54, 2177–2179 (2021).

  • Registered Reports (Center for Open Science); https://www.cos.io/initiatives/registered-reports

  • DeWitt, W. S. III et al. Human T cell receptor occurrence patterns encode immune history, genetic background and receptor specificity. eLife 7, e38358 (2018).

  • Zaslavsky, M. E. et al. Disease diagnostics using machine learning of immune receptors. Preprint at bioRxiv https://doi.org/10.1101/2022.04.26.489314 (2023).

  • Langenberg, C., Hingorani, A. D. & Whitty, C. J. M. Biological and functional multimorbidity—from mechanisms to management. Nat. Med. 29, 1649–1657 (2023).

  • Bongers, S., Forré, P., Peters, J. & Mooij, J. M. Foundations of structural causal models with cycles and latent variables. Ann. Stat. 49, 2885–2915 (2021).

  • Chakraborty, B. & Murphy, S. A. Dynamic treatment regimes. Annu. Rev. Stat. Appl. 1, 447–464 (2014).

  • Bizzarri, M. et al. A call for a better understanding of causation in cell biology. Nat. Rev. Mol. Cell Biol. 20, 261–262 (2019).

  • Baron, R. M. & Kenny, D. A. The moderator–mediator variable distinction in social psychological research: conceptual, strategic and statistical considerations. J. Pers. Soc. Psychol. 51, 1173–1182 (1986).

  • Greiff, V., Miho, E., Menzel, U. & Reddy, S. T. Bioinformatic and statistical analysis of adaptive immune repertoires. Trends Immunol. 36, 738–749 (2015).

  • Nikolich-Žugich, J., Slifka, M. K. & Messaoudi, I. The many important facets of T-cell repertoire diversity. Nat. Rev. Immunol. 4, 123–132 (2004).

  • Zarnitsyna, V., Evavold, B., Schoettle, L., Blattman, J. & Antia, R. Estimating the diversity, completeness, and cross-reactivity of the T cell repertoire. Front. Immunol. 4, 485 (2013).

  • Murugan, A., Mora, T., Walczak, A. M. & Callan, C. G. Statistical inference of the generation probability of T-cell receptors from sequence repertoires. Proc. Natl Acad. Sci. USA 109, 16161–16166 (2012).

  • Tonegawa, S. Somatic generation of antibody diversity. Nature 302, 575–581 (1983).

  • Weinstein, J. A., Jiang, N., White, R. A., Fisher, D. S. & Quake, S. R. High-throughput sequencing of the zebrafish antibody repertoire. Science 324, 807–810 (2009).

  • Xu, J. L. & Davis, M. M. Diversity in the CDR3 region of VH is sufficient for most antibody specificities. Immunity 13, 37–45 (2000).

  • Davis, M. M. & Bjorkman, P. J. T-cell antigen receptor genes and T-cell recognition. Nature 334, 395–402 (1988).

  • Brown, A. J. et al. Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires. Mol. Syst. Des. Eng. 4, 701–736 (2019).

  • Qi, Q. et al. Diversity and clonal selection in the human T-cell repertoire. Proc. Natl Acad. Sci. USA 111, 13139–13144 (2014).

  • Elhanati, Y. et al. Inferring processes underlying B-cell repertoire diversity. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370, 20140243 (2015).

  • Greiff, V. et al. A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status. Genome Med. 7, 49 (2015).

  • Elhanati, Y., Sethna, Z., Callan, C. G. Jr, Mora, T. & Walczak, A. M. Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination. Immunol. Rev. 284, 167–179 (2018).

  • Varoquaux, G. & Cheplygina, V. Machine learning for medical imaging: methodological failures and recommendations for the future. npj Digit. Med. 5, 48 (2022).

  • Ben-David, S. et al. A theory of learning from different domains. Mach. Learn. 79, 151–175 (2010).
