Personalized Impression Generation for PET Reports Using Large Language Models



  • R. D. Niederkohr et al., “Reporting Guidance for Oncologic 18F-FDG PET/CT Imaging,” J Nucl Med, vol. 54, no. 5, pp. 756–761, May 2013. https://doi.org/10.2967/jnumed.112.112177.

  • M. P. Hartung, I. C. Bickle, F. Gaillard, and J. P. Kanne, “How to Create a Great Radiology Report,” RadioGraphics, vol. 40, no. 6, pp. 1658–1670, Oct. 2020. https://doi.org/10.1148/rg.2020200020.

  • Y. Zhang, D. Y. Ding, T. Qian, C. D. Manning, and C. P. Langlotz, “Learning to Summarize Radiology Findings,” in Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, Brussels, Belgium: Association for Computational Linguistics, 2018, pp. 204–213. https://doi.org/10.18653/v1/W18-5623.

  • J. Hu, Z. Li, Z. Chen, Z. Li, X. Wan, and T.-H. Chang, “Graph Enhanced Contrastive Learning for Radiology Findings Summarization.” arXiv, Jun. 08, 2022. Accessed: Mar. 02, 2023. [Online]. Available: http://arxiv.org/abs/2204.00203

  • J.-B. Delbrouck, M. Varma, and C. P. Langlotz, “Toward expanding the scope of radiology report summarization to multiple anatomies and modalities.” arXiv, Nov. 18, 2022. Accessed: Mar. 02, 2023. [Online]. Available: http://arxiv.org/abs/2211.08584

  • Z. Liu et al., “Radiology-GPT: A Large Language Model for Radiology.” arXiv, Jun. 14, 2023. Accessed: Jul. 17, 2023. [Online]. Available: http://arxiv.org/abs/2306.08666

  • Z. Sun et al., “Evaluating GPT4 on Impressions Generation in Radiology Reports,” Radiology, vol. 307, no. 5, p. e231259, Jun. 2023. https://doi.org/10.1148/radiol.231259.

  • C. Ma et al., “ImpressionGPT: An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT.” arXiv, May 03, 2023. Accessed: Aug. 14, 2023. [Online]. Available: http://arxiv.org/abs/2304.08448

  • A. E. W. Johnson et al., “MIMIC-III, a freely accessible critical care database,” Sci Data, vol. 3, no. 1, p. 160035, May 2016. https://doi.org/10.1038/sdata.2016.35.

  • J. Hu et al., “Word Graph Guided Summarization for Radiology Findings.” arXiv, Dec. 18, 2021. Accessed: Mar. 02, 2023. [Online]. Available: http://arxiv.org/abs/2112.09925

  • A. Smit, S. Jain, P. Rajpurkar, A. Pareek, A. Y. Ng, and M. P. Lungren, “CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT.” arXiv, Oct. 18, 2020. Accessed: Aug. 27, 2023. [Online]. Available: http://arxiv.org/abs/2004.09167

  • A. B. Abacha, W. Yim, G. Michalopoulos, and T. Lin, “An Investigation of Evaluation Metrics for Automated Medical Note Generation.” arXiv, May 27, 2023. Accessed: Aug. 27, 2023. [Online]. Available: http://arxiv.org/abs/2305.17364

  • M. Kayaalp, A. C. Browne, Z. A. Dodd, P. Sagan, and C. J. McDonald, “De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports,” AMIA Annu Symp Proc, vol. 2014, pp. 767–776, 2014. PMID: 25954383; PMCID: PMC4419982.

  • S. M. Castellino et al., “Brentuximab Vedotin with Chemotherapy in Pediatric High-Risk Hodgkin’s Lymphoma,” N Engl J Med, vol. 387, no. 18, pp. 1649–1660, Nov. 2022. https://doi.org/10.1056/NEJMoa2206660.

  • Y. Wang et al., “Self-Instruct: Aligning Language Models with Self-Generated Instructions.” arXiv, May 25, 2023. Accessed: Aug. 14, 2023. [Online]. Available: http://arxiv.org/abs/2212.10560

  • R. Taori, I. Gulrajani, T. Zhang, et al., Stanford Alpaca: An Instruction-following LLaMA Model. [Online]. Available: https://github.com/tatsu-lab/stanford_alpaca. Accessed: Jun. 20, 2023.

  • M. Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.” arXiv, Oct. 29, 2019. Accessed: Mar. 07, 2023. [Online]. Available: http://arxiv.org/abs/1910.13461

  • J. Zhang, Y. Zhao, M. Saleh, and P. J. Liu, “PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.” arXiv, Jul. 10, 2020. Accessed: Mar. 07, 2023. [Online]. Available: http://arxiv.org/abs/1912.08777

  • C. Raffel et al., “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” arXiv, Jul. 28, 2020. Accessed: Aug. 14, 2023. [Online]. Available: http://arxiv.org/abs/1910.10683

  • J. Wei et al., “Finetuned Language Models Are Zero-Shot Learners.” arXiv, Feb. 08, 2022. Accessed: Aug. 15, 2023. [Online]. Available: http://arxiv.org/abs/2109.01652

  • H. Yuan, Z. Yuan, R. Gan, J. Zhang, Y. Xie, and S. Yu, “BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model.” arXiv, Apr. 22, 2022. Accessed: Aug. 15, 2023. [Online]. Available: http://arxiv.org/abs/2204.03905

  • Q. Lu, D. Dou, and T. H. Nguyen, “ClinicalT5: A Generative Language Model for Clinical Text,” in Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates: Association for Computational Linguistics, 2022, pp. 5436–5443. https://doi.org/10.18653/v1/2022.findings-emnlp.398.

  • A. E. W. Johnson et al., “MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports,” Sci Data, vol. 6, no. 1, p. 317, Dec. 2019. https://doi.org/10.1038/s41597-019-0322-0.

  • C. Chen et al., “bert2BERT: Towards Reusable Pretrained Language Models,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland: Association for Computational Linguistics, 2022, pp. 2134–2148. https://doi.org/10.18653/v1/2022.acl-long.151.

  • D. M. Ziegler et al., “Fine-Tuning Language Models from Human Preferences.” arXiv, Jan. 08, 2020. Accessed: Aug. 14, 2023. [Online]. Available: http://arxiv.org/abs/1909.08593

  • S. Zhang et al., “OPT: Open Pre-trained Transformer Language Models.” arXiv, Jun. 21, 2022. Accessed: Feb. 22, 2023. [Online]. Available: http://arxiv.org/abs/2205.01068

  • H. Touvron et al., “LLaMA: Open and Efficient Foundation Language Models.” arXiv, Feb. 27, 2023. Accessed: Aug. 14, 2023. [Online]. Available: http://arxiv.org/abs/2302.13971

  • I. Loshchilov and F. Hutter, “Decoupled Weight Decay Regularization.” arXiv, Jan. 04, 2019. Accessed: Aug. 31, 2023. [Online]. Available: http://arxiv.org/abs/1711.05101

  • E. J. Hu et al., “LoRA: Low-Rank Adaptation of Large Language Models.” arXiv, Oct. 16, 2021. Accessed: Aug. 15, 2023. [Online]. Available: http://arxiv.org/abs/2106.09685

  • W. Yuan, G. Neubig, and P. Liu, “BARTScore: Evaluating Generated Text as Text Generation.” arXiv, Oct. 27, 2021. Accessed: Aug. 15, 2023. [Online]. Available: http://arxiv.org/abs/2106.11520

  • Z. Huemann, C. Lee, J. Hu, S. Y. Cho, and T. Bradshaw, “Domain-adapted large language models for classifying nuclear medicine reports.” arXiv, Mar. 01, 2023. Accessed: Mar. 17, 2023. [Online]. Available: http://arxiv.org/abs/2303.01258

  • L. Smith et al., “Overview of BioCreative II gene mention recognition,” Genome Biol, vol. 9, no. S2, p. S2, Sep. 2008. https://doi.org/10.1186/gb-2008-9-s2-s2.

  • C.-Y. Lin, “ROUGE: A Package for Automatic Evaluation of Summaries,” in Text Summarization Branches Out, Barcelona, Spain: Association for Computational Linguistics, Jul. 2004, pp. 74–81. https://aclanthology.org/W04-1013/.

  • T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, “BERTScore: Evaluating Text Generation with BERT.” arXiv, Feb. 24, 2020. Accessed: Aug. 22, 2023. [Online]. Available: http://arxiv.org/abs/1904.09675

  • L. L. Wang et al., “Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations.” arXiv, May 23, 2023. Accessed: Aug. 22, 2023. [Online]. Available: http://arxiv.org/abs/2305.13693

  • K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL ’02), Philadelphia, Pennsylvania: Association for Computational Linguistics, 2002, pp. 311–318. https://doi.org/10.3115/1073083.1073135.

  • M. Popović, “chrF: character n-gram F-score for automatic MT evaluation,” in Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal: Association for Computational Linguistics, 2015, pp. 392–395. https://doi.org/10.18653/v1/W15-3049.

  • S. Banerjee and A. Lavie, “METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments,” in Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, Michigan: Association for Computational Linguistics, 2005, pp. 65–72.

  • R. Vedantam, C. L. Zitnick, and D. Parikh, “CIDEr: Consensus-based Image Description Evaluation.” arXiv, Jun. 02, 2015. Accessed: Aug. 31, 2023. [Online]. Available: http://arxiv.org/abs/1411.5726

  • J.-P. Ng and V. Abrecht, “Better Summarization Evaluation with Word Embeddings for ROUGE.” arXiv, Aug. 25, 2015. Accessed: Aug. 31, 2023. [Online]. Available: http://arxiv.org/abs/1508.06034

  • W. Zhao, M. Peyrard, F. Liu, Y. Gao, C. M. Meyer, and S. Eger, “MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China: Association for Computational Linguistics, 2019, pp. 563–578. https://doi.org/10.18653/v1/D19-1053.

  • B. Thompson and M. Post, “Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online: Association for Computational Linguistics, 2020, pp. 90–121. https://doi.org/10.18653/v1/2020.emnlp-main.8.

  • M. Peyrard, T. Botschen, and I. Gurevych, “Learning to Score System Summaries for Better Content Selection Evaluation.,” in Proceedings of the Workshop on New Frontiers in Summarization, Copenhagen, Denmark: Association for Computational Linguistics, 2017, pp. 74–84. https://doi.org/10.18653/v1/W17-4510.

  • M. Zhong et al., “Towards a Unified Multi-Dimensional Evaluator for Text Generation,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates: Association for Computational Linguistics, 2022, pp. 2023–2038. https://doi.org/10.18653/v1/2022.emnlp-main.131.

  • T. Scialom, S. Lamprier, B. Piwowarski, and J. Staiano, “Answers Unite! Unsupervised Metrics for Reinforced Summarization Models.” arXiv, Sep. 04, 2019. Accessed: Aug. 31, 2023. [Online]. Available: http://arxiv.org/abs/1909.01610

  • L. V. Lita, M. Rogati, and A. Lavie, “BLANC: learning evaluation metrics for MT,” in Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing – HLT ’05, Vancouver, British Columbia, Canada: Association for Computational Linguistics, 2005, pp. 740–747. https://doi.org/10.3115/1220575.1220668.

  • Y. Gao, W. Zhao, and S. Eger, “SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online: Association for Computational Linguistics, 2020, pp. 1347–1354. https://doi.org/10.18653/v1/2020.acl-main.124.

  • M. Grusky, M. Naaman, and Y. Artzi, “Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana: Association for Computational Linguistics, 2018, pp. 708–719. https://doi.org/10.18653/v1/N18-1065.

  • A. R. Fabbri, W. Kryściński, B. McCann, C. Xiong, R. Socher, and D. Radev, “SummEval: Re-evaluating Summarization Evaluation,” Transactions of the Association for Computational Linguistics, vol. 9, pp. 391–409, Apr. 2021. https://doi.org/10.1162/tacl_a_00373.
