Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C, Mishkin P, Zhang C, Agarwal S, Slama K. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems. 2022;35:27730–27744.
Kalyan KS, Rajasekharan A, Sangeetha S. AMMU: a survey of transformer-based biomedical pretrained language models. Journal of Biomedical Informatics. 2022;126:103982.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention Is All You Need. arXiv:1706.03762 (2017).
OpenAI. ChatGPT Release Notes. Available at: https://help.openai.com/en/articles/6825453-chatgpt-release-notes#h_4799933861. Last Accessed: December 22, 2023.
Tian S, Jin Q, Yeganova L, Lai P-T, Zhu Q, Chen X, Yang X, Chen Q, Kim W, Comeau DC, Islamaj R, Kapoor A, Gao X, Lu Z. Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health. arXiv:2306.10070 (2023).
Radford A, Narasimhan K. Improving Language Understanding by Generative Pre-Training. 2018. Available at: https://api.semanticscholar.org/CorpusID:49313245.
Cao Z, Wong K, Lin CT. Weak Human Preference Supervision for Deep Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2021;32(12):5369–5378. doi: https://doi.org/10.1109/TNNLS.2021.3084198.
Rafailov R, Sharma A, Mitchell E, Ermon S, Manning CD, Finn C. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv:2305.18290 (2023).
Gao L, Biderman S, Black S, Golding L, Hoppe T, Foster C, Phang J, He H, Thite A, Nabeshima N, Presser S, Leahy C. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv:2101.00027 (2020).
Meta AI Request Form. Available at: https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform Last Accessed: December 22, 2023.
Li Y, Li Z, Zhang K, Dan R, Jiang S, Zhang Y. ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge. Cureus. 2023;15(6):e40895. doi: https://doi.org/10.7759/cureus.40895.
Microsoft Bing Blog. Available at: https://blogs.bing.com/search/november-2023/our-vision-to-bring-microsoft-copilot-to-everyone-and-more. Last Accessed: December 24, 2023.
ZDNET Information. Available at: https://www.zdnet.com/article/what-is-copilot-formerly-bing-chat-heres-everything-you-need-to-know/. Last Accessed: December 24, 2023.
Avanade Insight. Available at: https://www.avanade.com/en/blogs/avanade-insights/health-care/ai-copilot. Last Accessed: December 24, 2023.
OpenAI. GPT-4 Technical Report. arXiv:2303.08774 (2023).
The Decoder. Available at: https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/. Last Accessed: December 22, 2023.
Lee P, Bubeck S, Petro J. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. N Engl J Med. 2023;388(13):1233–1239. doi: https://doi.org/10.1056/NEJMsr2214184.
Bhayana R, Bleakney RR, Krishna S. GPT-4 in Radiology: Improvements in Advanced Reasoning. Radiology. 2023;307(5):e230987. doi: https://doi.org/10.1148/radiol.230987.
Jang D, Yun TR, Lee CY, Kwon YK, Kim CE. GPT-4 can pass the Korean National Licensing Examination for Korean Medicine Doctors. PLOS Digit Health. 2023;2(12):e0000416. doi: https://doi.org/10.1371/journal.pdig.0000416.
Guerra GA, Hofmann H, Sobhani S, Hofmann G, Gomez D, Soroudi D, Hopkins BS, Dallas J, Pangal DJ, Cheok S, Nguyen VN, Mack WJ, Zada G. GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions. World Neurosurg. 2023;179:e160-e165. doi: https://doi.org/10.1016/j.wneu.2023.08.042.
Scheschenja M, Viniol S, Bastian MB, Wessendorf J, König AM, Mahnken AH. Feasibility of GPT-3 and GPT-4 for in-Depth Patient Education Prior to Interventional Radiological Procedures: A Comparative Analysis. Cardiovasc Intervent Radiol. 2023 Oct 23. doi: https://doi.org/10.1007/s00270-023-03563-2.
Spies NC, Hubler Z, Roper SM, Omosule CL, Senter-Zapata M, Roemmich BL, Brown HM, Gimple R, Farnsworth CW. GPT-4 Underperforms Experts in Detecting IV Fluid Contamination. J Appl Lab Med. 2023;8(6):1092–1100. doi: https://doi.org/10.1093/jalm/jfad058.
Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, Payne P, Seneviratne M, Gamble P, Kelly C, Babiker A, Schärli N, Chowdhery A, Mansfield P, Demner-Fushman D, Agüera y Arcas B, Webster D, Corrado GS, Matias Y, Chou K, Gottweis J, Tomasev N, Liu Y, Rajkomar A, Barral J, Semturs C, Karthikesalingam A, Natarajan V. Large language models encode clinical knowledge. Nature. 2023;620(7972):172–180. doi: https://doi.org/10.1038/s41586-023-06291-2.
Madaan A, Tandon N, Gupta P, Hallinan S, Gao L, Wiegreffe S, Alon U, Dziri N, Prabhumoye S, Yang Y, et al. Self-Refine: Iterative Refinement with Self-Feedback. arXiv:2303.17651 (2023).
Singhal K, Tu T, Gottweis J, Sayres R, Wulczyn E, Hou L, Clark K, Pfohl S, Cole-Lewis H, Neal D, Schaekermann M, Wang A, Amin M, Lachgar S, Mansfield P, Prakash S, Green B, Dominowska E, Aguera y Arcas B, Tomasev N, Liu Y, Wong R, Semturs C, Mahdavi S. Towards Expert-Level Medical Question Answering with Large Language Models. arXiv:2305.09617v1 (2023).
Tu T, Azizi S, Driess D, Schaekermann M, Amin M, et al. Towards Generalist Biomedical AI. arXiv:2307.14334v1 (2023).
Hippocratic AI. Available at https://www.hippocraticai.com/. Last Accessed: December 24, 2023.
Hugging Face. MPT-7B. Available at: https://huggingface.co/mosaicml/mpt-7b. Last Accessed: December 24, 2023.
Kauf C, Ivanova AA, Rambelli G, Chersoni E, She JS, Chowdhury Z, Fedorenko E, Lenci A. Event Knowledge in Large Language Models: The Gap Between the Impossible and the Unlikely. Cogn Sci. 2023;47(11):e13386. doi: https://doi.org/10.1111/cogs.13386.
Touvron H, Martin L, et al. LLaMA-2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288 (2023).
Ainslie J, Lee-Thorp J, de Jong M, Zemlyanskiy Y, Lebron F, Sanghai S. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4895–4901, Singapore. Association for Computational Linguistics. 2023.
Jiang AQ, Sablayrolles A, Mensch A, Bamford C, Singh Chaplot D, de las Casas D, Bressand F, Lengyel G, Lample G, Saulnier L, Lavaud LR, Lachaux MA, Stock P, Le Scao T, Lavril T, Wang T, Lacroix T, El Sayed W. Mistral 7B. arXiv:2310.06825 (2023).
An end-to-end guide on how to finetune an LLM (Mistral-7B) into a Medical Chat Doctor using Hugging Face. Available at: https://medium.com/@SachinKhandewal/finetuning-mistral-7b-into-a-medical-chat-doctor-using-huggingface-qlora-peft-5ce15d45f581. Last Accessed: December 22, 2023.
Mistral AI. Available at: https://mistral.ai/news/mixtral-of-experts/ Last Accessed: December 24, 2023.
Nijkamp E, Xie T, Hayashi H, Pang B, Xia C, Xing C, Vig J, Yavuz S, Laban P, Krause B, Purushwalkam S, Niu T, Kryściński W, Murakhovs’ka L, Choubey PK, Fabbri A, Liu Y, Meng R, Tu L, Bhat M, Wu C-S, Savarese S, Zhou Y, Joty S, Xiong C. XGen-7B Technical Report. arXiv:2309.03450 (2023).
Peng C, Yang X, Chen A, Smith KE, PourNejatian N, Costa AB, Martin C, Flores MG, Zhang Y, Magoc T, Lipori G, Mitchell DA, Ospina NS, Ahmed MM, Hogan WR, Shenkman EA, Guo Y, Bian J, Wu Y. A study of generative large language model for medical research and healthcare. NPJ Digit Med. 2023;6(1):210. doi: https://doi.org/10.1038/s41746-023-00958-w.
Cunningham H, Ewart A, Riggs L, Huben R, Sharkey R. Sparse Autoencoders Find Highly Interpretable Features in Language Models. arXiv:2309.08600 (2023).
Anthropic. Available at: https://www.anthropic.com/ Last Accessed: December 22, 2023.
Gemini Team, Google. Gemini: A Family of Highly Capable Multimodal Models. Available at: https://assets.bwbx.io/documents/users/iqjWHBFdfxIU/r7G7RrtT6rnM/v0. Last Accessed: January 31, 2024.
Yasunaga M, Leskovec J, Liang P. LinkBERT: Pretraining Language Models with Document Links. arXiv:2203.15827 (2022).
Khan RA, Jawaid M, Khan AR, Sajjad M. ChatGPT – Reshaping medical education and clinical management. Pak J Med Sci. 2023;39(2):605–607. doi: https://doi.org/10.12669/pjms.39.2.7653.
Sallam M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare (Basel). 2023;11(6):887. doi: https://doi.org/10.3390/healthcare11060887.
Eysenbach G. The Role of ChatGPT, Generative Language Models, and Artificial Intelligence in Medical Education: A Conversation With ChatGPT and a Call for Papers. JMIR Med Educ. 2023;9:e46885. doi: https://doi.org/10.2196/46885.
Cascella M, Cascella A, Monaco F, Shariff MN. Envisioning gamification in anesthesia, pain management, and critical care: basic principles, integration of artificial intelligence, and simulation strategies. J Anesth Analg Crit Care. 2023;3(1):33. doi: https://doi.org/10.1186/s44158-023-00118-2.
Haque A, Chowdhury N-U-R. The Future of Medicine: Large Language Models Redefining Healthcare Dynamics. TechRxiv. November 22, 2023. doi: https://doi.org/10.36227/techrxiv.24354451.v2.
Gurrapu S, Kulkarni A, Huang L, Lourentzou I, Batarseh FA. Rationalization for explainable NLP: a survey. Front Artif Intell. 2023;6:1225093. doi: https://doi.org/10.3389/frai.2023.1225093.
Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the Feasibility of ChatGPT in Healthcare: An Analysis of Multiple Clinical and Research Scenarios. J Med Syst. 2023;47(1):33. doi: https://doi.org/10.1007/s10916-023-01925-4.
Birkun AA, Gautam A. Large Language Model (LLM)-Powered Chatbots Fail to Generate Guideline-Consistent Content on Resuscitation and May Provide Potentially Harmful Advice. Prehosp Disaster Med. 2023;38(6):757–763. doi: https://doi.org/10.1017/S1049023X23006568.
Zúñiga Salazar G, Zúñiga D, Vindel CL, Yoong AM, Hincapie S, Zúñiga AB, Zúñiga P, Salazar E, Zúñiga B. Efficacy of AI Chats to Determine an Emergency: A Comparison Between OpenAI’s ChatGPT, Google Bard, and Microsoft Bing AI Chat. Cureus. 2023;15(9):e45473. doi: https://doi.org/10.7759/cureus.45473.
MIT Technology Review. Why Meta’s latest large language model survived only three days online. Available at: https://www.technologyreview.com/2022/11/18/1063487/meta-large-language-model-ai-only-survived-three-days-gpt-3-science/. Last Accessed: December 22, 2023.
Batarseh FA, Freeman L, Huang C-H. A survey on artificial intelligence assurance. J Big Data. 2021;8:7. doi: https://doi.org/10.1186/s40537-021-00445-7.
Manathunga S, Hettigoda I. Aligning Large Language Models for Clinical Tasks. arXiv:2309.02884 (2023).
Benary M, Wang XD, Schmidt M, Soll D, Hilfenhaus G, Nassir M, Sigler C, Knödler M, Keller U, Beule D, Keilholz U, Leser U, Rieke DT. Leveraging Large Language Models for Decision Support in Personalized Oncology. JAMA Netw Open. 2023;6(11):e2343689. doi: https://doi.org/10.1001/jamanetworkopen.2023.43689.
Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. 2019;1:206–215. doi: https://doi.org/10.1038/s42256-019-0048-x.
Madsen A, Reddy S, Chandar S. Post-hoc Interpretability for Neural NLP: A Survey. ACM Computing Surveys. 2022;55(8):1–42. doi: https://doi.org/10.1145/3546577.
Tran D, Liu J, Dusenberry MW, Phan D, Collier M, Ren J, Han K, Wang Z, Mariet Z, Hu H, Band N, Rudner TJG, Singhal K, Nado Z, van Amersfoort J, Kirsch A, Jenatton R, Thain N, Yuan H, Buchanan K, Murphy K, Sculley D, Gal Y. Plex: Towards Reliability Using Pretrained Large Model Extensions. arXiv:2207.07411 (2022).
Brown T, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
Lester B, Al-Rfou R, Constant N. The Power of Scale for Parameter-Efficient Prompt Tuning. arXiv:2104.08691 (2021).
Liang P, et al. Holistic Evaluation of Language Models. arXiv:2211.09110 (2022).