Considerations for Prompting Large Language Models

To the Editor: In their recent Research Letter in JAMA Oncology, Chen and colleagues¹ evaluated the large language model (LLM) GPT-3.5-turbo-0301, accessed via the ChatGPT (OpenAI) interface, for providing treatment recommendations in response to 104 prompts; board-certified oncologists assessed the responses for concordance with National Comprehensive Cancer Network guidelines. I commend the investigators for contributing in a meaningful way to the assessment of LLMs within the field of oncology, as well as for their transparency in providing ready access to their data and methods.
It is important to note, however, that the quality of a response from an LLM, at least insofar as it pertains to a chatbot, may be contingent on the specificity of the prompt as well as the number of tokens used in the response. Chatbots and other LLMs convert text into tokens. Loosely speaking, the total number of tokens is proportional to the amount of computational power used to interpret a query and produce a response. LLMs are designed to determine the most likely response based on a corpus of training data combined with sophisticated mathematical techniques. There are limitations on the number of tokens, and therefore the computational power, that a chatbot can use for any given response. OpenAI has provided resources on its site to assist in understanding this concept.²
A typical query in this study,¹ according to the supplemental material provided by the authors, is very broad in scope, asking, for instance, how to treat stage I breast cancer. The concern is that such breadth might lead to mixed results and is a suboptimal use of the chatbot. To adequately answer this question, an LLM might need to draw on a very large body of knowledge that could not be fully incorporated into a response within the constraints of this model. Nonetheless, I recognize that it is likely an accurate reflection of common prompts from clinicians or patients.
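The token constraint described above can be made concrete with a rough sketch. It is not the tokenizer used by any particular model: it assumes the common rule of thumb of about 4 characters per token for English text and a 4096-token window shared by prompt and response, and both prompts (including the clinical details in the second) are hypothetical illustrations rather than items from the study.

```python
import math

# Assumption: roughly 4 characters per token for English text. This is a
# rule of thumb, not the model's actual tokenizer.
CHARS_PER_TOKEN = 4

# Assumption: a 4096-token context window shared by prompt and response.
CONTEXT_WINDOW_TOKENS = 4096


def estimate_tokens(text: str) -> int:
    """Return a crude token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def response_budget(prompt: str, context_window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Estimate how many tokens remain for the response after the prompt."""
    return max(context_window - estimate_tokens(prompt), 0)


# A broad prompt of the kind discussed above (wording is illustrative).
broad_prompt = "How do you treat stage I breast cancer?"

# A hypothetical, more specific prompt; the clinical details are invented
# for illustration and do not come from the study.
specific_prompt = (
    "What adjuvant systemic therapy options are considered for a "
    "postmenopausal patient with stage I, hormone receptor-positive "
    "breast cancer after lumpectomy?"
)

for label, prompt in [("broad", broad_prompt), ("specific", specific_prompt)]:
    # Print the label, the estimated prompt tokens, and the remaining budget.
    print(label, estimate_tokens(prompt), response_budget(prompt))
```

Under these assumptions, the more specific prompt spends only slightly more of the shared budget, while narrowing the body of knowledge that the response has to cover within the remaining tokens.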
I believe that it is vital to include additional context to better ground ourselves in the realistic capacity of LLMs to give nuanced answers. In contrast, other recent work in oncology has obtained more detailed and specific answers by using more specific queries.³ As LLMs are used increasingly in clinical practice, it will become ever more important for physicians and other health care professionals to recognize the strengths and weaknesses of such technologies. LLMs can no doubt be useful tools, and the rapid acceleration of their capabilities indicates that they will only grow in prevalence in medicine, and therefore in oncology.
