A Game-Changer in Health Research: Large Language Models
Electronic Health Records (EHRs) have long been a valuable resource for health research, offering a wealth of data on patient history, treatment outcomes, and more. However, the traditional methods of extracting this data can be time-consuming and inefficient. A new study opens up a promising alternative, exploring the use of large language models (LLMs) to mine vital data from EHRs and improve research outcomes.
Unveiling Social Determinants of Health
The study focuses on extracting Social Determinants of Health (SDoH): factors such as socio-economic status, education, and neighborhood that significantly affect health outcomes. A second goal was identifying patients who could benefit from resource support. Researchers compared the performance of several LLMs, including BERT, Flan-T5, and ChatGPT models. The findings underscore the potential of LLMs to augment data collection in practice and to improve real-world evidence on SDoH.
The Superiority of Fine-tuned Models
Researchers found that the best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions and Flan-T5 XXL for adverse SDoH mentions. Adding LLM-generated synthetic data to model training also improved the performance of the smaller Flan-T5 models. Notably, the fine-tuned models were less prone to bias than generalist models, a significant step toward reducing algorithmic bias, which is a common and problematic issue in AI-based health research.
Powerful Predictive Potential
These specialized LLMs identified 93.8 percent of patients with adverse SDoH who could benefit from additional support. That figure describes how many affected patients the models caught (their sensitivity), not their overall accuracy. This capability could assist clinicians with complex cases and support proactive patient care. In cancer care, for example, flagging nearly all patients with adverse SDoH would significantly strengthen efforts to support them early.
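To make the "share of patients identified" figure concrete, here is a minimal Python sketch of how such a capture rate could be computed. It is an illustration only: the function name `capture_rate`, the patient IDs, and both sets of flags are hypothetical and are not data or code from the study.

```python
def capture_rate(true_adverse, flagged):
    """Fraction of patients with adverse SDoH that a method flags (sensitivity)."""
    true_adverse = set(true_adverse)
    if not true_adverse:
        raise ValueError("reference set contains no patients with adverse SDoH")
    # Only patients who truly have adverse SDoH count toward the numerator;
    # false positives in `flagged` do not change this metric.
    return len(true_adverse & set(flagged)) / len(true_adverse)


# Hypothetical toy cohort (NOT study data): five patients with adverse SDoH.
reference = {"p01", "p02", "p03", "p04", "p05"}
model_flags = {"p01", "p02", "p03", "p04", "p09"}  # p09 is a false positive
baseline_flags = {"p02"}                           # a hypothetical sparse baseline

model_rate = capture_rate(reference, model_flags)
baseline_rate = capture_rate(reference, baseline_flags)
print(f"model: {model_rate:.0%}, baseline: {baseline_rate:.0%}")
```

Because the metric only counts affected patients who were found, a high capture rate says nothing by itself about false alarms; a fuller evaluation would report precision as well.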
Challenges and Considerations
Despite the promise, the use of LLMs in healthcare also presents challenges. The accessibility of large language models like GPT-4 and potential bias remain concerns. Moreover, operationalizing and integrating AI into clinical practice requires careful consideration. Factors like patient consent, minimizing bias, and understanding how patients want to interact with language models are critical. As we move forward, patient inclusion in the development and implementation of AI in healthcare will be essential.
The Way Forward
The findings of this study suggest that LLMs, especially when fine-tuned and used with synthetic data, can significantly improve the extraction of SDoH from EHRs. By capturing almost 94% of patients with adverse SDoH compared to 2% with standard EHR practice, these models present an exciting opportunity to enhance health research outcomes and patient care. As we continue to explore and refine these models, they could be transformative in identifying and addressing social determinants of health.