The Rising Role of Large Language Models
In healthcare, the potential of artificial intelligence, and of Large Language Models (LLMs) in particular, is increasingly recognized. One application is identifying Social Determinants of Health (SDoH) in Electronic Health Records (EHRs). SDoH are the conditions in which people are born, grow, live, work, and age, and they have a significant impact on health outcomes. However, SDoH are often sparsely documented in EHRs, especially in structured fields, and what information exists tends to be buried in free-text clinical notes. This makes SDoH hard for healthcare providers and researchers to use at scale.
Performance of LLMs in Extracting SDoH
Recent studies have demonstrated that LLMs can enable high-throughput extraction of SDoH from EHRs, supporting both research and clinical care. Fine-tuned models captured far more SDoH information than structured diagnostic codes, outperformed traditional BERT classifiers, and showed less bias than models from the ChatGPT family. Specifically, the fine-tuned Flan-T5 XL model performed best at detecting any SDoH mention, and the fine-tuned Flan-T5 XXL model performed best at detecting adverse SDoH mentions.
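Both tasks mentioned above (flagging any SDoH mention versus only adverse ones) can be framed as text-to-text generation, which is how instruction-tuned models such as Flan-T5 are typically used. The sketch below is a minimal illustration under stated assumptions: the prompt template, the category list, and the `build_prompt`, `parse_labels`, and `label_sentence` helpers are hypothetical, and `fake_model` is a keyword stub standing in for a real fine-tuned model. It is not the pipeline used in the studies.

```python
from typing import Callable, FrozenSet

# Illustrative category set for this sketch (an assumption, not a fixed standard).
SDOH_CATEGORIES = ("employment", "housing", "transportation",
                   "parental status", "relationship", "social support")


def build_prompt(sentence: str, adverse_only: bool) -> str:
    """Wrap one note sentence in a hypothetical instruction template."""
    task = "adverse social determinants of health" if adverse_only else "social determinants of health"
    options = ", ".join(SDOH_CATEGORIES)
    return (f"List the {task} mentioned in this sentence, choosing from: {options}. "
            f"Answer 'none' if there are none.\nSentence: {sentence}")


def parse_labels(generated: str) -> FrozenSet[str]:
    """Map comma-separated model output to known categories, dropping anything else."""
    text = generated.strip().lower()
    if text in ("", "none"):
        return frozenset()
    parts = (part.strip() for part in text.split(","))
    return frozenset(part for part in parts if part in SDOH_CATEGORIES)


def label_sentence(sentence: str, generate: Callable[[str], str],
                   adverse_only: bool = False) -> FrozenSet[str]:
    """Run any text-in/text-out model (e.g. a fine-tuned Flan-T5) on one sentence."""
    return parse_labels(generate(build_prompt(sentence, adverse_only)))


def fake_model(prompt: str) -> str:
    """Keyword stand-in for a real model, for demonstration only."""
    sentence = prompt.rsplit("Sentence: ", 1)[-1].lower()
    return "housing" if "homeless" in sentence else "none"


print(label_sentence("Patient is currently homeless.", fake_model, adverse_only=True))
# prints frozenset({'housing'})
```

Keeping the model behind a plain `Callable[[str], str]` lets the same parsing code wrap a locally fine-tuned model or an API-based one, and discarding unknown strings in `parse_labels` guards against free-form generations that fall outside the label set.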
Overcoming Data Limitations with Synthetic Text Generation
Class imbalance and limited annotated data are key obstacles for models that identify SDoH: some SDoH categories are rarely mentioned in clinical notes, which leaves few training examples for them. To address this, researchers have explored synthetic text generation, which improved the performance of the smaller Flan-T5 models. This result further underscores the potential of LLMs to strengthen real-world evidence on SDoH, which can help identify patients who could benefit from resource support.
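One simple way to apply synthetic text to class imbalance is to top up each under-represented category until it reaches a target count. This sketch assumes one label per sentence and uses a hypothetical `template_generator` in place of the LLM prompting a real pipeline would rely on; the function names, example sentences, and target count are illustrative and not taken from the study.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (sentence, label)


def augment_minority_classes(data: Sequence[Example], labels: Sequence[str],
                             generate_sentence: Callable[[str, int], str],
                             target_per_class: int) -> List[Example]:
    """Top up every label to target_per_class examples with synthetic sentences."""
    counts = Counter(label for _, label in data)
    augmented = list(data)
    for label in labels:
        # Labels absent from `data` have count 0 and receive the full target.
        shortfall = max(0, target_per_class - counts[label])
        for i in range(shortfall):
            augmented.append((generate_sentence(label, i), label))
    return augmented


def template_generator(label: str, i: int) -> str:
    """Hypothetical stand-in for prompting an LLM to write a sentence about `label`."""
    return f"Synthetic note {i}: patient reports a concern about {label}."


train = [("Patient is homeless.", "housing"),
         ("Staying in a shelter.", "housing"),
         ("Evicted last month.", "housing"),
         ("No car to get to appointments.", "transportation")]

balanced = augment_minority_classes(train, ["housing", "transportation", "social support"],
                                    template_generator, target_per_class=3)
print(Counter(label for _, label in balanced))
```

Only the training split should be augmented this way; evaluating on real, held-out notes keeps synthetic sentences from inflating the reported performance.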
Addressing Algorithmic Bias
A critical consideration in developing and deploying these models is algorithmic bias. One study found that fine-tuned models were less likely than ChatGPT to change their predictions when race, ethnicity, and gender descriptors were added to the text. Because such descriptors should not change whether a sentence mentions an SDoH, this stability suggests less algorithmic bias and more reliable predictions. Reducing bias in these models is essential to ensuring equitable healthcare outcomes and resource allocation.
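The descriptor test described above can be expressed as a prediction flip rate: insert a demographic descriptor into a sentence that otherwise stays the same, and count how often the predicted labels change. This is a minimal sketch assuming a `{who}` placeholder template scheme; the templates, descriptors, and the toy `robust` and `sensitive` classifiers are hypothetical and exist only to exercise the metric, and the study's exact procedure may differ.

```python
from typing import Callable, FrozenSet, Sequence

Classifier = Callable[[str], FrozenSet[str]]


def flip_rate(templates: Sequence[str], descriptors: Sequence[str],
              classify: Classifier) -> float:
    """Share of (template, descriptor) pairs whose labels differ from the neutral sentence."""
    total = changed = 0
    for template in templates:
        baseline = classify(template.format(who="patient"))
        for descriptor in descriptors:
            total += 1
            if classify(template.format(who=f"{descriptor} patient")) != baseline:
                changed += 1
    return changed / total if total else 0.0


def robust(sentence: str) -> FrozenSet[str]:
    """Toy classifier that only looks at SDoH content."""
    labels = set()
    if "shelter" in sentence:
        labels.add("housing")
    if "works" in sentence:
        labels.add("employment")
    return frozenset(labels)


def sensitive(sentence: str) -> FrozenSet[str]:
    """Toy classifier that (wrongly) drops employment when the word 'female' appears."""
    labels = set(robust(sentence))
    if "female" in sentence.split():
        labels.discard("employment")
    return frozenset(labels)


templates = ["The {who} lives in a shelter.", "The {who} works full time."]
descriptors = ["Black female", "white male", "Hispanic female"]
print(flip_rate(templates, descriptors, robust))               # prints 0.0
print(round(flip_rate(templates, descriptors, sensitive), 2))  # prints 0.33
```

Comparing each perturbed sentence with its own neutral baseline, rather than with gold labels, isolates the effect of the descriptor: any change is attributable to the demographic wording alone.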
Standardization and Integration of SDoH Data
Despite these advances in using LLMs to identify SDoH from EHRs, SDoH data are still not recorded in a standardized way, which makes them difficult to integrate into healthcare systems. Collaboration among organizations and shared data standards can help overcome this hurdle and improve how SDoH factors are identified and addressed.
Conclusion
In conclusion, LLMs show promising potential in healthcare, particularly for identifying SDoH in EHRs. These models can enhance real-world data collection, support patient care, and enable proactive resource allocation. Ongoing efforts should focus on overcoming data limitations, reducing algorithmic bias, and standardizing SDoH data so that it can be integrated seamlessly into healthcare systems.