Virtual assistants (VAs)—mostly voice assistants, chatbots, and dialogue-based interactive applications—have been the leading conversational technologies used for health care communications and remote monitoring [-]. However, their accuracy and reliability in understanding and responding to medical questions have been a limitation [], albeit one that has slowly improved over the years []. Large language models (LLMs) offer scalable and customizable solutions to these limitations of VAs. The body of literature demonstrating the capabilities of LLMs in medicine and health care has been growing, and a number of studies benchmarking LLMs’ performance against each other or against humans’ medical knowledge, decision-making processes, and empathic responses have been published [-]. Furthermore, LLM-based services improve equitable access to information and reduce language barriers via contextual and culturally aware systems [] and privacy-preserving, local solutions for low-resource settings [-]. This research and evidence give a glimpse of the future of personalized VAs.

In the context of mental health and health information–seeking behavior, we investigated the performance of VAs in responding to postpartum depression–related frequently asked questions. The evidence from our two studies (conducted in 2021 with VAs [] and in 2023 with LLMs []) provides comparable findings across a 2-year difference in technology; illuminates the evolving roles of artificial intelligence, natural language processing, and LLMs in health care; and shows the promise of a more accurate and reliable digital health landscape.

In our first study in 2021 [], we investigated the clinical accuracy of Google Assistant, Amazon Alexa, Microsoft Cortana, and Apple Siri voice assistant responses to postpartum depression–related questions. In our second study in 2023 [], we replicated our research by using LLMs, and the new study showed significant improvements in the accuracy and clinical relevance of responses. Specifically, GPT-4’s responses to all postpartum depression–related questions were more accurate and clinically relevant than those of the VAs, for which the proportion of clinically relevant responses did not exceed 29% in our earlier study. In addition, the interrater reliability score for LLMs (GPT-4: κ=1; P<.05) was higher than that for VAs (κ=0.87; P<.001), underscoring LLMs’ consistency in providing clinically relevant responses, which is vital for health care applications. LLMs also recommended consultations with health care providers, a feature that adds an extra layer of safety and was not observed in our earlier study with VAs. This dramatic improvement suggests a paradigm shift in the capabilities of digital health tools for health information–seeking activities. The high clinical accuracy and reliability of LLMs point toward a promising future for their integration into existing VA platforms. LLMs can offer dynamic adaptability for VAs via custom applications and decentralized LLM architectures [,]. Given their openness to open-source development and collaboration, LLMs could serve as cost-effective and inclusive frameworks for collaborative development (ie, among technology providers, patients, and clinical experts) in fine-tuning and training VAs for specific medical purposes.
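To make the consultation-recommendation safety layer concrete, the following is a minimal sketch of how an LLM could be wrapped as a VA back end in a custom application. It assumes the OpenAI Python client; the model name, system prompt wording, and example question are illustrative assumptions rather than the protocol used in our studies.

```python
# A minimal sketch of an LLM-backed VA response pipeline with a safety layer.
# Assumes the OpenAI Python client (pip install openai) and an OPENAI_API_KEY
# environment variable; prompt wording and model name are illustrative only.
from openai import OpenAI

client = OpenAI()

SAFETY_PROMPT = (
    "You answer postpartum depression-related questions for a consumer "
    "health assistant. Always recommend consulting a health care provider, "
    "and never provide a diagnosis or prescribe treatment."
)

def answer_health_question(question: str) -> str:
    """Return an LLM response constrained by the safety system prompt."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; any chat-capable model could be used
        messages=[
            {"role": "system", "content": SAFETY_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(answer_health_question("What are the symptoms of postpartum depression?"))
```

In a sketch like this, the system prompt is one place where clinical experts, patients, and technology providers could collaboratively encode referral and escalation rules alongside fine-tuning.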

The empirical data from our studies, as well as the literature [], indicate a compelling trajectory toward LLMs being used to potentially improve the clinical and instructional capabilities of conversational technologies. This suggests a shift in our earlier spectrum model for VAs in health care (Figure 1) [], in which we proposed 4 service levels for a spectrum of VA use that were associated with the risk, value, and impact of VAs. These levels were the “information” (eg, asking Amazon Alexa to start self-care guidance), “assistance” (eg, setting up reminders for medication or self-therapy), “assessment” (eg, identification, detection, prediction with digital biomarkers, and management), and “support” (eg, prescribing, substituting, or supplementing medication and therapy tools) levels. In 2020, the evidence on the use of VAs in health care indicated that VAs were at the “information” and “assistance” levels []. However, LLMs are opening up opportunities for VAs, potentially toward the “assessment” and “support” levels. As Figure 1 shows, the level of a service and the associated risk, value, and impact of the service can change based on the targeted problems and solutions. Within a digital health ecosystem, we may envision a future of support from VAs enhanced by LLMs with speech interaction and audio-based sensing capabilities []. Such enhancements may include quantifying human behavior–related factors beyond VA engagement, such as social engagement, emotions, neurodevelopmental and behavioral health, sleep health (snoring, heart rate, and movements), respiratory symptoms (sneezing and coughing), and motion (gait, exercise, and sedentary behavior).
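As a purely illustrative encoding of this spectrum, the sketch below maps the 4 service levels to an ordered type; the names, example mappings, and oversight rule are assumptions added for illustration and are not part of the original model.

```python
from enum import IntEnum

class VAServiceLevel(IntEnum):
    """The 4 service levels of the spectrum model; the ordering mirrors
    increasing risk, value, and impact (illustrative encoding only)."""
    INFORMATION = 1  # eg, asking a voice assistant to start self-care guidance
    ASSISTANCE = 2   # eg, setting reminders for medication or self-therapy
    ASSESSMENT = 3   # eg, detection and prediction with digital biomarkers
    SUPPORT = 4      # eg, prescribing or supplementing medication and therapy

def requires_clinical_oversight(level: VAServiceLevel) -> bool:
    # Hypothetical rule: higher-risk levels warrant clinician involvement;
    # the threshold would depend on the targeted problem and solution.
    return level >= VAServiceLevel.ASSESSMENT

assert requires_clinical_oversight(VAServiceLevel.SUPPORT)
assert not requires_clinical_oversight(VAServiceLevel.INFORMATION)
```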

Figure 1. Spectrum of virtual assistants (outlining the risk, value, and impact of health care services) and applications in digital health ecosystems. These can change based on the targeted problems and solutions. This figure was created with BioRender.com (BioRender).

Despite this promising horizon, we need to proceed cautiously. As LLMs become highly appealing tools for use as VAs in health care, it is imperative to establish a platform that facilitates democratized access and interdisciplinary collaboration during the development of such applications [,]. This platform should be designed to bring together a diverse range of stakeholders, including technologists, ethicists, researchers, health care professionals, and patients. This would ensure that the development and integration of VAs are guided by a balanced perspective that considers ethical guidelines and regulatory oversight [,], governance principles [,], privacy and safety measures [], feasibility, efficacy, and patient-centric approaches and assessment methods [,]. By prioritizing such collaborative and inclusive dialogues, we can better navigate the complex challenges and harness the full potential of these advanced technologies in health care, ensuring that they are developed responsibly, ethically, and in alignment with the diverse needs of all users.

The author thanks Yasemin Sezgin for her constructive feedback. Figure 1 was created with BioRender.com (BioRender).

ES serves on the editorial board of JMIR Publications.

Edited by T Leung, T de Azevedo Cardoso; this is a non–peer-reviewed article. Submitted 29.09.23; accepted 02.01.24; published 19.01.24.

©Emre Sezgin. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 19.01.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.


