Will Generative Artificial Intelligence Deliver on Its Promise in Health Care?

Importance
Since the introduction of ChatGPT in late 2022, generative artificial intelligence (genAI) has elicited enormous enthusiasm and serious concerns.

Observations
History has shown that general purpose technologies often fail to deliver their promised benefits for many years (“the productivity paradox of information technology”). Health care has several attributes that make the successful deployment of new technologies even more difficult than in other industries; these have challenged prior efforts to implement AI and electronic health records. However, genAI has unique properties that may shorten the usual lag between implementation and productivity and/or quality gains in health care. Moreover, the health care ecosystem has evolved to make it more receptive to genAI, and many health care organizations are poised to implement the complementary innovations in culture, leadership, workforce, and workflow often needed for digital innovations to flourish.

Conclusions and Relevance
The ability of genAI to rapidly improve and the capacity of organizations to implement complementary innovations that allow IT tools to reach their potential are more advanced than in the past; thus, genAI is capable of delivering meaningful improvements in health care more rapidly than was the case with previous technologies.

Since they became publicly available in late 2022, generative artificial intelligence (genAI) tools such as ChatGPT have elicited enormous enthusiasm, as well as concern, in all sectors of the economy. But the potential impact of genAI in health care seems particularly noteworthy. In a field in which an estimated 30% of the $4.3 trillion spent each year in the US adds little to no value,¹ in which many tens of thousands of people die yearly from preventable mistakes,²,³ and in which access to care is fragmented and inequities are commonplace, it is natural to be enthusiastic about the potential for genAI to improve quality, efficiency, equity, and patient experience.

While optimism regarding genAI is certainly warranted, so too is skepticism. In 1993, one of us (E.B.) described “the productivity paradox of information technology,” citing repeated examples in various fields in which promising technologies initially failed, sometimes for decades, to deliver on the promise of improving productivity.⁴ Health care’s recent experience with digital transformation, largely through the implementation of electronic health records (EHRs), has hewed closely to the “productivity paradox” model.⁵

In this article, we describe the productivity paradox and its causes, then address the question of whether a similar paradox will occur with genAI in health care. Although history would say yes, there are unique aspects of both genAI and health care’s current context that are likely to help address the challenges. If they do, genAI may deliver on its promise in health care within a few years, not decades.

The Productivity Paradox of Information Technology

Research on the productivity paradox has demonstrated that there are 2 fundamental reasons that technologies don’t rapidly create their promised value.⁶ The first is that early versions of many technologies are flawed; the tools that ultimately succeed are ones that have improved with successive iterations.

While improvements in the technology are crucial, research has shown that the second, and more important, factor in overcoming the paradox relates to the processes, structure, and culture of the workplace. Humans, unfortunately, are generally unable to appreciate or implement the profound changes in organizational structure, leadership, workforce, and workflow needed to take full advantage of new technologies, at least at first. The iconic example comes from the invention of the electric motor in the late 19th century.⁷ Despite its obvious advantages over previous sources of power, decades passed before electrification transformed manufacturing and delivered significant productivity gains. This was largely because leaders failed to appreciate that reaping the benefits of the electric motor required rethinking the way that factories were laid out, work was organized, and workers were trained and socialized. These leaders typically replaced their big steam engine with a big electric motor, which did little to boost productivity. It was only when they switched from “group drive” (with one motor powering many machines) to “unit drive” (each machine having its own motor) that productivity doubled or tripled, as new factory layouts led to streamlined workflows. Over the subsequent 150 years, this scenario has repeated itself many times, the timeless lesson being that overcoming the productivity paradox requires complementary innovations in the way work is performed, sometimes referred to as “reimagining the work.”⁸

When considering whether genAI will deliver on its promise in health care, one way to shape the conversation is around 2 critical factors. First, is there something about genAI, compared with previous technologies, that will hasten iterative improvements in the technology? Second, is there something about the intersection of genAI and the current health care ecosystem that will accelerate the development of complementary skills and processes, or partly obviate the need for them?

The Productivity Paradox in Health Care

Before addressing the productivity paradox as it relates to genAI, it is worth reviewing the paradox as it has asserted itself with prior health care technologies. Note that we are referring here to general purpose technologies, those that influence multiple tasks and specialties, rather than more narrowly focused technologies such as laparoscopic surgery or the tools of interventional radiology, whose implementation tends to be far easier.⁸ But the struggle to implement general purpose technologies can pay off handsomely: since the Industrial Revolution, the biggest drivers of sustained productivity growth have been these technologies, from the steam engine and electricity to computers and now, perhaps, AI.

Over the past 15 years, health care’s dominant general technological transformation came with the implementation of EHRs.⁵ EHR adoption was accelerated by federal incentive payments under the HITECH Act, passed during the Great Recession of 2008 and implemented beginning in 2010.⁹ In 2009, when HITECH became law, fewer than 1 in 10 US hospitals had an EHR; a decade later, fewer than 1 in 10 did not.¹⁰

While EHRs have cut the rate of medication errors and delivered numerous other benefits,¹¹ the evidence that they have improved productivity is mixed, particularly when factoring in the EHR-associated increase in clinicians’ documentation burden.¹²⁻¹⁴ The latest unanticipated consequence is the explosion in electronic messages coming from the patient portal to the physician’s EHR inbox.¹⁵ Clinicians often cite the EHR as a key factor in their dissatisfaction with work and high levels of burnout.¹⁶

As we reflect on the challenges experienced with the implementation of EHRs, it is also worth appreciating that health care tried to implement AI decades before genAI became available. In the 1960s, 1970s, and early 1980s, several companies and academic groups developed AI tools designed to assist (or replace) clinicians as diagnosticians. None of them proved helpful enough to achieve broad acceptance or commercial viability, which led to a decades-long “AI winter,” in which interest and investment in health care AI markedly slowed.¹⁷

The field showed signs of life again in 2011, when IBM’s Watson AI software, so impressive in its Jeopardy! victory, turned its attention to health care. However, a decade later, after failing to demonstrate meaningful value to health care organizations,¹⁸ Watson Health was sold to an investment firm “for parts.”¹⁹

The Particular Challenges of Digital Implementation in Health Care

The prior failures of AI in health care and the bumpy first decade of the EHR era are not surprising. In fact, health care presents some challenges to digital transformation even more daunting than those seen in other industries.

First, health care is highly regulated, with vexing and often contentious (and litigious) debates related to data ownership. Moreover, powerful privacy regulations markedly restrict the data sharing that is essential to genAI.²⁰

Second, the market for EHRs has become highly concentrated.²¹ With a handful of companies now “owning” the desktops of the vast majority of health care providers, non-EHR companies specializing in genAI-related tools experience significant barriers to entry.

Third, there are enormous numbers of players in health care, including physicians, hospitals, health plans, employers, pharmaceutical companies, device manufacturers, and government.²² This makes the successful implementation of genAI far more complex than in a direct-to-consumer industry, where a tool might only need to improve the experience of an individual consumer (who pays for it out of pocket or via advertisements), or a direct sale to a company, which decides whether the tool improves its competitive position without the complexities introduced by high levels of regulation or third-party payments. In addition, many of the incumbents in health care have massive political power, able to block technological innovations that they believe might compromise patient care or their economic position.

Fourth, health care data are very messy and often vary depending on their primary purpose (eg, clinical documentation, quality reporting, regulatory compliance, billing).²³ This makes using any one dataset as a source of “truth” for AI algorithms potentially problematic.

Fifth, the health care field is anything but static, with new research constantly leading to shifts in understanding and practice that need to be integrated into care recommendations and protocols. This means that an AI algorithm generated from historical records may prove to be outdated and even dangerous.

Finally, while “fail fast and iterate” is a reasonable mantra for a consumer-facing app, the stakes in health care are too high to tolerate flaws in the output of information technology (IT) tools that could result in patient harm. Moreover, if the use of an IT tool leads to a patient death, there is likely to be mainstream and social media attention, and potentially a malpractice case, to remind everyone of the risks.

Will GenAI Overcome the Productivity Paradox in Health Care?

We have made the case that the productivity paradox of IT is commonplace in many businesses, that prior general purpose technologies in health care have fallen victim to the paradox, that the performance of AI in health care to date has been underwhelming, and that there are unique characteristics of the health care ecosystem that may make it even more challenging to overcome the paradox than in other industries. None of this would seem to predict that genAI in health care will rapidly achieve its promise. Nevertheless, we see several reasons to believe that genAI will lead to productivity and/or quality gains more quickly than those achieved by previous tools and in previous eras (Table).

Characteristics of GenAI That May Facilitate Rapid Improvements

First, genAI-related tools are remarkably easy to use. While the output of genAI improves as users learn how to create good prompts, the tools require relatively little user expertise. The unprecedented adoption curve of ChatGPT and subsequent versions of language-based generative AI (100 million users in the first 2 months) is partly due to the fact that no special training is required.²⁴ Within months of its public launch, millions of knowledge workers were entering plain English prompts and finding that genAI was helping them to draft documents, write software code, and create graphics, often without the assistance, or even the approval, of their companies.

Second, the fact that genAI can be delivered via software to users’ computers also serves to accelerate adoption.²⁵ Contrast this with the implementation of EHRs, a transition that required a large investment in hardware and a wholesale change in the way most health care work, both clinical and back-office, was organized. Moreover, there is now a vibrant ecosystem of venture capital–funded health care start-ups that were working on tackling a variety of health care problems at the time GPT was released.²⁶ We are aware of at least a dozen of these companies that added genAI capacity to their offerings within a few months, many with impressive results.

Third, even as EHR vendors seem to “own” the real estate of most health care workers’ computers, advances in application programming interfaces (APIs) and plug-in technologies will make it easier to achieve a relatively seamless interface between EHRs and genAI applications developed by third parties.²⁷ Trying to stay ahead of this potential competitive threat, major EHR vendors are also rapidly integrating genAI into their own software offerings.²⁸,²⁹

Fourth, historically one of the key factors in overcoming the productivity paradox was the speed with which the technology underwent iterative improvement cycles. An important feature of genAI is its capacity to improve over time with limited human supervision.³⁰ Some of the well-known problems of early large language models, such as “hallucinations” (in which the AI makes up facts and references),³¹ racial and ethnic biases,³² and inappropriate output,³³ were partly addressed a few months later. In one vivid example, while GPT-3.5 drafted a convincing (but nonsensical) prior authorization request for an anticoagulant to treat insomnia, 6 months later GPT-4 responded to the same prompt by saying, “It would be unethical and inappropriate for me to draft such a request” (Sara Murray, MD, MAS; personal communication). One can contrast this ability to rapidly iterate with the relatively slow improvement cycle of EHRs, where vendors might address problems through incremental version releases months or even years apart.

Of course, the ability to rapidly shape-shift creates its own challenges. Given that even genAI developers frequently do not completely understand how their own tools come up with certain outputs, it is possible that a patient-facing output of a genAI tool may change in a harmful way that escapes notice. The US Food and Drug Administration is currently grappling with the question of how to regulate genAI tools in health care settings, mindful that their output may change from one minute to the next.³⁴

Early research on genAI outside of health care supports the premise that these tools have the capacity to deliver productivity and quality gains more quickly than prior technologies. One of us (E.B.) worked with colleagues to study the phased rollout of a genAI-based tool for assisting more than 5000 customer support agents in a software company.³⁵ The agents given access to the tool had a 14% increase in productivity, accompanied by improvements in customer satisfaction and employee retention. Most of these improvements occurred within the first few months of genAI deployment and involved relatively small changes in the organization of the work. Interestingly, the least experienced and least skilled workers saw the biggest benefits, with productivity gains of 35% as they quickly ascended the learning curve.

Other studies published in the past few months have found that genAI delivered large and rapid productivity gains when helping software engineers,³⁶ management consultants,³⁷ and writers.³⁸ One recent analysis concluded that up to 80% of occupations, including many in health care, involve at least some tasks for which worker productivity could be substantially augmented via genAI.³⁹

The Capacity of the Health Care System to Reinvent the Work

In some circumstances, the introduction of genAI will, by itself, quickly generate benefits. But in most cases, great gains will only come when implementation is coupled with significant changes in the design of the work. We believe that health care leaders are more prepared to deliver these changes than they were in the past. After having experienced the pain of some prior IT deployments, many leaders seem increasingly sophisticated in thinking about systems, change management, and the importance of workforce training and culture. IT departments of most large health care organizations now include clinicians with training in informatics and individuals with expertise in user-centered design and systems thinking.

Companies in health care’s digital space have also absorbed the lessons of the past few decades, including the importance of complementary innovations, the need for humility and patience, and the complexity and high stakes of health care. In 2021, reflecting on the demise of Watson Health, IBM chief executive officer Arvind Krishna said, “Healthcare is always going to…be more subtle, as well as more regulated, for all the right reasons….It’s a decision that may impact somebody’s life or death. You have to be more careful. So in healthcare, it turns out maybe we were too optimistic.”⁴⁰

On top of this, changes in the health care marketplace will likely accelerate the pace of genAI investment and deployment. Profit margins of health care delivery organizations are increasingly threatened by a variety of forces, including limited reimbursement, rising labor costs, skyrocketing pharmaceutical costs, and increasing competition, now both from other health care organizations such as retail pharmacies and from non–health care companies such as Amazon and Walmart.⁴¹ While concerns that genAI will threaten the jobs of current employees are a growing source of management-labor disputes (such as in recent strikes in the entertainment and automobile industries), current shortages of clerical staff, nurses, and some physician specialties may ameliorate this tension in health care. Finally, and perhaps ironically, many of the problems created by prior digital innovations, such as documentation burden and the EHR inbox, may be addressed by new genAI-powered tools such as digital scribes and sophisticated chatbots.

In fact, we expect that genAI will notch its early wins in health care delivery systems not so much by handling patient-facing tasks (such as making diagnoses and recommending treatments) but rather in addressing areas of waste and administrative friction, whether in creating a physician note, scheduling a patient appointment, or sending a bill or a prior authorization request to an insurance company. Experience gained in these areas will likely pave the way for broader implementation in areas that more directly affect patient outcomes and experience. A recent analysis by economists at Harvard and the consulting firm McKinsey projected that the implementation of modern AI systems could lead to savings of 5% to 10% in health care spending (roughly $200 billion to $360 billion per year in 2019 dollars), mostly by addressing use cases in operations, corporate functions, and reimbursement.⁴² These savings may be an underestimate if genAI is ultimately successful in facilitating high value and evidence-based care through effective clinical decision support.¹

The productivity paradox of IT is likely to rear its head with the implementation of genAI in medicine, just as it has with prior technologies, both inside and outside health care. In fact, when compared with other industries, health care has several attributes that increase the challenge of reaping the promised benefits of technology tools.

Nevertheless, we are optimistic that the 2 key factors that have historically been critical in overcoming the productivity paradox—the ability of the digital tools to rapidly improve and the capacity of organizations to implement complementary innovations that allow IT tools to reach their potential—are more advanced than in the past. Because of this, we believe that genAI will deliver meaningful improvements in health care more rapidly than was the case with previous technologies.

Does that mean that health care will be completely transformed by genAI in the next few years? That seems unlikely, although certain use cases, such as digital scribes and some forms of back-office automation, could make a big difference relatively quickly. But it does mean that what might have been a decades-long path for genAI to overcome the productivity paradox in health care may now be traversed in 5 to 10 years, and for some digitally advanced organizations, even sooner. None of this will happen automatically. GenAI developers will need to effectively address concerns regarding hallucinations, bias, safety, and affordability. Regulators will need to enact standards that facilitate trust in genAI without unduly stifling innovation. And, most important, health care leaders will need to put in place actionable roadmaps that prioritize the areas where genAI can create the greatest benefits for their organizations, paying close attention to those complementary innovations that remain necessary and striving to mitigate the known problems with genAI and any unanticipated consequences that emerge. Given the health care system’s outsized role in both human health and in economics, the stakes could hardly be higher.

Accepted for Publication: November 14, 2023.

Published Online: November 30, 2023. doi:10.1001/jama.2023.25054

Corresponding Author: Robert M. Wachter, MD, Department of Medicine, University of California, San Francisco, 505 Parnassus Ave, Room M994, Box 0120, San Francisco, CA 94143-0120 ([email protected]).

Author Contributions: Dr Wachter had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design; acquisition, analysis, or interpretation of data; drafting of the manuscript; critical review of the manuscript for important intellectual content; and administrative, technical, or material support: Both authors.

Conflict of Interest Disclosures: Dr Wachter reports that he receives a yearly stipend for serving on the board of directors of The Doctors Company; serves on the board of directors of Second Wave Delivery Solutions and Third Wave Rx (for which he receives stock options) and the scientific advisory boards for Curai Health and Roon (with stock options); consults with Commure (stipend and stock options), Forward (stock options), and Notable (stock options); has given more than 200 talks (a few to for-profit entities including Boehringer Ingelheim and Chamberlain nursing schools) for which he has received honoraria; and holds the Benioff Endowed Chair in Hospital Medicine at UCSF from Marc and Lynne Benioff. Dr Brynjolfsson reports that he is the cofounder of Workhelix Inc (with stock); serves on the board of directors of Infinite Analytics Inc (with stock); has given more than 300 lectures for which he has received honoraria; directs the Stanford Digital Economy Lab, which receives funding from numerous for-profit, nonprofit, individual, and government sources; was an advisor to Cresta Inc (with stock); and holds the Jerry Yang and Akiko Yamazaki Endowed Chair at the Stanford Institute for Human-Centered Artificial Intelligence.

Institute of Medicine. To Err Is Human: Building a Safer Health System. National Academies Press; 2000.

Wachter RM. The Digital Doctor: Hope, Hype and Harm at the Dawn of Medicine’s Computer Age. McGraw-Hill; 2015.

Brynjolfsson E, McAfee A. The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. WW Norton & Co; 2014.

Brynjolfsson E, Rock D, Syverson C. The productivity J-curve: how intangibles complement general purpose technologies. National Bureau of Economic Research. October 2018. Accessed September 23, 2023. https://www.nber.org/papers/w25148

12. Tai-Seale M, Olson CW, Li J, et al. Electronic health record logs indicate that physicians split time evenly between seeing patients and desktop medicine. Health Aff (Millwood). 2017;36(4):655-662. doi:10.1377/hlthaff.2016.0811

14. Sinsky C, Colligan L, Li L, et al. Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties. Ann Intern Med. 2016;165(11):753-760. doi:10.7326/M16-0961

22. Burns LR. The U.S. Healthcare Ecosystem: Payers, Providers, Producers. McGraw-Hill; 2021.

30. Aghion P, Jones BF, Jones CI. Artificial intelligence and economic growth. In: Agrawal A, Gans J, Goldfarb A, eds. The Economics of Artificial Intelligence: An Agenda. University of Chicago Press; 2019:chap 9.

35. Brynjolfsson E, Li D, Raymond LR. Generative AI at work. National Bureau of Economic Research. April 2023. Accessed September 23, 2023. https://www.nber.org/papers/w31161

36. Peng S, Kalliamvakou E, Cihon P, Demirer M. The impact of AI on developer productivity: evidence from GitHub Copilot. arXiv. Preprint posted online February 13, 2023. https://arxiv.org/abs/2302.06590

37. Dell’Acqua F, McFowland E, Mollick ER, et al. Navigating the jagged technological frontier: field experimental evidence of the effects of AI on knowledge worker productivity and quality. Harvard Business School Technology & Operations Mgt Unit Working Paper No. 24-013. September 15, 2023. Accessed September 23, 2023.

38. Noy S, Zhang W. Experimental evidence on the productivity effects of generative artificial intelligence. SSRN. March 1, 2023. Accessed September 23, 2023.

39. Eloundou T, Manning S, Mishkin P, Rock D. GPTs are GPTs: an early look at the labor market impact potential of large language models. arXiv. Preprint posted online August 21, 2023. https://arxiv.org/abs/2303.10130
