
World Health Organization (WHO) Lays Out Crucial Warnings About The Use Of Generative AI And Large Language Models In Medicine And Health



In today’s column, I am continuing my ongoing series about generative AI in the medical and health domain by taking a close look at the recently released World Health Organization (WHO) report entitled “Ethics and Governance of Artificial Intelligence for Health. Guidance on Large Multi-Modal Models” (posted online by WHO on January 18, 2024).

The official document is nearly one hundred pages in length and packs in a lot of important insights. I will provide you here with a selection of key points and highlights that I believe are especially notable. My analysis and added thoughts are included to amplify and augment the content, and they represent solely my own views. I will give you context for the material and cite the passages that pertain to my commentary and that I believe are especially impactful.

All in all, I hope that this analysis and review will give you a solid grasp of what the WHO report has to say on the topic of generative AI in the medical and health domain. Consider this a meaty sampler that will whet your appetite. I urge that you consider reading the full report when you have time to do so.

To give you an immediate sense of the coverage of this latest WHO report, these are the five major application areas of generative AI that the paper covers (excerpted from the report):

  • (1) “Diagnosis and clinical care, such as responding to patients’ written queries;”
  • (2) “Patient-guided use, such as for investigating symptoms and treatment;”
  • (3) “Clerical and administrative tasks, such as documenting and summarizing patient visits within electronic health records;”
  • (4) “Medical and nursing education, including providing trainees with simulated patient encounters, and;”
  • (5) “Scientific research and drug development, including to identify new compounds.”

In case you’ve been living in a cave with no Internet access and didn’t realize what has been happening in the last few years, generative AI is increasingly entering into each of those five areas (and well beyond). Some people are excited about the use of generative AI in the medical and health realm. They are right to be excited since generative AI can be an enormous asset. In the same breath, we ought to acknowledge that generative AI carries a lot of baggage and can be detrimental to the medical and health arena.

Yes, this is what I refer to as the dual-use AI problem, see my in-depth discussion at the link here.

AI such as generative AI might be able to aid in making tall leaps in medicine and public health. Efforts are underway to use generative AI to try and cure cancer. This is the positive or smiley face side of using AI. There is also the sad face side. It is feasible to use AI and generative AI to try and discover new and utterly deadly biochemical threats.

Furthermore, dual-use comes as part and parcel of AI. You cannot just wave a magic wand and wish away the bad sides of AI. The same properties and advantages are readily turned to the dark side. Plus, you might have evildoers who purposely seek to use AI for untoward purposes, while there are also well-intentioned innocents who inadvertently drift into unsavory uses.

My point is not to paint a picture of exclusive doom and gloom. The crux is to realize that we need to wisely harness the likes of AI and generative AI. Allowing wanton development and use is probably going to get us unknowingly into a heap of trouble. It is vital that we speak up, consider the tradeoffs, and proceed via a heightened awareness of what we are getting ourselves into. My ongoing column coverage on AI ethics and AI law is intended to bring awareness to all stakeholders, including AI makers, AI researchers, companies using AI, practitioners using AI, lawmakers, regulators, and so on.

It is going to take a coordinated, collaboratively informed village to make sure that we get things right when it comes to AI and generative AI. This is most definitely the case in the medical and health domain, where life and death are clearly at stake.

Before we leap into the WHO report, I’d like to establish what generative AI is all about.

Core Background About Generative AI And Large Language Models

Here is some quick background about generative AI to make sure we are in the same ballpark about what generative AI and also Large Language Models (LLMs) consist of. If you already are highly versed in generative AI and LLMs, you might skim this quick backgrounder and then pick up once I get into the particulars of this specific use case.

I’d like to start by dispelling a myth about generative AI. Banner headlines from time to time seem to claim or heartily suggest that AI such as generative AI is sentient or that it is fully on par with human intelligence. Don’t fall for that falsity, please.

Realize that generative AI is not sentient and only consists of mathematical and computational pattern matching. The way that generative AI works is that a great deal of data is initially fed into a pattern-matching algorithm that tries to identify patterns in the words that humans use. Most of the modern-day generative AI apps were data trained by scanning data such as text essays and narratives that were found on the Internet. Doing this was a means of getting the pattern-matching to statistically figure out which words we use and when we tend to use those words. Generative AI is built upon the use of a large language model (LLM), which entails a large-scale data structure to hold the pattern-matching facets and the use of a vast amount of data to undertake the setup data training.
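To make the notion of statistical word-level pattern matching a bit more concrete, here is a deliberately tiny sketch of my own (not drawn from the WHO report, and nothing like how production LLMs are actually engineered): it merely counts which word tends to follow which word in a toy corpus and then generates text from those counts. Modern generative AI replaces the lookup table with enormous neural networks trained on vast swaths of the Internet, but the spirit of next-word prediction is similar.

```python
# Toy sketch of word-level pattern matching: count which word tends to follow
# which word in a tiny corpus, then generate text from those counts.
# Real LLMs use large neural networks, not lookup tables; this only conveys the spirit.
import random
from collections import defaultdict

corpus = (
    "the patient reported chest pain . "
    "the patient reported mild fever . "
    "the doctor ordered a chest x-ray ."
).split()

follows = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word].append(next_word)

def generate(start: str = "the", length: int = 8) -> str:
    """Repeatedly sample a statistically plausible next word."""
    word, output = start, [start]
    for _ in range(length):
        candidates = follows.get(word)
        if not candidates:
            break
        word = random.choice(candidates)
        output.append(word)
    return " ".join(output)

print(generate())  # e.g. "the patient reported chest x-ray ."
```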

There are numerous generative AI apps available nowadays, including ChatGPT, GPT-4, Bard, Gemini, Claude, etc. The one that is seemingly the most popular would be ChatGPT by AI maker OpenAI. In November 2022, OpenAI’s ChatGPT was made available to the public at large, and the response was astounding in terms of how people rushed to make use of the newly released AI app. By current estimates, ChatGPT has on the order of one hundred million weekly active users.

Using generative AI is relatively simple.

You log into a generative AI app and enter questions or comments as prompts. The generative AI app takes your prompting and uses the already devised pattern matching based on the original data training to try and respond to your prompts. You can interact or carry on a dialogue that appears to be nearly fluent. The nature of the prompts that you use can be a make-or-break when it comes to getting something worthwhile out of using generative AI and I’ve discussed at length the use of state-of-the-art prompt engineering techniques to best leverage generative AI, see the link here.
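If you are curious what entering a prompt looks like programmatically rather than via a chat window, here is a minimal sketch using OpenAI's Python client (the model name is illustrative, and the sketch assumes you have the openai package installed and an API key configured; other vendors offer comparable interfaces):

```python
# Minimal prompt-and-response round trip via an LLM API.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are common over-the-counter remedies for a mild headache?"},
    ],
)

print(response.choices[0].message.content)
```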

The conventional modern-day generative AI is of an ilk that I refer to as generic generative AI.

By and large, the data training was done on a widespread basis and involved smatterings of this or that along the way. Generative AI in that instance is not specialized in a specific domain and instead might be construed as a generalist. If you want to use generic generative AI to advise you about financial issues, legal issues, medical issues, and the like, you ought not to do so. There isn’t enough depth included in the generic generative AI to render the AI suitable for domains requiring specific expertise.

AI researchers and AI developers realize that most contemporary generative AI is indeed generic and that people want generative AI to be deeper rather than solely shallow. Strenuous efforts are being made to devise generative AI that contains notable depth within various selected domains. One method for doing this is called RAG (retrieval-augmented generation), which I’ve described in detail at the link here. Other methods are being pursued, and you can expect that we will soon witness a slew of generative AI apps shaped around specific domains, see my prediction at the link here.
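Since RAG is one of the main routes to giving generic generative AI domain depth, here is a bare-bones sketch of the core idea (my own toy illustration; naive keyword overlap stands in for the vector embeddings and vector database a real system would use): retrieve the most relevant passages from a vetted document store and place them into the prompt so the model grounds its answer in them.

```python
# Bare-bones retrieval-augmented generation (RAG) sketch.
# Naive keyword overlap stands in for real embedding-based retrieval.
documents = [
    "Metformin is a commonly used first-line medication for type 2 diabetes.",
    "Low-dose aspirin is sometimes used for cardiovascular prevention.",
    "Type 2 diabetes management includes diet, exercise, and glucose monitoring.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared words with the query and return the top k."""
    query_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(query_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Place retrieved passages into the prompt as grounding context."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# The assembled prompt would then be sent to the generative AI model.
print(build_prompt("What is a first-line medication for type 2 diabetes?"))
```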

You might be used to generative AI that functions principally in a text-to-text mode. A user enters some text, known as a prompt, and the generative AI app emits or generates a text-based response. Simply stated, this is text-to-text. I sometimes describe this as text-to-essay, due to the common practice of people using generative AI to produce essays.

The typical interaction is that you enter a prompt, get a response, you enter another prompt, you get a response, and so on. This is a conversation or dialogue. Another typical approach consists of entering a prompt such as tell me about the life of Abraham Lincoln, and you get a generated essay that responds to the request.

Another popular mode is text-to-image, also called text-to-art. You enter text that describes something you want to be portrayed as an image or a piece of art. The generative AI tries to parse your request and generate artwork or imagery based on your stipulation. You can iterate in a dialogue to have the generative AI adjust or modify the rendered result.

We are heading beyond the simple realm of text-to-text and text-to-image by shifting into an era of multi-modal generative AI, see my prediction details at the link here. With multi-modal generative AI, you will be able to use a mix of combinations or modes, such as text-to-audio, audio-to-text, text-to-video, video-to-text, audio-to-video, video-to-audio, etc. This will allow users to incorporate other sensory devices such as using a camera to serve as input to generative AI. You then can ask the generative AI to analyze the captured video and explain what the video consists of.

Multi-modal generative AI tremendously ups the ante regarding what you can accomplish with generative AI. This unlocks a lot more opportunities than being confined to merely one mode. You can for example mix a wide variety of modes such as using generative AI to analyze captured video and audio, which you might then use to generate a script, and then modify that script to then have the AI produce a new video with accompanying audio. The downside is that you can potentially get into hot water more easily due to trying to leverage the multi-modal facilities.
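To ground the multi-modal idea with the simplest case, here is a sketch of sending an image alongside text using OpenAI's chat API message format (the model name and image URL are placeholders of my own; audio and video modes involve additional tooling and are omitted here):

```python
# Sketch of a multi-modal (text plus image) prompt via OpenAI's Python client.
# Model name and image URL are placeholders; assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is shown in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample-photo.jpg"}},
        ],
    }],
)

print(response.choices[0].message.content)
```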

Allow me to briefly cover the hot water or troubling facets of generative AI.

Today’s generative AI that you readily run on your laptop or smartphone has tendencies that are disconcerting and deceptive:

  • (1) False aura of confidence.
  • (2) Lack of stating uncertainties.
  • (3) Lulls you into believing it to be true.
  • (4) Uses anthropomorphic wording to mislead you.
  • (5) Can go off the rails and do AI hallucinations.
  • (6) Sneakily portrays humility.

I’ll briefly explore those qualms.

Firstly, generative AI is purposely devised by AI makers to generate responses that seem confident and carry a misleading aura of greatness. An essay or response by generative AI convinces the user that the answer is on the up and up. It is all too easy for users to assume that they are getting responses of assured quality. Now, to clarify, there are indeed times when generative AI will indicate that an answer or response is unsure, but that is a rarity. The bulk of the time, a response has a semblance of perfection.

Secondly, many of the responses by generative AI are really guesses in a mathematical and statistical sense, but seldom does the AI indicate either an uncertainty level or a certainty level associated with a reply. The user can explicitly request to see a certainty or uncertainty, see my coverage at the link here, but that’s on the shoulders of the user to ask. If you don’t ask, the prevailing default is don’t tell.
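As a quick aside, that explicit request can be as simple as a prompt wrapper. A minimal sketch (my own illustrative wording; keep in mind the stated confidence is itself just another generated output, so treat it as a nudge rather than a guarantee):

```python
def with_uncertainty(request: str) -> str:
    """Wrap a prompt so the model is explicitly asked to disclose its uncertainty."""
    return (
        f"{request}\n\n"
        "After your answer, state your confidence as low, medium, or high, "
        "and list any assumptions or points you are unsure about."
    )

print(with_uncertainty("What are common causes of a persistent dry cough?"))
```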

Thirdly, a user is gradually and silently lulled into believing that the generative AI is flawless. This is an easy mental trap to fall into. You ask a question and get a solid answer, and this happens repeatedly. After a while, you assume that all answers will be good. Your guard drops. I’d dare say this happens even to the most skeptical and hardened of users.

Fourth, the AI makers have promulgated wording by generative AI that appears to suggest that AI is sentient. Most answers by the AI will typically contain the word “I”. The implication to the user is that the AI is speaking from the heart. We normally reserve the word “I” for humans to use. It is a word bandied around by most generative AI and the AI makers could easily curtail this if they wanted to do so.

It is what I refer to as anthropomorphizing by design.

Not good.

Fifth, generative AI can produce errors or make stuff up, yet there is often no warning or indication when this occurs. The user must ferret out these mistakes. If an error occurs in a lengthy or highly dense response, the chance of discovering the malady is low, or at least extraordinary double-checking is required to discover it. The phrase AI hallucinations is used for these circumstances, though I disfavor the word “hallucinations” since it is lamentably another form of anthropomorphizing the AI.

Lastly, most generative AI has been specially data-trained to express a sense of humility. See my in-depth analysis at the link here. Users tend to let down their guard because of this artificially crafted humility. Again, this is trickery undertaken by the AI makers.

In a process known as RLHF (reinforcement learning from human feedback), the initially data-trained generative AI is given added tuning. Personnel are hired to ask questions and then rate the answers given by the AI. The ratings are used by the computational pattern matching to fine-tune how later answers should be worded. If you are curious about what generative AI might be like without this fine-tuning, see my discussion at the link here.
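To give a flavor of what that rating step produces, here is a sketch of the kind of preference record such pipelines typically collect (the field names are my own illustration; in practice, many such records train a reward model, and the reward model's scores then steer the fine-tuning of the base model):

```python
# Sketch of the human-preference data gathered during RLHF-style tuning.
# Field names are illustrative; the essence is that raters compare candidate answers.
preference_record = {
    "prompt": "Explain what a blood pressure reading of 150/95 means.",
    "response_a": "It is elevated and generally classified as stage 2 hypertension; discuss it with a clinician.",
    "response_b": "Numbers bounce around, it is probably nothing to worry about.",
    "rater_choice": "response_a",  # the answer the human rater preferred
    "rater_notes": "Response B is dismissive and potentially unsafe.",
}

def preferred(record: dict) -> str:
    """Return the answer the rater preferred; reward models learn to score these higher."""
    return record[record["rater_choice"]]

print(preferred(preference_record))
```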

The vital takeaway is that there is a lot of tomfoolery already when it comes to generative AI. You are primed to be taken in by the tricks and techniques being employed.

Unpacking The WHO Report On Generative AI And LLMs In Medicine And Health

I am going to proceed in a collegial fashion.

Imagine that you and I are sitting down at a local Starbucks, having warm cups of coffee while discussing the WHO report. I’ll bring up a topic, tell you about it, and then provide an excerpt pertaining to the matter at hand. We will collegially work our way through most of the document. I won’t cover every detail. I am handpicking especially notable or interesting points. I suppose if this were a YouTube video, I might refer to it as a reaction video.

Let’s begin at the beginning.

If you are someone who keeps tabs on the issuance of WHO reports (kudos to you), you might vaguely recall that the WHO released a report in 2021 covering AI in health and medicine, entitled “Ethics and Governance of Artificial Intelligence for Health”. The document made a splash at the time and contained six key principles underlying the ethical use and governance of AI in the health and medical domain.

By and large, the principles were about the same as other key precepts being announced by numerous governmental entities, see my coverage of the United Nations UNESCO set of AI ethics guidelines at the link here. I will in a moment describe the six principles since they are carried over from the prior report into this latest one.

What makes this latest WHO report distinctive is that it goes beyond those six principles and also delves into the aforementioned five major application areas involving medicine and health. Furthermore, the focus in this instance is on generative AI. The 2021 report preceded the advent of modern-day generative AI; it was the release of ChatGPT in November 2022 that spurred widespread interest. This latest WHO report therefore incorporates a focus on generative AI in the medical and health domain and adds assessments of how it applies to the five major application areas.

The bottom line: even if you’ve seen the 2021 WHO report, you owe it to yourself to get up to date and read this new one. I’m sure you’ll enjoy doing so.

Here is what the 2024 WHO report says about the 2021 version (excerpt):

  • “The original WHO guidance on ethics and governance of AI for health examined various approaches to machine learning and various applications of AI in health care but did not specifically examine generative AI or LMMs. During development of that guidance and at the time of its publication in 2021, there was no evidence that generative AI and LMMs would be widely available so soon and would be applied to clinical care, health research and public health.”

And therefore this 2024 report intends to do this (excerpt):

  • “WHO is issuing this guidance to assist Member States in mapping the benefits and challenges associated with use of LMMs for health and in developing policies and practices for appropriate development, provision and use. The guidance includes recommendations for governance, within companies, by governments and through international collaboration, aligned with the guiding principles. The principles and recommendations, which account for the unique ways in which humans can use generative AI for health, are the basis of this guidance.”

The 2024 version provides a reminder of the six principles, which are still applicable and worthy of carrying forward. The six principles are:

  • “(1) Protect autonomy.”
  • “(2) Promote human well-being, human safety and the public interest.”
  • “(3) Ensure transparency, ‘explainability’ and intelligibility.”
  • “(4) Foster responsibility and accountability.”
  • “(5) Ensure inclusiveness and equity.”
  • “(6) Promote AI that is responsive and sustainable.”

I’ll briefly bring you up to speed on those principles. We can then get into the guts of the rest of the latest report.

(1) Protect autonomy

One concern about the use of AI is that it might overtake human oversight. The dire outlook is that AI will be making life-or-death medical and health decisions about us and for us, with no human particularly in the loop. You might say we would gradually and inexorably lose a semblance of human autonomy. Not good. Thus, the first principle is to make sure that we implement AI in a manner that keeps the heralded role of human autonomy firmly at the forefront.

Here’s what the formal indication is (excerpt):

  • “Humans should remain in control of health-care systems and medical decisions. Providers have the information necessary to use AI systems safely and effectively. People understand the role that AI systems play in their care. Data privacy and confidentiality are protected by valid informed consent through appropriate legal frameworks for data protection.”

If you are further interested in the topic of human autonomy and the role of AI autonomy, see my coverage at the link here.

(2) Promote human well-being, human safety and the public interest

In this next principle, a concern is that AI makers are apt to toss into the marketplace whatever AI they think they can sell and make a buck on. The trouble is that this AI might not be safe. It might contain errors that can harm people. It might be poorly designed and allow people to accidentally misuse the AI. A litany of qualms arises.

The aim is to try and guide AI makers and those fielding AI to step up and meet requirements for AI safety and strive for human well-being (here is a formal excerpt):

  • “Designers of AI satisfy regulatory requirements for safety, accuracy and efficacy for well-defined uses or indications. Measures of quality control in practice and quality improvement in the use of AI over time should be available. AI is not used if it results in mental or physical harm that could be avoided by use of an alternative practice or approach.”

For my coverage of the importance of AI safety, see the link here.

(3) Ensure transparency, “explainability” and intelligibility

For the third principle, a formidable issue with today’s AI is that it can be hard to discern what it is doing, along with determining why it is doing whatever it is doing. You could say that much of the current AI is opaque. It needs to be transparent. We need explainable AI, as I’ve discussed in-depth at the link here.

Here is a formal excerpt of this (excerpt):

  • “AI technologies should be intelligible or understandable to developers, medical professions, patients, users and regulators. Sufficient information is published or documented before the design or deployment of AI, and the information facilitates meaningful public consultation and debate on how the AI is designed and how it should or should not be used. AI is explainable according to the capacity of those to whom it is explained.”

(4) Foster responsibility and accountability

A momentous apprehension about AI is that there is confusion over who is responsible when AI goes awry or is turned toward something unacceptable. Who or what is to be held accountable or responsible for harmful acts of AI? As I’ve noted in my column, we don’t yet anoint AI with legal personhood, so you can’t think to go after the AI itself for your damages, see my discussion at the link here.

Here is a formal description (excerpt) of this principle:

  • “Foster responsibility and accountability to ensure that AI is used under appropriate conditions and by appropriately trained people. Patients and clinicians evaluate development and deployment of AI. Regulatory principles are applied upstream and downstream of the algorithm by establishing points of human supervision. Appropriate mechanisms are available for questioning and for redress for individuals and groups that are adversely affected by decisions based on AI.”

(5) Ensure inclusiveness and equity

You might be aware that generative AI can exhibit biases and discriminatory responses. This can be due to several reasons, including that the initial data training might have contained narratives and content that contained such biases. In turn, the generative AI has pattern-matched those maladies and carried them over into the seemingly fluent and “unbiased appearing” responses that usually are emitted. Deeper analysis shows that the bias is often hidden underneath the surface, see my deep dive at the link here.

This is what the formal description of this principle says (excerpt):

  • “AI is designed and shared to encourage the widest possible, appropriate, equitable use and access, irrespective of age, sex, gender identity, income, race, ethnicity, sexual orientation, ability or other characteristics. AI is available for use not only in high-income settings but also in low- and middle-income countries. AI does not encode biases to the disadvantage of identifiable groups. AI minimizes inevitable disparities in power. AI is monitored and evaluated to identify disproportionate effects on specific groups of people.”

(6) Promote AI that is responsive and sustainable

For the last of the six principles, we need to consider that AI consumes a lot of precious resources, given how much computer processing power is required to develop and field these latest AI systems. Sustainability is a topic often overlooked.

Here is the formal description (excerpt):

  • “AI technologies are consistent with the wider promotion of the sustainability of health systems, the environment, and workplaces.”

The United Nations has extensively examined various sustainability avenues associated with AI, see my coverage at the link here.

Moving Into The Report And Getting Our Feet Wet

You now know the six key principles.

Good for you.

I trust that you are earnestly ready to move forward with the latest elements of the WHO report. Take a sip of that delicious coffee and prepare yourself to get underway.

First, we should acknowledge that using AI in the domain of medicine and health is not a new idea. This has been going on since the AI field first got underway in the 1950s, see my historical tracings at the link here. There has been a longstanding effort to mix AI into this realm. We should not forget the past, nor underplay it. But do not be blinded by it either.

You might compellingly say that generative AI presents some novelties, partially due to its high fluency and massive pattern-matching capacity. In the past, Natural Language Processing (NLP) was stilted. Pattern-matching was inherently limited due to the cost of computer hardware and memory, and the algorithms weren’t as advanced. A grand convergence has made today’s generative AI possible and available.

The WHO report notes that it is both the advent and the usage of generative AI that can create new opportunities and equally foster new dangers (excerpt):

  • “Applications of AI for health include diagnosis, clinical care, research, drug development, health-care administration, public health and surveillance. Many applications of LMMs are not novel uses of AI; however, clinicians, patients, laypeople and health-care professionals and workers access and use LMMs differently.”

A particularly irksome aspect of generative AI is that we keep seeing outsized efforts to have such AI pass various credentialing exams as though this alone is a marker of practical application. This has happened in the legal field, financial field, the medical field, etc. I am not dissing those efforts. It is great to see the amazing progress that generative AI has attained. The concern is that there is an implication that passing an exam is the same as being ready to practice.

We probably fall for this because we know that humans must study for years on end, and their usual “last step” entails taking an exam. Therefore, it seems “logical” to assume that if AI can pass such a test, it is the “last step” and otherwise is primed to be put into daily use.

Not so.

Banner headlines continue to proclaim that some researchers were able to have generative AI get a near-passing or actual passing grade when taking a rigorous medical exam. That does seem exemplary. However, this does not imply that generative AI is suitable for practicing medicine. It just means that the AI has sufficient pattern matching to pass written exams. See my analysis at the link here.

We need to be mindful that having AI pass an exam is not the same as saying that the AI is ready for prime time in being used by physicians and patients (excerpt):

  • “Several LMMs have passed the US medical licensing examination; however, passing a written medical test by regurgitating medical knowledge is not the same as providing safe, effective clinical services, and LMMs have failed tests with material not previously published online or that could be easily solved by children.”

A contentious debate exists about whether generative AI can be used by itself in this domain or should be only used by medical professionals. Let’s first examine the role of doctors and other medical professionals as being the mainstay users of generative AI in this domain. On the one hand, you could say this is nothing new in the sense that lots of computerized systems and online apps are used routinely in this arena. The use of generative AI would at first glance seem to be ho-hum.

The devil in the details is that it is very easy to be lulled into believing that the generative AI “knows” what it is doing. You can rely upon the generative AI as a considered second opinion. Is this second opinion truly on par with that of a human physician? Do not assume so.

The good news is that the massive scale of generative AI can be a potential detector of rare circumstances. That is certainly handy. But will the rare indication be a false positive? Lots of tough questions abound.

Here are some relevant points from the WHO report (excerpts):

  • “Diagnosis is seen as a particularly promising area, because LMMs could be used to identify rare diagnoses or ‘unusual presentations’ in complex cases. Doctors are already using Internet search engines, online resources and differential diagnosis generators, and LMMs would be an additional instrument for diagnosis.”
  • “LMMs could also be used in routine diagnosis, to provide doctors with an additional opinion to ensure that obvious diagnoses are not ignored. All this can be done quickly, partly because an LMM can scan a patient’s full medical record much more quickly than can doctors.”
  • “One concern with respect to LMMs has been the propensity of chatbots to produce incorrect or wholly false responses from data or information (such as references) ‘invented’ by the LMM and responses that are biased in ways that replicate flaws encoded in training data. LMMs could also contribute to contextual bias, in which assumptions about where an AI technology is used result in recommendations for a different setting.”

The generative AI principally being used for medical and health applications today tends to be of a generic variety. We are inching our way towards enhancing generic generative AI so that it is tuned specifically for the healthcare domain overall. In the meantime, the tuned or honed generative AI is usually focused on narrowly scoped subdomains.

An overarching aim of AI-powered MedTech and HealthTech research entails devising a medical or health-steeped generative AI that can provide deep dives into subdomains and simultaneously handle across-the-board medical and health advisement. This envisioned specialization of generative AI is hoped to be good enough that it could readily be retrained on the fly to deal with new twists and turns in the medical and health field. The retraining would not require an overhaul of the generative AI. Instead, a medical or health practitioner could in suitably controlled ways merely instruct the generative AI on new advances.

Sometimes this future variation of generative AI is referred to as generalist medical generative AI or something akin to that moniker.

Here’s what the formal indication had to say (excerpt):

  • “The long-term vision is to develop ‘generalist medical artificial intelligence’, which will allow health-care workers to dialogue flexibly with an LMM to generate responses according to customized, clinician-driven queries. Thus, a user could adapt a generalist medical AI model to a new task by describing what is required in common speech, without having to retrain the LMM or training the LMM to accept different types of unstructured data to generate a response.”

A means of doing retraining might consist entirely of natural language instructions that a person gives to the generative AI. A question arises as to whether the prompting can be entirely fluid and without any specific commands or techniques. Today, the best way to get the most out of generative AI consists of using skillful prompts as part of a user being versed in the techniques of prompt engineering, see my coverage of a range of prompt engineering approaches at the link here.

Will we continue to need a user to become conversant in prompt engineering or will generative AI eventually no longer require such skills? This is a heatedly debated topic. The thing is, regardless of how a user devises a prompt, a lingering issue is whether the generated response is correct and apt to the situation or circumstances at play. Thus, another unresolved question is going to be how a user will be able to ascertain that a medical or health recommendation emitted by generative AI is worthy and suitable to undertake.

Consider these open issues as noted in the WHO report (excerpt):

  • “Current LMMs also depend on human ‘prompt engineering’, in which an input is optimized to communicate effectively with an LMM. Thus, LMMs, even if trained specifically on medical data and health information, may not necessarily produce correct responses. For certain LMM-based diagnoses, there may be no confirmatory test or other means to verify its accuracy.”

I had earlier mentioned that the initial data training on data scraped from across the Internet can introduce biases into the generative AI pattern-matching. You might be thinking that if you merely did data training on medical and health data, we’d be a lot better off. Probably not. There is bias in those datasets as well, along with likely numerous errors and confounding data.

Take a gander at these salient points (excerpts):

  • “Many of the LMMs currently available for public use were trained on large datasets, such as on the Internet, which may be rife with misinformation and bias. Most medical and health data are also biased, whether by race, ethnicity, ancestry, sex, gender identity or age.”
  • “LMMs are also often trained on electronic health records, which are full of errors and inaccurate information or rely on information obtained from physical examinations that may be inaccurate, thus affecting the output of an LMM.”

In The Swimming Pool And Treading Water

I’ve been taking you through the details and perhaps we ought to take a breather. Assuming that we are still seated in a Starbucks, let’s stretch our legs for a moment.

Okay, that was long enough, time to get back to work. No lengthy breaks for us. On with the show.

I had cautioned earlier that it is overly easy to be lulled into believing generative AI. This can readily happen to physicians and medical professionals. They function in a fast-paced, non-stop, high-pressure environment. If generative AI appears to be providing quick and reliable answers, their guard is going to drop. They seem to be able to get more done in less time, possibly with higher-quality results. A huge relief.

Who wouldn’t become dependent upon that kind of at-your-fingertips service?

Many would.

The WHO report gets into this conundrum (excerpts):

  • “In automation bias, a clinician may overlook errors that should have been spotted by a human. There is also concern that physicians and health-care workers might use LMMs in making decisions for which there are competing ethical or moral considerations.”
  • “Use of LMMs for moral judgments could lead to ‘moral de-skilling’, as physicians become unable to make difficult judgments or decisions.”
  • “There is a long-term risk that increased use of AI in medical practice will degrade or erode clinicians’ competence as medical professionals, as they increasingly transfer routine responsibilities and duties to computers. Loss of skills could result in physicians being unable to overrule or challenge an algorithm’s decision confidently or that, in the event of a network failure or security breach, a physician would be unable to complete certain medical tasks and procedures.”

All in all, the grave concern is that humans as medical professionals will become de-skilled. They will allow their medical deepness to decay. Whatever insightful protection was provided by their human layers of knowledge about medicine and health will erode. A vicious cycle occurs. The better generative AI seems to get, the worse the human side of medical awareness can decline in a downward spiral.

Some refer to this as a race to the bottom.

Others aren’t so sure that this pessimistic scenario is inevitable. It could be that the mundane aspects of medicine and health are handled by generative AI. This, in turn, could allow human medical and health professionals to shift into higher gear. They would be able to offload the routine minutiae and instead devote their precious energy and attention to the more advanced nuances of medicine and healthcare. In that sense, generative AI would spur the medical and health profession to new heights.

Mull over that alternative upbeat future.

So far, I’ve mainly discussed the use of generative AI by medical and health professionals. The other angle consists of people performing self-care. They opt to use generative AI by themselves, without a doctor or other health professional overseeing what is going on. A person relies fully on AI for their medical advisement.

Scary or a boon to the democratization of medicine and healthcare?

Here are some notable points to ponder (excerpts):

  • “LMMs could accelerate the trend towards use of AI by patients and laypeople for medical purposes.”
  • “Individuals have used Internet searches to obtain medical information for two decades. Therefore, LMMs could play a central role in providing information to patients and laypeople, including by integrating them into Internet searches. Large language model powered chatbots could replace search engines for seeking information, including for self-diagnosis and before visiting a medical provider. LMM-powered chatbots, with increasingly diverse forms of data, could serve as highly personalized, broadly focused virtual health assistants.”

The direction seems to be that people would have a personalized generative AI virtual health assistant. In some situations, the AI would be your sole advisor on medical and health issues. You could also make available your virtual health assistant to converse with a medical or health professional, sharing limited aspects of what your AI has gleaned about you. The AI is working on your behalf and as your medical or health advocate and adviser.

Might this be a bridge too far?

We need to keep in mind that generative AI could produce bad advice. A patient might have little basis for judging whether the medical or health recommendations are sound. An added worry that really raises the hairs on the back of your neck: suppose a medical or health generative AI is paid for by a particular company that wants its products or services to be in the foreground of whatever care is being dispensed. Monetization embedded in how the generative AI responds could distort what the generative AI has been devised to emit.

Here are some salient points (excerpts):

  • “Many LMM-powered chatbot applications have distinct approaches to chatbot dialogue, which is expected to become both more persuasive and more addictive, and chatbots may eventually be able to adapt conversational patterns to each user. Chatbots can provide responses to questions or engage in conversation to persuade individuals to undertake actions that go against their self-interest or well-being.”
  • “Several experts have called for urgent action to manage the potential negative consequences of chatbots, noting that they could become ‘emotionally manipulative’.”
  • “Use of LMMs by patients and laypeople may not be private and may not respect the confidentiality of personal and health information that they share. Users of LMMs for other purposes have tended to share sensitive information, such as company proprietary information. Data that are shared on an LMM do not necessarily disappear, as companies may use them to improve their AI models, even though there may be no legal basis for doing so, even though the data may eventually be removed from company servers.”

For my coverage on the lack of privacy and confidentiality that often pervades generative AI, see the link here.

Suppose that eventually the preponderance of patients will make use of generative AI and become greatly accustomed to doing so. When such a patient interacts with their physician, who or what are they going to believe? Should they believe the physician or believe the generative AI? Nowadays, physicians often struggle with discussing complex medical topics that their patients have sought to learn about via online blogs and at times questionable sources of medical and health-related information.

The role of the physician-patient relationship is being rocked and perhaps eternally disrupted (see these excerpts):

  • “Use of LMMs by patients or their caregivers could change the physician–patient relationship fundamentally. The increase in Internet searches by patients during the past two decades has already changed these relationships, as patients can use the information they find to challenge or seek more information from their healthcare provider.”
  • “A related concern is that, if an AI technology reduces contact between a provider and a patient, it could reduce the opportunities for clinicians to promote health and could undermine general supportive care, such as human–human interactions when people are often most vulnerable. Generally, there is concern that clinical care could be ‘de-humanized’ by AI.”

A notable phrase there is that maybe we are heading toward de-humanized clinical care.

Once again, not everyone sees the future in that same light. Rather than AI being a form of dehumanization of patients, perhaps a more resounding sense of humanization will be fostered via the adoption of generative AI.

How so?

The logic is that if patients are better equipped to understand their medical and health circumstances, they will be much better at interacting with and leveraging the advice of their physicians and human medical advisors. Patients will no longer feel as though they are a cog in the convoluted wheels of clinical care. They will be able to stand up and understand what is going on. They will become much more active participants in ensuring their medical and health progression.

Yes, the counterview to de-humanization is that generative AI is going to fully humanize clinical care.

Makes your head spin, I’m sure.

A particular subdomain that I’ve given a well-deserved amount of attention toward consists of the use of generative AI in a mental health therapy context, see my coverage at the link here and the link here, just to name a few instances of my analyses.

The gist is that, given the ease with which everyday non-therapy-trained users can devise mental health chatbots, we are right now in a gigantic global experiment of what happens when society uses untested, unfettered generative AI for mental health:

  • “AI applications in health are no longer used exclusively or accessed and used within health-care systems or in-home care, as AI technologies for health can be readily acquired and used by non-health system entities or simply introduced by a company, such as those that offer LMMs for public use.”
  • “This raises questions about whether such technologies should be regulated as clinical applications, which require greater regulatory scrutiny, or as ‘wellness applications’, which require less regulatory scrutiny. At present, such technologies arguably fall into a grey zone between the two categories.”

There are some areas in which generative AI can shine when it comes to providing a boost to medical and health professionals. One of my favorites is the ongoing effort to bolster empathy in budding medical students and practicing medical doctors. I am a strident advocate of using generative AI to enable medical professionals to learn about empathy, including role-playing with the generative AI to test and enhance their personal empathetic capabilities, see my discussion at the link here.

Anyway, there are lots of sensible and upcoming uses for generative AI in a medical education or instructional setting (see excerpts):

  • “LMMs are also projected to play a role in medical and nursing education.”
  • “They could be used to create ‘dynamic texts’ that, in comparison with generic texts, are tailored to the specific needs and questions of a student. LMMs integrated into chatbots can provide simulated conversations to improve clinician–patient communication and problem-solving, including practicing medical interviewing, diagnostic reasoning and explaining treatment options.”
  • “A chatbot could also be tailored to provide a student with various virtual patients, including those with disabilities or rare medical conditions. LMMs could also provide instruction, in which a medical student asks questions and receives responses accompanied by reasoning through a “chain-of-thought” including physiological and biological processes.”

Finalizing Our Swim And Getting Ready For Further Rounds

I’ve got a few more notable points to cover and then I’ll do a final wrap-up.

Your patience in getting through all of this is appreciated. If we were at Starbucks, I surely would by now gladly have gotten a final round of coffee for our lengthy chat.

Let’s shift gears and consider the use of generative AI for performing scientific research in the medical and health domain.

There is a lot of medical research that goes on. We depend upon this research to discover new advances in improving medical and health options. The time required to properly perform such research can be extensive, plus the costs can be enormous. Yet, no matter how you cut it, without this vaunted research, we might still be using leeches as an everyday medical procedure.

Can generative AI be of assistance when performing medical and health research?

Yes, absolutely.

Are there downsides or gotchas that might go hand-in-hand with using generative AI in this manner?

Yes, absolutely.

There, you got two solid yes answers out of me (please go ahead and ring a bell).

We are again faced with the dual-use issues underlying AI.

Allow me to explain.

Suppose a medical researcher has performed experiments and needs to write up the results. The resultant paper will potentially be published in a medical journal and enable other researchers to further guide and direct their work due to the enlightened insights presented. Generative AI is relatively adept at producing essays. The medical researcher decides that they can save time by having the generative AI write the bulk of the paper.

Some would say that this is no different than using a word processing package to help you compose your work. Others would insist that the comparison is speciously flawed. You might use word processing to deal with spelling and grammar, but you don’t use it to compose the wording per se. Generative AI is going to emit entire passages and could easily be the preponderance of what the paper has to say.

That’s fine, the retort goes, as long as the medical researcher reviews the paper and puts their name on it, all is good. The researcher is to be held accountable. No matter whether they typed it or if they had a team of skilled monkeys on typewriters do so, the buck stops at the feet of the person who has their name on the paper.

But should we still be willing to say that the medical researcher is truly the author of the paper? It seems squishy. They presumably did the core work. Yet they didn’t pull it all together and write up what it was all about. Maybe AI deserves some of the credit. Huh? Given that AI doesn’t have legal personhood, as I noted earlier, the idea of somehow giving credit to AI seems spurious and highly questionable. The AI isn’t going to be accountable, nor should it get credit. You might alert the reader that AI was used. That seems sensible. The key is that you can’t then try to deflect accountability by later claiming that any errors in the paper were due to the AI. The human author must still be held accountable.

Round and round this goes.

Medical journals are still in the midst of coming up with rules about when, where, and how generative AI can be used in these delicate matters. There are additional concerns. Suppose the generative AI plagiarized material or infringed on copyrights, see my in-depth review at the link here. If someone uses generative AI to summarize other medical works, can the summary be relied upon or might it be askew? The summarization facility of generative AI is great, though as I’ve noted in my assessments, you are faced with a box of chocolates that you don’t know for sure what you might get, see the link here.

Here are salient points to consider (excerpts):

  • “LMMs can be used in a variety of aspects of scientific research.”
  • “They can generate text to be used in a scientific article, for submitting manuscripts or in writing a peer review. They can be used to summarize texts, including summaries for academic papers, or can generate abstracts. LMMs can also be used to analyze and summarize data to gain new insights in clinical and scientific research. They can be used to edit text, improving the grammar, readability and conciseness of written documents such as articles and grant proposals.”
  • “The authorship of a scientific or medical research paper requires accountability, which cannot be assumed by AI tools.”
  • “Use of LMMs for activities such as generating peer reviews could undermine trust in that process.”

Another rising concern is what some refer to as model collapse, namely the disturbing possibility of the training well becoming bloated with flotsam synthetic data.

The deal is this.

Envision that we use generative AI, and it produces gobs and gobs of essays and writings about medical and health topics. We shall refer to those generated works as synthetic data. It is synthetic in the sense that it wasn’t written by a human but instead generated by AI. So far, so good.

Human medical researchers are gradually writing less and less due to relying upon generative AI to do their writing for them. The published works as composed by the generative AI go onto the Internet.

Along comes the next and greatest version of generative AI that is being data-trained via content on the Internet. Your Spidey sense should now be tingling. Something might be afoot.

What is the nature of the content that is ergo serving as the core underpinning for pattern-matching of this new generative AI?

It is no longer human writing in any pure sense. It has become principally synthetic data. The generative AI-produced writings might swamp the teeny amount of remaining human writing. Some argue that this is a doomsday-style scenario. We are going to merely have generative AI that is data-trained on regurgitated data. The generative AI turns out wimpy. We might not realize what we have done. We will have cooked our own goose, if you will.
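A crude back-of-the-envelope way to see the worry (my own toy arithmetic, with made-up growth numbers, not a model of actual training dynamics): if AI-generated text keeps accumulating online faster than human-written text, the human share of any Internet-scraped training corpus shrinks generation after generation.

```python
# Toy arithmetic illustrating the "synthetic data swamps human data" concern.
# Growth figures are invented; this is not a simulation of real model-collapse dynamics.
human_text = 100.0     # arbitrary units of human-written content online
synthetic_text = 0.0   # AI-generated content online

for generation in range(1, 6):
    human_text += 2.0       # assumed slow growth of human writing
    synthetic_text += 60.0  # assumed rapid growth of AI-generated writing
    human_share = human_text / (human_text + synthetic_text)
    print(f"Training generation {generation}: human share of corpus = {human_share:.0%}")
```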

For my analysis of the downsides and upsides of this, see the link here.

Since we are pontificating about medical research, let’s consider an intriguing possibility that I’ve discussed at length at the link here and has to do with the availability of mega-personas in modern-day generative AI.

The odds are that a lot of medical research depends upon finding human subjects who are able and willing to participate in a medical study. This is a tough problem for the medical field. How do you find people for this purpose? If you find them, how do you motivate them to participate? Will they last the course of the study or might they drop out? The entire matter can undercut the best of medical studies.

Consider these pertinent points (excerpts):

  • “A third application of patient-centered LMMs could be for identifying clinical trials or for enrolment in such trials. While AI-based programs already assist both patients and clinical trial researchers in identifying a match, LMMs could be used in the same way by using a patient’s relevant medical data. This use of AI could both lower the cost of recruitment and increase speed and efficiency, while giving individuals more opportunities to seek appropriate trials and treatment that are difficult to identify and access through other channels.”

As indicated, we can use generative AI in the effort to devise and carry out clinical trials. This showcases the wide variety of ways that generative AI can be used in medical and health research. The range is wide. You might only at first glance contemplate the writing part of such research as being applicable, but nearly any of the activities are potentially amenable to being aided by generative AI.

If you were paying close attention, you might be saying to yourself that I promised there was an intriguing aspect that had to do with mega-personas. Where did that go? Did it disappear?

Thanks for keeping me on track.

Here’s the deal.

Trying to gather dozens of participants for a medical study is difficult. If you want hundreds or thousands of patients, the difficulty factor goes through the roof.

Imagine that we could simulate the efforts of patients. Rather than necessarily using human patients, we might be able to use AI-devised “patients” that seemingly act and react as patients might. This would immensely speed up research, reduce costs, and provide a whole lot of flexibility in terms of what might be asked of the “patients” during such a study.

Into this picture steps generative AI via mega-personas, see the link here. An intrinsic part of generative AI is the capability to create mega-personas. You can tell the generative AI that you want a faked set of a thousand people that meet this or that criterion. You want another set of an additional thousand people that meet some other criterion. After doing the appropriate establishing, you then instruct the generative AI to proceed as though these faked people have been taking some medical actions for days, weeks, or months. You use the result to do your medical analyses.
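Here is a rough sketch of how such a request might look in code (entirely my own illustration; the prompt wording, cohort criteria, and the expectation that the model returns clean JSON are all assumptions, and any synthetic cohort would need heavy validation before informing real research):

```python
# Sketch of requesting a synthetic "mega-persona" cohort from a generative AI model.
# Prompt wording, fields, and criteria are illustrative; assumes OPENAI_API_KEY is set.
import json
from openai import OpenAI

client = OpenAI()

prompt = (
    "Generate 5 fictional patient personas as a JSON array. Each persona needs: "
    "age, sex, relevant_history, and a 4-week log of adherence to a daily walking regimen. "
    "Criteria: adults aged 50 to 70 with type 2 diabetes."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)

# Parsing may fail if the model does not return clean JSON; real use needs validation.
personas = json.loads(response.choices[0].message.content)
print(f"Received {len(personas)} synthetic personas for downstream analysis.")
```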

Voila, you’ve done medical research at a fraction of the usual cost and effort.

I’m betting you right away wondered whether this is really a viable means of representing real humans. Glad you asked. There have been simulations of this kind for many years in the medical and health domain. Much scrutiny and care must be used. You cannot assume that whatever happens in a simulated setting is going to be the same as in the real world.

Mega-personas are handy because they allow medical researchers to try these techniques without having to know programming or have arcane skills in proprietary simulation languages. The flip side is that medical researchers might lose their heads and jump into using something without really knowing what it does. We need to step cautiously into this emerging possibility.

Sorry, no silver bullet, no grand solution, but a promising surprise worth exploring.

To finish up these keystones about generative AI and the medical and health field, I’ll cover two macroscopic considerations.

First, we would be astute to look across the board at what generative AI might end up doing when used on a large scale across the entire swath of the medical and health field. You can expect that at least these six essential building blocks will be impacted: (1) medical and health service delivery, (2) the medical and health workforce, (3) medical and health IT or information systems, (4) medicines access and availability, (5) medical and health economics and financial affairs, and (6) medical and health leadership and overall governance.

Here are some key points (excerpts):

  • “Whereas many risks and concerns associated with LMMs affect individual users (such as health-care professionals, patients, researchers or caregivers), they may also pose systemic risks.”
  • “Emerging or anticipated risks associated with use of LMMs and other AI-based technologies in health care include: (i) risks that could affect a country’s health system, (ii) risks for regulation and governance and (iii) international societal concerns.”
  • “Health systems are based on six building blocks: service delivery, the health workforce, health information systems, access to essential medicines, financing, and leadership and governance. LMMs could directly or indirectly impact these building blocks.”

I trust you can see how this bigger pattern needs due diligence. How will generative AI change national practices of medicine and health? How will it change international practices? It is easy to take a myopic view of generative AI, but it is vital to see the forest for the trees.

Lastly, a means of comprehending generative AI involves putting your mind toward the AI value chain. Here’s what that means. AI doesn’t just spring out of nowhere. The reality is that there are a series of stages or phases of AI coming along and into the medical and health arena.

The typical layout is that there are three main stages. Things begin with AI makers that opt to devise generative AI as apps or tools. This is usually generic generative AI. Next, as we proceed further into the AI value chain, the generic generative AI is molded or customized for a medical or health purpose. That’s the second stage. Finally, the generative AI that is readied for medical or health is deployed into the field.

Deployment is of equal importance to the other two stages. Many people falsely assume that you can haphazardly toss generative AI into the hands of users. Doing so is troubling, happens quite frequently (unfortunately), and almost always bodes disturbing problems, see my detailed case study of an eating disorder chatbot that went awry during deployment at the link here.

Go ahead and take a moment to closely examine these points (excerpts):

  • “Appropriate governance of LMMs used in health care and medicine should be defined at each stage of the value chain, from collection of data to deployment of applications in health care.”
  • “Therefore, the three critical stages of the AI value chain discussed are: (1) the design and development of general-purpose foundation models (design and development phase); (2) definition of a service, application or product with a general-purpose foundation model (provision phase); and (3) deployment of a health-care service application or service (deployment phase).”
  • “At each stage of the AI value chain, the following questions are asked: (i) Which actor (the developer, the provider and/or the deployer) is best placed to address relevant risks? What risks should be addressed in the AI value chain? (ii) How can the relevant actor(s) address such risks? What ethical principles must they uphold? (iii) What is the role of a government in addressing risks? What laws, policies or investment might a government introduce or apply to require actors in the AI value chain to uphold specific ethical principles?”

By looking at generative AI from an AI value chain perspective, you can lift yourself out of the trees and discern the entirety of the forest. We need to be thinking about the day-to-day repercussions of generative AI in the medical and health domain, along with having a clear and broadened view of the total landscape that is going to be impacted.

Conclusion

Whew, you made it to my concluding remarks, congrats.

We nearly got asked to leave Starbucks for having sat there for so long. They usually don’t nudge people, but we had such an intense discussion and held onto a table for a nearly endless period of time.

Let’s do a quick wrap-up and then head on our respective ways.

It is the best of times for the medical and health field due to the advent of generative AI. It is lamentably also potentially the worst of times, if we aren’t careful about how we opt to devise, customize, and deploy generative AI.

The Hippocratic oath informs us to devoutly carry out the medical and health profession with good conscience and dignity and in accordance with sound medical practice. There is an encouraging chance that the proper use of generative AI will enliven that part of the oath. You might say we are obligated to try.

Of course, another rule of thumb must always be at the forefront of our minds.

First, do no harm.

Okay, that’s it, so thanks again for joining me, and I look forward to having another coffee drinking chat with you soon.


