ChatGPT’s Performance Found Lacking on Cancer Treatment Recommendations

In a recent column published by the Dana-Farber Cancer Institute, one of its physicians relayed his experience using ChatGPT to generate statistics on a certain type of cancer. To his surprise, ChatGPT fabricated an equation and even gave it a name.

“It was an equation that does nothing, but it looked very convincing,” said Benjamin Schlechter, M.D., who specializes in gastrointestinal cancers. “In a way, it’s like talking to children: They start making up a story and continue the more you ask them about it. In this case, ChatGPT was adding detail after detail, none of it real, because I asked it to elaborate. It’s very confident for a computer.”

It turns out that ChatGPT has similar problems with accuracy in making cancer treatment recommendations, according to a study recently published in JAMA Oncology.

Researchers from Mass General Brigham found that one-third of GPT-3.5’s recommendations were at least partially nonconcordant with 2021 National Comprehensive Cancer Network (NCCN) guidelines. “Clinicians should advise patients that large language model chatbots are not a reliable source of information,” the study concluded.

The chatbot was most likely to mix incorrect recommendations in among correct ones, a kind of error that is difficult even for experts to detect. The study evaluated only one model at a single point in time, but the findings highlight areas of concern and needs for future research.

Danielle Bitterman, M.D., of Mass General Brigham’s department of radiation oncology and its Artificial Intelligence in Medicine Program, said in a statement: “ChatGPT responses can sound a lot like a human and can be quite convincing. But, when it comes to clinical decision-making, there are so many subtleties for every patient's unique situation. A right answer can be very nuanced, and not necessarily something ChatGPT or another large language model can provide.”

The chatbot did not purport to be a medical device, and need not be held to such standards, the study said. Patients, however, likely will use technologies like this to educate themselves, which may affect shared decision-making in the doctor-patient relationship.

The investigators plan to explore how well patients and physicians can distinguish medical advice written by a physician from advice generated by AI. They also plan to prompt ChatGPT with more detailed clinical cases to further evaluate its clinical knowledge.
