ChatGPT tops doctors when it comes to bedside manner – US study

ChatGPT appears to have a better “bedside manner” than some doctors – at least when their written advice to patients is rated for quality and empathy, a study has shown. The researchers suggest that artificial intelligence assistants may be able to help draft responses to patient questions.

The findings highlight the potential for AI assistants to play a role in medicine, according to the authors of the work, which was published in JAMA Internal Medicine.

“The opportunities for improving healthcare with AI are massive,” said Dr John Ayers, of the University of California San Diego.

However, others noted that the findings do not mean ChatGPT is actually a better doctor, and cautioned against delegating clinical responsibility, given that the chatbot has a tendency to produce “facts” that are untrue, reports The Guardian.

The study used data from Reddit’s AskDocs forum, in which members can post medical questions that are answered by verified healthcare professionals. The team randomly sampled 195 exchanges from AskDocs where a verified doctor responded to a public question.

The original questions were then posed to the AI language model, ChatGPT, which was asked to respond. A panel of three licensed healthcare professionals, who did not know whether the response came from a human physician or ChatGPT, rated the answers for quality and empathy.

Overall, the panel preferred ChatGPT’s responses to those given by a human 79% of the time. ChatGPT responses were also rated good or very good quality 79% of the time, compared with 22% of doctors’ responses, and 45% of the ChatGPT answers were rated empathic or very empathic, compared with just 5% of doctors’ replies.

Dr Christopher Longhurst, of UC San Diego Health, said: “These results suggest that tools like ChatGPT can efficiently draft high-quality, personalised medical advice for review by clinicians, and we are beginning that process at UCSD Health.”

Professor James Davenport, of the University of Bath, who was not involved in the research, said: “The paper does not say that ChatGPT can replace doctors, but does, quite legitimately, call for further research into whether and how ChatGPT can assist physicians in response generation.”

Some noted that, given ChatGPT was specifically optimised to be likeable, it was not surprising that it wrote text that came across as empathic. It also tended to give longer, chattier answers than human doctors, which could have played a role in its higher ratings.

Others cautioned against relying on language models for factual information due to their tendency to generate made-up “facts”.

Professor Anthony Cohn of the University of Leeds said using language models as a tool to draft responses was a “reasonable use case for early adoption”, but that even in a supporting role they should be used carefully.

“Humans have been shown to overly trust machine responses, particularly when they are often right, and a human may not always be sufficiently vigilant to properly check a chatbot’s response,” he said. “This would need guarding against, perhaps using random synthetic wrong responses to test vigilance.”

Study details

Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum

John Ayers, Adam Poliak, Mark Dredze, et al.

Published in JAMA Internal Medicine on 28 April 2023

Key Points

Question Can an artificial intelligence chatbot assistant provide responses to patient questions that are of comparable quality and empathy to those written by physicians?
Findings In this cross-sectional study of 195 randomly drawn patient questions posted publicly to a social media forum, a team of licensed health care professionals compared physician and chatbot responses to each question. The chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy.
Meaning These results suggest that artificial intelligence assistants may be able to aid in drafting responses to patient questions.

Abstract

Importance
The rapid expansion of virtual health care has caused a surge in patient messages concomitant with more work and burnout among health care professionals. Artificial intelligence (AI) assistants could potentially aid in creating answers to patient questions by drafting responses that could be reviewed by clinicians.

Objective
To evaluate the ability of an AI chatbot assistant (ChatGPT), released in November 2022, to provide quality and empathetic responses to patient questions.

Design, Setting, and Participants
In this cross-sectional study, a public and non-identifiable database of questions from a public social media forum (Reddit’s r/AskDocs) was used to randomly draw 195 exchanges from October 2022 in which a verified physician responded to a public question. Chatbot responses were generated by entering the original question into a fresh session (without prior questions having been asked in the session) on December 22 and 23, 2022. The original question along with anonymised and randomly ordered physician and chatbot responses were evaluated in triplicate by a team of licensed health care professionals. Evaluators chose “which response was better” and judged both “the quality of information provided” (very poor, poor, acceptable, good, or very good) and “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, or very empathetic). Ratings were ordered on a 1 to 5 scale and mean scores were compared between the chatbot and physicians.
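To make the scoring concrete, below is a minimal sketch of how the ordinal labels could be mapped onto the 1-to-5 scale and averaged across the three evaluations per exchange. The label-to-number mapping and the example ratings are assumptions for illustration, not the study’s actual analysis code.

```python
# Illustrative scoring sketch: map ordinal labels to 1-5 and average the
# three evaluators' ratings per response. The mapping and example data are
# assumptions for illustration, not the study's actual analysis code.

QUALITY_SCALE = {"very poor": 1, "poor": 2, "acceptable": 3, "good": 4, "very good": 5}
EMPATHY_SCALE = {"not empathetic": 1, "slightly empathetic": 2,
                 "moderately empathetic": 3, "empathetic": 4, "very empathetic": 5}

def mean_score(labels, scale):
    """Average the triplicate ratings for one response on a 1-to-5 scale."""
    scores = [scale[label] for label in labels]
    return sum(scores) / len(scores)

# Example: one exchange rated by three evaluators (hypothetical ratings).
physician_quality = mean_score(["acceptable", "good", "acceptable"], QUALITY_SCALE)
chatbot_quality = mean_score(["good", "very good", "good"], QUALITY_SCALE)
print(physician_quality, chatbot_quality)  # approx. 3.33 vs 4.33
```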

Results
Of the 195 questions and responses, evaluators preferred chatbot responses to physician responses in 78.6% (95% CI, 75.0%-81.8%) of the 585 evaluations. Mean (IQR) physician responses were significantly shorter than chatbot responses (52 [17-62] words vs 211 [168-245] words; t = 25.4; P < .001). Chatbot responses were rated of significantly higher quality than physician responses (t = 13.3; P < .001). The proportion of responses rated as good or very good quality (≥4), for instance, was higher for the chatbot than for physicians (chatbot: 78.5%, 95% CI, 72.3%-84.1%; physicians: 22.1%, 95% CI, 16.4%-28.2%). This amounted to a 3.6 times higher prevalence of good or very good quality responses for the chatbot. Chatbot responses were also rated significantly more empathetic than physician responses (t = 18.9; P < .001). The proportion of responses rated empathetic or very empathetic (≥4) was higher for the chatbot than for physicians (chatbot: 45.1%, 95% CI, 38.5%-51.8%; physicians: 4.6%, 95% CI, 2.1%-7.7%). This amounted to a 9.8 times higher prevalence of empathetic or very empathetic responses for the chatbot.
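The 3.6-fold and 9.8-fold figures follow directly from the reported proportions; the short sketch below reproduces that arithmetic using only the percentages quoted in the abstract (variable names are illustrative).

```python
# Reproduce the prevalence ratios reported above from the quoted percentages.
# Values come from the abstract; variable names are illustrative.

good_or_very_good = {"chatbot": 0.785, "physicians": 0.221}   # quality ratings >= 4
empathetic_or_very = {"chatbot": 0.451, "physicians": 0.046}  # empathy ratings >= 4

quality_ratio = good_or_very_good["chatbot"] / good_or_very_good["physicians"]
empathy_ratio = empathetic_or_very["chatbot"] / empathetic_or_very["physicians"]

print(f"quality prevalence ratio: {quality_ratio:.1f}x")   # -> 3.6x
print(f"empathy prevalence ratio: {empathy_ratio:.1f}x")   # -> 9.8x
```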

Conclusions
In this cross-sectional study, a chatbot generated quality and empathetic responses to patient questions posed in an online forum. Further exploration of this technology is warranted in clinical settings, such as using a chatbot to draft responses that physicians could then edit. Randomised trials could further assess whether using AI assistants might improve responses, lower clinician burnout, and improve patient outcomes.

 

JAMA Internal Medicine article – Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum (Open access)

 

The Guardian article – AI has better ‘bedside manner’ than some doctors, study finds (Open access)

 

See more from MedicalBrief archives:

 

The risks of ChatGPT in healthcare

 

AI outperforms humans in creating cancer treatments — but doctors balk

 

OECD: How artificial intelligence could change the future of health