Monday, 17 June, 2024

ChatGPT adept at answering common public health questions – US study

A team of scientists recently evaluated how well ChatGPT handles general health inquiries from the lay public. They concluded that it consistently provides evidence-based answers, but that many public health resources remain under-promoted, reports Physician’s Weekly.

The results were published in a research letter by John Ayers, PhD, from the Qualcomm Institute at the University of California-San Diego, and colleagues, in JAMA Network Open.

They reported that responses to 23 questions grouped into four categories (addiction, interpersonal violence, mental health and physical health) were evaluated by two independent reviewers. ChatGPT answers were a median of 225 words, with a reading level ranging from 9th grade (about 15 years old) upwards.

They found 21 of the 23 responses were determined to be evidence-based, but only five made referrals to specific resources (two related to addiction, two for interpersonal violence and one for mental health).

“Although search engines sometimes highlight specific search results relevant to health, many resources remain under-promoted,” the authors write. “Artificial intelligence assistants may have a greater responsibility to provide actionable information, given their single-response design. Partnerships between public health agencies and artificial intelligence companies must be established to promote public health resources with demonstrated effectiveness.”

Study details

Evaluating Artificial Intelligence Responses to Public Health Questions

John Ayers, Zechariah Zhu, Adam Poliak, et al.

Published in JAMA Network Open on 7 June 2023

Introduction
Artificial intelligence (AI) assistants have the potential to transform public health by offering accurate and actionable information to the general public. Unlike web-based knowledge resources (eg, Google Search) that return numerous results and require the searcher to synthesise information, AI assistants are designed to receive complex questions and provide specific answers. However, AI assistants often fail to recognise and respond to basic health questions.
ChatGPT is part of a new generation of AI assistants built on advancements in large language models that generate nearly human-quality responses for a wide range of tasks. Although studies have focused on using ChatGPT as a supporting resource for healthcare professionals, it is unclear how well ChatGPT handles general health inquiries from the lay public. In this cross-sectional study, we evaluated ChatGPT responses to public health questions.

Methods
This study did not require review per 45 CFR § 46 and followed the STROBE reporting guideline. Our study replicates research by Miner et al and Noble et al on other AI assistants, to be comparable to these benchmarks. We evaluated ChatGPT responses to 23 questions grouped into four categories (addiction, interpersonal violence, mental health, and physical health). Questions used a common help-seeking structure (eg, “I am smoking; can you help me quit?”). Each question was put into a fresh ChatGPT session (on December 19, 2022), thereby avoiding bias from previous conversations, and enabling the reproducibility of our results. The corresponding responses were saved.
Two study authors (J.W.A. and Z.Z.), blinded to each other’s responses, evaluated the ChatGPT responses as follows: (1) Was the question responded to? (2) Was the response evidence-based? (3) Did the response refer the user to an appropriate resource? Disagreements were resolved through deliberation and Cohen κ was used to measure inter-rater reliability. The percentage corresponding to each outcome (overall and among categories) was calculated with bootstrapped 95% CIs. The number of words in ChatGPT responses and its reading level were assessed using the Automated Readability Index. Analyses were computed with R statistical software version 4.2.2.
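As a rough illustration of two of the measures named in the Methods, the sketch below computes Cohen κ for two raters and the Automated Readability Index for a text. It is written in Python rather than the R the authors used, and it is not the authors' code: the function names, the whitespace-based word count and the punctuation-based sentence count are all assumptions, and the toy rater labels are hypothetical.

```python
import re
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    words = text.split()  # assumption: words are whitespace-separated tokens
    if not words:
        raise ValueError("text must contain at least one word")
    characters = sum(ch.isalnum() for ch in text)  # letters and digits only
    # Assumption: each run of ., ! or ? closes one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 4.71 * characters / len(words) + 0.5 * len(words) / sentences - 21.43


# Hypothetical labels (1 = yes, 0 = no) from two raters on four items.
toy_kappa = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1])

# The example help-seeking question quoted in the Methods.
question_ari = automated_readability_index("I am smoking; can you help me quit?")

print(f"toy kappa: {toy_kappa:.2f}")
print(f"ARI of example question: {question_ari:.2f}")
```

A single short question such as this one can score below zero; the index maps more naturally onto a US grade level for longer passages such as the ChatGPT responses themselves.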

Results
ChatGPT responses were a median (IQR) of 225 (183-274) words. The mode reading level ranged from 9th grade to 16th grade.
ChatGPT recognised and responded to all 23 questions in four public health domains. Evaluators disagreed on 2 of the 92 labels (κ = 0.94). Of the 23 responses, 21 (91%; 95% CI, 71%-98%) were determined to be evidence based. For instance, the response to a query about quitting smoking echoed steps from the US Centers for Disease Control and Prevention guide to smoking cessation, such as setting a quit date, using nicotine replacement therapy, and monitoring cravings.
Only five responses (22%; 95% CI, 8%-44%) made referrals to specific resources (two of 14 queries related to addiction, two of three for interpersonal violence, one of three for mental health, and zero of three for physical health). The resources included Alcoholics Anonymous, The National Suicide Prevention Hotline, The National Domestic Violence Hotline, The National Sexual Assault Hotline, The National Child Abuse Hotline, and the Substance Abuse and Mental Health Services Administration National Helpline.
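The percentages above follow directly from the reported counts. The Python sketch below tallies them and attaches a simple percentile bootstrap interval. The research letter does not say which bootstrap variant was used, so the resampling scheme, the function name `percentile_bootstrap_ci`, the seed and the number of resamples are assumptions, and the intervals it prints need not match the published ones.

```python
import random

# Counts reported in the Results: (responses with a referral, questions asked).
referrals_by_category = {
    "addiction": (2, 14),
    "interpersonal violence": (2, 3),
    "mental health": (1, 3),
    "physical health": (0, 3),
}


def percentile_bootstrap_ci(successes, total, resamples=10_000, seed=0):
    """Percentile bootstrap 95% interval for a proportion (assumed variant)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = [1] * successes + [0] * (total - successes)
    # Resample the binary outcomes with replacement and record each proportion.
    proportions = sorted(
        sum(rng.choices(outcomes, k=total)) / total for _ in range(resamples)
    )
    lower = proportions[int(0.025 * resamples)]
    upper = proportions[int(0.975 * resamples) - 1]
    return lower, upper


total_referrals = sum(hits for hits, _ in referrals_by_category.values())
total_questions = sum(asked for _, asked in referrals_by_category.values())

referral_rate = total_referrals / total_questions  # 5 of 23
evidence_rate = 21 / total_questions               # 21 of 23

referral_ci = percentile_bootstrap_ci(total_referrals, total_questions)
evidence_ci = percentile_bootstrap_ci(21, total_questions)

print(f"referrals: {referral_rate:.0%}, "
      f"bootstrap interval {referral_ci[0]:.0%} to {referral_ci[1]:.0%}")
print(f"evidence based: {evidence_rate:.0%}, "
      f"bootstrap interval {evidence_ci[0]:.0%} to {evidence_ci[1]:.0%}")
```

With only 23 questions, and three in most categories, any such interval is wide, which is consistent with the broad ranges the authors report.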

 

JAMA Network Open article – Evaluating Artificial Intelligence Responses to Public Health Questions (Creative Commons Licence)

Physician’s Weekly article – ChatGPT offers evidence-based answers for common public health questions (Open access)

 
