SA doctors beaten by AI in hospital study

In an experiment by South African researchers, AI models outperformed doctors’ ward diagnoses in a large public hospital. The results are similar to those of a US study last month that indicated AI systems fared better than doctors at emergency room diagnoses and triage.

Business Day reports that experts have said the local findings boost the prospect of developing reliable and inexpensive AI tools to help reduce the workloads of overstretched healthcare staff.

Free AI chatbots often dispense unreliable health information, but evidence is growing that commercially available AI systems of the kind tested by the researchers are more dependable. They offer “exciting potential to alleviate the huge pressures on low- and middle-income country healthcare workers and the systems in which they work”, said Bruce Bassett, distinguished professor of AI at Wits University and lead author of a study describing the work, which was published last month on the preprint server arXiv.

The paper has not been peer reviewed, but its findings are broadly in line with those of last month’s American study.

For the South African study, researchers asked pairs of expert doctors to scrutinise 300 sets of in-patient files from Chris Hani Baragwanath Academic Hospital and determine a diagnosis based on their analysis of the written records.

The files contained the results of multiple diagnostic tests, including images from X-rays and MRIs, lab tests and vital sign measurements like blood pressure and temperature.

The experts’ findings were used as a benchmark against which to score the diagnoses reached by hospital staff and 10 different AI systems. These included Anthropic’s Claude 4.1 Opus and 4.5 Sonnet; Google’s Gemini 3 Pro, 2.5 Pro and 2.5 Flash; OpenAI’s GPT-5.1, GPT-5.1 mini, o3 and o4-mini; and xAI’s Grok 4.1 Fast Reasoning.

OpenAI’s GPT-5.1 scored best among the models, while Claude 4.1 Opus scored worst, but all of the models consistently outdid the ward diagnoses made by hospital staff.

There was a 15% variation in performance between the cheapest and most expensive AI models, which ranged in cost from 1 US cent to 50 US cents. These models were significantly cheaper than the pairs of expert doctors, who cost $40 a case.

The AI models were cheap even when compared with the cost of physicians in countries where public sector salaries are far lower than in South Africa. In Nigeria, for example, where physician salaries are about $1 200 a year, the average cost of a case would be $2, the researchers said.
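To put the quoted figures side by side, the short sketch below works out how many times more a doctor-reviewed case costs than an AI-reviewed one. It uses only the numbers reported above and assumes the AI price range of 1 to 50 US cents is a per-case cost, like the $40 figure for the expert pairs. The variable and function names are illustrative and do not come from the study.

```python
# Costs in US dollars, taken from the figures quoted in the article.
# Assumption: the AI price range is per case, like the doctors' figure.
AI_COST_CHEAPEST = 0.01        # cheapest AI model
AI_COST_PRICIEST = 0.50        # most expensive AI model
EXPERT_PAIR_COST = 40.00       # pair of expert doctors, per case
NIGERIA_PHYSICIAN_COST = 2.00  # estimated physician cost per case in Nigeria


def cost_ratio(human_cost, ai_cost):
    """Return how many times more the human review costs than the AI review."""
    return human_cost / ai_cost


comparisons = [
    ("Expert doctor pair", EXPERT_PAIR_COST),
    ("Physician in Nigeria", NIGERIA_PHYSICIAN_COST),
]

for label, human_cost in comparisons:
    # Smallest gap: compare against the most expensive AI model.
    low = cost_ratio(human_cost, AI_COST_PRICIEST)
    # Largest gap: compare against the cheapest AI model.
    high = cost_ratio(human_cost, AI_COST_CHEAPEST)
    print(f"{label}: {low:.0f}x to {high:.0f}x the cost of an AI diagnosis")
```

On these figures, even the most expensive AI model costs a fraction of the $2 Nigerian estimate.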

Given the rapid fall in AI costs in recent years, high-quality diagnoses are likely to become even more affordable, they said.

“We are entering the era of cheap, good-quality (AI) diagnosis,” Bassett said.

Study details

Heads we win, tails you lose: AI detectors in education

Mark Andrew Bassett, Wayne Bradshaw, Hannah Bornsztejn et al.

Published on arXiv on 29 January 2026

Abstract

The increasing use of generative artificial intelligence (AI) in student assessment has led to institutional reliance on detection tools. Unlike plagiarism detection, AI detection relies on unverifiable probabilistic estimates. In this paper, we argue that generative AI detection should not be used in education due to its methodological imperfections, violation of procedural fairness, and unverifiable outputs. Generative AI detectors cannot be tested in real-world conditions where the true origin of a text is unknown. Attempts to validate results through linguistic markers, multiple tools, or comparisons with past work introduce confirmation bias rather than independent verification. Moreover, categorising text as human- or AI-generated imposes a false dichotomy that ignores work created with, not by, AI. Generative AI detection also raises security concerns. Academic integrity investigations must rely on evidence meeting the balance of probabilities standard, which generative AI detection scores do not satisfy.

 

arXiv article – Heads we win, tails you lose: AI detectors in education (Open access)

News24 article – AI models outperform doctors in SA hospital study (Restricted access)

See more from MedicalBrief archives:

AI beats doctors in Harvard emergency triage diagnosis trial

AI chatbots outstrip doctors in diagnoses – US randomised study

ChatGPT diagnoses child’s illness after 17 doctors fail
