AI algorithms in diagnosis could harm patients – Dutch study

Experts have warned that artificial intelligence in healthcare could prioritise predictive accuracy over treatment efficacy, potentially resulting in harm to patients.

Researchers in the Netherlands suggested that while AI-driven outcome prediction models (OPMs) are promising, they risk creating “self-fulfilling prophecies” because of biases in historical data.

OPMs utilise patient-specific information, including health history and lifestyle factors, to assist doctors in evaluating treatment options.

AI’s ability to process this data in real time offers significant advantages for clinical decision making.

However, reports The Independent, the researchers’ mathematical models demonstrate a potential downside: if trained on data reflecting historical disparities in treatment or demographics, AI could perpetuate these inequalities, leading to suboptimal patient outcomes.

The study highlights the crucial role of human oversight in AI-driven healthcare. Researchers emphasise the “inherent importance” of applying “human reasoning” to AI’s decisions, ensuring that algorithmic predictions are critically evaluated and do not inadvertently reinforce existing biases.

Scenarios

The team created mathematical scenarios to test how AI may harm patient health, and suggested that these models “can lead to harm”.

“Many expect that by predicting patient-specific outcomes, these models have the potential to inform treatment decisions and they are frequently lauded as instruments for personalised, data-driven healthcare,” they said.

“We show, however, that using prediction models for decision-making can lead to harm, even when the predictions exhibit good discrimination after deployment.”

The article, published in the data-science journal Patterns, also suggests that AI model development “needs to shift its primary focus away from predictive performance and instead toward changes in treatment policy and patient outcome”.

Reacting to the risks outlined in the study, Dr Catherine Menon, a principal lecturer at the University of Hertfordshire’s department of computer science, said: “This happens when AI models have been trained on historical data, where the data do not necessarily account for such factors as historical under-treatment of some medical conditions or demographics.

“These models will accurately predict poor outcomes for patients in these demographics.

“This creates a ‘self-fulfilling prophecy’ if doctors decide not to treat these patients due to the associated treatment risks and the fact that the AI predicts a poor outcome for them.

“Even worse, this perpetuates the same historic error: under-treating these patients means that they will continue to have poorer outcomes.”

Use of these AI models therefore risks worsening outcomes for patients who have historically been discriminated against in medical settings due to factors such as race, gender or educational background, she added.

“This demonstrates the inherent importance of evaluating AI decisions in context and applying human reasoning and assessment to AI judgments.”

AI is currently used across the NHS in England to help clinicians read X-rays and CT scans, to free up staff time and to speed up the diagnosis of strokes.

In January, Prime Minister Sir Keir Starmer pledged that the UK would be an “AI superpower”, and that the technology could be used to tackle NHS waiting lists.

Ewen Harrison, a professor of surgery and data science and co-director of the Centre for Medical Informatics at the University of Edinburgh, said: “While these tools promise more accurate and personalised care, this study highlights one of a number of concerning downsides: predictions themselves can unintentionally harm patients by influencing treatment decisions.

“Say a hospital introduces a new AI tool to estimate who is likely to have a poor recovery after knee replacement surgery. The tool uses characteristics such as age, body weight, existing health problems and physical fitness.

“Initially, doctors intend to use this tool to decide which patients would benefit from intensive rehabilitation therapy.

“However, due to limited availability and cost, it is decided instead to reserve intensive rehab primarily for patients predicted to have the best outcomes.

“Patients labelled by the algorithm as having a ‘poor predicted recovery’ receive less attention, fewer physiotherapy sessions and less encouragement overall.”

He added that this leads to a slower recovery, more pain and reduced mobility in some patients.

Study details

When accurate prediction models yield harmful self-fulfilling prophecies

Wouter A.C. van Amsterdam, Nan van Geloven, Jesse H. Krijthe et al.

Published in Patterns on 11 April 2025

The bigger picture
To tailor treatment decisions to individual patients, many researchers develop prediction models. These models assess a patient’s risk of an adverse outcome, such as a heart attack, based on their characteristics. Many believe that the best prediction models for decision-making are those that have the highest predictive performance, e.g., discrimination—the ability to assign higher risks to patients with the outcome compared to those without. Common advice is to keep evaluating discrimination after the model’s implementation to ensure effective decision-making. We show, through a clinical example and mathematical proofs, that this belief is flawed because of the existence of so-called “harmful self-fulfilling prophecies”: prediction models that retain good discrimination after implementation and yet harm patients when used for decision-making. The takeaway is that rather than relying on discrimination, we should assess models based on their impact on treatment decisions and patient outcomes.

Summary
Prediction models are popular in medical research and practice. Many expect that by predicting patient-specific outcomes, these models have the potential to inform treatment decisions, and they are frequently lauded as instruments for personalised, data-driven healthcare. We show, however, that using prediction models for decision-making can lead to harm, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcome of these patients does not diminish the discrimination of the model. Our main result is a formal characterisation of a set of such prediction models. Next, we show that models that are well calibrated before and after deployment are useless for decision-making, as they make no change in the data distribution. These results call for a reconsideration of standard practices for validation and deployment of prediction models that are used in medical decisions.
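For readers who want to see how a “harmful self-fulfilling prophecy” can arise, the toy simulation below is a minimal sketch of the idea rather than the authors’ formal model. It assumes an invented “frailty” risk factor, an invented treatment effect and a hypothetical policy that withholds treatment from the roughly 30% of patients with the highest predicted risk, and it uses Python with numpy and scikit-learn. The point it illustrates is the one made in the abstract: outcomes worsen for the flagged patients, yet the model’s discrimination (AUC) measured after deployment does not deteriorate.

```python
# Minimal toy sketch (NOT the paper's formal model): a risk model used to
# withhold treatment from "high-risk" patients worsens their outcomes,
# yet its discrimination (AUC) after deployment does not deteriorate.
# All numbers are invented for illustration.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical baseline risk factor (e.g. frailty), scaled to [0, 1].
frailty = rng.uniform(0, 1, n)

def poor_outcome(frailty, treated):
    """Simulate poor outcomes; treatment lowers everyone's risk by 0.3."""
    p = np.clip(0.2 + 0.6 * frailty - 0.3 * treated, 0, 1)
    return rng.uniform(0, 1, len(frailty)) < p

# Assume the model has perfectly learned risk under the OLD policy,
# in which every patient received the treatment.
predicted_risk = np.clip(0.2 + 0.6 * frailty - 0.3, 0, 1)

# Outcomes BEFORE deployment: everyone treated.
y_before = poor_outcome(frailty, treated=np.ones(n))

# Outcomes AFTER deployment: treatment withheld from the ~30% of patients
# the model flags as highest risk.
treated_after = (predicted_risk < np.quantile(predicted_risk, 0.7)).astype(float)
y_after = poor_outcome(frailty, treated=treated_after)

flagged = treated_after == 0
print("Poor-outcome rate, flagged group, before:", y_before[flagged].mean().round(3))
print("Poor-outcome rate, flagged group, after: ", y_after[flagged].mean().round(3))
print("AUC before deployment:", round(roc_auc_score(y_before, predicted_risk), 3))
print("AUC after deployment: ", round(roc_auc_score(y_after, predicted_risk), 3))
```

The sketch mirrors Prof Harrison’s knee-rehabilitation example: the model’s predictions look as accurate as ever after deployment precisely because withholding treatment from the flagged patients helps make those predictions come true.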


Patterns article – When accurate prediction models yield harmful self-fulfilling prophecies (Open access)


The Independent article – AI health warning as researchers say algorithms could discriminate against patients (Open access)


See more from MedicalBrief archives:


AI chatbots outstrip doctors in diagnoses – US randomised study


ChatGPT diagnoses child’s illness after 17 doctors fail


The risks of ChatGPT in healthcare
