Saturday, 27 April, 2024

More misdiagnoses with darker skins – US study

Dermatologists and general practitioners are somewhat less accurate in diagnosing disease in darker skin, a recent study has suggested, but the MIT research team said that if used correctly, AI might be able to help.

When diagnosing skin diseases based solely on images of a patient’s skin, they found, doctors often do not perform as well when the patient has darker skin.

The study, which included more than 1 000 dermatologists and general practitioners, found that dermatologists accurately characterised about 38% of the images they saw, but only 34% of those that showed darker skin. General practitioners, who were less accurate overall, showed a similar decrease in accuracy with darker skin.

The research team also found that assistance from an artificial intelligence algorithm could improve this accuracy, although those improvements were greater when diagnosing patients with lighter skin.

While this is the first study to demonstrate physician diagnostic disparities across skin tone, other studies have found that the images used in dermatology textbooks and training materials predominantly feature lighter skin tones.

That may be one factor contributing to the discrepancy, the MIT team said, along with the possibility that some doctors might have less experience in treating patients with darker skin.

“Probably no doctor intends to do worse on any type of person, but it might be because you don’t have the knowledge and the experience, and therefore, on certain groups of people, you might do worse,” said Matt Groh, an assistant professor at the Northwestern University Kellogg School of Management.

“This is one of those situations where you need empirical evidence to help figure out how we could change policies around dermatology education.”

Groh is lead author of the study, published in Nature Medicine. Rosalind Picard, an MIT professor of media arts and sciences, is the senior author of the paper.

Diagnostic discrepancies

Several years ago, an MIT study led by Joy Buolamwini found that facial-analysis programs had much higher error rates when predicting the gender of darker-skinned people. That finding inspired Groh, who studies human-AI collaboration, to look into whether AI models, and possibly doctors themselves, might have difficulty diagnosing skin diseases on darker shades of skin – and whether those diagnostic abilities could be improved.

“This seemed like a great opportunity to identify whether there's a social problem going on and how we might want to fix that, and also identify how to best build AI assistance into medical decision-making,” said Groh.

“I’m interested in how we can apply machine learning to real-world problems, specifically around how to help experts be better at their jobs… if we could improve their decision-making, we could improve patient outcomes."

To assess doctors’ diagnostic accuracy, the researchers compiled an array of 364 images from dermatology textbooks and other sources, representing 46 skin diseases across many shades of skin.

Most depicted one of eight inflammatory skin diseases, including atopic dermatitis, Lyme disease, and secondary syphilis, as well as a rare form of cancer called cutaneous T-cell lymphoma (CTCL), which can appear similar to an inflammatory skin condition. Many of these diseases, including Lyme disease, can present differently on dark and light skin.

The team recruited subjects for the study through Sermo, a social networking site for doctors. The total study group included 389 board-certified dermatologists, 116 dermatology residents, 459 general practitioners, and 154 other types of doctors.

Each study participant was shown 10 of the images and asked for their top three predictions for what disease each image might represent. They were also asked if they would refer the patient for a biopsy.
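The top-three protocol used here corresponds to what the machine-learning literature calls top-k accuracy: a diagnosis counts as correct if the true disease appears anywhere in the clinician's ranked shortlist. A minimal sketch of that metric, using entirely made-up diagnoses and records (not the study's data or code), might look like:

```python
def top_k_accuracy(records, k=3):
    """Fraction of records whose true disease appears among the top-k guesses."""
    hits = sum(1 for true, guesses in records if true in guesses[:k])
    return hits / len(records)

# Hypothetical (true diagnosis, ranked guesses) pairs, one per image,
# split by skin tone of the pictured patient.
light = [
    ("atopic dermatitis", ["atopic dermatitis", "psoriasis", "other"]),
    ("lyme disease", ["secondary syphilis", "lyme disease", "other"]),
    ("ctcl", ["psoriasis", "atopic dermatitis", "other"]),
]
dark = [
    ("atopic dermatitis", ["psoriasis", "other", "secondary syphilis"]),
    ("lyme disease", ["lyme disease", "other", "psoriasis"]),
    ("ctcl", ["psoriasis", "atopic dermatitis", "other"]),
]

print(f"light-skin top-3 accuracy: {top_k_accuracy(light):.0%}")  # 67%
print(f"dark-skin top-3 accuracy:  {top_k_accuracy(dark):.0%}")   # 33%
```

Computing the metric separately per skin-tone group, as above, is what makes a gap like the study's four-percentage-point drop visible at all; a single pooled accuracy figure would hide it.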

In addition, the general practitioners were asked if they would refer the patient to a dermatologist.

“This is not as comprehensive as in-person triage, where the doctor can examine the skin from different angles and control the lighting,” Picard said. “However, skin images are more scalable for online triage, and easy to input into a machine-learning algorithm, which can estimate likely diagnoses speedily.”

The researchers found that, not surprisingly, specialists in dermatology had higher accuracy rates: They classified 38% of the images correctly, compared with 19% for general practitioners.

Both groups lost about four percentage points in accuracy when trying to diagnose skin conditions based on images of darker skin – a statistically significant drop.

Dermatologists were also less likely to refer images of CTCL on darker skin for biopsy, but more likely to refer images of non-cancerous skin conditions on darker skin for biopsy.

A boost from AI

After evaluating how doctors performed on their own, the researchers also gave them additional images to analyse, with assistance from an AI algorithm the researchers had developed. The team trained this algorithm on about 30 000 images, asking it to classify the images as one of the eight diseases that most of the images represented, plus a ninth category of “other”.

This algorithm had an accuracy rate of about 47%. The researchers also created another version, with an artificially inflated success rate of 84%, allowing them to evaluate whether the accuracy of the model would influence doctors’ likelihood to take its recommendations.

“This allows us to evaluate AI assistance with models that are currently the best we can do, and with AI assistance that could be more accurate, maybe five years from now, with better data and models,” said Groh.

Both of these classifiers were equally accurate on light and dark skin. The researchers found that using either of these AI algorithms improved accuracy for both dermatologists (up to 60%) and general practitioners (up to 47%).

They also found that doctors were more likely to take suggestions from the higher-accuracy algorithm after it provided a few correct answers, but they rarely incorporated AI suggestions that were incorrect. This suggests the doctors are highly skilled at ruling out diseases and won’t take AI suggestions for a disease they have already ruled out, said Groh.

“They’re pretty good at not taking AI advice when the AI is wrong and the physicians are right,” he said.

While dermatologists using AI assistance showed similar increases in accuracy when looking at images of light or dark skin, general practitioners showed greater improvement on images of lighter skin than darker skin.

“This study allows us to see not only how AI assistance influences accuracy, but how that influence varies across levels of expertise,” Groh said.

“What might be going on there is that the PCPs don’t have as much experience, so they don’t know if they should rule a disease out or not because they aren’t as deep into the details of how different skin diseases might look on different shades of skin."

The researchers hope their findings will help stimulate medical schools and textbooks to incorporate more training on patients with darker skin. The findings could also help to guide the deployment of AI assistance programs for dermatology, which many companies are now developing.

Study details

Deep learning-aided decision support for diagnosis of skin disease across skin tones

Matthew Groh, Omar Badri, Roxana Daneshjou, Arash Koochek, Caleb Harris, Luis R. Soenksen, P. Murali Doraiswamy, Rosalind Picard.

Published in Nature Medicine on 5 February 2024

Abstract

Although advances in deep learning systems for image-based medical diagnosis demonstrate their potential to augment clinical decision-making, the effectiveness of physician–machine partnerships remains an open question, in part because physicians and algorithms are both susceptible to systematic errors, especially for diagnosis of underrepresented populations. Here we present results from a large-scale digital experiment involving board-certified dermatologists (n = 389) and primary-care physicians (n = 459) from 39 countries to evaluate the accuracy of diagnoses submitted by physicians in a store-and-forward teledermatology simulation. In this experiment, physicians were presented with 364 images spanning 46 skin diseases and asked to submit up to four differential diagnoses. Specialists and generalists achieved diagnostic accuracies of 38% and 19%, respectively, but both specialists and generalists were four percentage points less accurate for the diagnosis of images of dark skin as compared to light skin. Fair deep learning system decision support improved the diagnostic accuracy of both specialists and generalists by more than 33%, but exacerbated the gap in the diagnostic accuracy of generalists across skin tones. These results demonstrate that well-designed physician–machine partnerships can enhance the diagnostic accuracy of physicians, illustrating that success in improving overall diagnostic accuracy does not necessarily address bias.

 

Nature article – Deep learning-aided decision support for diagnosis of skin disease across skin tones (Open access)

See more from MedicalBrief archives:

AGs urge action from FDA on biased pulse oximeter technology

How software bias leads to under-diagnosis in black men’s lung problems

UK investigation into racial and gender bias in medical devices

A ‘glaring’ lack of darker skin in textbooks and journals