Springer Nature has retracted more than three dozen autism-related publications that relied on a problematic dataset, which undermined the research and its conclusions, the publisher told MedPage Today.
The 38 papers, conference proceedings, and book chapters attempted to train neural networks to distinguish between autistic and non-autistic children in a dataset containing photos of youngsters’ faces. However, there were major problems with how the dataset was put together.
Retired engineer Gerald Piosenka apparently created the dataset in 2019 by downloading photos of children from “websites devoted to the subject of autism”, according to a description of the dataset’s methods, and uploaded it to Kaggle, a site owned by Google that hosts public datasets for machine-learning practitioners.
“From what we’ve ascertained, he scraped images of children on websites related to autism, alongside a control dataset of images of children gathered from across the internet,” said Tim Kersjes, head of research integrity at Springer Nature.
The news was first reported in The Transmitter, which said that in all, the dataset contained more than 2,900 photos of children’s faces labelled as autistic or not autistic.
After learning about a paper that cites the dataset, “I downloaded it, and was completely horrified,” said Dorothy Bishop, emeritus professor of developmental neuropsychology at the University of Oxford. “When I saw how it was created, I just thought, ‘This is absolute bonkers’.”
Without identifying each child in the dataset, there is no way to confirm that any of them do or do not have autism, Bishop added.
Because the images were scraped from various websites, it is doubtful that the children or their families gave consent for them to be used in research, said Gail Alvares, principal research fellow at the Kids Research Institute Australia.
“Just because you have provided an image to the internet does not mean that you necessarily provide consent for that image to be used for research purposes.”
Suspicion
Springer Nature had flagged a paper for investigation last September. At the same time, an independent sleuth brought another paper to the publisher’s attention for using “tortured phrases” that raised suspicion it had been written by artificial intelligence (AI). Both papers had used the same questionable autism dataset.
Kersjes said the dataset raised obvious ethical issues: first, there was no proof that the children in the photos were or were not autistic; second, there was no way of knowing whether their guardians had consented to the photos being used this way.
“This significant methodological issue undermined the results and conclusions of the publications,” Kersjes said.
On top of that, The Transmitter reported, the photos all had different lighting and angles, which would have made identifying any possible differences in facial features more difficult.
Springer Nature then searched for other publications that used the dataset, identifying a total of 38 papers, conference proceedings, and book chapters for retraction. A spokesperson said nearly all of the retractions are complete at this point, and the few that remain are expected to be finished shortly, as retracting papers from conference proceedings takes a bit more time.
The publisher also contacted other publishers to alert them to the problematic dataset.
The Institute of Electrical and Electronics Engineers placed expressions of concern on 25 articles that used the Kaggle dataset, noting ethical issues and “potentially questionable data”.
Other publishers, including Elsevier and PLOS, retracted articles that used the dataset. Three articles published by Wiley used the Kaggle dataset and two of those have been retracted.
However, dozens of papers from other publications remain available, with no retraction notice or expression of concern.
Kersjes said he hopes the retractions and removals will deter further use of the dataset.
The Springer Nature spokesperson said the company has its own quality filters for concerns like plagiarism, conflicts of interest, and missing clinical trial numbers, but this sham dataset is not the type of issue those filters can catch. Plus, Kaggle as a platform has plenty of reliable datasets.
“This concern is one that would be picked up during the editorial evaluation process by editors and peer reviewers,” Kersjes told MedPage Today.
Since many of the papers were published in computer science journals or through computer science conferences, Kersjes said there may be a different level of awareness around ethical and privacy concerns than there would be for psychiatry or paediatrics research.
“Careful consideration should always be given to the source and legitimacy of any data and whether consent has been appropriately obtained,” he cautioned.
MedPage Today article – Here's Why Dozens of Autism Publications Were Retracted (Open access)
