Small sample sizes in studies using functional MRI to investigate brain connectivity and function are common in neuroscience, despite years of warnings that such studies likely lack sufficient statistical power.
A new analysis reveals that task-based fMRI experiments involving typical sample sizes of about 30 participants are only modestly replicable. This means that independent efforts to repeat the experiments are as likely to challenge as to confirm the original results.
The study also finds that task-based fMRI studies with sample sizes of up to 100 fall short of being perfectly replicable.
Task-based fMRI studies track changes in blood oxygen levels in the brain while study subjects are engaged in cognitive tasks. The technique allows researchers to see which brain regions are recruited to perform specific tasks.
But those in the field of cognitive neuroscience have not agreed on specific standards for the design of task-based fMRI studies – in particular, how many study subjects are needed to ensure reliable findings. The new research aims to address this shortfall, researchers said.
“This study is the largest investigation of the role of sample size in the reproducibility of task-based fMRI methods,” said University of Illinois psychology professor Aron Barbey, who conducted the research with Erick Paul, a former postdoctoral fellow at the U of I; Benjamin Turner, of Nanyang Technological University in Singapore; and Michael Miller, of the University of California-Santa Barbara.
“We systematically examined the reproducibility of studies at different sample sizes, from as few as 16 to as many as 121 participants, and across many different task-based fMRI approaches,” Barbey said. “We found that reproducibility is modest in studies involving the standard number of 30 participants, and that much larger sample sizes may be needed to improve replicability.”
The problem is widespread, Paul said. A majority of fMRI studies published in cognitive neuroscience journals in 2011-14 involved fewer than 100 participants. The median number is about 30. The problem stems in part from the fact that fMRI studies are expensive.
“Even small studies can cost several tens of thousands of dollars, and funding systems are not generally set up to enable the routine collection of data from more than 100 participants,” Paul said.
Other factors also contribute to the challenge. For example, researchers in neuroscience and psychological science may not understand how statistical analyses for fMRI differ from those in behavioural studies, and so may not accurately evaluate the statistical power of their fMRI analyses, Turner said.
“They may confuse the high retest accuracy of fMRI with high replicability,” he said. “But that sort of accuracy refers only to retests of the same individuals engaged in the same tasks, not to similarity across separate small groups.”
To overcome the problem of expense, Miller suggests that researchers pool their resources to conduct fMRI studies across numerous sites involving greater numbers of participants. This would likely result in more diverse participant pools. It also would advance the work of many scientists, laboratories and institutions at once, yielding more data per participant with potentially greater impact.
“Replicability is the foundation of scientific progress,” the authors wrote. “Unfortunately, for a variety of reasons, many scientific fields are gripped by a crisis of irreproducibility. While some of the causes of the crisis are deeply woven into the academic landscape – incentives related to publication, funding and tenure – the most straightforward solution relates to statistical power.”
“Our study points to a major problem in the field of cognitive neuroscience, a field in which fMRI studies are a commonly used tool,” Barbey said. “But this is a problem that we can solve by increasing the size of the samples we study.”
Despite a growing body of research suggesting that task-based functional magnetic resonance imaging (fMRI) studies often suffer from a lack of statistical power due to too-small samples, the proliferation of such underpowered studies continues unabated. Using large independent samples across eleven tasks, we demonstrate the impact of sample size on replicability, assessed at different levels of analysis relevant to fMRI researchers. We find that the degree of replicability for typical sample sizes is modest and that sample sizes much larger than typical (N = 100) produce results that fall well short of perfectly replicable. Thus, our results join the existing line of work advocating for larger sample sizes. Moreover, because we test sample sizes over a fairly large range and use intuitive metrics of replicability, our hope is that our results are more understandable and convincing to researchers who may have found previous results advocating for larger samples inaccessible.
Benjamin O Turner, Erick J Paul, Michael B Miller, Aron K Barbey