“…the difference between significant and not significant is not itself necessarily significant.”

The quote above comes from a Perspective piece by Nieuwenhuis and colleagues, published in Nature Neuroscience this past summer. They detail a surprisingly common mistake in the statistical analyses reported in some studies published in prominent journals.
It might be easiest to illustrate the mistake with an example first. Let's say I give a control group and a treatment group a task and, to measure performance, record how quickly each subject completes it. The control group takes an average of 2 minutes to complete the task, whereas the treatment group takes 2 minutes 10 seconds. After this first session, I give the treatment group a real drug and the control group a placebo. Then I give both groups another version of the task and measure performance again. This time, the control group takes 2 minutes 5 seconds whereas the treatment group takes 5 minutes. Is this increase in completion time significant?
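To make the setup concrete, here is a minimal Python sketch that simulates per-subject completion times roughly matching the group averages above. The sample size, spread, and random seed are my own assumptions for illustration, so the simulated numbers (and the p-values later on) won't reproduce the exact values quoted in this post.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 20                          # hypothetical number of subjects per group

# Completion times in seconds, drawn around the group averages described above;
# each index is meant to represent the same subject across the two sessions.
control_pre    = rng.normal(120, 15, n)   # control, pre-drug session   (~2 min)
treatment_pre  = rng.normal(130, 15, n)   # treatment, pre-drug session (~2 min 10 s)
control_post   = rng.normal(125, 15, n)   # control, post-placebo session (~2 min 5 s)
treatment_post = rng.normal(300, 60, n)   # treatment, post-drug session  (~5 min)
```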

Looking at the pre-drug session, I run a test to see whether 2 minutes differs significantly from 2 min 10 sec (p = .06, non-significant). For the post-drug session, I run another test to see whether 2 min 5 sec differs significantly from 5 minutes (p = .04, significant). I then decide that the drug has a significantly detrimental effect because one comparison is non-significant and the other is significant. But comparing the two p-values this way is not a test of the interaction; I've only reported simple main effects. I should have tested the group-by-session interaction directly, for instance by computing each subject's change from the pre-drug to the post-drug session and running a single test of whether that change differs between the two groups, a difference of differences. That is, I need to see whether significant is significantly different from non-significant.
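Continuing the hypothetical simulation sketched above, the snippet below contrasts the flawed approach (two separate between-group tests, one per session) with one standard way to test the interaction directly (comparing the groups' pre-to-post change scores). The specific functions, sample sizes, and resulting p-values are illustrative assumptions, not the analysis of any of the reviewed papers.

```python
from scipy import stats

# The mistaken analysis: run a separate between-group test in each session,
# then informally compare "significant" with "not significant".
_, p_pre  = stats.ttest_ind(treatment_pre,  control_pre)
_, p_post = stats.ttest_ind(treatment_post, control_post)
print(f"pre-drug  between-group test: p = {p_pre:.3f}")
print(f"post-drug between-group test: p = {p_post:.3f}")

# The appropriate analysis: compute each subject's pre-to-post change and run
# a single test comparing those change scores between groups. This is the
# difference of differences, i.e. the group-by-session interaction.
control_change   = control_post   - control_pre
treatment_change = treatment_post - treatment_pre
_, p_interaction = stats.ttest_ind(treatment_change, control_change)
print(f"interaction (difference of differences): p = {p_interaction:.3f}")
```

The conclusion about the drug should rest on the interaction p-value alone, not on whether the two session-wise tests happen to fall on opposite sides of .05.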

The invalid conclusion drawn from insufficient analysis in the example above is similar to what Nieuwenhuis and colleagues found in a number of studies. They reviewed 513 papers published in Science, Nature, Nature Neuroscience, Neuron, and The Journal of Neuroscience in 2009 and 2010. Of those, 157 articles described a situation in which this kind of comparison arose. Seventy-eight of those studies performed the correct analysis, whereas 79 ran the incorrect one. That's 50%! The authors also state that these errors likely did not seriously affect most of the studies' final conclusions, but that it's impossible to be sure because the necessary information isn't reported.

For researchers, this article simply reinforces the importance of analyzing data correctly, since a flawed analysis can turn into an integrity issue later. Additionally, more and more scientific findings are reaching the general public in various ways (e.g., this blog). For everyone, whether science is a hobby or a career, the article highlights the need for a basic understanding of statistics so that we can judge for ourselves whether the conclusions we hear and read are valid.

Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E.-J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience, 14, 1105-1107. Read it here: http://www.nature.com/neuro/journal/v14/n9/full/nn.2886.html