Of p Values and Effect Sizes

Scientists are obsessed with p values, and since I work in a particularly quantitative field, I’m more obsessed than most. When you run a statistical analysis on noisy data, there are several ways to get a statistically significant p value. You could increase your sample size to improve the statistical power of your analysis. You could have a really strong effect size, resulting in statistical significance even in a small study. Or you could just get lucky and get a significant p value by chance.
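
To make those three routes concrete, here is a small simulation sketch of my own (not from the original post; the sample sizes, effect sizes, and noise level are arbitrary) using two-sample t-tests:

```python
# Illustrative simulation (not from the original post) of the three routes
# to a significant p value: more samples, a larger true effect, or plain luck.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_study(n, effect, sd=1.0):
    """Simulate a two-group study and return the t-test p value."""
    control = rng.normal(0.0, sd, size=n)
    treated = rng.normal(effect, sd, size=n)
    return stats.ttest_ind(treated, control).pvalue

# Larger sample: even a modest effect reaches significance reliably.
print("n=200, effect=0.3:", one_study(200, 0.3))

# Strong effect: significance even in a small study.
print("n=15,  effect=1.5:", one_study(15, 1.5))

# No effect at all: about 5% of such studies are still "significant" by luck.
lucky = sum(one_study(15, 0.0) < 0.05 for _ in range(1000))
print("false positives out of 1000 null studies:", lucky)
```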

A few days ago, I attended a seminar by someone looking for a specific effect in a mouse experiment. Her experiment resulted in p=0.06, but she was very certain that the effect was real. “The data is so noisy, to get so close to a significant p value the effect has to be very strong,” she argued. This is a fallacy.

Eric Loken and Andrew Gelman published an article on this a few years ago:

A common view is that any study finding an effect under noisy conditions provides evidence that the underlying effect is particularly strong and robust. Yet, statistical significance conveys very little information when measurements are noisy […] Should we assume that if statistical significance is achieved in the presence of measurement error, the associated effects would have been stronger without noise? We caution against the fallacy of assuming that that which does not kill statistical significance makes it stronger.

But why is this a fallacy? The answer is related to selection bias:

In a study with noisy measurements and small or moderate sample size, standard errors will be high and statistically significant estimates will therefore be large, even if the underlying effects are small. This is known as the statistical significance filter and can be a severe upward bias in the magnitude of effects.
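
To see the filter in action, here is a hedged simulation sketch of my own (the true effect, noise level, and sample size below are made up): many low-powered, noisy studies of a small true effect are run, and only the statistically significant ones are kept. The surviving estimates overstate the true effect severalfold.

```python
# My own sketch of the "statistical significance filter" (not Gelman's code):
# with a small true effect and noisy measurements, the studies that happen to
# reach p < 0.05 report effect estimates far larger than the truth.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, sd, n, n_studies = 0.2, 2.0, 30, 20_000

significant_estimates = []
for _ in range(n_studies):
    control = rng.normal(0.0, sd, size=n)
    treated = rng.normal(true_effect, sd, size=n)
    if stats.ttest_ind(treated, control).pvalue < 0.05:
        # Keep the estimated effect only when the study "worked".
        significant_estimates.append(treated.mean() - control.mean())

print(f"true effect:                     {true_effect}")
print(f"mean estimate among significant: {np.mean(significant_estimates):.2f}")
# Typical output: the significant studies overestimate the effect several times over.
```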

This blog post by Gelman goes into more detail.
