Can you trust the p-value?

Most of us are familiar with the standard significance threshold of p < .05: below .05 you have a significant finding, and above .05 you have a non-significant finding. Ideally, this would correctly discriminate between spurious and real effects. However, there is a rather simple way to artificially lower your p-value: recruit more participants. When participants are added until the result crosses the threshold, this is one example of p-hacking.

Without going too far into the math of it, p-values for tests of means are calculated using the standard error of the mean (SEM), and the SEM shrinks as the number of participants grows (it is the standard deviation divided by the square root of the sample size). With a large enough sample, almost any nonzero effect, even an infinitesimally small one, could become “significant.”
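To make this concrete, here is a minimal sketch in Python (standard library only). It assumes a one-sample z-test with the standard deviation fixed at 1, and it holds the observed standardized effect at a tiny, made-up value of 0.02 while only the sample size changes. The function name `two_sided_p` and the numbers are illustrative assumptions, not taken from any particular study.

```python
import math
from statistics import NormalDist


def two_sided_p(effect_size_d: float, n: int) -> float:
    """Two-sided p-value for a one-sample z-test.

    Assumes the observed mean difference, in standard deviation units,
    is exactly `effect_size_d`, and that the standard deviation is 1.
    """
    sem = 1.0 / math.sqrt(n)   # SEM = SD / sqrt(n), with SD assumed to be 1
    z = effect_size_d / sem    # test statistic: mean difference over SEM
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Same tiny effect every time; only the number of participants changes.
for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: p = {two_sided_p(0.02, n):.3g}")
```

Under these assumptions the p-value is far above .05 with 100 participants and below .05 with 10,000, even though the effect itself never changed.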

So what can you do to make sure you’re not a victim of p-hacking? The simplest solution is to look at effect sizes. Standardized effect sizes such as Cohen's d do not systematically grow with sample size (a larger sample only makes the estimate more precise), so they may give a more objective assessment of the importance of an effect.
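As a rough illustration, the sketch below draws simulated samples of different sizes from the same population and reports both Cohen's d and the p-value for each. Everything here is an assumption for the sake of the example: the true effect of 0.05 standard deviations, the seed, the helper name `one_sample_summary`, and the use of a normal approximation for the p-value instead of a t-distribution.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def one_sample_summary(sample, null_mean=0.0):
    """Return (Cohen's d, two-sided p-value) for a one-sample comparison.

    The p-value uses a normal approximation, which is an assumption that
    is reasonable for large samples only.
    """
    n = len(sample)
    d = (mean(sample) - null_mean) / stdev(sample)  # effect size in SD units
    z = d * math.sqrt(n)                            # mean difference over SEM
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return d, p


rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_D = 0.05            # hypothetical, very small true effect

for n in (200, 20_000, 200_000):
    sample = [rng.gauss(TRUE_D, 1.0) for _ in range(n)]
    d, p = one_sample_summary(sample)
    print(f"n = {n:>7}: d = {d:+.3f}, p = {p:.3g}")
```

With the larger samples, d should settle close to the small true value while the p-value keeps shrinking, which is exactly the pattern to watch for when reading a paper.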

Before implementing any changes to your practice or study based on a new piece of research, I highly suggest comparing the p-value against the effect size. If the p-value is significant but the effect size is very small, you may have found yourself an instance of p-hacking, inadvertent or otherwise.