Description 
It is widely believed that P values from null-hypothesis tests overstate the evidence against the null. Simulation of t tests suggests that if you declare that you have made a discovery when you observe P = 0.047, then you will be wrong at least 30% of the time, and quite probably more often: see http://rsos.royalsocietypublishing.org/content/1/3/140216. This problem will be discussed from several points of view: for example, should we look at P = 0.047 or at P ≤ 0.047? Is the point null sensible? To what extent do the conclusions depend on objective Bayesian arguments? The results of the simulations are consistent with the work of J. Berger & Sellke on the calibration of P values, and with the work of Valen Johnson on uniformly most-powerful Bayesian tests. On the other hand, one-sided tests with a distributed rather than a point null give different results. Finally, I ask what should be done about the teaching of statistics in the light of these results. Could it be that what is taught in introductory statistics courses has contributed to the reproducibility crisis in some sorts of science?
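The kind of simulation described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact procedure: the sample size, effect size (chosen to give roughly 80% power), and the assumed prior proportion of experiments with a real effect are all assumptions made here for the sketch. It simulates many two-sample t tests, conditions on the observed P being close to 0.047 (rather than on P ≤ 0.047), and reports what fraction of those "discoveries" came from a true null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_sim = 100_000   # number of simulated experiments
n = 16            # observations per group (assumption)
effect = 1.0      # standardised effect size under H1, roughly 80% power (assumption)
p_real = 0.5      # assumed prior probability that a real effect exists

# Decide, for each experiment, whether a real effect is present.
real = rng.random(n_sim) < p_real
shift = np.where(real, effect, 0.0)

# Simulate all experiments at once: group A ~ N(0, 1), group B ~ N(shift, 1).
a = rng.normal(0.0, 1.0, size=(n_sim, n))
b = rng.normal(shift[:, None], 1.0, size=(n_sim, n))
pvals = stats.ttest_ind(a, b, axis=1).pvalue

# Condition on the *observed* P being close to 0.047, not on P <= 0.047.
band = (pvals > 0.045) & (pvals < 0.05)
fdr = np.mean(~real[band])
print(f"fraction of 'discoveries' with P near 0.047 that are false: {fdr:.2f}")
```

Under these assumptions the printed fraction comes out close to the sort of figure quoted above; with a lower prior proportion of real effects it gets considerably worse.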
