"The trouble with the world is that the stupid are cocksure and the intelligent are full of doubt." -- Bertrand Russell

Monday, January 9, 2012

Correction on "Helping Theist Arguments" (from 1/3/12)

User efrique on r/atheism has informed me that I made a conceptual error in my article "Helping Theist Arguments". With his permission, I include his comments here without alteration (other than editing out remarks not related to his correction; the full thread can be found here). It's slightly embarrassing for me that I made this conceptual error, given the number of statistics courses I've taken and my background in statistical physics. It's slightly encouraging that efrique assures me this is a common mistake, repeated in a number of textbooks.

In my article, I wrote: "Whether you accept or reject the null hypothesis is then based upon how big the probability is that the given hypothesis is true. That's the way basic statistical tests work."

Efrique responds:

This is not right. (You've strayed into my area, so I am probably going to get more technical than required.)
A p-value is the probability of a test statistic at least as extreme as the one observed given that the null hypothesis is TRUE. This is entirely different from the probability that the null is true:
Something like P(T >= t | H0) is not at all the same thing as P(H0), or even P(H0 | T = t).
[This is an extremely common mistake, so it's probably one you've seen elsewhere - and more than once. I've even seen this error committed in statistical textbooks. It's one of several tests I use to reject a text as inadequate; it takes all of a minute or two to check if a text does these kind of things, but it's generally very telling of the overall standard.]
(If you're a Bayesian, you could argue that a p-value is related to the probability that the null is true given a result at least as extreme (proportional to it via Bayes' theorem), but a Bayesian generally won't calculate a p-value at all, since a Bayesian can just jump straight to the relative posterior probability for the two hypotheses, or more likely would cast the whole thing in terms of decision theory and simply start computing some loss function or other quantity related to some measure of utility or disutility. The people that calculate p-values - frequentists, generally - will deny that it makes sense to talk about relative probability of the null and alternative {holding that they're either true or false}, and hence the p-value simply has no interpretation even as a quantity proportional to such a probability.)
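To make efrique's distinction concrete for myself, here's a small sketch contrasting the two quantities for a single observation from a normal model. The alternative mean (1.0), the observed statistic (1.8), and the even prior odds are my own illustrative assumptions, not anything from his comment:

```python
# Contrast a frequentist p-value, P(T >= t | H0), with a Bayesian
# posterior probability, P(H0 | T = t), for one observed statistic t.
# Assumed setup (illustrative only): H0 says T ~ N(0, 1),
# H1 says T ~ N(1, 1), prior P(H0) = P(H1) = 0.5.
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_sf(x, mu=0.0, sigma=1.0):
    # Survival function P(X >= x), via the complementary error function
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

t = 1.8  # observed test statistic (assumed for illustration)

# p-value: probability of a statistic at least as extreme, GIVEN H0
p_value = normal_sf(t)

# Posterior probability of H0 given the exact observation T = t
prior_h0 = 0.5
like_h0 = normal_pdf(t, mu=0.0)
like_h1 = normal_pdf(t, mu=1.0)
post_h0 = like_h0 * prior_h0 / (like_h0 * prior_h0 + like_h1 * (1 - prior_h0))

print(f"p-value   P(T >= t | H0) = {p_value:.3f}")
print(f"posterior P(H0 | T = t)  = {post_h0:.3f}")
```

Running this gives a p-value of about 0.036 but a posterior probability for the null of about 0.21 - nowhere near each other, which is exactly the point: the p-value is not the probability that the null is true.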

1 comment:

  1. For a more in-depth discussion of the nuances of null hypothesis significance testing (and why it's probably inadequate), you could check out the book Scientific Method in Practice by Hugh G. Gauch.

    tl;dr version here: http://web.math.umt.edu/wilson/Math444/Handouts/Cohen94_earth%20is%20round.pdf

The trouble with Bayesian hypothesis testing, of course, is that while it's conceptually more straightforward and probably more useful than NHST, the computational details are trickier. There also aren't as many good introductory textbooks for it.

    So, with my modest mathematical abilities, I find myself accepting Bayesian over frequentist inference intellectually, but not handy with the nitty-gritty of carrying it out. Not a good place to be.
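That said, in the simplest conjugate cases the nitty-gritty is manageable. A hedged sketch, testing a point null (a fair coin, theta = 0.5) against a uniform alternative for a binomial proportion; the data (15 heads in 20 flips) and the even prior odds are illustrative assumptions:

```python
# Bayesian test of H0: theta = 0.5 against H1: theta ~ Uniform(0, 1)
# for binomial data. Both marginal likelihoods have closed forms here,
# so no numerical integration is needed.
from math import comb

n, k = 20, 15  # assumed data: 15 heads in 20 flips

# Marginal likelihood under H0: theta fixed at 0.5
m0 = comb(n, k) * 0.5 ** n

# Marginal likelihood under H1: integrating the binomial likelihood
# over a uniform prior on theta gives exactly 1 / (n + 1)
m1 = 1.0 / (n + 1)

bayes_factor = m0 / m1                        # evidence for H0 over H1
post_h0 = bayes_factor / (1 + bayes_factor)   # posterior P(H0) at even prior odds

print(f"Bayes factor BF01      = {bayes_factor:.3f}")
print(f"posterior P(H0 | data) = {post_h0:.3f}")
```

With these numbers the Bayes factor comes out around 0.31, i.e. mild evidence against the fair coin - a direct statement about the hypotheses, which is what NHST never gives you.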