@memester,
memester;114199 wrote:And so these two methods tell you what ? p = 1 or p = 0.5.
In the case of 50 + 80, it being equal to 80 + 50, what does that mean in
English, as to the statistical significance ?
It's not so much the method per se as whether you're using a formula that treats the variability of interest as unidirectional or bidirectional. If, for example, you're doing a study of fever, that would usually be regarded as a unidirectional variable -- i.e. only temperatures above normal count, and a below-normal temperature would not be allowed to reject the null hypothesis (this depends on what your hypothesis is). So you would use a one-tailed T-test for this. If the variable can move in either direction, you'd usually use a two-tailed T-test. The P-value is twice as high for a two-tailed T-test -- it's built into the formula, because the test counts extreme deviations in both directions rather than just one -- which means the same observed difference is twice as easy to produce by chance when either direction counts.
Two-tailed test - Wikipedia, the free encyclopedia
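To make the one-tailed/two-tailed distinction concrete, here's a minimal sketch using SciPy's ttest_ind (the temperature numbers are invented, and the alternative= keyword assumes SciPy 1.6 or newer):

```python
# Hypothetical fever study: does the patient group run hotter than controls?
from scipy import stats

controls = [36.8, 37.0, 36.9, 37.1, 36.7, 37.0]   # body temps in deg C (made up)
patients = [37.4, 37.9, 37.2, 38.1, 37.6, 37.8]   # made up as well

# One-tailed: only ABOVE-normal temperatures count as evidence of fever.
t, p_one = stats.ttest_ind(patients, controls, alternative='greater')

# Two-tailed: deviations in either direction count.
t, p_two = stats.ttest_ind(patients, controls, alternative='two-sided')

print(p_one, p_two)   # p_two is exactly 2 * p_one here, since t > 0
```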
P = 0.05 means that if there were no real difference between the groups, an observed difference this large would turn up only 5% of the time by chance. (People often shorten this to "a 5% probability the difference is due to chance, and a 95% probability it's real"; strictly speaking the P value is a statement about the data assuming no difference, not about the hypothesis itself, but the shorthand is close enough for everyday use.)
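If you want to see that 5% with your own eyes, here's a quick simulation (all parameters arbitrary): draw both groups from the same distribution, so there is truly no difference, and count how often the T-test still comes out "significant":

```python
# When the null hypothesis is true, P < 0.05 should happen about 5% of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials, false_positives = 10_000, 0
for _ in range(trials):
    a = rng.normal(20.0, 2.0, size=30)   # both groups from the SAME distribution
    b = rng.normal(20.0, 2.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(false_positives / trials)   # ~0.05
```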
memester;114199 wrote:Is this the way that the climate scientists approach the question of statistical significance of temperature readings ?
How would you do it like a climate scientist, using ten thousand readings, from the 2 thermometer stations ?
Statistics can be applied to any data set, so it doesn't matter whether you're taking 1000 readings each from two weather stations or from two patient groups. One caveat is study design: uniformity can't be completely assumed in a "case-control"-style comparison of X "now" versus X "then", where you hadn't created a study protocol before X "then" and you don't have a matched control group. But for the sake of simplicity, let's assume the equipment and the method of measurement are uniform, and the measurements are noontime ambient temperature readings at a fixed location, every day of July for ten consecutive years -- once for 1900-1909 and once for 2000-2009.
This would generate 310 readings from each period (31 July days x 10 years). You could then use something as simple as a T-test to compare the means. You need stats because in all likelihood the numbers will differ -- you might get 17 degrees C from one set and 20 degrees C from the other. But is that a REAL difference, or could it just be due to chance? The greater your N, the more likely an observed difference is real. This is intuitive -- if you flip a coin four times and get three heads and one tail, that doesn't reflect the true probability; you need more measurements, which will pull the observed ratio back toward the true mean (closer to 50/50).
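Here's what that comparison might look like in code, with simulated readings (the 17-degree baseline and the 0.8-degree shift between decades are invented purely for illustration):

```python
# 310 noontime July readings per decade, compared with a two-sample T-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
july_1900s = rng.normal(17.0, 3.0, size=310)   # 31 days x 10 years
july_2000s = rng.normal(17.8, 3.0, size=310)   # same spread, mean shifted up

t, p = stats.ttest_ind(july_2000s, july_1900s)
print(f"1900s mean {july_1900s.mean():.1f}, 2000s mean {july_2000s.mean():.1f}, p = {p:.4g}")
# A small p says the observed gap is unlikely to be chance alone.
```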
This effect of sample size is built into the probability equations. The full derivation is beyond how I use it, but the gist is that the standard error of a mean shrinks as N grows, so the same difference becomes harder and harder to blame on chance. If you generate a P value of < 0.000001, that means chance alone would produce a difference this large less than 0.0001% of the time. If your P value is 0.1, chance alone would produce it 10% of the time.
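You can watch the sample-size effect directly: keep the same underlying difference and let N grow (numbers invented again):

```python
# Same true 0.8-degree difference; the P value tends to shrink as N grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for n in (10, 50, 250, 1000):
    a = rng.normal(17.0, 3.0, size=n)
    b = rng.normal(17.8, 3.0, size=n)
    _, p = stats.ttest_ind(b, a)
    print(n, round(p, 6))
```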
In other words, the P value is simply telling us the confidence with which we can conclude that an observed difference is real. Conventionally a P < 0.05 is considered "significant", though this is arbitrary and people who get a P of 0.053 will often call it a "trend" towards significance.
In the case of 50 + 80 vs 80 + 50, let's say these are temperature readings done the same way on July 1 and July 10 of 1900 vs 2000. The means are the same, right? It's the commutative property of addition -- 50 + 80 = 80 + 50, so each pair has a mean of (50 + 80) / 2 = 65.
So if you're doing a test of significance comparing 65 vs 65 (as your means), your P-value will be 1. Why 1? Because the observed difference is zero -- the data are perfectly consistent with chance, and give you no evidence at all of a true difference between the groups.
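You can check that case directly:

```python
# Identical means give t = 0 and P = 1: no evidence of a difference at all.
from scipy import stats

t, p = stats.ttest_ind([50, 80], [80, 50])
print(t, p)   # t = 0.0, p = 1.0
```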