This isn’t p-hacking

*Disclaimer: Bryan is not a weather vane.

For the past half year or so, Bryan has been living only partially in Seattle, and is often gone more than he is here. When he is in town visiting or working, he often complains that it’s gray and rainy, even though we are now deep into summer and the oft-promised bone-dry weather has materialized in the form of 90°F+ heat waves. I mostly dismissed his complaints as biased or anecdotal, since I had been largely experiencing summer bliss. But when I woke up this morning to a sniffly nose, hazy skies, and a negative attitude, I thought: let’s see what the data has to say about this.

The question I am trying to answer is whether it rains more when Bryan is in town.

Variable x: Bryan is in town
Variable y: It rains in Seattle

H0: The two variables are independent.
H1: The two variables are not independent.

Statistical testing

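I will compute the Phi coefficient for the 2×2 contingency table and test it with the Chi square statistic, which for a 2×2 table equals N × Phi². At α = 0.05 and df = 1, the Chi square critical value is 3.84; if the statistic exceeds it, I will reject the null hypothesis.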
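For anyone who wants to check the 3.84 rather than trust a lookup table, here is a minimal sketch using only the Python standard library. It relies on the fact that a Chi square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(x / 2)). The function names are my own, and the bisection is just one convenient way to invert that tail probability.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with df = 1."""
    return math.erfc(math.sqrt(x / 2.0))

def critical_value_df1(alpha, lo=0.0, hi=50.0, tol=1e-9):
    """Find x with P(X > x) = alpha by bisection (the tail shrinks as x grows)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_df1(mid) > alpha:
            lo = mid  # tail probability still above alpha: move right
        else:
            hi = mid  # tail probability at or below alpha: move left
    return (lo + hi) / 2.0

critical = critical_value_df1(0.05)
print(round(critical, 2))  # 3.84
```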

Data collection

The time window of interest is 2016.3.1 to 2016.8.29 (today), a span of 182 days counting both endpoints. It covers the period since Bryan began traveling away from Seattle.
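The 182-day figure counts both the first and the last day of the window. A quick check with Python's standard library (the variable names are just for illustration):

```python
from datetime import date

start = date(2016, 3, 1)
end = date(2016, 8, 29)

# Subtracting dates gives the gap between them; add 1 to count both endpoints.
n_days = (end - start).days + 1
print(n_days)  # 182
```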

Weather data was obtained from the NWS Seattle forecast office’s handy-dandy website. The value of interest is the daily precipitation amount, which I reduced to a binary variable: whether or not it rained at all that day. The value “T” (meaning trace amount) was interpreted as rain.
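The rain/no-rain rule can be sketched as a small function. This assumes each day's reading arrives as a string, either a number of inches or "T". That format and the sample readings below are made up for illustration and are not the actual NWS data; any other code (a missing-value marker, say) would need its own handling.

```python
def is_rainy(precip):
    """Return True if a daily precipitation reading counts as rain.

    Assumes `precip` is a string: a number of inches, or "T" for trace.
    """
    value = precip.strip().upper()
    if value == "T":
        return True  # trace amounts are interpreted as rain
    return float(value) > 0.0

# Hypothetical readings, not real observations.
sample_readings = ["0.00", "T", "0.12", "0.00"]
rainy_flags = [is_rainy(r) for r in sample_readings]
print(rainy_flags)  # [False, True, True, False]
```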

Results

Contingency Table

             Rain   No rain   Total
Bryan here     32        31      63
Bryan gone     37        82     119
Total          69       113     182

[Figure: mosaic plot of the contingency table]

Probability of rain
When Bryan’s here: 51%
When Bryan’s gone: 31%
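These percentages are just the row proportions of the contingency table. A short sketch of the arithmetic (the dictionary layout and names are my own choice):

```python
# Counts from the contingency table.
table = {
    "here": {"rain": 32, "no_rain": 31},
    "gone": {"rain": 37, "no_rain": 82},
}

def rain_probability(row):
    """Share of days in this row on which it rained."""
    return row["rain"] / (row["rain"] + row["no_rain"])

p_here = rain_probability(table["here"])
p_gone = rain_probability(table["gone"])
print(f"{p_here:.0%} {p_gone:.0%}")  # 51% 31%
```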

Amount of rain per rainy day
When Bryan’s here: 0.24 in/day
When Bryan’s gone: 0.15 in/day

Upon initial inspection, it certainly seems like it rains a lot more when Bryan’s in town! But is it statistically significant???

N: 182
Phi coefficient: 0.19
Chi square value: 6.79
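Both numbers can be reproduced from the four cell counts. The sketch below uses the standard Phi formula for a 2×2 table and the identity Chi square = N × Phi² (Pearson's statistic without a continuity correction, which is what matches the 6.79 above). The cell labels a, b, c, d are my own.

```python
import math

# Cell counts from the contingency table.
a, b = 32, 31  # Bryan here: rain, no rain
c, d = 37, 82  # Bryan gone: rain, no rain

n = a + b + c + d

# Phi = (ad - bc) / sqrt(product of the four marginal totals).
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# For a 2x2 table, the uncorrected Pearson chi-square equals N * Phi^2.
chi2 = n * phi ** 2

print(n, round(phi, 2), round(chi2, 2))  # 182 0.19 6.79
```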

The Phi coefficient is less than 0.3, which is usually read as low to no association between the two variables. However, this interpretation is incomplete because neither variable is very evenly distributed: Bryan was here on 63 of the 182 days, and it rained on 69 of them. The Chi square value of 6.79 is well above the critical value of 3.84, so it looks like we can comfortably reject the null hypothesis at α = 0.05.
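Comparing against the critical value is equivalent to checking that the p-value is below α. As a rough sketch, the p-value for df = 1 can be computed with the standard library alone, since the upper tail of a Chi square variable with 1 degree of freedom is erfc(sqrt(x / 2)). The variable names are mine.

```python
import math

chi2 = 6.79   # Chi square value reported above
alpha = 0.05

# Upper-tail probability of a chi-square variable with df = 1.
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(p_value < alpha)  # True
```

The p-value comes out just under 0.01, so the result would survive a stricter α of 0.01 as well, though only barely.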

Woohoo! The association is statistically significant: it looks like it really does rain more when Bryan is in Seattle!

Notice that I did not equate correlation with causation; if I did, I would say something like: “Looks like having Bryan around really helps predict the weather in Seattle! Let’s keep him away so we can experience a 40% increase in sunny days!” Unfortunately, this kind of interpretation is far too common.

I’m not so eager to throw in the towel yet. I will keep fishing until I have enough to hint at causation 😉
