A story of ice cream, drowning and causal modelling.
If you have taken any statistics class, the very first thing you were taught was most likely:
Correlation does not imply causation
The next thing you will then learn is that:
Correlation does not imply causation. But sometimes, it does.
If it weren’t for the last bit, statistics courses would be very short, since finding correlations would be useless if they never explained causation.
However, the first bit is always the most important since bad statistics are often worse than no statistics, and the world is full of bad statistics.
The problem is that bad statistics can have many real-life consequences, from people believing that eating chocolate will earn them a Nobel prize, to justifying racism, to under- or overestimating the effects and side effects of medical treatments.
Thus, learning a few causal modelling tools may help you find the real explanation behind the numbers, or call bullshit when someone points at a correlation and calls for action.
So let me demonstrate with a textbook example.
Basic Causal modelling
If you look at the graph, you will see that sales of ice cream and drowning accidents per month are highly correlated. This may lead to two theories:
A: Selling ice cream somehow causes more drowning accidents
B: People drowning somehow increases ice cream sales
In today’s world, someone would probably post theory A online in an engaging, emotional post, and people would rally in protest and demand that ice cream be made illegal and that everyone involved in the trade be sent to jail.
But as scientists, we are committed to the truth. And the only way to find the truth is to design some experiments. To investigate hypothesis A, we could close all ice cream stands or give away free ice cream to see if it changes the number of drowning accidents. Then, given that we have learned, and can still remember, how to do a t-test correctly, we will probably find no significant change in the number of drowning accidents, whether ice cream is sold or not.
This will falsify hypothesis A, and we should congratulate ourselves on good science (falsifying a hypothesis is just as important as verifying one, even though it doesn’t make the same headlines). But we can’t call it a day, because falsifying A says nothing about B.
The correct scientific way is to tie some people to a rock and drop them in the ocean to see if we sell more ice cream. The ethical complications of this experiment should be discussed in another article; right now, we are doing ice-cold, ice-cream science. After performing this experiment, we would most likely find that ice cream sales, unlike our conscience, were unaffected.
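The intervention on hypothesis A can be sketched as a small simulation. This is only an illustration: the drowning model, the effect sizes and the names welch_t and run_experiment are all invented here, and nothing below uses real data. Stand days are opened or closed by coin flip, and Welch's t statistic compares the two groups. For samples this large, a value of |t| below roughly 2 is not significant at the usual 5% level.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def run_experiment(ice_cream_effect, n_days=1000, seed=0):
    """Randomly open or close the stands each day; return the t statistic.

    Drownings depend on temperature plus, optionally, a direct ice cream
    effect. All numbers are invented for illustration.
    """
    rng = random.Random(seed)
    open_days, closed_days = [], []
    for _ in range(2 * n_days):
        temperature = rng.gauss(20, 5)
        stands_open = rng.random() < 0.5  # the intervention: a coin flip
        drownings = 5 + 0.2 * (temperature - 20) + rng.gauss(0, 1)
        if stands_open:
            drownings += ice_cream_effect
            open_days.append(drownings)
        else:
            closed_days.append(drownings)
    return welch_t(open_days, closed_days)


# World 1: ice cream has no causal effect. World 2: it (hypothetically) does.
t_no_effect = run_experiment(ice_cream_effect=0.0)
t_with_effect = run_experiment(ice_cream_effect=1.0)
print(f"t without a causal effect: {t_no_effect:.2f}")
print(f"t with a causal effect:    {t_with_effect:.2f}")
```

Because the coin flip decides which days have open stands, temperature is balanced across the two groups on average, so any clear difference between them can be attributed to the ice cream.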
Mediators
Let's say that our intervention study on stopping ice cream sales showed a small but significant effect on the number of people drowning. We may then wish to investigate this phenomenon further. In this case, we could develop hypothesis A.1.
A.1: Eating ice cream makes you a worse swimmer. And bad swimmers have a higher chance of drowning.
In this case, swimming ability is the mediator. The beauty of this hypothesis is that it is built on two sub-hypotheses that can be tested separately in two different studies. Each link should also be easier to detect than the overall effect: in a simple linear chain, the correlation between ice cream and drowning is the correlation between ice cream and swimming ability multiplied by the correlation between swimming ability and a person’s chance of drowning, and a product of two correlations is never larger in magnitude than either of them.
It is worth noting that when scientists investigated the link between eating and swimming ability, they found none.
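The multiplication rule for a mediator can be checked with a minimal sketch. It assumes a purely linear chain with Gaussian noise and made-up coefficients; the variable names and the pearson helper are mine, not from any dataset or library.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)
n = 20000

# Invented linear chain: ice cream -> swimming ability -> drowning risk.
ice_cream = [rng.gauss(0, 1) for _ in range(n)]
swimming = [-0.5 * i + rng.gauss(0, 1) for i in ice_cream]
drowning = [-0.5 * s + rng.gauss(0, 1) for s in swimming]

r_ice_swim = pearson(ice_cream, swimming)
r_swim_drown = pearson(swimming, drowning)
r_ice_drown = pearson(ice_cream, drowning)

print(f"ice cream vs swimming: {r_ice_swim:.3f}")
print(f"swimming vs drowning:  {r_swim_drown:.3f}")
print(f"product of the two:    {r_ice_swim * r_swim_drown:.3f}")
print(f"ice cream vs drowning: {r_ice_drown:.3f}")
```

In this setup the end-to-end correlation is noticeably weaker than either link, which is why testing the two sub-hypotheses separately can be easier.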
Confounders
So we can conclude that both hypotheses A and B are false, and we should look elsewhere for an explanation. And I am sure it has been screaming in your mind this whole time: it is the season that causes both.
In this case, the season variable is called a confounder, and ice cream sales and drowning are both children of this confounder. A simple mathematical trick to see this is to condition on the confounder. For example, we could look only at drowning and ice cream sales for the same month in different years.
Now the correlation is completely gone.
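Conditioning on the confounder can be sketched with simulated data. Everything here is an assumption made for illustration: the seasonal curve, the noise levels and the 50 invented years are not the data behind the graphs.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(2)
n_years = 50
records = []  # (month, ice_cream_sales, drownings), all values invented
for _ in range(n_years):
    for month in range(1, 13):
        season = math.cos(2 * math.pi * (month - 7) / 12)  # peaks in July
        ice_cream = 10 + 5 * season + rng.gauss(0, 1)
        drownings = 20 + 8 * season + rng.gauss(0, 2)
        records.append((month, ice_cream, drownings))

# Correlation over all months: the season drives both variables.
overall = pearson([r[1] for r in records], [r[2] for r in records])

# Condition on the confounder: compare only records from the same month.
within_month = []
for month in range(1, 13):
    same_month = [r for r in records if r[0] == month]
    within_month.append(
        pearson([r[1] for r in same_month], [r[2] for r in same_month])
    )
average_within = sum(within_month) / len(within_month)

print(f"correlation across all months:    {overall:.2f}")
print(f"average correlation within month: {average_within:.2f}")
```

In the simulation, ice cream and drowning never influence each other directly, so once the month is held fixed only the independent noise is left to compare.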
Colliders
Since you found out about the season, I hope you took a moment to feel clever. Of course, you’re wrong. Few people turn the calendar, see that it is the first of June, and think it is time to increase their ice cream consumption by 10% and then try to hold their breath underwater for 10 seconds longer than in April.
But the season is defined by the tilt of the earth’s axis as it orbits the sun, which causes more sunshine on one part and causes the temperature to rise. Both the amount of sunshine and the temperature affect ice cream sales and the number of people going swimming. While this is just another example of mediators, it’s also an example of a collider, because ice cream sales and the number of swimmers each have more than one arrow pointing into them.
The short point is that conditioning can be a powerful tool, but it doesn’t tell you what causes what. Only interventions do.
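Conditioning can even create a correlation that was never there. The sketch below assumes, unrealistically and only to isolate the effect, that sunshine and temperature are independent of each other and that both feed into ice cream sales. All numbers and names are invented.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(3)
n = 20000

# Assumption for illustration only: the two causes are independent.
sunshine = [rng.gauss(0, 1) for _ in range(n)]
temperature = [rng.gauss(0, 1) for _ in range(n)]
# Ice cream sales is the collider: both arrows point into it.
sales = [s + t + rng.gauss(0, 0.5) for s, t in zip(sunshine, temperature)]

r_all = pearson(sunshine, temperature)

# Condition on the collider: keep only the days with high sales.
high = [(s, t) for s, t, v in zip(sunshine, temperature, sales) if v > 1.0]
r_high_sales = pearson([s for s, _ in high], [t for _, t in high])

print(f"sunshine vs temperature, all days:        {r_all:.2f}")
print(f"sunshine vs temperature, high-sales days: {r_high_sales:.2f}")
```

On high-sales days, a day with little sunshine must have been warm to sell that much, so the two causes look negatively related even though neither affects the other in this simulation.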
Putting it all together
Putting what we have learned today together gives a model that looks like this.
This, however, is still a very simplified model that fails to capture much of the complexity of our little ice cream problem. For example, if we added geography, culture, or economics, we would see a much different story, but right now, that’s all considered noise. Generally, in our attempt to fully understand the world, we can keep adding nodes to our causal model until we don’t have any noise unaccounted for.
We also haven’t discussed how nodes affect each other. Would one extra degree of average temperature translate to 10% more ice cream sales (a multiplier), or would it make everyone buy one more ice cream independent of all other factors (an addition)?
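The difference between the two kinds of link can be written out as two tiny functions. The function names, the town and city figures and the default parameters are hypothetical, chosen only to show how the two forms diverge.

```python
def multiplicative_sales(baseline_sales, degrees_warmer, rate=0.10):
    """Each extra degree scales sales by (1 + rate)."""
    return baseline_sales * (1 + rate) ** degrees_warmer


def additive_sales(baseline_sales, degrees_warmer, population, per_person=1.0):
    """Each extra degree adds per_person ice creams for every person."""
    return baseline_sales + per_person * population * degrees_warmer


# Two invented places with very different baseline habits.
places = {
    "small town": {"population": 1_000, "baseline_sales": 500},
    "big city": {"population": 1_000_000, "baseline_sales": 2_000_000},
}

for name, place in places.items():
    mult = multiplicative_sales(place["baseline_sales"], degrees_warmer=1)
    add = additive_sales(
        place["baseline_sales"], degrees_warmer=1, population=place["population"]
    )
    print(f"{name}: multiplicative {mult:.0f}, additive {add:.0f}")
```

Under the multiplier, the increase depends on how much ice cream was already being sold; under the addition, it depends only on how many people there are.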
However, I think I have been rambling enough about this problem, so I will end with the point all scientists wish to make. The world is more complicated than it seems.
Credit:
Drowning data: Injury Facts
Ice cream data: United States Department of Agriculture