Monday, April 16, 2012

Judea Pearl and causality

Judea Pearl gave a lecture in Oslo some time ago and I just want to digest it by writing this blog post.

His main argument was that causality is not a statistical concept. Statistical concepts are derived from the joint distribution. Causality, however, cannot be derived from a joint distribution (alone), since changing a variable will lead to a new joint distribution. If I double the price of a product, I cannot automatically use the old joint probability distribution to infer the effect of this change. It all depends on why there was a change, on other circumstances, and on the stability of the underlying functions.
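To make this concrete, here is a small simulation of that price example (a toy model of my own, not one from the lecture). Quality confounds the price-sales relationship, so the slope read off the observed joint distribution even has the wrong sign relative to the true interventional effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy structural model (my own example, not Pearl's): quality drives both
# price and sales, and price itself lowers sales with true effect -1.
quality = rng.normal(size=n)
price = 10 + 2 * quality + rng.normal(size=n)   # sellers price good products higher
sales = 50 - price + 3 * quality + rng.normal(size=n)

# Slope read off the observed joint distribution: confounded, wrong sign.
obs_slope = np.polyfit(price, sales, 1)[0]

# Slope under intervention: price set by fiat, do(price), which cuts the
# quality -> price arrow and so produces a *new* joint distribution.
price_do = 10 + rng.normal(size=n)
sales_do = 50 - price_do + 3 * quality + rng.normal(size=n)
do_slope = np.polyfit(price_do, sales_do, 1)[0]

print(f"observational slope:  {obs_slope:+.2f}")   # about +0.20
print(f"interventional slope: {do_slope:+.2f}")    # about -1.00
```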

This is clearly true. Some may object that not every change leads to a large shift in the joint distribution, that we may have prior knowledge that the relationship is often stable, and so on, but in principle it seems clear that we cannot get causality from observing the joint distribution alone. We need some kind of identifying assumption or mechanism. We need something to identify the effect of changes in a (structural) model: a randomized experiment, an instrumental variable, something!
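As a toy illustration of one such mechanism, here is a sketch of instrumental-variable identification (the setup and numbers are mine, just to fix ideas): a variable z that moves x but affects y only through x recovers the causal effect that a naive regression misses:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical setup: u confounds x and y; z shifts x but affects y only
# through x. That exclusion restriction is the identifying assumption.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = z + u + rng.normal(size=n)
y = 1.5 * x + 2 * u + rng.normal(size=n)   # true causal effect of x: 1.5

naive = np.polyfit(x, y, 1)[0]                    # biased upward by u
wald = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]    # simple IV (Wald) estimate
print(f"naive OLS: {naive:.2f}  IV: {wald:.2f}")  # ~2.17 vs ~1.50
```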

Pearl's second argument was that standard statistical and mathematical language is unable to articulate the assumptions needed to talk about causality. We need new notation, he argued, to distinguish between equations and claims about causality. In his language y = 2x is an equation, while y := 2x or y <-- 2x symbolizes a causal claim: x determines y, not the other way around. More generally, he argues that there are two approaches and languages that can be used. The first is structural equation modelling (which he approaches using arrows, graphs and do-operators). The second is the potential outcome language used by Neyman-Rubin and many others, in which the notation Y1 and Y0 indicates the value of the outcome when treated vs. not treated.
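A minimal sketch of the two readings, using a made-up mechanism:

```python
# Structural-equation reading: y is *assigned* from x (y := 2x), so the
# relation is directional; setting x changes y, setting y does nothing to x.
def f_y(x):
    return 2 * x   # the mechanism that determines y

# Potential-outcome reading: each unit carries two potential outcomes,
# y1 if treated (x = 1) and y0 if untreated (x = 0).
y0 = f_y(0)        # outcome under do(x = 0)
y1 = f_y(1)        # outcome under do(x = 1)
effect = y1 - y0   # the unit-level causal effect, here 2
```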

So what, one might say? Are the distinctions and notational investments above really so important? Pearl has at least managed to show that the language is useful for generating surprising conclusions with important practical implications. For instance, his graphical approach and the "do-calculus" make it easier to identify when it is possible to estimate a causal effect and how it should be done. He has also shown, somewhat surprisingly, that conditioning on some types of variables ("colliders") in a standard regression analysis will introduce bias. Finally, using graphs and the do-calculus makes it easier to focus attention on how to achieve identification of causal effects (by closing "back doors"). This is all very valuable.
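The collider result is easy to check by simulation. In this made-up example x has no effect on y at all, yet "controlling for" their common effect c manufactures a sizeable coefficient:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# x and y are independent; c is a collider, a common *effect* of both.
x = rng.normal(size=n)
y = rng.normal(size=n)
c = x + y + rng.normal(size=n)

# Regress y on x alone, then on x "controlling for" the collider c.
X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, c])
b1 = np.linalg.lstsq(X1, y, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y, rcond=None)[0]
print(f"coefficient on x, no control:       {b1[1]:+.2f}")  # ~ 0.00
print(f"coefficient on x, collider control: {b2[1]:+.2f}")  # ~ -0.50
```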

The framework works, but it seems to lose at least some of its visual elegance when we introduce a time dimension with lags, feedbacks and dose-response relationships. Pearl's answer to this was that the approach remains valid, but that in a dynamic situation one would have to imagine a succession of graphs.

In sum, it was a stimulating talk. Some might argue that the approach is slightly too much "either/or." Pearl makes a very sharp distinction between when it is possible and when it is impossible to identify a causal effect. There is no such thing as "approximate identification." This is clearly true mathematically, but sometimes the important question is not "is it possible?" but "how likely is it that I have gotten closer to a decision-relevant conclusion?" To use an analogy: it is impossible to solve large NP-hard problems fast, but it is often possible to get an approximate solution fast. In the same way, I sometimes get the impression that Pearl's approach focuses heavily on "yes/no" questions as opposed to questions of degree ("how bad is it if x is the case compared to y when I try to identify a causal effect?"). To be slightly more specific: conditioning on a collider in a regression is bad, yes, but how bad is it? And what determines the degree to which it produces misleading results? But these, I guess, are practical problems that come after the logical and conceptual clarification.
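For what it is worth, one can at least put numbers on that question in a toy model. Extending the collider simulation above with a hypothetical strength parameter a on the arrows into the collider, the bias grows smoothly from nothing to severe:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
y = rng.normal(size=n)   # truly unrelated to x

# Vary the strength a of the arrows into the collider c = a*x + a*y + noise.
# In this toy model the bias on x works out to -a^2 / (a^2 + 1): zero for
# a = 0, creeping toward -1 as the collider gets more informative.
for a in [0.0, 0.5, 1.0, 2.0]:
    c = a * x + a * y + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x, c])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    print(f"a = {a:3.1f}: coefficient on x = {beta[1]:+.2f}")
```

So in this setup, at least, "how bad" has a smooth answer that depends on how strongly the conditioning variable is caused by the things we care about.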




