Monday, September 26, 2011

How to do Mediation Analysis with Survival Data

I recently heard Theis Lange talk about how to do mediation analysis with survival data. Let's say you want to examine the effect of socio-economic position (SEP) on long-term sickness absence. Part of the effect may go through the physical work environment, part may be directly related to the socio-economic position:



To make the idea even more specific, imagine a society with farmers having a high socio-economic status (reducing the probability of becoming sick), but working in a dangerous physical environment (increasing the probability of becoming sick). The question is how much of the overall association between status and sickness absence goes through the physical environment and how much can be attributed directly to status.

Unfortunately there is no general framework for answering these questions. There are some approaches for specific models, but nothing that works for all cases. Lange's aim is to provide such a general framework. His approach is based on nested counterfactuals: we imagine holding SEP constant while varying the mediating variable, and we also imagine holding the mediating variable constant while varying the status variable. This sounds easy, but as is often the case, it takes some effort to apply the idea. Lange makes a very useful contribution by showing exactly how to apply it in R.
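
To make the idea of nested counterfactuals concrete, here is a small toy simulation in Python. This is not Lange's estimator (his approach handles survival outcomes, while the outcome here is simply binary), and all the parameters are made up, but it spells out the nested counterfactual Y(a, M(a*)): the outcome when SEP is set to a while the work environment behaves as if SEP were a*.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def mediator(sep, u):
    # dangerous physical environment; made less likely by high SEP (made-up numbers)
    return (u < 0.6 - 0.3 * sep).astype(int)

def outcome(sep, env, u):
    # sickness absence; high SEP protects directly, a dangerous environment harms
    return (u < 0.3 - 0.1 * sep + 0.2 * env).astype(int)

u_m = rng.uniform(size=n)
u_y = rng.uniform(size=n)

def Y(a, a_star):
    # nested counterfactual: SEP fixed at a in the outcome model, while the
    # mediator behaves as if SEP were a_star
    return outcome(a, mediator(a_star, u_m), u_y).mean()

total    = Y(1, 1) - Y(0, 0)   # total effect of high vs low SEP
direct   = Y(1, 0) - Y(0, 0)   # "direct" effect, with the mediator at its low-SEP behaviour
indirect = Y(1, 1) - Y(1, 0)   # effect going through the work environment
print(total, direct, indirect) # the two components add up to the total effect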

I have two small comments. First of all, the language of direct and indirect effects is slightly misleading. All effects are indirect in the sense that at a finer level of detail one could specify a more detailed causal mechanism. What we are really talking about is "effects going through the mediating mechanism" versus "all other effects that go through other mediating mechanisms." But this is obvious.

More problematically, there may be a logical problem with nested models, at least if one is not careful when interpreting the effects. Go back to the example of the farmer. Let's say we have lots of professions and we want to examine the relationship between having a job in a given profession (which is a proxy for socio-economic status) and sickness absence. Imagining a farmer in a farmer's environment is easy. Imagining a farmer in a non-farmer environment sounds difficult but not impossible for professions that are close, but imagining an economist working in a farmer's environment is implausible. Nested counterfactuals often involve implausible counterfactuals of this type. An approach that asks us to work out the effect of a mediating variable by taking the average across some possible and some impossible counterfactuals may have a problem.

To what extent this is a problem depends on how careful one is when interpreting the model. One may, for instance, interpret socio-economic status as something much more general. Imagine quadruplets from the same background, two with occupations of similar status - economist and dentist, perhaps - but with different work environments, and another two with professions of different status but similar work environments. This may be logically possible, but it seems difficult to identify the associations in general.

In sum: a good idea and a great R implementation, but I remain uncertain about the intuition of nested models in many contexts.



UnderstandingSociety: Current issues in causation research

A blog post summarizing current strands in causation research - with many links to relevant papers.


Wednesday, September 7, 2011

Estimating a treatment effect using OLS: The problem of the implicit conditional-variance weighting

While reading Stephen L. Morgan and Christopher Winship's book Counterfactuals and Causal Inference, I came across the following statement:
“In general, regression models do not offer consistent estimates of the average treatment effect when causal heterogeneity is present.” (p. 148) 
The argument is that linear regression estimates implicitly use a weighting that is likely to be wrong. Consider an example in which the units can be divided into three strata (small, medium and large). Assume the effect of treatment differs between the strata: treating large units has a larger effect than treating small and medium units. We also assume that there are more medium units than small and large units. The overall average effect of treatment is the average of the treatment effects in the different strata. Since there may be more of some units than others, the overall average must be weighted. If most units are medium, then the results for this group should be given more weight when calculating the overall average effect. For each stratum the treatment effect is weighted by the number of units the stratum contains relative to the overall number of units, and we get the average effect of treatment. This sounds both intuitive and correct.
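
A made-up numerical example: suppose 20 per cent of the units are small, 60 per cent are medium and 20 per cent are large, and the stratum-specific treatment effects are 1, 2 and 5. The overall average treatment effect is then 0.2*1 + 0.6*2 + 0.2*5 = 2.4, with the medium stratum counting three times as much as each of the others.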

What happens if we use regression to estimate the treatment effect instead of the stratified weighting described above? Using the same data one may try a linear regression to find the effect of treatment in a model that accounts for different effects in different strata (i.e. one includes dummy variables for each stratum except one). Interestingly, this kind of regression implicitly uses a different weighting than the stratified matching described above. According to Morgan and Winship, the linear regression estimate implicitly includes the variance of the treatment variable within each stratum when constructing the weight (p. 144). If treatment is binary, the variance is p(1-p) where p is the probability that a unit will receive treatment. The variance is highest when p=0.5 (see figure below). In short, linear regression uses not only the number of units in each stratum to weight the results, but also the conditional variance. The big question, of course, is whether this weighting procedure gives correct results.



According to Morgan and Winship, the answer is most likely no. In order to give correct results, the propensity to receive treatment has to be the same in each stratum, or the stratum-specific effect must be the same for all strata (p. 148). Morgan and Winship claim that the first is almost never true because the purpose of controlling for the strata in the regression is precisely that they have different propensities to get the treatment. I have to admit I was not convinced by this, since the justification for including the strata could be that the units respond differently once they get treatment, not that there is a difference in the propensity to get treatment in the different strata. True, sometimes the problem is that different groups have different probabilities of receiving treatment, but not always. I may be mistaken here, but I do not want to just accept it right away. The second condition is more likely to be false: one often suspects that different strata respond differently to the same treatment. If this is captured by the model, it does not create a problem, but if one uses a linear regression and the heterogeneity is not captured by the linear specification, one may end up with very wrong results.
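
A small simulation in Python (my own toy numbers, not Morgan and Winship's) makes this concrete. It uses the same made-up strata and effects as the numerical example above, but lets the probability of treatment differ across strata, and compares the share-weighted stratified estimate with OLS including stratum dummies.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# made-up strata: sizes, treatment probabilities and treatment effects
sizes   = {"small": 2000, "medium": 6000, "large": 2000}
p_treat = {"small": 0.1,  "medium": 0.5,  "large": 0.9}
effect  = {"small": 1.0,  "medium": 2.0,  "large": 5.0}

parts = []
for s in sizes:
    d = rng.binomial(1, p_treat[s], sizes[s])
    y = 10 + effect[s] * d + rng.normal(0, 1, sizes[s])
    parts.append(pd.DataFrame({"stratum": s, "d": d, "y": y}))
df = pd.concat(parts, ignore_index=True)

# (a) Stratified estimate: stratum-specific effects weighted by stratum shares
shares = df["stratum"].value_counts(normalize=True)
diffs = {s: g.loc[g["d"] == 1, "y"].mean() - g.loc[g["d"] == 0, "y"].mean()
         for s, g in df.groupby("stratum")}
ate_stratified = sum(shares[s] * diffs[s] for s in diffs)

# (b) OLS with stratum dummies: implicitly weights each stratum by
#     n_s * p_s * (1 - p_s), so strata with p near 0.5 count extra
ols = smf.ols("y ~ d + C(stratum)", data=df).fit()

print(round(ate_stratified, 2))   # close to 0.2*1 + 0.6*2 + 0.2*5 = 2.4
print(round(ols.params["d"], 2))  # pulled toward the medium stratum's effect of 2

With these made-up numbers the stratified estimate lands near 2.4, while the regression estimate is pulled down toward roughly 2.2, because the medium stratum, with a treatment probability close to 0.5, gets extra weight.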

While accepting that this implicit weighting can create problems for regression estimates, I am slightly more uncertain both about how often the problem will occur and about how serious it will be when one of the conditions is not fulfilled. Morgan and Winship provide some helpful calculations showing that it is possible to get estimates that are very wrong. This is interesting, but the possibility alone does not demonstrate how general the problem is.


Tuesday, August 23, 2011

Statisticians keen to alter image of number-crunching profession

The Irish Times comments on the types of papers presented at a statistics conference, saying the papers included an example of "the downright obscure", i.e. a paper on a “semiparametric propensity score weighting method to adjust for nonresponse using multivariate auxillary information”.

The title may be obscure to some, but it makes perfect sense and it is an interesting approach.

Statisticians keen to alter image of number-crunching profession

Is propensity score analysis really comparing like-with-like? One common mistake in propensity score analysis

Over the next few weeks I am going to read and comment on a series of papers about propensity score analysis. The comments are highly selective and do not represent a general evaluation of the articles. The first article is Dehejia, R.H. and S. Wahba (2002) Propensity score-matching methods for nonexperimental causal studies, Review of Economics and Statistics 84(1): 151-161.

Summary
This article is an introduction to propensity score matching using the example of how to evaluate the effect of work training programs. Training programs are well suited to test the method since there is a great deal of data available, including data from randomized experiments. One can then compare the results from the randomized experiment with analyses based on propensity score adjustment using observational data. The authors also present detailed information about how different choices in the analysis affect the results (matching with or without replacement, varying the number of comparison units, and different matching methods).

If one simply compares average earnings among those who participated in the program and those who did not, the participants had earnings that were about $8500 lower than the non-participants. This, of course, does not imply that the program had no effect, or even a negative one, since there is a selection bias: those who entered the program generally had more problems and lower earnings than the rest of the population. To find the causal effect one can conduct randomized experiments, and in this case the experiments indicate that on average the training increased annual earnings for those who participated by $1794.

Depending on the sample used and the different choices made in the analysis, the propensity score analysis suggested that the effect was between -$916 and +$1928. This is a wide interval, but the main factor driving the differences in the results is the use of different comparison samples, not the choice of matching method. Comparing the sample from the National Supported Work (NSW) Demonstration to data from the Current Population Survey (CPS) gives treatment effects between $1119 and $1605, while comparing it to the Panel Study of Income Dynamics (PSID) gives answers from -$916 to +$1928. The main problem here is that when matching without replacement the number of observations becomes small and the results become unreliable.

The importance of distinguishing between statistical and economic significance
The general idea of matching and propensity score analysis is to compare like-with-like. To test whether this is the case, many authors seem content with traditional statistical t-tests of whether the sub-groups being compared have similar covariates (age, health, education and so on). For instance, in the present article the authors note that "None of the differences between the matched group and the NSW sample are statistically significant" (p. 157).

It seems to me that this approach makes it too easy to conclude that the groups are similar enough to be compared. In traditional hypothesis testing the null hypothesis here is that the groups are similar. Given that the test requires strong evidence before rejecting the null hypothesis, it takes quite a big difference to conclude that the groups are different. In short, the playing field is not equal: it becomes easy to claim that "I now compare like-with-like" when you require a lot of evidence to change your mind. Depending on the chosen level of significance, the test will favour the null hypothesis of similarity unless one is, say, 95% certain that the groups are different.
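
A made-up illustration: with 100 units in each group and a standard deviation of 10 years, a difference in mean age of 2 years gives a t-statistic of about 1.4, which is not significant at the 5% level, yet a two-year gap in age could easily matter for the outcomes being compared.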

What should be done? First of all, one should examine the covariates for potentially important differences even if they are not statistically significant. The problem, of course, is that this injects subjectivity into the process. Who is to judge which covariates are important to balance, and by how much? However, this subjectivity is unavoidable and may even be desirable in cases where one has prior information about the importance of some variables. The issue may also have implications for how to do propensity score analysis. So far it seems like people have used the kitchen-sink method of throwing lots of possible covariates into the analysis. Some of these are obviously important, while others are a priori less important and are thrown in "just to be sure." Given that the computer does not know the difference, it may be wrong to try equally hard to get balance on all covariates, and it may also be problematic to use the kitchen-sink approach.
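
One concrete alternative, often used in the matching literature, is to report standardized mean differences for each covariate instead of relying only on p-values. A minimal Python sketch (the data frame and column names are hypothetical):

import numpy as np
import pandas as pd

def std_diff(x_treated, x_control):
    # difference in means scaled by the pooled standard deviation
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

def balance_table(df, treatment, covariates):
    treated, control = df[df[treatment] == 1], df[df[treatment] == 0]
    return pd.Series({c: std_diff(treated[c], control[c]) for c in covariates})

# e.g. balance_table(matched_data, "treat", ["age", "education", "earnings_75"])
# Absolute values above roughly 0.1 are often taken as a warning sign,
# regardless of what the t-test says.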

Tuesday, March 22, 2011

How to use the method of “propensity score analysis” in SPSS?


 
1. Generate the “propensity scores” (an estimate of how likely it is that an individual with certain characteristics will end up in treatment A)
a)      Select logistic regression (Analyze  --> Regression  --> Binary Logistic)
b)      Select the dependent variable (whether the client received treatment A or not). This has to be a dichotomous variable. If it does not exist in the form you want it, use “Recode (into different variable)” under “Transform” in the SPSS menu before running the logistic regression.
c)      Move all the variables you believe are important into the box for “Covariates” (e.g. gender, age, etc. Important variables are those that influence both the outcome and whether the person receives treatment A or not).
d)      In the menu for logistic regression, first click  “Save” and select “Probabilities” under “Predicted Values.” After this click “Continue.” (We need to save the result of the regression since we are later going to compare individuals with similar propensity score values.)
e)      Click “OK” and, in the unlikely case that no mistake has been made, SPSS will run the regression and add a new column to your dataset which represents the “propensity score” (often automatically labelled “pre_1”, “pre_2” and so on). You will also get output with lots of information about the regression results (coefficient values, how many cases it correctly predicts, and so on). Ignore this for now.

2. Compare individuals with similar propensity scores (using subclassification)
a)      In the SPSS menu system, select “Transform”  --> “Categorize Variables” and select the variable you just created/saved in the logistic regression (the propensity score, often labelled “pre_1”). Also change the number of groups to 5. The new categorized variable will (automatically) be called “npre_1”, “npre_2” and so on.
b)      You can now compare the groups within the same category by – for instance – “Analyze”  --> “Descriptive Statistics”  --> “Crosstabs” or “OLAP Cubes” under “Analyze”  --> “Reports”, choosing “npre_1” as the layer/classification variable. By so doing you will get the mean result for those with similar propensity scores (here defined as less than a 0.2 difference) who received treatment A compared with those who did not.
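
For those who prefer code to menus, here is a rough sketch of both steps in Python rather than SPSS (the file name and the variables treatA, gender, age and outcome are all made up):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("clients.csv")                      # hypothetical data file

# Step 1: logistic regression of treatment on the covariates; the predicted
# probabilities play the role of SPSS's "pre_1"
model = smf.logit("treatA ~ gender + age", data=df).fit()
df["pscore"] = model.predict(df)

# Step 2: cut the propensity score into five groups (like "npre_1") and compare
# mean outcomes for treated and untreated clients within each group
df["pscore_group"] = pd.qcut(df["pscore"], 5, labels=False)
print(df.groupby(["pscore_group", "treatA"])["outcome"].mean())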