New Study on Thimerosal and Neurodevelopmental Disorders: II. What Happened to Control for Confounding?
Published May 24th, 2008 in Autism, Child Health, Infant Health, Medical & Epidemiological Studies
Before diving in and talking immediately about the Young, Geier, & Geier study, “Thimerosal exposure in infants and neurodevelopmental disorders: an assessment of computerized medical records in the Vaccine Safety Datalink,” it’s probably a good idea to back up a bit and talk about confounding and confounders. The authors make no mention of confounding and take no account of potential confounding in their analysis. This is one of the many flaws of the study.
Ironically, no epidemiologist has contributed more to modern epidemiological thinking about confounding than Sander Greenland. Professor Greenland has published around 50 methodological papers about confounding and confounders, including five papers specifically on the problems of confounding in ecological study designs. The irony here is that Prof Greenland is an expert witness for the Petitioners in the Autism Omnibus. Given that Prof Greenland has written so much about confounding, you might guess that there are hundreds of other methodological papers related to the subject, and you would be right. But there’s no need to get bogged down in an advanced discussion of confounding, even though the philosophically inclined might be fascinated by new thinking on counterfactual confounding and causality.
For the purposes of our discussion today, we will use the classical definition of confounding accepted by most epidemiologists: Let X be an independent variable, or exposure, and Y be a dependent variable, or disease outcome.*
Z is a potential confounder if:
(1) Z is causally related to the disease outcome Y or the diagnosis of disease outcome Y. Put another way, Z is a risk factor for the disease or for diagnosis of the disease.
(2) Z is associated with the exposure.**
If you put #1 and #2 above together, confounding can be thought of as a “mixing of effects.” You’ve hypothesized a causal association between X and Y, but some other risk factor Z may be responsible for part (or all) of the association under investigation. So a potential confounder must have the potential to provide an alternative explanation for the observed association. (There’s a nice little lesson plan for teaching “Confounding in Epidemiology” to high school students at the College Board website.)
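To make the “mixing of effects” concrete, here is a minimal sketch with invented counts (not data from any study). In this toy example Z is a risk factor for Y and is more common among the exposed, yet X has no effect on Y within either level of Z. The names `strata`, `risk_ratio`, and `pooled` are hypothetical helpers made up for this illustration.

```python
# Hypothetical counts, invented for illustration: (cases, total) in each group.
# Within each stratum of Z, exposed and unexposed have the same risk.
strata = {
    "Z=0": {"exposed": (10, 100), "unexposed": (40, 400)},   # risk 0.10 in both groups
    "Z=1": {"exposed": (160, 400), "unexposed": (40, 100)},  # risk 0.40 in both groups
}


def risk_ratio(exposed, unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    (cases_e, total_e), (cases_u, total_u) = exposed, unexposed
    return (cases_e / total_e) / (cases_u / total_u)


def pooled(group):
    """Collapse over Z: add up cases and totals for one exposure group."""
    cases = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return cases, total


# Stratum-specific risk ratios (Z held fixed) versus the crude one (Z ignored).
stratum_rr = {z: risk_ratio(s["exposed"], s["unexposed"]) for z, s in strata.items()}
crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))

print("Stratum-specific risk ratios:", stratum_rr)
print("Crude risk ratio:", crude_rr)
```

In this made-up table the crude risk ratio is well above 1 even though the risk ratio is 1 inside each stratum of Z. The whole crude association comes from Z.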
Now let’s get away from theoretical discussion and talk about studying thimerosal exposure and neurodevelopmental disorders, with a specific focus on autism. What are the potential confounders that might need to be considered when investigating this association? To guide us in thinking about confounding, the usual approach would be to use knowledge of risk factors for autism, knowledge of vaccine utilization, and previous literature. We also need to consider the nature of the data. If the investigation includes data on children collected over several years, it’s usually a good idea to consider date of birth as a potential confounder. In the Vaccine Safety Datalink (VSD) data, we know that the frequency of autism was increasing over time, between 1990 and 1996. We also know that Hg exposure varied during that time period. These latter two relationships are shown clearly in Figure 1 of the Young et al. paper. I would argue, therefore, that in the VSD data it’s absolutely essential to consider date of birth as a confounder. At the very least you would need to take birth cohort into account. Indeed, given the marked increase in autism rates over time, you would want very “tight control” for date of birth, so it might even be a good idea to consider month of birth as a control variable in the analysis. In fact, in the Verstraeten et al. study of the VSD, the investigators used “proportional hazards models stratified by…year and month of birth.” Whatever else one thinks of the Verstraeten et al. study, any epidemiologist would agree that this was the correct approach to control for confounding in the VSD. Now it’s impossible to know with absolute certainty that a potential confounder is an actual confounder until you analyze the data. But in the VSD, the autism time trend is so strong that you have to consider annual birth cohort as a confounder, and month of birth would be even better.
There are lots of other variables that could be considered as confounders in looking at the association between thimerosal exposure and neurodevelopmental disorders. For example, in Heron and Golding’s analysis of the British ALSPAC data, the investigators used nine confounders in their multivariate analysis (birth weight, gestational age at birth, maternal education, child’s gender, parity, housing tenure, midpregnancy maternal smoking, child’s ethnicity, and breastfeeding for 3 months or more). In Thompson et al.’s follow-up study of children from the VSD, there is a huge list of confounders in Table F of their Supplementary Appendix.
Unfortunately, Young et al. did not have access to this kind of detailed data from the VSD, or anything even close to it. The limited amount of data available to them is all the more reason to carefully consider confounding in the data that they did have. They did have data on year of birth from both the exposure file and the outcome file. Young et al. analyzed the VSD using ecological regression analysis with birth cohorts as units of analysis, but the only variables in the regression were autism rate and “average Hg dose per person.” A simple approach would have been to add year of birth (1990, 1991, 1992…, 1996) into the regression analysis as a control variable. This would not have completely controlled for confounding, but it would have been a start.
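As a rough sketch of what adding year of birth as a control variable does, the example below fits a line to log(prevalence) with and without birth year in the model. Everything in it is an assumption made for illustration: the seven cohort values are invented (they are not the VSD numbers), the log rate is deliberately built to depend on birth year only, and ordinary least squares stands in for whatever regression the authors actually ran. The function names are hypothetical.

```python
# Hypothetical cohort-level data, invented for illustration (NOT the VSD numbers).
birth_year = [1990, 1991, 1992, 1993, 1994, 1995, 1996]
hg_dose = [0.6, 1.0, 1.1, 1.5, 1.6, 2.0, 2.1]  # average dose, units of 100 micrograms
# Log prevalence constructed to rise with birth year only: the true dose effect is zero.
log_rate = [-7.0 + 0.3 * (year - 1990) for year in birth_year]


def centered(values):
    """Subtract the mean so the intercept drops out of the fit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slope_unadjusted(x, y):
    """Least-squares slope of y on x alone."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def slope_adjusted(x, z, y):
    """Coefficient of x in the least-squares fit y ~ intercept + x + z."""
    xc, zc, yc = centered(x), centered(z), centered(y)
    sxx, szz, sxz = dot(xc, xc), dot(zc, zc), dot(xc, zc)
    sxy, szy = dot(xc, yc), dot(zc, yc)
    # Solve the 2x2 normal equations for the x coefficient (Cramer's rule).
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


b_crude = slope_unadjusted(hg_dose, log_rate)
b_adjusted = slope_adjusted(hg_dose, birth_year, log_rate)
print("Dose slope ignoring birth year:", b_crude)
print("Dose slope adjusting for birth year:", b_adjusted)
```

With these invented numbers the unadjusted dose slope is large and the adjusted slope is essentially zero, because the time trend explains everything. With only seven cohorts and dose closely tracking year, a real adjusted estimate would be very imprecise, which is part of why this is only a start.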
I can’t be absolutely sure, since they don’t describe any details or give any statistical references, but I think Young et al’s ecological regression equation is really quite simple:
Log(autism prevalence rate) = A + B × (average Hg exposure in 100 microgram intervals)
where A is an intercept term and the units of analysis are birth cohorts. Without going into the mathematical derivations, take my word for it that the Risk Ratio = exp(B). If we look at the first risk ratio in Table 3, which is 2.87, Young et al. would interpret it thus: In the birth to 7 month period, the rate of autism was approximately 2.9 times higher given a 100 microgram increase in Hg exposure in thimerosal-containing vaccines. This 2.87 was derived from a Poisson regression in which the slope was 0.46 (i.e., the base-10 log of 2.87; on the natural-log scale that exp(B) inverts, the same slope is about 1.05). As a slight modification of the statement in my previous post, picture a scatter plot of 7 points where the X axis is average Hg dose, the Y axis is log(prevalence rate), and the 7 points are where the mean Hg dose for each birth cohort intersects the log(prevalence rate) for that birth cohort. The slopes of the lines for autism are 0.46 (for exposure from birth to 7 months) and 0.42 (for exposure from birth to 13 months).
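For readers who want to check the arithmetic linking a slope to a risk ratio, here is a small sketch. It assumes the log-linear form written above, with the dose measured in units of 100 micrograms. The function names are hypothetical and the only number taken from the paper is the 2.87 from Table 3.

```python
import math


def risk_ratio_from_slope(b, dose_increase=1.0):
    """Risk ratio for a given dose increase (in units of 100 micrograms)
    under log(rate) = A + B * dose, with natural logs."""
    return math.exp(b * dose_increase)


def slope_from_risk_ratio(rr):
    """Natural-log slope B that corresponds to a risk ratio per unit of dose."""
    return math.log(rr)


reported_rr = 2.87  # first risk ratio in Table 3 of Young et al.

b_natural = slope_from_risk_ratio(reported_rr)  # slope on the natural-log scale
b_base10 = math.log10(reported_rr)              # the same slope on a base-10 log scale

print("Natural-log slope:", round(b_natural, 3))
print("Base-10 log slope:", round(b_base10, 3))
# Under this model a 200 microgram increase multiplies the rate by RR squared.
print("Risk ratio per 200 micrograms:", round(risk_ratio_from_slope(b_natural, 2.0), 2))
```

Because the model is multiplicative, a steep slope compounds quickly: under these assumptions the implied risk ratio for a 200 microgram difference is 2.87 squared, or about 8.2.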
Pretty steep slopes and, therefore, apparently strong associations. But there’s no attempt to control for, or adjust for, the confounding effect of birth cohort. Just one look at Figure 1 (or a basic knowledge about trends in autism) tells you the regression coefficients (slopes) are being driven by increases in autism risk over time. Given the increase in frequency of autism (and other neurodevelopmental disabilities) during this time period, you could do an ecological regression analysis of almost any factor that varied over time and you would find an association with autism. I would bet that you could enter number of sushi bars per capita into an ecological regression and you’d find an association with autism rates.
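The sushi bar bet is easy to act out with made-up numbers. In the sketch below every value is invented: a log prevalence that simply rises by a fixed amount per birth year, and a count of sushi bars per 100,000 people that also happens to drift upward over the same years. A least-squares line stands in for the ecological regression, and the helper name is hypothetical.

```python
import math

# Hypothetical numbers, invented for illustration (not real data).
birth_year = [1990, 1991, 1992, 1993, 1994, 1995, 1996]
sushi_bars = [1.0, 1.4, 2.1, 2.5, 3.2, 3.4, 4.1]  # per 100,000 people
# Log prevalence that depends on birth year only.
log_rate = [-7.0 + 0.3 * (year - 1990) for year in birth_year]


def slope_and_correlation(x, y):
    """Least-squares slope of y on x, and the Pearson correlation of x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


slope, correlation = slope_and_correlation(sushi_bars, log_rate)
apparent_rr = math.exp(slope)  # "risk ratio" per additional sushi bar per 100,000

print("Slope:", round(slope, 3))
print("Correlation:", round(correlation, 3))
print("Apparent risk ratio:", round(apparent_rr, 2))
```

Nothing in these numbers links sushi bars to autism, yet the seven points fall almost on a straight line and the apparent risk ratio is comfortably above 1. Any factor that trends with birth year would do the same.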
And please note, as I alluded to briefly in my last post, that one of the major problems of ecological analyses like this one is that, if confounding variables are left uncontrolled, risk ratio estimates tend to be hugely and spuriously magnified. In other words, the 2.87 is wrong and much too big. This “ecological magnification” bias will be the topic of my next post.
Oh, and by the way, let’s not forget that Young, Geier, and Geier “cooked” the data on the number of cases in such a way that they would get stronger effects. I mention this again because all of these flaws combine to make one virtually unbelievable paper.
If you have any questions, please don’t hesitate to comment.
*Please understand that “disease” is a general term used here for the sake of methodological discussion only. It could be argued that autism and ASD are not “diseases.” Also, these definitions (of confounding, etc.) work for research on outcomes that are clearly not diseases. For example, I have a colleague who just submitted a paper for publication on causes of happiness during pregnancy.
**There is one other “rule” about confounding that doesn’t apply to our discussion of the Young et al. paper, but it’s worth mentioning for completeness:
(3) Z cannot be a confounder if its association with the exposure is entirely due to the causal effects of the exposure on Z. Thus, for example, Z may not be a confounder if it is an intermediate variable in the causal pathway between exposure and outcome, i.e. exposure X affects Z, which affects the outcome Y. Please don’t worry too much about this criterion #3, because it’s not relevant to our discussion today.