New Study on Thimerosal and Neurodevelopmental Disorders: III. Group-Level Units of Analysis and the Ecological Fallacy
Published May 29th, 2008 in Autism, Child Health, Infant Health, Medical & Epidemiological Studies
We believe that the investigator is never justified in interpreting the results of ecological analyses in terms of the individuals who give rise to the data. (“The Ecological Fallacy,” 1988, by Steven Piantadosi, MD, PhD, biostatistician and Director of the Samuel Oschin Comprehensive Cancer Institute at Cedars-Sinai Medical Center, Los Angeles)
It is impossible to study ecologic analyses very carefully and come away with much confidence in them. Averaging is a process that necessitates some loss of information… When analyzing such aggregated data, we not only lose all ability to extend inferences reliably to less aggregated data but we even lose the ability to estimate the direction and magnitude of bias. We cannot rely on the addition of more grouped data to eliminate the bias. I encourage epidemiologists to understand the deficiencies of group-level analyses and limit them to the role of hypothesis generation. (“Ecologic Biases,” 1994, also by Professor Piantadosi).
Before I recommence my discussion of the Young-Geier Autism Study, I should stress that I’m certainly not an expert on vaccines. Even though I worked at the CDC for part of my career, I never worked in the National Center for Immunization and Respiratory Diseases (formerly the National Immunization Program [NIP]). Don’t forget that the CDC is a big place, spread out in several buildings in Atlanta and elsewhere. I don’t think I even knew anyone in the NIP. Given what I don’t know about vaccines, I’m very glad to see that Barbara Martin, MD has written a multi-installment critique of the Young-Geier paper, with more emphasis on pharmacology and neurology (the latter being her medical specialty).
It seems that much of the confusion and difficulty in understanding the Young-Geier paper comes from the use of the term ecological study or ecological design. This confusion is not surprising, since here the word ecological has a meaning quite different from its meaning in biology or in everyday language. You will find the concept of the ecological study included among the methods of economics, sociology, and epidemiology, but almost never in clinical research in medicine (for good reasons, as we shall see). In order to understand the concept of an ecological-level study, it’s best to first think of what is meant by an individual-level study. In an individual-level study the investigator has data on every variable for every participant in the study. This may sound ridiculously simple, but it needs careful explication here, because an “ecological” study is quite different. In an individual-level study of thimerosal-containing vaccines (TCV) and neurodevelopmental disorders (ND), for each child in the study we would have the vaccination history for that child, ND diagnosis or diagnoses (if diagnosed), exact age at diagnosis or diagnoses (if diagnosed), date of birth, gender, age when follow-up ended, and information on as many potential confounders as possible for that child: birth weight, gestational age at birth, socioeconomic status, and so on. All of that data would be linked together for that individual. Thus, in an individual-level study, the individual is the unit of analysis. All four of the cohort studies reviewed by Parker et al. in 2004 and by Clements and McIntyre in 2006 were individual-level studies. (Both Parker et al. and Clements and McIntyre include previous papers by the Geiers in their reviews, but I’m not counting any of those among the four studies I just mentioned.)
In an ecological study, data is collected at the group level, as opposed to the individual level. The group is the unit of analysis. In fact, it would probably be easier to think of the Young-Geier Autism study as a group-level study with a group-level design and a group-level analysis, rather than using the confusing term “ecological.” Given the way the study is described, it may seem that there were 278,624 children in the study. Unfortunately, the investigators had 21 data files available to them separated by birth cohort, with no way to link individual-level records across data files. Thus, there were seven birth cohorts, 1990, 1991, 1992, …, up to 1996, and three data files for each year: an outcome file, a vaccine file, and a birth file. The birth file is mentioned only in Table 1 and was obviously not used in any of the analyses reported in this paper. Given the unavailability of linked individual-level data, the investigators decided to carry out a group-level analysis in the following manner:
First, for each of the seven birth cohorts they calculated the prevalence rate of autism, other NDs, and control disorders, fudging the rates upward for 1995 and 1996, as I described in my first post on this paper.
Second [and here I’ll let the authors take over], “Within each vaccine file, the cumulative Hg dose for each individual was calculated based on the number of each type of vaccine received. The cumulative dose of Hg was then aggregated over a birth cohort resulting in a total Hg dose for a particular vaccine by year of birth. The total Hg doses for each of the vaccines were then added together to obtain a total Hg dose for all vaccines by year of birth. The total Hg dose by year of birth was then divided by the population at risk for each birth cohort which was previously defined above. This calculation resulted in an average Hg dose per person for each birth cohort which served as the exposure variable. Because of interest in particular windows of exposure, Hg doses from vaccine exposure were calculated for the following periods: 1) birth to 7 months; and 2) birth to 13 months.” Your head may be spinning after reading that, but the important thing to remember is that the birth cohort was the unit of analysis and for each birth cohort they calculated the average Hg dose per child.
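To make that aggregation step concrete, here is a minimal Python sketch of what collapsing per-child doses into one average per birth cohort looks like. The record layout, the function name average_hg_by_cohort, and all of the numbers are my own assumptions for illustration; none of them come from the paper.

```python
from collections import defaultdict

def average_hg_by_cohort(vaccine_records, population_at_risk):
    """Collapse per-child cumulative Hg doses into one average per birth cohort.

    vaccine_records: iterable of (birth_year, cumulative_hg_micrograms) pairs,
        one per child in a vaccine file (hypothetical layout).
    population_at_risk: dict mapping birth_year -> number of children at risk.
    """
    total_hg = defaultdict(float)
    for birth_year, dose in vaccine_records:
        total_hg[birth_year] += dose
    # Only one number per cohort survives; the spread of doses inside it is lost.
    return {year: total_hg[year] / n for year, n in population_at_risk.items()}

# Made-up doses for two tiny cohorts, purely for illustration.
records = [(1990, 0.0), (1990, 100.0), (1991, 50.0), (1991, 50.0)]
averages = average_hg_by_cohort(records, {1990: 2, 1991: 2})
print(averages)
```

In this toy example both cohorts end up with the same average dose, even though the children in the first cohort had very different exposures from one another. That within-cohort information is exactly what a group-level exposure variable throws away.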
So this is not an individual-level analysis in which we’re looking at the association of TCV exposure in 278,624 children to the probability* of autism (and other NDs) in those 278,624 children. It’s a very simple group-level analysis with seven birth cohorts as the units of analysis. For each neurodevelopmental outcome, the investigators calculated seven prevalence rates by year (with fudged rates for 1995 and 1996) and then correlated the average Hg dose per child per year with prevalence rate. More precisely, they regressed log(prevalence rate) on average Hg exposure. The point is that, for each ND, there are only 14 numbers in the analysis: two from each birth cohort.
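Here is a rough sketch of what a regression on those 14 numbers amounts to. I am assuming plain ordinary least squares on log(prevalence), which may differ in detail from the authors' software and model, and the seven pairs of values below are invented placeholders, NOT the paper's data.

```python
import math

def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical values, not taken from the paper: one pair per birth cohort.
avg_hg = [40.0, 60.0, 80.0, 100.0, 110.0, 120.0, 125.0]  # average dose per child
prevalence = [0.0010, 0.0012, 0.0015, 0.0019, 0.0021, 0.0024, 0.0025]

intercept, slope = ols_fit(avg_hg, [math.log(p) for p in prevalence])
rate_ratio_per_unit = math.exp(slope)  # implied rate ratio per unit of average dose
numbers_used = len(avg_hg) + len(prevalence)
residual_df = len(avg_hg) - 2  # seven points minus two fitted parameters
print(numbers_used, residual_df)
```

However many children stand behind each cohort, the fit itself sees seven points and has five residual degrees of freedom. Nothing in it can distinguish a vaccinated child from an unvaccinated one.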
Let’s get back to the basic question. Come to think of it, what is the basic question? Despite their frequent use of the term “ecological,” anyone reading this paper without the utmost care is bound to think that the authors were testing hypotheses about individual children. After all, the questions that really matter are: Does thimerosal exposure cause autism in individual children? Is there a dose-response relationship between thimerosal exposure and the probability of autism in individual children? Given the nature of the data, these questions just cannot be answered using the Young-Geier analytic approach.
At this point I can easily see someone saying, “What’s the big deal? Surely the Young-Geier study does show a strong association between Hg exposure by birth cohort and autism rate by birth cohort. You may call it a group-level association, but can’t you just make the jump to the individual level and assume that Hg in vaccines causes autism?” The answer is a very definite NO. To make this jump from group-level to individual-level data is The Ecological Fallacy, which can be defined simply as thinking that relationships observed for groups necessarily hold for individuals.
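A toy example may help show why the jump fails. The counts below are invented for illustration and have nothing to do with the Young-Geier data. Within every group, exposed and unexposed people have exactly the same risk, yet groups with more exposure also happen to have a higher baseline risk.

```python
# Hypothetical counts per group:
# (exposed_n, exposed_cases, unexposed_n, unexposed_cases)
groups = {
    "A": (200, 2, 800, 8),
    "B": (500, 10, 500, 10),
    "C": (800, 24, 200, 6),
}

def within_group_risk_ratio(exposed_n, exposed_cases, unexposed_n, unexposed_cases):
    """Individual-level comparison: risk in the exposed over risk in the unexposed."""
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)

def group_summary(exposed_n, exposed_cases, unexposed_n, unexposed_cases):
    """Group-level view: (proportion exposed, overall risk). The linkage is gone."""
    total = exposed_n + unexposed_n
    return exposed_n / total, (exposed_cases + unexposed_cases) / total

def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

individual_rrs = {name: within_group_risk_ratio(*c) for name, c in groups.items()}
summaries = [group_summary(*c) for c in groups.values()]
ecologic_r = pearson([s[0] for s in summaries], [s[1] for s in summaries])
print(individual_rrs, round(ecologic_r, 3))
```

In these made-up data the risk ratio inside each group is 1 (no individual-level association at all), while the group-level correlation between proportion exposed and overall risk is essentially perfect. An analyst who saw only the three group summaries would have no way of telling the difference.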
The ecological fallacy was first described by the psychologist Edward Thorndike in 1939 in a paper entitled, “On the fallacy of imputing the correlations found for groups to the individuals or smaller groups composing them.” (Kind of says it all, doesn’t it?) The concept was introduced into sociology in 1950 by W.S. Robinson in a paper entitled, “Ecological correlations and the behavior of individuals,” and the term Ecological Fallacy was coined by the sociologist H.C. Selvin in 1958. The concept of the ecological fallacy was formally introduced into epidemiology by Mervyn Susser in his 1973 text, Causal Thinking in the Health Sciences, although group-level analyses had been published in public health and epidemiology for decades.
To show you one example of the ecological fallacy, let’s take a brief look at H.C. Selvin’s 1958 paper. Selvin re-analyzed the 1897 study of Emile Durkheim (the “father of sociology”), Suicide, which investigated the association between religion and suicide. Although it’s difficult to find Selvin’s 1958 paper, the analyses are duplicated in a review by Professor Hal Morgenstern of the University of Michigan. Durkheim had data on four groups of Prussian provinces between 1883 and 1890. When the suicide rate is regressed on the percent of each group that was Protestant, an ecologic regression reveals a relative risk of 7.57, “i.e. it appears that Protestants were 7½ times as likely to commit suicide as were other residents (most of whom were Catholic)…. In fact, Durkheim actually compared suicide rates for Protestants and Catholics living in Prussia. From his data, we find that the rate was about twice as great among Protestants as among other religious groups, suggesting a substantial difference between the results obtained at the ecologic level (RR = 7.57) and those obtained at the individual level (RR = 2).” Thus, in Durkheim’s data, the effect estimate (the relative risk) is magnified nearly fourfold (7.57/2 ≈ 3.8) by ecologic bias. In a recent methodological investigation of bias magnification in ecologic studies, Dr. Tom Webster of Boston University shows that effect measures can be biased upwards by as much as 25 times or more in ecologic analyses in which confounding is not controlled. Note that the Young-Geier Autism Study is exactly that: an ecologic analysis with no adjustment for confounding at all.
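For readers who want the arithmetic spelled out, here is a small sketch. A common way to turn a linear ecologic regression (rate = intercept + slope × proportion exposed) into a relative risk is to compare the predicted rate for a fully exposed group with that for a fully unexposed group. I am assuming that is how the 7.57 figure was obtained. The function names and the intercept and slope in the example call are hypothetical; only the two relative risks, 7.57 and 2, come from the passage quoted above.

```python
def ecologic_rate_ratio(intercept, slope):
    """Relative risk implied by a linear ecologic model.

    Model: rate = intercept + slope * X, where X is the proportion exposed.
    The ratio compares the predicted rate at X = 1 with the rate at X = 0.
    """
    return (intercept + slope) / intercept

def bias_magnification(ecologic_rr, individual_rr):
    """How many times larger the group-level estimate is than the individual-level one."""
    return ecologic_rr / individual_rr

# Hypothetical coefficients, only to show how the formula behaves.
example_rr = ecologic_rate_ratio(intercept=1.0, slope=3.0)

# The two relative risks quoted in the text above.
magnification = bias_magnification(7.57, 2.0)
print(example_rr, round(magnification, 1))
```

Because the intercept sits in the denominator, a small estimated rate for the "unexposed" end of the line can blow the ratio up dramatically, which is one reason ecologic relative risks can be so unstable.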
Please note that there is a completely different kind of “ecological study” that can be used when investigating a causal association. I’m referring to a study of time trends in which the exposure is suddenly interrupted or stopped. In my opinion, you can’t come to a causal conclusion based on only one such interrupted time trend study. But a few such studies, along with a few individual-level studies (cohort studies and perhaps case-control studies) do add helpful information. Thus, we now have studies from Denmark, Sweden, and California demonstrating that when thimerosal was removed from vaccines the prevalence rate of autism continued to rise. In addition, there’s one study from Yokohama, Japan in which the incidence of autism spectrum disorders continued to increase even after MMR vaccination was discontinued completely.
I would like to end with a somewhat unusual quote from the Young-Geier paper: “In considering potential limitations for the present study, because of the ecological nature of the study design, we were not able to link vaccine exposures across individual patient records. Individual vaccine doses could not be directly attributed to individual patients. Hence, the results of the present study represent the aggregate doses of Hg and aggregate prevalence of disorders for a given birth cohort year, and not analyses of individual children. While this information would have been useful for additional analyses, given the magnitude and robustness of the observed effects, this limitation appears to have had a limited impact on the strength of the results.” (Page 6, first full paragraph). This is a useful paragraph, but a very strange paragraph indeed. The first three sentences say that they did a group-level analysis and why. The last sentence is the kicker: the authors are claiming that individual-level data is not needed to investigate this question, nor is there any need to take confounding into account. Surely Dr. Young knows better. Decades of methodological research and empirical comparisons have shown that group-level analyses of this sort are extremely biased compared to individual-level analyses. Usually the relative risks are grossly inflated.
In my next post, I hope to use the limited amount of data provided in the Young-Geier paper to see if we can figure out approximately how much the rate ratios are artificially inflated by ecologic bias. In the meantime, if you have any questions, don’t hesitate to comment.
*In a computerized individual-level analysis of the probability of autism, each child would be coded 1=autism and 0=noncase.