We believe that the investigator is never justified in interpreting the results of ecological analyses in terms of the individuals who give rise to the data. (“The Ecological Fallacy,” 1988, by Steven Piantadosi, MD, PhD, biostatistician and Director of the Samuel Oschin Comprehensive Cancer Institute at Cedars-Sinai Medical Center, Los Angeles)

It is impossible to study ecologic analyses very carefully and come away with much confidence in them. Averaging is a process that necessitates some loss of information… When analyzing such aggregated data, we not only lose all ability to extend inferences reliably to less aggregated data but we even lose the ability to estimate the direction and magnitude of bias. We cannot rely on the addition of more grouped data to eliminate the bias. I encourage epidemiologists to understand the deficiencies of group-level analyses and limit them to the role of hypothesis generation. (“Ecologic Biases,” 1994, also by Professor Piantadosi)

Before I recommence my discussion of the Young-Geier Autism Study, I should stress that I’m certainly not an expert on vaccines. Even though I worked at the CDC for part of my career, I never worked in the National Center for Immunization and Respiratory Diseases (formerly the National Immunization Program [NIP]). Don’t forget that the CDC is a big place, spread out in several buildings in Atlanta and elsewhere. I don’t think I even knew anyone in the NIP. Given what I don’t know about vaccines, I’m very glad to see that Barbara Martin, MD has written a multi-installment critique of the Young-Geier paper, with more emphasis on pharmacology and neurology (the latter being her medical specialty).

It seems that much of the confusion and difficulty in understanding the Young-Geier paper comes from the use of the term ecological study or ecological design. This confusion is not surprising, since here the word ecological has a meaning quite different from its meaning in biology or in everyday language. You will find the concept of the ecological study included among the methods of economics, sociology, and epidemiology, but almost never in clinical research in medicine (for good reasons, as we shall see). In order to understand the concept of an ecological-level study, it’s best to first think of what is meant by an individual-level study. In an individual-level study the investigator has data on every variable for every participant in the study. This may sound ridiculously simple, but it needs careful explication here, because an “ecological” study is quite different. In an individual-level study of thimerosal-containing vaccines (TCV) and neurodevelopmental disorders (ND), for each child in the study we would have vaccination history for that child, ND diagnosis or diagnoses (if diagnosed), exact age of diagnosis or diagnoses (if diagnosed), date of birth, gender, age when follow-up ended, and information on as many potential confounders as possible for that child: birth weight, gestational age at birth, socioeconomic status, etc., and all of that data would be linked together for that individual. Thus, in an individual-level study, the individual is the unit of analysis. All four of the cohort studies reviewed by Parker et al. in 2004 and by Clements and McIntyre in 2006 were individual-level studies. (Both Parker et al. and Clements and McIntyre include previous papers by the Geiers in their reviews, but I’m not counting any of those among the four cohort studies I just mentioned.)

In an ecological study, data is collected at the group level, as opposed to the individual level. The group is the unit of analysis. In fact, it would probably be easier to think of the Young-Geier Autism study as a group-level study with a group-level design and a group-level analysis, rather than using the confusing term “ecological.” Given the way the study is described, it may seem that there were 278,624 children in the study. Unfortunately, the investigators had 21 data files available to them separated by birth cohort, with no way to link individual-level records across data files. That is, there were seven birth cohorts, 1990, 1991, 1992, …, up to 1996, and three data files for each year: an outcome file, a vaccine file, and a birth file. The birth file is mentioned only in Table 1 and was obviously not used in any of the analyses reported in this paper. Given the unavailability of linked individual-level data, the investigators decided to carry out a group-level analysis in the following manner:

First, for each of the seven birth cohorts they calculated the prevalence rate of autism, other NDs, and control disorders, fudging the rates upward for 1995 and 1996, as I described in my first post on this paper.

Second [and here I’ll let the authors take over], “Within each vaccine file, the cumulative Hg dose for each individual was calculated based on the number of each type of vaccine received. The cumulative dose of Hg was then aggregated over a birth cohort resulting in a total Hg dose for a particular vaccine by year of birth. The total Hg doses for each of the vaccines were then added together to obtain a total Hg dose for all vaccines by year of birth. The total Hg dose by year of birth was then divided by the population at risk for each birth cohort which was previously defined above. This calculation resulted in an average Hg dose per person for each birth cohort which served as the exposure variable. Because of interest in particular windows of exposure, Hg doses from vaccine exposure were calculated for the following periods: 1) birth to 7 months; and 2) birth to 13 months.” Your head may be spinning after reading that, but the important thing to remember is that the birth cohort was the unit of analysis and for each birth cohort they calculated the average Hg dose per child.
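For readers who find code easier to follow than prose, here is a rough Python sketch of the aggregation the authors describe. It is only my reading of their paragraph, not their actual program. The vaccine names, the Hg content per dose, and the record layout are hypothetical placeholders I made up for illustration. Notice that the records carry no child identifier that could be matched to an outcome file, so nothing at the individual level survives the averaging.

```python
from collections import defaultdict

# Hypothetical Hg content per dose in micrograms (placeholder values,
# not taken from the Young-Geier paper).
HG_PER_DOSE_UG = {"vaccine_a": 25.0, "vaccine_b": 12.5}


def average_hg_dose_by_cohort(vaccine_records, population_at_risk):
    """Return {birth_year: average Hg dose per child}.

    vaccine_records: iterable of (birth_year, vaccine_name, n_doses) tuples,
        one per child per vaccine file (hypothetical layout).
    population_at_risk: {birth_year: number of children in the cohort}.
    """
    total_hg = defaultdict(float)
    for birth_year, vaccine_name, n_doses in vaccine_records:
        # Cumulative dose for this record, summed into the cohort total.
        total_hg[birth_year] += HG_PER_DOSE_UG[vaccine_name] * n_doses
    # Dividing by the population at risk yields one number per cohort.
    return {
        year: total_hg.get(year, 0.0) / n_children
        for year, n_children in population_at_risk.items()
    }


# Tiny made-up example: two children born in 1990, one born in 1991.
records = [
    (1990, "vaccine_a", 3),
    (1990, "vaccine_b", 2),
    (1990, "vaccine_a", 1),
    (1991, "vaccine_a", 3),
]
averages = average_hg_dose_by_cohort(records, {1990: 2, 1991: 1})
print(averages)
```

However many children go into the files, the output has exactly one exposure value per birth cohort.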

So this is not an individual-level analysis in which we’re looking at the association of TCV exposure in 278,624 children to the probability* of autism (and other NDs) in those 278,624 children. It’s a very simple group-level analysis with seven birth cohorts as the units of analysis. For each neurodevelopmental outcome, the investigators calculated seven prevalence rates by year (with fudged rates for 1995 and 1996) and then correlated the average Hg dose per child per year with prevalence rate. More precisely, they regressed log(prevalence rate) on average Hg exposure. The point is that, for each ND, there are only 14 numbers in the analysis: two from each birth cohort.
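To see just how small this analysis is, here is a minimal Python sketch of a regression of log(prevalence rate) on average Hg dose across seven birth cohorts. The numbers are invented placeholders (I generated the prevalences from an exact log-linear curve), and ordinary least squares is my assumption about the fitting method; none of this comes from the paper.

```python
import math


def fit_log_linear(avg_dose, prevalence):
    """Least-squares fit of log(prevalence) = a + b * avg_dose.

    Returns (a, b). Each list holds one value per birth cohort.
    """
    if len(avg_dose) != len(prevalence):
        raise ValueError("need one dose and one prevalence per cohort")
    n = len(avg_dose)
    log_prev = [math.log(p) for p in prevalence]
    mean_x = sum(avg_dose) / n
    mean_y = sum(log_prev) / n
    sxx = sum((x - mean_x) ** 2 for x in avg_dose)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(avg_dose, log_prev))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented values for the seven cohorts, 1990 through 1996 (NOT study data).
avg_hg_dose = [50.0, 60.0, 75.0, 90.0, 100.0, 110.0, 120.0]
prevalence = [math.exp(-7.0 + 0.01 * x) for x in avg_hg_dose]

intercept, slope = fit_log_linear(avg_hg_dose, prevalence)
# In a log-linear model, exp(slope) is the rate ratio per unit of average dose.
rate_ratio_per_unit = math.exp(slope)
print(len(avg_hg_dose) + len(prevalence), "numbers went into the fit")
```

The whole fit rests on seven (dose, prevalence) pairs, and the slope describes how cohorts differ from one another, not how children within a cohort differ.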

Let’s get back to the basic question. Come to think of it, what is the basic question? Despite their frequent use of the term “ecological,” anyone reading this paper without the utmost care is bound to think that the authors were testing hypotheses about individual children. After all, the questions that really matter are: Does thimerosal exposure cause autism in individual children? Is there a dose-response relationship between thimerosal exposure and the probability of autism in individual children? Given the nature of the data, these questions just cannot be answered using the Young-Geier analytic approach.

At this point I can easily see someone saying, “What’s the big deal? Surely the Young-Geier study does show a strong association between Hg exposure by birth cohort and autism rate by birth cohort. You may call it a group-level association, but can’t you just make the jump to the individual level and assume that Hg in vaccines causes autism?” The answer is a very definite NO. To make this jump from group-level to individual-level data is The Ecological Fallacy, which can be defined simply as thinking that relationships observed for groups necessarily hold for individuals.

The ecological fallacy was first described by the psychologist Edward Thorndike in 1939 in a paper entitled, “On the fallacy of imputing the correlations found for groups to the individuals or smaller groups composing them.” (Kind of says it all, doesn’t it?) The concept was introduced into sociology in 1950 by W.S. Robinson in a paper entitled, “Ecological correlations and the behavior of individuals,” and the term Ecological Fallacy was coined by the sociologist H.C. Selvin in 1958. The concept of the ecological fallacy was formally introduced into epidemiology by Mervyn Susser in his 1973 text, Causal Thinking in the Health Sciences, although group-level analyses had been published in public health and epidemiology for decades.

To show you one example of the ecological fallacy, let’s take a brief look at H.C. Selvin’s 1958 paper. Selvin re-analyzed the 1897 study of Emile Durkheim (the “father of sociology”), Suicide, which investigated the association between religion and suicide. Although it’s difficult to find Selvin’s 1958 paper, the analyses are duplicated in a review by Professor Hal Morgenstern of the University of Michigan. Durkheim had data on four groups of Prussian provinces between 1883 and 1890. When the suicide rate is regressed on the percent of each group that was Protestant, an ecologic regression reveals a relative risk of 7.57, “i.e. it appears that Protestants were 7½ times as likely to commit suicide as were other residents (most of whom were Catholic)…. In fact, Durkheim actually compared suicide rates for Protestants and Catholics living in Prussia. From his data, we find that the rate was about twice as great among Protestants as among other religious groups, suggesting a substantial difference between the results obtained at the ecologic level (RR = 7.57) and those obtained at the individual level (RR = 2).” Thus, in Durkheim’s data, the effect estimate (the relative risk) is magnified nearly fourfold (7.57/2 ≈ 3.8) by ecologic bias. In a recent methodological investigation of bias magnification in ecologic studies, Dr. Tom Webster of Boston University shows that effect measures can be biased upwards by as much as 25 times or more in ecologic analyses in which confounding is not controlled. Note that the Young-Geier Autism Study is exactly that: an ecologic analysis with no adjustment for confounding at all.
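Here is a small Python sketch of how an ecologic regression can inflate a relative risk. It uses entirely made-up numbers, not Durkheim’s data. In every one of four hypothetical groups the exposed have exactly twice the rate of the unexposed, but the baseline (unexposed) rate also happens to rise with the proportion exposed, which is a group-level confounder. The ecologic relative risk is computed in the usual way for a linear ecologic regression: fit rate = a + b × (proportion exposed), then compare the predicted rate for a fully exposed group (a + b) with that for a fully unexposed group (a).

```python
def ols_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# True individual-level relative risk inside every group (by construction).
INDIVIDUAL_RR = 2.0

# Four hypothetical groups; all values are invented for illustration.
proportion_exposed = [0.3, 0.5, 0.7, 0.9]
# Baseline rate among the unexposed rises with the proportion exposed.
unexposed_rate = [10.0 + 10.0 * p for p in proportion_exposed]

# The only thing a group-level analyst sees: each group's overall rate.
group_rate = [
    r0 * (1.0 - p) + INDIVIDUAL_RR * r0 * p
    for r0, p in zip(unexposed_rate, proportion_exposed)
]

intercept, slope = ols_line(proportion_exposed, group_rate)
# Predicted rate at 100% exposed divided by predicted rate at 0% exposed.
ecologic_rr = (intercept + slope) / intercept

print(f"individual-level RR: {INDIVIDUAL_RR:.2f}")
print(f"ecologic RR:         {ecologic_rr:.2f}")
```

With these invented inputs the ecologic estimate comes out at about 5.6, even though no individual in any group has more than twice the risk. The group-level data alone give no hint that the inflation is there.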

Please note that there is a completely different kind of “ecological study” that can be used when investigating a causal association. I’m referring to a study of time trends in which the exposure is suddenly interrupted or stopped. In my opinion, you can’t come to a causal conclusion based on only one such interrupted time trend study. But a few such studies, along with a few individual-level studies (cohort studies and perhaps case-control studies) do add helpful information. Thus, we now have studies from Denmark, Sweden, and California demonstrating that when thimerosal was removed from vaccines the prevalence rate of autism continued to rise. In addition, there’s one study from Yokohama, Japan in which the incidence of autism spectrum disorders continued to increase even after MMR vaccination was discontinued completely.

I would like to end with a somewhat unusual quote from the Young-Geier paper: “In considering potential limitations for the present study, because of the ecological nature of the study design, we were not able to link vaccine exposures across individual patient records. Individual vaccine doses could not be directly attributed to individual patients. Hence, the results of the present study represent the aggregate doses of Hg and aggregate prevalence of disorders for a given birth cohort year, and not analyses of individual children. While this information would have been useful for additional analyses, given the magnitude and robustness of the observed effects, this limitation appears to have had a limited impact on the strength of the results.” (Page 6, first full paragraph). This is a useful paragraph, but a very strange paragraph indeed. The first three sentences say that they did a group-level analysis and why. The last sentence is the kicker: the authors are claiming that individual-level data is not needed to investigate this question, nor is there any need to take confounding into account. Surely Dr. Young knows better. Decades of methodological research and empirical comparisons have shown that group-level analyses of this sort are extremely biased compared to individual-level analyses. Usually the relative risks are grossly inflated.

In my next post, I hope to use the limited amount of data provided in the Young-Geier paper to see if we can figure out approximately how much the rate ratios are artificially inflated by ecologic bias. In the meantime, if you have any questions, don’t hesitate to comment.

*In a computerized individual-level analysis of the probability of autism, each child would be coded 1=autism and 0=noncase.


17 Responses to “New Study on Thimerosal and Neurodevelopmental Disorders: III. Group-Level Units of Analysis and the Ecological Fallacy”  

  1. 1 Do'C

    Hi Epi Wonk,

    Excellent article, and I enjoyed learning about ecological fallacy and ecological bias (which I was previously unaware of). I’ll look forward to your next installment that may quantify it in terms of Young Geier and Geier.

    I only take pedantic exception with one statement:

    Thus, we now have studies from Denmark, Sweden, and California demonstrating that when thimerosal was removed from vaccines the prevalence rate of autism continued to rise.

    While Schechter and Grether does have scientific value in my humble opinion (in fact I’ve written about this topic several times), I consider it important to note that the data from California is administrative prevalence (and therefore subject to limitations/confounds). I think it is still useful because were thimerosal to have had a significant measurable impact, its removal/reduction probably should have had an impact that was visible in the administrative prevalence. It wasn’t.

    Keep up the great blogging!


  2. 2 Uncle Dave


    That was quite a lecture. Everything the casual observer needs to know about ecological Fallacy and group level analysis but didn’t know to ask.

    Do'C wrote:
    “I consider it important to note that the data from California is administrative prevalence (and therefore subject to limitations/confounds)”

    Administrative prevalence being integrity of the diagnosis or label of Autism? Pardon my lay person ignorance, but could that be explained in less than a novel.

  3. 3 Do'C

    Uncle Dave,

    If I understand your question correctly, and admittedly, I may not, I’m merely stating that administrative prevalence (prevalence among those receiving services within the CDDS system under the autism label) doesn’t necessarily translate to prevalence in the general population in the state, as might be inferred by the term “prevalence rate of autism” (without qualification, and specifically with reference to the California paper).

    There may be several reasons that many autistic children are not included in the CDDS data, or that the numbers are affected by other factors (kids moving in and out of the state, etc.), in fact the CDDS makes this clear in their public disclaimers about interpreting the data. It is likely that CDDS autism caseload is behind the current descriptive epidemiology (undercounting).

    All that being said, I do still stand by my original statement that I think it is still useful for the reason I mentioned.

  4. 4 B. Martin, MD

    Many thanks for the “ecological” explanation. My last post (I hope) on the study is at http://bmartinmd.com/2008/05/young-geier-autism-study-4.html. I look forward to your quantitative assessment (inasmuch as it is possible) of the dubious rate ratios.

  5. 5 Hey Zeus is my homeboy

    Great breakdown of a problem that is seen too often in research: the application of group onto the individual. I’d chalk this particular Geier disaster up to simple ignorance rather than the much more suspicious cooking-the-data bit you described earlier. That said, these guys truly suck.

  6. 6 Uncle Dave


    “I’m merely stating that administrative prevalence (prevalence among those receiving services within the CDDS system under the autism label) doesn’t necessarily translate to prevalence in the general population in the state, as might be inferred by the term “prevalence rate of autism” (without qualification, and specifically with reference to the California paper).”

    “If I understand your question correctly,..”

    You did.

    That is pretty much how I understood your reference.


  7. 7 Joseph

    If I’m not mistaken, prior VSD studies (Verstraeten et al.) did have individual-level analyses. The Geiers apparently took VSD data and did with it what you could do without having VSD data. That is, you could take an administrative prevalence database, and roughly calculate average thimerosal exposure for any birth year, without having VSD data at all. Surely, you could find a (coincidental) correlation that way, and not surprisingly the Geiers have in the past.

    It’s unclear what the value of using VSD would be in this paper, given the way it was analyzed. I can only imagine it’s so that they could claim to have done a VSD study.

  8. 8 EpiWonk

    @Joseph: I think you’re pretty much correct. Although they DID go to the Data Center at NCHS and obtain the VSD data in some form, it’s very difficult to figure out what they did, given the description in their paper. But you’ve summed up the situation nicely.

  9. 9 ebohlman

    The Ecological Fallacy, which can be defined simply as thinking that relationships observed for groups necessarily hold for individuals.

    Slightly more clear: “thinking that relationships observed between group averages necessarily hold for measurements made on individuals.” The key point here is that averages do not behave the same way that single measurements do. For example, the average height of American adults today is greater than the average height of American adults fifty years ago. Yet no American who was an adult fifty years ago is any taller today than he/she was back then (this is, of course, due to the fact that the averages aren’t for the same group).

    There’s a strong correlation between the percentage of gay men in an American county and the percentage of Jews in that county, but Jewish men are no more likely to be gay than Gentile men. In other words, there’s an ecological relationship between male homosexuality and Judaism, but not an individual relationship (the reason for the ecological relationship is that both gay men and Jews tend to live in highly urban regions. Please, by the way, don’t quote me as saying anything about “GLBT” people because the urbanicity correlation doesn’t hold for lesbians and I’m not sure that the relationship, if any, for bisexuals and transgendered people has even been studied. Extrapolating way beyond the range of your observations is the cardinal sin of quantitative analysis).

  10. 10 Evil Monkey

    Thank you for this breakdown. The latest Geier paper surely counts as stats abuse….

  11. 11 Jeremiah

    (We’ve heard what you don’t like about Young et al’s analysis)

    Where’s your analysis of the data?

    (We’ve heard your three part commentary)

    Where’s your contribution?

  1. 1 Science-Based Medicine » Cell phones and cancer again, or: Oh, no! My cell phone’s going to give me cancer!
  2. 2 Autism Blog - Bravo Age of Autism « Left Brain/Right Brain
  3. 3 Science-Based Medicine » Ann Coulter says: Radiation is good for you!
  4. 4 Science-Based Medicine » Ann Coulter says: Radiation is good for you!
  5. 5 Science-Based Medicine » Vaccines and infant mortality rates: A false relationship promoted by the anti-vaccine movement
  6. 6 The Definitive Reference Guide to Debunking the Vaccine-Autism Myth | Angry Autie
