Data courtesy of DeSoto and Hitlan (2007)
Since I became involved in the question of vaccines and autism, and more specifically the question of mercury in vaccines and autism, every week I’ve received a few identical e-mails from anti-vaccinationists that consist of a list of references. It’s always the same references and I’ve come to think of it as “The List.” Always at the top of The List is DeSoto MC, Hitlan RT. Blood levels of mercury are related to diagnosis of autism: a reanalysis of an important data set. Journal of Child Neurology. 2007;22:1308-11. I read the DeSoto and Hitlan paper back in April and was skeptical about the reported results then. However, I heard from an epidemiologist friend of mine that Dr. Catherine DeSoto was extremely courteous and forthcoming in answering questions about the paper, so I decided to let my skepticism simmer for a while.
Then a few days ago an important paper was published in Science on Identifying Autism Loci and Genes by Tracing Recent Shared Ancestry. Naturally the Science paper was reported by hundreds of newspapers and other media outlets. One of the best newspaper stories was a Washington Post article entitled, “Mental Activity May Affect Autism-Linked Genes.” Unfortunately, the comment section after the Washington Post article was completely hijacked by anti-vaccinationists who insisted that vaccines cause autism and that genetic studies of autism are part of a cover-up of the truth. And once again, one of the commenters presented “The List” with the DeSoto and Hitlan paper at the top.
My simmering skepticism boiled over and I decided to take a closer look at the DeSoto & Hitlan paper. Obviously you need a little background here, especially since the history of the DeSoto & Hitlan paper actually involves at least three publications.
(1) In June 2004 the Journal of Child Neurology published an article by Patrick Ip, Virginia Wong, Marco Ho, Joseph Lee, and Wilfred Wong of the University of Hong Kong. Ip et al. performed a case-control study to compare the hair and blood mercury levels of 82 children with autistic spectrum disorder (ASD) and a control group of 55 normal children. (Important note: I am NOT going to discuss the analyses of hair mercury levels, per DeSoto & Hitlan’s statement, “The hair analysis data is, in fact, interesting. But it is of secondary importance.”) The ASD cases included all ASD children actively followed up from April to September 2000 in the Duchess of Kent Children’s Habilitation Hospital in Hong Kong. All ASD children were assessed by Virginia Wong. The diagnosis of ASD was made only if they fulfilled the DSM-IV diagnostic criteria for autism and had undergone a structured interview using the Autism Diagnostic Interview-Revised. The control group consisted of “normal children who had mild viral illness and were admitted to the pediatric ward of [Hong Kong’s] Queen Mary Hospital.” Ip et al. reported that there were no differences in mean mercury levels. The mean blood mercury levels of the ASD case and control groups were reported to be 19.53 and 17.68 nmol/L, respectively (P = 0.15), a difference of 1.85. Ip et al. concluded that “there is no causal relationship between mercury as an environmental neurotoxin and autism.” The authors also noted: the “blood mercury levels of both autistic and normal children in Hong Kong were elevated [compared to other populations around the world];” and “this study is limited by the sample size and culture because Hong Kong Chinese are famous for eating seafood.” (Ip P, Wong V, Ho M, et al. Mercury exposure in children with autistic spectrum disorder: case-control study. J Child Neurol. 2004;19:431-4. Erratum in: J Child Neurol. 2007;22:1324.)
(2) In May 2007 Dr. Catherine DeSoto wrote to the Editorial Office of the Journal of Child Neurology expressing concern about what appeared to be obvious inconsistencies in the data analysis of the results section of the Ip et al. article. Dr. DeSoto’s specific concern related to the statistical interpretation of the data. Dr. Roger Brumback, the Editor-in-Chief of the Journal of Child Neurology, contacted Virginia Wong, the corresponding author of the Ip et al. article, and requested the original data. Professor Wong provided a spreadsheet of the original data. So all of the original data can be found in two tables at: Brumback RA. Note From Editor-in-Chief About Erratum for Ip et al Article. J Child Neurol 2007;22:1321-1323. These are the data that I used for my analyses below.
(3) At the request of the Editor-in-Chief of the Journal of Child Neurology, Dr. Catherine DeSoto and Dr. Robert Hitlan performed an analysis of the original data, which was published as a special article in the November 2007 issue of the journal. According to the abstract, DeSoto & Hitlan “found that the original p value was in error and that a significant relation does exist between the blood levels of mercury and diagnosis of an autism spectrum disorder.” A few details about DeSoto & Hitlan’s analysis would be in order here, but there aren’t many details. (This was the first reason that I was skeptical about the paper.) The authors do mention that they excluded two outliers that were greater than 3 standard deviations above the mean. I have absolutely no problem with this; in fact, I agree with DeSoto & Hitlan that it was a good idea. What I find unusual is that the authors mention only one of the outlying values: a blood mercury level of 98 nmol/L in the ASD group. I had to go to the original data to figure out that the other outlier they excluded was a value of 74 in the control group.
Here is the total extent of the results section regarding blood mercury levels: “Logistic regression was performed using blood mercury level as the predictor and the autistic/control group as the criterion. Results of this reanalysis indicate that blood mercury can be used to predict autism diagnosis. Data included: r = .20, r² = .04, F(1,133) = 5.76, P = .017. This finding indicates that there is a statistically significant relationship between mercury levels in the blood and diagnosis of an autism spectrum disorder.” That’s it for results. I’m going to skip any discussion of the r and r², since they’re not immediately relevant to this discussion and they’re just complex enough to confuse a lot of people (but see below). This leaves us with an F-test from a logistic regression and a highly significant P-value. The authors don’t say which logistic regression statistical package they used. The F-test seems to be a test of whether the mean blood mercury levels of the ASD case group and the control group are different (the same hypothesis Ip et al. were testing), but this is unclear. Again, this seems most unusual to me: DeSoto & Hitlan do not provide the reader with means for either of the two groups. Fortunately, my epidemiologist friend (mentioned above) e-mailed Dr. DeSoto and she responded almost immediately with the missing information (which I’ve confirmed in my own analyses): With two outliers removed:
Mean blood mercury level in control group: 13.59 nmol/L
Mean blood mercury level in ASD cases: 18.57 nmol/L
Difference between groups: 4.98 nmol/L
95% confidence interval for difference between groups: 0.88 to 9.1 nmol/L
I just don’t understand why DeSoto & Hitlan didn’t provide these data in their paper.
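These summary figures allow a rough consistency check on what the reported F-test may have been testing. The sketch below assumes that the confidence interval came from a two-sample t statistic with 133 degrees of freedom, and it hard-codes an approximate critical value of 1.978 for that distribution. Both are assumptions of this sketch, not something stated in the paper.

```python
import math

# Summary figures quoted above (two outliers removed).
mean_asd = 18.57              # nmol/L
mean_control = 13.59          # nmol/L
ci_low, ci_high = 0.88, 9.1   # reported 95% CI for the difference
reported_f = 5.76             # F(1,133) from DeSoto & Hitlan
df = 133

# ASSUMPTION: approximate two-sided 95% critical value of Student's t
# with 133 degrees of freedom, hard-coded to avoid extra dependencies.
T_CRIT = 1.978

diff = mean_asd - mean_control

# Back out the standard error from the half-width of the interval.
std_error = (ci_high - ci_low) / (2 * T_CRIT)
t_stat = diff / std_error
t_squared = t_stat ** 2

# Correlation implied by an F statistic with 1 numerator df.
implied_r = math.sqrt(reported_f / (reported_f + df))

print(f"difference = {diff:.2f} nmol/L")
print(f"implied t = {t_stat:.2f}, t^2 = {t_squared:.2f}, reported F = {reported_f}")
print(f"implied r = {implied_r:.2f}")
```

If those assumptions hold, t squared lands close to the reported F and the implied r rounds to .20, which is what one would expect if the F-test were equivalent to a plain two-sample comparison of means. The interval is rounded, so the agreement can only be approximate.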
In any event, now we’ve learned a little bit more, but I was still skeptical of these analyses for another reason. Ip et al. state outright that they performed a Student’s t test to compare the means of the two groups. DeSoto & Hitlan never come right out and say that they’re interested in comparing means, but it’s certainly implied. However, a comparison of arithmetic means, and certainly the use of the t test, assumes that we’re comparing two normally distributed samples. Although I’d never analyzed blood mercury before, I have analyzed blood lead levels. In my experience, blood lead levels are never normally distributed. This is why we use geometric means and percentiles, not arithmetic means, when we report descriptive statistics on blood lead levels. So I was skeptical about whether blood mercury levels would be normally distributed in children from Hong Kong.
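To make the distinction concrete, here is a minimal sketch of how arithmetic mean, geometric mean, and median can diverge on right-skewed values. The numbers are hypothetical and chosen only for illustration; they are not taken from the Ip et al. data.

```python
import math
import statistics

# HYPOTHETICAL right-skewed values, made up for illustration only.
# These are NOT from the Ip et al. data set.
illustrative_levels = [5.0, 5.0, 5.0, 8.0, 12.0, 20.0, 40.0]

arithmetic_mean = statistics.mean(illustrative_levels)

# Geometric mean: exponentiate the mean of the natural logs.
log_values = [math.log(x) for x in illustrative_levels]
geometric_mean = math.exp(statistics.mean(log_values))

median = statistics.median(illustrative_levels)

print(f"arithmetic mean = {arithmetic_mean:.2f}")
print(f"geometric mean  = {geometric_mean:.2f}")
print(f"median          = {median:.2f}")
```

On skewed data like this the arithmetic mean is pulled upward by the few large values, while the geometric mean and the median sit closer to where most of the observations are.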
The first thing I always do, and I always told my students to do this, is to actually LOOK at the data. It’s tempting to start out by looking at the ASD cases, but my advice is that it’s wiser to check out the control group first. I’ve excluded an outlier, so there are 54 controls. Since there’s an unequal number of controls and cases, it’s easier to compare the two groups if we use percentages instead of raw numbers. So here’s the percentage distribution of the control group:
Some people don’t like these skinny little bars that PowerPoint provides in its histograms, so here’s the identical data shown in an “area under the curve” type chart:
If these data are normally distributed, or anything close to normally distributed, then I’m Bernadine Healy. In fact, trying to choose a “measure of central tendency” for these data is pretty much hopeless. The arithmetic mean of 13.6 is essentially meaningless. (No pun intended.) There were 54 controls. 10 of the controls have blood mercury values of 5.0 nmol/L, which means 5 is the mode, but that doesn’t help us much either. 6 controls had a value of 8.0, but saying 8 is a second mode would be silly. The best thing to do is look at the data and describe what’s actually there: There’s a cluster of 36 controls with values between 5 and 14 nmol/L that’s very heavily skewed such that the mode of the cluster is 5. There’s a second cluster of 13 controls with values between 17 and 24. Then there are 5 controls scattered across higher values between 33 and 42. One useful aspect of looking at the controls first is that it gave me an opportunity to choose an unbiased cut-off point for my odds ratio analysis. Since the literature doesn’t provide definitive advice for a “high” blood mercury level for Hong Kong children, and these controls have this nice space with no values between 14 and 17, I decided to define greater than 16 nmol/L as a “high” mercury level for my odds ratio analysis. Now let’s look at the data for the ASD cases. Again this is a percentage distribution.
Once again we have a distribution of values that’s not even close to normally distributed. There were 81 ASD cases. 14 cases had a blood mercury value of 5.0. I suppose you could say there was a second mode at about 20 nmol/L, since there were 6 cases with a value of 20 and 4 cases with a value of 23. What does the arithmetic mean of 18.6 signify in a distribution like this? Very little, I think. Here’s a chart showing the percentage distributions of the ASD cases and the controls compared:
There were so many more blood mercury values between 5 and 10 than in any other intervals, I’ve shown these as individual categories. Then I’ve categorized blood mercury levels in 5 nmol/L categories. So: Is there a difference between these two distributions? And how would we characterize the difference? It looks like the main difference is that the ASD cases have more mercury blood values at the upper end of the distribution than do the controls. By “the upper end of the distribution” I mean values greater than 25 nmol/L. In fact, that’s just what’s going on. Of the 54 controls, there were only 5 children with blood mercury levels greater than 25 (and the greatest value was 42). Of the 81 ASD cases, there were 21 children with values greater than 25, with 4 values between 41 and 45 and a high value of 59. So how do we go about carrying out a “formal” statistical comparison of these two groups? First, any analysis involving a comparison of arithmetic means, such as a t-test, or a logistic regression in the form that DeSoto & Hitlan used (with blood mercury entered simply as a “continuous” variable) would be wrong. Why? Because the blood mercury values of these two samples just don’t come from normally distributed populations or anything close. Second, it’s common to calculate geometric means for blood mercury levels. In these two samples, the geometric means were 11.1 for the control group and 14.4 for the ASD cases, a difference of 3.3. A formal statistical comparison of the geometric means would be a bit more complex, because it would involve a logarithmic transformation of the blood mercury values. But the purpose of the log transformation is to make the distributions normal and there’s no way you’re going to make these two distributions normal unless you get somebody to jump up and down on the bars at the value for 5.0 until they almost disappear. (Any nominees for jumpers out there? Certified data fudgers?)
So formal comparison of geometric means would also be wrong.
This leaves us with an analytic method that makes no assumptions about the distributions of the cases and controls — the calculation of an odds ratio or odds ratios. Since this post is too long already, I’m not going to explain what an odds ratio is except to say (1) it’s the optimal measure of strength of association in a case-control study and (2) please don’t make the mistake of assuming that a prospective cohort study of blood mercury levels and autism would have found a relative risk, or risk ratio, or rate ratio similar to the odds ratios I’m about to show. To learn more about the odds ratio, read the article in the British Medical Journal series on medical statistics or Google odds ratio. The Wikipedia article on “Odds Ratio” is okay, but not great. For an explanation of confidence intervals, see “Statistical Criteria in the Interpretation of Epidemiologic Data” and “Beyond the Confidence Interval.” So here are the results of my analysis:
Using blood mercury cut-off point of 17 nmol/L
(above 16 considered high mercury level)
Odds Ratio = 1.86
95% Exact* CI: 0.86 - 4.06
p = 0.126
(Chi-square = 2.34)
*Exact confidence interval calculated using the method of Mehta CR. The exact analysis of contingency tables in medical research. Stat Methods Med Res. 1994;3:135-56.
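For readers who want to see the mechanics, here is a minimal sketch of the odds ratio and a continuity-corrected chi-square for a 2x2 table. The control counts (18 high, 36 low) follow from the clusters described earlier. The case counts (39 high, 42 low) are NOT given in the text; they are back-calculated here from the reported odds ratio, so treat them as an assumption. Using the Yates correction is likewise a guess about how the reported chi-square was computed.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = cases with high / low exposure
    c, d = controls with high / low exposure
    """
    return (a * d) / (b * c)

def yates_chi_square(a, b, c, d):
    """Pearson chi-square with Yates continuity correction (1 df)."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def chi_square_p_value_1df(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

# Controls: 36 at or below 16 nmol/L, 13 + 5 = 18 above (from the text).
controls_high, controls_low = 18, 36
# ASSUMPTION: case split inferred from the reported OR, not stated in the post.
cases_high, cases_low = 39, 42

or_16 = odds_ratio(cases_high, cases_low, controls_high, controls_low)
chi2_16 = yates_chi_square(cases_high, cases_low, controls_high, controls_low)
p_16 = chi_square_p_value_1df(chi2_16)

print(f"OR = {or_16:.2f}, chi-square = {chi2_16:.2f}, p = {p_16:.3f}")
```

The exact confidence interval is a separate calculation and is not reproduced in this sketch.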
But wait. I felt a sudden disturbance in the Force, as though thousands of biostatisticians are writhing in agony because I used only two categories and “didn’t take advantage of all of the data.” So let’s do a trend analysis, using the value 5 nmol/L as the reference category (where OR = 1.00):
|Blood Mercury (nmol/L)|Odds Ratio|
|5 (reference)|1.00|
|6 to 10|0.63|
|11 to 15|0.98|
|16 to 20|1.07|
|21 to 25|1.00|
This isn’t a complete trend analysis, obviously. When I stop at 25 nmol/L, the chi-square for linear trend is 0.378 and the p-value is 0.54. One of the great things about entering data by hand and actually LOOKING at the data while you do it is that you can stop and notice certain things. Like, for example: in these data there’s no significant difference between the two distributions under 25 nmol/L. So any difference between the blood mercury distributions of the cases and controls is being “driven” by an excess of ASD cases with values above 25.
In order to do a proper chi-square analysis for trend, one really needs at least 5 individuals in each cell. So I had to group all the higher values together in one category at 26 nmol/L and greater:
|Blood Mercury (nmol/L)|Odds Ratio|
|5 (reference)|1.00|
|6 to 10|0.63|
|11 to 15|0.98|
|16 to 20|1.07|
|21 to 25|1.00|
|26 and greater|3.00|
Chi-square for linear trend = 5.897
p-value = 0.015
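Each odds ratio in a table like this compares one exposure category with the reference category. The counts for the middle categories are not given in the text, but the top row can be reproduced from numbers quoted earlier: 14 cases and 10 controls at 5 nmol/L, and 21 cases and 5 controls above 25. The function name below is made up for this sketch.

```python
def category_odds_ratio(cases_cat, controls_cat, cases_ref, controls_ref):
    """Odds ratio for one exposure category relative to a reference category."""
    return (cases_cat * controls_ref) / (controls_cat * cases_ref)

# Reference category: blood mercury of 5 nmol/L.
cases_ref, controls_ref = 14, 10
# Top category: 26 nmol/L and greater.
cases_top, controls_top = 21, 5

or_top = category_odds_ratio(cases_top, controls_top, cases_ref, controls_ref)
print(f"OR for 26 and greater vs. 5 nmol/L = {or_top:.2f}")
```

The same function would give the other rows if the per-category counts were filled in from the original data tables.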
So the linear trend is statistically significant, but it’s completely “driven” by the 21 ASD cases with blood mercury levels of greater than 25 nmol/L. At this point there’s a real temptation to analyze the data using a cut-off point of 25. This is post-hoc analysis based on what we’ve seen in the data, so it’s questionable, but I’ll go ahead with it anyway:
(with 95% Confidence Interval)
Using blood mercury cut-off point of 25 nmol/L
(above 25 considered high mercury level)
Odds Ratio = 3.4
95% Exact CI: 1.1 - 12.4
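The 2x2 table behind this result can be rebuilt from counts given earlier: 21 of 81 cases and 5 of 54 controls had values above 25. The sketch below computes the odds ratio and, as a simpler stand-in for the exact method, a Woolf (logit) confidence interval. The Woolf interval is a large-sample approximation and is not the exact interval reported above.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio: a, b = high/low cases; c, d = high/low controls."""
    return (a * d) / (b * c)

def woolf_interval(a, b, c, d, z=1.96):
    """Approximate 95% CI for the odds ratio on the log scale (Woolf method).

    This is NOT the exact method cited in the post; it is a swapped-in
    large-sample approximation used here only to show the mechanics.
    """
    log_or = math.log(odds_ratio(a, b, c, d))
    std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * std_error), math.exp(log_or + z * std_error)

cases_high, cases_low = 21, 81 - 21        # ASD cases above / at or below 25
controls_high, controls_low = 5, 54 - 5    # controls above / at or below 25

or_25 = odds_ratio(cases_high, cases_low, controls_high, controls_low)
lower_25, upper_25 = woolf_interval(cases_high, cases_low, controls_high, controls_low)

print(f"OR = {or_25:.2f}, approximate 95% CI: {lower_25:.2f} to {upper_25:.2f}")
```

With only 5 controls in the high cell, the approximation comes out narrower at the upper end than the exact interval, which is one reason to prefer an exact method for a table this sparse.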
Logistic regression analysis
Now that we have a much better picture of differences between the cases and controls, I think it’s okay to run a logistic regression analysis. These are the results:
Chi Square= 5.9955; df=1; p= 0.014
Odds Ratio = 1.04
95% Confidence Interval: 1.005 to 1.075
The odds ratio can be interpreted as follows: For every 1 nmol/L increase in blood mercury, the odds of being an ASD case rather than a control are multiplied by about 1.04, that is, they increase by about 4%. Note that this effect size is “on average.” There’s obviously no way of knowing simply from this effect size estimate (OR = 1.04) that all of the differences between ASD cases and controls occur at greater than 25 nmol/L.
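Because a logistic model is linear on the log-odds scale, a per-unit odds ratio compounds multiplicatively over larger differences. A minimal sketch of that arithmetic, with a function name made up for illustration:

```python
import math

def odds_ratio_for_increase(per_unit_or, delta):
    """Odds ratio implied for a `delta`-unit increase in the predictor.

    Assumes the log-odds are linear in the predictor, as the model does.
    """
    return per_unit_or ** delta

per_unit_or = 1.04
beta = math.log(per_unit_or)  # the corresponding regression coefficient

for delta in (1, 10, 20):
    implied = odds_ratio_for_increase(per_unit_or, delta)
    print(f"+{delta} nmol/L: implied OR = {implied:.2f}")
```

The model applies the same multiplicative step at every level of blood mercury, so it has no way of expressing an excess that is confined to values above 25 nmol/L.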
1. I want to emphasize that this post is in no way meant as an ad hominem attack on Dr. DeSoto or Dr. Hitlan or the Editor of the Journal of Child Neurology. I ask commenters to refrain from such attacks in the discussion.
2. Indeed the main point of this post is that data analysts should “look before they jump.” Look at the data carefully using visual methods like the charts above, or carry out detailed cross-tabulations, before you jump in and start running logistic regressions, etc.
3. I’m not making any assumptions about what DeSoto & Hitlan did or did not do in exploratory or preliminary analyses. But all I have to work with is what’s in the published paper. The paper is four pages long, yet only one 8-line paragraph is devoted to the main result. On the other hand, three relatively long paragraphs are devoted to lecturing Ip and colleagues on why they (Ip et al.) should have used a one-tailed test.
4. This is a relatively small data set with weird and unstable distributions of blood mercury. Unfortunately, there are very few data sets with information on blood mercury that include both autism cases and a control group. Unfortunately, we therefore must consider it an “important data set.”
5. The analysis of Ip et al. (2004) and the analysis of DeSoto and Hitlan (2007) in which the mean blood mercury levels of ASD cases and controls were compared were statistically inappropriate. Any argument that the statistically significant p-value found by DeSoto & Hitlan just goes to show the “robustness” of the t-test is absurd.
6. DeSoto and Hitlan (2007) concluded that “a significant relation does exist between the blood levels of mercury and diagnosis of an autism spectrum disorder.” I disagree. In my opinion, this statement is too strong.
7. What is my conclusion about what this data set tells us about the association between blood mercury and autistic spectrum disorder? Not much. I don’t think it shows a significant relationship. On the other hand (and this is important), I don’t think that it shows that there is not a relationship either.
In my pre-planned dichotomous analysis above, I found an odds ratio of 1.86, with a lower 95% confidence limit of 0.86. An odds ratio of 1.86 is of moderate strength, but this is clearly not statistically significant. The trend analysis shows that odds ratios are stable (i.e., consistently close to 1.00) until we reach blood levels higher than 25 nmol/L, when the odds ratio is 3.00. In a post-hoc analysis using 25 nmol/L, I found an odds ratio of 3.4, with a 95% confidence interval of 1.1 to 12.4. You can see the logistic regression findings above, but my opinion is that these are the least important findings of the entire series of analyses. We did find a “statistically significant” odds ratio of 1.04 (95% CI: 1.005 to 1.075; p = 0.014), but this tells us much less than the graphical analysis and the trend analysis of odds ratios.
Given these results from a case-control study with such a small sample size, these are really of the “more research is needed” variety. Again, my opinion: I don’t think there’s a significant relationship. Nor do I think there’s definitively not a relationship.
8. DeSoto and Hitlan (2007) report an r of .20 and an r² of .04. They then devote part of the last paragraph of their paper to discussing why an “effect size” of .04 is important. This would have to be a subject for a whole other post, but like most epidemiologists (and sociologists and econometricians), I consider correlational statistics like r’s and R²’s essentially useless as measures of effect. Class: for tomorrow, read the classic paper, “The fallacy of employing standardized regression coefficients and correlations as measures of effect.” I’ll probably do a post on the subject anyway, but be ready for a pop quiz.
9. We can conclude absolutely nothing about the association of ethylmercury in vaccines to autism from these data.
10. As usual, your questions and comments are welcome. Agree, disagree, or whatever, but be civilized.
Important note and apologies to Drs. DeSoto and Hitlan, Ken, efrique, and my readers: The original article that I posted on Wednesday, July 16th, was revised on the afternoon of Saturday, July 19th. I was somewhat puzzled by Ken and efrique’s comments. Then I realized that I had not published the final version of my post on July 16th, but an earlier draft. In other words, I screwed up. That’s what happens when you blog at 4:00 in the morning. Thank you, Ken and efrique, for your comments.
Essentially the changes are these: I have performed my own logistic regression analysis, but I have NOT changed any of my conclusions. There are also a few changes in the paragraph in which I describe DeSoto and Hitlan’s Results section.