<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Epi Wonk&#8217;s Intro to Data Analysis</title>
	<atom:link href="http://epiwonk.com/?feed=rss2&#038;p=112" rel="self" type="application/rss+xml" />
	<link>http://epiwonk.com/?p=112</link>
	<description>Epidemiology, Health, and Medical News Media Watchdog: A Blog for the General Public</description>
	<pubDate>Wed, 08 Sep 2010 04:51:17 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: Autism Blog - Prof. DeSoto discusses mercury and autism &#171; Left Brain/Right Brain</title>
		<link>http://epiwonk.com/?p=112#comment-893</link>
		<dc:creator>Autism Blog - Prof. DeSoto discusses mercury and autism &#171; Left Brain/Right Brain</dc:creator>
		<pubDate>Tue, 03 Aug 2010 21:59:23 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-893</guid>
		<description>[...] and a Note on Scientific Honesty. Perhaps the best analysis of the original DeSoto and Hitlan paper was performed by EpiWonk, an [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] and a Note on Scientific Honesty. Perhaps the best analysis of the original DeSoto and Hitlan paper was performed by EpiWonk, an [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dionna Langefels</title>
		<link>http://epiwonk.com/?p=112#comment-875</link>
		<dc:creator>Dionna Langefels</dc:creator>
		<pubDate>Thu, 11 Mar 2010 19:44:05 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-875</guid>
		<description>Just landed on this post via a Google search. I love it. This post changed my perception, and I am subscribing to the RSS feed. Cheers.</description>
		<content:encoded><![CDATA[<p>Just landed on this post via a Google search. I love it. This post changed my perception, and I am subscribing to the RSS feed. Cheers.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Autism Blog - Are blood mercury levels an important metric in autism? &#171; Left Brain/Right Brain</title>
		<link>http://epiwonk.com/?p=112#comment-798</link>
		<dc:creator>Autism Blog - Are blood mercury levels an important metric in autism? &#171; Left Brain/Right Brain</dc:creator>
		<pubDate>Mon, 26 Oct 2009 05:01:32 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-798</guid>
		<description>[...] to Diagnosis of Autism: A Reanalysis of an Important Data Set, was itself immediately reanalyzed (epiwonk, epiwonk-2, Autism Street, leading to a response analysis by the Age of Autism blog, to name a [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] to Diagnosis of Autism: A Reanalysis of an Important Data Set, was itself immediately reanalyzed (epiwonk, epiwonk-2, Autism Street, leading to a response analysis by the Age of Autism blog, to name a [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken</title>
		<link>http://epiwonk.com/?p=112#comment-369</link>
		<dc:creator>Ken</dc:creator>
		<pubDate>Sun, 20 Jul 2008 04:15:18 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-369</guid>
		<description>This might also be of interest http://www.uni.edu/desoto/desoto_hitlan_autism.html

I just found the erratum to the original Ip paper (It is mentioned after the Ip paper but not linked) and it completely corrects Table 1.</description>
		<content:encoded><![CDATA[<p>This might also be of interest <a href="http://www.uni.edu/desoto/desoto_hitlan_autism.html" rel="nofollow">http://www.uni.edu/desoto/desoto_hitlan_autism.html</a></p>
<p>I just found the erratum to the original Ip paper (It is mentioned after the Ip paper but not linked) and it completely corrects Table 1.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken</title>
		<link>http://epiwonk.com/?p=112#comment-368</link>
		<dc:creator>Ken</dc:creator>
		<pubDate>Sun, 20 Jul 2008 01:32:33 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-368</guid>
		<description>I now have the data and the original paper, and it gets worse. Hopefully I haven't made any errors but it is the weekend.

1. The original problem DeSoto had with the data was that, using the standard deviations in Ip, the t-test should have been significant. The problem is that the blood Hg standard deviations in Table 1 are wrong: they have lost the first digit and should be 15.65 and 12.49. On the same topic, the standard deviations for age are obviously ridiculous and should be 2.8 and 3.5 instead of 0.2 and 0.4. The text claims a range of 4-11 years, which is clearly incompatible with the published standard deviations. In fact the age range is 3-16. I didn't check the rest of the table.

2. For the t-test I get a p-value of 0.057, compared to Ip's 0.15. Taking logs (which seems correct based on the residuals) reduces this to 0.047. Removing the two outliers gives a p-value of 0.018 for the unlogged data.

3. Using logistic regression on blood Hg I get a p-value of 0.06, 0.02 with the outliers removed (close enough to DeSoto), and 0.05 with log blood Hg. Removing the outliers doesn't seem as conservative as DeSoto claims, and this should have been reported. After log transformation they don't look like outliers any more.

4. Ip based the decision not to include age and gender in the model on the fact that they weren't significantly different between the groups. That is not a valid reason, but they don't make much difference anyway. Logistic regression with log blood Hg, age and sex gives a p-value of 0.03 for log blood Hg, but age and sex are not significant.

5. That still leaves the problem that blood Hg is obviously censored at 5 and the analysis should take this into account, although it probably won't make much difference. A topic for Advanced Data Analysis.

6. Ignoring the censoring, a statistician could sensibly fit a logistic regression with log blood Hg, age and sex, and the resulting p-value for log blood Hg would be 0.03. Significant, but not hugely. So while the statistical analysis isn't optimal, it doesn't really change the conclusions. That still leaves the problem of bias, which is probably much more important. In addition to the biases already mentioned, there is the assumption that current blood mercury is an indication of blood mercury prior to the development of autism.</description>
		<content:encoded><![CDATA[<p>I now have the data and the original paper, and it gets worse. Hopefully I haven&#8217;t made any errors but it is the weekend.</p>
<p>1. The original problem DeSoto had with the data was that, using the standard deviations in Ip, the t-test should have been significant. The problem is that the blood Hg standard deviations in Table 1 are wrong: they have lost the first digit and should be 15.65 and 12.49. On the same topic, the standard deviations for age are obviously ridiculous and should be 2.8 and 3.5 instead of 0.2 and 0.4. The text claims a range of 4-11 years, which is clearly incompatible with the published standard deviations. In fact the age range is 3-16. I didn&#8217;t check the rest of the table.</p>
<p>2. For the t-test I get a p-value of 0.057, compared to Ip&#8217;s 0.15. Taking logs (which seems correct based on the residuals) reduces this to 0.047. Removing the two outliers gives a p-value of 0.018 for the unlogged data.</p>
<p>3. Using logistic regression on blood Hg I get a p-value of 0.06, 0.02 with the outliers removed (close enough to DeSoto), and 0.05 with log blood Hg. Removing the outliers doesn&#8217;t seem as conservative as DeSoto claims, and this should have been reported. After log transformation they don&#8217;t look like outliers any more.</p>
<p>4. Ip based the decision not to include age and gender in the model on the fact that they weren&#8217;t significantly different between the groups. That is not a valid reason, but they don&#8217;t make much difference anyway. Logistic regression with log blood Hg, age and sex gives a p-value of 0.03 for log blood Hg, but age and sex are not significant.</p>
<p>5. That still leaves the problem that blood Hg is obviously censored at 5 and the analysis should take this into account, although it probably won&#8217;t make much difference. A topic for Advanced Data Analysis.</p>
<p>6. Ignoring the censoring, a statistician could sensibly fit a logistic regression with log blood Hg, age and sex, and the resulting p-value for log blood Hg would be 0.03. Significant, but not hugely. So while the statistical analysis isn&#8217;t optimal, it doesn&#8217;t really change the conclusions. That still leaves the problem of bias, which is probably much more important. In addition to the biases already mentioned, there is the assumption that current blood mercury is an indication of blood mercury prior to the development of autism.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: efrique</title>
		<link>http://epiwonk.com/?p=112#comment-367</link>
		<dc:creator>efrique</dc:creator>
		<pubDate>Sun, 20 Jul 2008 00:44:26 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-367</guid>
		<description>Epiwonk: My apologies. The PLUS character is not showing up when my replies go up with the "Your comment is awaiting moderation" message, for some reason. I don't know if it's a problem with just the way it's displaying at that time or if it's being lost altogether. The second message should have said:

 That should say “asymptotically normal from the CLT PLUS Slutsky’s theorem” in the 6th paragraph. I don’t know why the "PLUS" went away (it’s in the original copy of the reply that I wrote in notepad and pasted into the window).


(but with an actual plus symbol where I just typed PLUS)

If you wish, I'm happy if you just make sure the PLUS is present in the original message and delete these two followups.</description>
		<content:encoded><![CDATA[<p>Epiwonk: My apologies. The PLUS character is not showing up when my replies go up with the &#8220;Your comment is awaiting moderation&#8221; message, for some reason. I don&#8217;t know if it&#8217;s a problem with just the way it&#8217;s displaying at that time or if it&#8217;s being lost altogether. The second message should have said:</p>
<p> That should say “asymptotically normal from the CLT PLUS Slutsky’s theorem” in the 6th paragraph. I don’t know why the &#8220;PLUS&#8221; went away (it’s in the original copy of the reply that I wrote in notepad and pasted into the window).</p>
<p>(but with an actual plus symbol where I just typed PLUS)</p>
<p>If you wish, I&#8217;m happy if you just make sure the PLUS is present in the original message and delete these two followups.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: efrique</title>
		<link>http://epiwonk.com/?p=112#comment-365</link>
		<dc:creator>efrique</dc:creator>
		<pubDate>Sun, 20 Jul 2008 00:32:54 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-365</guid>
		<description>Hi Epiwonk: 

Even in the case of ordinary regression, the X-variables don't have a specific distributional assumption - the model is conditioned on them. Only the Y-variable does, and then only its conditional distribution. The fact remains that, given the hypothesis that mercury causes autism, the response is not mercury level but whether or not you were diagnosed with autism, and so the distribution of mercury levels is not an issue unless you are trying to do some kind of inverse regression.

[Actually, I have some further issues with some of your criticisms of their paper, but I think I should stick to the main points.]

I'm not certain what the source of the "F-test" figure was, but I do see one possibility. Or rather, I see two different possibilities, but they're just the same thing looked at two different ways; I will discuss one of them.

Be aware that this is mere conjecture - you'd have to ask the authors to be sure. 

In GLMs, some people call the ratio of a coefficient to its standard error a "t-statistic". In ordinary regression, the square of this t-statistic has an F-distribution.

Note that in GLMs this "t-statistic" doesn't actually have a t-distribution (it's still "asymptotically normal from the CLT + Slutsky's theorem"). With large d.f. it doesn't matter though, since we're just using the normal tables anyway, so you end up in the right place whatever you call it, and calling it a t-statistic has the advantage of helping us understand what thing we're looking at. With large d.f., if you happen to look up t-tables it won't impact your conclusions, so it's more a matter of poor terminology than bad practice.

[Some people argue that the numerator is asymptotically normal and the denominator is asymptotically the square root of a chi-squared on its df and "hence it's asymptotically t". This argument has two different flaws. It may or may not be asymptotically t but that argument is not sufficient to establish that it is.]

Anyway, if you square that standardized coefficient, you might call it an F-statistic by analogy. Both the squared statistic and the F-table asymptotically approach chi-square (in the same way that the original statistic and the t-table both approach normality), so again, the conclusions should be correct.

If that's what they did, their terminology may be a little sloppy but the correct impression is generated; if it was done with the aim of presenting information more familiar to users of ordinary regression, I wouldn't have a big problem with it.</description>
		<content:encoded><![CDATA[<p>Hi Epiwonk: </p>
<p>Even in the case of ordinary regression, the X-variables don&#8217;t have a specific distributional assumption - the model is conditioned on them. Only the Y-variable does, and then only its conditional distribution. The fact remains that, given the hypothesis that mercury causes autism, the response is not mercury level but whether or not you were diagnosed with autism, and so the distribution of mercury levels is not an issue unless you are trying to do some kind of inverse regression.</p>
<p>[Actually, I have some further issues with some of your criticisms of their paper, but I think I should stick to the main points.]</p>
<p>I&#8217;m not certain what the source of the &#8220;F-test&#8221; figure was, but I do see one possibility. Or rather, I see two different possibilities, but they&#8217;re just the same thing looked at two different ways; I will discuss one of them.</p>
<p>Be aware that this is mere conjecture - you&#8217;d have to ask the authors to be sure. </p>
<p>In GLMs, some people call the ratio of a coefficient to its standard error a &#8220;t-statistic&#8221;. In ordinary regression, the square of this t-statistic has an F-distribution.</p>
<p>Note that in GLMs this &#8220;t-statistic&#8221; doesn&#8217;t actually have a t-distribution (it&#8217;s still &#8220;asymptotically normal from the CLT + Slutsky&#8217;s theorem&#8221;). With large d.f. it doesn&#8217;t matter though, since we&#8217;re just using the normal tables anyway, so you end up in the right place whatever you call it, and calling it a t-statistic has the advantage of helping us understand what thing we&#8217;re looking at. With large d.f., if you happen to look up t-tables it won&#8217;t impact your conclusions, so it&#8217;s more a matter of poor terminology than bad practice.</p>
<p>[Some people argue that the numerator is asymptotically normal and the denominator is asymptotically the square root of a chi-squared on its df and &#8220;hence it&#8217;s asymptotically t&#8221;. This argument has two different flaws. It may or may not be asymptotically t but that argument is not sufficient to establish that it is.]</p>
<p>Anyway, if you square that standardized coefficient, you might call it an F-statistic by analogy. Both the squared statistic and the F-table asymptotically approach chi-square (in the same way that the original statistic and the t-table both approach normality), so again, the conclusions should be correct.</p>
<p>If that&#8217;s what they did, their terminology may be a little sloppy but the correct impression is generated; if it was done with the aim of presenting information more familiar to users of ordinary regression, I wouldn&#8217;t have a big problem with it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: EpiWonk</title>
		<link>http://epiwonk.com/?p=112#comment-363</link>
		<dc:creator>EpiWonk</dc:creator>
		<pubDate>Sat, 19 Jul 2008 21:39:11 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-363</guid>
		<description>@efrique: See my "Important Note and Apology" at the bottom of my post above. Technically, you're correct -- and you made me realize that I'd posted an earlier draft of my article without my own logistic regression results. I say technically you're correct that logistic regression "has no distributional requirement." In practice, for this particular data set, my opinion is that the simple logistic regression results (which DeSoto and Hitlan report without even a regression coefficient or odds ratio) are misleading.

But again -- thank you very much for your comment -- it woke me up to my error.

Thank you also to Ken; I should have realized my mistake when he made similar comments.

efrique: Here's a question that still has Ken and me partially stumped: Where does the F-test come from in DeSoto &#038; Hitlan's logistic regression results, which I quote in my article? They must have done some sort of least squares comparison of means to get an F-test.</description>
		<content:encoded><![CDATA[<p>@efrique: See my &#8220;Important Note and Apology&#8221; at the bottom of my post above. Technically, you&#8217;re correct &#8212; and you made me realize that I&#8217;d posted an earlier draft of my article without my own logistic regression results. I say technically you&#8217;re correct that logistic regression &#8220;has no distributional requirement.&#8221; In practice, for this particular data set, my opinion is that the simple logistic regression results (which DeSoto and Hitlan report without even a regression coefficient or odds ratio) are misleading.</p>
<p>But again &#8212; thank you very much for your comment &#8212; it woke me up to my error.</p>
<p>Thank you also to Ken; I should have realized my mistake when he made similar comments.</p>
<p>efrique: Here&#8217;s a question that still has Ken and me partially stumped: Where does the F-test come from in DeSoto &#038; Hitlan&#8217;s logistic regression results, which I quote in my article? They must have done some sort of least squares comparison of means to get an F-test.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: EpiWonk</title>
		<link>http://epiwonk.com/?p=112#comment-362</link>
		<dc:creator>EpiWonk</dc:creator>
		<pubDate>Sat, 19 Jul 2008 17:24:17 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-362</guid>
		<description>@efrique: I've probably done logistic regression hundreds of times in my career and have published numerous papers using logistic regression analyses. Sorry, but because part of the purpose of this blog is teaching, and your comment will just confuse readers, I've blocked it.</description>
		<content:encoded><![CDATA[<p>@efrique: I&#8217;ve probably done logistic regression hundreds of times in my career and have published numerous papers using logistic regression analyses. Sorry, but because part of the purpose of this blog is teaching, and your comment will just confuse readers, I&#8217;ve blocked it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph</title>
		<link>http://epiwonk.com/?p=112#comment-347</link>
		<dc:creator>Joseph</dc:creator>
		<pubDate>Thu, 17 Jul 2008 23:27:37 +0000</pubDate>
		<guid isPermaLink="false">http://epiwonk.com/?p=112#comment-347</guid>
		<description>&lt;i&gt;The gender distributions of the ASD cases and controls are very similar:&lt;/i&gt;

I had noticed previously that the blood mercury difference of autistic vs. non-autistic females in this data set was greater than that of males. I'm not sure if there's statistical significance on that.</description>
		<content:encoded><![CDATA[<p><i>The gender distributions of the ASD cases and controls are very similar:</i></p>
<p>I had noticed previously that the blood mercury difference of autistic vs. non-autistic females in this data set was greater than that of males. I&#8217;m not sure if there&#8217;s statistical significance on that.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
