I’ve mostly found the numerical alchemy practiced by the right amusing, but something over at BrainShavings was too egregious to go unmentioned.

Benjamin Disraeli is reputed to have said: “There are three kinds of lies: lies, damned lies, and statistics.” A prescient man, that Ben.

Boy, you got that right. It seems The Puddle Pirate takes exception to comparing this year’s troop fatality figures with last year’s, month by month, and proceeds to “demonstrate” that this is cherry-picking, since he could just pick the highest death total from each corresponding month over the duration of the war to show that deaths are “not so bad” this year (though, interestingly, over the period from January to August, only two monthly highs for US troop deaths were not set this year). Then it gets good.

When we compare August ’07 to August ’06, and July ’07 to July ’06, things look bad at first. But that’s not a meaningful way to look at the data. We only compare data from the same month in successive years if we’re trying to spot seasonal patterns.

No – you compare data from the same month in successive years if you are trying to control for seasonal patterns. To use TPP’s “spring break” example, looking at hotel occupancy during spring break last year and, say, January this year doesn’t tell you anything interesting at all. Conversely, looking at spring break occupancy last year compared to spring break occupancy this year can give you some indication about hotel occupancy trends between this year and last independent of seasonal variations.

So, in the interests of “doin’ some lernin’”, I took the numbers presented at BrainShavings (assuming they are correct) and did some very basic analysis of US troop deaths over the January-to-August period for 2003 through 2007 (throwing out January and February 2003, since US forces were not yet in theater). Here’s what I found.

  • Average monthly deaths are up. 2007 averages 27 more deaths per month than the whole-war average, and 33 more than 2006.
  • The “low outlier” number (avg – 3x std deviation) for 2007 is the highest it’s been all war.
  • The “high outlier” number (avg + 3x std deviation) for 2007 is the 2nd highest it’s been all war.
  • Every month this year has had a higher death toll than the whole-war average for that month. August was closest to its monthly average at +4; the next closest was around +20.
  • There are monthly trends. Spring (April and May, trailing into June) is traditionally the deadliest stretch of the Jan-Aug timeframe, averaging nearly 50% more deaths than the other months. July is the second-lowest, with only 2 more deaths on average than February, and it has the least variability of any month in the sample, with a standard deviation of just over 14 deaths. The “busy period” in the spring also has the highest variability (a standard deviation of nearly 40 deaths for May).
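The “outlier” thresholds above are nothing exotic: just the mean of the monthly totals plus or minus three standard deviations. Here is a minimal sketch of that calculation; the monthly counts in the example are illustrative placeholders, not the actual BrainShavings figures.

```python
from statistics import mean, pstdev

def outlier_bounds(monthly):
    """Return (low, high) outlier thresholds: mean -/+ 3 standard deviations."""
    avg = mean(monthly)
    sd = pstdev(monthly)  # population std deviation over the months given
    return avg - 3 * sd, avg + 3 * sd

# Illustrative placeholder counts for one year's Jan-Aug -- NOT the real data.
low, high = outlier_bounds([60, 80, 70, 90, 100, 75, 65, 85])
```

Comparing these bounds year by year is what the bullets above do: if 2007’s low-outlier threshold is the highest of the war, then even 2007’s quietest plausible months sit above earlier years’.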

I did this without any “outlier” treatment. If I replace the two highest totals (April 2004 and May 2007) with the global mean, it does little to change the overall analysis, other than making 2007 look even deadlier than it already does (believe it or not). Interestingly, 2006 showed a slight reduction in US deaths (over the period of interest) compared to 2005, a reversal of the war’s overall trend of getting deadlier every year.

Our lefty pals also choose to completely ignore September through December of 2006. Now maybe I just don’t get The New Democratic Math, but I don’t understand why combat deaths in the last third of 2006 don’t fit into the picture they’re painting here.

Because we are trying to compare apples to apples here. The October-December timeframe is historically even deadlier than the May-June period. You can’t talk about the surge’s effect on that period until you have the relevant data, and we don’t yet. We have to control for seasonal variability in conflict intensity.
