Two reporters from today’s Dayton Daily News allege that cheating on standardized tests is rampant in Ohio and that the Ohio Department of Education is willfully ignoring the evidence.  That evidence?  An unscientific model handed to them by reporters from the Atlanta Journal-Constitution, who are on a quest to prove that the massive scandal in Atlanta wasn’t an isolated case.

Instead, they have embroiled schools nationwide in another witch hunt, using methods reminiscent of the Salem Witch Trials, where the accused could never prove their innocence.

Such as the following method used to convict “warlock” George Burroughs in 1692:

See if they can say the Lord’s Prayer

If they don’t, they’re guilty. If they do, they’re guilty too. George Burroughs, the only minister to be executed during the Trials, ran across this problem. He was standing at the gallows to be executed when he recited the Lord’s Prayer to prove his innocence – it was believed that a witch (or warlock, in this case) would be unable to utter the holy words. People were momentarily convinced that the jury had wronged him until a minister named Cotton Mather told the crowd that the Devil allowed George Burroughs to say that prayer to make it seem as if he was innocent. Ahhh, of course. With Satan himself apparently working right through him, Burroughs’ fate was sealed and he was hanged moments later. (Mental Floss)

This is the process of questioning schools these days.  School districts such as Cleveland are expected to make dramatic improvements in a short period of time.  But when they actually accomplish the task, reporters don’t ask how the change was achieved; instead, the districts are accused of cheating for demonstrating abnormal test results.

Dayton reporters Ken McCall and Laura A. Bischoff explained the methodology they used in partnership with the Atlanta Journal-Constitution (which ran the story yesterday) in simplistic terms:

  1. The newspapers used a statistical model on year-to-year changes in reading and math test scores at the school level for third- through eighth-grade students during seven years, beginning in 2005.
  2. The analysis flagged as suspicious any score change that had less than a 5 percent probability of occurring by chance based on all the other scores on that test in that state.
  3. The study then calculated the probability that any district would have the number of flagged or improbable scores it had in any one year. In some cases, those probabilities were approaching zero.
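The flagging step the papers describe can be sketched in a few lines.  Treat this as a hedged reconstruction only: the newspapers published no code, they describe a regression weighted by class size (omitted here for simplicity), and every name below is my own.

```python
# Sketch of the flagging step: regress each class's current-year mean score
# on its prior-year mean, then flag classes whose residual would occur by
# chance less than 5% of the time (two-tailed). Unweighted OLS for brevity;
# the papers describe a class-size-weighted regression.
from statistics import NormalDist

def flag_unusual(prev_means, curr_means, alpha=0.05):
    n = len(prev_means)
    mx = sum(prev_means) / n
    my = sum(curr_means) / n
    sxx = sum((x - mx) ** 2 for x in prev_means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prev_means, curr_means))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(prev_means, curr_means)]
    sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5  # residual std. error
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)           # ≈ 1.96 at alpha=0.05
    return [abs(r) / sd > cutoff for r in residuals]
```

Note what that 5% threshold implies: even with no cheating anywhere, a test built this way will flag roughly one class in twenty by construction.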

The Atlanta Journal-Constitution goes into a bit more detail:

  1. We created approximate cohorts by matching results for each school, grade and subject with test results from the previous grade in the previous year.
  2. For each state, grade, cohort and year, we created a linear regression model, weighted by the number of students in a class, and compared the average score for a class with the score predicted by the model based on the previous year’s average score.
  3. Classes with scores rising or dropping with a probability of less than 0.05 were flagged as unusual.
  4. Finally, we looked for improbable clusters of unusual score changes within districts by calculating the probability that a district would, by random chance, have a number of flagged classes in a year, given the district’s total number of classes and the percentage of classes flagged statewide.
  5. The district calculations excluded schools identified as charter schools.
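Step 4 reads like a binomial tail probability.  Here is a guess at that calculation; the AJC does not publish its exact formula, so this is an illustration, not their code:

```python
# Assumed binomial form of the district-level cluster test: the chance that
# a district with n classes would have k or more flagged classes purely at
# random, when a fraction p of all classes statewide were flagged.
from math import comb

def cluster_probability(n_classes, n_flagged, statewide_rate):
    p = statewide_rate
    return sum(
        comb(n_classes, k) * p ** k * (1 - p) ** (n_classes - k)
        for k in range(n_flagged, n_classes + 1)
    )
```

For example (my numbers, not theirs): if 4% of classes statewide were flagged, a 50-class district would be expected to have about two flags by chance; ten flags would carry a probability well under one in a thousand, the kind of “approaching zero” figure the reporters cite.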

To summarize the process, the model attempted to compare the scores of students as they progressed through the grades, flagging test scores from the following year that fell outside an expected range – too high or too low – a range determined by the reporters, not by statisticians or testing experts.  They even took credit in their article:

“With the analysis largely complete, we consulted statisticians and testing experts.”

Sounds like a solid plan for a few journalists, don’t you think?  Such a solid plan produced some outstanding analysis of the results.  From Dayton:

However, more than 2,600 improbable changes — large spikes or drops — were found in Ohio schools between 2005-2011. Because of the sheer number, it seems unlikely all of them can be explained away by the quality of instruction, demographics, or changes in mobility and class size.

Unless the boundaries of a district change to suddenly bring in wealthier or poorer students, the average students coming and going in a district are going to look pretty much the same….

Ah, greater scientific analysis can scarcely be found – “it seems unlikely” coupled with the underwhelming observation from their experts that wealth is a key determinant of test results.  Their method supposedly includes high-level calculations of probability, but the end result is nothing more than “it seems unlikely?”  Unlikely not being the same as impossible, of course.  But did the journalists dig into their results to rule out demographic changes at any particular grade or school?  No.

Furthermore, note that the “2,600 improbable changes” include both spikes and drops in test results.  These journalists are putting out this theory of irregularities and cheating by schools based on numbers that include falling scores!  Right, because so many educators are interested in risking their careers by coaxing children into wrong answers so that their test scores suffer a significant DROP.  Yet those numbers are touted by these “journalists” in their sweeping accusations of improbable scores and cheating.

During this study’s stretch of time, the state’s average proficiency rate on these tests increased just over half of the time (58%).  We could then extrapolate that only 58% of those 2,600 changes reflect improbable spikes in test scores, a revised total of roughly 1,508.

With the study including 65,159 total classes, 1,508 represents a mere 2.3% of the total, meaning that the Ohio data shows that only 2.3 out of 100 classes (in a seven-year period) showed statistically improbable growth.  Conversely, the remaining improbable drops – roughly 1,092 – represent 1.7% of the total number of classes.  In other words, 1.7 out of 100 classes demonstrated a significant drop in test scores.

Put another way, those 2,600 irregularities average out to 371 per year, an estimated 215 too high and 156 too low.
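The back-of-the-envelope split above is easy to check.  The inputs are the article’s own figures; applying the 58% share uniformly to the flagged changes is an assumption:

```python
# Checking the arithmetic above: split the 2,600 flagged changes by the
# 58% share of years in which statewide proficiency rose.
total_flags = 2600
total_classes = 65159
share_rising = 0.58

rises = total_flags * share_rising      # ≈ 1,508 improbable spikes
drops = total_flags - rises             # ≈ 1,092 improbable drops

print(round(rises / total_classes * 100, 1))  # → 2.3 (% of all classes)
print(round(drops / total_classes * 100, 1))  # → 1.7
print(round(total_flags / 7))                 # → 371 flagged changes per year
```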

So while these reporters are recklessly accusing approximately 215 educators annually of cheating to increase student test scores, what exactly are they accusing the other 156 of doing?

These journalists have absolutely no idea.  And they don’t care who they harm in the process.

Their own experts told them the following:

Approximate cohorts mostly compare similar students.  Because of this, large jumps or dives in test scores should be rare, experts told us.

How rare?  They don’t say.  But if roughly as many scores are dropping significantly as are rising, cheating doesn’t explain it.

And neither can these “journalists.”