The last I'll post on teacher data reports

I’m just about ready to move on to writing about more important and pressing topics than the release of teacher data reports, but I wanted to share a couple more great pieces that point out flaws in the reports and their release. What all three of these pieces share, and what every piece I’ve read in the New York Times, Daily News, and Post has lacked, is actual statistical analysis. This is what we call rigor! Shame on New York’s largest papers for lazy reporting.

Gary Rubinstein is brilliant once again:

Out of 665 teachers who taught two different grade levels of the same subject in 2010, the average difference between the two scores was nearly 30 points. More than one out of four teachers, approximately 28%, had a difference of 40 or more points. Ten percent of the teachers had differences of 60 points or more, and a full five percent had differences of 70 points or more. When I made my scatter plot with one grade on the x-axis and the other grade on the y-axis I found that the correlation coefficient was a minuscule .24.

Rather than report about these obvious ways to check how invalid these metrics are and how shameful it is that these scores have already been used in tenure decisions, or about how a similarly flawed formula will be used in the future to determine who to fire or who to give a bonus to, newspapers are treating these scores like they are meaningful. The New York Post searched for the teacher with the lowest score and wrote an article about ‘the worst teacher in the city’ with her picture attached. The New York Times must have felt they were taking the high road when they did a similar thing but, instead, found the ‘best’ teachers based on these ratings.
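Rubinstein’s consistency check is simple enough that anyone with the spreadsheet could reproduce it: pair each teacher’s two scores, look at the absolute differences, and compute the correlation. Here is a minimal sketch of that check using simulated scores (the numbers and the weakly-related relationship are assumptions for illustration, not the actual TDR data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: one row per teacher who
# taught the same subject at two grade levels, scores on a 0-100 scale.
n_teachers = 665
grade_a = rng.uniform(0, 100, n_teachers)
# A weakly related second score, loosely mimicking the low correlation
# Rubinstein observed (simulated, not the actual TDR numbers).
grade_b = 0.25 * grade_a + 0.75 * rng.uniform(0, 100, n_teachers)

diffs = np.abs(grade_a - grade_b)
mean_diff = diffs.mean()                 # cf. "nearly 30 points"
share_40_plus = (diffs >= 40).mean()     # cf. "approximately 28%"
r = np.corrcoef(grade_a, grade_b)[0, 1]  # cf. "a minuscule .24"

print(f"mean |difference|: {mean_diff:.1f}")
print(f"share with diff >= 40: {share_40_plus:.0%}")
print(f"correlation: {r:.2f}")
```

If the same teacher’s two scores were a reliable measure of the same underlying skill, the differences would be small and the correlation high; a correlation near .24 means one grade’s score tells you almost nothing about the other.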

GothamSchools’ Philissa Cramer explains why the value-added formula used in the reports guarantees a spread of above- and below-average teachers across different types of schools (which was reported as news by the Times):

Value-added measurements like the ones used to generate the city’s Teacher Data Reports are designed precisely to control for differences in neighborhood, student makeup, and students’ past performance.

The adjustments mean that teachers are effectively ranked relative to other teachers of similar students. Teachers who teach similar students, then, are guaranteed to have a full range of scores, from high to low. And, unsurprisingly, teachers in the same school or neighborhood often teach similar students.
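Cramer’s point is mechanical, and a toy simulation makes it concrete: once you subtract out what the model predicts from student background, every group of similar students is guaranteed to contain the full range of percentile ranks, no matter how different the groups’ raw outcomes are. The school types and numbers below are assumptions for illustration, not real TDR data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical school types with very different incoming achievement.
prior = np.r_[rng.normal(30, 5, 500), rng.normal(70, 5, 500)]
school = np.r_[np.zeros(500), np.ones(500)]

# Students gain about the same everywhere; the rest is noise.
outcome = prior + 10 + rng.normal(0, 3, 1000)

# Value-added-style adjustment: compare each outcome to the prediction
# from prior achievement, keeping only the residual.
predicted = prior + 10
value_added = outcome - predicted

# Percentile rank of every residual.
pct = value_added.argsort().argsort() / (len(value_added) - 1) * 100

# Both school types span essentially the full 0-100 percentile range,
# even though their raw outcomes differ by roughly 40 points.
print(pct[school == 0].min(), pct[school == 0].max())
print(pct[school == 1].min(), pct[school == 1].max())
```

So finding “above average” teachers in poor neighborhoods and “below average” teachers in rich ones is not a discovery about teachers; it is a property built into the adjustment.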

And finally, Aaron Pallas offers more great insight into how little real difference separates large percentile gaps in the TDR rankings:

If this were to happen, Ruiz’s value-added score would rise from 0 to .05. And the percentile range associated with a value-added score of .05 is 75 to 77. All of a sudden, an “average” teacher looks pretty good. And this isn’t due to the margin of error! It’s just because many teachers are about equally effective in promoting student achievement, according to the value-added model in use. A relatively small change in student performance shifts a teacher’s location in the value-added distribution by a surprisingly large amount.
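Pallas’s arithmetic follows directly from how bunched-up the scores are. If value-added scores cluster tightly around zero, a tiny change in score crosses a lot of teachers. A quick sketch, assuming for illustration that scores are roughly normal with a standard deviation of 0.07 (an assumed number, not the actual TDR distribution):

```python
from statistics import NormalDist

# Hypothetical tightly-clustered value-added scores.
scores = NormalDist(mu=0.0, sigma=0.07)

for v in (0.0, 0.05):
    # Percentile rank of a teacher with value-added score v.
    print(v, round(scores.cdf(v) * 100))  # 0.0 -> 50th, 0.05 -> ~76th
```

Under that assumption, a move of just 0.05 carries a teacher from the 50th percentile to roughly the 76th, which matches the jump Pallas describes: not measurement error, just a very crowded middle of the distribution.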