New Teacher Evaluation Systems are not Trustworthy Without Better Assessments

It seems that the biggest issue these days in education “reform” is the attempt to change how teachers are evaluated. Locally in New York, the state legislature passed a new evaluation system last year, and the Board of Regents more recently released its guidelines for the implementation of that law, though many of the details remain to be negotiated between local districts and unions. Nationally, the Gates-funded Measures of Effective Teaching Project is starting to share some conclusions from the first two years of its study, and a recent report from the Center for Teaching Quality’s New Millennium Initiative by a group of Denver teachers has garnered some positive attention in the blogosphere from Renee Moore, Ariel Sachs, Dan Brown, and others.

Like nearly all issues in education, this one is complex. I have gotten to see just how complex it is from two vantage points within the NYC discourse: I have been working for the past semester to support the social studies teachers in NYC’s transformation schools who were subject to the pilot of new assessments that are to be part of the new teacher evaluation system. I am also on the UFT negotiating committee for the new system. Unfortunately, I am under non-disclosure obligations to both sides, and can’t yet write from those experiences. I did, however, have the luck to be invited last night to participate in a webinar through the Teacher Leadership Network with a researcher from the Gates MET study, so I will use that study as a jumping-off point for some comments.

There is tremendous reason to be skeptical of, if not downright resistant to, Gates money being used to support this study, as Joanne Barkan so brilliantly documented in Dissent. I’m willing to put that aside for the moment, to assume the best intentions of the researchers who are working on this and other projects. The basic logic of the MET project, as well as all efforts to measure teacher effectiveness, seems to be as follows: “if we can identify what goes into good teaching, then we can a) replicate it through better teacher education and development and b) remove ineffective teachers, who will be replaced with the better-developed teachers we will then be able to create.” The less benign version of this argument, which is motivating the politicized teacher evaluation laws passed around the country, is that “we need to identify bad teachers so we can fire them and replace them with good ones.” Again, I’m willing here to deal with the better intentions of the former, despite all the others on the bandwagon.

The billion-dollar question then becomes: what is “good teaching”? Unfortunately, this is the question I have seen dealt with in far too simplistic ways, if at all. The MET study claims that evaluation should be based on “students’ achievement gains” and “any additional components of the evaluation…should be valid predictors of the student achievement gains” (“Working” p. 5). This seems like incredibly circular logic, as it implies that other measurements of teacher effectiveness are only valid if they predict students’ gains on the standardized tests the study used. And while in its initial findings the MET study showed that “the type of teaching that leads to gains on the state tests corresponds with better performance on cognitively challenging tasks and tasks that require deeper conceptual understanding, such as writing” (“Learning” p. 5), this reveals further flaws in the project’s logic, as it places the cart squarely before the horse. Shouldn’t the question be: are the assessments of student outcomes valid indicators of students’ ability to complete cognitively challenging tasks? Is it not likely that teachers would see even more growth in students’ capacities for deeper conceptual understanding if the state tests, which assess other skills and knowledge, were not in the way?

The conversation in the webinar yesterday focused largely on the question of trust in developing new observation systems. This could not be more important. For teachers to be able to trust any new observation system, and for the public to be able to trust the validity of any system, there needs to be a much larger focus on what the desired outcomes are for students’ learning, and on the most meaningful way to assess students’ attainment of those outcomes. Organizations like Edutopia and Fairtest have documented the incredible flaws in current assessments, and Joanne Barkan, once again, showed the misuse of these assessments to attack teachers using deeply flawed mathematical models like Value-Added (which is also the basis for the Gates study’s data). There needs to be far more dialogue in order to develop better assessments that measure meaningful outcomes.

We also, I think, need to be prepared to recognize there will not be one silver bullet solution to this issue. The new evaluation system in New York allows for one district to use different assessments from another, and even for clusters or networks of schools within a district to choose different assessments. This is a move in the right direction. Just as colleges seem to have no problem recognizing that the IB and AP are equally valid assessments, so too should we allow more flexibility for schools, or even teachers, to have access to a battery of meaningful, rigorous, and valid assessments of student learning. As I wrote yesterday, there is never a silver bullet solution for the complexities of education, and we should not expect things to be any different with assessment.


4 thoughts on “New Teacher Evaluation Systems are not Trustworthy Without Better Assessments”

  1. But what input would teachers have in new assessments? If you read about the NYCDOE creating new assessments (without UFT input as of mid-May) for schools, and if you draw the same conclusions that I have about the role in assessment that the publisher Pearson seems to want to play (based on their intention to use their foundation to team with Bill Gates and develop curriculum based on Common Core that the company will then turn and sell), the answer MUST be none.
    And if teachers will have no role in establishing these new assessments, then why would they (sorry, why should we) agree to a new evaluation system that is 40 percent based on these new exams?


  2. The involvement of a for-profit corporation like Pearson makes me uneasy. As does the use of Value-added measures. Teaching and learning are so complex, the interactions between teacher and student are different depending upon the personalities and the contexts–at this point in time I can think of no over-arching evaluation system that will effectively capture what “good teaching” looks like. As I was reading this post, I began to think of “good parenting.” We know some major components involved in being a good parent, but how would we ever develop an evaluation system to measure it? True, we know what not to do–Casey Anthony and the OctoMom come to mind right away. I think the same is true of teaching. We have a general sense of what highly accomplished teaching looks like (NBPTS Core Propositions and subject area Standards), and we (mostly) know what not to do. But trying to codify and measure that is like (to steal a line from “The Sound of Music”) “trying to pin a wave upon the sand.”


    1. Gail – I really like your metaphor of comparing good teaching to good parenting: there is not one right version of it, but we can agree (to a large degree) on how not to do it. I’ve been using a similar analogy of being a quarterback recently, but I think yours resonates better with more people. The question, then, is how do we get politicians and others to accept this?


  3. I share your concerns about the current high-stakes tests being used to evaluate teachers. We are currently seeing how the scores on these tests are driving school reform, here in California.

    It’s like we’re a fat guy on a fad diet. Everywhere we look, we see what are currently unattainable goals, like pictures of supermodels in magazines. At first, some of us were able to achieve those goals in the real world. However, year after year, the bar has been raised, like models getting thinner and thinner. In 2014, when perfection is the minimum acceptable score, I would imagine every school in the nation will be “failing.” This is like our current magazine models, who go through extensive training, make-up, surgery and computer enhancement to become more beautiful than humanly possible.

    Meanwhile, schools, like normal men and women, come in a variety of shapes and sizes. We vary in our effectiveness for student achievement. Schools like mine in Oakland, CA, really are unhealthy. With a 40% drop-out rate, I’m glad we are working on our own reform plans. Other schools are perfectly healthy, if one uses more traditional measures. Under 20% of their kids drop out, and more than half of those dropouts find success in their local alternative education system.

    But now, even these traditionally healthy schools are thinking of themselves as failing. It’s not because they have grown worse, it is because the minimum level of passing has passed them by. Now even these schools are casting about looking for their quick fix… their silver bullet.

    And those quick fixes for schools, those fad diets, are there for the taking. Every program from every ed-reform huckster claims “higher test scores,” like the fantastic claims of instant weight loss from the countless books, pills, and swills offered right alongside the surgically and computer-enhanced models in the magazines.

    It’s a brilliant piece of marketing! Don Draper would be so proud, he would pour himself a drink!

    I can only imagine the speed with which the hucksters will have teacher-improvement programs ready for sale once teacher evaluations are tied to high-stakes tests. I can see it now – “Guaranteed to raise test scores!”

