Deflating the Rhetoric in the Evaluation Debate

August 21, 2012

This blog post started out as a series of comments I offered in response to a blog post by Educators 4 Excellence Executive Director Ama Nyamekye; she argues that the amended version of Assembly Bill 5, currently headed to the state Senate, will weaken teacher evaluation in California by requiring districts to bargain with unions if there is to be any use of standardized test scores in teacher evaluations.  Her prediction is that unions will prevent that from happening, thus endangering the prospects for better evaluation.  I hope her prediction about union resistance to bad ideas is correct, and I couldn’t disagree more about the appropriateness of that fight.

Ama, like you I’ve spent considerable time working with teachers on the question of how to improve evaluations.  I respect your experience and the fact that your work has led you to different observations and conclusions.  However, I have problems with your generalizations about teachers’ attitudes, and your framing of the issues in the debate over AB5.

You write that, “if Californians — and teachers especially — had a second to look inside the legislation, they’d see that lawmakers are about to remove any chance that the state’s educators have of receiving the meaningful feedback and support they need…”  It becomes clear later in your piece that for you, the phrase “meaningful feedback” is a synonym for “standardized test score data” (presumably with some value-added measurement applied).  Maybe you know teachers who are clamoring for that data.  I do not.  Where were they when the L.A. Times debacle was unfolding?  They seemed significantly outnumbered.  I know an active teacher leader in E4E, but her thoughts on test score data in teacher evaluations seem much more nuanced, if not outright skeptical.  (But I’ll leave it to her to add more if she wants to jump in.)  Overall, in my years of working with and talking to hundreds of teachers around the state and the country, I’ve found overwhelming opposition to the idea of using standardized test data in teacher evaluations, and we saw most teachers expressing similar doubts in a recent national survey that asked about the usefulness of standardized tests.  There is also a solid consensus in the research community that the use of test scores for teacher evaluation is inappropriate.  (See below for a list of research organizations* that have taken that position.)


Authentic assessments of student learning could help teacher evaluations take flight – without relying on bubble tests that tell us very little.

Where I do agree with you is that teachers want better evaluations and better feedback, and many are open to including student work in the evaluation process.  Teachers who worked on the ACT teacher evaluation report were unequivocal in supporting that concept but rejecting state tests for that purpose.  National Board Certified Teachers only achieve certification if they can demonstrate their contributions to student learning – and most teachers embrace that requirement in the process.  However, since almost none of the standards I follow in my teaching can be assessed even close to adequately using state tests, I wouldn’t consider those test results for any evaluation of my teaching.  (That’s one of several reasons.)

You write, “In other words, teachers are left in a no-win situation — either they continue to not receive proper evaluations of their work or they get an evaluation system that only tells them half a story about their performance.”  You are presenting a false dilemma.  While I agree that most teachers are not receiving “proper” evaluations, the fact is that many teachers are receiving good evaluations – and without standardized test score data.  If you truly believe your statement, you are asserting that the majority of teachers, who teach untested grades and subjects, cannot receive a proper evaluation.  Teachers in private schools, I suppose, are similarly fated to flimsy evaluations, knowing merely half of the story of their work with students.  Every teacher in Finland and Singapore must be getting improper evaluations, too.  Supposedly, I do teach a tested subject and grade level.  As a high school English teacher, I would argue that my students’ standardized test scores tell you next to nothing about my performance, for numerous, research-backed reasons.

Finally, you sound the fiscal alarm, citing the potential loss of Race to the Top grants as a reason to stop AB5.  These federal grants could add up to hundreds of millions of dollars, as you stated, but that much money would be spread over a number of districts and a number of years.  Relative to our overall education spending, perhaps it would help, but it’s not likely an amount worth pursuing at any cost – especially when you consider the likelihood that up to half of the grant money would never touch classrooms anyway.  In the New York Times, Michael Winerip wrote about how impressive New York’s $700-million Race to the Top award sounds, until you realize it’s one-third of one percent of what the state will likely spend on education over the life of the grant.  On a more tangible scale, it’s as if you’re planning to spend $10,000 and I’m offering you another $33.  You need a stronger incentive than that.

I’ve previously described these federal manipulations as an attempt at cheap conversions, taking advantage of our dire economic conditions to ram through questionable policies.  And is it worthwhile?  It’s more like an invitation to legal battles and red tape, enough that some winners are less than thrilled and some potential applicants are sitting out.  See: New York state; New York localities; Hawaii; Clark County, NV; Tennessee; a district in Delaware; districts in Ohio and Georgia (note: the article cites nearly 30 districts dropping out in Ohio; the eventual number was nearly 60).

So, if we’re going to debate the merits of AB5, perhaps we could work from a more balanced assessment of what California’s teachers want and need, what state test scores cannot provide, and what minimal risks we face – if any – from missing out on a federal grant.

*Research organizations that have found standardized test scores and value-added measures are not appropriate for teacher evaluations and/or high-stakes decisions include the following:  The National Council on Measurement in Education; The American Psychological Association; The American Educational Research Association; The RAND Corporation; The National Academies/National Research Council; The Economic Policy Institute; Educational Testing Service.  For citations, and more on the topic, I refer you to this blog post, and respectfully ask if you in turn could cite any professional research associations or organizations that would support your position on the use of test scores for teacher evaluation.

5 Comments
  1. Mike
    September 2, 2012 11:55 am

    In the interest of time, I’ll just leave this link to a RAND policy report in which RAND provides recommendations for the use of VAM in teacher evaluations. Most of the institutions you list offer similar recommendations. However, you repeatedly seem to interpret these recommendations for improvement as a recommendation that they should not be used.

    Rather, RAND ends this policy report with the following reminder: efforts to use student performance data will require experimentation.

    Mike McCabe (teacher)

    • David B. Cohen
      September 2, 2012 9:14 pm

      Fair enough, Mike. I appreciate the comment and the link. Could we agree that, until the experimentation shows stability and reliability, and until the experimentation involves measurement tools that are designed and validated for the purpose of teacher evaluation, it would be premature to incorporate test-based student data in a teacher evaluation? If not, I don’t see how we can argue that we’re designing evaluations that are truly designed to improve teaching. If RAND is saying “not yet” and I’m arguing “not at all”, at least we’re on common ground at “not now.” And of course they’re going to say we should keep trying; their existence and their livelihoods depend in part on continuing the effort.

      • Mike
        September 3, 2012 10:41 am

        I’m ok with student testing used as a small part of teacher evaluations (even with its current limitations), which is how I interpret most of the policy recommendations I’ve read. Most state boards that want to use VAM in evaluations seem to want closer to 50%, which we can certainly agree doesn’t make sense. Since moving to a private school a year ago after 12 years in a public school district, I certainly see the benefits (and drawbacks) of decentralized policy making. In particular, I think that teachers and parents probably do a pretty good job of informally evaluating teachers and informing administration of concerns. Of course, this does not require VAM, and is much cheaper :). In the meantime, how to ensure equity and fairness in teacher evaluations in a cost-effective manner will likely haunt public school systems for some time to come. Sadly, I do not think the solutions will move us in the right direction so long as politicians are in charge.

        Needless to say, I appreciate your work on this front and think the model evaluation system you and your colleagues created is most definitely a step in the right direction.

  2. David B. Cohen
    September 3, 2012 2:40 pm

    Thanks for the comment, Mike, and the plug at the end. Our work at ACT didn’t quite yield a model evaluation system, but I think we did a good job of distilling it down to about 7 guiding principles. If you compare similar reports and systems, I think there’s a solid consensus around the major pieces of the puzzle. Testing becomes quite a sticking point; if we could leave it at “small part” – and not even specify any further than that – I might relax, a little. But once it’s in, it’s hard to see how we avoid the problems and the pressure to increase the percentage. I don’t even like the idea of breaking evaluation into percentages. I used to do that when evaluating student writing: 20% for intro and conclusion, 20% for evidence and analysis, 30% for sentence structure and diction, etc. What I found is that forcing quantifiable measures onto inherently non-quantifiable information was an exercise in arbitrary manipulation and ultimately futile. If a student did everything else perfectly but omitted evidence and analysis, could that possibly be a B-minus? There are dealbreakers, and there are times when the sum is greater than the parts, with students or with teachers.
    But I do agree that it would be worthwhile to find a way to include student and parent feedback in evaluations.


