Advocate for the Right Tools in Teacher Evaluation
The blog post that follows is lifted (and slightly modified) from a lengthy comment I put on the Accomplished California Teachers Ning site, which is our members-only online platform. A Los Angeles teacher asked me to elaborate on a prior comment I’d made suggesting that I think teachers need to fight against the use of value-added measures in teacher evaluation.
Okay – you asked for it!
Before I post the particulars, please note: standardized tests are not well-designed diagnostic tools when applied to individual students, let alone teachers. They are mainly designed to give an indication of school, district, or state level performance. They may have some value at the individual student level, but even that breaks down if you get to the subtest level (see J. Cizik, 2007). So, if I had my way, any debate on this issue would begin (and moments later, end!) with the following exchange:
Q: Are these tests you’re using in our evaluations designed for that purpose, or validated for that purpose?
A: Um, no.
Q: What measures should we use then?
This is not just my opinion. I’m relying on the policy positions of the nation’s leading educational research organizations and professional associations: the American Psychological Association, National Research Council, American Educational Research Association, and National Council on Measurement in Education all have said that tests and VAM are not up to the task of teacher evaluation. Period.
Sadly, that won’t win the argument. They’ll come back at you and say, “but data shows that effective teachers can raise scores…” or “we’re only using the tests for 20%…” or “other methods are not objective or reliable either.” To which I would say:
– statistical correlation is not causation. Data can also “prove” that effective 5th grade teachers raise their students’ 3rd grade scores. Of course, we know that doesn’t happen, so this falsification test demonstrates that the test data are influenced by unseen factors (J. Rothstein, 2010).
– the percentage is arbitrary and subject to change, but since we know the data are faulty, they don’t belong in the evaluation. At all.
– Other methods like observation, portfolios, self/peer evaluation, etc., may have reliability issues when used in isolation and infrequently. However, used in concert and multiple times, they are the best option we have. James Popham, professor emeritus at UCLA and a pre-eminent expert in assessment, says that ultimately, the “professional judgment of well-trained colleagues” is the best option we have for teacher evaluation systems. (When he says “colleagues” I think he means educators, but not necessarily immediate peers in your school; they might be former teachers, principals, etc.)
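For anyone who wants to see how a falsification test like the one in the first point can come out "positive," here is a toy simulation. It is only a sketch under my own assumptions, not Rothstein's method or data: the function name `teacher_gaps`, the class sizes, and the noise levels are all made up for illustration. The idea is that if students are grouped into 5th grade classes by ability (tracking), then 5th grade classrooms differ sharply in their students' 3rd grade scores, even though no 5th grade teacher could have caused those scores.

```python
import random
import statistics

def teacher_gaps(sort_by_ability, n_students=600, n_teachers=6, seed=0):
    """Spread (std dev) of mean 3rd grade scores across 5th grade classrooms.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # Each simulated student has a latent ability and a noisy 3rd grade score.
    students = []
    for _ in range(n_students):
        ability = rng.gauss(0, 1)
        grade3 = ability + rng.gauss(0, 0.5)
        students.append((ability, grade3))
    if sort_by_ability:
        # Tracking: 5th grade classes are formed from students of similar ability.
        students.sort(key=lambda s: s[0])
    else:
        # Random assignment to 5th grade classes.
        rng.shuffle(students)
    size = n_students // n_teachers
    class_means = [
        statistics.mean(g3 for _, g3 in students[i * size:(i + 1) * size])
        for i in range(n_teachers)
    ]
    return statistics.pstdev(class_means)

random_spread = teacher_gaps(sort_by_ability=False)
tracked_spread = teacher_gaps(sort_by_ability=True)
print(f"random assignment:  {random_spread:.2f}")
print(f"tracked assignment: {tracked_spread:.2f}")
```

Under tracking, the apparent "5th grade teacher effect" on 3rd grade scores is much larger than under random assignment. Since the teachers in this toy model do nothing at all, the gap comes entirely from how students were sorted, which is the kind of unseen factor the falsification test is meant to expose.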
Here are some of my blog posts about VAM in evaluations, in which I elaborate on some of the items above, and link to more sources:
- Turning the Tables: VAM on Trial – This is my favorite prior post on the topic, in which I imagine an attorney shredding a VAM advocate in a trial cross-examination. Inspired by an actual LAUSD lawsuit.
- Big Apple’s Rotten Ratings – Let’s learn something from the huge VAM mess New York City just stepped into.
- Evaluating Teachers with VAM: Variable Ambiguous Mistake – includes citations from an important VAM report by the Economic Policy Institute.
- Bleeding the Patient: VAM Nauseum – As the title suggests, I’ve tired of hearing VAM defended with the type of unsubstantiated arguments that made bleeding of patients seem like a good idea at the time.
- Open Letter to California Public Officials – A blog post that led to this slide in a presentation I made to the San Mateo County School Boards Association.
I’m glad to talk more about this with you or any of your colleagues. To be frank, I think some unions and some teacher leadership groups have made strategic errors by entertaining the idea of compromise on VAM in teacher evaluation policies. Think how much stronger our advocacy for teaching and learning would be if teaching groups like Accomplished California Teachers, Educators for Excellence, Teach Plus, UTLA, NewTLA, and others could all agree and present a united vision to LAUSD! Let’s include evidence of student learning, but make it student learning that matters: authentic, meaningful, substantive, and richly integrated with the curriculum. But as long as VAM remains part of any evaluation proposal, I would advocate rejection. Of course, it helps to have an alternative, not just a rejection, which teacher groups have thankfully been getting around to in recent months and years.
Great post–a good summary of all the work you’ve done on VAM and teacher evaluation, written in clear language. Will share. I’m not sure what you mean in the first sentence, second paragraph (“Before I post…”). Perhaps a word is missing?
Thanks for reading and commenting, Nancy. That sentence may be a bit casual (since I lifted this from a more casual context), but I don’t think there’s any missing word.
I am confused by your opening, unless Nancy is correct, and you left out a word. Did you intend to state, “standardized tests are well-designed diagnostic tools when applied to individual…”? The absence of the word “not” in that sentence was frankly jarring and left me questioning the point of the piece for quite a while. Were I not a regular reader, familiar and in complete agreement with your position, I may have mistaken the message you were trying to put forth. My apologies for belaboring the point. Thank you for another spot-on blog posting. I will be sharing it.
I think you misrepresent some of the associations in the beginning of your blog post… they have not all dismissed VAM, but rather they recognize its limitations.
I described their position as “tests and VAM are not up to the task of teacher evaluation.” Note I said for evaluation – not rejecting it for all uses.
In their own words, the NRC states “VAM estimates of teacher effectiveness … should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable.” The AERA position is “Tests valid for one use may be invalid for another. Each separate use of a high-stakes test, for individual certification, for school evaluation, for curricular improvement, for increasing student motivation, or for other uses requires a separate evaluation of the strengths and limitations of both the testing program and the test itself.” How have I misrepresented them?
You suggest they should not be used in evaluation, period (that was my impression). That’s not my understanding. My understanding is that they should not be used as the only factor in evaluation.
From the EPI report found here: http://epi.3cdn.net/b9667271ee6c154195_t9m6iij8k.pdf
“although standardized test scores of students are one piece of information for school leaders to use to make judgments about teacher effectiveness, such scores should be only a part of an overall comprehensive evaluation”
mlsgarden – yikes! You’re right, of course. And Nancy, too. I somehow got fixated on the wrong part of the sentence, and as sometimes happens with writers, my eyes were seeing what my brain was thinking more clearly than what I was writing. I had it correct in the original post on our ACT members’ site, but somewhere along the way in a minor edit to some of the other wording, I lost track of a “not” – now corrected. Thank you.
Good morning David… thanks for the excellent summary just in time to take to an important conversation about the future of my school network. I’ll be forwarding your essay along to colleagues and referring to your prior work. Being in the classroom every day — as all educational policymakers should be — really brings home the inadvisability of using test scores to evaluate teaching. Please continue to update us and share your syntheses.
Mike, in this particular blog post, I did not specifically cite the EPI report, except to refer readers to another blog post that does cite their report. So… I still stand by my characterizations of the organizations that I did cite. In the other blog post, the one that does deal specifically with the EPI report, I don’t think I misrepresented their views either – but if you want to comb through that one I’m open to being corrected.
When push comes to shove, I’d challenge the EPI authors to answer the same question I posed in bold print at the outset of this blog post. Their report was good, but not flawless.
I just went through my latest experience of administering the annual California “Standards” Tests. I put “standards” in quotation marks because they attempt to assess so few of them and seriously bungle the job. Garbage. They do nothing for students, and nothing for teachers – at least, nothing we shouldn’t be able to do more quickly, more accurately, more productively and diagnostically in the course of our regular operations.
Yes, I know you didn’t cite this report, but I’m looking around trying to find the reports and/or policy positions you reference and trying to find the one that says don’t use VAM, period. Please feel free to give me a link or publication; I’d like to do that. Everything I’ve seen cautions against using it for high stakes decision making. I agree with that. I’m certain you agree with that.
As for the imperfect EPI report I reference above, I lifted this summary from Valerie Strauss:
It was written by four former presidents of the American Educational Research Association; two former presidents of the National Council on Measurement in Education; the current and two former chairs of the Board of Testing and Assessment of the National Research Council of the National Academy of Sciences; the president-elect of the Association for Public Policy Analysis and Management; the former director of the Educational Testing Service’s Policy Information Center; a former associate director of the National Assessment of Educational Progress; a former assistant U.S. secretary of education; a member of the National Assessment Governing Board; and the vice president, a former president, and three other members of the National Academy of Education.