Teacher evaluation has been a frequent topic in this space: Accomplished California Teachers (ACT) first coalesced as a teacher leadership group in large part to produce a report on evaluation that would feature teacher voice regarding current practices and promising reforms for California schools. I’ve also written frequently about an evaluation method that stands out as the worst popular idea out there – using value-added measurement (VAM) of student test scores as part of a teacher evaluation. The research evidence showing problems with VAM in teacher evaluation is solid, consistent, and comes from multiple fields and disciplines – most recently, statisticians (more on that in a moment). The evidence comes from companies, universities, and governmental studies. And the anecdotal evidence is rather damning as well: how many VAM train-wrecks do we need to see?
On the relevance of student learning to teacher evaluation, the ACT team that produced our evaluation report was influenced by the fact that many of us were National Board Certified Teachers. Our certification required evidence of student learning – after all, teaching without learning is merely a set of words or actions. Board certified or not, our team members all agreed that an effective teacher needs to be able to show student learning, as part of an analytical and reflective architecture of accomplished teaching. It doesn’t mean that student learning happens for every student on the same timeline, showing up on the same types of assessments, but effective teachers take all assessments and learning experiences into account in the constant effort to plan and improve good instruction.
Value-added measures have a certain intuitive appeal, because they claim the ability to predict the trajectory of student test scores, theoretically showing the “value” added by the teacher if the score is higher than predicted. This deceptively simple concept sounds reasonable, especially for non-teachers, and even more so for policy makers. They often seem eager to impose on teachers and administrators what is essentially one-way “accountability” for the success of schools; stagnant or declining scores bring negative consequences, so the public can be reassured that insecure school personnel will be compelled to do their jobs. Meanwhile, policy makers often ignore (because the voters and media allow them to ignore) what should be their share of accountability for the conditions of schools, and even the outside-of-school conditions that all the experts agree outweigh teacher effects on standardized test scores. Yes, you read that correctly: most of the variation in students’ test scores can be accounted for by factors outside of school – factors like family wealth, educational attainment, health care, and the like.
If you care to look at some of my prior posts on the topic of VAM in teacher evaluation, you’ll find that education researchers, economists, scientists, mathematicians, and experts in psychometrics (the measurement of knowledge) have all weighed in against the idea. Some offer stronger objections than others, but most agree that VAM is not stable or reliable enough for high-stakes usage. It has also been noted by multiple professional associations that measures validated for one purpose (measuring student knowledge) cannot be assumed valid for other purposes (measuring teacher effect). The main proponents of VAM use for high-stakes personnel decisions all seem to be economists (Hanushek, Chetty, Kane), or researchers with some vested interest in finding what they end up finding (Gates Foundation, William Sanders).
Well, the latest professional group to weigh in on the topic was the American Statistical Association. The ASA is not against the concept or use of VAM, but they do caution that VAM should only be used under a whole set of circumstances that are quite unlike the circumstances found in schools and districts using VAM. For example, VAM should be used by experts, with clear information regarding formulas and margins of error, and careful analysis of how sensitive statistical models are when the assessment changes.
Here are some choice quotes from their April 8, 2014 report:
VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
Ranking teachers by their VAM scores can have unintended consequences that reduce quality.
VAMs are only as good as the data fed into them. Ideally, tests should fully measure student achievement with respect to the curriculum objectives and content standards adopted by the state, in both breadth and depth. In practice, no test meets this stringent standard, and it needs to be recognized that, at best, most VAMs predict only performance on the test and not necessarily long-range learning outcomes.
[Regarding studies that have found some predictive ability in VAM scores by teachers, with “correlations generally less than 0.5”]: These studies, however, have taken place in districts in which VAMs are used for low-stakes purposes. The models fit under these circumstances do not necessarily predict the relationship between VAM scores and student test score gains that would result if VAMs were implemented for high-stakes purposes such as awarding tenure, making salary decisions, or dismissing teachers.
I should also note that there’s a portion of this report I disagree with, regarding the potential use of VAM to evaluate teacher training programs:
A VAM score may provide teachers and administrators with information on their students’ performance and identify areas where improvement is needed, but it does not provide information on how to improve the teaching. The models, however, may be used to evaluate effects of policies or teacher training programs by comparing the average VAM scores of teachers from different programs. In these uses, the VAM scores partially adjust for the differing backgrounds of the students, and averaging the results over different teachers improves the stability of the estimates.
It’s unclear to me how VAM within schools or districts is recommended for observing correlation, but when extended beyond schools to involve even more complex interactions of variables known and unknown, we’re now talking about evaluating effects (causation, rather than mere correlation). While I understand the value of larger sample sizes in reaching stronger conclusions about data, I question the ability of anyone undertaking such an evaluation to control for the differences among schools. The quote above mentions only the “differing backgrounds of the students.” However, different teacher training programs develop different relationships with schools and districts. Teachers are not randomly distributed to schools or communities after their training, and the school’s and community’s effects on the teachers would seem highly relevant. There are studies that show the effects of principals on test scores, the effects of teachers on teachers, effects of class period length, effects of tutoring that may or may not be available, effects of libraries that may or may not even be open, etc. My open letter to California policy makers on this topic argued, and I would stand by the argument, that there are simply too many interacting variables to reach any reliable conclusions that depend on value-added measures.
Teachers are tribal people. We build a nice little fortress and stay inside as much as possible, defending the gates when necessary. This tribalism has saved us in an ever-shifting landscape, but it’s got its limitations and it may soon be the cause of our demise. The two big tribes now are Old Generation and Next Generation teachers, and the most obvious place of need is our local unions. The Next Gen teachers are attracted by the glamour of the reform groups and the promise of an amplified voice; the Old Gen prefers the lunchroom and the union hall. Over the last school year I’ve gotten the chance to hang out with 58 Next Gen teachers from 17 states, courtesy of the NEA. I’ve learned that we’re lined up on either side of a generation gap that is worlds apart, but that each side is a critical part of the equation. We need each other more than we will benefit from sticking with our tribe.
Because the origins of unions are rooted in staying alive in a hostile environment, the Old Generation totally gets the need for protection, collective bargaining and the need to jump up and down once in a while. The Old Gen safeguards everyone’s rights whether they like it or not – experience has taught them that careers are subject to the whim of the public, politicians, and now philanthropists. We once did believe that our good work would be our protection from harm or unfair practices; we never saw the need for pensions or fully-funded health benefits as youngsters. And then life taught us something – we’re treading faster for less money and less respect, and sometimes one of us gets caught in the machinery and goes down. Working conditions, pay raises, benefits, retirement, the bread and butter issues, these are the spears and bagpipes of teachers who are nearing the end of their tenure and see the world falling apart around them. They sometimes see the Next Generation of teachers as an opposing camp, vulnerable to the influence of outside agencies. But these are the people we need to protect us, our profession, and public education and we need to make friends now.
The Next Generation doesn’t seem to register the Old Gen to the same degree. The Old Gen are just irrelevant, cranky strangers in red t-shirts standing outside the board of education, waving signs. These younger folks see step and column pay scales as ridiculous, and can’t understand why anyone would ever have a problem with being evaluated or using student outcome data as part of that evaluation. The reform groups give them opportunities to meet policymakers, be on panels, write policy papers – that’s what drew me in and that’s what we need to do to compete. In the last three years in LA, Next Gen issues have been career pathways, evaluation and new pay structures. They don’t care about retirement right now. But the Next Generation needs to understand that without involvement in the union today, there won’t be a retirement later, or anything else resembling stability. Time to end the tribalism.
So how do unions take responsibility for educating their younger siblings, and why should it be incumbent upon the Next Gen to cooperate with the Old Gen? Union policy is often the result of great ideas for what other people should do. Someone should form a committee! Someone needs to do outreach! Someone should plan an event! Our locals, with the support of state and national affiliates, need to offer professional development, discussions and social events. Our locals need to admit that a panel with dinner and drinks is way more interesting than parliamentary procedure, and then have panel discussions around Next Gen issues with union leadership. Finally, our locals need members like you and me to create opportunity for the Next Gen to learn the relevance of the union, and to be relevant in it. We know there are interesting things happening in our locals even if we have to dig around for them. Find one of those opportunities and take along a young friend.
Our future depends on it.
Greetings to anyone who attended or is interested in our presentation at the Teaching & Learning Conference 2014. We’re glad to share what we’ve done, what we’ve learned, and what we’re still figuring out about teacher leadership.
My partners in this presentation are Pat Graff (NM), Lanelle Gordin (CA), Cheryl Suliteanu (CA), and Maren Johnson (WA). Below you can find the main point of the presentation and some useful links. Presentation slides are available at the bottom of the page, though the settings for background images didn’t transfer from PowerPoint and will make some slides hard to read. The slides with information you’re most likely to want to look up or follow up on are pretty clear, though.
The central idea of our presentation is that teacher leadership is essential to school improvement and advancing the profession. New roles for teacher leaders are already emerging, and we must ensure that as these roles evolve they are formalized (yet flexible!), sustainable (funded and integrated), and truly professional (requiring demonstrated accomplishment and skill). Not everything we talk about is entirely there yet, and our examples are all variations on the theme, but we think our stories are instructive regarding how teacher leaders are changing the field and where we should be headed.
Thank you to the National Board for Professional Teaching Standards for putting on this excellent conference and inviting us to speak, and thank you to all who came to see our presentation (or cared enough to read this blog post anyways!).
Resources and connections (links open in a new window/tab):
- For the ACT report on teacher career pathways, see our Publications page.
- More information on California’s Greatness by Design report (and a link to find the report).
- Bay Area New Millennium Initiative report on teacher career pathways.
- Read more about the Riverside County Teacher Leadership Certification Academy.
- Stories from School group blog (including posts by Maren Johnson) from the Center for Strengthening the Teaching Profession in Washington.
- On Twitter: Cheryl Suliteanu @CSuliteanu – Maren Johnson @maren_johnson – David B. Cohen @CohenD
Our slides (with apologies for formatting issues that come up in the transfer from PowerPoint to Google Docs):
I must confess that I do check my blog stats once or twice a day. I’m not driven by pursuit of big numbers, though of course I’m pleased when a post is widely shared and read. But I’m drawn to the stats page because of curiosity about search terms that lead to this blog, about old posts that suddenly find new life for unknown reasons. The stats also show that I’m often wrong about which posts I think will be more popular. The ones I like the most often fade quickly, while posts I’m less invested in sometimes take off.
Case in point: a Facebook friend from New Zealand shared the story about a school cheating case where teachers tampered with test results reported to the national authorities. I noticed similarities to incidents in the U.S., and made the connection to Pasi Sahlberg’s talks and writing about the Global Education Reform Movement – or GERM. I cranked out that blog post in a few minutes and figured it would be a blip on the stat sheet. Surprisingly, it has been the most read and shared post in the past month.
The post also drew a response from Benjamin Riley of the New Schools Venture Fund, who may have noticed this particular post because he’s currently on leave from NSVF and spending a year working in New Zealand. (Nice work if you can get it! I loved my visit there a few years ago).
Here is the distillation of Riley’s response and suggestions (though if you go read the full version at his blog, you get the benefit of his use of White Stripes lyrics). He argues that the inevitability of cheating on tests shouldn’t be used to argue against testing any more than cheating by golfers leads to the end of golf; it simply means we must guard against cheating. Riley also quotes Kevin Carey suggesting that such cheating is short-sighted if inflated scores will end up inflating expectations for subsequent years. Carey adds that “cheating also means that public schools finally care enough about student performance that some ethically challenged educators have chosen to cheat. This is far better than the alternative, where learning is so incidental and non-transparent that people of low character can’t be bothered to lie about it.”
Overall, I can agree with Riley that individuals are responsible, and that cheating by itself is not an argument to eliminate testing. There are some appropriate uses of large-scale standardized assessment. I don’t agree with Carey that an uptick in cheating indicates people “care enough about student performance.” I think it means those people are mad or fearful about the public uses of what passes for “student performance” but really isn’t.
Riley and Carey seem to assume that standardized testing generally produces useful information about students, teachers, schools, and systems – from the individual level on up – and so they tackle this issue with a focus on what educators should do without engaging around one of the key underlying problems: weak tests, or good tests used for weak policies, are central to this story. So I’m asking what education leaders should do to address the problem. I wouldn’t be satisfied with an answer that puts the problem entirely on the teacher, any more than I would accept a teacher who says all the problems in the class are the students’ fault.
Their perspective also seems a bit removed from an understanding of classrooms and schools. Carey’s view seems to assume that people think about cheating rationally rather than emotionally. I don’t think it works that way. People who cheat are likely angry or insecure. It’s also important to acknowledge that in some of the high-profile cases in the U.S., there’s evidence of cheating at the school and administrative levels, which calls for a different set of models in trying to understand the psychology of the act, throwing in group dynamics and the possible role of intimidation.
Riley’s main point, the one linked to the lyrics, is that you can’t blame the test for the cheating, any more than you can blame the bank for the robber. And if you’re addressing the cheater, or the robber, I agree: pushing off one’s own misdeeds on others doesn’t negate or excuse the misdeed.
But as someone who has administered thousands of tests and been responsible for learning outcomes, I accept an accountability that Riley and Carey seem less interested in ascribing to “the system” that gives the larger tests, and should be responsible for broader outcomes. This is my fundamental disagreement with much of the education reform notion of accountability. The people with the most power are supposed to bear the most responsibility. When I give a test, I try to design an assessment that is fair, useful, valid, productive, and worth the effort. To attach high stakes to an exercise that doesn’t meet those criteria is to invite cheating. I’m not excusing the cheaters, but it would be sloppy, unprofessional work on my part to create conditions that I should have known would ultimately undermine my work. If a bank has inadequate safeguards against robbery, it’s not exactly their fault if they’re robbed. But isn’t someone supposed to be accountable for having the foresight to reduce the chances? If a CEO creates the conditions that push more of his managers and accountants to cook the books, and at the same time says be honest, then does the CEO bear any responsibility for corrupt practices that should have been anticipated?
Where’s that accountability in education policy? Okay, punish the cheaters. You can even try to tighten test security, but ultimately, that system must rely on local practitioners without creating an undue administrative burden. So, policy makers, we can easily predict that the more you rely on standardized tests for purposes they can’t adequately serve, and the more you raise the stakes, the more you create conditions that lead to cheating. It will happen. It shouldn’t. People should resist. They should do the right thing. They should raise objections in appropriate venues. They should be honest and transparent. Those are wonderful sentiments that serve to distance you from knowingly following a series of steps that will have a negative effect on schools and on your own accountability measures.
Our Secretary of Education and many state superintendents and legislatures have pushed more and more use of mediocre tests for inappropriate, high-stakes uses. People shouldn’t cheat, but neither should our leaders avoid criticism for their failure to produce an accountability system that works. They’ve ignored too much of what we know about students, teachers, schools, learning, and human nature. They’ve failed as policy architects and leaders who should have foreseen the mess they’ve helped create.
Pasi Sahlberg, the well-known Finnish education expert and author of Finnish Lessons, has described the negative trends in education reform as GERM – the Global Education Reform Movement. You can see his TEDx talk “GERM that kills schools” embedded below.
The basics of GERM are well known to most people by now: one-way accountability, where leaders demand results from practitioners while no one seems to hold leaders accountable for creating the conditions necessary for success; high-stakes testing; misguided focus on rankings, competition, and punishment; a near-obsession with data; deprofessionalizing teaching through reduced autonomy and increased focus on compliance.
And the metaphor of GERM makes sense, given the way unhealthy ideas about educational systems continue to spread. The latest example comes from New Zealand, where teachers at a school have apparently responded to high-stakes testing and narrowed curriculum by cheating. I’m not excusing cheating, but anyone putting these kinds of systems in place – in Atlanta, Washington, D.C., Houston, California or New Zealand – must acknowledge their responsibility as well; cheating is a predictable result when you use improper or limited measures excessively, and in ways that feel threatening.
As a teacher, I would certainly punish a student for cheating in my class. But if I assign much of a student’s grade based on a procedure or task that’s easy to falsify, and it’s also something that my students find intrusive, flawed, coercive or irrelevant, then certainly I’m also at fault for creating the conditions that almost inevitably lead to cheating.
The reactions in New Zealand sound quite familiar:
Labour Education spokesman Chris Hipkins: “The high stakes nature of the system is nonsense. It is very easily manipulated, it is heavily subjective and is no way a reliable measurement of school performance. The higher stakes you make it, the more pressure there is going to be on schools to make their subjective judgments to increase their achievement results.”
Martin Thrupp of Waikato University, who led a three-year study into national standards: “The tail starts to wag the dog and the assessment system kind of takes over and pushes out a broader approach and people tend to go more directly for activities that are going to more directly push kids along in terms of the national standards.”
A parent at the affected school: “I’ve heard from teachers that national standards are putting a lot of pressure on them to document these standard tests rather than allowing children to have their individual strengths recognised.”
It seems quite likely that these GERM approaches will fail in the long run. How many years we’ll spend learning that lesson remains to be seen.
Charles Kerchner’s recent EdWeek essay examines some of the reasons that California has been “A K-12 Education Outlier.” He suggests that it’s a bit of a surprise that California is markedly resistant to federal education policies, considering the state has a Democratic majority in the legislature and a Democrat in the governor’s office. Kerchner writes: “California’s divergence is no red-state aversion to the federal government; nor is it sticker shock at the price of new K-12 assessments. It’s an aversion to the Race to the Top mentality, and the embrace of a deeply held alternative view of what drives improvement in public education.”
That aversion has been proudly on display at the semi-annual meeting of the California Teacher Union Reform Network (CalTURN. Disclosure: I’m on the CalTURN steering committee). The first day of the meeting featured appearances by Kerchner, along with our state superintendent and the president of our largest teachers association. Teachers and administrators in the room applauded their comments about holding out against bad ideas pushed by the federal government. Perhaps the most obvious example is the misuse of student test scores in teacher evaluations. We do have colleagues here joining us from other states – educators who are living with the consequences of that suspect practice. Value-added measures for evaluation are problematic enough when applied in the way most people assume – students tested on the subjects they study in class – but we’re hearing about practices that should strike reasonable people as an outright fraud: teachers are being evaluated based on test scores for students or subjects they don’t teach. These mistakes may be the most visible, memorable legacy of the Obama-Duncan education reform effort, certain to be an embarrassment when history shows – and it will – how poorly supported and how ineffective that approach was.
And yet, California has not entirely resisted national education reform; at the state level, California is fully committed to Common Core implementation, having invested over $1 billion so far, with proposals to more than double that in the future. Tom Torlakson, State Superintendent of Public Instruction, noted that, thanks to AB-484 (a bill enacted over Arne Duncan’s intrusive objections), California has an opportunity to focus on the standards and on professional development without the immediate pressure of high-stakes accountability measures linked to those tests. Accountability hawks in our own state and around the country sounded alarms, while teachers and administrators breathed a sigh of relief.
The teachers and administrators here at CalTURN are not kicking back thinking they don’t have to worry about student learning for a couple years, nor are they rehashing the debate about whether or not to adopt the Common Core. Instead, they are moving forward productively, collaborating within and across districts. They are sharing their stories about how to make labor-management relationships work for schools and kids, and envisioning improved methods and measures of accountability. These are not mere philosophical exercises. The new local control funding formula requires districts to develop local accountability plans for the use of new funds. The impact of AB-484 is that the educational leaders in this room are entirely able to focus on collaborative visions for improving schools and school communities without continually talking about test scores. Looking around the country (New York comes to mind) we see compelling evidence that California’s approach is the sane, reasonable, and productive option that Duncan should be applauding rather than threatening. The NEA has supported Common Core, but NEA President Dennis Van Roekel has also raised objections to the implementation in various states.
CTA President Dean Vogel was also at CalTURN, emphasizing the commitment of California teachers to work with students, families and communities. He noted that surveys and polls consistently show teachers are trusted in their communities, and therefore it’s imperative for us to maintain that trust and strengthen relationships to hold on to what works and advocate for improvements. He noted that the Vergara trial, currently going on in Los Angeles, represents what we are up against: school and community outsiders funding a well-coordinated effort to frame unions, seeking solutions that will undermine our profession without addressing the more glaring inequities that undermine our state’s education system.
The California teachers I’ve been listening to for the past two days are confident in the process of labor-management collaboration. We have willing partners in the district leadership in the room, and in the districts represented here. One teacher described the experience of the past couple days as “affirming we have a shared vision for students.” Another teacher shared a concern that, moving forwards, “The state is going to want us to test, test, test,” and then she asked if, in the face of over-testing, “Are we going to live with courage and do what we know is right?”
For California, what is right is what’s happening here at CalTURN and elsewhere around the state: teachers and administrators insisting that we share a commitment to working together for students and communities, embracing authentic, local, mutual accountability – and resisting non-educators who call on us to do what we know is educationally unsound.
Today and tomorrow I’ll be in Sacramento attending the semi-annual meeting of the California Teacher Union Reform Network – CalTURN. My involvement with CalTURN the past few years has much to do with my optimism about the direction of public education in California. (Disclosure: I’m also on the CalTURN steering committee). This convening brings together union and district leaders from around the state, teams that are committed to labor-management collaboration. In this room, you won’t hear union leaders and administrators complaining about “them” and you won’t hear one side say “we” are supporting students and “they” are supporting adult interests.
You also won’t hear anyone say “we” have it all figured out. As a consumer of information about schools and educational governance, I am quick to tune out, or at least discount, stories that sound a bit miraculous, schools and systems that have the solutions. Those stories often don’t stand up to scrutiny, or the success is short-lived, stratospheric success returning to earth when the people involved change or the conditions evolve.
I find it exciting to hear about the real hard work that people are doing to build and sustain incremental change, to create institutional culture based on shared values and open communication. It’s incremental change, and it doesn’t proceed in a linear way. There are districts that have been making good progress for years in labor-management collaboration, and just this morning, we heard from three of them here in California: Poway Unified, San Juan Unified, and ABC Unified. What’s impressive to me is not that they have perfect school districts where everyone gets along, but rather, that they’ve slowly built up an expectation that labor and management work together at every step. They understand that we need each other, and that our overall interests are the same: improve schools, help students. They understand that in the long run, neither labor nor management “wins” if the other loses. Our institutions, students, and communities do not benefit from weakened or dysfunctional elements within the system.
Here’s a great example: in one district we heard from in this morning’s panel, the union and district leadership put out joint communiqués to staff. Rather than one side or the other communicating with teachers and administrators about professional development or Common Core implementation, a unified message comes through. And even more impressive to me, they put into those messages where they are currently in disagreement and still working through issues. The benefit of that collaboration at the district level is that the teachers and administrators at a school site both know what “their” leadership is doing, and they know what issues are being addressed; this candid communication allows schools to focus on student learning and set aside the issues that they know are being dealt with on their behalf.
As I compose this blog post, I’m looking around the room, seeing and hearing district teams having relaxed and productive conversations about how to work together to improve working together. Such opportunities are not common enough. Sometimes district teams attend conferences together to focus on curriculum or professional development, but I think it’s less frequent that they have the opportunity to focus on themselves. If more labor-management teams could engage in this kind of collaboration, the work of professional development, evaluation, and instructional change would all benefit.
If you are reading this post today or tomorrow (Mar. 6-7), or shortly thereafter, check out #CalTURN for some updates and insights via Twitter.