The Common Core transition reached me on a more personal level this week when my 11-year-old son came home and asked, “Dad, do you know anyone who made the Smarter Balanced test?” When I replied that I do not, he said, “That’s good. It’s a bad test.” My older son chimed in as well, giving the test mixed reviews.
Some people who know me might think I’d opt my children out of standardized testing, but so far, I haven’t. While I don’t care for standardized tests, I haven’t felt the need for my sons to opt out because I don’t believe they have been over-tested, or had their time wasted on test prep; to my knowledge, no one in their school, or in our district, has suggested we should put those test results to any inappropriate use for the children or teachers.
I might have considered opting out anyways because I don’t like the use of tests to rank and punish schools, but even that objection has faded for now, with California moving away from its prior accountability program and entering a transitional period to something new. To be honest, I don’t fully understand the new system yet; schools can report the same rankings or ratings this year that they had last year, as we pilot test the Smarter Balanced assessments, and the eventual accountability measures will involve more local decision-making about a variety of measures beyond testing.
My sons had a few specific critiques of the test questions, user interface, and other issues. The whole family got a good laugh out of my 11-year-old telling us about his essay. Due to a glitch in the system, he reports, his full response turned out like this:
Sorry I cannot write more than a line or it deletes itself. :(
My point in sharing this anecdote is not to criticize. Next year, the test will probably be improved, and from my sons’ perspective, the novelty will be long gone. What’s too hard this year may seem normal next year. Maybe not. It’s too soon to say.
But I will say this: I think my son’s response was perfect for the situation. And I think this anecdote perfectly illustrates how stupid it would be to evaluate his teacher based on results from a first-ever administration of a flawed assessment. (Even using the old tests, the problems of VAM in teacher evaluation are, at this point, insurmountable.) Thankfully, in California, no one will even see the results from the first run-through. And why should we? No one would be able to say reliably what the results even mean, and it will take at least a few iterations before year-to-year comparisons have even a chance of offering any real insights.
I don’t think any independent expert in educational measurement or assessment is ready to go on record vouching for the validity of value-added measures in teacher evaluation if the inputs come from brand new assessments – tests that were never validated for that purpose in the first place. It’s mainly politicians and certain “accountability” enthusiasts in the education bureaucracy or think tank crowd who are ready to plunge recklessly into these unknown waters. Of course, many of these individuals, and their districts and states, are reacting to the pressure from the Education Department to take these unwise steps. Despite the legal ambiguities around his approach, and the deficiencies in research and reasoning, Secretary Arne Duncan continues to play at carrots and sticks to push VAM into teacher evaluation.
My sympathies to those of you living and working with the consequences; for the time being at least, in my state and district, the imperfections are being handled perfectly. Who knows? California’s slow and sane approach just might work.
A more philosophical post than usual – for what it’s worth.
This morning I had the opportunity to listen to a talk by Elane Geller, a Holocaust survivor originally from Poland, and now a resident of Southern California. I’ve heard Holocaust survivors speak a number of times in my life, and it’s always a profound experience, but there were two particular take-away ideas I thought would be worth sharing in this space.
The first was an observation Geller offered about Holocaust education, particularly for Jewish children. She commented that any learning about the Holocaust should be considered in a wider context (my words, not hers), when she talked about a cycle, “from joy, to pain, and back to joy again.” In other words, Holocaust education would not be a starting point for Jewish education; instead, you should start with the positivity of traditions and living culture. Then, yes, it is necessary to understand the negative history of the Holocaust, to confront evil directly and name it. And finally, you wouldn’t want to neglect the importance of closure that brings the child back to a sense of joy and positivity about the future.
It struck me that this cycle could apply to any learning experience, within the span of a day, week, month, or year. If academics are relevant to our students’ lives as members of a challenged society, then we must confront challenges openly, honestly, in ways that are sensitive to individuals and cultures while also academically focused. On the personal side, it makes sense to establish a sense of joy about learning, a degree of comfort among people in an academic community. Then, it should be safe to move into content that may be uncomfortable for some, but absolutely necessary. Such work can be done in an age-appropriate way that is still academically “honest” and true to the core of an academic discipline, and able to respect and honor the personal, emotional side of learning. The cycle is complete when our uncomfortable or challenging learning experiences are brought to a positive conclusion, with a sense of agency and purpose, and clear evidence of new learning.
The second observation that stuck with me this morning occurred when Geller talked about a sense of mutual responsibility, and even a sense of mutual peril in looking at world events. It’s a sentiment that has been expressed many times in many cultures, that a threat to human rights anywhere is a threat to human rights everywhere. The word ubuntu, found in multiple languages and dialects in southern Africa, identifies a similar concept – that my humanity is bound up in your humanity.
Given the scope of the humanitarian crises right now in places like Syria, South Sudan, central Africa, and North Korea, American education policy debates begin to look relatively minor. But on the other hand, the United States is not exactly leading the world in efforts to avoid a humanitarian crisis of its own (though of a different nature). The overall poverty rate in the United States is shameful, given our overall economic output. The childhood poverty rate is an embarrassment, and a blight that should speak to all of us on a personal, moral level. The potential social and economic upheaval that awaits us if we continue down this path should give us all pause, and then, prompt us to act.
Considering the severity of the poverty problem and the obvious deleterious effects of poverty on children’s health, social and academic development, it’s frankly troubling to me that philanthropists, politicians, and others supposedly dedicated to children’s welfare can remain relatively silent about economics and broader social policies, while dedicating considerable time, money, and energy to vigorous battles over policies that have questionable chances of producing minor improvements in children’s lives. I’m not trying to seize the Holocaust or other vast social problems as a high road to attack the positions of people whose education policy ideas I disagree with; setting aside the merits of any specific policy position, I will go so far as to say that those focused on marginal issues while ignoring essential issues lack credibility when they try to seize the moral high ground.
Let’s have those debates. But maybe those debates would be less contentious and more productive if we had more ubuntu.
Teacher evaluation has been a frequent topic in this space: Accomplished California Teachers (ACT) first coalesced as a teacher leadership group in large part to produce a report on evaluation that would feature teacher voice regarding current practices and promising reforms for California schools. I’ve also written frequently about an evaluation method that stands out as the worst popular idea out there – using value-added measurement (VAM) of student test scores as part of a teacher evaluation. The research evidence showing problems with VAM in teacher evaluation is solid, consistent, and comes from multiple fields and disciplines – most recently, statisticians (more on that in a moment). The evidence comes from companies, universities, and governmental studies. And the anecdotal evidence is rather damning as well: how many VAM train-wrecks do we need to see?
On the relevance of student learning to teacher evaluation, the ACT team that produced our evaluation report was influenced by the fact that many of us were National Board Certified Teachers. Our certification required evidence of student learning – after all, teaching without learning is merely a set of words or actions. Board certified or not, our team members all agreed that an effective teacher needs to be able to show student learning, as part of an analytical and reflective architecture of accomplished teaching. It doesn’t mean that student learning happens for every student on the same timeline, showing up on the same types of assessments, but effective teachers take all assessments and learning experiences into account in the constant effort to plan and improve good instruction.
Value-added measures have a certain intuitive appeal, because they claim the ability to predict the trajectory of student test scores, theoretically showing the “value” added by the teacher if the score is higher than predicted. This deceptively simple concept sounds reasonable, especially for non-teachers, and even more so for policy makers. They often seem eager to impose on teachers and administrators what is essentially one-way “accountability” for the success of schools; stagnant or declining scores bring negative consequences, so the public can be reassured that insecure school personnel will be compelled to do their jobs. Meanwhile, policy makers often ignore (because the voters and media allow them to ignore) what should be their share of accountability for the conditions of schools, and even the outside-of-school conditions that all the experts agree outweigh teacher effects on standardized test scores. Yes, you read that correctly: most of the variation in students’ test scores can be accounted for by factors outside of school – factors like family wealth, educational attainment, health care, and the like.
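For readers who want to see the basic arithmetic behind that intuitive appeal, here is a minimal sketch in Python. It is a deliberately simplified illustration, not the formula any state or district actually uses: real value-added models include many more variables and statistical adjustments. The function name `value_added`, the single prior-year predictor, and the four sample records are all hypothetical, made up for this example.

```python
from statistics import mean


def value_added(records):
    """Toy value-added estimate (hypothetical, heavily simplified).

    records: list of (teacher, prior_score, current_score) tuples.
    Returns {teacher: average of (actual score - predicted score)}.
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    prior_mean, current_mean = mean(priors), mean(currents)

    # Fit one straight line (ordinary least squares) that predicts this
    # year's score from last year's score, using every student at once.
    sxx = sum((p - prior_mean) ** 2 for p in priors)
    sxy = sum((p - prior_mean) * (c - current_mean)
              for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = current_mean - slope * prior_mean

    # A student's "value added" is how far the actual score lands above
    # or below the prediction; a teacher's is the class average of those.
    gaps = {}
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        gaps.setdefault(teacher, []).append(current - predicted)
    return {teacher: mean(values) for teacher, values in gaps.items()}


# Made-up scores for two hypothetical teachers, two students each.
sample = [("A", 40, 50), ("A", 60, 70), ("B", 40, 44), ("B", 60, 64)]
scores = value_added(sample)
```

With these made-up numbers, teacher A comes out at about +3 points and teacher B at about -3. Nothing in the arithmetic distinguishes a teacher’s contribution from the outside-of-school factors mentioned above; whatever the prediction line misses simply gets attributed to the teacher.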
If you care to look at some of my prior posts on the topic of VAM in teacher evaluation, you’ll find that education researchers, economists, scientists, mathematicians, and experts in psychometrics (the measurement of knowledge) have all weighed in against the idea. Some offer stronger objections than others, but most agree that VAM is not stable or reliable enough for high-stakes usage. It has also been noted by multiple professional associations that measures validated for one purpose (measuring student knowledge) cannot be assumed valid for other purposes (measuring teacher effect). The main proponents of VAM use for high-stakes personnel decisions all seem to be economists (Hanushek, Chetty, Kane), or researchers with some vested interest in finding what they end up finding (Gates Foundation, William Sanders).
Well, the latest professional group to weigh in on the topic was the American Statistical Association. The ASA is not against the concept or use of VAM, but they do caution that VAM should only be used under a whole set of circumstances that are quite unlike the circumstances found in schools and districts using VAM. For example, VAM should be used by experts, with clear information regarding formulas and margins of error, and careful analysis of how sensitive statistical models are when the assessment changes.
Here are some choice quotes from their April 8, 2014 report:
VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
Ranking teachers by their VAM scores can have unintended consequences that reduce quality.
VAMs are only as good as the data fed into them. Ideally, tests should fully measure student achievement with respect to the curriculum objectives and content standards adopted by the state, in both breadth and depth. In practice, no test meets this stringent standard, and it needs to be recognized that, at best, most VAMs predict only performance on the test and not necessarily long-range learning outcomes.
[Regarding studies that have found some predictive ability in VAM scores by teachers, with “correlations generally less than 0.5”]: These studies, however, have taken place in districts in which VAMs are used for low-stakes purposes. The models fit under these circumstances do not necessarily predict the relationship between VAM scores and student test score gains that would result if VAMs were implemented for high-stakes purposes such as awarding tenure, making salary decisions, or dismissing teachers.
I should also note that there’s a portion of this report I disagree with, regarding the potential use of VAM to evaluate teacher training programs:
A VAM score may provide teachers and administrators with information on their students’ performance and identify areas where improvement is needed, but it does not provide information on how to improve the teaching. The models, however, may be used to evaluate effects of policies or teacher training programs by comparing the average VAM scores of teachers from different programs. In these uses, the VAM scores partially adjust for the differing backgrounds of the students, and averaging the results over different teachers improves the stability of the estimates.
It’s unclear to me how VAM within schools or districts is recommended for observing correlation, but when extended beyond schools to involve even more complex interactions of variables known and unknown, we’re now talking about evaluating effects (causation, rather than mere correlation). While I understand the value of larger sample sizes in reaching stronger conclusions about data, I question the ability of anyone undertaking such an evaluation to control for the differences among schools. The quote above mentions only the “differing backgrounds of the students.” However, different teacher training programs develop different relationships with schools and districts. Teachers are not randomly distributed to schools or communities after their training, and the school’s and community’s effects on the teachers would seem highly relevant. There are studies that show the effects of principals on test scores, the effects of teachers on teachers, effects of class period length, effects of tutoring that may or may not be available, effects of libraries that may or may not even be open, etc. My open letter to California policy makers on this topic argued, and I would stand by the argument, that there are simply too many interacting variables to reach any reliable conclusions that depend on value-added measures.
Teachers are tribal people. We build a nice little fortress and stay inside as much as possible, defending the gates when necessary. This tribalism has saved us in an ever-shifting landscape, but it’s got its limitations and it may soon be the cause of our demise. The two big tribes now are Old Generation and Next Generation teachers, and the most obvious place of need is our local unions. The Next Gen teachers are attracted by the glamour of the reform groups and the promise of an amplified voice; the Old Gen prefers the lunchroom and the union hall. Over the last school year I’ve gotten the chance to hang out with 58 Next Gen teachers from 17 states, courtesy of the NEA. I’ve learned that we’re lined up on either side of a generation gap that is worlds apart, but that each side is a critical part of the equation. We need each other more than we will benefit from sticking with our tribe.
Because the origins of unions are rooted in staying alive in a hostile environment, the Old Generation totally gets the need for protection, collective bargaining and the need to jump up and down once in a while. The Old Gen safeguards everyone’s rights whether they like it or not – experience has taught them that careers are subject to the whim of the public, politicians, and now philanthropists. We once did believe that our good work would be our protection from harm or unfair practices; we never saw the need for pensions or fully-funded health benefits as youngsters. And then life taught us something – we’re treading faster for less money and less respect, and sometimes one of us gets caught in the machinery and goes down. Working conditions, pay raises, benefits, retirement, the bread and butter issues, these are the spears and bagpipes of teachers who are nearing the end of their tenure and see the world falling apart around them. They sometimes see the Next Generation of teachers as an opposing camp, vulnerable to the influence of outside agencies. But these are the people we need to protect us, our profession, and public education and we need to make friends now.
The Next Generation doesn’t seem to register the Old Gen to the same degree. The Old Gen are just irrelevant, cranky strangers in red t-shirts standing outside the board of education, waving signs. These younger folks see step and column pay scales as ridiculous, and can’t understand why anyone would ever have a problem with being evaluated or using student outcome data as part of that evaluation. The reform groups give them opportunities to meet policymakers, be on panels, write policy papers – that’s what drew me in and that’s what we need to do to compete. In the last three years in LA, Next Gen issues have been career pathways, evaluation and new pay structures. They don’t care about retirement right now. But the Next Generation needs to understand that without involvement in the union today, there won’t be a retirement later, or anything else resembling stability. Time to end the tribalism.
So how do unions take responsibility for educating their younger siblings, and why should it be incumbent upon the Next Gen to cooperate with the Old Gen? Union policy is often the result of great ideas for what other people should do. Someone should form a committee! Someone needs to do outreach! Someone should plan an event! Our locals, with the support of state and national affiliates, need to offer professional development, discussions and social events. Our locals need to admit that a panel with dinner and drinks is way more interesting than parliamentary procedure, and then have panel discussions around Next Gen issues with union leadership. Finally, our locals need members like you and me to create opportunity for the Next Gen to learn the relevance of the union, and to be relevant in it. We know there are interesting things happening in our locals even if we have to dig around for them. Find one of those opportunities and take along a young friend.
Our future depends on it.
Greetings to anyone who attended or is interested in our presentation at the Teaching & Learning Conference 2014. We’re glad to share what we’ve done, what we’ve learned, and what we’re still figuring out about teacher leadership.
My partners in this presentation are Pat Graff (NM), Lanelle Gordin (CA), Cheryl Suliteanu (CA), and Maren Johnson (WA). Below you can find the main point of the presentation and some useful links. Presentation slides are available at the bottom of the page, though the settings for background images didn’t transfer from PowerPoint and will make some slides hard to read. The slides with information you’re most likely to want to look up or follow up on are pretty clear, though.
The central idea of our presentation is that teacher leadership is essential to school improvement and advancing the profession. New roles for teacher leaders are already emerging, and we must ensure that as these roles evolve they are formalized (yet flexible!), sustainable (funded and integrated), and truly professional (requiring demonstrated accomplishment and skill). Not everything we talk about is entirely there yet, and our examples are all variations on the theme, but we think our stories are instructive regarding how teacher leaders are changing the field and where we should be headed.
Thank you to the National Board for Professional Teaching Standards for putting on this excellent conference and inviting us to speak, and thank you to all who came to see our presentation (or cared enough to read this blog post anyways!).
Resources and connections (links open in a new window/tab):
- For the ACT report on teacher career pathways, see our Publications page.
- More information on California’s Greatness by Design report (and a link to find the report).
- Bay Area New Millennium Initiative report on teacher career pathways.
- Read more about the Riverside County Teacher Leadership Certification Academy.
- Stories from School group blog (including posts by Maren Johnson) from the Center for Strengthening the Teaching Profession in Washington.
- On Twitter: Cheryl Suliteanu – @CSuliteanu; Maren Johnson – @maren_johnson; David B. Cohen – @CohenD
Our slides (with apologies for formatting issues that come up in the transfer from PowerPoint to Google Docs):
I must confess that I do check my blog stats once or twice a day. I’m not driven by pursuit of big numbers, though of course I’m pleased when a post is widely shared and read. But I’m drawn to the stats page because of curiosity about search terms that lead to this blog, about old posts that suddenly find new life for unknown reasons. The stats also show that I’m often wrong about which posts I think will be more popular. The ones I like the most often fade quickly, while posts I’m less invested in sometimes take off.
Case in point: a Facebook friend from New Zealand shared the story about a school cheating case where teachers tampered with test results reported to the national authorities. I noticed similarities to incidents in the U.S., and made the connection to Pasi Sahlberg’s talks and writing about the Global Education Reform Movement – or GERM. I cranked out that blog post in a few minutes and figured it would be a blip on the stat sheet. Surprisingly, it has been the most read and shared post in the past month.
The post also drew a response from Benjamin Riley of the NewSchools Venture Fund, who may have noticed this particular post because he’s currently on leave from NSVF and spending a year working in New Zealand. (Nice work if you can get it! I loved my visit there a few years ago.)
Here is the distillation of Riley’s response and suggestions (though if you go read the full version at his blog, you get the benefit of his use of White Stripes lyrics). He argues that the inevitability of cheating on tests shouldn’t be used to argue against testing any more than cheating by golfers leads to the end of golf; it simply means we must guard against cheating. Riley also quotes Kevin Carey suggesting that such cheating is short-sighted if inflated scores will end up inflating expectations for subsequent years. Carey adds that “cheating also means that public schools finally care enough about student performance that some ethically challenged educators have chosen to cheat. This is far better than the alternative, where learning is so incidental and non-transparent that people of low character can’t be bothered to lie about it.”
Overall, I can agree with Riley that individuals are responsible, and that cheating by itself is not an argument to eliminate testing. There are some appropriate uses of large-scale standardized assessment. I don’t agree with Carey that an uptick in cheating indicates people “care enough about student performance.” I think it means those people are mad or fearful about the public uses of what passes for “student performance” but really isn’t.
Riley and Carey seem to assume that standardized testing generally produces useful information about students, teachers, schools, and systems – from the individual level on up – and so they tackle this issue with a focus on what educators should do without engaging around one of the key underlying problems: weak tests, or good tests used for weak policies, are central to this story. So I’m asking what education leaders should do to address the problem. I wouldn’t be satisfied with an answer that puts the problem entirely on the teacher, any more than I would accept a teacher who says all the problems in the class are the students’ fault.
Their perspective also seems a bit removed from an understanding of classrooms and schools. Carey’s view seems to assume that people approach cheating rationally rather than emotionally. I don’t think it works that way. People who cheat are likely angry or insecure. It’s also important to acknowledge that in some of the high-profile cases in the U.S., there’s evidence of cheating at the school and administrative levels, which calls for a different set of models in trying to understand the psychology of the act, throwing in group dynamics and the possible role of intimidation.
Riley’s main point, the one linked to the lyrics, is that you can’t blame the test for the cheating, any more than you can blame the bank for the robber. And if you’re addressing the cheater, or the robber, I agree: pushing off one’s own misdeeds on others doesn’t negate or excuse the misdeed.
But as someone who has administered thousands of tests and been responsible for learning outcomes, I accept an accountability that Riley and Carey seem less interested in ascribing to “the system” that gives the larger tests, and should be responsible for broader outcomes. This is my fundamental disagreement with much of the education reform notion of accountability. The people with the most power are supposed to bear the most responsibility. When I give a test, I try to design an assessment that is fair, useful, valid, productive, and worth the effort. To attach high stakes to an exercise that doesn’t meet those criteria is to invite cheating. I’m not excusing the cheaters, but it would be sloppy, unprofessional work on my part to create conditions that I should have known would ultimately undermine my work. If a bank has inadequate safeguards against robbery, it’s not exactly the bank’s fault if it’s robbed. But isn’t someone supposed to be accountable for having the foresight to reduce the chances? If a CEO creates the conditions that push more of his managers and accountants to cook the books, and at the same time says be honest, then does the CEO bear any responsibility for corrupt practices that should have been anticipated?
Where’s that accountability in education policy? Okay, punish the cheaters. You can even try to tighten test security, but ultimately, that system must rely on local practitioners without creating an undue administrative burden. So, policy makers, we can easily predict that the more you rely on standardized tests for purposes they can’t adequately serve, and the more you raise the stakes, the more you create conditions that lead to cheating. It will happen. It shouldn’t. People should resist. They should do the right thing. They should raise objections in appropriate venues. They should be honest and transparent. Those are wonderful sentiments that serve to distance you from knowingly following a series of steps that will have a negative effect on schools and on your own accountability measures.
Our Secretary of Education and many state superintendents and legislatures have pushed more and more use of mediocre tests for inappropriate, high-stakes uses. People shouldn’t cheat, but neither should our leaders avoid criticism for their failure to produce an accountability system that works. They’ve ignored too much of what we know about students, teachers, schools, learning, and human nature. They’ve failed as policy architects and leaders who should have foreseen the mess they’ve helped create.
Pasi Sahlberg, the well-known Finnish education expert and author of Finnish Lessons, has described the negative trends in education reform as GERM – the Global Education Reform Movement. You can see his TEDx talk “GERM that kills schools” embedded below.
The basics of GERM are well known to most people by now: one-way accountability, where leaders demand results from practitioners while no one seems to hold leaders accountable for creating the conditions necessary for success; high-stakes testing; misguided focus on rankings, competition, and punishment; a near-obsession with data; deprofessionalizing teaching through reduced autonomy and increased focus on compliance.
And the metaphor of GERM makes sense, given the way unhealthy ideas about educational systems continue to spread. The latest example comes from New Zealand, where teachers at a school have apparently responded to high-stakes testing and narrowed curriculum by cheating. I’m not excusing cheating, but anyone putting these kinds of systems in place – in Atlanta, Washington, D.C., Houston, California or New Zealand – must acknowledge their responsibility as well; cheating is a predictable result when you use improper or limited measures excessively, and in ways that feel threatening.
As a teacher, I would certainly punish a student for cheating in my class. But if I assign much of a student’s grade based on a procedure or task that’s easy to falsify, and it’s also something that my students find intrusive, flawed, coercive or irrelevant, then certainly I’m also at fault for creating the conditions that almost inevitably lead to cheating.
The reactions in New Zealand sound quite familiar:
Labour Education spokesman Chris Hipkins: “The high stakes nature of the system is nonsense. It is very easily manipulated, it is heavily subjective and is no way a reliable measurement of school performance. The higher stakes you make it, the more pressure there is going to be on schools to make their subjective judgments to increase their achievement results.”
Martin Thrupp of Waikato University, who led a three-year study into national standards: “The tail starts to wag the dog and the assessment system kind of takes over and pushes out a broader approach and people tend to go more directly for activities that are going to more directly push kids along in terms of the national standards.”
A parent at the affected school: “I’ve heard from teachers that national standards are putting a lot of pressure on them to document these standard tests rather than allowing children to have their individual strengths recognised.”
It seems quite likely that these GERM approaches will fail in the long run. How many years we’ll spend learning that lesson remains to be seen.