More Thoughts on LAUSD, Standards and Evaluation
In my prior post, I explained why I’m not holding my breath with any expectation of huge changes in California following from the recent court ruling on teacher evaluations in Los Angeles Unified. In this post, I add some additional reflections on why the judge’s order won’t produce significant, statewide changes, at least in the next several years.
I’ve already noted that, despite the Stull Act’s provisions on using student test scores in teacher evaluation, and the enthusiastic support those provisions enjoy in certain circles, many leading nations, states, districts, and private schools have demonstrated that student test scores are entirely unnecessary for elevating teaching quality. Part of the reason for that assertion is that the tests themselves are of such poor quality. However, it’s also worth noting that the state is in the process of changing over to new Common Core standards and new assessments. I don’t expect many districts or unions are ready for protracted negotiations over the use of an assessment system that is in its death throes; as for the next assessment system, the President of the State Board of Education, Michael Kirst, suggested it would be 2018-19 before there’s enough data to guide the use of value-added measures based on those assessments. It should be noted that Kirst was referring to the use of test scores to evaluate schools, not individual teachers, although he offered some support for that idea in the past, prior to his current appointment to the Board. I would argue that test scores and value-added measures might have some usefulness for schools and districts with large enough sample sizes, and that applying similar methods to individual teacher evaluation presents an even greater challenge. 2018… 2019… 2020…?
Of course, it might not be that many years before we see the Stull Act further amended, if not scrapped and replaced by something better. By “better” I mean something that leaves state tests out of teacher evaluation, and instead offers a directive that is firm on the principle that evaluation should address student learning, but flexible about the specific measures, which should be collectively bargained in local districts whose significant differences deserve respect. If superintendents like Chris Steinhauser speak up, individually and through their professional associations, we might build a consensus among teachers, administrators, parents, and school boards around better approaches to teacher evaluation, without the misguided reliance on state tests. It’s already happening in New York, where at least 1,496 principals have taken a highly visible and public stand against teacher evaluation based on student test scores. The movement against testing is also growing in Florida and Texas – why not here?
Even when the Common Core assessments arrive, I hope all parties will proceed with caution in linking test scores to teacher evaluation. Standardized tests of any type, if designed as student assessments, are not necessarily valid for the assessment of teaching. That is not my opinion, by the way, but rather a restatement of a basic tenet of educational measurement research, oft repeated in this blog but seldom acknowledged in the education reform debate: an instrument designed for one purpose is not necessarily valid for other purposes. And if you stop to think about it, here’s why. A test of student skills has no way of identifying how the student knows the right answers. Good guess? Good teaching? And if it’s good teaching, whose teaching was it – a parent, tutor, instructional aide, substitute, teacher? (In fact, the Common Core standards put an appropriate emphasis on interdisciplinary instruction, an approach that should improve learning and thoroughly complicate efforts to evaluate individual teachers by test scores.)
For what it’s worth, I’ve always supported my criticisms of VAM with research and examples, citing the measurement standards of the National Research Council (NRC), the National Council on Measurement in Education (NCME), the American Psychological Association (APA), and the American Educational Research Association (AERA) in blog posts, presentations, and online exchanges. Almost laughably, one education “reformer” in a fairly high position once commented to me on Twitter…
It’s unfortunate that my listing of leading professional organizations for education research was mere “alphabet soup” to a person with significant influence in education reform and a multi-million-dollar budget. I was citing the people who set the standards for the field, and the counter-argument was based on beliefs. To be fair, those beliefs are not entirely without basis: studies do show correlations between teachers and value-added scores. Correlation, however, is not causation, as Jesse Rothstein so perfectly illustrated with regard to value-added measurement when he showed that assignment to top-rated fifth-grade teachers correlates with those students’ third-grade test scores. We know current teachers don’t influence past scores, so the correlation demonstrates the significant influence of a factor, or factors, other than teachers in shaping students’ scores. The inability to control for such factors may be an insurmountable flaw in value-added measurement; we’ll see. In the meantime, I’m deeply troubled by the indifference to, or ignorance of, those factors so frequently demonstrated by most policy makers and journalists, along with too many administrators, researchers, and reformers who should know better.
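For readers who want to see the logic of Rothstein’s falsification test in miniature, here is a toy simulation. It is not Rothstein’s actual model or data; every number in it is an assumption chosen for illustration. The point is simply that when students are sorted into classrooms non-randomly, a naive “value-added” rating for a fifth-grade teacher ends up correlated with the students’ third-grade scores – scores that teacher could not possibly have caused.

```python
# Toy simulation (illustrative assumptions only, not Rothstein's model):
# non-random classroom assignment makes a current teacher's naive rating
# "predict" students' past scores, which the teacher cannot have influenced.
import random

random.seed(0)

N_TEACHERS = 20
CLASS_SIZE = 25

# Each student has a persistent factor (family, prior schooling, etc.)
# that shapes both third- and fifth-grade scores.
students = []
for _ in range(N_TEACHERS * CLASS_SIZE):
    ability = random.gauss(0, 1)
    g3 = ability + random.gauss(0, 0.5)   # third-grade score
    students.append((ability, g3))

# Tracked (non-random) assignment: sort students by third-grade score,
# so classrooms differ systematically before fifth grade even starts.
students.sort(key=lambda s: s[1])
true_teacher_effect = [random.gauss(0, 0.3) for _ in range(N_TEACHERS)]

rows = []
for t in range(N_TEACHERS):
    for ability, g3 in students[t * CLASS_SIZE:(t + 1) * CLASS_SIZE]:
        g5 = ability + true_teacher_effect[t] + random.gauss(0, 0.5)
        rows.append((t, g3, g5))

# Naive "value-added" rating: the class-mean fifth-grade score.
naive_va = {}
for t in range(N_TEACHERS):
    scores = [g5 for (tt, _, g5) in rows if tt == t]
    naive_va[t] = sum(scores) / len(scores)

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    sa = (sum((x - ma) ** 2 for x in a) / n) ** 0.5
    sb = (sum((y - mb) ** 2 for y in b) / n) ** 0.5
    return cov / (sa * sb)

# Correlate each student's THIRD-grade score with the FIFTH-grade
# teacher's naive rating: strongly positive, despite no causal path.
xs = [naive_va[t] for (t, _, _) in rows]
ys = [g3 for (_, g3, _) in rows]
print(round(corr(xs, ys), 2))  # strongly positive
```

Under random assignment the same correlation would hover near zero; the spurious association comes entirely from the sorting step, which stands in for the real-world tracking and family factors discussed above.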
Speaking of journalists, I expect that teachers and unions will continue to be presented strictly as the parties obstructing reform. Will anyone ask Chris Steinhauser for comment on the LAUSD ruling? Will any journalists note that there are teachers who embrace the use of student learning evidence in our evaluations, and that we have affirmed that commitment through groups like ACT, the California Teachers Association, and the National Board for Professional Teaching Standards? Will they ever scrutinize poorly supported opinions, and ask the proponents of value-added measures how they skirt the issue of validity when they rely on mere correlations?
I’m not holding my breath for that, either.