Study: Once-a-year teacher evaluations not enough

So says the headline of an article in the January 6 Idaho Statesman. I deplore much of what is said about us in the media, but to my surprise I found myself agreeing with most points in this article and applauding some. Most of it makes good sense. It made me feel good because my district has made much progress in this direction. Most (but not all) of the recommendations resulting from the study cited here have been implemented and are operative insofar as resources allow. I would like to work my way through the article commenting on it pretty much sentence by sentence: responsive reading notes, as it were.

SEATTLE — Once-a-year evaluations aren’t enough to help teachers improve, says a report by the Bill & Melinda Gates Foundation.

I agree heartily with the central assertion of the report being referred to here, in general principle, with the sole reservation that the devil is in the details of how more frequent observation would be conducted and how it would be used. The article goes on to give at least some details, which I will assume accurately reflect the main points of the report. I am pleasantly surprised that this study was conducted under the auspices of the Bill & Melinda Gates Foundation. I am suspicious of the agendas of Billionaires Bearing Gifts.

… School districts using infrequent classroom observations to decide who are their best – and their worst – teachers could be making some big mistakes, according to the second part of a multi-year study from the foundation.

This journalist, like most of his ilk, talks about how infrequent observations can be a mistake for the district, but what about the teachers? No one really likes to be observed. It is an interruption and a distraction. Nevertheless, the experienced teacher comes to realize that more frequent observations, well and reasonably done, fairly and constructively employed, can be his friend. Of course more frequent observation will document poor teachers, and a history of poor performance can be just cause for non-renewing even a continuing-contract teacher, but it can also document that teachers are doing their jobs. A union is not acting in the best interest of its members if it knee-jerk opposes any and all evaluation.

Good teacher evaluations require multiple nuanced observations by trained evaluators. Those results should be combined with other measures, such as student test scores and classroom surveys, to gather enough information to both evaluate teachers and help them improve, the researchers found after nationwide experiments involving thousands of teachers.

Operative terms here are “nuanced [italics mine] observations.” Much Reformist rhetoric suggests a blunt-instrument approach, one size fits all. Would that teaching and what goes on in a classroom were so simple. Teaching is a complex pursuit, although many people who have never done it do not believe so. Every day is different. Every combination of kids in a room is different. The content of every lesson makes its own demands. Another operative term is “trained evaluators.” Evaluation is at once a science, an art, and a craft. Good performance and bad alike are complex sets of behaviors, some effective, some more or less effective, and some obviously negative. An evaluator must know what he is looking for and what he is looking at. Tests are one part of a larger whole, and not the most important part, I think. If this report is indeed research-based rather than ideology-based, it is miles ahead of most Reformist fare.

The most common teacher evaluation method used by school districts today – a single classroom observation once every few years – has only a 33 percent chance of resulting in an accurate assessment of a teacher, the researchers found.

The single classroom observation a year is inadequate, and it never has been adequate, although until fairly recently, it was the norm. It probably still is standard practice in many districts. It just never occurred to anyone that there might be something better. In this one indirect way at least, Reformism may have had a positive, if unintended, effect. At best, a single observation allows the administrator to see only one of approximately 900 high school classroom hours in an academic year. If this were a really representative sample, it might be adequate. But it is not an adequate sample. If the observation is at an appointed time, a teacher will prepare something especially for that time so that he can really strut his stuff. It may be his best, but it won’t be truly representative. If, on the other hand, the observation is at a random time and unannounced, it may catch the teacher on his worst day in his class from hell (which we all have from time to time) and be no more representative than the advance-scheduled observation. The answer is more observations. The more items in a sample, the more likely it is that the variations among them will average out.
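For readers who like to see the arithmetic, here is a rough sketch of that averaging-out effect. It is my own toy simulation, not anything from the Gates report: the rating scale, the "true quality" of 3.0, and the day-to-day spread of 1.0 are all made-up assumptions, and it treats every visit as an independent random draw, which ignores the problem of staged lessons on scheduled visits.

```python
import random
import statistics

def spread_of_ratings(n_observations, trials=2000, true_quality=3.0,
                      day_to_day_sd=1.0, seed=0):
    """Simulate many evaluations of the same hypothetical teacher.

    Each evaluation averages n_observations classroom visits, where each
    visit is the teacher's true quality plus random day-to-day noise
    (all numbers here are illustrative assumptions).
    Returns how widely the averaged ratings scatter (standard deviation).
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        visits = [rng.gauss(true_quality, day_to_day_sd)
                  for _ in range(n_observations)]
        averages.append(statistics.fmean(visits))
    return statistics.stdev(averages)

# Compare one visit a year with four and six visits a year.
for n in (1, 4, 6):
    print(n, "observation(s): spread of final rating =",
          round(spread_of_ratings(n), 2))
```

Under these assumptions the scatter shrinks roughly with the square root of the number of visits, so four visits cut the luck-of-the-draw error about in half. It does nothing, of course, to fix an evaluator who does not know what he is looking at.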

This confirms what many teachers and their unions have been saying for years: “That when high stakes decisions are being made, school districts should allow for more than one observation,” said Tom Kane, deputy director of the Seattle-based foundation’s education program and leader of the research project.

As I said above, a personnel folder full of good evaluations (and actually, observations are properly only a part of the evaluation process) is your best friend at contract renewal time. It is hard to fire even a new teacher, not yet on continuing contract, who has numerous consistently good reviews. But based on only one observation per year, you are only as good as that last one.

Teachers across the nation are getting too little feedback and are being left alone to figure out what they need to do to improve, says Vicki Phillips, director of the foundation’s education program. If the nation is serious about improving the quality of its teachers, improving evaluation and feedback should be an important element of that effort.

This, not getting the goods on teachers in order to fire them, is the greatest value of frequent observations. Once upon a time, in another school district, there was a rookie foreign language teacher who experienced considerable difficulty his first two years. His linguistic competence was never an issue. He made a hobby of learning languages, often obscure ones. Kyrgyz? If he could obtain an English-Kyrgyz dictionary and some Kyrgyz newspapers, he could teach himself the language. He did not necessarily need a Kyrgyz grammar; he could construct his own. Languages were not just his classroom subject, but his passion and his genius. Alas, when it came to classroom management, he was pretty clueless, and the kids led him a merry chase. His frustration grew. By the end of his second year, it came to a head.

One spring evening I was grading essays late in the faculty lounge. V., the young teacher, and our principal, G., came in, engaged in an earnest conversation, sat down, and continued to discuss. It was none of my business, so I went into fly-on-the-wall mode, kept my eyes on my papers, my mouth shut and my ears open. V. had returned his contract for the following year unsigned. He was not planning to come back. This was the first G. had heard of it. He was understandably concerned; he was losing a one-man foreign language department. As they talked, V. explained the reasons for leaving, cataloguing his difficulties (no, he did not have a better offer elsewhere). For each, G. offered sound advice. Finally, V. banged both fists upon the table and shouted (he was a very soft-spoken man), “Dammit, G.! Why weren’t you telling me this stuff two years ago when it could have done me some good? Sorry. My mind is made up.” He had given up on teaching. We lost a talented, perhaps positively brilliant young teacher of great promise.

For the past two years, the foundation has been working to build a fair and reliable system of teacher evaluation and feedback to help teachers improve their craft and assist school administrators in their personnel decisions.

This one sentence contains several important terms. “Fair and reliable:” it must be those. Teachers are always worried about the fairness of their evaluations, the more so in this age of hostile Reformist rhetoric, according to which superb teachers should get preferential treatment (bonuses), and not-so-superb teachers should get the sack. Administrators’ “personnel decisions” are becoming high-stakes as never before. Skepticism is warranted. But the most important benefit of a well-designed, well-administered evaluation process is that it will provide “feedback to help teachers improve their craft.” G. was a principal of many virtues, but this was not among them.

This report comes amid efforts across the country to change the way teachers are evaluated. Most of the new systems are a direct result of a call by the federal government for education reform, and many are finding implementation of the evaluation systems difficult.

The core of the Gates Foundation study was a collection of digital videos of more than 13,000 lessons in classrooms of teachers who volunteered to be studied.

The classrooms are being studied in Charlotte-Mecklenburg Schools, the Dallas Independent School District, Denver Public Schools, Hillsborough County Public Schools in Tampa and St. Petersburg, Fla., Memphis City Schools, The New York City Department of Education and Pittsburgh Public Schools.

It is too bad that school districts are getting interested in evaluation only in response to outside pressure from the federal or state level. Of course, evaluating teachers is difficult. But if the result is to be worthwhile, it is worth working through. There is no fast way. Of course, a team tasked with coming up with an evaluation system should be aware of what is being done or has been done elsewhere. They should also be aware of what has been done in the past in their own district. The more the better – ideas are where you find them. But a system copied from what worked somewhere else will not necessarily work very well here. One size does not fit all. A canned system, handed down by the legislature or the state department of education, or even the local administration, will not be worth much. There are no short cuts.

The main conclusions of this report are as follows [Let’s consider them one at a time]:

– High quality classroom observations require clear, specific standards, well trained and certified evaluators and multiple observations per teacher.

There are three operative terms here. First is the need for clear, specific standards. Defining them will require much thought, much discussion, and finally, clear, cogent writing. By next year or the year after, the process will be no better than the language in which it is written. Second is “well trained and certified evaluators” who know what they are looking for and what they are looking at. Being an administrator does not necessarily make one a competent evaluator. Third, multiple observations per teacher: It is easier to get rid of a bad teacher if his incompetence has been well documented. Likewise, it is harder to remove a good teacher if his adequate or better classroom performance has been documented on numerous occasions.

– Classroom evaluation is not enough. That information should be combined with student feedback and data on improvement in student test scores. Combining the three kinds of evaluations offsets the weaknesses of each individual approach.

That “Classroom evaluation is not enough” is obvious. I have no quarrel. But I am leery of the next two. How will student feedback be obtained, processed, and weighted? Unimportant, the Reformists would say. Mere administrative details, of no concern to any teacher, they would say. No, these “details” are of deadly importance. If this is not done thoughtfully and carefully, it could turn into a high-stakes popularity contest. What would happen to a teacher’s authority if students could say, “Yo, teach! If you not nice to us, we vote against you, turn our thumbs down, and get your sorry ass fired! We vote you off the island!” And if a teacher’s authority is thus undermined, will he be able to effectively manage his classroom? Why should students then accept instruction from him as worth anything? What impact would there be on instructional effectiveness? “Piffle,” the Reformists would sneer. “This is mere catastrophizing. It is a worst-case scenario and therefore irrelevant.” All too often, as in this case, the worst case is the defining case, the only relevant one. Here there be dragons.

Furthermore, according to another article at Idaho Statesman.com, henceforth parental input is also a component of teacher evaluations, mandated by the state Department of Education, based on legislation to be introduced in the 2010 legislature. How will this data be gathered, and how will it be factored in? Years ago, an “effectiveness audit” was instrumental in removing a principal. How was parental input gathered? From a questionnaire distributed to those parents who had been invited to attend a parent organization meeting. Here there be dragons. At one of these meetings, a parent is alleged to have said to that principal, “Lady, you better watch your step because now you are taking your orders from me.” Urban legend or not, it is chilling.

How much weight should test scores carry in evaluating teachers? According to the same article at Statesman.com, the state Department of Education cites language passed by last year’s legislature as mandating that “at least 50 percent of all teaching evaluations performed after June 30 will be tied to the academic performance of students.” That’s how much. It is nice to see test scores as scientifically objective and mathematically precise, and therefore above question. This kind of illusory reverence for test scores denies the importance, indeed the existence, of all kinds of game-changing, game-defining external variables. Reformists deny the relevance of such factors as student demographics, student achievement to date, etc. Yet these things can matter. They are not an excuse not to work with and do all possible for students less advantaged economically and educationally, but they matter. In 2010, a “bad” teacher in Los Angeles jumped off a bridge shortly after he had been pilloried by name in the Los Angeles Times because of his students’ lackluster test scores. Coincidence?  A post hoc ergo propter hoc fallacy? I doubt it. The part of the story that didn’t come out until later was that the administration deliberately loaded his classes with underachieving students, at his request, because he could work with them and no one else wanted them. Yes, he added value; no, his kids were still not up to snuff. Did test scores tell the whole story?  No good deed goes unpunished.

A component not mentioned here, a set of evaluation criteria that I consider to be of great importance, but which is obviously beneath the notice of Reformists in general and beyond the ken of Politicians in particular, is Teacher Practice. What is good practice? How do we define it so that we know it when we see it, and so that we see it in the first place? Conversely, what are bad practices, likely to be counter-productive? Answers to these questions will require much study, thought and discussion. Criteria, and instruments based on them, that are unilaterally handed down do not work well.

Reformists say that the bottom line is the test scores – the magic numbers. They assume that students who test well do so because they have had good teachers. But what do good teachers actually do that makes them good? If we pay attention to the ultimate effect (test scores), we must also pay attention to the cause (teacher practices). We cannot have effect without cause. We evaluate teacher practice by direct observation (the more frequent the observation and the better the evaluation rubric, the better), which is a labor-intensive, time-consuming, and therefore more expensive enterprise. Evaluation primarily by test scores is easy and cheap, and let’s face it, numbers, like test scores, have political punch, but it exemplifies the “street light effect” at its damnedest.

– The different evaluation methods still need to be refined, but they’re better than what most districts are using now.

Evaluation methods always need to be refined, even if they are better than what most districts use. It is always a work in progress. The operative word is refined. If the old system has any practices worth keeping, they should be kept and built on. Think twice about tearing down the old system, even if it leaves much to be desired. A common fallacy of Reformists is to think that they are inventing the wheel. They want the one big fix that solves all problems (and makes them look good). Schools don’t work that way, and life doesn’t work that way. What is wanted is continuous improvement because it works and it lasts.

Memphis Public Schools used to evaluate its teachers once every five years. With financial help from the Gates Foundation, the district has switched to a system of four-to-six classroom visits by both principal and peer evaluators, followed by feedback meetings focused on improvement.

One observation every five years is absurdly inadequate, for experienced teachers as well as for rookies. It smacks to me of administrative malpractice. In my district, Nampa, Idaho, 131, experienced teachers receive one formal (summative) observation a year. These are scheduled, so the teacher has advance notice. This gave me the opportunity to write up and print my plan for the day, explaining what I would be doing that day, the rationale for it in terms of curriculum goals, what the evaluator could expect to see, and what I would like the evaluator to look for in particular. I found this to be a useful exercise in its own right. There were usually pre-observation and post-observation conferences. In addition, there were several “fast-track” observations that were unannounced and did not usually last the whole period. In a previous and much smaller school, the principal often delivered his own messages instead of sending a student aide. This had him in and out of many classrooms, however briefly, and gave him numerous “snapshots” of what was actually going on in his building. We thought then that it was a poor use of administrative time. I think differently now.

The new system was implemented after teachers and administrators worked together to set new district-wide standards and both teachers and principals were thoroughly trained in the new system.

“This process is neither quick nor easy. And we’re still working out the kinks,” said Tequilla Banks, coordinator of research, evaluation and assessment for the Memphis district…. She said, however, that both teachers and administrators feel the effort is worth it.

When you are done, you are not done. It will always be a work in progress. Things change. New ideas are thought of. It must be revisited regularly. Notice especially that “teachers and administrators worked together.” This is vital. Teachers must participate in the process if they are to own it. They must own it if they are to accept it. And, they must accept it if they are to benefit from it.

The president of the teacher’s union in Hillsborough County Schools, which is using both teacher and principal evaluators, said teachers have embraced the new system.

The union should play a key role in the evaluation process. The scope of teacher evaluation, its methods, and its rules should be negotiated into the master contract. The make-up and the membership of the committee that draws up the actual instrument(s) must be approved at the negotiating table by both district and union representatives. Ideally, the committee(s) should reflect a balance between union and district personnel. The union must not merely react to board or outside pressure, but must take the initiative.

“We’re new in this process, but already many teachers tell us they value the conversations they’re having with their peers,” said Jean Clements.

These conversations are valuable in their own right, in that they will encourage serious thought about what constitutes good teaching and good performance in the total context of the school. These conversations should not occur just in the drafting committee, but in curriculum committees, faculty councils, departments, clear down to Friday afternoon social gatherings at favored watering spots. And those conversations should reach clear up to the negotiating table. This kind of “metacognition” will ultimately have more effect on actual teaching practice and teacher growth than any evaluation process itself. This should be what Reformists want.

Both Hillsborough and Memphis are also experimenting with student surveys.

Those surveys, also being piloted by the foundation in school districts around the nation, are not popularity contests, Kane said. They focus on class experiences and ask students to talk about things like whether they are being challenged and engaged.

College professors have been evaluated by their students for years. Kane, who is also a Harvard professor, said he thinks school teachers could learn to appreciate that feedback as well.

“One thing I’ve learned is once you show people the questions, much of the hesitance fades away,” he said.

Feedback is good. Popularity contests are not.

Kane emphasized that the main finding of this research is that the more information gathered about any one teacher, the better chance she or he will be given an accurate evaluation that helps improve teaching practice.

This is the idea in a nutshell: to improve teacher practice. Any evaluation system that does not do this – and not all of them do – is of little value, and may be counter-productive. How do we evaluate the evaluation system? We determine whether it actually does this.

Districts that don’t have the money to completely change their evaluation systems can take some first steps that the foundation and the school districts thought would make a meaningful difference. Those ideas include:

– Better training and certification for observers, including videotaping lessons and having more than one person evaluate a teacher.

– Student surveys to supplement other methods of evaluation or as a way to help teachers and their mentors work together.

– Convene meetings between teachers and administrators to start collaborating on improving the evaluation system.

– Look at the foundation’s research results and start a conversation about which parts of a teacher’s practice are most closely linked to student success. Focus professional development on those areas.

The last two are the cheapest, in fact will likely cost next to nothing, and will produce the most long-term benefit. Once again, focus on teachers’ practice and start serious conversations about it. A number of years ago, there was a period of a few years when several of us, mostly from the Language Arts department, stopped at a nearby tavern after school on Friday afternoons. What did we talk about? We talked shop! We talked about what we did that did or did not work. We talked about curriculum and brainstormed ideas, many of which eventually got written up and sent on to the Secondary Curriculum Committee. I learned more about teaching in these few hours every week than in many formal education courses. These sessions were more productive than many formal meetings. In fact, our curriculum meetings were all the more productive because we had already discussed many new ideas before bringing them to the formal meeting.

Randi Weingarten, president of the American Federation of Teachers, expressed concern that too much emphasis is being placed on evaluating teachers and not on improving their performance.

“Until we make a commitment to develop evaluation systems that are first and foremost about continuous improvement and professional growth, we will continue to struggle in our efforts to provide every child with a high-quality education,” she said in a written statement.

The boldface italics are mine. Nuff said.

http://www.idahostatesman.com/2012/01/06/1942245/study-effective-teacher-evaluation.html#storylink=misearch

http://www.idahostatesman.com/2012/01/11/1948842/idaho-teacher-evaluations-to-include.html#storylink=latest

http://www.idahostatesman.com/2012/01/10/1947084/apnewsbreak-id-to-adopt-new-school.html
