How student evaluations are harming liberal education in America
I’ve never been a big fan of student evaluations, ever since many of the students in the first class I taught at a university wrote that mine had been the worst class they had ever taken and that I ought to be fired for crimes against academia. Although I may have acquired a permanent chip on my shoulder against the wealthy late-teen demographic, I have to admit that those evaluations pushed me to think about how I was teaching (and probably closed a few doors to me in subsequent job applications).
Although in my experience subtle, constructive commentary is something of a rarity on student evaluations (most of the free-response comments, positive and negative, are about the caliber you’d find under a YouTube video), students are, in theory, in the best position to offer useful feedback about a course, having just experienced it for 50+ hours over several months. And occasionally a good comment has led me down newer and better paths in teaching.
The real problem, however, is not the emotional body blow that some nasty comments deliver (those, at least, tend to be offset by the students who take the time to write kind “attaboys”). Rather, it is the way many faculty and administrators use averaged 5-point bubble scores that rate, à la Bill and Ted, how “excellent” their professors and courses are. My university, like many others I’m sure, uses the IDEA form, which can be seen here (this is what professors get back: our report card, and much of what our teaching life comes down to).
Most students don’t realize how central those evaluations are to the careers of the professors they are rating. After all, they are asked to give an impressionistic “grade” in the final minutes of a class at the end of the semester. How much weight could a moment spent filling in bubbles carry?
It turns out that at many universities the answer is: a lot. Attempting to emulate trends in public schools and business, most university departments rank pay increases each year according to “merit” (tenure-track faculty go through this; adjuncts just don’t get rehired). In some places, scholarship (publishing and such) plays a large role. At schools that consider themselves more teaching-oriented, merit pay hinges largely on whatever information is available to indicate better or worse teaching, and administrators usually lean heavily on the “objective” teaching evaluation numbers to determine these rankings. Because raises are generally computed as an annual percentage that compounds in future years, one early bad evaluation can cost the recipient thousands of dollars over a career. Being continually “outperformed” by one’s peers can cost tens of thousands.
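To see why a single year matters, here is a minimal sketch of the compounding in Python. Every input is a hypothetical assumption, not a figure from any real merit-pay policy: a $60,000 starting salary, a 30-year career, a 3% raise after an “average” evaluation, and a 2% raise after one below-average evaluation in the first year. The helper name `career_earnings` is likewise invented for illustration.

```python
# Hypothetical illustration: every number below is an assumption,
# not data from any real university's merit-pay system.

def career_earnings(start_salary: float, raises: list[float]) -> float:
    """Total pay over len(raises) years.

    raises[i] is the merit raise (as a fraction, e.g. 0.03 for 3%) awarded
    at the end of year i, so it is built into every later year's salary.
    """
    total = 0.0
    salary = start_salary
    for raise_fraction in raises:
        total += salary               # pay for the current year
        salary *= 1 + raise_fraction  # the raise compounds into later years
    return total


START_SALARY = 60_000.0  # assumed starting salary
YEARS = 30               # assumed career length
TYPICAL_RAISE = 0.03     # assumed raise after an "average" evaluation
LOW_RAISE = 0.02         # assumed raise after a below-average evaluation

# Average evaluations every year.
baseline = career_earnings(START_SALARY, [TYPICAL_RAISE] * YEARS)

# One bad evaluation in the first year, average every year after.
one_bad_year = career_earnings(
    START_SALARY, [LOW_RAISE] + [TYPICAL_RAISE] * (YEARS - 1)
)

print(f"Cost of one early bad year: ${baseline - one_bad_year:,.0f}")
```

Because each raise multiplies every later salary, the one-point shortfall in year one shows up in all 29 remaining paychecks. Under these particular assumptions the gap comes to roughly $27,000, though real figures depend entirely on local salary scales and raise pools.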
The situation is much the same when faculty members meet in small groups to discuss their colleagues’ tenure prospects. Tasked with evaluating a colleague’s teaching effectiveness and deciding whether that colleague keeps his or her job, they have to work with the information they are given. Everyone wants to be fair-minded and objective, so the tendency is to consult the big, bold teaching evaluation numbers first. At my university, the requirements for the tenure dossier ask the candidate to compile a handy chart so the numbers are easily accessible. At least in tenure applications a faculty member can provide other, supplemental materials to establish their teaching cred. In the end, however, no one is likely to be impressed by thought-provoking class activities or a well-designed syllabus for a class with sub-par student evaluations.
Of course, if a professor is denied tenure wholly or partly because of below-average teaching evaluations, good luck finding another job! Most academic positions receive dozens, if not hundreds, of applications, and most ask to see those teaching evaluation numbers. In many cases, particularly at the more liberal-artsy “teaching schools,” an applicant would be better off with a teaching evaluation average a few tenths of a point higher than with an extra page of publications, especially if their average falls on the wrong side of the magic number “4.0,” which is roughly the average score on a typical teaching evaluation.
All of this would be fine if student evaluations of professors rewarded the good and weeded out the bad. We live in a meritocratic country, and academia is not a place to make a living if you are averse to competition. So the question remains: how much do student evaluations really reflect the “teaching effectiveness” they are used to measure?
Despite the simple purposes to which student evaluation averages are put, the answer to that question is extremely complex. The human tendency, of course, is for professors with high evaluations to pat themselves on the back for their talent and hard work and to credit the keen insight and good judgment of their students, while those with lower evaluations suffer quietly with their low self-esteem. Either way, most faculty, in my experience, have been thoroughly acculturated to the idea that student evaluation scores and teaching effectiveness are interchangeable terms.
Hundreds of scholarly articles have examined which factors influence student evaluations. The large majority explore factors unrelated to teaching effectiveness, such as those cited in this short piece, which lists 15 research findings on ways to improve student evaluations that one would expect to be only tangentially related to teaching outcomes.
The most valid way to determine whether teaching effectiveness and student evaluation scores are related is to conduct an experiment that holds non-teaching factors (such as class times and, most importantly, course material) constant and then compares the performance of students in different instructors’ classes with the student evaluations those instructors receive.
Fortunately, this experiment has been conducted several times, for instance in courses with multiple sections taught by different instructors and a standardized test at the end. In my recent book I mention a meta-study reporting that, on average, about 20% of the variation in student performance is reflected in differences in teaching evaluations. This source puts the figure at 16% to 25%. Thus, while it is fair to say that student evaluations are related to teaching effectiveness, probably at least 75% of what they reflect is something other than teaching effectiveness.
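If the reported figure is read as the share of variance the two measures have in common (the squared correlation, R²), which is an assumption about how these studies express it, the arithmetic behind “at least 75%” works out as follows:

$$
1 - R^2 \in [\,1 - 0.25,\; 1 - 0.16\,] = [\,0.75,\; 0.84\,], \qquad r = \sqrt{R^2} \in [\,0.40,\; 0.50\,]
$$

In other words, a correlation of roughly 0.4 to 0.5 between evaluations and measured learning still leaves three-quarters or more of the variation in evaluation scores to be explained by something else.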
If variation in student evaluations is 75% “something else,” what are the implications? Beyond the issue of fairness, does it matter that faculty with average student evaluations in the high 4s are consistently deemed more effective teachers than those in the high 3s? After all, student satisfaction certainly plays a role in students’ decisions, such as whether to continue in a field and, more cynically, whether to donate money to their alma mater later on.
The main problem is that the way student evaluations are used creates perverse incentives for faculty members that can actually reduce learning in the classroom. I won’t soon forget a conversation I had last summer when a faculty member at my university gave a talk to a group of us who were training to incorporate technology into the classroom more effectively. It went something like this:
Faculty member: “After I introduced an interactive online component to the class, the students came in better prepared and more willing to talk about the material. I could tell they really got things in comparison to earlier semesters. The bad thing, though, was that my student evaluations actually went down right before I went up for full professor. I would hesitate to do it again.”
Me: “So, just to be clear, you believe the learning outcome was better, but your evaluations went down?”
Faculty member: “Yes. And when you go up for full professor, you really want to be a perfectionist.”
Another example is a study mentioned recently in the New York Times which found that (surprise!) frequent quizzes improve student learning. What caught my attention, though, was a quotation from one of the authors: “Sam and I usually get really high course evaluations . . . these were the lowest ever.” Depending on how evaluations are used at the researchers’ university, they may face the same choice that instructors at institutions across the country face every day: should I encourage student learning or pursue enlightened self-interest?
A related set of questions involves whether basing important decisions on student evaluations is associated with grade inflation and the dumbing down of course content. It’s unclear whether students actually reward, with higher evaluations, instructors who consciously (or, I think, often unconsciously) grade a bit more easily or require less thinking and effort, but it is clear that faculty often believe this to be the case (see this recent survey). That perception alone is enough to suggest that a problem almost certainly exists, because faculty who hold it are likely to act on it.
If curricula are being “dumbed down,” that may be reflected in a bombshell study from 2011 suggesting that students make very little progress in critical thinking skills during their time in college. The courses that help students become better thinkers are often exactly the kind that liberal arts institutions specialize in. These are also the institutions that emphasize teaching and, all too often, the student evaluations used to assess its quality. A writer in the Wall Street Journal recently drew an analogy between the student evaluation process and letting restaurants grade their health inspectors. Suffice it to say, I don’t think people would want to eat out as often.
The problem for the country as a whole is that the misuse of student evaluations of teaching takes place almost everywhere. Colleges and universities compete with one another for students, and people tend to judge their academic experiences in many classes by how much they enjoyed themselves. If colleges want to compete, they have to, like any other enterprise, give the consumers what they want.
Many of our present and future leaders graduated from colleges run with the goal of maximizing student satisfaction. They govern our lives without ever having been challenged in ways that would help them weigh evidence and understand complex issues like global climate change or fiscal policy. The country and the world are, and will be, the worse for it. All because of a little bubble sheet.