educational assessment

Desperation is seldom fun to see.

Some years ago, having recently released our online marking functionality I was reviewing some of the functionality in a customer workspace I was intrigued to see “Beyonce??” mentioned in a marker’s comments on an essay. The student’s essay was evaluating some poetry and had completely misunderstood the use of metaphor in the poem in question. The student also clearly knew that her interpretation was way off, but didn’t know how and had reached the end of her patience. So after a desultory attempt at answering, with a cry from the heart, reminiscent of William Wallace’s call for freedom, she wrote “BEYONCE” with about seventeen exclamation points. It felt good to see that her spirit was not broken, and it was a moment of empathy that drove home the damage that standardized tests are inflicting on our students. That vignette is playing itself out millions of time each year in this country, the following explains why.

What are “Standardized Tests”?

We use standardized tests for a variety of reasons, but underlying every reason (curriculum effectiveness, college/career preparedness, teacher effectiveness, etc.) is the understanding that the test is measuring what a student has learned. In order to know how all our students are doing, we give them all standardized tests, meaning every student receives essentially the same set of tests. So, a standardized test is a test where all students take essentially the same test. This is a difficult endeavor given the wide range of students and number of tests, and raises the question “How do we do this reliably and in a reasonable amount of time?”

Accuracy and Difficulty vs Length

We all want tests to reliably measure the students’ learning. In order to make these tests reliable, we need to supply questions of varying difficulty, from very easy to very difficult, to cover a wide range of abilities. In order to reduce the length of the test, most of the questions fall in the medium easy to medium difficulty range because that is where most of the students’ ability level will fall. So the test that best balances length and accuracy for the whole population should be constructed such that the amount of questions of any difficulty is proportionate to the number of students of that ability.

Why are most questions in the medium difficulty range? Imagine creating a test to measure 10th graders’ math ability. A small number of the students might have a couple years of calculus. If the test covered those topics, imagine the experience of most students who would often not even understand the notation in the question. Frustrating, right? On the other hand, if the test was also constructed to measure students with only rudimentary math knowledge, these average to advanced students would be frustrated and bored from answering a lot of questions on basic math facts. The solution most organizations use is to present only a few questions that are really easy or difficult, and accept that this score is not as accurate as they would prefer for the students at either end of the ability range.

These Tests are Inaccurate and Mean Spirited

The problem is that while this might work OK for a lot of kids, it exacts a pretty heavy toll on others. Almost one in five students will not know the answer to 80% of the questions on these tests, and scoring about 20% on a test certainly feels like failing. It feels like failing every time a student takes such a test. Over the course of an academic career, students in the bottom quintile will guess on or skip 10,000 questions. That is 10,000 times the student is told that school, learning, or success is not for them. Even biasing the test to be easier only makes a slight improvement.

Computerized Adaptive Testing, Test Performance with Bell Curve

The shaded area represents students who will miss at least 80% of questions.

It isn’t necessarily better for the top students whose every testing experience assures them that they are already very successful when the reality is that they are likely being outperformed by a significant percentage of their future colleagues.

In other words, at both ends of the Bell Curve, we are serving our students very poorly, inadvertently encouraging lower performing students to give up (there is some evidence that the two correlate) and higher performing students to take it easy. It is no wonder that people dislike standardized tests.

There is a Solution

A computerized adaptive test (CAT) solves all the problems outlined above. Properly constructed, a CAT has the ability to make the following faster, fairer, and more valid:

  • Every examinee completes the test in less time (fast)
  • Every examinee gets a more accurate score (valid)
  • Every examinee receives questions tuned to their ability so gets about half right (fair)

Given all the advantages of CAT, it may seem hard to believe that they are not used more often. While they are starting to catch on, it is not fast enough given the heavy toll that the old methods exact on our students. It is true that few testing providers can enable CATs, but that is simply making an excuse. If a standardized test is delivered to as few as 500 students it can be made adaptive. It probably isn’t, but it could be. All that is needed are computers or tablets, an Internet connection, and some effort. We should expect more.

How can my organization implement CAT?

While CAT used to only be feasible for large organizations that tested hundreds of thousands or millions of examinees per year, a number of advances have changed this landscape.  If you’d like to do something about your test, it might be worthwhile for you to evaluate CAT.  We can help you with that evaluation; if you’d like to chat, here is a link to schedule a meeting. Or contact me if you’d like to discuss the math or related ideas please drop me a note.

The following two tabs change content below.