Multistage Testing

December 29, 2022

Multistage testing (MST) is a type of computerized adaptive testing (CAT). This means it is an exam delivered on computers that dynamically personalizes the test for each examinee or student. Typically, this is done with respect to the difficulty of the questions, by making the exam easier for lower-ability students and harder for high-ability students. Targeting difficulty in this way makes the test shorter and more accurate, and it provides other benefits described below. This post provides more information on multistage testing so you can evaluate whether it is a good fit for your organization.

Already interested in MST and want to implement it? Contact us to talk to one of our experts and get access to our powerful online assessment platform, where you can create your own MST and CAT exams in a matter of hours.

What is multistage testing?

Like CAT, multistage testing adapts the difficulty of the items presented to the student. But while item-by-item CAT adapts after every single item using item response theory (IRT), multistage testing adapts in blocks of items. That is, CAT will deliver one item, score it, pick a new item, score it, pick a new item, and so on. Multistage testing will deliver a block of items, such as 10, score them, and then deliver another block of 10.
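To make the block-by-block loop concrete, here is a minimal Python sketch of a two-stage MST under stated assumptions: each item is represented only by a hypothetical difficulty value, the examinee is a deterministic toy who answers an item correctly when its difficulty is at or below their ability, and routing uses number-correct cut scores on the first block. The block names and cut scores (0-3, 4-6, 7-10) are illustrative assumptions, not recommendations. The point is that the routing decision happens once, after the whole first block is scored, rather than after every item.

```python
from typing import Callable, Dict, List

# Hypothetical number-correct cut scores for a 10-item routing block.
ROUTING_CUTS = [(0, 3, "easy"), (4, 6, "medium"), (7, 10, "hard")]


def route(number_correct: int) -> str:
    """Map a routing-block number-correct score to a second-stage block."""
    for low, high, block in ROUTING_CUTS:
        if low <= number_correct <= high:
            return block
    raise ValueError(f"score out of range: {number_correct}")


def run_two_stage_mst(
    routing_items: List[float],
    stage2_blocks: Dict[str, List[float]],
    answer: Callable[[float], int],
) -> Dict[str, object]:
    """Deliver and score the whole routing block, then deliver one whole block."""
    routing_score = sum(answer(b) for b in routing_items)  # scored once, as a block
    block = route(routing_score)                            # single routing decision
    stage2_score = sum(answer(b) for b in stage2_blocks[block])
    return {"routing_score": routing_score, "block": block, "stage2_score": stage2_score}


# Items are represented only by hypothetical difficulty values.
routing_items = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]
stage2_blocks = {
    "easy": [-2.5 + 0.25 * i for i in range(10)],
    "medium": [-1.0 + 0.25 * i for i in range(10)],
    "hard": [0.5 + 0.25 * i for i in range(10)],
}

# Toy deterministic examinee with ability 0.2.
result = run_two_stage_mst(routing_items, stage2_blocks, lambda b: int(b <= 0.2))
print(result)
```

With an ability of 0.2, this toy examinee answers 6 of the 10 routing items correctly and is routed to the medium block, where they answer 5 items correctly. A real MST would use IRT-based routing and scoring rather than raw number-correct, but the control flow is the same.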

A multistage test is usually assembled into one or more panels. Each panel starts with a single routing test, or routing stage, and students are then directed to modules of different difficulty levels in the subsequent stages. The number of modules in each stage is often used to describe the design; the example on the right is a 1-3-3 design, with one routing module followed by two stages of three modules each. Unlike CAT, there are only a few potential paths through the test, unless each stage has a pool of available testlets.
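The "only a few potential paths" point can be checked by enumerating the routes through a 1-3-3 panel. This is a small sketch with hypothetical module labels; the option to forbid jumps of two difficulty levels (easy straight to hard, or the reverse) between later stages is an assumption included for illustration, since designs differ on which paths they allow.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical 1-3-3 panel: a routing module R, then easy/medium/hard modules
# at stages 2 and 3. The first letter of each label encodes its difficulty level.
DESIGN = [["R"], ["E2", "M2", "H2"], ["E3", "M3", "H3"]]
LEVEL = {"E": 0, "M": 1, "H": 2}


def panel_paths(design: List[List[str]], adjacent_only: bool = False) -> List[Tuple[str, ...]]:
    """Enumerate every route through the panel, optionally forbidding jumps of two levels."""
    paths = list(product(*design))
    if not adjacent_only:
        return paths

    def is_adjacent(path: Tuple[str, ...]) -> bool:
        levels = [LEVEL[module[0]] for module in path[1:]]  # skip the routing module
        return all(abs(x - y) <= 1 for x, y in zip(levels, levels[1:]))

    return [p for p in paths if is_adjacent(p)]


print(len(panel_paths(DESIGN)))                      # 9
print(len(panel_paths(DESIGN, adjacent_only=True)))  # 7
```

Nine (or seven) paths is tiny compared with the number of possible item sequences in item-by-item CAT, which is why MST forms can be reviewed in advance.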

As with item-by-item CAT, multistage testing is almost always built on IRT, which serves as the psychometric paradigm, the basis for routing decisions, and the scoring method. This is because IRT can score examinees on a common scale regardless of which items they see, which a classical test theory number-correct score cannot do when examinees see forms of different difficulty.
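A minimal sketch of why IRT gives a common scale, assuming the two-parameter logistic (2PL) model and expected a posteriori (EAP) scoring with a standard normal prior on a grid. These are common choices, not necessarily the ones a given platform uses, and the item parameters are hypothetical. Two examinees each answer 3 of 5 items correctly, one on an easy set and one on a harder set: a number-correct score treats them identically, while the IRT estimate places the second examinee higher on the same θ scale.

```python
import math
from typing import List, Tuple

Item = Tuple[float, float]  # (a, b): discrimination and difficulty under the 2PL


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_theta(responses: List[int], items: List[Item], n_points: int = 81) -> float:
    """EAP estimate of theta on a grid from -4 to 4 with a standard normal prior."""
    num = den = 0.0
    for k in range(n_points):
        theta = -4.0 + 8.0 * k / (n_points - 1)
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            weight *= p if u == 1 else 1.0 - p  # multiply in the likelihood
        num += theta * weight
        den += weight
    return num / den


# Hypothetical item parameters: the harder set is the easier set shifted up by 2.
easy_items = [(1.0, -1.5), (1.0, -1.0), (1.0, -0.5), (1.2, -1.0), (0.8, -1.2)]
hard_items = [(a, b + 2.0) for a, b in easy_items]
responses = [1, 1, 1, 0, 0]  # 3 of 5 correct for both examinees

theta_a = eap_theta(responses, easy_items)
theta_b = eap_theta(responses, hard_items)
print(round(theta_a, 2), round(theta_b, 2))
```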

To learn more about MST, I recommend this book.

Why multistage testing?

Item-by-item CAT is not the best fit for all assessments, especially those that naturally tend towards testlets, such as language assessments where there is a reading passage with 3-5 associated questions.

Multistage testing allows you to realize some of the well-known benefits of adaptive testing (see below), with more control over content and exposure. In addition to controlling content at the examinee level, it can also make it easier for the organization to manage item bank usage.

How do I implement multistage testing?

1. Develop your item banks using items calibrated with item response theory

2. Assemble a test with multiple stages, defining pools of items in each stage as testlets

3. Evaluate the test information functions for each testlet

4. Run simulation studies to validate the delivery algorithm with your predefined testlets

5. Publish for online delivery
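Step 4 can be sketched as a small Monte Carlo study: draw true abilities, simulate responses with the IRT model, route and score each simulee, and compare estimates with true values. Everything specific here is an assumption for illustration: a 1-3 panel with hypothetical 2PL item parameters, number-correct routing cuts, and EAP scoring with a standard normal prior. The EAP scorer is repeated so the block runs on its own.

```python
import math
import random
from typing import Dict, List, Tuple

Item = Tuple[float, float]  # (a, b) under the 2PL model


def p_correct(theta: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_theta(responses: List[int], items: List[Item], n_points: int = 81) -> float:
    """EAP estimate on a -4..4 grid with a standard normal prior."""
    num = den = 0.0
    for k in range(n_points):
        theta = -4.0 + 8.0 * k / (n_points - 1)
        weight = math.exp(-0.5 * theta * theta)
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            weight *= p if u == 1 else 1.0 - p
        num += theta * weight
        den += weight
    return num / den


# Hypothetical 1-3 panel: a 10-item routing module and three 10-item modules.
ROUTING: List[Item] = [(1.0, -0.9 + 0.2 * i) for i in range(10)]
MODULES: Dict[str, List[Item]] = {
    "easy": [(1.2, -1.65 + 0.1 * i) for i in range(10)],
    "medium": [(1.2, -0.45 + 0.1 * i) for i in range(10)],
    "hard": [(1.2, 0.75 + 0.1 * i) for i in range(10)],
}


def route(number_correct: int) -> str:
    """Hypothetical number-correct cut scores on the routing module."""
    if number_correct <= 3:
        return "easy"
    return "medium" if number_correct <= 6 else "hard"


def simulate(n_examinees: int = 500, seed: int = 1):
    """Generate simulees, deliver the MST, and compare estimates with true theta."""
    rng = random.Random(seed)
    errors: List[float] = []
    counts = {name: 0 for name in MODULES}
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # true ability drawn from N(0, 1)
        stage1 = [int(rng.random() < p_correct(theta, a, b)) for a, b in ROUTING]
        module = route(sum(stage1))
        stage2 = [int(rng.random() < p_correct(theta, a, b)) for a, b in MODULES[module]]
        estimate = eap_theta(stage1 + stage2, ROUTING + MODULES[module])
        errors.append(estimate - theta)
        counts[module] += 1
    bias = sum(errors) / n_examinees
    rmse = math.sqrt(sum(e * e for e in errors) / n_examinees)
    return bias, rmse, counts


bias, rmse, counts = simulate()
print(f"bias={bias:.3f} rmse={rmse:.3f} routing={counts}")
```

In a real study you would also look at conditional bias and RMSE across the ability range, and at how many simulees take each path, before publishing.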

Our industry-leading assessment platform manages much of this process for you. The image to the right shows our test assembly screen where you can evaluate the test information functions for each testlet.
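Outside of any particular platform, the test information function of a testlet can be computed directly. This sketch assumes the 2PL model, where item information is a²P(1 − P) and a module's information is the sum over its items; the module parameters are hypothetical. It reports where each module's information peaks on a coarse θ grid and the approximate standard error, 1/√information, at that point.

```python
import math
from typing import Dict, List, Tuple

Item = Tuple[float, float]  # (a, b) under the 2PL model


def item_information(theta: float, a: float, b: float) -> float:
    """2PL item information: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def module_information(theta: float, items: List[Item]) -> float:
    """Information of a testlet/module is the sum of its items' information."""
    return sum(item_information(theta, a, b) for a, b in items)


# Hypothetical second-stage modules targeted at low, middle, and high ability.
MODULES: Dict[str, List[Item]] = {
    "easy": [(1.2, -1.65 + 0.1 * i) for i in range(10)],
    "medium": [(1.2, -0.45 + 0.1 * i) for i in range(10)],
    "hard": [(1.2, 0.75 + 0.1 * i) for i in range(10)],
}
GRID = [-3.0 + 0.5 * k for k in range(13)]  # theta values from -3 to 3


def peak_theta(items: List[Item]) -> float:
    """Grid point where the module is most informative."""
    return max(GRID, key=lambda t: module_information(t, items))


for name, items in MODULES.items():
    peak = peak_theta(items)
    info = module_information(peak, items)
    print(f"{name:>6}: peak at theta={peak:+.1f}, information={info:.2f}, SE={1 / math.sqrt(info):.2f}")
```

Well-separated peaks, as here, are what you want to see: each module measures best in the ability range of the students routed to it.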

Benefits of multistage testing

There are a number of benefits to this approach, which are mostly shared with CAT.

Shorter exams: Because difficulty is targeted, less testing time is spent on items that are far too easy or too hard for the student
Increased security: There are more possible configurations than in a linear exam, where everyone sees the same set of items
Increased engagement: Lower-ability students are not discouraged, and high-ability students are not bored
Control of content: Modules are assembled before delivery, so their content can be reviewed in advance; CAT has some content control algorithms, but they are sometimes not sufficient
Supports testlets: Item-by-item CAT does not naturally support testlets, such as a reading passage with 5 questions
Allows for review: CAT does not usually allow review (going back to a previous question to change an answer), while MST can allow review within each stage
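The "shorter exams" claim follows from how item information works. As a rough sketch, assume identical 2PL items with discrimination a = 1 and the usual large-sample approximation SE ≈ 1/√information. An item is most informative when its difficulty matches the examinee's ability, so far fewer well-targeted items are needed to reach a given precision. The target SE of 0.3 and the two-logit offset are arbitrary illustrative values.

```python
import math


def item_information(theta_minus_b: float, a: float = 1.0) -> float:
    """2PL item information as a function of the distance between ability and difficulty."""
    p = 1.0 / (1.0 + math.exp(-a * theta_minus_b))
    return a * a * p * (1.0 - p)


def items_needed(theta_minus_b: float, target_se: float = 0.3, a: float = 1.0) -> int:
    """Identical items needed so that SE = 1 / sqrt(information) reaches target_se."""
    required_information = 1.0 / target_se ** 2
    return math.ceil(required_information / item_information(theta_minus_b, a))


print(items_needed(0.0))  # items right at the examinee's level: 45
print(items_needed(2.0))  # items two logits too easy: 106
```

Under these assumptions, well-targeted items reach the same precision with less than half as many items.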

Examples of multistage testing

MST is widely used in language assessment and, more broadly, in educational assessment, such as benchmark K-12 exams, university admissions, and language placement/certification. One of the most famous examples is the SAT from the College Board, which is moving to an MST approach in 2023.

Because of the complexity of item response theory, most organizations that implement MST have a full-time psychometrician on staff. If your organization does not, we would love to discuss how we can work together.

Nathan Thompson

Nathan Thompson earned his PhD in Psychometrics from the University of Minnesota, with a focus on computerized adaptive testing. His undergraduate degree was from Luther College, with a triple major in Mathematics, Psychology, and Latin. He is primarily interested in using AI and software automation to augment and replace the work done by psychometricians, an interest that has given him extensive experience in software design and programming. Dr. Thompson has published over 100 journal articles and conference presentations, but his favorite remains the one at https://scholarworks.umass.edu/pare/vol16/iss1/1/.

Ready to talk to an assessment expert?

Get in touch, and we'll meet to discuss how we can improve your exam development, delivery, and psychometrics!
