Computerized adaptive testing (CAT) has been around since the 1970s and is well known for the benefits it can provide, most notably that it can reduce testing time by 50-90% with no loss of measurement precision. Developing a sound, defensible CAT is not easy, but our goal is to make it as easy as possible – that is, everything you need is available in a clean software UI and you never have to write a single line of code.
Here, we outline the software, data analysis, and project management steps needed to develop a computerized adaptive test that aligns with best practices and international standards.
This approach is based on the Thompson and Weiss (2011) model; refer to that paper for a general treatment of CAT development, especially the use of simulation studies. Also, this article assumes you have mastered the concepts of item response theory and CAT, including:
- IRT models (e.g., 3PL, rating scale model)
- Item response functions
- Item information functions
- Theta estimation
- Conditional standard error of measurement
- Item selection algorithms
- Termination criterion.
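As a quick refresher on how several of these concepts fit together, here is a minimal Python sketch of the 3PL item response function, its item information function, and the conditional standard error of measurement. The function names, the scaling constant D = 1.7, and the three example items are assumptions made for illustration; they are not taken from FastTest, Xcalibre, or CATSim.

```python
import math

def irf_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c, D=1.7):
    """Fisher information contributed by one 3PL item at theta."""
    p = irf_3pl(theta, a, b, c, D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def csem(theta, items):
    """Conditional SEM at theta: 1 / sqrt(sum of item information)."""
    total_info = sum(info_3pl(theta, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(total_info)

# Hypothetical (a, b, c) parameters for three items.
example_items = [(1.0, -1.0, 0.20), (1.2, 0.0, 0.25), (0.8, 1.0, 0.20)]

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(csem(theta, example_items), 3))
```

With only three items the standard error is large at every theta; it shrinks as more informative items are administered, which is the quantity a CAT termination criterion typically monitors.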
If IRT is new to you, please visit these resources
If you have some background in IRT but CAT is new, please visit these resources
Overview: Steps to develop an adaptive test
There are nine steps to developing a CAT on our industry-leading platform, FastTest:
| Step | Work to be done | Software |
|---|---|---|
| 1 | Perform feasibility and planning studies | CATSim |
| 2 | Develop item bank | FastTest |
| 3 | Pilot items on 100-2000 examinees | FastTest |
| 4 | Perform item analysis and other due diligence | Iteman/Xcalibre |
| 6 | Upload IRT parameters into FastTest | FastTest |
We’ll now talk a little more about each of these.
Perform feasibility and planning studies
The first step, before doing anything else, is to confirm that your assessment meets the basic requirements of CAT. For example, you need to have a decent-sized item bank, data on hundreds of examinees (or the future opportunity to collect it), and items that can be scored in real time. See this paper for a full discussion.
If there are no hurdles, then the next step is to perform Monte Carlo simulations that help you scope out the project, using the CATSim software. For example, you might simulate CATs with three sizes of the item bank, so you have a better idea of how many items to write.
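To make the idea of a bank-size simulation concrete, the following is a rough Python sketch of a Monte Carlo CAT study. It is not how CATSim works internally; it assumes a 2PL model, randomly generated item parameters, maximum-information item selection, EAP scoring on a coarse grid, and a standard-error termination rule of 0.30. All names and parameter ranges are hypothetical.

```python
import math
import random

D = 1.7
GRID = [g / 5.0 for g in range(-20, 21)]  # theta points from -4 to 4

def prob(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def info(theta, a, b):
    """2PL item information at theta."""
    p = prob(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p)

def make_bank(n_items, rng):
    """Generate a fake bank of (a, b) pairs; the ranges are assumptions."""
    return [(rng.uniform(0.7, 1.5), rng.gauss(0.0, 1.0)) for _ in range(n_items)]

def eap(responses):
    """EAP estimate and posterior SD under a standard normal prior."""
    weights = []
    for g in GRID:
        w = math.exp(-0.5 * g * g)
        for (a, b), correct in responses:
            p = prob(g, a, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(g * w for g, w in zip(GRID, weights)) / total
    var = sum((g - mean) ** 2 * w for g, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(true_theta, bank, rng, se_target=0.30, max_items=50):
    """Run one simulated CAT; return (test length, theta estimate, SE)."""
    available = list(bank)
    responses = []
    theta_hat, se = 0.0, 1.0
    while available and len(responses) < max_items and se > se_target:
        # Pick the unused item with the most information at the current estimate.
        item = max(available, key=lambda it: info(theta_hat, *it))
        available.remove(item)
        correct = rng.random() < prob(true_theta, *item)
        responses.append((item, correct))
        theta_hat, se = eap(responses)
    return len(responses), theta_hat, se

def mean_length(bank_size, n_examinees=100, seed=1):
    """Average test length over simulated examinees for one bank size."""
    rng = random.Random(seed)
    bank = make_bank(bank_size, rng)
    lengths = [simulate_cat(rng.gauss(0.0, 1.0), bank, rng)[0]
               for _ in range(n_examinees)]
    return sum(lengths) / len(lengths)

for size in (50, 150, 300):
    print(size, round(mean_length(size), 1))
```

Comparing the average test length across the three bank sizes gives a first indication of whether writing more items would pay off; a real planning study would also look at precision across the theta range and at item exposure.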
Develop item bank
Now that you have some idea of how many items you need and in which ranges of difficulty and/or content constraints, you can leverage the powerful item authoring functionality of FastTest, as well as the item review and workflow management to ensure that subject matter experts are performing quality assurance on each other's work.
Pilot items on 100-2000 examinees
Because IRT requires that you have data from real examinees to calibrate item difficulty, you need to get that data. To do so, create test(s) in FastTest to deliver all your items in a manner that meets your practical situation. That is, some organizations have a captive audience and might be able to have 500 people take all 300 items in their bank next week. Other organizations might need to create 4 linear forms of 100 items with some overlap. Others might be constrained to still use current test forms and only tack on 20 new items onto the end of every examinee’s test.
Of course, some of you might have existing data. That is, you might have spreadsheets of data from a previous test delivery system, paper-based delivery, or perhaps even already have your IRT parameters from past efforts. You can use those too.
If you do deliver the pilot phase with FastTest, you now need to export the data to be analyzed in psychometric analytic software. FastTest makes it easy to export both the data matrix and the item metadata needed for Xcalibre’s control file.
Perform item analysis, DIF, and other due diligence
The purpose of this step is to ensure that items included in your future CAT are of high quality. Any steps that your organization normally takes to review item performance are still relevant. This typically includes a review of items with low point-biserial correlations (poor discrimination), items where more examinees selected a distractor than the correct option (key flags), high or low classical P values, and differential item functioning (DIF) flags. Our Iteman software is designed exactly for this process. If you have a FastTest account, the Iteman analysis report is now available at a single click. If not, Iteman is also available as a standalone program.
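For readers who want to see what two of these classical statistics look like in practice, here is a small Python sketch that computes the P value and a corrected point-biserial (item versus rest-of-test score) for each item in a scored 0/1 matrix. It is only an illustration, not a description of Iteman; the flagging thresholds and the tiny data matrix are assumptions chosen for the example.

```python
import math

def p_value(item_scores):
    """Classical difficulty: proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, criterion):
    """Pearson correlation between a 0/1 item score and a criterion score."""
    n = len(item_scores)
    mx = sum(item_scores) / n
    my = sum(criterion) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(item_scores, criterion))
    vx = sum((x - mx) ** 2 for x in item_scores)
    vy = sum((y - my) ** 2 for y in criterion)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when there is no variance
    return cov / math.sqrt(vx * vy)

def analyze(matrix, p_low=0.20, p_high=0.95, rpbis_min=0.10):
    """Return one dict per item; thresholds are illustrative assumptions."""
    totals = [sum(row) for row in matrix]
    report = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        rest = [t - x for t, x in zip(totals, col)]  # exclude the item itself
        p = p_value(col)
        r = point_biserial(col, rest)
        flags = []
        if p < p_low:
            flags.append("low P")
        if p > p_high:
            flags.append("high P")
        if math.isnan(r) or r < rpbis_min:
            flags.append("low rpbis")
        report.append({"item": j + 1, "p": p, "rpbis": r, "flags": flags})
    return report

# Hypothetical scored responses: rows are examinees, columns are items.
scored = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

for row in analyze(scored):
    print(row)
```

In this toy matrix the fourth item is answered correctly mostly by low scorers, so its point-biserial is negative and it is flagged, which is the kind of pattern that often points to a miskeyed item.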
Calibrate with Xcalibre
Because CAT algorithms rely entirely on IRT parameters (unless you are doing special algorithms like diagnostic measurement models or measurement decision theory), we need to calculate the IRT parameters and get them into our testing platform. If you have delivered all your items in a single block to examinees, like the example above with 500 people, then that single matrix can just be analyzed with Xcalibre. If you have multiple forms, LOFT, or the “tack-on” approach, you need to worry about IRT equating.
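To illustrate what equating involves when forms share anchor items, here is a short Python sketch of mean/sigma linking, one of the simpler methods for placing separately calibrated parameters on a common scale. This is an assumption-laden illustration and not necessarily the method Xcalibre applies; the function names and anchor values are hypothetical.

```python
import statistics

def mean_sigma_constants(b_anchor_new, b_anchor_base):
    """Slope A and intercept B mapping the new scale onto the base scale,
    computed from the difficulty (b) estimates of the same anchor items."""
    A = statistics.pstdev(b_anchor_base) / statistics.pstdev(b_anchor_new)
    B = statistics.mean(b_anchor_base) - A * statistics.mean(b_anchor_new)
    return A, B

def rescale_item(a, b, A, B):
    """Transform one item's (a, b) onto the base scale; c is unchanged."""
    return a / A, A * b + B

# Hypothetical anchor-item difficulties from two separate calibrations.
b_new = [-1.0, 0.0, 0.5, 1.5]
b_base = [-0.9, 0.3, 0.9, 2.1]

A, B = mean_sigma_constants(b_new, b_base)
print(round(A, 3), round(B, 3))
print(rescale_item(1.1, 0.4, A, B))
```

The anchor items need to have varying difficulties in both calibrations; otherwise the standard deviation is zero and the slope is undefined.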
Upload IRT parameters into FastTest
Xcalibre will provide all the IRT parameters in a spreadsheet, in addition to the primary Word report. Import them into your testing platform. This will associate the IRT parameters with all the items in your CAT pool. FastTest has functionality to streamline this process.
Now that you have established your final pool of items and calculated their IRT parameters, you need to choose the algorithms you will use when you publish the CAT. That is, you need to decide on the Initial Theta rule, the Item Selection rule (including sub-algorithms such as content or exposure constraints), and the Termination Criterion. To establish these, you need to perform more simulation studies, but now with your final bank as the input rather than a fake bank from the Monte Carlo simulations. The most important aspect is determining the tradeoff between test length and precision: a termination criterion that demands more precise scores will produce longer tests, and with a CAT you can control exactly where that balance falls.
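The length-versus-precision tradeoff can be sketched with a few lines of Python. The example below assumes a 2PL model and a made-up bank of 61 items, and asks how many of the most informative items at a given theta are needed before the standard error drops below several candidate targets. Because it selects items at the true theta rather than at an interim estimate, it is a best-case lower bound on test length, not a substitute for a full simulation study.

```python
import math

D = 1.7

def info_2pl(theta, a, b):
    """2PL item information at theta."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)

def items_needed(theta, bank, se_target):
    """Smallest number of most-informative items giving SEM <= se_target
    at theta, or None if the bank cannot reach that precision."""
    infos = sorted((info_2pl(theta, a, b) for a, b in bank), reverse=True)
    total = 0.0
    for n, item_info in enumerate(infos, start=1):
        total += item_info
        if 1.0 / math.sqrt(total) <= se_target:
            return n
    return None

# Hypothetical bank: a = 1.0, difficulties evenly spaced from -3 to 3.
bank = [(1.0, k / 10.0) for k in range(-30, 31)]

for target in (0.40, 0.30, 0.25):
    print(target, items_needed(0.0, bank, target))
```

Tightening the standard-error target raises the number of items required, and a target the bank cannot support returns None, which signals that more items are needed in that region of the scale.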
Assemble a “test form” in FastTest that consists of all the items you intend to use in your CAT tool. Then select CAT as the delivery method in the Test Options screen, and you’ll see a screen where you can input the results from your CATSim validity study for the three important CAT algorithms.
Your CAT is now ready to go! Before bringing in real students, however, we recommend that you take it a few times as QA. Do so with certain students in mind, such as a very low-ability student, a very high-ability student, or one near the cut score (if you have one). To peek under the hood at the CAT algorithm, you can export the Examinee Test Detail Report from FastTest, which provides an item-by-item picture of how the CAT proceeds.
As you can see, the development of an adaptive test is not easy and can take months even if you have all the software and expertise you need. But for something so important, that could be used to make important decisions about people, this is absolutely warranted.
However, if you have all the data you need today, there’s no reason that it should take months to develop an adaptive test – assessment platforms should make it easy enough for you to do so in an afternoon, which FastTest absolutely does.