
How do I develop a computerized adaptive test?


Computerized adaptive testing (CAT) has been around since the 1970s and is well known for the benefits it can provide, most notably that it can reduce testing time by 50-90% with no loss of measurement precision. Developing a sound, defensible CAT is not easy, but our goal is to make it as easy as possible: everything you need is available in a clean software UI, and you never have to write a single line of code.

Here, we outline the software, data analysis, and project management steps needed to develop a computerized adaptive test that aligns with best practices and international standards.

This approach is based on the Thompson and Weiss (2011) model; refer to that paper for a general treatment of CAT development, especially the use of simulation studies. Also, this article assumes you have mastered the concepts of item response theory and CAT, including:

  • IRT models (e.g., 3PL, rating scale model)
  • Item response functions
  • Item information functions
  • Theta estimation
  • Conditional standard error of measurement
  • Item selection algorithms
  • Termination criterion.
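As a quick refresher on three of these concepts, here is a minimal Python sketch of the 3PL item response function, its item information function, and the conditional standard error of measurement. You never need to write this yourself to build a CAT in our software; it is only an illustration. The function names and the scaling constant D = 1.7 are assumptions made for this example, not part of any particular product.

```python
import math

def irf_3pl(theta, a, b, c, D=1.7):
    """Item response function: probability of a correct answer under the 3PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c, D=1.7):
    """Item information function for a 3PL item at a given theta."""
    p = irf_3pl(theta, a, b, c, D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def csem(theta, items):
    """Conditional SEM: 1 / sqrt(sum of item information at theta).

    items is a list of (a, b, c) tuples for the items administered.
    """
    test_info = sum(info_3pl(theta, *item) for item in items)
    return 1.0 / math.sqrt(test_info)
```

For example, `csem(0.0, [(1.0, 0.0, 0.0)] * 4)` gives the standard error at theta = 0 after four identical items with no guessing; adding more informative items drives that value down, which is the idea a CAT exploits.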


If IRT is new to you, please visit these resources

If you have some background in IRT but CAT is new, please visit these resources

Overview: Steps to develop an adaptive test

There are nine steps to developing a CAT on our industry-leading platform, FastTest:

Step | Work to be done | Software
1 | Perform feasibility and planning studies | CATSim
2 | Develop item bank | FastTest
3 | Pilot items on 100-2000 examinees | FastTest
4 | Perform item analysis and other due diligence | Iteman/Xcalibre
5 | IRT calibration | Xcalibre
6 | Upload IRT parameters into FastTest | FastTest
7 | Validity study | CATSim
8 | Publish CAT | FastTest
9 | Quality assurance | FastTest


We’ll now talk a little more about each of these.


Perform feasibility and planning studies

The first step, before doing anything else, is to confirm that your assessment meets the basic requirements of CAT. For example, you need to have a decent-sized item bank, data on hundreds of examinees (or the future opportunity to collect it), and items that are scorable in real time. See this paper for a full discussion.

If there are no hurdles, then the next step is to perform Monte Carlo simulations that help you scope out the project, using the CATSim software. For example, you might simulate CATs with three sizes of the item bank, so you have a better idea of how many items to write.
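To make the idea of a Monte Carlo planning study concrete, here is a rough Python sketch that simulates CATs on randomly generated banks of several sizes and reports the average test length and the RMSE of the theta estimates. This is not how CATSim works internally; it is a simplified illustration. The parameter distributions in `make_bank`, the EAP scoring on a fixed grid, the SEM target of 0.30, and all function names are assumptions chosen for this example.

```python
import math
import random

GRID = [g / 5.0 for g in range(-20, 21)]  # theta grid from -4 to 4 for EAP

def prob(theta, a, b, c):
    """3PL probability of a correct response (D = 1.7 assumed)."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def info(theta, a, b, c):
    """3PL item information at theta."""
    p = prob(theta, a, b, c)
    return (1.7 * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def make_bank(n, rng):
    """Hypothetical bank: a ~ U(0.7, 1.5), b ~ N(0, 1), c fixed at 0.2."""
    return [(rng.uniform(0.7, 1.5), rng.gauss(0, 1), 0.2) for _ in range(n)]

def simulate_cat(bank, true_theta, rng, sem_target=0.30, max_items=50):
    """One simulated CAT: max-information selection, EAP scoring,
    stop when the posterior SD reaches sem_target or max_items is hit."""
    post = [math.exp(-0.5 * g * g) for g in GRID]  # N(0, 1) prior weights
    available = list(bank)
    est, sem, used = 0.0, 1.0, 0
    while available and used < max_items and sem > sem_target:
        item = max(available, key=lambda it: info(est, *it))
        available.remove(item)
        correct = rng.random() < prob(true_theta, *item)
        for i, g in enumerate(GRID):  # update posterior with this response
            p = prob(g, *item)
            post[i] *= p if correct else 1.0 - p
        total = sum(post)
        est = sum(g * w for g, w in zip(GRID, post)) / total
        sem = math.sqrt(sum((g - est) ** 2 * w for g, w in zip(GRID, post)) / total)
        used += 1
    return est, sem, used

def study(bank_sizes=(100, 200, 300), n_examinees=100, seed=1):
    """Return {bank size: (mean test length, RMSE of theta estimates)}."""
    rng = random.Random(seed)
    results = {}
    for n in bank_sizes:
        bank = make_bank(n, rng)
        lengths, sq_errors = [], []
        for _ in range(n_examinees):
            theta = rng.gauss(0, 1)
            est, _, used = simulate_cat(bank, theta, rng)
            lengths.append(used)
            sq_errors.append((est - theta) ** 2)
        results[n] = (sum(lengths) / n_examinees,
                      math.sqrt(sum(sq_errors) / n_examinees))
    return results
```

Calling `study()` returns one pair of summary statistics per bank size. Comparing the mean test lengths across sizes gives a first feel for whether writing more items is likely to pay off, though the answer depends heavily on the parameter distributions you assume.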


Develop item bank

Now that you have some idea of how many items you need and in which ranges of difficulty and/or content constraints, you can leverage the powerful item authoring functionality of FastTest, as well as the item review and workflow management to ensure that subject matter experts are performing quality assurance on each other.


Pilot items

Because IRT requires that you have data from real examinees to calibrate item difficulty, you need to get that data. To do so, create test(s) in FastTest to deliver all your items in a manner that meets your practical situation. That is, some organizations have a captive audience and might be able to have 500 people take all 300 items in their bank next week. Other organizations might need to create 4 linear forms of 100 items with some overlap. Others might be constrained to still use current test forms and only tack 20 new items onto the end of every examinee's test.

Of course, some of you might have existing data.  That is, you might have spreadsheets of data from a previous test delivery system, paper-based delivery, or perhaps even already have your IRT parameters from past efforts.  You can use those too.

If you do deliver the pilot phase with FastTest, you now need to export the data to be analyzed in psychometric analytic software.  FastTest makes it easy to export both the data matrix and the item metadata needed for Xcalibre’s control file.


Perform item analysis, DIF, and other due diligence

The purpose of this step is to ensure that items included in your future CAT are of high quality. Any steps that your organization normally takes to review item performance are still relevant. This typically includes a review of items with low point-biserial correlations (poor discrimination), items where more examinees selected a distractor than the correct option (key flags), high or low classical P values, and differential item functioning (DIF) flags. Our Iteman software is designed exactly for this process. If you have a FastTest account, the Iteman analysis report is now available at a single click. If not, Iteman is also available as a standalone program.
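For readers who want to see what these classical checks amount to, the sketch below computes P values, corrected point-biserials (item versus the total score on the remaining items), and key flags from a matrix of selected options. It is an illustration only, not Iteman's implementation; the flag thresholds (0.10, 0.20, 0.95) and function names are hypothetical choices for this example, and DIF analysis is left out because it needs group membership data.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns None if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def item_stats(responses, key, min_rpbis=0.10, min_p=0.20, max_p=0.95):
    """responses: one list of selected options per examinee.
    key: list of correct options, one per item.
    Thresholds are illustrative assumptions, not recommended standards."""
    n_items = len(key)
    scored = [[int(row[j] == key[j]) for j in range(n_items)] for row in responses]
    totals = [sum(row) for row in scored]
    stats = []
    for j in range(n_items):
        col = [row[j] for row in scored]
        p = sum(col) / len(col)
        # correlate the item with the score on all *other* items
        rest = [t - u for t, u in zip(totals, col)]
        rpbis = pearson(col, rest)
        # count how often each option was chosen
        counts = {}
        for row in responses:
            counts[row[j]] = counts.get(row[j], 0) + 1
        key_count = counts.get(key[j], 0)
        key_flag = any(cnt > key_count
                       for opt, cnt in counts.items() if opt != key[j])
        stats.append({
            "p": p,
            "rpbis": rpbis,
            "key_flag": key_flag,
            "low_rpbis": rpbis is not None and rpbis < min_rpbis,
            "p_flag": p < min_p or p > max_p,
        })
    return stats
```

Each returned dictionary corresponds to one item, so you can filter for any item where `key_flag`, `low_rpbis`, or `p_flag` is true and send those items back to your subject matter experts for review.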


Calibrate with Xcalibre

Because CAT algorithms rely entirely on IRT parameters (unless you are doing special algorithms like diagnostic measurement models or measurement decision theory), we need to calculate the IRT parameters and get them into our testing platform.  If you have delivered all your items in a single block to examinees, like the example above with 500 people, then that single matrix can just be analyzed with Xcalibre. If you have multiple forms, LOFT, or the “tack-on” approach, you need to worry about IRT equating.
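To give a flavor of what IRT equating involves, here is a small sketch of the mean/sigma linking method, one of the simplest approaches: it uses the difficulty parameters of anchor items that appear in both calibrations to find a linear transformation onto the base scale. This is shown as one common option under simple assumptions, not as the method Xcalibre applies; the function names are made up for this example.

```python
import statistics

def mean_sigma_constants(b_new, b_base):
    """Linking constants A and B from anchor-item difficulties.

    b_new:  anchor b parameters as estimated on the new form's scale.
    b_base: the same anchor items' b parameters on the base scale.
    """
    A = statistics.pstdev(b_base) / statistics.pstdev(b_new)
    B = statistics.mean(b_base) - A * statistics.mean(b_new)
    return A, B

def rescale_item(item, A, B):
    """Put a 3PL item (a, b, c) from the new scale onto the base scale."""
    a, b, c = item
    return (a / A, A * b + B, c)  # c is unaffected by a linear rescaling

def rescale_theta(theta, A, B):
    """Put an ability estimate from the new scale onto the base scale."""
    return A * theta + B
```

In practice you would compute `A` and `B` once from the anchor items, then apply `rescale_item` to every non-anchor item on the new form so that all parameters in the CAT pool sit on a single scale.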


Upload IRT parameters into FastTest

Xcalibre will provide all the IRT parameters in a spreadsheet, in addition to the primary Word report.  Import them into your testing platform.  This will associate the IRT parameters with all the items in your CAT pool.  FastTest has functionality to streamline this process.


Validity study

Now that you have established your final pool of items and calculated the IRT parameters, you need to establish the algorithms you are going to use to publish the CAT. That is, you need to decide on the Initial Theta rule, Item Selection rule (including sub-algorithms like content or exposure constraints), and Termination Criterion. To establish these, you need to perform more simulation studies, but now with your final bank as the input rather than a fake bank from the Monte Carlo simulations. The most important aspect is determining the tradeoff between test length and precision; a termination criterion that provides more precise scores will require longer tests, and you can control the exact extent with a CAT.
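The length-versus-precision tradeoff follows from the fact that the standard error is 1 / sqrt(test information). The sketch below gives a best-case estimate of how many items a bank needs to administer to reach a given SEM target at a given theta, by summing the most informative items first. It assumes the 3PL with D = 1.7 and assumes theta is known in advance, which a real CAT never has, so treat the result as an optimistic lower bound rather than a substitute for a proper simulation study. The function names are hypothetical.

```python
import math

def info_3pl(theta, a, b, c, D=1.7):
    """3PL item information at theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def items_needed(bank, theta, sem_target):
    """Fewest items needed to reach sem_target at theta, or None if the
    whole bank cannot reach it. bank is a list of (a, b, c) tuples."""
    infos = sorted((info_3pl(theta, *item) for item in bank), reverse=True)
    total = 0.0
    for n, item_info in enumerate(infos, start=1):
        total += item_info
        if 1.0 / math.sqrt(total) <= sem_target:
            return n
    return None
```

Running `items_needed` for a few SEM targets (say 0.50, 0.30, and 0.20) at low, middle, and high theta values shows quickly how much longer tests get as you tighten the termination criterion, and where your bank runs out of information.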


Publish CAT

Assemble a “test form” in FastTest that consists of all the items you intend to use in your CAT tool.  Then select CAT as the delivery method in the Test Options screen, and you’ll see a screen where you can input the results from your CATSim validity study for the three important CAT algorithms.


Quality assurance

Your CAT is now ready to go!  Before bringing in real students, however, we recommend that you take it a few times as QA.  Do so with certain students in mind, such as a very low student, a very high student, or one near the cut score (if you have one).  To peek under the hood at the CAT algorithm, you can export the Examinee Test Detail Report from FastTest, which provides an item-by-item picture of how the CAT proceeds.

Adaptive testing examinee report


As you can see, the development of an adaptive test is not easy and can take months even if you have all the software and expertise you need.  But for something so important, that could be used to make important decisions about people, this is absolutely warranted.

However, if you have all the data you need today, there's no reason that it should take months to develop an adaptive test. Assessment platforms should make it easy enough for you to do so in an afternoon, which FastTest absolutely does.

Want to talk with one of our experts about applying this process to your exam?  Get in touch or sign up for a free account


Nathan Thompson, PhD

CEO at Assessment Systems
Nathan Thompson, PhD, is CEO and Co-Founder of Assessment Systems Corporation (ASC). He is a psychometrician, software developer, author, and researcher, and evangelist for AI and automation. His mission is to elevate the profession of psychometrics by using software to automate psychometric work like item review, job analysis, and Angoff studies, so we can focus on more innovative work. His core goal is to improve assessment throughout the world. Nate was originally trained as a psychometrician, with an honors degree at Luther College with a triple major of Math/Psych/Latin, and then a PhD in Psychometrics at the University of Minnesota. He then worked multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, project manager, and business leader. He is also cofounder and Membership Director at the International Association for Computerized Adaptive Testing ( He's published 100+ papers and presentations, but his favorite remains
