What are the best practices for test item review?


What is item review?  It is the process of performing quality control on items before they are ever delivered to examinees.

This is an absolutely essential step in the development of items for medium- and high-stakes exams.  While a teacher might not have other teachers review questions on a 4th-grade math quiz, items that are part of an admissions exam or professional certification exam will go through multiple layers of independent item review before a single examinee sees them.

This blog post will discuss some important aspects of the item review process.

Why item review?

Assessment items are, from a business perspective, a work product.  They are component parts of a larger machine, the test or assessment: in some cases interchangeable, in other cases very intentional and specific.  It is common practice to perform quality assurance on work products, and the item review process simply applies this concept to test questions.

Who does the item review?

This can differ greatly based on the type of assessment and the stakes involved.  In a medium-stakes situation, it might be just one other reviewer: a professional certification exam might have every item reviewed by one content expert other than the person who wrote it, and this could be considered sufficient.

In higher-stakes exams developed by large organizations, an item might go through two content reviewers, a psychometric reviewer, a bias reviewer, and an editor, and then additional stages for formatting.  You can see how this becomes a very big deal, with dozens of people and hundreds of items in circulation.

What do the reviewers check?

It depends on who the reviewer is, but organizations often provide checklists.  A content reviewer might check that the stem is clear, the key is fully correct, the distractors are fully incorrect, and all answer options are of reasonably equivalent length.

The psychometric reviewer might check for aspects that inadvertently tip off the correct answer.  The bias reviewer might look for a specific set of situations that could disadvantage some subgroup of the population.  An editor might check for correct punctuation, such as the rule that a stem should never end in a colon.
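These role-specific checklists are really just structured data.  As a minimal sketch, here is how a checklist per reviewer role might be represented; the role names and check wording are illustrative, not an official standard.

```python
# Hypothetical reviewer checklists, one list of checks per reviewer role.
# The roles and check text below are illustrative examples only.
REVIEW_CHECKLISTS = {
    "content": [
        "Stem is clear and unambiguous",
        "Key is fully correct",
        "Distractors are fully incorrect",
        "Answer options are of reasonably equivalent length",
    ],
    "psychometric": [
        "No cues that inadvertently tip off the correct answer",
    ],
    "bias": [
        "No content that disadvantages a subgroup of the population",
    ],
    "editorial": [
        "Punctuation follows the style guide (e.g., stem does not end in a colon)",
    ],
}

def incomplete_checks(role: str, completed: set) -> list:
    """Return the checklist entries this reviewer has not yet signed off."""
    return [check for check in REVIEW_CHECKLISTS[role] if check not in completed]
```

An item would only advance past a reviewer once `incomplete_checks` returns an empty list for that role.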

For example, during my graduate school years, I wrote items that were eventually used in the US State of Alaska for K-12 assessments.  The reviewers looked not only for straightforward issues like answer correctness but also for potential bias against Alaskan students.  As item writers, we were warned to be careful about mentioning objects that we take for granted in the Lower 48: roads, shopping malls, indoor plumbing, and farms are examples that come to mind.  Checking for this was its own stage of item review.

How do we manage the work?

The best practice for managing the process is to implement stages.  An organization might decide that all items go to the reviewers listed above, in the order I described them.  Each reviewer must complete their checklist before the item can be moved on to the next stage.  This might seem like a coldhearted assembly line, given that there is certainly an art to writing good items, but assembly lines unarguably lead to greater quality and increased productivity.
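The staged process above can be sketched as a simple state machine: an item sits in one stage at a time and advances only when the current reviewer signs off.  This is a minimal illustration, with made-up stage names rather than any particular platform's workflow.

```python
# Illustrative sequence of review stages; names are hypothetical.
STAGES = [
    "content_review_1",
    "content_review_2",
    "psychometric_review",
    "bias_review",
    "editorial_review",
    "ready",
]

class Item:
    """A test item moving through sequential review stages."""

    def __init__(self, text: str):
        self.text = text
        self.stage_index = 0  # start at the first review stage

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def advance(self, checklist_complete: bool) -> None:
        """Move to the next stage only once the current checklist is done."""
        if not checklist_complete:
            raise ValueError(f"Checklist for {self.stage} is not complete")
        if self.stage_index < len(STAGES) - 1:
            self.stage_index += 1

item = Item("Which of the following is a prime number?")
item.advance(checklist_complete=True)  # content_review_1 -> content_review_2
```

The key design point is that the item itself enforces the gate: there is no path to the next column without a completed checklist, which is exactly what workflow software automates.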

Is there software that makes the item review process easier?

Yes.  You have likely used some form of work-management software in your own job, such as Trello, Jira, or GitHub.  These tools are typically based on the concept of swimlanes, arranged in what is often called a Kanban board.  Back in the day, Kanban boards were physical boards with Post-it notes on them, as you might have seen on shows like Silicon Valley.

This presents the aforementioned stages as columns in a user interface, and tasks (items) move through the stages.  Once Content Reviewer 1 finishes their work and leaves comments on an item, the software lets them change its stage to Content Review 2 and assign someone as Content Reviewer 2.

Below is an example of this from ASC’s online assessment platform, Assess.ai.  Because Assess.ai is designed for organizations that are driven by best practices and advanced psychometrics, there is an entire portion of the system dedicated to the management of item review via the swimlanes interface.

[Screenshot: item review Kanban board in Assess.ai]

To implement this process, an administrator at the organization defines the stages that all items should pass through, and Assess.ai presents these as columns in the swimlane interface.  Administrators can then track and manage the workflow visually.  Reviewers themselves don't need access to everything; they are simply instructed to click on the items assigned to them, and they will see an interface like the one below.

[Screenshot: FastTest item review interface]

Can I implement Kanban item review at my organization?

Absolutely!  Assess.ai is available as a free version (sign up here), with a limit of 500 items and 1 user.  While the free version won't let you manage dozens of users, you can still implement aspects of the process to improve item quality at your organization.  Once you are ready to expand, simply upgrade your account and add users.

Want to learn more?  Drop us an email at solutions@assess.com.

Nathan Thompson, PhD

Nathan Thompson, PhD, is CEO and Co-Founder of Assessment Systems Corporation (ASC).  He is a psychometrician, software developer, author, researcher, and evangelist for AI and automation.  His mission is to elevate the profession of psychometrics by using software to automate psychometric work like item review, job analysis, and Angoff studies, so practitioners can focus on more innovative work.  His core goal is to improve assessment throughout the world.

Nate was originally trained as a psychometrician, earning an honors degree from Luther College with a triple major in Math/Psych/Latin, and then a PhD in Psychometrics at the University of Minnesota.  He has worked multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, project manager, and business leader.  He is also co-founder and Membership Director of the International Association for Computerized Adaptive Testing (iacat.org).  He has published 100+ papers and presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/.

