Item banking refers to the purposeful creation of a database of items intended to measure a predetermined set of constructs. The term item refers to what many call questions, though item content need not be restricted as such and can include problems to solve or situations to evaluate in addition to straightforward questions. The art of item banking lies in the organizational structure by which items are categorized. As a critical component of any high-quality assessment, item banking is the foundation for the development of valid, reliable content and defensible test forms. Automated item banking systems, such as the Item Explorer module of FastTest, can significantly reduce the administrative time needed to maintain content and produce tests. While there are no absolute standards in creating and managing item banks, best practice guidelines are emerging. Some of the essential aspects include ensuring that:

  • Items are reusable objects; when selecting an item banking platform it is important to ensure that items can be used more than once; ideally item performance should be tracked not only within a test form, but across test forms as well.
  • Item history and usage are tracked; the usage of a given item, whether it is actively on a test form or dormant waiting to be assigned, should be easily accessible for test developers to assess, as the over-exposure of items can reduce the validity of a test form. As you deliver your items, their content is exposed to examinees. Once an item has been exposed to many examinees, it can be flagged for retirement or revision to reduce cheating or teaching to the test.
  • Items can be sorted; as test developers select items for a test form, it is imperative that they can sort items based on their content area or other categorization method, so as to select a sample of items that is representative of the full breadth of constructs we intend to measure.
  • Item versions are tracked; as items appear on test forms, their content may be revised for clarity. Any such changes should be tracked and versions of the same item should have some link between them so that we can easily review the performance of earlier versions in conjunction with current versions.
  • Review process workflow is tracked; as items are revised and versioned, it is imperative that the changes in content and the users who made these changes are tracked. In post-test assessment, there may be a need for further clarification, and the ability to pinpoint who took part in reviewing an item can expedite that process.
  • Metadata is recorded; any relevant information about an item should be recorded and stored with the item. The most common applications for metadata that we see are author, source, description, content area, depth of knowledge, IRT parameters, and CTT statistics, but there are likely many data points specific to your organization that are worth storing.
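
To make the guidelines above more concrete, here is a minimal sketch of what a single item record might look like in code. It is an illustration only, not how FastTest or any other platform stores items: the class names, field names, the example item name, and the exposure threshold are all assumptions chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ItemVersion:
    """One revision of an item's content, with who changed it."""
    number: int
    stem: str
    edited_by: str


@dataclass
class Item:
    """A reusable item with metadata, version history, and usage tracking.

    All field names here are hypothetical; adapt them to your organization.
    """
    name: str
    content_area: str
    author: str
    source: str = ""
    versions: list[ItemVersion] = field(default_factory=list)
    forms_used_on: list[str] = field(default_factory=list)
    exposure_count: int = 0  # number of examinees who have seen the item

    def revise(self, stem: str, edited_by: str) -> ItemVersion:
        """Append a new version instead of overwriting the old one."""
        version = ItemVersion(len(self.versions) + 1, stem, edited_by)
        self.versions.append(version)
        return version

    def record_delivery(self, form_name: str, examinees: int) -> None:
        """Track which forms used the item and how many examinees saw it."""
        if form_name not in self.forms_used_on:
            self.forms_used_on.append(form_name)
        self.exposure_count += examinees

    def needs_exposure_review(self, max_exposures: int) -> bool:
        """Flag the item once exposure passes a locally chosen threshold."""
        return self.exposure_count > max_exposures


item = Item(name="8A-1_001", content_area="Biology", author="J. Doe")
item.revise("What is the powerhouse of the cell?", edited_by="J. Doe")
item.revise("Which organelle produces most of a cell's ATP?", edited_by="R. Roe")
item.record_delivery("Form A", examinees=120)
item.record_delivery("Form B", examinees=95)
print(len(item.versions), item.forms_used_on, item.needs_exposure_review(200))
```

Because revisions are appended rather than overwritten, earlier versions stay linked to the current one, and the same record can be reused across test forms while its exposure accumulates.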

Keeping these guidelines in mind, here are some concrete steps that you can take to establish your item bank in accordance with psychometric best practices.

Make your Job Easier: Establish a Naming Convention

Names are important. As you are importing or creating your item banks it is important to identify each item with a unique, but recognizable name. Naming conventions should reflect your bank’s structure and should include numbers with leading zeros to support true numerical sorting. For example, let’s consider the item banks of a high school science teacher. Take a look at the example below:

What are some ways that this utilizes best practices?

  • Each subject has its own item bank. We can easily view all Biology items by selecting the Biology item bank.
  • A separate folder, 8Ah, clearly delineates items for honors students.
  • The item names follow along with the item bank and category names, allowing us to search for all items for 8th grade unit A-1 with the query “8A-1”, or similarly for honors items with “8Ah-1”.
  • Leading zeros are used so that as the item bank expands, items will sort properly; an item ending in 001 will appear before 010.

Of course, the execution of these best practices should be adapted to the needs of your organization, but it is important to establish a convention of some kind. That is, you can use a period rather than an underscore, as long as you are consistent.
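
A short sketch can show why the leading zeros matter. The exact name pattern used here (category, hyphen, unit, underscore, three-digit sequence) is an assumption pieced together from the queries and separators mentioned above, and the helper `item_name` is hypothetical.

```python
def item_name(category: str, unit: int, sequence: int, width: int = 3) -> str:
    """Build a name such as '8A-1_001' (the pattern itself is an assumption)."""
    return f"{category}-{unit}_{sequence:0{width}d}"


# Names with and without leading zeros for items 10, 2, and 1.
padded = [item_name("8A", 1, n) for n in (10, 2, 1)]
unpadded = [f"8A-1_{n}" for n in (10, 2, 1)]

print(sorted(padded))    # ['8A-1_001', '8A-1_002', '8A-1_010']
print(sorted(unpadded))  # ['8A-1_1', '8A-1_10', '8A-1_2']

# A simple prefix query separates regular and honors items.
all_names = padded + [item_name("8Ah", 1, 1)]
regular = [name for name in all_names if name.startswith("8A-1")]
honors = [name for name in all_names if name.startswith("8Ah-1")]
print(regular, honors)
```

Without padding, plain text sorting places item 10 ahead of item 2, which is exactly the problem the convention avoids.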

Prepare for the Future: Store Extensive Metadata

Metadata is valuable. As you create items, take the time to record simple metadata like author and source. Having this information can prove very useful once the original item writer has moved to another department, or left the organization. Later in your test development life cycle, as you deliver items, you have the ability to aggregate and record item statistics. Values like discrimination and difficulty are fundamental to creating better tests, driving reliability and validity.
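
As a rough illustration of the two statistics just mentioned, the sketch below computes classical difficulty (proportion correct) and discrimination (the point-biserial correlation between the item score and the total score) from a small scored response matrix. The data are made up, the function names are hypothetical, and the discrimination shown is the uncorrected item-total version; many programs instead correlate the item with the total score excluding that item.

```python
from statistics import mean, pstdev


def difficulty(item_scores):
    """Classical difficulty: the proportion of examinees answering correctly."""
    return mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total score."""
    sd_item, sd_total = pstdev(item_scores), pstdev(total_scores)
    if sd_item == 0 or sd_total == 0:
        return float("nan")  # undefined when everyone scores the same
    mean_item, mean_total = mean(item_scores), mean(total_scores)
    covariance = mean(
        [(x - mean_item) * (y - mean_total)
         for x, y in zip(item_scores, total_scores)]
    )
    return covariance / (sd_item * sd_total)


# Rows are examinees, columns are items; 1 = correct, 0 = incorrect (made-up data).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]

for index in range(len(responses[0])):
    scores = [row[index] for row in responses]
    print(index, round(difficulty(scores), 2),
          round(point_biserial(scores, totals), 2))
```

Values like these are the kind of metadata worth storing with each item after every administration.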

Statistics are used in the assembly of test forms, for example.  Classical statistics can be used to estimate mean, standard deviation, reliability, standard error, and pass rate, while item response theory parameters can be used to calculate test information and standard error functions. Data from both psychometric theories can be used to pre-equate multiple forms.

In the event that your organization decides to publish an adaptive test, utilizing CAT delivery, item parameters for each item will be essential because they are used for intelligently selecting items and scoring examinees. Additionally, in the event that the integrity of your test or scoring mechanism is ever challenged, documentation of validity is essential to defensibility and the storage of metadata is one such vital piece of documentation.
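
To illustrate how stored item parameters drive adaptive delivery, here is a minimal sketch of one common selection rule: choose the unused item with the most information at the current ability estimate. It assumes 2PL parameters and ignores exposure control and content balancing, which operational CAT engines typically add; the bank contents and function names are hypothetical.

```python
import math


def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_next_item(theta, bank, administered):
    """Return the name of the unused item with maximum information at theta."""
    candidates = {
        name: params for name, params in bank.items()
        if name not in administered
    }
    if not candidates:
        return None  # the bank is exhausted
    return max(candidates, key=lambda name: information(theta, *candidates[name]))


# Hypothetical bank: item name -> (a, b) parameters.
bank = {
    "8A-1_001": (1.0, -1.5),
    "8A-1_002": (1.2, 0.0),
    "8A-1_003": (0.9, 1.5),
}

print(select_next_item(0.0, bank, administered=set()))         # 8A-1_002
print(select_next_item(0.0, bank, administered={"8A-1_002"}))  # 8A-1_001
```

An item with no parameters on file simply cannot take part in this kind of selection, which is why recording them in the bank matters.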

Increase Content Quality: Track Workflow

Utilize a review workflow to increase quality. Using a standardized review process will ensure that all items are vetted in a similar manner. Have a step in the process for grammar, spelling, and syntax review, as well as content review by a subject matter expert. As an item progresses through the workflow, its development should be tracked, as workflow results also serve as validity documentation.
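
One way to picture a tracked workflow is as a fixed sequence of stages with a log entry written at every step. The sketch below is an assumption-laden illustration: the stage names, the `ReviewLog` class, and the reviewer labels are all hypothetical and should be replaced by whatever your own process defines.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical stage names; use the steps of your own review process.
WORKFLOW = ["draft", "editorial review", "content review", "approved"]


@dataclass
class ReviewLog:
    """Tracks one item's progress through the review workflow."""
    item_name: str
    stage: str = WORKFLOW[0]
    history: list = field(default_factory=list)

    def advance(self, reviewer: str, comment: str = "") -> str:
        """Move the item to the next stage and record who signed off."""
        position = WORKFLOW.index(self.stage)
        if position == len(WORKFLOW) - 1:
            raise ValueError(f"{self.item_name} is already approved")
        next_stage = WORKFLOW[position + 1]
        self.history.append({
            "from": self.stage,
            "to": next_stage,
            "reviewer": reviewer,
            "comment": comment,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        self.stage = next_stage
        return next_stage


log = ReviewLog("8A-1_001")
log.advance("item writer", "Ready for editing")
log.advance("editor", "Fixed comma splice in stem")
print(log.stage)
for entry in log.history:
    print(entry["from"], "->", entry["to"], "by", entry["reviewer"])
```

The history list is the part that doubles as validity documentation: it records who reviewed the item, at which step, and when.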

Accept comments and suggestions from a variety of sources. It is not uncommon for each item reviewer to view an item through their distinctive lens. Having a diverse group of item reviewers stands to benefit your test takers, as they are likely to be diverse as well!

Keep Your Items Organized: Categorize Them

Identify items by content area. Creating a content hierarchy can also help you to organize your item bank and ensure that your test covers the relevant topics. Most often, we see content areas defined first by an analysis of the construct(s) being tested. In the event of a high school science test, this may include the evaluation of the content taught in class. For a high-stakes certification exam, this almost always includes a job-task analysis. Both methods produce what is called a test blueprint, indicating how important various content areas are to the demonstration of knowledge in the areas being assessed. Once content areas are defined, we can assign items to levels or categories based on their content. As you are developing your test, and invariably referring back to your test blueprint, you can use this categorization to determine which items from each content area to select.
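
The selection step can be sketched in a few lines once items carry a content area. In this illustration the blueprint has already been converted from importance weights into a number of items per content area; the content area names, the item names, and the random draw within each area are assumptions made for the example, not a recommended assembly method.

```python
import random
from collections import defaultdict


def assemble_form(bank, blueprint, seed=0):
    """Pick the blueprint's number of items from each content area.

    bank: list of (item_name, content_area) pairs
    blueprint: dict mapping content_area -> number of items required
    """
    by_area = defaultdict(list)
    for name, area in bank:
        by_area[area].append(name)

    rng = random.Random(seed)  # seeded so the draw is repeatable
    form = []
    for area, count in blueprint.items():
        available = by_area.get(area, [])
        if len(available) < count:
            raise ValueError(
                f"Blueprint needs {count} items for {area}, "
                f"but only {len(available)} are banked"
            )
        form.extend(sorted(rng.sample(available, count)))
    return form


# Hypothetical bank and blueprint.
bank = [
    ("8A-1_001", "Cells"), ("8A-1_002", "Cells"), ("8A-1_003", "Cells"),
    ("8A-2_001", "Genetics"), ("8A-2_002", "Genetics"),
    ("8A-3_001", "Ecology"), ("8A-3_002", "Ecology"),
]
blueprint = {"Cells": 2, "Genetics": 1, "Ecology": 1}

print(assemble_form(bank, blueprint))
```

The error raised for an under-stocked content area is useful in its own right, since it tells item writers where the bank needs new content.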

There is no doubt that item banking will remain a key aspect of developing and maintaining quality assessments. Utilizing best practices, and caring for your items throughout the test development life cycle, will pay great dividends as it increases the reliability, validity, and defensibility of your assessment.



Authoring test items: Science as well as art

You are experts at what you do, and you want to make sure that your examinees are too. In order to do so, you need tests that are reliable, valid, and legally defensible. That said, it is likely that the items within your tests are the greatest threat to their actual validity and reliability.

To find out whether your test items are your allies or your enemies, read through your test and identify the items that contain the most prevalent item construction flaws.  The first three of the most prevalent construction flaws are located in the item stem (i.e. question).  Look to see if your item stems contain…

1) BIAS – Nowadays, we tend to think of bias as relating to culture or religion, but there are many more subtle types of biases that oftentimes sneak into your tests.  Consider the following questions to determine the extent of bias in your tests:

  • Are there acronyms in your test that are not considered industry standard?
  • Are you testing on policies and procedures that may vary from one location to another?
  • Are you using vocabulary that is more recognizable to a female examinee than a male?
  • Are you referencing objects that are not familiar to examinees from a newer or older generation?

2) NOT – We’ve all taken tests which ask a negatively worded question. These test items are easy to write, but they are devastating to the validity and reliability of your tests, particularly for fast test-takers or individuals with lower reading skills. If the examinee misses that one single word, they will get the question wrong even if they actually know the material. This test item ends up penalizing the wrong examinees!
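
If your stems are stored as text, a quick scan can surface candidates for this flaw before a human review. The sketch below is only a rough filter: the word list is an assumption, and it will flag some stems where the negative word is harmless, so treat the output as a reading list rather than a verdict.

```python
import re

# Hypothetical list of negative words worth a second look in a stem.
NEGATIVE_WORDS = re.compile(r"\b(not|except|never|least)\b", re.IGNORECASE)


def negative_words(stem):
    """Return the negative words found in an item stem, in order."""
    return [match.group(0) for match in NEGATIVE_WORDS.finditer(stem)]


stems = [
    "Which of the following is NOT a mammal?",
    "All of the following are reptiles EXCEPT:",
    "Which of the following is a mammal?",
]
for stem in stems:
    print(negative_words(stem), stem)
```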

3) EXCESS VERBIAGE – Long stems can be effective and essential in many situations, but they are also more prone to two specific item construction flaws.  If the stem is unnecessarily long, it can contribute to examinee fatigue.  Because each item requires more energy to read and understand, examinees tire sooner and may begin to perform more poorly later on in the test—regardless of their competence level.

Additionally, long stems often include information that can be used to answer other questions in the test.  This could lead your test to be an assessment of whose test-taking memory is best (i.e. “Oh yeah, #5 said XYZ, so the answer to #34 is XYZ.”) rather than who knows the material.

Unfortunately, item stems aren’t the only offenders. Experienced test writers know that the distractors (i.e. the incorrect options) are actually more difficult to write than the stems themselves. When you review your test items, look to see if your item distractors contain…

4) IMPLAUSIBILITY – The purpose of a distractor is to pull less qualified examinees away from the correct answer with other options that look correct. In order for them to “distract” an examinee from the correct answer, they have to be plausible. The closer they are to being correct, the more difficult the exam will be. If the distractors are obviously incorrect, even unqualified examinees won’t pick them, and your exam will not help you discriminate between examinees who know the material and examinees who do not.

5) 3-TO-1 SPLITS – You may recall watching Sesame Street as a child. If so, you remember the song “One of these things…” Looking back, it seems really elementary, but sometimes our test item options are written in such a way that an examinee can play this simple game with your test. Instead of knowing the material, they can look for the option that stands out as different from the others. Consider the following questions to determine if one of your items falls into this category:

  • Is the correct answer significantly longer than the distractors?
  • Does the correct answer contain more detail than the distractors?
  • Is the grammatical structure different for the answer than for the distractors?
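
The first of these questions is easy to approximate automatically. The sketch below flags an item when the correct answer is much longer than every distractor; the 1.5 ratio is an arbitrary assumption for illustration, and the function name and sample options are hypothetical. Detail and grammatical structure still need a human reader.

```python
def key_stands_out(options, key_index, ratio=1.5):
    """Return True if the keyed answer is much longer than every distractor.

    ratio is an arbitrary threshold chosen for this example.
    """
    key_length = len(options[key_index])
    distractor_lengths = [
        len(text) for index, text in enumerate(options) if index != key_index
    ]
    return key_length > ratio * max(distractor_lengths)


lopsided = [
    "Mitosis",
    "Meiosis",
    "The process by which a cell copies its DNA before dividing",
    "Osmosis",
]
balanced = ["Mitosis", "Meiosis", "Osmosis", "Diffusion"]

print(key_stands_out(lopsided, key_index=2))  # True
print(key_stands_out(balanced, key_index=0))  # False
```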

6) ALL OF THE ABOVE – There are a couple of problems with having this phrase (or the opposite “None of the above”) as an option.  For starters, good test takers know that this is—statistically speaking—usually the correct answer.  If it’s there and the examinee picks it, they have a better than 50% chance of getting the item right—even if they don’t know the content.  Also, if they are able to identify two options as correct, they can select “All of the above” without knowing whether or not the third option was correct.  These sorts of questions also get in the way of good item analysis.   Whether the examinee gets this item right or wrong, it’s harder to ascertain what knowledge they have because the correct answer is so broad.
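
Finding these options in an existing bank is a simple text search. The sketch below assumes each item's options are available as a list of strings and only matches the two exact phrases discussed here; the function name and sample options are hypothetical.

```python
import re

# Matches "All of the above" or "None of the above", with optional final period.
CATCH_ALL = re.compile(r"^\s*(all|none) of the above\.?\s*$", re.IGNORECASE)


def catch_all_options(options):
    """Return the options that are 'all/none of the above' catch-alls."""
    return [option for option in options if CATCH_ALL.match(option)]


options = ["Photosynthesis", "Respiration", "Fermentation", "All of the above"]
print(catch_all_options(options))  # ['All of the above']
```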

The process of reading through your exams in search of these flaws is time-consuming (and oftentimes depressing), but it is an essential step towards developing an exam that is valid, reliable, and reflects well on your organization as a whole.  Once you have a chance to look at one of your tests, please write in the comments below what you discovered.  We’d love to hear from you and support you as you strive towards better items, exams, and professionals.