Authoring test items: Science as well as art
You are experts at what you do, and you want to make sure that your examinees are too. To confirm that, you need tests that are reliable, valid, and legally defensible. Yet the items within your tests are likely the greatest threat to their validity and reliability.
To find out whether your test items are your allies or your enemies, read through your test and identify the items that contain the most prevalent item construction flaws. The first three of these flaws appear in the item stem (i.e., the question). Check whether your item stems contain…
1) BIAS – Nowadays, we tend to think of bias as relating to culture or religion, but many subtler types of bias can sneak into your tests. Consider the following questions to gauge the extent of bias in your tests:
- Are there acronyms in your test that are not considered industry standard?
- Are you testing on policies and procedures that may vary from one location to another?
- Are you using vocabulary that is more recognizable to a female examinee than a male?
- Are you referencing objects that are not familiar to examinees from a newer or older generation?
2) NOT – We’ve all taken tests that ask negatively worded questions. These items are easy to write, but they are devastating to the validity and reliability of your tests, particularly for fast test-takers or individuals with lower reading skills. If an examinee misses that single word, they will get the question wrong even if they know the material. The item ends up penalizing the wrong examinees!
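A quick way to start this review is to scan your stems for negative keywords. The Python sketch below assumes your items are available as a dictionary mapping hypothetical item IDs to stem text; the keyword list is an assumption to adapt to your own exams, and a match only marks an item for human review, since a negative stem is occasionally deliberate.

```python
import re

# Hypothetical keyword list (an assumption, not a standard): NOT, CANNOT,
# EXCEPT, NEVER, plus contractions such as "isn't" or "don't".
NEGATIVE = re.compile(r"\b(?:not|cannot|except|never)\b|n['’]t\b", re.IGNORECASE)


def flag_negative_stems(stems: dict[str, str]) -> list[str]:
    """Return the IDs of items whose stem contains negative wording."""
    return [item_id for item_id, stem in stems.items() if NEGATIVE.search(stem)]


# Made-up stems for illustration only.
stems = {
    "Q1": "Which valve should be closed first during shutdown?",
    "Q2": "Which of the following is NOT a required safety step?",
    "Q3": "All of these are approved solvents EXCEPT:",
    "Q4": "Which tool isn't needed for a routine inspection?",
}
print(flag_negative_stems(stems))  # ['Q2', 'Q3', 'Q4']
```

Each flagged stem is a candidate for rewording in positive form (e.g., "Which of the following is a required safety step?").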
3) EXCESS VERBIAGE – Long stems can be effective and even essential in many situations, but they are also prone to two specific problems. First, an unnecessarily long stem contributes to examinee fatigue. Because each item takes more energy to read and understand, examinees tire sooner and may perform worse later in the test, regardless of their competence level.
Additionally, long stems often include information that can be used to answer other questions in the test. Your test then becomes an assessment of who best remembers earlier items (e.g., “Oh yeah, #5 said XYZ, so the answer to #34 is XYZ.”) rather than who knows the material.
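This kind of cross-item cueing is hard to spot by eye in a long exam. As a rough first pass, the following Python sketch flags pairs where one item's stem shares several content words with another item's keyed answer. The item structure, the tiny stopword list, and the `min_shared` threshold are all assumptions for illustration, and word overlap is only a crude proxy: it will miss paraphrased cues and flag some harmless overlaps.

```python
import re
from itertools import permutations

# Hypothetical, deliberately small stopword list; a real review would use a fuller one.
STOPWORDS = {"the", "and", "for", "that", "which", "what", "with", "this", "from", "are", "does"}
WORD = re.compile(r"[a-z0-9]+")


def content_words(text: str) -> set[str]:
    """Lowercase words of 3+ characters that are not stopwords."""
    return {w for w in WORD.findall(text.lower()) if len(w) >= 3 and w not in STOPWORDS}


def possible_cues(items: dict[str, dict[str, str]], min_shared: int = 2) -> list[tuple[str, str, list[str]]]:
    """Flag (a, b) pairs where item a's stem shares words with item b's keyed answer.

    A hit only means a reviewer should look; it does not prove that item a
    gives away item b.
    """
    hits = []
    for a, b in permutations(items, 2):
        shared = content_words(items[a]["stem"]) & content_words(items[b]["answer"])
        if len(shared) >= min_shared:
            hits.append((a, b, sorted(shared)))
    return hits


# Made-up items echoing the "#5 gives away #34" example.
items = {
    "Q5": {"stem": "Relief valves on this line open at 150 psi. What should the operator check each hour?",
           "answer": "The discharge gauge"},
    "Q34": {"stem": "At what pressure do the line's relief valves open?",
            "answer": "150 psi"},
}
print(possible_cues(items))  # [('Q5', 'Q34', ['150', 'psi'])]
```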
Unfortunately, item stems aren’t the only offenders. Experienced test writers know that distractors (i.e., the incorrect options) are actually more difficult to write than the stems themselves. When you review your test items, check whether your options contain…
4) IMPLAUSIBILITY – The purpose of a distractor is to draw less qualified examinees away from the correct answer with an option that looks correct. To “distract” an examinee from the correct answer, a distractor has to be plausible: the closer the distractors are to being correct, the more difficult the exam will be. If they are obviously incorrect, even unqualified examinees won’t pick them, and your exam will not help you discriminate between examinees who know the material and examinees who do not.
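If you have response data from a past administration, you can check plausibility empirically rather than by judgment alone. This Python sketch assumes each response is recorded as a (chosen option, total test score) pair; it reports how often each option was chosen, flags distractors that almost nobody picks, and computes a simple upper-lower discrimination index. The 5% cutoff and the split into score halves are assumptions; comparing the top and bottom 27% of examinees is a common alternative.

```python
from collections import Counter


def distractor_report(responses, correct, options="ABCD", min_share=0.05):
    """Summarize how one multiple-choice item's options perform.

    responses: list of (chosen_option, total_test_score) pairs, one per examinee
        (assumes at least two examinees).
    correct: the keyed option letter.
    min_share: hypothetical cutoff; distractors chosen by fewer than this share
        of examinees are flagged as possibly implausible.
    """
    n = len(responses)
    counts = Counter(choice for choice, _ in responses)  # missing options count as 0
    shares = {opt: counts[opt] / n for opt in options}
    implausible = [opt for opt in options if opt != correct and shares[opt] < min_share]

    # Upper-lower discrimination index: rank examinees by total score and compare
    # how often the top half and the bottom half chose the keyed answer.
    ranked = sorted(responses, key=lambda r: r[1], reverse=True)
    half = n // 2
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(choice == correct for choice, _ in upper) / half
    p_lower = sum(choice == correct for choice, _ in lower) / half
    return {"shares": shares, "implausible": implausible, "discrimination": p_upper - p_lower}


# Made-up responses for one item keyed "A": (option chosen, total test score).
responses = [("A", 48), ("A", 45), ("A", 44), ("B", 40),
             ("A", 30), ("C", 28), ("B", 25), ("C", 20)]
report = distractor_report(responses, correct="A")
print(report["implausible"], report["discrimination"])  # ['D'] 0.5
```

In this made-up data, nobody chose D, so it contributes nothing to the item, while the top half chose the key more often than the bottom half (0.75 vs. 0.25), which is the pattern you want from an item that discriminates.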
5) 3-TO-1 SPLITS – You may recall watching Sesame Street as a child. If so, you probably remember the song “One of these things…” Looking back, it seems elementary, but sometimes test item options are written in such a way that an examinee can play this simple game with your test. Instead of relying on knowledge of the material, they can look for the option that stands out as different from the others. Consider the following questions to determine whether any of your items fall into this category:
- Is the correct answer significantly longer than the distractors?
- Does the correct answer contain more detail than the distractors?
- Is the grammatical structure of the correct answer different from that of the distractors?
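The first of these questions lends itself to a mechanical check. A minimal Python sketch, assuming each item's options are stored as a dictionary from option letter to text, compares the keyed answer's word count with the average distractor's; the 1.5 ratio is a hypothetical cutoff to tune against your own item bank. Differences in detail and grammatical structure still need a human reviewer.

```python
from statistics import mean


def key_stands_out_by_length(options: dict[str, str], correct: str, ratio: float = 1.5) -> bool:
    """Flag an item whose keyed answer is much longer (in words) than its distractors.

    ratio is a hypothetical cutoff: 1.5 flags a key more than 50% longer than
    the average distractor.
    """
    key_words = len(options[correct].split())
    distractor_words = [len(text.split()) for opt, text in options.items() if opt != correct]
    return key_words > ratio * mean(distractor_words)


# Made-up item whose correct answer "D" is far more detailed than the rest.
item = {
    "A": "Close the valve.",
    "B": "Open the valve.",
    "C": "Call a supervisor.",
    "D": "Close the upstream isolation valve and log the pressure reading before restarting the pump.",
}
print(key_stands_out_by_length(item, correct="D"))  # True
print(key_stands_out_by_length(item, correct="A"))  # False
```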
6) ALL OF THE ABOVE – There are a couple of problems with offering this phrase (or its counterpart, “None of the above”) as an option. For starters, good test-takers know that, statistically speaking, it is usually the correct answer. If it’s there and the examinee picks it, they have a better than 50% chance of getting the item right, even if they don’t know the content. Also, if they can identify two options as correct, they can select “All of the above” without knowing whether the third option is correct. These sorts of questions also get in the way of good item analysis. Whether the examinee gets the item right or wrong, it’s harder to ascertain what knowledge they have because the correct answer is so broad.
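Finding these options across a large item bank is easy to automate. The sketch below assumes a hypothetical bank of item IDs mapped to lists of option text and returns every item that offers either catch-all phrase, so you can decide item by item whether to rewrite it.

```python
import re

# Matches "All of the above" or "None of the above", in any capitalization.
CATCH_ALL = re.compile(r"\b(?:all|none)\s+of\s+the\s+above\b", re.IGNORECASE)


def catch_all_items(bank: dict[str, list[str]]) -> list[str]:
    """Return the IDs of items offering "All of the above" or "None of the above"."""
    return [item_id for item_id, options in bank.items()
            if any(CATCH_ALL.search(opt) for opt in options)]


# Made-up item bank for illustration only.
bank = {
    "Q1": ["Gloves", "Goggles", "Apron", "All of the above"],
    "Q2": ["Gloves", "Goggles", "Apron", "Face shield"],
    "Q3": ["Gloves", "Goggles", "Apron", "None of the above."],
}
print(catch_all_items(bank))  # ['Q1', 'Q3']
```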
The process of reading through your exams in search of these flaws is time-consuming (and oftentimes depressing), but it is an essential step towards developing an exam that is valid, reliable, and reflects well on your organization as a whole. Once you have a chance to look at one of your tests, please write in the comments below what you discovered. We’d love to hear from you and support you as you strive towards better items, exams, and professionals.