Test blueprints, also known as test specifications (often shortened to “test specs”), are the formalized design of an assessment, test, or exam. They apply in educational assessment, pre-employment testing, certification, licensure, or any other type of assessment. Generally, the amount of effort and detail is commensurate with the stakes of the assessment; a 10-item quiz for 5th grade math is quite different from a licensure exam for surgeons!
Why do we need test blueprints?
The blueprints are used for various purposes. The most important is that they are part of the validity documentation. Validity refers to the evidence we have (“evidence-centered design”) that a test’s scores mean what we want them to mean. So if we want the scores to reflect knowledge of the high school math curriculum for graduation, then the test specifications should align to the curriculum quite closely. If we want the scores to reflect that a surgeon is qualified to practice, we want the test specifications to reflect the knowledge and skills needed to practice. A lot of work can go into designing the blueprints, such as job task analysis (JTA) in certification and licensure. The image here provides an example of how JTA data is converted into content blueprints.
The test blueprints/specifications are also important for directing efforts in test development. At the simplest level, you want your item writers to create new items in areas where you need them. If the blueprints only call for 1% of the test on a certain topic, you don’t want the item writers making a lot of new questions there.
The test blueprints are often published publicly in a simplified version to help external stakeholders. For example, you want the surgeons to be able to study for their test, so you publish a list of the content domains that are covered by the test, and the percentage of items from each. A fantastic example of this is at NOCTI. Another good example that covers multiple aspects of the list below is this one from New Mexico.
What are test blueprints?
The test blueprints, like the blueprints of a house or office building, define everything needed to build it. There are multiple aspects to this, which can vary by type of exam. They break down into two types of information: item distribution and operational guidelines.
There are many ways that you can classify items on the test. The content domain or topic that they cover is the most obvious here, such as defining a math test that is 40% Algebra, 30% Geometry, and 30% Calculus. But there are other, more practical and operational, considerations as well.
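As a minimal sketch of how content percentages turn into item counts, the Python below converts the 40/30/30 math example into whole numbers of items. The function name `allocate_items`, the 50-item test length, and the largest-remainder rounding rule are assumptions made for illustration, not part of any particular blueprint.

```python
from math import floor


def allocate_items(weights, total_items):
    """Convert blueprint percentages into whole-item counts.

    Uses largest-remainder rounding (an assumed rule) so that the
    counts always add up to total_items.
    """
    if abs(sum(weights.values()) - 100) > 1e-9:
        raise ValueError("Blueprint weights must sum to 100%")

    # Exact (possibly fractional) number of items per domain
    exact = {domain: total_items * w / 100 for domain, w in weights.items()}
    counts = {domain: floor(x) for domain, x in exact.items()}

    # Hand any leftover items to the domains with the largest fractional parts
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(exact, key=lambda d: exact[d] - counts[d], reverse=True)
    for domain in by_remainder[:leftover]:
        counts[domain] += 1
    return counts


# Content weights from the math example; the 50-item length is hypothetical
blueprint = {"Algebra": 40, "Geometry": 30, "Calculus": 30}
print(allocate_items(blueprint, 50))
# {'Algebra': 20, 'Geometry': 15, 'Calculus': 15}
```

When the test length does not divide evenly, the rounding rule decides which domain gets the extra item, so it is worth documenting that rule alongside the percentages.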
Number of items
First, the blueprints should define the number of items, including a breakdown of scored vs. unscored (pilot) items. Often, there is documented reasoning behind the choices for this, such as pretesting plans, or an estimate of reliability based on projected test length.
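One common way to estimate reliability from projected test length is the Spearman-Brown prophecy formula. The sketch below assumes that approach; the pilot reliability of 0.80 and the 40-item and 60-item lengths are hypothetical numbers chosen only to show the arithmetic.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor.

    Spearman-Brown prophecy formula: k * r / (1 + (k - 1) * r),
    which assumes the added items are comparable to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical: a 40-item pilot form with reliability 0.80, lengthened to 60 items
projected = spearman_brown(0.80, 60 / 40)
print(round(projected, 3))  # 0.857
```

A calculation like this can serve as the documented reasoning for the chosen number of scored items.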
The content distribution is the most important and most common aspect. Some test blueprints only cover this and the number of items. It defines all the content covered by the test, and the percentage for each. Sometimes, there are sub-domains and sub-sub-domains! Here is an example of that, from the New Mexico link provided earlier.
Item types may also need to be specified. Many tests only have multiple-choice items, in which case this is unnecessary. But there are tests, for example, that require 50 multiple-choice items, 10 drag-and-drop, 10 fill-in-the-blank, and 2 essay. Such designs need to be explained and codified in the test blueprints.
Some test blueprints define a target distribution or acceptable range for item statistics. For example, it might require 20% of the items to have classical difficulty statistics (P-values) of 0.40 to 0.60, 60% of the items with values 0.60 to 0.90, and 20% from 0.90 to 1.00. Or, there might just be acceptable ranges, such as stating that all difficulty statistics should be 0.40 to 0.98.
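A check of a draft form against the 20% / 60% / 20% difficulty targets might look like the sketch below. The function name `difficulty_distribution` and the list of P-values are hypothetical. Because the stated ranges share their boundary values, the code assumes half-open bins, so a P-value of exactly 0.60 counts in the 0.60 to 0.90 bin.

```python
def difficulty_distribution(p_values, bins):
    """Return the proportion of items whose P-value falls in each (low, high) bin.

    Bins are treated as half-open [low, high), except that a P-value of
    exactly 1.00 is counted in a bin that ends at 1.00 (an assumed convention).
    """
    if not p_values:
        raise ValueError("No P-values supplied")
    counts = {b: 0 for b in bins}
    for p in p_values:
        for low, high in bins:
            if low <= p < high or (high == 1.0 and p == 1.0):
                counts[(low, high)] += 1
                break
    return {b: c / len(p_values) for b, c in counts.items()}


# Targets from the example: bin -> required proportion of items
targets = {(0.40, 0.60): 0.20, (0.60, 0.90): 0.60, (0.90, 1.00): 0.20}

# Hypothetical P-values for a 10-item draft form
form = [0.45, 0.55, 0.62, 0.70, 0.75, 0.81, 0.85, 0.88, 0.93, 0.97]

actual = difficulty_distribution(form, list(targets))
for b, target in targets.items():
    print(b, "target:", target, "actual:", actual[b])
```

Items falling outside every bin are simply not counted, which is one way to surface items that violate an acceptable-range rule.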
Cognitive level or Bloom’s
Not all assessments tackle this consideration, but it is common in education. The test blueprints might specify a certain number of items that are Recall vs. higher levels of cognitive complexity. Note that this might overlap with Item Type.
The test might be organized into sections, and this structure should be documented closely. Continuing the example above, Section 1 might be the 50 multiple-choice items, Section 2 the drag-and-drop plus fill-in-the-blank items, and Section 3 the essays.
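The section layout can also be written down in a structured form and checked against the required item-type counts. The following is only a sketch of one possible representation; the `Section` class, its field names, and the `items_by_type` helper are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    item_counts: dict  # maps item type -> number of items in this section


def items_by_type(sections):
    """Total the number of items of each type across all sections."""
    totals = Counter()
    for section in sections:
        totals.update(section.item_counts)
    return dict(totals)


# Section layout from the example above
sections = [
    Section("Section 1", {"multiple choice": 50}),
    Section("Section 2", {"drag and drop": 10, "fill-in-the-blank": 10}),
    Section("Section 3", {"essay": 2}),
]

# Item-type requirements from the example above
required = {"multiple choice": 50, "drag and drop": 10,
            "fill-in-the-blank": 10, "essay": 2}

# The sections should account for exactly the required items
assert items_by_type(sections) == required
print(sum(required.values()))  # 72
```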
Operational and practical considerations
This part of the blueprints covers aspects other than the nature of items. There are many things that are useful, but here are a few examples.
- Time limits – What is the overall time limit of the test? Section time limits?
- Navigation – Are examinees allowed to move back and forth between sections? Between items?
- Test design – If you are using modern designs like computerized adaptive testing or linear-on-the-fly testing, you need to define these in a lot of detail.
- Messaging – What instructions will you give? Are there pop-up messages?
- Access – How do you control access to the exam? Are there eligibility requirements? Published online vs. paper? So many options.
As you can see, there are a ton of things to consider when publishing a test. If the test is low-stakes, many of these are treated informally, such as a teacher handing out a 10-item quiz. But for high-stakes assessment, the definition of formal test blueprints and specifications is absolutely essential. Not only does it prepare the candidates and other stakeholders, but it makes things easier for the test developers, and provides substantial documentation for validity. Moreover, if you work in an area where there are potential legal challenges, it provides a bulwark of legal defensibility. If you work in high-stakes or high-volume assessment, you need to define your test blueprints.