The terms Norm-Referenced and Criterion-Referenced are commonly used to describe tests, exams, and assessments. They are often among the first concepts learned when studying assessment and psychometrics.
Norm-referenced means that we are referencing how your score compares to other people. Criterion-referenced means that we are referencing how your score compares to a criterion such as a cutscore or a body of knowledge.
Do we say a test is “Norm-Referenced” vs. “Criterion-Referenced”?
Actually, that’s a slight misuse.
The terms Norm-Referenced and Criterion-Referenced refer to score interpretations. Most tests can actually be interpreted in both ways, though they are usually designed and validated for only one or the other.
Hence the shorthand of calling something “a norm-referenced test”: it simply means that norm-referencing is the primary intended interpretation.
Examples of Norm-Referenced vs. Criterion-Referenced
Suppose you received a score of 90% on a Math exam in school. This could be interpreted in both ways. If the cutscore was 80%, you clearly passed; that is the criterion-referenced interpretation. If the average score was 75%, then you scored well above the class average; that is the norm-referenced interpretation.
What if the average score was 95%? Well, that changes your norm-referenced interpretation (you are now below average) but the criterion-referenced interpretation does not change.
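To make the two interpretations concrete, here is a minimal Python sketch. The function names and the classmates' scores are hypothetical, and "percentage of the group scoring strictly below you" is just one common definition of percentile rank (definitions vary).

```python
def passed(score: float, cutscore: float) -> bool:
    """Criterion-referenced: compare the score to a fixed cutscore."""
    return score >= cutscore


def percentile_rank(score: float, group: list[float]) -> float:
    """Norm-referenced: percentage of the comparison group scoring below `score`."""
    below = sum(1 for s in group if s < score)
    return 100.0 * below / len(group)


# Hypothetical classmates' scores (in percent), averaging 75 and 95 respectively.
class_avg_75 = [60, 65, 70, 75, 75, 80, 85, 90]
class_avg_95 = [88, 92, 94, 95, 96, 97, 98, 100]

my_score, cutscore = 90, 80
print(passed(my_score, cutscore))               # True, whichever class you are in
print(percentile_rank(my_score, class_avg_75))  # 87.5
print(percentile_rank(my_score, class_avg_95))  # 12.5
```

The pass/fail result depends only on the cutscore, while the percentile rank moves with the comparison group.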
Now consider a certification exam. This is an example of a test that is specifically designed to be criterion-referenced. It is supposed to verify that you have the knowledge and skills to practice in your profession. It doesn’t matter whether all candidates pass or only a few candidates pass; the cutscore is the cutscore.
However, you could still interpret your score by looking at your percentile rank compared to other examinees; it just doesn’t affect the cutscore.
On the other hand, consider an IQ test. There is no criterion-referenced cutscore for whether you are “smart” or “passed.” Instead, scores are placed on a normal curve scaled to a mean of 100 and a standard deviation of 15, and all interpretations are norm-referenced. Namely, where do you stand compared to others?
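Because an IQ interpretation is purely norm-referenced, a score is typically translated into a z-score and a percentile. Below is a minimal sketch using Python's standard-library `statistics.NormalDist`, assuming the norm group is exactly normal with mean 100 and SD 15; the helper name `iq_percentile` is hypothetical.

```python
from statistics import NormalDist

# IQ-style scale: mean 100, SD 15, assumed normally distributed in the norm group.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_percentile(iq: float) -> float:
    """Percentage of the norm group expected to score below `iq`."""
    return 100.0 * IQ_SCALE.cdf(iq)


for iq in (85, 100, 115, 130):
    z = (iq - IQ_SCALE.mean) / IQ_SCALE.stdev
    print(f"IQ {iq}: z = {z:+.1f}, percentile about {iq_percentile(iq):.1f}")
```

Every number printed here is a statement about your position relative to the norm group; nothing is compared to a fixed standard of mastery.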
Is this impacted by item response theory (IRT)?
If you have looked at item response theory (IRT), you know that it scores examinees on what is effectively a standard normal scale (though the origin is often shifted under the Rasch model). But IRT-scored exams can still be criterion-referenced: they can still be designed to measure a specific body of knowledge and have a cutscore that is fixed and stable over time.
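To see why a fixed cutscore still works under IRT, here is a minimal Python sketch of the Rasch model and a maximum-likelihood ability estimate found by Newton-Raphson. The item difficulties, the response pattern, and the cutscore of 0.0 are hypothetical; operational programs work from calibrated item banks and may use different estimators.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch model: probability that an examinee at ability `theta`
    answers an item of difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses: list[int], difficulties: list[float],
                   theta: float = 0.0, max_iter: int = 50,
                   tol: float = 1e-8) -> float:
    """Maximum-likelihood ability estimate via Newton-Raphson.

    `responses` holds 1 (correct) or 0 (incorrect) for each item.
    """
    if all(responses) or not any(responses):
        raise ValueError("MLE is undefined for all-correct or all-incorrect patterns")
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))  # dlogL/dtheta
        information = sum(p * (1.0 - p) for p in probs)           # -d2logL/dtheta2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical item difficulties and a hypothetical fixed cutscore on the theta scale.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]
CUTSCORE = 0.0

theta = estimate_theta([1, 1, 1, 0, 0], difficulties)
print(f"theta = {theta:.2f}, passed = {theta >= CUTSCORE}")
```

The decision depends only on the examinee's own responses and the item difficulties; no other examinee's score enters the calculation, which is what keeps the interpretation criterion-referenced.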
Even computerized adaptive testing can be used this way. An example is the NCLEX exam for nurses in the United States. It is an adaptive test, but its cutscore is fixed (-0.18 on the Rasch scale for the NCLEX-PN), and it is most definitely criterion-referenced.
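One common way an adaptive test can reach a criterion-referenced decision is a confidence-interval rule: keep testing until the interval around the ability estimate lies wholly above or below the cutscore. The Python sketch below assumes a 95% interval (z = 1.96) and uses hypothetical ability estimates and standard errors; it illustrates the idea and is not the NCLEX's operational algorithm.

```python
NCLEX_PN_CUTSCORE = -0.18  # passing standard on the Rasch scale, as cited above


def classify(theta: float, se: float, cut: float = NCLEX_PN_CUTSCORE,
             z: float = 1.96) -> str:
    """Compare a confidence interval around the ability estimate to a fixed cut.

    Returns "pass" or "fail" once the interval lies entirely on one side of
    the cut; otherwise "continue" (an adaptive test would give another item).
    """
    lower, upper = theta - z * se, theta + z * se
    if lower > cut:
        return "pass"
    if upper < cut:
        return "fail"
    return "continue"


# Hypothetical (ability estimate, standard error) pairs from different candidates.
for theta, se in [(0.50, 0.30), (-0.60, 0.20), (-0.10, 0.30)]:
    print(theta, se, classify(theta, se))
```

The cutscore never moves; only the precision of each candidate's own estimate determines when a pass/fail decision can be made.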
Building and validating an exam
The process of developing an assessment is surprisingly difficult, as there are many forces at play. The greater the stakes, volume, and incentives for stakeholders, the more effort goes into developing and validating the exam. ASC’s expert consultants can help you navigate these rough waters.