Confidence Interval for Test Scores
A confidence interval for test scores is a common way to interpret test results by reporting a range rather than a single number. Tests provide imperfect measurements at a specific point in time, and actual performance can vary across occasions. An examinee might be sick or tired today and score lower than their true score, or get lucky with items on topics they have studied more closely and score higher than they normally would. The reverse can happen with tricky items.
Psychometricians recognize this and have developed the concept of the standard error of measurement (SEM), which is an index of this variation. The calculation of the SEM differs between classical test theory (CTT) and item response theory (IRT), but in either case, we can use it to build a confidence interval around the observed score. Because tests are imperfect measurements, some psychometricians recommend always reporting scores as a range rather than a single number.
A confidence interval is a general concept from statistics, not specific to psychometrics: it is a likely range for the true value of something being estimated. Taking 1.96 standard errors on each side of a point estimate gives a 95% confidence interval, assuming the errors are approximately normally distributed. Start by calculating 1.96 times the SEM, then add it to and subtract it from the observed score to get the range.
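As a minimal sketch of where the 1.96 comes from and how the range is built, the Python below derives the multiplier from the standard normal distribution and applies it to a score. The function names `z_multiplier` and `confidence_interval` are illustrative, not from any psychometric library, and the sketch assumes normally distributed measurement error.

```python
from statistics import NormalDist


def z_multiplier(level: float = 0.95) -> float:
    """Two-sided normal multiplier for a confidence level (about 1.96 for 0.95)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be between 0 and 1")
    # A two-sided interval leaves (1 - level) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def confidence_interval(score: float, sem: float, level: float = 0.95):
    """Return (low, high): the score plus or minus the multiplier times the SEM."""
    margin = z_multiplier(level) * sem
    return score - margin, score + margin
```

Calling `confidence_interval(score, sem)` with the default level reproduces the "1.96 times the SEM" rule described above; passing a different `level`, such as 0.90, gives a narrower band.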
Example of confidence interval with Classical Test Theory
With CTT, the confidence interval is placed on raw number-correct scores. Suppose the reliability of a 100-item test is 0.90, with a mean of 85 and standard deviation of 5. The SEM is then 5*sqrt(1-0.90) = 5*0.316 = 1.58. Multiplying by 1.96 gives a margin of 3.10, so if your score is 67, the 95% confidence interval is 63.90 to 70.10. We are 95% confident that your true score lies in that range.
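The arithmetic in this example can be checked with a short Python sketch. The helper name `sem_from_reliability` is illustrative; the formula is the one used in the text, the standard deviation times the square root of one minus the reliability.

```python
import math


def sem_from_reliability(sd: float, reliability: float) -> float:
    """CTT standard error of measurement: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Values from the example: SD of 5, reliability of 0.90, observed score of 67.
sem = sem_from_reliability(sd=5.0, reliability=0.90)
margin = 1.96 * sem
low, high = 67 - margin, 67 + margin
print(f"SEM = {sem:.2f}, 95% CI = {low:.2f} to {high:.2f}")
# prints: SEM = 1.58, 95% CI = 63.90 to 70.10
```

Note that the test mean (85) does not enter the calculation; only the standard deviation and the reliability determine the SEM.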
Example of confidence interval with Item Response Theory
The same concept applies to item response theory (IRT), but the scale of the numbers is quite different: theta scores typically fall between about -3 and +3. The SEM is also calculated directly from the item parameters, in a way that is beyond the scope of this discussion. But if your score is -1.0 and the SEM is 0.30, then the margin is 1.96*0.30 = 0.588, and the 95% confidence interval for your score is -1.588 to -0.412. In adaptive testing, this confidence interval can be compared to a cutscore to make pass/fail decisions.
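The comparison to a cutscore can be sketched as a simple three-way rule: pass if the whole interval is above the cutscore, fail if it is entirely below, and otherwise make no decision yet. This is a hedged illustration, not the decision rule of any particular testing platform; the function name, the "undecided" label, and the cutscore of 0.0 used below are assumptions for the example.

```python
def classify_against_cutscore(theta: float, sem: float,
                              cutscore: float, z: float = 1.96) -> str:
    """Compare a theta confidence interval to a cutscore.

    Returns "pass" if the interval lies entirely above the cutscore,
    "fail" if it lies entirely below, and "undecided" if it contains it.
    """
    low = theta - z * sem
    high = theta + z * sem
    if low > cutscore:
        return "pass"
    if high < cutscore:
        return "fail"
    return "undecided"


# Theta and SEM from the example; the cutscore of 0.0 is hypothetical.
print(classify_against_cutscore(theta=-1.0, sem=0.30, cutscore=0.0))
# prints: fail
```

In an adaptive test, an "undecided" result would typically mean administering more items, which tends to shrink the SEM until the interval no longer contains the cutscore or the test reaches its maximum length.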
Example of confidence interval with a Scaled Score
This concept also works on scaled scores. IQ is typically reported on a scale with a mean of 100 and standard deviation of 15. Suppose the test had an SEM of 3.2, and your score was 112. Then 1.96*3.2 = 6.27, and adding this to and subtracting it from the score gives a 95% confidence interval of 105.73 to 118.27.
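For score reporting, the point estimate and its band can be combined into one line. The sketch below is one possible format, not a standard; the function name `format_score_report` and the layout of the output string are assumptions made for illustration.

```python
def format_score_report(score: float, sem: float,
                        z: float = 1.96, label: str = "95%") -> str:
    """Format a scaled score together with its confidence interval."""
    margin = z * sem
    low, high = score - margin, score + margin
    return f"{score:.0f} ({label} CI: {low:.2f} to {high:.2f})"


# Values from the IQ example: score of 112, SEM of 3.2.
print(format_score_report(112, 3.2))
# prints: 112 (95% CI: 105.73 to 118.27)
```

The SEM passed in must be expressed on the same scale as the score, here the IQ scale with a standard deviation of 15.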
Nathan Thompson, PhD