
A confidence interval for test scores is a common way to interpret test results: it reports a range rather than a single number.  Tests are imperfect measurements taken at a single point in time, and an examinee's performance can vary from one occasion to the next.  The examinee might be sick or tired today and score lower than their true score, or get lucky with items on topics they have studied closely and score higher than they normally would (or get unlucky with tricky items and score lower).

Psychometricians recognize this and have developed the concept of the standard error of measurement, which is an index of this variation.  The calculation of the SEM differs between classical test theory and item response theory, but in either case, we can use it to make a confidence interval around the observed score. Because tests are imperfect measurements, some psychometricians recommend always reporting scores as a range rather than a single number.

A confidence interval is a general statistical concept, not one specific to psychometrics: it gives a likely range for the true value of something being estimated.  Taking 1.96 times the standard error on each side of a point estimate gives a 95% confidence interval.  For a test score, calculate 1.96 times the SEM, then add it to and subtract it from the observed score to get the range.
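The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval`, the score of 50, and the SEM of 2.0 are hypothetical values chosen for the example, and the sketch assumes measurement error is approximately normally distributed, which is where the 1.96 multiplier comes from.

```python
from statistics import NormalDist


def confidence_interval(score, sem, level=0.95):
    """Return (lower, upper) bounds computed as score +/- z * SEM."""
    # Two-sided critical value from the standard normal distribution;
    # for level=0.95 this is approximately 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sem
    return score - margin, score + margin


# Hypothetical example values, not taken from any real test.
lower, upper = confidence_interval(score=50, sem=2.0)
print(f"{lower:.2f} to {upper:.2f}")
```

Passing a different `level`, such as 0.90, would give a narrower interval with a smaller multiplier.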

Example of confidence interval with Classical Test Theory

With CTT, the confidence interval is placed on raw number-correct scores.  Suppose the reliability of a 100-item test is 0.90, with a mean of 85 and standard deviation of 5.  The SEM is then 5*sqrt(1-0.90) = 5*0.316 = 1.58.  If your score is 67, the margin is 1.96*1.58 = 3.10, so the 95% confidence interval is 63.90 to 70.10.  We are 95% sure that your true score lies in that range.
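The arithmetic in this example can be checked with a short Python sketch. The helper name `ctt_sem` is hypothetical; the formula it implements is the standard CTT one used above, SD times the square root of one minus reliability.

```python
import math


def ctt_sem(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# Values from the example above: SD = 5, reliability = 0.90, score = 67.
sem = ctt_sem(sd=5, reliability=0.90)
margin = 1.96 * sem
score = 67
lower, upper = score - margin, score + margin

print(f"SEM = {sem:.2f}")                       # SEM = 1.58
print(f"95% CI: {lower:.2f} to {upper:.2f}")    # 95% CI: 63.90 to 70.10
```

Note that the mean of 85 does not enter the calculation; only the standard deviation and the reliability determine the SEM.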

Example of confidence interval with Item Response Theory

The same concept applies to item response theory, but the scale of numbers is quite different, because the theta scale runs from approximately -3 to +3.  Also, the SEM is calculated directly from item parameters, in a complex way that is beyond the scope of this discussion.  If your score is -1.0 and the SEM is 0.30, then the margin is 1.96*0.30 = 0.588, and the 95% confidence interval for your score is -1.588 to -0.412.  In adaptive testing, this confidence interval can be compared to a cutscore to make pass/fail decisions.
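One simple way to picture the comparison to a cutscore is the sketch below. It is an assumed decision rule for illustration, not necessarily the one a given adaptive testing program uses: the function name `classify`, the "continue testing" label, and the cutscores of 0.0 and -0.5 are all hypothetical.

```python
def classify(theta, sem, cutscore, z=1.96):
    """Compare a confidence interval around theta to a cutscore.

    Returns "pass" if the whole interval is above the cutscore,
    "fail" if the whole interval is below it, and
    "continue testing" if the interval still contains the cutscore.
    """
    lower = theta - z * sem
    upper = theta + z * sem
    if lower > cutscore:
        return "pass"
    if upper < cutscore:
        return "fail"
    return "continue testing"


# Theta and SEM from the example above; the cutscores are hypothetical.
print(classify(theta=-1.0, sem=0.30, cutscore=0.0))   # fail
print(classify(theta=-1.0, sem=0.30, cutscore=-0.5))  # continue testing
```

With a cutscore of 0.0 the entire interval (-1.588 to -0.412) falls below the cutscore, so the decision is clear. With a cutscore of -0.5 the interval straddles it, so more items would be needed to shrink the SEM before deciding.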

Example of confidence interval with a Scaled Score

This concept also works on scaled scores.  IQ is typically reported on a scale with a mean of 100 and standard deviation of 15.  Suppose the test had an SEM of 3.2, and your score was 112.  Taking 1.96*3.2 = 6.27, then adding it to and subtracting it from the score, gives a confidence interval of 105.73 to 118.27.


Nathan Thompson, PhD

Nathan Thompson earned his PhD in Psychometrics from the University of Minnesota, with a focus on computerized adaptive testing. His undergraduate degree was from Luther College with a triple major of Mathematics, Psychology, and Latin. He is primarily interested in the use of AI and software automation to augment and replace the work done by psychometricians, which has provided extensive experience in software design and programming. Dr. Thompson has published over 100 journal articles and conference presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/ .