Some time ago, I received this question about interpreting item response theory (IRT) cutscores:

In my examination system, we are currently labeling ‘FAIL’ for student’s mark with below 50% and ‘PASS’ for 50% and above.  I found that this amazing Xcalibre software can classify students’ achievement in 2 groups based on scores.  But, when I tried to run IRT EPC with my data (with cut point of 0.5 selected), it shows that students with 24/40 correct items were classified as ‘FAIL’. Because in CTT, 24/40 correctly answered items is equal to 60% (Pass).  I can’t find its interpretation in Guyer & Thompson (2013) User’s Manual for Xcalibre.  How exactly should I set my cut point to perform 2-group classification using IRT EPC in Xcalibre to make it about equal to 50% achievement in CTT?

In this context, EPC refers to expected percent (or proportion) correct.  IRT uses the test response function (TRF) to convert a theta score into the expected percentage of items in the pool that a student would answer correctly.  So this Xcalibre user is wondering how to set an IRT cutscore on theta that meets their needs.

## Classical vs. Item Response Theory Cutscores

The short answer, in this case, is to evaluate the TRF and reverse-calculate the theta for the cutscore.  That is, find your desired cutscore on the y-axis (expected percent correct) and read off the corresponding value of theta on the x-axis.  In the example TRF plot, I have taken a percent cutscore of 70 and found the corresponding theta of -0.20 or so.  In the case above, theta = 0.5 likely corresponded to a percent-correct score of about 80%, so an observed score of 24/40 (60%) would indeed fail.
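The reverse calculation described above can be sketched in a few lines of Python. This is only an illustration, not how Xcalibre does it internally: it assumes a three-parameter logistic (3PL) model with the scaling constant D = 1.7, and the item parameters and the function names `p_correct`, `trf`, and `theta_for_epc` are hypothetical. Because the TRF increases monotonically with theta, a simple bisection search is enough to find the theta that matches a target expected proportion correct.

```python
import math

def p_correct(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def trf(theta, items):
    """Test response function: expected proportion correct at theta."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items) / len(items)

def theta_for_epc(target, items, lo=-6.0, hi=6.0, tol=1e-6):
    """Invert the TRF by bisection to find the theta matching a target EPC."""
    if not trf(lo, items) <= target <= trf(hi, items):
        raise ValueError("target EPC is outside the range the TRF can reach")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if trf(mid, items) < target:
            lo = mid  # TRF too low here, so the cut theta is higher
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical (a, b, c) parameters for a five-item test
items = [
    (1.0, -1.0, 0.20),
    (0.8, -0.5, 0.25),
    (1.2, 0.0, 0.20),
    (0.9, 0.5, 0.20),
    (1.1, 1.0, 0.25),
]

theta_cut = theta_for_epc(0.50, items)  # theta equivalent to 50% correct
print(round(theta_cut, 2), round(trf(theta_cut, items), 3))
```

With real calibration output, the `items` list would be replaced by the estimated parameters for the test, and the target would be the percent-correct cutscore expressed as a proportion. Note that with nonzero guessing parameters the TRF never drops to zero, so very low targets cannot be reached at any theta.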

## Setting the Cutscores with Item Response Theory

Of course, it is indefensible to set a cutscore at an arbitrary round number.  To be defensible, you need to set the cutscore with an accepted methodology such as Angoff, modified-Angoff, Nedelsky, Bookmark, or Contrasting Groups.

A nice example is the modified-Angoff method, which is used extremely often in certification and licensure situations.  More information is available on this method here.  The result of this method will typically be a specific cutscore, on either the raw or percent metric.  The TRF can be presented in both of those metrics, allowing the conversion to theta to be calculated easily.

Alternatively, some standard-setting methods can work directly on the IRT theta scale, including the Bookmark and Contrasting Groups approaches.  For example, the Bookmark method will have you calibrate all items with IRT first and order the items by IRT difficulty in a booklet; experts then page through the booklet and insert a bookmark where they think the cutscore should be (hence the name!).
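To make the Bookmark workflow concrete, here is a minimal Python sketch under several assumptions that are not from the discussion above: a two-parameter logistic model with D = 1.7, a response-probability (RP) criterion that defaults to 0.5, a cutscore taken as the median of the panelists' bookmarked locations, and made-up item parameters and bookmark pages. Operational Bookmark studies differ in their conventions (for example, many use an RP of 0.67 and multiple rounds of ratings), so treat the names `rp_theta` and `bookmark_cut` as hypothetical helpers.

```python
import math
from statistics import median

def rp_theta(a, b, rp=0.5, D=1.7):
    """Theta at which a 2PL item is answered correctly with probability rp."""
    return b + math.log(rp / (1.0 - rp)) / (D * a)

# Hypothetical calibrated items: id -> (a, b)
items = {
    "Q1": (1.0, 0.4),
    "Q2": (0.8, -1.2),
    "Q3": (1.3, 0.9),
    "Q4": (1.1, -0.3),
    "Q5": (0.9, 1.6),
}

# Ordered item booklet: easiest to hardest by IRT difficulty (b)
booklet = sorted(items, key=lambda item_id: items[item_id][1])

# Hypothetical bookmarks: the 1-based booklet page each panelist marked
bookmarks = [3, 3, 4]

def bookmark_cut(booklet, items, bookmarks, rp=0.5):
    """Median across panelists of the RP location of each bookmarked item."""
    thetas = []
    for page in bookmarks:
        a, b = items[booklet[page - 1]]
        thetas.append(rp_theta(a, b, rp=rp))
    return median(thetas)

print(booklet)
print(bookmark_cut(booklet, items, bookmarks))
```

With an RP of 0.5, each bookmarked item's location is simply its b parameter, so the resulting cutscore is already on the theta scale and needs no conversion through the TRF.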

## How do I implement IRT?

Interested in applying IRT to improve your assessments?  Download a free trial copy of Xcalibre here.  If you want to deliver online tests that are scored directly with IRT, in real time (including computerized adaptive testing), check out FastTest.


#### Nathan Thompson, PhD

Nathan Thompson earned his PhD in Psychometrics from the University of Minnesota, with a focus on computerized adaptive testing. His undergraduate degree was from Luther College with a triple major of Mathematics, Psychology, and Latin. He is primarily interested in the use of AI and software automation to augment and replace the work done by psychometricians, which has provided extensive experience in software design and programming. Dr. Thompson has published over 100 journal articles and conference presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/ .