Some time ago, I received this question about interpreting cutscores under item response theory (IRT):
In my examination system, we are currently labeling ‘FAIL’ for student’s mark with below 50% and ‘PASS’ for 50% and above. I found that this amazing Xcalibre software can classify students’ achievement in 2 groups based on scores. But, when I tried to run IRT EPC with my data (with cut point of 0.5 selected), it shows that students with 24/40 correct items were classified as ‘FAIL’. Because in CTT, 24/40 correctly answered items is equal to 60% (Pass). I can’t find its interpretation in Guyer & Thompson (2013) User’s Manual for Xcalibre. How exactly should I set my cut point to perform 2-group classification using IRT EPC in Xcalibre to make it about equal to 50% achievement in CTT?
In this context, EPC refers to expected percent (or proportion) correct. IRT uses the test response function (TRF) to convert a theta score into the expected percentage of items in the pool that a student would answer correctly. So this Xcalibre user is wondering how to set an IRT cutscore on theta that meets their needs.
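For reference, the TRF can be written out explicitly. The form below assumes the three-parameter logistic (3PL) model with n items; the symbols a_i (discrimination), b_i (difficulty), c_i (pseudo-guessing), and the scaling constant D are standard IRT notation rather than anything taken from the question above, and a different model would change the item-level term.

```latex
\[
P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp\left[-D\,a_i(\theta - b_i)\right]},
\qquad
\mathrm{EPC}(\theta) = \frac{1}{n}\sum_{i=1}^{n} P_i(\theta)
\]
```

Multiplying EPC by n gives the expected number-correct score, which is why the same curve can be read on either the raw or the percent metric.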
Classical vs IRT cutscores
The short answer, in this case, is to evaluate the TRF and reverse-calculate the theta for the cutscore. That is, find your desired cutscore on the y-axis and determine the corresponding value of theta on the x-axis. In the example below, I have taken a percent cutscore of 70 and found the corresponding theta of about -0.20. In the case above, the cut point of 0.5 was apparently applied as a theta value rather than as 50% correct; a theta of 0.5 likely corresponded to a percent-correct score of 80%, so an observed score of 24/40 would indeed fail.
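The reverse calculation can be sketched in a few lines of Python. This is a minimal illustration, not how Xcalibre does it internally: it assumes a 3PL model, the item parameters in ITEMS are invented for the example, and the function names are hypothetical. Because the TRF increases with theta whenever all discriminations are positive, a simple bisection search is enough to find the theta that matches a target proportion correct.

```python
import math

def p_correct(theta, a, b, c=0.0, D=1.7):
    """3PL probability of a correct response (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def expected_proportion_correct(theta, items):
    """Test response function: mean item probability at theta."""
    return sum(p_correct(theta, *item) for item in items) / len(items)

def theta_for_cutscore(target, items, lo=-6.0, hi=6.0, tol=1e-6):
    """Bisection search for the theta whose expected proportion correct equals target."""
    if not (expected_proportion_correct(lo, items) < target
            < expected_proportion_correct(hi, items)):
        raise ValueError("target is outside the range the TRF reaches on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_proportion_correct(mid, items) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical item parameters (a, b, c); real values come from your calibration.
ITEMS = [
    (1.0, -1.5, 0.20),
    (0.8, -0.5, 0.20),
    (1.2, 0.0, 0.25),
    (0.9, 0.7, 0.20),
    (1.1, 1.4, 0.20),
]

theta_cut = theta_for_cutscore(0.50, ITEMS)
print(f"theta matching 50% expected correct: {theta_cut:.3f}")
```

Note that with nonzero guessing parameters the TRF never drops below the average c value, so a target below that floor has no matching theta; the sketch raises an error in that case rather than returning a misleading number.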
Setting the Cutscores with IRT
Of course, it is indefensible to set a cutscore at an arbitrary round number. To be defensible, you need to set the cutscore with an accepted methodology such as Angoff, modified-Angoff, Nedelsky, Bookmark, or Contrasting Groups.
A nice example is the modified-Angoff method, which is used extremely often in certification and licensure situations. More information is available on this method here. The result of this method will typically be a specific cutscore, on either the raw or the percent metric. The TRF can be presented in both of those metrics, allowing the conversion on the right to be calculated easily.
Alternatively, some standard-setting methods can work directly on the IRT theta scale, including the Bookmark and Contrasting Groups approaches. For example, the Bookmark method will have you calibrate all items with IRT first and order the items by IRT difficulty in a booklet; experts then page through the booklet and insert a bookmark where they think the cutscore should be (hence the name!).
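As a rough sketch of the Bookmark mechanics in Python: the item IDs, difficulties, and panelist bookmarks below are invented, and the function names are hypothetical. The sketch takes each panelist's cutscore to be the difficulty of the item on the bookmarked page and then takes the median across panelists. That is only one simple convention; operational studies often use a response-probability criterion and several rounds of discussion.

```python
from statistics import median

# Hypothetical calibrated items: (item_id, IRT difficulty b)
ITEMS = [
    ("Q07", 0.35),
    ("Q02", -1.20),
    ("Q11", 1.10),
    ("Q05", -0.40),
    ("Q09", 0.80),
]

def build_booklet(items):
    """Order items from easiest to hardest by IRT difficulty."""
    return sorted(items, key=lambda item: item[1])

def bookmark_cutscore(booklet, bookmark_pages):
    """Median difficulty of the bookmarked items (pages are 1-indexed)."""
    thetas = [booklet[page - 1][1] for page in bookmark_pages]
    return median(thetas)

booklet = build_booklet(ITEMS)
panel_bookmarks = [3, 4, 3]  # hypothetical page chosen by each of three experts
theta_cut = bookmark_cutscore(booklet, panel_bookmarks)
print(f"Bookmark cutscore on theta: {theta_cut:.2f}")
```

The resulting theta can be used directly as the IRT cutscore, or converted to a raw or percent score through the TRF.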
Interested in applying IRT to improve your assessments? Download a free trial copy of Xcalibre here. If you want to deliver online tests that are scored directly with IRT, in real time (including computerized adaptive testing), check out FastTest.
Nathan Thompson, PhD, is CEO and Co-Founder of Assessment Systems Corporation (ASC). He is a psychometrician, software developer, author, researcher, and evangelist for AI and automation. His mission is to elevate the profession of psychometrics by using software to automate psychometric work like item review, job analysis, and Angoff studies, so we can focus on more innovative work. His core goal is to improve assessment throughout the world.
Nate was originally trained as a psychometrician, with an honors degree from Luther College with a triple major in Math/Psych/Latin, and then a PhD in Psychometrics from the University of Minnesota. He then worked in multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, project manager, and business leader. He is also cofounder and Membership Director at the International Association for Computerized Adaptive Testing (iacat.org). He’s published 100+ papers and presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/.