Classical test theory is a century-old paradigm for psychometrics, the use of quantitative and scientific processes to develop and analyze assessments to improve their quality (nobody likes unfair tests!). The most basic and frequently used item statistic from classical test theory is the P-value. It is usually called item difficulty but is sometimes called item facility, which can lead to confusion.
The P-Value Statistic
The classical P-value is the proportion of examinees that respond correctly to a question, or respond in the “keyed direction” for items where the notion of correct is not relevant (imagine a personality assessment where all questions are Yes/No statements such as “I like to go to parties” … Yes is the keyed direction for an Extraversion scale). Note that this is NOT the same as the p-value that is used in hypothesis testing from general statistical methods. This P-value is almost universally agreed upon in terms of calculation. But some people call it item difficulty and others call it item facility. Why?
It has to do with clarity of interpretation. It usually makes sense to think of difficulty as an important aspect of the item. The P-value presents this, but in reverse. We usually expect higher values to indicate more of something, right? But a P-value of 1.00 is high, and it means that there is no difficulty whatsoever; everyone gets the item correct. A P-value of 0.25 is low, but it means that there is a lot of difficulty; only 25% of examinees get it correct.
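To make the calculation concrete, here is a minimal sketch in Python. The function name `item_p_values` and the small score matrix are made up for illustration; the only assumption is that responses are already scored 1 (correct or keyed direction) and 0 (otherwise), with one row per examinee and one column per item.

```python
from typing import Sequence


def item_p_values(responses: Sequence[Sequence[int]]) -> list[float]:
    """Return the classical P-value (proportion scored 1) for each item.

    `responses` is a hypothetical examinee-by-item matrix of 0/1 scores.
    """
    if not responses:
        raise ValueError("need at least one examinee")
    n_examinees = len(responses)
    n_items = len(responses[0])
    # For each item (column), count the 1s and divide by the number of examinees.
    return [
        sum(row[j] for row in responses) / n_examinees
        for j in range(n_items)
    ]


# Illustrative data only: 4 examinees (rows) by 3 items (columns).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

p_values = item_p_values(scores)
print(p_values)  # [1.0, 0.5, 0.25]
```

The first item has the highest P-value (1.00) yet is the easiest, and the third has the lowest (0.25) yet is the hardest, which is exactly the reversed reading described above.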
So where does “item facility” come in?
See how the meaning is reversed? It’s for this reason that some psychometricians prefer to call it item facility or item easiness. We still use the P-value, but 1.00 means high facility/easiness, and 0.25 means low facility/easiness. The direction of the semantics fits much better.
Nevertheless, this is a minority of psychometricians. There’s too much momentum to change an entire field at this point! It’s similar to the 3 dichotomous IRT parameters (a, b, c); some of you might have noticed that they are actually in the wrong order, because the 1-parameter model does not use the a parameter; it uses the b.
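For readers who want to see the ordering issue in code, here is a rough sketch of the standard logistic form of the three-parameter model. The function name `prob_correct` and the example values are hypothetical, and some presentations also include a scaling constant (often written D), which is left out here as a simplifying assumption.

```python
import math


def prob_correct(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Probability of a correct response under the logistic 3PL form.

    With the defaults a=1 and c=0, only b varies by item, which is the
    1-parameter case. No scaling constant D is applied (an assumption).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# 1-parameter case: the only item parameter supplied is b.
p_1pl = prob_correct(theta=0.0, b=0.0)

# 3-parameter case: a (discrimination) and c (lower asymptote) are added.
p_3pl = prob_correct(theta=0.0, b=0.0, a=1.2, c=0.25)

print(p_1pl)  # 0.5
print(p_3pl)  # 0.625
```

Notice that b is the one parameter every call needs, while a and c are optional extras, even though the conventional naming lists a first.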
At the end of the day, it doesn’t really matter, but it’s another good example of how we all just got used to doing something and it’s now too far down the road to change it. Tradition is a funny thing.
Nathan Thompson, PhD, is CEO and Co-Founder of Assessment Systems Corporation (ASC). He is a psychometrician, software developer, author, researcher, and evangelist for AI and automation. His mission is to elevate the profession of psychometrics by using software to automate psychometric work like item review, job analysis, and Angoff studies, so we can focus on more innovative work. His core goal is to improve assessment throughout the world.
Nate was originally trained as a psychometrician, with an honors degree from Luther College with a triple major of Math/Psych/Latin, and then a PhD in Psychometrics from the University of Minnesota. He then worked in multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, project manager, and business leader. He is also cofounder and Membership Director at the International Association for Computerized Adaptive Testing (iacat.org). He’s published 100+ papers and presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/.