What is a standard setting study?

A standard setting study is a formal process for establishing a performance standard, such as the score needed to pass an exam. In the assessment world, the word "standard" has two uses: besides a performance standard, it can also refer to a formal definition of the content being tested, such as the Common Core State Standards in the USA. For this reason, I prefer the term cutscore study.

After item authoring, item review, and test form assembly, a cutscore or passing score will often be set to determine what level of performance qualifies as “pass” or a similar classification. This cannot be done arbitrarily (e.g., setting it at 70% because that’s what you saw when you were in school). To be legally defensible and eligible for accreditation, it must be done using one of several standard-setting approaches from the psychometric literature.

The choice of method depends upon the nature of the test, the availability of pilot data, and the availability of subject matter experts.

Some common types of standard setting studies:

  • Angoff – In an Angoff study, a panel of subject matter experts rates each item, estimating the percentage of minimally competent candidates who would answer it correctly. It is often done in tandem with the Beuk Compromise. The Angoff method does not require actual examinee data, though the Beuk Compromise does.
  • Bookmark – The Bookmark method orders the items in a test form by ascending difficulty, and a panel of experts reads through the ordered booklet and places a “bookmark” where they think the cutscore should be. This requires enough real data to calibrate item difficulty, usually with item response theory, which requires several hundred examinees.
  • Contrasting Groups – Candidates are sorted into Pass and Fail groups based on their performance on a different exam or some other independent standard, and the cutscore is placed where the two groups' scores on this test are best separated. If using data from another exam, a sample of at least 50 candidates is needed.
  • Borderline Group – Similar to Contrasting Groups, but a borderline group is defined using alternative information such as biodata, and the scores of the group are evaluated.
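
To make the Angoff arithmetic concrete, here is a minimal Python sketch of how panel ratings might be turned into a cutscore. It assumes the plain, unmodified Angoff approach (no Beuk Compromise adjustment): each rater's implied cutscore is the sum of their item estimates, and the panel cutscore is the average across raters. The function name angoff_cutscore and the ratings are invented for illustration only and are not taken from any real study or software.

```python
from statistics import mean


def angoff_cutscore(ratings):
    """Compute a simple Angoff cutscore from a panel's ratings.

    ratings[r][i] is rater r's estimate (0-100) of the percentage of
    minimally competent candidates who would answer item i correctly.
    Returns (raw_cutscore, percent_cutscore).
    """
    n_items = len(ratings[0])
    if any(len(rater) != n_items for rater in ratings):
        raise ValueError("Every rater must rate every item")

    # Each rater's implied cutscore: the expected number-correct score
    # of a minimally competent candidate, according to that rater.
    rater_cutscores = [sum(p / 100 for p in rater) for rater in ratings]

    # Panel cutscore: the average of the raters' implied cutscores.
    raw = mean(rater_cutscores)
    return raw, 100 * raw / n_items


# Hypothetical ratings: 3 raters x 4 items (illustrative data only).
ratings = [
    [80, 60, 90, 50],
    [60, 70, 80, 50],
    [80, 70, 90, 60],
]
raw_cut, pct_cut = angoff_cutscore(ratings)
print(f"Raw cutscore: {raw_cut:.2f} of {len(ratings[0])} items ({pct_cut:.1f}%)")
```

In practice, a panel would usually also review rater agreement and discuss outlying ratings before finalizing the cutscore; this sketch skips those steps.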

Nathan Thompson, PhD

CEO at Assessment Systems
Nathan Thompson, PhD, is CEO and Co-Founder of Assessment Systems Corporation (ASC). He is a psychometrician, software developer, author, researcher, and evangelist for AI and automation. His mission is to elevate the profession of psychometrics by using software to automate psychometric work like item review, job analysis, and Angoff studies, so we can focus on more innovative work. His core goal is to improve assessment throughout the world. Nate was originally trained as a psychometrician, with an honors degree at Luther College with a triple major of Math/Psych/Latin, and then a PhD in Psychometrics at the University of Minnesota. He then worked multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, project manager, and business leader. He is also cofounder and Membership Director at the International Association for Computerized Adaptive Testing (iacat.org). He's published 100+ papers and presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/.
