A standard setting study is a formal process for establishing a performance standard. In the assessment world, there are actually two uses of the word standard – the other one refers to a formal definition of the content that is being tested, such as the Common Core State Standards in the USA. For this reason, I prefer the term cutscore study.
After item authoring, item review, and test form assembly, a cutscore or passing score will often be set to determine what level of performance qualifies as “pass” or a similar classification. This cannot be done arbitrarily (e.g., setting it at 70% because that’s what you saw when you were in school). To be legally defensible and eligible for accreditation, it must be done using one of several standard-setting approaches from the psychometric literature.
The choice of method depends upon the nature of the test, the availability of pilot data, and the availability of subject matter experts.
Some types of standard setting studies:
- Angoff – In an Angoff study, a panel of subject matter experts rates each item, estimating the percentage of minimally competent candidates that would answer each item correctly. It is often done in tandem with the Beuk Compromise. The Angoff method does not require actual examinee data, though the Beuk does.
- Bookmark – The bookmark method orders the items in a test form in ascending difficulty, and a panel of experts reads through and places a “bookmark” in the book where they think a cutscore should be. Obviously, this requires enough real data to calibrate item difficulty, usually using item response theory, which requires several hundred examinees.
- Contrasting Groups – Candidates are sorted into Pass and Fail groups based on their performance on a different exam or some other external standard, and the cutscore is placed where it best separates the scores of the two groups. If using data from another exam, a sample of at least 50 candidates is needed.
- Borderline Group – Similar to Contrasting Groups, but a borderline group is defined using alternative information such as biodata, and the scores of the group are evaluated.
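To make the arithmetic behind two of these methods concrete, here is a minimal Python sketch. It assumes the simplest form of each calculation: an Angoff cutscore taken as the mean, across raters, of each rater's summed item ratings, and a Contrasting Groups cutscore taken as the score that misclassifies the fewest candidates. The function names and all of the sample data are hypothetical and only illustrative. The tiny samples are far smaller than a real study would use, and operational studies typically add steps not shown here, such as rater discussion rounds, smoothing, or the Beuk Compromise.

```python
from statistics import mean


def angoff_cutscore(ratings):
    """Return a raw cutscore from Angoff ratings.

    ratings[r][i] is rater r's estimate (0 to 1) of the proportion of
    minimally competent candidates who would answer item i correctly.
    """
    # Each rater's implied cutscore is the sum of their item ratings.
    rater_cutscores = [sum(rater) for rater in ratings]
    # The panel recommendation here is the plain mean across raters.
    return mean(rater_cutscores)


def contrasting_groups_cutscore(pass_scores, fail_scores):
    """Return the observed score that misclassifies the fewest candidates.

    A candidate is classified as passing when score >= cut. Ties are
    resolved in favor of the lowest such cut (an arbitrary choice).
    """
    candidate_cuts = sorted(set(pass_scores) | set(fail_scores))
    best_cut, best_errors = None, None
    for cut in candidate_cuts:
        # Pass-group members below the cut plus fail-group members at or above it.
        errors = sum(s < cut for s in pass_scores) + sum(s >= cut for s in fail_scores)
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


# Hypothetical data: 3 raters rating a 4-item test.
ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.8, 0.6, 0.7],
    [0.7, 0.6, 0.4, 0.9],
]
print(round(angoff_cutscore(ratings), 2))

# Hypothetical scores for candidates already sorted by an external criterion.
pass_scores = [72, 75, 80, 68, 85]
fail_scores = [55, 60, 66, 70, 58]
print(contrasting_groups_cutscore(pass_scores, fail_scores))
```

With this made-up data, the Angoff calculation gives a cutscore of about 2.6 out of 4 points (65%), and the Contrasting Groups calculation gives 68, where only one candidate (the Fail-group member who scored 70) is misclassified.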
Nathan Thompson, PhD