Entries by Nathan Thompson, PhD

Multistage Testing

Multistage testing (MST) is a type of computerized adaptive testing (CAT). This means the exam is delivered on a computer that dynamically personalizes it for each examinee or student. Typically, this is done with respect to the difficulty of the questions, by making the exam easier for lower-ability students and harder for high-ability students. Doing […]

Automated Item Generation

Automated item generation (AIG) is a paradigm for developing assessment items (test questions), utilizing principles of artificial intelligence and automation. As the name suggests, it tries to automate some or all of the effort involved with item authoring, as that is one of the most time-intensive aspects of assessment development – which is no news […]

Ebel Method of Standard Setting

The Ebel method of standard setting is a psychometric approach to establish a cutscore for tests consisting of multiple-choice questions. It is usually used for high-stakes examinations in the fields of higher education, medical and health professions, and for selecting applicants. How is the Ebel method performed? The Ebel method requires a panel of judges who […]

Distractor Analysis for Test Items

Distractor analysis refers to the process of evaluating the performance of incorrect answers vs the correct answer for multiple choice items on a test.  It is a key step in the psychometric analysis process to evaluate item and test performance as part of documenting test reliability and validity. What is a distractor? An item distractor, […]

What is multi-modal test delivery?

Multi-modal test delivery refers to an exam that is capable of being delivered in several different ways, or to an online testing software platform designed to support this process. For example, you might provide the option for a certification exam to be taken on computer at third-party testing centers or via paper at the annual […]

Confidence Interval for Test Scores

A confidence interval for test scores is a common way to interpret the results of a test by phrasing it as a range rather than a single number.  We all understand that tests provide imperfect measurements at a specific point in time, and actual performance can vary over different occasions.  The examinee might be sick […]
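As a rough illustration of the idea, here is a minimal sketch in the classical test theory framing, where the interval is built from the standard error of measurement (SEM = SD × √(1 − reliability)). The function name, its parameters, and the example numbers are hypothetical and are not taken from the full article.

```python
import math


def score_confidence_interval(score, sd, reliability, z=1.96):
    """Return (lower, upper) bounds around an observed test score.

    Assumes the classical test theory standard error of measurement:
    SEM = SD * sqrt(1 - reliability). The default z of 1.96 gives an
    approximate 95% interval under a normal error assumption.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    sem = sd * math.sqrt(1.0 - reliability)
    return score - z * sem, score + z * sem


# Hypothetical example: observed score 100, SD 10, reliability 0.91
lower, upper = score_confidence_interval(100, 10, 0.91)
print(f"{lower:.2f} to {upper:.2f}")
```

With these made-up inputs the SEM is about 3 points, so the reported range is roughly 6 points on either side of the observed score.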

Composite Test Score

A composite test score refers to a test score that is combined from the scores of multiple tests, that is, a test battery.  The purpose is to create a single number that succinctly summarizes examinee performance.  Of course, some information is lost by this, so the original scores are typically reported as well. This is […]

Inter-Rater Reliability vs Agreement

Inter-rater reliability and inter-rater agreement are important concepts in certain psychometric situations. Many assessments never involve raters, but plenty of others do. This article will define these two concepts and discuss two psychometric situations where they are important. For a more detailed treatment, I recommend Tinsley […]

Split Half Reliability Index

Split Half Reliability is an internal consistency approach to quantifying the reliability of a test, in the paradigm of classical test theory.  Reliability refers to the repeatability or consistency of the test scores; we definitely want a test to be reliable.  The name comes from a simple description of the method: we split the test […]
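To make the idea concrete, here is a minimal sketch of one common way to compute it: split the items into odd and even halves, correlate the two half scores, and step the correlation up to full test length with the Spearman-Brown formula. The odd/even split, the function names, and the tiny data set are assumptions for illustration; the full article may describe a different split.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """responses: one list of item scores (e.g. 0/1) per examinee.

    Uses an odd/even item split, which is only one of many possible splits.
    """
    first_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson(first_half, second_half))


# Hypothetical data: 4 examinees, 4 items scored 0/1
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(data), 3))
```

Because the result depends on how the items are split, different splits of the same test can give somewhat different values.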