
What validity threats are relevant to psychometric forensics?


Validity, in its modern conceptualization, refers to evidence that supports our intended interpretations of test scores (see Chapter 1 of APA/AERA/NCME Standards for full treatment).  Validity threats are issues that hinder the interpretations and use of scores.  The word “interpretation” is key because test scores can be interpreted in different ways, including ways that are not intended by the test designers.

For example, a test given at the end of nursing school to prepare students for a national licensure exam might be used by the school as a sort of final exam. However, the test was not designed for this purpose and might not even be aligned with the school’s curriculum.

Another example is that certification tests are usually designed to demonstrate minimal competence, not differentiate amongst experts, so interpreting a high score as expertise might not be warranted.

Validity threats: Always be on the lookout!

Test sponsors, therefore, must be vigilant against any validity threats.  Some of these, like the two aforementioned examples, might be outside the scope of the organization.  While it is certainly worthwhile to address such issues, our primary focus is on aspects of the exam itself.

Which validity threats rise to the surface in psychometric forensics?

Here, we will discuss several threats to validity that typically present themselves in psychometric forensics, with a focus on security aspects.  However, I’m not just listing security threats here, as psychometric forensics is excellent at flagging other types of validity threats too.

| Threat | Description | Approach | Example | Indices |
|---|---|---|---|---|
| Collusion (copying) | Examinees are copying answers from one another, usually with a defined Source. | Error similarity (only looks at incorrect responses) | 2 examinees get the same 10 items wrong, and select the same distractor on each | B-B Ran, B-B Obs, K, K1, K2, S2 |
| | | Response similarity | 2 examinees give the same response on 98/100 items | S2, g2, ω, Zjk |
| Group level help/issues | Similar to collusion but at a group level; could be examinees working together, or receiving answers from a teacher/proctor. Note that many examinees using the same brain dump would have a similar signature but across locations. | Group level statistics | Location has one of the highest mean scores but lowest mean times | Descriptive statistics such as mean score, mean time, and pass rate |
| | | Response or error similarity | On a certain group of items, the entire classroom gives the same answers | Roll-up analysis, such as mean collusion flags per group; also erasure analysis (paper only) |
| Pre-Knowledge | Examinee comes in to take the test already knowing the items and answers, often purchased from a brain dump website. | Time-Score analysis | Examinee has high score and very short time | RTE or total time vs. scores |
| | | Response or error similarity | Examinee has all the same responses as a known brain dump site | All indices |
| | | Pretest item comparison | Examinee gets 100% on existing items but 50% on new items | Pre vs Scored results |
| | | Person fit | Examinee gets the 10 hardest items correct but performs below average on the rest of the items | Guttman indices, lz |
| Harvesting | Examinee is not actually taking the test, but is sitting it to memorize items so they can be sold afterwards, often at a brain dump website. Similar signature to Sleepers but more likely to occur on voluntary tests, or where high scores benefit examinees. | Time-Score analysis | Low score, high time, few attempts | RTE or total time vs. scores |
| | | Mean vs Median item time | Examinee “camps” on 10 items to memorize them; mean item time much higher than the median | Mean-Median index |
| | | Option flagging | Examinee answers “C” to all items in the second half | Option proportions |
| Low motivation: Sleeper | Examinees are disengaged, producing data that is flagged as unusual and invalid; fortunately, not usually a security concern but could be a policy concern. Similar signature to Harvester but more likely to occur on mandatory tests, or where high scores do not benefit examinees. | Time-Score analysis | Low score, high time, few attempts | RTE or total time vs. scores |
| | | Item timeout rate | If you have item time limits, examinee hits them | Proportion of items that hit the limit |
| | | Person fit | Examinee attempts a few items, passes through the rest | Guttman indices, lz |
| Low motivation: Clicker | Examinees are disengaged, producing data that is flagged as unusual and invalid; fortunately, not usually a security concern but could be a policy concern. Similar idea to Sleeper but data is quite different. | Time-Score analysis | Examinee quickly clicks “A” to all items, finishing with a low time and low score | RTE or total time vs. scores |
| | | Option flagging | See above | Option proportions |
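
To make a few of the approaches in the table more concrete, here is a minimal Python sketch of the raw quantities behind three of them: response and error similarity for a pair of examinees, option proportions, and a mean-versus-median item time comparison. The function names, the data, and the exact calculations are assumptions for illustration only. In particular, the counts below are not the published indices named in the table (such as B-B, K, S2, g2, or ω), which go further by modeling how much agreement would be expected by chance, and the mean-minus-median difference is only one plausible way to express a Mean-Median index.

```python
from collections import Counter
from statistics import mean, median


def similarity_counts(resp_a, resp_b, key):
    """Return (identical responses, identical incorrect responses) for two examinees.

    These are raw counts only; real collusion indices compare such counts
    against what would be expected by chance.
    """
    same_responses = 0
    same_errors = 0
    for a, b, k in zip(resp_a, resp_b, key):
        if a == b:
            same_responses += 1
            if a != k:  # both examinees chose the same wrong option
                same_errors += 1
    return same_responses, same_errors


def option_proportions(responses):
    """Proportion of items on which each option was selected."""
    counts = Counter(responses)
    n = len(responses)
    return {option: count / n for option, count in counts.items()}


def mean_minus_median(item_times):
    """Difference between mean and median item time (hypothetical formulation).

    A large positive value suggests a few items with very long times,
    as when an examinee "camps" on items to memorize them.
    """
    return mean(item_times) - median(item_times)


# Hypothetical data: a 10-item key and two examinees' responses.
key = list("ABCDABCDAB")
examinee_1 = list("ABCDACCDBB")
examinee_2 = list("ABCDACCDBA")

print(similarity_counts(examinee_1, examinee_2, key))

# Hypothetical second half of a test where the examinee mostly picks "C".
print(option_proportions(list("CCCCCCCCAC")))

# Hypothetical item times in seconds: most items are quick, one is very long.
print(mean_minus_median([10, 10, 10, 10, 110]))
```

None of these values is a flag on its own. In practice, each would be compared against a threshold or a reference distribution, and a flag is a reason to investigate rather than proof of misconduct.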
Nathan Thompson, PhD

Nathan Thompson, PhD, is CEO and Co-Founder of Assessment Systems Corporation (ASC). He is a psychometrician, software developer, author, researcher, and evangelist for AI and automation. His mission is to elevate the profession of psychometrics by using software to automate psychometric work like item review, job analysis, and Angoff studies, so we can focus on more innovative work. His core goal is to improve assessment throughout the world.

Nate was originally trained as a psychometrician, with an honors degree at Luther College with a triple major of Math/Psych/Latin, and then a PhD in Psychometrics at the University of Minnesota. He then worked multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, project manager, and business leader. He is also cofounder and Membership Director at the International Association for Computerized Adaptive Testing (iacat.org). He’s published 100+ papers and presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/.
