Validity Threats and Psychometric Forensics

April 10, 2024

Validity threats are issues with a test or assessment that undermine the interpretation and use of scores, such as cheating, inappropriate use of scores, unfair preparation, or non-standardized delivery. It is important to establish a test security plan that defines the threats relevant to your program and how you will address them.

Validity, in its modern conceptualization, refers to evidence that supports our intended interpretations of test scores (see Chapter 1 of the AERA/APA/NCME Standards for Educational and Psychological Testing for a full treatment). The word "interpretation" is key because test scores can be interpreted in different ways, including ways that the test designers never intended. For example, a test given at the end of nursing school to prepare students for a national licensure exam might be used by the school as a sort of final exam. However, the test was not designed for this purpose and might not even be aligned with the school's curriculum. Similarly, certification tests are usually designed to demonstrate minimal competence, not to differentiate among experts, so interpreting a high score as evidence of expertise might not be warranted.

Validity threats: Always be on the lookout!

Test sponsors, therefore, must be vigilant against any validity threats. Some of these, like the two examples above, might be outside the testing organization's control. While it is certainly worthwhile to address such issues, our primary focus here is on aspects of the exam itself.

Which validity threats rise to the surface in psychometric forensics?

Here, we will discuss several threats to validity that typically present themselves in psychometric forensics, with a focus on security aspects. However, I’m not just listing security threats here, as psychometric forensics is excellent at flagging other types of validity threats too.

Threat	Description	Approach	Example	Indices
Collusion (copying)	Examinees are copying answers from one another, usually with a defined source.	Error similarity (incorrect responses only)	2 examinees get the same 10 items wrong and select the same distractor on each	B-B Ran, B-B Obs, K, K1, K2, S2
Collusion (copying)		Response similarity	2 examinees give the same response on 98 of 100 items	S2, g₂, ω, Z_jk
Group-level help/issues	Similar to collusion but at a group level; could be examinees working together, or examinees receiving answers from a teacher or proctor. Note that many examinees using the same brain dump would have a similar signature, but spread across locations.	Group-level statistics	Location has one of the highest mean scores but one of the lowest mean times	Descriptive statistics such as mean score, mean time, and pass rate
Group-level help/issues		Response or error similarity	On a certain group of items, the entire classroom gives the same answers	Roll-up analysis, such as mean collusion flags per group; also erasure analysis (paper only)
Pre-knowledge	Examinee comes in to take the test already knowing the items and answers, often purchased from a brain dump website.	Time-Score analysis	Examinee has a high score and a very short time	RTE or total time vs. scores
Pre-knowledge		Response or error similarity	Examinee has all the same responses as a known brain dump site	All indices
Pre-knowledge		Pretest item comparison	Examinee gets 100% on existing (scored) items but 50% on new (pretest) items	Pretest vs. scored item results
Pre-knowledge		Person fit	Examinee gets the 10 hardest items correct but performs below average on the rest of the items	Guttman indices, l_z
Harvesting	Examinee is not actually taking the test, but is sitting it to memorize items so they can be sold afterwards, often on a brain dump website. Similar signature to Sleepers, but more likely to occur on voluntary tests or where high scores benefit examinees.	Time-Score analysis	Low score, high time, few attempts	RTE or total time vs. scores
Harvesting		Mean vs. median item time	Examinee "camps" on 10 items to memorize them; mean item time is much higher than the median	Mean-Median index
Harvesting		Option flagging	Examinee answers "C" to all items in the second half	Option proportions
Low motivation: Sleeper	Examinees are disengaged, producing data that is flagged as unusual and invalid; fortunately, this is not usually a security concern but could be a policy concern. Similar signature to Harvesters, but more likely to occur on mandatory tests or where high scores do not benefit examinees.	Time-Score analysis	Low score, high time, few attempts	RTE or total time vs. scores
Low motivation: Sleeper		Item timeout rate	Items have time limits and the examinee hits them	Proportion of items that hit the limit
Low motivation: Sleeper		Person fit	Examinee attempts a few items and passes through the rest	Guttman indices, l_z
Low motivation: Clicker	Examinees are disengaged, producing data that is flagged as unusual and invalid; fortunately, this is not usually a security concern but could be a policy concern. Similar idea to Sleeper, but the data look quite different.	Time-Score analysis	Examinee quickly clicks "A" to all items, finishing with a low time and a low score	RTE or total time vs. scores
Low motivation: Clicker		Option flagging	Examinee answers "A" to all items	Option proportions
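The Time-Score analysis rows can be screened with a few lines of code. This minimal Python sketch assumes that RTE means response time effort: the proportion of items on which an examinee's response time reaches a per-item threshold for genuine solution behavior. It pairs RTE with a crude quadrant check of total time vs. score. The function names, thresholds, and cutoffs are hypothetical. In practice they would come from your own program's time and score distributions.

```python
def response_time_effort(item_times, thresholds):
    """Response time effort (RTE): share of items showing solution behavior.

    An item counts as solution behavior when its response time meets or
    exceeds that item's threshold; faster responses are treated as rapid
    guesses. The thresholds are hypothetical, program-specific inputs.
    """
    if not item_times or len(item_times) != len(thresholds):
        raise ValueError("need one threshold per item time")
    solution = sum(1 for t, th in zip(item_times, thresholds) if t >= th)
    return solution / len(item_times)


def time_score_flag(score, total_time, score_cut, time_cut):
    """Crude quadrant check for the Time-Score patterns in the table.

    score_cut and time_cut are hypothetical cutoffs, for example chosen
    from percentiles of your own score and total-time distributions.
    """
    high_score = score >= score_cut
    long_time = total_time >= time_cut
    if high_score and not long_time:
        return "high score, short time: possible pre-knowledge"
    if not high_score and long_time:
        return "low score, long time: possible harvester or sleeper"
    if not high_score and not long_time:
        return "low score, short time: possible clicker"
    return "no flag"


# Hypothetical examinee: 5 items, 5-second threshold on every item.
print(response_time_effort([30, 2, 25, 1, 40], [5] * 5))
print(time_score_flag(score=95, total_time=600, score_cut=70, time_cut=1800))
```

A flag like this is only a prompt for closer review. Plenty of legitimate examinees are fast and strong, or slow and weak.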
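For the Pretest item comparison row, the basic quantity is the gap between an examinee's proportion correct on scored items and on pretest items. Here is a minimal sketch with hypothetical names and data. Pretest items often differ in difficulty from scored items, so a gap is better judged against the typical gap across all examinees than against zero.

```python
def pretest_gap(scores, is_pretest):
    """Proportion correct on scored vs. pretest items for one examinee.

    scores: 0/1 item scores; is_pretest: True for unscored pretest items.
    Returns (p_scored, p_pretest, p_scored - p_pretest).
    """
    scored = [s for s, pre in zip(scores, is_pretest) if not pre]
    pretest = [s for s, pre in zip(scores, is_pretest) if pre]
    if not scored or not pretest:
        raise ValueError("need at least one scored and one pretest item")
    p_scored = sum(scored) / len(scored)
    p_pretest = sum(pretest) / len(pretest)
    return p_scored, p_pretest, p_scored - p_pretest


# Hypothetical form: 10 scored items (all correct), 4 pretest items (2 correct).
scores = [1] * 10 + [1, 0, 1, 0]
is_pretest = [False] * 10 + [True] * 4
print(pretest_gap(scores, is_pretest))
```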
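The Person fit rows list Guttman indices. A common one counts Guttman errors. First, items are ordered from easiest to hardest by their proportion correct in a reference sample. A Guttman error is then any pair in which the examinee misses the easier item but answers the harder one correctly. The sketch below returns the raw count G and a normalized version that divides by r(k − r), the maximum possible for raw score r on k items. The function name and the example data are illustrative.

```python
def guttman_errors(responses, p_values):
    """Guttman error count G and normalized G for one examinee.

    responses: 0/1 item scores. p_values: proportion correct for each item
    in a reference sample (higher means easier). A Guttman error is a pair
    of items where the examinee misses the easier item but gets the harder
    one right. Normalized G divides by r * (k - r), the maximum possible
    for raw score r on k items.
    """
    order = sorted(range(len(p_values)), key=lambda i: p_values[i], reverse=True)
    g = 0
    misses_so_far = 0
    for i in order:                  # walk from easiest to hardest
        if responses[i] == 1:
            g += misses_so_far       # every easier miss pairs with this hit
        else:
            misses_so_far += 1
    k, r = len(responses), sum(responses)
    max_g = r * (k - r)
    return g, (g / max_g if max_g else 0.0)


# Hypothetical 8-item test, easiest item first.
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(guttman_errors([1, 1, 1, 0, 0, 0, 0, 0], p))  # consistent pattern
print(guttman_errors([0, 0, 0, 0, 0, 1, 1, 1], p))  # only the hardest items right
```

The second pattern mirrors the table's example of an examinee who gets the hardest items right and misses the rest. It reaches the maximum of 15 errors, so its normalized value is 1.0.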
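The Person fit rows also list l_z, the standardized log-likelihood person-fit statistic. This minimal sketch assumes a two-parameter logistic (2PL) IRT model with known item parameters and a known ability θ. All parameter values here are hypothetical. Operationally θ is estimated, and the distribution of l_z is then only approximately standard normal. Large negative values indicate an improbable response pattern.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response (logistic, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic l_z.

    l0  = sum u*ln(P) + (1-u)*ln(1-P)        (observed log-likelihood)
    E   = sum P*ln(P) + (1-P)*ln(1-P)        (its expectation)
    Var = sum P*(1-P) * ln(P / (1-P))**2     (its variance)
    l_z = (l0 - E) / sqrt(Var); large negative values signal misfit.
    """
    l0 = expected = variance = 0.0
    for u, p in zip(responses, probs):
        log_p, log_q = math.log(p), math.log(1.0 - p)
        l0 += u * log_p + (1 - u) * log_q
        expected += p * log_p + (1 - p) * log_q
        variance += p * (1 - p) * (log_p - log_q) ** 2
    return (l0 - expected) / math.sqrt(variance)


# Hypothetical 9-item test: a = 1, b from -2 to 2, examinee at theta = 0.
bs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
probs = [p_2pl(0.0, 1.0, b) for b in bs]
consistent = [1, 1, 1, 1, 1, 0, 0, 0, 0]   # gets the easy items right
aberrant = [0, 0, 0, 0, 1, 1, 1, 1, 1]     # misses easy items, gets hard ones
print(lz_statistic(consistent, probs), lz_statistic(aberrant, probs))
```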
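For the Harvesting row on mean vs. median item time, camping on a few items inflates the mean item time but barely moves the median. The sketch below expresses that gap as both a difference and a ratio. This is an illustrative version under our own assumptions. It is not necessarily the exact formula behind the Mean-Median index in any particular software.

```python
from statistics import mean, median


def mean_median_gap(item_times):
    """Compare mean and median item time for one examinee.

    Camping on a few items inflates the mean while barely moving the
    median, so a large difference or ratio points to a handful of
    unusually long item times. Returns (mean - median, mean / median).
    """
    if not item_times:
        raise ValueError("no item times")
    m, md = mean(item_times), median(item_times)
    ratio = m / md if md > 0 else float("inf")
    return m - md, ratio


# Hypothetical harvester: 18 items at 30 s, then 300 s on each of 2 items.
print(mean_median_gap([30] * 18 + [300, 300]))
```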
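The Option flagging rows (Harvesting and Clicker) rest on simple option proportions. The idea is to check whether one answer choice dominates, either across the whole test or within a segment such as the second half. Here is a minimal sketch with hypothetical names. Omitted items are marked as None.

```python
from collections import Counter


def option_proportions(responses):
    """Share of answered items on each option; None marks an omitted item."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return {}
    counts = Counter(answered)
    return {opt: n / len(answered) for opt, n in counts.items()}


def dominant_option(responses, start=0, end=None):
    """Most frequent option, and its share, within responses[start:end]."""
    props = option_proportions(responses[start:end])
    if not props:
        return None, 0.0
    option = max(props, key=props.get)
    return option, props[option]


# Hypothetical examinee who answers "C" to every item in the second half.
responses = list("BADCABDC") + ["C"] * 8
print(dominant_option(responses))                             # whole test
print(dominant_option(responses, start=len(responses) // 2))  # second half
```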
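The Group-level help/issues rows call for descriptive statistics and roll-ups by location or classroom. The sketch below computes mean score, mean time, pass rate, and mean collusion flags per group. It assumes each examinee's pairwise flag count has already been computed upstream. The record layout and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_rollup(records):
    """Per-group descriptive statistics for the group-level rows of the table.

    records: iterable of (group, score, minutes, passed, collusion_flags),
    where collusion_flags is how many pairwise collusion flags that
    examinee received (a hypothetical upstream input).
    """
    by_group = defaultdict(list)
    for group, score, minutes, passed, flags in records:
        by_group[group].append((score, minutes, passed, flags))
    summary = {}
    for group, rows in by_group.items():
        scores, minutes, passed, flags = zip(*rows)
        summary[group] = {
            "n": len(rows),
            "mean_score": mean(scores),
            "mean_time": mean(minutes),
            "pass_rate": sum(passed) / len(rows),
            "mean_flags": mean(flags),
        }
    return summary


# Hypothetical data for two test sites.
records = [
    ("Site A", 62, 95, False, 0), ("Site A", 75, 88, True, 1), ("Site A", 70, 102, True, 0),
    ("Site B", 94, 31, True, 3), ("Site B", 96, 29, True, 4), ("Site B", 95, 35, True, 3),
]
summary = group_rollup(records)
# List high-scoring, fast groups first, the pattern described in the table.
for group in sorted(summary, key=lambda g: (-summary[g]["mean_score"], summary[g]["mean_time"])):
    print(group, summary[group])
```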

Psychometric Forensics to Find Evidence of Cheating

An emerging area of psychometrics is devoted to analyzing test data to find cheaters and other illicit or invalid testing behavior. There is a distinction between primary and secondary collusion, and there are specific collusion detection indices and methods for investigating aberrant testing behavior, such as those listed in the table above.
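Collusion indices such as K and ω start from agreement counts between pairs of examinees. They then compare those counts with what would be expected by chance, given the examinees' abilities and the items' properties. The Python sketch below computes only the raw counts for every pair. It is not an implementation of any of the named indices, and the function names and data are hypothetical.

```python
from itertools import combinations


def similarity_counts(resp_a, resp_b, key):
    """Raw agreement counts for one pair of examinees.

    Returns (identical, identical_incorrect, joint_incorrect):
      identical           items where both chose the same option
      identical_incorrect items where both chose the same wrong option
      joint_incorrect     items both answered incorrectly (any options)
    Omitted items (None) are skipped.
    """
    identical = identical_incorrect = joint_incorrect = 0
    for a, b, k in zip(resp_a, resp_b, key):
        if a is None or b is None:
            continue
        if a == b:
            identical += 1
        if a != k and b != k:
            joint_incorrect += 1
            if a == b:
                identical_incorrect += 1
    return identical, identical_incorrect, joint_incorrect


def all_pairs(responses_by_id, key):
    """Apply similarity_counts to every unordered pair of examinees."""
    return {
        (i, j): similarity_counts(responses_by_id[i], responses_by_id[j], key)
        for i, j in combinations(sorted(responses_by_id), 2)
    }


# Hypothetical 10-item form and three examinees.
key = "ABCDABCDAB"
responses = {"e1": "ABCDCBCDCB", "e2": "ABCDCBCACB", "e3": "BACDABCDAD"}
for pair, counts in all_pairs(responses, key).items():
    print(pair, counts)
```

With n examinees there are n(n − 1)/2 pairs, so this step grows quickly for large testing programs. That is one reason operational tools often restrict comparisons, for example to examinees at the same location.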

While research on this topic is more than 50 years old, the modern era did not begin until Wollack published his paper on the Omega index in 1997. Since then, the methodology has grown considerably in sophistication and effectiveness, and far more publications focus on the topic than in the pre-Omega era. This is evidenced by not one but three recent books on the subject:

Wollack, J., & Fremer, J. (2013). Handbook of Test Security.
Kingston, N., & Clark, A. (2014). Test Fraud: Statistical Detection and Methodology.
Cizek, G., & Wollack, J. (2016). Handbook of Quantitative Methods for Detecting Cheating on Tests.

Nathan Thompson

Nathan Thompson earned his PhD in Psychometrics from the University of Minnesota, with a focus on computerized adaptive testing. His undergraduate degree is from Luther College, with a triple major in Mathematics, Psychology, and Latin. He is primarily interested in using AI and software automation to augment and replace the work done by psychometricians, an interest that has given him extensive experience in software design and programming. Dr. Thompson has published over 100 journal articles and conference presentations, but his favorite remains this one: https://scholarworks.umass.edu/pare/vol16/iss1/1/

