Psychometric forensics is a surprisingly deep and complex field. Many of its indices are highly sophisticated, but a good, simple place to start is the relationship between overall time and score, which I call Time-Score Analysis. This approach applies simple flagging rules to two easily interpretable metrics (total test time in minutes and number-correct raw score) to identify possible pre-knowledge, clickers, and harvesters/sleepers. Consider the four quadrants that a bivariate scatterplot of these variables would produce, with time on the horizontal axis and score on the vertical axis.
| Quadrant | Interpretation | Possible threat? | Suggested flagging |
| --- | --- | --- | --- |
| Upper right | High scores and taking their diligent time | Good examinees | NA |
| Upper left | High scores with low time | Pre-knowledge | Top 50% score and bottom 5% time |
| Lower left | Low scores with low time | “Clickers” or other low motivation | Bottom 5% time and score |
| Lower right | Low scores with high time | Harvesters, sleepers, or just very low ability | Top 5% time and bottom 5% scores |
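The flagging rules in the table can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the `Examinee` record, the `quantile` helper, the `flag_examinees` function, and the label strings are all hypothetical names chosen for this example, and the percentile cutoffs are computed with simple linear interpolation, which is just one of several reasonable conventions.

```python
from typing import NamedTuple


class Examinee(NamedTuple):
    """Hypothetical record layout, assumed for this sketch."""
    examinee_id: str
    minutes: float  # total test time in minutes
    score: int      # number-correct raw score


def quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("quantile() needs at least one value")
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def flag_examinees(examinees, tail=0.05):
    """Apply the quadrant rules; return {examinee_id: flag label}."""
    times = [e.minutes for e in examinees]
    scores = [e.score for e in examinees]
    fast = quantile(times, tail)        # bottom 5% of time
    slow = quantile(times, 1 - tail)    # top 5% of time
    low = quantile(scores, tail)        # bottom 5% of scores
    median = quantile(scores, 0.50)     # boundary of the top 50% of scores

    flags = {}
    for e in examinees:
        if e.minutes <= fast and e.score >= median:
            flags[e.examinee_id] = "possible pre-knowledge"
        elif e.minutes <= fast and e.score <= low:
            flags[e.examinee_id] = "possible clicker or low motivation"
        elif e.minutes >= slow and e.score <= low:
            flags[e.examinee_id] = "possible harvester, sleeper, or very low ability"
    return flags
```

Because the cutoffs are percentiles of the observed data, some examinees will always land in the fastest and slowest 5%. A flag only marks a record for follow-up; with heavily tied times or scores, the cutoffs may need adjusting.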
An example of Time-Score Analysis
Consider the example data below. What can this tell us about the performance of the test in general, and about specific examinees?
This test had 100 items, scored classically (number-correct), and a time limit of 60 minutes. Most examinees took 45-55 minutes, so the time limit was appropriate. A few examinees spent 58-59 minutes; there will usually be some diligent students like that. There was a fairly strong relationship between time and score, in that examinees who took longer tended to score higher.
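These test-level observations can be checked numerically as well as by eye. The sketch below is one possible way to do it, assuming times and scores are available as plain lists: `pearson_r` computes the ordinary Pearson correlation between time and score, and `share_near_limit` reports the fraction of examinees who finished close to the time limit. Both function names and the 2-minute margin are assumptions made for this illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


def share_near_limit(times, limit_minutes, margin_minutes=2.0):
    """Fraction of examinees finishing within margin_minutes of the limit."""
    if not times:
        raise ValueError("need at least one time")
    cutoff = limit_minutes - margin_minutes
    return sum(t >= cutoff for t in times) / len(times)
```

A large share of examinees near the limit would suggest the test is speeded, while a small share, as in the example here, supports the conclusion that the time limit was appropriate.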
Now, what about the individuals? I’ve highlighted 5 examples.
1. This examinee had the shortest time, and one of the lowest scores. They apparently did not care very much. They are an example of a low-motivation examinee who moved through quickly. One of my clients calls these “clickers.”
2. This examinee also took a short time, but had a suspiciously high score. They are definitely an outlier on the scatterplot, and should perhaps be investigated.
3. This examinee is simply super-diligent. They went right up to the 60-minute limit, and achieved one of the highest scores.
4. This examinee also went right up to the 60-minute limit, but had one of the lowest scores. They are likely low ability or low motivation. That same client of mine calls these “sleepers”: candidates who are forced to take the exam but don’t care, so they just sit there and doze. Alternatively, it might be a harvester: someone who has been assigned to memorize test content, so they spend all the time they can, but only look at half the items so they can focus on memorization.
5. This examinee had by far the lowest score, and one of the lowest times. Perhaps they didn’t even answer every question. Again, there is most likely a motivation/effort issue here.
How useful is time-score analysis?
Like other aspects of psychometric forensics, this is primarily useful for flagging purposes. We do not yet know if #4 is a harvester or just has low motivation. Instead of accusing them, we open an investigation. How many items did they attempt? Are they a repeat test-taker? At what location did they take the test? Do we have proctor notes, site video, remote proctoring video, or other evidence that we can review? There is a lot that can go into such an investigation. Moreover, simple analyses such as this are merely the tip of the iceberg when it comes to psychometric forensics. There is so much involved, in fact, that I’ve heard of some organizations simply sticking their heads in the sand and not even bothering to check out someone like #4. It just isn’t in the budget.
However, test security is an essential aspect of validity. If someone has stolen your test items, the test is now compromised, and you can be sure that scores no longer mean the same thing they meant when the test was published. It’s now apples and oranges, even though the items on the test are the same. Perhaps you will not challenge individual examinees, but will instead institute a plan to publish new test forms every 6 months. Regardless, your organization needs to have some difficult internal discussions and establish a test security plan.