Item parameter drift (IPD) refers to the phenomenon in which parameter values a given test item changes over multiple testing occasions within the item response theory (IRT) framework. This phenomenon is often relevant to student progress monitoring assessments where a set of items is used several times in one year, or across years, to track student growth; the observing of trends in student academic achievements depends upon stable linking (anchoring) between assessment moments over time, and if the item parameters are not stable, the scale is not stable and time-to-time comparisons are not either. Some psychometricians consider IPD as a special case of differential item functioning (DIF), but these two are different issues and should not be confused with each other.
Reasons for Item Parameter Drift
IRT modeling is attractive for assessment field since its property of item parameter invariance to a particular sample of test-takers which is fundamental for their estimation, and that assumption enables important things like strong equating of tests across time and the possibility for computerized adaptive testing. However, item parameters are not always invariant. There are plenty of reasons that could stand behind IPD. One possibility is curricular changes based on assessment results or instruction that is more concentrated. Other feasible reasons are item exposure, cheating, or curricular misalignment with some standards. No matter what has led to IPD, its presence can cause biased estimates of student ability. In particular, IPD can be highly detrimental in terms of reliability and validity in case of high-stakes examinations. Therefore, it is crucial to detect item parameter drift when anchoring assessment occasions over time, especially when the same anchor items are used repeatedly.
Perhaps the simplest example is item exposure. Suppose a 100-item test is delivered twice per year, with 20 items always remaining as anchors. Eventually students will share memories and the topics of those will become known. More students will get them correct over time, making the items appear easier.
There are several methods of detecting IPD. Some of them are simpler because they do not require estimation of anchoring constants, and some of them are more difficult due to the need of that estimation. Simple methods include the “3-sigma p-value”, the “0.3 logits”, and the “3-sigma IRT” approaches. Complex methods involve the “3-sigma scaled IRT”, the “Mantel-Haenszel”, and the “area between item characteristic curves”, where the last two approaches are based on consideration that IPD is a special case of DIF, and therefore there is an opportunity to draw upon a massive body of existing research on DIF methodologies.
Even though not all psychometricians think that removal of outlying anchor items is the best solution for item parameter drift, if we do not eliminate drifting items from the process of equating test scores, they will affect transformations of ability estimates, not only item parameters. Imagine that there is an examination, which classifies examinees as either failing or passing, or into four performance categories; then in case of IPD, 10-40% of students could be misclassified. In high-stakes testing occasions where classification of examinees implies certain sanctions or rewards, IPD scenarios should be minimized as much as possible. As soon as it is detected that some items exhibit IPD, these items should be referred to the subject-matter experts for further investigation. Otherwise, if there is a need in a faster decision, such flagged anchor items should be removed immediately. Afterwards, psychometricians need to re-estimate linking constants and evaluate IPD again. This process should repeat unless none of the anchor items shows item parameter drift.