Contributions to the Theory and Practice of Computerized Adaptive Testing
Theo J.H.M. Eggen - Citogroep Arnhem, Netherlands
This book includes reports on six research studies that make important contributions
to computerized adaptive testing (CAT).
The first chapter gives a general overview of the basic aspects of CAT. The remaining
six chapters can be read independently of each other. Chapters 2 to 4 have a more
theoretical orientation. Chapters 5 to 7 have a more practical orientation and
are directly applicable in CAT programs.
Chapters 2, 3, and 4 are devoted to item calibration. Item calibration is critical
in CAT, because the item parameters are considered to be known in every step of a CAT
algorithm. If the estimates of the item parameters have large standard errors, or if the
parameters have changed through use of the items, the validity of inferences made on the basis
of a CAT is threatened. A sound item calibration is therefore of utmost importance.
In Chapter 2, the loss of information in estimating item parameters with conditional
maximum likelihood methods is compared to joint and marginal maximum likelihood methods,
using a generally applicable theoretical framework. In the third chapter, the theory is
generalized to incomplete testing designs, which are by necessity applied to the
large item banks used for CATs. The conditions for correct item calibration in
different incomplete designs using different item parameter estimation methods are the
subject of the fourth chapter.
Traditionally, CAT algorithms are developed for obtaining an efficient estimate of
an examinee's trait level. As described in Chapter 5, the same algorithms can also be
used in classification problems, where what matters is not the exact estimate of an
examinee's trait level but the examinee's classification into one of a few distinct
categories. In this situation, a CAT algorithm based on statistical testing, rather than
on estimation, can be used. In Chapter 5, the use of such algorithms based on the
sequential probability ratio test for classification in three distinct categories is studied.
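The idea of classifying by sequential testing can be illustrated with a minimal sketch. It is not the algorithm studied in Chapter 5: it assumes a Rasch model with known item difficulties, two cut points, and an indifference half-width `delta` around each cut point, and it combines two sequential probability ratio tests in the simplest possible way. All function names and default values are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Success probability under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood_ratio(responses, difficulties, theta_lo, theta_hi):
    """Log of L(theta_hi) / L(theta_lo) for 0/1 responses."""
    llr = 0.0
    for x, b in zip(responses, difficulties):
        p_lo, p_hi = rasch_p(theta_lo, b), rasch_p(theta_hi, b)
        if x == 1:
            llr += math.log(p_hi / p_lo)
        else:
            llr += math.log((1.0 - p_hi) / (1.0 - p_lo))
    return llr

def sprt_decision(responses, difficulties, cutoff,
                  delta=0.3, alpha=0.05, beta=0.05):
    """Wald's SPRT of theta = cutoff - delta against theta = cutoff + delta."""
    llr = log_likelihood_ratio(responses, difficulties,
                               cutoff - delta, cutoff + delta)
    if llr >= math.log((1.0 - beta) / alpha):
        return "above"
    if llr <= math.log(beta / (1.0 - alpha)):
        return "below"
    return "continue"

def classify_three(responses, difficulties, cut1, cut2, **kwargs):
    """Combine two SPRTs (cut1 < cut2) into a three-category decision."""
    d1 = sprt_decision(responses, difficulties, cut1, **kwargs)
    d2 = sprt_decision(responses, difficulties, cut2, **kwargs)
    if d1 == "below":
        return "low"
    if d2 == "above":
        return "high"
    if d1 == "above" and d2 == "below":
        return "middle"
    return "continue"  # not enough evidence yet: administer another item
```

In an adaptive test, `classify_three` would be called after each response, and testing would stop as soon as it returns something other than `"continue"` or a maximum test length is reached.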
In traditional CATs, item selection methods are based on a criterion which is closely
related to statistical estimation (Fisher information). In Chapter 6, an item selection
method based on Kullback-Leibler information, which is conceptually more closely related
to statistical testing, is presented and evaluated.
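The difference between the two criteria can be sketched as follows. This is an illustration under assumptions, not the procedure evaluated in Chapter 6: it assumes a Rasch model, computes Fisher information at a single ability value, and computes Kullback-Leibler information between two ability values placed on either side of a cut point. The item bank and all names are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Success probability under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fisher_information(theta, b):
    """Item information at one ability value: P * (1 - P)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def kl_information(theta_hi, theta_lo, b):
    """Kullback-Leibler divergence of the response distribution at
    theta_hi from the one at theta_lo, for a single item."""
    p_hi, p_lo = rasch_p(theta_hi, b), rasch_p(theta_lo, b)
    return (p_hi * math.log(p_hi / p_lo)
            + (1.0 - p_hi) * math.log((1.0 - p_hi) / (1.0 - p_lo)))

def select_item(difficulties, score):
    """Index of the item with the highest score."""
    return max(range(len(difficulties)), key=lambda i: score(difficulties[i]))

# Hypothetical five-item bank of Rasch difficulties.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Estimation-oriented choice: most informative at the current estimate 0.9.
best_fisher = select_item(bank, lambda b: fisher_information(0.9, b))

# Testing-oriented choice: best at separating 0.3 from -0.3 (cut point 0).
best_kl = select_item(bank, lambda b: kl_information(0.3, -0.3, b))
```

The Fisher criterion follows the current ability estimate, whereas the Kullback-Leibler criterion in this sketch depends only on the two hypothesized ability values around the cut point.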
One consequence of optimal item selection in CATs is that each examinee can be expected
to answer about half of the items correctly. The final chapter
explores whether the selection procedures can be altered to meet the practical
desire to give certain groups of examinees easier or more difficult tests.
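Why maximum-information selection leads to a success rate of about one half can be seen in a short derivation. It assumes the Rasch model with item difficulty $b_i$; the notation is introduced here for illustration and is not taken from the book.

```latex
P_i(\theta) = \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)},
\qquad
I_i(\theta) = P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr) \le \tfrac{1}{4},
\qquad
I_i(\theta) = \tfrac{1}{4}
\iff P_i(\theta) = \tfrac{1}{2}
\iff b_i = \theta .
```

Selecting the item with maximum Fisher information at the current ability estimate therefore selects the item whose difficulty is closest to that estimate, which gives a success probability close to 0.5.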
Table of Contents
1. Introduction and overview
1.1 Introduction
1.2 Overview
1.3 References
2. On the loss of information in conditional maximum likelihood estimation of item parameters
2.1 Introduction
2.1.1 Estimation of item parameters in the Rasch model
2.1.2 Information and efficiency
2.2 Notation and terminology
2.3 The F-information: definition and basic properties
2.4 A scalar measure of information
2.5 F-information in separable models
2.5.1 Properties of the efficient score statistics
2.5.2 Theorems on the F-information in separable models
2.6 Checking the conditions for no loss of information using CML in two Rasch models
2.6.1 The Rasch Poisson counts model
2.6.2 The Rasch model for dichotomously scored items
2.7 F-information in the dichotomous Rasch model: comparing JML and CML
2.7.1 Comparison of information efficiency in JML and CML estimation
2.8 F-information in the dichotomous Rasch model: comparing MML and CML
2.8.1 F-information in pm
2.8.2 F-information in f(x|t)
2.8.3 Comparison of information efficiency in CML and MML estimation
2.9 Conclusion
2.10 References
Appendix chapter 2
3. Loss of information in estimating item parameters in incomplete designs
3.1 Introduction
3.2 F-information and the Rasch model
3.3 Normalization, information, and the determinant
3.3.1 The influence of the normalization on the information matrices
3.4 F-information in incomplete designs
3.5 The information comparison in incomplete designs
3.6 Examples of comparing the efficiency in incomplete designs
3.6.1 The designs
3.6.2 Results for an observed response as unit of cost
3.6.3 Results for a test taker as unit of cost
3.7 Conclusion
3.8 References
Appendix chapter 3
4. Item calibration in incomplete testing designs
4.1 Introduction
4.2 Item response theory
4.2.1 Conditional maximum likelihood estimation
4.2.2 Marginal maximum likelihood estimation
4.3 Inference and missing data
4.4 Incomplete calibration designs
4.4.1 Random incomplete designs
4.4.2 Multistage testing designs
4.4.3 Targeted testing designs
4.5 Item calibration and missing data
4.5.1 The marginal model and missing data
4.5.2 The conditional model and missing data
4.6 Conclusion
4.7 References
5. Computerized adaptive testing for classifying examinees into three categories
5.1 Introduction
5.2 Context
5.2.1 The mathematics item bank
5.3 Research questions
5.4 Statistical computation procedures
5.4.1 Statistical estimation in the testing algorithm
5.4.2 Statistical testing in the testing algorithm
5.5 Item selection
5.5.1 Starting procedure
5.5.2 Item selection procedures
5.6 Design of the simulation studies
5.7 Results of the simulation studies
5.7.1 Measurement accuracy with statistical estimation
5.7.2 The algorithms in the conditions of the placement test
5.8 Discussion
5.9 References
6. Item selection in adaptive testing with the sequential probability ratio test
6.1 Introduction
6.2 Sequential testing in the testing algorithm
6.2.1 Classification in two categories
6.2.2 Classification in three categories
6.3 Item selection
6.3.1 Fisher information
6.3.2 Kullback-Leibler information
6.4 Comparison of item selection procedures
6.4.1 Method
6.4.2 Results
6.5 Discussion
6.6 References
7. Optimal testing with easy or difficult items in computerized adaptive testing
7.1 Introduction
7.2 Item selection in CAT
7.3 Item selection on the basis of success probability
7.3.1 Performance of item selection based on nearest p-point
7.4 Alternative method for selecting with higher or lower success probabilities
7.4.1 Performance of item selection based on selection at a shifted ability level
7.4.2 Some properties of selection at the shifted ability level
7.5 Discussion
7.6 References
Samenvatting (summary in Dutch)
226 pages. 2004. Softcover.