Test validation: How to determine if a test score is supported?

December 27, 2022

Test validation is the process of verifying, on the basis of solid evidence, whether the specific requirements of each test development stage have been fulfilled. More specifically, it is an ongoing process of building an argument that a specific test, its score interpretation, or its use is valid. The interpretation and use of testing data should be validated in terms of the content, substantive, structural, external, generalizability, and consequential aspects of construct validity (Messick, 1994). Validity is the status of that argument: positive evidence supports it and negative evidence weakens it. Validity is never absolute and can be judged only in degrees. Meta-analysis, a technique frequently employed in psychometrics, aggregates research findings across multiple studies to assess the overall validity and reliability of a test. By synthesizing data from diverse sources, meta-analysis can provide a broad evaluation of the test’s construct validity and so support its use in educational and psychological assessments (AERA, APA, & NCME, 1999).
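To make the meta-analysis idea concrete, here is a minimal sketch of one common approach: a fixed-effect pooling of validity coefficients (correlations between test scores and a criterion) using Fisher's z transformation. The function name pooled_correlation and the study values are hypothetical and used for illustration only; a real meta-analysis would often use a random-effects model and correct for artifacts such as unreliability.

```python
import math


def pooled_correlation(studies):
    """Fixed-effect pooled correlation via Fisher's z transformation.

    studies: list of (r, n) pairs, where r is a validity coefficient
    and n is the study's sample size (n must be greater than 3).
    Returns (pooled_r, ci_low, ci_high) with an approximate 95% interval.
    """
    if not studies or any(n <= 3 for _, n in studies):
        raise ValueError("need at least one study, each with n > 3")
    # The sampling variance of Fisher's z is 1 / (n - 3),
    # so the inverse-variance weight of each study is n - 3.
    weights = [n - 3 for _, n in studies]
    zs = [math.atanh(r) for r, _ in studies]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    low, high = z_bar - 1.96 * se, z_bar + 1.96 * se
    # Transform back from the z scale to the correlation scale.
    return math.tanh(z_bar), math.tanh(low), math.tanh(high)


# Hypothetical validity coefficients and sample sizes from three studies.
example_studies = [(0.42, 120), (0.35, 85), (0.50, 200)]
pooled, ci_low, ci_high = pooled_correlation(example_studies)
print(f"pooled r = {pooled:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

Larger studies carry more weight, so the pooled estimate sits closer to the coefficients reported by the bigger samples.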

Validation as part of test development

To be effective, test development has to be structured, systematic, and detail-oriented. These features help secure sufficient validity evidence to support the inferences drawn from the test scores. Downing (2006) suggested a twelve-step framework for effective test development:

Overall plan
Content definition
Test blueprint
Item development
Test design and assembly
Test production
Test administration
Scoring test responses
Standard setting
Reporting test results
Item bank management
Technical report

Even though this framework is outlined as a sequential timeline, in practice some of these steps may occur simultaneously or in a different order. The starting point of test development, the purpose of the test, defines the planned test and governs almost all validity-related activities. Validation is a crucial aspect of every step of the test development process.

Hypothetically, excellent execution of all the steps can ensure test validity, i.e., the resulting test would estimate examinee ability fairly within the content area it is meant to measure. However, the human factor involved in test production can play a negative role, so there is an essential need for test validation.

Reasons for test validation

There are myriad possible reasons that can lead to the invalidation of a test score interpretation or use. Let us consider some obvious issues that can jeopardize validity at particular steps of test development and are therefore subject to validation:

overall plan: wrong choice of a psychometric model;
content definition: content domain is ill defined;
test blueprint: test blueprint does not specify an exact sampling plan for the content domain;
item development: items measure content at an inappropriate cognitive level;
test design and assembly: unequal booklets;
test administration: cheating;
scoring test responses: inconsistent scoring among examiners;
standard setting: unsuitable method of establishing passing scores;
item bank management: inaccurate updating of item parameters.
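As an illustration of how one of these issues might be checked, the sketch below quantifies scoring consistency between two examiners with Cohen's kappa, a chance-corrected agreement index. This is only one of several possible indices, and the function name cohens_kappa and the rubric scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same responses.

    rater_a, rater_b: equal-length sequences of categorical scores.
    Returns 1.0 for perfect agreement and about 0 for chance-level agreement.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set")
    n = len(rater_a)
    # Observed proportion of responses on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical rubric scores (0-3) given by two examiners to ten essays.
examiner_1 = [2, 3, 1, 0, 2, 2, 3, 1, 0, 2]
examiner_2 = [2, 3, 1, 1, 2, 3, 3, 1, 0, 2]
print(f"kappa = {cohens_kappa(examiner_1, examiner_2):.3f}")
```

A low kappa would be negative evidence for the scoring step and a signal to revisit rubrics or examiner training.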

Context for test validation

All tests share some common types of validity evidence, e.g., reliability, comparability, equating, and item quality. However, tests can differ in the number of constructs they measure (single or multiple) and can have different purposes, which call for their own types of validation evidence. In general, there are several major types of tests:

Admissions tests (e.g., SAT, ACT, and GRE)
Credentialing tests (e.g., a live-patient examination for a dentist before licensing)
Large-scale achievement tests (e.g., Stanford Achievement Test, Iowa Test of Basic Skills, and TerraNova)
Pre-employment tests
Medical or psychological tests
Language tests

The main idea is that the type of test usually defines a unique validation agenda, one that focuses on the types of validity evidence appropriate to that type of test and on the issues that typically challenge it.

Categorization of test validation studies

Since test scores can be invalidated for many reasons, there are many categories of test validation studies that can be used to validate test results. In this post, we will look at the categorization suggested by Haladyna (2011):

Category 1: Test Validation Studies Specific to a Testing Program
Subcategory of a study	Focus of a study
1. Studies That Provide Validity Evidence in Support of the Claim for a Test Score Interpretation or Use	Content analysis; item analysis; standard setting; equating; reliability
2. Studies That Threaten a Test Score Interpretation or Use	Cheating; scoring errors; student motivation; unethical test preparation; inappropriate test administration
3. Studies That Address Other Problems That Threaten Test Score Interpretation or Use	Drop in reliability; drift in item parameters over time; redesign of a published test; possible security problem
Category 2: Test Validation Studies That Apply to More Than One Testing Program
Studies that lead to the establishment of concepts, principles, or procedures that guide, inform, or improve test development or scoring	Introducing a concept; introducing a principle; introducing a procedure; studying a pervasive problem
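Two of the study types in the first subcategory, item analysis and reliability, lend themselves to a short worked example. The sketch below computes classical item difficulty (proportion correct) and Cronbach's alpha for a small matrix of dichotomous responses. The function names and the response data are hypothetical, and alpha is only one of several reliability coefficients a program might report.

```python
from statistics import pvariance


def item_difficulty(scores):
    """Proportion of examinees answering each item correctly (0/1 scoring)."""
    n_items = len(scores[0])
    return [sum(row[i] for row in scores) / len(scores) for i in range(n_items)]


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of item scores.

    scores: list of examinee rows, each a list of item scores of equal length.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across examinees, and of the total scores.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: five examinees (rows) by four items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print("difficulty:", item_difficulty(responses))
print(f"alpha = {cronbach_alpha(responses):.3f}")
```

Tracking these statistics across administrations is one simple way to notice the drop in reliability listed in the third subcategory.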

Summary

Even though test development is a long and laborious process, test creators have to be extremely careful in carrying out their obligations within each activity. The crowning result of this process is valid and reliable test scores, together with their adequate interpretation and use. The higher the stakes or consequences of the test scores, the more attention should be paid to test validity and, therefore, to test validation. Validation, in turn, is strengthened by integrating all reliable sources of evidence into the argument for test score interpretation and use.

References

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (1999). Standards for educational and psychological testing. American Educational Research Association.

Downing, S. M. (2006). Twelve steps for effective test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 3-25). Lawrence Erlbaum Associates.

Haladyna, T. M. (2011). Roles and importance of validity studies in test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 739-755). Lawrence Erlbaum Associates.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.

Laila Issayeva M.Sc.

Laila Issayeva earned her BA in Mathematics and Computer Science at Aktobe State University and Master’s in Education at Nazarbayev University. She has experience as a math teacher, school leader, and as a project manager for the implementation of nationwide math assessments for Kazakhstan. She is currently pursuing a PhD in psychometrics.
