How do I develop a test security plan?

lock keyboard test security plan

A test security plan (TSP) is a document that lays out how an assessment organization address security of its intellectual property, to protect the validity of the exam scores.  If a test is compromised, the scores become meaningless, so security is obviously important.  The test security plan helps an organization anticipate test security issues, establish deterrent and detection methods, and plan responses.  It can also include validity threats not security-related, such as how to deal with examinees that have low motivation.  Note that it is not limited to delivery; it can often include topics like how to manage item writers.

Since the first tests were developed 2000 years ago for entry into the civil service of Imperial China, test security has been a concern.  The reason is quite straightforward: most threats to test security are also validity threats. The decisions we make with test scores could therefore be invalid, or at least suboptimal.  It is therefore imperative that organizations that use or develop tests should develop a TSP. 

There are several reasons to develop a test security plan.  First, it drives greater security and therefore validity.  The TSP will enhance the legal defensibility of the testing program.  It helps to safeguard the content, which is typically an expensive investment for any organization that develops tests themselves.  If incidents do happen, they can be dealt with more swiftly and effectively.  It helps to manage all the security-related efforts.

The development of such a complex document requires a strong framework.  We advocate a framework with three phases: planning, implementation, and response.  In addition, the TSP should be revised periodically.

 

Phase 1: Planning

The first step in this phase is to list all potential threats to each assessment program at your organization.  This could include harvesting of test content, preknowledge of test content from past harvesters, copying other examinees, proxy testers, proctor help, and outside help.  Next, these should be rated on axes that are important to the organization; a simple approach would be to rate on potential impact to score validity, cost to the organization, and likelihood of occurrence.  This risk assessment exercise will help the remainder of the framework.

Next, the organization should develop the test security plan.  The first piece is to identify deterrents and procedures to reduce the possibility of issues.  This includes delivery procedures (such as a lockdown browser or proctoring), proctor training manuals, a strong candidate agreement, anonymous reporting pathways, confirmation testing, and candidate identification requirements.  The second piece is to explicitly plan for psychometric forensics. 

This can range from complex collusion indices based on item response theory to simple flags, such as a candidate responding to a certain multiple choice option more than 50% of the time or obtaining a score in the top 10% but in the lowest 10% of time.  The third piece is to establish planned responses.  What will you do if a proctor reports that two candidates were copying each other?  What if someone obtains a high score in an unreasonably short time? 

What if someone obviously did not try to pass the exam, but still sat there for the allotted time?  If a candidate were to lose a job opportunity due to your response, it helps you defensibility to show that the process was established ahead of time with the input of important stakeholders.

 

Phase 2: Implementation

The second phase is to implement the relevant aspects of the Test Security Plan, such as training all proctors in accordance with the manual and login procedures, setting IP address limits, or ensuring that a new secure testing platform with lockdown is rolled out to all testing locations.  There are generally two approaches.  Proactive approaches attempt to reduce the likelihood of issues in the first place, and reactive methods happen after the test is given.  The reactive methods can be observational, quantitative, or content-focused.  Observational methods include proctor reports or an anonymous tip line.  Quantitative methods include psychometric forensics, for which you will need software like SIFT.  Content-focused methods include automated web crawling.

Both approaches require continuous attention.  You might need to train new proctors several times per year, or update your lockdown browser.  If you use a virtual proctoring service based on record-and-review, flagged candidates must be periodically reviewed.  The reactive methods are similar: incoming anonymous tips or proctor reports must be dealt with at any given time.  The least continuous aspect is some of the psychometric forensics, which depend on a large-scale data analysis; for example, you might gather data from tens of thousands of examinees in a testing window and can only do a complete analysis at that point, which could take several weeks.

proctored test centers

Phase 3: Response

The third phase, of course, to put your planned responses into motion if issues are detected.  Some of these could be relatively innocuous; if a proctor is reported as not following procedures, they might need some remedial training, and it’s certainly possible that no security breach occurred.  The more dramatic responses include actions taken against the candidate.  The most lenient is to provide a warning or simply ask them to retake the test.  The most extreme methods include a full invalidation of the score with future sanctions, such as a five-year ban on taking the test again, which could prevent someone from entering a profession for which they spent 8 years and hundreds of thousands of dollars in educative preparation.

 

What does a test security plan mean for me?

It is clear that test security threats are also validity threats, and that the extensive (and expensive!) measures warrant a strategic and proactive approach in many situations.  A framework like the one advocated here will help organizations identify and prioritize threats so that the measures are appropriate for a given program.  Note that the results can be quite different if an organization has multiple programs, from a practice test to an entry level screening test to a promotional test to a professional certification or licensure.

Another important difference between test sponsors/publishers and test consumers.  In the case of an organization that purchases off-the-shelf pre-employment tests, the validity of score interpretations is of more direct concern, while the theft of content might not be an immediate concern.  Conversely, the publisher of such tests has invested heavily in the content and could be massively impacted by theft, while the copying of two examinees in the hiring organization is not of immediate concern.

In summary, there are more security threats, deterrents, procedures, and psychometric forensic methods than can be discussed in one blog post, so the focus here rather on the framework itself.  For starters, start thinking strategically about test security and how it impacts their assessment programs by using the multi-axis rating approach, then begin to develop a Test Security Plan.  The end goal is to improve the health and validity of your assessments.


Want to implement some of the security aspects discussed here, like online delivery lockdown browser, IP address limits, and proctor passwords?

Sign up for a free account in FastTest!

 

 

Nathan Thompson, PhD

Nathan Thompson, PhD, is CEO and Co-Founder of Assessment Systems Corporation (ASC). He is a psychometrician, software developer, author, and researcher, and evangelist for AI and automation. His mission is to elevate the profession of psychometrics by using software to automate psychometric work like item review, job analysis, and Angoff studies, so we can focus on more innovative work. His core goal is to improve assessment throughout the world.

Nate was originally trained as a psychometrician, with an honors degree at Luther College with a triple major of Math/Psych/Latin, and then a PhD in Psychometrics at the University of Minnesota. He then worked multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, project manager, and business leader. He is also cofounder and Membership Director at the International Association for Computerized Adaptive Testing (iacat.org). He’s published 100+ papers and presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/.

Share This Post

Facebook
Twitter
LinkedIn
Email

More To Explore

waves paper
Psychometrics

The One Parameter Logistic Model

The One Parameter Logistic Model (OPLM or 1PL or IRT 1PL) is one of the three main dichotomous models in the item response theory (IRT)

laptop and numbers
Education

What is a z-Score?

A z-score measures the distance between a raw score and a mean in standard deviation units. The z-score is also known as a standard score