Certification Test Development: Best Practices

We get asked all the time at Assessment Systems Corporation: What are the steps involved in certification test development? What are the best practices that lead to a sound exam that can meet accreditation standards? There are a number of key steps.  But first of all, what is certification test development?

What is Certification Test Development?

A certification exam is extremely important. It can be a gatekeeper to a profession that candidates have spent years studying for, perhaps with hundreds of thousands of dollars in investment. There are definitely some certification tests out there that are just 50 multiple-choice questions written by someone in their basement. Any real certification exam, however, goes through a rigorous process backed by decades of scientific research. If you want to get your certification accredited (a stamp of approval that it is good), then you need to follow the guidelines for certification test development, as described below. Of course, there are many other considerations as well – board governance, financials, policies, education, etc. – which are actually the majority of the certification standards.

Job Analysis

A best practice among certification programs is to understand the knowledge, skills, and attributes used in the performance of the job or role. A job task analysis, or role delineation, uses quantitative methods to gain insight into job competence. Subject matter expert (SME) involvement is critical to validity. The value of this analysis ranges from helping practitioners identify areas for development to benchmarking content for educational and training programs, field leadership for the certification organization, and promotion and elevation of the field.
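To make the "quantitative methods" part concrete, here is a minimal sketch of how survey ratings from SMEs might be turned into task weights. Everything in it is an assumption for illustration: the task names, the 1-5 rating scales, and the choice of mean importance times mean frequency as the criticality index. Real job analyses use a variety of scales and formulas.

```python
from statistics import mean

# Hypothetical survey data: each task maps to a list of
# (importance, frequency) ratings from SMEs, both on a 1-5 scale.
ratings = {
    "Assess client needs": [(5, 4), (4, 4), (5, 5)],
    "Document findings": [(3, 5), (3, 4), (4, 5)],
    "Calibrate equipment": [(4, 2), (3, 1), (4, 2)],
}

def criticality(task_ratings):
    """Mean importance times mean frequency (one simple index among many)."""
    importance = mean(r[0] for r in task_ratings)
    frequency = mean(r[1] for r in task_ratings)
    return importance * frequency

scores = {task: criticality(r) for task, r in ratings.items()}
total = sum(scores.values())

# Normalize so the weights sum to 1; these can later inform the blueprint.
weights = {task: score / total for task, score in scores.items()}

for task, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{task}: {weight:.1%}")
```

Tasks that are both important and frequent rise to the top, which is the kind of evidence that supports weighting decisions later on.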

Test Blueprint

Developing a blueprint, or content outline, for the test from job analysis results lends validity to test results. A best practice is to use the job analysis to derive which content areas should be on the test and how many items should be in each. Beyond test validity, the benefits of publishing the blueprint include transparency for all in the certification process and a guide for candidates to focus their test preparation.  Here’s a free tool to help this process.
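As a rough sketch of the "how many items should be in each" step, the snippet below converts content-area weights into item counts that add up to the test length. The content areas, the percentages, and the 60-item length are made up, and the largest-remainder rounding rule is just one reasonable choice, not a required method.

```python
import math

def allocate_items(weights, test_length):
    """Largest-remainder allocation of items to content areas."""
    total = sum(weights.values())
    exact = {area: test_length * w / total for area, w in weights.items()}
    counts = {area: math.floor(x) for area, x in exact.items()}
    leftover = test_length - sum(counts.values())
    # Give the remaining items to the areas with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts

# Hypothetical blueprint weights (percent of the exam) from a job analysis.
blueprint = allocate_items(
    {"Assessment": 37, "Planning": 28, "Implementation": 22, "Ethics": 13},
    test_length=60,
)
print(blueprint)
```

The point of the rounding rule is simply that the counts always sum to the intended test length, so the published blueprint and the assembled form agree.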

Item Writing and Review

A foundational principle of test development says that a test is only as good as the items on it. High-quality certification programs have quality standards for their items covering type, content, format, and grammar. They ensure that each item meets these before inclusion on a test. Once items have been sampled, program managers consider whether their performance warrants use in future testing. The process of item writing and review can be done in a system like Google Docs very early on, or for small-scale exams, but you will soon outgrow that and want a dedicated platform like FastTest.

Standard Setting

The passing standard, or cut score, is usually where the certification program’s test intersects with their decision whether to confer certification. Test takers whose scores meet or exceed the passing standard receive a pass result, and those whose performance fails to meet the standard receive a fail. The stakes mentioned above rest on where and in what way this standard is set. Common methods for setting the passing standard involve SME input and previous item performance, such as the modified-Angoff and Bookmark approaches. Publishing the passing standard and method used to derive it is valuable to candidates for understanding how they performed on the test and to certifying organizations for deciding where to grant certification.
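For a feel of the arithmetic behind a modified-Angoff study, here is a minimal sketch. It assumes each SME estimates, for every item, the probability that a minimally competent candidate answers correctly; the ratings are invented, and a real study adds rater training, discussion rounds, and a policy decision on rounding.

```python
from statistics import mean, stdev

# Hypothetical Angoff ratings: rows are SMEs, columns are items.
angoff_ratings = [
    [0.70, 0.55, 0.80, 0.60, 0.90],
    [0.65, 0.60, 0.85, 0.50, 0.85],
    [0.75, 0.50, 0.75, 0.55, 0.95],
]

# Each SME's implied cut score is the sum of their item ratings.
rater_cuts = [sum(row) for row in angoff_ratings]

# The recommended raw cut score is the average across SMEs.
cut_score = mean(rater_cuts)

# Spread across raters is a quick check on how well the panel agrees.
rater_spread = stdev(rater_cuts)

def decision(raw_score):
    """Pass if the score meets or exceeds the cut score."""
    return "pass" if raw_score >= cut_score else "fail"

print(f"Cut score: {cut_score:.2f} of {len(angoff_ratings[0])} items")
print(f"Rater spread (SD): {rater_spread:.2f}")
```

A large spread across raters is usually a signal to revisit the ratings with the panel before adopting the cut score.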

Psychometric Reporting

Validity and reliability of the test are key concepts in psychological testing. Validity speaks to how accurately the test measures real-world job or role performance. Savvy programs document evidence of this in the points already discussed. Once candidates have answered items, analysis of test and item performance begins.

Traditionally, programs look at statistics such as reliability (especially at the passing standard), distribution of scores, difficulty, and discrimination of the test and each item for a particular group of test takers. Certifying organizations use this analysis to finalize test results and report on the test’s performance. Leading organizations with high candidate volumes are doing additional analysis of results and developing innovative ways to compile and administer tests, such as linear on-the-fly testing (LOFT) and computerized adaptive testing (CAT).
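The classical statistics mentioned above are easy to sketch. The example below uses a tiny, invented response matrix and computes item difficulty as proportion correct, discrimination as the correlation between an item and the rest of the test, and KR-20 as the reliability estimate. Those particular indices are our assumption for illustration; programs may report others, and real analyses need far more candidates.

```python
from statistics import mean, pvariance

# Hypothetical scored responses: rows = candidates, columns = items, 1 = correct.
responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
]

def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else float("nan")

items = list(zip(*responses))          # one tuple of scores per item
totals = [sum(row) for row in responses]

# Difficulty: proportion of candidates answering each item correctly.
difficulty = [mean(col) for col in items]

# Discrimination: item score vs. total score with that item removed.
discrimination = [
    pearson(col, [t - c for t, c in zip(totals, col)]) for col in items
]

# KR-20 reliability for dichotomous items (population variances throughout).
k = len(items)
sum_pq = sum(p * (1 - p) for p in difficulty)
kr20 = (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

for i, (p, r) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"Item {i}: difficulty={p:.2f}, discrimination={r:.2f}")
print(f"KR-20: {kr20:.2f}")
```

Items with very low or negative discrimination are the ones typically flagged for SME review before results are finalized.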


If you want to be accredited, you need to have multiple versions or forms of the test, so that if someone fails and then retakes the exam, they don’t see the exact same items. Many orgs will also rotate exam forms across time, such as rotating once per year. Whenever you have multiple forms, you must do equating, which is a statistical process to adjust for differences in difficulty. We all know that some exam forms might be easier or harder than others; equating makes sure that the standard stays consistent and fair.
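As one simple illustration of equating, the sketch below applies linear equating, which maps a score on a new form to the scale of a base form by matching means and standard deviations. It assumes the two forms were taken by randomly equivalent groups, and the scores are invented. Many programs instead use anchor items or IRT-based methods, so treat this as a toy example rather than a recommended design.

```python
from statistics import mean, pstdev

def linear_equate(x, new_form_scores, base_form_scores):
    """Map score x on the new form onto the base form's scale."""
    mean_new, mean_base = mean(new_form_scores), mean(base_form_scores)
    sd_new, sd_base = pstdev(new_form_scores), pstdev(base_form_scores)
    return mean_base + (sd_base / sd_new) * (x - mean_new)

# Hypothetical raw scores; the new form turned out about 4 points harder.
new_form = [60, 65, 70, 75, 80]
base_form = [64, 69, 74, 79, 84]

equated = linear_equate(70, new_form, base_form)
print(f"A 70 on the new form is equivalent to {equated:.1f} on the base form")
```

With an adjustment like this, a candidate who happens to get the harder form is held to the same standard as one who got the easier form.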

The Certification Test Development Cycle Continues…

Certification test development is not a one-and-done situation. If you leave one form of the test out in the field, the questions and answers will soon become common knowledge. It’s essential to periodically update the exam by developing new items and then rotating them in by publishing new forms. And every 5-10 years (depending on the profession) the test needs a complete overhaul, with a new job analysis and standard setting. As with any high performing machine, you need oil changes and periodic maintenance!
