
A Certification Management System (CMS), also called a Credential Management System, streamlines the key processes around certifying or credentialing people, that is, attesting that they have certain knowledge or skills in a profession.  It helps ensure compliance, reduce operating costs, and maximize the value of certifications. In this article, we explore the significance of adopting a CMS and its benefits for both nonprofits and businesses.

What is a certification management system?

A certification management system is an enterprise software platform designed specifically for organizations whose primary goal is to award credentials to people for professional skills and knowledge.  Such an organization is often a nonprofit like an association or board, and is sometimes called an “awarding body” or a similar term.  However, nowadays there are many for-profit corporations that offer certifications.  For example, many IT/software companies will certify people on their products.  Here’s a page that does nothing but list the certifications offered by Salesforce!

These organizations often offer various credentials within a field.

  • Initial certifications (high stakes and broad) – Example: Certified Widgetmaker
  • Certificates or microcredentials – Example: Advanced Widget Design Specialist
  • Recertification exams – Example: taking a test every 5 years to maintain your Certified Widgetmaker status
  • Benchmark/progress exams – Example: Widgetmaker training programs are 2 years long and you take a benchmark exam at the end of Year 1
  • Practice tests – Example: old items from the Certified Widgetmaker test provided in a low-stakes fashion for practice

A credentialing body will need to manage the many business and operation aspects around these.  Some examples:

  • Applications, tracking who is applying for which credential
  • Payment processing
  • Eligibility pathways and documentation (e.g., diplomas)
  • Pass/Fail results
  • Retake status
  • Expiration dates

There will often be functionality that makes these things easier, such as automated emails that remind professionals when their certification is about to expire so they can register for their recertification exam.
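As a rough illustration of the reminder logic described above, here is a minimal Python sketch that picks out certifications expiring within a given window. The `Certification` fields, the 90-day window, and the sample records are all hypothetical, and a real system would send emails rather than print.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Certification:
    # Hypothetical record layout, for illustration only.
    holder_email: str
    credential: str
    expires_on: date


def due_for_reminder(certs, today, window_days=90):
    """Return certifications expiring within `window_days` of `today`, soonest first."""
    cutoff = today + timedelta(days=window_days)
    due = [c for c in certs if today <= c.expires_on <= cutoff]
    return sorted(due, key=lambda c: c.expires_on)


# Made-up sample data.
certs = [
    Certification("ana@example.com", "Certified Widgetmaker", date(2025, 3, 1)),
    Certification("bo@example.com", "Certified Widgetmaker", date(2026, 1, 15)),
    Certification("cy@example.com", "Advanced Widget Design Specialist", date(2025, 1, 20)),
]

reminders = due_for_reminder(certs, today=date(2025, 1, 10))
for cert in reminders:
    print(f"Remind {cert.holder_email}: {cert.credential} expires {cert.expires_on}")
```

Passing `today` in explicitly, instead of calling `date.today()` inside the function, keeps the logic easy to test.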

 

Reasons to use a certification management system

  1. Enhancing Compliance and Regulatory Adherence:  A comprehensive CMS provides a centralized repository where organizations can securely store, track, and manage certifications and credentials. This ensures compliance with industry standards, regulatory bodies, and audits.  A certification management system will also help your organization achieve accreditation such as NCCA or ANSI ISO/IEC 17024.
  2. Streamlining Certification Tracking and Renewals: Managing certifications manually can be a time-consuming and error-prone process. A CMS simplifies this task by automating certification tracking, renewal reminders, and verification processes. By digitizing the management of certifications, organizations can save valuable time and resources, eliminating the need for tedious paperwork and manual record-keeping.
  3. Improving Workforce Efficiency and Development: An efficient CMS empowers organizations to optimize their workforce’s knowledge and skill development. By capturing comprehensive data on certifications, skills, and training, organizations gain valuable insights into their employees’ capabilities. This information can guide targeted training initiatives, succession planning, and talent management efforts. Moreover, employees can leverage the CMS to identify skill gaps, explore potential career paths, and pursue professional development opportunities.
  4. Enhancing Credential Verification and Fraud Prevention: Verifying the authenticity of certifications is critical, especially in industries where credentials hold significant weight. A CMS with built-in verification features enables employers, clients, and other stakeholders to authenticate certifications quickly and accurately. By incorporating advanced security measures, such as blockchain technology or encrypted digital badges, CMSs provide an added layer of protection against fraud and credential forgery. This not only safeguards the reputation of organizations but also fosters trust and confidence among customers, partners, and regulatory bodies.

 

Of course, the bottom line is that a certification management system will save money, because this is a lot of information for the awarding body to track, and it is mission-critical.

 

Conclusion

Implementing a Certification Management System is a strategic investment for organizations seeking to streamline their certification processes and maximize their value. By centralizing certification management, enhancing compliance, streamlining renewals, improving workforce development, and bolstering credential verification, a robust CMS empowers organizations to stay ahead in a competitive landscape while ensuring credibility and regulatory adherence.

ANSI ISO 17024 Accreditation

ANSI ISO/IEC 17024 accreditation is an internationally recognized accreditation for personnel certification bodies, based on the ISO/IEC 17024 standard.  That is, it is a stamp of approval from an independent audit which says your certification is of good quality.  ANSI stands for the American National Standards Institute, while ISO refers to the International Organization for Standardization and IEC to the International Electrotechnical Commission. The portion of ANSI which carries out the accreditation process is the ANSI National Accreditation Board (ANAB).

What does ANSI ISO/IEC 17024 cover?

ANSI ISO/IEC 17024 specifies the requirements for bodies operating certification programs for individuals, ensuring that the certification processes are fair, valid, and reliable. The standard outlines the general principles and requirements for the certification of personnel across various fields, including but not limited to healthcare, information technology, engineering, and safety.

The standard covers a wide range of aspects related to certification bodies, including:


  1. Impartiality and independence: Certification bodies must demonstrate impartiality and avoid any conflicts of interest.
  2. Certification program development: The standard sets criteria for developing certification programs, including defining competencies, establishing eligibility requirements, and developing examination processes.
  3. Examination processes: It outlines guidelines for the design, development, and administration of examinations to assess individuals’ knowledge, skills, and competencies.
  4. Certification process: The standard addresses the application process, evaluation of candidates, decision-making on certification, and ongoing certification maintenance.
  5. Management system requirements: ANSI ISO/IEC 17024 includes requirements for the management system of the certification body, including document control, record keeping, and continual improvement processes.

 

What does ANSI ISO/IEC 17024 mean?

Accreditation to ANSI ISO/IEC 17024 provides assurance to stakeholders that the certification programs and processes are conducted in a consistent, competent, and reliable manner. It enhances the credibility and acceptance of certifications issued by accredited certification bodies, helping individuals demonstrate their professional competence and expertise in their respective fields.

 

Benefits of being accredited under ANSI ISO/IEC 17024

Whether to pursue accreditation is a business question for you.  In some cases, it is required; in some professions, there might be a law under which candidates do not receive federal funding, or do not have their certifications recognized, if the certification is not accredited.  However, for many professions, accreditation is optional.  In those cases, if there are two certification bodies, it is a competitive advantage for one to become accredited.  But for small certification bodies with no competitors, accreditation is often not worth the great expense.

Here are some reasons to consider pursuing accreditation.

  1. Global Recognition: ISO 17024 accreditation provides global recognition and credibility to a certification program. It demonstrates compliance with internationally recognized standards, which can enhance the reputation of the certification and increase its acceptance worldwide.
  2. Quality Assurance: ISO 17024 accreditation ensures that the certification program follows rigorous processes and standards for the development, administration, and evaluation of assessments. This helps maintain the quality and reliability of the certification, giving stakeholders confidence in its validity and fairness.
  3. Competitive Advantage: Accreditation under ISO 17024 can serve as a competitive differentiator for the certification program. It distinguishes the certification from others in the market by indicating a commitment to high standards of professionalism, competence, and integrity.
  4. Stakeholder Confidence: Accreditation provides assurance to stakeholders, including employers, professionals, and regulatory bodies, that the certification meets recognized criteria for competency assessment. This builds trust and confidence in the certification, leading to increased participation and recognition within the industry.
  5. Continuous Improvement: ISO 17024 accreditation fosters a culture of continuous improvement within the certification program. Through regular assessments and audits, organizations can identify areas for enhancement and implement best practices to enhance the effectiveness and relevance of the certification over time.

Note that ANSI ISO/IEC 17024 is not the only show in town.  The National Commission for Certifying Agencies (NCCA) also accredits certifications, though it accredits each certification program rather than the certification body.

 

Do I need to do all this work myself?

No!  Much of it, yes, you need to do, because no one else has the specific knowledge of your profession and content area.  But we can certainly help with some portions, especially the exam development and psychometrics.  We can also provide the item banking and exam delivery platform to securely administer your exams and report the results.


Microcredentials are short, focused, and targeted educational or assessment-based certificate programs that offer learners a way to acquire specific skills or knowledge in a particular field.  In today’s fast-paced and rapidly evolving job market, traditional degrees may not always be enough to stand out among the competition, and they often take too long to achieve. Microcredentials have emerged as a promising solution to this problem.

What is a Microcredential?

A microcredential is, as the name would suggest, a smaller version of a credential.  A credential is an attestation of learning or skills that is typically long-term or large-scale.  Examples are a 4-year degree in Marketing, certification as a Salesforce developer, or licensure as a surgeon.  Microcredentials have a similar meaning but are typically much narrower; instead of a degree in marketing, you might complete various courses or certificates in WordPress plugin development, Google Ads, Search Engine Optimization, etc.

In addition, there are other terms like “nano-degrees” or “digital badges” which sometimes overlap, but there are no agreed-upon definitions that differentiate them.  However, Badges are usually considered to be even smaller than a Microcredential or Nano-Degree.

In most cases, microcredentials are tied to educational programs, and are therefore quite distinct from certification or licensure, which are assessment-focused.  These programs are typically much smaller than a university degree, such as a 2-week online course or a 3-month bootcamp.

Why Microcredentials?

Microcredentials have become increasingly popular in recent years because they offer several benefits to both learners and employers. For learners, they provide a more flexible and cost-effective way to gain new skills or upgrade existing ones without committing to a full-time degree program.  Additionally, microcredentials allow learners to demonstrate their competency in a specific skill or knowledge area, which can help them stand out in a crowded job market.  Given that they are less expensive and shorter in duration, learners can often receive “more bang for their buck” in terms of adding to their skillset and improving job prospects.

For employers, microcredentials offer a way to identify job candidates who possess the specific skills they need. By looking for candidates who have earned microcredentials in relevant areas, employers can quickly narrow down the pool of applicants and find the most qualified candidates. Additionally, employers can use them to upskill their existing workforce, helping employees stay current with the latest developments in their field.

How to Earn Microcredentials

Microcredentials can be earned in a variety of ways, including online courses, workshops, boot camps, and other short-term training programs. These programs typically focus on a specific topic, such as data analytics, project management, or digital marketing. To earn a microcredential, learners must complete a series of assessments or projects that demonstrate their mastery of the subject matter. Once earned, microcredentials are typically displayed as digital badges that can be shared on social media profiles, online resumes, or other platforms.

Types of Microcredentials

There are several types of microcredentials available, including skill-based, competency-based, and stackable credentials. Skill-based microcredentials focus on developing specific skills or knowledge areas, such as coding, graphic design, or language proficiency. Competency-based microcredentials, on the other hand, assess a learner’s ability to perform a specific task or set of tasks, such as managing a team or conducting market research. Finally, stackable microcredentials allow learners to build on existing credentials by earning additional microcredentials in related areas, creating a pathway to a full degree program.

Examples

Consider the field of marketing.  Traditionally, you might go to a university for a 4-year degree in Marketing, Business, or Communications.  This is a very broad approach, and of course takes 4 years (or more for some people).  Alternatively, there are now many options where you can get a microcredential focused specifically on Digital Marketing, a very in-demand skill set.  A great example of this is Oregon State University, but a quick googling will show you many more.  Some providers will even get more specific, such as Social Media Management, Search Engine Optimization, or even specifically WordPress.  But then these are typically branded as a Badge rather than a Nano-Degree or Microcredential.

Conclusion

In conclusion, microcredentials offer a flexible and cost-effective way to gain new skills or upgrade existing ones, making them an attractive option for both learners and employers. By focusing on specific skills or knowledge areas, they allow learners to demonstrate their competence in a particular field, while also providing a way for employers to identify the most qualified job candidates. As the job market continues to evolve, microcredentials are likely to become an increasingly important part of the educational landscape, providing learners with the tools they need to succeed in their careers.


The Ebel method of standard setting is a psychometric approach for establishing a cutscore on tests consisting of multiple-choice questions. It is typically used for high-stakes examinations in higher education, the medical and health professions, and applicant selection.

How is the Ebel method performed?

The Ebel method requires a panel of judges who first categorize each item on the test by two criteria: level of difficulty and relevance or importance. The panel then agrees upon an expected percentage of items that should be answered correctly within each category.

It is crucial that the judges are experts in the examined field; otherwise, their judgement would not be valid and reliable. Prior to the item rating process, the panelists should be given a sufficient amount of information about the purpose and procedures of the Ebel method. In particular, it is important that the judges understand the meaning of difficulty and relevance in the context of the current assessment.

The next stage is to determine what “minimally competent” performance means in the specific case, depending on the content. When everything is clear and all definitions are agreed upon, the experts classify each item by difficulty (easy, medium, or hard) and relevance (minimal, acceptable, important, or essential). To minimize the influence of the judges’ opinions on each other, it is better to use individual ratings rather than consensus ratings.

Afterwards, judgements on the proportion of items expected to be answered correctly by minimally competent candidates need to be collected for each item category, e.g. easy and essential. Alternatively, to save time, the grid proposed by Ebel and Frisbie (1972) might be used. It is worth mentioning, though, that Ebel ratings are content-specific, so the values in the grid might be too low or too high for a given test.


In the end, the Ebel method, like the modified-Angoff method, identifies a cut-off score for an examination based on the performance of candidates in relation to a defined standard (absolute), rather than how they perform in relation to their peers (relative). For each expert, the number of items in each category is multiplied by the expected percentage of correct answers for that category, and the products are summed; the cutscore for the whole exam is then the average of these totals across experts.
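The calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expected-proportion values in `EXPECTED`, the two judges, and the five-item test are all made up, and the values are not the published Ebel grid.

```python
from collections import Counter
from statistics import mean

# Hypothetical expected proportion correct for a minimally competent candidate
# in each (difficulty, relevance) cell. Illustrative values only.
EXPECTED = {
    ("easy", "essential"): 0.90,
    ("medium", "essential"): 0.70,
    ("hard", "essential"): 0.50,
    ("easy", "important"): 0.80,
    ("medium", "important"): 0.60,
    ("hard", "important"): 0.40,
}

# Each judge independently classifies every item on a made-up 5-item test.
judge_classifications = {
    "Judge A": [("easy", "essential"), ("easy", "essential"), ("medium", "important"),
                ("hard", "essential"), ("medium", "essential")],
    "Judge B": [("easy", "essential"), ("medium", "essential"), ("medium", "important"),
                ("hard", "important"), ("medium", "essential")],
}


def ebel_raw_cutscore(classifications, expected):
    """Sum of (items in a category) x (expected proportion correct) for one judge."""
    counts = Counter(classifications)
    return sum(n_items * expected[cell] for cell, n_items in counts.items())


n_items = 5
per_judge = {judge: ebel_raw_cutscore(cells, EXPECTED)
             for judge, cells in judge_classifications.items()}
cutscore_raw = mean(per_judge.values())      # average across judges
cutscore_pct = 100 * cutscore_raw / n_items  # expressed as percent correct

print(per_judge)
print(f"Cutscore: {cutscore_raw:.2f} of {n_items} items ({cutscore_pct:.0f}%)")
```

With these made-up inputs the two judges come out at roughly 3.6 and 3.3 items, giving a cutscore of about 69% correct.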

Pros of using Ebel

  • This method provides an overview of test difficulty
  • Cut-off score is identified prior to an examination
  • It is relatively easy for experts to perform

Cons of using Ebel

  • This method is time-consuming and costly
  • Evaluation grid is hard to get right
  • Digital software is required
  • Back-up is necessary

Conclusion

The Ebel method is a fairly complex standard-setting process compared to others because it requires an analysis of the content, and it therefore imposes a burden on the standard-setting panel. However, Ebel considers both the relevance of the test items and the expected proportion of correct answers from minimally competent candidates, including borderline candidates. Thus, even though the procedure is complicated, the results tend to be stable and close to the actual cut-off scores.

References

Ebel, R. L., & Frisbie, D. A. (1972). Essentials of educational measurement.


A cutscore or passing point (also called a cut-off score or cutoff score) is a score on a test that is used to categorize examinees.  The most common example of this is pass/fail, which we are all familiar with from our school days.  For instance, a score of 70% and above will pass, while a score below 70% will fail.  However, many tests have more than one cutscore.  An example of this is the National Assessment of Educational Progress (NAEP) in the USA, which has 3 cutscores, creating 4 categories: Below Basic, Basic, Proficient, and Advanced.
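Applying one or more cutscores amounts to finding which interval a score falls into. The short Python sketch below shows one way to do that with the standard-library `bisect` module. The function name and the three cutscore values are hypothetical and are not actual NAEP cutscores.

```python
from bisect import bisect_right


def classify(score, cutscores, labels):
    """Map a score to a category.

    `cutscores` must be sorted ascending. A score equal to a cutscore
    falls into the higher category (e.g. exactly 70% passes).
    """
    if len(labels) != len(cutscores) + 1:
        raise ValueError("need exactly one more label than cutscores")
    return labels[bisect_right(cutscores, score)]


# One cutscore: pass/fail at 70%.
print(classify(70, [70], ["Fail", "Pass"]))
print(classify(69.9, [70], ["Fail", "Pass"]))

# Three cutscores create four categories (cutscore values are made up).
levels = ["Below Basic", "Basic", "Proficient", "Advanced"]
cuts = [200, 240, 280]
print(classify(250, cuts, levels))
```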

The process of setting a cutscore is called a standard-setting study.  However, I dislike this term because the word “standard” is used to reflect other things in the assessment world.  In some cases, it is the definition of what is to be learned or covered (see Common Core State Standards), and in other cases it refers to the process of reducing construct-irrelevant variance by ensuring that all examinees take the test under standardized conditions (standardized testing).  So I prefer cutscore or passing point.  Even then, passing point is limited to the case of an exam with only one cutscore where the classifications are pass/fail, which is not always the case: not only are there many situations where there is more than one cutscore, but many two-category situations might use other decisions, like Hire/NotHire or a clinical diagnosis like Depressed/NotDepressed.

When establishing cutscores, it is important to use scaled scores to ensure consistency and fairness.  Scaling adjusts raw scores to a common metric, which helps to accurately reflect the intended performance standards across different test forms or administrations.  You may read about setting a cutscore on a test scored with item response theory in this blog post.  For a deeper understanding of how measurement variability can affect the interpretation of cutscores, be sure to check out our blog post on confidence intervals.

Types of cutscores

There are two types of cutscores, reflecting the two ways that a test score can be interpreted: norm-referenced and criterion-referenced.  The Hofstee method represents a compromise approach that incorporates aspects of both.

Criterion-referenced Cutscore

A cutscore of this type is referenced to the material of the exam, regardless of examinee performance.  In most cases, this is the sort of cutscore that you need to be legally defensible for high stakes exams.  Psychometricians have spent a lot of time inventing ways to do this, and scientifically studying them.

Names of some methods you might see for this type are: modified-Angoff, Nedelsky, and Bookmark.

Example

An example of this is a certification exam.  If the cutscore is 75% and you score at or above it, you pass.  In some months or years, this might be most candidates; in other months it might be fewer.  The standard does not change.  In fact, the organizations that manage such exams go to great lengths to keep it stable over time, a process known as equating.

Norm-referenced Cutscore

A cutscore of this type is referenced to the examinees, regardless of their mastery of the material.

A name you might see for this is a quota, such as when a test is delivered in order to accept only the top 10% of applicants.

Example

An example of this was in my college Biology class.  It was a weeder class, to weed out the students who start college planning to be pre-med simply because they like the idea of being a doctor or are drawn to the potential salary.  So, the exams were intentionally made very hard, so that the average score might only be 50% correct.  They then awarded an A to anyone who had a z-score of 1.0 or greater, which is roughly the top 16% of students if scores are normally distributed – regardless of how well you actually scored on the exam.  You might get a score of 60% correct but be at the 95th percentile and get an A.
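To make the contrast concrete, the Python sketch below applies a norm-referenced rule (z-score of 1.0 or greater) and a criterion-referenced rule (a fixed 75% cutscore) to the same set of scores. The scores are made up, and using the population standard deviation is an assumption that the class itself is the whole reference group.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize scores against the group's own mean and standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population SD: the class is treated as the whole group
    if sigma == 0:
        raise ValueError("all scores are identical, so z-scores are undefined")
    return [(s - mu) / sigma for s in scores]


# Hypothetical percent-correct scores on an intentionally hard exam.
scores = [35, 40, 45, 50, 50, 50, 55, 60, 65]

# Norm-referenced: an A goes to anyone at least 1 SD above the class mean.
earns_a = [z >= 1.0 for z in z_scores(scores)]

# Criterion-referenced, for contrast: a fixed 75% cutscore ignores the group.
passes_75 = [s >= 75 for s in scores]

for s, a in zip(scores, earns_a):
    print(f"{s}% correct -> {'A' if a else 'no A'}")
```

In this made-up class, the students with 60% and 65% correct earn an A under the norm-referenced rule, while nobody would clear a fixed 75% cutscore.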


The Nedelsky method is an approach to setting the cutscore of an exam.  Originally suggested by Nedelsky (1954), it is an early attempt to bring a quantitative, rigorous procedure to the process of standard setting.  Quantitative approaches are needed to eliminate the arbitrariness and subjectivity that would otherwise dominate the process of setting a cutscore.  The most obvious and common example of such arbitrariness is simply setting the cutscore at a round number like 70%, regardless of the difficulty of the test or the ability level of the examinees.  It is for this reason that a cutscore must be set with a method such as the Nedelsky approach to be legally defensible or meet accreditation standards.

How to implement the Nedelsky method

The first step, like several other standard setting methods, is to gather a panel of subject matter experts (SMEs).  The next step is for the panel to discuss the concept of a minimally competent candidate (MCC), also called a minimally qualified candidate.  This is a concept about the type of candidate that should barely pass this exam, and sits on the borderline of competence. They then review a test form, paying specific attention to each of the items on the form.  For every item in the test form, each rater estimates the number of options that an MCC will be able to eliminate.  This then translates into the probability of a correct response, assuming that each candidate guesses amongst the remaining options.  If an MCC can only eliminate one of the options of a four-option item, they then have a 1/3 = 33% chance of getting the item correct.  If two, then 1/2 = 50%.

These ratings are then averaged across all items and all raters.  This then represents the percentage score expected of an MCC on this test form, as defined by the panel.  This makes a compelling, quantitative argument for what the cutscore should then be, because we would expect anyone that is minimally qualified to score at that point or higher.

Item      Rater1   Rater2   Rater3
1         33       50       33
2         25       25       25
3         25       33       25
4         33       50       50
5         50       100      50
Average   33.2     51.6     36.6
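The ratings in the table can be reproduced with a short Python sketch. The elimination counts below are inferred from the percentages in the table (for example, 33% means one of four options eliminated), and the function name is hypothetical. Because the code uses exact fractions such as 1/3 instead of the rounded 33, its averages differ slightly from the bottom row of the table.

```python
from statistics import mean


def nedelsky_rating(n_options, n_eliminated):
    """Chance of a correct answer when guessing among the options not eliminated."""
    remaining = n_options - n_eliminated
    if remaining < 1:
        raise ValueError("at least the correct option must remain")
    return 1 / remaining


# Options each rater believes an MCC can eliminate on five four-option items.
eliminated = {
    "Rater1": [1, 0, 0, 1, 2],
    "Rater2": [2, 0, 1, 2, 3],
    "Rater3": [1, 0, 0, 2, 2],
}

per_rater = {
    rater: mean(nedelsky_rating(4, k) for k in counts)
    for rater, counts in eliminated.items()
}
cutscore = mean(per_rater.values())  # average across raters (same item count each)

for rater, value in per_rater.items():
    print(f"{rater}: {100 * value:.1f}%")
print(f"Recommended cutscore: {100 * cutscore:.1f}%")
```

With these ratings the recommended cutscore comes out at a little over 40% correct.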

Drawbacks to the Nedelsky method

This approach only works on multiple choice items, because it depends on the evaluation of option probability.  It is also a gross oversimplification.  If the item has four options, there are only four possible values for the Nedelsky rating: 25%, 33%, 50%, and 100%.  This is all the more striking when you consider that most items tend to have a percent-correct value between 50% and 100%, and reflecting this fact is impossible with the Nedelsky method. Obviously, more goes into answering a question than simply eliminating one or two of the distractors.  This is one reason that another method is generally preferred and supersedes this method…

Nedelsky vs Modified-Angoff

The Nedelsky method has been superseded by the modified-Angoff method.  The modified-Angoff method is essentially the same process but allows for finer variations, and can be applied to other item types.  The modified-Angoff method subsumes the Nedelsky method, as a rater can still implement the Nedelsky approach within that paradigm.  In fact, I often tell raters to use the Nedelsky approach as a starting point or benchmark.  For example, if they think that the examinee can easily eliminate two options, and is slightly more likely to guess one of the remaining two options, the rating is not 50%, but rather 60%.  The modified-Angoff approach also allows for a second round of ratings after discussion to increase consensus (Delphi Method).  Raters can slightly adjust their rating without being hemmed into one of only four possible ratings.


An Objective Structured Clinical Examination (OSCE Exam) is an assessment designed to measure performance of tasks, typically medical, in a high-fidelity way.  It is more a test of skill than knowledge.  For example, I used to work at a certification board for ophthalmic assistants; there were 3 levels, and the top two levels included both a knowledge test (200 multiple choice items) and an OSCE (level 2 was a digital simulation, level 3 was live human patients).

OSCE exams serve a very important purpose in many fields, forging a critical bridge between learning and practice.  This post will cover some of the basics.

 

What is an Objective Structured Clinical Examination?

An OSCE exam typically works by defining very specific tasks that the examinee is required to do, while examiners (often professors) watch and grade them via a rubric or checklist.  Each of the tasks is often called a station, and the OSCE will often have multiple stations.  Consider the components of the name:

  • Objective: We are trying to be as objective as possible, boiling down a potentially very complex patient scenario and task into a checklist or rubric. We want to make it quantitative, measurable, and reliable.
  • Structured: The task itself is very boxed, such as using retinoscopy to measure astigmatism (perhaps one thing of 20 that might happen at a visit to your ophthalmologist).
  • Clinical: The task is something to be done in a clinical setting; this is to increase fidelity and validity.

A great summary is provided by Zayyan (2011):

The Objective Structured Clinical Examination is a versatile multipurpose evaluative tool that can be utilized to assess health care professionals in a clinical setting. It assesses competency, based on objective testing through direct observation. It is precise, objective, and reproducible allowing uniform testing of students for a wide range of clinical skills. Unlike the traditional clinical exam, the OSCE could evaluate areas most critical to performance of health care professionals such as communication skills and ability to handle unpredictable patient behavior.

There are a few key points here.

  • It is a clinical setting, rather than a lecture hall setting (though in non-medical fields, “clinical setting” is relative!)
  • It is assessing competency of clinical skills
  • It is based on observation, where examiners rate the examinee
  • It will often include assessment of “soft skills” or other non-knowledge aspects

 

Where are OSCE Exams used?

OSCE exams are very important in the medical professions.  This report shows that many medical schools use them, though it curiously does not say how many schools were part of the survey.

However, the approach is most certainly not limited to medical fields.  You don’t hear the term very often outside medical education, but the approach is used widely.  Professions where someone is physically doing something are more likely to use OSCEs.  An accountant, on the other hand, does not physically do something, and their equivalent of an OSCE is more like a complex accounting scenario that needs to be completed in MS Excel and then graded.

 

Examples of OSCE exams

Of course, there are many medical examples.  I work with the American Board of Chiropractic Sports Physicians, who have a practical exam.  Check out their DACBSP® webpage and scroll down to the Practical Exam resources, including instructional videos for some stations.

I once worked with a crane operator certification.  They had a performance test where you had to drive the crane into a certain position, lift and place certain objects, and then move a wrecking ball through a path of oil drums without knocking anything over – all while being rated by an examiner with a checklist.  Sounds a lot like an OSCE?

Perhaps the most common OSCE is one that you have likely taken: a driver’s test.  In addition to taking a knowledge test, you were also likely asked to drive a car alongside an examiner armed with a checklist, who told you to do various “stations” like parallel parking, perpendicular parking, or navigating a stoplight.

 

Tell me more!

There are dedicated resources in the world of medical education and assessment, such as Downing and Yudkowsky (2019) Assessment in Health Professions Education (https://www.routledge.com/Assessment-in-Health-Professions-Education/Yudkowsky-Park-Downing/p/book/9781315166902).   You might also be interested in my Lecture Notes from a course taught using that textbook.


NCCA accreditation is a stamp of approval on the quality of a certification program, governed by the National Commission for Certifying Agencies (NCCA)™.  This is part of the Institute for Credentialing Excellence™, the leader in the world of professional credentialing.  NCCA accreditation tells your certificants – and all stakeholders in your profession, including customers/patients – that the credential meets best practices and international standards, so they can trust the quality of the personnel who have achieved it.  In many cases, you can’t have this trust with an unaccredited credential, though there are definitely many decent ones that just lack the size/funding to get accredited.


NCCA accreditation requirements

Getting a certification accredited shows that it is of good quality. Anyone can write 50 questions on a topic in their basement and throw it up on some free survey/quiz software, then call it a certification. In fact, many places do, and charge hundreds of dollars for this. NCCA accreditation is a pushback on this practice, where respectable certifications banded together and agreed on a few main points regarding what is high quality. Some examples:

  • You must have an oversight board, which includes a public member.
  • You must have a legitimate organization with audited financial statements.
  • You must have policies for application, retakes, continuing education, and more.
  • There must be a firewall between certification staff and education staff.
  • The test must be professionally designed and maintained.
  • The test must be delivered securely, with proctoring.

 

What do we mean by “certification program”?

A certification is a validation of a person’s skills and knowledge for a particular profession.  We all think of it as a test that must be passed, but that’s actually a minority of the process.  There are also things like initial education, eligibility pathways to sit for the exam, retake policies, how to get recertified, continuing education, etc.  On top of that, there are organizational issues; you need to make sure that there is an appropriate governing board, that education and certification staff don’t overlap, that you have valid financial accounting, etc.  So that’s why the accreditation refers to a “program” and not just a “test.”

This means that an organization with multiple certification programs will need to apply for accreditation on each.  However, since many of the aspects are about the organization (e.g., financial statements), there is massive overlap and these can be re-used for each.

 

What do we mean by “stamp of approval”?

NCCA is a panel of experts, composed of a range of stakeholders in the industry: PhD psychometricians, internationally-known certification managers, attorneys with expertise in this specific topic, and so on.  You need to complete a formal application process, submitting tons of documentation about the aforementioned topics.  The panel will then review this and grant accreditation, stating that you have followed all the standards.

Again, note that this is not just a stamp of approval on the exam.  If you have an exam for certified Widgetmakers and you have a panel of expert Widgetmakers, the NCCA is not going to evaluate your actual questions.  They are evaluating much bigger questions.  Do you have a nonprofit board set up and have the correct legal governance?  Do you have audited financial statements like any other sound entity?  Do you have a published Candidate Handbook that lays out everything from how to initially apply for the certification to how to maintain it for your career?

 

Why should we get accredited?

In many cases, it is not necessary to achieve NCCA accreditation.  When it does make sense, there are really seven reasons:

  1. Quality Assurance: Accreditation ensures that a certification program meets established standards of quality and rigor. It validates that the program has undergone a comprehensive review by an independent accrediting body and has demonstrated its adherence to industry-recognized standards and best practices. Accreditation helps maintain and improve the quality of the certification program over time.
  2. Credibility and Recognition: Accreditation adds credibility and recognition to a certification program. It signifies that the program has been evaluated by experts in the field and has met rigorous criteria. Accreditation enhances the reputation of the certification, making it more valuable and trusted by employers, professionals, and other stakeholders.  This helps you sell more certifications; remember, credentialing is a business and certifications are the flagship product!
  3. Industry Acceptance: Accreditation can increase the acceptance and recognition of a certification within the industry or professional community. It provides assurance to employers, clients, and regulatory bodies that the certified individuals have acquired the necessary knowledge, skills, and competencies to perform their roles effectively.
  4. Competitive Advantage: Some fields, like personal trainers, have many organizations that offer training and certifications.  Achieving accreditation provides an advantage over your competitors in the marketplace.
  5. Standardization: Accreditation promotes standardization and consistency in the certification process. It ensures that the program’s content, assessment methods, passing criteria, and recertification requirements are fair, transparent, and consistent across all candidates. Standardization helps maintain a level playing field and ensures that certified individuals possess the same level of expertise.
  6. Career Advancement: Accreditation can enhance career opportunities for individuals holding the certified credential. It demonstrates their commitment to professional development and continuous learning. Accredited certifications are often preferred or required by employers, which can lead to better job prospects, promotions, and salary advancements.
  7. Regulatory Compliance: In some industries or professions, accreditation may be a requirement for regulatory compliance. Certain certifications may be mandated by licensing boards or regulatory authorities to ensure public safety, consumer protection, or adherence to specific standards and regulations.  Another example is that if you are selling certifications to members of the US Military, they need to be accredited.

These are all very good reasons, certainly.

 

What is involved in NCCA accreditation?


The time and cost can vary widely depending on the current state of your organization. If you read the NCCA Standards (requirements to get accredited), they generally fall into 3 categories:

1. Psychometrics and test development: You need to follow best practices in making the exam.  You can’t just write 50 items in your basement and throw it up on a survey platform.  You need statistical reports, a job task analysis study, standard setting studies to support a defensible pass score, and much more.
2. Certification operations and policies: You need to establish policies and procedures, then document in a Candidate Handbook.  You need to set up a business: accepting payments, bookkeeping, tracking status, retakes, annual recertification, perhaps a member conference or webinars, etc.
3. Business/legal/governance:  You need to be a legit organization with Bylaws and audited financial statements.  You will need a governance structure, like an overall Board of Directors and then committees on Certification, Education, or other aspects.

 

What is the cost of NCCA accreditation?

A rule of thumb that I have heard in the industry is that achieving NCCA accreditation for a certification exam will take 1 year and $100,000. Most of that is for parts 2 and 3, which are typically done by you, and not your testing vendor.  So those costs are not what is paid to NCCA for the application process, either.  It is to your staff, to work on a quality Candidate Handbook, set up quarterly webinars for continuing education, create a registration portal – whatever makes sense for you, as long as it follows the Standards.  In some cases they might be things you already do, such as audited financial statements.

We specialize in the psychometrics, which costs far less than $100,000 and takes 3-6 months depending on availability of your subject matter experts. We can certainly work on parts 2 and 3 if you do not have bandwidth and expertise internally.  We can also deliver the exams for you.

If you aren’t sure of the next steps, we can perform an audit on your current state and potential timeline, which will provide a much clearer picture.  CONTACT US  to learn more.

Note: this is not an endorsement of NCCA by ASC, or vice versa, and is meant for educational purposes only.

 


The Beuk Compromise or Beuk Adjustment is a method for a “reality check” on the results of a modified-Angoff standard setting study.  It is well-known that experts will often overestimate examinee capabilities and choose a cutscore that is too high – in some cases, so high that even the experts themselves would fail the exam!  The Beuk Compromise was designed to balance this with the reality of actual examinee performance.  There are similar methods as well, such as the Hofstee Method.

What is a modified-Angoff study?

The Angoff approach is one of the most common ways of setting a defensible cutscore on an exam, especially in the world of professional credentialing (certification and licensure testing).  A panel of subject matter experts (SMEs) is convened to discuss the concept of a minimally competent candidate (MCC) and then review each item on the exam to estimate the percentage of minimally competent candidates that would get each item correct.  If experts disagree, you will need to evaluate inter-rater reliability and agreement, and after that have the experts discuss and re-rate the items to gain better consensus.  The average of these ratings is then the average score that the panel expects an MCC to achieve – a very compelling argument for what should be the passing score!
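
As a rough illustration of the averaging step described above, the sketch below works through a small, made-up matrix of Angoff ratings. The ratings, the panel size, and the function name `angoff_cutscore` are all hypothetical, and a real study would also examine rater agreement before accepting the result.

```python
from statistics import mean

def angoff_cutscore(ratings):
    """Return (cutscore per SME, panel cutscore) as percent-correct values.

    ratings[s][i] is SME s's estimate (0-100) of the percentage of
    minimally competent candidates who would answer item i correctly.
    """
    # Each SME's implied cutscore is the mean of that SME's item ratings.
    per_sme = [mean(sme_ratings) for sme_ratings in ratings]
    # The panel cutscore is the mean across SMEs.
    return per_sme, mean(per_sme)

# Hypothetical ratings: 3 SMEs x 4 items
ratings = [
    [70, 80, 90, 60],
    [60, 80, 80, 60],
    [80, 90, 90, 70],
]
per_sme, panel = angoff_cutscore(ratings)
print(per_sme)          # [75, 70, 82.5]
print(round(panel, 1))  # 75.8
```

Keeping the per-SME cutscores around is useful later, because compromise methods such as Beuk need the spread of the SMEs' cutscores and not only the panel mean.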

OK, then what is the issue?

But in practice, the experts are often in rarefied air and have forgotten what it was like to be 22 years old and entering the profession wide-eyed, so they often overestimate both what the MCC can do and the item ratings themselves.  You might find a situation where they set the cutscore at 82, but the average score on the exam is 63.  You might go further and ask the experts to take the exam themselves and find their average is only 75!

So, psychometricians have developed add-on procedures to address this issue.  Each SME can also be asked to provide information for an adjustment or compromise method.  A compromise method assumes that we should not rely on modified-Angoff ratings alone; the results of another method should be considered in conjunction.

The most common adjustment method is the Beuk adjustment or Beuk compromise, which recognizes that a pure Angoff study makes no use of actual data on the test, and instead attempts to reconcile the Angoff approach with an estimate of the score distribution on the test.  Of course, this approach can then be only used if data exists; if there is no data available with which to estimate the score distribution, the Beuk adjustment is not possible.

What is the Beuk Compromise?

To find the Beuk compromise, two pieces of information are needed from each SME: an estimated cutscore and an estimated pass rate.  The estimated cutscore is obtained by calculating the average Angoff rating for each SME; the pass rate must be asked for directly, that is, what each SME thinks the pass rate should be.  What you will often find is that they say the pass rate should be, say, 75%, but when you continue the earlier example (average score of 63), the pass rate with their recommended cutscore turns out to be 10%!

How do I implement the Beuk Compromise?

Use the Angoff Analysis Tool.

SMEs are then simply asked in the meeting to estimate the pass rate of examinees who take the test, after having reviewed all the items.   Enter those values into the AAT in the assigned cells.  If the SMEs consider the test difficult with regard to the cutscore that should be applied and the types of examinees, a low pass rate will be estimated.  These ratings are recorded on the “Adjustments” tab of the AAT.

The Beuk adjustment is best depicted graphically, and this figure is presented on the last tab of the AAT workbook.  It involves two functions:

  1. A curve that presents the pass rate as a function of all possible cutscores – this is calculated using the estimates of the score distribution.
  2. A straight line that is a function of the estimated pass rates. The line must pass through the point on the plane where the expected pass rate and panel-recommended cutscore intersect, and has a slope equal to the ratio of the standard deviations of the raters’ cutscore and pass rate estimates.

The x-coordinate of the intersection of these two functions is the Beuk adjustment.  An example of this graph is presented below.  Here, we have a 200-point exam.  A cutscore of 170 would produce a pass rate of about 20%.  A cutscore of 120 would produce a pass rate of about 90%.  The Beuk comes out to be about 145.

Beuk compromise
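
The intersection described above can also be approximated numerically. The sketch below is a minimal illustration and not the calculation inside the AAT: the data are made up, the function names are hypothetical, and it assumes the line's slope is the standard deviation of the SMEs' pass-rate estimates divided by the standard deviation of their cutscore estimates, which is the usual reading of the Beuk method. It searches a grid of candidate cutscores for the one where the observed pass-rate curve is closest to the line.

```python
from statistics import mean, stdev

def pass_rate(scores, cutscore):
    """Proportion of examinees scoring at or above the cutscore."""
    return sum(s >= cutscore for s in scores) / len(scores)

def beuk_compromise(scores, sme_cutscores, sme_pass_rates, candidates):
    """Return the candidate cutscore where the SME line meets the observed curve."""
    mean_cut = mean(sme_cutscores)
    mean_rate = mean(sme_pass_rates)
    # Assumption: slope = SD of pass-rate estimates / SD of cutscore estimates.
    # Needs at least two SMEs whose cutscore estimates are not all identical.
    slope = stdev(sme_pass_rates) / stdev(sme_cutscores)

    def gap(c):
        # Distance between the observed pass-rate curve and the SME line at c.
        line = mean_rate + slope * (c - mean_cut)
        return abs(pass_rate(scores, c) - line)

    return min(candidates, key=gap)

# Hypothetical data: 100 examinees scoring 1..100, and two SMEs
scores = list(range(1, 101))
sme_cutscores = [80, 84]        # each SME's average Angoff rating
sme_pass_rates = [0.70, 0.80]   # each SME's expected pass rate

beuk = beuk_compromise(scores, sme_cutscores, sme_pass_rates, range(1, 101))
print(pass_rate(scores, 82))  # 0.19, far below the 75% the SMEs expected
print(beuk)                   # 66
```

In this toy case the panel's cutscore of 82 would pass only 19% of examinees, and the compromise pulls the cutscore down to 66, where the observed pass rate (35%) sits on the SME line.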


Test equating refers to the issue of defensibly translating scores from one test form to another. That is, if you have an exam where half of students see one set of items while the other half see a different set, how do you know that a score of 70 is the same on both forms? What if one is a bit easier? If you are delivering assessments in conventional linear forms – or piloting a bank for CAT/LOFT – you are likely to utilize more than one test form, and, therefore, are faced with the issue of test equating.

When two test forms have been properly equated, educators can validly interpret performance on one test form as having the same substantive meaning compared to the equated score of the other test form (Ryan & Brockmann, 2009). While the concept is simple, the methodology can be complex, and there is an entire area of psychometric research devoted to this topic. This post will provide an overview of the topic.

 

Why do we need test linking and equating?

The need is obvious: to adjust for differences in difficulty to ensure that all examinees receive a fair score on a stable scale. Suppose you take Form A and get a score of 72/100 while your friend takes Form B and gets a score of 74/100. Is your friend smarter than you, or did his form happen to have easier questions?  What if the passing score on the exam was 73? Well, if the test designers built in some overlap of items between the forms, we can answer this question empirically.

Suppose the two forms overlap by 50 items, called anchor items or equator items. They are delivered to a large, representative sample. Here are the results.

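| Form | Mean score on 50 overlap items | Mean score on 100 total items |
|------|--------------------------------|-------------------------------|
| A    | 30                             | 72                            |
| B    | 32                             | 74                            |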

Because the mean score on the anchor items was higher, we then think that the Form B group was a little smarter, which led to a higher total score.

Now suppose these are the results:

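| Form | Mean score on 50 overlap items | Mean score on 100 total items |
|------|--------------------------------|-------------------------------|
| A    | 32                             | 72                            |
| B    | 32                             | 74                            |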

Now, we have evidence that the groups are of equal ability. The higher total score on Form B must then be because the unique items on that form are a bit easier.
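
The reasoning in this second scenario is simple arithmetic, sketched below. Each form has 100 items, 50 of which are anchors, so subtracting the anchor mean from the total mean leaves the mean on the 50 items unique to that form. The function name `unique_item_mean` is made up for illustration.

```python
def unique_item_mean(total_mean, anchor_mean):
    """Mean score on the non-anchor items of a form."""
    return total_mean - anchor_mean

# Second scenario: both groups average 32 on the 50 anchor items
forms = {
    "A": {"anchor": 32, "total": 72},
    "B": {"anchor": 32, "total": 74},
}
for name, means in forms.items():
    print(name, unique_item_mean(means["total"], means["anchor"]))
# A 40
# B 42
```

With equal anchor performance, the 2-point gap sits entirely in the unique items, which is the evidence that Form B's unique items are easier.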

 

What is test equating?

According to Ryan and Brockmann (2009), “Equating is a technical procedure or process conducted to establish comparable scores, with equivalent meaning, on different versions of test forms of the same test; it allows them to be used interchangeably.” (p. 8). Thus, successful equating is an important factor in evaluating assessment validity, and, therefore, it often becomes an important topic of discussion within testing programs.

Practice has shown that scores, and tests producing scores, must satisfy very strong requirements to achieve this demanding goal of interchangeability. Equating would not be necessary if test forms were assembled as strictly parallel, meaning that they would have identical psychometric properties. In reality, it is almost impossible to construct multiple test forms that are strictly parallel, so equating is needed to adjust for the differences that remain after test construction.

Dorans, Moses, and Eignor (2010) suggest the following five requirements towards equating of two test forms:

  • tests should measure the same construct (e.g. latent trait, skill, ability);
  • tests should have the same level of reliability;
  • the equating transformation should be symmetric: the function mapping Form X scores to Form Y should be the inverse of the function mapping Form Y scores to Form X;
  • test results should not depend on the test form an examinee actually takes;
  • the equating function used to link the scores of two tests should be the same regardless of the choice of (sub) population from which it is derived.

Detecting item parameter drift (IPD) is crucial for the equating process because it helps in identifying items whose parameters have changed. By addressing IPD, test developers can ensure that the equating process remains valid and reliable.

 

How do I calculate an equating?

Classical test theory (CTT) methods include linear equating and equipercentile equating as well as several others. Some newer approaches that work well with small samples are Circle-Arc (Livingston & Kim, 2009) and Nominal Weights (Babcock, Albano, & Raymond, 2012).  Specific methods for linear equating include Tucker, Levine, and Chained (von Davier & Kong, 2003). Linear equating approaches are conceptually simple and easy to interpret; given the examples above, the equating transformation might be estimated with a slope of 1.01 and an intercept of 1.97, which would directly confirm the hypothesis that one form was about 2 points easier than the other.
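
To make the slope and intercept idea concrete, here is a minimal sketch of the simplest form of linear equating, which matches the means and standard deviations of two forms. It assumes the two groups are randomly equivalent; it is not the Tucker, Levine, or chained method, which also use anchor-item information. The scores and the function name `linear_equating` are hypothetical.

```python
from statistics import mean, pstdev

def linear_equating(x_scores, y_scores):
    """Return (slope, intercept) placing Form X scores on the Form Y scale.

    Assumes randomly equivalent groups, so differences in the score
    distributions are attributed to the forms and not to the examinees.
    """
    slope = pstdev(y_scores) / pstdev(x_scores)
    intercept = mean(y_scores) - slope * mean(x_scores)
    return slope, intercept

# Hypothetical total scores from two equivalent groups
form_x = [60, 70, 80]
form_y = [62, 72, 82]

slope, intercept = linear_equating(form_x, form_y)
equated = slope * 72 + intercept  # a Form X score of 72 on the Form Y scale
print(round(slope, 2), round(intercept, 2), round(equated, 2))  # 1.0 2.0 74.0
```

Here the intercept of 2 says Form Y ran about 2 points easier, so a 72 on Form X is treated as equivalent to a 74 on Form Y.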

Item response theory (IRT) approaches include equating through common items (equating by applying an equating constant, equating by concurrent or simultaneous calibration, and equating with common items through test characteristic curves), and common person calibration (Ryan & Brockmann, 2009). The common-item approach is quite often used, and specific methods for finding the constants (conversion parameters) include Stocking-Lord, Haebara, Mean/Mean, and Mean/Sigma. Because IRT assumes that two scales on the same construct differ by only a simple linear transformation, all we need to do is find the slope and intercept of that transformation. Those methods do so, and often produce nice looking figures like the one below from the program IRTEQ (Han, 2007). Note that the b parameters do not fall on the identity line, because there was indeed a difference between the groups, and the results clearly find that is the case.

IRTEQ IRT equating
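
Of the methods named above, Mean/Sigma is the easiest to show in a few lines. The sketch below is illustrative only and is not how IRTEQ is implemented: it estimates the slope (A) and intercept (B) from the difficulty (b) parameters of the common items as calibrated on each form, then rescales item parameters from the new form onto the base scale. The parameter values and function names are made up.

```python
from statistics import mean, pstdev

def mean_sigma_constants(b_new, b_base):
    """Slope A and intercept B that place the new scale on the base scale.

    b_new and b_base hold the difficulty estimates of the same anchor
    items, in the same order, from the two separate calibrations.
    """
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B

def rescale_item(a, b, A, B):
    """Transform discrimination a and difficulty b onto the base scale."""
    return a / A, A * b + B

# Hypothetical anchor-item difficulties from each calibration
b_new = [-1.0, 0.0, 1.0]
b_base = [-0.7, 0.5, 1.7]

A, B = mean_sigma_constants(b_new, b_base)
print(round(A, 3), round(B, 3))  # 1.2 0.5

a_star, b_star = rescale_item(1.2, 0.0, A, B)
print(round(a_star, 3), round(b_star, 3))  # 1.0 0.5
```

Ability estimates from the new form transform the same way as difficulties (multiply by A, add B). Characteristic curve methods such as Stocking-Lord and Haebara find A and B by minimizing differences between curves instead of matching moments, and are generally preferred when the anchor set contains outlying items.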

Practitioners can equate forms with CTT or IRT. However, one of the reasons that IRT was invented was that equating with CTT was very weak. Hambleton and Jones (1993) explain that under CTT, the ability estimate (i.e., observed score) and the item statistics (i.e., difficulty and discrimination) depend on each other, which limits the utility of CTT equating in practical test development. IRT addresses this interdependency by combining ability and item parameters in one model. The IRT equating methods are more accurate and stable than the CTT methods (Hambleton & Jones, 1993; Han, Kolen, & Pohlmann, 1997; De Ayala, 2013; Kolen & Brennan, 2014) and provide a solid basis for modern large-scale computer-based tests, such as computerized adaptive tests (Educational Testing Service, 2010; OECD, 2017).

Of course, one of the reasons that CTT is still around in general is that it works much better with smaller samples, and this is also the case for CTT test equating (Babcock, Albano, & Raymond, 2012).

 

How do I implement test equating?

Test equating is a mathematically complex process, regardless of which method you use.  Therefore, it requires special software.  Here are some programs to consider.

  1. CIPE performs both linear and equipercentile equating with classical test theory. It is available from the University of Iowa’s CASMA site, which also includes several other software programs.
  2. IRTEQ is an easy-to-use program which performs all major methods of IRT Conversion equating.  It is available from the University of Massachusetts website, as well as several other good programs.
  3. There are many R packages for equating and related psychometric topics. This article claims that there are 45 packages for IRT analysis alone!
  4. If you want to do IRT equating, you need IRT calibration software. We highly recommend Xcalibre since it is easy to use and automatically creates reports in Word for you. If you want to do the calibration approach to IRT equating (both anchor-item and concurrent-calibration), rather than the conversion approach, this is handled directly by IRT software like Xcalibre. For the conversion approach, you need separate software like IRTEQ.

Equating is typically performed by highly trained psychometricians; in many cases, an organization will contract out to a testing company or consultant with the relevant experience. Contact us if you’d like to discuss this.

 

Does equating happen before or after delivery?

Both. These are called pre-equating and post-equating (Ryan & Brockmann, 2009).  Post-equating means the calculation is done after delivery, when you have a full data set; for example, if a test is delivered twice per year on a single day, we can do it after that day.  Pre-equating is trickier, because you are trying to calculate the equating before a test form has ever been delivered to an examinee, but it is necessary in many situations, especially those with continuous delivery windows.

 

How do I learn more about test equating?

If you are eager to learn more about the topic of equating, the classic reference is the book by Kolen and Brennan (2004; 2014) that provides the most complete coverage of score equating and linking.  There are other resources more readily available on the internet, like this free handbook from CCSSO. If you would like to learn more about IRT, we suggest the books by De Ayala (2008) and Embretson and Reise (2000). A brief intro of IRT equating is available on our website.

Several new ideas of general use in equating, with a focus on kernel equating, were introduced in the book by von Davier, Holland, and Thayer (2004). Holland and Dorans (2006) presented a historical background for test score linking, based on work by Angoff (1971), Flanagan (1951), and Petersen, Kolen, and Hoover (1989). If you look for a straightforward description of the major issues and procedures encountered in practice, then you should turn to Livingston (2004).


Want to learn more? Talk to a Psychometric Consultant!

References

Angoff, W. H. (1971). Scales, norms and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). American Council on Education.

Babcock, B., Albano, A., & Raymond, M. (2012). Nominal Weights Mean Equating: A Method for Very Small Samples. Educational and Psychological Measurement, 72(4), 1-21.

Dorans, N. J., Moses, T. P., & Eignor, D. R. (2010). Principles and practices of test score equating. ETS Research Report Series, 2010(2), i-41.

De Ayala, R. J. (2008). A commentary on historical perspectives on invariant measurement: Guttman, Rasch, and Mokken.

De Ayala, R. J. (2013). Factor analysis with categorical indicators: Item response theory. In Applied quantitative analysis in education and the social sciences (pp. 220-254). Routledge.

Educational Testing Service (2010). Linking TOEFL iBT Scores to IELTS Scores: A Research Report. Educational Testing Service.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.

Flanagan, J. C. (1951). Units, scores, and norms. In E. F. Lindquist (Ed.), Educational measurement (pp. 695-763). American Council on Education.

Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47.

Han, T., Kolen, M., & Pohlmann, J. (1997). A comparison among IRT true-and observed-score equatings and traditional equipercentile equating. Applied Measurement in Education, 10(2), 105-121.

Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187-220). Praeger.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, linking, and scaling: Methods and practices (2nd ed.). Springer-Verlag.

Kolen, M. J., & Brennan, R. L. (2014). Item response theory methods. In Test Equating, Scaling, and Linking (pp. 171-245). Springer.

Livingston, S. A. (2004). Equating test scores (without IRT). ETS.

Livingston, S. A., & Kim, S. (2009). The Circle‐Arc Method for Equating in Small Samples. Journal of Educational Measurement, 46(3), 330-343.

OECD (2017). PISA 2015 Technical Report. OECD Publishing.

Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221-262). Macmillan.

Ryan, J., & Brockmann, F. (2009). A Practitioner’s Introduction to Equating with Primers on Classical Test Theory and Item Response Theory. Council of Chief State School Officers.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. Springer.

von Davier, A. A., & Kong, N. (2003). A unified approach to linear equating for non-equivalent groups design. Research report 03-31 from Educational Testing Service. https://www.ets.org/Media/Research/pdf/RR-03-31-vonDavier.pdf