Certification and Licensure are two terms that are used quite frequently to refer to examinations that someone has to pass to demonstrate skills in a certain profession or topic.  They are quite similar, and often confused.  This is exacerbated by even more similar terms in the field, such as accreditation, credentialing, certificate, and microcredentials.  This post will help you understand the differences.

What is Certification?

Certification is “a credential that you earn to show that you have specific skills or knowledge. They are usually tied to an occupation, technology, or industry.” (CareerOneStop)  The important aspect of this definition is the latter portion: the organization that runs the certification generally spans an industry or a profession, regardless of political boundaries.  It is almost always some sort of professional association or industry board, like the American Association of Widgetmakers (obviously not a real thing).  However, it is sometimes governed by a specific company or other organization regarding its own products; perhaps the best-known example is Amazon Web Services, which will certify you in the skills needed to handle its offerings.  Many other technology and software companies do the same.

What is Licensure?

Licensure is a “formal permission to do something: esp., authorization by law to do some specified thing (license to marry, practice medicine, hunt, etc.)” (Schmitt, 1995).  The key phrase here is by law.  The governing organization is a governmental entity, and that is what defines licensure.  In fact, licensure is not even always about a profession; almost all of us have a Driver’s License, for which we passed a simple exam.  Nor does it always require an exam; many millions of people have a Fishing License, which is granted by the government (by States in the USA) and for which you simply pay a small fee.  The license is still an attestation, but not of your skills, just that you have been authorized to do something.

Certification and Licensure

In almost all cases, there is a test that you must pass, for both certification and licensure.  The development and delivery of such tests are extremely similar, which leads to the confusion.  Both will often utilize job analysis, Angoff studies, and the like.  The difference between the two lies outside the test itself, in the sponsoring organization: is it mandated/governed by a governmental entity, or is it unrelated to political/governmental boundaries?

Can they be the same exam?

To make things even more confusing… yes.  And it does not even have to be consistent.  In the US, some professions have a wide certification, which is also required in some States as licensure, but not in all States!  Some States might have their own exams, or not even require an exam.  This muddles the difference between certification and licensure.

Outline

This outline summarizes some of the relevant terms.  This is certainly more than can be covered in a single blog post, so this will need to be revisited!!!

  • Attestation of some level of quality for a person or organization = CREDENTIALING
    • Attestation of a person
      • By government = LICENSURE
      • By independent board or company
        • High stakes, wide profession = CERTIFICATION
        • Medium stakes = CERTIFICATE
        • Low stakes, quite specific skill = MICROCREDENTIAL
      • By an educational institution = DEGREE OR DIPLOMA
    • Attestation of an organization = ACCREDITATION
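The outline above is essentially a small decision tree, and it can be sketched in code. This is only an illustration: the function name `credential_type` and its string arguments are hypothetical labels chosen here, not terms from any standard.

```python
def credential_type(subject, issuer=None, stakes=None):
    """Map an attestation onto the terms used in the outline.

    subject: "person" or "organization"
    issuer:  "government", "independent board or company",
             or "educational institution" (only used for a person)
    stakes:  "high", "medium", or "low" (only used for a board or company)
    """
    if subject == "organization":
        return "ACCREDITATION"
    if subject != "person":
        raise ValueError("subject must be 'person' or 'organization'")

    if issuer == "government":
        return "LICENSURE"
    if issuer == "educational institution":
        return "DEGREE OR DIPLOMA"
    if issuer == "independent board or company":
        by_stakes = {
            "high": "CERTIFICATION",
            "medium": "CERTIFICATE",
            "low": "MICROCREDENTIAL",
        }
        if stakes not in by_stakes:
            raise ValueError("stakes must be 'high', 'medium', or 'low'")
        return by_stakes[stakes]
    raise ValueError("unknown issuer")


# A fishing license is granted by the government, so it is licensure.
print(credential_type("person", issuer="government"))
```

Real programs do not always fall neatly into one branch, as the section on shared exams points out, so treat this as a reading aid rather than a classifier.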

 

 

Authors: 

Laila Issayeva, MS

Nathan Thompson, PhD

 

The Bookmark Method of standard setting (Lewis, Mitzel, & Green, 1996) is a scientifically-based approach to setting cutscores on an examination. It allows stakeholders of an assessment to make decisions and classifications about examinees that are constructive rather than arbitrary (e.g., 70%), meet the goals of the test, and contribute to overall validity. A major advantage of the bookmark method over others is that it utilizes difficulty statistics on all items, making it very data-driven; but this can also be a disadvantage in situations where such data is not available. It also has the advantage of panelist confidence (Karantonis & Sireci, 2006).

The bookmark method operates by delivering a test to a representative sample (or population) of examinees, and then calculating the difficulty statistics for each item. We line up the items in order of difficulty, and experts review the items to place a bookmark where they think a cutscore should be. Nowadays, we use computer screens, but of course in the past this was often done by printing the items in paper booklets, and the experts would literally insert a bookmark.

What is standard setting?

Standard setting (Cizek & Bunch, 2006) is an integral part of the test development process, even though it has been undervalued outside of practitioners’ view in the past (Bejar, 2008). Standard setting is the methodology of defining achievement or proficiency levels and the corresponding cutscores. A cutscore is a score that serves as the boundary for classifying test takers into categories.

Educational assessments and credentialing examinations are often employed to distribute test takers among ordered categories according to their performance across specific content and skills (AERA, APA, & NCME, 2014; Hambleton, 2013). For instance, in tests used for certification and licensing purposes, test takers are typically classified as “pass” (those who score at or above the cutscore) or “fail” (those who score below it). In education, students are often classified in terms of proficiency; the Nation’s Report Card assessment (NAEP) in the United States classifies students as Below Basic, Basic, Proficient, or Advanced.
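To make the idea of classifying by cutscore concrete, here is a minimal Python sketch. The function name `classify` and every cutscore value in it are assumptions made for illustration; they are not actual NAEP or certification cutscores.

```python
from bisect import bisect_right


def classify(score, cutscores, labels):
    """Return the category for a score, given ascending cutscores.

    A score equal to a cutscore falls into the higher category,
    matching the "at or above the cutscore" rule for passing.
    """
    if len(labels) != len(cutscores) + 1:
        raise ValueError("need exactly one more label than cutscores")
    if list(cutscores) != sorted(cutscores):
        raise ValueError("cutscores must be in ascending order")
    return labels[bisect_right(cutscores, score)]


# Pass/fail with a single, hypothetical cutscore of 70.
print(classify(70, [70], ["fail", "pass"]))  # pass

# Four ordered levels need three cutscores (hypothetical values on an IRT scale).
NAEP_LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]
print(classify(0.3, [-1.0, 0.0, 1.0], NAEP_LEVELS))  # Proficient
```

The code is trivial on purpose: the classification itself is easy, and the real work of standard setting is deciding where the cutscores belong.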

However, assessment results could come into question unless the cutscores are appropriately defined. This is why arbitrary cutscores are considered indefensible and lacking validity. Instead, psychometricians help test sponsors to set cutscores using methodologies from the scientific literature, driven by evaluations of item and test difficulty as well as examinee performance.

When to use the bookmark method?

Two approaches are mainly used in international practice to establish assessment standards: the Angoff method (Cizek, 2006) and the Bookmark method (Buckendahl, Smith, Impara, & Plake, 2000). The Bookmark method, unlike the Angoff method, requires the test to be administered before cutscores are defined, because the cutscores are based on test data. This lends additional weight to the validity of the process and better informs the subject matter experts as they work. Of course, many exams require a cutscore to be set before the exam is published, which is impossible with the Bookmark method; the Angoff procedure is very useful then.

How do I implement the bookmark method?

The process of standard setting employing the Bookmark method consists of the following stages:

  1. Identify a team of subject matter experts (SMEs); the panel should have around 6-12 members and be led by a test developer/psychometrician/statistician
  2. Analyze test takers’ responses by means of item response theory (IRT)
  3. Create a list of items ordered by difficulty, in ascending order
  4. Define the competency levels for test takers; for example, have the 6-12 experts discuss what should differentiate a “pass” candidate from a “fail” candidate
  5. Experts read the items in the ascending order (they do not need to see the IRT values), and place a bookmark where appropriate based on professional judgement across well-defined levels
  6. Calculate thresholds based on the bookmarks set, across all experts
  7. If needed, discuss results and perform a second round
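Steps 3 and 6 above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes a Rasch (one-parameter) model, takes the cutscore from the last item before each bookmark, uses a response probability of 0.67 (a common convention, exposed here as the `rp` parameter), and combines experts with the median. All function names and item data are hypothetical.

```python
import math
from statistics import median


def order_items(difficulties):
    """Step 3: return item ids sorted from easiest to hardest."""
    return sorted(difficulties, key=difficulties.get)


def bookmark_to_theta(ordered_ids, difficulties, bookmark, rp=0.67):
    """Convert one expert's bookmark into a cutscore on the IRT scale.

    `bookmark` is the number of items that fall before the bookmark,
    so the last item before it is ordered_ids[bookmark - 1].
    """
    if not 1 <= bookmark <= len(ordered_ids):
        raise ValueError("bookmark must be between 1 and the number of items")
    if not 0 < rp < 1:
        raise ValueError("rp must be strictly between 0 and 1")
    b = difficulties[ordered_ids[bookmark - 1]]
    # Under the Rasch model, P(correct) equals rp at theta = b + ln(rp / (1 - rp)).
    return b + math.log(rp / (1 - rp))


def panel_cutscore(ordered_ids, difficulties, bookmarks, rp=0.67):
    """Step 6: combine all experts' bookmarks, here using the median."""
    return median(
        bookmark_to_theta(ordered_ids, difficulties, bm, rp) for bm in bookmarks
    )


# Made-up item difficulties and bookmarks from three experts.
difficulties = {"item_a": -1.0, "item_b": 0.5, "item_c": 0.0}
ordered = order_items(difficulties)  # ['item_a', 'item_c', 'item_b']
print(panel_cutscore(ordered, difficulties, bookmarks=[1, 2, 2]))
```

Other conventions exist, such as using the mean instead of the median or a different response probability, so the panel and psychometrician should agree on these choices before the study begins.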

Example of the Bookmark Method

If there are four competency levels, as in the NAEP example, then SMEs need to place three bookmarks between them: the first bookmark is set after the last item in the ordered list that fits the minimally competent candidate for the first level, and the second and third are set in the same way. This yields thresholds/cutscores from level 1 to 2, 2 to 3, and 3 to 4. SMEs perform this individually without discussion, by reading the items.

When all SMEs have provided their opinions, the standard setting coordinator combines all results into one spreadsheet and leads a discussion in which all participants explain their views with reference to the bookmarks they set. This might look like the sheet below. Note that SME4 had a relatively high standard in mind, while SME2 had a low standard in mind – placing virtually every student above an IRT score of 0.0 into the top category!

[Figure: bookmark method 1]

After the discussion, the SMEs are given one more opportunity to set the bookmarks again. Usually, after the exchange of opinions, the picture alters. SMEs gain consensus, and the variation in the graphic is reduced.  An example of this is below.

[Figure: bookmark method 2]
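The move toward consensus between rounds can also be checked numerically. The sketch below uses invented bookmark placements (expressed as IRT values) for four SMEs and three thresholds; none of these numbers come from the figures, and the function name `summarize_round` is hypothetical.

```python
from statistics import median, stdev


def summarize_round(placements):
    """Return (median, standard deviation) for each threshold.

    `placements` maps each SME to a list of IRT values, one per threshold,
    with all lists in the same threshold order.
    """
    per_threshold = zip(*placements.values())
    return [(median(values), stdev(values)) for values in per_threshold]


# Invented data: SME2 starts with a low standard, SME4 with a high one.
round_1 = {
    "SME1": [-1.0, 0.0, 1.0],
    "SME2": [-2.0, -1.0, 0.0],
    "SME3": [-1.0, 0.2, 1.2],
    "SME4": [0.0, 1.0, 2.0],
}
round_2 = {
    "SME1": [-1.0, 0.0, 1.0],
    "SME2": [-1.2, -0.2, 0.8],
    "SME3": [-1.0, 0.1, 1.1],
    "SME4": [-0.8, 0.3, 1.2],
}

for label, data in (("Round 1", round_1), ("Round 2", round_2)):
    for level, (mid, spread) in enumerate(summarize_round(data), start=1):
        print(f"{label}, threshold {level} to {level + 1}: "
              f"median={mid:.2f}, sd={spread:.2f}")
```

A smaller standard deviation in the second round is the numeric counterpart of the reduced variation in the graphic. It shows agreement among the panel, which is not the same thing as showing that the standard is correct.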

What to do with the results?

Based on the SMEs’ voting results, the coordinator or psychometrician calculates the final thresholds on the IRT scale, and provides them to the analytical team who would ultimately prepare reports for the assessment across competency levels. This might entail score reports to examinees, feedback reports to teachers, and aggregate reports to test sponsors, government officials, and more.

You can see how the scientific approach will directly impact the interpretations of such reports. Rather than government officials just knowing how many students scored 80-90% correct vs 90-100% correct, the results are framed in terms of how many students are truly proficient in the topic. This makes decisions from test scores – both at the individual and aggregate levels – much more defensible and informative.  They become truly criterion-referenced.  This is especially true when the scores are equated across years to account for differences in examinee distributions and test difficulty, and the standard can be demonstrated to be stable.  For high-stakes examinations such as medical certification/licensure, admissions exams, and many more situations, this is absolutely critical.

Want to talk to an expert about implementing this for your exams?  Contact us.

References

[AERA, APA, & NCME] (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education). (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Bejar, I. I. (2008). Standard setting: What is it? Why is it important. R&D Connections, 7, 1-6. Retrieved from https://www.ets.org/Media/Research/pdf/RD_Connections7.pdf

Buckendahl, C. W., Smith, R. W., Impara, J. C., & Plake, B. S. (2000). A comparison of Angoff and Bookmark standard setting methods. Paper presented at the Annual Meeting of the Mid-Western Educational Research Association, Chicago, IL: October 25-28, 2000.

Cizek, G., & Bunch, M. (2006). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests.  Thousand Oaks, CA: Sage.

Cizek, G. J. (2007). Standard setting. In Steven M. Downing and Thomas M. Haladyna (Eds.) Handbook of test development. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers, pp. 225-258.

Hambleton, R. K. (2013). Setting performance standards on educational assessments and criteria for evaluating the process. In Setting performance standards, pp. 103-130. Routledge. Retrieved from https://www.nciea.org/publications/SetStandards_Hambleton99.pdf

Karantonis, A., & Sireci, S. (2006). The Bookmark standard‐setting method: A literature review. Educational Measurement: Issues and Practice, 25(1), 4-12.

Lewis, D. M., Mitzel, H. C., & Green, D. R. (1996, June). Standard setting: A Book-mark approach. In D. R. Green (Chair), IRT-based standard setting procedures utilizing behavioral anchoring. Symposium conducted at the Council of Chief State School Officers National Conference on Large-Scale Assessment, Phoenix, AZ.

 

 

There are many types of remote proctoring on the market, spread across dozens of vendors, including new ones that sought to capitalize on the pandemic and were not involved with assessment beforehand.  With so many options, how can you choose effectively among the types of remote proctoring?

What is remote proctoring?

Remote proctoring refers to the proctoring (invigilation) of educational or professional assessments when the proctor is not in the same room as the examinee.  This means that it is done with a video stream or recording, which is monitored by a human and/or AI.  It is also referred to as online proctoring.

Remote proctoring offers a compelling alternative to in-person proctoring, somewhere in between unproctored at-home tests and tests delivered in an expensive testing center.  This makes it a perfect fit for medium-stakes exams, such as university placement, pre-employment screening, and many types of certification/licensure tests.

What are the types of remote proctoring?

There are four types of remote proctoring, which can be adapted to a particular use case, sometimes varying between different tests in a single organization.  ASC supports all four types, and partners with 5 different vendors to help provide the best solution to our clients.  In descending order of security:

 

For each approach, the lists below describe what it entails for you and what it entails for the candidate.

Live with professional proctors

What it entails for you:

  • You register a set of examinees in FastTest, and tell us when they are to take their exams and under what rules.
  • We provide the relevant information to the proctors.
  • You send all the necessary information to your examinees.
  • The most secure of the types of remote proctoring.

What it entails for the candidate:

  • Examinee goes to ascproctor.com, where they will initiate a chat with a proctor.
  • After confirmation of their identity and workspace, they are provided information on how to take the test.
  • The proctor then watches a video stream from their webcam as well as a phone on the side of the room, ensuring that the environment is secure. They do not see the screen, so your exam content is not exposed.
  • When the examinee is finished, they notify the proctor, and are excused.

Live, bring your own proctor (BYOP)

What it entails for you:

  • You upload examinees into FastTest, which will generate links.
  • You send relevant instructions and the links to examinees.
  • Your staff logs into the admin portal and awaits examinees.
  • Videos with AI flagging are available for later review if needed.

What it entails for the candidate:

  • Examinee will click on a link, which launches the proctoring software.
  • An automated system check is performed.
  • The proctoring is launched.  Proctors ask the examinee to provide identity verification, then launch the test.
  • Examinee is watched on the webcam and screencast.  AI algorithms help to flag irregular behavior.
  • Examinee concludes the test.

Record and Review (with option for AI)

What it entails for you:

  • You upload examinees into FastTest, which will generate links.
  • You send relevant instructions and the links to examinees.
  • After examinees take the test, your staff (or ours) logs in to review all the videos and report on any issues.  AI will automatically flag irregular behavior, making your reviews more time-efficient.

What it entails for the candidate:

  • Examinee will click on a link, which launches the proctoring software.
  • An automated system check is performed.
  • The proctoring is launched.  The system asks the examinee to provide identity verification, then launches the test.
  • Examinee is recorded on the webcam and screencast.  AI algorithms help to flag irregular behavior.
  • Examinee concludes the test.

AI only

What it entails for you:

  • You upload examinees into FastTest, which will generate links.
  • You send relevant instructions and the links to examinees.
  • Videos are stored for 1 month if you need to check any.

What it entails for the candidate:

  • Examinee will click on a link, which launches the proctoring software.
  • An automated system check is performed.
  • The proctoring is launched.  The system asks the examinee to provide identity verification, then launches the test.
  • Examinee is recorded on the webcam and screencast.  AI algorithms help to flag irregular behavior.
  • Examinee concludes the test.

 

Some case studies

We’ve worked with all types of remote proctoring, across many types of assessment:

  • ASC delivers high-stakes certification exams for a number of certification boards, in multiple countries, using live proctoring with professional proctors.  Some of these are available continuously on-demand, while others are on specific days where hundreds of candidates log in.
  • We partnered with a large university in South America, where their admissions exams were delivered using Bring Your Own Proctor, enabling them to drastically reduce costs by utilizing their own staff.
  • We partnered with a private company to provide AI-enhanced record-and-review proctoring for applicants, where ASC staff reviews the results and provides a report to the client.
  • We partner with an organization that delivers civil service exams for a country and utilizes both unproctored and AI-only proctoring, differing across a range of exam titles.

How do I select a vendor?

First, determine the level of security necessary, and the trade-off with costs.  Live proctoring with professionals can cost $20 to $100 or more, while AI proctoring can be as little as a few dollars.  Then, evaluate some vendors to see which group they fall into; note that some vendors can do all of them!  Then, ask for some demos so you understand the business processes involved and the UX on the examinee side, both of which could substantially impact the soft costs for your organization.  Then, start negotiating with the vendor you want!

 

Want some more information?

Get in touch with us, we’d love to show you a demo!

Email solutions@assess.com.

 

If you have worked in the field of assessment and psychometrics, you have undoubtedly encountered the word “standard.” While a relatively simple word, it has the potential to be confusing because it is used in three (and more!) completely different but very important ways. Here’s a brief discussion.

Standard = Cutscore

As the well-known professor Gregory Cizek notes, “standard setting refers to the process of establishing one or more cut scores on a test.” The various methods of setting a cutscore, like Angoff or Bookmark, are referred to as standard setting studies. In this context, the standard is the bar that separates a Pass from a Fail. We use methods like the ones mentioned to determine this bar in as scientific and defensible a fashion as possible, and give it more concrete meaning than an arbitrarily selected round number like 70%. Selecting a round number like that will likely get you sued since there is no criterion-referenced interpretation.

Standard = Blueprint

If you work in the field of education, you often hear the term “educational standards.” These refer to the curriculum blueprints for an educational system, which also translate into assessment blueprints, because you want to assess what is on the curriculum. Several important ones in the USA are noted here, perhaps the most common of which nowadays is the Common Core State Standards, which attempted to standardize the standards across states. These standards exist to standardize the educational system, by teaching what a group of experts have agreed upon should be taught in 6th grade Math classes for example. Note that they don’t state how or when a topic should be taught, merely that 6th Grade Math should cover Number Lines, Measurement Scales, Variables, whatever – sometime in the year.

Standard = Guideline

If you work in the field of professional certification, you hear the term just as often but in a different context: accreditation standards. The two most common accreditors are the National Commission for Certifying Agencies (NCCA) and the ANSI National Accreditation Board (ANAB). These two organizations give a stamp of approval to credentialing bodies, stating that a Certification or Certificate program is legit. Why? Because there is no law to stop me from buying a textbook on any topic, writing 50 test questions in my basement, and selling it as a Certification. It is completely a situation of caveat emptor, and these organizations are helping the buyers by giving a stamp of approval that the certification was developed with accepted practices like a Job Analysis, Standard Setting Study, etc.

In addition, there are the professional standards for our field. These are guidelines on assessment in general rather than just credentialing. Two great examples are the AERA/APA/NCME Standards for Educational and Psychological Testing and the International Test Commission’s Guidelines (yes, they switch to that term) on various topics.

Also: Standardized = Equivalent Conditions

The word is also used quite frequently in the context of standardized testing, though it is rarely chopped to the root word “standard.” In this case, it refers to the fact that the test is given under equivalent conditions to provide greater fairness and validity. A standardized test does NOT mean multiple choice, bubble sheets, or any of the other pop connotations that are carried with it. It just means that we are standardizing the assessment and the administration process. Think of it as a scientific experiment; the basic premise of the scientific method is holding all variables constant except the variable in question, which in this case is the student’s ability. So we ensure that all students receive a psychometrically equivalent exam, with equivalent (as much as possible) writing utensils, scrap paper, computer, time limit, and all other practical surroundings. The problem comes with the lack of equivalence in access to study materials, prep coaching, education, and many bigger questions… but those are a societal issue and not a psychometric one.

So despite all the bashing that the term gets, a standardized test is MUCH better than the alternatives of no assessment at all, or an assessment that is not a level playing field and has low reliability. Consider the case of hiring employees: if assessments were not used to provide objective information on applicant skills and we could only use interviews (which are famously subjective and inaccurate), all hiring would be virtually random and the number of incompetent people in jobs would increase a hundredfold. And don’t we already have enough people in jobs where they don’t belong?

A standard setting study is a formal process for establishing a performance standard. In the assessment world, there are actually two uses of the word standard – the other one refers to a formal definition of the content that is being tested, such as the Common Core State Standards in the USA. For this reason, I prefer the term cutscore study.

After item authoring, item review, and test form assembly, a cutscore or passing score will often be set to determine what level of performance qualifies as “pass” or a similar classification.  This cannot be done arbitrarily (e.g., setting it at 70% because that’s what you saw when you were in school).  To be legally defensible and eligible for Accreditation, it must be done using one of several standard-setting approaches from the psychometric literature.

The choice of method depends upon the nature of the test, the availability of pilot data, and the availability of subject matter experts.

Some types of a standard setting study:

  • Angoff – In an Angoff study, a panel of subject matter experts rates each item, estimating the percentage of minimally competent candidates that would answer each item correctly.  It is often done in tandem with the Beuk Compromise.  The Angoff method does not require actual examinee data, though the Beuk does.
  • Bookmark – The bookmark method orders the items in a test form in ascending difficulty, and a panel of experts reads through and places a “bookmark” in the book where they think a cutscore should be.  Obviously, this requires enough real data to calibrate item difficulty, usually using item response theory, which requires several hundred examinees.
  • Contrasting Groups – Candidates are sorted into Pass and Fail groups based on their performance on a different exam or some other unrelated standard.  If using data from another exam, a sample of at least 50 candidates is obviously needed.
  • Borderline Group – Similar to Contrasting Groups, but a borderline group is defined using alternative information such as biodata, and the scores of the group are evaluated.
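As an illustration of the first method in the list, the arithmetic behind a basic Angoff study can be sketched as follows. This is a simplified sketch with invented ratings: it averages the experts' ratings for each item and sums those averages to get a raw cutscore, and it leaves out refinements such as multiple rating rounds or the Beuk Compromise. The function name `angoff_cutscore` is hypothetical.

```python
from statistics import mean


def angoff_cutscore(ratings):
    """Return the raw cutscore implied by a panel's Angoff ratings.

    `ratings` holds one list per expert. Each value is that expert's estimate
    of the proportion (0 to 1) of minimally competent candidates who would
    answer the item correctly.
    """
    n_items = len(ratings[0])
    for expert in ratings:
        if len(expert) != n_items:
            raise ValueError("every expert must rate every item")
        if any(not 0 <= r <= 1 for r in expert):
            raise ValueError("ratings must be proportions between 0 and 1")
    # Average across experts for each item, then sum across items.
    item_means = [mean(item) for item in zip(*ratings)]
    return sum(item_means)


# Three experts rating a three-item test (invented values).
panel = [
    [0.6, 0.8, 0.5],
    [0.7, 0.9, 0.5],
    [0.5, 0.7, 0.5],
]
cut = angoff_cutscore(panel)
print(f"Raw cutscore: {cut:.2f} of {len(panel[0])} items")
```

The result is the raw score a minimally competent candidate is expected to achieve, which is why no examinee data is required to compute it.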

Working toward accreditation or building your team of professionals? Accreditation bodies like ANSI and NCCA require job analyses. Our Psychometricians are available to conduct a job analysis study and write defensible documentation to move your program forward and ensure you are hiring individuals with the skills and knowledge necessary to be successful.

The job market is competitive, especially for employers; whether you need a job analysis or not, the job description you post must convert prospects to candidates. After all, you can lead a horse to water but you can’t make it drink. Vervoe Co-Founder and CEO Omer Molad shares his thoughts about job descriptions that get the right people. Here’s how to write a job description that will attract the right candidates.

Why Focus on Activities?

People are hired to perform value-adding activities. While companies have different approaches to how they hire, their goals are usually the same. Every company wants to hire high-performing people, not people who just look good on paper.

Despite this simple and obvious assumption, too many companies ignore activities and focus on things that don’t indicate performance. This happens at every stage of the hiring process. For example:

  • Many job descriptions focus on what candidates have done in the past.
  • Screening is based on candidates’ backgrounds.
  • Assessment methods often don’t simulate the tasks that are performed in the role.

Instead, use on-the-job activities as the guide for the entire hiring process. If you follow this principle, you will hire people who perform the value-adding activities you require.


Here’s how it works.

The Job Description

Defining the role is the foundation of hiring. If you do that incorrectly, the entire hiring process will be steered in the wrong direction. The clearer you are, the higher your chances of attracting the person you want. The problem with so many job descriptions is that they aren’t linked closely enough to the daily activities of the job. Let’s change that.

A good job description should have three sections:

1. Start with why

“People don’t buy what you do, they buy why you do it.” – Simon Sinek

This approach is entirely applicable to job descriptions. Sell candidates on your company’s vision and story. Sell them on the role and the culture. This will achieve two things. First, it is likely to increase the quality of applicants. Second, candidates will be more likely to invest in the application process and make an effort if they buy into your “why”.

Conversely, candidates who don’t relate to your vision or culture will opt out. Mission accomplished.

2. Describe the role in activities

Outline, point by point, what the successful candidate will do every day. Keep it simple and be very specific. No clichés, no jargon. Candidates need to understand how they will spend each day, what they need to achieve, who they’ll be working with and under what conditions.

This is a great way of managing expectations. By communicating to candidates what they’ll be doing in the role, you are forcing them to ask themselves whether they can do those activities well and how much they enjoy doing them. This presents another opportunity for less suitable candidates to opt out.

3. State your requirements

The previous two sections should make this part easy because you’ve set the scene. Candidates already know what your company stands for and what they’ll be doing in the role. Now you can add some more detail about the type of person you are looking for and how you expect them to approach the role.

Don’t worry about years of experience, grades in college or anything else that’s not activity-based. Bring it back to activities and use plain English.

Describe the kind of person you’re looking for by listing how you want them to approach the role. Put things in context. Instead of “strong communicator”, write “clearly communicate customer feedback to the product team”. Instead of “flexible”, write “prepared to join calls with developers late at night when necessary”.

You should also use this section to articulate the attitude and behaviors you’d like to see. Candidates already know from the previous section what they’ll be doing on a daily basis. Now explain how.

Here are some examples of good job descriptions and a useful guide on how to write one.

Candidate Screening

With a good job description and scenario-based assessment, candidate screening is simply not required. To learn more about why you don’t need to screen candidates read this.

But in short, screening is not about activities, it’s about a candidate’s background. Ruling people out based on their background is counterproductive. Instead, set candidates up for success with a savvy job description, and then assess the ones that want the job based on that description.

Don’t worry about receiving too many applications from people who aren’t qualified or ignore the job description. That is solved automatically in the assessment stage and you won’t need to lift a finger.

Scenario-based Assessment

Your job description will attract people who want to be part of your journey, and want to do the job you advertised. That’s the theory at least.

Now it’s time to find out how it stacks up.

The assessment stage, which is the most important part of your hiring process, should be entirely based on activities. Go back to the job description and choose the most important on-the-job activities.

Create simulations of those activities so you can see how candidates perform in real-world scenarios. To learn how to write a great interview script read this.

Use automated interviews to deliver the simulations to candidates online.

Some candidates will not make the effort. Others will find the activities too challenging. Others yet will see that the activities are not aligned with their interests or passions. The most motivated and qualified candidates will prevail.

It’s easy to read a job description and apply for a job. However, when candidates are asked to perform challenging tasks, they need to be motivated and confident in their abilities. You’ll only need to view and score completed interviews and you’ll know who measures up within minutes.

Using automated interviews based on activities, you can audition candidates for the role. They will, in turn, get a chance to do the role, albeit in a small way.

The candidates who perform well in the automated interviews will have proven they can do the activities you want them to do in the role. Seeing first hand how well they perform each of those activities will help you confidently make your hiring decision.

By focusing on activities, you can create a hiring process that reflects your role and how you want it to be performed. It’s a simple and effective method to hire people who can, and want to, perform the activities you consider to be value-adding.

***

Our friends at Vervoe specialize in automating your recruiting and screening process to improve your time to hire and ensure you’re hiring the right person for the right position. This post was originally posted by Vervoe, reposted with permission. For more information about Vervoe, visit them at https://vervoe.com/.

MINNEAPOLIS, MN, September 7, 2018 – Assessment Systems, a global leader in psychometrics and assessment software, has added the National Institute for Automotive Service Excellence (ASE) to its growing list of valued partners.

Since 1972, ASE has driven to elevate the quality of vehicle repair and service by assessing and certifying automotive professionals. This partnership joins the power and sophistication of Assessment Systems’ flagship products – FastTest, Iteman, and Xcalibre – with ASE’s long-standing and renowned certification in the automotive industry.

“Our values align,” said Cassandra Bettenberg, Executive Director of Strategic Partnerships at Assessment Systems. “Like Assessment Systems, ASE values psychometrics and wants to develop and deliver more valid and reliable exams.”

As Assessment Systems continues to diversify its list of partners across industries, it continues to improve its best-in-class assessment technology and its new assessment platform, Assess.ai. Assessment Systems recently earned a spot on the Inc. 5000 – Inc. Magazine’s list of America’s Fastest-Growing Private Companies – for the second year in a row.

About The National Institute for Automotive Service Excellence
The National Institute for Automotive Service Excellence was established in 1972 as a non-profit organization to help improve the quality of automotive service and repair through the voluntary testing and certification of automotive technicians and parts specialists. Today, there are nearly 400,000 ASE-certified professionals at work in dealerships, independent shops, collision repair shops, auto parts stores, fleets, schools and colleges throughout the country.

About Assessment Systems Corporation
Assessment Systems is the trusted provider of high-stakes assessment and psychometric services for over 250 partners worldwide, delivering over 2,000,000 assessments every year. Powered by decades of research in psychometrics, Assessment Systems offers best-in-class software platforms and consulting services to support high-quality measurement and completely scalable solutions. Assessment Systems’ success is driven by a commitment to make assessments smarter, faster, and fairer to ensure bad tests don’t hurt good people.

###

The modified-Angoff method is arguably the most common method of setting a cutscore on a test.  The Angoff cutscore is legally defensible and meets international standards such as AERA/APA/NCME, ISO 17024, and NCCA.  It also has the benefit that it does not require the test to be administered to a sample of candidates first; methods like Contrasting Groups, Borderline Group, and Bookmark do require this.

There are, of course, some drawbacks to the Angoff cutscore process.  The most significant is that the subject matter experts (SMEs) tend to overestimate the ability of a minimally competent candidate, and therefore overestimate the cutscore, sometimes to the point that the expected pass rate is zero!

Another drawback is that the Angoff cutscore process only works in the classical psychometric paradigm – the recommended cutscores are on the number-correct metric or percentage-correct metric.  If your tests are developed and scored in the item response theory (IRT) paradigm, you need to convert the classical cutscore to the IRT theta scale.  The easiest way to do that is to reverse-calculate the test response function (TRF) from IRT.

The Test Response Function

The TRF (sometimes called a test characteristic curve) is an important method of characterizing test performance in the IRT paradigm.  The TRF predicts a classical score from an IRT score, as you see below.  Like the item response function and test information function, it uses the theta scale as the X-axis.  The Y-axis can be either the number-correct metric or proportion-correct metric.

In this example, you can see that a theta of -0.6 translates to an estimated number-correct score of approximately 10, and +1 to 15.5.  Note that the number-correct metric only makes sense for linear or LOFT exams, where every examinee receives the same number of items.  In the case of CAT exams, only the proportion-correct metric makes sense.
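To make the idea concrete, here is a minimal sketch of how a TRF can be computed: it is simply the sum of the item response functions at a given theta.  The sketch assumes the three-parameter logistic (3PL) model with the common scaling constant D = 1.7; the item parameters and the function names `irf_3pl` and `trf` are made up for illustration and are not taken from any particular exam or software package.

```python
import math

def irf_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model (assumed here)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def trf(theta, items, proportion=False):
    """Expected number-correct score at theta: the sum of the item probabilities.

    With proportion=True, the sum is divided by the number of items.
    """
    expected = sum(irf_3pl(theta, a, b, c) for a, b, c in items)
    return expected / len(items) if proportion else expected

# Hypothetical (a, b, c) parameters for a short 5-item form
ITEMS = [
    (1.0, -1.0, 0.20),
    (0.8, -0.5, 0.25),
    (1.2,  0.0, 0.20),
    (0.9,  0.5, 0.20),
    (1.1,  1.0, 0.25),
]

if __name__ == "__main__":
    for theta in (-2.0, -0.6, 0.0, 1.0, 2.0):
        print(f"theta={theta:+.1f}  expected number-correct={trf(theta, ITEMS):.2f}")
```

Because each item probability increases with theta, the TRF does too, and it is bounded below by the sum of the guessing parameters and above by the number of items.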

Angoff cutscore to IRT

So how does this help us with the conversion of a cutscore?  Well, we now have a way of translating any number-correct score or proportion-correct score to the theta scale.  So any Angoff-recommended cutscore can be reverse-calculated to a theta value.  If your Angoff study (or Beuk) recommends a cutscore of 10 out of 20 points, you can convert that to a theta cutscore of -0.6.  If the recommended cutscore was 15.5, the theta cutscore would be 1.0.
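The reverse calculation can be sketched as a simple root-finding problem: search for the theta at which the TRF equals the recommended cutscore.  The example below is one possible approach under stated assumptions, not a description of how any specific software does it.  It assumes the 3PL model with D = 1.7, uses bisection (which works because the TRF is increasing in theta when all discriminations are positive), and uses made-up item parameters; `theta_for_cutscore` is a hypothetical helper name.

```python
import math

def irf_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model (assumed here)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def trf(theta, items):
    """Expected number-correct score at theta."""
    return sum(irf_3pl(theta, a, b, c) for a, b, c in items)

def theta_for_cutscore(cut, items, lo=-6.0, hi=6.0, tol=1e-6):
    """Find the theta where TRF(theta) equals the number-correct cutscore.

    Uses bisection on [lo, hi]; raises ValueError if the cutscore cannot be
    reached in that range (for example, below the sum of guessing parameters).
    """
    if not trf(lo, items) < cut < trf(hi, items):
        raise ValueError("cutscore is outside the range the TRF can reach")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if trf(mid, items) < cut:
            lo = mid   # expected score too low, so theta must be higher
        else:
            hi = mid   # expected score high enough, so theta is at or below mid
    return (lo + hi) / 2.0

# Hypothetical (a, b, c) parameters for a short 5-item form
ITEMS = [
    (1.0, -1.0, 0.20),
    (0.8, -0.5, 0.25),
    (1.2,  0.0, 0.20),
    (0.9,  0.5, 0.20),
    (1.1,  1.0, 0.25),
]

if __name__ == "__main__":
    cut = 3.0  # an Angoff-style cutscore of 3 out of 5, chosen for illustration
    theta_cut = theta_for_cutscore(cut, ITEMS)
    print(f"number-correct cutscore {cut} -> theta cutscore {theta_cut:.3f}")
```

For a proportion-correct cutscore, multiply it by the number of items before calling the function.  Note that with nonzero guessing parameters, very low cutscores have no corresponding theta, which is why the sketch checks the range first.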

IRT scores examinees on the same scale with any set of items, as long as those items have been part of a linking/equating study.  Therefore, a single Angoff study on a set of items can be equated to any other linear test form, LOFT pool, or CAT pool.  This makes it possible to apply the classically-focused Angoff method to IRT-focused programs.