
Certification exam development, as well as other credentialing like licensure or certificates, is incredibly important.  Such exams serve as gatekeepers into many professions, often after people have invested a ton of money and years of their life in preparation.  Therefore, it is critical that the tests be developed well, and have the necessary supporting documentation to show that they are defensible.  So what exactly goes into developing a quality exam, sound psychometrics, and establishing the validity documentation, perhaps enough to achieve NCCA accreditation for your certification?

Well, there is a well-defined and recognized process for certification exam development, though it is rarely the exact same for every organization.  In general, the accreditation guidelines say you need to address these things, but leave the specific approach up to you.  For example, you have to do a cutscore study, but you are allowed to choose the Bookmark method, the Angoff method, or another method.

 

Job Analysis / Practice Analysis

A job analysis study provides the vehicle for defining the important job knowledge, skills, and abilities (KSA) that will later be translated into content on a certification exam. During a job analysis, important job KSAs are obtained by directly analyzing job performance of highly competent job incumbents or surveying subject-matter experts regarding important aspects of successful job performance. The job analysis generally serves as a fundamental source of evidence supporting the validity of scores for certification exams.

 

Test Specifications and Blueprints

The results of the job analysis study are quantitatively converted into a blueprint for the exam.  Basically, it comes down to this: if the experts say that a certain topic or skill is done quite often or is very critical, then it deserves more weight on the exam, right?  There are different ways to do this.  My favorite article on the topic is Raymond & Neustel (2006).  Here’s a free tool to help.
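To make the idea concrete, here is a minimal sketch of one possible weighting scheme: multiply each domain's mean frequency rating by its mean criticality rating, then normalize so the weights sum to 1. The multiplicative model, the domain names, and the ratings are all assumptions for illustration; it is only one of the approaches discussed in the literature, and your program may use a different one.

```python
# Hypothetical mean SME survey ratings on 1-5 scales (illustrative values only).
domain_ratings = {
    "Domain A": {"frequency": 4.5, "criticality": 4.0},
    "Domain B": {"frequency": 3.0, "criticality": 4.5},
    "Domain C": {"frequency": 4.0, "criticality": 2.0},
}


def blueprint_weights(ratings, total_items):
    """Convert frequency/criticality ratings into blueprint weights and item counts.

    Assumes a simple multiplicative model (frequency x criticality).
    """
    raw = {name: r["frequency"] * r["criticality"] for name, r in ratings.items()}
    total = sum(raw.values())
    weights = {name: value / total for name, value in raw.items()}
    # Rounded counts may not sum exactly to total_items; adjust by hand if needed.
    items = {name: round(w * total_items) for name, w in weights.items()}
    return weights, items


weights, items = blueprint_weights(domain_ratings, total_items=100)
for name in domain_ratings:
    print(f"{name}: weight {weights[name]:.3f}, about {items[name]} items")
```

In practice the SME panel usually reviews and adjusts the computed weights rather than accepting them mechanically.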

 

[Figure: the test development cycle]

Item Development

After important job KSAs are established, subject-matter experts write test items to assess them. The end result is the development of an item bank from which exam forms can be constructed. The quality of the item bank also supports test validity.  A key operational step is the development of an Item Writing Guide and holding an item writing workshop for the SMEs.

 

Pilot Testing

There should be evidence that each item in the bank actually measures the content that it is supposed to measure; in order to assess this, data must be gathered from samples of test-takers. After items are written, they are generally pilot tested by administering them to a sample of examinees in a low-stakes context—one in which examinees’ responses to the test items do not factor into any decisions regarding competency. After pilot test data is obtained, a psychometric analysis of the test and test items can be performed. This analysis will yield statistics that indicate the degree to which the items measure the intended test content. Items that appear to be weak indicators of the test content generally are removed from the item bank or flagged for item review so they can be reviewed by subject matter experts for correctness and clarity.

Note that this is not always possible, and is one of the ways that different organizations diverge in how they approach exam development.

 

Standard Setting

Standard setting also is a critical source of evidence supporting the validity of professional credentialing exam (i.e. pass/fail) decisions made based on test scores.  Standard setting is a process by which a passing score (or cutscore) is established; this is the point on the score scale that differentiates between examinees that are and are not deemed competent to perform the job. In order to be valid, the cutscore cannot be arbitrarily defined. Two examples of arbitrary methods are the quota (setting the cut score to produce a certain percentage of passing scores) and the flat cutscore (such as 70% on all tests). Both of these approaches ignore the content and difficulty of the test.  Avoid these!

Instead, the cutscore must be based on one of several well-researched criterion-referenced methods from the psychometric literature.  There are two types of criterion-referenced standard-setting procedures (Cizek, 2006): examinee-centered and test-centered.

The Contrasting Groups method is one example of a defensible examinee-centered standard-setting approach. This method compares the scores of candidates previously defined as Pass or Fail. Obviously, this has the drawback that a separate method already exists for classification. Moreover, examinee-centered approaches such as this require data from examinees, but many testing programs wish to set the cutscore before publishing the test and delivering it to any examinees. Therefore, test-centered methods are more commonly used in credentialing.

The most frequently used test-centered method is the Modified Angoff Method (Angoff, 1971) which requires a committee of subject matter experts (SMEs).  Another commonly used approach is the Bookmark Method.

 

Equating

If the test has more than one form – which is required by NCCA Standards and other guidelines – they must be statistically equated.  If you use classical test theory, there are methods like Tucker or Levine.  If you use item response theory, you can either bake the equating into the item calibration process with software like Xcalibre, or use conversion methods like Stocking & Lord.

What does this process do?  Well, if this year’s certification exam had an average 3 points higher than last year’s, how do you know if this year’s version was 3 points easier, or this year’s cohort was 3 points smarter, or a mixture of both?  Learn more here.

 

Psychometric Analysis & Reporting

This part is an absolutely critical step in the exam development cycle for professional credentialing.  You need to statistically analyze the results to flag any items that are not performing well, so you can replace or modify them.  This looks at statistics like item p-value (difficulty), item point biserial (discrimination), option/distractor analysis, and differential item functioning.  You should also look at overall test reliability/precision and other psychometric indices.  If you are accredited, you need to perform year-end reports and submit them to the governing body.  Learn more about item and test analysis.
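As a rough illustration of two of the statistics mentioned above, the sketch below computes the item p-value (proportion correct) and a corrected point-biserial (the correlation between each item and the total score on the remaining items) from a scored 0/1 response matrix. The function name and the tiny data set are made up for the example; real analyses use far larger samples and dedicated software.

```python
import numpy as np


def item_statistics(responses):
    """Classical item statistics for a scored response matrix.

    responses: 2D array-like, rows are examinees, columns are items,
    with 1 = correct and 0 = incorrect.
    Returns (p_values, point_biserials) as NumPy arrays.
    """
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)
    p_values = responses.mean(axis=0)
    point_biserials = []
    for j in range(responses.shape[1]):
        item = responses[:, j]
        rest = total - item  # exclude the item itself from the total score
        if item.std() == 0 or rest.std() == 0:
            point_biserials.append(float("nan"))  # undefined without variance
        else:
            point_biserials.append(float(np.corrcoef(item, rest)[0, 1]))
    return p_values, np.array(point_biserials)


# Illustrative data: 5 examinees by 4 items.
scored = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
p_values, rpbis = item_statistics(scored)
print("p-values:", p_values)
print("point-biserials:", np.round(rpbis, 2))
```

Items with very low or negative point-biserials are the usual candidates for flagging and SME review.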

 

Exam Development: It’s a Vicious Cycle

Now, consider the big picture: in many cases, an exam is not a one-and-done thing.  It is re-used, perhaps continually.  Often there are new versions released, perhaps based on updated blueprints or simply to swap out questions so that they don’t get overexposed.  That’s why this is better conceptualized as an exam development cycle, like the circle shown above.  Often some steps like Job Analysis are only done once every 5 years, while the rotation of item development, piloting, equating, and psychometric reporting might happen with each exam window (perhaps you do exams in December and May each year).

ASC has extensive expertise in managing this cycle for professional credentialing exams, as well as many other types of assessments.  Get in touch with us to talk to one of our psychometricians.


NCCA Accreditation is a stamp of approval on the quality of a certification program, governed by the National Commission for Certifying Agencies (NCCA)™.  NCCA is part of the Institute for Credentialing Excellence™, the leader in the world of professional credentialing.  NCCA accreditation tells your certificants – and all stakeholders in your profession, including customers/patients – that the credential meets best practices and international standards, so they can trust the quality of the personnel who have achieved it.  In many cases, you can’t have this trust with an unaccredited credential, though there are definitely many decent ones that just lack the size/funding to get accredited.

What is NCCA Accreditation?

Getting a certification accredited shows that it is good quality.  Anyone can write 50 questions on a topic in their basement and throw it up on some free survey/quiz software, then call it a certification.  In fact, many places do, and charge hundreds of dollars for this.  NCCA accreditation is a pushback on this practice, where respectable certifications banded together and agreed on a few main points regarding what is high quality.  Some examples:

  • You must have an oversight board, which includes a public member
  • You must have a legit organization with audited financial statements
  • You must have policies for application, retakes, continuing education, and more
  • There must be a firewall between certification staff and education staff
  • The test must be professionally designed and maintained
  • The test must be delivered securely, with proctoring.

What do we mean by “certification program”?

A certification is a validation of a person’s skills and knowledge for a particular profession.  We all think of it as a test that must be passed, but that’s actually a minority of the process.  There are also things like initial education, eligibility pathways to sit for the exam, retake policies, how to get recertified, continuing education, etc.  On top of that, there are organizational issues; you need to make sure that there is an appropriate governing board, that education and certification staff don’t overlap, that you have valid financial accounting, etc.  So that’s why the accreditation refers to a “program” and not just a “test.”

This means that an organization with multiple certification programs will need to apply for accreditation on each.  However, since many of the aspects are about the organization (e.g., financial statements), there is massive overlap and these can be re-used for each.

What do we mean by “stamp of approval”?

NCCA is a panel of experts, composed of a range of stakeholders in the industry: PhD psychometricians, internationally-known certification managers, attorneys with expertise in this specific topic, and so on.  You need to complete a formal application process, submitting tons of documentation about the aforementioned topics.  The panel will then review this and grant accreditation, stating that you have followed all the standards.

Again, note that this is not just a stamp of approval on the exam.  If you have an exam for certified Widgetmakers and you have a panel of expert Widgetmakers, the NCCA is not going to evaluate your actual questions.  They are evaluating much bigger questions.  Do you have a nonprofit board set up and have the correct legal governance?  Do you have audited financial statements like any other sound entity?  Do you have a published Candidate Handbook that lays out everything from how to initially apply for the certification to how to maintain it for your career?

Why should we get accredited?

In many cases, it is not necessary to achieve NCCA accreditation.  There are, however, several good reasons to pursue it:

  1. Quality Assurance: Accreditation ensures that a certification program meets established standards of quality and rigor. It validates that the program has undergone a comprehensive review by an independent accrediting body and has demonstrated its adherence to industry-recognized standards and best practices. Accreditation helps maintain and improve the quality of the certification program over time.
  2. Credibility and Recognition: Accreditation adds credibility and recognition to a certification program. It signifies that the program has been evaluated by experts in the field and has met rigorous criteria. Accreditation enhances the reputation of the certification, making it more valuable and trusted by employers, professionals, and other stakeholders.  This helps you sell more certifications; remember, credentialing is a business and certifications are the flagship product!
  3. Industry Acceptance: Accreditation can increase the acceptance and recognition of a certification within the industry or professional community. It provides assurance to employers, clients, and regulatory bodies that the certified individuals have acquired the necessary knowledge, skills, and competencies to perform their roles effectively.
  4. Competitive Advantage: Some fields, like personal trainers, have many organizations that offer training and certifications.  Achieving accreditation provides an advantage over your competitors in the marketplace.
  5. Standardization: Accreditation promotes standardization and consistency in the certification process. It ensures that the program’s content, assessment methods, passing criteria, and recertification requirements are fair, transparent, and consistent across all candidates. Standardization helps maintain a level playing field and ensures that certified individuals possess the same level of expertise.
  6. Career Advancement: Accreditation can enhance career opportunities for individuals holding the certified credential. It demonstrates their commitment to professional development and continuous learning. Accredited certifications are often preferred or required by employers, which can lead to better job prospects, promotions, and salary advancements.
  7. Regulatory Compliance: In some industries or professions, accreditation may be a requirement for regulatory compliance. Certain certifications may be mandated by licensing boards or regulatory authorities to ensure public safety, consumer protection, or adherence to specific standards and regulations.  Another example is that if you are selling certifications to members of the US Military, they need to be accredited.

These are all very good reasons, certainly.

What is involved in NCCA Accreditation?

The time and cost can vary widely depending on the current state of your organization. If you read the NCCA Standards (requirements to get accredited), they generally fall into 3 categories:

1. Psychometrics and test development: You need to follow best practices in making the exam.  You can’t just write 50 items in your basement and throw it up on a survey platform.  You need statistical reports, a job task analysis study, standard setting studies for a defensible pass score, and much more.
2. Certification operations and policies: You need to establish policies and procedures, then document in a Candidate Handbook.  You need to set up a business: accepting payments, bookkeeping, tracking status, retakes, annual recertification, perhaps a member conference or webinars, etc.
3. Business/legal/governance:  You need to be a legit organization with Bylaws and audited financial statements.

What is the cost of NCCA Accreditation?

A rule of thumb that I have heard in the industry is that achieving NCCA accreditation for a certification exam will take 1 year and $100,000. Most of that is for parts 2 and 3, which are typically done by you, and not your testing vendor.  So those costs are not what is paid to NCCA for the application process, either.  It is to your staff, to work on a quality Candidate Handbook, set up quarterly webinars for continuing education, create a registration portal – whatever makes sense for you, as long as it follows the Standards.  In some cases they might be things you already do, such as audited financial statements.

We specialize in the psychometrics, which costs far less than $100,000 and takes 3-6 months depending on availability of your subject matter experts. We can certainly work on parts 2 and 3 if you do not have bandwidth and expertise internally.  We can also deliver the exams for you.

If you aren’t sure of the next steps, we can perform an audit on your current state and potential timeline, which will provide a much clearer picture.  CONTACT US  to learn more.

Note: this is not an endorsement of NCCA by ASC, or vice versa, and is meant for educational purposes only.

 


The Beuk Compromise or Beuk Adjustment is a method for a “reality check” on the results of a modified-Angoff standard setting study.  It is well-known that experts will often overestimate examinee capabilities and choose a cutscore that is too high – in some cases, so high that even the experts themselves would fail the exam!  The Beuk Compromise was designed to balance this with the reality of actual examinee performance.  There are similar methods as well, such as the Hofstee Method.

What is a modified-Angoff study?

The Angoff approach is one of the most common ways of setting a defensible cutscore on an exam, especially in the world of professional credentialing (certification and licensure testing).  A panel of subject matter experts (SMEs) is convened to discuss the concept of a minimally competent candidate (MCC) and then review each item on the exam to estimate the percentage of minimally competent candidates that would get each item correct.  The average of these ratings is then the average score that the panel expects an MCC to achieve – a very compelling argument for what should be the passing score!

OK, then what is the issue?

But in practice, the experts are often in rarefied air and forget what it was like to be 22 years old and entering the profession wide-eyed, so they often overestimate both the description of the MCC and the difficulty ratings themselves.  You might find a situation where they set the cutscore at 82, but the average score on the exam is 63.  You might go further and ask the experts to take the exam themselves and find their average is only 75!

So, psychometricians have developed add-on procedures to address this issue.  Each SME can also be asked to provide information for an adjustment or compromise method.  A compromise method assumes that we should not rely on modified-Angoff ratings alone; the results of another method should be considered in conjunction.

The most common adjustment method is the Beuk adjustment or Beuk compromise, which recognizes that a pure Angoff study makes no use of actual data on the test, and instead attempts to reconcile the Angoff approach with an estimate of the score distribution on the test.  Of course, this approach can only be used if data exists; if there is no data available with which to estimate the score distribution, the Beuk adjustment is not possible.

What is the Beuk Compromise?

To find the Beuk compromise, two pieces of information are needed from each SME: an estimated pass rate and an estimated cutscore.  The estimated cutscore is obtained by calculating the average Angoff rating for each SME; for the pass rate, you need to ask them what they think it should be.  What you will often find is that they say the pass rate should be, say, 75%, but when you continue the example from before (average score of 63), the pass rate with their recommended cutscore turns out to be 10%!

How do I implement the Beuk Compromise?

Use the Angoff Analysis Tool.

SMEs are then simply asked in the meeting to estimate the pass rate of examinees who take the test, after having reviewed all the items.   Enter those values into the AAT in the assigned cells. If the SMEs consider the test difficult with regards to the cutscore that should be applied and the types of examinees, a low pass rate will be estimated.  These ratings are recorded on the “Adjustments” tab of the AAT.

The Beuk adjustment is best depicted graphically, and this figure is presented on the last tab of the AAT workbook.  It involves two functions:

  1. A curve that presents the pass rate as a function of all possible cutscores – this is calculated using the estimates of the score distribution.
  2. A straight line that is a function of the estimated pass rates. The line must pass through the point on the plane where the expected pass rate and panel-recommended cutscore intersect, and has a slope equal to the ratio of the standard deviation of the raters’ pass rate estimates to the standard deviation of their cutscore estimates (with cutscore on the x-axis and pass rate on the y-axis).

The x-coordinate of the intersection of these two functions is the Beuk adjustment.  An example of this graph is presented below.  Here, we have a 200-point exam.  A cutscore of 170 would produce a pass rate of about 20%.  A cutscore of 120 would produce a pass rate of about 90%.  The Beuk comes out to be about 145.

[Figure: Beuk compromise graph]
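For readers who want to see the logic outside of a spreadsheet, here is a minimal sketch of the two functions and their intersection. It is not the AAT's own calculation: the function name, the simulated scores (mean 63, echoing the earlier example), the panel values, the use of sample standard deviations, and the search over whole-number cutscores are all assumptions made for illustration.

```python
import numpy as np


def beuk_compromise(scores, sme_cutscores, sme_pass_rates, max_score):
    """Approximate the Beuk compromise cutscore on a grid of integer cutscores.

    scores: observed (or estimated) examinee scores.
    sme_cutscores: each SME's recommended cutscore.
    sme_pass_rates: each SME's expected pass rate, in percent.
    Returns (compromise cutscore, pass rate in percent at that cutscore).
    """
    scores = np.asarray(scores, dtype=float)
    cuts = np.asarray(sme_cutscores, dtype=float)
    rates = np.asarray(sme_pass_rates, dtype=float)

    sd_cut = cuts.std(ddof=1)
    if sd_cut == 0:
        raise ValueError("SME cutscores have no variance; slope is undefined.")
    slope = rates.std(ddof=1) / sd_cut

    candidates = np.arange(0, max_score + 1)
    # Function 1: pass rate (percent) at every possible cutscore.
    curve = np.array([100.0 * np.mean(scores >= c) for c in candidates])
    # Function 2: line through (mean cutscore, mean pass rate) with the slope above.
    line = rates.mean() + slope * (candidates - cuts.mean())

    best = int(np.argmin(np.abs(line - curve)))  # closest point to the intersection
    return int(candidates[best]), float(curve[best])


# Simulated data for illustration only.
rng = np.random.default_rng(0)
scores = np.clip(np.round(rng.normal(63, 10, size=2000)), 0, 100)
panel_cuts = [80, 82, 84]    # hypothetical SME cutscores
panel_rates = [70, 75, 80]   # hypothetical SME expected pass rates (%)

cut, rate = beuk_compromise(scores, panel_cuts, panel_rates, max_score=100)
print(f"Beuk compromise cutscore: {cut} (pass rate about {rate:.1f}%)")
```

The compromise lands below the panel's average cutscore and below its expected pass rate, which is exactly the "meet in the middle" behavior the method is designed to produce.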


Certification vs. Certificate: These terms might seem similar, but they mean quite different things.   Both are types of credential, which is an umbrella term.  A credential refers to any type of supporting document/attestation that you have done something or know something, and includes simple things like a Driver’s License all the way to a Medical Doctor degree with a Neurosurgery certification.  For more discussion, we recommend you check out the aptly named Institute for Credentialing Excellence.

 

Certification

Professional or personnel certification is a voluntary process by which individuals are evaluated against predetermined standards for knowledge, skills, or competencies. Participants who demonstrate that they meet the standards by successfully passing an assessment process are granted the certification.  Note that this 1) refers to an individual (not a program or organization) and 2) refers only to showing that they have competencies (not simply attending or completing a course).

Certifications are usually put out by independent organizations, not a government or other entity (that is licensure).  Usually it is a nonprofit association or board, which can be for a specific country (American Board of _______) or worldwide (International Association of ________).

A lot of work goes into developing good certifications, sometimes millions of dollars.  This includes test development practices like job task analysis, item writer training workshops, modified-Angoff cutscore studies, equating, and item response theory.

Certificate

A certificate program is less rigorous than a certification, and is often just a piece of paper that you might receive after sitting through a course.  For example, you might take an online course in a MOOC where you watch a few video lectures and perhaps answer a few quizzes with no proctoring, and afterwards you receive a certificate.

A step up from this is an assessment-based certificate, which requires that you pass an exam afterwards.  This exam is sometimes built with some of the stringent practices of a certification, like a modified-Angoff standard setting study.

An assessment-based certificate program is a non-degree granting program that:
(a) provides instruction and training to aid participants in acquiring specific knowledge, skills, and/or competencies associated with intended learning outcomes;
(b) evaluates participants’ achievement of the intended learning outcomes; and
(c) awards a certificate only to those participants who meet the performance, proficiency or passing standard for the assessment(s).

Learn more about assessment based certificates with NCCA.

Additional terms beyond Certification vs. Certificate

License: Like a certification, but it is required by law.  It is usually defined by competencies (a driver’s license means you have shown you know how to drive) but not always (a marriage license does not mean you know how to be a good spouse!).

Microcredential: Like a certificate, but even narrower.

Degree / Diploma: Means that you have completed some sort of education.  This can range all the way from a 4-hour online course to 4 years of prestigious medical school!

Digital Badge: This is often similar to a certificate, sometimes a microcredential, but is displayed as a digital icon, such as on your LinkedIn profile.

Accreditation: This says that your Certification or Certificate program meets best practices.  This is NOT for an individual; it refers to an organization or a program.  For example, is a university accredited?  Is a certification program accredited?  There are strict guidelines to do so, and NCCA Accreditation is one example.


Test equating refers to the issue of defensibly translating scores from one test form to another. That is, if you have an exam where half of students see one set of items while the other half see a different set, how do you know that a score of 70 means the same thing on both forms? What if one is a bit easier? If you are delivering assessments in conventional linear forms – or piloting a bank for CAT/LOFT – you are likely to utilize more than one test form, and, therefore, are faced with the issue of test equating.

When two test forms have been properly equated, educators can validly interpret performance on one test form as having the same substantive meaning compared to the equated score of the other test form (Ryan & Brockmann, 2009). While the concept is simple, the methodology can be complex, and there is an entire area of psychometric research devoted to this topic. This post will provide an overview of the topic.

Why do we need test linking and equating?

The need is obvious: to adjust for differences in difficulty to ensure that all examinees receive a fair score on a stable scale. Suppose you take Form A and get a score of 72/100 while your friend takes Form B and gets a score of 74/100. Is your friend smarter than you, or did his form happen to have easier questions?  What if the passing score on the exam was 73? Well, if the test designers built-in some overlap of items between the forms, we can answer this question empirically.

Suppose the two forms overlap by 50 items, called anchor items or equator items. They are delivered to a large, representative sample. Here are the results.

Form    | Mean score on 50 overlap items | Mean score on 100 total items
Form A  | 30                             | 72
Form B  | 32                             | 74

Because the mean score on the anchor items was higher, we then think that the Form B group was a little smarter, which led to a higher total score.

Now suppose these are the results:

Form    | Mean score on 50 overlap items | Mean score on 100 total items
Form A  | 32                             | 72
Form B  | 32                             | 74

Now, we have evidence that the groups are of equal ability. The higher total score on Form B must then be because the unique items on that form are a bit easier.

What is test equating?

According to Ryan and Brockmann (2009), “Equating is a technical procedure or process conducted to establish comparable scores, with equivalent meaning, on different versions of test forms of the same test; it allows them to be used interchangeably.” (p. 8). Thus, successful equating is an important factor in evaluating assessment validity, and, therefore, it often becomes an important topic of discussion within testing programs.

Practice has shown that scores, and tests producing scores, must satisfy very strong requirements to achieve this demanding goal of interchangeability. Equating would not be necessary if test forms were assembled as strictly parallel, meaning that they would have identical psychometric properties. In reality, it is almost impossible to construct multiple test forms that are strictly parallel, so equating is necessary to adjust for the differences that remain after test construction.

Dorans, Moses, and Eignor (2010) suggest the following five requirements towards equating of two test forms:

  • tests should measure the same construct (e.g. latent trait, skill, ability);
  • tests should have the same level of reliability;
  • the equating transformation should be symmetric: the function mapping Form X scores to Form Y should be the inverse of the function mapping Form Y scores to Form X;
  • test results should not depend on the test form an examinee actually takes;
  • the equating function used to link the scores of two tests should be the same regardless of the choice of (sub) population from which it is derived.

How do I calculate an equating?

Classical test theory (CTT) methods include linear equating and equipercentile equating as well as several others. Some newer approaches that work well with small samples are Circle-Arc (Livingston & Kim, 2009) and Nominal Weights (Babcock, Albano, & Raymond, 2012).  Specific methods for linear equating include Tucker, Levine, and Chained (von Davier & Kong, 2003). Linear equating approaches are conceptually simple and easy to interpret; given the examples above, the equating transformation might be estimated with a slope of 1.01 and an intercept of 1.97, which would directly confirm the hypothesis that one form was about 2 points easier than the other.
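To show what a slope and intercept mean here, below is a minimal sketch of the simplest linear equating, which matches the means and standard deviations of two forms. It assumes randomly equivalent groups took each form; Tucker, Levine, and Chained methods instead use anchor items to separate group differences from form differences and are not shown. The function name and scores are made up for the example.

```python
import statistics


def linear_equating(form_x_scores, form_y_scores):
    """Return (slope, intercept) that place Form X scores on the Form Y scale.

    Assumes the two groups are randomly equivalent, so that differences in
    mean and standard deviation reflect the forms rather than the examinees.
    """
    mean_x = statistics.mean(form_x_scores)
    mean_y = statistics.mean(form_y_scores)
    sd_x = statistics.pstdev(form_x_scores)
    sd_y = statistics.pstdev(form_y_scores)
    slope = sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative scores: Form Y runs 2 points higher across the board.
form_x = [60, 65, 70, 75, 80]
form_y = [62, 67, 72, 77, 82]

slope, intercept = linear_equating(form_x, form_y)
equated_72 = slope * 72 + intercept
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"A score of 72 on Form X corresponds to {equated_72:.1f} on Form Y")
```

With real data the slope is rarely exactly 1, which is why a simple "add 2 points" adjustment is usually not enough.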

Item response theory (IRT) approaches include equating through common items (equating by applying an equating constant, equating by concurrent or simultaneous calibration, and equating with common items through test characteristic curves), and common person calibration (Ryan & Brockmann, 2009). The common-item approach is quite often used, and specific methods for finding the constants (conversion parameters) include Stocking-Lord, Haebara, Mean/Mean, and Mean/Sigma. Because IRT assumes that two scales on the same construct differ by only a simple linear transformation, all we need to do is find the slope and intercept of that transformation. Those methods do so, and often produce nice looking figures like the one below from the program IRTEQ (Han, 2007). Note that the b parameters do not fall on the identity line, because there was indeed a difference between the groups, and the results clearly find that is the case.

[Figure: IRT equating plot of anchor-item b parameters from IRTEQ]
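Of the conversion methods named above, Mean/Sigma is the easiest to show in code. The sketch below estimates the slope (A) and intercept (B) from the anchor items' difficulty (b) parameters as calibrated on each form, then rescales item parameters to the base scale. The function names and parameter values are invented for illustration; Stocking-Lord and Haebara use the characteristic curves instead and require an optimizer, so they are not shown.

```python
import statistics


def mean_sigma(b_new, b_base):
    """Mean/Sigma linking constants from anchor-item difficulties.

    b_new: anchor b parameters as calibrated on the new form's scale.
    b_base: the same items' b parameters on the base scale.
    Returns (A, B) such that theta_base = A * theta_new + B.
    """
    A = statistics.pstdev(b_base) / statistics.pstdev(b_new)
    B = statistics.mean(b_base) - A * statistics.mean(b_new)
    return A, B


def to_base_scale(a, b, A, B):
    """Rescale one item's discrimination (a) and difficulty (b) to the base scale."""
    return a / A, A * b + B


# Illustrative anchor-item difficulties.
b_new = [-1.0, 0.0, 1.0]
b_base = [-0.5, 0.6, 1.7]

A, B = mean_sigma(b_new, b_base)
a_star, b_star = to_base_scale(a=1.2, b=0.5, A=A, B=B)
print(f"A = {A:.2f}, B = {B:.2f}")
print(f"Item (a=1.2, b=0.5) on the base scale: a = {a_star:.3f}, b = {b_star:.3f}")
```

Because the method relies only on means and standard deviations of the b parameters, a single badly drifting anchor item can distort it, which is one reason the characteristic curve methods are often preferred.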

Practitioners can equate forms with CTT or IRT. However, one of the reasons that IRT was invented was that equating with CTT was very weak. Hambleton and Jones (1993) explain that when CTT equating methods are applied, both ability parameter (i.e., observed score) and item parameters (i.e., difficulty and discrimination) are dependent on each other, limiting its utility in practical test development. IRT solves the CTT interdependency problem by combining ability and item parameters in one model. The IRT equating methods are more accurate and stable than the CTT methods (Hambleton & Jones, 1993; Han, Kolen, & Pohlmann, 1997; De Ayala, 2013; Kolen and Brennan, 2014) and provide a solid basis for modern large-scale computer-based tests, such as computerized adaptive tests (Educational Testing Service, 2010; OECD, 2017).

Of course, one of the reasons that CTT is still around in general is that it works much better with smaller samples, and this is also the case for CTT test equating (Babcock, Albano, & Raymond, 2012).

How do I implement test equating?

Test equating is a mathematically complex process, regardless of which method you use.  Therefore, it requires special software.  Here are some programs to consider.

  1. CIPE performs both linear and equipercentile equating with classical test theory. It is available from the University of Iowa’s CASMA site, which also includes several other software programs.
  2. IRTEQ is an easy-to-use program which performs all major methods of IRT Conversion equating.  It is available from the University of Massachusetts website, as well as several other good programs.
  3. There are many R packages for equating and related psychometric topics. This article claims that there are 45 packages for IRT analysis alone!
  4. If you want to do IRT equating, you need IRT calibration software. We highly recommend Xcalibre since it is easy to use and automatically creates reports in Word for you. If you want to do the calibration approach to IRT equating (both anchor-item and concurrent-calibration), rather than the conversion approach, this is handled directly by IRT software like Xcalibre. For the conversion approach, you need separate software like IRTEQ.

Equating is typically performed by highly trained psychometricians; in many cases, an organization will contract out to a testing company or consultant with the relevant experience. Contact us if you’d like to discuss this.

Does equating happen before or after delivery?

Both. These are called pre-equating and post-equating (Ryan & Brockmann, 2009).  Post-equating means the calculation is done after delivery, when you have a full data set.  For example, if a test is delivered twice per year on a single day, we can do it after that day.  Pre-equating is trickier, because you are trying to calculate the equating before a test form has ever been delivered to an examinee, but it is necessary in many situations, especially those with continuous delivery windows.

How do I learn more about test equating?

If you are eager to learn more about the topic of equating, the classic reference is the book by Kolen and Brennan (2004; 2014) that provides the most complete coverage of score equating and linking.  There are other resources more readily available on the internet, like this free handbook from CCSSO. If you would like to learn more about IRT, we suggest the books by De Ayala (2008) and Embretson and Reise (2000). A brief intro of IRT equating is available on our website.

Several new ideas of general use in equating, with a focus on kernel equating, were introduced in the book by von Davier, Holland, and Thayer (2004). Holland and Dorans (2006) presented a historical background for test score linking, based on work by Angoff (1971), Flanagan (1951), and Petersen, Kolen, and Hoover (1989). If you are looking for a straightforward description of the major issues and procedures encountered in practice, then you should turn to Livingston (2004).


Want to learn more? Talk to a Psychometric Consultant

References

Angoff, W. H. (1971). Scales, norms and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). American Council on Education.

Babcock, B., Albano, A., & Raymond, M. (2012). Nominal Weights Mean Equating: A Method for Very Small Samples. Educational and Psychological Measurement, 72(4), 1-21.

Dorans, N. J., Moses, T. P., & Eignor, D. R. (2010). Principles and practices of test score equating. ETS Research Report Series, 2010(2), i-41.

De Ayala, R. J. (2008). A commentary on historical perspectives on invariant measurement: Guttman, Rasch, and Mokken.

De Ayala, R. J. (2013). Factor analysis with categorical indicators: Item response theory. In Applied quantitative analysis in education and the social sciences (pp. 220-254). Routledge.

Educational Testing Service (2010). Linking TOEFL iBT Scores to IELTS Scores: A Research Report. Educational Testing Service.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.

Flanagan, J. C. (1951). Units, scores, and norms. In E. F. Lindquist (Ed.), Educational measurement (pp. 695-763). American Council on Education.

Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47.

Han, T., Kolen, M., & Pohlmann, J. (1997). A comparison among IRT true- and observed-score equatings and traditional equipercentile equating. Applied Measurement in Education, 10(2), 105-121.

Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187-220). Praeger.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, linking, and scaling: Methods and practices (2nd ed.). Springer-Verlag.

Kolen, M. J., & Brennan, R. L. (2014). Item response theory methods. In Test Equating, Scaling, and Linking (pp. 171-245). Springer.

Livingston, S. A. (2004). Equating test scores (without IRT). ETS.

Livingston, S. A., & Kim, S. (2009). The Circle‐Arc Method for Equating in Small Samples. Journal of Educational Measurement, 46(3), 330-343.

OECD (2017). PISA 2015 Technical Report. OECD Publishing.

Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221-262). Macmillan.

Ryan, J., & Brockmann, F. (2009). A Practitioner’s Introduction to Equating with Primers on Classical Test Theory and Item Response Theory. Council of Chief State School Officers.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. Springer.

von Davier, A. A., & Kong, N. (2003). A unified approach to linear equating for non-equivalent groups design. Research report 03-31 from Educational Testing Service. https://www.ets.org/Media/Research/pdf/RR-03-31-vonDavier.pdf


Certification vs Licensure exams are two terms that are used quite frequently to refer to examinations that someone has to pass to demonstrate skills in a certain profession or topic.  They are quite similar, and often confused.  This is exacerbated by even more similar terms in the field, such as accreditation, credentialing, certificate, and microcredentials.  This post will help you understand the differences.

What is Certification?

Certification is “a credential that you earn to show that you have specific skills or knowledge. They are usually tied to an occupation, technology, or industry.” (CareerOneStop)  The important aspect in this definition is the latter portion; the organization that runs the certification generally spans an industry or a profession, regardless of political boundaries.  It is almost always some sort of professional association or industry board, like the American Association of Widgetmakers (obviously not a real thing).  However, it is sometimes governed by a specific company or other organization regarding their products; perhaps the most well known is how Amazon Web Services will certify you in skills to handle their offerings.  Many other technology and software companies do the same.

What is Licensure?

Licensure is a “formal permission to do something: esp., authorization by law to do some specified thing (license to marry, practice medicine, hunt, etc.)” (Schmitt, 1995).  The key phrase here is by law.  The sponsoring organization is a governmental entity, and that is what defines licensure.  In fact, licensure is not even always about a profession; almost all of us have a Driver’s License for which we passed a simple exam.  Moreover, it does not always even involve an exam; many millions of people have a Fishing License, which is granted by the government (by States in the USA), for which you simply pay a small fee.  The license is still an attestation, but not of your skills, just that you have been authorized to do something.  Of course, in the context of assessment, it means that you have passed some sort of exam which is mandated by law, typically for professions that are dangerous enough, or that impact a wide enough range of people, that the government has stepped in to provide oversight: attorneys, physicians, medical professionals, etc.

Certification vs Licensure Exams

Usually, there is a test that you must pass, but the sponsor can differ with certification vs licensure.  The development and delivery of such tests are extremely similar, leading to the confusion.  They often will both utilize job analysis, Angoff studies, and the like.  The difference between the two is outside the test itself, and instead refers to the sponsoring organization: is it mandated/governed by a governmental entity, or is it unrelated to political/governmental boundaries?  You are awarded a credential after successful completion, but the difference is in the group that awards the credential, what it means, and where it is recognized.

However, there are many licensures that do not involve an exam, but you simply need to file some paperwork with the government.  An example of this is a marriage license.  You certainly don’t have to take a test to qualify!

Can they be the same exam?

To make things even more confusing… yes.  And it does not even have to be consistent.  In the US, some professions have a wide certification, which is also required in some States as licensure, but not in all States!  Some States might have their own exams, or not even require an exam.  This muddles the difference between certification vs licensure.  ICRC notes that they are sometimes complementary or parallel processes.

Differences between Certification and Licensure

| Aspect | Certification | Licensure |
|---|---|---|
| Mandatory? | No | Yes |
| Run by | Association, Board, Nonprofit, Private Company | Government |
| Does it use an exam? | Yes, especially if it is accredited | Sometimes, but often not (consider a marriage license) |
| Accreditation involved? | Yes, NCCA and ANSI provide accreditation that a certification is high quality | No; often there is no check on quality |
| Examples | Certified Chiropractic Sports Physician (CCSP®), Certified in Clean Needle Technique (CNT) | Marriage license; Driver’s License; Fishing License; License to practice law (Bar Exam) |

How do these terms relate to other, similar terms?

This outline summarizes some of the relevant terms regarding certification vs licensure and other credentials.  This is certainly more than can be covered in a single blog post!

  • Attestation of some level of quality for a person or organization = CREDENTIALING
    • Attestation of a person
      • By government = LICENSURE
      • By independent board or company
        • High stakes, wide profession = CERTIFICATION
        • Medium stakes = CERTIFICATE
        • Low stakes, quite specific skill = MICROCREDENTIAL
      • By an educational institution = DEGREE OR DIPLOMA
    • Attestation of an organization = ACCREDITATION

Authors: 

Laila Issayeva, MS

Nathan Thompson, PhD

The Bookmark Method of standard setting (Lewis, Mitzel, & Green, 1996) is a scientifically-based approach to setting cutscores on an examination. It allows stakeholders of an assessment to make decisions and classifications about examinees that are constructive rather than arbitrary (e.g., 70%), meet the goals of the test, and contribute to overall validity. A major advantage of the bookmark method over others is that it utilizes difficulty statistics on all items, making it very data-driven; but this can also be a disadvantage in situations where such data is not available. It also has the advantage of panelist confidence (Karantonis & Sireci, 2006).

The bookmark method operates by delivering a test to a representative sample (or population) of examinees, and then calculating the difficulty statistics for each item. We line up the items in order of difficulty, and experts review the items to place a bookmark where they think a cutscore should be. Nowadays, we use computer screens, but of course in the past this was often done by printing the items in paper booklets, and the experts would literally insert a bookmark.

What is standard setting?

Standard setting (Cizek & Bunch, 2006) is an integral part of the test development process, even though it has often been undervalued outside of practitioner circles in the past (Bejar, 2008). Standard setting is the methodology of defining achievement or proficiency levels and corresponding cutscores. A cutscore is a score used to classify test takers into categories.

Educational assessments and credentialing examinations are often employed to distribute test takers among ordered categories according to their performance across specific content and skills (AERA, APA, & NCME, 2014; Hambleton, 2013). For instance, in tests used for certification and licensing purposes, test takers are typically classified as “pass” (those who score at or above the cutscore) or “fail” (those who score below it). In education, students are often classified in terms of proficiency; the Nation’s Report Card assessment (NAEP) in the United States classifies students as Below Basic, Basic, Proficient, or Advanced.

However, assessment results could come into question unless the cutscores are appropriately defined. This is why arbitrary cutscores are considered indefensible and lacking validity. Instead, psychometricians help test sponsors to set cutscores using methodologies from the scientific literature, driven by evaluations of item and test difficulty as well as examinee performance.

When to use the bookmark method?

Two approaches are mainly used in international practice to establish assessment standards: the Angoff method (Cizek, 2006) and the Bookmark method (Buckendahl, Smith, Impara, & Plake, 2000). The Bookmark method, unlike the Angoff method, requires the test to be administered prior to defining cutscores based on test data. This provides additional weight to the validity of the process, and better informs the subject matter experts during the process. Of course, many exams require a cutscore to be set before it is published, which is impossible with the bookmark; the Angoff procedure is very useful then.

How do I implement the bookmark method?

The process of standard setting employing the Bookmark method consists of the following stages:

  1. Identify a team of subject matter experts (SMEs), typically around 6-12 people, led by a test developer/psychometrician/statistician
  2. Analyze test takers’ responses using item response theory (IRT)
  3. Create a list of items ordered by difficulty, from easiest to hardest
  4. Define the competency levels for test takers; for example, have the 6-12 experts discuss what should differentiate a “pass” candidate from a “fail” candidate
  5. Experts read the items in the ascending order (they do not need to see the IRT values), and place a bookmark where appropriate based on professional judgement across well-defined levels
  6. Calculate thresholds based on the bookmarks set, across all experts
  7. If needed, discuss results and perform a second round
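
As a rough illustration of step 6, the sketch below converts bookmark placements into a single cut theta. It rests on several assumptions that the steps above do not specify: a Rasch model, a response probability criterion of 0.67 (a common convention), the bookmarked item being the last one a minimally competent candidate should master, and the median as the way to combine panelists. The function name and the numbers are hypothetical.

```python
import math
from statistics import median

def bookmark_cut_theta(item_difficulties, bookmarks, rp=0.67):
    """Return a cut theta from panelists' bookmark placements.

    item_difficulties: Rasch b-parameters for the items (any order).
    bookmarks: for each panelist, the 1-based position in the ordered
        booklet of the last item a minimally competent candidate
        should master (an assumed convention).
    rp: response probability criterion.
    """
    ordered = sorted(item_difficulties)  # easiest to hardest
    # Under Rasch, P(correct) = rp when theta = b + ln(rp / (1 - rp))
    shift = math.log(rp / (1 - rp))
    panelist_thetas = [ordered[pos - 1] + shift for pos in bookmarks]
    return median(panelist_thetas)

# Hypothetical item difficulties and bookmarks from five SMEs
difficulties = [-1.2, 0.4, -0.3, 1.1, 0.0, 0.8, -0.7, 1.6]
bookmarks = [4, 5, 5, 6, 3]
print(round(bookmark_cut_theta(difficulties, bookmarks), 3))
```

Other conventions exist (for example, using the item after the bookmark, or a mean rather than a median), so the choice should be documented as part of the standard-setting report.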

Example of the Bookmark Method

If there are four competency levels such as the NAEP example, then SMEs need to set up three bookmarks in-between: first bookmark is set after the last item in a row that fits the minimally competent candidate for the first level, then second and third. There are thresholds/cutscores from 1 to 2, 2 to 3, and 3 to 4. SMEs perform this individually without discussion, by reading the items.

When all SMEs have provided their opinion, the standard setting coordinator combines all results into one spreadsheet and leads a discussion in which participants explain their bookmark placements. This might look like the sheet below. Note that SME4 had a relatively high standard in mind, while SME2 had a low standard in mind, placing virtually every student above an IRT score of 0.0 into the top category!

[Figure: example bookmark placements by SME, first round]

After the discussion, the SMEs are given one more opportunity to set the bookmarks again. Usually, after the exchange of opinions, the picture alters. SMEs gain consensus, and the variation in the graphic is reduced.  An example of this is below.

[Figure: example bookmark placements by SME, second round]
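
One simple way to quantify that convergence is to compare the spread of placements between rounds. The sketch below uses invented placements for a single cutscore; the SME labels, the values, and the choice of standard deviation as the spread measure are all assumptions for illustration.

```python
from statistics import median, pstdev

# Hypothetical bookmark placements (item positions) for one cutscore
round_1 = {"SME1": 14, "SME2": 6, "SME3": 12, "SME4": 21, "SME5": 15}
round_2 = {"SME1": 14, "SME2": 11, "SME3": 13, "SME4": 17, "SME5": 15}

def summarize(placements):
    """Median placement and spread (population SD) across panelists."""
    values = list(placements.values())
    return median(values), pstdev(values)

for label, data in (("Round 1", round_1), ("Round 2", round_2)):
    mid, spread = summarize(data)
    print(f"{label}: median={mid}, SD={spread:.2f}")
```

A smaller spread in the second round suggests the panel is converging, though it says nothing by itself about whether the agreed standard is appropriate.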

What to do with the results?

Based on the SMEs’ voting results, the coordinator or psychometrician calculates the final thresholds on the IRT scale, and provides them to the analytical team who would ultimately prepare reports for the assessment across competency levels. This might entail score reports to examinees, feedback reports to teachers, and aggregate reports to test sponsors, government officials, and more.
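
Once the thresholds exist, assigning examinees to levels is mechanically simple. Here is a minimal sketch using the four NAEP-style levels mentioned earlier; the cut values and examinee scores are made up, and the rule that a score exactly at a cut is placed in the higher level is an assumption.

```python
from bisect import bisect_right
from collections import Counter

LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]
CUTS = [-0.8, 0.3, 1.4]  # hypothetical IRT thresholds, ascending

def classify(theta):
    """Return the level for a theta; a score at or above a cut moves up a level."""
    return LEVELS[bisect_right(CUTS, theta)]

thetas = [-1.5, -0.8, 0.1, 0.3, 0.9, 1.4, 2.2]  # hypothetical examinee scores
counts = Counter(classify(t) for t in thetas)
for level in LEVELS:
    print(level, counts[level])
```

The aggregate counts are the kind of figure that would feed the reports to test sponsors, while `classify` on a single score is what an individual score report would show.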

You can see how the scientific approach will directly impact the interpretations of such reports. Rather than government officials just knowing how many students scored 80-90% correct vs 90-100% correct, the results are framed in terms of how many students are truly proficient in the topic. This makes decisions from test scores – both at the individual and aggregate levels – much more defensible and informative.  They become truly criterion-referenced.  This is especially true when the scores are equated across years to account for differences in examinee distributions and test difficulty, and the standard can be demonstrated to be stable.  For high-stakes examinations such as medical certification/licensure, admissions exams, and many more situations, this is absolutely critical.

Want to talk to an expert about implementing this for your exams?  Contact us.

References

[AERA, APA, & NCME] (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education). (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Bejar, I. I. (2008). Standard setting: What is it? Why is it important. R&D Connections, 7, 1-6. Retrieved from https://www.ets.org/Media/Research/pdf/RD_Connections7.pdf

Buckendahl, C. W., Smith, R. W., Impara, J. C., & Plake, B. S. (2000). A comparison of Angoff and Bookmark standard setting methods. Paper presented at the Annual Meeting of the Mid-Western Educational Research Association, Chicago, IL: October 25-28, 2000.

Cizek, G., & Bunch, M. (2006). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests.  Thousand Oaks, CA: Sage.

Cizek, G. J. (2007). Standard setting. In Steven M. Downing and Thomas M. Haladyna (Eds.) Handbook of test development. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers, pp. 225-258.

Hambleton, R. K. (2013). Setting performance standards on educational assessments and criteria for evaluating the process. In Setting performance standards, pp. 103-130. Routledge. Retrieved from https://www.nciea.org/publications/SetStandards_Hambleton99.pdf

Karantonis, A., & Sireci, S. (2006). The Bookmark Standard‐Setting Method: A Literature Review. Educational Measurement: Issues and Practice, 25(1), 4-12.

Lewis, D. M., Mitzel, H. C., & Green, D. R. (1996, June). Standard setting: A Book-mark approach. In D. R. Green (Chair), IRT-based standard setting procedures utilizing behavioral anchoring. Symposium conducted at the Council of Chief State School Officers National Conference on Large-Scale Assessment, Phoenix, AZ.


Certification exam administration and proctoring is a crucial component of the professional credentialing process.  Certification exams are expensive to develop well, so an organization wants to protect that investment by delivering the exam with appropriate security so that items are not stolen.  Moreover, there is an obvious incentive for candidates to cheat.  So, a certification body needs appropriate processes in place to deliver the certification exams.  Here are some tips.

1. Determine the best approach for certification exam administration and proctoring

Here are a few of the considerations to take into account.

Cohorts vs. Continuous

Do you have cohorts, where events make more sense, or do you need continuous?  For example, if the test is tied to university training programs that graduate candidates in December and May each year, that affects your need for delivery.  Alternatively, some certifications are not tied to such training; you might have to only show work experience.  In those cases, candidates are ready to take the test continuously throughout the year.

Paper vs computer

Does it make more sense to deliver the test on paper or on computer?  This used to be a cost issue, but now the cost of computerized delivery, especially with online proctoring at home, has dropped significantly while saving so much time for candidates.  Also, some exam types like clinical simulations can only be delivered on computers.

Test centers vs online proctored vs events

Some types of tests require events, such as a clinical assessment in an actual clinic with standardized patients.  Some tests can be taken anywhere.  Exam events can also coincide with other events; perhaps you have online delivery through the year but deliver a paper version of the test at your annual conference, for convenience.


Geographic dispersion

If your exam is for a small US state or a small country, it might be easy to require exams in a test center, because you can easily set up only one or two test centers to cover the geography.  Some certifications are international, and need to deliver on-demand throughout the year; those are a great fit for online.

Security level needs

If your test has extremely high stakes, there is extremely high incentive to cheat.  An entry-level certification on WordPress is different than a medical licensure exam.  The latter is a better fit for test centers, while the former might be fine with online proctoring on-demand.

2. Evaluate remote proctoring options

If you choose to explore this approach, here are three main types to evaluate.

A. AI only

AI-only proctoring means that no humans are involved.  The examinee is recorded on video, and AI algorithms flag potential issues, such as the examinee leaving their seat, then notify an administrator (usually a professor) of students with a high number of flags.  This approach is usually not relevant for certifications or other credentialing exams; it is more for low-stakes exams like a Psychology 101 Midterm at your local university.  The vendors for this approach are interested in large-scale projects, such as proctoring all midterms and finals at a university, perhaps hundreds of thousands of exams per year.

B. Record and Review

Record and review proctoring means that the examinee is recorded on video, but that video is watched by a real human and flagged if they think there is cheating, theft, or other issues.  This is much higher quality, and higher price, but has one major flaw that might be concerning to certification tests: if someone steals your test by taking pictures, you won’t find out until tomorrow.  But at least you know who it was and you are certain of what happened, with video proof.  This approach is perhaps useful for microcredentials or recertification exams.

C. Live Online Proctoring

Live online proctoring (LOP), or what I call “live human proctoring” (because some AI proctoring is also “live” in real time!) means that there is a professional human proctor on the other side of the video from the examinee.  They check the examinee in, confirm their identity, scan the room, provide instructions, and actually watch them take the test.  Some providers like MonitorEDU even have the examinee make a second video stream on their phone, which is placed on a bookshelf or similar spot to see the entire room through the test.  Certainly, this approach is a very good fit with certification exams and other credentialing.  You protect the test content as well as the validity of that individual’s score; that is not possible with the other two approaches.

3. Determine other technology, psychometric, and operational needs

Next, your organization should establish the other needs for your exams.  Do you require special item types?  Perhaps adaptive testing or linear-on-the-fly testing?  Psychometric consulting services?  Specific operational controls such as exam time/date windows or navigation limits?  Registration and payment portal?  Write all these up so that you can use the list to shop for a provider.

4. Find an integrated provider for certification exam administration


Most providers of remote proctoring are just that: remote proctoring.  They do not have a professional platform to manage item banks, schedule examinees, deliver tests, create custom score reports, and analyze psychometrics.  Some do not even integrate with such platforms, and only integrate with learning management systems like Moodle, seeing as their entire target market is only low-stakes university exams.  So if you are seeking a vendor for certification testing or other credentialing, the list of potential vendors is smaller.

Our flagship platform, FastTest, works with 6 different remote proctoring providers and can easily integrate with more.  It also supports paper exams, self-hosted events, and testing centers.

5. Establish the new process

Once you have selected a vendor, work with them to establish the new process for delivering your certification exams with remote proctoring.  Remember, this goes FAR beyond exam day!

  • Candidate Handbook
  • Registration and scheduling
  • Candidate training and practice tests
  • Exam delivery (including verification, environmental rules, materials allowed, break policy, etc.)
  • Test security plan:  What do you do if someone is caught taking pictures of the exam with their phone, or the other potential events?

Ready to start?

ASC is one of the world leaders in this endeavor.  Contact us to get a free account in our platform and experience the examinee process, or to receive a demonstration from one of our experts.


Online proctoring software refers to platforms that proctor educational or professional assessments (exams or tests) when the proctor is not in the same room as the examinee.  This means that it is done with a video stream or recording using a webcam and sometimes an additional device, which are monitored by a human and/or AI.  It is also referred to as remote proctoring or invigilation. Online proctoring offers a compelling alternative to in-person proctoring, somewhere in between unproctored at-home tests and tests delivered at an expensive testing center in an office building.  This makes it a perfect fit for medium-stakes exams, such as university placement, pre-employment screening, and many types of certification/licensure tests.

What are the types of online proctoring?

There are many types of online proctoring software on the market, spread across dozens of vendors, including new ones that sought to capitalize on the pandemic and were not involved with assessment beforehand.  With so many options, how can you more effectively select amongst the types of remote proctoring? There are four types of remote proctoring platforms, which can be adapted to a particular use case, sometimes varying between different tests in a single organization.  ASC supports all four types, and partners with 5 different vendors to help provide the best solution to our clients.  In descending order of security:

Live with professional proctors

What it entails for you:

  • You register a set of examinees in FastTest, and tell us when they are to take their exams and under what rules.
  • We provide the relevant information to the proctors.
  • You send all the necessary information to your examinees.
  • The most secure of the types of remote proctoring.

What it entails for the candidate:

  • Examinee goes to ascproctor.com, where they will initiate a chat with a proctor.
  • After confirmation of their identity and workspace, they are provided information on how to take the test.
  • The proctor then watches a video stream from their webcam as well as a phone on the side of the room, ensuring that the environment is secure. They do not see the screen, so your exam content is not exposed. They maintain exam invigilation continuously.
  • When the examinee is finished, they notify the proctor, and are excused.

Live, bring your own proctor (BYOP)

What it entails for you:

  • You upload examinees into FastTest, which will generate links.
  • You send relevant instructions and the links to examinees.
  • Your staff logs into the admin portal and awaits examinees.
  • Videos with AI flagging are available for later review if needed.

What it entails for the candidate:

  • Examinee will click on a link, which launches the proctoring software.
  • An automated system check is performed.
  • The proctoring is launched.  Proctors ask the examinee to provide identity verification, then launch the test.
  • Examinee is watched on the webcam and screencast.  AI algorithms help to flag irregular behavior.
  • Examinee concludes the test.

Record and Review (with option for AI)

What it entails for you:

  • You upload examinees into FastTest, which will generate links.
  • You send relevant instructions and the links to examinees.
  • After examinees take the test, your staff (or ours) logs in to review all the videos and report on any issues.  AI will automatically flag irregular behavior, making your reviews more time-efficient.

What it entails for the candidate:

  • Examinee will click on a link, which launches the proctoring software.
  • An automated system check is performed.
  • The proctoring is launched.  System asks the examinee to provide identity verification, then launches the test.
  • Examinee is recorded on the webcam and screencast.  AI algorithms help to flag irregular behavior.
  • Examinee concludes the test.

AI only

What it entails for you:

  • You upload examinees into FastTest, which will generate links.
  • You send relevant instructions and the links to examinees.
  • Videos are stored for 1 month if you need to check any.

What it entails for the candidate:

  • Examinee will click on a link, which launches the proctoring software.
  • An automated system check is performed.
  • The proctoring is launched.  System asks the examinee to provide identity verification, then launches the test.
  • Examinee is recorded on the webcam and screencast.  AI algorithms help to flag irregular behavior.
  • Examinee concludes the test.

 

Some case studies for different types of exams

We’ve worked with all types of remote proctoring software, across many types of assessment:

  • ASC delivers high-stakes certification exams for a number of certification boards, in multiple countries, using the live proctoring with professional proctors.  Some of these are available continuously on-demand, while others are on specific days where hundreds of candidates log in.
  • We partnered with a large university in South America, where their admissions exams were delivered using Bring Your Own Proctor, enabling them to drastically reduce costs by utilizing their own staff.
  • We partnered with a private company to provide AI-enhanced record-and-review proctoring for applicants, where ASC staff reviews the results and provides a report to the client.
  • We partner with an organization that delivers civil service exams for a country, and utilizes both unproctored and AI-only proctoring, differing across a range of exam titles.

 

Finding the Best Online Proctoring Software: Two Distinct Markets

First, I would describe the online proctoring industry as actually falling into two distinct markets, so the first step is to determine which of these fits your organization.

  1. Large scale, lower cost (when large scale), lower security systems designed to be used only as a plugin to major LMS platforms like Blackboard or Canvas. These systems are therefore designed for medium-stakes exams like an Intro to Psychology midterm at a university.
  2. Lower scale, higher cost, higher security systems designed to be used with standalone assessment platforms. These are generally for higher-stakes exams like certification or workforce, or perhaps special use at universities like Admissions and Placement exams.

How to tell the difference? The first type will advertise about easy integration with systems like Blackboard or Canvas as a key feature. They will also often focus on AI review of videos, rather than using real humans. Another key consideration is to look at the existing client base, which is often advertised.  

Other ways that online proctoring software can differ

Screen capture:

Some online proctoring providers have an option to record/stream the screen as well as the webcam. Some also provide the option to only do this (no webcam) for lower stakes exams.

Mobile phone as the second camera:

Some newer platforms provide the option to easily integrate the examinee’s mobile phone as a second camera (third stream, if you include screen capture), which effectively operates as a human proctor. Examinees will be instructed to use the video to show under the table, behind the monitor, etc., before starting the exam. They then might be instructed to stand up the phone 2 meters away with a clear view of the entire room while the test is being delivered.  This is in addition to the webcam.

API integrations:

Some systems require software developers to set up an API integration with your LMS or assessment platform. Others are more flexible, and you can just log in yourself, upload a list of examinees, and you are all set.

On-Demand vs. Scheduled:

Some platforms involve the examinee scheduling a time slot. Others are purely on-demand, and the examinee can show up whenever they are ready. MonitorEDU is a prime example of this: examinees show up at any time, present their ID to a live human, and are then started on the test immediately – no downloads/installs, no system checks, no API integrations, nothing.  

More security: A better test delivery software

A good test delivery platform will also come with its own functionality to enhance test security: randomization, automated item generation, computerized adaptive testing, linear-on-the-fly testing, professional item banking, item response theory scoring, scaled scoring, psychometric analytics, equating, lockdown delivery, and more. In the context of online proctoring, perhaps the most salient is lockdown delivery. In this case, the test completely takes over the examinee’s computer, and they can’t use it for anything else until the test is done.

LMS systems rarely include any of this functionality, because it is not needed for a midterm exam of Intro to Psychology. However, most assessments in the world that have real stakes – university admissions, certifications, workforce hiring, etc. – depend heavily on such functionality. It’s not just out of habit or tradition, either. Such methods are considered essential by international standards including AERA/APA/NCME, ITC, and NCCA.

ASC’s preferred online proctoring partners

ASC’s online assessment platforms are integrated with some of the leading remote proctoring software providers.

Type Vendors
Live MonitorEDU
AI Alemira, Sumadi, ProctorFree
Record and Review Alemira, ProctorFree
Bring Your Own Proctor Alemira

 

List of Online Proctoring Software Providers

Looking to evaluate potential vendors?  Here is a great place to start.

# Name Website Country Proctor Service
1 Aiproctor https://www.aiproctor.com/ USA AI
2 Centre Based Test (CBT) https://www.conductexam.com/center-based-online-test-software India Live, Record and Review
3 Class in Pocket classinpocket.com (Website now defunct) India AI
4 Datamatics https://www.datamatics.com/industries/education-technology/proctoring India AI, Live, Record and Review
5 DigiProctor https://www.digiproctor.com India AI
6 Disamina https://disamina.in/ India AI
7 Examity https://www.examity.com/ USA Live
8 ExamMonitor https://examsoft.com/ USA Record and Review
9 ExamOnline https://examonline.in/remote-proctoring-solution-for-employee-hiring/ India AI, Live
10 Eduswitch https://eduswitch.com/  India AI
11 Examus https://examus.com Russia AI, Bring Your Own Proctor, Live
12 EasyProctor https://www.excelsoftcorp.com/products/assessment-and-proctoring-solutions/ India AI, Live, Record and Review
13 HonorLock https://honorlock.com/ USA AI, Record and Review
14 Internet Testing Systems https://www.testsys.com/ USA Bring Your Own Proctor
15 Invigulus https://www.invigulus.com/ USA AI, Live, Record and Review
16 Iris Invigilation https://www.irisinvigilation.com/ Australia AI
17 Mettl https://mettl.com/en/online-remote-proctoring/ India AI, Live, Record and Review
18 MonitorEDU https://monitoredu.com/proctoring USA Live
19 OnVUE https://home.pearsonvue.com/Test-takers/OnVUE-online-proctoring.aspx USA Live
20 Oxagile https://www.oxagile.com/competence/edtech-solutions/proctoring/ USA AI, Live, Record and Review
21 Parakh https://parakh.online/blog/remote-proctoring-ultimate-solution-for-secure-online-exam India AI, Live, Record and Review
22 ProctorFree https://www.proctorfree.com/ USA AI, Live
23 Proctor360 https://proctor360.com/ USA AI, Bring Your Own Proctor, Live, Record and Review
24 ProctorEDU https://proctoredu.com/ Russia AI, Live, Record and Review
25 ProctorExam https://proctorexam.com/ Netherlands Bring Your Own Proctor, Live, Record and Review
26 Proctorio https://proctorio.com/products/online-proctoring USA AI, Live
27 Proctortrack https://www.proctortrack.com/ USA AI, Live
28 ProctorU https://www.proctoru.com/ USA AI, Live, Record and Review
29 Proview https://www.proview.io/en USA AI, Live
30 PSI Bridge https://www.psionline.com/en-gb/platforms/psi-bridge/ USA Live, Record and Review
31 Respondus Monitor https://web.respondus.com/he/monitor/ USA AI, Live, Record and Review
32 Rosalyn https://www.rosalyn.ai/ USA AI, Live
33 SmarterProctoring https://smarterservices.com/smarterproctoring/ USA AI, Bring Your Own Proctor, Live
34 Sumadi https://sumadi.net/ Honduras AI, Live, Record and Review
35 Suyati https://suyati.com/ India AI, Live, Record and Review
36 TCS iON Remote Assessments https://www.tcsion.com/hub/remote-assessment-marking-internship/ India AI, Live
37 Think Exam https://www.thinkexam.com/remoteproctoring India AI, Live
38 uxpertise XP https://uxpertise.ca/en/uxpertise-xp/ Canada AI, Live, Record and Review
39 Proctor AI https://www.visive.ai/solutions/proctor-ai India AI, Live, Record and Review
40 Wise Proctor https://www.wiseattend.com/wise-proctor USA AI, Record and Review
41 Xobin https://xobin.com/online-remote-proctoring India AI
42 Youtestme https://www.youtestme.com/online-proctoring/ Canada AI, Live

 

How do I select a vendor?

First, determine the level of security necessary, and the trade-off with costs.  Live proctoring with professionals can cost $30 to $100 or more, while AI proctoring can be as little as a few dollars.  Next, evaluate some vendors to see which group they fall into; note that some vendors can do all of them!  Then, ask for some demos so you understand the business processes involved and the UX on the examinee side, both of which could substantially impact the soft costs for your organization.  Finally, start negotiating with the vendor you want!
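To get a feel for that cost trade-off, here is a rough back-of-the-envelope sketch in Python. It assumes the prices quoted above are per examinee session, which is my reading rather than a vendor quote. The cohort of 500 examinees and the $3 AI price are made-up illustrative numbers, and the function name is hypothetical.

```python
def proctoring_cost(n_examinees, price_per_session):
    """Total direct proctoring cost, assuming one proctored session per examinee."""
    return n_examinees * price_per_session


n_examinees = 500  # hypothetical cohort size, for illustration only

# Live proctoring: the $30 to $100 range quoted above (assumed to be per session)
live_low = proctoring_cost(n_examinees, 30)
live_high = proctoring_cost(n_examinees, 100)

# AI proctoring: "a few dollars"; $3 is an assumed stand-in, not a real price
ai_estimate = proctoring_cost(n_examinees, 3)

print(f"Live proctoring: ${live_low:,} to ${live_high:,}")
print(f"AI proctoring:   about ${ai_estimate:,}")
```

This only covers direct fees. The soft costs mentioned above (staff time, examinee support, business processes) are not in the calculation and can easily shift the comparison.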

Want some more information?

Get in touch with us; we’d love to show you a demo or introduce you to partners!

Email solutions@assess.com.

standard setting

If you have worked in the field of assessment and psychometrics, you have undoubtedly encountered the word “standard.” While a relatively simple word, it has the potential to be confusing because it is used in three (and more!) completely different but very important ways. Here’s a brief discussion.

Standard = Cutscore

As noted by the well-known professor Gregory Cizek, “standard setting refers to the process of establishing one or more cut scores on a test.” The various methods of setting a cutscore, like Angoff or Bookmark, are referred to as standard setting studies. In this context, the standard is the bar that separates a Pass from a Fail. We use methods like the ones mentioned to determine this bar in as scientific and defensible a fashion as possible, and give it more concrete meaning than an arbitrarily selected round number like 70%. Selecting a round number like that could well get you sued, since there is no criterion-referenced interpretation behind it.
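To make the contrast with an arbitrary 70% concrete, here is a minimal sketch of the arithmetic behind a modified-Angoff cutscore. Each subject-matter expert estimates, for each item, the probability that a minimally competent candidate would answer it correctly. The ratings are averaged per item, and the item averages are summed to give the raw cutscore. The ratings and the function name below are invented for illustration. A real study would also involve rater training, discussion rounds, and checks on rater agreement, none of which is shown here.

```python
from statistics import mean


def angoff_cutscore(ratings):
    """Raw cutscore from a raters-by-items matrix of Angoff ratings.

    ratings[r][i] is rater r's estimated probability (0 to 1) that a
    minimally competent candidate answers item i correctly.
    """
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every rater must rate every item")
    # Average across raters for each item, then sum across items
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]
    return sum(item_means)


# Hypothetical ratings: 3 raters (rows) by 5 items (columns)
ratings = [
    [0.60, 0.75, 0.90, 0.50, 0.80],
    [0.70, 0.70, 0.85, 0.55, 0.75],
    [0.65, 0.80, 0.95, 0.45, 0.85],
]

raw_cut = angoff_cutscore(ratings)
percent_cut = 100 * raw_cut / len(ratings[0])
print(f"Raw cutscore: {raw_cut:.2f} of {len(ratings[0])} points ({percent_cut:.0f}%)")
```

With these made-up ratings the cutscore works out to 3.6 of 5 points, or 72%. The number is tied to expert judgments about the minimally competent candidate on these specific items, which is what gives it a criterion-referenced interpretation.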

Standard = Blueprint

If you work in the field of education, you often hear the term “educational standards.” These refer to the curriculum blueprints for an educational system, which also translate into assessment blueprints, because you want to assess what is on the curriculum. Among the important ones in the USA, perhaps the most common nowadays is the Common Core State Standards, which attempted to standardize the standards across states. These standards exist to standardize the educational system, by teaching what a group of experts has agreed should be taught in 6th grade Math classes, for example. Note that they don’t state how or when a topic should be taught, merely that 6th Grade Math should cover Number Lines, Measurement Scales, Variables, whatever – sometime in the year.

Standard = Guideline

If you work in the field of professional certification, you hear the term just as often but in a different context: accreditation standards. The two most common accreditors are the National Commission for Certifying Agencies (NCCA) and the ANSI National Accreditation Board (ANAB). These two organizations give a stamp of approval to credentialing bodies, stating that a Certification or Certificate program is legit. Why? Because there is no law to stop me from buying a textbook on any topic, writing 50 test questions in my basement, and selling it as a Certification. It is completely a situation of caveat emptor, and these organizations are helping the buyers by giving a stamp of approval that the certification was developed with accepted practices like a Job Analysis, Standard Setting Study, etc.

In addition, there are the professional standards for our field. These are guidelines on assessment in general rather than just credentialing. Two great examples are the AERA/APA/NCME Standards for Educational and Psychological Testing and the International Test Commission’s Guidelines (yes, they switch to that term) on various topics.

Also: Standardized = Equivalent Conditions

The word is also used quite frequently in the context of standardized testing, though it is rarely chopped to the root word “standard.” In this case, it refers to the fact that the test is given under equivalent conditions to provide greater fairness and validity. A standardized test does NOT mean multiple choice, bubble sheets, or any of the other pop connotations that are carried with it. It just means that we are standardizing the assessment and the administration process. Think of it as a scientific experiment; the basic premise of the scientific method is holding all variables constant except the variable in question, which in this case is the student’s ability. So we ensure that all students receive a psychometrically equivalent exam, with equivalent (as much as possible) writing utensils, scrap paper, computer, time limit, and all other practical surroundings. The problem comes with the lack of equivalence in access to study materials, prep coaching, education, and many bigger questions… but those are societal issues and not psychometric ones.

So despite all the bashing that the term gets, a standardized test is MUCH better than the alternatives of no assessment at all, or an assessment that is not a level playing field and has low reliability. Consider the case of hiring employees: if assessments were not used to provide objective information on applicant skills and we could only use interviews (which are famously subjective and inaccurate), all hiring would be virtually random and the number of incompetent people in jobs would increase a hundredfold. And don’t we already have enough people in jobs where they don’t belong?