
Item banking refers to the purposeful creation of a database of items intended to measure a predetermined set of constructs. The term item refers to what many call questions, though their content need not be restricted to questions; items can also include problems to solve or situations to evaluate. The art of item banking lies in the organizational structure by which items are categorized. As a critical component of any high-quality assessment, item banking is the foundation for the development of valid, reliable content and defensible test forms. Automated item banking systems, such as the Item Explorer module of FastTest, significantly reduce the administrative time needed to maintain content and produce tests. While there are no absolute standards for creating and managing item banks, best-practice guidelines are emerging. Some of the essential aspects include ensuring that:

  • Items are reusable objects; when selecting an item banking platform it is important to ensure that items can be used more than once; ideally item performance should be tracked not only within a test form, but across test forms as well.
  • Item history and usage are tracked; the usage of a given item, whether it is actively on a test form or dormant waiting to be assigned, should be easily accessible for test developers to assess, as the over-exposure of items can reduce the validity of a test form. As you deliver your items, their content is exposed to examinees. Once an item has been exposed to many examinees, it can be flagged for retirement or revision to reduce cheating or teaching to the test.
  • Items can be sorted; as test developers select items for a test form, it is imperative that they can sort items based on their content area or other categorization method, so as to select a sample of items that is representative of the full breadth of constructs we intend to measure.
  • Item versions are tracked; as items appear on test forms, their content may be revised for clarity. Any such changes should be tracked and versions of the same item should have some link between them so that we can easily review the performance of earlier versions in conjunction with current versions.
  • Review process workflow is tracked; as items are revised and versioned, it is imperative that the changes in content and the users who made these changes are tracked. In post-test assessment, there may be a need for further clarification, and the ability to pinpoint who took part in reviewing an item can expedite that process.
  • Metadata is recorded; any relevant information about an item should be recorded and stored with the item. The most common applications for metadata that we see are author, source, description, content area, depth of knowledge, IRT parameters, and CTT statistics, but there are likely many data points specific to your organization that are worth storing.

Keeping these guidelines in mind, here are some concrete steps that you can take to establish your item bank in accordance with psychometric best practices.

Make your Job Easier: Establish a Naming Convention

Names are important. As you are importing or creating your item banks, it is important to identify each item with a unique but recognizable name. Naming conventions should reflect your bank’s structure and should include numbers with leading zeros to support true numerical sorting. For example, let’s consider the item banks of a high school science teacher. Take a look at the example below:

What are some ways that this utilizes best practices?

  • Each subject has its own item bank. We can easily view all Biology items by selecting the Biology item bank.
  • A separate folder, 8Ah, clearly delineates items for honors students.
  • The item names follow along with the item bank and category names, allowing us to search for all items for 8th grade unit A-1 with the query “8A-1”, or similarly for honors items with “8Ah-1”.
  • Leading zeros are used so that as the item bank expands, items will sort properly; an item ending in 001 will appear before 010.
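
To see why the leading zeros matter, here is a minimal Python sketch; the bank and unit codes are just the hypothetical ones from the example above:

```python
def item_id(bank: str, unit: str, number: int, width: int = 3) -> str:
    """Build an item name like '8A-1_001' with leading zeros."""
    return f"{bank}-{unit}_{number:0{width}d}"

ids = [item_id("8A", "1", n) for n in (1, 10, 2)]
# Plain string sorting now matches numeric order because of the padding.
assert sorted(ids) == ["8A-1_001", "8A-1_002", "8A-1_010"]
```

Without the padding, a plain alphabetical sort would place "10" before "2"; the zero-fill is what keeps the bank browsable as it grows.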

The execution of these best practices should be adapted to the needs of your organization, but it is important to establish a convention of some kind.  For example, you can use a period rather than an underscore – as long as you are consistent.

Prepare for the Future: Store Extensive Metadata

Metadata is valuable. As you create items, take the time to record simple metadata like author and source. Having this information can prove very useful once the original item writer has moved to another department, or left the organization. Later in your test development life cycle, as you deliver items, you have the ability to aggregate and record item statistics. Values like discrimination and difficulty are fundamental to creating better tests, driving reliability and validity.

Statistics are used in the assembly of test forms, for example.  Classical statistics can be used to estimate mean, standard deviation, reliability, standard error, and pass rate, while item response theory parameters can be used to calculate test information and standard error functions. Data from both psychometric theories can be used to pre-equate multiple forms.

In the event that your organization decides to publish an adaptive test, utilizing CAT delivery, item parameters for each item will be essential because they are used for intelligently selecting items and scoring examinees. Additionally, in the event that the integrity of your test or scoring mechanism is ever challenged, documentation of validity is essential to defensibility and the storage of metadata is one such vital piece of documentation.

Increase Content Quality: Track Workflow

Utilize a review workflow to increase quality. Using a standardized review process will ensure that all items are vetted in a similar manner. Have a step in the process for grammar, spelling, and syntax review, as well as content review by a subject matter expert. As an item progresses through the workflow, its development should be tracked, as workflow results also serve as validity documentation.

Accept comments and suggestions from a variety of sources. It is not uncommon for each item reviewer to view an item through their distinctive lens. Having a diverse group of item reviewers stands to benefit your test takers, as they are likely to be diverse as well!

Keep Your Items Organized: Categorize Them

Identify items by content area. Creating a content hierarchy can also help you to organize your item bank and ensure that your test covers the relevant topics. Most often, we see content areas defined first by an analysis of the construct(s) being tested. For a high school science test, this may mean evaluating the content taught in class. For a high-stakes certification exam, this almost always includes a job-task analysis. Both methods produce what is called a test blueprint, indicating how important various content areas are to the demonstration of knowledge in the areas being assessed. Once content areas are defined, we can assign items to levels or categories based on their content. As you are developing your test, and invariably referring back to your test blueprint, you can use this categorization to determine which items from each content area to select.

There is no doubt that item banking will remain a key aspect of developing and maintaining quality assessments. Utilizing best practices, and caring for your items throughout the test development life cycle, will pay great dividends as it increases the reliability, validity, and defensibility of your assessment.

Worried your current item banking platform isn’t up to par? We would love to discuss how Assessment Systems can help. FastTest was designed by psychometricians and includes an intuitive, easy-to-use item banking module. Check out our free version here, or contact us to learn more.

Want to improve the quality of your assessments with item banking?

Sign up for our newsletter and hear about our free tools, product updates, and blog posts first! Don’t worry, we would never sell your email address, and we promise not to spam you with too many emails.

Newsletter Sign Up
First Name*
Last Name*
Email*
Company*
Market Sector*
Lead Source

Background of the Certification/Recertification Report

The National Commission for Certifying Agencies (NCCA) is a group that accredits certification programs.  Basically, many of the professional associations that had certifications (e.g., Certified Professional Widgetmaker) banded together to form a super-association, which then established an accreditation arm to ensure that certifications were of high quality – as there are many professional certifications in the world of near-zero quality.  So this is a good thing.  Becoming accredited is a rigorous process, and the story doesn’t stop there.  Once you are accredited, you need to submit annual reports to NCCA as well as occasionally re-apply.  This is mentioned in NCCA Standard 24, and is also described in the NCCA webpage regarding annual renewals of accreditation. It requires information on certification and recertification volumes.

The certification program must demonstrate continued compliance to maintain accreditation.

Essential Elements:

  1. The certification program must annually complete and submit information requested of the certification agency and its programs for the previous reporting year.

There are a number of reports that are required, one of which is a summary of Certification/Recertification numbers.  These are currently submitted through an online system, but an example of a previous paper form can be found here, which clearly states some of the tables that you need to fill out.

In the past, you had to pay consultants or staff to manually compile this information.  Our certification management system does it automatically for you – for free.

Overview of the Cert/Recert Report

Our easily generated Certification/Recertification report provides a simple, clean overview of your certification program. The report includes important information required for maintaining accreditation, including the number of applicants, new certifications, and recertifications, as well as percentages signifying the success rate of new candidates and recertification candidates. The report is automatically produced within FastTest’s reporting module, saving your organization thousands of dollars in consultant fees.

Here is a sample report. Assume that this organization has one base level certification, Generalist, with 3 additional specialist areas where certification can also be earned.

Annual Certification/Recertification Report

Date run: 11/10/2016
Timeframe: 1/1/2012 – 12/31/2015

| Program | Applicants for first-time certification | First-time certified | Due for recertification | Recertified | Percent due that recertified |
|---|---|---|---|---|---|
| Generalist | 4,562 | 2,899 | 653 | 287 | 44% |
| Specialist A | 253 | 122 | 72 | 29 | 40% |
| Specialist B | 114 | 67 | 24 | 7 | 36% |
| Specialist C | 44 | 13 | 2 | 0 | 0% |

Let’s examine the data for the Generalist program. Follow the table across the first data line:

Generalist is the name of the program with the data being analyzed. The following data all refers to candidates who were involved in the certification process at the Generalist level within our organization.

4,562 is the number of candidates who registered to be certified for the program within the timeframe indicated above the table. These candidates have never been certified in this program before.

2,899 is the number of candidates who successfully completed the certification process by receiving a passing score and meeting any other minimum requirements set forth by the certification program.

653 is the number of previously certified candidates whose certification expired within the timeframe indicated above the table.

287 is the number of previously certified candidates who successfully completed the recertification process.

44% is the percentage of candidates eligible for recertification within the indicated timeframe who successfully completed the recertification process. Another way to express this value is 287/653: the number who successfully completed recertification divided by the number due for recertification within the given timeframe.

The same format follows for each of the Specialist programs.
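
The arithmetic behind the last column is simple enough to sketch in a few lines of Python (the figures are the Generalist row from the sample report):

```python
def recert_rate(recertified: int, due: int) -> str:
    """Percent of candidates due for recertification who recertified."""
    return f"{round(100 * recertified / due)}%" if due else "0%"

# Generalist row: 287 of the 653 candidates due actually recertified.
assert recert_rate(287, 653) == "44%"
```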

If you found this post interesting, you might also be interested in checking out this post on the NCCA Annual Statistics Report.  That report is another one of the requirements, but focuses on statistical and psychometric characteristics of your exams.


If you are delivering high stakes tests in linear forms – or piloting a bank for CAT/LOFT – you are faced with the issue of how to equate the forms together.  That is, how can we defensibly translate a score on Form A to a score on Form B?  While the concept is simple, the methodology can be complex, and there is an entire area of psychometric research devoted to this topic.  There are a number of ways to approach this issue, and IRT equating is the strongest.

Why do we need equating?

The need is obvious: to adjust for differences in difficulty so that all examinees receive a fair score on a stable scale.  Suppose you take Form A and get a score of 72/100 while your friend takes Form B and gets a score of 74/100.  Is your friend smarter than you, or did his form happen to have easier questions?  Well, if the test designers built in some overlap, we can answer this question empirically.

Suppose the two forms overlap by 50 items, called anchor items or equator items.  They are each delivered to a large, representative sample.  Here are the results.

| Exam Form | Mean score on 50 overlap items | Mean score on 100 total items |
|---|---|---|
| A | 30 | 72 |
| B | 32 | 74 |

Because the mean score on the anchor items was higher for the Form B group, we can infer that this group was a little more able, which led to the higher total score.

Now suppose these are the results:

| Exam Form | Mean score on 50 overlap items | Mean score on 100 total items |
|---|---|---|
| A | 32 | 72 |
| B | 32 | 74 |

Now, we have evidence that the groups are of equal ability.  The higher total score on Form B must then be because the unique items on that form are a bit easier.
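
The logic of the two scenarios above can be expressed as a small Python sketch; the labels are illustrative reasoning aids, not a formal equating procedure:

```python
def compare_forms(form_a: dict, form_b: dict) -> str:
    """Given mean anchor and total scores for two forms, give a rough
    judgment about why the total scores differ (illustrative only)."""
    anchor_diff = form_b["anchor_mean"] - form_a["anchor_mean"]
    total_diff = form_b["total_mean"] - form_a["total_mean"]
    if anchor_diff != 0:   # groups performed differently on the common items
        return "group ability differs"
    if total_diff != 0:    # equal anchors but unequal totals: unique items differ
        return "unique items differ in difficulty"
    return "forms and groups comparable"

# First scenario: anchor means 30 vs 32.
assert compare_forms({"anchor_mean": 30, "total_mean": 72},
                     {"anchor_mean": 32, "total_mean": 74}) == "group ability differs"
```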

How do I calculate an equating?

You can equate forms with classical test theory (CTT) or item response theory (IRT).  However, one of the reasons that IRT was invented was that equating with CTT was very weak.  CTT methods include Tucker, Levine, and equipercentile.  Right now, though, let’s focus on IRT.

IRT equating

There are three general approaches to IRT equating.  All of them can be accomplished with our industry-leading software Xcalibre, though conversion equating requires an additional software called IRTEQ.

  1. Conversion
  2. Concurrent Calibration
  3. Fixed Anchor Calibration

Conversion

With this approach, you need to calibrate each form of your test using IRT, completely separately.  We then evaluate the relationship between IRT parameters on each form, and use that to estimate the relationship to convert examinee scores.  Theoretically what you do is line up the IRT parameters of the common items and perform a linear regression, so you can then apply that linear conversion to scores.  But DO NOT just do a regular linear regression.  There are specific methods you must use, including mean/mean, mean/sigma, Stocking & Lord, and Haebara.  Fortunately, you don’t have to figure out all the calculations yourself, as there is free software available to do it for you: IRTEQ.
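
As a rough illustration of the simplest of these methods, here is a mean/sigma linking sketch in Python.  It is not a substitute for IRTEQ, and the parameter values are made up:

```python
import statistics

def mean_sigma(b_ref, b_new):
    """Mean/sigma linking constants from anchor-item b-parameters.
    b_ref: anchor b's on the reference (Form A) scale;
    b_new: the same items' b's as calibrated on Form B's scale."""
    A = statistics.stdev(b_ref) / statistics.stdev(b_new)
    B = statistics.mean(b_ref) - A * statistics.mean(b_new)
    return A, B

def to_ref_scale(value, A, B):
    """Convert a Form B theta or b-parameter onto the Form A scale."""
    return A * value + B

# Hypothetical anchor b-parameters: Form B's scale is shifted up by 0.5.
b_ref = [-1.0, 0.0, 1.0]
b_new = [-0.5, 0.5, 1.5]
A, B = mean_sigma(b_ref, b_new)
```

Here A comes out to 1.0 and B to -0.5, so a Form B theta of 0.5 maps back to 0.0 on the Form A scale.  The Stocking & Lord and Haebara methods produce the same kind of A/B constants but choose them by matching test characteristic curves rather than parameter moments.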

Concurrent Calibration

The second approach is to combine the datasets into what is known as a sparse matrix.  You then run this single data set through the IRT calibration, and it will place all items and examinees onto a common scale.  The sparse matrix concept is typically depicted as in the figure below, which represents the non-equivalent anchor test (NEAT) design.  The IRT calibration software will automatically equate the two forms and you can use the resultant scores.
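
A sparse NEAT matrix can be sketched as follows; the item labels are hypothetical, and None marks cells for items an examinee never saw:

```python
# Form A examinees answer A-unique + anchor items; Form B examinees answer
# anchor + B-unique items.  Unadministered cells stay None (the "sparse" part).
unique_a = ["A1", "A2"]
anchor = ["C1", "C2"]
unique_b = ["B1", "B2"]
columns = unique_a + anchor + unique_b

def row(form: str, responses: dict) -> list:
    """Build one examinee's row of the sparse matrix."""
    administered = set((unique_a if form == "A" else unique_b) + anchor)
    return [responses.get(item) if item in administered else None
            for item in columns]

r = row("A", {"A1": 1, "A2": 0, "C1": 1, "C2": 1})
assert r == [1, 0, 1, 1, None, None]
```

The calibration software sees every item in one run, and the shared anchor columns are what tie the two groups onto a common scale.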

Fixed Anchor Calibration

The third approach is a combination of the two above; it utilizes the separate calibration concept but still uses the IRT calibration process to perform the equating rather than separate software.  With this approach, you would first calibrate your data for Form A.  You then find all the IRT item parameters for the common items, and input them into your IRT calibration software when you calibrate Form B.  You can tell the software to “fix” the item parameters so that those particular ones (from the common items) do not change.  Then all the item parameters for the unique items are forced onto the scale of the common items, which of course is the underlying scale from Form A.  This then also forces the scores from the Form B students onto the Form A scale.

How do these approaches compare to each other?

Concurrent calibration is arguably the easiest, but has the drawback that it merges the scales of each form into a new scale somewhere in the middle.  If you need to report the scores on either form on the original scale, then you must use the Conversion or Fixed Anchor approaches.  This situation commonly happens if you are equating across time periods.  Suppose you delivered Form A last year and are now trying to equate Form B.  You can’t just create a new scale and thereby nullify all the scores you reported last year.  You must map Form B onto Form A so that this year’s scores are reported on last year’s scale and everyone’s scores will be consistent.

Where do I go from here?

If you want to do IRT equating, you need IRT calibration software.  All three approaches use it.  I highly recommend Xcalibre since it is easy to use and automatically creates reports in Word for you.  If you want to learn more about the topic of equating, the classic reference is the book by Kolen and Brennan (2004; 2014).  There are other resources more readily available on the internet, like this free handbook from CCSSO.  If you would like to learn more about IRT, I recommend the books by de Ayala (2008) and Embretson & Reise (2000).  A very brief intro is available on our website.


Psychometrics – the science of educational and psychological testing – is an essential aspect of assessment, but to most people it remains a black box. However, a basic understanding is important for anyone working in the testing industry, especially those developing or selling tests. Why is psychometrics so important, and how will it benefit your organization? There are two primary ways to implement better psychometrics in your organization: process improvement (typically implemented by psychometricians) and specially designed software. This article will outline some of the ways that your tests can be improved, but first, let me outline some of the things that psychometrics can do for you.

Define what should be covered by the test

Before writing any items, you need to define very specifically what will be on the test.  Psychometricians typically run a job analysis study to form a quantitative, scientific basis for the test blueprints.  A job analysis is necessary for a certification program to get accredited.

Improve development of test content

There is a corpus of scientific literature on how to develop test items that accurately measure whatever you are trying to measure.  This is not just limited to multiple-choice items, although that approach remains popular.  Psychometricians leverage their knowledge of best practices to guide the item authoring and review process in a way that the result is highly defensible test content.  Professional item banking software provides the most efficient way to develop high-quality content and publish multiple test forms, as well as store important historical information like item statistics.

Set cutscores

Test scores are often used to classify candidates into groups, such as pass/fail (Certification/Licensure), hire/non-hire (Pre-Employment), and below-basic/basic/proficient/advanced (Education).  Psychometricians lead studies to determine the cutscores, using methodologies such as Angoff, Beuk, Contrasting-Groups, and Borderline.
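
As one illustration, the classic Angoff method reduces to simple arithmetic: each subject matter expert estimates, for each item, the probability that a minimally competent candidate would answer correctly, and the cutscore is the sum of the per-item mean ratings.  A sketch with made-up ratings:

```python
import statistics

def angoff_cutscore(ratings_by_item):
    """ratings_by_item: for each item, one rating per judge - the estimated
    probability that a minimally competent candidate answers correctly.
    The raw-score cutscore is the sum of the per-item mean ratings."""
    return sum(statistics.mean(r) for r in ratings_by_item)

# Three items, two judges (hypothetical ratings):
ratings = [(0.6, 0.8), (0.5, 0.5), (0.9, 0.7)]
cutscore = angoff_cutscore(ratings)
```

With these ratings the cutscore is 2.0 out of 3 raw-score points.  Real studies add rounds of discussion, impact data, and checks on rater agreement; this only shows the core computation.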

Statistically analyze results

Psychometricians are essential for this step, as the statistical analyses can be quite complex.  Smaller testing organizations typically utilize classical test theory, which is based on simple mathematics like proportions and correlations.  Large, high-profile organizations typically use item response theory, which is based on a type of nonlinear regression analysis.  Psychometricians evaluate overall reliability of the test, difficulty and discrimination of each item, distractor analysis, possible bias, multidimensionality, linking multiple test forms/years, and much more.  Software such as Iteman and Xcalibre is also available for organizations with enough expertise to run statistical analyses internally.
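
For a flavor of the nonlinear regression underlying IRT, here is the standard three-parameter logistic item response function in Python (the parameter values are illustrative):

```python
import math

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic IRF: probability of a correct response for an
    examinee of ability theta, given discrimination a, difficulty b, and
    pseudo-guessing parameter c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta == b, the probability is halfway between the guessing floor c and 1.
assert abs(p_3pl(0.0, a=1.2, b=0.0, c=0.2) - 0.6) < 1e-9
```

Calibration software such as Xcalibre estimates a, b, and c for each item from response data; once estimated, these curves drive test information functions, CAT item selection, and equating.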

Establish and document validity

Validity is the evidence provided to support score interpretations.  For example, we might interpret scores on a test to reflect knowledge of English, and we need to provide documentation and research supporting this.  There are several ways to provide this evidence.  A straightforward approach is to establish content-related evidence, which includes the test definition, blueprints, and item authoring/review.  In some situations, criterion-related evidence is important, which directly correlates test scores to another variable of interest.  Delivering tests in a secure manner is also essential for validity.

Certification

In certification testing, psychometricians develop the test via a documented chain of evidence following a sequence of research outlined by accreditation bodies, typically: job analysis, test blueprints, item writing and review, cutscore study, and statistical analysis.  Web-based item banking software like FastTest is typically useful because the exam committee often consists of experts located across the country or even throughout the world; they can then easily log in from anywhere and collaborate.

Pre-Employment

In pre-employment testing, validity evidence relies primarily on establishing appropriate content (a test on PHP programming for a PHP programming job) and the correlation of test scores with an important criterion like job performance ratings (shows that the test predicts good job performance).  Adaptive tests are becoming much more common in pre-employment testing because they provide several benefits, the most important of which is cutting test time by 50% – a big deal for large corporations that test a million applicants each year.  Adaptive testing is based on item response theory, and requires a specialized psychometrician as well as specially designed software like FastTest.

K-12 Education

Most assessments in education fall into one of two categories: lower-stakes formative assessment in classrooms, and higher-stakes summative assessments like year-end exams.  Psychometrics is essential for establishing the reliability and validity of higher-stakes exams, and for equating the scores across different years.  It is also important for formative assessments, which are moving toward adaptive formats because of the 50% reduction in test time, meaning that students spend less time testing and more time learning.

Universities

Universities typically do not give much thought to psychometrics, even though a significant amount of testing occurs in higher education, especially with the move to online learning and MOOCs.  Given that many of the exams are high stakes (consider a certificate exam after completing a year-long graduate program!), psychometricians should be involved in establishing legally defensible cutscores and in statistical analysis to ensure reliable tests, and professionally designed assessment systems should be used for developing and delivering tests, especially with enhanced security.


The Contrasting Groups Method is a common approach to setting a cutscore.  It is very easy to do, but has the important drawback that some sort of “gold standard” is needed to assign examinees into categories such as Pass and Fail.  This “gold standard” should be unrelated to the test itself.

For example, suppose you wanted to set a cutscore on a practice test that is helping examinees determine if they are ready for a high-stakes certification test.  You might have past data for examinees who took both your practice exam and the actual certification test.  Their results from the certification test can be used to assign them to groups of Pass or Fail, and then you can evaluate the practice test score distributions for each group.  These distributions are typically smoothed, and their intersection represents an appropriate cutscore for the practice test.  In the example below, the two curves intersect near a score of 85, suggesting that this is an appropriate cutscore for the practice test that will closely predict the results of the official certification test.
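
A bare-bones version of that intersection logic can be sketched in Python using normal curves fit to each group; the scores here are made up, and a real study would use more data and possibly a different smoothing method:

```python
from statistics import NormalDist

def contrasting_groups_cutscore(fail_scores, pass_scores,
                                lo=0.0, hi=100.0, step=0.1):
    """Fit a normal curve to each group's practice-test scores and scan for
    the score between the group means where the smoothed densities cross."""
    fail = NormalDist.from_samples(fail_scores)
    passed = NormalDist.from_samples(pass_scores)
    best, best_gap = lo, float("inf")
    x = max(lo, fail.mean)
    while x <= min(hi, passed.mean):
        gap = abs(fail.pdf(x) - passed.pdf(x))  # densities cross where gap ~ 0
        if gap < best_gap:
            best, best_gap = x, gap
        x += step
    return round(best, 1)

# Hypothetical practice-test scores, grouped by the real certification result:
cut = contrasting_groups_cutscore([78, 80, 82], [88, 90, 92])
```

With these symmetric samples the curves cross midway between the group means, at a score of 85.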

I developed a simple tool in MS Excel that allows our psychometricians to easily produce both the smoothed and unsmoothed versions of this method, given nothing more than a list of practice test scores and “real” test classification for examinees.  If you think this method might be appropriate for your exams, please contact us at sales@54.89.150.95 and one of our consultants will get in touch with you.
