Finding good employees in an overcrowded market is a daunting task. In fact, according to research by CareerBuilder, 74% of employers admit to having hired the wrong person. Bad hires are not only expensive but can also adversely affect cultural dynamics in the workforce. This is where pre-employment assessment software shows its value.

Pre-employment testing tools help companies create effective assessments, thus saving valuable resources, improving candidate experience and quality of hire, and reducing hiring bias. But finding pre-employment testing software that can help you reap these benefits can be difficult, especially given the explosion of software solutions in the market. If you are unsure which tools will help you develop and deliver your own pre-employment assessments, this guide is for you.

First things first: you need to understand the basics of pre-employment tests. 

What is a pre-employment test?

A pre-employment test is an examination given to job seekers before hiring. The main reason for administering these tests is to measure important candidate attributes such as cognitive abilities, job experience, and personality traits. The popularity of pre-employment tests has skyrocketed in recent years because they help companies manage large banks of candidate applications. This increases quality of hire by providing access to a diversified network of professionals while eliminating roadblocks such as ‘resume spammers’.

Types of pre-employment tests

There are different types of pre-employment assessments. Each of them achieves a different goal in the hiring process. The major types of pre-employment assessments include:

Personality tests: Despite rapidly finding their way into HR, these pre-employment tests are widely misunderstood. Personality tests answer questions on the social spectrum. One of their main goals is to predict a candidate’s likelihood of success based on behavioral traits.

Aptitude tests: Unlike personality tests or emotional intelligence tests, which lie on the social spectrum, aptitude tests measure problem-solving, critical thinking, and agility. These tests are popular because they predict job performance better than any other type: they tap into abilities that cannot be gleaned from resumes or job interviews.

Skills tests: These tests can be considered a measure of job experience, ranging from high-end skills down to low-end skills such as typing or Microsoft Excel. Skills tests can measure either specific skills, such as communication, or generalized skills, such as numeracy.

Emotional Intelligence tests: These assessments are a newer concept but are becoming important in the HR industry. With strong emotional intelligence (EI) being associated with benefits such as improved workplace productivity and good leadership, many companies are investing heavily in developing these tests. Although they can be administered to any candidate, it is recommended to reserve them for people seeking leadership positions or those expected to work in social contexts.

Risk tests: As the name suggests, these tests help companies reduce risk. Risk assessments offer assurance to employers that their workers will commit to established work ethics and not engage in activities that may cause harm to themselves or the organization. There are different types of risk tests. Safety tests, which are popular in contexts such as construction, measure the likelihood of a candidate engaging in activities that could cause them harm. Other common types of risk tests include integrity tests.

Pre-employment testing software: The Benefits 

Now that you have a good understanding of what pre-employment tests are, let’s discuss the benefits of integrating pre-employment assessment software into your hiring process. Here are some of the benefits:

Saves Valuable resources

Unlike lengthy and costly traditional hiring processes, pre-employment assessment software helps companies increase their ROI by eliminating HR snags such as face-to-face interactions or geographical restrictions. Pre-employment testing tools can also reduce the time it takes to make good hires while lowering the risk of facing the financial consequences of a bad hire.

Supports Data-Driven Hiring Decisions

Data runs the modern world, and hiring is no different. You are better off letting complex algorithms crunch the numbers and help you decide which talent is a fit, as opposed to hiring based on a hunch. 

Pre-employment assessment software helps you analyze assessments and generate reports/visualizations to help you choose the right candidates from a large talent pool. 

Improving candidate experience 

Candidate experience is an important aspect of a company’s growth, especially considering that 69% of candidates admit they would not apply for a job at a company after a negative experience. A good candidate experience means you get access to the best talent in the world.

Elimination of Human Bias

Traditional hiring processes are based on instinct. They are not effective because it is easy for candidates to provide false information on their resumes and cover letters.

But the use of pre-employment assessment software has helped eliminate this hurdle. These tools level the playing field so that only the best candidates are considered for a position.

Need some help deciding how you can reap the mentioned benefits of pre-employment assessment software? Click the button below to get help.


What To Consider When Choosing pre-employment assessment software

Now that you have a clear idea of what pre-employment tests are and the benefits of integrating pre-employment assessment software into your hiring process, let’s see how you can find the right tools. 

Here are the most important things to consider when choosing the right pre-employment testing software for your organization.

Ease-of-use

Candidates should be your top priority when you are sourcing pre-employment assessment software, because ease of use directly correlates with a good candidate experience. Good software should have simple navigation and be easy to comprehend.

Here is a checklist to help you decide if a pre-employment assessment software is easy to use;

  • Are the results easy to interpret?
  • What is the UI/UX like?
  • How does it automate tasks such as applicant management?
  • Does it have good documentation and an active community?

Tests Delivery (Remote proctoring)

Remote proctoring (Courtesy of FastTest)

Good online assessment software should feature robust online proctoring functionality, because most remote jobs accept applications from all over the world. It is therefore advisable to choose pre-employment testing software with secure remote proctoring capabilities. Here are some things to look for in remote proctoring:

  • Does the platform support security processes such as IP-based authentication, lockdown browser, and AI-flagging?
  • What types of online proctoring does the software offer? Live real-time, AI review, or record and review?
  • Does it let you bring your own proctor?
  • Does it offer test analytics?

Test & data security, and compliance

Defensibility is what defines test security. There are several layers of security associated with pre-employment testing. When evaluating this aspect, consider what the pre-employment testing software does to achieve the highest level of security, because data breaches are wildly expensive.

The first layer of security is the test itself. The software should support security technologies and frameworks such as lockdown browser, AI flagging, and IP-based authentication. If you are interested in knowing how to secure your assessments, check this post out.

The other layer of security is on the candidate’s side. As an employer, you will have access to the candidate’s private information. How can you ensure that your candidate’s data is secure? That is reason enough to evaluate the software’s data protection and compliance guidelines.

Good pre-employment testing software should be compliant with regulations such as GDPR. The software should also be flexible enough to adapt to compliance guidelines from different parts of the world.

Questions you need to ask;

  • What mechanisms does the software employ to prevent cheating?
  • Is their remote proctoring function reliable and secure?
  • Are they compliant with security standards and regulations such as ISO 27001 or GDPR?
  • How does the software protect user data?

User experience

A good user experience is a must-have when you are sourcing any enterprise software. New-age pre-employment testing software should create user experience maps with both candidates and employers in mind. Some ways you can tell if a software offers a seamless user experience include:

  • User-friendly interface
  • Simple and easy to interact with
  • Easy to create and manage item banks
  • Clean dashboard with advanced analytics and visualizations

Customizing your user-experience maps to fit candidates’ expectations attracts high-quality talent. 

Scalability and automation

With a single job post attracting approximately 250 candidates, scalability is not something you should overlook. Good pre-employment testing software should be able to handle any workload without sacrificing assessment quality.

It is also important to check the automation capabilities of the software. The hiring process has many repetitive tasks that can be automated with technologies such as machine learning, artificial intelligence (AI), and robotic process automation (RPA).

Here are some questions you should consider in relation to scalability and automation; 

  • Does the software offer Automated Item Generation (AIG)?
  • How many candidates can it handle? 
  • Can it support candidates from different locations worldwide?


Reporting and analytics

Example of reporting and visualization functionality to help you make data-driven hiring decisions (Courtesy of FastTest and Assess.ai)

A good pre-employment assessment software will not leave you hanging after helping you develop and deliver the tests. It will enable you to derive important insight from the assessments.

The analytics reports can then be used to make data-driven decisions on which candidate is suitable and how to improve candidate experience. Here are some queries to make on reporting and analytics;

  • Does the software have a good dashboard?
  • What format are reports generated in?
  • What are some key insights that prospects can gather from the analytics process?
  • How good are the visualizations?

Customer and Technical Support

Customer and technical support is not something you should overlook. Good pre-employment assessment software should come with an omnichannel support system that is available 24/7, mainly because some situations need a fast response. Here are some of the questions you should ask when vetting customer and technical support:

  • What channels of support does the software offer/How prompt is their support?
  • How good is their FAQ/resources page?
  • Do they offer multi-language support mediums?
  • Do they have dedicated managers to help you get the best out of your tests?

Conclusion

Finding the right pre-employment testing software is a lengthy process, yet profitable in the long run. We hope the article sheds some light on the important aspects to look for when looking for such tools. Also, don’t forget to take a pragmatic approach when implementing such tools into your hiring process.

Are you stuck on how you can use pre-employment testing tools to improve your hiring process? Feel free to contact us and we will guide you through the entire process, from concept development to implementation. Whether you need off-the-shelf tests or a comprehensive platform to build your own exams, we can provide the guidance you need. We also offer free versions of our industry-leading software, FastTest and Assess.ai. Visit our Contact Us page to get started!

 

Digital assessment security is a highly discussed topic in the Ed-Tech community. The use of digital assessments in education has led to increased concerns about academic integrity. While digital assessments have more advantages than the outdated pen-and-paper method, they attract a lot of criticism in relation to academic integrity. However, cheating can be eliminated using the right tools and best practices.

Before we get into the nitty-gritty of creating secure digital assessments, let’s define cheating.

What is cheating? 

Cheating refers to actual, intended, or attempted deception and/or dishonesty in relation to academic or corporate assessments. 

Some smart ways in which examinees cheat in digital assessments include;

  • Plagiarism – taking someone else’s work and passing it off as one’s own.
  • Access to external resources – using external resources such as the internet, books, etc.
  • Using freelancing services – paying someone else, especially from academic freelancing sites such as Studypool, to do the work.
  • Impersonation – letting someone else take an assessment in one’s place.
  • Abetting cheating – acts such as releasing exam papers early or sharing answers with examinees who have not yet taken the exam.
  • Using remote access tools to get external help.

When addressing cheating in digital assessments, most professionals ask the wrong questions: How do we stop cheating? Which digital assessment security techniques should we use?

But we believe that is not an effective way to approach the problem. The right questions to ask are: Why do students cheat? How can we make students and employees confident enough to take assessments without cheating? Which digital assessment tools address the weakest links in the digital assessment life cycle? For instance, by integrating techniques such as adaptive testing or multistage testing, students and employees feel more confident taking exams.

In this blog post, we are going to share digital assessment techniques and best practices that can help you create effective assessments. 

Part 1: Digital exam security techniques

1. Online Proctoring

Online/remote proctoring is the process of monitoring digital assessments. It gives supervisors and invigilators the freedom to monitor examinations from any location.

To increase assessment security during proctoring, you can utilize these methods:

  • Image Capturing

This technique increases the effectiveness of online testing by taking photos of the examinee’s activities at a given interval of time. The images are later analyzed to flag any form of cheating detected in the image banks. It is a basic mechanism, yet effective in situations where students do not have access to a stable internet connection.

  • Audio Capturing

This is similar to image/video capturing but keeps track of audio logs. The audio logs are later analyzed to identify any voice abnormalities in the surrounding environment. This eliminates risks such as getting help from family members or friends in the same room.

  • Live Video Recording and/or Streaming

This is the most important element of security in digital assessments. Live video streaming keeps the candidate under constant surveillance and invigilators can suspend the examination if they notice suspicious behavior. Some online testing tools are even integrated with AI systems to help in the process of identifying suspicious activity. Video recording on the other hand works like audio capturing. 

Remote Proctoring: Record and Review

 

The video is recorded, stored in the cloud, and later analyzed to spot any suspicious behavior. 

To give you the freedom to be creative and flexible in the assessment process, Assess.com even lets you bring your own proctor.

2. Lockdown Browser

 

High-speed broadband and access to hundreds of tools to do anything. Sounds like the perfect opportunity to cheat? Yes, to some extent. However, with the lockdown browser feature, students are limited to a single tab: the one hosting the exam.

If the student tries to access external software or open another tab, a notification is sent to the invigilator. The feature starts by scanning the computer for video/image capture. Once the test begins, the examinee is restricted from surfing the internet for answers.

3. Digital assessment security through AI Flagging

AI Flagging in Remote Proctoring

 

As seen in the article 10 EdTech Trends and Predictions To Watch Out For In 2021, AI will keep increasing in popularity in Ed-Tech. Facial recognition (a form of AI) can be used to increase security not only on campus but also in digital assessments.

By using systems integrated with AI, supervisors can identify suspicious behavior and take action in real time. Among the events AI systems can flag is the appearance of two faces on one screen. This feature works hand in hand with video streaming/recording, increasing examination integrity while cutting back on costs.

4. IP-Based Authentication

 

Identity authentication is an integral step in achieving the highest form of academic integrity in digital assessments. IP-based authentication uses a user’s IP address to confirm the identity of students taking the exam. This comes in handy when examinees try to cheat by giving remote access to other users. In a recent paper exploring e-authentication systems, other components of authentication include biometric systems, knowledge-based mechanisms, and document analysis. Using IP-based authentication in digital assessment can decrease some risks of cheating.
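As a minimal sketch of the idea (all names here are hypothetical, not taken from any particular platform), IP-based authentication can be as simple as recording the IP address that starts an exam session and rejecting any request that arrives from a different one:

```python
# Hypothetical sketch of IP-based session authentication.
sessions = {}  # session_id -> IP address that started the exam


def start_session(session_id: str, ip: str) -> None:
    """Bind the exam session to the IP address it was started from."""
    sessions[session_id] = ip


def verify_request(session_id: str, ip: str) -> bool:
    """Allow a request only if it comes from the session's original IP."""
    return sessions.get(session_id) == ip


start_session("exam-42", "203.0.113.7")
print(verify_request("exam-42", "203.0.113.7"))   # True: same machine
print(verify_request("exam-42", "198.51.100.9"))  # False: different IP, flag it
```

A real system would combine this with the other e-authentication components mentioned above, since an IP address alone can change for legitimate reasons (for example, on mobile networks).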

5. Digital assessment security by data encryption

Cybercrime has become a threat to many industries, and education is one of the industries that face the most challenges when it comes to information security. In fact, according to a recent study, 83% of schools in the UK experienced at least one cyberattack, despite having protection mechanisms such as antivirus software and firewalls. Other research found that education records can sell for as much as $265 on the black market.

Cybersecurity by Industry

 

This kind of insecurity finds its way into digital assessment because most tests are developed and delivered online. Some students can get early access to digital assessments, or change their grades, if they find a loophole in the assessment systems. To prevent this, it is important to implement cybersecurity practices in digital assessment systems. Data encryption is a key part of ensuring that exams and personal data are safe.

6. Audit Logging

The audit logging function tracks user activity in all aspects of the testing process using unique identifiers such as IP addresses. It keeps track of all activities, including clicks, stand-by time, software accessed during the assessment, and much more. These logs are then analyzed for signs of cheating.
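A toy illustration of the idea (the event names below are hypothetical, not a specific product's API): each user action is appended to a log keyed by session and IP, and the log is scanned afterwards for suspicious events:

```python
from datetime import datetime, timezone

audit_log = []  # one dict per tracked event


def log_event(session_id, ip, event, detail=""):
    """Append a timestamped record of a user action to the audit log."""
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "session": session_id,
        "ip": ip,
        "event": event,
        "detail": detail,
    })


def flag_suspicious(log):
    """Return sessions that lost focus on the exam window during the test."""
    return {e["session"] for e in log if e["event"] == "app_focus_lost"}


log_event("exam-42", "203.0.113.7", "test_started")
log_event("exam-42", "203.0.113.7", "app_focus_lost", "switched window")
print(flag_suspicious(audit_log))  # {'exam-42'}
```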

Part 2: Digital assessment security best practices

Digital Assessment Life Cycle

 

Most of the technologies discussed above focus on the last few stages of the evaluation process. The strategies below cover techniques that fall outside the main stages of the process, yet are critical.

1. Creating Awareness On Cheating

Many institutions do not create enough awareness of the dangers of cheating in relation to academic integrity. You can use media such as videos to give students guidelines on how to take exams authentically and on what is considered cheating. This prepares students to take their exams without thinking about cheating.

2. Don’t Use Publisher Item Banks

Course textbooks and guides may come with complimentary test banks, and you should refrain from using them in assessments. Most course materials are available on the internet and may encourage cheating. Instead, write your own questions based on your learning objectives and methodologies.

3. Disable Backtracking

Disabling backtracking increases efficiency by ensuring that students answer what they know, which helps locate their weakest links. Allowing backtracking gives students time to go back and hunt for the correct answers, which reduces the assessment’s effectiveness.

4. Diversity In Question Types

Fear of failure is one thing that makes examinees cheat. By creating diverse question types, you improve student engagement with the assessment, and therefore their confidence. Do not give your examinees multiple-choice questions only; add in some yes/no questions, essays, and so on.

5. Make Them Sign Academic Integrity Contracts

Contracts have a psychological effect on people, and the examinees are more likely to be authentic if they sign some form of contract. 

If you are interested in other best practices to increase online assessment security, here is an important resource that can help.

Conclusion

Online exam security is an important aspect of online education and should not be ignored. Despite the immaturity of existing frameworks and methodologies in the market, techniques such as remote proctoring, lockdown browser, and audit logging have proven to be effective. 

Creating and delivering high-quality digital assessments is not a walk in the park. If you need professional help in creating online assessments aligned with best psychometric practices, contact Assess for a free consultation.

You can also get access to online assessment tools with security features including AI flagging, remote proctoring, lockdown browser, and much more!

The Test Information Function is a concept from item response theory (IRT) that is designed to evaluate how well an assessment differentiates examinees, and at what ranges of ability. For example, we might expect an exam composed of difficult items to do a great job in differentiating top examinees, but it is worthless for the lower half of examinees because they will be so confused and lost.

The reverse is true of an easy test; it doesn’t do any good for top examinees. The test information function quantifies this and has a lot of other important applications and interpretations.

Test Information Function: how to calculate it

The test information function is not something you can calculate by hand. First, you need to estimate item-level IRT parameters, which define the item response function. The only way to do this is with specialized software; there are a few options in the market, but we recommend Xcalibre.

Next, the item response function is converted to an item information function for each item. The item information functions can then be summed into a test information function. Lastly, the test information function is often inverted into the conditional standard error of measurement function, which is extremely useful in test design and evaluation.

IRT Item Parameters

Software like Xcalibre will estimate a set of item parameters. The parameters you use depend on the item types and other aspects of your assessment.

For example, let’s just use the 3-parameter model, which estimates a, b, and c. And we’ll use a small test of 5 items. These are ordered by difficulty: item 1 is very easy and Item 5 is very hard.

Item    a       b      c
1      1.00   -2.00   0.20
2      0.70   -1.00   0.40
3      0.40    0.00   0.30
4      0.80    1.00   0.00
5      1.20    2.00   0.25

Item Response Function

The item response function uses the IRT equation to convert the parameters into a curve. The purpose of the item parameters is to fit this curve for each item, like a regression model to describe how it performs.

Here are the response functions for those 5 items. Note the scale on the x-axis, similar to the bell curve, with the easy items to the left and hard ones to the right.

item response function
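As an illustration, here is a small Python sketch (assuming the logistic form of the 3PL with scaling constant D = 1.7) that computes the response function for the five items in the table:

```python
import numpy as np

# (a, b, c) parameters for the 5 items in the table above
items = [
    (1.00, -2.00, 0.20),
    (0.70, -1.00, 0.40),
    (0.40,  0.00, 0.30),
    (0.80,  1.00, 0.00),
    (1.20,  2.00, 0.25),
]


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the logistic 3PL model."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))


# An average examinee (theta = 0) almost certainly answers easy Item 1...
print(round(p_3pl(0.0, *items[0]), 3))  # 0.974
# ...but sits near the guessing floor on hard Item 5
print(round(p_3pl(0.0, *items[4]), 3))  # 0.262
```

Plotting `p_3pl` over a grid of theta values reproduces the S-shaped curves shown below.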

Item Information Function

The item information function is based on the derivative of the item response function. An item provides more information about examinees in the range where its response function has a steeper slope.

For example, consider Item 5: it is difficult, so it is not very useful for examinees in the bottom half of the ability range. The slope of the Item 5 IRF is nearly 0 across that entire range, which means its information function is nearly 0 there as well.

item information functions
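For the 3PL model, the item information function has a closed form: I(theta) = D^2 a^2 (Q/P) ((P - c)/(1 - c))^2, where P is the response function and Q = 1 - P. A quick sketch, again assuming the logistic 3PL with D = 1.7:

```python
import numpy as np


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the logistic 3PL model."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c, D=1.7):
    """3PL item information: D^2 a^2 * (Q/P) * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c, D)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# Hard Item 5 (a=1.20, b=2.00, c=0.25) is nearly flat at theta = -2,
# so it tells us almost nothing about low-ability examinees...
print(info_3pl(-2.0, 1.20, 2.00, 0.25) < 0.001)  # True
# ...but it is quite informative near its own difficulty, theta = 2
print(round(info_3pl(2.0, 1.20, 2.00, 0.25), 3))  # 0.624
```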

Test Information Function

The test information function then sums the item information functions to summarize where the test is providing information. If you imagine adding the graphs above, you can easily picture some humps near the top and bottom of the range, where the prominent IIFs are.

Test information function

Conditional Standard Error of Measurement Function

The test information function can be inverted into an estimate of the conditional standard error of measurement. What do we mean by conditional? If you are familiar with classical test theory, you know that it estimates the same standard error of measurement for everyone that takes a test.

But given the concepts above, it is unreasonable to expect this. If a test has only difficult items, then it measures top students well and lower students poorly, so why should we say that their scores are equally accurate? The conditional standard error of measurement turns the standard error into a function of ability.

Also, note that it refers to the theta scale and not to the number-correct scale.

Conditional standard error of measurement function
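Continuing the sketch from the earlier snippets: summing the five item information functions gives the test information function, and taking the reciprocal of its square root gives the conditional standard error of measurement at each theta.

```python
import numpy as np

items = [(1.00, -2.00, 0.20), (0.70, -1.00, 0.40), (0.40, 0.00, 0.30),
         (0.80, 1.00, 0.00), (1.20, 2.00, 0.25)]


def p_3pl(theta, a, b, c, D=1.7):
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c, D=1.7):
    p = p_3pl(theta, a, b, c, D)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


thetas = np.linspace(-3, 3, 121)
tif = sum(info_3pl(thetas, a, b, c) for a, b, c in items)  # test information
csem = 1.0 / np.sqrt(tif)  # conditional standard error of measurement

# Measurement error is smallest exactly where information is greatest
print(bool(np.argmax(tif) == np.argmin(csem)))  # True
```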

How can I implement all this?

For starters, I recommend delving deeper into an item response theory book. My favorite is Item Response Theory for Psychologists by Embretson and Reise. Next, you need some item response theory software.

Xcalibre can be downloaded as a free version for learning and is the easiest program to learn how to use (no 1980s-style command code… how is that still a thing?). But if you are an R fan, there are plenty of resources in that community as well.

Tell me again: why are we doing this?

The purpose of all this is to effectively model how items and tests work, namely, how they interact with examinees. This then allows us to evaluate their performance so that we can improve them, thereby enhancing reliability and validity.

Classical test theory had a lot of shortcomings in this endeavor, which led to IRT being invented. IRT also facilitates some modern approaches to assessment, such as linear on-the-fly testing, adaptive testing, and multistage testing.

An item distractor, also known as a foil or a trap, is an incorrect option for a selected-response item on an assessment.

What makes a good item distractor?

One word: plausibility.  We need the item distractor to attract examinees.  If it is so irrelevant that no one considers it, then it does not do any good to include it in the item.  Consider the following item.

What is the capital of the United States of America?

A. Los Angeles

B. New York

C. Washington, D.C.

D. Mexico City

The last option is quite implausible – not only is it outside the USA, but it mentions another country in the name, so no student is likely to select this.  This then becomes a three-horse race, and students have a 1 in 3 chance of guessing.  This certainly makes the item easier.

In addition, the distractor needs to have negative discrimination.  That is, while we want the correct answer to attract the more capable examinees, we want the distractors to attract the less capable ones.  If you have a distractor that you thought was incorrect, and it turns out to attract all the top students, you need to take a long, hard look at that question! To calculate discrimination statistics on distractors, you will need software such as Iteman.
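A common statistic here is the point-biserial correlation between choosing the distractor (coded 0/1) and each examinee's total score; distractors should come out negative and the key positive. A small illustration on made-up data (the numbers below are hypothetical, not output from Iteman):

```python
import numpy as np

# Hypothetical data for one distractor: did each examinee choose it (1/0),
# and what was their total test score?
chose_distractor = np.array([1, 1, 0, 0, 0, 1, 0, 0])
total_score      = np.array([10, 12, 35, 40, 28, 8, 33, 25])

# Point-biserial = Pearson correlation of the 0/1 indicator with total score
r = np.corrcoef(chose_distractor, total_score)[0, 1]
print(r < 0)  # True: low scorers pick this distractor, which is what we want
```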

What makes a bad item distractor?

Obviously, implausibility and negative discrimination are frequent offenders.  But if you think more deeply about plausibility, the key is actually plausibility without being arguably correct.  This can be a fine line to walk, and is a common source of problems for items.  You might have a medical item that presents a scenario and asks for a likely diagnosis; perhaps one of the distractors is very unlikely so as to be essentially implausible, but it might actually be possible for a small subset of patients under certain conditions.  If the author and item reviewers did not catch this, the examinees probably will, and this will be evident in the statistics.  This is one of the reasons it is important to do psychometric analysis of test results; in fact, accreditation standards often require you to go through this process at least once a year.

What is an item review?  It is the process of performing quality control on items before they are ever delivered to examinees. 

This is an absolutely essential step in the development of items for medium- and high-stakes exams; while a teacher might not have other teachers review questions on a 4th-grade math quiz, items that are part of an admissions exam or professional certification exam will go through multiple layers of independent item review before a single examinee sees them. 

This blog post will discuss some important aspects of the item review process.

Why item review?

Assessment items are, when you look at it from a business perspective, a work product.  They are component parts of a larger machine, the test or assessment; in some cases interchangeable, in other cases very intentional and specific.  It is obviously common practice to perform quality assurance on work products, and the item review process simply applies this concept to test questions.

Who does the item review?

This can differ greatly based on the type of assessment and the stakes involved.  In a medium-stakes situation, it might be just one other reviewer.  A professional certificate exam might have all items reviewed by one content expert other than the person who wrote the item, and this could be considered sufficient. 

In higher-stakes exams developed by large organizations, the item might go through two content reviewers, a psychometric reviewer, a bias reviewer, and an editor.  It might then go through additional stages for formatting.  You can see how this can become a very big deal, with dozens of people and hundreds of items floating around.

What do the reviewers check?

It depends on who the reviewer is, but there are often checklists that the organization provides.  A content reviewer might check that the stem is clear, the key is fully correct, the distractors fully incorrect, and all answers of reasonably equivalent length. 

The psychometric reviewer might check for aspects that inadvertently tip off the correct answer.  The bias reviewer might look for a specific set of situations that potentially disadvantage some subgroup of the population.  An editor might look for correct usage of punctuation, such as the fact that the stem should never end in a colon.

For example, during my graduate school years, I used to write items that were eventually used in the US State of Alaska for K-12 assessments.  The reviewers not only looked for straightforward issues like answer correctness but for potential bias in the case of Alaskans.  As item writers, we were warned to be careful about mentioning any objects that we take for granted in the Lower 48: roads, shopping malls, indoor plumbing, and farms are examples that come to mind. Checking this was a stage of item review.

How do we manage the work?

The best practice for managing the process is to implement stages.  An organization might decide that all items go to the reviewers listed previously, in the order that I described them.  Each reviewer must complete their checklist before the item can be moved on to the next stage.  This might seem like a coldhearted assembly line, given that there certainly is an art to writing good items, but assembly lines unarguably lead to greater quality and increased productivity.
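As a toy sketch of that stage-gated workflow (the stage names below are illustrative, not a prescription), an item can only advance once its current reviewer's checklist is complete:

```python
# Illustrative review stages; real organizations define their own.
STAGES = ["Draft", "Content Review 1", "Content Review 2",
          "Psychometric Review", "Bias Review", "Edit", "Ready"]


def advance(item: dict) -> dict:
    """Move an item to the next stage, but only if its checklist is done."""
    if not item["checklist_complete"]:
        raise ValueError("Complete the review checklist before advancing.")
    i = STAGES.index(item["stage"])
    item["stage"] = STAGES[min(i + 1, len(STAGES) - 1)]
    item["checklist_complete"] = False  # the next reviewer starts fresh
    return item


item = {"stage": "Draft", "checklist_complete": True}
advance(item)
print(item["stage"])  # Content Review 1
```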

Is there software that makes the item review process easier?

Yes.  You have likely used some form of work process management software in your own job, such as Trello, JIRA, or GitHub. These are typically based on the concept of swimlanes, which as a whole is often referred to as a Kanban board.  Back in the day, Kanban boards were actual boards with Post-its on them, as you might have seen on shows like Silicon Valley.

This approach presents the aforementioned stages as columns in a user interface, and tasks (items) are moved through the stages.  Once Content Reviewer 1 is done with their work and leaves comments on the item, the software provides a way for them to change the stage to Content Review 2 and assign someone as Content Reviewer 2.

Below is an example of this from ASC’s online assessment platform, Assess.ai.  Because Assess.ai is designed for organizations that are driven by best practices and advanced psychometrics, there is an entire portion of the system dedicated to the management of item review via the swimlanes interface.

[Image: Kanban board for item review in Assess.ai]

To implement this process, an administrator at the organization defines the stages that they want all items to receive, and Assess.ai will present these as columns in the swimlane interface.  Administrators can then track and manage the workflow visually.  The reviewers themselves don’t need access to everything, but instead are instructed to click on the items they are supposed to review, and they will be presented with an interface like the one below.

[Image: item review interface in Assess.ai]

Can I implement Kanban item review at my organization?

Absolutely!  Assess.ai is available as a free version (sign up here), with a limit of 500 items and 1 user.  While this means that the free version won’t let you manage dozens of users, you can still implement some aspects of the process to improve item quality in your organization.  Once you are ready to expand, you can simply upgrade your account and add the users. 

Want to learn more?  Drop us an email at solutions@assess.com.


Technology-enhanced items are assessment items (questions) that utilize technology to improve the interaction of the item, over and above what is possible with paper.  Tech-enhanced items can improve examinee engagement (important with K12 assessment), assess complex concepts with higher fidelity, improve precision/reliability, and enhance face validity/sellability. 

To some extent, the last word is the key one; tech-enhanced items simply look sexier and therefore make an assessment platform easier to sell, even if they don’t actually improve assessment.  I’d argue that there are also technology-enabled items, which are distinct, as discussed below.

What is the goal of technology enhanced items?

The goal is to improve assessment by increasing things like reliability/precision, validity, and fidelity. However, there are a number of TEIs that are actually designed more for sales purposes than psychometric purposes. So, how do we know whether a TEI improves assessment?  That, of course, is an empirical question that is best answered with an experiment.  But let me suggest one metric to address this question: how far does the item go beyond merely reformulating a traditional item format for current user-interface technology?  I would call an item that merely reformulates a traditional format a fake TEI, while one that goes beyond it is a true TEI.

An alternative nomenclature might be to call the reformulations technology-enhanced items and the true tech usage technology-enabled items (Almond et al., 2010; Bryant, 2017), as the latter would not be possible without technology.

A great example of this is the relationship between a traditional multiple response item and certain types of drag and drop items.  There are a number of different ways that drag and drop items can be created, but for now, let’s use the example of a format that asks the examinee to drag text statements into a box. 

An example is a K12 assessment item from PARCC that asks the student to read a passage, then presents a list of statements about the story and asks the student to drag all true statements into a box.  Take the following drag-and-drop tech-enhanced item, for example.

[Image: drag-and-drop item based on Brian’s Winter]

Now, consider the following item, often called multiple response.

[Image: multiple response item based on Brian’s Winter]

You can see how this item is exactly the same in terms of psychometric interaction: the student is presented a list of statements and selects those they think are true.  The item is scored with integers from 0 to K, where K is the number of correct statements; the integers are often then used with the generalized partial credit model for final scoring.  This is true regardless of whether the item is presented as multiple response or drag and drop. The multiple response item, of course, could just as easily be delivered via paper and pencil. Converting it to drag and drop enhances the item with technology, but the interaction of the student with the item, psychometrically, remains the same.
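A minimal sketch of this scoring rule, assuming one point per correctly selected statement (the function name and the exact partial-credit policy are illustrative; programs vary on how they treat incorrect selections):

```python
def score_multiple_response(selected, correct_keys):
    """Score a multiple-response (or equivalent drag-and-drop) item
    polytomously: one point per correct statement selected.
    Incorrect selections simply earn nothing here; some programs
    instead subtract points or require an exact match."""
    return sum(1 for choice in selected if choice in correct_keys)

correct = {"A", "C", "D"}  # K = 3 true statements
print(score_multiple_response({"A", "C"}, correct))       # 2
print(score_multiple_response({"A", "C", "D"}, correct))  # 3 (max score K)
```

The resulting 0-to-K integer scores are what would then be fed to a polytomous IRT model such as the generalized partial credit model, whether the delivery format was checkboxes or drag and drop.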

Some True TEIs, or Technology Enabled Items

Of course, the past decade or so has witnessed stronger innovation in item formats. Gamified assessments change how the interaction of person and item is approached, though this is arguably less relevant for high-stakes assessment due to validity concerns. There are also simulation items. For example, a test for a construction crane operator might provide an interface with crane controls and ask the examinee to complete a task. Even at the K-12 level there can be such items, such as a simulation of a science experiment where the student is given various test tubes or other instruments on the screen.

Both of these approaches are extremely powerful but have a major disadvantage: cost. They are typically custom-designed. In the case of the crane operator exam or even the science experiment, you would need to hire software developers to create this simulation. There are now some simulation-development ecosystems that make this process more efficient, but the items still involve custom authoring and custom scoring algorithms.

To address this shortcoming, there is a new generation of self-authored item types that are true TEIs. By “self-authored” I mean that a science teacher would be able to create these items themselves, just like they would a multiple choice item. The amount of technology leveraged is somewhere between a multiple choice item and a custom-designed simulation, providing a compromise of reduced cost but still increasing the engagement for the examinee. An example of this is shown below from ASC’s Assess.ai assessment platform. A major advantage of this approach is that the items do not need custom scoring algorithms, and instead are typically scored via point integers, which enables the use of polytomous item response theory.

[Image: technology-enhanced item in Assess.ai]

Are we at least moving forward?  Not always!

There is always pushback against technology, and in this topic the counterexample is the gridded item type.  It goes in reverse of innovation: rather than reformulating a traditional format for current UI, it ignores the capabilities of current UI (indeed, of UI from the past 20+ years) and is therefore a step backward. With this item type, students are presented a bubble sheet from a 1960s-style paper exam on a computer screen, and asked to fill in the bubbles by clicking on them rather than using a pencil on paper.

Another example is the EBSR item type from the artist formerly known as PARCC. It was a new item type that intended to assess deeper understanding, but it did not use any tech-enhancement or -enablement, instead asking two traditional questions in a linked manner. As any psychometrician can tell you, this approach ignored basic assumptions of psychometrics, so you can guess the quality of measurement that it put out.

How can I implement TEIs?

It takes very little software development expertise to develop a platform that supports multiple choice items. An item like the graphing one above, though, takes substantial investment. So there are relatively few platforms that can support these, especially with best practices like workflow item review or item response theory. You can try authoring them for free in our Assess.ai assessment platform, or if you have more questions, contact solutions@assess.com.


Automated item generation (AIG) is a paradigm for developing assessment items (test questions), utilizing principles of artificial intelligence and automation. As the name suggests, it tries to automate some or all of the effort involved with item authoring, as that is one of the most time-intensive aspects of assessment development – which is no news to anyone who has authored test questions!

Items can cost up to $2000 to develop, so even cutting the average cost in half could provide massive time/money savings to an organization.

There are two types of automated item generation:

Type 1: Item Templates (Current Technology)

The first type is based on the concept of item templates to create a family of items using dynamic, insertable variables. There are three stages to this work. For more detail, read this article by Gierl, Lai, and Turner (2012).

  • Authors, or a team, create a cognitive model by isolating exactly what they are trying to assess and the different ways that the knowledge could be presented or evidenced. This might include information such as which variables are important vs. incidental, and what a correct answer should include.
  • They then develop templates for items based on this model, like the example you see below.
  • An algorithm then turns this template into a family of related items, often by producing all possible permutations.
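The three stages above can be sketched in a few lines. The template, field values, and scoring rule here are all hypothetical; a real cognitive model would specify which variables matter and what the key must include.

```python
from itertools import product

# Illustrative item template with two dynamic, insertable fields
template = "A train travels {speed} km/h for {hours} hours. How far does it go?"
fields = {"speed": [40, 60, 80], "hours": [2, 3]}

def generate_items(template, fields):
    """Turn a template into a family of related items by producing
    all possible permutations of the dynamic fields, computing the
    key (correct answer) for each generated item."""
    names = list(fields)
    items = []
    for values in product(*(fields[n] for n in names)):
        params = dict(zip(names, values))
        stem = template.format(**params)
        key = params["speed"] * params["hours"]  # distance = speed * time
        items.append((stem, key))
    return items

family = generate_items(template, fields)
print(len(family))  # 6 items in the generated family
```

Note that the permutation count multiplies quickly (3 speeds x 2 durations = 6 items here), which is where the efficiency gains come from, and also why each permutation should still be reviewed by SMEs before entering the item bank.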

Obviously, you can’t use more than one of these on a given test form. And in some cases, some of the permutations will be an unlikely scenario or possibly completely irrelevant. But the savings can still be quite real. I saw a conference presentation by Andre de Champlain from the Medical Council of Canada, stating that overall efficiency improved by 6x and the generated items were higher quality than traditionally written items, because the process made the authors think more deeply about what they were assessing and how. He also recommended that template permutations not be automatically moved to the item bank, but instead that each be reviewed by SMEs, for reasons such as those stated above.

You might think, “Hey, that’s not really AI…” – but AI is simply doing things that have historically been done by humans, and the definition gets pushed further every year. Remember, AI used to be just having the Atari be able to play Pong with you!

Type 2: AI Processing of Source Text (Future Technology)

The second type is what the phrase “automated item generation” more likely brings to mind: upload a textbook or similar source to some software, and it spits back drafts of test questions. For example, see this article by von Davier (2019). This technology is still cutting edge and working through issues. For example, how do you automatically come up with quality, plausible distractors for a multiple-choice item? This might be automated in some cases like mathematics, but in most cases, the knowledge of plausibility lies with subject matter expertise. Moreover, this approach is certainly not accessible for the typical educator. It is currently in use, but by massive organizations that spend millions of dollars.

How Can I Implement Automated Item Generation?

AIG has been used by the large testing companies for years, but it is no longer limited to their domain. It is now available off the shelf as part of ASC’s next-generation assessment platform, Assess.ai. Best of all, that component is available at the free subscription level; all you need to do is register with a valid email address.

Assess.ai provides a clean, intuitive interface to implement Type 1 AIG, in a way that is accessible to all organizations. Develop your item templates, insert dynamic fields, and then export the results to review and implement in an item banking system, which is also available for free in Assess.ai.

[Image: Assess.ai’s automated item generation template]

If you have worked in the field of assessment and psychometrics, you have undoubtedly encountered the word “standard.” While a relatively simple word, it has the potential to be confusing because it is used in three (and more!) completely different but very important ways. Here’s a brief discussion.

Standard = Cutscore

As noted by the well-known professor Gregory Cizek here, “standard setting refers to the process of establishing one or more cut scores on a test.” The various methods of setting a cutscore, like Angoff or Bookmark, are referred to as standard setting studies. In this context, the standard is the bar that separates a Pass from a Fail. We use methods like the ones mentioned to determine this bar in as scientific and defensible a fashion as possible, and to give it more concrete meaning than an arbitrarily selected round number like 70%. Selecting a round number like that will likely get you sued, since there is no criterion-referenced interpretation.

Standard = Blueprint

If you work in the field of education, you often hear the term “educational standards.” These refer to the curriculum blueprints for an educational system, which also translate into assessment blueprints, because you want to assess what is on the curriculum. Several important ones in the USA are noted here, perhaps the most common of which nowadays is the Common Core State Standards, which attempted to standardize the standards across states. These standards exist to standardize the educational system, by teaching what a group of experts have agreed upon should be taught in 6th grade Math classes for example. Note that they don’t state how or when a topic should be taught, merely that 6th Grade Math should cover Number Lines, Measurement Scales, Variables, whatever – sometime in the year.

Standard = Guideline

If you work in the field of professional certification, you hear the term just as often but in a different context, accreditation standards. The two most common are the National Commission for Certifying Agencies (NCCA) and the ANSI National Accreditation Board (ANAB). These two organizations are a consortium of credentialing bodies that give a stamp of approval to credentialing bodies, stating that a Certification or Certificate program is legit. Why? Because there is no law to stop me from buying a textbook on any topic, writing 50 test questions in my basement, and selling it as a Certification. It is completely a situation of caveat emptor, and these organizations are helping the buyers by giving a stamp of approval that the certification was developed with accepted practices like a Job Analysis, Standard Setting Study, etc.

In addition, there are the professional standards for our field. These are guidelines on assessment in general rather than just credentialing. Two great examples are the AERA/APA/NCME Standards for Educational and Psychological Measurement and the International Test Commission’s Guidelines (yes they switch to that term) on various topics.

Also: Standardized = Equivalent Conditions

The word is also used quite frequently in the context of standardized testing, though it is rarely chopped to the root word “standard.” In this case, it refers to the fact that the test is given under equivalent conditions to provide greater fairness and validity. A standardized test does NOT mean multiple choice, bubble sheets, or any of the other pop connotations that are carried with it. It just means that we are standardizing the assessment and the administration process. Think of it as a scientific experiment; the basic premise of the scientific method is holding all variables constant except the variable in question, which in this case is the student’s ability. So we ensure that all students receive a psychometrically equivalent exam, with equivalent (as much as possible) writing utensils, scrap paper, computer, time limit, and all other practical surroundings. The problem comes with the lack of equivalence in access to study materials, prep coaching, education, and many bigger questions… but those are a societal issue and not a psychometric one.

So despite all the bashing that the term gets, a standardized test is MUCH better than the alternatives of no assessment at all, or an assessment that is not a level playing field and has low reliability. Consider the case of hiring employees: if assessments were not used to provide objective information on applicant skills and we could only use interviews (which are famously subjective and inaccurate), all hiring would be virtually random and the amount of incompetent people in jobs would increase a hundredfold. And don’t we already have enough people in jobs where they don’t belong?

A standard setting study is a formal process for establishing a performance standard. In the assessment world, there are actually two uses of the word standard – the other one refers to a formal definition of the content that is being tested, such as the Common Core State Standards in the USA. For this reason, I prefer the term cutscore study.

After item authoring, item review, and test form assembly, a cutscore or passing score will often be set to determine what level of performance qualifies as “pass” or a similar classification.  This cannot be done arbitrarily (e.g., setting it at 70% because that’s what you saw when you were in school).  To be legally defensible and eligible for accreditation, it must be done using one of several standard-setting approaches from the psychometric literature.

The choice of method depends upon the nature of the test, the availability of pilot data, and the availability of subject matter experts.

Some types of Cutscore Studies:

  • Angoff – In an Angoff study, a panel of subject matter experts rates each item, estimating the percentage of minimally competent candidates that would answer each item correctly.  It is often done in tandem with the Beuk Compromise.  The Angoff method does not require actual examinee data, though the Beuk does.
  • Bookmark – The bookmark method orders the items in a test form in ascending difficulty, and a panel of experts reads through and places a “bookmark” in the book where they think a cutscore should be.  Obviously, this requires enough real data to calibrate item difficulty, usually using item response theory, which requires several hundred examinees.
  • Contrasting Groups – Candidates are sorted into Pass and Fail groups based on their performance on a different exam or some other unrelated standard.  If using data from another exam, a sample of at least 50 candidates is obviously needed.
  • Borderline Group – Similar to Contrasting Groups, but a borderline group is defined using alternative information such as biodata, and the scores of the group are evaluated.
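The raw Angoff computation itself is simple enough to sketch. This is a minimal illustration with made-up ratings, not the full study procedure, which in practice includes judge training, discussion rounds, and often a Beuk adjustment.

```python
def angoff_cutscore(ratings):
    """Compute a raw Angoff cutscore from SME ratings.

    `ratings` is a list of lists: one inner list per item, holding each
    judge's estimate of the proportion of minimally competent candidates
    who would answer that item correctly.  The cutscore is the sum of
    the per-item mean ratings."""
    return sum(sum(item) / len(item) for item in ratings)

# Hypothetical panel of three judges rating a four-item test
ratings = [
    [0.80, 0.70, 0.75],  # item 1
    [0.60, 0.65, 0.55],  # item 2
    [0.90, 0.85, 0.95],  # item 3
    [0.50, 0.55, 0.45],  # item 4
]
cut = angoff_cutscore(ratings)
print(round(cut, 2))  # 2.75 out of 4 raw points
```

Note that no examinee data is needed for this calculation, which is exactly why the Angoff method is popular for new exams; the Beuk Compromise then reconciles this judgmental cutscore against actual score distributions once data exists.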

One of the most cliche phrases associated with assessment is “teaching to the test.”  I’ve always hated this phrase, because it is only used in a derogatory manner, almost always by people who do not understand the basics of assessment and psychometrics.  I recently saw it mentioned in this article on PISA, and that was one time too many, especially since it was used in an oblique, vague, and unreferenced manner.

So, I’m going to come out and say something very unpopular: in most cases, TEACHING TO THE TEST IS A GOOD THING.

 

Why teaching to the test is usually a good thing

If the test reflects the curriculum – which any good test will – then someone who is teaching to the test will be teaching to the curriculum.  Which, of course, is the entire goal of teaching. The phrase “teaching to the test” is used in an insulting sense, especially because the alliteration is resounding and sellable, but it’s really not a bad thing in most cases.  If a curriculum says that 4th graders should learn how to add and divide fractions, and the test evaluates this, what is the problem? Especially if it uses modern methodology like adaptive testing or tech-enhanced items to make the process more engaging and instructional, rather than oversimplifying to a text-only multiple choice question on paper bubble sheets?

In the world of credentialing assessment, this is an extremely important link.  Credential tests start with a job analysis study, which surveys professionals to determine what they consider to be the most important and frequently used skills in the job.  This data is then transformed into test blueprints. Instructors for the profession, as well as aspiring students studying to pass the test, then focus on what is in the blueprints.  This, of course, still covers the skills that are most important and frequently used in the job!

 

So what is the problem then?

Now, telling teachers how to teach is more concerning, and more likely to be a bad thing.  Finland does well because it gives teachers lots of training and then power to choose how they teach, as noted in the PISA article.

As a counterexample, my high school math department issued an edict starting my sophomore year that all teachers had to use the “Chicago Method.”  It was pure bunk, based on the notion that students should be doing as much busy work as possible instead of the teachers actually teaching. I think some salesman convinced the department head to make the switch so that they would buy a thousand brand new textbooks.  The method makes some decent points (here’s an article from, coincidentally, when I was a sophomore in high school), but I think we ended up with a bastardization of it, as the edict was primarily:

  1. Assign students to read the next chapter in class (instead of teaching them!); go sit at your desk.
  2. Assign students to do at least 30 homework questions overnight, and come back tomorrow with any questions they have.  
  3. Answer any questions, then assign them the next chapter to read.  Whatever you do, DO NOT teach them about the topic before they start doing the homework questions.  Go sit at your desk.

Isn’t that preposterous?  Unsurprisingly, after two years of this, I went from being a leader of the Math Team to someone who explicitly said “I am never taking Math again”.  And indeed, I managed to avoid all math during my senior year of high school and first year of college. Thankfully, I had incredible professors in my years at Luther College, leading to me loving math again, earning a math major, and applying to grad school in psychometrics.  This shows the effect that might happen with “telling teachers how to teach.” Or in this case, specifically – and bizarrely – to NOT teach.

 

What about all the bad tests out there?

Now, let’s get back to the assumption that a test does reflect a curriculum/blueprints.  There are, most certainly, plenty of cases where an assessment is not designed or built well.  That’s an entirely different problem, and an entirely valid concern; I have seen a number of these in my career.  This danger is why we have international standards on assessment, like AERA/APA/NCME and NCCA.  These provide guidelines on how a test should be built, sort of like how you need to build a house according to building code rather than just throwing up some walls and a roof.

For example, there is nothing stopping me from identifying a career where a lot of people are looking to gain an edge over one another to get a better job… then buying a textbook, writing 50 questions in my basement, and throwing it up on a nice-looking website to sell as a professional certification.  I might sell it for $395, and if I get just 100 people to sign up, I’ve made $39,500!  This violates just about every NCCA guideline, though. If I wanted a stamp of approval that my certification was legit – as well as making it legally defensible – I would need to follow the NCCA guidelines.

My point here is that there are definitely bad tests out there, just like there are millions of other bad products in the world.  It’s a matter of caveat emptor. But just because you had some cheap furniture in college that broke right away doesn’t mean you swear off all furniture.  You stay away from bad furniture.

There’s also the problem of tests being misused, but again, that’s not a problem with the test itself; usually it means someone making decisions is uninformed. It could be the best test in the world, with 100% precision, but if it is used for an invalid application then it’s still not a good situation – for example, taking a very well-made exam for high school graduation and using it for employment decisions with adults. Psychometricians call this validity: having evidence to support the intended use of the test and interpretations of scores.  It is the #1 concern of assessment professionals, so if a test is being misused, it’s probably by someone without a background in assessment.

 

So where do we go from here?

Put it this way, if an overweight person is trying to become fitter, is success more likely to come from changing diet and exercise habits, or from complaining about their bathroom scale?  Complaining unspecifically about a high school graduation assessment is not going to improve education; let’s change how we educate our children to prepare them for that assessment, and ensure that the assessment reflects the goals of the education.  Nevertheless, of course, we need to invest in making the assessment as sound and fair as we can – which is exactly why I am in this career.