Before the introduction of online exams in the education sector, the mere mention of the word ‘exams’ was met with anxiety. Exams were confined to four walls, and you could cut the tension in the exam center with a knife. The furious scribbling of pens, the sharp glances of hawk-eyed invigilators, and the constant ticking of the wall clock are not experiences many will forget.

But then came the internet, and with it a better way to assess students. Online exams, though not popular at the time, provided a better way to develop and deliver tests. Using psychometric methods such as Computerized Adaptive Testing and Item Response Theory, assessments became more reliable and secure. Delivery mechanisms such as remote proctoring gave students the ability to take their exams anywhere in the world.

However, despite these numerous benefits, online exams remained in the shadows until the pandemic hit. Many educational institutions and businesses then embraced online exams and made them the core of their systems. Fast-forward a year, and the deployment of vaccines is underway. Many educational institutions are unsure which examination model to stick with. Should you continue with the online exam model you used when everyone was stuck at home? Should you adopt a hybrid examination model, or should you go back to the traditional pen-and-paper method?

This blog post will provide you with an evaluation of whether offline exams are still worth it in 2021. 

Offline Exams: The Good, the Bad, and the Ugly

The Good

Offline exams were a stepping stone toward the development of modern, more effective assessment models, and we can’t ignore the fact that traditional exams have several advantages.

Some advantages of offline exams include students’ familiarity with the format, the social connection that develops between learners, immunity to technical glitches, and affordability. Some schools don’t have the resources for anything else, and pen-and-paper assessments are the only option available.

The Bad and The Ugly

However, the pen-and-paper method of assessment has not been able to ensure that exams achieve their core objectives. The traditional method of assessment is riddled with uncertainties and inaccuracies.

 How do you develop and decide the main learning objectives? How do you measure performance and know what to do to improve learning outcomes? And how do you evaluate student strengths and weaknesses? These are just a few questions that the traditional assessment method can’t answer.

 Below is a list of challenges pen-and-paper methods face from test development to evaluation:

1. Needs a lot of resources

From test development to evaluation, pen-and-paper methods require a lot of resources, ranging from staffing costs to the materials needed to develop and deliver the exams to students.

2. Lack of seamless collaboration and scalability

The ability to cater to a bigger audience is important for productivity and saving resources. However, the pen-and-paper method offers no room for scalability: only a fixed number of students can take an exam at any given time. This is not only expensive but also wastes valuable time and increases the chances of leakage.

3. Prone to cheating

Most people think that offline exams are cheat-proof, but that is not the case. Most offline exams count on invigilators and supervisors to make sure that cheating does not occur, yet many pen-and-paper assessments are still open to leakages. A high candidate-to-invigilator ratio is another factor that contributes to cheating in offline exams.

4. Poor student engagement

We live in a world of instant gratification, and the same applies to assessments. Unlike online exams, which have options to keep students engaged, offline exams are open to constant distraction from external factors.

Offline exams also have few options when it comes to question types. 

5. Flawed evaluation system

“To err is human.”

But when it comes to assessments, accuracy and consistency are essential. Traditional evaluation methods are slow and labor-intensive: instructors take a long time to evaluate tests, and this defeats the entire purpose of assessment.

6. Poor result analysis

Pen-and-paper exams depend on instructors to analyze the results and come up with insights. This requires a lot of human resources and expensive software. It is also difficult to find out whether your learning strategy is working or needs adjustment.

A glimpse into online exams

Also referred to as digital exams or e-exams, online exams are delivered over the internet. The best online examination platforms include modules to facilitate the entire examination process, from development to delivery. Online exams provide instructors with the ability to diversify question types and monitor the assessment process. Using learning analytics, they are also able to modify learning methods to increase the quality of output. 

Online exams work like typical pen-and-paper exams but with more accuracy, scalability, and reliability. Depending on the question type, grading is done automatically as soon as the assessment is finished; for essays, instructors can use automated essay scoring. These are just a few of the ways e-examinations have improved the assessment process.

Here are some pros and cons of online exams to help you decide whether they are right for you.

The pros of online exams

1. Scalability

Unlike traditional testing methods, which allow only a fixed number of people to take an exam at a fixed time, online exams can cater to bigger audiences. This saves educational institutions a lot of resources that would otherwise be invested in developing and managing examination centers.

2. Automated report generation and visualization

This is the greatest advantage online exams have over offline exams. The automated report generation and visualization functions integrated into online assessment platforms enable instructors to accurately gauge learning outcomes. This gives them actionable insight to improve the learning process. 

3. Accessibility

Online exams can be taken from anywhere in the world. All one needs is a computer and an internet connection. This has given students access to knowledge from global learning institutions.

4. Support for diversified question types

Unlike traditional exams, which are limited to a certain number of question types, online exams offer many question-type options. Multiple-choice questions, video assessments, coding simulators, and many other question types are supported. With this kind of freedom, instructors can decide which question types fit certain topics.

5. In-built psychometrics

Psychometrics is an important part of assessment, as it ensures the development of high-quality tests. Implementing psychometrics in traditional pen-and-paper methods is a difficult process that depends heavily on the experience of instructors.

With online exams, you can easily capitalize on psychometrics through tech-enhanced items, automated test assembly, Computerized Adaptive Testing, and more.

6. Improved academic integrity

Cheating is the biggest concern when it comes to online exams. Most people wonder, ‘Isn’t giving students access to a computer and a high-speed internet connection just handing them the answers?’ That is far from the truth.

In fact, online exams are safer than offline exams. They are protected using advanced technologies such as lockdown browsers, IP-based authentication, AI flagging, and many other strategies.

 Check out this article to learn how online exams are secured.  

7. Environmental friendliness

Sustainability is an important aspect of modern civilization.  Online exams eliminate the need to use resources that are not environmentally friendly such as paper. 

The cons of online exams

1. Digital transformation challenges

The process of transitioning examinations from offline models to online platforms requires intense planning and resources. However, this barrier can be lowered considerably by educating students and instructors on how to capitalize on digital assessments.

You can also hire firms with experience in migrating to digital assessments to help in the process. 

2. Academic integrity concerns

Cheating concerns remain a turn-off for institutions that wish to transition to online exams. There are many ways students can circumvent security protocols to cheat in exams, including impersonation, external help, and surfing the internet for answers.

However, these cheating ‘tricks’ can be prevented using Assess’ online assessment software, with security features such as a lockdown browser, IP-based authentication, and AI-powered remote proctoring.

Offline Exams vs Online Exams

[Image: comparison of the traditional approach vs the modern approach (offline exams vs e-examinations)]

Conclusion

Are offline exams still worth it in 2021? No, they are not. As we have seen in the sections above, the traditional exam approach has several flaws that are barriers to effective assessment. However, it’s not quite that simple: there are instances where the traditional approach is the better option, for example when students cannot afford the necessary infrastructure. But when you are looking to conduct high-stakes examinations, online exams are the best option.

How Assess Can Help 

Transitioning from offline exams to online exams is not a simple task. That is why Assess is here to help you every step of the way, from test development to delivery. We provide you with the best assessment software and access to a highly experienced team. Ready to take your assessments online?


Assessment is an important part of the learning process as it helps enhance the quality of learning outcomes. With the increased adoption of online assessments, especially during and after the Covid-19 pandemic,  it is important to put in place practices that ease the development of effective online assessments.


This is because well-crafted online assessments positively influence the ways in which students approach problems. Online assessments also provide many benefits compared to traditional assessments, including improved grading accuracy, accessibility, and better feedback methods.


But, developing effective online assessments is not an easy task. There are a lot of forces at play.


 This 2-part blog series aims to provide you with actionable tips and strategies that can help you improve your online assessments.


But, before getting into the nitty-gritty, let’s look at some characteristics of high-quality online exams.

Characteristics of high-quality online assessments

Here are some characteristics to look for in good online assessments:


  • Fair, defensible, and bias-free
  • Cost-effective and practical
  • Keep track of progress (short-term and long-term)
  • Flexible and able to scale
  • Provide a real learning experience
  • Include a scoring system that reflects mastery, not just a raw score
  • Provide diversified question types
  • Provide good feedback mechanisms and actionable insight
  • Aligned with the relevant curriculum and standards
  • Reliable and accessible to everyone


Now let’s design some effective digital exams!

Use online quizzes to spot student misconceptions

Spotting knowledge gaps is an important part of assessment, and online quizzes can help with this. This approach involves giving lecture videos to students before class and then quizzing them on the content. Answers and feedback are given immediately, preferably with some guidance.


The preferable question types for this strategy are fill-in-the-blank and MCQ. Each student is given 5 attempts and can still score full marks by answering correctly within those attempts.


The student responses can then be analyzed, especially the first attempts, and the insights used to shape the learning experience for the better.
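To make this concrete, here is a minimal sketch of how first-attempt responses might be analyzed to flag likely misconceptions. The data format and the 70% threshold are assumptions for illustration, not a prescription.

```python
from collections import defaultdict

# Each record: (student_id, question_id, attempt_number, is_correct).
# Hypothetical format; adapt to whatever your quiz platform exports.
responses = [
    ("s1", "q1", 1, True), ("s2", "q1", 1, False), ("s2", "q1", 2, True),
    ("s1", "q2", 1, False), ("s2", "q2", 1, False), ("s2", "q2", 2, False),
]

first_attempts = defaultdict(list)
for student, question, attempt, correct in responses:
    if attempt == 1:  # only first attempts reveal prior misconceptions
        first_attempts[question].append(correct)

for question, results in sorted(first_attempts.items()):
    p_correct = sum(results) / len(results)
    flag = "  <- possible misconception" if p_correct < 0.7 else ""
    print(f"{question}: {p_correct:.0%} correct on first attempt{flag}")
```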

Capitalize on Adaptive Testing

The benefits of adaptive testing are too numerous to pass up.


What is adaptive testing? Adaptive testing is the delivery of a test to an examinee using an algorithm that adapts the difficulty of the items to their ability. It can also adapt the number of items used, so no time is wasted on items that tell us nothing new.

It’s like the high jump in track and field.


You start the bar at a middling-low position. If you clear it, the bar is raised. This continues until you fail, and then the bar is dropped a little. Or, if you fail the first jump, it can be dropped until you succeed.
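In testing terms, that ‘raise or lower the bar’ rule is a simple staircase procedure. Below is a minimal sketch; the starting difficulty and fixed step size are assumptions for illustration, and real adaptive engines select items using item response theory rather than a fixed step.

```python
def staircase_difficulty(start: float = 0.0, step: float = 0.5):
    """Yield the next item difficulty; send True/False for a correct/incorrect response."""
    difficulty = start
    while True:
        correct = yield difficulty          # deliver an item at this difficulty
        difficulty += step if correct else -step

# Usage: simulate an examinee who clears any bar at or below ability 1.0
cat = staircase_difficulty()
d = next(cat)
for _ in range(6):
    print(f"item difficulty: {d:+.1f}")
    d = cat.send(d <= 1.0)                  # correct whenever the bar is within their ability
```

Run it and you will see the difficulty climb, overshoot, and then oscillate around the examinee’s ability, exactly like the high-jump bar.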


Some benefits of integrating adaptive testing into your strategy include shorter tests, improved test security, individualized exam pacing, and increased motivation. For more information about adaptive testing, check out this blog post.

[Image: adaptive testing options]


Want to start using adaptive testing? Contact us to get started right away. 

Choose the right online assessment tools

Choosing the right online assessment software to develop and deliver exams is not something you can avoid. Digital assessment software not only automates repetitive tasks and increases efficiency, but also helps shape the learning process for better outcomes.


But in a market crowded with online assessment software, it can be difficult to find the right product. Choosing software that does not align with your assessment strategy can be catastrophic.


A good online assessment platform should cater to your needs at every step of the test development cycle. It should offer the best functionality and put a world-class team at your disposal.


Yet even that may not be enough to choose software that will help you develop effective online assessments. Here is the perfect resource to help you choose the appropriate tool.

Understand the ‘why’ of assessments

Having a clear definition of why you are developing an assessment is key to making it effective. John Biggs and Catherine Tang, in their theory of constructive alignment, argue that assessment tasks (ATs) and teaching/learning activities (TLAs) should be designed to ensure that students achieve the intended learning outcomes (ILOs).


 Assessments should be developed based on the ILOs for particular topics. Different learning outcomes are achieved when a variety of assessment types are used. 


Effective online assessments have clear goals and create a learning process that ensures students have a chance to self-learn, practice, and receive actionable feedback. 


Multiple Choice Questions

Multiple-choice questions are unavoidable when it comes to online assessments. They feature all the benefits an overwhelmed teacher could wish for in an exam: easy to deliver, easy to develop, and easy to score. While they often have a poor reputation, they remain so common for a reason: they provide the most bang for your buck in assessment, contributing to reliable, valid scores without soaking up too much time from teachers or students.


But the art of developing effective MCQs is one that very few possess. To help you save some time, here is how to develop effective multiple-choice questions:


  • Be strategic when developing MCQs. Many tutors fall into the trap of thinking MCQs are simple to create and lose their grip on the big picture.
  • The response options should be direct. No fluff!
  • Avoid options like ‘None of the above’ or ‘All of the above’ like the plague, slipping them into the assessment only occasionally.
  • Make the questions engaging.
  • Language matters. A lot! Avoid grammatical errors and logical contradictions.
  • Ensure consistency in both the correct answers and the distractors.
  • Stems should be direct and clear.
  • Eliminate barriers for students, such as sensitive issues or bias.


The effective online assessment checklist

That is a lot of information to digest, so here is a simple checklist to help you know whether you are on track to develop effective online assessments.


  • What defines success in your online assessments? You should have well-established performance metrics to make sure that your examinees get the best out of their entire learning journey.
  • What type of feedback system do you have? The feedback method should help both the student and the teacher get something out of the examination process.
  • How diversified are your online assessments? You should offer different question types depending on the learning objectives.
  • Do you have a good online assessment tool? A good online exam tool should provide all the necessary functionality to develop good exams. Feel free to check out this blog post to understand what makes good online assessment software.
  • Does your strategy involve peer assessments or self-assessments?
  • Do your assessments empower and motivate students to give their best?
  • To what extent do you involve your students in the assessment process? Make sure to involve them as much as possible.
  • Are your exams in alignment with best-practice psychometrics and international standards?
  • How secure and defensible are your exams? Feel free to check out this resource to understand online exam security.
  • How often do you try different assessment strategies? You should keep testing different approaches to see what works and what doesn’t.

Final thoughts

Did you check most of the boxes in that list? If yes, you are on your way to developing effective online assessments. If not, try implementing the strategies discussed in the blog post. 

Developing good online exams is not an easy task. It requires a lot of dedication and time. 

If you get stuck, feel free to contact us and we will help you with your entire digital assessment journey. Not only do we have a complete suite of online assessment software to help you develop and deliver effective online exams, but we also have an experienced team to guide you every step of the way.


Resources for extra reading

https://cetl.ppu.edu/sites/default/files/publications/-John_Biggs_and_Catherine_Tang-_Teaching_for_Quali-BookFiorg-.pdf


https://www.nus.edu.sg/cdtl/docs/default-source/professional-development-docs/resources/designing-online-assessments.pdf


There are many types of remote proctoring on the market, spread across dozens of vendors, including many new ones that sought to capitalize on the pandemic and were not involved with assessment beforehand. With so many options, how can you effectively select among the types of remote proctoring?

What is remote proctoring?

Remote proctoring refers to the proctoring (invigilation) of educational or professional assessments when the proctor is not in the same room as the examinee. This means it is done via a video stream or recording, monitored by a human, an AI, or both. It is also referred to as online proctoring.

Remote proctoring offers a compelling alternative to in-person proctoring, somewhere in between unproctored at-home tests and tests delivered in an expensive testing center.  This makes it a perfect fit for medium-stakes exams, such as university placement, pre-employment screening, and many types of certification/licensure tests.

What are the types of remote proctoring?

There are four types of remote proctoring, which can be adapted to a particular use case, sometimes varying between different tests within a single organization. ASC supports all four types and partners with five different vendors to provide the best solution to our clients. The four types are described below, in descending order of security.


For each approach, here is what it entails for you and what it entails for the candidate.

Live with professional proctors

This is the most secure of the types of remote proctoring.

What it entails for you:

  • You register a set of examinees in FastTest, and tell us when they are to take their exams and under what rules.
  • We provide the relevant information to the proctors.
  • You send all the necessary information to your examinees.

What it entails for the candidate:

  • The examinee goes to ascproctor.com, where they initiate a chat with a proctor.
  • After confirmation of their identity and workspace, they are given instructions on how to take the test.
  • The proctor then watches a video stream from their webcam, as well as from a phone at the side of the room, ensuring that the environment is secure. The proctor does not see the screen, so your exam content is not exposed.
  • When the examinee is finished, they notify the proctor and are excused.

Live, bring your own proctor (BYOP)

What it entails for you:

  • You upload examinees into FastTest, which will generate links.
  • You send relevant instructions and the links to examinees.
  • Your staff logs into the admin portal and awaits examinees.
  • Videos with AI flagging are available for later review if needed.

What it entails for the candidate:

  • The examinee clicks on a link, which launches the proctoring software.
  • An automated system check is performed.
  • The proctoring is launched. Your proctor asks the examinee to provide identity verification, then launches the test.
  • The examinee is watched on the webcam and screencast. AI algorithms help flag irregular behavior.
  • The examinee concludes the test.

Record and Review (with option for AI)

What it entails for you:

  • You upload examinees into FastTest, which will generate links.
  • You send relevant instructions and the links to examinees.
  • After examinees take the test, your staff (or ours) logs in to review all the videos and report on any issues. AI will automatically flag irregular behavior, making your reviews more time-efficient.

What it entails for the candidate:

  • The examinee clicks on a link, which launches the proctoring software.
  • An automated system check is performed.
  • The proctoring is launched. The system asks the examinee to provide identity verification, then launches the test.
  • The examinee is recorded on the webcam and screencast. AI algorithms help flag irregular behavior.
  • The examinee concludes the test.

AI only

What it entails for you:

  • You upload examinees into FastTest, which will generate links.
  • You send relevant instructions and the links to examinees.
  • Videos are stored for 1 month if you need to check any.

What it entails for the candidate:

  • The examinee clicks on a link, which launches the proctoring software.
  • An automated system check is performed.
  • The proctoring is launched. The system asks the examinee to provide identity verification, then launches the test.
  • The examinee is recorded on the webcam and screencast. AI algorithms help flag irregular behavior.
  • The examinee concludes the test.


Some case studies

We’ve worked with all types of remote proctoring, across many types of assessment:

  • ASC delivers high-stakes certification exams for a number of certification boards, in multiple countries, using live proctoring with professional proctors. Some of these are available continuously on-demand, while others are on specific days when hundreds of candidates log in.
  • We partnered with a large university in South America whose admissions exams were delivered using Bring Your Own Proctor, enabling them to drastically reduce costs by utilizing their own staff.
  • We partnered with a private company to provide AI-enhanced record-and-review proctoring for applicants, where ASC staff reviews the results and provides a report to the client.
  • We partner with an organization that delivers civil service exams for a country, utilizing both unproctored delivery and AI-only proctoring, differing across a range of exam titles.

How do I select a vendor?

First, determine the level of security necessary and the trade-off with costs: live proctoring with professionals can cost $20 to $100 or more, while AI proctoring can cost as little as a few dollars. Then, evaluate some vendors to see which group they fall into; note that some vendors can do all of them! Next, ask for some demos so you understand the business processes involved and the UX on the examinee side, both of which can substantially impact the soft costs for your organization. Then, start negotiating with the vendor you want!


Want some more information?

Get in touch with us, we’d love to show you a demo!

Email solutions@assess.com.


Estimated reading time: 6 minutes

Technology-enhanced items are assessment items (questions) that utilize technology to improve the interaction of the item, over and above what is possible with paper.  Tech-enhanced items can improve examinee engagement (important with K12 assessment), assess complex concepts with higher fidelity, improve precision/reliability, and enhance face validity/sellability. 

To some extent, the last word is the key one; tech-enhanced items simply look sexier and therefore make an assessment platform easier to sell, even if they don’t actually improve assessment.  I’d argue that there are also technology-enabled items, which are distinct, as discussed below.

What is the goal of technology-enhanced items?

The goal is to improve assessment by increasing qualities like reliability/precision, validity, and fidelity. However, a number of TEIs are actually designed more for sales purposes than psychometric purposes. So, how do we know whether TEIs improve assessment? That, of course, is an empirical question that is best answered with an experiment. But let me suggest one metric to address the question: how far does the item go beyond merely reformulating a traditional item format with current user-interface technology? I would define a reformulation of a traditional format as a fake TEI, while going beyond it defines a true TEI.

An alternative nomenclature might be to call the reformulations technology-enhanced items and the true tech usage technology-enabled items (Almond et al., 2010; Bryant, 2017), as the latter would not be possible without technology.

A great example of this is the relationship between a traditional multiple-response item and certain types of drag-and-drop items. There are a number of different ways that drag-and-drop items can be created, but for now, let’s use the example of a format that asks the examinee to drag text statements into a box.

An example is a K-12 assessment item from PARCC that asks the student to read a passage, then presents a list of statements about the story, asking the student to drag all true statements into a box. Take this tech-enhanced drag-and-drop item, for example.

[Image: Brian’s Winter drag-and-drop statements]

Now, consider the following item, often called multiple response.

[Image: Brian’s Winter multiple-response item]

You can see that this item is exactly the same in terms of psychometric interaction: the student is presented with a list of statements and selects those they think are true. The item is scored with integers from 0 to K, where K is the number of correct statements; the integers are often then used to implement the generalized partial credit model for final scoring. This is true regardless of whether the item is presented as multiple response or drag and drop. The multiple-response item, of course, could just as easily be delivered via paper and pencil. Converting it to drag and drop enhances the item with technology, but the interaction of the student with the item, psychometrically, remains the same.
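To make the point concrete, here is a minimal sketch of how such an item could be scored identically under both presentations. The scoring rule and statement sets are assumptions for illustration; operational programs often penalize incorrect selections, and they would feed the resulting integer into a polytomous IRT model such as the generalized partial credit model.

```python
def score_multiple_response(selected: set, correct: set) -> int:
    """Polytomous score from 0 to K: one point per correct statement chosen.

    Deliberately simple for illustration; it ignores incorrect selections.
    """
    return len(selected & correct)

correct_statements = {"A", "C", "D"}   # hypothetical key: K = 3, so scores run 0..3

# The same response earns the same score whether it came from
# checkboxes (multiple response) or a drag-and-drop interface.
print(score_multiple_response({"A", "C"}, correct_statements))        # 2
print(score_multiple_response({"A", "C", "D"}, correct_statements))   # 3
```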

Some True TEIs, or Technology Enabled Items

Of course, the past decade or so has witnessed stronger innovation in item formats. Gamified assessments change how the interaction of person and item is approached, though this is arguably less relevant for high-stakes assessment due to concerns about validity. There are also simulation items. For example, a test for a construction crane operator might provide an interface with crane controls and ask the examinee to complete a task. Even at the K-12 level there can be such items, such as the simulation of a science experiment where the student is given various test tubes or other instruments on the screen.

Both of these approaches are extremely powerful but have a major disadvantage: cost. They are typically custom-designed. In the case of the crane operator exam or even the science experiment, you would need to hire software developers to create this simulation. There are now some simulation-development ecosystems that make this process more efficient, but the items still involve custom authoring and custom scoring algorithms.

To address this shortcoming, there is a new generation of self-authored item types that are true TEIs. By “self-authored” I mean that a science teacher would be able to create these items themselves, just like they would a multiple choice item. The amount of technology leveraged is somewhere between a multiple choice item and a custom-designed simulation, providing a compromise of reduced cost but still increasing the engagement for the examinee. An example of this is shown below from ASC’s Assess.ai assessment platform. A major advantage of this approach is that the items do not need custom scoring algorithms, and instead are typically scored via point integers, which enables the use of polytomous item response theory.

[Image: a self-authored tech-enhanced (graphing) item in Assess.ai]

Are we at least moving forward?  Not always!

There is always pushback against technology, and on this topic the counterexample is the gridded item type. It actually goes in reverse of innovation: it doesn’t take a traditional format and reformulate it for current UI; it ignores the capabilities of current UI (actually, UI of the past 20+ years) and is therefore a step backward. With that item type, students are presented with a bubble sheet from a 1960s-style paper exam, on a computer screen, and asked to fill in the bubbles by clicking on them rather than using a pencil on paper.

Another example is the EBSR item type from the artist formerly known as PARCC. It was a new item type intended to assess deeper understanding, but it did not use any tech-enhancement or -enablement, instead asking two traditional questions in a linked manner. As any psychometrician can tell you, this approach ignored basic assumptions of psychometrics, so you can guess the quality of measurement it put out.

How can I implement TEIs?

It takes very little software development expertise to develop a platform that supports multiple choice items. An item like the graphing one above, though, takes substantial investment. So there are relatively few platforms that can support these, especially with best practices like workflow item review or item response theory. You can try authoring them for free in our Assess.ai assessment platform, or if you have more questions, contact solutions@assess.com.

Estimated reading time: 3 minutes

Automated item generation (AIG) is a paradigm for developing assessment items (test questions), utilizing principles of artificial intelligence and automation. As the name suggests, it tries to automate some or all of the effort involved with item authoring, as that is one of the most time-intensive aspects of assessment development – which is no news to anyone who has authored test questions!

Items can cost up to $2000 to develop, so even cutting the average cost in half could provide massive time/money savings to an organization.

There are two types of automated item generation:

Type 1: Item Templates (Current Technology)

The first type is based on the concept of item templates, which create a family of items using dynamic, insertable variables. There are three stages to this work; for more detail, read this article by Gierl, Lai, and Turner (2012).

  • Authors, or a team, create a cognitive model by isolating exactly what they are trying to assess and the different ways that the knowledge could be presented or evidenced. This might include information such as which variables are important vs. incidental, and what a correct answer should include.
  • They then develop templates for items based on this model, like the example you see below.
  • An algorithm then turns this template into a family of related items, often by producing all possible permutations.

Obviously, you can’t use more than one of these items on a given test form. And in some cases, some of the permutations will be unlikely scenarios or possibly completely irrelevant. But the savings can still be quite real. I saw a conference presentation by Andre de Champlain of the Medical Council of Canada stating that overall efficiency improved by 6x, and that the generated items were higher quality than traditionally written items because the process made the authors think more deeply about what they were assessing and how. He also recommended that template permutations not be automatically moved to the item bank, but instead that each be reviewed by SMEs, for reasons such as those stated above.
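Here is a minimal sketch of the template idea, assuming a simple string template and a brute-force walk over all variable combinations; the dosage scenario and variable values are invented for illustration.

```python
from itertools import product

# Hypothetical item template with dynamic, insertable variables
template = ("A patient weighs {weight} kg and the prescribed dose is "
            "{dose} mg per kg. How many mg should be administered?")

variables = {
    "weight": [50, 60, 70, 80],
    "dose": [2, 5, 10],
}

# Generate the item family: every permutation of the variable values,
# computing the answer key alongside each stem.
names = list(variables)
family = []
for values in product(*variables.values()):
    slots = dict(zip(names, values))
    stem = template.format(**slots)
    key = slots["weight"] * slots["dose"]   # correct answer for this permutation
    family.append((stem, key))

print(f"{len(family)} items generated")     # 4 weights x 3 doses = 12 items
print(family[0])
```

Consistent with the SME-review advice above, a real workflow would route these drafts to reviewers rather than publishing them straight to the item bank.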

You might think, “Hey, that’s not really AI…” But AI is simply doing things that have in the past been done by humans, and the definition gets pushed further every year. Remember, AI used to be just having the Atari play Pong with you!

Type 2: AI Processing of Source Text (Future Technology)

The second type is what the phrase “automated item generation” more likely brings to mind: upload a textbook or similar source to some software, and it spits back drafts of test questions. For example, see this article by von Davier (2019). This technology is still cutting-edge and working through issues. For example, how do you automatically come up with quality, plausible distractors for a multiple-choice item? This might be automated in some cases like mathematics, but in most cases, the knowledge of plausibility lies with content matter experts. Moreover, this approach is certainly not accessible to the typical educator. It is currently in use, but by massive organizations that spend millions of dollars.

How Can I Implement Automated Item Generation?

AIG has been used by the large testing companies for years but is no longer limited to their domain. It is now available off the shelf as part of ASC’s next-generation assessment platform, Assess.ai. Best of all, that component is available at the free subscription level; all you need to do is register with a valid email address.

Assess.ai provides a clean, intuitive interface to implement Type 1 AIG, in a way that is accessible to all organizations. Develop your item templates, insert dynamic fields, and then export the results to review before implementing them in an item banking system, which is also available for free in Assess.ai.

[Image: Assess.ai’s automated item generation template]


If you have worked in the field of assessment and psychometrics, you have undoubtedly encountered the word “standard.” While a relatively simple word, it has the potential to be confusing because it is used in three (and more!) completely different but very important ways. Here’s a brief discussion.

Standard = Cutscore

As noted by the well-known professor Gregory Cizek here, “standard setting refers to the process of establishing one or more cut scores on a test.” The various methods of setting a cutscore, like Angoff or Bookmark, are referred to as standard-setting studies. In this context, the standard is the bar that separates a Pass from a Fail. We use methods like the ones mentioned to determine this bar in as scientific and defensible a fashion as possible, and to give it more concrete meaning than an arbitrarily selected round number like 70%. Selecting a round number like that will likely get you sued, since there is no criterion-referenced interpretation.
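For a flavor of how a cutscore comes out of such a study, here is a minimal sketch of the classic Angoff arithmetic: each judge estimates, for each item, the probability that a minimally competent candidate would answer correctly, and the cutscore is the sum of the item means. The ratings below are invented for illustration; real studies add steps such as discussion rounds and reliability checks.

```python
# Rows = judges, columns = items: estimated probability that a minimally
# competent candidate answers each item correctly.
ratings = [
    [0.70, 0.50, 0.90, 0.60],   # judge 1
    [0.80, 0.40, 0.85, 0.55],   # judge 2
    [0.75, 0.45, 0.95, 0.65],   # judge 3
]

n_judges = len(ratings)
item_means = [sum(col) / n_judges for col in zip(*ratings)]
cutscore = sum(item_means)   # expected raw score of the borderline candidate

print("item means:", [round(m, 2) for m in item_means])
print(f"Angoff cutscore: {cutscore:.2f} out of {len(item_means)}")
```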

Standard = Blueprint

If you work in the field of education, you often hear the term “educational standards.” These refer to the curriculum blueprints for an educational system, which also translate into assessment blueprints, because you want to assess what is in the curriculum. Several important ones in the USA are noted here; perhaps the most common nowadays is the Common Core State Standards, which attempted to standardize the standards across states. These standards exist to standardize the educational system, by teaching what a group of experts has agreed should be taught in, for example, 6th-grade math classes. Note that they don’t state how or when a topic should be taught, merely that 6th-grade math should cover number lines, measurement scales, variables, and so on, sometime in the year.

Standard = Guideline

If you work in the field of professional certification, you hear the term just as often, but in a different context: accreditation standards. The two most common are the National Commission for Certifying Agencies (NCCA) and the ANSI National Accreditation Board (ANAB). These two organizations give a stamp of approval to credentialing bodies, stating that a certification or certificate program is legit. Why? Because there is no law to stop me from buying a textbook on any topic, writing 50 test questions in my basement, and selling it as a Certification. It is completely a situation of caveat emptor, and these organizations help the buyers by giving a stamp of approval that the certification was developed with accepted practices like a job analysis, a standard-setting study, etc.

In addition, there are the professional standards for our field. These are guidelines on assessment in general rather than just credentialing. Two great examples are the AERA/APA/NCME Standards for Educational and Psychological Testing and the International Test Commission’s Guidelines (yes, they switch to that term) on various topics.

Also: Standardized = Equivalent Conditions

The word is also used quite frequently in the context of standardized testing, though it is rarely chopped to the root word “standard.” In this case, it refers to the fact that the test is given under equivalent conditions to provide greater fairness and validity. A standardized test does NOT mean multiple choice, bubble sheets, or any of the other pop connotations that are carried with it. It just means that we are standardizing the assessment and the administration process. Think of it as a scientific experiment; the basic premise of the scientific method is holding all variables constant except the variable in question, which in this case is the student’s ability. So we ensure that all students receive a psychometrically equivalent exam, with equivalent (as much as possible) writing utensils, scrap paper, computer, time limit, and all other practical surroundings. The problem comes with the lack of equivalence in access to study materials, prep coaching, education, and many bigger questions… but those are a societal issue and not a psychometric one.

So despite all the bashing that the term gets, a standardized test is MUCH better than the alternatives of no assessment at all, or an assessment that is not a level playing field and has low reliability. Consider the case of hiring employees: if assessments were not used to provide objective information on applicant skills and we could only use interviews (which are famously subjective and inaccurate), all hiring would be virtually random and the number of incompetent people in jobs would increase a hundredfold. And don’t we already have enough people in jobs where they don’t belong?

Estimated reading time: 5 minutes

What is computerized adaptive testing? Computerized adaptive tests (CATs) are a sophisticated method of test delivery based on item response theory (IRT). Using AI algorithms to personalize the test to every examinee makes the test shorter, more accurate, more secure, and fairer.


What is computerized adaptive testing?

Computerized adaptive testing is an algorithm that drives how a test is delivered.  It is coded into a software platform, using the machine-learning approach of IRT to select items and score examinees.  The algorithm proceeds in a loop until the test is complete.

[Image: the computerized adaptive testing loop]

Adapting By Difficulty and Quantity

Adaptive tests operate by adapting both the difficulty and the quantity of items seen by each examinee.

Difficulty
Most characterizations of adaptive testing focus on how item difficulty is matched to examinee ability. High-ability examinees receive more difficult items, while low-ability examinees receive easier items, which has important benefits for the student and the organization. An adaptive test typically begins by delivering an item of medium difficulty; if you get it correct, you get a tougher item, and if you get it incorrect, you get an easier one. This basic algorithm continues until the test is finished, though it usually includes sub-algorithms for important things like content distribution and item exposure.

Quantity
A less publicized facet of adaptation is the number of items. Adaptive tests can be designed to stop when certain psychometric criteria are reached, such as a specific level of score precision. Some examinees finish very quickly with few items, and adaptive tests are typically about half as long as a conventional test, with at least as much accuracy. Since different examinees receive different numbers of items, these adaptive tests are referred to as variable-length. Obviously, this makes for a massive benefit: cutting testing time in half, on average, can substantially decrease testing costs.

However, some adaptive tests use a fixed length and adapt only item difficulty. This is usually done for public relations reasons, namely the inconvenience of dealing with examinees who feel they were unfairly treated by the CAT, even though it is arguably more fair and valid than a conventional test.
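To make the loop concrete, here is a minimal sketch of a variable-length CAT under the Rasch model, with maximum-information item selection and a grid-based EAP ability estimate. The item bank, the SE stopping rule of 0.40, and the 30-item cap are assumptions for illustration; operational engines add sub-algorithms for content balancing and exposure control.

```python
import math, random

def p_correct(theta: float, b: float) -> float:
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(administered, grid_step=0.1):
    """Grid-based EAP estimate of theta and its posterior SD, with an N(0,1) prior."""
    grid = [-4 + grid_step * i for i in range(int(8 / grid_step) + 1)]
    post = []
    for t in grid:
        like = math.exp(-0.5 * t * t)              # prior density (unnormalized)
        for b, u in administered:                   # multiply in each observed response
            p = p_correct(t, b)
            like *= p if u else (1.0 - p)
        post.append(like)
    total = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / total
    return mean, math.sqrt(var)

def run_cat(bank, true_theta, se_stop=0.40, max_items=30):
    administered, theta, se = [], 0.0, float("inf")
    remaining = list(bank)
    while remaining and len(administered) < max_items and se > se_stop:
        # Maximum-information selection: Rasch information peaks where b is nearest theta
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        u = random.random() < p_correct(true_theta, b)   # simulate the examinee's response
        administered.append((b, u))
        theta, se = eap_estimate(administered)           # re-score, then loop
    return theta, se, len(administered)

random.seed(1)
bank = [random.uniform(-3, 3) for _ in range(200)]       # hypothetical 200-item bank
theta, se, n = run_cat(bank, true_theta=1.0)
print(f"estimate {theta:+.2f} (SE {se:.2f}) after {n} items")
```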

Advantages of computerized adaptive testing

By making the test more intelligent, adaptive testing provides a wide range of benefits.  Some of the well-known advantages of adaptive testing, recognized by scholarly psychometric research, are listed below.  However, the development of an adaptive test is a very complex process that requires substantial expertise in item response theory (IRT) and CAT simulation research.  

Our experienced team of psychometricians can provide your organization with the requisite experience to implement adaptive testing and help your organization benefit from these advantages. Contact us or read this white paper to learn more.

  • Shorter tests, anywhere from a 50% to 90% reduction; reduces cost, examinee fatigue, and item exposure
  • More precise scores: CAT will make tests more accurate
  • More control of score precision (accuracy): CAT ensures that all students will have the same accuracy, making the test much more fair.  Traditional tests measure the middle students well but not the top or bottom students.
  • Increased efficiency
  • Greater test security, because not everyone sees the same form
  • A better experience for examinees, as they only see items relevant for them, providing an appropriate challenge
  • The better experience can lead to increased examinee motivation
  • Immediate score reporting
  • More frequent retesting is possible; minimize practice effects
  • Individual pacing of tests; examinees move at their own speed
  • On-demand testing can reduce printing, scheduling, and other paper-based concerns
  • Storing results in a database immediately makes data management easier
  • Computerized testing facilitates the use of multimedia in items

No, you can’t just subjectively rank items!

Computerized adaptive tests (CATs) are the future of assessment. They operate by adapting both the difficulty and number of items to each individual examinee. The development of an adaptive test is no small feat; it requires five steps integrating the expertise of test content developers, software engineers, and psychometricians.

The development of a quality adaptive test is complex and requires psychometricians experienced in both item response theory (IRT) calibration and CAT simulation research. FastTest can provide you with the psychometricians and the software; if you provide test items and pilot data, we can help you quickly publish an adaptive version of your test.

Step 1: Feasibility, applicability, and planning studies. First, extensive Monte Carlo simulation research must occur, and the results must be formulated as business cases, to evaluate whether adaptive testing is feasible, applicable, or even possible.

Step 2: Develop item bank. An item bank must be developed to meet the specifications recommended by Step 1.

Step 3: Pretest and calibrate the item bank. Items must be pilot-tested on 200-1,000 examinees (depending on the IRT model) and analyzed by a Ph.D. psychometrician.

Step 4: Determine specifications for the final CAT. Data from Step 3 are analyzed to evaluate CAT specifications and to determine the most efficient algorithms, using CAT simulation software such as CATSim.

Step 5: Publish live CAT. The adaptive test is published in a testing engine capable of fully adaptive tests based on IRT. Want to learn more about our one-of-a-kind model? Click here to read the seminal article by two of our psychometricians. More adaptive testing research is available here.
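As a taste of the simulation work in Steps 1 and 4, here is a back-of-envelope sketch of one feasibility question: roughly how many items does a CAT need to reach a target precision? It uses the standard relationship SE(theta) ≈ 1/sqrt(test information); the average per-item information value is an assumption for illustration, and a real study would run CATSim-style simulations over the actual calibrated bank.

```python
import math

# Assumed average Fisher information contributed by each administered item
# near the examinee's theta (a plausible value for well-targeted Rasch items).
avg_item_info = 0.20

for target_se in (0.40, 0.30, 0.25):
    # SE = 1 / sqrt(n * info)  =>  n = 1 / (info * SE^2)
    n_items = math.ceil(1 / (avg_item_info * target_se ** 2))
    print(f"target SE {target_se:.2f} -> roughly {n_items} items")
```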

What do I need for adaptive testing?

Minimum requirements:

  • A large item bank piloted on at least 500 examinees
  • 1,000 examinees per year
  • Specialized software for IRT calibration and CAT simulation
  • Staff with a Ph.D. in psychometrics or an equivalent level of experience. Or, leverage our internationally recognized expertise in the field.
  • Items (questions) that can be scored objectively correct/incorrect in real-time
  • An item banking system and CAT delivery platform
  • Financial resources: Because it is so complex, the development of a CAT will cost at least $10,000 (USD) — but if you are testing large volumes of examinees, it will be a significant positive investment. If you pay $20/hour for proctoring seats and cut a test from 2 hours to 1 hour for just 1,000 examinees… that’s a $20,000 savings.

Adaptive testing: Resources for further reading

Visit the links below to learn more about adaptive testing.  

How can I start developing a CAT?

Contact us for a free account in our industry-leading CAT platform.

One of the most clichéd phrases associated with assessment is “teaching to the test.” I’ve always hated this phrase, because it is only used in a derogatory manner, almost always by people who do not understand the basics of assessment and psychometrics. I recently saw it mentioned in this article on PISA, and that was one time too many, especially since it was used in an oblique, vague, and unreferenced manner.

So, I’m going to come out and say something very unpopular: in most cases, TEACHING TO THE TEST IS A GOOD THING.


Why teaching to the test is usually a good thing

If the test reflects the curriculum – which any good test will – then someone who is teaching to the test is teaching to the curriculum. Which, of course, is the entire goal of teaching. The phrase “teaching to the test” is used in an insulting sense, especially because the alliteration is resounding and sellable, but it’s really not a bad thing in most cases. If a curriculum says that 4th graders should learn how to add and divide fractions, and the test evaluates this, what is the problem? Especially if the test uses modern methodology like adaptive testing or tech-enhanced items to make the process more engaging and instructional, rather than oversimplifying to a text-only multiple-choice question on paper bubble sheets.

In the world of credentialing assessment, this is an extremely important link. Credential tests start with a job analysis study, which surveys professionals to determine what they consider to be the most important and frequently used skills on the job. This data is then transformed into test blueprints. Instructors for the profession, as well as aspiring students studying to pass the test, then focus on what is in the blueprints, which still reflects the skills that are most important and frequently used on the job!


So what is the problem then?

Now, telling teachers how to teach is more concerning, and more likely to be a bad thing.  Finland does well because it gives teachers lots of training and then power to choose how they teach, as noted in the PISA article.

As a counterexample, my high school math department issued an edict starting my sophomore year that all teachers had to use the “Chicago Method.” It was pure bunk, based on the notion that students should be doing as much busy work as possible instead of the teachers actually teaching. I suspect some salesman convinced the department head to make the switch so that the school would buy a thousand brand-new textbooks. The method makes some decent points (here’s an article from, coincidentally, when I was a sophomore in high school), but I think we ended up with a bastardization of it, as the edict was primarily:

  1. Assign students to read the next chapter in class (instead of teaching them!); go sit at your desk.
  2. Assign students to do at least 30 homework questions overnight, and come back tomorrow with any questions they have.  
  3. Answer any questions, then assign them the next chapter to read.  Whatever you do, DO NOT teach them about the topic before they start doing the homework questions.  Go sit at your desk.

Isn’t that preposterous? Unsurprisingly, after two years of this, I went from being a leader of the Math Team to someone who explicitly said, “I am never taking math again.” And indeed, I managed to avoid all math during my senior year of high school and my first year of college. Thankfully, I had incredible professors in my years at Luther College, leading to me loving math again, earning a math major, and applying to grad school in psychometrics. This shows the effect that can come from “telling teachers how to teach,” or in this case, specifically and bizarrely, to NOT teach.


What about all the bad tests out there?

Now, let’s get back to the assumption that a test does reflect the curriculum/blueprints. There are, most certainly, plenty of cases where an assessment is not designed or built well. That’s an entirely different problem, and an entirely valid concern; I have seen a number of these in my career. This danger is why we have international standards on assessments, like AERA/APA/NCME and NCCA. These provide guidelines on how a test should be built, sort of like how you need to build a house according to building code rather than just throwing up some walls and a roof.

For example, there is nothing stopping me from identifying a career in which a lot of people are looking to gain an edge over one another to get a better job… then buying a textbook, writing 50 questions in my basement, and throwing it up on a nice-looking website to sell as a professional certification. I might sell it for $395, and if I get just 100 people to sign up, I’ve made $39,500! This violates just about every NCCA guideline, though. If I wanted to get a stamp of approval that my certification was legit – as well as making it legally defensible – I would need to follow the NCCA guidelines.

My point here is that there are definitely bad tests out there, just like there are millions of other bad products in the world. It’s a matter of caveat emptor. But just because you had some cheap furniture in college that broke right away doesn’t mean you swear off all furniture. You stay away from bad furniture.

There’s also the problem of tests being misused, but again, that’s not a problem with the test itself; it means someone making decisions is uninformed. It could actually be the best test in the world, with 100% precision, but if it is used for an invalid application then it’s still not a good situation – for example, if you took a very well-made exam for high school graduation and started using it for employment decisions with adults. Psychometricians call this validity: we need evidence to support the intended use of the test and the interpretations of scores. It is the #1 concern of assessment professionals, so if a test is being misused, it’s probably by someone without a background in assessment.


So where do we go from here?

Put it this way: if an overweight person is trying to become fitter, is success more likely to come from changing diet and exercise habits, or from complaining about their bathroom scale? Complaining unspecifically about a high school graduation assessment is not going to improve education; let’s change how we educate our children to prepare them for that assessment, and ensure that the assessment reflects the goals of the education. At the same time, of course, we need to invest in making the assessment as sound and fair as we can – which is exactly why I am in this career.

The field of psychometrics is definitely a small niche in the academic world, despite being an integral part of everyone’s life. When I’m trying to explain what I do to people from outside the field, I’m often asked something like, “Where do you even go to study something like that?” I’m also frequently asked by people already in the field where they can go to get a graduate degree in psychometrics, especially covering sophisticated topics like item response theory or adaptive testing.

Well, there are indeed a good number of Ph.D. programs in psychometrics, though they rarely appear with that straightforward name, as you can see below.  This can make them tough to find even if you are specifically looking for them.

Note: This list is not intended to be comprehensive, but rather a sampling of the most well-known or unique programs.

  If you want to do deeper research and are actually shopping for a grad school, I highly recommend you check out a comprehensive list of programs on the NCME website.   I also recommend the SIOP list of grad programs; they are for I/O psychology but many of them have professors with expertise in things like assessment validation or item response theory.

How do you choose a graduate degree in psychometrics or assessment?

Here’s an oversimplification of how I see the selection of education…

  1. When you are in high school and selecting a university or college, you are selecting a school.
  2. When you are 18-20 and selecting a major, you are selecting a department.
  3. When you are selecting where to pursue a Master’s, you are selecting a program.
  4. When you are selecting where to pursue a Ph.D., you are selecting an advisor.

The key point: when you do a Ph.D., you are going to spend a lot of time working one-on-one with your advisor, both on the dissertation and likely on research projects. It is therefore vital that you select someone who not only aligns with your interests (otherwise you’ll be bored and disengaged) but whom you also quite simply like enough at a personal level to work with one-on-one for several years!

University of Minnesota: Quantitative/Psychometrics Program (Psychology) and Quantitative Foundations of Educational Research (Education)

I’m partial to this one, since it is where I completed my Ph.D. with Prof. David J. Weiss in the Psychology Department. The UMN is interesting in that it actually has two separate graduate programs in psychometrics: the one in Psychology, which has since become more focused on quantitative psychology, and one in the Education department.

Website: https://cla.umn.edu/psychology/graduate/areas-specialization/quantitativepsychometric-methods-qpm

http://www.cehd.umn.edu/edpsych/programs/qme/

University of Massachusetts: Research, Educational Measurement, and Psychometrics (REMP)

For many years, if you wanted to learn item response theory, you read Item Response Theory: Principles and Applications by Hambleton and Swaminathan (1985). These were two longtime professors at UMass, which speaks to the quality of the program. Both have since retired, but the faculty remains excellent. Also note that the program website has a nice page on psychometric resources and software.

Website: https://www.umass.edu/remp/

University of Iowa: Center for Advanced Studies in Measurement and Assessment

This program is in the Education department, and has the advantage of being in one of the epicenters of the industry: the testing giant ACT is headquartered only a few miles away, the giant Pearson has an office in town, and the Iowa Test of Basic Skills is an offshoot of the university itself.  Like UMass, Iowa also has a website with educational materials and useful software.

Website: https://education.uiowa.edu/centers/casma

University of Wisconsin-Madison

UW has well-known professors like Daniel Bolt and James Wollack.  Plus, Madison is well-known for being a fun city given its small size.  The large K-12 testing company, Renaissance Learning, is headquartered only a few miles away.

https://edpsych.education.wisc.edu/category/quantitative-methods/

University of Nebraska – Lincoln: Quantitative, Qualitative & Psychometric Methods

For many years, the cornerstones of this program were the husband-and-wife duo of James Impara and Barbara Plake.  They’ve now retired, but excellent new professors have joined.  In addition, UNL is the home of the Buros Institute.

https://cehs.unl.edu/edpsych/quantitative-qualitative-psychometric-methods/

University of Kansas: Research, Evaluation, Measurement, and Statistics

Not far from Lincoln, NE is Lawrence, Kansas.  The program here has been around a long time, with excellent faculty.  Students have an option for practical experience working at the Achievement and Assessment Institute.

https://epsy.ku.edu/academics/educational-psychology-research/phd/overview-benefits

Michigan State University: Measurement and Quantitative Methods

Like most of the rest of these programs, it is in a vibrant college town.  The focus is more on quantitative methods than assessment.

https://education.msu.edu/cepse/mqm/

UNC-Greensboro: Educational Research, Measurement, and Evaluation

While most programs listed here are in the northern USA, this one is in the southern part of the country, where such programs are smaller and fewer.  UNCG is quite strong, however.

https://soe.uncg.edu/academics/departments/erm/erm-programs/ph-d-in-educational-research-measurement-and-evaluation/

University of Texas: Quantitative Methods

UT, like some of the other programs, has an advantage in that the educational assessment arm of Pearson is located there.

https://education.utexas.edu/departments/educational-psychology/graduate-programs/quantitative-methods

Boston College: Measurement, Evaluation, Statistics, and Assessment (MESA)

This program is involved in international research such as TIMSS & PIRLS.

https://www.bc.edu/bc-web/schools/lynch-school/academics/departments/mesa.html

Morgan State University: Graduate Program in Psychometrics

Morgan State is unique in that it is a historically black institution with an excellent program dedicated to psychometrics.

https://www.morgan.edu/psychometrics

Fordham University: Psychometrics and Quantitative Psychology

Fordham has an excellent program, located in New York City.

https://www.fordham.edu/info/21665/phd_in_psychometrics_and_quantitative_psychology

James Madison University: Assessment and Measurement

While not as large as the major public universities on this list, JMU has a strong, practically focused program in psychometrics.

https://www.jmu.edu/grad/programs/snapshots/psychology-assessment-and-measurement.shtml

Outside the US

University of Alberta: Measurement, Evaluation, and Data Sciences

This is arguably the leading program in all of Canada.

https://www.ualberta.ca/educational-psychology/graduate-programs/measurement-evaluation-and-data-sciences/index.html

University of British Columbia: Measurement, Evaluation, and Research Methodology

UBC is home to Bruno Zumbo, one of the most prolific researchers in the field.

http://ecps.educ.ubc.ca/program/measurement-evaluation-and-research-methodology/

University of Twente: Research Methodology, Measurement, and Data Analysis

For decades, Twente has been the center of psychometrics in Europe, with professors like Wim van der Linden, Theo Eggen, Cees Glas, and Bernard Veldkamp.  It’s also linked with Cito, the premier testing company in Europe, which provides excellent opportunities to apply your skills.

https://www.utwente.nl/en/bms/omd/

University of Amsterdam: Psychological Methods

This program has a number of well-known professors, with expertise in both psychometrics and quantitative psychology.

Website: https://psyres.uva.nl/content/research-groups/programme-group-psychological-methods/programme-group-psychological-methods.html?cb

University of Cambridge: The Psychometrics Centre

The Psychometrics Centre at Cambridge includes professors John Rust and David Stillwell.  It hosted the 2015 IACAT conference and is home to the open-source CAT platform Concerto.

https://www.psychometrics.cam.ac.uk/

KU Leuven: Research Group of Quantitative Psychology and Individual Differences

This is home to well-known researchers such as Paul De Boeck.

https://ppw.kuleuven.be/okp/home/

University of Western Australia: Pearson Psychometrics Laboratory

This is home to David Andrich, best known for the Rasch Rating Scale Model.

Website: http://www.education.uwa.edu.au/ppl

University of Oslo: Assessment, Measurement, and Evaluation

This program provides a rare opportunity to study assessment and psychometrics in the Nordic countries.

Website: https://www.uio.no/english/studies/programmes/assessment-evaluation-master

Online

Very few programs offer graduate training in psychometrics that is 100% online.  Here’s the only one I know of; if you know of another, please get in touch with me.

The University of Illinois at Chicago: Measurement, Evaluation, Statistics, and Assessment

This program is of particular interest because it has an online Master’s program, which allows you to earn a high-quality graduate degree in psychometrics from just about anywhere in the world.  One of my colleagues here at ASC recently enrolled in this program.

Website: https://education.uic.edu/academics-admissions/programs/measurement-evaluation-statistics-and-assessment-mesa

We hope the article helps you find the best institution to pursue your graduate degree in psychometrics.

Artificial intelligence (AI) and machine learning (ML) have become buzzwords over the past few years.  As I have already written about, they are actually old news in the field of psychometrics.  Factor analysis is a classic example of ML, and item response theory also qualifies as ML.  Computerized adaptive testing is an application of AI to psychometrics that dates back to the 1970s.

One thing that is very different about the world of AI/ML today is the massive power available in free platforms like R, Python, and TensorFlow.  I’ve been thinking a lot over the past few years about how these tools can impact the world of assessment.  A straightforward application is automated essay scoring: a common way to approach that problem is natural language processing with the “bag of words” model, using the document-term matrix (DTM) as the predictors in a model with essay score as the criterion variable.  Surprisingly simple.  This got me wondering where else we could apply that sort of modeling.  Obviously, student response data on selected-response items provides a ton of data, but the research questions are less clear.  So, I turned to the topic that I think has the next largest set of data and text: item banks.
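To make the “bag of words” idea concrete, here is a tiny hand-rolled illustration in R (the two “documents” are made up for this example).  Each document becomes a row of the document-term matrix, and the word counts in the columns are the predictors you would feed to a model.

```r
# Two made-up documents
docs <- c("the cell divides by mitosis",
          "mitosis divides the cell nucleus")

# Vocabulary = all unique words across the documents
words <- unique(unlist(strsplit(docs, " ")))

# Count how often each word appears in each document
dtm <- t(sapply(strsplit(docs, " "),
                function(tokens) table(factor(tokens, levels = words))))
dtm
#      the cell divides by mitosis nucleus
# [1,]   1    1       1  1       1       0
# [2,]   1    1       1  0       1       1
```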

Step 1: Text Mining

The first step was to explore tools for text mining in R.  I found this well-written and clear tutorial on the text2vec package and used it as my springboard.  Within minutes I was able to get a document-term matrix, and in a few more minutes I was able to prune it.  This DTM alone can provide useful information to an organization about its item bank, but I wanted to delve further.  Can the DTM predict item quality?
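Here is a minimal sketch of that workflow with text2vec.  The two-item data frame `items` is a hypothetical stand-in for a real item bank export, and the pruning threshold is illustrative; with a real bank you would use stricter values such as term_count_min = 5.

```r
library(text2vec)

# Hypothetical item bank export: one row per item, stem text as a column
items <- data.frame(
  item_id   = c("ITEM001", "ITEM002"),
  stem_text = c("What is the area of a circle with radius 3?",
                "Solve for x: 2x + 5 = 11"),
  stringsAsFactors = FALSE
)

# Tokenize the item stems
it <- itoken(items$stem_text,
             preprocessor = tolower,
             tokenizer    = word_tokenizer,
             ids          = items$item_id)

# Build the vocabulary, then prune rare terms (threshold is illustrative)
vocab <- create_vocabulary(it)
vocab <- prune_vocabulary(vocab, term_count_min = 1)

# Create the document-term matrix: a sparse matrix, items x terms
vectorizer <- vocab_vectorizer(vocab)
dtm <- create_dtm(it, vectorizer)
dim(dtm)
```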

Step 2: Fit Models

To do this, I utilized both the caret and glmnet packages to fit models.  I love the caret package, but if you search around you’ll find it has a problem with sparse matrices, which is exactly what a DTM is.  One blog post I found said that anyone with a sparse matrix is pretty much stuck using glmnet.

I tried a few models on a small item bank of 500 items from a friend of mine, and my adjusted R-squared for the prediction of IRT parameters (as an index of item quality) was 0.53, meaning that I could account for more than half the variance in item quality just by knowing some of the common words in each item’s stem.  I wasn’t even using the answer texts, n-grams, or additional information like author and content domain.
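For the curious, here is a minimal sketch of that modeling step, assuming `dtm` is a document-term matrix from Step 1 for a realistically sized bank and `irt_b` is a vector of calibrated IRT difficulty parameters, one per item (both names are hypothetical).  A real analysis should also evaluate on a held-out test set rather than the training data.

```r
library(glmnet)

# Stand-in for real calibrated IRT b parameters, one per item
irt_b <- rnorm(nrow(dtm))

# Cross-validated elastic net; glmnet accepts the sparse DTM directly
cvfit <- cv.glmnet(x = dtm, y = irt_b, alpha = 0.5, family = "gaussian")

# Predict at the lambda with the lowest cross-validated error
preds <- as.numeric(predict(cvfit, newx = dtm, s = "lambda.min"))

# Squared correlation between predicted and observed difficulty
rsq <- cor(preds, irt_b)^2
rsq
```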

Want to learn more about your item banks?

I’d love to dive even deeper into this issue.  If you have a large item bank and would like to work with me to analyze it, so that you can provide better feedback and direction to your item writers and test developers, drop me a message at nthompson@assess.com!  This could directly impact the efficiency of your organization and the quality of your assessments.