
The Goal: Quality Assessment

On March 31, 2017, I read an article in The Industrial-Organizational Psychologist (the journal published by the Society for Industrial and Organizational Psychology) that really resonated with me:

“Has Industrial-Organizational Psychology Lost Its Way?” by Deniz S. Ones, Robert B. Kaiser, Tomas Chamorro-Premuzic, and Cicek Svensson

Why? Because many of their points apply to psychometrics and its current state of innovation. They summarize their concerns in six bullet points:

  • Overemphasis on theory
  • Fixation on trivial methodological minutiae
  • Suppression of exploration and innovation
  • Obsession with publication while ignoring practical issues
  • Distraction by fads
  • Loss of real-world influence to other fields


So What Is Psychometrics Supposed to Be Doing?


What has irked me most over the years is the overemphasis on theory and minutiae rather than solving practical problems. This is why I stopped attending NCME conferences and instead attended practitioner-oriented ones like ATP (the Association of Test Publishers conference). My goal is to improve the quality of assessment worldwide. Developing esoteric differential item functioning (DIF) methodology, new multidimensional IRT models, or CAT sub-algorithms that improve efficiency by only 0.5% won’t significantly impact the many poor assessments out there or the flawed decisions they lead to. There is a place for research, but practical improvements are underserved.

Example: Credentialing

Credentialing is a prime example of where assessment quality matters. Many licensure and certification tests used to make critical decisions are poorly constructed. Some organizations simply don’t care, while others face external constraints. I once worked with a Department of Agriculture in a western U.S. state where the legislature mandated licensure tests for professions with as few as three test-takers per year.

How do we encourage such groups to follow best practices? Traditionally, they would need to hire expensive consultants, which isn’t feasible for small organizations. Why spend $5,000 on an Angoff study for just three candidates per year? I don’t blame them for avoiding it, but the result is low-quality tests certifying unqualified practitioners.
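To make this concrete: the arithmetic behind a modified-Angoff study is simple, even if running one well is not. Below is a minimal Python sketch with hypothetical subject-matter-expert (SME) ratings; each rating is the judged probability that a minimally competent candidate answers the item correctly, and the cutscore is the sum of the mean ratings across judges.

```python
# Minimal sketch of modified-Angoff cutscore arithmetic (hypothetical ratings).
ratings = {  # 3 SMEs x 5 items; each value is P(minimally competent candidate answers correctly)
    "sme1": [0.60, 0.75, 0.40, 0.85, 0.55],
    "sme2": [0.65, 0.70, 0.50, 0.80, 0.60],
    "sme3": [0.55, 0.80, 0.45, 0.90, 0.50],
}

n_items = len(next(iter(ratings.values())))
# Mean rating per item across SMEs
item_means = [sum(r[i] for r in ratings.values()) / len(ratings) for i in range(n_items)]
cutscore = sum(item_means)  # expected raw score of a borderline candidate

print("Item means:", [round(m, 2) for m in item_means])
print(f"Angoff cutscore: {cutscore:.2f} out of {n_items}")
```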



An Opportunity for Innovation


There is still innovation happening in our field, but it is often misdirected. Large corporations hire fresh PhDs in psychometrics and then assign them repetitive tasks like running SAS scripts or conducting Angoff studies over and over. I experienced this firsthand, and after 18 months, I was eager for something more.

Worse, much of the innovation isn’t focused on better measurement. I once saw someone advocating for gamification in assessments, claiming measurement precision didn’t matter. This is, of course, ridiculous. An appealing UI is meaningless if the results are just random numbers.

Innovation in Psychometrics at ASC

At ASC, much of our innovation is aimed at solving these issues. I developed Iteman 4 and Xcalibre 4 to allow organizations to generate professional psychometric reports without hiring expensive consultants. Unlike other programs that produce text files or Excel spreadsheets, our tools generate reports directly in Microsoft Word, making them easy to use.
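To illustrate the kind of numbers such a report contains, here is a hedged sketch of two staple classical item statistics on simulated data (this is not Iteman’s actual implementation): item difficulty, the proportion of examinees answering correctly (P), and the corrected point-biserial, the correlation between an item and the rest of the test.

```python
import numpy as np

# Hypothetical sketch of two classical item statistics that item-analysis
# reports typically include. Response data simulated from a Rasch model.
rng = np.random.default_rng(42)
n_examinees, n_items = 200, 10
theta = rng.normal(size=(n_examinees, 1))      # simulated examinee abilities
b = rng.normal(size=(1, n_items))              # simulated item difficulties
prob = 1 / (1 + np.exp(-(theta - b)))          # Rasch probability of correct response
scores = (rng.random((n_examinees, n_items)) < prob).astype(int)

total = scores.sum(axis=1)
for item in range(n_items):
    p = scores[:, item].mean()                 # difficulty: proportion correct
    rest = total - scores[:, item]             # item-total score, excluding this item
    r_pb = np.corrcoef(scores[:, item], rest)[0, 1]  # corrected point-biserial
    print(f"Item {item + 1}: P = {p:.2f}, point-biserial = {r_pb:.2f}")
```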

Similarly, our FastTest platform was designed to modernize assessment processes. Conducting an Angoff study with items on a projector and SMEs writing ratings on paper is outdated. FastTest allows this to be done online, enabling remote SME participation and reducing costs. Want to publish an adaptive (CAT) exam without coding? We’ve built that directly into the test publishing interface.

Back to My Original Point

The question was: What should psychometrics be doing? My answer: improving assessment. Advanced mathematical research is useful, but only for the top 5% of organizations. It’s our duty as psychometricians to find ways to help the other 95%.

The Future: Automation in Psychometrics

The future lies in automation. Iteman 4, Xcalibre 4, and FastTest were effectively machine learning tools before the term became trendy. Other fields, like Big Data, are gaining influence by applying concepts psychometrics has used for decades. Item Response Theory, for example, is a form of machine learning that has been around for 50 years!
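To make that comparison concrete, here is the three-parameter logistic (3PL) item response function in a short Python sketch. Like a logistic-regression classifier, it maps a latent predictor (ability, theta) to the probability of a correct response through parameters estimated from data: discrimination (a), difficulty (b), and pseudo-guessing (c). The parameter values below are illustrative.

```python
import math

def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL IRT model: probability that an examinee with ability theta answers
    correctly, given the item's discrimination (a), difficulty (b), and
    pseudo-guessing (c) parameters."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# An average examinee (theta = 0) on a moderately hard item:
print(p_correct_3pl(theta=0.0, a=1.2, b=0.5, c=0.20))  # ~0.48
```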

If you’re looking to improve your assessment practices, ASC’s tools can help. Contact us today to learn how automation and innovation can enhance your psychometric processes.





ASC is proud to announce that we have successfully passed an audit for FISMA / FedRAMP Moderate, demonstrating the extremely high security standards of our online assessment platform!  FISMA and FedRAMP are security frameworks required for providing cloud-based software to the United States government, based on the National Institute of Standards and Technology (NIST) controls outlined in NIST SP 800-53.  More information is below.

What does it mean that we achieved FISMA / FedRAMP?

If you are a US state or federal government entity, this means that it is much easier to utilize ASC’s powerful software for item banking, online testing, and psychometrics.  You can serve as a sponsor for an Authority to Operate (ATO).  If you are not such an entity, it means you can rest assured that ASC’s commitment to security is strong enough that it meets such stringent standards.

Many aspects go into building a platform of this quality: code quality, code review, user roles, separation of concerns, staff access to servers, and tracking of tickets and code releases.  On top of that, there is the substantial investment of a third-party audit.

More information on FISMA / FedRAMP

https://www.paloaltonetworks.com/cyberpedia/difference-between-fisma-and-fedramp

https://www.fedramp.gov/program-basics/

https://foresite.com/blog/fisma-vs-fedramp-and-nist-making-sense-of-government-compliance-standards/

Yes, I’d like to learn more.

Please contact us for a software demo or request a trial account.  We’d love to hear your requirements!



The adaptive SAT (Scholastic Aptitude Test) exam was announced in January 2022 by the College Board, with the goal of modernizing the test and making it more widely available by migrating from paper-and-pencil to computerized delivery.  Moreover, it would make the test “adaptive.”  But what does it mean to have an adaptive SAT?  How does adaptive testing work, and why does it make tests more secure, efficient, accurate, and fair?

Click here to take an example Adaptive SAT (Math)


What is the SAT?

The SAT is the most commonly used exam for university admissions in the United States, though the ACT ranks a close second.  Decades of research have shown that it accurately predicts important outcomes, such as 4-year graduation rates and college GPA.  Moreover, it provides incremental validity over other predictors, such as high school GPA.  The adaptive SAT exam will use algorithms to make the test shorter, smarter, and more accurate.
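As a rough illustration of what incremental validity means (using simulated, hypothetical data, not actual SAT research), the sketch below compares the variance in college GPA explained by high school GPA alone versus high school GPA plus a test score; the gain in R-squared is the incremental validity.

```python
import numpy as np

# Hypothetical illustration of incremental validity (simulated data, not SAT results):
# does a test score improve prediction of college GPA beyond high school GPA alone?
rng = np.random.default_rng(0)
n = 500
hs_gpa = rng.normal(3.0, 0.5, n)
test = rng.normal(1050, 200, n) + 100 * (hs_gpa - 3.0)   # test score correlated with HS GPA
college_gpa = 0.4 * hs_gpa + 0.001 * test + rng.normal(0, 0.3, n)

def r_squared(predictors, y):
    X = np.column_stack([np.ones(len(y))] + predictors)  # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

r2_gpa = r_squared([hs_gpa], college_gpa)
r2_both = r_squared([hs_gpa, test], college_gpa)
print(f"R2 (HS GPA): {r2_gpa:.3f}   R2 (HS GPA + test): {r2_both:.3f}   gain: {r2_both - r2_gpa:.3f}")
```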

The new version of the SAT has three sections: Math, Reading, and Writing/Language, which are administered separately from a psychometric perspective. The Reading section tests comprehension and analysis through passages from literature, historical documents, social sciences, and natural sciences. The Writing and Language section evaluates grammar, punctuation, and editing skills, asking students to improve sentence structure and word choice. The Math section covers algebra, problem-solving, data analysis, and some advanced math topics, divided into calculator and no-calculator portions. Each section measures critical thinking and problem-solving abilities, contributing to the overall score. The optional Essay was removed in 2021.

Digital Assessment

The new SAT with adaptive testing is being called the “Digital SAT” by the College Board.  Digital assessment, also known as electronic assessment or computer-based testing, refers to the delivery of exams via computers.  It is sometimes called online assessment or internet-based assessment as well, though not all software platforms are online; some remain secured on local networks (LANs).

What is “adaptive”?

When a test is adaptive, it means that it is being delivered with a computer algorithm that will adjust the difficulty of questions based on an individual’s performance.  If you do well, you get tougher items.  If you do not do well, you get easier items.

But while this seems straightforward and logical on the surface, there is a host of technical challenges to this.  And, as researchers have delved into those challenges over the past 50 years, they have developed several approaches to how the adaptive algorithm can work.

  1. Adapt the difficulty after every single item, aka item-level CAT
  2. Adapt the difficulty in blocks of items (sections), aka MultiStage Testing
  3. Adapt the test in entirely different ways (e.g., decision trees based on machine learning models, or cognitive diagnostic models)

There are plenty of famous exams which use the first approach, including the NWEA MAP test and the Graduate Management Admission Test (GMAT).  But the SAT uses the second approach.  There are several reasons to do so, an important one being that it allows you to use “testlets,” which are groups of items delivered together.  For example, you probably remember test questions with a reading passage and 3-5 attached questions; you can’t do that if you are picking a new standalone item after each response, as with Approach #1.
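As a toy sketch of that first, item-by-item approach (greatly simplified: real CATs select items by IRT information and estimate ability with maximum likelihood or Bayesian methods, and the item bank, step size, and responses here are all hypothetical):

```python
# Toy item-level adaptation: after each answer, nudge the ability estimate
# and give the unused item whose difficulty is closest to it.

def next_item(ability: float, item_bank: dict, used: set) -> str:
    candidates = {k: v for k, v in item_bank.items() if k not in used}
    return min(candidates, key=lambda k: abs(candidates[k] - ability))

item_bank = {"Q1": -1.0, "Q2": -0.5, "Q3": 0.0, "Q4": 0.5, "Q5": 1.0}  # item difficulties
ability, used, step = 0.0, set(), 0.6

for answered_correctly in [True, True, False]:  # hypothetical responses
    item = next_item(ability, item_bank, used)
    used.add(item)
    ability += step if answered_correctly else -step
    print(f"Gave {item} (b = {item_bank[item]:+.1f}); new ability estimate {ability:+.1f}")
```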

So how does it work?  Each Adaptive SAT subtest will have two sections.  An examinee finishes Section 1, and then, based on their performance, receives a Section 2 that is tailored to them.  It is not just easy vs. hard, either; there might be 30 possible Section 2s (10 each of Easy, Medium, and Hard), or variations in between.
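A minimal sketch of that two-stage routing logic is below. The section length, score thresholds, and module names are hypothetical, not the SAT’s actual specification.

```python
# Sketch of two-stage multistage routing: the Section 1 score routes the
# examinee to an Easy, Medium, or Hard Section 2 module.

def route_section2(section1_score: int, max_score: int = 27) -> str:
    pct = section1_score / max_score
    if pct < 0.40:
        return "Section 2 - Easy module"
    elif pct < 0.70:
        return "Section 2 - Medium module"
    return "Section 2 - Hard module"

print(route_section2(10))  # -> Easy module
print(route_section2(22))  # -> Hard module
```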

How do we fairly score the results if students receive different questions?  That issue has long been addressed by item response theory.  Examinees are scored with a machine learning model (item response theory) that takes into account not just how many items they answered correctly, but which items, and how difficult and discriminating those items are.  This is nothing new; it has been used by many large-scale assessments since the 1980s.
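To sketch the idea under the simplest IRT model (the Rasch model): scoring means finding the ability value, theta, that makes the observed response pattern most likely, given each item’s difficulty. The difficulties and responses below are hypothetical, and the grid search stands in for the numerical optimization real systems use.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Rasch model: probability of a correct response given ability and difficulty."""
    return 1 / (1 + math.exp(-(theta - b)))

def log_likelihood(theta: float, responses) -> float:
    # responses: list of (item difficulty, answered correctly?) pairs
    return sum(
        math.log(rasch_p(theta, b) if correct else 1 - rasch_p(theta, b))
        for b, correct in responses
    )

# Correct on the easier items, wrong on the harder ones (hypothetical):
responses = [(-1.0, True), (0.0, True), (0.5, False), (1.5, False)]
grid = [i / 100 for i in range(-300, 301)]  # candidate theta values
theta_hat = max(grid, key=lambda t: log_likelihood(t, responses))
print(f"Maximum-likelihood ability estimate: {theta_hat:.2f}")
```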

If you want to delve deeper into learning about adaptive algorithms, here is a detailed article.


Why an adaptive SAT?

Decades of research have shown that adaptive testing has well-documented benefits.  It requires fewer items to achieve the same level of score accuracy, which means shorter exams for everyone.  It is also more secure, because not everyone sees the same items in the same order.  It can produce a more engaging assessment as well, keeping top performers challenged and preventing lower performers from checking out after being frustrated by overly difficult items.  And, of course, digital assessment itself has many advantages, such as faster score turnaround and the use of tech-enhanced items.  So the migration to an adaptive SAT, on top of a digital one, will be beneficial for students.


The California Department of Human Resources (CalHR, calhr.ca.gov/) has selected Assessment Systems Corporation (ASC, assess.com) as its vendor for an online assessment platform. CalHR is responsible for the personnel selection and hiring of many job roles for the State, and delivers hundreds of thousands of tests per year to job applicants. CalHR seeks to migrate to a modern cloud-based platform that allows it to manage large item banks, quickly publish new test forms, and deliver large-scale assessments that align with modern psychometrics like item response theory (IRT) and computerized adaptive testing (CAT).

Assess.ai as a solution

ASC’s landmark assessment platform Assess.ai was selected as a solution for this project. ASC has been providing computerized assessment platforms with modern psychometric capabilities since the 1980s, and released Assess.ai in 2019 as a successor to its industry-leading platform FastTest. It includes modules for item authoring, item review, automated item generation, test publishing, online delivery, and automated psychometric reporting.

Read the full article here.


Nathan Thompson, Ph.D., was recently invited to talk about ASC and the future of educational assessment on the EdNorth EdTech Podcast.

EdNorth is an association dedicated to continuing the long history of innovation in educational technology that has been rooted in the Twin Cities of Minnesota (Minneapolis / Saint Paul). Click below to listen online, or find it on Apple or other podcast aggregators.

Dr. Thompson discusses the history of ASC, ASC’s mission to improve assessment with quality psychometrics, and how AI and automation are being used more often – even though they have been part of the psychometrics field for a century.

Thank you to Dave Swerdlick and the team at EdNorth for the opportunity to speak!