
ASC is proud to announce that we have successfully passed an audit for FISMA / FedRAMP Moderate, demonstrating the high security standards of our online assessment platform!  FISMA and FedRAMP are security compliance frameworks required for providing cloud-based software to the United States government; both are based on the National Institute of Standards and Technology (NIST) controls outlined in NIST SP 800-53.  More information is below.

What does it mean that we achieved FISMA / FedRAMP?

If you are a US state or federal government entity, this means that it is much easier to use ASC’s powerful software for item banking, online testing, and psychometrics.  You can serve as a sponsor for an Authority to Operate (ATO).  If you are not such an entity, it means you can rest assured that ASC’s commitment to security meets these stringent standards.

Many aspects go into building a platform of this quality, including code quality, code review, user roles, separation of concerns, staff access to servers, and tracking of tickets and code releases.  On top of that, there is the substantial investment of a third-party audit.

More information on FISMA / FedRAMP

https://www.paloaltonetworks.com/cyberpedia/difference-between-fisma-and-fedramp

https://www.fedramp.gov/program-basics/

https://foresite.com/blog/fisma-vs-fedramp-and-nist-making-sense-of-government-compliance-standards/

Yes, I’d like to learn more.

Please contact us for a software demo or request a trial account.  We’d love to hear your requirements!

 

Multistage testing algorithm

The College Board announced in January 2022 that it was planning to finally migrate the Scholastic Aptitude Test (SAT) from paper-and-pencil to computerized delivery.  Moreover, it would make the tests “adaptive.”  But what does it mean to have an adaptive SAT?

What is the SAT?

The SAT is the most commonly used exam for university admissions in the United States, though the ACT ranks a close second.  Decades of research have shown that it accurately predicts important outcomes, such as 4-year graduation rates and GPA.  Moreover, it provides incremental validity over other predictors, such as high school GPA.  The adaptive SAT will use algorithms to make the test shorter, smarter, and more accurate.

The new version of the SAT has 3 sections: Math, Reading, and Writing/Language.  These are administered separately from a psychometric perspective.

Digital Assessment

The new SAT is being called the “Digital SAT” by the College Board.  Digital assessment, also known as electronic assessment or computer-based testing, refers to the delivery of exams via computers.  It’s sometimes called online assessment or internet-based assessment as well, but not all software platforms are online; some remain on secure local networks (LANs).

What is “adaptive”?

When a test is adaptive, it means that it is being delivered with a computer algorithm that will adjust the difficulty of questions based on an individual’s performance.  If you do well, you get tougher items.  If you do not do well, you get easier items.

But while this seems straightforward and logical on the surface, there is a host of technical challenges involved.  As researchers have delved into those challenges over the past 50 years, they have developed several approaches to how the adaptive algorithm can work:

  1. Adapt the difficulty after every single item, aka item-level computerized adaptive testing (a minimal sketch appears below)
  2. Adapt the difficulty in blocks of items (sections), aka multistage testing
  3. Adapt the test in entirely different ways (e.g., decision trees based on machine learning models, or cognitive diagnostic models)
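
To make the first approach concrete, below is a minimal sketch of an item-level adaptive loop in Python.  It is purely illustrative: it assumes a hypothetical Rasch-style item bank and uses a crude shrinking-step ability update in place of true IRT scoring, and it is not any testing program’s actual algorithm.

```python
import random

# Hypothetical item bank: each item has only a difficulty parameter (Rasch-style).
item_bank = [{"id": i, "difficulty": random.uniform(-3, 3)} for i in range(200)]

def next_item(theta, available):
    """Pick the unused item whose difficulty is closest to the current ability estimate."""
    return min(available, key=lambda item: abs(item["difficulty"] - theta))

def item_level_cat(respond, test_length=20):
    """Deliver an item-level adaptive test.

    `respond` is a callback returning True/False for a given item, standing in
    for the examinee.  The ability estimate starts at 0 and moves up after a
    correct answer and down after an incorrect one, with a shrinking step
    (a crude stand-in for an IRT scoring update).
    """
    theta = 0.0
    available = list(item_bank)
    for k in range(1, test_length + 1):
        item = next_item(theta, available)
        available.remove(item)
        correct = respond(item)
        step = 2.0 / k                     # shrink the adjustment as the test progresses
        theta += step if correct else -step
    return theta

# Example: simulate an examinee who answers items easier than theta = 1.0 correctly.
estimate = item_level_cat(lambda item: item["difficulty"] < 1.0)
print(round(estimate, 2))
```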

There are plenty of famous exams which use the first approach, including the NWEA MAP test and the Graduate Management Admission Test (GMAT).  But the SAT plans to use the second approach.  There are several reasons to do so, an important one being that it allows you to use “testlets,” which are items that are grouped together.  For example, you probably remember test questions that have a reading passage with 3-5 attached questions; you can’t do that if you are picking a new standalone item after every item, as with the first approach.

So how does it work?  Each Adaptive SAT subtest will have two sections.  An examinee finishes Section 1, and then, based on their performance, gets a Section 2 that is tailored to them.  It’s not just easy vs. hard, either; there might be 30 possible Section 2s (10 each of Easy, Medium, and Hard), or variations in between.
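
To illustrate that routing logic, here is a minimal sketch of a two-stage test in which everyone takes the same Section 1 and is then routed to one of several Section 2 panels based on number-correct performance.  The cut points and panel labels are hypothetical, not the College Board’s actual routing rules, and an operational multistage test would typically route on an IRT score.

```python
# Hypothetical Section 2 panels, several at each difficulty level.
section2_panels = {
    "easy":   ["E1", "E2", "E3"],
    "medium": ["M1", "M2", "M3"],
    "hard":   ["H1", "H2", "H3"],
}

def route_to_section2(section1_score, num_section1_items=22):
    """Route an examinee to a Section 2 panel based on Section 1 performance.

    Uses simple proportion-correct cut points (hypothetical); a real multistage
    test would usually route on an IRT score with pre-equated panels.
    """
    proportion = section1_score / num_section1_items
    if proportion < 0.40:
        level = "easy"
    elif proportion < 0.75:
        level = "medium"
    else:
        level = "hard"
    # Rotate among equivalent panels at that level for exposure control.
    panel = section2_panels[level][section1_score % len(section2_panels[level])]
    return level, panel

print(route_to_section2(8))    # routes to an easy panel
print(route_to_section2(14))   # routes to a medium panel
print(route_to_section2(20))   # routes to a hard panel
```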

How do we fairly score the results if students receive different questions?  That issue has long been addressed by item response theory.  Examinees are scored with a mathematical model that takes into account not just how many items they answered correctly, but which items, and how difficult and discriminating those items are.  This is nothing new; it has been used by many large-scale assessments since the 1980s.
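
For readers curious what that scoring looks like under the hood, here is a minimal sketch of item response theory scoring with the two-parameter logistic (2PL) model: the ability estimate is the point on a grid that maximizes the likelihood of the observed responses.  The item parameters and responses below are invented for illustration; operational programs use calibrated parameters and more refined estimators.

```python
import math

# Hypothetical calibrated item parameters: (discrimination a, difficulty b).
items = [(1.2, -1.0), (0.8, -0.3), (1.5, 0.2), (1.0, 0.8), (1.3, 1.5)]
responses = [1, 1, 1, 0, 0]   # 1 = correct, 0 = incorrect

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta):
    """Log-likelihood of the observed response pattern at ability theta."""
    ll = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll

# Maximum-likelihood estimate via a simple grid search over plausible theta values.
grid = [i / 100 for i in range(-400, 401)]
theta_hat = max(grid, key=log_likelihood)
print(round(theta_hat, 2))
```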

If you want to delve deeper into adaptive algorithms, start here.

 

 

Why an adaptive SAT?

Decades of research have shown adaptive testing to have well-known benefits.  It requires fewer items to achieve the same level of accuracy in scores, which means shorter exams for everyone.  It is also more secure, because not everyone sees the same items in the same order.  It can produce a more engaging assessment as well, keeping top performers challenged and keeping lower performers from checking out after getting frustrated by overly difficult items.  And, of course, digital assessment has many advantages in itself, such as faster score turnaround and the ability to use tech-enhanced items.  So the migration to an adaptive, digital SAT will be beneficial for students.

The California Department of Human Resources (CalHR, calhr.ca.gov/) has selected Assessment Systems Corporation (ASC, assess.com) as its vendor for an online assessment platform. CalHR is responsible for the personnel selection and hiring of many job roles for the State, and delivers hundreds of thousands of tests per year to job applicants. CalHR seeks to migrate to a modern cloud-based platform that allows it to manage large item banks, quickly publish new test forms, and deliver large-scale assessments that align with modern psychometrics like item response theory (IRT) and computerized adaptive testing (CAT).

Assess.ai as a solution

ASC’s landmark assessment platform Assess.ai was selected as a solution for this project. ASC has been providing computerized assessment platforms with modern psychometric capabilities since the 1980s, and released Assess.ai in 2019 as a successor to its industry-leading platform FastTest. It includes modules for item authoring, item review, automated item generation, test publishing, online delivery, and automated psychometric reporting.

Read the full article here.


Nathan Thompson, Ph.D., was recently invited to talk about ASC and the future of educational assessment on the EdNorth EdTech Podcast.

EdNorth is an association dedicated to continuing the long history of innovation in educational technology that has been rooted in the Twin Cities of Minnesota (Minneapolis / Saint Paul). Click below to listen online, or find it on Apple or other podcast aggregators.

Dr. Thompson discusses the history of ASC, ASC’s mission to improve assessment with quality psychometrics, and how AI and automation are being used more often, even though they’ve been part of the psychometrics field for a century.

Thank you to Dave Swerdlick and the team at EdNorth for the opportunity to speak!


Today I read an article in The Industrial-Organizational Psychologist (the colloquial journal published by the Society for Industrial and Organizational Psychology) that really resonated with me.

Has Industrial-Organizational Psychology Lost Its Way?
-Deniz S. Ones, Robert B. Kaiser, Tomas Chamorro-Premuzic, Cicek Svensson

Why?  Because I think a lot of the points they make are also true about the field of psychometrics and our innovation.  They summarize their argument in six bullet points that they suggest represent a troubling direction for their field.  Honestly, I suppose much of academia falls under these, while some great innovation is happening on free MOOCs and the like because they aren’t fettered by the chains of the purely or partially academic world.

  • an overemphasis on theory
  • a proliferation of, and fixation on, trivial methodological minutiae
  • a suppression of exploration and a repression of innovation
  • an unhealthy obsession with publication while ignoring practical issues
  • a tendency to be distracted by fads
  • a growing habit of losing real-world influence to other fields.

So what is psychometrics supposed to be doing?

The part that has irked me the most about psychometrics over the years is the overemphasis on theory and minutiae rather than solving practical problems.  This is the main reason I stopped attending the NCME conference and instead attend practical conferences like ATP.  It stems from my desire to improve the quality of assessment throughout the world.  Development of esoteric DIF methodology, new multidimensional IRT models, or a new CAT sub-algorithm when there are already dozens and the new one offers a 0.5% increase in efficiency… stuff like that isn’t going to impact all the terrible assessment being done in the world, or the terrible decisions being made about people based on those assessments.  Don’t get me wrong, there is a place for that substantive research, but I feel the practical side is underserved.

The Goal: Quality Assessment


And it’s that point that is driving the work that I do.  There is a lot of mediocre or downright bad assessment out there in the world.  I once talked to a pre-employment testing company and asked if I could help implement strong psychometrics to improve their tests, as well as their validity documentation.  Their answer?  It was essentially “No thanks, we’ve never been sued, so we’re OK where we are.”  Thankfully, they fell in the mediocre category rather than the downright bad category.

Of course, in many cases, there is simply a lack of incentive to produce quality assessment.  Higher Education is a classic case of this.  Professional schools (e.g., Medicine) often have accreditation tied in some part to demonstrating quality assessment of their students.  There is typically no such constraint on undergraduate education, so your Intro to Psychology and freshman English Comp classes still do assessment the same way they did 40 years ago… with no psychometrics whatsoever.  Many small credentialing organizations lack incentive too, until they decide to pursue accreditation.

I like to describe the situation this way: take all the assessments of the world and assign each a percentile rank in psychometric quality.  The top 5% are the big organizations, such as nursing licensure in the US, that have in-house psychometricians, large volumes, and huge budgets.  We don’t have to worry about them, as they will be doing good assessment (and that substantive research I mentioned might be of use to them!).  The bottom 50% or more are things like university classroom assessments; they’ll probably never use real psychometrics.  I’m concerned about the 50th to 95th percentiles.

Example: Credentialing

A great example of this level is the world of credentialing.  There are a TON of poorly constructed licensure and certification tests that are being used to make incredibly important decisions about people’s lives.  Some are poor simply because the organization is for-profit and doesn’t care.  Some are caused by external constraints.  I once worked with the Department of Agriculture for a western US state, where the legislature mandated that licensure tests be given for certain professions, even though only about three people per year took some of the tests.

So how do we get groups like that to follow best practices in assessment?  In the past, the only way to get psychometrics done was for them to pay a consultant a ton of money that they don’t have.  Why spend $5k on an Angoff study or a classical test report for 3 people per year?  I don’t blame them.  The field of psychometrics needs to find a way to help such groups.  Otherwise, the tests are low quality and they are giving licenses to unqualified practitioners.

There are some bogus providers out there, for sure.  I’ve seen Certification delivery platforms that don’t even store the examinee responses, which would be necessary to do any psychometric analysis whatsoever.  Obviously they aren’t doing much to help the situation.  Software platforms that focus on things like tracking payments and prerequisites simply miss the boat too.  They are condoning bad assessment.

Similarly, mathematically complex advancements such as multidimensional IRT are of no use to this type of organization.  They’re not helping the situation.

An Opportunity for Innovation


I think there is still a decent amount of innovation in our field.  There are organizations doing great work to develop innovative items, psychometrics, and assessments.  However, it is well known that large corporations will snap up fresh PhDs in psychometrics and then lock them in a back room to do uninnovative work, like running SAS scripts or conducting Angoff studies over and over.  This happened to me, and after only 18 months I was ready for more.

Unfortunately, I have found that a lot of innovation is not driven by producing good measurement.  I was in a discussion on LinkedIn where someone was pushing gamification for assessments and declared that measurement precision was of no interest.  This, of course, is ludicrous.  Is it OK to produce random numbers as long as the UI looks cool for students?

Innovation in Psychometrics at ASC

Much of the innovation at ASC is targeted at the issue I have presented here.  I originally developed Iteman 4 and Xcalibre 4 for exactly this type of usage.  I wanted to enable an organization to produce professional psychometric analysis reports on its assessments without having to pay massive amounts of money to a consultant.  Additionally, I wanted to save time; there are other software programs which can produce similar results, but they drop them into text files or Excel spreadsheets instead of Microsoft Word, which is of course what everyone would use to draft a report.

Much of our FastTest platform is designed with a similar bent.  Tired of running an Angoff study with items on a projector and the SMEs writing all their ratings with pencil and paper, only to be transcribed later?  Well, you can do this online.  Moreover, because it is online, you can work with the SMEs remotely rather than paying to fly them into a central office.  Want to publish an adaptive (CAT) exam without writing code?  That is built directly into our test publishing interface.
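
As context for readers unfamiliar with the method, here is a minimal sketch of the arithmetic behind a classical modified-Angoff cutscore: each SME rates the probability that a minimally qualified candidate would answer each item correctly, and the cutscore is the sum across items of the mean ratings.  The ratings below are invented for illustration and do not come from FastTest.

```python
# Hypothetical Angoff ratings: rows are items, columns are SMEs.
# Each value is the judged probability that a minimally qualified
# candidate answers the item correctly.
ratings = [
    [0.70, 0.65, 0.75],
    [0.50, 0.55, 0.45],
    [0.90, 0.85, 0.80],
    [0.60, 0.70, 0.65],
]

item_means = [sum(row) / len(row) for row in ratings]
cutscore = sum(item_means)   # expected raw score of a minimally qualified candidate

print([round(m, 2) for m in item_means])
print(round(cutscore, 2))    # 2.70 out of 4 items
```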

Back to My Original Point

So, the title asks “What is psychometrics supposed to be doing?” with regard to innovation.  My answer, of course, is improving assessment.  The issue I take with the mathematically advanced research is that it is only relevant for the top 5% of organizations mentioned above.  It’s also our duty as psychometricians to find better ways to help the other 95%.

What else can we be doing?  I think the future here is automation.  Iteman 4 and Xcalibre 4, as well as FastTest, were really machine learning and automation platforms before those things became so in vogue.  As the SIOP article mentioned at the beginning discusses, other scholarly areas like Big Data are gaining more real-world influence, even if they are doing things that psychometrics has done for a long time.  Item response theory is a form of machine learning, and it’s been around for 50 years!

 


The “opt out” movement is a supposedly grass-roots movement against K-12 standardized testing, primarily focused on encouraging parents to refuse to allow their kids to take tests, i.e., to opt out of testing.  The absolutely bizarre part of this is that large-scale test scores are rarely used for individual impact on the student, and that tests take up only a tiny fraction of school time throughout the year.  An extremely well-written paper exploring this befuddling situation was recently released by Randy E. Bennett at Educational Testing Service (ETS).  Dr. Bennett is an internationally renowned researcher whose opinion is quite respected.  He came to an interesting conclusion on the opt-out topic.

Opt-Out Movement: The Background

After a brief overview, he summarizes the situation:

Despite the fact that reducing testing time is a recurring political response, the evidence described thus far suggests that the actual time devoted to testing might not provide the strongest rationale for opting out, especially in the suburban low-poverty schools in which test refusal appears to occur more frequently.

A closer look at New York, the state with the highest opt-out rates, found a less obvious but stronger relationship (page 7):

It appears to have been the confluence of a revamped teacher evaluation system with a dramatically harder, Common Core-aligned test that galvanized the opt-out movement in New York State (Fairbanks, 2015; Harris & Fessenden, 2015; PBS Newshour, 2015). For 2014, 96% of the state’s teachers had been rated as effective or highly effective, even though only 31% of students had achieved proficiency in ELA and only 36% in mathematics (NYSED, 2014; Taylor, 2015). These proficiency rates were very similar to ones achieved on the 2013 NAEP for Grades 4 and 8 (USDE, 2013a, 2013b, 2013c, 2013d). The rates were also remarkably lower than on New York’s pre-Common-Core assessments. The new rates might be taken to imply that teachers were doing a less-than-adequate job and that supervisors, perhaps unwittingly, were giving them inflated evaluations for it.

That view appears to have been behind a March 2015 initiative from New York Governor Andrew Cuomo (Harris & Fessenden, 2015; Taylor, 2015). At his request, the legislature reduced the role of the principal’s judgment, favored by teachers, and increased from 20% to 50% the role of test-score growth indicators in evaluation and tenure decisions (Rebora, 2015). As a result, the New York State United Teachers union urged parents to boycott the assessment so as to subvert the new teacher evaluations and disseminated information to guide parents specifically in that action (Gee, 2015; Karlin, 2015).

The future?

I am certainly sympathetic to the issues facing teachers today, being the son of two teachers and having a sibling who is a teacher, as well as having wanted to be a high school teacher myself until I was 18.  The lack of resources and low pay facing most educators is appalling.  However, the situation described above is simply an extension of the “soccer syndrome” that many in our society decry: the notion that all kids should be allowed to play and rewarded equally, merely for participation and not performance.  With no measure of performance, there is no external impetus to perform, and we all know the role that motivation plays in performance.

It will be interesting to see the role that the opt-out-of-testing movement plays in the post-NCLB world.


Every spring, the Association of Test Publishers (ATP) hosts its annual conference, Innovations in Testing.  This is the leading conference in the testing industry, with nearly 1,000 people from major testing vendors and a wide range of test sponsors, from school districts to certification boards to employment testing companies.  While the technical depth is much lower than at purely scholarly conferences like NCME and IACAT, it is the top conference for networking, business contacts, and discussion of practical issues.

The conference is typically held in a warm location at the end of a long winter.  This year did not disappoint, with Orlando providing us with a sunny 75 degrees each day!

Interested in attending a conference on assessment and psychometrics?  We provide this list to help you decide.

ATP Presentations

Here are the four sessions presented by the Assessment Systems team:

Let the CAT out of the Bag: Making Adaptive Testing more Accessible and Feasible

This session explored the barriers to implementing adaptive testing and how to address them.  Adaptive testing remains underutilized, and it is our job as a profession to fix that.

FastTest: A Comprehensive System for Assessment

FastTest revolutionizes how assessments are developed and delivered, with scalable security.  We provided a demo session with an open discussion.  Want to see a demo yourself?  Get in touch!

Is Remote Proctoring Really Less Secure?

Rethink security.  How can we leverage technology to improve proctoring?  Is the 2,000-year-old approach still the most valid?  Let’s critically evaluate remote proctoring and how technology can make it better than one person watching a room of 30 computers.  Privacy is another important consideration.

The Best of Both Worlds: Leveraging TEIs without Sacrificing Psychometrics

Anyone can dream up a drag-and-drop item.  Can you make it automatically scored with IRT in real time?  We discussed our partnership with Charles County (MD) Public Schools to develop a system that allowed a single school district to produce quality assessments on par with what a major testing company would do.  Learn more about our item authoring platform.