On Sunday and Monday, I had the opportunity to attend the Assessment in the Middle East conference, hosted at the prestigious Abu Dhabi University. The conference brought together higher education professionals from across the Middle East to discuss how the assessment of students and faculty can contribute to quality improvement through the use of data. It was fascinating to talk with these professionals about how they use data to improve.

Within our own business, we like to D.R.E.A.M. (insert Wu-Tang Clan reference here… RIP ODB). D.R.E.A.M. stands for Data Rules Everything Around Me, and at Assessment Systems it really does. It was refreshing to spend two days surrounded by professionals who thrive on continual, data-driven improvement, as opposed to doing things the way they have always been done because ‘that is just how we do it here.’ These are the types of professionals I love working with.

I have a lot of great takeaways from my time at the conference and look forward to sharing them with my team when I return.

Until next time!

One of the best aspects of my position is the opportunity to travel the world and talk with many experts about psychometrics and educational assessment.  In December 2017, I was lucky enough to travel to Monterrey, Mexico, with the dual purpose of a conference and a Psychometrics Seminar.  It was an exciting week that taught me a lot about education in Latin America.

The first half of the week was the Congreso Internacional de Innovacion Educativa, the premier conference on educational technology in Latin America, with more than 3,000 attendees.  I had the opportunity to present on a project we are doing with Tecnologico de Monterrey to implement adaptive testing in admissions exams.  Tecnologico is the #2 university in Mexico, and therefore a leader in all of Latin America, as well as the host of the conference.  In addition, I heard a number of interesting talks, including one by the CEO of Coursera on the role of MOOCs in higher education.  I’m personally a MOOC addict, though I tend to start new courses far faster than I ever finish them.

 

Photo: adaptive testing seminar

The second half of the week was the 2017 ASC Learning Summit, a two-day seminar to teach item response theory and adaptive testing to anyone who wanted to learn.  We had 22 attendees, which was extremely successful given the short notice (Tec had to move the entire event from Mexico City to Monterrey after an earthquake).  We had 18 from Mexico, two from the US, one from Barbados, and one from the UK.  These included university professors, K-12 assessment professionals, higher education admissions staff, certification test psychometricians, and more.

 

 

If time allows, I hope to record my presentations from that summit as a series of webinars or videos, so join our mailing list to stay tuned if you aren’t already on it.  In addition, I’ll be holding similar seminars in the future.  If you are interested in having me visit your country for such an event, get in touch with me at nthompson@54.89.150.95.

 

Last week, I had the opportunity to attend the 2017 Conference on Test Security (COTS), hosted by the University of Wisconsin-Madison.  If your organization has any concerns about test security (that is, you have any sort of real stakes tied to your test!), I recommend that you attend COTS.  It has a great mix of psychometric research with practical discussions such as policies and procedures.  While it was originally titled “Conference on Statistical Detection of Test Fraud” it has since expanded its scope and thankfully reduced the number of syllables in the name.

The venue was the Pyle Center on the shores of Lake Mendota, just one block from the famous State Street.  Madison is a beautiful city, situated on an isthmus between two large lakes; great for visuals but not so great for traffic patterns.  The location was incredibly convenient for me, as it is driving distance from my home in Minnesota, and allowed me to stay with my family in nearby Watertown, watch my brother coach a high school football game in Columbus, and stop at the CamRock mountain bike trails that I’ve always wanted to try (highly recommend!).

One highlight of the conference was the chance to present with my friend, former colleague, and graduate school office-mate Jennifer Davis from the National Association of Boards of Pharmacy.  We compared three software programs for psychometric forensics: SIFT, CopyDetect, and the Outlier Detection Tool.  SIFT and CopyDetect both provide several collusion indices, but SIFT provides more and is dramatically faster (CopyDetect took two hours to analyze just 134 examinees).  The Outlier Detection Tool is an internal spreadsheet used by NABP that serves a slightly different purpose; for more information, contact them.

The best part of the Conference on Test Security, much like the IACAT conference I recently attended, was the chance to spend time with old friends whom I only see once every year or two, as well as make new friends, such as a researcher from ASC’s partner Ascend Learning.  In fact, I didn’t even get a chance to attend any sessions on the second day; I instead spent the time talking with colleagues.

Biggest disappointment?  I didn’t hang around until Saturday to attend the Badger game and join the traditional “Jump Around!”

I just returned from the 2017 Innovations in Testing conference in Scottsdale, AZ, organized by the Association of Test Publishers.  This is one of my favorite conferences because it contains a compelling blend of ingredients – quality psychometrics, innovative technology, and the business of assessment – and the 2017 edition definitely did not disappoint.

This year I was joined by four of my colleagues:

  • Dave Saben – CEO
  • Tom Padden – Head of Growth
  • Andrew Lunstad – VP of Partnerships
  • Kat Stanley, MA PMP – Partner Support Manager

Our agenda was quite full, as would be expected with such a gravitational nexus of the testing industry.  In addition to sponsoring the conference and holding a booth in the Exhibit Hall, we met with a number of current and future partners, held a BBQ at our Airbnb house, attended plenty of presentations, and gave four presentations of our own.  Here’s a rundown of this year’s happenings:

Takeaways from the conference

We attended presentations on a variety of topics, and it was evident that AI and Big Data are playing an increasingly important role in our field.  We are excited to be at the forefront of that; we’ve been doing machine learning since the 1970s, long before the term became in vogue.  Much of our current innovation effort is focused on the automation of psychometrics, such as automated creation of NCCA Accreditation Reports and Classical Test Theory Reports.
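For readers curious what goes into such a report, here is a minimal Python sketch (a generic illustration, not ASC’s production code) of two staples of classical test theory reporting: item difficulty (proportion correct) and corrected item-total point-biserial discrimination.

```python
# Minimal sketch of classical item statistics (generic illustration, not
# ASC's production code): item difficulty and corrected item-total
# point-biserial discrimination from a scored 0/1 response matrix.
import numpy as np

def classical_item_stats(scores: np.ndarray):
    """scores: 2-D array of 0/1 item scores (rows = examinees, columns = items)."""
    total = scores.sum(axis=1)                      # raw total score per examinee
    difficulty = scores.mean(axis=0)                # p-value: proportion correct per item
    discrimination = np.empty(scores.shape[1])
    for i in range(scores.shape[1]):
        rest = total - scores[:, i]                 # total score with the item itself removed
        discrimination[i] = np.corrcoef(scores[:, i], rest)[0, 1]
    return difficulty, discrimination

# Example with fabricated data: 100 examinees, 5 items
rng = np.random.default_rng(0)
demo = (rng.random((100, 5)) > 0.4).astype(int)
p, rpb = classical_item_stats(demo)
print("difficulty:", p.round(2))
print("discrimination:", rpb.round(2))
```

A real report automates far more (reliability, distractor analysis, flags), but these two statistics are where most classical analyses start.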

Exhibit Hall

ATP organized an exhibit hall for the testing conference on Monday and Tuesday, and we had a great high-traffic location near the center of the room (between two food serving areas, which is always nice!).  In addition to the normal handouts, we distributed scorpion suckers and gave away a Google Home device, a nod towards our love of AI and machine learning.

BBQ

We held an informal dinner at our Airbnb house one night, with about 30 friends and partners.  We brought in some of the best BBQ in town (Joe’s) and relaxed by a poolside fire until the desert evening got too cold.  It was a great opportunity to chat with old friends and meet some new ones.

ASC’s Presentations

ASC was grateful for the opportunity to speak four times about the innovations we are making in the testing industry – certainly a great fit, given the title of the conference.

Tech-Enhanced Items: How Can They Provide Better Measurement?

Nathan Thompson and Andrew Lunstad from ASC, and Paul Jaquith from ACTVET, discussed tech-enhanced items from a psychometric and software design perspective, evaluating their impact on the ultimate goal: improving measurement.

Sift & Winnow: Options For Data Forensics

Nathan Thompson from ASC, and Joy Matthews-Lopez from Professional Testing Inc., discussed methodologies and software to help detect instances of test fraud, and what to do about it if you find anything worth investigating.

FastTest: A Configurable System for Cross-Sector Assessment

Andrew Lunstad and Dave Saben demonstrated FastTest, a secure, cloud-based platform uniquely designed to improve any type of assessment, from certification to pre-employment to education to psychology.

SIFT: Innovating Data Forensics

Nathan Thompson demonstrated SIFT, an innovative program designed as a cost-effective option for detecting test fraud and other threats to validity, such as low motivation.

 

If you are interested in any of these topics and did not connect with us at the conference, please contact solutions@54.89.150.95.  We are already looking forward to next year’s conference!

ASC attended the 2016 Conference on Test Security (COTS), held October 18-20 in Cedar Rapids, IA, and graciously hosted by Pearson.  The conference brings together thought leaders on all aspects of test security, including statistical detection of test fraud, management of test centers, candidate agreements, investigations, and legal implications.  ASC was lucky enough to win three presentation spots.  Please check out the abstracts below.  If you are interested in learning more, please get in touch!

 

SIFT: Software for Investigating Fraud in Testing
Nathan Thompson & Terry Ausman

SIFT is a software program specifically designed to bring data forensics to more practitioners. Widespread application of data forensics, like other advanced psychometric topics, is somewhat limited when an organization’s only options are to hire outside consultants or attempt to write code themselves. SIFT enables organizations with smaller budgets to apply some data forensics by automating the calculation of complex indices as well as simpler yet important statistics, in a user-friendly interface.

The most complex portion is a set of 10 collusion indices (more in development) from which the user can choose. SIFT also provides functionality for response time analysis, including the Response Time Effort index (Wise & Kong). More common analyses include classical item statistics, mean test times, score gains, and pass rates. All indices are also rolled up into two nested levels of groups (for example, school and district, or country and city) to facilitate identification of locations with issues.

All output is provided in spreadsheets for easy viewing, manipulation, and secondary analysis. This allows, for example, a small certification organization to obtain all of this output in only a few hours of work, and quickly investigate locations before a test is further compromised.
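To give a flavor of what an index like Response Time Effort involves, here is a rough Python sketch of the general idea: the proportion of items on which an examinee’s response time suggests engaged, “solution” behavior. The per-item thresholds here (10% of the item’s median time) are a simplifying assumption; SIFT’s actual implementation may differ.

```python
# Rough sketch of a Response Time Effort (RTE) style index (Wise & Kong):
# the proportion of items on which an examinee's response time meets or
# exceeds a per-item "solution behavior" threshold.  The threshold rule
# below (10% of the item's median time) is an assumption for illustration.
import numpy as np

def response_time_effort(times: np.ndarray, threshold_fraction: float = 0.10):
    """times: 2-D array of response times in seconds (rows = examinees, columns = items)."""
    thresholds = threshold_fraction * np.median(times, axis=0)  # per-item threshold
    solution_behavior = times >= thresholds                     # True = engaged responding
    return solution_behavior.mean(axis=1)                       # RTE per examinee

# Fabricated example: the second examinee rushes through the first two items
times = np.array([[12.0, 30.0, 25.0],
                  [ 1.0,  2.0, 28.0],
                  [15.0, 22.0, 31.0]])
rte = response_time_effort(times)
print(rte, rte < 0.90)   # flag examinees whose RTE falls below a chosen cutoff
```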

 

Statistical Detection: Where Do I Start?
Nathan Thompson & Terry Ausman

How can statistical detection of test fraud, or test security practices in general, be better directed? This presentation will begin by organizing the various types of analysis into a framework that aligns each with the hypothesis it intends to test, then show how this framework can be used to direct efforts, and finally provide some real experience by applying it to real data sets from K-12 education and professional certification.

In the first section, we will start by identifying the common hypotheses to be tested, including: examinee copying, brain dump makers, brain dump takers, proctor/teacher involvement, low motivation, and compromised locations. Next, we match up analyses, such as how collusion indices are designed to elucidate copying but can also help find brain dump takers. We also provide deeper explanations on the specific analyses.

In the second section, we apply this framework to the analysis of real data sets. This will show how the framework can be useful in directing data forensics work rather than aimlessly poking around. It will also demonstrate usage of the statistical analyses, facilitating learning of the approaches as well as driving discussions of practical issues faced by attendees. The final portion of the presentation will then be just such a discussion.

 

Statistical Methods of Detecting Test Fraud: Can We Get More Practitioners on Board?
Nathan Thompson

Statistical methods of detecting test fraud have been around since the 1970s, but they are still not in general use by most practitioners, instead being limited to a few specialists.  Similarly, best practices in test security are still not commonly used except at large organizations with big stakes in play.  First, we will discuss the sorts of hurdles that prevent more professionals from learning about the topic, or prevent knowledgeable professionals from applying best practices.  Next, we will discuss some potential solutions to each of those hurdles.  The goal is to increase the validity of scores being reported throughout the industry by elevating the profession.

Every spring, the Association of Test Publishers (ATP) hosts its annual conference, Innovations in Testing.  This is the leading conference in the testing industry, with nearly 1,000 people from major testing vendors and a wide range of test sponsors, from school districts to certification boards to employment testing companies.  While the technical depth is much lower than at purely scholarly conferences like NCME and IACAT, it is the top conference for networking, business contacts, and discussion of practical issues.

The conference is typically held in a warm location at the end of a long winter.  This year did not disappoint, with Orlando providing us with a sunny 75 degrees each day!

Interested in attending a conference on assessment and psychometrics?  We provide this list to help you decide.

ATP Presentations

Here are the four presentations given by the Assessment Systems team:

Let the CAT out of the Bag: Making Adaptive Testing more Accessible and Feasible

This session explored the barriers to implementing adaptive testing and how to address them.  Adaptive testing remains underutilized, and it is our job as a profession to fix that.

FastTest: A Comprehensive System for Assessment

FastTest revolutionizes how assessments are developed and delivered with scalable security.  We provided a demo session with an open discussion.  Want to see a demo yourself?  Get in touch!

Is Remote Proctoring Really Less Secure?

Rethink security. How can we leverage technology to improve proctoring?  Is the 2,000-year-old approach still the most valid?  Let’s critically evaluate remote proctoring and how technology will make it better than one person watching a room of 30 computers.

The Best of Both Worlds: Leveraging TEIs without Sacrificing Psychometrics

Anyone can dream up a drag and drop item. Can you make it automatically scored with IRT in real time?  We discussed our partnership with Charles County (MD) Public Schools to develop a system that allowed a single school district to develop quality assessments on par with what a major testing company would do.  Learn more about our item authoring platform.

 

In addition to speaking, I served on the scientific committee – here’s a pic of the volunteers (I am third from left in the front row).

 

Photo courtesy of the Association of Test Publishers

Test security is an emerging field within the assessment industry.  Test fraud is one of the most salient threats to score validity, so methods to combat it are extremely important to all of us.  So important, in fact, that an annual conference has been established that is devoted to the topic: the Conference on Test Security (COTS).

The 2015 Conference on Test Security

The 2015 edition of the Conference on Test Security was hosted by the Center for Educational Testing and Evaluation at the University of Kansas.  Assessment Systems had the privilege of presenting two full sessions: One Size Does Not Fit All: Making Test Security Configurable and Scalable and Let’s Rethink How Technology Can Improve Proctoring.  Abstracts for these are below; if you would like to learn more, please contact us.  Additionally, we had the opportunity to give a product demonstration of our upcoming data forensics software, SIFT (more on that below).

The Conference on Test Security kicked off with a keynote on the now-famous Atlanta cheating scandal.  This scandal was unique in that it was systematic and top-down rather than bottom-up.  The follow-up message stressed the difference between assessment and accountability: the fact that a test is tied to accountability standards does not mean the test itself is bad, or that all testing is bad.  Most of the opt-out movement is completely unfamiliar with these issues.

One of the most commonly presented topics at the conference is data forensics.  In fact, the Conference on Test Security used to be called the Conference on Statistical Detection of Test Fraud.  But while there has been research on statistical detection of test fraud for more than 50 years, it is effectively a much younger topic, and we are still learning a lot.  Moreover, there are no good software programs that are publicly available to help organizations implement best practices in data forensics.  This is where SIFT comes in.

What is Data Forensics?

In the realm of test security, data forensics refers to analysis of data to find evidence of various types of test fraud.  There are a few big types, and the approach to analysis can be quite different.  Here are some descriptions, though this is far from a complete treatment of the topic!

Answer-changing: In Atlanta, teachers and administrators would change answers on student bubble sheets after the test was turned in.  This involves quantification of the changes, and then analysis of right-to-wrong vs. wrong-to-right changes, amongst other things.  This, of course, is primarily relevant for paper-based tests, but some answer-changing can happen on computer-based tests.

Preknowledge: If an examinee purchases a copy of the test off the internet from an illegal “brain dump” website, they will get a surprisingly high score while taking less time than expected.  This could be on all items or a subset.

Item harvesting: An examinee is paid to memorize as many items as they can.  They might spend 5 minutes each on the first 15 items and not even look at the remainder.

Collusion:  The age-old issue of Student A copying off Student B is collusion, but it can also involve multiple students and even teachers or other people.  Statistical analysis looks at unusual patterns in the response data.
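To make the collusion idea concrete, here is a toy Python sketch of the simplest possible screen: counting identical incorrect answers for every pair of examinees. Real collusion indices, including those planned for SIFT, model the probability of such matches rather than merely tallying them; this is purely illustrative, with fabricated data.

```python
# Toy illustration of the collusion idea: count identical incorrect answers
# for every pair of examinees.  Real collusion indices model the probability
# of such matches; this sketch only tallies them (fabricated data).
from itertools import combinations

responses = {          # examinee -> selected options, one character per item
    "A": "BCADB",
    "B": "BCADB",
    "C": "ACBDA",
}
key = "ACADB"          # fabricated answer key

def identical_incorrect(r1: str, r2: str, key: str) -> int:
    """Number of items on which both examinees chose the same wrong answer."""
    return sum(1 for a, b, k in zip(r1, r2, key) if a == b and a != k)

for e1, e2 in combinations(responses, 2):
    n = identical_incorrect(responses[e1], responses[e2], key)
    print(f"{e1}-{e2}: {n} identical incorrect answers")
```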

How can I implement some of this myself?

Unfortunately, there is no publicly available software that is adequately devoted to data forensics.  Existing software is very limited in the analysis it provides and/or its usability.  For example, there are some packages available in the R programming language, but you need to learn to program in R!  Therefore Assessment Systems has developed our own system, entitled Software for Investigating Test Fraud (SIFT), to meet this market need.

SIFT will provide a wide range of analyses, including a number of collusion indices (with more to come!), flagging of possible preknowledge or item harvesting, unusual time patterns, and so on.  It will also aggregate the analyses up a few levels; for example, flagging test centers or schools that have unusually high numbers of students with collusion or time-pattern flags.

A beta version will be available in December, with a full version available in 2016.  If you are interested, please contact us!

 

Presentation Abstracts

One Size Does Not Fit All: Making Test Security Configurable and Scalable

Development of an organization’s test security plan involves many choices, an important aspect of which is the test development, publishing, and delivery process.  Much of this process is now browser-based for many organizations.  While there are risks involved with this approach, it provides much more flexibility and control for organizations, plus additional advantages such as immediate republishing.  This is especially useful because different programs/tests within an organization might vary widely.  It is therefore ideal to have an assessment platform that maximizes the configurability of security.

This presentation will provide a model to evaluate security risks, determine relevant tactics, and design your delivery solution by configuring test publishing/delivery options around these tactics to ensure test integrity.  Key configurations include:

  • Regular browser vs. lockdown browser
  • No proctor, webcam proctor, or live proctor
  • Login processes such as student codes, proctor codes, and ID verification
  • Delivery approach: linear, LOFT, CAT
  • Practical constraints like setting delivery windows, time limits, and allowing review
  • Complete event tracking during the exam
  • Data forensics within the system.

In addition, we invite attendees to discuss technological approaches they have taken to addressing test security risks, and how they fit into the general model.

Let’s Rethink How Technology Can Improve Proctoring

Technology has revolutionized much of assessment.  However, a large proportion of proctoring is still done the same way it was 30 years ago.  How can we best leverage technology to improve test security by improving the proctoring of an assessment?  Much of this discussion revolves around remote proctoring (RP), but there are other aspects.  For example, consider a candidate focusing on memorizing 10 items: can this be better addressed by real-time monitoring of irregular response times with RP than by a single in-person proctor on the other side of the room?  Or by LOFT/CAT delivery?

This presentation discusses the security risks and validity threats that are intended to be addressed by proctors and how they might be instead addressed by technology in some way.  Some of the axes of comparison include:

  • Confirming ID of examinee
  • Provision of instructions
  • Confirmation of clean test area with only allowed materials
  • Monitoring of examinee actions during test time
  • Maintaining standardized test environment
  • Protecting test content
  • Monitoring irregular time patterns

In addition, we can consider how we can augment the message of deterrence with tactics like data forensics, strong agreements, possibility of immediate test shutdown, and more secure delivery methods like LOFT.

EdSurge (http://www.edsurge.com) has compiled a list of EdTech conferences, complete with a fun infographic.  It includes big ones like Maker Faire, SXSWEDU, and ISTE, as well as smaller or more regional conferences.  I appreciate all the time that EdSurge put into this.  I’m also interested in their endeavor to update their EdTech Index, since it is a useful idea but way out of date.  However, I think that is a Sisyphean task, given the huge number of companies out there and the constant turnover of launches and closings!

 

The conference list is available here.  It unfortunately does not include any assessment-specific conferences, even those focused on technology like IACAT.  If you are more interested in assessment-related conferences, check out this page.

Last night, I had the honor of sitting on a panel discussing Current Themes in Educational Assessment at an Educelerate event.  Educelerate “is a networking forum for people who are passionate about promoting innovation and entrepreneurship in education – particularly through the use of technology.”  It is a national network, and the Twin Cities has an active chapter due to the substantial presence of the education industry here.  See the local Meetup page for more information or to join.  There is also a national Twitter feed to follow.

I’d like to thank Sean Finn for organizing the event and serving as moderator.  I’d also like to thank the other three panelists, as well as everyone who attended.

  • Jennifer Dugan – Director of Assessment at the State of Minnesota
  • Greg Wright – CEO at Naiku
  • Steve Mesmer – COO at Write the World

After an overview of assessment at the State level by Ms. Dugan, each panelist was asked to provide a brief response to three questions regarding assessment.  Here are mine:

  1. In your opinion, how do you perceive the role of technology in educational assessment?

I think this depends on the purpose of the assessment.  In assessment of learning, from 3rd grade benchmark exams to medical licensure tests, the purpose of the test is to obtain an accurate estimate of student mastery.  The greater the stakes, the more accuracy is needed.  Technology should serve this goal.

In assessment for learning, the goal is more to engage the student and be integral to the learning process.  Using complex psychometrics to gain more accurate scores is less important.  Technology should explore ways to engage the student and enhance learning, such as simulations.

However, we must not lose sight of these purposes, and adding technology merely for the sake of appearing innovative is actually counterproductive.  I’ve already seen this happen twice with a PARCC project.  They have “two-part” items that are supposed to delve deeper into student understanding, but because the approach is purely pedagogical and not psychometric, the data they produce are unusable.  PARCC also takes the standard multiple-response item (choose two out of five checkboxes) and makes it into a drag-and-drop item; no difference whatsoever in the data or psychometrics, just sleeker-looking technology.

 

  2. What opportunities do you see for new technologies that can help improve educational assessment in the 21st century?

There are a few ways this can happen.

My favorite is adaptive testing, whereby we leverage the computing power of technology to make tests more accurate.  The same is also true for more sophisticated psychometric modeling.

Another great idea is automated essay scoring, which is not safe as the ONLY scoring method, but improves accuracy when used appropriately.  Given the massive back-end cost of scoring essay items, any alleviation on that front will allow for more use of constructed-response formats.

New item types that allow us to glean more information in a shorter amount of time will improve the efficiency and accuracy of assessment.  But as I mentioned previously, development of new item types should always be done with the correct purpose in mind.

Big Data will likely improve the use of assessment data, but can also come into play in terms of the development and delivery of tests.

I’d also like to see Virtual Reality break into the assessment arena.  Our company works with Crane Operator exams.  Who WOULDN’T want to take an exam like that via virtual reality?

 

  3. Adaptive testing is a common term in the educational assessment world, especially given the focus of Smarter Balanced. What is the future of adaptive testing in your opinion, and how will that impact educational assessment?

The primary purpose of adaptive testing is to improve the efficiency of the assessment process.  That is, research has generally shown that it can produce scores just as accurate as those from a linear test, but with half as many items.  Moreover, the improvement in precision is typically more pronounced for students who are very high or very low performing, because a typical exam does not provide many items for them; most items are of middle difficulty.  Alternatively, if we want to keep the same precision, we can cut testing time in half; this is extremely relevant for quick diagnostic tests as opposed to longer, high-stakes tests.

While the time savings are notable at an individual level, consider the overall time savings across hundreds of thousands of students – certainly relevant in an environment of “less testing!”

CAT also has secondary advantages, such as increasing student engagement because students are only presented with items of appropriate difficulty.

One major opportunity is for CAT to start using more sophisticated models, such as multidimensional item response theory, cognitive diagnostic models, and models that utilize item response time.  This will improve its performance even further.

The future involves more widespread use of CAT as the cost of providing it continues to come down.  While it will never be something that can be done at the classroom or school level, since it requires PhD-level psychometric expertise, more companies will be able to provide it, and at a lower price point, which means it will end up being used more widely.
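For the technically curious, here is a bare-bones Python sketch of how the core CAT loop typically works: select the unadministered item with maximum Fisher information at the current ability estimate (a Rasch model is assumed here), administer it, and re-estimate ability. Production CAT engines add exposure control, content balancing, and formal stopping rules; this is only an illustration with a fabricated item bank.

```python
# Bare-bones CAT sketch under a Rasch model (illustration only): pick the
# unused item with maximum Fisher information at the current theta estimate,
# administer it, then re-estimate theta by EAP on a grid.
import numpy as np

def prob(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def eap(responses, difficulties, grid=np.linspace(-4, 4, 81)):
    """Expected a posteriori theta given 0/1 responses and item difficulties."""
    posterior = np.exp(-0.5 * grid**2)               # standard normal prior (unnormalized)
    for u, b in zip(responses, difficulties):
        p = prob(grid, b)
        posterior *= p**u * (1.0 - p)**(1 - u)       # multiply in each item's likelihood
    return float(np.sum(grid * posterior) / np.sum(posterior))

def simulate_cat(bank, true_theta, test_length=10, seed=1):
    rng = np.random.default_rng(seed)
    theta, used, responses, difficulties = 0.0, set(), [], []
    for _ in range(test_length):
        info = [prob(theta, b) * (1.0 - prob(theta, b)) if i not in used else -1.0
                for i, b in enumerate(bank)]
        item = int(np.argmax(info))                  # most informative unused item
        used.add(item)
        u = int(rng.random() < prob(true_theta, bank[item]))   # simulated response
        responses.append(u)
        difficulties.append(bank[item])
        theta = eap(responses, difficulties)         # update the ability estimate
    return theta

bank = np.linspace(-3, 3, 50)                        # fabricated bank of item difficulties
print(simulate_cat(bank, true_theta=1.2))
```

Because each item is chosen where it is most informative for that particular student, a short adaptive test can match the precision of a much longer fixed form, which is the efficiency argument made above.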
