Psychometric forensics is a surprisingly deep and complex field.  Many of the indices are incredibly sophisticated, but a good high-level and simple analysis to start with is overall time vs. scores, which I call Time-Score Analysis.  This approach uses simple flagging on two easily interpretable metrics (total test time in minutes and number correct raw score) to identify possible pre-knowledge, clickers, and harvester/sleepers.  Consider the four quadrants that a bivariate scatterplot of these variables would produce.

 

| Quadrant | Interpretation | Possible threat? | Suggested flagging |
|---|---|---|---|
| Upper right | High scores and taking their diligent time | Good examinees | NA |
| Upper left | High scores with low time | Pre-knowledge | Top 50% score and bottom 5% time |
| Lower left | Low scores with low time | “Clickers” or other low motivation | Bottom 5% time and score |
| Lower right | Low scores with high time | Harvesters, sleepers, or just very low ability | Top 5% time and bottom 5% scores |
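
As a rough illustration of the flagging rules in the table above, here is a minimal Python sketch. The function names (`percentile_rank`, `time_score_flags`), the parameter names, and the way ties are handled are my own assumptions for illustration and do not come from any particular software; the default cutoffs simply mirror the table.

```python
from bisect import bisect_right


def percentile_rank(sorted_values, x):
    """Percent of values less than or equal to x (sorted_values must be sorted)."""
    return 100.0 * bisect_right(sorted_values, x) / len(sorted_values)


def time_score_flags(times, scores, tail=5.0, top_score=50.0):
    """Return one flag string per examinee; an empty string means no flag.

    tail      -- size of the extreme groups in percent (5 means top/bottom 5%)
    top_score -- size of the "high score" group used by the pre-knowledge rule
    """
    sorted_times = sorted(times)
    sorted_scores = sorted(scores)
    flags = []
    for t, s in zip(times, scores):
        t_pct = percentile_rank(sorted_times, t)
        s_pct = percentile_rank(sorted_scores, s)
        low_time = t_pct <= tail
        high_time = t_pct > 100.0 - tail
        low_score = s_pct <= tail
        high_score = s_pct > 100.0 - top_score
        if low_time and high_score:
            flags.append("possible pre-knowledge")      # upper left quadrant
        elif low_time and low_score:
            flags.append("possible clicker")            # lower left quadrant
        elif high_time and low_score:
            flags.append("possible harvester/sleeper")  # lower right quadrant
        else:
            flags.append("")
    return flags
```

Calling `time_score_flags(times, scores)` with two equal-length lists (minutes and number-correct scores) returns a list of labels in the same order. A label is only a reason to look more closely at an examinee, not a conclusion about them.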

An example of Time-Score Analysis

Consider the example data below.  What can this tell us about the performance of the test in general, and about specific examinees?

This test had 100 items, scored classically (number-correct), and a time limit of 60 minutes.  Most examinees took 45-55 minutes, so the time limit was appropriate.  A few examinees spent 58-59 minutes; there will usually be some diligent students like that.  There was a fairly strong relationship between time and score: examinees who took longer tended to score higher.

Now, what about the individuals?  I’ve highlighted 5 examples.

  1. This examinee had the shortest time, and one of the lowest scores.  They apparently did not care very much.  They are an example of a low motivation examinee that moved through quickly.  One of my clients calls these “clickers.”
  2. This examinee also took a short time, but had a suspiciously high score.  They definitely are an outlier on the scatterplot, and should perhaps be investigated.
  3. This examinee is simply super-diligent.  They went right up to the 60 minute limit, and achieved one of the highest scores.
  4. This examinee also went right up to the 60 minute limit, but had one of the lowest scores.  They are likely low ability or low motivation.  That same client of mine calls these “sleepers”: candidates who are forced to take the exam but don’t care, so they just sit there and doze.  Alternatively, it might be a harvester: someone who has been assigned to memorize test content, so they spend all the time they can, but only look at half the items so they can focus on memorization.
  5. This examinee had by far the lowest score, and one of the lowest times.  Perhaps they didn’t even answer every question.  Again, there is a motivation/effort issue here, most likely.

How useful is time-score analysis?

Like other aspects of psychometric forensics, this is primarily useful for flagging purposes.  We do not know yet if #4 is a harvester or just low motivation.  Instead of accusing them, we open an investigation.  How many items did they attempt?  Are they a repeat test-taker?  At what location did they take the test?  Do we have proctor notes, site video, remote proctoring video, or other evidence that we can review?  There is a lot that can go into such an investigation.  Moreover, simple analyses such as this are merely the tip of the iceberg when it comes to psychometric forensics.  In fact, there is so much involved that I’ve heard of organizations that simply stick their heads in the sand and don’t even bother checking out someone like #4.  It just isn’t in the budget.

However, test security is an essential aspect of validity.  If someone has stolen your test items, the test is now compromised, and you are guaranteed that scores do not mean the same thing they meant when the test was published.  It’s now apples and oranges, even though the items on the test are the same.  You might not challenge individual examinees, but you could instead institute a plan to publish new test forms every 6 months.  Regardless, your organization needs to have some difficult internal discussions and establish a test security plan.

 

You may have heard, we’ve recently successfully completed the SOC 2 Type 2 Examination. The SOC 2 process is cumbersome, but for us it was worth it. One of our core values is Do The Right Thing: for our partners, our people, and our planet. Our commitment to ensuring we are successfully securing our data is our way of doing the right thing.

How secure is your assessment program? Let our experts help secure your program by doing forensic analysis, strengthening your proctoring practices, and elevating your test delivery procedures.

In the spirit of data security and secure assessments, we want to share a post from our friends at inspired eLearning on the topic of data breaches.

 

Not sure what’s fact and what’s fiction when it comes to data breaches? Check out the top five data breach myths we’ve heard of…and the reality behind them!

Data Breach Myth 1: Only major companies get targeted for data breaches.

Reality: Any company of any size can be the target of a cyber-attack. We often only see news reports about data breaches from major companies which leads to data breach myths like this one. However, that doesn’t mean small companies are in the clear. In fact, 58% of companies that get their data stolen are small businesses. Basically, if your company has an online presence and collects data from customers in any way, you could be susceptible to a data breach.

Data Breach Myth 2: Cybersecurity is only the IT department’s problem.

Reality: Employees in all departments can establish a Security First mindset and help keep important company information safe from data breaches. In fact, it’s often employees not in the IT department who are accidentally making the company vulnerable to an attack or a data breach. This comes down to lack of security awareness training and resources. Many employees aren’t aware of the tell-tale signs of a phishing email and end up clicking infected links or opening bad attachments. This can easily open the door to malware, which can infiltrate the entire system rather than just affecting one employee. For this reason, it’s helpful for companies to teach all employees the basics on how to avoid data breaches, starting with security awareness training in the workplace.

Data Breach Myth 3: All you need is a strong password.

Reality: A strong password is helpful, but it won’t stop all data breaches. It can also be helpful to use two-factor authentication. You can add another layer of protection by requiring users to confirm a phone number via text message or requiring a fingerprint on top of entering their strong password. Although two-factor authentication can be helpful, it is not fool-proof. You should also implement cyber-security training to keep your organization educated and ahead of the threat.

Data Breach Myth 4: Data breaches only cause financial damage.

Reality: The financial and reputational damage caused by data breaches can affect companies for years. Companies might face fines and lawsuits that require them to pay out money to the victims of the data breach over time. They might also have to invest more money in cybersecurity training and defenses after the data breach. In addition to financial loss, companies often must deal with a loss of reputation and trust in their company. As a result, companies might lose business and in some cases be forced to shut down.

Data Breach Myth 5: It’s possible to be completely cyber secure.

Reality: Most security professionals would agree that it’s almost impossible to be totally bulletproof when it comes to cyber-attacks. However, cyber-risk is best managed through continual threat education, security awareness training, and involvement from all levels of leadership.

Want more tips? Read more at inspiredeLearning.

Fraudulent testing data is everywhere. In academic testing, students cheat by looking at other students’ responses or informing their friends in the next section what questions are on the test. In professional credentialing, candidates will sit for the exam simply to steal the content for posting on brain dump sites, while other candidates purchasing the content from these sites never pause to consider the ethical ramifications of trading in stolen property.

Threats to test security are also threats to validity and, by extension, the entire existence and integrity of the assessment. What’s worse? The greater the stakes, the greater the incentive to cheat. Has your organization ever taken a deep dive into your assessment data to search for evidence of cheating or other invalid behavior?

Dr. Nathan Thompson, Assessment Systems co-founder and VP of Psychometrics, has long recognized the value of psychometric forensics to an assessment program, but also the lack of software to implement it. Because of this, Dr. Thompson developed Software for Investigating Fraud in Testing (SIFT) in 2016.

“The software is easy to run because of its friendly UI, but the results are so complex that only a small percentage of Ph.D. psychometricians can understand the output,” Dr. Thompson said.

That is why Assessment Systems is proud to offer a Psychometric Forensics service, leveraging Dr. Thompson’s expertise (and our love for test security) to bring this customized consulting to organizations that wish to protect the integrity of their assessments.

“The cliché holds true here: an ounce of prevention is worth a pound of cure,” Dr. Thompson said. “We can work with you to identify areas of concern and explore policies, procedures, and practices that will help you.”

If you provide us a dataset, we’ll analyze it with a range of collusion indices and other statistics, evaluating your examinees individually as well as groups such as test centers or classrooms. ASC’s mission is to improve the quality of as many assessments as we can.

Psychometrics is the cornerstone of any high-quality assessment program.  Most organizations do not have an in-house PhD psychometrician, which then necessitates the search for psychometric consulting.  Most organizations, when first searching, are new to the topic and not sure what role the psychometrician plays.  In this article, we’ll talk about how psychometricians and their tools can help improve your assessments, whether you just want to check on test reliability or pursue the lengthy process of accreditation.

Why ASC?

Whether you are establishing or expanding a credentialing program, streamlining operations, or moving from paper to online testing, ASC has a proven track record of providing practical, cost-efficient solutions with uncompromising quality. We offer a free consultation with our team of experts to discuss your needs and determine which solutions are the best fit, including our enterprise SaaS platforms, consulting on sound psychometrics, or recommending you to one of our respected partners.
 

At the heart of our business is our people.

Our collaborative team of Ph.D. psychometricians, accreditation experts, and software developers have diverse experience developing solutions that drive best practices in assessment. This real-world knowledge enables us to consult your organization with solutions tailored specifically to your goals, timeline, and budget.
 

Comprehensive Solutions to Address Specific Measurement Problems

Much of psychometric consulting is project-based around solving a specific problem.  For example, you might be wondering how to set a cutscore on a certification/licensure exam that is legally defensible and meets accreditation standards.  This is a very specific issue, and the scientific literature has suggested a number of sound approaches.  Here are some of the topics where psychometricians can really help:

  • Test Design: Job Analysis & Blueprints
  • Standard and Cutscore Setting Studies
  • Item Writing and Review Workshops
  • Test and Item Statistical Analysis
  • Equating Across Years and Forms
  • Adaptive Testing Research
  • Test Security Evaluation
  • NCCA/ANSI Accreditation

 

Why psychometric consulting?

All areas of assessment can be smarter, faster and fairer.

Develop Reliable and Valid Assessments
We’ll help you understand what needs to be done to develop defensible tests and how to implement them in a cost-efficient manner.  Much of the work revolves around establishing a sound test development cycle.

Increase Test Security
We have specific expertise in psychometric forensics, allowing you to flag suspicious candidates or groups in real time, using our automated forensics report.

Achieve Accreditation
Our dedicated experts will assist in setting your organization up for success with NCCA/ANSI accreditation of professional certification programs.

Comprehensive Psychometric Analytics
We use CTT and IRT with principles of machine learning and AI to deeply understand your data and provide actionable recommendations.

We can help your organization develop and publish certification and licensure exams, based on best practices and accreditation standards, in a matter of months.

If you’re looking for a way to add these best practices to your assessments, here’s how:

Item and Test Statistical Analysis
If you are not doing this process at least annually, you are not meeting best practices or accreditation standards. But don’t worry, we can help! In addition to performing these analyses for you, we give you the option of running them yourself in our FastTest platform or using our psychometric software like Iteman and Xcalibre.

Job Analysis
How do you know what a professional certification test should cover?  Well, let’s get some hard data by surveying job incumbents. Knowing and understanding this information and how to use it is essential if you want to test people on whether they are prepared for the job or profession.

Cutscore Studies (Standard Setting)
Using sound psychometric methods like the modified-Angoff, Beuk Compromise, Bookmark, and Contrasting Groups will help you establish a cutscore that meets professional standards.

 

It’s all much easier if you use the right software!

Once we help you determine the best solutions for your organization, we can train you on best practices, and it’s extremely easy to use our software yourself.  Software like Iteman and Xcalibre is designed to replace much of the manual work done by psychometricians for item and test analysis, and FastTest automates many aspects of test development and publishing.  We even offer free software like the Angoff Analysis Tool.  However, our ultimate goal is your success: Assessment Systems is a full-service company that continues to provide psychometric consulting and support even after you’ve made a purchase. Our team of professionals is available to provide you with additional support at any point in time. We want to ensure you’re getting the most out of our products!  Click below to sign up for a free account in FastTest and see for yourself.

 

So, yeah, the use of “hacks” in the title is definitely on the ironic and gratuitous side, but there is still a point to be made: are you making full use of current technology to keep your tests secure?  Gone are the days when you are limited to linear test forms on paper in physical locations.  Here are some quick points on how modern assessment technology can deliver assessments more securely, effectively, and efficiently than traditional methods:

1.  AI delivery like CAT and LOFT

Psychometrics was one of the first areas to apply modern data science and machine learning (see this blog post for a story about a MOOC).  But did you know it was also one of the first areas to apply artificial intelligence (AI)?  Early forms of computerized adaptive testing (CAT) were suggested in the 1960s and became widely available in the 1980s.  CAT delivers a unique test to each examinee by using complex algorithms to personalize the test.  This makes it much more secure, and can also reduce test length by 50-90%.

2. Psychometric forensics

Modern psychometrics has suggested many methods for finding cheaters and other invalid test taking behavior.  These can range from very simple rules like flagging someone for having a top 5% score in a bottom 5% time, to extremely complex collusion indices.  These approaches are designed explicitly to keep your test more secure.

3. Tech enhanced items

Tech enhanced items (TEIs) are test questions that leverage technology to be more complex than is possible on a paper test.  Classic examples include drag and drop or hotspot items.  These items are harder to memorize, and therefore contribute to security.

4. IP address limits

Suppose you want to make sure that your test is only delivered in certain school buildings, campuses, or other geographic locations.  You can build a test delivery platform that limits your tests to a range of IP addresses, which implements this geographic restriction.
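
To make the idea of an IP address limit concrete, here is a small Python sketch using the standard library `ipaddress` module. The network ranges and the names `ALLOWED_NETWORKS` and `ip_is_allowed` are hypothetical placeholders; a real delivery platform would load its own ranges and apply the check wherever it handles the start of a test session.

```python
import ipaddress

# Hypothetical allowlist: placeholder ranges, not real testing sites.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),   # e.g. one school building
    ipaddress.ip_network("10.20.0.0/16"),   # e.g. a campus network
]


def ip_is_allowed(address, networks=ALLOWED_NETWORKS):
    """Return True if the address falls inside any allowed network."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Malformed addresses are rejected rather than raising an error.
        return False
    return any(ip in network for network in networks)
```

For example, `ip_is_allowed("192.0.2.17")` is accepted under this placeholder list, while an address outside both ranges is refused.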

5. Lockdown browser

 A lockdown browser is special software that locks a computer screen onto a test in progress, so for example a student cannot open Google in another tab and simply search for answers.  Advanced versions can also scan the computer for software that is considered a threat, like screen capture software.

6. Identity verification

Tests can be built to require unique login procedures, such as requiring a proctor to enter their employee ID and the test-taker to enter their student ID.  Examinees can also be required to show photo ID, and of course there are new biometric methods being developed.

7. Remote proctoring

The days are gone when you need to hop in the car and drive 3 hours to sit in a windowless room at a community college to take a test.  Nowadays, proctors can watch you and your desktop via webcam.  This is arguably as secure as in-person proctoring, and certainly more convenient and cost-effective.

So, how can I implement these to deliver assessments more securely?

Some of these approaches are provided by vendors specifically dedicated to that space, such as ProctorExam for remote proctoring.  However, if you use ASC’s FastTest platform, all of these methods are available for you right out of the box.  Want to see for yourself?  Sign up for a free account!

The traditional Learning Management System (LMS) is designed to serve as a portal between educators and their learners. Platforms like Moodle are successful in facilitating cooperative online learning in a number of groundbreaking ways: course management, interactive discussion boards, assignment submissions, and delivery of learning content. While all of this is great, we’ve yet to see an LMS that implements best practices in assessment and psychometrics to ensure that medium or high stakes tests meet international standards.

To put it bluntly, LMS systems have assessment functionality that is usually good enough for short classroom quizzes but falls far short of what is required for a test that is used to award a credential.  A white paper on this topic is available here, but some examples include:

  • Treatment of items as reusable objects
  • Item metadata and historical use
  • Collaborative item review and versioning
  • Test assembly based on psychometrics
  • Psychometric forensics to search for non-independent test-taking behavior
  • Deeper score reporting and analytics

Assessment Systems is pleased to announce the launch of an easy-to-use bridge between FastTest and Moodle that will allow users to seamlessly deliver sound assessments from within Moodle while taking advantage of the sophisticated test development and psychometric tools available within FastTest. In addition to seamless delivery for learners, all candidate information is transferred to FastTest, eliminating the examinee import process.  The bridge makes use of the international Learning Tools Interoperability standards.

If you are already a FastTest user, you can follow a step-by-step tutorial on how to establish the connection in the FastTest User Manual: log into your FastTest workspace and select Manual in the upper right-hand corner. You’ll find the guide in Appendix N.

If you are not yet a FastTest user and would like to discuss how it can improve your assessments while still allowing you to leverage Moodle or other LMS systems for learning content, sign up for a free account here.

As we jump headfirst into 2018, we’re reflecting on our successes from the past year. One such success was our inclusion in the Minneapolis/St. Paul Business Journal’s list of Best Places to Work in 2017. We’re honored to be recognized!


So, what makes Assessment Systems one of the best places to work?

Though founded in 1979, we run our company with the mindset and energy of a startup. This means we have a strong foundation on which to create world-class software, but at the same time, we’re constantly innovating, working with the newest technologies and taking risks.

Our leadership team drives this startup mentality, which encourages employees to constantly be on their toes. With experts in a variety of areas, including assessment, psychometrics, entrepreneurship, and tech, not only do all team members play an important role in the business, they also have a real opportunity to make a difference.

We have great company values.

Furthermore, it’s easy for our employees to be inspired every day due to our company’s values. Our CEO stresses the importance of doing the right thing and being kind, which everyone on the team is proud to stand behind. Principles such as these are fundamental to the success of our employees. Ask anyone who’s partnered with us and they’ll tell you that we’re a small company with a big heart that wants to provide the best product and service to our clients.


Last, but certainly not least, we love what we do!

Our unique company culture, diverse team, and values make it easy to love where we work. As a result, we’re all the more motivated to make a difference in our industry and to continue improving our company culture even more.

We may not have a big team, but we have incredible skill sets and a collaborative environment where we rely on each other to make great things happen. We are a small company that is changing the way people test online and improving the world one test at a time.


Sound interesting? Check out our careers page or learn more about what we’re doing at assess.com.

Desperation is seldom fun to see.

Some years ago, having recently released our online marking functionality, I was reviewing it in a customer workspace and was intrigued to see “Beyonce??” mentioned in a marker’s comments on an essay. The student’s essay was evaluating some poetry and had completely misunderstood the use of metaphor in the poem in question. The student also clearly knew that her interpretation was way off, but didn’t know how, and had reached the end of her patience. So after a desultory attempt at answering, with a cry from the heart reminiscent of William Wallace’s call for freedom, she wrote “BEYONCE” with about seventeen exclamation points. It felt good to see that her spirit was not broken, and it was a moment of empathy that drove home the damage that standardized tests are inflicting on our students. That vignette plays itself out millions of times each year in this country, and the following explains why.

What are “Standardized Tests”?

We use standardized tests for a variety of reasons, but underlying every reason (curriculum effectiveness, college/career preparedness, teacher effectiveness, etc.) is the understanding that the test is measuring what a student has learned. In order to know how all our students are doing, we give them all standardized tests, meaning that every student takes essentially the same test. This is a difficult endeavor given the wide range of students and number of tests, and raises the question “How do we do this reliably and in a reasonable amount of time?”

Accuracy and Difficulty vs Length

We all want tests to reliably measure the students’ learning. In order to make these tests reliable, we need to supply questions of varying difficulty, from very easy to very difficult, to cover a wide range of abilities. In order to reduce the length of the test, most of the questions fall in the medium-easy to medium-difficult range because that is where most of the students’ ability levels will fall. So the test that best balances length and accuracy for the whole population should be constructed such that the number of questions of any difficulty is proportionate to the number of students of that ability.

Why are most questions in the medium difficulty range? Imagine creating a test to measure 10th graders’ math ability. A small number of the students might have a couple years of calculus. If the test covered those topics, imagine the experience of most students, who would often not even understand the notation in the question. Frustrating, right? On the other hand, if the test was also constructed to measure students with only rudimentary math knowledge, the average to advanced students would be frustrated and bored from answering a lot of questions on basic math facts. The solution most organizations use is to present only a few questions that are really easy or difficult, and accept that the score is not as accurate as they would prefer for the students at either end of the ability range.

These Tests are Inaccurate and Mean Spirited

The problem is that while this might work OK for a lot of kids, it exacts a pretty heavy toll on others. Almost one in five students will not know the answer to 80% of the questions on these tests, and scoring about 20% on a test certainly feels like failing. It feels like failing every time a student takes such a test. Over the course of an academic career, students in the bottom quintile will guess on or skip 10,000 questions. That is 10,000 times the student is told that school, learning, or success is not for them. Even biasing the test to be easier only makes a slight improvement.

[Figure: Computerized Adaptive Testing, Test Performance with Bell Curve. The shaded area represents students who will miss at least 80% of questions.]

It isn’t necessarily better for the top students whose every testing experience assures them that they are already very successful when the reality is that they are likely being outperformed by a significant percentage of their future colleagues.

In other words, at both ends of the Bell Curve, we are serving our students very poorly, inadvertently encouraging lower performing students to give up (there is some evidence that the two correlate) and higher performing students to take it easy. It is no wonder that people dislike standardized tests.

There is a Solution

A computerized adaptive test (CAT) solves all the problems outlined above. Properly constructed, a CAT makes testing faster, more valid, and fairer:

  • Every examinee completes the test in less time (fast)
  • Every examinee gets a more accurate score (valid)
  • Every examinee receives questions tuned to their ability so gets about half right (fair)
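
To show why adaptive item selection leads to examinees getting about half the questions right, here is a minimal Python sketch of the selection step only. It assumes the Rasch (one-parameter logistic) model, which is just one common choice and is my assumption here; the function names and the example difficulties are hypothetical, and a full CAT would also need ability estimation and a stopping rule.

```python
import math


def p_correct(theta, b):
    """Rasch model: probability that ability theta answers difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Item information under the Rasch model; it peaks when b equals theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)


def next_item(theta, difficulties, administered):
    """Index of the unused item that is most informative at the current theta."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))
```

With hypothetical difficulties `[-2.0, -1.0, 0.0, 1.0, 2.0]` and a current ability estimate of 0.9, `next_item` picks the item with difficulty 1.0, where the chance of a correct answer is close to 50%. That is the sense in which each examinee receives questions tuned to their ability.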

Given all the advantages of CAT, it may seem hard to believe that CATs are not used more often. While they are starting to catch on, it is not happening fast enough given the heavy toll that the old methods exact on our students. It is true that few testing providers can enable CATs, but that is simply making an excuse. If a standardized test is delivered to as few as 500 students, it can be made adaptive. It probably isn’t, but it could be. All that is needed are computers or tablets, an Internet connection, and some effort. We should expect more.

How can my organization implement CAT?

While CAT used to only be feasible for large organizations that tested hundreds of thousands or millions of examinees per year, a number of advances have changed this landscape.  If you’d like to do something about your test, it might be worthwhile for you to evaluate CAT.  We can help you with that evaluation; if you’d like to chat, here is a link to schedule a meeting. Or, if you’d like to discuss the math or related ideas, please drop me a note.

Since the first tests were developed 2000 years ago for entry into civil service of Imperial China, test security has been a concern.  The reason is quite straightforward: most threats to test security are also threats to validity, and the decisions we make with test scores could therefore be invalid, or at least suboptimal.  It is therefore imperative that organizations that develop or utilize tests should develop a Test Security Plan (TSP).  The TSP is a document that helps an organization anticipate test security issues, establish deterrent and detection methods, and plan responses.  It can also include validity threats that are not security-related, such as how to deal with examinees that have low motivation.

There are several reasons to develop a Test Security Plan.  First, it drives greater security and therefore validity.  The TSP will enhance the legal defensibility of the testing program.  It helps to safeguard the content, which is typically an expensive investment for any organization that develops tests themselves.  If incidents do happen, they can be dealt with more swiftly and effectively.  It helps to manage all the security-related efforts.

The development of such a complex document requires a strong framework.  We advocate a framework with three phases: planning, implementation, and response.  In addition, the TSP should be revised periodically.

 

Phase 1: Planning

The first step in this phase is to list all potential threats to each assessment program at your organization.  This could include harvesting of test content, preknowledge of test content from past harvesters, copying other examinees, proxy testers, proctor help, and outside help.  Next, these should be rated on axes that are important to the organization; a simple approach would be to rate on potential impact to score validity, cost to the organization, and likelihood of occurrence.  This risk assessment exercise will help the remainder of the framework.
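
One simple way to run the risk assessment described above is to rate each threat on the three axes and sort by a combined score. The Python sketch below is only an illustration: the 1 to 5 rating scale, the idea of multiplying the three ratings, and the names `Threat` and `rank_threats` are my assumptions, and an organization may well prefer different axes or weights.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    impact: int      # potential impact on score validity, 1 (low) to 5 (high)
    cost: int        # cost to the organization, 1 (low) to 5 (high)
    likelihood: int  # likelihood of occurrence, 1 (low) to 5 (high)

    def priority(self):
        """Combined rating; multiplying the three axes is an assumed convention."""
        return self.impact * self.cost * self.likelihood


def rank_threats(threats):
    """Return the threats ordered from highest to lowest combined rating."""
    return sorted(threats, key=lambda t: t.priority(), reverse=True)
```

The ratings themselves still have to come from your own stakeholders; the ranking only makes it easier to see which threats the rest of the plan should address first.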

Next, the organization should develop the TSP.  The first piece is to identify deterrents and procedures to reduce the possibility of issues.  This includes delivery procedures (such as a lockdown browser or proctoring), proctor training manuals, a strong candidate agreement, anonymous reporting pathways, confirmation testing, and candidate identification requirements.  The second piece is to explicitly plan for psychometric forensics.  This can range from complex collusion indices based on item response theory to simple flags, such as a candidate responding to a certain multiple choice option more than 50% of the time or obtaining a score in the top 10% but in the lowest 10% of time.  The third piece is to establish planned responses.  What will you do if a proctor reports that two candidates were copying each other?  What if someone obtains a high score in an unreasonably short time?  What if someone obviously did not try to pass the exam, but still sat there for the allotted time?  If a candidate were to lose a job opportunity due to your response, it helps your defensibility to show that the process was established ahead of time with the input of important stakeholders.
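
The simple option-frequency flag mentioned above (one multiple choice option selected more than 50% of the time) could be sketched in Python as follows. The function name `dominant_option_flag`, the use of `None` for skipped items, and the choice to ignore skipped items in the denominator are assumptions made for this example.

```python
from collections import Counter


def dominant_option_flag(responses, threshold=0.5):
    """Return (flagged, option, share) for one candidate's responses.

    responses -- list of selected options such as "A".."D"; None means skipped
    threshold -- flag when one option's share of answered items exceeds this
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return (False, None, 0.0)
    option, count = Counter(answered).most_common(1)[0]
    share = count / len(answered)
    return (share > threshold, option, share)
```

A candidate who chose "C" on 6 of 10 answered items would be flagged with a share of 0.6, while a candidate at exactly 50% would not, since the rule is "more than 50%".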

 

Phase 2: Implementation

The second phase is to implement the relevant aspects of the Test Security Plan, such as training all proctors in accordance with the manual and login procedures, setting IP address limits, or ensuring that a new secure testing platform with lockdown is rolled out to all testing locations.  There are generally two approaches.  Proactive approaches attempt to reduce the likelihood of issues in the first place, and reactive methods happen after the test is given.  The reactive methods can be observational, quantitative, or content-focused.  Observational methods include proctor reports or an anonymous tip line.  Quantitative methods include psychometric forensics, for which you will need software like SIFT.  Content-focused methods include automated web crawling.

Both approaches require continuous attention.  You might need to train new proctors several times per year, or update your lockdown browser.  If you use a virtual proctoring service based on record-and-review, flagged candidates must be periodically reviewed.  The reactive methods are similar: incoming anonymous tips or proctor reports must be dealt with at any given time.  The least continuous aspect is some of the psychometric forensics, which depend on a large-scale data analysis; for example, you might gather data from tens of thousands of examinees in a testing window and can only do a complete analysis at that point, which could take several weeks.

 

Phase 3: Response

The third phase, of course, is to put your planned responses into motion if issues are detected.  Some of these could be relatively innocuous; if a proctor is reported as not following procedures, they might need some remedial training, and it’s certainly possible that no security breach occurred.  The more dramatic responses include actions taken against the candidate.  The most lenient is to provide a warning or simply ask them to retake the test.  The most extreme responses include a full invalidation of the score with future sanctions, such as a five-year ban on taking the test again, which could prevent someone from entering a profession for which they spent 8 years and hundreds of thousands of dollars in educational preparation.

 

What does a test security plan mean for me?

It is clear that test security threats are also validity threats, and that the extensive (and expensive!) measures warrant a strategic and proactive approach in many situations.  A framework like the one advocated here will help organizations identify and prioritize threats so that the measures are appropriate for a given program.  Note that the results can be quite different if an organization has multiple programs, from a practice test to an entry level screening test to a promotional test to a professional certification or licensure.

Another important difference is that between test sponsors/publishers and test consumers.  In the case of an organization that purchases off-the-shelf pre-employment tests, the validity of score interpretations is of more direct concern, while the theft of content might not be an immediate concern.  Conversely, the publisher of such tests has invested heavily in the content and could be massively impacted by theft, while the copying of two examinees in the hiring organization is not of immediate concern.

In summary, there are more security threats, deterrents, procedures, and psychometric forensic methods than can be discussed in one blog post, so the focus here is on the framework itself.  For starters, think strategically about test security and how it impacts your assessment programs by using the multi-axis rating approach, then begin to develop a Test Security Plan.  The end goal is to improve the health and validity of your assessments.


If you are involved with any sort of data science, which psychometrics most definitely is, you’ve probably used R.  R is an environment that allows you to implement packages for many different types of analysis, which are built by a massive community of data scientists around the world.  These are completely self-policed, so there is a wide range of quality.  One package that is specific to psychometrics is called CopyDetect, and it’s designed particularly for psychometric forensics.  Here are some of my thoughts on CopyDetect and similar packages in R.

Note: This review was written in October 2017 for v1.2, and R packages are frequently updated by their authors, so some issues might be addressed in the future.

What is CopyDetect?

CopyDetect (documentation here) is a package, meaning that you must first install it, then write and run all the code you need for importing and formatting your data, and only then call it using the functions designed by the author.  An example in the documentation looks something like the line below, which calls CopyDetect to compare the response strings of Examinee #30 and Examinee #70.

CopyDetect1(data=responses,item.par=est.ipar,pair=c(30,70))

This assumes, of course, at an even higher level, that you are comfortable with writing code – most psychometricians are, but certainly not most professionals in the assessment field.

Advantages of CopyDetect

  1. It is free.  Like many R packages, it enjoys attention simply because of the price tag.
  2. It calculates well-respected, modern indices such as Wollack’s Omega and the K-variants.  In contrast, an old program called Scrutiny! calculated only one index, and it wasn’t a very good one.  Some other software only calculates really bad indices.
  3. It’s not hard to get it running, at a basic level.  There have been other packages that I’ve simply had to give up on.
  4. It is being updated.  Because it is maintained by a university professor with time to devote to it, some issues can be addressed.  Some packages have been abandoned, like many WordPress Plugins.
  5. It makes psychometric forensics accessible to more organizations and psychometricians.

Disadvantages of CopyDetect

Here are some of the issues I have found.  Many thanks to a colleague (another psychometrician, who frequently speaks on forensics at conferences) who performed an independent evaluation and shared similar points.

  1. It primarily uses dichotomous 0/1 data.  There is a “CopyDetect1” portion of the package that assumes multiple choice items have been collapsed to 0/1.  However, most of the best collusion indices specifically focus on the probability of two examinees selecting the same incorrect distractor, and this info is ignored in dichotomous data because all incorrect answers have been collapsed to 0.  There is a “CopyDetect2” portion of the package, but neither I nor a colleague at another organization (who is an internationally recognized speaker on psychometric forensics) was able to even get it to run.
  2. CopyDetect counts an examinee against itself in the baseline calculations, which of course is a minor effect in large samples, but still an obvious error because an examinee can’t copy off themselves.
  3. CopyDetect doubles all frequencies by looking at the entire matrix rather than only one triangle of it; if Examinees 1 and 2 had the same response on 10 items, it also adds in that “2 vs 1” had 10 items with the same response, stating the total number of same responses is 20 when it really is 10.
  4. The documentation is lacking, though this is actually normal for R packages.  The vast majority of the useful help/examples for R is on websites like StackOverflow or Quora.  Very rarely have I found anything of use in the official documentation for any R package.  The example does not talk about how to run real data and evaluate results for cheating – the entire purpose of the software – and instead provides instructions on simulation studies to check the power/error of indices on fake data.
  5. CopyDetect does not produce usable output.  Results are stored as an object, which is a sort of mini-database inside R.  You then have to write more code to find, format, and print the results you might need.
  6. CopyDetect does not recognize on its own that you might have more than 2 examinees, which of course is what always happens.  You have to write more code to perform all pairwise comparisons.
  7. It can be incredibly slow.  This is also normal for a lot of R since 1) many packages are low quality and 2) R itself is not a compiled language (though many packages contain compiled code, and some packages are indeed incredibly fast at what they do).
  8. It does not do roll-up analysis at the group level, which is incredibly important for psychometric forensics.
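To illustrate a few of the points above (distractor matches, self-comparisons, double counting, and all pairwise comparisons), here is a minimal sketch. It is written in Python rather than R, does not use CopyDetect, and only counts raw matches; it is not Wollack’s Omega or any of the K-variants. The function names, answer key, and response strings are hypothetical.

```python
from itertools import combinations


def matching_incorrect(resp_a, resp_b, key):
    """Count items where both examinees chose the same incorrect option."""
    return sum(1 for a, b, k in zip(resp_a, resp_b, key) if a == b and a != k)


def all_pairs(responses, key):
    """Compare every unordered pair of examinees exactly once.

    combinations() never pairs an examinee with itself and never yields
    both (1, 2) and (2, 1), so there is no self-match and no double counting.
    """
    results = {}
    for (id_a, resp_a), (id_b, resp_b) in combinations(sorted(responses.items()), 2):
        results[(id_a, id_b)] = matching_incorrect(resp_a, resp_b, key)
    return results


# Hypothetical answer key and raw (not 0/1) multiple choice responses.
key = "ABCDA"
responses = {
    "E1": "ACDDB",  # wrong on items 2, 3, and 5
    "E2": "ACDDB",  # identical to E1, including the same wrong options
    "E3": "ABCDC",  # wrong on item 5 only, with a different option
}

pair_counts = all_pairs(responses, key)
for pair, count in pair_counts.items():
    print(pair, count)
```

With n examinees this produces n(n-1)/2 comparisons, which is why the full pairwise analysis gets slow on large data sets. Note also that the matching distractors for E1 and E2 are only visible because the raw options were kept; collapsing to 0/1 would hide which wrong answer each examinee chose.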

Summary

So what does this mean?  Well, I spent one evening looking into using CopyDetect for some of my consulting work, and immediately recognized the above issues.  That was enough to drop it and move on.

This isn’t the only experience I’ve had like this with R, either.  As noted in this scientific research from the University of Minnesota, some of the item response theory packages in R aren’t the best.

Update: The author has released a new version (currently dated 2018.10.08), and some of these issues might now be addressed.