Posts on psychometrics: The Science of Assessment

The security and validity of online tests and exams are extremely important. The COVID-19 pandemic is drastically changing every aspect of our world, and one of the most heavily affected areas is educational assessment, along with other types of assessment. Many organizations were still delivering tests with methodologies from 50 years ago, such as putting 200 examinees in a large room with desks, paper exams, and a pencil. COVID-19 is forcing many organizations to pivot, which provides an opportunity to modernize assessment.

But how can we maintain assessment security, and therefore validity, through these changes? Below are some suggestions, all of which can be easily implemented on ASC’s industry-leading assessment platforms. Get started by signing up for a free account at https://assess.com/assess-ai/.

True item banking with content access

A good online assessment starts with good items. While Learning Management Systems (LMS) and other platforms that are not truly assessment platforms include some item authoring functionality, they usually do not meet the basic requirements for true item banking. There are best practices for item banking that are standard at large-scale assessment organizations (e.g., U.S. state Departments of Education) but are surprisingly rare in professional certification/licensure exams, universities, and other organizations. Here are some examples.

• Items are reusable (you do not need to upload them for each test in which they are used)

• Item version tracking

• Tracking and audit trails of user edits

• Author content controls (math teachers can only see math items)

• Storage of metadata such as item response theory (IRT) parameters and classical statistics

• Tracking of item usage across tests

• Item review workflow

Role-based access

All users should be limited by roles, such as Item Author, Item Reviewer, Test Publisher, and Examinee Manager. So, for example, someone in charge of managing the examinee/student roster might never see any test questions.

Data forensics

There are many ways to analyze your test results to look for possible security/validity threats. Our SIFT software provides a free platform to help you implement this modern methodology. You can evaluate collusion indices, which quantify how similar the responses are for any given pair of examinees. You can also evaluate response times, group performance, and roll-up statistics.

Randomization

When tests are delivered online, you should have the option to randomize the order of the items and also the order of the answers. When printing to paper, there should also be an option to randomize the order, though of course you are much more limited in this respect with paper.
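As a rough illustration of the idea, here is a minimal Python sketch that shuffles both item order and answer order for one examinee. The item structure (the `id`, `stem`, `options`, and `key` fields) and the function name are assumptions made for this example, not the format of any particular platform.

```python
import random

def randomize_form(items, seed=None):
    """Return a shuffled copy of the items, with answer options shuffled too."""
    rng = random.Random(seed)  # a per-examinee seed makes the form reproducible
    shuffled = []
    for item in items:
        options = list(item["options"])  # copy so the bank itself is untouched
        rng.shuffle(options)             # randomize answer order
        shuffled.append({"id": item["id"], "stem": item["stem"],
                         "options": options, "key": item["key"]})
    rng.shuffle(shuffled)                # randomize item order
    return shuffled

# Hypothetical items; the key is stored as the text of the correct option,
# so it stays valid after the options are shuffled.
items = [
    {"id": "Q1", "stem": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "key": "4"},
    {"id": "Q2", "stem": "3 x 3 = ?", "options": ["6", "9", "12", "27"], "key": "9"},
    {"id": "Q3", "stem": "10 / 2 = ?", "options": ["2", "4", "5", "8"], "key": "5"},
]
form = randomize_form(items, seed=12345)
```

Storing the key as option text rather than as a position ("B") is one simple way to keep scoring correct when answer order differs between examinees.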

Linear on-the-fly testing (LOFT)

LOFT creates a unique randomized test for each examinee. For example, you might have a pool of 300 items spread across 4 domains, and each examinee receives 100 items, with 25 from each domain. This greatly increases security.
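The example above can be sketched in a few lines of Python. This is only a simplified illustration: it balances content by domain and nothing else, whereas operational LOFT typically also balances statistical properties. The pool layout (75 items per domain), the domain labels, and the function name are assumptions made for the example.

```python
import random

def assemble_loft_form(pool, per_domain, seed=None):
    """Draw a random form that meets the per-domain item counts."""
    rng = random.Random(seed)
    form = []
    for domain, n_items in per_domain.items():
        candidates = [item for item in pool if item["domain"] == domain]
        form.extend(rng.sample(candidates, n_items))  # sample without replacement
    rng.shuffle(form)  # mix the domains together
    return form

# Hypothetical pool: 300 items, assumed to be split evenly over 4 domains.
pool = [{"id": f"D{d}-{i:03d}", "domain": f"D{d}"}
        for d in range(1, 5) for i in range(75)]
blueprint = {"D1": 25, "D2": 25, "D3": 25, "D4": 25}

form = assemble_loft_form(pool, blueprint, seed=42)
```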

Computerized adaptive testing (CAT)

CAT takes personalization even further, adapting the difficulty of the exam and the number of items each examinee sees, based on certain psychometric algorithms and goals. This makes the test extremely secure.

Lockdown browser

Want to make sure the examinee cannot browse for answers or take screenshots of items? You need a lockdown browser. ASC’s assessment platforms, Assess.ai and FastTest, come with this out of the box and at no extra cost.

Examinee test codes

Want to make sure the right person takes the right exam? Generate unique one-time passwords to be handed out by a proctor after identity verification. This is especially useful in remote proctoring: the examinee receives no information before the exam about how to enter, other than how to start the virtual proctoring session. Once the proctor verifies the examinee’s identity, they provide the unique one-time password.
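To make the mechanism concrete, here is a minimal sketch of one-time codes using Python's standard `secrets` module. The class and method names are hypothetical, and the in-memory dictionary is an assumption for the example; a production system would persist the codes (ideally hashed) and would likely add an expiry time.

```python
import secrets

class OneTimeCodes:
    """Issue a code per examinee and accept it at most once."""

    def __init__(self):
        self._unused = {}  # examinee_id -> code that has not been redeemed yet

    def issue(self, examinee_id):
        code = secrets.token_urlsafe(8)  # cryptographically strong random code
        self._unused[examinee_id] = code
        return code

    def redeem(self, examinee_id, code):
        expected = self._unused.get(examinee_id)
        # compare_digest avoids leaking information through timing differences
        if expected is not None and secrets.compare_digest(expected, code):
            del self._unused[examinee_id]  # the code cannot be used again
            return True
        return False

codes = OneTimeCodes()
code_for_e1 = codes.issue("examinee-001")  # the proctor reads this out after the ID check
```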

Proctor codes

Want an additional step in the test start procedure? Once an examinee’s identity is verified and they enter their code, the proctor must also enter a different password that is unique to that proctor for that day.

Date/time windows

Want to prevent examinees from logging in early or late? Set up a specific time window, such as Friday from 9 am to 12 noon.

Proctoring based on AI (artificial intelligence)

This level of proctoring is relatively inexpensive and does a great job of validating the results of an individual examinee. However, it does not protect the intellectual property of your exam questions. If an examinee steals all the questions, you will not know right away. It is therefore very useful for low- or mid-stakes exams, but less useful for high-stakes exams such as certification or licensure. Learn more about our remote proctoring options. I also recommend this blog post for an overview of the remote proctoring industry.

Live online proctoring

If you cannot use in-person test centers because of COVID, this is the next best option. Live proctors can check in the candidate, verify identity, and implement all of the measures above. In addition, they can check the examinee’s environment and stop the exam if they see the examinee stealing questions or causing other major problems. MonitorEDU is a great example of this.

How can I get started?

Need help implementing some of these measures? Or just want to talk about the possibilities? Email ASC at solutions@assess.com.

 

Online proctoring has been around for more than a decade. But given the recent COVID-19 outbreak, educational and workforce/certification institutions are scrambling to change their operations, and a large part of this is an incredible increase in online proctoring. This blog post is intended to provide an overview of the online proctoring industry for someone who is new to the topic, or who is starting to shop around and is overwhelmed by all the options out there.

Online Proctoring: Two Distinct Markets

First, I would describe the online proctoring industry as falling into two distinct markets, so the first step is to determine which one fits your organization.

1. Larger-scale, lower-cost (when at large scale), lower-security systems designed to be used only as an add-on to major LMS platforms such as Blackboard or Canvas. These online proctoring systems are therefore designed for mid-stakes exams, such as an Intro to Psychology midterm at a university.

2. Smaller-scale, higher-cost, higher-security systems designed to be used with standalone assessment platforms. These are generally for higher-stakes exams such as certification or workforce testing, or perhaps for special uses at universities such as admissions and placement exams.

How do you tell the difference? The first type will advertise easy integration with systems such as Blackboard or Canvas as a key feature. They will also often focus on AI review of videos rather than live humans. Another key consideration is to look at the existing client base, which is usually advertised.

Other ways that online proctoring systems can differ

AI vs. humans: Some systems rely exclusively on artificial intelligence algorithms to flag video recordings of examinees. Other systems use real humans.

Record and Review vs. Live Humans: If humans are used, there are two approaches. First, it can be live and in real time, meaning that there is a human on the other end of the video who can confirm identity before allowing the test to start, and stop the test if there is illicit activity. Record and Review, by contrast, records the session and a human checks it within 24 to 48 hours. This is more flexible, but you cannot stop the test if someone is stealing the content; you likely will not know until the next day.

Screen capture: Some online proctoring providers have the option to record/stream the screen as well as the webcam. Some also offer the option to do only this (no webcam) for lower-stakes exams.

Mobile phone as third camera: Some newer platforms offer the option to easily integrate the examinee’s mobile phone as a third camera, which effectively serves as a human proctor. Examinees are instructed to use the video to show under the table, behind the monitor, and so on before starting the exam. They can then be instructed to place the phone 2 meters away with a clear view of the entire room while the test is in progress.

Using your own proctors: Some online proctoring systems allow you to use your own staff as proctors, which is especially useful if the test is delivered in a short time window. If it is delivered continuously 24×7 all year, you will probably want to use the vendor’s highly trained staff.

API integrations: Some systems require software developers to set up an API integration with your LMS or assessment platform. Others are more flexible: you can log in yourself, upload a list of examinees, and you are all set for the test.

On-demand vs. scheduled: Some platforms require a time slot to be scheduled for examinees to take the test. Others are purely on-demand, and the examinee can show up whenever they are ready. MonitorEDU is a great example of this: examinees show up at any time, present their ID to a live human, and then start the test immediately, with no downloads/installs, no system checks, no API integrations, nothing.

More security: A Better Test Delivery System

A good test delivery platform will also come with its own functionality to enhance test security: randomization, automated item generation, computerized adaptive testing, linear on-the-fly testing, professional item banking, item response theory scoring, scaled scoring, psychometric analysis, equating, lockdown delivery, and more. In the context of online proctoring, perhaps the most salient is lockdown delivery. In this case, the test completely takes over the examinee’s computer, and they cannot use it for anything else until the test is finished.

LMS systems rarely include this functionality, because it is not needed for an Intro to Psychology midterm. However, most assessments in the world carry much higher stakes (university admissions, certifications, workforce hiring, etc.), and those tests depend heavily on such functionality. It is not just a habit or tradition, either. Such methods are considered essential under international standards, including AERA/APA/NCME, ITC, and NCCA.

ASC’s Online Proctoring Partners

ASC provides its clients with a turnkey solution because it partners with some of the leaders in the space. These include MonitorEDU, ProctorExam, Examity, and Proctor360. Learn more on our webpage about that functionality and another that explains the concept of configurable test security.

Translated from the blog post written by Dr. Nathan Thompson.

Nathan Thompson earned his PhD in psychometrics from the University of Minnesota, with a focus on computerized adaptive testing. His undergraduate degree was from Luther College, with a triple major in Mathematics, Psychology, and Latin. He is primarily interested in the use of AI and software automation to augment and replace the work done by psychometricians, which has given him extensive experience in software design and programming. Dr. Thompson has published over 100 journal articles and conference presentations, but his favorite remains https://pareonline.net/getvn.asp?v=16&n=1.

Test information function

The IRT Test Information Function is a concept from item response theory (IRT) that is designed to evaluate how well an assessment differentiates examinees, and at what ranges of ability. For example, we might expect an exam composed of difficult items to do a great job in differentiating top examinees, but it is worthless for the lower half of examinees because they will be so confused and lost.

The reverse is true of an easy test; it doesn’t do any good for top examinees. The test information function quantifies this and has a lot of other important applications and interpretations.

IRT Test Information Function: how to calculate it

The test information function is not something you can calculate by hand. First, you need to estimate item-level IRT parameters, which define the item response function. The only way to do this is with specialized software; there are a few options in the market, but we recommend Xcalibre.

Next, the item response function is converted to an item information function for each item. The item information functions can then be summed into a test information function. Lastly, the test information function is often inverted into the conditional standard error of measurement function, which is extremely useful in test design and evaluation.

IRT Item Parameters

Software like Xcalibre will estimate a set of item parameters. Which parameters are estimated depends on the IRT model you choose, which in turn depends on the item types and other aspects of your assessment.

For example, let’s just use the 3-parameter model, which estimates a (discrimination), b (difficulty), and c (pseudo-guessing) for each item. And we’ll use a small test of 5 items. These are ordered by difficulty: Item 1 is very easy and Item 5 is very hard.

Item    a       b       c
1       1.00    -2.00   0.20
2       0.70    -1.00   0.40
3       0.40    0.00    0.30
4       0.80    1.00    0.00
5       1.20    2.00    0.25

 

Item Response Function

The item response function uses the IRT equation to convert the parameters into a curve. The purpose of the item parameters is to fit this curve for each item, like a regression model to describe how it performs.
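For readers who want to see the equation at work, here is a minimal Python sketch of the 3-parameter logistic response function applied to the five example items. The scaling constant `D = 1.0` is an assumption (some programs use 1.7 to approximate the normal ogive), and the function name is made up for this example.

```python
import math

D = 1.0  # scaling constant; an assumption here (some software uses 1.7)

def p_3pl(theta, a, b, c):
    """3PL model: P(theta) = c + (1 - c) / (1 + exp(-D * a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# (a, b, c) for the five example items in the table above
ITEMS = {
    1: (1.00, -2.00, 0.20),
    2: (0.70, -1.00, 0.40),
    3: (0.40, 0.00, 0.30),
    4: (0.80, 1.00, 0.00),
    5: (1.20, 2.00, 0.25),
}

# Probability of a correct response at low, average, and high ability
for item_id, (a, b, c) in ITEMS.items():
    probs = [round(p_3pl(theta, a, b, c), 2) for theta in (-3, 0, 3)]
    print(f"Item {item_id}: P(-3), P(0), P(3) = {probs}")
```

Note that at theta = b the probability is halfway between c and 1, which is why an item with c = 0 has a 50% chance of a correct response exactly at its difficulty.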

Here are the response functions for those 5 items. Note the scale on the x-axis, similar to the bell curve, with the easy items to the left and hard ones to the right.

[Figure: item response functions for the five items]

 

Item Information Function

The item information function is derived from the slope (the calculus derivative) of the item response function. An item provides more information about examinees in the range of ability where its response function is steeper.

For example, consider Item 5: it is difficult, so it is not very useful for examinees in the bottom half of ability. The slope of the Item 5 IRF is then nearly 0 for that entire range. This then means that its information function is nearly 0.
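The Item 5 example can be checked numerically. The sketch below uses the standard closed-form information function for the 3PL model; the scaling constant `D = 1.0` and the function names are assumptions made for this illustration.

```python
import math

D = 1.0  # scaling constant; an assumption (some software uses 1.7)

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c):
    """3PL item information: (D*a)^2 * (Q/P) * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

item5 = (1.20, 2.00, 0.25)  # a, b, c for Item 5

# Information is close to zero in the lower half of the ability range
# and much larger near the item's difficulty.
for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(f"theta={theta:+d}  information={info_3pl(theta, *item5):.4f}")
```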

[Figure: item information functions for the five items]

 

Test Information Function

The test information function then sums up the item information functions to summarize where the test is providing information. If you imagine adding the graphs above, you can easily imagine some humps near the top and bottom of the range where there are the prominent IIFs. 

[Figure: test information function]

 

Conditional Standard Error of Measurement Function

The test information function can be inverted into an estimate of the conditional standard error of measurement. What do we mean by conditional? If you are familiar with classical test theory, you know that it estimates the same standard error of measurement for everyone that takes a test.

But given the reasonable concepts above, it is incredibly unreasonable to expect this. If a test has only difficult items, then it measures top students well, and does not measure lower students well, so why should we say that their scores are just as accurate? The conditional standard error of measurement turns this into a function of ability.

Also, note that it refers to the theta scale and not to the number-correct scale.

[Figure: conditional standard error of measurement function]

 

How can I implement all this?

For starters, I recommend delving deeper into an item response theory book. My favorite is Item Response Theory for Psychologists by Embretson and Reise. Next, you need some item response theory software.

Xcalibre can be downloaded as a free version for learning and is the easiest program to learn how to use (no 1980s-style command code… how is that still a thing?). But if you are an R fan, there are plenty of resources in that community as well.

Tell me again: why are we doing this?

The purpose of all this is to effectively model how items and tests work, namely, how they interact with examinees. This then allows us to evaluate their performance so that we can improve them, thereby enhancing reliability and validity.

Classical test theory had a lot of shortcomings in this endeavor, which led to IRT being invented. IRT also facilitates some modern approaches to assessment, such as linear on-the-fly testing, adaptive testing, and multistage testing.

Professional certification programs that allow participants to validate their knowledge and skills abound in the U.S. and around the world. Examples range from long-standing, well-recognized teacher certification and CPR programs to more niche and cutting-edge offerings, such as the Project Management Professional and Amazon Web Services credentials. Candidates are often pursuing some combination of new career, promotion, higher salary, and self-fulfillment. They take varying risks with their time and finances for what could return great reward. Given the high stakes involved, an extensive effort goes into certification program management – that is, ensuring that the certifications are developed and run according to best practices. This includes psychometrics, but is most definitely not limited to that topic.

 

What goes into certification program management?

There are many aspects that go into certification program management, including:

  • Legal status
  • Board governance
  • Accounting
  • Test development
  • Staffing, org charts, and org structure (firewall between certification and education)
  • Continuing education
  • Recertification
  • Prerequisites and eligibility pathways
  • Operations
  • Policies for candidates

One important consideration in certification program management is the requirement of a firewall between staff involved in Certification and those involved in Education. Basically, you don’t want the people who are teaching courses to have seen the items on the test, especially if there is an incentive for them to help the students pass! For example, if instructors are tracked by pass rate for their students or institution, they have a reason to want more people to pass, and could divulge more information about the exam than they should. If they never have access to such information, they instead concentrate on teaching.

Other stakes in certification program management include the protection of the public. This includes patients, students, customers, employees, employers, and all others affected by the performance of certified individuals. The program itself also wagers its reputation each time it confers a certification.

In this high-risk environment, savvy certification program managers are concerned with granting certification only to those likely to practice competently. They optimize their tests using tools from the science of psychometrics to ensure that candidates must demonstrate appropriate knowledge and skills in order to pass. Learn more about the process of test development in this blog post.

question bank

What is a question bank? A question bank refers to a pool of test questions to be used on various assessments across time.  For example, a Certified Widgetmaker Exam might have a pool of 500 questions developed over the past 10 years. Suppose the exam is delivered in June and December of every year, and each time 150 questions are used. This strong pool of items allows the organization to easily select questions and publish a new form of the exam each time, maintaining security and validity.

A question bank is more commonly called an item bank. The term item is preferred because many assessment items are not actually questions; they might be statements, vignettes, simulations, or many things other than the traditional question-and-4-answers.

What goes into a question bank?  Metadata.

A question bank is actually much more than the questions themselves. If you ran the Certified Widgetmaker Exam, you would want to keep track of some additional important information. This is all based on the concept of treating the question as a reusable object; if you use the item 4 times, you should never need to type/upload it 4 times. It should be in the system only once, with all its associated metadata!

What to track | Examples
Which exam forms used each question | Dec 2017, May 2018, May 2019, Dec 2020
Unique item ID | Math.Algebra.078
Source/Reference | Wilson (2016) p. 123
Status | New, Under Review, Active, Retired
Statistics | Classical difficulty and discrimination; item response theory parameters
Reviewer comments | Jake Smith 2020/11/22: “I think that D is arguably correct, and we need to provide greater detail in the stem.”
Content area, domain, blueprint | Math / Algebra / Quadratic
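To illustrate the idea of an item as a reusable object that carries its metadata, here is a small Python sketch. The class name and field names are hypothetical, chosen to mirror the rows of the table above; they are not the schema of any particular item banking product.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BankedItem:
    """One item stored once, with its metadata attached."""
    item_id: str                      # unique item ID
    content_area: str                 # content area / domain / blueprint
    source: str                       # source or reference
    status: str = "New"               # New, Under Review, Active, Retired
    forms_used: List[str] = field(default_factory=list)
    p_value: Optional[float] = None          # classical difficulty
    point_biserial: Optional[float] = None   # classical discrimination
    irt_params: Optional[dict] = None        # e.g. {"a": ..., "b": ..., "c": ...}
    reviewer_comments: List[str] = field(default_factory=list)

item = BankedItem(
    item_id="Math.Algebra.078",
    content_area="Math / Algebra / Quadratic",
    source="Wilson (2016) p. 123",
)

# Reusing the item on a new form only adds a reference; the item is never re-entered.
item.forms_used.append("Dec 2017")
item.forms_used.append("May 2018")
item.status = "Active"
```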

 

The Solution: Question Banking Software

As you can see, there’s actually quite a bit of functionality and data that goes into a true question bank system. And this is only regarding the questions themselves – it doesn’t get into additional topics such as media file management, Workflow Management, Automated Item Generation, or Test Assembly & Publishing. A professional question banking software system will have much, much more than just a way to store the questions.  FastTest provides a powerful alternative solution to some older platforms on the market.

Looking for a deeper treatment of the topic? Check out the chapter Computerized Item Banking by ASC’s cofounder, C. David Vale, in the 2006 Handbook of Test Development.

Want to learn more about how question banking software can help your organization? Click here, check out this other post, or fill out our contact form for a demonstration.

 

psychometrician psychometrist

Psychometrists fill an important role within the world of assessment and psychology.  Their primary role is to deliver and interpret assessments, typically the sorts of assessments that are delivered in a one-on-one clinical situation.  For example, they might give IQ tests to kids to identify those who qualify as Gifted, then explain the results to parents and teachers.  Obviously, there are many assessments which do not require one-on-one in-person delivery like this; psychometrists are unique in that they are trained on how to deliver these complex types of assessments.  This post will describe more about the role of a psychometrist.

What is a Psychometrist?

A psychometrist is someone involved in the use and administration of assessments, and in most cases is working in the field of psychological testing. This is someone who uses tests every day and is familiar with how to administer such tests (especially complex ones like IQ) and interpret their results to provide feedback to individuals. Some have doctoral degrees as a clinical/counseling psychologist and have extensive expertise in that role; for example, the use of an Autism-spectrum screening test to effectively diagnose patients and develop individualized plans.

Consider the following definition from the National Association of Psychometrists:

A psychometrist is responsible for the administration and scoring of psychological and neuropsychological tests under the supervision of a clinical psychologist or clinical neuropsychologist. 

Source: https://www.napnet.org/what

Where do psychometrists work?

The vast majority of psychometrists work in a clinical setting.  One might work in an Autism center.  One might be at a psychiatric hospital.  One might be at a neurological clinic.  Some school psychologists also perform this work, working directly in schools.  In all cases, they are working directly with the examinee (patient, student, etc.).

Psychometrist Training and Certification

Psychometrists have at least a Bachelor’s degree in psychology or a related field, and often a Master’s.  There is typically a clinical training component.  Learn more at the National Association of Psychometrists.

There is a specific certification for psychometrists, offered by the Board of Certified Psychometrists.  This involves passing a certification exam of 120 questions over 2.5 hours; the test is professionally designed and administered to meet best practices for credentialing exams.

Career Opportunities for Psychometrists

Psychometrists have excellent career prospects, given the general shortage of healthcare personnel.  However, as their training is much less than doctoral-level roles like a psychologist or psychiatrist, the pay rate is far less.

Psychometrist vs. Related Roles

One misconception that I often see on the internet concerns the distinction, or lack thereof, between related job titles.  Some professionals are only involved with the engineering of assessments, usually not even in the field of psychology, and do not work with patients.  Others work with patients but focus on counseling rather than assessment.  The most flagrant offender, curiously, is Google. Like most companies, we utilize AdWords, and find that some job titles and terms are treated interchangeably when they are not related.

A psychometrist usually works under the direction of a psychiatrist or psychologist, though sometimes a psychologist serves as their own psychometrist.  For example, a psychologist at a mental health clinic is in charge of screening patients and treating them, but might have staff to deliver psychological assessments.  But a psychologist in a school might not have staff for that, and also delivers IQ tests to students.

For clarification, here is a comparison of related job titles:

 

Aspect | Psychometrist | Psychometrician | Psychologist | Psychiatrist
How are they involved with assessment? | Administration & interpretation | Engineering & validation | Patient treatment | Medical treatment
Education | Bachelor’s/Master’s in Psychology (often Counseling) | PhD in Psychometrics, Psychology, or Education | PhD in Psychology (often Counseling or Clinical) | MD (Doctor of Medicine or Osteopathy)
Quantitative skills | Interpreting scores with summary statistics (mean, standard deviation, z-scores, correlations) | Complex analyses like item response theory or factor analysis; complex designs such as adaptive testing | Quantitative research outside of assessment, such as comparing treatment methods | Some training, but primary purpose is patient care
Soft skills | Works extensively with patients and students, often in a counseling role, and can be highly trained on those aspects | Often a pure data analyst, but some work with expert panels for topics like job analysis or Angoff studies; never with patients or students | Works extensively with patients and students, often in a counseling role, and can be highly trained on those aspects | Works extensively with patients and students, often in a counseling role, and can be highly trained on those aspects
Example | Staff in a clinic that delivers IQ and other assessments to patients | Researcher involved in designing high-stakes exams such as medical certification or university admissions | Clinical therapist in private practice | Supervisory staff in a clinic or inpatient facility that treats patients

 

Need help in designing an assessment?  Contact us.

 

 


The foundation of a decent assessment program is the ability to develop and manage strong item banks. Item banks are a central repository of test questions, each stored with important metadata such as Author or Difficulty. They are designed to treat items as reusable objects, which makes it easier to publish new exam forms.

Of course, the storage of metadata is very useful as well and provides documentation for validity evidence. Most importantly, a true item banking system will make the process of developing new items more efficient (lower cost) and effective (higher quality).

1. Item writers are screened for expertise

Make sure the item writers (authors) who are recruited for the program meet minimum levels of expertise. Often this means a minimum number of years of experience in the field. You might also want to make sure the group is sufficiently diverse on characteristics such as specialty area or geographic region.

2. Item writers are trained on best practices

Item writers must be trained on best practices in item writing, as well as any guidelines provided by the organization. A great example is this book from TIMSS. ASC has provided their guidelines for download here. This facilitates higher quality item banks.

3. Items go through review workflow to check best practices

After items are written, they should proceed through a standardized workflow and quality assurance. This is a best practice in developing any product. The field of software development uses a concept called the Kanban Board, which ASC has implemented in its item banking platform.

Review steps can include psychometric review, bias review, language editing, and course content review.

4. Items are all linked to blueprint/standards

All items in the item banks should be appropriately categorized. This guarantees that no items are measuring an unknown or unneeded concept. Items should be written to meet blueprints or standards.

5. Item banks piloting

Items are all written with good intent. However, we all know that some items are better than others. Items need to be given to some actual examinees so we can obtain feedback, and also obtain data for psychometric analysis.

Often, they are piloted as unscored items before eventual use as “live” scored items. But this isn’t always possible.

6. Psychometric analysis of items

After items are piloted, you need to analyze them with classical test theory and/or item response theory to evaluate their performance. I like to say there are three possible choices after this evaluation: hold, revise, and retire. Items that perform well are preserved as-is.

Those of moderate quality might be modified and re-piloted. Those that are unsalvageable are slated for early retirement.
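For the classical test theory side of this step, the two statistics most often reviewed are item difficulty (the proportion correct, or P-value) and item discrimination (the point-biserial correlation between the item score and the total score). Here is a minimal Python sketch using only the standard library. The tiny data set is invented for illustration, and the point-biserial shown is the uncorrected version, meaning the item is included in the total score.

```python
from statistics import mean, pstdev

def item_analysis(scores):
    """scores: one row per examinee, each row a list of 0/1 item scores.

    Returns a list of (p_value, point_biserial) tuples, one per item.
    """
    totals = [sum(row) for row in scores]
    sd_total = pstdev(totals)
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        p_value = mean(item)              # proportion answering correctly
        sd_item = pstdev(item)
        if sd_item == 0 or sd_total == 0:
            rpbis = float("nan")          # undefined when there is no variance
        else:
            cov = mean(x * t for x, t in zip(item, totals)) - p_value * mean(totals)
            rpbis = cov / (sd_item * sd_total)
        results.append((p_value, rpbis))
    return results

# Invented pilot data: 4 examinees by 3 items
pilot = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
stats = item_analysis(pilot)
```

In practice you would want far more than four examinees before acting on these numbers, and many programs prefer the corrected point-biserial, which removes the item from the total score.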

How to accomplish all this?

This process can be extremely long, involved, and expensive. Many organizations hire in-house test development managers or psychometricians; those without that option will hire organizations such as ASC to serve as consultants.

Regardless, it is important to have a software platform in place that can effectively manage this process. Such platforms have been around since the 1980s, but many organizations still struggle by managing their item banks with Word, Excel, PowerPoint, and Email!

ASC provides an item banking platform for free, which is used by hundreds of organizations.


Responses in Common (RIC) is a collusion detection (test cheating) index that simply counts the number of identical responses between a given pair of examinees. For example, both answered ‘B’ to a certain item, regardless of whether it was correct or incorrect. The index has no probabilistic evaluation that can be used to flag examinees. However, it can be useful from a descriptive or investigative perspective.

It has a major flaw in that we expect it to be very high for high-ability examinees. If two strong examinees both get 99/100 correct, the minimum RIC they could have is 98/100, even if they have never met each other and have no possibility of collusion or cheating.
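The count itself is simple enough to sketch in a few lines. The example below is a minimal illustration, not the SIFT implementation: the function name, the all-"A" answer key, and the response vectors are invented, and it assumes both examinees answered every item. It reproduces the case of two examinees who each miss one (different) item.

```python
def responses_in_common(a, b):
    """Count the items on which two examinees gave the same response."""
    if len(a) != len(b):
        raise ValueError("response vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y)


# Hypothetical 100-item test whose key is "A" for every item.
key = ["A"] * 100

first = list(key)
first[0] = "B"    # first examinee misses only item 1

second = list(key)
second[1] = "C"   # second examinee misses only item 2

ric = responses_in_common(first, second)
print(ric)  # 98: they agree on every item except the two that one of them missed
```

Note that the answer key is not needed to compute the index; it appears here only to build the example data.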

Note that RIC is not standardized in any way, so its range and relevant flag cutoff will depend on the number of items in your test and how much your examinee responses vary. For a 100-item test, you might want to set the flag at 90 items. For a 50-item test, a cutoff of 90 can never be reached, and you might want to set it at 45 instead.

Problems such as these with Responses In Common have led to the development of much more sophisticated indices of examinee collusion and copying, such as Holland’s K index and variants.

Need an easy way to calculate this?  Download our SIFT software for free.


Exact Errors in Common (EEIC) is an extremely basic collusion detection index that simply counts the number of items on which a given pair of examinees gave the same incorrect response.

For example, suppose two examinees got 80/100 correct on a test. Of the 20 items each got wrong, they had 10 in common. Of those, they gave the same wrong answer on 5 items. This means that the EEIC would be 5. Why does this index provide evidence of collusion? Well, if you and I both get 20 items wrong on a test (same score), that’s not going to raise any eyebrows. But what if we get the same 20 items wrong? A little more concerning. What if we gave the exact same wrong answers on all of those 20? Definitely cause for concern!
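The worked example above can be reproduced with a short sketch. Everything here is invented for illustration (the function name, the all-"A" key, and which items each examinee missed); it is not the SIFT implementation.

```python
def exact_errors_in_common(a, b, key):
    """Count the items where both examinees chose the same wrong option."""
    return sum(1 for x, y, k in zip(a, b, key) if x == y and x != k)


# Hypothetical 100-item test whose key is "A" for every item.
key = ["A"] * 100

# Examinee 1 misses items 0-19, always choosing "B" (80/100 correct).
first = list(key)
for i in range(0, 20):
    first[i] = "B"

# Examinee 2 misses items 10-29 (80/100 correct), so 10 errors overlap.
# On items 10-14 the wrong answer matches ("B"); elsewhere it is "C".
second = list(key)
for i in range(10, 30):
    second[i] = "B" if i < 15 else "C"

eeic = exact_errors_in_common(first, second, key)
print(eeic)  # 5: the same wrong answer on items 10-14 only
```

Unlike a plain count of matching responses, this index needs the answer key, because only matching incorrect responses are counted.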

There is no probabilistic evaluation that can be used to flag examinees. However, it can be useful from a descriptive or investigative perspective. Because it is of limited use by itself, it was incorporated into more advanced indices, such as that of Harpp, Hogan, and Jennings (1996).

Note that Exact Errors in Common is not standardized in any way, so its range and relevant flag cutoff will depend on the number of items in your test and how much your examinee responses vary. For a 100-item test, you might want to set the flag at 10 items. For a 20-item test, that cutoff is of little use, and you might want to set it at 5 (because most examinees will probably not even get more than 10 errors).

EEIC is easy to calculate by hand, but you can also download the SIFT software for free to calculate it for you.


Errors in Common (EIC) is an exam cheating (collusion detection) index that simply counts the number of items that both examinees in a given pair answered incorrectly. For example, if two examinees each got 80/100 correct, meaning 20 errors, and they got exactly the same 20 questions wrong, the EIC would be 20. If they both scored 80/100 but had only 10 wrong questions in common, the EIC would be 10. Unlike more advanced indices, it has no probabilistic evaluation that can be used to flag examinees. In fact, it is used inside some other indices, such as that of Harpp & Hogan. However, this index can be useful from a descriptive or investigative perspective.

Note that EIC is not standardized in any way, so its range and relevant flag cutoff will depend on the number of items in your test, and how much your examinee responses vary.  For a 100-item test, you might want to set the flag at 10 items.  But for a 30-item test, this is obviously irrelevant, and you might want to set it at 5 (because most examinees will probably not even get more than 10 errors).
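To make the flagging idea concrete, here is a minimal sketch that counts the errors in common for every pair in a small group and lists the pairs at or above a cutoff. The examinee IDs, the all-"A" key, the response data, and the helper names are all hypothetical; the cutoff of 10 follows the 100-item suggestion above, and this is not the SIFT implementation.

```python
from itertools import combinations


def errors_in_common(a, b, key):
    """Count the items that both examinees answered incorrectly."""
    return sum(1 for x, y, k in zip(a, b, key) if x != k and y != k)


def flag_pairs(responses, key, cutoff):
    """Return (id_1, id_2, eic) for every pair with eic >= cutoff."""
    flagged = []
    for (id_1, a), (id_2, b) in combinations(responses.items(), 2):
        eic = errors_in_common(a, b, key)
        if eic >= cutoff:
            flagged.append((id_1, id_2, eic))
    return flagged


def with_errors(key, wrong_items, wrong_answer):
    """Hypothetical helper: copy the key, then miss the given items."""
    resp = list(key)
    for i in wrong_items:
        resp[i] = wrong_answer
    return resp


key = ["A"] * 100  # invented answer key
responses = {
    "E1": with_errors(key, range(0, 20), "B"),   # misses items 0-19
    "E2": with_errors(key, range(0, 20), "C"),   # misses the same 20 items
    "E3": with_errors(key, range(15, 35), "B"),  # overlaps on 5 items only
}

flagged = flag_pairs(responses, key, cutoff=10)
print(flagged)  # [('E1', 'E2', 20)]
```

E1 and E2 are flagged even though they never chose the same wrong option, which is the difference between this index and one that counts only identical wrong answers.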

Learn more about applying EIC with SIFT, a free software program for exam cheating detection and other assessment issues.