Posts on psychometrics: The Science of Assessment

Post-training assessment is an integral part of improving the performance and productivity of employees. To gauge the effectiveness of the training, assessments are the go-to solution for many businesses.  They ensure transfer and retention of the training knowledge, provide feedback to employees, and can be used for evaluations.  At the aggregate level, they help determine opportunities for improvement at the company. Effective test preparation can enhance the accuracy and reliability of these assessments, ensuring that employees are adequately prepared to demonstrate their knowledge and skills.

Benefits Of Post-Training Assessments

Insight On Company Strengths and Weaknesses

Testing gives businesses and organizations insight into the strengths and weaknesses of their training programs. For example, if an organization finds that certain employees can’t grasp certain concepts, it may decide to change how those concepts are delivered or eliminate them completely. Employees can also work on their areas of weakness after the assessments, improving productivity.

Image: The Future Of Assessments: Tech. ed

Helps in Measuring Performance

Unlike traditional testing, where it is nearly impossible to run analytics on the results, high-fidelity computer-based assessments can measure performance and quantify progress toward initial goals, such as call center skills. By measuring performance, businesses can create data-driven roadmaps for how their employees can reach their best performance.

Advocate For Ideas and Concepts That Can Be Integrated Into The Real World

Workers learn every day, and sometimes what they learn is not used in driving the business towards its objectives. This can lead to burnout and information overload, which in turn lowers performance and work quality. By using post-training assessments, you can customize tests so that workers focus on the skills that align with your business goals. Digital assessments can streamline this process, making it easier to deploy methods such as adaptive testing that provide real-time feedback and tailored learning paths.

 

Other Benefits of Cloud-based Testing Include: 

  • The assessments can be taken from anywhere in the world
  • Saves the company a lot of time and resources
  • Improved security compared to traditional assessments
  • Improved accuracy and reliability
  • Scalability and flexibility
  • Increases skill and knowledge transfer

 

Tips To Improve Your Post-Training Assessments

1. Personalized Testing

Most businesses have an array of different training needs. Employees have different backgrounds and responsibilities, which makes it difficult to create effective generalized tests. To achieve the main objectives of your training, it is important to differentiate the assessments. Sales assessments, technical assessments, management assessments, etc. cannot be the same. Even within the same department, skills and responsibilities can vary. One way to achieve personalized testing is by using methods such as Computerized Adaptive Testing. Through the immense power of AI and machine learning, this method gives you the power to create tests that are unique to each employee. Not only does personalized testing improve effectiveness in your workforce, but it is also cost-effective, secure, and in alignment with the best psychometric practices in the corporate world. It is also important to keep in mind the components of effective assessments when creating personalized tests.

2. Analyzing Assessment Results

Many businesses don’t see the importance of analyzing training assessment results. How do you expect to improve your training programs and assessments if you don’t check the data?  This can tell you important things like where the students are weakest and perhaps need more instruction, or if some questions are wonky.

Example of Assessment analysis on Iteman

 

Analyze assessment results using psychometric analytics software such as Iteman to get important insights such as successful participants, item performance issues, and many others. This provides you with a blueprint to improve your assessments and employee training programs. 

3. Integrating Assessment Into Company Culture

Getting the best out of assessment is not about getting it right once, but getting it right over a long period of time. Integrating assessment into company culture is one great way to achieve this. This will make assessment part of the systems and employees will always look forward to improving their skills. You can also use strategies such as gamification to make sure that your employees enjoy the process. It is also critical to give the employees the freedom to provide feedback on the training programs. 

4. Diversify Your Assessment Types

One great myth about assessments is that they are limited in terms of questions and problems you can present to your employees. However, this is not true!

By using methods such as item banking, assessment systems are able to provide users with the ability to develop assessments using different question types. Some modern question types include:

  • Drag & drop 
  • Multiple correct 
  • Embedded audio or video
  • Cloze or fill in the blank
  • Number lines
  • Situational judgment test items
  • Counter or timer for performance tests

Diversification of question types improves comprehension in employees and helps them develop skills to approach problems from multiple angles. 

5. Choose Your Assessment Tools Carefully

This is among the most important considerations you should make when creating a workforce training assessment strategy. This is because software tools are the core of how your campaigns turn out. 

There are many assessment tools available, but choosing one that meets your requirements can be a daunting task. Apart from the key considerations of budget, functionality, etc., there are many other factors to keep in mind before choosing online assessment tools. 

To help you choose an assessment tool that will help you in your assessment journey, here are a few things to consider:

Ease-of-use

Most people are new to assessments, and as much as some functionalities can be powerful, they may be overwhelming to candidates and the test development staff. This may make candidates underperform. It is, therefore, important to vet the platform and its functionalities to make sure that they are easy to use. 

Functionality 

Training assessments are becoming popular and new innovations appear every day. Does the assessment software have the latest innovations in the industry? Do you get value for your money? Does it support modern psychometrics like item response theory? These are just a few of the questions to ask when vetting a platform for functionality.

Assessment Reporting and Visualizations

One major advantage of online assessments over traditional ones is that they offer access to instant assessment reporting. You should therefore look for a platform that offers advanced reporting and visualizations for metrics such as performance, question strengths, and many others.

Cheating precautions and Security

When it comes to assessments, there are two security concerns. How secure are the assessments? And how secure is the platform? For the tests themselves, the platform should provide precautions against cheating, such as a lockdown browser. It should also have measures in place to make sure that user data is secure.

Reliable Support System

This is one consideration that many businesses don’t keep in mind, and end up regretting in the long run. Which channels does the corporate training assessment platform use to provide its users with support? Do they have resources such as whitepapers and documentation in case you need them? How fast is their support?  These are questions you should ask before selecting a platform to take care of your assessment needs. 

Scalability

A good testing vendor should be able to provide you with resources should your needs go beyond expectation. This includes delivery volume – server scalability – but also being able to manage more item authors, more assessments, more examinees, and greater psychometric rigor.

Final Thoughts

Adopting effective post-training assessments can be a daunting task with a lot of forces at play, and we hope these tips will help you get the best out of your assessments.

Do you want to integrate smarter assessments into your corporate environment or any industry but feel overwhelmed by the process? Feel free to contact an experienced team of professionals to help you create an assessment strategy that helps you achieve your long-term goals and objectives.  

You can also sign up to get free access to our online assessment suite including 60 item types, IRT, adaptive testing, and so much more functionality!

Item analysis is the statistical evaluation of test questions to ensure they are good quality, and to fix them if they are not.  This is a key step in the test development cycle; after items have been delivered to examinees (either as a pilot, or in full usage), we analyze the statistics to determine if there are issues which affect validity and reliability, such as being too difficult or biased.  This post will describe the basics of this process.  If you’d like further detail and instructions on using software, you can also check out our tutorial videos on our YouTube channel and download our free psychometric software.


Download a free copy of Iteman: Software for Item Analysis

What is Item Analysis?

Item analysis refers to the process of statistically analyzing assessment data to evaluate the quality and performance of your test items. This is an important step in the test development cycle, not only because it helps improve the quality of your test, but because it provides documentation for validity: evidence that your test performs well and score interpretations mean what you intend.  It is one of the most common applications of psychometrics: using item statistics to flag, diagnose, and fix the poorly performing items on a test.  Every item that is poorly performing is potentially hurting the examinees.

Item analysis boils down to two goals:

  1. Find the items that are not performing well (difficulty and discrimination, usually)
  2. Figure out WHY those items are not performing well, so we can determine whether to revise or retire them

There are different ways to evaluate performance, such as whether the item is too difficult/easy, too confusing (not discriminating), miskeyed, or perhaps even biased against a minority group.

Moreover, there are two completely different paradigms for this analysis: classical test theory (CTT) and item response theory (IRT). On top of that, the analyses can differ based on whether the item is dichotomous (right/wrong) or polytomous (2 or more points).

Because of the possible variations, item analysis is a complex topic, and that doesn’t even get into the evaluation of test performance. In this post, we’ll cover some of the basics for each theory, at the item level.

 

How to do Item Analysis

1. Prepare your data for item analysis

Most psychometric software utilizes a person x item matrix.  That is, a data file where examinees are rows and items are columns.  Sometimes, it is a sparse matrix where there is a lot of missing data, as with linear-on-the-fly testing (LOFT).  You will also need to provide metadata to the software, such as your Item IDs, correct answers, item types, etc.  The format for this will differ by software.
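To make the data layout concrete, here is a minimal sketch of a person x item matrix and the scoring step that turns selected options into 0/1 data. The item IDs, answer key, and responses are invented for illustration, and real software will expect its own file format.

```python
# Hypothetical raw responses: one row per examinee, one column per item.
# None marks an item the examinee never saw (sparse data, as in LOFT).
item_ids = ["Q1", "Q2", "Q3", "Q4"]
answer_key = {"Q1": "A", "Q2": "C", "Q3": "B", "Q4": "D"}

raw_responses = [
    ["A", "C", "B", "D"],
    ["A", "B", "B", None],
    ["B", "C", None, "D"],
    ["A", "C", "D", "A"],
]


def score_matrix(rows, ids, key):
    """Convert selected options to 1/0 scores, keeping None for missing data."""
    scored_rows = []
    for row in rows:
        scored_rows.append([
            None if resp is None else int(resp == key[item])
            for item, resp in zip(ids, row)
        ])
    return scored_rows


scored = score_matrix(raw_responses, item_ids, answer_key)
for row in scored:
    print(row)
```

Keeping missing responses as None rather than 0 matters: an item that was never presented should not count as an incorrect answer in the statistics that follow.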

2. Run data through item analysis software

To implement item analysis, you should utilize dedicated software designed for this purpose. If you utilize an online assessment platform, it will provide you output for item analysis, such as distractor P values and point-biserials (if not, it isn’t a real assessment platform). In some cases, you might utilize standalone software. CITAS  provides a simple spreadsheet-based approach to help you learn the basics, completely for free.  A screenshot of the CITAS output is here.  However, professionals will need a level above this.  Iteman  and  Xcalibre  are two specially-designed software programs from ASC for this purpose, one for CTT and one for IRT.

CITAS output with histogram

3. Interpret results of item analysis

Item analysis software will produce tables of numbers.  Sometimes, these will be ugly ASCII-style tables from the 1980s.  Sometimes, they will be beautiful Word docs with graphs and explanations.  Either way, you need to interpret the statistics to determine which items have problems and how to fix them.  The rest of this article will delve into that.

 

Item Analysis with Classical Test Theory

Classical Test Theory provides a simple and intuitive approach to item analysis. It utilizes nothing more complicated than proportions, averages, counts, and correlations. For this reason, it is useful for small-scale exams or use with groups that do not have psychometric expertise.

Item Difficulty: Dichotomous

CTT quantifies item difficulty for dichotomous items as the proportion (P value) of examinees that correctly answer it.

It ranges from 0.0 to 1.0. A high value means that the item is easy, and a low value means that the item is difficult.  There are no hard and fast rules because interpretation can vary widely for different situations.  For example, a test given at the beginning of the school year would be expected to have low statistics since the students have not yet been taught the material.  On the other hand, a professional certification exam, where someone cannot even sit unless they have 3 years of experience and a relevant degree, might have all items appear easy even though they are quite advanced topics!  Here are some general guidelines:

    0.95-1.0 = Too easy (not doing much good to differentiate examinees, which is really the purpose of assessment)

    0.60-0.95 = Typical

    0.40-0.60 = Hard

    <0.40 = Too hard (consider that a 4 option multiple choice has a 25% chance of pure guessing)

With Iteman, you can set bounds to automatically flag items.  The minimum P value bound represents what you consider the cut point for an item being too difficult. For a relatively easy test, you might specify 0.50 as a minimum, which means that 50% of the examinees have answered the item correctly.

For a test where we expect examinees to perform poorly, the minimum might be lowered to 0.4 or even 0.3. The minimum should take into account the possibility of guessing; if the item is multiple-choice with four options, there is a 25% chance of randomly guessing the answer, so the minimum should probably not be 0.20.  The maximum P value represents the cut point for what you consider to be an item that is too easy. The primary consideration here is that if an item is so easy that nearly everyone gets it correct, it is not providing much information about the examinees.  In fact, items with a P of 0.95 or higher typically have very poor point-biserial correlations.

Note that because the scale is inverted (lower value means higher difficulty), this is sometimes referred to as item facility.
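The P value and the flagging bounds described above can be sketched in a few lines. This is only an illustration, not how Iteman implements it; the default bounds of 0.40 and 0.95 are taken from the general guidelines above and the data are invented.

```python
def p_value(item_scores):
    """Proportion correct among examinees who saw the item (None = not seen)."""
    seen = [s for s in item_scores if s is not None]
    return sum(seen) / len(seen)


def flag_difficulty(p, min_p=0.40, max_p=0.95):
    """Apply illustrative bounds; adjust them to your own testing context."""
    if p < min_p:
        return "too hard"
    if p > max_p:
        return "too easy"
    return "ok"


# Hypothetical scored columns (1 = correct, 0 = incorrect) for three items.
items = {
    "Q1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Q2": [1, 1, 1, 0, 1, 1, 0, 1, 1, 0],
    "Q3": [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
}
for item_id, scores in items.items():
    p = p_value(scores)
    print(item_id, round(p, 2), flag_difficulty(p))
```

Here Q1 is answered correctly by everyone and is flagged as too easy, while Q3 falls below the minimum bound and is flagged as too hard.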

The Item Mean (Polytomous)

This refers to an item that is scored with 2 or more point levels, like an essay scored on a 0-4 point rubric or a Likert-type item that is “Rate on a scale of 1 to 5.”

  • 1=Strongly Disagree
  • 2=Disagree
  • 3=Neutral
  • 4=Agree
  • 5=Strongly Agree

The item mean is the average of the item responses converted to numeric values across all examinees. The range of the item mean is dependent on the number of categories and whether the item responses begin at 0. The interpretation of the item mean depends on the type of item (rating scale or partial credit). A good rating scale item will have an item mean close to the midpoint of the scale (about ½ of the maximum when responses begin at 0), as this means that on average, examinees are not endorsing categories near the extremes of the continuum.

You will have to adjust for your own situation, but here is an example for the 5-point Likert-style item.

    1-2 is very low; people disagree fairly strongly on average

    2-3 is low to neutral; people tend to disagree on average

    3-4 is neutral to high; people tend to agree on average

    4-5 is very high; people agree fairly strongly on average

Iteman also provides flagging bounds for this statistic.  The minimum item mean bound represents what you consider the cut point for the item mean being too low.  The maximum item mean bound represents what you consider the cut point for the item mean being too high.

The number of categories for the items must be considered when setting the bounds of the minimum/maximum values. This is important as all items of a certain type (e.g., 3-category) might be flagged.
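As a rough sketch of why the bounds need to depend on the number of categories, the example below keeps a separate pair of bounds for 3-category and 5-category items. The bounds and responses are hypothetical, chosen only to illustrate the idea.

```python
def item_mean(responses):
    """Average numeric response, skipping missing (None) responses."""
    answered = [r for r in responses if r is not None]
    return sum(answered) / len(answered)


# Hypothetical flagging bounds, keyed by number of response categories.
BOUNDS = {3: (1.5, 2.5), 5: (2.0, 4.0)}


def flag_item_mean(mean, n_categories):
    """Compare an item mean against the bounds for its category count."""
    low, high = BOUNDS[n_categories]
    if mean < low:
        return "too low"
    if mean > high:
        return "too high"
    return "ok"


# Each entry: (number of categories, responses on a scale starting at 1).
items = {
    "L1": (5, [4, 5, 3, 4, 2, 5, 4, 3]),
    "L2": (3, [3, 3, 3, 2, 3, 3, None, 3]),
}
for item_id, (n_cat, responses) in items.items():
    m = item_mean(responses)
    print(item_id, round(m, 2), flag_item_mean(m, n_cat))
```

If the 5-category bounds were applied to every item, no 3-category item could ever be flagged as too high, since its mean cannot exceed 3; using a single set of bounds quietly distorts flagging for one item type.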

Item Discrimination: Dichotomous

In psychometrics, discrimination is a GOOD THING, even though the word often has a negative connotation in general. The entire point of an exam is to discriminate amongst examinees; smart students should get a high score and not-so-smart students should get a low score. If everyone gets the same score, there is no discrimination and no point in the exam! Item discrimination evaluates this concept.

CTT uses the point-biserial item-total correlation (Rpbis) as its primary statistic for this.

The Pearson point-biserial correlation (r-pbis) is a measure of the discrimination or differentiating strength, of the item. It ranges from −1.0 to 1.0 and is a correlation of item scores and total raw scores.  If you consider a scored data matrix (multiple-choice items converted to 0/1 data), this would be the correlation between the item column and a column that is the sum of all item columns for each row (a person’s score).

A good item is able to differentiate between examinees of high and low ability, and will therefore have a higher point-biserial, though rarely above 0.50. A negative point-biserial is indicative of a very poor item because it means that the high-ability examinees are answering incorrectly while the low-ability examinees are answering correctly. That would be bizarre, and it therefore typically indicates that the specified correct answer is actually wrong. A point-biserial of 0.0 provides no differentiation between low-scoring and high-scoring examinees, essentially random “noise.”  Here are some general guidelines on interpretation.  Note that these assume a decent sample size; if you only have a small number of examinees, many item statistics will be flagged!

    0.20+ = Good item; smarter examinees tend to get the item correct

    0.10-0.20 = OK item; but probably review it

    0.0-0.10 = Marginal item quality; should probably be revised or replaced

    <0.0 = Terrible item; replace it

***Major red flag is if the correct answer has a negative Rpbis and a distractor has a positive Rpbis

The minimum item-total correlation bound represents the lowest discrimination you are willing to accept. This is typically a small positive number, like 0.10 or 0.20. If your sample size is small, it could possibly be reduced.  The maximum item-total correlation bound is almost always 1.0, because it is typically desired that the Rpbis be as high as possible.
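The item-total correlation described above can be sketched as a plain Pearson correlation between each scored item column and the row sums. The matrix below is invented for illustration, and the total used here includes the item itself (an uncorrected item-total correlation); some software also reports a corrected version that excludes the item from the total.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scored matrix: rows are examinees, columns are items.
scored = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
totals = [sum(row) for row in scored]  # each person's raw score

rpbis = []
for j in range(len(scored[0])):
    item_column = [row[j] for row in scored]
    rpbis.append(pearson(item_column, totals))
    print(f"item {j + 1}: Rpbis = {rpbis[-1]:.2f}")
```

In this toy data the fourth item is answered correctly mostly by low scorers, so its Rpbis comes out negative, which is exactly the pattern that should send you to check the key.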

The biserial correlation is also a measure of the discrimination or differentiating strength, of the item. It ranges from −1.0 to 1.0. The biserial correlation is computed between the item and total score as if the item was a continuous measure of the trait. Since the biserial is an estimate of Pearson’s r it will be larger in absolute magnitude than the corresponding point-biserial.

The biserial makes the stricter assumption that the score distribution is normal. The biserial correlation is not recommended for traits where the score distribution is known to be non-normal (e.g., pathology).
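For readers who want to see the relationship between the two statistics, the standard textbook conversion from a point-biserial to a biserial is sketched below. It uses the item P value and the height of the standard normal density at the corresponding cut point; the input values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def biserial_from_pbis(r_pbis, p):
    """Convert a point-biserial to a biserial, given the item P value.

    Uses r_bis = r_pbis * sqrt(p * (1 - p)) / y, where y is the standard
    normal density at the z value that splits the distribution at p.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)
    y = std_normal.pdf(z)
    return r_pbis * sqrt(p * (1 - p)) / y


for p in (0.5, 0.7, 0.9):
    print(p, round(biserial_from_pbis(0.30, p), 3))
```

The multiplier is always greater than 1, which is why the biserial is larger in absolute magnitude than the point-biserial, and the gap widens as the P value moves toward 0 or 1.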

Item Discrimination: Polytomous

The Pearson’s r correlation is the product-moment correlation between the item responses (as numeric values) and total score. It ranges from −1.0 to 1.0. The r correlation indexes the linear relationship between item score and total score and assumes that the item responses for an item form a continuous variable. The r correlation and the Rpbis are equivalent for a 2-category item, so guidelines for interpretation remain unchanged.

The minimum item-total correlation bound represents the lowest discrimination you are willing to accept. Since the typical r correlation (0.5) will be larger than the typical Rpbis (0.3) correlation, you may wish to set the lower bound higher for a test with polytomous items (0.2 to 0.3). If your sample size is small, it could possibly be reduced.  The maximum item-total correlation bound is almost always 1.0, because it is typically desired that the correlation be as high as possible.

The eta coefficient is an additional index of discrimination computed using an analysis of variance with the item response as the independent variable and total score as the dependent variable. The eta coefficient is the square root of the ratio of the between-groups sum of squares to the total sum of squares and has a range of 0 to 1. The eta coefficient does not assume that the item responses are continuous and also does not assume a linear relationship between the item response and total score.

As a result, the eta coefficient will always be equal to or greater than the absolute value of Pearson’s r. Note that the biserial correlation will be reported if the item has only 2 categories.
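A small sketch can show why eta picks up relationships that Pearson’s r misses. The example below computes eta as the square root of the between-groups sum of squares over the total sum of squares, using invented data in which the relationship between item score and total score is deliberately non-linear.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def eta(item_responses, totals):
    """Correlation ratio: sqrt(SS_between / SS_total), grouping by response."""
    grand_mean = sum(totals) / len(totals)
    ss_total = sum((t - grand_mean) ** 2 for t in totals)
    groups = {}
    for resp, total in zip(item_responses, totals):
        groups.setdefault(resp, []).append(total)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return sqrt(ss_between / ss_total)


# Hypothetical 0-3 item scores and total scores with a non-linear pattern.
item = [0, 1, 2, 3, 0, 1, 2, 3]
totals = [5, 9, 10, 6, 4, 8, 11, 7]
print("r   =", round(pearson(item, totals), 3))
print("eta =", round(eta(item, totals), 3))
```

Because examinees scoring 3 on this item have lower totals than those scoring 2, the linear correlation is modest while eta is close to 1, which would be a cue to look at how the top rubric category is being awarded.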

Key and Distractor Analysis

In the case of many item types, it pays to evaluate the answers. A distractor is an incorrect option. We want to make sure that no distractor is selected by more examinees than the key (that is, no distractor has a higher P value) and that no distractor has higher discrimination than the key. The latter would mean that smart students are selecting the wrong answer, and not-so-smart students are selecting what is supposedly correct. In some cases, the item is just bad. In others, the answer is just incorrectly recorded, perhaps by a typo. We call this a miskey of the item. In both cases, we want to flag the item and then dig into the distractor statistics to figure out what is wrong.
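The two checks described above (a distractor chosen more often than the key, or a distractor with higher discrimination than the key) can be sketched as follows. The function names, responses, and total scores are hypothetical, and this is a simplified illustration rather than the logic of any particular program.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def option_stats(choices, totals, options="ABCD"):
    """Return {option: (proportion selecting, option-total Rpbis)}."""
    stats = {}
    for opt in options:
        picked = [int(c == opt) for c in choices]
        prop = sum(picked) / len(picked)
        # The correlation is undefined if nobody or everybody chose the option.
        rpbis = pearson(picked, totals) if 0 < prop < 1 else None
        stats[opt] = (prop, rpbis)
    return stats


def possible_miskey(stats, key):
    """Flag if any distractor beats the key on proportion or discrimination."""
    key_prop, key_r = stats[key]
    for opt, (prop, r) in stats.items():
        if opt == key or r is None:
            continue
        if prop > key_prop or (key_r is not None and r > key_r):
            return True
    return False


# Hypothetical responses to one item and each examinee's total score.
choices = ["B", "B", "A", "B", "C", "A", "D", "B", "A", "C"]
totals = [9, 8, 3, 9, 4, 5, 2, 7, 4, 5]
stats = option_stats(choices, totals)
print(possible_miskey(stats, key="A"), possible_miskey(stats, key="B"))
```

In this toy data the high scorers choose B, so recording A as the key trips the flag while recording B does not.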

Iteman Psychometric Item Analysis

Example

Here is an example output for one item from our  Iteman  software, which you can download for free. You might also be interested in this video.  This is a very well-performing item.  Here are some key takeaways.

  • This is a 4-option multiple choice item
  • It was on a subscore named “Example subscore”
  • This item was seen by 736 examinees
  • 70% of students answered it correctly, so it was fairly easy, but not too easy
  • The Rpbis was 0.53 which is extremely high; the item is good quality
  • The line for the correct answer in the quantile plot has a clear positive slope, which reflects the high discrimination quality
  • The proportion of examinees selecting the wrong answers was nicely distributed, not too high, and with negative Rpbis values. This means the distractors are sufficiently incorrect and not confusing.

 

Item Analysis with Item Response Theory

Item Response Theory (IRT) is a very sophisticated paradigm of item analysis and tackles numerous psychometric tasks, from item analysis to equating to adaptive testing. It requires much larger sample sizes than CTT (100-1000 responses per item) and extensive expertise (typically a PhD psychometrician). Maximum Likelihood Estimation (MLE) is a key concept in IRT used to estimate model parameters for better accuracy in assessments.

IRT isn’t suitable for small-scale exams like classroom quizzes. However, it is used by virtually every “real” exam you will take in your life, from K-12 benchmark exams to university admissions to professional certifications.

If you haven’t used IRT, I recommend you check out this blog post first.

Item Difficulty

IRT evaluates item difficulty for dichotomous items as a b-parameter, which is sort of like a z-score for the item on the bell curve: 0.0 is average, 2.0 is hard, and -2.0 is easy. (This can differ somewhat with the Rasch approach, which rescales everything.) In the case of polytomous items, there is a b-parameter for each threshold, or step between points.

Item Discrimination

IRT evaluates item discrimination by the slope of its item response function, which is called the a-parameter. Often, values above 0.80 are good and below 0.80 are less effective.
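To show how the a- and b-parameters shape an item response function, here is a minimal sketch of the two-parameter logistic (2PL) model. The choice of the 2PL and the omission of any scaling constant are assumptions made for illustration; calibration software may use a different model or metric, which changes what counts as a "good" a value.

```python
from math import exp


def irf_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta: examinee ability, a: discrimination (slope), b: difficulty.
    """
    return 1.0 / (1.0 + exp(-a * (theta - b)))


# Hypothetical items: an average-difficulty item with low and high slopes.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    flat = irf_2pl(theta, a=0.5, b=0.0)
    steep = irf_2pl(theta, a=1.5, b=0.0)
    print(f"theta={theta:+.1f}  a=0.5: {flat:.2f}  a=1.5: {steep:.2f}")
```

When theta equals b the probability is 0.5, and the larger a-parameter makes the curve rise more sharply around that point, which is what it means for an item to discriminate well.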

Key and Distractor Analysis

In the case of polytomous items, the multiple b-parameters provide an evaluation of the different answers. For dichotomous items, the IRT modeling does not distinguish amongst the incorrect answers. Therefore, we utilize the CTT approach for distractor analysis. This remains extremely important for diagnosing issues in multiple choice items.

Example

Here is an example of what output from an IRT analysis program (Xcalibre) looks like. You might also be interested in this video.

  • Here, we have a polytomous item, such as an essay scored from 0 to 3 points.
  • It is calibrated with the generalized partial credit model.
  • It has strong classical discrimination (0.62)
  • It has poor IRT discrimination (0.466)
  • The average raw score was 2.314 out of 3.0, so fairly easy
  • There was a sufficient distribution of responses over the four point levels
  • The boundary parameters are not in sequence; this item should be reviewed

 

Summary

This article is a very broad overview and does not do justice to the complexity of psychometrics and the art of diagnosing/revising items!  I recommend that you download some of the item analysis software and start exploring your own data.

For additional reading, I recommend some of the common textbooks.  For more on how to write/revise items, check out Haladyna (2004) and subsequent works.  For item response theory, I highly recommend Embretson & Reise (2000).

 

So, yeah, the use of “hacks” in the title is definitely on the ironic and gratuitous side, but there is still a point to be made: are you making full use of current technology to keep your tests secure?  Gone are the days when you are limited to linear test forms on paper in physical locations.  Here are some quick points on how modern assessment technology can deliver assessments more securely, effectively, and efficiently than traditional methods:

1.  AI delivery like CAT and LOFT

Psychometrics was one of the first areas to apply modern data science and machine learning (see this blog post for a story about a MOOC course).  But did you know it was also one of the first areas to apply artificial intelligence (AI)?  Early forms of computerized adaptive testing (CAT) were suggested in the 1960s and had become widely available in the 1980s.  CAT delivers a unique test to each examinee by using complex algorithms to personalize the test.  This makes it much more secure, and can also reduce test length by 50-90%.
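As a rough idea of what "personalizing the test" can mean, the sketch below picks the next item by maximum Fisher information at the current ability estimate under a 2PL model. This is one commonly described selection rule, assumed here for illustration; the item bank and parameters are invented, and operational CAT engines add exposure control, content balancing, and ability re-estimation after every response.

```python
from math import exp


def prob_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Pick the unused item that is most informative at the current theta."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    return max(candidates, key=lambda item_id: information(theta, *bank[item_id]))


# Hypothetical item bank: item ID -> (a, b).
bank = {"I1": (1.0, -1.5), "I2": (1.2, 0.0), "I3": (0.8, 1.0), "I4": (1.5, 1.8)}

print(next_item(0.0, bank, administered=set()))
print(next_item(2.0, bank, administered={"I2"}))
```

An examinee estimated near average ability gets the average-difficulty item, while one estimated at a high ability gets the hard, highly discriminating item; two examinees therefore rarely see the same sequence.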

2. Psychometric forensics

Modern psychometrics has suggested many methods for finding cheaters and other invalid test-taking behavior.  These can range from very simple rules like flagging someone for having a top 5% score in a bottom 5% time, to extremely complex collusion indices.  These approaches are designed explicitly to keep your test more secure.
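The simple rule mentioned above, a top 5% score achieved in a bottom 5% time, can be sketched like this. The record layout and the way the cut points are chosen are assumptions for illustration, and a flag is only a prompt for human review, not proof of cheating.

```python
def flag_fast_high_scorers(records, fraction=0.05):
    """Flag examinees in both the top score tail and the fastest time tail.

    records: list of (examinee_id, score, seconds_taken) tuples.
    """
    n = len(records)
    k = max(1, round(n * fraction))  # tail size, at least one examinee
    score_cut = sorted((r[1] for r in records), reverse=True)[k - 1]
    time_cut = sorted(r[2] for r in records)[k - 1]
    return [r[0] for r in records if r[1] >= score_cut and r[2] <= time_cut]


# Hypothetical data: 19 ordinary examinees plus one very fast, very high scorer.
records = [(f"E{i:02d}", 50 + i, 3600 - 60 * i) for i in range(19)]
records.append(("E19", 98, 600))

print(flag_fast_high_scorers(records))
```

Only the examinee who scored 98 in ten minutes lands in both tails here; a high scorer who took a typical amount of time would not be flagged.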

3. Tech enhanced items

Tech enhanced items (TEIs) are test questions that leverage technology to be more complex than is possible on paper tests.  Classic examples include drag and drop or hotspot items.  These items are harder to memorize and therefore contribute to security.

4. IP address limits

Suppose you want to make sure that your test is only delivered in certain school buildings, campuses, or other geographic locations.  You can build a test delivery platform that limits your tests to a range of IP addresses, which implements this geographic restriction.
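A minimal sketch of such a check, using Python’s standard ipaddress module, is shown below. The network ranges are reserved documentation addresses standing in for a school’s real ranges, and the function name is hypothetical; a real platform would apply the check at login or test launch.

```python
import ipaddress

# Hypothetical allowed ranges (reserved documentation networks as placeholders).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_allowed(client_ip):
    """Return True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in ALLOWED_NETWORKS)


for ip in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
    print(ip, is_allowed(ip))
```

Keep in mind that IP ranges only approximate physical location; examinees behind a VPN or shared gateway may appear inside or outside the range regardless of where they are sitting.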

5. Lockdown browser

A lockdown browser is special software that locks a computer screen onto a test in progress, so that, for example, a student cannot open Google in another tab and simply search for answers.  Advanced versions can also scan the computer for software that is considered a threat, like screen capture software.

6. Identity verification

Tests can be built to require unique login procedures, such as requiring a proctor to enter their employee ID and the test-taker to enter their student ID.  Examinees can also be required to show photo ID, and of course, there are new biometric methods being developed.

7. Remote proctoring

The days are gone when you need to hop in the car and drive 3 hours to sit in a windowless room at a community college to take a test.  Nowadays, proctors can watch you and your desktop via webcam.  This is arguably as secure as in-person proctoring, and certainly more convenient and cost-effective.

So, how can I implement these to deliver assessments more securely?

Some of these approaches are provided by vendors specifically dedicated to that space, such as ProctorExam for remote proctoring.  However, if you use ASC’s FastTest platform, all of these methods are available for you right out of the box.  Want to see for yourself?  Sign up for a free account!

Do you conduct adaptive testing research? Perhaps a thesis or dissertation? Or maybe you have developed adaptive tests and have a technical report or validity study? I encourage you to check out the Journal of Computerized Adaptive Testing as a publication outlet for your adaptive testing research. JCAT is the official journal of the International Association for Computerized Adaptive Testing (IACAT), a nonprofit organization dedicated to improving the science of assessments.

JCAT has an absolutely stellar board of editors and was founded to focus on improving the dissemination of research in adaptive testing. The IACAT website also contains a comprehensive bibliography of research in adaptive testing, across all journals and tech reports, for the past 50 years.  IACAT was founded at the 2009 conference on computerized adaptive testing and has since held conferences every other year as well as hosting the JCAT journal.

Potential research topics at the JCAT journal

Here are some of the potential research topics:

  • Item selection algorithms
  • Item exposure algorithms
  • Termination criteria
  • Cognitive diagnostic models
  • Simulation studies
  • Validation studies
  • Item response theory models
  • Multistage testing
  • Use of adaptive testing in new(er) situations, like patient reported outcomes
  • Design of actual adaptive assessments and their release into the wild

If you are not involved in CAT research but are interested, please visit the IACAT and journal website to read the articles.  Access is free.  JCAT would also appreciate it if you would share this information with colleagues so that they might consider publication.

The security and validity of online tests and exams are extremely important.  The COVID-19 pandemic drastically changed every aspect of our world, and one of the most affected areas is educational assessment and other types of assessment. Many organizations were still testing with 50-year-old methodologies, such as placing 200 examinees in a large room with desks, paper exams, and a pencil. COVID-19 is forcing many organizations to pivot, which provides an opportunity to modernize assessments.

But how can we maintain security in assessment, and therefore validity, through these changes? Below are some suggestions, which can be easily implemented on ASC’s industry-leading assessment platforms. Start by signing up for a free account at https://assess.com/assess-ai/.

Verdadera banca de ítems con acceso a contenido

Una buena evaluación en línea comienza con buenos ítems. Si bien los Sistemas de Gestión del Aprendizaje (LMS por sus siglas en inglés) y otras plataformas que no son realmente de evaluación incluyen algunas funciones de creación de ítems, por lo general no cumplen con los requisitos básicos para una verdadera banca de ítems. Existen prácticas recomendadas con respecto a la banca de ítems que son estándar en las organizaciones de evaluación a gran escala (p. Ej., Los Departamentos de Educación de Estado en EE. UU.), pero son sorprendentemente raras para los exámenes de certificación/licencia profesional, universidades y otras organizaciones. A continuación, se muestran algunos ejemplos.collaborative item banking

• Los ítems son reutilizables (no es necesario cargarlos para cada prueba en la que se utilicen).

• Seguimiento de la versión del ítem.

• Seguimiento y auditorías de edición hecha por usuarios.

• Controles de contenido de autor (los profesores de matemáticas solo pueden ver elementos de matemáticas).

• Almacenar metadatos como parámetros de la Teoría de Respuesta al Ítem (TRI) y estadísticas clásicas.

• Seguimiento del uso de ítems en las pruebas.

• Flujo de trabajo de revisión de ítems.

Acceso basado en roles

Todos los usuarios deben estar limitados por roles, como Autor del ítem, Revisor del Ítem, Editor de Pruebas y Administrador de los Evaluados. Entonces, por ejemplo, es posible que alguien a cargo de administrar la lista de evaluados/estudiantes nunca vea ninguna pregunta del examen.

Análisis forense de datos

Hay muchas formas de analizar los resultados de tu prueba para buscar posibles amenazas de seguridad / validez. Nuestro  software SIFT  proporciona una plataforma de software gratuita para ayudarte a implementar esta metodología moderna. Puedes evaluar los índices de colusión, que cuantifican qué tan similares son las respuestas para cualquier par de evaluados. También puedes evaluar los tiempos de respuesta, el rendimiento del grupo y las estadísticas acumuladas.
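To make the idea of a collusion index concrete, here is a minimal Python sketch of the simplest possible version: the proportion of items on which two examinees chose the same response. This is only an illustration and is not the method SIFT implements; operational indices also account for chance agreement and for the fact that able examinees share many correct answers. The examinee IDs and response strings are made up.

```python
from itertools import combinations

def matching_proportion(resp_a, resp_b):
    """Share of items on which two examinees gave the same response."""
    matches = sum(a == b for a, b in zip(resp_a, resp_b))
    return matches / len(resp_a)

# Hypothetical response strings: one selected option per item
responses = {
    "E1": "ABCDABCDAB",
    "E2": "ABCDABCDAC",
    "E3": "BADCCDABDA",
}

# Evaluate every pair of examinees
similarity = {
    (x, y): matching_proportion(responses[x], responses[y])
    for x, y in combinations(responses, 2)
}

for pair, value in similarity.items():
    print(pair, value)
```

Pairs with unusually high similarity would then be flagged for closer review rather than treated as proof of collusion.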

Randomization

When tests are delivered online, you should have the option to randomize item order and also answer order. When printing to paper, there should also be an option to randomize the order, though of course you are much more limited in this respect when using paper.

Linear on-the-fly testing (LOFT)

LOFT creates a unique randomized test for each examinee. For example, you might have a pool of 300 items spread across 4 domains, and each examinee receives 100 items with 25 from each domain. This greatly increases security.
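The example above (300 items, 4 domains, 25 items drawn per domain) can be sketched in Python as a stratified random draw. This is a simplified illustration: the pool, item IDs, and domain labels are hypothetical, and a production LOFT algorithm would typically also balance statistical properties such as difficulty, which this sketch ignores.

```python
import random

def assemble_loft_form(pool, items_per_domain, rng):
    """Draw a random form with a fixed number of items from each domain."""
    form = []
    for domain, n_items in items_per_domain.items():
        candidates = [item_id for item_id, d in pool.items() if d == domain]
        form.extend(rng.sample(candidates, n_items))  # sample without replacement
    rng.shuffle(form)  # also randomize the presentation order
    return form

# Hypothetical pool: 300 items spread evenly over 4 domains
domains = ["A", "B", "C", "D"]
pool = {f"item{i:03d}": domains[i % 4] for i in range(300)}
blueprint = {d: 25 for d in domains}

# In practice each examinee would get a freshly seeded generator
form = assemble_loft_form(pool, blueprint, random.Random(42))
print(len(form), form[:5])
```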

Computerized adaptive testing (CAT)

CAT takes personalization even further and adapts the difficulty of the exam and the number of items each examinee sees, based on certain psychometric algorithms and goals. This makes the test extremely secure.

Lockdown browser

Want to make sure the examinee cannot browse for answers or take screenshots of items? You need a lockdown browser. ASC’s assessment platforms, Assess.ai and FastTest, come with this out of the box and at no extra cost.

Examinee test codes

Want to make sure the right person takes the right exam? Generate unique one-time passwords to be handed out by a proctor after identity verification. This is especially useful in remote proctoring; the student never receives any information before the exam about how to enter, other than how to start the virtual proctoring session. Once the proctor verifies the examinee’s identity, they provide the unique one-time password.
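As a rough illustration of one-time test codes, the Python sketch below issues a random code per examinee and accepts it only once. The class name, code length, and alphabet are assumptions made for this example; it does not describe how any ASC platform implements the feature.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

class OneTimeCodes:
    """Issue a random code per examinee and accept it only once."""

    def __init__(self):
        self._unused = {}

    def issue(self, examinee_id, length=8):
        # secrets uses a cryptographically strong random source
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        self._unused[examinee_id] = code
        return code

    def redeem(self, examinee_id, code):
        expected = self._unused.get(examinee_id)
        if expected is not None and secrets.compare_digest(expected, code):
            del self._unused[examinee_id]  # the code cannot be reused
            return True
        return False

codes = OneTimeCodes()
code = codes.issue("examinee-001")   # handed over by the proctor after the ID check
first = codes.redeem("examinee-001", code)
second = codes.redeem("examinee-001", code)
print(first, second)
```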

Proctor codes

Want an extra step in the test start procedure? Once a student’s identity is verified and they enter their code, the proctor must also enter a different password that is unique to them for that day.

Date/time windows

Want to prevent examinees from logging in early or late? Set up a specific time window, such as Friday from 9 am to 12 pm.

AI-based proctoring

This level of proctoring is relatively inexpensive and does a great job of validating the results of an individual examinee. However, it does not protect the intellectual property of your exam questions. If an examinee steals all the questions, you will not know right away. It is therefore very useful for low- or mid-stakes exams, but not as useful for high-stakes exams such as certification or licensure. Learn more about our remote proctoring options. I also recommend this blog post for an overview of the remote proctoring industry.

Real-time online proctoring

If you cannot use in-person test centers because of COVID, this is the next best option. Live proctors can check in the candidate, verify identity, and implement all the other measures above. In addition, they can check the examinee’s environment and stop the exam if they see the examinee stealing questions or other major problems. MonitorEDU is a great example of this.

How can I get started?

Need help implementing some of these measures? Or just want to talk about the possibilities? Email ASC at solutions@assess.com.

 

Online proctoring has been around for more than a decade. But given the recent COVID-19 outbreak, educational and workforce/certification institutions are scrambling to change their operations, and a big part of this is an incredible increase in online proctoring. This blog post is intended to provide an overview of the online proctoring industry for someone who is new to the topic or is starting to shop and is overwhelmed by all the options out there.

Online Proctoring: Two Distinct Markets

First, I would describe the online proctoring industry as falling into two distinct markets, so the first step is to determine which of them fits your organization.

1. Larger-scale, lower-cost (when at scale), lower-security systems designed to be used only as an add-on to major LMS platforms such as Blackboard or Canvas. These online proctoring systems are therefore designed for mid-stakes exams, such as an Intro to Psychology midterm at a university.

2. Smaller-scale, higher-cost, higher-security systems designed to be used with standalone assessment platforms. These are generally for higher-stakes exams such as certification or workforce, or perhaps for special use at universities such as admissions and placement exams.

How do you tell the difference? The first type will advertise easy integration with systems like Blackboard or Canvas as a key feature. They will also often focus on AI review of videos rather than using real-time humans. Another key consideration is to look at the existing client base, which is usually advertised.

Other ways that online proctoring systems can differ

AI vs. humans: Some systems rely exclusively on AI algorithms to flag video recordings of examinees. Other systems use real humans.

Record and Review vs. Real-Time Humans: If humans are used, there are two approaches. The first is live and in real time, meaning that there is a human on the other end of the video who can confirm identity before allowing the test to start, and stop the test if there is illicit activity. The second, Record and Review, records the session and a human checks the recording within 24 to 48 hours. This is more flexible, but you cannot stop the test if someone is stealing content; you probably will not know until the next day.

Screen capture: Some online proctoring providers have the option to record/stream the screen as well as the webcam. Some also offer the option to do only this (no webcam) for lower-stakes exams.

Mobile phone as a third camera: Some newer platforms offer the option to easily integrate the examinee’s mobile phone as a third camera, which effectively functions as a human proctor. Examinees are instructed to use the video to show under the table, behind the monitor, and so on before starting the exam. They can then be instructed to place the phone 2 meters away with a clear view of the whole room while the test is taken.

Using your own proctors: Some online proctoring systems let you use your own staff as proctors, which is especially useful if the test is delivered within a short time window. If it is delivered continuously 24×7 year-round, you will probably want to use the vendor’s highly trained staff.

API integrations: Some systems require software developers to set up an API integration with your LMS or assessment platform. Others are more flexible; you can log in yourself, upload a list of examinees, and be all set for the test.

On-demand vs. scheduled: Some platforms require a time slot to be scheduled for examinees to take the test. Others are purely on-demand, and the examinee can show up whenever they are ready. MonitorEDU is an excellent example of this: examinees show up at any time, present their ID to a live human, and then start the test immediately, with no downloads/installs, no system checks, no API integrations, nothing.

More Security: A Better Test Delivery System

A good test delivery platform will also come with its own functionality to enhance test security: randomization, automated item generation, computerized adaptive testing, linear on-the-fly testing, professional item banking, item response theory scoring, scaled scoring, psychometric analytics, equating, lockdown delivery, and more. In the context of online proctoring, perhaps the most salient is lockdown delivery. In this case, the test completely takes over the examinee’s computer, and they cannot use it for anything else until the test is done.

LMS systems rarely include this functionality, because it is not needed for an Intro to Psychology midterm. However, most assessments in the world have high stakes (university admissions, certifications, workforce hiring, etc.), and those tests depend heavily on such functionality. It is not just a habit or tradition, either. Such methods are considered essential by international standards, including AERA/APA/NCME, ITC, and NCCA.

ASC’s Online Proctoring Partners

ASC provides its clients with a turnkey solution because it has partnered with some of the leaders in the space. These include MonitorEDU, ProctorExam, Examity, and Proctor360. Learn more on our webpage about that functionality and on another that explains the concept of configurable test security.

Translated from the blog post written by Dr. Nathan Thompson.

Nathan Thompson earned his PhD in Psychometrics from the University of Minnesota, with a focus on computerized adaptive testing. His undergraduate degree was from Luther College with a triple major in Mathematics, Psychology, and Latin. He is primarily interested in the use of AI and software automation to augment and replace the work done by psychometricians, which has given him extensive experience in software design and programming. Dr. Thompson has published more than 100 journal articles and conference presentations, but his favorite remains https://pareonline.net/getvn.asp?v=16&n=1.

Test information function

The IRT Test Information Function is a concept from item response theory (IRT) that is designed to evaluate how well an assessment differentiates examinees, and at what ranges of ability. For example, we might expect an exam composed of difficult items to do a great job in differentiating top examinees, but it is worthless for the lower half of examinees because they will be so confused and lost.

The reverse is true of an easy test; it doesn’t do any good for top examinees. The test information function quantifies this and has a lot of other important applications and interpretations.

IRT Test Information Function: how to calculate it

The test information function is not something you can calculate by hand. First, you need to estimate item-level IRT parameters, which define the item response function. The only way to do this is with specialized software; there are a few options in the market, but we recommend Xcalibre.

Next, the item response function is converted to an item information function for each item. The item information functions can then be summed into a test information function. Lastly, the test information function is often inverted into the conditional standard error of measurement function, which is extremely useful in test design and evaluation.
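For reference, the chain just described can be written out as formulas. This is a sketch that assumes the 3-parameter logistic model with item parameters a, b, and c and a scaling constant D (commonly 1.7, sometimes 1.0); the symbols are illustrative and other models lead to different item-level formulas.

```latex
\begin{align*}
P_i(\theta) &= c_i + \frac{1 - c_i}{1 + e^{-D a_i (\theta - b_i)}}
  && \text{(item response function)} \\
I_i(\theta) &= D^2 a_i^2 \, \frac{1 - P_i(\theta)}{P_i(\theta)}
  \left( \frac{P_i(\theta) - c_i}{1 - c_i} \right)^2
  && \text{(item information function)} \\
\mathrm{TIF}(\theta) &= \sum_{i=1}^{n} I_i(\theta)
  && \text{(test information function)} \\
\mathrm{CSEM}(\theta) &= \frac{1}{\sqrt{\mathrm{TIF}(\theta)}}
  && \text{(conditional standard error of measurement)}
\end{align*}
```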

IRT Item Parameters

Software like Xcalibre will estimate a set of item parameters. Which parameters you use depends on the item types and other aspects of your assessment.

For example, let’s just use the 3-parameter model, which estimates a, b, and c. And we’ll use a small test of 5 items. These are ordered by difficulty: Item 1 is very easy and Item 5 is very hard.

Item a b c
1 1.00 -2.00 0.20
2 0.70 -1.00 0.40
3 0.40 0.00 0.30
4 0.80 1.00 0.00
5 1.20 2.00 0.25

 

Item Response Function

The item response function uses the IRT equation to convert the parameters into a curve. The purpose of the item parameters is to fit this curve for each item, like a regression model to describe how it performs.

Here are the response functions for those 5 items. Note the scale on the x-axis, similar to the bell curve, with the easy items to the left and hard ones to the right.

Image: Item response functions for the five items
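As a rough sketch, the response curves for the five example items can be computed in Python with the 3-parameter logistic equation. The scaling constant D = 1.7 is an assumption here; some software uses 1.0 instead, so check which convention your parameters were estimated under.

```python
import math

# (a, b, c) for the five example items in the table above
ITEMS = {
    1: (1.00, -2.00, 0.20),
    2: (0.70, -1.00, 0.40),
    3: (0.40, 0.00, 0.30),
    4: (0.80, 1.00, 0.00),
    5: (1.20, 2.00, 0.25),
}

def irf_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Probability of a correct answer at low, average, and high ability
for item, (a, b, c) in ITEMS.items():
    probs = [round(irf_3pl(theta, a, b, c), 2) for theta in (-3, 0, 3)]
    print(item, probs)
```

Note that at theta equal to b, the probability is halfway between the guessing parameter c and 1.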

 

Item Information Function

The item information function is based on the calculus derivative (the slope) of the item response function. An item provides more information about examinees where its response function is steeper.

For example, consider Item 5: it is difficult, so it is not very useful for examinees in the bottom half of ability. The slope of the Item 5 IRF is then nearly 0 for that entire range. This then means that its information function is nearly 0.

Image: Item information functions for the five items
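The Item 5 example can be checked numerically. The sketch below uses the standard 3PL item information formula (from Birnbaum) with an assumed scaling constant D = 1.7, and compares the information Item 5 provides at a low ability level with what it provides at its own difficulty.

```python
import math

def irf_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def iif_3pl(theta, a, b, c, D=1.7):
    """Item information for the 3PL model."""
    p = irf_3pl(theta, a, b, c, D)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

item5 = (1.20, 2.00, 0.25)       # a, b, c from the example table
low = iif_3pl(-1.0, *item5)      # an examinee in the lower half of ability
high = iif_3pl(2.0, *item5)      # an examinee at the item's difficulty

print(f"information at theta=-1: {low:.5f}")
print(f"information at theta=+2: {high:.5f}")
```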

 

Test Information Function

The test information function then sums up the item information functions to summarize where the test is providing information. If you imagine adding the graphs above, you can easily imagine some humps near the top and bottom of the range where there are the prominent IIFs. 

Image: Test information function
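Summing the item information functions is a one-liner once the item-level function exists. The Python sketch below does this for the five example items, again assuming the 3PL model with D = 1.7.

```python
import math

# (a, b, c) for the five example items
ITEMS = [
    (1.00, -2.00, 0.20),
    (0.70, -1.00, 0.40),
    (0.40, 0.00, 0.30),
    (0.80, 1.00, 0.00),
    (1.20, 2.00, 0.25),
]

def iif_3pl(theta, a, b, c, D=1.7):
    """Item information for the 3PL model."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def tif(theta, items, D=1.7):
    """Test information: the sum of the item information functions."""
    return sum(iif_3pl(theta, a, b, c, D) for a, b, c in items)

for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(f"theta={theta:+d}  TIF={tif(theta, ITEMS):.3f}")
```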

 

Conditional Standard Error of Measurement Function

The test information function can be inverted into an estimate of the conditional standard error of measurement. What do we mean by conditional? If you are familiar with classical test theory, you know that it estimates the same standard error of measurement for everyone that takes a test.

But given the reasonable concepts above, it is incredibly unreasonable to expect this. If a test has only difficult items, then it measures top students well, and does not measure lower students well, so why should we say that their scores are just as accurate? The conditional standard error of measurement turns this into a function of ability.

Also, note that it refers to the theta scale and not to the number-correct scale.

Image: Conditional standard error of measurement function
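The inversion is simply one over the square root of the test information at each theta. Here is a minimal Python sketch for the five example items, assuming the 3PL model with D = 1.7.

```python
import math

# (a, b, c) for the five example items
ITEMS = [
    (1.00, -2.00, 0.20),
    (0.70, -1.00, 0.40),
    (0.40, 0.00, 0.30),
    (0.80, 1.00, 0.00),
    (1.20, 2.00, 0.25),
]

def tif(theta, items, D=1.7):
    """Test information under the 3PL model."""
    total = 0.0
    for a, b, c in items:
        p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
        total += (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2
    return total

def csem(theta, items, D=1.7):
    """Conditional standard error of measurement, on the theta scale."""
    return 1.0 / math.sqrt(tif(theta, items, D))

for theta in (-2, -1, 0, 1, 2):
    print(f"theta={theta:+d}  CSEM={csem(theta, ITEMS):.2f}")
```

With only five items the CSEM values will be large compared with an operational test, which would have many more items and therefore much more information.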

 

How can I implement all this?

For starters, I recommend delving deeper into an item response theory book. My favorite is Item Response Theory for Psychologists by Embretson and Reise. Next, you need some item response theory software.

Xcalibre can be downloaded as a free version for learning and is the easiest program to learn how to use (no 1980s-style command code… how is that still a thing?). But if you are an R fan, there are plenty of resources in that community as well.

Tell me again: why are we doing this?

The purpose of all this is to effectively model how items and tests work, namely, how they interact with examinees. This then allows us to evaluate their performance so that we can improve them, thereby enhancing reliability and validity.

Classical test theory had a lot of shortcomings in this endeavor, which led to IRT being invented. IRT also facilitates some modern approaches to assessment, such as linear on-the-fly testing, adaptive testing, and multistage testing.

conditional standard error of measurement

The standard error of measurement (SEM) is one of the core concepts in psychometrics.  One of the primary assumptions of any assessment is that it is accurately and consistently measuring whatever it is we want to measure.  We, therefore, need to demonstrate that it is doing so.  There are a number of ways of quantifying this, and one of the most common is the SEM.

The SEM can be used in both the classical test theory (CTT) perspective and item response theory (IRT) perspective, though it is defined quite differently in both.

 

What is measurement error?

We can all agree that assessments are not perfect, from a 4th grade math quiz to a Psych 101 exam at university to a driver’s license test.  Suppose you got 80% on an exam today.  If we wiped your brain clean and you took the exam tomorrow, what score would you get?  Probably a little higher or lower.  Psychometricians consider you to have a true score which is what would happen if the test was perfect, you had no interruptions or distractions, and everything else fell into place.  But in reality, you, of course, do not get that score each time.  So psychometricians try to estimate the error in your score, and use this in various ways to improve the assessment and how scores are used.

 

The Standard Error of Measurement in Classical Test Theory

In CTT, it is defined as

SEM = SD*sqrt(1-r),

where SD is the standard deviation of scores for everyone who took the test, and r is the reliability of the test.  It is interpreted as the standard deviation of scores that you would find if you had the person take the test over and over, with a fresh mind each time.  A confidence interval with this is then interpreted as the band where you would expect the person’s true score on the test to fall.
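As a quick worked example, the classical formula can be computed in a few lines of Python. The standard deviation, reliability, and observed score below are made-up values chosen for illustration.

```python
import math

def classical_sem(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - r)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test: score SD of 10 points, reliability of 0.91
sem = classical_sem(sd=10.0, reliability=0.91)

# Approximate 95% band around a hypothetical observed score of 75
observed = 75.0
lower, upper = observed - 1.96 * sem, observed + 1.96 * sem
print(f"SEM={sem:.2f}, band={lower:.1f} to {upper:.1f}")
```

Because the SEM depends only on SD and r, every examinee on this test gets the same value regardless of their score.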

This has some conceptual disadvantages.  For one, it assumes that SEM is the same for all examinees, which is unrealistic.  The interpretation focuses only on this single test form rather than the accuracy of measuring someone’s true standing on the trait.  Moreover, it does not utilize the examinee’s responses in any way.  Lord (1984) suggested a conditional standard error of measurement based on classical test theory, but it focuses on the error of the examinee taking the same test again, rather than the measurement of the true latent value as is done with IRT below.

The classical SEM is reported in Iteman for each subscore, the total score, score on scored items only, and score on pretest items.

Item Response Theory: Conditional Standard Error of Measurement 

Early researchers realized that this assumption is unreasonable.  Suppose that a test has a lot of easy questions.  It will therefore measure low-ability examinees quite well.  Imagine that it is a Math placement exam for university, and has a lot of Geometry and Algebra questions at a high school level.  It will measure students well who are at that level, but do a very poor job of measuring top students.  In an extreme case, let’s say the top 20% of students get every item correct, and there is no way to differentiate them; that defeats the purpose of the test.

The weaknesses of the classical SEM are one of the reasons that IRT was developed.  IRT conceptualizes the SEM as a continuous function across the range of student ability, which is an inversion of the test information function (TIF).  A test form will have more accuracy – less error – in a range of ability where there are more items or items of higher quality.  That is, a test with most items of middle difficulty will produce accurate scores in the middle of the range, but not measure students on the top or bottom very well.  

An example of this is shown below.  On the right is the conditional standard error of measurement function, and on the left is its inverse, the test information function.  Clearly, this test has a lot of items around -1.0 on the theta spectrum, which is around the 15th percentile.  Students above 1.0 (85th percentile) are not being measured well.

Image: Test information function (left) and conditional standard error of measurement function (right)

This is actually only the predicted SEM based on all the items in a test/pool.  The observed SEM can differ for each examinee based on the items that they answered, and which ones they answered correctly.  If you want to calculate the IRT SEM on a test of yours, you need to download Xcalibre and implement a full IRT calibration study.

How is CSEM used?

A useful way to think about the conditional standard error of measurement is with confidence intervals.  Suppose your score on a test is 0.5 with item response theory.  If the CSEM is 0.25 (see above), then we can get an approximate 95% confidence interval by taking plus or minus 2 standard errors.  This means that we are 95% certain that your true score lies between 0.0 and 1.0.  For a theta of 2.5 with a CSEM of 0.5, that band is then 1.5 to 3.5, which might seem wide, but remember that is like the 93rd percentile to above the 99th percentile.
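The interval arithmetic can be sketched in Python as follows. The multiplier of 2 follows the rule of thumb used in the text, and the percentile conversion assumes that theta follows a standard normal distribution in the population, which is a common convention but still an assumption.

```python
from statistics import NormalDist

def theta_interval(theta, csem, z=2.0):
    """Confidence band of plus or minus z standard errors around theta."""
    return (theta - z * csem, theta + z * csem)

def percentile(theta):
    """Percentile rank of theta, assuming a standard normal ability distribution."""
    return 100.0 * NormalDist().cdf(theta)

for theta, csem in [(0.5, 0.25), (2.5, 0.5)]:
    low, high = theta_interval(theta, csem)
    print(f"theta={theta}: {low} to {high} "
          f"(percentiles {percentile(low):.1f} to {percentile(high):.1f})")
```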

You will sometimes see scores reported in this manner.  I once saw a report on an IQ test that did not give a single score, but instead said “we can expect that 9 times out of 10 you would score between X and Y.”

There are various ways to use the CSEM and related functions in the design of tests, including the assembly of parallel linear forms and the development of computerized adaptive tests. To learn more about this, I recommend you delve into a book on IRT, such as Embretson and Reise (2000).  That’s more than I can cover here.

question bank

What is a question bank? A question bank refers to a pool of test questions to be used on various assessments across time.  For example, a Certified Widgetmaker Exam might have a pool of 500 questions developed over the past 10 years. Suppose the exam is delivered in June and December of every year, and each time 150 questions are used. This strong pool of items allows the organization to easily select questions and publish a new form of the exam each time, maintaining security and validity.

A question bank is more commonly called an item bank. The term ‘question’ is often avoided because many assessment items are not actually questions; they might be statements, vignettes, simulations, or many things other than the traditional question-and-4-answers. It is important to regularly review the item bank to identify and address any ‘enemy items,’ which are items that should not appear together on the same form (for example, because one gives away the answer to another) and that can otherwise harm the test’s reliability and fairness.

What goes into a question bank?  Metadata.

A question bank is actually much more than the questions themselves. If you ran the Certified Widgetmaker Exam, you would want to keep track of some additional important information. This is all based on the concept of treating the question as a reusable object; if you use the item 4 times, you should never need to type/upload it 4 times. It should be in the system only once, with all its associated metadata!

What to track | Examples
Which exam forms used each question | Dec 2017, May 2018, May 2019, Dec 2020
Unique item ID | Math.Algebra.078
Source/Reference | Wilson (2016) p. 123
Status | New, Under Review, Active, Retired
Statistics | Classical difficulty and discrimination; item response theory parameters
Reviewer comments | Jake Smith 2020/11/22: “I think that D is arguably correct, and we need to provide greater detail in the stem.”
Content area, domain, blueprint | Math / Algebra / Quadratic
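To illustrate the idea of an item as a reusable object that is stored once with its metadata, here is a minimal Python sketch. The class and field names are hypothetical and simply mirror the rows of the table above; the statistics values are placeholders, and a real item banking system would track much more.

```python
from dataclasses import dataclass, field

@dataclass
class ItemRecord:
    """One item, stored once, with the metadata listed in the table."""
    item_id: str
    content_area: str
    status: str = "New"            # New, Under Review, Active, Retired
    source: str = ""
    forms_used: list = field(default_factory=list)
    statistics: dict = field(default_factory=dict)
    reviewer_comments: list = field(default_factory=list)

    def record_use(self, form_name):
        """Log that a form used this item, without duplicating entries."""
        if form_name not in self.forms_used:
            self.forms_used.append(form_name)

item = ItemRecord(
    item_id="Math.Algebra.078",
    content_area="Math / Algebra / Quadratic",
    status="Active",
    source="Wilson (2016) p. 123",
    statistics={"p_value": 0.72, "irt_b": -0.4},  # placeholder values
)
for form in ["Dec 2017", "May 2018", "May 2019", "Dec 2020"]:
    item.record_use(form)
item.record_use("May 2018")  # already logged, so nothing is added

print(item.item_id, item.forms_used)
```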

 

The Solution: Question Banking Software

As you can see, there’s actually quite a bit of functionality and data that goes into a true question bank system. And this is only regarding the questions themselves – it doesn’t get into additional topics such as media file management, Workflow Management, Automated Item Generation, or Test Assembly & Publishing. A professional question banking software system will have much, much more than just a way to store the questions.  FastTest provides a powerful alternative solution to some older platforms on the market.

Looking for a deeper treatment of the topic? Check out the chapter Computerized Item Banking by ASC’s cofounder, C. David Vale, in the 2006 Handbook of Test Development.

Want to learn more about how question banking software can help your organization? Click here, check out this other post, or fill out our contact form for a demonstration.

 

psychometrician psychometrist

A psychometrist is an important profession within the world of assessment and psychology.  Their primary role is to deliver and interpret assessments, typically the sorts of assessments that are delivered in a one-on-one clinical situation.  For example, they might give IQ tests to kids to identify those who qualify as Gifted, then explain the results to parents and teachers.  Obviously, there are many assessments which do not require one-on-one in-person delivery like this; psychometrists are unique in that they are trained on how to deliver these complex types of assessments.  This post will describe more about the role of a psychometrist.

What is a Psychometrist?

A psychometrist is someone involved in the use and administration of assessments, and in most cases is working in the field of psychological testing. This is someone who uses tests every day and is familiar with how to administer such tests (especially complex ones like IQ) and interpret their results to provide feedback to individuals. Some have doctoral degrees as a clinical/counseling psychologist and have extensive expertise in that role; for example, the use of an Autism-spectrum screening test to effectively diagnose patients and develop individualized plans.

Consider the following definition from the National Association of Psychometrists:

A psychometrist is responsible for the administration and scoring of psychological and neuropsychological tests under the supervision of a clinical psychologist or clinical neuropsychologist. 

Source: https://www.napnet.org/what

Where do psychometrists work?

The vast majority of psychometrists work in a clinical setting.  One might work in an Autism center.  One might be at a psychiatric hospital.  One might be at a neurological clinic.  Some school psychologists also perform this work, working directly in schools.  In all cases, they are working directly with the examinee (patient, student, etc.).

Psychometrist Training and Certification

Psychometrists have at least a Bachelor’s degree in psychology or related field, often a Master’s.  There is typically a clinical training component.  Learn more at the National Association of Psychometrists.

There is a specific certification for psychometrists, offered by the Board of Certified Psychometrists.  This involves passing a certification exam of 120 questions over 2.5 hours; the test is professionally designed and administered to meet best practices for credentialing exams.

Career Opportunities for Psychometrists

Psychometrists have excellent career prospects, given the general shortage of healthcare personnel.  However, as their training is much less than doctoral-level roles like a psychologist or psychiatrist, the pay rate is far less.

Psychometrist vs. Related Roles

One misconception that I often see on the internet concerns the distinction, or lack thereof, between the related job titles.  Some professionals are only involved with the engineering of assessments, usually not even in the field of psychology.  They do not work with patients.  Others work with patients but focus on counseling rather than assessment.  The most flagrant offender, curiously, is Google. Like most companies, we utilize AdWords, and find that some job titles and terms are treated interchangeably when they are not related.

A psychometrist usually works under the direction of a psychiatrist or psychologist, though sometimes a psychologist serves as their own psychometrist.  For example, a psychologist at a mental health clinic is in charge of screening patients and treating them, but might have staff to deliver psychological assessments.  But a psychologist in a school might not have staff for that, and also delivers IQ tests to students.

For clarification, here is a comparison of related job titles:

 

Aspect | Psychometrist | Psychometrician | Psychologist | Psychiatrist
How are they involved with assessment? | Administration & interpretation | Engineering & validation | Patient treatment | Medical treatment
Education | Bachelor’s/Master’s in Psychology (often Counseling) | PhD in Psychometrics, Psychology, or Education | PhD in Psychology (often Counseling or Clinical) | MD (Doctor of Medicine or Osteopathy)
Quantitative skills | Interpreting scores with summary statistics (mean, standard deviation, z-scores, correlations) | Complex analyses like item response theory or factor analysis; complex designs such as adaptive testing | Quantitative research outside of assessment, such as comparing treatment methods | Some training, but primary purpose is patient care
Soft skills | Works extensively with patients and students, often in a counseling role, and can be highly trained on those aspects | Often a pure data analyst, but some work with expert panels for topics like job analysis or Angoff studies; never with patients or students | Works extensively with patients and students, often in a counseling role, and can be highly trained on those aspects | Works extensively with patients and students, often in a counseling role, and can be highly trained on those aspects
Example | Staff in a clinic that delivers IQ and other assessments to patients | Researcher involved in designing high-stakes exams such as medical certification or university admissions | Clinical therapist in private practice | Supervisory staff in a clinic or inpatient facility that treats patients

 

Need help in designing an assessment?  Contact us.