What is Psychometrics? Data Science for Assessment.
Psychometrics is the science of educational and psychological assessment. It uses data to ensure that tests are fair and accurate. Have you ever taken a test that felt unfair, too difficult, off-topic, or full of questions that were simply confusing or poorly written? Psychometricians are the people who help organizations fix these problems using data science. They also work on more advanced topics, such as how to design an AI algorithm that adapts to each examinee.
Psychometrics is a fundamental part of many fields. Accurate information about people is essential for education, human resources, workforce development, corporate training, professional certification and licensure, medicine, and more. Psychometrics scientifically studies how tests are designed, developed, delivered, validated, and scored.
Key points about psychometrics
- Psychometrics is the study of how to measure and evaluate mental constructs, such as intelligence, personality, or knowledge of accounting laws.
- Psychometrics is NOT just a job selection test.
- Psychometrics is dedicated to making tests more accurate and fair.
- Psychometrics relies heavily on data analysis and machine learning, such as item response theory.
What is psychometrics?
Psychometrics is the study of assessment itself, regardless of the type of test. In fact, many psychometricians don’t work on a particular test at all. They work on psychometrics itself, for example by developing new methods of data analysis. Most professionals don’t mind what the test measures, and they will often switch jobs between completely unrelated subjects. Someone might move from a K-12 testing company to a psychological measurement company, or to an accounting certification exam. We often refer to what we’re measuring simply as “theta,” a term from item response theory.
Psychometrics addresses fundamental questions about assessment, such as how to determine whether a test is reliable or whether an item is of good quality. It also tackles much more complex questions, such as how to ensure that a college entrance exam score today means the same thing it did 10 years ago. It also examines phenomena such as the positive manifold, which is the tendency of different cognitive abilities to be positively correlated. This supports the consistency and generalizability of test scores.
Psychometrics is a branch of data science. In fact, it existed long before that term became a buzzword. Don’t believe me? Check out this Coursera course on Data Science. The first example they give of a seminal landmark project in data science is… psychometrics! (early research on factor analysis of intelligence).
Assessment is everywhere, and psychometrics is an essential part of it. Even so, for most people psychometrics remains a black box, and practitioners are jokingly called “psychomagicians.” However, everyone working in the testing industry should have a basic understanding of it, especially those who develop or sell tests.
Psychometrics is NOT limited to very specific types of assessment. Some people use the term interchangeably with IQ testing, personality assessment, or pre-employment testing. These are just small parts of the field! Nor is psychometrics simply the administration of a test.
Why do we need psychometrics?
The purpose of tests is to provide useful information about people. For example, a test might help decide whether to hire someone, certify them in a profession, or determine what to teach them next in school. Better tests mean better decisions. Why? The scientific evidence is overwhelming that tests give decision makers better information than many other sources, such as interviews, resumes, or educational attainment. Thus, tests serve an extremely useful function in our society.
The goal of psychometrics is to provide validity, which is evidence that test scores support the interpretations we intended. Suppose passing a certification test is supposed to mean that a candidate meets the minimum standard to work in a certain job. We then need substantial evidence for that claim, especially because the stakes are so high. Meta-analysis is a key tool in psychometrics. It aggregates findings across many studies to provide stronger evidence about the reliability and validity of tests. This is especially valuable for high-stakes certification exams, where accuracy and fairness are paramount.
What does psychometrics do?
Creating and maintaining a high-quality test isn’t easy, and many important issues can arise. Much of the field revolves around answering key questions about tests. What should they cover? What makes a good question? How do we set a good cut score? How do we ensure that the test predicts job performance or student success? Many of these questions align with the test development cycle, which we’ll discuss later.
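To make the idea of aggregating studies concrete, here is a minimal sketch of a "bare-bones" meta-analysis of validity correlations in the style of Hunter and Schmidt. It computes a sample-size-weighted mean correlation and compares the observed variance of the correlations with the variance expected from sampling error alone. The function name and the study data are hypothetical illustrations. Corrections for measurement error and range restriction, which full meta-analyses often apply, are omitted.

```python
def bare_bones_meta(studies: list[tuple[int, float]]) -> tuple[float, float, float]:
    """Sample-size-weighted meta-analysis of correlations (bare-bones approach).

    studies: (sample size n, observed correlation r) for each study.
    Returns (mean r, observed variance of r, expected sampling-error variance).
    """
    total_n = sum(n for n, _ in studies)
    # Larger studies get more weight in the pooled estimate.
    r_bar = sum(n * r for n, r in studies) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for n, r in studies) / total_n
    # Variance of r expected from sampling error alone, using the average sample size.
    n_bar = total_n / len(studies)
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    return r_bar, var_obs, var_err


# Hypothetical validity studies of the same test: (sample size, correlation).
studies = [(100, 0.30), (200, 0.25), (50, 0.40)]
r_bar, var_obs, var_err = bare_bones_meta(studies)
print(f"mean r = {r_bar:.3f}, observed var = {var_obs:.4f}, sampling-error var = {var_err:.4f}")
```

In this made-up example, the observed variance is smaller than the sampling-error variance. That pattern would suggest the differences between studies are mostly noise, so the pooled correlation is a reasonable single summary.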
How do we define what the test should cover? (Test Design)
Before writing any items, you must define very specifically what the test will include. If the test is for credentialing or pre-employment, psychometricians typically conduct a job analysis study to form a quantitative scientific basis for the test blueprints. A job analysis is necessary for a certification program to gain accreditation. In education, test coverage is often defined by the curriculum.
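One simple way to turn job analysis ratings into a blueprint is to weight each content domain by its mean importance rating times its mean frequency rating. Those weights are then converted into whole item counts. The sketch below assumes that weighting rule, which is only one of several used in practice, and uses hypothetical domain names and ratings. It rounds with the largest-remainder method so the counts add up exactly to the test length.

```python
def allocate_items(ratings: dict[str, tuple[float, float]], test_length: int) -> dict[str, int]:
    """Convert mean (importance, frequency) job-analysis ratings into blueprint item counts.

    Assumption: a domain's weight is importance * frequency (one of several possible rules).
    """
    weights = {d: imp * freq for d, (imp, freq) in ratings.items()}
    total = sum(weights.values())
    # Ideal (fractional) number of items per domain, proportional to its weight.
    exact = {d: test_length * w / total for d, w in weights.items()}
    counts = {d: int(x) for d, x in exact.items()}  # round down first
    # Largest-remainder rounding: give leftover items to the largest fractional parts.
    leftover = test_length - sum(counts.values())
    for d in sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)[:leftover]:
        counts[d] += 1
    return counts


# Hypothetical domains with mean ratings on 1-5 scales.
ratings = {
    "Patient assessment": (4.5, 4.0),
    "Clinical procedures": (4.0, 3.0),
    "Safety and infection control": (3.5, 2.0),
}
print(allocate_items(ratings, test_length=100))
```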
How do we ensure that the questions are of good quality? (Item writing)
There is a body of scientific literature on how to develop test items that accurately measure what you are trying to measure. An excellent summary is Haladyna’s book. This is not limited to multiple-choice items, although that format remains popular. Psychometricians apply their knowledge of best practices to guide item writing and review, which results in highly defensible test content. Professional item banking software is the most efficient way to develop high-quality content and publish multiple test forms. It also stores important historical information such as item statistics.
How do we set a defensible cut score? (Standard Setting)
Test scores are often used to classify candidates into groups, such as pass/fail (certification/licensure), hire/not hire (pre-employment), and below basic/basic/proficient/advanced (education). Psychometricians conduct studies to determine cut scores, using methodologies such as Angoff, Beuk, Contrasting Groups, and Borderline Group.
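As an illustration of the arithmetic behind the Angoff approach, here is a minimal sketch. Each judge estimates, for every item, the probability that a minimally competent candidate answers it correctly. A judge's ratings sum to that judge's recommended raw cut score, and the panel's recommendation is the average across judges. The function name and ratings are hypothetical. Operational studies add steps such as judge training, discussion rounds, and impact data, and those steps are not modeled here.

```python
from statistics import mean


def angoff_cut_score(ratings: list[list[float]]) -> float:
    """Basic (modified) Angoff raw cut score.

    ratings[j][i] is judge j's estimate of the probability that a minimally
    competent candidate answers item i correctly.
    """
    # Each judge's ratings sum to that judge's recommended raw cut score...
    judge_cuts = [sum(item_ratings) for item_ratings in ratings]
    # ...and the panel recommendation is the average across judges.
    return mean(judge_cuts)


# Hypothetical panel: 3 judges rating a 4-item test.
ratings = [
    [0.60, 0.70, 0.50, 0.80],
    [0.70, 0.60, 0.60, 0.90],
    [0.50, 0.80, 0.40, 0.70],
]
cut = angoff_cut_score(ratings)
n_items = len(ratings[0])
print(f"Raw cut score: {cut:.2f} of {n_items} ({cut / n_items:.0%})")
```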
How do we analyse the results to improve the exam? (Psychometric analysis)
Psychometricians are essential for this step, as the statistical analyses can be quite complex. Smaller testing organizations typically use classical test theory, which is based on simple math such as proportions and correlations. Large, high-profile organizations typically use item response theory (IRT), which is based on a type of nonlinear regression analysis. Psychometricians evaluate overall test reliability, item difficulty and discrimination, distractor performance, potential bias, multidimensionality, linking of multiple test forms/years, and much more. Software such as Iteman and Xcalibre is also available for organizations with enough experience to run statistical analyses in-house. Scroll down for examples.
How do we compare scores between groups or years? (Equating)
This is known as linking and equating. Some psychometricians dedicate their entire careers to this topic. If you’re working on a certification exam, for example, you want to make sure that the passing standard is the same this year as it was last year. Suppose the passing score was 76% last year but 25% this year. Not only will candidates be upset, but there will also be much less confidence in the meaning of the credential.
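One of the simplest equating methods is linear equating. It assumes that two randomly equivalent groups took Form X and Form Y. A score on Form X is then placed on the Form Y scale by matching the means and standard deviations of the two score distributions. The sketch below is a hypothetical illustration of that single method. Operational programs often use other designs, such as anchor items or IRT-based linking, which are not shown here.

```python
from statistics import mean, pstdev


def linear_equate(x: float, form_x: list[float], form_y: list[float]) -> float:
    """Linear equating under a random-equivalent-groups design (assumed).

    Places raw score x from Form X on the Form Y scale by matching the
    mean and standard deviation of the two score distributions.
    """
    mx, sx = mean(form_x), pstdev(form_x)
    my, sy = mean(form_y), pstdev(form_y)
    return my + (sy / sx) * (x - mx)


# Hypothetical raw scores from two randomly equivalent groups; Form X is harder.
form_x = [38, 43, 48, 53, 58]
form_y = [43, 49, 55, 61, 67]
print(linear_equate(48, form_x, form_y))  # Form X mean maps to Form Y mean
print(linear_equate(58, form_x, form_y))
```

Because Form X is harder in this made-up data, a raw 48 on Form X is treated as equivalent to a raw 55 on Form Y. This adjustment keeps the passing standard constant even when the raw scores differ between forms.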
How do we know that the test measures what it should? (Validity)
Validity is the evidence provided to support score interpretations. For example, we might interpret scores on a test as reflecting knowledge of English, and we need documentation and research to support this. There are several ways to provide this evidence. A straightforward approach is to establish content-related evidence, which includes the test definition, blueprints, and item writing/review. In some situations, criterion-related evidence is important, because it directly correlates test scores with another variable of interest. Delivering tests securely is also essential for validity.
Where is psychometrics used?
Certification/license/credentialing
In certification testing, psychometricians develop the test through a documented chain of evidence. This follows a sequence of research outlined by the accrediting bodies, typically: job analysis, test blueprints, item writing and review, cut score study, and statistical analysis. Web-based item bank software like FastTest is often helpful because the exam committee is usually made up of experts located across the country or even around the world. They can easily log in from anywhere and collaborate.
Pre-employment
In pre-employment testing, validity evidence relies primarily on two things. The first is establishing appropriate content (a test on PHP programming for a PHP programming job). The second is correlating test scores with an important criterion such as job performance ratings (showing that the test predicts good job performance). Adaptive testing is becoming much more common in pre-employment testing because it provides several benefits. The most important is reducing testing time by 50%, a huge achievement for large corporations that screen a million applicants each year. Adaptive testing is based on item response theory and requires a specialized psychometrician as well as specially designed software such as FastTest.
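A core piece of adaptive testing is choosing the next item based on the examinee's current ability estimate. One common rule picks the unused item with maximum Fisher information at that estimate. The sketch below assumes a two-parameter logistic (2PL) IRT model and uses a hypothetical four-item bank. It does not describe how FastTest or any particular engine works. Operational systems typically also update the ability estimate after each response, apply stopping rules, and add exposure control and content balancing.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response (logistic form, no 1.7 scaling)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta_hat: float, bank: list[tuple[float, float]], administered: set[int]) -> int:
    """Pick the unused item that is most informative at the current ability estimate."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta_hat, *bank[i]))


# Hypothetical item bank of (a, b) parameters.
bank = [(1.0, -2.0), (1.2, 0.0), (0.8, 0.1), (1.5, 1.5)]
print(next_item(0.0, bank, administered=set()))  # -> 1
print(next_item(0.0, bank, administered={1}))    # -> 3
```

Selecting items that match the examinee's ability makes every response more informative. This is why an adaptive test can reach a given precision with fewer items than a fixed form.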
K-12 Education
Most assessments in education fall into one of two categories: lower-stakes formative assessments in classrooms and higher-stakes summative assessments, such as end-of-year exams. Psychometrics is essential for establishing the reliability and validity of higher-stakes exams and for equating scores across years. It is also important for formative assessments. These are moving toward adaptive formats because of the 50% reduction in testing time, which means students spend less time being tested and more time learning.
Universities
Universities often give little thought to psychometrics, even though a significant amount of testing happens in higher education, especially with the shift to online learning and MOOCs. Many of these exams are high-stakes (consider a certification exam after completing a one-year graduate program!). Psychometricians are therefore needed to establish legally defensible cut scores and to run the statistical analyses that ensure reliable tests. Professionally designed assessment systems are also needed to develop and deliver tests, especially with enhanced security.
Medicine/Psychology
Have you ever filled out a survey at your doctor’s office, or before or after surgery? Perhaps a depression or anxiety inventory at a psychotherapist’s office? Psychometricians work on those instruments, too.
The Test Development Cycle
Psychometrics is at the core of the test development cycle, which is the process of developing a robust test. It is sometimes referred to by similar names, such as the assessment life cycle.
You’ll recognize some of the terms from the introduction above. The point here is that those questions aren’t stand-alone topics, or something you do once and then report on. A test is typically a living thing. Organizations usually publish a new version every year or every 6 months, so much of the cycle repeats on that timeline. Not all of it does, though. For example, many organizations only do a job analysis and standard setting every 5 years.
Consider a certification exam in healthcare. The profession doesn’t change quickly, because things like anatomy never change and medical procedures rarely change (e.g., how to measure blood pressure). So every 5 years you conduct a job analysis of your certificants to see what they do on the job and what is important. The results are then turned into test blueprints. Items are re-mapped to the blueprints if needed, but usually little has to change because the blueprint changes are probably minor. A new cut score is then set using the modified Angoff method, and the test is delivered this year. It is delivered again next year, but it is equated to this year’s form instead of starting over. Item statistics are still analyzed, though. This leads to a new cycle of item review and the publication of a new form for next year.
Example of psychometrics in action
Below is an output from our Iteman software. It is a deep analysis of a single English vocabulary question that tests whether the student knows the word ‘alleviate’. About 70% of the students answered correctly, with a very strong point-biserial. Each distractor was chosen by only a minority of students, and the distractor point-biserials were negative, which adds evidence of validity. The graph shows that the line for the correct answer goes up while the others go down, which is a good thing. If you are familiar with item response theory, you will notice how the blue line is similar to an item response function. That is not a coincidence.
Now, let’s look at another, more interesting one. Here’s a vocabulary question about the word ‘confectioner’. Notice that only 37% of students get it right… even though there’s a 25% chance of guessing correctly! However, the point-biserial discrimination is still very strong at 0.49. That means it’s a really good item. It’s just hard, which means it does a great job of differentiating among the best students.
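The guessing effect can be made concrete with the three-parameter logistic (3PL) IRT model. Its item response function includes a lower asymptote c, so even very low-ability students answer correctly at roughly the guessing rate. The sketch below assumes a hard four-option item with c = 0.25. The a and b values are hypothetical and were not estimated from the ‘confectioner’ data. The curve stays near 0.25 at low ability and then rises steeply around the item's difficulty. This is how a hard item with a low proportion correct can still discriminate well.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL item response function; c is the lower asymptote (pseudo-guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical parameters for a hard 4-option item: difficulty b = 1.0, guessing c = 0.25.
for theta in (-3.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    print(f"theta = {theta:+.1f}: P(correct) = {p_3pl(theta, a=1.2, b=1.0, c=0.25):.2f}")
```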
Psychometrics sounds fun! How can I join the band?
You will need a graduate degree. I recommend checking out the NCME website (ncme.org) for student resources. Good luck!
Already have a degree and looking for a job? These are the two sites I recommend:
- NCME: Also has a job postings page that is really good (ncme.org)
- Horizon Search: headhunting for I/O psychometricians and psychologists
Nathan Thompson, PhD
October 7, 2024