Meta-analysis is a research process of collating data from multiple independent but similar scientific studies in order to identify common trends and findings by means of statistical methods. Put simply, it is a method for accumulating the findings of many studies on the same question and analyzing them statistically. It is often used in psychometrics and industrial-organizational psychology to help validate assessments. Meta-analysis not only summarizes the evidence on a research question but also provides a quantitative estimate of the relationship between two variables or of the effectiveness of an intervention. It can also be used to test competing theoretical assumptions against each other.

Background of Meta-Analysis

An American statistician and researcher, Gene Glass, coined the term ‘meta-analysis’ in 1976. He used it to describe the statistical analysis of a large collection of results from individual studies for the purpose of integrating the findings. Medical researchers began employing meta-analysis a few years later. One of the first influential applications of this method was when Elwood and Cochrane used meta-analysis to examine the effect of aspirin on reducing recurrences of heart attacks.


Purpose of Meta-Analysis

In general, meta-analysis is aimed at two things:

  • to establish whether an effect exists across studies and, if so, whether it is positive or negative,
  • to analyze the results of previously conducted studies to identify common trends.

Performing Meta-Analysis

Although the procedure may vary with the research purpose and field, it typically involves eight major steps:

  1. Set a research question and propose a hypothesis
  2. Conduct a systematic review of the relevant studies
  3. Extract the data to be included in the meta-analysis, such as sample sizes and measures of variability for the intervention and control groups (the intervention group receives the treatment under study, while the control group does not)
  4. Calculate summary measures called effect sizes (e.g., the difference in mean values between the intervention and control groups), and standardize the estimates if necessary so that results can be compared across studies
  5. Choose a meta-analytical method: quantitative (traditional univariate meta-analysis, meta-regression, meta-analytic structural equation modeling) or qualitative
  6. Choose software suited to the complexity of the methods and the dataset (e.g., templates for Microsoft Excel, Stata, SPSS, SAS, R, Comprehensive Meta-Analysis, RevMan), and code the effect sizes
  7. Run the analyses using an appropriate model for combining effect sizes: fixed effects (assumes that all studies share one common true effect size) or random effects (assumes heterogeneity and allows the true effect size to vary across studies)
  8. Synthesize results and report them
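Steps 4 and 7 can be sketched in Python. This is a minimal illustration under stated assumptions, not a full meta-analysis package: it assumes continuous outcomes summarized by group means, SDs and sample sizes, uses Hedges' g as the standardized effect size, and estimates the between-study variance for the random-effects model with the DerSimonian-Laird method (one common choice among several). The function names `hedges_g` and `pool` and the study data are hypothetical.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g (intervention minus control) and its approximate sampling variance."""
    # Pooled within-group standard deviation
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                       # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)          # small-sample bias correction
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def pool(effects, variances):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted dispersion of the effects around the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    df = len(effects) - 1
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    random_ = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed, "se_fixed": math.sqrt(1 / sum(w)),
        "random": random_, "se_random": math.sqrt(1 / sum(w_re)),
        "tau2": tau2,
    }

# Hypothetical studies: (mean, SD, n) for the intervention group, then the control group
studies = [
    (24.0, 5.0, 30, 21.0, 5.5, 30),
    (18.5, 4.0, 45, 17.0, 4.2, 50),
    (30.0, 6.0, 25, 24.0, 6.5, 25),
]
gs, vs = zip(*(hedges_g(*s) for s in studies))
print(pool(list(gs), list(vs)))
```

When the estimated between-study variance `tau2` is zero, the two models give the same pooled estimate; when it is positive, the random-effects model spreads weight more evenly across studies and reports a wider standard error.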

Before drawing conclusions and reporting results, it is helpful to use the checklist suggested by DeSimone et al. (2021) to ensure that all crucial aspects of the meta-analysis have been addressed in your study.

Meta-Analysis in Assessment and Psychometrics: Test Validation and Validity Generalization

Due to its versatility, meta-analysis is used in many fields of research, in particular as a test validation strategy in psychology and psychometrics. Its most common application is validating the use of tests in the workplace, in personnel psychology and pre-employment testing. The classic example is the work of Schmidt and Hunter (1998), who analyzed 85 years of research on what best predicts job performance. It is one of the most important articles on the topic, and it was recently updated by Sackett et al. (2021) with somewhat different results.

How is meta-analysis applied in such a situation? Well, start by reconceptualizing a “sample” as a set of studies, not a set of people. So let’s say we find 100 studies that use pre-employment tests to select examinees by predicting job performance (obviously, there are far more). Because most studies use more than one test, there might be 77 that use a general cognitive ability test, 63 that use a conscientiousness assessment, 24 that use a situational judgment test (SJT), and so on. We look at the correlation coefficients reported in the 77 cognitive ability studies and find that their average is 0.51, while the average correlation is 0.44 for conscientiousness and 0.39 for SJTs. You can see how useful this is in a practical sense for a practitioner who might be tasked with selecting an assessment battery!
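In practice, meta-analysts usually weight each study's correlation by its sample size rather than taking a simple average, so that larger studies count for more. The sketch below shows this "bare-bones" step in the spirit of the Hunter and Schmidt approach: a weighted mean r, the observed variance of r across studies, and the variance expected from sampling error alone (the standard large-sample approximation). The function name `bare_bones` and the (r, n) pairs are hypothetical.

```python
def bare_bones(studies):
    """Sample-size-weighted mean r, observed variance of r, and the variance
    expected from sampling error alone. `studies` is a list of (r, n) pairs."""
    total_n = sum(n for _, n in studies)
    mean_r = sum(r * n for r, n in studies) / total_n
    # Observed (weighted) variance of correlations across studies
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    # Expected sampling error variance, using the average study sample size
    avg_n = total_n / len(studies)
    var_err = (1 - mean_r**2) ** 2 / (avg_n - 1)
    return mean_r, var_obs, var_err

# Hypothetical cognitive ability validity studies: (observed r, sample size)
cognitive = [(0.45, 120), (0.55, 300), (0.40, 80)]
mean_r, var_obs, var_err = bare_bones(cognitive)
print(f"weighted mean r = {mean_r:.3f}")  # weighted mean r = 0.502
print(f"observed variance = {var_obs:.6f}, sampling error variance = {var_err:.6f}")
```

If the sampling error variance is close to the observed variance, most of the apparent disagreement between studies can be attributed to sampling error rather than to real differences between settings.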

Meta-analyses will often go further and clean up the results by excluding studies with poor methodology or unrepresentative samples, and by applying corrections for artifacts such as range restriction and unreliability. These steps aim to make the overall estimates more accurate. For an example, see Sackett et al. (2021).
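The two corrections mentioned above can be sketched with their textbook formulas: disattenuation for unreliability in the criterion, and Thorndike's Case II correction for direct range restriction on the predictor. This is illustrative only. The sketch assumes direct restriction and a criterion reliability estimated in the selected (incumbent) sample, which is why the unreliability correction is applied first; real meta-analyses must justify both choices, and Sackett et al. (2021) argue that range-restriction corrections have often been overapplied. The function names and input values are hypothetical.

```python
import math

def correct_criterion_unreliability(r_xy, r_yy):
    """Disattenuate an observed validity for unreliability in the criterion
    (r_yy = reliability of the job performance measure)."""
    return r_xy / math.sqrt(r_yy)

def correct_direct_range_restriction(r, u):
    """Thorndike Case II. u = predictor SD in the applicant pool divided by the
    predictor SD in the selected (restricted) sample; u > 1 under restriction."""
    return (u * r) / math.sqrt(1 + r**2 * (u**2 - 1))

# Hypothetical observed validity, incumbent criterion reliability, and SD ratio
r_obs = 0.30
r_step1 = correct_criterion_unreliability(r_obs, r_yy=0.60)
r_corrected = correct_direct_range_restriction(r_step1, u=1.5)
print(round(r_step1, 3), round(r_corrected, 3))
```

Both corrections increase the estimate, which is why the size of the correction factors (here `r_yy` and `u`) deserves as much scrutiny as the observed correlations themselves.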

Such research has led to the concept of validity generalization: if a test has been validated for many identical or similar uses, you can consider it validated for your particular use without conducting your own validation study. For example, if you are selecting clerical workers and there are literally hundreds of studies showing that numeracy or quantitative tests predict job performance, there is no need for you to do ANOTHER study. If challenged, you can simply point to the hundreds of studies already done. This is a reasonable argument, but you should not take it too far, i.e., generalize too much.


As you can see, conducting a meta-analysis is not a piece of cake. However, it is very efficient when the researcher intends to evaluate effects across diverse participants, generate new hypotheses that set a precedent for future research, demonstrate statistical significance, or overcome the problem of small sample sizes in individual studies.


References

Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2021). Introduction to meta-analysis. John Wiley & Sons.

DeSimone, J. A., Brannick, M. T., O’Boyle, E. H., & Ryu, J. W. (2021). Recommendations for reviewing meta-analyses in organizational research. Organizational Research Methods, 24(4), 694-717.

Field, A. P., & Gillett, R. (2010). How to do a meta-analysis. British Journal of Mathematical and Statistical Psychology, 63(3), 665-694.

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3-8.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Sage Publications.

Gurevitch, J., Koricheva, J., Nakagawa, S., & Stewart, G. (2018). Meta-analysis and the science of research synthesis. Nature, 555(7695), 175-182.

Hansen, C., Steinmetz, H., & Block, J. (2022). How to conduct a meta-analysis in eight steps: a practical guide. Management Review Quarterly, 72(1), 1-19.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Academic Press.

Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Sage Publications.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Sage.

Peto, R., & Parish, S. (1980). Aspirin after myocardial infarction. Lancet, 1(8179), 1172-1173.

Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2021). Revisiting meta-analytic estimates of validity in personnel selection: addressing systematic overcorrection for restriction of range. Journal of Applied Psychology.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262.


A z-score measures the distance between a raw score and the mean in standard deviation units. The z-score is also known as a standard score because it standardizes the distribution of scores, which makes it possible to compare scores on different variables. It is worth mentioning that the standard normal distribution (also known as the z-distribution) is a normal distribution with a mean of 0 and a standard deviation of 1.

The z-score can be positive or negative, depending on whether the observation is above or below the mean. For instance, a z-score of +2 indicates that the raw score (data point) is two standard deviations above the mean, while -1 signifies that it is one standard deviation below the mean. A z-score of 0 means the raw score equals the mean.

Z-scores generally range from -3 standard deviations (the far left of the normal distribution curve) up to +3 standard deviations (the far right of the curve). This range covers about 99.7% of the population; there are people outside it (e.g., gifted students), but in most cases it is difficult to measure the extremes and there is little practical difference. For this reason, scaled scores on exams are often built on this paradigm: the SAT has a mean of 500 and a standard deviation of 100, so the range is 200 to 800.
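The coverage figures and the SAT-style scaling can be checked with the standard library's `statistics.NormalDist`. The helper names `share_within` and `to_scaled` are illustrative, and the default mean of 500 and SD of 100 simply mirror the example above.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of a normal population within ±k standard deviations of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)

def to_scaled(z, mean=500, sd=100):
    """Convert a z-score to a reporting scale (defaults mirror the SAT example)."""
    return mean + sd * z

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.4f}")  # 0.6827, 0.9545, 0.9973
print(to_scaled(-3), to_scaled(3))                   # 200 800
```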

How to calculate a z-score

Here is a formula for calculating the z-score:

z = (x - μ) / σ

where:

     x – individual value

     μ – mean

     σ – standard deviation

Interpretation of the formula:

  • Subtract the mean of the values from the individual value
  • Divide the difference by the standard deviation.
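The two steps above translate directly into a small Python function. The helper that standardizes a whole list uses the population standard deviation (`statistics.pstdev`); that is an assumption, and for a sample you might prefer `statistics.stdev`. The names `z_score` and `standardize` are illustrative.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Subtract the mean from the value, then divide by the standard deviation."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

def standardize(values):
    """Convert every raw value in a list to a z-score (population SD)."""
    mu, sigma = mean(values), pstdev(values)
    return [z_score(v, mu, sigma) for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population SD 2
print(standardize(scores))          # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```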

[Figure: the standard normal curve and how the z-score relates to other metrics, such as T scores]

Advantages of using a z-score

When you standardize the raw data by transforming them into z-scores, you receive the following benefits:

  • Identify outliers
  • Understand where an individual score fits into a distribution
  • Normalize scores for statistical decision-making (e.g., grading on a curve)
  • Calculate probabilities and percentiles using the standard normal distribution
  • Compare scores on different distributions with different means and standard deviations
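A short sketch of three of these uses: comparing scores from tests with different means and standard deviations, converting a z-score to a percentile under a normality assumption, and flagging outliers. The test scores are hypothetical, and the |z| > 3 cutoff is a common rule of thumb rather than a fixed standard.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    return (x - mu) / sigma

def percentile(z):
    """Percentile rank of a z-score, assuming normally distributed scores."""
    return NormalDist().cdf(z) * 100

def is_outlier(z, cutoff=3.0):
    """Flag scores more than `cutoff` SDs from the mean (rule-of-thumb cutoff)."""
    return abs(z) > cutoff

# Hypothetical student: 82 on test A (mean 70, SD 8), 88 on test B (mean 80, SD 10)
z_a = z_score(82, 70, 8)    # 1.5
z_b = z_score(88, 80, 10)   # 0.8
print(f"Test A: z = {z_a:.2f}, percentile = {percentile(z_a):.1f}")  # 1.50, 93.3
print(f"Test B: z = {z_b:.2f}, percentile = {percentile(z_b):.1f}")  # 0.80, 78.8
print(is_outlier(z_a), is_outlier(3.4))                              # False True
```

Although the raw score on test B is higher, the z-scores show that the student performed relatively better on test A.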

Example of using a z-score in a real-life situation

Let’s imagine a set of SAT scores from students that follows a normal distribution with a mean of 500 and a standard deviation of 100. Suppose we need to find the probability that a score exceeds 650. To standardize our data, we first find the z-score for 650, which tells us how many standard deviations 650 is from the mean.

  • Subtracting the mean from the individual value:

x = 650

μ = 500

x - μ = 650 - 500 = 150

  • Dividing the obtained difference by the standard deviation:

σ = 100

z = 150 ÷ 100 = 1.5

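The z-score for the value of 650 is 1.5, i.e., 650 is 1.5 standard deviations above the mean of our distribution.

If you look up this z-score in a standard normal (z) table, you will find a cumulative probability of 0.93319. This means that about 93% of students score below 650, so a score of 650 is at the 93rd percentile. The probability that a score exceeds 650 is therefore 1 - 0.93319 = 0.06681, or about 6.7%.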
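The same calculation can be reproduced with Python's `statistics.NormalDist`, where the normal cumulative distribution function replaces the table lookup. This is simply a check on the worked example under its normality assumption.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)
z = (650 - sat.mean) / sat.stdev   # standardize: 1.5 SDs above the mean
below = sat.cdf(650)               # share of students scoring below 650
above = 1 - below                  # probability that a score exceeds 650
print(f"z = {z}, percentile = {below:.5f}, P(X > 650) = {above:.5f}")
# z = 1.5, percentile = 0.93319, P(X > 650) = 0.06681
```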

Additional resources

Khan Academy

Normal Distribution (Wikipedia)