Maximum Likelihood Estimation

December 18, 2022

Maximum Likelihood Estimation (MLE) is an approach to estimating the parameters of a model. It is one of the core methods in Item Response Theory (IRT), where it is used both to estimate item parameters (analyzing questions) and to estimate person parameters (scoring). This article provides an introduction to the concepts of MLE.

Contents

History behind Maximum Likelihood Estimation
Definition of Maximum Likelihood Estimation
Comparison of likelihood and probability
Calculation of Maximum Likelihood Estimation
Key characteristics of Maximum Likelihood Estimation
Weaknesses of Maximum Likelihood Estimation
Application of Maximum Likelihood Estimation
Summary about Maximum Likelihood Estimation
References

1. History behind Maximum Likelihood Estimation

Even though early ideas about MLE appeared in the mid-1700s, Sir Ronald Aylmer Fisher developed them into a formal concept much later (Stigler, 2007). Fisher worked on maximum likelihood from 1912 to 1922, revising his own arguments and producing several different justifications for the method (Aldrich, 1997). In 1925, he published “Statistical Methods for Research Workers”, one of the 20th century’s most influential books on statistical methods. Overall, the development of the maximum likelihood concept was a breakthrough in statistics.

2. Definition of Maximum Likelihood Estimation

Wikipedia defines MLE as follows:

In statistics, Maximum Likelihood Estimation is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate.

Merriam-Webster has a slightly different definition for MLE:

A statistical method for estimating population parameters (as the mean and variance) from sample data that selects as estimates those parameter values maximizing the probability of obtaining the observed data.

To sum up, MLE is a method that estimates the parameter values of a model. These values are chosen so that they maximize the likelihood that the process described by the model produced the data that were actually observed. Put simply, MLE answers the question:

For which parameter value does the observed data have the highest probability?

3. Comparison of likelihood and probability

The definitions above mention “probability”, but likelihood and probability are two different concepts and should not be confused. Let us look at some differences between them so that you can tell them apart.

Likelihood	Probability
Refers to events that have already occurred, with known outcomes	Refers to events that will occur in the future
Likelihoods of different parameter values do not need to add up to 1	Probabilities of all possible outcomes add up to 1
Example 1: I flipped a coin 20 times and obtained 20 heads. What is the likelihood that the coin is fair?	Example 1: I will flip a fair coin 20 times. What is the probability that it lands on the same side (all heads or all tails) every time?
Example 2: Given the fixed outcomes (data), what is the likelihood of different parameter values?	Example 2: Given the fixed parameter P = 0.5, what is the probability of different outcomes?
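To make the contrast concrete, here is a minimal Python sketch of the coin examples, assuming a binomial model for the number of heads. The same function, binomial_pmf (a name chosen here for illustration), is read two ways: with the data fixed and p varying, it gives likelihoods; with p fixed at 0.5 and the outcome varying, it gives probabilities.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k heads in n flips | heads-probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Likelihood view: the data (20 heads in 20 flips) is fixed, p varies.
n, k = 20, 20
likelihood_fair = binomial_pmf(k, n, 0.5)    # 0.5**20, about 9.54e-07
likelihood_biased = binomial_pmf(k, n, 0.9)  # 0.9**20, about 0.1216

# Evaluate the likelihood on a grid of candidate p values and keep the best one.
grid = [i / 100 for i in range(101)]
likelihoods = [binomial_pmf(k, n, p) for p in grid]
best_p = grid[likelihoods.index(max(likelihoods))]

# Likelihoods across parameter values are not a distribution over p.
print(sum(likelihoods))  # well above 1

# Probability view: p = 0.5 is fixed, the outcome (number of heads) varies.
probs = [binomial_pmf(j, n, 0.5) for j in range(n + 1)]
print(sum(probs))  # 1.0

# Probability that a fair coin lands on the same side all 20 times.
p_same_side = binomial_pmf(0, n, 0.5) + binomial_pmf(n, n, 0.5)  # 2 * 0.5**20
```

The grid search picks p = 1.0 as the best-supported value for 20 heads in 20 flips, while the likelihood of a fair coin (about 9.5e-07) is tiny by comparison.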

4. Calculation of Maximum Likelihood Estimation

For a normal distribution, for example, the maximum likelihood estimates are found by taking the derivative of the log-likelihood with respect to each parameter, the mean μ and the variance σ², setting each derivative equal to 0, and solving. There are four general steps in estimating the parameters:

Assume a distribution for the observed data
Estimate the distribution’s parameters by maximizing the log-likelihood
Plug the estimated parameters into the distribution’s probability function
Evaluate how well the fitted distribution describes the observed data
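As a minimal sketch of these four steps, assume the data come from a normal distribution (step 1). For n independent observations the log-likelihood is ℓ(μ, σ²) = −(n/2)·log(2πσ²) − Σ(xᵢ − μ)²/(2σ²). Setting ∂ℓ/∂μ = 0 gives μ̂ = x̄, the sample mean, and setting ∂ℓ/∂σ² = 0 gives σ̂² = (1/n)·Σ(xᵢ − x̄)² (step 2). The code uses these closed forms, plugs them into the normal density (step 3), and, as a crude stand-in for step 4, checks that nearby parameter values give a lower log-likelihood. The helper names and the sample data are made up for illustration.

```python
import math
from statistics import fmean

def normal_log_likelihood(data, mu, var):
    """Sum of log N(x | mu, var) over all observations (assumes i.i.d. data)."""
    n = len(data)
    sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * var) - sq / (2 * var)

def fit_normal_mle(data):
    """Closed-form MLEs obtained by setting the log-likelihood derivatives to 0."""
    mu_hat = fmean(data)
    var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)  # divides by n, not n - 1
    return mu_hat, var_hat

def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Step 1: assume these (hypothetical) observations are normally distributed.
data = [4.9, 5.1, 5.6, 4.4, 5.0, 5.3, 4.7]

# Step 2: estimate the parameters.
mu_hat, var_hat = fit_normal_mle(data)

# Step 3: plug the estimates into the density function.
densities = [normal_pdf(x, mu_hat, var_hat) for x in data]

# Step 4 (crude check): nearby parameter values should fit worse.
best = normal_log_likelihood(data, mu_hat, var_hat)
alternatives = [(mu_hat + 0.1, var_hat), (mu_hat, var_hat * 1.2), (mu_hat - 0.1, var_hat * 0.8)]
print(all(normal_log_likelihood(data, m, v) < best for m, v in alternatives))  # True
print(f"mu_hat={mu_hat:.3f}, var_hat={var_hat:.4f}, log-likelihood={best:.3f}")
```

The variance estimate divides by n rather than n − 1, so the MLE of σ² is slightly biased downward; this is a known property of the estimator, not a flaw in the sketch.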

5. Key characteristics of Maximum Likelihood Estimation

MLE is most straightforward with one-dimensional data
MLE works best with “clean” data (e.g. no outliers)
MLE is usually computationally manageable
MLE can often be computed in real time on modern computers
MLE works well for simple cases (e.g. binomial distribution)
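For the binomial case, setting the derivative of the log-likelihood k·log p + (n − k)·log(1 − p) to 0 gives k/p − (n − k)/(1 − p) = 0, so p̂ = k/n. The sketch below compares that closed form with a brute-force numerical maximization; the data (7 successes in 10 trials) and the grid resolution are arbitrary choices for illustration.

```python
import math

def binomial_log_likelihood(p, k, n):
    """Log-likelihood of p for k successes in n trials.

    The constant log C(n, k) is dropped because it does not change the maximizer.
    """
    return k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 7, 10
closed_form = k / n  # from k/p - (n - k)/(1 - p) = 0

# Brute-force check on a fine grid strictly inside (0, 1) to avoid log(0).
grid = [i / 1000 for i in range(1, 1000)]
numerical = max(grid, key=lambda p: binomial_log_likelihood(p, k, n))
print(closed_form, numerical)  # 0.7 0.7
```

A closed-form solution like this is what makes simple cases cheap; more complex models usually need the iterative, numerical route instead.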

6. Weaknesses of Maximum Likelihood Estimation

MLE is sensitive to outliers
MLE often demands optimization for speed and memory to obtain useful results
MLE is sometimes poor at differentiating between models with similar distributions
MLE can be technically challenging, especially for multidimensional data and complex models
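To illustrate the sensitivity to outliers, here is a sketch assuming normally distributed data. The normal-model MLE of the mean is the sample mean, so a single extreme value (a hypothetical data-entry error here) pulls the estimate far from the bulk of the data. The median is printed only for comparison; it is not the MLE under the normal model.

```python
from statistics import fmean, median

def normal_mle(data):
    """Closed-form normal MLEs: sample mean and variance with divisor n."""
    mu = fmean(data)
    var = sum((x - mu) ** 2 for x in data) / len(data)
    return mu, var

clean = [4.9, 5.1, 5.6, 4.4, 5.0, 5.3, 4.7]
with_outlier = clean + [50.0]  # one hypothetical data-entry error

mu_clean, var_clean = normal_mle(clean)
mu_out, var_out = normal_mle(with_outlier)

# One extreme value roughly doubles the estimated mean and inflates the
# variance by orders of magnitude, while the median barely moves.
print(f"clean:        mu_hat={mu_clean:.2f}, var_hat={var_clean:.2f}, median={median(clean):.2f}")
print(f"with outlier: mu_hat={mu_out:.2f}, var_hat={var_out:.2f}, median={median(with_outlier):.2f}")
```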

7. Application of Maximum Likelihood Estimation

In its most common form, MLE relies on two important assumptions, together referred to as the i.i.d. (independent and identically distributed) assumption:

Data must be independently distributed, i.e. the observation of any given data point does not depend on the observation of any other data point (each data point is an independent experiment)
Data must be identically distributed, i.e. each data point is generated from the same distribution family with the same parameters
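Under the i.i.d. assumption, the joint likelihood of the data factorizes into a product of per-observation densities, L(θ) = f(x₁; θ)·f(x₂; θ)·…·f(xₙ; θ), so the log-likelihood is a simple sum. The sketch below, which uses a standard normal model and simulated data purely for illustration, shows why the sum of logs is used in practice: with many observations the raw product underflows to 0.0 in double precision, while the sum of logs stays finite.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]  # simulated i.i.d. sample

# Independence: the joint likelihood is the product of individual densities.
product = 1.0
for x in data:
    product *= normal_pdf(x)

# Taking logs turns the product into a sum, which is numerically stable.
log_likelihood = sum(math.log(normal_pdf(x)) for x in data)

print(product)         # 0.0 (underflow)
print(log_likelihood)  # a finite negative number
```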

Let us consider several well-known applications of MLE:

Global Positioning System (GPS)
Smart keyboard programs for iOS and Android operating systems (e.g. Swype)
Speech recognition programs (e.g. Carnegie Mellon’s open-source Sphinx speech recognizer, Dragon NaturallySpeaking)
Detection and measurement of the properties of the Higgs boson at the European Organization for Nuclear Research (CERN) using the Large Hadron Collider (François Englert and Peter Higgs were awarded the 2013 Nobel Prize in Physics for the theory of the Higgs boson)

Generally speaking, MLE is employed in agriculture, economics, finance, physics, medicine and many other fields.

8. Summary about Maximum Likelihood Estimation

Despite some practical issues with MLE, such as the technical challenges posed by multidimensional data and complex multiparameter models, which can make many real-world problems hard to solve, MLE remains a powerful and widely used statistical approach for classification and parameter estimation. MLE has contributed to many successes in both the scientific and commercial worlds.

9. References

Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912-1922. Statistical Science, 12(3), 162-176.

Stigler, S. M. (2007). The epic story of maximum likelihood. Statistical Science, 22(4), 598-620.

Laila Issayeva M.Sc.

Laila Issayeva earned her BA in Mathematics and Computer Science at Aktobe State University and Master’s in Education at Nazarbayev University. She has experience as a math teacher, school leader, and as a project manager for the implementation of nationwide math assessments for Kazakhstan. She is currently pursuing a PhD in psychometrics.
