Maximum Likelihood Estimation (MLE) is an approach to estimating the parameters of a model. It is one of the core techniques of Item Response Theory (IRT), especially for estimating item parameters (analyzing questions) and person parameters (scoring). This article provides an introduction to the concepts of MLE.
- History behind Maximum Likelihood Estimation
- Defining Maximum Likelihood Estimation
- Comparison of likelihood and probability
- Calculating Maximum Likelihood Estimation
- Key characteristics of Maximum Likelihood Estimation
- Weaknesses of Maximum Likelihood Estimation
- Application of Maximum Likelihood Estimation
- Summarizing remarks about Maximum Likelihood Estimation
History behind Maximum Likelihood Estimation
Even though early ideas about MLE appeared in the mid-1700s, it was Sir Ronald Aylmer Fisher who developed them into a formalized method much later. Fisher produced his seminal work on maximum likelihood between 1912 and 1922, repeatedly criticizing and revising his own justifications. In 1925, he finally published “Statistical Methods for Research Workers”, one of the 20th century’s most influential books on statistical methods. The development of the maximum likelihood concept was a breakthrough in statistics.
Defining Maximum Likelihood Estimation
Wikipedia defines MLE as follows:
In statistics, Maximum Likelihood Estimation is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate.
Merriam Webster has a slightly different definition for MLE:
A statistical method for estimating population parameters (as the mean and variance) from sample data that selects as estimates those parameter values maximizing the probability of obtaining the observed data.
To sum up, MLE is a method that finds the parameter values of a model. These values are chosen so that they maximize the likelihood that the process described by the model produced the data that were actually observed. Put simply, MLE answers the question:
For which parameter value does the observed data have the biggest probability?
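This question can be made concrete with a small sketch. Assume (hypothetically) that we flipped a coin 20 times and observed 14 heads, model the flips with a binomial distribution, and ask which value of p = P(heads) makes the observed data most probable:

```python
# Grid search illustrating the MLE question: which parameter value
# makes the observed data most probable?
# Hypothetical setup: 20 coin flips, 14 heads; parameter p = P(heads).
from math import comb

n, k = 20, 14  # observed data (assumed for illustration)

def likelihood(p):
    # Binomial likelihood of observing k heads in n flips
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Evaluate the likelihood on a grid of candidate parameter values
grid = [i / 1000 for i in range(1001)]
p_hat = max(grid, key=likelihood)
print(p_hat)  # → 0.7, the sample proportion k/n
```

The grid search lands exactly on the sample proportion k/n, which is also the closed-form maximum likelihood estimate for a binomial model.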
Comparison of likelihood and probability
The definitions above mention “probability”, but it is important not to confuse these two different concepts. Let us look at some differences between likelihood and probability so that you can tell them apart.
| Likelihood | Probability |
| --- | --- |
| Refers to events that have already occurred, with known outcomes | Refers to events that will occur in the future |
| Likelihoods do not add up to 1 | Probabilities add up to 1 |
| Example 1: I flipped a coin 20 times and obtained 20 heads. What is the likelihood that the coin is fair? | Example 1: I flipped a coin 20 times. What is the probability of the coin landing heads (or tails) every time? |
| Example 2: Given the fixed outcomes (data), what is the likelihood of different parameter values? | Example 2: Given the fixed parameter p = 0.5, what is the probability of different outcomes? |
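The two columns can be checked numerically with the coin example. In this sketch the data and parameter values are assumed for illustration: fixing the parameter and summing over all possible outcomes gives 1, while fixing the data and summing over a few candidate parameter values does not:

```python
# Contrasting probability (parameter fixed, outcomes vary) with
# likelihood (data fixed, parameter varies), using a binomial model.
from math import comb

def binom_pmf(k, n, p):
    # Probability of k heads in n flips when P(heads) = p
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability: p = 0.5 is fixed; probabilities over all 21 possible
# outcomes (0..20 heads) sum to 1.
total = sum(binom_pmf(k, 20, 0.5) for k in range(21))
print(total)  # → 1.0 (up to rounding)

# Likelihood: the data are fixed (20 heads out of 20); likelihoods
# over candidate parameter values need not sum to 1.
like_sum = sum(binom_pmf(20, 20, p) for p in [0.1, 0.5, 0.9])
print(like_sum)  # clearly not 1
```

Note also that the likelihood of a fair coin given 20 heads, binom_pmf(20, 20, 0.5) ≈ 9.5e-7, is tiny, which is exactly what Example 1 in the likelihood column is asking about.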
Calculating Maximum Likelihood Estimation
The MLE is calculated by taking the derivative of the log-likelihood with respect to each parameter (for a normal distribution, the mean μ and the variance σ²) and setting it equal to 0. There are four general steps in estimating the parameters:
- Assume a distribution for the observed data
- Estimate the distribution’s parameters by maximizing the log-likelihood
- Plug the estimated parameters into the distribution’s probability function
- Evaluate the fitted distribution against the observed data
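The four steps above can be sketched for normally distributed data. The observations below are hypothetical; setting the derivatives of the log-likelihood with respect to μ and σ² to zero yields the well-known closed-form estimates (the sample mean and the uncorrected sample variance):

```python
# The four estimation steps, sketched for a normal model.
import math

data = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0]  # hypothetical observations

# Step 1: assume a distribution (normal, with parameters mu and sigma^2).
# Step 2: maximize the log-likelihood. Setting its derivatives with
# respect to mu and sigma^2 to zero gives closed-form estimates:
n = len(data)
mu_hat = sum(data) / n
var_hat = sum((x - mu_hat) ** 2 for x in data) / n

# Step 3: plug the estimates into the distribution's density function.
def density(x):
    return math.exp(-(x - mu_hat) ** 2 / (2 * var_hat)) / math.sqrt(2 * math.pi * var_hat)

# Step 4: evaluate the fitted distribution on the observed data.
log_lik = sum(math.log(density(x)) for x in data)
print(mu_hat, var_hat, log_lik)
```

Any other choice of μ or σ² produces a lower log-likelihood on these data, which is what makes mu_hat and var_hat the maximum likelihood estimates.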
Key characteristics of Maximum Likelihood Estimation
- MLE operates with one-dimensional data
- MLE uses only “clean” data (e.g. no outliers)
- MLE is usually computationally manageable
- MLE is often real-time on modern computers
- MLE works well for simple cases (e.g. binomial distribution)
Weaknesses of Maximum Likelihood Estimation
- MLE is sensitive to outliers
- MLE often requires optimization of speed and memory usage to obtain useful results
- MLE is sometimes poor at differentiating between models with similar distributions
- MLE can be technically challenging, especially for multidimensional data and complex models
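The first weakness is easy to demonstrate. For a normal model the MLE of the mean is the sample average, so a single outlier can drag the estimate far from the bulk of the data. The numbers below are hypothetical:

```python
# Sketch of MLE's sensitivity to outliers: under a normal model the
# MLE of the mean is the sample average, which one bad point can skew.
clean = [5.0, 5.1, 4.9, 5.2, 4.8]   # hypothetical clean data
with_outlier = clean + [50.0]       # one corrupted observation

mle_clean = sum(clean) / len(clean)
mle_outlier = sum(with_outlier) / len(with_outlier)
print(mle_clean, mle_outlier)  # the second estimate lies far above every clean point
```

This is one reason robust alternatives (or outlier screening before fitting) are often preferred when the data cannot be assumed “clean”.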
Application of Maximum Likelihood Estimation
In order to apply MLE, two important assumptions (typically referred to as the i.i.d. assumption) need to be made:
- Data must be independently distributed, i.e. the observation of any given data point does not depend on the observation of any other data point (each data point is an independent experiment)
- Data must be identically distributed, i.e. each data point is generated from the same distribution family with the same parameters
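The i.i.d. assumption is what makes the likelihood tractable: the joint likelihood of the sample factorizes into a product of per-observation densities, so the log-likelihood becomes a simple sum. A sketch with a Bernoulli model (the data and candidate parameter are assumed for illustration):

```python
# Under i.i.d. the joint likelihood is a product of per-observation
# terms, and the log-likelihood is therefore a sum.
import math

data = [1, 0, 1, 1, 0, 1]  # hypothetical coin flips (1 = heads)
p = 0.6                    # candidate parameter value

# Joint likelihood as a product (valid only because flips are i.i.d.)
joint = math.prod(p if x == 1 else 1 - p for x in data)

# Equivalent log-likelihood as a sum
log_lik = sum(math.log(p if x == 1 else 1 - p) for x in data)
print(joint, math.exp(log_lik))  # both equal p**4 * (1 - p)**2
```

If the observations were dependent or drawn from different distributions, the joint likelihood would not factorize this way and the simple sum-of-logs objective would no longer be correct.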
Let us consider several well-known applications of MLE:
- Global Positioning System (GPS)
- Smart keyboard programs for iOS and Android operating systems (e.g. Swype)
- Speech recognition programs (e.g. Carnegie Mellon open source SPHINX speech recognizer, Dragon Naturally Speaking)
- Detection and measurement of the properties of the Higgs boson at the European Organization for Nuclear Research (CERN) by means of the Large Hadron Collider (François Englert and Peter Higgs were awarded the 2013 Nobel Prize in Physics for the theory of the Higgs boson)
Generally speaking, MLE is employed in agriculture, economics, finance, physics, medicine and many other fields.
Summarizing remarks about Maximum Likelihood Estimation
Despite some practical issues, such as the technical challenges posed by multidimensional data and complex multiparameter models, which complicate many real-world problems, MLE remains a powerful and widely used statistical approach for classification and parameter estimation. MLE has brought many successes to mankind in both the scientific and commercial worlds.
References
Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statistical Science, 12(3), 162–176.
Stigler, S. M. (2007). The epic story of maximum likelihood. Statistical Science, 22(4), 598–620.