
### Maximum Likelihood (ML): what is the concept, how is it applied in physics, and is it a powerful tool to estimate the value of unknown parameters using Bayes' theorem?

Posted January 14, 2023

**Note**: the ML concept is mostly applied in **social sciences and in experiments on living species** (including plants), because of the many variables that need to be controlled, the variability in the kinds of measurement used, and the mostly random nature of the events.

Obviously, **everything is computed by computer**, so we can make our equations as complex as we wish, for accuracy's sake.


An experiment consists, in fact, in drawing **properly distributed random numbers**. We just try to improve our knowledge by means of Bayes' theorem, repeating the measurements and possibly improving the techniques, so that the uncertainty of the “measurements” is progressively reduced.

Bayes' theorem states that

$$P(g|x) = \frac{P(x|g)\,P(g)}{P(x)}$$

where P(g|x) represents the probability of a specific **event g**, given the observation of one or more events x, and P(g) is the probability of the event to occur, irrespective of any other event.

As physicists, we can read the theorem as follows:

we don't know, with certainty, the value of most physical quantities (e.g. the gravitational acceleration g). We can only measure it, but, inevitably, **the measurement is associated with an uncertainty**.

You realise this as soon as you try to measure the value of g twice: it is highly unlikely that you obtain exactly the same value. Rather, you obtain a **distribution of data**. It means that the value of g is, in fact, a **random variable** (the fact that you believe it must have a *true* value is irrelevant to the discussion: you cannot measure it, so it is pointless to consider it).

According to Bayes' theorem, starting from prior knowledge, represented by a (subjective) P(g), one can update that knowledge by including more data, obtaining a **posterior probability** which is closer to the true one.

Here you can find a COLAB notebook where Bayes' theorem is demonstrated in action (we simulate the repeated extraction of a ball from an urn and figure out the right probability of extracting a ball of a given colour, irrespective of our initial choice).
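The notebook itself is not reproduced here, but the idea can be sketched in a few lines of Python. The urn composition, the number of draws, and the grid of candidate probabilities below are illustrative choices, not taken from the notebook:

```python
import random

random.seed(1)

# Illustrative urn: the (unknown to the experimenter) probability of
# drawing a red ball is 0.3.  We start from a flat prior over a grid of
# candidate probabilities and update it with Bayes' theorem at every draw.
true_p = 0.3
grid = [i / 100 for i in range(1, 100)]   # candidate values of p
prior = [1 / len(grid)] * len(grid)       # uniform prior: no initial preference

for _ in range(500):                      # repeated extractions
    red = random.random() < true_p
    likelihood = [p if red else (1 - p) for p in grid]
    posterior = [pr * li for pr, li in zip(prior, likelihood)]
    evidence = sum(posterior)             # normalisation factor
    prior = [x / evidence for x in posterior]  # posterior becomes the new prior

best = grid[prior.index(max(prior))]
print(best)  # close to the true 0.3, whatever the flat initial prior
```

Whatever reasonable prior you start from, the posterior concentrates around the true fraction as the number of draws grows.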

You can make a rough estimate by eye before starting to measure it.

Since an object takes about half a second to fall from a height of 1 m, kinematics suggests that g is of the order of 10 m/s². We can associate a very crude uncertainty with this value: certainly g > 0, and it cannot exceed, say, 50 times this estimate: 500 m/s².
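The back-of-the-envelope estimate can be checked directly from the kinematics of uniform acceleration, h = g t²/2, using the rough numbers quoted above:

```python
# Rough estimate: a fall from h = 1 m takes about t = 0.5 s,
# and uniform acceleration gives h = g * t**2 / 2, hence g = 2 * h / t**2.
h = 1.0   # m
t = 0.5   # s, timed "by eye"
g_rough = 2 * h / t**2
print(g_rough)  # 8.0, indeed of the order of 10 m/s²
```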

We can start **by betting that g** lies in the interval [0, 500] m/s², and that there is no reason to believe that any of the values in this interval is more likely than another (you may disagree, but probabilities are *subjective*, so this is my guess, and you are not authorised to discuss it; of course, you can bet against me).

If g is the result of our estimation, P(g) is a uniform distribution, and it is our **prior**, i.e., our belief about the value of g **before** having acquired any new information.

By the normalisation condition, P(g) is a constant p = 1/500 s²/m (remember that **P(g) is a probability density function**, and that the probability of finding g between g and g + dg is P(g)dg, which is, as it should be, dimensionless).

The interpretation of P(g) is the following: the probability that g is any number between 0 and 500 m/s² is the same.

When we perform a new measurement, we obtain a new value g₁; in doing this we believe that g is within the above interval, so the probability of obtaining g₁ as a result is conditioned on our estimate g: P(g₁|g).

P(g₁|g) is called the **likelihood** because, since we believe that g is a number in the above interval, it is likely that a measurement finds it to be g₁ ∈ [0, 500].

P(g₁) is nothing but a **normalisation factor**, and is called the **evidence**. The evidence comes just from the experiment, and nothing else.

After the first measurement, we can then affirm that

$$P(g|g_1) = \frac{P(g_1|g)\,P(g)}{P(g_1)} \propto \exp\left(-\frac{(g-g_1)^2}{2\sigma^2}\right)$$

where **σ is the uncertainty** associated with the measurement, at least in those cases in which the data distribute as a Gaussian. P(g|g₁) is called the **posterior**, because it gives us the probability that the value of the gravitational acceleration is g, **after** we measured it to be g₁.

The interpretation of P(g|g₁) is that g = g₁ with the highest possible probability because, for g = g₁, the exponential is 1, its maximum.

The probability that g≠g₁ decreases with the distance |g-g₁|. This is why, after a real measurement (not just a rough estimation), we believe that the value of the gravitational acceleration is the one we measured.
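This first update can be sketched numerically. The measured value g₁ = 9.8 m/s² and the uncertainty σ = 0.5 m/s² below are illustrative: starting from the uniform prior on [0, 500] m/s², a single measurement turns it into a posterior peaked at g₁.

```python
import math

g1, sigma = 9.8, 0.5                      # illustrative measurement
dg = 0.01
grid = [i * dg for i in range(50001)]     # g from 0 to 500 m/s²
prior = [1 / 500.0] * len(grid)           # uniform prior density

# posterior ∝ likelihood × prior, then normalise numerically
post = [p * math.exp(-(g - g1) ** 2 / (2 * sigma ** 2))
        for g, p in zip(grid, prior)]
norm = sum(post) * dg
post = [x / norm for x in post]

g_map = grid[post.index(max(post))]
print(round(g_map, 2))  # 9.8: the posterior peaks at the measured value
```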

We can now treat P(g|g₁) as our new prior: our knowledge about the process has improved, and we must use all the information we have to make a new, more informed, *bet*.

After a second measurement, the new posterior will thus be

$$P(g|g_1,g_2) \propto \exp\left(-\frac{(g-g_2)^2}{2\sigma^2}\right)P(g|g_1)$$

According to Bayes' theorem, the posterior is the prior (evaluated at the previous step) multiplied by the probability to measure g₂, given g and g₁. For simplicity, we assume that the uncertainty σ is always the same.

Again, this becomes our new prior, and we can **iterate for N further measurements** to obtain

$$P(g|g_1,\dots,g_N) \propto \prod_{i=1}^{N}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(g-g_i)^2}{2\sigma^2}\right)$$

**The principle of ML states that P(g|g₁,g₂,…), the probability for the event consisting of the sequence {gᵢ} to occur, is the highest possible. If so, the posterior gives the probability that g is the true value of the gravitational acceleration, given that we measured the sequence {gᵢ}.**
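The iterated update can be simulated on a grid. In this sketch the true value, the uncertainty, and the number of measurements are made-up numbers; the point is that the peak of the posterior after N updates lands on the sample mean, anticipating the ML result derived below:

```python
import random

random.seed(0)

# Simulated campaign: N measurements of g, each Gaussian with the same sigma.
true_g, sigma, N = 9.81, 0.5, 100
data = [random.gauss(true_g, sigma) for _ in range(N)]

grid = [9.0 + i * 0.001 for i in range(2001)]  # zoom on [9, 11] m/s²
log_post = [0.0] * len(grid)                   # flat prior: constant log-prior
for gi in data:                                # one Bayes update per measurement
    for k, g in enumerate(grid):
        log_post[k] += -(g - gi) ** 2 / (2 * sigma ** 2)

g_map = grid[log_post.index(max(log_post))]    # peak of the posterior
mean = sum(data) / N
print(abs(g_map - mean) < 0.001)  # True: the peak sits at the sample mean
```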

Working with products is difficult. Observing that, if P(g|gᵢ) has a maximum, its logarithm attains its maximum for the same values of the parameters, we compute

$$\log P(g|g_1,\dots,g_N) = -\sum_{i=1}^{N}\frac{(g-g_i)^2}{2\sigma^2} - N\log\left(\sigma\sqrt{2\pi}\right) + \mathrm{const.}$$

The maximum of the log-likelihood is attained when its derivative with respect to g is null, i.e.,

$$-\sum_{i=1}^{N}\frac{g-g_i}{\sigma^2} = 0\,,$$

which happens when

$$g = \frac{1}{N}\sum_{i=1}^{N} g_i\,,$$

i.e., when g is the mean of the measured values. Differentiating with respect to σ, we find

$$\sum_{i=1}^{N}\frac{(g-g_i)^2}{\sigma^3} - \frac{N}{\sigma} = 0\,,$$

which is satisfied when

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(g-g_i)^2\,.$$
In summary, **the reason for which we use the mean as the estimate of the true value of a measurement, and the variance as an estimator of the uncertainty, is that they maximise the likelihood.**
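This claim can be verified numerically: scanning the log-likelihood of a simulated Gaussian sample over a grid of (g, σ) pairs, the maximum lands on the sample mean and on the (1/N) standard deviation. All the numbers below are illustrative.

```python
import math
import random

random.seed(42)

# Simulated sample and its ML estimators computed directly.
data = [random.gauss(9.81, 0.3) for _ in range(100)]
N = len(data)
mean = sum(data) / N
std = math.sqrt(sum((x - mean) ** 2 for x in data) / N)  # 1/N, not 1/(N-1)

def log_likelihood(g, s):
    # log of prod_i exp(-(g - g_i)^2 / (2 s^2)) / (s sqrt(2 pi)), up to a constant
    return -N * math.log(s) - sum((x - g) ** 2 for x in data) / (2 * s ** 2)

# brute-force scan of a grid centred (by construction) on mean and std
gs = [mean - 0.1 + i * 0.002 for i in range(101)]
ss = [std - 0.1 + i * 0.002 for i in range(101)]
_, g_hat, s_hat = max((log_likelihood(g, s), g, s) for g in gs for s in ss)

print(round(g_hat - mean, 3), round(s_hat - std, 3))  # both within one grid step of zero
```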

The principle of ML can be used in similar, but different, cases, such as deriving the parameters of a distribution, or fitting data to a model.
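As a last sketch, fitting data to a model with ML: when the uncertainties are Gaussian, maximising the likelihood of a model such as v = g·t is equivalent to minimising the sum of squared residuals, which for this one-parameter linear model has a closed-form solution. The simulated data below are illustrative.

```python
import random

random.seed(7)

# Simulated free-fall velocities v = g*t with Gaussian measurement noise.
g_true, sigma = 9.81, 0.2
ts = [0.1 * k for k in range(1, 21)]
vs = [g_true * t + random.gauss(0, sigma) for t in ts]

# ML slope for v = a*t with Gaussian errors: least squares in closed form,
# a_hat = sum(t_i * v_i) / sum(t_i**2).
a_hat = sum(t * v for t, v in zip(ts, vs)) / sum(t * t for t in ts)
print(round(a_hat, 2))  # close to the true 9.81
```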

Do you want to keep reading stories like this? Don't miss new stories by subscribing to Medium: https://giovanni-organtini.medium.com/membership