# The concept of covariance among "independent" variables: are the factors we select actually independent in our experiments?

Posted June 24, 2022

**Note:** The concept of covariance goes against our inclination (the ego of the “expert” experimenter), and it leads to many defective results if we do not verify how interdependent the variables we **selected for our hypothesis** are.

Check for covariance between your variables before trying to fit any model!

Do not worry about the complicated equations: the computer does all the work, and the researcher has to determine how to interpret the results. **The researcher should also suggest which variable to drop or add in the follow-up experiment** if any of the independent variables exhibit strong covariance.
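As a minimal sketch of such a check, the snippet below computes the pairwise correlation between candidate factors and flags strongly related pairs. The factor names `x1`, `x2`, `x3`, the synthetic data, and the 0.8 threshold are all assumptions made for illustration; they are not part of any particular experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical factors: x2 is deliberately built from x1, x3 is unrelated.
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)

factors = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

# rowvar=False: each column is a variable, each row is an observation.
corr = np.corrcoef(factors, rowvar=False)

# Flag pairs whose absolute correlation exceeds an arbitrary threshold.
threshold = 0.8
flagged = [
    (names[i], names[j], corr[i, j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
for a, b, r in flagged:
    print(f"{a} and {b} are strongly correlated: r = {r:.2f}")
```

A flagged pair is a candidate for dropping one variable or for redesigning the follow-up experiment so the two factors are varied separately.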

Feb 11, 2021

**Visualizing Covariance**

A common assumption when we build any model is that the variables are independent, but this assumption often doesn’t hold perfectly, and covariance among the factors messes up your estimates.

**First, we derive the likelihood distribution** for some model. Next, we show how the shape of this distribution, and hence the confidence interval of our estimates, changes with variance.

Finally, we show how to visualize the **effect of covariance graphically** and how high covariance between the variables you are trying to estimate **affects the confidence intervals**.

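Starting with Bayes' theorem, we can write the **posterior probability** of some model X given observed data D and prior information I as:

$$P(X \mid D, I) = \frac{P(D \mid X, I)\,P(X \mid I)}{P(D \mid I)}$$

We assume that the prior divided by the evidence, P(X|I)/P(D|I), is some **constant**, so the posterior is proportional to the likelihood:

$$P(X \mid D, I) \propto P(D \mid X, I)$$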

Hence if we maximise likelihood, we maximise the posterior.

Our model can generate ideal estimates (F=(F1,F2,…,Fn)). The difference between the ideal estimates and observed data (D=(D1,D2,…,Dn)) is the **error or noise**.

We assume that the measurement noise follows a **Gaussian distribution**.

Assuming that **each measurement is independent**, we can write the probability of observing the real data given our model and prior belief as the product of the probabilities of observing each discrepancy, where each discrepancy has mean 0 and some variance σ²:

$$P(D \mid X, I) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(F_i - D_i)^2}{2\sigma^2}\right]$$

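Taking the logarithm, we have the log likelihood L:

$$L = \ln P(D \mid X, I) = \text{constant} - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (F_i - D_i)^2$$

When we maximise L, we minimise the sum of the squared **residuals**.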

By maximising L, we can determine the parameters X that minimise the discrepancy between the model and the observed data.

This is the maximum likelihood parameter estimate.
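To make the link between maximum likelihood and least squares concrete, here is a small sketch under stated assumptions: a hypothetical straight-line model F_i = m·t_i + c, synthetic data, and a known noise level σ. None of these names or numbers come from a real dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical straight-line model F_i = m * t_i + c with known noise sigma.
t = np.linspace(0.0, 10.0, 50)
true_m, true_c, sigma = 2.0, 1.0, 0.5
data = true_m * t + true_c + rng.normal(scale=sigma, size=t.size)

def log_likelihood(m, c):
    """Gaussian log likelihood with the constant term dropped."""
    residuals = (m * t + c) - data
    return -0.5 * np.sum(residuals**2) / sigma**2

# Least squares finds the parameters that minimise the squared residuals.
design = np.column_stack([t, np.ones_like(t)])
(m_hat, c_hat), *_ = np.linalg.lstsq(design, data, rcond=None)

# Small moves away from the least-squares solution should lower L.
best = log_likelihood(m_hat, c_hat)
nearby = [
    log_likelihood(m_hat + dm, c_hat + dc)
    for dm in (-0.01, 0.01)
    for dc in (-0.01, 0.01)
]
print(m_hat, c_hat, best, max(nearby))
```

The least-squares solution sits at the top of the log likelihood, so every nearby point has a lower L.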

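We shall denote this as $X^*$.

To see how variance affects our likelihood, we remind ourselves that P(X|D,I) ∝ exp(L). We then Taylor expand L around the maximum point $X^*$:

$$L(X) = L(X^*) + \left.\frac{dL}{dX}\right|_{X^*}(X - X^*) + \frac{1}{2}\left.\frac{d^2L}{dX^2}\right|_{X^*}(X - X^*)^2 + \dots$$

The second term is 0 because at the maximum of L, the slope of L is 0.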

We can compare this with the Gaussian distribution:

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$

Dropping the higher-order terms of the expansion, we see that we can write P(X|D,I) as a Gaussian distribution with mean $X^*$ and variance

$$\sigma^2 = -\left(\left.\frac{d^2L}{dX^2}\right|_{X^*}\right)^{-1}$$

Naturally, the larger the variance, the wider the Gaussian and the wider our confidence intervals for the same confidence level.

In other words, the variance is the negative of the inverse of the second derivative of the log-likelihood function, evaluated at the maximum.

*The variance determines the width of the normal distribution*
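The relation between the variance and the second derivative of L can be checked numerically. The sketch below assumes a hypothetical one-parameter problem: estimating the mean μ of Gaussian data with known σ, where the expected answer is σ²/n. The finite-difference step `h` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n = 2.0, 100
data = rng.normal(loc=5.0, scale=sigma, size=n)

def log_likelihood(mu):
    """Gaussian log likelihood of the mean, constant term dropped."""
    return -0.5 * np.sum((mu - data) ** 2) / sigma**2

mu_star = data.mean()  # maximum likelihood estimate of the mean

# Central finite difference for the second derivative of L at the maximum.
h = 1e-2
second_derivative = (
    log_likelihood(mu_star + h)
    - 2.0 * log_likelihood(mu_star)
    + log_likelihood(mu_star - h)
) / h**2

variance = -1.0 / second_derivative
print(variance, sigma**2 / n)
```

Both printed numbers should agree closely, since L is exactly quadratic in μ for this model.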

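Another way to examine the quality of our estimates is by looking at the deviation from the maximum likelihood.

Looking at the log likelihood function again, now for several parameters:

$$L(X) \approx L(X^*) + \frac{1}{2}Q, \qquad Q = \sum_{i}\sum_{j}\left.\frac{\partial^2 L}{\partial X_i\,\partial X_j}\right|_{X^*}(X_i - X_i^*)(X_j - X_j^*)$$

We see that Q defines a region around the maximum likelihood: all points with the same value of Q have the same likelihood.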

We now examine Q again in matrix notation, where H is the Hessian matrix of L evaluated at $X^*$:

$$Q = (X - X^*)^T H\,(X - X^*), \qquad H_{ij} = \left.\frac{\partial^2 L}{\partial X_i\,\partial X_j}\right|_{X^*}$$

Since the Hessian matrix is real and symmetric, it is also diagonalizable:

$$H = E D E^T$$

Here E is a square matrix whose columns are the normalised eigenvectors of the Hessian in any order, and D is a diagonal matrix whose entries are the eigenvalues associated with the eigenvectors in E. We can define a mapping of the estimated variables X onto y:

$$y = E^T (X - X^*)$$

Substituting gives Q = yᵀDy, so for the 2D case of 2 variables we can write Q as the equation of an ellipse in the space of y:

$$Q = \lambda_1 y_1^2 + \lambda_2 y_2^2 = k$$

The ellipse defines the contour on which Q equals some constant value k. At a maximum both eigenvalues are negative, so k is negative as well.
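The diagonalisation can be sketched numerically as follows. The Hessian entries and the displacement from the maximum are made-up numbers chosen only so that the matrix is negative definite, as it would be at a maximum of L.

```python
import numpy as np

# Hypothetical Hessian of L at the maximum (negative definite).
H = np.array([[-4.0, 1.5],
              [1.5, -1.0]])

# eigh handles symmetric matrices: it returns ascending eigenvalues
# and orthonormal eigenvectors as the columns of E.
eigenvalues, E = np.linalg.eigh(H)
D = np.diag(eigenvalues)

# Check the decomposition H = E D E^T.
assert np.allclose(E @ D @ E.T, H)

# Map a displacement from the maximum into the eigenvector basis.
delta = np.array([0.3, -0.2])   # X - X*
y = E.T @ delta

Q_original = delta @ H @ delta
Q_rotated = np.sum(eigenvalues * y**2)

# Semi-axis lengths of the ellipse Q = k (k < 0 because H is negative definite).
k = -1.0
semi_axes = np.sqrt(k / eigenvalues)
print(Q_original, Q_rotated, semi_axes)
```

The two values of Q agree, and the semi-axes grow as the corresponding eigenvalue approaches zero, which is what stretches the ellipse.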

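Let’s finally get around to examining covariance.

Consider a 2D model with two parameters x and y (here y is the second model parameter, not the rotated coordinate used above). Writing the Hessian entries as A = ∂²L/∂x², B = ∂²L/∂y² and C = ∂²L/∂x∂y, all evaluated at the maximum (x\*, y\*), we have:

$$Q = A(x - x^*)^2 + B(y - y^*)^2 + 2C(x - x^*)(y - y^*)$$

We can get P(x|D,I) if we integrate over all possible values of y:

$$P(x \mid D, I) = \int_{-\infty}^{\infty} P(x, y \mid D, I)\,dy \propto \int_{-\infty}^{\infty} \exp\left(\frac{Q}{2}\right)dy$$

We substitute the expression for Q, complete the square in y, and factor out the part that doesn’t depend on y:

$$P(x \mid D, I) \propto \exp\left[\frac{1}{2}\left(A - \frac{C^2}{B}\right)(x - x^*)^2\right]\int_{-\infty}^{\infty}\exp\left[\frac{B}{2}\left(y - y^* + \frac{C}{B}(x - x^*)\right)^2\right]dy$$

Integrals from −∞ to ∞ of exp(−y²/(2σ²)) have the standard solution σ√(2π), here with σ² = −1/B, which is a constant that does not depend on x. Substituting this into our expression:

$$P(x \mid D, I) \propto \exp\left[\frac{1}{2}\,\frac{AB - C^2}{B}\,(x - x^*)^2\right]$$

This is a Gaussian in x with variance σₓ² = −B/(AB − C²).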

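We can apply the same method to get the variance for y and the covariance of x and y.

Writing the Hessian with A and B on the diagonal and C off the diagonal, the covariance matrix is given by:

$$\begin{pmatrix}\sigma_x^2 & \sigma_{xy}\\ \sigma_{xy} & \sigma_y^2\end{pmatrix} = \frac{1}{AB - C^2}\begin{pmatrix}-B & C\\ C & -A\end{pmatrix} = -\begin{pmatrix}A & C\\ C & B\end{pmatrix}^{-1}$$

We can see that it is just the negative inverse of the Hessian matrix.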
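As an illustration of the negative-inverse-Hessian rule, the sketch below assumes a hypothetical straight-line model F_i = m·t_i + c with Gaussian noise σ, for which the Hessian entries can be written down directly. The function names and the choice of t values are assumptions for this example only. It also shows one way covariance between parameters can be reduced: shifting the origin of t.

```python
import numpy as np

def line_fit_covariance(t, sigma):
    """Covariance of (slope, intercept) as the negative inverse Hessian of L."""
    A = -np.sum(t**2) / sigma**2   # d2L/dm2
    B = -t.size / sigma**2         # d2L/dc2
    C = -np.sum(t) / sigma**2      # d2L/dm dc
    H = np.array([[A, C], [C, B]])
    return -np.linalg.inv(H)

def correlation(cov):
    """Correlation coefficient between the two parameters."""
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

sigma = 0.5
t_raw = np.linspace(100.0, 110.0, 50)   # measurements far from t = 0
t_centred = t_raw - t_raw.mean()        # same spacing, shifted origin

cov_raw = line_fit_covariance(t_raw, sigma)
cov_centred = line_fit_covariance(t_centred, sigma)

print("raw:     corr =", correlation(cov_raw),
      " intercept std =", np.sqrt(cov_raw[1, 1]))
print("centred: corr =", correlation(cov_centred),
      " intercept std =", np.sqrt(cov_centred[1, 1]))
```

With the raw t values the slope and intercept are almost perfectly anti-correlated and the intercept is poorly determined; after centring, the correlation vanishes and the intercept error bar shrinks considerably.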

Moreover, the determinant of the Hessian is given by the product of its eigenvalues. For a 2×2 Hessian with diagonal entries A and B and off-diagonal entry C, this is λ₁λ₂ = AB − C².

So if the covariance becomes very high, C² approaches AB and the determinant becomes very small. The area of the ellipse Q = k is π|k|/√(λ₁λ₂) = π|k|/√(determinant), hence when covariance is high, the area grows and the ellipse becomes very elongated in one direction.

This in turn increases the marginal error bar for a given k, so for some confidence level, the error bar of our estimate is larger and our estimates are less precise if the covariance is high.

It also leads to convergence problems for optimization algorithms that use the **second derivative** (Hessian), such as **Newton-Raphson**.

In the case of Newton-Raphson, each step is the inverse of the Hessian applied to the gradient, so a large covariance implies a large step size. This can cause the algorithm to repeatedly overshoot and undershoot the local maximum and fail to converge.
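A rough sketch of the step-size effect: the snippet below keeps the gradient and the diagonal curvatures fixed and only increases the off-diagonal Hessian entry C. All numbers are hypothetical, and `newton_step` is an illustrative helper, not a library function.

```python
import numpy as np

def newton_step(H, gradient):
    """One Newton-Raphson update: solve H @ step = -gradient."""
    return np.linalg.solve(H, -gradient)

gradient = np.array([1.0, 1.0])   # hypothetical gradient of L at the current point
A, B = -1.0, -1.0                 # fixed curvature along each parameter

steps = []
for C in (0.0, 0.9, 0.99, 0.999):
    H = np.array([[A, C], [C, B]])
    step = newton_step(H, gradient)
    steps.append(np.linalg.norm(step))
    print(f"C = {C:5.3f}  det = {np.linalg.det(H):.4f}  "
          f"cond = {np.linalg.cond(H):8.1f}  |step| = {steps[-1]:8.2f}")
```

For an exactly quadratic L the long step would still land on the maximum. The trouble arises when L is only approximately quadratic, because a very long step can leave the region where the approximation holds.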

In summary, check for covariance between your variables before trying to fit any model!
