Covariance is a measure of the linear relationship between two variables.
It tells us how much two variables vary together and is defined as follows:
$$Cov(X,Y)=E[(X-μ_X)(Y-μ_Y)]=\sum_i\sum_j P(X=x_i,\,Y=y_j)\,(x_i-μ_X)(y_j-μ_Y)$$
$$=E(XY)-μ_X E(Y)-μ_Y E(X)+μ_X μ_Y$$
$$=E(XY)-2μ_X μ_Y+μ_X μ_Y \quad (\text{since } E(X)=μ_X,\ E(Y)=μ_Y)$$
$$\mathbf{Cov(X,Y)= E(XY)-μ_X μ_Y}$$
$$Cov(X,X)= E[(X-μ_X)^2]=Var(X)$$
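As a minimal sketch, the following Python computes the covariance of two discrete variables both from the definition and from the shortcut formula E(XY) − μ_X μ_Y, and checks that Cov(X, X) equals Var(X). The joint pmf `joint_pmf` is hypothetical (its probabilities are invented for illustration), as are the helper names. Exact fractions are used so the two formulas can be compared without rounding error.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); the probabilities are illustrative only.
joint_pmf = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}


def expectation(f, pmf):
    """E[f(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())


def means(pmf):
    """Return (mu_X, mu_Y)."""
    return expectation(lambda x, y: x, pmf), expectation(lambda x, y: y, pmf)


def cov_definition(pmf):
    """Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)], summed over the joint pmf."""
    mu_x, mu_y = means(pmf)
    return expectation(lambda x, y: (x - mu_x) * (y - mu_y), pmf)


def cov_shortcut(pmf):
    """Cov(X, Y) = E(XY) - mu_X * mu_Y."""
    mu_x, mu_y = means(pmf)
    return expectation(lambda x, y: x * y, pmf) - mu_x * mu_y


def variance_x(pmf):
    """Var(X) = E[(X - mu_X)^2]."""
    mu_x, _ = means(pmf)
    return expectation(lambda x, y: (x - mu_x) ** 2, pmf)


# Joint pmf of the pair (X, X), used to check that Cov(X, X) = Var(X).
pmf_xx = {}
for (x, _), p in joint_pmf.items():
    pmf_xx[(x, x)] = pmf_xx.get((x, x), 0) + p

print(cov_definition(joint_pmf), cov_shortcut(joint_pmf))  # 1/16 1/16
print(cov_definition(pmf_xx), variance_x(joint_pmf))       # 1/4 1/4
```

Note that the sum runs over the joint probabilities P(X = x_i, Y = y_j); replacing them with the product of the marginals would only be valid if X and Y were independent.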
The covariance is difficult to interpret on its own because it is scale dependent: changing the units of X or Y
(for example, from metres to centimetres) changes its value, since Cov(aX, bY) = ab·Cov(X, Y).
A larger covariance therefore does not necessarily mean a stronger relationship. For a clearer
interpretation, we remove the scale dependency by standardizing the covariance. The resulting measure is
called the correlation:
$$\mathbf{Corr(X,Y)=\frac{Cov(X,Y)}{σ_X σ_Y}}$$
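The scale dependence of covariance, and how standardizing removes it, can be checked numerically. This is a minimal Python sketch on hypothetical height and weight data (the numbers are invented). It assumes population formulas that divide by n, treating each observation as equally likely so that the code mirrors E[(X − μ_X)(Y − μ_Y)]; the function names `cov` and `corr` are illustrative.

```python
import math


def cov(xs, ys):
    """Population covariance: divide by n, treating each pair as equally likely."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def corr(xs, ys):
    """Corr(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y), with sigma = sqrt(Cov(X, X))."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


# Hypothetical data: the same heights in two units, and matching weights in kg.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
heights_cm = [100 * h for h in heights_m]
weights_kg = [55.0, 65.0, 70.0, 72.0, 85.0]

# Covariance is multiplied by the unit conversion factor (100).
print(cov(heights_m, weights_kg), cov(heights_cm, weights_kg))
# Correlation is the same up to floating-point rounding.
print(corr(heights_m, weights_kg), corr(heights_cm, weights_kg))
```

Dividing by n - 1 instead (the usual sample estimator) would change the covariance values but not the correlation, because the factor cancels between numerator and denominator.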
Correlation indicates whether a linear relationship exists between two variables
and measures both its strength and its direction. Points to note:
- Correlation is unit-free.
- Corr(X, Y) always lies between −1.0 and 1.0 (inclusive).
- Corr(X, Y) > 0: X and Y are positively correlated. As one variable increases, the other tends to increase.
- Corr(X, Y) < 0: X and Y are negatively correlated. An increase in one variable is associated with a decrease in the other.
- Corr(X, Y) = 1.0: perfect positive linear relationship.
- Corr(X, Y) = 0: no linear relationship between X and Y (a nonlinear relationship may still exist).
- Corr(X, Y) = −1.0: perfect negative linear relationship.
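The cases in the list above can be reproduced with a small Python sketch. It assumes the population covariance (dividing by n) and uses three hypothetical data sets: an exact increasing line, an exact decreasing line, and a symmetric parabola. The parabola gives a correlation of exactly 0 even though Y is completely determined by X, which illustrates that zero correlation rules out only a linear relationship, not dependence in general.

```python
import math


def cov(xs, ys):
    """Population covariance: each (x, y) pair is treated as equally likely."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def corr(xs, ys):
    """Corr(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y)."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


xs = [-2, -1, 0, 1, 2]
perfect_pos = [2 * x + 1 for x in xs]    # exact increasing line
perfect_neg = [-3 * x + 4 for x in xs]   # exact decreasing line
quadratic = [x * x for x in xs]          # Y fully determined by X, but not linearly

print(corr(xs, perfect_pos))  # 1.0
print(corr(xs, perfect_neg))  # -1.0
print(corr(xs, quadratic))    # 0.0
```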