Derivation of Correlation Coefficient Property

 

Derivation of the Correlation Coefficient

Data Science and A.I. Lecture Series

 

Problem Statement

Objective: Derive the formula for the correlation coefficient \( r(X, Y) \):

\[
r(X, Y) = \frac{\sigma_X^2 + \sigma_Y^2 - \sigma_{X-Y}^2}{2 \sigma_X \sigma_Y}.
\]

Definitions:

  • \( \sigma_X^2 \): Variance of \( X \).
  • \( \sigma_Y^2 \): Variance of \( Y \).
  • \( \sigma_{X-Y}^2 \): Variance of \( Z = X - Y \).
  • Covariance between \( X \) and \( Y \): \( \text{Cov}(X, Y) \).

Step 1: Variance of \( Z = X - Y \)

Define \( Z = X - Y \). The variance of \( Z \) is:

\[
\sigma_{X-Y}^2 = \frac{1}{n} \sum_{i=1}^n \left( z_i - \overline{Z} \right)^2.
\]

Where:

  • \( z_i = x_i - y_i \): Difference between corresponding values of \( X \) and \( Y \).
  • \( \overline{Z} = \overline{X} - \overline{Y} \): Mean of \( Z \), obtained as the difference of the means of \( X \) and \( Y \).

Substitute \( z_i = x_i - y_i \) and \( \overline{Z} = \overline{X} - \overline{Y} \), then group the terms by variable:

\[
\sigma_{X-Y}^2 = \frac{1}{n} \sum_{i=1}^n \left\{ (x_i - \overline{X}) - (y_i - \overline{Y}) \right\}^2.
\]
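As a quick numerical check of Step 1, the sketch below computes the variance of \( Z \) in two ways: directly from the differences \( z_i \), and from the centered form in the last equation. The sample values and the helper names `mean` and `pop_variance` are assumptions made only for this illustration; they do not come from the lecture.

```python
# Hypothetical sample data, chosen only for illustration.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 6.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pop_variance(values):
    """Population variance (divide by n), matching the 1/n convention used here."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)

# z_i = x_i - y_i
zs = [x - y for x, y in zip(xs, ys)]

# Variance of Z computed directly from the z_i.
var_z_direct = pop_variance(zs)

# The same quantity written with centered deviations of X and Y.
var_z_centered = sum(
    ((x - x_bar) - (y - y_bar)) ** 2 for x, y in zip(xs, ys)
) / n

print("mean of Z:", mean(zs), "| mean X - mean Y:", x_bar - y_bar)
print("variance of Z:", var_z_direct, "| centered form:", var_z_centered)
```

For this data the two means agree and both variance expressions give the same value, as the substitution predicts.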

Step 2: Expanding the Variance

Expand the squared term inside the summation:

\[
\sigma_{X-Y}^2 = \frac{1}{n} \sum_{i=1}^n \left[ (x_i - \overline{X})^2 + (y_i - \overline{Y})^2 - 2 (x_i - \overline{X})(y_i - \overline{Y}) \right].
\]

This gives three components:

  1. \( \frac{1}{n} \sum_{i=1}^n (x_i - \overline{X})^2 = \sigma_X^2 \), the variance of \( X \).
  2. \( \frac{1}{n} \sum_{i=1}^n (y_i - \overline{Y})^2 = \sigma_Y^2 \), the variance of \( Y \).
  3. \( \frac{1}{n} \sum_{i=1}^n (x_i - \overline{X})(y_i - \overline{Y}) = \text{Cov}(X, Y) \), the covariance between \( X \) and \( Y \).

Substitute these into \( \sigma_{X-Y}^2 \):

\[
\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2 \, \text{Cov}(X, Y).
\]
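The expansion can be checked numerically. The following is a minimal sketch, assuming a small made-up data set and the \( \frac{1}{n} \) (population) convention used in this derivation; the variable names are illustrative only.

```python
# Hypothetical sample data, chosen only for illustration.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 6.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# The three components of the expansion, each with the 1/n factor.
var_x = sum((x - x_bar) ** 2 for x in xs) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

# Left-hand side: variance of Z = X - Y computed from the differences.
zs = [x - y for x, y in zip(xs, ys)]
z_bar = sum(zs) / n
var_z = sum((z - z_bar) ** 2 for z in zs) / n

# Right-hand side of the identity.
rhs = var_x + var_y - 2 * cov_xy

print("Var(X - Y):", var_z)
print("Var(X) + Var(Y) - 2 Cov(X, Y):", rhs)
```

Both printed values coincide for this data. The identity only requires that the same divisor be used for the variances and the covariance.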

Step 3: Correlation Coefficient

Recall the definition of the correlation coefficient:

\[
r(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}.
\]

From the variance expansion:

\[
\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2 \, \text{Cov}(X, Y).
\]

Rearrange to express \( \text{Cov}(X, Y) \) in terms of \( \sigma_X^2, \sigma_Y^2, \) and \( \sigma_{X-Y}^2 \):

\[
\text{Cov}(X, Y) = \frac{\sigma_X^2 + \sigma_Y^2 - \sigma_{X-Y}^2}{2}.
\]

Substitute into the formula for \( r(X, Y) \):

\[
r(X, Y) = \frac{\sigma_X^2 + \sigma_Y^2 - \sigma_{X-Y}^2}{2 \sigma_X \sigma_Y}.
\]
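To see the final formula at work, the sketch below computes \( r(X, Y) \) both from the definition and from the three variances, then compares the results. The function names `correlation_from_covariance` and `correlation_from_variances` and the sample data are hypothetical, introduced only for this example; both functions assume \( \sigma_X \) and \( \sigma_Y \) are nonzero.

```python
import math


def pop_variance(values):
    """Population variance (divide by n)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def pop_covariance(xs, ys):
    """Population covariance (divide by n)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def correlation_from_covariance(xs, ys):
    """r = Cov(X, Y) / (sigma_X * sigma_Y), the definition."""
    sigma_x = math.sqrt(pop_variance(xs))
    sigma_y = math.sqrt(pop_variance(ys))
    return pop_covariance(xs, ys) / (sigma_x * sigma_y)


def correlation_from_variances(xs, ys):
    """r = (Var X + Var Y - Var(X - Y)) / (2 sigma_X sigma_Y), the derived form."""
    var_x = pop_variance(xs)
    var_y = pop_variance(ys)
    var_diff = pop_variance([x - y for x, y in zip(xs, ys)])
    return (var_x + var_y - var_diff) / (2 * math.sqrt(var_x) * math.sqrt(var_y))


# Hypothetical sample data, chosen only for illustration.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 6.0]

r_definition = correlation_from_covariance(xs, ys)
r_variances = correlation_from_variances(xs, ys)

print("r from the definition:", r_definition)
print("r from the variances: ", r_variances)
```

The two values agree up to floating-point rounding, which is what the derivation predicts.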

Conclusion

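The derived identity expresses the correlation coefficient \( r(X, Y) \) entirely in terms of the variances of \( X \), \( Y \), and \( X - Y \), so the correlation can be computed without evaluating the covariance directly. It holds whenever \( \sigma_X \) and \( \sigma_Y \) are both nonzero.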

 

©Postnetwork-All rights reserved.