The Definition and Calculation of The Correlation Coefficient

Data Science and A.I. Lecture Series

 

1. Definition of Correlation Coefficient

The correlation coefficient measures the strength and direction of a linear relationship between two variables. It is denoted by r, and it ranges from -1 to +1:

  • r = +1: Perfect positive correlation.
  • r = -1: Perfect negative correlation.
  • r = 0: No linear correlation.

Formula for Correlation Coefficient:

\[
r = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X) \cdot \text{Var}(Y)}}
\]
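The formula above can be sketched in a few lines of plain Python. The function names (`mean`, `covariance`, `variance`, `correlation`) and the sample data are illustrative assumptions, not part of the lecture. The sketch uses the population form (dividing by n); the sample form (dividing by n - 1) gives the same r, because the factor cancels between numerator and denominator.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    # Population covariance: average product of the deviations from the means.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def variance(xs):
    # Variance is the covariance of a variable with itself.
    return covariance(xs, xs)

def correlation(xs, ys):
    # r = Cov(X, Y) / sqrt(Var(X) * Var(Y))
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4]
y = [2, 4, 5, 9]
r = correlation(x, y)
print(round(r, 4))  # always lies between -1 and +1
```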

2. Assumptions for Correlation Coefficient

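A few points should be kept in mind when using the correlation coefficient:

  1. Linearity: r captures only linear relationships; a strong nonlinear relationship can still give a value of r close to 0.
  2. Normality: Both variables should be approximately normally distributed when r is used for significance tests or confidence intervals. The value of r itself can be computed for any paired numeric data.
  3. Cause-and-Effect: Correlation does not imply causation; it only measures the degree of association.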
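The linearity point can be illustrated with a small sketch. The data below are hypothetical: Y is fully determined by X through Y = X^2, yet the relationship is not linear, so r comes out as 0. The helper name `pearson_r` is an assumption made for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Deviation form: r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical data: a perfect but nonlinear (quadratic) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # [4, 1, 0, 1, 4]

r_quadratic = pearson_r(xs, ys)
print(r_quadratic)  # 0.0: no linear correlation, despite the exact dependence
```

A value of r near 0 therefore rules out only a linear association, not every kind of relationship.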

3. Examples of Causation

Here are some classic examples of causation:

  • Flipping a Switch and Turning on a Light: Flipping the switch (cause) results in the light turning on (effect).
  • Smoking and Lung Cancer: Smoking (cause) significantly increases the risk of lung cancer (effect).
  • Studying and Grades: More study time (cause) leads to better exam grades (effect).
  • Exercise and Fitness: Regular exercise (cause) improves physical fitness (effect).

4. Example 1: Positive Correlation

Consider the following data:

\[
X = [10, 20, 30, 40, 50], \quad Y = [15, 30, 45, 60, 75]
\]

Steps to Calculate Correlation Coefficient:

  1. Compute the Means: \[
    \bar{X} = \frac{\sum X}{n} = 30, \quad \bar{Y} = \frac{\sum Y}{n} = 45
    \]
  2. Find Deviations: \[
    d_x = X - \bar{X}, \quad d_y = Y - \bar{Y}
    \]
  3. Compute r: \[
    r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}}
    \]
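The three steps above can be sketched as a single Python function. The function name `pearson_r` and the variable names are assumptions chosen to mirror the notation in the steps; this is one straightforward way to code them, not the only one.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)

    # Step 1: compute the means.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Step 2: find the deviations from the means.
    d_x = [x - x_bar for x in xs]
    d_y = [y - y_bar for y in ys]

    # Step 3: compute r from the sums of products and squares.
    sum_dxdy = sum(a * b for a, b in zip(d_x, d_y))
    sum_dx2 = sum(a * a for a in d_x)
    sum_dy2 = sum(b * b for b in d_y)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

X = [10, 20, 30, 40, 50]
Y = [15, 30, 45, 60, 75]
r = pearson_r(X, Y)
print(r)
```

The function assumes both lists have the same length and that neither variable is constant; a constant variable would make the denominator zero.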

5. Computation Table for Positive Correlation

Here is the step-by-step computation for the correlation coefficient:

\[
\begin{array}{|c|c|c|c|c|c|c|}
\hline
X & Y & d_x & d_y & d_x^2 & d_y^2 & d_x d_y \\
\hline
10 & 15 & -20 & -30 & 400 & 900 & 600 \\
20 & 30 & -10 & -15 & 100 & 225 & 150 \\
30 & 45 & 0 & 0 & 0 & 0 & 0 \\
40 & 60 & 10 & 15 & 100 & 225 & 150 \\
50 & 75 & 20 & 30 & 400 & 900 & 600 \\
\hline
\sum X = 150 & \sum Y = 225 & 0 & 0 & 1000 & 2250 & 1500 \\
\hline
\end{array}
\]
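The table can be reproduced programmatically, which is a convenient way to check each column. The sketch below uses the same X and Y data; the names `rows`, `totals`, and `header` are illustrative assumptions.

```python
X = [10, 20, 30, 40, 50]
Y = [15, 30, 45, 60, 75]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)

# One row per observation: X, Y, d_x, d_y, d_x^2, d_y^2, d_x*d_y
rows = []
for x, y in zip(X, Y):
    dx, dy = x - x_bar, y - y_bar
    rows.append((x, y, dx, dy, dx * dx, dy * dy, dx * dy))

# Column sums, matching the last row of the table.
totals = tuple(sum(column) for column in zip(*rows))

header = ("X", "Y", "d_x", "d_y", "d_x^2", "d_y^2", "d_x*d_y")
print("".join(f"{name:>9}" for name in header))
for row in rows:
    print("".join(f"{value:>9g}" for value in row))
print("".join(f"{value:>9g}" for value in totals))
```

The last printed line holds the column sums, including the three that enter the final calculation: 1000, 2250, and 1500.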

Final Calculation:

\[
r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} = \frac{1500}{\sqrt{1000 \cdot 2250}} = \frac{1500}{1500} = 1
\]

Thus, the correlation coefficient r = 1, indicating a perfect positive correlation.
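For contrast, the same calculation gives r = -1 when Y decreases by a fixed amount as X increases. The reversed data set below is a hypothetical illustration that does not appear in the lecture, and `pearson_r` is an assumed helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Deviation form of the correlation coefficient.
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    d_x = [x - x_bar for x in xs]
    d_y = [y - y_bar for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(d_x, d_y))
    return sum_dxdy / sqrt(sum(a * a for a in d_x) * sum(b * b for b in d_y))

X = [10, 20, 30, 40, 50]
Y_reversed = [75, 60, 45, 30, 15]  # hypothetical: the Y values in reverse order

r_negative = pearson_r(X, Y_reversed)
print(r_negative)  # -1.0: perfect negative correlation
```

Here every product d_x d_y is negative or zero, so the numerator becomes -1500 while the denominator stays 1500.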

6. Conclusion

The correlation coefficient r measures the strength and direction of a linear relationship between two variables.

Causation requires stronger evidence that one variable directly affects the other.

Key Takeaways:

  • r = +1: Perfect positive correlation.
  • r = -1: Perfect negative correlation.
  • r = 0: No linear correlation.

©Postnetwork-All rights reserved.