The Definition and Calculation of the Correlation Coefficient
Data Science and A.I. Lecture Series
1. Definition of Correlation Coefficient
The correlation coefficient measures the strength and direction of a linear relationship between two variables. It is denoted by r, and it ranges from -1 to +1:
- r = +1: Perfect positive correlation.
- r = -1: Perfect negative correlation.
- r = 0: No linear correlation.
Formula for Correlation Coefficient:
\[
r = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X) \cdot \text{Var}(Y)}}
\]
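As a minimal sketch, the formula can be translated directly into Python. The function names (`covariance`, `variance`, `pearson_r`) and the sample lists are illustrative assumptions, not part of the lecture. The sketch uses the population covariance and variance (dividing by n); the sample versions (dividing by n - 1) would give the same r, because the factor cancels between numerator and denominator.

```python
import math

def covariance(xs, ys):
    """Population covariance: mean of the products of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def variance(xs):
    """Population variance, i.e. the covariance of a variable with itself."""
    return covariance(xs, xs)

def pearson_r(xs, ys):
    """r = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    denom = math.sqrt(variance(xs) * variance(ys))
    if denom == 0:
        raise ValueError("r is undefined when a variable is constant")
    return covariance(xs, ys) / denom

# Hypothetical data, chosen only to show the two extremes of the range.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

Note that r is undefined when either variable has zero variance, which is why the sketch raises an error in that case rather than dividing by zero.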
2. Assumptions for Correlation Coefficient
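Keep the following points in mind when using the correlation coefficient:
- Linearity: r captures only linear association, so a strong nonlinear relationship can still give a value of r close to 0.
- Normality: Significance tests and confidence intervals for r commonly assume that both variables are approximately normally distributed; computing r itself does not require it.
- Cause-and-Effect: Correlation does not imply causation; r measures only the degree of association.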
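To illustrate the linearity point above, here is a small sketch with hypothetical data (not from the lecture) in which Y is completely determined by X, yet r comes out as 0 because the relationship is a symmetric curve rather than a line. The helper name `pearson_r` is an assumption made for this example.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    d_x = [x - mean_x for x in xs]
    d_y = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(d_x, d_y))
    sum_dx2 = sum(a * a for a in d_x)
    sum_dy2 = sum(b * b for b in d_y)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Hypothetical data: Y = X^2 on values of X that are symmetric around 0.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # [4, 1, 0, 1, 4]

r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0
```

A value of r = 0 therefore rules out a linear relationship only, not every kind of dependence.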
3. Examples of Causation
Here are some classic examples of causation:
- Flipping a Switch and Turning on a Light: Flipping the switch (cause) results in the light turning on (effect).
- Smoking and Lung Cancer: Smoking (cause) significantly increases the risk of lung cancer (effect).
- Studying and Grades: More study time (cause) leads to better exam grades (effect).
- Exercise and Fitness: Regular exercise (cause) improves physical fitness (effect).
4. Example 1: Positive Correlation
Consider the following data:
\[
X = [10, 20, 30, 40, 50], \quad Y = [15, 30, 45, 60, 75]
\]
Steps to Calculate Correlation Coefficient:
- Compute the Means:
\[
\bar{X} = \frac{\sum X}{n} = 30, \quad \bar{Y} = \frac{\sum Y}{n} = 45
\]
- Find Deviations:
\[
d_x = X - \bar{X}, \quad d_y = Y - \bar{Y}
\]
- Compute r:
\[
r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}}
\]
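The three steps above can be sketched in plain Python using the data from this example. The variable names are illustrative choices; only the standard library is assumed.

```python
import math

X = [10, 20, 30, 40, 50]
Y = [15, 30, 45, 60, 75]
n = len(X)

# Step 1: compute the means.
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Step 2: find the deviations from the means.
d_x = [x - mean_x for x in X]
d_y = [y - mean_y for y in Y]

# Step 3: compute r from the sums of squares and cross-products.
sum_dx2 = sum(d * d for d in d_x)
sum_dy2 = sum(d * d for d in d_y)
sum_dxdy = sum(a * b for a, b in zip(d_x, d_y))
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

print(mean_x, mean_y)               # 30.0 45.0
print(sum_dx2, sum_dy2, sum_dxdy)   # 1000.0 2250.0 1500.0
print(r)                            # 1.0
```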
5. Computation Table for Positive Correlation
Here is the step-by-step computation for the correlation coefficient:
\[
\begin{array}{|c|c|c|c|c|c|c|}
\hline
X & Y & d_x & d_y & d_x^2 & d_y^2 & d_x d_y \\
\hline
10 & 15 & -20 & -30 & 400 & 900 & 600 \\
20 & 30 & -10 & -15 & 100 & 225 & 150 \\
30 & 45 & 0 & 0 & 0 & 0 & 0 \\
40 & 60 & 10 & 15 & 100 & 225 & 150 \\
50 & 75 & 20 & 30 & 400 & 900 & 600 \\
\hline
\sum X = 150 & \sum Y = 225 & 0 & 0 & 1000 & 2250 & 1500 \\
\hline
\end{array}
\]
Final Calculation:
\[
r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} = \frac{1500}{\sqrt{1000 \cdot 2250}} = \frac{1500}{1500} = 1
\]
Thus, the correlation coefficient is r = 1, indicating a perfect positive correlation.
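This result follows from the structure of the data: every Y value in the example is exactly 1.5 times the corresponding X value. As a short supporting derivation (an added illustration, not part of the original computation), substituting this relationship into the deviation formula gives:
\[
Y = 1.5X \;\Rightarrow\; \bar{Y} = 1.5\bar{X}, \quad d_y = 1.5\, d_x
\]
\[
r = \frac{\sum d_x \cdot 1.5\, d_x}{\sqrt{\sum d_x^2 \cdot 2.25 \sum d_x^2}} = \frac{1.5 \sum d_x^2}{1.5 \sum d_x^2} = 1
\]
With \(\sum d_x^2 = 1000\), both numerator and denominator equal 1500, matching the table.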
6. Conclusion
The correlation coefficient r measures the strength and direction of a linear relationship between two variables.
Causation requires stronger evidence that one variable directly affects the other.
Key Takeaways:
- r = 1: Perfect positive correlation.
- r = -1: Perfect negative correlation.
- r = 0: No linear correlation.