Linear Regression
Previously we modeled the relationship between a predictor \(x\) and an outcome \(y\) with regression:
\[ \hat{y} = b_0 + b_1x \]
Key question:
How strong is the relationship?
Pearson’s correlation coefficient \(r\):
Range:
\[ -1 \le r \le 1 \]
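For reference, with paired observations \((x_i, y_i)\) for \(i = 1, \dots, n\), Pearson’s \(r\) is

\[ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} \]

By the Cauchy-Schwarz inequality this ratio can never leave \([-1, 1]\). The short Python sketch below computes \(r\) directly from this definition. The course itself uses R, and `pearson_r` is a hypothetical helper name, not a library function.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient, computed from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator pieces: sums of squared deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 (exact line, positive slope)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 (exact line, negative slope)
```

Points that lie exactly on a rising or falling line reach the two ends of the range. Real data land somewhere in between.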
| r value | Meaning |
|---|---|
| close to 1 | strong positive relationship |
| close to -1 | strong negative relationship |
| close to 0 | weak or no linear relationship |
Important:
Correlation measures only linear association; a strong curved relationship can still have \(r\) near 0.
Questions:
Pearson’s r (using R):
Interpretation:
Pairwise correlations for multiple variables:
Example (R’s built-in `mtcars` data, rounded to two decimals):
| | mpg | wt | hp | disp |
|---|---|---|---|---|
| mpg | 1.00 | -0.87 | -0.78 | -0.85 |
| wt | -0.87 | 1.00 | 0.66 | 0.89 |
| hp | -0.78 | 0.66 | 1.00 | 0.79 |
| disp | -0.85 | 0.89 | 0.79 | 1.00 |
Recall the regression model: \[\hat{y} = b_0 + b_1x\]
The slope \(b_1\):
the change in predicted \(y\) for a one-unit increase in \(x\).
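As a sketch of where \(b_0\) and \(b_1\) come from, the least-squares estimates are \(b_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1\bar{x}\). The Python below (hypothetical data; `fit_line` is an illustrative helper, while R users would call `lm()`) fits the line and confirms that moving \(x\) up by one unit moves \(\hat{y}\) by exactly \(b_1\).

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical data: hours studied and exam score
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 72]
b0, b1 = fit_line(hours, score)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 47.53 + 4.23 x

# Predictions at x = 4 and x = 5 differ by exactly the slope
pred_4 = b0 + b1 * 4
pred_5 = b0 + b1 * 5
print(f"{pred_5 - pred_4:.2f}")  # 4.23
```

Here each extra hour of study goes with about 4.23 more points in the predicted score.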
But how do we know if this slope is meaningful?
Null hypothesis: the population slope is zero, \(H_0: \beta_1 = 0\).
There is no linear relationship between \(x\) and \(y\) in the population.
Any relationship seen in the sample is due to random variation.
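One way to ask whether an observed \(b_1\) could plausibly arise when \(\beta_1 = 0\) is a permutation (shuffle) test. Shuffling \(y\) breaks any link with \(x\), which mimics the null hypothesis, and we count how often a shuffled slope is at least as extreme as the real one. This is a Python sketch of that general technique, not necessarily the exact procedure used in the course. The data, seed, and number of shuffles are hypothetical choices.

```python
import random


def slope(x, y):
    """Least-squares slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y, n_shuffles=5000, seed=1):
    """Two-sided p-value for H0: beta_1 = 0, estimated by shuffling y."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)  # destroys any x-y relationship
        if abs(slope(x, y_shuffled)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero
    return (extreme + 1) / (n_shuffles + 1)


hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 72]
p = permutation_p_value(hours, score)
print(f"p approx. {p:.4f}")
```

A small p-value says a slope this steep would rarely come from random variation alone. With only six points one could instead enumerate all 720 orderings exactly; random shuffling is the version that scales to larger samples.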
Regression detects association, not causation.
Example:
Ice cream sales ↑
Drowning incidents ↑
Both rise in hot weather; neither causes the other.
Where are models reliable?
Example:
Linear regression assumes that the relationship between \(x\) and \(y\) follows a straight line.
But some relationships are curved.
Example patterns:
A straight line would not describe this well.
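A quick numeric version of this problem: fit a straight line to hypothetical points that follow \(y = x^2\) and look at the residuals \(y - \hat{y}\). The Python sketch below (hand-rolled least squares; `fit_line` is an illustrative helper) gives residuals that are positive at both ends and negative in the middle. That systematic U shape is the pattern a residual plot makes visible, and a straight line cannot remove it.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]  # a curved (quadratic) pattern

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in residuals])
# [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

For these seven points \(r \approx 0.96\), so a high correlation alone does not show that a straight line is the right model; the residual pattern does.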
We check:


CourseKata Ch. 9