Multivariate Models
\[ \text{DATA} = \text{MODEL} + \text{ERROR} \]
Outcome:
PriceK: sale price in thousands of dollars
Predictors:
Neighborhood: Downtown or Eastside
HomeSizeK: square footage in thousands
HasFireplace: fireplace indicator
Which model reduces more error and explains more variability?
\[ Y_i = b_0 + b_1X_{1i} + b_2X_{2i} + e_i \]
For Smallville:
\[ \text{PriceK}_i = b_0 + b_1\text{Neighborhood}_i + b_2\text{HomeSizeK}_i + e_i \]
\[ \widehat{\text{PriceK}} = b_0 + b_1(\text{NeighborhoodEastside}) + b_2(\text{HomeSizeK}) \]
Important
NeighborhoodEastside = 1 for Eastside and 0 for Downtown.
The intercept is the predicted price when:
NeighborhoodEastside = 0 → Downtown
HomeSizeK = 0
The intercept is the model’s Downtown baseline at 0K square feet.
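A minimal sketch of the dummy coding and the intercept, written in Python for illustration. The coefficient values `B0`, `B1`, and `B2` are hypothetical placeholders, not estimates from the Smallville data, and the function names are made up for this example.

```python
# Hypothetical coefficients chosen only for illustration;
# they are NOT estimates from the Smallville data.
B0, B1, B2 = 100.0, -20.0, 50.0


def neighborhood_eastside(neighborhood: str) -> int:
    """Dummy-code Neighborhood: 1 for Eastside, 0 for Downtown."""
    if neighborhood not in ("Downtown", "Eastside"):
        raise ValueError(f"unknown neighborhood: {neighborhood!r}")
    return 1 if neighborhood == "Eastside" else 0


def predict_price_k(neighborhood: str, home_size_k: float) -> float:
    """Predicted PriceK = b0 + b1 * NeighborhoodEastside + b2 * HomeSizeK."""
    return B0 + B1 * neighborhood_eastside(neighborhood) + B2 * home_size_k


# With both predictors at 0 (Downtown, 0K square feet),
# the prediction is just the intercept b0.
print(predict_price_k("Downtown", 0.0))  # 100.0
print(predict_price_k("Downtown", 2.0))  # 200.0
print(predict_price_k("Eastside", 2.0))  # 180.0
```

A home with 0K square feet does not exist, which is why the intercept here works as a mathematical anchor for the prediction line rather than a price anyone would actually observe.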
Is this substantively meaningful, or mainly mathematical?
NeighborhoodEastside
This means we compare observations that are the same on the other predictor.
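One way to see this is to apply the prediction equation to two homes of the same size, one in each neighborhood. The symbol \(s\) is introduced here only for illustration and stands for any fixed value of HomeSizeK:

\[ \widehat{\text{PriceK}}_{\text{Eastside}} - \widehat{\text{PriceK}}_{\text{Downtown}} = (b_0 + b_1(1) + b_2 s) - (b_0 + b_1(0) + b_2 s) = b_1 \]

The size term cancels, so \(b_1\) is the predicted Eastside-minus-Downtown difference for homes of equal size, whatever that size is.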
Single Predictor Model:
Two Predictor Model:
Does the HomeSizeK coefficient get smaller after adding Neighborhood?
\[ \text{Residual} = \text{Observed} - \text{Predicted} \]
SS Error: Error left after using both predictors.
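A rough sketch of how SS Error could be computed and compared across the empty, single-predictor, and two-predictor models. Everything here is an assumption for illustration: the `homes` data are made up (not the Smallville data), and `fit_ols` is a small hand-written least-squares helper, not the course's software.

```python
# Hypothetical homes (Neighborhood, HomeSizeK, PriceK); made-up values
# for illustration only, NOT the Smallville data.
homes = [
    ("Downtown", 1.0, 180.0),
    ("Downtown", 1.5, 215.0),
    ("Downtown", 2.0, 260.0),
    ("Eastside", 1.2, 150.0),
    ("Eastside", 1.8, 190.0),
    ("Eastside", 2.4, 240.0),
]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def ss_error(rows, y, b):
    """Sum of squared residuals, where residual = observed - predicted."""
    predicted = [sum(bj * xj for bj, xj in zip(b, row)) for row in rows]
    return sum((obs - pred) ** 2 for obs, pred in zip(y, predicted))


price = [p for _, _, p in homes]

# Each row starts with 1.0 so the first coefficient is the intercept b0.
designs = {
    "empty": [[1.0] for _ in homes],
    "single": [[1.0, size] for _, size, _ in homes],
    "two": [[1.0, 1.0 if n == "Eastside" else 0.0, size] for n, size, _ in homes],
}

sse = {name: ss_error(rows, price, fit_ols(rows, price))
       for name, rows in designs.items()}

for name, value in sse.items():
    print(f"{name} model: SS Error = {value:.1f}")
```

When nested models are fit by least squares to the same data, adding a predictor can never increase SS Error, so the useful comparison is how much the error shrinks, not whether it does.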
Which model is better?
Could the difference be by chance?
CourseKata Ch. 13 (1)