CourseKata Chapter 13

Multivariate Models

Mansour Abdoli, PhD

Overview / Goals

  • Do More Predictors Mean a Better Model?
  • Explaining Variability by Different Predictors
  • Visualizing Variability by Multiple Predictors
  • Constructing a Multivariate Model
  • Interpreting Coefficients
  • Comparing Models

Do More Predictors Mean a Better Model?

Model Complexity Review

\[ \text{DATA} = \text{MODEL} + \text{ERROR} \]

  • Empty Model: Mean Response
  • Simple Model: 2-Group Means or a Simple Regression
  • Complex Model: Multiple-Group Means and/or additional numerical predictors
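
Written in the same notation as the model specified later for Smallville, the three levels look roughly like this. This is a sketch: \(X_i\), \(X_{1i}\), and \(X_{2i}\) stand for generic predictors, and each added predictor adds one more coefficient to estimate.

\[ \text{Empty: } Y_i = b_0 + e_i \]

\[ \text{Simple: } Y_i = b_0 + b_1X_i + e_i \]

\[ \text{Complex: } Y_i = b_0 + b_1X_{1i} + b_2X_{2i} + e_i \]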

The Smallville data

Outcome:

  • PriceK: sale price in thousands of dollars

Predictors:

  • Neighborhood: Downtown or Eastside
  • HomeSizeK: square footage in thousands
  • HasFireplace: fireplace indicator

Explaining Variability by Different Predictors

Variability in PriceK

Two single-predictor models

Which model reduces more error, that is, explains more variability?
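
One way to answer this question is to fit each single-predictor model and compare the error it leaves behind with the error of the empty model. The sketch below uses a small invented data frame called `homes` (not the real Smallville data) and a hypothetical helper `ss_error()`, so the numbers it prints say nothing about Smallville itself.

```r
# Toy data: invented values for illustration only, NOT the real Smallville data.
homes <- data.frame(
  PriceK       = c(290, 335, 350, 410, 150, 185, 240, 250),
  Neighborhood = factor(rep(c("Downtown", "Eastside"), each = 4)),
  HomeSizeK    = c(1.8, 2.2, 2.6, 3.0, 1.0, 1.4, 1.8, 2.2)
)

# Helper (hypothetical name): sum of squared residuals for a fitted model.
ss_error <- function(model) sum(resid(model)^2)

empty_model        <- lm(PriceK ~ 1, data = homes)             # mean only
neighborhood_model <- lm(PriceK ~ Neighborhood, data = homes)  # two group means
homesize_model     <- lm(PriceK ~ HomeSizeK, data = homes)     # one regression line

# Total error: what is left when the only prediction is the mean price.
ss_total <- ss_error(empty_model)

# Error left over by each model, and the share of total error it removed.
comparison <- data.frame(
  model    = c("Neighborhood", "HomeSizeK"),
  ss_error = c(ss_error(neighborhood_model), ss_error(homesize_model))
)
comparison$reduction <- 1 - comparison$ss_error / ss_total
print(comparison)
```

The model with the smaller `ss_error` (equivalently, the larger `reduction`) explains more of the variability in price.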

Visualizing Variability by Multiple Predictors

Using Both Predictors

  • Does neighborhood add any value?
  • Does the home-size effect change?

Constructing a Multivariate Model

Specifying a multivariate model

\[ Y_i = b_0 + b_1X_{1i} + b_2X_{2i} + e_i \]

For Smallville:

\[ \text{PriceK}_i = b_0 + b_1\text{Neighborhood}_i + b_2\text{HomeSizeK}_i + e_i \]

Fitting the model in R

  • Model Fit: Minimize SS Error
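
A minimal sketch of the fit, using invented toy data rather than the real Smallville data. With the course data the call would presumably be `lm(PriceK ~ Neighborhood + HomeSizeK, data = Smallville)`, assuming the data frame is named `Smallville`. The helper `ss_for()` is hypothetical and exists only to show that moving a coefficient away from its fitted value increases SS Error.

```r
# Toy data: invented values for illustration only, NOT the real Smallville data.
homes <- data.frame(
  PriceK       = c(290, 335, 350, 410, 150, 185, 240, 250),
  Neighborhood = factor(rep(c("Downtown", "Eastside"), each = 4)),
  HomeSizeK    = c(1.8, 2.2, 2.6, 3.0, 1.0, 1.4, 1.8, 2.2)
)

# Fit the two-predictor model; lm() picks the b's that minimize SS Error.
multi_model <- lm(PriceK ~ Neighborhood + HomeSizeK, data = homes)
b <- coef(multi_model)
print(b)

# SS Error for any candidate set of coefficients (b0, b1, b2).
ss_for <- function(b0, b1, b2) {
  eastside  <- as.numeric(homes$Neighborhood == "Eastside")
  predicted <- b0 + b1 * eastside + b2 * homes$HomeSizeK
  sum((homes$PriceK - predicted)^2)
}

# SS Error at the fitted coefficients versus at a nudged value of b1.
ss_fitted <- ss_for(b[["(Intercept)"]], b[["NeighborhoodEastside"]], b[["HomeSizeK"]])
ss_nudged <- ss_for(b[["(Intercept)"]], b[["NeighborhoodEastside"]] + 5, b[["HomeSizeK"]])
print(c(fitted = ss_fitted, nudged = ss_nudged))
```

Any other nudge, in any coefficient and in either direction, would also give a larger SS Error than the fitted values.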

Write the fitted model

\[ \widehat{\text{PriceK}} = b_0 + b_1(\text{NeighborhoodEastside}) + b_2(\text{HomeSizeK}) \]

Important

NeighborhoodEastside = 1 for Eastside and 0 for Downtown.
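
To see where a 0/1 variable like this comes from, it can help to look at the design matrix R builds from a factor. The sketch below uses four made-up neighborhood labels. By default R orders factor levels alphabetically and treats the first one as the reference group, which is why Downtown is coded 0 here.

```r
# Hypothetical four-home example; only the neighborhood labels matter here.
Neighborhood <- factor(c("Downtown", "Eastside", "Eastside", "Downtown"))

# The first level listed is the reference group (coded 0).
print(levels(Neighborhood))

# The design matrix R builds for a model with Neighborhood as a predictor:
# an intercept column of 1s plus one indicator column for Eastside.
design <- model.matrix(~ Neighborhood)
print(design)
```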

Interpreting Coefficients

Interpreting the intercept

  • The intercept is the predicted price when:

    • NeighborhoodEastside = 0 → Downtown
    • HomeSizeK = 0
  • The intercept is the model’s Downtown baseline at 0K square feet.

  • Is this substantively meaningful, or mainly mathematical?

Interpreting NeighborhoodEastside

  • The predicted difference in price …
    • between Eastside and Downtown homes …
    • holding home size constant
    • in thousands of dollars (\(\$1000\)) is …

Holding the other variable constant

This means we compare observations that are the same on the other predictor.

  • Same home size, different neighborhood
  • Same neighborhood, different home size
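
Both comparisons can be checked by subtracting two predictions from the fitted model that differ in only one predictor. In this sketch \(s\) is a placeholder home size and \(N\) a placeholder value (0 or 1) of NeighborhoodEastside; both symbols are introduced here for illustration.

Same home size, different neighborhood:

\[ (b_0 + b_1 \cdot 1 + b_2 s) - (b_0 + b_1 \cdot 0 + b_2 s) = b_1 \]

Same neighborhood, home size differing by one unit (1,000 square feet):

\[ (b_0 + b_1 N + b_2 (s + 1)) - (b_0 + b_1 N + b_2 s) = b_2 \]

Everything tied to the predictor that is held constant cancels, so each coefficient is the predicted difference attributable to its own predictor alone.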

Comparing Models

Compare single vs multivariate slope

Single Predictor Model:

Two Predictor Model:

  • Why does the HomeSizeK coefficient get smaller after adding Neighborhood?

Predictions & Residuals

Visualizing predictions

Parallel lines

  • For additive models, group lines have:
    • the same slope and
    • different intercepts.
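
Substituting the two values of NeighborhoodEastside into the fitted model shows where the two lines come from:

\[ \text{Downtown: } \widehat{\text{PriceK}} = b_0 + b_2(\text{HomeSizeK}) \]

\[ \text{Eastside: } \widehat{\text{PriceK}} = (b_0 + b_1) + b_2(\text{HomeSizeK}) \]

Both lines share the slope \(b_2\), and their intercepts differ by \(b_1\), so the vertical gap between them is the same at every home size.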

Residuals still mean the same thing

\[ \text{Residual} = \text{Observed} - \text{Predicted} \]

Sum of squared residuals

SS Error: Error left after using both predictors.
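
A minimal sketch of that calculation on invented toy data (not the real Smallville data): compute each home's prediction from the two-predictor model, subtract it from the observed price, then square and sum the residuals.

```r
# Toy data: invented values for illustration only, NOT the real Smallville data.
homes <- data.frame(
  PriceK       = c(290, 335, 350, 410, 150, 185, 240, 250),
  Neighborhood = factor(rep(c("Downtown", "Eastside"), each = 4)),
  HomeSizeK    = c(1.8, 2.2, 2.6, 3.0, 1.0, 1.4, 1.8, 2.2)
)

multi_model <- lm(PriceK ~ Neighborhood + HomeSizeK, data = homes)

# Residual = Observed - Predicted, one value per home.
homes$predicted <- predict(multi_model)
homes$residual  <- homes$PriceK - homes$predicted
print(homes)

# SS Error: square each residual, then add them up.
ss_error <- sum(homes$residual^2)
print(ss_error)
```

For a model fitted with `lm()`, `deviance(multi_model)` returns the same sum of squared residuals.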

Comparing models by error reduction
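
One common way to put models on the same scale, assumed here rather than taken from the slide, is the proportional reduction in error (PRE) relative to the empty model, where \(\text{SS}_{\text{Total}}\) is the SS Error of the empty model:

\[ \text{PRE} = \frac{\text{SS}_{\text{Total}} - \text{SS}_{\text{Error}}}{\text{SS}_{\text{Total}}} \]

A PRE of 0 means the model leaves as much error as the empty model, and a PRE of 1 means no error is left. Adding a predictor to a least-squares model can never increase SS Error, so PRE alone will always favor the more complex model.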

Which model is better?

Could the difference be due to chance?