CourseKata Chapter 13

Multivariate Models

Mansour Abdoli, PhD

Overview / Goals

Do More Predictors Mean a Better Model?
Explaining Variability by Different Predictors
Visualizing Variability by Multiple Predictors
Constructing a Multivariate Model
Interpreting Coefficients
Comparing Models

Do More Predictors Mean a Better Model?

Model Complexity Review

\[ \text{DATA} = \text{MODEL} + \text{ERROR} \]

Empty Model: Mean Response
Simple Model: 2-Group Means or a Simple Regression
Complex Model: Multiple-Group Means and/or Additional Numerical Predictors
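As a sketch in the notation used later in this deck, the three levels of complexity can be written as follows. The symbols \(X\), \(X_1\), and \(X_2\) are generic placeholders for predictors, not specific Smallville variables. In each line, everything before \(e_i\) is the MODEL and \(e_i\) is the ERROR.

\[ \text{Empty: } Y_i = b_0 + e_i \]

\[ \text{Simple: } Y_i = b_0 + b_1X_i + e_i \]

\[ \text{Complex: } Y_i = b_0 + b_1X_{1i} + b_2X_{2i} + e_i \]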

The Smallville data

Outcome:

PriceK: sale price in thousands of dollars

Predictors:

Neighborhood: Downtown or Eastside
HomeSizeK: square footage in thousands
HasFireplace: fireplace indicator

Explaining Variability by Different Predictors

Variability in `PriceK`

Two single-predictor models

Which model reduces more error, that is, explains more variability?
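One way to answer this kind of question is to compare each model's SS Error with the SS Total left by the empty model. The sketch below does this in plain Python on six invented homes; the numbers, variable names, and helper functions are assumptions made for illustration and are not the Smallville data or the course's R code.

```python
# Toy data invented for illustration; these are NOT the Smallville values.
price_k = [200.0, 240.0, 220.0, 150.0, 170.0, 160.0]  # sale price, $1000s
neighborhood = ["Downtown"] * 3 + ["Eastside"] * 3
home_size_k = [1.0, 2.0, 1.5, 1.0, 2.0, 1.5]           # 1000s of square feet


def mean(values):
    return sum(values) / len(values)


def ss_total(y):
    """SS left by the empty model (every prediction is the grand mean)."""
    grand_mean = mean(y)
    return sum((v - grand_mean) ** 2 for v in y)


def ss_error_group_model(y, groups):
    """SS Error when each observation is predicted by its group's mean."""
    by_group = {}
    for v, g in zip(y, groups):
        by_group.setdefault(g, []).append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return sum((v - group_means[g]) ** 2 for v, g in zip(y, groups))


def ss_error_regression_model(y, x):
    """SS Error from the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))


sst = ss_total(price_k)
sse_neighborhood = ss_error_group_model(price_k, neighborhood)
sse_home_size = ss_error_regression_model(price_k, home_size_k)

print("SS Total:", sst)
print("SS Error, Neighborhood model:", sse_neighborhood)
print("SS Error, HomeSizeK model:", sse_home_size)
```

In this toy data the Neighborhood model happens to leave less error than the HomeSizeK model; that says nothing about which predictor does better in the real data.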

Visualizing Variability by Multiple Predictors

Using Both Predictors

Does neighborhood add any value?

Does the home-size effect change?

Constructing a Multivariate Model

Specifying a multivariate model

\[ Y_i = b_0 + b_1X_{1i} + b_2X_{2i} + e_i \]

For Smallville:

\[ \text{PriceK}_i = b_0 + b_1\text{Neighborhood}_i + b_2\text{HomeSizeK}_i + e_i \]

Fitting the model in R

Model Fit: Minimize SS Error
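To make "minimize SS Error" concrete, the sketch below finds the coefficients of a two-predictor model by solving the least-squares normal equations in plain Python. It is only an illustration of the idea, not the R workflow used in the course: the data are six invented homes, and `solve_linear_system` and `fit_least_squares` are hypothetical helper names.

```python
# Toy data invented for illustration; these are NOT the Smallville values.
price_k = [200.0, 240.0, 220.0, 150.0, 170.0, 160.0]  # sale price, $1000s
eastside = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]              # 1 = Eastside, 0 = Downtown
home_size_k = [1.0, 2.0, 1.5, 1.0, 2.0, 1.5]           # 1000s of square feet


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_least_squares(predictor_rows, y):
    """Return [b0, b1, ...] that minimize the sum of squared residuals."""
    design = [[1.0] + list(row) for row in predictor_rows]  # add intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


rows = list(zip(eastside, home_size_k))
b0, b1, b2 = fit_least_squares(rows, price_k)

predicted = [b0 + b1 * d + b2 * s for d, s in rows]
ss_error = sum((obs - pred) ** 2 for obs, pred in zip(price_k, predicted))

print("b0, b1, b2:", b0, b1, b2)
print("SS Error:", ss_error)
```

Any other choice of `b0`, `b1`, and `b2` would give a larger SS Error on these six homes, which is what "best fitting" means here.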

Write the fitted model

\[ \widehat{\text{PriceK}} = b_0 + b_1(\text{NeighborhoodEastside}) + b_2(\text{HomeSizeK}) \]

Important

NeighborhoodEastside = 1 for Eastside and 0 for Downtown.

Interpreting Coefficients

Interpreting the intercept

The intercept is the predicted price when:
- NeighborhoodEastside = 0 → Downtown
- HomeSizeK = 0
The intercept is the model’s Downtown baseline at a home size of 0 (zero thousand square feet).
Is this substantively meaningful, or mainly mathematical?

Interpreting `NeighborhoodEastside`

The predicted difference in price …
- between Eastside and Downtown homes …
- holding home size constant …
- in thousands of dollars, is …

Holding the other variable constant

This means we compare observations that have the same value on the other predictor.

Same home size, different neighborhood
Same neighborhood, different home size
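Both comparisons can be worked out from the fitted model. As a sketch, let \(s\) stand for any fixed home size in thousands of square feet (the symbol \(s\) is introduced here only for illustration).

Same home size, different neighborhood:

\[ \bigl(b_0 + b_1(1) + b_2 s\bigr) - \bigl(b_0 + b_1(0) + b_2 s\bigr) = b_1 \]

Same neighborhood, home size differing by one unit (1,000 square feet):

\[ \bigl(b_0 + b_1\text{NeighborhoodEastside} + b_2(s + 1)\bigr) - \bigl(b_0 + b_1\text{NeighborhoodEastside} + b_2 s\bigr) = b_2 \]

In each case the terms for the predictor held constant cancel, leaving only the coefficient of the predictor that changed.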

Comparing Models

Compare single vs multivariate slope

Single Predictor Model:

Two Predictor Model:

Why does the HomeSizeK coefficient get smaller after adding Neighborhood?
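One possible explanation is that the two predictors overlap: if larger homes tend to sit in the pricier neighborhood, the single-predictor slope absorbs part of the neighborhood difference. The sketch below shows this on invented data in which Downtown homes are both larger and more expensive; whether the same thing happens in Smallville depends on the actual data. It relies on the fact that, in an additive model with a two-level group predictor, the least-squares slope for the numerical predictor equals the slope computed after centering both variables within each group. The helper names are hypothetical.

```python
# Invented data, NOT Smallville: Downtown homes here are larger and pricier.
home_size_k = [1.5, 2.0, 2.5, 1.0, 1.5, 2.0]            # 1000s of square feet
price_k = [220.0, 235.0, 250.0, 145.0, 160.0, 175.0]    # sale price, $1000s
neighborhood = ["Downtown"] * 3 + ["Eastside"] * 3


def mean(values):
    return sum(values) / len(values)


def centered(values, groups=None):
    """Subtract the grand mean, or each group's own mean if groups is given."""
    if groups is None:
        grand_mean = mean(values)
        return [v - grand_mean for v in values]
    group_means = {
        g: mean([v for v, h in zip(values, groups) if h == g]) for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


def slope(x_dev, y_dev):
    """Least-squares slope computed from deviation scores."""
    return sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)


# Slope from the single-predictor model PriceK ~ HomeSizeK.
single_slope = slope(centered(home_size_k), centered(price_k))

# Slope for HomeSizeK once Neighborhood is also in the model:
# compare homes only with others in the same neighborhood.
adjusted_slope = slope(
    centered(home_size_k, neighborhood), centered(price_k, neighborhood)
)

print("HomeSizeK slope, single-predictor model:", single_slope)
print("HomeSizeK slope, with Neighborhood added:", adjusted_slope)
```

In this invented example the adjusted slope is smaller because part of what looked like a home-size effect was really the price gap between neighborhoods.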

Predictions & Residuals

Visualizing predictions

Parallel lines

For additive models, group lines have:
- the same slope and
- different intercepts.
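This follows from substituting the two values of NeighborhoodEastside into the fitted model, sketched here with the same coefficient symbols:

\[ \text{Downtown: } \widehat{\text{PriceK}} = b_0 + b_2(\text{HomeSizeK}) \]

\[ \text{Eastside: } \widehat{\text{PriceK}} = (b_0 + b_1) + b_2(\text{HomeSizeK}) \]

Both lines share the slope \(b_2\); their intercepts differ by \(b_1\).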

Residuals still mean the same thing

\[ \text{Residual} = \text{Observed} - \text{Predicted} \]

Sum of squared residuals

SS Error: Error left after using both predictors.
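As a small sketch of this calculation, the code below takes a set of coefficients, computes each home's prediction and residual, and sums the squared residuals. The six homes and the coefficient values are invented for illustration; they are not the Smallville data or its fitted estimates.

```python
# Invented homes and coefficients, NOT the Smallville data or estimates.
price_k = [200.0, 240.0, 220.0, 150.0, 170.0, 160.0]  # observed price, $1000s
neighborhood = ["Downtown"] * 3 + ["Eastside"] * 3
home_size_k = [1.0, 2.0, 1.5, 1.0, 2.0, 1.5]           # 1000s of square feet

b0, b1, b2 = 175.0, -60.0, 30.0  # hypothetical fitted coefficients


def predict(hood, size_k):
    """Prediction from PriceK-hat = b0 + b1*NeighborhoodEastside + b2*HomeSizeK."""
    eastside = 1.0 if hood == "Eastside" else 0.0  # dummy coding
    return b0 + b1 * eastside + b2 * size_k


predicted = [predict(h, s) for h, s in zip(neighborhood, home_size_k)]

# Residual = Observed - Predicted
residuals = [obs - pred for obs, pred in zip(price_k, predicted)]

# SS Error = sum of squared residuals
ss_error = sum(r ** 2 for r in residuals)

print("Residuals:", residuals)
print("SS Error:", ss_error)
```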

Comparing models by error reduction

Which model is better?
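One common yardstick is the proportional reduction in error (PRE): the share of the empty model's SS Total that a model removes. The sketch below computes it for three models from made-up sums of squares; the SS values and the function name `pre` are assumptions for illustration, not results from the Smallville data.

```python
# Made-up sums of squares for illustration; NOT Smallville results.
ss_total = 6400.0  # error left by the empty model

ss_error_by_model = {
    "Neighborhood only": 1000.0,
    "HomeSizeK only": 5500.0,
    "Neighborhood + HomeSizeK": 100.0,
}


def pre(ss_error, ss_total):
    """Proportional reduction in error relative to the empty model."""
    return (ss_total - ss_error) / ss_total


pre_by_model = {name: pre(sse, ss_total) for name, sse in ss_error_by_model.items()}

for name, value in pre_by_model.items():
    print(f"{name}: PRE = {value:.3f}")
```

Because adding a predictor to a least-squares model can never increase SS Error, the larger model will always have a PRE at least as high as the smaller one. A higher PRE alone therefore does not settle which model is better.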

Could the difference be due to chance?
