CourseKata Chapter 13

Multivariate Models Inference

Mansour Abdoli, PhD

Overview / Goals

Review: Multivariate Model
- Interpreting Coefficients
- Comparing Models
New Topic: Is the model real (significant)?
- Simulated Null Distribution:
  - Is PRE/F/$\beta_j$ real?
- Theoretical Null Distribution
  - Assumptions
  - F/$b_j$ Distribution

Multivariate Model

\[ \text{PriceK} = b_0 + b_1(\text{HomeSizeK}) + b_2(\text{Neighborhood}) \]

Interpreting Coefficients

         (Intercept)            HomeSizeK NeighborhoodEastside 
           177.24018             67.85362            -66.21476

$b_0=177.24$: Predicted price (in \$1000s) at baseline (Downtown, 0 sq. ft.).
$b_1=67.85$: Difference in predicted price (in \$1000s) for each additional 1000 sq. ft. of home size, within the same neighborhood.
$b_2=-66.21$: Difference in predicted price (in \$1000s) between Eastside and Downtown for homes of the same size (Eastside is lower).
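A minimal sketch of how these coefficients turn into predictions. It assumes the usual 0/1 dummy coding (Downtown = 0, Eastside = 1), which matches the baseline described above; the function name `predict_price_k` is made up for illustration.

```python
# Coefficients copied from the fitted model output on this slide.
b0, b1, b2 = 177.24018, 67.85362, -66.21476

def predict_price_k(home_size_k, eastside):
    """Predicted price in $1000s.

    home_size_k: home size in 1000s of sq. ft.
    eastside: 1 for Eastside, 0 for Downtown (assumed dummy coding).
    """
    return b0 + b1 * home_size_k + b2 * eastside

# Two homes of the same size (2000 sq. ft.) in different neighborhoods.
downtown_2k = predict_price_k(2.0, 0)
eastside_2k = predict_price_k(2.0, 1)

# The gap between them is exactly b2, whatever the home size.
neighborhood_gap = eastside_2k - downtown_2k

# One extra 1000 sq. ft. in the same neighborhood changes the prediction by b1.
size_gap = predict_price_k(3.0, 0) - predict_price_k(2.0, 0)

print(downtown_2k, eastside_2k, neighborhood_gap, size_gap)
```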

Comparing Models

PRE= 0

PRE= 0.3595436

PRE= 0.4217041

PRE= 0.5428084

Is the model real (significant)?

Hypotheses & Null Distribution

Null Hypothesis:
- No; there is no difference (any observed difference is due to chance)
Alternative Hypothesis:
- Yes; the difference is not by chance

Build Null Distribution
Compare with Observed statistic
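The two steps above can be sketched as a shuffle (permutation) test. This is only an illustration under stated assumptions: the data are made up (not the course's housing data), PRE is used as the statistic, and the helper `fit_pre` is a hypothetical hand-rolled least-squares fit written to avoid external libraries.

```python
import random

def fit_pre(y, x1, x2):
    """PRE of the model y = b0 + b1*x1 + b2*x2 relative to the empty (mean-only) model."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations (X'X) b = X'y stored as an augmented 3x4 matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(3):
        p = max(range(c, 3), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        for k in range(3):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [u - f * v for u, v in zip(A[k], A[c])]
    b = [A[i][3] / A[i][i] for i in range(3)]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - sum(bk * rk for bk, rk in zip(b, r))) ** 2
                   for r, yi in zip(rows, y))
    return 1 - ss_error / ss_total

# Made-up data for illustration only.
rng = random.Random(1)
size = [rng.uniform(1, 3) for _ in range(20)]   # home size, 1000s of sq. ft.
east = [i % 2 for i in range(20)]               # 1 = Eastside, 0 = Downtown
price = [180 + 70 * s - 65 * e + rng.gauss(0, 20) for s, e in zip(size, east)]

# Observed statistic.
obs_pre = fit_pre(price, size, east)

# Null distribution: shuffling the outcome breaks any link to the predictors.
null_pre = []
for _ in range(1000):
    shuffled = price[:]
    rng.shuffle(shuffled)
    null_pre.append(fit_pre(shuffled, size, east))

# Proportion of shuffled PREs at least as large as the observed PRE.
p_value = sum(v >= obs_pre for v in null_pre) / len(null_pre)
print(obs_pre, p_value)
```

The same loop works with F or a slope $b_j$ as the statistic; only the function that computes the statistic changes.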

Inference by Simulation

Is PRE difference real?

is it?

Is F difference real?

is it?

Are slopes meaningfully different from 0?

Theoretical Distribution of $F$ & $b_j$

Population Parameters: $\beta_0$, $\beta_1$, $\beta_2$
Sample Statistics: $b_0$, $b_1$, $b_2$
Null Distribution Assumptions:
- Observations: Independent (Within & Between)
- Error Distribution:
  - (Roughly) Normal
  - Constant $\sigma$

Checking Normality/Constant Variance:

F Inference

NULL Hypothesis: All slopes are zero
- No association between response and predictors.
Null Distribution: $F\sim F(df_1=2, df_2=n-3)$
- $p$-value $= P(F>F_{obs})$
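A rough sketch of the F calculation, using the standard relation $F = \frac{PRE/df_1}{(1-PRE)/df_2}$. Two assumptions are made here: that PRE = 0.5428084 (the largest value on the Comparing Models slide) belongs to the two-predictor model, and that $n = 32$, as implied by $df = 29 = n - 3$ in the practice slides. The $p$-value uses a closed form that holds only when $df_1 = 2$.

```python
def f_from_pre(pre, df_model, df_error):
    """F statistic computed from PRE and the two degrees of freedom."""
    return (pre / df_model) / ((1 - pre) / df_error)

def f_upper_tail_df1_is_2(f_obs, df_error):
    """P(F > f_obs) for F(2, df_error). Closed form valid only for df1 = 2."""
    return (1 + 2 * f_obs / df_error) ** (-df_error / 2)

pre = 0.5428084      # assumed: PRE of the two-predictor model
n = 32               # assumed: implied by df = 29 = n - 3
df_model = 2         # two slopes
df_error = n - 3

f_obs = f_from_pre(pre, df_model, df_error)
p_value = f_upper_tail_df1_is_2(f_obs, df_error)
print(f_obs, p_value)
```

With $df_1 = 2$ the $p$-value simplifies to $(1 - PRE)^{df_2/2}$, which is a handy check on the arithmetic.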

$\beta_j$ Inference

\[ t = \frac{b_j}{SE(b_j)} \]

NULL Hypothesis: usually $\beta_j=0$
- $j^{th}$ predictor has no effect.
Null Distribution: $t\sim T(df=n-3)$
- $p$-value $= P(t \text{ at least as extreme as } t_{obs} \text{ in favor of } H_a)$
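As a sketch, the $t$ statistic and a two-sided $p$-value can be computed from an estimate and its standard error. The numbers below are the HomeSizeK row of the regression table used in the practice slides, with $df = 29$. The tail probability is approximated by numerically integrating the $t$ density (Simpson's rule), a stand-in chosen to avoid external libraries; the helper names are made up.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so P(T > 0) = 0.5.
    return 0.5 - total * h / 3

b_j = 67.85362     # estimate for HomeSizeK
se_j = 19.90140    # its standard error
df = 29            # n - 3

t_obs = b_j / se_j
# Two-sided: both tails count as "at least as extreme".
p_two_sided = 2 * t_upper_tail(abs(t_obs), df)
print(t_obs, p_two_sided)
```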

Practice

                      Estimate Std. Error   t value     Pr(>|t|)
(Intercept)          177.24018   43.66642  4.058959 0.0003408006
HomeSizeK             67.85362   19.90140  3.409490 0.0019321124
NeighborhoodEastside -66.21476   23.89049 -2.771594 0.0096394164

Does NeighborhoodEastside have a negative impact on PriceK?

Hypotheses: $H_0: \beta_{Neighborhood} = 0$ vs. $H_a: \beta_{Neighborhood} < 0$
$p$-value = P(T<-2.7715945) = ?
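One way to approach this, sketched below: the table's `Pr(>|t|)` column is a two-sided $p$-value, and the $t$ distribution is symmetric, so a one-sided $p$-value can be obtained from it once the sign of $t_{obs}$ is checked against $H_a$.

```python
t_obs = -2.771594            # t value for NeighborhoodEastside
p_two_sided = 0.0096394164   # Pr(>|t|) from the same row

# H_a: beta < 0, so only the lower tail counts.
# If t_obs is negative, the lower tail is half of the two-sided p-value;
# otherwise it is the complement of that half.
if t_obs < 0:
    p_one_sided = p_two_sided / 2
else:
    p_one_sided = 1 - p_two_sided / 2

print(p_one_sided)
```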

Practice

                      Estimate Std. Error   t value     Pr(>|t|)
(Intercept)          177.24018   43.66642  4.058959 0.0003408006
HomeSizeK             67.85362   19.90140  3.409490 0.0019321124
NeighborhoodEastside -66.21476   23.89049 -2.771594 0.0096394164

Find a 95% C.I. for $\beta_\text{HomeSizeK}$:

$SE = 19.9013993$
$t^* = t_{0.025, df=29} = 2.0452$
95% C.I. of $\beta_{\text{HomeSizeK}} = (...., ....)$
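A short sketch of the interval arithmetic, using the usual form estimate $\pm\ t^* \times SE$ with the values given on this slide:

```python
estimate = 67.85362    # b for HomeSizeK
se = 19.9013993        # standard error from the slide
t_star = 2.0452        # t critical value for 95% confidence, df = 29

margin = t_star * se
ci = (estimate - margin, estimate + margin)
print(ci)
```

If the interval excludes 0, that agrees with rejecting $\beta_{\text{HomeSizeK}} = 0$ in a two-sided test at the 5% level.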
