CourseKata Chapter 13

Multivariate Models Inference

Mansour Abdoli, PhD

Overview / Goals

  • Review: Multivariate Model
    • Interpreting Coefficients
    • Comparing Models
  • New Topic: Is the model real (significant)?
    • Simulated Null Distribution:
      • Is PRE/F/\(\beta_j\) real?
    • Theoretical Null Distribution
      • Assumptions
      • F/\(b_j\) Distribution

Multivariate Model

\[ \text{PriceK} = b_0 + b_1(\text{HomeSizeK}) + b_2(\text{Neighborhood}) \]

Interpreting Coefficients

         (Intercept)            HomeSizeK NeighborhoodEastside 
           177.24018             67.85362            -66.21476 
  • \(b_0=177.24\): Predicted price (in $1000s) at the baseline (Downtown, 0 sq. ft.).
  • \(b_1=67.85\): Difference in predicted price (in $1000s) for each additional 1000 sq. ft. of home size, within the same neighborhood.
  • \(b_2=-66.21\): Difference in predicted price (in $1000s) between an Eastside home and a Downtown home of the same size.
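
To make the interpretation concrete, here is a minimal sketch that plugs the fitted coefficients into the model equation. The function name `predict_price_k` and the 2000 sq. ft. example home are illustrative assumptions, not part of the original slides.

```python
# Coefficients copied from the fitted model output above.
b0, b1, b2 = 177.24018, 67.85362, -66.21476

def predict_price_k(home_size_k, eastside):
    """Predicted price in $1000s.

    home_size_k: home size in 1000s of sq. ft.
    eastside: 1 for Eastside, 0 for Downtown (the baseline).
    """
    return b0 + b1 * home_size_k + b2 * eastside

# Hypothetical example: a 2000 sq. ft. home in each neighborhood.
downtown = predict_price_k(2.0, 0)
eastside = predict_price_k(2.0, 1)

print("Downtown:", round(downtown, 2))
print("Eastside:", round(eastside, 2))
# For homes of the same size, the gap between neighborhoods equals b2.
print("Eastside - Downtown:", round(eastside - downtown, 2))
```

Holding home size fixed, the two predictions differ by exactly \(b_2\), which is what "of the same size" means in the interpretation of \(b_2\).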

Comparing Models

PRE= 0 

PRE= 0.3595436

PRE= 0.4217041

PRE= 0.5428084

Is the model real (significant)?

Hypotheses & Null Distribution

  • Null Hypothesis:
    • No; there is no difference (any observed difference is due to chance)
  • Alternative Hypothesis:
    • Yes; the difference is not due to chance

  • Build Null Distribution
  • Compare with Observed statistic
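
One common way to build a null distribution by simulation is to shuffle the response, refit the model, and record the statistic each time. The sketch below does this for PRE using only the Python standard library. The data are synthetic toy values generated in the code (not the course's housing data), and the helper `fit_pre` is a hypothetical name introduced for illustration.

```python
import random

def fit_pre(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns PRE."""
    n = len(y)
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations (X'X) b = X'y as an augmented 3x4 matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    b = [A[i][3] / A[i][i] for i in range(3)]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - (b[0] + b[1] * u + b[2] * v)) ** 2
                   for yi, u, v in zip(y, x1, x2))
    return 1 - ss_error / ss_total

# Synthetic toy data (an assumption for illustration only).
rng = random.Random(13)
n = 32
size = [rng.uniform(1.0, 3.5) for _ in range(n)]
east = [float(rng.random() < 0.5) for _ in range(n)]
price = [180 + 70 * s - 65 * e + rng.gauss(0, 60) for s, e in zip(size, east)]

obs_pre = fit_pre(price, size, east)

# Null distribution: shuffling the response breaks any real association.
null_pres = []
for _ in range(1000):
    shuffled = price[:]
    rng.shuffle(shuffled)
    null_pres.append(fit_pre(shuffled, size, east))

# p-value: share of shuffled PREs at least as large as the observed PRE.
p_value = sum(pre >= obs_pre for pre in null_pres) / len(null_pres)
print("observed PRE:", round(obs_pre, 3), "p-value:", p_value)
```

The same loop works for F or for a single slope: only the statistic recorded on each shuffle changes.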

Inference by Simulation

Is PRE difference real?

  • is it?

Is F difference real?

  • is it?

Are slopes meaningfully different from 0?

Theoretical Distribution of \(F\) & \(b_j\)

  • Population Parameters: \(\beta_0\), \(\beta_1\), \(\beta_2\)

  • Sample Statistics: \(b_0\), \(b_1\), \(b_2\)

  • Null Distribution Assumptions:

    • Observations: Independent (Within & Between)
    • Error Distribution:
      • (Roughly) Normal
      • Constant \(\sigma\)

Checking Normality/Constant Variance:

F Inference

  • Null Hypothesis: All slopes are zero
    • No association between response and predictors.
  • Null Distribution: \(F\sim F(df_1=2, df_2=n-3)\)
    • \(p\)-value \(= P(F>F_{obs})\)
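
As a rough illustration, the observed F can be computed from PRE and the two degrees of freedom using the standard relation \(F = \frac{PRE/df_1}{(1-PRE)/df_2}\). The sketch below assumes, which the slides do not state explicitly, that the largest PRE reported earlier (0.5428084) belongs to the two-predictor model, and that \(n = 32\) (implied by \(df = 29\) in the practice slide). The function name `f_from_pre` is illustrative.

```python
def f_from_pre(pre, df_model, df_error):
    """F ratio: variance explained per model df over unexplained per error df."""
    return (pre / df_model) / ((1 - pre) / df_error)

# Assumed inputs (see the note above): PRE of the two-predictor model and n.
pre = 0.5428084
n = 32
df1 = 2        # number of slopes being tested
df2 = n - 3    # n minus the three estimated coefficients

f_obs = f_from_pre(pre, df1, df2)
print("F_obs =", round(f_obs, 3))
```

The \(p\)-value is then the area to the right of this value under the \(F(2, 29)\) curve.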

\(\beta_j\) Inference

\[ t = \frac{b_j}{SE(b_j)} \]

  • Null Hypothesis: usually \(\beta_j=0\)
    • \(j^{th}\) predictor has no effect.
  • Null Distribution: \(t\sim T(df=n-3)\)
    • \(p\)-value \(= P(t \text{ at least as extreme as } t_{obs} \text{ in favor of } H_a)\)
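
As a quick check of the formula, the sketch below recomputes each \(t\) ratio from the estimates and standard errors reported in the coefficient table of the Practice slides. The dictionary layout and variable names are illustrative choices.

```python
# (estimate, standard error) pairs copied from the coefficient table.
coefs = {
    "(Intercept)": (177.24018, 43.66642),
    "HomeSizeK": (67.85362, 19.90140),
    "NeighborhoodEastside": (-66.21476, 23.89049),
}

# t = b_j / SE(b_j) for each coefficient.
t_values = {name: est / se for name, (est, se) in coefs.items()}

for name, t in t_values.items():
    print(f"{name:22s} t = {t:.4f}")
```

The results should agree with the table's "t value" column up to rounding of the inputs.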

Practice

                      Estimate Std. Error   t value     Pr(>|t|)
(Intercept)          177.24018   43.66642  4.058959 0.0003408006
HomeSizeK             67.85362   19.90140  3.409490 0.0019321124
NeighborhoodEastside -66.21476   23.89049 -2.771594 0.0096394164

Does NeighborhoodEastside have a negative impact on PriceK?

  • Hypotheses: \(H_0: \beta_{Neighborhood} = 0\) vs. \(H_a: \beta_{Neighborhood} < 0\)
  • \(p\)-value = P(T<-2.7715945) = ?
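
One way to approach this: because \(t_{obs}\) is negative and \(H_a\) points in the negative direction, the one-sided \(p\)-value is half of the two-sided `Pr(>|t|)` in the table. The sketch below also checks this by numerically integrating the \(t\) density, assuming \(df = 29\) (the value used in the confidence-interval slide). The helper names `t_pdf` and `lower_tail` are hypothetical, and Simpson's rule is just one convenient way to approximate the tail area.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return (math.exp(log_norm) / math.sqrt(df * math.pi)
            * (1 + t * t / df) ** (-(df + 1) / 2))

def lower_tail(t_obs, df, lo=-60.0, steps=20000):
    """Approximate P(T < t_obs) with Simpson's rule on [lo, t_obs]."""
    h = (t_obs - lo) / steps  # steps must be even for Simpson's rule
    total = t_pdf(lo, df) + t_pdf(t_obs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(lo + i * h, df)
    return total * h / 3

t_obs = -2.7715945
df = 29  # assumed: n - 3 with n = 32

p_one_sided = lower_tail(t_obs, df)
p_from_table = 0.0096394164 / 2  # half of the two-sided Pr(>|t|)

print("numerical integration:", round(p_one_sided, 6))
print("half of table p-value:", round(p_from_table, 6))
```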

Practice

                      Estimate Std. Error   t value     Pr(>|t|)
(Intercept)          177.24018   43.66642  4.058959 0.0003408006
HomeSizeK             67.85362   19.90140  3.409490 0.0019321124
NeighborhoodEastside -66.21476   23.89049 -2.771594 0.0096394164

Find a 95% C.I. for \(\beta_{\text{HomeSizeK}}\):

  • SE \(= 19.9013993\)
  • \(t^* = t_{0.025, df=29} = 2.0452\)
  • 95% C.I. of \(\beta_{\text{HomeSizeK}} = (...., ....)\)
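
A minimal sketch of the arithmetic, using the usual form \(b_j \pm t^* \cdot SE(b_j)\) with the estimate, SE, and \(t^*\) given on this slide. Variable names are illustrative.

```python
# Values taken from the slide.
estimate = 67.85362      # b for HomeSizeK
se = 19.9013993          # standard error of that estimate
t_star = 2.0452          # critical value t_{0.025, df=29}

margin = t_star * se     # margin of error
lower = estimate - margin
upper = estimate + margin

print(f"margin of error: {margin:.4f}")
print(f"95% C.I.: ({lower:.4f}, {upper:.4f})")
```

Since the interval is in the units of the coefficient, it reads as a range of plausible price differences (in $1000s) per additional 1000 sq. ft.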