CourseKata Chapter 12

Confidence Interval (2)

Mansour Abdoli, PhD

Overview / Goals

  • C.I. for One Mean - Review
  • C.I. for Two Means
  • C.I. for Intercept & Slope
  • Pairwise C.I.
  • Factors Affecting C.I.

C.I. for One Mean - Review

Point Estimate \(\pm\) ME

Computing ME

Simulation

  • Resample
  • Compute \(\bar y\)
  • Repeat many times
  • Find middle C%: \[(LL, UL)\]
  • ME = \(\frac{UL-LL}{2}\)
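The simulation steps above can be sketched in a few lines. This is a minimal Python illustration, not the course's own R code. `bootstrap_mean_ci` is a hypothetical helper, and the 10,000 resamples and fixed seed are arbitrary choices. The sketch resamples with replacement, takes the middle C% of the simulated means as (LL, UL), and returns \(\bar y \pm ME\).

```python
import random
import statistics


def bootstrap_mean_ci(y, level=0.95, reps=10_000, seed=1):
    """Bootstrap C.I. for one mean (hypothetical helper): ybar +/- ME."""
    rng = random.Random(seed)
    n = len(y)
    # Resample with replacement, compute ybar, repeat many times
    means = sorted(statistics.fmean(rng.choices(y, k=n)) for _ in range(reps))
    # Middle C%: cut (1 - level) / 2 from each tail of the sorted means
    tail = (1 - level) / 2
    ll = means[round(tail * (reps - 1))]
    ul = means[round((1 - tail) * (reps - 1))]
    me = (ul - ll) / 2  # ME = (UL - LL) / 2
    ybar = statistics.fmean(y)
    return ybar - me, ybar + me, me
```

Because the interval is centered on the observed \(\bar y\), only the width of the simulated distribution is used, which matches the "Point Estimate \(\pm\) ME" form.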

Theoretical

  • Check Assumptions
    • Normality or
    • Large Sample Size
  • Compute SE = \(\frac{s}{\sqrt{n}}\)
  • Find \(t_{\alpha/2}\) multiplier
    from \(t\sim T(df=n-1)\)
  • ME = \(t_{\alpha/2} \cdot SE\)
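A sketch of the theoretical route, assuming Python's standard library only. Because the standard library has no t distribution, the hypothetical helpers `t_cdf` and `t_quantile` find the multiplier numerically, using Simpson's rule on the t density and then bisection. In R, `qt(0.975, df)` gives the same 95% multiplier.

```python
import math
import statistics


def t_cdf(x, df, steps=4000):
    """P(T <= x) for T ~ t(df), by Simpson's rule on the t density (steps must be even)."""
    if x < 0:
        return 1 - t_cdf(-x, df, steps)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3  # density is symmetric, so P(T <= 0) = 0.5


def t_quantile(p, df):
    """Multiplier t with P(T <= t) = p, for p > 0.5, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the answer
        lo, hi = hi, hi * 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval_mean(y, level=0.95):
    """ybar +/- t_{alpha/2} * s / sqrt(n), with t from T(df = n - 1)."""
    n = len(y)
    se = statistics.stdev(y) / math.sqrt(n)  # SE = s / sqrt(n)
    t_mult = t_quantile(1 - (1 - level) / 2, df=n - 1)
    me = t_mult * se
    ybar = statistics.fmean(y)
    return ybar - me, ybar + me, me
```

The assumption check (normality or a large sample) is not automated here. The code only carries out the arithmetic once the assumptions are judged reasonable.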

C.I. for Two Means

Point Estimate \(\pm\) ME \[\bar y_2 - \bar y_1 \pm ME\]

Computing ME - By Simulation

Three ways to simulate a distribution

  • Shuffle \(Y\)
    • Models no association (the null distribution)
  • Resample the whole dataset
    • Group sizes change from sample to sample
  • Resample within groups
    • Keeps the same structure (group sizes stay fixed)
  • Key: find the ME for the same structure as the observed data
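To make the three schemes concrete, here is a Python sketch on a small hypothetical two-group dataset. The function names and data are illustrative only. Shuffling keeps the group labels in place and permutes \(Y\). Resampling rows can change the group sizes. Resampling within groups always reproduces the observed sizes.

```python
import random
from collections import Counter

# Hypothetical two-group data: (group, y) pairs
data = [("A", 5.1), ("A", 6.0), ("A", 5.5),
        ("B", 7.2), ("B", 6.8), ("B", 7.9), ("B", 7.0)]


def shuffle_y(rows, rng):
    """Shuffle Y across rows: breaks any group/Y association (null model)."""
    ys = [y for _, y in rows]
    rng.shuffle(ys)
    return [(g, y) for (g, _), y in zip(rows, ys)]


def resample_dataset(rows, rng):
    """Resample whole rows with replacement: group sizes can change."""
    return rng.choices(rows, k=len(rows))


def resample_within_groups(rows, rng):
    """Resample separately inside each group: group sizes stay fixed."""
    out = []
    for g in sorted({g for g, _ in rows}):
        members = [r for r in rows if r[0] == g]
        out.extend(rng.choices(members, k=len(members)))
    return out


def group_sizes(rows):
    """Count rows per group."""
    return dict(Counter(g for g, _ in rows))
```

Calling `group_sizes(resample_dataset(data, random.Random(0)))` several times with different seeds shows sizes drifting away from 3 and 4, while `resample_within_groups` keeps them fixed.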

ME and Sampling Distributions

Simulation

  • Resample within groups (builds the sampling distribution)
  • Compute \(b_1=\bar y_2-\bar y_1\)
  • Repeat many times
  • Find middle C%: \[(LL, UL)\]
  • ME = \(\frac{UL-LL}{2}\)
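A minimal Python sketch of these steps, assuming resampling within groups so \(n_1\) and \(n_2\) match the observed data. `bootstrap_diff_ci` is a hypothetical helper, not a course function.

```python
import random
import statistics


def bootstrap_diff_ci(y1, y2, level=0.95, reps=10_000, seed=1):
    """Bootstrap C.I. for mu2 - mu1 (hypothetical helper): b1 +/- ME."""
    rng = random.Random(seed)
    b1 = statistics.fmean(y2) - statistics.fmean(y1)  # observed ybar2 - ybar1
    diffs = []
    for _ in range(reps):
        # Resample within each group so the group sizes stay fixed
        s1 = rng.choices(y1, k=len(y1))
        s2 = rng.choices(y2, k=len(y2))
        diffs.append(statistics.fmean(s2) - statistics.fmean(s1))
    diffs.sort()
    tail = (1 - level) / 2
    ll = diffs[round(tail * (reps - 1))]
    ul = diffs[round((1 - tail) * (reps - 1))]
    me = (ul - ll) / 2  # ME = (UL - LL) / 2
    return b1 - me, b1 + me, me
```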

Computing ME - Theoretical

  • Check Assumptions
    • Normality or
    • Large Sample Sizes
  • Compute SE & DF
  • Equal Variance
    • \(s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}\)
    • \(DF = n_1+n_2-2\)
    • SE = \(s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\)
  • Unequal Variance
    • \(DF \approx \min(n_1-1, n_2-1)\) (conservative; software uses the Welch-Satterthwaite formula)
    • SE = \(\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\)
  • \(t_{\alpha/2}\) from \(t\sim T(df=DF)\)
  • ME = \(t_{\alpha/2} \cdot SE\)
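The two SE/DF recipes can be written out directly. This Python sketch uses hypothetical helper names and expects the caller to supply the t multiplier for the returned DF. For example, 2.571 is the 97.5th percentile of \(T(df=5)\), and `qt(0.975, 5)` gives it in R. The unequal-variance branch uses the conservative \(\min(n_1-1, n_2-1)\) rule rather than Welch's formula.

```python
import math
import statistics


def two_mean_se_df(y1, y2, equal_var=True):
    """SE and DF for ybar2 - ybar1 (hypothetical helper)."""
    n1, n2 = len(y1), len(y2)
    v1, v2 = statistics.variance(y1), statistics.variance(y2)  # s1^2, s2^2
    if equal_var:
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
        se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
        df = n1 + n2 - 2
    else:
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = min(n1 - 1, n2 - 1)  # conservative approximation
    return se, df


def two_mean_ci(y1, y2, t_mult, equal_var=True):
    """(ybar2 - ybar1) +/- t_mult * SE; t_mult must come from T(df = DF)."""
    se, _ = two_mean_se_df(y1, y2, equal_var)
    b1 = statistics.fmean(y2) - statistics.fmean(y1)
    me = t_mult * se
    return b1 - me, b1 + me
```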

Computing ME - Theoretical

‘confint()’ computes confidence intervals for a fitted model's coefficients

C.I. for Intercept & Slope

C.I. for Intercept & Slope

Simulation

  • Resample Data
  • Compute \(b_0\) and \(b_1\)
  • Repeat many times
  • For \(\beta_0\):
    • Find middle C%: \[(LL_{b_0}, UL_{b_0})\]
    • \(ME_{b_0} = \frac{UL_{b_0}-LL_{b_0}}{2}\)
    • \(b_0\pm ME_{b_0}\)
  • For \(\beta_1\):
    • Find middle C%: \[(LL_{b_1}, UL_{b_1})\]
    • \(ME_{b_1} = \frac{UL_{b_1}-LL_{b_1}}{2}\)
    • \(b_1\pm ME_{b_1}\)
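A Python sketch of the resampling steps for both coefficients, assuming \((x, y)\) pairs are resampled together. `ls_fit` and `bootstrap_coef_ci` are hypothetical helpers. A resample where every \(x\) is identical has no defined slope, so the sketch skips it.

```python
import random
import statistics


def ls_fit(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def bootstrap_coef_ci(x, y, level=0.95, reps=5_000, seed=1):
    """Bootstrap C.I.s for beta_0 and beta_1 (hypothetical helper): b_i +/- ME_{b_i}."""
    rng = random.Random(seed)
    rows = list(zip(x, y))
    b0, b1 = ls_fit(x, y)
    boot_b0, boot_b1 = [], []
    for _ in range(reps):
        sample = rng.choices(rows, k=len(rows))  # resample (x, y) pairs together
        xs, ys = zip(*sample)
        if len(set(xs)) < 2:  # slope undefined if every x is the same; skip
            continue
        c0, c1 = ls_fit(xs, ys)
        boot_b0.append(c0)
        boot_b1.append(c1)
    tail = (1 - level) / 2

    def half_width(values):
        # ME = (UL - LL) / 2 from the middle C% of the sorted estimates
        values.sort()
        k = len(values) - 1
        return (values[round((1 - tail) * k)] - values[round(tail * k)]) / 2

    me0, me1 = half_width(boot_b0), half_width(boot_b1)
    return (b0 - me0, b0 + me0), (b1 - me1, b1 + me1)
```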

C.I. for Intercept & Slope

Theoretical

  • Assumptions:
    • Linear Association
    • Normal Distribution of Residuals
    • Equal Variance of Residuals
  • \(t_{\alpha/2}\) from \(t\sim T(df=n-2)\)
  • \(ME_{b_i} = t_{\alpha/2} \cdot SE_{b_i}\)
  • \(b_i\) and \(SE_{b_i}\) are from the Coefficients table of the model summary

Coefficients

|             | Estimate | Std. Error | t value | Pr(>\|t\|) |
|-------------|----------|------------|---------|------------|
| (Intercept) | \(b_0\) | \(SE_{b_0}\) | \(t_{b_0}=\frac{b_0}{SE_{b_0}}\) | \(p\text{-value}_{b_0}\) |
| Explanatory | \(b_1\) | \(SE_{b_1}\) | \(t_{b_1}=\frac{b_1}{SE_{b_1}}\) | \(p\text{-value}_{b_1}\) |
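The Estimate, Std. Error, and t value columns can be reproduced from the raw data with the standard least-squares formulas \(s=\sqrt{SSE/(n-2)}\), \(SE_{b_1}=s/\sqrt{\sum(x-\bar x)^2}\), and \(SE_{b_0}=s\sqrt{1/n+\bar x^2/\sum(x-\bar x)^2}\). The Python sketch below uses hypothetical helper names. The t multiplier is passed in. For example, 4.303 is the 97.5th percentile of \(T(df=2)\), which is the right multiplier for \(n=4\).

```python
import math
import statistics


def coef_table(x, y):
    """Estimate, SE and t value for b0 and b1 (hypothetical helper)."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error, df = n - 2
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + xbar ** 2 / sxx)
    return {"b0": (b0, se_b0, b0 / se_b0), "b1": (b1, se_b1, b1 / se_b1)}


def coef_ci(estimate, se, t_mult):
    """b_i +/- t_mult * SE_{b_i}, with t_mult taken from T(df = n - 2)."""
    me = t_mult * se
    return estimate - me, estimate + me
```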

C.I. for Intercept & Slope

Recalculate the C.I.s by hand as \(b_i \pm t_{\alpha/2} \cdot SE_{b_i}\)

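Pairwise C.I.

Ad Hoc Follow-Up to a Single Test of Multiple Predictors

Account for \(\alpha\) Inflation (more tests \(\Rightarrow\) a higher chance of at least one Type I error)

  • Bonferroni:
    • Use \(\alpha/K\) for each interval
      • \(K\) is the number of pairwise tests
  • Tukey’s HSD
    • Uses the studentized range distribution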
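The Bonferroni bookkeeping is simple enough to show directly. With \(g\) groups there are \(K = g(g-1)/2\) pairs, and each pairwise interval is built at \(\alpha/K\). The Python sketch below uses a hypothetical helper name. Tukey's HSD is not shown because it needs the studentized range distribution.

```python
from itertools import combinations


def bonferroni_plan(groups, alpha=0.05):
    """List the pairwise comparisons and the Bonferroni-adjusted alpha (hypothetical helper)."""
    pairs = list(combinations(groups, 2))  # every pair of groups
    k = len(pairs)                         # K = number of pairwise tests
    alpha_each = alpha / k                 # per-comparison alpha
    return pairs, k, alpha_each, 1 - alpha_each  # last item: per-interval confidence level
```

For three groups at \(\alpha = 0.05\), \(K = 3\), so each pairwise interval uses \(\alpha \approx 0.0167\), or roughly 98.3% confidence.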

Factors Affecting C.I.

Main Ideas

  • Higher confidence level \(\Rightarrow\) wider interval
  • Larger sample size \(\Rightarrow\) narrower interval
  • More variability \(\Rightarrow\) wider interval
  • For two means: more balanced group sizes (same total \(n\)) \(\Rightarrow\) smaller SE
  • For regression slope: more spread in \(X\) \(\Rightarrow\) narrower interval for \(\beta_1\)
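Each of these factors can be checked with the formulas from earlier slides. The Python sketch below is illustrative and uses hypothetical helper names. To keep it short, it uses a large-sample normal (z) multiplier from `statistics.NormalDist` as a stand-in for \(t_{\alpha/2}\). The conclusions do not change, because the t multiplier also grows with the confidence level.

```python
import math
from statistics import NormalDist


def z_mult(level):
    """Large-sample stand-in for t_{alpha/2}: the normal quantile at 1 - (1 - level) / 2."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def me_one_mean(s, n, mult):
    """ME = multiplier * s / sqrt(n)."""
    return mult * s / math.sqrt(n)


def se_two_means(s_p, n1, n2):
    """Pooled SE for ybar2 - ybar1."""
    return s_p * math.sqrt(1 / n1 + 1 / n2)


def se_slope(s, x):
    """SE_{b1} = s / sqrt(sum((x - xbar)^2))."""
    xbar = sum(x) / len(x)
    return s / math.sqrt(sum((xi - xbar) ** 2 for xi in x))
```

For example, raising \(n\) from 100 to 400 halves the ME. Splitting 100 cases as 50/50 rather than 90/10 gives a smaller SE. Spreading \(x\) from \(\{4,5,6\}\) to \(\{0,5,10\}\) shrinks \(SE_{b_1}\).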

A Quick Demo

Wrap-Up Questions

  • If we want a narrower interval, which things can we control?
  • Why does asking for 99% confidence force a wider interval?
  • Why does larger sample size help even when the center stays the same?
  • Why is a slope estimate more precise when the \(X\) values are more spread out?