CourseKata Chapter 12

Confidence Interval (2)

Mansour Abdoli, PhD

Overview / Goals

C.I. for One Mean - Review
C.I. for Two Means
C.I. for Intercept & Slope
Pairwise C.I.
Factors Affecting C.I.

C.I. for One Mean - Review

Point Estimate \(\pm\) ME

Computing ME

Simulation

Resample
Compute \(\bar y\)
Repeat many times
Find middle C%: \[(LL, UL)\]
ME = \(\frac{UL-LL}{2}\)
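The simulation steps above can be sketched in Python using only the standard library. The data values, the function name `bootstrap_mean_ci`, the number of repetitions, and the seed are all assumptions made for illustration; this is one simple percentile approach, not necessarily the course's own code.

```python
import random
import statistics

def bootstrap_mean_ci(y, conf=0.95, reps=5000, seed=1):
    """Percentile bootstrap interval and margin of error for one mean."""
    rng = random.Random(seed)
    n = len(y)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(y, k=n)) for _ in range(reps))
    # Keep the middle C% by cutting (1 - conf)/2 from each tail.
    tail = (1 - conf) / 2
    ll = means[int(tail * reps)]
    ul = means[int((1 - tail) * reps) - 1]
    me = (ul - ll) / 2
    return ll, ul, me

# Made-up sample, for illustration only.
y = [12.1, 9.8, 11.4, 10.7, 13.2, 8.9, 10.1, 12.6, 11.0, 9.5]
ll, ul, me = bootstrap_mean_ci(y)
print(f"middle 95%: ({ll:.2f}, {ul:.2f}), ME = {me:.2f}")
```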

Theoretical

Check Assumption
- Normality or
- Large Sample Size
Compute SE = \(\frac{s}{\sqrt{n}}\)
Find \(t_{\alpha/2}\) multiplier
from \(t\sim T(df=n-1)\)
ME = \(t_{\alpha/2} \cdot SE\)
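A minimal sketch of the theoretical steps, assuming made-up data and a 95% confidence level. The multiplier 2.262 is the 95% value for \(df = 9\) copied from a standard \(t\) table, because the Python standard library has no \(t\) quantile function.

```python
import math
import statistics

# Made-up sample, for illustration only.
y = [12.1, 9.8, 11.4, 10.7, 13.2, 8.9, 10.1, 12.6, 11.0, 9.5]
n = len(y)
y_bar = statistics.fmean(y)
s = statistics.stdev(y)      # sample SD (divides by n - 1)
se = s / math.sqrt(n)        # SE = s / sqrt(n)

# Assumption: 95% multiplier for df = n - 1 = 9, taken from a t table.
t_mult = 2.262
me = t_mult * se
ci = (y_bar - me, y_bar + me)
print(f"y_bar = {y_bar:.2f}, SE = {se:.3f}, ME = {me:.3f}")
```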

C.I. for Two Means

Point Estimate \(\pm\) ME \[\bar y_2 - \bar y_1 \pm ME\]

Computing ME - By Simulation

Three ways to simulate a Distribution

Shuffle \(Y\)
- Models no association (the null distribution)
Resample dataset
- Group sizes can differ from the original
Resample within groups
- Keeps the original group sizes
The key is to find the ME under the same structure as the original data
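To make the difference between the three schemes concrete, here is a hedged Python sketch. The function names, the data, and the group labels are hypothetical. The point is only to show which schemes keep the group sizes fixed.

```python
import random

def shuffle_y(groups, y, rng):
    """Shuffle outcomes across rows: breaks any link between group and Y."""
    shuffled = y[:]
    rng.shuffle(shuffled)
    return groups, shuffled

def resample_dataset(groups, y, rng):
    """Resample whole rows with replacement: group sizes may change."""
    idx = rng.choices(range(len(y)), k=len(y))
    return [groups[i] for i in idx], [y[i] for i in idx]

def resample_within_groups(groups, y, rng):
    """Resample rows with replacement inside each group: sizes are kept."""
    new_groups, new_y = [], []
    for g in sorted(set(groups)):
        rows = [i for i, gi in enumerate(groups) if gi == g]
        for i in rng.choices(rows, k=len(rows)):
            new_groups.append(g)
            new_y.append(y[i])
    return new_groups, new_y

# Made-up data: 4 observations in group 1, 6 in group 2.
groups = [1] * 4 + [2] * 6
y = [5.1, 4.8, 6.0, 5.5, 7.2, 6.9, 7.8, 6.4, 7.1, 7.5]
rng = random.Random(2)
g_shuf, y_shuf = shuffle_y(groups, y, rng)
g_all, y_all = resample_dataset(groups, y, rng)
g_in, y_in = resample_within_groups(groups, y, rng)
print("group 1 size after resampling the dataset:", g_all.count(1))
print("group 1 size after resampling within groups:", g_in.count(1))
```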

ME and Sampling Distributions

Simulation

Resample (Find Sampling Dist.)
Compute \(b_1=\bar y_2-\bar y_1\)
Repeat many times
Find middle C%: \[(LL, UL)\]
ME = \(\frac{UL-LL}{2}\)
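A sketch of these steps in Python, assuming resampling within groups so that both group sizes stay fixed. The data, the function name `bootstrap_diff_ci`, and the seed are illustrative assumptions.

```python
import random
import statistics

def bootstrap_diff_ci(y1, y2, conf=0.95, reps=5000, seed=3):
    """Percentile bootstrap interval for b1 = mean(y2) - mean(y1)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each group separately, keeping its original size.
        r1 = rng.choices(y1, k=len(y1))
        r2 = rng.choices(y2, k=len(y2))
        diffs.append(statistics.fmean(r2) - statistics.fmean(r1))
    diffs.sort()
    tail = (1 - conf) / 2
    ll = diffs[int(tail * reps)]
    ul = diffs[int((1 - tail) * reps) - 1]
    return ll, ul, (ul - ll) / 2

# Made-up groups, for illustration only.
y1 = [5.1, 4.8, 6.0, 5.5]
y2 = [7.2, 6.9, 7.8, 6.4, 7.1, 7.5]
b1 = statistics.fmean(y2) - statistics.fmean(y1)
ll, ul, me = bootstrap_diff_ci(y1, y2)
print(f"b1 = {b1:.2f}, middle 95%: ({ll:.2f}, {ul:.2f}), ME = {me:.2f}")
```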

Computing ME - Theoretical

Check Assumptions

Normality or

Large Sample Sizes

Compute SE & DF

Equal Variance
- \(s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}\)
- \(DF = n_1+n_2-2\)
- SE = \(s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\)

Unequal Variance
- \(DF \approx \min(n_1-1, n_2-1)\) (a conservative choice)
- SE = \(\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\)

\(t_{\alpha/2}\) from \(t\sim T(df=DF)\)
ME = \(t_{\alpha/2} \cdot SE\)
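The two SE and DF recipes can be compared side by side. This is a sketch on made-up data; the pooled variance uses the usual \((n_i - 1)\) weights, and the 95% multipliers (2.306 for \(df = 8\), 3.182 for \(df = 3\)) are copied from a standard \(t\) table rather than computed.

```python
import math
import statistics

# Made-up groups, for illustration only.
y1 = [5.1, 4.8, 6.0, 5.5]
y2 = [7.2, 6.9, 7.8, 6.4, 7.1, 7.5]
n1, n2 = len(y1), len(y2)
v1, v2 = statistics.variance(y1), statistics.variance(y2)  # s1^2, s2^2

# Equal variance: pool the two sample variances.
df_pooled = n1 + n2 - 2
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
se_pooled = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)

# Unequal variance: keep the variances separate, use the conservative DF.
df_unequal = min(n1 - 1, n2 - 1)
se_unequal = math.sqrt(v1 / n1 + v2 / n2)

# Assumption: 95% multipliers read from a t table for df = 8 and df = 3.
me_pooled = 2.306 * se_pooled
me_unequal = 3.182 * se_unequal
print(f"pooled:  DF = {df_pooled}, SE = {se_pooled:.3f}, ME = {me_pooled:.3f}")
print(f"unequal: DF = {df_unequal}, SE = {se_unequal:.3f}, ME = {me_unequal:.3f}")
```

The smaller DF in the unequal-variance case gives a larger multiplier, which is why that choice is described as conservative.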

Computing ME - Theoretical

‘confint()’ computes the confidence interval from a fitted model

C.I. for Intercept & Slope

Simulation

Resample Data

Compute \(b_0\) and \(b_1\)

Repeat many times

For \(\beta_0\):
- Find middle C%: \[(LL_{b_0}, UL_{b_0})\]
- \(ME_{b_0} = \frac{UL_{b_0}-LL_{b_0}}{2}\)
- \(b_0\pm ME_{b_0}\)

For \(\beta_1\):
- Find middle C%: \[(LL_{b_1}, UL_{b_1})\]
- \(ME_{b_1} = \frac{UL_{b_1}-LL_{b_1}}{2}\)
- \(b_1\pm ME_{b_1}\)
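The resampling recipe for \(b_0\) and \(b_1\) might look like the following in Python. The data, the helper names `fit_line` and `percentile_interval`, and the choice of 2000 repetitions are assumptions for illustration. Rows are resampled as \((x, y)\) pairs.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def percentile_interval(values, conf=0.95):
    """Middle C% of the values, plus ME = (UL - LL) / 2."""
    values = sorted(values)
    tail = (1 - conf) / 2
    ll = values[int(tail * len(values))]
    ul = values[int((1 - tail) * len(values)) - 1]
    return ll, ul, (ul - ll) / 2

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 9.0, 9.7, 11.2]
b0_hat, b1_hat = fit_line(x, y)

rng = random.Random(4)
b0s, b1s = [], []
while len(b1s) < 2000:
    idx = rng.choices(range(len(x)), k=len(x))   # resample rows
    xs = [x[i] for i in idx]
    if len(set(xs)) < 2:
        continue  # a line cannot be fit to a single distinct x value
    ys = [y[i] for i in idx]
    rb0, rb1 = fit_line(xs, ys)
    b0s.append(rb0)
    b1s.append(rb1)

ll_b0, ul_b0, me_b0 = percentile_interval(b0s)
ll_b1, ul_b1, me_b1 = percentile_interval(b1s)
print(f"b0 = {b0_hat:.3f} +/- {me_b0:.3f}")
print(f"b1 = {b1_hat:.3f} +/- {me_b1:.3f}")
```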

C.I. for Intercept & Slope

Theoretical

Assumptions:
- Linear Association
- Normal Distribution
- Equal Variance
\(t_{\alpha/2}\) from \(t\sim T(df=n-2)\)
\(ME_{b_i} = t_{\alpha/2} \cdot SE_{b_i}\)

\(b_i\) and \(SE_{b_i}\) come from the summary table

Coefficients
	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	\(b_0\)	\(SE_{b_0}\)	\(t_{b_0}=\frac{b_0}{SE_{b_0}}\)	\(p\text{-value}_{b_0}\)
Explanatory	\(b_1\)	\(SE_{b_1}\)	\(t_{b_1}=\frac{b_1}{SE_{b_1}}\)	\(p\text{-value}_{b_1}\)
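The Estimate, Std. Error, and t value columns can be reproduced by hand for a simple regression. The sketch below uses the standard simple-regression formulas for \(SE_{b_0}\) and \(SE_{b_1}\), which are not stated on the slides and are included here as an assumption, together with made-up data and a 95% multiplier of 2.306 for \(df = n - 2 = 8\) taken from a \(t\) table.

```python
import math
import statistics

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 9.0, 9.7, 11.2]
n = len(x)
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Estimates
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Residual standard error with df = n - 2
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

# Standard errors of the coefficients
se_b1 = s_e / math.sqrt(sxx)
se_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)

# t values as in the summary table
t_b0, t_b1 = b0 / se_b0, b1 / se_b1

# Assumption: 95% multiplier for df = 8, read from a t table.
t_mult = 2.306
ci_b0 = (b0 - t_mult * se_b0, b0 + t_mult * se_b0)
ci_b1 = (b1 - t_mult * se_b1, b1 + t_mult * se_b1)
print(f"b0 = {b0:.3f}, SE = {se_b0:.3f}, t = {t_b0:.2f}")
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, t = {t_b1:.2f}")
```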

C.I. for Intercept & Slope

Recalculate C.I.s:

Pairwise C.I.

Post Hoc Comparisons after a Single Test of Multiple Predictors

Account for \(\alpha\) Inflation

Bonferroni:
- Use \(\alpha/K\)
  - \(K\) is the number of pairwise comparisons
Tukey’s HSD
- Use the studentized range distribution
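A short sketch of the Bonferroni adjustment, assuming four groups and an overall \(\alpha = 0.05\). To show the effect on interval width without a \(t\) table, the multipliers below use the normal distribution as a large-sample stand-in for \(t\), which is an assumption of this example. Tukey's HSD is not shown because the Python standard library has no studentized range distribution.

```python
import math
from statistics import NormalDist

n_groups = 4                   # hypothetical number of groups
K = math.comb(n_groups, 2)     # number of pairwise comparisons
alpha = 0.05                   # overall (family-wise) alpha
alpha_each = alpha / K         # Bonferroni: alpha used for each comparison
conf_each = 1 - alpha_each     # confidence level of each pairwise interval

# Large-sample stand-in for the t multiplier (assumption: normal quantiles).
z_plain = NormalDist().inv_cdf(1 - alpha / 2)
z_adj = NormalDist().inv_cdf(1 - alpha_each / 2)
print(f"K = {K}, alpha per comparison = {alpha_each:.4f}")
print(f"multiplier: {z_plain:.3f} unadjusted, {z_adj:.3f} adjusted")
```

The adjusted multiplier is larger, so each pairwise interval is wider than an unadjusted interval would be.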

Factors Affecting C.I.

Main Ideas

Higher confidence level \(\Rightarrow\) wider interval
Larger sample size \(\Rightarrow\) narrower interval
More variability \(\Rightarrow\) wider interval
For two means: more balanced group sizes \(\Rightarrow\) smaller SE
For regression slope: more spread in \(X\) \(\Rightarrow\) narrower interval for \(b_1\)
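Each of these ideas follows from the ME and SE formulas and can be checked numerically. The sketch below uses hypothetical helper names and made-up inputs, and it uses normal quantiles as a large-sample stand-in for the \(t\) multiplier.

```python
import math
from statistics import NormalDist

def z_multiplier(conf):
    """Large-sample multiplier for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def me_one_mean(s, n, conf=0.95):
    """ME = multiplier * s / sqrt(n)."""
    return z_multiplier(conf) * s / math.sqrt(n)

def se_two_means(s, n1, n2):
    """Pooled-style SE for a difference in means with common SD s."""
    return s * math.sqrt(1 / n1 + 1 / n2)

def se_slope(s_e, x):
    """SE of a regression slope: s_e / sqrt(sum of squared x deviations)."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return s_e / math.sqrt(sxx)

print("confidence 95% vs 99%:", me_one_mean(10, 25, 0.95), me_one_mean(10, 25, 0.99))
print("n = 25 vs n = 100:    ", me_one_mean(10, 25), me_one_mean(10, 100))
print("s = 10 vs s = 20:     ", me_one_mean(10, 25), me_one_mean(20, 25))
print("50/50 vs 20/80 split: ", se_two_means(10, 50, 50), se_two_means(10, 20, 80))
print("narrow vs wide X:     ", se_slope(2, [4, 5, 6, 7]), se_slope(2, [1, 4, 7, 10]))
```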

A Quick Demo

Wrap-Up Questions

If we want a narrower interval, which things can we control?
Why does asking for 99% confidence force a wider interval?
Why does larger sample size help even when the center stays the same?
Why is a slope estimate more precise when the \(X\) values are more spread out?