CourseKata Chapter 10

Logic of Inference

Mansour Abdoli, PhD

Overview / Goals

Modeling Data vs DGP
Sampling Distribution
Null Distribution of \(b_1: (\mu_2-\mu_1)\)
Unlikely (unexpected) Events
\(\alpha\) vs. p-value
Testing \(b_1: \beta_1\)

Inference vs Fit

Modeling Data vs DGP

Modeling Data (Fit)

Observed:
- DATA = Model + Error
- More complex, Better Fit
  - Fit: PRE
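
As a minimal sketch of how PRE (Proportional Reduction in Error) measures fit, the Python snippet below compares a two-group model with the empty model. The data values and the helper names `sum_of_squares` and `pre_two_groups` are hypothetical, chosen only for illustration.

```python
from statistics import mean

def sum_of_squares(values, predictions):
    """Sum of squared errors between observed values and model predictions."""
    return sum((y - p) ** 2 for y, p in zip(values, predictions))

def pre_two_groups(y, group):
    """PRE of the group model relative to the empty model (grand mean only)."""
    grand_mean = mean(y)
    group_means = {
        g: mean(v for v, gg in zip(y, group) if gg == g) for g in set(group)
    }
    ss_empty = sum_of_squares(y, [grand_mean] * len(y))
    ss_model = sum_of_squares(y, [group_means[g] for g in group])
    # Share of the empty model's error that the group model removes
    return (ss_empty - ss_model) / ss_empty

# Hypothetical data: six observations in two groups
y = [4, 6, 5, 9, 11, 10]
group = ["a", "a", "a", "b", "b", "b"]
pre = pre_two_groups(y, group)
print(round(pre, 3))
```

A PRE near 1 means the more complex model removes most of the empty model's error; a PRE near 0 means it removes almost none.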

Modeling DGP (Inference)

Unknown:
- Hypothesis (Claim)
- Null Hypothesis
  - Chance Model

How well does a model estimate the DGP?
- We cannot be sure, but we can check that the result is not due to chance => Test of Hypothesis
- We can check that it is useful for prediction => Cross-validation

Models

Quantitative Response: \(Y = Model + Error\)
- Categorical Explanatory \(\Rightarrow b_0=\bar y_1\), \(b_1=\bar y_2-\bar y_1\)
- Quantitative Explanatory \(\Rightarrow b_0=\hat y_{x=0}\), \(b_1=\text{slope}\)
Models have assumptions:
- We focus on the parameter(s): \(\beta_0\) or \(\beta_1\)
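
The two sets of estimates above can be sketched in Python as follows. This is an illustration under assumptions: the data are made up, the helper names `fit_two_groups` and `fit_line` are hypothetical, and the line is fit by ordinary least squares.

```python
from statistics import mean

def fit_two_groups(y, group, reference):
    """Categorical explanatory variable with two groups.
    b0 = mean of the reference group, b1 = other group's mean minus b0."""
    ref = [v for v, g in zip(y, group) if g == reference]
    other = [v for v, g in zip(y, group) if g != reference]
    b0 = mean(ref)
    b1 = mean(other) - b0
    return b0, b1

def fit_line(x, y):
    """Quantitative explanatory variable (least squares).
    b1 = slope, b0 = predicted y at x = 0."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data
y = [4, 6, 5, 9, 11, 10]
group = [1, 1, 1, 2, 2, 2]
x = [1, 2, 3, 4, 5, 6]

group_b0, group_b1 = fit_two_groups(y, group, reference=1)
line_b0, line_b1 = fit_line(x, y)
```

In both cases \(b_0\) and \(b_1\) are computed from the sample; they are estimates of the unknown parameters \(\beta_0\) and \(\beta_1\).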

Distributions

Population Distribution (DGP)
- Empty Model: \(\beta_0 + \varepsilon\)
- Fixed shape, center and spread
Sample Distribution
- Data: \(\bar y + e\)
- Everything may change per sample
Distribution of a Sample Statistic (Sampling Distribution)
- Sample Mean: \(\beta_0 + \epsilon\)
- Spread changes with sample size.
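
A small simulation can make the third distribution concrete. The sketch below assumes a normal DGP with mean 50 and standard deviation 10 (both hypothetical values), draws many samples, and records each sample's mean; `sample_means` is an illustrative helper name.

```python
import random
from statistics import mean, stdev

def sample_means(population_mean, population_sd, n, reps, rng):
    """Draw `reps` samples of size n from a normal DGP; return each sample's mean."""
    return [
        mean(rng.gauss(population_mean, population_sd) for _ in range(n))
        for _ in range(reps)
    ]

rng = random.Random(10)  # fixed seed so the run is repeatable

# Same DGP, two different sample sizes
means_small = sample_means(50, 10, n=10, reps=2000, rng=rng)
means_large = sample_means(50, 10, n=100, reps=2000, rng=rng)

# Spread of the sampling distribution for each sample size
spread_small = stdev(means_small)
spread_large = stdev(means_large)
print(round(spread_small, 2), round(spread_large, 2))
```

Both sets of sample means center near the DGP mean, but the means from larger samples vary less from sample to sample.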

Quantitative ~ Categorical

Null Distribution of \(b_1: (\mu_2-\mu_1)\)

Null: Empty Model
\(Y\) is independent of \(X\)
Simulate Null Distribution of \(b_1\)
- Shuffle \(Y\)
- Compute \(b_1=\bar y_2-\bar y_1\)
- Repeat many times
Visualize with a histogram
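
The shuffle steps above can be sketched in Python as follows. The data and the helper names `b1` and `null_distribution` are hypothetical; groups are coded 1 and 2 for this illustration only.

```python
import random
from statistics import mean

def b1(y, group):
    """Difference in group means: mean of group 2 minus mean of group 1."""
    g1 = [v for v, g in zip(y, group) if g == 1]
    g2 = [v for v, g in zip(y, group) if g == 2]
    return mean(g2) - mean(g1)

def null_distribution(y, group, reps, rng):
    """Shuffle y to break any link with group, recompute b1, and repeat."""
    shuffled = list(y)  # copy so the original data stay untouched
    results = []
    for _ in range(reps):
        rng.shuffle(shuffled)
        results.append(b1(shuffled, group))
    return results

# Hypothetical data: eight observations in two groups
y = [4, 6, 5, 7, 9, 11, 10, 8]
group = [1, 1, 1, 1, 2, 2, 2, 2]

rng = random.Random(10)  # fixed seed so the run is repeatable
null_b1 = null_distribution(y, group, reps=1000, rng=rng)
observed_b1 = b1(y, group)
```

The list `null_b1` holds the values of \(b_1\) produced when \(Y\) is independent of \(X\); a histogram of it, with `observed_b1` marked, shows where the observed statistic falls.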

Null Distribution Example

Unlikely (unexpected) Events

Observed statistic far from expected value
- Distance is not universal
  - Use standardized value
  - Use Probability
If tail probability is low,
- observed statistic is unlikely
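
Both ways of expressing "far" can be computed from a simulated null distribution. In the sketch below, the null values are a stand-in drawn from a normal distribution (an assumption made to keep the example short), the observed value of 5.0 is made up, and the helper names are hypothetical.

```python
import random
from statistics import mean, stdev

def standardized_distance(observed, null_values):
    """Distance from the center of the null distribution, in standard deviations."""
    return (observed - mean(null_values)) / stdev(null_values)

def right_tail_probability(observed, null_values):
    """Share of null values at least as large as the observed statistic."""
    return sum(v >= observed for v in null_values) / len(null_values)

rng = random.Random(10)  # fixed seed so the run is repeatable
# Stand-in for a shuffled null distribution (assumption: roughly normal, centered at 0)
null_values = [rng.gauss(0, 2) for _ in range(5000)]
observed = 5.0  # hypothetical observed statistic

z = standardized_distance(observed, null_values)
tail = right_tail_probability(observed, null_values)
print(round(z, 2), tail)
```

A raw distance of 5 means little by itself; the standardized value and the tail probability put it on scales that can be compared across studies.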

Example

\(\alpha\) vs. \(p\)-value

How far is far?

How low is an unusual probability?
- Depends! But typically 5%.
- We call this Significance Level (\(\alpha\))
Far:
- \(p\)-value\(< \alpha\)
- Observed statistic in \(\alpha\) portion of tail(s)
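
The two descriptions of "far" can be checked against each other in a short sketch. It assumes a right-tail test, a stand-in null distribution drawn from a normal distribution, and a made-up observed value; `upper_tail_cutoff` and `right_tail_p_value` are hypothetical helper names.

```python
import random

def upper_tail_cutoff(null_values, alpha):
    """Smallest value among the top alpha share of the null distribution."""
    ordered = sorted(null_values)
    n_tail = round(len(ordered) * alpha)  # how many values make up the tail
    return ordered[len(ordered) - n_tail]

def right_tail_p_value(observed, null_values):
    """Share of null values at least as large as the observed statistic."""
    return sum(v >= observed for v in null_values) / len(null_values)

rng = random.Random(10)  # fixed seed so the run is repeatable
# Stand-in for a shuffled null distribution (assumption: roughly normal, centered at 0)
null_values = [rng.gauss(0, 2) for _ in range(5000)]

alpha = 0.05     # significance level
observed = 5.0   # hypothetical observed statistic

cutoff = upper_tail_cutoff(null_values, alpha)
p = right_tail_p_value(observed, null_values)

in_tail = observed >= cutoff   # statistic lies in the alpha portion of the tail
reject_null = p < alpha        # p-value is below the significance level
```

Here both checks agree: the observed statistic lies beyond the cutoff, and its p-value is smaller than \(\alpha\).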

Testing \(b_1: (\mu_2-\mu_1)\)

Null Hypothesis: Empty Model DGP
- Reject this if the observed statistic is unusually far.
In what direction?
- The Alternative:
  - \(\beta_1>0\): right tail
  - \(\beta_1<0\): left tail
  - \(\beta_1\ne0\): both tails
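
The choice of tail can be sketched as a single function. The tiny null distribution below is a hypothetical list kept short for readability, and the two-tail branch assumes the null distribution is centered at 0; the function name `p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are illustrative choices.

```python
def p_value(observed, null_values, alternative):
    """Tail probability of the observed statistic under the null distribution."""
    if alternative == "greater":        # H_a: beta_1 > 0, right tail
        extreme = sum(v >= observed for v in null_values)
    elif alternative == "less":         # H_a: beta_1 < 0, left tail
        extreme = sum(v <= observed for v in null_values)
    elif alternative == "two-sided":    # H_a: beta_1 != 0, both tails
        extreme = sum(abs(v) >= abs(observed) for v in null_values)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return extreme / len(null_values)

# Hypothetical null distribution of b1 (ten shuffled values) and observed b1
null_values = [-3, -2, -1, -1, 0, 0, 1, 1, 2, 3]
observed = 2

p_right = p_value(observed, null_values, "greater")
p_left = p_value(observed, null_values, "less")
p_both = p_value(observed, null_values, "two-sided")
```

The same observed statistic gives a different p-value under each alternative, so the alternative should be chosen before looking at the data.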

Example: One-Tail Tests

\(H_a: \beta_1>0\)

\(H_a: \beta_1<0\)

Example: Two-Tail Tests

\(H_a: \beta_1\ne 0\)

\(F\) p-value

\(F\) and \(t\) Distributions

\(F\) statistic in ANOVA:
- Follows \(F(df_{num}, df_{denom})\)
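
As a minimal sketch, the \(F\) statistic for a group model can be computed from sums of squares. The data and the helper name `f_statistic` are hypothetical; the degrees of freedom use the usual one-way ANOVA convention (number of groups minus 1 for the numerator, sample size minus number of groups for the denominator).

```python
from statistics import mean

def f_statistic(y, group):
    """F = (SS_model / df_num) / (SS_error / df_denom) for a group model."""
    levels = set(group)
    grand_mean = mean(y)
    group_means = {
        g: mean(v for v, gg in zip(y, group) if gg == g) for g in levels
    }
    ss_total = sum((v - grand_mean) ** 2 for v in y)               # error of the empty model
    ss_error = sum((v - group_means[g]) ** 2 for v, g in zip(y, group))
    ss_model = ss_total - ss_error                                  # error removed by the group model
    df_num = len(levels) - 1
    df_denom = len(y) - len(levels)
    f_value = (ss_model / df_num) / (ss_error / df_denom)
    return f_value, df_num, df_denom

# Hypothetical data: six observations in two groups
y = [4, 6, 5, 9, 11, 10]
group = ["a", "a", "a", "b", "b", "b"]
f_value, df_num, df_denom = f_statistic(y, group)
```

The p-value is then the right-tail area beyond `f_value` under the \(F(df_{num}, df_{denom})\) distribution, or the share of shuffled \(F\) values at least that large.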

\(F\) p-value

Quantitative ~ Quantitative

Testing \(b_1: \beta_1\)
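
With a quantitative explanatory variable, the same shuffle logic can be applied to the slope. The sketch below is an illustration under assumptions: the data are made up, the slope is fit by least squares, the test is two-tailed, and `slope` and `shuffle_p_value` are hypothetical helper names.

```python
import random
from statistics import mean

def slope(x, y):
    """Least-squares slope b1 of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def shuffle_p_value(x, y, reps, rng):
    """Two-tail p-value: share of shuffled slopes at least as far from 0 as the observed one."""
    observed = slope(x, y)
    shuffled = list(y)  # copy so the original data stay untouched
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # break any link between x and y
        if abs(slope(x, shuffled)) >= abs(observed):
            extreme += 1
    return observed, extreme / reps

# Hypothetical data with a clear upward trend
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1]

rng = random.Random(10)  # fixed seed so the run is repeatable
observed_b1, p = shuffle_p_value(x, y, reps=2000, rng=rng)
```

A small p-value here indicates that a slope this far from 0 would rarely arise if \(\beta_1 = 0\).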