CourseKata Chapter 8

PRE, F, Cohen’s d

Mansour Abdoli, PhD

Overview / Goals

By the end of today, you can:

fit and interpret multi‑group (3+ groups) models using lm()
compare group models using PRE, df, MS, and the F ratio
describe and compute effect sizes (mean differences, PRE, Cohen’s d)
use shuffle/simulation to ask whether an effect could plausibly be due to randomness

Review: Group Models

Data:
- Outcome: \(Y\) (a numerical variable)
- Explanatory: a categorical variable
Models:
- Empty model: one mean for everyone
- Group model: separate mean for each group.
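
A minimal sketch of the two models, written in Python rather than the R `lm()` workflow used in the course. The thumb lengths and the names `thumb` and `height` are hypothetical toy values, not course data. The empty model predicts the grand mean for everyone; the group model predicts each group's own mean.

```python
from statistics import mean

# Hypothetical toy data: thumb lengths (mm) and a two-level group label.
thumb = [58.0, 60.0, 62.0, 64.0, 66.0, 68.0]
height = ["short", "short", "short", "tall", "tall", "tall"]

# Empty model: every observation is predicted by the grand mean.
grand_mean = mean(thumb)

# Group model: each observation is predicted by its own group's mean.
group_means = {
    g: mean(y for y, h in zip(thumb, height) if h == g)
    for g in sorted(set(height))
}

print(grand_mean)   # 63.0
print(group_means)  # {'short': 60.0, 'tall': 66.0}
```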

Example

Comparing Models

Model Measures:

Measure   Gender   Interest
SSE       10546    11567
PRE       0.1123   0.026324
F         19.609   2.0818

Which one is better?

Effect of Increasing Levels

Example: 2-Level Explanatory

\(Y_i = b_0 + b_1 \cdot X_{1i} + e_i\)
- \(b_0 = \text{mean}(\text{Thumb}_\text{short})\)
- \(b_1 = \text{mean}(\text{Thumb}_\text{tall})-\text{mean}(\text{Thumb}_\text{short})\)
- \(X_{1i} = 1 \text{ if Height}==\text{tall}, \; 0 \text{ otherwise}\)
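
The two-level coding can be checked by hand. The Python sketch below uses hypothetical toy data (not the course data set) and computes \(b_0\) and \(b_1\) directly from group means instead of calling a model-fitting function.

```python
from statistics import mean

# Hypothetical toy data (not from the course data set).
thumb = [58.0, 60.0, 62.0, 64.0, 66.0, 68.0]
height = ["short", "short", "short", "tall", "tall", "tall"]

# Dummy variable: X1 = 1 for tall, 0 for short (the reference group).
x1 = [1 if h == "tall" else 0 for h in height]

# b0 is the mean of the reference group; b1 is the tall-minus-short difference.
b0 = mean(y for y, x in zip(thumb, x1) if x == 0)
b1 = mean(y for y, x in zip(thumb, x1) if x == 1) - b0

# Model predictions b0 + b1 * X1 reproduce the two group means.
predicted = [b0 + b1 * x for x in x1]
print(b0, b1)     # 60.0 6.0
print(predicted)  # [60.0, 60.0, 60.0, 66.0, 66.0, 66.0]
```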

Example: 3-Level Explanatory

\(Y_i = b_0 + b_1 \cdot X_{1i} + b_2 \cdot X_{2i} + e_i\)
- \(b_0 = \text{mean}(\text{Thumb}_\text{Short})\)
- \(b_1 = \text{mean}(\text{Thumb}_\text{Med.})-\text{mean}(\text{Thumb}_\text{Short})\)
  - \(X_{1i} = 1 \text{ if Height}==\text{Med.}, \; 0 \text{ otherwise}\)
- \(b_2 = \text{mean}(\text{Thumb}_\text{Tall})-\text{mean}(\text{Thumb}_\text{Short})\)
  - \(X_{2i} = 1 \text{ if Height}==\text{Tall}, \; 0 \text{ otherwise}\)
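
With three levels, two dummy variables are enough, because the reference group (Short) is the case where both are 0. A Python sketch on hypothetical toy data, where the helper `group_mean` is an illustrative name:

```python
from statistics import mean

# Hypothetical toy data with three height levels.
thumb = [56.0, 58.0, 60.0, 60.0, 62.0, 64.0, 64.0, 66.0, 68.0]
height = ["Short"] * 3 + ["Med."] * 3 + ["Tall"] * 3

def group_mean(level):
    """Mean thumb length for one height level."""
    return mean(y for y, h in zip(thumb, height) if h == level)

# Short is the reference group, so it gets no dummy variable.
b0 = group_mean("Short")
b1 = group_mean("Med.") - b0
b2 = group_mean("Tall") - b0

# Two dummy variables encode three levels.
x1 = [int(h == "Med.") for h in height]
x2 = [int(h == "Tall") for h in height]
predicted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

print(b0, b1, b2)  # 58.0 4.0 8.0
```

Each prediction equals the mean of that observation's group, so the three-group model is still "one mean per group", just written relative to a baseline.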

Comparing Model Measures

Notice anything?

Level Size and Model Measures

SSE (SSR):

How much of the variation the model leaves unexplained.

PRE = (SST - SSE)/SST = 1 - SSE/SST

Proportion of variability explained, relative to the empty model

More levels make a more complex model, which usually explains more variability

F = MSM/MSE

Ratio of average explained variability to average unexplained variability
- Less sensitive to a larger number of levels
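
The three measures can be computed from scratch. This Python sketch uses hypothetical toy data and takes \(df_{\text{Model}}\) as the number of groups minus 1 and \(df_{\text{Error}}\) as \(n\) minus the number of groups, which is the usual convention for a one-factor group model.

```python
from statistics import mean

# Hypothetical toy data (not from the course data set).
thumb = [58.0, 60.0, 62.0, 64.0, 66.0, 68.0]
height = ["short", "short", "short", "tall", "tall", "tall"]

grand_mean = mean(thumb)
group_means = {
    g: mean(y for y, h in zip(thumb, height) if h == g) for g in set(height)
}

# SST: squared error left by the empty model.
sst = sum((y - grand_mean) ** 2 for y in thumb)
# SSE: squared error left by the group model.
sse = sum((y - group_means[h]) ** 2 for y, h in zip(thumb, height))
# SS Model: the error the group model removes.
ssm = sst - sse

df_model = len(group_means) - 1
df_error = len(thumb) - len(group_means)

pre = ssm / sst
f_ratio = (ssm / df_model) / (sse / df_error)
print(sst, sse, round(pre, 4), f_ratio)  # 70.0 16.0 0.7714 13.5
```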

The F Ratio

Why we need F

PRE alone doesn’t “penalize” complexity.

Adding parameters almost always reduces SSE.
- Is the reduction big enough to be worth the extra parameters?

F ratio \[F = \frac{MS_{\text{Model}}}{MS_{\text{Error}}}\]

Large \(F\) means: the error reduced per added parameter is large relative to the error remaining per degree of freedom.
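
Because \(SS_{\text{Model}} = PRE \cdot SST\) and \(SSE = (1 - PRE) \cdot SST\), the F ratio can be rewritten as \(F = \frac{PRE/df_{\text{Model}}}{(1-PRE)/df_{\text{Error}}}\). The Python sketch below applies this to the PRE values in the Model Measures table. The degrees of freedom are an assumption: the slides do not state them, and the values used here (1 and 155 for Gender, 2 and 154 for Interest) were chosen because they reproduce the reported F values up to rounding.

```python
def f_from_pre(pre, df_model, df_error):
    """F ratio implied by PRE and the model/error degrees of freedom."""
    return (pre / df_model) / ((1 - pre) / df_error)

# PRE values are from the Model Measures table; the df values are ASSUMED
# (1 and 155 for Gender, 2 and 154 for Interest), not stated in the slides.
f_gender = f_from_pre(0.1123, df_model=1, df_error=155)
f_interest = f_from_pre(0.026324, df_model=2, df_error=154)

print(round(f_gender, 2), round(f_interest, 2))  # 19.61 2.08
```

Under these assumed df, the Interest model spends two parameters to get a much smaller PRE, so its F is far lower than the Gender model's.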

Effect Size in Group Models

Why effect size?

With large \(n\), tiny effects can look convincing.
Effect size says: How big is the difference?

Common group-model effect sizes:

Mean differences (model parameters)
PRE (variance explained relative to empty model)
Cohen’s \(d\) (standardized mean difference)

Cohen’s d (two groups)

Standardized Difference: \[d = \frac{\bar Y_1 - \bar Y_2}{s_{pooled}}\]
- \(s_{pooled}^2 = MSE = \frac{SSE}{df_E}\)
- In general: \(s_{pooled}^2 = \frac{df_1\cdot s_1^2+df_2\cdot s_2^2}{df_1+df_2}\)
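
A Python sketch of the calculation on hypothetical toy data. It computes the pooled variance both ways (the df-weighted formula and MSE from the group model) to show that they agree, then standardizes the mean difference.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical toy data for two groups.
short = [58.0, 60.0, 62.0]
tall = [64.0, 66.0, 68.0]

df1, df2 = len(tall) - 1, len(short) - 1

# Pooled variance: df-weighted average of the two sample variances.
s2_pooled = (df1 * variance(tall) + df2 * variance(short)) / (df1 + df2)

# Same quantity via the group model: MSE = SSE / df_E.
sse = sum((y - mean(tall)) ** 2 for y in tall) + sum(
    (y - mean(short)) ** 2 for y in short
)
mse = sse / (df1 + df2)

# Cohen's d: mean difference in units of the pooled standard deviation.
d = (mean(tall) - mean(short)) / sqrt(s2_pooled)
print(s2_pooled, mse, d)  # 4.0 4.0 3.0
```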

Verify the calculation.

Effect size ≠ causality

A large difference between groups:
- may be a practically important effect
- is not proof of a causal effect

Group models show association
Causal claims need stronger design / evidence

Thinking about DGP

Observed Sample to Population Inference

The observed effect could be:
- a real systematic difference,
- or random variation (noise)

A Test Approach
- Assume random variation only (the true effect = 0)
  - Null Hypothesis (\(H_0\))
- Find the null distribution of the effect
  - Sampling distribution under the null
- Measure how often null effects are at least as large as the observed one

Shuffle-based Simulation (Conceptual Inference)

Shuffle logic

Keep the same \(Y\) values
Randomly reassign them to groups
Refit the model
Record the effect size (e.g., \(b_1\) or PRE)
Repeat many times to get a shuffle distribution
- A.k.a. Null Distribution
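
The shuffle steps above can be sketched in Python. The data, the helper name `b1`, the seed, and the choice of 1000 shuffles are all hypothetical choices for illustration. Shuffling the outcome values while keeping the group labels fixed is equivalent to randomly reassigning values to groups.

```python
import random
from statistics import mean

# Hypothetical toy data: outcome values and two-level group labels.
thumb = [55.0, 57.0, 58.0, 60.0, 61.0, 63.0, 59.0, 61.0, 62.0, 64.0, 66.0, 68.0]
height = ["short"] * 6 + ["tall"] * 6

def b1(y, groups):
    """Tall-minus-short mean difference, i.e. b1 in the two-group model."""
    tall = [v for v, g in zip(y, groups) if g == "tall"]
    short = [v for v, g in zip(y, groups) if g == "short"]
    return mean(tall) - mean(short)

observed_b1 = b1(thumb, height)

rng = random.Random(8)  # fixed seed so the sketch is reproducible
shuffled_b1 = []
for _ in range(1000):
    shuffled = thumb[:]      # keep the same Y values
    rng.shuffle(shuffled)    # randomly reassign them to groups
    shuffled_b1.append(b1(shuffled, height))  # "refit" and record the effect

print(round(observed_b1, 3))
print(round(mean(shuffled_b1), 3))  # should sit near 0 under the null
```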

Example: Two-group Model

Visualize the shuffle distribution

Interpreting the shuffle plot

The histogram:
- effects expected by chance
  (if the DGP had no group effect)
The vertical line is the observed effect

If the observed effect is far out in the tails:
- the “no-effect” DGP is an unlikely explanation for the data
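
"Far out in the tails" can be quantified as the share of shuffled effects at least as extreme as the observed one. A Python sketch, assuming a two-sided comparison on absolute values; the function name `tail_proportion` and the numbers are hypothetical.

```python
def tail_proportion(null_effects, observed):
    """Share of null effects at least as far from 0 as the observed effect."""
    extreme = sum(1 for e in null_effects if abs(e) >= abs(observed))
    return extreme / len(null_effects)

# Hypothetical shuffled b1 values and observed effect, for illustration only.
null_effects = [-3.0, -1.0, 0.0, 1.0, 2.0, 4.0, -5.0, 0.5]
observed = 4.0

p = tail_proportion(null_effects, observed)
print(p)  # 0.25
```

A small proportion means the observed effect would be rare if only shuffling (noise) were at work. A real analysis would use many more shuffles than the eight values listed here.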

Wrap-up

What Chapter 8 emphasizes

Multi-group models: baseline mean and group changes
Model comparison: fit (PRE) + complexity (df)
F ratio: a complexity-aware comparison tool
Effect sizes: statistical vs. practical difference
Shuffle simulation: could the effect be due to noise?