Coursekata

Simulation-Based Inference Summary

How to use this page

This page summarizes the simulation steps for confidence intervals and hypothesis tests in common settings.

Two big ideas appear throughout:

A confidence interval uses a bootstrap distribution centered near the observed statistic.
A hypothesis test uses a null distribution centered at the null value.

Tip

General pattern

For a confidence interval:

Compute the observed sample statistic.
Resample with replacement from the sample.
Recompute the statistic for each bootstrap sample.
Repeat many times to build a bootstrap distribution.
Use the middle 95% of the bootstrap distribution as the confidence interval directly, or use the distribution to estimate a margin of error.

For a hypothesis test:

State the null and alternative hypotheses.
Compute the observed sample statistic.
Simulate data under the null model.
Recompute the statistic for each simulated sample.
Repeat many times to build a null distribution.
Find the proportion of simulated statistics at least as extreme as the observed statistic, or identify rejection region(s).
Make a decision and state a conclusion in context.

One proportion

Parameter and statistic

Parameter: \(p\)
Statistic: \(\hat{p}\)
Sample size: \(n\)

Confidence interval

Record the observed sample proportion \(\hat{p}\).
Represent the sample of size \(n\) as a set of 0s and 1s.
Generate one bootstrap sample proportion:
- Resample (sample with replacement) from the observed sample, using the same sample size \(n\).
- Compute the proportion of 1s in the resample.
Repeat many times to build a bootstrap distribution of \(\hat{p}\).
Use the middle 95% of the simulated proportions, or compute a margin of error \(ME\) from the bootstrap distribution.
Build and interpret the interval as plausible values for the population proportion.
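
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the data (24 successes out of 40), the helper names `middle_95` and `bootstrap_proportions`, and the choice of roughly two bootstrap standard errors for the margin of error are all assumptions made for the example.

```python
import random
import statistics

def middle_95(values):
    """Return the endpoints of the middle 95% of a list of numbers."""
    ordered = sorted(values)
    k = int(0.025 * len(ordered))  # number of values trimmed from each tail
    return ordered[k], ordered[-k - 1]

def bootstrap_proportions(sample, reps=5000, seed=1):
    """Resample the 0/1 sample with replacement and return `reps` proportions."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(reps)]

# Hypothetical data: 24 successes (1s) and 16 failures (0s), so n = 40.
sample = [1] * 24 + [0] * 16
p_hat = sum(sample) / len(sample)

boot = bootstrap_proportions(sample)
lower, upper = middle_95(boot)    # percentile interval
me = 2 * statistics.stdev(boot)   # assumption: ME is about 2 bootstrap SEs

print(f"p_hat = {p_hat:.3f}")
print(f"percentile interval: ({lower:.3f}, {upper:.3f})")
print(f"p_hat +/- ME: ({p_hat - me:.3f}, {p_hat + me:.3f})")
```

The two intervals should be similar when the bootstrap distribution is roughly symmetric and bell-shaped.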

Hypothesis test

State \(H_0: p = p_0\) and the alternative.
Compute the observed sample proportion \(\hat{p}_{obs}\).
Simulate a sample proportion \(\hat{p}\) using a population where the probability of success is \(p_0\):
- Sample with replacement \(n\) times.
- Compute one simulated sample proportion \(\hat{p}\).
Repeat many times to build the null distribution of \(\hat{p}\).
Find the proportion of simulated values at least as extreme as \(\hat{p}_{obs}\) relative to \(p_0\), or identify rejection region(s).
Make a decision and state your conclusion in context.
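
As a rough sketch of this test, the code below simulates samples from a population with success probability \(p_0\) and counts how often the simulated proportion lands at least as far from \(p_0\) as the observed one. The data (24 successes in 40 trials, \(p_0 = 0.5\)), the function names, and the two-sided alternative are assumptions chosen for illustration.

```python
import random

def simulate_null_proportions(n, p0, reps=5000, seed=1):
    """Simulate `reps` sample proportions from a population with P(success) = p0."""
    rng = random.Random(seed)
    return [sum(rng.random() < p0 for _ in range(n)) / n for _ in range(reps)]

def two_sided_p_value(null_stats, observed, center):
    """Share of simulated values at least as far from `center` as `observed`."""
    gap = abs(observed - center)
    # The tiny tolerance keeps exact ties from being lost to floating-point rounding.
    extreme = sum(abs(s - center) >= gap - 1e-12 for s in null_stats)
    return extreme / len(null_stats)

# Hypothetical data: 24 successes in n = 40 trials, testing H0: p = 0.5.
n, p0 = 40, 0.5
p_hat_obs = 24 / n

null_dist = simulate_null_proportions(n, p0)
p_value = two_sided_p_value(null_dist, p_hat_obs, p0)
print(f"simulated two-sided p-value: {p_value:.3f}")
```

For a one-sided alternative, count only the simulated proportions in the direction named by the alternative.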

Note

The confidence interval is centered near the observed \(\hat{p}\).
The test distribution is centered at the null value \(p_0\).
Larger sample sizes produce less variability in \(\hat{p}\).

Two proportions

Parameter and statistic

Parameters: \(p_1\) and \(p_2\), compared through the difference \(p_1 - p_2\)
Statistics: \(\hat{p}_1\) and \(\hat{p}_2\), compared through the difference \(\hat{p}_1 - \hat{p}_2\)
Sample sizes: \(n_1\) and \(n_2\)

Confidence interval

Compute the observed difference in sample proportions, \(\hat{p}_1 - \hat{p}_2\).
Represent each sample as 0s and 1s.
Generate one bootstrap difference, \(\hat{p}_1 - \hat{p}_2\):
- Resample (sample with replacement) within group 1, using size \(n_1\), and compute \(\hat{p}_1\).
- Resample (sample with replacement) within group 2, using size \(n_2\), and compute \(\hat{p}_2\).
- Compute \(\hat{p}_1 - \hat{p}_2\).
Repeat many times to build a bootstrap distribution of \(\hat{p}_1 - \hat{p}_2\).
Use the middle 95% or compute a margin of error.
Interpret the interval as plausible values for the difference in population proportions.
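
A minimal sketch of the within-group bootstrap follows. The two groups (30 of 50 and 18 of 45 successes) and the helper names are hypothetical; the point is that each group is resampled separately at its own sample size.

```python
import random

def middle_95(values):
    """Return the endpoints of the middle 95% of a list of numbers."""
    ordered = sorted(values)
    k = int(0.025 * len(ordered))  # number of values trimmed from each tail
    return ordered[k], ordered[-k - 1]

def bootstrap_diff_proportions(group1, group2, reps=5000, seed=1):
    """Resample within each group and return `reps` differences in proportions."""
    rng = random.Random(seed)
    n1, n2 = len(group1), len(group2)
    diffs = []
    for _ in range(reps):
        p1 = sum(rng.choices(group1, k=n1)) / n1  # resample group 1 only
        p2 = sum(rng.choices(group2, k=n2)) / n2  # resample group 2 only
        diffs.append(p1 - p2)
    return diffs

# Hypothetical data, coded as 0s and 1s.
group1 = [1] * 30 + [0] * 20   # 30 successes out of 50
group2 = [1] * 18 + [0] * 27   # 18 successes out of 45
observed = sum(group1) / len(group1) - sum(group2) / len(group2)

lower, upper = middle_95(bootstrap_diff_proportions(group1, group2))
print(f"observed difference: {observed:.3f}")
print(f"95% percentile interval: ({lower:.3f}, {upper:.3f})")
```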

Hypothesis test

State \(H_0: p_1 - p_2 = 0\) and the alternative.
Compute the observed difference, \((\hat{p}_1 - \hat{p}_2)_{obs}\).
Build the null distribution:
- Assume both groups come from the same population under the null.
- Pool the outcomes, then randomly assign or resample outcomes into two groups of sizes \(n_1\) and \(n_2\).
- Compute the simulated difference in proportions.
Repeat many times to build the null distribution of \(\hat{p}_1 - \hat{p}_2\).
Find the proportion of simulated differences at least as extreme as the observed difference, or identify rejection region(s).
Make a decision and state your conclusion in context.
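
One way to carry out the pooling step is to shuffle the combined outcomes and split them back into groups of the original sizes. The sketch below does this for hypothetical data and a two-sided alternative; the function name `shuffled_diff_proportions` is made up for the example.

```python
import random

def shuffled_diff_proportions(group1, group2, reps=5000, seed=1):
    """Pool the outcomes, reshuffle into groups of the original sizes,
    and return `reps` simulated differences in proportions."""
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)
        new1, new2 = pooled[:n1], pooled[n1:]
        diffs.append(sum(new1) / len(new1) - sum(new2) / len(new2))
    return diffs

# Hypothetical data, coded as 0s and 1s.
group1 = [1] * 30 + [0] * 20   # 30 successes out of 50
group2 = [1] * 18 + [0] * 27   # 18 successes out of 45
observed = sum(group1) / len(group1) - sum(group2) / len(group2)

null_dist = shuffled_diff_proportions(group1, group2)
# Two-sided: count differences at least as far from 0 as the observed one.
# The tiny tolerance protects exact ties from floating-point rounding.
extreme = sum(abs(d) >= abs(observed) - 1e-12 for d in null_dist)
p_value = extreme / len(null_dist)
print(f"simulated two-sided p-value: {p_value:.3f}")
```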

Note

For confidence intervals, resample within each group.
For tests, simulate under the null by shuffling the response values and reassigning them to the explanatory groups.
- Null: No association; the two groups are treated as coming from the same population.

One mean

Parameter and statistic

Parameter: \(\mu\)
Statistic: \(\bar{x}\)
Sample size: \(n\)

Confidence interval

Record the sample mean \(\bar{x}\).
Generate one bootstrap sample mean:
- Resample with replacement from the sample, using the same size \(n\).
- Compute the sample mean.
Repeat many times to build a bootstrap distribution of \(\bar{x}\).
Use the middle 95% or compute a margin of error.
Interpret the interval as plausible values for the population mean.
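
The same recipe works for a mean; only the statistic changes. The sketch below uses a small made-up sample of 12 measurements, and the helper names are assumptions for illustration.

```python
import random

def middle_95(values):
    """Return the endpoints of the middle 95% of a list of numbers."""
    ordered = sorted(values)
    k = int(0.025 * len(ordered))  # number of values trimmed from each tail
    return ordered[k], ordered[-k - 1]

def bootstrap_means(sample, reps=5000, seed=1):
    """Resample with replacement at the original size and return `reps` means."""
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(reps)]

# Hypothetical sample of n = 12 measurements.
sample = [12.1, 9.8, 11.4, 10.6, 13.2, 8.9, 10.1, 12.7, 11.0, 9.5, 10.9, 11.8]
x_bar = sum(sample) / len(sample)

lower, upper = middle_95(bootstrap_means(sample))
print(f"sample mean: {x_bar:.2f}")
print(f"95% percentile interval: ({lower:.2f}, {upper:.2f})")
```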

Hypothesis test

State \(H_0: \mu = \mu_0\) and the alternative.
Compute the observed mean \(\bar{x}_{obs}\).
Build the null distribution:
- Shift the sample so its mean equals \(\mu_0\), or simulate from a model centered at \(\mu_0\).
- Draw a sample of size \(n\).
- Compute the simulated mean.
Repeat many times to build the null distribution of \(\bar{x}\).
Find the proportion of simulated means at least as extreme as \(\bar{x}_{obs}\) relative to \(\mu_0\), or identify rejection region(s).
Make a decision and state your conclusion in context.
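
The shifting approach can be sketched as follows: add a constant to every observation so the sample mean equals \(\mu_0\), then resample from the shifted values. The data, \(\mu_0 = 10\), and the two-sided alternative are assumptions for the example, and `null_means_by_shifting` is a hypothetical helper name.

```python
import random

def null_means_by_shifting(sample, mu0, reps=5000, seed=1):
    """Shift the sample so its mean is mu0, then bootstrap the shifted sample."""
    rng = random.Random(seed)
    n = len(sample)
    shift = mu0 - sum(sample) / n
    shifted = [x + shift for x in sample]  # same spread, now centered at mu0
    return [sum(rng.choices(shifted, k=n)) / n for _ in range(reps)]

# Hypothetical sample of n = 12 measurements, testing H0: mu = 10.
sample = [12.1, 9.8, 11.4, 10.6, 13.2, 8.9, 10.1, 12.7, 11.0, 9.5, 10.9, 11.8]
mu0 = 10.0
x_bar_obs = sum(sample) / len(sample)

null_dist = null_means_by_shifting(sample, mu0)
# Two-sided: count simulated means at least as far from mu0 as the observed mean.
gap = abs(x_bar_obs - mu0)
p_value = sum(abs(m - mu0) >= gap - 1e-12 for m in null_dist) / len(null_dist)
print(f"simulated two-sided p-value: {p_value:.3f}")
```

Shifting keeps the shape and spread of the observed data while forcing the null distribution to be centered at \(\mu_0\).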

Note

For a one-mean test, the null model must be centered at \(\mu_0\).
The standard error gets smaller as \(n\) increases.

Two means

Parameter and statistic

Parameters: \(\mu_1\) and \(\mu_2\), compared through the difference \(\mu_1 - \mu_2\)
Statistic: \(\bar{x}_1 - \bar{x}_2\)
Sample sizes: \(n_1\) and \(n_2\)

Confidence interval

Compute the observed difference in sample means, \(\bar{x}_1 - \bar{x}_2\).
Generate one bootstrap difference:
- Resample with replacement within group 1 and compute \(\bar{x}_1\).
- Resample with replacement within group 2 and compute \(\bar{x}_2\).
- Compute \(\bar{x}_1 - \bar{x}_2\).
Repeat many times to build a bootstrap distribution of \(\bar{x}_1 - \bar{x}_2\).
Use the middle 95% or compute a margin of error.
Interpret the interval as plausible values for the difference in population means.
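
For illustration, here is a short sketch of the two-group bootstrap for a difference in means. The measurements and helper names are invented for the example; each group is resampled on its own so the group sizes stay fixed.

```python
import random

def mean(values):
    return sum(values) / len(values)

def middle_95(values):
    """Return the endpoints of the middle 95% of a list of numbers."""
    ordered = sorted(values)
    k = int(0.025 * len(ordered))  # number of values trimmed from each tail
    return ordered[k], ordered[-k - 1]

def bootstrap_mean_diffs(group1, group2, reps=5000, seed=1):
    """Resample within each group and return `reps` differences in means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        m1 = mean(rng.choices(group1, k=len(group1)))
        m2 = mean(rng.choices(group2, k=len(group2)))
        diffs.append(m1 - m2)
    return diffs

# Hypothetical measurements for two independent groups.
group1 = [23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.7, 24.9]
group2 = [21.0, 22.5, 20.8, 23.1, 21.9, 22.2, 20.5]
observed = mean(group1) - mean(group2)

lower, upper = middle_95(bootstrap_mean_diffs(group1, group2))
print(f"observed difference: {observed:.2f}")
print(f"95% percentile interval: ({lower:.2f}, {upper:.2f})")
```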

Hypothesis test

State \(H_0: \mu_1 - \mu_2 = 0\) and the alternative.
Compute the observed difference, \((\bar{x}_1 - \bar{x}_2)_{obs}\).
Build the null distribution:
- Assume there is no group effect under the null.
- Pool the observed values and randomly assign them into two groups of sizes \(n_1\) and \(n_2\), or shuffle the group labels.
- Compute the simulated difference in means.
Repeat many times to build the null distribution of \(\bar{x}_1 - \bar{x}_2\).
Find the proportion of simulated differences at least as extreme as the observed difference, or identify rejection region(s).
Make a decision and state your conclusion in context.
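
Shuffling the group labels is equivalent to shuffling the pooled values and cutting them into groups of sizes \(n_1\) and \(n_2\), which is what this sketch does. The data and the name `shuffled_mean_diffs` are assumptions, and the p-value shown is two-sided.

```python
import random

def mean(values):
    return sum(values) / len(values)

def shuffled_mean_diffs(group1, group2, reps=5000, seed=1):
    """Pool the values, reshuffle into groups of the original sizes,
    and return `reps` simulated differences in means."""
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)
        diffs.append(mean(pooled[:n1]) - mean(pooled[n1:]))
    return diffs

# Hypothetical measurements for two independent groups.
group1 = [23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.7, 24.9]
group2 = [21.0, 22.5, 20.8, 23.1, 21.9, 22.2, 20.5]
observed = mean(group1) - mean(group2)

null_dist = shuffled_mean_diffs(group1, group2)
# Two-sided: count differences at least as far from 0 as the observed one.
extreme = sum(abs(d) >= abs(observed) - 1e-12 for d in null_dist)
p_value = extreme / len(null_dist)
print(f"simulated two-sided p-value: {p_value:.4f}")
```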

Note

For confidence intervals, resample within groups.
For tests, shuffle or reassign group labels under the null.
- Null: No association; observations are from the same population.

Regression slope

Parameter and statistic

Parameter: \(\beta_1\)
Statistic: \(b_1\)
Sample size: \(n\)

Confidence interval

Fit the regression line and record the sample slope \(b_1\).
Generate one bootstrap slope:
- Resample cases with replacement from the original data.
- Refit the regression line.
- Record the slope.
Repeat many times to build a bootstrap distribution of \(b_1\).
Use the middle 95% or compute a margin of error.
Interpret the interval as plausible values for the population slope \(\beta_1\).
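
Resampling cases means resampling whole \((x, y)\) pairs, so each point keeps its own response. The sketch below assumes a simple least-squares line and uses ten invented data points; the helper names are hypothetical.

```python
import random

def slope(pairs):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx

def middle_95(values):
    """Return the endpoints of the middle 95% of a list of numbers."""
    ordered = sorted(values)
    k = int(0.025 * len(ordered))  # number of values trimmed from each tail
    return ordered[k], ordered[-k - 1]

def bootstrap_slopes(pairs, reps=5000, seed=1):
    """Resample whole cases with replacement and refit the slope each time."""
    rng = random.Random(seed)
    n = len(pairs)
    slopes = []
    while len(slopes) < reps:
        resample = rng.choices(pairs, k=n)
        # A resample whose x values are all identical has no slope; redraw it.
        if len({x for x, _ in resample}) > 1:
            slopes.append(slope(resample))
    return slopes

# Hypothetical (x, y) data.
pairs = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1),
         (6, 12.2), (7, 13.8), (8, 16.1), (9, 18.0), (10, 19.9)]
b1 = slope(pairs)

lower, upper = middle_95(bootstrap_slopes(pairs))
print(f"sample slope: {b1:.3f}")
print(f"95% percentile interval: ({lower:.3f}, {upper:.3f})")
```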

Hypothesis test

State \(H_0: \beta_1 = 0\) and the alternative.
Fit the regression and record the observed slope \(b_{1,obs}\).
Build the null distribution:
- Keep the explanatory variable values fixed.
- Shuffle the response values, or shuffle the pairing between \(x\) and \(y\), to break any linear relationship.
- Refit the regression and record the simulated slope.
Repeat many times to build the null distribution of \(b_1\).
Find the proportion of simulated slopes at least as extreme as the observed slope, or identify rejection region(s).
Make a decision and state your conclusion in context.
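
A possible sketch of the shuffle test for a slope: the \(x\) values stay in place, the \(y\) values are shuffled, and the line is refit each time. The data, the least-squares helper, and the two-sided alternative are assumptions made for the example.

```python
import random

def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def shuffled_slopes(xs, ys, reps=5000, seed=1):
    """Keep x fixed, shuffle y to break the pairing, and refit the slope."""
    rng = random.Random(seed)
    shuffled = list(ys)  # copy so the original data are left untouched
    slopes = []
    for _ in range(reps):
        rng.shuffle(shuffled)
        slopes.append(slope(xs, shuffled))
    return slopes

# Hypothetical data.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
b1_obs = slope(xs, ys)

null_dist = shuffled_slopes(xs, ys)
# Two-sided: count simulated slopes at least as far from 0 as the observed slope.
extreme = sum(abs(b) >= abs(b1_obs) - 1e-12 for b in null_dist)
p_value = extreme / len(null_dist)
print(f"observed slope: {b1_obs:.3f}, simulated p-value: {p_value:.4f}")
```

A simulated p-value of 0 only means that none of the shuffles was as extreme as the observed slope; it is usually reported as less than 1 divided by the number of repetitions.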

Note

For confidence intervals, resample whole cases.
For tests, shuffle the response values and re-pair them with the fixed predictor values.
- Null: No association between \(y\) and \(x\).

Multiple means

Parameter and statistic

Parameters: \(\mu_1, \mu_2, \dots, \mu_k\)
Statistic: usually \(F\), or another measure of between-group vs within-group variation
Sample sizes: \(n_1, n_2, \dots, n_k\)

Confidence interval

A single overall confidence interval is not the focus for multiple means.

Instead, common follow-up goals are:

confidence intervals for specific pairwise differences
confidence intervals for group means

If needed, construct pairwise bootstrap intervals by resampling within each group and computing the difference in means for the chosen pair.

Hypothesis test

State \(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\) and the alternative that at least one mean differs.
Compute the observed test statistic, often \(F\).
Build the null distribution:
- Assume all groups come from the same population under the null.
- Pool the observed values.
- Randomly assign the values into groups of the original sizes.
- Compute the test statistic for the simulated grouping.
Repeat many times to build the null distribution of the statistic.
Find the proportion of simulated statistics at least as extreme as the observed statistic, or identify a rejection region.
Make a decision and state your conclusion in context.
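
To make the procedure concrete, the sketch below computes a one-way \(F\) statistic by hand and builds its null distribution by shuffling the pooled values into groups of the original sizes. The three small groups are invented, the function names are hypothetical, and the values are all distinct so the within-group variation is never zero.

```python
import random

def f_statistic(groups):
    """One-way F: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def shuffled_f_statistics(groups, reps=5000, seed=1):
    """Pool all values, reassign them to groups of the original sizes,
    and return `reps` simulated F statistics."""
    rng = random.Random(seed)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    stats = []
    for _ in range(reps):
        rng.shuffle(pooled)
        new_groups, start = [], 0
        for size in sizes:
            new_groups.append(pooled[start:start + size])
            start += size
        stats.append(f_statistic(new_groups))
    return stats

# Hypothetical data: three groups of four observations each.
groups = [[4.1, 5.3, 4.8, 5.0], [6.2, 5.9, 6.8, 6.4], [5.1, 4.6, 5.7, 5.4]]
f_obs = f_statistic(groups)

null_dist = shuffled_f_statistics(groups)
# Only large F values count as extreme, so the p-value uses the upper tail.
p_value = sum(f >= f_obs - 1e-12 for f in null_dist) / len(null_dist)
print(f"observed F: {f_obs:.2f}, simulated p-value: {p_value:.4f}")
```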

Note

The overall test asks whether there is evidence that any group mean differs.
- Using one overall test, rather than many separate pairwise tests, avoids inflating the Type I error rate.
If the overall test is significant, follow-up comparisons are usually needed.
- Use an adjusted \(\alpha\) for the post-hoc pairwise comparisons.

Test of independence

Parameter and statistic

Parameters: population joint distribution or population cell proportions
Statistic: often \(\chi^2\)
Sample size: \(n\)

Confidence interval

A single confidence interval is not usually the focus for a test of independence.

Instead, we usually ask whether two categorical variables are associated.

Useful follow-up summaries may include:

confidence intervals for selected conditional proportions
confidence intervals for differences in proportions within specific categories

Hypothesis test

State \(H_0\): the two categorical variables are independent, and the alternative that they are associated.
Compute the observed test statistic, often \(\chi^2\).
Build the null distribution:
- Keep one variable fixed.
- Shuffle the labels of the other variable to break any association while preserving the marginal structure.
- Recompute the test statistic for the shuffled data.
Repeat many times to build the null distribution of the statistic.
Find the proportion of simulated statistics at least as extreme as the observed statistic, or identify a rejection region.
Make a decision and state your conclusion in context.
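
The shuffle test for independence can be sketched like this: one variable is held fixed, the other is shuffled, and \(\chi^2\) is recomputed from the resulting table. The two-by-two data and the function names are assumptions for illustration. Shuffling leaves the row and column totals unchanged, so the expected counts are the same in every repetition.

```python
import random
from collections import Counter

def chi_square(xs, ys):
    """Chi-square statistic for two categorical variables given as parallel lists."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))  # missing cells count as 0
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

def shuffled_chi_squares(xs, ys, reps=5000, seed=1):
    """Keep xs fixed, shuffle ys to break any association, recompute chi-square."""
    rng = random.Random(seed)
    shuffled = list(ys)  # copy so the original data are left untouched
    stats = []
    for _ in range(reps):
        rng.shuffle(shuffled)
        stats.append(chi_square(xs, shuffled))
    return stats

# Hypothetical data: group A has 14 "yes" and 6 "no"; group B has 7 "yes" and 13 "no".
xs = ["A"] * 20 + ["B"] * 20
ys = ["yes"] * 14 + ["no"] * 6 + ["yes"] * 7 + ["no"] * 13
chi_obs = chi_square(xs, ys)

null_dist = shuffled_chi_squares(xs, ys)
# Only large chi-square values count as extreme, so the p-value uses the upper tail.
p_value = sum(s >= chi_obs - 1e-12 for s in null_dist) / len(null_dist)
print(f"observed chi-square: {chi_obs:.2f}, simulated p-value: {p_value:.3f}")
```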

Note

A significant result suggests an association, not causation.
- Using one overall test, rather than many separate comparisons, avoids inflating the Type I error rate.
If the overall test is significant, follow-up comparisons are usually needed.
- Use an adjusted \(\alpha\) for the post-hoc pairwise comparisons.
After the test, examine conditional proportions or residuals to see where the association appears.