CourseKata Chapter 6: Sec 1 to 4

SS, Population Variance, Sample Variance & SD

Mansour Abdoli, PhD

Today: Mean, SS, Variance and SD

Session Goals

By the end of today, you can:

  • Explain what a residual is
  • Show why the mean minimizes the sum of squares (SS)
  • Explain why SS grows with sample size
  • Define variance as the average squared error
  • Interpret SD as the typical prediction error

DATA = MODEL + ERROR

Recall: \[\text{DATA} = \text{MODEL} + \text{ERROR}\]

For the empty (simple) model, every observation gets the same prediction, a single center value: \[\begin{array}{rcl}\text{Model (center)} &=& \hat{y}_i\\ \text{Data} &=& y_i\\ \text{Residual } e_i &=& y_i - \hat{y}_i\end{array}\]

  • Total error, the sum of squared residuals (SS), measures model performance: the smaller the SS, the better the model fits.

Group Activity

  • Let the population be \[5, 7, 7, 12, 20\]
  • Find the mode, median, and mean.
  • Evaluate different simple models:
    • Choose one center: \[\text{Centers} = 0, 5, 7, 10, \bar{y}, 12, 15, 17, \text{ or } 20\]
  • Calculate the residuals and SS for your center.
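To check the activity by computer, the sketch below computes the mode, median, and mean, then the SS for every candidate center. It uses Python's standard statistics module; the helper name sum_of_squares is illustrative only.

```python
from statistics import mean, median, mode

population = [5, 7, 7, 12, 20]

def sum_of_squares(data, center):
    """SS of the residuals y - center: the empty model's total error."""
    return sum((y - center) ** 2 for y in data)

y_bar = mean(population)  # 10.2
print("mode:", mode(population), "median:", median(population), "mean:", y_bar)

# Candidate centers from the activity, including the mean itself
centers = [0, 5, 7, 10, y_bar, 12, 15, 17, 20]
for c in centers:
    print(f"center = {c:>5}: SS = {sum_of_squares(population, c):7.2f}")
```

The SS values come out as 667, 282, 198, 147, 146.80, 163, 262, 378, and 627: the mean gives the smallest SS, and center 10, close to the mean, comes in just above it.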

Checking Results

Plotting the Results

Lesson Learned

  • Plotted against the center, SS traces a U-shaped curve (a parabola).
  • SS is smallest when \(\hat{y}_i = \bar{y}\).

  • The mean is the center that minimizes \[SS = \sum_{i=1}^n (y_i - \hat{y}_i)^2\]
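A short derivation of this claim, writing \(c\) for the single center the empty model predicts for every observation (so \(\hat{y}_i = c\)):

\[SS(c) = \sum_{i=1}^n (y_i - c)^2, \qquad \frac{d\,SS}{dc} = -2\sum_{i=1}^n (y_i - c) = -2\left(\sum_{i=1}^n y_i - nc\right)\]

Setting the derivative to zero gives \(c = \frac{1}{n}\sum_{i=1}^n y_i = \bar{y}\), and \(\frac{d^2 SS}{dc^2} = 2n > 0\) confirms a minimum. Expanding around \(\bar{y}\) also explains the U shape, because the residuals around the mean sum to zero:

\[SS(c) = \sum_{i=1}^n (y_i - \bar{y})^2 + n(c - \bar{y})^2\]

For the activity data, \(SS(10) = 146.8 + 5(10 - 10.2)^2 = 147\).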

Why SS?

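  • The mean minimizes it, and residuals around the mean balance: they sum to zero.
  • It measures variability: the more spread out the data, the larger the SS.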
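A minimal check of the balance property, using exact fractions so that floating-point noise does not hide the zero. The median (7) is included for contrast; it is not part of the original slide.

```python
from fractions import Fraction

population = [5, 7, 7, 12, 20]

# Exact arithmetic: the mean is 51/5
y_bar = Fraction(sum(population), len(population))
residuals = [y - y_bar for y in population]

print([float(e) for e in residuals])   # [-5.2, -3.2, -3.2, 1.8, 9.8]
print(sum(residuals))                  # 0: residuals around the mean balance
print(sum(y - 7 for y in population))  # 16: around the median (7) they do not
```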

Check Application of Variability

  • Original Data
  • Stretched Data
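The slides compare the original data with a stretched version but do not say how the stretch was made. A minimal sketch, assuming (hypothetically) that stretching means multiplying every value by 2: every residual doubles, so SS grows by a factor of 4.

```python
population = [5, 7, 7, 12, 20]
stretched = [2 * y for y in population]  # hypothetical stretch: double every value

def ss_about_mean(data):
    """SS of the residuals around the data's own mean."""
    m = sum(data) / len(data)
    return sum((y - m) ** 2 for y in data)

print(round(ss_about_mean(population), 2))  # 146.8
print(round(ss_about_mean(stretched), 2))   # 587.2
```

Any stretch that pushes values farther from their mean raises SS, which is what makes SS usable as a measure of variability.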

Population Size and SS

  • What happens if we double the data size? \[5, 7, 7, 12, 20, 5, 7, 7, 12, 20\]
    • Does the average change?
    • Do the residuals change?
    • How does SS change?
  • Let’s check:
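One way to run this check: repeat the list and compare the mean and SS. Exact fractions keep the comparison free of rounding; the helper mean_and_ss is illustrative only.

```python
from fractions import Fraction

def mean_and_ss(data):
    """Return the exact mean and the SS of the residuals around it."""
    m = Fraction(sum(data), len(data))
    return m, sum((y - m) ** 2 for y in data)

original = [5, 7, 7, 12, 20]
doubled = original * 2  # every value appears twice

for name, data in [("original", original), ("doubled", doubled)]:
    m, ss = mean_and_ss(data)
    print(name, "n =", len(data), "mean =", float(m), "SS =", float(ss))
# original n = 5 mean = 10.2 SS = 146.8
# doubled n = 10 mean = 10.2 SS = 293.6
```

The mean and every residual stay the same, but there are twice as many squared residuals to add, so SS doubles even though the spread has not changed.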

SS and Relative Variability

  • SS represents variability.
  • But SS also grows with the number of observations, even when the spread stays the same.
  • Relative variability (variance) = SS / size

Population Variance vs. SS

  • Population Variance = \[\sigma^2 = \frac{SS}{n}\]
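Dividing by n cancels the growth of SS seen when the data were doubled. A minimal sketch; Python's statistics.pvariance, which also divides by n, serves as a cross-check.

```python
from statistics import pvariance

original = [5, 7, 7, 12, 20]
doubled = original * 2  # same values, twice the size

def population_variance(data):
    """sigma^2 = SS / n, with SS taken around the mean."""
    n = len(data)
    m = sum(data) / n
    ss = sum((y - m) ** 2 for y in data)
    return ss / n

for name, data in [("original", original), ("doubled", doubled)]:
    print(name, round(population_variance(data), 2), round(pvariance(data), 2))
# original 29.36 29.36
# doubled 29.36 29.36
```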

Sample Variance

Random Samples and Sample Variance

Sampling Distribution of Variance

Using the Law of Large Numbers to Approximate the Distribution

  • Mean of the sample variances (each computed as SS/n) \(\approx (4/5)\) × population variance: dividing by \(n\) underestimates \(\sigma^2\).
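The slides estimate this by simulation. As a swapped-in exact check, this sketch assumes samples of size 5 drawn with replacement from the five-value population and enumerates all \(5^5 = 3125\) ordered samples, so the average is exact rather than approximate.

```python
from fractions import Fraction
from itertools import product

population = [5, 7, 7, 12, 20]

def ss(data):
    """Exact SS of the residuals around the data's own mean."""
    m = Fraction(sum(data), len(data))
    return sum((y - m) ** 2 for y in data)

sigma2 = ss(population) / len(population)  # population variance: 734/25 = 29.36

n = 5  # assumed sample size
# Every ordered sample with replacement is equally likely
samples = list(product(population, repeat=n))
mean_ss_over_n = sum(ss(s) / n for s in samples) / len(samples)
mean_ss_over_n_minus_1 = sum(ss(s) / (n - 1) for s in samples) / len(samples)

print(mean_ss_over_n / sigma2)          # 4/5
print(mean_ss_over_n_minus_1 / sigma2)  # 1
```

In this setting the expected value of SS/n is exactly \((n-1)/n\) times \(\sigma^2\), and dividing by \(n - 1\) removes that bias, which motivates \(s^2 = SS/(n-1)\).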

Sample Variance

  • Sample Variance = \(s^2 = \frac{SS}{n-1}\)

  • Sample Standard Deviation = \(s=\sqrt{\frac{SS}{n-1}}\)

    • A good measure of model error.
      • The typical size of the prediction error when the mean is used as the model.
  • R functions: var() computes \(s^2\) and sd() computes \(s\) (both divide by \(n-1\)).
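A minimal sketch of the two formulas in Python for the activity data. Python's statistics.variance and statistics.stdev, like R's var() and sd(), divide by n − 1 and are used here as a cross-check.

```python
from math import sqrt
from statistics import stdev, variance

data = [5, 7, 7, 12, 20]
n = len(data)
y_bar = sum(data) / n
ss = sum((y - y_bar) ** 2 for y in data)  # 146.8

s2 = ss / (n - 1)  # sample variance
s = sqrt(s2)       # sample standard deviation

print(round(s2, 4), round(s, 3))                        # 36.7 6.058
print(round(variance(data), 4), round(stdev(data), 3))  # 36.7 6.058
```

A typical prediction from the mean (10.2) misses by roughly 6 units.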

Important Distinction

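Quantity    Measures
Residual    Individual error: \(e_i = y_i - \hat{y}_i\)
SS          Total squared error: \(\sum e_i^2\)
Variance    Average squared error: \(SS/n\) (population) or \(SS/(n-1)\) (sample)
SD          Typical prediction error: \(\sqrt{\text{variance}}\)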

Exit Question

If we change one value to an extremely large number, what happens?

  1. Mean changes a little
  2. Mean changes a lot
  3. Variance increases
  4. Both 2 and 3
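After committing to an answer, a quick numeric check: this sketch replaces the largest value, 20, with a hypothetical extreme value of 200 and compares the mean and sample variance before and after.

```python
from statistics import mean, variance

original = [5, 7, 7, 12, 20]
extreme = original[:-1] + [200]  # hypothetical: replace 20 with 200

for name, data in [("original", original), ("extreme", extreme)]:
    print(name, "mean =", round(mean(data), 2), "sample variance =", round(variance(data), 2))
# original mean = 10.2 sample variance = 36.7
# extreme mean = 46.2 sample variance = 7398.7
```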

Big Idea Today

The mean is not arbitrary.

It is the value that minimizes the sum of squared prediction errors (SS).

Variance and SD measure how far off that model's predictions typically are.