CourseKata Chapter 4

Modeling Relationships: Numerical & Categorical Responses

Mansour Abdoli, PhD

Today: Response and Explanatory

Session goals

By the end of today, you can:

  • Identify Response vs Explanatory Variables
  • Choose the Correct Plot based on the Response Type
  • Visualize Explained Variability
  • Recognize Random Sampling vs Random Assignment
  • Differentiate Association and Causation

Review: Type of Response

Investigating Distributions

Numerical response

  • Summary
    • Center: Mean, Median, Mode
    • Spread: Std.Dev., IQR, Range
  • Visualization (shape):
    • Histogram/Boxplot
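
A minimal sketch of the numerical summaries above, using only Python's standard library. The `heights` values are hypothetical and chosen for illustration; they are not the course dataset.

```python
import statistics

# Hypothetical heights (illustrative values, not the real dataset)
heights = [70.5, 64.8, 68.0, 66.0, 64.8, 72.0, 61.5, 66.0, 64.8]

# Center: three ways to describe a "typical" value
center = {
    "mean": statistics.mean(heights),
    "median": statistics.median(heights),
    "mode": statistics.mode(heights),
}

# Spread: how far values fall from the center
q1, q2, q3 = statistics.quantiles(heights, n=4)  # quartile cut points
spread = {
    "std_dev": statistics.stdev(heights),  # sample standard deviation
    "iqr": q3 - q1,
    "range": max(heights) - min(heights),
}

print(center)
print(spread)
```

Quartile conventions differ between tools, so the IQR here (Python's default "exclusive" method) may not match other software exactly.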

Categorical response

  • Summary
    • Count
    • Proportion


  • Visualization
    • Barplot
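
A minimal sketch of the categorical summaries above. The `jobs` list is hypothetical, and the levels "Part-time" and "Full-time" are assumed for illustration only.

```python
from collections import Counter

# Hypothetical Job responses (illustrative values, not the real dataset)
jobs = ["Not Working", "Not Working", "Part-time", "Full-time",
        "Part-time", "Not Working", "Part-time", "Not Working"]

# Count: how many observations fall in each level
counts = Counter(jobs)

# Proportion: each count divided by the total number of observations
total = sum(counts.values())
proportions = {level: n / total for level, n in counts.items()}

for level, n in counts.most_common():
    print(f"{level:12s} count={n} proportion={proportions[level]:.3f}")
```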

Quick check:

          Job            Interest Height Weight    Sex
1 Not Working         No Interest   70.5    188   male
2 Not Working Somewhat Interested   64.8    127 female

Job

Interest

Height

Investigating Associations:

Explaining Variability:

  • Response = Explanatory + Other Stuff
  • Formula: Response ~ Explanatory
  • Residual: \(\text{Observed} - \text{Predicted}\)
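
A minimal sketch of Residual = Observed - Predicted for Height ~ Gender. Using each group's mean as the predicted value is one simple choice, assumed here for illustration; the data values are hypothetical.

```python
from statistics import mean

# Hypothetical (Gender, Height) pairs; illustrative values only
data = [("male", 70.5), ("female", 64.8), ("male", 68.5),
        ("female", 63.2), ("male", 71.0), ("female", 66.0)]

# Collect the observed heights for each level of the explanatory variable
groups = {}
for gender, height in data:
    groups.setdefault(gender, []).append(height)

# Predicted value for each observation: the mean of its group
predicted = {gender: mean(values) for gender, values in groups.items()}

# Residual = Observed - Predicted (the "Other Stuff")
residuals = [height - predicted[gender] for gender, height in data]

for (gender, height), res in zip(data, residuals):
    print(f"{gender:6s} observed={height:5.1f} "
          f"predicted={predicted[gender]:5.2f} residual={res:+.2f}")
```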

Numerical Response

  • Numerical Explanatory
  • Categorical Explanatory

Categorical Response

  • Categorical Explanatory
  • Numerical Explanatory

Consequential Choices

Part of Fingers Dataset

  Gender         Job            Interest GradePredict Height MathAnxious Thumb
1   male Not Working         No Interest          3.3   70.5       Agree    66
2 female Not Working Somewhat Interested          3.7   64.8       Agree    64
  • Practice:
    • What type are the variables, and which is Response?

    • What association do you expect?

  • Interest ~ Job
  • Height ~ Gender
  • GradePredict ~ Height
  • MathAnxious ~ Year

Part A — Numerical Response

Numerical ~ Numerical:

Scatterplot: Height ~ Thumb

  • Characteristics
    • Form
    • Direction
    • Strength
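
Direction and strength of a linear association can be summarized with a correlation coefficient; form still has to be judged from the scatterplot. The sketch below computes Pearson's r by hand. The helper `pearson_r` and the data values are hypothetical, introduced for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical Thumb and Height values; illustrative only
thumb = [66, 64, 56, 58, 70, 61]
height = [70.5, 64.8, 62.0, 65.0, 72.0, 66.5]

def pearson_r(x, y):
    """Pearson correlation: co-variation scaled by each variable's variation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(thumb, height)

# The sign of r gives the direction; |r| close to 1 suggests a strong linear association
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.2f} ({direction})")
```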

Numerical ~ Categorical:

Boxplot: GradePredict ~ Job

  • Compare
    • Skewness
    • Center / Positions
    • Spread

Explaining Variation

Total

Between & Within
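
One way to make "Total" versus "Between & Within" concrete is with sums of squares: total variation splits into variation between group means and variation within groups. This is a sketch under that assumption; the GradePredict values and the Job levels other than "Not Working" are hypothetical.

```python
from statistics import mean

# Hypothetical GradePredict values grouped by Job; illustrative only
groups = {
    "Not Working": [3.3, 3.7, 3.0, 3.6],
    "Part-time": [3.1, 2.9, 3.4],
    "Full-time": [2.8, 3.2],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = mean(all_values)

# Total: squared distance of every observation from the grand mean
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Between: squared distance of each group mean from the grand mean,
# counted once per observation in that group
ss_between = sum(len(values) * (mean(values) - grand_mean) ** 2
                 for values in groups.values())

# Within: squared distance of each observation from its own group mean
ss_within = sum((v - mean(values)) ** 2
                for values in groups.values() for v in values)

print(f"total={ss_total:.3f} between={ss_between:.3f} within={ss_within:.3f}")
```

The between part is the variation the explanatory variable accounts for; the within part is what remains unexplained.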

Explaining Variation: Different Ways

Histogram + Boxplot

Histogram + Density

More Explanatory Variables

Part B — Categorical Response

Categorical ~ Categorical

Total

Between & Within

Characteristics

  • Levels (Values): Not very critical
  • Counts: Limited use; best for groups of comparable size
  • Proportions: Useful for making inferences

Categorical ~ Categorical: Proportions

Total

Between & Within

Stacked-Barplot

Counts & Proportions

Count

Proportions
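
A minimal sketch of counts versus proportions for Interest ~ Job. Proportions are computed within each Job group so that groups of different sizes can be compared. The rows are hypothetical, and the levels "Part-time" and "Very Interested" are assumed for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (Job, Interest) pairs; illustrative values only
rows = [
    ("Not Working", "No Interest"),
    ("Not Working", "Somewhat Interested"),
    ("Not Working", "Somewhat Interested"),
    ("Not Working", "Very Interested"),
    ("Part-time", "No Interest"),
    ("Part-time", "Very Interested"),
]

# Counts: tally the response (Interest) within each explanatory level (Job)
counts = defaultdict(Counter)
for job, interest in rows:
    counts[job][interest] += 1

# Proportions: divide each count by its group's total
proportions = {
    job: {level: n / sum(tally.values()) for level, n in tally.items()}
    for job, tally in counts.items()
}

for job, props in proportions.items():
    print(job, {level: round(p, 2) for level, p in props.items()})
```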

Part C — Modeling and Design

Models of Variability

  • Chance Model: Variability is due only to chance \[\text{GradePredict} = \text{Other Stuff (Unexplained)}\]

  • One Explanatory: Gender explains some variability \[\text{GradePredict} = \text{Gender} + \text{Other Stuff}\]

  • Two Explanatory: Both Gender and Interest explain some variability \[\text{GradePredict} = \text{Gender} + \text{Interest} + \text{Other Stuff}\]

How to Compare Models (Hypothesis)

  • Explained variability: the outcome depends on the explanatory value, so using that value shrinks the residuals
    • Residuals = Observed - Predicted (or expected)
  • More explained variability \(\rightarrow\) Better model
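
One common way to quantify "more explained variability" is the proportional reduction in squared residuals relative to the chance model. This is a sketch; the symbols \(SS_{\text{chance}}\) and \(SS_{\text{model}}\) are introduced here for illustration and are not taken from the slides.

\[SS = \sum (\text{Observed} - \text{Predicted})^2, \qquad \text{Proportion explained} = \frac{SS_{\text{chance}} - SS_{\text{model}}}{SS_{\text{chance}}}\]

A value near 0 means the explanatory variable adds little beyond chance; a value near 1 means it accounts for most of the variability.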

Research Design

Association or Causation?

  • Observational Studies: Random Sampling
    • Unbiased Association
    • Other variables may also explain variation
  • Designed Experiments: Random Assignment
    • Balancing the impact of other variables across groups
    • Revealing Causation
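
A minimal sketch contrasting the two kinds of randomness. The population of student IDs, the sample size, and the group names `treatment` and `control` are hypothetical, chosen for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100 student IDs
population = list(range(1, 101))

# Random sampling: decides WHO is in the study
# (every student has the same chance of being selected)
sample = rng.sample(population, 20)

# Random assignment: decides WHICH GROUP each sampled student joins
# (shuffle the sample, then split it in half)
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:10], shuffled[10:]

print("sample:   ", sorted(sample))
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

An observational study would stop after the sampling step; a designed experiment adds the assignment step.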