Sampling • Tidy Data • Missing Data • Creating Variables
By the end of today, you can:
- Handle missing data (NA) in a principled way

Why we care:
Now take a random sample of 10 and compare.
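A minimal sketch of drawing a random sample of 10 rows. The `population` data frame and its `height_cm` column are simulated stand-ins invented for this illustration, not course data.

```r
# Hypothetical population: 1,000 simulated cases (illustration only)
set.seed(1)  # fix the random seed so the draw is reproducible
population <- data.frame(
  id        = 1:1000,
  height_cm = rnorm(1000, mean = 170, sd = 10)
)

# Pick 10 row numbers at random (without replacement), then keep those rows
rows          <- sample(nrow(population), size = 10)
random_sample <- population[rows, ]

# Compare a summary of the sample with the same summary of the population
mean(population$height_cm)
mean(random_sample$height_cm)
```

Rerunning without `set.seed()` gives a different sample each time, which is useful for seeing how much samples of 10 vary.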
Discuss: Which sample looks “more representative” of the population? Why?
A useful default structure:
coursekata data frame: Fingers

Think:
- What are the cases?
- What are the variables?
Often we only want a few rows, too:
Example: keep only Gender == "female".
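One way this could look, sketched in base R. `fingers_toy` is a small made-up stand-in for Fingers, so the counts will differ from the real data.

```r
# Hypothetical mini version of Fingers (values invented for illustration)
fingers_toy <- data.frame(
  Gender = c("female", "male", "female", "female", "male", "female"),
  Index  = c(65, 72, 68, 70, 75, 66)
)

# Keep only the rows where Gender is exactly "female"
females <- subset(fingers_toy, Gender == "female")

# Each row is one case, so the row count is the number of females
nrow(females)
```

With dplyr loaded, `filter(fingers_toy, Gender == "female")` selects the same rows.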
Question: How many females are in this sample?
- R uses NA (“not available”) to represent missingness.
- NA is not the same as the text "NA".

In groups of 3–4:
Suppose SSLast is important to your analysis. Choose between:
- na.omit() on the whole data frame
- filter() for just SSLast

Be ready to justify your choice in 1–2 sentences.
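To see how the two approaches differ, here is a rough sketch on a made-up data frame. `fingers_toy` and its values are hypothetical, and base R row indexing stands in for `filter()`.

```r
# Hypothetical mini version of Fingers with some missing values
fingers_toy <- data.frame(
  Gender = c("female", "male", "female", "male", "female"),
  SSLast = c(3, NA, 7, 1, NA),
  Index  = c(65, 72, NA, 75, 66)
)

# NA is a special missing-value marker; "NA" in quotes is ordinary text
is.na(NA)
is.na("NA")

# Approach 1: drop every row that has an NA in ANY column
all_complete <- na.omit(fingers_toy)

# Approach 2: drop only the rows where SSLast itself is missing
sslast_known <- fingers_toy[!is.na(fingers_toy$SSLast), ]

# Compare how many cases each approach keeps
nrow(all_complete)
nrow(sslast_known)
```

With dplyr loaded, `filter(fingers_toy, !is.na(SSLast))` is equivalent to the second approach.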
CourseKata example idea: “Is ring finger longer than index finger?”
Example: make a “HighRatio” indicator (above 1 vs not).
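A possible sketch of both steps: compute the ratio, then turn it into an indicator. The toy data, the column name `RingIndexRatio`, and the choice of Ring divided by Index are assumptions made for illustration.

```r
# Hypothetical mini version of Fingers (finger lengths invented, in mm)
fingers_toy <- data.frame(
  Gender = c("female", "male", "female", "male", "female"),
  Index  = c(65, 72, 68, 75, 66),
  Ring   = c(67, 70, 68, 78, 64)
)

# New variable: ring length relative to index length
# (a value above 1 means the ring finger is longer)
fingers_toy$RingIndexRatio <- fingers_toy$Ring / fingers_toy$Index

# Indicator: TRUE when the ratio is above 1, FALSE otherwise
fingers_toy$HighRatio <- fingers_toy$RingIndexRatio > 1

fingers_toy
```

A ratio of exactly 1 counts as "not above 1", so that case gets FALSE.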
Now cross-tab by Gender:
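A small sketch of a cross-tab using base R's `table()`. The data frame is a made-up stand-in with a HighRatio column already filled in.

```r
# Hypothetical data: Gender plus a precomputed HighRatio indicator
fingers_toy <- data.frame(
  Gender    = c("female", "male", "female", "male", "female"),
  HighRatio = c(TRUE, FALSE, FALSE, TRUE, FALSE)
)

# Count cases in each Gender-by-HighRatio combination
# (dnn supplies the dimension labels shown in the printed table)
counts <- table(fingers_toy$Gender, fingers_toy$HighRatio,
                dnn = c("Gender", "HighRatio"))
counts
```

Rows are Gender and columns are HighRatio, so each cell is the number of cases with that combination.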
- NA requires explicit handling

Write a short response:
SSLast is NA (don’t use na.omit())

CourseKata Ch. 2.8–2.11