Patterns & Association
By the end of today you can:
Two bunches:
2, 2, 2, 2, 2, 2, 2, 2, 2
2, 1, 3, 3, 2, 3, 1, 2, 1
We reveal patterns by sorting and counting.
We use patterns to find
unusual things
exceptions
outliers
[1] 1 1 1 2 2 2 3 3 3
X
1 2 3
3 3 3
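The sorted values and the frequency table above can be reproduced with a short sketch. It assumes the second bunch is typed in by hand as a vector named `X`, and it uses base R's `table()` as a stand-in for `tally()`, which is not available without the course packages.

```r
# Second bunch from the slide, typed in by hand
X <- c(2, 1, 3, 3, 2, 3, 1, 2, 1)

# Sorting puts equal values next to each other
sorted_X <- sort(X)
print(sorted_X)

# Counting: how many times does each value occur?
# (base R table() used here in place of tally())
counts <- table(X)
print(counts)
```

Sorting shows the runs of equal values; counting turns each run into a single number.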
What pattern is easier to see after sorting?
What does the frequency table make obvious?
Two core processes:
Pick one:
For your choice:
  Cond Age  Wt   Wt2  BMI BMI2  Fat Fat2  WHR WHR2 Syst Syst2 Diast Diast2
1    0  43 137 137.4 25.1 25.1 31.9 32.8 0.79 0.79  124   118    70     73
2    0  42 150 147.0 29.3 28.7 35.5   NA 0.81 0.81  119   112    80     68
3    0  41 124 124.8 26.9 27.0 35.1   NA 0.84 0.84  108   107    59     65
4    0  40 173 171.4 32.8 32.4 41.9 42.4 1.00 1.00  116   126    71     79
5    0  33 163 160.2 37.9 37.2 41.7   NA 0.86 0.84  113   114    73     78
6    0  24  90  91.8 16.5 16.8   NA   NA 0.73 0.73   NA    NA    78     76
   Condition
1 Uninformed
2 Uninformed
3 Uninformed
4 Uninformed
5 Uninformed
6 Uninformed
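A minimal sketch of exploring a table like this one. It assumes a few columns of the six rows shown are retyped into a small data frame called `mini` (a made-up name, not the course dataset), then sorts by `Age`, picks out one age, and counts missing values.

```r
# Retype a few columns of the six rows shown; `mini` is a hypothetical name
mini <- data.frame(
  Cond = c(0, 0, 0, 0, 0, 0),
  Age  = c(43, 42, 41, 40, 33, 24),
  Wt   = c(137, 150, 124, 173, 163, 90),
  Fat  = c(31.9, 35.5, 35.1, 41.9, 41.7, NA)
)

# Structure: one line per variable, with its type and first values
str(mini)

# Sort the rows from youngest to oldest
by_age <- mini[order(mini$Age), ]
print(by_age)

# Keep only the rows where Age is exactly 41
age_41 <- mini[mini$Age == 41, ]
print(age_41)

# Count the missing (NA) values in Fat
n_missing_fat <- sum(is.na(mini$Fat))
print(n_missing_fat)
```

Sorting by `Age` makes the youngest and oldest rows easy to spot, and the `NA` count flags where measurements are missing.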
'data.frame': 75 obs. of 15 variables:
$ Cond : int 0 0 0 0 0 0 0 0 0 0 ...
$ Age : int 43 42 41 40 33 24 46 21 29 19 ...
$ Wt : int 137 150 124 173 163 90 150 156 141 123 ...
$ Wt2 : num 137 147 125 171 160 ...
$ BMI : num 25.1 29.3 26.9 32.8 37.9 16.5 27.5 25.9 27.5 19.6 ...
$ BMI2 : num 25.1 28.7 27 32.4 37.2 16.8 27.4 25.7 27.4 19.7 ...
$ Fat : num 31.9 35.5 35.1 41.9 41.7 NA 36.1 36.4 NA 26.6 ...
$ Fat2 : num 32.8 NA NA 42.4 NA NA 37.3 NA NA NA ...
$ WHR : num 0.79 0.81 0.84 1 0.86 0.73 0.9 0.78 0.87 0.69 ...
$ WHR2 : num 0.79 0.81 0.84 1 0.84 0.73 0.9 0.78 0.85 0.69 ...
$ Syst : int 124 119 108 116 113 NA 119 116 110 113 ...
$ Syst2 : int 118 112 107 126 114 NA 115 135 115 117 ...
$ Diast : int 70 80 59 71 73 78 75 67 73 75 ...
$ Diast2 : int 73 68 65 79 78 76 77 65 74 72 ...
$ Condition: Factor w/ 2 levels "Informed","Uninformed": 2 2 2 2 2 2 2 2 2 2 ...
  Cond Age  Wt Wt2  BMI BMI2  Fat Fat2  WHR WHR2 Syst Syst2 Diast Diast2
1    0  43 137 137 25.1 25.1 31.9 32.8 0.79 0.79  124   118    70     73
2    0  42 150 147 29.3 28.7 35.5   NA 0.81 0.81  119   112    80     68
3    0  41 124 125 26.9 27.0 35.1   NA 0.84 0.84  108   107    59     65
   Condition
1 Uninformed
2 Uninformed
3 Uninformed
Question: Which do you prefer, labels (Condition) or numerical codes (Cond)?
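To compare the two, here is a sketch of turning numerical codes into labels with `factor()`. The codes below are illustrative, not the real data. The rows shown pair `Cond = 0` with "Uninformed"; that `1` means "Informed" is an assumption.

```r
# Illustrative codes only (not the real data)
cond_codes <- c(0, 0, 0, 1)

# Attach labels to the codes.
# Assumption: 1 = "Informed"; the rows shown only confirm 0 = "Uninformed".
condition <- factor(cond_codes,
                    levels = c(1, 0),
                    labels = c("Informed", "Uninformed"))

# Same counts, two ways of reading them
print(table(cond_codes))
print(table(condition))
```

With codes, the reader has to remember what 0 and 1 stand for; with labels, the table explains itself.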
Pick ONE scenario:
Your team must produce:
Each team shares (60–90 sec):
CourseKata Ch. 2.1–2.7