Summarizing Data through Numbers
Measures of Central Tendency
Dispersion
Skew and Kurtosis
Measures of Central Tendency
Data set: 3,4,3,1,2,3,9,5,6,7,4,8
Mean
(3+4+3+1+2+3+9+5+6+7+4+8) / 12 = 55/12 ≈ 4.583
Median
Sorted: 1,2,3,3,3,4,4,5,6,7,8,9. With 12 values, the median is the average of the 6th and 7th values (4 and 4). Hence Answer = 4
Mode
The value 3 appears 3 times, 4 appears 2 times, and all other values appear once. Hence 3 is the mode.
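The three hand calculations above can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are only illustrative.

```python
from statistics import mean, median, mode

data = [3, 4, 3, 1, 2, 3, 9, 5, 6, 7, 4, 8]

data_mean = mean(data)      # sum of the values divided by their count (55 / 12)
data_median = median(data)  # average of the two middle values after sorting
data_mode = mode(data)      # most frequent value

print(round(data_mean, 3), data_median, data_mode)
```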
Where do we want to use Mean, Median and Mode?
Choosing between mean and median
- Bad outliers
Errors
Do not provide a realistic picture of the story
- Good outliers
The story is in the outliers
Mode
- Useful with nominal variables
- Multi modal distributions
[ Strategy: lose 1 rupee every day on 99% of the days, but on the remaining 1% of the days gain Rs. 10,00,00,000. ]
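One way to read this strategy is as a case where the story is in the outliers: the median describes a typical day, while the mean is driven by the rare large gain. The sketch below assumes a hypothetical 100-day window that matches the stated proportions exactly; the variable names are illustrative.

```python
from statistics import fmean, median

# Hypothetical 100-day window matching the stated proportions:
# 99 days with a 1-rupee loss and 1 day with a 10,00,00,000-rupee gain.
daily_results = [-1] * 99 + [100_000_000]

strategy_mean = fmean(daily_results)     # pulled far up by the single large gain
strategy_median = median(daily_results)  # reflects the typical (losing) day

print(f"mean per day:   {strategy_mean:,.2f}")
print(f"median per day: {strategy_median}")
```

Here the median alone would suggest a losing strategy, so the mean is the summary that captures the payoff.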
Example (multi-modal distribution):
40% - voted for a garbage can at the 25-meter mark
45% - voted for a garbage can at the 75-meter mark
15% - votes spread uniformly between 0 and 100 meters
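A rough sketch of why the mode matters here: with two peaks, the mean lands between them at a position that almost nobody voted for. The sketch assumes a hypothetical electorate of 100 voters and, as a simplification, spaces the 15 "uniform" voters evenly from 0 to 100.

```python
from collections import Counter
from statistics import fmean

# Hypothetical electorate of 100 voters matching the stated shares.
votes = [25] * 40 + [75] * 45
# Assumption: the 15 "uniform" voters are spaced evenly from 0 to 100.
votes += [i * 100 / 14 for i in range(15)]

vote_mean = fmean(votes)
top_two = Counter(votes).most_common(2)  # the two most popular positions

print(f"mean position: {vote_mean:.2f}")
print(f"top two positions (value, count): {top_two}")
```

The mean comes out near the 51-meter mark, while the two modes (75 and 25) are where the votes actually cluster.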
Measures of Dispersion
Data set: 3,4,3,1,2,3,9,5,6,7,4,8
Range: Max - Min (9 - 1 = 8)
Interquartile Range: 3rd quartile - 1st quartile (75th percentile - 25th percentile) (6.5 - 3 = 3.5)
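The quartiles 3 and 6.5 are consistent with the median-of-halves convention: sort the data, split it into a lower and an upper half, and take the median of each half. The sketch below assumes that convention; other quantile conventions (for example, ones that interpolate between positions) can give slightly different quartiles for the same data.

```python
from statistics import median

data = sorted([3, 4, 3, 1, 2, 3, 9, 5, 6, 7, 4, 8])

data_range = data[-1] - data[0]  # max - min

# Median-of-halves convention: take the median of the lower half and of the
# upper half of the sorted data (the middle value is skipped if the count is odd).
half = len(data) // 2
q1 = median(data[:half])   # 1st quartile
q3 = median(data[-half:])  # 3rd quartile
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```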
Sample Standard deviation
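For reference, the usual definition of the sample standard deviation, written for N observations x_1, ..., x_N with sample mean x̄ (the symbols are the conventional ones, not taken from the notes):

```latex
s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2},
\qquad
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
```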
Questions that go with Standard deviation
- Why do we use the square function on the deviations? What are its implications?
- Why do we work with the standard deviation and not the variance?
- Why do we average by dividing by N-1 and not N?
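One way to explore these questions numerically is to compute the quantities side by side on the data set above. This is only a sketch using Python's standard `statistics` module: `variance` and `stdev` divide by N-1, and `pstdev` divides by N.

```python
from statistics import pstdev, stdev, variance

data = [3, 4, 3, 1, 2, 3, 9, 5, 6, 7, 4, 8]

sample_var = variance(data)    # divides by N-1; expressed in squared units
sample_sd = stdev(data)        # square root of sample_var; same units as the data
population_sd = pstdev(data)   # divides by N instead of N-1

print(f"sample variance:        {sample_var:.4f}")
print(f"sample std deviation:   {sample_sd:.4f}")
print(f"std deviation (with N): {population_sd:.4f}")
```

Dividing by N-1 always gives a slightly larger value than dividing by N, and the gap shrinks as N grows.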
Mean absolute Deviation and its variants
- Use $|x_i - \bar{x}|$ instead of $(x_i - \bar{x})^2$
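A minimal sketch of the mean absolute deviation on the same data set, taking deviations from the mean and averaging over all N values (an assumption, since the notes do not state the divisor):

```python
from statistics import fmean

data = [3, 4, 3, 1, 2, 3, 9, 5, 6, 7, 4, 8]

center = fmean(data)
# Average of the absolute deviations from the mean (divides by N).
mean_abs_dev = fmean([abs(x - center) for x in data])

print(f"mean absolute deviation: {mean_abs_dev:.4f}")
```

Because the deviations are not squared, large deviations weigh less here than they do in the standard deviation.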