Descriptive Statistics
Descriptive statistics help us summarize and organize large sets of data so we can easily understand their main features. We typically look at two main aspects: the center of the data and the spread of the data.
Measures of Center
Measures of center tell us where the middle or typical value of a data set lies.
- Mean (Average): The sum of all values divided by the number of values.
- Median: The middle value when the data is ordered from least to greatest. If there is an even number of values, it is the average of the two middle numbers.
- Mode: The value(s) that appear most frequently.
Example: Find the mean, median, and mode of the data set: 3,5,7,7,9,12,15.
- Mean: $\frac{3+5+7+7+9+12+15}{7} = \frac{58}{7} \approx 8.29$
- Median: The numbers are already in order. The middle (4th) number is 7.
- Mode: The number 7 appears most often.
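The worked example above can be checked with a short Python sketch. It uses the standard library's `statistics` module (Python 3.8 or later for `multimode`); the variable names are illustrative only.

```python
from statistics import mean, median, multimode

data = [3, 5, 7, 7, 9, 12, 15]

data_mean = mean(data)        # sum of the values divided by how many there are
data_median = median(data)    # middle value of the sorted data
data_modes = multimode(data)  # list of every value tied for most frequent

print(round(data_mean, 2))  # 8.29
print(data_median)          # 7
print(data_modes)           # [7]
```

`multimode` returns a list, which matches the definition of the mode as the value(s) that appear most frequently: a data set with a tie has more than one mode.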
Measures of Spread
Measures of spread describe how stretched or squeezed the data is.
- Range: The difference between the maximum and minimum values. For the data above, the range is $15 - 3 = 12$.
- Interquartile Range (IQR): The range of the middle 50% of the data. It is calculated as $Q_3 - Q_1$, where $Q_1$ is the median of the lower half and $Q_3$ is the median of the upper half.
- Standard Deviation: A measure of how much the individual data values deviate from the mean. A low standard deviation means the data is tightly clustered around the mean; a high standard deviation means the values are spread farther from it.
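As a rough illustration, the three measures of spread can be computed for the same data set as follows. This is a sketch under two assumptions that the text does not settle: when the number of values is odd, the middle value is left out of both halves when finding the quartiles, and both the population and the sample standard deviation are shown because the text does not say which one is meant. The helper name `quartiles` is hypothetical.

```python
from statistics import median, pstdev, stdev

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd count, the middle value belongs to neither half.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form two halves")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values before the middle position
    upper = ordered[-half:]   # values after the middle position
    return median(lower), median(upper)

data = [3, 5, 7, 7, 9, 12, 15]

data_range = max(data) - min(data)
q1, q3 = quartiles(data)
iqr = q3 - q1

print(data_range)  # 12
print(q1, q3)      # 5 12
print(iqr)         # 7

# Population standard deviation divides by n; sample divides by n - 1.
print(round(pstdev(data), 2))  # 3.81
print(round(stdev(data), 2))   # 4.11
```

Other quartile conventions exist (software packages often interpolate), so Q1 and Q3 from a calculator or library may differ slightly from the median-of-halves values here.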
Box Plots and Outliers
A box plot (or box-and-whisker plot) is a visual display of the five-number summary:
- Minimum: The lowest value (excluding outliers).
- First Quartile (Q1): The median of the lower half of the data.
- Median (Q2): The middle of the data set.
- Third Quartile (Q3): The median of the upper half of the data.
- Maximum: The highest value (excluding outliers).
The "box" is drawn from Q1 to Q3, representing the middle 50% of the data, with a line inside marking the median (a vertical line when the plot is drawn horizontally). The "whiskers" extend outward from the box to the minimum and maximum values that are not outliers.
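The five-number summary behind a box plot can be sketched in a few lines of Python. The function name `five_number_summary` is hypothetical, and the sketch assumes at least two values, no outliers (so the plain minimum and maximum are the whisker ends), and the median-of-halves convention with the middle value excluded from both halves for an odd count.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

summary = five_number_summary([3, 5, 7, 7, 9, 12, 15])
print(summary)  # (3, 5, 7, 12, 15)
```

For this data the box would run from 5 to 12 with the median line at 7, and the whiskers would reach 3 and 15.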
Identifying Outliers: An outlier is an unusually high or low value that stands out from the rest of the data. Mathematically, a value is considered an outlier if it is:
- Less than $Q_1 - 1.5 \times \text{IQR}$
- Greater than $Q_3 + 1.5 \times \text{IQR}$
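The 1.5 × IQR rule above might be applied in code like this. It is a minimal sketch: `quartiles` and `find_outliers` are hypothetical helper names, the quartiles use the median-of-halves convention (middle value excluded for an odd count), and the value 40 is made up purely to show an outlier being flagged.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

def find_outliers(values):
    """Return the values lying outside the 1.5 * IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

original = [3, 5, 7, 7, 9, 12, 15]
with_extreme = original + [40]  # 40 is a made-up value for illustration

print(find_outliers(original))      # []
print(find_outliers(with_extreme))  # [40]
```

For the original data, Q1 = 5, Q3 = 12, and IQR = 7, so the fences are -5.5 and 22.5 and no value is flagged. With 40 added, Q1 = 6, Q3 = 13.5, and IQR = 7.5, giving fences of -5.25 and 24.75, so 40 falls above the upper fence.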