Descriptive Statistics

Descriptive statistics help us summarize and organize large sets of data so we can easily understand their main features. We typically look at two main aspects: the center of the data and the spread of the data.

Measures of Center

Measures of center tell us where the middle or typical value of a data set lies.

  • Mean (Average): The sum of all values divided by the number of values.
  • Median: The middle value when the data is ordered from least to greatest. If there is an even number of values, it is the average of the two middle numbers.
  • Mode: The value(s) that appear most frequently.

Example: Find the mean, median, and mode of the data set: 3, 5, 7, 7, 9, 12, 15.

  • Mean: $\frac{3 + 5 + 7 + 7 + 9 + 12 + 15}{7} = \frac{58}{7} \approx 8.29$
  • Median: The numbers are already in order. The middle (4th) number is $7$.
  • Mode: The number $7$ appears most often (twice).
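The worked example above can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

data = [3, 5, 7, 7, 9, 12, 15]

data_mean = mean(data)        # sum of the values divided by their count (58 / 7)
data_median = median(data)    # middle value of the sorted data
data_modes = multimode(data)  # list of every value tied for most frequent

print(round(data_mean, 2), data_median, data_modes)  # 8.29 7 [7]
```

`multimode` is used here rather than `mode` because a data set can have more than one most-frequent value, and `multimode` returns all of them.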

Measures of Spread

Measures of spread describe how stretched or squeezed the data is.

  • Range: The difference between the maximum and minimum values. For the data above, the range is $15 - 3 = 12$.
  • Interquartile Range (IQR): The range of the middle 50% of the data. It is calculated as $Q3 - Q1$, where $Q1$ is the median of the lower half and $Q3$ is the median of the upper half.
  • Standard Deviation: A measure of how much the individual data values deviate from the mean. A low standard deviation means the data is tightly clustered around the mean; a high standard deviation means the values are spread farther from it.
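As a rough illustration, the three measures of spread can be computed for the same data set 3, 5, 7, 7, 9, 12, 15. Two assumptions go beyond the text: when the number of values is odd, the middle value is left out of both halves when finding Q1 and Q3, and both the population and the sample standard deviation are shown, because the text does not say which one is meant.

```python
from statistics import median, pstdev, stdev

data = [3, 5, 7, 7, 9, 12, 15]
ordered = sorted(data)

# Range: maximum minus minimum
data_range = ordered[-1] - ordered[0]

# Quartiles as medians of the lower and upper halves
# (assumption: with an odd count, the middle value belongs to neither half)
half = len(ordered) // 2
q1 = median(ordered[:half])
q3 = median(ordered[len(ordered) - half:])
iqr = q3 - q1

# Standard deviation: population (divide by n) and sample (divide by n - 1)
population_sd = pstdev(data)
sample_sd = stdev(data)

print(data_range, q1, q3, iqr)  # 12 5 12 7
print(round(population_sd, 2), round(sample_sd, 2))
```

Other quartile conventions exist, so software may report slightly different Q1 and Q3 values for the same data.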

Box Plots and Outliers

A box plot (or box-and-whisker plot) is a visual display of the five-number summary:

  1. Minimum: The lowest value (excluding outliers).
  2. First Quartile (Q1): The median of the lower half of the data.
  3. Median (Q2): The middle of the data set.
  4. Third Quartile (Q3): The median of the upper half of the data.
  5. Maximum: The highest value (excluding outliers).

The "box" is drawn from $Q1$ to $Q3$, representing the middle 50% of the data, with a vertical line inside marking the median. The "whiskers" extend outward to the minimum and maximum values.
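The five-number summary behind a box plot can be sketched in Python as follows. The function name `five_number_summary` is hypothetical, and the quartiles use the median-of-halves rule with the middle value left out of both halves for an odd count, which is an assumption. This sketch reports the raw minimum and maximum and does not remove outliers first.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of at least two numbers."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                 # median of the lower half
    q3 = median(ordered[len(ordered) - half:])  # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

summary = five_number_summary([3, 5, 7, 7, 9, 12, 15])
print(summary)  # (3, 5, 7, 12, 15)
```

For this data the box would run from 5 to 12 with the median line at 7, and the whiskers would reach 3 and 15.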

Identifying Outliers: An outlier is an unusually high or low value that stands out from the rest of the data. Mathematically, a value is considered an outlier if it is:

  • Less than $Q1 - 1.5 \times IQR$
  • Greater than $Q3 + 1.5 \times IQR$
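The 1.5 × IQR rule can be sketched as a small Python function. The name `find_outliers` is hypothetical, the quartiles again assume the median-of-halves rule, and the value 40 in the second data set is made up purely to show an outlier being flagged.

```python
from statistics import median

def find_outliers(values):
    """Return the values lying outside the 1.5 * IQR fences (needs at least two values)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                 # median of the lower half
    q3 = median(ordered[len(ordered) - half:])  # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < lower_fence or x > upper_fence]

# Original example: fences are 5 - 10.5 = -5.5 and 12 + 10.5 = 22.5
print(find_outliers([3, 5, 7, 7, 9, 12, 15]))      # []

# Same data with a hypothetical extra value of 40
print(find_outliers([3, 5, 7, 7, 9, 12, 15, 40]))  # [40]
```

With 40 added, Q1 is 6 and Q3 is 13.5, so the upper fence is 13.5 + 1.5 × 7.5 = 24.75 and 40 falls beyond it.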