Box Plots and Interquartile Range
A box plot (or box-and-whisker plot) is a visual way to show how a set of data is spread out. Instead of showing every single data point, it uses five key numbers to split the ordered data into four parts, each containing about a quarter of the values. The cut points between these parts are called quartiles.
The Five-Number Summary
To draw a box plot, you first need to find the five-number summary:
- Minimum: The smallest number in the dataset.
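- First Quartile (Q1): The median of the lower half of the ordered data. If the dataset has an odd number of values, leave the median itself out of both halves.
- Median (Q2): The middle value of the ordered dataset. If there is an even number of values, it is the average of the two middle values.
- Third Quartile (Q3): The median of the upper half of the ordered data.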
- Maximum: The largest number in the dataset.
Example: Find the five-number summary for the dataset: 2, 4, 6, 8, 10, 12, 14.
- The data is already in order.
- Median (Q2): The middle number is 8.
- Lower half: 2, 4, 6. The median of this half is Q1 = 4.
- Upper half: 10, 12, 14. The median of this half is Q3 = 12.
- Minimum: 2
- Maximum: 14
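The steps above can be checked with a short Python sketch. It assumes the "median of each half" convention used in this example, where the median is left out of both halves when the count is odd. Spreadsheets and statistics libraries often use other quartile methods, so they may report slightly different values for Q1 and Q3. The function names `median` and `five_number_summary` are illustrative, not taken from any library.

```python
def median(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]                      # odd count: the single middle value
    return (values[mid - 1] + values[mid]) / 2  # even count: average the two middle values


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumes the "median of each half" method: with an odd number of
    values, the middle value is excluded from both halves.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]           # lower half of the ordered data
    upper = values[half + n % 2:]   # upper half (skips the middle value if n is odd)
    return (values[0], median(lower), median(values), median(upper), values[-1])


print(five_number_summary([2, 4, 6, 8, 10, 12, 14]))  # (2, 4, 8, 12, 14)
```

Sorting inside the function means the data does not need to be in order beforehand; for this example it already is.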
Drawing a Box Plot
Once you have your five numbers, you can draw the plot along a number line:
- Draw a box that starts at Q1 and ends at Q3.
- Draw a vertical line inside the box at the Median (Q2).
- Draw lines (called "whiskers") extending from the ends of the box out to the Minimum and Maximum values.
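The drawing steps can be sketched in plain Python by mapping each of the five numbers to a column on a one-line text "number line". This is a hypothetical helper, `text_box_plot`, not a library function, and it only marks the whisker ends, the box edges and the median rather than drawing a full rectangle.

```python
def text_box_plot(minimum, q1, median, q3, maximum, width=41):
    """Draw a one-line box plot from a five-number summary.

    '|' marks the whisker ends and the median, '[' and ']' mark the box
    edges at Q1 and Q3, and '-' marks the whiskers.
    """
    if not minimum <= q1 <= median <= q3 <= maximum:
        raise ValueError("values must satisfy min <= Q1 <= median <= Q3 <= max")
    if maximum == minimum:
        return "|"
    scale = (width - 1) / (maximum - minimum)

    def pos(x):
        # Map a data value to a character column on the number line.
        return round((x - minimum) * scale)

    line = [" "] * width
    for col in range(pos(minimum), pos(q1)):
        line[col] = "-"           # left whisker: minimum up to Q1
    for col in range(pos(q3) + 1, pos(maximum) + 1):
        line[col] = "-"           # right whisker: Q3 up to maximum
    line[pos(minimum)] = "|"      # end of left whisker
    line[pos(maximum)] = "|"      # end of right whisker
    line[pos(q1)] = "["           # box starts at Q1
    line[pos(q3)] = "]"           # box ends at Q3
    line[pos(median)] = "|"       # median line inside the box
    return "".join(line)


print(text_box_plot(2, 4, 8, 12, 14))
# |------[            |            ]------|
```

For the example data the box spans 4 to 12, so it covers two thirds of the line, and the median mark sits exactly in its middle.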
The Interquartile Range (IQR)
The Interquartile Range (IQR) measures the spread of the middle 50% of your data. It is the length of the box in your box plot.
IQR = Q3 − Q1
Using our previous example: IQR = 12 − 4 = 8
Identifying Outliers
An outlier is a data point that lies unusually far from the rest of the data. A common way to check for outliers is the 1.5 × IQR rule. Under this rule, a number is an outlier if it is:
- Lower than Q1 − 1.5 × IQR
- Higher than Q3 + 1.5 × IQR
Example: Using the quartiles from our dataset (Q1 = 4, Q3 = 12, IQR = 8), would the value 25 be an outlier?
- Calculate the upper boundary: Q3 + 1.5 × IQR = 12 + 1.5 × 8 = 12 + 12 = 24.
- Since 25 is greater than the upper boundary of 24, it lies beyond the range the rule treats as typical. Therefore, 25 is an outlier.
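The outlier check can be written as a small Python sketch. The names `outlier_fences` and `is_outlier` are illustrative, and the multiplier `k = 1.5` is the one from the rule above. The boundaries are computed from quartiles already found by hand, as in the example.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) boundaries of the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if value is strictly below the lower or strictly above the upper boundary."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper


# Quartiles from the worked example: Q1 = 4, Q3 = 12, so IQR = 8.
print(outlier_fences(4, 12))  # (-8.0, 24.0)
print(is_outlier(25, 4, 12))  # True
print(is_outlier(24, 4, 12))  # False: on the boundary, not beyond it
```

Values exactly on a boundary are not flagged, matching the "lower than" and "higher than" wording of the rule.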