Module 2: Describing, Clarifying and Presenting Data
4. Summarising data
4.3. Identifying an outlier
Let’s look again at the stem-and-leaf plot and the histogram that were developed for the student marks data set.
Notice how the mark of 16 sits well apart from the other marks and so departs from the overall shape of the frequency histogram. This indicates that 16 is an outlier in this set of marks.
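To make the idea concrete, here is a minimal sketch of a text stem-and-leaf plot. The list of marks is hypothetical and invented for illustration (only the value 16 comes from this page), and the helper name `stem_and_leaf` is likewise an assumption, not part of the course materials.

```python
from collections import defaultdict

# Hypothetical marks, invented for illustration.
# Only the value 16 is taken from the text above.
marks = [16, 52, 55, 58, 61, 63, 64, 67, 68, 70, 71, 73, 75, 78, 82, 85]


def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens digit (stem)."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    # Keep empty stems so that gaps in the data remain visible.
    return {
        stem: groups.get(stem, [])
        for stem in range(min(groups), max(groups) + 1)
    }


plot = stem_and_leaf(marks)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

With these made-up marks, stem 1 holds the single leaf 6, and stems 2 to 4 are empty. That gap between 16 and the remaining marks is the kind of break in the pattern that suggests an outlier.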
Why did this one student score so poorly in this subject in comparison with his/her peers? Can this deviation be explained? Several explanations come to mind:
- The student missed the final examination due to illness.
- The student did not attempt any assessment tasks throughout the session.
The student and the lecturer/tutor would be in a position to assess whether one of these explanations is appropriate, but anyone else would not have the relevant information.
Outliers can be significant or they can be a mismeasurement
An outlier can be an unusual, important observation. Alternatively, it could be a mismeasurement. Understanding the context and checking the data might resolve questions associated with the outlier, but often there is a dilemma about how outliers should be treated.
Outliers can distort the mean of a set of data, because a single extreme value pulls the mean towards it. For this reason, data involving income or pricing is often summarised using the median. For example, in the real estate section of the newspaper, the median house price for a suburb is often used rather than the mean price, because an outlier such as a very high-priced mansion will have less effect on the median price than it would on the mean price. You might also note that the highest and lowest prices are often reported, so that potential buyers or sellers have some idea of the range of prices paid for a house in that suburb.
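The effect of one extreme value on the mean and the median can be checked with a short calculation. The sketch below uses Python's standard `statistics` module. The sale prices are hypothetical, invented for illustration, and are not taken from any real suburb.

```python
from statistics import mean, median

# Hypothetical sale prices in thousands of dollars, invented for illustration.
typical_sales = [420, 450, 470, 480, 510, 530]
# The same suburb with one very high-priced mansion added.
with_mansion = typical_sales + [3200]

for label, prices in [("Without mansion", typical_sales),
                      ("With mansion", with_mansion)]:
    print(label)
    print(f"  mean:   {mean(prices):.1f}")
    print(f"  median: {median(prices):.1f}")
    # Lowest and highest prices, as a newspaper summary might report them.
    print(f"  range:  {min(prices)} to {max(prices)}")
```

With these made-up figures, adding the mansion moves the median only from 475 to 480, while the mean rises from about 477 to about 866. The median still describes a typical sale, whereas the mean no longer does.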
This page last updated 31 August 2009