1331.0 - Statistics - A Powerful Edge!, 1996  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 31/07/1998   

ORGANISING DATA

STEM AND LEAF PLOTS

The use of a stem and leaf plot, or stemplot, is a technique to classify either discrete or continuous variables.

In the previous example on battery life, it can be seen that there are two observations that lie in the interval 360-369. However, it cannot be seen from the table what those actual observations are.

The two values, 363 and 369, can only be found by searching through all the original data. The main advantage of a stemplot is that the data are grouped whilst also displaying all the original data.

Each observation may be considered as consisting of two parts: a stem and a leaf. To make a stemplot, each observation must first be separated into its two parts:

  • a stem is the first digit or digits;
  • a leaf is the final digit of a value;
  • each stem can consist of any number of digits; and
  • each leaf can only have a single digit.

For example:
  • if the value of an observation is 25: the stem is 2 and the leaf is 5; and
  • if the value of an observation is 369: the stem is 36 and the leaf is 9.

Where observations are accurate to one or more decimal places, such as 23.7, the stem is 23 and the leaf is 7. (The number 23.7 could be rounded off to 24 to limit the number of stems if the range of values is too great.)
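
The splitting rule described above can be sketched in a few lines of Python. This is an illustration only: the function name `split_stem_leaf` and its `leaf_unit` parameter (the place value of the leaf digit, 1 for whole numbers and 0.1 for values recorded to one decimal place) are assumptions made for this sketch, and it assumes non-negative observations.

```python
def split_stem_leaf(value, leaf_unit=1):
    """Split a non-negative observation into (stem, leaf).

    leaf_unit is the place value of the leaf digit: 1 for whole
    numbers, 0.1 for values recorded to one decimal place.
    (Hypothetical helper, for illustration only.)
    """
    # Express the value as a whole number of leaf units, e.g. 23.7 -> 237.
    scaled = round(value / leaf_unit)
    # The final digit is the leaf; the digits before it form the stem.
    stem, leaf = divmod(scaled, 10)
    return stem, leaf


print(split_stem_leaf(25))         # (2, 5)
print(split_stem_leaf(369))        # (36, 9)
print(split_stem_leaf(23.7, 0.1))  # (23, 7)
```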

In stemplots, tally marks are not required as the actual data are used.


EXAMPLE

1. The numbers of books ten students read in one year were as follows:
    12, 23, 19, 6, 10, 7, 15, 25, 21, 12.
Prepare a stemplot for the data.

Stem | Leaf
0 | 6 7
1 | 2 9 0 5 2
2 | 3 5 1

In the table:
  • the stem ‘0’ represents the class interval 0-9,
  • the stem ‘1’ represents the class interval 10-19, and
  • the stem ‘2’ represents the class interval 20-29.

Note that the number ‘6’ can be written as 06, thus having a stem of 0 and a leaf of 6.

Usually, a stemplot is placed in order, which simply means that the leaves are arranged in ascending order from left to right. Also, commas that separate the leaves (digits) are not necessary since the leaf is always a single digit.

Using the above table, the resultant ordered stemplot is shown below:

Stem | Leaf
0 | 6 7
1 | 0 2 2 5 9
2 | 1 3 5
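
The steps used to build the ordered stemplot for the book data can be sketched as follows. This is a minimal illustration that assumes non-negative whole numbers; the function name `ordered_stemplot` is made up for this sketch.

```python
from collections import defaultdict


def ordered_stemplot(values):
    """Return {stem: sorted leaves} for non-negative whole numbers."""
    leaves = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # e.g. 19 -> stem 1, leaf 9
        leaves[stem].append(leaf)
    # Keep every stem from the lowest to the highest, even one with no leaves.
    return {
        stem: sorted(leaves.get(stem, []))
        for stem in range(min(leaves), max(leaves) + 1)
    }


books = [12, 23, 19, 6, 10, 7, 15, 25, 21, 12]
plot = ordered_stemplot(books)
for stem, stem_leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in stem_leaves))
```

Sorting the leaves within each stem is the only difference between the unordered and the ordered stemplot.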

SPLITTING STEMS

If the leaves are crowded on too few stems, then it is useful to split each stem into two or more components. Thus, for the interval 0-9, a stem split in two would create one interval of 0-4 and another interval of 5-9. A stem split in five would create the intervals 0-1, 2-3, 4-5, 6-7 and 8-9.

EXAMPLE

1. Fifteen people were asked how often they drove to work over ten working days. The number of times each person drove was as follows:
5, 7, 9, 9, 3, 5, 1, 0, 0, 4, 3, 7, 2, 9, 8.
Prepare an ordered stemplot for this data.

The stemplot could be drawn as follows:

Stem | Leaf
0 | 0 0 1 2 3 3 4 5 5 7 7 8 9 9 9

This stemplot’s organisation does not give much information about the data. Having only one stem creates an overcrowded leaf. In this case it is useful to split the stem. The stemplot is then displayed as follows:

Stem | Leaf
0(0) | 0 0 1 2 3 3 4
0(5) | 5 5 7 7 8 9 9 9

  • The stem 0(0) means all the data within the interval 0-4.
  • The stem 0(5) means all the data within the interval 5-9.


EXAMPLE

2. A swimmer training for a competition recorded the number of 50 metre laps she swam each day for thirty days. The numbers of laps recorded each day were as follows:
22, 21, 24, 19, 27, 28, 24, 25, 29, 28, 26, 31, 28, 27, 22, 39, 20, 10, 26, 24, 27, 28, 26, 28, 18, 32, 29, 25, 31, 27
a) Prepare an ordered stemplot. Make a brief comment on what the stemplot shows.
b) Redraw the stemplot by splitting the stems into five-unit intervals. Make a brief comment on what the new stemplot shows.

a) The observations range in value from 10 to 39 so the stemplot should have stems of 1, 2 and 3. The ordered stemplot is shown below:

Stem | Leaf
1 | 0 8 9
2 | 0 1 2 2 4 4 4 5 5 6 6 6 7 7 7 7 8 8 8 8 8 9 9
3 | 1 1 2 9

It is obvious from the stemplot that the swimmer usually swims between 20 and 29 laps in training each day.

b) Splitting the stems into five-unit intervals gives the following stemplot:

Stem | Leaf
1(0) | 0
1(5) | 8 9
2(0) | 0 1 2 2 4 4 4
2(5) | 5 5 6 6 6 7 7 7 7 8 8 8 8 8 9 9
3(0) | 1 1 2
3(5) | 9

Note that 1(0) means all the data between 10 and 14, 1(5) means all the data between 15 and 19, and so on.

The revised stemplot shows that the swimmer usually swims between 25 and 29 laps in training each day. The values 1(0) 0 = 10 and 3(5) 9 = 39 are outliers: a concept that is described shortly.
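
Splitting each stem into the two intervals 0-4 and 5-9 can be sketched as below, using the swimmer's lap counts. The row labels follow the 1(0) and 1(5) notation used in the text; the function name `split_stemplot` is an assumption for this sketch, and it handles only non-negative whole numbers split into two halves.

```python
def split_stemplot(values):
    """Ordered stemplot with each stem split into halves 0-4 and 5-9.

    Rows are labelled 'stem(0)' and 'stem(5)'.
    Assumes non-negative whole numbers.
    """
    stems = [value // 10 for value in values]
    rows = {}
    # Create both halves of every stem so that empty rows still appear.
    for stem in range(min(stems), max(stems) + 1):
        rows[f"{stem}(0)"] = []
        rows[f"{stem}(5)"] = []
    # Visiting the values in ascending order keeps each row's leaves ordered.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf < 5 else 5
        rows[f"{stem}({half})"].append(leaf)
    return rows


laps = [22, 21, 24, 19, 27, 28, 24, 25, 29, 28, 26, 31, 28, 27, 22,
        39, 20, 10, 26, 24, 27, 28, 26, 28, 18, 32, 29, 25, 31, 27]
split_rows = split_stemplot(laps)
for label, row_leaves in split_rows.items():
    print(label, "|", " ".join(str(leaf) for leaf in row_leaves))
```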

EXAMPLE

3. The weights (to the nearest tenth of a kilogram) of 30 students were measured and recorded as follows:

59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7, 61.6, 56.3, 61.9, 65.7, 60.4, 58.9, 59.0, 61.2, 62.1, 61.4, 58.4, 60.8, 60.2, 62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2.
Prepare an ordered stemplot for the data and briefly comment on what the analysis indicates.

In this case, the stems will be the whole number values and the leaves will be the decimal values. The data ranges from 56.3 to 65.7 so the stems should start at 56 and finish at 65.


Stem | Leaf
56 | 3
57 |
58 | 4 4 9
59 | 0 0 2 3 8
60 | 0 2 4 5 7 8 9
61 | 1 2 4 4 5 6 7 9 9
62 | 1 2 3 7
63 |
64 |
65 | 7

It is not necessary to split stems because the leaves are not crowded on too few stems; nor is it necessary to round the values as the range of values is not large. The stemplot reveals that the group with the highest number of observations recorded is the 61.0 to 61.9 group.
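
For values recorded to one decimal place, the same construction works once each value is expressed in tenths. The sketch below uses the students' weights; the function name `decimal_stemplot` is hypothetical, and it assumes every value has exactly one decimal place.

```python
def decimal_stemplot(values):
    """Stemplot for values recorded to one decimal place.

    The whole-number part is the stem and the tenths digit is the leaf.
    """
    rows = {}
    for value in sorted(values):
        tenths = round(value * 10)  # 59.2 -> 592; rounding absorbs float error
        stem, leaf = divmod(tenths, 10)
        rows.setdefault(stem, []).append(leaf)
    # Include stems with no leaves so that gaps in the data stay visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


weights = [59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7,
           61.6, 56.3, 61.9, 65.7, 60.4, 58.9, 59.0, 61.2, 62.1, 61.4,
           58.4, 60.8, 60.2, 62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2]
weight_plot = decimal_stemplot(weights)
for stem, stem_leaves in weight_plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in stem_leaves))

# The stem holding the most leaves is the group with the most observations.
busiest = max(weight_plot, key=lambda stem: len(weight_plot[stem]))
print("Stem with most leaves:", busiest)  # 61
```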

OUTLIERS

An outlier is an extreme value of the data. It is an observation value that is significantly different from the rest of the data. There may be more than one outlier in a set of data.

Sometimes, outliers are significant pieces of data and should not be ignored. In other instances, they occur as a result of an error or misinformation and should be ignored.

In the previous example, outliers are 56.3 and 65.7, as these two values are quite different from the other values.

By ignoring these two outliers, the previous example’s stemplot could be redrawn as below:

Stem | Leaf
58 | 4 4 9
59 | 0 0 2 3 8
60 | 0 2 4 5 7 8 9
61 | 1 2 4 4 5 6 7 9 9
62 | 1 2 3 7

Outliers: 56|3 and 65|7

When using a stemplot, spotting an outlier is often a matter of judgement. This is because, except when using boxplots (explained in the section Box and Whisker Plots), there is no strict rule to specify how far removed a value must be from the rest of a data set to qualify as an outlier.
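
Because no strict rule applies here, the sketch below does not decide what counts as an outlier. It only reports how far the lowest and highest values sit from their nearest neighbours, which is the kind of gap a reader would look for on the stemplot. The function name `end_gaps` and the rounding to one decimal place are assumptions for this illustration.

```python
def end_gaps(values, ndigits=1):
    """Return ((lowest, gap to next value), (highest, gap to previous value)).

    This is an aid to judgement only; it applies no outlier rule.
    """
    ordered = sorted(values)
    low_gap = round(ordered[1] - ordered[0], ndigits)
    high_gap = round(ordered[-1] - ordered[-2], ndigits)
    return (ordered[0], low_gap), (ordered[-1], high_gap)


weights = [59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7,
           61.6, 56.3, 61.9, 65.7, 60.4, 58.9, 59.0, 61.2, 62.1, 61.4,
           58.4, 60.8, 60.2, 62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2]
low, high = end_gaps(weights)
print(low)   # (56.3, 2.1)
print(high)  # (65.7, 3.0)
```

Both gaps are several times larger than any gap between neighbouring values in the rest of the weights, which is consistent with treating 56.3 and 65.7 as outliers.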


FEATURES OF A DISTRIBUTION

When assessing the overall pattern of any distribution, the features to look for are the number of peaks, general shape (skewed or symmetric), centre and spread.


NUMBER OF PEAKS

The first characteristic that can be readily seen from a line graph is the number of high points or peaks the distribution has.

While most distributions that occur in statistical research have only one main peak (unimodal), other distributions may have two peaks (bimodal) or more than two peaks (multimodal).

Examples of unimodal, bimodal and multimodal line graphs are shown below:

Image: examples of unimodal, bimodal and multimodal line graphs



GENERAL SHAPE

The second main feature of a distribution is the extent to which it is symmetric.

A perfectly symmetric curve is one in which both sides of the distribution would exactly correspond if the figure was folded over its central point.

It should be noted, though, that it is unusual for a distribution to be perfectly symmetric. An example of a symmetric distribution is shown below:

Image: example of a symmetric distribution


A symmetric, unimodal, bell-shaped distribution - a relatively common occurrence - is called a normal distribution.


If the distribution is lop-sided, it is said to be skewed.

A distribution is said to be skewed to the right, or positively skewed, if most of the data are concentrated on the left of the distribution. The right tail clearly extends further from the centre than the left tail as shown below:

Image: positively skewed distribution


A distribution is said to be skewed to the left, or negatively skewed, if most of the data are concentrated on the right of the distribution. The left tail clearly extends further from the centre than the right tail as shown below:

Image: negatively skewed distribution



CENTRE AND SPREAD

Locating the centre (median) of a distribution can be done by counting half the observations up from the smallest. Obviously, this method is impracticable for very large sets of data. A stemplot makes this easy, as the data are arranged in ascending order. (A more precise technique of finding this mid-point is described in a later section.)

The amount of distribution spread and any large deviations from the general pattern (outliers) can be quickly spotted on the graph.


USING STEMPLOTS AS GRAPHS

A stemplot is a simple kind of graph that is made out of the numbers themselves, and is a means of displaying the main features of a distribution. By turning a stemplot on its side, it will resemble a histogram and provide similar visual information.

EXAMPLE

1. The results of forty-one students’ Maths tests (out of 70) are recorded below:

31, 49, 19, 62, 50, 24, 45, 23, 51, 32, 48, 55, 60, 40, 35, 54, 26, 57, 37, 43, 65, 50, 55, 18, 53, 41, 50, 34, 67, 56, 44, 4, 54, 57, 39, 52, 45, 35, 51, 63, 42.

a) Is the variable discrete or continuous? Explain.

b) Prepare an ordered stemplot for the data and briefly describe what the stemplot shows.

Are there any outliers? If so, what are they?

c) By turning the stemplot on its side (or rotating the page 90 degrees left), describe the distribution’s main features such as:

  • number of peaks,
  • symmetry, and
  • value at the centre of the distribution.


Answers:

a) A test score is a discrete variable. It is not possible to have a test score of 35.74542341... for example.

b) The lowest value is 4 and the highest is 67. Therefore, the stemplot for Maths test results that covers this range of values is as follows:


Stem | Leaf
0 | 4
1 | 8 9
2 | 3 4 6
3 | 1 2 4 5 5 7 9
4 | 0 1 2 3 4 5 5 8 9
5 | 0 0 0 1 1 2 3 4 4 5 5 6 7 7
6 | 0 2 3 5 7

2|4 represents 24

The stemplot reveals that more students obtained a mark in the interval 50 to 59 than in any other interval. The large number of students who obtained high results could mean the test was too easy, most students knew the subject being tested, or a combination of both.

The result of 4 could be an outlier, as there is a gap between this and the next result, 18.

c) If the stemplot is turned on its side, it will look like the following:
Image: a stemplot turned on its side


The distribution has a single peak in the 50s interval.
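
The histogram-like view can be approximated by counting the leaves on each stem and drawing one asterisk per leaf. The sketch below does this for the test results and reports the stem with the most leaves; it is only an illustration and assumes whole-number scores.

```python
from collections import Counter

scores = [31, 49, 19, 62, 50, 24, 45, 23, 51, 32, 48, 55, 60, 40, 35,
          54, 26, 57, 37, 43, 65, 50, 55, 18, 53, 41, 50, 34, 67, 56,
          44, 4, 54, 57, 39, 52, 45, 35, 51, 63, 42]

# Count how many observations fall on each stem (the tens digit).
counts = Counter(score // 10 for score in scores)

# One asterisk per leaf gives the same outline as the stemplot's rows.
for stem in range(min(counts), max(counts) + 1):
    print(stem, "|", "*" * counts[stem])

# The stem with the most leaves marks the peak of the distribution.
peak = max(counts, key=counts.get)
print("Peak stem:", peak)  # 5
```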

Although there are only 41 observations, the distribution shows that most data are clustered at the right. The left tail extends further from the data centre than the right tail. Therefore, the distribution is skewed to the left or negatively skewed.

As there are 41 observations, the distribution centre will occur at the 21st observation. By counting 21 observations up from the smallest, the centre is 48. (Note: the same value would have been obtained if 21 observations were counted down from the highest observation. Measures of centre or location are discussed in detail in the sections Measures of Location - Mean - Median and Mode.)
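
The counting method for the centre can be sketched as follows. The function name `middle_observation` is hypothetical, and the sketch deliberately handles only an odd number of observations, as in this example; the more precise technique for other cases is left to the later section.

```python
def middle_observation(values):
    """Return (position, value) of the middle observation, counting from 1.

    Handles only an odd number of observations, as in the example.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 0:
        raise ValueError("even number of observations: no single middle value")
    position = (n + 1) // 2  # 41 observations -> the 21st
    return position, ordered[position - 1]


scores = [31, 49, 19, 62, 50, 24, 45, 23, 51, 32, 48, 55, 60, 40, 35,
          54, 26, 57, 37, 43, 65, 50, 55, 18, 53, 41, 50, 34, 67, 56,
          44, 4, 54, 57, 39, 52, 45, 35, 51, 63, 42]
print(middle_observation(scores))  # (21, 48)
```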


EXERCISES
1. Indicate which of the following are discrete or continuous variables:
a) The time taken for you to get to school.

b) The number of couples who were married last year.

c) The number of goals scored by a women’s hockey team.

d) The speed of a bicycle.

e) Your age.

f) The number of subjects which you can choose to do next year.

g) The time of a phone call between two people.

h) The annual income of an individual.

i) The number of people working at the Australian Bureau of Statistics.

j) The number of brothers and sisters you have.

k) The distance between your house and school.

l) The number of pages in this book.

2. Give two examples, different to any of those given in Question 1, of:
a) a discrete variable, and
b) a continuous variable.
3. a) Copy and complete the frequency distribution table for the following set of data:
2, 5, 4, 3, 4, 3, 1, 3, 3, 2, 3, 4.

Score (x) | Tally | Frequency (f)
1 | |
2 | |
3 | |
4 | |
5 | |
b) Which score occurs the most frequently (the mode)?
    4. A local milkbar owner records how many customers enter the store over a 25-day period. The number of customers is as follows:
    20, 21, 23, 21, 26, 24, 20, 24, 25, 22, 22, 23, 21, 24, 21, 26, 24, 22, 21, 23, 25, 22, 21, 24, 21.
    a) What type of variable is used?

    b) Present this data in a frequency distribution table by tallying the data.

    c) Which observation occurs the most frequently (the mode)?

    d) Set up a table to include the relative frequency and percentage frequency of the data.

    e) Comment briefly on what conclusions you can make from the tables.

    5. The wind speed (measured to the nearest knot) of the Fremantle Doctor was recorded for 40 days.
    15, 22, 14, 12, 21, 34, 19, 11, 13, 0, 16, 4, 23, 8, 12, 18, 24, 17, 14, 3, 10, 12, 9, 15, 20, 5, 19, 13, 17, 11, 16, 19, 24, 12, 7, 14, 17, 10, 14, 23.
    a) What type of variable is used?

    b) Choose an appropriate class interval and present this data in a frequency distribution table by tallying the data.

    c) Which class interval occurs the most frequently?

    d) Set up a table to include the relative frequency and percentage frequency of the data.

    e) Comment briefly on what conclusions you can make from the tables.
    6. Copy and complete the stem and leaf table below for the following set of data:
    21, 35, 27, 2, 18, 25, 10, 4, 43, 14, 29, 24, 15, 9, 26, 31, 41, 1, 28, 38, 40, 22, 37, 26, 19, 0, 33, 12, 16, 23.
    Stem | Leaf
    0 |
    1 |
    2 |
    3 |
    4 |
    Redraw the table so that it is an ordered stem and leaf table.

    7. a) Prepare an ordered stem and leaf plot for the data in Question 5.

    b) Do any outliers exist? If so, can you explain the reason for their presence?

    c) Describe the distribution’s main features:

    i) number of peaks,

    ii) general shape, and

    iii) approximate value at the distribution’s centre.

    8. The number of road fatalities in the A.C.T. from 1960 to 1992 was as follows:
    10, 7, 8, 8, 17, 15, 17, 23, 14, 26, 31, 20, 32, 29, 31, 32, 38, 29, 30, 24, 30, 29, 26, 28, 37, 33, 32, 36, 32, 32, 26, 17, 20.
    a) What type of variable is used?

    b) Prepare an ordered stem and leaf plot for the data.

    c) Expand the stemplot by using five-unit intervals.

    d) Do any outliers exist? If so, can you explain the reason for their presence?

    e) Describe the distribution’s main features:

    i) number of peaks,

    ii) general shape, and

    iii) approximate value at the distribution’s centre.

    9. The mean July daily minimum temperature (Celsius) for Sydney from 1972 to 1992 is recorded as follows:
    6.1, 8.9, 6.9, 7.2, 7.0, 6.2, 5.7, 6.2, 6.8, 6.4, 6.8, 6.4, 7.6, 7.8, 7.3, 6.8, 8.8, 7.8, 8.1, 8.1, 7.9.
    a) What type of variable is used?

    b) Prepare an ordered stem and leaf plot for the data.

    c) Is it necessary to expand the stemplot? Why or why not?

    d) Do any outliers exist? If so, can you explain the reason for their presence?

    e) Describe the main features of the distribution.

    10. Fifty company staff were surveyed and asked what their weekly salary was to the nearest dollar. The results follow:
    514, 476, 497, 511, 484, 513, 471, 470, 441, 466, 443, 481, 502, 528, 459, 548, 521, 517, 463, 478, 473, 514, 542, 519, 522, 523, 546, 487, 486, 473, 527, 470, 440, 564, 499, 523, 484, 463, 461, 437, 555, 525, 461, 539, 466, 470, 486, 490, 543, 519.
    a) What type of variable is used?

    b) Choose an appropriate class interval and present this data in a frequency distribution table by tallying the data.

    c) Which class interval occurs the most frequently?

    d) Set up a table to include the data’s relative frequency and percentage frequency.

    e) Comment briefly on what conclusions you can make from the tables.

    f) Prepare an ordered stem and leaf plot for the data.

    g) Do any outliers exist? If so, can you explain the reason for their presence?

    h) Describe the main features of the distribution such as:

    i) number of peaks,

    ii) general shape, and

    iii) approximate value at the distribution’s centre.





    CLASS ACTIVITIES

    1. Accurately draw a straight line measuring exactly 10 centimetres long. Without measuring, put a mark where you think halfway is (exactly). Now measure the length of each segment. By how many millimetres was your estimate short of the halfway (5 cm) mark? Record this value. Find out how much the rest of the class deviated from halfway.
    With this data, construct a frequency table including relative frequency and percentage frequency.
    Which result occurred the most?

    Prepare a stem and leaf plot for the data.

    Do any outliers exist?

    How many peaks does the distribution have?

    What is the distribution’s general shape?

    What is the distribution’s approximate centre?

    What conclusions can you make from the analysis?
    2. Ask your teacher to give you a class set of results from a recent test or assignment. Perform a detailed analysis on the data similar to that described above. Comment briefly on:
    a) standard of test or assignment,

    b) ability of the class, and

    c) standard of teaching, supporting each answer with evidence based on your analysis.
    3. Throw a dice 30 times. Record each result using a frequency table.
    What type of variable is being used? Calculate the relative frequencies and percentage frequencies.

    Which result occurred the most? Would you expect any number to occur more often than the others?

    Prepare a stem and leaf plot for the data.

    Do any outliers exist?

    How many peaks does the distribution have? What is the general shape of the distribution?

    What is the distribution’s approximate centre?

    What conclusions can you make from the analysis?
    4. Survey teachers in your school to find what colour car they drive. Don’t include shades of colours. What type of variable is this? Present the data in a frequency table, including relative frequency and percentage frequency.
    What colour car is the most popular among surveyed teachers? By what percentage is this colour more popular than the second most common colour?

    Why can’t you prepare a stemplot for this data?

    How do you think a car manufacturer might use this type of data analysis?

