Australian Bureau of Statistics

Education Services

Teacher Statistical Literacy

Concepts and definitions

    Statistics

    Statistics are numerical data that have been organised to serve a useful purpose. A major role of the ABS is to provide the Australian community with statistics that will help them make informed decisions. Statistical information provided by the ABS is used widely in Australia by governments, business people, researchers, members of the public, teachers and students.

    Data
    Data are observations or facts which, when collected, organised and evaluated, become information or knowledge.

    Data item
    A data item is the smallest piece of information that can be obtained from a survey or census.

    Dataset
    A dataset is data collected for a particular study. It represents a collection of elements, and for each element it includes information on one or more characteristics.

    Outlier
    An outlier is an extreme value in a set of data: an observation that is markedly different from the rest of the data. There may be more than one outlier in a set of data.
    Sometimes outliers are important pieces of data and should not be ignored. In other instances they are the result of an error or misinformation and should be excluded. The decision to include or exclude an outlier needs to be clearly justified when discussing results.

    The weights (in kilograms) of 30 students were measured and recorded in the stem and leaf plot shown in Figure 1. In this case, the stems are the whole-number parts of the weights and the leaves are the decimal digits. The outliers are 56.3 and 67.7.

    Stem | Leaf
    56 | 3
    57 |
    58 | 4 4 9
    59 | 0 0 2 3 8
    60 | 0 2 4 5 7 8 9
    61 | 1 2 4 4 5 6 7 9 9
    62 | 1 2 3 7
    63 |
    64 |
    65 |
    66 |
    67 | 7

    Key: 58 | 4 means 58.4 kg

    Fig 1 Stem and leaf plot

    Variables

    A variable is any measurable characteristic or attribute that can have different values for different subjects. Height, age, amount of income, country of birth, grades obtained at school and type of housing are examples of variables.

    Observation
    An observation is a single piece of data about a variable.

    Independent variable
    An independent variable is a variable whose values do not depend on changes in the values of other variables. It is the variable that is deliberately controlled or changed in order to assess the resulting changes in the dependent variable.

    Dependent variable
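    A dependent variable is a variable whose values depend on the independent variable; it is the variable that is measured to see how it responds to changes in the independent variable. For example, when testing whether the amount of fertiliser affects plant height, the amount of fertiliser is the independent variable and plant height is the dependent variable.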

    Categorical variables
    Nominal variable
    A nominal variable describes a name or category. For example, for the variable 'method of travel to school' all its values are words such as bus, walk, car and tram. Nominal variables are often referred to as categorical variables.

    Ordinal variable
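    An ordinal variable is a categorical variable whose categories have a natural order or ranking. The categories are often represented by numbers; for example, school year levels (Year 7, Year 8, Year 9) are ordinal.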

    Numerical variables
    A numerical variable is one that describes a numerically measured value. Numerical variables can be either discrete or continuous.

    Continuous variable
    A continuous variable is a numeric variable that can take any value within a certain range. For example, distance, age and temperature are continuous variables.

    Discrete variable
    A discrete variable can only take a finite number of values within a certain range. An example of a discrete variable is the number of children in a family: a family can have 0, 1, 2 or 3 children, but not 2.5.

    Class interval
    A class interval is a group of data values for a variable. The intervals are generally the same size – for example, 4-6, 7-9 and 10-12. However, the intervals may have different sizes such as 4-6, 7-9 and 10-14. The boundaries of class intervals must not overlap so that each observation can be allocated to only one interval.
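    The allocation rule above can be sketched in Python. This is a minimal illustration, assuming whole-number observations and the example intervals 4-6, 7-9 and 10-12 from the text, with both end points included in each interval; the observations and the function names `allocate` and `count_by_interval` are hypothetical. Because the intervals do not overlap, each observation matches at most one interval, and the sketch raises an error if that is ever not the case.

```python
# Example class intervals from the text, as (lower, upper) pairs.
# Both end points are included, which suits whole-number data.
INTERVALS = [(4, 6), (7, 9), (10, 12)]


def allocate(value, intervals=INTERVALS):
    """Return the one class interval containing value, or None if none does."""
    matches = [iv for iv in intervals if iv[0] <= value <= iv[1]]
    if len(matches) > 1:
        # Overlapping boundaries would let one observation fall in two intervals.
        raise ValueError(f"{value} falls in more than one interval: {matches}")
    return matches[0] if matches else None


def count_by_interval(values, intervals=INTERVALS):
    """Count the observations allocated to each class interval."""
    counts = {iv: 0 for iv in intervals}
    for v in values:
        iv = allocate(v, intervals)
        if iv is not None:
            counts[iv] += 1
    return counts


# Hypothetical observations, for illustration only.
observations = [4, 5, 7, 7, 8, 9, 10, 12, 6, 11]
for (low, high), n in count_by_interval(observations).items():
    print(f"{low}-{high}: {n}")
```

    With these observations the sketch prints 4-6: 3, 7-9: 4 and 10-12: 3, accounting for all ten values.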

    Sampling

    Census
    A census is a collection of information from all units in the population. The Census of Population and Housing is a statistical collection that aims to accurately measure the number of persons in Australia on Census night, their key characteristics and the dwellings in which they live.

    Estimate
    An estimate is an inference for the target population using information obtained from a sample of the population.

    Sample
    A sample is a part of a population, selected for the purpose of studying certain characteristics of the entire population of interest. A sample is used to represent the population. It is often possible to get a response from a sample when it would not be possible to get a response from every member of the population.

    Sample size
    The sample size is the number of units (such as persons, households, businesses or schools) being surveyed. In general, the larger the sample size, the smaller the sampling error.

    Random sample
    In a random sample, every unit in the target population has a known chance of selection, and that chance is greater than zero.

    Simple random sample
    All members of the sample are chosen at random, and every member of the population has the same chance of being selected.
    A Tattslotto draw is a good example of simple random sampling. A sample of six numbers is randomly drawn from a population of 45 numbers, with each number having an equal chance of being selected.
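    A draw like this can be simulated with Python's standard `random` module, whose `sample` method selects distinct items without replacement so that every number from 1 to 45 has the same chance of being chosen. This is an illustrative sketch only, not a description of how official lottery draws are conducted; the function name `simple_random_sample` is hypothetical.

```python
import random


def simple_random_sample(population, k, rng=None):
    """Choose k distinct members, each member having the same chance of selection."""
    if rng is None:
        rng = random.Random()
    # Random.sample draws without replacement, so no number can repeat.
    return sorted(rng.sample(population, k))


# A Tattslotto-style draw: six numbers from the population 1 to 45.
draw = simple_random_sample(range(1, 46), 6)
print(draw)
```

    Passing a seeded generator, for example `simple_random_sample(range(1, 46), 6, random.Random(1))`, makes a draw repeatable, which can be handy when checking students' work.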

    Systematic random sample
    The first member of the sample is chosen at random, and the remaining members are then taken at regular intervals from an ordered list of the population.
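    The procedure can be sketched as follows, assuming the population is an ordered list and the sampling interval is the population size divided by the sample size, rounded down. The class list of 24 students and the name `systematic_sample` are hypothetical. Only the starting point is random; every later member is fixed once the start is chosen.

```python
import random


def systematic_sample(population, n, rng=None):
    """Take a systematic random sample of size n from an ordered list.

    The sampling interval k is the population size divided by n, rounded down.
    The first member is chosen at random from the first k units, and every
    k-th unit after it is then taken.
    """
    if rng is None:
        rng = random.Random()
    k = len(population) // n
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = rng.randrange(k)  # random starting position 0 .. k-1
    return [population[start + i * k] for i in range(n)]


# Hypothetical class list of 24 students, numbered 1 to 24.
students = list(range(1, 25))
print(systematic_sample(students, 6))  # sampling interval k = 4
```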

    Stratified random sample
    Relevant subgroups (strata) within the population are identified, and a random sample is selected from within each stratum.
    For example, a school has 24 Year 7 students. Eight of the students are 11 years old, twelve are 12 years old and four are 13 years old. The strata are the age groups of the students.
    To take a stratified sample, select one quarter of the students in each age group at random: for instance, two students from the 11-year-olds, three students from the 12-year-olds and one student from the 13-year-olds.
    In this example, the strata are proportionally represented; however, this will not always be the case. The important thing to remember is to take a random sample from each stratum.
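    The stratified example can be sketched in Python. The student identifiers S01 to S24 are hypothetical labels for the 24 Year 7 students, and the sketch assumes proportional allocation: each stratum's sample size is one quarter of the stratum size, rounded with Python's built-in `round`. A simple random sample is then drawn separately within each stratum.

```python
import random


def stratified_sample(strata, fraction, rng=None):
    """Draw a simple random sample of the given fraction from every stratum.

    strata maps each stratum label to the list of units in that stratum.
    """
    if rng is None:
        rng = random.Random()
    sample = {}
    for label, units in strata.items():
        k = round(len(units) * fraction)  # proportional allocation
        sample[label] = rng.sample(units, k)
    return sample


# Hypothetical IDs for the 24 Year 7 students, grouped by age:
# eight 11-year-olds, twelve 12-year-olds and four 13-year-olds.
year7 = {
    11: [f"S{i:02d}" for i in range(1, 9)],
    12: [f"S{i:02d}" for i in range(9, 21)],
    13: [f"S{i:02d}" for i in range(21, 25)],
}

for age, chosen in stratified_sample(year7, 0.25).items():
    print(age, len(chosen), chosen)
```

    Each run selects 2, 3 and 1 students from the three age groups, although which students are chosen changes from run to run.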

    Non-random sample
    In a non-random sample, the chance of a member of the population being in the sample is unknown. The accuracy of the sample in representing the population is unknown.

    Quota sample
    This is a type of stratified sampling in which selection within the strata is non-random. Quota sampling requires setting the number of participants to include from each stratum, usually in proportion to the population.
    Take the example of Year 7 students from the stratified random sample above, grouped into strata by age. Unlike stratified random sampling, where participants are selected at random, participants in a quota sample are selected to fill the quotas.
    For instance, the first three twelve-year-old Year 7 students to arrive at school on a given day may be selected. However, this sample may not be representative of all twelve-year-olds in Year 7.
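    For contrast with stratified random sampling, the quota procedure can be illustrated as a non-random selection that simply takes students in the order they arrive until each quota is full. The arrival order, the student identifiers and the name `quota_sample` are hypothetical; the quotas of two, three and one student follow the stratified example above.

```python
def quota_sample(arrivals, quotas):
    """Select units in arrival order until every quota is filled.

    arrivals: (student_id, stratum) pairs in the order students arrive.
    quotas: number of participants wanted from each stratum.
    Selection is non-random: whoever arrives first fills the quota.
    """
    remaining = dict(quotas)
    selected = []
    for student_id, stratum in arrivals:
        if remaining.get(stratum, 0) > 0:
            selected.append(student_id)
            remaining[stratum] -= 1
            if all(n == 0 for n in remaining.values()):
                break  # every quota has been filled
    return selected


# Quotas by age, following the stratified example: 2, 3 and 1 students.
quotas = {11: 2, 12: 3, 13: 1}

# Hypothetical arrival order of (student ID, age) on one morning.
arrivals = [("S09", 12), ("S10", 12), ("S01", 11), ("S11", 12),
            ("S12", 12), ("S02", 11), ("S03", 11), ("S21", 13), ("S04", 11)]

print(quota_sample(arrivals, quotas))
```

    With the arrival order shown, the sketch selects S09, S10, S01, S11, S02 and S21. Because the result depends only on who arrives first, students who tend to arrive early are over-represented, which is why a quota sample may not represent the population.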

    Convenience sample
    In a convenience sample, participants are selected by how easy it is to reach them.
    For example, the first ten students to walk through the front gates of the school are an easy sample to take. Convenience sampling does not produce a representative sample of the population because people or things that can be reached easily and conveniently are likely to be different to those that are harder to reach.

    Volunteer sample
    In a volunteer sample, participants choose to be part of the survey.
    Phone-in sampling is a common method of volunteer sampling used by television and radio stations to measure public opinion. People are asked to telephone or SMS their vote on a particular issue by a certain time. There is no control over how many people vote.
    There are two main problems with this type of sampling. Firstly, there is no limit to the number of times a person can vote, and secondly, those not interested in voting will not be included in the sample. People who don't call in may have different views to the people who are calling in.
    Additionally, only those watching television or listening to the radio know that there is a survey taking place.
    As such, volunteer sampling is unlikely to produce a sample that accurately represents the population.

    Sampling error
    Sampling error is the difference between an estimate derived from a sample survey and the true value that would result if a census of the whole population was taken.
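    The link between sample size and sampling error can be illustrated with a small simulation. The sketch assumes a hypothetical population of 10,000 heights generated at random, treats the population mean as the value a census would give, and measures sampling error as the average absolute difference between a sample mean and that census value over many repeated samples. The function name `average_sampling_error` is hypothetical.

```python
import random
import statistics


def average_sampling_error(population, n, trials=500, rng=None):
    """Average absolute difference between a sample mean and the census value.

    The census value is the mean of the whole population. Each trial draws a
    simple random sample of size n without replacement.
    """
    if rng is None:
        rng = random.Random()
    census_mean = statistics.fmean(population)
    errors = [
        abs(statistics.fmean(rng.sample(population, n)) - census_mean)
        for _ in range(trials)
    ]
    return statistics.fmean(errors)


# Hypothetical population: 10,000 heights in centimetres.
rng = random.Random(2008)
population = [rng.gauss(165, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(n, round(average_sampling_error(population, n, rng=rng), 2))
```

    The exact figures printed depend on the random seed, but the average error shrinks as the sample size grows. Setting the sample size equal to the population size, which amounts to a census, gives a sampling error of zero.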

    Non-sampling error
    Non-sampling errors are errors that are not caused by the sampling method. They can be made by participants and interviewers when the questionnaire is being filled in, or they can happen when the questionnaire is being processed.

    Frequency and distribution

    Frequency
    The frequency (f) of a particular observation is the number of times the observation occurs in the data.

    Cumulative frequency
    Cumulative frequency is the total of a frequency and all frequencies below it in a frequency distribution. It is the running total of frequencies.

    Relative frequency
    Relative frequency is another term for proportion. It is the number of times a particular observation occurs divided by the total number of observations.
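    Frequency, cumulative frequency and relative frequency can be computed together. This sketch uses the household car data from the frequency distribution table example in the Graphs and displays section; `Counter` and `accumulate` come from Python's standard library, and the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Number of cars registered to each of twenty households.
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

counts = Counter(cars)
values = sorted(counts)                    # distinct values: 0, 1, 2, 3, 4
freq = [counts[x] for x in values]         # frequency f
cum_freq = list(accumulate(freq))          # running total of frequencies
rel_freq = [f / len(cars) for f in freq]   # f divided by total observations

print("x  f  cf  rf")
for x, f, cf, rf in zip(values, freq, cum_freq, rel_freq):
    print(f"{x}  {f}  {cf:>2}  {rf:.2f}")
```

    It prints frequencies 4, 6, 5, 3 and 2, cumulative frequencies 4, 10, 15, 18 and 20, and relative frequencies 0.20, 0.30, 0.25, 0.15 and 0.10. The last cumulative frequency equals the total number of observations, and the relative frequencies add to 1.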

    Distribution
    The distribution of a variable is the pattern of values of the observations.

    Graphs and displays

    Graph
    A graph is a diagram that represents the connections or relationships among two or more variables using dots, lines, bars or other distinctive marks.

    Chart
    A chart is a visual representation of data. Bar charts, line charts and pie charts are common examples.

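    Box and whisker plot
    Box and whisker plots (often called ‘box plots’) show the minimum, lower quartile, median, upper quartile and maximum of a set of data. The box spans the interquartile range, from the lower quartile to the upper quartile, and in the simplest form the whiskers extend to the minimum and maximum values. Figure 1 shows a box and whisker plot of student ages.
    Notice that a scale is drawn underneath. Box plots can be drawn horizontally or vertically.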
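    The five values a box and whisker plot displays can be computed as below. The student ages behind Figure 1 are not listed, so this sketch uses the book counts from the stem and leaf example in this section instead. It assumes one common convention for quartiles, the medians of the lower and upper halves of the ordered data; other conventions, including those used by some calculators and spreadsheets, can give slightly different quartiles. The name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Quartiles are taken as the medians of the lower and upper halves of the
    ordered data; when the number of observations is odd, the middle value is
    left out of both halves. Other quartile conventions exist.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


# Books read in one year by ten students (from the stem and leaf example).
books = [12, 23, 19, 6, 10, 7, 15, 25, 21, 12]
low, q1, med, q3, high = five_number_summary(books)
print(low, q1, med, q3, high, "IQR =", q3 - q1)
```

    For the book data it gives a minimum of 6, lower quartile 10, median 13.5, upper quartile 21 and maximum 25, so the box would span an interquartile range of 11.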

    Frequency distribution table
    Frequency distribution tables can be used for nominal and numerical variables.

    Twenty people were asked how many cars were registered to their households. The results were recorded as follows: 1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0. This data can be presented in a frequency distribution table – see Figure 2.

    Stem and leaf plot
    Stem and leaf plots are a convenient way to organise data. Each observation is split into two parts, a stem and a leaf:

    • the stem is the first digit or digits
    • the leaf is the final digit

    The numbers of books read by ten students in one year were as follows: 12, 23, 19, 6, 10, 7, 15, 25, 21, 12.
    In ascending order, these are: 6, 7, 10, 12, 12, 15, 19, 21, 23, 25. Figure 3 is a stem and leaf plot of this data.

    In the stem and leaf plot (Figure 3):

    • the stem '0' represents the class interval 0-9
    • the stem '1' represents the class interval 10-19
    • the stem '2' represents the class interval 20-29.

    If there is a large number of observations for each stem, each stem can be split in two. For example, the interval 0-9 could be split into the intervals 0-4 and 5-9. The stems would then be written as 0(0) and 0(5).
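    The construction can be sketched for non-negative whole numbers, taking the final digit as the leaf and the remaining digits as the stem. The optional `split` argument is a hypothetical addition that divides each stem into leaves 0-4 and 5-9, labelled in the 0(0) and 0(5) style described above. Empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(data, split=False):
    """Return (stem label, leaves) rows for non-negative whole numbers.

    The leaf is the final digit and the stem is the digit(s) in front of it,
    so 6 has stem 0 and leaf 6. With split=True each stem is divided into
    leaves 0-4, labelled like 1(0), and leaves 5-9, labelled like 1(5).
    """
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        half = 5 if split and leaf >= 5 else 0
        plot[(stem, half)].append(leaf)
    first, last = min(plot)[0], max(plot)[0]
    halves = (0, 5) if split else (0,)
    rows = []
    # Empty stems are kept so that gaps in the data remain visible.
    for stem in range(first, last + 1):
        for half in halves:
            label = f"{stem}({half})" if split else str(stem)
            rows.append((label, plot.get((stem, half), [])))
    return rows


books = [12, 23, 19, 6, 10, 7, 15, 25, 21, 12]
for label, leaves in stem_and_leaf(books):
    print(f"{label} | {' '.join(map(str, leaves))}")
```

    For the book data it prints the rows 0 | 6 7, 1 | 0 2 2 5 9 and 2 | 1 3 5.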

    Time series
    A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. For example, measuring the value of retail sales each month of the year would comprise a time series.

    Trend
    The ABS defines a trend as the long-term movement in a time series without calendar-related and irregular effects. It reflects the underlying change in the measure and is the result of influences such as population growth, price inflation and general economic changes.
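    A trend can be illustrated with a simple centred moving average. This is a simplified teaching technique swapped in for illustration, not the ABS's own trend method. The sketch assumes monthly data and a '2 x 12' centred average, which averages 13 consecutive months with half weight on the two end months, so that a seasonal pattern repeating every 12 months cancels out. The monthly values are hypothetical: a steady rise of 2 per month plus a fixed seasonal pattern.

```python
# Hypothetical seasonal pattern that repeats every 12 months and sums to zero.
SEASONAL = [5, -3, 2, 0, -4, 1, 3, -2, 0, -1, 4, -5]

# Three years of hypothetical monthly values: an underlying rise of 2 per
# month plus the seasonal pattern.
series = [100 + 2 * t + SEASONAL[t % 12] for t in range(36)]


def centred_moving_average(values, period=12):
    """'2 x period' centred moving average for an even period.

    Returns a list the same length as values, with None at the start and end
    where a full window is not available.
    """
    if period % 2:
        raise ValueError("this sketch only handles an even period")
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half : t + half + 1]  # period + 1 values
        # End values get half weight, so the weights add up to period.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend


for month, (value, smooth) in enumerate(zip(series, centred_moving_average(series))):
    if smooth is not None:
        print(month, value, round(smooth, 1))
```

    Because the seasonal pattern sums to zero over any 12 consecutive months, the sketch recovers the underlying values 100 + 2t exactly for months 6 to 29; the first and last six months have no trend value.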

    Fig 1 Box and whisker plot

    Number of cars (x)    Frequency (f)
    0                     4
    1                     6
    2                     5
    3                     3
    4                     2
    Total                 20

    Fig 2 Frequency distribution table

    Stem | Leaf
    0 | 6 7
    1 | 0 2 2 5 9
    2 | 1 3 5

    Key: 1 | 2 means 12 books

    Fig 3 Stem and leaf plot

    Summary statistics

    © Commonwealth of Australia 2008

    Unless otherwise noted, content on this website is licensed under a Creative Commons Attribution 2.5 Australia Licence together with any terms, conditions and exclusions as set out in the website Copyright notice. For permission to do anything beyond the scope of this licence and copyright terms contact us.