'The ABC of the ABS' - A Dictionary of Statistical Terms
Attribute
A characteristic of a person, object or concept (e.g. age, sex).
Baby boomers
There is no universal definition, but the widest interpretation of the concept is people born between 1946 and 1966. For further information see Facts on Stats: ‘What generation do you belong to?’
Bias
The systematic deviation of statistics from a true value in one direction. More common in survey collections, bias can arise from the selection of non-representative survey populations to represent the whole population, or from processes in the collection which systematically favour particular answers or findings.
Census
A census aims to collect information from all units in a population. The ABS conducts various censuses including the Census of Population and Housing which aims to accurately measure the number of persons in Australia on Census night, their key characteristics, and the dwellings in which they live; the Agricultural Census which collects information about every agricultural property in Australia; and the Retail Census which collects information about every shop in Australia.
Chart
A chart is a visual representation of data. Common examples present information as bars, lines, pie pieces or dots.
Class interval
A class interval is a group of data values for a variable. The intervals are generally the same size (e.g. 0-4, 5-9, 10-14, etc.); however, the intervals may have different sizes (e.g. 0-4, 5-14, 15-19, etc.). The boundaries of class intervals must be non-overlapping so that each observation can be allocated to only one interval.
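As a rough illustration of allocating each observation to exactly one interval, the sketch below assumes whole-number values and equal-width intervals. The function name class_interval and the sample ages are hypothetical and used only for this example.

```python
def class_interval(value, width=5):
    """Return the (lower, upper) bounds of the equal-width interval
    that contains a whole-number value, e.g. 0-4, 5-9, 10-14."""
    lower = (value // width) * width
    return lower, lower + width - 1


# Illustrative ages only; each one falls into exactly one interval.
ages = [3, 4, 5, 12, 14]
intervals = [class_interval(age) for age in ages]
```

Because the intervals do not overlap, a boundary value such as 5 is placed in 5-9 and never in 0-4.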
Cohort
A group whose members share a common experience or characteristic. For example, people born in the same year (birth cohort), people migrating in the same year (migration cohort), or Grade 1 students in a school (grade cohort).
Cohort analysis
Utilises data on the history of groups, examining their characteristics and behaviours over time. These studies are useful in analysing changes within the life cycle of individuals.
Collection district (CD)
Collection districts are the smallest geographic unit and the smallest standard area for which most detailed census data are available; all statistical geography is based on them. There is an average of about 225 dwellings in each CD. In rural areas, the number of dwellings per CD generally declines as population densities decrease. The design of the CDs is reviewed for each Census to allow for change and growth.
Consumer Price Index (CPI)
The Consumer Price Index (CPI) is an index which measures changes in the price of a fixed basket of goods and services acquired by household consumers. Both the items in the basket and their quantities are held constant. It relates only to consumer goods and services priced in the eight capital cities. The percentage change in the CPI is often referred to as the inflation rate. The CPI is an indicator of price movements in consumer goods and services, not of price levels or dollar amounts. It can be expressed as an index number or as a percentage change.
Prices from 11 major groups feed into the All groups CPI (also called the headline CPI):
· food
· alcohol and tobacco
· clothing and footwear
· housing
· household contents and services
· health
· transportation
· communication
· recreation
· education
· financial and insurance services
Continuous variable
A continuous variable is a numeric variable that can take any value within a certain range. Examples of continuous variables may be distance, age or temperature.
Data
Data are observations or facts which, when collected, organised and evaluated, become information or knowledge.
Data item
A data item is the smallest piece of information that can be obtained from a survey or census.
Data set
A data set is data collected for a particular study. A data set represents a collection of elements; and for each element, information on one or more characteristics is included.
Deciles
Deciles divide ordered data into ten equal groups.
Demography
Demography is a broad social science discipline concerned with the study of human populations. Demographers are interested in three key areas that directly affect population change: fertility, mortality and migration.
Discrete variable
A discrete variable can only take a finite number of values within a certain range (unlike a continuous variable). An example of a discrete variable would be the number of children in a family (i.e. a family can have 0, 1, 2 or 3 children, but not 2.5).
Distribution
The distribution of a variable is the pattern of values of the observations.
Employed
Persons aged 15 years and over are considered employed if, during the week prior to answering the question, they worked for one hour or more for pay, profit, commission or payment in kind in a job, business, or on a farm.
Estimated resident population (ERP)
Estimated resident population (ERP) is an estimate of the Australian population obtained by adding the components of natural increase (on a usual residence basis) and net overseas migration to the estimated population at the beginning of each period.
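The arithmetic behind this definition can be sketched as follows. The function name and all figures are hypothetical, and the sketch only adds the two components named above to an opening population; it is not a description of the ABS's full estimation method.

```python
def estimated_resident_population(start_population, births, deaths,
                                  overseas_arrivals, overseas_departures):
    """Opening population plus natural increase plus net overseas migration."""
    natural_increase = births - deaths
    net_overseas_migration = overseas_arrivals - overseas_departures
    return start_population + natural_increase + net_overseas_migration


# Made-up figures for a single period.
end_population = estimated_resident_population(
    start_population=1000,
    births=30,
    deaths=10,
    overseas_arrivals=25,
    overseas_departures=5,
)
```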
First generation Australian
Those who were born overseas and migrated to Australia.
Frequency
The number of observations in a given statistical category.
Graph
A graph is a visual representation of data. Common examples present information as bars, lines, pie pieces or dots.
Index
An index is a number used to show the variation in some quantity over time. It is usual to fix the first observation (sometimes called a benchmark) to a base value of 100 and to express all following observations relative to this base, so that relative changes over time can be compared. It is a type of time series data.
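A minimal sketch of rebasing a series so that its first observation equals 100 is shown below. The function name to_index, the rounding to one decimal place and the price values are assumptions made for illustration.

```python
def to_index(series, base=100.0):
    """Express each observation relative to the first one, which is set to `base`."""
    first = series[0]
    return [round(value / first * base, 1) for value in series]


# Hypothetical prices observed in three consecutive periods.
prices = [50.0, 55.0, 60.0]
index = to_index(prices)
```

An index value of 110.0 in the second period means the quantity is 10% higher than in the base period, whatever the original units were.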
Indigenous population
People of Aboriginal or Torres Strait Islander descent who identify as an Aboriginal or Torres Strait Islander and are accepted as such by the community in which they live. Data referring to the size of the Indigenous population are experimental estimates in that the standard approach to population estimation is not possible because satisfactory data on births, deaths and migration are not generally available. Furthermore, there is significant intercensal volatility in census counts of the Indigenous population, due in part to changes in the propensity of persons to identify as being of Aboriginal or Torres Strait Islander origin.
Information
Information is data that has been organised to serve a useful purpose.
Labour force
The labour force includes both employed and unemployed people aged 15 years and over.
Labour force status
Labour force status identifies whether a person aged 15 years or over is employed, unemployed or not in the labour force.
Mean
The mean is the sum of all the observation values divided by the number of observations; also known as the arithmetic average.
Median
The median is the middle value of a set of data (or the average of the middle two in an even-numbered set), after the data have been placed in ascending order. There are as many observations above the median as there are below.
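The mean and median can be computed with Python's standard statistics module, as in the short sketch below. The four observations are made up; an even-numbered set is used so that the median is the average of the middle two values.

```python
from statistics import mean, median

# Illustrative observations (an even-numbered set).
observations = [7, 3, 10, 4]

average = mean(observations)    # sum of the values divided by their count
middle = median(observations)   # sorted: 3, 4, 7, 10 -> average of 4 and 7
```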
Median age
The age at which half the population is older and half is younger.
Mesh block
Mesh Blocks are a new micro-level geographical unit for classifying data and will be the basis for statistics. There are 314,369 spatial Mesh Blocks covering Australia with most residential Mesh Blocks containing approximately 30 to 60 dwellings. Mesh Blocks have been designed to be small enough to aggregate accurately to a wide range of spatial units and thus enable a ready comparison of statistics between geographical areas, and large enough to protect against accidental disclosure of confidential information. Mesh Blocks are intended to become a new building block of statistical and administrative geography. Individual Mesh Blocks will only have very basic Census data (numbers of people and dwellings), but aggregates of Mesh Blocks can contain a very rich source of statistical information.
Metadata
Metadata is the information that defines and clarifies the numbers. Within a statistical table, the row and column descriptions, the reference period for the data and the geographic area described (usually found in the table heading), and the footnotes associated with the table constitute metadata. Information about how the data were collected is also metadata, and is available in explanatory notes, classification manuals, information papers and descriptions of concepts, frameworks, sources and methods.
Mode
In a set of data, the mode is the most frequently observed value.
Natural increase
The excess of births over deaths during the year. Should deaths exceed births, then the equivalent concept is natural decrease.
Net
The term 'net' refers to the difference between two figures; that which remains after necessary deductions have been made.
Net interstate migration
Interstate arrivals minus interstate departures during the year.
Net overseas migration
Permanent and long-term arrivals minus permanent and long-term departures during the year.
Non-sampling error
Non-sampling errors are errors in producing statistical information that are not caused by the sampling methodology. For example, errors can be introduced by the respondent, questionnaire, interviewer or processing.
Non-school qualification
Non-school qualifications are awarded for educational attainments other than those of pre-primary, primary or secondary education. They include qualifications at the Postgraduate Degree level, Master Degree level, Graduate Diploma and Graduate Certificate level, Bachelor Degree level, Advanced Diploma and Diploma level, and Certificates I, II, III and IV levels. Non-school qualifications may be attained concurrently with school qualifications.
Observation
An observation is a single piece of data about a variable.
Outlier
An outlier is an extreme or atypical data value in a sample. It is an observation that is significantly different from the rest of the data. There may be more than one outlier in a set of data.
Out of scope
Not in the population of interest or excluded from the statistical collection.
Participation rate
The participation rate is the proportion of the population aged 15 years and over that are in the labour force. For example, the participation rate for females is derived by adding the number of females employed to the number of females unemployed and dividing this number (the female labour force) by the total number of females in the population aged 15 years and over and expressing this as a percentage.
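The calculation described above can be sketched as a small function. The function name and the counts are hypothetical.

```python
def participation_rate(employed, unemployed, population_15_and_over):
    """Labour force (employed plus unemployed) as a percentage of the
    population aged 15 years and over."""
    labour_force = employed + unemployed
    return 100 * labour_force / population_15_and_over


# Made-up counts for a group of 1,000 people aged 15 years and over.
rate = participation_rate(employed=570, unemployed=30,
                          population_15_and_over=1000)
```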
Percentage
Percentage is the term used to express a number as a fraction of one hundred.
Percentage change
To calculate percentage change: new value minus old value, divided by old value and multiplied by 100.
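Written as code, the rule reads as follows. The function name and the example values are illustrative only.

```python
def percentage_change(old_value, new_value):
    """(new value - old value) / old value * 100."""
    return (new_value - old_value) / old_value * 100


increase = percentage_change(80, 100)   # a rise from 80 to 100
decrease = percentage_change(200, 150)  # a fall from 200 to 150
```

A negative result indicates a fall. The calculation is undefined when the old value is zero.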
Persons not in the labour force
Many people are neither employed nor unemployed, according to ABS categories. Examples of people in this category are retirees, those who choose not to work, and those who are unable to work. These groups form an important part of the labour force framework and contain people who are known collectively as persons not in the labour force.
Place of enumeration
The place of enumeration is the place at which the person is counted on Census Night (i.e. where he/she spent Census Night), which may not be where he/she usually lives.
Place of usual residence
This is the place where a person usually lives. It may or may not be the place where the person was counted on Census Night. Each person is required to state his/her address of usual residence in Question 8. The count of persons at their usual residence is known as the de jure population count.
Population
This is a statistical term that can apply to people or things. It refers to the total number of units that you are interested in. It can apply to people, dwellings, businesses, agricultural properties, vehicles, etc. It can also apply to specific groups within those broader categories, such as the student population, the working population, vineyards, passenger vehicles, etc. It is important to accurately define what the population of interest is.
Population projections
Population projections are not predictions or forecasts. They are an assessment of what would happen, in future years, to Australia's population given a set of assumptions about future trends in fertility, mortality and migration.
Quartiles
Quartiles divide ordered data into four equal groups.
Quintiles
Quintiles divide ordered data into five equal groups.
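Quartiles, quintiles and deciles all follow the same pattern: order the data, then split it into equal groups. The sketch below assumes the number of observations is an exact multiple of the number of groups; the function name equal_groups and the data are hypothetical.

```python
def equal_groups(observations, n_groups):
    """Sort the observations and split them into n_groups groups of equal size."""
    ordered = sorted(observations)
    if len(ordered) % n_groups != 0:
        raise ValueError("number of observations must divide evenly into the groups")
    size = len(ordered) // n_groups
    return [ordered[i * size:(i + 1) * size] for i in range(n_groups)]


# Twenty illustrative values, deliberately supplied out of order.
data = list(range(20, 0, -1))

quartiles = equal_groups(data, 4)   # four groups of five values
quintiles = equal_groups(data, 5)   # five groups of four values
deciles = equal_groups(data, 10)    # ten groups of two values
```

Statistical packages usually report the cut points between groups rather than the groups themselves, and use interpolation when the data do not divide evenly.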
Random error
The process of making small random adjustments to data (often called introduced random error) to allow the maximum amount of detail possible to be released without breaching confidentiality. Consequently, care should be taken when interpreting cells with small numbers.
Random sample
In a random sample, all units in the target population have an equal chance of selection.
Range
The range is the actual spread of data, including any outliers. It is the difference between the highest and lowest observation.
Ratio
A ratio compares two numbers by expressing the size of one relative to the other. For example, there might be 23 students in a class. If there are 10 girls then the ratio of girls to boys is 10 to 13, or 10:13. Expressed as a fraction the ratio is 10/13. The ratio of boys to girls is 13:10.
Relative frequency
Relative frequency is another term for proportion. It is the number of times a particular observation occurs divided by the total number of observations.
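Frequency and relative frequency can be tallied with a counter, as in this minimal sketch. The observations (modes of travel) are made up for illustration.

```python
from collections import Counter

# Illustrative observations of a categorical variable.
observations = ["car", "bus", "car", "train", "car", "bus", "car", "walk"]

# Frequency: the number of observations in each category.
frequency = Counter(observations)

# Relative frequency: each count divided by the total number of observations.
total = len(observations)
relative_frequency = {category: count / total
                      for category, count in frequency.items()}
```

The relative frequencies always sum to 1 (or to 100 when expressed as percentages).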
Relative standard error (RSE)
The relative standard error is the standard error expressed as a percentage of the estimate. Hence, the RSE is 'scaled' to the estimate. This enables the user to compare the quality of estimates of different size.
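The example below sketches why scaling matters: the same standard error is small relative to a large estimate and large relative to a small one. The function name and the figures are hypothetical.

```python
def relative_standard_error(standard_error, estimate):
    """Standard error expressed as a percentage of the estimate."""
    return 100 * standard_error / estimate


# The same standard error (50) attached to two estimates of different size.
rse_large_estimate = relative_standard_error(50, 1000)
rse_small_estimate = relative_standard_error(50, 200)
```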
Sample
A part of a population selected for the purpose of studying certain characteristics of an entire population of interest.
Sample size
The sample size is the number of units (e.g. persons, households, businesses, schools) being surveyed. In general, the larger the sample size, the smaller the sampling error.
Sampling error
The sampling error is the difference between an estimate derived from a sample survey and the true value that would result if a census of the whole population were taken.
Scope
The population of interest or the target population.
Second generation Australian
Those born in Australia with at least one overseas-born parent.
Seasonally adjusted series
These remove known seasonal and calendar-related influences. Examples are the effects of Easter and Christmas on employment and retail sales. However, these seasonally adjusted series can still show erratic movements, due to irregular influences such as strikes. These erratic movements may be 'smoothed' by averaging figures over a period of months; the resultant series is known as a trend series.
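The idea of smoothing by averaging over a period of months can be sketched with a simple centred moving average. This is an illustration only: the equal weights, the three-month window and the figures are assumptions, not the method used to produce official trend series.

```python
def centred_moving_average(series, window=3):
    """Average each value with its neighbours, using an odd-length window.
    The first and last window // 2 points have no complete window and are dropped."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]


# Hypothetical seasonally adjusted monthly figures with erratic movements.
seasonally_adjusted = [10, 13, 10, 16, 13, 19]
smoothed = centred_moving_average(seasonally_adjusted)
```

The smoothed values rise steadily even though the original figures move up and down from month to month.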
Sex ratio
The number of males per 100 females.
Standard deviation
Standard deviation is the measure of spread most commonly used in statistical practice when the mean is the measure of centre. Standard deviation is most useful for symmetric distributions with no outliers.
Standard error (SE)
The standard error is a measure of the variability of an estimate that arises because the estimate is based on a sample rather than the whole population. It is expressed as a number, relates to a specified data item, and gives an indication of how far the sample estimate is likely to differ from the value that would have been obtained if all units had been included.
Statistics
Statistics are numerical data that have been organised to serve a useful purpose.
Statistical literacy
The ability to critically evaluate data.
Survey Estimate
An inference for the target population, using information obtained from a sample of the population. These estimates are subject to both sampling and non-sampling error.
Time series
A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. For example, measuring the value of retail sales each month of the year would comprise a time series.
Trend
The ABS defines a trend as the 'long term' movement in a time series without calendar-related and irregular effects; it is a reflection of the underlying change in that measure. It is the result of influences such as population growth, price inflation and general economic changes. Trend estimates 'smooth out' erratic movements in the seasonally adjusted series. The trend series reflects the general drift or underlying path of the data.
Unemployment
Unemployed persons are persons aged 15 and over who were not employed during the week of the Labour Force Survey, and:
(a) had actively looked for full-time or part-time work at any time in the four weeks up to the end of the reference week and were either:
(i) available for work in the reference week, or would have been available except for temporary illness (i.e. lasting for less than 4 weeks to the end of the reference week);
(ii) waiting to start a new job within 4 weeks from the end of the reference week and would have started in the reference week if the job had been available then; or
(b) were waiting to be called back to a full-time or part-time job from which they had been stood down without pay for less than four weeks up to the end of the reference week (including the whole of the reference week for reasons other than bad weather or plant breakdown).
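The criteria above can be read as a decision rule. The sketch below is a simplified illustration for a person aged 15 years or over; the function and parameter names are hypothetical, and each parameter stands in for the full wording of the corresponding condition.

```python
def labour_force_status(worked_one_hour_or_more,
                        looked_for_work_in_last_4_weeks=False,
                        available_in_reference_week=False,
                        waiting_to_start_new_job=False,
                        stood_down_awaiting_recall=False):
    """Classify a person aged 15+ as employed, unemployed or not in the labour force."""
    if worked_one_hour_or_more:
        return "employed"
    # Condition (a): actively looked for work and met either (i) or (ii).
    condition_a = looked_for_work_in_last_4_weeks and (
        available_in_reference_week or waiting_to_start_new_job)
    # Condition (b): stood down without pay and waiting to be called back.
    condition_b = stood_down_awaiting_recall
    if condition_a or condition_b:
        return "unemployed"
    return "not in the labour force"


job_seeker = labour_force_status(False, looked_for_work_in_last_4_weeks=True,
                                 available_in_reference_week=True)
retiree = labour_force_status(False)
```

Note that a person who is not working and not looking for work is classified as not in the labour force, not as unemployed.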
Unit
A unit is a single component of a population. It is an entity about which information is being collected.
Variable
A characteristic that may assume more than one of a set of values (e.g. income, age, eye colour, weight).
Variance
Variance measures the spread of data around the mean. It is calculated by squaring the difference between the mean and each observation and then averaging the results. The standard deviation is simply the square root of the variance.
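The steps in this definition can be sketched directly. The function names and the data are illustrative. The sketch divides by the number of observations, as the definition describes; when working with a sample, the sum of squared differences is often divided by one less than the number of observations.

```python
import math


def variance(observations):
    """Average of the squared differences between each observation and the mean."""
    mean = sum(observations) / len(observations)
    squared_differences = [(value - mean) ** 2 for value in observations]
    return sum(squared_differences) / len(squared_differences)


def standard_deviation(observations):
    """Square root of the variance."""
    return math.sqrt(variance(observations))


# Illustrative data with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
spread_squared = variance(data)
spread = standard_deviation(data)
```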
Weighted estimate
Estimation allows us to make inferences about the population as a whole. In order to do this, we need to weight the data. Weighting is the process whereby each unit in the sample has its response inflated to represent the response from all similar units in the population. The weight of a unit reflects the proportion of the population that the sampled unit represents. The weight allocated to each sample observation depends on the process used to select the sample.
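A minimal sketch of weighting is given below. It assumes the simplest case, in which every unit had the same chance of selection, so each sampled unit carries a weight equal to the population size divided by the sample size. The function name and all figures are hypothetical; other sample designs give different weights.

```python
def weighted_total(responses, weights):
    """Inflate each response by its weight and sum to estimate a population total."""
    return sum(response * weight for response, weight in zip(responses, weights))


# Hypothetical survey: number of children reported by four sampled households
# drawn with equal probability from a population of 1,000 households.
population_size = 1000
sample_responses = [2, 0, 3, 1]

# Each sampled household represents 250 households in the population.
weight = population_size / len(sample_responses)
weights = [weight] * len(sample_responses)

estimated_children = weighted_total(sample_responses, weights)
```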
Other resources:
ABS
ABS Education Services Glossary
Census Dictionary, 2011 (ABS cat. no. 2901.0)
Population data sources and definitions
Statistical Language
NON-ABS
RobertNiles.com 'Statistics that every writer should know'
The OECD Glossary of Statistical Terms