2062.0 - Census Data Enhancement project: An update, 2011  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 28/11/2014  Ceased



ABS Australian Bureau of Statistics
ACLD Australian Census Longitudinal Dataset
ACMID Australian Census and Migrants Integrated Dataset
AEDI Australian Early Development Index
CDE Census Data Enhancement
COAG Council of Australian Governments
NAPLAN National Assessment Program - Literacy and Numeracy
SDB Settlement Database
SLCD Statistical Longitudinal Census Dataset
SSRI Social Security and Related Information dataset
VET Vocational Education and Training


Australian Census Longitudinal Dataset

The Australian Census Longitudinal Dataset was created by linking a 5% random sample of individuals from the 2006 Census to records from the 2011 Census. During its development and prior to release, it was referred to as the Statistical Longitudinal Census Dataset.

Administrative dataset

Information (including personal information) collected by agencies for the administration of programs, policies or services (e.g. Medicare data, taxation data). Administrative data is one type of unit record level data.

Australian Early Development Index

The Australian Early Development Index measures how young Australian children are developing. The data is collected every three years and provides a population measure of children's development at the time they start primary school. The main aim of the AEDI is to provide data to help communities in the development and reorientation of services and systems to improve the health and wellbeing of young children.

Australian Government's Settlement Database

The Australian Government's Settlement Database contains statistical data from the administration of immigration programs. This includes overseas arrivals and departures data, where the duration is over 12 months, and visa data, including type of visa.

Census Dress Rehearsal

The Census Dress Rehearsal is generally conducted in the year prior to the Census of Population and Housing. It is used as a run-through of operational processes and data collection methods for the main Census, and is tested on a sample of dwellings across Australia.

Census processing period

The period of time immediately after the conduct of the Census of Population and Housing during which the Census forms are processed to produce statistical outputs. The Census processing period has generally lasted 12 months.

Confidentialisation procedures

Confidentialising data involves altering a dataset to ensure that individual records cannot be identified. Some examples of confidentialisation are:

  • perturbation - changing the data slightly to reduce/remove the risk of disclosure, without significantly affecting aggregate results;
  • combining and collapsing categories - combining several response categories into one, or reducing the amount of classificatory detail available in microdata; and
  • suppression - not releasing information for unsafe cells, or deleting individual records or data items from the file.
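The three techniques listed above can be illustrated with a minimal Python sketch. It is not the procedure the ABS applies; the age bands, the suppression threshold of 3, the size of the random shift and all function names are assumptions chosen only for illustration.

```python
import random
from collections import Counter

def collapse_age(age):
    """Collapse single years of age into broad bands (hypothetical bands)."""
    if age < 15:
        return "0-14"
    if age < 65:
        return "15-64"
    return "65+"

def suppress_small_cells(counts, threshold=3):
    """Replace counts below the threshold with None (cell not released)."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

def perturb(counts, rng, max_shift=1):
    """Shift each count by a small random amount, never below zero."""
    return {cell: max(0, n + rng.randint(-max_shift, max_shift))
            for cell, n in counts.items()}

# Hypothetical unit records: one age per person.
ages = [3, 9, 22, 37, 41, 58, 63, 70, 88]

table = Counter(collapse_age(a) for a in ages)   # collapsing categories
released = suppress_small_cells(table)           # suppression
noisy = perturb(table, random.Random(0))         # perturbation
```

In this sketch the collapsed table has one cell of 5 and two cells of 2, so suppression withholds the two small cells, while perturbation releases every cell but moves each count by at most 1.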

Data

Data are measurements or observations that are collected as a source of information.

Data integration

Data integration involves bringing together multiple data sources, generally at the unit record level (i.e. for a person or organisation) or micro level (e.g. information for a small geographic area), to provide new datasets for statistical or research purposes. Data integration refers to the full range of management and governance practices around the process, including project approval, data transfer, linking and merging the data, and dissemination.

Data item

Any characteristic, number, or quantity that can be measured or counted.

Data linking

Data linking (also referred to as data linkage or record linkage) is one aspect of the data integration process. Data linking creates links between data from two or more sources based on common features present in those sources.
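As an illustration of linking on common features, the following Python sketch performs a simple exact-match link between two small sources. It is a simplified example, not the method used in any Census Data Enhancement project; the field names (`dob`, `sex`, `area`), the record identifiers and the rule of accepting only unambiguous matches are all assumptions.

```python
def link_key(record):
    """Build a key from features present in both sources (hypothetical fields)."""
    return (record["dob"], record["sex"], record["area"])

def link_records(source_a, source_b):
    """Return (id_a, id_b) pairs where the key matches exactly one record in source_b."""
    index = {}
    for rec in source_b:
        index.setdefault(link_key(rec), []).append(rec)
    links = []
    for rec in source_a:
        candidates = index.get(link_key(rec), [])
        if len(candidates) == 1:  # skip missing and ambiguous matches
            links.append((rec["id"], candidates[0]["id"]))
    return links

# Hypothetical records from two sources.
source_a = [
    {"id": "A1", "dob": "1980-02-01", "sex": "F", "area": "2600"},
    {"id": "A2", "dob": "1975-07-19", "sex": "M", "area": "2601"},
    {"id": "A3", "dob": "1990-01-01", "sex": "F", "area": "2602"},
]
source_b = [
    {"id": "B7", "dob": "1980-02-01", "sex": "F", "area": "2600"},
    {"id": "B8", "dob": "1975-07-19", "sex": "M", "area": "2601"},
    {"id": "B9", "dob": "1975-07-19", "sex": "M", "area": "2601"},
]

links = link_records(source_a, source_b)
```

Here only A1 is linked: A2 matches two candidate records and A3 matches none, which shows why the choice of linking variables affects how many records can be linked.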


Dataset

A file containing the individual responses from a statistical collection, administrative records or register of information (for example, a disease register). Datasets are used to generate statistical output. A dataset that has been formed through data linking is called a linked (or integrated) dataset.

De-identified data

De-identifying data involves two key steps:

1. Removing any direct identifiers (e.g. name, address, Australian Business Number) from the data; and
2. Removing or altering any other information that may allow an individual to be identified, such as a rare characteristic of an individual, or a combination of unique or remarkable characteristics that enable identification.
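The two steps above can be sketched in Python as follows. This is an illustrative example under stated assumptions rather than an ABS procedure: the list of direct identifiers, the field names, the minimum count of 2 and the "Other" placeholder are hypothetical.

```python
from collections import Counter

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "abn"}

def remove_direct_identifiers(record):
    """Step 1: drop fields that directly identify a person or organisation."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

def generalise_rare_values(records, field, min_count=2, placeholder="Other"):
    """Step 2: replace values held by fewer than min_count records."""
    counts = Counter(r[field] for r in records)
    return [
        {**r, field: r[field] if counts[r[field]] >= min_count else placeholder}
        for r in records
    ]

# Hypothetical unit records.
records = [
    {"name": "A. Citizen", "address": "1 Example St", "occupation": "Teacher", "area": "2600"},
    {"name": "B. Resident", "address": "2 Example St", "occupation": "Teacher", "area": "2600"},
    {"name": "C. Person", "address": "3 Example St", "occupation": "Astronaut", "area": "2600"},
]

deidentified = generalise_rare_values(
    [remove_direct_identifiers(r) for r in records], "occupation"
)
```

After both steps, no record carries a name or address, and the single rare occupation has been replaced with the placeholder so it no longer singles out one individual.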

Death register data

The registration of deaths is the responsibility of the individual State and Territory Registrars of Births, Deaths and Marriages and is based on the data provided on an information form. This information form is the basis of the data provided to the ABS for processing and production of death statistics.


Dissemination

Dissemination is the process of outputting data. Dissemination at the ABS can be achieved through publications, unit record files, and static tables, as well as through dissemination products such as TableBuilder and DataAnalyser.

Linkage methodology

The method used to link datasets.

Linkage quality

The quality of a linked dataset is evaluated through various measures. These include estimating the number of records that were linked correctly, examining the properties of the records not able to be linked, and assessing the under- or over-representation of population groups.
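Two of these measures can be sketched in Python: the share of records linked, and a simple representation ratio for a population group. The function names, the `age_group` field and the data are hypothetical, and the sketch does not estimate how many links are correct, which would require additional information about true matches.

```python
def link_rate(linked, sample):
    """Share of sample records for which a link was found."""
    return len(linked) / len(sample)

def representation_ratio(linked, sample, field, value):
    """Group's share among linked records divided by its share in the sample.

    Values below 1 suggest under-representation; values above 1 suggest
    over-representation.
    """
    share_linked = sum(r[field] == value for r in linked) / len(linked)
    share_sample = sum(r[field] == value for r in sample) / len(sample)
    return share_linked / share_sample

# Hypothetical sample records and the subset that was linked.
sample = [
    {"id": 1, "age_group": "15-24"},
    {"id": 2, "age_group": "15-24"},
    {"id": 3, "age_group": "25-64"},
    {"id": 4, "age_group": "25-64"},
    {"id": 5, "age_group": "25-64"},
]
linked = [r for r in sample if r["id"] in {1, 3, 4, 5}]

rate = link_rate(linked, sample)
young_ratio = representation_ratio(linked, sample, "age_group", "15-24")
```

In this example four of the five records are linked, and the younger group makes up a smaller share of the linked records than of the sample, so its ratio falls below 1.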

Linkage strategy

A linkage strategy is the approach taken for a particular data linking project. It takes into account the method of linking to be used, the level of accuracy required, the suitability and availability of linking variables, and the resources available.

Longitudinal dataset

A dataset which contains information for the same unit over a number of different points in time.

Medicare Benefits Schedule

Medicare Australia collects Medicare Benefits Schedule claims data under the Medicare Act 1973. This information includes type of medical service, cost and provider type for services received from GPs.

National Assessment Program - Literacy and Numeracy

NAPLAN is an annual national test held in literacy and numeracy for students in Years 3, 5, 7 and 9. All students across Australia undertake the same year level tests in the four domains: reading, writing, language conventions (spelling, grammar and punctuation), and numeracy.

Official statistics

Official statistics are defined as those statistics produced by government departments and agencies including statistics collected by surveys or from administrative systems.

Pharmaceutical Benefits Scheme data

The Department of Health collects information on purchases of medicines subsidised by the Australian Government through the Pharmaceutical Benefits Scheme. This information includes type of medication, cost and prescriber type.


Privacy

In the context of data integration, privacy refers to the protection of an individual’s personal information as defined by the Privacy Act 1988, an Australian law which regulates how personal information is collected, used, stored and disclosed.

Quality study

A quality study investigates the outcomes of linkages using a common set of input data sources but different linkage methods. Quality studies help to determine the feasibility of different linkage methods and identify areas for improvement.

Random sample

A method of sampling in which every unit in the population has a predetermined probability of being selected.
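One common case is a simple random sample, in which every unit has the same probability of selection. The Python sketch below draws a 5% sample from a hypothetical list of unit identifiers; the function name, the population of 1,000 units and the seed are assumptions for illustration and do not describe how any ABS sample was selected.

```python
import random

def draw_sample(unit_ids, fraction, seed=None):
    """Draw a simple random sample without replacement.

    Every unit has the same selection probability, k / len(unit_ids),
    where k is the sample size implied by the fraction.
    """
    rng = random.Random(seed)
    k = round(len(unit_ids) * fraction)
    return rng.sample(unit_ids, k)

# Hypothetical population of 1,000 unit identifiers.
population = list(range(1000))
sample = draw_sample(population, 0.05, seed=2011)
```

Fixing the seed makes the selection reproducible, which is convenient when a sample needs to be re-created for checking.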

Statistical purposes

Functions related to the compilation, analysis and dissemination of statistics. Use for statistical purposes precludes use of a dataset for administrative or client management purposes, where there is an impact on specified individuals.

Statistical output

The result of any collection, storage, analysis and transformation of data where the individual statistical unit is of no interest in itself, and the results are presented in a form that does not reveal information about identifiable individuals.

Student enrolments data

Student enrolments data is collected by education departments in state and territory governments.


TableBuilder

TableBuilder is an ABS dissemination tool that confidentialises data on the fly, allowing users to undertake cross-tabulations.

Unit record data

Unit record data refers to data where each record represents observations for an individual or organisation. Unit record data may contain individual responses to questions on a survey questionnaire or administrative form. For example, a unit record would have one person's answers to the questions 'In what year were you born?', 'What is your address?' and 'What is your employment status?'.


Variable

Any characteristic, number, or quantity that can be measured or counted.

Vocational Education and Training in Schools data

Data on VET in Schools are collected through administrative sources in each state and territory. These authorities submit the data to the National Centre for Vocational Education Research where a national dataset is compiled. Data are inclusive of all persons aged 15-19 years who are enrolled in a VET in Schools module or unit.