Updates on the Census Data Enhancement Project
The Australian Bureau of Statistics is conducting a Census Data Enhancement project to add value to the data collected in the five-yearly Census of Population and Housing.
The central feature of the project is the creation of a Statistical Longitudinal Census Data set (SLCD), which is based on a 5% sample of the population. The aim is to link records for this sample from each population census using statistical techniques that do not involve the use of name and address. It is intended that the sample will be augmented at each census with a 5% random sample of people who have been born in, or have migrated to, Australia since the preceding census.
The whole 2006 Census data set may be used for quality studies. During the period of Census processing, names and addresses as well as other variables have been used to link Census data and other selected data sets for these quality studies. The quality studies that were proposed for the 2006 Census are of two types. The first type is to assess the feasibility and quality of linking without name and address, while the second is to help improve ABS statistical outputs.
Analytical Services Branch has undertaken the linkage work for four of these quality studies, as listed below. The linkage method used was probabilistic and was implemented using a modified version of Febrl (Christen and Churches, 2005). The linkage runs have now been completed and analysis of the linked data sets is underway. Now that Census processing has been completed, all names and addresses provided by Census respondents have been removed from the linked data sets. The data sets will not leave the ABS, nor be accessible to anyone other than those ABS officers involved in the quality studies. These linked data sets will be destroyed after use.
Quality Studies for which probabilistic linkage has been conducted by Analytical Services Branch
In the areas of assessing feasibility and linkage quality:
- Simulated SLCD formation - linking data sets for Census Dress Rehearsal 2005 and Census 2006, with the aim of assessing the feasibility of forming the SLCD without names and addresses and making defensible statements about the quality of the linked data;
- Migrant Settlements - linking data sets for Migrant Settlements since 2000 and Census 2006, with the aim of assessing the feasibility of a subsequent statistical study to investigate outcomes for immigrants admitted under different entry visas.
In the areas of improving ABS statistical outputs:
- Indigenous Mortality - linking data sets for Deaths between August 2006 and June 2007 and Census 2006, with the aim of estimating the under-coverage of reported Indigenous status on death certificates, and investigating the use of correction factors for improving estimates of Indigenous mortality;
- Investigate Possible Improvements to the 2011 Post Enumeration Survey - linking data sets for Post Enumeration Survey 2006 and Census 2006, with the aim of assessing the feasibility of replacing the current clerical matching with an automated procedure, and widening the search area for people who give vague addresses.
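As a rough illustration of what probabilistic linkage without name and address can look like, the sketch below scores a pair of records in the general Fellegi-Sunter style: each compared field adds a log-likelihood weight depending on whether it agrees, and the total is compared with two thresholds. This is not the Febrl interface or the ABS procedure. The field names, the m- and u-probabilities, the thresholds and the function names are all hypothetical values chosen only for the example.

```python
import math

# Hypothetical comparison fields (no name or address) with assumed probabilities:
#   m = P(field agrees | the two records belong to the same person)
#   u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "sex": (0.99, 0.50),
    "birth_year": (0.95, 0.02),
    "birth_month": (0.95, 1.0 / 12),
    "country_of_birth": (0.97, 0.30),
    "area_code": (0.80, 0.01),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the agreement or disagreement weight of every compared field."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)  # agreement adds evidence for a link
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Assign a pair to one of three classes using assumed thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # would normally go to clerical review


# Illustrative records only; these are not real Census data.
census = {"sex": "F", "birth_year": 1970, "birth_month": 3,
          "country_of_birth": "AU", "area_code": "2601"}
other = {"sex": "F", "birth_year": 1970, "birth_month": 3,
         "country_of_birth": "NZ", "area_code": "2913"}

print(classify(match_weight(census, other)))
```

In practice the m- and u-probabilities would be estimated from the data rather than assumed, and fields that discriminate well (such as a small geographic area) carry more weight than fields such as sex.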
Reference: Christen, P. and Churches, T. (2005) Febrl - freely extensible biomedical record linkage - release 0.3.1, Australian National University, Canberra, available at: http://cs.anu.edu.au/~Peter.Christen/Febrl/febrl-0.3/febrldoc-0.3
For further information about this project, please contact Glenys Bishop on (02) 6252 5140.