2062.0 - Census Data Enhancement Project: An Update, Oct 2010  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 15/10/2010   

APPENDIX 1 CENSUS DATA ENHANCEMENT PROJECT 2006

The 2006 Census Data Enhancement (CDE) project encompassed three components:

  1. creation of a 5% Statistical Longitudinal Census Dataset (SLCD);
  2. bringing together 2006 Census data with ABS and non-ABS datasets using name and address during Census processing to undertake quality studies; and
  3. bringing together the 5% SLCD with specified non-ABS datasets for statistical and research purposes.

CREATION OF A STATISTICAL LONGITUDINAL CENSUS DATASET (SLCD)

A major component of the CDE project is the creation of a Statistical Longitudinal Census Dataset, or SLCD. The first wave of the SLCD was formed with a 5% random sample of the population from the 2006 Census. The second wave of the SLCD will be created by combining the wave one data in the SLCD 2006 with data from the 2011 Census of Population and Housing.

The 5% SLCD is formed using statistical data linking techniques (see Glossary) rather than matching based on name and address. A paper outlining methods for creating the 5% SLCD, results from similar statistical linking projects, and preliminary results of linking can be found in ABS Research Paper: Exploring Methods for Creating a Longitudinal Census Dataset (ABS cat. no. 1352.0.55.076).
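As a rough illustration of what linking without name and address can look like, the sketch below scores candidate record pairs by summing agreement weights over non-identifying fields, in the style of probabilistic (Fellegi-Sunter) record linkage. It is a minimal sketch under stated assumptions, not the ABS methodology: the field names, the m/u probabilities, the threshold and the sample records are all hypothetical, and the cited research paper should be consulted for the methods actually used.

```python
import math

# Hypothetical linking fields with assumed probabilities:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELDS = {
    "sex": (0.99, 0.50),
    "birth_year": (0.95, 0.02),
    "birth_month": (0.95, 0.08),
    "region": (0.80, 0.05),  # people move, so agreement is less certain
}


def pair_weight(rec_a, rec_b):
    """Sum log2 agreement/disagreement weights over all linking fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(file_a, file_b, threshold):
    """Link each record in file_a to its best-scoring record in file_b.

    A link is kept only if the best weight reaches the threshold.
    One-to-one assignment is not enforced in this simplified sketch.
    """
    links = {}
    for id_a, rec_a in file_a.items():
        best_id, best_w = None, float("-inf")
        for id_b, rec_b in file_b.items():
            w = pair_weight(rec_a, rec_b)
            if w > best_w:
                best_id, best_w = id_b, w
        if best_w >= threshold:
            links[id_a] = best_id
    return links


# Invented records standing in for two Census waves (no names or addresses).
wave1 = {
    "a1": {"sex": "F", "birth_year": 1970, "birth_month": 3, "region": "ACT"},
    "a2": {"sex": "M", "birth_year": 1985, "birth_month": 11, "region": "NSW"},
}
wave2 = {
    "b1": {"sex": "M", "birth_year": 1985, "birth_month": 11, "region": "VIC"},
    "b2": {"sex": "F", "birth_year": 1970, "birth_month": 3, "region": "ACT"},
    "b3": {"sex": "F", "birth_year": 1992, "birth_month": 7, "region": "ACT"},
}

print(link(wave1, wave2, threshold=5.0))
```

Raising the threshold trades missed links for fewer false links. In this toy data the person who changed region still links at a threshold of 5.0 but would be left unlinked at a stricter threshold such as 10.0.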

A quality study undertaken as part of the 2006 CDE project assessed the likely quality of the 5% SLCD. The study found that the 5% SLCD would produce results of similar or better quality than a panel survey. For further information, see Assessing the Likely Quality of the Statistical Longitudinal Census Dataset (ABS cat. no. 1351.0.55.026).

Further information about the second wave of the 5% SLCD and plans for the third wave are outlined in Section 3 of this paper.

BRINGING TOGETHER 2006 CENSUS DATA WITH OTHER DATASETS USING NAME AND ADDRESS DURING CENSUS PROCESSING TO UNDERTAKE QUALITY STUDIES

The second component of the 2006 CDE project was the undertaking of several quality studies which involved bringing together data from the 2006 Census with other ABS and non-ABS datasets. The agreement of the custodians of the non-ABS datasets was required for projects using non-ABS datasets. The aim of these studies was to understand and evaluate the quality of ABS statistical operations and outputs, to better inform ABS on the most suitable statistical techniques for bringing together the 5% SLCD and other datasets, and to assess the quality of datasets created using these techniques.

The quality studies were undertaken during the 2006 Census processing period using names and addresses to undertake the linkage, after which all Census forms and names and addresses held by the ABS were destroyed. The linked datasets created for the quality studies did not contain name and address information and were destroyed at the completion of the studies (see previous comments regarding security, confidentiality and accessibility).

A number of quality studies were proposed for Census 2006 as outlined in the information paper Census Data Enhancement Project: An Update (ABS cat. no. 2062.0). The outcomes of studies that did proceed are outlined below.

Feasibility of combining the 5% SLCD with data from future Censuses

This study aimed to test the feasibility of bringing together a 5% sample of one Census with subsequent Censuses using statistical techniques. This was simulated by linking the 2005 Census Dress Rehearsal dataset to the 2006 Census data both with and without names and addresses as matching variables. The linking using name and address acted as a benchmark for assessing the quality of the linking without using name and address. Details about the linking methodologies used, the application of these methodologies in the quality study using the 2005 Census Dress Rehearsal dataset and the outcomes of the study were released in three research papers:

The ABS will be undertaking a series of quality studies using the 2010 Census Dress Rehearsal dataset. For further details, see '1 Bringing together 2011 census data with other datasets during census processing using name and address, for quality studies'.

Indigenous Mortality

This study brought together data from the 2006 Census with data from death registrations for the period August 2006 to 30 June 2007, during the Census processing period using name and address. The study assessed the undercoverage of Indigenous deaths in death registration records; identified factors that may be contributing to undercoverage in Indigenous deaths in death registrations; and assessed the feasibility of calculating and applying adjustment factors to improve estimates of Indigenous mortality.

Findings from the quality study using death registration data can be found in the information paper, Census Data Enhancement - Indigenous Mortality Quality Study, 2006-07 (ABS cat. no. 4723.0).

Adjustment factors obtained in the quality study were subsequently used in the development and introduction of a new method to derive adjusted Indigenous deaths used to produce life tables and life expectancy estimates for Aboriginal and Torres Strait Islander Australians. The availability of information from the quality study considerably improved the quality and robustness of the estimates of Indigenous life expectancy. These estimates had previously relied on a range of assumptions about the level of under-identification of Indigenous deaths in each jurisdiction. The linked Census and death registration data pointed to significant deficiencies in these assumptions, resulting in significant underestimation of Indigenous life expectancy in some states/territories. For further information, see Discussion Paper: Assessment of Methods for Developing Life Tables for Aboriginal and Torres Strait Islander Australians, 2006 (ABS cat. no. 3302.0.55.002).

The ABS will be undertaking an Indigenous Mortality project as part of the 2011 CDE project. For further details, see '2.1 Indigenous Mortality Project'.

Assessing the feasibility of bringing together data from the Department of Immigration and Citizenship's Settlement Database and the 2006 Census

The Migrants Quality Study was conducted to assess the feasibility of linking the Department of Immigration and Citizenship's Settlement Database (SDB) to the 5% Statistical Longitudinal Census Dataset (SLCD) without the use of name and address as linking variables. Findings of the quality study Assessing the Quality of Linking Migrant Settlement Records to Census Data (ABS cat. no. 1351.0.55.027) were released in August 2009. The results from the quality study indicated that linking the SDB to the 5% SLCD is feasible and can produce useful information that no other data source currently provides. However, some quality issues were identified and further work was proposed to ensure that the linked data are correctly interpreted and appropriately used. For further details, see '1 Bringing together 2011 census data with other datasets during census processing using name and address, for quality studies'.

Assessing Automatic Data Linking for the Census Post Enumeration Survey

The Census Post Enumeration Survey (PES) is conducted a few weeks after the Census to estimate the number of people who were missed in the Census or who were counted more than once. This is done by matching Census and PES responses. A quality study undertaken after the 2006 PES assessed the feasibility of introducing automated linking processes to improve the efficiency and effectiveness of the PES, in line with previous changes to the survey, such as the introduction of Computer Assisted Interviewing in 2006. The study found that automated data linking can provide important quality and efficiency gains to replace in part, though not entirely, the clerical matching process. A key advantage of automated data linking is that it provides a greater capability for locating respondents at undisclosed or poorly reported Census night addresses.

BRINGING TOGETHER THE 5% SLCD WITH SPECIFIED NON-ABS DATASETS FOR STATISTICAL AND RESEARCH PURPOSES

The third component of the 2006 CDE project was the bringing together of the 5% SLCD with specified non-ABS datasets using statistical techniques. The agreement of the custodians of the non-ABS datasets was required for these projects, and projects could only be for statistical and research purposes.

Three non-ABS datasets were identified for this component of the CDE Project. These were birth and death register data, including cause of death data; migrants data from the Department of Immigration and Citizenship's Settlement Database (SDB); and national disease registers.

A statistical study based on bringing together data from the Department of Immigration and Citizenship's Settlement Database with data from the 2006 Census was undertaken. The bringing together of migrant information with Census information had the potential to provide insights into patterns of settlement of different groups of migrants, including family formation, housing, labour force characteristics, changing occupations, educational pathways and region of settlement. The study started after the completion of the quality study (discussed above) and results have been released in Perspectives on Migrants, June 2010 (ABS cat. no. 3416.0).

PRIVACY AND CONFIDENTIALITY

A fundamental aspect of the CDE project is the management of privacy and confidentiality. The ABS applied strict protocols to the 2006 CDE project to ensure that the privacy of individuals and the confidentiality of their data were protected throughout. These protocols included:
  • legislative protections requiring all data collected by, or supplied to the ABS, including datasets created for the CDE project, to remain confidential to the ABS;
  • destruction of all Census forms and deletion of all name and address information from the 2006 Census, including the names and addresses for the 5% sample of the population contained in the SLCD. The ABS will not retain Census name and address once Census processing is completed. The only exception is if a person explicitly agrees by answering the relevant question on the Census form to have their name-identified responses retained by the National Archives of Australia for release in 99 years' time (see Glossary for further detail);
  • strict application of standard ABS procedures to ensure that all aggregate outputs disseminated by the ABS as a result of the quality studies were unlikely to enable identification of any individual or household; and
  • meeting all obligations under the Information Privacy Principles for the 2006 Census, including the provision of information about the purpose and use of the information provided in the Census, data security and the release of a CDE Fact Sheet.

Audits undertaken by Oakton in 2008 and 2010 found that:
  • the security and confidentiality arrangements put in place to protect the data associated with the Statistical Longitudinal Census Dataset (SLCD) and the quality studies complied with the originally proposed controls; and
  • the linked datasets created using name and address were deleted after use.


