1.1 OVERVIEW
In 2005, the Australian Bureau of Statistics (ABS) began a project to enhance the value of Census data by combining it with other datasets, both ABS and non-ABS, so that more information could be drawn from the combined data than from any of the individual datasets alone. The Australian Census Longitudinal Dataset (ACLD) was proposed as an enduring longitudinal dataset constructed by linking records from successive Censuses.
As part of the development phase, a quality study was undertaken in which data from the 2005 Census Dress Rehearsal were linked to data from the 2006 Census. The study concluded that the linkage methodology was feasible and that the linked data file was expected to be of sufficient quality for longitudinal analysis. For more information, see Assessing the Likely Quality of the Statistical Longitudinal Census Dataset (cat. no. 1351.0.55.026).
Following this positive assessment, a 5% random sample (979,661 records) was selected from the 2006 Census to form Wave 1 of the ACLD. This sample was then linked to the 2011 Census using data linkage techniques, resulting in a linked data file of 800,759 records.
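The two counts above imply the share of Wave 1 records that were linked to a 2011 Census record, assuming each record in the linked file corresponds to one Wave 1 record. This is a simple ratio only: it does not separate records that could not be linked because the person had died or left Australia from records that were not linked for other reasons.

```latex
\text{proportion linked} = \frac{800\,759}{979\,661} \approx 0.817 \quad (81.7\%)
```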
Data linkage is typically undertaken using probabilistic and/or deterministic methods, both of which were used in the ACLD project:
- Probabilistic: linkage is based on the overall level of agreement on a set of variables common to the two datasets. This approach allows links to be assigned despite missing or inconsistent information, provided there is enough agreement on other variables to offset any disagreement.
- Deterministic: linkage assigns record pairs across two datasets that match exactly or closely on common variables. This type of linkage is most applicable where records from the different sources consistently report sufficient information, and it can be an efficient way to conduct linkage.
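The difference between the two approaches can be illustrated with a minimal Python sketch. The probabilistic part follows the widely used Fellegi-Sunter framework, in which each variable contributes a log-likelihood weight for agreement or disagreement; this is an assumption for illustration, as this section does not describe the ABS weighting scheme. The variable names, the m/u probabilities, the threshold and the helper names (`deterministic_match`, `match_weight`, `best_link`) are all hypothetical.

```python
import math

# Hypothetical linking variables with (m, u) probabilities, for illustration only:
#   m = P(values agree | both records belong to the same person)
#   u = P(values agree | records belong to different people)
M_U = {
    "sex": (0.98, 0.50),
    "birth_year": (0.95, 0.02),
    "birth_month": (0.93, 0.083),
    "country_of_birth": (0.97, 0.30),
    "small_area": (0.80, 0.001),
}


def deterministic_match(a: dict, b: dict, keys: list[str]) -> bool:
    """Deterministic rule: exact agreement on every key; a missing value fails."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)


def match_weight(a: dict, b: dict) -> float:
    """Probabilistic score: sum of Fellegi-Sunter log2 weights.

    Agreement adds log2(m/u); disagreement adds log2((1-m)/(1-u)).
    A variable missing on either record contributes nothing, so agreement
    on the other variables can still carry the pair over the threshold.
    """
    total = 0.0
    for var, (m, u) in M_U.items():
        x, y = a.get(var), b.get(var)
        if x is None or y is None:
            continue
        total += math.log2(m / u) if x == y else math.log2((1 - m) / (1 - u))
    return total


def best_link(record: dict, candidates: list[dict], threshold: float = 5.0):
    """Index of the highest-scoring candidate if it reaches the (hypothetical)
    threshold, otherwise None."""
    if not candidates:
        return None
    weight, index = max((match_weight(record, c), i) for i, c in enumerate(candidates))
    return index if weight >= threshold else None


# Usage: the same person after moving house, and a different person.
r2006 = {"sex": "F", "birth_year": 1970, "birth_month": 3,
         "country_of_birth": "AUS", "small_area": "A123"}
same_person_2011 = dict(r2006, small_area="B456")
other_person_2011 = {"sex": "F", "birth_year": 1982, "birth_month": 7,
                     "country_of_birth": "AUS", "small_area": "C789"}

print(deterministic_match(r2006, same_person_2011, list(M_U)))  # False: area changed
print(best_link(r2006, [other_person_2011, same_person_2011]))  # 1
```

The deterministic rule rejects the true match because one variable changed, while the probabilistic score still links it: strong agreement on the remaining variables outweighs the single disagreement, which is the trade-off described in the two bullets above.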
In addition, the ABS distinguishes three types of linkage according to the variables used. In broad order of linkage quality, from highest to lowest, these are:
- Gold: linking using name, address and personal characteristics such as age and sex.
- Silver: linking using an encrypted, non-identifiable numeric version of name and personal characteristics.
- Bronze: linking using only personal characteristics.
Bronze linkage with both deterministic and probabilistic components was used to combine the 2006 Census sample with the 2011 Census. This method was chosen based on the information available for linkage and on the results of the quality study that linked the 2005 Census Dress Rehearsal to the 2006 Census. That study had investigated the relative suitability of Gold, Silver and Bronze methods and concluded that, whilst linkage using name and address information would provide a high-quality match, Bronze linkage would still yield a dataset of sufficient quality for longitudinal analysis. The study also identified that using a non-identifying, grouped numeric code (hash code) based on name (Silver linkage) could improve the quality and efficiency of the linkage process in future. For more information, see Assessing the Likely Quality of the Statistical Longitudinal Census Dataset (cat. no. 1351.0.55.026).
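The idea of a non-identifying, grouped numeric code based on name can be sketched in Python. This section does not describe how the ABS constructs its hash code, so the choices below are assumptions: a keyed hash (HMAC-SHA256) of a normalised name, reduced modulo a hypothetical number of groups so that many different names share each code. The key, `N_GROUPS`, and the helper names `normalise_name` and `name_hash_code` are hypothetical.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical parameters: a secret key held by the linking team, and the
# number of groups. With far fewer groups than distinct names, many names
# share each code, so a code on its own does not identify a name.
SECRET_KEY = b"replace-with-a-securely-held-key"
N_GROUPS = 1000


def normalise_name(name: str) -> str:
    """Strip accents, keep letters only and upper-case, so that minor
    formatting differences (case, spaces, apostrophes) do not change the code."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalpha()).upper()


def name_hash_code(name: str) -> int:
    """Keyed hash (HMAC-SHA256) of the normalised name, reduced to a group number."""
    digest = hmac.new(SECRET_KEY, normalise_name(name).encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % N_GROUPS


# Usage: differently formatted versions of the same name get the same code.
print(name_hash_code("Zoë O'Brien") == name_hash_code("ZOE OBRIEN"))  # True
```

Because the code is grouped, agreement on it is evidence of a match rather than proof, so in a design like this it would typically enter the probabilistic comparison as one more agreement variable alongside the personal characteristics.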
At each Census, the ACLD will be augmented with a sample of people who have been born in, or have migrated to, Australia since the previous Census, to maintain the size of the longitudinal dataset.
For many individuals, the linkage process will have correctly matched their 2006 Census record with the corresponding record from the 2011 Census. In other cases, a link will join records belonging to different people who share a number of characteristics. Some linkage error will not generally affect statistical conclusions drawn from the linked data, although care should be taken when interpreting results. For more information, see the Data linking methodology chapter.