3. LINKAGE RESULTS, 2011-2016, 2011 PANEL
TABLE 1 - LINKAGE RATES, By Selected Characteristics
(b) Includes Other Territories.
(c) Includes Migratory areas.
The linkage rates for the 2011-2016 ACLD were relatively consistent across most sub-populations and were in line with expected results. Compared with the overall linkage rate of 76%, the sub-populations which achieved the highest linkage rates were persons:
The sub-populations which achieved the lowest linkage rates were persons:
Traditionally, the Census Post Enumeration Survey (PES) has shown that the Census has higher rates of undercount for people of Aboriginal and/or Torres Strait Islander origin, those aged between 20 and 29 and those in the Northern Territory. As expected, the lower ACLD linkage rates broadly aligned with the same groups that experienced higher levels of undercount in the 2016 Census. One additional group with lower linkage rates was persons aged 75 and over at the time of the 2011 Census who, due to age, had an increased risk of death over the ensuing five years. Further information on Census undercount can be found in Census of Population and Housing: Details of Overcount and Undercount, 2016 (cat. no. 2940.0).
Further data cubes demonstrating the linkage rates for various sub-populations are available as an attachment to this Information paper.
3.1 LINKAGE ACCURACY
The following quality measures were calculated for the ACLD and indicate a good level of overall quality:
3.1.1 Linkage Precision
Not all record pairs assigned as links in a data linkage process are true matches, that is, record pairs belonging to the same individual. While the methodology is designed to ensure that the vast majority of links are true, some are false, i.e. the records in the link belong to different people rather than the same person. The linkage strategy used for the ACLD was designed to ensure a high level of accuracy while also achieving a sufficiently high number of links to enable longitudinal research. Accordingly, the strategy was restrictive and conservative.
One of the key measures of linkage quality is the proportion of links in the dataset that are false. The number of false links can be estimated by clerically reviewing a sample of links, or by using modelling techniques. Once an estimate of the number of false links is obtained, a 'precision' can be calculated. The precision is an estimate of the proportion of links that are matches (i.e. belonging to the same entity).
Precision estimation for the ACLD involved conducting clerical review on a stratified random sample of links. Potential links were stratified by their link weight value, with a minimum of 5% of links sampled from each individual link weight value (after rounding down to the nearest integer). After reviewing the sample, the results were used to calculate precision estimates for links grouped by pass and rounded link weight value. These estimates were then applied to the entire set of linkage results. This provided an estimate of precision for each individual link, which can be referred to as 'marginal precision'. Using the marginal precision, the 'cumulative precision' of the final set of one-to-one links could be estimated.
After producing both marginal and cumulative precision estimates, a cut-off point was selected. This cut-off is intended to optimise both the number of links and cumulative precision of the links retained above the cut-off point, while at the same time maintaining a high level of marginal precision for every individual link above the cut-off. The marginal precision estimates were used to select the cut-off, with all links with a marginal precision of at least 81% being retained. This resulted in a final file of 927,520 links once the cut-off was applied, with an estimated cumulative precision of 98.6%, or a false link rate of 1.4%, for these links.
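The relationship between marginal precision, the cut-off and cumulative precision can be sketched as follows. This is a minimal illustration only, not the ABS implementation: the `Stratum` class, the function names and all of the counts are hypothetical, and each stratum stands in for one group of links (pass and rounded link weight value). Only the 81% threshold is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One group of links sharing a pass and rounded link weight (hypothetical)."""
    link_weight: int   # rounded link weight value
    n_links: int       # all links in the group
    n_reviewed: int    # links clerically reviewed in the sample
    n_true: int        # reviewed links judged to be true matches


def marginal_precision(s: Stratum) -> float:
    # Share of reviewed links judged true, applied to every link in the group.
    return s.n_true / s.n_reviewed


def apply_cutoff(strata, min_marginal=0.81):
    """Keep groups at or above the marginal precision threshold and
    estimate the cumulative precision of the retained links."""
    kept = [s for s in strata if marginal_precision(s) >= min_marginal]
    total_links = sum(s.n_links for s in kept)
    expected_true = sum(s.n_links * marginal_precision(s) for s in kept)
    return kept, total_links, expected_true / total_links


# Illustrative counts only; each group is sampled at 5%.
strata = [
    Stratum(link_weight=30, n_links=1000, n_reviewed=50, n_true=50),
    Stratum(link_weight=20, n_links=400, n_reviewed=20, n_true=19),
    Stratum(link_weight=10, n_links=200, n_reviewed=10, n_true=8),
    Stratum(link_weight=5, n_links=100, n_reviewed=5, n_true=3),
]

kept, total_links, cumulative = apply_cutoff(strata)
print(total_links, round(cumulative, 4))
```

In this made-up example the two lowest groups fall below the threshold and are dropped, so the cumulative precision of the retained links is higher than it would be across all four groups, at the cost of fewer links.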
Clerical review relies on the judgment of a well-trained individual. While efforts are taken to minimise the risk, it is therefore possible for a link to be incorrectly assigned as a match or non-match. An alternative way of measuring precision is through the use of models. We applied the method of Chipperfield et al. (2018) to provide an independent model-based estimate of the precision. While the clerical estimate of cumulative precision was 98.6%, the model-based approach estimated the precision to be over 99%. The precision as estimated by the clerical review process was retained as the more conservative estimate.
Table 2 provides a summary of the precision estimate and false link rate by the pass where each link was selected (estimated via clerical review).
TABLE 2 - PRECISION ESTIMATES AND FALSE LINK RATES, By Pass Number
(b) Data presented in the table have not been perturbed.
The conservative and restrictive nature of the blocking and linking strategy, accompanied by quality controls that were implemented during clerical review, helped to minimise the estimated number of false links throughout the linkage process.
Almost three quarters (73%) of all links were achieved in the first pass of the project, which used a deterministic linking methodology to identify and filter matches. This pass implemented tight geographic and demographic restrictions to maximise the number of high quality links assigned and to limit the amount of alternative comparisons required. Using this approach, links were only accepted if a single unique record pair was identified.
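The acceptance rule described above, where a link is kept only if a single unique record pair is identified, can be illustrated with a short sketch. It is an assumption-laden example rather than the ACLD method: the field names (`area`, `sex`, `dob`), the record identifiers and the function `deterministic_pass` are all hypothetical, and the real first pass used a richer set of geographic and demographic restrictions.

```python
from collections import defaultdict


def deterministic_pass(file_a, file_b, key_fields):
    """Link records that agree exactly on key_fields, accepting a link
    only when the key identifies one record in each file."""

    def key(rec):
        values = tuple(rec[f] for f in key_fields)
        # Records with a missing key field cannot be linked deterministically.
        return None if any(v is None for v in values) else values

    index_a, index_b = defaultdict(list), defaultdict(list)
    for rec in file_a:
        if key(rec) is not None:
            index_a[key(rec)].append(rec["id"])
    for rec in file_b:
        if key(rec) is not None:
            index_b[key(rec)].append(rec["id"])

    links = []
    for k, ids_a in index_a.items():
        ids_b = index_b.get(k, [])
        if len(ids_a) == 1 and len(ids_b) == 1:  # single unique record pair
            links.append((ids_a[0], ids_b[0]))
    return links


# Hypothetical records for illustration only.
panel_2011 = [
    {"id": "A1", "area": "M100", "sex": "F", "dob": "1980-02-01"},
    {"id": "A2", "area": "M200", "sex": "M", "dob": "1975-07-15"},
    {"id": "A3", "area": "M300", "sex": "F", "dob": None},
]
census_2016 = [
    {"id": "B1", "area": "M100", "sex": "F", "dob": "1980-02-01"},
    {"id": "B2", "area": "M200", "sex": "M", "dob": "1975-07-15"},
    {"id": "B3", "area": "M200", "sex": "M", "dob": "1975-07-15"},
]

links = deterministic_pass(panel_2011, census_2016, ["area", "sex", "dob"])
print(links)
```

Here A2 is left unlinked because two 2016 records agree with it, and A3 is left unlinked because its date of birth is missing. Records like these would be left for later passes.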
3.1.2 Consistency of Common Information on Record Pairs
In data linkage projects, geographic boundaries function as blocking variables that restrict the search for links to records which agree on the defined geography. They are also used as linking variables, and when combined with other linking fields (such as hashed name, age, sex and date of birth), they provide a high level of uniqueness, and reduce the likelihood of linking to an incorrect record.
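The effect of a blocking variable on the number of comparisons can be seen in a small sketch. The records, the `area` field and the function `candidate_pairs` are hypothetical and are used only to illustrate the general idea of blocking, not the specific geographies used for the ACLD.

```python
from collections import defaultdict


def candidate_pairs(file_a, file_b, block_field):
    """Yield only those record pairs that agree on the blocking variable."""
    blocks_b = defaultdict(list)
    for rec in file_b:
        blocks_b[rec[block_field]].append(rec)
    for rec_a in file_a:
        for rec_b in blocks_b.get(rec_a[block_field], []):
            yield rec_a["id"], rec_b["id"]


# Hypothetical records: three per file, spread over areas X, Y and Z.
file_a = [
    {"id": "A1", "area": "X"},
    {"id": "A2", "area": "X"},
    {"id": "A3", "area": "Y"},
]
file_b = [
    {"id": "B1", "area": "X"},
    {"id": "B2", "area": "Y"},
    {"id": "B3", "area": "Z"},
]

blocked = list(candidate_pairs(file_a, file_b, "area"))
unblocked = len(file_a) * len(file_b)
print(len(blocked), "pairs compared with blocking,", unblocked, "without")
```

The trade-off is that a true match whose geography disagrees between the two files is never compared within that block, which is why accurate address coding matters for people who have moved.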
Table 3 displays the number of records that had consistent information on key linking variables, grouped by levels of geography.
TABLE 3 - CONSISTENCY OF LINKED RECORDS, By Geography And Selected Linking Fields
(b) Categories are mutually exclusive. Records that agree in each category are excluded from subsequent categories.
(c) Percentages may not add up to the total due to rounding.
By contrast, record pairs may have inconsistent information and yet be a match. Inconsistent information may be recorded for the same person in different Censuses due to a range of factors, including:
Of particular note is inconsistency due to non-reporting of name and date of birth. Respondents are becoming less likely to provide their date of birth: 90% reported it in the 2011 Census, decreasing to 81% in the 2016 Census. Further, just over one per cent of Australians had a missing, or blank, response for first name or surname in the 2016 Census. There appeared to be a relationship between having a missing response for both first name and surname and non-response on other variables. Of the people who did not report first name and surname, approximately half did not report at least one of sex, age, or Indigenous status. The vast majority of missing responses came from paper forms, with the overall level of missing responses in the 2016 Census remaining low.
3.2 CHARACTERISTICS OF LINKED AND UNLINKED 2011 ACLD PANEL SAMPLE
The random sample selected from the 2011 Census for the 2011 ACLD Panel was designed to maximise overlap with the 2006 ACLD Panel, while also being representative of the Australian population by age, sex and jurisdiction as well as other characteristics such as Indigenous status and country of birth. The 2011 Panel sample size was increased in comparison to the 2006 Panel sample size primarily due to the increase in the Australian population from 2006 to 2011. The 2011 Panel size was increased slightly to 5.7%, to achieve a linked sample size closer to 5% of the population after allowing for missed links and people no longer being in scope of the ACLD due to death or overseas migration.
Table 4 shows the distribution of key populations across the 2011 Census, the 2011 ACLD Panel sample and the linked results.
TABLE 4 - SELECTED CHARACTERISTICS, By 2011 Census, 2011 ACLD Panel Sample, ACLD Linked Results
(b) Data presented in the table have been perturbed. As a result the sum of individual categories may not align with totals.
(c) Includes Other Territories.
(d) Includes Migratory areas.
The distribution of the ACLD file by sub-population was generally well aligned with both the 2011 Panel sample and the entire 2011 Census. When looking at the relative difference between these proportions, however, some differences are more clearly observed.
Compared with the entire 2011 Census, the linked 2011 ACLD Panel contains relatively more records for people aged 50-59 years, and to a lesser extent those aged 0-9 years, 40-49 years and 60-69 years. By contrast, the linked 2011 Panel contains relatively fewer records for people aged 20-29 years and 80 years and over. This is consistent with the 2006-2011 ACLD linkage, in which these sub-populations showed similar linkage rates.
In general, the distribution of weighted counts for the linked ACLD file is close to that of the entire 2011 Census, but it should be noted that the weighting process is not designed to produce counts corresponding to the population in 2011. Rather, the weighted population is that of people who were in scope of both the 2011 and 2016 Censuses (see Section 3.4 Weighting). Thus, for example, the lower proportion of older people in the linked file, even after weighting, reflects the impact on the 2011 Panel sample of deaths that occurred between 2011 and 2016.
Further data cubes demonstrating more detailed population distributions are provided as an attachment to this Information paper.
3.3 REASONS FOR UNLINKED RECORDS
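There are two main reasons why records from the 2011 Panel sample were not linked to a 2016 Census record: missing and/or inconsistent information on one or both records, and the absence of a 2016 Census record for the person.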
3.3.1 Missing and/or inconsistent information
In these cases, the true match was present in the pool of all record pairs but it was not identified because there was a high level of inconsistency between information on the 2011 ACLD Panel sample record and the 2016 Census record, or key linking fields were missing altogether. The reasons for the match being missed can be categorised into the following groups:
Accurate address coding was crucial in narrowing the search and differentiating between true and false links. It was a particular challenge for persons who had moved, since linkage was then dependent on the information supplied in 2016 about the person's address in 2011. Processing for the 2016 Census involved coding for address five years ago to a fine level of geography, ideally Mesh Block. This was not always possible, due to insufficient and/or incorrect address information being supplied for some persons, potentially due to recall issues.
3.3.2 No 2016 Census record
A person included in the 2011 ACLD Panel sample may have had no equivalent 2016 Census record because they were no longer in scope for the Census due to migration from Australia, or death between 2011 and 2016, or they may have been missed in the 2016 Census.
According to mortality data compiled by the ABS from data supplied by the Registrars of Births, Deaths and Marriages, approximately 913,000 people died in Australia between 2011 and 2016. If 5% of these people were selected in the 2011 Panel sample, then it could be estimated that up to 46,000 people could not have been linked due to death between 2011 and 2016. Similarly, migration data estimates that just over 1.4 million people left Australia as permanent emigrants over the same period, potentially resulting in up to 70,000 people from the 2011 Panel sample being unlikely to have a corresponding 2016 Census record. For more information please refer to the relevant releases of Migration, Australia (cat. no. 3412.0) and Deaths, Australia (cat. no. 3302.0).
Due to the size and complexity of the Census, it is inevitable that some people are missed and some are counted more than once. It is for this reason that the Census Post Enumeration Survey (PES) is run shortly after each Census, to provide an independent measure of Census coverage. The PES determines how many people should have been counted in the Census, how many were missed (undercount), and how many were counted more than once (overcount). It also provides information on the characteristics of those in the population who have been under- or overcounted.
The net undercount rate for the 2016 Census was 1%, with a higher rate for Aboriginal and Torres Strait Islander people than for the non-Indigenous population. Thus approximately 12,000 people from the 2011 Panel sample could have been missed in the 2016 Census. This estimate is a starting point only and does not take into account the likelihood of people being missed in successive Censuses. For more information please refer to Census of Population and Housing: Details of Overcount and Undercount, 2016 (cat. no. 2940.0).
When taking into account all of these factors, it is estimated that approximately 40% of the unlinked 2011 ACLD Panel sample (128,000 out of the 293,000 unlinked records) would not have a corresponding record in the 2016 Census. This would indicate that the initial linkage rate of 76% could be representative of up to 85% of the population that actually had an opportunity to be linked.
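The arithmetic behind these estimates can be reproduced with a short calculation. The inputs are the rounded figures quoted in this section. The 5% sampling fraction is the working assumption used in the text for deaths and emigration, and the panel size is taken here as linked plus unlinked records, which is an assumption of this sketch rather than a published figure.

```python
# Rounded inputs quoted in this section.
DEATHS_2011_2016 = 913_000
EMIGRANTS_2011_2016 = 1_400_000
MISSED_IN_2016_CENSUS = 12_000   # approximate figure from the 1% net undercount
LINKED = 927_520
UNLINKED = 293_000
PANEL_FRACTION = 0.05            # working assumption used in the text


def to_nearest_thousand(x: float) -> int:
    return int(round(x, -3))


deaths_in_panel = to_nearest_thousand(DEATHS_2011_2016 * PANEL_FRACTION)
emigrants_in_panel = to_nearest_thousand(EMIGRANTS_2011_2016 * PANEL_FRACTION)
no_2016_record = deaths_in_panel + emigrants_in_panel + MISSED_IN_2016_CENSUS

panel_size = LINKED + UNLINKED   # assumption: panel = linked + unlinked records
raw_rate = LINKED / panel_size
adjusted_rate = LINKED / (panel_size - no_2016_record)
share_of_unlinked = no_2016_record / UNLINKED

print(deaths_in_panel, emigrants_in_panel, no_2016_record)
print(f"raw {raw_rate:.1%}, adjusted {adjusted_rate:.1%}, "
      f"share of unlinked {share_of_unlinked:.1%}")
```

On these rounded inputs the raw rate is about 76% and the adjusted rate about 85%, matching the figures above. The 128,000 records work out to a little under 44% of the 293,000 unlinked records, which the text describes as approximately 40%.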
3.4 WEIGHTING
Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, persons. The weight can be considered an indication of how many people in the relevant population are represented by each person in the sample. Weights were created for linked records in the ACLD to enable longitudinal population estimates to be produced. Cross-sectional population estimates for 2011 and 2016 are available from each Census.
The 2011 Panel of the ACLD is a random sample of 5% of the Australian population in 2011. As such, each person in the sample should represent about 20 people in the population. Between Censuses, however, the in scope population changes as people die or move overseas. In addition, Census net undercount and data quality can affect the capacity to link equivalent records across waves. The weights of the linked records on the ACLD were calibrated to the estimated population that was in scope of both the 2011 and 2016 Censuses, 21,080,214 persons. The weights were based on four components: the design weight, undercoverage adjustment, missed link adjustment and population benchmarking.
The mean final weight for the linked records is 22.3 for females and 23.2 for males. The weights range between 14.8 and 83. The mean weight was higher for Aboriginal and Torres Strait Islander persons and for people in the Northern Territory.
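As a rough consistency check on these figures, the design weight and the implied overall mean weight can be computed from numbers quoted in this paper. The sketch assumes that the weighted records are the 927,520 links reported in Section 3.1.1 and that the final weights sum to the benchmark population. It does not reproduce the four weighting components themselves.

```python
SAMPLING_FRACTION = 0.05          # 5% sample, as stated in the text
BENCHMARK_POPULATION = 21_080_214  # in scope of both the 2011 and 2016 Censuses
LINKED_RECORDS = 927_520           # assumption: all links carry a final weight

# Design weight: each sampled person represents 1 / 0.05 = 20 people.
design_weight = 1 / SAMPLING_FRACTION

# If the weights sum to the benchmark, the mean weight is benchmark / records.
implied_mean_weight = BENCHMARK_POPULATION / LINKED_RECORDS

# Average uplift over the design weight from the later adjustments.
average_uplift = implied_mean_weight / design_weight

print(design_weight, round(implied_mean_weight, 2), round(average_uplift, 3))
```

The implied overall mean of about 22.7 lies between the reported means for females (22.3) and males (23.2), and is higher than the design weight of 20 because linked records must also represent in-scope people who were not linked.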
The population benchmark is based on the 2016 Estimated Resident Population (ERP), which is adjusted by the estimated probability that a person was also in Australia in 2011. This probability is formed using the 2016 Census variable for reported address five years ago. Further information on this approach can be found in Chipperfield, Brown & Watson (2016). See the References section for details of this publication.
For more information about weighting please refer to the Appendix.
2080.5 - Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2016 Quality Declaration
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 20/03/2019