2080.5 - Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2016
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 20/03/2019   

3. LINKAGE RESULTS, 2006-2011 (ORIGINAL), 2006 PANEL

At the completion of the linkage process, 800,759 (82%) out of the 979,661 records from the 2006 Census sample (Wave 1) were linked to a 2011 Census (Wave 2) record to create the linked ACLD. This linkage rate was consistent with results from other Bronze linkage projects using the 2006 and 2011 Census.

All results presented in this publication (unless identified in the relevant table) are based on characteristics from the Wave 1 sample and have been confidentialised to prevent the identification of individuals.

Table 2 displays the linkage rate for a range of sub-populations.

TABLE 2 - LINKAGE RATES, By selected characteristics

| Characteristic | 2006 Census sample (no.) | ACLD (no.) | Linkage rate (%) |
|---|---|---|---|
| Sex | | | |
| Male | 480 285 | 390 487 | 81.3 |
| Female | 499 372 | 410 274 | 82.2 |
| Age group (years) | | | |
| 0-14 | 194 017 | 170 834 | 88.1 |
| 15-19 | 66 247 | 51 220 | 77.3 |
| 20-24 | 66 512 | 49 327 | 74.2 |
| 25-29 | 62 249 | 48 642 | 78.1 |
| 30-39 | 140 271 | 117 655 | 83.9 |
| 40-49 | 142 911 | 123 946 | 86.7 |
| 50-59 | 126 285 | 108 962 | 86.3 |
| 60-69 | 86 385 | 71 906 | 83.2 |
| 70-74 | 31 004 | 23 678 | 76.4 |
| 75 and over | 63 784 | 34 586 | 54.2 |
| Indigenous status | | | |
| Non-Indigenous | 942 253 | 775 419 | 82.3 |
| Aboriginal | 19 697 | 13 340 | 67.7 |
| Torres Strait Islander | 1 449 | 923 | 63.7 |
| Both Aboriginal and Torres Strait Islander | 839 | 543 | 64.7 |
| Not stated | 15 416 | 10 530 | 68.3 |
| State/Territory of usual residence | | | |
| New South Wales | 323 136 | 263 369 | 81.5 |
| Victoria | 244 095 | 203 668 | 83.4 |
| Queensland | 192 606 | 154 013 | 80.0 |
| South Australia | 75 481 | 62 239 | 82.5 |
| Western Australia | 95 795 | 77 921 | 81.3 |
| Tasmania | 23 787 | 19 583 | 82.3 |
| Northern Territory | 8 469 | 6 226 | 73.5 |
| Australian Capital Territory | 16 186 | 13 680 | 84.5 |
| Remoteness areas | | | |
| Major Cities | 669 274 | 552 339 | 82.5 |
| Inner Regional | 195 401 | 159 611 | 81.7 |
| Outer Regional | 92 396 | 73 122 | 79.1 |
| Remote | 13 989 | 10 533 | 75.3 |
| Very Remote | 6 546 | 4 602 | 70.3 |
| No Usual Address | 2 029 | 539 | 26.6 |
| Total(a)(b)(c) | 979 661 | 800 759 | 81.7 |


(a) Data presented in the table have been confidentialised. As a result, the sum of individual categories may not align with totals.
(b) Includes Other Territories.
(c) Includes Migratory areas.
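
As a rough illustration of how the rates in Table 2 are derived, the sketch below divides the number of linked ACLD records by the number of 2006 Census sample records for a few rows. The counts are copied from Table 2; the function and variable names are illustrative only and are not part of the ACLD methodology.

```python
# Counts copied from Table 2: (2006 Census sample, linked ACLD records).
TABLE_2 = {
    "Male": (480_285, 390_487),
    "Female": (499_372, 410_274),
    "75 and over": (63_784, 34_586),
    "No Usual Address": (2_029, 539),
    "Total": (979_661, 800_759),
}


def linkage_rate(sample: int, linked: int) -> float:
    """Linkage rate as a percentage, rounded to one decimal place."""
    return round(100 * linked / sample, 1)


rates = {name: linkage_rate(s, l) for name, (s, l) in TABLE_2.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

Because the published counts have been confidentialised, rates recomputed this way may occasionally differ from the published figures in the last decimal place.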

The linkage rates that were achieved for the ACLD were relatively consistent across most sub-populations and were in line with expected results. Compared with the national average of 82%, the sub-populations which achieved the highest linkage rates were persons:
  • aged 0 to 14 years (88%), followed by 40 to 49 years (87%) and 50 to 59 years (86%)
  • of non-Indigenous origin (82%)
  • who usually lived in the ACT (85%) and Victoria (83%)
  • who usually lived in Major cities (83%).


The sub-populations which achieved the lowest linkage rates were persons:
  • aged 20-24 years (74%) and 75 years and over (54%)
  • of Aboriginal (68%), Torres Strait Islander (64%) or both Aboriginal and Torres Strait Islander origin (65%)
  • who usually lived in the Northern Territory (74%)
  • who usually lived in remote (75%) and very remote areas (70%) or who had no usual address in 2006 (27%).


Traditionally, the Census Post Enumeration Survey (PES) has shown that the Census has higher rates of undercount for people of Aboriginal and/or Torres Strait Islander origin, those aged between 20 and 29 and for those in the Northern Territory. As expected, the lower ACLD linkage rates broadly aligned with the same groups that experience higher levels of undercount in the Census. One additional group that had lower linkage rates were persons aged 75 and over at the time of the 2006 Census who, due to age, had an increased risk of death over the ensuing five years. Further information on Census undercount can be found in Census of Population and Housing - Details of Undercount, 2011 (cat. no. 2940.0).

Further data cubes, demonstrating the linkage rates for various sub-populations, are available as an attachment to this Information paper.

3.1 LINKAGE ACCURACY

The following quality measures were calculated for the ACLD and indicate a good level of overall quality:
  1. The linkage rate, that is the proportion of the 2006 Census sample records linked to a 2011 Census record, including the number of true matches and false links.
  2. The consistency of reporting of common information between record pairs.

3.1.1 Linkage Rates, True and False Links

Not all record pairs assigned as links in a data linkage exercise are a match, that is, a record pair belonging to the same individual. While the methodology is designed to ensure that the vast majority of links are true, some are nevertheless false. The linkage strategy used for the ACLD was designed both to achieve a high number of links and to ensure a high level of accuracy, to enable longitudinal research. Accordingly, the strategy was restrictive and conservative, especially in the early passes.

Analysis of the results of clerical review was conducted to determine the quality of the linkage process and estimate the number of true links in the linked ACLD file. This process involved calculating the proportion of rejected record pairs at each linkage weight and determining the number of false links this would represent in the final output file.
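
The estimation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ABS implementation: the `WeightBand` class, the grouping of linkage weights into bands and all of the numbers are hypothetical. The rejection proportion observed in the clerical sample for each band is scaled up to the number of links assigned in that band.

```python
from dataclasses import dataclass


@dataclass
class WeightBand:
    """One linkage-weight band reviewed clerically (all values hypothetical)."""
    sampled: int   # record pairs sampled for clerical review
    rejected: int  # sampled pairs judged not to be a match
    assigned: int  # links at this weight kept in the output file


def estimated_false_links(bands: list[WeightBand]) -> float:
    """Scale each band's sample rejection proportion up to its assigned links."""
    total = 0.0
    for band in bands:
        if band.sampled == 0:
            continue  # nothing reviewed, so no estimate for this band
        total += band.assigned * band.rejected / band.sampled
    return total


bands = [
    WeightBand(sampled=100, rejected=2, assigned=5_000),
    WeightBand(sampled=100, rejected=10, assigned=2_000),
    WeightBand(sampled=50, rejected=15, assigned=400),
]
false_links = estimated_false_links(bands)
total_assigned = sum(b.assigned for b in bands)
print(f"Estimated false links: {false_links:.0f} of {total_assigned}")
print(f"Estimated false link rate: {100 * false_links / total_assigned:.1f}%")
```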

Table 3 provides a summary of the results of clerical review, including an estimate of the number of false links accepted in each pass. Due to the nature of deterministic linking and the way in which linked records were retained, no false links were identified in passes 1 and 2. While it is assumed that all links assigned in these passes were true, as they contained consistent information across all key linking fields, in reality there may have been a small but unquantifiable number of false links.


TABLE 3 - LINKAGE RESULTS, By pass number

| Pass number(a) | Links created (no.) | Sampled in clerical review (no.) | Links assigned (no.) | Total false links (no.) | False link rate (%) |
|---|---|---|---|---|---|
| 1 | 559 182 | 30 | 544 925 | 0 | 0 |
| 2 | 131 575 | 30 | 10 919 | 0 | 0 |
| 3 | 11 131 | 240 | 10 489 | 997 | 9.5 |
| 4 | 182 285 | 400 | 62 570 | 9 929 | 15.9 |
| 5 | 212 071 | 400 | 87 248 | 17 274 | 19.8 |
| 6 | 57 713 | 345 | 18 988 | 1 832 | 9.6 |
| 7 | 10 489 | 206 | 1 723 | 237 | 13.7 |
| 8 | 10 156 | 120 | 159 | 29 | 18.4 |
| 9 | 236 180 | 411 | 50 007 | 10 712 | 21.4 |
| 11 | 133 555 | 201 | 9 827 | 1 051 | 10.7 |
| 12 | 29 911 | 200 | 3 904 | 731 | 18.7 |
| Total(b) | 1 574 248 | 2 583 | 800 759 | 42 792 | 5.3 |


(a) The results of Pass 10 were used to identify the blocking field to be used in Pass 11. As a result, there were no records output from Pass 10.

(b) Data presented in the table have been confidentialised. As a result the sum of individual categories may not align with totals.



The combined clerical review results indicate that the proportion of false links in the final ACLD file could be as low as 5%. By including a tolerance around these results and assuming a small false link rate for the deterministic passes, the false link rate for the ACLD is estimated to be about 5-10%. The passes that contained the highest proportion of false links were Pass 9 (21.4%), where family information was used to try to resolve unlinked records, and Pass 5 (19.8%), which used a broad geography (SA4) as the blocking field. Whilst this is only an approximate estimate, it does give an indication of the high level of overall quality, as assessed by reviewing a sample of over 2,500 record pairs.
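
The false link rates quoted above can be recomputed from the published counts in Table 3 by dividing the estimated false links in each pass by the links assigned in that pass. The sketch below does this; the variable and function names are illustrative only. Because the published counts are confidentialised, a recomputed rate can differ slightly from the published one (for example, Pass 8).

```python
# Per pass, from Table 3: (links assigned, estimated false links).
TABLE_3 = {
    1: (544_925, 0),
    2: (10_919, 0),
    3: (10_489, 997),
    4: (62_570, 9_929),
    5: (87_248, 17_274),
    6: (18_988, 1_832),
    7: (1_723, 237),
    8: (159, 29),
    9: (50_007, 10_712),
    11: (9_827, 1_051),
    12: (3_904, 731),
}


def false_link_rate(assigned: int, false_links: int) -> float:
    """Estimated false links as a percentage of links assigned."""
    return 100 * false_links / assigned


total_assigned = sum(a for a, _ in TABLE_3.values())
total_false = sum(f for _, f in TABLE_3.values())
overall = false_link_rate(total_assigned, total_false)
worst_pass = max(TABLE_3, key=lambda p: false_link_rate(*TABLE_3[p]))

print(f"Overall false link rate: {overall:.1f}%")
print(f"Pass with the highest false link rate: {worst_pass}")
```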

The linkage rate of 82% with a false link rate of 5% was broadly consistent with, or better than, other ABS Census linkage projects which did not use name and address as linkage variables (see Assessing the Likely Quality of the Statistical Longitudinal Census Dataset (cat. no. 1351.0.55.026)).

The conservative and restrictive nature of the blocking and linking strategy, together with the quality controls implemented during clerical review, helped to minimise the estimated number of false links throughout the linkage process.

About two-thirds (68%) of all links were achieved in the first pass of the project, which used a deterministic linking methodology to identify and filter matches. In Pass 1, a tight geographic and demographic restriction was implemented to maximise the number of high-quality links assigned and to limit the number of alternative comparisons required. Using this approach, links were only accepted if a single record pair was identified.
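
The idea of accepting a link only when a single record pair is identified can be sketched as below. This is an illustration under assumptions, not the Pass 1 rules themselves: the field names, the choice of key fields and the sample records are all hypothetical. Records are grouped by an exact key, and a link is kept only when that key identifies exactly one record in each wave.

```python
from collections import defaultdict

# Hypothetical records: the field names and values are illustrative only.
wave1 = [
    {"id": "a1", "mesh_block": "MB01", "sex": "F", "dob": "1970-03-14"},
    {"id": "a2", "mesh_block": "MB01", "sex": "M", "dob": "1985-11-02"},
    {"id": "a3", "mesh_block": "MB02", "sex": "M", "dob": "1990-07-21"},
]
wave2 = [
    {"id": "b1", "mesh_block": "MB01", "sex": "F", "dob": "1970-03-14"},
    {"id": "b2", "mesh_block": "MB01", "sex": "M", "dob": "1985-11-02"},
    {"id": "b3", "mesh_block": "MB01", "sex": "M", "dob": "1985-11-02"},
]

KEY_FIELDS = ("mesh_block", "sex", "dob")  # assumed key, for illustration


def key(record):
    """Exact-match key built from the assumed key fields."""
    return tuple(record[f] for f in KEY_FIELDS)


def deterministic_links(left, right):
    """Link records that agree on every key field, but only when the key
    identifies exactly one record on each side (a single record pair)."""
    left_by_key, right_by_key = defaultdict(list), defaultdict(list)
    for rec in left:
        left_by_key[key(rec)].append(rec["id"])
    for rec in right:
        right_by_key[key(rec)].append(rec["id"])
    links = []
    for k, left_ids in left_by_key.items():
        right_ids = right_by_key.get(k, [])
        if len(left_ids) == 1 and len(right_ids) == 1:
            links.append((left_ids[0], right_ids[0]))
    return links


links = deterministic_links(wave1, wave2)
print(links)
```

In this toy data, `a2` is left unlinked because two Wave 2 records share its key, and `a3` is left unlinked because no Wave 2 record does.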

3.1.2 Consistency of Common Information on Record Pairs

In data linkage projects, geographic boundaries function as blocking variables that restrict the search for record pairs. They are also used as linking variables, and when combined with other linking fields such as age, sex and date of birth, provide a high level of uniqueness, and reduce the likelihood of linking to an incorrect record.
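
To illustrate why blocking restricts the search, the sketch below counts candidate record pairs with and without a blocking variable. The block labels and record counts are hypothetical; only records that share a block value are compared when blocking is used.

```python
from collections import Counter


def comparisons_without_blocking(left_blocks, right_blocks):
    """Every record on the left is compared with every record on the right."""
    return len(left_blocks) * len(right_blocks)


def comparisons_with_blocking(left_blocks, right_blocks):
    """Only records sharing the same block value are compared."""
    left_counts, right_counts = Counter(left_blocks), Counter(right_blocks)
    return sum(n * right_counts[block] for block, n in left_counts.items())


# Hypothetical block value (for example, a geographic area) for each record.
left = ["SA2-A"] * 3 + ["SA2-B"] * 2
right = ["SA2-A"] * 4 + ["SA2-B"] * 1 + ["SA2-C"] * 5

full = comparisons_without_blocking(left, right)
blocked = comparisons_with_blocking(left, right)
print(f"Without blocking: {full} comparisons; with blocking: {blocked}")
```

The trade-off is that a true match whose blocking value differs between the two files is never compared, which is one reason records can remain unlinked.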

Table 4 displays the number of linked records with consistent information, grouped by the level of geography on which the record pairs agreed.

TABLE 4 - CONSISTENCY OF LINKED RECORDS, By geography and selected linking fields

| Consistency of key linkage fields(a)(b) | (no.) | (%) |
|---|---|---|
| Mesh Block combined with | | |
| Age exact, Sex, DOB Day and Month agree | 552 714 | 69.0 |
| Age exact, Sex agree | 41 135 | 5.1 |
| Age +/- 2 years, Sex agree | 7 798 | 1.0 |
| SA2 combined with | | |
| Age +/- 2 years, Sex, DOB Day and Month agree | 84 265 | 10.5 |
| Age +/- 2 years, Sex agree | 26 739 | 3.3 |
| SA4 combined with | | |
| Age +/- 2 years, Sex, DOB Day and Month agree | 66 623 | 8.3 |
| Total records included | 779 274 | 97.3 |
| Total records linked | 800 759 | 100 |

(a) Only includes records that agree on all key linking fields.
(b) Categories are mutually exclusive. Records that agree in each category are excluded from subsequent categories.

Just over 97% of all records that were matched in the ACLD linkage process agreed on small to medium levels of geographic area combined with other key linking fields, such as age, sex and date of birth. While the number of consistent fields can give a strong indication of likely linkage quality, other factors should be taken into account, for example, the expected number of people in a geographic area that are likely to share a characteristic by chance. A tolerance of plus or minus two years was used at certain parts of the linkage process to cater for persons who may have understated their age in 2006 and overstated it in 2011 or vice versa.
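
The mutually exclusive categories of Table 4 and the age tolerance can be sketched as a simple cascade: each record pair is placed in the first category it satisfies. This is an illustration only. It assumes that "age exact" means the 2011 age equals the 2006 age plus five years, and the dictionary keys describing a record pair are hypothetical.

```python
CENSUS_GAP_YEARS = 5  # assumption: the two Censuses are about five years apart


def age_agrees(age_2006, age_2011, tolerance=0):
    """True when the 2011 age is within `tolerance` years of the 2006 age plus five."""
    return abs(age_2011 - (age_2006 + CENSUS_GAP_YEARS)) <= tolerance


def consistency_category(pair):
    """Return the first Table 4 category the record pair satisfies, else None.

    `pair` is a hypothetical dict of agreement flags and reported ages.
    """
    exact = age_agrees(pair["age_2006"], pair["age_2011"])
    near = age_agrees(pair["age_2006"], pair["age_2011"], tolerance=2)
    sex, dob = pair["sex_agrees"], pair["dob_day_month_agree"]
    mb, sa2, sa4 = pair["mesh_block_agrees"], pair["sa2_agrees"], pair["sa4_agrees"]
    # Ordered as in Table 4, so each pair is counted once only.
    rules = [
        ("Mesh Block: age exact, sex, DOB", mb and exact and sex and dob),
        ("Mesh Block: age exact, sex", mb and exact and sex),
        ("Mesh Block: age +/- 2, sex", mb and near and sex),
        ("SA2: age +/- 2, sex, DOB", sa2 and near and sex and dob),
        ("SA2: age +/- 2, sex", sa2 and near and sex),
        ("SA4: age +/- 2, sex, DOB", sa4 and near and sex and dob),
    ]
    for label, satisfied in rules:
        if satisfied:
            return label
    return None


example = {
    "age_2006": 30, "age_2011": 36,  # one year more than expected
    "sex_agrees": True, "dob_day_month_agree": False,
    "mesh_block_agrees": False, "sa2_agrees": True, "sa4_agrees": True,
}
print(consistency_category(example))
```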

By contrast, record pairs may have inconsistent information and yet be a true link. Inconsistent information may be recorded for the same person in different Censuses due to a range of factors, including:
  • Transcription errors in the Census, where the wrong category is selected or the information is transposed, such as the day of birth being entered in the month field instead of the day field.
  • Data capture errors, where the Census form is scanned using Optical Character Recognition software and certain characters may be mis-classified, such as a 1 captured as a 7 or a 3 as an 8.
  • Reporting errors, where information is given for the wrong member of the household (e.g. person 1's information is reported for person 3) or where the person completing the Census estimates information that they do not know (e.g. about a fellow group household member).
  • Information that was not stated by the respondent and has been imputed as part of Census processing (such as age or sex).
  • A different person fills out the Census form at the different time points and interprets the questions differently.

3.1.2.1 Consistent Reporting of Indigenous Status

Consistency of Indigenous status is a special case, since the change in reporting over time is both a potential indicator of linkage quality and of analytical interest.

Results from the 2011 Census showed an unexpected increase in persons who identified as being of Aboriginal and/or Torres Strait Islander origin. This was due, in part, to improvements in Census collection practices that resulted in a more complete enumeration of the Aboriginal and Torres Strait Islander population in 2011 than in 2006. In addition, a significant contributor to this increase was a change in the propensity of people to identify as being of Aboriginal and/or Torres Strait Islander origin in 2011 compared with 2006 (see Census of Population and Housing: Understanding the Increase in Aboriginal and Torres Strait Islander Counts, 2006-2011 (cat. no. 2077.0)).

While there was a group of people in the ACLD who were identified as non-Indigenous in 2006 and of Aboriginal and/or Torres Strait Islander origin in 2011, this group was relatively small and was counterbalanced by an almost equally sized group who reported the opposite. This pattern of change is different to that expected, given the increasing propensity of people to identify their Aboriginal and Torres Strait Islander origin observed at the aggregate level in the entire 2011 Census.

Throughout the linkage process, Indigenous status was used as a blocking and linking variable. Whilst this would have only made a small contribution to the linkage weight, this may have increased the likelihood of assigning a link to a record pair that contained consistent information for Indigenous status. Record pairs that contained inconsistent information for Indigenous status still had a good chance of being linked, however, providing there was sufficient additional information available for linking.

Differences in the reporting of Indigenous status between 2006 and 2011 on the ACLD may be due to a range of reasons. These include:

  • people deliberately identifying their Indigenous origin differently at the two time points
  • false links, where similar but not identical persons have been linked
  • data capture errors, where multiple boxes may have been selected
  • a different person filling out the Census form at each period of time and interpreting the question on Indigenous status differently
  • transcription errors in the Census, where the wrong category is selected by accident.

Table 5 shows the reporting of Indigenous status for the linked records on the ACLD, across the 2006 and 2011 Censuses. Further data cubes, demonstrating a more detailed breakdown, by remoteness areas, are provided as an attachment to this Information paper.

TABLE 5 - CONSISTENCY OF INDIGENOUS STATUS FOR LINKED RECORDS, 2006 and 2011

| 2006 Indigenous status (rows) by 2011 Indigenous status (columns) | Non-Indigenous (no.) | Aboriginal and/or Torres Strait Islander (no.) | Not stated (no.) | Total (no.) |
|---|---|---|---|---|
| Non-Indigenous | 766 851 | 1 697 | 6 868 | 775 419 |
| Aboriginal and/or Torres Strait Islander | 1 367 | 13 274 | 165 | 14 802 |
| Not stated | 9 729 | 226 | 575 | 10 530 |
| Total(a) | 777 946 | 15 205 | 7 609 | 800 759 |


(a) Data presented in the table have been confidentialised. As a result, the sum of individual categories may not align with totals.
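
The size of the two groups who changed their reported Indigenous status can be read directly from Table 5. The sketch below uses the published counts to compare them and to compute the share of linked records that fell in the same category in both Censuses (including "Not stated" in both). The names are illustrative only.

```python
STATUSES = ("Non-Indigenous", "Aboriginal and/or Torres Strait Islander", "Not stated")

# Rows are 2006 status, columns are 2011 status (counts copied from Table 5).
TABLE_5 = {
    "Non-Indigenous": (766_851, 1_697, 6_868),
    "Aboriginal and/or Torres Strait Islander": (1_367, 13_274, 165),
    "Not stated": (9_729, 226, 575),
}
TOTAL_LINKED = 800_759

# Diagonal cells: same category recorded in 2006 and 2011.
consistent = sum(TABLE_5[status][i] for i, status in enumerate(STATUSES))
to_indigenous = TABLE_5["Non-Indigenous"][1]
to_non_indigenous = TABLE_5["Aboriginal and/or Torres Strait Islander"][0]

print(f"Same category in both Censuses: {100 * consistent / TOTAL_LINKED:.1f}%")
print(f"Non-Indigenous in 2006, Indigenous in 2011: {to_indigenous}")
print(f"Indigenous in 2006, non-Indigenous in 2011: {to_non_indigenous}")
print(f"Net change: {to_indigenous - to_non_indigenous}")
```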

3.2 CHARACTERISTICS OF LINKED AND UNLINKED 2006 CENSUS SAMPLE

Table 6 shows the distribution of key populations across the 2006 Census, the 2006 Census sample and the ACLD.


TABLE 6 - SELECTED CHARACTERISTICS, By 2006 Census, 2006 Census sample and ACLD

| Characteristic | 2006 Census (no.) | 2006 Census (%) | 2006 Census sample (no.) | 2006 Census sample (%) | ACLD original results (no.) | ACLD original results (%) | ACLD weighted results(a) (no.) | ACLD weighted results(a) (%) |
|---|---|---|---|---|---|---|---|---|
| Sex | | | | | | | | |
| Male | 9 896 500 | 49.3 | 480 285 | 49.0 | 390 487 | 48.8 | 9 193 092 | 49.4 |
| Female | 10 165 146 | 50.7 | 499 372 | 51.0 | 410 274 | 51.2 | 9 432 201 | 50.6 |
| State/Territory of usual residence | | | | | | | | |
| New South Wales | 6 549 174 | 32.6 | 323 136 | 33.0 | 263 369 | 32.9 | 6 093 946 | 32.7 |
| Victoria | 4 932 422 | 24.6 | 244 095 | 24.9 | 203 668 | 25.4 | 4 624 754 | 24.8 |
| Queensland | 3 904 531 | 19.5 | 192 606 | 19.7 | 154 013 | 19.2 | 3 635 806 | 19.5 |
| South Australia | 1 514 340 | 7.5 | 75 481 | 7.7 | 62 239 | 7.8 | 1 445 720 | 7.8 |
| Western Australia | 1 959 088 | 9.8 | 95 795 | 9.8 | 77 921 | 9.7 | 1 858 559 | 10.0 |
| Tasmania | 476 481 | 2.4 | 23 787 | 2.4 | 19 583 | 2.4 | 465 052 | 2.5 |
| Northern Territory | 192 899 | 1.0 | 8 469 | 0.9 | 6 226 | 0.8 | 179 713 | 1.0 |
| Australian Capital Territory | 324 034 | 1.6 | 16 186 | 1.7 | 13 680 | 1.7 | 319 439 | 1.7 |
| Age group (years) | | | | | | | | |
| 0-9 | 2 579 496 | 12.9 | 127 331 | 13.0 | 114 298 | 14.3 | 2 551 524 | 13.7 |
| 10-19 | 2 756 102 | 13.7 | 132 937 | 13.6 | 107 761 | 13.5 | 2 541 650 | 13.6 |
| 20-29 | 2 684 371 | 13.4 | 128 760 | 13.1 | 97 973 | 12.2 | 2 348 272 | 12.6 |
| 30-39 | 2 893 058 | 14.4 | 140 271 | 14.3 | 117 655 | 14.7 | 2 800 173 | 15.0 |
| 40-49 | 2 942 353 | 14.7 | 142 911 | 14.6 | 123 946 | 15.5 | 2 868 511 | 15.4 |
| 50-59 | 2 574 589 | 12.8 | 126 285 | 12.9 | 108 962 | 13.6 | 2 473 288 | 13.3 |
| 60-69 | 1 733 297 | 8.6 | 86 385 | 8.8 | 71 906 | 9.0 | 1 640 081 | 8.8 |
| 70-79 | 1 168 675 | 5.8 | 58 277 | 5.9 | 42 262 | 5.3 | 993 870 | 5.3 |
| 80 and over | 729 705 | 3.6 | 36 502 | 3.7 | 16 002 | 2.0 | 408 018 | 2.2 |
| Indigenous status | | | | | | | | |
| Non-Indigenous | 18 266 814 | 91.1 | 942 253 | 96.2 | 775 419 | 96.8 | 17 806 585 | 95.6 |
| Aboriginal and/or Torres Strait Islander | 455 027 | 2.3 | 21 985 | 2.2 | 14 802 | 1.8 | 561 088 | 3.0 |
| Aboriginal | 407 700 | 2.0 | 19 697 | 2.0 | 13 340 | 1.7 | 507 554 | 2.7 |
| Torres Strait Islander | 29 515 | 0.1 | 1 449 | 0.1 | 923 | 0.1 | 32 876 | 0.2 |
| Both Aboriginal and Torres Strait Islander | 17 812 | 0.1 | 839 | 0.1 | 543 | 0.1 | 20 805 | 0.1 |
| Not stated | 1 133 449 | 5.6 | 15 416 | 1.6 | 10 530 | 1.3 | 257 343 | 1.4 |
| Overseas visitor | 206 357 | 1.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 |
| Total(b)(c)(d) | 20 061 646 | 100 | 979 661 | 100 | 800 759 | 100 | 18 625 246 | 100 |

(a) For more information on weighting see chapter 3.4.
(b) Data presented in the table have been confidentialised. As a result, the sum of individual categories may not align with totals.
(c) Includes Other Territories.
(d) Includes Migratory areas.



The distribution of the ACLD file by sub-population was generally well aligned with both the 2006 Census sample and the entire 2006 Census. When looking at the relative difference between these proportions, however, some differences are more clearly observed.

Compared with the entire 2006 Census, the linked ACLD contains relatively more records for people aged 0-9 years, and to a lesser extent those aged 40-49 years, 50-59 years and 60-69 years. By contrast, the ACLD contains relatively fewer records for people aged 20-29 years and 80 years and over. There are also relatively fewer people of Aboriginal and Torres Strait Islander origin in the ACLD than in the entire 2006 Census (1.8% compared with 2.3%). The corresponding weighted estimate, however, represents 3.0% of the total population, which is attributed to benchmarking the 2006 sample to the Aboriginal and Torres Strait Islander population in 2011 and therefore to the higher level of identification observed in the 2011 Census than in 2006 (see section 3.4).
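
One way to express the relative differences discussed above is the change in a group's share of records, as a percentage of its share in the 2006 Census. This definition is an assumption made for illustration, since the paper does not give a formula. The shares are copied from Table 6.

```python
# Percentage shares copied from Table 6: (2006 Census, ACLD original results).
SHARES = {
    "0-9": (12.9, 14.3),
    "20-29": (13.4, 12.2),
    "40-49": (14.7, 15.5),
    "80 and over": (3.6, 2.0),
    "Aboriginal and/or Torres Strait Islander": (2.3, 1.8),
}


def relative_difference(census_pct: float, acld_pct: float) -> float:
    """Percentage by which the ACLD share differs from the Census share."""
    return 100 * (acld_pct - census_pct) / census_pct


diffs = {
    group: round(relative_difference(census, acld), 1)
    for group, (census, acld) in SHARES.items()
}
for group, diff in diffs.items():
    print(f"{group}: {diff:+.1f}%")
```

On this measure, a gap of 1.6 percentage points for people aged 80 and over is a much larger relative shortfall than a gap of 1.2 points for those aged 20-29.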

In general, the distribution of weighted counts for the linked ACLD file is close to that of the entire 2006 Census, but it is not designed to produce counts corresponding to the population in 2006. Rather, the weighted population is that of people who were in scope of both the 2006 and 2011 Censuses (see section 3.4). Thus, for example, the lower proportion of older people in the linked file, even after weighting, reflects the impact on the 2006 sample of deaths that occurred between 2006 and 2011.

Further data cubes, demonstrating more detailed population distributions, are provided as an attachment to this Information paper.

3.3 REASONS FOR UNLINKED RECORDS

There are two main reasons why records from the 2006 Census sample were not linked to a 2011 Census record:
  1. Records belonging to the same individual were present in the 2006 Census sample and the 2011 Census but these records failed to be linked because they contained missing or inconsistent information.
  2. There was no 2011 Census record corresponding to the 2006 Census sample record because the person was not counted in the 2011 Census.

3.3.1 Missing and/or Inconsistent Information

In these cases, the true match was present in the pool of all record pairs but it was not identified because there was a high level of inconsistency between information on the 2006 Census sample and the 2011 Census record, or key linking fields were missing altogether. The reasons for the match being missed can be categorised into the following groups:
  • The missing or inconsistent information did not allow the record pair to be compared in the same blocking categories and could not be linked.
  • The record pair did not contain enough common information to distinguish the match from other potential record pairs.
  • The record pair was linked, but was attributed a low link weight as it contained a lot of missing or inconsistent information and was positioned below the cut-off identified in sample clerical review.
  • The record pair was subjected to clerical review, but the high level of inconsistency did not enable it to be deemed a link.

Accurate address coding was crucial in narrowing the search and differentiating between true and false links. It was a particular challenge for persons who had moved, since linkage was then dependent on the information supplied in 2011 about the person's address in 2006. Processing for the 2011 Census involved coding for address five years ago to a fine level of geography, ideally Mesh block. This was not always possible, either due to the insufficient detail of address information supplied or because by 2011, Census respondents may not have accurately remembered their address on Census Night in 2006.

3.3.2 No 2011 Census Record

A person included in the 2006 Census sample may have had no equivalent 2011 Census record because they were no longer in scope for the Census due to migration from Australia, or death between 2006 and 2011, or they may simply have been missed in the Census.

According to mortality data compiled by the ABS from data supplied by the Registrars of Births, Deaths and Marriages, about 700,000 people died in Australia between 2006 and 2011. If 5% of these people were represented in the 2006 sample, then it could be expected that up to 35,000 people could not have been linked due to death between 2006 and 2011. Similarly, migration data shows that just over one million people left Australia as permanent emigrants over the same period, potentially resulting in up to 50,000 people from the 2006 Census sample being unlikely to have a corresponding 2011 Census record.

Due to the size and complexity of the Census, it is inevitable that some people are missed and some are counted more than once. It is for this reason that the Census Post Enumeration Survey (PES) is run shortly after each Census, to provide an independent measure of Census coverage. The PES determines how many people should have been counted in the Census, how many were missed (undercount), and how many were counted more than once (overcount). It also provides information on the characteristics of those in the population who have been missed or overcounted.

The net undercount rate for the 2011 Census was 1.7%, with a higher rate for Aboriginal and Torres Strait Islander people than for the non-Indigenous population (see Census of Population and Housing - Details of Undercount, 2011 (cat. no. 2940.0)). Thus, roughly 15,000 people from the 2006 Census sample could have been missed in the 2011 Census. This estimate is a starting point only and does not take into account the likelihood of people being missed in successive Censuses.

When taking into account all of these factors, it is estimated that over half of the unlinked 2006 Census sample (100,000 out of the 180,000 unlinked records) would not have a corresponding record in the 2011 Census. This would indicate that the initial linkage rate of 82% could be representative of up to 91% of the population that actually had an opportunity to be linked.
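
The arithmetic behind these estimates can be laid out as below. It is one reading of the text rather than a published formula: the adjusted rate is assumed to be the number of linked records divided by the sample size less the records with no expected 2011 counterpart. The inputs are the rounded figures quoted above.

```python
SAMPLE_FRACTION = 0.05  # the 2006 Census sample is a 5% sample
SAMPLE_SIZE = 979_661   # records in the 2006 Census sample
LINKED = 800_759        # records linked to a 2011 Census record

deaths = 700_000         # approximate deaths in Australia, 2006 to 2011
emigrants = 1_000_000    # "just over one million" permanent emigrants
missed_in_2011 = 15_000  # rough figure quoted in the text for 2011 undercount

expected_deaths = round(deaths * SAMPLE_FRACTION)
expected_emigrants = round(emigrants * SAMPLE_FRACTION)
no_2011_record = expected_deaths + expected_emigrants + missed_in_2011

unlinked = SAMPLE_SIZE - LINKED
# Assumed denominator: sample records that could have had a 2011 record.
adjusted_rate = 100 * LINKED / (SAMPLE_SIZE - no_2011_record)

print(f"Unlinked records: {unlinked}")
print(f"Expected to have no 2011 record: {no_2011_record}")
print(f"Adjusted linkage rate: {adjusted_rate:.0f}%")
```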

3.4 WEIGHTING

Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, persons. The weight can be considered an indication of how many people in the relevant population are represented by each person in the sample. Weights were created for linked records in the ACLD to enable longitudinal population estimates to be produced. Cross-sectional population estimates for 2006 and 2011 are available from each Census.

The ACLD began as a random sample of 5% of the Australian population in 2006. As such, each person in the sample should represent about 20 people in the population. Between Censuses, however, the in-scope population changes as people die or move overseas. In addition, Census net undercount and data quality can affect the capacity to link equivalent records across waves. The ACLD weighting process benchmarked the linked ACLD records to the population that was in scope of both the 2006 and 2011 Censuses. The weights were based on four components: the design weight, undercoverage adjustment, missed link adjustment and population benchmarking.

The original population benchmark was the 2011 Estimated Resident Population (ERP). The 2011 ERP was chosen over the 2006 ERP as the baseline population as it is more recent. The ERP was then adjusted to exclude births and overseas arrivals that had occurred between 2006 and 2011.

Weights were benchmarked to the following population groups:
  • state by age (ten year groups), by sex, by mobility (interstate arrivals benchmarked separately)
  • Indigenous status by state.

The weights have a mean value of 24 and range between 17 and 103. Higher weights are associated with people of Aboriginal and Torres Strait Islander origin and people who moved interstate between 2006 and 2011. For more information see the Appendix.
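
As a simplified sketch of two of the four weighting components, the code below starts every linked record at the design weight implied by a 5% sample and then scales the weights within each benchmark group so that they sum to that group's benchmark population. It is not the ABS procedure: the undercoverage and missed link adjustments are omitted, only one set of benchmark groups is used, and the records, group labels and benchmark values are hypothetical.

```python
from collections import defaultdict

DESIGN_WEIGHT = 1 / 0.05  # a 5% sample: each person represents about 20 people


def benchmark(records, benchmarks):
    """Scale weights within each cell so they sum to the cell's benchmark."""
    cell_totals = defaultdict(float)
    for rec in records:
        cell_totals[rec["cell"]] += rec["weight"]
    return [
        {**rec, "weight": rec["weight"] * benchmarks[rec["cell"]] / cell_totals[rec["cell"]]}
        for rec in records
    ]


# Hypothetical linked records; "cell" stands in for a benchmark group.
records = [
    {"id": 1, "cell": "NSW/F/30-39", "weight": DESIGN_WEIGHT},
    {"id": 2, "cell": "NSW/F/30-39", "weight": DESIGN_WEIGHT},
    {"id": 3, "cell": "NT/M/20-29", "weight": DESIGN_WEIGHT},
]
benchmarks = {"NSW/F/30-39": 46.0, "NT/M/20-29": 31.0}  # hypothetical populations

weighted = benchmark(records, benchmarks)
for rec in weighted:
    print(rec["id"], rec["cell"], rec["weight"])
```

Groups with relatively few linked records for their benchmark population end up with larger weights, which is consistent with the higher weights reported for some groups above.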

