2080.5 - Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2011  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 18/12/2013   

3.1.1 LINKAGE RATES, TRUE AND FALSE LINKS

Not all record pairs assigned as links in a data linkage exercise are matches, that is, record pairs belonging to the same individual. While the methodology is designed to ensure that the vast majority of links are true, some are nevertheless false. The linkage strategy used for the ACLD was designed to achieve both a high number of links and a high level of accuracy, to enable longitudinal research. Accordingly, the strategy was restrictive and conservative, especially in the early passes.

The results of clerical review were analysed to determine the quality of the linkage process and to estimate the number of true links in the linked ACLD file. This involved calculating the proportion of rejected record pairs at each linkage weight and determining the number of false links this would represent in the final output file.
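The calculation described above can be sketched as follows. This is a minimal illustration rather than the ABS implementation: the function name, the linkage weights and all counts are hypothetical, and the sketch assumes that the rejection rate observed in the clerical review sample at a given weight applies to every link assigned at that weight.

```python
def estimate_false_links(review_by_weight, links_by_weight):
    """Scale clerical-review rejection rates up to the full set of links.

    review_by_weight: {weight: (pairs_sampled, pairs_rejected)}
    links_by_weight:  {weight: links_assigned}
    Returns (estimated_false_links, overall_false_link_rate).
    """
    total_links = 0
    total_false = 0.0
    for weight, n_links in links_by_weight.items():
        sampled, rejected = review_by_weight[weight]
        # Proportion of reviewed pairs rejected at this linkage weight.
        rejection_rate = rejected / sampled if sampled else 0.0
        # Assumption: the sample rate holds for all links at this weight.
        total_false += rejection_rate * n_links
        total_links += n_links
    overall_rate = total_false / total_links if total_links else 0.0
    return total_false, overall_rate


# Hypothetical figures for illustration only (not ACLD data).
review = {20: (50, 1), 15: (100, 10), 10: (100, 30)}
links = {20: 8000, 15: 1500, 10: 500}

false_links, rate = estimate_false_links(review, links)
print(round(false_links), f"{rate:.1%}")
```

In this made-up example most links sit at the highest weight, where few sampled pairs are rejected, so the overall false link rate stays low even though the lowest weight has a high rejection rate.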

Table 3 summarises the results of clerical review, including an estimate of the number of false links accepted in each pass. Due to the nature of deterministic linking and the way in which linked records were retained, no false links were identified in passes 1 and 2. While it is assumed that all links assigned in these passes were true, as they contained consistent information across all key linking fields, in reality there may have been a small but unquantifiable number of false links.


TABLE 3 - LINKAGE RESULTS, By pass number

Pass            Links       Sampled in      Links  Total false  False link
number(a)     created  clerical review   assigned        links        rate
                (No.)            (No.)      (No.)        (No.)         (%)

1             559 182               30    544 925            0           0
2             131 575               30     10 919            0           0
3              11 131              240     10 489          997         9.5
4             182 285              400     62 570        9 929        15.9
5             212 071              400     87 248       17 274        19.8
6              57 713              345     18 988        1 832         9.6
7              10 489              206      1 723          237        13.7
8              10 156              120        159           29        18.4
9             236 180              411     50 007       10 712        21.4
11            133 555              201      9 827        1 051        10.7
12             29 911              200      3 904          731        18.7
Total(b)    1 574 248            2 583    800 759       42 792         5.3


(a) The results of Pass 10 were used to identify the blocking field to be used in Pass 11. As a result, there were no records output from Pass 10.

(b) Data presented in the table have been confidentialised. As a result the sum of individual categories may not align with totals.
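As a worked check of how the columns of Table 3 relate, the false link rate can be recomputed as total false links divided by links assigned. The sketch below is illustrative only: the figures are copied from Table 3, the helper name is hypothetical, and because the table data have been confidentialised (note (b)) a recomputed rate may differ slightly from the published one.

```python
# Links assigned and total false links per pass, copied from Table 3.
links_assigned = {
    1: 544925, 2: 10919, 3: 10489, 4: 62570, 5: 87248, 6: 18988,
    7: 1723, 8: 159, 9: 50007, 11: 9827, 12: 3904,
}
false_links = {
    1: 0, 2: 0, 3: 997, 4: 9929, 5: 17274, 6: 1832,
    7: 237, 8: 29, 9: 10712, 11: 1051, 12: 731,
}


def false_link_rate(false_count, assigned_count):
    """False links as a percentage of links assigned."""
    return 100.0 * false_count / assigned_count


# Per-pass rates, recomputed from the two columns above.
for pass_no, assigned in links_assigned.items():
    rate = false_link_rate(false_links[pass_no], assigned)
    print(f"Pass {pass_no}: {rate:.1f}%")

# Overall rate and the share of all assigned links that came from Pass 1.
total_assigned = sum(links_assigned.values())
overall = false_link_rate(sum(false_links.values()), total_assigned)
pass1_share = 100.0 * links_assigned[1] / total_assigned
print(f"Overall: {overall:.1f}%  Pass 1 share of links: {pass1_share:.0f}%")
```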



The combined clerical review results indicate that the proportion of false links in the final ACLD file could be as low as 5%. By including a tolerance around these results and assuming a small false link rate for the deterministic passes, the false link rate for the ACLD is estimated to be about 5-10%. The passes with the highest proportion of false links were Pass 9 (21.4%), where family information was used to try to resolve unlinked records, and Pass 5 (19.8%), which used a broad geography (SA4) as the blocking field. Although this is only an approximate estimate, based on a clerical review sample of over 2,500 record pairs, it gives an indication of the high overall quality of the linkage.

The linkage rate of 82% with a false link rate of 5% was broadly consistent with, or better than, other ABS Census linkage projects which did not use name and address as linkage variables (see Assessing the Likely Quality of the Statistical Longitudinal Census Dataset (cat. no. 1351.0.55.026)).

The conservative and restrictive nature of the blocking and linking strategy, together with the quality controls implemented during clerical review, helped to minimise the number of false links throughout the linkage process.

About two-thirds (68%) of all links were achieved in the first pass of the project, which used a deterministic linking methodology to identify and filter matches. In Pass 1, a tight geographic and demographic restriction was implemented to maximise the number of high-quality links assigned and to limit the number of alternative comparisons required. Using this approach, links were only accepted if a single record pair was identified.
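The idea of accepting a deterministic link only when a single record pair is identified can be sketched as below. This is a minimal illustration under stated assumptions, not the ACLD method itself: the function name, the field names (area, sex, dob) and the records are hypothetical, and the sketch assumes the linking fields have already been made comparable between the two files.

```python
from collections import defaultdict


def deterministic_unique_links(records_a, records_b, key_fields):
    """Link records that agree exactly on every key field.

    A link is accepted only when exactly one record in each file
    carries the key, i.e. a single record pair is identified.
    """
    def index(records):
        by_key = defaultdict(list)
        for rec in records:
            by_key[tuple(rec[f] for f in key_fields)].append(rec["id"])
        return by_key

    idx_a, idx_b = index(records_a), index(records_b)
    links = []
    for key, ids_a in idx_a.items():
        ids_b = idx_b.get(key, [])
        # Reject ambiguous keys: more than one candidate on either side.
        if len(ids_a) == 1 and len(ids_b) == 1:
            links.append((ids_a[0], ids_b[0]))
    return links


# Hypothetical records for illustration only.
file_2006 = [
    {"id": "a1", "area": "X", "sex": "F", "dob": "1980-02-01"},
    {"id": "a2", "area": "X", "sex": "M", "dob": "1975-07-15"},
    {"id": "a3", "area": "X", "sex": "M", "dob": "1975-07-15"},
    {"id": "a4", "area": "Y", "sex": "F", "dob": "1990-11-30"},
]
file_2011 = [
    {"id": "b1", "area": "X", "sex": "F", "dob": "1980-02-01"},
    {"id": "b2", "area": "X", "sex": "M", "dob": "1975-07-15"},
    {"id": "b3", "area": "Z", "sex": "F", "dob": "1990-11-30"},
]

links = deterministic_unique_links(file_2006, file_2011, ["area", "sex", "dob"])
print(links)
```

Here a2 and a3 share the same key, so neither is linked to b2, and a4 is left unlinked because its area differs from b3. Records left unlinked in this way would remain available for later, less restrictive passes.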


