DECISION MODEL
A decision rule determines whether a record pair is linked, not linked or considered further as a possible link. The first phase of this process is automated: each record is assigned to its best possible pairing, a process known as one-to-one assignment. Ideally (and often in practice), each record has a single, obvious best pairing, which is its true match.
Linking projects in the ABS have typically used an auction algorithm to optimally assign each record on the first dataset to one record on the second dataset. The auction algorithm chooses among alternative assignments so as to maximise the sum of the comparison weights of all assigned record pairs. For example, if record A1 on File A links well to records B1 and B2 on File B, but record A2 links well only to B2, the auction algorithm will assign A1 to B1 and A2 to B2, maximising the overall comparison weight across all record pairs.
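The one-to-one assignment described above can be illustrated with a minimal sketch of a forward auction (the method introduced by Bertsekas). It assumes a square matrix of integer comparison weights in which every File A record may be paired with every File B record; the function name `auction_assign`, the `eps` parameter and the example weights are hypothetical, and this is not the ABS production implementation. Real linkage files are usually rectangular and sparse, which needs extra handling (for example, padding with dummy records).

```python
from typing import List, Optional


def auction_assign(weights: List[List[int]], eps: Optional[float] = None) -> List[int]:
    """Forward auction on a square n x n weight matrix (hypothetical helper).

    Returns assigned[i] = index of the File B record given to File A record i,
    maximising the total weight. With integer weights and eps < 1/n the final
    assignment is optimal.
    """
    n = len(weights)
    if eps is None:
        eps = 1.0 / (n + 1)  # strictly below 1/n, enough for integer weights
    prices = [0.0] * n   # current "price" of each File B record
    owner = [None] * n   # File B index -> File A record currently holding it
    assigned = [None] * n  # File A index -> File B record it holds
    unassigned = list(range(n))  # File A records still bidding

    while unassigned:
        i = unassigned.pop()
        # Net value of each File B record to bidder i at current prices
        values = [weights[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=values.__getitem__)
        second = max((values[j] for j in range(n) if j != best), default=values[best])
        # Raise the price by the margin over the runner-up, plus eps
        prices[best] += values[best] - second + eps
        previous = owner[best]
        if previous is not None:
            assigned[previous] = None
            unassigned.append(previous)  # outbid record re-enters the auction
        owner[best] = i
        assigned[i] = best

    return assigned  # every entry is set once no bidders remain


# A1: B1=8, B2=10; A2: B1=0, B2=9 (illustrative weights only)
print(auction_assign([[8, 10], [0, 9]]))  # [0, 1]: A1 -> B1, A2 -> B2
```

A greedy rule that simply took the single heaviest pair first would give A1 to B2 (weight 10) and leave A2 with B1 (weight 0), a total of 10; the auction instead reaches the total of 17 from A1 to B1 and A2 to B2.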
The second phase of the decision rule stage takes the output of one-to-one assignment and decides which pairs should be retained as links and which should be rejected as non-links. This is done by defining cut-off weights against which record pair comparison weights are evaluated. The simplest decision rule uses a single cut-off: all record pairs with a weight greater than or equal to the cut-off are assigned as links, and all pairs with a weight less than the cut-off are assigned as non-links. A more sophisticated decision rule, used in the Death registrations to Census linkage project, employs lower and upper cut-offs. Record pairs with a weight above the upper cut-off are declared links, while those with a weight below the lower cut-off are declared non-links. Record pairs with weights between the lower and upper cut-offs are designated for clerical review.
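The two decision rules described above can be written as a short sketch. The function names, the `Decision` enumeration and the example cut-off values are hypothetical; the handling of a weight exactly equal to a cut-off under the two-cut-off rule (sent to clerical review) is an assumption, since the text only defines weights strictly above or below the cut-offs.

```python
from enum import Enum


class Decision(Enum):
    LINK = "link"
    NON_LINK = "non-link"
    CLERICAL_REVIEW = "clerical review"


def single_cutoff_rule(weight: float, cutoff: float) -> Decision:
    """Link if weight >= cutoff, otherwise non-link."""
    return Decision.LINK if weight >= cutoff else Decision.NON_LINK


def two_cutoff_rule(weight: float, lower: float, upper: float) -> Decision:
    """Link above `upper`, non-link below `lower`, clerical review in between.

    Assumption: a weight exactly equal to either cut-off goes to clerical review.
    """
    if lower > upper:
        raise ValueError("lower cut-off must not exceed upper cut-off")
    if weight > upper:
        return Decision.LINK
    if weight < lower:
        return Decision.NON_LINK
    return Decision.CLERICAL_REVIEW


# Hypothetical cut-offs: lower = 10, upper = 20
for w in (25.0, 15.0, 5.0):
    print(w, two_cutoff_rule(w, lower=10.0, upper=20.0).value)
```

Widening the gap between the two cut-offs sends more pairs to clerical review, trading reviewer effort for fewer automatic errors.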
Note that even where the original data is of very high quality, the information on equivalent records may not be identical across all the blocking and linking variables. For this reason, several ‘passes’ are used to maximise the opportunity for equivalent records to be linked, with a different combination of blocking and linking variables in each pass. Records on either dataset that are not linked in one pass are included in the pool of possible links for the next pass.
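The multi-pass structure can be sketched as follows. To keep the example short, each pass links records by exact agreement on a tuple of variables, a deliberate simplification standing in for the full blocking, weighting and one-to-one assignment performed in each real pass. The function name, variable names and records are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def multi_pass_link(
    file_a: Dict[str, Record],
    file_b: Dict[str, Record],
    passes: Sequence[Tuple[str, ...]],
):
    """Link records over several passes; unlinked records carry forward.

    Returns (links, unlinked_a, unlinked_b), where links maps a File A id
    to (File B id, pass number). Simplification: a pass links records that
    agree exactly on all of that pass's variables.
    """
    links: Dict[str, Tuple[str, int]] = {}
    pool_a = dict(file_a)  # records still available for linking
    pool_b = dict(file_b)
    for pass_no, variables in enumerate(passes, start=1):
        # Index the remaining File B records by this pass's variables
        index: Dict[tuple, List[str]] = {}
        for b_id, rec in pool_b.items():
            key = tuple(rec.get(v) for v in variables)
            if None not in key:  # skip records missing a variable in this pass
                index.setdefault(key, []).append(b_id)
        for a_id, rec in list(pool_a.items()):
            key = tuple(rec.get(v) for v in variables)
            candidates = index.get(key) if None not in key else None
            if candidates:
                b_id = candidates.pop(0)  # one-to-one: each B record used once
                links[a_id] = (b_id, pass_no)
                del pool_a[a_id]
                del pool_b[b_id]
    return links, pool_a, pool_b


file_a = {"a1": {"surname": "SMITH", "dob": "1950-01-02", "sex": "M"},
          "a2": {"surname": "JONES", "dob": "1948-07-17", "sex": "F"}}
file_b = {"b1": {"surname": "SMITH", "dob": "1950-01-02", "sex": "M"},
          "b2": {"surname": "JONS", "dob": "1948-07-17", "sex": "F"}}
# Pass 1 uses surname, dob and sex; pass 2 drops surname
print(multi_pass_link(file_a, file_b, [("surname", "dob", "sex"), ("dob", "sex")]))
```

Here the misspelt surname prevents a2 and b2 from linking in the first pass, but because both stay in the pool they are linked in the second pass, which does not use surname.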
In clerical review, each record pair is manually inspected to resolve its match status. A clerical reviewer can often use information that cannot be captured in the automated comparison process, such as variations in names and common transcription errors (e.g. confusing 1 and 7). Reviewed record pairs are either accepted as links or rejected as non-links.
To establish the upper and lower cut-off values, a sample of the record pairs is clerically reviewed, which allows the number of false links to be estimated. In the 2011 Death registrations to Census linkage project, the upper cut-offs were set at a weight such that no false links were detected above them. In the fifth pass, neither sampling nor one-to-one assignment was used; instead, all potential links for the remaining unlinked Aboriginal and Torres Strait Islander deaths were manually reviewed. In all passes, any record pair that included an Aboriginal and Torres Strait Islander death was also clerically reviewed, even if its link weight was below the lower cut-off.
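Using the clerically reviewed sample to set and check cut-offs can be sketched as below. This is an illustrative approach, not the ABS estimation method: it assumes each sampled pair carries its comparison weight and the reviewer's verdict, and that pairs were sampled at random within the weight band being estimated. All names and numbers are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Reviewed = Tuple[float, bool]  # (comparison weight, verdict: True = true link)


def upper_cutoff_no_false_links(sample: Sequence[Reviewed]) -> Optional[float]:
    """Highest weight of any sampled false link, or None if none were found.

    If links are accepted only strictly above this weight, no false link
    detected in the sample lies above the cut-off.
    """
    false_weights = [w for w, is_true in sample if not is_true]
    return max(false_weights) if false_weights else None


def estimate_false_links(sample: Sequence[Reviewed],
                         population_weights: Sequence[float],
                         lo: float, hi: float) -> Optional[float]:
    """Estimated number of false links among all pairs with lo <= weight < hi.

    Assumes the sampled pairs in the band are a simple random sample of it.
    """
    band = [is_true for w, is_true in sample if lo <= w < hi]
    if not band:
        return None
    false_rate = band.count(False) / len(band)
    n_in_band = sum(1 for w in population_weights if lo <= w < hi)
    return false_rate * n_in_band


sample = [(30, True), (25, True), (22, False), (20, True), (18, False), (15, False)]
print(upper_cutoff_no_false_links(sample))  # 22
print(estimate_false_links(sample, [24, 23, 21, 19, 16, 12, 40], 15, 25))  # 3.75
```

In practice the estimate for a band is only as reliable as the number of pairs sampled from it, so bands near a proposed cut-off typically need the heaviest sampling.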
Considerable resources were therefore devoted to clerical review to give greater control over quality. This achieved:
- a reduction in the number of false links, since a high upper cut-off weight could be chosen for automatically assigning record pairs as links
- tailored clerical review, allowing specific subpopulations, such as potential Aboriginal and Torres Strait Islander links, to be targeted.
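Tailored clerical review amounts to adding an override to the two-cut-off rule: pairs from a targeted subpopulation are never rejected automatically. The sketch below assumes a boolean flag marking such pairs (for example, pairs involving an Aboriginal and Torres Strait Islander death); the function and parameter names are hypothetical, and pairs above the upper cut-off are still accepted automatically, as the text describes review only for pairs below it.

```python
def route_pair(weight: float, lower: float, upper: float,
               targeted: bool = False) -> str:
    """Return 'link', 'non-link' or 'clerical' for one record pair.

    `targeted` (hypothetical flag) marks a pair from a subpopulation chosen
    for tailored review; such pairs go to clerical review instead of being
    rejected when their weight falls below the lower cut-off.
    """
    if weight > upper:
        return "link"
    if weight < lower and not targeted:
        return "non-link"
    return "clerical"  # between cut-offs, or a targeted pair below the lower cut-off


print(route_pair(5.0, lower=10.0, upper=20.0))                 # non-link
print(route_pair(5.0, lower=10.0, upper=20.0, targeted=True))  # clerical
```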