3302.0.55.004 - Information Paper: Death registrations to Census linkage project - Methodology and Quality Assessment, 2011-2012  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 18/09/2013  First Issue

COMPARISON FUNCTIONS

Before calculating m and u probabilities for some variables, it is first necessary to define what constitutes agreement. Typical comparison functions include:

  • exact match (e.g. Sex). Agreement occurs only when the two field values are identical. This criterion is used for most linking fields.
  • approximate string comparison (e.g. Name). Two strings may be said to agree in spite of a certain proportion of missing, differing, or transposed characters, which allows for misspellings, transcriptions of poor handwriting, etc. Approximate string comparators allow for partial agreement when the strings being compared are similar but not identical, and can be used to ensure that both identical and similar string pairs are defined to agree.
  • numeric difference (e.g. Age). A pair may be defined to agree if their field values differ by an amount less than or equal to a specified maximum difference.

For further details on comparison functions used for linkage, see Christen & Churches (2005).
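The three comparison functions described above can be sketched roughly as follows. This is a minimal illustration, not the comparators used by the linking software in this project: the function names are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whichever approximate string comparator was actually applied. The default threshold of 0.85 is the value listed for string fields in Table 2.4.

```python
from difflib import SequenceMatcher


def exact_match(a, b):
    """Agree only when both values are present and identical."""
    return a is not None and b is not None and a == b


def approx_string_match(a, b, threshold=0.85):
    """Agree when a similarity score in [0, 1] reaches the threshold.

    Assumption: SequenceMatcher.ratio() is used purely as a stand-in
    similarity measure; the source does not name the comparator.
    """
    if not a or not b:
        return False
    score = SequenceMatcher(None, a.strip().upper(), b.strip().upper()).ratio()
    return score >= threshold


def numeric_match(a, b, max_diff):
    """Agree when the absolute difference is at most max_diff."""
    if a is None or b is None:
        return False
    return abs(a - b) <= max_diff
```

Treating a missing value as non-agreement is a design choice made for this sketch only; the source does not say how missing values were handled.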

Alternatively, near or partial agreement may be factored into the linking process when m and u probabilities are converted to weights. For example, a person’s age on equivalent records will frequently be an exact match, and the m and u probabilities are calculated on the basis of exact agreement. During linkage, however, a partial agreement weight was given to ages within two years of each other.
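As a rough illustration of how a partial agreement weight might sit between full agreement and disagreement, the sketch below uses the log-ratio weights common in probabilistic linkage (agreement weight log2(m/u), disagreement weight log2((1-m)/(1-u))). The source does not state the weight formula or how the partial weight was derived, so the formula, the `partial_fraction` parameter, and the example m and u values are all assumptions made for illustration.

```python
import math


def agreement_weight(m, u):
    """Weight added when a field agrees (assumed log2 ratio form)."""
    return math.log2(m / u)


def disagreement_weight(m, u):
    """Weight added (usually negative) when a field disagrees."""
    return math.log2((1 - m) / (1 - u))


def age_weight(age_a, age_b, m, u, tolerance=2, partial_fraction=0.5):
    """Full weight for identical ages, a partial weight within the
    tolerance, and the disagreement weight otherwise.

    partial_fraction is a hypothetical scaling factor, not a value
    taken from the source.
    """
    diff = abs(age_a - age_b)
    if diff == 0:
        return agreement_weight(m, u)
    if diff <= tolerance:
        return partial_fraction * agreement_weight(m, u)
    return disagreement_weight(m, u)
```

With illustrative values m = 0.9 and u = 0.1, identical ages receive a positive weight, ages two years apart receive half that weight, and ages three or more years apart receive a negative weight.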

Table 2.4 displays the comparator types and tolerances applied to linking fields in this project. Comparator types were changed and tolerances were relaxed for some linking fields in later passes of the linkage, in order to broaden the search for remaining unlinked records.

Blocking fields, linking fields, comparator types, and m and u probabilities are input to linking software. Records which agree on the blocking variable(s) are compared on all linking fields.
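The role of blocking can be sketched as follows: only record pairs that agree on every blocking variable become candidates for comparison on the linking fields. The record layout, the field names (`sex`, `mob`) and the choice of blocking variables are hypothetical; the source does not describe the linking software's interface.

```python
from collections import defaultdict


def candidate_pairs(file_a, file_b, blocking_fields):
    """Yield (record_a, record_b) pairs that agree on all blocking fields."""
    # Index the second file by its blocking key.
    blocks = defaultdict(list)
    for rec in file_b:
        key = tuple(rec.get(f) for f in blocking_fields)
        if None not in key:  # assumption: records missing a key are skipped
            blocks[key].append(rec)

    # Pair each record in the first file with records in the same block.
    for rec_a in file_a:
        key = tuple(rec_a.get(f) for f in blocking_fields)
        if None in key:
            continue
        for rec_b in blocks.get(key, []):
            yield rec_a, rec_b


# Made-up records for illustration only.
deaths = [
    {"id": "D1", "sex": "F", "mob": 3},
    {"id": "D2", "sex": "M", "mob": 7},
]
census = [
    {"id": "C1", "sex": "F", "mob": 3},
    {"id": "C2", "sex": "F", "mob": 4},
    {"id": "C3", "sex": "M", "mob": 7},
]
pairs = [(a["id"], b["id"])
         for a, b in candidate_pairs(deaths, census, ["sex", "mob"])]
```

Each candidate pair would then be compared on all linking fields, with the resulting field weights summed to give an overall score for the pair.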



Table 2.4 - KEY LINKING FIELDS, By comparator type and tolerance

Linking field          Comparator type and tolerance

Geographic information
  Street number        Exact String
  Street name          Approximate String, threshold value=0.85
  Suburb               Approximate String, threshold value=0.85

Personal information
  First name           Approximate String, threshold value=0.85
  Surname              Approximate String, threshold value=0.85

Personal characteristics
  Sex                  Exact String
  Day of birth         Exact String (Passes 1 & 2), Numeric Comparison with Absolute Tolerance +2 (Passes 4 & 5)
  Month of birth       Exact String
  Age                  Numeric Comparison with Absolute Tolerance +1 (Passes 1 & 2), +2 (Passes 4 & 5)
  Birthplace           Exact String
  Year of arrival      Numeric Comparison with Absolute Tolerance +1 (Passes 1, 2 & 3), +2 (Passes 4 & 5)
  Marital status       Exact String


