Measuring Precision for Deterministic and Probabilistic Record Linkage

Record linkage brings together records from two files that belong to, or are likely to belong to, the same unit (e.g. a person, student or business). It is an appropriate technique when data sets need to be joined to extend their coverage over time or to add breadth or depth of detail. For example, the Australian Census Longitudinal Database (ACLD), created by linking the 2006 and 2011 Australian Population Censuses, allows longitudinal analysis. Record linkage offers opportunities for new statistical output and analysis at relatively low cost.

With these new opportunities comes the problem of linkage errors. Because a unique person identifier is often not available, records belonging to two different people may be incorrectly linked. Estimating the proportion of links that are correct, called Precision, is difficult because, even after clerical review, some uncertainty remains about whether a link is in fact correct. Links can be declared deterministically, using a set of pre-defined rules, or probabilistically, where the evidence that a link is a match is weighed against the evidence that it is not. Both approaches are widely used in practice.
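To make the terms concrete, the following is a minimal Python sketch of the two ways of declaring links and of Precision as the proportion of declared links that are correct. It is an illustration only, not the estimator proposed here: the toy records, the rule fields, the m- and u-probabilities, the threshold of 10 and the function names are all assumptions made for the example. The probabilistic weights follow the standard log-ratio form of agreement and disagreement weights, and the true match status is assumed to be known, which is exactly what is unavailable in practice.

```python
import math

# Hypothetical toy files; field values and IDs are invented for illustration.
file_a = [
    {"id": "a1", "surname": "smith", "birth_year": 1980, "postcode": "2600"},
    {"id": "a2", "surname": "jones", "birth_year": 1975, "postcode": "3000"},
    {"id": "a3", "surname": "smith", "birth_year": 1980, "postcode": "2913"},
]
file_b = [
    {"id": "b1", "surname": "smith", "birth_year": 1980, "postcode": "2600"},
    {"id": "b2", "surname": "jones", "birth_year": 1975, "postcode": "3000"},
    {"id": "b3", "surname": "lee", "birth_year": 1990, "postcode": "4000"},
]

# Assumed truth, known here only because the data are synthetic.
true_matches = {("a1", "b1"), ("a2", "b2")}


def deterministic_link(a, b):
    """Pre-defined rule (assumed): exact agreement on surname and birth year."""
    return a["surname"] == b["surname"] and a["birth_year"] == b["birth_year"]


# Assumed (m, u) per field: P(agree | match) and P(agree | non-match).
M_U = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "postcode": (0.90, 0.02),
}


def match_weight(a, b):
    """Sum of log2 agreement/disagreement weights across the fields."""
    weight = 0.0
    for field, (m, u) in M_U.items():
        if a[field] == b[field]:
            weight += math.log2(m / u)  # evidence for a match
        else:
            weight += math.log2((1 - m) / (1 - u))  # evidence against a match
    return weight


def precision(links, matches):
    """Proportion of declared links that are true matches."""
    if not links:
        raise ValueError("no links declared")
    return sum(1 for link in links if link in matches) / len(links)


pairs = [(a, b) for a in file_a for b in file_b]

det_links = [(a["id"], b["id"]) for a, b in pairs if deterministic_link(a, b)]

THRESHOLD = 10.0  # assumed cut-off; pairs at or above it are declared links
prob_links = [(a["id"], b["id"]) for a, b in pairs if match_weight(a, b) >= THRESHOLD]

print("deterministic:", det_links, precision(det_links, true_matches))
print("probabilistic:", prob_links, precision(prob_links, true_matches))
```

In this toy data the deterministic rule links a3 to b1 because the two records share a surname and birth year, so one of its three links is incorrect. The disagreeing postcode pulls the weight for that pair below the assumed threshold, so the probabilistic rule does not declare it.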

We have developed estimators of Precision for a linked file that has been created by either deterministic or probabilistic linkage, both of which are widely used at the Australian Bureau of Statistics (ABS). The proposed estimators perform well in simulation and in real case studies. The ABS's deterministic linkage macro, D-MAC, produces the associated Precision estimates.

