1504.0 - Methodological News, Mar 2018  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 22/03/2018   

CORRECTING BIASES IN ESTIMATION WHEN LINKAGE ERRORS ARE PRESENT IN A PROBABILISTICALLY-LINKED DATASET UNDER ONE-TO-ONE LINKING SCENARIO

Computerised probabilistic record linkage (CPL) attempts to link records belonging to the same individual across multiple datasets when unique identifiers are absent. Linking datasets allows more statistical analysis to be performed because the linked dataset contains more analysis variables than any individual dataset. CPL may link two records from two different datasets if they share similar values across several attributes (referred to as linking fields), such as date of birth, age and gender. Even so, the two linked records may sometimes correspond to different individuals. Linkage errors (links between records in different datasets that do not belong to the same individual) are therefore likely to be present in the linked dataset, leading to estimation bias.
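
As a rough illustration of how agreement on linking fields can be turned into a link score, the sketch below uses Fellegi-Sunter style log-likelihood-ratio weights. The article does not say which scoring rule is used, so this is an assumption; the m and u probabilities, the field names and the records are all hypothetical.

```python
import math

# Hypothetical probabilities for each linking field (assumed values):
# m = P(field agrees | records belong to the same individual)
# u = P(field agrees | records belong to different individuals)
M_U = {
    "date_of_birth": (0.95, 0.01),
    "age": (0.90, 0.05),
    "gender": (0.98, 0.50),
}


def link_score(rec_a, rec_b, m_u=M_U):
    """Sum log2 agreement or disagreement weights over the linking fields."""
    score = 0.0
    for field, (m, u) in m_u.items():
        if rec_a[field] == rec_b[field]:
            score += math.log2(m / u)                   # agreement weight
        else:
            score += math.log2((1.0 - m) / (1.0 - u))   # disagreement weight
    return score


person_a = {"date_of_birth": "1980-02-01", "age": 38, "gender": "F"}
same_values = dict(person_a)  # may still be a different individual
different = {"date_of_birth": "1975-07-12", "age": 42, "gender": "M"}

print(link_score(person_a, same_values))  # positive: pair is likely to be linked
print(link_score(person_a, different))    # negative: pair is unlikely to be linked
```

Two different people who happen to share a date of birth, age and gender would receive the same high score as a true match, which is how linkage errors can enter the linked dataset.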

To correct estimation bias, the ABS is investigating a weighting approach. Specifically, a weighting matrix assigns each possible link a weight that takes into account the linkage error process. Analysts can then obtain unbiased estimates from standard analysis using the weights. The main issue is modelling the linkage error process with a Linkage Error Model (LEM), so that the effects of linkage error can be reversed. The LEM in Chambers et al. (2009) is consistent with the assumption that the probability of a link does not depend upon the values of the observed linking fields. However, this assumption does not always hold in practice. We have considered relaxing it by conditioning on observed linking fields (e.g. males are less likely to be linked to females). To estimate our LEM under the relaxed assumption, we use a latent model to simulate the linkage error process. The whole process is fully computerised.
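
The idea of reversing linkage error can be sketched with a small simulation. The code below is not the ABS method: it assumes a simple regression through the origin, a known probability `correct_rate` that a link is correct, and an expected linkage matrix T with `correct_rate` on the diagonal and equal values elsewhere, so that the link probability does not depend on the linking fields. All function names and parameter values are illustrative.

```python
import random


def simulate_linked_data(n, beta, correct_rate, seed):
    """Simulate y = beta * x + noise, then mislink a share of records one-to-one."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Select the records to mislink and rotate their y values by one position.
    # With two or more selected records, each one receives another record's y,
    # and the linked y values remain a permutation of the true ones.
    n_wrong = round(n * (1.0 - correct_rate))
    wrong = rng.sample(range(n), n_wrong)
    y_linked = list(y)
    for k, i in enumerate(wrong):
        y_linked[i] = y[wrong[(k + 1) % n_wrong]]
    return x, y_linked


def naive_slope(x, y_linked):
    """Least-squares slope through the origin, ignoring linkage error."""
    return sum(a * b for a, b in zip(x, y_linked)) / sum(a * a for a in x)


def corrected_slope(x, y_linked, correct_rate):
    """Slope based on E[y_linked] = T x beta under the assumed matrix T."""
    n = len(x)
    total = sum(x)
    off_diagonal = (1.0 - correct_rate) / (n - 1)
    # Row i of T applied to x: correct_rate * x_i + off_diagonal * (sum of the other x_j)
    tx = [correct_rate * xi + off_diagonal * (total - xi) for xi in x]
    return sum(a * b for a, b in zip(x, y_linked)) / sum(a * b for a, b in zip(x, tx))


x, y_linked = simulate_linked_data(n=20000, beta=2.0, correct_rate=0.8, seed=1)
print("naive:", naive_slope(x, y_linked))
print("corrected:", corrected_slope(x, y_linked, correct_rate=0.8))
```

In this setting the naive slope is pulled towards zero by roughly the proportion of incorrect links, while the corrected slope should sit close to the true value of 2. In practice the correct-link rate is unknown and has to be estimated.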

As expected, our simulation results show that both our LEM and the Chambers et al. LEM give unbiased estimates if the linking fields and the covariates in the model are independent. However, when the independence assumption is violated, such as when one of the linking fields is a covariate of the response variable, our LEM leads to estimates with very little bias, while the Chambers et al. LEM could be heavily biased. Further directions would be to apply this method to a real linkage situation and to use the method to estimate the proportion of links that are correct in any given linkage exercise.
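
A toy example can show why the independence assumption matters. The sketch below is an extreme, hypothetical case and not the simulation reported here: a binary linking field `g` is also the only covariate, and incorrect links always stay within the same value of `g`. It compares a correction that assumes a single link probability for every record with an estimate that conditions on `g`.

```python
import random


def simulate_within_group_errors(n, beta, correct_rate, seed):
    """Simulate y = beta * g + noise, with mislinks only between records sharing g."""
    rng = random.Random(seed)
    g = [rng.randint(0, 1) for _ in range(n)]
    y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
    y_linked = list(y)
    for value in (0, 1):
        members = [i for i in range(n) if g[i] == value]
        n_wrong = round(len(members) * (1.0 - correct_rate))
        wrong = rng.sample(members, n_wrong)
        # Rotate y among the selected records of this group (one-to-one mislinks).
        for k, i in enumerate(wrong):
            y_linked[i] = y[wrong[(k + 1) % n_wrong]]
    return g, y_linked


def group_mean_difference(g, y_linked):
    """Slope of y_linked on binary g: mean for g = 1 minus mean for g = 0.

    When mislinks never cross groups this is also the estimate that
    conditions on g, because group means are unchanged by the errors.
    """
    ones = [y for gi, y in zip(g, y_linked) if gi == 1]
    zeros = [y for gi, y in zip(g, y_linked) if gi == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def constant_rate_corrected_slope(g, y_linked, correct_rate):
    """Correction that wrongly assumes link errors are unrelated to g."""
    n = len(g)
    shrink = (n * correct_rate - 1.0) / (n - 1.0)  # assumed attenuation of the slope
    return group_mean_difference(g, y_linked) / shrink


g, y_linked = simulate_within_group_errors(n=20000, beta=2.0, correct_rate=0.8, seed=1)
print("conditioning on g:", group_mean_difference(g, y_linked))
print("constant-rate correction:", constant_rate_corrected_slope(g, y_linked, 0.8))
```

Here the estimate that conditions on `g` should stay close to the true slope of 2, while the constant-rate correction divides by about 0.8 and overshoots, because it tries to undo attenuation that never happened.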

Further Information

For more information, please contact Yue Ma (ym894@uowmail.edu.au) or James Chipperfield (James.Chipperfield@abs.gov.au).
