Analysis of probabilistically linked data: application to the simulated Statistical Longitudinal Census Dataset
As described in previous MDMD newsletters, the key feature of the Census Data Enhancement project is the creation of a Statistical Longitudinal Census Dataset (SLCD) based on a random sample of 5% of person records from the 2006 Census. These records will be linked to person records from the 2011 and subsequent Censuses without using name and address as linking variables. The SLCD will provide a substantial opportunity for longitudinal analysis of how people and their families or households change over time, while maintaining the ABS’ strong commitment to the confidentiality of its Census respondents. Since a unique person identifier will not be available, some links will be incorrect; that is, some linked Census records will not correspond to the same individual.
The ABS has conducted a quality study to assess the feasibility of forming the SLCD in this way and its likely quality. Within a short window, during which the 2006 Census data were being processed, name and address were available for both the Census and the Census Dress Rehearsal (CDR). Gold standard person-level links were formed using name, address, mesh block and selected Census data items, and were assumed to be without error. To simulate the linkage method for the SLCD, Bronze standard person-level links were formed using only mesh block and selected Census data items (i.e. no name or address). Differences between the Bronze standard and the Gold standard estimates are assumed to be due to errors in the Bronze standard links.
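The comparison of Bronze links against Gold links can be sketched in a few lines of Python. This is only an illustration: the record identifiers, the dictionary representation of links and the function name `compare_links` are assumptions made for the example, not details of the ABS study.

```python
def compare_links(gold, bronze):
    """Classify each Gold-linked record by its Bronze outcome.

    gold and bronze map a record id on one file to the id of the record
    it was linked to on the other file (hypothetical representation).
    Gold links are treated as error-free, as in the study.
    """
    correct = incorrect = non_links = 0
    for record_id, gold_match in gold.items():
        bronze_match = bronze.get(record_id)
        if bronze_match is None:
            non_links += 1          # could have been linked, but was not
        elif bronze_match == gold_match:
            correct += 1            # Bronze agrees with Gold
        else:
            incorrect += 1          # Bronze linked to the wrong record
    linked = correct + incorrect
    return {
        "correct": correct,
        "incorrect": incorrect,
        "non_links": non_links,
        "link_accuracy": correct / linked if linked else float("nan"),
        "link_rate": linked / len(gold) if gold else float("nan"),
    }


# Made-up ids purely for illustration.
gold = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4", "a5": "b5"}
bronze = {"a1": "b1", "a2": "b2", "a3": "b9", "a4": "b4"}
summary = compare_links(gold, bronze)
print(summary)
```

In this toy example one Bronze link is incorrect (`a3`) and one record is a non-link (`a5`), which are the two sources of error discussed in this article.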
In the previous issue of Methodological News, mention was made of fitting generalised linear models to probabilistically linked data. A method was developed by Professor Ray Chambers, of the University of Wollongong, for removing bias in analysis due to inexact linkage. This method was implemented as part of the above quality study. While the method did in fact reduce the bias due to incorrect links, a larger source of error was due to non-links.
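To give a feel for why incorrect links bias a regression and how an adjustment can remove that bias, the sketch below works through a deliberately simple case. It assumes an exchangeable linkage error model within a single block: each record is linked correctly with probability `p_correct` and otherwise linked to another record in the block chosen uniformly at random. This model, the function names and the data are assumptions for illustration only; this article does not describe the details of the method used in the quality study. The calculation works with expected values rather than simulated links, so it is exact and repeatable.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def expected_under_linkage_error(values, p_correct):
    """Expected linked value for each record under the assumed model:
    correct link with probability p_correct, otherwise a uniformly
    chosen other record in the same block."""
    m = len(values)
    q = (1 - p_correct) / (m - 1)   # chance of linking to one given wrong record
    total = sum(values)
    return [p_correct * v + q * (total - v) for v in values]


# Made-up data: y depends on x with a true slope of 3.
x = [1, 2, 3, 4, 5, 6]
y = [2 + 3 * xi for xi in x]
p_correct = 0.8

# What the analyst sees, on average, when y comes from the linked file.
y_linked = expected_under_linkage_error(y, p_correct)

# Naive analysis ignores linkage error and the slope is attenuated.
naive_slope = ols_slope(x, y_linked)

# Adjusted analysis applies the same expected mixing to x before fitting.
x_adjusted = expected_under_linkage_error(x, p_correct)
adjusted_slope = ols_slope(x_adjusted, y_linked)

print(naive_slope, adjusted_slope)
```

Under these assumptions the naive slope is pulled towards zero, while the adjusted fit recovers the true slope. Note that an adjustment of this kind addresses incorrect links only; it does nothing about records that were never linked.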
A non-link arises when a record on one file that could have been linked to its existing counterpart on the other file was not linked at all. This occurs when there is insufficient information for a reliable link to be made. If the records that fail to link differ systematically from those that do, estimates obtained from the Bronze-linked data may be biased. This concern is analogous to record non-response in sample surveys, and there are substantive reasons for it. For example, people aged under 20 years were under-represented in the Bronze-linked data because they have relatively few useful linking variables: most have never married and do not have post-school qualifications, many have not yet completed school, and those who have may not yet have a steady field of employment. Future work is focusing on reducing the error due to non-links.
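One simple way to check for this kind of under-representation is to compare each group's share of the linked records with its share of all records that could have been linked. The sketch below does this for two age groups; the counts are invented for illustration and the function name `representation_ratio` is hypothetical.

```python
from collections import Counter


def representation_ratio(all_groups, linked_groups):
    """For each group, its share among linked records divided by its
    share among all records. Values below 1 indicate that the group is
    under-represented in the linked data."""
    all_counts = Counter(all_groups)
    linked_counts = Counter(linked_groups)
    n_all = len(all_groups)
    n_linked = len(linked_groups)
    return {
        group: (linked_counts[group] / n_linked) / (all_counts[group] / n_all)
        for group in all_counts
    }


# Invented counts: 100 records that could be linked, 80 of which were.
all_groups = ["under 20"] * 40 + ["20 and over"] * 60
linked_groups = ["under 20"] * 20 + ["20 and over"] * 60

ratios = representation_ratio(all_groups, linked_groups)
print(ratios)
```

A ratio well below 1 for a group, as for the under 20s in this made-up example, signals that estimates from the linked data may not represent that group properly, in the same way that differential non-response would in a sample survey.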
For more details, contact James Chipperfield on (02) 6252 7301 or firstname.lastname@example.org.