1504.0 - Methodological News, Sep 2014

ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 25/09/2014

Uniqueness Analysis

The Data Integration, Access and Confidentiality Methodology Unit (DIACMU) is developing methods to evaluate the feasibility of linking datasets before the actual linking process, and to help identify disclosure risks in linked datasets. One recently developed method is a “uniqueness analysis” of the input datasets.

Data linking brings together records that belong to the same unit from two or more datasets. The process produces a unit record file containing analysis fields from the input datasets for the common population. It is a cost-effective way of acquiring more comprehensive statistics. The recent release of the Australian Census Longitudinal Dataset (ACLD) was an important milestone for data linking in the ABS.

Ideally, datasets should be linked with a high degree of accuracy and coverage. Data linking is only feasible if the datasets contain linking variables that can uniquely identify individual record pairs belonging to the same unit. The more record pairs that are uniquely identified by a combination of linking variables, the more likely it is that high-quality links will be established. It is therefore important to ascertain the likely success of a linking project before undertaking it.

Uniqueness analysis determines the proportion of records on a single file that are uniquely identified by their values on a combination of variables. It provides a guide to the upper bound on the proportion of records that could be uniquely linked using the available variables (Conn and Bishop, 2005). For example, if one could uniquely identify 80% of records on File A, but only 50% on File B, then the upper bound for the match rate would be 50%.
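The calculation described above can be illustrated with a minimal Python sketch. It is not the ABS implementation: the function names (`unique_proportion`, `match_rate_upper_bound`), the variable names and the toy records are all hypothetical, and the sketch ignores non-response in the linking variables.

```python
from collections import Counter


def unique_proportion(records, variables):
    """Share of records whose combination of values on `variables`
    occurs exactly once in the file (hypothetical helper)."""
    if not records:
        return 0.0
    keys = [tuple(rec[v] for v in variables) for rec in records]
    counts = Counter(keys)
    n_unique = sum(1 for key in keys if counts[key] == 1)
    return n_unique / len(records)


def match_rate_upper_bound(file_a, file_b, variables):
    """Upper bound on the match rate: the smaller of the two
    files' uniqueness proportions (hypothetical helper)."""
    return min(
        unique_proportion(file_a, variables),
        unique_proportion(file_b, variables),
    )


# Toy data, invented purely for illustration.
file_a = [
    {"sex": "F", "birth_year": 1980, "postcode": "2600"},
    {"sex": "M", "birth_year": 1980, "postcode": "2600"},
    {"sex": "F", "birth_year": 1975, "postcode": "2601"},
    {"sex": "M", "birth_year": 1990, "postcode": "2602"},
    {"sex": "M", "birth_year": 1990, "postcode": "2602"},
]
file_b = [
    {"sex": "F", "birth_year": 1980, "postcode": "2600"},
    {"sex": "F", "birth_year": 1980, "postcode": "2600"},
    {"sex": "M", "birth_year": 1962, "postcode": "2603"},
    {"sex": "F", "birth_year": 1975, "postcode": "2601"},
]

linking_variables = ["sex", "birth_year", "postcode"]
print(unique_proportion(file_a, linking_variables))
print(unique_proportion(file_b, linking_variables))
print(match_rate_upper_bound(file_a, file_b, linking_variables))
```

In this toy example, three of the five records on `file_a` and two of the four records on `file_b` are unique on the chosen variables, so the bound is set by `file_b`. Calling `unique_proportion` with a shorter list of variables shows how uniqueness falls as variables are dropped from the combination.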
This figure is only an upper bound, because errors or changes in the linking fields can occur across the two datasets and prevent otherwise unique records from being linked. The analysis helps inform whether a linking project is feasible and provides insight into the optimal linking strategy.

This method extends the work of Conn and Bishop in the following ways:
1. investigating the marginal improvement in the percentage of uniquely identified records as the number of variables in the combination of potential linking variables increases
2. taking into account non-response in linking variables when calculating that percentage.

It is envisaged that a uniqueness analysis will be conducted on linked datasets to discover the relationship between the percentage of uniquely identified records and the linkage accuracy.

Besides data linking, DIACMU is also investigating methods to mitigate disclosure risks more efficiently when disseminating data on TableBuilder and DataAnalyser. Linked datasets released on TableBuilder and DataAnalyser include the ACLD and the Australian Census and Migrants Integrated Dataset. A uniqueness analysis of linked datasets can help quickly identify disclosure risks prior to their release.

Thus, uniqueness analysis potentially has multiple applications besides determining the feasibility of linking datasets. It also gives DIACMU a guide to the best way of ensuring the relevance of linked datasets while maintaining confidentiality.

References

Conn, L & Bishop, G (2005). Exploring Methods for Creating a Longitudinal Dataset, cat. no. 1352.0.55.076, Australian Bureau of Statistics, Canberra.

Further Information

For more information, please contact Charles Au (02 6252 5990, charles.au@abs.gov.au).

The ABS Privacy Policy outlines how the ABS will handle any personal information that you provide to us.