Information paper: Name encoding method for Census 2016
Since 2006, the ABS has been enhancing the value of Census data by combining it with data from different sources. By combining Census data with other information (e.g. from surveys or administrative collections) we can gain an improved understanding of complex policy problems. This improved understanding can help to better inform government decisions in important areas such as health, education, infrastructure and the economy.
ABS research shows that using anonymised names and addresses to link data from different sources significantly increases linkage rates and the quality of the resulting statistics.
Previously, Census names and addresses were destroyed at the end of Census data processing, approximately 18 months after the Census. Without names and addresses, there was less opportunity for the ABS to combine Census data with other information.
For the 2016 Census, the ABS decided to extend the retention period for Census names and addresses (from the end of Census data processing to a period of up to four years) to support high quality data integration. This decision enables the ABS to maximise the value of the Census data.
The ABS is committed to ensuring sufficient safeguards are in place to protect information collected in the Census, including names and addresses.
The safety and security of name and address information during the Census retention period is achieved through a combination of data security and procedural measures, including converting names to anonymised codes and only allowing a small number of ABS officers to use these codes while applying strict security protocols. The codes are not available to researchers either within or outside the ABS.
For information on the full suite of measures in place to protect name and address information during the retention period, see Privacy, Confidentiality and Security.
This paper summarises the steps the ABS took to decide on the method for converting names to anonymised codes.
External advice on name encoding methods
Following the 2016 Census, the ABS engaged independent cryptography experts from the University of Melbourne to investigate different methods of encoding names for use in data integration projects. A range of options was presented to the ABS for further consideration.
The ABS assessed these options in terms of:
- their impact on linkage accuracy,
- ability to meet security requirements, and
- feasibility of implementation.
This assessment identified ‘Lossy encoding’ as the preferred method at this time.
The assessment was presented to the ABS’ Methodology Advisory Committee in November 2017. The Committee supported the general approach of Lossy encoding and also recommended continued efforts to draw on the growing body of methodological research and advancements in this area. Please refer to the Methodology Advisory Committee paper.
Lossy encoding groups names together into a desired number of ‘bins’. During data linkage, the bin identifiers are used as linking variables instead of names. First names and last names are encoded separately. Bin identifiers are removed from the dataset that is subsequently used by analysts to derive statistics.
Once the Lossy encoding method has been applied, the original name can never be directly re-derived from the anonymised name code. The method's baseline level of protection is grounded in the fact that some information is destroyed (lost) through the encoding process.
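The binning idea described above can be illustrated with a minimal Python sketch. It is not the ABS implementation: this paper does not say how names are assigned to bins, so the sketch assumes a keyed hash (HMAC-SHA256) reduced modulo the number of bins. The function names lossy_encode and encode_record, the secret keys and the n_bins parameter are all hypothetical and chosen for illustration only.

```python
import hashlib
import hmac
from typing import Tuple


def lossy_encode(name: str, key: bytes, n_bins: int) -> int:
    """Map a name to one of n_bins bin identifiers.

    Assumption: bins are assigned with a keyed hash reduced modulo n_bins.
    Many different names fall into the same bin, so the bin identifier
    alone cannot identify the original name.
    """
    # Normalise so that trivial differences in case or spacing do not
    # change the bin (an illustrative choice, not taken from the paper).
    normalised = name.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalised, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % n_bins


def encode_record(
    first_name: str,
    last_name: str,
    first_key: bytes,
    last_key: bytes,
    n_bins: int,
) -> Tuple[int, int]:
    """Encode first and last names separately, as the paper describes."""
    return (
        lossy_encode(first_name, first_key, n_bins),
        lossy_encode(last_name, last_key, n_bins),
    )


if __name__ == "__main__":
    # Hypothetical keys and bin count, for demonstration only.
    first_key, last_key = b"demo-first-key", b"demo-last-key"
    for first, last in [("Alex", "Nguyen"), ("alex ", "NGUYEN"), ("Sam", "Lee")]:
        print(first, last, encode_record(first, last, first_key, last_key, 1000))
```

In a linkage step, the pair of bin identifiers would stand in for the name as a linking variable and would then be dropped from the dataset given to analysts. In this sketch, a smaller number of bins places more names in each bin, which discards more information but also produces more coincidental matches between different people.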
Further details about Lossy encoding can be found in the Methodology Advisory Committee paper.
Lossy encoding has been implemented to encode Census 2016 names for data linking.
Because experts around the world are constantly developing new anonymisation methods, the ABS will continue to explore them so that it can consider and apply improved methods as they become available.