Methodological News, Jun 2022

Features important work and developments in ABS methodologies

Released
30/06/2022

This issue contains three articles:

  • The connection between the ABS perturbation methodology and differential privacy
  • Adjusting to the times: the use of video interviews for data collection
  • Safe use of personal data for sample design and estimation

The connection between the ABS perturbation methodology and differential privacy

The ABS is committed to improving access while ensuring that privacy and confidentiality are maintained. The emergence of differential privacy (DP) methods has created opportunities to better quantify the trade-off between statistical utility and confidentiality protection in our statistical outputs.

The ABS is continuing to explore the opportunities offered by DP. This research builds on Leaver and Marley (2011) and Bailie and Chien (2019) to improve the perturbation methodology we use in TableBuilder. This methodology has two elements – an entropy maximisation method for generating the perturbation table and a cell key method to ensure consistent protections for statistical outputs. Recent work has focussed on the first element and has considered an analytical entropy maximisation approach to incorporate (ε,δ)-DP parameters in the design of the perturbation table (transition matrix).
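
The consistency provided by a cell key method can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the size of the key space, the way record keys are assigned and combined, and the small perturbation row are hypothetical and are not the TableBuilder implementation. The sketch only shows why a cell built from the same records always receives the same perturbation, regardless of which table it appears in.

```python
import random

KEY_SPACE = 2**32  # hypothetical size of the record-key space


def assign_record_keys(record_ids, seed=0):
    """Give every record a fixed pseudo-random key (assumed scheme)."""
    rng = random.Random(seed)
    return {rid: rng.randrange(KEY_SPACE) for rid in record_ids}


def cell_key(record_ids, record_keys):
    """Combine the keys of the records in a cell; order does not matter."""
    return sum(record_keys[rid] for rid in record_ids) % KEY_SPACE


def perturbation(key, distribution):
    """Map a cell key to a perturbation value by inverting the CDF.

    `distribution` is a list of (value, probability) pairs, standing in
    for one row of a perturbation table.
    """
    u = key / KEY_SPACE  # deterministic value in [0, 1)
    cumulative = 0.0
    for value, prob in distribution:
        cumulative += prob
        if u < cumulative:
            return value
    return distribution[-1][0]  # guard against floating-point rounding


# Hypothetical symmetric perturbation row (illustrative values only)
ROW = [(-2, 0.1), (-1, 0.2), (0, 0.4), (1, 0.2), (2, 0.1)]

keys = assign_record_keys(range(1000))
cell = [3, 17, 256, 999]

# The same set of records yields the same perturbation every time,
# even if the records are encountered in a different order.
noise_a = perturbation(cell_key(cell, keys), ROW)
noise_b = perturbation(cell_key(list(reversed(cell)), keys), ROW)
```

Because the perturbation is a deterministic function of the records in the cell, repeating a query cannot be used to average the noise away.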

Collaborating with Professor Parastoo Sadeghi, the ABS has considered a single static counting query function and explored the analytical form of the symmetric perturbation distribution – a special case of current TableBuilder parameters which include asymmetric perturbation distributions. This collaboration has established:

  1. a method to analytically quantify the ε and δ parameters in the setting above
  2. an approach to incorporate the ε parameter and the symmetric support of the distribution into the entropy maximisation process
  3. the importance of carefully choosing the variance parameter in the method proposed by Leaver and Marley (2011) with respect to the DP parameters
  4. a sampling scheme that allows the proposed method to be integrated with the cell key approach to improve the ABS perturbation methodology, together with a way to quantify the ε and δ DP parameters after sampling
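
As a rough illustration of the first three points, the following Python sketch builds a symmetric maximum-entropy perturbation distribution and computes the smallest δ that goes with a chosen ε. It rests on simplifying assumptions that are not taken from the research itself: the noise is additive and does not depend on the true count, the query is a single count that changes by at most 1 between neighbouring datasets, and the entropy is maximised subject only to zero mean and a fixed variance on the integer support {-D, ..., D}. The function names and parameter values are hypothetical.

```python
import math


def max_entropy_pmf(D, variance, iterations=200):
    """Maximum-entropy pmf on {-D..D} with zero mean and given variance.

    Under these constraints the solution has the form
    p(k) proportional to exp(-lam * k**2); lam is found by bisection.
    """
    support = range(-D, D + 1)

    def pmf(lam):
        weights = [math.exp(-lam * k * k) for k in support]
        total = sum(weights)
        return [w / total for w in weights]

    def var(lam):
        return sum(p * k * k for p, k in zip(pmf(lam), support))

    if not 0 < variance <= var(0.0):
        raise ValueError("variance must be positive and at most D(D+1)/3")
    lo, hi = 0.0, 1.0
    while var(hi) > variance:  # variance decreases as lam grows
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if var(mid) > variance:
            lo = mid
        else:
            hi = mid
    return dict(zip(support, pmf((lo + hi) / 2.0)))


def delta_for_epsilon(pmf, eps):
    """Smallest delta for which adding noise ~ pmf to a count is (eps, delta)-DP.

    Neighbouring datasets shift the count by 1, so the two output
    distributions are the pmf and the pmf shifted by one step.
    """
    ks = range(min(pmf) - 1, max(pmf) + 2)

    def p(k):
        return pmf.get(k, 0.0)

    scale = math.exp(eps)
    up = sum(max(0.0, p(k) - scale * p(k - 1)) for k in ks)
    down = sum(max(0.0, p(k) - scale * p(k + 1)) for k in ks)
    return max(up, down)


# Illustrative parameter values only
pmf = max_entropy_pmf(D=3, variance=2.0)
delta = delta_for_epsilon(pmf, eps=1.0)
```

Because the support is bounded, δ can never be smaller than the probability placed on the extreme perturbation values. This is one way to see why the variance, the support and the DP parameters need to be chosen together.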

This research has shown it is possible to incorporate DP parameters in the design of the perturbation table. There are several areas for future research including:

  1. extending the method to consider asymmetrical perturbation distributions
  2. developing a framework to consider ε and δ parameters for dynamic table environments
  3. evaluating the performance against different types of perturbation distributions.

For more information, please contact Professor Parastoo Sadeghi or Dr. Joseph Chien.

Adjusting to the times: the use of video interviews for data collection

The need to mitigate challenges associated with in-person interviewing during the COVID-19 pandemic, together with an increase in the use of video calls across society, contributed to the decision to explore the collection of official statistics using remote video interviews.

Video-Assisted Live Interviewing (VALI) is data collection conducted online using a video conferencing platform. While VALI was initially considered for pandemic-related reasons, it also has the potential to improve data collection efficiency, reduce costs and enhance interviewer safety, and may improve response rates and reduce provider burden.

A comprehensive range of VALI-related research has been undertaken to develop the video interviewing process. This research includes an online panel study of mode preferences, field test observation, and multiple rounds of usability testing with ABS field interviewers and various respondent cohorts.

Findings from testing included that respondents: 

  • were positive about the mode, and many had used video calls previously
  • appreciated being able to participate in the interview in a private location where they felt most comfortable, for example in a bedroom or study, when otherwise they would have met with an interviewer in a main room of the house
  • liked and felt more at ease with the physical separation that video provided when talking about sensitive content
  • appreciated being able to see survey prompt cards online during the interview, and would have liked more information (for example, question wording) to be shown on the cards

Further evaluation activities, including a pilot study, will be undertaken prior to decisions being made about the future of VALI within the ABS.

Plans are also underway to conduct a modal experiment to enable data quality comparisons between VALI, online and telephone collection. The results of this experiment are due in early 2023.

For more information, please contact Kirsten Gerlach.

Safe use of personal data for sample design and estimation

The ABS is the custodian for MADIP – the Multi-Agency Data Integration Project. The ABS collects and links administrative data from a number of Australian government agencies to create a secure data asset combining information on health, education, government payments, income and taxation, employment, and population demographics. Access to these data is provided to authorised researchers in a way that protects personal privacy.

This rich source of data could also be used to improve the efficiency of ABS household surveys, either by identifying subpopulations of interest and over-sampling them, or by using administrative data to improve the efficiency of estimation. However, this must be done safely, in a way that protects the confidentiality of the information and that the agencies supplying the data and the Australian community regard as an appropriate use of the information.

One method of using the data that is generally considered safe is to use area-level summaries. For example, if a survey would like to over-sample recent migrants, the proportion of recent migrants in the population can be calculated at area level, and areas with high proportions can be over-sampled. Typically the area used is a Statistical Area Level 1 (SA1), which has an average population of approximately 400 people.
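
The effect of area-level over-sampling can be sketched in a few lines of Python. The area populations, proportions, threshold and sampling fractions below are invented for illustration and do not come from any ABS survey design. The sketch compares the expected number of target-group members selected under over-sampling with an equal-probability design of the same expected size.

```python
# Hypothetical area-level summaries:
# (area id, population, proportion of population in the target group)
areas = [
    ("A", 400, 0.02),
    ("B", 420, 0.15),
    ("C", 380, 0.01),
    ("D", 410, 0.20),
    ("E", 390, 0.03),
]


def oversample(areas, threshold, base_fraction, factor):
    """Sampling fraction per area: areas at or above the threshold
    are sampled at `factor` times the base fraction."""
    return {
        area: base_fraction * (factor if prop >= threshold else 1.0)
        for area, _, prop in areas
    }


def expected_yield(areas, fractions):
    """Expected sample size and expected number of target-group members."""
    sample = sum(pop * fractions[area] for area, pop, _ in areas)
    target = sum(pop * prop * fractions[area] for area, pop, prop in areas)
    return sample, target


# Over-sample areas where at least 10% of people are in the target group
over = oversample(areas, threshold=0.10, base_fraction=0.05, factor=3.0)
n_over, hits_over = expected_yield(areas, over)

# Equal-probability design with the same expected sample size
flat_fraction = n_over / sum(pop for _, pop, _ in areas)
flat = {area: flat_fraction for area, _, _ in areas}
n_flat, hits_flat = expected_yield(areas, flat)

# Design weights (inverse sampling fractions) compensate for the
# unequal selection probabilities at estimation time.
weights = {area: 1.0 / fraction for area, fraction in over.items()}
```

Only area-level proportions enter the design, so no information about an individual person or address is attached to the sampling frame.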

While area-level summaries work well in many situations, further efficiencies can be gained by using information at address level, particularly if the interest is in relatively rare subpopulations. The gain needs to be balanced against the risk to privacy that would occur if information from MADIP were simply linked to addresses on the sampling frame. One technique developed by the ABS is to fit a predictive model that returns a propensity, that is, the likelihood that an address contains the subpopulation of interest. These models can take into account the chance that MADIP information is out of date, for example because people have moved address. While the use of predictive models is, in general, not as effective as directly linking personal information to the sampling frame, it can be a significant improvement on using area-level summaries and can strike an appropriate balance between improving efficiency and protecting the privacy of personal information.
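
A propensity model of this general kind can be sketched as follows. This is not the ABS model: the logistic form, the two predictors (a flag from administrative data and the age of that information in years), the synthetic training data and all parameter values are assumptions chosen only to show how a propensity can fall as the underlying information becomes older.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_addresses(n, seed=1):
    """Synthetic training data (entirely hypothetical).

    Each row is (flag, age_years, in_target): `flag` says administrative
    data place a target-group member at the address, `age_years` is how
    old that information is, `in_target` is the simulated truth.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        flag = 1 if rng.random() < 0.2 else 0
        age = rng.uniform(0.0, 5.0)
        # Assumed truth: flagged addresses become less reliable with age
        p_true = 0.9 * 0.8 ** age if flag else 0.02
        rows.append((flag, age, 1 if rng.random() < p_true else 0))
    return rows


def features(flag, age):
    """Intercept, admin flag, and flag interacted with information age."""
    return (1.0, float(flag), flag * age)


def fit_logistic(rows, learning_rate=1.0, iterations=500):
    """Fit a logistic regression by plain gradient descent."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for flag, age, y in rows:
            x = features(flag, age)
            error = sigmoid(sum(b * xi for b, xi in zip(beta, x))) - y
            for j in range(3):
                grad[j] += error * x[j]
        beta = [b - learning_rate * g / len(rows) for b, g in zip(beta, grad)]
    return beta


def propensity(beta, flag, age):
    """Modelled likelihood that an address contains the target group."""
    return sigmoid(sum(b * xi for b, xi in zip(beta, features(flag, age))))


beta = fit_logistic(make_synthetic_addresses(600))
p_recent = propensity(beta, flag=1, age=0.0)  # flagged, fresh information
p_old = propensity(beta, flag=1, age=5.0)     # flagged, five years old
p_none = propensity(beta, flag=0, age=0.0)    # not flagged
```

In a design of this kind only the propensity, not the underlying personal information, would need to be attached to addresses on the frame, and addresses with higher propensities could be given higher selection probabilities.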

For more information, please contact Bruce Fraser.

Contact us

Please email methodology@abs.gov.au to:

  • contact authors for further information
  • provide comments or feedback
  • be added to or removed from our electronic mailing list

Alternatively, you can post to:

Methodological News Editor
Methodology Division
Australian Bureau of Statistics
Locked Bag No. 10
Belconnen ACT 2617

The ABS Privacy Policy outlines how the ABS will handle any personal information that you provide to us.

Previous releases

Releases from June 2021 onwards can be accessed under research.

Releases up to March 2021 can be accessed under past releases.
