This issue contains three articles:
- Masked deep neural networks and their potential applications for official statistics
- Feeding back survey information to the frame
- Methodological developments for Census 2021
Features important work and developments in ABS methodologies
Masked deep neural networks and their potential applications for official statistics
The ABS has been investigating masked Deep Neural Networks (Huang et al. 2019) as a way of approximating the joint density of a set of correlated data items. We call this approximation of the joint density a 'structure summary'. Modelling the structure summary of business or household data could be useful for a number of purposes. One is to enable imputation of missing data that preserves the relationships between items, such as business turnover, wages and sales. Another, more ambitious, purpose is to repair biased administrative data based on a corresponding representative survey. The rich detail of the adjusted administrative data would then be made available, including statistics at fine levels of disaggregation.
In a masked Deep Neural Network (DNN), stochastic binary vectors (masks) are used so that, for each record, each target variable can be both predicted and a predictor. The result is that a single DNN model provides the probability density of any single variable conditional on any subset of the remaining variables. To provide coherent multivariate imputes, it is necessary to combine the conditional probability densities provided by the DNN with an annealed Gibbs sampling algorithm. In essence, imputed values generated from modelled conditional densities in earlier iterations of a sequence become fixed values that help predict densities and generate imputes for other data items in later iterations. By repeating the Gibbs sampling process, multiple imputes may be generated for a record and the data uncertainty captured.
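The sequential logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the ABS implementation: the function `toy_conditional` is a hypothetical stand-in for the trained masked DNN (it simply draws from a normal distribution centred on the mean of the known items), and the linear temperature schedule is an assumed form of annealing, since the article does not specify one.

```python
import random


def toy_conditional(target, values, known, temperature, rng):
    """Hypothetical stand-in for the masked DNN.

    Draws a value for `target` conditional on the items flagged in `known`.
    A real model would return a learnt conditional density instead.
    """
    predictors = [values[k] for k in values if k != target and known[k]]
    mean = sum(predictors) / len(predictors) if predictors else 0.0
    # Assumed annealing: the spread of the draw shrinks with the temperature.
    return rng.gauss(mean, 0.5 * temperature)


def gibbs_impute(record, sweeps=20, seed=0):
    """Impute the None entries of `record` one item at a time."""
    rng = random.Random(seed)
    values = dict(record)  # work on a copy, leave the input untouched
    missing = [k for k, v in record.items() if v is None]
    known = {k: v is not None for k, v in record.items()}  # the binary mask
    for sweep in range(sweeps):
        temperature = 1.0 - sweep / sweeps  # assumed linear cooling schedule
        for target in missing:
            values[target] = toy_conditional(target, values, known, temperature, rng)
            # Once imputed, the item is treated as fixed and helps predict
            # the remaining items in this and later sweeps.
            known[target] = True
    return values


# Repeating the process with different seeds gives multiple imputes per record.
record = {"turnover": 4.2, "wages": None, "sales": None}
imputes = [gibbs_impute(record, seed=s) for s in range(3)]
```

Observed items are never overwritten; only the items that were missing in the input are redrawn at each sweep.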
The method described above should be effective for imputation of missing at random (MAR) values in administrative data along the lines of sequential regression multiple imputation methods. When combined with simple coverage adjustments this may be sufficient repair to enable useful statistics to be derived from administrative data sources. When missing not at random (MNAR) mechanisms occur in administrative datasets, and a sample survey is available that collects corresponding data items, then repair of the biased administrative data may still be possible. This is done using the DNN structure summary and without the need to link survey and administrative data at record level.
Two alternative approaches are being evaluated for administrative data repair. In the first approach the structure summary of the administrative data is modelled and then fine-tuned using transfer learning, with the survey dataset used as the source domain. Coherent administrative values are then simulated from the fine-tuned structure summary. In the second approach structure summaries are learnt on both the administrative and survey datasets. New administrative data values are imputed consistent with the survey structure summary, but such that the imputed values are as close as possible to the observed administrative data values. These approaches are being progressed in collaboration with the Centre for Data Science at the Queensland University of Technology.
The use of masked DNNs for coherent multivariate imputation has been investigated and looks promising. The repair of biased administrative data is more challenging, with work focussing on allowing for sampling variability when adjusting the administrative data values.
For more information, please contact Sean Buttsworth.
Feeding back survey information to the frame
The ABS has recently investigated dependent source feedback (DSF), with the aim of quantifying the potential bias and variance impacts on survey estimates of using business industry information obtained during the sampling process.
During the enumeration phase of business surveys, we may gather industry information (referred to as 'reported' ANZSIC) that differs from that on the frame. One approach is to feed this new information back to the frame with the aim of improving estimates. However, sampling relies on the selected sample being representative of the frame. For business surveys that use synchronised selection to control for overlap between cycles, feeding back this type of information (DSF) to the frame may therefore introduce some bias.
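One way such a bias could arise is illustrated by the toy simulation below. It is a sketch under strong assumptions, not the ABS analysis. Synchronised selection is represented by fixed permanent random numbers (so the same units are selected every cycle), no units enter or leave the frame, and every sampled unit reports its true industry. The function name `simulate_feedback`, the two industries "A" and "B", and all rates are hypothetical.

```python
import random


def simulate_feedback(n_units=1000, misclassified_rate=0.1,
                      sampling_fraction=0.2, seed=0):
    """Return the share of truly-"B" units in (sample, stratum) for frame
    industry "A" in the cycle after sampled units' reported industry has
    been fed back to the frame."""
    rng = random.Random(seed)
    # Every unit starts in frame industry "A"; some truly belong to "B".
    units = [
        {"prn": rng.random(),
         "frame": "A",
         "true": "B" if rng.random() < misclassified_rate else "A"}
        for _ in range(n_units)
    ]

    def select(stratum):
        # Assumed stand-in for synchronised selection: units keep their
        # permanent random number, so low-PRN units are always selected.
        return [u for u in units
                if u["frame"] == stratum and u["prn"] < sampling_fraction]

    def share_truly_b(group):
        return sum(u["true"] == "B" for u in group) / len(group)

    # Cycle 1: sampled units report their industry and the frame is updated.
    for unit in select("A"):
        unit["frame"] = unit["true"]

    # Cycle 2: compare the sample in stratum "A" with the stratum itself.
    sample = select("A")
    stratum = [u for u in units if u["frame"] == "A"]
    return share_truly_b(sample), share_truly_b(stratum)


sample_share, stratum_share = simulate_feedback()
```

In this toy set-up the second-cycle sample contains no misclassified units, because they were all moved after the first cycle, while the unsampled part of the stratum still does. The sample is therefore no longer representative of the stratum it is drawn from.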
Three scenarios were analysed using both simulated and observed survey variables:
The work is ongoing but preliminary results indicate that movement estimates are largely unaffected, while in some situations, level estimates could be improved by using 'reported' ANZSIC in estimation rather than feeding it back to the frame. This warrants further investigation coupled with assessing the impact of increased changes to ANZSIC observed during the current COVID-19 pandemic.
As part of the investigation, the ABS contacted a number of other National Statistical Offices to enquire about their approach to DSF. These countries generally have a similar set-up to the ABS, with a business register based on tax data and a synchronised-selection type of methodology to manage overlap between surveys. The enquiry found that there is no internationally consistent approach to DSF. Some countries fully allow DSF, with information collected during the survey given precedence over other sources; some allow it in specific circumstances (e.g. for surveys other than the one in which the information was collected); and others do not allow it at all.
Future ABS work will be contingent on the implementation of an initiative from the Australian Taxation Office for businesses to review their industry code annually from 2022. While implementation details are being finalised, the degree to which this may lessen the DSF impact will be a key factor.
For more information, please contact Gwenda Thompson.
Methodological developments for Census 2021
The Census is the largest logistical peacetime operation in Australia. The development of each Census spans more than five years. This article discusses some of the methodological developments for Census 2021.
Counting the Australian population is a gargantuan task that involves a very large field force of temporary ABS staff. We use data and modelling to determine where we need to hire these staff as well as how to marshal them to produce the highest quality results possible.
A major project in the lead-up to the 2021 Census was determining our Field Officer recruitment targets, that is, how many staff we expect to need in every small area around the country to knock on doors and encourage response from households. To do this, we used data from Census 2016 and created statistical and machine learning models for several different things at small area levels:
Results of these models were combined to produce estimates of the amount of work required in each small area, and hence how many staff we needed to hire to complete this vital work.
Another area of methodological development for Census 2021 was operational monitoring during enumeration of the Census. We created models for the expected response rate over time, using data from 2016 and adjusting for known and expected changes in 2021. These models can be compared to actual response rates as they happen to tell us which areas are underperforming, helping to guide decisions on prioritising field effort to maximise response. For example, they can indicate areas where we might need to fly in extra field staff, or areas where we may need to extend visits or cease them early.
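The comparison of modelled and actual response rates can be pictured with a small Python sketch. It is illustrative only: the function `flag_underperforming`, the 5 percentage point tolerance, and the area names and rates are all hypothetical and are not taken from the Census monitoring system.

```python
def flag_underperforming(expected, actual, tolerance=0.05):
    """Return (area, shortfall) pairs where the actual response rate trails
    the modelled rate by more than `tolerance`, largest shortfall first."""
    shortfalls = {
        area: expected[area] - actual.get(area, 0.0)  # no data counts as zero response
        for area in expected
    }
    flagged = [(area, gap) for area, gap in shortfalls.items() if gap > tolerance]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical rates for three small areas on a given day of enumeration.
expected_rates = {"Area A": 0.60, "Area B": 0.55, "Area C": 0.70}
actual_rates = {"Area A": 0.58, "Area B": 0.40, "Area C": 0.62}

for area, gap in flag_underperforming(expected_rates, actual_rates):
    print(f"{area}: {gap:.0%} below the modelled response rate")
```

Ranking by shortfall is one simple way to prioritise field effort. An operational system would presumably also weigh area size and the cost of sending staff.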
All Census communication was developed with a strong collection methodology focus to support positive respondent experiences and accurate responses. For example, form instructions about who to include were redesigned, improving usability and quality of population and family relationship data.
Accessibility was prioritised in all channels to assist full participation. Extensive development and expert partners enabled both digital service compatibility with a wide range of devices and software, and Braille and large-print versions of the paper form. Wording throughout the forms was updated to plainer English.
The ABS took a mobile-first approach for the digital service to meet growing respondent demand for this mode. Form design was guided by principles in the ABS Forms Design Standards Manual and the Australian Government Digital Service Standard.
Behavioural insights research fed into respondent letters. New messaging about when to complete the Census, clarity on compulsion and refined instructions for web form access were combined with critical scope and purpose information. Because the broader media campaign supplied context, the letters could still be kept appealingly short.
The ABS conducted thousands of cognitive interviews and usability tests with the public. These provided iterative insights into feasibility of new question topics, performance of new online features such as self-service help, and letter design. With the onset of COVID-19, the ABS introduced user testing by video-conference and new unmoderated online testing methods to generate timely information about respondent comprehension and behaviours.
Work is now underway to evaluate the 2021 Census. Assisted by natural language processing, the ABS will analyse feedback provided by respondents at the end of the digital form. Paradata will also demonstrate how and when respondents completed the Census. These findings will guide the design processes for the next Census, in 2026.
For more information, please contact Ben Ingram.
Please email email@example.com to:
Alternatively, you can post to:
Methodological News Editor
Australian Bureau of Statistics
Locked Bag No. 10
Belconnen ACT 2617