1504.0 - Methodological News, Mar 2019  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 28/03/2019   

UNSUPERVISED MACHINE LEARNING FOR DATA EDITING AND CORRECTION

Many administrative datasets and survey datasets have missing data and errors. There may be little subject matter knowledge available to indicate what checks need to be done to identify errors, and specifying a set of such edit rules is laborious. It is also not always clear how to impute for missing item values in a way that is consistent with reported items.

Supervised learning is already being explored in the ABS for situations where training data are available from historical editing by humans. In this situation the goal is to establish a model using the pre-edited data to predict an outcome such as "reported value is incorrect". Predictions from such a model can be used to guide future editing, but supervised machine learning can replicate biases that were inherent in the manual editing processes.

For this study, the proposed direction is to use machine learning or automated modelling techniques to explore features of a dataset, for example by fitting a model for the joint density of the reported items. Such a model would assign low probability to units whose item relationships appear infrequently in the dataset. Preference would be given to methods in which the unusual relationships can be reported to a human expert, who can then determine whether each one signifies an error. That determination can then be incorporated into edit rules, into a revised model that highlights items for correction, and into the imputation of new item values.
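
As a rough illustration of the idea of flagging infrequent item relationships, the sketch below uses the empirical relative frequency of value pairs as a crude stand-in for a fitted joint density. It is not the method used in the ABS study; the function name `flag_rare_pairs`, the item names, the toy records and the 0.05 threshold are all assumptions invented for this example.

```python
from collections import Counter
from itertools import combinations


def flag_rare_pairs(records, items, threshold):
    """Flag units whose value pair for any two items is rare in the dataset.

    records:   list of dicts mapping item name -> reported categorical value
    items:     item names to compare pairwise
    threshold: relative frequency below which a value pair counts as unusual
    """
    n = len(records)
    flags = []
    for a, b in combinations(items, 2):
        # Empirical joint frequency of each (value_a, value_b) combination.
        counts = Counter((r[a], r[b]) for r in records)
        for i, r in enumerate(records):
            freq = counts[(r[a], r[b])] / n
            if freq < threshold:
                # Report the relationship itself so an expert can review it.
                flags.append((i, a, r[a], b, r[b], freq))
    return flags


# Toy data, invented for illustration only.
records = (
    [{"age_group": "adult", "employment": "employed"}] * 60
    + [{"age_group": "adult", "employment": "not employed"}] * 20
    + [{"age_group": "child", "employment": "not employed"}] * 19
    + [{"age_group": "child", "employment": "employed"}] * 1
)

flags = flag_rare_pairs(records, ["age_group", "employment"], threshold=0.05)
for unit, a, va, b, vb, freq in flags:
    print(f"unit {unit}: {a}={va!r} with {b}={vb!r} (relative frequency {freq:.2f})")
```

The output names the unusual combination rather than only scoring the unit, which is the property the paragraph above asks for: a human expert can judge whether the combination is a genuine error and, if so, turn it into an edit rule. A real joint density model would also need to handle continuous items and relationships among more than two items, which this sketch does not attempt.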

For more information, contact Philip Bell (Methodology@abs.gov.au).
