Methodological News, Dec 2022

Features important work and developments in ABS methodologies

Released
13/12/2022

This issue contains three articles:

  • Automating geographic area design
  • Supply chain network reconstruction
  • Using knowledge graphs for trade statistics

Automating geographic area design

The Methodology Division (MD) co-chairs the Domain for Government Systems at the Queensland University of Technology (QUT) Centre for Data Science. As part of this collaboration, MD sponsors graduate research degrees on several projects of interest to the ABS. One project, underway since the beginning of 2022, is on the automation of area design for the Australian Statistical Geography Standard (ASGS).

The ASGS is updated every five years in conjunction with the Australian Census of Population and Housing. This process is currently manual, iterative and time-consuming, particularly when drawing new polygon boundaries in areas where there has been notable population and dwelling growth over the preceding five years. Designing statistical areas involves many constraints, including the need to align boundaries sensibly with geographic features such as major roads and rivers. Land use, population size, statistical characteristics and likelihood of future population growth must also be considered.

Automation could save considerable effort on manual redesign tasks for population growth areas, but most research in this field is focussed on building a partition from scratch. The ABS/QUT sponsored project is developing a novel approach that instead updates a pre-existing partition so that it aligns more closely with desirable features and constraints. The Hierarchical Land Parcel Aggregation (HeLP) Algorithm developed in the project combines hierarchical aggregation and graph-theoretic approaches to realign boundaries to account for population growth and other changes.
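As a rough illustration of what aggregation on an adjacency graph can look like, the Python sketch below repeatedly merges the least populated area into its least populated neighbour until every area reaches a minimum population. It is not the HeLP Algorithm itself: the function name, the min_pop threshold, the merge rule and the toy parcels are all assumptions made for illustration.

```python
def aggregate_parcels(population, adjacency, min_pop):
    """Greedily merge under-populated areas into their smallest neighbour.

    population: dict mapping parcel id -> population count
    adjacency:  dict mapping parcel id -> set of neighbouring parcel ids
                (assumed symmetric)
    min_pop:    hypothetical minimum population for a finished area
    Returns a dict mapping surviving area id -> frozenset of member parcels.
    """
    members = {p: {p} for p in population}
    pop = dict(population)
    adj = {p: set(n) for p, n in adjacency.items()}

    while True:
        # Areas still below the threshold that have somewhere to merge.
        small = [a for a in pop if pop[a] < min_pop and adj[a]]
        if not small:
            break
        a = min(small, key=lambda x: (pop[x], x))   # least populated area
        b = min(adj[a], key=lambda x: (pop[x], x))  # its least populated neighbour
        # Merge area a into area b and rewire the adjacency graph.
        members[b] |= members.pop(a)
        pop[b] += pop.pop(a)
        for n in adj.pop(a):
            adj[n].discard(a)
            if n != b:
                adj[n].add(b)
                adj[b].add(n)
    return {a: frozenset(m) for a, m in members.items()}


# Toy example: four parcels in a row, A - B - C - D (all values made up).
toy_population = {"A": 100, "B": 300, "C": 50, "D": 600}
toy_adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
areas = aggregate_parcels(toy_population, toy_adjacency, min_pop=400)
print(areas)
```

A realistic design process would also need to respect the other constraints described above, such as road and river alignment and land use, which this sketch ignores.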

The performance of the HeLP Algorithm is measured using an index of optimality that incorporates measures of land use homogeneity, geographic compactness and equality of population distribution. Tested on an area in south-eastern Queensland, the HeLP Algorithm offers an opportunity to streamline ASGS boundary creation and redirect manual effort towards higher-level design tasks. Further testing is underway to validate its comparability with existing processes and to adjust the algorithm’s properties so that they align more closely with existing ASGS design criteria.
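The article does not give the formula for the index. As a hedged sketch of how three such components could be combined, the Python example below uses one minus normalised Shannon entropy for land use homogeneity, the Polsby-Popper ratio for compactness, and one minus the coefficient of variation for population equality, averaged with equal weights. Every one of these choices, and all function names, are assumptions rather than the project's actual definitions.

```python
import math
from statistics import mean, pstdev


def land_use_homogeneity(shares):
    """1 minus normalised Shannon entropy of land-use shares (1 = single use)."""
    k = len(shares)
    total = sum(shares)
    if k <= 1 or total <= 0:
        return 1.0
    entropy = -sum((s / total) * math.log(s / total) for s in shares if s > 0)
    return 1.0 - entropy / math.log(k)


def compactness(area, perimeter):
    """Polsby-Popper ratio: 1 for a circle, smaller for irregular shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


def population_equality(populations):
    """1 minus the coefficient of variation, floored at 0 (1 = equal sizes)."""
    avg = mean(populations)
    if avg <= 0:
        return 0.0
    return max(0.0, 1.0 - pstdev(populations) / avg)


def optimality_index(areas, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three components (weights are an assumption)."""
    h = mean([land_use_homogeneity(a["land_use_shares"]) for a in areas])
    c = mean([compactness(a["area"], a["perimeter"]) for a in areas])
    e = population_equality([a["population"] for a in areas])
    w_h, w_c, w_e = weights
    return w_h * h + w_c * c + w_e * e


# Two made-up areas: unit squares, single land use, equal population.
toy_areas = [
    {"land_use_shares": [1.0, 0.0], "area": 1.0, "perimeter": 4.0, "population": 500},
    {"land_use_shares": [1.0, 0.0], "area": 1.0, "perimeter": 4.0, "population": 500},
]
print(round(optimality_index(toy_areas), 3))
```

Under these assumptions each component lies between 0 and 1, so the index does too, with higher values indicating a partition closer to the stated design goals.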

Preliminary analysis has found that the HeLP Algorithm is less parsimonious than current ASGS design processes, tending to create more statistical areas. The results suggest that this is driven by areas with large numbers of roads and low land use homogeneity. In such areas, where the algorithm creates many small areas, a human analyst may choose to override selected automated design outputs. The HeLP Algorithm is relatively efficient, with a worst-case time complexity of \(\mathcal{O}\left(n^2 \log n\right)\) and an average running time in tests of \(\mathcal{O}\left(n \log n\right)\).

Future work on this project will include exploring high-performance computing options to reduce computation time, investigating system deployment options for use within the ABS, further fine-tuning the parameters of the algorithm for optimal performance, and developing the capability to use the algorithm within selected locations where manual boundary creation could be most effectively streamlined through automation.

For more information, please contact Edwin Lu or Filip Juricev-Martincev.

Supply chain network reconstruction

Over recent years, the world has witnessed an increase in economic disruptions, ranging from natural disasters and trade embargoes to the COVID-19 pandemic and cyber-attacks. These events impact national and international business supply chains, highlight the vulnerabilities of modern integrated economies and heighten sovereign risk. Across the globe, governments are seeking to measure resilience in the business supply chain network, forecast the impacts of emerging or potential disruptions, develop effective economic risk mitigation strategies, and facilitate economic recovery.

The ABS has been undertaking an innovative project to evaluate the feasibility of reconstructing the domestic business supply chain network, in order to estimate the risks and likely impacts of economic shocks as they pass through the Australian economy. This is difficult to do with existing statistics, which are not designed to capture the trading relationships between businesses and hence the dynamics of network interactions.

To construct a probabilistic prototype supply chain network, the ABS has utilised supply-use tables from the national accounts and tax data on business sales and purchases. The approach follows methods implemented at the Dutch Central Bureau of Statistics and in the emerging body of literature on network reconstruction. Our first proof-of-concept model focused on the bread-manufacturing supply chain, from fertiliser production to wheat farms, to flour milling and finally to bread-making. We produced a visualisation of the relative trade across each of these production markets that highlights the key selling and buying states and territories for each product. Importantly, we established the feasibility of the approach in the Australian context, revealing its potential to address specific research needs. Over time, the ABS plans to explore the acquisition of other data sources and improvements to the underlying methods that would enhance the model.
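To give a flavour of what a probabilistic reconstruction step can look like, the Python sketch below spreads each seller's sales of one product across buyers in proportion to each buyer's share of total purchases. This proportional allocation is a simple stand-in chosen for illustration, not the ABS or Dutch method; the function name, the unit names and the values are all hypothetical.

```python
def expected_flows(sales, purchases):
    """Allocate each seller's sales to buyers in proportion to buyer purchases.

    sales:     dict seller -> total sales of one product
    purchases: dict buyer  -> total purchases of that product
    Returns a dict (seller, buyer) -> expected trade value.
    """
    total_purchases = sum(purchases.values())
    if total_purchases <= 0:
        return {}
    return {
        (seller, buyer): sold * bought / total_purchases
        for seller, sold in sales.items()
        for buyer, bought in purchases.items()
    }


# Made-up flour market: two milling units selling to three bread-making units.
flour_sales = {"mill_NSW": 60.0, "mill_VIC": 40.0}
flour_purchases = {"bakery_NSW": 50.0, "bakery_QLD": 30.0, "bakery_VIC": 20.0}
flows = expected_flows(flour_sales, flour_purchases)

for (seller, buyer), value in sorted(flows.items()):
    print(f"{seller} -> {buyer}: {value:.1f}")
```

Each seller's expected flows sum to its recorded sales, and when total sales equal total purchases each buyer's inflows also sum to its recorded purchases. More refined reconstruction methods add information about which links are plausible rather than connecting every seller to every buyer.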

Network reconstruction presents new opportunities to quantify complex economic and social systems. This prototype work could be extended to cover other parts of the economy and to measure the extent and magnitude of disruptions throughout supply chains. As a comprehensive map of interconnections in the economy, a reconstructed network would enable the ABS to fill critical data gaps and provide new statistical tools for policymaking and macro-economic research.

For more information, please contact Anthony Russo.

Using knowledge graphs for trade statistics

The ABS is committed to the continued improvement of its data integration and analysis capabilities. New methods and systems increase the efficiency of delivering mainstream statistical products. They also enable new statistical solutions that meet emerging information needs. With the publication of the 2019-20 Characteristics of Australian Exporters (CoAE) in April 2022, the ABS has achieved a major milestone in its adoption of powerful graph-based methods for statistical production. Statistical outputs for the 2019-20 CoAE publication were generated from GLIDE, an advanced information system developed at the ABS for integrating and analysing diverse multisource data.

GLIDE enables the microdata sets needed for a particular analytical purpose to be represented, stored, combined and manipulated in the form of a network called a knowledge graph. Such microdata can be of any kind (structured or unstructured) and from any source (survey, administrative, transactional, sensor, web, and so on). The knowledge graph is a logically unified information object that can hold all source and derived data items, together with their underlying concepts and any associated metadata.

The knowledge graph paradigm underpins a flexible, intuitive and highly scalable approach for dealing with massive volumes of highly heterogeneous data. It also allows the semantics (meaning) of concepts captured in the data to be expressed in a computable way. This facilitates consistent and objective interpretation, and provides a backbone for specialised logical reasoning and machine learning methods to automatically derive analytical insights. Both the graph model and semantic enrichment framework used in GLIDE are based on the W3C standards for the Semantic Web.
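In the W3C Semantic Web model, a graph is a set of subject-predicate-object triples. The plain-Python sketch below is only meant to illustrate that idea and one simple reasoning rule (an instance of a subclass is also an instance of the superclass). It does not reflect how GLIDE is implemented, and every identifier under the ex: prefix is hypothetical.

```python
# A tiny knowledge graph as a set of (subject, predicate, object) triples.
triples = {
    ("ex:business/1", "rdf:type", "ex:Business"),
    ("ex:business/1", "ex:hasABN", "ex:abn/A"),
    ("ex:export/17", "rdf:type", "ex:ExportTransaction"),
    ("ex:export/17", "ex:declaredBy", "ex:abn/A"),
    ("ex:export/17", "ex:valueAUD", 125000),
    ("ex:ExportTransaction", "rdfs:subClassOf", "ex:TradeTransaction"),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    hits = [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
    return sorted(hits, key=str)


def infer_types(graph):
    """Apply one RDFS-style rule until nothing new is derived:
    (x rdf:type C) and (C rdfs:subClassOf D) implies (x rdf:type D)."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        subclass_of = [(c, d) for c, p, d in inferred if p == "rdfs:subClassOf"]
        for x, p, c in list(inferred):
            if p != "rdf:type":
                continue
            for sub, sup in subclass_of:
                if sub == c and (x, "rdf:type", sup) not in inferred:
                    inferred.add((x, "rdf:type", sup))
                    changed = True
    return inferred


enriched = infer_types(triples)
print(match(enriched, p="rdf:type", o="ex:TradeTransaction"))
```

Because data, concepts and metadata all live in the same triple structure, the same pattern-matching query works across source records and derived facts alike.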

For the 2019-20 CoAE publication, a Trade Performance Knowledge Graph (TPKG) was constructed from linked source data sets in the Business Longitudinal Analysis Data Environment (BLADE) that contain firm-level business characteristics and activity information, including records of trade (export and import) transactions. Not only did the TPKG assist in the efficient production and validation of statistical outputs from a huge volume of integrated data, it enabled a long-standing data linkage problem to be solved using an innovative graph method. This resulted in improved mapping between business units identified by Australian Business Number (ABN) in the trade transactions data and production units – or Type of Activity Units (TAUs) – associated with profiled businesses in the ABS Business Register.
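The article does not describe the graph method used for the ABN-to-TAU mapping. Purely to illustrate how a linkage question can be posed as a graph problem, the Python sketch below treats ABNs, intermediate units and TAUs as nodes and uses breadth-first search to find the TAUs connected to a given ABN. The node labels, the "GROUP:" intermediate nodes and the edges are all hypothetical.

```python
from collections import deque


def reachable_taus(edges, abn):
    """Breadth-first search from an ABN node, returning the TAU nodes reached.

    edges: iterable of (node, node) pairs, treated as undirected links
    abn:   starting node label, e.g. "ABN:A"
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen, queue = {abn}, deque([abn])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(n for n in seen if n.startswith("TAU:"))


# Made-up structure: two ABNs share one group with two TAUs; a third stands alone.
toy_edges = [
    ("ABN:A", "GROUP:1"),
    ("ABN:B", "GROUP:1"),
    ("GROUP:1", "TAU:x"),
    ("GROUP:1", "TAU:y"),
    ("ABN:C", "GROUP:2"),
    ("GROUP:2", "TAU:z"),
]
print(reachable_taus(toy_edges, "ABN:A"))
```

In this toy graph an ABN that maps to several TAUs is immediately visible as a one-to-many case, which is the kind of ambiguity a linkage method then has to resolve.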

The TPKG will provide a foundation for the delivery of future trade statistics, including the 2020-21 Characteristics of Australian Importers (CoAI) publication scheduled for release in February 2023. The ABS is also investigating the feasibility of augmenting the TPKG with additional data after the 2020-21 CoAI is published, using a graph-based location spine that incorporates geographic information and spatial positioning data.

For more information, please contact Tim Cadogan-Cowper or Ric Clarke.

Contact us

Please email methodology@abs.gov.au to:

  • contact authors for further information
  • provide comments or feedback
  • be added to or removed from our electronic mailing list

Alternatively, you can post to:

Methodological News Editor
Methodology Division
Australian Bureau of Statistics
Locked Bag No. 10
Belconnen ACT 2617

The ABS Privacy Policy outlines how the ABS will handle any personal information that you provide to us.

Previous releases

Releases from June 2021 onwards can be accessed under research.

Releases up to March 2021 can be accessed under past releases.
