Methodological News, March Quarter 2025

Features important work and developments in ABS methodologies

Released
12/03/2025

This issue contains two articles:

  • Using LLMs to analyse survey feedback for improved user experience
  • Enhancing Labour Statistics: Producing Demographics Insights from Payroll Data

Using LLMs to analyse survey feedback for improved user experience

The Australian Bureau of Statistics (ABS) frequently collects user feedback at the end of surveys to improve user experience. The end-of-survey comments provide users with an opportunity to share their feedback, including any difficulties encountered or suggestions for improvement. These comments, which are free text responses, can be analysed using natural language processing (NLP) techniques to identify key issues contributing to respondent burden, ultimately enhancing user experience.

This work revises earlier ABS work on text analysis. Specifically, it uses artificial intelligence (AI) in the areas of NLP and large language models (LLMs) to perform tasks such as topic modelling and sentiment analysis, leading to better performance, faster development, and more automation possibilities. Further, we embed these models within a reusable end-to-end pipeline that can be deployed in the cloud.

The free text analysis pipeline performs two tasks:

  1. Topic modelling
  2. Sentiment analysis

Topic modelling is a process that identifies and clusters the most frequent words in the text to generate ‘topics’ that represent key underlying themes (e.g. “insufficient time to complete the survey”). ABS has implemented various methods for topic modelling, including BERTopic, a method based on BERT (Bidirectional Encoder Representations from Transformers), and probabilistic approaches like Latent Dirichlet Allocation (LDA). The best model is automatically selected using the ‘coherence score’ as a metric, which evaluates whether the words grouped into a ‘topic’ are coherent.
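
The selection step can be illustrated with a small sketch. The article does not state which coherence measure is used, so the example below assumes a simplified UMass-style score based on how often a topic's words appear together in the same comment. The function names, the toy comments and the candidate labels (model_a, model_b) are hypothetical and stand in for the topics that fitted models such as BERTopic or LDA would return.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents):
    """Mean UMass-style coherence over all word pairs in one topic.

    Each pair scores log((co-document count + 1) / document count of the
    first word), so words that appear together score higher.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for first, second in combinations(topic_words, 2):
        denom = doc_freq(first)
        if denom == 0:
            continue  # word never occurs in the corpus; skip the pair
        scores.append(math.log((doc_freq(first, second) + 1) / denom))
    return sum(scores) / len(scores) if scores else float("-inf")


def select_best_model(candidates, documents):
    """Return the candidate with the highest mean topic coherence.

    `candidates` maps a model label to its topics, each a list of top words.
    """
    mean_scores = {
        name: sum(umass_coherence(t, documents) for t in topics) / len(topics)
        for name, topics in candidates.items()
    }
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores


# Toy tokenised comments (illustrative only, not ABS data).
documents = [
    ["survey", "too", "long", "time"],
    ["not", "enough", "time", "survey", "long"],
    ["login", "page", "error"],
    ["error", "on", "login", "page"],
]

# Hypothetical topic output from two candidate models.
candidates = {
    "model_a": [["survey", "time", "long"], ["login", "page", "error"]],
    "model_b": [["survey", "login", "long"], ["time", "page", "error"]],
}

best_model, scores = select_best_model(candidates, documents)
print(best_model)  # model_a: its topics group words that co-occur
```

In this toy case model_a is selected because each of its topics groups words that occur in the same comments, whereas model_b mixes words from unrelated comments.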

Sentiment analysis is a process that determines the emotional tone or sentiment (positive, neutral or negative) of each comment, reflecting the user experience. Insights from sentiment analysis can be used to improve user experience and reduce respondent burden. Sentiment analysis is conducted using a pre-trained LLM. ABS evaluated the performance of several open-source, pre-trained LLMs on a test dataset. Human evaluation was used methodically to select the model that most closely aligns with human judgment.
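
As a rough illustration of selecting a model by its alignment with human judgment, the sketch below compares the labels from two candidate models against human-assigned labels on a small test set. The article does not say which agreement measure was used; Cohen's kappa is assumed here as one common choice. The labels are hard-coded, and the names model_x and model_y are hypothetical placeholders for the output of real pre-trained LLMs.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def accuracy(predicted, human):
    """Share of comments where the model label equals the human label."""
    return sum(p == h for p, h in zip(predicted, human)) / len(human)


def cohens_kappa(predicted, human):
    """Agreement between model and human labels, corrected for chance."""
    n = len(human)
    observed = accuracy(predicted, human)
    pred_counts, human_counts = Counter(predicted), Counter(human)
    expected = sum(pred_counts[l] * human_counts[l] for l in LABELS) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


# Human-assigned labels for a toy test set (illustrative only).
human_labels = ["negative", "negative", "neutral",
                "positive", "negative", "positive"]

# Hypothetical predictions from two candidate models on the same comments.
model_predictions = {
    "model_x": ["negative", "negative", "neutral",
                "positive", "neutral", "positive"],
    "model_y": ["negative", "neutral", "neutral",
                "positive", "positive", "negative"],
}

kappas = {
    name: cohens_kappa(preds, human_labels)
    for name, preds in model_predictions.items()
}
best_sentiment_model = max(kappas, key=kappas.get)
print(best_sentiment_model, round(kappas[best_sentiment_model], 2))
```

Here model_x disagrees with the human labels on one comment out of six and model_y on three, so model_x has the higher kappa and is selected.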

The code pipeline developed in this project is semi-automatic and can be reused for other survey free-text data once it has been configured and evaluated for that data. The tasks in the pipeline are modular, allowing new projects to include only the tasks they need (e.g. only topic modelling) and to add new machine learning/NLP tasks. The pipeline is implemented in the AWS (Amazon Web Services) cloud environment, and the outputs can be visualised in QuickSight, an AWS visualisation tool. This allows the output to be analysed to obtain desired insights for improved business intelligence.
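
One way such a modular design could be organised is sketched below: tasks are registered by name, and a project lists only the tasks it wants to run. This is an assumption about structure made for illustration, not a description of the ABS code. The registry, the function names and the two placeholder tasks are hypothetical, and the placeholders use simple word counts and a keyword rule where the real pipeline would call a topic model or an LLM.

```python
from collections import Counter

# Hypothetical registry mapping a task name to the function that runs it.
TASK_REGISTRY = {}


def register_task(name):
    """Decorator that adds a task function to the registry under `name`."""
    def decorator(func):
        TASK_REGISTRY[name] = func
        return func
    return decorator


@register_task("topic_modelling")
def topic_modelling(comments):
    # Placeholder only: returns the three most common words
    # instead of fitting a topic model.
    words = [w for c in comments for w in c.lower().split()]
    return [w for w, _ in Counter(words).most_common(3)]


@register_task("sentiment_analysis")
def sentiment_analysis(comments):
    # Placeholder only: a keyword rule instead of a pre-trained LLM.
    return ["negative" if "difficult" in c.lower() else "neutral"
            for c in comments]


def run_pipeline(comments, enabled_tasks):
    """Run only the tasks a project has enabled, in the order listed."""
    unknown = [t for t in enabled_tasks if t not in TASK_REGISTRY]
    if unknown:
        raise ValueError(f"Unknown tasks: {unknown}")
    return {name: TASK_REGISTRY[name](comments) for name in enabled_tasks}


comments = ["The survey was difficult to complete", "The survey was fine"]

# A project that needs only topic modelling enables just that task.
results = run_pipeline(comments, enabled_tasks=["topic_modelling"])
print(list(results))
```

Adding a new machine learning or NLP task under this design means writing one function and registering it under a new name; existing tasks are unaffected.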

Next steps will include improving the performance, reusability, and reliability/reproducibility of the pipeline. 

For more information, please contact Ilana Lichtenstein and Sobia Saleem at methodology@abs.gov.au.

Enhancing Labour Statistics: Producing Demographics Insights from Payroll Data

As part of our ongoing objective to provide new insights, the ABS is enhancing statistical outputs that use Single Touch Payroll (STP) data supplied by the Australian Taxation Office (ATO). This data contains payroll earnings information for jobholders across the majority of employing businesses in Australia and is currently used to produce the Monthly Employee Earnings Indicator and Payroll Jobs publications.

Single Touch Payroll Data and the Units Model

To enable the disaggregation of employment and wages and salaries data by jobholder characteristics like sex and age group, the ABS has developed a new methodology for mapping jobholder-level demographic information from STP to the Economic Units Model (EUM). This work extends the statistical unit mapping described in Methodological News, September Quarter 2023.

Most businesses on the ABS Business Register are assumed to have a simple structure where an ABN is directly mapped to the statistical unit. Larger and more complex businesses are profiled by the ABS as Enterprise Groups (EGs) consisting of multiple ABNs, with data mapped to one or more Type of Activity Units (TAUs).

STP data can be used to infer employment and wages and salaries paid by ABNs to jobholders. While business characteristics are available for all businesses under the units model, jobholder characteristics are not, although they are available from STP and related data. Our goal was to enable statistical outputs disaggregated by both business and jobholder characteristics, e.g. employment by industry subdivision and sex.

Methodology

The proposed demographic mapping methodology utilises ABN-level information from prior periods of STP and related ATO data to apportion the data from a group of ABNs to one or more TAUs.

First, the method constructs prevalence rates for a variable of interest (employment or wages) by industry subdivision from historical data for each “demographic category” (sex by ten-year age group), to account for differing industry demographics. Within a particular EG, the industry subdivisions of all TAUs are assigned a proportion of each demographic category, weighted by the total employment size of each TAU. Historical totals across all ABNs in the same EG are created in each demographic category for the variable of interest and used to adjust the proportion of the variable within each demographic category in all TAUs. Finally, the total employment or wages in each TAU is prorated according to these proportions.

An advantage of this method is that the pre-existing employment and wages totals of all TAUs are preserved, meaning no modifications to existing methodology are required to implement this additional proration step. Additivity is also preserved: the sum of all demographic categories matches the pre-existing total employment and wages of the TAU.
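
The proration steps can be sketched on a toy Enterprise Group with two TAUs and two demographic categories. The article does not give the exact form of the adjustment, so this example assumes a single scaling pass: industry-based estimates are scaled so that each category moves towards the EG's historical share, then renormalised within each TAU. All names and figures (TAU_1, subdiv_A, F_25_34 and so on) are hypothetical.

```python
def prorate_tau_totals(tau_totals, tau_industry, industry_rates,
                       eg_historical_totals):
    """Split each TAU total across demographic categories.

    tau_totals: TAU -> existing total (employment or wages)
    tau_industry: TAU -> industry subdivision
    industry_rates: subdivision -> {category: historical prevalence rate}
    eg_historical_totals: category -> historical total over the EG's ABNs
    """
    categories = list(eg_historical_totals)

    # Step 1: initial estimates from industry prevalence rates,
    # weighted by the size of each TAU.
    initial = {
        tau: {c: total * industry_rates[tau_industry[tau]][c]
              for c in categories}
        for tau, total in tau_totals.items()
    }

    # Step 2: scale each category towards the EG's historical share
    # (assumed single-pass adjustment).
    eg_total = sum(tau_totals.values())
    hist_sum = sum(eg_historical_totals.values())
    factors = {}
    for c in categories:
        implied = sum(row[c] for row in initial.values())
        target = eg_total * eg_historical_totals[c] / hist_sum
        factors[c] = target / implied if implied else 0.0

    # Step 3: renormalise within each TAU and prorate its existing total,
    # so the TAU total is preserved exactly.
    result = {}
    for tau, row in initial.items():
        adjusted = {c: row[c] * factors[c] for c in categories}
        row_sum = sum(adjusted.values())
        if row_sum == 0:
            raise ValueError(f"No demographic information for {tau}")
        result[tau] = {c: tau_totals[tau] * adjusted[c] / row_sum
                       for c in categories}
    return result


# Toy inputs (illustrative only, not ABS data).
tau_totals = {"TAU_1": 100.0, "TAU_2": 300.0}
tau_industry = {"TAU_1": "subdiv_A", "TAU_2": "subdiv_B"}
industry_rates = {
    "subdiv_A": {"F_25_34": 0.6, "M_25_34": 0.4},
    "subdiv_B": {"F_25_34": 0.3, "M_25_34": 0.7},
}
eg_historical_totals = {"F_25_34": 180.0, "M_25_34": 220.0}

prorated = prorate_tau_totals(tau_totals, tau_industry,
                              industry_rates, eg_historical_totals)
for tau, split in prorated.items():
    print(tau, {c: round(v, 1) for c, v in split.items()})
```

Because the final step rescales within each TAU, the category values for TAU_1 and TAU_2 sum to 100 and 300 respectively, which is the additivity property described above. With a single pass the category totals across the EG move towards the historical totals but need not match them exactly.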

The method is currently under consideration before being used in published data.

For more information, please contact Jacob Ryan or Daniel Gow at methodology@abs.gov.au.

Contact us

Please email methodology@abs.gov.au to:

  • contact authors for further information
  • provide comments or feedback
  • be added to or removed from our electronic mailing list.

Alternatively, you can post to:

Methodological News Editor
Methodology Division
Australian Bureau of Statistics
Locked Bag No. 10
Belconnen ACT 2617

The ABS Privacy Policy outlines how the ABS will handle any personal information that you provide to us.

Previous releases

Releases from June 2021 onwards can be accessed under research.

Releases up to March 2021 can be accessed under past releases.
