About the Whole of Australian Government Occupation Coding Service

Learn more about the Coding Service and how to use it. Includes privacy, security, and assurance information.

Release date and time
22/10/2025 11:30am AEDT

What is the Coding Service?

We created a video explaining what the ABS WoAG Occupation Coding Service is and how it works. 

The WoAG Occupation Coding Service ('the Coding Service') is a new capability designed to code job title and tasks text to official statistical classifications. In this context, ‘coding’ means assigning numbered and labelled categories to free text descriptions of job titles and tasks. The Coding Service is trained by supervised machine learning to make fast and accurate predictions of the best code matches for this input data.

Using cloud compute for power and scale to achieve highly automated, high-quality coding, the service makes coding data to the latest classifications easier, faster and cheaper. Coding to a standard classification ensures the data is meaningful and comparable across data sources, and enables consistent data coding across the Australian data landscape. This means data assets are more coherent for research and analysis, and policy makers get better information.

The service can be integrated into agency systems, forms and survey instruments. It can code single records or small batches of data in real time, and can also code large datasets in batches, up to many millions of records.

The Coding Service can code data to:

  • the Occupation Standard Classification for Australia (OSCA) 2024, and
  • the Australian and New Zealand Standard Classification of Occupations (ANZSCO) 2022 Australian Update.

The ABS will maintain the service to ensure the most up-to-date occupation models are available.

Machine Learning models

The Coding Service uses Hierarchical Support Vector Machine (HSVM) models trained by Machine Learning (ML) to code data. ML is a branch of artificial intelligence (AI) that uses algorithms to analyse, draw conclusions and then make predictions from patterns in data. The algorithms are trained to spot patterns, which helps improve coding accuracy. 

The models make predictions by learning how the ABS applies the occupation classifications against the way people describe their jobs and the tasks they do. During the training process, the models learn to map certain words, or combinations of words, to return specific occupation codes for given input text. As the models are trained with both job title and task inputs, the best outcomes are achieved when both inputs are included for coding, and enough detail to ensure the right coding decision is provided.

The models were trained with millions of de-identified responses to occupation of employment questions in previous Australian Censuses. An advantage of this is that the training data represents the Australian population, and the service can recognise and code jobs described in the language and broad vocabulary used by Australians themselves - including misspellings and common colloquial terms. For example, if a person types ‘brickie’ into an online form, the service will recognise that this belongs to OSCA code 371131 - Bricklayer.

The models consider individual words, as well as bigrams, or combinations of two consecutive words, to help capture contextual meaning and word order. For example, the phrase ‘General Manager’ carries a meaning that is distinct from the words ‘General’ or ‘Manager’ on their own, which may appear across many unrelated occupations.
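As a rough illustration of the idea of words plus bigrams, the sketch below splits a job title into single words and pairs of consecutive words. The tokenisation rule (lower-casing and a simple regular expression) and the function name are assumptions made for this example, not the preprocessing the Coding Service actually uses.

```python
import re


def unigrams_and_bigrams(text: str) -> list[str]:
    """Return lower-cased single words followed by consecutive word pairs."""
    # Assumed tokenisation: runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # A bigram is two consecutive words joined in their original order.
    bigrams = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words + bigrams


print(unigrams_and_bigrams("General Manager"))
# ['general', 'manager', 'general manager']
```

The bigram 'general manager' is kept as its own feature, alongside the two individual words.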

When working out the best match for an input, the models draw on the structure and distribution of the input job title and task text in the training data. This approach gives the models flexibility to learn from natural variations in language. It also means that the quality, frequency, and specificity of training examples directly shape model behaviour. As a result, the models are more likely to predict occupations that are more common in the training data, particularly when coding short or ambiguous inputs.

Training data is analysed to address biases and ensure the models balance efficiency with measures of quality, including accuracy and matching of input text to the classification. While the models behind the service have been extensively trained and tested to support high-quality coding, we cannot validate or guarantee correct results for all possible inputs. As with any AI or ML-based service, it may return unexpected results for ambiguous inputs. We encourage feedback (via coding.capability@abs.gov.au) to help us continue refining the service for all users.

How the service works

The Coding Service is connected and consumed through a standard and secure Application Programming Interface (API) endpoint. This is a software-to-software interface that allows two applications to exchange data with each other via programming scripts. Users may need support from their agency's technology services to integrate the Coding Service into their systems.

The programming code written in an agency’s system sends a request to the Coding Service, via an API, to assign classification codes and titles to specified input text. The ABS system returns classification codes and titles back through the API to the sender.

Single record or small batch coding

  • Single record (synchronous) coding is the real time coding of text inputs for a single instance via the API. This involves a programming code request that includes a text input of a job title and a description of that job’s tasks. The service generates a response in the programming code that indicates whether the request was successful and if so, provides the best matched occupation code and title for the job, and a confidence score for that match.
  • The single record coding function can be embedded into online forms or surveys for synchronous (real-time) coding of text entries. Programming code for integration is available in the User Guide.
  • Synchronous coding of up to 300 records (small batch coding) can also be carried out via the API. The service generates a response to each record listed in the programming request code, indicating whether the record was successfully coded, and if so, providing the best matched occupation code and title for that job, with its associated confidence score.
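
A client that holds more than 300 records but wants synchronous responses would need to split them into groups that fit the small batch limit. The sketch below shows one way to do that. The record field names (job_title, tasks) and the function name are hypothetical and are not taken from the User Guide.

```python
from typing import Iterator

MAX_SYNC_BATCH = 300  # small batch limit described above


def small_batches(records: list[dict], size: int = MAX_SYNC_BATCH) -> Iterator[list[dict]]:
    """Yield consecutive slices of records, each no longer than size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Hypothetical records: the field names are placeholders for this example.
records = [{"job_title": f"title {i}", "tasks": f"tasks {i}"} for i in range(650)]
print([len(batch) for batch in small_batches(records)])
# [300, 300, 50]
```

Each yielded group could then be sent as one synchronous request. For much larger volumes, the large batch option is likely to be the better fit.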

Large batch coding

  • The Coding Service can also be used for large batch coding or recoding, up to millions of records. This is asynchronous coding and can take from 5 minutes to a couple of hours, depending on a range of factors such as number of concurrent users, order in the queue, size of the data file to be coded and connection speed.
  • A file of records is sent via an API request and coded by the service. The output is a web address returned in the sender’s programming code. The web address hosts a downloadable file of all the coded and un-coded records, which opens by clicking the address.
  • The service does not recognise classification codes as inputs.
  • While it is not possible to recode a six-digit ANZSCO code to a six-digit OSCA code, datasets with only ANZSCO codes may be recoded to OSCA if the ANZSCO occupation title is reattached to each record. Including more information as a tasks text entry, such as the occupation descriptions from the classification, will give better outcomes.
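
The recoding approach in the last point can be sketched as a small preparation step: look up the title (and, optionally, the description) for each ANZSCO code and send that text instead of the code. The function name, field names and lookup tables here are assumptions for illustration. The lookup entry in the usage example is illustrative too, and should be checked against the published classification.

```python
from typing import Optional


def build_recode_inputs(
    anzsco_codes: list[str],
    titles: dict[str, str],
    descriptions: Optional[dict[str, str]] = None,
) -> tuple[list[dict], list[str]]:
    """Turn ANZSCO codes into text inputs, because codes alone are not accepted.

    Returns the prepared inputs and the codes that had no title in the lookup.
    """
    descriptions = descriptions or {}
    inputs, missing = [], []
    for code in anzsco_codes:
        title = titles.get(code)
        if title is None:
            # Without a title there is no text to send for this record.
            missing.append(code)
            continue
        inputs.append({
            "source_code": code,                   # kept so outputs can be matched back
            "job_title": title,                    # the reattached occupation title
            "tasks": descriptions.get(code, ""),   # optional extra detail
        })
    return inputs, missing


# Illustrative lookup entry; verify against the published ANZSCO before use.
titles = {"331111": "Bricklayer"}
inputs, missing = build_recode_inputs(["331111", "999999"], titles)
print(inputs)   # [{'source_code': '331111', 'job_title': 'Bricklayer', 'tasks': ''}]
print(missing)  # ['999999']
```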

Understanding model confidence

In machine learning, model confidence typically refers to the estimated probability the model assigns to its predicted outcome, indicating how certain it is about that prediction. The Coding Service models make predictions according to the structure of the classifications. In the case of ANZSCO and OSCA, these are hierarchical structures with five levels:

  • major group (1-digit code) - broadest level
  • sub-major group (2-digit code)
  • minor group (3-digit code)
  • unit group (4-digit code)
  • occupation (6-digit code) - most detailed level

(To learn more about the occupation classification structure, see the OSCA Structure.)
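
Because each level has a distinct code length, the level of a returned code can be read from its number of digits. The helper below is a minimal sketch of that mapping. The function name is an assumption, and it does not handle the non-classification codes returned by the OSCA 2024 model, because their format is not described here.

```python
from typing import Optional

# Code length to classification level, as listed above.
LEVEL_BY_DIGITS = {
    1: "major group",
    2: "sub-major group",
    3: "minor group",
    4: "unit group",
    6: "occupation",
}


def classification_level(code: str) -> Optional[str]:
    """Return the level name for a numeric code, or None if it is not recognised."""
    code = code.strip()
    if not code.isdigit():
        return None
    return LEVEL_BY_DIGITS.get(len(code))


print(classification_level("265"))     # minor group
print(classification_level("265433"))  # occupation
print(classification_level("26543"))   # None (there is no 5-digit level)
```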

The model uses the input text (job title and tasks) to predict the most appropriate code, starting at the major group (1-digit code) and making its way down the classification structure to the occupation level (6-digit code). At each level, the model will choose the most likely code based on the information it learned during training, and the final code returned to the user will include a confidence score. This confidence score represents the overall confidence of the model across all the levels of the classification for which the model was able to predict a code. 

For example, if the model has predicted the 6-digit code 265433, Registered Nurse (Aged Care), its prediction path across the OSCA classification, from broadest to most specific occupation categories, would be:

[Figure: Model prediction path. Shows the classification codes, labels and a brief description for registered nurse from the 1-digit (least detailed) to the 6-digit (most detailed) level.]

At each level, the model generates a probability score for the code at that level given the code at the previous (higher) level. These are conditional probabilities.

Let’s say the probabilities are:

  • Starting probability of being a Professional: 0.99
  • From Professionals to Health Professionals: 0.94
  • From Health Professionals to Midwifery and Nursing Professionals: 0.92
  • From Midwifery and Nursing Professionals to Registered Nurses: 0.91
  • From Registered Nurses to Registered Nurse (Aged Care): 0.96

To find the joint probability - the overall probability of the input text being coded as 265433 Registered Nurse (Aged Care) - we multiply all the conditional probabilities:

            0.99 × 0.94 × 0.92 × 0.91 × 0.96 = 0.748

So the overall confidence score for this 6-digit code prediction will be about 75%.

If the coder is unable to code an input text at the more detailed levels, the joint probability will be computed across the levels that were coded successfully.

For example, if the coder predicted the 3-digit code 265 (Midwifery and Nursing Professionals) with the following conditional probabilities:

  • Starting probability of being a Professional: 0.90
  • From Professionals to Health Professionals: 0.80
  • From Health Professionals to Midwifery and Nursing Professionals: 0.90

the confidence score for this prediction would be 0.9 × 0.8 × 0.9 = 0.648, or about 65%.
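
The two worked examples above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic described in the text (multiplying the conditional probabilities for the levels that were coded). The function name is an assumption, and the service itself may compute or round the score differently.

```python
import math


def joint_confidence(conditional_probabilities: list[float]) -> float:
    """Multiply per-level conditional probabilities into one overall score."""
    if not conditional_probabilities:
        raise ValueError("at least one level must have been coded")
    for p in conditional_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(conditional_probabilities)


# 6-digit example: 265433 Registered Nurse (Aged Care)
six_digit = joint_confidence([0.99, 0.94, 0.92, 0.91, 0.96])
# 3-digit example: 265 Midwifery and Nursing Professionals
three_digit = joint_confidence([0.90, 0.80, 0.90])

print(round(six_digit, 3))    # 0.748
print(round(three_digit, 3))  # 0.648
```

Each factor is at most 1, so the score can never exceed the lowest conditional probability along the path.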

Using the service

How to register

API integration

  • Instructions for connecting to the service are in the User Guide. Users must agree to read and follow the API integration instructions in the guide as part of registration.
  • The User Guide is written for software developers and technical professionals integrating the service into a client application.
  • The guide outlines the API endpoints available for accessing and using the service, and provides integration instructions for calling the API.

Example scripts

  • The ABS does not provide maintenance or support for end users’ own applications, but we have created an accessible set of sample scripts with clear instructions, useful tips and examples. Used in conjunction with the User Guide, these examples could help shape your approach to integrating with the coding API, or converting batch coding output.

Important context for model use

The occupation coding models are optimised for use in an Australian context. They use an English character set for coding responses that relate to jobs that fall within the definition and scope of official ABS occupation classifications. 

The contextual assumption of the input text is that the text relates to and describes a job. The models can recognise a broad vocabulary, and will attempt to code all input text regardless of context, so users need to ensure a contextual fit between their input data and the coding task being undertaken. For example, if a person enters their job text as ‘prisoner’, the service assumes a context that the occupation to be coded works with prisoners in some way, and codes to ‘Correctional Officer’. Likewise, the input text ‘Student’ codes to ‘Student Services Adviser’.

If a person describes two jobs in a single text response, the service will attempt to code them to a best fit single occupation code at the most detailed level. This will reflect the training data and will depend on how many times the two jobs were present together in the training data. The service will default to whatever is most commonly found in the training data.  

The OSCA 2024 model has also been trained to return non-classification codes for responses that are not detailed enough to be coded anywhere in the classification (e.g. ‘aaa’), or responses that people may enter as their ‘job’ but that are not occupations within the scope of the classification, such as:

  • Retired
  • Unemployed
  • Pensioner
  • Housewife/husband
  • Child/baby
  • Not in the labour force.

As above, if people combine non-classification responses with job titles, such as ‘retired boilermaker’, these may code as ‘retired’, or ‘boilermaker’, or something else entirely, depending on other contextual data and the number of times the combination of words appeared in the training data. We are keen to learn from users on whether and how these non-classification codes support their coding.

(Please note that the ANZSCO 2022 model also available in the Coding Service does NOT include non-classification codes.)

Tips for getting the best predictions out of the Coding Service

  • Enter both a job title and its tasks where possible. This is how the Coding Service performs best as this is how the models were trained. The models should be able to predict a 6-digit occupation code if input text contains brief, specific details and descriptions of both a job and its tasks. It may not code as accurately with just one of these inputs included.
  • Quality in, quality out. When the input text is limited or vague, the model will produce a lower quality coding outcome. The more precise the job title and task information sent to the coder, the better the quality of the predictions.
  • Be specific. Sending the Coding Service the job text ‘teacher’ with the tasks text ‘teaching’ is too vague. A better entry might be ‘primary school teacher’ and a more detailed description of the tasks might be ‘teach literacy, numeracy, etc. to primary school students’.
  • Spell check your input text. The model has learned to handle common misspellings of certain words; however, the cleaner the input text, the better the chance of getting a quality prediction.

Review coding outputs

  • Outputs should be reviewed before use to assure their quality. The models significantly improve autocoding rates and quality, but we cannot validate or guarantee correct results for all possible inputs, so users will still need to undertake a small amount of human review and correction. The confidence scores returned with the codes will help inform that review.
  • The ML models for occupation were trained on 2021 Census data that had been coded to version 1.3 of ANZSCO. This training data was updated to OSCA 2024 using a process that was separate from the machine learning method. The update process was new (experimental), and resulted in some incorrect OSCA codes in the data used to train the ML models, which will affect some model predictions.
  • When the input text is limited or vague, the model may predict a code at a less detailed level (e.g. a 3-digit code representing the minor group) or fail to produce a code altogether. Conversely, the model may predict 6-digit codes for input text that should be coded to a less detailed level. For example, ‘Cleaner’ might code to ‘811131, Commercial cleaner’ instead of ‘811, Cleaners and Laundry Workers’, or ‘Artist’ might code to ‘231731, Painter (Visual Arts)’ instead of ‘2317, Visual Arts and Crafts Professionals’.
  • Send us feedback! If you get unexpected results, please let us know. We are committed to continuously improving the quality of the service. To support this, we will keep analysing model performance and releasing improved models. We welcome feedback on your results by email to coding.capability@abs.gov.au.
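
One way to target the human review described above is to flag outputs with a low confidence score, or with a code above the occupation level, for a person to check. The sketch below assumes each result is a dictionary with hypothetical code and confidence fields. The 0.80 threshold is an arbitrary example, not an ABS recommendation, and the 0.95 confidence in the third example is invented for illustration.

```python
from typing import Optional

REVIEW_THRESHOLD = 0.80  # hypothetical cut-off; choose one that suits your data


def needs_review(result: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when a coded result should be checked by a person."""
    code: Optional[str] = result.get("code")
    confidence: Optional[float] = result.get("confidence")
    if not code or confidence is None:
        return True   # nothing was coded
    if len(code) < 6:
        return True   # coded to a less detailed level than occupation
    return confidence < threshold


results = [
    {"code": "265433", "confidence": 0.748},  # 6-digit, but below the threshold
    {"code": "265", "confidence": 0.648},     # coded to minor group only
    {"code": "371131", "confidence": 0.95},   # illustrative confidence value
]
print([needs_review(r) for r in results])
# [True, True, False]
```

A high score does not guarantee a correct code, so sampling some high-confidence outputs for review is still worthwhile.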

Privacy and security

The WoAG Occupation Coding Service is private, safe and secure. It has been designed with strong protections in place to make sure it cannot be misused or accessed by other systems, and has been Infosec Registered Assessors Program (IRAP) assessed, affirming the security of the environment.

  • Users of the Coding Service cannot interact with the AI technologies used to develop coding models.
  • The models are wholly developed within ABS owned and controlled infrastructure within Australia, following strong privacy and security practices within existing ABS safeguards, while ensuring responsible and transparent use of ML technology.
  • The machine learning models underpinning the service are trained entirely within a dedicated, segregated, private and highly secure ABS account.
  • The models are static, and cannot learn from user input data, and the service does not re-use or retain the text input from user requests.
  • The service does not access any data beyond the specific text submitted for coding. It does not store, share, or use any personal information. The data submitted for coding is securely processed and then deleted once a response is provided. 

These are conscious design choices that reflect our commitment to privacy, security, and trust. 

Assurance

The WoAG Occupation Coding Service uses AI in accordance with applicable legislation, regulations, frameworks and policies.

The service complies with the Policy for the responsible use of AI in government. This policy advises the use of existing frameworks through which to consider the risks of developing and using AI, including cyber security, privacy, and ethics. Data security and cyber security are critical for the ABS and are essential elements of the AI technology used in the service. The ABS has governance procedures to ensure that AI is used safely and responsibly when creating the Coding Service models, in accordance with government policy and ethics principles.

The models were assessed against the newly developed Australian Government Pilot AI assurance framework to assure the models’ quality and fitness for purpose. The ABS has taken a risk-managed approach to risks specific to the use of AI, including the potential for algorithmic bias, lack of model transparency and unintended consequences. The forms of AI used in the Coding Service, including Hierarchical Support Vector Machine (HSVM) models, ensure the AI is transparent, traceable, and contained, which lowers potential risks.

The WoAG Occupation Coding Service is purpose-built for one specific task: coding data to official occupation classifications. It cannot do anything outside of this single purpose.

The models are tightly scoped, using highly controlled, narrow machine learning capability. They can only return occupation codes and titles. They cannot generate responses, provide insights, or perform any function beyond this specific task.

The models are trained on historical ABS data. They remain static until manually updated or retrained in a controlled process. They do not dynamically learn from, or adapt to, the data that users send to be coded. They cannot update themselves with previously unseen words.

Contact us

You can contact the ABS for support via email: coding.capability@abs.gov.au. We aim to respond to all messages and provide support as soon as possible.
