Coding service formats

WoAG Occupation Coding Service User Guide

Request formats and recommended text inputs.

Released

30/06/2025

Release date and time

30/06/2025 11:30am AEST

Request formats

The coding service uses JSON format for the following services:

Real-time (synchronous) public coding service
Real-time partner coding service
Real-time small batch coding service

It uses JSONL format for the large batch/bulk (asynchronous) partner coding service.
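
A single JSON document holds one value. A JSONL (JSON Lines) file instead holds one complete JSON object per line. The following Python sketch shows only that serialisation step. It assumes each record is a dictionary holding the occp_text and tasks_text fields described under Recommended text input for coding. Any other fields or file conventions the partner service requires are not shown, and to_jsonl and from_jsonl are illustrative names.

```python
import json
from typing import Dict, Iterable, List

def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialise records as JSON Lines: one JSON object per line, newline-terminated."""
    return "".join(json.dumps(record) + "\n" for record in records)

def from_jsonl(text: str) -> List[Dict[str, str]]:
    """Parse JSON Lines text back into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

records = [
    {"occp_text": "sewing machinist", "tasks_text": ""},
    {"occp_text": "chef", "tasks_text": "prepares and cooks meals"},
]
print(to_jsonl(records), end="")
```

Because each line is a self-contained object, a bulk file can be written or read one record at a time.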

The GET Data method returns the following for each occupation coding service:

Service	Returns
Real-time (synchronous) public coding service	One or more classification codes and titles for the free text supplied
Real-time partner coding service	One or more classification codes and titles for the free text supplied
Real-time small batch coding service (up to 300 records)	The best match 1-digit to 6-digit codes and titles (moving up the classification hierarchy from 6-digit to 1-digit level) for the free text supplied. If the coder cannot code the free text supplied, it will provide 3 suggestions.
Large batch/bulk (asynchronous) partner coding service	The best match 1-digit to 6-digit codes and titles (moving up the classification hierarchy from 6-digit to 1-digit level) for the free text supplied. If the coder cannot code the free text supplied, it will provide 3 suggestions.
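
The real-time small batch service accepts up to 300 records per request, so a larger dataset has to be split before it is sent. The following Python sketch shows one way to do that. SMALL_BATCH_LIMIT and batches are illustrative names, and sending each batch to the service is not shown.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

SMALL_BATCH_LIMIT = 300  # maximum records per small batch request

def batches(records: Sequence[T], size: int = SMALL_BATCH_LIMIT) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` records, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])
```

For example, 650 records produce three batches of 300, 300 and 50 records.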

Recommended text input for coding

The occupation coder performs best when given both a job title and a description of tasks as free text inputs, because this is how the machine learning (ML) model was trained.
Large batch (asynchronous) coding requires both the occp_text and tasks_text fields in the call. If text is not available for one of the fields, include that field with an empty value. For example:
"occp_text": "sewing machinist"
"tasks_text": ""
The coder will not perform as well with only one text field completed (i.e. only the job title or only the task text). If results are unsuccessful, entering more information will help the service make better predictions.
For synchronous coding, the combined occupation and task text can be a maximum of 100 characters.
For asynchronous coding, the combined occupation and task text can be a maximum of 300 characters.
The coding service API will not accept custom data queries or query string parameters.
The service does not recognise classification codes as inputs. While a six-digit ANZSCO code cannot be recoded directly to a six-digit OSCA code, datasets with only ANZSCO codes may be recoded to OSCA if the ANZSCO occupation title is reattached to each record. Adding more information as tasks text, such as the occupation descriptions from the classification, will give better outcomes.
The coding service has been trained on English inputs only. The service accepts printable ASCII characters, which include all English letters and connectives but exclude accented letters, foreign currency symbols and control characters such as end-of-file or backspace. Including any other character may result in an 'Invalid request body' error.
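
The input rules above can be checked locally before a request is sent. The following Python sketch makes three assumptions. Character limits are treated as the Python string lengths of the two fields added together. "Printable ASCII" is taken to mean characters from space (0x20) to tilde (0x7E). An all-digit job title is treated as a likely classification code. prepare_record and its messages are illustrative, not part of the service.

```python
from typing import Dict, List, Optional, Tuple

SYNC_LIMIT = 100   # max combined characters for real-time (synchronous) coding
ASYNC_LIMIT = 300  # max combined characters for large batch (asynchronous) coding

def is_printable_ascii(text: str) -> bool:
    """True if every character lies in printable ASCII, from space (0x20) to '~' (0x7E)."""
    return all(" " <= ch <= "~" for ch in text)

def prepare_record(occp_text: Optional[str], tasks_text: Optional[str],
                   asynchronous: bool) -> Tuple[Dict[str, str], List[str]]:
    """Return both text fields (blank when missing) plus any issues found."""
    fields = {"occp_text": occp_text or "", "tasks_text": tasks_text or ""}
    issues: List[str] = []

    # The length limit applies to the two fields added together.
    limit = ASYNC_LIMIT if asynchronous else SYNC_LIMIT
    combined = len(fields["occp_text"]) + len(fields["tasks_text"])
    if combined > limit:
        issues.append(f"combined text is {combined} characters (limit {limit})")

    # Characters outside printable ASCII may trigger 'Invalid request body'.
    for name, value in fields.items():
        if not is_printable_ascii(value):
            issues.append(f"{name} contains characters outside printable ASCII")

    # Classification codes are not recognised as input; an all-digit
    # job title is probably a code whose title needs to be reattached.
    if fields["occp_text"].strip().isdigit():
        issues.append("occp_text looks like a classification code, not a title")

    # Blank fields are accepted, but the coder performs best with both filled in.
    filled = sum(1 for value in fields.values() if value.strip())
    if filled == 0:
        issues.append("no text supplied")
    elif filled == 1:
        issues.append("only one text field supplied; predictions may be weaker")
    return fields, issues
```

For example, prepare_record("sewing machinist", None, asynchronous=True) returns both fields, with tasks_text set to "", and a single note that only one field was supplied.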

Contextual considerations

The Coding Service assumes that the input text relates to and describes a person’s job. The coder recognises a very broad vocabulary and will attempt to code all input text regardless of context, so users need to ensure their input data fits the context of the coding task being undertaken.

For example, if a person enters their job as ‘prisoner’, the Coding Service assumes the job involves working with prisoners in some way and codes it to ‘Correctional Officer’. Likewise, the input text ‘student’ codes to ‘Student Services Adviser’.

The OSCA 2024 model also returns non-classification codes for responses that are not occupations within the scope of the classification:

Code	Label
099900	Not in the labour force nfd
099988	Inadequately described
099920	Child/baby
099930	Invalid pensioner
099940	Other pensioner
099950	Housewife/husband
099960	Retired
099970	Unemployed/Not work for the dole

(Please note: the ANZSCO 2022 model, also available in the Coding Service, does NOT include non-classification codes.)
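
When processing OSCA 2024 results, records with these codes usually need separate handling from genuine occupation codes. The following Python sketch builds a lookup from the table above. It assumes codes arrive as six-character strings, or as integers that have lost their leading zero. non_classification_label and is_non_classification are illustrative names.

```python
from typing import Optional, Union

# Non-classification codes returned by the OSCA 2024 model, copied from the table above.
NON_CLASSIFICATION_CODES = {
    "099900": "Not in the labour force nfd",
    "099988": "Inadequately described",
    "099920": "Child/baby",
    "099930": "Invalid pensioner",
    "099940": "Other pensioner",
    "099950": "Housewife/husband",
    "099960": "Retired",
    "099970": "Unemployed/Not work for the dole",
}

def non_classification_label(code: Union[str, int]) -> Optional[str]:
    """Return the label for a non-classification code, or None for any other code."""
    # zfill restores a leading zero lost if the code was stored as a number.
    key = str(code).strip().zfill(6)
    return NON_CLASSIFICATION_CODES.get(key)

def is_non_classification(code: Union[str, int]) -> bool:
    """True if the code is one of the non-classification codes listed above."""
    return non_classification_label(code) is not None
```

Keeping codes as strings throughout avoids losing the leading zero in the first place.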

Multiple occupation entries

The service is designed to provide a single occupation code and title for a single record. If multiple jobs per record are entered in the occp_text field, the coder will attempt to code the provided text to a best-fit single occupation code at the most detailed level.

The output reflects the training data: it depends on how often the jobs appeared together in the training data, and the Coding Service defaults to whatever is most commonly found there.

If multiple jobs are present, you will need to format each job as a separate request.
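
The following Python sketch splits one record into one request per job. It assumes the source data separates jobs with a semicolon, which is a hypothetical convention, so adjust the separator to match your data. The record_id and job_index keys are hypothetical bookkeeping fields for rejoining results to the original record. They are not fields defined by the service and may need to be removed before sending. The sketch also copies the tasks text to every job, which may not suit your data.

```python
from typing import Dict, List, Union

def split_jobs(record_id: str, occp_text: str, tasks_text: str = "",
               sep: str = ";") -> List[Dict[str, Union[str, int]]]:
    """Turn one multi-job record into one request record per job."""
    # Split on the (assumed) separator and drop empty fragments, e.g. a trailing ';'.
    jobs = [part.strip() for part in occp_text.split(sep)]
    jobs = [job for job in jobs if job]
    return [
        # record_id and job_index are local bookkeeping, not service fields.
        {"record_id": record_id, "job_index": i,
         "occp_text": job, "tasks_text": tasks_text}
        for i, job in enumerate(jobs)
    ]
```

For example, split_jobs("r1", "chef; bus driver;") yields two records with occp_text "chef" and "bus driver".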

Likewise, if people combine non-classification labels with job titles, such as ‘retired boilermaker’, ‘unemployed postman’, etc., the coder will attempt a best-fit code. This may be ‘retired’, ‘boilermaker’ or something else entirely, depending on the context and the number of times the combination of words appeared in the training data. These records may require review.
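
Records that mix a status word with a job title can be flagged for review before or after coding. The following Python sketch does that flagging. The STATUS_WORDS list is a hypothetical starting point drawn loosely from the non-classification labels, and needs_review is an illustrative name.

```python
import re
from typing import FrozenSet

# Hypothetical keyword list based on the non-classification labels; extend as needed.
STATUS_WORDS: FrozenSet[str] = frozenset(
    {"retired", "unemployed", "pensioner", "housewife", "househusband"}
)

def needs_review(occp_text: str) -> bool:
    """Flag text that mixes a status word with other words, e.g. 'retired boilermaker'."""
    words = re.findall(r"[a-z]+", occp_text.lower())
    has_status = any(word in STATUS_WORDS for word in words)
    has_other = any(word not in STATUS_WORDS for word in words)
    return has_status and has_other
```

This is a crude word match. It will also flag phrases such as ‘invalid pensioner’ that correspond to a single non-classification label, so flagged records still need human judgement.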

For more detail, see Using the service: Important context for model use, Tips for getting the best predictions out of the Coding Service, and Review coding outputs.
