Coding service formats
Request formats and recommended text inputs.
Request formats
The coding service uses JSON format for the following services:
- Real-time (synchronous) public coding service
- Real-time partner coding service
- Real-time small batch coding service
It uses JSONL format for the large batch/bulk (asynchronous) partner coding service.
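To make the difference between the two formats concrete, the Python sketch below serialises the same records two ways. It produces a JSON body for a single real-time request and a JSONL payload, with one JSON object per line, for bulk coding. Only the occp_text and tasks_text field names come from this documentation. The record values are illustrative, and the sketch assumes one record per real-time request. Any other fields, wrappers or endpoints a real request needs are not shown here and should be checked against the API reference.

```python
import json

# Illustrative records; only the field names occp_text and tasks_text are
# taken from this documentation. Other request fields are not shown.
records = [
    {"occp_text": "sewing machinist", "tasks_text": "operate industrial sewing machines"},
    {"occp_text": "boilermaker", "tasks_text": ""},
]

# Real-time services (assumed here: one record per request): a single JSON document.
single_body = json.dumps(records[0])


def to_jsonl(rows):
    """Serialise each record as a compact JSON object on its own line (JSON Lines)."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


# Large batch/bulk (asynchronous) service: a JSONL payload.
bulk_body = to_jsonl(records)
print(bulk_body)
```

JSONL is used for bulk work because each line is an independent JSON object. A large file can therefore be written or read one record at a time rather than parsed as a single document.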
The GET Data method will return the following for each of the specified services for occupation coding:
| Service | Returns |
|---|---|
| Real-time (synchronous) public coding service | |
| Real-time partner coding service | |
| Real-time small batch coding service (up to 300 records) | |
| Large batch/bulk (asynchronous) partner coding service | |
Recommended text input for coding
- The occupation coder will perform optimally when provided with both a job title and tasks as free text inputs, as this is how the ML training was carried out.
- Large batch (asynchronous) coding requires both the occp_text and tasks_text fields in the call. If input text for one of the fields is not available, include that field with an empty string. For example:
  "occp_text": "sewing machinist"
  "tasks_text": ""
- The coder will not perform as well with only one text field completed (i.e. only the job title or only the task text). If results are poor, entering more information will help the service make better predictions.
- For synchronous coding, the combined occupation and task text can be a maximum of 100 characters.
- For asynchronous coding, the combined occupation and task text can be a maximum of 300 characters.
- The coding service API will not accept custom data queries or query string parameters.
- The service does not recognise classification codes as inputs. While it is not possible to recode a six-digit ANZSCO code to a six-digit OSCA code, datasets with only ANZSCO codes may be recoded to OSCA if the ANZSCO occupation title is reattached to each record. Including more information as a tasks text entry, such as the occupation descriptions from the classification, will give better outcomes.
- The coding service has been trained on English inputs only. The service accepts printable ASCII characters, which include all English letters, digits and standard punctuation, but exclude accented letters, foreign currency symbols and control characters such as end-of-file markers or backspace. Including an unsupported character may result in an 'Invalid request body' error.
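You can check the input rules above before sending any request. The sketch below is a hypothetical pre-flight helper and is not part of the Coding Service. It always includes both fields, sending an empty string for missing text. It applies the combined limits of 100 characters (synchronous) and 300 characters (asynchronous) stated above, and it rejects characters outside printable ASCII. Two details are assumptions about how the service measures length: the helper trims surrounding whitespace, and it counts characters with Python's len().

```python
import string

# Combined occp_text + tasks_text limits, as stated in this documentation.
MAX_COMBINED_CHARS = {"sync": 100, "async": 300}

# Printable ASCII: letters, digits, punctuation and the space character
# (string.printable minus tab, newline and other whitespace control characters).
PRINTABLE_ASCII = set(string.printable) - set("\t\n\r\x0b\x0c")


def prepare_record(occp_text, tasks_text="", mode="sync"):
    """Return a record with both fields present, or raise ValueError.

    Hypothetical pre-flight check; the service remains the authority on
    what it accepts.
    """
    if mode not in MAX_COMBINED_CHARS:
        raise ValueError(f"unknown mode {mode!r}; expected 'sync' or 'async'")
    # Treat missing text as an empty string and trim surrounding whitespace
    # (trimming is a choice made by this sketch).
    occp_text = (occp_text or "").strip()
    tasks_text = (tasks_text or "").strip()

    combined = len(occp_text) + len(tasks_text)
    limit = MAX_COMBINED_CHARS[mode]
    if combined > limit:
        raise ValueError(f"combined text is {combined} chars; {mode} limit is {limit}")

    bad = sorted({ch for ch in occp_text + tasks_text if ch not in PRINTABLE_ASCII})
    if bad:
        raise ValueError(f"characters outside printable ASCII: {bad!r}")

    # Both keys are always present, even when one is empty.
    return {"occp_text": occp_text, "tasks_text": tasks_text}


record = prepare_record("sewing machinist", mode="async")
```

To recode an ANZSCO-only dataset, pass the reattached ANZSCO occupation title as occp_text. You can optionally pass the classification's occupation description as tasks_text. Do not pass the code itself. A long description may need shortening to stay within the 300-character limit.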
Contextual considerations
The contextual assumption of the input text is that the text relates to and describes a person’s job. The coder is able to recognise a very broad vocabulary and will attempt to code all input text, regardless of context, so users need to ensure a contextual fit between their input data and the coding task being undertaken.
For example, if a person enters their job text as ‘prisoner’, the Coding Service assumes a context that the job to be coded works with prisoners in some way, and codes to ‘Correctional Officer’. Likewise, the input text ‘student’ codes to ‘Student Services Adviser’.
The OSCA 2024 model also returns non-classification codes for responses that are not occupations within the scope of the classification:
| Code | Label |
|---|---|
| 099900 | Not in the labour force nfd |
| 099988 | Inadequately described |
| 099920 | Child/baby |
| 099930 | Invalid pensioner |
| 099940 | Other pensioner |
| 099950 | Housewife/husband |
| 099960 | Retired |
| 099970 | Unemployed/Not work for the dole |
(Please note: the ANZSCO 2022 model also available in the Coding Service does NOT include non-classification codes.)
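When reviewing OSCA 2024 output, it can help to separate these non-classification responses from occupation codes. The following is a minimal sketch. It assumes your results have already been collected as (record id, code) pairs, with codes held as six-digit strings. This section does not describe the response structure itself, and split_results is a hypothetical helper name.

```python
# Non-classification codes returned by the OSCA 2024 model, as listed above.
# The ANZSCO 2022 model does not return these codes.
OSCA_NON_CLASSIFICATION = {
    "099900": "Not in the labour force nfd",
    "099988": "Inadequately described",
    "099920": "Child/baby",
    "099930": "Invalid pensioner",
    "099940": "Other pensioner",
    "099950": "Housewife/husband",
    "099960": "Retired",
    "099970": "Unemployed/Not work for the dole",
}


def split_results(results):
    """Partition (record_id, code) pairs into occupation codes and non-classification responses."""
    occupations, non_occupations = [], []
    for record_id, code in results:
        label = OSCA_NON_CLASSIFICATION.get(code)
        if label is None:
            occupations.append((record_id, code))
        else:
            non_occupations.append((record_id, code, label))
    return occupations, non_occupations
```

Keeping code 099988 (Inadequately described) in its own list makes it easy to route those records back for more input text.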
Multiple occupation entries
The service is designed to provide a single occupation code and title for a single record. If multiple jobs per record are entered in the occp_text field, the coder will attempt to code the provided text to a best fit single occupation code at the most detailed level.
The output reflects the training data and depends on how often the jobs appeared together in that data. The Coding Service will default to whatever combination is most commonly found in the training data.
- If a record contains multiple jobs, submit each job as a separate request.
Likewise, if people combine non-classification labels with job titles, such as ‘retired boilermaker’ or ‘unemployed postman’, the coder will attempt a best fit code. This may be ‘retired’, ‘boilermaker’, or something else entirely, depending on the context and the number of times the combination of words appeared in the training data. These records may require review.
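Records with multiple jobs, or with a status word alongside a job, can be handled before and after coding. The sketch below is hypothetical. It assumes your data separates multiple jobs with a semicolon or a spaced slash, so adjust JOB_SEPARATOR to match your own data. It copies any tasks text to every job it produces, which you may need to correct if the tasks describe only one of the jobs. It also flags text that mixes a word from the non-classification labels with other words, so those records can be reviewed manually. STATUS_WORDS is illustrative, not exhaustive, and the sub-record ids are for local tracking only, not a service field.

```python
import re

# Hypothetical separator: ";" or a slash with spaces around it.
# "Child/baby" (no spaces) is deliberately not split.
JOB_SEPARATOR = re.compile(r"\s*;\s*|\s+/\s+")

# Illustrative words drawn from the non-classification labels; not exhaustive.
STATUS_WORDS = {"retired", "unemployed", "pensioner"}


def split_jobs(record_id, occp_text, tasks_text=""):
    """Return (sub_id, request_record) pairs, one per job found in occp_text.

    tasks_text is copied to every job; review this if the tasks belong to one job only.
    """
    jobs = [job.strip() for job in JOB_SEPARATOR.split(occp_text) if job.strip()]
    return [
        (f"{record_id}-{i}", {"occp_text": job, "tasks_text": tasks_text})
        for i, job in enumerate(jobs, start=1)
    ]


def needs_review(occp_text):
    """Flag text that mixes a status word with other words, e.g. 'retired boilermaker'."""
    words = occp_text.lower().split()
    return len(words) > 1 and any(word in STATUS_WORDS for word in words)
```

For example, split_jobs("r1", "nurse; taxi driver") yields two requests, ids r1-1 and r1-2. needs_review("retired boilermaker") returns True, while needs_review("boilermaker") returns False.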
For more detail, see Using the service: Important context for model use, Tips for getting the best predictions out of the Coding Service, and Review coding outputs.