Asynchronous batch coding

WoAG Occupation Coding Service User Guide

Coding a large volume of data.

Released
30/06/2025

In addition to real-time coding of single records and small batches of data, the Coding Service has been designed to code large datasets through asynchronous batching (that is, returning data after a short period of time). 

The asynchronous service can be used for as little as one record, up to millions of records.

Note: Asynchronous batch coding should be used if you need to code or recode a large volume of data. While it is the most efficient method of coding larger datasets, it is not real-time, and may be subject to queueing during high load periods. 

Getting an upload URL for input data to a batch coding operation

This endpoint is used to create an asynchronous batch inference operation. The API will return a location where you can upload your input file and begin your batch inference operation.

Request syntax

Depending on whether you are specifying a model against which to code your records, your request will follow one of the following formats:

1. Coding records against the latest model

POST /v1/topics/{topic}/batch-code HTTP/1.1
Host: string
Content-type: application/json
Authorisation: string

2. Coding records against a specific model

POST /v1/topics/{topic}/models/{model}/batch-code HTTP/1.1 
Host: string
Content-type: application/json
Authorisation: string

URI request parameters
topicThe uriName of the topic against which the record is coded. This can be acquired by listing the available topics.
Required: Yes
modelThe model GUID for the model you would like to use to code records. This can be acquired by listing the available models for your topic
Required: No

Request body

The request does not have a request body.

Response syntax

HTTP/1.1 200 OK
Content-type: application/json
{
    "requestUploadUrl": "string",
    "operationId": "string",
    "bucketKmsKeyArn": "string"
}

If the action is successful, the service sends back an HTTP 200 response. The following data is returned in JSON format by the service:

Response elements
requestUploadUrlA URL where the records file is to be uploaded.
Type: String
operationIdThe identifier of the operation, to be used to check the status of this job. This must be recorded at this point to maintain access to the operation.
Type: String, in GUID format.
bucketKmsKeyArnA parameter used by the ABS system to ensure the operation’s input data is from the same user who created the operation. 
This must be passed into the x-amz-server-side-encryption-aws-kms-key-id header when uploading your input file.
Type: String

Errors

For information about the errors that are common to all actions, see Errors and suggested actions.

Examples

Creating a new operation to code against the latest model for occupation:

Sample Request

POST /v1/topics/osca/batch-code HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-Type: application/json
Authorisation: example token

Sample Response

HTTP/1.1 200 OK
Content-type: application/json
{
    "requestUploadUrl": "https://domain/endpoint?queries",
    "operationId": "00000000-0000-0000-0000-000000000000",
    "bucketKmsKeyArn": "xyz"
}

 

Creating a new operation to code against a specified model:

Sample Request

POST /v1/topics/anzsco/models/GUID/batch-code HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-Type: application/json
Authorisation: example token

Sample Response

HTTP/1.1 200 OK

Content-type: application/json
{
    "requestUploadUrl": "https://domain/endpoint?queries",
    "operationId": "00000000-0000-0000-0000-000000000000",
    "bucketKmsKeyArn": "xyz"
}

Uploading data for inference

Once you have created an inference operation, you will need to upload your data to the provided requestUploadUrl. This is a pre-signed HTTP request which is managed by the AWS S3 server, and the expected input is outlined below.

Request Syntax

PUT requestUploadUrl HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: string

{ "recordId": "string", "occp_text": "string", "tasks_text": "string" }
...

URI Request Parameters
requestUploadUrlThe location where the input file is being uploaded. This is provided when you first create the inference operation.
Type: String

Please note: the x-amz-server-side-encryption header is not variable and should always have the value aws:kms.

Request Header Parameters
x-amz-server-side-encryption-aws-kms-key-idA parameter used by the ABS system to ensure the input data is from the same user who created the operation. This is provided in the bucketKmsKeyArn field when you first create your inference operation.
Type: String

The request accepts your input file in JSONL format. The maximum input file size is 5GB. All lines of input must contain the same fields, and these fields should satisfy the Record type for the relevant topic/model as specified when creating the upload URL. You may specify the additional field outlined below:

Request Body
recordIdAn identifier for the record being coded. This need not be unique.
Type: String
Required: No

Response Syntax

HTTP/1.1 200 OK

Errors

For information about the errors that are common to all actions, see Errors and suggested actions.

Examples

Specifying all free text inputs and a record identifier:

Sample Request

PUT https://domain/endpoint?queries HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: xyz

{ "recordId": "1", "occp_text": "software developer", "tasks_text": "writing code and unit tests" }
{ "recordId": "2", "occp_text": "Paramedic", "tasks_text": "responding to medical emergencies" }
...

Sample Response

HTTP/1.1 200 OK
 

Specifying a single free text input and a record identifier:

Sample Request

PUT https://domain/endpoint?queries HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: xyz

{ "recordId": "1", "occp_text": "software developer, writes code and unit tests" }
{ "recordId": "2", "occp_text": "Paramedic, respond to emergencies" }
...

Sample Response

HTTP/1.1 200 OK


Specifying all free text inputs and no record identifier:

Sample Request

PUT https://domain/endpoint?queries HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: xyz

{ "occp_text": "software developer", "tasks_text": "writing code and unit tests" }
{ "occp_text": "Paramedic", "tasks_text": "responding to medical emergencies" }
...

Sample Response

HTTP/1.1 200 OK

Checking the status of a batch inference operation

This endpoint is used to check the status of your batch inference job. When the status of your job is complete, the service will return a URL to copy into your web browser to retrieve your coded data.

Request Syntax

Depending on whether you are specifying a model against which to code your records, your request will follow one of the following formats. The application backend handles these requests identically, so you don’t need to worry about recording the model which you used when you began the operation.

1. Checking an operation by specifying the topic only

GET /v1/topics/{topic}/batch-code/operations/{operation_id} HTTP/1.1
Host: string
Content-type: application/json
Authorisation: string
 

2. Checking an operation by specifying both the topic and model 

GET /v1/topics/{topic}/models/{model}/batch-code/operations/{operation_id} HTTP/1.1
Host: string
Content-type: application/json
Authorisation: string

URI Request Parameters
topicThe uriName of the topic against which the record is coded. This can be acquired by listing the available topics.
Required: Yes
modelThe model GUID for the model you would like to use to code records. This can be acquired by listing the available models for your topic.
Required: No
operation_idThe GUID of the operation to get the status of. This value is provided when you first create your inference operation.
Required:  Yes

Request Body

The request does not have a request body.

Response Syntax

HTTP/1.1 200 OK
Content-type: application/json
{
    "operationStatus": "string",
    "responseDownloadUrl": "string",
    "error": "string"
}

If the specified operation exists, the service sends back an HTTP 200 OK status code. The status of the operation will dictate the contents of the response. This data is returned in JSON format by the service:

Response Elements
operationStatusThe status of the operation.
Type: String
Valid Values: awaiting_input | in_progress | complete | timed_out | failed
responseDownloadUrlA URL where the output data file can be downloaded. This field is optional and is returned only if operationStatus is complete.
Type: String
metadataDownloadUrlA URL where the output metadata file can be downloaded. This file includes information about the model used to code your data.This field is optional and is returned only if operationStatus is complete.
Type: String
errorInformation on why the operation failed. This field is optional and is returned only if operationStatus is failed.

A note about presigned URLs

The responseDownloadUrl and metadataDownloadUrl are presigned URLs. Anyone with this link will be able to download your output file, so it is your responsibility to keep the link secret. 

The link will expire after one hour, after which you will have to get a new URL for your output file

Your output files will be deleted from the system within 24 hours after your inference operation succeeds.

A state machine indicating the progression of operations is shown below:

Flow chart of operational steps showing output messages if operation times out, if inference fails and if inference is successful.

Flow chart of operational steps showing output messages if operation times out, if inference fails and if inference is successful. 

If operation times out, output returned is "operation status: timed out".

If inference fails, output returned is "operation status: failed" and a relevant error message is provided.

If inference is successful, output is "operation status: complete" and the relevant response URLs are provided.

For information about the errors that are common to all actions, see Errors and suggested actions. The following errors may occur when calling this service: 

Errors
Unable to retrieve operation for given idNo operations were found to match the given operation_id. Please confirm your operation ID. If you have lost your operation ID, you will have to create a new operation.
HTTP Status Code: 404 (Not Found)
User is not authorised to retrieve operation GUIDThe specified operation does not belong to the current user.
You may have authenticated with the wrong user or specified the wrong operation_id. Try authenticating again with the right credentials, and confirm your operation.
HTTP Status Code: 401 (Unauthorised)

Examples

Getting the status of an operation:

Sample Request

GET /v1/topics/osca/batch-code/operations/GUID HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-type: application/json
Authorisation: example token

Sample Request specifying the model used

GET /v1/topics/anzsco/models/GUID/batch-code/operations/GUID HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-type: application/json
Authorisation: example token

Sample Responses

HTTP/1.1 200 OK
Content-type: application/json

and any of the following:

Request sampleExpected response bodyInterpretation
New operation (data not yet uploaded){
"operationStatus": "awaiting_input"
}
  • The server acknowledges the operation exists.
  • No input data file has yet been received.
  • The operation is pending your next action - typically an upload via PUT request.

Just uploaded

 

{
"operationStatus": "in_progress"
}
  • The server has received the input data file.
  • The specified operation is now running, or may be queued to run soon.
  • You should keep checking in periodically (for example, up to once every ten minutes) to see how the operation is progressing.
  • The output files will be deleted within 24 hours.

Never uploaded

 

{
"operationStatus": "timed_out"
}
  • The server acknowledges the operation exists.
  • No input data has yet been received.
  • The operation has timed out due to inactivity and can no longer accept input data.
  • If you wish to run an asynchronous batch operation, you will need to create a new operation.
Operation complete{
"operationStatus": "complete",
"responseDownloadUrl": "https://domain/endpoint?queries",
"metadataDownloadUrl": "https://domain/endpoint?queries",
}
  • The specified operation is now complete.
  • The output files are now available at the provided URLs.
  • You should download the output files now as they will be deleted within 24 hours.
  • There may be unsuccessfully coded records in the output file. Errors will be reported on a record-by-record basis where possible. This reduces the need to recode the entire input file.
Operation failed {
"operationStatus": "failed",
"error": "error message"
}
  • The specified operation has failed inference.
  • Check your input file for any errors or invalid records and try again.
  • The error message may provide context on what caused the operation failure. If the error message does not help resolve the issue, please note your operation id when contacting us for support.
  • If you wish to run another asynchronous batch operation, you will need to create a new operation.

Downloading processed data from a complete operation

Once your asynchronous inference operation is complete, you can download the output file by accessing (copying into a web browser) the responseDownloadUrl that is provided when you check the status of a complete operation. The same process may be used to view the operation metadata, available at the metadataDownloadUrl.

This is a generic HTTP GET request which is managed by the AWS S3 server, and the expected format is outlined below.

Response Elements

The asynchronous batch coding service outputs a jsonl file with each line corresponding to the record from the original input file. Each line is an AsynchronousCodeResponse object.

Examples

In response to input which specifies a record identifier:

Sample Request

GET https://domain/endpoint?queries HTTP/1.1

Sample Response

HTTP/1.1 200 OK
Date: Thu, 20 Jun 2024 02:26:34 GMT
Last-Modified: Thu, 20 Jun 2024 02:24:04 GMT
Accept-Ranges: bytes
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 7660
...

{ "recordId": "1", "result": { "codeCategory": "261313", " codeLabel": "Software Engineer", "codeConfidence": 0.98 } }
{ "id": "2", "suggestions": [{ "codeCategory": "411111", "codeLabel": "Ambulance Officer", "codeConfidence":  0.26 }, { "codeCategory": "411112", "codeLabel": "Intensive Care Ambulance Paramedic", "codeConfidence": 0.24 }] }
...

 

In response to input which specifies no record identifier:

Sample Request

GET https://domain/endpoint?queries HTTP/1.1

Sample Response

HTTP/1.1 200 OK
Date: Thu, 20 Jun 2024 02:26:34 GMT
Last-Modified: Thu, 20 Jun 2024 02:24:04 GMT
Accept-Ranges: bytes
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 7660
...

{ "recordId": "", "result": { "codeCategory": "261313", " codeLabel": "Software Engineer", "codeConfidence": 0.98 } }
{ "recordId": "", "suggestions": [{ "codeCategory": "411111", "codeLabel": "Ambulance Officer", "codeConfidence":  0.26 }, { "codeCategory": "411112", "codeLabel": "Intensive Care Ambulance Paramedic", "codeConfidence": 0.24 }] }
...

 

Back to top of the page