Asynchronous batch coding

WoAG Occupation Coding Service User Guide

Coding a large volume of data.

Released
30/06/2025 11:30am AEST

In addition to real-time coding of single records and small batches of data, the Coding Service has been designed to code large datasets through asynchronous batching (that is, returning data after a short period of time). 

The asynchronous service can be used for anything from a single record up to millions of records.

Note: Asynchronous batch coding should be used if you need to code or recode a large volume of data. While it is the most efficient method of coding larger datasets, it is not real-time, and may be subject to queueing during high load periods. 

Getting an upload URL for input data to a batch coding operation

This endpoint is used to create an asynchronous batch inference operation. The API will return a location where you can upload your input file and begin your batch inference operation.

Request syntax

Depending on whether you are specifying a model against which to code your records, your request will follow one of the following formats:

1. Coding records against the latest model
POST /v1/topics/{topic}/batch-code HTTP/1.1
Host: string
Content-type: application/json
Authorization: string
2. Coding records against a specific model
POST /v1/topics/{topic}/models/{model}/batch-code HTTP/1.1 
Host: string
Content-type: application/json
Authorization: string

URI request parameters

topic
The uriName of the topic against which the record is coded. This can be acquired by listing the available topics.
Required: Yes

model
The model GUID for the model you would like to use to code records. This can be acquired by listing the available models for your topic.
Required: No

Request body

The request does not have a request body.

Response syntax

HTTP/1.1 200 OK
Content-type: application/json
{
    "requestUploadUrl": "string",
    "operationId": "string",
    "bucketKmsKeyArn": "string"
}

If the action is successful, the service sends back an HTTP 200 response. The following data is returned in JSON format by the service:

Response elements

requestUploadUrl
A URL where the records file is to be uploaded.
Type: String

operationId
The identifier of the operation, used to check the status of this job. This must be recorded at this point to maintain access to the operation.
Type: String, in GUID format.

bucketKmsKeyArn
A parameter used by the ABS system to ensure the operation's input data is from the same user who created the operation. This must be passed into the x-amz-server-side-encryption-aws-kms-key-id header when uploading your input file.
Type: String

Errors

For information about asynchronous coding errors, see Errors and suggested actions.

Examples

 Creating a new operation to code against the latest model:
Sample request
POST /v1/topics/osca/batch-code HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-Type: application/json
Authorization: example token
Sample response
HTTP/1.1 200 OK
Content-type: application/json
{
    "requestUploadUrl": "https://domain/endpoint?queries",
    "operationId": "00000000-0000-0000-0000-000000000000",
    "bucketKmsKeyArn": "xyz"
}
 Creating a new operation to code against a specified model:
Sample request
POST /v1/topics/anzsco/models/GUID/batch-code HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-Type: application/json
Authorization: example token
Sample response
HTTP/1.1 200 OK
Content-type: application/json
{
    "requestUploadUrl": "https://domain/endpoint?queries",
    "operationId": "00000000-0000-0000-0000-000000000000",
    "bucketKmsKeyArn": "xyz"
}
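
As a rough illustration of this step, the following Python sketch uses the third-party requests library (an assumption of this example, not a requirement of the service) to create an operation against the latest model for the osca topic and record the three response fields needed in the later steps. The token value is a placeholder for your own credential.

# A minimal sketch, assuming the requests library; the token value is a placeholder.
import requests

BASE_URL = "https://partner-coder.api.abs.gov.au"
TOKEN = "example token"  # replace with your own credential

# Create a batch coding operation against the latest model for the osca topic
resp = requests.post(
    f"{BASE_URL}/v1/topics/osca/batch-code",
    headers={"Content-Type": "application/json", "Authorization": TOKEN},
)
resp.raise_for_status()
operation = resp.json()

# Record these values now; the operationId is needed to check the status of the job later
upload_url = operation["requestUploadUrl"]
operation_id = operation["operationId"]
kms_key_arn = operation["bucketKmsKeyArn"]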

Uploading data for inference

Once you have created an inference operation, you will need to upload your data to the provided requestUploadUrl. This is a pre-signed HTTP request which is managed by the AWS S3 server, and the expected input is outlined below. 

  • Both occp_text and tasks_text fields are required (although one may be empty, indicated by "").

Request Syntax

PUT requestUploadUrl HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: string
{ "recordId": "string", "occp_text": "string", "tasks_text": "string" }
...

URI Request Parameters

requestUploadUrl
The location where the input file is being uploaded. This is provided when you first create the inference operation.
Type: String

Please note: the x-amz-server-side-encryption header is not variable and should always have the value aws:kms.

Request Header Parameters

x-amz-server-side-encryption-aws-kms-key-id
A parameter used by the ABS system to ensure the input data is from the same user who created the operation. This is provided in the bucketKmsKeyArn field when you first create your inference operation.
Type: String

The request accepts your input file in JSONL format. The maximum input file size is 5GB. All lines of input must contain the same fields, and these fields should satisfy the Record type for the relevant topic/model as specified when creating the upload URL. You may specify the additional field outlined below:

Request Body

recordId
An identifier for the record being coded. This need not be unique.
Type: String
Required: No

Response Syntax

HTTP/1.1 200 OK

Errors

For information about the errors that are common to all actions, see Errors and suggested actions.

Examples

 Specifying a record identifier:
Sample request
PUT https://domain/endpoint?queries HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: xyz
{ "recordId": "1", "occp_text": "software developer", "tasks_text": "writing code and unit tests" }
{ "recordId": "2", "occp_text": "Paramedic", "tasks_text": "responding to medical emergencies" }
{ "recordId": "3", "occp_text": "Brickie", "tasks_text": "" }
...
Sample response
HTTP/1.1 200 OK
 Specifying no record identifier:
Sample request
PUT https://domain/endpoint?queries HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: xyz
{ "occp_text": "software developer", "tasks_text": "writing code and unit tests" }
{ "occp_text": "Paramedic", "tasks_text": "responding to medical emergencies" }
{ "occp_text": "Brickie", "tasks_text": "" }
...
Sample response
HTTP/1.1 200 OK
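
As a rough sketch of the upload step (again assuming Python with the requests library, and reusing the upload_url and kms_key_arn values captured when the operation was created), the input file can be built and uploaded as follows:

# A minimal sketch, assuming the requests library and the upload_url and
# kms_key_arn values returned when the operation was created.
import json
import requests

records = [
    {"recordId": "1", "occp_text": "software developer", "tasks_text": "writing code and unit tests"},
    {"recordId": "2", "occp_text": "Paramedic", "tasks_text": "responding to medical emergencies"},
    {"recordId": "3", "occp_text": "Brickie", "tasks_text": ""},
]

# Build the JSONL body: one JSON object per line, every line with the same fields
body = "\n".join(json.dumps(record) for record in records)

resp = requests.put(
    upload_url,  # the requestUploadUrl from the create step
    data=body.encode("utf-8"),
    headers={
        "x-amz-server-side-encryption": "aws:kms",
        "x-amz-server-side-encryption-aws-kms-key-id": kms_key_arn,  # the bucketKmsKeyArn value
    },
)
resp.raise_for_status()  # a 200 response means the input file was accepted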

Checking the status of a batch inference operation

This endpoint is used to check the status of your batch inference job. When the status of your job is complete, the service will return a URL to copy into your web browser to retrieve your coded data.

Request Syntax

Depending on whether you are specifying a model against which to code your records, your request will follow one of the following formats. The application backend handles these requests identically, so you do not need to record which model you used when you began the operation.

1. Checking an operation by specifying the topic only
GET /v1/topics/{topic}/batch-code/operations/{operation_id} HTTP/1.1
Host: string
Content-type: application/json
Authorization: string
 
2. Checking an operation by specifying both the topic and model 
GET /v1/topics/{topic}/models/{model}/batch-code/operations/{operation_id} HTTP/1.1
Host: string
Content-type: application/json
Authorization: string

URI Request Parameters

topic
The uriName of the topic against which the record is coded. This can be acquired by listing the available topics.
Required: Yes

model
The model GUID for the model you would like to use to code records. This can be acquired by listing the available models for your topic.
Required: No

operation_id
The GUID of the operation to get the status of. This value is provided when you first create your inference operation.
Required: Yes

Request Body

The request does not have a request body.

Response Syntax

HTTP/1.1 200 OK
Content-type: application/json
{
    "operationStatus": "string",
    "responseDownloadUrl": "string",
    "metadataDownloadUrl": "string",
    "error": "string"
}

If the specified operation exists, the service sends back an HTTP 200 OK status code. The status of the operation will dictate the contents of the response. This data is returned in JSON format by the service:

Response Elements

operationStatus
The status of the operation.
Type: String
Valid Values: awaiting_input | in_progress | complete | timed_out | failed

responseDownloadUrl
A URL where the output data file can be downloaded. This field is optional and is returned only if operationStatus is complete.
Type: String

metadataDownloadUrl
A URL where the output metadata file can be downloaded. This file includes information about the model used to code your data. This field is optional and is returned only if operationStatus is complete.
Type: String

error
Information on why the operation failed. This field is optional and is returned only if operationStatus is failed.
Type: String

A note about presigned URLs

The responseDownloadUrl and metadataDownloadUrl are presigned URLs. Anyone with this link will be able to download your output file, so it is your responsibility to keep the link secret. 

The link will expire after one hour, after which you will need to obtain a new URL for your output file by checking the status of the operation again.

Your output files will be deleted from the system within 24 hours after your inference operation succeeds.

A state machine indicating the progression of operations is shown below:

Flow chart of operational steps showing output messages if the operation times out, if inference fails, and if inference is successful.

If the operation times out, the response contains "operationStatus": "timed_out".

If inference fails, the response contains "operationStatus": "failed" and a relevant error message is provided.

If inference is successful, the response contains "operationStatus": "complete" and the relevant download URLs are provided.

Errors

For information about asynchronous coding errors, see Errors and suggested actions.

Examples

 Getting the status of an operation:
Sample request
GET /v1/topics/osca/batch-code/operations/GUID HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-type: application/json
Authorization: example token
Sample request specifying the model used
GET /v1/topics/anzsco/models/GUID/batch-code/operations/GUID HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-type: application/json
Authorization: example token
Sample responses
HTTP/1.1 200 OK
Content-type: application/json

and any of the following:

Scenario: New operation (data not yet uploaded)
Expected response body:
{
    "operationStatus": "awaiting_input"
}
Interpretation:
  • The server acknowledges the operation exists.
  • No input data file has yet been received.
  • The operation is pending your next action - typically an upload via PUT request.

Scenario: Just uploaded
Expected response body:
{
    "operationStatus": "in_progress"
}
Interpretation:
  • The server has received the input data file.
  • The specified operation is now running, or may be queued to run soon.
  • You should keep checking in periodically (for example, up to once every ten minutes) to see how the operation is progressing.
  • The output files will be deleted within 24 hours.

Scenario: Never uploaded
Expected response body:
{
    "operationStatus": "timed_out"
}
Interpretation:
  • The server acknowledges the operation exists.
  • No input data has yet been received.
  • The operation has timed out due to inactivity and can no longer accept input data.
  • If you wish to run an asynchronous batch operation, you will need to create a new operation.

Scenario: Operation complete
Expected response body:
{
    "operationStatus": "complete",
    "responseDownloadUrl": "https://domain/endpoint?queries",
    "metadataDownloadUrl": "https://domain/endpoint?queries"
}
Interpretation:
  • The specified operation is now complete.
  • The output files are now available at the provided URLs.
  • You should download the output files now as they will be deleted within 24 hours.
  • There may be unsuccessfully coded records in the output file. Errors will be reported on a record-by-record basis where possible. This reduces the need to recode the entire input file.

Scenario: Operation failed
Expected response body:
{
    "operationStatus": "failed",
    "error": "error message"
}
Interpretation:
  • The specified operation has failed inference.
  • Check your input file for any errors or invalid records and try again.
  • The error message may provide context on what caused the operation failure. If the error message does not help resolve the issue, please note your operation id when contacting us for support.
  • If you wish to run another asynchronous batch operation, you will need to create a new operation.
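
A minimal polling sketch, assuming Python with the requests library and the BASE_URL, TOKEN and operation_id values from the earlier sketches, is shown below. The ten-minute wait mirrors the guidance above rather than any hard requirement of the service.

# A minimal polling sketch, assuming the requests library and the BASE_URL, TOKEN
# and operation_id values from the earlier sketches.
import time
import requests

status_url = f"{BASE_URL}/v1/topics/osca/batch-code/operations/{operation_id}"
headers = {"Content-Type": "application/json", "Authorization": TOKEN}

while True:
    status = requests.get(status_url, headers=headers).json()
    state = status["operationStatus"]
    if state == "complete":
        response_download_url = status["responseDownloadUrl"]
        metadata_download_url = status["metadataDownloadUrl"]
        break
    if state in ("failed", "timed_out"):
        raise RuntimeError(f"Operation {operation_id} ended with status {state}: {status.get('error')}")
    # awaiting_input or in_progress: wait before checking again (for example, ten minutes)
    time.sleep(600)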

Downloading processed data from a complete operation

Once your asynchronous inference operation is complete, you can download the output file by accessing (copying into a web browser) the responseDownloadUrl that is provided when you check the status of a complete operation. The same process may be used to view the operation metadata, available at the metadataDownloadUrl.

This is a generic HTTP GET request which is managed by the AWS S3 server, and the expected format is outlined below.

Response Elements

The asynchronous batch coding service outputs a JSONL file, with each line corresponding to a record from the original input file. Each line is an AsynchronousCodeResponse object.

Examples

 In response to input which specifies a record identifier:
Sample request
GET https://domain/endpoint?queries HTTP/1.1
Sample response
HTTP/1.1 200 OK
Date: Thu, 20 Jun 2024 02:26:34 GMT
Last-Modified: Thu, 20 Jun 2024 02:24:04 GMT
Accept-Ranges: bytes
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 7660
...
{ "recordId": "1", “codeStatus”: “successful”, “input”: { “occp_text”: “software developer”, “tasks_text”: “writing code and unit tests” }, "result": { "codeCategory": "261313", " codeLabel": "Software Engineer", "codeConfidence": 0.98 } }
{ "recordId": "2", “codeStatus”: “unsuccessful”, “input”: { “occp_text”: “Paramedic”, “tasks_text”: “responding to medical emergencies” }, "suggestions": [{ "codeCategory": "411111", "codeLabel": "Ambulance Officer", "codeConfidence":  0.26 }, { "codeCategory": "411112", "codeLabel": "Intensive Care Ambulance Paramedic", "codeConfidence": 0.24 }] 
{ "recordId": "unknown", "codeStatus": "unsuccessful", "input": "this is not a json string", "error": "Invalid JSON data format. "}
 In response to input which specifies no record identifier:
Sample request
GET https://domain/endpoint?queries HTTP/1.1
Sample response
HTTP/1.1 200 OK
Date: Thu, 20 Jun 2024 02:26:34 GMT
Last-Modified: Thu, 20 Jun 2024 02:24:04 GMT
Accept-Ranges: bytes
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 7660
...
{ "recordId": null, “codeStatus”: “successful”, “input”: { “occp_text”: “software developer”, “tasks_text”: “writing code and unit tests” }, "result": { "codeCategory": "261313", " codeLabel": "Software Engineer", "codeConfidence": 0.98 } }
{ "recordId": null, “codeStatus”: “unsuccessful”, “input”: { “occp_text”: “Paramedic”, “tasks_text”: “responding to medical emergencies” }, "suggestions": [{ "codeCategory": "411111", "codeLabel": "Ambulance Officer", "codeConfidence":  0.26 }, { "codeCategory": "411112", "codeLabel": "Intensive Care Ambulance Paramedic", "codeConfidence": 0.24 }] }
{ "recordId": "unknown", "codeStatus": "unsuccessful", "input": "this is not a json string", "error": "Invalid JSON data format. "}