Asynchronous batch coding
Coding a large volume of data.
In addition to real-time coding of single records and small batches of data, the Coding Service has been designed to code large datasets through asynchronous batching (that is, returning data after a short period of time).
The asynchronous service can be used for as few as one record or as many as millions of records.
Note: Asynchronous batch coding should be used if you need to code or recode a large volume of data. While it is the most efficient method of coding larger datasets, it is not real-time, and may be subject to queueing during high load periods.
Getting an upload URL for input data to a batch coding operation
This endpoint is used to create an asynchronous batch inference operation. The API will return a location where you can upload your input file and begin your batch inference operation.
Request syntax
Depending on whether you are specifying a model against which to code your records, your request will follow one of the following formats:
1. Coding records against the latest model
POST /v1/topics/{topic}/batch-code HTTP/1.1
Host: string
Content-type: application/json
Authorization: string
2. Coding records against a specific model
POST /v1/topics/{topic}/models/{model}/batch-code HTTP/1.1
Host: string
Content-type: application/json
Authorization: string
Parameter | Description |
---|---|
topic | The uriName of the topic against which the record is coded. This can be acquired by listing the available topics. Required: Yes |
model | The model GUID for the model you would like to use to code records. This can be acquired by listing the available models for your topic. Required: No |
Request body
The request does not have a request body.
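The two request formats above differ only in the optional model segment of the path. A minimal sketch of building (not sending) such a request with the Python standard library follows. The function name is hypothetical, the base URL is taken from the sample requests further down, and the header name in AUTH_HEADER assumes the standard HTTP spelling, so adjust it if the service expects otherwise.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Assumption: base URL copied from the sample requests in this document.
BASE_URL = "https://partner-coder.api.abs.gov.au"
# Assumption: standard HTTP spelling of the authorisation header name.
AUTH_HEADER = "Authorization"


def build_create_operation_request(
    topic: str, token: str, model: Optional[str] = None
) -> Request:
    """Build, but do not send, the POST request that creates a batch operation."""
    path = f"/v1/topics/{quote(topic, safe='')}"
    if model is not None:
        # Code against a specific model GUID instead of the latest model.
        path += f"/models/{quote(model, safe='')}"
    path += "/batch-code"
    # The endpoint takes no request body, so no data is attached.
    return Request(
        BASE_URL + path,
        method="POST",
        headers={"Content-Type": "application/json", AUTH_HEADER: token},
    )
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here so the sketch can be inspected without network access.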
Response syntax
HTTP/1.1 200 OK
Content-type: application/json
{
"requestUploadUrl": "string",
"operationId": "string",
"bucketKmsKeyArn": "string"
}
If the action is successful, the service sends back an HTTP 200 response. The following data is returned in JSON format by the service:
Field | Description |
---|---|
requestUploadUrl | A URL where the records file is to be uploaded. Type: String |
operationId | The identifier of the operation, to be used to check the status of this job. This must be recorded at this point to maintain access to the operation. Type: String, in GUID format. |
bucketKmsKeyArn | A parameter used by the ABS system to ensure the operation’s input data is from the same user who created the operation. This must be passed into the x-amz-server-side-encryption-aws-kms-key-id header when uploading your input file. Type: String |
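Because the operationId cannot be recovered later, it can help to capture all three fields as soon as the response arrives. The sketch below is one possible way to do that; the BatchOperation class and parse_create_response function are hypothetical names, not part of the service.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchOperation:
    """Hypothetical container for the three documented response fields."""
    operation_id: str
    request_upload_url: str
    bucket_kms_key_arn: str


def parse_create_response(body: str) -> BatchOperation:
    """Parse the JSON body returned when an operation is created."""
    payload = json.loads(body)
    # A missing field raises KeyError, which surfaces a malformed response early.
    return BatchOperation(
        operation_id=payload["operationId"],
        request_upload_url=payload["requestUploadUrl"],
        bucket_kms_key_arn=payload["bucketKmsKeyArn"],
    )
```

Persisting the resulting object (for example to a local file) before uploading any data keeps the operation reachable if the client process stops.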
Errors
For information about the errors that are common to all actions, see Errors and suggested actions.
Examples
Creating a new operation to code against the latest model for occupation:
Sample Request
POST /v1/topics/osca/batch-code HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-Type: application/json
Authorization: example token
Sample Response
HTTP/1.1 200 OK
Content-type: application/json
{
"requestUploadUrl": "https://domain/endpoint?queries",
"operationId": "00000000-0000-0000-0000-000000000000",
"bucketKmsKeyArn": "xyz"
}
Creating a new operation to code against a specified model:
Sample Request
POST /v1/topics/anzsco/models/GUID/batch-code HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-Type: application/json
Authorization: example token
Sample Response
HTTP/1.1 200 OK
Content-type: application/json
{
"requestUploadUrl": "https://domain/endpoint?queries",
"operationId": "00000000-0000-0000-0000-000000000000",
"bucketKmsKeyArn": "xyz"
}
Uploading data for inference
Once you have created an inference operation, you will need to upload your data to the provided requestUploadUrl. This is a pre-signed HTTP request which is managed by the AWS S3 server, and the expected input is outlined below.
Request Syntax
PUT requestUploadUrl HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: string
{ "recordId": "string", "occp_text": "string", "tasks_text": "string" }
...
Parameter | Description |
---|---|
requestUploadUrl | The location where the input file is being uploaded. This is provided when you first create the inference operation. Type: String |
Please note: the x-amz-server-side-encryption header is not variable and should always have the value aws:kms.
Header | Description |
---|---|
x-amz-server-side-encryption-aws-kms-key-id | A parameter used by the ABS system to ensure the input data is from the same user who created the operation. This is provided in the bucketKmsKeyArn field when you first create your inference operation. Type: String |
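The upload request described above could be assembled as in the following sketch, which attaches the fixed aws:kms header and the key identifier from bucketKmsKeyArn. The function name is hypothetical and the request is built but not sent.

```python
from urllib.request import Request


def build_upload_request(
    request_upload_url: str, bucket_kms_key_arn: str, jsonl_body: bytes
) -> Request:
    """Build, but do not send, the PUT request that uploads the input file."""
    return Request(
        request_upload_url,
        data=jsonl_body,
        method="PUT",
        headers={
            # This header is fixed and always has the value aws:kms.
            "x-amz-server-side-encryption": "aws:kms",
            # Value of bucketKmsKeyArn from the create-operation response.
            "x-amz-server-side-encryption-aws-kms-key-id": bucket_kms_key_arn,
        },
    )
```

Note that urllib adds a default Content-Type header when it sends a request that carries data. Whether the pre-signed URL accepts that extra header is not stated in this document, so treat it as something to verify against the service.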
The request accepts your input file in JSONL format. The maximum input file size is 5 GB. All lines of input must contain the same fields, and these fields should satisfy the Record type for the relevant topic/model as specified when creating the upload URL. You may specify the additional field outlined below:
Field | Description |
---|---|
recordId | An identifier for the record being coded. This need not be unique. Type: String Required: No |
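The input constraints above (one JSON object per line, identical fields on every line, a 5 GB ceiling) can be checked before uploading. The sketch below shows one way to do so; the to_jsonl name is hypothetical, and reading "5 GB" as 5 × 1000³ bytes is a conservative assumption, since the document does not say whether the limit is decimal or binary.

```python
import json
from typing import Dict, Iterable, Optional, Set

# Assumption: conservative decimal reading of the documented 5 GB limit.
MAX_INPUT_BYTES = 5 * 1000**3


def to_jsonl(records: Iterable[Dict[str, str]]) -> bytes:
    """Serialise records to JSONL, checking that every line has the same fields."""
    expected_fields: Optional[Set[str]] = None
    lines = []
    for index, record in enumerate(records, start=1):
        fields = set(record)
        if expected_fields is None:
            expected_fields = fields  # the first record defines the field set
        elif fields != expected_fields:
            raise ValueError(
                f"record {index} has fields {sorted(fields)}, "
                f"expected {sorted(expected_fields)}"
            )
        lines.append(json.dumps(record, ensure_ascii=False))
    body = "".join(line + "\n" for line in lines).encode("utf-8")
    if len(body) > MAX_INPUT_BYTES:
        raise ValueError("input exceeds the 5 GB limit")
    return body
```

This builds the whole file in memory, which is fine for small batches; for inputs approaching the size limit, writing line by line to a file would be the more practical design.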
Response Syntax
HTTP/1.1 200 OK
Errors
For information about the errors that are common to all actions, see Errors and suggested actions.
Examples
Specifying all free text inputs and a record identifier:
Sample Request
PUT https://domain/endpoint?queries HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: xyz
{ "recordId": "1", "occp_text": "software developer", "tasks_text": "writing code and unit tests" }
{ "recordId": "2", "occp_text": "Paramedic", "tasks_text": "responding to medical emergencies" }
...
Sample Response
HTTP/1.1 200 OK
Specifying a single free text input and a record identifier:
Sample Request
PUT https://domain/endpoint?queries HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: xyz
{ "recordId": "1", "occp_text": "software developer, writes code and unit tests" }
{ "recordId": "2", "occp_text": "Paramedic, respond to emergencies" }
...
Sample Response
HTTP/1.1 200 OK
Specifying all free text inputs and no record identifier:
Sample Request
PUT https://domain/endpoint?queries HTTP/1.1
x-amz-server-side-encryption: aws:kms
x-amz-server-side-encryption-aws-kms-key-id: xyz
{ "occp_text": "software developer", "tasks_text": "writing code and unit tests" }
{ "occp_text": "Paramedic", "tasks_text": "responding to medical emergencies" }
...
Sample Response
HTTP/1.1 200 OK
Checking the status of a batch inference operation
This endpoint is used to check the status of your batch inference job. When the status of your job is complete, the service will return a URL to copy into your web browser to retrieve your coded data.
Request Syntax
Depending on whether you are specifying a model against which to code your records, your request will follow one of the following formats. The application backend handles these requests identically, so you don’t need to worry about recording the model which you used when you began the operation.
1. Checking an operation by specifying the topic only
GET /v1/topics/{topic}/batch-code/operations/{operation_id} HTTP/1.1
Host: string
Content-type: application/json
Authorization: string
2. Checking an operation by specifying both the topic and model
GET /v1/topics/{topic}/models/{model}/batch-code/operations/{operation_id} HTTP/1.1
Host: string
Content-type: application/json
Authorization: string
Parameter | Description |
---|---|
topic | The uriName of the topic against which the record is coded. This can be acquired by listing the available topics. Required: Yes |
model | The model GUID for the model you would like to use to code records. This can be acquired by listing the available models for your topic. Required: No |
operation_id | The GUID of the operation to get the status of. This value is provided when you first create your inference operation. Required: Yes |
Request Body
The request does not have a request body.
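Since both status formats are handled identically by the service, a client only needs one helper that includes the model segment when a model is known. The following is a small sketch of that path construction; status_path is a hypothetical name.

```python
from typing import Optional
from urllib.parse import quote


def status_path(topic: str, operation_id: str, model: Optional[str] = None) -> str:
    """Return the request path for checking a batch operation's status."""
    path = f"/v1/topics/{quote(topic, safe='')}"
    if model is not None:
        # Optional: the service treats both path forms the same way.
        path += f"/models/{quote(model, safe='')}"
    return f"{path}/batch-code/operations/{quote(operation_id, safe='')}"
```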
Response Syntax
HTTP/1.1 200 OK
Content-type: application/json
{
"operationStatus": "string",
"responseDownloadUrl": "string",
"metadataDownloadUrl": "string",
"error": "string"
}
If the specified operation exists, the service sends back an HTTP 200 OK status code. The status of the operation will dictate the contents of the response. This data is returned in JSON format by the service:
Field | Description |
---|---|
operationStatus | The status of the operation. Type: String Valid values: awaiting_input, in_progress, complete, timed_out, failed |
responseDownloadUrl | A URL where the output data file can be downloaded. This field is optional and is returned only if operationStatus is complete. Type: String |
metadataDownloadUrl | A URL where the output metadata file can be downloaded. This file includes information about the model used to code your data. This field is optional and is returned only if operationStatus is complete. Type: String |
error | Information on why the operation failed. This field is optional and is returned only if operationStatus is failed. |
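Because most fields in the status response are optional, a client may want to parse it into a structure where missing fields are explicit. The sketch below illustrates this; the OperationStatus class and parse_status_response function are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The five documented values of operationStatus.
VALID_STATUSES = {"awaiting_input", "in_progress", "complete", "timed_out", "failed"}


@dataclass(frozen=True)
class OperationStatus:
    """Hypothetical container for a status response; absent fields are None."""
    status: str
    response_download_url: Optional[str] = None
    metadata_download_url: Optional[str] = None
    error: Optional[str] = None


def parse_status_response(body: str) -> OperationStatus:
    """Parse the JSON body returned by the status endpoint."""
    payload = json.loads(body)
    status = payload["operationStatus"]
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected operationStatus: {status!r}")
    return OperationStatus(
        status=status,
        response_download_url=payload.get("responseDownloadUrl"),
        metadata_download_url=payload.get("metadataDownloadUrl"),
        error=payload.get("error"),
    )
```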
A note about presigned URLs
The responseDownloadUrl and metadataDownloadUrl are presigned URLs. Anyone with one of these links can download the corresponding file, so it is your responsibility to keep the links secret.
Each link expires after one hour, after which you will have to get a new URL for the file.
Your output files will be deleted from the system within 24 hours after your inference operation succeeds.
A state machine indicating the progression of operations is shown below:
Figure: Flow chart of operational steps showing the output returned if the operation times out, if inference fails, and if inference is successful.
If the operation times out, the operationStatus returned is timed_out.
If inference fails, the operationStatus returned is failed and a relevant error message is provided.
If inference is successful, the operationStatus returned is complete and the relevant download URLs are provided.
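The progression described above suggests a simple polling loop that stops at the first terminal status. The sketch below is one way to write it, under the assumption that complete, timed_out and failed are the only terminal states. The function name, the polling interval and the poll limit are all illustrative choices, not values prescribed by the service; fetch_status stands in for whatever code performs the status request and returns the decoded JSON body.

```python
import time
from typing import Callable, Dict

# Assumption: these three statuses end the operation; the others are transient.
TERMINAL_STATUSES = {"complete", "timed_out", "failed"}


def wait_for_operation(
    fetch_status: Callable[[], Dict[str, str]],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until the operation reaches a terminal status, then return the payload."""
    for attempt in range(max_polls):
        payload = fetch_status()
        if payload["operationStatus"] in TERMINAL_STATUSES:
            return payload
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"operation still running after {max_polls} polls")
```

Injecting fetch_status and sleep keeps the loop testable without network access or real delays.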
Errors
For information about the errors that are common to all actions, see Errors and suggested actions. The following errors may occur when calling this service:
Error | Description |
---|---|
Unable to retrieve operation for given id | No operations were found to match the given operation_id. Please confirm your operation ID. If you have lost your operation ID, you will have to create a new operation. HTTP Status Code: 404 (Not Found) |
User is not authorised to retrieve operation GUID | The specified operation does not belong to the current user. You may have authenticated with the wrong user or specified the wrong operation_id. Try authenticating again with the right credentials, and confirm your operation. HTTP Status Code: 401 (Unauthorised) |
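A client might translate these two status codes into distinct exceptions so that callers can react differently to a lost operation ID and to a credentials problem. The following sketch shows the idea; the exception classes and function name are hypothetical, and any other status code is deliberately left for the common error handling.

```python
class OperationNotFoundError(LookupError):
    """Hypothetical exception for HTTP 404: no operation matches the given ID."""


class OperationNotAuthorisedError(PermissionError):
    """Hypothetical exception for HTTP 401: the operation belongs to another user."""


def raise_for_status_error(http_status: int, operation_id: str) -> None:
    """Raise a specific exception for the two errors documented for this endpoint."""
    if http_status == 404:
        raise OperationNotFoundError(
            f"no operation found for {operation_id}; confirm the ID "
            "or create a new operation"
        )
    if http_status == 401:
        raise OperationNotAuthorisedError(
            f"not authorised to retrieve {operation_id}; re-authenticate "
            "and confirm the ID"
        )
    # Other codes are not specific to this endpoint and are not handled here.
```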
Examples
Getting the status of an operation:
Sample Request
GET /v1/topics/osca/batch-code/operations/GUID HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-type: application/json
Authorization: example token
Sample Request specifying the model used
GET /v1/topics/anzsco/models/GUID/batch-code/operations/GUID HTTP/1.1
Host: https://partner-coder.api.abs.gov.au
Content-type: application/json
Authorization: example token
Sample Responses
HTTP/1.1 200 OK
Content-type: application/json
and any of the following:
Request sample | Expected response body | Interpretation |
---|---|---|
New operation (data not yet uploaded) | { "operationStatus": "awaiting_input" } | |
Just uploaded | { "operationStatus": "in_progress" } | |
Never uploaded | { "operationStatus": "timed_out" } | |
Operation complete | { "operationStatus": "complete", "responseDownloadUrl": "https://domain/endpoint?queries", "metadataDownloadUrl": "https://domain/endpoint?queries" } | |
Operation failed | { "operationStatus": "failed", "error": "error message" } | |
Downloading processed data from a complete operation
Once your asynchronous inference operation is complete, you can download the output file by accessing (copying into a web browser) the responseDownloadUrl that is provided when you check the status of a complete operation. The same process may be used to view the operation metadata, available at the metadataDownloadUrl.
This is a generic HTTP GET request which is managed by the AWS S3 server, and the expected format is outlined below.
Response Elements
The asynchronous batch coding service outputs a JSONL file in which each line corresponds to a record from the original input file. Each line is an AsynchronousCodeResponse object.
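Reading the output file line by line could look like the sketch below. It assumes, based only on the sample responses in this section, that a line carries either a single "result" object or a "suggestions" list; the AsynchronousCodeResponse type may define more than that. The function name and the shape of the yielded dictionaries are hypothetical.

```python
import json
from typing import Any, Dict, Iterator


def iter_coded_records(jsonl_text: str) -> Iterator[Dict[str, Any]]:
    """Yield one summary per output line, keeping the input order."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        if "result" in record:
            # Assumption: "result" holds a single code for the record.
            candidates = [record["result"]]
            has_result = True
        else:
            # Assumption: otherwise "suggestions" lists candidate codes.
            candidates = record.get("suggestions", [])
            has_result = False
        yield {
            "recordId": record.get("recordId", ""),
            "has_result": has_result,
            "candidates": candidates,
        }
```

Records that only have suggestions can then be routed to manual review, for instance, while records with a result are accepted directly.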
Examples
In response to input which specifies a record identifier:
Sample Request
GET https://domain/endpoint?queries HTTP/1.1
Sample Response
HTTP/1.1 200 OK
Date: Thu, 20 Jun 2024 02:26:34 GMT
Last-Modified: Thu, 20 Jun 2024 02:24:04 GMT
Accept-Ranges: bytes
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 7660
...
{ "recordId": "1", "result": { "codeCategory": "261313", "codeLabel": "Software Engineer", "codeConfidence": 0.98 } }
{ "recordId": "2", "suggestions": [{ "codeCategory": "411111", "codeLabel": "Ambulance Officer", "codeConfidence": 0.26 }, { "codeCategory": "411112", "codeLabel": "Intensive Care Ambulance Paramedic", "codeConfidence": 0.24 }] }
...
In response to input which specifies no record identifier:
Sample Request
GET https://domain/endpoint?queries HTTP/1.1
Sample Response
HTTP/1.1 200 OK
Date: Thu, 20 Jun 2024 02:26:34 GMT
Last-Modified: Thu, 20 Jun 2024 02:24:04 GMT
Accept-Ranges: bytes
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 7660
...
{ "recordId": "", "result": { "codeCategory": "261313", "codeLabel": "Software Engineer", "codeConfidence": 0.98 } }
{ "recordId": "", "suggestions": [{ "codeCategory": "411111", "codeLabel": "Ambulance Officer", "codeConfidence": 0.26 }, { "codeCategory": "411112", "codeLabel": "Intensive Care Ambulance Paramedic", "codeConfidence": 0.24 }] }
...