Input and output clearance

DataLab

Requesting input, output and transfer clearance. Applying the output rules to your analysis. 

Released
19/11/2021

DataLab clearance instructions and templates

Request output clearance

DataLab outputs must be approved and cleared by the ABS before being shared. Output requests generally take 1–2 weeks to be completed. Large, complex, or insufficiently described files will take longer to review. Please apply data minimisation principles and only request what you need. You must not copy or remove anything (for example data, code, notes) out of the DataLab yourself. Please do not include any counts or data from DataLab in your emails with the ABS.

To request output clearance:

  1. Format the data with clear headings and labels (ODS or ODT format is preferred).
  2. Apply the appropriate output rules and prepare evidence in a separate file.
  3. Move the relevant files, including supporting evidence, to a new sub-folder in the output (O:) drive.
  4. Click the button below to generate an email and then complete the details and send.

✉ Request output clearance

If the link does not generate an email, please use the following template in a new email. Do not reply to, forward or copy an existing email chain for a new request as this will not be received.

To: datalab.clearance@abs.gov.au 
Subject: Request DataLab output clearance 

  1. Virtual machine (VM): 
  2. Project name: 
  3. (Optional) If urgent, date required and justification:
  4. Output drive sub-folder path: O:/
  5. Names of files requiring clearance: 
  6. (If relevant) Names of files with supporting evidence: 
  7. Data products used to produce output (e.g. blade1617_core): 
  8. Relationship of output to project objectives: 
  9. (If relevant) Relationship to previous requests: 
  10. For each table / model / graph:
  • Description of output (e.g. weighted income by age, logistic regression model to predict health service usage)
  • People / businesses in scope including reference period (e.g. mining businesses operating in 2019)
  • Definitions of each original and self-constructed variable in output (e.g. count: unweighted count of people, empstat: employment status)
  • Output rules applied (e.g. Rule of 10 on unweighted counts, dominance, degrees of freedom)

   ** Reminder: Do not include counts or data in emails ** 

Request input clearance

This request is only for loading aggregate data, concordances, supporting material or statistical code to DataLab projects. To add microdata to your project, please submit an existing DataLab project query. To request new software or a software package, please use this template.

The following will not be loaded to the DataLab.

  • Names of people or businesses
  • Addresses or longitudes and latitudes for specific locations
  • Free text fields

To request input clearance:

  1. Click the button below to generate an email and complete the details.
  2. Attach any files for input and send.

✉ Request input clearance

If the link does not generate an email, please use the following template in a new email. Do not reply to, forward or copy an existing email chain for a new request as this will not be received. 

To: datalab.clearance@abs.gov.au
Subject: Request DataLab input file load

  1. Virtual machine (VM): 
  2. Project name:  
  3. (Optional) If urgent, date required and justification: 
  4. File type (e.g. code, aggregate data, correspondence file): 
  5. If publicly available:
    Source URL: 
    Terms of use (e.g. Creative Commons Attribution 4.0): 
  6. If not publicly available:
    Name of owner/author/custodian: 
    Attach consent to use file in DataLab. 
  7. Description of each file:
  8. How the file will be used: 
  9. For data tables - description of each variable:

Request transfer between projects

This request is to move code and other files that do not contain data between DataLab projects. Please ensure there are no counts or IDs anywhere, including in logs or comments. If you wish to move files containing data, please submit an output request ensuring all output rules are met and note that you want it transferred to another project.

To request transfer clearance:

  1. Check files do not contain any data.
  2. Move the relevant files to a new sub-folder in the output (O:) drive. 
  3. Click the button below to generate an email and then complete the details and send. 

✉ Request transfer between projects

If the link does not generate an email, please use the following template in a new email. Do not reply to, forward or copy an existing email chain for a new request as this will not be received.

To: datalab.clearance@abs.gov.au
Subject: Request DataLab file transfer

  1. Transfer from virtual machine (VM):
  2. Transfer to VM: 
  3. (Optional) If urgent, date required and justification: 
  4. Output drive sub-folder path: O:/
  5. Names of files requiring transfer: 
  6. Reason for moving files: 

** Reminder: Do not include counts or data in emails **  

Output rules quick reference table

The most common types of analysis are listed below along with the applicable rules for output. Other output types will be assessed based on similar principles. 

 
Output type | Applicable rules
Frequency tables (counts, percentages) | Rule of 10; Group disclosure
Magnitude statistics (means, sums, ratios) | Rule of 10; Group disclosure; Dominance
Quantiles (percentiles, medians) | Minimum contributors for quantiles
Minimums, maximums, ranges | Minimum contributors for quantiles
Models including regressions | Degrees of freedom; Model-specific rules
Charts (graphs, plots and histograms) | Chart clearance
Microdata | Not appropriate for output
Synthetic microdata | Not appropriate for output

Rule of 10 

The rule of 10 refers to the minimum number of contributors required for each cell or statistic. The underlying (unweighted) count of observations must meet this threshold, and evidence must be provided. 

If multiple tables are produced, it must not be possible to calculate differences of less than 10 by combining the tables. 
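As a worked illustration of the differencing check, the sketch below compares two cells from hypothetical tables whose populations are nested, so that subtracting one from the other reveals the size of a small group. The function name, the example counts and the choice to treat a difference of zero as safe are all assumptions made for illustration; they are not ABS definitions.

```python
RULE_OF_10_THRESHOLD = 10  # minimum unweighted contributors, per the rule of 10


def implied_difference_is_unsafe(count_a: int, count_b: int,
                                 threshold: int = RULE_OF_10_THRESHOLD) -> bool:
    """Return True if subtracting the two unweighted counts implies a group
    with at least one but fewer than `threshold` contributors."""
    difference = abs(count_a - count_b)
    # Assumption: a difference of 0 implies no hidden group and is treated as safe.
    return 0 < difference < threshold


# Hypothetical unweighted counts (not real data).
aged_15_to_64 = 250  # cell from table A
aged_15_to_63 = 244  # cell from table B

# The two tables together imply 6 people aged 64, which is below the threshold.
print(implied_difference_is_unsafe(aged_15_to_64, aged_15_to_63))  # prints True
```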

The rule of 10 applies to most outputs including counts, percentages (both numerator and denominator), means, sums, ratios, and other statistics. 

Options for making output safe include suppression of small counts, aggregation of categories or perturbation. If a cell is suppressed but it can be derived or estimated from other outputs, one or more additional values should be suppressed to protect the value of the primary suppressed cell from being worked out.
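The suppression option can be sketched as follows. This is a minimal illustration, not an ABS tool: the function names, the use of `None` to mark a suppressed cell and the example counts are assumptions. It also suppresses every cell below the threshold, including empty cells, which may not match how a particular output should be treated.

```python
from typing import Dict, Optional

RULE_OF_10_THRESHOLD = 10  # minimum unweighted contributors per cell


def suppress_small_cells(unweighted_counts: Dict[str, int],
                         threshold: int = RULE_OF_10_THRESHOLD
                         ) -> Dict[str, Optional[int]]:
    """Primary suppression: replace any cell below the threshold with None."""
    return {cell: (n if n >= threshold else None)
            for cell, n in unweighted_counts.items()}


def needs_secondary_suppression(released: Dict[str, Optional[int]],
                                total_is_published: bool) -> bool:
    """Flag the simplest derivation risk: a single suppressed cell can be
    recovered by subtracting the published cells from a published total."""
    n_suppressed = sum(1 for value in released.values() if value is None)
    return total_is_published and n_suppressed == 1


# Hypothetical unweighted counts (not real data).
counts = {"employed": 120, "unemployed": 7, "not in labour force": 45}
released = suppress_small_cells(counts)
print(released)
print(needs_secondary_suppression(released, total_is_published=True))
```

In this example the "unemployed" cell is suppressed, and because it is the only suppressed cell alongside a published total, at least one further value would need to be suppressed.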

See Data downloads for examples and options for treatment. 

Dominance 

The dominance rule is designed to prevent the re-identification of units that contribute a large percentage of a cell's total value, which could in turn reveal information about individuals, households or businesses. 

DataLab has a (1,50) and a (2,67) rule. This means that for any cell, the largest contributor cannot account for more than 50% of the total value and the largest two contributors cannot account for more than 67% of the total value. 

Where a variable can take both positive and negative values, the negative values should be replaced with absolute values before determining the largest contributors and the total. The largest absolute value is then divided by the sum of absolute values to determine if the (1,50) rule is met, and the sum of the two largest absolute values is divided by the sum of absolute values to check the (2,67) rule.  
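The calculation described above can be sketched in a few lines. The function names and example values are hypothetical; the limits of 50% and 67% and the use of absolute values come from the rule as stated.

```python
from typing import Sequence, Tuple


def dominance_shares(values: Sequence[float]) -> Tuple[float, float]:
    """Return the share of the cell total held by the largest contributor and
    by the two largest contributors, using absolute values throughout."""
    abs_values = sorted((abs(v) for v in values), reverse=True)
    total = sum(abs_values)
    if total == 0:
        # Empty cell or all-zero contributions: no contributor can dominate.
        return 0.0, 0.0
    return abs_values[0] / total, sum(abs_values[:2]) / total


def passes_dominance(values: Sequence[float],
                     top1_limit: float = 0.50,
                     top2_limit: float = 0.67) -> bool:
    """Check the (1,50) and (2,67) rules for the contributors to one cell."""
    top1, top2 = dominance_shares(values)
    return top1 <= top1_limit and top2 <= top2_limit


# Hypothetical contributions to a single cell (not real data).
cell = [40, -30, 20, 10]
# Absolute values sum to 100: the largest holds 40%, the largest two hold 70%.
print(passes_dominance(cell))  # prints False, because 70% exceeds 67%
```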

As with the rule of 10, if a cell is suppressed because it fails the dominance rule but can still be derived or estimated from other outputs, one or more additional values should be suppressed to protect the value of the primary suppressed cell from being worked out.

Dominance must be checked if any mean, total or similar statistic is calculated for continuous or magnitude variables. It does not apply to counts.

See Data downloads for examples and options for treatment. 

Group disclosure

Group (or attribute) disclosure occurs when all or nearly all units that have one feature also have some other feature. This means that even when the individual units may appear protected based on other rules, a previously unknown attribute of a unit may be disclosed based on the attributes of the group. Group disclosure risk should be assessed when any cell contains more than 90% of the total number of units in its row or column.   
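One way to screen a frequency table for the 90% condition is sketched below. The function name, the list-of-rows table layout and the example counts are assumptions for illustration; a flagged cell only indicates that the risk should be assessed, not that treatment is required.

```python
from typing import List, Tuple


def group_disclosure_cells(table: List[List[int]],
                           share_limit: float = 0.90) -> List[Tuple[int, int]]:
    """Return (row, column) positions of cells holding more than `share_limit`
    of their row total or their column total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    flagged = []
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            row_share = cell / row_totals[i] if row_totals[i] else 0.0
            col_share = cell / col_totals[j] if col_totals[j] else 0.0
            if row_share > share_limit or col_share > share_limit:
                flagged.append((i, j))
    return flagged


# Hypothetical unweighted counts (not real data): rows are groups,
# columns are categories of some attribute.
table = [
    [190, 10],
    [80, 80],
]
# The first cell holds 95% of its row, so it is flagged for assessment.
print(group_disclosure_cells(table))  # prints [(0, 0)]
```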

This rule applies to frequency tables. Whether group disclosure requires treatment depends on the sensitivity and nature of the output. 

See Data downloads for examples and options for treatment. 

Minimum contributors for quantiles

Quantiles and other relative ranks must be based on a minimum number of contributors depending on the precision. Underlying unweighted counts should be provided when reporting quantiles in the outputs. For information on required contributors for quantiles, see the table below: 

 
Quantile | Minimum contributors
Medians (0.50) | 10
Quartiles (0.25, 0.5, 0.75) | 20
Quintiles (0.2, 0.4, 0.6, 0.8) | 25
Deciles (0.1, 0.2, 0.3 ... 0.9) | 50
Vigintiles (0.05, 0.1, 0.15 ... 0.95) | 100
Percentiles (0.01, 0.02 ... 0.99) | 500

Minimums and maximums are generally unsafe to output. The following percentiles are safe options if the minimum contributors rule is satisfied: 

  • 1st and 99th percentiles 
  • 5th and 95th percentiles 
  • 10th and 90th percentiles 
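The thresholds above lend themselves to a simple lookup. In the sketch below the dictionary keys and function names are hypothetical. The second function also assumes that the 1st/99th, 5th/95th and 10th/90th percentile pairs need the contributor counts listed for percentiles, vigintiles and deciles respectively; that mapping is an inference from the table, not a stated rule.

```python
from typing import Optional, Tuple

# Minimum unweighted contributors by quantile precision, from the table above.
MIN_CONTRIBUTORS = {
    "median": 10,
    "quartiles": 20,
    "quintiles": 25,
    "deciles": 50,
    "vigintiles": 100,
    "percentiles": 500,
}


def quantile_is_releasable(quantile_type: str, unweighted_n: int) -> bool:
    """Check the unweighted contributor count against the required minimum."""
    return unweighted_n >= MIN_CONTRIBUTORS[quantile_type]


def extreme_percentile_pair(unweighted_n: int) -> Optional[Tuple[int, int]]:
    """Suggest the most extreme percentile pair to report in place of a
    minimum and maximum, or None if no pair meets its assumed threshold."""
    if unweighted_n >= MIN_CONTRIBUTORS["percentiles"]:
        return (1, 99)
    if unweighted_n >= MIN_CONTRIBUTORS["vigintiles"]:
        return (5, 95)
    if unweighted_n >= MIN_CONTRIBUTORS["deciles"]:
        return (10, 90)
    return None


print(quantile_is_releasable("quartiles", 19))  # prints False
print(extreme_percentile_pair(120))             # prints (5, 95)
```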

See Data downloads for examples and options for treatment. 

Degrees of freedom

Models and regressions are generally safe to output. However, overfitted models can pose a disclosure risk. All models and regressions must have a minimum of 10 degrees of freedom and evidence that this has been met should be provided.

The degrees of freedom are calculated by subtracting the number of parameters and other model restrictions from the total number of observations that contribute to the model.
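The subtraction can be written out as a small check. The function names and the example model sizes are hypothetical; the minimum of 10 comes from the rule above.

```python
MIN_DEGREES_OF_FREEDOM = 10


def degrees_of_freedom(n_observations: int, n_parameters: int,
                       n_restrictions: int = 0) -> int:
    """Observations contributing to the model, minus estimated parameters
    and any other model restrictions."""
    return n_observations - n_parameters - n_restrictions


def meets_dof_rule(n_observations: int, n_parameters: int,
                   n_restrictions: int = 0,
                   minimum: int = MIN_DEGREES_OF_FREEDOM) -> bool:
    """Return True if the model has at least `minimum` degrees of freedom."""
    return degrees_of_freedom(n_observations, n_parameters,
                              n_restrictions) >= minimum


# Hypothetical model: 25 observations and 12 parameters (intercept included).
print(degrees_of_freedom(25, 12))  # prints 13
print(meets_dof_rule(25, 12))      # prints True
print(meets_dof_rule(20, 12, 1))   # prints False, only 7 degrees of freedom
```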

See Data downloads for examples and options for treatment.

Model-specific rules

There are additional rules for specific model types. 

For ordinary least squares regressions, the R-squared should be lower than 0.9. If the R-squared is higher than this, the constant may need to be suppressed to prevent predictions. This requirement does not apply to other models such as fixed effects or two-stage regressions. 

Additionally, for ordinary least squares regressions with a continuous dependent variable and only categorical independent variables, the regression will approximate the tabular means. Adding a continuous independent variable, or suppressing the intercept, reduces the disclosure risk. Otherwise, apply the rule of 10 and the dominance rule.

For survival curves, each step change in the survival curve should represent at least 10 data subjects. 

Correlation coefficients should be calculated based on a minimum of 10 contributors.  

Gini coefficients are usually safe to output, but must be based on a minimum of 10 contributors. 

For classification and regression trees, any underlying unweighted counts must meet the rule of 10.

For other models, please provide evidence that no estimates or parameters are derived from fewer than 10 underlying contributors and explain why the output is non-disclosive.  

See Data downloads for examples and options for treatment. 

Chart clearance

All graphs, plots and other charts are subject to the output rules that apply to the underlying output type. The data used in the chart must be provided, accompanied by any relevant supporting evidence that it meets output rules. 

Charts that plot characteristics of individual units or groups of fewer than 10 units will not be cleared. 

See Data downloads for examples and options for treatment. 

Data downloads

DataLab output clearance examples (not real data)
