Input and output clearance

DataLab

Requesting input and output clearance, output rules

Released
19/11/2021

Outputs from DataLab must be approved by ABS before they can be released. You must not remove anything (data, code, notes, etc.) from the DataLab yourself.

Requests generally take 1 – 2 weeks to be completed. Large, complex, or insufficiently described files will take longer to review. Please only request what you need.

Before you submit a request, apply the appropriate rules and prepare evidence.

For output clearance and transfer requests, move the relevant file(s) to a new folder in the Output drive then choose the appropriate link below to submit your request.

If you have aggregated data, code, or concordance files that you would like to use in DataLab, they will need to be cleared before they can be loaded. These files will not be approved for release in a subsequent output request unless evidence that output rules have been met is provided.

✉ Request input clearance

If the above email links do not work, see Templates below. Do not reply to, forward or copy in an existing email chain for a new request as this may not be received.

Output rules

Rule of 10

  • Each cell/statistic should have at least 10 (unweighted) contributors
  • Provide unweighted counts
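As an illustration only, the rule of 10 can be checked by counting unweighted contributors per cell. The sketch below assumes a hypothetical input layout (one cell label per contributing unit) and uses made-up records; it is not an ABS tool.

```python
from collections import Counter

MIN_CONTRIBUTORS = 10  # threshold taken from the rule of 10


def cells_failing_rule_of_10(cell_labels):
    """Return {cell: unweighted count} for cells with fewer than 10 contributors.

    cell_labels holds one label per contributing unit (hypothetical layout).
    Cells with no contributors at all do not appear in the input, so they
    are not reported here.
    """
    counts = Counter(cell_labels)
    return {cell: n for cell, n in counts.items() if n < MIN_CONTRIBUTORS}


# Made-up records: 12 units fall in cell "A", 4 units fall in cell "B".
records = ["A"] * 12 + ["B"] * 4
failing = cells_failing_rule_of_10(records)
print(failing)  # {'B': 4}
```

Cells reported by a check like this would need to be suppressed or combined with other cells before the output is submitted.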

Dominance rules

  • (1,50) rule: the largest contributor of a cell/statistic should not exceed 50% of the total for that cell/statistic
  • (2,67) rule: the two largest contributors of a cell/statistic should not exceed 67% of the total for that cell/statistic
  • Where values can be negative, replace each value with its absolute value, take the largest one (or two) absolute value(s), and calculate the (1,50) and (2,67) statistics as a share of the total of absolute values
  • Provide evidence
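The dominance rules above can be sketched as a small calculation on the unit-level values that contribute to one cell. This is a minimal sketch, not an official implementation: the function names and the sample values are hypothetical, and the limits (50% and 67%) are the ones stated in the rules.

```python
def dominance_shares(values):
    """Return (top-1 share, top-2 share) of the total of absolute values."""
    # Negative values are replaced by their absolute values, as the rule requires.
    magnitudes = sorted((abs(v) for v in values), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("total of absolute values is zero; shares are undefined")
    return magnitudes[0] / total, sum(magnitudes[:2]) / total


def passes_dominance(values, top1_limit=0.50, top2_limit=0.67):
    """True if the cell meets both the (1,50) and the (2,67) rule."""
    top1, top2 = dominance_shares(values)
    return top1 <= top1_limit and top2 <= top2_limit


# Made-up cell with one large negative contributor.
cell = [-60, 20, 10, 10]
top1_share, top2_share = dominance_shares(cell)
print(top1_share, top2_share, passes_dominance(cell))
```

In this made-up cell the largest absolute value is 60 out of a total of 100, so the cell fails the (1,50) rule even though the signed total is negative.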

 

Applying dominance rules

The dominance rule applies to tables that present magnitude or continuous variables such as income or turnover. This does not apply to categorical variables or counts. The rule is designed to prevent the re-identification of units that contribute a large percentage of a cell's total value, which could in turn reveal information about individuals, households or businesses. The cell dominance rule defines the number of units that are allowed to contribute a defined percentage of the total. 

DataLab has a (1,50) and (2,67) rule. This means that the top contributor cannot contribute more than 50% of the total value to a cell and the top 2 contributors cannot contribute more than 67% of the total value to a cell.

A dominance check is required if any mean, total, ratio, proportion or measure of concentration statistic can be calculated for continuous or magnitude variables.

Ratios and proportions can be continuous, but if both the numerator and the denominator are counts, dominance statistics are not needed.

A dominance check is also required when there is a regression with a continuous dependent variable and categorical independent variables. In this case, every combination of categorical variables (crosstab) will need to be tested for dominance against the dependent variable.

The table below shows an example of the additional information that analysts need to provide for output clearance when requesting a mean, total, ratio, proportion or measure of concentration.

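There are multiple instances where the (1,50) and (2,67) rules are violated.

The top contributor in LGA 3 contributes 2.51/3.22 = 78% of the total.

This violates the (1,50) rule.

The top 2 contributors in LGA 3 contribute 3.03/3.22 = 94% of the total.

This violates the (2,67) rule.

LGA 4 also violates both rules: its top contributor contributes 1.52/2.1 = 72% of the total, and its top 2 contributors contribute 1.83/2.1 = 87%.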

You may also need to apply consequential suppression to your table so suppressed values cannot be derived.

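| LGA | Total Profit ($M) | Top 1 Contributor ($M) | Top 2 Contributors ($M) | Top 1 Contribution to Total Profit (%) | Top 2 Contribution to Total Profit (%) |
|---|---|---|---|---|---|
| 1 | 1.65 | 0.51 | 0.82 | 31 | 50 |
| 2 | 0.94 | 0.11 | 0.15 | 12 | 16 |
| 3 | 3.22 | 2.51 | 3.03 | 78 | 94 |
| 4 | 2.1 | 1.52 | 1.83 | 72 | 87 |
| 5 | 2.05 | 0.5 | 0.8 | 24 | 39 |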
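The percentages in the example table can be re-derived from the totals and top contributor amounts. The following sketch copies the five LGA rows from the table; the function name and the output layout are illustrative choices, not part of the ABS guidance.

```python
# (LGA, total profit, top 1 contributor, top 2 contributors), all in $M,
# copied from the example table.
rows = [
    (1, 1.65, 0.51, 0.82),
    (2, 0.94, 0.11, 0.15),
    (3, 3.22, 2.51, 3.03),
    (4, 2.10, 1.52, 1.83),
    (5, 2.05, 0.50, 0.80),
]


def check_row(total, top1, top2):
    """Return rounded top-1 %, rounded top-2 %, and the list of rules violated."""
    top1_pct = 100 * top1 / total
    top2_pct = 100 * top2 / total
    violations = []
    if top1_pct > 50:
        violations.append("(1,50)")
    if top2_pct > 67:
        violations.append("(2,67)")
    return round(top1_pct), round(top2_pct), violations


results = {lga: check_row(total, top1, top2) for lga, total, top1, top2 in rows}
for lga, (top1_pct, top2_pct, violations) in results.items():
    print(lga, top1_pct, top2_pct, violations or "passes")
```

Run against these rows, the check flags LGA 3 and LGA 4 for both rules, and LGAs 1, 2 and 5 pass.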


 

Group disclosure rule

  • In all tabular and similar outputs, no cell should contain 90% or more of the column or row total
  • Provide evidence
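A possible way to check the group disclosure rule on a table of counts is to compare each cell with its row and column totals. The sketch below assumes the table is a list of rows of non-negative numbers; the function name and the example table are hypothetical.

```python
def group_disclosure_failures(table, threshold=0.90):
    """Return (row, col) positions whose cell is >= threshold of its row or column total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    failures = []
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            # A zero total means there is nothing to disclose in that row/column.
            row_share = cell / row_totals[i] if row_totals[i] else 0.0
            col_share = cell / col_totals[j] if col_totals[j] else 0.0
            if row_share >= threshold or col_share >= threshold:
                failures.append((i, j))
    return failures


# Made-up table: the first cell holds 95% of its row total.
example = [
    [95, 3, 2],
    [40, 30, 30],
    [20, 35, 45],
]
flagged = group_disclosure_failures(example)
print(flagged)  # [(0, 0)]
```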

Minimum contributors for percentiles

| Percentile | Minimum contributors |
|---|---|
| 0.01 | 500 |
| 0.05 | 100 |
| 0.10 | 50 |
| 0.25 | 20 |
| 0.50 | 10 |
| 0.75 | 20 |
| 0.90 | 50 |
| 0.95 | 100 |
| 0.99 | 500 |
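The thresholds in this table lend themselves to a simple lookup. The sketch below is one way to encode them; how percentiles that are not listed should be treated is not stated in the table, so the sketch raises an error for them rather than guessing.

```python
# Minimum contributors per percentile, copied from the table.
MIN_CONTRIBUTORS = {
    0.01: 500, 0.05: 100, 0.10: 50, 0.25: 20, 0.50: 10,
    0.75: 20, 0.90: 50, 0.95: 100, 0.99: 500,
}


def percentile_allowed(percentile, n_contributors):
    """True if n_contributors meets the listed minimum for this percentile."""
    if percentile not in MIN_CONTRIBUTORS:
        # Assumption: unlisted percentiles are out of scope for this sketch.
        raise ValueError(f"no threshold listed for percentile {percentile}")
    return n_contributors >= MIN_CONTRIBUTORS[percentile]


print(percentile_allowed(0.50, 10))   # median with 10 contributors
print(percentile_allowed(0.99, 499))  # 99th percentile with 499 contributors
```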

Minimum 10 degrees of freedom

  • All modelled output should have at least 10 degrees of freedom
  • Degrees of freedom = number of observations - number of parameters - other restrictions of the model
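The degrees of freedom formula above is simple arithmetic, sketched here with made-up model sizes. The parameter names are illustrative; what counts as an "other restriction" depends on the model and is not defined further in this guidance.

```python
MIN_DEGREES_OF_FREEDOM = 10


def degrees_of_freedom(n_observations, n_parameters, n_other_restrictions=0):
    """Observations minus parameters minus other restrictions of the model."""
    return n_observations - n_parameters - n_other_restrictions


def passes_df_rule(n_observations, n_parameters, n_other_restrictions=0):
    df = degrees_of_freedom(n_observations, n_parameters, n_other_restrictions)
    return df >= MIN_DEGREES_OF_FREEDOM


# Made-up examples: 25 observations with 4 parameters, then 12 with 4.
print(degrees_of_freedom(25, 4), passes_df_rule(25, 4))  # 21 True
print(degrees_of_freedom(12, 4), passes_df_rule(12, 4))  # 8 False
```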

Consequential suppression

If one or more of the rules fail and suppression is applied, one or more additional cells should be suppressed to protect the value of the primary suppressed cell from being worked out.

Where the rule of 10 fails, someone with access to multiple tables on the same sample must not be able to combine those tables to deduce the values of cells with fewer than 10 observations.

In the case of the dominance rules failing, if area11 + area12 + area13 = area1, and a cell in area11 is suppressed, then the same cell in area12 and/or area13 also needs to be suppressed such that both dominance rules pass for the combined suppressed cells.

Likewise, for any other relationships. Examples include:

  • Industry11 + Industry12 + Industry13 = Industry1
  • variable1 + variable2 + variable3 = variable4
  • (variable1 - variable2) / variable1 = variable3
  • variable1 / variable2 = variable3
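To see why a single suppressed cell is not enough in an additive relationship such as area11 + area12 + area13 = area1, the sketch below recovers a lone suppressed value by subtraction. The function name and the figures are hypothetical, and suppressed values are represented as None.

```python
def derive_single_suppressed(components, total):
    """If exactly one value in components + total is suppressed (None), derive it.

    Returns (position, value), where position is a component index or "total",
    or None when nothing can be derived by simple addition or subtraction.
    """
    missing = [i for i, v in enumerate(components) if v is None]
    if total is None and not missing:
        return ("total", sum(components))
    if total is not None and len(missing) == 1:
        known = sum(v for v in components if v is not None)
        return (missing[0], total - known)
    return None


# Only area11 suppressed: it can be worked out from the published values.
print(derive_single_suppressed([None, 30, 50], 100))    # (0, 20)
# area11 and area12 both suppressed: subtraction no longer gives either value.
print(derive_single_suppressed([None, None, 50], 100))  # None
```

This only tests recovery by addition and subtraction within one relationship. It does not check that the combined suppressed cells pass the dominance rules, which the guidance also requires.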

Preparing your output for clearance

The following supporting evidence is required to demonstrate data meet the Clearance Rules.

Descriptive statistics

Frequency tables
  • Rule of 10
  • Group disclosure rule
  • Consequential suppression
Magnitude tables, means, totals, indices, indicators, proportions, measures of concentration
  • Rule of 10
  • Dominance rules
  • Group disclosure rule
  • Consequential suppression
Ratios
  • Rule of 10
  • Dominance rules
  • Group disclosure rule
  • Consequential suppression
  • If the ratio is calculated at the business or individual level, the ratio is treated as another variable on the dataset and the (1,50) and (2,67) dominance rules apply as usual
  • If the ratio is in the form of aggregate/aggregate, the (1,50) and (2,67) dominance rules apply to the numerator and denominator separately. If either the numerator or the denominator fails, the ratio is suppressed
Maximums, minimums

Subject to minimum contributors for percentiles, use:

  • 99th and 1st percentiles
  • 95th and 5th percentiles
  • 90th and 10th percentiles
Quantiles (including median, quartiles, quintiles, deciles, percentiles)
  • Minimum contributors for percentiles
Box plot
  • Same rules apply as per quartiles, maximums and minimums
  • Minimum contributors for percentiles
Mode
  • Rule of 10
Higher moments of distributions/measures of spread (including variance, covariance, kurtosis, skewness)
  • Rule of 10
Graphs, pictorial representations of actual data
  • Not normally released if showing individual observations

Correlation and regression analysis

Regression coefficients, and summary and test statistics
  • Minimum 10 degrees of freedom
  • R-squared ≤ 0.8

For regressions that have a continuous dependent variable and only categorical independent variables, the regression will return the average of each category. In this case:

  • Rule of 10
  • Dominance rules
  • Provide a cross-tab of the independent variables. Each cell must have at least 10 observations.
  • Each cell in the cross-tab needs to be tested for the (1,50) and (2,67) dominance rules for the dependent variable.
Hazard models
  • Rule of 10
  • There must be at least 10 'failures'
Estimation residuals
  • Not normally released
  • Provide justification
Correlation coefficients
  • Rule of 10
     

How to apply dominance rule and rule of 10 for regression

Example 1: Linear Regression

A linear regression was run to predict income by age and health status:

Age was binned into three categories: <18 years, >18 and <30 years, and >30 years, where <18 was the reference category.

Health status was categorised according to healthy or unhealthy, where unhealthy was the reference category.

Suppose the desired output was a regression summary below:

| | Beta Coefficient | P Value |
|---|---|---|
| Constant | 1.5 | 0.001 |
| Age >18 and <30 | 2 | 0.004 |
| Age >30 | 3 | 0.002 |

N=1000, R-Squared=0.67

We should provide a crosstabulation of counts and a dominance table for the output clearance team.

Crosstabulation of Counts

| | Unhealthy | Healthy |
|---|---|---|
| Age <18 | 15 | 30 |
| Age >18 and <30 | 40 | 70 |
| Age >30 | 60 | 89 |

Counts for each combination of variables are greater than 10, so the rule of 10 is satisfied.

Dominance Table

We should provide a dominance table for the output clearance team like below:

Please note: Only the columns Top 1 and Top 2 Contribution to Total Income are required. The other columns are presented to illustrate the calculation. This table is also usually presented in one long spreadsheet.

Unhealthy

| | Total Income | Top 1 Income | Top 2 Income | Top 1 Contribution to Total Income | Top 2 Contribution to Total Income |
|---|---|---|---|---|---|
| Age <18 | $1,500 | $500 | $900 | 500/1,500 = 33% | 900/1,500 = 60% |
| Age >18 and <30 | $130,000 | $55,000 | $85,000 | 55,000/130,000 = 42% | 85,000/130,000 = 65% |
| Age >30 | $1,000,000 | $520,000 | $600,000 | 520,000/1,000,000 = 52% | 600,000/1,000,000 = 60% |

Healthy

| | Total Income | Top 1 Income | Top 2 Income | Top 1 Contribution to Total Income | Top 2 Contribution to Total Income |
|---|---|---|---|---|---|
| Age <18 | $2,500 | $1,000 | $1,300 | 1,000/2,500 = 40% | 1,300/2,500 = 52% |
| Age >18 and <30 | $230,000 | $155,000 | $200,000 | 155,000/230,000 = 67% | 200,000/230,000 = 87% |
| Age >30 | $2,000,000 | $600,000 | $900,000 | 600,000/2,000,000 = 30% | 900,000/2,000,000 = 45% |

There are multiple instances where the (1,50) and (2,67) rules are violated. Adjustments to the regression output will need to be applied before it can be cleared. The most common suggestion is to suppress the constant/intercept.
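Both pieces of evidence for this kind of regression (the crosstab of counts and the dominance shares per cell) can be assembled in one pass over the unit records inside DataLab. The sketch below is an assumption-laden illustration: the record layout (age group, health status, income), the function name and the sample data are all made up, and only the summary figures, never the unit records, would go into an output request.

```python
from collections import defaultdict


def crosstab_evidence(records):
    """records: iterable of (age_group, health_status, income) tuples (hypothetical layout)."""
    cells = defaultdict(list)
    for age_group, health, income in records:
        # Absolute values are used, as the dominance rules require.
        cells[(age_group, health)].append(abs(income))

    evidence = {}
    for key, incomes in cells.items():
        incomes.sort(reverse=True)
        total = sum(incomes)
        top1 = incomes[0] / total if total else float("nan")
        top2 = sum(incomes[:2]) / total if total else float("nan")
        evidence[key] = {
            "count": len(incomes),
            "top1_share": top1,
            "top2_share": top2,
            # Rule of 10 plus the (1,50) and (2,67) dominance rules.
            "passes": len(incomes) >= 10 and top1 <= 0.50 and top2 <= 0.67,
        }
    return evidence


# Made-up unit records: one evenly spread cell and one dominated cell.
records = (
    [("<18", "healthy", 100)] * 12
    + [(">30", "unhealthy", 5200)]
    + [(">30", "unhealthy", 100)] * 9
)
evidence = crosstab_evidence(records)
for cell, summary in evidence.items():
    print(cell, summary)
```

In the made-up data the second cell has 10 contributors, so it meets the rule of 10, but one unit holds most of the total, so it fails the (1,50) rule.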


 

Unit records

Print, list or other commands that produce unit record level data

  • Prohibited

Templates

If the links at the top of the page do not generate emails, please use the following templates instead in a new email. Do not reply to, forward or copy in an existing email chain for a new request as this may not be received.

Request output clearance

To: datalab.clearance@abs.gov.au

Subject: Request DataLab output clearance

Please clear the following output.

  1. Virtual Machine (VM):
  2. Project name:
  3. (Optional) Date and time output needed (minimum 2 business days, Output clearance usually takes 1 - 2 weeks):
  4. Path to folder containing output:
  5. List of files in folder requiring clearance:
  6. (Optional) List of files in folder to support output clearance e.g. code (files listed here will not be cleared):
  7. Data products used to produce output (e.g. blade1617_core):
  8. Relationship of output to the Project Proposal and how output will be used:
  9. Relationship to previous similar cleared outputs (if relevant):
  10. Description of analysis - include all points below:

 

For each table / model / graph provide the following:

  • Description of analysis undertaken (e.g. added up ATO income, logistic regression model to predict mental health service usage)

  • Description of people / businesses in scope including any reference period (e.g. people who have graduated from an Australian university in 2019 and gained full-time employment within three years)

  • Clear definitions of each variable (e.g. count: unweighted count of people, avg_inc: weighted average income, empstat: categorical employment status)

  • Definitions of relationships between variables (e.g. total = male + female)

  • Supporting evidence to demonstrate data meets the Clearance Rules (e.g. underlying unweighted counts, dominance checks, degrees of freedom)

 

** Reminder: Do not include counts in emails **

Request transfer between projects

To: datalab.clearance@abs.gov.au

Subject: Request DataLab transfer file load

Please move the following file(s) between my projects.

  1. From Virtual Machine (VM):
  2. To Virtual Machine (VM):
  3. Path to folder for migration (in Output drive):
  4. Reason for moving files:

Ensure that:

  • There is no data or counts within any code
  • There are no IDs in the files
  • Files do not contain any unvetted analysis

If you wish to move files containing data please submit an output request ensuring all output rules are met and note that you want it transferred to another VM.

Request input clearance

To: datalab.clearance@abs.gov.au

Subject: Request DataLab input file load

Please load the attached files to my DataLab account.

1. Virtual Machine (VM):

2. Project name:

3. File type (e.g. code, data, correspondence file):

4. For each file, provide evidence of permission to use, as follows:

  • Name of organisation/individual owner of the data/code:
  • If publicly available
    • The source of the data (e.g. website links):
    • Terms of Use (This is usually the copyright link at the bottom of website):
  • If not publicly available (e.g. purchased)
    • Attach copy of consent from owner of data

5. For data and correspondence files

  • Description of each file:
  • How the file(s) will be used:
  • Data item list of each variable on the file and description of each variable:

 

Please note that the following will not be loaded to the DataLab. Remove these fields before submitting your request.

  • Names of people or businesses. Note: names of courses, subjects, jobs, etc. may be loaded if the file passes technical assessment. This is a charged service; please contact data.services@abs.gov.au
  • Address information or longitudes and latitudes for a specific address or location - we do allow longitude and latitude of geography classifications e.g. Local government area, Statistical Area 3 (SA3), etc.
  • Free text fields – either remove or summarise free text field into a categorical variable
  • ID variables – the presence of ID variables signifies a unit record file which cannot be loaded via this process