Input and output clearance

DataLab

Requesting input and output clearance, output rules

Released 19/11/2021

Outputs from DataLab must be approved by ABS before they can be released. You must not remove anything (data, code, notes, etc.) from the DataLab yourself.

Before you ask for output clearance, apply the appropriate DataLab output rules to each statistic.

Request output clearance

Request input clearance

Output rules

Rule of 10

  • Each cell/statistic should have at least 10 (unweighted) contributors
  • Provide unweighted counts
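A minimal sketch of the rule of 10 check follows. It assumes the unweighted counts for each cell are already held in a Python dictionary keyed by cell label; the cell labels and the `cells_failing_rule_of_10` helper are hypothetical and not part of DataLab.

```python
# Minimum number of unweighted contributors per cell/statistic (rule of 10)
MIN_CONTRIBUTORS = 10


def cells_failing_rule_of_10(unweighted_counts: dict[str, int]) -> list[str]:
    """Return the labels of cells with fewer than 10 unweighted contributors."""
    return [cell for cell, n in unweighted_counts.items() if n < MIN_CONTRIBUTORS]


# Hypothetical unweighted counts for three cells
counts = {"Region A": 25, "Region B": 9, "Region C": 10}
print(cells_failing_rule_of_10(counts))  # ['Region B']
```

Any cell returned by the check would need to be suppressed (and then considered for consequential suppression).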

Dominance rules

  • (1,50) rule: the largest contributor of a cell/statistic should not exceed 50% of the total for that cell/statistic
  • (2,67) rule: the two largest contributors of a cell/statistic should not exceed 67% of the total for that cell/statistic
  • If there are negative values, replace all values with their absolute values, then calculate the (1,50) and (2,67) statistics as the share of the largest one (two) absolute value(s) in the total of the absolute values
  • Provide evidence
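As a rough illustration, the two dominance statistics for one cell could be computed from its unit-level contributions as below. This is a sketch under the assumption that you have the individual contributions for the cell in a list; the function names are hypothetical. Negative values are converted to absolute values before ranking, as described above.

```python
def dominance_shares(values: list[float]) -> tuple[float, float]:
    """Return (top 1 share, top 2 share) of the cell total, in percent.

    Negative values are replaced by their absolute values before ranking,
    and shares are measured against the total of the absolute values.
    """
    abs_values = sorted((abs(v) for v in values), reverse=True)
    total = sum(abs_values)
    if total == 0:
        raise ValueError("cell total of absolute values is zero")
    top1 = abs_values[0] * 100 / total
    top2 = sum(abs_values[:2]) * 100 / total
    return top1, top2


def passes_dominance(values: list[float]) -> bool:
    """True if the cell passes both the (1,50) and (2,67) rules."""
    top1, top2 = dominance_shares(values)
    return top1 <= 50 and top2 <= 67


# Hypothetical profits for five businesses, one of which made a loss
profits = [40, -30, 10, 10, 10]
print(dominance_shares(profits))  # (40.0, 70.0): fails the (2,67) rule
print(passes_dominance(profits))  # False
```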

 

Applying dominance rules

The dominance rule applies to tables that present magnitude or continuous variables such as income or turnover. This does not apply to categorical variables or counts. The rule is designed to prevent the re-identification of units that contribute a large percentage of a cell's total value, which could in turn reveal information about individuals, households or businesses. The cell dominance rule defines the number of units that are allowed to contribute a defined percentage of the total. 

DataLab has a (1,50) and (2,67) rule. This means that the top contributor cannot contribute more than 50% of the total value to a cell and the top 2 contributors cannot contribute more than 67% of the total value to a cell.

Dominance rules must be applied whenever a mean, total, ratio, proportion or measure of concentration can be calculated for a continuous or magnitude variable.

Ratios and proportions can be continuous, but if both the numerator and the denominator are counts, dominance statistics are not needed.

Dominance statistics are also required for a regression with a continuous dependent variable and categorical independent variables. In this case, every combination of the categorical variables (each cell of their crosstab) needs to be tested for dominance on the dependent variable.

The table below shows an example of the additional information analysts need to provide for output clearance when requesting a mean, total, ratio, proportion or measure of concentration.

There are multiple instances where the (1,50) and (2,67) rules are violated.

The top contributor in LGA 3 contributes 2.51/3.22 = 78% of the total.

This violates the (1,50) rule.

The top 2 contributors in LGA 3 contribute 3.03/3.22 = 94% of the total.

This violates the (2,67) rule.

LGA 4 also violates both rules: its top contributor accounts for 72% of the total and its top 2 contributors for 87%.

You may also need to apply consequential suppression to your table so suppressed values cannot be derived.

LGA | Total Profit ($M) | Top 1 Contributor ($M) | Top 2 Contributors ($M) | Top 1 Contribution to Total Profit (%) | Top 2 Contribution to Total Profit (%)
1 | 1.65 | 0.51 | 0.82 | 31 | 50
2 | 0.94 | 0.11 | 0.15 | 12 | 16
3 | 3.22 | 2.51 | 3.03 | 78 | 94
4 | 2.1 | 1.52 | 1.83 | 72 | 87
5 | 2.05 | 0.5 | 0.8 | 24 | 39
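The percentage columns in the LGA table can be reproduced from the dollar columns. The sketch below takes the Total Profit, Top 1 and Top 2 figures ($M) as given and flags which rule each LGA fails; the variable names are illustrative only.

```python
# LGA: (total profit, top 1 contributor, top 2 contributors combined), in $M
lga_table = {
    1: (1.65, 0.51, 0.82),
    2: (0.94, 0.11, 0.15),
    3: (3.22, 2.51, 3.03),
    4: (2.10, 1.52, 1.83),
    5: (2.05, 0.50, 0.80),
}

failures = {}
for lga, (total, top1, top2) in lga_table.items():
    top1_pct = top1 * 100 / total
    top2_pct = top2 * 100 / total
    # Keep the name of each rule whose threshold is exceeded
    failed = [
        rule
        for rule, pct, limit in (("(1,50)", top1_pct, 50), ("(2,67)", top2_pct, 67))
        if pct > limit
    ]
    if failed:
        failures[lga] = failed
    print(f"LGA {lga}: top 1 = {top1_pct:.0f}%, top 2 = {top2_pct:.0f}%, "
          f"fails: {', '.join(failed) or 'none'}")
```

Only LGAs 3 and 4 are flagged; both fail the (1,50) and (2,67) rules.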


 

Group disclosure rule

  • In all tabular and similar outputs, no cell should contain 90% or more of the column or row total
  • Provide evidence
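A possible way to check the group disclosure rule for a two-way table is sketched below. It assumes the table is supplied as a list of rows of non-negative values without marginal totals, and that the row and column totals are the sums of those values; `group_disclosure_failures` is a hypothetical helper.

```python
def group_disclosure_failures(table: list[list[float]]) -> list[tuple[int, int]]:
    """Return (row, column) positions of cells that hold 90% or more of
    their row total or their column total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    failing = []
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            share_of_row = value / row_totals[i] if row_totals[i] else 0
            share_of_col = value / col_totals[j] if col_totals[j] else 0
            if share_of_row >= 0.9 or share_of_col >= 0.9:
                failing.append((i, j))
    return failing


# Hypothetical 3 x 2 frequency table: the first cell holds 95% of its row total
print(group_disclosure_failures([[19, 1], [30, 25], [12, 40]]))  # [(0, 0)]
```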

Minimum contributors for percentiles

Percentile | Minimum contributors
0.01 | 500
0.05 | 100
0.10 | 50
0.25 | 20
0.50 | 10
0.75 | 20
0.90 | 50
0.95 | 100
0.99 | 500
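The percentile table can be encoded as a lookup, as in the sketch below. It only covers the percentiles listed in the table; the helper name `can_release_percentile` is hypothetical, and treating "minimum contributors" as "at least this many" is the reading assumed here.

```python
# Minimum number of contributors needed to release each listed percentile,
# transcribed from the table above (percentile expressed as a fraction)
MIN_CONTRIBUTORS_FOR_PERCENTILE = {
    0.01: 500, 0.05: 100, 0.10: 50, 0.25: 20, 0.50: 10,
    0.75: 20, 0.90: 50, 0.95: 100, 0.99: 500,
}


def can_release_percentile(percentile: float, n_contributors: int) -> bool:
    """Check a listed percentile against its minimum number of contributors."""
    if percentile not in MIN_CONTRIBUTORS_FOR_PERCENTILE:
        raise ValueError(f"no minimum listed for percentile {percentile}")
    return n_contributors >= MIN_CONTRIBUTORS_FOR_PERCENTILE[percentile]


print(can_release_percentile(0.5, 12))   # True: a median needs 10 contributors
print(can_release_percentile(0.99, 300)) # False: the 99th percentile needs 500
```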

Minimum 10 degrees of freedom

  • All modelled output should have at least 10 degrees of freedom
  • Degrees of freedom = number of observations - number of parameters - other restrictions of the model
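The degrees of freedom formula translates directly into code. In this sketch the function names are hypothetical, and the number of parameters is assumed to include the intercept.

```python
def degrees_of_freedom(n_obs: int, n_params: int, other_restrictions: int = 0) -> int:
    """Degrees of freedom = observations - parameters - other restrictions."""
    return n_obs - n_params - other_restrictions


def meets_df_rule(n_obs: int, n_params: int, other_restrictions: int = 0) -> bool:
    """True if the model has at least 10 degrees of freedom."""
    return degrees_of_freedom(n_obs, n_params, other_restrictions) >= 10


# 15 observations, intercept plus 3 slopes = 4 parameters
print(degrees_of_freedom(15, 4))  # 11
print(meets_df_rule(15, 4, 2))    # False: 2 extra restrictions leave 9
```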

Consequential suppression

If one or more of the rules fail and suppression is applied, one or more additional cells should be suppressed so that the value of the primary suppressed cell cannot be worked out.

When the rule of 10 fails, additional suppression must ensure that someone with access to multiple tables from the same sample cannot combine them to deduce the values of cells with fewer than 10 observations.

When the dominance rules fail, if area11 + area12 + area13 = area1 and a cell in area11 is suppressed, then the same cell in area12 and/or area13 must also be suppressed so that the combined suppressed cells pass both dominance rules.

The same applies to any other relationship between cells or variables. Examples include:

  • Industry11 + Industry12 + Industry13 = Industry1
  • variable1 + variable2 + variable3 = variable4
  • (variable1 - variable2) / variable1 = variable3
  • variable1 / variable2 = variable3
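For the additive case (area11 + area12 + area13 = area1), the sketch below shows why a single suppressed cell is not enough and how a candidate secondary suppression could be tested. It assumes unit-level contributions are available for each cell; the function names and all figures are hypothetical, and it does not handle the non-additive relationships listed above.

```python
def recoverable_by_subtraction(parent_total, children):
    """children maps cell -> published value, or None if suppressed.
    If exactly one child is suppressed, return its value as recovered
    by subtraction from the parent total; otherwise return None."""
    suppressed = [cell for cell, v in children.items() if v is None]
    if len(suppressed) != 1:
        return None
    published = sum(v for v in children.values() if v is not None)
    return parent_total - published


def combined_cells_pass_dominance(*contributor_lists):
    """Pool the unit contributions of all suppressed cells and test the
    (1,50) and (2,67) rules on the pooled total of absolute values."""
    pooled = sorted((abs(v) for cell in contributor_lists for v in cell), reverse=True)
    total = sum(pooled)
    return pooled[0] * 100 / total <= 50 and sum(pooled[:2]) * 100 / total <= 67


# Suppressing area11 alone: its value can be derived as 100 - 30 - 25 = 45
print(recoverable_by_subtraction(100, {"area11": None, "area12": 30, "area13": 25}))
# area11's own contributors fail (1,50); pooled with area12's they pass both rules
print(combined_cells_pass_dominance([30, 10, 5]))                # False
print(combined_cells_pass_dominance([30, 10, 5], [9, 8, 7, 6]))  # True
```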

Preparing your output for clearance

Descriptive statistics

Frequency tables
  • Rule of 10
  • Group disclosure rule
  • Consequential suppression
Magnitude tables, means, totals, indices, indicators, proportions, measures of concentration
  • Rule of 10
  • Dominance rules
  • Group disclosure rule
  • Consequential suppression
Ratios
  • Rule of 10
  • Dominance rules
  • Group disclosure rule
  • Consequential suppression
  • If the ratio is calculated at the business or individual level, the ratio is treated as another variable on the dataset and the (1,50) and (2,67) dominance rules apply as usual
  • If the ratio is in the form of aggregate/aggregate, the (1,50) and (2,67) dominance rules apply to the numerator and denominator separately. If either the numerator or the denominator fails, the ratio is suppressed
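For an aggregate/aggregate ratio, a sketch of the "test numerator and denominator separately" logic might look like this. The unit-level profit and turnover figures and the helper names are hypothetical; returning `None` stands in for suppressing the ratio.

```python
import math


def passes_dominance(values):
    """(1,50) and (2,67) rules on the absolute values of unit contributions."""
    abs_values = sorted((abs(v) for v in values), reverse=True)
    total = sum(abs_values)
    return abs_values[0] * 100 / total <= 50 and sum(abs_values[:2]) * 100 / total <= 67


def aggregate_ratio(numerator_units, denominator_units):
    """Return sum(numerator) / sum(denominator), or None (suppress) if
    either the numerator or the denominator fails the dominance rules."""
    if not (passes_dominance(numerator_units) and passes_dominance(denominator_units)):
        return None
    return sum(numerator_units) / sum(denominator_units)


profit = [5, 4, 3, 3, 2, 2, 2, 1, 1, 1]
turnover = [50, 40, 30, 30, 20, 20, 20, 10, 10, 10]
ratio = aggregate_ratio(profit, turnover)
print(math.isclose(ratio, 0.1))  # True: both aggregates pass, ratio released

# One business dominates turnover (500 of 690), so the ratio is suppressed
print(aggregate_ratio(profit, [500, 40, 30, 30, 20, 20, 20, 10, 10, 10]))  # None
```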
Maximums, minimums

In place of maximums and minimums, use the following, subject to the minimum contributors for percentiles:

  • 99th and 1st percentiles
  • 95th and 5th percentiles
  • 90th and 10th percentiles
Quantiles (including median, quartiles, quintiles, deciles, percentiles)
  • Minimum contributors for percentiles
Box plot
  • The same rules apply as for quartiles, maximums and minimums
  • Minimum contributors for percentiles
Mode
  • Rule of 10
Higher moments of distributions/measures of spread (including variance, covariance, kurtosis, skewness)
  • Rule of 10
Graphs, pictorial representations of actual data
  • Not normally released if showing individual observations

Correlation and regression analysis

Regression coefficients, and summary and test statistics
  • Minimum 10 degrees of freedom
  • R-squared ≤ 0.8

For regressions that have a continuous dependent variable and only categorical independent variables, the regression will return the average of each category. In this case:

  • Rule of 10
  • Dominance rules
  • Provide a cross-tab of the independent variables. Each cell must have at least 10 observations.
  • Each cell in the cross-tab needs to be tested for the (1,50) and (2,67) dominance rules for the dependent variable.
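One way to prepare this evidence from unit records is sketched below: group the dependent variable by each combination of categories, then check the rule of 10 and both dominance rules per cell. The record layout `(category tuple, dependent value)`, the function name and the data are all assumptions for illustration.

```python
from collections import defaultdict


def check_crosstab_cells(records):
    """records: iterable of (tuple of category values, dependent value).
    Returns {cell: [failed checks]} for every cell that fails a check."""
    cells = defaultdict(list)
    for categories, y in records:
        cells[categories].append(y)
    failures = {}
    for cell, values in cells.items():
        failed = []
        if len(values) < 10:
            failed.append("rule of 10")
        abs_values = sorted((abs(v) for v in values), reverse=True)
        total = sum(abs_values)
        # Dominance is skipped only if every value in the cell is zero
        if total and abs_values[0] * 100 / total > 50:
            failed.append("(1,50)")
        if total and sum(abs_values[:2]) * 100 / total > 67:
            failed.append("(2,67)")
        if failed:
            failures[cell] = failed
    return failures


# Hypothetical unit records: ((age group, health status), income)
records = (
    [(("<18", "Healthy"), 100)] * 12
    + [((">30", "Unhealthy"), 1000)] + [((">30", "Unhealthy"), 10)] * 9
    + [((">18 and <30", "Healthy"), 50)] * 5
)
print(check_crosstab_cells(records))
```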
Hazard models
  • Rule of 10
  • There must be at least 10 'failures'
Estimation residuals
  • Not normally released
  • Provide justification
Correlation coefficients
  • Rule of 10
     

How to apply dominance rule and rule of 10 for regression

Example 1: Linear Regression

A linear regression was run to predict income from age and health status.

Age was binned into three categories: <18 years, >18 and <30 years, and >30 years, with <18 years as the reference category.

Health status was categorised as healthy or unhealthy, with unhealthy as the reference category.

Suppose the desired output is the regression summary below:

Term | Beta Coefficient | P Value
Constant | 1.5 | 0.001
Age >18 and <30 | 2 | 0.004
Age >30 | 3 | 0.002

N=1000, R-Squared=0.67

We should provide a crosstabulation of counts and a dominance table for the output clearance team.

Crosstabulation of Counts

Age group | Unhealthy | Healthy
Age < 18 | 15 | 30
Age >18 and <30 | 40 | 70
Age >30 | 60 | 89

Counts for each combination of variables are greater than 10, so the rule of 10 is satisfied.

Dominance Table

We should also provide a dominance table for the output clearance team, such as the one below:

Please note: only the Top 1 and Top 2 Contribution to Total Income columns are required. The other columns are shown to illustrate the calculation. In practice, this table is usually presented as one long spreadsheet.

Unhealthy
Age group | Total Income | Top 1 Income | Top 2 Income | Top 1 Contribution to Total Income | Top 2 Contribution to Total Income
Age < 18 | $1,500 | $500 | $900 | 500/1,500 = 33% | 900/1,500 = 60%
Age >18 and <30 | $130,000 | $55,000 | $85,000 | 55,000/130,000 = 42% | 85,000/130,000 = 65%
Age >30 | $1,000,000 | $520,000 | $600,000 | 520,000/1,000,000 = 52% | 600,000/1,000,000 = 60%

Healthy
Age group | Total Income | Top 1 Income | Top 2 Income | Top 1 Contribution to Total Income | Top 2 Contribution to Total Income
Age < 18 | $2,500 | $1,000 | $1,300 | 1,000/2,500 = 40% | 1,300/2,500 = 52%
Age >18 and <30 | $230,000 | $155,000 | $200,000 | 155,000/230,000 = 67% | 200,000/230,000 = 87%
Age >30 | $2,000,000 | $600,000 | $900,000 | 600,000/2,000,000 = 30% | 900,000/2,000,000 = 45%

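There are multiple instances where the (1,50) and (2,67) rules are violated. For example, in the Unhealthy, Age >30 cell the top contributor accounts for 52% of total income, failing the (1,50) rule; in the Healthy, Age >18 and <30 cell the top contributor accounts for 67% and the top 2 contributors for 87%, failing both rules. Adjustments to the regression output will need to be applied before it can be cleared. The most common suggestion is to suppress the constant/intercept.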
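The contribution percentages in the example dominance table can be checked in a few lines. This sketch takes the Total Income, Top 1 Income and Top 2 Income columns as given and recomputes both shares for each cell; the dictionary layout is only an illustration.

```python
# (health status, age group): (total income, top 1 income, top 2 income)
example_cells = {
    ("Unhealthy", "Age < 18"): (1_500, 500, 900),
    ("Unhealthy", "Age >18 and <30"): (130_000, 55_000, 85_000),
    ("Unhealthy", "Age >30"): (1_000_000, 520_000, 600_000),
    ("Healthy", "Age < 18"): (2_500, 1_000, 1_300),
    ("Healthy", "Age >18 and <30"): (230_000, 155_000, 200_000),
    ("Healthy", "Age >30"): (2_000_000, 600_000, 900_000),
}

failures = {}
for cell, (total, top1, top2) in example_cells.items():
    top1_pct = top1 * 100 / total
    top2_pct = top2 * 100 / total
    failed = []
    if top1_pct > 50:
        failed.append("(1,50)")
    if top2_pct > 67:
        failed.append("(2,67)")
    if failed:
        failures[cell] = failed
    print(cell, f"top 1 = {top1_pct:.0f}%, top 2 = {top2_pct:.0f}%", failed or "pass")
```

Two cells fail: Unhealthy, Age >30 on the (1,50) rule, and Healthy, Age >18 and <30 on both rules.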


 

Unit records

Print, list or other commands that produce unit record level data

  • Prohibited

Request output clearance

To request output clearance:

  1. Make sure you have applied the output clearance rules.
  2. Move your output to the Output drive.
  3. Use the 'Request output clearance' link at the top of this page. If the 'Request output clearance' link does not generate an email, use the template below to submit your request.

Outputs generally take 2-3 business days to be cleared if all the rules have been followed. Outputs where the rules have been improperly applied will take longer. Large outputs will also take longer. To minimise clearance time, ensure that requests contain only necessary outputs and the rules have been correctly applied.

To: microdata.access@abs.gov.au

Subject: Request DataLab output clearance

Dear DataLab team

I have saved my output to the Output drive for ABS review.

Project name:
Output file name(s):
Data file(s) used (e.g. BLADE1617_CORE):
Description of the original and self-constructed variables:
Description of the analysis:

Additional requirements are listed below:

  • Weighted outputs: I have included the unweighted frequencies in my output.
  • Graphs/charts: I have included the underlying numbers used to produce the graphs/charts.
  • I have included any relevant code and log files.

Request input clearance

If you have your own data, code or files that you would like to use in DataLab, they need to be approved before they can be loaded. This is known as input clearance. Examples of inputs include:

  • data - aggregated data, tables, microdata and classifications
  • code - user written code and packages
  • other files - Word documents and PDFs

To request input clearance use the 'Request input clearance' link at the top of this page. If the Request input clearance link does not generate an email, use the template below to submit your request.

We aim to respond to your input clearance request within two to three business days. It is likely to take longer if your request is large, complex or needs clarification.

To: microdata.access@abs.gov.au

Subject: Request DataLab input file load

Dear DataLab team

I would like to load the attached file(s) to my DataLab project.

Project name:
File type (e.g. code or data):
Description of each file:

Additional information required for each data file:

  • organisation/individual owner of the data:
  • source of the data (include website link if applicable):
  • any terms of use or licensing that applies to the data that may restrict its use in the ABS DataLab and require additional permissions or conditions: