# Input and output clearance

DataLab

Requesting input and output clearance, output rules

Released
19/11/2021

Outputs from DataLab must be approved by ABS before they can be released. You must not remove anything (data, code, notes, etc.) from the DataLab yourself.

Before you ask for output clearance, apply the appropriate DataLab output rules to each statistic.

## Output rules

### Rule of 10

• Each cell/statistic should have at least 10 (unweighted) contributors
• Provide unweighted counts
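As a minimal sketch of how the rule of 10 could be checked before requesting clearance, the Python below counts unweighted contributors per cell and lists any cell with fewer than 10. The function name and the input shape (one cell label per contributing unit) are assumptions made for illustration, not part of DataLab.

```python
from collections import Counter

MIN_CONTRIBUTORS = 10  # threshold taken from the rule of 10


def cells_failing_rule_of_10(unit_cells, minimum=MIN_CONTRIBUTORS):
    """Return {cell: unweighted count} for cells with fewer than `minimum` contributors.

    `unit_cells` is a hypothetical input: one cell label per contributing unit.
    """
    counts = Counter(unit_cells)  # unweighted: every unit counts exactly once
    return {cell: n for cell, n in counts.items() if n < minimum}


# Hypothetical example: 12 units fall in cell "A" and 4 units in cell "B"
labels = ["A"] * 12 + ["B"] * 4
failing = cells_failing_rule_of_10(labels)
print(failing)  # {'B': 4}
```

The unweighted counts produced this way are also the counts to include alongside any weighted output.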

### Dominance rules

• (1,50) rule: the largest contributor of a cell/statistic should not exceed 50% of the total for that cell/statistic
• (2,67) rule: the two largest contributors of a cell/statistic should not exceed 67% of the total for that cell/statistic
• If a cell contains negative values, replace them with their absolute values, then take the largest one (or two) absolute value(s) and calculate the (1,50) and (2,67) statistics as their contribution to the total of absolute values
• Provide evidence
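The dominance rules above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes you have the unit-level values that contribute to a single cell.

```python
def dominance_shares(values):
    """Return (top 1 share, top 2 share) of the total of absolute values."""
    # Negative values are replaced with their absolute values first
    magnitudes = sorted((abs(v) for v in values), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("total of absolute values is zero; shares are undefined")
    top1 = magnitudes[0] / total
    top2 = sum(magnitudes[:2]) / total
    return top1, top2


def passes_dominance(values, top1_limit=0.50, top2_limit=0.67):
    """True when neither the (1,50) nor the (2,67) limit is exceeded."""
    top1, top2 = dominance_shares(values)
    return top1 <= top1_limit and top2 <= top2_limit


# Hypothetical cell with a negative contributor: absolute total is 100,
# the largest unit holds 40% and the two largest hold 70%
print(passes_dominance([-40, 30, 20, 10]))      # False
print(passes_dominance([30, 25, 20, 15, 10]))   # True
```

The first cell passes the (1,50) rule but fails the (2,67) rule, so it would still need suppression.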

### Applying dominance rules

The dominance rule applies to tables that present magnitude or continuous variables such as income or turnover. It does not apply to categorical variables or counts. The rule is designed to prevent the re-identification of units that contribute a large percentage of a cell's total value, which could in turn reveal information about individuals, households or businesses. The cell dominance rule defines the number of units that are allowed to contribute a defined percentage of the total.

DataLab has a (1,50) and (2,67) rule. This means that the top contributor cannot contribute more than 50% of the total value to a cell and the top 2 contributors cannot contribute more than 67% of the total value to a cell.

Dominance testing is required if any mean, total, ratio, proportion or measure of concentration statistic can be calculated for continuous or magnitude variables.

While ratios/proportions can be continuous, if the numerator and denominator of the ratios/proportions are counts, we do not need dominance statistics.

Dominance testing is also required when there is a regression with a continuous dependent variable and categorical independent variables. In this case, every combination of categorical variables (crosstab) will need to be tested for dominance against the dependent variable.

The table below shows an example of the additional information that analysts need to provide for output clearance when requesting a mean, total, ratio, proportion or measure of concentration.

There are multiple instances where the (1,50) and (2,67) rules are violated. For example:

The top contributor in LGA 3 contributes 2.51/3.22 = 78% of the total. This violates the (1,50) rule.

The top 2 contributors in LGA 3 contribute 3.03/3.22 = 94% of the total. This violates the (2,67) rule.

You may also need to apply consequential suppression to your table so suppressed values cannot be derived.
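The LGA 3 calculation can be repeated for every row of this section's example table. The sketch below, using only the total, top 1 and top 2 figures from that table, recomputes each share and flags the rows that break either rule. The variable and function names are illustrative assumptions.

```python
# (LGA, total profit, top 1 contributor, top 2 contributors), all in $M,
# copied from the example table in this section
rows = [
    (1, 1.65, 0.51, 0.82),
    (2, 0.94, 0.11, 0.15),
    (3, 3.22, 2.51, 3.03),
    (4, 2.10, 1.52, 1.83),
    (5, 2.05, 0.50, 0.80),
]


def check_row(total, top1, top2):
    """Return the two shares and whether each dominance rule is violated."""
    share1 = top1 / total
    share2 = top2 / total
    return share1, share2, share1 > 0.50, share2 > 0.67


flagged = []
for lga, total, top1, top2 in rows:
    share1, share2, fails_1_50, fails_2_67 = check_row(total, top1, top2)
    if fails_1_50 or fails_2_67:
        flagged.append(lga)
    print(f"LGA {lga}: top 1 = {share1:.0%}, top 2 = {share2:.0%}, "
          f"(1,50) {'FAIL' if fails_1_50 else 'ok'}, "
          f"(2,67) {'FAIL' if fails_2_67 else 'ok'}")

print(flagged)  # [3, 4]
```

LGA 3 and LGA 4 both fail, so both would need primary suppression before any consequential suppression is considered.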

| LGA | Total Profit ($M) | Top 1 Contributor ($M) | Top 2 Contributors ($M) | Top 1 Contribution to Total Profit (%) | Top 2 Contribution to Total Profit (%) |
| --- | --- | --- | --- | --- | --- |
| 1 | 1.65 | 0.51 | 0.82 | 31 | 50 |
| 2 | 0.94 | 0.11 | 0.15 | 12 | 16 |
| 3 | 3.22 | 2.51 | 3.03 | 78 | 94 |
| 4 | 2.1 | 1.52 | 1.83 | 72 | 87 |
| 5 | 2.05 | 0.5 | 0.8 | 24 | 39 |

### Group disclosure rule

• In all tabular and similar outputs, no cell should contain 90% or more of the column or row total
• Provide evidence

### Minimum contributors for percentiles

| Percentile | Minimum contributors |
| --- | --- |
| 0.01 | 500 |
| 0.05 | 100 |
| 0.10 | 50 |
| 0.25 | 20 |
| 0.50 | 10 |
| 0.75 | 20 |
| 0.90 | 50 |
| 0.95 | 100 |
| 0.99 | 500 |

### Minimum 10 degrees of freedom

• All modelled output should have at least 10 degrees of freedom
• Degrees of freedom = number of observations - number of parameters - other restrictions of the model

### Consequential suppression

If one or more of the rules fail and suppression is applied, one or more additional cells should be suppressed to protect the value of the primary suppressed cell from being worked out.

In the case of the rule of 10 failing, someone who has access to multiple tables regarding the same sample must not be able to use those tables to deduce the values of cells with fewer than 10 observations.

In the case of the dominance rules failing, if area11 + area12 + area13 = area1, and a cell in area11 is suppressed, then the same cell in area12 and/or area13 also needs to be suppressed such that both dominance rules pass for the combined suppressed cells. Likewise, for any other relationships.
Examples include:

• Industry11 + Industry12 + Industry13 = Industry1
• variable1 + variable2 + variable3 = variable4
• (variable1 - variable2) / variable1 = variable3
• variable1 / variable2 = variable3

## Preparing your output for clearance

### Descriptive statistics

##### Frequency tables

• Rule of 10
• Group disclosure rule
• Consequential suppression

##### Magnitude tables, means, totals, indices, indicators, proportions, measures of concentration

• Rule of 10
• Dominance rules
• Group disclosure rule
• Consequential suppression

##### Ratios

• Rule of 10
• Dominance rules
• Group disclosure rule
• Consequential suppression
• If the ratio is calculated at the business or individual level, the ratio is treated as another variable on the dataset and the (1,50) and (2,67) dominance rules apply as usual
• If the ratio is in the form of aggregate/aggregate, the (1,50) and (2,67) dominance rules apply to the numerator and denominator separately. If either the numerator or the denominator fails, the ratio is suppressed

##### Maximums, minimums

Subject to minimum contributors for percentiles, use:

• 99th and 1st percentiles
• 95th and 5th percentiles
• 90th and 10th percentiles

##### Quantiles (including median, quartiles, quintiles, deciles, percentiles)

• Minimum contributors for percentiles

##### Box plot

• Same rules apply as per quartiles, maximums and minimums
• Minimum contributors for percentiles

##### Mode

• Rule of 10

##### Higher moments of distributions/measures of spread (including variance, covariance, kurtosis, skewness)

• Rule of 10

##### Graphs, pictorial representations of actual data

• Not normally released if showing individual observations

### Correlation and regression analysis

##### Regression coefficients, and summary and test statistics

• Minimum 10 degrees of freedom
• R-squared ≤ 0.8

For regressions that have a continuous dependent variable and only categorical independent variables, the regression will return the average of each category.
In this case:

• Rule of 10
• Dominance rules
• Provide a cross-tab of the independent variables. Each cell must have at least 10 observations.
• Each cell in the cross-tab needs to be tested for the (1,50) and (2,67) dominance rules for the dependent variable.

##### Hazard models

• Rule of 10
• There must be at least 10 'failures'

##### Estimation residuals

• Not normally released
• Provide justification

##### Correlation coefficients

• Rule of 10

### How to apply dominance rule and rule of 10 for regression

#### Example 1: Linear Regression

A linear regression was run to predict income by age and health status. Age was binned into three categories: <18 years, >18 and <30 years, and >30 years, where <18 was the reference category. Health status was categorised as healthy or unhealthy, where unhealthy was the reference category.

Suppose the desired output was the regression summary below:

| | Beta Coefficient | P Value |
| --- | --- | --- |
| Constant | 1.5 | 0.001 |
| Age >18 and <30 | 2 | 0.004 |
| Age >30 | 3 | 0.002 |

N=1000, R-Squared=0.67

We should provide a crosstabulation of counts and a dominance table for the output clearance team.

#### Crosstabulation of Counts

| | Unhealthy | Healthy |
| --- | --- | --- |
| Age <18 | 15 | 30 |
| Age >18 and <30 | 40 | 70 |
| Age >30 | 60 | 89 |

Counts for each combination of variables are greater than 10. The rule of 10 is satisfied.

#### Dominance Table

We should provide a dominance table for the output clearance team like the one below.

Please note: Only the columns Top 1 and Top 2 Contribution to Total Income are required. The other columns are presented to illustrate the calculation. This table is also usually presented in one long spreadsheet.

| Unhealthy | Total Income | Top 1 Income | Top 2 Income | Top 1 Contribution to Total Income | Top 2 Contribution to Total Income |
| --- | --- | --- | --- | --- | --- |
| Age <18 | $1,500 | $500 | $900 | 500/1,500 = 33% | 900/1,500 = 60% |
| Age >18 and <30 | $130,000 | $55,000 | $85,000 | 55,000/130,000 = 42% | 85,000/130,000 = 65% |
| Age >30 | $1,000,000 | $520,000 | $600,000 | 520,000/1,000,000 = 52% | 600,000/1,000,000 = 60% |

| Healthy | Total Income | Top 1 Income | Top 2 Income | Top 1 Contribution to Total Income | Top 2 Contribution to Total Income |
| --- | --- | --- | --- | --- | --- |
| Age <18 | $2,500 | $1,000 | $1,300 | 1,000/2,500 = 40% | 1,300/2,500 = 52% |
| Age >18 and <30 | $230,000 | $155,000 | $200,000 | 155,000/230,000 = 67% | 200,000/230,000 = 87% |
| Age >30 | $2,000,000 | $600,000 | $900,000 | 600,000/2,000,000 = 30% | 900,000/2,000,000 = 45% |

There are multiple instances where the (1,50) and (2,67) rules are violated. Adjustments to the regression output will need to be applied before it can be cleared. The most common suggestion is to suppress the constant/intercept.
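The checks in this example can be sketched in Python as below. The counts and the total, top 1 and top 2 income figures are copied from the crosstabulation and dominance tables; the shares are recomputed from those columns rather than read from the percentage columns. The dictionary layout and function name are assumptions made for illustration.

```python
# (count, total income, top 1 income, top 2 income) for each crosstab cell,
# copied from the worked example
cells = {
    ("Age <18", "Unhealthy"): (15, 1_500, 500, 900),
    ("Age >18 and <30", "Unhealthy"): (40, 130_000, 55_000, 85_000),
    ("Age >30", "Unhealthy"): (60, 1_000_000, 520_000, 600_000),
    ("Age <18", "Healthy"): (30, 2_500, 1_000, 1_300),
    ("Age >18 and <30", "Healthy"): (70, 230_000, 155_000, 200_000),
    ("Age >30", "Healthy"): (89, 2_000_000, 600_000, 900_000),
}


def cell_problems(count, total, top1, top2):
    """List the output rules that a single crosstab cell fails."""
    problems = []
    if count < 10:
        problems.append("rule of 10")
    if top1 / total > 0.50:
        problems.append("(1,50)")
    if top2 / total > 0.67:
        problems.append("(2,67)")
    return problems


report = {cell: cell_problems(*figures) for cell, figures in cells.items()}
for cell, problems in report.items():
    print(cell, problems or "passes")
```

A cell that reports any problem is a candidate for adjustment, such as suppressing the constant/intercept, before the regression output is submitted.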

### Unit records

Print, list or other commands that produce unit record level data

• Prohibited

## Request output clearance

To request output clearance:

1. Make sure you have applied the output clearance rules.
2. Move your output to the Output drive.
3. Use the 'Request output clearance' link at the top of this page. If the link does not generate an email, use the template below to submit your request.

Outputs generally take 2-3 business days to be cleared if all the rules have been followed. Outputs where the rules have been improperly applied will take longer. Large outputs will also take longer. To minimise clearance time, ensure that requests contain only necessary outputs and the rules have been correctly applied.

To: microdata.access@abs.gov.au

Subject: Request DataLab output clearance

Dear DataLab team

I have saved my output to the Output drive for ABS review.

Project name:
Output file name(s):
Data file(s) used (e.g. BLADE1617_CORE):
Description of the original and self-constructed variables:
Description of the analysis:

Additional requirements are listed below:

• Weighted outputs: I have included the unweighted frequencies in my output.
• Graphs/charts: I have included the underlying numbers used to produce the graphs/charts.
• I have included any relevant code and log files.

## Request input clearance

If you have your own data, code or files that you would like to use in DataLab, they need to be approved before they can be loaded. This is known as input clearance. Examples of inputs include:

• data - aggregated data, tables, microdata and classifications
• code - user written code and packages
• other files - Word documents and PDFs

To request input clearance, use the 'Request input clearance' link at the top of this page. If the link does not generate an email, use the template below to submit your request.

We aim to respond to your input clearance request within two to three business days. It is likely to take longer if your request is large, complex or needs clarification.

To: microdata.access@abs.gov.au

Subject: Request DataLab input file load

Dear DataLab team

I would like to load the attached file(s) to my DataLab project.

Project name:
File type (e.g. code or data):
Description of each file:

Additional information required for each data file:

• organisation/individual owner of the data:
• source of the data (include website link if applicable):
• any terms of use or licensing that apply to the data and that may restrict its use in the ABS DataLab or require additional permissions or conditions: