# Input and output clearance

DataLab

Requesting input, output and transfer clearance. Applying the output rules to your analysis.

Released 19/11/2021

## Output rules quick reference table

The most common types of analysis are listed below along with the applicable rules for output. Other output types will be assessed based on similar principles.

| Output type | Applicable rules |
| --- | --- |
| Frequency tables (counts, percentages) | Rule of 10; Group disclosure |
| Magnitude statistics (means, sums, ratios) | Rule of 10; Group disclosure; Dominance |
| Quantiles (percentiles, medians) | Minimum contributors for quantiles |
| Minimums, maximums, ranges | Minimum contributors for quantiles |
| Models including regressions | Degrees of freedom; Model-specific rules |
| Charts (graphs, plots and histograms) | Chart clearance |
| Microdata | Not appropriate for output |
| Synthetic microdata | Not appropriate for output |

## Rule of 10

The rule of 10 refers to the minimum number of contributors required for each cell or statistic. The underlying (unweighted) count of observations must meet this threshold, and evidence must be provided.

If multiple tables are produced, it should not be possible to calculate a difference of fewer than ten by combining the tables.
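A rough screen for differencing is to list every published total that describes the same or nested populations across your tables and check the pairwise gaps. This is a minimal sketch under stated assumptions: `differencing_risks` is a hypothetical helper, the example totals are invented, and treating a difference of exactly zero as safe is an assumption rather than a stated DataLab rule.

```python
from itertools import combinations

MIN_DIFFERENCE = 10  # differences below ten must not be derivable


def differencing_risks(totals: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return pairs of published totals that differ by between 1 and 9 units.

    `totals` maps a label to an unweighted count for overlapping or nested groups.
    A difference of zero is treated as safe (an assumption): it isolates no units.
    """
    risks = []
    for (name_a, count_a), (name_b, count_b) in combinations(totals.items(), 2):
        diff = abs(count_a - count_b)
        if 0 < diff < MIN_DIFFERENCE:
            risks.append((name_a, name_b, diff))
    return risks


# Invented totals from three hypothetical tables of the same population.
published = {
    "Table 1: aged 15 and over": 1204,
    "Table 2: aged 16 and over": 1198,  # a gap of 6 exposes the 15-year-olds
    "Table 3: aged 18 and over": 1150,
}
print(differencing_risks(published))
# [('Table 1: aged 15 and over', 'Table 2: aged 16 and over', 6)]
```

One way to resolve a flagged pair is to use the same category boundaries in every table.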

The rule of 10 applies to most outputs including counts, percentages (both numerator and denominator), means, sums, ratios, and other statistics.
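A minimal sketch of the cell-level check, assuming you already have the unweighted count behind each cell. `failing_cells` and `percentage_is_safe` are hypothetical helper names and the counts are invented.

```python
MIN_CONTRIBUTORS = 10  # rule of 10: at least 10 unweighted contributors


def failing_cells(counts: dict[str, int]) -> list[str]:
    """Cells whose unweighted count is below the threshold."""
    return [cell for cell, n in counts.items() if n < MIN_CONTRIBUTORS]


def percentage_is_safe(numerator_n: int, denominator_n: int) -> bool:
    """A percentage needs both its numerator and denominator to meet the rule."""
    return numerator_n >= MIN_CONTRIBUTORS and denominator_n >= MIN_CONTRIBUTORS


# Invented unweighted counts for a single frequency table.
counts = {"Employed": 152, "Unemployed": 7, "Not in the labour force": 64}
print(failing_cells(counts))  # ['Unemployed']
print(percentage_is_safe(counts["Unemployed"], sum(counts.values())))  # False
```

For means, sums and ratios, the same check applies to the number of contributors behind each statistic rather than to the statistic's value.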

Options for making output safe include suppressing small counts, aggregating categories or perturbing values. If a suppressed cell can still be derived or estimated from other outputs, one or more additional values should be suppressed so that the value of the primary suppressed cell cannot be worked out.
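The sketch below illustrates primary and secondary suppression for a single table published with row and column totals. It is a simple heuristic, not DataLab's procedure: whenever a row or column contains exactly one suppressed cell (which could be recovered by subtraction from the total), it also suppresses the smallest remaining cell in that line, and repeats until nothing changes. `suppression_mask` is a hypothetical name, the table is invented, and a real pattern may also need to account for other published tables.

```python
MIN_CONTRIBUTORS = 10


def suppression_mask(table: list[list[int]]) -> list[list[bool]]:
    """Return a mask (True = suppress) for a table with published row/column totals."""
    # Primary suppression: cells that fail the rule of 10.
    mask = [[count < MIN_CONTRIBUTORS for count in row] for row in table]
    n_rows, n_cols = len(table), len(table[0])

    def protect(cells: list[tuple[int, int]]) -> bool:
        """If exactly one cell in this line is suppressed, suppress one more."""
        hidden = [rc for rc in cells if mask[rc[0]][rc[1]]]
        visible = [rc for rc in cells if not mask[rc[0]][rc[1]]]
        if len(hidden) == 1 and visible:
            # Secondary suppression: the smallest count loses the least information.
            r, c = min(visible, key=lambda rc: table[rc[0]][rc[1]])
            mask[r][c] = True
            return True
        return False

    changed = True
    while changed:  # repeat until every row and column is protected
        changed = False
        for r in range(n_rows):
            changed |= protect([(r, c) for c in range(n_cols)])
        for c in range(n_cols):
            changed |= protect([(r, c) for r in range(n_rows)])
    return mask


# Invented counts; only the cell with 4 fails the rule of 10.
table = [[25, 4, 31], [18, 22, 40], [12, 15, 19]]
for row in suppression_mask(table):
    print(["x" if hidden else "." for hidden in row])
# ['x', 'x', '.']
# ['.', '.', '.']
# ['x', 'x', '.']
```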

## Dominance

The dominance rule is designed to prevent the re-identification of units that contribute a large percentage of a cell's total value, which could in turn reveal information about individuals, households or businesses.

DataLab has a (1,50) and a (2,67) rule. This means that for any cell, the largest contributor cannot account for more than 50% of the total value and the largest two contributors cannot account for more than 67% of the total value.

Where a variable can take both positive and negative values, the negative values should be replaced with absolute values before determining the largest contributors and the total. The largest absolute value is then divided by the sum of absolute values to check the (1,50) rule, and the sum of the two largest absolute values is divided by the sum of absolute values to check the (2,67) rule.
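The calculation above can be sketched as follows, assuming you have each unit's contribution to a single cell. `dominance_ok` is a hypothetical helper and the values are invented; treating an all-zero cell as passing is an assumption.

```python
def dominance_ok(values: list[float]) -> tuple[bool, bool]:
    """Check the (1,50) and (2,67) dominance rules for one cell.

    Negative contributions are replaced by their absolute values.
    Returns (passes_1_50, passes_2_67).
    """
    magnitudes = sorted((abs(v) for v in values), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        return True, True  # assumption: an all-zero cell has no dominant contributor
    top1 = magnitudes[0]
    top2 = sum(magnitudes[:2])
    return top1 / total <= 0.50, top2 / total <= 0.67


# Invented contributions for three cells of ten hypothetical units each.
print(dominance_ok([5200, 1100, 900, -300, 400, 250, 180, 120, 90, 60]))  # (False, False)
print(dominance_ok([45, -25, 8, 6, 5, 4, 3, 2, 1, 1]))  # (True, False)
print(dominance_ok([15, 14, 13, 12, 11, 10, 9, 8, 5, -3]))  # (True, True)
```

The second cell shows why both rules are needed: no single unit exceeds 50%, but the top two together hold 70% of the absolute total.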

As with the rule of 10, if a cell fails the dominance rule and is suppressed, but its value can still be derived or estimated from other outputs, one or more additional values should be suppressed to protect the value of the primary suppressed cell from being worked out.

Dominance must be checked if any mean, total or similar statistic is calculated for continuous or magnitude variables. It does not apply to counts.

## Group disclosure

Group (or attribute) disclosure occurs when all or nearly all units that have one feature also have some other feature. This means that even when the individual units may appear protected based on other rules, a previously unknown attribute of a unit may be disclosed based on the attributes of the group. Group disclosure risk should be assessed when any cell contains more than 90% of the total number of units in its row or column.
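A minimal sketch of the 90% screen for a two-way frequency table. `group_disclosure_flags` is a hypothetical helper and the counts and labels are invented.

```python
GROUP_THRESHOLD = 0.90  # assess cells holding more than 90% of a row or column


def group_disclosure_flags(
    table: list[list[int]], row_labels: list[str], col_labels: list[str]
) -> list[tuple[str, str, str]]:
    """Return (row, column, direction) for each cell above the threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    flags = []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if row_totals[i] and count / row_totals[i] > GROUP_THRESHOLD:
                flags.append((row_labels[i], col_labels[j], "row"))
            if col_totals[j] and count / col_totals[j] > GROUP_THRESHOLD:
                flags.append((row_labels[i], col_labels[j], "column"))
    return flags


# Invented counts: rows are areas, columns are a yes/no characteristic.
table = [[188, 12], [40, 60]]
print(group_disclosure_flags(table, ["Area A", "Area B"], ["Yes", "No"]))
# [('Area A', 'Yes', 'row')]
```

A flag marks a cell for assessment, not automatic suppression: whether treatment is needed depends on the sensitivity of the characteristic.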

This rule applies to frequency tables. Whether group disclosure requires treatment depends on the sensitivity and nature of the output.

## Minimum contributors for quantiles

Quantiles and other relative ranks must be based on a minimum number of contributors depending on the precision. Underlying unweighted counts should be provided when reporting quantiles in the outputs. For information on required contributors for quantiles, see the table below:

| Quantile | Minimum contributors |
| --- | --- |
| Medians (0.50) | 10 |
| Quartiles (0.25, 0.5, 0.75) | 20 |
| Quintiles (0.2, 0.4, 0.6, 0.8) | 25 |
| Deciles (0.1, 0.2, 0.3 ... 0.9) | 50 |
| Vigintiles (0.05, 0.1, 0.15 ... 0.95) | 100 |
| Percentiles (0.01, 0.02 ... 0.99) | 500 |

Minimums and maximums are generally unsafe to output. The following percentiles are safe options if the minimum contributors rule is satisfied:

• 1st and 99th percentiles
• 5th and 95th percentiles
• 10th and 90th percentiles
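The table and the safe alternatives to minimums and maximums can be combined into a small lookup. This sketch rests on an interpretation, not a stated rule: each requested probability is matched to the coarsest quantile family whose grid contains it (so 0.05 counts as a vigintile and 0.01 as a percentile), and a set of quantiles needs the strictest requirement among its members. `required_contributors` and `safe_extreme_pairs` are hypothetical names.

```python
from fractions import Fraction

# (family, divisions, minimum unweighted contributors) from the table above,
# ordered from coarsest to finest precision.
QUANTILE_RULES = [
    ("median", 2, 10),
    ("quartile", 4, 20),
    ("quintile", 5, 25),
    ("decile", 10, 50),
    ("vigintile", 20, 100),
    ("percentile", 100, 500),
]


def required_contributors(probabilities: list[float]) -> int:
    """Minimum contributors needed to output this set of quantiles."""
    required = 0
    for p in probabilities:
        q = Fraction(p).limit_denominator(100)  # e.g. 0.05 -> 1/20, avoiding float noise
        for _family, divisions, minimum in QUANTILE_RULES:
            if (q * divisions).denominator == 1:  # p lies on this family's grid
                required = max(required, minimum)
                break
        else:
            raise ValueError(f"{p} is not on a percentile grid")
    return required


def safe_extreme_pairs(n: int) -> list[tuple[float, float]]:
    """Percentile pairs that could replace the minimum and maximum for n contributors."""
    pairs = [(0.01, 0.99), (0.05, 0.95), (0.10, 0.90)]
    return [pair for pair in pairs if n >= required_contributors(list(pair))]


print(required_contributors([0.25, 0.5, 0.75]))  # 20
print(required_contributors([0.1, 0.5, 0.9]))  # 50
print(safe_extreme_pairs(120))  # [(0.05, 0.95), (0.1, 0.9)]
```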

## Degrees of freedom

Models and regressions are generally safe to output. However, overfitted models can pose a disclosure risk. All models and regressions must have a minimum of 10 degrees of freedom and evidence that this has been met should be provided.

The degrees of freedom are calculated by subtracting the number of parameters and other model restrictions from the total number of observations that contribute to the model.
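The subtraction is simple, but writing it down makes the evidence easy to include with a request. A minimal sketch; `degrees_of_freedom` and its `n_restrictions` parameter are hypothetical names, and the model sizes are invented.

```python
MIN_DEGREES_OF_FREEDOM = 10


def degrees_of_freedom(n_observations: int, n_parameters: int, n_restrictions: int = 0) -> int:
    """Contributing observations minus estimated parameters and other restrictions."""
    return n_observations - n_parameters - n_restrictions


# Invented model: 18 contributing observations, an intercept plus 6 slope coefficients.
df = degrees_of_freedom(n_observations=18, n_parameters=7)
print(df, df >= MIN_DEGREES_OF_FREEDOM)  # 11 True
```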

## Model-specific rules

There are additional rules for specific model types.

For ordinary least squares regressions, the R-squared should be lower than 0.9. If the R-squared is higher than this, the constant may need to be suppressed to prevent predictions. This requirement does not apply to other models such as fixed effects or two-stage regressions.

Additionally, an ordinary least squares regression with a continuous dependent variable and only categorical independent variables will approximate the tabular means. Adding a continuous independent variable or suppressing the intercept reduces the disclosure risk. Otherwise, apply the rule of 10 and the dominance rule.

For survival curves, each step change in the survival curve should represent at least 10 data subjects.
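A sketch of one reading of this rule: a Kaplan-Meier style curve steps down only at times when events occur, so each step should rest on at least 10 events. That interpretation, the helper name `small_steps`, and the data are assumptions for illustration.

```python
from collections import Counter

MIN_SUBJECTS_PER_STEP = 10


def small_steps(times: list[float], observed: list[bool]) -> dict[float, int]:
    """Event times at which the curve would step down on fewer than 10 events.

    times: follow-up time per subject; observed: True if the event occurred at
    that time, False if the subject was censored (censoring causes no step).
    """
    steps = Counter(t for t, event in zip(times, observed) if event)
    return {t: n for t, n in sorted(steps.items()) if n < MIN_SUBJECTS_PER_STEP}


# Invented follow-up data in whole months for 36 hypothetical subjects.
times = [3] * 12 + [6] * 4 + [12] * 15 + [12] * 5
observed = [True] * 31 + [False] * 5
print(small_steps(times, observed))  # {6: 4}
```

If a step is flagged, reporting time in coarser intervals (for example quarters instead of months) pools events into fewer, larger steps.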

Correlation coefficients should be calculated based on a minimum of 10 contributors.

Gini coefficients are usually safe to output, and must be based on a minimum of 10 contributors.

For classification and regression trees, any underlying unweighted counts must meet the rule of 10.

For other models, please provide evidence that no estimates or parameters are derived from fewer than 10 underlying contributors and explain why the output is non-disclosive.

## Chart clearance

All graphs, plots and other charts are subject to the output rules that apply to the underlying output type. The data used in the chart must be provided, accompanied by any relevant supporting evidence that it meets output rules.

Charts that plot characteristics of individual units or groups of fewer than 10 units will not be cleared.
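For a histogram, the underlying output is a frequency table, so each plotted bar needs at least 10 units. A minimal sketch that produces the bin counts to submit with the chart and lists bins that would fail; `histogram_counts` and `unsafe_bins` are hypothetical names and the ages are invented.

```python
import math
from collections import Counter

MIN_UNITS_PER_GROUP = 10


def histogram_counts(values: list[float], bin_width: float) -> dict[float, int]:
    """Unweighted counts per bin, keyed by each bin's lower edge.

    Counter only records bins that contain at least one unit, so empty bins
    are not listed.
    """
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def unsafe_bins(counts: dict[float, int]) -> list[float]:
    """Lower edges of bins that would plot fewer than 10 units."""
    return [edge for edge, n in counts.items() if n < MIN_UNITS_PER_GROUP]


# Invented ages for 41 hypothetical respondents.
ages = [23] * 12 + [37] * 15 + [44] * 11 + [68] * 3
counts = histogram_counts(ages, bin_width=10)
print(counts)  # {20: 12, 30: 15, 40: 11, 60: 3}
print(unsafe_bins(counts))  # [60]
```

A flagged bin can be handled by aggregation, for example by widening the bins or merging the top bins into one open-ended category.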