To minimise the risk of identifying individuals in aggregate statistics, a technique has been developed to randomly adjust cell values. Random adjustment of the data, known as perturbation, is considered to be the most satisfactory technique for avoiding the release of identifiable data while maximising the range of information that can be released. These adjustments have a negligible impact on the underlying pattern of the statistics.
Perturbation is applied across all non-zero cells in a table, including the totals cells. Perturbation may change the true cell value by either increasing or decreasing the value by a small amount. Within this context, although cells may appear to contain none, or all, of the relevant population, this is not necessarily a reflection of the true value of the cell. These adjustments introduce random errors, but with almost no bias. The information value of the table as a whole is not significantly impaired.
Random perturbation can be a source of frustration to users, as it can result in inconsistencies in the data. Most tables reporting basic statistics will not show significant discrepancies due to random perturbation. However, as the degree of complexity of tables increases, the need for random perturbation remains and it will continue to be used in most TableBuilder datasets.
In TableBuilder, totals are not calculated by summing the interior values of the table. Instead, more accurate totals are provided by calculating the true total, and then perturbing this value. If you attempt to reconstruct a total on the basis of the perturbed interior cells, you will add together the small changes made to each cell which may result in a large change relative to the perturbed total. It is recommended that totals are constructed in TableBuilder, rather than by summing the interior cells from an exported table.
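The effect described above can be illustrated with a small simulation. This is a sketch only: the `perturb` function, its `max_shift` parameter and the uniform integer shift are hypothetical assumptions made for illustration, and they do not represent the perturbation method actually used in TableBuilder.

```python
import random


def perturb(value, rng, max_shift=2):
    """Hypothetical perturbation: shift a non-zero count by a small random integer.

    The uniform shift in [-max_shift, max_shift] is an assumption for this sketch.
    """
    if value == 0:
        return 0  # zero cells are left unchanged
    return max(0, value + rng.randint(-max_shift, max_shift))


def total_errors(true_cells, rng):
    """Return (error of the summed perturbed cells, error of the perturbed true total)."""
    true_total = sum(true_cells)
    summed_total = sum(perturb(v, rng) for v in true_cells)
    direct_total = perturb(true_total, rng)
    return abs(summed_total - true_total), abs(direct_total - true_total)


rng = random.Random(0)      # fixed seed so the run is repeatable
true_cells = [50] * 200     # a table with 200 interior cells
trials = [total_errors(true_cells, rng) for _ in range(1000)]

mean_summed_error = sum(s for s, _ in trials) / len(trials)
mean_direct_error = sum(d for _, d in trials) / len(trials)

print(f"mean error when summing perturbed cells: {mean_summed_error:.1f}")
print(f"mean error when perturbing the true total: {mean_direct_error:.1f}")
```

Under these assumptions, the error of a total rebuilt from interior cells grows with the number of cells, while the error of a directly perturbed total never exceeds `max_shift`.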
In addition to perturbation, some TableBuilder datasets use the additivity technique to make further adjustments to the data to ensure that the interior cells add up to the totals. As additivity is not required for confidentiality purposes, most datasets in TableBuilder do not use the additivity technique. For further information, see Additivity below.
When calculating proportions, percentages or ratios from cross-classified or small area tables, the introduced random error can be ignored except for small cells. The introduced random adjustments made to cells in a table are independent of the size of the original cell value, so perturbation has the greatest relative impact on small cell values. The information value of the table as a whole is not impaired as small cell values are also strongly affected by other factors, such as sampling error, respondent errors and processing errors.
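Because the adjustment does not depend on the size of the cell, its relative impact shrinks as the cell value grows. The arithmetic can be sketched as follows; the shift of 2 and the function name `relative_impact` are hypothetical values chosen for illustration, not figures published for TableBuilder.

```python
def relative_impact(true_value, shift):
    """Size of an adjustment relative to the true cell value."""
    return abs(shift) / true_value


# The same hypothetical shift of 2 applied to cells of different sizes.
for true_value in (4, 40, 4000):
    print(true_value, f"{relative_impact(true_value, 2):.2%}")
# 4 50.00%
# 40 5.00%
# 4000 0.05%
```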
Caution should be exercised when interpreting and using cells with small values or large percentage relative standard error (RSE) values. RSEs are provided for survey-based datasets that are subject to sampling variability. Datasets in Census TableBuilder are not weighted so RSEs are not applicable for Census data. See the Relative standard error section for further information in relation to survey datasets.
When analysing a table of means or sums of a continuous variable, it is recommended that the table be compared to the corresponding table of counts of units with a valid response for that continuous variable. No reliance should be placed on estimates of means or sums in cells with a large RSE or for which the corresponding cell count is small. For more information about using continuous variables, see the Summation options, ranges and quantiles section.
General information about confidentiality and perturbation is provided in the Treating aggregate data section of the ABS Confidentiality Series.
For information about Census data confidentiality in other products, see Census of Population and Housing - QuickStats, Community Profiles and DataPacks User Guide, Australia, 2016 (2916.0).
The additivity technique makes additional adjustments to table cells to ensure internal consistency of its tables. As additivity is not required for confidentiality purposes, most datasets in TableBuilder do not use the additivity technique. The Census of Population and Housing datasets used this technique until June 2017, when it was removed from all Census TableBuilder datasets. Users should refer to the dataset-specific information to ascertain whether the additivity technique has been used for that dataset. Information about each dataset is linked from the Available microdata
For datasets where the additivity technique is used, secondary adjustments are made to cell values so that each table of estimates of counts will be internally consistent. ‘Internally consistent’ means that the interior cells add up to the totals. The tables at different geographic levels are adjusted independently, and tables at a higher geographic level may not be equal to the sum of the tables for the component geographic units. A table of estimates of sums will in general not be internally consistent. Also, the technique may introduce minor discrepancies between tables with similar data items.
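The idea of a secondary adjustment that restores internal consistency can be sketched as below. The function `make_additive` and its rule of moving the largest cells one unit at a time are hypothetical assumptions for illustration. The actual additivity technique is not described in this section and may work quite differently.

```python
def make_additive(cells, total):
    """Adjust interior cells so that they sum to the given total.

    Hypothetical rule: change the largest non-zero cells by one unit each,
    cycling until the gap is closed. Zero cells are left at zero.
    """
    if total < 0:
        raise ValueError("total must not be negative")
    adjusted = list(cells)
    gap = total - sum(adjusted)
    if gap != 0 and not any(adjusted):
        raise ValueError("no non-zero cells available to adjust")
    step = 1 if gap > 0 else -1
    # Visit cells from largest to smallest so the relative change stays small.
    order = sorted(range(len(adjusted)), key=lambda i: adjusted[i], reverse=True)
    position = 0
    while gap != 0:
        index = order[position % len(order)]
        if adjusted[index] > 0:  # skip zero cells
            adjusted[index] += step
            gap -= step
        position += 1
    return adjusted


perturbed_cells = [12, 7, 30, 0]   # interior cells after perturbation (sum is 49)
perturbed_total = 51               # total perturbed separately
consistent_cells = make_additive(perturbed_cells, perturbed_total)
print(consistent_cells, sum(consistent_cells))
# [13, 7, 31, 0] 51
```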
For datasets where the additivity technique is not used, a table of estimates of counts or proportions will in general not be internally consistent. However, cells in one table will be consistent with other tables containing the same data item combinations.
RSE estimates do not take into account the effects of the additivity technique. To ensure consistency with the cell values, the additivity technique may scale some RSE estimates. See the Relative standard error section for further information in relation to survey datasets.
Some datasets have an additional quality measure applied to tables with too many small cells. Sparsity does not apply to Census TableBuilder datasets. Small cells may not be reliable, as not enough records have been selected in the sample to accurately estimate the population for that combination of characteristics.
If a table has too many small cells the table may not be returned when the user clicks the Retrieve Data button. In this example table showing Country of Birth (using the most detailed level of this hierarchical variable) by Social marital status, an exclamation mark symbol and message displays at the top of the table when the user clicks on Retrieve Data.
The following messages display below the table, indicating that the table is too sparse and has been suppressed.
To continue working, users can try creating a variant of the original table. For example, removing a Not applicable category may reduce the number of small cells in the table and allow the data to be retrieved. Possible methods to reduce the size of the table include:
- removing one or more variables
- removing one or more variable categories
- using a higher (less detailed) level of a hierarchical variable
- creating a custom range to combine less relevant categories.
For this table, the Marital status categories of Not applicable and Not married were removed. Then the full Country of birth variable was replaced with all categories within Oceania and Antarctica, still at the most detailed level of this hierarchical variable. This table was able to be returned using Retrieve Data.
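The effect of combining categories on the number of small cells can be sketched as follows. The threshold of 5 for a small cell, the example counts and the function names are hypothetical. The rule TableBuilder uses to decide that a table is too sparse is not given in this section.

```python
def small_cell_share(table, small=5):
    """Share of cells whose value is below the (assumed) small-cell threshold."""
    cells = [value for row in table for value in row]
    return sum(1 for value in cells if value < small) / len(cells)


def combine_rows(table, row_indexes):
    """Merge the given rows into one row, similar to combining categories in a custom range."""
    combined = [sum(column) for column in zip(*(table[i] for i in row_indexes))]
    kept = [row for i, row in enumerate(table) if i not in row_indexes]
    return kept + [combined]


# Hypothetical counts: one large category and three sparse ones.
table = [
    [120, 85],
    [3, 1],
    [2, 0],
    [4, 2],
]

before = small_cell_share(table)
after = small_cell_share(combine_rows(table, {1, 2, 3}))
print(before, after)
# 0.75 0.25
```

In this sketch, merging the three sparse categories reduces the share of small cells from 6 of 8 to 1 of 4, which is the kind of change that may allow a table to be retrieved.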