Introduced random error
Under the Census and Statistics Act it is an offence to release any information collected under the Act that is likely to enable identification of any particular individual or organisation. Introduced random error is used to ensure that no data are released which could risk the identification of individuals in the statistics.
Many classifications used in ABS statistics have an uneven distribution of data throughout their categories. For example, the number of people who are Anglican or born in Italy is quite large (3,881,162 and 218,718 respectively in 2001), while the number of people who are Buddhist or born in Chile is relatively small (357,813 and 23,420 respectively in 2001). When religion is cross-classified with country of birth, the number in the table cell who are Anglican and who were born in Italy could be small, and the number of Buddhists born in Chile even smaller. These small numbers increase the risk of identifying individuals in the statistics.
Even when variables are more evenly distributed in the classifications, the problem still occurs. The more detailed the classifications, and the more of them that are applied in constructing a table, the greater the incidence of very small cells.
Care is taken in the specification of tables to minimise the risk of identifying individuals. In addition, a technique has been developed to randomly adjust cell values. Random adjustment of the data is considered to be the most satisfactory technique for avoiding the release of identifiable Census data. When the technique is applied, all cells are slightly adjusted to prevent any identifiable data being exposed. These adjustments result in small introduced random errors. However, the information value of the table as a whole is not impaired. The technique allows very large tables, for which there is a strong client demand, to be produced even though they contain a number of very small cells.
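The general idea of randomly adjusting cell values can be illustrated with a minimal sketch. It is not the ABS method, which is not described here: the function name perturb_table, the max_shift parameter, the uniform integer draw and the cell counts are all assumptions made up for illustration.

```python
import random

def perturb_table(cells, max_shift=2, seed=None):
    """Return a copy of `cells` with a small random integer added to each count.

    Illustrative only: `max_shift` and the uniform draw are assumptions,
    not the parameters or the method actually used for Census tables.
    """
    rng = random.Random(seed)
    adjusted = {}
    for key, count in cells.items():
        shift = rng.randint(-max_shift, max_shift)
        adjusted[key] = max(0, count + shift)  # a count cannot go below zero
    return adjusted

# Made-up counts for a two-way cross-classification (hypothetical categories).
table = {
    ("category A", "group X"): 3,
    ("category A", "group Y"): 12,
    ("category B", "group X"): 1,
    ("category B", "group Y"): 4850,
}
adjusted = perturb_table(table, max_shift=2, seed=1)

for key in table:
    print(key, table[key], "->", adjusted[key])
```

In this sketch every cell moves by at most max_shift, so a large cell is barely changed in relative terms while a very small cell may change by a large fraction of its value.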
The totals and subtotals in summary tables are also subjected to small adjustments. These adjustments of totals and subtotals include modifications to preserve the additivity within tables. Although each table of this kind is internally consistent, comparisons between tables which contain similar data may show some minor discrepancies. In addition, the tables at different geographic levels are adjusted independently, so a table at a higher geographic level may not equal the sum of the tables for its component geographic units.
It is not possible to determine which individual figures have been affected by random error adjustments, but the small variance which may be associated with derived totals can, for the most part, be ignored.
No reliance should be placed on small cells, as they are affected by random adjustment as well as by respondent and processing errors.
Many different classifications are used in Census tables and the tables are produced for a variety of geographical areas. The effect of the introduced random error is minimised if the statistic required is taken directly from a tabulation rather than obtained by aggregating more finely classified data. Similarly, rather than aggregating data from small areas to obtain statistics about a larger standard geographic area, published data for the larger area should be used wherever possible.
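The reason for preferring a directly published figure can be shown with a small simulation. This is a sketch under stated assumptions: each figure is taken to receive an independent uniform integer adjustment of at most 2, and the names adjust and simulate, the number of small areas and the counts are all hypothetical. It compares the error in a total that is adjusted once with the error in a total rebuilt by summing many independently adjusted small-area figures.

```python
import random
import statistics

def adjust(count, rng, max_shift=2):
    # Hypothetical adjustment: a uniform integer shift, clamped at zero.
    return max(0, count + rng.randint(-max_shift, max_shift))

def simulate(n_small_areas=50, count_per_area=100, trials=2000, seed=0):
    """Return the spread of error for a direct total and an aggregated total."""
    rng = random.Random(seed)
    true_total = n_small_areas * count_per_area
    direct_errors, aggregated_errors = [], []
    for _ in range(trials):
        # Published figure for the larger area: adjusted once.
        direct = adjust(true_total, rng)
        # Figure rebuilt by summing independently adjusted small areas.
        aggregated = sum(adjust(count_per_area, rng) for _ in range(n_small_areas))
        direct_errors.append(direct - true_total)
        aggregated_errors.append(aggregated - true_total)
    return statistics.pstdev(direct_errors), statistics.pstdev(aggregated_errors)

direct_sd, aggregated_sd = simulate()
print("spread of error, direct total:    ", direct_sd)
print("spread of error, aggregated total:", aggregated_sd)
```

Under these assumptions the errors in the component figures accumulate, so the aggregated total is noticeably less precise than the figure adjusted once, and the gap widens as more cells are summed.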
When calculating proportions, percentages or ratios from cross-classified or small area tables, the random error introduced can be ignored except when very small cells are involved, in which case the impact on percentages and ratios can be significant.
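A short calculation shows why very small cells are the exception. The adjustment size of 1 and the cell counts below are made up for illustration, and percent_change is a hypothetical helper, not part of any ABS tool.

```python
def percent_change(true_count, shift):
    """Relative change in a cell, in percent, caused by an adjustment of `shift`."""
    return 100.0 * shift / true_count

small = percent_change(4, 1)     # adjustment of 1 on a cell of 4
large = percent_change(4000, 1)  # same adjustment on a cell of 4,000

print(small)  # 25.0
print(large)  # 0.025
```

The same adjustment changes the small cell by a quarter of its value but leaves the large cell essentially unchanged, so percentages and ratios built on very small cells should be treated with caution.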
See also Confidentiality.
This page last updated 20 May 2011