Confidentiality and protecting your data
The Australian Bureau of Statistics (ABS) is committed to protecting the personal information it collects. The ABS has strong legislative protections that safeguard the privacy of an individual's information. It also has a proud 100-year history of maintaining community trust in the way it collects, uses, discloses and stores your personal information collected in the Census.
What does confidentiality mean?
Confidentiality means ensuring that the personal information the ABS has collected is kept secret or private. The ABS uses a number of processes and methods to ensure that the information it releases is consistent with our privacy obligations. The ABS never has, and never will, release identifiable Census data. Key measures to safeguard information include strong encryption of data, access restricted on a need-to-know basis, and monitoring of staff access to data, including regular audits.
In accordance with the Census and Statistics Act 1905 all Census data, including QuickStats, Community Profiles, DataPacks and TableBuilder, is subjected to a confidentiality process called perturbation before release. This includes the information found in Reflecting Australia and all publications that use Census data. This confidentiality process is undertaken to avoid releasing information that may allow for the identification of particular individuals, families, households, or businesses.
The ABS has developed a technique that adjusts counts to maintain the confidentiality of information. This technique, known as perturbation, is applied to all counts, including totals, to prevent any identifiable data about individuals from being released. The adjustments introduce small random errors and can mean that the rows and columns of a table do not sum to the displayed totals. However, the technique is applied in a controlled manner that ensures the information value of the table as a whole is not significantly affected. Further information on the methodology of perturbation can be found in the paper Confidentialising Tabular Output to Protect Against Differencing.
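To make the idea of consistent, count-level perturbation concrete, here is a minimal sketch loosely modelled on a cell-key approach. It assumes that every record receives a fixed random key once, that a cell's key is the sum of its records' keys modulo KEY_RANGE, and that this key maps to an adjustment of at most MAX_ADJUST. All names, parameters and the key-to-adjustment mapping are hypothetical illustrations, not the ABS's actual perturbation table or settings. Because each cell and the grand total are perturbed separately, the cells need not add up to the displayed total.

```python
import random
from collections import defaultdict

# Hypothetical parameters for illustration only; not the ABS's values.
MAX_ADJUST = 2   # largest absolute change applied to any count
KEY_RANGE = 256  # record keys and cell keys lie in [0, KEY_RANGE)


def assign_record_keys(n_records, seed=0):
    """Give each record a fixed random key once, before any table is produced."""
    rng = random.Random(seed)
    return [rng.randrange(KEY_RANGE) for _ in range(n_records)]


def perturb_count(record_ids, record_keys):
    """Return a perturbed count for the records that fall in one cell."""
    count = len(record_ids)
    if count == 0:
        return 0
    # The cell key depends only on which records are in the cell.
    cell_key = sum(record_keys[i] for i in record_ids) % KEY_RANGE
    # Deterministically map the cell key to an adjustment in [-MAX_ADJUST, MAX_ADJUST].
    adjustment = cell_key % (2 * MAX_ADJUST + 1) - MAX_ADJUST
    return max(count + adjustment, 0)  # never publish a negative count


def perturbed_table(records, record_keys):
    """records[i] is the (row, column) category of record i.
    Every cell, and the grand total, is perturbed independently."""
    cells = defaultdict(list)
    for i, cell in enumerate(records):
        cells[cell].append(i)
    table = {cell: perturb_count(ids, record_keys) for cell, ids in cells.items()}
    total = perturb_count(list(range(len(records))), record_keys)
    return table, total


rng = random.Random(1)
records = [(rng.choice(["Male", "Female"]), rng.choice(["15-24", "25-34", "35-44"]))
           for _ in range(500)]
keys = assign_record_keys(len(records))
table, total = perturbed_table(records, keys)
print(sum(table.values()), total)  # the sum of the cells need not equal the total
```

Every published figure stays within MAX_ADJUST of its true value, yet nothing forces the perturbed cells to agree with the separately perturbed total.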
Perturbation can be a source of frustration to users because rows and columns do not add to totals, but the technique is needed to protect personal information. Most tables reporting basic statistics will not show significant discrepancies due to perturbation. Discrepancies may become more noticeable as a table becomes more complex, but the need for perturbation remains and it will continue to be used in the release of 2016 Census data.
For 2006 and 2011 Census data, an additional 'additivity step' made further small adjustments to each table so that rows and columns added to totals. This extra adjustment meant that comparisons between tables containing similar data items showed minor discrepancies. In addition, because tables at different geographic levels were adjusted independently, a table at a higher geographic level might not equal the sum of the tables for its component geographic units. Because of these inconsistencies, the additivity step has been removed for 2016 Census data. For consistency and interpretability, the 2006 and 2011 data that appear in the following 2016 products have been recalculated without additivity: the Time Series Community Profile, DataPacks and the time series comparisons in QuickStats. ABS survey data outputs no longer implement this additivity step.
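This page does not describe how the additivity step was carried out. As a hypothetical stand-in, the sketch below uses proportional scaling with largest-remainder rounding (a generic technique, not necessarily the ABS method) to force a row of perturbed cells to sum to its perturbed total. It shows how such a step can give the same perturbed cell different final values in two differently structured tables, which is the kind of cross-table discrepancy described above. The function name and the example counts are illustrative assumptions.

```python
import math
from fractions import Fraction


def make_additive(cells, total):
    """Force non-negative integer cells to sum exactly to total.

    Hypothetical method: scale each cell proportionally (using exact
    fractions), round down, then give the leftover units to the cells
    with the largest fractional parts (largest-remainder rounding).
    """
    current = sum(cells)
    if current == 0:
        raise ValueError("cannot rescale a row whose cells are all zero")
    scaled = [Fraction(c * total, current) for c in cells]
    result = [math.floor(s) for s in scaled]
    shortfall = total - sum(result)
    by_remainder = sorted(range(len(cells)),
                          key=lambda i: scaled[i] - result[i], reverse=True)
    for i in by_remainder[:shortfall]:
        result[i] += 1
    return result


# The same perturbed cell (41) appears in two differently structured tables.
table_a = make_additive([41, 37, 25], total=100)
table_b = make_additive([41, 60], total=104)
print(table_a, table_b)  # [40, 36, 24] [42, 62]
```

Each row now adds to its total, but the cell that was 41 in both tables ends up as 40 in one and 42 in the other.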
Interpreting the data
Perturbation has very little impact on Census data.
This is because it is applied consistently, so the same information always receives the same adjustment, and the adjustments are very small. For example, the count of males aged 15-24 in New South Wales will have the same perturbation applied regardless of how a table containing this data is constructed. However, in rare cases the count in QuickStats may differ marginally from the count in Community Profiles and DataPacks, because the data in these products are recoded for presentation purposes.
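The consistency property can be illustrated with the same hypothetical cell-key-style scheme used as an assumption here: because a cell's adjustment depends only on which records fall in it, the group "males aged 15-24 in New South Wales" gets an identical perturbed count whichever way the table is laid out. The parameters, variable names and synthetic records are illustrative, not ABS values or data.

```python
import random

# Hypothetical parameters for illustration only; not the ABS's values.
MAX_ADJUST = 2
KEY_RANGE = 256


def perturb_count(record_ids, record_keys):
    """Perturb a count using only the keys of the records in the cell."""
    count = len(record_ids)
    if count == 0:
        return 0
    cell_key = sum(record_keys[i] for i in record_ids) % KEY_RANGE
    adjustment = cell_key % (2 * MAX_ADJUST + 1) - MAX_ADJUST
    return max(count + adjustment, 0)


rng = random.Random(7)
people = [{"sex": rng.choice(["M", "F"]),
           "age": rng.choice(["15-24", "25-34"]),
           "state": rng.choice(["NSW", "Vic"])} for _ in range(400)]
# Each person's key is fixed once and reused for every table.
record_keys = [rng.randrange(KEY_RANGE) for _ in people]


def cross_tab(dims):
    """Build a table over the given variables and perturb every cell."""
    cells = {}
    for i, person in enumerate(people):
        cells.setdefault(tuple(person[d] for d in dims), []).append(i)
    return {cell: perturb_count(ids, record_keys) for cell, ids in cells.items()}


by_state_age_sex = cross_tab(["state", "age", "sex"])
by_sex_state_age = cross_tab(["sex", "state", "age"])
# The same group of people gets the same perturbed count in both layouts.
print(by_state_age_sex[("NSW", "15-24", "M")] == by_sex_state_age[("M", "NSW", "15-24")])  # True
```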
The best number to use will always be the count that most directly corresponds to the information you require. Deriving information by summing across a row or down a column is not recommended, because each cell in the sum carries its own perturbation, which increases the chance that perturbation affects the result. For instance, if you are interested in the count of males aged 15-24 in New South Wales, the total count for that group is the best figure to use, not the sum of the single-year-of-age counts for males in New South Wales.
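A rough simulation shows why the directly published total is preferable to a sum of single-year cells. It assumes, hypothetically, that each cell's adjustment is an independent integer drawn uniformly from -2 to 2; the real distribution of ABS adjustments is not given here. Under that assumption one perturbed total has a standard deviation of about 1.4, while the sum of ten perturbed cells (ages 15 to 24) has one of about 4.5.

```python
import random
import statistics

MAX_ADJUST = 2  # hypothetical maximum adjustment per cell


def adjustment(rng):
    """Hypothetical: each cell's adjustment is a uniform integer in [-2, 2]."""
    return rng.randint(-MAX_ADJUST, MAX_ADJUST)


def simulate(n_trials=20_000, n_single_years=10, seed=0):
    """Compare the error spread of one perturbed total with a sum of perturbed cells."""
    rng = random.Random(seed)
    direct_errors = []
    summed_errors = []
    for _ in range(n_trials):
        # Error in the published 15-24 total: a single perturbed cell.
        direct_errors.append(adjustment(rng))
        # Error from adding ten separately perturbed single-year cells.
        summed_errors.append(sum(adjustment(rng) for _ in range(n_single_years)))
    return statistics.pstdev(direct_errors), statistics.pstdev(summed_errors)


direct_sd, summed_sd = simulate()
print(f"direct total: sd {direct_sd:.2f}; sum of 10 cells: sd {summed_sd:.2f}")
```

Because independent errors partly cancel, the typical error of the sum grows with the square root of the number of cells, but in the worst case it can reach ten times the maximum adjustment for a single published total.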
When calculating proportions, percentages or ratios from cross-classified or small area tables, the random adjustments introduced by perturbation can be ignored, except where very small counts are involved; in that case the impact on percentages and ratios can be relatively significant. No reliance should be placed on small counts (that is, counts of 20 or less). Aside from the effects of the confidentiality process, Census non-response and possible respondent and processing errors have the greatest relative impact on these small counts.
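The arithmetic behind this advice is simple: the same small adjustment matters far more to a small numerator. The sketch assumes a hypothetical maximum adjustment of 2 and, for simplicity, treats the denominator as exact; the threshold of 20 comes from the guidance above.

```python
MAX_ADJUST = 2    # hypothetical largest adjustment to a single count
SMALL_COUNT = 20  # counts of 20 or less should not be relied on


def percentage_range(count, denominator, max_adjust=MAX_ADJUST):
    """Range of percentages if the numerator could be off by up to max_adjust.
    The denominator is treated as exact to keep the example simple."""
    low = max(count - max_adjust, 0) / denominator * 100
    high = (count + max_adjust) / denominator * 100
    return low, high


def reliable(count):
    """Apply the small-count rule: only counts above 20 should be relied on."""
    return count > SMALL_COUNT


for count, denom in [(5, 50), (500, 5000)]:
    low, high = percentage_range(count, denom)
    print(f"{count}/{denom}: {low:.2f}% to {high:.2f}%, reliable={reliable(count)}")
# 5/50: 6.00% to 14.00%, reliable=False
# 500/5000: 9.96% to 10.04%, reliable=True
```

Both examples describe 10 per cent, but the small count could plausibly represent anything from 6 to 14 per cent.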
With the removal of the additivity step for 2016 Census data, comparisons over time should be made using the 2016 time series products where possible. Comparing 2011 QuickStats with 2016 QuickStats compares a product in which additivity was applied with one in which it was not. Whilst this will not significantly affect the differences observed over time, the preferred approach is to use the 2016 time series products. The 2011 Census data products will not be re-released with the additivity step removed.
Users should take care when interpreting medians calculated from Census data in TableBuilder. The removal of additivity has affected median figures for small areas and small sub-populations. The ABS has manually recalculated the medians displayed in QuickStats, Community Profiles and DataPacks using the same methodology as for 2006 and 2011 Census output, so comparisons over time can be reliably made. This methodology could not be implemented automatically in output systems, so medians extracted from TableBuilder may differ slightly from those in the other data products. As a general rule, users should ensure that the population a median is calculated from has a count of at least 100. This threshold needs to be higher for Census topics with a detailed classification, such as age in single years, or rent and mortgage repayments in single dollars.
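The sensitivity of medians in small populations can be shown with a plain median of a discrete frequency distribution. This is a generic median calculation, not the ABS's median methodology, which is not described here. The function name, the example age counts and the adjustments applied to them are hypothetical; only the threshold of 100 comes from the guidance above.

```python
def median_from_frequencies(freqs, min_population=100):
    """Median of a discrete distribution given as {value: count}.

    Returns (median, reliable), where reliable is False when the underlying
    population is below min_population (100 is the general rule of thumb;
    detailed classifications need a higher threshold).
    """
    items = sorted((v, c) for v, c in freqs.items() if c > 0)
    n = sum(c for _, c in items)
    if n == 0:
        raise ValueError("empty distribution")

    def value_at(position):
        """Value held by the person at a 0-based position in sorted order."""
        running = 0
        for value, count in items:
            running += count
            if position < running:
                return value
        raise IndexError(position)

    if n % 2:
        median = value_at(n // 2)
    else:
        median = (value_at(n // 2 - 1) + value_at(n // 2)) / 2
    return median, n >= min_population


true_counts = {30: 4, 31: 3, 32: 5, 33: 2}       # hypothetical ages in single years
perturbed_counts = {30: 6, 31: 1, 32: 5, 33: 3}  # each count moved by at most 2
print(median_from_frequencies(true_counts))       # (31.5, False)
print(median_from_frequencies(perturbed_counts))  # (32, False)
```

With only 14 people, adjustments of one or two per age shift the median, which is why the population should be well above 100 before a median is relied on.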