This involves slightly altering cells in a table to make them all divisible by the same number. (Common numbers used for rounding are 3, 5 or 10.) Data rounding may be random or controlled. It prevents the original data values from being known with certainty while ensuring the usefulness of the data is not significantly affected.
For more information, see Part 4: Managing the risk of disclosure: Treating aggregate data.
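As an illustration of random rounding, the following is a minimal sketch rather than any agency's actual method. It assumes an unbiased scheme in which a count is rounded up to the next multiple of the base with probability proportional to its remainder. The function name `random_round` and its parameters are hypothetical. Controlled rounding, which also keeps table totals consistent, is not shown.

```python
import random

def random_round(count, base=3, rng=random):
    """Randomly round a non-negative count to a multiple of `base`.

    Assumed scheme: round up with probability remainder/base, otherwise
    round down, so the expected value of the result equals the count.
    """
    lower = (count // base) * base
    remainder = count - lower
    if remainder == 0:
        return count  # already a multiple of the base
    return lower + base if rng.random() < remainder / base else lower

# Hypothetical cell counts; each result is a multiple of 3.
rounded = [random_round(c, base=3) for c in [1, 4, 7, 12]]
```

A user who sees a rounded value of 3 cannot tell whether the true count was 1, 2, 3, 4 or 5.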
A process of moving the values of one or more variables in a record of a Unit Record File to where that record will not pose a disclosure risk.
For more information, see Part 5: Managing the risk of disclosure: Treating microdata.
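One simple way to picture this is a swap of a single variable between a risky record and another record. The sketch below is an assumption made for illustration: the partner record is chosen at random, whereas real swapping methods usually select partners that are similar on other variables. The name `swap_variable` and the field names in the usage note are hypothetical.

```python
import random

def swap_variable(records, risky_index, variable, rng=random):
    """Return a copy of `records` in which `variable` has been swapped
    between the record at `risky_index` and one other record.

    Assumption: the partner is picked uniformly at random.
    """
    swapped = [dict(r) for r in records]  # leave the input untouched
    others = [i for i in range(len(swapped)) if i != risky_index]
    if not others:
        return swapped  # nothing to swap with
    partner = rng.choice(others)
    swapped[risky_index][variable], swapped[partner][variable] = (
        swapped[partner][variable],
        swapped[risky_index][variable],
    )
    return swapped
```

For example, `swap_variable(records, 0, "postcode")` moves the first record's postcode onto another record. The overall distribution of postcodes in the file is unchanged, but the link between the risky record and its true location is broken.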
Differencing, or differencing attack
This is where someone with access to multiple tables can deduce the true values of cells that had been modified or suppressed. The individual tables may be non-disclosive, but when the tables are compared, the difference between cells across the tables may be disclosive.
For example, if a user accessed a table with information on 20–25 year olds and then accessed a subsequent table with information on 20–24 year olds, the difference between the two tables will reveal information about 25 year olds only.
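The age-group example can be sketched in a few lines. The counts and category labels below are invented purely for illustration, and `difference_tables` is a hypothetical helper.

```python
# Hypothetical published counts for two overlapping age ranges.
table_20_to_25 = {"employed": 130, "unemployed": 17}
table_20_to_24 = {"employed": 111, "unemployed": 16}

def difference_tables(wider, narrower):
    """Subtract each cell of `narrower` from the matching cell of `wider`."""
    return {cell: wider[cell] - narrower[cell] for cell in wider}

age_25_only = difference_tables(table_20_to_25, table_20_to_24)
print(age_25_only)  # {'employed': 19, 'unemployed': 1}
```

Neither table contains a small cell, yet the difference isolates a single unemployed 25 year old.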
This occurs when the data includes an identifier (e.g. name or address) that can be used, without any additional information, to establish the identity of a person, group or organisation.
A breach of confidentiality, where a person, group or organisation is identified or has previously unknown characteristics (attributes) associated with them as a result of releasing data.
The process of limiting the risk of an individual or organisation being directly or indirectly identified. This can be via statistical (i.e. data focussed) or non-statistical (i.e. data context-focussed) techniques or processes.
Disclosure risk management
In the context of confidentiality, this involves determining whether released datasets (or sections of released datasets) constitute a risk of disclosure or re-identification, and then putting in place controlling mechanisms to mitigate those risks. The Five Safes Framework provides a way of assessing risk within the constraints provided by policies and legislation.
For more information, see Part 3: Managing the risk of disclosure: the Five Safes Framework.
Five Safes Framework
A multi-dimensional approach to managing disclosure risk, consisting of Safe People, Safe Projects, Safe Settings, Safe Data and Safe Outputs. Each safe is considered both individually and in combination to determine disclosure risks and to put in place mitigation strategies for releasing and accessing data.
For more information, see Part 3: Managing the risk of disclosure: the Five Safes Framework.
Also called the threshold rule, this sets a particular value for the minimum number of unweighted contributors (e.g. people, households or businesses) to any cell in the table. Cells with very few contributors ('small cells') may pose a disclosure risk. Common threshold values are 3, 5 and 10. If a cell fails this rule, further investigation or action is needed to ensure the cell is adequately protected.
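A check of this kind might look like the sketch below. It assumes the table is held as a mapping from cell label to unweighted contributor count, and that empty cells are not treated as failing; both are assumptions for illustration, and a data custodian may decide otherwise. The name `failing_cells` is hypothetical.

```python
def failing_cells(table, threshold=5):
    """Return the labels of cells with fewer than `threshold` contributors.

    Assumption: cells with zero contributors are not flagged.
    """
    return [label for label, count in table.items() if 0 < count < threshold]

# Hypothetical unweighted counts by region.
counts = {"Region A": 12, "Region B": 2, "Region C": 0, "Region D": 5}
small = failing_cells(counts, threshold=5)  # cells needing further action
```

Flagged cells are candidates for further investigation or treatment, not an automatic verdict.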
Datasets that contain more than one level. For example, a dataset containing unit records with information about individual people (e.g. personal income) may also contain information about the families these people are part of (e.g. household income).
Data that includes information that refers directly to an individual (e.g. name or address, ABN, Medicare number).
An identifier, or direct identifier, is information that directly establishes the identity of an individual or organisation. The following are examples of identifiers: name, address, driver's licence number, Medicare number and Australian Business Number.
This occurs when the identity of an individual, group or organisation is disclosed due to a unique combination of characteristics (that are not direct identifiers) in a dataset. For example, a famous individual may be identifiable on the basis of their age, sex, occupation and income.
This is where a user compares records from one dataset with records from another in an attempt to find records that have corresponding information, such that it may be concluded that the two records belong to the same individual. Where this is done in an attempt to re-identify that individual, there is a clear breach of the Privacy Act and other legislation governing data access.
See aggregate data.
Datasets of unit records where each record contains information about a person or organisation. This information can include individual responses to questions on surveys or administrative forms.
Situations where data is made available with no restriction on access or use (excluding possible copyright or licensing requirements). In terms of the Five Safes Framework, the only control is on Safe Data. Thus data on data.gov.au would be considered open data. On the other hand, data underlying the ABS TableBuilder product would not be (as there is a Safe Setting control); however once tables are produced they are considered open data.
An unusual record that, because it has an extreme value for one or more data items, stands out from the rest of the population or sample.
A statistical disclosure control rule that prevents any user from estimating the value of a cell contributor to within P% (where P is defined by the data custodian).
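The definition above does not give a formula. One common formulation in the statistical disclosure control literature, assumed here, treats the second-largest contributor as the best-placed attacker: they subtract their own value from the cell total to estimate the largest contributor, and the cell fails if the remaining contributions add up to less than P% of the largest value. The function name and the figures in the usage note are hypothetical.

```python
def fails_p_percent_rule(contributions, p):
    """Return True if a magnitude cell fails the assumed P% rule.

    Assumed formulation: the cell fails when the sum of all contributions
    other than the two largest is less than p percent of the largest.
    Contributions are assumed to be non-negative.
    """
    if not contributions:
        return False  # an empty cell reveals nothing about a contributor
    ordered = sorted(contributions, reverse=True)
    largest = ordered[0]
    remainder = sum(ordered[2:])  # everything except the top two
    # Integer-friendly comparison, equivalent to remainder < (p / 100) * largest.
    return remainder * 100 < p * largest
```

With P set to 10, a cell with contributions of 100, 50, 5 and 3 fails, because the remaining 8 is less than 10% of 100. A cell with contributions of 100, 50, 20 and 10 passes.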
According to the Privacy Act 1988, personal information is 'information or an opinion about an identified individual, or an individual who is reasonably identifiable:
(a) whether the information or opinion is true or not true; and
(b) whether the information or opinion is recorded in a material form or not’.
In other words, personal information is information that identifies, or could identify, a person. This can include not only names and addresses, but also medical records, bank account details, photos, videos and even information about what a person likes and where they work. Information can still be personal without having a name attached to it. For example, in some cases, date of birth and postcode may be enough to identify someone.
A statistical disclosure control technique used for count or magnitude data. Perturbation is a data modification method that involves changing the data slightly to reduce the risk of disclosure while retaining as much data content and structure as possible. Perturbation techniques include data rounding.
For more about statistical disclosure control techniques, see Part 4: Managing the risk of disclosure: Treating aggregate data.
Although not specifically defined in the Privacy Act, privacy is generally considered to be an individual’s right to have their personal information kept confidential unless informed consent has been given to release the information, or a legal authority exists. This is in accordance with the requirements of the Privacy Act 1988.
Re-identification is the act of determining the identity of a person or organisation using publicly or privately held information about that individual or organisation.
Remote analysis facility
Remote access facilities are used by agencies around the world. These facilities enable approved researchers to submit data queries from their desktops through a secure online interface. Requests are run against a Confidentialised Unit Record File (CURF) that is securely stored inside the data custodian's computing environment.
Rare characteristics or attributes in the data that can pose an identification risk, depending on how extraordinary or noticeable they are. These might include unusual jobs, very large families or very high income. Remarkable characteristics can lead to re-identification of individuals, households or organisations.
See data provider.
Information that is publicly or privately known about a respondent. This information may be used to breach confidentiality.
See data rounding.
One of the Five Safes, Safe Data poses the question: has appropriate and sufficient protection been applied to the data? At a minimum, direct identifiers such as name and address must be removed or encoded. Further statistical disclosure control may also need to be applied depending on the context in which data is released.
One of the Five Safes, Safe Outputs poses the question: are the statistical results non-disclosive? This is the final check, which aims for negligible risk of disclosure. All data made available outside of the data custodian’s IT environment must be checked for disclosure. For example, statistical experts may check all outputs for inadvertent disclosure before the data leave a secure data centre.
One of the Five Safes, Safe People poses the question: is the researcher appropriately authorised to access and use the data? By placing controls on the way data is accessed, the data custodian invests some responsibility for preventing re-identification in the researcher. The general rule is that as the detail in the data increases, so should the level of user authorisation required.
One of the Five Safes, Safe Projects poses the question: is the data to be used for an appropriate purpose? Before users can access detailed microdata, they may need to demonstrate to the data custodian that their project has a valid research aim, public benefit or statistical purpose. Again, the requirements under Safe Projects will depend on the context in which the data is accessed.
One of the Five Safes, Safe Settings poses the question: does the access environment prevent unauthorised use? The environment here can be considered in terms of both the IT and physical environment. In some data access contexts, such as Open Data, Safe Settings are not applicable. At the other end of the spectrum, sensitive information is accessed through secure research centres.
Secure Research Centre
See data laboratory.
Safe storage of, and access to, held data. Security covers both IT security and the physical security of buildings.
Sensitive information (data)
Sensitive information is a subset of personal information that the Privacy Act affords greater protection, in particular because its disclosure can lead to worse consequences for a re-identified individual. The Office of the Australian Information Commissioner lists a number of characteristics about an individual that are defined as sensitive.
Community expectations and ethical considerations, however, may not treat this list as exhaustive (e.g. financial information is not on it). Indeed, it could be argued that all personal information is potentially sensitive, depending on the context and the individual concerned. In addition, businesses may consider much of their information to be sensitive.
Where a user inadvertently recognises an individual or organisation in a dataset, without deliberately attempting to identify them. Data custodians should expect that this may occur in the normal course of data analysis. Generally this risk only applies to microdata, but it could also apply to aggregate data if the outputs have not been checked rigorously enough (see Safe Outputs).
Purposes which support the collection, storage, compilation, analysis and transformation of data for the production of statistical outputs, the dissemination of those outputs and the information describing them. Statistical purposes may be distinguished from administrative, regulatory, compliance, law enforcement or other purposes that affect the rights, privileges or benefits of particular individuals or organisations.
This means not releasing information deemed to be a disclosure risk. Data suppression involves removing specific values from a table so that people and organisations cannot be re-identified from the released data. For more information, see Part 4: Managing the risk of disclosure: Treating aggregate data.
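A minimal sketch of suppression applied to a table of counts follows. It assumes cells are selected for suppression by a simple threshold on the count and that zero cells are left as they are; the function name and the `marker` parameter are hypothetical.

```python
def suppress_small_cells(table, threshold=5, marker=None):
    """Return a copy of `table` with small non-zero cells replaced by `marker`.

    Assumption: a cell is suppressed when 0 < count < threshold.
    """
    return {
        label: (marker if 0 < count < threshold else count)
        for label, count in table.items()
    }

# Hypothetical counts by occupation.
released = suppress_small_cells({"Nurse": 42, "Judge": 2, "Pilot": 0}, threshold=5)
```

This covers only the first step. If row or column totals are also published, a suppressed value can often be recovered by subtraction, so further cells usually need to be suppressed as well.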
See aggregate data.
See frequency rule.
The situation where an individual can be distinguished from all other members in a population or sample. The existence of uniqueness is determined by the size of the population or sample, the degree to which it is segmented (e.g. by geographic information), and the number and detail of characteristics provided for each unit in the dataset.
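Sample uniqueness can be checked by counting how many records share each combination of characteristics. The sketch below assumes records are held as dictionaries; the function name and the field names in the usage note are hypothetical.

```python
from collections import Counter

def unique_records(records, keys):
    """Return the records whose combination of values on `keys`
    appears exactly once in `records`."""
    def combo(record):
        return tuple(record[k] for k in keys)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] == 1]
```

Calling `unique_records(records, ["age", "sex", "postcode"])` on a file will typically return more records than `unique_records(records, ["sex"])`, because adding characteristics or finer geographic detail segments the file further. A record that is unique in a sample is not necessarily unique in the population.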
Unit record data
1160.0 - ABS Confidentiality Series, Aug 2017
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 23/08/2017 First Issue