Five Safes framework

Data confidentiality guide

Using safe people, projects, settings, data and outputs to balance disclosure risk and utility, with ABS Five Safes examples

Released

8/11/2021

Balancing disclosure risk and data utility

A key challenge for data custodians is to provide data with maximum utility for users but still maintain the confidentiality of the information. Every data release carries some risk of disclosure, so the benefits of each release (its utility or usefulness for research and statistical purposes) must substantially outweigh its risks and be clearly understood. This balancing of risk and utility is something everyone does on a daily basis (for example, when they choose to drive a car). Similarly, data custodians need to approach disclosure risk by managing it, rather than trying to eliminate it.

Confidentiality is breached when a person, group or organisation is re-identified through a data release, or when information can be attributed to them. The likelihood of this happening, or risk of disclosure, is not easily determined. This definition implicitly assumes that the consequences of disclosure are always damaging (to some extent) to the individual or organisation. It is difficult to ascertain the degree of damage, mostly because people differ in the importance they place on information. What may be considered highly confidential by one person is of no consequence to another. The ABS assumes all information it collects to be potentially sensitive and manages it accordingly.

Managing disclosure risk becomes a question of assessing not only the data itself, but also the context in which the data is released. Once the context is clearly understood, it is much easier to determine how to protect against the threat of disclosure. The Five Safes framework provides a structure for assessing and managing disclosure risk that is appropriate to the intended data use.

This framework has been adopted by the ABS, several other Australian government agencies, and national statistical organisations such as the Office for National Statistics (UK) and Statistics New Zealand.

Five Safes framework

The Five Safes framework takes a multi-dimensional approach to managing disclosure risk. Each safe refers to an independent but related aspect of disclosure risk. The framework poses specific questions to help assess and describe each risk aspect (or safe) in a qualitative way. This allows data custodians to place appropriate controls, not just on the data itself, but on the manner in which data is accessed. The framework is designed to facilitate safe data release and prevent over-regulation.

The five elements of the framework are:

safe people
safe projects
safe settings
safe data
safe outputs

Safe people

Is the researcher appropriately authorised to access and use the data?

By placing controls on the way data is accessed, the data custodian invests some responsibility in the researcher for preventing re-identification. Usually, as the detail in the data increases, so should the level of user authorisation required.

Prerequisites for user authorisation usually include:

training in confidentiality and the conditions of data use
signing a legally binding undertaking to maintain data confidentiality

By definition, a safe people assessment would not be required for open data (data that is released into the public domain with no restriction on use).

Safe projects

Is the data to be used for an appropriate purpose?

Users wanting to access detailed microdata should be expected to explain the purpose of their project. For example, in order to access detailed microdata in the ABS DataLab, users must demonstrate to the ABS that their project has a statistical purpose and show it has:

a valid research aim
a public benefit
no capacity to be used for compliance or regulatory purposes

As with safe people, the need for a safe project assessment will depend on the context in which the data is accessed. It would not be required for open data.

Safe settings

Does the access environment prevent unauthorised use?

The environment here can be considered in terms of both the IT and the physical environment. In data access contexts such as open data, safe settings are not required. At the other end of the spectrum, however, sensitive data should only be accessed via secure research centres.

Secure research centres may have features such as:

a locked room requiring personal authentication
IT monitoring equipment
auditing and other supervision

Safe settings ensure that data access and use is occurring in a transparent way.

Safe data

Has appropriate and sufficient protection been applied to the data?

At a minimum the removal of direct identifiers (such as name and address) must be applied to data before it is released. Further statistical disclosure controls should also be applied, depending on how the data will be released. Table 1 shows some of the statistical factors that should be considered when assessing disclosure risk.

Table 1: Factors to consider when assessing disclosure risk
Factor	Effect on disclosure risk
Data age	Older data is generally less risky
Sample data (e.g. a survey)	Decreases risk
Population data (e.g. a census)	Increases risk
Administrative data	Increases risk
Longitudinal data	Increases risk
Hierarchical data	Increases risk
Sensitive data	Increases risk (sensitive data may be a more attractive target)
Data quality	Poor quality data may offer some protection
Microdata	Main risk: re-identification
Aggregate data	Main risks: attribute disclosure and disclosure from differencing
Key variables	The variables of most interest to users are usually the most disclosive

Source: UK Anonymisation Decision-making Framework
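
Disclosure from differencing, listed in Table 1 as a main risk for aggregate data, may be easier to see with a small worked example. The sketch below uses invented counts (not ABS data) and hypothetical function names. It assumes two published tables for the same area, one covering residents aged 15 and over and one covering residents aged 15 to 64. Neither table contains a small cell, but subtracting one from the other yields an implied table for residents aged 65 and over. The threshold of 1 is an assumption chosen for illustration only.

```python
# Invented counts for illustration only (not ABS data).
# Table A: residents aged 15 and over, by labour force status.
table_a = {"employed": 120, "unemployed": 15, "not in labour force": 40}
# Table B: residents aged 15 to 64 in the same area.
table_b = {"employed": 119, "unemployed": 15, "not in labour force": 28}


def difference_tables(larger, smaller):
    """Subtract two frequency tables cell by cell."""
    return {cell: larger[cell] - smaller[cell] for cell in larger}


def disclosive_cells(diff, threshold=1):
    """Return non-zero cells whose implied count is at or below threshold.

    The threshold is a hypothetical parameter, not a published rule.
    """
    return {cell: n for cell, n in diff.items() if 0 < n <= threshold}


# Implied table for residents aged 65 and over.
implied = difference_tables(table_a, table_b)
risky = disclosive_cells(implied)

print(implied)
print(risky)
```

Here the implied table contains a cell with a count of 1, so a single employed resident aged 65 or over could be singled out even though neither published table shows a small count. This is why aggregate tables generally need to be assessed together rather than one at a time.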

Safe outputs

Are the statistical results non-disclosive?

This is the final check on the information before it is made public, and it aims to reduce the risk of disclosure to a minimum. All data made available outside the data custodian's IT environment must be checked for disclosure. For example, in the ABS DataLab, statistical experts check all outputs for inadvertent disclosure before the data leaves the DataLab environment.
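
One common kind of output check is to flag table cells with very small counts for review. The sketch below is illustrative only: the source does not specify the rules used in the ABS DataLab, and the threshold, function name and counts here are all hypothetical.

```python
MIN_CELL_COUNT = 5  # hypothetical threshold, chosen only for illustration


def flag_small_cells(table, min_count=MIN_CELL_COUNT):
    """Return (row, column, count) for non-zero cells below min_count."""
    flagged = []
    for row_label, row in table.items():
        for col_label, count in row.items():
            if 0 < count < min_count:
                flagged.append((row_label, col_label, count))
    return flagged


# Invented output table (not ABS data): counts by region and status.
output_table = {
    "Region A": {"employed": 250, "unemployed": 31},
    "Region B": {"employed": 48, "unemployed": 2},
}

flagged = flag_small_cells(output_table)
print(flagged)
```

A check like this only identifies candidate cells. Deciding how to treat them, for example by combining categories or suppressing cells, remains a judgement for the person reviewing the output.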

Examples from the ABS

The Five Safes framework provides a mechanism for data custodians to take necessary and reasonable steps to manage disclosure risk in their data releases. It broadens the approach to data confidentiality by considering not just the treatment of data, but also the manner and context in which data is released.

The safes are assessed independently, but also considered as a whole. They can be thought of as a series of adjustable levers or controls to effectively manage risk and maximise the usefulness of a data release. The degree to which each safe is controlled is critical to assessing the disclosure risk. Tightly controlling all five will be counterproductive because the restrictions applied will not produce a corresponding benefit (useful data).

In practice, the safe data part of the Five Safes should be addressed after the other four are considered. This is because the degree of data treatment required will become evident once it is clear who will be able to access the data, under what conditions, in what circumstances and how the resulting data will be protected in order to be made public. The process is likely to be iterative, as data treatment with a view to maintaining utility may necessitate reassessing one or more of the other four safes.

Table 2 describes how the ABS applies the Five Safes framework to three different data access channels: open data, basic microdata files and detailed microdata files.

Table 2: Three examples of ABS application of the Five Safes framework
	Website or publication table (open data)	Basic microdata file (via direct download)	Detailed microdata file (via ABS DataLab)
Safe people	No control necessary. Anyone may view the data online.	Some control. Users must register to use the data and sign a Declaration of Use. Breaches may be subject to sanctions and/or legal proceedings.	High control. Users must undergo training, complete an authorisation process, and sign legally binding confidentiality undertakings and a compliance declaration. Breaches of protocols or disclosure of information may be subject to sanctions and/or legal proceedings.
Safe projects	No control necessary. Anyone can use the data for their own purposes.	Some control. Users sign a declaration regarding the purpose for which they will use the data.	High control. Users must detail the purpose for which they will use the data. The purpose can be compared to what is actually produced (see Safe outputs).
Safe settings	No control necessary.	Some control. Users are required to store the data securely and can work on the data in their own physical and IT environment.	High control. The DataLab is a secure, closed environment, accessed virtually or on-site, with secure login, auditing and monitoring capabilities. No data can be removed without first being checked by ABS staff.
Safe data	Very high control. The data is highly aggregated.	High control. The data is treated by the ABS to ensure no individual is likely to be identified.	Appropriate control. Direct identifiers are removed and the data is further treated where appropriate. Appropriate control of the data optimises its usefulness for statistical and research purposes.
Safe outputs	Very high control. Every table is checked for disclosure before release (in an open data context, the data is the safe output).	Some control. The output is technically controlled by the user, but the ABS provides guidelines or rules about what may be published or shared.	High control. All statistical outputs are assessed by the ABS for disclosure before being released to the user. The outputs may also be compared for consistency with the original project proposal.

In all three cases, applying any one safe in isolation is unlikely to provide an effective confidentiality solution. However, when all five safes are considered in combination, the overall disclosure risk becomes very low.

Tabular data is most effectively protected through safe data and safe outputs.

When data is loaded into the user's own environment, some of the safes can be more effectively controlled than others. The data custodian cannot directly monitor how the data is used, so it mitigates disclosure risk by directly protecting the data. The downside of this approach is that the data can lose some of its utility. Examples of these types of datasets include:

basic microdata files (produced by the ABS)
public use files (PUFs)

The treatment of the microdata files in the ABS DataLab effectively uses all five safes. Safe people, safe projects, safe settings, safe data and safe outputs are all controlled to mitigate the risk of disclosure. This allows appropriately approved researchers to work securely with highly detailed microdata.
