Five Safes framework
Using safe people, projects, settings, data and outputs to balance disclosure risk and utility, ABS Five Safes examples
Balancing disclosure risk and data utility
A key challenge for data custodians is to provide data with maximum utility for users but still maintain the confidentiality of the information. Every data release carries some risk of disclosure, so the benefits of each release (its utility or usefulness for research and statistical purposes) must substantially outweigh its risks and be clearly understood. This balancing of risk and utility is something everyone does on a daily basis (for example, when they choose to drive a car). Similarly, data custodians need to approach disclosure risk by managing it, rather than trying to eliminate it.
Confidentiality is breached when a person, group or organisation is re-identified through a data release, or when information can be attributed to them. The likelihood of this happening (the risk of disclosure) is not easily determined. Implicit in this definition is the assumption that any disclosure damages the individual or organisation to some extent. The degree of damage is difficult to ascertain, mostly because people differ in the importance they place on information: what one person considers highly confidential may be of no consequence to another. The ABS therefore assumes that all information it collects is potentially sensitive and manages it accordingly.
Managing disclosure risk becomes a question of assessing not only the data itself, but also the context in which the data is released. Once the context is clearly understood, it is much easier to determine how to protect against the threat of disclosure. The Five Safes framework provides a structure for assessing and managing disclosure risk that is appropriate to the intended data use.
This framework has been adopted by the ABS and several other Australian government agencies, as well as by national statistical organisations such as the Office for National Statistics (UK) and Statistics New Zealand.
Five Safes framework
The Five Safes framework takes a multi-dimensional approach to managing disclosure risk. Each safe refers to an independent but related aspect of disclosure risk. The framework poses specific questions to help assess and describe each risk aspect (or safe) qualitatively. This allows data custodians to place appropriate controls not just on the data itself, but also on the manner in which the data is accessed. The framework is designed to facilitate safe data release and to prevent over-regulation.
The five elements of the framework are:
- safe people
- safe projects
- safe settings
- safe data
- safe outputs
Safe people
Is the researcher appropriately authorised to access and use the data?
By placing controls on the way data is accessed, the data custodian vests some responsibility for preventing re-identification in the researcher. Usually, as the detail in the data increases, so should the level of user authorisation required.
Prerequisites for user authorisation usually include:
- training in confidentiality and the conditions of data use
- signing a legally binding undertaking to maintain data confidentiality
By definition, a safe people assessment would not be required for open data (data that is released into the public domain with no restriction on use).
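The principle that the required authorisation rises with the detail in the data can be sketched as a simple lookup. This is a minimal Python sketch that paraphrases the safe people entries in the ABS channel table on this page; the `Channel` enum, the requirement labels and the helper functions are hypothetical names chosen for illustration, not an ABS system or API.

```python
from __future__ import annotations

from enum import Enum


class Channel(Enum):
    """Access channels from the ABS examples (hypothetical names)."""
    OPEN_DATA = "website or publication table"
    BASIC_MICRODATA = "basic microdata file"
    DETAILED_MICRODATA = "detailed microdata file"


# Safe people prerequisites per channel, paraphrased from the ABS channel table.
# The requirement labels are illustrative identifiers only.
REQUIREMENTS: dict[Channel, frozenset[str]] = {
    Channel.OPEN_DATA: frozenset(),  # open data: no safe people control needed
    Channel.BASIC_MICRODATA: frozenset({"registered", "declaration_of_use"}),
    Channel.DETAILED_MICRODATA: frozenset({
        "training",
        "authorisation",
        "confidentiality_undertaking",
        "compliance_declaration",
    }),
}


def missing_requirements(channel: Channel, completed: set[str]) -> set[str]:
    """Return the prerequisites the user has not yet met for this channel."""
    return set(REQUIREMENTS[channel] - completed)


def may_access(channel: Channel, completed: set[str]) -> bool:
    """A user passes this safe people check once every prerequisite is met."""
    return not missing_requirements(channel, completed)


print(may_access(Channel.OPEN_DATA, set()))  # True
print(sorted(missing_requirements(Channel.DETAILED_MICRODATA, {"training"})))
# ['authorisation', 'compliance_declaration', 'confidentiality_undertaking']
```

A lookup table keeps the policy in one place, so tightening or relaxing the safe people "lever" for a channel means editing data rather than logic.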
Safe projects
Is the data to be used for an appropriate purpose?
Users wanting to access detailed microdata should be expected to explain the purpose of their project. For example, in order to access detailed microdata in the ABS DataLab, users must demonstrate to the ABS that their project has a statistical purpose and show it has:
- a valid research aim
- a public benefit
- no capacity to be used for compliance or regulatory purposes
As with safe people, the need for a safe project assessment will depend on the context in which the data is accessed. It would not be required for open data.
Safe settings
Does the access environment prevent unauthorised use?
The environment here can be considered in terms of both the IT and the physical environment. In data access contexts such as open data, safe settings are not required. At the other end of the spectrum, however, sensitive data should only be accessed via secure research centres.
Secure research centres may have features such as:
- a locked room requiring personal authentication
- IT monitoring equipment
- auditing and other supervision
Safe settings ensure that data is accessed and used transparently.
Safe data
Has appropriate and sufficient protection been applied to the data?
At a minimum, direct identifiers (such as name and address) must be removed from data before it is released. Further statistical disclosure controls should also be applied, depending on how the data will be released. Table 1 shows some of the statistical factors that should be considered when assessing disclosure risk.
Table 1: Statistical factors to consider when assessing disclosure risk
| Factor | Effect on disclosure risk |
|---|---|
| Data age | Older data is generally less risky |
| Sample data (e.g. a survey) | Decreases risk |
| Population data (e.g. a census) | Increases risk |
| Administrative data | Increases risk |
| Longitudinal data | Increases risk |
| Hierarchical data | Increases risk |
| Sensitive data | Increases risk (sensitive data may be a more attractive target) |
| Data quality | Poor quality data may offer some protection |
| Microdata | Main risk: re-identification |
| Aggregate data | Main risks: attribute disclosure and disclosure from differencing |
| Key variables | The variables of most interest to users are usually the most disclosive |
Source: UK Anonymisation Decision-making Framework
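The minimum safe data step described above, removing direct identifiers such as name and address, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: records are plain dictionaries, and the field names `name` and `address` (and the sample records) are hypothetical; a real microdata file would need a custodian-defined list of identifiers.

```python
from __future__ import annotations

# Hypothetical direct identifier fields; a real list is set by the data custodian.
DIRECT_IDENTIFIERS = {"name", "address"}


def remove_direct_identifiers(records: list[dict]) -> list[dict]:
    """Return new records with the direct identifier fields dropped.

    The input records are not modified.
    """
    return [
        {field: value for field, value in record.items() if field not in DIRECT_IDENTIFIERS}
        for record in records
    ]


# Hypothetical example records.
records = [
    {"name": "A. Citizen", "address": "1 Example St", "age": 34, "state": "NSW"},
    {"name": "B. Citizen", "address": "2 Example St", "age": 71, "state": "TAS"},
]
print(remove_direct_identifiers(records))
# [{'age': 34, 'state': 'NSW'}, {'age': 71, 'state': 'TAS'}]
```

The fields left in the example (age and state) can still act as key variables in the sense of Table 1, which is why further statistical disclosure controls are usually applied after this step.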
Safe outputs
Are the statistical results non-disclosive?
This is the final check on the information before it is made public, and it aims to reduce the risk of disclosure to a minimum. All data made available outside the data custodian's IT environment must be checked for disclosure. For example, in the ABS DataLab, statistical experts check all outputs for inadvertent disclosure before the data leaves the DataLab environment.
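One simple way to illustrate an output check is a minimum-frequency (threshold) rule, which flags table cells with very small counts. The rule, the threshold of 3 and the example counts below are assumptions for illustration only, not the ABS DataLab's actual checking rules. The sketch also shows why checkers look beyond single tables: two tables that each pass the rule can be subtracted to reveal a small group, the "disclosure from differencing" listed for aggregate data in Table 1.

```python
from __future__ import annotations

# Hypothetical threshold for illustration; real rules are set by the data custodian.
MIN_CELL_COUNT = 3


def small_cells(table: dict[str, int], threshold: int = MIN_CELL_COUNT) -> list[str]:
    """Return labels of non-empty cells whose count is below the threshold.

    Empty (zero) cells are not flagged in this sketch.
    """
    return [label for label, count in table.items() if 0 < count < threshold]


def differenced_cells(
    total: dict[str, int], subset: dict[str, int], threshold: int = MIN_CELL_COUNT
) -> list[str]:
    """Flag cells where total minus subset leaves a small, non-empty group.

    Each table may pass small_cells() on its own, yet the difference between
    them describes a group that was never published directly.
    """
    return [
        label
        for label, count in total.items()
        if 0 < count - subset.get(label, 0) < threshold
    ]


# Hypothetical counts: all employees, and employees aged under 60, by region.
all_employees = {"North": 15, "South": 22}
under_60 = {"North": 14, "South": 18}

print(small_cells(all_employees), small_cells(under_60))  # [] []
print(differenced_cells(all_employees, under_60))  # ['North']
```

Here neither table contains a small cell, but the North difference is a single employee aged 60 or over, so releasing both tables together could disclose information about that person.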
Examples from the ABS
The Five Safes framework provides a mechanism for data custodians to take necessary and reasonable steps to manage disclosure risk in their data releases. It broadens the approach to data confidentiality by considering not just the treatment of data, but also the manner and context in which data is released.
The safes are assessed independently, but also considered as a whole. They can be thought of as a series of adjustable levers or controls that together manage risk and maximise the usefulness of a data release. The degree to which each safe is controlled is critical to assessing the disclosure risk. Tightly controlling all five is likely to be counterproductive, because the added restrictions will not produce a corresponding benefit (useful data).
In practice, the safe data part of the Five Safes should be addressed after the other four are considered. This is because the degree of data treatment required will become evident once it is clear who will be able to access the data, under what conditions, in what circumstances and how the resulting data will be protected in order to be made public. The process is likely to be iterative, as data treatment with a view to maintaining utility may necessitate reassessing one or more of the other four safes.
The following table describes how the ABS applies the Five Safes framework to three data access channels: open data, basic microdata files and detailed microdata files.
| | Website or publication table (open data) | Basic microdata file (via direct download) | Detailed microdata file (via ABS DataLab) |
|---|---|---|---|
| Safe people | No control necessary. Anyone may view the data online. | Users must register to use the data and sign a Declaration of Use. Breaches may be subject to sanctions and/or legal proceedings. | Users must undergo training, complete an authorisation process, and sign legally binding confidentiality undertakings and a compliance declaration. Breaches of protocols or disclosure of information may be subject to sanctions and/or legal proceedings. |
| Safe projects | No control necessary. Anyone can use the data for their own purposes. | Users sign a declaration regarding the purpose for which they will use the data. | Users must detail the purpose for which they will use the data. The stated purpose can be compared to what is actually produced (see Safe outputs). |
| Safe settings | No control necessary. | Some control. Users are required to store the data securely and can work on the data in their own physical and IT environment. | The DataLab is a secure, closed environment, accessed virtually or on-site, with secure login, auditing and monitoring capabilities. No data can be removed without first being checked by ABS staff. |
| Safe data | Very high control. The data is highly aggregated. | The data is treated by the ABS to ensure no individual is likely to be identified. | Direct identifiers are removed and the data is further treated where appropriate. Appropriate control of the data optimises its usefulness for statistical and research purposes. |
| Safe outputs | Very high control. Every table is checked for disclosure before release (in an open data context, the data is the safe output). | The output is technically controlled by the user, but the ABS provides guidelines or rules about what may be published or shared. | All statistical outputs are assessed by the ABS for disclosure before being released to the user. The outputs may also be compared for consistency with the original project proposal. |
In all three cases, applying any one safe in isolation is unlikely to provide an effective confidentiality solution. However, when all five safes are considered in combination, the overall disclosure risk becomes very low.
Tabular data is most effectively protected through safe data and safe outputs.
When data is loaded into the user's own environment, some of the safes can be controlled more effectively than others. The data custodian cannot directly monitor how the data is used, so it mitigates disclosure risk by directly protecting the data instead. The downside of this approach is that the data can lose some of its utility. Examples of these types of datasets include:
- basic microdata files (produced by the ABS)
- public use files (PUFs)
The treatment of the microdata files in the ABS DataLab effectively uses all five safes. Safe people, safe projects, safe settings, safe data and safe outputs are all controlled to mitigate the risk of disclosure. This allows appropriately approved researchers to work securely with highly detailed microdata.