1160.0 - ABS Confidentiality Series, Aug 2017  
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 23/08/2017  First Issue

MANAGING THE RISK OF DISCLOSURE: THE FIVE SAFES FRAMEWORK

This page contains the following:
Balancing disclosure risk and data utility
The Five Safes Framework
Five Safes in practice: examples from the ABS


BALANCING DISCLOSURE RISK AND DATA UTILITY

Confidentiality is breached when a person, group or organisation is re-identified through a data release, or when information can be attributed to them. The likelihood of this happening (the risk of disclosure) is not easily determined. Managing this risk implicitly assumes that the consequences of disclosure are always damaging, to some extent, to the individual or organisation concerned. The degree of damage is difficult to ascertain, mostly because people differ in the importance they place on information (i.e. what one person considers highly confidential may be of no consequence to another). The ABS assumes all information it collects to be potentially sensitive and manages it accordingly.

A key challenge for data custodians is to provide data with maximum utility for users but still maintain the confidentiality of the information. Every data release carries some risk of disclosure, so the benefits of each release (i.e. its utility or usefulness for research and statistical purposes) must substantially outweigh its risks and be clearly understood. This balancing of risk and utility is something everyone does on a daily basis (for example, when they choose to drive a car). Similarly, data custodians need to approach disclosure risk by managing it, rather than trying to eliminate it.

Managing disclosure risk becomes a question of assessing not only the data itself, but also the context in which the data are released. Once the context is clearly understood, it is much easier to determine how to protect against the threat of disclosure. The Five Safes Framework provides a structure for assessing and managing disclosure risk that is appropriate to the intended data use.

This framework has been adopted by the Australian Bureau of Statistics (ABS), several other Australian government agencies, and national statistical organisations such as the Office for National Statistics (UK) and Statistics New Zealand.


THE FIVE SAFES FRAMEWORK

The Five Safes Framework takes a multi-dimensional approach to managing disclosure risk. Each ‘safe’ refers to an independent but related aspect of disclosure risk. The framework poses specific questions to help assess and describe each risk aspect (or safe) in a qualitative way. This allows data custodians to place appropriate controls, not just on the data itself, but on the manner in which data are accessed. The framework is designed to facilitate safe data release and prevent over-regulation.

The five elements of the framework are:

  • Safe People
  • Safe Projects
  • Safe Settings
  • Safe Data
  • Safe Outputs.

Safe People

Is the researcher appropriately authorised to access and use the data?

By placing controls on the way data are accessed, the data custodian invests some responsibility in the researcher for preventing re-identification. The general rule is this: as the detail in the data increases, so should the level of user authorisation required.

Prerequisites for user authorisation usually include the following:
  • Undertaking training in confidentiality and the conditions of data use.
  • Signing a legally binding undertaking with the data custodian to maintain data confidentiality.

By definition, a Safe People assessment would not be required for open data (i.e. data that are released into the public domain with no restriction on use).

Safe Projects

Is the data to be used for an appropriate purpose?

Users wanting to access detailed microdata should be expected to explain the purpose of their project. For example, in order to access detailed microdata in the ABS DataLab, users must demonstrate to the ABS that their project has a statistical purpose and show it has:
  • A valid research aim.
  • A public benefit.
  • No capacity to be used for compliance or regulatory purposes.

As with Safe People, the need for a Safe Project assessment will depend on the context in which the data are accessed. It would not be required for open data.

Safe Settings

Does the access environment prevent unauthorised use?

The environment here can be considered in terms of both the IT and the physical environment. In data access contexts such as open data, Safe Settings are not required. At the other end of the spectrum, however, sensitive data should only be accessed via secure research centres.

Secure research centres may have features such as:
  • A locked room requiring personal authentication.
  • IT monitoring equipment.
  • Auditing and other supervision.

Safe Settings ensure that data access and use is occurring in a transparent way.

Safe Data

Has appropriate and sufficient protection been applied to the data?

At a minimum, direct identifiers (such as name and address) must be removed from data before they are released. Further statistical disclosure controls should also be applied, depending on how the data will be released. Table 1 shows some of the statistical factors that should be considered when assessing disclosure risk.

TABLE 1: FACTORS TO CONSIDER WHEN ASSESSING DISCLOSURE RISK

Factor                            Effect on disclosure risk
Data age                          Older data are generally less risky
Sample data (e.g. a survey)       Decreases risk
Population data (e.g. a census)   Increases risk
Longitudinal data                 Increases risk
Hierarchical data                 Increases risk
Sensitive data                    Increases risk (sensitive data may be a more attractive “target”)
Data quality                      Poor quality data may offer some protection
Microdata                         Main risk: re-identification
Aggregate data                    Main risks: attribute disclosure and disclosure from differencing
Key variables                     The variables of most interest to users are invariably the most disclosive

Source: UK Anonymisation Decision-making Framework
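
The minimum treatment described above (removing direct identifiers) and the role of key variables noted in Table 1 can be illustrated with a small sketch. It is an illustration only, not the ABS method: the field names, the choice of key variables and the records are all hypothetical. It shows that a record can remain unique on its key variables, and so potentially re-identifiable, even after names and addresses are removed.

```python
from collections import Counter

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "address"}
KEY_VARIABLES = ("age_group", "sex", "postcode")


def remove_direct_identifiers(records):
    """Return copies of the records without the direct identifier fields."""
    return [
        {field: value for field, value in record.items()
         if field not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def count_unique_records(records, key_variables=KEY_VARIABLES):
    """Count records whose combination of key variables occurs only once."""
    combos = Counter(tuple(r[k] for k in key_variables) for r in records)
    return sum(1 for count in combos.values() if count == 1)


# Made-up records, not real data.
raw = [
    {"name": "A", "address": "1 First St", "age_group": "30-39",
     "sex": "F", "postcode": "2600", "income": 70000},
    {"name": "B", "address": "2 Second St", "age_group": "30-39",
     "sex": "F", "postcode": "2600", "income": 82000},
    {"name": "C", "address": "3 Third St", "age_group": "80-89",
     "sex": "M", "postcode": "2600", "income": 31000},
]

treated = remove_direct_identifiers(raw)
unique_count = count_unique_records(treated)

print(sorted(treated[0]))  # remaining field names of the first record
print(unique_count)        # number of records unique on the key variables
```

In this made-up example one record is still unique on the assumed key variables, which is why further statistical disclosure controls may be needed depending on how the data will be released.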


Safe Outputs

Are the statistical results non-disclosive?

This is the final check on the information before it is made public, which aims to reduce the risk of disclosure to a minimum. All data made available outside the data custodian’s IT environment must be checked for disclosure. For example, in the ABS DataLab, statistical experts check all outputs for inadvertent disclosure before the data leave the DataLab environment.
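
One generic kind of output check is a minimum cell count (threshold) rule applied to frequency tables. The sketch below is an illustration under stated assumptions, not a description of the checks ABS staff perform: the threshold of 5, the function name and the table are hypothetical. As a simplification, zero cells are ignored here, although they can also be disclosive in some contexts.

```python
# Assumed threshold for illustration; actual rules are set by the data custodian.
MIN_CELL_COUNT = 5


def find_small_cells(table, min_count=MIN_CELL_COUNT):
    """Return the labels of non-zero cells whose count is below min_count."""
    return sorted(
        label for label, count in table.items() if 0 < count < min_count
    )


# Made-up frequency table: (region, employment status) -> number of people.
output_table = {
    ("North", "Employed"): 120,
    ("North", "Unemployed"): 3,
    ("South", "Employed"): 95,
    ("South", "Unemployed"): 14,
}

flagged = find_small_cells(output_table)
print(flagged)  # cells a reviewer would look at before release
```

A flagged cell does not automatically mean the output is unsafe. It marks where a reviewer would need to decide whether treatment is required before the output leaves the secure environment.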

In practice, the Safe Data part of the Five Safes should be addressed after the other four are considered. This is because the degree of data treatment required will become evident once it is clear who will be able to access the data, under what conditions, in what circumstances and how the resulting data will be protected in order to be made public. The process is likely to be iterative, as data treatment with a view to maintaining utility may necessitate reassessing one or more of the other four safes.


THE FIVE SAFES IN PRACTICE: EXAMPLES FROM THE ABS

The Five Safes Framework provides a mechanism for data custodians to take necessary and reasonable steps to manage disclosure risk in their data releases. It broadens the approach to data confidentiality by considering not just the treatment of data, but also the manner and context in which data are released.

The safes are assessed independently, but also considered as a whole. They can be thought of as a series of adjustable levers or controls to effectively manage risk and maximise the usefulness of a data release. The degree to which each safe is controlled is critical to assessing the disclosure risk. Tightly controlling all five will be counterproductive because the restrictions applied will not produce a corresponding benefit (i.e. useful data).
The table below describes how the ABS applies the Five Safes Framework to three different data access channels: open data, Confidentialised Unit Record Files (CURFs) and detailed microdata files.


TABLE 2: THREE EXAMPLES OF ABS APPLICATION OF THE FIVE SAFES FRAMEWORK

Access channels compared:
  • Open data: website or publication table
  • Basic CURF: Basic Confidentialised Unit Record File (CURF), via direct download
  • DataLab: detailed microdata file, via the ABS DataLab

Safe People
  • Open data: No control necessary. Anyone may view the data online.
  • Basic CURF: Some control. Users must register to use the data and sign a Declaration of Use. Breaches may be subject to sanctions and/or legal proceedings.
  • DataLab: High control. Users must undergo training, complete an authorisation process, and sign legally binding confidentiality undertakings and a compliance declaration. Breaches of protocols or disclosure of information may be subject to sanctions and/or legal proceedings.

Safe Projects
  • Open data: No control necessary. Anyone can use the data for their own purposes.
  • Basic CURF: Some control. Users sign a declaration regarding the purpose for which they will use the data.
  • DataLab: High control. Users must detail the purpose for which they will use the data. This can be compared to what is actually produced (see Safe Outputs).

Safe Settings
  • Open data: No control necessary.
  • Basic CURF: Some control. Users are required to store the data securely and can work on the data in their own physical and IT environment.
  • DataLab: High control. The DataLab is inside the ABS IT environment (with virtual access available to some users). It requires secure login and has auditing and monitoring capabilities. No data can be removed without first being checked by ABS staff.

Safe Data
  • Open data: Very high control. The data are highly aggregated.
  • Basic CURF: High control. The data are treated by the ABS to ensure no individual is likely to be identified.
  • DataLab: Appropriate control. Direct identifiers are removed and the data are further treated where appropriate. Appropriate control of the data optimises its usefulness for statistical and research purposes.

Safe Outputs
  • Open data: Very high control. Every table is checked for disclosure before release. (Note: in an open data context, the data are the safe output.)
  • Basic CURF: Some control. The output is technically controlled by the user, but the ABS provides guidelines or rules about what may be published or shared.
  • DataLab: High control. All statistical outputs are assessed by the ABS for disclosure before being released to the user. The outputs may also be compared for consistency with the original project proposal.
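
The "adjustable levers" idea can be made concrete by writing the control levels from Table 2 as a simple lookup structure. This is only a sketch: the dictionary name, the short channel labels and the helper function are hypothetical, and the levels are copied from the table as text rather than scored numerically.

```python
# Control levels as stated in Table 2; short channel labels are assumptions.
TABLE_2 = {
    "open data": {
        "Safe People": "No control necessary",
        "Safe Projects": "No control necessary",
        "Safe Settings": "No control necessary",
        "Safe Data": "Very high control",
        "Safe Outputs": "Very high control",
    },
    "basic CURF": {
        "Safe People": "Some control",
        "Safe Projects": "Some control",
        "Safe Settings": "Some control",
        "Safe Data": "High control",
        "Safe Outputs": "Some control",
    },
    "DataLab": {
        "Safe People": "High control",
        "Safe Projects": "High control",
        "Safe Settings": "High control",
        "Safe Data": "Appropriate control",
        "Safe Outputs": "High control",
    },
}


def controlled_safes(channel):
    """List the safes on which a channel applies any control at all."""
    return [
        safe for safe, level in TABLE_2[channel].items()
        if level != "No control necessary"
    ]


for channel in TABLE_2:
    print(channel, "->", controlled_safes(channel))
```

Read this way, open data relies entirely on Safe Data and Safe Outputs, while the DataLab applies some control on all five safes.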



In all three cases, applying any one Safe in isolation is unlikely to provide an effective confidentiality solution. However, when all five are considered in combination, the overall disclosure risk becomes very low.
The treatment of the microdata files in the ABS DataLab exemplifies the framework’s holistic nature: Safe People, Safe Projects, Safe Settings, Safe Data and Safe Outputs are all controlled to mitigate the risk of disclosure, allowing appropriately authorised researchers to work securely with highly detailed microdata.

When data are loaded into the user’s own environment, the data custodian has no way to monitor how the data are used. In this case, the data custodian mitigates disclosure risk by directly protecting the data. The downside of this approach is that the data can lose some of its utility. Examples of these types of datasets include the following:
  • Basic Confidentialised Unit Record Files (CURFs), produced by the ABS
  • Public Use Files (PUFs).

Techniques to treat microdata to mitigate disclosure risks are outlined in Part 5: Managing the risk of disclosure: treating microdata.
As Table 2 shows, tabular data are most effectively protected through Safe Data and Safe Outputs. Techniques for protecting tabular data are presented in Part 4: Managing the risk of disclosure: treating aggregate data.
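
Disclosure from differencing, listed in Table 1 as a main risk for aggregate data, can be shown with a short worked example. The tables, area descriptions and function name below are made up for illustration. Neither table has a small cell on its own, but subtracting one from the other reveals a table for a single town.

```python
def difference_tables(larger, smaller):
    """Subtract the counts in one table from the matching counts in another."""
    return {category: larger[category] - smaller[category] for category in larger}


# Made-up counts of people by income band for two overlapping areas.
table_whole_region = {"Low income": 40, "High income": 25}
table_region_excluding_one_town = {"Low income": 40, "High income": 24}

# The implied table for the one town that differs between the two areas.
implied_town_table = difference_tables(
    table_whole_region, table_region_excluding_one_town
)
print(implied_town_table)
```

In this example the implied table contains a single person in the high income band, so publishing both tables would attribute an income band to that person even though each table looks safe in isolation.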