Standard Level of Detail for Confidentialised Unit Record Files
The ABS makes unidentifiable microdata from its surveys available to users in the form of Confidentialised Unit Record Files (CURFs). In the past, the balance between maximising the level of detail and maintaining confidentiality has been struck separately for each individual CURF, with the level of detail traded off between variables to best meet the needs of clients for that particular CURF. This method has been labour intensive and, while it has given client areas some flexibility, it has produced inconsistency between collections and over time, making it hard to predict what will be allowable on a CURF. It was therefore decided to standardise the level of detail for CURFs in order to reduce the effort required in producing and assessing CURFs.
The nature of the different collections and release mechanisms has dictated that a number of different standard levels of detail be developed, depending on the type of collection and on whether the CURF is to be released on CD ROM and the remote access data lab (RADL), or on the RADL only.
Basic files (CD ROM or RADL release)
- income, labour and education CURFs to contain more detail for age around the 'transition' age groups (i.e. 15-24 and 55-64 years):
- other social surveys may have children in scope so a different standard level of detail was needed. It was noted that in the past some collections had opted for different Geography to be included on CURFs for different reasons. As a result, two 'options' for Geography are considered for this standard - state and one sub-state geography, and remoteness and socioeconomic indexes for areas (SEIFA) presented in quintiles:
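As an illustration only, the idea of keeping finer age detail around the 'transition' ages and presenting an index in quintiles can be sketched as below. The band widths, the cut-offs under 15 and over 64, and the function names are hypothetical and are not the values in the standard. The rank-based quintile rule is also a simplification; published SEIFA quintiles are derived from area-level rankings.

```python
def age_band(age):
    """Collapse single year of age into bands (hypothetical band widths).

    Assumption: 5-year bands inside the transition ranges (15-24, 55-64),
    10-year bands between them, and open-ended groups at either end.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 15:
        return "0-14"
    if age >= 65:
        return "65+"
    in_transition = 15 <= age <= 24 or 55 <= age <= 64
    width = 5 if in_transition else 10
    # Each range has its own starting point for the bands.
    base = 15 if age <= 24 else (25 if age <= 54 else 55)
    low = base + ((age - base) // width) * width
    return f"{low}-{low + width - 1}"


def quintiles(scores):
    """Assign each score a quintile from 1 (lowest fifth) to 5 by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, i in enumerate(order):
        result[i] = rank * 5 // len(scores) + 1
    return result
```

For example, `age_band(22)` falls in a 5-year band while `age_band(30)` falls in a 10-year band, which is the kind of asymmetry the first bullet describes.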
Expanded files (RADL release only)
- the Population Census 1% Household Sample File has a sample at least four times the size of that of most other collections, thus supporting more detailed analysis. A separate standard level of detail was considered for the Census file.
- for most expanded CURFs, the level of detail of variables is that which would be useful for analysis. Variables which identify groups of people that a user is likely to know (e.g. geographic area or industry) are not as fine as other variables. In addition, masking of individual records (through altering values of variables for a small number of particularly unusual records) on the file is the main strategy for protection against spontaneous recognition;
- again, the Census file supports more detailed analysis, so a separate standard level of detail was required;
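The article does not say how 'particularly unusual' records are found before masking. One common approach, shown here purely as an assumption, is to flag records whose combination of identifying variables is shared by very few other records on the file. The function name, the variable names and the threshold are all hypothetical.

```python
from collections import Counter


def flag_unusual(records, key_vars, threshold=1):
    """Flag records whose key-variable combination is rare on the file.

    Assumption: a record is 'unusual' when at most `threshold` records
    share its combination of values for `key_vars`. This is one possible
    rule, not the method used by the ABS.
    """
    keys = [tuple(record[var] for var in key_vars) for record in records]
    counts = Counter(keys)
    return [counts[key] <= threshold for key in keys]


# Hypothetical records: the third has a unique combination and is flagged.
sample = [
    {"age_band": "25-34", "industry": "Retail"},
    {"age_band": "25-34", "industry": "Retail"},
    {"age_band": "65+", "industry": "Mining"},
]
flags = flag_unusual(sample, ["age_band", "industry"])
```

Flagged records would then be candidates for masking, that is, for having the values of some variables altered.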
So far, efforts to standardise have been concentrated on a small set of core person level variables which have, in the past, most commonly had the level of detail collapsed to reduce the risk of identification. The standard variables can be grouped according to whether they pose a risk to list matching, to spontaneous recognition, or to both.
Risk of list matching and spontaneous recognition:
- Indigenous collections were also considered as a separate case, as the size and distribution of the Indigenous population would mean that some variables would need to be further restricted from that proposed in the expanded standard.
Risk of list matching only:
Risk of spontaneous recognition only:
- Marital Status;
- Country of Birth;
- Year of arrival;
- Indigenous status.
Income has also been considered as a risk for list matching and spontaneous recognition, but due to some complications in determining the appropriate level of detail on non-income survey CURFs, it has not been included in the first set of variables to be signed off.
Approval and future developments
Only standards for household collections have been developed so far, as almost all CURFs produced by the ABS have been from household collections. It is planned to develop standard levels of detail for other common person and household level variables for CURFs in the near future. Standards for business surveys are also planned and will be developed as experience with those surveys expands.
All CURFs will be subject to the appropriate standards unless there is very well-justified user demand and an appropriate trade-off in the level of detail is made.
An assessment of the combined disclosure risk posed by the variables in the first set of standards has been conducted and considered by the Micro Data Review Panel. Approval for the standards will be sought from the Australian Statistician.
For more information, including the complete detail for the variables in the standards, please contact Kirsty Leslie on (02) 6252 5594 or Paul Schubert on (02) 6252 7306.
email: firstname.lastname@example.org or