The ACLD-SSRI dataset is a unique data source in that it brings together information about the characteristics, circumstances, and transitions of people who have interacted with the social security system, and therefore has the potential to increase knowledge about a wide range of socio-economic issues facing Australians and their families.
While some survey collections (such as the Survey of Income and Housing and the General Social Survey) include items about government payments received, combined with a range of socio-economic and demographic characteristics, the substantially larger sample size of the ACLD-SSRI dataset allows these issues to be explored in much more detail. In addition, the use of administrative data provides information on the actual payments received, whereas traditional surveys rely on respondents reporting their payments accurately. The ACLD-SSRI also allows longitudinal characteristics of benefit recipients to be explored.
However, the ACLD-SSRI dataset has some significant limitations (see Limitations) and is considered experimental in nature. Evaluation of the project to date has identified areas where different methodologies could be considered in the future to increase the robustness of the data, and where further enhancements could be made to improve the utility of the dataset.
Linkage of the ACLD (a random 5% sample of 2006 Census records linked to records from the 2011 Census) to a subset of the SSRI dataset (people who received social security benefits or who had suspended benefits in September 2011) was undertaken using selected variables common to both datasets to ensure each record in the ACLD had the highest possible chance of being accurately linked to a record in the SSRI dataset (refer to Methodology for more information). However, the overlap of these datasets is unknown, so it is difficult to calculate an exact linkage rate.
This method was chosen because a non-identifying grouped numeric code is available only for the records in the 5% ACLD sample, and not for the full Census dataset. It was thought that including this code as a linkage variable would produce a higher quality linkage than linking to the full 2011 Census dataset, where such a code is not available. However, linkage to the full Census would have a number of advantages:
- Analysis of the linked ACLD-SSRI dataset has shown that most items of analytical interest are cross-sectional in nature - examining the characteristics of SSRI recipients using data items from the Census that are not available on SSRI data in isolation. Linkage to the Census itself, rather than to a longitudinal sample, would be better suited to this purpose (the relevant ACLD sample could then be extracted from the linked dataset for any required longitudinal analysis).
- Linkage to the full 2011 Census would allow a direct calculation of a linkage rate as a measure of linkage quality.
- While the ACLD-SSRI dataset is large in comparison to sample surveys, linkage to the full Census dataset would allow the characteristics of very small groups or small areas to be examined.
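As a rough illustration of the linkage rate mentioned above, the rate is a simple ratio once the number of records eligible to link is known. The following Python sketch is illustrative only: the function name `linkage_rate` and the counts are hypothetical and are not drawn from the ACLD-SSRI project.

```python
def linkage_rate(linked_records: int, eligible_records: int) -> float:
    """Return the proportion of eligible records that were linked."""
    if eligible_records <= 0:
        raise ValueError("eligible_records must be positive")
    if not 0 <= linked_records <= eligible_records:
        raise ValueError("linked_records must be between 0 and eligible_records")
    return linked_records / eligible_records


# Hypothetical counts, for illustration only.
rate = linkage_rate(linked_records=850, eligible_records=1000)
print(f"Linkage rate: {rate:.1%}")
```

With linkage to a 5% sample, the denominator (the number of SSRI records that belong to people in the sample) is unknown, which is why an exact rate is difficult to calculate for the current dataset.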
The ACLD-SSRI record weights are not benchmarked to the SSRI population. While the weights are of good quality for use in analysing the longitudinal Australian population, there is no mathematical measure of how accurately they reflect the demographic make-up of either the longitudinal SSRI population or the "point in time" population at the time of the Census. In future, alternative linking strategies may allow for good quality benchmarks incorporating SSRI population information to be produced, enabling the calculation of more accurate weights, thus increasing the utility of the file and allowing inferences to be made with more confidence.
Weighting the linked dataset to be representative of the input SSRI dataset would minimise the impact of bias on analysis of the characteristics of SSRI recipients.
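One common way of benchmarking weights is post-stratification, in which the weights within each group are scaled so that their total matches a known population total for that group. The Python sketch below illustrates the idea under that assumption; it is not the weighting method used for the ACLD-SSRI, and the function name `benchmark_weights`, the group labels, and all figures are hypothetical.

```python
from collections import defaultdict


def benchmark_weights(records, benchmarks):
    """Scale weights so each group's weighted total equals its benchmark.

    records: list of (group, weight) pairs for linked records.
    benchmarks: dict mapping each group to its known population total.
    """
    # Sum the existing weights within each group.
    totals = defaultdict(float)
    for group, weight in records:
        totals[group] += weight

    adjusted = []
    for group, weight in records:
        if totals[group] == 0:
            raise ValueError(f"group {group!r} has zero total weight")
        # Adjustment factor: benchmark total divided by current weighted total.
        factor = benchmarks[group] / totals[group]
        adjusted.append((group, weight * factor))
    return adjusted


# Hypothetical linked records and benchmark totals, for illustration only.
records = [("age_15_24", 20.0), ("age_15_24", 20.0), ("age_25_plus", 25.0)]
benchmarks = {"age_15_24": 50.0, "age_25_plus": 20.0}
adjusted = benchmark_weights(records, benchmarks)
```

In this example the two records in the first group are scaled up so that their weights sum to the benchmark of 50, while the single record in the second group is scaled down to 20. Benchmarks of this kind would need to come from SSRI population information, which is what alternative linking strategies may make possible.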
Additional data items
This pilot project involved linking SSRI data from the 2011 time-point only. Analysis has shown that it may have been more analytically useful to include SSRI data from earlier time-points, so that 2011 Census data could be used to explore the outcomes for those on particular payments. Longitudinal benefits information would provide a valuable resource for understanding the dynamics and pathways of people's involvement in the social security system.
For the purposes of this pilot project, only a relatively small selection of SSRI data items was linked to the ACLD dataset. While this has resulted in a unique and valuable dataset bringing together information not available on any alternative dataset, there may be potential in the future to further enhance the dataset by including a greater range of SSRI data, or information from additional time-points.