Australian Workshop on Public Finance: Keynote Address

Making the Most of Administrative and Integrated Data Assets

Dr David Gruen AO
Australian Statistician
Thursday 3 August 2023

Introduction

Thank you, Bob, and thank you for the invitation to present the Keynote address at the third Australian Workshop on Public Finance.

I have lived in the Canberra region on and off for much of my life. It is a beautiful part of the world. I would like to take this opportunity to thank the Traditional Custodians of this land who have cared for it over millennia. I pay my respects to their Elders and acknowledge members of the Aboriginal and Torres Strait Islander community attending today.

Thank you for the opportunity to give my reflections on progress in the availability of Australian administrative and integrated data assets, the value of empirical research and policy evaluations that materially influence policy and the public discourse, and where the academic community fits into the picture.

Data, data everywhere

My overarching message is that, while there is always more to do, we are making excellent progress improving the quality and timeliness of administrative data and, more broadly, integrated data assets.

I will confine my remarks to the data assets for which the ABS is responsible or is involved in developing. That is not because we are the sole public sector organisation making progress improving the quality of data assets but rather to play to my comparative advantage – it’s the topic I know best!

Over the past several years, there has been a great deal of progress.[1]

Let me begin with Single Touch Payroll (STP). The Australian Taxation Office (ATO) receives payroll information from employers with STP-enabled payroll software each time the employer runs their payroll. Given the extensive coverage of the STP system, these data cover more than 10 million employees. That is not quite every employee in the country, but it is not far from it. The arrival of the COVID-19 pandemic in early 2020 made access to this rich vein of near real-time information an urgent priority. The ATO expedited access, and the ABS began receiving these data in early April 2020.[2]

Since then, the ATO has provided job and wage data from the STP system to the ABS each week, from which we produce a new administrative data asset and a new publication: Weekly Payroll Jobs and Wages.

STP has now moved into a second phase of development. STP Phase 2 includes more detailed breakdowns of people’s earnings, differentiates between the different types of payments they receive, and provides more information on the nature of jobs (for example, whether they are full-time or part-time, or casual or non-casual). By the middle of next year, employers of most employees across Australia will report to the ATO via STP Phase 2. The ABS is looking forward to accessing this information at some point in the future.

In many ways, access to STP Phase 1 data taught us new ways of doing things. Given the scale and complexity of these data, it made sense to ingest and analyse them using cloud computing services rather than using our existing computer systems. And that is the new model for accessing public and private sector big data assets.

My next example gives a sense of the breadth of subject-matter areas in which data integration projects are being developed. The ‘Criminal Justice Data Asset’ is a longitudinal national data asset linking police recorded criminal offenders in Australia’s criminal courts with adult prisoners in the corrective services systems. The dataset will show how people move and interact within and across the justice system nationally, something that is currently not possible. The dataset will have the potential to be linked to other Commonwealth, state and territory held datasets for deeper analysis of the characteristics of criminal offenders. The ABS is working with the 24 criminal justice agencies (police departments, criminal courts and adult corrective service systems in the eight states and territories) to move from a pilot to production of the asset. The current plan is the asset will be made available to approved policymakers and researchers for approved projects in 2024-25.

As most of you would know, MADIP (the Multi-Agency Data Integration Project) and BLADE (the Business Longitudinal Analysis Data Environment) are the two largest integrated data assets hosted by the ABS. These two data assets include data from several data custodians. MADIP was established in 2015 while BLADE was originally established as the Expanded Analytical Business Longitudinal Database (EABLD) in 2014 as a joint project between the ABS and the Department of Industry, Innovation and Science.

Major improvements were made to MADIP and BLADE with funding from the Data Integration Partnership for Australia (DIPA) over 2017-2020. Since the end of DIPA, major improvements have continued. Initially, analysts wanted static assets that were timestamped. New versions were created each year and the underlying data in both MADIP and BLADE were updated once a year. But as these data assets have matured, processes have been streamlined and key enabling infrastructure (particularly the ABS DataLab) has been moved to the cloud. This enhances security and makes possible more sophisticated data analysis. A desire by analysts for closer to real-time data has also resulted in both MADIP and BLADE being updated much more frequently (some datasets monthly; others quarterly).

Figures 1 and 2 below show the datasets that are currently included in MADIP and BLADE. As the figures make clear, these two integrated data assets now include an impressive number of datasets that provide information on many aspects of people’s and businesses’ lived experience.

Figure 1: Multi-Agency Data Integration Project (MADIP)

This figure outlines all the datasets included in the Multi-Agency Data Integration Project (MADIP). MADIP is a secure data asset combining information on health, education, government payments, income and taxation, employment, and population demographics (including the Census) over time. It provides whole-of-life insights about various population groups in Australia, such as the interactions between their characteristics, use of services like healthcare and education, and outcomes like improved health and employment.

Figure 2: Business Longitudinal Analysis Data Environment (BLADE)

This figure outlines all the datasets included in the Business Longitudinal Analysis Data Environment (BLADE). BLADE is an economic data tool combining tax, trade and intellectual property data with information from ABS surveys to provide a better understanding of the Australian economy and business performance over time.

There have been many additions to these integrated data assets. Let me describe a few of them.

We have included a monthly series on Business Insolvencies (ASIC data), which has been used to analyse the impact of various economic events on business survivability. We have also linked MADIP and BLADE, which has enabled Longitudinal Employer-Employee analysis. We have also added data linked to the MADIP-BLADE asset to help identify Sole Traders and Partnership businesses so analysts can better understand the characteristics of people starting and running small businesses and better evaluate and develop business support programs.

To support Treasury in tracking economic recovery from the pandemic, the Labour Market Tracker Project integrated job-related data, including STP, JobKeeper and JobSeeker data, into both MADIP and BLADE, with datasets updated fortnightly, monthly and quarterly as they became available, to enable up-to-date monitoring of the labour market and the economy.

And for my final example of recent improvements in integrated data assets, data from the Australian Immunisation Register are being linked to MADIP each week. Provisional Death Registrations data are being linked and updated monthly. These data have been used by the Department of Health to generate insights for the Australian COVID-19 Vaccine and Treatment Strategy, including which groups in the community had lower vaccine uptake – and hence where to focus effort to raise that uptake. They have also been used in a recent academic study that demonstrates how effective COVID-19 vaccines have been in reducing mortality among older Australians.[3]

Developments from the May Budget

The last topic I want to touch on this morning concerns two announcements in the May Federal Budget that are relevant to this conference.

Firstly, the Budget included $10 million over four years to establish the Australian Centre for Evaluation (ACE) in the Australian Treasury. The aim of the centre is to improve the quantity, quality, and impact of evaluations across the APS, and work in close collaboration with evaluation units in other departments and agencies.

Dr Shane Johnson – a name well known to most of you and someone extremely well equipped to oversee evaluation work – is head of the Treasury Division in which the ACE is being established. The ACE’s responsibilities will include promoting the Commonwealth’s evaluation policy, identifying opportunities to partner on high‑quality impact evaluations, promoting better evaluation planning and use in Budget and Cabinet processes and supporting and overseeing evaluation capability building across the Australian government.

Systematic, high-quality evaluation requires data. I am looking forward to the imminent establishment of the ACE so that we at the ABS can work closely with the new centre to look at ways to further improve data sharing and linkage to better support evaluation and research. Once it is established, I have no doubt there will be many productive interactions and collaborations between the ACE and the people attending this conference. Well-conducted empirical research and evaluations are important inputs into public policy and there is benefit in evaluations being conducted in the public sector, in the research sector, and in collaborations between the two.

The second announcement in the May Budget of relevance for this conference was a $200 million package to target entrenched community disadvantage with a focus on intergenerational disadvantage and improving child and family wellbeing, led by Treasury and the Department of Social Services (DSS). As part of this package, the ABS received $16.4 million over four years to deliver a ‘Life Course Data Initiative’ to improve understanding of how communities experience disadvantage, including through longitudinal data. In many instances, disadvantage is concentrated in specific communities, and it is hard to obtain a detailed picture of what is happening in these communities. While survey data can provide valuable insights, the sample size often precludes generating results at the community level. Accessing administrative data can overcome this limitation.

The Life Course Data Asset initiative will aim to connect administrative datasets into a linked longitudinal data asset to support community-level analysis. Connecting data across different domains and levels of government will support analysis of the characteristics, programs and service interactions of individuals, households and families which serve as either protective or risk factors in experiencing disadvantage over time.

The ABS will engage with States and Territories over the next few months to develop a set of criteria for a pilot. Initially, the pilot will be developed with a single State partner, which will enable the ABS to focus resources on significantly deepening the data available on childhood in Australia. This approach will establish frameworks and processes for subsequently expanding the asset to all jurisdictions (States and Territories). The Life Course Data Asset will seek to include datasets across many aspects of people’s lives including health, education, employment, security, and housing.

Figure 3 below provides a schematic of the potential datasets that will be in scope. The success of the Life Course Data Asset will depend on acquiring and linking administrative data held by the relevant jurisdiction, together with an expansion of Commonwealth datasets. Support from all jurisdictions will be important in building the longer-term data asset – because many of the interactions Australians have with government services are with programs delivered at either the state/territory or local government level.

Figure 3: Potential datasets in scope for the Life Course Data Asset

This figure provides an overview of the proposed Life Course Data Asset, which will be a person-based asset that will link data across domains and levels of government to facilitate analysis of outcomes across the life course within the family, household and broader community context. Please note the data are indicative only and provide an illustrative example of the potential breadth of data relevant to understanding outcomes across the life course. Identifiable sources may not be available for all topics.

Conclusion

In conclusion, I hope I have given you a sense of the progress being made in developing administrative and integrated data assets and making them available to analysts and researchers.

Over time, these data assets should greatly expand the opportunity for analysts, both within government and academia, to do high-quality empirical research and evaluations of programs and thereby to improve the information base on which future public policy is formulated.

Thank you.

Footnotes

[1] My summary of progress is also not exhaustive. For example, the National Disability Data Asset and the associated Australian National Data Integration Infrastructure are important new initiatives which I have discussed elsewhere.

[2] We are grateful to the ATO for this access, particularly given the ATO were at the time delivering the JobKeeper package among other activities. Data on more than 10 million employees from STP allows us to produce detailed geospatial analysis (or to disaggregate across other dimensions) which is not possible using the 50,000 or so individuals from whom we collect data in the monthly Labour Force Survey. This coverage and detail are benefits of administrative ‘big’ data sources.

[3] The study demonstrated that in early 2022, a 65+ year old person having had three COVID-19 vaccinations – with the third dose administered within the previous three months – had a COVID-19 mortality rate 93 per cent lower than a comparable unvaccinated person. (93 per cent is an extremely large fall in mortality.) Further, it demonstrated how vaccine effectiveness wanes over time. People who received their most recent booster within the previous three months had a much larger fall in mortality (by around 20 percentage points) than people whose latest booster had been more than six months ago. It remained true that being vaccinated reduced mortality significantly relative to the unvaccinated but the level of protection was noticeably higher for those with a recent booster. See ‘Effectiveness of COVID-19 vaccinations against COVID-19 specific and all-cause mortality in older Australians’ by Bette Liu, Sandrine Stepien, Timothy Dobbins, Heather Gidding, David Henry, Rosemary Korda, Lucas Mills, Sallie-Anne Pearson, Nicole Pratt, Claire Vajdic, Jennifer Welsh, Kristine Macartney from the National Centre for Immunisation Research and Surveillance (NCIRS).