Australian Government Data Forum: Keynote address

Strengthening Australian Public Service (APS) Data Capability

Dr David Gruen AO
Australian Statistician and Head of the APS Data Profession
Wednesday 17 May 2023


Thank you, Gayle and Chris.

I have lived in the Canberra region on and off for much of my life. It is a beautiful part of the world. I would like to take this opportunity to thank the Traditional Custodians of this land who have cared for it over millennia. I pay my respects to their Elders and acknowledge and welcome other members of the Aboriginal and Torres Strait Islander community who are attending today.

Thank you for the opportunity to speak about strengthening APS data capability. In my talk today, I’ll give you an overview of the initiatives undertaken by the APS Data Profession to raise data capability across the public service. I will also talk about the data landscape more broadly and give a few examples that illustrate how far we have come in our capacity to generate insights from the data assets now available to the public sector.

Data, data everywhere

In preparing this talk, I reflected on the 34 years since I joined the public service, not as a graduate, but following an earlier career as a research scientist. Over that time, I’ve had many different roles spanning research, analysis, and policy development. All these roles involved data in some way – using it myself, managing staff who use it or, more recently, leading projects that expand the range of data available to the community and for public policy purposes.

For the past couple of decades, we have seen rapid growth in the availability of data and a much wider range of data sources. This is largely because of the digital revolution – which has created a deluge of data in its wake. Data professionals are increasingly accessing this deluge of data and using it to derive insights of value both privately and for public policy purposes.

Being able to access and use data for policy development and efficient service delivery has become a prominent part of the successful operation of the public service.

Data is also a focus of the Government’s agenda. Building on Minister Gallagher’s remarks, Jenny Wilkinson in her keynote address this morning provided an overview of the initial Data and Digital Government Strategy, released with the 2023-24 Budget. The Government is taking meaningful steps to ensure the APS has the right capability, tools, and processes to securely share, understand and use data for better policy advice, regulation, and services.

This heightened interest in, and availability of, data brings into focus how important it is to lift data capability across the APS. Most roles in the public service now require a level of data literacy that wasn’t needed in years gone by, and departments and agencies are seeking to employ many more people with deep data expertise.

It is here the Data Profession has a major role to play.

The APS Data Profession

In 2018, the Australian Government commissioned an independent review of the public service, led by David Thodey. The review panel included Glyn Davis and Gordon de Brouwer, names with which I suspect you are familiar.

Among many other things, the review noted the APS should make better use of data and analytics to generate deeper insights, provide better advice to inform government decisions, and enable more effective service delivery and regulation to improve social and economic outcomes.

In response to the review recommendations, the Australian Public Service Commission (APSC) established three Professional streams. The Human Resources Profession was launched first, second was the Digital Profession and third was the Data Profession, launched in September 2020 with me as head of profession and the ABS as lead agency.

The initiatives developed by the Data Profession are co-designed with partner agencies across the APS. Co-design is important because it gives us the best chance that the programs and resources developed by the profession are valued broadly across the APS and hopefully are embedded in a way that will be sustained.

The Data Profession’s focus is on lifting data capability in the APS. It does this by:

  • recruiting people with data skills into the APS,
  • developing training courses, including for SES officers who are not data professionals but who increasingly need to understand the data aspects of their roles,
  • providing a members’ community platform where members of the data profession create communities of practice, share ideas, advertise events – like today’s Data Forum! – and access learning resources,
  • defining data capabilities to enable individuals and agencies to assess their range of data capabilities and respond to any identified gaps,
  • encouraging diversity of people in data roles, and
  • creating career pathways and development opportunities.

Let me talk about each of these in more detail.

As many of you would be aware, since the 2021 intake, it has been possible for graduates who enter the APS to do so via a dedicated data graduate stream.

Interest in this data graduate stream has built over time. For the 2021 intake, 11 agencies were involved in data graduate recruitment and 65 data graduates were placed across the APS.

For the 2024 intake, over 40 agencies are involved in recruitment, and we anticipate placing about 360 data graduates across the APS.

There are now ten streams through which graduates can enter the APS – a generalist stream and nine specialist streams.[1] It gives a sense of the level of interest in data by prospective graduates that there have been more applicants to the data graduate stream (1,462) for the 2024 intake than to any of the other specialist streams.

In terms of training courses, a Data Leadership Course for SES who are not data professionals has been developed to raise data literacy for senior executives in the APS. I sometimes joke that the target audience for this course is SES officers who have heard of the security and other benefits of putting datasets in the cloud but would be hard pressed to explain exactly what the cloud is.

The Data Profession engaged the ANU to develop a pilot SES Data Leadership Course in 2021. Following the success of the pilot, the course has been delivered by the ANU in partnership with the APS Academy. Including the pilot, 6 courses have been delivered to date to a total of 111 participants across 19 agencies, with one more course scheduled this financial year for another 20 SES. Post-course collaboration among course participants is encouraged via a dedicated group on the members’ community platform. Given its success and the obvious demand, the course will continue into 2023-24.

Building on this success, a similar course is being developed for the EL2 cohort. As well as this, with our partners, we will develop and support training modules for the broader EL cohort which will include training modules with more technical content.

Most recently, the Data Profession has piloted Data Graduate development training modules that will be evaluated, refined and made available later in the year to all 2023 data graduates. These training modules cover data in the APS, trust in government, evidence-based decision making, data storytelling and visualisation and the APS Data Profession and uplifting data capability.

Building on the importance of community, the Data Profession took the opportunity to join forces with the Digital Profession on their established web-based members’ platform to improve our communications and engagement with data professionals. A rebranded Data and Digital Professions Members’ Community Platform (MCP) was launched in August 2022.

On this platform, data professionals can consult a Directory of Members to engage with other members of the Data Profession. Members can also share and find both shorter- and longer-term data job opportunities. Membership has grown steadily since the launch in August last year, with 4,665 Data Profession members on the platform in early May 2023.

The Communities of Practice (CoPs) on the platform have been particularly effective, with 13 different communities of practice currently live. These have been established for a variety of purposes: for example, the Graduate Data Network has a central place to collaborate, as does the Women in Digital Community of Practice, which recently expanded to become the first joint community across both the Data and Digital Professions, now known as the ‘Women in Data and Digital’ community of practice. The goal of this community is to increase connections and visibility for women in data and digital roles. I am delighted to see this community as a finalist in the APS Diversity in Data category for the APS Awards being presented tonight.

The Data Profession supports communities of practice on the platform to enable data professionals to exchange ideas and develop capabilities relevant to their specific areas of expertise. I encourage other groups of data professionals to set up their own communities of practice – it’s a great way to build your community and develop connections with others across the APS who share your specific interests.

Let me turn now to the Data Capability Framework, which was published in October last year. The Framework outlines 26 specific capabilities associated with working with data in the APS. Each capability has three proficiency levels: foundational, intermediate and advanced.

Agencies that have implemented the framework are using it for various purposes such as:

  • Organisational planning and strategy,
  • Alignment with an agency’s own capability framework,
  • Job role workshops,
  • Development and career planning workshops, and
  • Capability assessment at an agency level or by individuals self-assessing their own data capabilities and identifying any gaps.

For example, the ABS ran a data capability survey to assess current levels of capability across the organisation, and to offer employees a tool that enables them to reflect on their strengths and areas for improvement.

Individuals who filled in the survey provided self-assessments of their own capabilities (using the categories: no skills, foundational, intermediate, or advanced) across 24 data capabilities based on the Data Capability Framework. Survey respondents also provided a self-assessment of their proficiency across eight software/programming languages (which included, for example, Microsoft Excel, SAS, R and Python).

Figure 1 below shows the results. For each data capability, the lightest colour shows the share of people who reported they had no skills, with progressively darker colours for the shares who reported foundational, intermediate and advanced skills.

The figure provides a clear visual representation of the self-assessed levels of data capability across the specific data capabilities and software/programming languages. It will be used to guide future training effort.

Figure 1: ABS Data Capability Survey

Figure 1 shows the results from the ABS Data Capability Survey.
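As a rough illustration of how shares like those in Figure 1 could be tabulated, the sketch below counts self-assessed levels for each capability and converts them to percentages. The function name, the response layout and the sample answers are all hypothetical; this is not the ABS survey's actual processing.

```python
from collections import Counter

# Self-assessment categories, as described for the survey above.
LEVELS = ["no skills", "foundational", "intermediate", "advanced"]


def capability_shares(responses):
    """Return {capability: {level: share in per cent}}.

    `responses` is a list of dicts, one per respondent, mapping a
    capability name to that respondent's self-assessed level.
    (Hypothetical layout, chosen only for this illustration.)
    """
    counts = {}
    for response in responses:
        for capability, level in response.items():
            counts.setdefault(capability, Counter())[level] += 1

    shares = {}
    for capability, counter in counts.items():
        total = sum(counter.values())
        shares[capability] = {
            level: 100 * counter[level] / total for level in LEVELS
        }
    return shares


# Made-up responses, for illustration only.
sample = [
    {"Data visualisation": "foundational", "Python": "no skills"},
    {"Data visualisation": "advanced", "Python": "foundational"},
    {"Data visualisation": "foundational", "Python": "no skills"},
    {"Data visualisation": "intermediate", "Python": "intermediate"},
]
shares = capability_shares(sample)
```

Each capability's four shares sum to 100, which is what lets them be drawn as a single stacked bar running from lightest (no skills) to darkest (advanced).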

The final capability initiative I’ll discuss is the APS Data Job Roles. These were published in December 2022 and soon will be extended, with more personas added to the current suite. This work is designed to complement the Data Capability Framework.

Data Job Roles establish a common understanding of baseline data skills, competencies and requirements of different data roles across the APS. The Data Job Roles can be used by agencies and individuals to:

  • Understand the capabilities needed by data professionals for particular jobs,
  • Identify capabilities that need to be developed for people to progress in their careers,
  • Assess capabilities in preparation for performance reviews,
  • Provide a common language for advertisements for data job roles, and
  • Support human resource and workforce planning.

As I said earlier, the initiatives in the Data Profession have been co-designed with other agencies to ensure their broad appeal across the APS. In the case of Data Job Roles, I want to record my appreciation to the ATO who did most of the work to develop this initiative.

Exciting examples of using data for public policy

Having talked in some detail about the initiatives in place to raise data capability across the APS, I want to provide a few examples of how far we have come with the data now available within the public service to provide public policy insights. While there are many elements to this, I will focus on the rise of ‘big data’ and the increasing use of integrated data assets.

On the rise of big data, I’ll give just one example to illustrate what is now possible. This example is based on a recent study on rents, which is a joint piece of work by the ABS and RBA.

This study uses a new dataset which provides data on rents for about 600,000 rental properties across both regional and capital cities in Australia. Rents data on these 600,000 properties is updated monthly. With that much data, it is possible to provide extremely detailed information on developments in the rental market.

Let me give you two examples.

Figure 2 below shows rental prices over the past five years by distance from the CBD in Sydney and Melbourne. The broad outlines of the price developments in Australia’s two largest cities are remarkably similar. With the arrival of COVID-19 in Australia in March 2020, there were big falls in market rents close to the CBD but not further out in the suburbs. The near-to-CBD rental price falls began to reverse in 2021 but have yet to fully unwind their earlier falls. The contrast with the outer suburbs is striking indeed.

Figure 2: Rent price indices*, by capital city SA3, March 2020 = 100

A two-panel line graph of rent price indices by SA3 in greater Sydney and greater Melbourne with the index equal to 100 in March 2020. The X-axis represents the month, ranging from June 2018 to February 2023. The Y-axis represents the level of the index. Each line is coloured according to the distance of that SA3 from the CBD (ranging from 0 to 80km): warmer colours (like red) mean that the SA3 is further from the CBD, while cooler colours (like purple) mean that the SA3 is close to the CBD. The graph shows that, in general, rent prices in SA3s close to the CBD declined further after the onset of the pandemic and remained lower for longer, with some SA3s remaining below their pre-pandemic levels in February 2023. By contrast, rent prices in SA3s far from the CBD have increased over the past three years, by up to 10 per cent over this period in some cases.
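For readers unfamiliar with the "March 2020 = 100" convention, the sketch below shows the rebasing step in its simplest form: every value in a series is divided by the base-period value and multiplied by 100. The rent levels are made up, and the sketch ignores the expenditure weighting used in the published indices.

```python
def rebase(series, base_period):
    """Rescale a {period: value} series so that base_period equals 100."""
    base = series[base_period]
    return {period: 100 * value / base for period, value in series.items()}


# Made-up median weekly rents for two hypothetical areas.
inner_area = {"2020-03": 500.0, "2021-03": 450.0, "2023-02": 490.0}
outer_area = {"2020-03": 400.0, "2021-03": 410.0, "2023-02": 440.0}

inner_index = rebase(inner_area, "2020-03")
outer_index = rebase(outer_area, "2020-03")
```

Rebasing puts areas with very different rent levels on a common scale, so the lines can be compared directly: in this made-up case the inner area is still below 100 in February 2023 while the outer area is 10 per cent above it.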

The second insight from these data is presented below in Figure 3. This shows, for those rental properties with a new tenant, the proportion of properties that saw rental increases of different sizes over the previous year. For example, in mid-2020 less than five per cent of new tenants were being charged rent more than 10 per cent above the rent that the previous tenant had been charged on that property a year earlier (mid-2019). But by early 2023, around two-thirds of new tenants were being charged rent more than 10 per cent above the rent that had been charged to the previous tenant on that property a year earlier.

This example makes it pretty clear that big data provides a level of detail about developments in the rental market that is not available any other way. It enables analysts to understand what has been going on with the rental market not just on average but in different segments of the market at different times. A curious analyst could use these data to throw light on many public-policy-relevant questions about the rental market. (In my opinion, curiosity is an underappreciated attribute for success!)

* Expenditure weighted. Includes private rents only. It should be noted that distribution presented in this graph uses different methodology and sampling to the CPI.

Source: ABS
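The "share of new tenants paying more than 10 per cent above the previous rent" statistic can be sketched as follows, assuming each property with a new tenant is represented by a pair of rents: the one charged to the previous tenant a year earlier and the one charged now. The function name and the sample rents are hypothetical, and the sketch is unweighted, unlike the published series.

```python
def share_above(rent_pairs, threshold_pct=10.0):
    """Per cent of properties whose rent rose by more than threshold_pct.

    `rent_pairs` is a list of (previous_rent, new_rent) tuples, one per
    property with a new tenant. (Hypothetical layout for illustration.)
    """
    if not rent_pairs:
        raise ValueError("at least one property is required")
    changes = [100 * (new - old) / old for old, new in rent_pairs]
    above = sum(1 for change in changes if change > threshold_pct)
    return 100 * above / len(changes)


# Made-up weekly rents: increases of 4%, 15%, 15% and 0%.
tenancies = [(500, 520), (400, 460), (600, 690), (450, 450)]
share = share_above(tenancies)
```

Repeating this calculation month by month, and for several thresholds, would give a distribution of rent increases over time of the kind described above.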

Let me turn now to how far we have come with integrated data assets. Figures 4 and 5 below show the datasets that make up two large public sector integrated data assets: the Multi-Agency Data Integration Project (MADIP) and the Business Longitudinal Analysis Data Environment (BLADE). As the figures make clear, these two integrated data assets now include an impressive number of datasets that provide information on many aspects of people’s and businesses’ lived experience.

These integrated datasets therefore provide the opportunity for analysts to tackle public policy problems across multiple dimensions. It is incumbent on the hosts of these data assets, in this case the ABS, to ensure that they are hosted securely with well-articulated protocols to ensure that people’s and businesses’ private information is protected and is not compromised.

Figure 4: Multi-Agency Data Integration Project (MADIP)

Figure 4 shows the datasets that make up the public sector integrated data asset: the Multi-Agency Data Integration Project (MADIP).

Figure 5: Business Longitudinal Analysis Data Environment (BLADE)

Figure 5 shows the datasets that make up the public sector integrated data asset: the Business Longitudinal Analysis Data Environment (BLADE).

Let me give an example of a recent study that uses integrated data to answer important public policy questions. The study uses a link between MADIP and the Australian Immunisation Register – a dataset that contains details of all Australians vaccinated against COVID-19 and when they were vaccinated.[2]

The study followed 3.8 million Australians aged 65+ in 2022 to examine the relationship between mortality for this older age group and vaccination status.

Now some key findings from the study. First, it provided insights about the impact of vaccine boosters on mortality rates. It demonstrated that in early 2022, a 65+ year old person having had three COVID-19 vaccinations – with the third dose administered within the previous three months – reduced COVID-19 mortality relative to a comparable unvaccinated person by 93 per cent. 93 per cent is an extremely large fall in mortality.

Second, the study demonstrated how vaccine effectiveness wanes over time. It showed that people who received their most recent booster within the previous three months had a much larger reduction in mortality (by around 20 percentage points) than people whose latest booster had been more than six months ago. It remained true that being vaccinated reduced mortality significantly relative to the unvaccinated but the level of protection was noticeably higher for those who had had a recent booster.
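To make the arithmetic behind these two findings concrete, the sketch below computes a percentage reduction in mortality from a pair of mortality rates, and the gap in percentage points between two such reductions. This is a simplification: a study of this kind would typically estimate the reduction from a statistical model that adjusts for differences between groups, and the rates below are made up so that the first result matches the 93 per cent quoted above. They are not the study's data.

```python
def mortality_reduction(rate_vaccinated, rate_unvaccinated):
    """Per cent reduction in mortality relative to the unvaccinated group."""
    return 100 * (rate_unvaccinated - rate_vaccinated) / rate_unvaccinated


# Made-up deaths per 100,000 people, for illustration only.
unvaccinated = 100.0
recent_booster = 7.0    # booster within the previous three months
older_booster = 27.0    # booster more than six months ago

recent = mortality_reduction(recent_booster, unvaccinated)
older = mortality_reduction(older_booster, unvaccinated)
gap_in_points = recent - older
```

The distinction between per cent and percentage points matters here: the gap between the two groups is a difference between two percentages (20 percentage points in this made-up case), not a 20 per cent change in either of them.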

For our purposes here, the point I want to highlight is this: As with the earlier example with rents, there are enormous benefits in being able to examine outcomes from such a large sample. A sample of 3.8 million does not include every 65+ year old Australian in 2022, but it is close. Population-wide analysis is a great way of controlling for selection biases that could otherwise distort results.

In this example, the benefit of working with an integrated data asset is that it enables more complex public policy questions to be answered than would be possible with a single dataset.
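The idea of linking datasets can be sketched very simply, assuming two record sets that share a de-identified person key. Everything here is hypothetical: the field names, the key and the records are invented for illustration and do not describe how MADIP or the Australian Immunisation Register are actually structured or linked.

```python
def link_records(people, vaccinations):
    """Attach dose count and latest dose date to each person record.

    Both inputs are lists of dicts sharing a hypothetical `person_id`
    key. Dose dates are ISO strings, so sorting them orders them in time.
    """
    doses_by_person = {}
    for record in vaccinations:
        doses_by_person.setdefault(record["person_id"], []).append(
            record["dose_date"]
        )

    linked = []
    for person in people:
        dates = sorted(doses_by_person.get(person["person_id"], []))
        linked.append({
            **person,
            "doses": len(dates),
            "latest_dose": dates[-1] if dates else None,
        })
    return linked


# Made-up records, for illustration only.
people = [
    {"person_id": "A1", "age": 71},
    {"person_id": "B2", "age": 68},
]
vaccinations = [
    {"person_id": "A1", "dose_date": "2021-06-01"},
    {"person_id": "A1", "dose_date": "2022-01-15"},
    {"person_id": "A1", "dose_date": "2021-08-24"},
]
linked = link_records(people, vaccinations)
```

Once records are linked in this way, a question such as "how does mortality vary with time since the latest dose?" can be asked of a single combined table, which neither dataset could answer on its own.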


To conclude my remarks, I am very much looking forward to the inaugural Data Awards ceremony tonight. These awards are a celebration of achievement, highlighting innovative and resourceful solutions implemented across the APS which use data to make a difference to the Australian community – so congratulations to all finalists and good luck for tonight!

We are increasingly working in a highly competitive employment environment for people with data skills. Recognising that, the Data Profession will continue to focus on the attraction, development and retention of staff while seeking to highlight and enhance the benefits of being a data professional in the APS.

And finally, let me take the opportunity to recommend you join the Members’ Community Platform if you haven’t already done so – to collaborate with your peers and to access events, learning resources and job opportunities that support your data career in the APS.

Thank you.


[1] The specialist streams are Accounting and Finance, Data, Digital, Economics, Human Resources, Indigenous, Intelligence, Legal and STEM. Prospective graduates to the APS can apply in more than one stream.

[2] ‘Effectiveness of COVID-19 vaccinations against COVID-19 specific and all-cause mortality in older Australians’ by Bette Liu, Sandrine Stepien, Timothy Dobbins, Heather Gidding, David Henry, Rosemary Korda, Lucas Mills, Sallie-Anne Pearson, Nicole Pratt, Claire Vajdic, Jennifer Welsh, Kristine Macartney from the National Centre for Immunisation Research and Surveillance (NCIRS).
