Data transformation

Presentation to CEDA Trustees, 5 October 2017, Victoria

David W Kalisch, Australian Statistician

ABS – delivering priceless statistics that inform important decisions

The ABS purpose is to inform Australia’s important decisions. This is reflected in the ABS outcome reported in the 2017-18 Budget Papers and in our Corporate Plan 2017-18.

The ABS produces around 500 statistical releases every year (around two major statistical products every working day). In addition, every five years we conduct a Census of Population and Housing and a Census of Agriculture.
  • ABS statistics provide trusted, contemporary information about our economy, our society, our population and our environment.
  • These statistics provide the reliable evidence on which all governments, businesses and the community should make important decisions every day.
  • Our statistics help inform our democratic processes. They influence economic policy settings by governments and the RBA, affect government funding to states and regions as well as electoral distributions, and influence business decisions and household choices.

ABS statistics are truly priceless. It’s hard to imagine our country functioning well without this vital information.

ABS is a complex organisation, with around 3000 staff skilled in statistics, mathematics, economics, IT, information management, logistics, field interviewing, finance, risk management, HR and communications. We provide national statistical services from 9 locations (every capital city and Geelong), with many of our work programs operating across several locations.

Maximising public value from our capability and resources

The operating context of the ABS is very challenging. We have 20% less funding and staffing compared to 15 years ago and our fragile, ageing statistical infrastructure is being replaced over a five-year program. The ABS receives comparatively less resourcing from government than our national statistical office colleagues in Canada and New Zealand, to deliver a broadly similar statistical work program.

At the same time, key data users demand new statistical products be added to most of our existing releases. It is becoming more difficult to get information from households and businesses through traditional survey approaches: more people live in secure apartment buildings, people are increasingly reluctant to provide information for surveys, and businesses seek to reduce their operating costs. However, new information sources, often colloquially called “big data”, are emerging that can, in some instances, provide new or substitute sources of information. New statistical techniques are also being developed, for use in Australia and internationally.

Without statistical and organisational innovation, ABS would produce substantially fewer statistical outputs now and into the future.

The main strategic focus of the ABS leadership is to maximise public value from the resources we receive.

We assess this across the five competing dimensions of:
  • the provision of quality, timely national statistics;
  • producing new statistical insights;
  • enabling effective, safe use of ABS data;
  • pursuing data capture that is efficient and less intrusive; and
  • continuing to build ABS capability for the future.

Inevitably, we have to make choices and trade-offs between these five dimensions. In general, putting more effort into one of them (like producing new insights) reduces the effort we can apply to the others. We are making these trade-offs and judgments within the overall context of our budget and staff capability.

ABS continues to prioritise reliable, essential official statistics for the nation. Over the last two years, we have devoted more attention and resources to statistical risk management across all of our key statistics, as we need to use our old patchwork IT systems for some time. For those of you familiar with the CPI, we are moving to annual re-weighting so it is more accurate over time. I have recently established a Chief Economist position at the ABS to increase the focus on coherence of our economic statistics and to improve communication and understanding of our major economic indicators.

The ABS is expected to measure an economy, society and environment that are becoming more complex, and more complicated to measure. Globalisation is challenging how we measure economic activity and international trade. We are paying more attention to measuring the large and growing service sector of our economy. Productivity is a key policy and measurement conundrum. We have just published a new labour accounts series to provide more insights into multiple job holding and hours of work, and we are expanding the statistics produced on the basis of our Environmental Economic Accounting Framework. At the same time, we are removing some lower priority statistics that do not have widespread demand.

Over the last decade, the ABS has been expanding our data integration activity. Through this expansion we have produced new statistics to inform important policy issues, such as the importance of small and medium sized enterprises to overall employment growth and innovation across the economy. We’ve created new insights into the employment and income outcomes for migrants, and we’ve provided new information on the outcomes achieved by participants in a number of government programs.

The ABS is enabling improved yet still safe access to our valuable statistical resources, especially our microdata. This reflects our legislative framework, which requires ABS to protect the secrecy of information we receive and maximise the use of our statistics.

Legal advice that I sought and received from the Australian Government Solicitor suggested that the ABS was previously implementing an approach to data release that was more restrictive than our legislation required. As a consequence, Australia was gaining less benefit from the ABS and our data resources than it could have, because the restrictions went beyond what was reasonably justified to protect the information of individual businesses and persons. It didn’t meet a contemporary reasonableness test.

The ABS continues to use its rigorous internal processes, such as our Disclosure Review Committee and dissemination protections, to mitigate risks of identifying individuals and businesses. We have also updated our processes and implemented a broader “five safes” approach to assessing potential disclosure risk. The five safes are safe people, safe project, safe setting, safe data, safe output. Our improved Datalab facility provides more convenient access to microdata for approved users, while retaining a range of essential disclosure protections.

The ABS wants to improve the experience for those who generously supply information to us to enable the production of our essential official statistics. Over 80% of our business surveys are returned on-line, and we would like to increase this. Our household collections have modest e-collection rates, as we also manage the potential statistical risks from major disruptions to the mode of collection. We continue to assess opportunities to reduce the call we make on households and businesses. This includes considering alternative data opportunities aside from direct collection. Establishment of our National Data Acquisition Centre in Geelong is expected to improve the provider experience with the ABS.

I’d like to briefly comment on building the future capability of the ABS. Over the last two years we have improved many of our key stakeholder relationships. We have changed our governance committees and organisation structures significantly to improve our decision making. We have improved the diversity of our senior workforce, with women now making up just over 50% of our SES staff. We have also completed two years of our five-year project to refresh our statistical infrastructure. We have implemented activity based working across most of our nine locations, providing a positive and more flexible working environment for our staff and increasing the productivity of our organisation.

While much has been achieved in increasing the capability of the ABS, and in enabling our staff to more fully use their skills and expertise, there is always more to do. The ABS transformation should be ongoing, reflecting expectations of a constantly changing external environment.

ABS has very skilled staff who will need to work somewhat differently in the future as we modernise our statistical infrastructure and take advantage of the changing information and technological environment. By bringing in new staff with other experiences and expertise through short-term secondments or permanent placements, we complement those who have worked at ABS for a long time. We are assisting ABS staff to gain opportunities, insights and wisdom by working elsewhere, sharing their ABS knowledge, and then returning to the ABS with strengthened stakeholder relationships.

From my position as Australian Statistician responsible for this national treasure called the ABS, which has modernised what it does and how it works over the past 110+ years, I would argue that the data revolution we are now experiencing creates more challenges and more opportunities than at any time in the ABS’ history. More information is being produced (albeit of variable quality), technology is better able to process large information sets, and the community has greater demand for information and an expectation to receive it in real time.

Some key data transformation issues

I would like to take the opportunity to highlight three key issues today:
  • the somewhat ubiquitous “big data”
  • contemporary statistical techniques such as data integration
  • social license and public trust

Big data

ABS is already a major user of big data.

The ABS has, for some time, made considerable use of government data that should be characterised as big data. This includes customs data, government financial data, taxation data, information from our State Registrars of births, deaths and marriages, and immigration data. We have recently moved to make greater use of Medicare enrolments data alongside immigration data to produce reliable net overseas migration and state-based population estimates. We are making use of planning data direct from a number of state government Departments to produce better quality information on building approvals.

We are using retail scanner data to produce accurate price measurements for one quarter of our national CPI, and now using web-scraping approaches to deliver more internet-based price information for the CPI.

ABS is also currently investigating other big data applications – two recent examples include satellite imagery for potential application in agricultural statistics and GPS data for freight transport statistics. Mobile phone data and other transport system information hold promise of future application in official statistics.

The emergence of big data, and its potential application for official statistics, is not confined to Australia. In 2014, the international statistical community, through the United Nations Statistical Commission (UNSC), established a Global Working Group on Big Data for Official Statistics. I’m proud to note that Australia through the ABS chaired the United Nations Global Working Group for its first three years.

The Global Working Group on Big Data is canvassing potential applications for official statistics using satellite imagery and geospatial data, mobile phone data and social media data, demonstrating the broad reach of this activity. It is also providing a vehicle to develop new methods, skills and capabilities in the use of big data for official purposes across the international statistical community.

This global activity demonstrates how the international statistical community very actively collaborates and shares innovations, an efficient way of improving our respective statistics. Many national statistical organisations are identifying how big data can be used for official statistical purposes, and applying it in a number of instances. The UN Global Working Group provides a platform to fast-track further development of new statistical methodologies and practices, making better use of big data as countries work together.

With big data, there are a number of broad issues that national statistical organisations need to consider, including:
  • the quality (accuracy and representativeness) of the big data compared to other data collection methods,
  • the comparative costs of using big data or sourcing information through other more traditional statistical means, and
  • the impact on the community from sourcing data via available big data compared to traditional survey approaches.

Big data and traditional survey collections are both legitimate information sources, but they are different, and I expect the ABS will continue to use both for many years to come. They each have a range of advantages and disadvantages.

A 2017 book entitled “Everybody Lies: What the Internet can tell us about who we really are”, by former Google data scientist Seth Stephens-Davidowitz, highlights some of the advantages and disadvantages of big data. He provides cogent examples of how some big data can be harnessed to provide new and accurate insights, but also cautions that data analysts need to be informed, smart users of these new data sources – just as I would suggest they should be smart, informed users of official statistics.

Data integration

Over the last decade, we have seen further development and refinement of statistical techniques, alongside the explosion of new data, and technological developments that enable efficient use of very large data sets. We have also seen expanding demand from key users for new data.

One of these key developments has been in data integration, enabled by improved statistical methods and technological possibilities.

Data integration involves bringing together different data sets for statistical and research purposes. It has been a statistical technique used for some time. For example, the ABS has used data integration in the Census context since 1966, matching Census returns with the post enumeration survey in order to produce more accurate population estimates and contribute to Census response estimates.

The ABS has developed its data integration capability and practice over the past decade. This statistical technique is not just something that has emerged over the past few years – it has been carefully nurtured and developed over some time:
  • In 2005 – ABS established the Census data enhancement project, developing methodological expertise on probabilistic record linkage, followed up by quality studies in 2007 assessing the value of name and address linkage;
  • In 2009 – ABS received funding to enhance its data integration facility to produce better estimates of Indigenous life expectancy;
  • In 2010 – ABS worked with Commonwealth Government Portfolio Secretaries to establish a set of high level principles for data integration activities;
  • In 2012 – ABS was independently accredited to undertake high risk data integration projects using Commonwealth Government data, and completed further quality studies assessing data integration using 2011 Census data; and
  • Through 2013-2017 – ABS has undertaken a range of data integration activities, using Census data from 2006 and 2011, to contribute to important policy issues.
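
For readers unfamiliar with the probabilistic record linkage mentioned in the timeline above, the general idea can be sketched in a few lines of Python. This is a minimal illustration in the style of the Fellegi-Sunter approach, not a description of the ABS's actual methods or systems. The field names, the agreement probabilities, the threshold and the sample records are all made up for the example.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | records refer to the same person)
#   u = P(field agrees | records refer to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "postcode": (0.90, 0.02),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 likelihood ratios across fields for one candidate pair."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def link(file_a, file_b, threshold):
    """For each record in file_a, keep its best-scoring candidate in file_b
    if the weight is at or above the threshold. Assumes file_b is non-empty."""
    links = []
    for a in file_a:
        best = max(file_b, key=lambda b: match_weight(a, b))
        weight = match_weight(a, best)
        if weight >= threshold:
            links.append((a["id"], best["id"], weight))
    return links


# Made-up records standing in for two data sets to be integrated.
census = [
    {"id": "c1", "surname": "nguyen", "birth_year": 1980, "postcode": "3000"},
    {"id": "c2", "surname": "smith", "birth_year": 1975, "postcode": "2600"},
]
survey = [
    {"id": "s1", "surname": "smith", "birth_year": 1975, "postcode": "2601"},
    {"id": "s2", "surname": "nguyen", "birth_year": 1980, "postcode": "3000"},
    {"id": "s3", "surname": "jones", "birth_year": 1990, "postcode": "7000"},
]

links = link(census, survey, threshold=5.0)
for id_a, id_b, weight in links:
    print(id_a, id_b, round(weight, 2))
```

In this sketch the second pair is still linked even though the postcodes differ, because agreement on surname and birth year outweighs the one disagreement. In practice the probabilities would be estimated from data, comparisons would allow for near matches, and candidate pairs would be restricted before scoring.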

Data integration has a number of advantages for the broader community:
  • it can deliver insights to complex and inter-related problems that single sources of data generally cannot provide;
  • it makes more effective use of data that has already been collected, reducing the respondent burden on households and businesses who might otherwise need to respond to data collections;
  • it enables statistical agencies and researchers to deliver new statistical and policy insights at much lower cost and in a more timely manner than would be possible if we needed to undertake new statistical data collections to cover these dimensions; and
  • it is privacy preserving – the use of existing data combined with statistical techniques and practices that ensure the confidentiality of personal or business information enables statistics to be produced without otherwise impinging upon the privacy of individuals, households and businesses through new intrusive data collections.

Greater onus is placed upon statistical agencies to ensure the secrecy of the integrated data. It is critical to ensure that any use or release of this consolidated data does not divulge information on individual persons, households and businesses. Great value is derived from the integrated data set’s aggregated information and the key data relationships that emerge from analysis of the consolidated data set – not from the individual data items.

Already, ABS has delivered some concrete examples of the statistical insights created through expanded and safe use of data integration techniques:
  • The ABS has been able to produce more accurate estimates of Indigenous life expectancy, critical to good measurement of one of the COAG Closing the Gap targets.
  • The Australian Census Longitudinal Dataset (ACLD) has already provided important information on where people who identified as autoworkers in 2006 were working five years later in 2011, and if they were still working. Data from the 2016 Census can add another five yearly snapshot.
  • The Business Longitudinal Analytical Data Environment (BLADE) has provided insights around the importance of small and medium enterprises to employment growth and innovation.
  • 2011 Census data has been integrated safely with a range of education and mental health program information to identify outcomes for people who have received this program assistance.

We are further enhancing the ABS’ national data integration capability through the decision of the Australian Government in the 2017 Budget to establish the Data Integration Partnership for Australia (DIPA).

DIPA will enable more Commonwealth government data sets to be safely linked to ABS surveys and Census information in a structured way, to deliver more statistics that are useful for policy decisions in the economic, industry, society and environment spheres. ABS will be the main data integrator in DIPA. The funding of a number of analytical centres in selected Government Departments should build their capacity to analyse information and deliver increased insights relevant to policy and budget related matters.

This type of statistical facility has already been implemented, successfully, in New Zealand with their Integrated Data Infrastructure, largely managed by Statistics New Zealand, together with increased analytical effort in the NZ Treasury. Many statistical agencies across the developed world are increasingly using data integration for policy and research purposes, so we do have the opportunity to collaborate and draw on the world’s best practice.

Social license and public trust

This data revolution is not taking place in a vacuum, where everything else is staying still. The standard economist’s assumption that “everything else is equal” (ceteris paribus) does not hold for the information age.

The current data transformation is taking place in an environment where there is declining trust in public institutions (and NGOs, business and the media) across the world and also in Australia (2017 Edelman Trust Barometer).

While the data revolution is opening up new information opportunities, I suspect there is also less trust in data security. The community is regularly informed of large data breaches (predominantly reported from private sector organisations, such as Verizon, Sony, Red Cross, Equifax), and there is concern about the potential of cyber security hacks to extract sensitive data. This is the modern environment in which data is being provided and used.

ABS has a compact with the community that we will safeguard the necessary and often sensitive personal and business information we can legally compel from individuals, households and businesses in order to produce our nation’s essential official statistics. We also have a compact (and many Memorandums of Understanding) with government data providers that we will safeguard the data they provide to us.

ABS has very strong data secrecy legislative provisions, reinforced by ABS practices, systems and culture. This includes strong disclosure controls to safeguard your information, and the sensitive information we receive from other agencies. We will only use your information for legitimate statistical purposes, to support policy and research, to aid decision making and contemporary understanding of our economy, society, population and environment.

Strong community trust that the ABS will keep your sensitive personal and business information safe is essential to ensure we receive accurate responses to our data collections and high levels of voluntary compliance with our survey and Census program. This is crucial for us to deliver high quality, timely statistics back to the community.

At the same time, the ABS wants the nation to get maximum benefit from the statistics we produce. The official statistics produced by the ABS are part of our nation’s public data resource, paid for by taxpayers over many years, and they should be used to benefit our community.

This reflects a careful balancing dimension, where ABS is safeguarding the secrecy of personal and business information, while also enabling safe, effective use of our statistics across the community.

ABS has traditionally taken a very risk-averse view about where this balance should be struck, and this would have reduced the utility of our statistical resource to the detriment of the broader community. I would argue that the ABS is currently pursuing a more balanced, still safe but more effective, approach to data use.

I do have a warning for those pursuing a more cavalier approach to open and available data (usually with the noble purpose of enhancing knowledge). If this is done in a way that compromises public trust and confidence in the ABS’ data release practices, it may also have the dire consequence of compromising the quality of Australia’s official statistics.

The 2016 Census experience provides some contemporary evidence about this issue of social license and community trust as it relates to the collection of data and production of statistics.

Those of you who are old enough, or are perhaps Census history aficionados, would recall the Census privacy debates, particularly in the 1970s. Census privacy debates also occurred in the late 1990s and in the lead up to the 2006 Census. Privacy debates are a feature of Australian Censuses, and one that appears uniquely Australian when I look at the experiences of our national statistical office colleagues in Canada, New Zealand, the Netherlands and Ireland, all of whom collect and retain personal information indefinitely from their five-yearly Censuses for sound statistical purposes.

The recent privacy furore that surrounded the lead up to the 2016 Census, driven by privacy advocates and featured in mainstream and social media, was consistent with the media commentary around the Censuses in the 1970s. The ensuing political commentary around the 2016 Census was a little different, reflecting the more combative political environment in Australia over recent years and the timing of the 2016 Census just after a tightly contested and close federal election.

So how did this privacy debate play out in the real world, across the Australian community beyond the privacy advocates and the media? Right through the privacy furore in the media and social media, sentiment testing among the Australian community showed a consistently strong intent (97-98% of respondents) to complete the Census accurately and fully.

The commitment of Australians to complete the Census was unwavering. We achieved a 2016 Census response rate of over 95 per cent, comparable to the 2006 and 2011 Censuses. We achieved an 80% increase in those who completed the Census on-line in 2016 compared to 2011, and this higher on-line response delivered a quality improvement to the Census data in 2016. Overall, the Independent Assurance Panel I established to review the Census concluded that the 2016 Census data was of comparable quality to the 2006 and 2011 Censuses.

The other key feature of the recent Census experience was that many of those who are strong supporters of maximising the use of data for sound policy and research purposes were comparatively silent. The conclusion I have drawn from this experience is that if researchers and policy developers want to have effective use of the data collected by ABS, they do need to become more vocal advocates for this in the public sphere.

Concluding remarks

My intention today was to give you an insight into the challenges and opportunities of a modern national statistical organisation.

While there is more data now than ever before, there are still demands for more information from the ABS than we can satisfy from our current resources. There are a number of key surveys that we have not undertaken for some time, such as a time use survey and an updated mental health and wellbeing survey, that should be part of our set of contemporary official statistics.

My final comment is that we should not only focus on what data is or is not available.

We should recognise there is much more that can be done to analyse and use the data that is already available. We should support Governments, universities, businesses, NGOs, think tanks and independent researchers who are big users of data, to use more of the valuable data that is available to them.

When you bring together quality data and quality analysis, you can contribute new insights that will help shape our nation’s future. When you use quality data in sound analysis you move from anecdotes and prejudices to good policy design and well-informed decisions.

I suspect this is where a major challenge and opportunity lies. We must do more to develop effective partnerships between data producers such as the ABS and the nation’s analytical capability. Greater attention also needs to be given to building our capacity as a nation to analyse our available data resources, so that we can make best use of the burgeoning data resource anticipated over coming years.