‘Big Data’, Official Statistics and National Statistics Offices
 

Australian Statistical Conference 2016
Presentation: ‘Big Data’, Official Statistics and National Statistics Offices, 6 December 2016, Canberra
David W. Kalisch, Australian Statistician


Introduction

First, I would like to acknowledge the Ngunnawal people, the Traditional Custodians of the land we are meeting on today and pay my respects to their Elders both past and present, and to acknowledge members of the Aboriginal and Torres Strait Islander community who may be attending this conference today.

I would also like to thank the Australian Statistical Conference for the privilege to present the Knibbs Keynote address.

One dimension that was clear to me before I took on the role of Australian Statistician was the importance of the ABS and its official statistics. Our statistics provide the bedrock for economic, fiscal and monetary policy. They shine a light on our society and environment, and show how it is changing. Our official statistics inform decision-making, and are an essential contribution to our democratic institutions.

Since I became Australian Statistician I have seen first-hand the commitment of the ABS to delivering the quality, relevant, timely statistics that Australia needs. The experiences of recent months have highlighted how important it is that as a nation we value and trust our official statistics, and continue to invest in them. They have also demonstrated the need to continue to innovate and transform the ways we produce our statistics so we can address current and future complex policy questions.

National Statistics Offices must be responsive. Otherwise, in the absence of official information, less reliable information will be used, as we have seen with the amount of ‘false’ news on social media.

I will firstly talk about the value of official statistics and the role of the ABS. I will then talk about ‘big data’ – in its many forms and what they might mean for official statistics in the future. I will also talk about what the ABS is doing to transform and position itself for the future.

Sir George Handley Knibbs

It is apt to be presenting about big data and official statistics at this lecture named after one of my forebears, the first Commonwealth Statistician Sir George Handley Knibbs. And in some ways it seems that things have not changed!

When Knibbs was appointed 110 years ago, he was given the responsibility to set up the Commonwealth Bureau of Census and Statistics and to unify the states’ statistical collections. It soon became clear that the goal of uniform national statistics was not easily achieved, and it was necessary to undertake original compilations and to take over responsibility for some statistics. The first of these was commerce statistics, with shipping returns sent directly to the Commonwealth Bureau. The second was vital statistics. Both are ‘big data’ administrative sources that we continue to use to this day!

Knibbs also focussed on the task of producing quality, timely, relevant statistics. The main challenge for him was to produce timely, nationally consistent statistics from state data. Knibbs approved a series of prototype statistical forms to be used by each state, with the intention of streamlining the statistics obtained from each state and then producing national statistics. Despite in-principle agreement, the states were by no means united in the promptness with which they supplied the data, and the Commonwealth Bureau was unable to produce complete collections until all state input was received. Knibbs was understandably frustrated by this situation. For their part, state statisticians complained that Knibbs ignored Conference resolutions and did things his own way!

Knibbs left the Bureau in 1921 to take up the role of director of the newly constituted Commonwealth Institute of Science and Industry, the forerunner of CSIRO – personally demonstrating a strong link between statistics, science and innovation.

The Value of Official Statistics

So, what is the value of official statistics in the 21st century? In a set of advertisements for a prominent, internationally-accepted credit card several years ago, we were presented with an idyllic scene from a family holiday or some other prominent time in someone’s life journey.

The plot runs something along these lines: using this credit card is really helpful for purchasing a good or service, but the happiness and enjoyment received by the person or their family is much greater than the actual monetary cost. The outcome is truly “priceless”.

To cut to the chase, my conjecture is that the value of quality, relevant, timely statistics to governments, to business and the community is truly priceless. The value of ABS statistics to informed government decision making, to critical business decisions and to community awareness that is essential for democratic processes is, on any reasonable assessment, worth much more than the cost to taxpayers of funding ABS products and services. We provide public value far in excess of our annual budget cost.

To give some sense of value, I cannot think of any major decision made in Australia that did not have any regard to at least some quality information produced and provided by the ABS.

To highlight just a few examples of how ABS data is used:
    • for government policy (macroeconomic settings, through to distribution of GST and federal financial grants, planning key infrastructure such as schools, hospitals and roads according to timely population projections, our electorate boundaries, policy choices from industry assistance through to education, employment services, immigration and the environment, as well as regulatory arrangements to give just a few examples)
    • for businesses deciding whether to enter markets, expand their business operations or divert into different activities
    • for a range of community organisations and NGOs delivering services or advocating to governments or the community, and
    • for households or individuals seeking to understand job opportunities in particular locations, comparative house prices or change in the cost of living in our capital cities and the socio-economic features of respective communities and how Australia compares internationally.

These are just a few examples of how official statistics, and information derived from them, are essential to economic and social progress and community awareness in Australia.

Value of the ABS

The ABS is one of the Commonwealth government’s most important public facing agencies. We play an important role in delivering these official statistics, such as our population estimates, national accounts, inflation and labour force statistics.

With almost 500 statistical products released last year, 15 million visits to the ABS website and 3 million downloads of ABS data, the ABS collects and reports a range of economic, social and environmental data. These statistics are produced through a range of different methods to provide a picture of the Australian economy, society and environment – including detailed cross-sectoral surveys, time series collections, composite indicators, administrative by-product, population samples, Censuses and longitudinal and panel data.

Each of these methods of producing statistics has different costs and benefits, and the approach selected needs to be fit for the intended purpose.

In this pluralist data world, other organisations also play their role – and should be expected to play their role – in contributing to the rich information tapestry to inform the nation.

The ABS also has a legislated role to oversee and encourage consistency of standards across Australia’s national statistical system, recognising that official statistics are contributed by other organisations, such as the Australian Institute of Health and Welfare and the Bureau of Meteorology, to name but two.

Many of the challenges faced by the ABS are not unique to the ABS, but to a greater or lesser extent are faced by all national statistical offices, especially in developed economies, including:
  • the challenges and opportunities of delivering official statistics in an information age
  • the choices inevitably made by national statistical offices in an environment of growing expectations and constrained resources, and
  • choices made by governments around budget allocations made to national statistical offices and to other government services and costs, and over more recent years within the environment of prevailing pressures for fiscal consolidation.

Delivering official statistics in an information age

We are all familiar with the reference to living in an information age.
There are a number of features, such as the increased amount of information now available from a variety of sources, increasing access to information via mobile devices, and the increased power and reduced cost of computers. The democratisation of data increases the opportunity for many organisations and individuals to store and analyse very large amounts of information, and introduce new information beyond that produced by official statistics. And individuals and businesses expect that existing information is used effectively, reducing survey response burden wherever possible.

To give you a sense of the increasing information, as this slide shows, there is an unimaginable amount of data created every minute of every day (Business Insider Australia, 19 Aug 2015, accessed 2 May 2016, http://www.businessinsider.com.au/infographic-heres-how-much-data-is-created-on-the-web-every-minute-2015-8). For example, Google alone processes 20 petabytes of information per day, and it would take 223,000 DVDs to store a single petabyte (Computer Weekly, accessed 2 May 2016, http://www.computerweekly.com/feature/What-does-a-petabyte-look-like).
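As a rough check on the scale quoted above, the DVD comparison can be reproduced with a few lines of arithmetic. This is only a sketch: the 4.7 GB single-layer DVD capacity and the binary conversion (1 petabyte = 1,024 × 1,024 gigabytes) are assumptions chosen because they give a result close to the quoted 223,000 figure, and the function name is purely illustrative.

```python
# Assumption: binary convention, 1 PB = 1024 * 1024 GB.
GB_PER_PB = 1024 ** 2
# Assumption: capacity of a single-layer DVD in GB.
DVD_CAPACITY_GB = 4.7


def dvds_needed(petabytes, dvd_gb=DVD_CAPACITY_GB):
    """Approximate number of DVDs required to hold the given petabytes."""
    return petabytes * GB_PER_PB / dvd_gb


# One petabyte, then the 20 petabytes per day quoted for Google.
print(round(dvds_needed(1)))
print(round(dvds_needed(20)))
```

Under these assumptions a single petabyte comes to roughly 223,000 DVDs, so 20 petabytes per day would correspond to several million DVDs every day.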

But not all of this big data is structured in a way that can be easily used to understand individuals, businesses, and communities.

From the perspective of a national statistical office, this information age can be perceived as a benefit or a curse, and it probably has elements of both:
  • To some extent, national statistical offices benefit from the new and expanding data sources, as some information can be available to us in a more timely and cheaper way than through our current processes, and available as key inputs to processes of compiling official statistics. However, we need to be judicious about the quality of these potential inputs, and a cost-benefit lens is applied in these circumstances.
  • We recognise that some of these new information sources can be seen as competing with official statistics, where users then need to make a judgment call around the overall utility of competing information. That is their choice, but they should be well informed around the data they use and ensure it is fit for purpose.
  • In some instances, national statistical offices might choose not to continue to deliver some statistics in the future if there is a reputable, valued alternative supplied by the market, allowing the statistical office to divert its scarce resources to the production of higher priority statistics where no feasible alternatives are available or where it is essential for certain statistics to come from an official source.
  • The ABS has benefited from the reduced cost of computing and technology, and we now have many new opportunities to do more with the data and manage larger data sets very efficiently, which previous technology would have only allowed at prohibitive cost.

The information age also presents another key challenge for official statistics, which I would call the ambidextrous desire – where users want consistent, reliable time series data and also want us to introduce innovative measurement that better captures new features of our economy and society.

Many users of official statistics want to get information produced in a consistent way for a long period of time to allow for ready comparisons. National statistical offices do this very well. Estimates are prepared according to statistical standards (often international, to assist with robust international comparisons).

The international official statistics community regularly reassesses the suitability of international statistical standards in the light of the uses of the information and the ability to improve measurement, such as revising the System of National Accounts and updating international labour statistics standards.

Official statistics do need to be able to measure changes in the real economy and society. Over its history the ABS has improved the way it collects information on our economy and population and has broadened the range and complexity of the statistics we produce on industries and our social conditions.

Reconsideration of the breadth and depth of our statistical program is a constant for us, as we consider the need to better measure, for example, the productivity of the important health and education sectors of our economy and the growing sharing and digital economy.
There are undoubtedly pressures to deliver more information in more creative ways. However, many users of official statistics are not willing to compromise at all around the quality of information, in say national accounts or CPI estimates, for some modest improvements in timeliness.

Overall, my assessment is that the new information age provides more opportunities and more benefits than concerns.

These are not simple choices as they require national statistical offices to consider a range of factors, such as information uses and requirements, priority setting, the nature of internal data acquisition and processing approaches including technical methodological challenges, dissemination opportunities, privacy and security of data, and potential integration of new information sources with other official statistics.

Ultimately, it is the modern reality that national statistical offices need to be aware of the rapidly changing information landscape and make good judgments on where and how to (as well as when not to) utilise new information and new approaches for official statistical purposes – while also protecting the confidence and privacy of individuals, households and businesses.

Use of ‘big data’ in ABS – examples

As I mentioned, the ABS has a long track record of effectively working with big data. The Australian Census is a dataset covering the entire population, and the 2016 predominantly digital Census is one of the world’s most comprehensive, with 45 topics and 61 questions. The Census provides reliable information for very small groups of people and communities, and the Australian Census Longitudinal Dataset extends this further by bringing multiple Census snapshots together and creating a longitudinal view of the lives of Australians. I will talk more about the ACLD shortly.

Demography

The ABS also has considerable experience working with high volume public data. Our official population estimates use 40 million cross border movements each year to determine net overseas migration, along with about 350,000 changes in addresses registered through Medicare, and 300,000 births and 160,000 deaths collected through the state and territory registries.

Customs data

Our economic statistics have also been using big data since the inception of the ABS. Our International Trade area has been receiving customs import and export data on a continual basis since 1905. In 1904, it is recorded that 1,873 vessels carried 3.4 million net tons of goods from Australian ports. Now we receive millions of records each month to compile our international trade statistics.

Consumer Price Index

The big data used to compile official statistics is not limited to government sources. Since the March 2014 quarter, retail transactions data, including from supermarkets, has been used to price over 25% of the weight of the Consumer Price Index. This data has not only reduced the need to physically price products in stores but has improved the accuracy of the CPI by increasing the frequency of price observations, increasing the product and business coverage, and allowing more frequent updates to the weighting information. It could also open up new statistics in the future, subject to funding, of course, such as regional or monthly price measures.

The ABS is currently undertaking research to make even greater use of transactions data in compiling the CPI. The ABS research is focused on:
  • making use of automated methods and techniques that would allow the number of products priced in the compilation of the CPI to increase significantly; and
  • using revenue data for each product to weight the prices based on their economic importance each period.

Both of these actions enhance the measurement of inflation in Australia using ‘big data’.
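To illustrate what weighting prices by revenue can look like, here is a minimal sketch of one well-known revenue-weighted formula, the Törnqvist index, which averages each product's revenue share across two periods and applies it to the log price change. The choice of formula is an assumption made for illustration and is not presented as the method the ABS uses; the product names and figures are invented.

```python
import math


def tornqvist_index(prices_0, prices_1, revenue_0, revenue_1):
    """Tornqvist price index between period 0 and period 1.

    Each argument is a dict keyed by product. Revenue shares from both
    periods are averaged and used to weight the log price relatives.
    """
    total_0 = sum(revenue_0.values())
    total_1 = sum(revenue_1.values())
    log_index = 0.0
    for product in prices_0:
        share_0 = revenue_0[product] / total_0
        share_1 = revenue_1[product] / total_1
        price_relative = prices_1[product] / prices_0[product]
        log_index += 0.5 * (share_0 + share_1) * math.log(price_relative)
    return math.exp(log_index)


# Hypothetical scanner data for two products over two periods.
prices_0 = {"milk": 1.00, "bread": 2.50}
prices_1 = {"milk": 1.10, "bread": 2.50}
revenue_0 = {"milk": 600.0, "bread": 400.0}
revenue_1 = {"milk": 650.0, "bread": 350.0}

print(tornqvist_index(prices_0, prices_1, revenue_0, revenue_1))
```

Because the weights come from revenue in each period, products that account for more spending move the index more, which is the sense in which prices are weighted by their economic importance.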

Housing Finance

The ABS also receives on a daily basis from the Australian Prudential Regulation Authority, or APRA, housing finance commitments from significant lenders such as banks. These are used to produce statistics on housing finance but are also input into the Australian National Accounts and our Gross Domestic Product estimates.

Business Tax Data

We also use business income tax data from the Tax Office as an input into our business statistics and to inform the National Accounts. It is used in a range of ways, including ensuring accurate sampling, assessing the quality of the data and benchmarking the data, but most significantly it allows the ABS to include small businesses in the estimates without having to directly burden these businesses with surveys.

These are but a few examples of how the ABS has been using big data sources for many years.

New ‘Big’ ABS datasets

Big data is also created both through integrating large administrative datasets, and integrating data with the Census. This has been a focus of the ABS and major national statistical offices over the past decade and will be the focus of the ABS in the future. Multiple large datasets can be brought together over time to provide new statistical and policy insights.

Australian Census Longitudinal Dataset

For a statistical view of Australians' journeys through life, the Australian Census Longitudinal Dataset or ACLD integrates a sample from the 2006 Census of Population and Housing with the 2011 Census. The ABS will also link the 2016 Census. This dataset provides insights into the changes over each five year period for people and families. Because it is based on a 5% sample of the Census, the large sample size allows longitudinal analysis of groups of people that would be extremely difficult with conventional longitudinal surveys.

With the range of Census questions on the ACLD, transitions over time can be analysed from many different perspectives. For example, the 2014 Industry Report produced by the Department of Industry, Innovation and Science, used the dataset to analyse employment outcomes of Automotive Manufacturing workers. The analysis found that despite the magnitude of structural change in the Automotive Manufacturing Industry between the 2006 and 2011 Census, the employment outcomes for automotive workers were mostly positive – most workers exiting the sector managed to transition to other industries or sectors.

Over time, the Australian Census Longitudinal Dataset will continue to grow in value as data from each successive Census are linked, providing an extended longitudinal picture of the social and economic conditions in the lives of Australians.

The ACLD has been a huge success with over 8,000 users registered. The utility of the data has also been enhanced through linking it with selected administrative datasets including migrant settlements. We are also in the process of linking social security payment data. The 2016 ACLD is scheduled for release in December 2017.

Business Longitudinal Analytical Data Environment

In partnership with the Department of Industry, Innovation and Science, the ABS has built the business longitudinal analytical data environment or BLADE. BLADE combines several years of Australian Taxation Office administrative tax data with ABS business survey data to provide detailed information on the characteristics and finances of Australian businesses. Formerly known as the Expanded Analytical Business Longitudinal Database or EABLD, this integrated data environment enables analysis of businesses over time and includes the micro-economic factors that drive performance, innovation, job creation, competitiveness and productivity.

BLADE has already been used to examine the contribution of start-ups to job creation in the Australian economy, revealing that it is young small to medium enterprises that make the greatest contribution to overall jobs growth.

And just last week, BLADE was used to show that innovation active businesses outperform non-innovation active businesses. It also showed that the frequency of innovation matters as the positive impact of innovation gets stronger when businesses innovate more frequently.

The Commonwealth Treasury are also exploring the use of BLADE to develop an understanding of the role of competition in driving productivity across sectors of the economy, the drivers of small business performance and the role of government regulation on entry and exit in the marketplace.

Linked Employer-Employee Dataset

The foundational linked employer and employee dataset opens up a source of new insights into employment. This brings together personal income tax data with business data from BLADE and has already provided a better understanding of multiple job holding.

This represents an important first step, with future work to include data across multiple years and more detailed socio-economic and demographic information related to employees. Over time we should be able to explore the drivers of firm-level performance, such as the educational qualifications of employees. It will also provide more insights into job creation and job destruction as industries change over time, with the ultimate longer term goal to enhance our understanding of productivity, the changes in employment by industry, entry and exit to the labour market and other labour market dynamics.

Multi-agency Data Integration Project

There are also future data opportunities with our other data integration partnerships, such as the Multi-agency Data Integration Project. This brings together, for the first time, Census data with administrative data on health, income, and social security payments, to establish a rich cross-portfolio data resource that can be used for research and policy purposes. The project, using 2011 data, is currently in an evaluation phase and has significant potential to be extended across time (longitudinally) and expanded to include other data sources of importance to program evaluation and public policy.

But ultimately in the longer term we do not want a range of bespoke integrated datasets presenting partial views of people, businesses or communities – the ABS would prefer to establish integrated views that would be readily available to be used responsively to address current and emerging policy questions across a wide range of issues.

Other emerging big data sources?

But all of this is just the beginning. As I have already mentioned, the future big data sources will not be confined to government administrative data. The ABS has been exploring other data sources.

Freight GPS

The Commonwealth Bureau of Infrastructure, Transport and Regional Economics (BITRE) and the ABS are examining the feasibility of administrative telematics data as a source of road freight statistics. This can transform the way in which road freight statistics are collected and disseminated, and improve decision making for transport policy and by industry.

Due to the size and diversity of the road freight transport sector, we currently rely on traditional sample surveys, such as the Survey of Motor Vehicle Use and the user-funded Road Freight Movements Survey. These surveys are costly, impose significant respondent burden, and the data are released much later than they are collected. Telematics presents an opportunity to obtain this data as a by-product of already collected information.

Telematics data is generated by devices fitted to the vehicle that record GPS coordinates at time stamped intervals along with a range of data from the vehicle’s engine management system.
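As an illustration of how time-stamped GPS coordinates can be turned into a simple freight measure, the sketch below sums great-circle (haversine) distances between consecutive pings to estimate the distance and duration of a trip. The record layout, function names and sample coordinates are all hypothetical, and this is not a description of the approach BITRE or the ABS is evaluating.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_summary(pings):
    """Return (distance_km, duration_hours) for (timestamp, lat, lon) pings."""
    if len(pings) < 2:
        return 0.0, 0.0
    ordered = sorted(pings, key=lambda p: p[0])  # order by timestamp
    distance = sum(
        haversine_km(a[1], a[2], b[1], b[2])
        for a, b in zip(ordered, ordered[1:])
    )
    hours = (ordered[-1][0] - ordered[0][0]).total_seconds() / 3600
    return distance, hours


# Hypothetical pings from one vehicle (timestamp, latitude, longitude).
pings = [
    (datetime(2016, 12, 6, 8, 0), -35.28, 149.13),
    (datetime(2016, 12, 6, 9, 30), -34.75, 149.72),
    (datetime(2016, 12, 6, 11, 0), -33.87, 151.21),
]
distance_km, duration_hours = trip_summary(pings)
print(distance_km, duration_hours)
```

Straight-line distances between pings will understate the distance actually driven, so a real application would need frequent pings or matching to the road network.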

Satellite

The ABS has also investigated the feasibility of different methodological approaches to using satellite imagery to estimate crop area statistics. This work identified some opportunities but also challenges with using satellite imagery as a way to replace direct collection from farmers to produce agricultural statistics such as land use, crop type and crop yield.

Mobile phone

The ABS also sees the value of using mobile phone information, given the constant close proximity of most of our phones to our person – most of the time. We have investigated the methodologies for using mobile device location-based services for estimating day time population and measuring population mobility.

Summary

These are only three examples of the new opportunities for big data for official statistics, which can be integrated with traditional sources to produce existing or new official statistics.

It is essential that Australia leverages big data effectively to best equip it to answer the complex social and economic policy questions of the 21st Century.

However, there will be challenges for us to work through with using these new data sources. What is the quality? Are the questions we need to consider different to the way we have approached traditional sources? Are the ways in which we need to manage big data different? How should it be effectively used? What methods should be applied? How should users interpret the results of big data analysis?

This is an important new space for the ABS to be in. We have experience with working with very large datasets, we have the legislation and policy safeguards to ensure that the confidentiality of data is maintained, we have the methodological experience and we invest in techniques and technology to best use and protect the data.

ABS transformation

This work has been in the context of the ABS transformation. The ABS is an organisation that has traditionally taken a very cautious and conservative approach to innovation. This reflects the overriding demand of our key stakeholders that we continue to produce accurate, consistent time series with our key economic and population statistics.

However, even against this backdrop we cannot afford not to innovate in what we do and how we work, for a number of reasons.

Over the last few years, the ABS has been grappling with the parlous state of our statistical systems and the risk posed for Australia’s key statistics.

There are also clear gaps in our statistical products that need to be addressed – for example, service sector, productivity measurement, industry and community transitions, Australia’s environmental assets.

We need to have strong and ongoing engagement with key stakeholders about our statistical work so we understand the changing priorities for information – while the changing information environment is offering new sources of information that can be used to produce official statistics.

Community expectations are also challenging the past ABS business models that have collected a lot of data directly from households and businesses.

To continue to deliver the statistics Australia requires into the future, the ABS statistical business systems are being modernised through an investment by Government of $257 million over five years.

This is being complemented by an extensive transformation of our internal operations, with a focus on our partnerships, strategy, governance, people, culture and our property infrastructure. We are cognisant that taxpayers fund our services and we are determined – as much as possible – to continually improve the value to taxpayers from our stewardship of their resources.

This transformation is probably the most significant and comprehensive change in the history of the ABS. It will change all aspects of the business – not only the systems and processes for producing our statistics but also how we manage and support our business.

It is ambitious but it is necessary to ensure we can continue to produce the statistics required to make informed decisions in a rapidly changing world. It will also build capability for the ABS in the future.

We need to deliver this transformation while we still produce regular statistics that are high quality, relevant and on time.

We are eighteen months into this transformation.

Within these eighteen months we have gained clearer government expectations of the ABS, developed our stakeholder management plan, delivered some of our new key foundational infrastructure including our metadata registry repository, made some initial improvements to our statistical risk management and commenced the transition to our new culture and activity-based working environment, with high-performing and more diverse staff focussed on analytical rather than processing activities.

There is still more to be delivered in the next three and a half years, including our enterprise data management environment, systems to acquire household and business data – predominantly online – and moving to our re-engineered statistical processes and our future service model. Workforce capability and processes will require constant attention.

We need to achieve improvements to the ABS processes and reduce risk to key economic and population statistics, but also improve access to our statistics in a way that does not compromise the privacy and confidentiality of sensitive personal and business information.

This ABS future is to provide better statistical solutions and the statistics required for current and future decision making. We need to fundamentally improve the way information resources are managed. As I have said, it will include maximising, as much as possible, the use of existing information, not only the many government administrative data sources but exploring non-government data.

We will continue to collect information directly from people and businesses, but even here seek to maximise the value of this information through linking other datasets to our Census and surveys. We will have greater co-ordination of effort across our business and social statistics, creating more complete and coherent statistics.

In Conclusion

The ABS is working to achieve greater use of our information and evidence. We want ABS statistics to contribute to good decision making by governments, business, households and across the community, while naturally protecting the confidentiality of the sensitive information provided to us by households and businesses.

‘Big data’ can provide useful insights to aid good decision making, but it does need to be seen alongside other information resources.
With the inevitable challenge of limited resources, expanding user demands and keeping ahead of measurement challenges within a dynamic economy, society and environment, the ABS continues to face difficult choices around our priority statistics.

Iterative change will not be sufficient, and the additional investment we are making in new statistical processes and systems is key to our future capability.

The other critical part of our change is for the ABS to improve our partnerships with key stakeholders, to draw not only on their understandings and in some cases their resources, but to co-design new statistical solutions that provide greater insight than is possible from our current statistical program.

It is an exciting but demanding time for statistics, to provide governments, businesses and the community the information required for important decisions.

Thank you