Using scanner data to estimate household consumption, September 2021

Experimental Food Estimates for Household Final Consumption Expenditure

Released
6/09/2021

1. Abstract

The Australian Bureau of Statistics (ABS) has been collecting ‘point of sale’ systems data from supermarkets, otherwise known as “scanner data” or “transactions data”, since 2011. The main use of scanner data in the ABS has been in the compilation of the Consumer Price Index (CPI). This paper outlines methods relating to quarterly experimental Food estimates based on supermarket scanner data for the 5206.0 National Accounts publication. The paper also demonstrates the potential scanner data has for official statistics due to its coverage, frequency, and granularity.

The paper details how scanner data will be used to produce current price and volume estimates of food consumption for the National Accounts. Additionally, the paper assesses the quality and success of these methods against the ABS Data Quality Framework and outlines how this data will be incorporated into future National Accounts releases.

Authors (alphabetical order): Rishab Babji, Ashleigh Kidd, Sameer Nawaz and Andy Peisker

National Accounts Branch, Macroeconomic Statistics

The ABS welcomes comments and suggestions from readers. To provide feedback, please email national.accounts@abs.gov.au. 

2. Introduction

The Australian Bureau of Statistics (ABS) is focused on maximising the value of ABS outputs, and utilising alternative data sources such as data from ‘point of sale’ systems from supermarkets (scanner data). These data sources offer significant benefits due to their high frequency, coverage, and granularity. They also have the potential to reduce collection costs, and the reporting burden placed on the Australian community.

This information paper provides the key findings of recent work to develop experimental estimates of Food consumption within the Australian National Accounts. The paper outlines the concepts, data sources, methodology, and quality assessment of the experimental estimates as well as a plan for how these estimates will be incorporated into the National Accounts.

3. Background

The ABS has a strong focus on using new and emerging data sources to produce new statistical insights and improve the quality of existing outputs. The National Accounts has been at the forefront of this endeavour, including the development of more accurate and timely measures of household consumption using transactions data. The research paper, Recent applications of supermarket scanner data in the National Accounts, highlighted some initial applications of scanner data to produce indicators and analytical insights of household consumption. The ABS has undertaken further research to develop more comprehensive methods to compile estimates of food consumption using the scanner data. As part of this work, the ABS has developed new quarterly benchmark (level) estimates of food consumption which have been built up from product information in the scanner data. This work demonstrates improvements to the accuracy and timeliness of existing measures of food consumption in the National Accounts.

4. What is Household Final Consumption Expenditure?

Household Final Consumption Expenditure (HFCE) in the National Accounts measures the value of goods and services purchased by Australian households. It is the largest component of Gross Domestic Product (GDP), accounting for around 60% of the total. HFCE is compiled in accordance with international standards contained in the System of National Accounts, 2008 (SNA). Australia’s application of the SNA standard is described in Australian System of National Accounts: Concepts, Sources and Methods, 2020-21.

HFCE estimates are currently based on the Household Expenditure Survey (HES) and the Retail Industry Survey and Wholesale Industry Survey (RIS WIS) which are only available every 5 to 6 years. These estimates are updated on a quarterly basis using more frequent survey data (including the Retail Trade Survey).

The ABS publishes both annual and quarterly estimates of HFCE, broken down using the Classification of Individual Consumption According to Purpose (COICOP) and by State and Territory. Quarterly estimates of HFCE are aggregated to the COICOP group level, with more granular information available on an annual basis.

5. What is supermarket scanner data?

The ABS currently receives scanner data from major supermarket chains. This data accounts for approximately 84% of all expenditure by consumers through supermarkets. The datasets are supplied weekly with the following dimensions:

  • Product/item description
  • Quantity of items sold
  • Dollar value of items sold
  • Geographical location of stores

The ABS currently utilises this data to compile estimates for the Consumer Price Index (CPI) and Apparent Consumption of Food Stuffs.

6. Methodology

6.1. Overview

Food consumption in the National Accounts is an inherently complex concept to measure due to the broad variety of sources from which households obtain food. The ABS has adopted a layered approach, by which scanner data is a base layer, and other components build up to the National Accounts concept of food consumption. These steps are described in further detail below.

6.2. Outlier removal and imputation

Scanner data is a by-product of the point of sale systems of businesses. Methods for data cleaning, outlier removal and imputation are therefore required to ensure the data is fit for purpose in the National Accounts.

For example, scanner data can occasionally include systems abnormalities, or duplication in sales due to product refunds, which may lead to unusually large sales figures. Where possible, the ABS has applied standard statistical techniques such as winsorisation, and has developed automated imputation techniques which detect and treat outliers in large-scale transactions datasets.
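
As an illustration of the kind of treatment described above, winsorisation caps extreme values at chosen cut-offs rather than deleting them. The sketch below is a minimal Python example and not the ABS production method: the function names, the percentile cut-offs and the weekly sales figures are all assumptions made for illustration.

```python
def percentile(sorted_values, pct):
    """Linear-interpolation percentile of an already sorted list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("no values supplied")
    position = (len(sorted_values) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def winsorise(values, lower_pct=5.0, upper_pct=95.0):
    """Cap values below/above the chosen percentiles at those percentiles."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical weekly sales ($) for one product; the 9800.0 value mimics a
# duplicated or erroneous record.
weekly_sales = [410.0, 395.0, 402.0, 9800.0, 388.0, 420.0, 405.0, 399.0]
treated = winsorise(weekly_sales, lower_pct=20.0, upper_pct=80.0)
```

The series keeps its length, so downstream aggregation is unaffected, while the extreme record is pulled back to the upper cut-off. In practice the cut-offs would need to be tuned to the product and the length of the series.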

6.3. Mapping scanner data to standard classifications

In its raw form, the scanner data comprise more than 2 million distinct products (at the Stock Keeping Unit [SKU] level). These products include both food and non-food items. The scanner data are mapped to the Input-Output Product Classification (IOPC) system, which is considered the building block of the National Accounts and is the lowest level of classification used in these experimental estimates. These estimates are then aggregated to the Supply Use Product Classification (SUPC), followed by aggregation to the Classification of Individual Consumption According to Purpose (COICOP). See Figure 1 for a schematic of the aggregation structure.

The ABS has developed a machine-learning tool called the Intelligent Coder (IC) for large-scale classification tasks such as the mapping of supermarket scanner data to IOPC. Future work on these estimates will use this tool to classify products. At present, natural language processing methods combined with manually developed concordances are used to map the scanner data to standard classifications, in line with traditional data mapping methods.

    Figure 1 - Aggregation structure diagram product classifications in the National Accounts

    This figure illustrates the aggregation structure used in the experimental estimates of HFCE Food. Via a pyramid diagram, it shows how Input-Output Product Classification (IOPC) categories, which are the finest level of product detail published in the National Accounts, aggregate to the Food COICOP group via intermediate levels. The two main intermediate levels are SUPC (Supply-Use Product Classification) and IOPG (Input-Output Product Group). At each level of aggregation, examples of product categories are given; for example, the IOPCs "Almonds and Macadamias" and "Onions" aggregate to the "Fruits, Nuts and Vegetables" SUPC and then to the "Other Agriculture" IOPG.
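
The aggregation path described in section 6.3 (SKU to IOPC to SUPC to COICOP) can be sketched as a chain of concordance look-ups. This is a minimal illustration only: the IOPC and SUPC names are taken from the Figure 1 description, while the SKU descriptions, dollar values and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical concordances. The IOPC and SUPC names come from the Figure 1
# description; the SKU descriptions and dollar values are invented.
SKU_TO_IOPC = {
    "RAW ALMONDS 500G": "Almonds and Macadamias",
    "MACADAMIAS ROASTED 200G": "Almonds and Macadamias",
    "BROWN ONIONS 1KG": "Onions",
}
IOPC_TO_SUPC = {
    "Almonds and Macadamias": "Fruits, Nuts and Vegetables",
    "Onions": "Fruits, Nuts and Vegetables",
}
SUPC_TO_COICOP = {"Fruits, Nuts and Vegetables": "Food"}


def aggregate(values, concordance):
    """Sum values keyed by a lower-level code up to the mapped higher-level code."""
    totals = defaultdict(float)
    for code, value in values.items():
        totals[concordance[code]] += value
    return dict(totals)


sku_sales = {
    "RAW ALMONDS 500G": 1200.0,
    "MACADAMIAS ROASTED 200G": 800.0,
    "BROWN ONIONS 1KG": 500.0,
}
iopc_sales = aggregate(sku_sales, SKU_TO_IOPC)
supc_sales = aggregate(iopc_sales, IOPC_TO_SUPC)
coicop_sales = aggregate(supc_sales, SUPC_TO_COICOP)
```

Because each level is a pure sum of the level below, totals are preserved at every step. A missing concordance entry raises a `KeyError`, which is one simple way to surface unmapped products.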

    6.4. Layered approach to estimating final food consumption

    The food category published in the National Accounts includes all food purchased by households mainly for consumption or preparation at home. The concept extends beyond supermarkets and food retailers to include other sources of food such as backyard food production and food sourced from non-profit institutions. However, pre-prepared meals provided by caterers, restaurants and similar establishments are not included in Food, and instead are recorded in the services components of HFCE. Given the relatively broad concept of food consumption, there is a need to make use of a variety of data sources to estimate total food consumption, including ABS surveys and alternative data sources such as the scanner data.

    The scanner data currently received by the ABS covers approximately 60-65% of food consumed in Australia by households. The remaining 35-40% comes from other sources, such as supermarkets not included in the scanner data received by the ABS and speciality food stores such as bakers, delicatessens, wholesalers, and farmers’ markets. Under the layered approach, the estimates are built up sequentially, with scanner data providing the initial layer upon which three additional layers are applied in sequence. This approach can be seen in Figure 2.

     

     

      Figure 2 - Components of food consumption and their contributions

      This figure illustrates the conceptual coverage of various data sources used in the experimental estimates, depicted as circles ("layers") with surface area proportional to the component's coverage in dollar terms. It shows that the scanner data from major supermarkets (layer 1, innermost circle) has around 63% coverage of all food consumed. Other supermarkets have around 12% coverage (layer 2), the rest of the retail industry has 13% coverage (layer 3) and the remaining 12% come from sources including purchases from wholesalers, manufacturers and home-grown food amongst others (layer 4).
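
The sequential build-up can be made concrete with the approximate shares quoted in the Figure 2 description. The short Python sketch below only accumulates those shares; the labels and the `cumulative_coverage` helper are illustrative names, not ABS terminology.

```python
# Approximate shares of total food consumption by layer, as quoted in the
# Figure 2 description (per cent).
LAYER_SHARES = [
    ("Layer 1: scanner data supermarkets", 63),
    ("Layer 2: other supermarkets", 12),
    ("Layer 3: other retail", 13),
    ("Layer 4: non-retail sources", 12),
]


def cumulative_coverage(layers):
    """Return (label, running total of coverage in per cent) for each layer."""
    running = 0
    result = []
    for label, share in layers:
        running += share
        result.append((label, running))
    return result


coverage = cumulative_coverage(LAYER_SHARES)
```

The running totals are 63, 75, 88 and 100 per cent, so each added layer closes part of the gap between scanner data and the full National Accounts concept of food consumption.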

      A range of data sources and bespoke methods were developed for each layer and are briefly described below. Further detail of the method can be found in Appendix 2.

      Layer 2 – Non-scanner data supermarkets (smaller supermarket chains)

      Layer 2 represents food purchased from supermarkets which are not covered by the scanner data received by the ABS. Layer 2 uses the Retail Trade survey, which includes monthly estimates of sales from all Australian supermarkets and grocery stores, to measure total sales. The product detail from the scanner data the ABS does receive is applied to the product estimates for these supermarkets to ensure that the composition of product categories reflects changes in household spending patterns over time. This method assumes the changes in the composition of products recorded in the scanner data the ABS receives are representative of changes across all supermarkets.

      Layer 3 – Food consumed from non-supermarket retailers

      This layer represents retail industry sales of food outside supermarkets and grocery stores and makes up around 13% of total food consumption. These include sales from speciality food stores and from non-food retailers such as pharmacies and newsagents. This layer also makes use of the Retail Trade survey to measure the sales of these retailers. To allocate these sales to products (at the IOPC level), the estimates were weighted by proportions estimated using the RIS WIS; these proportions are carried forward through the time series.

      Layer 4 – Food consumed from non-retail industry sources 

      This layer represents consumption outside the retail industry, including sales from service industries such as casinos, ships and airlines, as well as sales by wholesalers and manufacturers direct to households. Spending by overseas residents and businesses, which is out of scope of household consumption, is also removed as part of this layer (Net Overseas Expenditure and any business expenditure at supermarkets or retail outlets not removed in previous layers). This layer makes up around 12% of food consumption. A number of associated adjustments are made for these individual components, using a range of data sources such as industry and household surveys, Business Activity Statement data, Reserve Bank of Australia (RBA) transaction data and research reports from external providers. While these data sources combine to produce good quality estimates for layer 4, work is ongoing to maximise the use of new and emerging data sources to supplement these existing sources and produce more timely and accurate estimates. These components are also included in the published HFCE estimates and are described further in the Australian System of National Accounts. Figure 3 shows these layers in further detail.

        Figure 3 - Average composition of food consumption in Australia, 2015-2021

        This figure expands on figure 2 by highlighting, via a to-scale coverage pie chart, more component detail within the four layers. It shows that, for example, within layer 3, the large majority of the food purchases are from speciality food stores such as bakers and delicatessens, and a small part comes from other non-food retailers such as newsagents, pharmacies and 2-dollar shops. Within layer 4, it shows that food purchased from the services sector (e.g. entertainment venues) has a similar contribution to total food as purchases from wholesalers and that grown at home, as well as to all remaining sources which include purchases from manufacturers, as well as from flea markets and airlines.

        6.5. Price and volume estimation

        Chain volume measures of household food consumption have been estimated using price indexes designed at the lowest level of aggregation (IOPC). The aim is to improve the accuracy of volume measures by applying deflation at detailed product levels on a quarterly basis. Deflators have been applied to each product category (IOPC) using established methods of measuring price change in food based on the scanner data. These methods were applied with slight modifications to CPI price index compilation, to suit the level of price granularity required for HFCE deflators. This is the first time the ABS has produced price indexes for the purpose of deflating HFCE at such a low level of consumption on a quarterly basis and improves the accuracy of the chain volume measures for Food in HFCE.

        To produce real food consumption measures, the relevant nominal consumption measure (as described in section 6.4) is divided by the price index and referenced to the most recent quarter. Referencing means that the series is scaled up or down so that the reference quarter volume measure matches the nominal measure. The Laspeyres chained volume measure approach is then used to aggregate prices and nominal consumption measures.
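
The deflation and referencing step, and a single Laspeyres volume link, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names and all figures are hypothetical, and the period-to-period link shown here is only one way to chain; the chaining frequency and other details used in the National Accounts are not specified in this section.

```python
def deflate_and_reference(nominal, price_index, reference=-1):
    """Divide nominal values by a price index re-based to the reference period.

    After re-basing, the index equals 1 in the reference period, so the volume
    measure there matches the nominal measure.
    """
    base = price_index[reference]
    return [value / (index / base) for value, index in zip(nominal, price_index)]


def laspeyres_volume_growth(prev_nominal, curr_nominal, price_relatives):
    """Laspeyres volume growth between two periods.

    Current-period values are revalued at previous-period prices (by dividing
    by each product's price relative) and compared with previous-period values.
    """
    revalued = sum(curr_nominal[k] / price_relatives[k] for k in curr_nominal)
    return revalued / sum(prev_nominal.values())


# Hypothetical quarterly series for one product category.
nominal = [100.0, 104.0, 110.0]
price_index = [98.0, 100.0, 102.0]
volumes = deflate_and_reference(nominal, price_index)

# Hypothetical two-product example of one Laspeyres link.
prev = {"Onions": 50.0, "Almonds and Macadamias": 50.0}
curr = {"Onions": 55.0, "Almonds and Macadamias": 63.0}
relatives = {"Onions": 1.10, "Almonds and Macadamias": 1.05}  # p_t / p_(t-1)
growth = laspeyres_volume_growth(prev, curr, relatives)
```

In the first example the last quarter is the reference quarter, so its volume equals its nominal value of 110. In the second, nominal spending rises by 18% but, after removing price change, volume growth is about 10%.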

        7. Comparison with published HFCE Food

        Comparison of the published and experimental HFCE Food series (Figure 4) shows the experimental estimates are on average 7% above the currently published estimates. This reflects differences in the datasets currently used to compile HFCE (primarily RIS WIS and the Retail Trade Survey) compared to the scanner data. The scanner data suggests a higher and growing proportion of supermarket sales are food items compared with previous RIS WIS measures (Figure 5). The difference in this proportion explains over 80% of the divergence between the official food and the experimental food estimates.

        The product level granularity available through these methods allows more accurate measurement of food and non-food sales, resulting in higher levels for the experimental series and the ability to better reflect changes in food consumption patterns over time. The increased difference between the experimental and published food estimates from 2020 onward can be attributed to the shift in composition towards greater food sales during the COVID-19 period.

        The shift in composition is also evident when comparing growth rates, most notably in March quarter 2020 when the growth rates are noticeably different (Figure 6). The experimental estimates improve on the accuracy of the existing Retail Trade indicator which does not account for the shift in the mix of products purchased from supermarkets. 

        The experimental estimates based on scanner data can provide additional insights into how food consumption is changing over time.  In particular, the IOPC based estimates allow for detailed analysis at the product level and can show how particular food products contribute to annual and quarterly growth (Figure 7). 
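
One common way to express how individual products contribute to growth in a current price total is the percentage-point contribution: each product's change divided by the previous-period total. The Python sketch below illustrates that arithmetic only; the product names and values are hypothetical and the function is not taken from ABS systems. For chain volume measures, contributions are generally not exactly additive, so this simple form applies to current price values.

```python
def contributions_to_growth(previous, current):
    """Percentage-point contribution of each product to growth in the total.

    contribution_i = 100 * (current_i - previous_i) / sum(previous)
    The contributions add up to the percentage growth in the total.
    """
    total_previous = sum(previous.values())
    return {
        product: 100.0 * (current[product] - previous[product]) / total_previous
        for product in previous
    }


# Hypothetical current price estimates ($m) for two consecutive quarters.
previous_quarter = {"Bread": 400.0, "Fresh vegetables": 350.0, "Confectionery": 250.0}
current_quarter = {"Bread": 410.0, "Fresh vegetables": 371.0, "Confectionery": 249.0}

contributions = contributions_to_growth(previous_quarter, current_quarter)
total_growth = 100.0 * (sum(current_quarter.values()) / sum(previous_quarter.values()) - 1.0)
```

Here total growth is about 3%, of which fresh vegetables contribute roughly 2.1 percentage points, bread about 1.0 and confectionery about -0.1.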

        Analysis of changes at the detailed product level can also be undertaken for growth in volume terms. Figure 8 compares the growth trajectories for selected products in real terms, and shows the diversity in growth rates over time. This level of detail and explanatory power is not possible with existing data sources which are highly aggregated.

        8. Quality assessment

        The experimental food estimates were assessed against each of the 7 dimensions of quality as outlined in the ABS Data Quality Framework. The ABS Data Quality Framework is designed for use in assessing the quality or “fitness for purpose” of estimates or indicators in a variety of settings. Fitness for purpose implies an assessment of the experimental food estimates, with specific reference to its intended objectives or aims in measuring household food consumption in HFCE. The summary assessment outlined in Figure 9 describes how the experimental estimates are an improved measure of food consumption in HFCE. This assessment is described in further detail in Appendix 1 of this paper.

          Figure 9. Summary of ABS Data Quality Framework assessment for the Experimental Food Estimates

          This figure shows the seven ABS dimensions of quality for published statistics as bubbles, with accompanying comments indicating how the experimental food estimates assess against them. The comments imply that using scanner data as a base for food measurement improves the quality of the resulting statistics. More information on this assessment can be found in Appendix 1.

          9. Future work

          The ABS continues to explore a range of possible applications for alternative data sources in the National Accounts as well as across the broader suite of statistics. A key focus is continuing to improve the automation of data editing and mapping by further exploring the use of machine learning algorithms for maintenance and outlier detection to maximise the continuing fitness for purpose of the scanner data.

          Further refinements and testing of the food consumption estimation methods discussed in this paper will be a key focus over the next 12 months. The ABS plans to introduce experimental quarterly food consumption estimates based on this method into the 5206.0 National Accounts publication in September quarter 2021, published alongside official estimates. These will replace the existing data sources used in official estimates of food in September quarter 2022, following consultation with users and stakeholders.

          In addition to scanner data, the ABS continues to source new and emerging data relating to household consumption and expenditure. This includes transactions datasets from additional retailers and from other organisations such as commercial banks and other private sector entities. In addition to the work currently underway in the National Accounts, the ABS is also looking at how these data sources can support the production of household expenditure surveys through Re-imagining the Household Expenditure Survey.

          The ABS will continue to explore these new data sources and estimation techniques relating to Australian household consumption and expenditure measurement, ensuring that our estimates remain current and reflective of the Australian economy. Although surveys will continue to play a central role in economic measurement, the use of a wide variety of data sources will continue to grow ensuring relevance, accuracy, and greater timeliness of official statistics.

           

          Appendix 1 – Detailed ABS Data Quality Framework: Assessment of the Experimental Food Estimates in HFCE

          Institutional Environment

          The experimental estimates are underpinned by data obtained through long-standing arrangements with data providers. The cost of collecting the scanner data is much lower than that of survey collections, as automated collection processes are in place rather than phone or survey form collections. Appropriate confidentiality rules have been applied to the scanner data to protect the information of all businesses involved. All information is handled in accordance with the Australian Privacy Principles contained in the Privacy Act 1988.

          Relevance

          The experimental food estimates have advantages in terms of relevance for users due to the ability to target food consumption growth and its drivers more directly in the quarterly HFCE results. Furthermore, the scope of the experimental estimates more closely aligns with the conceptual framework of the Australian National Accounts as outlined in the Australian System of National Accounts.

          This improvement addresses a current deficiency in the estimation of household food consumption in the National Accounts and will ultimately result in more accurate estimates of GDP. The enhanced timeliness, analytical and interrogative power of these estimates will also work to improve the relevance of the estimates for users who will have better quality information, and more timely results for household consumption.

          Timeliness

          The timeliness of the experimental estimates is contingent on the weekly submission of results from the providers of scanner data. These estimates reduce the reliance on infrequent survey collections, thus significantly improving the timeliness of results for household consumption.

          Accuracy

          Currently, official estimates of Food consumption within HFCE are predominantly compiled using aggregated data, with various adjustments applied to account for scope differences. The experimental estimates, however, use data within a methodology that is more closely aligned with the conceptual framework of the Australian National Accounts. As a result, the experimental estimates are a more accurate measure of household food consumption.

          Improvements to the estimates are observable in both the aggregated and the disaggregated results, as the methodology allows for the measurement of compositional change at the product level. Additionally, the ability to align frequent, high-quality price indexes with the products within HFCE allows deflation at low levels of aggregation and thus more accurate volume estimation.

          Coherence

          The estimates were produced over a period of 5 years, which is long enough to compare with similar statistical products within the ABS. When the estimates are compared with existing measures of food (such as the HFCE publication and Retail Trade), the quarterly movements of the experimental estimates are generally aligned, except in 2020, when substantial changes in food consumption due to product compositional change were observed.

          Interpretability

          The experimental food estimates have been designed to align with the scope and coverage of HFCE outlined in the conceptual framework of the Australian National Accounts.

          Additionally, the higher level of detail present within the estimates of lower level product categories allows for the interrogation of drivers of growth for food consumption in each quarter. This will enable higher quality statistical reporting, which will improve the interpretability and understanding of changes in food consumption among users.      

           

          Appendix 2 – Detailed descriptions of methods for building up food measure from scanner data

          Layer 2 – Food consumed from non-scanner data supermarkets (smaller supermarket chains)

          The scanner data covers approximately 84% of food sales from supermarkets and grocery stores, with the remaining 16% coming from businesses in this industry that do not provide data. A separate data source is therefore needed to measure this component, referred to as “layer 2”. Layer 2 (around 12.5% of total food sales) is measured using the Retail Trade survey, which produces monthly level estimates of sales from all Australian supermarkets and grocery stores. However, data availability for layer 2 is limited to either outdated product category information from RIS WIS or more timely but less detailed data from Retail Trade. A key problem with relying on these sources alone is the use of fixed weights at the product category level. The quarterly food consumption levels for supermarkets and grocery stores are calculated using the RIS WIS product category estimates, moved forward by the timelier Retail Trade aggregates. Additionally, the scanner data is used to ensure the compositional integrity of product categories within the layer 2 product estimates. The steps in this method are as follows:

          1. Use the scanner data to directly measure the IOPC sales levels in layer 1
          2. Split the difference between the Retail food benchmark and the total scanner data food into product categories by using proportions identified in the Retail and Wholesale Industries Survey 2012-13 (RIS WIS), updated for relevance using up-to-date scanner data information
          3. Aggregate the corresponding product levels from the first two steps to obtain “layer 2” estimates

          This is an improvement on existing approaches, which rely heavily on the RIS WIS estimates, because the new approach is able to track the product composition of supermarket food sales over time, keeping estimates relevant and robust to compositional changes such as those observed during 2020.
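
The three steps listed above can be sketched in Python as follows. This is an illustration under assumptions rather than the ABS implementation: the function name, the dollar values and the residual shares are hypothetical, and the way the RIS WIS proportions are updated with scanner data is not specified here, so the shares are simply passed in as an input.

```python
def layer2_estimates(scanner_iopc, retail_food_benchmark, residual_shares):
    """Build product-level estimates for all supermarkets and grocery stores.

    scanner_iopc: food sales by IOPC measured directly from scanner data (step 1)
    retail_food_benchmark: total food sales for the industry (hypothetical input)
    residual_shares: proportions used to split the uncovered residual (step 2);
        must sum to 1
    """
    if abs(sum(residual_shares.values()) - 1.0) > 1e-9:
        raise ValueError("residual shares must sum to 1")
    residual = retail_food_benchmark - sum(scanner_iopc.values())
    products = set(scanner_iopc) | set(residual_shares)
    # Step 3: add the allocated residual to the directly measured scanner sales.
    return {
        p: scanner_iopc.get(p, 0.0) + residual * residual_shares.get(p, 0.0)
        for p in products
    }


scanner = {"Onions": 84.0, "Almonds and Macadamias": 126.0}  # hypothetical $m
shares = {"Onions": 0.4, "Almonds and Macadamias": 0.6}      # hypothetical
estimates = layer2_estimates(scanner, retail_food_benchmark=250.0, residual_shares=shares)
```

With these inputs the residual of 40 is split 16 and 24, giving 100 for Onions and 150 for Almonds and Macadamias (up to floating-point rounding), and the product estimates sum back to the benchmark.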

          Layer 3 – Food consumed from non-supermarket retailers

          The resulting layer 2 measure still excludes food sold outside the supermarket and grocery store industry; therefore, further layers are needed to capture this. Layer 3 (making up around 13% of total food sales) accounts for retail industry sales of food outside supermarkets and grocery stores, such as sales from speciality food stores and from non-food retailers such as pharmacies and newsagents. These estimates also made use of the Retail Trade survey, for the full range of published categories apart from Restaurants and Takeaways, which are not included in the HFCE definition of food. As many of these non-supermarket categories predominantly sell non-food items, these estimates were weighted by food-sales fractions estimated using the Retail and Wholesale Industries Survey 2012-13, which also allowed for estimates of product categories which could be carried forward through the time series. The corresponding product category estimates were added to layer 2 to obtain the layer 3 estimates.
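
The layer 3 weighting can be sketched as two fixed-weight multiplications: a food-sales fraction for each retail category, then a product split of that food total. The Python example below is a minimal sketch; the category names, fractions, product shares and values are all hypothetical and stand in for the survey-based weights described above.

```python
def layer3_food_by_product(retail_sales, food_fractions, product_shares):
    """Allocate non-supermarket retail sales to food product categories.

    retail_sales: sales by retail category for one period
    food_fractions: share of each category's sales that is food (fixed weights)
    product_shares: for each category, the split of its food sales across products
    """
    totals = {}
    for category, sales in retail_sales.items():
        food_sales = sales * food_fractions[category]
        for product, share in product_shares[category].items():
            totals[product] = totals.get(product, 0.0) + food_sales * share
    return totals


# All categories, fractions and values below are hypothetical.
retail_sales = {"Specialised food retailing": 1000.0, "Pharmacies": 2000.0}
food_fractions = {"Specialised food retailing": 0.9, "Pharmacies": 0.05}
product_shares = {
    "Specialised food retailing": {"Bread": 0.5, "Meat": 0.5},
    "Pharmacies": {"Confectionery": 1.0},
}
layer3 = layer3_food_by_product(retail_sales, food_fractions, product_shares)
```

Because the fractions and shares are held fixed, only the Retail Trade sales move the estimates from period to period, which is the main limitation of fixed weights compared with the scanner-based composition used for supermarkets.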

          Layer 4 – Food consumed from non-retail industry sources 

          To capture all remaining food consumption, layer 4 (making up around 12% of total food sales) covers sales outside the retail industry, including sales from non-restaurant service industries such as casinos, ships and airlines, as well as sales by wholesalers and manufacturers direct to households, while also accounting for out-of-scope spending in the scanner data from overseas residents and businesses. A number of associated adjustments are made for these individual components, using a range of data sources such as industry and household surveys, Business Activity Statement data, RBA transaction data and research reports from external providers, amongst others. While these data sources combine to produce a good quality adjustment for layer 4, work is ongoing to maximise the use of new and emerging data sources to supplement these existing sources and ensure the most timely and accurate estimates are maintained. These components are also included in the published HFCE estimates and are described further in the Australian System of National Accounts.

          Data Downloads

          Table 1. Experimental Food Estimates HFCE, current price and volume, COICOP Group, SUPC and IOPC, Original