Methodological News, Dec 2021

Features important work and developments in ABS methodologies

Released
10/12/2021

This issue contains two articles:

  • Feasibility study: Estimating the value of agricultural commodities produced using supermarket scanner data
  • Exploring simulated microdata to enhance data access

Feasibility study: Estimating the value of agricultural commodities produced using supermarket scanner data

The ABS currently uses a combination of survey and administrative data sources to calculate commodity values for its publication Value of Agricultural Commodities Produced (VACP). A feasibility study was recently conducted to explore whether supermarket scanner data could be used to estimate ‘farmgate’ prices for selected agricultural commodities, in place of the data currently received from marketing reports and wholesalers.

Approach

The initial exploratory stage of this analysis focussed on four main commodities: bananas, avocados, pumpkins and mushrooms. A high proportion of each commodity is sold as fresh produce, so they were considered promising in terms of the potential alignment between VACP and scanner data.

The ABS compared a range of VACP and scanner data variables to assess whether there was a suitable relationship between the two sources. If a sufficiently strong relationship was found, scanner data would then be used to model and produce gross values for some VACP commodities.

Key relationships explored included:

  • Variety-level comparisons of monthly unit prices (city, state, and national level)
  • Commodity-level comparisons of monthly unit prices (national level only)
  • Ratio of scanner data to VACP monthly unit prices
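
The comparisons listed above can be sketched in a few lines of Python. This is an illustration only: the monthly unit prices are invented, the names `vacp_prices`, `scanner_prices`, `monthly_ratios` and `pearson` are hypothetical, and the Pearson correlation is assumed here as one possible measure of alignment. The study does not state which measures it used.

```python
from math import sqrt
from statistics import mean

# Invented monthly unit prices ($/kg) for a single variety; not ABS data.
vacp_prices = [1.80, 1.95, 2.10, 2.05, 1.90, 1.85]
scanner_prices = [3.50, 3.80, 4.15, 4.00, 3.75, 3.60]


def monthly_ratios(scanner, vacp):
    """Ratio of scanner to VACP unit price for each month."""
    if len(scanner) != len(vacp):
        raise ValueError("series must cover the same months")
    return [s / v for s, v in zip(scanner, vacp)]


def pearson(x, y):
    """Pearson correlation between two equal-length price series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


ratios = monthly_ratios(scanner_prices, vacp_prices)
correlation = pearson(scanner_prices, vacp_prices)

print([round(r, 2) for r in ratios])
print(round(correlation, 3))
```

The same two functions could be applied to series aggregated at city, state or national level, or at variety or commodity level, provided both series cover the same months.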

Results

Overall, results at the national level were better aligned than city and state results. Every commodity showed a good relationship between VACP and scanner data in some analyses but a poor relationship in others. None of the relationships explored showed good alignment between VACP and scanner data for all four commodities.

When comparing prices at the variety level, Cavendish bananas showed the most promising results. However, when prices were compared at the commodity level, the alignment between VACP and scanner data for bananas was very poor.

Mushrooms had the worst price alignment at both the commodity and variety level, with the flat and button mushroom varieties showing particularly poor alignment. However, when the ratio of scanner data to VACP prices was investigated, the mushroom varieties had the most consistent ratios (i.e. minimal fluctuations over the time period examined).

In contrast, the monthly prices of pumpkins were well aligned between the datasets, both at the commodity level and for the Kent and butternut varieties. However, the price ratios fluctuated widely between months, suggesting poor alignment between VACP and scanner data.
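
The mushroom and pumpkin results suggest that closeness of price levels and stability of the price ratio are separate properties. The sketch below illustrates this with invented numbers. It assumes two measures that the study does not confirm: the mean absolute percentage gap for price levels, and the coefficient of variation of the monthly ratio for ratio stability. The function names `mean_abs_pct_gap` and `ratio_cv` are hypothetical.

```python
from statistics import mean, pstdev


def mean_abs_pct_gap(scanner, vacp):
    """Average absolute gap between the prices, as a share of the VACP price."""
    return mean(abs(s - v) / v for s, v in zip(scanner, vacp))


def ratio_cv(scanner, vacp):
    """Coefficient of variation of the monthly scanner/VACP price ratio."""
    ratios = [s / v for s, v in zip(scanner, vacp)]
    return pstdev(ratios) / mean(ratios)


# Invented series, not ABS data.
vacp = [2.00, 2.20, 2.40, 2.10]

# Case A: scanner price is a steady multiple of the VACP price.
steady_multiple = [v * 2.5 for v in vacp]

# Case B: scanner price is close to VACP but moves above and below it.
close_but_noisy = [2.30, 1.95, 2.70, 1.85]

for label, scanner in [("A", steady_multiple), ("B", close_but_noisy)]:
    gap = mean_abs_pct_gap(scanner, vacp)
    cv = ratio_cv(scanner, vacp)
    print(label, round(gap, 3), round(cv, 3))
```

In case A the price levels are far apart but the ratio barely moves. In case B the levels are close but the ratio fluctuates from month to month.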

Outcome & future work

Although some results showed promise, no relationship was consistently strong across all four commodities. However, the results demonstrated that there is value in using scanner data to support validation of the outputs produced for VACP. The ABS will continue to explore the use of scanner data as a data confrontation tool for VACP, that is, as an independent source against which VACP outputs can be compared.

For more information, please contact Soraya McPhail.

Exploring simulated microdata to enhance data access

The ABS is exploring the use of simulated data to make microdata more accessible for users. Simulated, or synthetic, datasets can be generated using aggregate statistics from real data. This approach aims to replicate the statistical properties of the original data while ensuring confidentiality requirements are met.

The ABS currently makes deidentified, detailed microdata available through the ABS DataLab, a secure remote access service. Use of the DataLab is restricted to approved researchers, and all projects require approval from the ABS. Some projects also require consideration by public sector data custodians. Simulated microdata could be safely explored by researchers before they obtain approved DataLab access.

The approach being explored uses the Vale-Maurelli method to create numeric variables and multinomial regression models to create categorical variables. The Vale-Maurelli method simulates numeric variables using their first four moments (mean, variance, skewness, and kurtosis) and their covariance matrix. For each categorical variable, a multinomial regression model is fitted with the numeric variables as predictors, and the categories are then sampled from the predicted probabilities.
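
The categorical step described above can be sketched as follows. This is a minimal illustration rather than the ABS implementation: it assumes a multinomial logit (softmax) form for the model, the coefficients in `COEFFICIENTS` are made up rather than fitted to any data, the category labels are placeholders, and the rows in `simulated_numeric` stand in for numeric variables that would already have been simulated with the Vale-Maurelli method.

```python
import math
import random

# Hypothetical coefficients for a three-category variable (not fitted to real data).
# Each category has an intercept and one weight per numeric predictor.
COEFFICIENTS = {
    "cat_a": {"intercept": 0.0, "weights": [0.0, 0.0]},  # reference category
    "cat_b": {"intercept": -0.5, "weights": [0.8, -0.3]},
    "cat_c": {"intercept": 0.2, "weights": [-0.4, 0.6]},
}


def predicted_probabilities(predictors):
    """Softmax of the linear predictors: one probability per category."""
    scores = {
        cat: p["intercept"] + sum(w * x for w, x in zip(p["weights"], predictors))
        for cat, p in COEFFICIENTS.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {cat: math.exp(s - top) for cat, s in scores.items()}
    total = sum(exps.values())
    return {cat: e / total for cat, e in exps.items()}


def sample_category(predictors, rng):
    """Draw one category at random according to the predicted probabilities."""
    probs = predicted_probabilities(predictors)
    categories = list(probs)
    weights = [probs[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]


rng = random.Random(2021)  # fixed seed so the sketch is repeatable

# Placeholder rows of already-simulated numeric variables.
simulated_numeric = [[0.5, -1.2], [-0.3, 0.4], [1.7, 0.9]]
simulated_categories = [sample_category(row, rng) for row in simulated_numeric]
print(simulated_categories)
```

Sampling from the predicted probabilities, rather than always taking the most likely category, keeps the simulated categorical variable from being a deterministic function of the numeric variables.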

This process results in a safe, simulated dataset which could be made accessible to a wider range of users. If the current research is successful, the ABS will consider seeking approval from data custodians to take this approach further. Before making simulated data accessible to researchers, the ABS would also fully consider the privacy impacts of the project and minimise any disclosure risk. The long-term aim is to have a simulated dataset which could be used:

  1. by prospective DataLab researchers to test their code and gain an understanding of the assets available in the DataLab environment
  2. as a tool to facilitate capability building both in the ABS and externally

For more information, please contact Tiernan Byrne.

Contact us

Please email methodology@abs.gov.au to:

  • contact authors for further information
  • provide comments or feedback
  • be added to or removed from our electronic mailing list

Alternatively, you can post to:

Methodological News Editor
Methodology Division
Australian Bureau of Statistics
Locked Bag No. 10
Belconnen ACT 2617

The ABS Privacy Policy outlines how the ABS will handle any personal information that you provide to us.

Previous releases

Releases from June 2021 onwards can be accessed under research.

Issues up to March 2021 can be found under past releases.