1504.0 - Methodological News, Dec 2008  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 12/12/2008   

Feasibility of data pooling in the ABS

The ABS faces competing pressures. On the one hand, there is demand for more precise estimates for small sub-populations. On the other hand, budgets are constrained, so the ABS needs to get more value from its existing collections. With these considerations in mind, the Analytical Services Branch (ASB) is exploring whether improved estimates can be produced by combining (i.e. pooling) data from multiple ABS collections. If successful, data pooling will allow the ABS to make better use of the data it already holds and to analyse more of the Australian population in greater depth.

The main aims of the ASB investigation are: to explore and understand the issues involved in pooling data from multiple sources; to develop a set of criteria for evaluating, case by case, whether gains can be obtained through pooling; and to propose techniques for effectively pooling data in common situations, under various assumptions.

The primary benefit of data pooling is increased sample size, which may allow key estimates to be produced with reduced sampling error. It may also be possible to use a pooled dataset to produce estimates for small populations, whose sampling errors were initially too high for publication.
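
The link between sample size and sampling error can be illustrated with a minimal sketch. It assumes an estimated proportion and uses the textbook approximation for its standard error, with an optional design effect to allow for a complex sample design. The function names, the design effect parameter and all numbers are hypothetical and are not taken from any ABS collection.

```python
import math


def se_proportion(p, n, deff=1.0):
    """Approximate standard error of an estimated proportion p from n respondents.

    deff is an assumed design effect (1.0 corresponds to simple random sampling).
    """
    return math.sqrt(deff * p * (1.0 - p) / n)


def relative_se(p, n, deff=1.0):
    """Relative standard error: the standard error as a fraction of the estimate."""
    return se_proportion(p, n, deff) / p


# Hypothetical example: one collection alone versus a pooled dataset
# with four times as many respondents in the sub-population of interest.
p = 0.5
n_single = 400
n_pooled = 1600

print("RSE, single collection:", relative_se(p, n_single))
print("RSE, pooled dataset:   ", relative_se(p, n_pooled))
```

Under these assumptions the standard error falls with the square root of the sample size, so quadrupling the sample halves the standard error. In practice the gain depends on the designs of the collections being pooled.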

However, inconsistencies between collections may introduce additional non-sampling error. This increase in non-sampling error must be weighed against the reduction in sampling error, to decide whether pooling is beneficial. Possible differences between collections to consider include:

    • differences in scope and/or coverage of the collections;
    • differences in enumeration periods;
    • differences in sample design and/or weighting procedures;
    • differences in questionnaires; and
    • differences in non-response.
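
One common way to weigh these two kinds of error is the mean squared error (MSE), which adds the sampling variance to the square of the bias. The sketch below assumes that the non-sampling error introduced by pooling can be summarised as a single bias term, which is a simplification, and it is not presented as the criteria ASB is developing. All names and numbers are hypothetical.

```python
import math


def mse(variance, bias):
    """Mean squared error of an estimator: sampling variance plus squared bias."""
    return variance + bias ** 2


def pooling_helps(var_single, var_pooled, bias_pooled, bias_single=0.0):
    """Return True if the pooled estimate has the lower MSE."""
    return mse(var_pooled, bias_pooled) < mse(var_single, bias_single)


def break_even_bias(var_single, var_pooled):
    """Largest pooled bias (in absolute value) for which pooling still lowers
    the MSE, assuming the single-collection estimate is unbiased."""
    return math.sqrt(max(var_single - var_pooled, 0.0))


# Hypothetical variances: standard errors of 0.025 (single) and 0.0125 (pooled).
var_single = 0.025 ** 2
var_pooled = 0.0125 ** 2

print(pooling_helps(var_single, var_pooled, bias_pooled=0.01))  # small bias
print(pooling_helps(var_single, var_pooled, bias_pooled=0.03))  # large bias
print(break_even_bias(var_single, var_pooled))
```

With these illustrative values, a small bias leaves the pooled estimate ahead, while a large bias more than cancels the reduction in sampling error.
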
ASB plans to explore the impact of each of these sources of non-sampling error on parameter estimates and variance estimates produced from a pooled dataset. It will conduct a number of case studies, using collections from the ABS Household Survey Program, to highlight some of the key issues.

The first case study, which is underway, looks at Indigenous labour force estimates by combining Labour Force Survey (LFS) data with data from the National Aboriginal and Torres Strait Islander Health Survey (NATSIHS). Currently, annual Indigenous labour force estimates are produced from the LFS by pooling Indigenous respondents from 12 months (ABS cat. no. 6287.0). This pooled sample allows broad aggregates of labour force characteristics to be published at the State and Territory level. However, high standard errors remain a problem, for example in States and Territories with smaller Indigenous populations and in remote areas. By introducing the NATSIHS sample into the pooled dataset, ASB will investigate what gains are possible in terms of reduced sampling error and more disaggregated estimates.
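
As a rough illustration of how estimates from two collections might be combined, the sketch below uses inverse-variance weighting. This is a standard textbook technique, not necessarily the one ASB will adopt. It assumes the two estimates are independent and unbiased for the same quantity, which the differences between collections listed earlier may well violate. The function name and numbers are hypothetical.

```python
def combine_estimates(est_a, var_a, est_b, var_b):
    """Combine two independent estimates of the same quantity.

    Each estimate is weighted by the inverse of its sampling variance,
    so the more precise collection contributes more to the result.
    Returns the combined estimate and its variance.
    """
    precision_a = 1.0 / var_a
    precision_b = 1.0 / var_b
    weight_a = precision_a / (precision_a + precision_b)
    combined = weight_a * est_a + (1.0 - weight_a) * est_b
    combined_var = 1.0 / (precision_a + precision_b)
    return combined, combined_var


# Hypothetical estimates of the same rate from two surveys.
estimate, variance = combine_estimates(0.50, 0.04, 0.60, 0.04)
print(estimate, variance)
```

Under the stated assumptions, the combined variance is never larger than the smaller of the two input variances, which is the gain that pooling aims for.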

However, there are a number of differences between the LFS and the NATSIHS that may introduce non-sampling error when the datasets are pooled. One key difference is in the questionnaires: the LFS uses a much more detailed set of questions than the NATSIHS to determine labour force status. ASB will attempt to quantify the effect of this inconsistency, and investigate methods for taking questionnaire differences (i.e. measurement error) into account when pooling the data.

For more information, please contact Russell Lim on (02) 6252 7346 or russell.lim@abs.gov.au.