Australian Bureau of Statistics

Newsletters - Methodological News - Issue 11, September 2003

A Quarterly Information Bulletin from the Methodology Division

September 2003

DEVELOPING EXPERIMENTAL SPATIAL PRICE INDEXES FOR AUSTRALIA
FORMAL ERROR ANALYSIS OF IMAGING AND RECOGNITION TO IMPROVE PROCESSING AND FORM DESIGN
SAMPLE AND FRAME MAINTENANCE PROCEDURES MANUAL
THE 2001 REVIEW OF SEIFA CONCEPTS AND METHODS
LABOUR FORCE SURVEY SEASONAL ADJUSTMENT ENHANCEMENTS
CONVERTING MAIL SURVEYS TO CATI: FORM DESIGN AND TESTING
INPUT SIGNIFICANCE EDITING TRIAL IN SIS YIELDS VERY GOOD RESULTS
TOWARDS INTEGRATED STOCK-FLOW HUMAN CAPITAL ACCOUNTS FOR AUSTRALIA
THE MEASUREMENT STRATEGY FOR REGISTER SHOCKS


DEVELOPING EXPERIMENTAL SPATIAL PRICE INDEXES FOR AUSTRALIA

The Australian Bureau of Statistics (ABS) has for many years published the Consumer Price Index (Consumer Price Index, Australia, cat. no. 6401.0). The Consumer Price Index (CPI) measures movements over time in the retail prices of goods and services commonly purchased by metropolitan households. Although a separate index is available for each of the eight capital cities (i.e. Sydney, Melbourne, Brisbane, Adelaide, Perth, Hobart, Darwin and Canberra), the eight indexes cannot be used to compare price levels between the cities.

This project assesses the feasibility of using existing CPI data to produce experimental measures of price differences between the eight capital cities (i.e. spatial price indexes). The indexes cover the year ended June 2002.

Using the CPI sample to compare prices between cities posed some theoretical and practical problems. For example, some items are priced in only one capital city and the item specifications may vary slightly from city to city. These properties of the dataset do not hinder the construction of intertemporal indexes, but are a major hindrance to the construction of spatial indexes. Thus, the first stage of the study addressed the problem of bridging gaps in the dataset and resolving differences in specifications.

The spatial price indexes were calculated using the multilateral Elteto-Koves-Szulc (EKS) formula. This formula directly compares prices of individual goods and services consumed in the eight capital cities and, in aggregating the individual price data, takes into account local consumption habits. The EKS formula has been used by the OECD's Purchasing Power Parities (PPPs) Program in the construction of its official PPPs.
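
As an illustration of the aggregation, a minimal EKS calculation over bilateral Fisher indexes can be sketched as follows. The city and item data below are purely synthetic, and the sketch omits the bridging and specification work described above.

```python
import math

def fisher(p_base, q_base, p_other, q_other):
    """Bilateral Fisher price index of the 'other' city relative to 'base'."""
    # Laspeyres: other-city prices valued at base-city quantities
    lasp = (sum(po * qb for po, qb in zip(p_other, q_base))
            / sum(pb * qb for pb, qb in zip(p_base, q_base)))
    # Paasche: other-city prices valued at other-city quantities
    paasche = (sum(po * qo for po, qo in zip(p_other, q_other))
               / sum(pb * qo for pb, qo in zip(p_base, q_other)))
    return math.sqrt(lasp * paasche)

def eks(prices, quantities, j, k):
    """EKS index of city k relative to city j: the geometric mean of
    Fisher comparisons chained through every city l, which makes the
    resulting set of indexes transitive."""
    m = len(prices)
    prod = 1.0
    for l in range(m):
        f_jl = fisher(prices[j], quantities[j], prices[l], quantities[l])
        f_lk = fisher(prices[l], quantities[l], prices[k], quantities[k])
        prod *= f_jl * f_lk
    return prod ** (1.0 / m)
```

Transitivity is the property that makes the multilateral form attractive: comparing city A with C directly gives the same answer as chaining A with B and B with C.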

So far indexes have been produced for the following CPI groups: food; alcohol and tobacco; clothing and footwear; household furnishings, supplies and services; health; transportation; communication; recreation; and education.

Preliminary assessments suggest that the index numbers look broadly plausible. However, price observations for some services, such as Housing and Miscellaneous, were deemed unsuitable for spatial comparisons and have consequently been excluded from the spatial price indexes for the time being. The index numbers for health, education and transportation showed larger inter-city variations than expected and need further investigation. Work will continue to improve these areas in the future.

A paper presenting the experimental spatial price indexes is planned, and any comments and suggestions would be appreciated. The paper reports work in progress; as a result, the indexes should not be used in policy debates, or for official purposes, until the ABS validates the statistics and publishes them in an official publication.

For more information, please contact Alex Waschka on (02) 6252 6992 or Shiji Zhao on (02) 6252 6053.

Email: alex.waschka@abs.gov.au or shiji.zhao@abs.gov.au.


FORMAL ERROR ANALYSIS OF IMAGING AND RECOGNITION TO IMPROVE PROCESSING AND FORM DESIGN

Background

The ABS is increasing its use of imaging and recognition (I&R) technology for capturing data from business surveys. This has generated considerable interest in the non-sampling errors associated with I&R, particularly those related to form design and processing, and in the opportunities for improvement offered by a planned I&R software upgrade.

To determine the extent of these problems, an error analysis was undertaken using data from the Economic Activity Survey (EAS), which had moved to I&R data capture for the most recent reference period. The main aims of this project were to:
  • determine common errors on the forms, and the extent of the errors, which may cause specific problems with imaging and recognition as the primary method of data capture (e.g. comments outside designated areas, insufficient space in answer boxes, nils, etc);
  • examine any other obvious errors relating to the design of forms (e.g. $,000 vs whole dollar reporting); and
  • identify possible improvements to general form design standards.

Process

Two samples of 120 previously processed Economic Activity Survey forms were selected, one each of a 'long' form (63 questions, some with multiple data items) and a 'short' form. As well as the original collection forms, two data files were obtained for each of the sampled respondents: the original repaired data file (i.e. after recognition errors and failures identified by the recognition process had been corrected), and the equivalent data file after output editing.

The paper forms were put through the complete imaging, recognition and repair process in a test environment so both the recognised and repaired values could be extracted and confronted. The repaired values were also compared with the original values provided to the collection area, and with the values after editing. All forms were manually inspected and a range of errors and usage patterns recorded.

Response analysis

The analysis gathered specific information on the effects of recognition and processing on data, with most of the issues identified through previous consultations with collection areas. The main finding from the analysis was that most of the commonly reported problems were not as prevalent as we were led to believe. Issues covered included: European 7's; diagonally crossed 0's; writing the word "nil"; brackets; negative values and dashes; non-black pen; white out/tape; spurious marks; crossed out questions and sections; crossed out answers and overwritten answers; obvious whole dollar reporting; answers running over the answer space provided; front of form label changes; and comments outside designated areas.

The three most common recognition errors were caused by spurious marks (22% of errors), use of white-out or tape (20%), and crossed-out and overwritten answers. In addition, there were significant problems with the reporting of nil or negative values, and with answer spaces that were too small or too close together (mainly tick boxes).

Several of these errors can be minimised with improved software (European 7's, diagonally crossed 0's, writing the word "nil"), others can be addressed through form design (data entry box spacing and size), while crossing-out and correction errors may indicate underlying problems with question wording, instructions or formats.

All ABS forms include an optional 'final comments' question, and, because space on survey forms always appears to be at a premium, the usefulness of this question has been a matter of particular interest. Comments were provided by 22% of respondents in this analysis. More than half of these related to data reported and would be useful during editing. A significant number also related to the status of the businesses surveyed and had frame and imputation implications. Under 5% of respondents had complaints.

The project resulted in ten recommendations, covering further investigation of some areas and solutions to identified problems through the new I&R software.

For more information, please contact Tracey Rowley on (02) 6252 5905.

Email: tracey.rowley@abs.gov.au.


SAMPLE AND FRAME MAINTENANCE PROCEDURES MANUAL

Imperfect frames are a well-known and inevitable source of non-sampling error in surveys. The aim of Sample and Frame Maintenance Procedures (SFMP) is to minimise the amount of non-sampling error caused by imperfect frames in ABS business surveys. SFMP are a set of standard rules which are followed when a difference is found between a business in the real world and its representation on the survey frame. Some of these differences, if not treated correctly, will result in large errors in survey estimates. SFMP are used not only to minimise these errors but also to ensure that these situations are treated correctly and consistently across the entire range of ABS business surveys.

Changes to SFMP are usually driven by major changes affecting the ABS Business Register or its source, the Australian Business Register (ABR). Businesses that register for an Australian Business Number are included on the ABR, which is maintained by the Australian Taxation Office. The ABR provides a complete register of operating businesses, and is available to the ABS as the main source of information for our statistical frames.

With the reform of business taxation in Australia, changes were imposed on SFMP which led to a revision of the ABS standard business rules. Subsequently, the new edition of the SFMP manual was released recently and documents the revised SFMP.

For more information, please contact Rosslyn Starick on (03) 9615 7689.

Email: rosslyn.starick@abs.gov.au.


THE 2001 REVIEW OF SEIFA CONCEPTS AND METHODS

The Socio-Economic Indexes for Areas (SEIFA) are a set of indexes summarising information from a large number of Census variables. The technique used is Principal Components Analysis, which derives weights that capture as much as possible of the common variation among the variables in the analysis. The weights are then applied to standardised raw data to derive the indexes.
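
A minimal sketch of this weighting step, on synthetic data and in pure Python; the actual SEIFA computation selects and standardises its variables far more carefully, so everything below is illustrative only.

```python
def standardise(columns):
    """Standardise each variable (a list of raw area values) to mean 0, sd 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        out.append([(v - mean) / sd for v in col])
    return out

def first_pc_weights(columns, iters=200):
    """Power iteration on the correlation matrix of the standardised
    variables: converges to the eigenvector with the largest eigenvalue,
    i.e. the first principal component loadings."""
    z = standardise(columns)
    p, n = len(z), len(z[0])
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / n for j in range(p)]
            for i in range(p)]
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    if sum(w) < 0:  # the sign of an eigenvector is arbitrary; fix it
        w = [-x for x in w]
    return w

def area_index(columns):
    """Apply the derived weights to the standardised data: each area's
    index score is the weighted sum of its standardised variables."""
    z = standardise(columns)
    w = first_pc_weights(columns)
    return [sum(w[i] * z[i][t] for i in range(len(z)))
            for t in range(len(z[0]))]
```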

The indexes were first officially calculated in 1990 using data from the 1986 Census. At the time, a great deal of work was done on identifying which variables should contribute to an index of disadvantage. Users and academics of the time were consulted to determine which variables to include.

For the 2001 indexes, a wide-ranging review was undertaken, covering the variables, the method and the output. The review included extensive user consultation: visits and teleconferences with users in each State; visits to Commonwealth Public Service departments; representation in information groups; and direct email with users.

The review started with an issues paper setting out what were thought to be the major issues. This was sent to each of the users and comments were sought. On the basis of these comments, a position paper was written, outlining the issue, the analysis and the proposed position on each issue. The position paper was released to users for comment.

Comments on the position paper were fairly positive. However, they did highlight one area of concern, the proposal to drop the postcode-level index, so that index was reinstated.

The review highlighted the fact that there was no variable selection strategy for SEIFA, so one was developed that prioritised variables into three groups: those directly related to socio-economic status (Income, Education and Occupation); those directly measuring an aspect of disadvantage (disadvantage is a broader concept than socio-economic status, covering aspects such as wealth, language and access to services); and those variables which could be used as a proxy for an aspect of disadvantage. The last group were only included in the Disadvantage Index for the 2001 SEIFA: there was no guarantee they were associated with the aspect of disadvantage being measured, and much of that disadvantage could already be captured by other variables. An example is Indigenous persons; it is difficult to pinpoint any additional disadvantage beyond what is already captured by the low income and unemployment variables.

This also means the 2001 Disadvantage Index uses the same variables (but different weights) as the 1996 Disadvantage Index, giving a consistent series.

The review also found that very few people used the Urban and Rural Indexes of Advantage, so they have been replaced with a single index of Advantage/Disadvantage. This index is on a continuum from high disadvantage to high advantage, rather than from high disadvantage to low disadvantage (which the Disadvantage Index measures). In the Advantage/Disadvantage Index, disadvantage variables are offset by advantage variables.

Additional work is planned before the 2006 indexes, including how income groups can be equivalised for family size, and whether any variables need to be age-standardised.

The indexes will be released in October.

For more information, please contact Robert Tanton on (02) 6252 5506.

Email: robert.tanton@abs.gov.au.


LABOUR FORCE SURVEY SEASONAL ADJUSTMENT ENHANCEMENTS

An enhanced seasonal adjustment process will be implemented from the November 2003 reference month. The enhancements involve accounting for two identified calendar-related effects and implementing the concurrent seasonal adjustment method.

Identified calendar effects

Other than in December and January, Labour Force Survey (LFS) interviews are generally conducted over two weeks, beginning on the Monday between the 6th and 12th of the month. Two calendar-related effects have been found in some Australia-level aggregates: a January interview start date effect, and an Easter effect in April.

Each year, LFS interviews for December start four weeks after the November interviews, and January interviews start five weeks after the December interviews. As a result, January interviewing may commence as early as the 8th or as late as the 14th. Employment conditions change markedly around the Christmas and New Year holiday period, so a changing interview start date for January may affect the survey estimates. A significant January interview start date effect has been found in the adult Female Employed and Female Part-time Employed series.

The timing of Easter with respect to the interview fortnight can impact on the survey estimates for some series in the month of April. Five different timings of Easter Monday in relation to the start of the survey fortnight for April are possible. Easter Monday could:
  • occur a week before the survey period;
  • coincide with the start of the first week of the survey reference period;
  • coincide with the start of the second week of the survey reference period;
  • immediately follow the end of the reference period; or
  • fall a week after the end of the reference period.

When Easter Monday coincides with the start of the second week of the survey reference period, a significant negative effect is found in the adult Female Employed and Female Part-time Employed series. A complementary positive effect is also found in Female Not in the Labour Force series.
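
The timing of Easter Monday relative to the April interview start can be sketched as below. The Easter date comes from the standard anonymous Gregorian algorithm; the alignment of reference weeks to interview weeks is simplified here, so the offsets are indicative only.

```python
from datetime import date, timedelta

def easter_sunday(year):
    """Anonymous Gregorian algorithm for the date of Easter Sunday."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day0 = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day0 + 1)

def april_survey_start(year):
    """First Monday falling between the 6th and 12th of April."""
    d = date(year, 4, 6)
    return d + timedelta(days=(7 - d.weekday()) % 7)

def easter_timing(year):
    """Offset in days of Easter Monday from the April interview start
    Monday. Both dates are Mondays, so the offset is a multiple of 7;
    each possible value corresponds to one of the five timings listed."""
    monday = easter_sunday(year) + timedelta(days=1)
    return (monday - april_survey_start(year)).days
```

For example, in 2003 Easter Monday (21 April) fell 14 days after the interview start (7 April), while in 2004 Easter Monday coincided with the interview start (12 April).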

Change in seasonal adjustment approach

Traditionally, the labour force time series have used forward factor seasonal adjustment, with factors generally estimated once per year. This method relies on an annual analysis of the latest available original data to project seasonal factors (known as forward factors) to be applied over the forthcoming 12 months.

Alternatively, concurrent seasonal adjustment uses the original time series available at each reference period to estimate seasonal factors for the current and previous months. Concurrent seasonal adjustment is technically superior to the annual forward factor method because it uses all available data to fine-tune the estimate of the seasonal component at each period. The ABS has demonstrated the advantages of the concurrent methodology in terms of improved seasonal factor estimates and reduced revisions. Concurrent seasonal adjustment has been implemented on many other ABS time series, with positive user acceptance of these changes. It has been decided to implement concurrent seasonal adjustment for the labour force time series starting in the November 2003 reference month.
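
As a toy illustration of the concurrent idea, the sketch below re-estimates multiplicative seasonal factors from the full series available at the latest period, using a simple ratio-to-moving-average method. The production seasonal adjustment method is considerably more sophisticated; this only shows why re-estimating each period uses the data more fully than projecting factors once a year.

```python
def seasonal_factors(series, period=12):
    """Toy ratio-to-moving-average estimate of multiplicative seasonal
    factors, using every observation available at the current period."""
    half = period // 2
    n = len(series)
    factor_sum = [0.0] * period
    factor_cnt = [0] * period
    for t in range(half, n - half):
        # centred moving average (half weight at the window ends)
        window = (0.5 * series[t - half] + 0.5 * series[t + half]
                  + sum(series[t - half + 1:t + half]))
        trend = window / period
        factor_sum[t % period] += series[t] / trend
        factor_cnt[t % period] += 1
    factors = [s / c if c else 1.0 for s, c in zip(factor_sum, factor_cnt)]
    mean = sum(factors) / period
    return [f / mean for f in factors]  # normalise to average 1

def concurrent_adjust(series, period=12):
    """Concurrent adjustment: re-estimate the factors from the full
    series each period, then divide them out of the original data."""
    factors = seasonal_factors(series, period)
    return [x / factors[t % period] for t, x in enumerate(series)]
```

Under the forward factor approach, `seasonal_factors` would be run once a year and its output frozen for the next 12 months; under concurrent adjustment it is rerun each month as new data arrive.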

For more information, please contact Mark Zhang on (02) 6252 5132.

Email: mark.zhang@abs.gov.au.


CONVERTING MAIL SURVEYS TO CATI: FORM DESIGN AND TESTING

Until now, the Australian Bureau of Statistics has made very limited use of Computer Assisted Telephone Interviewing (CATI) for business surveys. Organisational change, declining response rates and ongoing pressure to produce statistics faster have led to new interest in this mode of data collection. Recently, a CATI instrument for data capture during the very final stage of intensive follow-up (IFU) was developed for the Business Technology Survey. This instrument replaced the informal telephone data collection previously used for this survey.

The main forms design concerns were:
  • the questions in the paper form had to be reworded so they worked as an interview while minimising mode effects. In particular there were numerous questions with long "select all that apply" lists that had to be changed to obtain yes or no answers, while still obtaining similar data to the paper form;
  • the explanatory notes in the paper form were too long to read out every time. Notes had to be prioritised, in some cases moved to the question, and respondents had to be encouraged to ask for the rest. Getting good quality data had to be balanced with finishing the interview before respondent fatigue set in;
  • the screen design and functionality of the CATI instrument, developed in Blaise, needed to ensure the interview went smoothly and measurement error was minimal. Some standards could be adapted from the household Computer Assisted Personal Interviewing (CAPI) instruments used by the ABS, also developed in Blaise; however, the users involved and the types of questions asked are quite different;
  • the survey had two financial questions which could require record checking and/or a different contact person to the other questions, which could be awkward.

A testing strategy was developed to address these concerns and was implemented in a very short period. An expert review process continued throughout the development. Three different scripts were developed to address the wording and notes concerns. These scripts were tested as paper telephone interviews on randomly assigned groups of live respondents who were excluded from the main survey. The results were compared across scripts and with previous survey and testing results of the paper form. The main findings were that each option in lists did not need to be reworded into a separate question and that some basic questions could be open-ended. A final script combining the best parts of the three was tested again to ensure the new questions and notes worked.

At the same time a rough prototype was developed in Blaise. An informal user review of two different layout options was conducted: individual users went through both versions, and their behaviour and comments were recorded. The main finding emphasised the need for the layout to follow a normal reading path down the screen. The final script was incorporated into the next prototype. An informal group run-through of the instrument was conducted, followed by formal usability testing: one by one, each operator who would be using the CATI instrument phoned a pretend respondent, went through an entire interview, and was observed and debriefed. This allowed the words, layout, navigation and sequencing to be examined together. It was also excellent training in the instrument for the operators.

This process ensured not only an effective new data collection instrument, but a group of users who had contributed to the development of it and were therefore satisfied with the final result.

For more information, please contact: Emma Farrell on (02) 6252 7316.

Email: emma.farrell@abs.gov.au.


INPUT SIGNIFICANCE EDITING TRIAL IN SIS YIELDS VERY GOOD RESULTS

A parallel run of Input Significance Editing (ISE) and Service Industry Survey (SIS) intermediate editing was conducted for the 2001-02 Employment Services Survey. ISE is an editing approach applied at the input stage, intended to direct resources to the units expected to yield the most benefit from editing. Current applications of ISE are normally in surveys with recent historical data, since the method needs expected values for all units. A trial of ISE in SIS was undertaken to determine whether the method is applicable to surveys that do not necessarily have historical data.

For the SIS trial, nine key variables, or items, which contribute to key outputs were identified. Expected values, or imputes, were calculated for each key item using current survey data once sufficient responses had been received. Two types of imputes were tested: regression-based imputes; and a combination of means and medians of imputation classes.

Each unit was assigned an item score for each of the key items and a provider score which combined the item scores. Item scores were derived from the weighted difference of the unit's reported value and imputed value for the item.

ISE lists were generated at the provider level and for each item by ranking units according to the expected benefit in editing each unit. The scores were used to measure expected benefit. For each list, a cutoff was set for the level of cumulative benefit in editing all units above the cutoff. All units that were above the cutoff in the provider or any of the item lists were selected for editing. SIS kept snapshots of the survey data file before and after intermediate editing. These snapshots were used to analyse results.
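
The scoring and selection logic can be sketched as follows. The field names and the 90% cumulative-benefit cutoff are invented for illustration; the trial's actual weights and cutoffs were set per item and per list.

```python
def item_score(reported, imputed, weight):
    """Score reflecting the expected benefit of editing an item: the
    weighted absolute gap between the reported and expected values."""
    return weight * abs(reported - imputed)

def select_for_editing(units, cutoff=0.9):
    """Rank units by provider score (here, a precomputed combination of
    item scores) and select the top-ranked units accounting for `cutoff`
    of the total expected benefit; the remainder are left unedited."""
    ranked = sorted(units, key=lambda u: u["score"], reverse=True)
    total = sum(u["score"] for u in ranked)
    selected, cum = [], 0.0
    for u in ranked:
        if cum >= cutoff * total:
            break
        selected.append(u)
        cum += u["score"]
    return selected
```

The cost-benefit trade-off mentioned below arises directly from this ranking: units near the top of the list contribute most of the expected movement in the estimates, so editing effort concentrated there yields most of the benefit.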

Overall, the trial gave very good results. It showed that ISE was effective in prioritising units for editing, and the cost-benefit trade-off, as units were edited down the ranked lists, was quite strong. Provider scores performed well in summarising the nine key items. Mean and median imputes also performed reasonably well, which is a good outcome since they are much easier to calculate than the regression-based imputes. Moreover, most of the units selected for editing by ISE matched units flagged for editing by SIS using its current editing system. This was a desirable outcome, since SIS has a very good editing strategy.

For information on the SIS trial, please contact Elsa Lapiz on (03) 9615 7364.

Email: elsa.lapiz@abs.gov.au.

For general information on significance editing, please contact Keith Farwell on (03) 6222 5889.

Email: keith.farwell@abs.gov.au.


TOWARDS INTEGRATED STOCK-FLOW HUMAN CAPITAL ACCOUNTS FOR AUSTRALIA

The Analytical Services Branch has embarked on the next stage of its human capital project: Towards Integrated Stock-Flow Human Capital Accounts for Australia. This is a natural extension of the previous work on measuring the stock of human capital for Australia. Stocks are connected with flows: the changes in stocks between periods result from the accumulation of prior events, transactions and other flows. In the case of human capital, the stock in the long run depends on the rates at which:
  • new workers enter the work force; and
  • workers acquire knowledge, skills and other related attributes.

Of course, it also depends on the extent to which:
  • they manage to retain their acquired knowledge and skills; and
  • they retire from the workforce.

In order to provide a full account of the growth of human capital, it is necessary to establish an integrated stock-flow accounting system in which changes in the stock of human capital can be fully explained by investment and other flows in human capital.

Consistent with the choice of using the Jorgenson and Fraumeni approach to valuing the stock of human capital, this study uses the Jorgenson system of accounting for human capital, developed by Dale Jorgenson and his colleagues in the 1980s, to obtain estimates of human capital flows over periods and integrate them with the changes in the human capital stock between periods. The major features of this accounting system are summarised as follows:
  • it is based on the concept of human capital measured as the lifetime labour incomes for all individuals in the economy;
  • the change in the human capital stock from period to period is viewed as the sum of human capital formation, net of depreciation on human capital, and the revaluation of human capital;
  • human capital formation results from increases in the workforce and increments to lifetime incomes due to investment in formal education;
  • depreciation on human capital is considered as being due to ageing and decreases in the workforce;
  • the difference between gross human capital formation and depreciation on human capital is net human capital formation;
  • revaluation of human capital is considered to be due to changes in lifetime incomes over time for each age/sex/education group;
  • among other flows, investment in education, which causes increments to lifetime incomes for individuals who undertake additional schooling, stands out as the most important source of flows into the human capital stock.
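
The accounting relationships above can be sketched in a few lines. The growth and discount rates, the per-person framing, and the flat survival rates are illustrative assumptions, not the study's actual parameters.

```python
def closing_stock(opening, gross_formation, depreciation, revaluation):
    """Stock-flow identity: the closing stock equals the opening stock
    plus net formation (gross formation less depreciation) plus the
    revaluation of the existing stock."""
    return opening + (gross_formation - depreciation) + revaluation

def lifetime_income(incomes, survival, growth=0.02, discount=0.05):
    """Jorgenson-Fraumeni style backward recursion: lifetime labour
    income at age a equals current income plus the survival-weighted,
    discounted lifetime income at age a+1, grown at real rate g:
        LI[a] = y[a] + survival[a+1] * (1+g)/(1+r) * LI[a+1]."""
    factor = (1 + growth) / (1 + discount)
    li = [0.0] * len(incomes)
    li[-1] = incomes[-1]  # final working age: no future income remains
    for a in range(len(incomes) - 2, -1, -1):
        li[a] = incomes[a] + survival[a + 1] * factor * li[a + 1]
    return li
```

In this framing, the "increments to lifetime incomes due to investment in formal education" correspond to the jump in `lifetime_income` when a person moves to a higher-education income profile.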

The estimation of flows of human capital formation requires data on demography, education and immigration. In the short term, this information is either directly available or can be extrapolated indirectly from Census of Population and Housing data. In the long term, for refinement and reconciliation purposes, independent and more detailed data on education enrolment and completion rates are needed to derive separate estimates of investment in education. Data on migration are used to reconcile school enrolment data with estimates of transitions from lower to higher levels of educational attainment and demographic changes between different age/education groups.

Accordingly, a two stage estimation procedure has been proposed. The first stage starts with the census data and the experimental estimates of the human capital stock for the five census years from 1981 to 2001. Given data on schooling activity and recent new migrants, it should be possible to decompose the changes in the number of persons in each age/sex/education group between census years into three sources:
  • number of persons who have moved from one educational level to the higher ones;
  • number of persons who have immigrated;
  • number of persons who have emigrated.

In the second stage, when all required data on migration and education have been obtained, detailed estimates of human capital formation and depreciation will be made and reconciled with those derived from the census data.

If the acquisition and development of skills embodied in human beings are treated as production, the conventional production boundary as defined in the System of National Accounts could be expanded. This expansion would have a number of ramifications for the existing Australian National Accounts.

For more information, please contact Hui Wei on (02) 6252 5754.

Email: hui.wei@abs.gov.au.


THE MEASUREMENT STRATEGY FOR REGISTER SHOCKS

As a result of the reform of the Australian tax system in 2002 and the creation of the Australian Business Register (ABR), which is maintained by the Australian Taxation Office (ATO), ABS business surveys moved to new frames. The ABS Business Register now comprises ABS-maintained (large, complex) units from the old register, and simple and middle units from the ABR. The move to a new frame resulted in changes in estimates, caused by changes to the units model (structural changes to businesses) that produced population changes within particular ANZSIC classes and affected the scope of many collections.

A generalised methodology for measuring the impact on estimates was developed, covering both sub-annual and annual collections. The basis of the methodology was the creation of a new basis frame and the 'imputation' of data on the new basis for units on the survey frame for which actual data were not available. A method of backcasting old series was developed by Time Series. In practice it was expected that individual collections would need to fine-tune the general strategy to suit their own circumstances.

In respect of the measurement strategy the main challenges/difficulties were:
  • that surveys conducted less frequently than annually need some understanding of the changes to their series over time. The standard approach had been to ignore these effects because such surveys are conducted infrequently; in the end, specific assessments of bridging for ad hoc collections were required;
  • that the measurement frame and associated concordance files were not perfect (nor were they expected to be), and it took time to understand and gain assurance about the final version. The main issues were duplication between the simple population in the new ATO-maintained population and the ABS-maintained population; public versus private units (especially for surveys with limited scope); and resolving specific treatments for split Type of Activity Units caused by new business structure arrangements in the real world;
  • that most measurement outcomes could not be signed off before the new publication estimates were known, as originally planned. In the end, the final quality assurance process for each collection was done as part of the sign-off process for each publication. This enabled survey managers to satisfy themselves that the estimates on the old and new bases were coherent once the measured effects were taken into account.

A newly formed SSB Methodological Panel recently reviewed the measurement methodologies that could be employed for managing major shocks to series. It was an attempt to consider alternatives in light of the Tax Reform experiences.

The Stream 2 impact measurement required a new basis estimate to be calculated using old basis sample data. This was done by a mass imputation of all units on a new basis measurement frame. The method worked reasonably well for tax reform Stream 2, although improvements and research needs have been identified that should be progressed. One open issue in mass imputation methods is how to calculate standard errors.
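
In outline, the mass imputation idea can be sketched as follows. The simple ratio model and the auxiliary variable are invented here for illustration; the production method was considerably more involved.

```python
def mass_impute_estimate(sample, new_frame_aux):
    """New-basis level estimate via ratio mass imputation.

    `sample` holds (x, y) pairs from the old-basis sample, where x is an
    auxiliary value assumed known for every unit and y is the survey
    variable. Fit y ~ r * x on the sample, impute y for every unit on
    the new-basis frame from its auxiliary value, and sum the imputes."""
    r = sum(y for _, y in sample) / sum(x for x, _ in sample)
    return sum(r * x for x in new_frame_aux)
```

Because every unit's value on the new frame is an impute rather than an observation, the usual design-based variance formulas no longer apply directly, which is the standard-error problem noted above.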

The measurement strategy issues were also considered as a time series problem. Given that the purpose of a measurement strategy is to link the new series to the old at the changeover point, and that it may not be possible to estimate impacts directly because of sampling error (e.g. due to a higher than usual rotation rate), a three step process is more appropriate, noting that some of the steps may be skipped depending on the specific issue.
  • Estimate the impact using data and frames from 3 months before the changeover point. This impact estimate and/or backcast series would be released at the same time as the first new basis statistics.
  • An improved impact estimate can be made using the data from the first new basis survey. This impact would be released after the first new basis statistics, (e.g. 1 or 2 months after).
  • Once 3 cycles of new basis data are available, an even better impact estimate can be made using time series methods.

Another area worth investigating is how to re-weight an old basis sample to obtain new basis estimates. While this method shares many of the problems faced in tax reform Stream 2, it should be possible to obtain estimates that can be mapped over time prior to crossing over to a new basis. This approach has most potential for changes less radical than tax reform Stream 2, where businesses were changing structure.

Overall, it is considered feasible to develop improved methods for managing statistical impacts.

For more information, please contact Paul Sutcliffe on (02) 6252 6759.

Email: p.sutcliffe@abs.gov.au.



Commonwealth of Australia 2008

Unless otherwise noted, content on this website is licensed under a Creative Commons Attribution 2.5 Australia Licence together with any terms, conditions and exclusions as set out in the website Copyright notice. For permission to do anything beyond the scope of this licence and copyright terms contact us.