6503.0 - Household Expenditure Survey and Survey of Income and Housing: User Guide, 2003-04  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 09/06/2006  First Issue

2.4 DATA PROCESSING


DATA PROCESSING METHODS

Data from the 2003-04 HES and SIH were processed using computer-based systems built on a program known as BLAISE. A variety of methods were needed to process and edit the data, reflecting the different questionnaires used to collect data from the household, individual and diary components of the surveys. These processes are outlined below.


Coding and input editing of household and individual questionnaires

Internal system edits were applied in the computer-assisted interview (CAI) questionnaires to ensure the completeness and consistency of the responses being provided. The interviewer could not proceed from one section of the interview to the next until responses had been appropriately completed.


A number of range and consistency edits were programmed into the CAI questionnaire. Edit messages automatically appeared on the screen if the information entered was either outside the permitted range for a particular question, or contradicted information already recorded. These edit queries were resolved on the spot with respondents.
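
The range and consistency edits described above can be illustrated with a minimal sketch. The field names, permitted ranges and the consistency rule below are hypothetical; the actual edit rules programmed into the CAI questionnaire are not listed in this guide.

```python
# Hypothetical permitted ranges; the real CAI edit rules are not given in this guide.
PERMITTED_RANGES = {
    "age": (0, 120),
    "weekly_hours_worked": (0, 168),
}


def range_edits(response):
    """Return a message for each answer that falls outside its permitted range."""
    messages = []
    for field, (low, high) in PERMITTED_RANGES.items():
        value = response.get(field)
        if value is not None and not (low <= value <= high):
            messages.append(
                f"{field}={value} is outside the permitted range {low}-{high}"
            )
    return messages


def consistency_edits(response):
    """Return a message for each answer that contradicts one already recorded."""
    messages = []
    # Illustrative rule only: a person recorded as not employed
    # should not also report hours worked.
    if response.get("employed") is False and response.get("weekly_hours_worked", 0) > 0:
        messages.append("weekly_hours_worked > 0 contradicts employed=False")
    return messages


def run_edits(response):
    """Collect all edit messages that would be shown to the interviewer."""
    return range_edits(response) + consistency_edits(response)
```

In this sketch an empty list means the interview can proceed; any message returned would correspond to an edit query to be resolved with the respondent.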


Data from the CAI questionnaires were electronically loaded to the processing database on receipt in the ABS office in each State or Territory. There, checks were made to ensure data for all relevant questions were fully accounted for and that returns for each household and respondent were obtained. Problems identified by interviewers were resolved by office staff, where possible, based on other information contained in the schedule, or on the comments provided by interviewers.


Computer-assisted coding was performed on responses to questions on country of birth, occupation and industry of employment to ensure completeness. Data on relationships between household members were used to delineate families and income units within the household, and to classify households and income units by type.


Data capture and coding of individual HES diaries

HES diaries were collected from respondents two weeks after the initial household interview. They were then dispatched to the appropriate ABS office in each State or Territory. All reported expenditures in the diaries were entered into the BLAISE Diary Processing System. The BLAISE system helped operators to code diary items into HEC codes. A trigram coder enabled operators to select the appropriate good or service from an alphabetically ordered pick list of options. The system also deleted expenditure recorded in the diaries on items covered by the household questionnaire. For example, the household questionnaire collected information on mains gas payments so any payments coded to HEC code 0201010201 (Mains Gas - selected dwelling) were automatically deleted.
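
As a rough illustration of the two steps just described, the sketch below matches a typed description against a coding list by shared trigrams (three-character substrings) and then drops diary entries already covered by the household questionnaire. It is not the BLAISE implementation: the function names, the matching threshold and the small coding list extract are assumptions, and only the two HEC codes quoted in this guide are used.

```python
def trigrams(text):
    """Return the set of three-character substrings of the normalised text."""
    s = " ".join(text.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}


# Hypothetical extract of a coding list, built from items named in this guide.
CODING_LIST = {
    "burger rings": "0309030101",
    "corn chips": "0309030101",
    "pretzels": "0309030101",
    "twisties": "0309030101",
    "mains gas": "0201010201",
}

# Codes for items collected in the household questionnaire (example from the text).
COVERED_BY_HOUSEHOLD_QUESTIONNAIRE = {"0201010201"}


def pick_list(typed, coding_list=CODING_LIST, min_shared=2):
    """Return (item, code) candidates sharing at least min_shared trigrams
    with the typed text, in alphabetical order. min_shared is an assumption."""
    typed_grams = trigrams(typed)
    matches = [
        (item, code)
        for item, code in coding_list.items()
        if len(typed_grams & trigrams(item)) >= min_shared
    ]
    return sorted(matches)


def drop_covered_items(diary_entries):
    """Remove diary entries whose code is already collected by the household questionnaire."""
    return [
        entry for entry in diary_entries
        if entry["hec"] not in COVERED_BY_HOUSEHOLD_QUESTIONNAIRE
    ]
```

Trigram matching tolerates variant spellings: a misspelt entry such as "twistys" still shares several trigrams with "twisties", so the correct item appears in the pick list.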


The complete list of items classified to each expenditure code is called the HEC coding list. It is available to researchers who need detailed knowledge of the content of each expenditure code (see 2.3 'Data collection and data item description'). For example, a researcher may need to know the contents of HEC code 0309030101 (Potato crisps and other savoury confectionery), which the HEC coding list shows to contain Burger rings, Cheezels, chips (crisps), corn chips, Le snack, pretzels, Twisties and many others. During coding of data, some manual work was required to add codes to the coding list for goods not already listed and for variant spellings and punctuation of reported expenditures.


Additional editing

A range of processes was applied to the diary information to check that specific values were correctly coded if they were unusually high or low; that errors had not occurred in coding; and that relationships between household and diary information were consistent. A Query Resolution System ensured that:

  • an accurate record of decisions was made in resolving the queries;
  • coding of products was consistent;
  • the HEC coding list was updated for unusual or unknown products;
  • coders could continue to process diaries if they could not resolve an issue within a short time.

A range of edits was also applied to the household, individual and diary information to double check that logical sequences had been followed in the questionnaires; that specific values lay within expected ranges; and that relationships between items were consistent.


Unusually high expenditure and income values (termed statistical outliers) were investigated to determine whether there had been errors in entering the data. Such values were also examined for their effect on total income and expenditure estimates for Australia, but no action was deemed necessary.


Imputation for missing records and values

Some households did not supply all the required information but supplied sufficient information to be retained in the sample. Such partial response occurs when:

  • income or other data in a questionnaire are missing from the records of one or more non-significant persons because they are unable or unwilling to provide the data
  • all key questions are answered by the significant person(s) but other questions are not answered
  • not every person aged 15 or over residing in the household responds but the significant person(s) provide answers to all key questions
  • diaries are not all fully completed, but sufficient information is provided.

In the first and second cases of partial response above, the data provided are retained and the missing data are imputed by replacing each missing value with a value reported by another person (referred to as the donor).


For the third type of partial response, the data for the persons who did respond are retained, and data for each missing person are provided by imputing data values equivalent to those of a fully responding person (donor).


For the fourth type of partial response, the diary information provided is used to represent the missing information. For example, if the first week of diary entries is provided but not the second week then the first week of expenditure is used to represent expenditure for the second week.
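
The arithmetic of this substitution can be sketched as follows. The function name and the data layout (weekly totals keyed by HEC code) are assumptions made for illustration. The guide only gives the example of the first week standing in for the second; treating the reverse case symmetrically is also an assumption.

```python
def fortnight_expenditure(week1, week2=None):
    """Combine two weekly diary totals (dicts keyed by HEC code).

    If one week is missing (None), the completed week is used to
    represent it, so each amount is effectively counted twice.
    """
    if week1 is None and week2 is None:
        raise ValueError("at least one diary week is required")
    first = week1 if week1 is not None else week2
    second = week2 if week2 is not None else week1
    codes = set(first) | set(second)
    return {code: first.get(code, 0.0) + second.get(code, 0.0) for code in codes}
```

For example, a diary with 4.00 recorded against a code in the first week and no second week would yield 8.00 for the fortnight under this sketch.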


Donor records are selected by finding fully responding persons whose information on various characteristics (such as state, sex, age, labour force status, income and expenditure) matches that of the person with missing information. As far as possible, the imputed information is an appropriate proxy for the information that is missing. Depending on which values are to be imputed, donors are randomly chosen from the pool of individual records with complete information for the block of questions where the missing information occurs.
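
A minimal sketch of this kind of random donor selection is given below. It is not the ABS procedure: the matching keys, field names and the exact-match rule are assumptions chosen for illustration, and the guide does not say what happens when no matching donor exists.

```python
import random

# Hypothetical matching characteristics; the guide lists state, sex, age,
# labour force status, income and expenditure as examples.
MATCH_KEYS = ("state", "sex", "age_group", "labour_force_status")


def impute_from_donor(recipient, records, block_fields, rng=None):
    """Fill the recipient's missing block_fields from a randomly chosen donor.

    A donor must have complete information for every field in the block
    and must match the recipient on all MATCH_KEYS. Returns a new record,
    or None if no donor is available.
    """
    rng = rng or random.Random()
    donors = [
        r for r in records
        if r is not recipient
        and all(r.get(f) is not None for f in block_fields)
        and all(r.get(k) == recipient.get(k) for k in MATCH_KEYS)
    ]
    if not donors:
        return None  # handling of this case is not described in the guide
    donor = rng.choice(donors)
    imputed = dict(recipient)  # copy so the original record is left unchanged
    for f in block_fields:
        if imputed.get(f) is None:
            imputed[f] = donor[f]
    return imputed
```

Values the recipient did report are kept; only the missing values in the block are taken from the donor.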


