4671.0 - Household Energy Consumption Survey, User Guide, Australia, 2012  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 24/09/2013  First Issue

DATA PROCESSING METHODS

Computer-based systems, built on a software program known as BLAISE, were used to collect and process the data from the HECS. A variety of methods were employed to process and edit the data, reflecting the different questionnaires used to collect data from the household, individual and longitudinal components of the HECS. The following subsections outline these processes.

Coding and input editing of household, individual and longitudinal questionnaires
Additional editing
Imputation for missing records and values


Coding and input editing of household, individual and longitudinal questionnaires

For the household and individual questionnaires, internal system edits were applied in the computer-assisted interview (CAI) questionnaires to ensure the completeness and consistency of the responses being provided. The interviewer could not proceed from one section of the interview to the next until responses had been appropriately completed.

A number of range and consistency edits were programmed into the CAI questionnaires. Edit messages automatically appeared on the screen if the information entered was either outside the permitted range for a particular question, or contradicted information already recorded. These edit queries were resolved on the spot with respondents.
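The two kinds of edit described above can be illustrated with a minimal Python sketch. It is not the BLAISE implementation: the record fields (age, employed, weekly_hours_worked), the permitted ranges and the message wording are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    """Hypothetical record for one respondent; field names are illustrative."""
    age: int
    employed: bool
    weekly_hours_worked: int


def range_edit(value: int, low: int, high: int, label: str) -> Optional[str]:
    """Return an edit message if value falls outside [low, high], else None."""
    if not low <= value <= high:
        return f"{label} = {value} is outside the permitted range {low}-{high}"
    return None


def consistency_edit(response: Response) -> Optional[str]:
    """Return an edit message if hours worked contradict employment status."""
    if not response.employed and response.weekly_hours_worked > 0:
        return "Hours worked reported for a person recorded as not employed"
    return None


def run_edits(response: Response) -> List[str]:
    """Collect every edit message triggered by a response."""
    checks = [
        # Assumed ranges, chosen only to make the example concrete.
        range_edit(response.age, 0, 120, "age"),
        range_edit(response.weekly_hours_worked, 0, 168, "weekly_hours_worked"),
        consistency_edit(response),
    ]
    return [message for message in checks if message is not None]
```

In this sketch an empty list from run_edits means the interview could proceed, while any returned message corresponds to an edit query to be resolved with the respondent.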

Data from the CAI questionnaires were electronically loaded to the processing database on receipt in the ABS office in each state or territory. Office checks were made to ensure data for all relevant questions were fully accounted for and that returns for each household and respondent were obtained. Problems identified by interviewers were resolved by office staff, where possible, based on other information contained in the schedule, or on the comments provided by interviewers.

Computer-assisted coding was performed on responses to questions on country of birth, occupation and industry of employment to ensure completeness. Data on relationships between household members were used to delineate families and income units within the household, and to classify households and income units by type.

For the longitudinal questionnaire, some system edits were utilised, but these were much more limited than those in the CAI questionnaires. In addition, some office checks were made for completeness of questionnaires obtained from households that responded through the Internet. All remaining items that were not reported are disseminated as 'not collected'.

Additional editing

A range of edits was also applied to the household and individual information to double check that logical sequences had been followed in the questionnaires; that specific values lay within expected ranges; and that relationships between items were consistent.

Unusually high values (termed statistical outliers) were investigated to determine whether there had been errors in entering the data. Such values were also examined for their effect on aggregate income and expenditure estimates for Australia and action was taken where necessary.

Imputation for missing records and values

Some households did not supply all the required information, but supplied sufficient principal information to be retained in the sample. Such partial responses occur when:

  • income or other data in a questionnaire are missing from the records of one or more non-significant persons because they are unable or unwilling to provide the data
  • all key questions are answered by the significant person(s) but other questions are not answered
  • not every person aged 15 years and over residing in the household responds but the significant person(s) provide answers to all key questions

In the first and second cases of partial response above, the data provided are retained and some missing data are imputed by replacing missing values with a value reported by another person with similar characteristics (referred to as the donor).

For the third type of partial response, the data for the persons who did respond are retained, and data for each missing person are provided by imputing data values equivalent to those of a fully responding person (the donor).

The HECS did not impute data for the following types of items:
  • Household energy characteristics, including sources of energy used and energy efficiency characteristics
  • Billing arrangement characteristics, except for the frequency of fixed regular payments where applicable households did not know the frequency
  • Discounts or rebates on electricity or gas bills
  • Household expenditure, where applicable, on:
    • GreenPower
    • Supply and consumption charges for gas and electricity
    • Feed-in tariffs for solar electricity
  • Household energy consumption variables, including data linked from the BSRED
  • Financial stress items
  • Household energy perceptions and behaviour data
  • Household heating, cooling, lighting and appliance data (i.e. any data collected from the paper form)
  • Any data collected from the longitudinal component


Donor records are selected by finding fully responding persons whose information on various characteristics (such as state, sex, age, labour force status and income) matches that of the person with missing information. As far as possible, the imputed information is an appropriate proxy for the information that is missing. Depending on which values are to be imputed, donors are randomly chosen from the pool of individual records with complete information for the block of questions where the missing information occurs.
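The donor selection described above resembles what is often called hot-deck imputation. The following Python sketch shows the general idea under stated assumptions and is not the ABS procedure: the function names, the dictionary-based records and the use of exact matching on every characteristic are hypothetical simplifications (in practice a characteristic such as income would more plausibly be matched in bands).

```python
import random
from typing import Dict, List, Sequence

Record = Dict[str, object]  # hypothetical person record: item name -> value


def find_donors(recipient: Record, records: Sequence[Record],
                match_keys: Sequence[str],
                impute_keys: Sequence[str]) -> List[Record]:
    """Return records that are complete for impute_keys and that match
    the recipient exactly on every key in match_keys."""
    return [
        r for r in records
        if r is not recipient
        and all(r.get(k) is not None for k in impute_keys)
        and all(r.get(k) == recipient.get(k) for k in match_keys)
    ]


def impute_from_donor(recipient: Record, records: Sequence[Record],
                      match_keys: Sequence[str],
                      impute_keys: Sequence[str],
                      rng: random.Random) -> Record:
    """Return a copy of recipient with missing impute_keys filled from one
    randomly chosen donor. If no donor exists, the copy is left unchanged."""
    result = dict(recipient)
    donors = find_donors(recipient, records, match_keys, impute_keys)
    if not donors:
        return result
    donor = rng.choice(donors)  # random selection from the donor pool
    for key in impute_keys:
        if result.get(key) is None:  # only overwrite values that are missing
            result[key] = donor[key]
    return result
```

Here impute_keys plays the role of the block of questions in which the missing information occurs, and values the recipient did report are retained rather than overwritten. Passing an explicit random.Random instance keeps the donor choice reproducible.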

The final sample includes 5,308 households which had at least one imputed value in income, assets and liabilities or energy expenditure. For 28.8% of these households only a single value was missing, and most of these were for income from interest and dividends, and superannuation modules.