DATA PROCESSING METHODS
Computer-based systems were used to process the data from the SIH, using a program known as BLAISE. Because different questionnaires were used to collect data from the household and individual components of the surveys, a variety of methods were needed to process and edit the data. These processes are outlined below.
Coding and input editing of household and individual questionnaires
Internal system edits were applied in the computer-assisted interview (CAI) questionnaires to ensure the completeness and consistency of the responses being provided. The interviewer could not proceed from one section of the interview to the next until responses had been appropriately completed.
A number of range and consistency edits were programmed into the CAI questionnaire. Edit messages automatically appeared on the screen if the information entered was either outside the permitted range for a particular question, or contradicted information already recorded. These edit queries were resolved on the spot with respondents.
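As an illustration only, the range and consistency edits described above can be sketched as a small set of rules evaluated against each record as it is entered. The field names (age, weekly_hours, labour_force_status), the permitted ranges and the edit messages are hypothetical; they are not taken from the SIH questionnaire or from BLAISE.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Edit:
    name: str                       # short identifier for the edit
    message: str                    # text shown to the interviewer on failure
    passes: Callable[[dict], bool]  # returns True when the record is acceptable


# Hypothetical edits: two range checks and one consistency check.
EDITS = [
    Edit("age_range", "Age must be between 0 and 120.",
         lambda r: 0 <= r["age"] <= 120),
    Edit("hours_range", "Weekly hours must be between 0 and 168.",
         lambda r: 0 <= r["weekly_hours"] <= 168),
    Edit("hours_vs_status", "Hours worked reported for a person not employed.",
         lambda r: r["labour_force_status"] == "employed" or r["weekly_hours"] == 0),
]


def run_edits(record: dict) -> list[tuple[str, str]]:
    """Return (name, message) for every edit the record fails."""
    return [(e.name, e.message) for e in EDITS if not e.passes(record)]
```

An empty result means the interview can proceed; any returned message would be raised with the respondent before moving on.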
Data from the CAI questionnaires were electronically loaded to the processing database on receipt in the ABS office in each State or Territory. There, checks were made to ensure data for all relevant questions were fully accounted for and that returns for each household and respondent were obtained. Problems identified by interviewers were resolved by office staff, where possible, based on other information contained in the schedule, or on the comments provided by interviewers.
Computer-assisted coding was performed on responses to questions on country of birth, occupation and industry of employment to ensure completeness. Data on relationships between household members were used to delineate families and income units within the household, and to classify households and income units by type.
A query resolution system ensured that an accurate record was kept of the decisions made in resolving queries.
Additional editing
A range of edits was also applied to the household and individual information to double-check that logical sequences had been followed in the questionnaires, that specific values lay within expected ranges, and that relationships between items were consistent.
Unusually high values (termed statistical outliers) were investigated to determine whether there had been errors in entering the data. Such values were also examined for their effect on aggregate estimates for Australia, and action was taken where necessary.
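The source does not state how outliers were identified. The following is a minimal sketch under an assumed rule: a value is flagged when it exceeds the median by more than k median absolute deviations, and its share of the weighted total is reported as a rough indicator of its effect on the aggregate estimate. The function name, the parameter k and the use of this rule are all assumptions made for illustration.

```python
from statistics import median


def flag_high_outliers(values, weights, k=5.0):
    """Flag unusually high values and report their share of the weighted total.

    Assumed rule (not from the source): flag v when v > median + k * MAD.
    Returns a list of (index, value, share_of_weighted_total).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    total = sum(v * w for v, w in zip(values, weights))

    flagged = []
    for i, (v, w) in enumerate(zip(values, weights)):
        if mad > 0 and v > med + k * mad:
            share = (v * w) / total if total else 0.0
            flagged.append((i, v, share))
    return flagged
```

Flagged records would then be checked against the original responses for entry errors, and those contributing a large share of the total would be reviewed for their effect on the estimates.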
Imputation for missing records and values
Some households did not supply all the required information but supplied sufficient information to be retained in the sample. Such partial response occurs when:
- income or other data in a questionnaire are missing from the records of one or more non-significant persons because they are unable or unwilling to provide the data
- all key questions are answered by the significant person(s) but other questions are not answered.
In these cases, the data provided are retained and the missing data are imputed by replacing each missing value with a value reported by another person (referred to as the donor).
Donor records are selected by finding fully responding persons who have the same information as the person with missing information on various characteristics, such as state, sex, age, labour force status and income. As far as possible, the imputed information is an appropriate proxy for the information that is missing. Depending on which values are to be imputed, donors are randomly chosen from the pool of individual records with complete information for the block of questions where the missing information occurs.
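The donor selection described above can be sketched roughly as follows. This is an illustrative simplification, not the ABS procedure: the matching keys, the use of an age group rather than exact age, the item names and the behaviour when no donor is found (returning None) are all assumptions.

```python
import random

# Hypothetical matching characteristics; the source lists state, sex, age,
# labour force status and income as examples.
MATCH_KEYS = ("state", "sex", "age_group", "labour_force_status")


def impute_block(recipient, candidates, block_items, rng):
    """Fill missing items in one block of questions from a random donor.

    A donor must match the recipient on MATCH_KEYS and have complete
    information for every item in the block. Returns a new record, or
    None if no suitable donor exists (an assumption of this sketch).
    """
    pool = [
        c for c in candidates
        if all(c[k] == recipient[k] for k in MATCH_KEYS)
        and all(c.get(item) is not None for item in block_items)
    ]
    if not pool:
        return None

    donor = rng.choice(pool)        # random selection from the donor pool
    imputed = dict(recipient)       # leave the original record untouched
    for item in block_items:
        if imputed.get(item) is None:
            imputed[item] = donor[item]
    return imputed
```

Passing in a seeded random.Random instance as rng makes the donor selection reproducible, which helps when imputation decisions need to be audited.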
In previous SIH surveys, responses were also imputed where not every person aged 15 or over residing in the household responded, but the significant person(s) provided answers to all key questions. In 2005-06 these households were regarded as non-responding.