A combination of clerical and computer-based systems was used to process data obtained in the survey. Internal system edits were applied in the CAI questionnaire to ensure the completeness and consistency of the responses being provided. The interviewer could not proceed from one section of the interview to the next until responses had been appropriately completed.
A number of range and consistency edits were programmed into the CAI questionnaire. Edit messages automatically appeared on the screen if the information entered was either outside the previously determined range for a particular question, or contradicted information already recorded. These edit queries were resolved on the spot with respondents.
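The range and consistency edits described above can be sketched as simple rule checks that return a message for each failed edit. This is a minimal illustration only: the question names (`age`, `number_of_jobs`, `usual_weekly_hours_main_job`), the limits and the single consistency rule are hypothetical and are not the actual SEARS CAI edit specifications.

```python
# Hypothetical edit specification: question names and limits are illustrative
# only, not the actual SEARS CAI edit rules.
RANGE_LIMITS = {
    "age": (15, 120),
    "usual_weekly_hours_main_job": (0, 100),
}


def range_edit(responses: dict) -> list[str]:
    """Return a message for every answer outside its previously determined range."""
    messages = []
    for question, (low, high) in RANGE_LIMITS.items():
        value = responses.get(question)
        if value is not None and not (low <= value <= high):
            messages.append(f"{question}={value} is outside the range {low}-{high}")
    return messages


def consistency_edit(responses: dict) -> list[str]:
    """Return a message when an answer contradicts information already recorded."""
    messages = []
    # Hypothetical rule: a person recorded as having no job should not report job hours.
    if responses.get("number_of_jobs") == 0 and responses.get("usual_weekly_hours_main_job", 0) > 0:
        messages.append("hours reported for a main job, but no jobs were recorded")
    return messages


def edit_queries(responses: dict) -> list[str]:
    """All edit messages that would be shown on screen for these responses."""
    return range_edit(responses) + consistency_edit(responses)


print(edit_queries({"age": 34, "number_of_jobs": 0, "usual_weekly_hours_main_job": 38}))
```

In an interview, each returned message would play the role of an edit query to be resolved with the respondent before moving on.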
Data from the CAI questionnaire were electronically loaded into the processing database on receipt at the ABS office in each State or Territory. A record of superannuation authorisations was also electronically captured in the Regional Offices (ROs) by scanning the unique barcode.
Computer assisted coding was performed on responses to questions on languages, country of birth, occupation and industry of employment to ensure completeness. Data on relationships between household members were used to delineate families within the household, and to classify households by type. An outline of the computer assisted coding that was performed is provided below.
Language spoken and country of birth coding
The interview questionnaire listed the most frequently reported languages and countries. Interviewers were instructed to mark the appropriate box, or if the reported language or country was not among those listed, to record the name of the language or country for subsequent office coding. Languages were classified according to the Australian Standard Classification of Languages (ASCL), 2005-06 (cat. no. 1267.0). Country of birth was classified according to the Standard Australian Classification of Countries (SACC), 1998 (cat. no. 1269.0).
Occupation and industry coding
Occupation and industry codes relate to up to four jobs held by employed respondents at the time of interview. Occupation and industry codes have been dual-classified to allow comparison with other survey data that have been output using the previous classifications. Note, however, that the dual-coded data are not available on the CURF, although they are available as a special data request. See the 'Special data services' section in Chapter 5: 'Survey output and dissemination' for more details.
Occupation was office coded based on a description of the kind of work performed, as reported by respondents and recorded by interviewers. Occupation was coded to the Australian and New Zealand Standard Classification of Occupations (ANZSCO), 2006 (cat. no. 1220.0) as well as the Australian Standard Classification of Occupations (ASCO), 1997 (cat. no. 1220.0). Industry of employment was coded to the Australian and New Zealand Standard Industrial Classification (ANZSIC), 2006 (cat. no. 1292.0) as well as the Australian and New Zealand Standard Industrial Classification (ANZSIC), 2003 (cat. no. 1292.0).
Sector of employment coding
Sector coding (public sector, private sector or not determined) was conducted for the main and second job held by employed respondents at the time of interview. Sector coding was applied within the CAI instrument for main job, and was office coded for second job.
Family relationship coding
Based on information recorded on the household form, all usual residents in each sampled dwelling were grouped into family units and classified according to their position within the family. This information was then transferred to each individual questionnaire.
Coding of educational qualifications
Level of highest educational qualification and field of study of that qualification were coded to the Australian Standard Classification of Education (ASCED) (cat. no. 1272.0). Coding was based on the level and field of study as reported by respondents and recorded by interviewers.
Geography data (Capital city, Balance of State/Territory; Remoteness areas) were classified according to the Australian Standard Geographical Classification (ASGC) (cat. no. 1216.0).
Information from the questionnaires was stored on a computer output file in the form of data items. In some cases, items were formed from answers to individual questions, and in other cases were derived from answers to several questions.
Data available from the survey are essentially 'as reported' by respondents. Imputation for missing values was not undertaken for any items within this survey. Where data were missing, not available or unknown, values have been coded as such and are included in the totals for each relevant item. In some cases it was possible to correct errors or inconsistencies in the data originally captured in the interview by referring to other data in the record; in other cases this was not possible, and some errors and inconsistencies may remain on the data file.
A range of procedures and checks were followed in processing the survey to minimise errors occurring during processing. Checks were performed on records to ensure that specific values lay within valid ranges, and that relationships between items were within limits deemed acceptable for the purposes of this survey. These checks were also designed to detect errors which may have occurred during processing and to identify instances which, although not necessarily an error, were sufficiently unusual or close to agreed limits to warrant further examination.
Throughout processing, frequency counts and tables containing cross-classifications of selected data items were produced for checking purposes. The purpose of this analysis was to identify any problems in the input data which had not previously been identified, as well as errors in derivations or other inconsistencies between related items. In the final stages of processing, additional output editing and data confrontation was undertaken to ensure SEARS estimates conformed to known or expected patterns, and were broadly consistent with data from SEAS 2000 and other ABS and non-ABS data sources, allowing for methodological and other factors which might impact on comparability.
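A minimal sketch of the kind of frequency counts and cross-classifications used for checking, assuming unit records are held as Python dictionaries. The item names, categories and the rule treating a sector of employment for a non-employed person as implausible are hypothetical examples, not the actual SEARS output checks.

```python
from collections import Counter

# Hypothetical unit records; item names and categories are illustrative only.
records = [
    {"employed": "yes", "sector": "public"},
    {"employed": "yes", "sector": "private"},
    {"employed": "no", "sector": "not applicable"},
    {"employed": "no", "sector": "private"},  # inconsistent between related items
]


def frequency(records, item):
    """Unweighted frequency count of a single data item."""
    return Counter(r[item] for r in records)


def cross_tab(records, row_item, col_item):
    """Unweighted cross-classification of two data items, keyed by (row, column)."""
    return Counter((r[row_item], r[col_item]) for r in records)


def flag_cells(table, implausible_cells):
    """Return non-empty cells whose combination of categories should not occur."""
    return {cell: count for cell, count in table.items() if cell in implausible_cells and count > 0}


table = cross_tab(records, "employed", "sector")
print(frequency(records, "employed"))
print(flag_cells(table, {("no", "public"), ("no", "private")}))
```

Flagged cells point to records that need further examination, either because of an input error or an error in a derivation.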
There were three files which fed into the combined dataset for SEARS: one that included information from the household, one that included data from the superannuation funds, and one that contained personal information. These files were merged into a combined dataset, which features the following six levels:
- the top level contains household information;
- the second level contains information on the family;
- the third level contains the majority of the information for each person aged 15 years and over;
- the fourth level contains a subset of person level data relating to the superannuation account/s held. Data on this level are only available through specialist consultancy services;
- the fifth level contains a subset of data relating to the job/s held; and
- the sixth level contains data relating to the care provided for children and/or adults. Data on this level are also only available through specialist consultancy services.
This dataset is a hierarchical file, which is an efficient means of storing and retrieving information that describes one-to-many or many-to-many relationships. For example, a person may have worked in two jobs and have income information about one of these jobs but not the other. In this case, the person's details are stored at one record level and the details of each job at another. Some records at the lower levels may be null to maintain the hierarchy of the file.
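The one-to-many structure can be sketched with nested Python dataclasses. This is a simplified model under stated assumptions: only the household, family, person and job levels are shown, and the field names are hypothetical rather than SEARS data item names. Flattening to one row per job shows how a person with two jobs, only one of which has income details, is stored.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Job:
    occupation: str
    annual_income: Optional[float] = None  # None when no income details were recorded


@dataclass
class Person:
    age: int
    jobs: list[Job] = field(default_factory=list)  # one-to-many: person -> jobs


@dataclass
class Family:
    persons: list[Person] = field(default_factory=list)


@dataclass
class Household:
    state: str
    families: list[Family] = field(default_factory=list)


def job_rows(households: list[Household]) -> Iterator[tuple]:
    """Yield one row per job, carrying the position keys of every level above it."""
    for h, household in enumerate(households, start=1):
        for f, family in enumerate(household.families, start=1):
            for p, person in enumerate(family.persons, start=1):
                for j, job in enumerate(person.jobs, start=1):
                    yield (h, f, p, j, household.state, job.occupation, job.annual_income)


# Hypothetical example: one person with two jobs, income known for the first only.
example = Household(
    state="Tasmania",
    families=[Family(persons=[Person(age=45, jobs=[Job("Teacher", 68000.0), Job("Tutor")])])],
)
for row in job_rows([example]):
    print(row)
```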
Details of some of the items included for each level are:
- Household level:
  - characteristics relating to households, such as State or Territory of usual residence, household income, number of persons in household and household composition;
- Family level:
  - characteristics relating to families, such as child care arrangements, presence of a child with a disability in the family, number of persons in family and family type;
- Person level:
  - employment arrangements - the number of jobs and hours respondents work, the types of jobs they have, the flexibility and stability of employment, working patterns and preferred working patterns;
  - retirement - whether and when respondents intend to retire, whether they plan to take up part-time work as a transition to retirement or change their work in other ways, and how they intend to support themselves in retirement;
  - income - income received for the 2005-06 financial year, which can include wages and salaries, income from businesses, rental properties, dividends, interest, pensions or allowances;
  - superannuation - what superannuation coverage respondents have and whether they are contributing to their superannuation, whether people are already receiving superannuation pensions or annuities, and whether lump sums have been received and how these were used; and
  - managing caring responsibilities - whether respondents care for children and/or adults and the working arrangements, if any, that they use, or would like to use, to help balance work and care responsibilities. The survey provides a broader insight into how caring responsibilities are managed within households;
- Account level:
  - details for up to three accounts, such as type of fund, balance of account and who makes contributions to each account;
- Job level:
  - details for up to four jobs, such as occupation and industry, hours worked and types of paid leave;
- Care level:
  - detailed information about adult or child recipients of care, such as age, relationship to care provider and reason for care.
WEIGHTING, BENCHMARKING AND ESTIMATION
Weighting is the process of adjusting results from a sample survey to infer results for the total population. To do this, a 'weight' is allocated to each sample unit. The weight is a value which indicates how many population units are represented by the sample unit. The first step in calculating weights for each unit is to assign an initial weight, which is the inverse of the probability of being selected in the survey. For example, if the probability of a household being selected in the survey was 1 in 600, then the household would have an initial weight of 600 (that is, it represents 600 households).
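The inverse-probability rule for initial weights can be expressed directly. This is a minimal sketch; using Python's `fractions.Fraction` is an implementation choice, not part of the survey methodology, made so that a probability such as 1 in 600 inverts exactly to 600 without floating-point rounding.

```python
from fractions import Fraction


def initial_weight(selection_probability: Fraction) -> Fraction:
    """Initial weight = 1 / probability of selection (number of units represented)."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1 / selection_probability


# The example from the text: a household selected with probability 1 in 600.
print(initial_weight(Fraction(1, 600)))
```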
In SEARS 2007 there are two main types of 'sample units': persons and households. Weights were calculated separately for persons and households. Only complete households were given a household weight, but all fully responding persons, including those who belonged to an incomplete household, were given a person weight. For this reason, an estimate obtained using the person weights will not exactly match the same estimate obtained using household weights. For example, an estimate of all persons calculated using person weights will not match the same estimate calculated by multiplying the number of persons in each household by the household weights. Using all fully responding persons for person-level estimates allows a higher level of accuracy to be achieved for those estimates.
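Why person-weighted and household-weighted estimates of persons differ can be shown with two hypothetical households; all weights below are invented for illustration and are far simpler than real calibrated weights. Household B is incomplete, so it has no household weight, but its fully responding member still carries a person weight.

```python
# Hypothetical households: all weights are made up for illustration.
households = [
    {"id": "A", "complete": True, "household_weight": 600.0, "persons_in_household": 2,
     "responding_persons": [{"person_weight": 610.0}, {"person_weight": 610.0}]},
    {"id": "B", "complete": False, "household_weight": None, "persons_in_household": 2,
     "responding_persons": [{"person_weight": 620.0}]},
]


def persons_from_person_weights(households):
    """Every fully responding person has a person weight, even in incomplete households."""
    return sum(p["person_weight"] for h in households for p in h["responding_persons"])


def persons_from_household_weights(households):
    """Only complete households have a household weight."""
    return sum(h["persons_in_household"] * h["household_weight"] for h in households if h["complete"])


print(persons_from_person_weights(households), persons_from_household_weights(households))
```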
The initial weights are then calibrated to align with independent estimates of the population of interest, referred to as benchmarks. Population benchmarks are projections of Estimated Resident Population (ERP) data based on the 2001 Census of Population and Housing. Person-level initial weights were calibrated to the person benchmarks for classes formed by cross-classifying State or Territory, area of usual residence, sex and age group. Household weights were calibrated to the household benchmarks for classes formed by cross-classifying State or Territory, area of usual residence and household composition.
Weights calibrated against population benchmarks ensure that the survey estimates conform to the independently estimated distribution of the population, rather than to the distribution within the sample itself. Calibration to population benchmarks helps to compensate for over- or under-enumeration of particular categories of persons, which may occur due to either the random nature of sampling or non-response. Benchmarking also ensures that survey estimates are consistent with other ABS surveys.
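A minimal sketch of calibration, assuming the simplest method, post-stratification: within each benchmark class the initial weights are scaled by a common factor so that they sum to the benchmark total. The text does not say which calibration method was used for SEARS, so treat this as an illustration of the idea rather than the ABS procedure; the class labels and totals are hypothetical.

```python
from collections import defaultdict


def post_stratify(units, benchmarks):
    """Scale initial weights so they sum to the benchmark total within each class.

    units: list of dicts with a 'cell' (benchmark class, e.g. state, area,
           sex and age group) and an 'initial_weight'.
    benchmarks: dict mapping each class to its independent population total.
    """
    sample_totals = defaultdict(float)
    for unit in units:
        sample_totals[unit["cell"]] += unit["initial_weight"]
    calibrated = []
    for unit in units:
        factor = benchmarks[unit["cell"]] / sample_totals[unit["cell"]]
        calibrated.append({**unit, "weight": unit["initial_weight"] * factor})
    return calibrated


# Hypothetical sample: women 20-24 are under-represented relative to the
# benchmark, men 20-24 over-represented.
women = ("NSW", "capital city", "female", "20-24")
men = ("NSW", "capital city", "male", "20-24")
sample = [
    {"cell": women, "initial_weight": 600.0},
    {"cell": women, "initial_weight": 600.0},
    {"cell": men, "initial_weight": 600.0},
]
benchmarks = {women: 1500.0, men: 540.0}
calibrated = post_stratify(sample, benchmarks)
for unit in calibrated:
    print(unit["cell"], unit["weight"])
```

Because the factor is computed class by class, a class with no sample units cannot be calibrated at all, and one with very few units receives large, unstable adjustments; this is a common reason for collapsing sparse classes, such as the age-group collapsing described below.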
The 2007 SEARS was benchmarked to the estimated resident population (ERP) aged 15 years and over living in private dwellings in each State and Territory at May 2007, excluding the ERP living in very remote areas of Australia. Therefore, SEARS 2007 estimates do not (and are not intended to) match estimates for the total Australian resident population obtained from other sources, which include persons living in non-private dwellings (such as hotels and boarding houses) and persons living in very remote parts of Australia.
Benchmark variables used in the 2007 SEARS, with corresponding level of detail, were:
- Person benchmarks:
  - State or Territory of usual residence - all States and Territories;
  - Area of usual residence - capital city or balance of State;
  - Age of person - grouped in the following way: 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85+. Further collapsing was required for some States and Territories. For South Australia, balance of state, males were grouped to 20-29 and 80+, and females were grouped to 15-24 and 30-39. For Western Australia, balance of state, males were grouped as 80+. For Tasmania, balance of state, males were grouped to 15-24. For the Northern Territory, capital city, males and females were grouped to 80+, and for the Australian Capital Territory, capital city, females were grouped to 80+; and
  - Sex of person - males and females.
- Household benchmarks:
  - State or Territory of usual residence - all States and Territories;
  - Area of usual residence - capital city or balance of State; and
  - Household composition.
The benchmark variables used in SEARS 2007 were the same as those used in SEAS 2000; only the age groupings changed. In 2000 the age groups were 15-19, 20-24, 25-34, 35-44, 45-54 and 55-69. The finer age detail is intended to improve estimates for older age groups.
Each record in SEARS 2007 contains one weight: either a person weight or a household weight. The weights indicate how many population units, that is, persons or households, are represented by the sample unit. Replicate weights are also included: 60 person replicate weights or 60 household replicate weights. The purpose of these replicate weights is to enable calculation of the relative standard error (RSE) for each estimate produced. Survey estimates of counts of persons are obtained by summing the weights of persons with the characteristic of interest. Estimates of means (such as mean age of persons) are obtained by summing the weights of persons in each category (e.g. individual ages), multiplying by the value for each category, aggregating the results across categories, then dividing by the sum of the weights for all persons. The same methods apply to estimates of households, families or jobs.
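The estimation rules above can be sketched as follows, assuming each record is a dictionary holding a main weight and 60 replicate weights. The key names (`weight`, `repwt1` to `repwt60`) are hypothetical, not the SEARS variable names, and the RSE uses the delete-a-group jackknife formula commonly applied to ABS replicate weights, SE = sqrt((59/60) x sum over replicates of (replicate estimate - full estimate) squared). This formula is an assumption here; Chapter 4 should be checked for the one that applies to SEARS.

```python
import math
from collections import defaultdict


def weighted_count(records, weight_key, has_characteristic):
    """Estimated count: the sum of the weights of units with the characteristic."""
    return sum(r[weight_key] for r in records if has_characteristic(r))


def weighted_mean(records, weight_key, value_key):
    """Estimated mean, as described in the text: sum the weights within each
    category, multiply by the category value, aggregate across categories,
    then divide by the total weight."""
    weight_by_value = defaultdict(float)
    for r in records:
        weight_by_value[r[value_key]] += r[weight_key]
    total_weight = sum(weight_by_value.values())
    return sum(value * w for value, w in weight_by_value.items()) / total_weight


def jackknife_rse(records, estimator, main_key="weight", replicate_prefix="repwt", groups=60):
    """RSE (%) from replicate weights, assuming the delete-a-group jackknife.

    estimator(records, weight_key) must return the estimate computed with that weight.
    """
    estimate = estimator(records, main_key)
    replicates = [estimator(records, f"{replicate_prefix}{g}") for g in range(1, groups + 1)]
    variance = (groups - 1) / groups * sum((y - estimate) ** 2 for y in replicates)
    return 100 * math.sqrt(variance) / estimate


# Hypothetical records with made-up weights: every replicate weight equals the
# main weight, except repwt1 on the first record.
records = []
for weight, age in [(2.0, 30), (1.0, 60), (1.0, 30)]:
    record = {"weight": weight, "age": age}
    record.update({f"repwt{g}": weight for g in range(1, 61)})
    records.append(record)
records[0]["repwt1"] = 3.0


def aged_55_plus(recs, key):
    return weighted_count(recs, key, lambda r: r["age"] >= 55)


def mean_age(recs, key):
    return weighted_mean(recs, key, "age")


print(aged_55_plus(records, "weight"), jackknife_rse(records, aged_55_plus))
print(mean_age(records, "weight"), jackknife_rse(records, mean_age))
```

Passing the estimator as a function keeps one RSE routine for counts, means or any other statistic that can be recomputed with each replicate weight.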
For more information on RSEs, please refer to Chapter 4: 'Data Quality'.
For more information on use of weights, please refer to Chapter 6: 'Using the CURF data'.
For a list of the weight variables on SEARS 2007 files (person weight, person replicate weights, household weight, household replicate weights), please refer to the survey data item list on the ABS web site <www.abs.gov.au>.