4159.0.55.002 - General Social Survey: User Guide, Australia, 2010

ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 07/12/2011

DATA PROCESSING

Data capture

Computer-based systems were used to process the data from the survey. Internal system edits were applied in the computer-assisted interviewing (CAI) instrument to ensure the completeness and consistency of the questionnaire and responses during the interview. The interviewer could not proceed from one section of the interview to the next until responses had been properly completed.

A number of range and consistency edits were programmed into the CAI collection instrument. Edit messages appeared on screen automatically if the information entered was either outside the permitted range for a particular question, or contradicted information already recorded. These edit queries were resolved by interviewers on the spot with respondents.
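As an illustration only, the range and consistency edits described above can be sketched as follows. This is not the CAI instrument's actual logic: the function name `edit_messages`, the item names `age` and `years_in_job`, and the rule shown are all hypothetical.

```python
def edit_messages(record, item, value, permitted, consistency_checks=()):
    """Return the edit messages raised by entering `value` for `item`.

    record             -- answers already recorded in the interview
    permitted          -- (low, high) inclusive range allowed for this item
    consistency_checks -- pairs of (description, test), where test(record)
                          returns True when the record is consistent
    """
    messages = []
    low, high = permitted
    # Range edit: the value must lie inside the permitted range.
    if not (low <= value <= high):
        messages.append(f"{item}: {value} is outside the permitted range {low}-{high}")
    # Consistency edit: test the new value against answers already recorded.
    candidate = {**record, item: value}
    for description, is_consistent in consistency_checks:
        if not is_consistent(candidate):
            messages.append(f"{item}: {description}")
    return messages


# Hypothetical rule: years in current job cannot exceed the recorded age.
rules = [("contradicts age already recorded",
          lambda r: r["years_in_job"] <= r["age"])]

print(edit_messages({"age": 30}, "years_in_job", 45, (0, 80), rules))
```

An empty list means the interviewer can proceed; any message would be resolved with the respondent before moving on.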

Workloads were electronically loaded on receipt in the ABS office in each state or territory. Checks were made to ensure interviewer workloads were fully accounted for and that questionnaires for each household and respondent were completed. Problems with the questionnaire identified by interviewers were resolved by office staff, where possible, using other information contained in the questionnaire, or by referring to the comments provided by interviewers.

Coding

Computer-assisted coding was performed on responses to questions on country of birth, language, family relationships, educational qualifications and occupation. Geography data were also coded. The classifications used to code the data are detailed below.

Country of birth coding. The survey questionnaire listed the ten most frequently reported countries. Interviewers were instructed to mark the appropriate box, or if the reported country was not among those listed, to record the name of the country for subsequent coding. All responses for country of birth were coded according to the Standard Australian Classification of Countries (SACC), Second edition (cat. no. 1269.0).

Coding of language. The survey questionnaire listed the ten most frequently reported languages first spoken at home. Interviewers were instructed to mark the appropriate box, or if the reported language was not among those listed, to record the name of the language for subsequent coding. All responses for language spoken were coded to the Australian Standard Classification of Languages (ASCL) (cat. no. 1267.0).
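The approach described for country of birth and language (a tick box for the most frequently reported answers, with a write-in coded afterwards) might be sketched as below. The codes and index entries are placeholders invented for illustration. They are not real SACC or ASCL codes, and the function name `code_response` is an assumption.

```python
# Placeholder codes only: these are NOT real SACC or ASCL codes.
LISTED_OPTIONS = {1: "X001", 2: "X002"}   # tick-box number -> classification code
CODING_INDEX = {"atlantis": "X900",       # write-in text -> classification code
                "lemuria": "X901"}
UNCODED = "X999"  # hypothetical code for write-ins that cannot be matched


def code_response(box=None, write_in=None):
    """Return a classification code for a ticked box or a write-in response."""
    if box is not None:
        # One of the listed options was marked by the interviewer.
        return LISTED_OPTIONS[box]
    # Otherwise match the recorded text against the coding index.
    key = (write_in or "").strip().lower()
    return CODING_INDEX.get(key, UNCODED)


print(code_response(box=1))
print(code_response(write_in=" Atlantis "))
```

Write-ins that fall through to the `UNCODED` placeholder would be the ones needing manual attention in a computer-assisted coding workflow.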

Family relationships. Based on household information collected for all persons in each dwelling, all usual residents were grouped into family units and classified according to their relationship within the family.

Coding of educational qualification. Level and field of both highest educational attainment and current study were coded to the Australian Standard Classification of Education (ASCED) (cat. no. 1272.0). Coding was based on the level and field of study as reported by respondents and recorded by interviewers.

Occupation data were classified according to the Australian and New Zealand Standard Classification of Occupations (ANZSCO), First Edition, Revision 1 (cat. no. 1220.0).

Geography data (e.g. Capital city, Balance of state/territory; Remoteness areas) were classified according to the Australian Standard Geographical Classification (ASGC), 2006 (cat. no. 1216.0).

Output processing

Information from the questionnaires, other than names and addresses, was stored on a computer output file in the form of data items. In some cases, items were formed from answers to individual questions, while in other cases data items were derived from answers to several questions.

During processing of the data, checks were performed on records to ensure that specific values lay within valid ranges and that relationships between items were within limits deemed acceptable for the purposes of this survey. These checks were also designed to detect errors which may have occurred during processing and to identify instances which, although not necessarily an error, were sufficiently unusual or close to agreed limits to warrant further examination.
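A minimal sketch of such checks is given below, assuming each data item has a valid range and a narrower "usual" range. Values outside the valid range are treated as errors, while values that are valid but unusual are flagged for further examination. The item name `weekly_hours` and all limits are hypothetical and are not the limits used for the survey.

```python
def check_record(record, limits):
    """Classify data items as 'error' (outside the valid range) or
    'query' (valid, but unusual enough to warrant further examination)."""
    findings = {}
    for item, (valid_low, valid_high, usual_low, usual_high) in limits.items():
        value = record.get(item)
        if value is None:
            continue  # item not collected or not applicable for this record
        if value < valid_low or value > valid_high:
            findings[item] = "error"
        elif value < usual_low or value > usual_high:
            findings[item] = "query"
    return findings


# Hypothetical limits: (valid_low, valid_high, usual_low, usual_high).
limits = {"weekly_hours": (0, 168, 0, 80)}

print(check_record({"weekly_hours": 200}, limits))
print(check_record({"weekly_hours": 95}, limits))
print(check_record({"weekly_hours": 38}, limits))
```

Keeping "error" and "query" separate mirrors the distinction drawn above between values that are wrong and values that are merely unusual.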

Throughout processing, frequency counts and tables containing cross-classifications of selected data items were produced for checking purposes. The purpose of this analysis was to identify any problems in the input data which had not previously been identified, as well as errors in derivations or other inconsistencies between related items. In the final stages of processing, additional output editing and data confrontation were undertaken to ensure GSS estimates conformed to known or expected patterns, and were broadly consistent with data from the previous GSS or from other ABS data sources, allowing for methodological and other factors which might affect comparability.
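Frequency counts and cross-classifications of the kind described above can be sketched with the standard library alone. The records, item names and categories here are invented for illustration, and the helper names `frequency` and `cross_classify` are assumptions.

```python
from collections import Counter


def frequency(records, item):
    """Count how often each value of `item` occurs."""
    return Counter(r[item] for r in records)


def cross_classify(records, row_item, col_item):
    """Count records in each (row value, column value) cell."""
    return Counter((r[row_item], r[col_item]) for r in records)


# Hypothetical records with two related data items.
records = [
    {"status": "employed", "hours": "35+"},
    {"status": "employed", "hours": "1-34"},
    {"status": "not employed", "hours": "0"},
    {"status": "not employed", "hours": "35+"},  # inconsistent combination
]

table = cross_classify(records, "status", "hours")
for cell, count in sorted(table.items()):
    print(cell, count)
```

A non-zero count in a cell such as ("not employed", "35+") is the sort of inconsistency between related items that a cross-classification makes visible.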

Data available from the survey are essentially 'as reported' by respondents. The procedures and checks outlined above were designed primarily to minimise errors occurring during processing. In some cases it was possible to correct errors or inconsistencies in the data as originally recorded in the interview, through reference to other data in the record; in other cases this was not possible and some errors and inconsistencies remain on the data file.

Output file

A two-level hierarchical data file was produced, as outlined below:

person level - (the main level) containing the majority of data about the respondent and household (e.g. demographics, income, education, employment, health, social capital, housing and mobility, crime and safety, etc.); and
Difficulty Accessing Service Providers level - containing information about the organisations a respondent had difficulty accessing over the past 12 months (up to a maximum of three services).

A hierarchical file is an efficient means of storing and retrieving information which describes one-to-many or many-to-many relationships. For example, a person may have had difficulty accessing more than one service. In this circumstance, different record levels are used to store the details related to these services.
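The two-level structure can be pictured with a small sketch in which each person record links to zero or more service records through a shared identifier. The field names (`person_id`, `service`), the example values and the function `build_hierarchy` are hypothetical and do not reflect the actual file layout.

```python
from collections import defaultdict

MAX_SERVICES = 3  # the file holds up to three services per respondent


def build_hierarchy(persons, service_records):
    """Attach lower-level service records to their parent person record."""
    children = defaultdict(list)
    for rec in service_records:
        children[rec["person_id"]].append(rec)

    hierarchy = {}
    for person in persons:
        pid = person["person_id"]
        linked = children.get(pid, [])
        if len(linked) > MAX_SERVICES:
            raise ValueError(f"person {pid} has more than {MAX_SERVICES} service records")
        hierarchy[pid] = {"person": person, "services": linked}
    return hierarchy


# Hypothetical data: person 1 reported two services, person 2 reported none.
persons = [{"person_id": 1, "age_group": "25-34"},
           {"person_id": 2, "age_group": "65+"}]
services = [{"person_id": 1, "service": "Service type A"},
            {"person_id": 1, "service": "Service type B"}]

hierarchy = build_hierarchy(persons, services)
print(len(hierarchy[1]["services"]), len(hierarchy[2]["services"]))
```

Storing services on their own level avoids repeating the person-level items for every service a respondent reported.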

Most data from the GSS are available at the person level and describe personal characteristics, or characteristics of the household to which the person belongs.