Once all the forms have been collected, they are transported under secure arrangements to the Data Processing Centre. It is here that data on the forms are processed to produce the computer files used to provide census products for users. Names and addresses are not stored on the computer files.
OPTICALLY READING THE FORMS
Processing begins with a check that all forms have been received from the collectors and that there is a form for each dwelling and person listed in the Collector's record book. Torn, stained or otherwise damaged forms are transcribed to ensure they will pass successfully through the next stage, optical mark recognition (OMR). In OMR, a machine (the OMR reader) reads the horizontal marks made on the forms by householders and transfers the responses onto a computer file.
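The receipt check described above can be pictured as a simple set comparison. The following is a minimal sketch only; the identifier format and the function name `missing_forms` are assumptions made for illustration and are not taken from the census processing system.

```python
def missing_forms(record_book_ids, received_form_ids):
    """Return record-book IDs that have no matching received form, sorted."""
    return sorted(set(record_book_ids) - set(received_form_ids))


# Hypothetical dwelling identifiers listed by a collector.
record_book = ["D001", "D002", "D003"]
received = ["D001", "D003"]

for form_id in missing_forms(record_book, received):
    print("No form received for", form_id)
```

Any identifier reported here would need to be followed up before the batch goes on to OMR reading.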
FIRST STAGE PROCESSING
After the OMR reading, further checks are made to ensure that key dwelling and person information has been captured by the OMR reader. In this stage, mark-box questions that should have a response but for which none was captured are checked, and the simple write-in questions are coded. This is done by computer-assisted coding (CAC), in which the computer system directs the coder to examine those forms and questions for which a response is required. For mark-box questions, all the possible responses are presented on the screen and the coder chooses from the list, using the information provided on the census form. For write-in questions, the coder enters the response given on the form and is then presented with electronic index entries that correspond closely to it. The coder then chooses the index entry that most closely matches the response given on the census form. Once a match is achieved, the computer applies the code corresponding to the index entry to the computer file.
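The write-in part of computer-assisted coding can be sketched as a close-match lookup against an index. This is an illustration under stated assumptions: the index entries and codes below are made up, and Python's standard `difflib` similarity matching stands in for whatever matching method the actual system used, which the text does not describe.

```python
import difflib

# Hypothetical coding index: index entry text -> code (codes are invented).
INDEX = {"australia": "C01", "new zealand": "C02", "england": "C03"}


def candidate_entries(response, index=INDEX, n=3, cutoff=0.6):
    """Return up to n index entries that closely match the keyed-in response."""
    key = response.strip().lower()
    return difflib.get_close_matches(key, list(index), n=n, cutoff=cutoff)


def apply_code(chosen_entry, index=INDEX):
    """Return the code for the index entry the coder selected."""
    return index[chosen_entry]


candidates = candidate_entries("Australa")  # misspelt response from a form
chosen = candidates[0]                      # in practice the coder chooses
print(chosen, apply_code(chosen))
```

The program only proposes candidates; the choice of the closest entry remains with the coder, as in the process described above.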
SECOND STAGE PROCESSING
The next stage in processing is coding of the complex write-in responses. The same approach is used as for computer-assisted coding of the simple responses. However, for these questions there is not always a direct word-for-word matching of the index entries with the response on the census form.
Coding proceeds in a structured way with key words of the response being identified by the coder according to certain rules and entered onto the computer. The computer may prompt the coder for additional information and present supplementary index entries for matching before coding is complete. As for the previous processes, once a final match is made, the code is then applied to the computer file.
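The structured approach can be sketched as matching key words against index entries and asking for more information while several entries remain possible. The index, the codes and the function names below are hypothetical, and the real key word rules are not given in the text.

```python
# Hypothetical index: each entry is a set of key words plus an invented code.
INDEX = [
    ({"teacher", "primary"}, "T1"),
    ({"teacher", "secondary"}, "T2"),
    ({"nurse"}, "N1"),
]


def match_keywords(keywords):
    """Return codes of index entries consistent with the key words entered so far."""
    entered = set(keywords)
    return [code for words, code in INDEX if entered <= words]


def code_or_prompt(keywords):
    """Decide the next step: apply a code, prompt for more detail, or refer on."""
    codes = match_keywords(keywords)
    if len(codes) == 1:
        return ("coded", codes[0])
    if codes:
        return ("prompt", codes)  # several entries fit: ask for more key words
    return ("query", [])          # nothing fits: refer for query resolution


print(code_or_prompt(["teacher"]))             # more information needed
print(code_or_prompt(["teacher", "primary"]))  # a single entry remains
```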
Most of the information on families is captured directly from the mark-box responses provided on the census form and, for most families, the family code can be derived automatically from these responses. However, in a small number of situations, such as two families living in the same household or complex relationships between family members, computer-assisted family coding is required. The computer directs the coder to the forms that require special family coding and provides response screens for the coder to enter the codes. An editing program immediately checks the validity of the family codes supplied and, if they are invalid, the coder has to repeat the process. When the family codes for the household are valid, they are then applied by the computer to the computer file.
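The check-and-repeat cycle for family coding can be sketched as a loop that accepts the first valid set of codes. The set of valid codes and the validity rule below are placeholders; the actual edit rules for family codes are not described in the text.

```python
# Hypothetical set of valid family codes.
VALID_FAMILY_CODES = {"F1", "F2", "F3"}


def is_valid(codes):
    """A household's codes are valid if there is at least one and all are known."""
    return bool(codes) and all(code in VALID_FAMILY_CODES for code in codes)


def code_household(attempts):
    """Consume successive coder attempts until one is valid.

    Returns the accepted codes and the number of attempts taken.
    """
    for number, codes in enumerate(attempts, start=1):
        if is_valid(codes):
            return codes, number
    raise ValueError("no valid family codes supplied")


# First attempt contains an unknown code, so the coder repeats the process.
print(code_household([["F9"], ["F1", "F2"]]))
```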
In a small number of cases, the coder cannot match the response on a census form with the index information presented by the computer. These responses are referred to a query resolution section, which allocates a code using supplementary indexes and information. Where necessary, new entries are added to the coding indexes so that similar responses can be coded in future. Where there is inadequate information on the census form to determine a precise code, a more general or 'dump' code is allocated.
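Query resolution with a fallback to a 'dump' code might look like the sketch below. The indexes, the codes and the rule for adding new entries are all assumptions made for illustration.

```python
def resolve_query(response, coding_index, supplementary_index, dump_code):
    """Return a code for a response the coder could not match."""
    key = response.strip().lower()
    if key in coding_index:
        return coding_index[key]
    if key in supplementary_index:
        code = supplementary_index[key]
        # Add a new entry so that similar responses can be coded directly in future.
        coding_index[key] = code
        return code
    # Not enough information for a precise code: fall back to the general code.
    return dump_code


# Hypothetical indexes and codes.
coding_index = {"england": "E1"}
supplementary_index = {"cornwall": "E1"}

print(resolve_query("Cornwall", coding_index, supplementary_index, "E0"))
print(resolve_query("somewhere in Europe", coding_index, supplementary_index, "E0"))
```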
Some editing is undertaken to reduce inconsistencies in census data. The kinds of errors that editing procedures can detect are limited to responses and/or codes which are invalid, inconsistent with other responses on the forms, or in conflict with census definitions. Once detected, such inconsistencies are dealt with by changing one or more responses on the basis of decision tables drawn up for the purpose. Although the number of edit failures due to householder error is small, there are cases where the subsequent adjustment of records is, by necessity, somewhat arbitrary because conclusive information is absent. Some inconsistencies remain where it is impossible to determine the true situation from information provided on the census form.
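An edit driven by a decision table can be sketched as a list of rules, each pairing a consistency test with the response to change. The single rule shown (a child recorded with a marital status other than never married) is a made-up example and is not taken from the census decision tables.

```python
# Each rule: (condition that detects an inconsistency, field to change, new value).
# The rule below is hypothetical.
EDIT_RULES = [
    (
        lambda r: r["age"] < 15 and r["marital_status"] != "never married",
        "marital_status",
        "never married",
    ),
]


def apply_edits(record):
    """Return an edited copy of the record and the list of fields changed."""
    edited = dict(record)
    changed = []
    for condition, field, value in EDIT_RULES:
        if condition(edited):
            edited[field] = value
            changed.append(field)
    return edited, changed


record = {"age": 10, "marital_status": "married"}
print(apply_edits(record))
```

Which response to change when two conflict is the somewhat arbitrary part: the table fixes that choice in advance so it is applied consistently.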
IMPUTED AND DERIVED DATA
During processing, procedures for deriving or imputing some data items are implemented by the editing system.
Some data items are derived from other responses given on the census forms. An example of a derived characteristic is labour force status. This characteristic is derived for all people and is determined using a decision table which takes into account the responses (or lack of them) given to several other questions on the form. These are: full-time or part-time job; job last week; looked for work; availability to start work; hours worked; and mode of travel to work.
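A derivation of this kind can be sketched as a small decision function. The rules below are illustrative only: they use three of the six inputs named above and follow the familiar employed / unemployed / not in the labour force split, not the actual census decision table, which is not reproduced in the text.

```python
def labour_force_status(had_job_last_week, looked_for_work, available_to_start):
    """Derive a labour force status from three responses.

    Each argument is True, False, or None when no response was given.
    """
    if had_job_last_week is None and looked_for_work is None:
        return "not stated"
    if had_job_last_week:
        return "employed"
    if looked_for_work and available_to_start:
        return "unemployed"
    return "not in the labour force"


print(labour_force_status(True, None, None))
print(labour_force_status(False, True, True))
print(labour_force_status(False, False, None))
```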
Data imputation is used for a small number of specific data items, such as age, sex and marital status, where responses have not been provided on the census form. As it is not usually possible to derive these values from other responses on the form, they must be imputed. For example, the processing system imputes age using look-up tables based on the 1991 Census age values for the population as a whole or for sub-groups of the population. These tables provide an imputed value to replace the missing value.
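Imputation from look-up tables can be sketched as follows. The tables, the weights and the choice of sub-group are invented for illustration, and drawing a value at random in proportion to the table weights is an assumption: the text does not say how a value is selected from the tables.

```python
import random

# Hypothetical look-up tables: sub-group -> (candidate ages, relative weights).
AGE_TABLES = {
    "all": ([5, 25, 45, 65], [20, 30, 30, 20]),
    "married": ([25, 45, 65], [30, 45, 25]),
}


def impute_age(record, rng):
    """Return the reported age, or an imputed one if the response is missing."""
    if record.get("age") is not None:
        return record["age"]
    # Use the sub-group table when one exists, otherwise the whole-population table.
    ages, weights = AGE_TABLES.get(record.get("marital_status"), AGE_TABLES["all"])
    return rng.choices(ages, weights=weights, k=1)[0]


rng = random.Random(1996)  # fixed seed so a run can be repeated
print(impute_age({"age": None, "marital_status": "married"}, rng))
print(impute_age({"age": 37, "marital_status": "married"}, rng))
```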
The final outcome of the Data Processing Centre work is a file of coded records for each person, family, household and dwelling enumerated in the census containing no identifying information. Once validated, the file becomes the source of all products containing census data.
Once all the statistical data have been extracted and the forms are no longer needed for processing, the forms are pulped and turned into recycled paper and cardboard.