STANDARDISATION PROCESSES APPLIED TO VARIABLES
The further standardisation techniques applied to variables for this project are described below:
FIRST NAME
For the Census data, the original names were first subjected to repair processes at the DPC. First names were compared against a master name index, which allowed for names that were misread by the DPC Optical Character Recognition (OCR) software to be parsed and repaired. Standardisation of first names included removal of non-alphabetical characters and titles (e.g. Ms, Dr).
First names were then compared against a nickname concordance, ensuring that different variations would be grouped into a common name for the purposes of linkage. For example, the names ‘Bradley’ and ‘Brad’ may both be standardised to ‘Bradley’. Any first names that could not be matched to a nickname retained their original form.
Name data on the death registrations were of considerably better quality than those on the Census, and so did not need to go through a repair process. However, the remainder of the first name standardisation process for death registrations was consistent with that used for the Census.
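The cleaning and nickname steps described above can be sketched as follows. This is a minimal illustration only: the title list, the nickname concordance entries, the upper-casing of names and the function name are all assumptions, since the actual tables and conventions used are not given here.

```python
import re

# Hypothetical lookup tables; the real title list and nickname
# concordance are not reproduced in this document.
TITLES = {"MR", "MRS", "MS", "MISS", "DR"}
NICKNAMES = {"BRAD": "BRADLEY", "BILL": "WILLIAM"}


def standardise_first_name(raw):
    """Strip non-alphabetical characters and titles, then apply the concordance."""
    if raw is None:
        return None
    # Remove non-alphabetical characters from each whitespace-separated token.
    # Upper-casing is an assumption made so that lookups are case-insensitive.
    tokens = [re.sub(r"[^A-Za-z]", "", t).upper() for t in raw.split()]
    # Drop empty tokens and titles such as Ms or Dr.
    tokens = [t for t in tokens if t and t not in TITLES]
    if not tokens:
        return None
    cleaned = " ".join(tokens)
    # Names with no concordance entry retain their cleaned form.
    return NICKNAMES.get(cleaned, cleaned)
```

Under these assumptions, 'Dr. Brad' and 'Bradley' would both be standardised to 'BRADLEY', while a name with no concordance entry would be returned in its cleaned form.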
SURNAME
Census surnames underwent repair processes at the Census DPC. Surnames that were repaired were subject to further standardisation prior to linkage; otherwise the original stated surname was used.
For both Census and death registrations, non-alphabetical characters were removed from surnames. Where a record contained multiple surnames but no stated first name, the first part of the surname was substituted into the final First name field.
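A minimal sketch of the surname rules above is given below. It assumes that multiple surnames are separated by whitespace and that the part moved into the First name field is removed from the surname; neither detail is stated in this document, and the function name is hypothetical.

```python
import re


def standardise_surname(first_name, surname):
    """Return (first_name, surname) after surname standardisation."""
    # Assumption: multiple surnames are separated by whitespace.
    parts = [re.sub(r"[^A-Za-z]", "", p) for p in (surname or "").split()]
    parts = [p for p in parts if p]
    if not first_name and len(parts) > 1:
        # No stated first name: the first part of the surname is substituted
        # into the First name field. Removing it from the surname is an
        # assumption of this sketch.
        first_name, parts = parts[0], parts[1:]
    return (first_name or None), (" ".join(parts) or None)
```

For example, a record with no first name and the surname 'Nguyen Tran' would come out as first name 'Nguyen' and surname 'Tran', while a record with a single surname is left without a first name.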
INITIAL 4
The variable ‘Initial 4’ was derived by concatenating the first two letters of the standardised first name with the first two letters of the standardised surname. If either the standardised first name or the standardised surname was missing, then Initial 4 was set to missing. This variable was used to group names into common categories.
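The derivation of Initial 4 can be illustrated with a short sketch. The function name and the use of None for missing values are assumptions made for illustration.

```python
def initial_4(first_name, surname):
    """Concatenate the first two letters of each standardised name."""
    # If either standardised name is missing, Initial 4 is missing.
    if not first_name or not surname:
        return None
    # Slicing returns fewer than two letters for a one-letter name.
    return first_name[:2] + surname[:2]
```

For example, the standardised names 'BRADLEY' and 'SMITH' give 'BRSM', so a record for 'BRADLEY SMITH' and one for 'BRIAN SMYTHE' would fall into the same category.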
SEX
Census records that contained an imputed value for sex but had provided a first name were compared against a name index to determine whether the name was commonly given to males or females. If the Census name matched a name on the index, the relevant sex was applied to the Census record. If the Census name could not be matched to any name on the index, the value for sex was coded to missing.
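The rule for imputed sex values can be sketched as below. The index contents, the 'M'/'F' codes, the imputation flag and the function name are hypothetical. The treatment of records with an imputed sex and no first name is not described above, so this sketch leaves them unchanged.

```python
# Hypothetical name index mapping first names to the sex they are
# commonly given to; the real index is not reproduced here.
NAME_SEX_INDEX = {"BRADLEY": "M", "MARY": "F"}


def resolve_sex(sex, sex_was_imputed, first_name):
    """Replace an imputed sex with the name index value, or missing."""
    # Only records with an imputed sex and a stated first name are checked.
    if not sex_was_imputed or not first_name:
        return sex
    # A name that is not on the index yields None, i.e. sex coded to missing.
    return NAME_SEX_INDEX.get(first_name.upper())
```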
ADDRESS (STREET NUMBER, STREET NAME, SUBURB, POSTCODE)
Linking was based on the usual residential address of Census records and death registrations. Census addresses were also repaired using the output from Census address coding. Death registrations where only a residential title was supplied (e.g. nursing home, hospital) underwent additional coding.
MESH BLOCK
Mesh Blocks are the smallest geographical area defined by the ABS. The 2011 Australian Statistical Geography Standard (ASGS) contains 347,627 Mesh Blocks covering the whole of Australia without gaps or overlaps.
The standardised Mesh Block variable was based on the usual residential address of a record. Instances where a Mesh Block could not be assigned or the respondent usually resided overseas were recoded to missing.
AGE
Age was standardised to three digits and top-coded to a maximum value of 115. For death registrations, ages under one year that were recorded in months were recoded to zero.
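A minimal sketch of the age standardisation follows. The zero-padded string output, the unit argument and the function name are assumptions; the original storage format of age is not described here.

```python
MAX_AGE = 115  # top-code stated above


def standardise_age(value, unit="years"):
    """Return age as a three-digit string, top-coded at 115."""
    if value is None:
        return None
    if unit == "months":
        # Ages recorded in months are under one year, so recode to zero.
        return "000"
    return f"{min(int(value), MAX_AGE):03d}"
```

Under these assumptions, an age of 7 becomes '007', an age of 120 becomes '115', and an age of 5 months becomes '000'.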
YEAR OF BIRTH
Year of birth values that were either invalid or had only two digits were amended using age information, when possible. For example, where a record had only stated ‘07’ as the year of birth, this value would be recoded to either ‘1907’ or ‘2007’, depending on supplementary age information that had been provided.
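The two-digit example above can be sketched as follows. The reference year of 2011, the rule of choosing the century whose implied age is closest to the stated age, and the function name are assumptions; the exact amendment logic is not specified here, and other kinds of invalid value are not handled by this sketch.

```python
def expand_two_digit_year(yy, age, reference_year=2011):
    """Expand a two-digit year of birth using the stated age."""
    # Without age information the value cannot be amended.
    if age is None or len(yy) != 2 or not yy.isdigit():
        return None
    # Candidate centuries, excluding years after the reference year.
    candidates = [c for c in (1900 + int(yy), 2000 + int(yy)) if c <= reference_year]
    # Choose the candidate whose implied age is closest to the stated age.
    return min(candidates, key=lambda c: abs((reference_year - c) - age))
```

With a stated year of birth of '07', a stated age of 4 gives 2007 and a stated age of 103 gives 1907.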
BIRTHPLACE
A two-digit Birthplace variable was created in order to minimise disagreement when linking records belonging to people born outside Australia. This allowed records to agree on broader regions rather than on specific countries, where information might disagree (e.g. ‘Northern Europe’ instead of ‘England’, ‘Norway’, etc.).
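One way such a variable could be derived is sketched below. It assumes that the detailed birthplace is held as a four-digit hierarchical country code whose first two digits identify the broader region; this coding scheme is an assumption, and the codes used in the test values are made up for illustration.

```python
def birthplace_two_digit(country_code):
    """Collapse a four-digit country code to its two-digit region."""
    # Assumption: a valid detailed code is exactly four digits.
    if not country_code or len(country_code) != 4 or not country_code.isdigit():
        return None
    return country_code[:2]
```

Two records with different detailed codes in the same region, such as the made-up codes '2401' and '2406', would then agree on the two-digit value '24'.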
YEAR OF ARRIVAL IN AUSTRALIA
Records that did not state a Year of arrival between 1896 and 2011, but did state an age, were given a derived value in the same manner as for Year of birth. Records that stated both a Year of arrival and a birthplace of Australia retained their Year of arrival rather than having it recoded to missing, as the birthplace may have been misreported.