STANDARDISATION
Before records on the two datasets are compared, the contents of both datasets need to be standardised. This involves a number of steps, such as verification, recoding and reformatting of fields, and parsing of text fields (i.e. separating text fields into their components). Some fields also require substantial repair.
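As an illustration of the parsing step described above, the following is a minimal Python sketch that separates a date of birth text field into its day, month and year components. The field layout ("D/M/YYYY" or "DD-MM-YYYY") and the function name parse_dob are assumptions made for this example only; they are not taken from either dataset.

```python
import re

# Assumed layout for illustration: day, month and four-digit year,
# separated by "/" or "-". Any other layout is treated as unparseable.
_DOB_PATTERN = re.compile(r"^\s*(\d{1,2})[/-](\d{1,2})[/-](\d{4})\s*$")


def parse_dob(text):
    """Split a date of birth text field into day, month and year components.

    Returns None for every component when the text does not match the
    assumed layout, so that unparseable values can be reviewed or repaired.
    """
    match = _DOB_PATTERN.match(text or "")
    if match is None:
        return {"day": None, "month": None, "year": None}
    day, month, year = (int(part) for part in match.groups())
    return {"day": day, "month": month, "year": year}
```

Keeping the components separate means that each one can later be compared on its own, rather than treating the whole date as a single value.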
Some variables differ between the two datasets in a predictable way, and an adjustment is required to remove this difference. Other variables are coded differently at different points in time, and concordances may be necessary to create variables which align on the two datasets. Variables may also be recoded or aggregated in order to obtain a more robust form of the variable. This set of procedures is collectively termed 'standardisation'. Standardisation takes place in conjunction with a broader evaluation of the dataset, in which potential linking variables are identified.
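The use of a concordance, followed by aggregation to a broader category, can be sketched in Python as follows. All codes, category labels and names in this example (EARLY_TO_COMMON, LATE_TO_COMMON, COMMON_TO_BROAD, align) are hypothetical and are shown only to illustrate the idea; they are not the classifications used on either dataset.

```python
# Hypothetical codings of the same variable at two points in time.
EARLY_TO_COMMON = {
    "N": "never married",
    "M": "married",
    "W": "widowed",
    "D": "divorced",
}
LATE_TO_COMMON = {
    1: "never married",
    2: "married",
    3: "widowed",
    4: "divorced",
}

# Hypothetical aggregation to a broader, more robust form of the variable.
COMMON_TO_BROAD = {
    "never married": "not currently married",
    "widowed": "not currently married",
    "divorced": "not currently married",
    "married": "currently married",
}


def align(code, concordance):
    """Map a source code to the aggregated common category.

    Returns None when the code has no entry in the concordance.
    """
    common = concordance.get(code)
    return COMMON_TO_BROAD.get(common)
```

Under these assumptions, align("W", EARLY_TO_COMMON) and align(3, LATE_TO_COMMON) yield the same category, so the variable can be compared across the two datasets.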
The standardisation procedure for the Death registrations to Census linkage project involved coding imputed and invalid values for selected variables to a common missing value. These variables included day of birth, month of birth, year of birth, age, sex, year of arrival and marital status. Entire imputed records, created for persons known to exist but from whom no Census form had been received, were removed from the pool of Census records prior to linkage.
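The two operations described above (coding imputed and invalid values to a common missing value, and removing wholly imputed records) can be sketched in Python as follows. This is an illustrative sketch only, not the project's actual procedure: the record layout, the field names, the validity rules, the MISSING code, and the imputed_fields and record_imputed flags are all assumptions, and only three of the listed variables are shown.

```python
# Hypothetical common missing-value code.
MISSING = "MISSING"

# Assumed validity rules for a subset of the variables.
VALID = {
    "day_of_birth": lambda v: isinstance(v, int) and 1 <= v <= 31,
    "month_of_birth": lambda v: isinstance(v, int) and 1 <= v <= 12,
    "sex": lambda v: v in {"M", "F"},
}


def standardise(record):
    """Return a copy of the record with imputed or invalid values set to MISSING."""
    cleaned = dict(record)
    imputed = record.get("imputed_fields", set())
    for field, is_valid in VALID.items():
        if field in imputed or not is_valid(record.get(field)):
            cleaned[field] = MISSING
    return cleaned


def prepare_census_pool(records):
    """Drop wholly imputed records, then standardise the remaining records."""
    return [
        standardise(record)
        for record in records
        if not record.get("record_imputed", False)
    ]
```

Using a single missing value for both imputed and invalid entries means that such values are treated in the same way when records are compared, rather than being allowed to agree or disagree by chance.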