2.1 STANDARDISATION
Before records in two datasets are compared, the contents of each need to be made as consistent as possible. This process is known as 'standardisation' and includes a number of steps such as verification, recoding and re-formatting fields, and parsing text fields (i.e. separating text fields into their components). Additionally, some fields require substantial repair prior to standardisation.
Some variables, such as age, differ between the two datasets in a predictable way, and an adjustment is required to account for this difference. Some variables are coded differently at different points in time, and concordances may be necessary to create variables which align on the two datasets. Variables may also be recoded or aggregated in order to obtain a more robust form of the variable. Standardisation takes place in conjunction with a broader evaluation of the dataset, in which potential linking variables are identified.
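The two adjustments described above can be sketched in code. The following is a minimal illustration only, not the procedure used for the ACLD: it assumes a five-year gap between the two collections (2006 and 2011), and the function names and concordance codes are hypothetical.

```python
from typing import Dict, Optional

# Assumption: the two datasets were collected five years apart (2006 and 2011).
CENSUS_GAP_YEARS = 5

# Hypothetical concordance from an earlier coding scheme to a later one.
# These codes are invented for illustration and do not come from any
# real classification.
EXAMPLE_CONCORDANCE: Dict[str, str] = {
    "A1": "X1",
    "A2": "X1",  # two earlier codes may map to a single later code
    "B1": "Y1",
}


def expected_later_age(earlier_age: Optional[int]) -> Optional[int]:
    """Adjust an age from the earlier dataset to its expected later value."""
    if earlier_age is None:
        return None
    return earlier_age + CENSUS_GAP_YEARS


def apply_concordance(code: Optional[str],
                      concordance: Dict[str, str]) -> Optional[str]:
    """Map an earlier code to the later scheme; unmapped codes become None."""
    if code is None:
        return None
    return concordance.get(code)
```

In this sketch, an age of 30 on the earlier dataset would be compared against 35 on the later one, and codes with no entry in the concordance are treated as missing rather than guessed.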
The standardisation procedure for the ACLD linkage project involved coding imputed and invalid values for selected variables to a common missing value. These variables include day of birth, month of birth, year of birth, age, sex, year of arrival and marital status. Standardisation for hierarchical fields involved collapsing them to higher levels of aggregation, either to minimise disagreement when linking records which may have had a small intercensal change or to allow for potential differences in the coding of the variable. This allows records to agree on broader categories rather than disagree on specific information that may have changed over time or been reported and/or coded inconsistently. An example of this is country of birth. In 2006 a respondent may have been coded to 'Northern Europe', whereas in 2011 they may have reported a specific country such as 'England' or 'Norway'. If left in its original state, a comparison between 'Northern Europe' and 'England' would not agree, even though one is a sub-category of the other. Variables grouped in this manner included country of birth, occupation, field and level of qualification, language spoken and religion.
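The steps described above can be sketched as follows. This is an illustrative example under stated assumptions, not the ACLD implementation: the hierarchical codes ("NE" for 'Northern Europe', "NE-ENG" for 'England', "NE-NOR" for 'Norway'), the imputation flag and all function names are hypothetical.

```python
from typing import Container, Optional

# Common missing value shared by both datasets (an assumption of this sketch).
MISSING = None


def to_common_missing(value: Optional[int],
                      is_imputed: bool,
                      valid_values: Container[int]) -> Optional[int]:
    """Return the value unchanged, or MISSING if it is imputed or invalid."""
    if is_imputed or value is None or value not in valid_values:
        return MISSING
    return value


def collapse(code: Optional[str]) -> Optional[str]:
    """Collapse a hypothetical hierarchical code to its broad group.

    Codes are assumed to take the form "GROUP" or "GROUP-DETAIL".
    """
    if code is None:
        return MISSING
    return code.split("-")[0]


def agree_at_broad_level(code_a: Optional[str],
                         code_b: Optional[str]) -> bool:
    """Two codes agree if both are present and share the same broad group."""
    broad_a, broad_b = collapse(code_a), collapse(code_b)
    if broad_a is MISSING or broad_b is MISSING:
        return False
    return broad_a == broad_b
```

With these hypothetical codes, a record coded "NE" on one dataset and "NE-ENG" on the other agrees after collapsing, whereas a comparison of the uncollapsed codes would not. In this sketch a missing value on either record simply counts as no agreement, which is a simplification.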