2940.0 - Census of Population and Housing: Details of Overcount and Undercount, Australia, 2016 Quality Declaration 
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 27/06/2017   

LINKING AND MATCHING

1 A linking and matching exercise is used to determine whether each PES respondent was counted in the Census (and how many times), whether they were counted in error, or whether they were missed entirely. Linking PES persons to their Census form involved a range of automated and manual processes, focused on finding matches between approximately 114,000 PES person records and over 22 million Census person records (not including records for the imputed persons in non-responding dwellings).

2 The various processes that made up linking and matching in 2016 were:

  • Standardisation
  • Address coding
  • Address text matching
  • Automated Data Linking (ADL)
  • Clerical review using the Match and Search System (MSS).

STANDARDISATION

3 In preparation for ADL, PES data were repaired and standardised through a four-stage process to convert them into a format that could be directly compared with similarly standardised Census data:
  • Data repair – cleaned the data by removing non-alphabetic characters, capitalising the remaining text and removing extra spaces
  • Name standardisation – converted common nicknames, abbreviations, misspellings or variations on a name to their 'origin name' (e.g. Beth, Eliza and Libby were converted to Elizabeth)
  • Address text parsing – cleaned raw address information by cross-referencing the components of an address, such as street name or suburb, with known street names or suburbs
  • Data transformation – ensured that each variable was comparable to its Census counterpart (e.g. ensuring PES numeric identifiers for Indigenous status matched to those of Census).
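The data repair and name standardisation stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ABS implementation: the nickname table `ORIGIN_NAMES` is hypothetical (only the Elizabeth example comes from the text above), and keeping only ASCII letters and spaces is a simplifying assumption.

```python
import re

# Hypothetical nickname table; only the Elizabeth example is taken from the text.
ORIGIN_NAMES = {"BETH": "ELIZABETH", "ELIZA": "ELIZABETH", "LIBBY": "ELIZABETH"}


def repair(text: str) -> str:
    """Data repair: drop non-alphabetic characters, capitalise, remove extra spaces."""
    letters_only = re.sub(r"[^A-Za-z ]", "", text)  # keep ASCII letters and spaces
    return " ".join(letters_only.upper().split())   # split() also collapses runs of spaces


def standardise_name(name: str) -> str:
    """Name standardisation: map each repaired token to its origin name, if known."""
    return " ".join(ORIGIN_NAMES.get(token, token) for token in repair(name).split())


print(standardise_name("  libby-  o'neil "))  # ELIZABETH ONEIL
```

The same functions would be applied to the Census data, so that both sides are compared in an identical form.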

ADDRESS CODING

4 Address information is essential for matching PES respondents to their Census form. PES addresses were divided into two categories:
  • Enumeration Address – the address at which the PES interview took place
  • Search Address – all other addresses, including the usual address of visitors to the PES dwelling, the address at which the PES person was located on Census night, the address at which they were included on a Census form, and any other addresses where the respondent may have been included on a Census form.

5 The AddressCoder@ABS is a web-based application used to geocode each type of address. From this, geographic information was assigned, such as a Census Field Area (CFA), a Mesh Block (MB), or a Statistical Area 1 (SA1) boundary, which were all used during automated data linking of persons.

6 Geocoding via the address coder was relatively resilient to errors in the address text (e.g. character substitution or form-scanning errors) as it needed only to identify the locality and not a specific address or dwelling.

7 Addresses that were unable to be coded automatically via the AddressCoder@ABS application were sent to a processing team for manual coding. This manual process utilised various methods, including mapping software, to thoroughly scrutinise addresses and achieve the most accurate geographic coding possible.

ADDRESS TEXT MATCHING

8 Address text matching was introduced in the 2016 PES and provided an opportunity to identify potential dwelling links based on exact address information. It was used to match a PES address to a specific entry on the ABS Address Register.

9 This exercise was particularly useful for dwelling types that were in scope for linking (e.g. unoccupied dwellings) but unable to be linked via automated data linking, which is person-based. The proposed dwelling link was then fed through to the clerical matching process for confirmation. It should be noted, however, that address text matching was susceptible to errors or missing entries in the address register.
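A minimal sketch of exact address text matching, assuming address text is normalised (punctuation replaced by spaces, capitalised, spaces collapsed) and then looked up in a dictionary standing in for the ABS Address Register. The register entries and identifiers (`AR-0001`, `AR-0002`) are hypothetical. Because the lookup is exact, a variant such as "Street" for "St", or an address missing from the register, yields no proposed link, which is the susceptibility noted above.

```python
import re
from typing import Optional


def address_key(address: str) -> str:
    """Normalise address text: punctuation to spaces, capitalise, collapse spaces."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]", " ", address).upper()
    return " ".join(cleaned.split())


# Hypothetical extract of an address register: normalised text -> register identifier.
REGISTER = {
    address_key("12 Smith St, Braddon ACT 2612"): "AR-0001",
    address_key("3/40 Long Rd, Dubbo NSW 2830"): "AR-0002",
}


def propose_dwelling_link(pes_address: str) -> Optional[str]:
    """Return the register entry whose normalised text matches exactly, else None."""
    return REGISTER.get(address_key(pes_address))


print(propose_dwelling_link("12 smith st braddon act 2612"))       # AR-0001
print(propose_dwelling_link("12 Smith Street, Braddon ACT 2612"))  # None
```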

AUTOMATED DATA LINKING (ADL)

Linking

10 ADL refers to the use of probabilistic linking methods to determine possible links between Census and PES data in an automated fashion, before any clerical matching process begins. ADL was introduced as the primary linking method in the 2011 PES, using the Freely Extensible Biomedical Record Linkage (FEBRL) software, and was used again in 2016.

11 The automated linking process used a range of personal and address characteristics to evaluate the chance that a PES record and a Census record were for the same person. The method generated large numbers of candidate links and then used a process of elimination to filter down to genuine matches.

12 Seven different linking runs were used in ADL to compare PES and Census records, each focused on a slightly different combination of name, addresses and demographic variables. At the beginning of each run, a list of PES and Census records was obtained by selecting a subset of the PES and Census datasets which agreed on a small number of variables (e.g. the same SA1, date of birth, and surname). This process, called 'blocking', reduced the number of Census and PES records to compare within a run, in order to increase the likelihood of proposing good quality links.

13 The 2016 PES used a set of blocking variables that were comparable to those used in 2011, allowing for updated geographical classifications. The seven linking runs used various combinations of the following:
  • SA1
  • CFA
  • State
  • Postcode
  • Standardised name (blocking on initials)
  • Birth date (day, month and year)
  • Sex
  • Registered marital status.
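Blocking can be sketched as grouping both datasets on a key built from the blocking variables and comparing only records that share a key. The record fields (`sa1`, `dob`, `initial`) and their values are hypothetical, and the two runs shown are illustrative rather than the seven runs actually used. The second run drops SA1, so it finds P2's Census record (C3), which was coded to a different SA1 and is therefore missed by the first run.

```python
from collections import defaultdict
from itertools import product


def block(records, keys):
    """Group records by their values on the blocking variables."""
    blocks = defaultdict(list)
    for record in records:
        blocks[tuple(record[k] for k in keys)].append(record)
    return blocks


def candidate_pairs(pes, census, keys):
    """Yield only the PES/Census pairs that agree on every blocking variable."""
    census_blocks = block(census, keys)
    for key, pes_group in block(pes, keys).items():
        yield from product(pes_group, census_blocks.get(key, []))


# Hypothetical records; field names and codes are illustrative only.
pes = [
    {"id": "P1", "sa1": "10101", "dob": "1980-02-14", "initial": "S"},
    {"id": "P2", "sa1": "10102", "dob": "1975-07-01", "initial": "N"},
]
census = [
    {"id": "C1", "sa1": "10101", "dob": "1980-02-14", "initial": "S"},
    {"id": "C2", "sa1": "10101", "dob": "1980-02-14", "initial": "T"},
    {"id": "C3", "sa1": "10103", "dob": "1975-07-01", "initial": "N"},
]

# Two runs with different blocking variables pick up different candidates.
run_a = [(p["id"], c["id"]) for p, c in candidate_pairs(pes, census, ("sa1", "dob", "initial"))]
run_b = [(p["id"], c["id"]) for p, c in candidate_pairs(pes, census, ("dob", "initial"))]
print(run_a)  # [('P1', 'C1')]
print(run_b)  # [('P1', 'C1'), ('P2', 'C3')]
```

Each run compares far fewer pairs than a full cross-product of the two datasets; using several runs with different blocking variables reduces the chance that an error in any single variable excludes a true match.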

14 Potential links were generated by assigning weights to reflect the level of agreement for combinations of linking variables within each block. Large positive weights indicated probable matches, while large negative weights indicated probable non-matches.
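The text above does not give the weighting formula. One standard approach in probabilistic linkage, the Fellegi–Sunter model, gives each variable an agreement weight of log2(m/u) and a disagreement weight of log2((1 − m)/(1 − u)), where m is the probability that the variable agrees for a true match and u is the probability that it agrees by chance for a non-match. The sketch below assumes that model; the m- and u-values and the records are hypothetical.

```python
import math

# Hypothetical (m, u) probabilities per linking variable:
# m = P(variable agrees | same person), u = P(variable agrees | different people).
PARAMS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_date": (0.97, 0.003),
    "sex": (0.98, 0.5),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 agreement weight if the variable agrees, disagreement weight otherwise."""
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))


def total_weight(pes: dict, census: dict) -> float:
    """Sum per-variable weights: large positive suggests a match, large negative a non-match."""
    return sum(field_weight(pes[v] == census[v], m, u) for v, (m, u) in PARAMS.items())


a = {"surname": "SMITH", "first_name": "ELIZABETH", "birth_date": "1980-02-14", "sex": "F"}
b = {"surname": "NGUYEN", "first_name": "DAVID", "birth_date": "1975-07-01", "sex": "M"}
print(round(total_weight(a, a), 1))  # strongly positive: agrees on every variable
print(round(total_weight(a, b), 1))  # strongly negative: disagrees on every variable
```

Rare agreements (such as a full birth date) carry more weight than common ones (such as sex), which is why the weights separate probable matches from probable non-matches better than a simple count of agreeing variables.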

Consolidation of ADL links

15 A series of processes was undertaken following the ADL runs to clean and consolidate the proposed links.

16 The Collect, Analyse, Reduce, De-duplicate and Systematise (CARDS) process identified and rated the most plausible links from each ADL run for all PES respondents. The process then combined the links from all ADL runs and removed any duplicates, with links from earlier runs taking precedence.
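The combine-and-deduplicate step can be sketched as follows, assuming each run's output is a list of links and that a "duplicate" is the same PES–Census person pair proposed by more than one run. The `Link` type and the weight cutoff are hypothetical, and the rating logic of CARDS itself is not reproduced.

```python
from typing import NamedTuple


class Link(NamedTuple):
    pes_id: str
    census_id: str
    weight: float
    run: int


def consolidate(runs: list[list[Link]], min_weight: float) -> list[Link]:
    """Combine links across runs; the first run to propose a pair keeps it."""
    kept: dict[tuple[str, str], Link] = {}
    for run_links in runs:  # runs are processed in order, earliest first
        for link in run_links:
            if link.weight < min_weight:
                continue  # discard implausible links (cutoff is hypothetical)
            kept.setdefault((link.pes_id, link.census_id), link)
    return list(kept.values())


run1 = [Link("P1", "C1", 12.0, 1), Link("P2", "C9", -3.0, 1)]
run2 = [Link("P1", "C1", 9.5, 2), Link("P2", "C4", 8.0, 2)]
for link in consolidate([run1, run2], min_weight=0.0):
    print(link)
```

Here the P1–C1 pair keeps its run 1 version, and P2's weak run 1 link is dropped in favour of the run 2 candidate.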

17 The final step of the automated linking process was to group person links together into dwelling units when they were co-located in the same PES-Census dwelling pair, through a process called Dwelling Link Rating (DLR). This had several benefits including:
  • finding additional candidate person links by upgrading lower quality links if they were co-located with high quality person links
  • grouping links by whole dwellings in preparation for clerical review. The MSS was more effective when handling person links in the same dwelling, as the processor could examine the entire household and their corresponding Census form at the same time.
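Dwelling Link Rating can be sketched as grouping person links by their PES–Census dwelling pair and upgrading a lower quality link when a high quality link exists in the same pair. The thresholds `HIGH` and `LOW`, the tuple layout and the rating labels are hypothetical; the text above does not state the actual rules.

```python
from collections import defaultdict

HIGH, LOW = 10.0, 0.0  # hypothetical weight thresholds


def rate(weight: float, has_anchor: bool) -> str:
    """Rate one person link, upgrading it if a high quality link shares its dwelling pair."""
    if weight >= HIGH:
        return "high"
    if has_anchor and weight >= LOW:
        return "upgraded"
    return "low"


def dwelling_link_rating(links):
    """links: (pes_dwelling, census_dwelling, pes_person, census_person, weight) tuples."""
    by_pair = defaultdict(list)
    for pes_dw, cen_dw, pes_p, cen_p, weight in links:
        by_pair[(pes_dw, cen_dw)].append((pes_p, cen_p, weight))

    rated = {}
    for pair, persons in by_pair.items():
        has_anchor = any(w >= HIGH for _, _, w in persons)
        rated[pair] = [(p, c, rate(w, has_anchor)) for p, c, w in persons]
    return rated


links = [
    ("D1", "E1", "P1", "C1", 14.0),
    ("D1", "E1", "P2", "C2", 3.0),  # weak on its own, but shares D1-E1 with P1
    ("D2", "E7", "P3", "C5", 3.0),  # equally weak, with no high quality neighbour
]
for pair, persons in dwelling_link_rating(links).items():
    print(pair, persons)
```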

18 The proposed dwelling links were then categorised into the following:
  • Platinum – the dwelling link was of sufficiently high quality that it could be immediately confirmed and did not require clerical review
  • Silver – the dwelling link was of moderate quality (e.g. there were high quality links for some persons in a dwelling but not others) and required clerical review to confirm or reject the link
  • Tin – the link was of low quality and was unlikely to be a true match, therefore clerical review was required to search for matches for these dwellings and persons without the assistance of ADL.
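Assuming the number of high quality person links in a dwelling pair is already known, the categories could be assigned with a rule like the one below. The rule (all PES persons linked at high quality for Platinum, some for Silver, none for Tin) is a hypothetical reading of the descriptions above, not the published ABS criteria.

```python
def categorise(high_quality_links: int, pes_persons: int) -> str:
    """Assign a dwelling link category from counts of high quality person links."""
    if pes_persons > 0 and high_quality_links == pes_persons:
        return "Platinum"  # every PES person has a high quality link
    if high_quality_links > 0:
        return "Silver"    # some persons linked well, others not
    return "Tin"           # no high quality links at all


def needs_clerical_review(category: str) -> bool:
    """Silver and Tin dwellings go to the MSS (Platinum QA sampling is handled separately)."""
    return category in {"Silver", "Tin"}


for high, total in [(3, 3), (1, 3), (0, 2)]:
    category = categorise(high, total)
    print(high, total, category, needs_clerical_review(category))
```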

19 All PES dwellings with either Silver or Tin links were sent for clerical review. A small percentage (5%) of Platinum links were also clerically reviewed for quality assurance purposes.

MATCH AND SEARCH SYSTEM (MSS)

Processing in the MSS

20 While ADL is a critical component of PES linking and matching, it cannot entirely replace the traditional clerical decision-making process. Clerical judgment will always be required to resolve the more complex or ambiguous cases and provides a means of quality assuring the automated processes. The MSS is used for this purpose.

21 The MSS allows processing staff to manually search, view and compare PES and Census data. There are two phases of processing in the MSS:
  • Evaluating a candidate link provided by ADL, and confirming or rejecting the link between PES and Census for both dwellings and people
  • Searching clerically for a link in the absence of a good ADL candidate link, by looking for people on Census forms at alternative search addresses provided by the PES respondent.

22 To evaluate ADL links, the processor first needed to confirm whether the ADL-proposed dwelling link was correct. Once the dwelling link was confirmed, the Census person records for that dwelling were compared with the PES person records, using information such as Name, Sex, Date of birth, Age, Registered marital status, Indigenous status and Country of birth. The extent to which these variables agreed between the PES and the Census determined whether the pair was given a match or non-match status.

23 Where the ADL-proposed dwelling link was rejected, or if no dwelling link was proposed by ADL, processing staff undertook an intensive search. This search focused on the nominated (and surrounding) CFAs for all search addresses provided by the respondent during the PES interview, in order to locate possible Census forms on which that person was included. If a dwelling match was found, staff then rated the candidate person matches within that dwelling as described above.
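The list of CFAs to search can be sketched as each nominated CFA followed by its neighbours, with duplicates removed. Treating "surrounding" as immediately adjacent CFAs and representing adjacency as a dictionary are assumptions; the CFA codes are hypothetical.

```python
def cfas_to_search(nominated: list[str], neighbours: dict[str, list[str]]) -> list[str]:
    """Return nominated CFAs and their adjacent CFAs, each once, in search order."""
    ordered: list[str] = []
    for cfa in nominated:
        for candidate in [cfa, *neighbours.get(cfa, [])]:
            if candidate not in ordered:
                ordered.append(candidate)
    return ordered


# Hypothetical adjacency between CFAs.
neighbours = {"CFA1": ["CFA2", "CFA3"], "CFA5": ["CFA3", "CFA6"]}
print(cfas_to_search(["CFA1", "CFA5"], neighbours))
# ['CFA1', 'CFA2', 'CFA3', 'CFA5', 'CFA6']
```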

24 Some redevelopment of the MSS was necessary in 2016 to ensure the system aligned with the changes made to the 2016 Census enumeration model. During this redevelopment work, some system enhancements were made to further strengthen and streamline the clerical matching processes and outcomes; however, the system is considered to be largely comparable with the 2011 version.

MSS Quality Assurance and Adjudication Processes

25 Quality assurance (QA) procedures were used to ensure the accuracy of MSS outcomes. For example, all records sent to the MSS were processed twice. The QA workload was processed by a different processor from the original, and workloads carried no identifier marking them as original or QA.

26 Where the original and the QA match status corresponded, the original match status was accepted. Where there was a discrepancy between the original match status and the QA match status (at either the dwelling or person level), the records were flagged for adjudication by a senior processor (adjudicator) who reviewed all information and determined which match status was correct. Where both the original and QA records were deemed to be incorrect, the adjudicator reprocessed the record.
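The rule for combining the original and QA outcomes can be sketched as below. The adjudicator is modelled as a function supplied by the caller, standing in for the senior processor's review, which may accept either status or reprocess the record; the status labels and the example adjudicator are hypothetical. Counting referrals also gives a simple measure of how often processors disagree.

```python
from typing import Callable

Adjudicator = Callable[[str, str], str]


def resolve(original: str, qa: str, adjudicate: Adjudicator) -> str:
    """Final match status for one record that was processed twice."""
    if original == qa:
        return original              # statuses agree: accept the original
    return adjudicate(original, qa)  # statuses differ: refer to the adjudicator


def resolve_all(pairs: list[tuple[str, str]], adjudicate: Adjudicator) -> tuple[list[str], int]:
    """Resolve every (original, QA) pair and count how many needed adjudication."""
    finals = [resolve(o, q, adjudicate) for o, q in pairs]
    referred = sum(o != q for o, q in pairs)
    return finals, referred


# Hypothetical adjudicator that, after review, sides with the QA outcome.
finals, referred = resolve_all(
    [("match", "match"), ("match", "non-match"), ("non-match", "non-match")],
    adjudicate=lambda original, qa: qa,
)
print(finals, referred)  # ['match', 'non-match', 'non-match'] 1
```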

27 The adjudication process was also useful in identifying potential issues or areas where processing staff were having difficulty. This allowed ongoing feedback to be provided to the MSS staff and contributed to the overall quality of PES processing.

28 A 5% sub-sample of Platinum ADL linked dwellings and persons was also processed fully through the MSS as a quality measure of the ADL. Reprocessing these records confirmed the robustness and high quality of the ADL links. Interrogation of any high quality ADL links that were rejected by the processing staff was also undertaken as a further quality assurance measure.
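The text does not say how the 5% sub-sample was selected. A minimal sketch, assuming simple random sampling without replacement and a fixed (arbitrary) seed so the selection can be reproduced; the dwelling identifiers are hypothetical.

```python
import random


def qa_subsample(platinum_ids: list[str], fraction: float = 0.05, seed: int = 2016) -> list[str]:
    """Draw a simple random sample, without replacement, of Platinum dwelling links."""
    rng = random.Random(seed)                # arbitrary fixed seed makes the draw reproducible
    k = round(len(platinum_ids) * fraction)  # sample size, e.g. 5% of links
    return rng.sample(platinum_ids, k)


platinum = [f"D{i:05d}" for i in range(1000)]  # hypothetical dwelling identifiers
sample = qa_subsample(platinum)
print(len(sample))  # 50
```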

DISCRETE COMMUNITY PROCESSING

29 As in 2011, ADL was not used for processing the Discrete Communities sample in 2016. The low quality of geocoded data for these areas, as well as the ability of PES respondents to provide alternative names, would have complicated the ADL process. Instead, the sample underwent full clerical searching and matching in the MSS. System enhancements made to the MSS enabled more thorough and efficient processing to be completed.

30 The process involved searching the entire community for a person match, rather than just searching within a single dwelling. Person matching in Discrete Communities used the same rules for determining a match as in the General Population, but allowed for the use of up to two alternative names for each person when matching on name.
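Name matching with alternative names can be sketched as accepting a Census record if its standardised name equals the PES person's main name or one of up to two alternative names, searching every Census record in the community rather than a single dwelling. Only the name rule is shown; the other comparison variables would be assessed as for the General Population. Field names and values are hypothetical.

```python
def community_name_candidates(pes_person: dict, community_census: list[dict]) -> list[str]:
    """IDs of Census persons anywhere in the community whose name matches the PES person."""
    names = {pes_person["name"], *pes_person["alt_names"][:2]}  # at most two alternatives
    return [c["id"] for c in community_census if c["name"] in names]


pes_person = {"name": "ALEX KING", "alt_names": ["SANDY KING", "ALEX BROWN"]}
community_census = [
    {"id": "C1", "dwelling": "H3", "name": "SANDY KING"},
    {"id": "C2", "dwelling": "H9", "name": "ALEX BROWN"},
    {"id": "C3", "dwelling": "H9", "name": "SAM KING"},
]
print(community_name_candidates(pes_person, community_census))  # ['C1', 'C2']
```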

CONFIDENCE OF MATCH DECISIONS

31 Outcomes from linking and matching processes underwent a high level of scrutiny and quality assurance in 2016, to ensure the PES did not miss links for PES persons who were actually counted in the Census, and did not link a PES person to a Census record in error.

32 The final General Population match rate (persons with at least one link to the Census) was lower in 2016 than in 2011 (91.7% compared with 92.7%). This change was driven by a reduction in Census response rates. However, a higher proportion of matched persons were linked by ADL at Platinum quality, without requiring clerical review, in 2016 than in 2011 (65.1% compared with 59.8%). This is likely to be the result of improved capture of text fields, such as names, from increased online Census uptake.

Matching Outcomes, 2011–2016

                                                      2011            2016
                                                no.      %      no.      %
GENERAL POPULATION
Persons matched(a)                           87 645   92.7  100 783   91.7
  ADL Platinum (not clerically reviewed)     52 398   59.8   65 576   65.1
  ADL Silver (clerically reviewed)           34 653   39.5   33 776   33.5
  Intensive search                              594    0.7    1 431    1.4
Persons not matched                           6 894    7.3    9 142    8.3
Total Persons                                94 539  100.0  109 925  100.0
DISCRETE COMMUNITY
Persons matched(b)                            2 528   87.1    3 394   79.2
Persons not matched                             373   12.9      892   20.8
Total Persons                                 2 901  100.0    4 286  100.0

(a) A person can be matched more than once, therefore the total number of matches does not sum to the total number of matched persons.
(b) Matches for the Discrete Community sample were made via the MSS only.
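The General Population figures in the table can be reproduced from its counts. The ADL Platinum, ADL Silver and Intensive search rows are percentages of persons matched (they sum to 100.0), while the matched and not matched rows are percentages of total persons. The sketch below recomputes the two rates discussed in paragraph 32.

```python
# General Population counts transcribed from the Matching Outcomes table.
COUNTS = {
    2011: {"platinum": 52_398, "silver": 34_653, "intensive": 594, "not_matched": 6_894},
    2016: {"platinum": 65_576, "silver": 33_776, "intensive": 1_431, "not_matched": 9_142},
}


def summarise(c: dict) -> dict:
    """Match rate (% of total persons) and Platinum share (% of matched persons)."""
    matched = c["platinum"] + c["silver"] + c["intensive"]
    total = matched + c["not_matched"]
    return {
        "matched": matched,
        "total": total,
        "match_rate": round(100 * matched / total, 1),
        "platinum_share": round(100 * c["platinum"] / matched, 1),
    }


for year, counts in COUNTS.items():
    print(year, summarise(counts))
# 2011 {'matched': 87645, 'total': 94539, 'match_rate': 92.7, 'platinum_share': 59.8}
# 2016 {'matched': 100783, 'total': 109925, 'match_rate': 91.7, 'platinum_share': 65.1}
```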