Australian Bureau of Statistics
2940.0 - Census of Population and Housing - Details of Undercount, 2011
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 21/06/2012
LINKING AND MATCHING
The PES allowed up to seven search addresses to be recorded; however, the greatest number of search addresses recorded in the field for a single respondent in 2011 was two. Search addresses comprised around 10% of the total number of addresses recorded in the PES.
Table 16 shows that for every enumeration state, 70-85% of search addresses were located within the same state as the enumeration address (with the exception of the ACT), which allowed PES respondents to be linked to their Census location in a state-based run of ADL. The remainder were linked in non-state-based ADL runs, and were distributed predominantly throughout the three most populous states of New South Wales, Victoria and Queensland.
Search address data were collected directly from PES respondents and related to locations at which they were present up to two months previously. As such, the detail and accuracy of this information varied, ranging from perfectly spelt out addresses with street number, suburb, city and postcode, to 'vague addresses' such as "a motel in Sydney". Therefore, in order to code search addresses successfully, an additional two-stage process was carried out, as detailed below.
Address repair was conducted on all search addresses, that is, any address given in the PES that differed from the enumeration address. This was done manually by a team of coders who reviewed the address text fields and amended them through a variety of techniques. Quality assurance was then conducted.
Address coding was undertaken after address repair with the aim of identifying the correct geographic areas (Meshblock [MB], CLW and SA1) for all addresses (enumeration and search addresses), according to the ASGS. This was done by first running all addresses through the AddressCoder@ABS application. Quality assurance for this automated process involved a complete review of the addresses that were amended by the automated coder in order to fit into a geographic classification, and retention of all original addresses. Those records which were not automatically coded were then sent to a coding team for manual processing. This manual process utilised various methods, including mapping software, to thoroughly scrutinise addresses and achieve the most accurate geographic coding possible. Further quality assurance was then undertaken.
INPUT EDITING - ITEM DERIVATIONS
Most data on the PES file were of a sufficient quality to feed into both linking and matching processes and later output processing, without further detailed editing. However, certain validation processes highlighted issues that required amendments to be made.
Derivations were used to correct Age/Date of Birth (DOB) and Marital Status responses. Where one data field was missing (e.g. Age), but a similar one was available (e.g. DOB), the missing field was derived and populated. Derivations were also created by examining individual 'person level' records to derive 'dwelling level' information for the relevant dwelling (e.g. the number of Usual Residents and Visitors in the dwelling, or whether the dwelling contains any Indigenous respondents).
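The two kinds of derivation described above can be illustrated with a minimal sketch. The reference date (Census night, 9 August 2011), the field names and the function names are assumptions made for illustration only; the source does not state the rules actually applied.

```python
from datetime import date
from typing import Dict, List, Optional

# Assumed reference date (Census night 2011); the PES may have used another date.
REFERENCE_DATE = date(2011, 8, 9)


def derive_age(dob: Optional[date], age: Optional[int],
               ref: date = REFERENCE_DATE) -> Optional[int]:
    """Keep a reported age; otherwise derive it from date of birth if available."""
    if age is not None:
        return age
    if dob is None:
        return None  # nothing to derive from
    # Subtract one year if the birthday has not yet occurred in the reference year.
    had_birthday = (ref.month, ref.day) >= (dob.month, dob.day)
    return ref.year - dob.year - (0 if had_birthday else 1)


def derive_dwelling_items(persons: List[Dict]) -> Dict:
    """Roll person-level records up to dwelling-level counts and flags.

    The keys 'resident_status' and 'indigenous' are hypothetical field names.
    """
    return {
        "usual_residents": sum(1 for p in persons if p["resident_status"] == "usual"),
        "visitors": sum(1 for p in persons if p["resident_status"] == "visitor"),
        "any_indigenous": any(p.get("indigenous", False) for p in persons),
    }
```

A record with a missing Age and a DOB of 10 August 1980 would, under these assumptions, be assigned an age of 30.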
INPUT EDITING - STANDARDISATION
In preparation for ADL, PES data were repaired and standardised through a three-stage process, converting them into a format that could then be compared with similarly standardised Census data through both automated and manual systems.
AUTOMATED DATA LINKING - LINKING
Automated Data Linking refers to the use of probabilistic linking methods to determine possible links between Census and PES data in an automated fashion, and was used as the primary linking method in 2011. Its introduction followed an evaluation exercise undertaken by linking experts within the ABS after the 2006 PES.
ADL uses a range of personal and address characteristics to evaluate the likelihood that a PES record and a Census record pertain to the same individual. The software used in both the 2006 quality study and the 2011 PES was Freely Extensible Biomedical Record Linkage (FEBRL), which was developed at the Australian National University.
ADL provided the opportunity to match persons in the 2011 PES with those in the 2011 Census who would have previously been too difficult to match, given the constraints of prior technology and processes. The key gains in matching effectiveness and efficiency provided by ADL in 2011 included:
A number of different linking runs were used in 2011 to compare PES and Census records, each of which focused on a slightly different combination of name, address and demographic variables. At the beginning of each run, a list of PES and Census records was obtained by selecting a subset of the PES and Census datasets based upon agreement on a small number of variables. This process, called 'blocking', was used to stratify identified links (i.e. links at earlier runs took precedence), and to reduce the quantity of poor quality links returned in each run. Table 17 shows the ADL runs and the relevant 'blocking' fields used in each run.
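As an illustration of blocking across several runs, the following sketch pairs only those PES and Census records that agree on every blocking field of a run, and records the earliest run in which each pair appears so that earlier runs take precedence. The field names and the two example runs are hypothetical; the blocking fields actually used are those listed in Table 17.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def block_pairs(pes: List[Record], census: List[Record],
                block_fields: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (pes_id, census_id) pairs that agree on every blocking field."""
    index = defaultdict(list)
    for c in census:
        key = tuple(c.get(f) for f in block_fields)
        if None not in key:  # records missing a blocking field are skipped
            index[key].append(c["id"])
    pairs = []
    for p in pes:
        key = tuple(p.get(f) for f in block_fields)
        if None in key:
            continue
        for census_id in index.get(key, []):
            pairs.append((p["id"], census_id))
    return pairs


def run_passes(pes: List[Record], census: List[Record],
               passes: Sequence[Sequence[str]]) -> Dict[Tuple[str, str], int]:
    """Map each candidate pair to the first run that produced it."""
    first_run: Dict[Tuple[str, str], int] = {}
    for run_no, fields in enumerate(passes, start=1):
        for pair in block_pairs(pes, census, fields):
            first_run.setdefault(pair, run_no)  # earlier runs take precedence
    return first_run
```

Only pairs within a block are compared in detail, which keeps the number of comparisons far below the full PES-by-Census cross product.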
Potential links were then assessed by assigning weights that reflected the level of agreement on selected data items from the two records. Large positive weights indicated probable matches, while large negative weights were observed for probable non-matches. These weights were then grouped and organised in the processes of CARDS and DLR, which we now describe.
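The source does not give the weight formula. A common approach in probabilistic linking is the Fellegi-Sunter model, in which each compared field contributes log2(m/u) when the two records agree and log2((1-m)/(1-u)) when they disagree, where m and u are the probabilities of agreement among true matches and true non-matches respectively. The sketch below assumes that model; the m and u values and field names are invented for illustration.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical (m, u) probabilities per field; not taken from the PES.
PARAMS: Dict[str, Tuple[float, float]] = {
    "surname": (0.95, 0.01),
    "sex": (0.98, 0.50),
    "birth_year": (0.90, 0.02),
}


def field_weight(m: float, u: float, agrees: bool) -> float:
    """Agreement weight is positive when m > u; disagreement weight is negative."""
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))


def pair_weight(pes: Dict[str, Optional[str]], census: Dict[str, Optional[str]],
                params: Dict[str, Tuple[float, float]] = PARAMS) -> float:
    """Sum the field weights for one PES-Census record pair."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = pes.get(field), census.get(field)
        # A missing value is treated as disagreement in this simplified sketch.
        agrees = a is not None and a == b
        total += field_weight(m, u, agrees)
    return total
```

Under these assumptions a pair agreeing on every field receives a large positive total weight, and a pair disagreeing on every field a large negative one.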
AUTOMATED DATA LINKING - CARDS AND DLR
Important to the effective use of ADL was a series of processes run after ADL output was obtained. The Collect, Analyse, Reduce, De-duplicate and Systematise (CARDS) process collated, processed, identified and rated the most plausible links from each ADL run for all PES respondents. The process then combined the person links from all ADL runs and removed any duplicates. The resulting output was a single numeric 'Person Link Rating' (PLR) for each individual linked pair (a PES respondent and a Census respondent) ranging from 0.1 to 10.0 based upon agreement on various characteristics.
Person links were then grouped into Platinum, Silver and Tin categories, based on their PLRs.
The CARDS process concluded by identifying and rating dwelling links through the Dwelling Link Rating system. In order to identify dwelling links, all person links within one PES dwelling were grouped together into a 'dwelling'. Dwelling links were then created between that PES dwelling and the Census dwelling(s) of the linked Census respondents. A 'Dwelling Link Rating' (DLR) was then assigned to each dwelling link based on the number of people linked between the PES and Census dwellings proportional to the number (if any) that were not linked, and the PLRs of the links.
Similar to the person links, dwelling links were then stratified into Platinum, Silver and Tin categories based upon their DLRs, allowing strong links (e.g. those with many person links and high PLRs) to be investigated before weaker links (e.g. with few person links and low PLRs). For a dwelling link to be rated as Platinum, all its persons had to have a Platinum PLR and be linked to Census persons within a single dwelling. If there were missing people, in either the PES or Census dwelling, or not every person had a Platinum link, the maximum rating the dwelling could be assigned was Silver. As with person links, the remainder of dwellings were placed into either Silver or Tin, based on the quality of the person links within.
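The Platinum rule for dwelling links can be sketched as follows. The PLR cut-offs and the rule used to separate Silver from Tin dwellings are hypothetical, since the source gives neither; only the Platinum conditions follow the description above.

```python
from typing import List, Tuple

# Hypothetical PLR cut-offs on the 0.1 to 10.0 scale; the real values are not published here.
PLATINUM_MIN = 8.0
SILVER_MIN = 4.0


def person_category(plr: float) -> str:
    """Assign a person link to a category from its Person Link Rating."""
    if plr >= PLATINUM_MIN:
        return "Platinum"
    if plr >= SILVER_MIN:
        return "Silver"
    return "Tin"


def dwelling_category(links: List[Tuple[float, str]],
                      pes_size: int, census_size: int) -> str:
    """Categorise a dwelling link.

    links: one (plr, census_dwelling_id) per linked PES person.
    pes_size, census_size: persons in the PES dwelling and the Census dwelling.
    """
    if not links:
        return "Tin"
    cats = [person_category(plr) for plr, _ in links]
    one_dwelling = len({dwelling for _, dwelling in links}) == 1
    nobody_missing = len(links) == pes_size == census_size
    if one_dwelling and nobody_missing and all(c == "Platinum" for c in cats):
        return "Platinum"
    # Assumed tie-break: Silver if at least half the person links are Silver or better.
    good = sum(c != "Tin" for c in cats)
    return "Silver" if good * 2 >= len(cats) else "Tin"
```

A dwelling with two Platinum person links but a third, unlinked Census resident would be capped at Silver under this sketch.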
PROCESSING IN THE PES MATCH AND SEARCH SYSTEM (MSS)
While ADL was the next step in the evolution and continual improvement of PES processing, ADL could not entirely replace the clerical decision-making process that had previously been at the core of PES processing. Clerical judgment will always be required to resolve the more complex or ambiguous cases and to quality assure automated processes. Some adjustments to the clerical match and search processes were necessary in 2011 to ensure that the relative strengths of both ADL and the MSS were fully realised.
The MSS was the main PES clerical review facility and was specifically built for PES processing in 2006. In 2011, the MSS again allowed processing staff to clerically search, view, compare, and record matches between PES and Census data. PES processing staff used the MSS to record clerical matches of dwellings and people between PES and Census, and to clerically search for people on Census forms at alternative addresses provided in the PES. In 2011, it was also used to assure the quality of ADL output.
The initial phase of MSS processing involved confirming whether the ADL output was correct. Once a dwelling link was confirmed, the Census person records for that dwelling were clerically compared with the PES person records. The information compared included name, sex, date of birth, age, marital status, Indigenous status and country of birth. The extent to which each of these variables was the same, in both the PES and the Census, determined the ADL match status of the pair and the level of match.
AUTOMATED DATA LINKING - LINK UPGRADING
Link Upgrading was a process of secondary examination after the main runs of ADL and MSS clerical review were completed for each state. Once MSS had been run on the Silver links for each state, the highest-rated Tin links for those PES people who were not matched were extracted (i.e. effectively upgraded) and entered into a second run of MSS processing.
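The selection step in Link Upgrading can be sketched as a simple "best remaining link per unmatched person" query. The tuple layout and function name are illustrative assumptions, and ties are resolved here by keeping the first link encountered, which the source does not specify.

```python
from typing import Dict, Iterable, Set, Tuple


def select_links_to_upgrade(
    tin_links: Iterable[Tuple[str, str, float]],
    matched_pes_ids: Set[str],
) -> Dict[str, Tuple[str, float]]:
    """For each unmatched PES person, keep the highest-rated Tin link.

    tin_links: (pes_id, census_id, plr) triples.
    Returns {pes_id: (census_id, plr)} for the links sent to the second MSS run.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for pes_id, census_id, plr in tin_links:
        if pes_id in matched_pes_ids:
            continue  # already matched in the main ADL/MSS runs
        if pes_id not in best or plr > best[pes_id][1]:
            best[pes_id] = (census_id, plr)
    return best
```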
INTENSIVE SEARCH ACTIVITIES
Once all ADL links were reviewed, the final phase of MSS processing was to conduct an intensive clerical search for persons and dwellings not matched as a result of ADL-enabled processing. This was done by searching CLWs (and neighbouring CLWs) for addresses provided by respondents during the PES interview (search addresses), in order to locate possible Census forms where that person was included. This followed 2006 methodology, which is described in Census of Population and Housing - Undercount, 2006 (cat. no. 2940.0) and Census of Population and Housing - Details of Undercount, 2006 (cat. no. 2940.0).
MSS QUALITY ASSURANCE AND ADJUDICATION PROCESSES
To ensure the accuracy of MSS processing, quality assurance (QA) procedures were used in the match and search process whereby all PES records processed in MSS were processed a second time by a different clerk. There was no identifier on the workloads that allowed the PES processors to know whether they were processing an 'original' or a QA workload. Where the initial and the QA processing outcomes corresponded, the initial match status was accepted. Where there was a discrepancy between the initial match status and the QA match status, the records were flagged for adjudication by a senior officer who reviewed all information and determined which match status was correct. Where both the initial and QA records were deemed to be inaccurate, the adjudicator reprocessed the record.
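The double-processing rule described above reduces to a small decision function: agreement between the two clerks is accepted, and any discrepancy is referred to an adjudicator. In this sketch the adjudicator is a hypothetical callback that returns the final status, whether that is one of the two outcomes or a reprocessed result.

```python
from typing import Callable, Tuple


def resolve_match_status(
    initial: str,
    qa: str,
    adjudicate: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Return (final_status, source) for one PES record.

    source is 'agreed' when both clerks gave the same status,
    otherwise 'adjudicated'.
    """
    if initial == qa:
        return initial, "agreed"
    # Discrepancy: a senior officer reviews all information and decides.
    return adjudicate(initial, qa), "adjudicated"
```

Because clerks could not tell an 'original' workload from a QA workload, the two inputs can be treated as independent assessments of the same record.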
The QA process was also useful in identifying potential processing issues or areas where processors were having difficulty. This allowed ongoing feedback to be provided to the PES processors and contributed to the overall quality assurance of PES processing.
DISCRETE INDIGENOUS COMMUNITY PROCESSING
MSS processing for discrete Indigenous communities followed the 2006 approach and involved searching the entire community for a person match, rather than just searching within a single dwelling. Person matching in discrete Indigenous communities used the same rules for determining a match as in the mainstream component, but allowed for the use of up to two alternate names for each person when matching on name.
CONFIDENCE OF MATCH DECISIONS
Table 18 shows the matching outcomes from the 2011 PES linking and matching processing. Of the 87,645 total mainstream matches, 52,398 (or 59.8%) were matched without clerical review, 34,653 (or 39.5%) were matched after clerical review of ADL links, with the remaining 594 (or 0.7%) matched as a result of intensive search processing.
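The quoted percentages can be checked by summing the three component counts and expressing each as a share of that sum. The dictionary keys are descriptive labels chosen for this sketch, not Table 18 headings.

```python
# Component counts as quoted in the paragraph above.
counts = {
    "without_clerical_review": 52_398,
    "after_clerical_review_of_adl_links": 34_653,
    "intensive_search": 594,
}

# Total of the three components.
total = sum(counts.values())

# Share of each component, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]:,} ({pct}%)")
print(f"total: {total:,}")
```

The three components sum to 87,645, and the shares of that sum round to 59.8%, 39.5% and 0.7%.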
STATISTICAL IMPACT STUDY PROCESSING
In order to assess the impact of ADL on 2011 PES estimates, a Statistical Impact Study was conducted during linking and matching processing. For further information see the Statistical Impact of ADL Technical Note (in Explanatory Notes).
This page last updated 20 June 2012