Appendix C Accuracy of the Census and the Post Enumeration Survey

Report on the quality of 2021 Census data: Statistical Independent Assurance Panel to the Australian Statistician

An independent view of the quality of statistical outputs from the 2021 Census of Population and Housing

Released: 28/06/2022 10:00am AEST

Overview

This Appendix provides an outline of an accuracy framework the Panel created to assist in its assessment of the accuracy of the Census and the Post Enumeration Survey. Furthermore, it provides an assessment of the accuracy of the Census and the Post Enumeration Survey according to this framework in Sections C.3 and C.4.

C.1 Census accuracy framework

The first component of the accuracy framework focuses on the quality of the Census data and identifies four main groups of potential errors that can impact on quality. These are:

1. Non-response

Non-response error occurs when people refuse to participate in the Census or do not return their Census forms in time for their data to be processed. Key considerations for this error type are:

The distribution of non-response. If particular sub-populations such as young males are more likely than others to non-respond, then the data may not be representative of the entire population. The impact can be reduced by the imputation of persons in non-responding occupied dwellings, depending on the accuracy of the imputation process.
The accuracy of the imputation process determines how well non-response bias is mitigated. Imputation refers to the process whereby missing or erroneous responses are inferred from likely or appropriate information. An important aspect is the determination of whether non-responding dwellings were occupied on Census night or not. This applies to both private and non-private dwellings.
Item non-response. Some items are not completed on the Census form, either accidentally or deliberately. As forms are processed, these are coded as ‘not stated’ and will have impacts on the quality of the Census data if the item non-response rate is high. Also, for imputed persons, there will be item non-response for all items except for Age, Sex, Marital status, and Place of usual residence, which are imputed. This will add to the effective item non-response rate.

2. Coverage

After adjusting for non-response, coverage error in the Census is the difference between the number of people and dwellings counted in the Census and the actual number of people and dwellings in Australia on Census night. Coverage error can be due to overcoverage or undercoverage:

Overcoverage of dwellings can occur when dwellings are listed or counted more than once, or out of scope dwellings (e.g. unoccupied temporary dwellings such as cabins, caravans and tents) are mistakenly included. Overcoverage of persons may occur when people are counted more than once, or when forms for people who do not exist or are outside Australia on Census night are submitted.
Undercoverage of dwellings can occur when dwellings are missed from the count (e.g. not listed on the Address Register) or are mistakenly considered out of scope. Undercoverage of persons can occur when the Census misses people from the count, either because the dwelling where they were on Census night was missed, or because they did not respond and were not correctly identified as non-respondents.

3. Measurement

Measurement error is the difference between what the Census questions are trying to measure and the responses people give to them. Differences can occur due to the way people interpret questions. As the questions and interpretations change over time, this can lead to challenges in comparing historical series. Key considerations for this error type are:

comparability over time;
consistency with external data sources; and
internal consistency within the Census data set.

4. Processing

Processing error encompasses all errors introduced in processing the data after collection is complete. Two key types of processing errors are:

coding errors, which occur when a response is incorrectly coded (or misclassified) into the wrong category; and
imputation errors, which occur when imputed values do not accurately represent the true missing value.

There is a small possibility of some error during the data capture process due to misreading of the paper forms.

C.2 Population estimates accuracy framework

The second component of the accuracy framework focuses on the accuracy of the Census data in its use in contributing to population estimates. This component looks at potential errors that may impact on the quality of population estimates from the Post Enumeration Survey and its interaction with the Census, and categorises them into six main groups:

1. Coverage

Coverage error in the Post Enumeration Survey is the difference between the population in scope for selection in the survey and the population that ideally should have been in scope. Dwellings and people could be missed due to deficiencies in the Address Register used for the survey or imperfect field procedures, leading to undercoverage.

2. Sampling

Sampling error is random error that arises because the Post Enumeration Survey is a sample survey; it can result in either an underestimate or an overestimate. Sampling errors will be relatively higher for the more detailed estimates.

3. Non-response

Non-response error for the Post Enumeration Survey occurs when people selected in the survey do not respond; this is either when correspondence requesting them to undertake the interview does not reach them (and a Field Interviewer is unable to make contact with them directly), or when people refuse to participate. Key considerations for this error type are also the distribution of non-response (i.e. representativeness of the achieved sample), and the level of item non-response.

4. Measurement

Similar to that for the Census, measurement error for population estimates is the difference between what the Census questions are trying to measure, and the responses people give to them. Of particular interest for population estimates is consistency between the way people respond to the Post Enumeration Survey and the Census.

5. Processing

Processing error encompasses all errors introduced in processing the data after the Post Enumeration Survey collection is complete. A key processing error is matching error, which occurs if processing does not correctly match persons counted in the Post Enumeration Survey sample to their corresponding record in the Census.

6. Model

Model error occurs when the underlying assumptions in the model used to estimate Census overcoverage and undercoverage are not valid. For example, the model assumes that, within population sub-groups, being missed by the Post Enumeration Survey is statistically independent of being missed by the Census (see Section C.4.3). This assumption may not hold in practice if the Post Enumeration Survey and the Census tend to miss the same people.
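The effect of a failed independence assumption can be illustrated with a textbook dual-system (capture-recapture) estimator. This is a minimal sketch only: the estimator, the function name and all counts below are assumptions chosen for illustration, not the ABS estimation model, and the sketch ignores sampling and weighting.

```python
def dual_system_estimate(census_count, survey_count, matched):
    """Textbook dual-system estimate of a sub-group's true size.

    Assumes that being counted in the Census and being counted in the
    survey are statistically independent within the sub-group.
    """
    if matched <= 0:
        raise ValueError("at least one matched person is required")
    return census_count * survey_count / matched


# Hypothetical sub-group of 1,000 people; each collection counts 900 of them.
true_population = 1000
census_count = 900
survey_count = 900

# Under independence the expected overlap is 1000 * 0.9 * 0.9 = 810 people.
independent = dual_system_estimate(census_count, survey_count, matched=810)

# If both collections tend to miss the same people, the overlap is larger
# (here 850), and the estimator falls short of the true population.
correlated = dual_system_estimate(census_count, survey_count, matched=850)

print(f"independent misses: {independent:.1f}")
print(f"correlated misses:  {correlated:.1f}")
```

In this hypothetical case the estimate is unbiased when the misses are independent and too low when they are positively correlated, which is why correlated response can lead to underestimation.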

C.3 Assessment of the Census against the accuracy framework

1. Coverage

Net overcount for persons imputed (where a dwelling was known to be occupied but no form was returned, or where a person was known to be staying in a non-private dwelling but no form was returned) is 2.1%, lower than 2.7% in 2016. This is a statistically significant decrease. The introduction of self-service options on the Census website made it easier for people to report if they would not be at home on Census night. In addition, an occupancy determination model was developed for 2021 which helped to more accurately determine whether a dwelling was occupied or unoccupied on Census night in the absence of reliable field information. Both innovations, combined with a less mobile population due to COVID-19 restrictions and more people being enumerated at home, have helped ameliorate the impact of restricted field activities due to the pandemic and have resulted in an improvement in data quality.

Net undercount for persons on Census forms in 2021 is 2.8%, nearly one percentage point lower than in 2016 (3.7%). This is statistically significant. Gross undercount has reduced while gross overcount has remained stable. Gross undercount has decreased from 4.9% in 2016 to 4.0% in 2021, driven largely by a reduction in people missed from returned Census forms. This may be due to a combination of factors, including form improvements to clarify who should be included on the form, as well as the effect of COVID-19 related restrictions reducing the mobility of household members and the transience of the population.
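The rates quoted in this section and in Table C.1 appear to reconcile arithmetically. The sketch below checks this under two assumptions that are not stated in the report: that net undercount on forms equals gross undercount minus gross overcount, and that the overall net undercount equals net undercount on forms minus net overcount from imputation. The rates may not share exactly the same denominator, so this is a rough consistency check only, and the variable and function names are illustrative.

```python
# Rates (per cent) quoted in this appendix.
rates = {
    2021: {"net_undercount_forms": 2.8, "gross_undercount": 4.0,
           "net_overcount_imputed": 2.1, "published_net_undercount": 0.7},
    2016: {"net_undercount_forms": 3.7, "gross_undercount": 4.9,
           "net_overcount_imputed": 2.7, "published_net_undercount": 1.0},
}


def implied_gross_overcount(r):
    # Assumed identity: net undercount = gross undercount - gross overcount.
    return round(r["gross_undercount"] - r["net_undercount_forms"], 1)


def implied_overall_net_undercount(r):
    # Assumed identity: overall net = net undercount on forms
    # minus net overcount from imputed persons.
    return round(r["net_undercount_forms"] - r["net_overcount_imputed"], 1)


for year, r in sorted(rates.items()):
    print(year,
          "implied gross overcount:", implied_gross_overcount(r),
          "implied overall net undercount:", implied_overall_net_undercount(r))
```

Under these assumptions the implied gross overcount is the same in both years, consistent with the statement that gross overcount remained stable, and the implied overall net undercount matches the Australia row of Table C.1.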

2. Non-response

Non-response increases the risk of bias toward characteristics of the responding population. The response rate for private dwellings is calculated as 96.1% in 2021 compared with 95.1% for 2016. Upon completion of the field phase, imputation (for persons that were not included on a Census form) is the main method for dealing with potential non-response bias. Its effectiveness for dealing with non-response bias depends on the accuracy of the imputation process. There are two aspects to imputation accuracy:

determining which non-responding dwellings were occupied on Census night; and
ensuring that the imputation process accurately reflects the age, sex and marital status characteristics of non-responding dwellings.

Following the 2016 Census, the Independent Assurance Panel recommended that these areas, namely the accuracy of occupancy determination and the representativeness of age imputation, be improved for the 2021 Census. Initial results from the 2021 Post Enumeration Survey indicate that there have been improvements in both areas in the 2021 Census.

Figure C.1 Imputed people by age, Australia: 2021 Census
Age group (years)	Persons
0-4	56,590
5-9	67,930
10-14	66,265
15-19	60,170
20-24	76,657
25-29	94,642
30-34	92,549
35-39	88,072
40-44	75,663
45-49	73,291
50-54	68,876
55-59	65,127
60-64	58,052
65-69	49,874
70-74	42,427
75-79	30,385
80-84	22,434
85+	27,815

Note: Excludes overseas visitors and Other Territories.
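The counts in Figure C.1 can be summarised as an age distribution. The following is a minimal sketch that uses only the published counts; the variable names are illustrative.

```python
# Imputed persons by age group, 2021 Census (Figure C.1).
imputed = {
    "0-4": 56590, "5-9": 67930, "10-14": 66265, "15-19": 60170,
    "20-24": 76657, "25-29": 94642, "30-34": 92549, "35-39": 88072,
    "40-44": 75663, "45-49": 73291, "50-54": 68876, "55-59": 65127,
    "60-64": 58052, "65-69": 49874, "70-74": 42427, "75-79": 30385,
    "80-84": 22434, "85+": 27815,
}

total = sum(imputed.values())

# Percentage share of all imputed persons in each age group.
shares = {age: 100 * count / total for age, count in imputed.items()}

# Age group with the largest number of imputed persons.
peak_age_group = max(imputed, key=imputed.get)

print(f"Total imputed persons: {total:,}")
print(f"Largest group: {peak_age_group} ({shares[peak_age_group]:.1f}%)")
```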

The Post Enumeration Survey provides a comparison of the number of people created through imputation and the number of people that should have been created. As in previous censuses, the ABS has created more people in 2021 than were missed. However, this has improved in the 2021 Census with over-imputation decreasing from 2.7% in 2016 to 2.1%.

Item non-response rates for the 2021 Census are lower than in 2016 and this is largely due to the lower number of imputed records where ‘not stated’ will be recorded for all Census variables except Age, Sex, Marital status and Usual residence. Excluding imputed records, item non-response rates in 2021 are also generally lower than in 2016 mainly due to the increased use of the online form.

See Section 3.3 for more information.

3. Measurement

For the variables examined, the Panel could not identify any significant new measurement errors by comparing 2021 and 2016 Census results. The Panel observed an increase in the number of persons reporting Aboriginal and Torres Strait Islander ancestries, and noted a higher rate of non-response for the Long-term health conditions question for people aged 15 years and under who responded on the Interviewer household form. This is thought to be due to the question’s placement on that type of form. See Sections 3.5 and 3.7 for further discussion of Census data items.

4. Processing

The Panel did not examine the accuracy of the coding process. However, the Panel observed there was an improvement in the imputation process in the 2021 Census.

C.4 Assessment of the Post Enumeration Survey against the accuracy framework

1. Coverage

There has been no change in the target population of the Post Enumeration Survey in 2021 compared with 2016, which includes all private dwellings in Australia but excludes non-private dwellings. In 2021, a list-based multi-stage sample, drawn from the Address Register frame, was used for the private dwelling sample. This differs from 2016 and earlier iterations, which used an area-based multi-stage sample. The Aboriginal and Torres Strait Islander community sample continued to use an area-based multi-stage sample. To ensure the Post Enumeration Survey measured coverage independently of the Census, a desktop address canvassing exercise was undertaken prior to enumeration, which identified dwellings missed from the Address Register frame and added these to the Post Enumeration Survey sample.

2. Sampling

The sample size was increased from 2016 by around five per cent to account for population growth. Sampling error has decreased slightly overall due to the increase in Census response rates and the number of responding dwellings in the Post Enumeration Survey sample. Sampling errors are shown in Table C.1.

Table C.1 presents net undercount rates and standard errors for each state and territory. As can be seen, the level of variance in 2021 is comparable with 2016.

Table C.1 Net undercount rate, by state/territory of usual residence
State/territory	2021 (%)	2021 standard error	2016 (%)	2016 standard error
New South Wales	0.0	0.4	0.8	0.4
Victoria	0.3	0.3	1.4	0.4
Queensland	1.0	0.4	1.3	0.5
South Australia	1.0	0.5	0.2	0.5
Western Australia	3.0	0.6	0.4	0.6
Tasmania	1.6	0.5	0.1	0.7
Northern Territory	6.0	1.2	5.0	1.5
Australian Capital Territory	-0.6	0.7	-1.1	1.4
Australia	0.7	0.2	1.0	0.2

Notes: A negative value indicates a net overcount.

Excludes Other Territories and overseas visitors.
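The standard errors in Table C.1 can be turned into approximate confidence intervals. The sketch below assumes the estimates are approximately normally distributed, so that a 95% interval is the estimate plus or minus 1.96 standard errors; this convention and the names used are assumptions for illustration, not a description of how the ABS reports uncertainty.

```python
# Net undercount rate (%) and standard error for 2021, from Table C.1.
net_undercount_2021 = {
    "New South Wales": (0.0, 0.4),
    "Victoria": (0.3, 0.3),
    "Queensland": (1.0, 0.4),
    "South Australia": (1.0, 0.5),
    "Western Australia": (3.0, 0.6),
    "Tasmania": (1.6, 0.5),
    "Northern Territory": (6.0, 1.2),
    "Australian Capital Territory": (-0.6, 0.7),
    "Australia": (0.7, 0.2),
}

Z_95 = 1.96  # normal critical value for a two-sided 95% interval


def ci95(rate, standard_error):
    """Approximate 95% confidence interval, assuming normality."""
    margin = Z_95 * standard_error
    return (rate - margin, rate + margin)


intervals = {area: ci95(rate, se)
             for area, (rate, se) in net_undercount_2021.items()}

for area, (low, high) in intervals.items():
    print(f"{area}: {low:.1f}% to {high:.1f}%")
```

The wider intervals for the smaller jurisdictions reflect the point made in Section C.2 that sampling errors are relatively higher for more detailed estimates.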

3. Non-response/correlated response bias

The total number of fully responding dwellings in the 2021 Post Enumeration Survey was 45,138. This represented a response rate of 89.3% for the general population sample (a decrease from 91.2% in 2016) and 91.6% for the Aboriginal and Torres Strait Islander community sample (a decrease from 92.7% in 2016) (see Table C.2). This increase in the non-response rate raises the risk of non-response bias and, in particular, correlated response bias. This could lead to underestimation of the population, but there is no evidence of this being an issue.

Table C.2 Response rates, by state and territory
	NSW	Vic.	Qld	SA	WA	Tas.	NT	ACT	Aust.
	%	%	%	%	%	%	%	%	%
2021
General population	88.8	92.3	88.4	89.9	91.1	91.2	79.9	89.5	89.3
Aboriginal and Torres Strait Islander communities	88.0	-	96.5	83.3	90.7	-	91.7	-	91.6
2016
General population	89.1	91.3	92.4	93.2	93.3	94.9	84.4	91.6	91.2
Aboriginal and Torres Strait Islander communities	78.6	-	92.7	100.0	89.5	-	93.4	-	92.7

- Nil or rounded to zero.

Note: Excludes Other Territories and overseas visitors.

To reduce correlation bias, weighting classes can be used in the estimation model such that within a weighting class there are no systematic patterns of response. Based on analysis of 2021 Census and Post Enumeration Survey response patterns, state-based quartiles of the Socio-Economic Indexes for Areas (SEIFA) - Index of Economic Resources[1] were included as weighting classes which effectively reduced the correlation bias in the estimates.
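The idea of a weighting class can be sketched as a simple inverse response rate adjustment: within each class, respondents are weighted up to represent all selected units in that class. This is an illustrative simplification with hypothetical class labels and counts, not the ABS estimation model.

```python
from collections import defaultdict


def class_weights(sample):
    """Return a non-response weight for each weighting class.

    `sample` is a list of (weighting_class, responded) pairs, one per
    selected unit. The weight is selected units divided by responding
    units, so weighted respondents add back to the selected count.
    """
    selected = defaultdict(int)
    responded = defaultdict(int)
    for weighting_class, did_respond in sample:
        selected[weighting_class] += 1
        if did_respond:
            responded[weighting_class] += 1
    return {c: selected[c] / responded[c]
            for c in selected if responded[c] > 0}


# Hypothetical sample: the lowest quartile responds less often than the highest.
sample = (
    [("Q1", True)] * 80 + [("Q1", False)] * 20 +
    [("Q4", True)] * 95 + [("Q4", False)] * 5
)

weights = class_weights(sample)
for weighting_class, weight in sorted(weights.items()):
    print(weighting_class, round(weight, 3))
```

The adjustment only removes bias if, within each class, respondents and non-respondents are alike, which is why the classes are chosen so that no systematic response patterns remain within them.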

A technique known as propensity analysis was also conducted on both the Census and the Post Enumeration Survey to assess the main drivers of non-response. This analysis used Post Enumeration Survey data to model the propensity for a person to complete a Census form, and Census data to model the propensity for a person to complete the Post Enumeration Survey. Outcomes from this analysis were used to determine benchmarks for the weighting used in the Post Enumeration Survey estimation process. The variables used in the weighting process in 2016 were again used in 2021 and included state and/or territory, part of the state and/or territory of usual residence, Sex, five-year age groups, Marital status, Country of birth, Indigenous status, and whether located in an Aboriginal and Torres Strait Islander community. As a result of the 2021 propensity analysis, two additional groupings were found to be significant predictors of Census response and were therefore included in the weighting process: an indicator of how the Census form was delivered (mailed out or dropped off), and a measure of Census field areas that were difficult to enumerate (based on the proportion of occupied non-responding private dwellings).

[1] Australian Bureau of Statistics (2018). Census of Population and Housing: Socio-Economic Indexes for Areas (SEIFA), Australia, 2016. Retrieved from https://www.abs.gov.au/ausstats/abs@.nsf/mf/2033.0.55.001

4. Measurement

Recall error may feature because of the long Census enumeration period. However, this would be no more significant than in 2016. Recall error depends on whether there is an anchor point to assist people’s recollection. Census communication focussed on a Census period for response rather than Census night and response patterns showed high response on and immediately before Census night. While this may have affected the concept of a Census night, recall bias was assessed as being no more a risk in 2021 than in previous Post Enumeration Surveys.

Memory effects for major events (which the Census is designed to be) tend to wane after one to two weeks and would have been a little worse after ten weeks for the 2021 Post Enumeration Survey than they would have been for previous Post Enumeration Surveys.

In addition, the requirement to recall information about Census night in 2021 was examined in the context of Automated Data Linking. Automated Data Linking provides an enhanced ability to search for, and match, Post Enumeration Survey persons to their Census form, utilising a variety of search addresses and personal identifier information. Therefore, the requirement for persons to recall their exact address on Census night is reduced, provided they can correctly recall whether or not they were in Australia (i.e. whether they should have been counted at all).

5. Processing

Matching error should be lower in the 2021 Post Enumeration Survey because of the improved matching procedures.

Outcomes from linking and matching processes underwent a high level of scrutiny and quality assurance in 2021, to ensure that the Post Enumeration Survey did not miss links for Post Enumeration Survey persons who were actually counted in the Census, and to ensure that a Post Enumeration Survey person was not linked to a Census record in error.

Final match rates for the general population for persons with at least one link to the 2021 Census were higher than the 2016 equivalents (94.5% and 91.1%, respectively). More high-quality links were found by Automated Data Linking in 2021 (and did not require clerical review) than in 2016 (75.0% and 65.1%, respectively). This is likely to be a result of increased online Census uptake as well as more people being at home on Census night with mobility restricted in some states and territories.

Given a key requirement for successfully linking a Post Enumeration Survey person to their corresponding Census record is a sufficient level of data quality, a quality adjustment was continued in 2021. This method identified Census and Post Enumeration Survey records that had insufficient personal identifier information necessary for linking records (e.g. where Census data was missing or imputed for multiple linking variables, such as Name, Age or date of birth, and Sex). To remove the potential for any upward bias on the Post Enumeration Survey estimate of population totals (and level of net undercount), these records were treated in a similar fashion to late returns. This adjustment moved 31,616 Census persons or 0.12% of all persons counted in the Census.

6. Model

The ABS advises that the estimation model referred to in the section on non-response/correlated response bias, which is used to produce estimates and adjust for non-response, appears to have worked effectively. However, there are no measures of the residual non-response bias.
