TECHNICAL NOTE 2: RELIABILITY OF THE PES ESTIMATES
SAMPLING ERRORS ASSOCIATED WITH STATISTICS PRODUCED FROM THE PES
1 Statistics produced from the PES are subject to sampling error. Since only a sample of dwellings was included in the PES, estimates derived from the survey may differ from figures which would have been obtained if all dwellings had been included in the survey. Further, the particular sample selected for the PES was only one of a number of possible samples and each possible sample would also yield different estimates. One measure of the likely difference is given by the standard error (SE).
2 Given an estimate and the SE on that estimate, there are about two chances in three that the sample estimate will differ by less than one SE from the figure that would have been obtained if all dwellings had been included in the survey, and about nineteen chances in twenty that the difference will be less than two SEs.
3 The following example illustrates the use of the concept of SE. If an estimate of 2.5% has a SE of 0.1 percentage points, there are about two chances in three that the figure that would have been obtained if all dwellings had been included in the survey is in the range 2.5% ± (1 x 0.1%), i.e. between 2.4% and 2.6%, and about nineteen chances in twenty that the figure is in the range 2.5% ± (2 x 0.1%), i.e. between 2.3% and 2.7%.
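The ranges in this example can be reproduced with a few lines of Python. This is a minimal sketch only: the function name se_interval is hypothetical and is not part of any PES processing system.

```python
def se_interval(estimate, se, multiplier):
    """Return (low, high) for the range estimate ± multiplier x SE."""
    margin = multiplier * se
    return estimate - margin, estimate + margin


# Estimate of 2.5% with a SE of 0.1 percentage points (values from paragraph 3).
low1, high1 = se_interval(2.5, 0.1, 1)  # about two chances in three
low2, high2 = se_interval(2.5, 0.1, 2)  # about nineteen chances in twenty

print(f"1 SE range: {low1:.1f}% to {high1:.1f}%")
print(f"2 SE range: {low2:.1f}% to {high2:.1f}%")
```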
SAMPLING ERRORS ON ESTIMATES OF DIFFERENCE
4 The sampling error on the difference between two estimates can be derived from their SEs. For the difference between two estimates x and y produced from the PES, the SE of the difference may be approximated by the following formula:
$$\mathrm{SE}(x - y) \approx \sqrt{[\mathrm{SE}(x)]^2 + [\mathrm{SE}(y)]^2}$$
5 This approximation will be exact for differences between estimates in different states, for greater capital city versus rest of state regions, or for differences between estimates from different Censuses. However, for estimates within the same region, there will be a negative correlation between the estimates (or rates) so that the approximation will generally underestimate the true SE.
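The direction of this bias follows from the standard variance identity for a difference. The covariance term below does not appear in the original note and is shown only as an illustration of the reasoning:

```latex
\mathrm{Var}(x - y) = \mathrm{Var}(x) + \mathrm{Var}(y) - 2\,\mathrm{Cov}(x, y)
```

When the two estimates are independent, the covariance is zero and the SE of the difference is the square root of the sum of the squared SEs. When the covariance is negative, the last term adds to the variance, so an approximation that ignores it gives an SE that is too small.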
6 For example, suppose the estimated rates of net undercount for usual residents in the Queensland greater capital city and rest of state regions are 2.7% and 1.4%, with SEs of 0.35 and 0.5 percentage points respectively. Using the formula above, the SE on the difference (1.3 percentage points) is:
$$\sqrt{0.35^2 + 0.5^2} = \sqrt{0.3725} \approx 0.61 \text{ percentage points}$$
7 Therefore there are about nineteen chances in twenty that the difference in the rates of net undercount for usual residents between these two regions lies within the range 1.3 ± (2 x 0.61), i.e. between 0.08 and 2.52 percentage points.
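The Queensland example can be checked numerically. The sketch below assumes the two estimates are independent, so that the SE of the difference is approximated by the square root of the sum of the squared SEs; the function name se_of_difference is hypothetical.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate SE of (x - y), assuming x and y are independent estimates."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


# Rates of net undercount (%) and their SEs (percentage points) from paragraph 6.
difference = 2.7 - 1.4
se_diff = round(se_of_difference(0.35, 0.5), 2)  # rounded to two places, as in the note

# Range of two SEs either side of the difference (about nineteen chances in twenty).
low = difference - 2 * se_diff
high = difference + 2 * se_diff

print(f"SE of difference: {se_diff:.2f} percentage points")
print(f"2 SE range: {low:.2f} to {high:.2f} percentage points")
```

The SE is rounded before the range is formed so that the result matches the published figures; using the unrounded SE changes the bounds only in the third decimal place.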
NON-SAMPLING ERROR
8 The estimates of net undercount are also subject to non-sampling errors which occur in all collections, whether censuses or surveys. Examples of this kind of error include imperfections in reporting by respondents and errors made in the collection and processing of data. Every effort is made in both the Census and PES to minimise non-sampling error by careful design of forms, training and supervision of field officers and interviewers, and by using effective operating procedures. Types of non-sampling error arising from the way the PES is conducted and the way estimates are derived from the survey are discussed below.
9 A potential weakness in the PES method is its necessary dependence on linking as a means of deciding whether or not a given person or dwelling in the PES has been counted in the Census. Despite procedures to minimise this, the difficulties associated with the linking process mean there is a risk of failing to link people who were actually included in the Census. The effect of failing to make a link that should have been made would be to overstate net undercount in the Census. However, the introduction of Automated Data Linking (ADL) in the 2011 PES processing phase, which was used again in 2016, helped to reduce the likelihood of this type of error occurring.
10 Nevertheless, if the variables used to establish the link are of poor quality (e.g. not stated or imputed), links are less likely to be made. To mitigate this risk, Census records that had insufficient personal identifier information, and therefore did not have a high chance of being linked to the PES, were moved to the Census non-contact sector and treated in a similar fashion to late returns. This avoids a bias in the contact sector, albeit at the expense of increased sampling error from the non-contact sector. For more information on the treatment of these records, see Components of Net Undercount on the Summary tab.
11 While the Census and PES are conducted independently of each other, they are very similar in many respects. Thus, some weaknesses in the Census may also be shared by the PES, leading to an understatement of net undercount. For example, dwellings missed by a Census Field Officer are often difficult to find and so could be missed by a PES interviewer as well. In addition, people who avoid being included in the Census may also avoid being included in the PES. The use of benchmarks in estimation helps to control for the effect of this ‘correlation bias’.
ERRORS ASSOCIATED WITH THE NON-CONTACT SECTOR
12 The PES provides an estimate of the total number of people who should have been counted in the Census non-contact sector (i.e. late returns, imputed persons in non-responding dwellings, and persons with insufficient personal identifier information on their Census form).
13 PES estimates of the population in the non-contact sector have relatively high sampling errors, mainly because Census person counts for this sector were not available to use as a weighting benchmark, but also due to the small sample size (as there were relatively few Census non-contact dwellings selected by chance in the PES sample).
14 This lack of Census person counts also means that, while the dwelling weights used for the non-contact sector were estimated from the sector itself, the adjustments applied to provide final person weights use the information observed in the contact sector. This is a potential source of non-sampling error, as is any bias arising from peculiarities of the non-respondents in this sector. Both these sources of non-sampling error are expected to be small, compared with the sampling error of the population estimate for the non-contact sector.