TECHNICAL NOTE THE IMPACT OF IMPROVEMENTS TO THE 2011 PES ON MEASURING POPULATION GROWTH (2006-2011)
3 The 2011 PES estimated national net undercount to be 374,500 persons (1.7%). This was 175,100 persons less than the undercount in 2006. In comparing 2006 and 2011 estimates it is important to note that a new method was introduced in the 2011 PES. For 2011, ABS used a method known as Automated Data Linking (ADL) which was the major contributor to the decrease in the net undercount rate, from 2.7% in 2006 to 1.7% in 2011.
Automated Data Linking (ADL)
4 Prior to 2011, ABS used a method of determining whether PES respondents were counted in the 2006 Census based on clerical searching and matching. While in most cases this was a reliable methodology, there were instances where address information was too vague or not provided at all, which limited its overall effectiveness in determining whether PES respondents were counted in the Census or not. Automated Data Linking, which was introduced into PES processing in 2011, employs probabilistic linking techniques and enables the matching of persons that would not have been possible in previous surveys. This major improvement in the effectiveness of PES matching has led to a reduction in net undercount in 2011, although it should be noted that 2006 and 2011 estimates are not strictly comparable. For more information on ADL see Census of Population and Housing - Details of Undercount, 2011 (cat. no. 2940.0).
5 In the graphs below, the black line is the original ERP series from June 2006 to June 2011 based on the 2006 Census, and without any regard to the 2011 Census or PES. The series has been estimated from 30 June 2006 to 30 June 2011 by adding births, subtracting deaths and adding the net of overseas migration. This series is called "unrebased ERP".
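The roll-forward described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration only: the function name `roll_forward` and all of the figures are hypothetical and are not ABS data or ABS code.

```python
def roll_forward(base_population, periods):
    """Carry a population base forward using components of growth.

    `periods` is a sequence of (births, deaths, net_overseas_migration)
    tuples, one per period. Returns the base followed by one estimate
    per period.
    """
    series = [base_population]
    for births, deaths, net_overseas_migration in periods:
        # Add births, subtract deaths, add net overseas migration.
        series.append(series[-1] + births - deaths + net_overseas_migration)
    return series


# Hypothetical figures, for illustration only.
example = roll_forward(
    1_000_000,
    [(3_500, 1_700, 2_200), (3_600, 1_800, 2_000)],
)
print(example)
```

A series built this way depends only on the earlier Census base and the components, which is why it carries no information from the 2011 Census or PES.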
6 The grey dot represents the new Census base for 30 June 2011. Since the grey dot takes account of up-to-date 2011 Census and PES data (and other rebasing components), it is assumed to be more accurate than the corresponding point on the unrebased line as an estimate of ERP for 30 June 2011.
7 Since the point on the black line for 30 June 2011 is higher than the grey point, the first estimate of 30 June 2011 (based on the 2006 Census) should now be considered an overestimate.
8 The gap between the two points for 30 June 2011 is called intercensal error and it is usually explained as the error which has accumulated over the 5 year period between Censuses. For 2006-2011, this gap is estimated at 294,400. Intercensal error by definition cannot be attributed to any specific component of population growth or the two population bases. It is interpreted as the accumulated error in all of the components of growth including error in either or both of the two population bases.
9 Once intercensal error has been calculated and because it is assumed to have accumulated over 5 years, the error must be spread evenly through the series back to (but not including) the previous population base 5 years earlier.
10 In the graph below, the grey line takes the 2011 Census, PES and other adjustments into account and works backwards to 30 June 2006, evenly spreading the intercensal error through the ERP series. This grey line has now accounted for intercensal error and thus supersedes the black line based on the previous Census.
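One way to picture the even spreading of intercensal error is a linear adjustment that is zero at the previous base and equal to the full error at the new base. The sketch below assumes quarterly estimates (20 quarters between the two bases) and uses a hypothetical unrebased series; only the 294,400 figure comes from this note, and the function `spread_intercensal_error` is an illustrative name rather than an ABS procedure.

```python
def spread_intercensal_error(unrebased, intercensal_error):
    """Spread intercensal error evenly through an unrebased series.

    `unrebased[0]` is the previous population base (left unchanged) and
    `unrebased[-1]` is the unrebased estimate at the new base date.
    `intercensal_error` is the unrebased estimate minus the new base,
    so it is positive when the unrebased series is an overestimate.
    """
    n = len(unrebased) - 1
    if n < 1:
        raise ValueError("need the previous base and at least one later point")
    # Point i receives i/n of the total error; point 0 receives none.
    return [value - intercensal_error * i / n for i, value in enumerate(unrebased)]


# Hypothetical quarterly unrebased series: 21 points from June 2006 to June 2011.
unrebased = [21_000_000 + 80_000 * i for i in range(21)]
rebased = spread_intercensal_error(unrebased, 294_400)

print(rebased[0] == unrebased[0])   # previous base is not adjusted
print(unrebased[-1] - rebased[-1])  # full error removed at the new base
print(unrebased[1] - rebased[1])    # one twentieth of the error after one quarter
```

Under these assumptions each successive quarter absorbs a further 14,720 persons of the error (294,400 divided by 20).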
11 For the first quarter after the new base, in this case 30 September 2011, the components of population growth will be used to increment the grey (preliminary rebased) line.
Graph 2: Unrebased ERP vs Preliminary Rebased ERP - 2006-2011
The Statistical Impact of ADL
12 ABS carried out a study into the statistical impact of introducing ADL. A sample of PES records was processed using a close approximation of the 2006 clerical search and match method, and the outcome was compared with that achieved from ADL-enabled processing for the same group of records. The Statistical Impact Study answers the question: 'What was the statistical impact on the 2011 Census PES net undercount estimate of using the new ADL method?'. It therefore also answers the related question: 'What was the statistical impact on the intercensal error of using the new ADL method?'.
13 The ADL Statistical Impact Study estimated that the use of ADL to determine whether PES respondents were counted in the 2011 Census or not resulted in a net undercount that was 246,985 persons less than the 2006 PES matching methodology would have delivered.
14 The Statistical Impact Study estimate has a standard error of 43,000. A common approach to assessing the variability inherent in estimates is to examine the 95% confidence interval (which is two standard errors either side of the estimate). Using this approach, there is a 95% chance that the true value of the statistical impact of ADL on net undercount in 2011 is between 160,985 and 332,985 persons.
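The interval quoted above follows directly from the estimate and its standard error. A short check, using the two-standard-error rule stated in the text (variable names are illustrative):

```python
estimate = 246_985        # ADL impact on net undercount (persons)
standard_error = 43_000   # standard error of that estimate

# 95% confidence interval taken as two standard errors either side.
lower = estimate - 2 * standard_error
upper = estimate + 2 * standard_error

print(lower, upper)
```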
15 It is important to note that the Statistical Impact Study estimate was not designed to provide an alternative measure of net undercount for 2011, in 2006 terms, but only to identify the impact of the ADL methodology. There are a range of PES and Census changes that are not related to ADL that will affect comparability between 2006 and 2011.
The impact of ADL on intercensal error
16 The intercensal error, after factoring in the estimated ADL impact of 246,985, is around 47,000 people. While the Statistical Impact Study results provide some guidance to users of PES and ERP data, they do not, and cannot, allow users to produce an alternative 2011 measure, given the other PES and Census changes that were also made. It is also not possible to use the results to produce an alternative 2006 measure.
17 Nonetheless it is clear that ADL has had a significant impact on both the PES undercount estimate and the estimate of intercensal error. The Statistical Impact Study results challenge the usual interpretation of intercensal error as the accumulation of error from all sources because they imply that the introduction of ADL accounts for the majority of the intercensal error, though recognising that the confidence interval on the estimate of the impact is relatively broad. ADL explains around 84% of the intercensal error, with the remaining 16% explained by errors in all of the components of growth and errors in the two population bases.
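The figures in paragraphs 16 and 17 can be reproduced from the intercensal error (294,400) and the estimated ADL impact (246,985) given earlier in this note. The variable names below are illustrative:

```python
intercensal_error = 294_400   # gap between unrebased and rebased ERP at 30 June 2011
adl_impact = 246_985          # Statistical Impact Study estimate

# Intercensal error remaining once the ADL impact is set aside.
residual_error = intercensal_error - adl_impact

# Share of intercensal error attributed to ADL, and the remainder.
adl_share = adl_impact / intercensal_error
other_share = 1 - adl_share

print(residual_error)
print(round(adl_share * 100), round(other_share * 100))
```

Because the ADL impact has a relatively broad confidence interval, these shares are point estimates only.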
Population growth between the Censuses
18 ADL has also affected measured population growth from 30 June 2006 to 30 June 2011. Although the rebased ERP is a better estimate of the population level than unrebased ERP, there are some challenges in interpreting the data on population growth of which users need to be aware.
19 Table 1 shows that average annual growth on the unrebased ERP series (1.79%) is coherent with average annual growth on the rebased ERP, but only if the impact of ADL was specifically excluded (1.75%). The Statistical Impact Study result is used to estimate the impact of ADL on historical population growth rates from 2006-11.
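To see why excluding the ADL impact raises the measured growth rate, consider the sketch below. It assumes average annual growth is calculated as a compound rate over five years, which this note does not state, and it uses hypothetical population levels; only the 246,985 ADL impact is taken from the text. The results are therefore illustrative and will not reproduce the rates in Table 1.

```python
def average_annual_growth(start_level, end_level, years):
    """Compound average annual growth rate between two population levels."""
    return (end_level / start_level) ** (1 / years) - 1


# Hypothetical population levels, for illustration only.
base_2006 = 20_000_000
rebased_2011 = 21_500_000
adl_impact = 246_985  # from the Statistical Impact Study

# Growth on the rebased series, where only the 2011 base reflects ADL.
growth_rebased = average_annual_growth(base_2006, rebased_2011, 5)

# Growth if the ADL impact were excluded, i.e. added back to the 2011 level.
growth_excluding_adl = average_annual_growth(base_2006, rebased_2011 + adl_impact, 5)

print(f"{growth_rebased:.2%} vs {growth_excluding_adl:.2%}")
```

The direction of the effect is what matters here: ADL lowers the 2011 undercount adjustment, so adding its impact back raises the 2011 level and hence the growth rate measured from the 2006 base.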
20 In considering the standard errors on the Statistical Impact Study estimate, we can be 95% confident that rebased population growth would have ranged between 1.7% and 1.8% from 2006 to 2011 if ADL were not used. It is noteworthy that the Census to Census average annual growth rate of 1.6% is higher than the rebased average annual growth (1.5%), but lower than the average annual growth on the rebased ERP if ADL was not used (1.75%).
21 The rebased ERP series produced an average annual growth rate of 1.5%, which is mostly driven by the fact that the PES estimate of net undercount used in the 2006 base did not use ADL, whereas the 2011 base did.
22 The Statistical Impact Study shows that the impact of ADL is the major contributor to reduced intercensal error. The rebased ERP series produces population growth rates that are not coherent with either the unrebased ERP series or the rebased ERP, once the impact of ADL has been taken into account.
23 In using population estimates, for information on the population level for the 2006-11 period, the rebased ERP series is the best series to use. For population growth over the 2006-11 period, the comparison should focus on the components of growth (i.e. births, deaths and migration), rather than the difference in population levels.
24 Population growth rates for the 2006-11 period that are coherent with existing published ERP growth figures can be achieved by introducing a series break into the ERP series. This would be unprecedented, and would require assumptions to be made regarding the impact of ADL over the five year intercensal period. The Statistical Impact Study was not designed to support such assumptions, and other sources of empirical evidence would be required. The implications of a series break on the many uses of the ERP series would also need very careful consideration.