3118.0 - Demography Working Paper 1999/4 - Measuring Census Undercount in Australia and New Zealand, 1999  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 25/10/1999   






A population census is a valuable data source for analysing the major demographic, social and economic characteristics of, and changes in, the population. It provides statistics for decision-making by governments, businesses, community organisations and individuals. A census also provides a base for post-censal population estimates and projections, which assist in planning and policy-making at the national and local levels.

Whenever a census is undertaken, questions about the completeness and accuracy of the census count invariably arise. In a large and complex exercise such as a census, it is inevitable that some people will be missed and some included more than once. Usually more people are missed than overcounted, so the census count is usually less than the true population. This difference is called net undercount. Net undercount can bias census counts because the characteristics of people missed may be different from those of people counted. Rates of undercount can vary significantly for different population groups depending on factors such as sex, age, ethnicity and geographic location.
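The relationship between people missed, people overcounted and net undercount can be illustrated with a minimal sketch. The figures are hypothetical, and expressing the rate as a share of the implied true population is an assumption made here for illustration; the text above does not state the denominator.

```python
def net_undercount(missed: int, overcounted: int, census_count: int) -> tuple[int, float]:
    """Return (net undercount, net undercount rate).

    Assumption: the rate is expressed relative to the implied true
    population (census count plus net undercount).
    """
    net = missed - overcounted
    true_population = census_count + net
    return net, net / true_population


# Hypothetical figures, not taken from any census.
net, rate = net_undercount(missed=20_000, overcounted=4_000, census_count=984_000)
print(net, f"{rate:.1%}")
```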

Most national statistical agencies provide independent measures of census coverage. These measures may be based on demographic analyses, comparisons with administrative records or estimations from a sample survey conducted shortly after the census. Both the Australian Bureau of Statistics (ABS) and Statistics New Zealand (SNZ) conducted a Post-Enumeration Survey (PES) following their respective censuses in 1996. How were these surveys designed and conducted? What can be learnt for the 2001 PES which follows the next five-yearly census in each country?


Australia and New Zealand have low undercount rates by international standards. For Australia, the 1996 estimate of net undercount was the lowest since 1976. For New Zealand, 1996 was the first time a PES had been conducted. In both countries, estimates of net undercount are used to produce a more accurate estimate of the true population than if only census counts were used. Canada and the UK also use their Reverse Record Check and Census Validation Survey respectively in deriving population estimates. The census coverage results from the most recent evaluation studies in selected countries are summarised in Table 1.


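Table 1. Census coverage results in selected countries (columns: Census Year, Gross Undercount, Gross Overcount, Net Undercount). The table values were not preserved; in the surviving New Zealand row, two entries are shown as "not available".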

Statistics Canada's Reverse Record Check measures gross undercount by comparing a sample of records of people included and missed in the previous census, supplemented with data from administrative records such as those for births and immigration, with the current census records. Census gross overcount is evaluated by taking a sample of records from the current census, obtaining information on possible alternative addresses, and then checking these records to identify duplicate reporting (Statistics Canada 1998).

The Office for National Statistics in the UK has conducted a post-enumeration study for every census since 1961. However, the 1991 Census Validation Survey, which suggested a 0.4% net undercount, failed to identify the majority of the people missed by the census (OPCS 1993). In contrast, demographic analysis using the 1981 Census as a base suggested a net undercount in 1991 of about one million people, or 2.2% (OPCS 1995).

The US Census Bureau has conducted a PES after each census since 1950. Significant ethnic differentials are apparent, with net undercount rates for the Hispanic, Black, Asian/Pacific Island and American Indian populations above the national average (US Bureau of the Census 1996). However, due to the controversy and litigation relating to adjustment for undercount, the PES results have not been used in the derivation of population estimates (Hogan 1992).

The key features of the PES in both Australia and New Zealand are independence from the census and timeliness. Australia is also unique in that it does not directly publish the initial PES results but evaluates them against other data sources before publishing adjusted PES results (see section 3).


The PES is conducted as independently of the census as possible to minimise factors which might compromise the performance of census collectors and the integrity of the PES sample:
  • the PES sample is selected from an independent sample frame
  • separate office staff are used in the census and PES
  • PES interviewers are not employed as census field staff, and vice versa
  • census field and office staff are not told which areas are included in PES
  • PES fieldwork commences after census forms are collected (both countries use census field staff to deliver and pick up census forms) so that there is no overlap
  • all census forms received after the commencement of PES fieldwork are deemed 'late' and are excluded from directly affecting the PES results.

Despite these efforts to maintain independence, the census and PES results may be subject to correlation bias: the factors that contribute to a person being missed in the census may also cause them to be missed in the PES.


Conducting evaluation studies promptly after a census is considered to be a key means of minimising recall error. The timeliness of the two-week field phase of the PES (which in 1996 commenced two weeks after census night in New Zealand and three weeks after in Australia) reflects an advantage of the drop-off/pick-up method of distributing and collecting census forms used in both countries. This contrasts with the mail-back method used in Canada and the USA, with associated delays of up to four months between the census and the under-enumeration studies. In the UK, interviews for the evaluation study were carried out between six weeks and three months after the census date.



In both Australia and New Zealand the scope of the PES was similar to that of the census. For practical reasons some areas, dwellings and people were excluded from, or not able to be covered by, the PES. In total, less than 5% of the resident population of both countries was excluded. In Australia, non-private dwellings (NPDs, such as hotels and hospitals), sparsely settled areas (areas with less than 0.57 dwellings per square kilometre) and discrete Indigenous communities (where special census enumeration procedures were used) were excluded from the PES. In New Zealand, NPDs, remote areas and temporary private dwellings (e.g. tents, caravans, yachts) were excluded.

A concern is that these exclusions from the PES represent key enumeration problem areas. However, this concern needs to be balanced against two considerations: the population involved is relatively small, and these areas and dwellings would often require the same enumeration contacts and procedures in the PES as in the census, which would compromise the independence of the PES. NPDs were excluded from the PES because of the problems involved in enumerating the mobile population found in many NPDs. However, some people who were in NPDs on census night were still selected in the PES because they resided in, or were visiting, a private dwelling included in the PES sample.


In both Australia and New Zealand, the 1996 PES adopted the sample design and infrastructure of existing population surveys. The Australian PES used a multi-stage stratified area sample from the ABS Labour Force Survey (LFS) parallel sample frame. The first stage involved a random selection of about 4,600 primary sampling units (PSUs) from an Australian total of about 26,000 PSUs. A PSU has an average size of 250 dwellings. These PSUs had been stratified into 460 groups on the basis of State and urban/rural distinction. In most areas, sample selection involved the selection of PSUs, followed by smaller blocks within selected PSUs, and dwellings within selected blocks.

The sampling fraction varied between States and Territories (ranging from 2 in 75 in the Northern Territory to 1 in 277 in New South Wales), to facilitate separate undercount estimates for each State and Territory. A double cluster sample was selected for the Northern Territory to overcome the inherent difficulties imposed by the small size and mobility of its population. In total, the 1996 PES sample contained about 37,000 private dwellings (0.5% of total private dwellings in Australia).
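As a rough illustration of what these sampling fractions imply, the base design weight of a selected dwelling is conventionally the reciprocal of its sampling fraction. The sketch below applies that standard convention to the two fractions quoted above; it ignores the non-response adjustments that were also applied, and the function name is illustrative only.

```python
from fractions import Fraction

# Sampling fractions quoted in the text for the 1996 Australian PES.
sampling_fractions = {
    "Northern Territory": Fraction(2, 75),
    "New South Wales": Fraction(1, 277),
}


def design_weight(sampling_fraction: Fraction) -> float:
    """Base design weight: the number of dwellings each sampled dwelling represents."""
    return float(1 / sampling_fraction)


weights = {area: design_weight(f) for area, f in sampling_fractions.items()}
for area, weight in weights.items():
    print(f"{area}: each sampled dwelling represents about {weight:g} dwellings")
```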

The New Zealand PES used a two-stage stratified cluster sample from the SNZ Household Labour Force Survey (HLFS) sample frame. The first stage involved a random selection of about 1,000 PSUs from a New Zealand total of about 18,800 PSUs. A PSU usually contains between 50 and 100 dwellings, with an average size of 70 dwellings. These PSUs had been stratified into 122 groups on the basis of region, urban/rural classification, Maori population density, Pacific Islands population density, and other socio-economic variables such as level of education and employment. The proportion of PSUs selected differed between strata. A greater proportion of PSUs was selected in Auckland to boost the Pacific Islands sample population. (At the 1996 Census, about two-thirds of New Zealand's Pacific Islands population lived in the Auckland Region compared with only one-quarter of the European population.) The second stage involved a systematic random selection of households from each selected PSU. The selection process ensured that the sample of households was geographically spread across the entire PSU. The total sample contained about 10,400 private dwellings (0.8% of total private dwellings in New Zealand).
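The second-stage step, a systematic random selection spread across the whole PSU, can be sketched as follows. This is a generic textbook version (random start, fixed skip interval) and not necessarily the exact procedure SNZ used; the PSU size of 70 follows the average quoted above, while the sample size of 10 is hypothetical.

```python
import random


def systematic_sample(units: list, n: int, rng: random.Random) -> list:
    """Select n units using a random start and a fixed skip interval.

    The units are assumed to be listed in geographic order, so the fixed
    interval spreads the selections across the whole PSU.
    """
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the number of units")
    interval = len(units) / n
    start = rng.random() * interval  # random start within the first interval
    # The min() guards against a floating-point rounding edge case.
    return [units[min(int(start + i * interval), len(units) - 1)] for i in range(n)]


# Hypothetical PSU of 70 dwellings, listed in walking order.
dwellings = list(range(70))
sample = systematic_sample(dwellings, n=10, rng=random.Random(1996))
print(sample)
```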

The advantages in using existing sample designs included reduced costs for enumeration and field collection (as interviewers were already familiar with the geographic areas used in the sample), the availability of existing maps and street listings, and minimisation of respondent burden (by controlling overlap between the PES and other household surveys).


In both Australia and New Zealand, specially trained interviewers collected data through a face-to-face interview. Details were collected from any responsible adult in each household. In some cases follow-up interviews were conducted by telephone. There were 31,200 fully responding households in Australia, compared with 8,900 in New Zealand.


The PES form was designed to prompt respondents for an address where they may have been included on a census form. Visitors to households included in the PES were also asked for their address of usual residence. These responses were used to determine the number of times each respondent was included in the census.

The PES collected personal details (name, sex, and date of birth or age) to facilitate accurate matching of the PES form to census forms and to allow accurate undercount estimates to be generated for age and sex categories. The Australian PES also collected marital status, country of birth and Indigenous origin, while the New Zealand PES also collected ethnicity. These variables were collected to meet each country's specific population estimate and projection needs.


There were some differences in the data processing phase of the PES between Australia and New Zealand. In Australia, data capture occurred after the completion of the matching and searching processes but before the edit and amend process. The results of the matching and searching processes were marked directly onto the form prior to data capture. The data on the PES forms was then captured using Optical Mark Recognition (OMR) technology. In New Zealand, data capture was completed by data key entry and occurred before the matching and searching processes. The matching and searching results were directly entered into the PES database.


The critical matching and searching phase involved locating census forms corresponding to addresses given in the PES. The objective of matching and searching was to determine whether a respondent was included in the census at the addresses specified. Matching involved finding the census form corresponding to the dwelling at which the PES interview took place, while searching involved locating census forms at alternative addresses provided during the interview.

In both Australia and New Zealand, matching and searching were clerical operations involving the comparison of PES and census forms. Physical PES and census forms were used in Australia. However, in New Zealand all census forms were scanned and the images stored on compact disc as part of census processing. Consequently, the physical PES forms could be compared to these computerised census images.

Individual details were used to determine whether a person included in the PES was included on a census form. The most important details available for comparison were name, date of birth (or age) and sex. For more difficult cases, other details such as ethnic group (in the case of New Zealand), country of birth and marital status (in the case of Australia), and usual resident/visitor status were used.
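Matching in 1996 was a clerical operation, so the following is only a sketch of the comparison logic described above, using the three most important details (name, date of birth and sex). The `Person` record, the normalisation rules and the exact-match criterion are all assumptions for illustration; clerical staff could also weigh the secondary details and tolerate discrepancies that this sketch does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    birth_date: str  # ISO format, e.g. "1970-03-15" (hypothetical convention)
    sex: str


def match_key(person: Person) -> tuple[str, str, str]:
    """Normalise the primary matching details so trivial differences are ignored."""
    normalised_name = " ".join(person.name.lower().split())
    return (normalised_name, person.birth_date, person.sex.upper())


def is_on_census_form(pes_person: Person, census_form: list[Person]) -> bool:
    """True if any person listed on the census form matches the PES respondent."""
    key = match_key(pes_person)
    return any(match_key(listed) == key for listed in census_form)


# Hypothetical records.
census_form = [Person("Jane  Smith", "1970-03-15", "f"), Person("Tom Smith", "1998-11-02", "M")]
found = is_on_census_form(Person("jane smith", "1970-03-15", "F"), census_form)
missed = is_on_census_form(Person("Ann Lee", "1985-07-30", "F"), census_form)
print(found, missed)
```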


During the searching process it may have been difficult to determine a match status for a person because of a vague search address or incomplete information on the PES or census forms. In Australia, the 85,400 full respondents to the 1996 PES provided a total of 7,100 search addresses, of which 22% were too vague to allow a match status to be determined. In New Zealand, 25,400 full respondents provided about 1,400 search addresses, of which 10% were too vague to allow a match status to be determined. Less than 2% of PES respondents in each country therefore had a vague search address and required imputation of a match status. In Australia and New Zealand this imputation was based on the characteristics of the person with the unresolved match status. The PES variable imputed was termed ENTOT (enumeration total), the number of times that a person was counted in the census. ENTOT could take any non-negative integer value but the most common values were 0 (not counted in the census), 1 (counted once in the census), and 2 (counted twice in the census).
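The "less than 2%" figure can be checked from the numbers quoted above. The calculation assumes at most one vague search address per respondent, which makes it an upper bound on the share of respondents affected.

```python
def share_needing_imputation(respondents: int, search_addresses: int, vague_share: float) -> float:
    """Upper bound on the share of respondents with a vague search address."""
    return search_addresses * vague_share / respondents


australia = share_needing_imputation(respondents=85_400, search_addresses=7_100, vague_share=0.22)
new_zealand = share_needing_imputation(respondents=25_400, search_addresses=1_400, vague_share=0.10)
print(f"Australia: {australia:.1%}, New Zealand: {new_zealand:.1%}")
```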

In Australia, a logistic regression model was used to impute match status. A review of the independent regression variables was undertaken based on 1991 PES data. The regression coefficients were derived once 1996 PES data became available. In 1996 the regression variables used were:
  • the question on the PES form to which the search address was given in response
  • census night address
  • scope and coverage status
  • Indigenous origin
  • whether the respondent had already been matched at the PES address
  • whether the respondent considered that the person had been included on a census form
  • age
  • marital status
  • part of State (whether capital city or remainder)
  • number of searches.
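A logistic regression model of this kind turns a person's characteristics into a probability of having been counted at the search address. The sketch below shows only that scoring step. The coefficient values, the subset of variables and their coding are hypothetical; the fitted 1996 coefficients are not reported here, and how the predicted probability was converted into an imputed match status is not described.

```python
import math

# Hypothetical coefficients for a few of the listed variables (illustration only).
COEFFICIENTS = {
    "intercept": -0.5,
    "already_matched_at_pes_address": -1.2,
    "respondent_believes_counted": 1.8,
    "number_of_searches": -0.3,
}


def match_probability(features: dict[str, float]) -> float:
    """Predicted probability that the person was counted at the search address."""
    linear_predictor = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


p = match_probability(
    {
        "already_matched_at_pes_address": 0,
        "respondent_believes_counted": 1,
        "number_of_searches": 1,
    }
)
print(f"{p:.3f}")
```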

In New Zealand, a sequential imputation procedure was used in 1996 to determine the match status for vague search addresses. Sequential imputation is equivalent to donor imputation and involved sorting the final weighted PES file by one or more variables which had a strong correlation with the ENTOT variable. The sort variables were evaluated using Goodman and Kruskal's Z-statistic. This measured the relative decrease in the proportion of incorrect predictions as a sort variable was added. The variables found to have the highest correlation with ENTOT were:
  • census dwelling form type (e.g. whether occupied or unoccupied)
  • ethnic group.
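A generic sequential (hot-deck) imputation along these lines can be sketched as below: the file is sorted by the chosen variables, and each record with an unresolved ENTOT takes the value of the nearest preceding record that has one. The field names are hypothetical, the survey weights are ignored, and the SNZ procedure may have differed in detail.

```python
def sequential_impute(records: list[dict], sort_keys: list[str], target: str = "entot") -> list[dict]:
    """Fill missing target values from the most recent donor in sort order.

    Records that precede the first donor in the sorted file are left unresolved.
    """
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in sort_keys))
    last_donor_value = None
    result = []
    for record in ordered:
        record = dict(record)  # copy so the input records are not modified
        if record[target] is None:
            if last_donor_value is not None:
                record[target] = last_donor_value
                record["imputed"] = True
        else:
            last_donor_value = record[target]
        result.append(record)
    return result


# Hypothetical PES records; entot=None marks a vague search address.
records = [
    {"id": 4, "form_type": "unoccupied", "ethnic_group": "Other", "entot": None},
    {"id": 1, "form_type": "occupied", "ethnic_group": "Maori", "entot": 1},
    {"id": 3, "form_type": "unoccupied", "ethnic_group": "Other", "entot": 0},
    {"id": 2, "form_type": "occupied", "ethnic_group": "Maori", "entot": None},
]
imputed = {r["id"]: r["entot"] for r in sequential_impute(records, ["form_type", "ethnic_group", "id"])}
print(imputed)
```

Including `id` as the last sort key here simply makes the ordering within each group deterministic for the example.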


The PES in Australia and New Zealand used a similar method for estimating net undercount. The results from the matching and searching were used to calculate the number of dwellings/people who should have been counted in the census and the number who actually were counted in the census. The ratio of these two numbers represents the amount by which census counts should be adjusted for net undercount. The estimate of the true population is given by:

$$X = \frac{x}{y} \times Y + DLR$$

where:
X = PES estimate of the true population
x = PES estimate of the population who should have been counted in the census less dummies and late returns
y = PES estimate of the population who were counted in the census less dummies and late returns
Y = census count of the population less dummies and late returns
DLR = census count of dummies and late returns
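Read together, these definitions imply that the census count excluding dummies and late returns (Y) is scaled by the ratio x/y, and the dummies and late returns (DLR) are then added back. The sketch below computes X on that reading, which is an interpretation of the definitions rather than a quoted formula; all input values are hypothetical.

```python
def estimate_true_population(x: float, y: float, Y: float, DLR: float) -> float:
    """Estimate X = (x / y) * Y + DLR.

    x:   PES estimate of people who should have been counted (excl. dummies/late returns)
    y:   PES estimate of people who were counted (excl. dummies/late returns)
    Y:   census count excluding dummies and late returns
    DLR: census count of dummies and late returns
    """
    if y <= 0:
        raise ValueError("y must be positive")
    return (x / y) * Y + DLR


# Hypothetical inputs, for illustration only.
X = estimate_true_population(x=1_016_000, y=1_000_000, Y=17_500_000, DLR=250_000)
print(round(X))
```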

A dummy was a census form created during census enumeration. If there was a request by a household or person to mail back a census form, a dummy form was created to account for the possibility that the form would not be forwarded. Dummies may also have been created if a householder refused to complete a census form or if there was no contact on collection.

A late return was a census form returned, usually by mail, after the commencement of PES fieldwork. These forms may introduce bias because people who had not returned their census form may have been prompted to do so as a result of being included in the PES. Both dummies and late returns were excluded from the adjustment ratio to avoid biasing the PES estimates.

The above adjustment ratio also had two different weights applied to it. A dwelling weight was assigned to each dwelling to reflect the probability of that dwelling being selected in the PES. Similarly, a person weight was assigned to each responding individual to reflect the probability of that person being included in the PES.

The person weight also included an allowance for non-response. For PES dwellings that could not be matched to a census dwelling due to the vagueness of the PES address or due to the dwelling being missed during the enumeration of the census, the weight was simply the sampling fraction adjusted for non-response. For all other census dwelling form types the actual number of census dwellings was known. The weights for these form types were the actual total number of dwellings from the census divided by the number of responding dwellings. The weights containing the actual number of total dwellings from the census were more accurate as they account for the different sampling fractions for each of the form types. The weighted ratio was then applied to census counts of the population to produce an initial estimate of the true population at the subnational (e.g. capital city/balance of State in Australia and region in New Zealand) and subgroup (e.g. five-year age group by sex) levels.
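The weighting step for dwelling form types with known census totals, and the weighted ratio built from it, can be sketched as follows. All counts are hypothetical, and for brevity the sketch reuses the dwelling weight as each person's weight, which simplifies the separate person weights and non-response adjustments described above.

```python
def form_type_weight(census_dwellings: int, responding_dwellings: int) -> float:
    """Weight = actual census dwellings of this form type / responding PES dwellings."""
    if responding_dwellings <= 0:
        raise ValueError("responding_dwellings must be positive")
    return census_dwellings / responding_dwellings


# Hypothetical (census total, responding PES dwellings) by form type.
counts = {"occupied": (7_000, 35), "unoccupied": (600, 4)}
weights = {form_type: form_type_weight(c, r) for form_type, (c, r) in counts.items()}


def weighted_ratio(people: list[tuple[float, int, int]]) -> float:
    """Ratio of weighted 'should have been counted' to weighted 'were counted'.

    Each tuple is (weight, should_have_been_counted, times_counted_in_census).
    """
    should = sum(w * s for w, s, _ in people)
    counted = sum(w * e for w, _, e in people)
    return should / counted


# Hypothetical people: one missed (0), one counted twice (2), two counted once.
people = [
    (weights["occupied"], 1, 1),
    (weights["occupied"], 1, 0),
    (weights["unoccupied"], 1, 1),
    (weights["unoccupied"], 1, 2),
]
ratio = weighted_ratio(people)
print(weights, round(ratio, 4))
```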


The net undercount in the 1996 Australian and New Zealand censuses was estimated at 1.6% and 1.2%, respectively. The standard sampling error associated with these estimates was 0.1% in both countries. The Australian PES results were not directly published but evaluated against other data sources before being published as adjusted PES results.

Adults aged 15-29 years, the most mobile segment of the population, had the highest undercount rate of 2.8% in Australia and 2.1% in New Zealand. Those aged 45 years and over had the lowest rates of 1.0% and 0.6%, respectively. The standard error for these broad age groups was 0.1-0.2% in Australia and 0.1-0.3% in New Zealand. In contrast with New Zealand, the Australian PES was able to produce estimates of net undercount for 5-year age groups, with standard errors of 0.2-0.4%. This was due to the larger sample size of the Australian PES.

Males had a higher rate of undercount than females in both Australia and New Zealand. There were also ethnic and geographic differentials in net undercount in both countries. The rate of net undercount of the Indigenous population in Australia (which made up about 2% of the Australian population in 1996) was six times the rate of the non-Indigenous population. In New Zealand, the rates of net undercount of the Maori and Pacific Islands ethnic groups (which together made up about one-fifth of the New Zealand population in 1996) were over three times that of the remainder of the population. For a full discussion of the net undercount results, see ABS (1997) and SNZ (1998a).


To offset the impact of correlation bias between the census and PES results, population estimates derived from the PES are compared to three main sources in Australia: estimates from the National Demographic Data Bank, Medicare enrolment numbers and the estimated resident population based on the previous census.

The National Demographic Data Bank is a population database maintained by the ABS using administrative data (notably births, deaths, and overseas arrivals and departures). The database is independent of census data and contains population data back to the year 1925. For the 1996 PES, these data were considered to measure age-sex totals well up to about age 30, after which there were some concerns about pre-1970 international migration data. Sex ratios derived from these data are considered most reliable for ages under 23 years.

Enrolment data from Medicare (the Australian government health rebate system) are considered a good source for calculating sex ratios, but less reliable for age-sex totals. Age-sex totals are least reliable among the older ages where people may remain enrolled in Medicare after their death.

Population estimates based on the previous census are considered most reliable at ages 0 to 4 years because at these ages the largest component of change is birth registration data. The data for other ages are not entirely independent of the National Demographic Data Bank and Medicare enrolment data because of the demographic adjustments made to PES estimates after the previous census.

In 1996, demographic adjustments resulted in relatively small changes to the PES estimates of undercount. Table 2 shows the estimates of undercount at the national level by five-year age group and sex, before and after adjustment. The most significant adjustments were made to males aged 30-34 years and 60-69 years (ABS 1999).

For males aged 30-34 years, an adjustment for the initial undercount rate of 4.03% resulted in a sex ratio of 101.2 (males per 100 females). This result was inconsistent with other data sources. Firstly, the sex ratio was higher than that from Medicare data (99.5) and adjacent age groups (100.4 for the 25-29 years and 99.6 for 35-39 years). Secondly, the resultant 1996 population estimate appeared too high compared to the 1991 estimated population. Thirdly, the undercount rate appeared too high compared with net undercount of 2.26% in 1986 and 2.63% in 1991. To resolve these inconsistencies, the net undercount rate of males aged 30-34 years was reduced to 2.42%.

For males aged 60-64 and 65-69 years, the initial undercount rate appeared too low when compared to the adjacent 55-59 and 70-74 year age groups, both in 1996 and historically. The rates also appeared too low when compared to that of females. Consequently, the undercount rate for the 60-64 and 65-69 age groups was linearly interpolated between the undercount rates for males aged 55-59 and 70-74 years.
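The interpolation for the two middle age groups can be sketched as below. It assumes the four five-year age groups are equally spaced, so the two interpolated rates fall one-third and two-thirds of the way between the end points. The input rates are hypothetical, not the published 1996 values.

```python
def interpolate_middle_rates(rate_55_59: float, rate_70_74: float) -> tuple[float, float]:
    """Linearly interpolate undercount rates for ages 60-64 and 65-69.

    Assumes equal spacing between the four five-year age groups.
    """
    step = (rate_70_74 - rate_55_59) / 3
    return rate_55_59 + step, rate_55_59 + 2 * step


# Hypothetical end-point rates (per cent).
rate_60_64, rate_65_69 = interpolate_middle_rates(rate_55_59=1.2, rate_70_74=0.6)
print(round(rate_60_64, 2), round(rate_65_69, 2))
```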


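Table 2. Estimates of net undercount by five-year age group and sex, before and after adjustment (table values not preserved).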



In New Zealand adjustment included a synthetic modelling approach to smooth across age. The smaller sample size in the New Zealand PES resulted in sampling errors much higher than those experienced in Australia. Given the limited PES results that were available and published (see SNZ 1998a), and considering the different outputs that were required, SNZ adopted a simple synthetic approach for modelling undercount (SNZ 1998b). Age and sex were key factors for modelling New Zealand's census coverage, as in Australia. However, ethnicity was also a critical factor, while geographical area was a less important variable once age, sex and ethnicity were taken into account.

SNZ produced very fine age group adjustment factors for a limited number of ethnic groups by sex. Initial testing revealed that given the limited amount of data for modelling the age group structure, two broad ethnic groupings worked best: Maori and Pacific Island combined, and 'Other'. For each ethnic-sex group, the adjustment factors were smoothed using a parametric model for the age structure. For an exposition of the underlying approach, see Congdon (1993).

The Rogers migration function (Rogers and Castro 1986) was used to capture the expectations of the age structure of undercount in the New Zealand PES. This prior knowledge drew on the structure of undercount in other countries with larger PES samples (notably Australia) and the age dependency of other related demographic phenomena (especially migration). The overall features of the curve came from the model, but the specifics of shape and height were determined from the PES data through estimating the function parameters. Due to the high sampling errors in the New Zealand PES data, some of the parameters were constrained to produce demographically plausible curves.
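For orientation, a commonly used form of the Rogers-Castro model schedule combines a declining childhood component, a labour-force-age peak and a constant. The sketch below evaluates that seven-parameter form. Whether SNZ used exactly this variant is an assumption, and the parameter values are hypothetical, chosen only to give a curve that peaks in young adulthood.

```python
import math


def rogers_castro(age: float, a1: float, alpha1: float, a2: float,
                  alpha2: float, mu2: float, lambda2: float, c: float) -> float:
    """Seven-parameter model schedule: childhood decline + labour-force peak + constant."""
    childhood = a1 * math.exp(-alpha1 * age)
    labour_force = a2 * math.exp(-alpha2 * (age - mu2) - math.exp(-lambda2 * (age - mu2)))
    return childhood + labour_force + c


# Hypothetical parameters, for illustration only.
params = dict(a1=0.02, alpha1=0.1, a2=0.06, alpha2=0.1, mu2=20.0, lambda2=0.4, c=0.003)
curve = {age: rogers_castro(age, **params) for age in range(0, 81)}
peak_age = max(curve, key=curve.get)
print(peak_age, round(curve[peak_age], 4))
```

Constraining some parameters, as the text describes, would correspond to fixing values such as `alpha1` or `c` in advance and estimating only the remaining ones from the PES data.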


Both Australia and New Zealand intend to improve existing processes and to encompass new technology for the 2001 PES.

In Australia, non-private dwellings, sparsely settled areas and discrete Indigenous communities will continue to be excluded from the PES. However, further efforts will be made to improve the census count in these areas by making more effective use of administrative data for coverage checks.

For form design, only minor modifications are being considered in both countries as the forms were considered reasonably easy to complete and process in 1996. Some changes will be needed to accommodate improvements in data capture technology.

In both countries the data collection phase comprised roughly one-third of the operational costs of the PES in 1996. In New Zealand, alternative methods of data collection will be considered, such as the use of telephone interviewing, telephone follow-up, computer aided interviewing and electronic questionnaires (where information is collected on a hand-held computer rather than on paper). These options will need to be assessed on the basis of how they can maximise response and reduce costs without compromising data quality. In Australia further effort will be made to reduce the extent of vague match addresses recorded by interviewers.

For the 1996 New Zealand PES, images of census forms were used during the matching and searching processes. The images were created during the census data capture process and allowed entire census forms to be viewed on a computer screen. While imaging technology was not used in Australia in 1996, it is planned to use it for the 2001 Census. The use of census images instead of paper census forms during matching and searching is expected to introduce efficiencies into the overall process. Firstly, once a census form has been imaged, access is immediate. In contrast, paper forms must be stored and then physically transferred to the PES processing area. Secondly, census data can be electronically transcribed to the PES database. Thirdly, imaging technology will allow several users to access the same image at once, satisfying competing priorities for census forms. While there are efficiencies to be gained, the success of this technology relies on the rapid retrieval of images.

Intelligent Character Recognition (ICR) is planned to be used for the first time to process the Australian PES in 2001. This technology enables hand-written characters to be captured, converted to a known character, and then output to file. It is also anticipated that PES data capture will be completed prior to the matching and searching processes. These two changes will allow a major improvement to take place. Previously, hand-written search addresses were manually transcribed from PES forms onto worksheets prior to the commencement of the searching process. By capturing data at an earlier stage in the overall process and using ICR technology, these addresses will be captured and automatically coded to a corresponding census collection district allowing census images for these districts to be flagged for PES processing. Hand-written country of birth responses will also be ICR captured and then automatically coded. The ICR technology will also convert marks made in optical mark recognition response boxes into electronic data.

One of the biggest challenges in both countries is to more fully integrate the PES processing system with the census processing system. The benefits of this approach are threefold: the PES process will be better positioned to take advantage of imaging technology; an interface between the PES and census processing systems can be created which will enable electronic matching to census records (enhancing the quality management of the matching and searching processes); and the expertise of census development staff can be used to create a much more sophisticated processing system.


Accurately measuring net undercount in the census is an evolving and ongoing process. The aim for any PES should be to continuously improve on previous efforts by using past experience and emerging technology. Both Australia and New Zealand are refining the 2001 PES form and processing systems to take advantage of the knowledge gained from the 1996 PES and improvements in computer and data capture technology. While preserving the independence of the PES from the census, these innovations should allow the 2001 PES to be completed with improved timeliness and data quality without increasing costs.


ABS (1997), Information Paper: Census of Population and Housing: Data Quality - Undercount, 1996, Catalogue No. 2940.0, Canberra.

ABS (1999), Demographic Estimates and Projections: Concepts, Sources and Methods, Statistical Concepts Library.

Choi C Y, Steel D G and Skinner T J (1988), Adjusting the 1986 Australian Census Count for Under-Enumeration, Survey Methodology, 14(2), p173-189.

Congdon P (1993), Statistical Graduation in Local Demographic Analysis and Projection, Journal of the Royal Statistical Society, Series A, 156, p237-270.

Hogan H (1992), The 1990 Post-Enumeration Survey: Operations and Results, Journal of the American Statistical Association, 88(423), p1047-1060.

OPCS (1993), Rebasing the annual population estimates, Population Trends, 73, p27-31.

OPCS (1995), 1991 Census, General Report, p113, London, HMSO.

Robinson J G, Ahmed B, das Gupta P and Woodrow K (1992), Estimation of Population Coverage in the 1990 United States Census Based on Demographic Analysis, Journal of the American Statistical Association, 88(423), p1061-1071.

Rogers A and Castro L J (1986), "Migration" in Migration and Settlement: A Multiregional Comparative Study, (ed. Rogers A and Willekens F J), Reidel, Dordrecht.

Statistical Science (1994), 9(4), p457-537.

SNZ (1998a), A Report on the 1996 Post Enumeration Survey, Catalogue No. 02.431.0096, Wellington.

SNZ (1998b), Adjustment of Post-censal Population Estimates for Census Undercount, Research Report #3, Wellington.

Statistics Canada (1998), A review of procedures for estimating the net undercount of censuses in Canada, the United States, Britain and Australia, Demographic Document, Current Demographic Analysis No. 5, Catalogue No. 91F0015MPE, Ottawa.

US Bureau of the Census (1996), Statistical Abstract of the United States: 1996, Washington.