RE-ENGINEERING THE CENSUS
Hundreds of thousands of Australians took advantage of the opportunity to use the Internet to respond to the 2006 Census of Population and Housing, conducted by the Australian Bureau of Statistics (ABS) in August 2006. The census has been held every five years since 1961. The introduction of a secure online option was a significant departure from the tradition of physical collection and despatch of completed paper census forms. However, use of the Internet was only one of the many innovative ways advanced technology was adapted in 2006 to bring in the answers in Australia’s biggest survey.
The gross figures of the operation alone begged for new and clever ways to capture the data: the details of over 20 million people, on up to 10 million hand-completed forms, collected by a workforce of nearly 30,000 from every occupant of the country’s 7.7 million square kilometres - plus the Australian Antarctic Territory and occupied offshore islands - delivered to area supervisors by foot, bicycle or private car, then forwarded under strict security by road transport to one single point on the land mass; and most of this work completed over a period of less than three weeks.
In these beginning years of the 21st century, the scale of the census demands the maximum use of digital data processing and communications technology to help contain labour costs and minimise human error. Yet the constitutional importance of the data alone dictates that any adoption of new techniques or emerging technology can occur only after thorough testing on a scale in keeping with the magnitude and watershed timing of the census.
AUSTRALIA'S BIGGEST SURVEY
The census is a huge undertaking by national standards. It represents a major investment in time and money for the ABS, in return for which the Bureau - and the country - obtains a vast electronic storehouse of data of exceptional quality: a detailed picture of the circumstances of each of Australia’s more than 20 million people on one night every five years. Apart from its obvious value in government administration and planning, census data forms the basis of the allocation of each state and territory’s seats in the House of Representatives, and is used in the distribution by the Australian Government of Goods and Services Tax revenue to the states and territories. It is also invaluable to business and community organisations, researchers and students.
To achieve this outcome, the ABS goes to great lengths to ensure two primary requirements are met - practically nobody is missed, and every completed census form finds its way to the Data Processing Centre (DPC) safely and securely. Security is a matter of overriding importance. Until the data is separated in processing from personally identifiable information - such as the name and address details shown on an original census form - it can be seen only by a restricted number of ABS employees.
The continuous effort required to bring about the 2006 Census is illustrated by the fact that by census night on 8 August 2006, ABS staff had already been working for well over a year on preparations for the next census, scheduled for 2011. Given the six to seven-year preparation time needed, the rate of convergence between the census and current technology may seem to lag behind the take-up of new techniques by the private sector, or even by individuals. But this is to be expected. Put simply, census night is not the time to trial last week’s headline technology development, let alone tomorrow’s beta applications.
The established technique of having a form delivered and retrieved in person by a census employee, or collector, has produced a fund of amusing or thought-provoking stories of intrepid collectors going to unusual lengths to reach people in remote or unlikely locations. A more mundane but equally important process is the logistical effort involved in transporting the completed forms securely from all the far-flung corners of Australia.
Hence the question '... how can we do this more easily and cheaply without sacrificing thoroughness and security?' In setting out to develop practical answers, the ABS and its predecessor, the Commonwealth Bureau of Census and Statistics, have established an international reputation for responsible early adoption of technology.
ADVANCES IN 2006
The 2006 Census saw the introduction of important improvements. These included advances in field operations and administration, such as advanced mapping techniques; the introduction of the Internet-based version of the census form; the use of more advanced processing technology; and improvements to the output of census data.
Producing maps for the census field force involved what might well be the largest individual map production project in Australia. Innovative mapping technology was used to produce individual maps for each of the 39,000 collection districts (CDs). These displayed the CD boundary over a topographic base with a level of detail suited to the size of the individual district. For large rural CDs, inset maps were also included to provide helpful detail. Purpose-designed jurisdiction maps were available to area supervisors together with copies of the individual CD maps. Census district managers had maps covering the larger territory for which they were responsible.
The ground for an innovative census was further prepared using new techniques to recruit the field workforce of 30,000. Instead of paper application forms, recruitment was carried out using call centres or Internet-based application forms. Job applications were uploaded frequently to a database, making it possible to track the progress of recruitment throughout the country, so that extra resources could be quickly applied in any areas where it was proving difficult to attract enough suitable people.
The 2006 Census was the first to provide all participants with the option of completing and submitting their census forms via the Internet. This process was known as the 'eCensus'.
An early version of the eCensus had been trialled on a small scale during the 2001 Census, but it was clear from the outset that protecting the privacy of Census responses, coupled with the potential scale of a national Internet census option, called for a great deal of preparation and testing. A census ‘dress rehearsal’ held in a number of communities in August 2005 included an eCensus option, and resulted in 8% of participants electing to submit their forms online.
ABS’s industry surveys have charted the rapid growth in Internet subscribers in recent years, including the very fast growth in the number of higher speed or broadband subscribers (see Internet activity). There was reason to believe that a significant number of Australians would take advantage of an Internet census option.
Apart from the convenience for some households of submission via the now familiar medium of the Internet, there were other advantages for users. Many people with visual impairment or other disabilities, who might normally require the assistance of family or friends to complete a paper census form, could use applications such as screen readers to complete a form independently online. In addition, people who were difficult to reach to collect paper forms - for reasons including geographical remoteness or even security provisions in blocks of flats - could more easily lodge their forms via the Internet.
The information technology company IBM Australia was contracted by ABS to build and host the eCensus application, using the strongest available encryption technology. Because of the strict security provisions in the contract, IBM itself did not have access to census responses, with the ‘private key’ or decryption technology being available only to the ABS.
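The arrangement described above, in which the hosting contractor can encrypt responses but only the ABS can decrypt them, is the defining property of public-key cryptography. The following is a toy sketch only, using textbook RSA with tiny primes. The article does not say which algorithm or key sizes the eCensus used, all function names are illustrative assumptions, and numbers this small offer no real security.

```python
# Toy illustration of public-key encryption (textbook RSA, tiny primes).
# NOT secure and NOT the eCensus implementation; purely illustrative.

def make_keys(p, q, e=17):
    """Return (public_key, private_key) built from two primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (requires Python 3.8+)
    return (n, e), (n, d)

def encrypt(public_key, value):
    n, e = public_key
    return pow(value, e, n)

def decrypt(private_key, ciphertext):
    n, d = private_key
    return pow(ciphertext, d, n)

# The statistical agency generates both keys and shares only the public one.
public_key, private_key = make_keys(61, 53)

# The hosting party can encrypt a response value with the public key alone...
ciphertext = encrypt(public_key, 65)

# ...but recovering it requires the private key, held only by the agency.
recovered = decrypt(private_key, ciphertext)
```

In practice a vetted cryptographic library and much larger keys would be used; the point of the sketch is only that encryption and decryption use different keys.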
The resulting Internet census form was highly interactive. Participants electing to complete the online form were able to log in using a unique twelve-digit personal identification number (PIN - the ‘eCensus number’) supplied to householders in a sealed security envelope delivered by hand with each paper census form. This was coupled with the accompanying individual census form number to give access to the eCensus.
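The two-part login described above can be sketched as a simple check that both identifiers belong to the same household. This is a minimal sketch under stated assumptions: the `ISSUED` table, the form number format and the function name are hypothetical, and only the twelve-digit length of the eCensus number comes from the article.

```python
import re

# Hypothetical store of issued credentials: form number -> eCensus number.
# The identifiers and the form number format are made up for illustration.
ISSUED = {
    "F0000123": "123456789012",
    "F0000456": "987654321098",
}

def can_log_in(form_number, ecensus_number):
    """Grant access only when both identifiers match the same household."""
    if not re.fullmatch(r"\d{12}", ecensus_number):
        return False  # the eCensus number must be exactly twelve digits
    return ISSUED.get(form_number) == ecensus_number
```

Requiring both values means that neither the form number nor the eCensus number is enough on its own to open a household's form.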
Once logged in, participants were able to move through the various ‘pages’ of the form, completing the details, navigating back to make any corrections, or electing to partially complete the form and retrieve the saved data later for finalisation. Printed guidance booklets delivered to every household provided instructions on using the eCensus, and a technical support telephone help line was available. Early reaction to the eCensus on Internet forums, such as the broadband choice forum Whirlpool, was generally positive. Forum participants commented favourably on how quickly they were able to fill out and submit the form.
Shortly after census night it was clear that almost 780,000 households, or 9.0% of all households, had opted to complete their forms online - the Australian Capital Territory recorded the highest take-up (15.9% of households), and the Northern Territory the lowest (6.3%). Expeditioners wintering in Australia's Antarctic Territory went online to complete their 2006 Census forms. Census information from Australians based at Casey, Davis, Mawson and Macquarie Island stations was collected in a matter of hours; previously the completed paper forms could only be processed once they were physically shipped out on the summer resupply voyages at the end of the year.
Both ABS and IBM expressed satisfaction with the response of the eCensus application to the demands placed on it, particularly on census night. Usage peaked at 72,000 submissions between 8:00 pm and 9:00 pm on the actual night, and at one point 55,000 users were logged on simultaneously. ‘During the 24-hour period of 8 August, eCensus delivered more than 12.5 million page views,’ a joint statement affirmed.
The eCensus form included a feedback or comment facility, and users were encouraged to comment on their experience. Overall feedback has been very positive, and comments will be examined as part of preparations by the ABS for the next census.
For processing, data from each eCensus form was loaded into the same file format as that used for data extracted from paper forms, and processed in the same way as all other census data. The processing system generated an image of the eCensus data to match that produced from the paper forms.
If participation in the eCensus was dependent on receiving hand-delivered login information, what impact if any did eCensus uptake have on the labour involved in census collection? The answer lay at least partly in the way eCensus submissions were linked to field operations through mobile telephone technology.
While the eCensus was a major step forward for Australia, an Internet-based census option had already been trialled successfully in Canada and New Zealand. It was in the area of census field communications that Australia broke new ground in management of the 2006 Census. ABS was aware of keen interest from other national statistical agencies as the system was developed, trialled and put into operation.
The receipt of a completed eCensus form, for example, automatically generated a text message sent to the mobile telephone of the Census collector for the relevant CD, advising that there was no need to call at that particular address to retrieve a paper form. Nearly 1.6 million short message service (SMS) messages were sent to collectors during the census field operation.
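The notification flow described above can be sketched as an event handler that looks up the collector for the relevant CD and sends a message. This is an illustrative sketch only: the lookup tables, addresses, phone numbers, message wording and the `send_sms` callback are all assumptions, not details of the ABS system.

```python
# Hypothetical lookups: address -> collection district (CD), CD -> collector mobile.
ADDRESS_TO_CD = {"12 Example St": "CD-0001", "7 Sample Rd": "CD-0002"}
CD_TO_COLLECTOR_MOBILE = {"CD-0001": "0400 000 001", "CD-0002": "0400 000 002"}

def on_ecensus_receipt(address, send_sms):
    """Tell the collector for the address's CD that no call is needed."""
    cd = ADDRESS_TO_CD[address]
    mobile = CD_TO_COLLECTOR_MOBILE[cd]
    message = (
        f"eCensus form received for {address} ({cd}). "
        "No need to call to collect a paper form."
    )
    send_sms(mobile, message)
    return mobile, message

# Usage: a stand-in SMS gateway that just records outgoing messages.
outbox = []
on_ecensus_receipt("12 Example St", lambda to, text: outbox.append((to, text)))
```

Passing the gateway in as a callback keeps the routing logic separate from whatever service actually delivers the text message.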
With a large mobile workforce in the census field force, ABS embraced SMS technology in various ways for the 2006 Census, primarily to help manage the 30,000 Census collectors working across the country. Whereas in the past operational developments and essential information had been communicated to collectors through field supervisors, SMS made it possible to advise a collector instantly and directly of any important development in their area, such as the receipt of an eCensus form from a particular address, or a request for a form - for example, through the Census Inquiry Service telephone help line - from a householder who might have been missed in the first distribution. The time and labour-saving benefits of such a system are obvious.
Another innovative use of SMS was in promoting the census to young adult Australians, identified by research as a group requiring targeted advocacy to encourage them to participate. Promotional messages were sent to the mobile phones of 80,000 young subscribers in metropolitan areas shortly before census night.
If SMS was of practical assistance in maintaining ‘quality assurance’ for the census, an advanced online field management system also played a major role. This system, linking the Census Management Units in each state and territory capital with field supervisors by computer, gave state management teams the ability to track field activity with great immediacy.
PROCESSING THE FORMS
Improvements in the use of intelligent character recognition (ICR), automatic repair and automatic coding proved a major step forward in efficient, accurate and thorough processing of paper forms from the 2006 Census. More efficient techniques for handling of forms and of the captured data itself within the DPC also represented major advances on previous censuses.
Twenty years ago, the 1986 Census DPC employed 1,600 people and took 18 months to complete processing and release the data. At its peak, the 2006 DPC employed half that number, and it expects to finish its job in less than twelve months, with the first release of detailed data scheduled to occur eleven months after census night. Yet from the point of view of DPC management, advances in technology and increased efficiency have enabled better outcomes from the census and better quality data than ever before. Australia boasts the fastest output of census data of any country.
By the morning after census night, information technology experts at the DPC calculated that they had 92 different applications in place to commence processing Census responses. Each of the applications inherited from the 2001 Census had been either replaced or significantly updated, and fresh applications were still being added.
‘Flow control’ at the DPC employed innovative wireless tracking of paper forms. Forms were received in boxes containing the intake from one CD, a ‘rule-of-thumb’ measure of the workload of a single census collector. As the boxes were moved around the DPC, logistics staff passed hand-held wireless scanners or ‘wands’ over bar codes on the boxes and then over similar bar codes at the entrance to each processing section of the centre. In this way every box of forms was accounted for throughout processing, and could be traced instantly if needed.
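The paired scans described above (a box bar code followed by a section bar code) amount to an append-only movement log from which the current location of any box can be read. The sketch below illustrates that idea only; the class, method names and bar code values are hypothetical and not taken from the DPC system.

```python
from datetime import datetime

class BoxTracker:
    """Records each (box, section) scan and reports where a box was last seen."""

    def __init__(self):
        self.history = {}  # box bar code -> list of (time, section bar code)

    def scan(self, box_code, section_code, when=None):
        """Log that a box was wanded at the entrance to a section."""
        when = when or datetime.now()
        self.history.setdefault(box_code, []).append((when, section_code))

    def locate(self, box_code):
        """Return the section of the most recent scan, or None if never scanned."""
        scans = self.history.get(box_code)
        return scans[-1][1] if scans else None

    def boxes_in(self, section_code):
        """List the boxes whose latest scan was at the given section."""
        return sorted(b for b in self.history if self.locate(b) == section_code)

tracker = BoxTracker()
tracker.scan("BOX-CD-0001", "RECEIVAL")
tracker.scan("BOX-CD-0002", "RECEIVAL")
tracker.scan("BOX-CD-0001", "SCANNING")
```

Keeping the full scan history, rather than only the latest location, is what allows a box to be traced back through every section it has passed.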
After checking on arrival to ensure each form was in suitable condition to be scanned, the forms were trimmed. The individual pages were passed in large batches, at the rate of up to 6,000 pages an hour, through 13 high technology scanners which captured an electronic image of each page. These images were stored in a central database, together with similar images derived from eCensus forms, ready for further processing. At a later stage, those images relating to individuals who had elected to have their details stored for 99 years in the Census Time Capsule would be transferred to microfilm for that purpose. For those who submitted paper forms, their descendants will be able to see an image of their ancestor’s handwriting.
The major advance in data handling at the DPC for the 2006 Census was the advent of simultaneous processing of different topics from the census forms, allowing far greater flexibility. This was made possible through storage of the individual census records on a highly advanced central database. In the 2001 Census, data had been captured on text files which were then moved through a series of separate databases as topics were completed in a strict sequence. By 2006 the coding workload on any single topic, such as occupation or industry, could be distributed in large tranches to any arrangement of teams, whereas in the past the records had to be processed in CD lots to ensure the orderly progress of the data through the system.
A major challenge in the automatic coding of handwritten responses on census forms is the almost unlimited variety of handwriting styles the computers will encounter in processing millions of images of pages. Major strides had been made in ICR between the 2001 and 2006 Censuses. Improved automatic coding, using much more powerful indexes against which to code responses and more accurate automatic repair processes, greatly reduced the need for manual assessment and correction. Quality assurance processes, which in the past involved manual checking of sample records to ensure that a high level of accuracy was being maintained, were largely automated in 2006, with the added benefit that individual coders received direct electronic feedback. Overall, this greater flexibility substantially increased the efficiency and speed of processing.
Most questions on the paper census form called for small horizontal pen marks to indicate householders’ responses from a list of possible answers. However, a number of questions required words and figures to be written. The ICR process translated these written submissions - now in the form of electronic images rather than on the original paper or eCensus files - into classification codes. Where the applications could not resolve a word, figure or letter successfully, an image of the problem character could be diverted to a staff member for manual coding, or ultimately a ‘snippet’ from the image of the original census form could be examined to decide on the meaning of a piece of handwriting.
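The fallback from automatic to manual coding described above can be sketched as a routing decision. This is a minimal sketch under stated assumptions: the confidence score, the threshold value, the index contents and the classification codes are all invented for illustration, and the article does not describe how the ICR applications made this decision internally.

```python
# Hypothetical coding index: recognised text -> classification code (codes made up).
CODING_INDEX = {"nurse": "C001", "plumber": "C002"}
CONFIDENCE_THRESHOLD = 0.90  # assumed cut-off, not an ABS figure

def route(recognised_text, confidence):
    """Return ('auto', code) when coding succeeds, else ('manual', None)."""
    key = recognised_text.strip().lower()
    if confidence >= CONFIDENCE_THRESHOLD and key in CODING_INDEX:
        return "auto", CODING_INDEX[key]
    # Unresolved: the image snippet would go to a staff member for manual coding.
    return "manual", None
```

For example, `route("Nurse ", 0.97)` is coded automatically, while a low-confidence reading or a word missing from the index is diverted to a person.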
Processing is divided into two ‘runs’ to speed up the output of census data. First release processing (FRP) covers topics which are simpler to process, such as age, sex and religion, and which achieve a high degree of automatic coding. Second release processing covers complex topics such as industry, qualifications and occupation, which require more manual intervention to decipher responses. FRP is geared to a release of data as early as possible in 2007.
How accurate is the overall count, and how do we know? The answer is through the post-enumeration survey (PES), which is conducted by the ABS on a selected sample of households to test the accuracy of the original count. In 2001 the PES was conducted using paper forms. In 2006 the PES interviewers used a new application on notebook computers, designed to check on the original data captured from the census forms completed by the test households.
NEW DEAL FOR USERS
Developments in the release of census data are making it more accessible for users than ever before. The advent of free access to statistical publications and a wide range of other data on the ABS web site since 2005 has been followed by major innovations on the site.
Traditionally census data has been offered to the public on a ‘one-size-fits-all’ basis, where users were presented with geographical information in batches designed to meet straightforward needs, but which required them to employ the ABS statistical consultancy service to adapt the data for more specialised uses. This was largely because the data was retained in a pre-formatted state. The new system maintains the data in a raw form that can be adapted ‘on the fly’ to more easily respond to the specific needs of the user.
Users can already access small area census information easily and quickly on the web site in a variety of forms, including through interactive maps, and manipulate the figures online to show desired results. Tables or graphs can be produced online with a few clicks of a mouse.
By the time of the final main release of 2006 Census data in mid-2007, the site will feature a new ‘table builder’ which will enable users to create their own tables in a variety of forms using multiple variables.
THE CENSUS OF THE FUTURE
The success of the first eCensus application has ensured that this option will continue to be available in future censuses. No doubt a greater proportion of households will opt to use the eCensus as the application is further refined and public familiarity with the Internet further expands.
Improved processing and coding techniques offer the real possibility of expanding the actual content of the census. The two main barriers to expansion have been ‘respondent load’ (the magnitude of the householder’s task in completing the census form) and processing cost. New solutions are already reducing the cost of processing, and the ABS is considering possible options for the future. One option for broadening the content of the census without increasing the load on the householder is to conduct a '50/50' sample-census. This might involve, for example, half the households in Australia providing information on social questions, such as family details, background and education, and the other half responding to economic questions such as income and employment. Statisticians believe such a census would produce a great deal more information at the same cost, while retaining the advantages of the single-form census: quality small area statistics and a very large sample of population. Relieved of the burden of asking every householder for the same demographic and economic information, the census could expand into new areas such as the interactions of families across households - for example, where one family provides support for aged parents living independently - or providing data about people with multiple jobs.
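A '50/50' split of households between two question sets could be sketched as follows. The article does not say how the two halves would be selected; the simple random split, the function name and the form labels below are assumptions made purely for illustration.

```python
import random

def assign_forms(household_ids, seed=0):
    """Randomly split households into 'social' and 'economic' halves."""
    ids = list(household_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    half = len(ids) // 2
    return {
        hid: ("social" if position < half else "economic")
        for position, hid in enumerate(ids)
    }

# Usage: ten hypothetical household identifiers.
assignment = assign_forms(range(1, 11))
```

A real design would more likely stratify the split within each small area, so that both question sets still yield reliable small area statistics.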
However, the ABS believes the one-night ‘snapshot’ or ‘as enumerated’ census is here to stay. Where other countries have moved to a census taken over longer periods of time, or ‘place of usual residence’ counts, the Bureau believes such censuses cost more to run and have resulted in little or no improvement in the quality of the data. The snapshot technique is seen as simpler for respondents to understand while avoiding errors brought about by different conceptions of a ‘normal’ or ‘usual’ place of residence.