Using administrative data to improve the Census count

Refining our methods to support a high quality 2021 Census count

Released
11/06/2021

The ABS is using administrative data to help assess which houses were empty on Census night, and to help improve the Census count.

We work hard to collect data from every house that was occupied on Census night, but we still need to adjust the count for a small number of occupied houses where no Census form was received.

We do this using a process called imputation, where we copy basic Census information (number of people with their age and sex) from another similar household (a donor) to represent the missed people.

We know from looking at results from the 2016 Post Census Review that, despite our best efforts, we sometimes impute data when the house was in fact empty on Census night.  If this happens a lot in an area, it can make the counts for that area larger than they should be.

Our research also shows that when we adjust the Census count using imputation, we tend to impute more older Australians than we should.

Improvements using administrative data

Using administrative data will improve our Census count in two ways. 

First, it helps us to better assess whether a house was empty on Census night.  This helps to ensure that we don’t over-adjust Census counts.  This is especially important for areas where it is harder to make this assessment, such as inner-city areas with high numbers of apartment buildings.

Second, administrative data helps us to choose a donor household where the people are more similar in age to those who were missed.  This ensures that the adjustments we make to the count do not over-represent older Australians.

Assessing which houses were empty

After Census data has been collected, we look at the small number of houses where no Census form was received and where Census field staff are still unable to work out whether the house was occupied (for the 2016 Census, this was about 3% of all houses).

We must decide whether these houses were empty on Census night, or whether the Census count should be adjusted to cover people who were missed.  For the 2021 Census we will use administrative data to help with this decision.

First, we create indicators from administrative data for the houses where we need to make an assessment.  Some of these are specific to a house, while some relate to a group of houses across a wider area.  All administrative data is stored safely and securely, and household level data is always kept confidential by storing it without address information attached.

The administrative data we use includes government services data (Medicare, Centrelink and Tax Office data), electricity usage from energy distributors and information from the ABS Address Register.  We also use data from the previous Census.

Table 1 shows the type of indicators we use to assess whether houses were empty or occupied.  Indicators from government services data and the ABS Address Register relate to houses, while indicators from electricity usage and the previous Census are created for a group of houses across an area.

Indicators from electricity data are created at an area level to strengthen privacy (refer to Note 1 for more information about the use of electricity data).  Although this information loses some precision when we use it at an area level, it still provides an important signal of occupancy because it is more up to date than government services data at the time of the Census.  Government services data is updated more slowly and tends to be a better signal of whether a house is empty or occupied over the long term.
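
As a rough illustration of how an area-level indicator can carry an occupancy signal without keeping house-level detail, the sketch below rolls per-house electricity readings up into a single rate per area.  The function name, the input layout and the 1.0 kWh cut-off are all assumptions made for this example; the ABS has not published how its electricity indicator is calculated.

```python
from collections import defaultdict

def area_occupancy_rates(readings, threshold_kwh=1.0):
    """Return {area_code: share of houses whose usage suggests occupancy}.

    readings: iterable of (area_code, daily_kwh) pairs (hypothetical layout).
    threshold_kwh: assumed cut-off at or above which a house looks occupied.
    """
    total = defaultdict(int)
    active = defaultdict(int)
    for area_code, daily_kwh in readings:
        total[area_code] += 1
        if daily_kwh >= threshold_kwh:
            active[area_code] += 1
    # Only the area-level rate is returned; house-level readings are not kept.
    return {area: active[area] / total[area] for area in total}

# Made-up readings for two areas.
rates = area_occupancy_rates([("A", 5.0), ("A", 0.2), ("B", 3.0)])
```

In this toy input, one of the two houses in area "A" falls below the assumed cut-off, so the area rate is lower there than in area "B".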

Table 1: Types of indicators used to assess whether a house was empty or occupied
Level of indicator: Specific to a house
Indicators used:
  • The possible number, sex, and age group of people residing in the house
  • Whether people likely to reside in the house recently interacted with services provided by the government
  • The type of house, e.g. stand-alone or apartment
Data source:
  • Medicare
  • Centrelink
  • Tax Office
  • ABS Address Register

Level of indicator: Grouped for houses across a wider area
Indicators used:
  • An estimated occupancy rate at Census time for houses in the area requiring an occupancy assessment (using electricity data)
  • The estimated occupancy rate across all houses in the area from the 2016 Census
Data source:
  • Electricity usage
  • 2016 Census

 

We then use these indicators to assess whether houses that didn’t submit a form were empty or occupied.  For example, if Centrelink data indicates that a government payment was recently received at a house, we are more likely to decide it is occupied.  On the other hand, if the house is in an area where electricity usage indicates a low occupancy rate, we are more likely to decide it is empty.

We have developed a set of rules (a statistical model) based on how well this information could have assessed whether a house was empty in the 2016 Census.  The assessment for each dwelling is done automatically by applying these rules, so an ABS employee running this process will not be looking directly at this information.
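
To make the idea of automatically applied rules more concrete, here is a minimal sketch that combines the Table 1 indicator types into a single occupancy score.  It assumes a logistic-style model, and every weight, field name and the 0.5 cut-off is hypothetical; the form and parameters of the actual ABS model are not described here.

```python
import math
from dataclasses import dataclass

@dataclass
class HouseIndicators:
    recent_government_interaction: bool  # house level (government services data)
    possible_residents: int              # house level (government services data)
    is_apartment: bool                   # house level (ABS Address Register)
    area_rate_electricity: float         # area level, between 0 and 1
    area_rate_2016: float                # area level, between 0 and 1

# Hypothetical weights, for illustration only (not fitted to any data).
INTERCEPT = -3.0
W_INTERACTION = 2.0
W_RESIDENT = 0.5
W_APARTMENT = -0.5
W_ELECTRICITY = 2.0
W_CENSUS_2016 = 1.0

def occupancy_probability(h: HouseIndicators) -> float:
    """Combine the indicators into a score between 0 and 1."""
    z = (INTERCEPT
         + W_INTERACTION * h.recent_government_interaction
         + W_RESIDENT * h.possible_residents
         + W_APARTMENT * h.is_apartment
         + W_ELECTRICITY * h.area_rate_electricity
         + W_CENSUS_2016 * h.area_rate_2016)
    return 1.0 / (1.0 + math.exp(-z))

def assess(h: HouseIndicators, cutoff: float = 0.5) -> str:
    """Apply the rule automatically; no person inspects the indicators."""
    return "occupied" if occupancy_probability(h) >= cutoff else "empty"

# A house with a recent government interaction in a high-occupancy area.
house_a = HouseIndicators(True, 2, False, 0.9, 0.9)
# An apartment with no admin-data signal in an area with low electricity usage.
house_b = HouseIndicators(False, 0, True, 0.3, 0.8)
```

With these illustrative weights, `assess(house_a)` gives "occupied" and `assess(house_b)` gives "empty", mirroring the two examples above: a recent government interaction pushes the decision towards occupied, while a low area-level electricity rate pushes it towards empty.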

When we apply this new approach to 2016 Census data, it shows that we would have set another 1.7% of houses as empty, which would have reduced the Census count of people by about 2%.  (Our official population estimates would not have changed, as we create those using a separate process which uses counts from the Census as only one of a number of inputs.)  These reductions match closely with estimates from the 2016 Post Census Review, making us confident our approach is improving Census counts.

Bigger reductions in Census counts are expected in inner-city areas, where there are large numbers of high-rise apartments and it is harder to tell whether an apartment is empty.  If we had used this new approach in 2016, counts for some of these areas would have been 5-10% lower.

Adjusting the Census count

For the 2021 Census, we will also use administrative data to help choose donor houses with people who are more similar in age to the people who were missed.

For houses we assessed as occupied on Census night, we first derive the number and ages of people recorded at the house from government services data (the same information used in Table 1 above).  We then use this information to help choose a “donor” house from the Census that is more likely to have people of the right age (refer to Note 2 for more information).

For example, administrative data may indicate that two people aged 30-34 live in a house that we assessed as occupied.  We then choose a donor house from the Census where administrative data also indicates there were two people aged 30-34.  The Census counts from this donor house are copied across for the missed people.  This is more likely to provide counts of people in the right age group than just choosing a house in the same area at random.
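
The donor selection in this example can be sketched as follows.  This is an illustration only: the record layout, the function names and the fallback to any donor in the area when no age profile matches are assumptions, not a description of the ABS procedure.

```python
import random

def choose_donor(target_admin_ages, donors, rng):
    """Pick a donor whose admin-data age profile matches the missed house.

    target_admin_ages: age groups that admin data suggests for the missed house.
    donors: Census-responding houses in the same area (hypothetical records).
    """
    target = sorted(target_admin_ages)
    matches = [d for d in donors if sorted(d["admin_ages"]) == target]
    # Assumed fallback: with no matching profile, choose among all donors.
    pool = matches or donors
    return rng.choice(pool)

def impute(target_admin_ages, donors, rng=None):
    """Copy the donor's Census people (age, sex) for the missed house."""
    rng = rng or random.Random()
    donor = choose_donor(target_admin_ages, donors, rng)
    # The donor's Census counts are copied, not the admin-data counts.
    return list(donor["census_people"])

# Made-up donor records for one area.
donors = [
    {"admin_ages": ["70-74"], "census_people": [(72, "F")]},
    {"admin_ages": ["30-34", "30-34"], "census_people": [(31, "F"), (33, "M")]},
    {"admin_ages": ["50-54", "50-54"], "census_people": [(52, "M"), (51, "F")]},
]

imputed = impute(["30-34", "30-34"], donors, random.Random(0))
```

Here only the second record has the same administrative age profile as the missed house, so its Census people are the ones copied across.  As Note 2 explains, the administrative data guides the choice of donor but is not itself substituted for the missed people.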

Notes

  1. The 2021 Census Administrative Data Privacy Impact Assessment recommended proceeding cautiously with the use of electricity data.  In response, the ABS has stated it will only use this information at an area level for the 2021 Census.
  2. We don’t use the administrative data counts as direct substitutes for people missed in the house.  This is something we may consider for the future, but further research and privacy assessment are required to understand whether this would be appropriate.