
Creating household structures using the Life Course Dataset

Methodology for creating household structures using the Life Course Dataset

Released: 09/12/2025 11:30am AEDT

What are the Life Course Dataset household structures?

The Life Course Dataset household structures are derived from administrative data sources using the Person Level Integrated Data Asset (PLIDA). They represent annual snapshots of individuals who were living together in the same dwelling as of 30 June for each year from 2006 to 2021. These structures have been developed to support longitudinal analysis of households in administrative data. 

The household structures have been developed as part of the pilot Life Course Data Initiative (LCDI), which aims to build an evidence base to support long-term policy responses that address disadvantage, particularly among children and their families.

While the Census provides detailed household information, it is only available every five years. PLIDA offers more frequent data collection but does not explicitly or consistently define household membership. To fill this gap, household structures have been created by linking individuals to common addresses at each annual snapshot. This approach enhances the utility of PLIDA and supports broader analysis within the Life Course Dataset.

This information paper outlines the methodology used to construct household structures. It includes:

  • Key findings about the household structures.
  • A definition of the Life Course Dataset household structures.
  • An overview of how the household structures are created from administrative data sources at each annual snapshot.
  • A comparison between the Life Course Dataset household structures and 2021 Census data.
  • Considerations for interpreting and using the Life Course Dataset household structures.

Data included in this release are not official statistics. They provide experimental information about people recorded in administrative data, with a particular focus on households. This release explains the methodology used to create household structures using the Life Course Dataset, which will serve as the basis for future releases from the Life Course Data Initiative, including the experimental child-level indicator.


Households in the Life Course Dataset

  • At 30 June 2021, the number of private occupied households in Australia estimated in the Life Course Dataset was 9.6 million. Of these, 2.5 million (26.5%) were households with children (aged 0-14 years).
  • The number of households has increased from 2006 to 2021, growing at an average annual rate of 2.4% for all households and 1.7% for households with children.
  • The number of households with children as a proportion of total households has decreased over this period, from 29.3% to 26.5%.
  • The average size of households reduced from 2.68 persons per household (0.53 children per household) in 2006 to 2.63 persons (0.48 children per household) in 2021.
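The measures reported above (number of households, share of households with children, and average persons and children per household) can be sketched as follows. This is a minimal illustration on invented toy data, not the ABS production code; the function name `summarise_households` and the input layout (household identifier mapped to member ages) are assumptions made for the example.

```python
from typing import Dict, List

CHILD_MAX_AGE = 14  # children are persons aged 0-14 years


def summarise_households(households: Dict[str, List[int]]) -> Dict[str, float]:
    """Summarise households given a mapping of household id -> member ages.

    Hypothetical helper for illustration only.
    """
    if not households:
        raise ValueError("at least one household is required")
    n_households = len(households)
    # Households containing at least one child aged 0-14.
    n_with_children = sum(
        1 for ages in households.values() if any(a <= CHILD_MAX_AGE for a in ages)
    )
    n_persons = sum(len(ages) for ages in households.values())
    n_children = sum(
        sum(1 for a in ages if a <= CHILD_MAX_AGE) for ages in households.values()
    )
    return {
        "households": n_households,
        "share_with_children": n_with_children / n_households,
        "avg_persons": n_persons / n_households,
        "avg_children": n_children / n_households,
    }


# Toy data, invented for illustration only.
toy = {"h1": [34, 36, 4], "h2": [71], "h3": [28, 29], "h4": [41, 12, 9]}
summary = summarise_households(toy)
```

Here `summary["share_with_children"]` is 0.5 and `summary["avg_persons"]` is 2.25 for the four toy households.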

Defining the Life Course Dataset household structures

To create the Life Course Dataset household structures, we follow the ABS standard definition for households. The ABS defines a household as:

“One or more persons, at least one of whom is at least 15 years of age, usually resident in the same private dwelling.” 

The Life Course Dataset household structures are created by identifying persons who share a dwelling, indexed by a common address, at annual snapshot dates.

The Life Course Dataset household structures include persons who are in the Life Course Dataset population, which is a scoped population created from the Person Level Integrated Data Asset (PLIDA).

When counting households, we exclude: 

  • non-private dwellings (e.g. institutions, boarding houses)
  • households containing only children under 15 years of age.

This approach aligns with the ABS standard definition of a household and ensures consistency across administrative and statistical applications. 
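The definition and exclusions above can be sketched as a grouping step followed by a filter. This is an illustrative sketch only: the `Person` record, the `address_id` field (standing in for a de-identified address) and the per-person `private_dwelling` flag are assumptions for the example, since this paper does not describe how non-private dwellings are identified in the data.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional

MIN_ADULT_AGE = 15  # a household needs at least one person aged 15 or over


class Person(NamedTuple):
    person_id: str
    age: int
    address_id: Optional[str]  # hypothetical de-identified address; None if missing
    private_dwelling: bool     # hypothetical flag for private dwellings


def build_households(persons: List[Person]) -> Dict[str, List[Person]]:
    """Group persons by address, then apply the household definition."""
    by_address: Dict[str, List[Person]] = defaultdict(list)
    for p in persons:
        # Persons without an address, or in non-private dwellings, are not grouped.
        if p.address_id is None or not p.private_dwelling:
            continue
        by_address[p.address_id].append(p)
    # Keep only addresses with at least one person aged 15 or over.
    return {
        addr: members
        for addr, members in by_address.items()
        if any(m.age >= MIN_ADULT_AGE for m in members)
    }


# Toy data, invented for illustration only.
people = [
    Person("p1", 40, "a1", True),
    Person("p2", 8, "a1", True),
    Person("p3", 10, "a2", True),   # child-only address: not counted
    Person("p4", 82, "a3", False),  # non-private dwelling: not counted
    Person("p5", 25, None, True),   # missing address: cannot be grouped
]
households = build_households(people)
```

Only address `a1` qualifies as a household in this toy example.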

The image below provides examples of how household structures change across selected snapshot dates. In the second example, we see that on 30 June 2006, Person B shared a dwelling with one other person. By 30 June 2015, Person B had moved to a different dwelling and lived alone.

The examples illustrate that the household structures do not track household groups over time, but indicate which persons share an address at each snapshot date. This means that if a family moves address, then their household identifier will change, even though the household group may remain the same.


The diagram illustrates how household structures change over time for two individuals, Person A and Person B, across four snapshot dates: 30 June 2006, 2010, 2015, and 2021. 

The layout consists of two rows, one for each person, a female and a male adult. Each row contains four dwelling images, representing the household composition of that person at each snapshot date. The dwellings are arranged chronologically from left to right. 

For Person A, the diagram shows that they remain at the same de-identified address across all four snapshots, but their household structure changes over time. At 30 June 2006, Person A is living alone. At 30 June 2010, they are living with a male adult. At 30 June 2015, their dwelling includes the same male adult and one female child. At 30 June 2021, the dwelling has grown further to include the same male adult, the female child, and now an additional male child. 

For Person B, the diagram shows that at 30 June 2006 they are living with a male adult at one de-identified address, and this arrangement remains unchanged at 30 June 2010. At 30 June 2015, Person B has moved to a different de-identified address and is living alone. This situation continues at 30 June 2021, with Person B still living alone at the new address. 


Creating the Life Course Dataset household structures

The household structures are created using integrated data from the Person Level Integrated Data Asset (PLIDA). PLIDA is a secure data asset combining information on health, education, government payments, income and taxation, employment, and population demographics to create a comprehensive picture of Australia over time.

To create the Life Course Dataset household structures, there are four key steps:

  • Step 1: Creating an ever-resident administrative population.
  • Step 2: Scoping the population at each snapshot date to obtain the Life Course Dataset population.
  • Step 3: Grouping people into households at each snapshot date.
  • Step 4: Imputing missing addresses.

The steps used to create household structures are illustrated below.


The image is a flow diagram outlining the creation of the Life Course Dataset household structures from the ever-resident population in PLIDA. 

There are five main boxes in the diagram. The first box on the left represents the ever-resident Australian population. This population includes anyone who has appeared at least once in any of the Australian Taxation Office, Centrelink or Medicare datasets from 2006 onwards.

Data from the left-hand box feeds into the next box, which represents the rules used to determine which individuals were living in Australia on the reference date, 30 June of each year, using birth, death, migration, and activity information. For each year, people are removed from the scoped population if they: died before 30 June; were born after 30 June; were not in Australia for at least 12 months within the 16 months following 30 June; had no activity in the last 1, 2 or 5 years (depending on age) before 30 June; or were older than 115 years on 30 June.

This scoped population data feeds into the third box in the centre, showing the replication of the scoping process annually to produce a population snapshot for each year from 30 June 2006 to 30 June 2021.

The data from the scoped population feeds into the fourth box, which is used to create households. For each person in the scoped population, the best address location at any point in time is determined from the available data sources. For each year, people sharing the same de-identified address on 30 June are grouped into households. Where address information is missing, imputation is applied to fill the gaps.

Data from the fourth box feeds into the final box, showing the annual replication of the process to produce households from 30 June 2006 to 30 June 2021. 

Step 1: Creating an ever-resident administrative population

To create the Life Course Dataset household structures, the first step is to create what we call the ever-resident administrative population.

This population is created from the PLIDA Person Linkage Spine, which uses data linkage methods to combine records from the Medicare Consumer Directory (Medicare registrations), DOMINO Centrelink Administrative data (Centrelink registrations) and Personal Income Tax data (Tax registrations). 

Anyone who has appeared in at least one of these datasets since 2006 is included in the ever-resident population. The aim is to cover all people who were living in Australia at any time from 2006 onwards. At 30 June 2021, the ever-resident administrative population included around 39 million people.
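Conceptually, the ever-resident population is the union of the people appearing in the three source datasets. The sketch below shows this under two assumptions that are not part of this paper: that each source has already been linked to a common person identifier (as the Person Linkage Spine provides), and that each input has already been restricted to records from 2006 onwards. The function name `ever_resident` is hypothetical.

```python
from typing import Iterable, Set


def ever_resident(
    medicare_ids: Iterable[str],
    centrelink_ids: Iterable[str],
    tax_ids: Iterable[str],
) -> Set[str]:
    """Return every person id seen in at least one of the three sources."""
    return set(medicare_ids) | set(centrelink_ids) | set(tax_ids)


# Toy identifiers, invented for illustration only.
population = ever_resident(["s1", "s2"], ["s2", "s3"], ["s4"])
```

A person who appears in more than one source (such as `s2` here) is counted once.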

Step 2: Scoping the population at each snapshot date

To create the Life Course Dataset household structures, the next step is to determine which people in the ever-resident administrative population were living in Australia at the reference date of 30 June for each year from 2006 to 2021. 

This process, known as scoping, applies a series of rules to align the ever-resident administrative population with the concept of the Estimated Resident Population (ERP). The ERP includes all people who usually live in Australia, including those temporarily overseas, and excludes short-term visitors.

The approach to scoping the population for the Life Course Dataset household structures is described in Scoping the population in the Life Course Dataset.  
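As a rough illustration, the scoping rules summarised in the flow diagram description can be expressed as a single predicate applied to each person at each snapshot date. This is a simplified sketch, not the method in the linked paper: the `PersonRecord` fields are hypothetical, the months-in-Australia count is assumed to be precomputed, and the activity window (1, 2 or 5 years) is passed in as a parameter because the age-to-window mapping is not given here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_AGE_YEARS = 115


@dataclass
class PersonRecord:
    birth_date: date
    death_date: Optional[date]
    months_in_australia_next_16: int       # months present in the 16 months after the snapshot
    last_activity_before_snapshot: Optional[date]  # latest activity on or before the snapshot
    activity_window_years: int             # 1, 2 or 5, depending on age (mapping not specified)


def age_on(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def in_scope(p: PersonRecord, snapshot: date) -> bool:
    """Apply the removal rules in turn; a person failing any rule is out of scope."""
    if p.death_date is not None and p.death_date < snapshot:
        return False  # died before the snapshot
    if p.birth_date > snapshot:
        return False  # born after the snapshot
    if p.months_in_australia_next_16 < 12:
        return False  # fewer than 12 of the following 16 months in Australia
    if age_on(p.birth_date, snapshot) > MAX_AGE_YEARS:
        return False  # older than 115 years
    # Snapshots fall on 30 June, so shifting the year is always a valid date.
    window_start = snapshot.replace(year=snapshot.year - p.activity_window_years)
    if p.last_activity_before_snapshot is None:
        return False  # no recorded activity
    return p.last_activity_before_snapshot >= window_start


snap = date(2021, 6, 30)
resident = PersonRecord(date(1980, 1, 1), None, 16, date(2021, 3, 1), 2)
overseas = PersonRecord(date(1980, 1, 1), None, 5, date(2021, 3, 1), 2)
inactive = PersonRecord(date(1980, 1, 1), None, 16, date(2015, 1, 1), 5)
```

In this toy example only `resident` is in scope at 30 June 2021; repeating the check for each 30 June from 2006 to 2021 would give the annual snapshots.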

Step 3: Grouping people into households at each snapshot date

To create the Life Course Dataset household structures, the next step is to group in-scope persons who share the same address at the reference date of 30 June each year from 2006 to 2021. This is done using the PLIDA Core Locations module.  

The Core Locations module is a PLIDA dataset that consolidates address information from multiple source datasets to determine a single location for each person from 2006. The source datasets for the Core Locations module are: 

  • Medicare Consumer Directory (MCD)
  • Australian Taxation Office (ATO)
  • Centrelink Administrative Data (DOMINO).

Instead of storing full address text, the Core Locations module uses a secure hashed Address Register Identifier (ARID) that represents each address in the ABS Address Register. Hashing transforms the ARID into a unique and anonymised numerical code that does not contain the original address information.  
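The property that matters for household creation is that hashing is deterministic: the same ARID always produces the same code, so people at the same address can still be grouped without exposing the address. The sketch below illustrates that property with a keyed hash (HMAC-SHA-256) from the Python standard library. The actual hashing method used for the Core Locations module is not described in this paper, so the algorithm, the key and the function name `hash_arid` are all assumptions for illustration.

```python
import hashlib
import hmac


def hash_arid(arid: str, secret_key: bytes) -> int:
    """Map an ARID to a numerical code using a keyed hash (illustrative only)."""
    digest = hmac.new(secret_key, arid.encode("utf-8"), hashlib.sha256).digest()
    # Truncate to 8 bytes to obtain a compact integer code. Truncation makes
    # collisions theoretically possible, though extremely unlikely.
    return int.from_bytes(digest[:8], "big")


# Toy key and identifiers, invented for illustration only.
key = b"demo-key"
code_a = hash_arid("ARID-0001", key)
code_a_again = hash_arid("ARID-0001", key)
code_b = hash_arid("ARID-0002", key)
```

Here `code_a` and `code_a_again` are identical, while `code_b` differs, and neither code reveals the original identifier without the key.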

The Core Locations module contains address history information for persons in PLIDA. Address history information is stored as a series of location episodes, which include the hashed ARID, other geographic information, and a start date and end date. To identify who shared an address at a point in time, we look at who shared the same hashed ARID on that date (for example, 30 June 2006).  
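The lookup described above can be sketched as two small steps: find each person's location episode that covers the snapshot date, then group people by hashed ARID. The `LocationEpisode` layout, the treatment of a missing end date as an open-ended episode, and the inclusive end date are assumptions for this example rather than documented features of the Core Locations module.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple, Optional


class LocationEpisode(NamedTuple):
    person_id: str
    hashed_arid: str
    start_date: date
    end_date: Optional[date]  # None is treated as open-ended (assumption)


def address_on(episodes: List[LocationEpisode], snapshot: date) -> Dict[str, str]:
    """Return person id -> hashed ARID for episodes covering the snapshot date."""
    located: Dict[str, str] = {}
    for ep in episodes:
        ended = ep.end_date is not None and ep.end_date < snapshot
        if ep.start_date <= snapshot and not ended:
            # If episodes overlap, the later one in the list wins (assumption).
            located[ep.person_id] = ep.hashed_arid
    return located


def group_by_address(located: Dict[str, str]) -> Dict[str, List[str]]:
    """Group person ids that share the same hashed ARID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for person_id, arid in located.items():
        groups[arid].append(person_id)
    return {arid: sorted(ids) for arid, ids in groups.items()}


# Toy episodes, invented for illustration only.
episodes = [
    LocationEpisode("A", "h01", date(2005, 1, 1), None),
    LocationEpisode("B", "h02", date(2004, 3, 1), date(2012, 8, 31)),
    LocationEpisode("B", "h03", date(2012, 9, 1), None),
    LocationEpisode("C", "h02", date(2004, 3, 1), None),
]
households_2006 = group_by_address(address_on(episodes, date(2006, 6, 30)))
households_2015 = group_by_address(address_on(episodes, date(2015, 6, 30)))
```

In the toy data, persons B and C share `h02` at 30 June 2006, and by 30 June 2015 person B is alone at `h03`, mirroring the Person B example earlier in this paper.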

Step 4: Imputing missing addresses

To create the Life Course Dataset household structures, it is essential to have hashed ARID information available. When hashed ARIDs are missing, we use imputation to fill the gaps. This means that we estimate the hashed ARID based on other available information. We use two main imputation methods:

  1. Using previous hashed ARIDs. If a person had a valid hashed ARID in the past, we carry it forward to fill in missing data.
  2. Using relationship information. If a person’s hashed ARID is missing, we copy it from the hashed ARID of their parent or partner. This occurred primarily for children or people in couple relationships. 
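The two imputation methods can be sketched as follows. This is a simplified illustration on invented data: the function names, the use of `None` for a missing hashed ARID, and the `relatives` mapping (person id to parent or partner ids) are assumptions, and the order in which the two methods are applied in practice is not specified in this paper.

```python
from typing import Dict, List, Optional


def carry_forward(history: List[Optional[str]]) -> List[Optional[str]]:
    """Method 1: fill a missing value with the most recent earlier valid value.

    `history` holds one hashed ARID per snapshot year, in chronological order.
    """
    filled: List[Optional[str]] = []
    last: Optional[str] = None
    for value in history:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def impute_from_relatives(
    addresses: Dict[str, Optional[str]],
    relatives: Dict[str, List[str]],
) -> Dict[str, Optional[str]]:
    """Method 2: copy a missing hashed ARID from a parent or partner."""
    result = dict(addresses)
    for person_id, value in addresses.items():
        if value is not None:
            continue
        for relative_id in relatives.get(person_id, []):
            # Use the first listed relative with a known (non-imputed) address.
            if addresses.get(relative_id) is not None:
                result[person_id] = addresses[relative_id]
                break
    return result


# Toy data, invented for illustration only.
filled_history = carry_forward(["h1", None, None, "h2", None])
imputed = impute_from_relatives(
    {"child": None, "parent": "h9", "solo": None},
    {"child": ["parent"]},
)
```

In the toy data the child inherits `h9` from the parent, while `solo` has no related person to copy from and stays missing.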

Imputation was most often used to create the Life Course Dataset household structures for children aged 0-14 years, as they have a higher proportion of missing hashed ARIDs compared to the total population. For this age group, hashed ARIDs were typically imputed by copying a parent’s hashed ARID. Our quality checks indicated that this imputation was of high quality. Of the imputed hashed ARIDs for children in 2021, 91% matched the child’s Census hashed ARID.

The proportion of missing hashed ARIDs is higher at earlier time periods. The graph below shows that, for the total population, both the total proportion of missing hashed ARIDs and the proportion that remain missing after imputation increase further back in time. This means that the quality of the household structures is poorest at the start of the series and improves over time.

How the Life Course Dataset household structures compare to Census

The Life Course Dataset household structures are an experimental product. To understand their quality, the household structures were compared to 2021 Census data through both aggregate measures (such as household size) and at a person level (such as whether a person’s location in the household structures matches their location in the Census).

While the Life Course Dataset household structures show close alignment with Census households, some differences are expected due to key conceptual and technical factors. These include: 

  • Differences in population scope: The administrative household structures aim to reflect the population scope of the Estimated Resident Population, which includes usual residents of Australia even if temporarily overseas. In contrast, the Census includes overseas visitors present on Census night and excludes usual residents who are outside Australia on Census night.
  • Address reporting errors: Administrative data may contain inaccuracies due to delayed or incomplete address updates, or individuals approximating their location.
  • Linkage limitations: Comparisons rely on linkage between the PLIDA Person Linkage Spine and Census. While the linkage between the PLIDA Person Linkage Spine and the Census is high quality, errors may result in mismatches where records thought to represent the same person actually refer to different individuals.
  • Differences in how data is captured: The Census and the household structures capture living arrangements differently. For example, children in shared care may be represented in different households in the different data sources.

Aggregate comparisons to Census

In 2011, 2016 and 2021, the Life Course Dataset consistently provided a smaller estimate of households compared to the Census. This difference is small. In 2021, there were 9.6 million administrative households compared to 9.8 million households in the Census. 

While there are some differences, the distributions of household types in the 2021 Life Course Dataset households and the 2021 Census align well. Most notably, the Life Course Dataset provides a smaller estimate of the number of households with only two adults, and a larger estimate of the number of households with three or more adults.

  1. Child-only households have been excluded from the Life Course Dataset household structures composition graph and table. As a result, category totals do not sum to 100%.

The Life Course Dataset consistently provides a higher estimate of average household size compared to the Census in 2011, 2016 and 2021. The average size of households in the Life Course Dataset was 2.63 persons and 0.48 children in 2021. The average size of households according to the 2021 Census was 2.52 persons and 0.47 children.

In 2021, the Life Course Dataset households are broadly consistent with the Census. The Life Course Dataset provides a smaller estimate of the number of two-person households and provides a higher estimate of the number of three-person households and larger households.

Person level comparison to Census

To understand the alignment of the household structures and the Census at a person level, we matched persons in the 2021 Life Course Dataset population snapshot to the 2021 Census. Approximately 22 million persons could be matched between the two data sources. 

Overall, about 83% of hashed ARIDs from the Life Course Dataset household structures agreed with the Census hashed ARID.
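An agreement rate of this kind is the share of matched persons whose administrative hashed ARID equals their Census hashed ARID. The sketch below computes it overall and by group (for example, age band or state). The record layout, the function name `agreement_rates`, and the choice to skip persons with a missing value in either source are assumptions for illustration; the treatment of missing values in the published figure is not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (group label, administrative hashed ARID, Census hashed ARID)
Record = Tuple[str, Optional[str], Optional[str]]


def agreement_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Return the agreement rate per group, plus an overall rate under "all"."""
    agree: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, admin_arid, census_arid in records:
        # Assumption: compare only where both sources have an address.
        if admin_arid is None or census_arid is None:
            continue
        for key in (group, "all"):
            total[key] += 1
            agree[key] += int(admin_arid == census_arid)
    return {key: agree[key] / total[key] for key in total}


# Toy matched records, invented for illustration only.
matched = [
    ("0-14", "h1", "h1"),
    ("0-14", "h2", "h3"),
    ("30+", "h4", "h4"),
    ("30+", "h5", "h5"),
    ("20-29", None, "h6"),  # skipped: no administrative address
]
rates = agreement_rates(matched)
```

For the toy records, agreement is 0.5 for ages 0-14, 1.0 for ages 30 and over, and 0.75 overall.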

The level of agreement varied by age, with lower agreement for persons aged in their 20s relative to other ages. Agreement for children aged 0-14 years is generally lower than for people aged 30 years and over.

The level of agreement for children and all persons was broadly consistent across states and territories, with lower agreement in the Northern Territory. For all states and territories, the level of agreement was lower for children (aged 0-14 years) than for all persons.

Factors affecting interpretation

The Life Course Dataset household structures are an experimental product. There are several factors that affect their use and interpretation.

Implications from the way the household structures were created

  • The Life Course Dataset population scoping approach allows persons to move in and out of scope of successive household structure snapshots. This may occur, for example, when a person moves in and out of Australia over time. Such persons may appear in one household structure snapshot file, but not the next.
  • As the household structures are calculated as annual snapshots on 30 June, any changes in address that occur within the financial year are not captured. This means that households may appear more stable or consistent over time than is actually the case, particularly for persons who move address frequently.  
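The effect of annual snapshots on apparent stability can be shown with a small example: a person who moves away and back between two 30 June dates has the same address at both snapshots. The episode layout and the helper names `address_at` and `moves_between` are hypothetical, and the dates are invented for illustration.

```python
from datetime import date
from typing import List, Optional, Tuple

# (hashed ARID, start date, end date inclusive)
Episode = Tuple[str, date, date]


def address_at(episodes: List[Episode], on: date) -> Optional[str]:
    """Return the hashed ARID of the episode covering the given date, if any."""
    for arid, start, end in episodes:
        if start <= on <= end:
            return arid
    return None


def moves_between(episodes: List[Episode], start: date, end: date) -> int:
    """Count episodes that begin after `start` and on or before `end`."""
    return sum(1 for _, ep_start, _ in episodes if start < ep_start <= end)


# Toy address history: a move away and back within one financial year.
history = [
    ("h1", date(2018, 1, 1), date(2019, 9, 30)),
    ("h2", date(2019, 10, 1), date(2020, 2, 29)),
    ("h1", date(2020, 3, 1), date(2021, 12, 31)),
]
snapshot_2019 = address_at(history, date(2019, 6, 30))
snapshot_2020 = address_at(history, date(2020, 6, 30))
moves = moves_between(history, date(2019, 6, 30), date(2020, 6, 30))
```

Both snapshots return `h1` even though two moves occurred in between, so the snapshot series alone would show this person as not having moved.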

Data considerations

  • The household structures capture fewer households compared to the Census. One reason for this is that some people in PLIDA have missing location information. When no persons are matched to a hashed ARID because their location information is missing, the household at that address cannot be identified or counted.
  • Across the snapshot years, approximately 50,000 to 80,000 administrative households have only children (persons aged under 15 years) residing in them. Children in these households have been flagged in the Life Course Dataset household structures and their hashed ARIDs set to missing. Many of these households are unlikely to reflect true household arrangements and likely arise from incorrect information about address changes available in administrative data sources.
  • The household structures contain households that have matched to a large number of persons (7 persons and over). The Life Course Dataset household structures contain many more large households than the 2021 Census (in 2021, about 100,000 administrative households compared to 45,000 Census households). Incorrect information about address changes available in administrative data sources likely contributes to this effect, leading to multiple families incorrectly being attributed the same address for a period of time. 
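The two data treatments described above can be sketched as follows: flagging children at addresses with no person aged 15 or over and setting their hashed ARID to missing, and counting households with 7 or more persons. The tuple layout and function names are assumptions for this illustration, and the data are invented.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

MIN_ADULT_AGE = 15
LARGE_HOUSEHOLD_MIN = 7

# (person id, age, hashed ARID or None if missing)
Person = Tuple[str, int, Optional[str]]


def flag_child_only(persons: List[Person]) -> Tuple[List[Person], Set[str]]:
    """Flag persons at child-only addresses and set their hashed ARID to missing."""
    addresses_with_adult = {
        arid for _, age, arid in persons if arid is not None and age >= MIN_ADULT_AGE
    }
    updated: List[Person] = []
    flagged: Set[str] = set()
    for person_id, age, arid in persons:
        if arid is not None and arid not in addresses_with_adult:
            flagged.add(person_id)
            arid = None  # address treated as missing
        updated.append((person_id, age, arid))
    return updated, flagged


def count_large_households(persons: List[Person]) -> int:
    """Count addresses matched to 7 or more persons."""
    sizes = Counter(arid for _, _, arid in persons if arid is not None)
    return sum(1 for size in sizes.values() if size >= LARGE_HOUSEHOLD_MIN)


# Toy data, invented for illustration only.
toy_persons: List[Person] = [("c1", 5, "x"), ("c2", 9, "x"), ("a1", 40, "y")]
toy_persons += [(f"m{i}", 30, "z") for i in range(7)]
cleaned, flagged_ids = flag_child_only(toy_persons)
n_large = count_large_households(cleaned)
```

In the toy data, the two children at address `x` are flagged and lose their address, and address `z` is the single large household.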

Varying agreement between administrative and Census location

  • Compared to adults, children typically have lower rates of agreement between their administrative location and their Census location. This is likely due to the nature of administrative processes, which often involve direct engagement with and information collection from adults and parents.
  • Persons aged in their 20s have lower rates of agreement between their administrative location and their Census location compared to other adults. For adults, agreement rates generally increase with age. This is expected because younger persons typically have fewer interactions with government services, and location typically becomes more stable as people get older. 