Scoping the population in the Life Course Dataset

Methodology for scoping the population in the Life Course Dataset

Released: 09/12/2025 11:30am AEDT

What is the Life Course Dataset population?

The Life Course Dataset population is created from administrative data sources using the Person Level Integrated Data Asset (PLIDA) and represents a series of snapshots of people in Australia at 30 June for each year from 2006 to 2021. This population has been developed as part of the pilot Life Course Data Initiative (LCDI), which aims to build an evidence base to support long-term policy responses that address disadvantage, particularly among children and their families.

The methodology for scoping the Life Course Dataset population builds on the approach developed for the Administrative Data Snapshot of Population, which was used to produce experimental population data as at 30 June 2021. This approach has been extended to produce a population snapshot at 30 June of each year from 2006 to 2021, with a particular focus on scoping the child population aged 0-14 years. This reflects the LCDI’s focus on developing data to address evidence gaps in the early years of life to better understand and respond to disadvantage.

This information paper outlines the methodological approach used to scope the Life Course Dataset population. It includes:

  • Key findings about the child population of the Life Course Dataset.
  • An overview of how the population is created from administrative data sources.
  • A comparison between the Life Course Dataset population and the official Estimated Resident Population, highlighting differences in scope and coverage to support appropriate interpretation of the data.

Data included in this release are not official statistics. They provide experimental information about people recorded in administrative data, with a particular focus on children aged 0-14 years. This release explains the methodology used to create the population of the Life Course Dataset, which will serve as the basis for future releases from the Life Course Data Initiative, including the experimental child-level indicator.

What is administrative data?

Children in the Life Course Dataset

  • At 30 June 2021, the Life Course Dataset population includes an estimated 4.8 million children aged 0-14 years living in Australia.
  • The number of children in the Life Course Dataset population has increased steadily from 2006 to 2021, growing at an average annual rate of 1.2%, compared to 1.6% for the total population.
  • The number of children as a proportion of the entire population has decreased over this period, from 19.6% in 2006 to 18.5% in 2021.
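As a rough consistency check on the figures above, the change in the child share can be approximated by compounding the two average annual growth rates over the 15 years from 2006 to 2021. The sketch below is illustrative only: it relies on the rounded rates quoted in the key findings, and the function name `project_share` is hypothetical rather than part of any ABS tooling.

```python
def project_share(start_share_pct, group_growth, total_growth, years):
    """Project a group's share of the total population after compound growth.

    Growth rates are annual fractions (0.012 means 1.2% a year).
    """
    ratio = ((1 + group_growth) / (1 + total_growth)) ** years
    return start_share_pct * ratio


# Rounded figures from the key findings: 19.6% share in 2006,
# 1.2% annual growth for children, 1.6% for the total population.
share_2021 = project_share(19.6, 0.012, 0.016, years=2021 - 2006)
print(f"Projected child share in 2021: {share_2021:.1f}%")
```

With these rounded inputs the projected share comes out at about 18.5%, in line with the reported 2021 figure. Because the inputs are rounded averages, the match should be read as approximate.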
     

Data used to create the Life Course Dataset population

The Life Course Dataset population is created using integrated administrative data from the Person Level Integrated Data Asset (PLIDA). PLIDA is a secure data asset combining information on health, education, government payments, income and taxation, employment, and population demographics to create a comprehensive picture of Australia over time.

Population scoping for the Life Course Dataset is undertaken using the Core Scoping module within PLIDA. This module enables users to determine an approximate resident population for a given point in time, based on a structured methodology developed by the ABS as part of the Administrative Data Snapshot of Population.

The Core Scoping module is created by combining information from a range of different administrative datasets and consists of three components:

  • Vitals table
  • Residence tables
  • Activity tables

Creating an ever-resident administrative population

To build the Life Course Dataset population, the first step is to create what we call the ever-resident administrative population.

This population is created from the PLIDA Person Linkage Spine, which uses data linkage methods to bring together records from the Medicare Consumer Directory (MCD), DOMINO Centrelink administrative data (DOMINO) and Personal Income Tax data (PIT).

Anyone who has appeared in at least one of these datasets since 2006 is included in the ever-resident population. The aim is to cover all people who were living in Australia at any time from 2006 onwards. At 30 June 2021, the ever-resident administrative population included around 39 million people.
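The inclusion rule above amounts to taking the union of people seen in any of the three sources from 2006 onwards. The following is a minimal sketch of that idea, not the PLIDA implementation: the records are synthetic, a shared `person_id` stands in for the linkage performed by the Person Linkage Spine, and reading "since 2006" as "on or after 1 January 2006" is an assumption.

```python
from datetime import date

# Hypothetical, synthetic records: (person_id, date the person appears in the dataset).
mcd = [("P1", date(2004, 3, 1)), ("P2", date(2010, 7, 15))]
domino = [("P2", date(2012, 1, 9)), ("P3", date(2019, 11, 30))]
pit = [("P1", date(2005, 6, 30)), ("P4", date(2006, 8, 2))]

START = date(2006, 1, 1)  # assumption: "since 2006" means on or after 1 January 2006


def ever_resident(*datasets, start=START):
    """Return IDs of people who appear in at least one dataset on or after `start`."""
    return {pid for records in datasets for pid, seen in records if seen >= start}


population = ever_resident(mcd, domino, pit)
print(sorted(population))  # P1 is excluded: only seen before 2006
```

A set is used so that a person who appears in several sources, such as P2 here, is counted once.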

Scoping the population at a point in time

To create the Life Course Dataset population, the next step is to determine which people in the ever-resident population were living in Australia at the reference date of 30 June for each year from 2006 to 2021.

This process, known as scoping, applies a series of rules to align the ever-resident population with the concept of the Estimated Resident Population (ERP). The ERP includes all people who usually live in Australia, including those temporarily overseas, and excludes short-term visitors. The image below shows the scoping process.

Scoping the PLIDA ever-resident population to the Life Course Dataset population at a point in time

The image is a flow diagram scoping the Life Course Dataset population from the ever-resident population in PLIDA. 

There are three main boxes in the diagram. The first box on the left represents the ever-resident Australian population (the PLIDA Spine population). The PLIDA Spine population includes anyone who has appeared at least once in any of the Australian Taxation Office, Centrelink or Medicare datasets from 2006 onwards.

Data from the left-hand box feeds into the centre box, which represents the rules used to determine which individuals were living in Australia on the reference date, 30 June of each year, using birth, death, migration and activity information. For each year, people are removed from the scoped population if they: died before 30 June; were born after 30 June; were not in Australia for at least 12 of the 16 months following 30 June; had no activity in the 1, 2 or 5 years (depending on age) before 30 June; or were older than 115 years on 30 June.

This scoped population data feeds into the final box on the right hand side, showing the replication of the scoping process annually to produce a population snapshot for each year from 30 June 2006 to 30 June 2021.

Scoping ensures the administrative population reflects ERP by removing individuals who are unlikely to have been residing in Australia at the reference date. These rules are applied in a specific order, as summarised in the table below for the 30 June 2021 snapshot. 

Rules used to scope the administrative data to people living in Australia at 30 June 2021
| Scoping rule | Measured by | Number of people removed |
|---|---|---|
| 1. Remove if died before the reference date | Date of death | 4.6 million |
| 2. Remove if born after the reference date | Date of birth | 0.8 million |
| 3. Remove if not in Australia for 12 of the 16 months following the reference date | Date of arrival; date of departure | 4.6 million |
| 4. Remove if no activity in the last 1, 2 or 5 years before the reference date, depending on age | Use of government services in the last 1 to 5 years* | 3.0 million |
| 5. Remove if older than 115 years on the reference date | Date of birth | 237 people |

*5 years for ages 1–24, 2 years for ages 25–79, and 1 year for ages 80 and over.
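The ordered rules in the table above can be sketched as a sequence of filters applied to each person record. This is an illustration under stated assumptions, not the Core Scoping module itself: the `Person` fields are hypothetical stand-ins, the months spent in Australia are assumed to be precomputed from arrival and departure dates, and because the footnote does not cover age 0, the sketch simply skips the activity test for infants.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Person:
    # All field names are hypothetical stand-ins for Core Scoping variables.
    pid: str
    born: date
    died: Optional[date]
    months_in_australia_next_16: int  # assumed precomputed from migration dates
    last_activity: Optional[date]     # most recent use of a government service


def age_on(ref: date, born: date) -> int:
    """Age in completed years on the reference date."""
    return ref.year - born.year - ((ref.month, ref.day) < (born.month, born.day))


def lookback_years(age: int) -> Optional[int]:
    """Activity window from the table footnote; age 0 is not covered there."""
    if age >= 80:
        return 1
    if age >= 25:
        return 2
    if age >= 1:
        return 5
    return None  # assumption: no activity test for infants in this sketch


def in_scope(p: Person, ref: date) -> bool:
    """Apply the five rules in table order; stop at the first rule that removes."""
    if p.died is not None and p.died < ref:   # rule 1: died before reference date
        return False
    if p.born > ref:                          # rule 2: born after reference date
        return False
    if p.months_in_australia_next_16 < 12:    # rule 3: fewer than 12 of next 16 months
        return False
    age = age_on(ref, p.born)
    window = lookback_years(age)
    if window is not None:                    # rule 4: age-based activity window
        cutoff = ref.replace(year=ref.year - window)
        if p.last_activity is None or p.last_activity < cutoff:
            return False
    if age > 115:                             # rule 5: implausibly old
        return False
    return True


ref = date(2021, 6, 30)
people = [  # synthetic records, one per outcome
    Person("A", date(2015, 2, 1), None, 16, date(2020, 9, 1)),   # child, kept
    Person("B", date(1950, 5, 5), date(2019, 1, 1), 0, None),    # died earlier
    Person("C", date(2021, 8, 1), None, 16, None),               # born later
    Person("D", date(1990, 1, 1), None, 3, date(2021, 1, 1)),    # mostly overseas
    Person("E", date(1980, 1, 1), None, 16, date(2017, 1, 1)),   # no recent activity
]
scoped = [p.pid for p in people if in_scope(p, ref)]
print(scoped)
```

Only person A remains in scope in this synthetic example. Producing the full series would mean repeating the same check for each 30 June from 2006 to 2021, with the migration and activity inputs recomputed for each reference date.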

Removing records for death before or birth after reference date

Accounting for overseas migration

Applying age-based activity rules

How the Life Course Dataset population compares to official population estimates

The Life Course Dataset population is an experimental product that uses integrated administrative data to provide a snapshot of Australia’s population at 30 June of each year from 2006 to 2021. It does not replace Australia’s official population estimates, the Estimated Resident Population (ERP).

The Life Course Dataset population is constructed from administrative data. Administrative data is not typically designed with statistical production in mind. Hence, there are some characteristics of the data that should be considered when interpreting the Life Course Dataset population.

To help users understand this complexity, different factors affecting the interpretation of the Life Course Dataset population are explained through comparison to ERP below.

At the national level, the Life Course Dataset population closely aligns with ERP for each year from 2006 to 2021. The largest difference occurs in the 30 June 2021 snapshot, where the Life Course Dataset population exceeds ERP by approximately 229,520 people (0.9%). This higher estimate is likely due to the inclusion of Single Touch Payroll (STP) data in 2020, which increased the volume of administrative activity recorded in PLIDA and may have inflated the number of individuals counted as in scope for the Life Course Dataset population.
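The rounded figures reported for the 30 June 2021 snapshot can be roughly reconciled with one another. The sketch below subtracts the removals listed in the scoping rules table from the ever-resident population of around 39 million, then checks the relative size of the reported gap to ERP. The implied ERP value is derived from these rounded inputs purely for illustration and is not an official figure.

```python
# Rounded figures reported for the 30 June 2021 snapshot.
ever_resident = 39_000_000  # "around 39 million"
removed = [4_600_000, 800_000, 4_600_000, 3_000_000, 237]  # scoping rules 1 to 5

scoped = ever_resident - sum(removed)

# Reported gap between the Life Course Dataset population and ERP.
gap = 229_520
implied_erp = scoped - gap  # implied by rounded inputs, not an official figure
pct_higher = 100 * gap / implied_erp

print(f"Scoped population: about {scoped / 1e6:.1f} million")
print(f"Relative difference: {pct_higher:.1f}%")
```

On these inputs the scoped population is about 26.0 million and the gap works out to roughly 0.9%, consistent with the figures quoted above to within rounding.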

The age distribution of the Life Course Dataset population closely aligns with ERP across most age groups. For the majority of ages, differences between the two measures are relatively small, usually within a few percentage points.

There are some exceptions to the close alignment between the Life Course Dataset population and ERP across age groups.

The Life Course Dataset population overestimates the number of infants aged 0 years in every year from 2006 to 2021, by approximately 11,140 infants (3.7% higher than ERP). This overestimate is due to challenges in applying age-based activity scoping rules for infants in the administrative data.

At older ages, particularly among those aged 90 years and over, the Life Course Dataset also tends to produce a higher estimate than the official population. This reflects the nature of administrative data, where individuals are often retained in scope unless a recorded date of death is available, or they are aged over 115 years, in which case they are excluded from the dataset to maintain data quality.
