Integration of the 2018 Survey of Disability, Ageing and Carers and the Person Linkage Spine

This paper describes the scope, coverage and quality of the 2018 SDAC linked dataset

Released
15/12/2021

Introduction

The Survey of Disability, Ageing and Carers (SDAC) is an Australia-wide survey which collects information about the health of people, including:

  • prevalence of disability
  • the support needs of people with disability and/or aged 65 years or older
  • the supports provided to people with disabilities and older people by informal carers
  • demographic and socioeconomic characteristics

The Multi-Agency Data Integration Project (MADIP) is a partnership among Australian Government agencies to combine information on healthcare, education, government payments, personal income tax, and the Census to create a comprehensive picture of Australia over time. Information in MADIP is combined by linking person-level datasets to a central linking infrastructure, or ‘Spine’, that serves as a base dataset representing the ‘ever-resident’ population of Australia. The Spine is made up of the Medicare Consumer Directory, Personal Income Tax data and DOMINO Centrelink Administrative Data (DOMINO CAD). More information on MADIP, including how the data is kept secure and confidential and the Privacy Impact Assessments for MADIP, is available on the ABS website.

The ABS linked the 2018 SDAC with the MADIP in 2020. This linkage offers potentially rich insights into the support needs of people with disability, older people and their carers. High-level policy issues that could be informed by this linkage include:

  • the connection between people’s disability-related support needs and their use of government services
  • the impact of disability, ageing and caring on social and economic participation
  • patterns of healthcare service use for people with different functional impairments.

This paper summarises the scope, coverage and quality of the 2018 SDAC linked dataset, and provides an overview of the linkage methodology used and the results achieved from the linkage process undertaken in 2020.

Data scope and coverage

2018 Survey of Disability, Ageing and Carers

The SDAC was conducted in all states and territories from July 2018 to March 2019. It included 54,142 people (41,580 adults and 12,562 children aged 0 to 17 years) living in 21,983 private dwellings.

Table 1: Scope of the SDAC
| In scope | Out of scope |
|---|---|
| Urban and rural areas in all states and territories | Very Remote parts of Australia |
| Permanent residents of Australia | Discrete Aboriginal and Torres Strait Islander communities |
| Overseas visitors who have been working or studying in Australia for the last 12 months or more, or intend to do so | Persons whose usual place of residence was outside Australia |
| Persons usually resident in a private dwelling | Members of non-Australian Defence forces (and their dependents) stationed in Australia |
|  | Non-private dwellings (e.g. motels, hotels, short-stay caravan parks) |
|  | Certain diplomatic personnel of overseas governments, customarily excluded from the Census and estimated resident population |
|  | Visitors to private dwellings |

 

2019 Person Linkage Spine and the MADIP

The June 2019 Spine used to link SDAC 2018 data to the MADIP consists of 35,253,568 person records and includes persons who were present in any of the following datasets:

  • Medicare Consumer Directory (MCD) during the period January 2006 to June 2019
  • Personal Income Tax (PIT) having received a payment summary and/or completed an income tax return during the 2006-07 to 2018-19 financial years; and
  • DOMINO Centrelink Administrative Data (DOMINO CAD) during the period January 2006 to June 2019.

All Spine person records were used in the linkage.

Data preparation

The variables used for linking SDAC to the Person Linkage Spine were name, address related information, date of birth and sex.

Name

First names and surnames were cleaned, standardised, and anonymised.

Cleaning includes removal of known nonsense values (e.g. ‘baby’), removal of titles (e.g. ‘Dr’) and removal of special characters (e.g. &).

Standardisation involves converting common nicknames, abbreviations, misspellings or variations on a first name to their 'origin name' (e.g. Beth, Eliza and Libby are converted to Elizabeth). The standardisation process accounts for sex or gender, e.g. ‘Alex’ will standardise to ‘Alexandra’ for a female or ‘Alexander’ for a male. Any first name that could not be standardised was retained in its original form.

Once cleaned and standardised, names are anonymised for linkage.
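The three name preparation steps can be illustrated with a minimal Python sketch. It is not the ABS implementation: the lookup tables below are tiny hypothetical stand-ins for the real cleaning and standardisation lists, and the keyed hash (HMAC-SHA256) is an assumed anonymisation method, since this paper does not describe how names are anonymised.

```python
import hashlib
import hmac
import re

# Hypothetical lookup tables; the real lists are much larger and are not
# published in this paper.
NONSENSE = {"baby"}
TITLES = {"dr", "mr", "mrs", "ms", "miss", "prof"}
NICKNAMES = {
    ("beth", "F"): "elizabeth",
    ("eliza", "F"): "elizabeth",
    ("libby", "F"): "elizabeth",
    ("alex", "F"): "alexandra",
    ("alex", "M"): "alexander",
}


def clean_name(raw: str) -> str:
    """Lower-case the name and drop special characters, titles and nonsense values."""
    text = re.sub(r"[^a-z\s'-]", " ", raw.lower())
    tokens = [t for t in text.split() if t not in TITLES and t not in NONSENSE]
    return " ".join(tokens)


def standardise_first_name(cleaned: str, sex: str) -> str:
    """Map a nickname to its origin name; keep the original form if unmapped."""
    return NICKNAMES.get((cleaned, sex), cleaned)


def anonymise(value: str, key: bytes) -> str:
    """Keyed hash of the value (an assumed stand-in for the anonymisation step)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_first_name(raw: str, sex: str, key: bytes) -> str:
    """Clean, standardise, then anonymise a first name."""
    return anonymise(standardise_first_name(clean_name(raw), sex), key)
```

Under these assumptions, `prepare_first_name("Libby", "F", key)` and `prepare_first_name("Dr Elizabeth", "F", key)` produce the same anonymised value, so the two records can still agree on first name after anonymisation.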

Address related information

The ABS Address Register provides a comprehensive list of all physical addresses in Australia and includes an Address Register ID (ARID) for each physical address. An anonymised version of ARID is used in data linkage projects. Addresses were geocoded to ARID, Mesh block, Statistical Area Level 1 (SA1), Statistical Area Level 2 (SA2), and Statistical Area Level 4 (SA4), according to the ASGS 2016 classification.

A small number of records had missing or incomplete address information and could not be geocoded to the specified level of geography.

Date of birth

Day, month and year of birth were used in the linkage. However, 9.8% of 2018 SDAC records were missing date of birth. For these records, year of birth was estimated from the person’s age on the date they completed the survey. Age was available for every record, so the estimated year of birth is accurate to within one year (as the survey was conducted over a financial year).
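The "within one year" accuracy can be seen in a short sketch. The exact estimation rule is not given in this paper, so the function names and the choice of the later candidate year as the point estimate are assumptions made for illustration.

```python
from datetime import date
from typing import Tuple


def possible_birth_years(age_years: int, survey_date: date) -> Tuple[int, int]:
    """Return the (earliest, latest) calendar years in which the person could
    have been born, given their age in whole years on the survey date."""
    latest = survey_date.year - age_years
    return (latest - 1, latest)


def estimate_birth_year(age_years: int, survey_date: date) -> int:
    """Point estimate of year of birth (assumed rule: take the later candidate)."""
    return survey_date.year - age_years


# A respondent aged 40 who completed the survey on 15 August 2018 was born
# between 16 August 1977 and 15 August 1978.
print(possible_birth_years(40, date(2018, 8, 15)))  # (1977, 1978)
print(estimate_birth_year(40, date(2018, 8, 15)))   # 1978
```

Because only two candidate years exist, a linkage rule that tolerates a one-year difference in year of birth would cover both.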

Sex

No records were missing sex; hence no data cleaning was necessary for this variable.
 

Quality of SDAC Linking Variables

A number of metrics can be used to assess dataset quality, including rates of data missing from the dataset (referred to as ‘missingness’). Missingness rates were calculated for the key linking variables of first name, surname, date of birth, address related information and sex. These rates were low except for date of birth and surname (see Table 2).

The quality of the SDAC linking variables is considered high given the generally low missingness rates observed for these variables. The high quality of address related information compensated for the lower quality of date of birth data in the linkage process.

Table 2: Missingness rates for linking variables
| Linking variable | Number of persons with missing information | Missingness rate (%) |
|---|---|---|
| Date of Birth | 5,289 | 9.8 |
| Surname | 3,377 | 6.2 |
| First Name | 1,808 | 3.4 |
| Address Register ID | 133 | 0.3 |
| Mesh Block | 73 | 0.1 |
| SA1 | 28 | 0.1 |
| SA2 or SA4 | 4 | 0.0 |
| State/Territory | 0 | 0.0 |
| Sex | 0 | 0.0 |
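A missingness rate is the number of records with no usable value for a variable, divided by the total number of records. The sketch below shows one way to compute it; the field names and the rule that `None` or an empty string counts as missing are assumptions, not the ABS definition.

```python
from typing import Dict, Iterable, List, Mapping


def missingness_rates(records: Iterable[Mapping], fields: List[str]) -> Dict[str, float]:
    """Percentage of records in which each field is None or an empty string."""
    rows = list(records)
    total = len(rows)
    rates = {}
    for field in fields:
        missing = sum(1 for r in rows if r.get(field) in (None, ""))
        rates[field] = round(100.0 * missing / total, 1) if total else 0.0
    return rates


# Toy records (hypothetical, not SDAC data).
toy = [
    {"surname": "smith", "dob": "1980-01-02"},
    {"surname": "", "dob": "1975-05-06"},
    {"surname": "jones", "dob": None},
    {"surname": "lee", "dob": None},
]
print(missingness_rates(toy, ["surname", "dob"]))  # {'surname': 25.0, 'dob': 50.0}

# The date of birth row above, using the published counts.
print(round(100 * 5289 / 54142, 1))  # 9.8
```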

Linkage methodology

The 2018 SDAC to Spine linkage was completed using deterministic linking. Deterministic linkage involves locating record pairs across the two datasets that match exactly or closely (according to pre-defined rules) on common variables. The deterministic linkage employed for the SDAC used a four-stage approach. The matching rules and criteria were gradually broadened with each stage to tolerate greater differences in a field or expand the geographic area in which a match can occur.
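The staged approach can be sketched as a series of passes, each using a broader match key than the last and considering only records not yet linked. The sketch is illustrative only: the field names are hypothetical, Stage 1 loosely follows the exact-agreement rule described under Linkage results, and the keys for Stages 2 to 4 are invented to show what broadening could look like, since the actual rules are not listed in this paper. Accepting only unique one-to-one matches is also an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Record = Dict[str, object]
KeyFn = Callable[[Record], Optional[Hashable]]


def make_key(*fields: str) -> KeyFn:
    """Build a match key from the given fields; None if any field is missing."""
    def key(rec: Record) -> Optional[Hashable]:
        values = tuple(rec.get(f) for f in fields)
        return None if any(v in (None, "") for v in values) else values
    return key


# (quality, key) pairs. Stages 2-4 are hypothetical illustrations.
STAGES: List[Tuple[int, KeyFn]] = [
    (1, make_key("first_name", "surname", "dob", "arid")),
    (2, make_key("first_name", "surname", "dob", "sa1")),
    (3, make_key("first_name", "surname", "birth_year", "sa2")),
    (4, make_key("surname", "birth_year", "sex", "sa4")),
]


def link(survey: List[Record], spine: List[Record],
         stages: List[Tuple[int, KeyFn]] = STAGES) -> Dict[object, Tuple[object, int]]:
    """Return {survey_id: (spine_id, quality)} from successive deterministic passes."""
    links: Dict[object, Tuple[object, int]] = {}
    used_spine = set()
    for quality, key in stages:
        spine_index = defaultdict(list)
        for s in spine:
            k = key(s)
            if k is not None and s["id"] not in used_spine:
                spine_index[k].append(s["id"])
        survey_index = defaultdict(list)
        for r in survey:
            k = key(r)
            if k is not None and r["id"] not in links:
                survey_index[k].append(r["id"])
        for k, survey_ids in survey_index.items():
            spine_ids = spine_index.get(k, [])
            # Accept a link only when the key is unique on both sides.
            if len(survey_ids) == 1 and len(spine_ids) == 1:
                links[survey_ids[0]] = (spine_ids[0], quality)
                used_spine.add(spine_ids[0])
    return links
```

Recording the stage at which each link was made is what allows a quality level to be attached to every link afterwards.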

Linkage results

A total of 49,431 links were achieved, giving a total linkage rate of 91.3%.

An approximate link quality measure was assigned by assuming a relationship between the linking evidence and resulting link quality. The links formed in Stage 1 were very high quality and agreed exactly on cleaned first name, cleaned surname, date of birth, and ARID or mesh block (sex was also used in some passes). The proportion of total links for Selected Persons in each stage is presented in Table 3.

Table 3: Linkage results by stage
| Stage of linking | Number of links identified | Proportion of total links (%) |
|---|---|---|
| Stage 1 (Quality 1) | 26,417 | 53.4 |
| Stage 2 (Quality 2) | 4,857 | 9.8 |
| Stage 3 (Quality 3) | 12,899 | 26.1 |
| Stage 4 (Quality 4) | 5,258 | 10.6 |
| Total | 49,431 | 100.0 |

Quality measures were applied to the file at the completion of the linkage.

Quality 1 and 2 links are very good quality and can be included with confidence in most analyses. About 63% of all links (31,274 of 49,431) were assigned Quality 1 or Quality 2. Quality 3 links are good quality and can be used in aggregate analyses, though they should be used with caution for small population groups. Quality 4 links are lower quality and should be used with caution.
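The headline figures can be reproduced from the published counts. The snippet below is a simple arithmetic check using the Table 3 stage counts and the SDAC sample size; the variable names are illustrative.

```python
SURVEY_RECORDS = 54_142
LINKS_BY_QUALITY = {1: 26_417, 2: 4_857, 3: 12_899, 4: 5_258}

total_links = sum(LINKS_BY_QUALITY.values())
linkage_rate = 100 * total_links / SURVEY_RECORDS
shares = {q: 100 * n / total_links for q, n in LINKS_BY_QUALITY.items()}
q1_q2_share = shares[1] + shares[2]

print(total_links)             # 49431
print(round(linkage_rate, 1))  # 91.3
print(round(q1_q2_share, 1))   # 63.3
```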

In addition, analysts can perform sensitivity tests to understand the impact of including or excluding lower quality links in a specific analysis. Linkage rates by Sex, Age and State/Territory are presented in Tables 4A, 4B and 4C. High linkage rates were achieved across most demographics, with lower rates achieved for younger people and those living in the Northern Territory.

Table 4A: Linkage rate by Sex
| Sex | Total records | Linked records | Linkage rate (%) |
|---|---|---|---|
| Male | 26,473 | 24,068 | 90.9 |
| Female | 27,669 | 25,363 | 91.7 |
| Total | 54,142 | 49,431 | 91.3 |

Table 4B: Linkage rate by Age
| Age group | Total records | Linked records | Linkage rate (%) |
|---|---|---|---|
| Under 15 years | 10,507 | 9,471 | 90.1 |
| 15-24 years | 6,082 | 5,426 | 89.2 |
| 25-34 years | 7,017 | 6,087 | 86.8 |
| 35-44 years | 7,152 | 6,562 | 91.8 |
| 45-54 years | 7,107 | 6,542 | 92.1 |
| 55-64 years | 6,722 | 6,276 | 93.4 |
| 65-74 years | 5,517 | 5,179 | 93.9 |
| 75-84 years | 3,016 | 2,907 | 96.4 |
| 85 years and over | 1,022 | 981 | 96.0 |
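
Table 4C: Linkage rate by State or Territory
| State or Territory | Total records | Linked records | Linkage rate (%) |
|---|---|---|---|
| NSW | 15,073 | 13,608 | 90.3 |
| VIC | 13,417 | 12,338 | 92.0 |
| QLD | 10,350 | 9,335 | 90.2 |
| SA | 1,967 | 1,809 | 92.0 |
| WA | 9,469 | 8,886 | 93.8 |
| TAS | 1,695 | 1,551 | 91.5 |
| NT | 537 | 398 | 74.1 |
| ACT | 1,634 | 1,506 | 92.2 |
| Total | 54,142 | 49,431 | 91.3 |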

Linkage rates varied across subpopulations within the data. Consideration of the quality flag can be useful in evaluating the suitability of 2018 SDAC-MADIP linked data, especially for analyses of subpopulations. 
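One way to use the quality flag is to repeat an estimate while progressively admitting lower quality links and compare the results. The sketch below assumes each linked record carries a hypothetical `link_quality` field (1 to 4), a survey `weight`, and a numeric analysis variable; none of these names come from the actual dataset.

```python
from typing import Dict, List, Mapping, Sequence


def weighted_mean(rows: List[Mapping], value: str, weight: str = "weight") -> float:
    """Weighted mean of `value` over the given rows."""
    total_weight = sum(r[weight] for r in rows)
    return sum(r[value] * r[weight] for r in rows) / total_weight


def sensitivity_by_quality(rows: List[Mapping], value: str,
                           thresholds: Sequence[int] = (2, 3, 4)) -> Dict[int, float]:
    """Weighted mean of `value` using only links with quality <= each threshold."""
    results = {}
    for q in thresholds:
        subset = [r for r in rows if r["link_quality"] <= q]
        if subset:
            results[q] = weighted_mean(subset, value)
    return results


# Toy linked records (hypothetical values).
toy_links = [
    {"link_quality": 1, "weight": 1.0, "payments": 10.0},
    {"link_quality": 3, "weight": 1.0, "payments": 20.0},
    {"link_quality": 4, "weight": 2.0, "payments": 30.0},
]
print(sensitivity_by_quality(toy_links, "payments"))  # {2: 10.0, 3: 15.0, 4: 22.5}
```

If the estimate moves materially as lower quality links are added, the analysis is sensitive to link quality and the lower quality links warrant closer scrutiny for that subpopulation.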

People with no educational attainment and those who did not continue after Year 12 were the least likely to have been linked. Table 5 presents the linkage outcomes by highest level of educational attainment.

Table 5: Linkage rate by Highest level of educational attainment
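| Highest level of educational attainment | Total records | Linked records | Linkage rate (%) |
|---|---|---|---|
| Bachelor degree or higher | 11,996 | 10,979 | 91.5 |
| Advanced Diploma/ Certificate III/IV | 12,840 | 11,572 | 92.7 |
| Year 12 | 6,298 | 5,595 | 88.8 |
| Year 11 | 2,118 | 1,953 | 92.2 |
| Year 10, Cert I/II | 4,955 | 4,508 | 91.0 |
| Year 9 | 1,955 | 1,785 | 91.3 |
| Year 8 or below | 1,850 | 1,715 | 92.7 |
| No educational attainment | 153 | 127 | 83.0 |
| Level not determined | 1,610 | 1,506 | 93.5 |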

Linkage rates also varied depending on country of birth. People born in England had a linkage rate of 95.2% while people born in China had a linkage rate of 77.7%. Table 6 presents the linkage outcomes by country of birth.

Table 6: Linkage rate by Country of birth
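| Country of birth | Total records | Linked records | Linkage rate (%) |
|---|---|---|---|
| Australia | 39,013 | 35,919 | 92.1 |
| England | 2,400 | 2,284 | 95.2 |
| India | 1,315 | 1,121 | 85.3 |
| New Zealand | 1,211 | 1,117 | 92.2 |
| China (exc. SARs and Taiwan) | 925 | 719 | 77.7 |
| Philippines | 613 | 545 | 88.9 |
| South Africa | 513 | 477 | 93.0 |
| Vietnam | 397 | 336 | 84.6 |
| Italy | 366 | 339 | 92.6 |
| Other | 7,389 | 6,574 | 89.0 |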

People with disabilities, older people and carers were all linked at a higher rate than those who did not fall into these groups. There were also differences between subgroups. For example, those with sensory and speech disabilities had a higher linkage rate than those with psychosocial disabilities. Tables 7A and 7B provide linkage rates by Disability status and Disability group.

Table 7A: Linkage rate by Disability status
| Disability status | Total records | Linked records | Linkage rate (%) | Quality 1 and 2 Linkage rate (%) |
|---|---|---|---|---|
| Profound core activity limitation | 1,402 | 1,319 | 94.1 | 67.3 |
| Severe core activity limitation | 1,437 | 1,368 | 95.2 | 63.1 |
| Moderate core activity limitation | 1,457 | 1,391 | 95.5 | 66.3 |
| Mild core activity limitation | 3,609 | 3,431 | 95.1 | 63.6 |
| Employment or schooling restriction only | 780 | 729 | 93.5 | 61.2 |
| No specific limitation | 1,179 | 1,124 | 95.3 | 63.4 |
| Long-term health conditions with no disability | 12,531 | 11,807 | 94.2 | 62.0 |
| No long-term health conditions or disability | 31,747 | 28,262 | 89.0 | 54.4 |

Table 7B: Linkage rate by Disability group
| Disability group | Total records | Linked records | Linkage rate (%) | Quality 1 and 2 Linkage rate (%) |
|---|---|---|---|---|
| Sensory or Speech | 3,417 | 3,246 | 95.0 | 63.7 |
| Intellectual | 1,432 | 1,350 | 94.7 | 59.4 |
| Physical restriction | 6,275 | 5,962 | 95.0 | 63.1 |
| Psychosocial | 2,220 | 2,082 | 93.8 | 60.9 |

Table 8 shows similar differences between primary carers, other carers and people who are not providing informal care.

Table 8: Linkage rate by Carer status
| Carer status | Total records | Linked records | Linkage rate (%) | Quality 1 and 2 Linkage rate (%) |
|---|---|---|---|---|
| Primary Carer | 1,965 | 1,889 | 96.1 | 65.0 |
| Other carer | 3,978 | 3,768 | 94.7 | 63.9 |
| Not a carer | 48,199 | 43,774 | 90.8 | 57.0 |

Using integrated data

The ABS provides access to de-identified microdata for authorised researchers through its DataLab. The DataLab is designed for high-end users to undertake complex analysis of microdata. The ABS manages access to integrated data by using the Five Safes Framework – an internationally recognised approach to managing disclosure risk. 

The Responsible Use of Microdata Guide outlines the process for, and limitations of, using microdata.

Access to the integrated data is possible where the research need is for data broader than that collected in the 2018 SDAC. As an example, to gain insights into the health of people receiving Government benefits, data items about the support needs of people from the 2018 SDAC could be linked with payment information from DOMINO Centrelink Administrative Data.

Where data requirements can be satisfied by the 2018 SDAC alone, integrated data will not be approved for use. Detailed microdata from the 2018 SDAC is available for analysis in several forms.

Given the high linkage rates and level of precision achieved, the linked data is considered highly suitable for research and analysis purposes to inform the development of disability, ageing and carer related policies and evidence-based decisions. As such, the original survey weights calculated for the SDAC can be used to give a good approximation for population estimates.

When the SDAC sample is weighted to the total population, no compensation is made for Indigenous status to correct for sampling bias that may occur in Aboriginal and Torres Strait Islander population estimates. In addition, very remote areas and Discrete Aboriginal and Torres Strait Islander communities were outside the survey scope. This means that use of the analytical dataset for Aboriginal and Torres Strait Islander population research may not be appropriate and would require careful consideration.

No alternative survey weights have been created to address non-linkage. As a result, population estimates produced from the linked file will be lower than those produced from the unlinked SDAC file.
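The size of this shortfall can be gauged by comparing the sum of the original survey weights over all records with the sum over linked records only. The sketch below uses toy weights and hypothetical field names (`weight`, `linked`); it does not adjust the weights in any way.

```python
from typing import Iterable, Mapping, Tuple


def weighted_total(rows: Iterable[Mapping], weight: str = "weight") -> float:
    """Population estimate: the sum of survey weights over the given rows."""
    return sum(r[weight] for r in rows)


def linked_shortfall(rows: Iterable[Mapping]) -> Tuple[float, float, float]:
    """Return (full estimate, linked-only estimate, shortfall as % of full)."""
    rows = list(rows)
    full = weighted_total(rows)
    linked = weighted_total(r for r in rows if r["linked"])
    return full, linked, 100 * (full - linked) / full


# Toy records (hypothetical weights, not SDAC data).
toy_sample = [
    {"weight": 100.0, "linked": True},
    {"weight": 150.0, "linked": True},
    {"weight": 250.0, "linked": False},
    {"weight": 500.0, "linked": True},
]
print(linked_shortfall(toy_sample))  # (1000.0, 750.0, 25.0)
```

Totals are affected directly by unlinked records, whereas proportions are affected only to the extent that linkage rates differ between the groups being compared.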