1700.0 - Microdata: Multi-Agency Data Integration Project, Australia Quality Declaration 
Latest issue released at 11:30 AM (Canberra time) 26/03/2019. First issue.

QUALITY DECLARATION

INSTITUTIONAL ENVIRONMENT

General information about the institutional environment of the ABS, including the legislative obligations, financing and governance arrangements, and mechanisms for scrutiny of operations, can be found here: ABS Institutional Environment.

Information about the ABS and its role as an Accredited Integrating Authority under the Commonwealth Data Integration Interim Arrangements can be found here: ABS Integrating Authority Accreditation.

All MADIP microdata products (e.g. the Basic Longitudinal Extract (2011) and Basic Longitudinal Extract (2011-2016 Cohorts)) are separate extracts from the MADIP integrated data asset. Information about the project is available here: Multi-Agency Data Integration Project.

MADIP microdata products are released in the secure ABS DataLab in accordance with the conditions specified in the Census and Statistics (Information Access and Release) Determination 2018, made under the Census and Statistics Act 1905. This ensures that confidentiality is maintained while enabling microdata to be released.

MADIP microdata products are also made available under trial access arrangements via the Protari application programming interface (API) under Section 12 of the Census and Statistics Act 1905. This enables confidentialised aggregate tables to be generated from underlying unit record data.

RELEVANCE

The MADIP is a partnership among six Australian Government agencies to combine longitudinal information on healthcare, education, government payments and personal income tax with population demographics to create a comprehensive social picture of Australia.

MADIP microdata products can be used to study how socioeconomic characteristics predict government service use, and how that use changes over time. These datasets also allow for analysis of changes in social, health and economic outcomes for sub-populations such as Aboriginal and Torres Strait Islander peoples, young people, older Australians, welfare recipients, and regional communities.

2011 Cohort microdata product

The Basic Longitudinal Extract (2011) contains key demographic, social, healthcare, government payment and income information spanning 2011-2016 for the Australian population in 2011. This information is sourced from:

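  • the Medicare Enrolments Database (MEDB) and Medicare Benefits Schedule (MBS) data;
  • Personal Income Tax (PIT) datasets;
  • Social Security and Related Information (SSRI) data; and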
  • the Census 2011 data.

The scope and coverage of the dataset are described in detail on the Basic Longitudinal Extract (2011) Methodology page.

This dataset was created to allow for analysis of Medicare use, social security payments, and personal income tax, along with rich socio-demographic information from the Census 2011.

2011-2016 Cohorts microdata product

The Basic Longitudinal Extract (2011-2016 Cohorts) contains key demographic, social, healthcare, education, government payment and income information spanning 2011-2016 for the resident Australian population between 1 January 2011 and 31 December 2016. This information is sourced from:

  • the MEDB, MBS and Pharmaceutical Benefits Scheme (PBS) data;
  • SSRI data;
  • PIT datasets;
  • Apprentices and Trainees (AT) data; and
  • Census 2016 data.

The scope and coverage of the dataset are described in detail on the Basic Longitudinal Extract (2011-2016 Cohorts) Methodology page.

This dataset was created to allow for analysis of medical services, pharmaceutical prescriptions, social security payments, personal income tax, apprentices and trainees, along with rich socio-demographic information from the Census 2016.

TIMELINESS

The first MADIP linkage took place in early 2016 to test the feasibility of linking multiple government datasets.

An independent Privacy Impact Assessment of the MADIP was undertaken from mid-2017 to ensure that all privacy risks were identified and mitigated before the project became fully operational and was expanded longitudinally under the Data Integration Partnership for Australia (DIPA) in mid-2018. The final report was published in April 2018, along with a response from the MADIP partner agencies; both documents can be found on the ABS Privacy Impact Assessments page.

2011 Cohort microdata product

The Basic Longitudinal Extract (2011) microdata product, released in June 2018, relates to the Australian population in 2011 and includes information for this population covering the period 2011 to 2016. The dataset is suitable for analysing trends and transitions for the 2011 Australian population over the six-year period from 2011 to 2016. The specific time periods to which the different sources of information in the product pertain are defined in Basic Longitudinal Extract (2011) Methodology (Scope and Coverage). The information included from the MEDB, MBS, PIT and SSRI source datasets was the most recent available at the time of the longitudinal expansion of MADIP. (Note that Census 2016 information is not available in the Basic Longitudinal Extract (2011) because it had not been linked to the MEDB spine when the product was developed.)

2011-2016 Cohorts microdata product

The Basic Longitudinal Extract (2011-2016 Cohorts) microdata product, released in March 2019, relates to the resident Australian population between 1 January 2011 and 31 December 2016 and includes information for this population covering the period 2011 to 2016. The dataset is suitable for analysing trends and transitions over the period 2011 to 2016 for the Australian population of any year within that period. The specific time periods to which the different sources of information in the product pertain are defined in Basic Longitudinal Extract (2011-2016 Cohorts) Methodology (Scope and Coverage). The information included from the MEDB, MBS, PBS, SSRI, PIT and AT source datasets was the most recent available when the first iteration of the Person Linkage Spine, on which the microdata product is built, was created.

ACCURACY

All reasonable attempts have been made to ensure the accuracy of the MADIP microdata products. However, the following limitations should be considered when interpreting analytical results from the Basic Longitudinal Extracts:
  • Differences in the scope of the source datasets - each administrative dataset has a different eligible population;
  • Differences in the purpose of collection - the MEDB, MBS, PBS, SSRI, PIT and AT source datasets contain data collected by Australian Government agencies for administrative purposes; only the Census 2011 and 2016 data were collected for statistical purposes;
  • Differences in the way similar concepts are measured - for example, income information collected through tax returns may be defined differently from income collected in the Census, and the questions, form types, and timing of data collection are different. Users are encouraged to visit the relevant Australian Government agency websites for information about the data items (see Interpretability below);
  • Missed links and missing information – as described in Methodology (Scope and Coverage) and Methodology (Sources of Error) for each dataset;
  • Possible undercoverage and overcoverage – as described in Methodology (Scope and Coverage) for each dataset;
  • Linking error, reporting error and processing error – as described in Methodology (Sources of Error) for each dataset;
  • No additional editing, cleaning or imputation was conducted on each dataset over and above what was conducted on the source datasets by the data custodian agencies.

2011 Cohort microdata product

As detailed in the Basic Longitudinal Extract (2011) Methodology (Linking Methodology), records pertaining to individuals in each of the source datasets, namely MEDB, MBS, PIT, SSRI and Census 2011, have been linked using statistical methods and techniques developed by the ABS.

The accuracy of the linkage between MEDB and PIT records, and between MEDB and SSRI records, is considered to be high given that name, sex, geocoded address and date of birth information was available on these datasets. These two separate, independent linkage exercises achieved high linkage rates of 93.4% and 94.6% respectively.

Name information was not available in the Census 2011 data (names reported by Census 2011 respondents were destroyed at the end of the Census 2011 processing period). Instead, sex, geocoded address, date of birth, age and other information common to Census 2011, MEDB and SSRI was used to link Census 2011 records to their corresponding MEDB records. Using SSRI information meant that the already linked MEDB and SSRI records could be drawn on to improve the linkage of Census 2011 records to MEDB records. The linkage rate achieved was 66.5%.
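The effect of losing the name field can be illustrated with the textbook Fellegi-Sunter approach to probabilistic linkage, in which each compared field adds a log-likelihood weight to a candidate record pair. The sketch below is a generic illustration only and is not the ABS linking methodology; the m- and u-probabilities in `FIELD_PROBS`, the `match_weight` function and the example records are all hypothetical.

```python
import math

# Hypothetical (m, u) probabilities per field, for illustration only:
#   m = P(field agrees | the two records belong to the same person)
#   u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "name": (0.95, 0.001),
    "sex": (0.98, 0.5),
    "date_of_birth": (0.97, 0.0001),
    "geocoded_address": (0.90, 0.01),
}


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum Fellegi-Sunter log2 weights over fields present on both records."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # field missing on one side: it contributes no evidence
        if a == b:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


medb = {"name": "JANE CITIZEN", "sex": "F",
        "date_of_birth": "1980-05-01", "geocoded_address": "ADDR_0001"}
census = {"sex": "F", "date_of_birth": "1980-05-01",
          "geocoded_address": "ADDR_0001"}  # hypothetical record with no name field

print(round(match_weight(medb, medb), 2))    # true match, name available
print(round(match_weight(medb, census), 2))  # same person, name unavailable
```

Under these assumed probabilities, a true match scored without names receives a lower total weight, so fewer genuine pairs would clear any given acceptance threshold. This is one plausible contributor to a lower linkage rate when names are unavailable.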

The ABS continually improves the statistical linkage methods it utilises and welcomes feedback from authorised users of the Basic Longitudinal Extract (2011) microdata product to feed into this continuous improvement process. In particular, forthcoming products are expected to have a higher linkage rate for Census records.

2011-2016 Cohorts microdata product

As detailed in the Basic Longitudinal Extract (2011-2016 Cohorts) Methodology (Linking Methodology), records pertaining to individuals in each of the source datasets, namely MEDB, SSRI, PIT, AT and Census 2016, have been linked using statistical methods and techniques developed by the ABS.

The Person Linkage Spine was created by combining MEDB, SSRI and PIT. The MEDB-to-SSRI and PIT-to-MEDB-SSRI linkages are considered to be highly accurate and achieved high linkage rates of 97.2% and 92.0% respectively. The separate Census 2016-to-Spine and AT-to-Spine linkage exercises were also considered to be highly accurate, achieving linkage rates of 92.1% and 94.3% respectively.

The ABS continually improves the statistical linkage methods it utilises and welcomes feedback from authorised users of the Basic Longitudinal Extract (2011-2016 Cohorts) microdata product to feed into this continuous improvement process.

COHERENCE

The data included in the Basic Longitudinal Extracts is used for many purposes. The administrative datasets (MEDB, MBS, PBS, SSRI, PIT and AT) are all used for reporting purposes by Australian Government agencies, and the Census information is analysed by many users.

Estimates derived from MADIP microdata products may differ from those derived from MADIP source datasets or other similar sources. This is due to a range of factors, including:
  • The Basic Longitudinal Extracts are linked datasets whose scope is specific to the linkage methodology used to create the microdata products. The ABS is continuously improving its data linking methodologies, so similar linked microdata products may comprise different links;
  • Variability in population scope and reference periods between source datasets – as described in Methodology (Sources of Error) for each dataset; and
  • Differences in the purposes of the source datasets and the way similar concepts are measured – as described in Methodology (Sources of Error) for each dataset and Accuracy (above).

A number of data items in the Basic Longitudinal Extracts have been derived from several of the source datasets. For example, geographical information in the MADIP microdata products draws on all of the source datasets to minimise the number of records with missing information. This means the geography data items in the microdata products will not be directly comparable to those in any single source dataset. The data item lists for each Basic Longitudinal Extract, available in the Downloads tab, identify the source dataset of each data item and flag those that have been derived from multiple source datasets.
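Deriving one data item from several sources to reduce missingness can be sketched as a priority-ordered coalesce: take the value from the most preferred source that has one. The priority order in `SOURCE_PRIORITY`, the source labels, the area codes and the `derive_geography` function are hypothetical; the actual derivation rules for the MADIP geography items are not described here.

```python
from typing import Optional

# Hypothetical priority order: earlier sources are preferred when they have a value.
SOURCE_PRIORITY = ["census", "ssri", "pit", "medb"]


def derive_geography(values_by_source: dict) -> tuple:
    """Return (value, source) from the first source, in priority order, with a
    non-missing value; return (None, None) if every source is missing."""
    for source in SOURCE_PRIORITY:
        value: Optional[str] = values_by_source.get(source)
        if value:  # treats both None and "" as missing
            return value, source
    return None, None


print(derive_geography({"census": None, "ssri": "", "pit": "AREA_B", "medb": "AREA_C"}))
# ('AREA_B', 'pit')
```

Keeping the contributing source alongside the value makes it easier to see why a derived item may not match any single source dataset.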

A small percentage of linked records have inconsistent data, such as a different Indigenous status reported in the Census and in the SSRI data, or a difference in income between the Census and PIT data. This can be due to errors in the data, or to the source datasets covering slightly different reference periods during which information may have changed legitimately to reflect real changes in a person’s characteristics or circumstances.
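One way to gauge how often two linked sources disagree on the same concept is to compute the share of records, among those where both values are present, whose values differ. The field names, example records and the `inconsistency_rate` function below are hypothetical; for numeric items a tolerance can be supplied through the `differs` argument, and any threshold chosen is an analyst's assumption rather than an ABS rule.

```python
from typing import Any, Callable, Iterable


def inconsistency_rate(
    records: Iterable[dict],
    field_a: str,
    field_b: str,
    differs: Callable[[Any, Any], bool] = lambda a, b: a != b,
) -> float:
    """Share of records, among those with both fields present, where the
    two source values differ according to `differs`."""
    compared = 0
    inconsistent = 0
    for record in records:
        a, b = record.get(field_a), record.get(field_b)
        if a is None or b is None:
            continue  # cannot compare when either source is missing
        compared += 1
        if differs(a, b):
            inconsistent += 1
    return inconsistent / compared if compared else 0.0


linked = [
    {"census_indigenous": "Yes", "ssri_indigenous": "Yes"},
    {"census_indigenous": "No", "ssri_indigenous": "Yes"},
    {"census_indigenous": None, "ssri_indigenous": "No"},  # excluded: Census value missing
    {"census_indigenous": "No", "ssri_indigenous": "No"},
]
print(inconsistency_rate(linked, "census_indigenous", "ssri_indigenous"))
```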

Several data items in the Basic Longitudinal Extracts have been summarised from transaction and event-based data in order to provide information at a person level. Namely:
  • MBS data items in the microdata products have been derived from transaction-level MBS data, which contains a record for each service and/or claim a person uses and/or makes;
  • PBS data items in the microdata products have been derived from transaction-level PBS data, which contains a record for each prescription and/or claim a person uses and/or makes; and
  • SSRI data items in the microdata products have been derived from event-based SSRI data, which contains a record whenever there is a change in the circumstances of a welfare benefit recipient. Examples of the type of changes in circumstances that would generate a new record are a change in dollar value in a welfare benefit payment, the commencement or ceasing of a welfare benefit payment, and a change in a person’s residential address.
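Summarising transaction-level records into person-level items amounts to grouping by person and period and then aggregating. A minimal sketch, assuming hypothetical claim tuples of (person identifier, service date, benefit paid); the actual MBS and PBS fields and summary items are those defined in each product's data item list, and the amounts below are invented.

```python
from collections import defaultdict
from datetime import date


def summarise_claims(claims):
    """Collapse (person_id, service_date, benefit_paid) claim records into one
    summary per person per calendar year: service count and total benefit."""
    summary = defaultdict(lambda: {"services": 0, "benefit_paid": 0.0})
    for person_id, service_date, benefit_paid in claims:
        cell = summary[(person_id, service_date.year)]
        cell["services"] += 1
        cell["benefit_paid"] += benefit_paid
    return dict(summary)


claims = [
    ("P1", date(2012, 3, 1), 38.75),
    ("P1", date(2012, 7, 9), 38.75),
    ("P1", date(2013, 1, 2), 75.00),
    ("P2", date(2012, 5, 5), 20.00),
]
print(summarise_claims(claims)[("P1", 2012)])  # {'services': 2, 'benefit_paid': 77.5}
```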

The data custodian agencies collect the MBS, PBS and SSRI data in this manner as part of administering Medicare services, pharmaceutical benefits and welfare benefit payments to Australians. The microdata products include data items that summarise this detailed information (for example, whether a person was receiving a welfare benefit payment on 9 August of each year 2011-2016 in the Basic Longitudinal Extract (2011), or whether a person received a welfare benefit payment at any time during the calendar year for each year 2011-2016 in the Basic Longitudinal Extract (2011-2016 Cohorts)).
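The two kinds of summary in the examples above, a point-in-time status on 9 August and an "at any time during the calendar year" status, can both be derived from event-based records. A minimal sketch, assuming each hypothetical event is an (event date, payment amount) pair for one person, where the amount applies from that date until the next event and an amount of zero means the payment has ceased; this is not the actual SSRI record layout, and `receiving_on` and `received_during_year` are illustrative names.

```python
from datetime import date


def receiving_on(events, on_date):
    """True if the latest event on or before on_date has a positive amount.
    Events on the same date are applied in input order (sorted() is stable)."""
    receiving = False
    for event_date, amount in sorted(events, key=lambda e: e[0]):
        if event_date > on_date:
            break
        receiving = amount > 0
    return receiving


def received_during_year(events, year):
    """True if a payment was in effect at any time in the calendar year:
    either already in effect on 1 January, or set to a positive amount by
    an event dated within the year."""
    start, end = date(year, 1, 1), date(year, 12, 31)
    if receiving_on(events, start):
        return True
    return any(start <= d <= end and amount > 0 for d, amount in events)


# Hypothetical person: payment starts 1 March 2012 and ceases 30 June 2012.
events = [(date(2012, 3, 1), 250.0), (date(2012, 6, 30), 0.0)]
print(receiving_on(events, date(2012, 8, 9)))  # False: ceased before 9 August
print(received_during_year(events, 2012))      # True: paid March to June
```

As the example shows, the same person can be "not receiving" on the 9 August snapshot yet "received at some time" in the same year, so the payment items in the two products are not interchangeable.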

The linkage rate of Census 2011 to MEDB in the Basic Longitudinal Extract (2011) is lower than the linkage rates between the administrative datasets included in the microdata product. This may affect comparisons with other Census 2011 statistics and analysis. A weight has been provided in the Basic Longitudinal Extract (2011) for all linked Census 2011 records to weight those records up to the Census 2011 population. This weight allows analytical results produced from this microdata product to be more comparable with other statistics derived directly from the Census 2011 data. Note that records in this microdata product that were not linked to a Census 2011 record have a weight of zero.
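Because records not linked to a Census 2011 record carry a weight of zero, they drop out of any weighted Census-based estimate automatically. A minimal sketch of a weighted total and a weighted proportion, assuming hypothetical record dictionaries with a `weight` field and a `receives_payment` flag; the actual weight variable name in the product is not given here, and the weights shown are invented.

```python
def weighted_total(records, indicator):
    """Estimated population count: sum of weights of records meeting the indicator."""
    return sum(r["weight"] for r in records if indicator(r))


def weighted_proportion(records, indicator):
    """Weighted share of the population meeting the indicator.
    Zero-weight (unlinked) records add nothing to numerator or denominator."""
    denominator = sum(r["weight"] for r in records)
    if denominator == 0:
        raise ValueError("no records with a positive weight")
    return weighted_total(records, indicator) / denominator


records = [
    {"weight": 1.5, "receives_payment": True},
    {"weight": 1.5, "receives_payment": False},
    {"weight": 0.0, "receives_payment": True},  # not linked to Census 2011
]
print(weighted_proportion(records, lambda r: r["receives_payment"]))  # 0.5
```

An unweighted count over the same records would include the unlinked record and give 2/3 rather than 0.5, which is why Census-based estimates should use the weight.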

INTERPRETABILITY

This publication should be referred to when using the Basic Longitudinal Extracts. It contains information on the methodology used to create the microdata products, the data items included in the microdata products, using the microdata products in the ABS DataLab or accessing the Protari API under trial arrangements, and conditions of use.

Authorised users of the Basic Longitudinal Extracts are encouraged to visit the relevant Australian Government agency website for further information about the data included in the microdata products – namely:

ACCESSIBILITY

Researchers who are affiliated with Australian Government or academic research organisations can apply for access to use MADIP microdata in DataLab for in-depth analysis using a range of statistical software packages.

Information about the DataLab can be found on the Using the DataLab and About the DataLab pages.

To find out how to apply for access to MADIP microdata in the DataLab, contact dipa@abs.gov.au.

The ABS is trialling access to MADIP microdata via Protari – a tool with a web-based Table Interface and an application programming interface (API). Participation in the trial is open to analysts associated with Australian Government agencies. The MADIP Basic Longitudinal Extract, 2011-2016 (2011 Cohort) microdata product is currently available via Protari.

More information about Protari and the access trial is available on the Using Protari and Protari pages.

To find out how to participate in the trial, contact protari@abs.gov.au.

Test files containing false, randomised data based on the MADIP products are available from the Downloads tab. The data in the test files is not real. The purpose of the test files is to allow users to become familiar with the product structure and prepare code/programs prior to accessing the actual microdata in the DataLab. For more information, see the MADIP Basic Longitudinal Extract, 2011-2016 (2011 Cohort) DataLab Test File and the MADIP Basic Longitudinal Extract, 2011-2016 (2011-2016 Cohorts) DataLab Test File pages.

Further information to assist users in understanding and accessing microdata is available from the Microdata Entry Page.

User Guides for the MADIP microdata products are available for authorised researchers to access in the DataLab.

For additional information about and support for using the microdata products, contact dipa@abs.gov.au.

For technical support in using the DataLab, contact microdata.access@abs.gov.au or call (02) 6252 7714.

For technical support in using the Protari tool, contact protari@abs.gov.au.