1700.0 - Microdata: Multi-Agency Data Integration Project, Australia Quality Declaration 
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 24/10/2018  First Issue

This document was added 22/11/2018.



USING PROTARI

INTRODUCTION

The ABS is conducting a user trial of new software called Protari. Protari aims to provide relevant and customisable analytical interfaces for users while maintaining privacy. The ABS is inviting participation from analysts associated with Australian Commonwealth or State Government agencies. For more information see Protari.

The following Multi-Agency Data Integration Project (MADIP), Basic Longitudinal Extract (BLE) products are currently available:


For the full list of data items for the above data products, see the Data Item lists from the Downloads tab within this publication. Note that for the BLE (2011-2016 Cohorts) Protari provides subsets of the data (rather than the full file). See more information below.

For information on scope and coverage, defining your population of interest, the linking methodology and weighting, see the Methodology (2011 Cohort) and Methodology (2011-2016 Cohorts) sections within this publication.

ASSESSING THE FITNESS FOR PURPOSE OF DATA

It is the responsibility of each researcher to assess the fitness of data for intended purposes. Protari perturbs (slightly adjusts) data outputs to protect privacy while retaining the information value of the table as a whole (more information below). Data accuracy may also be impacted by statistical or other error, including but not limited to coverage error, response error and linkage error (see 'Methodology' chapters linked above). No reliance should be placed on data cells with small values, particularly where the total table population is also small.

PERTURBATION TO PROTECT CONFIDENTIALITY

To minimise the risk of identifying individuals in aggregate statistics, outputs derived through Protari are perturbed. That is, small random perturbations (or changes) are applied to individual cells within results while the information value of the table as a whole is retained. The ABS considers perturbation to be the most satisfactory technique for avoiding the release of identifiable data while maximising the range of information that can be released. Perturbation is considered necessary due to the flexible nature of the possible queries, the amount of detail in the underlying dataset, and the potential for results from multiple queries to be compared. When interpreting results from Protari, consider that:
  • while perturbation introduces random errors, it does so with almost no bias.
  • running the same query multiple times will result in the same perturbation and therefore the same results.
  • some relationships between different estimates (e.g. adding to 100%) may not be preserved exactly.
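
The properties listed above can be illustrated with a toy model. The sketch below is not the perturbation method used by Protari, whose details are not given in this document. It assumes a hypothetical scheme in which each cell's small adjustment is drawn from a random generator seeded by a hash of the cell's description, so the same cell always receives the same adjustment. The names `perturb`, `cell_key` and `max_shift`, and all counts, are illustrative only.

```python
import hashlib
import random


def perturb(cell_key: str, true_count: int, max_shift: int = 2) -> int:
    """Toy perturbation: add a small, repeatable random integer to a count.

    Assumption (not from the source): the adjustment is seeded by a hash of
    the cell key, so repeating a query reproduces the same adjustment.
    """
    digest = hashlib.sha256(cell_key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shift = rng.randint(-max_shift, max_shift)  # symmetric range around zero
    return max(0, true_count + shift)  # a count cannot be negative


# Hypothetical true counts for a two-cell table and its total.
cells = {"sex=F": 480, "sex=M": 520}
perturbed = {key: perturb(key, count) for key, count in cells.items()}
total = perturb("total", sum(cells.values()))

# Running the same query again gives the same perturbed values.
assert perturbed == {key: perturb(key, count) for key, count in cells.items()}

# Cells and the total are perturbed independently, so the shares
# computed from them need not add to exactly 100%.
shares = {key: 100 * value / total for key, value in perturbed.items()}
print(perturbed, total, round(sum(shares.values()), 2))
```

In this toy model the adjustment is at most `max_shift` in either direction, which is why small cells are affected proportionally more than large ones.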

Protari may not be suitable for all types of research. It is the responsibility of each researcher to assess the fitness of data for their intended purposes.
  • It is expected that for most queries perturbations will have a smaller impact on data accuracy than from other forms of error (e.g. coverage error, response error and linkage error).
  • Although cells may appear to contain none, or all, of the relevant population, this is not necessarily a reflection of the true value of the cell.
  • No reliance should be placed on data cells with small values, particularly where the total table population is also small.

The methods used to calculate the perturbations and other confidentiality protections are very similar to those used in ABS TableBuilder (see the Confidentiality page of the TableBuilder User Guide). The ABS uses the 'Five Safes Framework' for protecting the privacy of individuals when releasing data. Perturbation is only one element of privacy protection within this broader framework. Further information on perturbation can be found in the 'Managing the Risk of Disclosure: Treating Aggregate Data' section of ABS Confidentiality Series, Aug 2017 (cat. no. 1160.0) while more information about the Five Safes Framework is in the 'Managing the Risk of Disclosure: The Five Safes Framework' section of the same publication.

SUBSETS AND SAMPLE FILES FOR BLE (2011-2016 COHORTS)

The BLE (2011-2016 Cohorts) is available as subsets and samples via Protari, rather than in its entirety. Each subset and sample file includes data from the in-scope population, across all 6 years (2011-2016) and for all the General, Scoping, Derived, Census, PIT, SSRI, and Apprentice and Trainee data items. The files differ in which MBS/PBS data items they include and in the geographic levels available. For the full list of data items, see the Data Item List from the Downloads tab within this publication. The following subsets and samples are currently available for use by approved users via Protari.

TABLE 1: COMPARING SUBSETS AND SAMPLE FILES FOR BLE (2011-2016 COHORTS)

Dataset Title                               Dataset Name

MADIP, 2016 Census, Unweighted              madip_2016
MADIP, 2016 Census, Weighted                madip_2016_wtd
Description: For these datasets results are generally available at SA1 or higher geographic level. However the datasets include only summary MBS and PBS data items (see the below table for description of summary MBS and PBS data items). If these MBS or PBS data items are included in a query, the results are only available at SA3 or higher geographic level.

MADIP, 2016 Census, 20% Sample, Unweighted  madip_2016_20pct
MADIP, 2016 Census, 20% Sample, Weighted    madip_2016_20pct_wtd
MADIP, 2016 Census, 2% Sample, Unweighted   madip_2016_2pct
MADIP, 2016 Census, 2% Sample, Weighted     madip_2016_2pct_wtd
Description: These datasets include all MBS and PBS data items. Results are only available at SA3 geographic level or higher.
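
The geographic rules in Table 1 can be summarised as a small lookup. The following is a minimal sketch only: the function `finest_geography` is hypothetical, it assumes that MBS and PBS data items can be recognised by the `mbs_` and `pbs_` prefixes seen in the summary data item names, and it ignores the word "generally" in the first description.

```python
from typing import Iterable

# Dataset names as listed in Table 1.
FULL_DATASETS = {"madip_2016", "madip_2016_wtd"}
SAMPLE_DATASETS = {
    "madip_2016_20pct",
    "madip_2016_20pct_wtd",
    "madip_2016_2pct",
    "madip_2016_2pct_wtd",
}


def finest_geography(dataset_name: str, data_items: Iterable[str]) -> str:
    """Return the finest geographic level ('SA1' or 'SA3') for a query."""
    if dataset_name in SAMPLE_DATASETS:
        # Sample files: SA3 or higher regardless of the items queried.
        return "SA3"
    if dataset_name in FULL_DATASETS:
        # Assumption: MBS/PBS items are identified by their name prefix.
        uses_health = any(
            item.startswith(("mbs_", "pbs_")) for item in data_items
        )
        return "SA3" if uses_health else "SA1"
    raise ValueError(f"Unknown dataset: {dataset_name}")


# 'some_census_item' is a made-up name used only for illustration.
print(finest_geography("madip_2016", ["some_census_item"]))
print(finest_geography("madip_2016", ["mbs_2015_ser_used_r"]))
print(finest_geography("madip_2016_2pct", ["some_census_item"]))
```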



TABLE 2: SUMMARY MBS AND PBS DATA ITEMS INCLUDED IN THE HEALTH SUMMARY SUBSET

Data Item Name                  Data Item Label

mbs_<year>_data_flag            MBS <year> Data flag
mbs_<year>_ser_used_0101_r      MBS <year> Number of services used - Non-referred attendances - General Practitioner/Vocationally Registered General Practitioner (VRGP)
mbs_<year>_ser_used_r           MBS <year> Number of services used - Total
mbs_<year>_ben_paid_0101_r      MBS <year> Benefits paid - Non-referred attendances - General Practitioner/Vocationally Registered General Practitioner (VRGP)
mbs_<year>_ben_paid_r           MBS <year> Benefits paid - Total
mbs_<year>_fee_char_0101_r      MBS <year> Fees charged - Non-referred attendances - General Practitioner/Vocationally Registered General Practitioner (VRGP)
mbs_<year>_fee_char_r           MBS <year> Fees charged - Total
pbs_<year>_data_flag            PBS <year> Data flag
pbs_<year>_prscrpt_r            PBS <year> Prescriptions - Total
pbs_<year>_prscrpt_abov_cp_r    PBS <year> Prescriptions - Subsidised (above co-payment) - Total
pbs_<year>_prscrpt_undr_cp_r    PBS <year> Prescriptions - Not subsidised (under co-payment) - Total
pbs_<year>_bnft_r               PBS <year> Benefits received - Total
pbs_<year>_pnt_cntrb_abov_cp_r  PBS <year> Patient contribution (book price) - Subsidised (above co-payment) - Total
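
The names in Table 2 are templates containing a `<year>` placeholder. The sketch below expands them into one name per year. It assumes, without confirmation from this document, that `<year>` is replaced by a four-digit year from 2011 to 2016; the Data Item List should be checked for the actual names. The helper `expand` is hypothetical.

```python
# Data item name templates copied from Table 2.
SUMMARY_TEMPLATES = [
    "mbs_<year>_data_flag",
    "mbs_<year>_ser_used_0101_r",
    "mbs_<year>_ser_used_r",
    "mbs_<year>_ben_paid_0101_r",
    "mbs_<year>_ben_paid_r",
    "mbs_<year>_fee_char_0101_r",
    "mbs_<year>_fee_char_r",
    "pbs_<year>_data_flag",
    "pbs_<year>_prscrpt_r",
    "pbs_<year>_prscrpt_abov_cp_r",
    "pbs_<year>_prscrpt_undr_cp_r",
    "pbs_<year>_bnft_r",
    "pbs_<year>_pnt_cntrb_abov_cp_r",
]

# Assumption: <year> stands for a four-digit year, 2011 to 2016 inclusive.
YEARS = range(2011, 2017)


def expand(template, years=YEARS):
    """Replace the <year> placeholder with each year in turn."""
    return [template.replace("<year>", str(year)) for year in years]


all_names = [name for template in SUMMARY_TEMPLATES for name in expand(template)]
print(len(all_names))
print(expand("pbs_<year>_bnft_r"))
```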


WEIGHTS

Both the BLE (2011 Cohort) and the BLE (2011-2016 Cohorts) contain a single weight field to correct for incompleteness of the linkage of Census (2011 Census or 2016 Census respectively) to MEDB/MADIP (for further information see the Methodology (2011 Cohort) or Methodology (2011-2016 Cohorts) sections within this publication). Typically, queries involving Census fields should use these weights so that estimates for the complete Census population are obtained. For queries that do not include fields from Census, more accurate results will usually be obtained by not applying the weights, so that records that are not linked to Census can also be used in the estimation.

In Protari the ability for the user to apply or not apply these weights is achieved by the selection of different datasets. If the '<dataset name> Weighted' dataset is selected then the weights will be used in calculating the results. If the '<dataset name> Unweighted' dataset is selected then the weights will not be used. These datasets are otherwise the same.
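
As a rough illustration of this choice, the sketch below picks a dataset name for the BLE (2011-2016 Cohorts) according to whether a query uses Census fields. The function `dataset_for_query` is hypothetical, and it relies on the `_wtd` suffix pattern seen in the dataset names in Table 1. It encodes the typical guidance above, not a fixed rule.

```python
# Unweighted dataset names from Table 1; weighted versions add "_wtd".
BASE_DATASETS = {"madip_2016", "madip_2016_20pct", "madip_2016_2pct"}


def dataset_for_query(base_name: str, uses_census_fields: bool) -> str:
    """Choose the weighted or unweighted version of a dataset.

    Typical guidance: apply weights when the query involves Census
    fields, and leave them off otherwise.
    """
    if base_name not in BASE_DATASETS:
        raise ValueError(f"Unknown dataset: {base_name}")
    return f"{base_name}_wtd" if uses_census_fields else base_name


print(dataset_for_query("madip_2016", uses_census_fields=True))
print(dataset_for_query("madip_2016_20pct", uses_census_fields=False))
```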

LONGITUDINAL DATA

Within the BLE all source datasets except Personal Income Tax (PIT) have years ending 31 December. For PIT data, a year refers to the financial year ending on 30 June of that calendar year (e.g. '2015' refers to the financial year 2014-15).
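
The difference in reference periods can be made concrete with a short sketch. The function `reference_period` is hypothetical. It assumes that years for sources other than PIT run from 1 January to 31 December, and it uses the Australian financial year of 1 July to 30 June for PIT.

```python
from datetime import date
from typing import Tuple


def reference_period(source: str, year: int) -> Tuple[date, date]:
    """Return the (start, end) dates that a year label refers to."""
    if source.upper() == "PIT":
        # Financial year ending 30 June of the labelled year.
        return date(year - 1, 7, 1), date(year, 6, 30)
    # Assumption: all other sources use the calendar year.
    return date(year, 1, 1), date(year, 12, 31)


# For PIT, '2015' covers the 2014-15 financial year.
print(reference_period("PIT", 2015))
print(reference_period("MBS", 2015))
```

One consequence is that a PIT year and a year from another source with the same label overlap for only six months.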

Although the BLE contains six years of data (2011-2016) for most source datasets, Census and Derived data items relate to a single year only (i.e. 2011 for the BLE (2011 Cohort) and 2016 for the BLE (2011-2016 Cohorts)). In Protari, results relating to data items prefixed 'Census 2011', 'Derived 2011', 'Census 2016' or 'Derived 2016' will always refer to that single year, even if the query is run for a different year.
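
The rule for single-year data items can be sketched as follows. The function `effective_year` is hypothetical and works only from the four label prefixes named above; the example labels are made up for illustration.

```python
import re

# Prefixes that tie a data item to a single year, as listed above.
_SINGLE_YEAR = re.compile(r"^(?:Census|Derived) (2011|2016)\b")


def effective_year(item_label: str, query_year: int) -> int:
    """Return the year a result refers to for a given data item label."""
    match = _SINGLE_YEAR.match(item_label)
    if match:
        # Census and Derived items always refer to their own year.
        return int(match.group(1))
    return query_year


# Hypothetical labels, used only to show the behaviour.
print(effective_year("Census 2016 Example item", 2013))
print(effective_year("MBS 2013 Benefits paid - Total", 2013))
```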