Insights from the Australian Census and Temporary Entrants Integrated Dataset methodology

Reference period
2016
Released
14/02/2019

Explanatory notes

Introduction

1 The statistics in this publication were compiled from the 2016 Australian Census and Temporary Entrants Integrated Dataset (ACTEID).

2 The statistics in this publication relate to people who were present in Australia on Census Night, 9 August 2016, and held a temporary visa. In this publication, this population is referred to as Temporary Entrants.

3 The 2016 Australian Census and Temporary Entrants Integrated Dataset (ACTEID) Project linked the 2016 Census of Population and Housing dataset with Temporary Visa Holder data from the Department of Home Affairs.

Data sources

Temporary Visa Holder data

4 The Temporary Visa Holder data is administrative data pertaining to temporary visa holders in Australia, from various Department of Home Affairs (Home Affairs) systems.

2016 Census of Population and Housing

5 For information about the 2016 Census and collection methodology please refer to the information provided on the ABS website (www.abs.gov.au) at Understanding Census Data. Information about the data quality of the Census is available on the ABS website under Census Data Quality.

Scope

6 The scope of the 2016 Australian Census and Temporary Entrants Integrated Dataset (ACTEID) is restricted to people who had a temporary visa and were present in Australia on 9 August 2016.

7 The 2016 ACTEID includes the following visa types and subclasses for persons:

Special Category (New Zealand citizen)

  • Special Category (444)


Temporary Work (Skilled)

  • Temporary Work (Skilled) (457)


Working Holiday Maker

  • Working Holiday (417)
  • Work and Holiday (462)


Student

  • Student (Temporary) (500)
  • Independent ELICOS Sector (570)
  • Schools Sector (571)
  • Vocational Education and Training Sector (572)
  • Higher Education Sector (573)
  • Postgraduate Research Sector (574)
  • Non-Award Foundation/Other Sector (575)
  • AUSAID/Defence Sponsored Sector (576)


Other Temporary visa

  • Bridging Visa Class A (010)
  • Bridging Visa Class B (020)
  • Bridging Visa Class C (030)
  • Bridging Visa Class D (040)
  • Bridging Visa Class E (050)
  • Bridging Visa Class F (060)
  • Bridging Visa Class R (070)
  • Temporary Work (Short Stay Activity) (400)
  • Temporary Work (Long Stay Activity) (401)
  • Training and Research (402)
  • Temporary Work (International Relations) (403)
  • Investor Retirement (405)
  • Government Agreement (406)
  • Retirement (410)
  • Foreign Government Agency Staff (415)
  • Special Program (416)
  • Entertainment (420)
  • Media and Film Staff (423)
  • Supported Dependent of Australian or New Zealand Citizen Temporarily in Australia (430)
  • New Zealand Citizen (Family Relationship) (461)
  • Skilled - Graduate (476)
  • Temporary Graduate (485)
  • Diplomatic (995)
     

Data integration

8 Statistical data integration involves combining information from different data sources such as administrative, survey and/or Census to provide new datasets for statistical and research purposes.

9 Data linking is a key part of statistical data integration and involves combining records from different source datasets using variables that are shared between the sources. Data linkage is performed on unit records that represent individual persons.

Linkage between the Temporary Visa Holder data and the 2016 Census

10 The 2016 temporary entrant records were linked to the 2016 Census of Population and Housing data using a combination of deterministic and probabilistic linkage methodologies. The linkage method used in this project is considered a silver standard linkage because encoded name and address information was used. Further information about name and address encoding can be found in Information paper: Name encoding method for Census 2016.

11 Deterministic data linkage, also known as rule-based linkage, involves assigning record pairs across two datasets that match exactly or closely on common variables.
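As an illustration of rule-based linkage, the following minimal sketch links records that agree exactly on a set of common variables and accepts a link only when the match is unique on both sides. The field names (`name_hash`, `dob`, `sex`), the record identifiers and the uniqueness rule are assumptions made for this example; they are not the rules used in the ACTEID linkage.

```python
from collections import defaultdict


def deterministic_link(visa_records, census_records, keys):
    """Link records that agree exactly on every key, keeping only unique matches."""
    # Index Census records by their tuple of key values.
    census_index = defaultdict(list)
    for rec in census_records:
        census_index[tuple(rec[k] for k in keys)].append(rec["census_id"])

    # Count how many visa records share each key tuple.
    visa_counts = defaultdict(int)
    for rec in visa_records:
        visa_counts[tuple(rec[k] for k in keys)] += 1

    links = []
    for rec in visa_records:
        key = tuple(rec[k] for k in keys)
        candidates = census_index.get(key, [])
        # Accept the pair only if the key is unique in both datasets.
        if len(candidates) == 1 and visa_counts[key] == 1:
            links.append((rec["visa_id"], candidates[0]))
    return links


# Hypothetical records for illustration only.
visa = [
    {"visa_id": "V1", "name_hash": "a9f3", "dob": "1990-04-12", "sex": "F"},
    {"visa_id": "V2", "name_hash": "77c1", "dob": "1985-11-02", "sex": "M"},
    {"visa_id": "V3", "name_hash": "0b2e", "dob": "1993-07-30", "sex": "F"},
]
census = [
    {"census_id": "C10", "name_hash": "a9f3", "dob": "1990-04-12", "sex": "F"},
    {"census_id": "C11", "name_hash": "77c1", "dob": "1985-11-02", "sex": "M"},
    {"census_id": "C12", "name_hash": "77c1", "dob": "1985-11-02", "sex": "M"},
]

links = deterministic_link(visa, census, ["name_hash", "dob", "sex"])
print(links)
```

In this sketch V1 is linked, V2 is left unlinked because two Census records match it equally well, and V3 has no match. Records left unlinked at this stage are the kind of case a probabilistic pass can then consider.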

12 Probabilistic linking allows links to be assigned in spite of missing or inconsistent information, providing there is enough agreement on other variables to offset any disagreement. In probabilistic data linkage, records from two datasets are compared and brought together using several variables common to each dataset (Fellegi & Sunter, 1969).

13 A key feature of the methodology is the ability to handle a variety of linking variables and record comparison methods to produce a single numerical measure of how well two particular records match, referred to as the 'linkage weight'. This allows ranking of all possible links and optimal assignment of the link or non-link status (Solon and Bishop, 2009).
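The linkage weight can be sketched in the Fellegi and Sunter style as a sum of log-likelihood ratios over the linking variables. In the example below, `m` is the probability that a field agrees for a true match and `u` is the probability that it agrees for a non-match. The `m` and `u` values, the field names and the treatment of missing values are illustrative assumptions, not parameters from the ACTEID linkage.

```python
import math


def field_weight(agrees, m, u):
    """Log2 likelihood ratio contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def linkage_weight(rec_a, rec_b, params):
    """Sum the field weights for a record pair; params maps field -> (m, u)."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            # Assumed convention: a missing value contributes no evidence.
            continue
        total += field_weight(a == b, m, u)
    return total


# Hypothetical m/u probabilities and records for illustration only.
params = {"name_hash": (0.95, 0.001), "dob": (0.97, 0.003), "sex": (0.99, 0.5)}
visa_rec = {"name_hash": "a9f3", "dob": "1990-04-12", "sex": "F"}
candidates = {
    "C10": {"name_hash": "a9f3", "dob": "1990-04-12", "sex": "F"},
    "C11": {"name_hash": "a9f3", "dob": None, "sex": "F"},
    "C12": {"name_hash": "77c1", "dob": "1990-04-12", "sex": "M"},
}

# Rank candidate Census records from strongest to weakest evidence of a match.
ranked = sorted(
    candidates,
    key=lambda cid: linkage_weight(visa_rec, candidates[cid], params),
    reverse=True,
)
print(ranked)
```

Ranking candidate pairs by this single score is what allows a cut-off to be chosen: pairs above it are assigned as links and pairs below it as non-links.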

Linkage results

14 At the completion of the linkage process, 974,803 (60%) of the 1,635,498 records from the Temporary entrants data were linked to the 2016 Census data.

Estimation method

Calibration

15 The estimates in this publication are obtained by assigning a "weight" to each linked record. The weight is a value which indicates how many Temporary entrants records are represented by the linked record. Weights aim to adjust for the fact that not all Temporary entrants records are able to be successfully linked to a Census record, and the linked Temporary entrants records may not be representative of all records.

16 The weighting process involved a two-step linking propensity calibration process.

17 The first step of the calibration process adjusted for missed links. The methodology adopted was originally developed to adjust for non-response in sample surveys. The concepts of non-response and non-links differ: the former is generally the result of an action by a person selected in a sample, while the latter is a failure to link a record, most likely because of the quality of its linking variables. However, both situations may result in under- or over-representation, so the methodology developed to adjust for non-response is also suitable for adjusting for non-links. Unlike non-response in a sample survey, many of the characteristics of the non-linked records are known in this case, and these characteristics can therefore be used as inputs into an adjustment for unlinked records.

18 The propensity of a Temporary entrants record to be linked to a Census record was modelled using a logistic regression, which estimates the probability of each record having been linked based on that record's characteristics. The logistic regression was performed separately for student visa holders, temporary skilled workers, Special Category (New Zealand citizen) visa holders, and others. Each record was then assigned an initial weight given by the inverse of the linkage probability estimated by the relevant regression model. For example, if the regression model estimated that a Temporary visa holder record had a 75% chance of being successfully linked to a Census record, the initial weight would be 1 divided by 0.75, or 1.33. This ensures that records in the linked dataset which share characteristics with unlinked records are given higher weights, so that the characteristics associated with unlinked records are adequately represented on the linked file.
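The inverse-propensity step can be sketched as follows. To keep the example self-contained, the linkage probability is estimated as the observed link rate within cells defined by record characteristics, which is a simplified stand-in for the logistic regression described above. The field names (`visa_group`, `linked`, `visa_id`) and the records are hypothetical.

```python
from collections import defaultdict


def cell_link_propensity(records, cell_fields):
    """Estimate the probability of linkage as the link rate within each cell."""
    totals = defaultdict(int)
    linked = defaultdict(int)
    for rec in records:
        cell = tuple(rec[f] for f in cell_fields)
        totals[cell] += 1
        if rec["linked"]:
            linked[cell] += 1
    return {cell: linked[cell] / totals[cell] for cell in totals}


def initial_weights(records, cell_fields):
    """Give each linked record a weight of 1 / estimated linkage probability."""
    propensity = cell_link_propensity(records, cell_fields)
    weights = {}
    for rec in records:
        if rec["linked"]:
            p = propensity[tuple(rec[f] for f in cell_fields)]
            weights[rec["visa_id"]] = 1.0 / p
    return weights


# Hypothetical records: 3 of 4 students linked, 1 of 2 other visa holders linked.
records = [
    {"visa_id": "S1", "visa_group": "Student", "linked": True},
    {"visa_id": "S2", "visa_group": "Student", "linked": True},
    {"visa_id": "S3", "visa_group": "Student", "linked": True},
    {"visa_id": "S4", "visa_group": "Student", "linked": False},
    {"visa_id": "O1", "visa_group": "Other", "linked": True},
    {"visa_id": "O2", "visa_group": "Other", "linked": False},
]

weights = initial_weights(records, ["visa_group"])
print(weights)
```

Here each linked student record has an estimated linkage probability of 0.75 and so receives a weight of about 1.33, matching the worked example in the paragraph above, while the linked record in the harder-to-link group receives a weight of 2.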

19 The second step of the calibration process uses the weights derived from the first step as an input into the calibration to known totals from the Temporary entrants dataset. This adjusts for residual bias not accounted for by the regression model, and ensures that totals from the linked dataset exactly match totals from the Temporary entrants dataset for characteristics considered to be of particular interest, such as visa group, applicant status (primary or secondary) and state/territory of residence.
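One common way to calibrate weights to known totals is raking (iterative proportional fitting), sketched below. The source does not name the calibration algorithm used, so raking is an assumption chosen for illustration, and the records, starting weights and target totals are hypothetical.

```python
def rake(records, weights, margins, max_iter=100, tol=1e-10):
    """Scale weights until weighted totals match the targets for each margin.

    margins maps a field name to {category: target total}.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for field, targets in margins.items():
            # Current weighted total for each category of this field.
            current = {}
            for rec, wi in zip(records, w):
                current[rec[field]] = current.get(rec[field], 0.0) + wi
            # Scale each record's weight so its category hits the target.
            for i, rec in enumerate(records):
                factor = targets[rec[field]] / current[rec[field]]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return w


# Hypothetical linked records with first-step weights.
linked = [
    {"visa_group": "Student", "state": "NSW"},
    {"visa_group": "Student", "state": "VIC"},
    {"visa_group": "Skilled", "state": "NSW"},
    {"visa_group": "Skilled", "state": "VIC"},
]
step_one = [1.5, 1.5, 2.0, 2.0]

# Hypothetical known totals from the full (linked and unlinked) population.
known_totals = {
    "visa_group": {"Student": 4, "Skilled": 3},
    "state": {"NSW": 4, "VIC": 3},
}

calibrated = rake(linked, step_one, known_totals)
print(calibrated)
```

After calibration the weighted counts by visa group and by state both reproduce the known totals, while the first-step weights still determine how each total is shared among the records within a category.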

20 Following the two-step calibration process, weights are applied to the 974,803 linked records so that estimates will align to the 1,635,498 in scope records from the Temporary entrants population. The mean weight is therefore around 1.68, though the weights range between 1.0 and 12.5.
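The mean weight follows directly from the two counts, because calibration makes the weights of the linked records sum to the in-scope population total. Writing $w_i$ for the weight of linked record $i$ and $n$ for the number of linked records (notation introduced here for illustration):

```latex
\bar{w} \;=\; \frac{1}{n}\sum_{i=1}^{n} w_i
        \;=\; \frac{1{,}635{,}498}{974{,}803}
        \;\approx\; 1.68
```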

Estimation

21 Estimates in this publication are obtained by summing the weights of persons with the characteristic of interest. Cells in this publication have been randomly adjusted to avoid the release of confidential data. Discrepancies may occur between sums of the component items and totals.
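Summing weights to form an estimate can be sketched as below. The records, weights and field names are hypothetical, and the random adjustment applied to published cells for confidentiality is not reproduced here.

```python
def estimate(records, weights, has_characteristic):
    """Estimate a population count by summing the weights of matching records."""
    return sum(w for rec, w in zip(records, weights) if has_characteristic(rec))


# Hypothetical linked records and their final weights.
people = [
    {"visa_group": "Student", "state": "NSW"},
    {"visa_group": "Student", "state": "VIC"},
    {"visa_group": "Skilled", "state": "NSW"},
    {"visa_group": "Student", "state": "NSW"},
]
final_weights = [1.5, 2.0, 1.25, 1.75]

students = estimate(people, final_weights, lambda r: r["visa_group"] == "Student")
students_nsw = estimate(
    people,
    final_weights,
    lambda r: r["visa_group"] == "Student" and r["state"] == "NSW",
)
print(students, students_nsw)
```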

Reliability of estimates

22 Error in estimates produced using the 2016 Australian Census and Temporary Entrants Integrated Dataset (ACTEID) may occur due to false links and the non-random distribution of missed links.

Missed links

23 As many of the characteristics of the unlinked records are known, much of the error introduced by under- or over-representation of certain groups amongst the linked records can be mitigated by the calibration process.

False links

24 Not all record pairs assigned as links in any data linkage process are a true match, that is, the record pairs may not relate to the same individual. These are known as false links.

Measures of error

25 While the calibration process is able to mitigate the potential for bias due to missed links, it does not mitigate the error introduced by false links. Accordingly, the linkage strategy used for the 2016 Australian Census and Temporary Entrants Integrated Dataset (ACTEID) was designed to ensure a high level of accuracy while also achieving a sufficiently large number of linked records to enable detailed analysis of small populations. Using the model developed by Chipperfield et al. (2018), the estimated precision of the linkage (the proportion of links that are true matches) was 99%.

26 In survey data, sampling error is estimated using a measure of Relative Standard Error (RSE). As the 2016 Australian Census and Temporary Entrants Integrated Dataset (ACTEID) is not based on a sample, RSEs cannot be produced for this data. A measure of the uncertainty associated with estimates due to the calibration model could theoretically be produced, but it would not represent the error introduced by false links, and has therefore not been included in this publication.

Comparability with other data

27 Estimates from the 2016 ACTEID may differ from statistics produced from other ABS collections or from the Temporary Visa Holder data. While the linked records have been calibrated to selected population totals from the Temporary Visa Holder data, other totals may not align. In some cases a data item (such as Country of Birth) is available on both the Temporary Visa Holder data and the Census, but its values differ between the two sources. In these cases the 2016 ACTEID uses the Census data item.

Acknowledgement

28 The ABS acknowledges the continuing support provided by the Department of Home Affairs and the Department of Social Services for the Australian Census and Temporary Entrants Integrated Dataset (ACTEID) Project. The provision of data, as well as the ongoing assistance provided by both agencies, is essential to enable this important work to be undertaken. The enhancement of migrant-related statistics through data linkage by the ABS would not be possible without their cooperation and support. The ABS also acknowledges the importance of the information provided freely by individuals in the course of the 2016 Census. The Census information of individuals received by the ABS is treated in the strictest confidence, as required by the Census and Statistics Act 1905. See the Census Privacy Policy.

Glossary

Apart from the concepts and data items originating from the Department of Home Affairs Temporary Visa Holder (TVH) database (for example, visa type and main/secondary applicant status), all terms and definitions relate to Census variables and the Australian Statistical Geography Standard (ASGS). For more detail on Census variables, please see the Census Dictionary.

