3417.0 - Understanding Migrant Outcomes - Enhancing the Value of Census Data, Australia, 2011  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 19/09/2013  First Issue

EXPLANATORY NOTES


INTRODUCTION

1 The statistics in this publication were compiled from the Migrants Census Data Enhancement (CDE) Integrated Dataset, 2011 produced in the 2011 Migrants CDE Project.

2 The statistics in this publication relate to people who have migrated to Australia under a permanent Skilled, Family or Humanitarian stream visa and 'arrived' in Australia between 1 January 2000 and 9 August 2011 (subject to the definition of arrived in Note 8 and further exclusions listed in Notes 6 - 7). In this publication, this population is referred to as Permanent Migrants.

3 The 2011 Migrants CDE Project probabilistically linked the 2011 Census dataset with an extract from the Department of Immigration and Citizenship's (DIAC) Settlement Data Base (SDB). For more information on the 2011 Migrants CDE Project please see 'In This Issue' and/or the Research Paper: Assessing the Quality of Linking Migrant Settlement Records to 2011 Census Data, Aug 2013 (cat. no. 1351.0.55.043). Information about this project is also included on the Public Register of Data Integration Projects on the National Statistical Service website.


DATA SOURCES

Settlement Data Base

4 The Department of Immigration and Citizenship (DIAC) Settlement Data Base (SDB) is an administrative database that combines data on permanent settlers in Australia from various sources within DIAC. The SDB provides statistical data for the planning of settlement services within DIAC and for other government and community agencies involved in the settlement of migrants. Updated address information is incorporated from the Department of Human Services (Medicare Australia).

2011 Census of Population and Housing

5 For information about the 2011 Census and collection methodology please refer to the information provided on the ABS website (www.abs.gov.au) at Census 2011 Reference and Information. Information about the data quality of the Census is available on the ABS website under Census Data Quality.


SCOPE

6 The scope of the Migrants CDE Integrated Dataset, 2011 is restricted to persons who both responded to the 9 August 2011 Census of Population and Housing and had a settlement record on the SDB with a date of arrival between 1 January 2000 and 9 August 2011 (inclusive). The Migrants CDE Integrated Dataset, 2011 excludes:

  • Persons for whom the 2011 Census record was imputed
  • Persons whose Census record indicated that they were an overseas visitor
  • Persons who were out of the country on Census night
  • Non-visa settlers (e.g. some New Zealand Citizens who have migrated to Australia)
  • Deceased persons

7 In addition, this publication excludes:
  • Permanent migrants from the Permanent - Other visa stream
  • Temporary migrants from the Temporary - Student and Temporary - Other visa groups
  • Migrants whose visa subclass code is classified as Not Applicable
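
The scope rules in Notes 6 and 7 can be read as a record-level filter. The following is a minimal sketch only: the field names (sdb_arrival, census_imputed, visa_category and so on) and the category labels are hypothetical and are not the variable names used on the SDB or the Census file.

```python
from datetime import date

# Arrival window from Note 6 (inclusive at both ends).
WINDOW_START = date(2000, 1, 1)
WINDOW_END = date(2011, 8, 9)

# Hypothetical labels for the categories excluded from the publication (Note 7).
EXCLUDED_CATEGORIES = {"Permanent - Other", "Temporary - Student", "Temporary - Other"}


def in_dataset_scope(rec):
    """Apply the Note 6 scope and exclusions to one linked record (a dict)."""
    return (
        WINDOW_START <= rec["sdb_arrival"] <= WINDOW_END
        and not rec["census_imputed"]
        and not rec["overseas_visitor"]
        and not rec["out_of_country"]
        and rec["has_visa"]          # False for non-visa settlers
        and not rec["deceased"]
    )


def in_publication_scope(rec):
    """Apply the additional Note 7 exclusions on top of the dataset scope."""
    return (
        in_dataset_scope(rec)
        and rec["visa_category"] not in EXCLUDED_CATEGORIES
        and rec["visa_subclass"] != "Not Applicable"
    )


example = {
    "sdb_arrival": date(2011, 8, 9),   # last day of the window, still in scope
    "census_imputed": False,
    "overseas_visitor": False,
    "out_of_country": False,
    "has_visa": True,
    "deceased": False,
    "visa_category": "Skilled",
    "visa_subclass": "175",            # illustrative value only
}
print(in_publication_scope(example))
```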

8 The SDB date of arrival on which the scope is based reflects an individual's latest arrival pertaining to their latest permanent visa. For an offshore applicant, the SDB arrival date is when the applicant arrives in Australia on that permanent visa. However, for a person who applies onshore for a permanent visa, the date of arrival listed on the SDB is the date of their last entry into Australia.


COVERAGE

9 Of the 1,315,048 SDB records that were in-scope, 1,003,532 records (76%) were linked to a Census record. After publication exclusions (see Note 7) the sample for the data in this publication is 974,545 persons.

10 The in-scope SDB records that did not link are referred to as unlinked records. Unlinked records may include missed links (people for whom a record existed on both the Census and the SDB) or non-links (people for whom a record existed on the SDB but not on the Census).
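
The coverage figures in Notes 9 and 10 can be checked with simple arithmetic. The sketch below uses only the counts quoted in Note 9; the variable names are illustrative.

```python
# Counts quoted in Note 9.
in_scope_sdb = 1_315_048        # in-scope SDB records
linked = 1_003_532              # records linked to a Census record
publication_sample = 974_545    # linked records left after the Note 7 exclusions

link_rate = linked / in_scope_sdb
unlinked = in_scope_sdb - linked                 # missed links plus non-links
removed_by_note_7 = linked - publication_sample  # linked records excluded from the publication

print(f"Link rate: {link_rate:.1%}")
print(f"Unlinked records: {unlinked:,}")
print(f"Linked records removed by publication exclusions: {removed_by_note_7:,}")
```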

11 Some groups of records were more likely to link or conversely less likely to link than other groups of records. This resulted in over representation of some groups and under representation of others. Records are more difficult to link when a person has poorly reported, poorly coded, missing or non-applicable values for linking variables. The non-random distribution of links has the potential to cause bias. Potential bias associated with over/under representation was addressed in the calibration process described in Notes 12 to 16.


ESTIMATION METHOD

Calibration

12 The estimates in this publication are obtained by assigning a "weight" to each linked record. The weight is a value which indicates how many SDB records are represented by the linked record. Weights aim to adjust for the fact that the linked SDB records may not be representative of all the SDB records. The weights on the Migrants CDE Integrated Dataset, 2011 range from 0.7 to 4.5.

13 The number of SDB records that were in-scope for linking was adjusted from 1,315,048 to 1,311,654 to account for incomplete information about deceased persons on the SDB. The 3,394 person downward adjustment was calculated using ABS Demography statistics. The two-step calibration process then weighted the original 1,003,532 sample up to a death-adjusted population total of 1,311,654. After exclusions, the calibrated population total for the data in this publication is 1,273,701.
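
As a check on the figures in Note 13, the sketch below reproduces the death adjustment and derives the average weight implied by the population totals. The average weights are derived here for illustration and are not published figures.

```python
# Figures quoted in Notes 9 and 13.
in_scope_sdb = 1_315_048
death_adjustment = 3_394
linked_sample = 1_003_532
publication_sample = 974_545
publication_population = 1_273_701

adjusted_population = in_scope_sdb - death_adjustment

# Average number of SDB records represented by each linked record.
mean_weight_all = adjusted_population / linked_sample
mean_weight_publication = publication_population / publication_sample

print(f"Death-adjusted population: {adjusted_population:,}")
print(f"Implied average weight (all linked records): {mean_weight_all:.3f}")
print(f"Implied average weight (publication sample): {mean_weight_publication:.3f}")
```

Both implied averages fall within the 0.7 to 4.5 range of weights reported in Note 12.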

14 The first step of the calibration process used methodology developed to adjust for non-response in sample surveys. The concepts of non-response and non-links differ in that the former is a result of an action by a person selected in a sample, whereas the latter is the failure to link a record, likely as a result of the quality of its linking variables. However, both situations may result in under/over representation, and as such the methodology developed to adjust for non-response is suitable for adjusting for non-links. The Migrants CDE Integrated Dataset, 2011 is unique in that many characteristics of the non-links are known, and these characteristics can therefore be used as inputs into a non-links adjustment.

15 The propensity of an SDB record to be linked to a Census record was modelled, and each record was assigned an initial weight. Records in the linked dataset which share characteristics with unlinked records are given higher weights by this model, such that unlinked records are adequately represented on the linked file.
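
The idea behind the initial weights in Note 15 can be illustrated with a simple cell-based propensity adjustment. This is a sketch under stated assumptions, not the model used in the project: the notes do not specify the form of the propensity model, and the 'cell' grouping and the records below are hypothetical.

```python
from collections import defaultdict


def initial_weights(sdb_records):
    """Return an initial weight per cell as the inverse of the cell's link rate.

    Each record is a dict with a 'cell' label (a hypothetical grouping of
    SDB characteristics) and a boolean 'linked' flag.
    """
    totals = defaultdict(int)
    links = defaultdict(int)
    for rec in sdb_records:
        totals[rec["cell"]] += 1
        links[rec["cell"]] += int(rec["linked"])
    # Cells with no linked records cannot be represented and are skipped.
    return {cell: totals[cell] / links[cell] for cell in totals if links[cell] > 0}


# Hypothetical SDB extract: cell B links less often than cell A.
sdb = (
    [{"cell": "A", "linked": True}] * 3 + [{"cell": "A", "linked": False}]
    + [{"cell": "B", "linked": True}] * 2 + [{"cell": "B", "linked": False}] * 2
)
weights = initial_weights(sdb)

# Linked records in the harder-to-link cell B carry the higher weight.
weighted_total = sum(weights[r["cell"]] for r in sdb if r["linked"])
print(weights, weighted_total)
```

In this toy example the weighted linked records sum to the full number of SDB records, which is the sense in which unlinked records are represented on the linked file.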

16 The second step of the calibration process used the weighted file produced in step one, and calibrated it to the death-adjusted SDB totals. Calibration was conducted using the following SDB variables: age group cross classified with sex, country of birth, visa group and visa stream cross classified with state.
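
One common way to calibrate weights to known totals is raking (iterative proportional fitting). The notes do not state which calibration algorithm was used, so the sketch below is an illustration of the general technique only. The two margins, the records and the target totals are hypothetical.

```python
def rake(records, weights, targets, iterations=50):
    """Scale weights so weighted totals match each target margin in turn.

    records: list of dicts of categorical values
    weights: starting weights, one per record
    targets: {variable: {category: known total}}
    """
    weights = list(weights)
    for _ in range(iterations):
        for var, target in targets.items():
            # Current weighted total for each category of this variable.
            current = {}
            for rec, w in zip(records, weights):
                current[rec[var]] = current.get(rec[var], 0.0) + w
            # Rescale every record by its category's target/current ratio.
            weights = [
                w * target[rec[var]] / current[rec[var]]
                for rec, w in zip(records, weights)
            ]
    return weights


# Hypothetical linked records and hypothetical known totals.
records = [
    {"age": "young", "state": "NSW"},
    {"age": "young", "state": "VIC"},
    {"age": "old", "state": "NSW"},
    {"age": "old", "state": "VIC"},
]
targets = {
    "age": {"young": 6, "old": 4},
    "state": {"NSW": 5, "VIC": 5},
}
calibrated = rake(records, [1.0, 1.0, 1.0, 1.0], targets)
print(calibrated)
```

In practice the starting weights would be the initial weights from step one rather than 1.0.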

Estimation

17 Estimates in this publication are obtained by summing the weights of persons with the characteristic of interest. Any discrepancies between totals and sums of components are due to rounding.
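
The estimation rule in Note 17 amounts to a weighted count. A minimal sketch follows, with hypothetical records, weights and field names.

```python
def estimate(records, has_characteristic):
    """Sum the weights of the records that have the characteristic of interest."""
    return sum(rec["weight"] for rec in records if has_characteristic(rec))


# Hypothetical linked records with calibrated weights.
linked = [
    {"weight": 1.2, "employed": True},
    {"weight": 0.9, "employed": False},
    {"weight": 2.5, "employed": True},
]
employed_estimate = estimate(linked, lambda rec: rec["employed"])
print(round(employed_estimate, 1))
```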


RELIABILITY OF ESTIMATES

18 Error in estimates produced using the Migrants CDE Integrated Dataset, 2011 may occur due to false links and the non-random distribution of unlinked records.

False links

19 The 2011 Migrants CDE Project produced four datasets. Three of these datasets were produced using Bronze Standard probabilistic linking. The term Bronze Standard indicates that name and address were not used as linking variables. A Gold Standard dataset was also produced. The term Gold Standard indicates that name and address were used as linking variables. The Gold Standard Dataset was used as a benchmark against which the quality of the three Bronze Standard datasets was assessed. Where a SDB record was linked to a Census record on the Gold Standard Dataset but was linked to a different Census record on a Bronze Standard dataset, the record on the bronze dataset is considered to be a false link. Additionally, where a SDB record was not linked to a Census record on the Gold Standard Dataset but was linked to a Census record on a Bronze Standard dataset, the record on the bronze dataset is also considered to be a false link.
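
The two false link conditions in Note 19 can be expressed as a single comparison against the Gold Standard. In this sketch each dataset is represented as a mapping from an SDB record identifier to the linked Census record identifier. The identifiers are hypothetical.

```python
def false_links(gold, bronze):
    """Return the SDB ids whose Bronze link disagrees with the Gold Standard.

    A Bronze link is treated as false when the Gold Standard linked the SDB
    record to a different Census record, or did not link it at all.
    """
    return {
        sdb_id
        for sdb_id, census_id in bronze.items()
        if gold.get(sdb_id) != census_id
    }


gold = {"S1": "C1", "S2": "C2"}                 # S3 is unlinked on the Gold Standard
bronze = {"S1": "C1", "S2": "C9", "S3": "C3"}

print(sorted(false_links(gold, bronze)))
```

Here S2 is false because the two datasets link it to different Census records, and S3 is false because only the Bronze dataset links it.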

20 Three Bronze Standard datasets labelled High, Medium and Low were created, and differ in terms of the threshold used to accept or reject links in the probabilistic linking process. The High label indicates a high threshold which results in fewer overall links and fewer estimated false links, and conversely the Low label indicates a low threshold which results in more overall links but also a higher level of estimated false links.
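
The trade-off described in Note 20 can be illustrated by applying different acceptance thresholds to the same set of candidate links. The scores, the true or false flags and the threshold values below are hypothetical. In practice the true status of a link is not known and false links are estimated against the Gold Standard.

```python
# Hypothetical candidate links: (linkage score, whether the link is correct).
candidates = [(12.4, True), (9.1, True), (7.8, False), (6.5, True), (4.2, False)]


def accept(candidates, threshold):
    """Return (links accepted, false links accepted) at a given threshold."""
    accepted = [is_true for score, is_true in candidates if score >= threshold]
    return len(accepted), accepted.count(False)


# Hypothetical thresholds for the High, Medium and Low datasets.
for label, threshold in [("High", 9.0), ("Medium", 7.0), ("Low", 4.0)]:
    links, false = accept(candidates, threshold)
    print(f"{label}: {links} links, {false} false")
```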

21 The 2011 Migrants CDE Project used the Low dataset for the statistics produced in this publication. The Migrants CDE Bronze Low dataset contains an estimated 115,451 false links. Users should note that the estimation of false links relates only to the difference between the Gold Standard Dataset and the Bronze Standard datasets, and that there may also be records on the Gold Standard Dataset for which the person in the SDB record is not the same person as that of the linked Census record. This error has not been estimated and is not reflected in the estimate of false links.
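
For a sense of scale, the estimated false links can be expressed as a share of linked records. This sketch assumes that the 1,003,532 linked records reported in Note 9 are the links on the Bronze Low dataset. The notes do not state this explicitly, so the resulting share is indicative only.

```python
estimated_false_links = 115_451   # Note 21
linked_records = 1_003_532        # Note 9; assumed to be the Bronze Low link count

false_link_share = estimated_false_links / linked_records
print(f"Estimated false links as a share of linked records: {false_link_share:.1%}")
```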

22 The calibration process does not mitigate the error introduced by false links or the error introduced in the Gold Standard probabilistic linking process. Due to the quality issues mentioned above, estimates should generally be treated with caution. Further information about the data and the linking methodology used is available in the Research Paper: Assessing the Quality of Linking Migrant Settlement Records to 2011 Census Data, Aug 2013 (cat. no. 1351.0.55.043).

Unlinked records

23 Error introduced by under/over representation of characteristic based groups in unlinked records has been mitigated to some extent by the two-step calibration process.

Measures of error

24 In survey data, sampling error is estimated using a measure of Relative Standard Error (RSE). Whilst RSEs can be produced for this data, they would not represent the error introduced by false links or error introduced in the Gold Standard probabilistic linking process, and have therefore not been included in this publication.

25 Statements made in the text of this publication that compare proportions between two population groups have not been tested for significance. Statistical significance testing requires an estimate of the magnitude of the error for each statistical estimate, which is not yet available for statistical estimates produced using the Migrants CDE Integrated Dataset, 2011.


INTERPRETATION OF RESULTS

26 There are several variables common to the two source datasets which have definitional differences.

Year of arrival

27 Estimates in this publication are produced using the 2011 Census year of arrival variable (YARP). On the Census, people are asked to report when they first came to Australia to live for one year or more. This will be regardless of visa type and may have occurred many years before their latest permanent visa reported on the SDB.

28 The scope of the SDB records was assessed using the SDB date of arrival variable. The SDB date of arrival reflects an individual’s latest arrival pertaining to their latest permanent visa (see Note 8 for further information). However, an individual's migration journey can be complex and occur over a long period of time. Many people live in Australia on a temporary visa prior to their current permanent visa being granted. Some of these temporary migrants leave Australia and then apply for a permanent visa offshore. Others apply for a permanent visa while they are still living in Australia (i.e. onshore applicants) on a temporary visa.

29 For a substantial number of records the year of arrival reported on the Census is different to the arrival year recorded on the SDB. For the majority of these records the Census year of arrival precedes that of the SDB. In these cases, it is likely that the person was a temporary migrant for a period of time before attaining permanent resident status. Producing estimates using the Census year of arrival variable has resulted in an 'Arrived prior to 2000' group being present in the estimates. This group is indicative of the lag between first arrival as a temporary resident (Census year of arrival) and the granting of permanent resident status (SDB date of arrival). The lag also has a pronounced effect on the estimates for the most recent years of arrival. Temporary migrants who arrived in Australia between 1 January 2000 and Census night 2011, who transitioned to permanent residency after Census night 2011, were out of scope for the SDB linking extract and are not represented in the estimates. Due to the conceptual differences discussed (Notes 27-29) the year of arrival estimates in this publication will not reflect the DIAC reported intake of migrants for individual years of arrival, nor will they reflect year of arrival estimates from the 2011 Census.
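
The effect of the two arrival concepts can be shown with a small example that groups records by Census year of arrival and measures the lag to the SDB arrival year. The records and field names are hypothetical.

```python
def arrival_group(census_year):
    """Group a record by the Census year of arrival, as used for the estimates."""
    return "Arrived prior to 2000" if census_year < 2000 else str(census_year)


# Hypothetical linked records; every SDB arrival year falls within the 2000 to 2011 scope.
linked = [
    {"census_year": 1998, "sdb_year": 2003},  # temporary resident first, permanent visa later
    {"census_year": 2005, "sdb_year": 2005},  # both sources agree
    {"census_year": 2007, "sdb_year": 2010},
]

groups = [arrival_group(rec["census_year"]) for rec in linked]
lags = [rec["sdb_year"] - rec["census_year"] for rec in linked]  # years between the two dates

print(groups)
print(lags)
```

The first record is in scope on its SDB arrival date but appears in the 'Arrived prior to 2000' group because the estimates use the Census variable.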

30 It should also be noted that the estimates in the 'Arrived prior to 2000' group represent only a very small percentage of all permanent migrants who arrived in Australia prior to 2000 (i.e. only those whose date of arrival on the SDB is 1 January 2000 to 9 August 2011).

Country of Birth

31 Estimates in this publication are produced using the Census country of birth variable (BPLP). The concept measured for country of birth is the same for both the Census and SDB variables. However, the Census variable was coded using the Standard Australian Classification of Countries (SACC) (cat. no. 1269.0) as it was on 9 August 2011, whilst the SDB country of birth variable has been coded at the time of record creation over an 11 year period and therefore is based on a classification that has evolved over time.

32 For a substantial number of records, the 4 digit country of birth reported on the Census is different to the 4 digit country of birth recorded on the SDB. For the majority of these records the 2 digit country of birth code is the same and the difference at the 4 digit level is due to differences in coding and the classifications. Estimates for individual 4 digit country of birth may not necessarily reflect the DIAC reported intake of migrants from that country of birth due to the conceptual differences described in Note 31.
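
The comparison described in Note 32 can be sketched as a check of the full 4 digit code followed by a check of the leading 2 digits. The codes used below are illustrative placeholders and are not taken from the SACC.

```python
def compare_country_of_birth(census_code, sdb_code):
    """Classify agreement between two 4 digit country of birth codes (as strings)."""
    if census_code == sdb_code:
        return "same at 4 digit level"
    if census_code[:2] == sdb_code[:2]:
        return "same at 2 digit level only"
    return "different"


# Placeholder codes for illustration only.
print(compare_country_of_birth("1234", "1234"))
print(compare_country_of_birth("1234", "1299"))
print(compare_country_of_birth("1234", "5678"))
```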


COMPARABILITY WITH OTHER DATA

33 Estimates from the 2011 Migrants CDE Project will differ from the estimates produced from other ABS collections and estimates produced from the SDB for several reasons. The estimates are a result of integrating data from two data sources, one an administrative dataset and the other a census. The linked records have been calibrated to known population totals from the SDB, and the resulting dataset is distinct from both the Census and the SDB. Due to the quality issues mentioned in Notes 18 to 25, estimates should generally be treated with caution. Further information about the data and the linking methodology used is available in the Research Paper: Assessing the Quality of Linking Migrant Settlement Records to 2011 Census Data, Aug 2013 (cat. no. 1351.0.55.043). This research paper provides a summary of the Migrants Quality Study.


ACKNOWLEDGEMENT

34 The ABS acknowledges the continuing support provided by the Department of Immigration and Citizenship (DIAC) for the Migrants CDE Project. The provision of data as well as ongoing assistance provided by DIAC is essential to enable this important work to be undertaken. The enhancing of migrant related statistics through data linkage by the ABS would not be possible without their cooperation and support. The ABS also acknowledges the importance of the information provided freely by individuals in the course of the 2011 Census. The Census information of individuals received by the ABS is treated in the strictest confidence as is required by the Census and Statistics Act 1905.