8 The date of arrival on which the scope is based reflects an individual's latest arrival pertaining to their latest permanent visa. For an offshore applicant, the arrival date is when the applicant arrives in Australia on that permanent visa. However, for a person who applies onshore for a permanent visa, the date of arrival listed is the date of their last entry into Australia.
9 Statistical data integration involves combining information from different data sources such as administrative, survey and/or Census to provide new datasets for statistical and research purposes.
10 Data linking is a key part of statistical data integration and involves combining records from different source datasets using variables that are shared between the sources. Data linkage is performed on unit records that represent individual persons.
Linkage between the Permanent Migrant Data and the 2016 Census
11 The 2016 Permanent Migrant Data records were linked to the 2016 Census of Population and Housing data using a combination of deterministic and probabilistic linkage methodologies.
12 Deterministic data linkage, also known as rule-based linkage, involves assigning record pairs across two datasets that match exactly or closely on common variables. This type of linkage is most applicable where the records from different sources consistently report sufficient information to efficiently identify links. It is less applicable in instances where there are issues with data quality or where there are limited characteristics. The deterministic linkage method used in this project is considered a silver standard linkage because encoded name and address information was used in this phase of the linkage.
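As an illustration only, rule-based linkage on exact agreement can be sketched as follows. The field names (`name_code`, `dob`, `area`), the sample records and the rule that a key combination must be unique in both files are assumptions made for this example; they are not the rules used in the project.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

def deterministic_link(file_a: List[Record], file_b: List[Record],
                       keys: List[str]) -> List[Tuple[str, str]]:
    """Link records that agree exactly on every key variable.

    Assumption for this sketch: a pair is accepted only when the key
    combination occurs exactly once in each file.
    """
    def build_index(records: List[Record]) -> Dict[tuple, List[str]]:
        index: Dict[tuple, List[str]] = {}
        for rec in records:
            key = tuple(rec[k] for k in keys)
            index.setdefault(key, []).append(rec["id"])
        return index

    index_a, index_b = build_index(file_a), build_index(file_b)
    links = []
    for key, ids_a in index_a.items():
        ids_b = index_b.get(key, [])
        if len(ids_a) == 1 and len(ids_b) == 1:
            links.append((ids_a[0], ids_b[0]))
    return links

# Hypothetical records; the values are invented for illustration.
migrants = [
    {"id": "M1", "name_code": "a91f", "dob": "1985-03-02", "area": "X01"},
    {"id": "M2", "name_code": "c07b", "dob": "1990-11-17", "area": "X02"},
]
census = [
    {"id": "C7", "name_code": "a91f", "dob": "1985-03-02", "area": "X01"},
    {"id": "C8", "name_code": "c07b", "dob": "1990-11-17", "area": "X02"},
    {"id": "C9", "name_code": "c07b", "dob": "1990-11-17", "area": "X02"},
]

links = deterministic_link(migrants, census, ["name_code", "dob", "area"])
```

In this sketch M2 is left unlinked because two Census records share its key values, which is the kind of case where a probabilistic pass is more suitable.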
13 Probabilistic linking allows links to be assigned in spite of missing or inconsistent information, providing there is enough agreement on other variables to offset any disagreement. In probabilistic data linkage, records from two datasets are compared and brought together using several variables common to each dataset (Fellegi & Sunter, 1969).
14 A key feature of the methodology is the ability to handle a variety of linking variables and record comparison methods to produce a single numerical measure of how well two particular records match, referred to as the 'linkage weight'. This allows ranking of all possible links and optimal assignment of the link or non-link status (Solon and Bishop, 2009). The probabilistic linkage method used in this project is considered a silver standard linkage because it also used encoded name and address information, together with date of birth, country of birth, year of arrival and codes representing small geographic areas. Further information about name and address encoding can be found in Information paper: Name encoding method for Census 2016.
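A minimal sketch of how a linkage weight can be computed in the Fellegi and Sunter framework is given below. Each linking variable has an m-probability (agreement among true matches) and a u-probability (agreement among non-matches); agreement adds log2(m/u) and disagreement adds log2((1-m)/(1-u)). The m and u values, the field names and the choice to let a missing value contribute nothing are assumptions for illustration, not the parameters used in the project.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical (m, u) probabilities for each linking variable.
M_U: Dict[str, Tuple[float, float]] = {
    "name_code": (0.95, 0.01),
    "dob": (0.97, 0.003),
    "country_of_birth": (0.98, 0.05),
}

def linkage_weight(rec_a: Dict[str, Optional[str]],
                   rec_b: Dict[str, Optional[str]],
                   m_u: Dict[str, Tuple[float, float]] = M_U) -> float:
    """Sum the agreement and disagreement weights over all linking variables."""
    total = 0.0
    for field, (m, u) in m_u.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # assumption: a missing value contributes no evidence
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

# Hypothetical records used to compare candidate pairs.
migrant = {"name_code": "a91f", "dob": "1985-03-02", "country_of_birth": "X1"}
full_match = dict(migrant)
dob_differs = {**migrant, "dob": "1985-03-20"}
dob_missing = {**migrant, "dob": None}

candidates = {"full_match": full_match, "dob_differs": dob_differs,
              "dob_missing": dob_missing}
# Rank candidate pairs from strongest to weakest evidence of a match.
ranked = sorted(candidates,
                key=lambda name: linkage_weight(migrant, candidates[name]),
                reverse=True)
```

Ranking the candidate pairs by weight is what allows a threshold or an assignment rule to separate links from non-links.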
19 The first step of the calibration process adjusted for non-response. The methodology adopted was developed to adjust for non-response in sample surveys. The concepts of non-response and non-links differ: the former results from an action by a person selected in a sample, while the latter is a failure to link a record, most likely because of the quality of its linking variables. However, both situations may result in under- or over-representation, so the methodology developed to adjust for non-response is also suitable for adjusting for non-links. Like its 2011 counterpart, ACMID 2016 is unique in that many characteristics of the non-linked records are known, and these characteristics can therefore be used as inputs into an adjustment for unlinked records.
20 The propensity of a Permanent Migrant Data record to be linked to a Census record was modelled using a logistic regression, which outputs the probability of linking for each record based on that record’s characteristics. Each record was then assigned an initial weight given by the inverse of this probability.
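The relationship between the modelled probability and the initial weight can be sketched as below. The sketch assumes that a logistic regression has already been fitted; the coefficient values and the two indicator variables (`onshore`, `skilled_visa`) are hypothetical and are not taken from the actual model.

```python
import math
from typing import Dict

# Hypothetical fitted coefficients of a logistic regression.
COEF: Dict[str, float] = {"intercept": 0.8, "onshore": 0.4, "skilled_visa": 0.3}

def link_propensity(record: Dict[str, float],
                    coef: Dict[str, float] = COEF) -> float:
    """Predicted probability that a record links, from the logistic model."""
    z = coef["intercept"] + sum(
        value * record.get(name, 0.0)
        for name, value in coef.items() if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))

def initial_weight(record: Dict[str, float],
                   coef: Dict[str, float] = COEF) -> float:
    """Initial weight: the inverse of the predicted probability of linking."""
    return 1.0 / link_propensity(record, coef)

# Hypothetical records with 0/1 indicators.
harder_to_link = {"onshore": 0, "skilled_visa": 0}
easier_to_link = {"onshore": 1, "skilled_visa": 1}

w_hard = initial_weight(harder_to_link)
w_easy = initial_weight(easier_to_link)
```

Records with characteristics that make linking less likely receive larger initial weights, so that the linked records with those characteristics also stand in for similar records that failed to link.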
21 The second step of the calibration process used the weights derived from the first step as an input into calibration to known Permanent Migrant Data subpopulation totals, such as visa group, location of visa grant, applicant status and state/territory of residence. Calibration was then conducted to the following benchmark totals from the Permanent Migrant Data file:
INTERPRETATION OF RESULTS
29 There are several variables common to the two source datasets which have definitional differences.
Year of arrival
30 Estimates in this publication are produced using the 2016 Census year of arrival variable (YARP). The year of arrival question on the Census asks overseas-born people to report the year they first arrived in Australia with the intention of staying for at least one year. The year the person first arrived in Australia to live here for one year or more may have occurred many years before their 'arrival date' as reported in the Permanent Migrant Data.
31 The 'Prior to 2000' year of arrival group represents those permanent migrants (whose Permanent Migrant Data arrival date is 1 January 2000 to 9 August 2016) who reported on the Census that they first came to Australia to live for one year or more prior to 2000. For some individuals, their year of arrival as reported on the Census is different to their Permanent Migrant Data arrival date pertaining to their permanent visa. The Permanent Migrant Data arrival date reflects an individual’s latest arrival pertaining to their latest permanent visa (see Note 8). Where the Census year of arrival precedes that of the Permanent Migrant Data, it is likely that the person was a temporary migrant for a period of time before attaining permanent resident status.
32 Due to the conceptual differences discussed in Notes 30 and 31, the year of arrival estimates in this publication will not reflect the Department of Home Affairs reported migrant intake for individual years of arrival, nor will they reflect year of arrival estimates from the 2016 Census of Population and Housing.
Country of Birth
33 Estimates in this publication are produced using the 2016 Census country of birth variable (BPLP). The concept measured for country of birth is the same for both the Census and Permanent Migrant Data. However, the Census variable was coded using the Standard Australian Classification of Countries (SACC 1269.0) as it was on 9 August 2016, whilst the Permanent Migrant Data country of birth variable has been coded at the time of record creation over a 16 year period and is therefore based on a classification that has evolved over time.
34 For a substantial number of records, the 4 digit country of birth reported on the Census is different to the 4 digit country of birth recorded on the Permanent Migrant Data. For the majority of these records the 2 digit country of birth code is the same and the difference at the 4 digit level is due to differences in coding and the classifications.
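The comparison described in the note above can be sketched as a simple check of codes at the two levels of the classification. The code values used here are placeholders invented for the example and should not be read as actual SACC codes.

```python
def compare_country_codes(census_code: str, pmd_code: str) -> str:
    """Classify agreement between two 4 digit country of birth codes."""
    if census_code == pmd_code:
        return "same_4_digit"
    if census_code[:2] == pmd_code[:2]:
        # Same 2 digit group, different 4 digit code: the kind of
        # difference attributed to coding and classification changes.
        return "same_2_digit_only"
    return "different"

# Placeholder codes, invented for illustration.
pairs = [("9101", "9101"), ("9101", "9102"), ("9101", "8101")]
results = [compare_country_codes(census, pmd) for census, pmd in pairs]
```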
35 Due to the conceptual differences described in Notes 33 and 34, estimates for an individual 4 digit country of birth may not necessarily reflect the Department of Home Affairs reported migrant intake from that country of birth.
COMPARABILITY WITH OTHER DATA
36 Estimates from the 2016 Australian Migrants and Census Integrated Dataset will differ from the estimates produced from other ABS collections and estimates produced from the Permanent Migrant Data for several reasons. The estimates are a result of integrating data from two data sources, one an administrative dataset and the other a census. The linked records have been calibrated to known population totals from the Permanent Migrant Data, and the resulting dataset is distinct from both the Census and the Permanent Migrant Data. Due to the quality issues mentioned in Notes 24 to 28, estimates should generally be treated with caution.
39 In accordance with the Census and Statistics Act 1905, data are subject to a confidentiality process before release as noted above. This confidentiality process is undertaken to avoid releasing information that may allow the identification of particular individuals, families, households, dwellings or businesses.
Perturbation of data
40 To minimise the risk of identifying individuals in aggregate statistics, a technique is used to randomly adjust cell values. This technique is called perturbation. Perturbation involves small random adjustments of the statistics and is considered the most satisfactory technique for avoiding the release of identifiable statistics while maximising the range of information that can be released. These adjustments have a negligible impact on the underlying pattern of the statistics.
41 The introduction of these random adjustments results in tables not adding up. While some datasets apply a technique called additivity to give internally consistent results, additivity has not been implemented on the 2016 ACMID. As a result, randomly adjusted individual cells will be consistent across tables, but the totals in any table will not be the sum of the individual cell values. The size of the difference between summed cells and the relevant total will generally be very small.
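The effect described above can be illustrated with a toy example. This is not the ABS perturbation method: the hash-based adjustment, the range of plus or minus 2 and the cell labels are all assumptions chosen only to show why a cell can be consistent across tables while totals are not additive.

```python
import hashlib
from typing import Dict

def perturb(cell_key: str, value: int, max_adjust: int = 2) -> int:
    """Toy perturbation: a small adjustment derived from the cell's identity.

    Deriving the adjustment from the key means the same cell receives
    the same adjusted value in every table in which it appears.
    """
    digest = hashlib.sha256(cell_key.encode("utf-8")).digest()
    adjustment = digest[0] % (2 * max_adjust + 1) - max_adjust  # -2 .. +2
    return max(0, value + adjustment)

# Hypothetical table: three cells and their true total.
cells: Dict[str, int] = {"Skilled": 120, "Family": 80, "Humanitarian": 15}
true_total = sum(cells.values())

perturbed_cells = {key: perturb(key, value) for key, value in cells.items()}
# The total is perturbed as a cell in its own right, not recomputed.
perturbed_total = perturb("Total", true_total)

gap = sum(perturbed_cells.values()) - perturbed_total
```

Because each cell and the total are adjusted separately, `gap` is not guaranteed to be zero, but it is bounded by the sum of the individual adjustments and is therefore small.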
3417.0.55.001 - Microdata: Australian Census and Migrants Integrated Dataset, 2016 Quality Declaration
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 18/07/2018