3417.0.55.001 - Microdata: Australian Census and Migrants Integrated Dataset, 2011 Quality Declaration 
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 14/02/2014  First Issue

LINKING METHODOLOGY

DATA SOURCES
SCOPE AND COVERAGE
ESTIMATION METHOD
RELIABILITY OF ESTIMATES
INTERPRETATION OF RESULTS
COMPARABILITY WITH OTHER DATA


DATA SOURCES

The Australian Census and Migrants Integrated Dataset (ACMID), 2011 was produced using the following data sources.

a) 2011 Census of Population and Housing


For information about the 2011 Census and collection methodology please refer to the information provided on the ABS website at Census 2011 Reference and Information. Information about the data quality of the Census is available on the ABS website under Census Data Quality.

b) Settlement Data Base

The Department of Immigration and Border Protection's (DIBP) Settlement Data Base (SDB) is an administrative database combining data regarding permanent settlers in Australia from various data sources within DIBP. The SDB provides statistical data for the planning of settlement services within DIBP and for other government and community agencies involved in the settlement of migrants. Updated address information is incorporated from the Department of Human Services (Medicare Australia).


SCOPE AND COVERAGE

Scope

The ACMID, 2011 contains information on persons who responded to the 9 August 2011 Census of Population and Housing and who also had a permanent visa record on the Department of Immigration and Border Protection's Settlement Data Base (SDB) with a date of arrival between 1 January 2000 and 9 August 2011 (inclusive). Persons excluded from the ACMID, 2011 are:
    • Persons for whom the 2011 Census record was imputed
    • Persons whose Census record indicated that they were an overseas visitor
    • Persons who were out of the country on Census night
    • Non-visa settlers (e.g. some New Zealand citizens who have migrated to Australia)
    • Deceased persons
    • Permanent migrants from the Permanent - Other visa stream.

The SDB date of arrival on which the scope is based reflects an individual's latest arrival pertaining to their latest permanent visa. For an offshore applicant, the SDB arrival date is when the applicant arrives in Australia on that permanent visa. However, for a person who applies onshore for a permanent visa, the date of arrival listed on the SDB is the date of their last entry into Australia.

Coverage


Of the 1,315,048 SDB records that were in-scope, 1,003,532 records (76%) were linked to a Census record. After scope exclusions, the sample comprises 974,545 persons.
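
The linkage rate quoted above can be reproduced from the published counts. The following is a minimal sketch in Python; the variable names are illustrative and are not part of any ABS product.

```python
# Counts as published in this section.
in_scope_sdb = 1_315_048      # in-scope SDB records
linked = 1_003_532            # SDB records linked to a Census record
after_exclusions = 974_545    # persons remaining after scope exclusions

link_rate = linked / in_scope_sdb
unlinked = in_scope_sdb - linked
removed_by_exclusions = linked - after_exclusions

print(f"link rate: {link_rate:.1%}")
print(f"unlinked records: {unlinked:,}")
print(f"linked records removed by scope exclusions: {removed_by_exclusions:,}")
```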

The in-scope SDB records that did not link are referred to as unlinked records. Unlinked records may include missed links (people for whom a record existed on both the Census and the SDB) or non-links (people for whom a record existed on the SDB but not on the Census).

Some groups of records were more likely to link than others. This resulted in over-representation of some groups and under-representation of others. Records are more difficult to link when a person has poorly reported, poorly coded, missing or non-applicable values for variables that are used for linking purposes. The non-random distribution of links has the potential to cause bias. Potential bias associated with over-representation or under-representation is addressed in the calibration process described below.


ESTIMATION METHOD

Calibration

The estimates are obtained by assigning a 'weight' to each linked record. The weight is a value which indicates how many SDB records are represented by the linked record. Weights aim to adjust for the fact that the linked SDB records may not be representative of all the SDB records. The weights on the ACMID, 2011 range from 0.7 to 4.5.

The number of SDB records that were in-scope for linking was adjusted from 1,315,048 to 1,311,654 to account for incomplete information about deceased persons on the SDB. The 3,394 person downward adjustment was calculated using ABS Demography statistics. The two-step calibration process then weighted the original 1,003,532 sample up to a death-adjusted population total of 1,311,654. After accounting for the exclusions mentioned, the calibrated population total for the data in this publication is 1,273,701.
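
The totals in the paragraph above are internally consistent, and they imply an average weight of roughly 1.3 per linked record. The sketch below checks that arithmetic; the average weight is a derived figure rather than one published in this declaration, and the variable names are illustrative.

```python
# Figures as published in this section.
in_scope_sdb = 1_315_048
death_adjustment = 3_394
linked_sample = 1_003_532
sample_after_exclusions = 974_545
calibrated_total_after_exclusions = 1_273_701

# Death-adjusted population total used as the calibration benchmark.
death_adjusted_total = in_scope_sdb - death_adjustment

# Average weight implied by the totals (derived here, not published).
mean_weight_all = death_adjusted_total / linked_sample
mean_weight_published = calibrated_total_after_exclusions / sample_after_exclusions

print(f"death-adjusted total: {death_adjusted_total:,}")
print(f"implied mean weight, all linked records: {mean_weight_all:.3f}")
print(f"implied mean weight, after exclusions: {mean_weight_published:.3f}")
```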

The first step of the calibration process used methodology developed to adjust for non-response in sample surveys. The concepts of non-response and non-links differ in that the former is a result of an action by a person selected in a sample, and the latter is the failure to link a record likely as a result of the quality of its linking variables. However, both situations may result in under/over representation, and as such the methodology developed to adjust for non-response is suitable to apply to adjust for non-links. The ACMID, 2011 is unique in that many characteristics of the non-links are known, and these characteristics can therefore be used as inputs into a non-links adjustment.

The propensity of an SDB record to be linked to a Census record was modelled, and each record was assigned an initial weight. Records in the linked dataset which share characteristics with unlinked records are given higher weights by this model, such that unlinked records are adequately represented on the linked file.
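
The idea of weighting by link propensity can be illustrated with a simplified weighting-class adjustment, in which the propensity is the observed link rate within a cell of SDB characteristics and the initial weight is its inverse. This is a stand-in for illustration only: the declaration does not specify the form of the propensity model, and the records, cell definitions and function name below are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy extract: (visa_stream, age_group, linked_to_census).
sdb_records = [
    ("Skilled", "25-34", True),
    ("Skilled", "25-34", True),
    ("Skilled", "25-34", False),
    ("Family", "25-34", True),
    ("Family", "25-34", False),
    ("Humanitarian", "15-24", True),
    ("Humanitarian", "15-24", False),
    ("Humanitarian", "15-24", False),
]

def initial_weights(records):
    """Return {cell: weight}, where weight = 1 / observed link rate in the cell."""
    totals = defaultdict(int)
    links = defaultdict(int)
    for stream, age, linked in records:
        cell = (stream, age)
        totals[cell] += 1
        if linked:
            links[cell] += 1
    # Cells with no links cannot be represented by a linked record and would
    # need to be collapsed with another cell; they are skipped in this sketch.
    return {cell: totals[cell] / links[cell] for cell in totals if links[cell]}

weights = initial_weights(sdb_records)
for cell, w in sorted(weights.items()):
    print(cell, round(w, 2))
```

Cells with a low link rate receive a higher weight, so the linked records in those cells stand in for the unlinked records that resemble them.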

The second step of the calibration process took the weighted file produced in step one and calibrated it to the death-adjusted SDB totals. Calibration was conducted using the following SDB variables: age group cross-classified with sex, country of birth, visa group, and visa stream cross-classified with state.
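
One common way to calibrate weights to known marginal totals is raking (iterative proportional fitting). The sketch below uses raking purely as an illustration; the declaration does not state which calibration algorithm was used, and the two benchmark variables, their totals and the function name are hypothetical.

```python
def rake(rows, weights, targets, iterations=50):
    """Adjust weights so weighted totals match each benchmark margin.

    rows:    list of dicts holding the calibration variables for each record
    weights: initial weights, one per row (e.g. from a propensity step)
    targets: {variable: {category: benchmark_total}}
    """
    w = list(weights)
    for _ in range(iterations):
        for var, totals in targets.items():
            # Current weighted total for each category of this variable.
            current = {}
            for row, wi in zip(rows, w):
                current[row[var]] = current.get(row[var], 0.0) + wi
            # Scale each weight so this margin matches its benchmark.
            w = [wi * totals[row[var]] / current[row[var]]
                 for row, wi in zip(rows, w)]
    return w

# Hypothetical linked records and benchmark totals.
rows = [
    {"sex": "F", "state": "NSW"},
    {"sex": "F", "state": "VIC"},
    {"sex": "M", "state": "NSW"},
    {"sex": "M", "state": "VIC"},
]
initial = [1.2, 1.4, 1.3, 1.1]
targets = {"sex": {"F": 6.0, "M": 4.0}, "state": {"NSW": 5.0, "VIC": 5.0}}

calibrated = rake(rows, initial, targets)
print([round(x, 3) for x in calibrated])
```

After raking, the weighted counts reproduce every benchmark margin at once, which is the property that calibration to SDB totals is intended to deliver.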

Estimation

Estimates are obtained by summing the weights of persons with the characteristic of interest. Any discrepancies between totals and sums of components are due to rounding.
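
As a minimal sketch of this rule, the example below sums the weights of records that have a given characteristic. The records, field names and weights are hypothetical.

```python
# Hypothetical linked records with calibrated weights.
linked_records = [
    {"visa_stream": "Skilled", "state": "NSW", "weight": 1.2},
    {"visa_stream": "Skilled", "state": "VIC", "weight": 0.9},
    {"visa_stream": "Family", "state": "NSW", "weight": 2.5},
    {"visa_stream": "Humanitarian", "state": "NSW", "weight": 4.1},
]

def estimate(records, has_characteristic):
    """Estimated number of persons: the sum of weights of matching records."""
    return sum(r["weight"] for r in records if has_characteristic(r))

skilled = estimate(linked_records, lambda r: r["visa_stream"] == "Skilled")
nsw = estimate(linked_records, lambda r: r["state"] == "NSW")
print(round(skilled, 1), round(nsw, 1))
```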


RELIABILITY OF ESTIMATES

Error in estimates produced using the ACMID, 2011 may occur due to false links and the non-random distribution of unlinked records.

False links


The ACMID, 2011 project produced four datasets. Three of these datasets were produced using Bronze Standard probabilistic linking. The term Bronze Standard indicates that name and address were not used as linking variables. A Gold Standard dataset was also produced. The term Gold Standard indicates that name and address were used as linking variables. The Gold Standard Dataset was used as a benchmark against which the quality of the three Bronze Standard datasets was assessed. Where an SDB record was linked to a Census record on the Gold Standard Dataset but was linked to a different Census record on a Bronze Standard dataset, the record on the bronze dataset is considered to be a false link. Additionally, where an SDB record was not linked to a Census record on the Gold Standard Dataset but was linked to a Census record on a Bronze Standard dataset, the record on the bronze dataset is also considered to be a false link.
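
The two false-link rules above can be expressed as a comparison of two link tables. The sketch below assumes each dataset is represented as a mapping from an SDB record identifier to the linked Census record identifier; the identifiers and the function name are hypothetical.

```python
def classify_bronze_links(gold, bronze):
    """Label each Bronze link by comparing it with the Gold Standard benchmark.

    gold, bronze: dicts mapping an SDB record id to the linked Census record id.
    """
    labels = {}
    for sdb_id, census_id in bronze.items():
        if sdb_id not in gold:
            # Linked on Bronze but not linked at all on Gold.
            labels[sdb_id] = "false link (unlinked on Gold)"
        elif gold[sdb_id] != census_id:
            # Linked on both, but to different Census records.
            labels[sdb_id] = "false link (different Census record on Gold)"
        else:
            labels[sdb_id] = "agrees with Gold"
    return labels

gold = {"S1": "C10", "S2": "C20", "S4": "C40"}
bronze = {"S1": "C10", "S2": "C25", "S3": "C30"}

labels = classify_bronze_links(gold, bronze)
false_links = sum(label.startswith("false link") for label in labels.values())
print(labels)
print("estimated false links:", false_links)
```

Note that S4, which is linked on Gold but not on Bronze, is not a false link under these rules; it is simply absent from the Bronze dataset.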

Three Bronze Standard datasets labelled High, Medium and Low were created, and differ in terms of the threshold used to accept or reject links in the probabilistic linking process. The High label indicates a high threshold which results in fewer overall links and fewer estimated false links, and conversely the Low label indicates a low threshold which results in more overall links but also a higher level of estimated false links.
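
The trade-off between the number of links and the number of false links can be seen in a toy example. The match scores, cut-off values and true-match flags below are hypothetical and are not taken from the ACMID linkage; in practice the true status of a link is unknown and is approximated by comparison with the Gold Standard Dataset.

```python
# Hypothetical candidate pairs: (match_score, is_true_match).
candidate_pairs = [
    (18.2, True), (15.7, True), (12.4, True), (11.9, False),
    (9.3, True), (8.8, False), (6.1, False), (5.4, True),
]

# Hypothetical cut-offs for the three Bronze Standard variants.
thresholds = {"High": 12.0, "Medium": 9.0, "Low": 6.0}

def summarise(pairs, cutoff):
    """Return (links accepted, false links accepted) at the given cut-off."""
    accepted = [is_true for score, is_true in pairs if score >= cutoff]
    return len(accepted), accepted.count(False)

for label, cutoff in thresholds.items():
    links, false_links = summarise(candidate_pairs, cutoff)
    print(f"{label}: {links} links accepted, {false_links} false")
```

Lowering the cut-off admits more pairs, which raises both the number of links and the number of false links among them.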

The ACMID, 2011 project used the Low dataset for the statistics produced in this TableBuilder product. The ACMID Bronze Low contains an estimated 115,451 false links. Users should note that the estimation of false links relates only to the difference between the Gold Standard Dataset and the Bronze Standard datasets, and that there may also be records on the Gold Standard Dataset for which the person in the SDB record is not the same person as that of the linked Census record. This error has not been estimated and is not reflected in the estimate of false links.

The calibration process does not mitigate the error introduced by false links or the error introduced in the Gold Standard probabilistic linking process. Due to the quality issues mentioned above, estimates should generally be treated with caution. Further information about the data and the linking methodology used is available in the Research Paper: Assessing the Quality of Linking Migrant Settlement Records to 2011 Census Data, Aug 2013 (cat. no. 1351.0.55.043).

Unlinked records


Error introduced by the under-representation or over-representation of characteristic-based groups in unlinked records has been mitigated to some extent by the two-step calibration process described earlier.

Measures of error


In survey data, sampling error is estimated using a measure of Relative Standard Error (RSE). Whilst RSEs can be produced for this data, they would not represent the error introduced by false links or the error introduced in the Gold Standard probabilistic linking process, and have therefore not been included in this product.


INTERPRETATION OF RESULTS

There are several variables common to the two source datasets which have definitional differences.

Year of arrival


Estimates are produced using the 2011 Census year of arrival variable (YARP). On the Census, people are asked to report when they first came to Australia to live for one year or more. This will be regardless of visa type and may have occurred many years before their latest permanent visa reported on the SDB.

The scope of the SDB records was assessed using the SDB date of arrival variable. The SDB date of arrival reflects an individual’s latest arrival pertaining to their latest permanent visa. However, an individual's migration journey can be complex and occur over a long period of time. Many people live in Australia on a temporary visa prior to their current permanent visa being granted. Some of these temporary migrants leave Australia and then apply for a permanent visa offshore. Others apply for a permanent visa while they are still living in Australia (i.e. onshore applicants) on a temporary visa.

For a substantial number of records the year of arrival reported on the Census is different to the arrival year recorded on the SDB. For the majority of these records the Census year of arrival precedes that of the SDB. In these cases, it is likely that the person was a temporary migrant for a period of time before attaining permanent resident status. Producing estimates using the Census year of arrival variable has resulted in an 'Arrived prior to 2000' group being present in the estimates. This group is indicative of the lag between first arrival as a temporary resident (Census year of arrival) and the granting of permanent resident status (SDB date of arrival). The lag also has a pronounced effect on the estimates for the most recent years of arrival. Temporary migrants who arrived in Australia between 1 January 2000 and Census night 2011, who transitioned to permanent residency after Census night 2011, were out of scope for the SDB linking extract and are not represented in the estimates. Due to these conceptual differences the year of arrival estimates in this product will not reflect the DIBP reported intake of migrants for individual years of arrival, nor will they reflect year of arrival estimates from the 2011 Census.

It should also be noted that the estimates present in the 'Arrived prior to 2000' group only represent a small proportion of all permanent migrants who arrived in Australia prior to 2000 as only those whose date of arrival on the SDB is between 1 January 2000 and 9 August 2011 inclusive are included.
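
The effect of the two arrival concepts can be sketched as follows. The example assumes each linked record carries a Census year of arrival (YARP) and an SDB arrival year; the sample values and function names are hypothetical.

```python
def arrival_group(census_year_of_arrival):
    """Output category based on the Census year of arrival (YARP)."""
    if census_year_of_arrival < 2000:
        return "Arrived prior to 2000"
    return str(census_year_of_arrival)

def arrival_lag(census_year_of_arrival, sdb_arrival_year):
    """Years between first arrival (Census) and latest arrival (SDB)."""
    return sdb_arrival_year - census_year_of_arrival

# Hypothetical linked records: (Census YARP, SDB arrival year).
linked = [(1998, 2003), (2005, 2005), (2007, 2010)]
for yarp, sdb_year in linked:
    print(arrival_group(yarp), arrival_lag(yarp, sdb_year))
```

The first record is in scope because its SDB arrival year is 2003, yet it is reported under 'Arrived prior to 2000' because the Census variable records the first arrival in 1998.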

Country of Birth


Estimates in this publication are produced using the Census country of birth variable (BPLP). The concept measured for country of birth is the same for both the Census and SDB variables. However, the Census variable was coded using the Standard Australian Classification of Countries (SACC) (cat. no. 1269.0) as it was on 9 August 2011, whilst the SDB country of birth variable has been coded at the time of record creation over an 11-year period and therefore is based on a classification that has evolved over time.

For a substantial number of records, the 4-digit country of birth reported on the Census is different to the 4-digit country of birth recorded on the SDB. For the majority of these records the 2-digit country of birth code is the same, and the difference at the 4-digit level is due to differences in coding and in the classifications. Estimates for an individual 4-digit country of birth may not necessarily reflect the DIBP reported intake of migrants from that country of birth due to the differences described above.


COMPARABILITY WITH OTHER DATA
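
The comparison of country of birth at the two levels can be sketched as below, assuming codes are held as 4-character strings whose first two characters give the 2-digit level. The codes in the example are illustrative placeholders and are not asserted to be actual SACC codes.

```python
def compare_country_of_birth(census_code, sdb_code):
    """Compare two 4-digit country codes at the 4-digit and 2-digit levels."""
    if census_code == sdb_code:
        return "same at 4-digit level"
    if census_code[:2] == sdb_code[:2]:
        return "same at 2-digit level only"
    return "different at 2-digit level"

# Illustrative placeholder codes: (Census code, SDB code).
pairs = [("6101", "6101"), ("6101", "6100"), ("6101", "7103")]
for census_code, sdb_code in pairs:
    print(census_code, sdb_code, compare_country_of_birth(census_code, sdb_code))
```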

Estimates from the ACMID, 2011 project will differ from the estimates produced from other ABS collections and estimates produced from the SDB for several reasons. The estimates are a result of integrating data from two data sources, one an administrative dataset and the other a census. The linked records have been calibrated to known population totals from the SDB, and the resulting dataset is distinct from both the Census and the SDB. Due to the quality issues mentioned earlier, estimates should generally be treated with caution. Further information about the data and the linking methodology used is available in the Research Paper: Assessing the Quality of Linking Migrant Settlement Records to 2011 Census Data, Aug 2013 (cat. no. 1351.0.55.043). This research paper provides a summary of the Migrants Quality Study.