DATA INTEGRATION @ THE NMSU
Data integration, commonly known as DI or data linkage, aims to gain more information from the combination of datasets than is available from the datasets separately, without increasing the burden on providers through further survey collections. Linked datasets are particularly appealing because they are often very large, enabling cross tabulations that may not be possible with survey data because of limited sample sizes. Furthermore, where multiple years of data can be linked, cohort analysis can be undertaken to establish common pathways.
As an Integrating Authority, the ABS is in a good position to integrate sensitive data from administrative sources because we are governed by the Census and Statistics Act 1905, which prevents the release of information that could be attributed to a specific individual. In addition, the ABS adheres to the High Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research Purposes. So, the public can rest assured that their data is in safe hands. If you are interested, further information is available on the ABS website about Statistical Data Integration and ABS integrating authority services.
The NMSU is currently working on two data integration projects, both using extracts from the Department of Immigration and Border Protection's (DIBP) Settlement Database (SDB).
Migrants Census Data Enhancement (CDE) Project
The 2011 Migrants Census Data Enhancement (CDE) Project used both Gold Standard and Bronze Standard probabilistic linking to combine the 2011 Census of Population and Housing with the DIBP SDB. Integrating these data enhances the statistical and research value of both datasets: the settlement outcomes of migrants who arrived permanently in Australia between 1 January 2000 and 9 August 2011 can be analysed in the context of their entry conditions (i.e. their visa type, whether they were a main or secondary applicant, and whether they applied onshore or offshore).
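For readers unfamiliar with probabilistic linking, the general idea can be sketched in a few lines of Python. This is a minimal illustration of a standard agreement-weight approach (in the style of Fellegi and Sunter), not a description of the ABS linking method or of the Gold and Bronze Standards. The field names, the m and u probabilities, and the threshold are all hypothetical values chosen for the example.

```python
import math

# Hypothetical linking fields with illustrative (m, u) probabilities.
# m: chance the field agrees when two records belong to the same person
# u: chance the field agrees by coincidence for two different people
FIELD_PARAMS = {
    "birth_year": (0.98, 0.02),
    "sex": (0.99, 0.50),
    "country_of_birth": (0.95, 0.05),
    "postcode": (0.80, 0.01),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over the linking fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def is_link(rec_a, rec_b, threshold=5.0):
    """Treat a record pair as a link if its weight reaches the (assumed) threshold."""
    return match_weight(rec_a, rec_b) >= threshold


# Made-up records for illustration only.
census_rec = {"birth_year": 1980, "sex": "F", "country_of_birth": "IN", "postcode": "2600"}
sdb_rec = {"birth_year": 1980, "sex": "F", "country_of_birth": "IN", "postcode": "2601"}
other_rec = {"birth_year": 1975, "sex": "M", "country_of_birth": "CN", "postcode": "3000"}
```

In this sketch, `is_link(census_rec, sdb_rec)` is true even though the postcodes differ, because agreement on the other three fields outweighs the single disagreement, while `is_link(census_rec, other_rec)` is false. Real linkage projects also deal with missing values, blocking and clerical review, which are left out here.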
There is a suite of outputs for this project.
The ABS released the Microdata: Australian Census and Migrants Integrated Dataset, 2011 (cat. no. 3417.0.55.001) on 14 February 2014 and it is now available for access via TableBuilder. Work is currently underway on a second microdata release in DataAnalyser, which at this stage is expected at the end of June 2014. DataAnalyser will allow users to undertake statistical analyses such as regression, as well as providing table production functionality.
TableBuilder is an online tool for creating tables and graphs from ABS Microdata. TableBuilder is available for a one-off fee of $925 if you are registering as an organisation, which allows for multiple users. The rate for individuals is a bit lower (see pricing). Staff and students at participating universities listed under the ABS/Universities Australia Agreement have access to TableBuilder products for the purposes of research, teaching and permitted commercial uses.
To apply for access to any of the TableBuilder datasets, you first need to register in the Registration Centre, and join your organisation. You can then log in to the Registration Centre at any time to apply for access to TableBuilder datasets. For further assistance with registering, see the Registration basics page. All available TableBuilder products are listed in the Expected and available Microdata page.
The CDE statistical publication, 'Understanding Migrant Outcomes - Enhancing the Value of Census Data, Australia, 2011' (cat. no. 3417.0), was released on 19 September 2013 and contains a suite of outputs at the national level. To supplement this national data, a series of associated State and Territory level data cubes will be released to the public later this month.
The quality of the linking is discussed in the 'Research Paper: Assessing the quality of Linking Migrant Settlement Records to 2011 Census Data' (cat. no. 1351.0.55.043) which was released on the ABS website on 19 August 2013.
The ABS has linked a 5% sample of the 2011 Census data to the 2006 Census data using Bronze Standard probabilistic linking to create the Microdata: Australian Census Longitudinal Dataset, 2006-2011 (cat. no. 2080.0) in TableBuilder, which was released on 18 December 2013. NMSU will be working to enhance the Australian Census Longitudinal Dataset (ACLD) with information from the Migrants CDE Project Bronze Standard linked dataset. At this stage we anticipate that output from this longitudinal linkage, the Migrants Australian Census Longitudinal Dataset (Migrants ACLD), will be available in mid-2014. We will keep you updated about our progress in future newsletters.
Migrant Personal Income Tax (PIT) Data Integration (DI) Project - Feasibility phase
The Migrant PIT DI project seeks to establish whether an extract of the Department of Immigration and Border Protection (DIBP) Settlement Database (SDB) can be integrated with Personal Income Tax (PIT) data from the Australian Taxation Office. The linking process for the feasibility phase has been completed and analysis is being conducted on the linked file. A research paper is scheduled for release via the ABS website later this year. The research paper will make recommendations as to whether this project will proceed to a dissemination phase with further statistical output.
The linked dataset may provide insight into the economic outcomes of permanent migrants who arrived in Australia from 1 January 2000. The linked dataset is unique in that it contains many disaggregated income variables not collected elsewhere for these recent migrants, including own unincorporated business income, investment income, and superannuation and annuity income. For more information see the project listing on the Public Register of Data Integration Projects.