DATA INTEGRATION - DIVERSE SOURCES AND BIG DATA OFFER NEW INSIGHTS
The ABS sees its future as a statistical organisation working to a solutions centred data model, underpinned by a strong methodological effort in providing statistical tools that will improve the ABS's ability to link, combine and repurpose data in meaningful ways.
There are two general approaches to linking data - either across collections, or through time. Cross collection linkage relies on finding common elements in different source datasets and then using these common elements to merge the datasets together, while time based linkage creates a time series of data from a number of 'single point in time' observations.
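The two approaches can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the record layouts, the `person_id` linkage key and the function names are assumptions made for illustration, not ABS data structures or methods, and it assumes a single clean identifier shared by both sources.

```python
from collections import defaultdict


def link_across_collections(left, right, key):
    """Cross collection linkage: merge records that share a common key value."""
    right_by_key = {rec[key]: rec for rec in right}
    linked = []
    for rec in left:
        match = right_by_key.get(rec[key])
        if match is not None:
            # Combine the fields of both source records into one linked record.
            linked.append({**rec, **match})
    return linked


def link_through_time(snapshots, key):
    """Time based linkage: build a time series per unit from point in time observations."""
    series = defaultdict(list)
    for period in sorted(snapshots):
        for rec in snapshots[period]:
            values = {k: v for k, v in rec.items() if k != key}
            series[rec[key]].append((period, values))
    return dict(series)


# Hypothetical toy data, not real records.
survey = [{"person_id": 1, "employed": True}, {"person_id": 2, "employed": False}]
admin = [{"person_id": 1, "benefit": 0}, {"person_id": 3, "benefit": 120}]
cross = link_across_collections(survey, admin, "person_id")

snapshots = {
    2011: [{"person_id": 1, "state": "TAS"}],
    2006: [{"person_id": 1, "state": "QLD"}],
}
through_time = link_through_time(snapshots, "person_id")

print(cross)
print(through_time)
```

Only records present in both sources survive the cross collection merge, while the time based linkage orders each unit's observations by period regardless of the order in which the snapshots were supplied.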
Transforming statistics in a major way relies on new approaches to sourcing or creating datasets, and integrating components of these datasets with a wide range of administrative data held by governments and organisations. The ABS needs to use data integration as a statistical tool to increase the depth and breadth of available statistics without adding extra costs to government or additional respondent burden on the Australian community.
Statistical data integration developments rely on partnering with the government, research and private sectors. The ABS is building capability through new and existing partnerships with data custodians to improve accessibility to public information, maximising its use for statistical purposes. Through contributing to Australian Public Service wide initiatives, the ABS is supporting the increased availability of data and developing greater capacity to create ‘on demand’ integrated datasets.
NEW INSIGHTS FROM CROSS-SECTIONAL INTEGRATED DATA
The ABS has expanded a number of existing datasets by using data integration to provide a point in time view of social, economic and environmental issues.
By combining information from surveys, administrative collections and censuses, a more complete picture of the circumstances of individuals, households and businesses can be seen. Integrated datasets have the flexibility to be combined with additional point in time and/or longitudinal information to understand broader implications as policy directions evolve and Australian society changes.
MENTAL HEALTH SERVICES AND THE CENSUS
In partnership with the National Mental Health Commission, the Department of Health and the Department of Human Services, the ABS has integrated a subset of Commonwealth subsidised mental health related data items from the Medicare Benefits Schedule and the Pharmaceutical Benefits Scheme, with demographic details from the Census of Population and Housing. This dataset supported new analysis of the effectiveness and efficiency of mental health services in Australia. For more information on this initiative, see the article about Unlocking the power of statistics.
MEASURING EDUCATIONAL OUTCOMES OVER TIME
The ABS has led a number of data integration initiatives in partnership with state and territory agencies, the Australian Government Department of Education and Training and other stakeholders that aim to address data gaps and build the evidence base in child development, education and training statistics. This work combined information from the Census with the Australian Early Development Census and National Assessment Program - Literacy and Numeracy to produce experimental estimates for Queensland and Tasmania. These datasets were used to analyse the impact of personal, family, social and economic characteristics on school achievement and child development over time.
In addition, the ABS collaborated with the National Centre for Vocational Education Research and other stakeholders to demonstrate the potential for measuring longitudinal post-study outcomes by linking vocational education and training and census data.
Ultimately, these pathfinder projects aim to expand the information available to answer policy and research interests.
MIGRANT PERSONAL INCOME TAX DATA INTEGRATION FEASIBILITY PROJECT
The Migrant Personal Income Tax Data Integration project linked an extract of the Department of Social Services settlement database with data from the Australian Taxation Office to test the feasibility of developing an integrated dataset for research and statistical purposes. This project demonstrated that the linking of these records was feasible, and a research paper documenting the project findings has been released.
The linked dataset will provide a comprehensive picture of the economic outcomes of migrants to assist policy makers and researchers to better understand the experience of migrants and their contribution to Australia.
This is particularly important given the prominence of Australian immigration policy in shaping future population growth, and the major changes that have occurred in migration policies over the last decade.
INCREASING AVAILABILITY OF ABORIGINAL AND TORRES STRAIT ISLANDER STATISTICS
Integrated and longitudinal data is meeting growing demand for Aboriginal and Torres Strait Islander statistics without the need to collect additional information. Examples include:
- improving the life expectancy estimates for Aboriginal and Torres Strait Islander people by linking death registrations to Census records
- exploring whether school leavers from 2006 had continued on to further study and/or had moved into the workforce by 2011.
NEW INSIGHTS FROM LONGITUDINAL DATA
Longitudinal data provides researchers and policy makers with the ability to study changing patterns in social, economic and environmental conditions. Recognising that the complexities of pathways and transitions involve many external factors in an individual's life, longitudinally integrated datasets provide evidence to understand these trajectories.
The ABS currently produces three key longitudinally linked datasets:
- the Business Longitudinal Database
- the Australian Census Longitudinal Dataset
- the Longitudinal Labour Force Dataset.
THE BUSINESS LONGITUDINAL DATABASE
The Business Longitudinal Database comprises several datasets containing characteristics and financial information about small and medium businesses. It has already provided new information about business innovation, efficiency performance and likelihood of business survival.
Recently, in conjunction with the Department of Industry and Science, the ABS has developed a new integrated firm-level dataset, the Expanded Analytical Business Longitudinal Database, which links financial and characteristics data for all active businesses in the Australian economy from 2001-02 to 2012-13.
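As a rough illustration of how a longitudinal firm-level dataset can support questions such as business survival, the following sketch computes the share of firms active in one period that are still active in a later one. The panel layout, the firm identifiers and the `survival_rate` function are hypothetical and do not reflect the actual structure of the Business Longitudinal Database.

```python
def survival_rate(panel, start, end):
    """Share of firms active in `start` that are also active in `end`.

    `panel` maps a period label to the set of firm identifiers active in it.
    """
    cohort = panel[start]
    if not cohort:
        raise ValueError("no active firms in the starting period")
    survivors = cohort & panel[end]
    return len(survivors) / len(cohort)


# Hypothetical toy panel, not real business data.
panel = {
    "2001-02": {"A", "B", "C", "D"},
    "2002-03": {"A", "B", "D", "E"},
    "2003-04": {"A", "D", "E"},
}
rate = survival_rate(panel, "2001-02", "2003-04")
print(rate)
```

Firm E enters after the starting period, so it is excluded from the cohort and does not affect the rate.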
THE AUSTRALIAN CENSUS LONGITUDINAL DATASET
The Australian Census Longitudinal Dataset combines socio-demographic information about individuals from the 2006 and 2011 Censuses and allows the study of changing patterns in social and economic conditions at the individual level. It has been used by researchers and policy makers to:
- better understand the factors associated with identification of Indigenous status by Aboriginal and Torres Strait Islander peoples
- investigate employment outcomes of workers who moved between industries
- investigate changes in family relationships and transitions of individuals.
THE LONGITUDINAL LABOUR FORCE DATASET
The Longitudinal Labour Force Dataset combines 36 monthly labour force surveys, along with data collected from other ABS surveys, from January 2008 to December 2010. It allows in depth analysis of the Australian labour market over time and provides information about the labour force status, socio-demographic characteristics, employment, industry and occupation of individuals.
It has been used by researchers and policy makers to investigate:
- labour mobility
- household structure and economic participation
- employment outcomes of migrants
- increases in unemployment during recessions.
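Topics such as labour mobility can be explored by following the same individuals from month to month. The sketch below is a minimal illustration, assuming a hypothetical layout in which each person has an ordered list of monthly labour force statuses; the status labels and the `count_transitions` function are illustrative and are not the dataset's actual schema.

```python
from collections import Counter


def count_transitions(histories):
    """Count month-to-month pairs of labour force status across individuals."""
    transitions = Counter()
    for statuses in histories.values():
        # Pair each month's status with the status in the following month.
        for previous, current in zip(statuses, statuses[1:]):
            transitions[(previous, current)] += 1
    return transitions


# Hypothetical toy histories, not real survey responses.
histories = {
    "p1": ["employed", "employed", "unemployed"],
    "p2": ["unemployed", "employed", "employed"],
    "p3": ["not in labour force", "not in labour force", "employed"],
}
transitions = count_transitions(histories)
print(transitions[("employed", "unemployed")])
```

Pairs in which the status is unchanged, such as employed to employed, are counted as well, so the totals can be turned into transition rates by dividing by the number of observed month pairs.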
Advances in statistical methods, technology and the availability of administrative data are creating rich sources for evidence based analysis of government programs.
In its role as a provider and coordinator of official statistics, the ABS is harnessing emerging opportunities to create new datasets, fill data gaps and increase the accessibility of statistics.
The ABS is moving towards a solutions centred data provision model, underpinned by data integration. Transforming the ABS's statistical output will rely on new approaches to sourcing or creating datasets, as well as integrating components of these datasets with a wide range of other data held by governments and organisations.
The ABS will continue to use data integration as a standard statistical tool to increase the depth and breadth of available statistics, while technological advances and modern statistical techniques continue to open up new possibilities in data integration and big data, facilitate greater access to microdata and safeguard the privacy of individuals.
Future data options will harness new technologies and create new access pathways for our research and policy partners. Find out more about ABS data integration at: www.abs.gov.au/dataintegration
The ABS is building capability through new and existing partnerships with data custodians to improve accessibility to public information, maximising its use for statistical purposes. Through participating in public sector initiatives, the ABS is supporting the increased availability of data and developing greater capacity to create new 'on demand' integrated information.