1001.0 - Australian Bureau of Statistics -- Annual Report, Report on ABS performance in 2015-16  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 13/10/2016   

SPECIAL ARTICLE

DATA INTEGRATION
For well over 100 years, the ABS has been Australia's trusted national statistical agency, providing official statistics to inform on a wide range of matters of importance such as parliamentary representation, the environment, the economy, and jobs. As the economy and society become more complex and interconnected, official statistics must also change to keep pace and remain relevant.

Like many national statistical offices around the globe, the ABS has been adopting data integration techniques to increase the depth and breadth of information available to support research and public policy in a way that is less costly and less intrusive on households and businesses than traditional survey methods. Evolving technology and advances in analytical capability have replaced the need for manual processes and are delivering new insights and value in a way that could not have been envisaged even two decades ago.

In simple terms, data integration is a well-established method of bringing together existing information about people, places, businesses or events in a way that protects privacy and confidentiality.

In recognition of this evolving information environment, the ABS began investing in a dedicated data integration facility in 2005. The facility was subsequently independently accredited as a Commonwealth data integration facility in 2012. These investments have enhanced the internal mechanisms the ABS uses to keep personal information more secure, over and above the ABS’ already strong protections.

The ABS’ data integration facility requires all data integration project proposals to go through a rigorous assessment and approval process to ensure the project provides a significant public benefit and takes a privacy-by-design approach. In addition, staff members assigned to a project are never able to see all of an individual’s information together at any point of the data integration process, and data access rights are only provided on a ‘need to know’ basis. This is known as the ‘separation principle’. These protections are in addition to the existing legal obligation on all ABS staff never to release personal information to any individual or organisation outside of the ABS.
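
The separation principle can be illustrated with a minimal sketch. The field names and the use of a random key below are assumptions made for illustration; the source does not describe how the ABS implements the principle in practice.

```python
import secrets

# Hypothetical identifying fields; the source does not list the actual variables.
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth"}


def separate(records):
    """Split each record into a linkage view and an analysis view.

    The two views share only a random key, so neither view on its own
    shows all of an individual's information together.
    """
    linkage_view, analysis_view = [], []
    for record in records:
        key = secrets.token_hex(8)  # random; carries no information about the person
        linkage_view.append(
            {"key": key, **{f: v for f, v in record.items() if f in IDENTIFYING_FIELDS}}
        )
        analysis_view.append(
            {"key": key, **{f: v for f, v in record.items() if f not in IDENTIFYING_FIELDS}}
        )
    return linkage_view, analysis_view


# Illustrative record only; not real data.
records = [
    {
        "name": "A. Citizen",
        "address": "1 Example St",
        "date_of_birth": "1980-01-01",
        "occupation": "Teacher",
        "income_band": "C",
    }
]
linkage_view, analysis_view = separate(records)
```

In this sketch, a staff member given only `linkage_view` sees identifying details without any analytical content, and a staff member given only `analysis_view` sees the analytical content without knowing whose it is.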

Investments in ABS data integration capability and methodological expertise over the past decade have generated strong demand from government, academic and research users for the ABS data integration facility. The ABS is transitioning from undertaking a small, discrete array of projects to a rich, diverse and expanding program of projects across social, economic and environmental domains.

Since 2005, over 100 separate data sources have been used in 44 data integration projects, conducted internally and in partnership with more than 25 organisations, including Australian and state government departments and a small number of non-government organisations.

The ABS has used data integration in the production of several nationally significant statistical and research datasets. Some of these are detailed below.


BUSINESS LONGITUDINAL ANALYSIS DATA ENVIRONMENT

The ABS and the Department of Industry, Innovation and Science have developed the Business Longitudinal Analysis Data Environment (BLADE), which contains detailed information on the characteristics and finances of Australian businesses.

Formerly known as the Expanded Analytical Business Longitudinal Database (EABLD), this integrated data environment draws on several years of administrative data from the Australian Taxation Office (ATO) and survey data from the ABS, enabling analysis of businesses over time and the micro-economic factors that drive performance, innovation, job creation, competitiveness and productivity. The BLADE therefore improves the evidence base for policy development and reform.

For example, the BLADE has been used to examine the contribution of start-ups to job creation in the Australian economy, revealing that it is young small to medium enterprises that make the greatest contribution to overall jobs growth.


AUSTRALIAN CENSUS LONGITUDINAL DATASET

The Australian Census Longitudinal Dataset (ACLD) brings together a 5% random sample of records from the 2006 Census with corresponding records from the 2011 Census. This provides a unique opportunity for researchers to access a very large and detailed longitudinal dataset and examine pathways and transitions for different population groups.

Following the 2016 Census, the ABS will expand the ACLD to include a third time point. Over time, the ACLD will continue to grow in value as records from each successive Census are linked, providing a much more detailed longitudinal picture of changing patterns in social and economic conditions in the lives of Australians.

The ACLD was created by linking data from the 2006 Census and the 2011 Census using personal characteristics and geographic region. The utility of the dataset has also been enhanced by linking the ACLD with selected administrative datasets, including information on migrant settlements. This allows outcomes of particular groups of migrants (such as those on study, family or humanitarian visas) to be examined using the range of topics collected in the Census.
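
As a rough illustration of linking on personal characteristics and geographic region, the sketch below matches records from two files on an exact combination of fields and keeps only unambiguous matches. The choice of fields, the record identifiers and the exact-match rule are all assumptions; the source does not specify the linking variables or method, and a production linkage would also need to handle people who change region or report details inconsistently between Censuses.

```python
from collections import defaultdict

# Hypothetical linking variables; the source says only
# "personal characteristics and geographic region".
LINK_FIELDS = ("sex", "date_of_birth", "region")


def link_key(record):
    """Build the combination of values used to compare two records."""
    return tuple(record[field] for field in LINK_FIELDS)


def index_by_key(records):
    """Group record ids by their link key."""
    groups = defaultdict(list)
    for record in records:
        groups[link_key(record)].append(record["id"])
    return groups


def link(earlier, later):
    """Return (earlier_id, later_id) pairs whose link key is unique in both files."""
    a, b = index_by_key(earlier), index_by_key(later)
    return [
        (ids[0], b[key][0])
        for key, ids in a.items()
        if len(ids) == 1 and len(b.get(key, [])) == 1
    ]


# Illustrative records only; not real Census data.
census_2006 = [
    {"id": "a1", "sex": "F", "date_of_birth": "1970-03-02", "region": "R1"},
    {"id": "a2", "sex": "M", "date_of_birth": "1985-07-19", "region": "R2"},
    {"id": "a3", "sex": "M", "date_of_birth": "1985-07-19", "region": "R2"},
]
census_2011 = [
    {"id": "b1", "sex": "F", "date_of_birth": "1970-03-02", "region": "R1"},
    {"id": "b2", "sex": "M", "date_of_birth": "1985-07-19", "region": "R2"},
    {"id": "b3", "sex": "F", "date_of_birth": "1990-11-30", "region": "R3"},
]
links = link(census_2006, census_2011)
```

Here only `a1` and `b1` are linked. Records `a2` and `a3` share the same key, so the sketch leaves them unlinked rather than guess which one corresponds to `b2`.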

The ACLD is available to registered users in TableBuilder, where users can create their own customised tables. In addition, the ACLD is available in the ABS DataLab facility as a microdata product that enables researchers to unlock the full power of the longitudinal Census data by performing more detailed analytical techniques using a range of statistical software packages.

Researchers and policy makers have used the ACLD to:

  • better understand the factors associated with the increase in people choosing to identify as Aboriginal or Torres Strait Islander
  • investigate employment outcomes for retrenched workers leaving the motor vehicle industry
  • investigate changes in family relationships and fertility




MULTI-AGENCY DATA INTEGRATION PROJECT

The Multi-agency Data Integration Project (MADIP) is a collaborative partnership between five Australian Government agencies: the Department of Health, Department of Social Services, Department of Human Services, ATO, and the ABS.

MADIP is currently in an evaluation phase, with the partner agencies working together to maximise the value of existing data, address and resolve barriers to data sharing, and create an enduring data resource with cross-portfolio information readily available to support analysis and evaluation as it is needed.

MADIP has at its core a high quality snapshot of 2011 data, combining administrative data on health services, income tax and government support payments with the detailed demographic and family data from the Census.

An evaluation of the integrated data and its potential to inform policy development and evaluation is currently being undertaken by data experts from all five partner agencies. Preliminary analysis suggests that this project has significant potential to improve Australia's health, welfare and education systems through a better understanding of the impact of social and economic policies and industry changes.

The shared vision for this resource is that the dataset will be expanded over time (both longitudinally and in terms of new data), and will become available for broader use by other Australian Government agencies, states and territories, academics, and the public.


LINKED EMPLOYEE-EMPLOYER DATABASE

The foundational Linked Employee–Employer Database (LEED) project joins personal income tax data from the Australian Taxation Office with business-level data from the EABLD, linking person-level data with business-level data for the first time.

This foundation project represents an important first step towards a future LEED, which will contain data linked across multiple years and include more detailed socio-economic and demographic information relating to employees. Through further linkage with other datasets, additional characteristics could be used to explore the drivers of firm-level performance, such as the educational qualifications of employees.

The ultimate longer-term goal is to enhance understanding of productivity, changes in employment by industry, entry to and exit from the labour market, and other important labour market dynamics. This addresses a longstanding information gap in Australian labour market statistics and provides a solid evidence base for policy development and evaluation.


ACCESS TO RESEARCH DATASETS

The ABS is committed to streamlining access to the data from data integration projects such as these (and to ABS data more generally), while maintaining appropriate protection of individuals’ personal information. The ABS facilitates safe access to appropriately de-identified microdata through the Five Safes framework. This framework unlocks the value of existing data while ensuring the privacy and confidentiality of individuals.

The Five Safes Framework has been adapted from the UK model. The Five Safes cover:

1. safe people - Is the researcher authorised to access and use the data appropriately?
2. safe projects - Is the data to be used for an appropriate purpose?
3. safe settings - Does the access environment prevent unauthorised use?
4. safe data - Has appropriate and sufficient protection been applied to the data?
5. safe output - Are the statistical results non-disclosive?

Figure: The purpose of each of the Five Safes in the framework

Taken together, the Five Safes ensure that a comprehensive assessment is undertaken and that appropriate controls are put in place before data is accessed. This ensures data cannot be used in a way that is likely to enable the identification of any individuals or organisations.
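
The idea of assessing the Five Safes together can be sketched as a simple checklist. The class below and its rule that every dimension must pass before access is approved are simplifying assumptions for illustration; the source does not describe how the ABS records or combines its assessments.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FiveSafesAssessment:
    """One yes/no judgement per dimension of the Five Safes (hypothetical structure)."""

    safe_people: bool
    safe_projects: bool
    safe_settings: bool
    safe_data: bool
    safe_output: bool

    def failed(self):
        """Names of the dimensions that were not judged safe."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def approved(self):
        """Assumed rule: access is approved only if no dimension fails."""
        return not self.failed()


assessment = FiveSafesAssessment(
    safe_people=True,
    safe_projects=True,
    safe_settings=True,
    safe_data=True,
    safe_output=False,
)
```

With these values, `assessment.failed()` lists `safe_output` and `assessment.approved()` is false, so the request would go back for further output controls before any data is accessed.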

The ABS is improving access to data by increasing the use of in-postings of Government officials under the Australian Bureau of Statistics Act 1975, where officials are subject to the same stringent secrecy and data security requirements as ABS officers, and through increased use of the ABS DataLab, a secure physical environment allowing access to detailed microdata files. The ABS has also commenced trials of a 'virtual' DataLab, giving more convenient yet still secure and legal access to other Commonwealth officials.


NEXT STEPS
For more than a decade, the ABS has demonstrated the considerable public value that comes from unleashing the power of data through data integration, and shown that data can be brought together in a way that upholds the privacy, confidentiality and security of the information while safely meeting the growing demand for richer data to inform on complex policy issues.

While data integration is already integral to many key official statistics routinely produced by the ABS, the ABS will continue to expand its use of data integration as a standard statistical tool to increase the depth and breadth of available statistics, and maximise the use of existing data. As part of this, the ABS will continue to develop and improve data linkage methods and techniques to take advantage of new technologies and new and emerging data sources available from government, commercial or community sources.

The ability to get the most value out of existing data through improved access and use depends on arrangements that maintain the trust and support of the community. We will continue to support and drive best practice in data integration. Importantly, we will continue to invest in best practice to ensure that the data the ABS collects and curates is secure and that privacy and confidentiality are assured, whilst maximising the utility and accessibility of the data for public policy and research use.

We will continue to build and maintain relationships with data custodians and continue to undertake data integration projects in partnership with other Australian Government and state and territory agencies to improve accessibility to public information.

Data integration not only helps meet the need for richer and more detailed data on discrete topics such as social policy or firm-level productivity, but also meets the need for data across sectors, such as linked employer–employee data. As a result, the ABS is maturing its data integration program from a series of discrete projects into an enduring public resource that provides a single, safe access point for approved researchers to nationally significant integrated data across economic, social, and geospatial domains.