Australian Bureau of Statistics

Challenges of statistical data integration
 




A number of challenges need to be addressed within the National Statistical System (NSS) when undertaking data integration. Data integration activities need a safe, secure environment, should be performed in line with community expectations, and must meet legal and privacy requirements. The paragraphs below briefly outline some high-level issues for consideration across the NSS.


Community Acceptance of Statistical Integration
Infrastructure and Data Management
Capability and Skills
Access to Data - Privacy and Legal Considerations
Confidentiality
Separation of Identifiers from Content
Providing Access to Linked Datasets


Community Acceptance of Statistical Integration

Qualitative research indicates that most members of the public support their data being used for statistical and research purposes, to improve social and economic outcomes for the Australian community, as long as the data is well managed and confidentiality is maintained. This is particularly true for health-related information, where most research on community attitudes to data linking has been done. Legal protections to ensure data is kept confidential are important in obtaining community acceptance.


Infrastructure and Data Management

Data custodians and integrating authorities are equally responsible for managing data through appropriate storage and governance processes. In line with the Australian Government’s High level principles for data integration involving Commonwealth data for statistical and research purposes, there should be clearly defined procedures for managing data in each organisation. These procedures should cover secure data storage, data access arrangements and data retention policies.

The major costs of data integration are usually incurred up front, in setting up information technology and data management infrastructure and in putting transparency measures in place. These costs will vary depending on the infrastructure an organisation already has.


Capability and Skills

Having the appropriate capability to undertake data integration is a challenge for many organisations. Data integration requires analytical skills, but does not necessarily require additional specialist skills. In Australia, the limited supply of qualified graduates with analytical skills is a well-known issue, so developing and maintaining analytical expertise is critical to undertaking data integration activities.


Access to Data - Privacy and Legal Considerations

The data management practices of agencies that hold data should include access arrangements to maximise the use of data while upholding privacy and legislative requirements. Where practical, obtaining consent from data providers to allow data integration should be considered. Consideration should also be given to the primary purpose for which the data was collected.

Commonwealth operations are covered by the Privacy Act 1988 as well as other specific legislation. Most state and territory government agencies are bound by their jurisdictional privacy legislation. Many jurisdictions have governance arrangements with Privacy Commissioners or Information Officers that need to be followed prior to accessing data.

In addition to legislative and privacy considerations, sensitivity of the data also needs to be addressed. Datasets used in integration projects usually contain identifiable information about individuals or businesses (for example, name and address) and can include sensitive information such as health or income information. Information may also be politically sensitive when relating to government grants or commercial operations. A data management strategy should aim to reduce the risk associated with integrating sensitive data. Confidentialising data and implementing the separation principle are examples of strategies that can help to reduce risk (see discussion below).


Confidentiality

The wealth of information provided by integrated datasets can create additional risk by increasing the chance of identifying an entity (such as a person or business). Protecting the confidentiality of individuals or organisations in an integrated dataset is a key element in maintaining the ongoing trust of the Australian public. Removing identifying details, such as names, from a dataset does not necessarily protect identity as other variables can be used to deduce the identity of an individual or organisation in the dataset.
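The point that removing names does not by itself protect identity can be illustrated with a small sketch. The example below is illustrative only: the records, the field names and the choice of quasi-identifying variables are all hypothetical, and the check shown (counting how many records share each combination of variables) is one common way to flag risk, not a description of any particular agency's method.

```python
from collections import Counter

# Hypothetical de-identified records: names are removed, other variables remain.
records = [
    {"age_group": "30-39", "sex": "F", "postcode": "2600", "condition": "asthma"},
    {"age_group": "30-39", "sex": "F", "postcode": "2600", "condition": "diabetes"},
    {"age_group": "80-89", "sex": "M", "postcode": "0872", "condition": "rare disorder"},
    {"age_group": "30-39", "sex": "M", "postcode": "2600", "condition": "asthma"},
]

# Assumed set of variables that could be combined to deduce identity.
QUASI_IDENTIFIERS = ("age_group", "sex", "postcode")


def at_risk_records(rows, keys=QUASI_IDENTIFIERS, threshold=2):
    """Return rows whose combination of `keys` is shared by fewer than `threshold` rows."""
    def combo(row):
        return tuple(row[k] for k in keys)

    counts = Counter(combo(row) for row in rows)
    return [row for row in rows if counts[combo(row)] < threshold]


# Records that are unique on the quasi-identifiers, despite having no name attached.
risky = at_risk_records(records)
```

Records returned by `at_risk_records` are candidates for further protection, for example by broadening categories or restricting access.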

Identities can be protected by confidentialising the data (e.g. perturbing variables or records), by restricting access, or by a combination of both strategies. Protecting the confidentiality and privacy of individuals and organisations also needs to be considered during the actual linking process used to form the integrated dataset.
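As a rough illustration of what perturbing values can look like, the sketch below applies unbiased random rounding to a table of counts. This is one simple, well-known perturbation technique chosen for illustration; it is an assumption of this example, not a statement of the method any agency applies, and the function names and the rounding base of 3 are hypothetical.

```python
import random


def random_round(count, base=3, rng=random):
    """Randomly round a non-negative count to a multiple of `base`.

    The count is rounded up with probability remainder/base and down otherwise,
    so the expected value of the result equals the original count.
    """
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


def perturb_table(table, base=3, seed=None):
    """Apply random rounding to every cell of a {cell_label: count} table."""
    rng = random.Random(seed)
    return {cell: random_round(n, base, rng) for cell, n in table.items()}


# Hypothetical cell counts; small cells such as 1 are the ones most at risk.
perturbed = perturb_table({"region A": 1, "region B": 7, "region C": 9}, seed=0)
```

Every published cell is then a multiple of the base, so a count of 1 can no longer be read directly from the table, while larger counts change by less than the base.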



Separation of Identifiers from Content

The ABS separates identifying variables from content variables as part of its suite of strategies to protect the identities of individuals and organisations in datasets. This means that no-one can see the identifying or demographic information used to work out which records relate to the same person or organisation (e.g. name, address, date of birth) together with the content data (e.g. clinical information, benefit information, company profits). Instead, staff can see only the information they need to do the linking or the analysis. So, rather than someone being able to see that John Smith has a rare medical condition, or the profits earned by Company X, the person doing the linking sees only the information needed for linking (e.g. John Smith’s name and address). The analyst sees only a record, with no identifying information, showing that a person has a rare medical condition, together with any other variables needed for analysis (e.g. broad age group, sex).
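The separation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ABS's actual implementation: the field names, the `separate` function, the `record_key` field and the example address are all hypothetical. Each record is split into an identifier part for the person doing the linking and a content part for the analyst, connected only by a random key that carries no information about the person.

```python
import secrets

# Assumed identifying fields; everything else is treated as content.
IDENTIFYING_FIELDS = ("name", "address", "date_of_birth")


def separate(records, identifying_fields=IDENTIFYING_FIELDS):
    """Split records into a linkage file (identifiers) and a content file.

    The two files share only a random `record_key`, so neither file on its own
    shows who a content record belongs to.
    """
    linkage_file, content_file = [], []
    for record in records:
        key = secrets.token_hex(8)  # random key, not derived from the data
        identifiers = {f: record[f] for f in identifying_fields}
        content = {f: v for f, v in record.items() if f not in identifying_fields}
        linkage_file.append({"record_key": key, **identifiers})
        content_file.append({"record_key": key, **content})
    return linkage_file, content_file


# Hypothetical source record combining identifiers and content.
source = [{
    "name": "John Smith",
    "address": "1 Example St",
    "date_of_birth": "1970-01-01",
    "age_group": "40-49",
    "sex": "M",
    "condition": "rare medical condition",
}]

linkage_file, content_file = separate(source)
```

In this sketch the protection comes from access arrangements as much as from the split itself: the linkage file and the content file would need to be stored separately and made available to different staff.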


Providing Access to Linked Datasets

As noted under Confidentiality, the wealth of information provided by integrated datasets can create additional risk by increasing the chance of identifying an entity (such as a person or business). This risk is greater still when access is provided to users who may already hold some of the data within the linked dataset. The aim is to use integrated datasets to their maximum potential, while ensuring the privacy of data providers and maintaining the trust of the general public.

A range of access options is required so that datasets are easy to access, support both basic and complex analysis, and are flexible enough to be used with any analysis software package.




Commonwealth of Australia 2014

Unless otherwise noted, content on this website is licensed under a Creative Commons Attribution 2.5 Australia Licence together with any terms, conditions and exclusions as set out in the website Copyright notice. For permission to do anything beyond the scope of this licence and copyright terms contact us.