The paper summarises the approach taken to construct the Integrated Dataset (linked employer-employee data) and experimental statistics on employee earnings and jobs undertaken as part of the Linked Employer-Employee Database (LEED) Foundation Projects. The paper provides information on the data sources and integration methodology, a summary of results, and future directions for the LEED in the ABS.
The paper is structured as follows:
- Introduction - overview of the LEED Foundation Projects, their scope and coverage;
- Data Sources - Personal Income Tax (PIT) and the Expanded Analytical Business Longitudinal Database (EABLD);
- Integration Methodology - construction of the Integrated Dataset;
- Summary of Results - aggregate experimental statistics and their coherence with ABS estimates; and
- Future Directions - microdata product and future LEED.
Further details and appendices are provided in the Explanatory Notes.
The Australian Bureau of Statistics (ABS) is embarking on a period of major organisational transformation to respond to the new opportunities and challenges of the dynamic statistical landscape. Maximising the value of administrative data through integration and improved access is a strategic priority for the ABS in order to deliver high quality official statistics in efficient and innovative ways. For more information, refer to the ABS Corporate Plan, 2015-19 (cat. no. 1005.0).
As part of this transformation, the ABS is exploring the potential of creating a LEED which integrates person and business administrative data sourced from the Australian Taxation Office (ATO). The LEED would be linked longitudinally, as well as provide point in time data.
A future LEED would build on the EABLD, which integrates business tax and survey data. The LEED would be created by integrating the EABLD with a longitudinally linked PIT database. The long term vision is to extend the LEED by integrating other key administrative data, survey data (person and business level), and data from the Census of Population and Housing to deliver a new statistical solution to vastly expand the information base on the Australian labour market.
The LEED would address a longstanding information gap in Australian labour statistics by being a single database capable of addressing complex and varied questions about employer-employee relationships at both a point in time and longitudinally (e.g. examining firm and employee characteristics of productive firms). The creation of a LEED would demonstrate that administrative and directly collected data can be integrated to provide a strong evidence base for research, policy development and evaluation.
LEED FOUNDATION PROJECTS
The LEED Foundation Projects are being undertaken to build support for the future LEED. The purpose of these projects is to demonstrate the value of the LEED by assessing the feasibility of integrating person and business tax data, and using this Integrated Dataset to create new statistical outputs.
The LEED Foundation Projects integrate person level data from the PIT dataset with business level data sourced from the EABLD (integrated business data) to produce experimental statistics on employee earnings and jobs for the 2011-12 financial year.
These projects represent an important step towards a future LEED, as this is the first time the ABS has integrated PIT data with the EABLD.
The experimental statistics produced from the Integrated Dataset (linked employer-employee data) were designed to:
- assess whether administrative data can be used to:
  - address a known information gap by creating a new experimental measure of employee jobs in Australia; and
  - produce new experimental statistics to complement existing information on employees and earnings from ABS household and business collections; and
- be an example of the value of a future LEED for statistical purposes.
The LEED Foundation Projects capture information on all employee earnings and jobs in Australia throughout the reference period of 1 July 2011 to 30 June 2012.
The scope of the LEED Foundation Projects includes:
- all persons who were an employee (see below) at any point in the reference period as recorded on either an Individual Tax Return (ITR) or an Individual Pay As You Go (PAYG) summary;
- all jobs as reported in an Individual PAYG summary during the reference period; and
- all businesses which provided an Individual PAYG summary to an employee in the reference period.
An employee (see Explanatory Notes, paragraphs 32-35) is defined as someone who reported earnings on an ITR (see Explanatory Notes, paragraphs 36-51) or who had an Individual PAYG summary reporting $1 or more in gross payment (see Explanatory Notes, paragraph 61).
Persons who did not report any earnings on their ITR and did not receive an Individual PAYG summary from an employer were excluded from the scope of the LEED Foundation Projects. These include persons who were not in the labour force, were unemployed, or were employed but were away from all of their jobs for the entire reference period and did not receive any pay during that period.
A job is defined as a link between an employee and a business for $1 or more in payment as reported on an Individual PAYG summary. An employee can have multiple jobs with the same or different businesses during the financial year, and can hold two or more jobs concurrently (see Explanatory Notes, paragraphs 52-60).
Jobs for which the employing business provided no Individual PAYG summary are not captured in the PIT data, and are therefore not included. Jobs in which the occupier was an Owner Manager of an Unincorporated Enterprise (OMUE, e.g. sole traders) are out of scope because the occupier is not considered an employee in that job (although the person may be included as an employee in other jobs they hold).
Businesses which did not report PAYG withholdings from employees are deemed to be non-employing and are out of scope of the LEED Foundation Projects. This applies irrespective of their employment size on the ABS Business Register, because they do not report any jobs through an Individual PAYG summary.
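The employee, job and business definitions above can be illustrated with a minimal sketch. It is not the ABS implementation: the `PaygSummary` record, its field names and the function names are hypothetical, and it assumes for simplicity that each Individual PAYG summary reporting $1 or more counts as one job. The Explanatory Notes describe the treatment actually applied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PaygSummary:
    """One Individual PAYG summary (hypothetical, simplified record)."""
    person_id: str
    business_id: str
    gross_payment: float  # dollars paid during the reference period


def is_job(summary: PaygSummary) -> bool:
    # A job requires $1 or more in payment on an Individual PAYG summary.
    return summary.gross_payment >= 1


def in_scope_employees(itr_earners, summaries):
    # Employee: reported earnings on an ITR, or has a PAYG summary of $1 or more.
    # itr_earners is assumed to hold the ids of persons reporting ITR earnings.
    return set(itr_earners) | {s.person_id for s in summaries if is_job(s)}


def jobs_per_employee(summaries):
    # Assumption: each qualifying summary counts as one job, so a person
    # can hold several jobs with the same or different businesses.
    return Counter(s.person_id for s in summaries if is_job(s))


def employing_businesses(summaries):
    # Businesses with no qualifying PAYG summary are treated as non-employing.
    return {s.business_id for s in summaries if is_job(s)}
```

For example, a person with summaries of $42,000 and $3,500 from two businesses holds two jobs, while a person whose only summary reports $0 is an employee only if they also reported earnings on an ITR.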
Coverage restrictions apply to the LEED Foundation Projects Integrated Dataset.
The LEED Foundation Projects' use of unique identifiers makes it unlikely that any individual is included more than once in the experimental statistics.
Employees who meet one of the following conditions will be partially excluded from the LEED Foundation Projects. For these employees, missing information from one source (e.g. missing PAYG data) will result in exclusion from certain statistics (e.g. Mean gross payment, or Number of jobs).
- Employees who did not report earnings on an ITR for any of the following reasons:
- Employees who did not receive an Individual PAYG summary from an employer for any of the following reasons:
  - They worked for cash in hand or other payments not recorded on an Individual PAYG summary;
  - They conducted illicit activities not recorded on Individual PAYG summaries;
  - They did not supply their Tax File Number (TFN) to their employer; or
  - Any other reason.
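The effect of partial exclusion can be sketched as a lookup from each statistic to the data source it depends on. This is an illustrative assumption, not the ABS method: the two PAYG-dependent statistics are the examples named above, while the `"ITR-based statistic"` entry and the function name are hypothetical placeholders.

```python
# Data source each statistic is assumed to depend on.
REQUIRED_SOURCE = {
    "Mean gross payment": "PAYG",
    "Number of jobs": "PAYG",
    "ITR-based statistic": "ITR",  # hypothetical placeholder
}


def contributing_statistics(has_itr: bool, has_payg: bool) -> list[str]:
    """Return the statistics an employee can contribute to."""
    available = {"ITR": has_itr, "PAYG": has_payg}
    # An employee is dropped only from statistics whose source is missing.
    return [stat for stat, source in REQUIRED_SOURCE.items() if available[source]]
```

Under these assumptions, an employee with ITR earnings but no Individual PAYG summary would be left out of Mean gross payment and Number of jobs, yet would still count towards any statistic derived from the ITR alone.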
No employing businesses were excluded on the basis of coverage.
Diagram 1: Implications of coverage on experimental statistics on employee earnings and jobs
The results of the LEED Foundation Projects are based, in part, on:
- Taxation data supplied by the ATO to the Australian Statistician under the Taxation Administration Act 1953; and
- Australian Business Register (ABR) data supplied by the Registrar to the Australian Statistician under A New Tax System (Australian Business Number) Act 1999.
These Acts require that such data is only used by the ABS for the purpose of administering the Census and Statistics Act 1905. The ABS is obligated to maintain the confidentiality of individuals and businesses in these ATO and ABR data sets, as well as comply with provisions that govern the use and release of this information, including the Privacy Act 1988.
Access to taxation data is tightly controlled within the ABS. Policies and guidelines governing the disclosure of information were implemented and followed in order to maintain the confidentiality of individuals and businesses. The aggregate experimental statistics have been confidentialised to ensure that they are not likely to enable identification of a particular person or organisation.