Microdata: Outcomes from Vocational Education and Training in Schools, Australia

Facilitates detailed analysis of the longitudinal post school outcomes of participants in Vocational Education and Training in Schools

Introduction

The Outcomes from Vocational Education and Training in Schools Microdata Product comes from the 2006 Vocational Education and Training in Schools and 2011 Census of Population and Housing Integrated Dataset (the Integrated Data). The integration of this data enhances the evidence base available for analysis of the post school labour market and education outcomes of participants in Vocational Education and Training in Schools. The Integrated Data is released only in the ABS Data Laboratory. However, a Test File is also available to help users become familiar with the structure of the data before undertaking an ABS Data Laboratory session.

Data on Vocational Education and Training in Schools was supplied to the ABS by the National Centre for Vocational Education Research. More information on this data can be found via the Explanatory notes section. The Census of Population and Housing is conducted every five years and aims to measure accurately the number of people and dwellings in Australia on Census Night (9 August for the 2011 Census).

Microdata products are the most detailed information available from a Census, survey, or administrative sources and generally include confidentialised unit record level information such as responses to individual questions on a questionnaire. They also include derived data from responses for two or more variables and are released with the approval of the Australian Statistician.

Available products

A Test File is available from the Data downloads section for understanding the structure of the data and for testing code. This file does not contain real data and cannot be used for analysis. The Integrated Data is available through the ABS Data Laboratory, which enables in-depth analysis using a range of statistical software packages. Further information about the ABS Data Laboratory, and other information to assist users in understanding and accessing microdata in general, is available from the Microdata Entry Page.

To apply for access to the Integrated Data in the ABS Data Laboratory, please contact Microdata Access Strategies via microdata.access@abs.gov.au.

Information on access can be found on the How to Apply for Microdata page on the ABS web site.

Further information

For further information about data sources, data scope, linkage methodology and results, weighting methodology, and data quality see the information available in the Explanatory Notes section.

For further information about the data structure and available data items see the Data Items section. A Data items list is also available from the Data downloads section.

For further information about the Test File see the Test File section. The Test File is available from the Data downloads section.

Support

For support in the use of this product, please contact Microdata Access Strategies on 02 6252 7714 or via microdata.access@abs.gov.au.

Data available on request

Customised tables are available on a fee-for-service basis. A consultancy service is available for complex analysis and modelling. Contact the National Information and Referral Service on 1300 135 070.

Acknowledgments

ABS releases draw extensively on information provided freely by individuals, businesses, governments and other organisations. Their continued cooperation is very much appreciated: without it, the wide range of statistics published by the ABS would not be available. Information received by the ABS is treated in strict confidence as required by the Census and Statistics Act 1905.

The ABS gratefully acknowledges the co-operation and technical advice provided by the National Centre for Vocational Education Research.

Inquiries

For further information about these and related statistics, contact the National Information and Referral Service on 1300 135 070, or email client.services@abs.gov.au. The ABS Privacy Policy outlines how the ABS will handle any personal information that you provide to us.

Data items

Data items list

A complete list of data items included on the Integrated 2006 Vocational Education and Training (VET) in Schools and 2011 Census of Population and Housing (the Census) dataset is provided in an Excel workbook that can be accessed from the Data downloads section.

All data items are created at the person level. This includes data items relating to the family and household of the person from the Census. For ease of use, these data items have been divided into Person, Dwelling, Household, Family, Spouse Related, and Male and Female parent related groupings.

Users intending to apply for access to the ABS Data Laboratory should ensure the data they require, and the level of detail required, are available and applicable for the intended use.

Weighting

Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, student records. The weight can be considered an indication of how many students in the relevant population are represented by each person in the sample. Weights were created for linked records to enable population estimates to be produced.

Estimates from this dataset should be obtained by summing the weights assigned to each linked record, using the variable called WEIGHTS. The weight is a value which indicates how many student population records are represented by the linked record. Weights aim to adjust for the fact that the linked student records may not be representative of all the student records. Weighting should be used to ensure better representation of population sub-groups and to enhance the reliability of linked education data for longitudinal and cross-sectional analysis. Note that only weighted counts will produce an estimate of the total number of persons with the specified characteristic.
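
The estimation rule above can be sketched as follows. This is a minimal illustration only: WEIGHTS is the weight variable named above, while the QUAL data item, its categories and the four records are hypothetical and made up for the example.

```python
# Illustrative records only; the real data is available solely in the ABS Data Laboratory.
# WEIGHTS is the weight variable named in the text; QUAL is a hypothetical data item.
records = [
    {"QUAL": "Certificate III", "WEIGHTS": 1.9},
    {"QUAL": "Certificate III", "WEIGHTS": 2.4},
    {"QUAL": "No qualification", "WEIGHTS": 1.1},
    {"QUAL": "Certificate III", "WEIGHTS": 3.0},
]


def weighted_estimate(rows, item, value):
    """Estimate the population count with item == value by summing WEIGHTS."""
    return sum(r["WEIGHTS"] for r in rows if r[item] == value)


def unweighted_count(rows, item, value):
    """Count linked records only; this is not a population estimate."""
    return sum(1 for r in rows if r[item] == value)


estimate = weighted_estimate(records, "QUAL", "Certificate III")
count = unweighted_count(records, "QUAL", "Certificate III")
print(f"linked records: {count}, weighted estimate: {estimate:.1f}")
```

The unweighted count describes the linked sample only, which is why the weighted sum is the figure to report.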

Not applicable categories

'Not applicable' categories occur when an item on the Census form or on the VET in Schools administrative forms does not apply to the respondent. Not all data items in the Integrated Data include a 'Not applicable' category.

Not stated categories

'Not stated' categories occur when no response has been provided for a data item. Some VET in Schools data items contain 'Not stated' categories. All Census data items contain them except age, sex, marital status and usual address, for which missing responses are imputed.

Visitors on Census night

Overseas visitors were excluded from the Census. However, the Census does include visitors from within Australia in 2011. These are people who were enumerated away from their usual residence on Census Night. Family information cannot be derived for these persons and as such all family, spouse, and male and female parent related data items are not applicable for visitors.

All dwelling related data items, however, have been made applicable to visitors. This information relates to their dwelling of enumeration on Census Night, not usual residence.

Most household data items are not applicable to visitors. However, for four data items, visitors have been included in order to align with standard Census derivations of that data item. These comprise:

  • Total Household Income as stated (weekly) of household in which person was enumerated
  • Total Household Income (weekly) of household in which person was enumerated
  • Household Income Derivation Indicator of household in which person was enumerated
  • Household Composition of household in which person was enumerated.

Any applicable household information for a visitor relates to their place of enumeration, not usual residence.

Where a data item is applicable to visitors, the Usual Address Indicator on Census Night data item can be used to restrict the table to usual residents only.
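
As a rough sketch of that restriction, the example below filters out visitors before tabulating a household data item. The item names USUAL_ADDRESS_IND and HH_COMPOSITION, the category labels and the records are all hypothetical; the real names and codes should be taken from the Data items list.

```python
# Hypothetical item names, category labels and records, for illustration only.
records = [
    {"USUAL_ADDRESS_IND": "At home", "HH_COMPOSITION": "One family household", "WEIGHTS": 2.0},
    {"USUAL_ADDRESS_IND": "Elsewhere in Australia", "HH_COMPOSITION": "Group household", "WEIGHTS": 1.5},
    {"USUAL_ADDRESS_IND": "At home", "HH_COMPOSITION": "Group household", "WEIGHTS": 2.5},
]


def usual_residents(rows):
    """Keep only persons enumerated at their usual residence (drops visitors)."""
    return [r for r in rows if r["USUAL_ADDRESS_IND"] == "At home"]


def weighted_table(rows, item):
    """Weighted count for each category of the given data item."""
    table = {}
    for r in rows:
        table[r[item]] = table.get(r[item], 0.0) + r["WEIGHTS"]
    return table


table = weighted_table(usual_residents(records), "HH_COMPOSITION")
print(table)
```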

Persons temporarily absent on Census night

The Census household form provides the opportunity to list up to three persons who were temporarily absent from the dwelling on Census Night. A limited amount of information is collected for these persons and it is used to better derive the family and household characteristics of the dwelling. In deriving family and household related data items for the Census data, information on persons temporarily absent was included where relevant and available.

Test file

A Test File has been created for the Outcomes from Vocational Education and Training in Schools Microdata Product. The purpose of the Test File is to allow researchers/analysts to become more familiar with the data structure and prepare code/programs prior to applying for, or commencing, an ABS Data Laboratory session. This aims to maximise the value of sessions by saving users time and resources once they enter the ABS Data Laboratory environment.

This Test File mimics the structure of the 2006 Vocational Education and Training in Schools and 2011 Census of Population and Housing Integrated Dataset in that it has the same data items and allowed values. All data on the file is false, created through a randomisation process. Proportions of values within data items in the Test File will be similar to those in the real data. However, relationships between data items will not be intentionally maintained. It is extremely unlikely that a record in the Test File would match with a genuine record in the real data.
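
The randomisation process itself is not described here. As an assumption for illustration only, one simple way to obtain the two properties above (similar proportions within each data item, no deliberately preserved relationships between items) is to draw every data item independently from its own distribution. The item names, categories and proportions below are made up.

```python
import random


def synthesise(marginals, n, seed=0):
    """Draw n fake records, sampling each data item independently of the others."""
    rng = random.Random(seed)
    columns = {}
    for item, dist in marginals.items():
        values = list(dist)
        probs = [dist[v] for v in values]
        # Independent draws keep each item's proportions but not cross-item relationships.
        columns[item] = rng.choices(values, weights=probs, k=n)
    return [{item: columns[item][i] for item in marginals} for i in range(n)]


# Hypothetical marginal distributions (allowed values and their proportions).
marginals = {
    "SEX": {"1": 0.5, "2": 0.5},
    "STATE": {"NSW": 0.6, "VIC": 0.4},
}
fake = synthesise(marginals, 10_000)
share_nsw = sum(r["STATE"] == "NSW" for r in fake) / len(fake)
print(f"share of NSW in fake data: {share_nsw:.3f}")
```

Code written against such a file can be checked for structure and syntax, but any results it produces are meaningless.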

The Test File is available as a free download through the Data downloads section. The Test File may also be available in other file formats on request. For further information users should email microdata.access@abs.gov.au or telephone (02) 6252 7714.

The Test File does not contain real data, and cannot be used for analysis.

Conditions of use

User responsibilities

The Census and Statistics Act 1905 includes a legislative guarantee to respondents that their confidentiality will be protected. This is fundamental to the trust that the Australian public has in the ABS and that trust is in turn fundamental to maintaining the quality of ABS information. Without that trust, respondents may be less forthcoming or truthful in answering our questionnaires. For more information, see 'Avoiding inadvertent disclosure' and 'Microdata' on the web page How the ABS keeps your information confidential.

Conditions of sale

All ABS products and services are provided subject to the ABS Disclaimer, ABS Copyright, ABS Privacy and ABS Conditions of Sale.

Any queries relating to these Conditions of Sale should be emailed to intermediary.management@abs.gov.au.

Price

Microdata access is priced according to the ABS Pricing Policy and Commonwealth Cost Recovery Guidelines. For details refer to ABS Pricing Policy on the ABS website. For microdata prices refer to the Microdata prices web page.

How to apply for access

To apply for access to the microdata, clients should read the How to Apply for Microdata web page.

Clients should familiarise themselves with the User Manual: Responsible Use of ABS CURFs before applying for access.

Australian universities

The ABS/Universities Australia Agreement provides participating universities with access to a range of ABS products and services. This includes access to microdata.

For further information, university clients should refer to the ABS/Universities Australia Agreement web page.

Further information

The Microdata Entry page on the ABS website contains links to microdata related information to assist users to understand and access microdata.

For further information users should email microdata.access@abs.gov.au or telephone (02) 6252 7714.

Data downloads

A Test File for the Outcomes from Vocational Education and Training in Schools Microdata Product has been created to allow researchers/analysts to become familiar with the data structure and prepare code/programs prior to applying for, or commencing, an ABS Data Laboratory session.

The Test File is available from the Data downloads section and may also be available in other file formats on request. For further information users should email microdata.access@abs.gov.au or telephone (02) 6252 7714.

The Test File does not contain real data, and cannot be used for analysis.

Explanatory notes

Data sources

1 This Microdata Product is formed from the integration of 2006 Vocational Education and Training (VET) in Schools data with 2011 Census of Population and Housing (the Census) data.

Vocational Education and Training in Schools

2 VET in Schools is a program which allows students to combine vocational studies with their general education curriculum. Students participating in VET in Schools continue to work towards their Senior Secondary School Certificate, while the VET component of their studies gives them credit towards a nationally recognised VET qualification. VET in Schools programs may involve structured work placements.

3 Data on VET in Schools are collected from the administrative records of enrolments at registered training organisations held by senior secondary assessment authorities, sometimes known as Boards of Studies, or state training authorities in each state and territory. These authorities submit the data to the National Centre for Vocational Education Research where national datasets are compiled.

4 Data for this product are inclusive of all persons aged 15-19 years who were enrolled in a VET in Schools module or unit of competency in 2006.

5 A module is a self-contained block of learning which can be completed on its own or as part of a course and which may also result in the attainment of one or more units of competency.

6 A unit of competency is a component of a competency standard. A unit of competency is a statement of a key function or role in a particular job or occupation.

Census of Population and Housing

7 The Census is undertaken by the ABS every five years, and is collected under the authority of the Census and Statistics Act 1905. For information about the 2011 Census, including collection methodology, please refer to the information provided on the Census 2011 Reference and Information section of the ABS website. Information about the data quality of the Census is also available on the ABS website under Census Data Quality.

Scope

8 The scope of this Microdata Product is persons aged 15-19 years who were enrolled in a VET in Schools module or unit of competency in 2006 and who also responded to the 2011 Census.

Data integration

9 Statistical data integration involves combining information from different administrative and/or survey sources to provide new datasets for statistical and research purposes. Further information on data integration is available on the National Statistical Service website – Data Integration.

10 Data linking is a key part of statistical data integration and involves the technical process of combining records from different source datasets using variables that are shared between the sources. Data linkage is typically performed on records that represent individual persons, rather than aggregates. The most common methods used link records on exact matches for common variables ('deterministic' linkage), or close matches ranked by probabilities that the variables used will result in a true match ('probabilistic' linkage).

Linking 2006 Vocational Education and Training in Schools data to 2011 Census of Population and Housing data

11 VET in Schools records were linked to Census records through exact matches on responses for common variables ('deterministic' linkage). For example, a variable that was common to each dataset was Sex, which had the possible responses of '1' (Male) or '2' (Female). If a record had a response of '1' on both datasets, it would be one step closer to becoming a link. As name and address were not available, matches were sought on various combinations of Postcode, Locality code, Statistical Area 2, Statistical Local Area, Date of birth, Age, Sex, and Country of birth. At least one geographical element, Sex, and Date of birth or Age were kept as a minimum in all combinations that were used to search for links.
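
A single linkage pass of this kind can be sketched as below. This is an illustration under stated assumptions, not the ABS implementation: the field names, identifiers and records are hypothetical, and a link is treated as unique when a VET in Schools record matches exactly one Census record (whether uniqueness was also required in the other direction is not stated here).

```python
from collections import defaultdict


def exact_matches(vet_rows, census_rows, keys):
    """Map each VET record id to the Census ids that agree exactly on every key.

    Records with a missing (None) value in any key are skipped, because
    matches can only be made on valid responses.
    """
    index = defaultdict(list)
    for c in census_rows:
        k = tuple(c.get(x) for x in keys)
        if None not in k:
            index[k].append(c["id"])
    matches = {}
    for v in vet_rows:
        k = tuple(v.get(x) for x in keys)
        if None not in k and k in index:
            matches[v["id"]] = list(index[k])
    return matches


# Hypothetical records; sex uses the '1'/'2' coding described in the text.
vet = [
    {"id": "V1", "postcode": "2600", "dob": "1990-03-14", "sex": "1"},
    {"id": "V2", "postcode": "2600", "dob": "1989-11-02", "sex": "2"},
    {"id": "V3", "postcode": None, "dob": "1990-03-14", "sex": "1"},
]
census = [
    {"id": "C1", "postcode": "2600", "dob": "1990-03-14", "sex": "1"},
    {"id": "C2", "postcode": "2600", "dob": "1989-11-02", "sex": "2"},
    {"id": "C3", "postcode": "2600", "dob": "1989-11-02", "sex": "2"},
]

# One combination of linking variables: a geographic element, Date of birth and Sex.
matches = exact_matches(vet, census, ("postcode", "dob", "sex"))
unique_links = {v: c[0] for v, c in matches.items() if len(c) == 1}
print(unique_links)
```

Here V2 matches two Census records and so is not a unique link on this combination, and V3 cannot be matched because its postcode is missing.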

12 Unique links were taken from each combination of variables and then ranked in ascending order of the duplicate rate of each combination. This duplicate rate was calculated as the number of vocational education and training in schools records that linked to MORE THAN one Census record divided by the number of vocational education and training in schools records that linked to AT LEAST one Census record. Where records matched on more than one combination of variables in the set of unique links the match from the combination with the lower duplicate rate was kept. The theory behind this is that higher duplicate rates point to more common characteristics in the populations you are trying to match, and links that are made on more common characteristics are more likely to be false.
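
The duplicate rate and the ranking of combinations can be sketched as follows. The pass names, record identifiers and match results are hypothetical, and the selection logic is one reading of the paragraph above rather than the exact procedure used.

```python
def duplicate_rate(matches):
    """VET records linked to more than one Census record, divided by
    VET records linked to at least one Census record."""
    linked = [ids for ids in matches.values() if len(ids) >= 1]
    if not linked:
        return 0.0
    return sum(len(ids) > 1 for ids in linked) / len(linked)


def select_links(passes):
    """Keep, for each VET record, the unique link from the combination
    with the lowest duplicate rate.

    passes maps a combination name to {vet_id: [census_ids]}.
    """
    ranked = sorted(passes, key=lambda name: duplicate_rate(passes[name]))
    chosen = {}
    for name in ranked:  # ascending duplicate rate
        for vet_id, census_ids in passes[name].items():
            if len(census_ids) == 1 and vet_id not in chosen:
                chosen[vet_id] = (census_ids[0], name)
    return chosen


# Hypothetical match results from two combinations of linking variables.
passes = {
    "postcode+dob+sex": {"V1": ["C1"], "V2": ["C2", "C3"], "V4": ["C4"], "V5": ["C5"]},
    "sa2+age+sex": {"V1": ["C9"], "V2": ["C2"], "V3": ["C7", "C8"], "V6": ["C6", "C10"]},
}
rates = {name: duplicate_rate(m) for name, m in passes.items()}
chosen = select_links(passes)
print(rates)
print(chosen)
```

V1 is matched by both combinations, and the link from the combination with the lower duplicate rate is the one retained.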

13 The duplicate rates were quite high for the records linked through this process due to the large area geographic variables available for linking. In order to preserve the quality of the linked dataset, only links with lower duplicate rates were kept for analysis. The links that were rejected happened to be those made with Age instead of Date of birth, or Statistical Local Area in place of another geographic variable.

14 Information about data linkage methods used in similar studies can be found in Research Paper: Assessing the Feasibility of Linking 2011 Vocational Education and Training in Schools Data to 2011 Census Data (cat. no. 1351.0.55.044).

Linkage results

15 At the completion of the linkage process, 50.52% (84,412 out of 167,088) of the in-scope VET in Schools records were successfully linked to Census records. This link rate is relatively low when compared to similar projects where education data was linked to the Census, for example - Research Paper: Assessing the Quality of Linking School Enrolment Records to 2011 Census Data: Deterministic Linkage Methods (cat. no. 1351.0.55.045). There is potential to raise the link rate by being less strict with the combinations of linking variables and the duplicate rate cut-off. However, the small increase in the link rate using these methods would be outweighed by a loss in accuracy.
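
The link rate quoted above is the number of linked records as a percentage of in-scope records. A short check of the arithmetic, using only the two counts given in the paragraph (the function name is illustrative):

```python
def link_rate(linked, in_scope):
    """Linked records as a percentage of in-scope records."""
    return 100 * linked / in_scope


rate = link_rate(84_412, 167_088)
print(f"{rate:.2f}%")
```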

16 While only unique links with acceptable duplicate rates were kept, these links still have a small chance of being false. This chance of error is influenced by a few factors. The first factor is the amount of missing or invalid information for the linking variables used. Matches can only be made on valid responses and any of the unique links could have potentially been duplicated in the records with missing or invalid information if that information was present. The table below shows the proportion of in-scope records in each dataset that have missing or incomplete information for the variables used for linkage.

Missing or incomplete information for linking variables (% of in-scope records)

Linking variable       2006 VET in Schools (%)    2011 Census of Population and Housing (%)
Postcode                        0.39                          19.53
Locality code                  36.57                          20.82
Statistical Area 2             44.56                           4.62
Date of birth                   0                              8.96
Sex                             0                              0
Country of birth               48.58                           1.55

The locality codes used for linking were State Suburb (SSC) codes. For more information about SSCs see Australian Statistical Geography Standard (ASGS): Volume 3 - Non ABS Structures (cat. no. 1270.0.55.003).

17 While both sources of data are population counts, VET in Schools students in 2006 may not have filled in a Census form in 2011 because they were no longer a resident of Australia, were abroad temporarily at the time of collection, or were missed for another reason. Similarly to missing information, these people who were missing from the 2011 Census could have created duplicate records for the links that were considered unique. Additionally, there would have been persons in the 2011 Census who did not have a chance to take part in VET in Schools even though they would have been eligible because they arrived in Australia after 2006, were abroad for that year, or were missing for another reason. As this group may have similar characteristics to the persons in the 2011 Census who may have done VET in Schools, some of them could have been linked, escalating the chance of false links.

18 Another potential quality issue stems from the fact that some groups of people may be less likely to link due to their characteristics. These include persons in remote areas who may have poor address information, persons in densely populated areas where many people share similar characteristics, and persons who may not fill in enrolment or Census forms correctly due to language barriers or other reasons. This potential bias can result in a linked dataset that is not representative of the input data and therefore not appropriate for analysis.

19 In order to check the representativeness of the linked data, frequencies were run on demographic variables and compared with the input data. The analysis revealed that some groups were under-represented in the linked data. The groups most affected included:

  • Students in remote areas, particularly in the Northern Territory
  • Aboriginal and Torres Strait Islander students
  • Students born outside of Australia, particularly those born in China and surrounding territories.
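
A representativeness check of this kind can be sketched by comparing each category's share in the linked data with its share in the input data. The categories and counts below are made up for illustration, and the ratio measure is an assumption rather than the measure used in the actual analysis.

```python
from collections import Counter


def proportions(values):
    """Share of each category among the given values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


def representation_ratio(input_values, linked_values):
    """Linked share divided by input share; below 1 suggests under-representation."""
    p_in = proportions(input_values)
    p_link = proportions(linked_values)
    return {k: p_link.get(k, 0.0) / p_in[k] for k in p_in}


# Hypothetical remoteness categories for input and linked records.
input_remoteness = ["Major city"] * 70 + ["Regional"] * 20 + ["Remote"] * 10
linked_remoteness = ["Major city"] * 40 + ["Regional"] * 8 + ["Remote"] * 2

ratios = representation_ratio(input_remoteness, linked_remoteness)
under_represented = sorted(k for k, v in ratios.items() if v < 1)
print(ratios)
print(under_represented)
```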

20 In order to account for the groups that were under-represented and for the low link rate, the linked data was weighted to match the input data. This process is explained in the section below.

21 Consistency checks were also run on common variables within the VET in Schools and Census datasets. The focus was on common variables that were either not used for linkage or only used on some of the linkage passes. Not stated and missing values were excluded from these analyses. The variables tested and findings included:

  • Country of birth - there are 2,815 records (3.3% of the linked records) with a different value reported in VET in Schools to Census
  • Highest level of schooling completed - there are 2,440 records (2.9% of the linked records) with a lower level of schooling reported in 2011 on Census than in 2006 on VET in Schools
  • Highest level of non-school qualification completed - there are 1,403 records (1.7% of the linked records) with a lower level of qualification reported in 2011 on Census than in 2006 on VET in Schools
  • Proficiency in spoken English - there are 39 records (0.5% of the linked records) with a lower level of English proficiency reported in 2011 on Census than in 2006 on VET in Schools.

These inconsistencies could be an indicator of bias or error in linkage. However, they could also be an indicator of issues with quality of reporting on either dataset involved. Unlike the groups under-represented in the linkage, these inconsistencies cannot be addressed through weighting.

Weighting

22 Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, student records. The weight can be considered an indication of how many students in the relevant population are represented by each person in the sample. Weights were created for linked records to enable population estimates to be produced.

23 Estimates from this dataset should be obtained by summing the weights assigned to each linked record, using the variable called WEIGHTS. The weight is a value which indicates how many student population records are represented by the linked record. Weights aim to adjust for the fact that the linked student records may not be representative of all the student records. Weighting should be used to ensure better representation of population sub-groups and to enhance the reliability of linked education data for longitudinal and cross-sectional analysis. Note that only weighted counts will produce an estimate of the total number of persons with the specified characteristic.

24 Weights were benchmarked to the following population groups:

  • Postcode (129 groups), with large Postcodes, those with 800 persons or more, weighted individually and smaller Postcodes grouped together by state
  • Sex, age, and Indigenous status (50 groups)
  • Country of birth (6 groups), with Australia, New Zealand, the United Kingdom, China and surrounding territories, other countries, and not stated / missing responses as separate groups.
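
The calibration method used to meet these benchmarks is not stated here. One common technique for matching several sets of marginal totals at once is raking (iterative proportional fitting), sketched below purely as an assumption. The records, variable names and benchmark totals are hypothetical and far smaller than the real benchmark groups.

```python
def rake(records, benchmarks, iterations=50):
    """Iterative proportional fitting of weights to marginal benchmark totals.

    records:    list of dicts holding each record's category per variable.
    benchmarks: {variable: {category: population total}}.
    Returns a list of weights aligned with records.
    """
    weights = [1.0] * len(records)
    for _ in range(iterations):
        for var, targets in benchmarks.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for r, w in zip(records, weights):
                totals[r[var]] = totals.get(r[var], 0.0) + w
            # Scale weights so this variable's totals hit its benchmarks.
            weights = [w * targets[r[var]] / totals[r[var]]
                       for r, w in zip(records, weights)]
    return weights


# Hypothetical linked records and benchmark totals (both margins sum to 10).
linked = [
    {"sex": "1", "cob": "Australia"},
    {"sex": "1", "cob": "Other"},
    {"sex": "2", "cob": "Australia"},
    {"sex": "2", "cob": "Australia"},
]
benchmarks = {
    "sex": {"1": 5, "2": 5},
    "cob": {"Australia": 7, "Other": 3},
}
weights = rake(linked, benchmarks)
print([round(w, 3) for w in weights])
```

After convergence, summing the weights within any benchmark category reproduces that category's benchmark total.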

25 The weights have a mean value of 2, a median value of 1.9 and range between 1.1 and 9.1.

26 The weighted total of the 84,412 in scope linked records was 170,011.
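
As a quick consistency check on the two preceding paragraphs, the weighted total divided by the number of linked records gives the average weight. Only the two figures quoted above are used.

```python
weighted_total = 170_011
linked_records = 84_412

# Average number of student population records represented by each linked record.
mean_weight = weighted_total / linked_records
print(round(mean_weight, 2))
```

The result rounds to the mean weight of 2 reported above.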

27 Further analysis to assess the weighting involved comparisons of proportions and counts between the in-scope records and the weighted linked records. This analysis revealed that overall, subgroups were appropriately represented at the national level. However, there were some indications that certain groups may be over-represented. These included:

  • Aboriginal students in Tasmania
  • Torres Strait Islander students and both Aboriginal and Torres Strait Islander students in Western Australia
  • Male Torres Strait Islander students in South Australia
  • Torres Strait Islander students in New South Wales
  • Students from Training organisation type 'Technical and Further Education institute'.

28 There were also some indications of groups that may have been under-represented. These included:

  • Male Torres Strait Islander students in Queensland
  • Male Aboriginal students in Victoria
  • Students from Training organisation type 'School - Government'
  • Students from Parent schools identified as 'Technical and Further Education institute' or 'Community-based adult education provider'.

Use of the data

29 Despite the efforts made to assure the quality of the linked dataset and weight it to make it representative, there is still a chance that some of the links made were false and that certain groups (more than those identified above) were either under- or over-represented after weighting. The microdata available through this product should be used with caution.

Previous catalogue number

This release previously used catalogue number 4260.0.55.001.