1 This Microdata Product is formed from the integration of 2006 Vocational Education and Training (VET) in Schools data with 2011 Census of Population and Housing (the Census) data.
Vocational Education and Training in Schools
2 VET in Schools is a program which allows students to combine vocational studies with their general education curriculum. Students participating in VET in Schools continue to work towards their Senior Secondary School Certificate, while the VET component of their studies gives them credit towards a nationally recognised VET qualification. VET in Schools programs may involve structured work placements.
3 Data on VET in Schools are collected from the administrative records of enrolments at registered training organisations held by senior secondary assessment authorities, sometimes known as Boards of Studies, or state training authorities in each state and territory. These authorities submit the data to the National Centre for Vocational Education Research where national datasets are compiled.
4 Data for this product include all persons aged 15-19 years who were enrolled in a VET in Schools module or unit of competency in 2006.
5 A module is a self-contained block of learning which can be completed on its own or as part of a course and which may also result in the attainment of one or more units of competency.
6 A unit of competency is a component of a competency standard. A unit of competency is a statement of a key function or role in a particular job or occupation.
Census of Population and Housing
7 The Census is undertaken by the ABS every five years, and is collected under the authority of the Census and Statistics Act 1905. For information about the 2011 Census, including collection methodology, please refer to the information provided on the Census 2011 Reference and Information section of the ABS website. Information about the data quality of the Census is also available on the ABS website under Census Data Quality.
8 The scope of this Microdata Product is persons aged 15-19 years who were enrolled in a VET in Schools module or unit of competency in 2006 and who also responded to the 2011 Census.
9 Statistical data integration involves combining information from different administrative and/or survey sources to provide new datasets for statistical and research purposes. Further information on data integration is available on the National Statistical Service website – Data Integration.
10 Data linking is a key part of statistical data integration and involves the technical process of combining records from different source datasets using variables that are shared between the sources. Data linkage is typically performed on records that represent individual persons, rather than aggregates. The most common methods used link records on exact matches for common variables ('deterministic' linkage), or close matches ranked by probabilities that the variables used will result in a true match ('probabilistic' linkage).
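The difference between the two approaches can be sketched in a few lines. The deterministic check below requires agreement on every variable, while the probabilistic score sums log-likelihood agreement weights in the usual Fellegi-Sunter style. The field names and the m and u probabilities are illustrative assumptions only, not values from any ABS linkage.

```python
import math

# Hypothetical (m, u) probabilities for each linking variable:
#   m = P(values agree | the two records are a true match)
#   u = P(values agree | the two records are not a match)
M_U = {"sex": (0.98, 0.50), "dob": (0.95, 0.003), "postcode": (0.90, 0.01)}


def deterministic_match(a, b, fields):
    """Deterministic linkage: every listed variable must agree exactly."""
    return all(a[f] == b[f] for f in fields)


def probabilistic_weight(a, b, m_u=M_U):
    """Probabilistic linkage: sum agreement and disagreement weights.

    Higher totals indicate pairs that are more likely to be true matches.
    """
    weight = 0.0
    for field, (m, u) in m_u.items():
        if a[field] == b[field]:
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


person = {"sex": "1", "dob": "1990-03-14", "postcode": "2600"}
same_moved = {"sex": "1", "dob": "1990-03-14", "postcode": "2601"}
stranger = {"sex": "1", "dob": "1988-11-02", "postcode": "2600"}

print(deterministic_match(person, same_moved, M_U))   # no exact match
print(probabilistic_weight(person, same_moved))       # still scores well
print(probabilistic_weight(person, stranger))         # scores lower
```

Under these assumed probabilities, a record that differs only on Postcode fails the deterministic test but still outranks a record that differs on Date of birth.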
Linking 2006 Vocational Education and Training in Schools data to 2011 Census of Population and Housing data
11 VET in Schools records were linked to Census records through exact matches on responses for common variables ('deterministic' linkage). For example, Sex was common to both datasets, with possible responses of '1' (Male) or '2' (Female); a record with a response of '1' on both datasets would be one step closer to becoming a link. As name and address were not available, matches were sought on various combinations of Postcode, Locality code, Statistical Area 2, Statistical Local Area, Date of birth, Age, Sex, and Country of birth. Every combination used to search for links kept, as a minimum, at least one geographical element, Sex, and either Date of birth or Age.
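The minimum requirement on linking combinations can be sketched as follows. This is an illustration only: the field names are hypothetical, the actual combinations used are not listed in these notes, and Age is assumed to have been aligned to a common reference date on both datasets.

```python
from itertools import combinations

# Hypothetical field names for the linking variables named in the text.
GEOGRAPHY = ["postcode", "locality", "sa2", "sla"]
OTHER = ["dob", "age", "sex", "cob"]


def valid_combinations():
    """Yield variable sets with at least one geography, Sex, and DOB or Age."""
    fields = GEOGRAPHY + OTHER
    for size in range(3, len(fields) + 1):
        for combo in combinations(fields, size):
            chosen = set(combo)
            has_geography = bool(chosen & set(GEOGRAPHY))
            has_birth_info = bool(chosen & {"dob", "age"})
            if has_geography and "sex" in chosen and has_birth_info:
                yield combo


def exact_matches(vet_record, census_records, combo):
    """Return Census records that agree with the VET record on every variable.

    A missing value (None) never counts as a match.
    """
    return [
        c for c in census_records
        if all(vet_record[v] is not None and vet_record[v] == c[v] for v in combo)
    ]


def record(**values):
    """Build a record with every linking variable, defaulting to missing."""
    return {**dict.fromkeys(GEOGRAPHY + OTHER), **values}


vet_record = record(postcode="2600", dob="1990-03-14", age=21, sex="1")
census_records = [
    record(postcode="2600", dob="1990-03-14", age=21, sex="1"),
    record(postcode="2600", dob="1990-09-30", age=20, sex="1"),
]
combo = ("postcode", "dob", "sex")
matches = exact_matches(vet_record, census_records, combo)
print(len(matches))
```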
12 Unique links were taken from each combination of variables and then ranked in ascending order of the duplicate rate of each combination. This duplicate rate was calculated as the number of VET in Schools records that linked to MORE THAN one Census record divided by the number of VET in Schools records that linked to AT LEAST one Census record. Where records matched on more than one combination of variables in the set of unique links, the match from the combination with the lower duplicate rate was kept. The theory behind this is that higher duplicate rates point to more common characteristics in the populations being matched, and links that are made on more common characteristics are more likely to be false.
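A minimal sketch of the duplicate rate and the ranking step is given below. It is an illustration under assumptions: the record layout and field names are hypothetical, a "unique" link is taken to mean a VET in Schools record that matches exactly one Census record, and the `max_rate` cut-off is a stand-in for whatever acceptance threshold is applied.

```python
from collections import defaultdict


def link_pass(vet, census, combo):
    """Map each VET record id to the Census ids that match exactly on combo."""
    index = defaultdict(list)
    for c in census:
        key = tuple(c[v] for v in combo)
        if None not in key:                 # missing values cannot match
            index[key].append(c["id"])
    links = {}
    for r in vet:
        key = tuple(r[v] for v in combo)
        if None not in key and key in index:
            links[r["id"]] = index[key]
    return links


def duplicate_rate(links):
    """Records linked to more than one Census record / records linked at all."""
    linked = len(links)
    multiple = sum(1 for ids in links.values() if len(ids) > 1)
    return multiple / linked if linked else 0.0


def best_unique_links(vet, census, combos, max_rate=1.0):
    """Keep each record's unique link from the pass with the lowest duplicate rate."""
    passes = []
    for combo in combos:
        links = link_pass(vet, census, combo)
        passes.append((duplicate_rate(links), combo, links))
    passes.sort(key=lambda p: p[0])         # ascending duplicate rate
    kept = {}
    for rate, combo, links in passes:
        if rate > max_rate:
            break                           # remaining passes are rejected
        for vet_id, ids in links.items():
            if len(ids) == 1 and vet_id not in kept:
                kept[vet_id] = (ids[0], combo, rate)
    return kept


vet = [
    {"id": "v1", "postcode": "2000", "dob": "1990-01-01", "age": 21, "sex": "1"},
    {"id": "v2", "postcode": "2000", "dob": "1990-06-30", "age": 21, "sex": "1"},
    {"id": "v3", "postcode": "3000", "dob": None, "age": 20, "sex": "2"},
]
census = [
    {"id": "c1", "postcode": "2000", "dob": "1990-01-01", "age": 21, "sex": "1"},
    {"id": "c2", "postcode": "2000", "dob": "1990-06-30", "age": 21, "sex": "1"},
    {"id": "c3", "postcode": "3000", "dob": "1991-02-02", "age": 20, "sex": "2"},
]
combos = [("postcode", "age", "sex"), ("postcode", "dob", "sex")]
kept = best_unique_links(vet, census, combos)
print(kept)
```

In this toy data, v3 has no Date of birth and can only link on the Age pass, where two of the three linked records have more than one candidate. Calling `best_unique_links(vet, census, combos, max_rate=0.5)` would therefore reject that link.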
13 The duplicate rates were quite high for the records linked through this process due to the large area geographic variables available for linking. In order to preserve the quality of the linked dataset, only links with lower duplicate rates were kept for analysis. The links that were rejected happened to be those made with Age instead of Date of birth, or Statistical Local Area in place of another geographic variable.
14 Information about data linkage methods used in similar studies can be found in Research Paper: Assessing the Feasibility of Linking 2011 Vocational Education and Training in Schools Data to 2011 Census Data (cat. no. 1351.0.55.044).
15 At the completion of the linkage process, 50.52% (84,412 out of 167,088) of the in-scope VET in Schools records were successfully linked to Census records. This link rate is relatively low when compared to similar projects where education data was linked to the Census, for example - Research Paper: Assessing the Quality of Linking School Enrolment Records to 2011 Census Data: Deterministic Linkage Methods (cat. no. 1351.0.55.045). There is potential to raise the link rate by being less strict with the combinations of linking variables and the duplicate rate cut-off. However, the small increase in the link rate using these methods would be outweighed by a loss in accuracy.
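The link rate quoted above follows directly from the two counts, as this short check shows. The variable names are illustrative.

```python
linked = 84_412        # VET in Schools records linked to a Census record
in_scope = 167_088     # in-scope VET in Schools records

link_rate = linked / in_scope
unlinked = in_scope - linked

print(f"Link rate: {link_rate:.2%}")        # Link rate: 50.52%
print(f"Unlinked records: {unlinked:,}")    # Unlinked records: 82,676
```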
16 While only unique links with acceptable duplicate rates were kept, these links still have a small chance of being false. This chance of error is influenced by a few factors. The first factor is the amount of missing or invalid information for the linking variables used. Matches can only be made on valid responses and any of the unique links could have potentially been duplicated in the records with missing or invalid information if that information was present. The table below shows the proportion of in-scope records in each dataset that have missing or incomplete information for the variables used for linkage.
The locality codes used for linking were State Suburb (SSC) codes. For more information about SSCs, see Australian Statistical Geography Standard (ASGS): Volume 3 - Non ABS Structures (cat. no. 1270.0.55.003).
MISSING OR INCOMPLETE INFORMATION
| |2006 Vocational Education and Training in Schools|2011 Census of Population and Housing|
|Statistical Area 2| | |
|Date of birth| | |
|Country of birth| | |
While both sources of data are population counts, VET in Schools students in 2006 may not have filled in a Census form in 2011 because they were no longer a resident of Australia, were abroad temporarily at the time of collection, or were missed for another reason. As with missing information, people who were missing from the 2011 Census could have created duplicate records for the links that were considered unique. Additionally, there would have been persons in the 2011 Census who did not have a chance to take part in VET in Schools, even though they would have been eligible, because they arrived in Australia after 2006, were abroad for that year, or were missing for another reason. As this group may have similar characteristics to the persons in the 2011 Census who may have done VET in Schools, some of them could have been linked, increasing the chance of false links.
Another potential quality issue stems from the fact that some groups of people may be less likely to link due to their characteristics. These include persons in remote areas who may have poor address information, persons in densely populated areas where many people share similar characteristics, and persons who may not fill in enrolment or Census forms correctly due to language barriers or other reasons. This potential bias can result in a linked dataset that is not representative of the input data and therefore not appropriate for analysis.
In order to check the representativeness of the linked data, frequencies were run on demographic variables and compared with the input data. The analysis revealed that some groups were under-represented in the linked data. The groups most affected included:
- Students in remote areas, particularly in the Northern Territory
- Aboriginal and Torres Strait Islander students
- Students born outside of Australia, particularly those born in China and surrounding territories.
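A representativeness check of this kind can be sketched as a comparison of category proportions between the input and linked records. The variable name and toy counts below are hypothetical and are used only to show the calculation.

```python
from collections import Counter


def proportions(records, variable):
    """Share of records in each category of a demographic variable."""
    counts = Counter(r[variable] for r in records)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()} if total else {}


def representation_gaps(input_records, linked_records, variable):
    """Linked share minus input share; negative values mean under-represented."""
    p_input = proportions(input_records, variable)
    p_linked = proportions(linked_records, variable)
    return {k: p_linked.get(k, 0.0) - p_input[k] for k in p_input}


# Toy data: 20% of input records are remote, but only 10% of linked records.
input_records = [{"remoteness": "Major city"}] * 16 + [{"remoteness": "Remote"}] * 4
linked_records = [{"remoteness": "Major city"}] * 9 + [{"remoteness": "Remote"}] * 1

gaps = representation_gaps(input_records, linked_records, "remoteness")
for category, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{category}: {gap:+.1%}")
```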
In order to account for the groups that were under-represented and for the low link rate, the linked data was weighted to match the input data. This process is explained in the section below.
Consistency checks were also run on common variables within the VET in Schools and Census datasets. The focus was on common variables that were either not used for linkage or only used on some of the linkage passes. Not stated and missing values were excluded from these analyses. The variables tested and findings included:
- Country of birth - there are 2,815 records (3.3% of the linked records) with a different value reported in VET in Schools to Census
- Highest level of schooling completed - there are 2,440 records (2.9% of the linked records) with a lower level of schooling reported in 2011 on Census than in 2006 on VET in Schools
- Highest level of non-school qualification completed - there are 1,403 records (1.7% of the linked records) with a lower level of qualification reported in 2011 on Census than in 2006 on VET in Schools
- Proficiency in spoken English - there are 39 records (0.5% of the linked records) with a lower level of English proficiency reported in 2011 on Census than in 2006 on VET in Schools.
These inconsistencies could be an indicator of bias or error in linkage; however, they could also be an indicator of issues with quality of reporting on either dataset involved. Unlike the groups under-represented in the linkage, these inconsistencies cannot be addressed through weighting.
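The consistency checks described above amount to counting linked pairs whose values disagree, after dropping not stated and missing values. The sketch below assumes a hypothetical layout in which each linked pair is a (VET in Schools record, Census record) tuple and ordered variables are coded as integers.

```python
MISSING = {None, "", "Not stated"}


def count_inconsistent(linked_pairs, variable, is_inconsistent):
    """Return (flagged, compared) for one common variable.

    Pairs with a not stated or missing value on either side are excluded.
    """
    compared = [
        (vet[variable], census[variable])
        for vet, census in linked_pairs
        if vet[variable] not in MISSING and census[variable] not in MISSING
    ]
    flagged = sum(1 for v, c in compared if is_inconsistent(v, c))
    return flagged, len(compared)


# Hypothetical linked pairs; "school" is the highest year of schooling completed.
linked_pairs = [
    ({"cob": "Australia", "school": 12}, {"cob": "Australia", "school": 12}),
    ({"cob": "China", "school": 11}, {"cob": "Australia", "school": 10}),
    ({"cob": None, "school": 10}, {"cob": "India", "school": 12}),
]

# Country of birth should not change between collections.
cob_result = count_inconsistent(linked_pairs, "cob", lambda v, c: v != c)

# Schooling should not be lower in 2011 than it was in 2006.
school_result = count_inconsistent(linked_pairs, "school", lambda v, c: c < v)

print(cob_result, school_result)
```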
Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, student records. The weight can be considered an indication of how many students in the relevant population are represented by each person in the sample. Weights were created for linked records to enable population estimates to be produced.
Estimates from this dataset should be obtained by summing the weights assigned to each linked record, using the variable called WEIGHTS. The weight is a value which indicates how many student population records are represented by the linked record. Weights aim to adjust for the fact that the linked student records may not be representative of all the student records. Weighting should be used to ensure better representation of population sub-groups and to enhance the reliability of linked education data for longitudinal and cross-sectional analysis. Note that only weighted counts will produce an estimate of the total number of persons with the specified characteristic.
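As a minimal sketch, a weighted estimate is the sum of WEIGHTS over the linked records that have the characteristic of interest. The record layout, the SEX field and the weight values below are made up for illustration; only the WEIGHTS variable name comes from the text.

```python
def weighted_count(records, predicate=lambda r: True):
    """Estimate a population count by summing WEIGHTS over matching records."""
    return sum(r["WEIGHTS"] for r in records if predicate(r))


# Hypothetical linked records.
linked_records = [
    {"SEX": "1", "WEIGHTS": 1.5},
    {"SEX": "2", "WEIGHTS": 2.0},
    {"SEX": "2", "WEIGHTS": 2.5},
]

total_estimate = weighted_count(linked_records)
female_estimate = weighted_count(linked_records, lambda r: r["SEX"] == "2")
female_unweighted = sum(1 for r in linked_records if r["SEX"] == "2")

print(total_estimate, female_estimate, female_unweighted)
```

The unweighted count only describes the linked records themselves, whereas the weighted sum estimates the number of students in the population.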
Weights were benchmarked to the following population groups:
- Postcode (129 groups), with large Postcodes, those with 800 persons or more, weighted individually and smaller Postcodes grouped together by state
- Sex, age, and Indigenous status (50 groups)
- Country of birth (6 groups), with Australia, New Zealand, the United Kingdom, China and surrounding territories, other countries, and not stated / missing responses as separate groups.
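These notes do not state which algorithm was used to benchmark the weights. One common way to make weights agree with several sets of benchmark totals at once is raking (iterative proportional fitting), sketched below as an assumption rather than a description of the ABS method. The variables, categories and benchmark totals are invented for illustration.

```python
def rake(records, benchmarks, iterations=100, tol=1e-9):
    """Adjust unit weights until weighted totals match each benchmark margin.

    benchmarks maps a variable name to {category: population total}. Every
    category present in the records must have a benchmark total.
    """
    weights = [1.0] * len(records)
    for _ in range(iterations):
        max_change = 0.0
        for variable, targets in benchmarks.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for r, w in zip(records, weights):
                totals[r[variable]] = totals.get(r[variable], 0.0) + w
            # Scale each record so its category hits the benchmark total.
            for i, r in enumerate(records):
                factor = targets[r[variable]] / totals[r[variable]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights


def weighted_total(records, weights, variable, category):
    """Sum of weights for records in one category."""
    return sum(w for r, w in zip(records, weights) if r[variable] == category)


# Four hypothetical linked records and made-up population benchmarks.
records = [
    {"sex": "1", "cob": "Australia"},
    {"sex": "1", "cob": "Other"},
    {"sex": "2", "cob": "Australia"},
    {"sex": "2", "cob": "Australia"},
]
benchmarks = {
    "sex": {"1": 4.0, "2": 6.0},
    "cob": {"Australia": 7.0, "Other": 3.0},
}

weights = rake(records, benchmarks)
print([round(w, 3) for w in weights])
```

Each pass rescales the weights to one set of groups in turn, so repeated passes are needed before all sets of totals are matched together.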
The weights have a mean value of 2, a median value of 1.9 and range between 1.1 and 9.1.
The weighted total of the 84,412 in-scope linked records was 170,011.
Further analysis to assess the weighting involved comparisons of proportions and counts between the in-scope records and the weighted linked records. This analysis revealed that overall, subgroups were appropriately represented at the national level. However, there were some indications that certain groups may be over-represented. These included:
- Aboriginal students in Tasmania
- Torres Strait Islander students and both Aboriginal and Torres Strait Islander students in Western Australia
- Male Torres Strait Islander students in South Australia
- Torres Strait Islander students in New South Wales
- Students from Training organisation type 'Technical and Further Education institute'.
There were also some indications of groups that may have been under-represented. These included:
- Male Torres Strait Islander students in Queensland
- Male Aboriginal students in Victoria
- Students from Training organisation type 'School - Government'
- Students from Parent schools identified as 'Technical and Further Education institute' or 'Community-based adult education provider'.
USE OF THE DATA
Despite the efforts made to assure the quality of the linked dataset and weight it to make it representative, there is still a chance that some of the links made were false and that certain groups (more than those identified above) were either under- or over-represented after weighting. The microdata available through this product should be used with caution.