Australian Bureau of Statistics
2080.5 - Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2011
Latest Issue released at 11:30 AM (Canberra time) 18/12/2013. First issue.
WEIGHTING THE ACLD
Figure 1: IN SCOPE POPULATION FOR THE AUSTRALIAN CENSUS LONGITUDINAL DATASET, 2006-2011
DESCRIPTION OF WEIGHTING PROCESS FOR LONGITUDINAL WEIGHTS
Positive weights were calculated for each linked record on the ACLD, that is, each 2006 sample record that was successfully linked to a 2011 Census record. No weights were calculated for unlinked records. The resulting weights measure how many population units each person represents, taking into account both the likelihood that the person was linked and the general composition of the in-scope population. The weights consist of four components. The first two, a design weight and an undercoverage adjustment, reflect the probability of a person being selected for the 2006 sample and the people missed by the 2006 Census. The third and fourth components adjust the weight to the longitudinal population in scope of both Censuses: the third adjusts for missed links, and the fourth benchmarks the weights to the relevant population and to subpopulations that would otherwise be at risk of under-representation. Each component is described in detail below.
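Because each component multiplies the previous one, the final weight of a linked record can be summarised as a single product. In the sketch below, p_i (the estimated propensity to link for record i, after the floors described below) and g_i (the multiplicative adjustment produced by calibration) are our own notation rather than the paper's:

```latex
w_i = \underbrace{20}_{W_1}
      \times \underbrace{\frac{1}{1 - 0.027}}_{\text{undercoverage}}
      \times \underbrace{\frac{1}{\hat{p}_i}}_{\text{missed links}}
      \times \underbrace{g_i}_{\text{calibration}}
```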
For a sample survey, the design weight needs to take into account any differences in the probability of selection arising from the survey design. Because the 2006 sample of the ACLD was drawn from a population Census by random selection, the design weight for the ACLD is simply the inverse of the probability of selection. With a selection probability of 1 in 20 (5%), the design weight is:
W1 = 20.
To represent the full 2006 Estimated Resident Population (ERP), the design weight was then adjusted for the small proportion of people who were in scope for the 2006 Census but did not complete a Census form. Although this proportion varies substantially between demographic groups, the overall 2006 Census net undercount rate of 2.7% was applied to all records for simplicity. This resulted in an undercoverage adjusted weight of:
W2 = W1 x (1 / (1-0.027))
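Evaluating the two formulas directly shows that every record starts calibration-free at a weight of about 20.555. The Python sketch below is only an arithmetic check; the constant and function names are illustrative and not taken from the ACLD processing system.

```python
SAMPLING_FRACTION = 1 / 20  # 1 in 20 (5%) random sample of 2006 Census records
NET_UNDERCOUNT = 0.027      # 2006 Census net undercount, applied to all records

def design_weight(sampling_fraction: float = SAMPLING_FRACTION) -> float:
    """W1: inverse of the probability of selection."""
    return 1 / sampling_fraction

def undercoverage_weight(w1: float, undercount: float = NET_UNDERCOUNT) -> float:
    """W2: scale W1 up so the sample represents the full 2006 ERP."""
    return w1 * (1 / (1 - undercount))

w1 = design_weight()
w2 = undercoverage_weight(w1)
print(f"W1 = {w1:.0f}, W2 = {w2:.3f}")  # W1 = 20, W2 = 20.555
```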
The third component accounts for missed links, that is, 2006 sample records that had a corresponding 2011 Census record but were not linked to it. No attempt was made to correct for false links. The missed link adjusted weight is the product of the undercoverage adjusted weight and the inverse of the estimated propensity to link:
W3 = W2 x (inverse of the estimated propensity to link)
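The reasoning behind an inverse-propensity adjustment is that, if the propensities are accurate, the links in a group carry the weight of every sampled record in that group, linked or not. A minimal Python sketch with a made-up group of records (the function name is hypothetical):

```python
W2 = 20 / (1 - 0.027)  # undercoverage adjusted weight, identical for all records

def missed_link_weight(w2: float, propensity: float) -> float:
    """W3 = W2 x (1 / estimated propensity to link)."""
    if not 0.0 < propensity <= 1.0:
        raise ValueError("estimated propensity must lie in (0, 1]")
    return w2 / propensity

# Hypothetical group: 10 sample records, each with estimated propensity 0.8,
# of which 8 were actually linked.
w3 = missed_link_weight(W2, 0.8)
ratio = (8 * w3) / (10 * W2)
print(round(ratio, 6))  # 1.0: the 8 links carry the weight of all 10 records
```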
The propensity to link was estimated using a logistic regression model that was applied to the 2006 sample, with the response variable being the link status. The logistic regression model describes a relationship between a 2006 sample record's propensity to link and its values for a range of 2006 Census variables such as Indigenous status, marital status, country of birth, language spoken at home and English proficiency, labour force participation and occupation, educational attainment, mobility (whether moved in the preceding year) and remoteness. The estimated propensity to link varied considerably between records.
Two separate models were applied to the 2006 sample. The first model was applied to people under the age of 15 years on 2006 Census night. This model excluded the variables that were not applicable to people under 15 years of age, such as marital status. The second model was applied to the remainder of the sample (persons aged 15 years or over in 2006).
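The two-model set-up can be sketched as follows. The paper does not say which fitting routine was used; plain gradient ascent is used here only to keep the example dependency-free, and the indicators `moved_last_year`, `remote` and `married` are hypothetical stand-ins for the much longer list of 2006 Census variables. The only structural points taken from the text are the split at age 15 and the exclusion of marital status from the under-15 model.

```python
import math

# Hypothetical 0/1 predictors; marital status is excluded for under-15s.
FEATURES = {
    "under15": ["moved_last_year", "remote"],
    "15plus": ["moved_last_year", "remote", "married"],
}

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(beta, x):
    """Estimated propensity to link; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * xj for b, xj in zip(beta[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Maximise the mean log-likelihood by batch gradient ascent."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for x, yi in zip(X, y):
            err = yi - predict(beta, x)  # link status (0/1) minus prediction
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

def model_group(record):
    """Route a record to a model by age on 2006 Census night."""
    return "under15" if record["age"] < 15 else "15plus"

def fit_age_models(records):
    """Fit one propensity-to-link model per 2006 age group."""
    models = {}
    for group, features in FEATURES.items():
        subset = [r for r in records if model_group(r) == group]
        X = [[r[f] for f in features] for r in subset]
        y = [r["linked"] for r in subset]
        models[group] = fit_logistic(X, y)
    return models

def propensity(models, record):
    """Estimated propensity from the model matching the record's age group."""
    group = model_group(record)
    return predict(models[group], [record[f] for f in FEATURES[group]])
```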
Each model was initially fitted to a training dataset consisting of 75% of the respective records. For each model, an out-of-sample, Hosmer-Lemeshow-type analysis was applied to the remaining 25% of records to identify the ranges of estimated propensity in which the model fitted poorly. The model for people aged under 15 years significantly underestimated linkage rates where the estimated propensity was below 0.65. To correct this, all links for people aged under 15 years on 2006 Census night with an estimated propensity below 0.65 had their estimated propensity set to 0.65. Similarly, all links for people aged 15 years or over on 2006 Census night with an estimated propensity below 0.61 had their estimated propensity set to 0.61.
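The validation and flooring steps can be sketched in Python. The paper does not state how the 75/25 split was drawn or how records were grouped for the Hosmer-Lemeshow-type check, so this sketch assumes a random split and ten equal-width propensity bins (the classical Hosmer-Lemeshow test instead uses groups of roughly equal size ordered by predicted risk). Bins where the observed link rate sits well above the mean prediction indicate the kind of range in which a model underestimates linkage. The floors 0.65 and 0.61 are the values given in the text; the function names are hypothetical.

```python
import random

def train_test_split(records, train_frac=0.75, seed=2006):
    """Randomly split records into a training set and a held-out set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def calibration_table(predicted, observed, n_bins=10):
    """Group held-out records by predicted propensity and compare the mean
    prediction with the observed link rate in each non-empty bin.
    Rows are (bin_low, bin_high, n, mean_predicted, observed_rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    rows = []
    for i, b in enumerate(bins):
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            rows.append((i / n_bins, (i + 1) / n_bins, len(b), mean_p, rate))
    return rows

def floor_propensity(p, age_2006, child_floor=0.65, adult_floor=0.61):
    """Raise a low estimated propensity to the floor for the record's age group."""
    return max(p, child_floor if age_2006 < 15 else adult_floor)
```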
The missed link adjustment assumes that the ACLD contains no false links and that every 2006 sample record that was not linked did have a corresponding 2011 Census record. As with many linked datasets, neither assumption holds for the ACLD. As a result, the missed link adjustment corrects not only for missed links but also for 2006 sample records that were not linked because they had no 2011 Census record. The adjustment therefore also erroneously corrects for persons who died between the 2006 and 2011 Census nights, persons who moved overseas between the two Census nights and (of less concern, because addressing it is an objective of the calibration component) persons who were living in Australia on 2011 Census night but were not counted. Furthermore, records that are less likely to be linked are intuitively expected to be more likely to be linked incorrectly, so giving these links a higher missed link adjusted weight can increase the influence of false links in the ACLD. The calibration component partly remedies the over-representation of persons who have died or moved overseas.
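A small hypothetical example makes this bias concrete. Suppose, purely for illustration, that 1,000 sampled records share the same undercoverage adjusted weight, 60 of those people died or moved overseas before 2011, and 40 were genuine missed links. A model fitted to link status alone sees 100 unlinked records and cannot tell the two kinds apart:

```python
W2 = 20 / (1 - 0.027)

n_sample = 1000   # hypothetical 2006 sample records in one group
n_no_record = 60  # died or moved overseas: no 2011 Census record exists
n_missed = 40     # had a 2011 record but were not linked
n_linked = n_sample - n_no_record - n_missed  # 900

# The model only observes linked vs not linked, so its propensity is 900/1000.
p_hat = n_linked / n_sample
w3 = W2 / p_hat

represented = n_linked * w3               # weighted count of the links
in_scope = (n_sample - n_no_record) * W2  # people alive and in Australia in 2011
overstatement = represented / in_scope
print(round(overstatement, 3))  # 1.064, i.e. about 6% too many
```

The links end up representing all 1,000 sampled people, including the 60 who are no longer part of the longitudinal population. Benchmarks built from the 2011 ERP exclude those people, which is why calibration can pull the total back down to some extent.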
Odds ratios and accompanying Wald confidence intervals for the predictor variables in the first model (persons aged under 15 years in 2006) are given in Table A.1. For each characteristic a comparison group is selected, and the odds ratio for each other category is the ratio of that category's odds of being linked to the odds for the comparison group. For instance, the odds ratios by age group in 2006 in Table A.1 show that those aged 8-13 years were less likely to be linked than those aged 0-7 years (the comparison group), but more likely than those aged 14 years. By contrast, the odds ratios for school type in 2006 show that persons attending Catholic schools were more likely to be linked than those attending government schools (the comparison group).
Table A.1 - ODDS RATIOS FROM THE LOGISTIC REGRESSION MODEL, Persons aged under 15 years, 2006
(a) Includes Supplementary codes.
(b) Includes other school sector and pre-school.
(c) Includes Migratory, Offshore and Shipping Zones and No usual address.
Source: Australian Census Longitudinal Dataset, 2006-2011
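In a logistic regression each coefficient is a log odds ratio relative to the comparison group, and a Wald interval is formed on the log scale and then exponentiated. The Python sketch below shows both calculations; the link rates, coefficient and standard error are invented for illustration and are not values from Tables A.1 or A.2.

```python
import math

def odds(p: float) -> float:
    """Odds of being linked given a link probability p."""
    return p / (1.0 - p)

def odds_ratio(p_group: float, p_comparison: float) -> float:
    """Odds of being linked in a category relative to the comparison group."""
    return odds(p_group) / odds(p_comparison)

def wald_interval(beta: float, se: float, z: float = 1.96):
    """Odds ratio exp(beta) with its Wald interval exp(beta +/- z * se),
    where beta is a fitted logistic coefficient and se its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical link rates: 80% in a category vs 90% in the comparison group.
print(round(odds_ratio(0.80, 0.90), 3))  # 0.444 (< 1: less likely to be linked)

# Hypothetical coefficient and standard error for one category.
or_, lo, hi = wald_interval(-0.30, 0.05)
print(f"OR {or_:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

An interval that lies entirely below 1 (or entirely above 1) is what supports statements such as "less likely to be linked than the comparison group".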
Odds ratios and accompanying Wald confidence intervals for the predictor variables in the second model (persons aged 15 years or over in 2006) are given in Table A.2. A wider range of variables was available for this age group, and there are some differences between the two models. For instance, English speaking proficiency appears to have a detrimental effect on the propensity to link for persons aged under 15 years, but no clear effect for those aged 15 years or over.
Table A.2 - ODDS RATIOS FROM LOGISTIC REGRESSION MODEL, Persons aged 15 years or over, 2006
(a) Includes persons who did not go to school.
(b) Includes Migratory, Offshore and Shipping Zones and No usual address.
Source: Australian Census Longitudinal Dataset, 2006-2011
The missed link adjusted weight was calibrated so that the weighted counts of ACLD links equal estimates of the longitudinal population size at the national level and at selected sub-national levels. For the ACLD, weights were calibrated to two sets of benchmarks simultaneously using a 'raking' tool, a program that derives record-level weights by making repeated passes through the unit records, adjusting the weights to each set of benchmarks in turn, until the weights converge to a satisfactory solution. To guard against the tool producing calibrated weights below one, the lower bound for each calibrated weight was set to 20% of its missed link adjusted weight. Upper bounds were not needed because no extremely high weights were produced.
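Raking (iterative proportional fitting) can be sketched in a few lines. The ABS tool is not documented here, so the Python below is a generic version under stated assumptions: each benchmark set is supplied as a cell label per record plus a positive target total per cell, passes alternate between the benchmark sets, and any weight that would fall below 20% of its starting (missed link adjusted) weight is clipped to that bound. The names `rake` and `_cell_totals` are hypothetical.

```python
def _cell_totals(w, cell_of):
    """Sum of current weights within each benchmark cell."""
    totals = {}
    for wi, c in zip(w, cell_of):
        totals[c] = totals.get(c, 0.0) + wi
    return totals

def rake(start, margins, lower_frac=0.2, max_iter=100, tol=1e-8):
    """Rake unit-record weights to several sets of benchmarks.

    start   -- starting (missed link adjusted) weight for each linked record
    margins -- list of (cell_of, targets) pairs: cell_of[i] is record i's cell
               in that benchmark set, targets[cell] is the benchmark total
    """
    w = list(start)
    lower = [lower_frac * s for s in start]  # floor: 20% of the starting weight
    for _ in range(max_iter):
        # One pass per benchmark set: scale each cell to its target, then clip.
        for cell_of, targets in margins:
            totals = _cell_totals(w, cell_of)
            w = [max(wi * targets[c] / totals[c], lo)
                 for wi, c, lo in zip(w, cell_of, lower)]
        # Stop once every benchmark is met to within a relative tolerance.
        gap = 0.0
        for cell_of, targets in margins:
            totals = _cell_totals(w, cell_of)
            gap = max(gap, max(abs(totals.get(c, 0.0) - t) / t
                               for c, t in targets.items()))
        if gap < tol:
            break
    return w

# Hypothetical example: four linked records, each starting at a weight of 25,
# raked to a sex benchmark and a state benchmark.
weights = rake(
    [25.0] * 4,
    [(["M", "M", "F", "F"], {"M": 60, "F": 40}),
     (["NSW", "Vic", "NSW", "Vic"], {"NSW": 70, "Vic": 30})],
)
print(weights)  # [42.0, 18.0, 28.0, 12.0]
```

If the lower bound binds, the benchmarks may not all be met exactly; in that case this sketch simply stops after `max_iter` passes.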
The first set of benchmarks comprised population benchmarks by state/territory, by interstate migration, by sex, by ten-year age group. There were two interstate migration groups: the population that resided in the given state/territory on both 30 June 2006 and 30 June 2011, and the population that resided in the given state/territory on 30 June 2011 but in a different state/territory on 30 June 2006 (i.e. interstate arrivals). The interstate migration groups served to correct for the lower linkage rates among people who moved interstate between 30 June 2006 and 30 June 2011. The second set of benchmarks comprised Indigenous status (according to the 2011 Census) by state/territory.
Note that the ERP by Indigenous status for the period 2006-2011 is currently being revised in view of a higher-than-expected intercensal increase in the number of Aboriginal and Torres Strait Islander persons (see Census of Population and Housing: Understanding the Increase in Aboriginal and Torres Strait Islander Counts, 2006-2011, ABS cat. no. 2077.0). As a result, weights for the ACLD will be reviewed when this data becomes available.
The first set of benchmarks was estimated by dividing the 30 June 2011 ERP into state/territory by interstate migration by sex by ten-year age groups, and then subtracting from each group the number of overseas arrivals between 30 June 2006 and 30 June 2011. Births between 30 June 2006 and 30 June 2011 were automatically excluded because the youngest age group consisted of those aged 5-14 years on 30 June 2011. Groups with very small ERPs were merged. For example, the male ERPs for those aged 75-84 years and those aged 85 years or over on 30 June 2011 who resided in the Northern Territory during the intercensal period were summed. As a result, the first set of benchmarks comprises 275 groups defined by state/territory, interstate migration, sex and age. These benchmarks are displayed in Table A.3.
Table A.3: BENCHMARKS OF THE LONGITUDINAL POPULATION, By state/territory, interstate migration status, sex and age, 2006-2011
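The benchmark arithmetic for the first set is a subtraction followed by merging of small cells. In this Python sketch the cell keys, the migration-group labels and all counts are hypothetical; only the merge of Northern Territory males aged 75-84 and 85 or over mirrors the example in the text. Because the youngest age group starts at 5-14, no births cell ever appears.

```python
def longitudinal_benchmarks(erp_2011, overseas_arrivals, merge_into=None):
    """Benchmark = 30 June 2011 ERP minus overseas arrivals since 30 June 2006,
    summed into merged cells where a cell's ERP is too small.

    Keys are (state, migration_group, sex, age_group) tuples; merge_into maps
    a small cell to the combined cell it is added into.
    """
    merge_into = merge_into or {}
    bench = {}
    for cell, erp in erp_2011.items():
        target = merge_into.get(cell, cell)
        bench[target] = bench.get(target, 0) + erp - overseas_arrivals.get(cell, 0)
    return bench

# Hypothetical counts; keys are (state, migration group, sex, age group in 2011).
erp = {
    ("NT", "same_state", "M", "75-84"): 500,
    ("NT", "same_state", "M", "85+"): 100,
    ("NSW", "same_state", "F", "5-14"): 1000,
}
arrivals = {
    ("NT", "same_state", "M", "75-84"): 20,
    ("NSW", "same_state", "F", "5-14"): 50,
}
merged = ("NT", "same_state", "M", "75+")
merge_into = {
    ("NT", "same_state", "M", "75-84"): merged,
    ("NT", "same_state", "M", "85+"): merged,
}
bench = longitudinal_benchmarks(erp, arrivals, merge_into)
print(bench[merged])  # 580
```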
After applying these benchmarks, the data was assessed for how well subpopulations were represented. Aboriginal and Torres Strait Islander persons were under-represented at this stage, partly owing to intercensal growth in this subpopulation (see Census of Population and Housing: Understanding the Increase in Aboriginal and Torres Strait Islander Counts, 2006-2011, ABS cat. no. 2077.0). At the time of publication, finalised ERP data by Indigenous status for 2006 was unavailable; this data is due for release early in 2014. As a result, the second set of benchmarks was estimated by applying the 2006-2011 growth rate of the Aboriginal and Torres Strait Islander population from earlier projections (Experimental Estimates and Projections, Aboriginal and Torres Strait Islander Australians, 1991 to 2021, ABS cat. no. 3238.0, B series) to the 2011 ERP for Aboriginal and Torres Strait Islander persons, and then removing deaths and interstate departures between 2006 and 2011. Overseas departures were not estimated or removed.
The Indigenous status benchmarks comprised 17 state/territory by Indigenous status groups, where Indigenous status was either 'Aboriginal/Torres Strait Islander' or 'Not Aboriginal/Torres Strait Islander' (the latter including both non-Indigenous and not stated). Because of its small population, the Other Territories benchmark was not disaggregated by Indigenous status. The benchmarks by Indigenous status are displayed in Table A.4.
Table A.4: BENCHMARKS OF THE LONGITUDINAL POPULATION, By state/territory and Indigenous status, 2011
Source: adjusted 2011 Estimated Resident Population
The mean weight for selected characteristics indicates how far the weight has been increased or reduced from the initial weight implied by the probability of selection (20) in order to address missed links and Census undercount.
Table A.5 shows that the mean weight for the linked records is 23.3, that is, each person in the linked dataset represents on average just over 23 persons in the population. The largest weight was 168 and the smallest was 4. The mean weight was higher for Aboriginal and Torres Strait Islander persons and for people who had moved, particularly those who moved interstate.
Table A.5: DESCRIPTIVE STATISTICS FOR WEIGHTS, By selected characteristics, 2011
Source: Australian Census Longitudinal Dataset, 2006-2011
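Statistics of the kind shown in Table A.5 are group summaries of the final weights. Since calibrated weights sum to the benchmark total, the overall mean weight is simply the benchmark population divided by the number of linked records. A minimal Python sketch with hypothetical weights and group labels:

```python
from statistics import mean

def weight_summary(weights, groups):
    """Count, mean, minimum and maximum weight for each category."""
    by_group = {}
    for w, g in zip(weights, groups):
        by_group.setdefault(g, []).append(w)
    return {
        g: {"n": len(ws), "mean": mean(ws), "min": min(ws), "max": max(ws)}
        for g, ws in by_group.items()
    }

# Hypothetical final weights for five linked records.
weights = [21.0, 24.0, 35.0, 22.5, 41.0]
moved = ["did not move", "did not move", "interstate", "did not move", "interstate"]
summary = weight_summary(weights, moved)
print(summary["interstate"]["mean"])  # 38.0
```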
This page last updated 17 December 2013