Each record on the person level also contains 30 replicate weights and, by using these weights, it is possible to calculate standard errors for weighted estimates produced from the microdata. This method is known as the 30 group Jack-knife variance estimator.
Under the Jackknife method of replicate weighting, weights were derived as follows:
Replicate weights enable variances of estimates to be calculated relatively simply. They also enable unit record analyses, such as chi-square tests and logistic regression, to be conducted in a way that takes the sample design into account. A replicate estimate of any variable of interest can be calculated from each of the 30 replicate groups, giving 30 replicate estimates. The distribution of this set of replicate estimates, in conjunction with the full sample estimate (based on the general weight), is then used to approximate the variance of the full sample estimate.
To obtain the standard error of a weighted estimate y, the same estimate is calculated using each of the 30 replicate weights. The variability between these replicate estimates (denoted y(g) for group number g) is used to measure the standard error of the original weighted estimate y using the formula:
$$SE(y) = \sqrt{\frac{29}{30} \sum_{g=1}^{30} \left(y(g) - y\right)^{2}}$$
where:
g = the replicate group number
y(g) = the weighted estimate, having applied the weights for replicate group g
y = the weighted estimate from the sample.
The 30 group Jack-knife method can be applied not just to estimates of the population total, but also where the estimate y is a function of estimates of the population total, such as a proportion, difference or ratio. For more information on the 30 group Jack-knife method of SE estimation, see Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee), July 1999 (cat. no. 1352.0.55.029).
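As a rough illustration of the calculation described above, the following Python sketch estimates a weighted proportion and its standard error from replicate weights. It assumes the usual delete-a-group form of the estimator, SE(y) = sqrt((29/30) × sum over g of (y(g) − y)²). The record layout (`flag`, `weight`, `rep_weights`) and the toy data are hypothetical and are not the field names used in the microdata.

```python
import math

N_GROUPS = 30  # number of replicate groups described in the text


def jackknife_se(full_estimate, replicate_estimates):
    """Delete-a-group jackknife SE (assumed form: sqrt((k-1)/k * sum of squared deviations))."""
    k = len(replicate_estimates)
    sum_sq = sum((y_g - full_estimate) ** 2 for y_g in replicate_estimates)
    return math.sqrt((k - 1) / k * sum_sq)


def weighted_proportion(records, weights):
    """Weighted share of records with flag == 1, i.e. a ratio of two weighted totals."""
    numerator = sum(w for r, w in zip(records, weights) if r["flag"] == 1)
    return numerator / sum(weights)


def estimate_with_se(records, estimator):
    """Apply the same estimator with the main weight and with each replicate weight."""
    full = estimator(records, [r["weight"] for r in records])
    reps = [
        estimator(records, [r["rep_weights"][g] for r in records])
        for g in range(N_GROUPS)
    ]
    return full, jackknife_se(full, reps)


# Hypothetical toy data: 60 records, two per replicate group. A record gets a
# zero replicate weight for its own group and a scaled-up weight otherwise.
records = [
    {
        "flag": 1 if i % 3 == 0 else 0,
        "weight": 100.0,
        "rep_weights": [
            0.0 if i % N_GROUPS == g else 100.0 * N_GROUPS / (N_GROUPS - 1)
            for g in range(N_GROUPS)
        ],
    }
    for i in range(60)
]

proportion, se = estimate_with_se(records, weighted_proportion)
print(f"proportion = {proportion:.4f}, SE = {se:.4f}")
```

The same `estimate_with_se` helper works for any estimator that is a function of weighted totals (a total, difference or ratio), because the estimator is simply re-run once per replicate weight.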
Use of the 30 group Jack-knife method for complex estimates, such as regression parameters from a statistical model, is not straightforward and may not be appropriate. The method as described does not apply to investigations where survey weights are not used, such as in unweighted statistical modelling.
NOT APPLICABLE CATEGORIES
Most data items included in the microdata include a 'Not applicable' category. The classification values of the 'Not applicable' category, where relevant, are shown in the data item lists in the Downloads tab.
A number of questions included in the survey allowed respondents to provide one or more responses. Each response category for one of these 'multi-response questions' (or data items) is treated as a separate data item. These data items have the same general data item identifier (SASName) but are each suffixed with a letter: A for the first response, B for the second response, C for the third response, D for the fourth response and so on.
For example, the multi-response data item 'All sources of household income' (with a general SASName of ALLINCU; see data item list) has five response categories. Consequently, five data items have been produced: ALLINCUA, ALLINCUB, ALLINCUC, ALLINCUD and ALLINCUE.
Each data item in the series (i.e. ALLINCUA to ALLINCUE) has two response codes: a 'Yes' response (code 1 for the first in the series, code 2 for the second, and so on) and a 'Null' response (code 0) indicating that the response was not relevant for the respondent. The last data item in the series represents a 'Not applicable' response, coded with the value of its position in the series, and comprises the respondents who were not asked the question (e.g. ALLINCUE with values of 0 or 5).
It should be noted that the sum of individual multi-response categories will be greater than the population or number of people applicable to the particular data item as respondents are able to select more than one response. Multi-response data items can be identified in the data item list as SASNames followed by a range of letters in brackets; for example, ALLINCU(A-E).
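The coding scheme for multi-response items can be sketched in Python as follows. The four records are invented for illustration and the counts are unweighted; only the item names and the code pattern (0 for 'Null', the position number for 'Yes', and the last item for 'Not applicable') follow the description above.

```python
# Item names follow the ALLINCU(A-E) pattern; the last item is 'Not applicable'.
ITEMS = ["ALLINCUA", "ALLINCUB", "ALLINCUC", "ALLINCUD", "ALLINCUE"]
NOT_APPLICABLE_ITEM = ITEMS[-1]

# Hypothetical respondents: code 0 = 'Null', non-zero code = category selected.
people = [
    {"ALLINCUA": 1, "ALLINCUB": 2, "ALLINCUC": 0, "ALLINCUD": 0, "ALLINCUE": 0},
    {"ALLINCUA": 1, "ALLINCUB": 0, "ALLINCUC": 3, "ALLINCUD": 4, "ALLINCUE": 0},
    {"ALLINCUA": 0, "ALLINCUB": 2, "ALLINCUC": 0, "ALLINCUD": 0, "ALLINCUE": 0},
    {"ALLINCUA": 0, "ALLINCUB": 0, "ALLINCUC": 0, "ALLINCUD": 0, "ALLINCUE": 5},
]

# Number of respondents selecting each category (any non-zero code).
counts = {item: sum(1 for p in people if p[item] != 0) for item in ITEMS}

# Respondents to whom the question applied: everyone not coded 'Not applicable'.
applicable = sum(1 for p in people if p[NOT_APPLICABLE_ITEM] == 0)

# Sum over the substantive categories (excluding the 'Not applicable' item).
total_responses = sum(counts[item] for item in ITEMS[:-1])

print(counts)
print("applicable population:", applicable)
print("sum of category counts:", total_responses)
```

In this toy data three respondents give six responses between them, so the sum of the category counts exceeds the applicable population.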
The population relevant to each data item is identified in the data item list and should be borne in mind when extracting and analysing data from the CURF or in TableBuilder. The actual population count for each data item is equal to the total cumulative frequency minus the 'Not applicable' category.
Generally all populations, including very specific populations, can be 'filtered' using other relevant data items. For example, if the population of interest is 'Employed persons', any data item with that population (excluding the 'Not applicable' category) can be used.
For example, the CURF data items 'Status in employment' (EMPSTCUR) or 'Industry (ANZSIC 2006)' (INDA06EC) are applicable to employed persons only. Therefore, either of the following filters could be used when restricting a table to 'Employed persons' only:
EMPSTCUR > 0 or INDA06EC < 26
(Note: For those data items, the 'Not applicable' categories (i.e. those persons who are not employed) are codes 0 and 26 respectively and would be excluded from either population filter shown above.)
Conversely, code 1 for the data item 'Labour force status' (LFSCURF) is 'employed persons'. Therefore, if the population of interest is employed persons, this data item could also be used as the filter (i.e. LFSCURF = 1).
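The equivalence of these filters, and the population count rule (total frequency minus the 'Not applicable' category), can be illustrated with a short Python sketch. The four records are hypothetical: only the 'Not applicable' codes (0 and 26) and LFSCURF = 1 for employed persons come from the text, and all other code values are placeholders.

```python
# Hypothetical unit records. Codes other than the 'Not applicable' values
# (EMPSTCUR = 0, INDA06EC = 26) and LFSCURF = 1 are illustrative only.
persons = [
    {"EMPSTCUR": 1, "INDA06EC": 5, "LFSCURF": 1},
    {"EMPSTCUR": 2, "INDA06EC": 12, "LFSCURF": 1},
    {"EMPSTCUR": 0, "INDA06EC": 26, "LFSCURF": 2},
    {"EMPSTCUR": 0, "INDA06EC": 26, "LFSCURF": 3},
]

# Three alternative filters for 'Employed persons'.
by_status = [p for p in persons if p["EMPSTCUR"] > 0]
by_industry = [p for p in persons if p["INDA06EC"] < 26]
by_labour_force = [p for p in persons if p["LFSCURF"] == 1]

# Population count = total frequency minus the 'Not applicable' category.
not_applicable = sum(1 for p in persons if p["EMPSTCUR"] == 0)
population = len(persons) - not_applicable

print(len(by_status), len(by_industry), len(by_labour_force), population)
```

On real data, the filters should select the same records only if the data items share exactly the same population, which is worth checking against the data item list before relying on any one of them.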