RELIABILITY OF ESTIMATES
Measuring sampling variability
Since the estimates from this survey are based on information obtained from a sub-sample of usual residents of a sample of dwellings, they are subject to sampling variability; that is, they may differ from those that would have been produced if all usual residents of all dwellings had been included in the survey. One measure of the likely difference is given by the standard error (SE), which indicates the extent to which an estimate might have varied by chance because only a sample of dwellings was included.
There are about two chances in three that a sample estimate will differ by less than one SE from the number that would have been obtained if all dwellings had been included, and about 19 chances in 20 that the difference will be less than two SEs. Another measure of the likely difference is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate.
INDICATIVE STANDARD ERRORS
Because of the large number and diverse nature of the estimates that can be produced from the NHS and NHSI, it is not practicable to present a separate indication of the SE of every estimate. Indicative standard errors and relative standard errors for estimates from the NHS and NHSI are provided in Tables 1 to 3 below. Figures in these tables do not give a precise measure of the SE for a particular estimate, but they provide an indication of its magnitude. The ABS has modelled these SEs on the full survey design information. Exact RSEs for every estimate can, however, be obtained using the replicate weight methodology, which is described at the end of this Appendix.
An example of the calculation and the use of SEs from Table 1 in relation to estimates of persons is as follows. Consider the estimate for Australia of persons aged 45 - 54 years who reported high cholesterol as a long-term condition (246,300). Since this estimate is between 200,000 and 300,000 in the SE table, the SE will be between 13,200 and 15,600 and can be approximated by linear interpolation as 14,300 (rounded to the nearest 100). Therefore, there are about two chances in three that the value that would have been produced if all dwellings had been included in the survey will fall in the range 232,000 to 260,600 and about 19 chances in 20 that the value will fall within the range 217,700 to 274,900.
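The interpolation in the example above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the two table rows are the ones quoted in the example (an SE of 13,200 for an estimate of 200,000 and 15,600 for an estimate of 300,000).

```python
def interpolate_se(estimate, lower_row, upper_row):
    """Linearly interpolate an SE between two (size_of_estimate, se) table rows."""
    (x0, se0), (x1, se1) = lower_row, upper_row
    if not x0 <= estimate <= x1:
        raise ValueError("estimate must lie between the two table rows")
    return se0 + (estimate - x0) * (se1 - se0) / (x1 - x0)


def round_to_100(value):
    """Round to the nearest 100, as in the worked example."""
    return int(round(value / 100.0)) * 100


estimate = 246_300
se = round_to_100(interpolate_se(estimate, (200_000, 13_200), (300_000, 15_600)))

# About two chances in three: within one SE of the estimate.
range_1se = (estimate - se, estimate + se)
# About 19 chances in 20: within two SEs of the estimate.
range_2se = (estimate - 2 * se, estimate + 2 * se)

print(se, range_1se, range_2se)
```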
As can be seen from the SE table, the smaller the estimate, the higher the RSE. Very small estimates are thus subject to such high SEs (relative to the size of the estimate) as to detract seriously from their value for most reasonable uses. Only estimates with RSEs of less than 25%, and percentages based on such estimates, are considered sufficiently reliable for most purposes. However, estimates with a higher RSE are contained in published tables from the survey and can be provided on request. In published output, estimates with an RSE of 25% to 50% are preceded by an asterisk (e.g. *3.4) to indicate that they are subject to high SEs and should be used with caution. Estimates with RSEs greater than 50% are preceded by a double asterisk (e.g. **2.1) to indicate that they are considered too unreliable for general use.
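The flagging rule described above can be expressed as a small helper. The sketch below is illustrative only: the function names are hypothetical, and the RSE values passed in are invented to show each case.

```python
def rse_percent(estimate, se):
    """Relative standard error: the SE expressed as a percentage of the estimate."""
    return 100.0 * se / estimate


def reliability_flag(rse):
    """Return the publication prefix for a given RSE (in percent)."""
    if rse > 50.0:
        return "**"   # considered too unreliable for general use
    if rse >= 25.0:
        return "*"    # subject to high SEs; use with caution
    return ""         # RSE of less than 25%: no flag


def annotate(value, rse):
    """Prefix a published value with its reliability flag."""
    return f"{reliability_flag(rse)}{value}"


print(annotate(3.4, 30.0))   # RSE between 25% and 50%
print(annotate(2.1, 60.0))   # RSE greater than 50%
print(annotate(12.5, 10.0))  # RSE of less than 25%
```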
SEs of proportions and percentages
Proportions and percentages formed from the ratio of two estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. A formula to approximate the RSE of a proportion is given below:
$$\mathrm{RSE}\left(\frac{x}{y}\right) = \sqrt{[\mathrm{RSE}(x)]^2 - [\mathrm{RSE}(y)]^2}$$
Note - this formula only holds when x is a subset of y. It should not be used when this is not the case, for example for estimates of 'rates' as opposed to proportions.
Using this formula, the RSE of the estimated proportion or percentage will be lower than the RSE of the numerator. An approximation for the SEs of proportions or percentages may therefore be derived by neglecting the RSE of the denominator, i.e. by obtaining the RSE of the number of persons corresponding to the numerator of the proportion or percentage and then applying this figure to the estimated proportion or percentage. This approach was adopted for the purposes of assigning the * or ** to indicate a 25% or 50% RSE threshold in publications from the NHS and NHSI.
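As a minimal sketch, the approximation for the RSE of a proportion can be computed as follows. The function name and the input RSEs (10% for the numerator, 6% for the denominator) are hypothetical values chosen for illustration.

```python
import math


def rse_of_proportion(rse_x, rse_y):
    """Approximate RSE of the proportion x/y, valid only when x is a subset of y.

    rse_x and rse_y are the RSEs (in percent) of the numerator and denominator.
    """
    if rse_x < rse_y:
        raise ValueError("formula assumes RSE(x) >= RSE(y), as when x is a subset of y")
    return math.sqrt(rse_x ** 2 - rse_y ** 2)


rse_x, rse_y = 10.0, 6.0             # hypothetical RSEs in percent
approx = rse_of_proportion(rse_x, rse_y)
conservative = rse_x                 # simplification: neglect the denominator's RSE

print(approx, conservative)
```

The simplified figure (the numerator's RSE) is never smaller than the figure from the formula, so using it to assign the * and ** flags errs on the side of caution.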
SEs may also be used to calculate SEs for the difference between two survey estimates (numbers or percentages). The sampling error of the difference between the two estimates depends on their individual SEs and the relationship (correlation) between them. An approximate SE of the difference between two estimates (x-y) may be calculated by the following formula:
$$\mathrm{SE}(x - y) = \sqrt{[\mathrm{SE}(x)]^2 + [\mathrm{SE}(y)]^2}$$
While this formula will only be exact for differences between separate and uncorrelated characteristics of subpopulations, it is expected to provide a reasonable approximation for most differences likely to be of interest in relation to this survey.
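The approximate SE of a difference can be sketched as below. The function name and the two SEs are hypothetical; as noted above, the result is exact only for uncorrelated estimates.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate SE of (x - y), treating the two estimates as uncorrelated."""
    return math.hypot(se_x, se_y)  # sqrt(se_x**2 + se_y**2)


# Hypothetical SEs of two survey estimates.
se_diff = se_of_difference(3_000.0, 4_000.0)
print(se_diff)
```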
The imprecision due to sampling variability, which is measured by the SE, should not be confused with inaccuracies that may occur because of imperfections in reporting by respondents and recording by interviewers, and errors made in coding and processing data. Inaccuracies of this kind are referred to as non-sampling error, and they may occur in any enumeration, whether it be a full count or a sample. Every effort is made to reduce non-sampling error to a minimum by careful design of questionnaires, intensive training and supervision of interviewers, and efficient operating procedures.
TABLE 1: (INDICATIVE) STANDARD ERRORS ON NHS PERSON ESTIMATES
|Size of estimate|Standard error (no.)|
TABLE 2: NHS ESTIMATES WITH AN (INDICATIVE) RSE OF 25% AND 50%
|Size of estimate|
|RSE of 25%|RSE of 50%|
TABLE 3: (INDICATIVE) STANDARD ERRORS ON INDIGENOUS PERSON ESTIMATES, AUSTRALIA
Because the age distribution of the Indigenous population differs from that of the non-Indigenous population, data are often age standardised for the purposes of making comparisons between the Indigenous and non-Indigenous populations. Age standardised estimates are also often used for comparisons over time. Where Indigenous estimates from the 2001 collection have been age standardised, the standard errors are, on average, between 10% and 30% higher than the corresponding standard error of unstandardised estimates. Therefore, an adjustment factor of approximately 1.2 should be applied to the RSEs shown above for all age standardised estimates for the Indigenous population.
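Applying the adjustment is a single multiplication, sketched below. The function name and the indicative RSE of 10% are hypothetical.

```python
AGE_STANDARDISATION_FACTOR = 1.2  # approximate factor stated in the text above


def adjusted_rse(indicative_rse, factor=AGE_STANDARDISATION_FACTOR):
    """Scale an indicative RSE (in percent) for an age standardised estimate."""
    return indicative_rse * factor


rse_standardised = adjusted_rse(10.0)  # hypothetical indicative RSE of 10%
print(rse_standardised)
```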
REPLICATE WEIGHTS TECHNIQUE
A class of techniques called replication methods provides a general method of estimating variances for the types of complex sample designs and weighting procedures employed in ABS household surveys.
The basic idea behind the replication approach is to select subsamples repeatedly from the whole sample. For each of these subsamples the statistic of interest is calculated. The variance of the full sample statistic is then estimated using the variability among the replicate statistics calculated from these subsamples. The subsamples are called replicate groups and the statistics calculated from these replicates are called replicate estimates.
There are various ways of creating replicate subsamples from the full sample. The replicate weights produced for the 2001 NHS have been created under the Jackknife method of replication which is described below.
There are numerous advantages to using the replicate weighting approach. These include:
- the same procedure is applicable to most statistics, such as means, percentages, ratios, correlations, derived statistics and regression coefficients
- it is not necessary for the analyst to have available detailed survey design information if the replicate weights are included with the data file.
Derivation of replicate weights
Under the Jackknife method of replicate weighting, weights were derived as follows:
- 30 replicate groups were formed, with each group formed to mirror the overall sample. Units from a CD all belong to the same replicate group, and a unit can belong to only one replicate group.
- One replicate group was dropped from the file, and the remaining records were then weighted in the same manner as for the full sample.
- The records in the group that was dropped received a weight of zero.
- This process was repeated for each replicate group (i.e. a total of 30 times).
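The derivation steps can be sketched as follows. This is an illustration under a loudly stated assumption: the survey's actual weighting procedure is not reproduced here, so the "reweighting" step is simplified to rescaling the remaining weights so that they reproduce the full sample weighted total. The function name, the CD identifiers and the weights are all hypothetical.

```python
def jackknife_replicate_weights(weights, groups, n_groups=30):
    """Return, for each record, a list of n_groups replicate weights.

    weights: full sample weight of each record
    groups:  replicate group (1..n_groups) of each record
    ASSUMPTION: reweighting is simplified to a rescaling of the kept records.
    """
    total = sum(weights)
    replicate_weights = [[0.0] * n_groups for _ in weights]
    for g in range(1, n_groups + 1):
        # Drop replicate group g and reweight the remaining records.
        kept_total = sum(w for w, grp in zip(weights, groups) if grp != g)
        if kept_total == 0:
            raise ValueError("no records remain after dropping group %d" % g)
        scale = total / kept_total
        for i, (w, grp) in enumerate(zip(weights, groups)):
            # Records in the dropped group receive a weight of zero.
            replicate_weights[i][g - 1] = 0.0 if grp == g else w * scale
    return replicate_weights


# Hypothetical file: 120 records, 4 per CD, every unit in a CD in the same group.
n_records = 120
cds = [i // 4 for i in range(n_records)]
groups = [cd % 30 + 1 for cd in cds]
weights = [100.0 + i for i in range(n_records)]

rep_weights = jackknife_replicate_weights(weights, groups)
print(len(rep_weights[0]), rep_weights[0].count(0.0))
```

Each record ends up with 30 replicate weights, exactly one of which is zero (the replicate in which its own group was dropped).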
Application of replicate weights
As noted above, replicate weights enable variances of estimates to be calculated relatively simply. They also enable unit records analyses such as chi-square and logistic regression to be conducted which take into account the sample design.
For any variable of interest, an estimate can be calculated using each of the 30 replicate weights, giving 30 replicate estimates. The distribution of this set of replicate estimates, in conjunction with the full sample estimate (based on the general weight), is then used to approximate the variance of the full sample estimate.
The formula for calculating the standard error (SE) and relative standard error (RSE) of an estimate using this method is shown below.
$$\mathrm{SE}(y) = \sqrt{\frac{29}{30} \sum_{g=1}^{30} \left(y_{(g)} - y\right)^2}$$
where:
$g$ = 1, ..., 30 (the number of replicate weights);
$y_{(g)}$ = the estimate obtained using replicate weight $g$; and
$y$ = the estimate obtained using the full person weight.
The RSE is then given by:
$$\mathrm{RSE}(y) = \frac{\mathrm{SE}(y)}{y} \times 100\%$$
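A minimal sketch of this calculation is given below. The function names and the replicate estimates are hypothetical, chosen so that the arithmetic is easy to follow; with 30 replicate estimates the factor (G - 1)/G equals 29/30.

```python
import math


def jackknife_se(full_estimate, replicate_estimates):
    """SE from the full sample estimate and the G replicate estimates."""
    g = len(replicate_estimates)
    sum_sq = sum((rep - full_estimate) ** 2 for rep in replicate_estimates)
    return math.sqrt((g - 1) / g * sum_sq)


def rse_percent(full_estimate, se):
    """RSE: the SE expressed as a percentage of the estimate."""
    return se / full_estimate * 100.0


# Hypothetical values: a full sample estimate of 100 and 30 replicate
# estimates, each differing from it by exactly 1.
full_estimate = 100.0
replicate_estimates = [101.0] * 15 + [99.0] * 15

se = jackknife_se(full_estimate, replicate_estimates)
rse = rse_percent(full_estimate, se)
print(se, rse)
```

Here the sum of squared differences is 30, so the SE is the square root of (29/30) x 30 = 29.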
This method can also be used when modelling relationships from unit record data, regardless of the modelling technique used. In modelling, the full sample would be used to estimate the parameter being studied, such as a regression coefficient, and the 30 replicate groups would be used to provide 30 replicate estimates of that parameter. The variance of the estimate of the parameter from the full sample is then approximated, as above, by the variability of the replicate estimates.
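To illustrate the modelling case, the sketch below estimates a weighted regression slope with the full weights and again with each of 30 replicate weight sets, then applies the same variance formula. Everything here is hypothetical: the data are synthetic, the function names are invented, and the replicate weights are built by a simple rescaling rather than by the survey's actual weighting procedure.

```python
import math


def weighted_slope(x, y, w):
    """Slope of a weighted least squares line of y on x."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def replicate_se(statistic, x, y, full_weights, replicate_weight_sets):
    """Full sample value of a statistic and its replicate-based SE."""
    full = statistic(x, y, full_weights)
    reps = [statistic(x, y, rw) for rw in replicate_weight_sets]
    g = len(reps)
    variance = (g - 1) / g * sum((r - full) ** 2 for r in reps)
    return full, math.sqrt(variance)


# Synthetic unit records: y is roughly 2 * x plus a small deterministic wobble.
n_groups = 30
x = [float(i) for i in range(60)]
y = [2.0 * xi + ((7 * i) % 5 - 2) for i, xi in enumerate(x)]
full_weights = [1.0] * len(x)
groups = [i % n_groups for i in range(len(x))]

# ASSUMPTION: replicate weights formed by zeroing one group and rescaling the rest.
replicate_weight_sets = [
    [0.0 if grp == g else w * n_groups / (n_groups - 1)
     for w, grp in zip(full_weights, groups)]
    for g in range(n_groups)
]

slope, slope_se = replicate_se(weighted_slope, x, y, full_weights, replicate_weight_sets)
print(slope, slope_se)
```

Because `replicate_se` takes the statistic as an argument, the same function can be reused for other parameters by passing a different estimator.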
Use of replicate weights with statistical packages
Some statistical computer packages may not allow direct calculation of SEs using the Jackknife replicate weights. However, packages that support Balanced Repeated Replication (BRR) methodology directly generally include the option of an adjustment factor, which can be used to account for the difference between the BRR and Jackknife variance formulae.
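The size of such an adjustment factor can be worked out by comparing the two variance formulae. The sketch below assumes that the package's BRR variance is the plain average of squared deviations of the replicate estimates from the full sample estimate (with no Fay adjustment); this is an assumption about the package, not something stated in this Appendix, and the documentation of the package in use should be checked. The function names and replicate estimates are hypothetical.

```python
def brr_variance(full_estimate, replicate_estimates):
    """ASSUMED BRR form: mean squared deviation from the full sample estimate."""
    r = len(replicate_estimates)
    return sum((rep - full_estimate) ** 2 for rep in replicate_estimates) / r


def jackknife_variance(full_estimate, replicate_estimates):
    """Jackknife form used in this Appendix: ((G - 1) / G) * sum of squares."""
    g = len(replicate_estimates)
    return (g - 1) / g * sum((rep - full_estimate) ** 2 for rep in replicate_estimates)


def brr_to_jackknife_factor(n_replicates):
    """Multiplier taking the assumed BRR variance to the Jackknife variance."""
    return n_replicates - 1


full_estimate = 100.0
replicate_estimates = [100.0 + 0.1 * (i - 14.5) for i in range(30)]  # hypothetical

factor = brr_to_jackknife_factor(len(replicate_estimates))
adjusted = factor * brr_variance(full_estimate, replicate_estimates)
print(factor, adjusted, jackknife_variance(full_estimate, replicate_estimates))
```

Under this assumption the factor for 30 replicate weights is 29 when applied to the variance; applied to the SE it would be the square root of that.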
Availability of RSEs calculated using replicate weights
Each record ultimately had 30 replicate weights attached to it, one of which was the zero weight. Indicative RSEs were used in the summary publications released from the NHS and NHSI. However, RSEs calculated using the replicate weights methodology are available for the following:
- A set of NHS tables containing a breakdown by ASGC Remoteness categories is available as spreadsheets on the ABS web site, via the Health Theme Page. RSEs for these tables were calculated using the replicate weights methodology.
- Tables from the publication National Health Survey: Aboriginal and Torres Strait Islander Results, Australia 2001 (cat. no. 4715.0) which contain age standardised estimates were also recompiled with RSEs calculated using the replicate weights methodology. These are available electronically and can be accessed through publication 4715.0 on the ABS web site.