
Appendix 12  Standard errors
RELIABILITY OF ESTIMATES
Measuring sampling variability
Since the estimates from this survey are based on information obtained from a subsample of usual residents of a sample of dwellings, they are subject to sampling variability; that is, they may differ from those that would have been produced if all usual residents of all dwellings had been included in the survey. One measure of the likely difference is given by the standard error (SE), which indicates the extent to which an estimate might have varied by chance because only a sample of dwellings was included.
There are about two chances in three that a sample estimate will differ by less than one SE from the number that would have been obtained if all dwellings had been included, and about 19 chances in 20 that the difference will be less than two SEs. Another measure of the likely difference is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate.
INDICATIVE STANDARD ERRORS
Because of the large number and diverse nature of the estimates that can be produced from the NHS and NHSI, it is not practicable to present a separate indication of the SE of every estimate. Indicative standard errors and relative standard errors for estimates from the NHS and NHSI are provided in Tables 1 to 3 below. Figures in these tables do not give a precise measure of the SE for a particular estimate but provide an indication of its magnitude. ABS has modelled these SEs on the full survey design information. Exact RSEs for every estimate can, however, be obtained using the replicate weight methodology, which is described at the end of this Appendix.
An example of the calculation and use of SEs from Table 1 for estimates of persons is as follows. Consider the estimate for Australia of persons aged 45–54 years who reported high cholesterol as a long-term condition (246,300). Since this estimate lies between 200,000 and 300,000 in the SE table, the SE will be between 13,200 and 15,600 and can be approximated by linear interpolation as 14,300 (rounded to the nearest 100). Therefore, there are about two chances in three that the value that would have been produced if all dwellings had been included in the survey will fall in the range 232,000 to 260,600, and about 19 chances in 20 that the value will fall within the range 217,700 to 274,900.
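The interpolation in this example can be reproduced with a short sketch. The function names are illustrative only and are not part of any ABS tool; the two bracketing rows are taken from the Australia column of Table 1.

```python
def interpolate_se(estimate, lower, upper):
    """Linearly interpolate an SE between two bracketing rows of an SE table.

    lower and upper are (size_of_estimate, se) pairs.
    """
    (x0, se0), (x1, se1) = lower, upper
    return se0 + (estimate - x0) * (se1 - se0) / (x1 - x0)


def round_to_100(value):
    """Round to the nearest 100, as in the worked example."""
    return int(round(value / 100.0)) * 100


estimate = 246_300
# Bracketing rows from the Australia column of Table 1.
se = round_to_100(interpolate_se(estimate, (200_000, 13_200), (300_000, 15_600)))

range_1se = (estimate - se, estimate + se)          # about 2 chances in 3
range_2se = (estimate - 2 * se, estimate + 2 * se)  # about 19 chances in 20
rse = 100 * se / estimate                           # RSE as a percentage

print(se, range_1se, range_2se, round(rse, 1))
```

The resulting RSE of roughly 5.8% is well under the 25% threshold discussed below.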
As can be seen from the SE table, the smaller the estimate, the higher the RSE. Very small estimates are thus subject to such high SEs (relative to the size of the estimate) as to detract seriously from their value for most reasonable uses. Only estimates with RSEs of less than 25%, and percentages based on such estimates, are considered sufficiently reliable for most purposes. However, estimates with a higher RSE are contained in published tables from the survey and can be provided on request. In published output, estimates with an RSE of 25% to 50% are preceded by an asterisk (e.g. *3.4) to indicate that they are subject to high SEs and should be used with caution. Estimates with RSEs greater than 50% are preceded by a double asterisk (e.g. **2.1) to indicate that they are considered too unreliable for general use.
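The annotation rule can be expressed as a small helper. This is a minimal sketch: the function name is hypothetical, and the treatment of an RSE of exactly 25% or exactly 50% as a single asterisk is an assumption based on the wording "25% to 50%" and "greater than 50%".

```python
def flag_estimate(value, rse_percent):
    """Prefix an estimate with * or ** according to its RSE (in per cent)."""
    if rse_percent > 50:
        prefix = "**"   # considered too unreliable for general use
    elif rse_percent >= 25:
        prefix = "*"    # subject to high SEs; use with caution
    else:
        prefix = ""     # considered sufficiently reliable for most purposes
    return f"{prefix}{value}"


print(flag_estimate(12.7, 8.0))
print(flag_estimate(3.4, 30.0))
print(flag_estimate(2.1, 62.0))
```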
SEs of proportions and percentages
Proportions and percentages formed from the ratio of two estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. A formula to approximate the RSE of a proportion is given below:
$$\mathrm{RSE}(x/y) = \sqrt{[\mathrm{RSE}(x)]^2 - [\mathrm{RSE}(y)]^2}$$
Note: this formula only holds when x is a subset of y. It should not be used otherwise, i.e. for estimates of 'rates' as opposed to proportions.
Using this formula, the RSE of the estimated proportion or percentage will be lower than the RSE of the numerator. An approximation for the SE of a proportion or percentage may therefore be derived by neglecting the RSE of the denominator, i.e. by obtaining the RSE of the number of persons corresponding to the numerator of the proportion or percentage and then applying this figure to the estimated proportion or percentage. This approach was adopted for the purposes of assigning the * or ** to indicate a 25% or 50% RSE threshold in publications from the NHS and NHSI.
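As an illustration, the sketch below compares the formula for the RSE of a proportion, the square root of [RSE(x)]^2 minus [RSE(y)]^2, with the simpler approach that neglects the denominator. All input values are hypothetical and the function name is illustrative.

```python
import math


def rse_of_proportion(rse_x, rse_y):
    """Approximate RSE (%) of x/y, valid only when x is a subset of y."""
    if rse_x < rse_y:
        # With x a subset of y, the numerator is normally the less precise term.
        raise ValueError("expected RSE(x) >= RSE(y) when x is a subset of y")
    return math.sqrt(rse_x ** 2 - rse_y ** 2)


# Hypothetical inputs: a proportion of 20%, numerator RSE 10%, denominator RSE 6%.
proportion = 20.0
rse_x, rse_y = 10.0, 6.0

rse_full = rse_of_proportion(rse_x, rse_y)   # uses both RSEs
se_full = proportion * rse_full / 100        # SE in percentage points
se_simple = proportion * rse_x / 100         # neglects the denominator's RSE

print(rse_full, se_full, se_simple)
```

Neglecting the denominator gives the larger of the two SEs, so the simplified approach errs on the side of caution.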
SEs may also be used to calculate the SE of the difference between two survey estimates (numbers or percentages). The sampling error of the difference between two estimates depends on their individual SEs and the relationship (correlation) between them. An approximate SE of the difference between two estimates (x - y) may be calculated by the following formula:
$$\mathrm{SE}(x - y) = \sqrt{[\mathrm{SE}(x)]^2 + [\mathrm{SE}(y)]^2}$$
While this formula will only be exact for differences between separate and uncorrelated characteristics of subpopulations, it is expected to provide a reasonable approximation for most differences likely to be of interest in relation to this survey.
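A brief sketch of the calculation follows. It assumes two hypothetical, uncorrelated Australia-level estimates of exactly 300,000 and 200,000 persons, so that their indicative SEs can be read directly from Table 1; the function name is illustrative.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate SE of (x - y), treating the two estimates as uncorrelated."""
    return math.hypot(se_x, se_y)   # sqrt(se_x**2 + se_y**2)


# Hypothetical estimates with indicative SEs from the Australia column of Table 1.
x, se_x = 300_000, 15_600
y, se_y = 200_000, 13_200

difference = x - y
se_diff = se_of_difference(se_x, se_y)

# There are about 19 chances in 20 that the true difference lies within two SEs.
lower, upper = difference - 2 * se_diff, difference + 2 * se_diff
print(difference, round(se_diff), round(lower), round(upper))
```

Here the difference exceeds two SEs, so a range of two SEs either side of it does not include zero.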
The imprecision due to sampling variability, which is measured by the SE, should not be confused with inaccuracies that may occur because of imperfections in reporting by respondents and recording by interviewers, and errors made in coding and processing data. Inaccuracies of this kind are referred to as non-sampling error, and they may occur in any enumeration, whether it be a full count or a sample. Every effort is made to reduce non-sampling error to a minimum by careful design of questionnaires, intensive training and supervision of interviewers, and efficient operating procedures.
TABLE 1: (INDICATIVE) STANDARD ERRORS ON NHS PERSON ESTIMATES

                                  Standard error (no.)                            Australia
Size of estimate     NSW     Vic     Qld      SA      WA     Tas     ACT  SE (no.)  RSE (%)
500                  520     488     499     404     438     342     268       468     93.7
1,000                848     782     777     647     686     526     397       750     75.0
1,500              1,113   1,019     997     839     880     666     492       978     65.2
2,000              1,342   1,222   1,184   1,002   1,046     780     570     1,174     58.7
2,500              1,548   1,403   1,350   1,145   1,190     880     635     1,350     54.0
3,000              1,734   1,566   1,500   1,272   1,320     969     693     1,512     50.4
3,500              1,904   1,718   1,638   1,390   1,439   1,047     742     1,659     47.4
4,000              2,064   1,860   1,764   1,496   1,548   1,120     788     1,800     45.0
4,500              2,219   1,989   1,881   1,598   1,652   1,184     832     1,930     42.9
5,000              2,360   2,115   1,995   1,690   1,745   1,245     870     2,055     41.1
6,000              2,622   2,346   2,202   1,866   1,920   1,362     942     2,286     38.1
8,000              3,088   2,752   2,568   2,160   2,232   1,552   1,056     2,696     33.7
10,000             3,500   3,100   2,880   2,420   2,490   1,710   1,160     3,060     30.6
20,000             5,040   4,440   4,060   3,340   3,460   2,260   1,480     4,440     22.2
30,000             6,180   5,400   4,920   3,960   4,140   2,610   1,680     5,490     18.3
40,000             7,080   6,160   5,600   4,440   4,680   2,880   1,840     6,320     15.8
50,000             7,850   6,800   6,200   4,850   5,100   3,100   1,950     7,050     14.1
100,000           10,600   9,100   8,300   6,200   6,600   3,800   2,300     9,700      9.7
200,000           13,800  12,000  10,800   7,600   8,400   4,400   3,000    13,200      6.6
300,000           16,200  13,800  12,600   8,400   9,600   4,800   2,800    15,600      5.2
400,000           17,600  15,200  14,000   8,800  10,400   5,200            17,600      4.4
500,000           19,000  16,500  15,000   9,500  11,000                    19,000      3.8
1,000,000         23,000  20,000  19,000  11,000  13,000                    24,000      2.4
2,000,000         28,000  24,000  22,000                                    30,000      1.5
5,000,000         35,000                                                    40,000      0.8
10,000,000                                                                  50,000      0.5
20,000,000                                                                  60,000      0.3

TABLE 2: NHS ESTIMATES WITH AN (INDICATIVE) RSE OF 25% AND 50%

Size of estimate     NSW     Vic     Qld      SA      WA     Tas     ACT    Aust
RSE of 25%        20,353  15,693  13,348   9,352   9,940   4,978   2,577  15,563
RSE of 50%         4,337   3,343   2,996   2,009   2,224   1,131     588   3,059

TABLE 3: (INDICATIVE) STANDARD ERRORS ON INDIGENOUS PERSON ESTIMATES, AUSTRALIA

Size of estimate  Standard error (no.)  Relative standard error (%) 
500  270  54.3 
600  310  51.2 
700  340  48.6 
800  370  46.4 
900  400  44.5 
1,000  430  42.8 
1,100  450  41.3 
1,200  480  40.0 
1,300  500  38.8 
1,400  530  37.7 
1,500  550  36.7 
1,600  570  35.8 
1,700  590  34.9 
1,800  610  34.1 
1,900  630  33.4 
2,000  650  32.7 
2,100  670  32.0 
2,200  690  31.4 
2,300  710  30.8 
2,400  730  30.3 
2,500  740  29.8 
3,000  830  27.5 
3,500  900  25.7 
4,000  970  24.2 
4,500  1,030  22.9 
5,000  1,090  21.8 
6,000  1,200  20.0 
7,000  1,300  18.6 
8,000  1,390  17.4 
9,000  1,470  16.4 
10,000  1,550  15.5 
20,000  2,130  10.7 
30,000  2,540  8.5 
40,000  2,850  7.1 
50,000  3,110  6.2 
100,000  3,980  4.0 
200,000  4,940  2.5 
300,000  5,520  1.8 
400,000  5,940  1.5 

NOTE:
Because the age distribution of the Indigenous population differs from that of the non-Indigenous population, data are often age standardised for the purposes of making comparisons between the Indigenous and non-Indigenous populations. Age standardised estimates are also often used for comparisons over time. Where Indigenous estimates from the 2001 collection have been age standardised, the standard errors are, on average, between 10% and 30% higher than the corresponding standard errors of unstandardised estimates. Therefore, an adjustment factor of approximately 1.2 should be applied to the RSEs shown above for all age standardised estimates for the Indigenous population.
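As a small illustration of the adjustment, the sketch below applies the factor of 1.2 to the indicative RSE that Table 3 gives for an estimate of 5,000. The function and constant names are hypothetical.

```python
AGE_STANDARDISATION_FACTOR = 1.2  # approximate factor given in the note above


def adjusted_rse(indicative_rse_percent, factor=AGE_STANDARDISATION_FACTOR):
    """Inflate an indicative RSE (%) for use with an age standardised estimate."""
    return indicative_rse_percent * factor


rse_unadjusted = 21.8                        # Table 3, size of estimate 5,000
rse_adjusted = adjusted_rse(rse_unadjusted)  # about 26.2

print(round(rse_adjusted, 1))
```

In this case the adjustment moves the RSE from below 25% to above it, which would change how the estimate is annotated in published output.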
REPLICATE WEIGHTS TECHNIQUE
A class of techniques called replication methods provides a general method of estimating variances for the types of complex sample designs and weighting procedures employed in ABS household surveys.
The basic idea behind the replication approach is to select subsamples repeatedly from the whole sample. For each of these subsamples the statistic of interest is calculated. The variance of the full sample statistic is then estimated using the variability among the replicate statistics calculated from these subsamples. The subsamples are called replicate groups and the statistics calculated from these replicates are called replicate estimates.
There are various ways of creating replicate subsamples from the full sample. The replicate weights produced for the 2001 NHS have been created under the Jackknife method of replication which is described below.
There are numerous advantages to using the replicate weighting approach. These include:
 the same procedure is applicable to most statistics such as means, percentages, ratios, correlations, derived statistics and regression coefficients
 it is not necessary for the analyst to have available detailed survey design information if the replicate weights are included with the data file.
Derivation of replicate weights
Under the Jackknife method of replicate weighting, weights were derived as follows:
 30 replicate groups were formed, with each group formed to mirror the overall sample. Units from a Collection District (CD) all belong to the same replicate group, and a unit can belong to only one replicate group.
 One replicate group was dropped from the file, and the remaining records were then weighted in the same manner as for the full sample.
 The records in the group that was dropped received a weight of zero.
 This process was repeated for each replicate group (i.e. a total of 30 times).
 Ultimately each record had 30 replicate weights attached to it, one of which was the zero weight.
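The steps above can be sketched in simplified form. This is not the ABS procedure: the sketch assumes that CDs are dealt into groups in turn (the survey formed groups to mirror the overall sample), and it replaces the full reweighting step with a simple rescaling of the retained weights by 30/29. The record keys `cd` and `weight` are hypothetical.

```python
def make_replicate_weights(records, n_groups=30):
    """Return one list of n_groups replicate weights per record.

    records: list of dicts with hypothetical keys 'cd' (area identifier)
    and 'weight' (full-sample person weight).
    """
    # Assumption: CDs are dealt round-robin into replicate groups, so all
    # units from one CD share a group and each unit is in exactly one group.
    cds = sorted({rec["cd"] for rec in records})
    group_of_cd = {cd: i % n_groups for i, cd in enumerate(cds)}

    # Assumption: "weighted in the same manner as the full sample" is reduced
    # here to scaling the retained weights; real reweighting is more involved.
    scale = n_groups / (n_groups - 1)

    replicate_weights = []
    for rec in records:
        dropped_in = group_of_cd[rec["cd"]]
        replicate_weights.append(
            [0.0 if g == dropped_in else rec["weight"] * scale
             for g in range(n_groups)]
        )
    return replicate_weights


# Toy file: 60 CDs with 2 persons each, every person carrying a weight of 100.
records = [{"cd": cd, "weight": 100.0} for cd in range(60) for _ in range(2)]
rep_wts = make_replicate_weights(records)
print(len(rep_wts), len(rep_wts[0]), rep_wts[0].count(0.0))
```

In this toy file each record ends up with 30 replicate weights, exactly one of which is zero, and each set of replicate weights sums to the same population total as the full-sample weights.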
Application of replicate weights
As noted above, replicate weights enable variances of estimates to be calculated relatively simply. They also enable unit record analyses, such as chi-square tests and logistic regression, to be conducted in a way that takes the sample design into account.
Estimates for any variable of interest can be calculated using each of the 30 sets of replicate weights, giving 30 replicate estimates. The distribution of this set of replicate estimates, in conjunction with the full sample estimate (based on the general weight), is then used to approximate the variance of the full sample estimate.
The formula for calculating the standard error (SE) and relative standard error (RSE) of an estimate using this method is shown below.
$$\mathrm{SE}(y) = \sqrt{\frac{29}{30} \sum_{g=1}^{30} \left(y_{(g)} - y\right)^2}$$
where
g = 1, ..., 30 (the number of replicate weights);
y_{(g)} = the estimate obtained using replicate weight g; and
y = the estimate obtained using the full person weight.
The RSE is then RSE(y) = SE(y)/y * 100%.
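A minimal sketch of this calculation is given below. The function names and the replicate estimates are hypothetical; the factor (G - 1)/G reduces to 29/30 when there are 30 replicate estimates.

```python
import math


def jackknife_se(full_estimate, replicate_estimates):
    """SE of an estimate from its replicate estimates (G = 30 in the 2001 NHS)."""
    g = len(replicate_estimates)
    sum_sq = sum((y_g - full_estimate) ** 2 for y_g in replicate_estimates)
    return math.sqrt((g - 1) / g * sum_sq)


def rse_percent(se, estimate):
    """Relative standard error: the SE as a percentage of the estimate."""
    return 100 * se / estimate


# Hypothetical values: full-sample estimate of 1,000 persons and 30 replicate
# estimates that alternate between 990 and 1,010.
y = 1000.0
replicates = [990.0 if g % 2 == 0 else 1010.0 for g in range(30)]

se = jackknife_se(y, replicates)
print(round(se, 2), round(rse_percent(se, y), 2))
```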
This method can also be used when modelling relationships from unit record data, regardless of the modelling technique used. In modelling, the full sample would be used to estimate the parameter being studied, such as a regression coefficient, and the 30 replicate groups would be used to provide 30 replicate estimates of that parameter. The variance of the estimate of the parameter from the full sample is then approximated, as above, by the variability of the replicate estimates.
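The same idea can be sketched for a model parameter, here the slope of a weighted least-squares line. The data, weights and function names are all hypothetical, and the toy replicate weights use a simple 30/29 rescaling rather than a full reweighting.

```python
import math


def weighted_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def replicate_se(statistic, x, y, full_weight, replicate_weights):
    """Full-sample statistic and its SE from G sets of replicate weights."""
    full = statistic(x, y, full_weight)
    reps = [statistic(x, y, w) for w in replicate_weights]
    g = len(reps)
    variance = (g - 1) / g * sum((r - full) ** 2 for r in reps)
    return full, math.sqrt(variance)


# Toy data: 60 records, with record i placed in replicate group i % 30.
n, groups = 60, 30
x = list(range(n))
y = [2 * i + 1 + ((2 * i) % 5 - 2) for i in x]   # slope of 2 plus small noise
full_weight = [100.0] * n
replicate_weights = [
    [0.0 if i % groups == g else 100.0 * groups / (groups - 1) for i in range(n)]
    for g in range(groups)
]

slope, slope_se = replicate_se(weighted_slope, x, y, full_weight, replicate_weights)
print(round(slope, 3), round(slope_se, 4))
```

Any statistic that can be computed from a set of weights can be passed in place of `weighted_slope`, which reflects the generality of the method described above.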
Use of replicate weights with statistical packages
Not all statistical packages allow direct calculation of SEs using the Jackknife replicate weights. However, packages that allow the direct use of Balanced Repeated Replication (BRR) methodology generally include the option of an adjustment factor, which can be used to account for the difference between the two variance formulae.
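The role of the adjustment factor can be illustrated as follows. The sketch assumes that a package computes the BRR variance as the average squared deviation of the replicate estimates from the full-sample estimate, which is the usual textbook form; the form used by any particular package should be checked in its documentation. Under that assumption, a factor of 29 reproduces the Jackknife variance for 30 replicates. The function names and replicate estimates are hypothetical.

```python
def brr_variance(full_estimate, replicate_estimates, adjustment=1.0):
    """BRR-style variance: mean squared deviation times an adjustment factor."""
    r = len(replicate_estimates)
    sum_sq = sum((y_r - full_estimate) ** 2 for y_r in replicate_estimates)
    return adjustment * sum_sq / r


def jackknife_variance(full_estimate, replicate_estimates):
    """Jackknife variance with the (G - 1)/G factor, i.e. 29/30 for G = 30."""
    g = len(replicate_estimates)
    sum_sq = sum((y_g - full_estimate) ** 2 for y_g in replicate_estimates)
    return (g - 1) / g * sum_sq


# Hypothetical full-sample estimate and 30 replicate estimates.
y = 1000.0
replicates = [1000.0 + (g % 7) - 3 for g in range(30)]

# Assumed adjustment: G - 1 = 29 converts the BRR form to the Jackknife form.
v_brr_adjusted = brr_variance(y, replicates, adjustment=len(replicates) - 1)
v_jackknife = jackknife_variance(y, replicates)
print(round(v_brr_adjusted, 4), round(v_jackknife, 4))
```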
Availability of RSEs calculated using replicate weights
Indicative RSEs were used in the summary publications released from the NHS and NHSI. However,
 A set of NHS tables containing a breakdown by ASGC Remoteness categories is available as spreadsheets on the ABS web site, via the Health Theme Page. RSEs for these tables were calculated using the replicate weights methodology.
 Tables from the publication National Health Survey: Aboriginal and Torres Strait Islander Results, Australia 2001 (cat. no. 4715.0) which contain age standardised estimates were also recompiled with RSEs calculated using the replicate weights methodology. These are available electronically and can be accessed through publication 4715.0 on the ABS web site.
