4363.0.55.001 - National Health Survey: Users' Guide, 2001

ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 27/05/2003


Contents >> Appendix 12 - Standard errors

RELIABILITY OF ESTIMATES

Measuring sampling variability

Since the estimates from this survey are based on information obtained from a sub-sample of usual residents of a sample of dwellings, they are subject to sampling variability; that is, they may differ from those that would have been produced if all usual residents of all dwellings had been included in the survey. One measure of the likely difference is given by the standard error (SE), which indicates the extent to which an estimate might have varied by chance because only a sample of dwellings was included.

There are about two chances in three that a sample estimate will differ by less than one SE from the number that would have been obtained if all dwellings had been included, and about 19 chances in 20 that the difference will be less than two SEs. Another measure of the likely difference is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate.

INDICATIVE STANDARD ERRORS

Because of the large number and diverse nature of the estimates that can be produced from the NHS and NHSI, it is not practicable to present a separate SE for every estimate. Indicative SEs and RSEs for estimates from the NHS and NHSI are provided in Tables 1 to 3 below. Figures in these tables do not give a precise measure of the SE for a particular estimate, but they provide an indication of its magnitude. ABS has modelled these SEs on the full survey design information. Exact RSEs for any estimate can, however, be obtained using the replicate weight methodology, which is described at the end of this Appendix.

An example of the calculation and the use of SEs from Table 1 in relation to estimates of persons is as follows. Consider the estimate for Australia of persons aged 45 - 54 years who reported high cholesterol as a long-term condition (246,300). Since this estimate is between 200,000 and 300,000 in the SE table, the SE will be between 13,200 and 15,600 and can be approximated by linear interpolation as 14,300 (rounded to the nearest 100). Therefore, there are about two chances in three that the value that would have been produced if all dwellings had been included in the survey will fall in the range 232,000 to 260,600 and about 19 chances in 20 that the value will fall within the range 217,700 to 274,900.
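The interpolation in this example can be reproduced with a short Python sketch. The two table rows are taken from the Australia column of Table 1; the function names are illustrative only and are not part of any ABS tool.

```python
# Two neighbouring rows of Table 1 (Australia column): (size of estimate, SE).
LOWER_ROW = (200_000, 13_200)
UPPER_ROW = (300_000, 15_600)


def interpolate_se(estimate, lower, upper):
    """Linearly interpolate the SE between two (size, SE) table rows."""
    (x0, se0), (x1, se1) = lower, upper
    return se0 + (estimate - x0) * (se1 - se0) / (x1 - x0)


def round_to_100(value):
    """Round to the nearest 100, as in the worked example."""
    return int(round(value / 100.0)) * 100


estimate = 246_300
se = round_to_100(interpolate_se(estimate, LOWER_ROW, UPPER_ROW))
rse = 100.0 * se / estimate                       # SE as a percentage of the estimate
one_se = (estimate - se, estimate + se)           # about two chances in three
two_se = (estimate - 2 * se, estimate + 2 * se)   # about 19 chances in 20

print(se, round(rse, 1), one_se, two_se)
```

With the figures above, the sketch gives an SE of 14,300, an RSE of about 5.8%, and the ranges 232,000 to 260,600 and 217,700 to 274,900 quoted in the example.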

As can be seen from the SE table, the smaller the estimate, the higher the RSE. Very small estimates are thus subject to such high SEs (relative to the size of the estimate) as to detract seriously from their value for most reasonable uses. Only estimates with RSEs of less than 25%, and percentages based on such estimates, are considered sufficiently reliable for most purposes. However, estimates with a higher RSE are contained in published tables from the survey and can be provided on request. In published output, estimates with an RSE of 25% to 50% are preceded by an asterisk (e.g. *3.4) to indicate that they are subject to high SEs and should be used with caution. Estimates with RSEs greater than 50% are preceded by a double asterisk (e.g. **2.1) to indicate that they are considered too unreliable for general use.
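The flagging convention can be expressed as a small Python sketch. The function name and the RSE values passed to it are hypothetical; only the 25% and 50% thresholds come from the text.

```python
def flag_estimate(value, rse_percent):
    """Prefix an estimate with '*' or '**' according to its RSE (%)."""
    if rse_percent > 50:
        prefix = "**"   # considered too unreliable for general use
    elif rse_percent >= 25:
        prefix = "*"    # subject to high SEs, use with caution
    else:
        prefix = ""     # sufficiently reliable for most purposes
    return f"{prefix}{value}"


# Hypothetical RSEs chosen only to exercise each branch.
print(flag_estimate(3.4, 30.0))
print(flag_estimate(2.1, 60.0))
print(flag_estimate(12.0, 10.0))
```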

SEs of proportions and percentages

Proportions and percentages formed from the ratio of two estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. A formula to approximate the RSE of a proportion is given below:

$$RSE(x/y) = \sqrt{[RSE(x)]^2 - [RSE(y)]^2}$$

Note - this formula only holds when x is a subset of y. It should not be used where this is not the case, i.e. for estimates of 'rates' as opposed to proportions.

Using this formula, the RSE of the estimated proportion or percentage will be lower than the RSE of the estimate in the numerator. An approximation for the SE of a proportion or percentage may therefore be derived by neglecting the RSE of the denominator, i.e. by obtaining the RSE of the number of persons corresponding to the numerator of the proportion or percentage and then applying this figure to the estimated proportion or percentage. This approach was adopted for the purposes of assigning the * or ** to indicate a 25% or 50% RSE threshold in publications from the NHS and NHSI.
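As an illustration, the following Python sketch applies both the formula and the simpler approximation. The estimates x and y are hypothetical; their sizes and RSEs are simply taken from rows of the Australia column of Table 1, and x is assumed to be a subset of y.

```python
import math


def rse_of_proportion(rse_x, rse_y):
    """Approximate RSE (%) of x/y; valid only when x is a subset of y."""
    if rse_x < rse_y:
        # Guard added for this sketch: the square root would be undefined.
        raise ValueError("RSE(x) must be at least RSE(y) for this approximation")
    return math.sqrt(rse_x ** 2 - rse_y ** 2)


# Hypothetical numerator and denominator, with RSEs read from Table 1.
x, rse_x = 100_000, 9.7
y, rse_y = 1_000_000, 2.4

percentage = 100.0 * x / y                   # estimated percentage
rse_pct = rse_of_proportion(rse_x, rse_y)    # RSE of the percentage
se_pct = percentage * rse_pct / 100.0        # SE in percentage points

# Simpler approximation: neglect the RSE of the denominator.
se_pct_simple = percentage * rse_x / 100.0

print(round(rse_pct, 1), round(se_pct, 2), round(se_pct_simple, 2))
```

In this sketch the simpler approximation gives a slightly larger SE than the formula, consistent with the statement that the RSE of the proportion is lower than that of the numerator.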

SEs may also be calculated for the difference between two survey estimates (numbers or percentages). The sampling error of the difference between the two estimates depends on their individual SEs and the relationship (correlation) between them. An approximate SE of the difference between two estimates (x-y) may be calculated using the following formula:

$$SE(x-y) = \sqrt{[SE(x)]^2 + [SE(y)]^2}$$

While this formula will only be exact for differences between separate and uncorrelated characteristics of subpopulations, it is expected to provide a reasonable approximation for most differences likely to be of interest in relation to this survey.
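A minimal Python sketch of this calculation follows. The two estimates are hypothetical and are assumed to be uncorrelated; their SEs are read from the Australia column of Table 1. Comparing the difference with two SEs is shown only as one possible use, based on the 19 chances in 20 interpretation given earlier.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate SE of (x - y), assuming x and y are uncorrelated."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


# Hypothetical estimates with SEs read from Table 1 (Australia column).
x, se_x = 200_000, 13_200
y, se_y = 100_000, 9_700

difference = x - y
se_diff = se_of_difference(se_x, se_y)

# Illustrative check: is the difference larger than two SEs?
exceeds_two_se = abs(difference) > 2 * se_diff

print(difference, round(se_diff), exceeds_two_se)
```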

The imprecision due to sampling variability, which is measured by the SE, should not be confused with inaccuracies that may occur because of imperfections in reporting by respondents and recording by interviewers, and errors made in coding and processing data. Inaccuracies of this kind are referred to as non-sampling error, and they may occur in any enumeration, whether it be a full count or a sample. Every effort is made to reduce non-sampling error to a minimum by careful design of questionnaires, intensive training and supervision of interviewers, and efficient operating procedures.

TABLE 1: (INDICATIVE) STANDARD ERRORS ON NHS PERSON ESTIMATES


	Standard error (no)							Australia

Size of estimate	NSW	Vic	Qld	SA	WA	Tas	ACT	SE (no)	RSE (%)

500	520	488	499	404	438	342	268	468	93.7
1,000	848	782	777	647	686	526	397	750	75.0
1,500	1,113	1,019	997	839	880	666	492	978	65.2
2,000	1,342	1,222	1,184	1,002	1,046	780	570	1,174	58.7
2,500	1,548	1,403	1,350	1,145	1,190	880	635	1,350	54.0
3,000	1,734	1,566	1,500	1,272	1,320	969	693	1,512	50.4
3,500	1,904	1,718	1,638	1,390	1,439	1,047	742	1,659	47.4
4,000	2,064	1,860	1,764	1,496	1,548	1,120	788	1,800	45.0
4,500	2,219	1,989	1,881	1,598	1,652	1,184	832	1,930	42.9
5,000	2,360	2,115	1,995	1,690	1,745	1,245	870	2,055	41.1
6,000	2,622	2,346	2,202	1,866	1,920	1,362	942	2,286	38.1
8,000	3,088	2,752	2,568	2,160	2,232	1,552	1,056	2,696	33.7
10,000	3,500	3,100	2,880	2,420	2,490	1,710	1,160	3,060	30.6
20,000	5,040	4,440	4,060	3,340	3,460	2,260	1,480	4,440	22.2
30,000	6,180	5,400	4,920	3,960	4,140	2,610	1,680	5,490	18.3
40,000	7,080	6,160	5,600	4,440	4,680	2,880	1,840	6,320	15.8
50,000	7,850	6,800	6,200	4,850	5,100	3,100	1,950	7,050	14.1
100,000	10,600	9,100	8,300	6,200	6,600	3,800	2,300	9,700	9.7
200,000	13,800	12,000	10,800	7,600	8,400	4,400	3,000	13,200	6.6
300,000	16,200	13,800	12,600	8,400	9,600	4,800	2,800	15,600	5.2
400,000	17,600	15,200	14,000	8,800	10,400	5,200		17,600	4.4
500,000	19,000	16,500	15,000	9,500	11,000			19,000	3.8
1,000,000	23,000	20,000	19,000	11,000	13,000			24,000	2.4
2,000,000	28,000	24,000	22,000					30,000	1.5
5,000,000	35,000							40,000	0.8
10,000,000								50,000	0.5
20,000,000								60,000	0.3

TABLE 2: NHS ESTIMATES WITH AN (INDICATIVE) RSE OF 25% AND 50%


Size of estimate	NSW	Vic	Qld	SA	WA	Tas	ACT	Aust

RSE of 25%	20,353	15,693	13,348	9,352	9,940	4,978	2,577	15,563
RSE of 50%	4,337	3,343	2,996	2,009	2,224	1,131	588	3,059

TABLE 3: (INDICATIVE) STANDARD ERRORS ON INDIGENOUS PERSON ESTIMATES, AUSTRALIA


Size of estimate	Standard Error	Relative Standard Error

	no.	%
500	270	54.3
600	310	51.2
700	340	48.6
800	370	46.4
900	400	44.5
1,000	430	42.8
1,100	450	41.3
1,200	480	40.0
1,300	500	38.8
1,400	530	37.7
1,500	550	36.7
1,600	570	35.8
1,700	590	34.9
1,800	610	34.1
1,900	630	33.4
2,000	650	32.7
2,100	670	32.0
2,200	690	31.4
2,300	710	30.8
2,400	730	30.3
2,500	740	29.8
3,000	830	27.5
3,500	900	25.7
4,000	970	24.2
4,500	1,030	22.9
5,000	1,090	21.8
6,000	1,200	20.0
7,000	1,300	18.6
8,000	1,390	17.4
9,000	1,470	16.4
10,000	1,550	15.5
20,000	2,130	10.7
30,000	2,540	8.5
40,000	2,850	7.1
50,000	3,110	6.2
100,000	3,980	4.0
200,000	4,940	2.5
300,000	5,520	1.8
400,000	5,940	1.5

NOTE:
Because the age distribution of the Indigenous population differs from that of the non-Indigenous population, data are often age standardised for the purposes of making comparisons between the Indigenous and non-Indigenous populations. Age standardised estimates are also often used for comparisons over time. Where Indigenous estimates from the 2001 collection have been age standardised, the standard errors are, on average, between 10% and 30% higher than the corresponding standard error of unstandardised estimates. Therefore, an adjustment factor of approximately 1.2 should be applied to the RSEs shown above for all age standardised estimates for the Indigenous population.
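As a rough illustration of this adjustment, the Python sketch below applies the factor of 1.2 to one row of Table 3. The estimate is hypothetical and the names are illustrative only.

```python
AGE_STANDARDISATION_FACTOR = 1.2  # approximate factor stated in the note above


def adjusted_rse(table_rse_percent, factor=AGE_STANDARDISATION_FACTOR):
    """Inflate an indicative RSE (%) for an age standardised estimate."""
    return table_rse_percent * factor


# Hypothetical age standardised estimate of 10,000 persons.
estimate = 10_000
table_rse = 15.5                           # Table 3 row for an estimate of 10,000
rse_std = adjusted_rse(table_rse)          # adjusted RSE (%)
se_std = estimate * rse_std / 100.0        # corresponding SE (no.)

print(round(rse_std, 1), round(se_std))
```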

REPLICATE WEIGHTS TECHNIQUE

A class of techniques called replication methods provides a general method of estimating variances for the types of complex sample designs and weighting procedures employed in ABS household surveys.

The basic idea behind the replication approach is to select subsamples repeatedly from the whole sample. For each of these subsamples the statistic of interest is calculated. The variance of the full sample statistic is then estimated using the variability among the replicate statistics calculated from these subsamples. The subsamples are called replicate groups and the statistics calculated from these replicates are called replicate estimates.

There are various ways of creating replicate subsamples from the full sample. The replicate weights produced for the 2001 NHS have been created under the Jackknife method of replication which is described below.

There are numerous advantages to using the replicate weighting approach. These include:

the same procedure is applicable to most statistics, such as means, percentages, ratios, correlations, derived statistics and regression coefficients; and
it is not necessary for the analyst to have available detailed survey design information if the replicate weights are included with the data file.

Derivation of replicate weights

Under the Jackknife method of replicate weighting, weights were derived as follows:

30 replicate groups were formed, with each group formed to mirror the overall sample. Units from a CD all belong to the same replicate group, and a unit can belong to only one replicate group.

One replicate group was dropped from the file, and the remaining records were then weighted in the same manner as for the full sample.

The records in the group that was dropped received a weight of zero.

This process was repeated for each replicate group (i.e. a total of 30 times).

Ultimately, each record had 30 replicate weights attached to it, with one of these being the zero weight.
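The steps above can be sketched in Python as follows. This is a simplified illustration, not the ABS procedure: it assumes that records have already been assigned to replicate groups, and it replaces the full re-weighting of the remaining records with a simple rescaling by G/(G-1). The function name and the toy data are hypothetical.

```python
def jackknife_replicate_weights(weights, groups, n_groups=30):
    """Return one list of n_groups replicate weights per record.

    weights: full-sample weight of each record
    groups:  replicate group (0 .. n_groups-1) of each record
    """
    # ASSUMPTION: rescaling stands in for re-weighting the remaining
    # records "in the same manner as for the full sample".
    factor = n_groups / (n_groups - 1)
    return [
        [0.0 if g == dropped else w * factor for dropped in range(n_groups)]
        for w, g in zip(weights, groups)
    ]


# Toy example with 3 records and 3 replicate groups.
rep = jackknife_replicate_weights([100.0, 200.0, 300.0], [0, 1, 2], n_groups=3)
for row in rep:
    print(row)
```

Each record ends up with one replicate weight per group, exactly one of which is zero (the replicate in which its own group was dropped).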

Application of replicate weights

As noted above, replicate weights enable variances of estimates to be calculated relatively simply. They also enable unit records analyses such as chi-square and logistic regression to be conducted which take into account the sample design.

Replicate estimates for any variable of interest can be calculated from the 30 replicate groups, giving 30 replicate estimates. The distribution of this set of replicate estimates, in conjunction with the full sample estimate (based on the general weight), is then used to approximate the variance of the full sample estimate.

The formula for calculating the standard error (SE) and relative standard error (RSE) of an estimate using this method is shown below.

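$$SE(y) = \sqrt{\frac{29}{30} \sum_{g=1}^{30} \left(y_{(g)} - y\right)^2}$$

where

$g = 1, \ldots, 30$ (the number of replicate weights);
$y_{(g)}$ = the estimate obtained using replicate weight g; and
$y$ = the estimate obtained using the full person weight.

The RSE is then given by

$$RSE(y) = \frac{SE(y)}{y} \times 100\%$$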
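The formula can be written as a short Python sketch. The replicate estimates used here are made-up values chosen for easy checking; in practice each would be computed by applying one of the 30 replicate weights to the unit record data. The function names are illustrative.

```python
import math


def jackknife_se(full_estimate, replicate_estimates):
    """SE from replicate estimates: sqrt((G-1)/G * sum((y_g - y)^2))."""
    g = len(replicate_estimates)
    sum_sq = sum((y_g - full_estimate) ** 2 for y_g in replicate_estimates)
    return math.sqrt((g - 1) / g * sum_sq)


def rse_percent(full_estimate, se):
    """Express the SE as a percentage of the estimate."""
    return 100.0 * se / full_estimate


# Made-up data: full-sample estimate and 30 replicate estimates.
y_full = 1000.0
y_reps = [1010.0] * 15 + [990.0] * 15

se = jackknife_se(y_full, y_reps)
rse = rse_percent(y_full, se)

print(round(se, 2), round(rse, 2))
```

Because the function takes any full-sample statistic and its replicate counterparts, the same sketch applies to totals, means, proportions or model coefficients.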

This method can also be used when modelling relationships from unit record data, regardless of the modelling technique used. In modelling, the full sample would be used to estimate the parameter being studied, such as a regression coefficient, and the 30 replicate groups would be used to provide 30 replicate estimates of that parameter. The variance of the estimate of the parameter from the full sample is then approximated, as above, by the variability of the replicate estimates.

Use of replicate weights with statistical packages

Not all statistical computer packages allow direct calculation of SEs using the Jackknife replicate weights. However, those packages that allow the direct use of Balanced Repeated Replication (BRR) methodology generally include the option of an adjustment factor. This factor can be incorporated to overcome the difference between the variance formulae.
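The size of such an adjustment factor depends on how a given package defines its BRR variance. As a sketch only, suppose the package computes the BRR variance as the mean squared deviation of the replicate estimates from the full sample estimate. This form is an assumption and should be checked against the package documentation. Under that assumption, the relationship to the Jackknife formula in this Appendix is:

```latex
% Jackknife variance used in this Appendix (G = 30 replicate groups):
V_{JK}(y) = \frac{G-1}{G} \sum_{g=1}^{G} \left(y_{(g)} - y\right)^2

% ASSUMED form of the package's BRR variance:
V_{BRR}(y) = \frac{1}{G} \sum_{g=1}^{G} \left(y_{(g)} - y\right)^2

% Adjustment factor implied by that assumption:
V_{JK}(y) = (G-1)\, V_{BRR}(y) = 29\, V_{BRR}(y),
\qquad
SE_{JK}(y) = \sqrt{29}\; SE_{BRR}(y)
```

A package that defines its BRR variance differently would need a different factor.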

Availability of RSEs calculated using replicate weights

Indicative RSEs were used in the summary publications released from the NHS and NHSI. However, RSEs calculated using the replicate weights methodology are available for the following output.

A set of NHS tables containing a breakdown by ASGC Remoteness categories is available as spreadsheets on the ABS web site, via the Health Theme Page. RSEs for these tables were calculated using the replicate weights methodology.

Tables from the publication National Health Survey: Aboriginal and Torres Strait Islander Results, Australia 2001 (cat. no. 4715.0) which contain age standardised estimates were also recompiled with RSEs calculated using the replicate weights methodology. These are available electronically and can be accessed through publication 4715.0 on the ABS web site.
