TECHNICAL NOTE DATA QUALITY
ESTIMATION PROCEDURE
1 The survey weights were calculated to ensure that the survey estimates conform to an independently estimated distribution of households (by number of adults and children within the household, and by part of the state).
2 The estimates were then obtained by summing the weights of households with the characteristic of interest. For example, an estimate of the total number of households living in a dwelling with heating is obtained by adding together the weight for each household in the sample living in a dwelling with heating. For a comparison between dwellings and households please refer to the Glossary.
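The weighted-sum estimation described above can be sketched in a few lines of Python. This is a minimal illustration only: the `Household` record, its field names and the three weights are hypothetical and are not taken from the survey.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Household:
    """One responding household (hypothetical record layout)."""
    weight: float        # survey weight: number of households this one represents
    has_heating: bool    # characteristic of interest


def weighted_total(sample: Iterable[Household],
                   has_characteristic: Callable[[Household], bool]) -> float:
    """Estimate a population total by summing the weights of matching households."""
    return sum(h.weight for h in sample if has_characteristic(h))


# Made-up sample of three households; the weights are illustrative only.
sample = [
    Household(weight=412.5, has_heating=True),
    Household(weight=388.0, has_heating=False),
    Household(weight=501.2, has_heating=True),
]

heated_estimate = weighted_total(sample, lambda h: h.has_heating)
print(f"Estimated households with heating: {heated_estimate:,.1f}")
```

Only the two households with heating contribute to the estimate; the third household's weight is left out of the sum.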
RELIABILITY OF ESTIMATES
3 Estimates in this publication are subject to non-sampling and sampling errors.
Non-sampling errors
4 Non-sampling errors may arise as a result of errors in the reporting, recording or processing of the data and can occur even if there is a complete enumeration of the population. Non-sampling errors can be introduced through inadequacies in the questionnaire, non-response, inaccurate reporting by respondents, errors in the application of survey procedures, incorrect recording of answers, and errors in data entry and processing.
5 It is difficult to measure the size of the non-sampling errors and the extent of these errors could vary considerably from survey to survey and from question to question. Every effort was made in the design of this survey and in the development of survey procedures to minimise the effect of these errors.
Sampling errors
6 Sampling error is the difference between the published estimate, calculated from a sample of dwellings, and the value that would have been produced if all dwellings had been included in the survey.
ESTIMATES OF SAMPLING ERRORS
7 One measure of the likely difference between a survey estimate and the 'true' population value is given by the Standard Error (SE). There are about two chances in three (67%) that a survey estimate is within one SE of the figure that would have been obtained if all households had been included in the survey, and about nineteen chances in twenty (95%) that the estimate lies within two SEs.
8 Due to space limitations, it is impractical to print the SE of each estimate in the publication. Instead, a table of SEs is provided to enable readers to determine the SE for an estimate based on the size of that estimate (see SE table below). The SE table is derived from a mathematical model, which is created using the data collected in the survey. The figures in the SE table will not give a precise measure of the SE for a particular estimate but will provide an indication of its magnitude.
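9 Linear interpolation can be used to calculate the SE of estimates falling between the sizes of estimates presented in the table below, using the following general formula:
SE of estimate = lower SE + ((size of estimate - lower estimate) / (upper estimate - lower estimate)) × (upper SE - lower SE)
where the lower and upper estimates are the table sizes immediately below and above the estimate, and the lower and upper SEs are the SEs shown for those sizes.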
10 An example of the calculation and use of SEs is as follows. Table 2 shows that the estimated number of households in WA that lived in a separate house was 637,100. Since this estimate is between 500,000 and 1,000,000, the SE table shows that the SE will lie between 8,331 and 9,669. The approximate value of the SE can be interpolated as follows:
SE of 637,100 = 8,331 + ((637,100 - 500,000) / (1,000,000 - 500,000)) × (9,669 - 8,331) = 8,698 (rounded to the nearest whole number)
11 Therefore, there are about two chances in three that the true number of households in WA that lived in a separate house lies between 628,402 and 645,798, and there are about nineteen chances in twenty that the value lies between 619,704 and 654,496.
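The interpolation and the two ranges quoted above can be reproduced with a short Python sketch. The function names are our own, and the table SEs are rounded to whole numbers (8,331 and 9,669) as in the worked example; using the unrounded table values would shift the result slightly.

```python
def interpolate_se(estimate, lower_size, upper_size, lower_se, upper_se):
    """Linearly interpolate an SE between two adjacent rows of the SE table."""
    fraction = (estimate - lower_size) / (upper_size - lower_size)
    return lower_se + fraction * (upper_se - lower_se)


def interval(estimate, se, num_se):
    """Range covering the estimate plus or minus num_se standard errors."""
    return estimate - num_se * se, estimate + num_se * se


estimate = 637_100  # WA households living in a separate house (Table 2)

# Bracketing table rows are 500,000 and 1,000,000; SEs rounded as in the text.
se = round(interpolate_se(estimate, 500_000, 1_000_000, 8_331, 9_669))

two_in_three = interval(estimate, se, 1)        # about two chances in three
nineteen_in_twenty = interval(estimate, se, 2)  # about nineteen chances in twenty

print(se)                  # 8698
print(two_in_three)        # (628402, 645798)
print(nineteen_in_twenty)  # (619704, 654496)
```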
12 The SE can also be expressed as a percentage of the estimate, known as the Relative Standard Error (RSE). The RSE is calculated by dividing the SE of an estimate by the estimate, and expressing it as a percentage. That is:
RSE% = (SE / estimate) × 100
13 For example, the RSE for the number of households in WA that lived in a separate house is:
RSE% = (8,698 / 637,100) × 100 = 1.4%
14 In general, the size of the SE increases as the size of the estimate increases. Conversely, the RSE decreases as the size of the estimate increases. Very small estimates are thus subject to high RSEs and are considered unreliable for general use.
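As a brief illustration of the RSE calculation and of how it behaves for small and large estimates, the sketch below applies it to the separate-house estimate and to the smallest size in the SE table. The function name is our own choice.

```python
def relative_standard_error(se, estimate):
    """Return the RSE, i.e. the SE expressed as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("RSE is undefined for an estimate of zero")
    return se / estimate * 100


# Large estimate: separate houses in WA (SE interpolated as 8,698).
rse_separate_house = relative_standard_error(8_698, 637_100)

# Small estimate: first row of the SE table (size 1,000, SE 501.1).
rse_small = relative_standard_error(501.1, 1_000)

print(f"{rse_separate_house:.1f}%")  # 1.4%
print(f"{rse_small:.1f}%")           # 50.1%
```

The SE for the larger estimate is bigger in absolute terms, yet its RSE is far smaller, which is the pattern described in the paragraph above.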
15 Proportions and percentages formed from the ratio of two estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. A formula to approximate the RSE of a proportion is given below. This formula is only valid when the numerator is a subset of the denominator.
RSE(x/y) = sqrt([RSE(x)]^2 - [RSE(y)]^2)
where x is the numerator and y is the denominator of the proportion.
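A minimal sketch of the usual approximation for the RSE of a proportion, RSE(x/y) = sqrt([RSE(x)]^2 - [RSE(y)]^2), follows. The function name is our own, and the inputs are hypothetical: they pair the table RSEs for estimates of 100,000 (5.2%) and 1,000,000 (1.0%), assuming the smaller group is a subset of the larger one. No such proportion is quoted in this note.

```python
import math


def rse_of_proportion(rse_numerator, rse_denominator):
    """Approximate RSE (%) of x/y, valid only when x is a subset of y."""
    if rse_numerator < rse_denominator:
        # For a genuine subset the numerator is the smaller estimate and
        # should have the larger RSE, so this signals a misuse of the formula.
        raise ValueError("numerator RSE must be at least the denominator RSE")
    return math.sqrt(rse_numerator ** 2 - rse_denominator ** 2)


# Hypothetical example: numerator of about 100,000 households (RSE 5.2%)
# within a denominator of about 1,000,000 households (RSE 1.0%).
rse_ratio = rse_of_proportion(5.2, 1.0)
print(f"RSE of proportion: {rse_ratio:.1f}%")
```

The guard reflects the subset condition stated above; it rejects inputs for which the approximation would require the square root of a negative number.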
16 Published estimates are sometimes used to calculate the difference between two survey estimates. Such estimates are also subject to sampling error. The sampling error of the difference between two estimates depends on the SE of each estimate and the relationship (correlation) between them. The approximate SE of the difference between two estimates (x - y) may be calculated using the following formula:
SE(x - y) = sqrt([SE(x)]^2 + [SE(y)]^2)
17 While this formula will only be exact for differences between separate and uncorrelated characteristics or subpopulations, it is expected to provide a good approximation for all differences likely to be of interest in this publication.
18 For example, Table 2 shows that an estimated 136,800 households in WA lived in a dwelling with solar energy and 546,200 households lived in a dwelling with mains gas. This equates to a difference of 409,400 households. The standard error for each estimate is calculated using linear interpolation (as described above), giving approximately 5,674 and 8,455 respectively, and then the standard error on the estimate of the difference is calculated as:
SE(x - y) = sqrt(8,455^2 + 5,674^2) = 10,182
19 Therefore, there are about two chances in three that the true difference between the number of households in WA living in a dwelling with mains gas and the number living in a dwelling with solar energy lies between 399,218 and 419,582, and there are about nineteen chances in twenty that the value lies between 389,036 and 429,764.
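The difference calculation can be sketched in Python as below. The function name is our own, and the two input SEs (8,455 for mains gas and 5,674 for solar) are our own interpolations from the rounded SE table rather than figures quoted in the text, so treat them as assumptions. They do reproduce the ranges given above.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate SE of (x - y), treating the two estimates as uncorrelated."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


mains_gas = 546_200   # WA households with mains gas (Table 2)
solar = 136_800       # WA households with solar energy (Table 2)
difference = mains_gas - solar

# Assumed SEs, interpolated from the SE table and rounded to whole households.
se_diff = round(se_of_difference(8_455, 5_674))

one_se_range = (difference - se_diff, difference + se_diff)
two_se_range = (difference - 2 * se_diff, difference + 2 * se_diff)

print(difference)    # 409400
print(se_diff)       # 10182
print(one_se_range)  # (399218, 419582)
print(two_se_range)  # (389036, 429764)
```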
STANDARD ERRORS ON ESTIMATES OF WA HOUSEHOLDS, Domestic Use of Water and Energy, October 2006
Size of estimate | Standard Error (no.) | Relative Standard Error (%)
1,000 | 501.1 | 50.1
1,500 | 652.7 | 43.5
2,000 | 782.0 | 39.1
2,500 | 896.1 | 35.8
3,000 | 999.1 | 33.3
3,500 | 1,093.4 | 31.2
4,000 | 1,180.7 | 29.5
5,000 | 1,338.8 | 26.8
8,000 | 1,724.8 | 21.6
10,000 | 1,935.0 | 19.3
20,000 | 2,705.9 | 13.5
30,000 | 3,242.4 | 10.8
50,000 | 4,007.1 | 8.0
100,000 | 5,189.9 | 5.2
200,000 | 6,503.5 | 3.3
300,000 | 7,308.4 | 2.4
500,000 | 8,330.5 | 1.7
1,000,000 | 9,668.5 | 1.0
2,000,000 | 10,856.8 | 0.5
20 Where differences between data items have been noted in the Summary of Findings, they are statistically significant unless otherwise specified. In this publication a statistically significant difference is one where there are nineteen chances in twenty that the difference noted reflects a true difference between population groups of interest rather than being the result of sampling variability.
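One common way to apply this criterion, offered here as an assumption rather than as the exact test used for the publication, is to treat a difference as significant when it exceeds about two SEs of the difference. The sketch below uses that rule; the function name, the two-SE threshold and the second pair of estimates are illustrative choices of our own.

```python
import math


def is_significant(estimate_x, se_x, estimate_y, se_y, num_se=2.0):
    """Return True if |x - y| exceeds num_se standard errors of the difference.

    Assumes the two estimates are uncorrelated; num_se=2 corresponds to
    roughly nineteen chances in twenty.
    """
    se_diff = math.sqrt(se_x ** 2 + se_y ** 2)
    return abs(estimate_x - estimate_y) > num_se * se_diff


# Mains gas versus solar energy; the SEs are assumed values interpolated
# from the SE table.
gas_vs_solar = is_significant(546_200, 8_455, 136_800, 5_674)

# Hypothetical small estimates of 10,000 and 8,000 with their table SEs.
small_pair = is_significant(10_000, 1_935.0, 8_000, 1_724.8)

print(gas_vs_solar)  # True
print(small_pair)    # False
```

The second comparison shows why differences between small estimates often cannot be distinguished from sampling variability: the gap of 2,000 is well inside two SEs of the difference.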