Reliability of estimates
Two types of error are possible in estimates based on a sample survey:
- Non-sampling error
- Sampling error
Non-sampling error
Non-sampling error is caused by factors other than those related to sample selection. It refers to anything that results in the data values not accurately reflecting the true population values.
It can occur at any stage throughout the survey process. Examples include:
- selected people who do not respond (e.g., refusals, non-contact)
- questions being misunderstood
- responses being incorrectly recorded
- errors in coding or processing the survey data.
Undercoverage is a type of non-sampling error. For more information, see Undercoverage in the How the data is processed section.
Sampling error
Sampling error is the difference that can be expected to occur between the published estimates and the values that would have been produced if the entire population had been surveyed. Sampling error is the result of random variation and can be estimated using measures of variance in the data.
Standard error
One measure of sampling error is the standard error (SE). There are about two chances in three that an estimate will differ by less than one SE from the figure that would have been obtained if the whole population had been included. There are about 19 chances in 20 that an estimate will differ by less than two SEs.
Relative standard error
The relative standard error (RSE) is a useful measure of sampling error. It is the SE expressed as a percentage of the estimate:
\(RSE\% = (\frac{{SE}}{{estimate}}) \times 100\)
Only estimates with RSEs of less than 25% are considered reliable for most purposes. Estimates with RSEs of 25% to less than 50% have been included in the publication but are flagged to indicate they are subject to high SEs and should be used with caution. Estimates with RSEs of 50% or more have also been flagged and are considered unreliable for most purposes; RSEs for these estimates are not published.
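As an illustration of how the RSE and the reliability thresholds above might be applied, the following sketch (with hypothetical function names) computes the RSE from an SE and an estimate and returns the corresponding reliability flag. The figures used are illustrative only.

```python
def relative_standard_error(se, estimate):
    """RSE as a percentage of the estimate: (SE / estimate) * 100."""
    return (se / estimate) * 100


def reliability_flag(rse_percent):
    """Classify an estimate using the thresholds described above."""
    if rse_percent < 25:
        return "reliable for most purposes"
    elif rse_percent < 50:
        return "use with caution (subject to high sampling error)"
    else:
        return "considered unreliable for most purposes"


# Illustrative figures: an estimate of 120,000 with an SE of 18,000.
rse = relative_standard_error(se=18_000, estimate=120_000)
print(f"RSE = {rse:.1f}% -> {reliability_flag(rse)}")
```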
Margin of error for proportions
Another measure of sampling error is the Margin of Error (MOE). The MOE describes the distance from the population value within which the sample estimate is likely to fall, and is particularly useful for understanding the accuracy of proportion estimates. It is specified at a given level of confidence. Confidence levels typically used are 90%, 95% and 99%.
For example, at the 95% confidence level, the MOE indicates that there are about 19 chances in 20 that the estimate will differ by less than the specified MOE from the population value (the figure obtained if the whole population had been enumerated). The 95% MOE is calculated as 1.96 multiplied by the SE:
\(MOE=SE \times1.96\)
The RSE can also be used to directly calculate a 95% MOE by:
\(MOE(y) \approx (\frac{{RSE(y) \times y}}{{100}}) \times 1.96\)
The MOEs in this publication are calculated at the 95% confidence level. This can easily be converted to a 90% confidence level by multiplying the MOE by:
\(\frac{{1.645}}{{1.96}}\)
or to a 99% confidence level by multiplying the MOE by:
\(\frac{{2.576}}{{1.96}}\)
Depending on how the estimate is to be used, an MOE of greater than 10% may be considered too large to inform decisions. For example, a proportion of 15% with an MOE of plus or minus 11% would mean the estimate could be anything from 4% to 26%. It is important to consider this range when using the estimates to make assertions about the population.
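The following sketch illustrates these MOE calculations: it computes a 95% MOE from an SE and rescales it to the 90% and 99% confidence levels using the ratios above. The function names and figures are illustrative assumptions, not part of the publication.

```python
Z_95, Z_90, Z_99 = 1.96, 1.645, 2.576


def moe_95(se):
    """95% margin of error: SE multiplied by 1.96."""
    return se * Z_95


def convert_moe(moe95, z_target):
    """Rescale a 95% MOE to another confidence level (90% or 99%)."""
    return moe95 * (z_target / Z_95)


# Illustrative figure: a standard error of 2.5 percentage points.
moe = moe_95(se=2.5)
print(f"95% MOE: {moe:.2f} points")
print(f"90% MOE: {convert_moe(moe, Z_90):.2f} points")
print(f"99% MOE: {convert_moe(moe, Z_99):.2f} points")
```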
Confidence intervals
A confidence interval expresses the sampling error as a range in which the population value is expected to lie at a given level of confidence. It is calculated as the estimate plus or minus the MOE of that estimate; that is, the 95% confidence interval is the estimate +/- the 95% MOE.
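Continuing the illustration, a confidence interval can be formed directly from an estimate and its MOE. The helper below is hypothetical and uses the figures from the worked example above (a proportion of 15% with an MOE of plus or minus 11 percentage points).

```python
def confidence_interval(estimate, moe):
    """Return the (lower, upper) bounds of the interval: estimate +/- MOE."""
    return estimate - moe, estimate + moe


lower, upper = confidence_interval(estimate=15.0, moe=11.0)
print(f"95% confidence interval: {lower:.0f}% to {upper:.0f}%")
```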
Calculating measures of error
Proportions or percentages formed from the ratio of two estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. A formula to approximate the RSE of a proportion is given below. This formula is only valid when the numerator (x) is a subset of the denominator (y):
\(RSE(\frac{x}{y}) \approx \sqrt {{{[RSE(x)]}^2} - {{[RSE(y)]}^2}}\)
When calculating measures of error, it may be useful to convert RSE or MOE to SE. This allows the use of standard formulas involving the SE. The SE can be obtained from RSE or MOE using the following formulas:
\(SE = \frac{{RSE\% \times estimate}}{{100}}\)
\(SE = \frac{{MOE}}{{1.96}}\)
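The sketch below implements the approximation for the RSE of a proportion (valid only when the numerator is a subset of the denominator) together with the two SE conversions. The figures used are illustrative only.

```python
import math


def rse_of_proportion(rse_x, rse_y):
    """Approximate RSE (in %) of x/y when x is a subset of y:
    sqrt(RSE(x)^2 - RSE(y)^2)."""
    return math.sqrt(rse_x ** 2 - rse_y ** 2)


def se_from_rse(rse_percent, estimate):
    """SE from an RSE: (RSE% * estimate) / 100."""
    return (rse_percent * estimate) / 100


def se_from_moe(moe, z=1.96):
    """SE from a 95% MOE: MOE / 1.96."""
    return moe / z


# Illustrative figures only.
print(f"RSE of proportion: {rse_of_proportion(rse_x=12.0, rse_y=5.0):.1f}%")
print(f"SE from RSE: {se_from_rse(rse_percent=12.0, estimate=50_000):,.0f}")
print(f"SE from MOE: {se_from_moe(moe=3.2):.2f}")
```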
Comparison of estimates
The difference between two survey estimates (counts or percentages) can also be calculated from published estimates. Such an estimate is also subject to sampling error. The sampling error of the difference between two estimates depends on their SEs and the relationship (correlation) between them. An approximate SE of the difference between two estimates (x - y) may be calculated by the following formula:
\(SE(x - y) \approx \sqrt {{{[SE(x)]}^2} + {{[SE(y)]}^2}}\)
While this formula will only be exact for differences between unrelated characteristics or sub-populations, it provides a reasonable approximation for the differences likely to be of interest in this publication.
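A minimal sketch of this approximation, with illustrative SE values, is shown below.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate SE of the difference (x - y): sqrt(SE(x)^2 + SE(y)^2)."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


# Illustrative figures only.
print(f"SE(x - y) is approximately {se_of_difference(se_x=1.8, se_y=2.1):.2f}")
```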
Significance testing
When comparing estimates between surveys or between populations within a survey, it is useful to determine whether apparent differences are 'real' differences or simply the product of differences between the survey samples.
One way to examine this is to determine whether the difference between the estimates is statistically significant. This is done by calculating the standard error of the difference between two estimates (x and y) and using that to calculate the test statistic using the formula below:
\((\frac{{|x - y|}}{{SE(x - y)}})\)
where the SE of each estimate can be derived from its RSE:
\(SE(y) \approx \frac{{RSE(y) \times y}}{{100}}\)
If the value of the statistic is greater than 1.96, we can say there is good evidence of a statistically significant difference at the 95% confidence level between the two populations with respect to that characteristic. Otherwise, it cannot be stated with confidence that there is a real difference between the populations.
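The test can be sketched as follows, assuming the SEs of both estimates are known (or have been derived from their RSEs). The function name and figures are illustrative only.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate SE of the difference between two estimates."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


def is_significant(x, y, se_x, se_y, critical_value=1.96):
    """Return the test statistic |x - y| / SE(x - y) and whether the
    difference is statistically significant at the 95% confidence level."""
    statistic = abs(x - y) / se_of_difference(se_x, se_y)
    return statistic, statistic > critical_value


# Illustrative figures: proportions of 42% and 36%, each with an SE of 2.0.
stat, significant = is_significant(x=42.0, y=36.0, se_x=2.0, se_y=2.0)
print(f"test statistic = {stat:.2f}; significant at 95%: {significant}")
```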