TECHNICAL NOTE DATA QUALITY
RELIABILITY OF THE ESTIMATES
1 Two types of error are possible in an estimate based on a sample survey: sampling error and non-sampling error. The sampling error is a measure of the variability that occurs by chance because a sample, rather than the entire population, is surveyed. Since the estimates in this publication are based on information obtained from occupants of a sample of dwellings, they are subject to sampling variability; that is, they may differ from the figures that would have been produced if all dwellings had been included in the survey. One measure of the likely difference is given by the standard error (SE). There are about two chances in three that a sample estimate will differ by less than one SE from the figure that would have been obtained if all dwellings had been included, and about 19 chances in 20 that the difference will be less than two SEs.
2 Another measure of the likely difference is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate. The RSE is a useful measure in that it provides an immediate indication of the percentage errors likely to have occurred due to sampling, and thus avoids the need to refer also to the size of the estimate.
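The SE-to-RSE relationship described above can be expressed as a minimal sketch in Python (the function name is illustrative, not from the publication):

```python
def rse(estimate: float, se: float) -> float:
    """Relative standard error (%): the SE expressed as a percentage of the estimate."""
    return se / estimate * 100

# An estimate of 200,000 with an SE of 10,000 has an RSE of 5%.
```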
3 RSEs for estimates from the 2012 SDAC were calculated using the Jackknife method of variance estimation. This involves the calculation of 60 'replicate' estimates based on 60 different sub-samples of the original sample. The variability of estimates obtained from these sub-samples is used to estimate the sample variability surrounding the main estimate.
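The publication does not give the replicate weighting details, but the idea can be sketched with a simplified delete-a-group jackknife (the group assignment here is random purely for illustration; the SDAC sub-samples are design-based):

```python
import random
import statistics

def jackknife_se(values, n_groups=60, seed=1):
    """Simplified delete-a-group jackknife SE of a sample mean.

    The sample is split into n_groups sub-samples; each replicate estimate
    re-computes the statistic with one group dropped. The spread of the
    replicates around the full-sample estimate measures sampling variability.
    """
    rng = random.Random(seed)
    groups = [[] for _ in range(n_groups)]
    for v in values:
        groups[rng.randrange(n_groups)].append(v)   # illustrative random grouping
    full = statistics.mean(values)
    replicates = [
        statistics.mean([v for i, g in enumerate(groups) if i != drop for v in g])
        for drop in range(n_groups)
    ]
    variance = (n_groups - 1) / n_groups * sum((r - full) ** 2 for r in replicates)
    return variance ** 0.5
```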
4 Tables 1 and 2 contain time series estimates from the 2012 SDAC, 2009 SDAC and 2003 SDAC. The spreadsheet datacubes associated with the current edition of Disability, Ageing and Carers, Australia: Summary of Findings (cat. no. 4430.0) contain directly calculated RSEs for the 2012 and 2009 estimates. However, the RSEs for the 2003 estimates were calculated using a previous statistical model for SEs. This is detailed in Disability, Ageing and Carers, Australia: Summary of Findings, 2003 (cat. no. 4430.0), which is available on the ABS website <www.abs.gov.au>. While the direct method is more accurate, the difference between the two methods is not significant for most estimates.
5 Very small estimates may be subject to such high RSEs as to detract seriously from their value for most reasonable uses. In the tables in this publication, only estimates with RSEs less than 25% are considered sufficiently reliable for most purposes; however, estimates with larger RSEs are published. Where an estimate has an RSE between 25% and 50%, a footnote is included to indicate that it is subject to a high RSE and should be used with caution, and the RSE is provided. Estimates with RSEs of 50% or more include a footnote explaining that such estimates are considered unreliable for general use, and the RSE is suppressed.
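The footnoting rules above can be expressed as a small helper (a sketch; the labels are mine, not the publication's):

```python
def reliability_note(rse_percent: float) -> str:
    """Classify an estimate by its RSE, mirroring the publication's rules."""
    if rse_percent < 25:
        return "reliable"                # published without annotation
    if rse_percent < 50:
        return "use with caution"        # footnoted, RSE shown
    return "considered unreliable"       # footnoted, RSE suppressed
```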
6 The imprecision due to sampling variability is measured by the SE. This should not be confused with inaccuracies that may occur because of imperfections in reporting by interviewers and respondents or errors made in coding and processing of data. Inaccuracies of this kind are referred to as non-sampling error, and they may occur in any enumeration, whether it be in a full count or only a sample. In practice, the potential for non-sampling error adds to the uncertainty of the estimates caused by sampling variability. However, it is not possible to quantify the non-sampling error.
CONFIDENTIALITY OF ESTIMATES
7 In accordance with the Census and Statistics Act 1905, all published estimates are subjected to a confidentiality process before release. This process is undertaken to minimise the risk of identifying particular individuals, families, households or dwellings in aggregate statistics, through analysis of published data.
8 To minimise the risk of identifying individuals in aggregate statistics, a technique is used to randomly adjust cell values. This technique is called perturbation. Perturbation involves small random adjustment of the statistics and is considered the most satisfactory technique for avoiding the release of identifiable statistics while maximising the range of information that can be released. These adjustments have a negligible impact on the underlying pattern of the statistics.
9 After perturbation, a given published cell value will be consistent across all tables. However, adding up cell values to derive a total will not necessarily give the same result as published totals. The introduction of perturbation in publications ensures that these statistics are consistent with statistics released via services such as Table Builder. Refer to the Interpreting the Results chapter of the Explanatory Notes for a further illustration of how perturbed estimates will be published.
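As a toy illustration only (the actual ABS perturbation algorithm is not described in this note), seeding a small random adjustment from a key that identifies the cell reproduces the consistency property described above: the same cell perturbs identically wherever it appears, while sums of perturbed cells need not equal the published totals.

```python
import hashlib
import random

def perturb(cell_value: int, cell_key: str, max_adjust: int = 2) -> int:
    """Toy perturbation: a small random adjustment seeded from the cell's identity.

    NOT the ABS method; it only illustrates why a given cell is consistent
    across tables while added-up cells may differ from published totals.
    """
    seed = int(hashlib.sha256(cell_key.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return max(0, cell_value + rng.randint(-max_adjust, max_adjust))
```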
EXAMPLE OF INTERPRETATION OF SAMPLING ERROR
10 Standard errors can be calculated using the estimates and the corresponding RSEs. As an example, using estimates from Table 3, 142,900 males aged 25 to 34 years are estimated as having a disability. The RSE for this estimate is 4.9% (see the Relative Standard Error Table at the end of this Technical Note). The SE is calculated by:

SE of estimate = (RSE/100) × estimate = (4.9/100) × 142,900 ≈ 7,000 (rounded to the nearest 100)
11 Therefore, there are about two chances in three that the actual number of males aged 25 to 34 years with a disability will fall within the range 135,900 to 149,900, and about 19 chances in 20 that the value will fall within the range 128,900 to 156,900. This example is illustrated in the diagram below.
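The arithmetic of this worked example, as a sketch:

```python
estimate = 142_900                 # males aged 25-34 with a disability (Table 3)
rse = 4.9                          # per cent

se = round(rse / 100 * estimate, -2)        # about 7,002, rounded to the nearest 100

one_se_range = (estimate - se, estimate + se)          # about 2 chances in 3
two_se_range = (estimate - 2 * se, estimate + 2 * se)  # about 19 chances in 20
```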
STANDARD ERRORS OF PROPORTIONS AND PERCENTAGES
12 Proportions and percentages formed from the ratio of two estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. For proportions where the denominator is an estimate of the number of persons in a group and the numerator is the number of persons in a sub-group of the denominator group, the formula to approximate the RSE is given below. The formula is only valid when x is a subset of y.

RSE(x/y) ≈ √([RSE(x)]² − [RSE(y)]²)
13 As an example, using estimates from Table 3, of the 4,234,200 persons who had a disability, 2,186,200 are females, or 51.6%. The RSE for 2,186,200 is 1.0% and the RSE for 4,234,200 is 0.9% (see the Relative Standard Error Table at the end of this Technical Note). Applying the above formula, the RSE for the proportion of females who had a disability is:

RSE(proportion) ≈ √(1.0² − 0.9²) = √0.19 ≈ 0.4%
14 Therefore, the SE for the proportion of persons who had a disability and were female, is 0.2 percentage points (=0.4/100 x 51.6). Hence, there are about two chances in three that the proportion of females who had a disability is between 51.4% and 51.8%, and 19 chances in 20 that the proportion is between 51.2% and 52.0%.
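The proportion calculation in paragraphs 13 and 14, as a sketch:

```python
females = 2_186_200        # females with a disability (Table 3)
persons = 4_234_200        # all persons with a disability
proportion = females / persons * 100            # about 51.6 per cent

rse_x, rse_y = 1.0, 0.9                         # RSEs (%) of numerator, denominator
rse_prop = (rse_x ** 2 - rse_y ** 2) ** 0.5     # about 0.44, quoted as 0.4%

se_points = rse_prop / 100 * proportion         # about 0.2 percentage points
```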
COMPARISON OF ESTIMATES
15 Published estimates may also be used to calculate the difference between two survey estimates. Such an estimate is subject to sampling error. The sampling error of the difference between two estimates depends on their SEs and the relationship (correlation) between them. An approximate SE of the difference between two estimates (x-y) may be calculated by the following formula:

SE(x − y) ≈ √([SE(x)]² + [SE(y)]²)
16 While the above formula will be exact only for differences between separate and uncorrelated (unrelated) characteristics of sub-populations, it is expected that it will provide a reasonable approximation for all differences likely to be of interest in this publication.
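This approximation, as a one-line sketch (the function name is mine):

```python
def se_difference(se_x: float, se_y: float) -> float:
    """Approximate SE of (x - y) for uncorrelated estimates: sqrt(SE(x)^2 + SE(y)^2)."""
    return (se_x ** 2 + se_y ** 2) ** 0.5
```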
SIGNIFICANCE TESTING
17 For comparing estimates between surveys or between populations within a survey, it is useful to determine whether apparent differences are 'real' differences between the corresponding population characteristics or simply the product of differences between the survey samples. One way to examine this is to determine whether the difference between the estimates is statistically significant. This is done by calculating the SE of the difference between two estimates (x and y) and using that to calculate the test statistic using the formula below:

(x − y) / SE(x − y)
18 If the value of the statistic is greater than 1.96, there is good evidence of a statistically significant difference at the 95% level of confidence between the two populations with respect to that characteristic. Otherwise, it cannot be stated with confidence that there is a real difference between the populations.
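Putting paragraphs 17 and 18 together as a sketch (the names are mine, not the publication's):

```python
def significantly_different(x: float, y: float, se_x: float, se_y: float,
                            critical: float = 1.96) -> bool:
    """Two-sided test at the 95% level: is |x - y| more than 1.96 SEs of the difference?"""
    se_diff = (se_x ** 2 + se_y ** 2) ** 0.5
    return abs(x - y) / se_diff > critical
```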
19 The selected tables in this publication that show the results of significance testing are annotated to indicate where the estimates are significantly different from each other. In all other tables that do not show the results of significance testing, users should take account of RSEs when comparing estimates for different populations.
AGE STANDARDISATION
20 Totals presented in Tables 1 and 2 comparing rates over time are shown as age-standardised percentages. An age-standardised rate removes the effects of different age structures when comparing population groups or changes over time. The standardised rate is that which would have prevailed if the actual population had the standard age composition. Age-specific disability rates are multiplied by the standard population for each age group. The results are added and the sum calculated as a percentage of the standard population total to give the age-standardised percentage rate. The standardised rates should only be used to identify differences between population groups and changes over time.
21 For this publication the direct age standardisation method was used. The standard population used was the 30 June 2001 Estimated Resident Population. Estimates of age-standardised rates were calculated using the following formula:

C_{direct} = Σ_a (C_a × P_{sa})
where:
C_{direct} = the age-standardised rate for the population of interest
a = the age categories that have been used in the age standardisation
C_{a} = the estimated rate for the population being standardised in age category a
P_{sa} = the proportion of the standard population in age category a.
22 The age categories used in the standardisation for this publication were five-year age groups, to 75 years and over.
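The direct method described above, as a sketch (the rates and proportions in the example are invented for illustration, not SDAC figures):

```python
def age_standardised_rate(age_rates, std_proportions):
    """Direct age standardisation: C_direct = sum over a of (C_a * P_sa),
    where P_sa are the standard population's age-group proportions (summing to 1)."""
    assert abs(sum(std_proportions) - 1.0) < 1e-9
    return sum(c * p for c, p in zip(age_rates, std_proportions))

# Two invented age groups: rates of 10% and 40%, standard population shares 0.8 and 0.2.
```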