4442.0 - Family Characteristics, Australia, Jun 2003  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 22/09/2004   

TECHNICAL NOTE - DATA QUALITY


ESTIMATION PROCEDURE

1 Estimates of numbers of persons, families and households with particular characteristics are derived from the survey by a complex estimation procedure. This procedure ensures that the survey estimates conform to person benchmarks by State, part-of-State, age and sex, and to household benchmarks by State, part-of-State and household composition (number of adults and children usually resident in the household). These benchmarks are produced from estimates of the resident population derived independently of the survey.



RELIABILITY OF ESTIMATES

2 Since the estimates in this publication are based on information obtained from occupants of a sample of dwellings, they are subject to sampling variability. That is, they may differ from those that would have been produced if all dwellings had been included in the survey. One measure of the likely difference is given by the standard error (SE), which indicates the extent to which an estimate might have varied by chance because only a sample of dwellings was included. There are about 2 chances in 3 (67%) that a sample estimate will vary by less than one SE from the number that would have been obtained if all dwellings had been included, and about 19 chances in 20 (95%) that the difference will be less than two SEs. Another measure of the sampling variability is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate.


3 Due to space limitations, it is impractical to present the SE of each estimate in the publication. Instead, tables of SEs are provided to enable readers to determine the SE for an estimate from the size of that estimate (see tables T1 and T2 in Standard Errors). Each SE table is derived from a mathematical model, referred to as the 'SE model', which is created using the data collected in this survey. It should be noted that the SE models only give an approximate value for the SE for any particular estimate, since there is some minor variation among SEs for different estimates of the same size.


4 It is important to use the correct standard error table for the estimate. For person and child estimates, T1 must be used to calculate the SEs and for household and family estimates, T2 must be used. Some tables have, for example, both family and person estimates and the correct SE table must be used for the different estimates.



CALCULATION OF STANDARD ERROR

5 An example of the calculation and use of SEs in relation to estimates of persons is as follows: From table 4 (in the publication), consider the estimate for Australia of 169,600 for lone parents aged 25-34 years in 2003. Since this estimate is between 150,000 (lower estimate) and 200,000 (upper estimate) in the SE table for person estimates (T1), the SE for Australia will lie between 7,450 and 8,300 and can be approximated by interpolation using the following general formula:


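\[
\text{SE of estimate} = \text{lower SE} + \frac{\text{size of estimate} - \text{lower estimate}}{\text{upper estimate} - \text{lower estimate}} \times (\text{upper SE} - \text{lower SE})
\]

\[
\text{SE of estimate} = 7{,}450 + \frac{169{,}600 - 150{,}000}{200{,}000 - 150{,}000} \times (8{,}300 - 7{,}450) \approx 7{,}800 \text{ (rounded to the nearest 100)}
\]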


6 Therefore, there are about two chances in three that the value that would have resulted if all dwellings had been included in the survey will fall within the range 161,800 and 177,400 and about 19 chances in 20 that the value will fall within the range 154,000 and 185,200. This example is illustrated in the following diagram.


Diagram: Calculation of Standard Error
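The interpolation in paragraphs 5 and 6 can be sketched in Python. This is a minimal illustration only: it assumes straight-line interpolation between the two adjacent rows of the SE table and rounding of the SE to the nearest 100, as done elsewhere in this note. The function names are hypothetical and are not part of any ABS product.

```python
def interpolate_se(estimate, lower_estimate, upper_estimate, lower_se, upper_se):
    """Linearly interpolate an SE between two adjacent rows of an SE table."""
    if not lower_estimate <= estimate <= upper_estimate:
        raise ValueError("estimate must lie between the two table rows")
    fraction = (estimate - lower_estimate) / (upper_estimate - lower_estimate)
    return lower_se + fraction * (upper_se - lower_se)


def round_to_hundred(value):
    """Round to the nearest 100 (assumed rounding convention for SEs)."""
    return int(round(value / 100.0)) * 100


def chance_ranges(estimate, se):
    """Return the one-SE (about 2 in 3) and two-SE (about 19 in 20) ranges."""
    return (estimate - se, estimate + se), (estimate - 2 * se, estimate + 2 * se)


# Person example from paragraph 5, using the T1 rows quoted there.
se_raw = interpolate_se(169_600, 150_000, 200_000, 7_450, 8_300)
se = round_to_hundred(se_raw)
one_se, two_se = chance_ranges(169_600, se)
print(se, one_se, two_se)
```

With the figures quoted in paragraph 5 this gives an SE of 7,800 and the ranges 161,800 to 177,400 and 154,000 to 185,200, which agree with paragraph 6.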


7 An example of the calculation and use of SEs in relation to estimates of families is as follows: From table 6 (in the publication), consider the estimate for Australia of 477,300 for couple families with non-dependent children only in 2003. Since this estimate is between 300,000 and 500,000 in the SE table for family estimates (T2), the SE for Australia will lie between 6,800 and 7,750 and can be approximated by interpolation using the same general formula:


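\[
\text{SE of estimate} = 6{,}800 + \frac{477{,}300 - 300{,}000}{500{,}000 - 300{,}000} \times (7{,}750 - 6{,}800) \approx 7{,}600 \text{ (rounded to the nearest 100)}
\]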


8 Therefore, there are about two chances in three that the value that would have been produced if all dwellings had been included in the survey will fall within the range 469,700 and 484,900 and about 19 chances in 20 that the value will fall within the range 462,100 and 492,500.


9 In general, the size of the SE increases as the size of the estimate increases, but at a slower rate. Therefore, the RSE decreases as the size of the estimate increases. Very small estimates are thus subject to such high RSEs that their value for most practical purposes is unreliable. In the tables in this publication, only estimates with an RSE of 25% or less are considered reliable for most purposes. Estimates with an RSE greater than 25% but less than or equal to 50% are preceded by an asterisk (e.g. *3.4) to indicate they are subject to high SEs and should be used with caution. Estimates with RSEs of greater than 50%, preceded by a double asterisk (e.g. **0.3), are considered too unreliable for general use and should only be used to aggregate with other estimates to provide derived estimates with an RSE of 25% or less.
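The annotation rule in paragraph 9 can be sketched as follows. This is an illustration only, assuming the RSE is already known as a percentage. The function names are hypothetical.

```python
def rse_percent(estimate, se):
    """Express the SE as a percentage of the estimate."""
    return se / estimate * 100.0


def flag_estimate(value_text, rse):
    """Prefix a published value with the reliability marker for its RSE.

    RSE of 25% or less:          no marker
    RSE over 25% and up to 50%:  '*'  (use with caution)
    RSE over 50%:                '**' (too unreliable for general use)
    """
    if rse > 50:
        return "**" + value_text
    if rse > 25:
        return "*" + value_text
    return value_text


print(flag_estimate("3.4", 30.0))   # RSE between 25% and 50%
print(flag_estimate("0.3", 60.0))   # RSE above 50%
print(round(rse_percent(169_600, 7_800), 1))
```

The first two calls reproduce the forms *3.4 and **0.3 used as examples in paragraph 9. The RSE values of 30% and 60% are illustrative and are not taken from the publication.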



MEAN AND MEDIAN

10 The mean and median are both measures of the central tendency of a distribution. The mean, or average, is calculated as the (weighted) sum of the estimate divided by the number of observations contributing to the sum. The median is the middle value of a set of values when the values are sorted in order. This publication contains median income values (in tables 23 and 24). While means (averages) are not presented in the publication tables, they can be derived from components of some tables. Two averages of interest that can be calculated are the average number of persons per household (from table 1) and the average number of children aged 0-17 years per family (from table 13). To derive these averages, the estimate of the number of persons/children is divided by the estimate of the number of households/families.


11 The RSEs of estimates of the median weekly parental income are obtained by first finding the RSE of the estimate of the total number of persons contributing to the median (see T1 or T2) and then multiplying the resulting number by a factor in T3.


12 The following is an example of the calculation of SEs where the use of a factor is required. Table 23 shows that the number of couple families with children aged 0-17 years and with the youngest child aged 0-2 years was 558,800 with a median weekly parental income of $1,054. The SE of 558,800 can be calculated from T2 (by interpolation) as 7,900 (rounded to the nearest 100). To convert this to an RSE, the SE is expressed as a percentage of the estimate, or 7,900/558,800 x 100 = 1.4%.


13 The RSE of the estimate of the median weekly parental income is calculated by multiplying this RSE (1.4%) by the appropriate factor shown in T3 (in this case 0.98): 1.4 x 0.98 = 1.4%. The SE of this estimate of the median weekly parental income is therefore 1.4% of $1,054, i.e. about $15. Therefore, there are 2 chances in 3 that the median weekly parental income that would have been obtained if all families had been included in the survey would have been in the range $1,039 to $1,069, and about 19 chances in 20 that it would have been within the range $1,024 to $1,084.
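The steps in paragraphs 12 and 13 can be sketched as follows. This is an illustration only: it assumes that intermediate RSEs are rounded to one decimal place and the final SE to the nearest dollar, mirroring the rounding in the worked example. The function name is hypothetical.

```python
def median_income_se(count, count_se, factor, median):
    """Return (count RSE %, median RSE %, median SE in dollars).

    count    -- estimated number of families contributing to the median
    count_se -- SE of that count, read from the SE table by interpolation
    factor   -- multiplier taken from T3
    median   -- estimated median weekly parental income in dollars
    """
    count_rse = round(count_se / count * 100, 1)
    median_rse = round(count_rse * factor, 1)
    median_se = round(median_rse / 100 * median)
    return count_rse, median_rse, median_se


count_rse, median_rse, median_se = median_income_se(558_800, 7_900, 0.98, 1_054)
one_se_range = (1_054 - median_se, 1_054 + median_se)
two_se_range = (1_054 - 2 * median_se, 1_054 + 2 * median_se)
print(count_rse, median_rse, median_se, one_se_range, two_se_range)
```

With the figures from table 23 this gives RSEs of 1.4% and 1.4%, an SE of about $15, and the ranges $1,039 to $1,069 and $1,024 to $1,084 quoted in paragraph 13.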


14 The cut-offs in T4 show the number of families below which the estimate of the median parental income will have the specified RSE or more. For example, the 25% RSE cut-off for median income in lone parent families in Australia is 2,351. This means that estimates of the median income of lone parents in Australia based on fewer than 2,351 families will have an RSE of 25% or more.



PROPORTIONS AND PERCENTAGES

15 Proportions and percentages formed from the ratio of two estimates are also subject to sampling error. The size of the error depends on the accuracy of both the numerator and the denominator. A formula to approximate the RSE of a proportion or percentage is given below.


16 This formula is only valid when x is a subset of y.


\[
\mathrm{RSE}\left(\frac{x}{y}\right) = \sqrt{[\mathrm{RSE}(x)]^2 - [\mathrm{RSE}(y)]^2}
\]


17 Considering the example in paragraph 5, the 169,600 persons aged 25-34 years who were lone parents represent 21.2% of the 799,800 persons who were lone parents. The SE of 799,800 is calculated by interpolation as 13,300 (rounded to the nearest 100). To convert this to an RSE, the SE is expressed as a percentage of the estimate, or 13,300/799,800 x 100 = 1.7%. The SE for 169,600 was calculated previously as 7,800, which converted to an RSE is 7,800/169,600 x 100 = 4.6%. Applying the above formula, the RSE of the proportion is:


\[
\mathrm{RSE}\left(\frac{x}{y}\right) = \sqrt{(4.6)^2 - (1.7)^2} = 4.3\%
\]


18 Therefore, the SE for the proportion of lone parents who were aged 25-34 years is 0.9 percentage points (or (21.2/100) x 4.3). There are therefore about 2 chances in 3 that the proportion of lone parents who were aged 25-34 years is between 20.3% and 22.1%, and about 19 chances in 20 that the proportion is within the range 19.4% to 23.0%.
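The calculation in paragraphs 17 and 18 can be sketched as follows. This is an illustration only: it assumes the approximation takes the form RSE(x/y) = sqrt(RSE(x)^2 - RSE(y)^2), which reproduces the 4.3% used above, and it rounds intermediate results to one decimal place as the worked example does. The function name is hypothetical.

```python
import math


def rse_of_proportion(rse_x, rse_y):
    """Approximate RSE (%) of x/y, valid only when x is a subset of y."""
    if rse_y > rse_x:
        raise ValueError("RSE(y) exceeds RSE(x); the approximation does not apply")
    return math.sqrt(rse_x ** 2 - rse_y ** 2)


x, y = 169_600, 799_800                  # lone parents aged 25-34; all lone parents
rse_x = round(7_800 / x * 100, 1)        # RSE of the numerator
rse_y = round(13_300 / y * 100, 1)       # RSE of the denominator
proportion = round(x / y * 100, 1)       # percentage of lone parents aged 25-34
rse_prop = round(rse_of_proportion(rse_x, rse_y), 1)
se_points = round(proportion / 100 * rse_prop, 1)   # SE in percentage points
low = round(proportion - se_points, 1)
high = round(proportion + se_points, 1)
print(rse_x, rse_y, proportion, rse_prop, se_points, low, high)
```

With the figures from paragraph 17 this gives an RSE of 4.3% for the proportion, an SE of 0.9 percentage points, and a one-SE range of 20.3% to 22.1%.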



DIFFERENCES

19 Published estimates may also be used to calculate the difference between two survey estimates (of numbers or percentages). Such an estimate is also subject to sampling error. The sampling error of the difference between two estimates depends on their SEs and the relationship (correlation) between them. An approximate SE of the difference between two estimates (x - y) may be calculated by the following formula:


\[
\mathrm{SE}(x - y) = \sqrt{[\mathrm{SE}(x)]^2 + [\mathrm{SE}(y)]^2}
\]


20 While this formula will only be exact for differences between separate and uncorrelated characteristics or sub populations, it is expected to provide a good approximation for all differences likely to be of interest in this publication.
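The approximation described in paragraphs 19 and 20 can be sketched as follows. This is an illustration only: it assumes the formula is the square root of the sum of the squared SEs, which is exact for uncorrelated estimates. The input SEs are hypothetical and are not taken from the publication.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate SE of (x - y), treating x and y as uncorrelated."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


# Hypothetical SEs, chosen only so that the arithmetic is easy to follow.
se_diff = se_of_difference(3_000, 4_000)
print(se_diff)
```

Note that the SE of the difference (5,000 here) is larger than either input SE but smaller than their sum.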



SIGNIFICANCE TESTING

21 Statistical significance testing has been undertaken for the comparison of proportions in this publication in the following tables:

  • 1 - for proportion of persons, families and households between 1992 and 1997, and 1997 and 2003
  • 13 - for proportion of families, persons and children between 1992 and 1997, and 1997 and 2003
  • 20 - for proportion of children between 1997 and 2003

22 The statistical significance test for these comparisons was performed to determine whether it is likely that there is a real difference between the corresponding population characteristics, rather than an apparent difference resulting from sampling variability in the data. The standard error of the difference between two corresponding estimates (x and y) can be calculated using the formula in paragraph 19. This standard error is then used to calculate the following test statistic:


\[
\frac{|x - y|}{\mathrm{SE}(x - y)}
\]


23 If the value of this test statistic is greater than 1.96 then there are 19 chances in 20 that there is a real difference in the two populations with respect to that characteristic. Otherwise, it cannot be stated with confidence that there is a real difference between the populations.
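The test in paragraphs 22 and 23 can be sketched as follows. This is an illustration only: it assumes the test statistic is the absolute difference between the two estimates divided by the SE of the difference from paragraph 19. The function names and the input figures are hypothetical and are not taken from the publication.

```python
def difference_statistic(x, y, se_diff):
    """Absolute difference between two estimates, in units of its SE."""
    if se_diff <= 0:
        raise ValueError("the SE of the difference must be positive")
    return abs(x - y) / se_diff


def is_significant(x, y, se_diff, critical_value=1.96):
    """True when the statistic exceeds 1.96 (the 19 chances in 20 level)."""
    return difference_statistic(x, y, se_diff) > critical_value


# Hypothetical estimates with a hypothetical SE of the difference of 5,000.
print(difference_statistic(120_000, 100_000, 5_000), is_significant(120_000, 100_000, 5_000))
print(difference_statistic(105_000, 100_000, 5_000), is_significant(105_000, 100_000, 5_000))
```

In the first case the statistic is 4.0, so the difference would be treated as statistically significant. In the second it is 1.0, so no real difference could be stated with confidence.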


24 The SE of an estimated percentage or rate, computed by using sample data for both numerator and denominator, depends on both the size of the numerator and the size of the denominator. However, the RSE of the estimated percentage or rate will generally be lower than the RSE of the estimate of the numerator. This means that differences in proportions may be significant while differences in estimates are not.


25 The selected tables in this publication that show the results of statistical significance testing are annotated to indicate whether or not the estimates which have been compared are statistically significantly different from each other with respect to the test statistic. In all other tables, which do not show the results of significance testing, users should take account of RSEs when comparing estimates.