6361.0.55.002 - Employment Arrangements, Retirement and Superannuation, User Guide, Australia, April To July 2007
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 18/11/2008
DATA QUALITY

Although care has been taken to ensure that the results of this survey are as accurate as possible, there are certain factors which affect the reliability of the results to some extent, and for which no adequate adjustments can be made. These are known as sampling error and non-sampling error. These factors, which are discussed below, should be kept in mind when interpreting the results of the survey.

SAMPLING ERROR

Sampling error is the difference between the published estimates derived from a sample of persons and the value that would have been produced if all persons in scope of the survey had been enumerated. The estimates in this survey are obtained from the occupants of samples of dwellings. Therefore, the estimates are subject to sampling variability and may differ from the figures that would have been produced if information had been collected for all dwellings.

Measures of sampling error

One measure of the likely difference is given by the standard error (SE), which indicates the extent to which an estimate might have varied because only a sample of dwellings was included. There are about two chances in three that the sample estimate will differ by less than one SE from the figure that would have been obtained if all dwellings had been included, and about 19 chances in 20 that the difference will be less than two SEs.

Another measure of the likely difference is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate to which it relates. The RSE is a useful measure in that it provides an immediate indication of the percentage errors likely to have occurred due to sampling, and thus avoids the need to refer also to the size of the estimate.
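As an illustration of the two measures described above, the following minimal Python sketch computes an RSE from an estimate and its SE, and the range within two SEs of the estimate. The function names and the figures used are hypothetical and are not taken from SEARS 2007 output.

```python
def relative_standard_error(estimate, standard_error):
    """Express the SE as a percentage of the estimate it relates to."""
    if estimate == 0:
        raise ValueError("RSE is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)


def range_within_n_ses(estimate, standard_error, n_ses=2):
    """Return the range within n_ses standard errors of the estimate.

    With n_ses=2 this is the range that has about 19 chances in 20
    of containing the figure a full enumeration would have produced.
    """
    margin = n_ses * standard_error
    return (estimate - margin, estimate + margin)


# Hypothetical figures for illustration only.
estimate = 200_000
se = 10_000

rse = relative_standard_error(estimate, se)   # 5.0 (per cent)
low, high = range_within_n_ses(estimate, se)  # (180000, 220000)
```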

For estimates of population sizes, the size of the SE generally increases with the level of the estimate, so that the larger the estimate, the larger the SE. However, the larger the sample estimate, the smaller the SE in percentage terms (RSE). Thus, larger sample estimates will be relatively more reliable than smaller estimates.

Only estimates with RSEs of 25% or less are considered reliable for most purposes. However, estimates with RSEs greater than 25% are also included in all published 2007 SEARS output. Estimates with RSEs greater than 25% but less than or equal to 50% are annotated by an asterisk to indicate they are subject to high SEs and should be used with caution. Estimates with RSEs of greater than 50%, annotated by a double asterisk, are considered too unreliable for general use and should only be used to aggregate with other estimates to provide derived estimates with RSEs of 25% or less.
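The annotation thresholds described above can be sketched as a small Python function. The function name is hypothetical; the cut-offs (25% and 50%) are those stated in the text.

```python
def rse_annotation(rse_percent):
    """Return the annotation applied to an estimate with the given RSE (%).

    ""   : RSE of 25% or less (considered reliable for most purposes)
    "*"  : RSE greater than 25% and up to 50% (use with caution)
    "**" : RSE greater than 50% (too unreliable for general use)
    """
    if rse_percent > 50:
        return "**"
    if rse_percent > 25:
        return "*"
    return ""


# Hypothetical RSE values for illustration only.
for rse in (12.3, 25.0, 31.8, 50.0, 64.2):
    print(f"RSE {rse:5.1f}% -> '{rse_annotation(rse)}'")
```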

Relative standard errors for estimates from SEARS 2007 are published for the first time in 'direct' form. Previously a statistical model was produced that related the size of estimates to their corresponding RSEs, and this information was displayed via a standard error table. For SEARS 2007, RSEs for estimates were calculated for each separate estimate and published individually. The Jackknife method of variance estimation was used for this process, which involved the calculation of 60 'replicate' estimates based on 60 different subsamples of the original sample. The variability of estimates obtained from these subsamples was used to estimate the sample variability surrounding the main estimate. Unlike the previous method, direct calculation of RSEs can result in larger estimates having larger RSEs than smaller ones, since these larger estimates may have more inherent variability.
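The replicate approach can be sketched as follows. This guide does not give the variance formula itself, so the sketch assumes the common delete-a-group Jackknife form, in which the variance is (G - 1)/G times the sum of squared differences between each of the G replicate estimates and the main estimate. The function name and the replicate values are hypothetical.

```python
import math


def jackknife_se_and_rse(main_estimate, replicate_estimates):
    """Estimate the SE and RSE (%) of an estimate from replicate estimates.

    Assumes the delete-a-group Jackknife formula:
        variance = (G - 1) / G * sum((replicate - main) ** 2)
    where G is the number of replicate estimates (60 in SEARS 2007).
    """
    g = len(replicate_estimates)
    if g < 2:
        raise ValueError("at least two replicate estimates are required")
    sum_sq = sum((r - main_estimate) ** 2 for r in replicate_estimates)
    se = math.sqrt((g - 1) / g * sum_sq)
    return se, 100.0 * se / main_estimate


# Hypothetical example: 60 replicates scattered around a main estimate of 1,000.
main = 1000.0
replicates = [990.0, 1010.0] * 30

se, rse = jackknife_se_and_rse(main, replicates)
print(f"SE = {se:.1f}, RSE = {rse:.1f}%")
```

Because each RSE is calculated directly from the replicates for that estimate, nothing in this calculation forces a larger estimate to have a smaller RSE.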

Standard errors of proportions and percentages

Proportions and percentages, which are formed from the ratio of two estimates, are also subject to sampling errors. The size of the error depends on the accuracy of both the estimates. For proportions where the denominator is an estimate of the number of households in a grouping, and the numerator is the number of households in a sub-group of the denominator group, the formula for the RSE is given by:

$$\mathrm{RSE}\left(\frac{x}{y}\right) = \sqrt{[\mathrm{RSE}(x)]^2 - [\mathrm{RSE}(y)]^2}$$

This formula is only valid when x is a sub-group of y.

The SE of an estimated percentage or rate, computed by using sample data for both numerator and denominator, depends on the size of both numerator and denominator. However, the formula above shows that the RSE of the estimated percentage or rate will generally be lower than the RSE of the estimate of the numerator.
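A minimal Python sketch of this calculation is given below. It assumes the formula commonly used in ABS household survey publications, RSE(x/y) = sqrt(RSE(x)^2 - RSE(y)^2), where x is a sub-group of y. The function name and the RSE values are hypothetical.

```python
import math


def rse_of_proportion(rse_numerator, rse_denominator):
    """Approximate RSE (%) of the proportion x/y, where x is a sub-group of y.

    Assumes RSE(x/y) = sqrt(RSE(x)^2 - RSE(y)^2).
    """
    diff = rse_numerator ** 2 - rse_denominator ** 2
    if diff < 0:
        # With directly calculated RSEs, the denominator can occasionally
        # have the larger RSE, in which case the approximation breaks down.
        raise ValueError("formula not applicable: RSE(y) exceeds RSE(x)")
    return math.sqrt(diff)


# Hypothetical values: numerator RSE of 5%, denominator RSE of 3%.
rse_p = rse_of_proportion(5.0, 3.0)  # 4.0, lower than the numerator's RSE
```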

Standard errors of differences

The difference between two survey estimates (of numbers or percentages) is itself an estimate and is therefore subject to sampling variability. The SE of the difference between two survey estimates depends on their SEs and the relationship (correlation) between them. An approximate SE of the difference between two estimates (x-y) can be calculated using the formula:

$$\mathrm{SE}(x-y) = \sqrt{[\mathrm{SE}(x)]^2 + [\mathrm{SE}(y)]^2}$$

While this formula will only be exact for differences between separate and uncorrelated (unrelated) characteristics or sub-populations, it is expected to provide a good approximation for all of the differences likely to be of interest in this survey.
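As a sketch, the approximation can be computed as below, assuming the usual form SE(x-y) = sqrt(SE(x)^2 + SE(y)^2), which treats the two estimates as uncorrelated. The function name and values are hypothetical.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate SE of (x - y), assuming x and y are uncorrelated."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


# Hypothetical SEs for two estimates.
se_diff = se_of_difference(3.0, 4.0)  # 5.0
```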

Testing for statistically significant differences

Statistical significance testing can be undertaken to determine whether it is likely that there is a difference between two estimates from different samples. The standard error for the difference between two estimates can be calculated using the formula in the paragraph above. The standard error is used to calculate the following test statistic:

$$\frac{|x-y|}{\mathrm{SE}(x-y)}$$

If the value of the test statistic is greater than 1.96, then there is evidence, at the 95% level of confidence, of a statistically significant difference between the two populations with respect to that characteristic. Otherwise, it cannot be stated with confidence that there is a real difference between the populations.
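The test can be sketched in Python as follows. The sketch assumes the test statistic is the absolute difference between the two estimates divided by the SE of the difference, with the SE of the difference approximated as sqrt(SE(x)^2 + SE(y)^2). Function names and figures are hypothetical.

```python
import math


def difference_test_statistic(x, se_x, y, se_y):
    """Return |x - y| / SE(x - y), treating x and y as uncorrelated."""
    se_diff = math.sqrt(se_x ** 2 + se_y ** 2)
    if se_diff == 0:
        raise ValueError("both standard errors are zero")
    return abs(x - y) / se_diff


def is_significant(x, se_x, y, se_y, critical_value=1.96):
    """True if the difference is significant at the 95% level."""
    return difference_test_statistic(x, se_x, y, se_y) > critical_value


# Hypothetical estimates (in thousands) and their SEs.
stat = difference_test_statistic(120.0, 6.0, 100.0, 8.0)  # 20 / 10 = 2.0
significant = is_significant(120.0, 6.0, 100.0, 8.0)      # True, since 2.0 > 1.96
```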

NON-SAMPLING ERROR

Lack of precision due to sampling variability should not be confused with inaccuracies that may occur for other reasons, such as errors in response and recording. Inaccuracies of this type are referred to as non-sampling error. This type of error is not specific to sample surveys and can occur in a census enumeration. The major sources are:

• errors related to scope and coverage;
• response errors such as incorrect interpretations or wording of questions;
• interviewer bias;
• non-response bias; and
• processing errors.

These sources of error are discussed in turn below.

Errors related to scope and coverage

Some dwellings may have been inadvertently included or excluded because, for example, the distinctions between whether they were private or non-private dwellings may have been unclear. All efforts were made to overcome such situations by constant updating of lists both before and during the survey. Furthermore, some persons may have been inadvertently included or excluded because of difficulties in applying the scope rules concerning the identification of usual residents, and the treatment of some overseas visitors.

Response errors

Response errors may have arisen from three main sources:
• deficiencies in questionnaire design and methodology;
• deficiencies in interviewing technique; and
• inaccurate reporting by respondents.

Errors may be caused by ambiguous or misleading questions, inadequate or inconsistent definitions of terminology used, poor questionnaire design (e.g. causing some questions to be missed), or poor or inaccurate responses from superannuation funds requested to supply data. Thorough testing occurred before the questionnaire format was finalised to minimise problems in questionnaire content, design and layout.

Response errors may also have occurred due to the lengthy nature of the survey, resulting in interviewer and/or respondent fatigue (i.e. loss of concentration). While efforts were made to minimise errors arising from deliberate misreporting or non-reporting by respondents (including emphasising the importance of the data and checking consistency within the survey instrument), some instances will have inevitably occurred.

Recall error may also have led to response error. Information recorded in this survey is essentially 'as reported' by respondents, and hence may differ from information available from other sources or collected using different methodologies. Responses may be affected by imperfect recall or individual interpretation of survey questions. Reference periods used in relation to each topic were selected to suit the nature of the information being sought; in particular to strike the right balance between minimising recall errors and ensuring the period was meaningful, representative (from both respondent and data use perspectives) and able to yield sufficient observations in the survey to support reliable estimates. It is possible that the reference periods did not suit every person for every topic, and that difficulty with recall may have led to inaccurate reporting in some instances.

Lack of uniformity in interviewing also results in non-sampling error. Thorough training programs, a standard Interviewer's Manual, the use of experienced interviewers and checking of interviewers' work were methods employed to achieve and maintain uniform interviewing practices and a high level of accuracy in recording answers on the survey questionnaire. A respondent's perception of the personal characteristics of the interviewer can be a source of error. The age, sex, appearance or manner of the interviewer may influence the answers obtained. In addition to the response errors described above, inaccurate reporting may occur if respondents provide deliberately incorrect responses.

Non-response bias

One of the main sources of non-sampling error is non-response, that is, when persons resident in households selected in the survey cannot be contacted, or, if contacted, are unable or unwilling to participate. Non-response can affect the reliability of results and can introduce bias. The magnitude of any bias depends upon the level of non-response and the extent of the difference between the characteristics of those people who responded to the survey and those who did not.

As it would not have been possible to quantify accurately the nature and extent of the differences between respondents and non-respondents in the survey, every effort was made to reduce the level of non-response.

For further information about the effect of non-response bias on SEARS 2007 superannuation data, see the entry for Superannuation under the 'Data quality' section in this chapter.

Processing errors

Processing errors may occur at any stage between initial collection of the data and final compilation of statistics. There are four stages where error may occur:
• coding, where errors may have occurred during the coding of various items by office processors;
• data transfer, where errors may have occurred during the transfer of data from the questionnaires to the data file;
• editing, where computer editing programs may have failed to detect errors which reasonably could have been corrected; and
• manipulation of data, where inappropriate edit checks, inaccurate weights in the estimation procedure and incorrect derivation of new items from raw survey data can also introduce errors into the results.

To minimise the likelihood of errors occurring, a number of quality assurance processes were employed throughout all stages of survey development and processing.

Seasonal effects

The estimates from the 2007 SEARS are based on information collected from April to July 2007, and due to seasonal effects they may not be fully representative of other time periods in the year. For example, SEARS 2007 asked standard ABS questions on labour force status to determine whether a person was employed. Employment is subject to seasonal variation throughout the year. Therefore, SEARS 2007 results for employment could have differed if the survey had been conducted over the whole year or in a different part of the year.

For further information about seasonal effects on superannuation data, see Appendix 2 in Employment Arrangements, Retirement and Superannuation, 2007 (cat. no. 6361.0).

LIMITATIONS ON DATA ITEMS

Social marital status

The distinction between visitors and usual residents is used to ensure that partnerships are identified only between persons usually resident in the same household. Due to the scope exclusions identified in Chapter 2: 'Survey Methodology', the standard variable used in SEARS 2007, 'Social marital status', identifies the living arrangements of couples in the Australian population only. A 'social marriage' is deemed to exist when a registered marriage, de facto marriage or couple relationship (either opposite-sex or same-sex) is reported in response to a question about relationships within the household, and when the two individuals concerned are usually resident in the same household. This may result in inconsistencies when social marital status is used in combination with some items output in SEARS 2007. For example, a person may report they intend to live off their spouse's income at retirement but may have a social marital status of 'Not married'. This is considered valid as the respondent may be in a registered marriage but the spouse may usually reside elsewhere.

Socio-Economic Indexes for Areas (SEIFA)

There are five Socio-Economic Indexes for Areas (SEIFAs) compiled by the ABS following each population census. Each of the indexes summarises different aspects of the socio-economic status of the people living in those areas. The index refers to the population of the area (the Census Collector's District) in which a person lives, not to the socio-economic situation of the particular individual. The index used in this publication was compiled following the 2001 Census. For further information about the SEIFAs see Information Paper: Census of Population and Housing - Socio-Economic Indexes for Areas, Australia (cat. no. 2039.0).

Reference periods

Different reference periods were used for collecting various components of SEARS 2007 data to correspond with information that would be readily available to respondents of the survey. As in the Labour Force Survey, labour force status is determined on the basis of activity in the reference week, that is, the week prior to the interview. Details of employment arrangements were generally collected on a 'usual working arrangements' basis. This differs from SEAS 2000 which only collected details of working arrangements based on work undertaken in the last 4 weeks.

Income data were collected using the last financial year as the reference period for business and property income, and the last pay period for wages and salaries and other sources of private income. Reported income amounts were recalculated to a weekly amount.

The preferred reference period for collection of superannuation data was the 2005-2006 financial year. However, where information was not available for this period, information was accepted for other periods, provided they commenced no earlier than 1 July 2004. In a small number of cases, information up to August 2007 was also used. Superannuation contribution amounts were converted to a weekly contribution amount.

The different reference periods for different topics in the survey can lead to apparent inconsistencies in the estimates. For example, a person may be currently working for an employer but also report some business income that relates to an unincorporated business that they were operating in the previous financial year. Similarly a person may be unemployed but report employer contributions to superannuation that were made in the 2005-2006 financial year when they were employed. The data as reported are assumed to be correct.

Jobs data

SEARS 2007 collected detailed information for a person's main job and second job (where applicable). A reduced set of information was also collected for a person's third and fourth jobs. While much of the jobs data is collected in relation to individual jobs, information about working patterns and preferred working patterns is based on the overall commitment to work, that is, for all jobs in which a person works, rather than each job singly. For example, respondents were asked about leave entitlements in relation to each specific job, but were asked whether they usually do any work between 7pm and 7am, or on weekends, in relation to all jobs. For the majority of the population, who have only one job, this approach makes little if any difference. However, it does provide a more comprehensive picture of the total current employment commitments of multiple jobholders.

Care

SEARS 2007 collected information about the caring responsibilities that people have as an indication of their burden of care and the working arrangements they use, or would like to use, to help them manage these caring responsibilities. Questions relating to care of own children were asked of the first person in the family only, while everyone in the household was asked about any other caring responsibilities they had, either within or outside the household. While some information was collected on the characteristics of persons receiving the care, the focus of SEARS 2007 was on the care providers. More detailed information about the use and demand for childcare is available from Child Care Australia, June 2005 (cat. no. 4402.0) and information about disabled or aged persons and their carers is available from Disability, Ageing and Carers, Australia, 2003 (cat. no. 4430.0).

Inadequate information was collected for a small proportion of households (<1%) in relation to the children that lived in the household, which affected questions about caring arrangements and working arrangements used to care. Persons from these households are shown in a 'not determined' category for applicable data items.

Retirement

SEARS 2007 collected information about the plans that people aged 45 years and over have for retirement, including transitions to retirement, expected sources of income at retirement, reasons for retiring and retirement income. Inadequate information was collected to determine the retirement status (whether retired or not retired from the labour force) of a small number of people aged 45 years and over. Inadequate information was also collected regarding the retirement plans of a small number of people currently working part-time, and a number of people who did not know whether they were going to work part-time as a transition to retirement, but did intend to retire. These people are shown in a 'not determined' category for applicable data items.

Income

SEARS 2007 uses both gross personal income and equivalised gross household income. People's economic well-being is largely determined by their command over economic resources, and the amount of income to which they have access is an important component of these resources. While income is usually received by individuals, it is normally shared between family members. Even when there is no transfer of income between members of a household, they are still likely to benefit from the economies of scale that arise from the sharing of dwellings. Household income therefore provides an indication of people's economic well-being. However, larger households need greater income to achieve the same standard of living as smaller households, so to make meaningful comparisons, household income is adjusted or equivalised to take account of differing household size and composition.

Equivalised gross household income data are presented in this publication in quintiles. The quintiles are groupings that result from ranking all persons in the population in ascending order according to their equivalised gross household income, then dividing the population into five equal groups, each comprising 20% of the estimated population. The population used for this purpose includes all people living in private dwellings, including those under the age of 15 years. As the scope of this publication is restricted to only those persons aged 15 years and over, the distribution of this smaller population across the quintiles is not necessarily the same as it is for persons of all ages, i.e. the percentage of persons aged 15 years and over in each of these quintiles may be larger or smaller than 20%.

Superannuation

In SEARS 2007 a greater coverage of weekly contribution values was achieved by converting more of the reported contributions to a weekly equivalent. Total reported contribution amounts were converted to a weekly rate by dividing the total contribution amount by the contribution period in weeks. The contribution period was determined based on the length of time a respondent had been contributing to an account, and the period covered by their superannuation statement. If a respondent had been contributing to their superannuation account for less than 12 months, the statement length was the contribution period. If the statement was for a full year, the reported contributions were divided by 52.145 to obtain a weekly contribution rate. If the statement covered a monthly period, then the reported contributions were divided by the number of months multiplied by 4.345 to obtain the weekly contribution rate. Otherwise, the statement period was converted into weeks by subtracting the start date of the statement from the end date to get the number of days, then dividing the number of days by 7 and taking the integer part.
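The conversion rules above can be sketched in Python as follows. The constants 52.145 and 4.345 are taken from the text; the function name, its parameters, and the way a statement is flagged as covering a full year or a whole number of months are hypothetical, as the guide does not describe how those cases were identified in processing.

```python
from datetime import date

WEEKS_PER_YEAR = 52.145   # divisor for full-year statements (from the text)
WEEKS_PER_MONTH = 4.345   # weeks per month for monthly statements (from the text)


def weekly_contribution(total, *, full_year=False, months=None,
                        start=None, end=None):
    """Convert a total reported contribution amount to a weekly rate."""
    if full_year:
        weeks = WEEKS_PER_YEAR
    elif months is not None:
        weeks = months * WEEKS_PER_MONTH
    elif start is not None and end is not None:
        # Number of days in the statement period, converted to whole weeks.
        weeks = (end - start).days // 7
    else:
        raise ValueError("no contribution period supplied")
    if weeks <= 0:
        raise ValueError("contribution period must be at least one week")
    return total / weeks


# Hypothetical statements for illustration only.
yearly = weekly_contribution(5214.50, full_year=True)   # roughly 100 per week
quarterly = weekly_contribution(1303.50, months=3)      # roughly 100 per week
dated = weekly_contribution(1000.00, start=date(2006, 1, 1),
                            end=date(2006, 3, 15))      # 73 days -> 10 weeks
```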

Respondents were asked to report the contribution period for contributions estimated without reference to a statement. If contribution periods were not reported, amounts were averaged over an assumed twelve-month period. One-off payments were also averaged over twelve months.

Some respondents reported details that yielded either very high or very low weekly rates of contribution. A number of factors may cause high contribution rates, such as very large irregular or one-off contributions being made to the superannuation account, or a roll-over amount from a previous fund being reported as a contribution. Fund member responses to changes in the superannuation legislation may also have impacted on the size and volume of contributions made during the SEARS 2007 reference period (see Superannuation legislation in Appendix 2 of Employment Arrangements, Retirement and Superannuation, 2007 (cat. no. 6361.0)). Small irregular or one-off payments can also result in very low weekly contribution amounts.

Missing values for superannuation data and their effects on published medians and means

When a contribution or account balance was not able to be determined for a respondent, the value of that contribution or balance was recorded as missing. In estimation classes with significant percentages of missing values, the risk of biased estimates is increased. This is particularly true when the characteristics of the respondents with missing values and those with known values are significantly different. For example, analysis shows that there is a risk of underestimation in the median and mean values of superannuation savings in the 15-24 year old age group. This implies that the uncertainty in superannuation estimates for this age group may be greater than is indicated by the published RSEs. The following table summarises the overall 'missingness' in each of the superannuation variables.

Analysis of respondents with missing values

                                                              Missing values    Missing values for
                                                                                 15-24 year olds
                                                                           %                     %
Accumulation accounts                                                   11.7                  17.8
Withdrawal/resignation benefit of defined benefit accounts               5.0                   3.7
Withdrawal/resignation benefit of hybrid accounts                        9.1                  14.0
Total superannuation balance                                            11.6                  17.4

More information about the collection and quality of superannuation data in this survey is included in Appendix 2 of Employment Arrangements, Retirement and Superannuation, Australia, April to July 2007 (cat. no. 6361.0).