4808.0 - Illicit Drug Use, Sources of Australian Data, 2001  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 28/11/2001   
7. Data quality issues

7.1 Introduction

The datasets, research papers and publications referred to in previous chapters are significant national sources of information on the prevalence of illicit drug use and its social and financial impact in the Australian community. However, those interested in these data need to be aware of a number of issues and caveats related to the interpretation of the data and any comparison among the various sources. Some of these issues originate in the diverse and complex nature of the topic, and some reflect the different definitions and classifications that are applied in obtaining the data.

When using data from any of the sources mentioned in this publication, it is imperative to know as much about the individual collection as possible. General information should be available on its purpose, the aims of those funding and/or conducting the collection, and the intended output from the collection. Details about the collection process need to be obtained, including the methodologies used to select respondents and collect data from them, as well as the data coding and processing systems. Answers to questions such as 'Who is included?' and 'How are they selected?' will indicate whether the information obtained from the dataset can be generalised to a broader population.

The final responsibility will fall on the analyst or researcher to make a judgement on whether a specific dataset is 'fit for purpose' in terms of its relevance, quality, etc. This decision needs to be informed by careful consideration of the types of issues discussed below.

7.2 Definitions

As there is no standard definition of the terms ‘illicit drugs’ or ‘drugs’, considerable variation exists in the coverage of relevant data sources and analyses. Additionally, substances can be grouped in different ways under headings such as 'stimulants' or 'hypnotics'. Consequently, data from one source or study may not be comparable with data from others, and users of data need to be aware of the detailed definitions of the terms used in each data source or study.

The ABS has provided some guidance on the relationship between various drugs by publishing, in 2000, the Australian Standard Classification of Drugs of Concern (Cat. no. 1248.0). This classification assists researchers and public policy planners by providing a consistent framework of drugs of concern. The classification categorises each substance according to its chemical structure, mechanism of action and physiological effects. It does not distinguish between drugs according to their legal status. Its indexes include terms commonly used by drug users, some proprietary or brand names, acronyms and chemical names.

Future use of the standard classification could help to standardise the collection of information about drug use or assist in defining the range of the drugs included in datasets. For example, data reports could reference the standard list of drugs and indicate which drugs are included/excluded from the analysis.

7.3 Data sources

The collections from which data on illicit drug use can be extracted fall into two main types: administrative collections and survey collections. There are fundamental differences between these two types, concerning the target population of the collection and the randomness of inclusion in the collection. These differences affect the potential uses of the data. Administrative collections provide data on a specific client group who are generally a non-random selection from a broader population, whereas survey collections can be random selections from any targeted population of interest (within operational constraints).

Availability of data from the respondent and the nature of the questions to be answered (or hypotheses to be tested) are major factors when choosing the type of collection to be used for analysis. Often, data concerning a specific issue or group needs to be compared with data from a broader target population, to indicate how widespread, or prevalent, specific characteristics and behaviours are in the broader population.

For example, consider each of the following questions.

  • What is the prevalence of illicit drug use among Australian males aged 15-34 years?
  • What proportion of illicit drug users have been involved in criminal activities to support their habit?
  • What is the rate of hepatitis C among injecting drug users?

Each relates to a different target population. None of these questions can be fully answered by extracting data from administrative sources, as such sources cannot provide figures for the broader target population. Theoretically, these questions could be answered using data from carefully designed and implemented random surveys of 15-34 year old males, illicit drug users and injecting drug users, respectively. There are, however, difficulties in gaining good estimates from such surveys, due to the issues raised in the discussion of surveys below.

7.4 Administrative collections

Many of the datasets covered by this publication originate from the administrative system of a government or private organisation. These datasets provide information about the clients of those organisations. The characteristics of those captured in these datasets may not be representative of any larger group. Generally, they will not be representative of the population as a whole and may not even be representative of the target group of the organisation.

The primary aim of these administrative systems is to support and facilitate the provision of a service. Provision of information on illicit drug use is only a by-product. Thus, data items may be limited and not tailored to the requirements of analysts and researchers interested in illicit drugs. Further, the by-product nature of administrative collections often means that data from the systems are of unknown quality. The agencies responsible for the collections often do not focus on the important definitional issues nor build data quality into the systems. Duplicates, missing items, etc. are possible. Hence, it is important to have a statement on the quality checks conducted on any administrative dataset.
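As an illustration of the kind of quality checks such a statement might report, the following minimal Python sketch counts duplicate records and missing items. The record layout, field names and values are hypothetical and are not taken from any collection described in this publication.

```python
from collections import Counter


def quality_summary(records, key_fields, required_fields):
    """Count duplicate records and missing items in a list of dict records."""
    # Records sharing the same values for all key fields are treated as duplicates.
    key_counts = Counter(tuple(r.get(f) for f in key_fields) for r in records)
    duplicates = sum(count - 1 for count in key_counts.values())
    # An item is treated as missing if it is absent, None or an empty string.
    missing = {
        field: sum(1 for r in records if r.get(field) in (None, ""))
        for field in required_fields
    }
    return {
        "records": len(records),
        "duplicate_records": duplicates,
        "missing_items": missing,
    }


# Hypothetical extract from an administrative system.
records = [
    {"client_id": "A1", "episode_date": "2001-03-01", "principal_drug": "heroin"},
    {"client_id": "A1", "episode_date": "2001-03-01", "principal_drug": "heroin"},
    {"client_id": "B2", "episode_date": "2001-03-04", "principal_drug": ""},
    {"client_id": "C3", "episode_date": "2001-03-09", "principal_drug": None},
]

summary = quality_summary(records, ["client_id", "episode_date"], ["principal_drug"])
print(summary)
```

In practice the choice of key fields is itself a definitional decision: whether two records are 'the same' depends on whether the system counts clients, episodes or contacts.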

Many administrative collections have been in place for a number of years and can show changes over time. Caution needs to be exercised, however, as trends evident in data from these collections may reflect changes in service provision and administrative procedures rather than trends in the use of illicit drugs. For example, the number of drug offences recorded in police statistics may be influenced by variables such as improvements in the effectiveness of law enforcement activities or changes in the penalty attached to an offence.

National administrative datasets are often aggregated from State and Territory administrative sources, as provision of services is primarily the responsibility of State and Territory governments. This presents some challenges when gathering national data, as each State and Territory collects data for its own purposes. Different systems for information storage and different definitions of scope and data items may be used. It requires substantial effort, with the assistance of the responsible State and Territory bodies, to coordinate and collate data from their independent administrative systems into a national data collection.

Even when a standardised set of data items is available from all jurisdictions, each administration may still put a slightly different emphasis or interpretation on the data required, so care needs to be exercised when interpreting and comparing data from different States and Territories. For example, testing procedures adopted to identify the use of drugs can vary according to State or Territory of jurisdiction.

In a number of areas, such as services for the treatment of alcohol and other drugs, considerable work has gone into the development of minimum data sets with mandatory reporting by all jurisdictions based on standard definitions and protocols.

7.5 Surveys

In general, the nature of data collected in surveys is quite different from administrative collections. Surveys allow analysts and researchers to tailor the data items to meet the objectives of their study and enable subjects to be explored in depth. Survey collections can be designed to gain data from the population group of interest. If a large random selection from within the target population is practical, the survey results can be expected to be broadly representative of the target population, subject to the sources of error discussed below.

Survey data are subject to many sources of error, which fall into two major types: sampling errors and non-sampling errors. Sampling errors occur because of the use of a sample rather than the complete enumeration of the population. The size of the sampling error is largely dependent on the size of the sample, as explained below, but it will also depend on the inherent variability of the population. Non-sampling errors can occur at any stage of a survey for reasons such as errors in response, recording or processing of the data, and can occur even if there is a complete enumeration of the population.

Although sampling and non-sampling errors occur to some extent in all surveys, they can be minimised by good survey practice. Each data source should have some discussion of these possible sources of errors and the steps taken to minimise their effects, so that any interpretation of results can be appropriate. Some survey features which impact on sampling and non-sampling errors are discussed below.

7.5.1 Sample selection

For any specific target population, consideration needs to be given to identifying an appropriate way of sampling from that population. Large national surveys of the general population, such as the National Survey of Mental Health and Wellbeing of Adults (SMHWB) conducted by the ABS, may adopt a multistage design based on drawing a sample of private dwellings. Surveys of school children may be based on a sample of students, drawn from a sample of schools. These designs would generally be based on a random sample at each stage.

An important attribute of any random sample is that the likelihood of any individual in the target population being included in the sample can be calculated - from knowledge about the number of dwellings, the number of schools, the number and characteristics of students in each school, etc. In this situation, data collected from each respondent can be used to represent the data which would be collected from similar people in the target population. Hence, the sample data can be used to generate estimates for a broader population. The reliability and validity of the estimates will depend on the representativeness of the sample and the quality of the data collected from the respondents.
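The link between a known selection probability and a population estimate can be sketched in Python as follows. This is a minimal illustration only: it assumes simple random sampling at each of two stages (schools, then students within schools), and all counts and responses are hypothetical rather than drawn from any survey described in this publication.

```python
from fractions import Fraction


def inclusion_probability(schools_sampled, schools_total,
                          students_sampled, students_enrolled):
    """Chance that a given student is selected under two-stage random sampling."""
    p_school = Fraction(schools_sampled, schools_total)
    p_student = Fraction(students_sampled, students_enrolled)
    return p_school * p_student


def weighted_estimate(respondents):
    """Estimate a population count: each respondent represents 1 / probability people."""
    return sum(1 / r["prob"] for r in respondents if r["reports_use"])


# Hypothetical design: 2 of 10 schools selected, then 5 students in each school.
p_small = inclusion_probability(2, 10, 5, 50)    # school with 50 students
p_large = inclusion_probability(2, 10, 5, 200)   # school with 200 students

# Hypothetical responses: 1 of 5 reports use in the small school, 2 of 5 in the large.
respondents = (
    [{"prob": p_small, "reports_use": i < 1} for i in range(5)]
    + [{"prob": p_large, "reports_use": i < 2} for i in range(5)]
)

weight_small = 1 / p_small   # students represented by each small-school respondent
weight_large = 1 / p_large   # students represented by each large-school respondent
estimated_users = weighted_estimate(respondents)
print(weight_small, weight_large, estimated_users)
```

A respondent from the larger school has a lower chance of selection and so stands in for more students. Without a calculable probability, as in non-random samples, no such weight can be assigned.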

Clearly, it is not realistic to expect that such survey designs would be feasible for all target populations, particularly those whose numbers are small in the general population or who are difficult to find because, for example, they are homeless. Studies of populations of illicit drug users often employ non-random methods to obtain a sample. Methods of recruitment have included peer referral, advertising in magazines and visiting known locations of illicit drug users. To ensure the sample contains a range of the known population under study, researchers may make use of information from previous work in the field to target the recruitment of respondents according to known characteristics such as age, sex or city of residence. Although these non-random surveys cannot be used to generate estimates for the broader population, they can shed light on the behaviours, environments, etc. of many illicit drug users.

7.5.2 Representativeness of the sample

Samples achieved in surveys may not represent the total target population because of the scope of the survey design. For example, surveys based on private dwellings, such as the SMHWB, exclude people who are homeless as well as those in non-private dwellings (e.g. hotels, guest houses) or institutional settings (e.g. hospitals, prisons, military establishments and university halls of residence). The exclusion of these groups can bias the results if the prevalence and patterns of their illicit drug use differ from those displayed by people residing in private dwellings.

A low response rate to a survey may also result in biased information. This is partly because there is no way of knowing whether those who refuse to participate in the survey have characteristics similar to those who do participate. It may also be the case that response rates vary across groups so that particular groups within the population may be under-represented in the sample, as is frequently the case with young adult males, for example. If their behaviours are different from other groups within the population, then the procedures to produce estimates for the broader population may only partly overcome the bias in the sample. Hence, if the response rate is low, the survey results can reflect only the characteristics of the respondents and should not be used to draw inferences about the whole population.

Biases are also known as systematic errors as they produce survey results unrepresentative of the target population by systematically distorting the survey estimates. The most common sources of bias are survey non-response and samples which are not representative of the population of interest. The magnitude of the bias will depend on the extent to which those under-represented in the sample differ from those included in the sample.
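The dependence of bias on both the level of non-response and the difference between respondents and non-respondents can be shown with a short Python sketch. The response rates and prevalence figures are hypothetical, and in a real survey the prevalence among non-respondents is unknown, which is why the bias cannot usually be measured directly.

```python
def nonresponse_bias(response_rate, prevalence_respondents, prevalence_nonrespondents):
    """Bias of a respondent-only prevalence estimate relative to the true prevalence."""
    true_prevalence = (
        response_rate * prevalence_respondents
        + (1 - response_rate) * prevalence_nonrespondents
    )
    return prevalence_respondents - true_prevalence


# Hypothetical case 1: non-respondents use drugs at twice the rate of respondents.
bias_low_response = nonresponse_bias(0.5, 0.10, 0.20)
bias_high_response = nonresponse_bias(0.9, 0.10, 0.20)

# Hypothetical case 2: non-respondents behave like respondents, so there is no bias.
bias_no_difference = nonresponse_bias(0.5, 0.10, 0.10)

print(bias_low_response, bias_high_response, bias_no_difference)
```

With these figures the respondent-only estimate understates prevalence by about 5 percentage points at a 50% response rate and by about 1 percentage point at a 90% response rate. A low response rate on its own produces no bias when the two groups do not differ.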

7.5.3 Sample size

Sample size depends on a range of factors, including the objective of the survey, the funds available, and the response rate to be achieved. Whatever the sample size, the sample results would not be identical to results obtained from the whole of the population. One frequently used measure of the difference which results from the use of a sample rather than the complete enumeration of the population is the sampling error. A low sampling error means we can expect the sample results to be close to the population results. In general, the larger the size of the sample, the lower the sampling error. Conversely, the smaller the sample size, the larger the sampling error and the less reliable are any estimates based on the sample.
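The relationship between sample size and sampling error can be sketched in Python using the standard error of an estimated proportion. This is a simplified illustration that assumes simple random sampling from a large population. It ignores the design effects of multistage samples, and the prevalence of 10% is a hypothetical figure.

```python
import math


def standard_error(proportion, sample_size):
    """Standard error of a proportion estimated from a simple random sample."""
    return math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical prevalence of 10%, estimated from samples of increasing size.
for n in (100, 400, 1600):
    se = standard_error(0.10, n)
    print(f"sample size {n:>5}: standard error {se:.4f}")
```

Under these assumptions the standard error falls with the square root of the sample size, so the sample must be quadrupled to halve the sampling error.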

This limitation is not only applicable to small sample surveys; it may also limit the information available from large surveys when data on small sub-groups are extracted for analysis. If the sub-group of interest is only a small proportion of the total population, randomly selected samples of the total population can be expected to include only a small number of the target group, and thus yield little reliable data concerning this target group. For example, results from national household surveys may provide little detail about small specific sub-populations, such as injecting drug users, particular ethnic groups and regional populations. Other studies which target just these specific population groups can be more valuable in providing detailed information and in monitoring trends within these groups.
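A short Python sketch shows how quickly reliability falls when a small sub-group is extracted from a large survey. The figures are hypothetical (a sample of 10,000 people, a sub-group making up 1% of the population, and a characteristic held by 20% of people), and simple random sampling is assumed.

```python
import math


def expected_subgroup_size(sample_size, subgroup_share):
    """Expected number of sub-group members in a random sample of the population."""
    return round(sample_size * subgroup_share)


def relative_standard_error(proportion, sample_size):
    """Standard error of an estimated proportion, expressed relative to the proportion."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return standard_error / proportion


total_sample = 10_000
subgroup_sample = expected_subgroup_size(total_sample, 0.01)

# Relative standard error of a 20% characteristic, whole sample versus sub-group.
rse_whole_sample = relative_standard_error(0.20, total_sample)
rse_subgroup = relative_standard_error(0.20, subgroup_sample)

print(subgroup_sample, rse_whole_sample, rse_subgroup)
```

With these figures the survey can be expected to include only about 100 members of the sub-group, and the relative standard error of the estimate rises from about 2% for the whole sample to about 20% for the sub-group.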

7.5.4 Self-reported responses

In general, surveys are not objective in the sense that they rely on self-reporting of situations, behaviours and attitudes. The results of a personal interview or a written questionnaire may be affected by the respondent’s ability to recall events accurately. Results may also be influenced by the respondent’s willingness to discuss illegal activities openly. Respondents may have concerns that they will be reported to the authorities, or that their answers will become known to other family/household members. Although survey procedures can help address such issues, it is likely that a socially unacceptable behaviour such as illicit drug use would be under-reported in surveys of the general population. In some cases, however, as with younger persons, involvement in such non-conventional behaviour may be over-reported.

Response errors, such as inaccurate reporting by respondents, can be reduced by good questionnaire design. For example, thorough testing is necessary to ensure correct interpretation of the wording of questions. Conversely, the likelihood of response errors will increase with inadequacies in the questionnaire, imprecise application of survey procedures, incorrect recording of answers, errors in data entry and processing, etc.
