4720.0 - National Aboriginal and Torres Strait Islander Social Survey: Users' Guide, 2008
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 26/02/2010
This chapter provides information about the release of microdata from the 2008 National Aboriginal and Torres Strait Islander Social Survey (NATSISS).
The 2008 NATSISS data will be available via three Expanded Confidentialised Unit Record Files (CURFs):
The Expanded CURFs are released with the approval of the Australian Statistician and will only be available through the Australian Bureau of Statistics (ABS) Remote Access Data Laboratory (RADL). No Basic CURF (on CD-ROM or RADL) is being released.
The RADL is an on-line database query system that supports access to the CURFs. Microdata are held on a server at the ABS; users submit programs to interrogate and analyse the data, and then access the results. More information about accessing the CURFs and using the RADL is available from the About CURF Microdata page on the ABS website.
This chapter contains important directions for use of the 2008 NATSISS CURFs. These must be adhered to in order to maintain the confidentiality of survey respondents, as agreed to in the Deed of Undertaking. The chapter also provides important information to consider when specifying output from the CURF files and to aid interpretation of the content. Where relevant, references to information contained in other chapters of the Users' Guide have been included.
USING THE CURF MICRODATA
This segment contains the following topics:
About the CURF microdata
The survey microdata are being released under the provisions of the Census and Statistics Act 1905. This Act allows for the release of data in the form of unit records where the information is not likely to enable the identification of a particular person or organisation. Accordingly, there are no names or addresses of survey respondents on the CURFs, and other steps have been taken to protect the confidentiality of respondents. This includes:
To further assist in protecting the confidentiality of unit record data, all dollar values have been perturbed. That is, each value has been adjusted up or down by a small, random amount. Also, for each of these items, each value above or below a certain cut-off limit has been set to the mean of all the values above or below the cut-off limit.
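The cut-off step described above can be illustrated with a minimal Python sketch. It assumes one lower and one upper cut-off per item, with each tail replaced by the mean of the values in that tail. The cut-offs (200 and 2000), the sample values and the function name `tail_code` are hypothetical, and the random perturbation step is not shown because its details are not published.

```python
from statistics import mean

def tail_code(values, lower_cutoff, upper_cutoff):
    """Set values beyond each cut-off to the mean of the values beyond it.

    Hypothetical helper: the real cut-offs are not published.
    """
    low = [v for v in values if v < lower_cutoff]
    high = [v for v in values if v > upper_cutoff]
    low_mean = mean(low) if low else None
    high_mean = mean(high) if high else None
    coded = []
    for v in values:
        if v < lower_cutoff:
            coded.append(low_mean)
        elif v > upper_cutoff:
            coded.append(high_mean)
        else:
            coded.append(v)
    return coded

# Hypothetical weekly dollar amounts and cut-offs, for illustration only.
weekly_income = [50, 150, 400, 650, 900, 2500, 3500]
coded_income = tail_code(weekly_income, lower_cutoff=200, upper_cutoff=2000)
print(coded_income)
```

Because each tail is replaced by its own mean, the total across all records is unchanged, although individual extreme values can no longer be recovered.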
Steps to confidentialise the data available on the CURFs are taken in such a way as to maximise the usefulness of the content while maintaining the confidentiality of respondents to ABS statistical collections. As a result, it may not be possible to exactly reconcile all the statistics produced from the CURFs with published statistics.
Intending purchasers should ensure that the data they require, at the level of detail they require, are available on the CURF/s. Data collected in the survey but not contained on the Expanded CURFs may be available on request in tabulated form. See the Survey products chapter for more information on specialised data requests.
Data from the 2008 NATSISS will be available via three Expanded CURFs:
The primary differences between these CURFs are the:
Each CURF contains five levels from the survey:
For further information see CURF file structure below.
State/Territory CURF
This CURF contains a data item (STATEC) which identifies each state and territory separately, except Tasmania and the ACT. Due to confidentiality considerations, the samples from Tasmania and the ACT have been combined into a single category of Tas/ACT. This data item is located on the household level file in the State/Territory CURF.
State/Territory by ASGC Remoteness Structure CURF
This CURF contains a broad National Remoteness data item (ARIAC) and a special data item (REMSTAT), consisting of 13 output categories which comprise selected cross-classifications of state/territory by remoteness, where sample size permits. Output categories can be found in the data item list, which has been released with this publication. These two data items are available on the household level file in the State/Territory by ASGC Remoteness Structure CURF.
Detailed State/Territory by ASGC Remoteness Structure CURF
This CURF contains a broad National Remoteness data item (ARIAC) and a special data item (REMSTATE), consisting of 17 output categories which comprise selected cross-classifications of state/territory by remoteness, where sample size permits. Output categories can be found in the data item list, which has been released with this publication. These two data items are available on the household level file in the Detailed State/Territory by ASGC Remoteness Structure CURF.
A detailed list of the data items and categories for each CURF has been released on the ABS website in spreadsheet format. The majority of data items on the CURFs are the same as collected and output on the main survey file. In a small number of instances, it was necessary to remove items or to reduce the level of information available on the CURFs for confidentiality reasons. Where information on the CURFs differs from the main survey file, or between CURFs, this can be determined by comparing response categories on the data item list or through the notation, 'Not on CURF'. Refer to the data item list for more information.
Alcohol consumption risk level
Error in the derivation of alcohol consumption risk levels for people drinking at 'low risk' and 'risky' levels.
Risk of harm from alcohol consumption in the long term (chronic risk)
In the 2002 and 2008 National Aboriginal and Torres Strait Islander Social Surveys (NATSISS), the 2001 National Health and Medical Research Council (NHMRC) Guidelines were incorrectly applied, such that males who reported four standard drinks (50 mL of alcohol) and females who reported two standard drinks (25 mL of alcohol) were categorised as drinking at 'risky' rather than 'low risk' levels. Males reporting seven or more standard drinks and females reporting five or more standard drinks were correctly categorised as being at 'high risk' of harm.
Risk of harm from alcohol consumption in the short term (acute risk)
In the 2002 and 2008 NATSISS, the 2001 NHMRC Guidelines were incorrectly applied, such that males who reported six standard drinks (75 mL of alcohol) and females who reported four standard drinks (50 mL of alcohol) were categorised as drinking at 'risky' rather than 'low risk' levels. Males reporting eleven or more standard drinks and females reporting seven or more standard drinks were correctly categorised as being at 'high risk' of harm.
An Information Paper outlining the nature and extent of the error and including revised data has been published (cat. no. 4714.0.55.005). In addition to the revised data in the Information Paper, the ABS has also provided a set of data cubes with more detailed information.
Users of the NATSISS 2008 CURF are advised not to use the following items to calculate alcohol consumption risk levels based on 2001 NHMRC Guidelines:
Instead, the data items shown in the table below should be used to calculate alcohol consumption risk levels (based on 2001 NHMRC Guidelines).
CURF file structure
The 2008 NATSISS CURFs contain separate files, arranged in a hierarchy, made up of the following levels:
1. Household: contains household descriptors (eg size, structure), household income and geographic items, including a SEIFA index.
3. Selected person: contains information about each survey respondent, including demographic and socioeconomic characteristics, and the full range of health items obtained in the survey, other than those contained in levels 4 to 6.
4. Barriers: contains details of types of services reported as having access problems (BARNUM) and the associated access problem for each service (FCBQ3).
5. Culture: contains details of the types of cultural activities participated in (CULNUM) and the reasons for participation for each activity (CULPQ4).
6. Discrimination: contains details of the situations in which discrimination had been felt (DISNUM) and the frequency of discrimination for each situation (DISCQ12).
Relationships between the levels
Levels 1 and 3 are in a hierarchical relationship: a household comprises one to four selected persons.
Levels 4 to 6 are in a hierarchical relationship with level 3 (selected person). These levels exist to describe 'one to many' relationships. For example:
Some items relating to the topics covered in these lower level hierarchies (levels 4 to 6) are also available on the selected person level (level 3), where appropriate. For example, while the details regarding type of cultural activity participated in and associated individual reasons for participation in each activity are held at the culture level (level 5), other culture related items such as child participation in cultural activities with main carer or frequency of attendance at cultural activities (15 years and over) are available on the selected person level (level 3). Levels 4 to 6 only exist where the selected person is in the applicable population. For example, there are no records for children aged 0-14 years on the barriers or discrimination levels (levels 4 and 6).
Counting units and weights
The counting unit for each level is as follows:
Level 1 - the household;
Level 3 - the selected person/s;
Level 4 - the service type;
Level 5 - the cultural activity; and
Level 6 - the situation in which discrimination had been felt.
There is a weight attached to the selected person level (level 3) to estimate the total Indigenous population, and the household level (level 1) to estimate total Indigenous households.
The selected person weight can be used on levels 4 to 6 by copying it across. When the weight is used on these levels, the population is restricted to persons who have a record on the particular level, and the weight is repeated for each instance of the counting unit.
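As an illustration of copying the weight across, the following Python sketch attaches a selected person weight to discrimination level records by matching on ABSHID and ABSPID. The weight name `PERSON_WT` and all record values are hypothetical, and it is assumed that ABSHID and ABSPID together identify a selected person.

```python
# Hypothetical selected person records (level 3). PERSON_WT is an assumed
# name for the person weight; refer to the data item list for the real name.
persons = [
    {"ABSHID": 1, "ABSPID": 1, "PERSON_WT": 120.0},
    {"ABSHID": 1, "ABSPID": 2, "PERSON_WT": 80.0},
    {"ABSHID": 2, "ABSPID": 1, "PERSON_WT": 200.0},
]
# Hypothetical discrimination records (level 6): one record per situation.
discrimination = [
    {"ABSHID": 1, "ABSPID": 1, "DISNUM": 1},
    {"ABSHID": 1, "ABSPID": 1, "DISNUM": 3},
    {"ABSHID": 2, "ABSPID": 1, "DISNUM": 2},
]

# Look up each person's weight by the pair of identifiers.
weight_by_person = {(p["ABSHID"], p["ABSPID"]): p["PERSON_WT"] for p in persons}
for record in discrimination:
    record["PERSON_WT"] = weight_by_person[(record["ABSHID"], record["ABSPID"])]

# Summing over records repeats the weight once per situation reported.
weighted_situations = sum(r["PERSON_WT"] for r in discrimination)

# Summing once per person estimates persons with at least one record.
persons_on_level = {(r["ABSHID"], r["ABSPID"]): r["PERSON_WT"] for r in discrimination}
weighted_persons = sum(persons_on_level.values())
```

In this toy data the person in household 1 with two situations contributes their weight twice to `weighted_situations` but only once to `weighted_persons`, and the person with no discrimination records contributes to neither.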
A person weight provides an estimate of the number of persons with the selected characteristic. Replicate weights (ie WHM1001 to WHM1250, WPM1001 to WPM1250) have also been included on the selected person level and may be used to calculate the sampling error on any estimate produced from the CURFs. For more information, refer to:
Identifiers can be used on records at each level of the file to copy information from one level to another. There are several identifiers for each record on each of the levels, including:
The identifiers ABSHID and ABSPID appear on all levels of the file. At higher levels, identifiers for lower levels are set to zero. For example, on the household level the identifier for person number is set to zero. No family numbers or income units have been output for this survey.
Copying data to a lower level
There may be instances when a data item is not contained on the level of the file required by users. For example, geographic information is not included on the selected person level because it is a household characteristic. To copy data from the household level to the person level:
The resulting dataset will contain all the household level information (eg geography) attached to the person level data (eg age and sex).
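A minimal Python sketch of this kind of copy is shown here, matching person records to household records on ABSHID. The record values, the item names `AGE` and `SEX`, and the helper `copy_household_items` are hypothetical. Only the requested household items are copied, so the household level ABSPID of zero does not overwrite the person identifier.

```python
# Hypothetical household level records (level 1). On this level the person
# identifier ABSPID is set to zero.
households = [
    {"ABSHID": 101, "ABSPID": 0, "STATEC": 1},
    {"ABSHID": 102, "ABSPID": 0, "STATEC": 3},
]
# Hypothetical selected person records (level 3); AGE and SEX are assumed names.
persons = [
    {"ABSHID": 101, "ABSPID": 1, "AGE": 34, "SEX": 2},
    {"ABSHID": 101, "ABSPID": 2, "AGE": 9, "SEX": 1},
    {"ABSHID": 102, "ABSPID": 1, "AGE": 51, "SEX": 1},
]

def copy_household_items(persons, households, items):
    """Attach the named household items to each person record via ABSHID."""
    household_by_id = {h["ABSHID"]: h for h in households}
    merged = []
    for person in persons:
        household = household_by_id[person["ABSHID"]]
        row = dict(person)          # keep the person's own identifiers
        for item in items:
            row[item] = household[item]
        merged.append(row)
    return merged

persons_with_geography = copy_household_items(persons, households, ["STATEC"])
```

Each person record keeps its own ABSPID and gains the STATEC value of its household, so both persons in household 101 receive the same geography.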
Continuous data items
When analysing continuous items at the person and household levels, it is necessary to exclude the special codes. The special codes are used for responses that do not represent the data being collected (eg 'Don't know'). The codes vary, but will generally be 0, 96, 97, 98, 99 or variations of these. For example, the 'Weekly rent' data item has reserved values of:
The data item list provides the special codes for continuous items. Care should be taken to exclude these codes when categorising higher values for ranges, and when calculating means, medians and other summary statistics.
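The effect of leaving special codes in a calculation can be seen in a short Python sketch. The codes and rent values used here are hypothetical; the actual reserved values for each item are given in the data item list.

```python
from statistics import mean, median

# Hypothetical special codes for a continuous item, for illustration only.
SPECIAL_CODES = {0, 9998, 9999}

# Hypothetical 'Weekly rent' values, including three special codes.
weekly_rent = [180, 250, 0, 9998, 320, 9999, 210]

# Keep only values that represent actual dollar amounts.
valid_rent = [v for v in weekly_rent if v not in SPECIAL_CODES]

naive_mean = mean(weekly_rent)   # distorted by the special codes
valid_mean = mean(valid_rent)    # based on the four real amounts
valid_median = median(valid_rent)
```

With these values the mean of the valid amounts is 240, while the mean taken over all seven records is almost 3,000, which shows how strongly the special codes can distort summary statistics.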
Multiple response items
There are a number of data items on the CURFs that contain multiple responses. This means that the person being interviewed was able to select one or more response categories for these items. Multiple response items are indicated on the data item list.
On the CURFs, each response category for the multiple response questions is treated as a separate data item. Each data item therefore has a response of either:
A 'Not applicable' response has a code of '0' indicating that the response category does not apply for the respondent. A 'Yes' response has a code greater than '0' indicating a positive response for that category.
An example of a multiple response item is the question on the 'Types of selected stressors experienced by self, family or friends in last 12 months' (TSTRALL), which has 27 response categories. From these categories, 25 separate data items have been produced - TSTRALLA, TSTRALLB, TSTRALLC...TSTRALLY.
In most cases, multiple response items will have a number of categories falling into the first SAS category. This is denoted by an 'A' at the end of the fixed SAS name, eg TSTRALLA. This category will contain the first multiple response category, as well as any special codes for the item. Using the example of TSTRALLA, these special codes are 97 'Not applicable' and 98 'Not stated'. When using data from these multiple response items, the placement of these special codes should be confirmed by referring to the data item list.
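The following Python sketch shows one way to count 'Yes' responses for multiple response items while keeping the special codes in the first category out of the count. The weight name `WT`, the record values and the 'Yes' codes 1 and 2 are hypothetical. The special codes 97 and 98 follow the TSTRALLA example, but their placement should still be confirmed in the data item list.

```python
# Special codes reported for the first category in the TSTRALLA example.
SPECIAL_CODES = {97, 98}

# Hypothetical records: WT is an assumed weight name, and the 'Yes' codes
# 1 and 2 are illustrative.
records = [
    {"WT": 100.0, "TSTRALLA": 1, "TSTRALLB": 0},
    {"WT": 50.0, "TSTRALLA": 0, "TSTRALLB": 2},
    {"WT": 70.0, "TSTRALLA": 97, "TSTRALLB": 0},
    {"WT": 30.0, "TSTRALLA": 98, "TSTRALLB": 0},
    {"WT": 40.0, "TSTRALLA": 1, "TSTRALLB": 2},
]

def is_yes(code):
    """A positive response: greater than zero and not a special code."""
    return code > 0 and code not in SPECIAL_CODES

def weighted_yes(records, item):
    """Weighted number of persons with a 'Yes' response for one item."""
    return sum(r["WT"] for r in records if is_yes(r[item]))

yes_a = weighted_yes(records, "TSTRALLA")
yes_b = weighted_yes(records, "TSTRALLB")

# A test of 'greater than zero' alone would also count codes 97 and 98.
naive_a = sum(r["WT"] for r in records if r["TSTRALLA"] > 0)
```

In this toy data the naive count for TSTRALLA is 240 and the count that excludes the special codes is 140.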
Use of repeating datasets
The 'one to many' relationships described by levels 4 to 6 are known as repeating datasets, that is, sets of data with a counting unit which may be repeated for a person. For example, the repeating dataset for situations in which discrimination had been felt (DISNUM) has one record per situation reported, because the situation is the counting unit. Repeating datasets are only useful when common information is collected for each instance of a counting unit. For example, the table below shows that each situation (DISNUM) reported has the data item 'Frequency of discrimination in last 12 months' (DISCQ12) associated with it. By using this item, a table can be run which shows how often discrimination occurred for each situation identified.
To run the above-mentioned example, the following SAS code (or equivalent) can be used:
PROC FREQ DATA=ISST08DS;
TABLES DISNUM*DISCQ12; RUN; /* situation by frequency of discrimination */
The following output would be produced for the example data set:
Although the sample output in the table only relates to a single person, the totals are a count of all situations reported by that person.
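For readers working outside SAS, an equivalent unweighted cross-tabulation can be sketched in Python. The records and code values here are hypothetical, and the counts are numbers of records (situations), not numbers of persons.

```python
from collections import Counter

# Hypothetical discrimination level records: one record per situation
# reported, each carrying the frequency item DISCQ12.
discrimination = [
    {"ABSHID": 1, "ABSPID": 1, "DISNUM": 1, "DISCQ12": 2},
    {"ABSHID": 1, "ABSPID": 1, "DISNUM": 3, "DISCQ12": 1},
    {"ABSHID": 1, "ABSPID": 1, "DISNUM": 5, "DISCQ12": 2},
    {"ABSHID": 2, "ABSPID": 1, "DISNUM": 1, "DISCQ12": 2},
]

# Cell counts for situation (DISNUM) by frequency (DISCQ12).
cell_counts = Counter((r["DISNUM"], r["DISCQ12"]) for r in discrimination)

# Row totals: number of records for each situation.
situation_totals = Counter(r["DISNUM"] for r in discrimination)

for (disnum, discq12), count in sorted(cell_counts.items()):
    print(disnum, discq12, count)
```

The grand total equals the number of records on the level, so a person who reported three situations is counted three times.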
Data from the 2008 NATSISS will be available via three Expanded CURFs:
Due to questionnaire differences between non-remote and remote areas, there are some differences in the data items available on the three CURFs. An overview of the differences in content of the three CURFs is provided in the tables below and 'Expanded CURFs - Availability of file contents'.
Some survey questions were only asked of people/households in non-remote areas. Data items based on these questions are therefore only available for non-remote areas and have no values for remote geographies. Output for these data items is restricted to the non-remote areas shown in the tables below (see 'Non-remote only data items'). In the data item list, these items are noted as being 'Non-Remote only' or identify the 'main population' as being non-remote persons/households. These items are only available on the State/Territory by ASGC Remoteness Structure CURF and the Detailed State/Territory by ASGC Remoteness Structure CURF.
In some cases, the concepts and questions asked in non-remote and remote areas may be comparable, but additional response categories were available for data items in non-remote areas. Two data items have been produced to account for these differences:
Data items, and versions of data items, that are restricted to non-remote areas cannot produce results for total Australia, because they only represent data for non-remote areas of Australia. However, Australia level non-remote data can be produced by using the following data item:
The following table provides information on the availability of non-remote only data items by Australia level geographic areas. If an item is available (eg non-remote only), then where it meets other population criteria, it will contain data. Where the geographic output category is Remote/Very Remote, people living in these areas will be included in the 'not applicable' category of the non-remote only data item. If more detailed non-remote geographic categories are required (ie Major Cities, Inner Regional, Outer Regional) please contact the National Information and Referral Service, whose details are provided in the Survey products chapter.
To obtain valid results for data items that are restricted to non-remote only at the state/territory level, it is necessary to access the State/Territory by ASGC Remoteness Structure CURF or the Detailed State/Territory by ASGC Remoteness Structure CURF. When using the State/Territory by ASGC Remoteness Structure CURF use either of the following data items:
When using the Detailed State/Territory by ASGC Remoteness Structure CURF use either of the following data items:
The following table provides information on the availability of non-remote only data items by state/territory geographic areas for the State/Territory by ASGC Remoteness Structure CURF. If an item is available by a geographic output category (eg Major cities), then where it meets other population criteria, it will contain data. Where the geographic output category is Remote/Very Remote, people living in these areas will be included in the 'not applicable' category of the non-remote only data item.
The following table provides information on the availability of non-remote only data items by state/territory geographic areas for the Detailed State/Territory by ASGC Remoteness Structure CURF.
Indigenous status for Queensland
The 2008 NATSISS sample for Queensland was designed to allow for the release of data on the Torres Strait Islander population. When using the Indigenous status item for Queensland on the CURFs (INDSTATQ), it should be noted that the Torres Strait Islander category comprises persons who:
Two of the 2008 NATSISS CURFs, the State/Territory CURF and the State/Territory by ASGC Remoteness Structure CURF, include one Socio-Economic Index For Areas (SEIFA):
This item may be used in conjunction with other geography items, which are available on the CURF being used (eg STATEC, REMSTAT, ARIAC).
The Index of Relative Disadvantage (CDDIS) is not included on the Detailed State/Territory by ASGC Remoteness Structure CURF.
The SEIFA index is presented in deciles, calculated by grouping Collection Districts (CDs) into 10 equal groups (ie an equal number of CDs in each group) and then matching the CDs of survey records to those groups. As not all CDs are equal in size, and because the NATSISS sample is not selected to ensure an equal distribution at the CD level, this method does not result in an equal number of people or households in each decile.
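The grouping and matching steps can be sketched in Python as follows. The CD identifiers, index scores and sample records are hypothetical, and the simple rank-based split is an assumption about how equal groups of CDs might be formed, not the ABS procedure itself.

```python
from collections import Counter

def assign_deciles(cd_scores):
    """Rank CDs by index score and split them into 10 groups of equal size.

    Decile 1 holds the CDs with the lowest scores.
    """
    ordered = sorted(cd_scores, key=cd_scores.get)
    n = len(ordered)
    return {cd: (rank * 10) // n + 1 for rank, cd in enumerate(ordered)}

# Hypothetical index scores for 20 CDs.
cd_scores = {f"CD{i:02d}": 800 + 15 * i for i in range(20)}
cd_decile = assign_deciles(cd_scores)

# Hypothetical survey records, identified by the CD in which they were sampled.
survey_cds = ["CD00", "CD00", "CD01", "CD01", "CD01", "CD07", "CD19"]

cds_per_decile = Counter(cd_decile.values())
records_per_decile = Counter(cd_decile[cd] for cd in survey_cds)
```

Every decile contains two CDs, but the sample records fall unevenly: five in decile 1, one in decile 4 and one in decile 10.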
The characteristics indicated by the index relate to the area (in this case the CD) in which a population lives, not necessarily to all individuals who live in that area. It should also be noted that the variables used to create the index are not necessarily the most appropriate for the Indigenous population, and being an area based index it is not an Indigenous specific disadvantage measure.
For further details on SEIFA refer to the Information Paper: An introduction to Socio-Economic Indexes for Areas (SEIFA), 2006 (cat. no. 2039.0), available from the ABS website.
RELIABILITY OF ESTIMATES
This segment contains the following topics:
For detailed information on the following topics, refer to the Interpretation of results chapter:
Calculating standard errors
The person level and household level records on the CURFs contain 250 replicate weights. The standard error (SE) for each estimate produced from the CURFs can be calculated using the replicate weights provided. When calculating SEs it is important to select the replicate weights which are most appropriate for the analysis being undertaken. For more information see the segment Counting units and weights.
The formula for calculating the:
of an estimate using the replicate weights technique is shown below.
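Assuming the delete-a-group jackknife form usually associated with a set of 250 replicate weights (Appendix 2: Replicate weights technique is the authoritative statement), the SE and the relative standard error (RSE) of an estimate can be written as follows, where y is the estimate from the main weight and y_(g) is the same estimate calculated with the g-th replicate weight:

```latex
SE(y) = \sqrt{\frac{249}{250} \sum_{g=1}^{250} \left( y_{(g)} - y \right)^{2}}
\qquad
RSE(y) = \frac{SE(y)}{y} \times 100\%
```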
Proportions and percentages formed from the ratio of two estimates are also subject to sampling errors. More information on sampling error is provided in the Interpretation of results chapter and information on the replicate weights technique used for the 2008 NATSISS is available in Appendix 2: Replicate weights technique.
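A small Python sketch of a replicate weight calculation is given here. It assumes a delete-a-group jackknife with a factor of (G - 1)/G for G replicate weights, which corresponds to 249/250 for the 250 replicate weights on the CURFs. The function names and the toy data with four replicate weights are hypothetical.

```python
from math import sqrt

def weighted_total(values, weights):
    """Weighted estimate of a total, for example persons with a characteristic."""
    return sum(v * w for v, w in zip(values, weights))

def jackknife_se(values, main_weights, replicate_weights):
    """Return (estimate, SE, RSE %) assuming a delete-a-group jackknife."""
    groups = len(replicate_weights)
    estimate = weighted_total(values, main_weights)
    squared_diffs = sum(
        (weighted_total(values, rep) - estimate) ** 2 for rep in replicate_weights
    )
    se = sqrt((groups - 1) / groups * squared_diffs)
    rse = se / estimate * 100
    return estimate, se, rse

# Toy data: 1 marks persons with the characteristic of interest.
has_characteristic = [1, 0, 1, 1]
main_weights = [10.0, 10.0, 10.0, 10.0]
# Four hypothetical replicate weights: each drops one person and rescales
# the remaining weights by 4/3.
replicate_weights = [
    [0.0 if i == g else 40.0 / 3 for i in range(4)] for g in range(4)
]

estimate, se, rse = jackknife_se(has_characteristic, main_weights, replicate_weights)
```

On the CURFs the same function would be given all 250 replicate weights from the level that matches the analysis. The SE of a proportion can be obtained in the same way by recalculating the ratio with each replicate weight.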
Age standardisation may be applied to some data to improve comparisons between Indigenous and non-Indigenous populations. Different methods of age standardisation are appropriate for different types of data and for different purposes. The application and/or most appropriate method of age standardisation should be considered when using data from the 2008 NATSISS. Further information on age standardisation and the method applied to the 2008 NATSISS summary publication is provided in the Interpretation of results chapter. Note: age standardised weights have not been produced for the CURFs.
CONTENTS OF THE CURFS
Differences in the file contents of the 2008 NATSISS Expanded CURFs are outlined in the following table. Data items marked with an 'X' indicate availability on the CURF.
The following table outlines the available file formats for each of the Expanded CURFs.
CONDITIONS OF RELEASE
This segment contains the following topics:
Release of the CURFs
The Expanded CURFs are being released in accordance with a Ministerial Determination (Clause 7, Statutory Rules 1983, No. 19) in pursuance of Section 13 of the Census and Statistics Act 1905. As required by the Determination, the CURFs have been designed so that the information on the files is not likely to enable the identification of a particular person or organisation to which it relates.
All CURF users are required to read and abide by the User Manual: Responsible Use of ABS CURFs, Sep 2009 (cat. no. 1406.0.55.003), available from the ABS website. Use of the data for unauthorised purposes may render the purchaser liable to severe penalties. Advice on the propriety of any intended use of the data is available from the:
ABS Microdata Access Strategies Section
T: (02) 6252 7714
F: (02) 6252 8132
Deed of Undertaking
The Australian Statistician's approval is required for each CURF release. The ABS also requires all organisations, and individuals within organisations, who purchase or are seeking to use a CURF to sign and submit a Deed of Undertaking. The Deed legally binds the applicant to comply with ABS terms and conditions of CURF access. The undertaking requires that CURF users will:
Use of data for statistical purposes
Use of the data for statistical purposes means that the CURF data is used to produce information of a statistical nature. Examples of statistical purposes are:
Conditions of sale
All ABS products and services are provided subject to the ABS Conditions of Sale.
Information about CURF prices and payment is available from the Microdata prices page.
Accessing the CURFs
Due to the level of detail provided, the Expanded CURFs will only be available via the ABS Remote Access Data Laboratory (RADL). As the three 2008 NATSISS CURFs may not be merged, anyone requiring use of more than one CURF will need a separate RADL log-in for each CURF. More information on this process will be provided to applicants.
Further information on accessing the CURFs is available in the User Manual: Responsible Use of ABS CURFs, Sep 2009 (cat. no. 1406.0.55.003).
The CURFs can be accessed by universities participating in the ABS/Universities Australia Agreement for research and teaching purposes.