4715.0.55.002 - Technical Manual: National Aboriginal and Torres Strait Islander Health Survey, Expanded CURF, 2004-05  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 21/02/2014   



This chapter contains important directions for users of the 2004-05 NATSIHS CURF. These directions must be adhered to in order to maintain the confidentiality of survey respondents, as agreed to in the responsible officer's and individual users' undertakings. The chapter also provides important background information to consider when specifying output from the CURF.


The 2004-05 NATSIHS CURF contains separate files arranged in a hierarchy made up of the following levels:

1. Household: Contains household descriptors (eg size, structure), household income, and geographic items, including a SEIFA index. The data item INDGSTHH is available on this level to identify Indigenous and non-Indigenous households. Some categories within data items on this level are not available for non-Indigenous persons/households and are identified as being for 'Indigenous households only' in the data item list.

2. (All) Persons in household: Contains selected demographic information about all residents of sampled households, except for large households where some non-selected persons were dropped. The data items INDGSTHH and INDGSTAC are available on this level to identify Indigenous and non-Indigenous households/persons.

3. (Selected) Person: Contains information about each survey respondent, including demographic and socioeconomic characteristics, and the full range of health items obtained in the survey, other than those contained in levels 4-7. The data item INDSTATA is available on this level to identify Indigenous and non-Indigenous persons. Some data items and categories within data items on this level are not available for non-Indigenous persons and are identified as being for 'Indigenous only' in the data item list.

4. Alcohol: Contains detailed information about type of drinks and amounts consumed on the days recorded in the survey.

5. Conditions: Mainly contains information about the types of long term conditions reported, including conditions classified to ICD-10 and ICPC. Some categories within data items are not available for non-Indigenous persons and are identified as being for 'Indigenous only' in the data item list.

6. Injury damage: Contains details of the injuries reported as sustained in the most recent injury event.

7. Body part injured: Contains details of the body parts injured in the most recent injury event.

The first three levels are in a hierarchical relationship: a household comprises a number of residents ((all) Persons in household level), from which one to four were the selected respondents to the survey ((selected) Person level).

Levels four to six are in a hierarchical relationship with the (selected) Person level and level seven is in a hierarchical relationship with level six. These levels exist to describe 'one to many' relationships. For example:

  • a person may report drinking alcohol on more than one day in the reference period and different quantities of multiple drink types on each of those days;
  • a person may report multiple conditions;
  • a person may report multiple types of injuries and/or injuries to multiple body sites arising from a single injury event.

Some items relating to the topics covered in these lower level hierarchies are also held at the (selected) Person level, where appropriate. For example, while the detailed alcohol consumption data (day of consumption x type of drink x quantity consumed) are held at the alcohol level, other alcohol related items such as risk level, summary of drink types and quantity consumed, recorded consumption compared to usual, etc, are held at the (selected) Person level. The lower level records (levels 4 to 7) only exist where the person is in the relevant population, so that, for example, if a person reported that they did not have a long term condition, there would be no record for them in the Conditions level.


The counting unit for each level is as follows:
      level one - the household;
      level two - the person;
      level three - the selected person/s;
      level four - the types and quantities of alcohol and days consumed;
      level five - the condition;
      level six - type of injury; and
      level seven - body part.

There is a weight attached to the (selected) Person level (level three) to estimate the total Indigenous or non-Indigenous population, and the Household level (level one) to estimate total Indigenous households (Note: there are no household weights for households comprising non-Indigenous persons only from the NHS).

While the CURF contains an (all) Persons in household level (level two), the information on this level should be treated as characteristics of the selected persons (or, for Indigenous households only, of the household as well); the level is not intended to be used to produce person estimates. For non-Indigenous households, the Household level should likewise be treated as characteristics of the selected persons and is not intended to be used to produce household estimates.

The person weight can be used on levels 4 to 7 by copying it across from the (selected) Person level. When the weight is used on these levels, the population is restricted to persons who have a record on the particular level, and the weight is repeated for each instance of the counting unit.

A person weight provides an estimate of the number of persons with the selected characteristic. Replicate weights (e.g. AS_TO001 to AS_TO250) have also been included and these can be used to calculate the sampling error on any estimate produced from the CURF. Age standardised weights and replicate weights have also been included on the (selected) Person level. For more information, refer to the 'Calculating standard error (SEs) and relative standard errors (RSEs)' and 'Age standardisation' sections below.
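As an illustration of how a person weight is applied, the sketch below tabulates weighted counts by Indigenous status on the (selected) Person level. The dataset name PERSON and the weight name PERSWT are placeholders introduced for this example, not actual CURF names; the real name of the person weight should be taken from the data item list.

```sas
* PERSON and PERSWT are hypothetical names used for illustration only;
PROC FREQ DATA=PERSON;
  TABLES INDSTATA;   * Indigenous status of the selected person;
  WEIGHT PERSWT;     * each record contributes its weight to the count;
RUN;
```

With the WEIGHT statement, each cell of the table is the sum of the weights of the records in that cell, that is, an estimate of the number of persons rather than a count of respondents.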


There is a series of identifiers that can be used on records at each level of the file to copy information from one level to another.

The identifiers ABSHID, ABSFID, ABSIID, ABSPID, ABSJID appear on all levels of the file. Where the information for the identifier is not relevant for a level, it has a value of 0.

Each household has a unique twelve-character (combination of characters and digits) random identifier (ABSHID). This identifier appears on the Household level, and is repeated on every other level. On the (all) Persons in household level, each family within the household is numbered sequentially (ABSFID), with non-family members numbered sequentially from 50; and within each family, each income unit is numbered sequentially (ABSIID). Within the household, each person is numbered sequentially (ABSPID). Items containing this family, income unit and person number then appear on all the levels below the (all) Persons in household level. The combination of household and family identifiers uniquely identifies the family. The combination of household, family and income unit identifiers uniquely identifies the income unit. The Injury damage level and the Body part injured level also have an identifier (ABSJID) which, together with ABSHID and ABSPID, enables information to be copied between levels six and seven.

The identifiers needed for linking information between records are:
      1. Household = ABSHID;
      2. (all) Persons in household = ABSHID, ABSPID;
      3. (selected) Person = ABSHID, ABSPID;
      4. Alcohol = ABSHID, ABSPID;
      5. Conditions = ABSHID, ABSPID;
      6. Injury damage = ABSHID, ABSPID, ABSJID;
      7. Body part injured = ABSHID, ABSPID, ABSJID.

To copy information from a lower level to a level above, the following SAS code can be used (or equivalent):

IF ICD10D=57 THEN FLAG=1; *set flag for condition = tinnitus;


The TINNITUS file, using the Condition level dataset, keeps the last record for each ABSPID, i.e. person, and sets the item FLAG to 1 if the person has tinnitus as a condition. This newly created flag is then merged on to the Person level file so that this item can now be cross classified or analysed with any other item on the Person level.
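The steps described above can be sketched in full as follows. This is one possible reconstruction, not the original listing: the dataset names CONDITION, PERSON and PERSON2 are assumptions, and the use of RETAIN and FIRST./LAST. processing is simply one common way of keeping a single flagged record per person.

```sas
* Dataset names CONDITION, PERSON and PERSON2 are hypothetical;
PROC SORT DATA=CONDITION; BY ABSHID ABSPID; RUN;
PROC SORT DATA=PERSON;    BY ABSHID ABSPID; RUN;

DATA TINNITUS (KEEP=ABSHID ABSPID FLAG);
  SET CONDITION;
  BY ABSHID ABSPID;
  RETAIN FLAG;
  IF FIRST.ABSPID THEN FLAG=0;   * reset at the start of each person;
  IF ICD10D=57 THEN FLAG=1;      * set flag for condition = tinnitus;
  IF LAST.ABSPID THEN OUTPUT;    * keep the last record for each person;
RUN;

DATA PERSON2;
  MERGE PERSON (IN=A) TINNITUS (IN=B);
  BY ABSHID ABSPID;
  IF A;                          * keep every person record;
  IF NOT B THEN FLAG=0;          * persons with no condition records;
RUN;
```

Both files are sorted by ABSHID and ABSPID because the person number is only unique within a household.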

To copy information from a higher level to a level below, the following SAS code can be used (or equivalent):

IF A AND B THEN OUTPUT; *only keeps records which are present on both files;

Unlike the previous merge, this merge will match one person record to many injury records. The statement "IF A AND B THEN OUTPUT;" ensures that only records present on both files are kept. If this statement were not used, person records without a corresponding injury record would appear with a missing value for all injury data items. Note that the data items copied from the (selected) Person level will now be duplicated as required to match the counting unit for the level they have been added to, injuries in this case (i.e. if a person was bruised and had an open wound, their data from the (selected) Person level will be attached to both types of injury).
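A fuller sketch of such a one-to-many merge is given below. The dataset names PERSON, INJURY and INJPERS are assumptions made for illustration, and INDSTATA is used only as an example of a (selected) Person level item to copy down.

```sas
* Dataset names PERSON, INJURY and INJPERS are hypothetical;
PROC SORT DATA=PERSON; BY ABSHID ABSPID; RUN;
PROC SORT DATA=INJURY; BY ABSHID ABSPID; RUN;

DATA INJPERS;
  MERGE PERSON (IN=A KEEP=ABSHID ABSPID INDSTATA)
        INJURY (IN=B);
  BY ABSHID ABSPID;
  IF A AND B THEN OUTPUT;  * only keeps records which are present on both files;
RUN;
```

The KEEP= option limits the person file to the linking identifiers and the item being copied, which keeps the merged file small and avoids clashes between identically named items on the two levels. A person weight could be copied to a lower level in the same way.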


For items with 'continuous' values, such as Personal gross weekly cash income, certain values are reserved as special codes and must not be added as if they were quantitative values. These values are at the upper range of items (for example, 999 for three digit items, 9999 for four digit items etc.) and are specifically identified in the data item list with their text meaning specified next to them as shown:

      Continuous income
      999,995 Not stated
      999,996 Not applicable
      999,997 Refusal
      999,998 Not known
      999,999 No source
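One way to keep such special codes out of calculations is to set them to missing before summarising. In the sketch below, the dataset names PERSON and PERSON_INC and the item names INCOME and INCOME_Q are hypothetical; the five codes are those shown in the example list above.

```sas
* PERSON, PERSON_INC, INCOME and INCOME_Q are hypothetical names;
DATA PERSON_INC;
  SET PERSON;
  IF INCOME IN (999995, 999996, 999997, 999998, 999999)
    THEN INCOME_Q = .;           * special codes become missing;
  ELSE INCOME_Q = INCOME;        * genuine quantitative values are kept;
RUN;

PROC MEANS DATA=PERSON_INC N MEAN;
  VAR INCOME_Q;                  * missing values are excluded from the mean;
RUN;
```

The special codes applying to each continuous item should be confirmed against the data item list, as they differ with the width of the item.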


A number of the questions asked during the NATSIHS allowed respondents to give multiple responses. On the CURF, each response category for such questions is treated as a separate data item. An example of a multiple response item is the "How usually feel when treated badly because Aboriginal/Torres Strait Islander" data item which lists nine categories from DISCRQ3A (feel angry) through to DISCRQ3F (other feeling). Except for DISCRQ3A, output for each of these categories (for example DISCRQ3B) will contain two codes:
  • a zero code (eg. indicating the number of people who don't feel sad when they are treated badly or haven't been treated badly); and
  • a non-zero code (eg. in this case '2', indicating the number of people who do feel sad when they are treated badly).

User note: as respondents can report more than one category for a multiple response item, the sum of responses across all categories may exceed the number of respondents for that item.

In most cases, multiple response items will have a number of categories falling into the first SAS category (denoted by an 'A' at the end of the fixed SAS name, e.g. DISCRQ3A). This category will contain the first multiple response category, as well as any special codes for the item (for example for DISCRQ3A these special codes are 7 None of the above, 97 Refusal, and 98 Not stated). When using data from these multiple response items, users should first confirm the placement of these special codes.
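A minimal sketch of working with these items follows. It assumes that 'feel angry' is held as code 1 on DISCRQ3A (the source only states the non-zero code for DISCRQ3B), and the dataset names PERSON and DISCR and the indicator names ANGRY and SAD are hypothetical.

```sas
* PERSON, DISCR, ANGRY and SAD are hypothetical names;
DATA DISCR;
  SET PERSON;
  ANGRY = (DISCRQ3A = 1);   * assumed code 1 means feel angry was reported;
  SAD   = (DISCRQ3B = 2);   * code 2 means feel sad was reported;
RUN;

PROC FREQ DATA=DISCR;
  TABLES DISCRQ3A ANGRY SAD;  * DISCRQ3A also carries codes 7, 97 and 98;
RUN;
```

Deriving a separate 0/1 indicator from the 'A' item keeps the special codes (7, 97 and 98) from being counted as responses to the first category.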


The 'one to many' relationships described by levels 4 to 7 are known as repeating datasets i.e. sets of data with a counting unit which may be repeated for a person. For example, a repeating dataset for conditions will have one record per condition reported because condition is the counting unit (see table example below). Repeating datasets are only useful when common information is collected for each instance of a counting unit. For example in the table below, each condition (ICD10D) reported has the data item "Whether condition a result of an injury" (WCONDRJ) associated with it. By using this item, a table can be run to ascertain which of the conditions reported are the result of an injury.

Example of 'Conditions' repeating dataset

Household ID (ABSHID)
Person ID (ABSPID)


To run the table mentioned above the following SAS code (or equivalent) can be used:
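One possible form of such a table request is sketched below. The dataset name CONDITION is an assumption; ICD10D and WCONDRJ are the items named in the text. The table is unweighted, so it counts condition records.

```sas
* CONDITION is a hypothetical name for the Conditions level dataset;
PROC FREQ DATA=CONDITION;
  TABLES ICD10D*WCONDRJ / NOROW NOCOL NOPERCENT;  * cell counts only;
RUN;
```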


The following output would be produced for the example data set:




- nil or rounded to zero (including null cells)

The condition is the result of an injury when WCONDRJ is code 1. The above example shows that one of the four conditions is the result of an injury. Note that although the output above only relates to a single person the totals are a count of all conditions for that person.


The 2004-05 NATSIHS CURF contains three geographic field items: State/Territory of usual residence (STATEC); ASGC Remoteness Structure (ARIAC) and a combination of State/Territory by ASGC Remoteness Structure (REMSTAT). The CURF also contains one SEIFA index (see SEIFA Index section below).

STATEC identifies each state and territory separately except Tasmania and the ACT. Due to confidentiality considerations, the samples from Tasmania (876 records) and ACT (368 records) have been combined into a single category of Tas./ACT.

ARIAC has two output categories: non-remote and remote/very remote.

REMSTAT has thirteen output categories that comprise selected cross-classification of state/territory by remoteness where sample size permits. Output categories can be found in the data item list and in the table below.

User notes:
  • Only one form of Geography may be used at a time (including the SEIFA).
  • Cross-classification of state/territory by remoteness must only be undertaken using REMSTAT.
  • The sample in the Northern Territory (NT) was reduced for the NHS to a level such that NT records contribute appropriately to national estimates but are insufficient to support reliable estimates for the NT. As a result, non-Indigenous estimates for NT should not be produced from the CURF.
  • The remote sample of the NHS is not sufficient to support the release of data at this level, and it does not have the same scope and coverage as that of the NATSIHS. As a result, non-Indigenous estimates for remote areas should not be produced from the CURF.

Users of the 2004-05 NATSIHS CURF on RADL are advised that some items collected using computer-assisted interviewing (CAI) are only valid for restricted geographic output. This is because either the entire item was collected in CAI only (predominantly used in non-community areas) or certain categories of the item were collected in CAI but not in the pen-and-paper interview (PAPI) used in community areas. Due to confidentiality issues that arise because of the size of populations in the different states and territories, there are some restrictions on the geographic output for these items. The following section provides information on how to use the geographic variables ARIAC and REMSTAT in order to obtain valid output for these items.

There are 384 data items (excluding identifiers and weights) on the CURF, 100 of which are available for non-remote areas only. There are two reasons for restricting items to non-remote areas only:
  • A number of items were only collected in CAI. These items were not collected in PAPI. They are flagged as being available for "Non-remote" areas in the data item list. These items have been coded so as to have no values for remote geographies. Valid output for these items is restricted to non-remote areas as shown in the table below.
  • Some items were collected in both CAI and PAPI; however, the CAI either separated concepts (such as General Practitioner and Specialist) or collected categories not collected in PAPI. For these items there are two versions.
      • The first version is a collapsed version of output categories or concepts common to both CAI and PAPI. They can be output for all geographies on the CURF (and are indicated in the data item list as being available for "Both" non-remote and remote areas).
      • The second version is either an item which contains an expanded set of output categories including categories that are unique to CAI, or a number of items which provide separated information about concepts that have been combined in the PAPI. These items have been coded so as to have no values for remote geographies. Valid output for these items is restricted to non-remote areas as shown in the table below (and are indicated in the data item list as being available for "Non-remote" areas).

Australia level

Those data items and versions of data items restricted to non-remote only cannot produce results for total Australia. These items require the use of the ARIAC variable with results available for non-remote Australia only.

State/Territory level

To obtain valid results for these non-remote restricted data items at state level, users must use the state by remoteness variable (REMSTAT) and use only the valid output categories shown in the table below. STATEC cannot be used for these data items.

Valid geographic output categories           | Invalid geographic output categories
for non-remote items                         | for non-remote items
(Complete coverage of CAI-specific items)    | (Incomplete coverage of CAI-specific items)

Australia Level (ARIAC)
1 Australia non-remote                       | 2 Australia remote/very remote

State/Territory Level (REMSTAT)
1 NSW Major Cities                           | 8 Qld Remote/Very Remote
2 NSW Inner Regional                         | 10 WA Remote/Very Remote
3 NSW Outer Regional                         | 11 NT Remote/Very Remote
4 Vic Total(a)                               | 13 Balance of Australia–Remote/Very Remote(d)
5 Qld Major Cities                           |
6 Qld Inner Regional                         |
7 Qld Outer Regional                         |
9 WA Non-remote(b)                           |
12 Balance of Australia–Non-remote(c)        |

(a) As only a very small portion of Vic is considered remote and enumeration did not occur in these areas, this represents Non-Remote.
(b) Includes Major cities, Inner regional area and Outer regional area.
(c) Includes Non-remote areas of SA, NT, and Tas/ACT.
(d) Includes Remote/Very Remote areas of NSW, SA, and Tas/ACT.
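The restriction to valid REMSTAT categories can be applied before tabulating, as in the sketch below. The dataset names PERSON and NONREMOTE and the item name CAI_ITEM are hypothetical; CAI_ITEM stands for any item flagged as "Non-remote" in the data item list.

```sas
* PERSON, NONREMOTE and CAI_ITEM are hypothetical names;
DATA NONREMOTE;
  SET PERSON;
  IF REMSTAT IN (1, 2, 3, 4, 5, 6, 7, 9, 12);  * valid non-remote categories;
RUN;

PROC FREQ DATA=NONREMOTE;
  TABLES REMSTAT*CAI_ITEM;   * state by remoteness output for a CAI-only item;
RUN;
```

Categories 8, 10, 11 and 13 are dropped because items restricted to non-remote areas have no values for remote geographies.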


The 2004-05 NATSIHS CURF includes one index from the set of Socio-Economic Indexes for Areas (SEIFA), the Index of Relative Socio-Economic Disadvantage. This SEIFA index may not be used in conjunction with any other form of geography, i.e. the index may not be used with STATEC, ARIAC or REMSTAT.

The SEIFA index is presented in deciles and is derived by grouping Collection Districts (CDs) into 10 equal groups (i.e. an equal number of CDs in each group), then matching the CDs of survey records to those groups. Because CDs are not all equal in size, and because the NATSIHS and NHS sample is not selected to ensure an equal distribution at the CD level, this method does not result in an equal number of people or households in each decile.

It should be borne in mind that the characteristics indicated by the index relate to the area (in this case the CD) in which a population lives, not necessarily to all individuals who live in that area. It should also be further noted that the variables used to create the index are not necessarily the most appropriate for the Indigenous population, and being an area based index it is not an Indigenous specific disadvantage measure.

For further information regarding SEIFA indexes, see Census of Population and Housing: Socio-Economic Indexes for Areas (SEIFA), Australia - Technical Paper, 2001 (cat. no. 2039.0.55.001).


There are 25,511 non-Indigenous respondents from the 2004-05 NHS included on the NATSIHS CURF to enable comparisons with the non-Indigenous population. Data items from the NHS that are not comparable to the NATSIHS are not included on this CURF.

Person weights (including age standardised weights) are attached to non-Indigenous records. Note that weighted data derived from non-Indigenous NHS respondents will not equate to published NHS estimates (which do not distinguish on the basis of Indigenous status). Also, because of the different geographic scope of the NATSIHS and the NHS, comparisons are only available for non-remote or total Indigenous and non-Indigenous populations. Remote comparisons are not appropriate. As mentioned previously, it is also not appropriate to produce total Australia data from this CURF. Total Australia data is available using the NHS Basic or Expanded CURF.

The data item list indicates which data items are common to both Indigenous and non-Indigenous respondents and which relate to Indigenous respondents only.


SEs and RSEs can be estimated directly from the 2004-05 NATSIHS CURF using the replicate weight method. The basic idea behind the replicate approach is to select subsamples repeatedly (for Indigenous it is 250 times, for non-Indigenous it is 60 times) from the whole sample. For each of these subsamples the statistic of interest is calculated. The variance of the full sample statistic is then estimated using the variability among the replicate statistics calculated from the subsamples.

There are various ways of creating replicate subsamples from the full sample. The replicate weights produced for the 2004-05 NATSIHS (and 2004-05 NHS) have been created using a group jackknife method of replication. The formulae for calculating the SE and RSE of an estimate using this method are:

Equation: Tech5
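The standard delete-a-group jackknife estimator, which is assumed here to be the form of the formulae referred to above, can be written as follows. The notation is introduced for this sketch: G is the number of replicate groups (250 for Indigenous records, 60 for non-Indigenous records), y is the estimate from the full sample using the main weight, and y_(g) is the same estimate calculated using the g-th replicate weight.

```latex
SE(y) = \sqrt{\frac{G-1}{G}\sum_{g=1}^{G}\left(y_{(g)} - y\right)^{2}},
\qquad
RSE(y) = \frac{SE(y)}{y} \times 100\%
```

Under this assumption the leading factor is 249/250 for Indigenous estimates and 59/60 for non-Indigenous estimates.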

Proportions and percentages formed from the ratio of two estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. For proportions where the denominator is an estimate of the number of persons in a group and the numerator is the number of persons in a sub-group of the denominator group, the formula to approximate the RSE is given by:

Equation: appendix 7 e1
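The approximation commonly used for this case, and assumed here to be the one intended, is shown below, where x is the estimate for the sub-group (numerator) and y is the estimate for the whole group (denominator); this notation is introduced for the sketch.

```latex
RSE\!\left(\frac{x}{y}\right) \approx \sqrt{\left[RSE(x)\right]^{2} - \left[RSE(y)\right]^{2}}
```

For example, if RSE(x) is 10% and RSE(y) is 6%, the approximate RSE of the proportion is the square root of (100 - 36), which is 8%.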


The age structure of the Indigenous population is considerably younger than that of the non-Indigenous population. As age is strongly related to health, statistical comparisons between Australia's Indigenous and non-Indigenous populations which do not take account of age may be misleading. Age standardisation is recommended by the ABS to improve the validity of comparisons between Indigenous and non-Indigenous populations when analysing health data. An alternative technique for analysing characteristics in populations that have different age structures is to compare the distribution of the variable of interest by age group. For this approach, unadjusted (non age standardised) data could be output in 10 year or 20 year age ranges.

Different methods of age standardisation are appropriate for different types of data and different purposes. The 2004-05 NATSIHS CURF has an inbuilt age standardisation facility that uses the direct method of age standardisation based on the total Australian population at 30 June 2001. Age standardised results, and associated RSEs, can be generated by using the relevant set of age standardised weights in place of the main (unadjusted) weights.

Four different sets of age standardised weights (and their associated 250 replicate weights) are available depending on the disaggregation of the population of interest. These are:
      AS_TO_WT - standardises total Indigenous estimates to the total Australian population age structure as at 30 June 2001.
      AS_RE_WT - standardises estimates for the Indigenous population separately in remote and non-remote areas to the total population age structure by remoteness area as at 30 June 2001.
      AS_ST_WT - standardises estimates for the Indigenous population separately in each state/territory as at 30 June 2001.
      AS_SE_WT - standardises estimates separately for Indigenous males and females to the total population age structure by sex as at 30 June 2001.
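The sketch below shows one way an age standardised proportion and its RSE might be computed from AS_TO_WT and its replicate weights AS_TO001 to AS_TO250. It assumes the standard group jackknife factor of (G-1)/G with G = 250. The dataset names PERSON_IND, CONTRIB, TOTALS and SE_EST, the indicator FLAG (1 for records with the characteristic of interest) and the work items N0 to N250 are all hypothetical; PERSON_IND is taken to hold Indigenous (selected) Person records only.

```sas
* PERSON_IND, FLAG and N0-N250 are hypothetical names;
DATA CONTRIB;
  SET PERSON_IND;
  ARRAY W{251} AS_TO_WT AS_TO001-AS_TO250;  * main weight then 250 replicates;
  ARRAY N{251} N0-N250;                     * weight where FLAG=1, else 0;
  DO I = 1 TO 251;
    N{I} = W{I} * (FLAG = 1);
  END;
  DROP I;
RUN;

PROC MEANS DATA=CONTRIB NOPRINT;
  VAR AS_TO_WT AS_TO001-AS_TO250 N0-N250;
  OUTPUT OUT=TOTALS SUM=;                   * weighted totals keep input names;
RUN;

DATA SE_EST;
  SET TOTALS;
  ARRAY D{251} AS_TO_WT AS_TO001-AS_TO250;  * denominators;
  ARRAY N{251} N0-N250;                     * numerators;
  P = N{1} / D{1};                          * full sample proportion;
  SSQ = 0;
  DO G = 2 TO 251;
    SSQ = SSQ + (N{G} / D{G} - P)**2;       * squared replicate deviation;
  END;
  SE  = SQRT((249 / 250) * SSQ);            * assumed jackknife factor;
  RSE = 100 * SE / P;                       * relative standard error, per cent;
  KEEP P SE RSE;
RUN;
```

The same pattern applies to the other weight sets by substituting the relevant main weight and its replicate weights; for the non-Indigenous weights the loop would run over 60 replicates and the factor would become 59/60.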

A set of age standardised weights (and its associated 60 replicate weights) is also available for the non-Indigenous population from the NHS.

Age standardised results generated from the 2004-05 NATSIHS CURF do not provide a measure of the prevalence of a particular characteristic in the Indigenous population. Rather, they provide a means of comparing these data with results for another population (such as the non-Indigenous population) that has also been standardised to the same reference population. Users will need to apply their own methods when age standardising from other sources.

For further details regarding age standardised weights and the method used to produce the weights, please refer to Chapter 7 of the Users' Guide.