Australian Bureau of Statistics
4715.0.55.002 - Technical Manual: National Aboriginal and Torres Strait Islander Health Survey, Expanded CURF, 2004-05
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 21/02/2014
CHAPTER 3 USING THE CURF DATA
Some items relating to the topics covered in these lower level hierarchies are also held at the (selected) Person level, where appropriate. For example, while the detailed alcohol consumption data (day of consumption x type of drink x quantity consumed) are held at the alcohol level, other alcohol related items such as risk level, summary of drink types and quantity consumed, recorded consumption compared to usual, etc, are held at the (selected) Person level. The lower level records (levels 4 to 7) only exist where the person is in the relevant population, so that, for example, if a person reported that they did not have a long term condition, there would be no record for them in the Conditions level.
COUNTING UNITS AND WEIGHTS
The counting unit for each level is as follows:
level one - the household;
level two - the person;
level three - the selected person/s;
level four - the types and quantities of alcohol and days consumed;
level five - the condition;
level six - type of injury; and
level seven - body part.
There is a weight attached to the (selected) Person level (level three) to estimate the total Indigenous or non-Indigenous population, and the Household level (level one) to estimate total Indigenous households (Note: there are no household weights for households comprising non-Indigenous persons only from the NHS).
While the CURF contains an (all) Persons in household level (level two), items on this level should be treated as characteristics of the selected persons (or, for Indigenous households only, of the household as well); the level is not intended for producing person estimates. For non-Indigenous households, the household level should likewise be treated as a characteristic of the selected persons and is not intended for producing household estimates.
The person weight can be used on levels 4 to 7 by copying it across from the (selected) Person level. When the weight is used on these levels, the population is restricted to persons who have a record on the level concerned, and the weight is repeated for each instance of the counting unit.
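The effect of copying the person weight to a lower level can be sketched in Python as an equivalent to SAS. This is an illustration only: the identifiers ABSHID, ABSPID and ICD10D come from the CURF, but the weight name PERSON_WT and all values are hypothetical.

```python
# Toy data only: PERSON_WT is a hypothetical name for the person weight,
# and every value below is made up for illustration.
persons = [
    {"ABSHID": "H1", "ABSPID": 1, "PERSON_WT": 120.0},
    {"ABSHID": "H1", "ABSPID": 2, "PERSON_WT": 80.0},
    {"ABSHID": "H2", "ABSPID": 1, "PERSON_WT": 200.0},
]
conditions = [
    {"ABSHID": "H1", "ABSPID": 1, "ICD10D": 57},
    {"ABSHID": "H1", "ABSPID": 1, "ICD10D": 12},
    {"ABSHID": "H2", "ABSPID": 1, "ICD10D": 57},
]

# Look up each person's weight by the (ABSHID, ABSPID) key.
weight_by_person = {(p["ABSHID"], p["ABSPID"]): p["PERSON_WT"] for p in persons}

# Copy the weight onto every condition record belonging to that person.
for c in conditions:
    c["PERSON_WT"] = weight_by_person[(c["ABSHID"], c["ABSPID"])]

# Summing the copied weight over condition records estimates conditions,
# because the weight is repeated once per condition.
estimated_conditions = sum(c["PERSON_WT"] for c in conditions)

# Counting each person once estimates persons with at least one condition.
keys_with_condition = {(c["ABSHID"], c["ABSPID"]) for c in conditions}
estimated_persons = sum(weight_by_person[k] for k in keys_with_condition)
```

In this toy data the person in household H1 contributes their weight twice to the condition estimate but only once to the person estimate, and the person with no condition record contributes to neither.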
A person weight provides an estimate of the number of persons with the selected characteristic. Replicate weights (e.g. AS_TO001 to AS_TO250) have also been included and these can be used to calculate the sampling error on any estimate produced from the CURF. Age standardised weights and replicate weights have also been included on the (selected) Person level. For more information, refer to the 'Calculating standard errors (SEs) and relative standard errors (RSEs)' and 'Age standardisation' sections below.
There are a series of identifiers that can be used on records at each level of the file to copy information from one level to another.
The identifiers ABSHID, ABSFID, ABSIID, ABSPID, ABSJID appear on all levels of the file. Where the information for the identifier is not relevant for a level, it has a value of 0.
Each household has a unique twelve-character (combination of characters and digits) random identifier (ABSHID). This identifier appears on the Household level, and is repeated on every other level. On the (all) Persons in household level, each family within the household is numbered sequentially (ABSFID) with non family members numbered sequentially from 50; and within each family, each income unit is numbered sequentially (ABSIID). Within the household, each person is numbered sequentially (ABSPID). Items containing this family, income unit and person number then appear on all the levels below the (all) Persons in household level. The combination of household and family identifier uniquely identifies the family. A combination of household, family and income unit identifiers uniquely identifies the income unit. The Injury damage level and the Body part injured level also have an identifier (ABSJID); together with ABSHID and ABSPID, it enables information to be copied between levels six and seven.
The identifiers needed for linking information between records are:
1. Household = ABSHID;
2. (all) Persons in household = ABSHID, ABSPID;
3. (selected) Person = ABSHID, ABSPID;
4. Alcohol = ABSHID, ABSPID;
5. Conditions = ABSHID, ABSPID;
6. Injury damage = ABSHID, ABSPID, ABSJID;
7. Body part injured = ABSHID, ABSPID, ABSJID.
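As an illustration of why the identifiers must be combined, the following Python sketch (an equivalent to SAS, with made-up records) counts families, income units and persons from composite keys. The identifier names are those of the CURF; the values are hypothetical.

```python
# Hypothetical (all) Persons in household records; values are made up.
records = [
    {"ABSHID": "H1", "ABSFID": 1, "ABSIID": 1, "ABSPID": 1},
    {"ABSHID": "H1", "ABSFID": 1, "ABSIID": 1, "ABSPID": 2},
    {"ABSHID": "H1", "ABSFID": 1, "ABSIID": 2, "ABSPID": 3},
    {"ABSHID": "H2", "ABSFID": 1, "ABSIID": 1, "ABSPID": 1},
]

# Household + family identifies a family.
families = {(r["ABSHID"], r["ABSFID"]) for r in records}

# Household + family + income unit identifies an income unit.
income_units = {(r["ABSHID"], r["ABSFID"], r["ABSIID"]) for r in records}

# Household + person identifies a person.
persons = {(r["ABSHID"], r["ABSPID"]) for r in records}

# ABSPID alone is not unique, because numbering restarts in each household.
distinct_person_numbers = {r["ABSPID"] for r in records}
```

Here the four records resolve to two families, three income units and four persons, while ABSPID on its own takes only three distinct values.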
To copy information from a lower level to a level above the following SAS code can be used (or equivalent):
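PROC SORT DATA=IHS05CNC;
BY ABSHID ABSPID;
RUN;
DATA TINNITUS (KEEP=ABSHID ABSPID FLAG);
SET IHS05CNC;
BY ABSHID ABSPID;
RETAIN FLAG;
IF FIRST.ABSPID THEN FLAG=0; *reset flag at the start of each person;
IF ICD10D=57 THEN FLAG=1; *set flag for condition = tinnitus;
IF LAST.ABSPID THEN OUTPUT; *keep one record per person;
RUN;
PROC SORT DATA=IHS05PNC;
BY ABSHID ABSPID;
RUN;
DATA PERSON_TIN; *output dataset name is illustrative;
MERGE TINNITUS IHS05PNC;
BY ABSHID ABSPID;
RUN;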
The TINNITUS file, using the Condition level dataset, keeps the last record for each ABSPID, i.e. person, and sets the item FLAG to 1 if the person has tinnitus as a condition. This newly created flag is then merged on to the Person level file so that this item can now be cross classified or analysed with any other item on the Person level.
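For users working outside SAS, the same lower-to-higher copy might look like the following Python sketch. It is illustrative only: the records are made up, and setting FLAG to 0 for persons who have condition records but no tinnitus is a choice made in this sketch.

```python
TINNITUS_CODE = 57  # ICD10D code used for tinnitus in the SAS example

# Made-up Conditions level and (selected) Person level records.
conditions = [
    {"ABSHID": "H1", "ABSPID": 1, "ICD10D": 57},
    {"ABSHID": "H1", "ABSPID": 1, "ICD10D": 12},
    {"ABSHID": "H2", "ABSPID": 1, "ICD10D": 12},
]
persons = [
    {"ABSHID": "H1", "ABSPID": 1},
    {"ABSHID": "H2", "ABSPID": 1},
    {"ABSHID": "H3", "ABSPID": 1},  # no record on the Conditions level
]

# Collapse the condition records to one flag per person: 1 if any of the
# person's conditions is tinnitus, otherwise 0.
flag_by_person = {}
for c in conditions:
    key = (c["ABSHID"], c["ABSPID"])
    is_tinnitus = 1 if c["ICD10D"] == TINNITUS_CODE else 0
    flag_by_person[key] = max(flag_by_person.get(key, 0), is_tinnitus)

# Attach the flag to the person records. Persons without any condition
# record get None, which plays the role of a SAS missing value.
for p in persons:
    p["FLAG"] = flag_by_person.get((p["ABSHID"], p["ABSPID"]))
```

The flag can then be cross classified with any other item held on the person records.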
To copy information from a higher level to a level below the following SAS code can be used (or equivalent):
PROC SORT DATA=IHS05IDC;
BY ABSHID ABSPID;
RUN;
PROC SORT DATA=IHS05PNC;
BY ABSHID ABSPID;
RUN;
DATA INJURY_PERSON; *output dataset name is illustrative;
MERGE IHS05IDC (IN=A) IHS05PNC (KEEP=ABSHID ABSPID SEX AGECI IN=B);
BY ABSHID ABSPID;
IF A AND B THEN OUTPUT; *only keeps records which are present on both files;
RUN;
Unlike the previous merge, this merge will match one person record to many injury records. The statement "IF A AND B THEN OUTPUT;" ensures that only records present on both files are kept. If this statement were not used, person records without a corresponding injury record would appear with a missing value for all injury data items. Note that the data items copied from the (selected) Person level will now be duplicated as required to match the counting unit for the level they have been added to, injuries in this case (i.e. if a person was bruised and had an open wound, their data from the (selected) Person level will be attached to both types of injury).
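A rough Python equivalent of this one-to-many merge is sketched below. The records are made up, and the INJURY_TYPE item is a hypothetical label added only to make the duplication visible; SEX and AGECI values are placeholders rather than real codes.

```python
# Made-up (selected) Person level records.
persons = [
    {"ABSHID": "H1", "ABSPID": 1, "SEX": 1, "AGECI": 34},
    {"ABSHID": "H1", "ABSPID": 2, "SEX": 2, "AGECI": 29},  # no injury records
]
# Made-up Injury damage level records; INJURY_TYPE is hypothetical.
injuries = [
    {"ABSHID": "H1", "ABSPID": 1, "ABSJID": 1, "INJURY_TYPE": "bruise"},
    {"ABSHID": "H1", "ABSPID": 1, "ABSJID": 2, "INJURY_TYPE": "open wound"},
]

person_by_key = {(p["ABSHID"], p["ABSPID"]): p for p in persons}

# Keep only injury records that have a matching person record, and copy
# the selected person items onto each of them.
merged = []
for inj in injuries:
    person = person_by_key.get((inj["ABSHID"], inj["ABSPID"]))
    if person is not None:
        merged.append({**inj, "SEX": person["SEX"], "AGECI": person["AGECI"]})
```

Because the loop runs over injury records, the person with no injuries does not appear in the result, and the person items for ABSPID 1 are repeated on both injury records.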
For items with 'continuous' values, such as Personal gross weekly cash income, certain values are reserved as special codes and must not be added as if they were quantitative values. These values are at the upper range of items (for example, 999 for three digit items, 9999 for four digit items etc.) and are specifically identified in the data item list with their text meaning specified next to them as shown:
LABEL: PERSONAL GROSS WEEKLY CASH INCOME
999,995 Not stated
999,996 Not applicable
999,998 Not known
999,999 No source
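The following Python sketch illustrates why the reserved codes must be removed before any arithmetic. The four codes are those listed above for Personal gross weekly cash income; the income values themselves are made up.

```python
# Reserved codes listed for PERSONAL GROSS WEEKLY CASH INCOME.
SPECIAL_CODES = {999995, 999996, 999998, 999999}

# Made-up reported values, including two reserved codes.
incomes = [450, 999996, 1200, 999995, 0, 750]

# Drop the reserved codes before treating the values as quantities.
valid_incomes = [v for v in incomes if v not in SPECIAL_CODES]
mean_valid = sum(valid_incomes) / len(valid_incomes)

# For contrast: averaging everything treats the codes as dollar amounts.
naive_mean = sum(incomes) / len(incomes)
```

In this toy data the mean of the four valid values is 600, whereas the naive mean is inflated to more than 300,000 by the two reserved codes.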
MULTIPLE RESPONSE ITEMS
A number of the questions asked during the NATSIHS allowed respondents to give multiple responses. On the CURF, each response category for such questions is treated as a separate data item. An example of a multiple response item is the "How usually feel when treated badly because Aboriginal/Torres Strait Islander" data item which lists nine categories from DISCRQ3A (feel angry) through to DISCRQ3F (other feeling). Except for DISCRQ3A, output for each of these categories (for example DISCRQ3B) will contain two codes:
User note: as respondents can report more than one category for a multiple response item, the sum of responses across all categories can exceed the number of respondents for that item.
In most cases, multiple response items will have a number of categories falling into the first SAS category (denoted by an 'A' at the end of the fixed SAS name, e.g. DISCRQ3A). This category will contain the first multiple response category, as well as any special codes for the item (for example for DISCRQ3A these special codes are 7 None of the above, 97 Refusal, and 98 Not stated). When using data from these multiple response items, users should first confirm the placement of these special codes.
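The Python sketch below shows one way such items might be tabulated. It rests on loudly stated assumptions: the coding of the non-'A' categories is assumed here to be 0 when the category was not reported and non-zero when it was, and only three of the categories are shown. The special codes 7, 97 and 98 for DISCRQ3A are those quoted above; the records are made up. Users should check the data item list for the actual coding.

```python
# Special codes held in the 'A' item, as quoted in the text.
A_SPECIAL_CODES = {7, 97, 98}  # none of the above, refusal, not stated

ITEMS = ["DISCRQ3A", "DISCRQ3B", "DISCRQ3C"]

# Made-up respondents. ASSUMPTION: 0 means "category not reported".
respondents = [
    {"DISCRQ3A": 1, "DISCRQ3B": 2, "DISCRQ3C": 0},
    {"DISCRQ3A": 0, "DISCRQ3B": 2, "DISCRQ3C": 3},
    {"DISCRQ3A": 7, "DISCRQ3B": 0, "DISCRQ3C": 0},
    {"DISCRQ3A": 98, "DISCRQ3B": 0, "DISCRQ3C": 0},
]

def reported(record, item):
    """True if the category was reported; special codes do not count."""
    value = record[item]
    if item.endswith("A") and value in A_SPECIAL_CODES:
        return False
    return value != 0

# Responses per category, and respondents reporting at least one category.
responses = {item: sum(reported(r, item) for r in respondents) for item in ITEMS}
total_responses = sum(responses.values())
respondents_reporting = sum(any(reported(r, i) for i in ITEMS) for r in respondents)
```

With these records there are four responses from only two reporting respondents, and the two respondents holding special codes in DISCRQ3A are not counted as having reported the first category.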
USING REPEATING DATASETS
The 'one to many' relationships described by levels 4 to 7 are known as repeating datasets i.e. sets of data with a counting unit which may be repeated for a person. For example, a repeating dataset for conditions will have one record per condition reported because condition is the counting unit (see table example below). Repeating datasets are only useful when common information is collected for each instance of a counting unit. For example in the table below, each condition (ICD10D) reported has the data item "Whether condition a result of an injury" (WCONDRJ) associated with it. By using this item, a table can be run to ascertain which of the conditions reported are the result of an injury.
Example of 'Conditions' repeating dataset
To run the table mentioned above the following SAS code (or equivalent) can be used:
PROC FREQ DATA=IHS05CNC;
TABLES ICD10D*WCONDRJ; RUN; *condition by whether it is the result of an injury;
The following output would be produced for the example data set:
The condition is the result of an injury when WCONDRJ is code 1. The above example shows that one of the four conditions is the result of an injury. Note that although the output above only relates to a single person the totals are a count of all conditions for that person.
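A Python sketch of the same tabulation follows. The four records are made up to mirror the situation described (one person, four conditions, one of which is the result of an injury); the ICD10D values and the WCONDRJ code 2 are hypothetical, while code 1 is as stated above.

```python
from collections import Counter

# Made-up Conditions level records for a single person.
conditions = [
    {"ABSHID": "H1", "ABSPID": 1, "ICD10D": 57, "WCONDRJ": 2},
    {"ABSHID": "H1", "ABSPID": 1, "ICD10D": 12, "WCONDRJ": 1},  # result of injury
    {"ABSHID": "H1", "ABSPID": 1, "ICD10D": 30, "WCONDRJ": 2},
    {"ABSHID": "H1", "ABSPID": 1, "ICD10D": 41, "WCONDRJ": 2},
]

# One-way table of WCONDRJ: the counting unit is the condition.
table = Counter(c["WCONDRJ"] for c in conditions)
total_conditions = sum(table.values())

# The conditions that are the result of an injury (WCONDRJ = 1).
injury_conditions = [c["ICD10D"] for c in conditions if c["WCONDRJ"] == 1]

# The same records relate to a single person.
n_persons = len({(c["ABSHID"], c["ABSPID"]) for c in conditions})
```

The table total is four even though only one person is represented, because each record is a condition rather than a person.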
GEOGRAPHY AND SEIFA
The 2004-05 NATSIHS CURF contains three geographic field items: State/Territory of usual residence (STATEC); ASGC Remoteness Structure (ARIAC) and a combination of State/Territory by ASGC Remoteness Structure (REMSTAT). The CURF also contains one SEIFA index (see SEIFA Index section below).
STATEC identifies each state and territory separately except Tasmania and the ACT. Due to confidentiality considerations, the samples from Tasmania (876 records) and ACT (368 records) have been combined into a single category of Tas./ACT.
ARIAC has two output categories: non-remote and remote/very remote.
REMSTAT has thirteen output categories that comprise selected cross-classification of state/territory by remoteness where sample size permits. Output categories can be found in the data item list and in the table below.
Users of the 2004-05 NATSIHS CURF on RADL are advised that some items collected using CAI are only valid for restricted geographic output. This is because either the entire item was collected in CAI only (predominately used in non-community areas) or certain categories of the item were collected in CAI but not in PAPI (community areas). Due to confidentiality issues that arise because of the size of populations in the different states and territories, there are some restrictions on the geographic output for these items. The following section provides information on how to use the geographic variables ARIAC and REMSTAT in order to obtain valid output for these items.
There are 384 data items (excluding identifiers and weights) on the CURF, 100 of which are available for non-remote areas only. There are two reasons for restricting items to non-remote areas only:
Those data items and versions of data items restricted to non-remote only cannot produce results for total Australia. These items require the use of the ARIAC variable with results available for non-remote Australia only.
To obtain valid results for these non-remote restricted data items at state level, users must use the state by remoteness variable (REMSTAT) and use only the valid output categories shown in the table below. STATEC cannot be used for these data items.
(a) As only a very small portion of Vic is considered remote and enumeration did not occur in these areas, this represents Non-Remote.
(b) Includes Major cities, Inner regional area and Outer regional area.
(c) Includes Non-remote areas of SA, NT, and Tas/ACT.
(d) Includes Remote/Very Remote areas of NSW, SA, and Tas/ACT.
The 2004-05 NATSIHS CURF includes one index from the set of Socioeconomic Indexes For Areas (SEIFA), the Index of Relative Socio-Economic Disadvantage. This SEIFA index may not be used in conjunction with any other form of geography i.e. the index may not be used with STATEC, ARIAC or REMSTAT.
The SEIFA index is presented in deciles and is derived by simply grouping Collectors Districts (CDs) into 10 equal groups (i.e. equal number of CDs in each group) then matching the CDs of survey records to those groups. Because all CDs are not equal in size, and because the NATSIHS and NHS sample is not selected to ensure an equal distribution at the CD level, this method does not result in an equal number of people or households in each decile.
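The grouping described above can be illustrated with a short Python sketch. All CD names, index scores and sample records are made up; the point is only that equal numbers of CDs per decile do not imply equal numbers of sampled persons per decile.

```python
from collections import Counter

# Made-up index scores for 20 hypothetical Collectors Districts.
cd_scores = {f"CD{i:02d}": 800 + 10 * i for i in range(20)}

# Rank CDs by score and split them into 10 groups of equal size.
ranked = sorted(cd_scores, key=cd_scores.get)
n_cds = len(ranked)
decile_of_cd = {cd: (rank * 10) // n_cds + 1 for rank, cd in enumerate(ranked)}
cds_per_decile = Counter(decile_of_cd.values())

# Made-up survey records, identified here only by the CD they live in.
sample_cds = ["CD00", "CD00", "CD01", "CD05", "CD19", "CD19", "CD19"]

# Match each survey record to the decile of its CD.
persons_per_decile = Counter(decile_of_cd[cd] for cd in sample_cds)
```

Every decile contains two CDs, yet the sampled persons fall three in decile 1, one in decile 3 and three in decile 10, with none in the remaining deciles.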
It should be borne in mind that the characteristics indicated by the index relate to the area (in this case the CD) in which a population lives, not necessarily to all individuals who live in that area. It should also be further noted that the variables used to create the index are not necessarily the most appropriate for the Indigenous population, and being an area based index it is not an Indigenous specific disadvantage measure.
For further information regarding SEIFA indexes, see Census of Population and Housing: Socio-Economic Indexes for Areas (SEIFA), Australia - Technical Paper, 2001 (cat. no. 2039.0.55.001).
There are 25,511 non-Indigenous respondents from the 2004-05 NHS included on the NATSIHS CURF to enable comparisons with the non-Indigenous population. Data items from the NHS that are not comparable to the NATSIHS are not included on this CURF.
Person weights (including age standardised weights) are attached to non-Indigenous records. Note that weighted data derived from non-Indigenous NHS respondents will not equate to published NHS estimates (which do not distinguish on the basis of Indigenous status). Also, the different geographic scope of the NATSIHS and NHS results in comparisons only being available for non-remote or total Indigenous and non-Indigenous populations. Remote comparisons are not appropriate. As mentioned previously it is also not appropriate to produce total Australia data from this CURF. Total Australia data is available using the NHS Basic or Expanded CURF.
The Data Item list indicates data items that are common to both Indigenous and non-Indigenous respondents and those that relate to the Indigenous respondents only.
CALCULATING STANDARD ERRORS (SES) AND RELATIVE STANDARD ERRORS (RSES)
SEs and RSEs can be estimated directly from the 2004-05 NATSIHS CURF using the replicate weight method. The basic idea behind the replicate approach is to select subsamples repeatedly (for Indigenous it is 250 times, for non-Indigenous it is 60 times) from the whole sample. For each of these subsamples the statistic of interest is calculated. The variance of the full sample statistic is then estimated using the variability among the replicate statistics calculated from the subsamples.
There are various ways of creating replicate subsamples from the full sample. The replicate weights produced for the 2004-05 NATSIHS (and 2004-05 NHS) have been created using a group jackknife method of replication. The formulae for calculating the SE and RSE of an estimate using this method are:
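The formulae are not reproduced in this text. As a hedged sketch, the Python code below uses the usual delete-a-group jackknife form, SE(y) = sqrt((G-1)/G x sum over g of (y(g) - y)^2), where y is the full-sample estimate, y(g) is the estimate from replicate weight g and G is the number of replicate groups, together with RSE(y) = 100 x SE(y) / y. This form is an assumption of the sketch and should be checked against the published manual. The records, the item names (has_char, wt, rep_wts) and the use of only four replicates are made up for illustration.

```python
import math

def weighted_total(records, weight_of):
    """Sum a weight over the records that have the characteristic."""
    return sum(weight_of(r) for r in records if r["has_char"])

def jackknife_se(full_estimate, replicate_estimates):
    """Group jackknife SE (assumed form): sqrt((G-1)/G * sum((y_g - y)^2))."""
    g = len(replicate_estimates)
    squared_diffs = sum((y_g - full_estimate) ** 2 for y_g in replicate_estimates)
    return math.sqrt((g - 1) / g * squared_diffs)

def rse_percent(estimate, se):
    """Relative standard error expressed as a percentage of the estimate."""
    return 100.0 * se / estimate

# Made-up records with a main weight and four replicate weights each.
# The CURF itself carries 250 (Indigenous) or 60 (non-Indigenous) replicates.
records = [
    {"has_char": True,  "wt": 50.0, "rep_wts": [48.0, 52.0, 50.0, 50.0]},
    {"has_char": True,  "wt": 50.0, "rep_wts": [50.0, 50.0, 50.0, 50.0]},
    {"has_char": False, "wt": 70.0, "rep_wts": [70.0, 69.0, 71.0, 70.0]},
]
n_replicates = 4

# Full-sample estimate, then the same estimate under each replicate weight.
estimate = weighted_total(records, lambda r: r["wt"])
replicates = [
    weighted_total(records, lambda r, g=g: r["rep_wts"][g])
    for g in range(n_replicates)
]
se = jackknife_se(estimate, replicates)
rse = rse_percent(estimate, se)
```

The same pattern applies to any statistic: it is computed once with the main weight and once with each replicate weight, and the spread of the replicate results gives the SE.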
Proportions and percentages formed from the ratio of two estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. For proportions where the denominator is an estimate of the number of persons in a group and the numerator is the number of persons in a sub-group of the denominator group, the formula to approximate the RSE is given by:
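The formula is not reproduced in this text. The approximation commonly given in ABS publications for this case is RSE(x/y) = sqrt(RSE(x)^2 - RSE(y)^2), where x is the sub-group estimate and y is the group estimate; the Python sketch below assumes that form and uses made-up RSE values.

```python
import math

def rse_of_proportion(rse_numerator, rse_denominator):
    """Approximate RSE of x/y (assumed form), where x is a sub-group of y.

    Both arguments and the result are percentages.
    """
    if rse_numerator < rse_denominator:
        # The approximation only applies when the sub-group estimate is
        # no more precise than the estimate for the whole group.
        raise ValueError("RSE of numerator must not be below RSE of denominator")
    return math.sqrt(rse_numerator ** 2 - rse_denominator ** 2)

# Made-up RSEs: 5% for the sub-group estimate, 3% for the group estimate.
rse_prop = rse_of_proportion(5.0, 3.0)  # sqrt(25 - 9) = 4.0
```

The function refuses inputs where the numerator RSE is smaller than the denominator RSE, since the expression under the square root would be negative.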
The age structure of the Indigenous population is considerably younger than that of the non-Indigenous population. As age is strongly related to health, statistical comparisons between Australia's Indigenous and non-Indigenous populations which do not take account of age may be misleading. Age standardisation is recommended by the ABS to improve the validity of comparisons between Indigenous and non-Indigenous populations when analysing health data. An alternative technique for analysing characteristics in populations that have different age structures is to compare the distribution of the variable of interest by age group. For this approach, unadjusted (non age standardised) data could be output in 10 year or 20 year age ranges.
Different methods of age standardisation are appropriate for different types of data and different purposes. The 2004-05 NATSIHS CURF has an inbuilt age standardisation facility that uses the direct method of age standardisation based on the total Australian population at 30 June 2001. Age standardised results, and associated RSEs, can be generated by using the relevant set of age standardised weights in place of the main (unadjusted) weights.
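On the CURF the standardisation is carried in the weights, so no separate calculation is needed. The arithmetic behind the direct method can nevertheless be sketched in Python: age-specific rates are weighted by the age distribution of a standard population. All age groups, counts and the standard population below are made up and do not represent the 30 June 2001 population.

```python
def crude_rate(cases, population):
    """Overall rate, ignoring age structure."""
    return sum(cases.values()) / sum(population.values())

def directly_standardised_rate(cases, population, standard_population):
    """Weight each age-specific rate by the standard population's age group size."""
    total_standard = sum(standard_population.values())
    weighted = sum(
        (cases[age] / population[age]) * standard_population[age]
        for age in standard_population
    )
    return weighted / total_standard

# Made-up standard population and two made-up populations that share the
# same age-specific rates (0.1, 0.2, 0.5) but have different age structures.
standard = {"0-24": 350, "25-54": 420, "55+": 230}
young_pop = {"0-24": 600, "25-54": 300, "55+": 100}
young_cases = {"0-24": 60, "25-54": 60, "55+": 50}
older_pop = {"0-24": 300, "25-54": 400, "55+": 300}
older_cases = {"0-24": 30, "25-54": 80, "55+": 150}

crude_young = crude_rate(young_cases, young_pop)
crude_older = crude_rate(older_cases, older_pop)
std_young = directly_standardised_rate(young_cases, young_pop, standard)
std_older = directly_standardised_rate(older_cases, older_pop, standard)
```

The crude rates differ (0.17 against 0.26) purely because of age structure, while the standardised rates coincide, which is the sense in which standardisation makes the two populations comparable.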
Four different sets of age standardised weights (and their associated 250 replicate weights) are available depending on the disaggregation of the population of interest. These are:
AS_RE_WT - standardises estimates for the Indigenous population separately in remote and non-remote areas to the total population age structure by remoteness area as at 30 June 2001.
AS_ST_WT - standardises estimates for the Indigenous population separately in each state/territory as at 30 June 2001.
AS_SE_WT - standardises estimates separately for Indigenous males and females to the total population age structure by sex as at 30 June 2001.
A set of age standardised weights (and their associated 60 replicate weights) is also available for the non-Indigenous population from the NHS.
Age standardised results generated from the 2004-05 NATSIHS CURF do not provide a measure of the prevalence of a particular characteristic in the Indigenous population. Rather, they provide a means of comparing these data with results for another population (such as the non-Indigenous population) that has also been standardised to the same reference population. Users will need to apply their own methods when age standardising from other sources.
For further details regarding age standardised weights and the method used to produce the weights, please refer to Chapter 7 of the Users' Guide.
This page last updated 26 February 2014