4324.0.55.001 - Microdata: Australian Health Survey, National Health Survey, 2011-12
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 05/11/2014   

USING THE EXPANDED CURF


ABOUT THE EXPANDED CURF

The NHS 2011–12 Expanded Confidentialised Unit Record File (CURF) contains unit records relating to all households that contained at least one fully responding person. The data are released under the Census and Statistics Act 1905, which provides for the release of data in the form of unit records where the information is not likely to enable the identification of a particular person or organisation. Accordingly, the CURF contains no names or addresses of survey respondents, and other steps, including the following, have been taken to protect the confidentiality of respondents:

  • the level of detail of many data items has been reduced by grouping, ranging or top coding values
  • some unusual records have been changed to protect against identification
  • some data items that were collected have been excluded
  • income data have been perturbed.

Given the nature of the changes made and the relatively small number of records involved, their effect on the data for analysis purposes is considered negligible. However, these changes also mean that estimates produced from the Expanded CURF may differ from those published in Australian Health Survey: First Results, 2011-12 (cat. no. 4364.0.55.001) and subsequent publications, and from those produced using TableBuilder and/or the Basic CURF.

Detailed information about the data collected, comments regarding data quality and other points to assist in using and interpreting the data are contained in Australian Health Survey: Users' Guide, 2011-13 (cat. no. 4363.0.55.001).


ACCESSING EXPANDED CURFS

Expanded CURFs can only be accessed via the Remote Access Data Laboratory (RADL) and the DataLab. Users must have applied for use of the RADL and the DataLab prior to using the Expanded CURF microdata. Details on the RADL and the DataLab can be found on the Microdata Entry Page.


COUNTS AND WEIGHTS
NUMBER OF RECORDS BY LEVEL, NHS 2011-12 EXPANDED CURF

Level                               Record count (unweighted)   Weighted count (if applicable)
Household level                     15 565                      8 581 354
Persons in Household (All persons)  38 206                      N/A
Person level (Selected persons)     20 426                      22 105 281
Alcohol Day                         31 844                      N/A
Alcohol Type                        34 747                      N/A
Actions (Condition Group)           48 709                      N/A
Conditions level                    72 694                      N/A
Medication level                    49 172                      N/A
Biomedical level (Persons 5+)       20 426                      20 649 321



Weights and Hierarchical Files

Weight Variables
There are three weight variables on the file:

Household Weight (NHSFHHWT) - Household level - Benchmarked
Person Weight (NHSFINWT) - Selected Person level - Benchmarked to the total population.
Biomedical Weight (NHMSPERW) - Biomedical level - Benchmarked to the total population aged 5 years and over. Note that this level also contains records for people who did not participate in the biomedical component; their biomedical weight is set to 0, so they do not contribute to estimates. The biomedical weight should be used whenever biomedical data items are tabulated, whether with other items on the biomedical level or with items from other levels.

There is no weight associated with the Persons in Household level. This level is provided so that compositional information about each household can be derived (e.g. number of persons in the household aged 4-14 years). Such an item can then be used either with the household weight, to estimate for example the number of households with at least two persons aged 4-14 years, or with the person weight, to estimate the number of people living in households that contain at least two persons aged 4-14 years.
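The two uses of a compositional item described above can be sketched in Python. This is a minimal illustration on a handful of invented records held as plain dictionaries, not code that reads the CURF files: it assumes a hypothetical single-year `age` field on the Persons in Household level and uses made-up weight values, with the identifier and weight names taken from this page.

```python
# Toy data only: every value below is hypothetical. "age" stands in for
# whichever age item is used on the Persons in Household (All persons) level.
all_persons = [  # Persons in Household level: one record per household member
    {"ABSHHID": 1, "ABSAID": 1, "age": 40},
    {"ABSHHID": 1, "ABSAID": 2, "age": 9},
    {"ABSHHID": 1, "ABSAID": 3, "age": 12},
    {"ABSHHID": 2, "ABSAID": 1, "age": 70},
    {"ABSHHID": 3, "ABSAID": 1, "age": 35},
    {"ABSHHID": 3, "ABSAID": 2, "age": 6},
]
household_weight = {1: 500.0, 2: 800.0, 3: 650.0}  # ABSHHID -> NHSFHHWT
selected_persons = [  # Person level: a household can have more than one selected person
    {"ABSHHID": 1, "ABSPID": 1, "NHSFINWT": 1200.0},
    {"ABSHHID": 1, "ABSPID": 2, "NHSFINWT": 700.0},
    {"ABSHHID": 2, "ABSPID": 1, "NHSFINWT": 900.0},
    {"ABSHHID": 3, "ABSPID": 1, "NHSFINWT": 1500.0},
]


def count_aged_4_to_14(records):
    """Derive a compositional item: number of persons aged 4-14 in each household."""
    counts = {}
    for rec in records:
        counts.setdefault(rec["ABSHHID"], 0)  # every household appears, even with 0
        if 4 <= rec["age"] <= 14:
            counts[rec["ABSHHID"]] += 1
    return counts


composition = count_aged_4_to_14(all_persons)
qualifying = {hh for hh, n in composition.items() if n >= 2}

# With the household weight: households with at least two persons aged 4-14.
households_estimate = sum(household_weight[hh] for hh in qualifying)
# With the person weight: people living in such households.
persons_estimate = sum(
    p["NHSFINWT"] for p in selected_persons if p["ABSHHID"] in qualifying
)
```

Household 1 qualifies, so the household estimate uses its single household weight, while the person estimate adds the weights of both of its selected persons.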

The remaining levels (Alcohol Day, Alcohol Type, Actions, Conditions and Medication) also have no weights of their own, because a person can have more than one record on each of them. If, for example, NHSFINWT is merged onto the Conditions level, it will be attached to every condition record and will therefore be repeated for each condition a person has. This should be taken into account when producing tables. See 'Copying information across levels' below for more information.
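A minimal Python sketch, on invented records, of why a person weight copied onto the Conditions level must not simply be summed when counting persons: the weight is repeated once per condition record, so each person has to be counted once, here by de-duplicating on ABSHHID and ABSPID. All values are hypothetical.

```python
# Toy Person-level and Conditions-level records; all values are hypothetical.
persons = [
    {"ABSHHID": 1, "ABSPID": 1, "NHSFINWT": 1000.0},
    {"ABSHHID": 2, "ABSPID": 1, "NHSFINWT": 800.0},
]
conditions = [  # person (1, 1) has three condition records, person (2, 1) has one
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 1, "ABSCID": 1},
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 1, "ABSCID": 2},
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 2, "ABSCID": 1},
    {"ABSHHID": 2, "ABSPID": 1, "ABSGID": 1, "ABSCID": 1},
]

weight_by_person = {(p["ABSHHID"], p["ABSPID"]): p["NHSFINWT"] for p in persons}

# Copy the person weight onto every condition record (one person -> many conditions).
merged = [
    {**c, "NHSFINWT": weight_by_person[(c["ABSHHID"], c["ABSPID"])]}
    for c in conditions
]

# Not a count of persons: the weight of person (1, 1) is added three times.
naive_total = sum(r["NHSFINWT"] for r in merged)

# Persons with one or more conditions: each person's weight is added once.
people = {(r["ABSHHID"], r["ABSPID"]) for r in merged}
persons_total = sum(weight_by_person[k] for k in people)
```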

For more information about weights, see 'Reliability of Estimates' below.

Using Weights
The NHS is a sample survey, so weights must be used in calculations to produce estimates for the in-scope population. The 'Biomedical Weight (Benchmarked weight)' must be used for all tables that include a biomedical level data item, including tables that combine biomedical items with items from other levels. For the other levels, the weight applied (if any) determines what the resulting estimate represents, particularly for data items on levels that have no weight of their own, as shown below:


Household level
    Household Weight: Households with the specified characteristics.
    Person Weight: Persons in households with the specified characteristics.

Persons in Household (All persons)
    Household Weight: Households containing one or more persons with the specified characteristics.
    Person Weight: Persons in households containing one or more persons with the specified characteristics.

Person level (Selected persons)
    Household Weight: Households containing one or more selected persons with the specified characteristics.
    Person Weight: Persons with the specified characteristics.

Alcohol Day
    Household Weight: Households containing one or more selected persons with one or more alcohol days with the specified characteristics.
    Person Weight: Persons with one or more alcohol days with the specified characteristics.

Alcohol Type
    Household Weight: Households containing one or more selected persons with one or more alcohol types with the specified characteristics.
    Person Weight: Persons with one or more alcohol types with the specified characteristics.

Actions (Condition Group)
    Household Weight: Households containing one or more selected persons with one or more actions with the specified characteristics.
    Person Weight: Persons with one or more actions with the specified characteristics.

Conditions level
    Household Weight: Households containing one or more selected persons with one or more conditions with the specified characteristics.
    Person Weight: Persons with one or more conditions with the specified characteristics.

Medication level
    Household Weight: Households containing one or more selected persons with one or more medications with the specified characteristics.
    Person Weight: Persons with one or more medications with the specified characteristics.

Biomedical level
    Household Weight: Not applicable, because not all households contain at least one biomedical participant.
    Person Weight: Persons with the specified biomedical characteristics.*

*Note: To produce population estimates of persons aged 5 years and over with specified biomedical characteristics, Biomedical persons (Benchmarked weight) must be used rather than Persons (Benchmarked weight). Biomedical persons (Benchmarked weight) applies a weight of 0 to children under 5 years and to biomedical non-participants, so that they do not contribute to the population estimate.
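As an illustration of the note above, the following Python sketch uses invented Biomedical-level records with a hypothetical item `high_cholesterol`. The records for a child under 5 and a non-participant carry a biomedical weight of 0, so summing NHMSPERW automatically excludes them; summing NHSFINWT over the same records gives a different figure. The weights are made up and do not reflect real magnitudes.

```python
# Toy Biomedical-level records; "high_cholesterol" is a hypothetical biomedical item.
biomedical = [
    # child under 5: biomedical weight is 0
    {"ABSPID": 1, "NHSFINWT": 900.0, "NHMSPERW": 0.0, "high_cholesterol": None},
    # biomedical participant
    {"ABSPID": 2, "NHSFINWT": 1000.0, "NHMSPERW": 1600.0, "high_cholesterol": True},
    # biomedical non-participant: biomedical weight is 0
    {"ABSPID": 3, "NHSFINWT": 1100.0, "NHMSPERW": 0.0, "high_cholesterol": None},
    # biomedical participant
    {"ABSPID": 4, "NHSFINWT": 1200.0, "NHMSPERW": 2100.0, "high_cholesterol": False},
]


def weighted_total(records, weight, keep=lambda r: True):
    """Sum the named weight over the records selected by `keep`."""
    return sum(r[weight] for r in records if keep(r))


def has_item(r):
    return r["high_cholesterol"] is True


population_5_plus = weighted_total(biomedical, "NHMSPERW")  # zero weights drop out
with_item = weighted_total(biomedical, "NHMSPERW", has_item)
prevalence = with_item / population_5_plus

# Using the person weight instead gives a different (inappropriate) figure.
wrong_weight = weighted_total(biomedical, "NHSFINWT", has_item)
```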


IDENTIFIERS

Every record on each level of the file is uniquely identified.

The identifiers ABSHHID, ABSAID, ABSPID, ABSBID, ABSTID, ABSGID, ABSCID, ABSMID and ABSUID appear on all levels of the file. Where the information for the identifier is not relevant for a level, it has a value of 0. See the Data Item List for details on which ID equates to which level.

Each household has a unique thirteen-digit random identifier, ABSHHID*. This identifier appears on the household level and is repeated on every record, at every level, pertaining to that household. The combination of identifiers shown below uniquely identifies a record at each level.

1. Household = ABSHHID*
2. All Persons in Household = ABSHHID* + ABSAID
3. Selected Person = ABSHHID* + ABSPID
4. Alcohol Day = ABSHHID* + ABSPID + ABSBID
5. Alcohol Type = ABSHHID* + ABSPID + ABSBID + ABSTID
6. Actions = ABSHHID* + ABSPID + ABSGID
7. Conditions = ABSHHID* + ABSPID + ABSGID + ABSCID
8. Medication = ABSHHID* + ABSPID + ABSMID
9. Biomedical = ABSHHID* + ABSPID + ABSUID
*Note: the SAS name for the Household record identifier is ABSLID on the Basic CURF.

ABSHHID links people in the same household to one another and to household characteristics such as geography (located on the household level). The combination of ABSHHID, ABSPID, ABSGID and ABSCID identifies each individual condition record a person has. When merging data with a level above, only the identifiers relevant to the level above are required. When merging with a level below (for example, merging person level data onto the conditions level), the person level data will be duplicated for each condition. See 'Copying information across levels' below for more information.
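The identifier combinations listed above form a hierarchy in which the key of a level above is a leading subset of the key of the level below. A small Python helper can encode this. It is a sketch only: the level labels come from the list above, `record_key` and `link_key` are hypothetical names, and the household identifier `"HH001"` in the example is a placeholder rather than a real thirteen-digit ABSHHID.

```python
# Identifier combinations per level, transcribed from the list above.
LEVEL_KEYS = {
    "Household": ("ABSHHID",),
    "All Persons in Household": ("ABSHHID", "ABSAID"),
    "Selected Person": ("ABSHHID", "ABSPID"),
    "Alcohol Day": ("ABSHHID", "ABSPID", "ABSBID"),
    "Alcohol Type": ("ABSHHID", "ABSPID", "ABSBID", "ABSTID"),
    "Actions": ("ABSHHID", "ABSPID", "ABSGID"),
    "Conditions": ("ABSHHID", "ABSPID", "ABSGID", "ABSCID"),
    "Medication": ("ABSHHID", "ABSPID", "ABSMID"),
    "Biomedical": ("ABSHHID", "ABSPID", "ABSUID"),
}


def record_key(level, record):
    """Tuple of identifier values that uniquely identifies a record on `level`."""
    return tuple(record[name] for name in LEVEL_KEYS[level])


def link_key(record, from_level, to_level):
    """Identifiers needed to link a record on `from_level` to a level above it."""
    upper = LEVEL_KEYS[to_level]
    if LEVEL_KEYS[from_level][: len(upper)] != upper:
        raise ValueError(f"{to_level} is not above {from_level}")
    return tuple(record[name] for name in upper)


# A condition record; identifiers not relevant to this level are 0.
condition = {"ABSHHID": "HH001", "ABSAID": 0, "ABSPID": 1, "ABSBID": 0,
             "ABSTID": 0, "ABSGID": 3, "ABSCID": 2, "ABSMID": 0, "ABSUID": 0}
```

For example, `link_key(condition, "Conditions", "Actions")` uses only ABSHHID, ABSPID and ABSGID, while asking for the Alcohol Day level raises an error because that level is not above Conditions.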

Copying information across levels

The following SAS code is an example of copying information from a lower level to a level above:
    PROC SORT DATA=NHS11E.NHS11ECN OUT=SORTED_ECN; /* Condition level */
    BY ABSHHID ABSPID ABSGID;
    RUN;

    DATA TOT_LTC (KEEP=ABSHHID ABSPID ABSGID LONGTERM);
    SET SORTED_ECN;
    BY ABSHHID ABSPID ABSGID; /* Process the Condition records within each unique combination of ABSHHID, ABSPID and ABSGID */
    RETAIN LONGTERM;

    IF FIRST.ABSGID THEN
    DO;
    LONGTERM=0; /* Create the new variable and reset it to 0 at the start of each combination */
    END;

    IF CONDSTAT=1 THEN LONGTERM=LONGTERM+1; /* Count the number of diagnosed long-term conditions */

    IF LAST.ABSGID THEN OUTPUT; /* Output the last record of each combination, holding its final count */
    RUN;

    PROC SORT DATA=NHS11E.NHS11EAC OUT=SORTED_EAC; /* Actions level - the level above Condition */
    BY ABSHHID ABSPID ABSGID;
    RUN;

    DATA MRGFILES;
    MERGE TOT_LTC SORTED_EAC;
    BY ABSHHID ABSPID ABSGID;
    RUN;

    PROC FREQ DATA=MRGFILES; /* Sample counts of Actions (condition group) records by number of diagnosed long-term conditions */
    TABLES LONGTERM /NOCOL NOROW NOCUM NOPERCENT;
    RUN;

The new variable LONGTERM holds the number of diagnosed long-term conditions belonging to each Actions record. For example, a person with two current, diagnosed, long-term cardiovascular conditions (on the Conditions level) would have a value of '2' for LONGTERM on their cardiovascular Actions record.
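For users working outside SAS, the same lower-to-higher aggregation can be sketched in Python on plain dictionaries. The records are invented; CONDSTAT=1 is the code counted in the SAS example, and CONDSTAT=2 is a hypothetical stand-in for any other value. One difference from the SAS code: here an Actions record with no Condition records at all gets LONGTERM=0, whereas the SAS MERGE would leave LONGTERM missing for it.

```python
from collections import Counter

# Toy records; CONDSTAT=2 is a hypothetical placeholder for "not counted".
conditions = [
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 5, "CONDSTAT": 1},
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 5, "CONDSTAT": 1},
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 7, "CONDSTAT": 2},
    {"ABSHHID": 2, "ABSPID": 1, "ABSGID": 5, "CONDSTAT": 1},
]
actions = [
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 5},
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 7},
    {"ABSHHID": 2, "ABSPID": 1, "ABSGID": 5},
]


def actions_key(record):
    """Identifiers of the Actions record that a record belongs to."""
    return (record["ABSHHID"], record["ABSPID"], record["ABSGID"])


# Count qualifying Condition records per Actions record (the DATA step).
longterm = Counter(actions_key(c) for c in conditions if c["CONDSTAT"] == 1)

# Attach the count to each Actions record (the MERGE step); no match gives 0.
merged = [{**a, "LONGTERM": longterm.get(actions_key(a), 0)} for a in actions]

# Unweighted frequency of LONGTERM values (the PROC FREQ step).
freq = Counter(r["LONGTERM"] for r in merged)
```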

The following SAS code is an example of copying information from a higher level to a level below:
    PROC SORT DATA=NHS11E.NHS11ESP OUT=SORTED_PERSON (KEEP=ABSHHID ABSPID AGEC SEX);
    BY ABSHHID ABSPID;
    RUN;

    PROC SORT DATA=NHS11E.NHS11EAC OUT=SORTED_ACTIONS;
    BY ABSHHID ABSPID;
    RUN;

    DATA MRGFILES;
    MERGE SORTED_ACTIONS SORTED_PERSON;
    BY ABSHHID ABSPID;
    RUN;

This merge matches one Person record to many Actions records, so the data items copied from the person level (AGEC and SEX in the example) are repeated for each record of the level they have been added to, in this case Actions. Each Actions record therefore receives the AGEC and SEX of the person it belongs to.
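The one-to-many merge can also be sketched in Python with a check that the upper level really has one record per key; if it did not, the merge would multiply records rather than copy values down. This is an illustrative helper (`merge_down` is a hypothetical name) on invented records with toy AGEC and SEX values. Unlike the SAS MERGE above, which also keeps Person records that have no Actions records, it keeps only the lower-level records.

```python
def merge_down(lower, upper, keys):
    """Copy fields from `upper` (one record per key) onto each matching `lower` record."""
    index = {}
    for rec in upper:
        k = tuple(rec[name] for name in keys)
        if k in index:  # the "one" side of a one-to-many merge must be unique
            raise ValueError(f"duplicate key on upper level: {k}")
        index[k] = rec
    merged = []
    for rec in lower:
        parent = index.get(tuple(rec[name] for name in keys), {})
        merged.append({**parent, **rec})  # the lower record's own values win on a name clash
    return merged


persons = [  # Person level (toy values)
    {"ABSHHID": 1, "ABSPID": 1, "AGEC": 42, "SEX": 2},
    {"ABSHHID": 2, "ABSPID": 1, "AGEC": 67, "SEX": 1},
]
actions = [  # Actions level: person (1, 1) has two records
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 5},
    {"ABSHHID": 1, "ABSPID": 1, "ABSGID": 7},
    {"ABSHHID": 2, "ABSPID": 1, "ABSGID": 5},
]
merged = merge_down(actions, persons, ("ABSHHID", "ABSPID"))
```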

For more information regarding merges across levels (including sample SPSS and Stata code) see SAMPLE CODE AND USING CURFS.


MULTI-RESPONSE ITEMS


A number of questions in the survey allowed respondents to provide more than one response. Each response category of these multi-response data items is treated as a separate data item. On the CURF, these data items share the same identifier (SAS name) prefix, and each has a letter suffix: A for the first response category, B for the second, C for the third and so on.

For example, the multi-response data item 'Days in last week consumed alcohol' has seven response categories (excluding 'Not applicable' and 'No alcohol consumed in last week'), stored as seven data items named ALCDYWA, ALCDYWB, ALCDYWC...ALCDYWG. Each data item in the series has either a positive response code or a null response code, with the exception of the first item in the series, ALCDYWA. ALCDYWA has four possible response codes: the positive response code 1 ('Monday'), the null response code 0, and two additional codes, 8 ('No alcohol consumed in last week') and 9 ('Not applicable'). The remaining items, ALCDYWB to ALCDYWG, each have only the two response codes. The Data Item List identifies all multi-response items and gives the code for each response category.

Note that the sum of the individual multi-response categories can be greater than the population applicable to the data item, because respondents are able to select more than one response.
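A minimal Python sketch of decoding the ALCDYWA to ALCDYWG series for each person, based only on the codes described above: ALCDYWA is positive only when it equals 1, and ALCDYWB to ALCDYWG are positive whenever they are not 0. The positive codes used for ALCDYWF and ALCDYWG in the toy records (6 and 7) are placeholders; the real codes are in the Data Item List. The unweighted counts also illustrate the note above: four category responses come from only two persons who drank.

```python
from collections import Counter


def positive_items(record):
    """Names of the ALCDYW* items carrying a positive response for one person."""
    items = []
    if record["ALCDYWA"] == 1:  # 1 = 'Monday'; 0, 8 and 9 are not positive responses
        items.append("ALCDYWA")
    for suffix in "BCDEFG":
        name = "ALCDYW" + suffix
        if record[name] != 0:  # B to G hold only a positive code or the null code 0
            items.append(name)
    return items


def toy_record(a, **others):
    """Build a toy record; unspecified ALCDYWB to ALCDYWG default to 0 (null)."""
    rec = {"ALCDYW" + s: 0 for s in "BCDEFG"}
    rec["ALCDYWA"] = a
    rec.update(others)
    return rec


# Positive codes 6 and 7 below are hypothetical placeholders.
people = [
    toy_record(1, ALCDYWF=6, ALCDYWG=7),  # drank on three days
    toy_record(8),                        # no alcohol consumed in last week
    toy_record(0, ALCDYWF=6),             # drank on one day
    toy_record(9),                        # not applicable
]

category_counts = Counter(item for p in people for item in positive_items(p))
drinkers = sum(1 for p in people if positive_items(p))
```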


RELIABILITY OF ESTIMATES

As the survey was conducted on a sample of private households in Australia, it is important to take account of the method of sample selection when deriving estimates from the CURF. This is particularly important as a person's chance of selection in the survey varied depending on the state or territory in which the person lived. If these chances of selection are not accounted for by use of appropriate weights, the results will be biased. For details on the weighting process, see Weighting, Benchmarks and Estimation procedures in Australian Health Survey: Users' Guide, 2011-13 (cat. no. 4363.0.55.001).

Each person record has a main weight (NHSFINWT). This weight indicates how many population units each sample unit represents. When producing estimates of sub-populations from the CURF, it is essential that they are calculated by adding the weights of persons in each category, not just by counting the sample number in each category. If each person's weight were ignored when analysing the data to draw inferences about the population, no account would be taken of a person's chance of selection or of different response rates across population groups, and the resulting estimates could be biased. The application of weights ensures that estimates conform to an independently estimated distribution of the population by age, sex, etc., rather than to the distributions within the sample itself.
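The difference between counting sample records and adding weights can be seen in a small Python sketch on three invented Person-level records, using a hypothetical item `daily_smoker`:

```python
# Toy Person-level records; "daily_smoker" is a hypothetical data item.
persons = [
    {"NHSFINWT": 100.0, "daily_smoker": True},
    {"NHSFINWT": 100.0, "daily_smoker": True},
    {"NHSFINWT": 800.0, "daily_smoker": False},
]

# Counting sample records (not a population estimate).
sample_count = sum(1 for p in persons if p["daily_smoker"])
unweighted_share = sample_count / len(persons)

# Adding weights: the number of persons the sample records represent.
estimate = sum(p["NHSFINWT"] for p in persons if p["daily_smoker"])
population = sum(p["NHSFINWT"] for p in persons)
weighted_share = estimate / population
```

Here two of the three sample records are daily smokers, but they represent only 200 of the 1,000 persons, so the weighted share is 0.2 rather than about two thirds.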

Each person record on the CURF contains 60 replicate weights in addition to the main weight. Replicate weights can be used to calculate measures of sampling error. For details on sampling error calculations and replicate weights, see the Technical Note in the Australian Health Survey: Users' Guide, 2011-13 (cat. no. 4363.0.55.001).
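As a sketch only: a common way to use 60 replicate weights is the delete-a-group jackknife, in which the estimate Y is recomputed with each replicate weight to give Y_1 ... Y_60, and SE(Y) = sqrt((59/60) x sum over g of (Y_g - Y)^2). This formula is assumed here and should be checked against the Technical Note before use; the function names are hypothetical and the replicate estimates below are invented.

```python
import math


def replicate_se(full_estimate, replicate_estimates):
    """Delete-a-group jackknife standard error (assumed formula, see lead-in)."""
    g = len(replicate_estimates)  # 60 replicate weights on this CURF
    squared = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt((g - 1) / g * squared)


def relative_se(full_estimate, replicate_estimates):
    """Relative standard error, as a percentage of the estimate."""
    return 100 * replicate_se(full_estimate, replicate_estimates) / full_estimate


# Hypothetical numbers: each replicate estimate would be the same weighted total
# recomputed with one of the 60 replicate weights in place of the main weight.
full = 1000.0
replicates = [1010.0] * 30 + [990.0] * 30
se = replicate_se(full, replicates)
rse = relative_se(full, replicates)
```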


CODING ISSUES

The following three coding issues affect the Expanded CURF:

1) Approximately 4,210,000 persons aged 0 to 14 years are coded to 'Not stated' rather than 'Not applicable' for the data item 'Gross weekly personal income in deciles'. Users can correct this problem by using 'Age of person' to restrict tables with this data item to only include persons aged 15 years and over.

2) Approximately 249,000 persons aged 15 years are coded to 'Not applicable' rather than 'Never smoked daily' for 'Duration of daily smoking - years'. Users can correct this problem by using 'Age of person' to move 15 year olds from 'Not applicable' to 'Never smoked daily'.

3) Issues have been identified with the coding of the 2011-12 data for the conditions 'Back pain/problem, disc disorder', 'Diseases of the digestive system', 'Symptoms, signs and conditions not elsewhere classified', 'Rheumatism' and 'Other diseases of the musculoskeletal system and connective tissue'. Analysis indicates that at the national level:
    • 'Back pain/problem, disc disorder' was under-reported by approximately 545,000 people
    • 'Diseases of the digestive system' was over-reported by approximately 230,000 people
    • 'Symptoms, signs and conditions not elsewhere classified' was over-reported by approximately 255,000 people
    • 'Rheumatism' was over-reported as a 'Current and long term' condition by approximately 145,000 people
    • 'Other diseases of the musculoskeletal system and connective tissue' was over-reported as a 'Current and long term' condition by approximately 275,000 people.

Therefore, 2011-12 data for these conditions are not comparable with other years. However, 2014-15 data have been correctly coded.
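Corrections 1) and 2) can be expressed as simple record-level rules. The Python sketch below is illustrative only: it assumes a single-year `age` field (the CURF's 'Age of person' item may be grouped, in which case the conditions must be adapted), the field name `smoking_duration` is hypothetical, and the category labels stand in for the actual codes in the Data Item List. Issue 3) is not addressed here.

```python
# Placeholder category labels; use the codes given in the Data Item List.
NOT_APPLICABLE = "Not applicable"
NEVER_SMOKED_DAILY = "Never smoked daily"


def income_universe(persons):
    """Issue 1: tabulate income deciles only for persons aged 15 years and over."""
    return [p for p in persons if p["age"] >= 15]


def fix_smoking_duration(person):
    """Issue 2: move 15-year-olds from 'Not applicable' to 'Never smoked daily'."""
    if person["age"] == 15 and person["smoking_duration"] == NOT_APPLICABLE:
        return {**person, "smoking_duration": NEVER_SMOKED_DAILY}
    return person


persons = [  # toy records
    {"age": 12, "smoking_duration": NOT_APPLICABLE},
    {"age": 15, "smoking_duration": NOT_APPLICABLE},
    {"age": 40, "smoking_duration": "10 years"},
]
income_records = income_universe(persons)
fixed = [fix_smoking_duration(p) for p in persons]
```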


EXPANDED CURF FILES

SAS files
These files contain the data for the CURF in SAS format.

NHS11EHH.sas7bdat contains the Household level data
NHS11EAP.sas7bdat contains the Persons in Household level data (All Persons)
NHS11ESP.sas7bdat contains the Person level data (Selected Person)
NHS11EA3.sas7bdat contains the Alcohol Day level data
NHS11E14.sas7bdat contains the Alcohol Type level data
NHS11EAC.sas7bdat contains the Actions level data
NHS11ECN.sas7bdat contains the Condition level data
NHS11EMD.sas7bdat contains the Medication level data
NHS11EBI.sas7bdat contains the Biomedical level data

SPSS files
These files contain the data for the CURF in SPSS format.

NHS11EHH.sav contains the Household level data
NHS11EAP.sav contains the Persons in Household level data (All Persons)
NHS11ESP.sav contains the Person level data (Selected Person)
NHS11EA3.sav contains the Alcohol Day level data
NHS11E14.sav contains the Alcohol Type level data
NHS11EAC.sav contains the Actions level data
NHS11ECN.sav contains the Condition level data
NHS11EMD.sav contains the Medication level data
NHS11EBI.sav contains the Biomedical level data


Stata files
These files contain the data for the CURF in Stata format.

NHS11EHH.dta contains the Household level data
NHS11EAP.dta contains the Persons in Household level data (All Persons)
NHS11ESP.dta contains the Person level data (Selected Person)
NHS11EA3.dta contains the Alcohol Day level data
NHS11E14.dta contains the Alcohol Type level data
NHS11EAC.dta contains the Actions level data
NHS11ECN.dta contains the Condition level data
NHS11EMD.dta contains the Medication level data
NHS11EBI.dta contains the Biomedical level data

Information files
FORMATS.sas7bcat is a SAS catalog containing the formats

Frequency files
The following plain text format files contain data item code values and category labels at each level.

ECURF NHS11E Household Freq.txt contains weighted and unweighted frequencies for Household level items
ECURF NHS11E Persons in Household Freq.txt contains unweighted frequencies for Persons in Household level items
ECURF NHS11E Person Freq.txt contains weighted and unweighted frequencies for Person level items
ECURF NHS11E Alcohol Day Freq.txt contains unweighted frequencies for Alcohol Day level items
ECURF NHS11E Alcohol Type Freq.txt contains unweighted frequencies for the Alcohol Type level items
ECURF NHS11E Actions Freq.txt contains the unweighted frequencies for the Actions level items
ECURF NHS11E Condition Freq.txt contains unweighted frequencies for Condition level items
ECURF NHS11E Medication Freq.txt contains unweighted frequencies for Medication level items
ECURF NHS11E Biomedical Weighted Freq.txt contains weighted frequencies for the Biomedical level items
ECURF NHS11E Biomedical Unweighted Freq.txt contains unweighted frequencies for the Biomedical level items