Microdata and TableBuilder: National Health Survey

Provides data from the National Health Survey for key health statistics, including long-term health conditions and health risk factors.

Introduction

The National Health Survey (NHS) is collected every three years and is designed to provide a range of information about the health of Australians. It provides data such as prevalence of chronic and long-term health conditions, self-reported health status and health risk factors. This information can be cross classified by selected demographic and socioeconomic characteristics.

This product provides information about the microdata releases from the most recent NHS cycles, 2020-21 and 2017-18, including details about the data files and how to use the different microdata products. Data Item Lists, information about the survey methodology, and links to microdata for previous NHS releases (2014-15 and prior) are also provided.

It is important to note that the 2020-21 NHS data should be considered a break in time series from previous NHS collections and used for point-in-time national analysis only. The survey was conducted during the COVID-19 pandemic using an online, self-complete form, a change that significantly affected both data collection and survey estimates. For more information, see 2020-21 Methodology information.

Available products

  • TableBuilder - an online tool for creating tables and graphs. This product is available for NHS cycles from 2011-12 to 2017-18. A dataset for 2020-21 will be available in an upcoming release. For more information, see the TableBuilder page.
  • Basic microdata - approved users can download and analyse unit record data in their own environment. This product is available for NHS cycles from 1977-78 to 2017-18. It is not available for the 2020-21 NHS. For more information, see the MicrodataDownload page.
  • Detailed microdata - approved users can access DataLab for in-depth and interactive data analysis using a range of statistical software packages. This product is available for NHS cycles from 2001 to 2020-21. For more information, including prerequisites for DataLab access, see the DataLab page.

File structure

Datasets from the NHS are hierarchical in nature. A hierarchical data file is an efficient means of storing and retrieving information that describes one-to-many or many-to-many relationships. For example, a person may report multiple days on which alcohol was consumed and multiple types of alcoholic beverages on each of these days.

Data about households and families are contained as individual characteristics on person records. While estimates are also available at the household level, estimates at the family level are not available from this survey. The data items and related output categories are described in Excel spreadsheets from the Data Item Lists section.

2020-21 NHS file structure

The following table shows the levels available in the microdata products and the information contained on those levels:

| Level name | Information contained on level |
| 1. Household | Geographic classifications, household size and structure, dwelling characteristics and household income details. |
| 2. Selected Person | Demographic and socioeconomic characteristics of survey respondents, as well as health, health risks and related information provided by respondents. |
| 3. Alcohol – Day consumed | Total daily alcohol consumption of the three most recent days alcohol was consumed by the respondent in the last week. |
| 4. Alcohol – Day/Type consumed | Broad types of alcoholic beverages consumed, including quantities, on the three most recent days of alcohol consumption in the last week. |
| 5. Conditions | Health conditions and status. |
| 6. Physical Activity - Day | Types of daily physical activity and duration. |

The following table shows the hierarchical file structure and the relationship between each level:

| Level 1 | Level 2 | Level 3 | Level 4 | Relationship type |
| Household | | | | One record per in scope household |
| | Selected Persons | | | Up to two selected person records per household (1 adult and 1 child) |
| | | Conditions | | One Conditions record for each reported condition for each selected person record |
| | | Physical Activity - Day | | Seven Physical activity day records per selected person aged 15 years and over |
| | | Alcohol – Day consumed | | Up to three Alcohol – day consumed records per selected person 15 years and older (children 0-14 years were out of scope for the alcohol module) |
| | | | Alcohol – Day/Type consumed | Up to 17 Alcohol – Type records per Alcohol – Day consumed record |

2017-18 NHS file structure

The following table shows the levels available in the microdata products and the information contained on those levels:

| Level name | TableBuilder | Basic microdata | DataLab | Information contained on level |
| 1. Household | X | X | X | Geographic classifications, household size and structure, dwelling characteristics and household income details |
| 2. Selected person | X | X | X | Demographic and socio-economic characteristics of survey respondents, and most of the health, health risks and related information they provided |
| 3. Alcohol - Day consumed | X | X | X | Alcohol consumption on the three most recent days on which respondents reported consuming alcohol and the order of consumption |
| 4. Alcohol - Type consumed | X | X | X | Order of consumption, and the broad alcohol types and quantities for each type consumed on those days |
| 5. Conditions | X | X | X | Information about health conditions reported by respondents |
| 6. Medications | X | X | X | Information on medications reported by respondents |
| 7. Health literacy | X | X | X | Information on Health literacy reported by respondents |

The following table shows the hierarchical file structure and the relationship between each level:

| Level 1 | Level 2 | Level 3 | Level 4 | Relationship type |
| Household | | | | One record per in scope household |
| | Selected Persons | | | Up to two selected person records per household (1 adult and 1 child) |
| | | Health Literacy | | One health literacy record for each person who responded to the health literacy survey |
| | | Conditions | | One Conditions record for each reported condition for each selected person record |
| | | Medications | | One Medications record for each reported medication/supplement for each selected person record |
| | | Alcohol - Day consumed | | Up to three Alcohol - day consumed records per selected person 15 years and older (children 0-14 years were out of scope for the alcohol module) |
| | | | Alcohol - Type consumed | Up to 13 Alcohol - type records per Alcohol - day consumed record |

Counts and weights

Number of records by level, NHS 2020-21 microdata
| Level | Record counts (unweighted) | Weighted counts (if applicable) |
| Household | 10,132 | 9,782,954 |
| Person (Selected persons) | 13,281 | 24,995,375 |
| Alcohol Day | 17,687 | N/A |
| Alcohol Type | 20,086 | N/A |
| Conditions | 54,311 | N/A |
| Physical Activity Day | 77,697 | N/A |

Number of records by level, NHS 2017-18 microdata
| Levels | Record counts (Unweighted) | Weighted counts (if applicable) |
| Household level | 16,376 | 9,268,534 |
| Person level (Selected persons) | 21,315 | 24,103,016 |
| Alcohol Day | 27,848 | N/A |
| Alcohol Type | 30,343 | N/A |
| Conditions level | 87,107 | N/A |
| Medications level | 52,901 | N/A |
| Health Literacy level | 5,790 | 18,655,100 |

Weight variables

For 2020-21 NHS, there are two weight variables on the file:

  • Household Weight (NHSHHWT) - Household level - Benchmarked
  • Person Weight (NHSFINWT) - Selected Person level - Benchmarked to the total population.

For 2017-18 NHS, there are three weight variables on the file:

  • Household Weight (NHSFHHWT) - Household level - Benchmarked
  • Person Weight (NHIFINWT) - Selected Person level - Benchmarked to the total population.
  • Health Literacy Person Weight (HLSFINWT) - Health Literacy level - Benchmarked to the total population 18 years and over.

The other levels have no weight of their own, because a person can have more than one record on those levels. If, for example, NHSFINWT is merged onto the Conditions level, it is attached to every condition record and is therefore repeated for each person who has more than one condition. Take this into account when producing tables.
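The repetition issue can be illustrated with a minimal sketch in plain Python. The records are made up, and the person identifier name ABSPID and the CONDITION item are hypothetical names used only for illustration; ABSHIDD and NHSFINWT are the names given in this document.

```python
# Toy Conditions-level records after NHSFINWT has been merged on.
# ABSPID and CONDITION are hypothetical names; all values are made up.
conditions = [
    {"ABSHIDD": "H1", "ABSPID": 1, "CONDITION": "asthma", "NHSFINWT": 1500.0},
    {"ABSHIDD": "H1", "ABSPID": 1, "CONDITION": "arthritis", "NHSFINWT": 1500.0},
    {"ABSHIDD": "H2", "ABSPID": 1, "CONDITION": "asthma", "NHSFINWT": 2000.0},
]

# Naive: summing the weight over condition records counts person H1/1 twice.
naive_total = sum(r["NHSFINWT"] for r in conditions)

# Safer: keep one weight per person (household ID + person ID) before summing.
person_weight = {}
for r in conditions:
    person_weight[(r["ABSHIDD"], r["ABSPID"])] = r["NHSFINWT"]
persons_with_condition = sum(person_weight.values())

print(naive_total)             # 5000.0
print(persons_with_condition)  # 3500.0
```

The first figure is a weighted count of condition records, not of people; only the second can be read as an estimate of persons with at least one condition.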

Using weights

The NHS is a sample survey, so to produce estimates for the in-scope population you must use weight fields in your calculations. When analysing a Household level item at the household level, use the household weight. This applies, for example, if you want to know the number of households in a state rather than the number of persons living in that state.

Caution should be used when applying the ‘Household’ weight to items from other levels. For example, if the household weight is applied to a selected person level demographic item, such as ‘Sex’, your table will show the number of households with one or more selected persons of that sex. Since up to two people can be selected per household in the NHS, some households will be counted twice, once for the selected adult and once for the selected child, if both are the same sex.
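As a rough illustration of this double counting, the sketch below applies a household weight to person-level records. It uses made-up data in plain Python; the item name SEX and the data layout are assumptions, while ABSHIDD and NHSHHWT are the names given in this document for 2020-21.

```python
from collections import defaultdict

# Toy household level: one record per household, with made-up weights.
households = {"H1": {"NHSHHWT": 900.0}, "H2": {"NHSHHWT": 1100.0}}

# Toy selected person level; SEX is a hypothetical item name.
persons = [
    {"ABSHIDD": "H1", "SEX": "Female"},  # selected adult
    {"ABSHIDD": "H1", "SEX": "Female"},  # selected child, same household
    {"ABSHIDD": "H2", "SEX": "Male"},
]

# Household-level estimate: each household contributes its weight once.
total_households = sum(h["NHSHHWT"] for h in households.values())

# Household weight applied to person records: H1 is added twice under Female.
by_sex = defaultdict(float)
for p in persons:
    by_sex[p["SEX"]] += households[p["ABSHIDD"]]["NHSHHWT"]

print(total_households)  # 2000.0
print(dict(by_sex))      # {'Female': 1800.0, 'Male': 1100.0}
```

The categories sum to 2,900 although only 2,000 households are represented, because household H1 appears once for each of its two selected persons.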

File content

Available data items

Data items include:

  • Demographics - Age, Sex, Country of Birth, Main language spoken, Marital status
  • Household details - Type, Size, Household composition, Tenure, SEIFA, Geography
  • Labour force status
  • Educational attainment
  • Personal and Household Income
  • Migrant and Visa status
  • Self-assessed health status
  • Self-reported height, weight and body mass
  • Long-term health conditions such as arthritis, asthma, cancer, diabetes, hypertension, cardiovascular disease, kidney disease etc
  • Risk factors such as tobacco smoking, e-cigarettes/vaping, alcohol consumption, fruit and vegetable consumption, sugar sweetened and diet drink consumption, and physical activity
  • Health service use
  • Bodily pain
  • Psychological distress

No physical measurements, such as blood pressure, height, weight and waist, were collected in the 2020-21 NHS.

The Data Item Lists in the Data downloads section are the definitive source of available data items and categories.

Identifiers

Every record on each level of the file is uniquely identified. See Data Item Lists for details on which ID equates to which level.

Each household has a unique random identifier, ABSHIDD. This identifier appears on the household level and is repeated on every record, at every level, that pertains to that household. A record at a particular level is uniquely identified by the combination of the identifier for that level and the identifiers for all levels above it in the hierarchical structure. For example, each record on the conditions level is uniquely identified by a combination of the Household, Person and Conditions level identifiers.

The Household record identifier, ABSHIDD, can be used to link people from the same household, and to attach household characteristics such as geography (located on the household level) to the Person records. When merging data with a level above, only those identifiers relevant to the level above are required.
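A minimal sketch of such a merge, in plain Python with made-up records, is shown here. Only ABSHIDD is a name taken from this document; the person identifier ABSPID and the items STATE and AGE are hypothetical, so check the Data Item Lists for the actual names.

```python
# Household level: one record per ABSHIDD. STATE is a hypothetical item name.
households = [
    {"ABSHIDD": "H1", "STATE": "NSW"},
    {"ABSHIDD": "H2", "STATE": "Vic."},
]

# Selected person level. ABSPID and AGE are hypothetical names.
persons = [
    {"ABSHIDD": "H1", "ABSPID": 1, "AGE": 42},
    {"ABSHIDD": "H1", "ABSPID": 2, "AGE": 9},
    {"ABSHIDD": "H2", "ABSPID": 1, "AGE": 67},
]

# Index the level above by its own identifier; the household ID alone is enough.
hh_by_id = {h["ABSHIDD"]: h for h in households}
assert len(hh_by_id) == len(households), "ABSHIDD should be unique per household"

# Attach the household characteristic to each person record.
persons_with_geo = [
    {**p, "STATE": hh_by_id[p["ABSHIDD"]]["STATE"]} for p in persons
]

# A person record is unique on the combination of household and person IDs.
keys = {(p["ABSHIDD"], p["ABSPID"]) for p in persons_with_geo}
assert len(keys) == len(persons_with_geo)

print([p["STATE"] for p in persons_with_geo])  # ['NSW', 'NSW', 'Vic.']
```

Checking uniqueness of the key on the upper level before merging guards against accidentally multiplying person records.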

Multi-response items

Several questions in the survey allowed respondents to provide one or more responses. Each response category for these multi-response data items is treated as a separate data item. In the detailed microdata, these data items share the same identifier (SAS name) prefix but are each separately suffixed with a letter - A for the first response, B for the second response, C for the third response and so on.

For example, the multi-response data item 'Disability type' has six response categories (excluding 'Not applicable'). There are six data items named DISABA, DISABB, DISABC...DISABF. Each data item in the series will have either a positive response code or a null response code, with the exception of the first item in the series, DISABA. DISABA has three potential response codes: the positive response code 1 - 'Sight, hearing, speech', the code 0 - null response, as well as the additional response code, code 7 - 'Not applicable'. The remaining items, DISABB to DISABF, have just the two response codes each. The data item list identifies all multi-response items and lists the corresponding codes with the corresponding response categories.

Note that the sum of individual multi-response categories will be greater than the population applicable to a particular data item as respondents can select more than one response.
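The following sketch, in plain Python with made-up records, shows why category totals for a multi-response item add to more than the number of persons. It assumes that any code other than 0 (null) and 7 ('Not applicable') is a positive response; the positive code used for DISABB here is invented, so refer to the data item list for the real codes.

```python
ITEMS = ["DISABA", "DISABB", "DISABC", "DISABD", "DISABE", "DISABF"]
NULL, NOT_APPLICABLE = 0, 7

def record(weight, **codes):
    """Build a toy person record; unspecified items default to the null code."""
    row = {item: NULL for item in ITEMS}
    row.update(codes)
    row["NHSFINWT"] = weight
    return row

def is_positive(code):
    # Assumption: anything other than null or 'Not applicable' is a response.
    return code not in (NULL, NOT_APPLICABLE)

persons = [
    record(1000.0, DISABA=1, DISABB=2),     # two disability types reported
    record(2000.0, DISABB=2),               # one type reported
    record(1500.0, DISABA=NOT_APPLICABLE),  # item not applicable
]

# Weighted total for each response category.
category_totals = {
    item: sum(p["NHSFINWT"] for p in persons if is_positive(p[item]))
    for item in ITEMS
}

# Weighted number of persons with at least one positive response.
persons_with_any = sum(
    p["NHSFINWT"] for p in persons if any(is_positive(p[i]) for i in ITEMS)
)

print(sum(category_totals.values()))  # 4000.0
print(persons_with_any)               # 3000.0
```

The first person contributes to two categories, so the category totals sum to 4,000 while only 3,000 persons reported any disability type.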

Reliability of estimates

As the survey was conducted on a sample of private households in Australia, it is important to take account of the method of sample selection when deriving estimates from the detailed microdata. This is particularly important as a person's chance of selection in the survey varied depending on the state or territory in which the person lived. If these chances of selection are not accounted for by use of appropriate weights, the results will be biased.

Each person record has a main weight (NHSFINWT). This weight indicates how many population units are represented by the sample units. When producing estimates of sub-populations from the detailed microdata, it is essential that they are calculated by adding the weights of persons in each category and not just by counting the sample number in each category. If each person's weight were to be ignored when analysing the data to draw inferences about the population, then no account would be taken of a person's chance of selection or of different response rates across population groups, with the result that the estimates produced could be biased. The application of weights ensures that estimates will conform to an independently estimated distribution of the population by age, by sex, etc. rather than to the distributions within the sample itself.
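As a small illustration of adding weights rather than counting sample records, the sketch below compares the two approaches on made-up data. SMOKER is a hypothetical item name and the weights are invented; NHSFINWT is the person weight named above.

```python
from collections import defaultdict

# Toy person records with made-up weights; SMOKER is a hypothetical item.
persons = [
    {"SMOKER": "Yes", "NHSFINWT": 800.0},
    {"SMOKER": "No", "NHSFINWT": 2500.0},
    {"SMOKER": "No", "NHSFINWT": 1700.0},
    {"SMOKER": "Yes", "NHSFINWT": 1000.0},
]

sample_counts = defaultdict(int)  # unweighted: number of sample records
estimates = defaultdict(float)    # weighted: estimated number of persons
for p in persons:
    sample_counts[p["SMOKER"]] += 1
    estimates[p["SMOKER"]] += p["NHSFINWT"]

weighted_share = estimates["Yes"] / sum(estimates.values())
unweighted_share = sample_counts["Yes"] / len(persons)

print(dict(estimates))                   # {'Yes': 1800.0, 'No': 4200.0}
print(weighted_share, unweighted_share)  # 0.3 0.5
```

In this toy sample half the records are in the 'Yes' category, but they represent only 30% of the estimated population, which is the kind of bias that ignoring weights can introduce.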

Each person record on the detailed microdata contains 60 replicate weights in addition to the main weight. Replicate weights can be used to calculate measures of sampling error.
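One possible way to use replicate weights is sketched below. It assumes the replicate weights are delete-a-group jackknife weights, which this page does not state, so confirm the method in the survey methodology before relying on it. Under that assumption, with G replicate groups the standard error of an estimate y is sqrt((G - 1) / G × sum of (y_g - y)²), where y_g is the estimate recalculated with the g-th replicate weight. The REPWTS field layout and the ASTHMA item are hypothetical, and the toy data use three replicates rather than 60 to keep the example short.

```python
import math

def jackknife_se(main_estimate, replicate_estimates):
    """Delete-a-group jackknife standard error (assumed method)."""
    g = len(replicate_estimates)
    squared_diffs = sum((r - main_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt((g - 1) / g * squared_diffs)

# Toy records: REPWTS is a hypothetical list holding the replicate weights.
persons = [
    {"ASTHMA": True, "NHSFINWT": 1000.0, "REPWTS": [1100.0, 900.0, 1000.0]},
    {"ASTHMA": True, "NHSFINWT": 2000.0, "REPWTS": [2000.0, 2100.0, 1900.0]},
    {"ASTHMA": False, "NHSFINWT": 3000.0, "REPWTS": [2900.0, 3000.0, 3100.0]},
]
n_reps = len(persons[0]["REPWTS"])

# Estimate with the main weight, then once per replicate weight.
main = sum(p["NHSFINWT"] for p in persons if p["ASTHMA"])
replicates = [
    sum(p["REPWTS"][g] for p in persons if p["ASTHMA"]) for g in range(n_reps)
]

se = jackknife_se(main, replicates)
rse_percent = 100 * se / main  # relative standard error

print(main, replicates)  # 3000.0 [3100.0, 3000.0, 2900.0]
print(round(se, 2))      # 115.47
```

The same pattern applies to any statistic: compute it once with the main weight, repeat with each replicate weight, and combine the differences.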

Non-Indigenous flag

The purpose of the Non-Indigenous flag (NONINDST) is to assist users in producing non-Indigenous data only. It should not be used to estimate Aboriginal and Torres Strait Islander populations through differencing, as the scope of the National Health Survey excludes Very Remote areas of Australia and discrete Aboriginal and Torres Strait Islander communities.

Continuous items

Some continuous data items are allocated special codes for certain responses (e.g. 9999 = 'Not applicable'). When ranges are created for such continuous items in TableBuilder, these special codes are NOT included in the ranges, so the total shown represents only the 'valid responses' for that continuous data item rather than all responses (including special codes). Any special codes for continuous (summation) data items are listed in the Data Item List (DIL) and can be found in the categorical version of the continuous item. Note, however, that a label on a '0' value in the DIL (for example, identifying 0 as 'Did not visit' or 'Did not do') does not necessarily mean that zeros are excluded from the ranges, as they may still be important in some calculations. Refer to the categorical version of the item to identify which codes are specifically excluded.
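When working with unit record data outside TableBuilder, special codes need to be excluded by hand. The sketch below shows one way to do this in plain Python, assuming a hypothetical continuous item VISITS with 9999 as its 'Not applicable' code; the data and weights are made up, and the actual special codes for each item should be taken from the DIL.

```python
NOT_APPLICABLE = 9999  # example special code; check the DIL for each item

# Toy person records; VISITS is a hypothetical continuous item.
persons = [
    {"VISITS": 0, "NHSFINWT": 1000.0},     # 0 is kept: a valid 'Did not visit'
    {"VISITS": 4, "NHSFINWT": 2000.0},
    {"VISITS": 9999, "NHSFINWT": 1500.0},  # special code, not a real count
]

def weighted_mean(records):
    total = sum(r["VISITS"] * r["NHSFINWT"] for r in records)
    return total / sum(r["NHSFINWT"] for r in records)

# Drop special codes before calculating; zeros remain in the calculation.
valid = [p for p in persons if p["VISITS"] != NOT_APPLICABLE]

mean_valid = weighted_mean(valid)
mean_naive = weighted_mean(persons)  # wrongly treats 9999 as a real value

print(round(mean_valid, 2))  # 2.67
print(round(mean_naive, 2))  # 3334.78
```

Leaving the special code in inflates the mean by several orders of magnitude, while dropping zeros as well would overstate it in the other direction.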

Data Item Lists

Data files

Previous releases

| Data series | TableBuilder | MicrodataDownload | DataLab |
| National Health Survey, 2017-18 | TableBuilder | Basic microdata | Detailed microdata |
| National Health Survey, 2014-15 | TableBuilder | Basic microdata | Detailed microdata |
| Australian Health Survey, National Health Survey, 2011-12 | TableBuilder | Basic microdata | Detailed microdata |
| Australian Health Survey, Core Content - Risk Factors and Selected Health Conditions, 2011-12 | TableBuilder | | Detailed microdata |
| National Health Survey, 2007-08 | | Basic microdata | Detailed microdata |
| National Health Survey, 2004-05 | | Basic microdata | Detailed microdata |
| National Health Survey, 2001 | | Basic microdata | Basic microdata |
| National Health Survey, 1995 | | Basic microdata | |
| National Health Survey, 1989-90 | | Basic microdata | |
| National Health Survey, 1983 | | Basic microdata | |
| National Health Survey, 1977-78 | | Basic microdata | |

Further information

See National Health Survey: First results, Methodology, 2020-21 for further information about the 2020-21 NHS cycle.

See National Health Survey: First results, 2017-18 and National Health Survey: First results, Methodology, 2017-18 for further information about the 2017-18 NHS cycle.

Previous catalogue number

This release previously used catalogue number 4324.0.55.001.