Microdata: National Nutrition and Physical Activity Survey

Provides data from the National Nutrition and Physical Activity Survey for key health statistics including nutrition, physical activity and sleep.

Release date and time
24/03/2026 11:30am AEDT

Introduction

The National Nutrition and Physical Activity Survey (NNPAS) is designed to provide a range of information about nutrition, physical activity, sedentary behaviour and sleep. This information can be cross classified by selected demographic and socioeconomic characteristics. 

This product provides information about the microdata releases from the two NNPAS surveys, 2023 and 2011–12, including details about the data files and how to use the different microdata products. Data Item Lists and information about the survey methodology are also provided.

The 2023 NNPAS is considered generally comparable with the 2011–12 NNPAS. However, due to the time between the two surveys, there have been numerous changes to the content. These changes are mostly due to updates to relevant nutrition and physical activity guidelines, updates to demographic standards (e.g. country of birth, occupation, industry) and the addition of content based on user needs. Major changes include changes to the dietary recall tool and the use of an accelerometer instead of a pedometer to collect directly measured data on physical activity and sleep.

A summary of the main content changes applied in the 2023 NNPAS compared with the 2011–12 survey can be found in the NNPAS Methodology and Data Item List in the Data Downloads section. Additionally, a comparison of food and nutrient collections over time can be found in the Intergenerational Health and Mental Health Study: Concepts, Sources and Methods.

Available products

  • Basic microdata – approved users can download and analyse unit record data in their own environment. This product is currently available for the 2011–12 NNPAS. The 2023 NNPAS Basic microdata will be released in the second quarter of 2026. For more information, see the MicrodataDownload page.
  • Detailed microdata – approved users can access DataLab for in-depth and interactive data analysis using a range of statistical software packages. This product is available for both the 2011–12 and 2023 NNPAS. For more information, including prerequisites for DataLab access, see the DataLab page.
  • TableBuilder – an online tool for creating tables and graphs, accessed via the ABS website. Only the 2011–12 NNPAS data are available in TableBuilder.

File structure

Datasets from the NNPAS are hierarchical in nature. A hierarchical data file is an efficient means of storing and retrieving information that describes one-to-many or many-to-many relationships. For example, a child aged 5–17 years may report multiple days on which physical activity was undertaken, and different types of physical activity on each of these days.

Data about households and families are contained as individual characteristics on person records. While estimates are also available at the household level, estimates at the family level are not available from this survey. The data items and related output categories are described in Excel spreadsheets in the Data Item Lists in the Data Downloads section.

2023 NNPAS file structure

2011-12 NNPAS file structure

Counts and weights

2023 NNPAS

Number of records by level, NNPAS 2023 microdata
Level | Record counts (unweighted) | Weighted counts (if applicable)
Household | 8,817 | 10,386,915
Person (Selected persons) | 11,199 | 25,392,728
Conditions | 16,170 | N/A
Pre-School Aged Child (2-5) Physical Activity Day | 14,127 | N/A
School-Aged Child (5-17) Physical Activity Day | 22,563 | N/A
School-Aged Child (5-17) Physical Activity Detail | 29,307 | N/A
Dietary Recall (known as ‘Food level’ in 2011-12 NNPAS) | 313,416 | N/A
Supplements | 15,449 | N/A
Accelerometer - Sleep | 31,502 | N/A
Accelerometer - Day | 31,626 | N/A
Accelerometer - Hour | 759,024 | N/A
Accelerometer - Quarter Hour | 2,643,245 | N/A
Accelerometer - 5-Second | Up to 120,960 rows per file (4,518 files) | N/A
Accelerometer - Input 100 Hz | Up to 60,480,000 rows per file (4,518 files) | N/A

2011-12 NNPAS

Number of records by level, NNPAS 2011-12 microdata
Level | Record counts (unweighted) | Weighted counts (if applicable)
Household level | 9,519 | 8,581,354
Persons in Household level (All persons) | 23,464 | N/A
Person level (Selected persons) | 12,153 | 21,526,456
Conditions level | 15,897 | N/A
Child 2-4 Physical Activity Day level | 16,203 | N/A
Child 5-17 Physical Activity Day level | 24,411 | N/A
Child 5-17 Physical Activity Detailed level | 31,789 | N/A
Adult Physical Activity level | 13,474 | N/A
Pedometer level | 51,341 | N/A
Biomedical level (Persons 5+) | 12,153 | 20,649,321
Food level | 341,897 | N/A
Supplement level | 25,141 | N/A
Australian Dietary Guidelines level | 3,102,528 | N/A

Weight variables

For 2023 NNPAS, there are two weight variables on the file:

  • Household Weight (NPAHHWT) – Household level - Benchmarked
  • Person Weight (NPAFINWT) – Selected Person level - Benchmarked to the total population aged 2 years and over.

For 2011–12 NNPAS, there are three weight variables on the file:

  • Household Weight (NPAHHWT) – Household level - Benchmarked
  • Person Weight (NPAFINWT) – Selected Person level - Benchmarked to the total population aged 2 years and over
  • Biomedical persons (NHMSPERW) – located on the Biomedical level. This weight has been benchmarked to produce Australian population estimates based on Biomedical participants aged 5 years and over. 

For both years, there is no weight associated with the other levels. This is because the records are repeated for each selected person. If, for example, NPAFINWT is merged onto the Conditions level, it will be attached to each condition record and therefore be repeated for each selected person where they have more than one condition. This should be considered when producing tables and analysing microdata.
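The repetition described above can be illustrated with a minimal Python sketch. The records, weight values and the `COND_CODE` field are invented for illustration; only ABSHIDD, ABSPIDD and NPAFINWT are names taken from the file.

```python
# Hypothetical records: two selected persons, one of whom has two conditions.
persons = [
    {"ABSHIDD": "H1", "ABSPIDD": "P1", "NPAFINWT": 1500.0},
    {"ABSHIDD": "H2", "ABSPIDD": "P1", "NPAFINWT": 2000.0},
]
conditions = [
    {"ABSHIDD": "H1", "ABSPIDD": "P1", "COND_CODE": "A"},
    {"ABSHIDD": "H1", "ABSPIDD": "P1", "COND_CODE": "B"},
    {"ABSHIDD": "H2", "ABSPIDD": "P1", "COND_CODE": "A"},
]

# Look up each person's weight by the (household, person) identifier pair.
weight_by_person = {
    (p["ABSHIDD"], p["ABSPIDD"]): p["NPAFINWT"] for p in persons
}

# Merging the weight onto the Conditions level repeats it once per condition.
merged = [
    dict(c, NPAFINWT=weight_by_person[(c["ABSHIDD"], c["ABSPIDD"])])
    for c in conditions
]

# Summing over condition records counts the first person's weight twice.
naive_total = sum(row["NPAFINWT"] for row in merged)

# Persons with at least one condition: keep one weight per person.
one_weight_per_person = {
    (row["ABSHIDD"], row["ABSPIDD"]): row["NPAFINWT"] for row in merged
}
person_total = sum(one_weight_per_person.values())

print(naive_total, person_total)
```

In this toy data the naive sum is larger than the person-based sum because one person contributes a weight for each of their two condition records.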

Using weights

The NNPAS is a sample survey, so to produce estimates for the in-scope population, you must use weight fields in your calculations. When analysing a Household level item at the household level, you will need to use the household weight. For example, if you wanted to know the number of households in a state, rather than the number of persons in that state, you need to use the household weight, not the person weight.

Caution should be used when applying the ‘Household’ weight to items from other levels. For example, if the household weight is applied to a selected person level demographic item, such as ‘Sex’, your table will show the number of households with one or more selected persons of that sex. Since up to two people can be selected in the NNPAS, this will result in some households being counted twice, once for the selected adult and once for the selected child, if they are both the same sex.
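As a rough illustration of both points, the Python sketch below sums the household weight over household records, then shows what happens when the same weight is summed over selected person records instead. All values are invented, and `STATE` is a hypothetical field name used only for this example.

```python
from collections import defaultdict

# Hypothetical household records; STATE is an illustrative field name.
households = [
    {"ABSHIDD": "H1", "STATE": "NSW", "NPAHHWT": 1000.0},
    {"ABSHIDD": "H2", "STATE": "NSW", "NPAHHWT": 1200.0},
    {"ABSHIDD": "H3", "STATE": "VIC", "NPAHHWT": 900.0},
]
# Household H1 has two selected persons (an adult and a child).
persons = [
    {"ABSHIDD": "H1", "ABSPIDD": "P1"},
    {"ABSHIDD": "H1", "ABSPIDD": "P2"},
    {"ABSHIDD": "H2", "ABSPIDD": "P1"},
    {"ABSHIDD": "H3", "ABSPIDD": "P1"},
]

# Household estimate: sum the household weight over household records.
households_by_state = defaultdict(float)
for hh in households:
    households_by_state[hh["STATE"]] += hh["NPAHHWT"]

# Summing the household weight over person records instead counts H1 twice.
household_lookup = {hh["ABSHIDD"]: hh for hh in households}
inflated_by_state = defaultdict(float)
for person in persons:
    hh = household_lookup[person["ABSHIDD"]]
    inflated_by_state[hh["STATE"]] += hh["NPAHHWT"]

print(dict(households_by_state))
print(dict(inflated_by_state))
```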

File content

Available data items

Data items for the 2023 NNPAS include:

  • Demographics – age, sex, gender and sexual orientation, country of birth, main language spoken, marital status
  • Household details – type, size, household composition, Socio-Economic Indexes for Areas (SEIFA), geography
  • Labour force status
  • Educational attainment
  • Household Income
  • Migrant and Visa status
  • Long-term health status relating to diabetes, hypertension, kidney disease
  • Risk factors such as tobacco smoking and physical activity
  • 24-hour dietary recall
  • Specific dietary information such as food avoidance, consumption of oils, fats, salt, and dietary supplements
  • Physical and sedentary activity
  • Sleep behaviours
  • Self-reported height and weight
  • Physical Measures – blood pressure, height, weight and waist. 

The Data Item List in the Data Downloads section is the definitive source of available data items and categories.

Identifiers

Every record on each level of the file is uniquely identified. See the Data Item List in the Data Downloads section for details on which identifier corresponds to which level.

Each household has a unique random identifier, ABSHIDD. This identifier appears on the household level and is repeated on every record, at every level, that pertains to that household. A record at a particular level is uniquely identified by the combination of the identifier for that level and the identifiers for all levels above it in the hierarchical structure. For example, each record on the Conditions level is uniquely identified by a combination of the Household, Person and Conditions level identifiers.

The Household record identifier, ABSHIDD, assists with linking people from the same household, and with attaching household characteristics such as geography (located on the household level) to the Person records. When merging data with a level above, only those identifiers relevant to the level above are required.
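A minimal Python sketch of these two ideas follows. It attaches a household-level item to person records using ABSHIDD alone, then checks that a composite key is unique on a lower level. The records are invented, and `STATE` and `CONDID` are hypothetical field names; the Data Item List gives the actual identifier for each level.

```python
# Hypothetical household-level records keyed by ABSHIDD.
households = {
    "H1": {"STATE": "NSW"},
    "H2": {"STATE": "VIC"},
}
persons = [
    {"ABSHIDD": "H1", "ABSPIDD": "P1"},
    {"ABSHIDD": "H1", "ABSPIDD": "P2"},
    {"ABSHIDD": "H2", "ABSPIDD": "P1"},
]

# Merging with the level above needs only that level's identifier (ABSHIDD).
persons_with_geo = [
    dict(p, STATE=households[p["ABSHIDD"]]["STATE"]) for p in persons
]

# On a lower level, the full identifier combination identifies each record.
conditions = [
    {"ABSHIDD": "H1", "ABSPIDD": "P1", "CONDID": 1},
    {"ABSHIDD": "H1", "ABSPIDD": "P1", "CONDID": 2},
    {"ABSHIDD": "H1", "ABSPIDD": "P2", "CONDID": 1},
]
keys = [(c["ABSHIDD"], c["ABSPIDD"], c["CONDID"]) for c in conditions]
keys_unique = len(keys) == len(set(keys))

print(persons_with_geo)
print(keys_unique)
```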

Multi-response items

Several questions in the survey allowed respondents to provide one or more responses. Each response category for these multi-response data items is treated as a separate data item. In the detailed microdata, these data items share the same identifier (SAS name) prefix but are each separately suffixed with a letter: A for the first response, B for the second response, C for the third response and so on.

For example, the multi-response data item ‘Whether avoids food(s) due to intolerances, allergies or cultural, religious or ethical reasons’ has four response categories. There are four data items named AVOIDFDA, AVOIDFDB, AVOIDFDC and AVOIDFDD. Each data item in the series has either a positive response code or a null response code, with the exception of the first item in the series, AVOIDFDA.

AVOIDFDA has four potential response codes: 

  • code 0 – null response
  • code 1 – ‘Food intolerances’ – positive response
  • code 5 – ‘Does not avoid foods for any of these reasons’
  • code 8 – ‘Not stated’.

The remaining items, AVOIDFDB to AVOIDFDD, have just two response codes each: 0 for a null response and a non-zero number for a positive response. The Data Item List identifies all multi-response items and lists the codes with their corresponding response categories.

Note that the sum of individual multi-response categories will be greater than the population applicable to a particular data item as respondents can select more than one response.
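The following Python sketch illustrates why category totals can exceed the applicable population. It is unweighted for brevity, the respondent records are invented, and the non-zero codes used for AVOIDFDB to AVOIDFDD (2, 3 and 4) are assumptions; the Data Item List gives the actual codes.

```python
ITEMS = ["AVOIDFDA", "AVOIDFDB", "AVOIDFDC", "AVOIDFDD"]


def is_positive(item, code):
    """Return True when a code marks a selected reason for avoiding foods."""
    if item == "AVOIDFDA":
        # Codes 0 (null), 5 (does not avoid) and 8 (not stated) are not reasons.
        return code == 1
    # For the remaining items any non-zero code is a positive response.
    return code != 0


# Hypothetical respondents.
respondents = [
    {"AVOIDFDA": 1, "AVOIDFDB": 2, "AVOIDFDC": 0, "AVOIDFDD": 0},  # two reasons
    {"AVOIDFDA": 5, "AVOIDFDB": 0, "AVOIDFDC": 0, "AVOIDFDD": 0},  # does not avoid
    {"AVOIDFDA": 0, "AVOIDFDB": 0, "AVOIDFDC": 3, "AVOIDFDD": 0},  # one reason
]

# Count positive responses for each category separately.
counts = {
    item: sum(is_positive(item, r[item]) for r in respondents) for item in ITEMS
}

# Count respondents who gave at least one reason.
people_with_any = sum(
    any(is_positive(item, r[item]) for item in ITEMS) for r in respondents
)

print(counts, sum(counts.values()), people_with_any)
```

Here the category counts sum to more than the number of respondents who gave a reason, because the first respondent is counted in two categories.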

Non-Indigenous flag

The purpose of the Non-Indigenous flag (NONINDST) is to assist users in producing non-Indigenous data only. It should not be used to estimate Aboriginal and Torres Strait Islander populations through differencing, as the scope of the NNPAS excludes Very Remote areas of Australia and discrete Aboriginal and Torres Strait Islander communities.

Continuous items

Some continuous data items are allocated special codes for certain responses (e.g. 9999 = 'Not applicable'). Any special codes for continuous (summation) data items are listed in the Data Item List (DIL, see the Data Downloads section) and appear in the categorical version of the continuous item. Note that a labelled 0 in the DIL (for example, 0 = 'Did not do') is not necessarily excluded from the ranges, as zeros may still be important in some calculations. Refer to the categorical version of the item to identify which codes are specifically excluded. Because special codes are excluded, the total shown for a continuous data item represents only 'valid responses' rather than all responses.
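As a small illustration, the Python sketch below removes a special code before averaging a continuous item while keeping genuine zeros. The values are invented, and treating 9999 as the only special code is an assumption for this example; the codes to exclude for a given item should be taken from the Data Item List.

```python
# Assumed special code for this example (9999 = 'Not applicable').
SPECIAL_CODES = {9999}

# Hypothetical values for a continuous item, e.g. minutes of an activity.
values = [0, 30, 45, 9999, 60, 9999]

# Keep valid responses only; zeros stay because they are real values here.
valid = [v for v in values if v not in SPECIAL_CODES]

mean_valid = sum(valid) / len(valid)
mean_naive = sum(values) / len(values)  # distorted by the special codes

print(valid, mean_valid, mean_naive)
```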

Using 2023 NNPAS accelerometer microdata files

Accelerometers are a common type of sensor used to study human movement. They are wearable devices that measure linear acceleration – the change in a person’s speed (velocity) per unit time. Acceleration was measured 100 times per second (100Hz) for up to one week resulting in large files (up to 60,480,000 rows per file). 
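The maximum row counts follow directly from the sampling rate and the one-week wear period. A short Python check of that arithmetic, assuming a full seven days of continuous recording:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS = 7

# 100 readings per second for seven days.
rows_100hz = DAYS * SECONDS_PER_DAY * 100

# One row per 5-second period for seven days.
rows_5_second = DAYS * SECONDS_PER_DAY // 5

# One row per 15-minute block for seven days (168 hours).
rows_quarter_hour = DAYS * 24 * 4

print(rows_100hz, rows_5_second, rows_quarter_hour)
```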

Accelerometer data is output at different levels of detail. These include the most detailed input 100Hz files, as well as summary files with data per: 

  • 5-seconds
  • quarter-hour (15-minutes)
  • hour
  • day
  • person (weekly, weekday and weekend).

Due to the amount of data on the Input 100Hz and 5-second levels, these data are given as one file per participating person. Users may choose small samples of the data for the populations they are interested in. An example file (DataLab Test File) for the Accelerometer 5-second, Quarter hour and Input 100Hz levels is available in the Data Downloads section. More detailed data allow users to perform their own analysis of accelerometer data for specific research needs. 

For all sub-person levels, data is provided for each person who participated in the accelerometer study regardless of whether they met the minimum wear time requirements for inclusion in the publication estimates (see Methodology for specific details). Wear time and imputation flags have been provided to enable users to restrict the data.

All datasets, except the 5-second and Input 100Hz levels, include columns with record identifiers called ABSHIDD (household ID) and ABSPIDD (person ID), which allow merging. For the 5-second and Input 100Hz levels, the file names contain these identifiers.

The following table shows some example use cases for each of the accelerometer levels in the DataLab:

Level | Example uses
Person level | For analysis of physical activity and sleep per week and split by weekday and weekend days. Also includes flags for wear time thresholds, imputation rates and time zone which can be transferred to other levels.
Day level | For analysis of physical activity and sleep by day of the week. This level includes a flag for wear day order (i.e. second 24-hour period, third 24-hour period).
Sleep level | For analysis of main sleep periods, physical activity during sleep periods, and sleep analysis by day of the week.
Hour level | For analysis of physical activity and sleep by 'time-of-day' and per day.
Quarter hour level | For analysis of acceleration, luminosity and temperature by time of day in 15-minute increments. Users may apply their own processing methodologies to this input data level, as it has minimal data processing applied. This level is the "long-epoch" dataset created by GGIR in Part 1. Use of this level will result in faster processing times compared to the 5-second and 100Hz levels.
5-second level | For analysis of acceleration in the 5-second (short epoch) time increments. For users who wish to analyse in GGIR with different acceleration thresholds or settings. This level can be used to run GGIR Parts 2-6 with the 5-second epoch set. Use of this semi-processed level will result in faster processing times compared to the 100Hz level.
Input 100Hz level | For detailed analysis of the x, y and z axis. For users who wish to use an analysis program other than GGIR, set their own epoch length or develop modelling algorithms related to accelerometry research.

Day and Sleep levels

The ‘Accelerometer – Day level’ dataset has seven records for each person who participated in the accelerometer study. Each record represents a full 24‑hour period from midnight‑to‑midnight during the week the device was worn. Because respondents start and stop wearing the device at different times, the first and last partial days are combined into one complete 24‑hour record.

Some sleep data also appears on the ‘Accelerometer – Day level’. However, this does not represent one sleep period because it adds up all the sleep that happened between midnight and midnight. For people who go to sleep before midnight and wake up the next morning, the ‘Day level’ dataset will only include the sleep before midnight (plus any sleep after midnight from the night before).

The ‘Accelerometer – Sleep level’ dataset contains one record for each main sleep period. A main sleep period is defined as the longest period of sustained inactivity (or lack of movement) between midday and midday. Most respondents have 6–7 sleep-level records. See ‘Directly measured sleep’ in Methodology.
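The difference between the midnight-to-midnight Day level and the midday-to-midday sleep window can be sketched in Python as follows. This is only an illustration of the two windowing rules using an invented sleep period, not the processing applied to the published datasets.

```python
from datetime import datetime, timedelta


def split_at_midnight(start, end):
    """Split the interval [start, end) into hours per calendar day."""
    totals = {}
    cursor = start
    while cursor < end:
        next_midnight = datetime.combine(
            cursor.date() + timedelta(days=1), datetime.min.time()
        )
        segment_end = min(end, next_midnight)
        hours = (segment_end - cursor).total_seconds() / 3600
        totals[cursor.date()] = totals.get(cursor.date(), 0.0) + hours
        cursor = segment_end
    return totals


def sleep_window_start(ts):
    """Return the midday that opens the midday-to-midday window containing ts."""
    midday = datetime.combine(ts.date(), datetime.min.time()) + timedelta(hours=12)
    return midday if ts >= midday else midday - timedelta(days=1)


# Hypothetical sleep period from 10:30 pm to 6:30 am the next morning.
sleep_start = datetime(2023, 5, 1, 22, 30)
sleep_end = datetime(2023, 5, 2, 6, 30)

# Day-level view: the sleep is split across two calendar days.
by_day = split_at_midnight(sleep_start, sleep_end)

# Sleep-level view: both ends fall in the same midday-to-midday window.
same_window = sleep_window_start(sleep_start) == sleep_window_start(sleep_end)

print(by_day, same_window)
```

In this example the Day level view attributes 1.5 hours to the first calendar day and 6.5 hours to the second, while the midday-to-midday window keeps the night together as one period.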

Quarter hour level

The quarter-hour level dataset has up to 168 hours of data for each respondent who participated in the accelerometer study, broken into 15-minute blocks. It is provided as one SAS or CSV file, and all times are shown in the respondent’s local time. 

An imputation flag is included to show where imputation was done for the higher-level datasets, but the quarter-hour data itself is not imputed. As data on this level do not have imputation, data cleaning or wear thresholds already applied, the outputs may not exactly match the Measured physical activity and Sleep publications if different methods are used. Users will need to consider data processing methods when using these data.

5-second level

The 5-second level dataset includes calculated acceleration measures and timestamps (in the respondent’s local time) for every 5-seconds. Only respondents who participated in the accelerometer study are included. There is no imputation on this level, and it does not include any information about wear-time rules. As data on this level do not have these already applied, the outputs may not exactly match the Measured physical activity and Sleep publication if different methods are used. Users will need to consider data processing methods when using these data.

Each respondent has their own CSV file, named using the format: 

“<ABSHIDD>-<ABSPIDD>.csv”

The file name can be used to merge on relevant demography data from higher level datasets. 

Each CSV can have up to 120,960 rows of data (which is seven days of 5-second periods) but may have fewer rows if the device was not worn the whole time. 
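A minimal Python sketch for recovering the identifiers from a file name is shown below. It assumes that neither identifier contains a hyphen or a full stop; the example file name and identifier values are invented.

```python
from pathlib import Path


def ids_from_filename(path):
    """Split '<ABSHIDD>-<ABSPIDD>.csv' into (household_id, person_id)."""
    name = Path(path).name
    stem = name.split(".", 1)[0]  # drop the extension
    household_id, separator, person_id = stem.partition("-")
    if not separator or not household_id or not person_id:
        raise ValueError(f"unexpected file name: {name}")
    return household_id, person_id


# Hypothetical file name following the documented pattern.
household_id, person_id = ids_from_filename("H000123-P01.csv")
print(household_id, person_id)
```

The returned pair can then be used as a lookup key against the ABSHIDD and ABSPIDD columns of a higher level dataset.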

100Hz level

The 100Hz level is the most detailed data available from the accelerometer study. It matches the device's measurement rate of 100 readings per second.

Each participating respondent has one compressed gzip (.gz) file. Inside this file is a CSV containing all their data. This gzip file works like a ZIP file but usually gives better compression. These CSV files can be opened in common software such as: 

  • R (using gzfile())
  • Python (using the ‘gzip’ module)
  • De-compression tools like 7-Zip. 

Each file includes: 

  • acceleration in three axes (x, y and z)
  • time in milliseconds as an integer since the start of the UNIX epoch. 

Similar to the 5-second level, the file name uses ABSHIDD (household ID) and ABSPIDD (person ID) so the data can be matched to other datasets. As data on this level do not have imputation, data cleaning or wear thresholds already applied, the outputs may not exactly match the Measured physical activity and Sleep publication if different methods are used. Users will need to consider data processing methods when using these data.

Important note: Reading these files uses a lot of computing power. Users should choose a virtual machine that suits their analysis needs. See information about the virtual machine options at Using your workspace. It is strongly recommended not to decompress all files on DataLab unless necessary, because they take a long time to process and use a lot of storage. For example, a typical 7-day file contains up to 60,480,000 rows and is about 242MB when compressed or 2.07GB when uncompressed.
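One way to inspect a file without decompressing it to disk is to stream it. The Python sketch below reads only the first rows of a gzip-compressed CSV and converts a millisecond timestamp to a date and time. The header row and the column names `time`, `x`, `y` and `z` are assumptions, as is the small demonstration file written to a temporary folder; check the DataLab Test File for the actual layout. The conversion yields UTC, so obtaining the respondent's local time would additionally need their time zone.

```python
import csv
import gzip
import itertools
import os
import tempfile
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_first_rows(gz_path, n_rows):
    """Stream the first n_rows data rows from a gzip-compressed CSV."""
    with gzip.open(gz_path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle)  # assumes a header row
        return list(itertools.islice(reader, n_rows))


def epoch_ms_to_utc(milliseconds):
    """Convert integer milliseconds since the Unix epoch to a UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=milliseconds)


# Write a tiny hypothetical file so the sketch can run end to end.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.csv.gz")
    with gzip.open(demo_path, mode="wt", newline="") as out:
        out.write("time,x,y,z\n")
        out.write("1700000000000,0.01,-0.98,0.05\n")
        out.write("1700000000010,0.02,-0.97,0.05\n")
    rows = read_first_rows(demo_path, 1)

first_time = epoch_ms_to_utc(int(rows[0]["time"]))
print(rows[0], first_time.isoformat())
```

Because `itertools.islice` stops after the requested number of rows, only the start of the file is decompressed, which keeps memory and storage use small when sampling from very large files.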

Using accelerometer timestamps

Reliability of estimates

As the survey was conducted on a sample of private households in Australia, it is important to take account of the method of sample selection when deriving estimates from the detailed microdata. This is important because a person's chance of selection in the survey varied depending on the state or territory in which the person lived. If these chances of selection are not accounted for by use of appropriate weights, the results could be biased.

Each household or person record has a main weight (NPAHHWT or NPAFINWT). This weight indicates how many population units are represented by the sample unit. When producing estimates of sub-populations from the detailed microdata, it is essential that they are calculated by adding the weights of households or persons in each category and not just by counting the sample number in each category. If each household’s or person’s weight were to be ignored when analysing the data to draw inferences about the population, then no account would be taken of a household's or person’s chance of selection or of different response rates across population groups. This could result in the estimates produced being biased. The application of weights ensures that estimates will conform to an independently estimated distribution of the population by age, by sex, etc., rather than to the distributions within the sample itself.

It is also important to calculate a measure of sampling error for each estimate. Sampling error occurs because only part of the population is surveyed to represent the whole population. Sampling error should be considered when interpreting estimates, as it gives an indication of their accuracy and of how much importance can be placed on interpretations based on them. Measures of sampling error include standard error (SE), relative standard error (RSE) and margin of error (MoE). These measures can be estimated using the replicate weights. The replicate weight variables provided on the microdata are labelled WHHORXX (household) and WPM01XX (person), where XX represents the number of the given replicate group. The exact number of replicates varies between surveys. The NNPAS uses 60 replicate groups for both the household and person weights, labelled WHHOR01 to WHHOR60 (household) and WPM0101 to WPM0160 (person).
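When working with the replicate weights programmatically, it can help to generate the 60 variable names rather than type them out. A short Python sketch based on the naming pattern described above:

```python
N_REPLICATES = 60

# WHHOR01 ... WHHOR60 (household) and WPM0101 ... WPM0160 (person).
household_replicates = [f"WHHOR{g:02d}" for g in range(1, N_REPLICATES + 1)]
person_replicates = [f"WPM01{g:02d}" for g in range(1, N_REPLICATES + 1)]

print(household_replicates[0], household_replicates[-1])
print(person_replicates[0], person_replicates[-1])
```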

Using replicate weights for estimating sampling error

Overview of replication methods

ABS household surveys employ complex sample designs and weighting which require special methods for estimating the variance of survey statistics. Variance estimators for a simple random sample are not appropriate for this survey microdata.

A class of techniques called 'replication methods' provide a general process for estimating variance for the types of complex sample designs and weighting procedures employed in ABS household surveys. The ABS uses a method called the Group Jackknife Replication Method. 

A basic idea behind the replication approach is to split the sample into G replicate groups. One replicate group is then dropped from the file and a new set of weights is produced for the remaining sample. This is repeated for all G replicate groups to provide G sets of replicate weights. For each set of replicate weights, the statistic of interest is recalculated and the variance of the full sample statistic is estimated using the variability among the replicate statistics.

The statistics calculated from these replicates are called replicate estimates. Replicate weights provided on the microdata file enable the variance of survey statistics, such as means and medians, to be calculated relatively simply. Further technical explanation can be found in Section 4 of Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee).

How to use replicate weights

To calculate the standard error of any statistic derived from the survey data, the method is as follows:

  1. Calculate the estimate of the statistic of interest using the main weight.
  2. Repeat the calculation above for each replicate weight, substituting the replicate weight for the main weight and creating G replicate estimates.  In the example where there are 60 replicate weights, you will have 60 replicate estimates.
  3. Use the outputs from steps 1 and 2 as inputs to the formula below to calculate the estimate of the Standard Error (SE) for the statistic of interest.
\[SE\left( y \right) = \sqrt {\;\left( {\frac{{G - 1}}{G}\;} \right)\;\;\;\sum\limits_{g = 1}^G {{{\left( {{y_{(g)}} - y} \right)}^2}} }\]

[equation 1]

  • \(G\) = Number of replicate groups
  • \(g\) = the replicate group number
  • \(y_{(g)}\) = Replicate estimate for group g, i.e. the estimate of y calculated using the replicate weight for g
  • \(y\) = the weighted estimate of y from the sample

From the standard error you can then derive the following measures of sampling error: relative standard error (RSE) and margin of error (MoE) of the estimate.

\[Relative\ Standard\ Error\ \left(RSE\right)=\frac{SE}{Estimate}\]

[equation 2]

\[Margin\ of\ Error\left(MoE\right)=1.96\times\ SE\]

[equation 3]
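The three steps and equations 1 to 3 can be sketched in Python as follows. This is a minimal illustration for a weighted mean, not an official implementation. The records, the field names `Y`, `W`, `R1`, `R2` and `R3`, and the use of only three replicate weights are invented to keep the example small; on the NNPAS file the main weight would be NPAFINWT or NPAHHWT with the 60 corresponding replicate weights.

```python
import math


def weighted_mean(records, value_field, weight_field):
    """Weighted mean of value_field using weight_field."""
    total_weight = sum(r[weight_field] for r in records)
    weighted_sum = sum(r[value_field] * r[weight_field] for r in records)
    return weighted_sum / total_weight


def jackknife_se(full_estimate, replicate_estimates):
    """Equation 1: group jackknife standard error."""
    g = len(replicate_estimates)
    squared_diffs = sum((rep - full_estimate) ** 2 for rep in replicate_estimates)
    return math.sqrt((g - 1) / g * squared_diffs)


def sampling_error(records, value_field, main_weight, replicate_weights):
    """Steps 1 to 3, plus RSE (equation 2) and MoE (equation 3)."""
    estimate = weighted_mean(records, value_field, main_weight)
    replicates = [
        weighted_mean(records, value_field, w) for w in replicate_weights
    ]
    se = jackknife_se(estimate, replicates)
    return {"estimate": estimate, "se": se, "rse": se / estimate, "moe": 1.96 * se}


# Hypothetical data: each replicate weight drops one record and
# reweights the remaining records.
records = [
    {"Y": 10.0, "W": 2.0, "R1": 0.0, "R2": 3.0, "R3": 3.0},
    {"Y": 20.0, "W": 2.0, "R1": 3.0, "R2": 0.0, "R3": 3.0},
    {"Y": 30.0, "W": 2.0, "R1": 3.0, "R2": 3.0, "R3": 0.0},
]
result = sampling_error(records, "Y", "W", ["R1", "R2", "R3"])
print(result)
```

The same structure applies to other statistics: replace `weighted_mean` with a function that computes the statistic of interest from a given weight field.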

An example in calculating the SE for an estimate of the mean

Suppose you are calculating the mean value of earnings, y, in a sample. Using the main weight produces an estimate of $500.

You have 5 sets of Group Jackknife replicate weights and using these weights (instead of the main weight) you calculate 5 replicate estimates of $510, $490, $505, $503, $498 respectively. 

To calculate the standard error of the estimate you will substitute the following inputs to equation [1]:

\[\begin{equation}\begin{split} SE\left(y\right)&=\sqrt{\left(\frac{5-1}{5}\right)\sum_{g=1}^{5}\left(y_{\left(g\right)}-500\right)^2} \\ &=\sqrt{0.8\times\left(10^2+\left(-10\right)^2+5^2+3^2+\left(-2\right)^2\right)} \\ &=\sqrt{0.8\times238} \\ SE\left(y\right)&=13.8 \end{split} \end{equation}\]

To calculate the RSE you divide the SE by the estimate of \(y\) (500) and multiply by 100 to get a percentage:

\[\begin{equation} \begin{split} RSE(y) &= \frac{13.8}{500} \times 100 \\ RSE\left(y\right) &= 2.8\% \end{split} \end{equation}\]

To calculate the margin of error you multiply the SE by 1.96:

\[\begin{equation} \begin{split} Margin\ of\ Error\left(y\right) &= 13.8\times1.96 \\ Margin\ of\ Error\left(y\right) &= 27.05 \end{split} \end{equation}\]
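The arithmetic in this worked example can be reproduced with a few lines of Python, using only the figures given above:

```python
import math

estimate = 500.0
replicates = [510.0, 490.0, 505.0, 503.0, 498.0]
g = len(replicates)

# Sum of squared differences between each replicate estimate and the estimate.
sum_sq = sum((rep - estimate) ** 2 for rep in replicates)

se = math.sqrt((g - 1) / g * sum_sq)
rse_percent = se / estimate * 100
moe = 1.96 * se

print(round(se, 1), round(rse_percent, 1), round(moe, 2))
```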

Data Downloads

Data Item Lists

Data files

DataLab Test file

Test file for accelerometer - contains one zip file with an example version of the Quarter hour, 5-second and 100Hz levels (250MB).

Previous Releases

Release | TableBuilder data series | MicrodataDownload
National Nutrition and Physical Activity Survey, 2011-12 | TableBuilder | Basic and Expanded CURFs

History of changes

Further Information

See National Nutrition and Physical Activity Survey: Methodology, 2023 for further information about the 2023 NNPAS cycle.

See Australian Health Survey: Users’ Guide, 2011-13 for further information about the 2011-12 NNPAS cycle.

Previous catalogue number

This release previously used catalogue number 4324.0.55.002.
