Microdata: National Aboriginal and Torres Strait Islander Nutrition and Physical Activity Survey

Provides data from the NATSINPAS for key health statistics including nutrition, physical activity and sleep.

Release date and time
01/05/2026 11:30am AEST

Introduction

The National Aboriginal and Torres Strait Islander Nutrition and Physical Activity Survey (NATSINPAS) is designed to provide a range of information about nutrition, physical activity, sedentary behaviour and sleep. This information can be cross classified by selected demographic and socioeconomic characteristics. 

This product provides information about the microdata releases from the two NATSINPAS surveys, 2023 and 2012–13. It includes details about the data files and how to use the different microdata products. Data Item Lists, information about the survey methodology, and a link to microdata for the previous NATSINPAS release (2012–13) are also provided.

The 2023 NATSINPAS is generally comparable to the 2012–13 NATSINPAS. However, due to the time between the NATSINPAS surveys, there have been a number of changes to the content. These changes are mainly due to updates to relevant nutrition and physical activity guidelines, updates to demographic standards (for example, occupation, industry) and the addition of content based on user needs. Major changes that occurred include changes to the dietary recall tool and the use of an accelerometer instead of a pedometer to collect directly measured data on physical activity, inactivity and sleep.

A summary of the main content changes applied in the 2023 NATSINPAS compared with the 2012–13 survey can be found in the NATSINPAS Methodology and Data Item List in the Data downloads section. Additionally, a comparison of food and nutrient collections over time can be found in the Intergenerational Health and Mental Health Study: Concepts, Sources and Methods.

Available products

  • Basic microdata – approved users can download and analyse unit record data in their own environment. This product is currently available for the previous NATSINPAS survey in 2012–13. The 2023 NATSINPAS Basic microdata will be released in the second quarter of 2026. For more information, see the MicrodataDownload page.
  • Detailed microdata – approved users can access DataLab for in-depth and interactive data analysis using a range of statistical software packages. This product is available for both the 2012–13 and 2023 NATSINPAS. For more information, including prerequisites for DataLab access, see the DataLab page.
  • TableBuilder – an online tool for creating tables and graphs, accessed via the ABS website. Only the 2012–13 NATSINPAS data are available in TableBuilder.

File structure

Datasets from the NATSINPAS are hierarchical in nature. A hierarchical data file is an efficient means of storing and retrieving information which describes one to many, or many to many, relationships. For example, a child aged 5-17 may report multiple days on which physical activity was undertaken, and different types of physical activity on each of these days.

Most data are contained as individual characteristics on person records. Estimates are also available at the household level. The data items and related output categories are described in Excel spreadsheets in the Data Item Lists in the Data downloads section.

2023 NATSINPAS file structure

2012–13 NATSINPAS file structure

Counts and weights

2023 NATSINPAS

Number of records by level, NATSINPAS 2023 microdata:

| Level | Record counts (unweighted) | Weighted counts (if applicable) |
|---|---|---|
| Household | 2,097 | 462,188 |
| Person (Selected persons) | 2,879 | 946,233 |
| Conditions | 5,348 | N/A |
| Pre-School Aged Child (2-5) Physical Activity Day | 3,027 | N/A |
| School-Aged Child (5-17) Physical Activity Day | 3,493 | N/A |
| School-Aged Child (5-17) Physical Activity Detail | 4,098 | N/A |
| Adult Physical Activity Day | 8,345 | N/A |
| Adult Physical Activity Detail | 3,006 | N/A |
| Dietary Recall (known as ‘Food level’ in 2012–13 NATSINPAS) | 39,829 | N/A |
| Supplements | 3,175 | N/A |
| Accelerometer - Sleep | 7,140 | N/A |
| Accelerometer - Day | 7,231 | N/A |
| Accelerometer - Hour | 173,544 | N/A |
| Accelerometer - Quarter Hour | 524,536 | N/A |
| Accelerometer - 5-Second | Up to 120,960 rows per file (1,033 files) | N/A |
| Accelerometer - Input 100Hz | Up to 60,480,000 rows per file (1,033 files) | N/A |

2012–13 NATSINPAS

Number of records by level, NATSINPAS 2012–13:

| Levels | Record counts (unweighted) | Weighted counts (if applicable) |
|---|---|---|
| Household level | 2,900 | N/A |
| Persons in Household level (All persons) | 10,275 | N/A |
| Person level (Selected persons) | 4,109 | 609,915 |
| Conditions level | 5,414 | N/A |
| Child 2-4 Physical Activity Day level (NR only) | 4,399 | N/A |
| Child 5-17 Physical Activity Day level (NR only) | 5,063 | N/A |
| Child 5-17 Physical Activity Detailed level (NR only) | 5,982 | N/A |
| Adult Physical Activity level (NR only) | 4,202 | N/A |
| Pedometer level (NR only) | 7,753 | N/A |
| Biomedical level (Persons 5+) | 4,109 | 365,868 |
| Food level | 72,376 | N/A |
| Supplement level | 8,538 | N/A |
| Australian Dietary Guidelines level | 761,280 | N/A |

Weight variables

For the 2023 NATSINPAS, there are two weight variables on the file:

  • Household Weight (FINHHWT) - Household level - Benchmarked
  • Person Weight (FINPERWT) - Selected Person level - Benchmarked to the total population aged 2 years and over.

There is no weight associated with the other levels. This is because the records are repeated for each selected person. If, for example, FINPERWT is merged onto the Conditions level, it will be attached to each condition record and therefore be repeated for each selected person where they have more than one condition. This should be considered when producing tables and analysing microdata.
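The effect of repeating a person weight across condition records can be sketched in Python. This is a minimal illustration under stated assumptions: the records are invented, the list-of-dicts layout and the item name CONDITION are hypothetical, and only ABSHIDD, ABSPIDD and FINPERWT are names taken from the file description.

```python
# Illustrative records only: the values are invented, not survey data.
persons = [
    {"ABSHIDD": "H1", "ABSPIDD": "P1", "FINPERWT": 300.0},
    {"ABSHIDD": "H2", "ABSPIDD": "P1", "FINPERWT": 250.0},
]

# Conditions level: one record per condition, so a person can appear more than once.
# "CONDITION" is a hypothetical item name used only for this sketch.
conditions = [
    {"ABSHIDD": "H1", "ABSPIDD": "P1", "CONDITION": "A"},
    {"ABSHIDD": "H1", "ABSPIDD": "P1", "CONDITION": "B"},
    {"ABSHIDD": "H2", "ABSPIDD": "P1", "CONDITION": "A"},
]

# A person is identified by the household and person identifiers together.
weight_by_person = {(p["ABSHIDD"], p["ABSPIDD"]): p["FINPERWT"] for p in persons}

# Merging the person weight onto the Conditions level repeats it on every condition record.
merged = [dict(c, FINPERWT=weight_by_person[(c["ABSHIDD"], c["ABSPIDD"])]) for c in conditions]

# Summing over condition records estimates the number of conditions, not persons.
weighted_conditions = sum(r["FINPERWT"] for r in merged)

# Counting each person once estimates persons with at least one condition.
people_with_condition = {(r["ABSHIDD"], r["ABSPIDD"]) for r in merged}
weighted_persons = sum(weight_by_person[key] for key in people_with_condition)

print(weighted_conditions, weighted_persons)  # 850.0 550.0
```

The two totals differ because the first person has two condition records, so their weight is counted twice in the first sum.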

For the 2012–13 NATSINPAS, there are two weight variables on the file:

  • Person Weight (IPAFINWT) - Selected Person level - Benchmarked to the total population aged 2 years and over.
  • Biomedical persons (IHMSPERW) - located on the Biomedical level. This weight has been benchmarked to produce Australian population estimates based on Biomedical participants aged 5 years and over. 

There are no weights associated with the Household level. Household variables can be used in conjunction with the Person or Biomedical weights to provide, for example, geographic or household compositional information for selected persons. There are also no weights associated with the other levels. This is because the records are repeated for each person who was selected in the survey. If, for example, IPAFINWT is merged onto the Conditions level, it will be attached to each condition record and therefore be repeated for each person where they have more than one condition. This should be considered when producing tables and analysing microdata. 

Using weights in the 2023 NATSINPAS

The NATSINPAS is a sample survey, so to produce estimates for the in-scope population, you must use weight fields in your calculations. When analysing a Household level item at the household level, you will need to use the household weight. For example, if you wanted to know the number of households in a state, rather than the number of persons living in that state, you need to use the household weight, not the person weight.

Caution should be used when applying the ‘Household’ weight to items from other levels. For example, if the household weight is applied to a selected person level demographic item, such as ‘Sex’, your table will show the number of households with one or more selected persons of that sex. Since up to two people can be selected per household in the NATSINPAS (one adult and one child), a household is counted twice, once for the selected adult and once for the selected child, when both are the same sex.

File content

Available data items

Data items for the 2023 NATSINPAS include:

  • Demographics – age, sex, Indigenous status, main language spoken at home, marital status
  • Household details – size, household composition, Socio-Economic Indexes for Areas (SEIFA), geography
  • Employment – Labour force status, hours usually worked
  • Education – current study status, attainment
  • Household Income
  • Long-term health status relating to diabetes, kidney disease, mental health conditions
  • Risk factors such as tobacco smoking and physical activity
  • 24-hour dietary recall
  • Specific dietary information such as consumption of fruits, vegetables, oils, fats, salt, tap water and dietary supplements
  • Influences on dietary choices
  • Physical and sedentary activity
  • Sleep behaviours
  • Self-reported height and weight
  • Physical Measures – blood pressure, height, weight and waist. 

The Data Item List in the Data downloads section is the definitive source of available data items and categories.

Identifiers

Every record on each level of the file is uniquely identified. See Data Item List in the Data downloads section for details on which ID equates to which level.

Each household has a unique random identifier, ABSHIDD. This identifier appears on the household level and is repeated on each level on each record pertaining to that household. A combination of identifiers for a particular level and all levels above in the hierarchical structure uniquely identifies a record at a particular level. For example, each record on the conditions level is uniquely identified by a combination of the Household, Person and Conditions level identifiers.

The Household record identifier, ABSHIDD, assists with linking people from the same household, and with household characteristics such as geography (located on the household level) to the Person records. When merging data with a level above, only those identifiers relevant to the level above are required.
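As a rough Python sketch of merging with the level above, the household items can be indexed by ABSHIDD alone and then attached to each person record. The records are invented and STATE is a hypothetical item name; only ABSHIDD and ABSPIDD come from the text.

```python
# Illustrative records only. "STATE" is a hypothetical household-level item name.
households = [
    {"ABSHIDD": "H1", "STATE": "NSW"},
    {"ABSHIDD": "H2", "STATE": "QLD"},
]
persons = [
    {"ABSHIDD": "H1", "ABSPIDD": "P1"},
    {"ABSHIDD": "H1", "ABSPIDD": "P2"},
    {"ABSHIDD": "H2", "ABSPIDD": "P1"},
]

# Only the identifier of the level above (ABSHIDD) is needed for the merge.
household_by_id = {h["ABSHIDD"]: h for h in households}

# Attach the household item to every person record from that household.
persons_with_geography = [
    {**p, "STATE": household_by_id[p["ABSHIDD"]]["STATE"]} for p in persons
]

print([p["STATE"] for p in persons_with_geography])  # ['NSW', 'NSW', 'QLD']
```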

Multi-response items

Several questions in the survey allowed respondents to provide one or more responses. Each response category for these multi-response data items is treated as a separate data item. In the detailed microdata, these data items share the same identifier (SAS name) prefix but are each separately suffixed with a letter - A for the first response, B for the second response, C for the third response and so on.

For example, the multi-response data item 'All types of physical activity undertaken in last week' (PATYPEW) has seven response categories. There are seven data items named PATYPEWA, PATYPEWB, PATYPEWC, ..., PATYPEWG. Each data item in the series will have either a positive response code or a null response code, with the exception of the first item in the series, PATYPEWA.

PATYPEWA has four potential response codes: 

  • code 0 – null response
  • code 1 – 'Walking for exercise, recreation or sport' – positive response
  • code 8 – 'No physical activity in last week'
  • code 9 – 'Not applicable'. 

The remaining items PATYPEWB, PATYPEWC, ..., PATYPEWG have just two response codes each. The Data Item List identifies all multi-response items and lists each code with its response category. See the Data Item List in the Data downloads section.

Note that the sum of individual multi-response categories will be greater than the population applicable to a particular data item as respondents can select more than one response.
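The note above can be illustrated with a small unweighted Python sketch. The records are invented, and the positive codes used for PATYPEWB onwards (2, 3) are assumptions; the Data Item List is the source for the real codes. Any non-zero code on those items is treated as a positive response, which follows from their having only a positive and a null code.

```python
# The seven item names in the PATYPEW series, A to G.
ITEMS = ["PATYPEW" + letter for letter in "ABCDEFG"]


def make_record(**codes):
    """Build a record with a null response (0) on every item unless a code is given."""
    record = {item: 0 for item in ITEMS}
    record.update(codes)
    return record


# Invented responses; the non-zero codes on B and C are assumed for illustration.
records = [
    make_record(PATYPEWA=1, PATYPEWB=2),  # two activity types reported
    make_record(PATYPEWC=3),              # one activity type reported
    make_record(PATYPEWA=8),              # 'No physical activity in last week'
]


def is_positive(item, code):
    """PATYPEWA also carries codes 8 and 9, so only code 1 is a positive response there."""
    if item == "PATYPEWA":
        return code == 1
    return code != 0


positive_counts = {
    item: sum(is_positive(item, r[item]) for r in records) for item in ITEMS
}
total_responses = sum(positive_counts.values())
respondents_reporting = sum(
    any(is_positive(item, r[item]) for item in ITEMS) for r in records
)

print(total_responses, respondents_reporting)  # 3 2
```

Three positive responses come from two respondents, so category totals should not be added together to obtain a count of people.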

Continuous items

Some continuous data items are allocated special codes for certain responses (e.g. 9999 = 'Not applicable'). Any special codes for continuous (summation) data items are listed in the Data Item List and will be found in the categorical version of the continuous item. However, note that labelling of '0's in the Data Item List does not necessarily mean they are excluded from the ranges (for example - identifying 0 as 'Did not do') as they may still be important in some calculations. Reference should be made to the categorical version of the item to identify which codes are specifically excluded. Therefore, the total shown only represents 'valid responses' of that continuous data item rather than all responses (including special codes). See the Data Item List in the Data downloads section.
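A minimal Python sketch of why special codes matter for continuous items, using invented values and the 9999 'Not applicable' code quoted above. Which codes apply to a given item is an assumption here and should be checked in the Data Item List.

```python
# Special code quoted in the text; confirm the codes for each item in the Data Item List.
NOT_APPLICABLE = 9999

# Invented values for a continuous item, including a genuine zero and a special code.
values = [0, 30, 45, 9999, 60]

# Drop the special code but keep zeros, which can be valid responses.
valid = [v for v in values if v != NOT_APPLICABLE]

mean_valid = sum(valid) / len(valid)
mean_naive = sum(values) / len(values)  # wrongly treats 9999 as a measured value

print(mean_valid)  # 33.75
```

Leaving the special code in place inflates the naive mean to over 2,000, which is why it is removed before summarising.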

Using 2023 NATSINPAS accelerometer microdata files

Accelerometers are a common type of sensor used to study human movement. They are wearable devices that measure linear acceleration – the change in a person’s speed (velocity) per unit time. Acceleration was measured 100 times per second (100Hz) for up to one week resulting in large files (up to 60,480,000 rows per file). 

Accelerometer data is output at different levels of detail. These include the most detailed input 100Hz files, as well as summary files with data per: 

  • 5-seconds
  • quarter-hour (15-minutes)
  • hour
  • day
  • person (weekly, weekday and weekend).

Due to the amount of data on the Input 100Hz and 5-second levels, these data are given as one file per participating person. Users may choose small samples of the data for the populations they are interested in. An example file (DataLab Test File) for the Accelerometer 5-second, Quarter hour and Input 100Hz levels is available in the Data downloads section. More detailed data allow users to perform their own analysis of accelerometer data for specific research needs. 

For all sub-person levels, data is provided for each person who participated in the accelerometer study regardless of whether they met the minimum wear time requirements for inclusion in the publication estimates. Wear time and imputation flags have been provided to enable users to restrict the data.

As discussed in the NATSINPAS methodology, for people living in remote areas, publication estimates were produced using the first four days of data; that is, the first three full 24 hour periods of wear, and the first and last partial 24 hour periods summed. Categories (1, 2, 3, 7) from the data items “Midnight-midnight periods, first-last day summed” on the Accelerometer hour level and day level (variable names: MMORDDAY, MMORDER) can be used to identify these days (see table below). To meet the minimum wear criteria for inclusion in the publication estimates, there must have been at least 48 hours of wear time in this period.

'Midnight-midnight periods, first-last day summed' categories

For people living in non-remote areas, publication estimates were produced for all persons with at least 48 hours of wear time.
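The remote and non-remote rules described above could be sketched in Python as follows. This is not the example code supplied in the DataLab. The wear time item name WEARHRS, the `is_remote` argument and the toy records (including their category values 1 to 7) are hypothetical, and because the text does not say which of MMORDDAY and MMORDER belongs to which level, the item name is passed as a parameter.

```python
# Categories named in the text for the first four days of data.
REMOTE_PERIODS = {1, 2, 3, 7}
MIN_WEAR_HOURS = 48


def publication_days(day_records, is_remote, period_item="MMORDDAY", wear_item="WEARHRS"):
    """Return one person's day records that meet the wear rule, or [] if they do not.

    `wear_item` is a hypothetical name for hours of wear time on each day record.
    """
    if is_remote:
        # Remote areas: keep only the first four days of data.
        kept = [r for r in day_records if r[period_item] in REMOTE_PERIODS]
    else:
        # Non-remote areas: all days are considered.
        kept = list(day_records)
    total_wear = sum(r[wear_item] for r in kept)
    return kept if total_wear >= MIN_WEAR_HOURS else []


# Invented day records for one person: seven periods of 20 hours of wear each.
days = [{"MMORDDAY": k, "WEARHRS": 20.0} for k in range(1, 8)]
print(len(publication_days(days, is_remote=True)))   # 4
print(len(publication_days(days, is_remote=False)))  # 7
```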

For sleep estimates, this population is as stated above with either:

  • the first four nights of data for persons living in remote areas, with at least two nights of sleep data having >80% wear time, or
  • all nights of data with >80% wear time for persons living in non-remote areas.

NOTE: Example R and SAS code to handle the different wear times in remote and non-remote areas is available in the DataLab.

All datasets, except the 5-second and Input 100Hz levels, include columns with record identifiers called ABSHIDD (household ID) and ABSPIDD (person ID), which allow merging. For the 5-second and Input 100Hz levels, the file names contain these identifiers. For more information, see the 5-second and Input 100Hz level sections below. 

The following table shows some example use cases for each of the accelerometer levels in the DataLab:

| Level | Example uses |
|---|---|
| Person level | For analysis of physical activity and sleep per week and split by weekday and weekend days. Also includes flags for wear time thresholds, imputation rates and time zone which can be transferred to other levels. |
| Day level | For analysis of physical activity and sleep by day of the week. This level includes a flag for wear day order (i.e. second 24 hour period, third 24 hour period). |
| Sleep level | For analysis of main sleep periods, physical activity during sleep periods, and sleep analysis by day of the week. |
| Hour level | For analysis of physical activity and sleep by ‘time-of-day’ and per day. |
| Quarter hour level | For analysis of acceleration, luminosity and temperature by time of day in 15-minute increments. Users may apply their own processing methodologies to this input data level, as it has minimal data processing applied. This level is the “long-epoch” dataset created by GGIR in Part 1. Use of this level will result in faster processing times compared to the 5-second and 100Hz levels. |
| 5-second level | For analysis of acceleration in the 5-second (short epoch) time increments. For users who wish to analyse in GGIR with different acceleration thresholds or settings. This level can be used to run GGIR Parts 2-6 with the 5-second epoch set. Use of this semi-processed level will result in faster processing times compared to the 100Hz level. |
| Input 100Hz level | For detailed analysis of the x, y and z axis. For users who wish to use an analysis program other than GGIR, set their own epoch length or develop modelling algorithms related to accelerometry research. |

Day and Sleep levels

The ‘Accelerometer – Day level’ dataset has seven records for each person who participated in the accelerometer study. Each record represents a full 24 hour period from midnight‑to‑midnight during the week the device was worn. Because respondents start and stop wearing the device at different times, the first and last partial days are combined into one complete 24 hour record.

Some sleep data also appears on the ‘Accelerometer – Day level’. However, this does not represent one sleep period because it adds up all the sleep that happened between midnight and midnight. For people who go to sleep before midnight and wake up the next morning, the ‘Day level’ dataset will only include the sleep before midnight (plus any sleep after midnight from the night before).

The ‘Accelerometer – Sleep level’ dataset contains one record for each main sleep period. A main sleep period is defined as the longest period of sustained inactivity (or lack of movement) between midday and midday. Most respondents have 6-7 sleep-level records. For more information, see ‘Measured physical activity and sleep (accelerometer)’ in the methodology.

Quarter hour level

The quarter-hour level dataset has up to 168 hours of data for each respondent who participated in the accelerometer study, broken into 15-minute blocks. It is provided as one SAS or CSV file, and all times are shown in the respondent’s local time. 

An imputation flag is included to show where imputation was done for the higher-level datasets, but the quarter-hour data itself is not imputed. As data on this level do not have imputation, data cleaning or wear thresholds already applied, the outputs may not exactly match the NATSINPAS publication if different methods are used. Users will need to consider data processing methods when using these data.

5-second level

The 5-second level dataset includes calculated acceleration measures and timestamps (in the respondent’s local time) for every 5-seconds. Only respondents who participated in the accelerometer study are included. There is no imputation on this level, and it does not include any information about wear-time rules. As data on this level do not have these already applied, the outputs may not exactly match the NATSINPAS publication if different methods are used. Users will need to consider data processing methods when using these data.

Each respondent has their own CSV file, named using the format: 

“<ABSHIDD>-<ABSPIDD>.csv”

The file name can be used to merge on relevant demography data from higher level datasets. 

Each CSV can have up to 120,960 rows of data (which is seven days of 5-second periods) but may have fewer rows if the device was not worn the whole time. 
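One way to recover the identifiers from a file name is sketched below in Python. It assumes that neither identifier contains a hyphen or a full stop, which the text does not confirm; the path shown is invented.

```python
from pathlib import Path


def ids_from_filename(path):
    """Split a '<ABSHIDD>-<ABSPIDD>.csv' file name into (ABSHIDD, ABSPIDD).

    Assumes the identifiers themselves contain no hyphen or full stop.
    """
    stem = Path(path).name.split(".")[0]      # drop the extension
    household_id, person_id = stem.split("-", 1)
    return household_id, person_id


# Invented file name for illustration.
print(ids_from_filename("five_second/H0001-P01.csv"))  # ('H0001', 'P01')
```

The returned pair can then be used as the key when looking up demographic items for that person on a higher level dataset.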

100Hz level

The 100Hz level is the highest detail data available from the accelerometer study. It matches the device's measurement rate of 100 readings per second.

Each participating respondent has one compressed gzip (.gz) file. Inside this file is a CSV containing all their data. This gzip file works like a ZIP file but usually gives better compression. These CSV files can be opened in common software such as: 

  • R (using gzfile())
  • Python (using the ‘gzip’ module)
  • De-compression tools like 7-Zip. 

Each file includes: 

  • acceleration in three axes (x, y and z)
  • time in milliseconds as an integer since the start of the UNIX epoch. 

Similar to the 5-second level, the file name uses ABSHIDD (household ID) and ABSPIDD (person ID) so the data can be matched to other datasets. As data on this level do not have imputation, data cleaning or wear thresholds already applied, the outputs may not exactly match the NATSINPAS publication if different methods are used. Users will need to consider data processing methods when using these data.

Important note: Reading these files uses a lot of computing power. Users should choose a virtual machine that suits their analysis needs. See information about the virtual machine options at Using your workspace. It is strongly recommended not to decompress all files on DataLab unless necessary, because they take a long time to process and use a lot of storage. For example, a typical 7-day file contains up to 60,480,000 rows and is about 242MB when compressed or 2.07GB when uncompressed.
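As a hedged illustration of working with a compressed file without writing a decompressed copy to disk, Python's gzip module can stream the CSV and stop after a small number of rows. The function name and the idea of sampling the first rows are choices made for this sketch, and no assumption is made about the column layout or whether a header row is present.

```python
import csv
import gzip
from itertools import islice

# Rows in a full week at 100 readings per second.
ROWS_PER_WEEK = 100 * 60 * 60 * 24 * 7  # 60,480,000


def read_first_rows(gz_path, n_rows):
    """Return the first n_rows rows of a gzip-compressed CSV as lists of strings.

    The file is decompressed as a stream, so only the rows requested are read.
    """
    with gzip.open(gz_path, mode="rt", newline="") as handle:
        return list(islice(csv.reader(handle), n_rows))
```

For example, `read_first_rows(path, 1000)` would return the first ten seconds of readings for one participant, where `path` is the location of that participant's .gz file.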

Using accelerometer timestamps

The timestamps in the 100Hz level are 9-digit numbers representing the milliseconds elapsed since 1 January 1970 at 00:00:00 UTC.

These timestamps are designed so that when converted to the respondent’s local time, the day of the week and the clock time (for example, “Thursday” or “9:19am”) match the survey data. The timestamps do not include the real calendar date of when the data was collected. Month and season of interview, as well as each participant’s local time zone, are available on the ‘Person level’.

Example formulae for converting time stamps to AEST (UTC +10:00):

Microsoft Excel:

=A1 / 1000 / 86400 + DATE(1970,1,1) + TIME(10,0,0)

R (general):

as.POSIXct(timestamp / 1000, origin = "1970-01-01", tz = "Australia/Canberra")

R (GGIR):

Parameters: configtz = "UTC" and desiredtz = "Australia/Canberra"

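Python (using the standard library datetime and zoneinfo modules):

from datetime import datetime
from zoneinfo import ZoneInfo
datetime.fromtimestamp(timestamp / 1000, tz = ZoneInfo("Australia/Canberra"))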

SAS:

timestamp / 1000 + HMS(10,00,0) + MDY(1,1,1970) * 86400;
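As a worked illustration in Python, the sketch below converts one invented timestamp to a day of the week and clock time. It uses a fixed UTC+10:00 offset rather than a named time zone, which is an assumption suited to AEST only; other participants' time zones would need their own offset. The function name and the timestamp value are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+10:00 offset for AEST, with no daylight saving adjustment.
AEST = timezone(timedelta(hours=10), "AEST")


def local_day_and_time(timestamp_ms, tz=AEST):
    """Convert milliseconds since the UNIX epoch to (day of week, HH:MM:SS) in tz."""
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=tz)
    return moment.strftime("%A"), moment.strftime("%H:%M:%S")


# Invented 9-digit timestamp: 602,340 seconds after the epoch.
print(local_day_and_time(602_340_000))  # ('Thursday', '09:19:00')
```

Only the day of the week and the clock time are meaningful in the result; the calendar date produced by the conversion is not the date of collection.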

Reliability of estimates

As the survey was conducted on a sample of private households in Australia, it is important to take account of the method of sample selection when deriving estimates from the detailed microdata. This is important because a person's chance of selection in the survey varied depending on the state or territory in which the person lived. If these chances of selection are not accounted for by use of appropriate weights, the results could be biased.

Each household or person record has a main weight (FINHHWT or FINPERWT). This weight indicates how many population units are represented by the sample unit. When producing estimates of sub-populations from the detailed microdata, it is essential that they are calculated by adding the weights of households or persons in each category and not just by counting the sample number in each category. If each household’s or person’s weight were to be ignored when analysing the data to draw inferences about the population, then no account would be taken of a household's or person’s chance of selection or of different response rates across population groups. This could result in estimates produced being biased. The application of weights ensures that estimates will conform to an independently estimated distribution of the population by age, by sex, etc., rather than to the distributions within the sample itself.
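The difference between counting sample records and adding weights can be shown with a small Python sketch. The records and weight values are invented, and SEX is a hypothetical item name; FINPERWT is the person weight named above.

```python
from collections import defaultdict

# Invented person records. "SEX" is a hypothetical item name.
persons = [
    {"SEX": "F", "FINPERWT": 320.0},
    {"SEX": "F", "FINPERWT": 150.0},
    {"SEX": "M", "FINPERWT": 410.0},
]

sample_counts = defaultdict(int)
weighted_estimates = defaultdict(float)
for person in persons:
    sample_counts[person["SEX"]] += 1                         # records in the sample
    weighted_estimates[person["SEX"]] += person["FINPERWT"]   # population represented

print(dict(sample_counts))       # {'F': 2, 'M': 1}
print(dict(weighted_estimates))  # {'F': 470.0, 'M': 410.0}
```

The sample has twice as many records in the first category, yet the weighted estimates are much closer, which is the kind of distortion that ignoring weights would introduce.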

It is also important to calculate a measure of sampling error for each estimate. Sampling error occurs because only part of the population is surveyed to represent the whole population. Sampling error should be considered when interpreting estimates as this gives an indication of accuracy. It reflects the importance that can be placed on interpretations using the estimate. Measures of sampling error include standard error (SE), relative standard error (RSE) and margin of error (MoE). These measures of sampling error can be estimated using the replicate weights. The replicate weight variables provided on the microdata are labelled WHM1XXX (household) and WPM1XXX (person), where XXX represents the number of the given replicate group. The exact number of replicates will vary depending on the survey. The NATSINPAS uses 250 replicate groups for both household and person weights labelled WHM1001 to WHM1250 (household) and WPM1001 to WPM1250 (person).

Using replicate weights for estimating sampling error

Overview of replication methods

ABS household surveys employ complex sample designs and weighting which require special methods for estimating the variance of survey statistics. Variance estimators for a simple random sample are not appropriate for this survey microdata.

A class of techniques called 'replication methods' provide a general process for estimating variance for the types of complex sample designs and weighting procedures employed in ABS household surveys. The ABS uses a method called the Group Jackknife Replication Method. 

A basic idea behind the replication approach is to split the sample into G replicate groups. One replicate group is then dropped from the file and a new set of weights is produced for the remaining sample. This is repeated for all G replicate groups to provide G sets of replicate weights. For each set of replicate weights, the statistic of interest is recalculated and the variance of the full sample statistic is estimated using the variability among the replicate statistics.

The statistics calculated from these replicates are called replicate estimates. Replicate weights provided on the microdata file enable variance of survey statistics, such as means and medians, to be calculated relatively simply. Further technical explanation can be found in Section 4 of Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee).

How to use replicate weights

To calculate the standard error of any statistic derived from the survey data, the method is as follows:

  1. Calculate the estimate of the statistic of interest using the main weight.
  2. Repeat the calculation above for each replicate weight, substituting the replicate weight for the main weight and creating G replicate estimates. In the example where there are 250 replicate weights, you will have 250 replicate estimates.
  3. Use the outputs from steps 1 and 2 as inputs to the formula below to calculate the estimate of the Standard Error (SE) for the statistic of interest.
\[SE\left( y \right) = \sqrt {\;\left( {\frac{{G - 1}}{G}\;} \right)\;\;\;\sum\limits_{g = 1}^G {{{\left( {{y_{(g)}} - y} \right)}^2}} }\]

[equation 1]

  • \(G\) = Number of replicate groups
  • \(g\) = the replicate group number
  • \(y_{\left(g\right)}\) = Replicate estimate for group g, i.e. the estimate of y calculated using the replicate weight for g
  • \(y\) = the weighted estimate of y from the sample.

From the replicate variance you can then derive the following measures of sampling error: relative standard error (RSE) and margin of error (MoE) of the estimate.

\[Relative\ Standard\ Error \left(RSE\right)=\frac{SE}{Estimate}\]

[equation 2]

\[Margin\ of\ Error\left(MoE\right)=1.96 \times SE\]

[equation 3]
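The three steps can be sketched in Python for a weighted mean. This is an illustration under stated assumptions rather than ABS code: the toy records, the item name VALUE and the use of only two replicate weights are invented to keep the example small, whereas the real file carries 250 replicate weights.

```python
import math


def weighted_mean(records, value_name, weight_name):
    """Weighted mean of one item using the named weight."""
    total_weight = sum(r[weight_name] for r in records)
    return sum(r[value_name] * r[weight_name] for r in records) / total_weight


def jackknife_se(records, value_name, main_weight, replicate_weights):
    """Return (estimate, SE) following steps 1 to 3 and equation 1."""
    estimate = weighted_mean(records, value_name, main_weight)                        # step 1
    replicates = [weighted_mean(records, value_name, w) for w in replicate_weights]   # step 2
    g = len(replicates)
    se = math.sqrt((g - 1) / g * sum((rep - estimate) ** 2 for rep in replicates))    # step 3
    return estimate, se


# The 250 person replicate weight names described in the text: WPM1001 to WPM1250.
PERSON_REPLICATES = [f"WPM1{n:03d}" for n in range(1, 251)]

# Invented toy records carrying only the first two replicate weights.
toy = [
    {"VALUE": 10.0, "FINPERWT": 100.0, "WPM1001": 0.0, "WPM1002": 200.0},
    {"VALUE": 20.0, "FINPERWT": 100.0, "WPM1001": 200.0, "WPM1002": 0.0},
]
estimate, se = jackknife_se(toy, "VALUE", "FINPERWT", PERSON_REPLICATES[:2])
rse_percent = se / estimate * 100   # equation 2, expressed as a percentage
moe = 1.96 * se                     # equation 3
```

With real microdata the full `PERSON_REPLICATES` list would be passed, and `weighted_mean` could be swapped for any other statistic that takes a weight name.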

 

An example in calculating the SE for an estimate of the mean

Suppose you are calculating the mean value of earnings, y, in a sample.  Using the main weight produces an estimate of $500.

You have 5 sets of Group Jackknife replicate weights and using these weights (instead of the main weight) you calculate 5 replicate estimates of $510, $490, $505, $503, $498 respectively. 

To calculate the standard error of the estimate you will substitute the following inputs to equation [1]:

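  • \(G\) = 5
  • \(y\) = 500
  • \(g\) = 1, \(y_{\left(g\right)}\) = 510
  • \(g\) = 2, \(y_{\left(g\right)}\) = 490
  • \(g\) = 3, \(y_{\left(g\right)}\) = 505
  • \(g\) = 4, \(y_{\left(g\right)}\) = 503
  • \(g\) = 5, \(y_{\left(g\right)}\) = 498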
\[\begin{align} SE(y) &= \sqrt {\frac{{5 - 1}}{5}\sum\limits_{g = 1}^5 {{{({y_{(g)}} - 500)}^2}} } \\ SE(y) &= \sqrt {\frac{4}{5}({{(510 - 500)}^2} + {{(490 - 500)}^2} + {{(505 - 500)}^2} + {{(503 - 500)}^2} + {{(498 - 500)}^2})} \\ SE(y) &= \sqrt {\frac{4}{5} \times 238} \\ SE\left( y \right) &= 13.8 \end{align}\]

To calculate the RSE you divide the SE by the estimate of \(y\) and multiply by 100 to get a %:

\[\begin{equation} \begin{split} RSE\left(y\right) &= \frac{13.8}{500} \times 100 \\ RSE\left(y\right) &= 2.8\% \end{split} \end{equation}\]

To calculate the margin of error you multiply the SE by 1.96:

\[\begin{equation} \begin{split} Margin\ of\ Error\left(y\right) &= 13.8\times1.96 \\ Margin\ of\ Error\left(y\right) &= 27.05 \end{split} \end{equation}\]
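The arithmetic in this example can be checked with a few lines of Python, using only the figures given above. The variable names are chosen for this sketch.

```python
import math

estimate = 500.0
replicate_estimates = [510.0, 490.0, 505.0, 503.0, 498.0]

g = len(replicate_estimates)
# Sum of squared differences between each replicate estimate and the main estimate.
squared_differences = sum((rep - estimate) ** 2 for rep in replicate_estimates)

se = math.sqrt((g - 1) / g * squared_differences)
rse_percent = se / estimate * 100
moe = 1.96 * se

print(round(se, 1), round(rse_percent, 1), round(moe, 2))  # 13.8 2.8 27.05
```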

Data downloads

Data item lists

Data files

DataLab Test file

Test file for accelerometer - contains one zip file with an example version of the Quarter hour, 5-second and 100Hz levels (250MB).

Previous releases

|  | TableBuilder data series | MicrodataDownload |
|---|---|---|
| National Aboriginal and Torres Strait Islander Nutrition and Physical Activity Survey, 2012–13 | TableBuilder | Basic and Expanded CURFs |

Further information

See National Aboriginal and Torres Strait Islander Nutrition and Physical Activity Survey, Methodology, 2023 for further information about the 2023 NATSINPAS cycle.

See Australian Aboriginal and Torres Strait Islander Health Survey: Users’ Guide, 2012–13 for further information about the 2012–13 NATSINPAS cycle.

Previous catalogue number

This release previously used catalogue number 4715.0.30.002.
