Microdata: Smoker Status
Presents pooled data about smoking from multiple household surveys including the National Health Survey
In 2021–22, the National Health Survey (NHS), National Study of Mental Health and Wellbeing (NSMHW), Survey of Income and Housing (SIH), and Survey of Disability and Carers (SDAC) collected a standard set of information that was pooled to produce the Smoker Status dataset.
The Smoker Status dataset provides data on the prevalence of smoking in Australia and can be cross classified by selected demographic and socio-economic characteristics.
Smoking data has previously been pooled for the 2020-21 and 2017-18 financial years. This data can also be accessed using DataLab; however, due to changes in data sources and collection methodologies, comparisons over time should be made with caution.
The DataLab environment allows real time access to detailed microdata from the Smoker Status pooled dataset.
DataLab is an interactive data analysis solution that lets users run advanced statistical analyses, such as multiple regression and structural equation modelling. The DataLab environment contains up-to-date versions of the SPSS, Stata, SAS and R analytical languages. Controls have been put in place in DataLab to protect against the identification of individuals and organisations. All output from DataLab sessions is cleared by an ABS officer before it is released.
For more information, including prerequisites for DataLab access, please see the About DataLab page.
The following table shows the levels available on the DataLab product and the information contained on those levels:
|Level name|Information contained on level|
|1. Household|Geographic classifications, household size and structure, and dwelling characteristics.|
|2. Selected person|Demographic and socio-economic characteristics of survey respondents, as well as information about smoking.|
The following table shows the hierarchical file structure and the relationship between each level:
|Level name|Relationship|
|1. Household|One record per in-scope household.|
|2. Selected person|Selected persons in household aged 15 years and over.|
Counts and weights
The following table shows the number of records on each level and the weighted counts for the 2017-18, 2020-21, and 2021-22 Smoker Status datasets. This data includes persons aged 15 years and over.
Record counts (unweighted)
There are two weight variables on each Smoker Status dataset. For the 2020-21 and 2021-22 datasets, these are:
- Household weight (HSPDHHWT) - benchmarked to produce household estimates.
- Person weight (HSPDSPWT) - benchmarked to the total population aged 15 years and over.
For the 2017-18 dataset, the weight variables are:
- Household weight (NHIFHHWT) - benchmarked to produce household estimates.
- Person weight (NHIFINWT) - benchmarked to the total population aged 15 years and over.
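As a rough illustration of how the person weight might be applied, the Python sketch below computes a weighted proportion of current smokers from a few made-up person records. The weight variable names are taken from the lists above; the `SMOKER` field, its 0/1 coding, and the records themselves are hypothetical and do not come from the data item list. It is a sketch of the arithmetic only, not the ABS estimation method, and would need to be adapted to one of the languages available in DataLab.

```python
# Person weight variable for each reference period (names from the lists above).
PERSON_WEIGHT = {
    "2017-18": "NHIFINWT",
    "2020-21": "HSPDSPWT",
    "2021-22": "HSPDSPWT",
}


def weighted_prevalence(records, period, flag="SMOKER"):
    """Return the weighted share of records where `flag` is truthy.

    `flag` is a hypothetical 0/1 current-smoker indicator, not a real item name.
    """
    weight_var = PERSON_WEIGHT[period]
    total = sum(r[weight_var] for r in records)
    if total == 0:
        raise ValueError("sum of weights is zero")
    smokers = sum(r[weight_var] for r in records if r[flag])
    return smokers / total


# Made-up person records, for illustration only.
people = [
    {"HSPDSPWT": 1200.0, "SMOKER": 1},
    {"HSPDSPWT": 800.0, "SMOKER": 0},
    {"HSPDSPWT": 2000.0, "SMOKER": 0},
]

prevalence = weighted_prevalence(people, "2021-22")
print(f"{prevalence:.1%}")  # 1200 / 4000 -> 30.0%
```

Keeping the period-to-variable mapping in one place avoids applying a 2020-21 or 2021-22 weight name to the 2017-18 file, where the weight variables are named differently.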
Available data items
Data items include:
- Demographics, such as Age, Sex, Country of Birth, Main language spoken, Marital status
- Household details, such as Type, Size, Household composition, Tenure, SEIFA, Geography
- Labour force status
- Educational attainment
- Self-assessed health status
- Visa status
- Current smoker status.
Each survey reference period has a data item list available in the Data downloads section. Each list provides information on all available data items and categories for that reference period. There may be different data items available for each period. Please refer to each list to confirm it meets your requirements before purchasing the DataLab product.
Every record on each level of the file has a unique identifier. These identifiers, ABSHIDD and ABSPID, appear on both levels of the file.
Each household has a unique fifteen-digit random identifier, ABSHIDD. This identifier appears on the household level and is repeated on every record, at each level, that pertains to that household. The combination of identifiers uniquely identifies a record at a particular level as shown below:
- Household = ABSHIDD
- Selected person = ABSHIDD + ABSPID
The household record identifier, ABSHIDD, can be used to link people from the same household, and to attach household characteristics such as geography (located on the household level) to the person records.
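The linking described above can be sketched in Python as a lookup on ABSHIDD. ABSHIDD and ABSPID are the identifiers named in the text; the `STATE` and `AGE` items, the identifier values, and the records are made up for illustration and are not taken from the data item list. This is one possible approach under those assumptions, not a prescribed method.

```python
# Made-up household-level records; STATE is a hypothetical geography item.
households = [
    {"ABSHIDD": "100000000000001", "STATE": "NSW"},
    {"ABSHIDD": "100000000000002", "STATE": "VIC"},
]

# Made-up selected person records; AGE is a hypothetical item.
persons = [
    {"ABSHIDD": "100000000000001", "ABSPID": "01", "AGE": 34},
    {"ABSHIDD": "100000000000001", "ABSPID": "02", "AGE": 61},
    {"ABSHIDD": "100000000000002", "ABSPID": "01", "AGE": 27},
]


def attach_household(persons, households):
    """Copy household-level items onto each person record via ABSHIDD."""
    by_id = {h["ABSHIDD"]: h for h in households}
    if len(by_id) != len(households):
        raise ValueError("ABSHIDD is not unique on the household level")
    # Person items take precedence if a name exists on both levels.
    return [{**by_id[p["ABSHIDD"]], **p} for p in persons]


linked = attach_household(persons, households)

# ABSHIDD + ABSPID should identify each selected person record uniquely.
person_keys = {(p["ABSHIDD"], p["ABSPID"]) for p in linked}
assert len(person_keys) == len(linked)

for row in linked:
    print(row["ABSHIDD"], row["ABSPID"], row["STATE"], row["AGE"])
```

A person record whose ABSHIDD has no matching household raises a `KeyError` here, which surfaces a broken link instead of silently dropping the record.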
Pooled smoking data was previously released for the 2020-21 and 2017-18 financial years. In 2020-21, the pooled dataset was created using sample from the NHS, General Social Survey (GSS), SIH, Time Use Survey (TUS) and the NSMHW. In 2017-18, the pooled dataset was created using sample from the NHS and SIH only.
While similar in content, each pooled dataset has different data sources and collection methodologies for its financial year, so comparisons over time should be made with caution. In particular, the 2020-21 surveys were conducted during the peak of the COVID-19 pandemic, with the majority of interviews (64%) collected via online, self-complete forms. There were significant impacts on response rates and sample representativeness because interviewer follow-up of non-responding households was not possible. The 2020-21 pooled smoking data is considered a break in series and reflects that specific time point only.
For more information, see the methodology for each dataset. Links to these datasets can be found in Further information.
Data item list
See Insights into Australian smokers, 2021-22 for summary results and methodology information.
See Pandemic insights into current Australian smokers, 2020-21 and Smoking, 2017-18 for summary results and methodology information for previous releases.
Previous catalogue number
This release previously used catalogue number 4324.0.55.004.