# Microdata and TableBuilder: Census of Population and Housing

Designed for complex data queries such as detailed analysis and modelling on appropriately confidentialised unit record data

## Introduction

This publication provides a range of information about the release of microdata from the Census. Microdata products are the most detailed information available from the Census. They contain data which is either responses to individual questions on the Census form or data derived from answers to two or more questions. Microdata is released with the approval of the Australian Statistician.

This publication includes:

• how to apply for and use the microdata
• information about the conditions of use
• information on the quality of the microdata.

The Census microdata files comprise:

• TableBuilder datasets
• basic microdata files
• detailed microdata files.

Subject to limitations in the data classifications used, these files enable users to tabulate, manipulate and analyse data to their own specifications.

The detailed microdata files contain small systematic samples of confidentialised occupied private dwellings and non-private dwellings, with their associated family and person records.

The basic microdata file contains a small systematic sample of confidentialised occupied private dwellings with their associated family and person records, and a random sample of persons from all non-private dwellings together with a record for the associated non-private dwelling.

### Available products

The Census microdata files are available in:

• ABS DataLab – analyse detailed microdata within the ABS’ secure system

To access these products and to learn more, see the Microdata Entry page.

### Apply for access

Before applying for access, users should read the Responsible use of ABS microdata user guide to understand the obligations when using microdata.

To choose the best data product or service for you and to learn how to access, see Compare data services.

### Data available on request

Data obtained in the Census but not contained in the Census data products may be available from the ABS, on request, as statistics in tabulated form. Subject to confidentiality and sampling variability constraints, special tabulations can be produced incorporating variables, populations and geographic areas selected to meet individual requirements. These are available on a fee-for-service basis. Enquiries should be submitted via an Information consultancy form.

To view variables available for request, refer to either the current 2021 Census dictionary or the historical dictionaries from previous Census years.

## Using Census data in TableBuilder

TableBuilder is an online data tool in which you can create tables of ABS microdata. It is designed to help you produce data specific to your needs through a flexible online user interface.

Within TableBuilder, you can:

• construct tables of Census data for a range of geographic areas, including small area geographies like Mesh Blocks, Statistical Area Level 1s or Postal Areas
• display data by counts or percentages in your table
• create, save and share customised geographic areas and recodes with other registered users.

Information on variables (also referred to as data items) can be found in the data item lists in the Data downloads. Detailed information about Census variables and concepts can be found in either the current 2021 Census dictionary or the historical dictionaries from previous Census years.

System restrictions have been implemented which prevent the cross-tabulation of certain variables within several Census TableBuilder Pro datasets.

These restrictions have been applied to:

• maintain the confidentiality of respondents
• ensure the output of quality data
• assist users by not allowing combinations of variables that statistically should not be combined.

When the restriction is triggered, the following error message will be displayed: "The variable you are trying to add cannot be used with one of the variables already in the table." Other similar variables may be available. For example, if you are using geographical areas from Mesh Blocks, you may be able to use another geographical area variable instead, such as Main Statistical Area Structure (Main ASGS).

Visit the TableBuilder page for information on how to access and use TableBuilder.

## Using basic microdata

The following information is relevant to 2016 microdata files. 2021 Census data will be released in a basic microdata product as part of the third data release in 2023. Planning is underway and details will be shared when confirmed.

The full classification structures for the 2016 basic microdata file variables can be found in the 2016 Census Dictionary.

Many of the classifications in the basic microdata file have been collapsed and the full listings of the basic microdata classifications are detailed in the data items lists in the Data downloads section.

### Identifiers

#### Dwelling, Family and Person IDs

Records at each level are given an identifier:

• Dwelling (Household) - ABSHID
• Family - ABSFID
• Person - ABSPID.

To enable users to link records, the following identifiers are available across levels:

• ABSFID and the related ABSHID on each family record
• ABSPID and the related ABSFID and ABSHID on each person record.
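Because each person record carries its ABSPID together with the related ABSFID and ABSHID, linking the levels is a key lookup. The following is a minimal Python sketch, assuming the records have already been read into dictionaries keyed by column name. It joins families on the (ABSHID, ABSFID) pair, which works whether or not ABSFID is unique on its own, and returns no family for persons without a family record (for example, persons in non-private dwellings). The `FAMILY_TYPE` column and the record values are hypothetical.

```python
def link_persons(persons, families, dwellings):
    """Attach the related family and dwelling record to each person record.

    Records are dicts keyed by column name. Families are looked up on the
    (ABSHID, ABSFID) pair; persons with no matching family record
    (for example, persons in non-private dwellings) get family=None.
    """
    family_by_key = {(f["ABSHID"], f["ABSFID"]): f for f in families}
    dwelling_by_id = {d["ABSHID"]: d for d in dwellings}
    linked = []
    for p in persons:
        family = family_by_key.get((p["ABSHID"], p.get("ABSFID")))
        linked.append({
            "person": p,
            "family": family,
            "dwelling": dwelling_by_id[p["ABSHID"]],
        })
    return linked


# Usage with made-up records (FAMILY_TYPE is a hypothetical column).
dwellings = [{"ABSHID": "1"}, {"ABSHID": "2"}]
families = [{"ABSHID": "1", "ABSFID": "1", "FAMILY_TYPE": "couple"}]
persons = [
    {"ABSHID": "1", "ABSFID": "1", "ABSPID": "1"},
    {"ABSHID": "2", "ABSFID": "", "ABSPID": "1"},
]
linked = link_persons(persons, families, dwellings)
```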

#### Dwelling Indicator for Persons

The DWIP (Dwelling Indicator for Persons) variable was introduced in 2006 to make it easier for users of the microdata files to distinguish between people enumerated in private dwellings and those enumerated in non-private dwellings (without the need to link to the household file). The variable was also included in 2011 and is included in 2016.

The DWIP variable applies to all persons enumerated in an occupied private or non-private dwelling. Categories are:

1. Enumerated in an occupied private dwelling
2. Enumerated in a non-private dwelling.

As migratory, off-shore and shipping areas were not included in the sample, there is no 'Not applicable' category for this variable.
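Because DWIP sits on the person record, persons can be split by dwelling type without touching the dwelling file. A small Python sketch, assuming DWIP is stored as the codes 1 and 2 listed above (read as text from a CSV file, hence the conversion to `int`):

```python
from collections import Counter

# Category codes as listed above for DWIP.
DWIP_LABELS = {
    1: "Enumerated in an occupied private dwelling",
    2: "Enumerated in a non-private dwelling",
}


def count_by_dwip(persons):
    """Count sampled person records by DWIP category.

    Only the person records are needed; there is no link to the dwelling file.
    """
    counts = Counter(int(p["DWIP"]) for p in persons)
    return {DWIP_LABELS[code]: n for code, n in sorted(counts.items())}
```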

#### Geography

The basic microdata file contains information on the geographic area of selected dwellings. For 2016, geographic areas in the basic microdata file are based on the Australian Statistical Geography Standard (ASGS).

To ensure that the information on the file is not likely to enable identification of a person or household, all areas are defined using a minimum population size of 250,000 persons from the full Census (except for the Northern Territory, which had a total population of 228,833 persons). Records are randomly ordered within a region to further reduce the likelihood of individual identification.

All regions can be aggregated to the state level.

Geographic regions are built from Statistical Area Level 4 (SA4) regions and form the basis of the following data items:

• AREAENUM (Area of enumeration)
• REGUCP (Region of usual residence on Census Night)
• REGU1P (Region of usual residence 1 year ago)
• REGU5P (Region of usual residence 5 years ago).

A full list of regions is included in the data item list.

#### Files and file structures

Dwelling, family and person level files are available in the following formats:

• CSV in a comma delimited ASCII text format
• SAS for Windows
• SPSS for Windows
• STATA.

## Using detailed microdata

The following information is relevant to 2016 microdata files. 2021 Census data will be released in a basic microdata product as part of the third data release in 2023. Planning is underway and details will be shared when confirmed.

Detailed microdata files are the ABS' most detailed unit record data and have been designed specifically for use within the DataLab environment. A 5% sample of person, family and household unit record data from the 2016 Census has been released as detailed microdata files into the ABS' DataLab environment.

The full listing of the detailed microdata classifications and the corresponding Census classifications are detailed in the data item lists in the Data downloads section. In some cases these will differ marginally.

Further information about 2016 Census variables can be found in the 2016 Census dictionary. For information about response rates and Census data quality, please visit the Understanding the Census and Census data publication.

### Identifiers

The dwelling, family and person IDs and the DWIP variable on the detailed microdata file are the same as those on the basic microdata file.

#### Geography

The detailed microdata file contains information on the geographic area of selected dwellings and each person's usual residence geographies. For 2016, geographic areas in the file have been based on the ASGS.

A list of the geographic variables available in the detailed microdata file is available in the data item list in the Data downloads section.

#### Files and file structures

##### CSV

These files contain the data in a comma delimited ASCII text format:

• CDM16_dwelling.csv contains the Dwelling level data
• CDM16_family.csv contains the Family level data
• CDM16_person.csv contains the Person level data.

##### SAS

These files contain the data in SAS for Windows format:

• CDM16_dwelling.sas7bdat contains the Dwelling level data
• CDM16_family.sas7bdat contains the Family level data
• CDM16_person.sas7bdat contains the Person level data.

##### SPSS

These files contain the data in SPSS for Windows format:

• CDM16_dwelling.sav contains the Dwelling level data
• CDM16_family.sav contains the Family level data
• CDM16_person.sav contains the Person level data.

##### STATA

These files contain the data in STATA format:

• CDM16_dwelling.dta contains the Dwelling level data
• CDM16_family.dta contains the Family level data
• CDM16_person.dta contains the Person level data.

##### Information files

This file is a SAS library containing formats.

• FORMATS.sas7bcat
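If the CSV versions are used from Python, the standard `csv` module is enough to load the three level files. This is a sketch only: it assumes each file has a header row of variable names and sits in a single directory (the directory is hypothetical). Every value comes back as text, so codes such as DWIP need converting before arithmetic. The SAS, SPSS and STATA versions would instead be opened in those packages.

```python
import csv
from pathlib import Path

# CSV file names as listed above for the 2016 detailed microdata.
FILES = {
    "dwelling": "CDM16_dwelling.csv",
    "family": "CDM16_family.csv",
    "person": "CDM16_person.csv",
}


def read_level(path):
    """Read one level file into a list of dicts, assuming a header row of variable names."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_all(directory):
    """Load the dwelling, family and person files from one (hypothetical) directory."""
    directory = Path(directory)
    return {level: read_level(directory / name) for level, name in FILES.items()}
```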

## Methodology for basic and detailed microdata

### Selection of sample

Data in the Census basic and detailed microdata files represent samples of dwelling, family and person records from the Census. Systematic sampling techniques were utilised to ensure a representative sample across states and territories in each microdata file.

### Detailed microdata and basic microdata files

The detailed microdata file contains a 5% sample of dwelling records, taken from occupied private dwellings and non-private dwellings, and their associated family and person records. That is, the detailed microdata file provides a sample of five occupied private and non-private dwelling records in every hundred from the Census with their associated family and person records.

The 1% basic microdata file provides a sample of one private dwelling record in every hundred from the Census and their associated family and person records. Dwellings with more than six usual residents were removed from the sample to ensure confidentiality of large dwellings. For non-private dwellings, the sampling is applied to persons present where one person in every hundred is selected and the associated dwelling records included on the file.
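The sampling rates above correspond to selecting one record in every 20 (detailed file) or every 100 (basic file). The exact ABS selection procedure is not described beyond being systematic, so the Python sketch below only illustrates generic one-in-k systematic selection with a random start. Sorting by a stratifying variable first (here a hypothetical `state` field) is one common way to spread a systematic sample proportionally, in the spirit of the representativeness across states and territories mentioned earlier.

```python
import random


def systematic_sample(records, interval, seed=None):
    """Select every `interval`-th record after a random start.

    interval=20 gives roughly a 5% sample; interval=100 gives roughly 1%.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return records[start::interval]


# Sorting by a stratifier (here a hypothetical state code) before sampling
# spreads the sample proportionally across states.
dwellings = sorted(
    ({"ABSHID": str(i), "state": i % 8} for i in range(2000)),
    key=lambda d: d["state"],
)
sample = systematic_sample(dwellings, 20, seed=1)
```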

The data are released under the Census and Statistics Act 1905, which has provision for the release of individual level records (unit records) where the information is not likely to enable the identification of a particular person or organisation. Accordingly, there are no names or addresses on the microdata files and other steps, including the following list of actions, are taken to maintain respondent confidentiality.

In both the detailed and basic microdata files:

• Records from the Other Territories, comprising Jervis Bay, Cocos (Keeling) and Christmas Islands, have been excluded from sampling, as have migratory, shipping and off-shore statistical areas.
• Some variables that were collected in the Census have been excluded from the files.

In the basic microdata file, additional confidentiality measures were undertaken:

• Large households (with seven or more usual residents) have been replaced in the sample to ensure confidentiality of large households. A dwelling from a similar geographic region of a similar size (up to six residents) was chosen by random sampling as a replacement for each large household.
• The level of detail of certain variables has been reduced by grouping, ranging or top coding values.
• Where necessary, minor edits were made to individual records.

The nature of the changes made, and the relatively small number of records involved, mean that the effect on the data for analysis purposes is considered negligible. These changes also mean that estimates produced from the microdata files may differ from those published in other Census tools and products.

Data included on the microdata files comprise the key output items for the Census, including person demographics, labour force, education, family and dwelling characteristics. For a full list of available variables (also known as data items) in TableBuilder, detailed and basic microdata files, please see the data item lists in the Data downloads section.

### Changes from previous Census Microdata files

Five new variables have been included on the 2016 detailed microdata file and four on the basic microdata file. These are:

• Indigenous status (INGP) on the persons level
• Indigenous household indicator (INGDWTD) on the dwelling level
• Form type (FTPP) on the persons level
• Status in employment (SIEMP), which is a new item for the 2016 Census and replaces Employment type (EMTP), which was used in 2011 Census output.
• Type of non-private dwelling (NPDD) on the dwelling level (available on the detailed microdata file only).

The following variables underwent changes to their classifications in the 2016 Census:

• Ancestry (ANC1P, ANC2P)
• Birthplace of mother (BPMP)
• Birthplace of father (BPFP)
• Income classifications for persons (INCP), family (FINF, FINASF, FIDF) and household (HIND, HINASD, HIDD, HIED)
• Religious affiliation (RELP)
• Year of arrival in Australia (YARP), to accommodate the years between the 2011 and 2016 censuses.

### Estimation procedure

An estimate of the total for an item can be obtained by totalling the item for the relevant Census microdata file and then multiplying the result by 20 for the detailed microdata file, or by 100 for the basic microdata file. Note that this estimate of the total will not correspond exactly to the total that would be obtained from the full Census, firstly because of the sampling error due to the microdata files containing only a sample of Census records, and secondly, in the basic microdata file, because of the exclusion of large households.

Averages from the microdata files, such as the proportion of persons falling into a particular category, can be used as estimates of the corresponding averages in the Census. For example, the proportion of Australian born persons who are students is estimated by the proportion of students observed among Australian born persons on the microdata files. If the denominator of such a proportion is known from the full Census, it can be multiplied by the estimated proportion to give an estimate of the numerator. For example, the total number of Australian born students could be estimated by multiplying the above proportion by the Australian born population from the full Census. This is an alternative to the direct estimate (counting the Australian born students on the detailed microdata file and multiplying by 20) that may be preferred in some circumstances, since it is consistent with the known full-Census count.
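The two estimators described above can be compared side by side. In this Python sketch the weights (20 and 100) come from the text; the sample counts and the full-Census Australian born population are made-up illustrative numbers.

```python
# Weights given in the text: 20 for the detailed (5%) file, 100 for the basic (1%) file.
WEIGHTS = {"detailed": 20, "basic": 100}


def direct_total(sample_count, file_type):
    """Estimate a full-Census total by weighting up a sample count."""
    return sample_count * WEIGHTS[file_type]


def ratio_total(numerator_count, denominator_count, census_denominator):
    """Estimate a total as (sample proportion) x (denominator known from the full Census)."""
    return numerator_count * census_denominator / denominator_count


# Illustrative numbers only: 3,000 students among 40,000 Australian-born
# persons on the detailed file, and a hypothetical full-Census
# Australian-born population of 810,000.
print(direct_total(3000, "detailed"))    # 60000
print(ratio_total(3000, 40000, 810000))  # 60750.0
```

The ratio version agrees with the full-Census denominator by construction, which is why it may be preferred when that denominator is known.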

Household, family and person estimates are available for private dwellings in both Census microdata files. For the detailed microdata file, person and household estimates are available for non-private dwellings. For the basic microdata file, only person estimates are available, due to the differing sampling methodologies. Family records are not applicable for non-private dwellings in either file.

### Reliability of estimates

The sampling error should be taken into account when interpreting estimates from the Census microdata files. A measure of the likely difference between an estimate from the Census microdata files and the corresponding full Census value is given by the standard error (SE) of the estimate. The SE indicates the extent to which an estimate might have varied by chance because only a sample of persons was included. There are about two chances in three that a sample estimate will differ by less than one SE from the full Census value, and about 19 chances in 20 that the difference will be less than two SEs. Another measure of sampling variability is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate to which it refers.
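The RSE and the one- and two-SE ranges described above are simple arithmetic on an estimate and its SE. A Python sketch with a hypothetical estimate of 60,000 and SE of 1,500:

```python
def rse_percent(estimate, se):
    """Relative standard error: the SE as a percentage of the estimate."""
    return 100 * se / estimate


def se_interval(estimate, se, k):
    """Range of estimate +/- k SEs.

    k=1 covers the full-Census value about two times in three;
    k=2 covers it about 19 times in 20.
    """
    return estimate - k * se, estimate + k * se


# Hypothetical estimate of 60,000 with an SE of 1,500.
print(rse_percent(60000, 1500))     # 2.5
print(se_interval(60000, 1500, 2))  # (57000, 63000)
```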

Non-sampling errors may occur in any statistical collection (a full count or a sample) and should not be confused with imprecision due to sampling error, which is measured by the SE. In the basic microdata file, non-sampling errors include differences due to the exclusion of large dwellings. In the Census as a whole, there may be inaccuracies because of imperfections in reporting by respondents, errors made in collection (such as when recording responses) and errors made in processing the Census data. It is not possible to quantify non-sampling error, but every effort is made to reduce it to a minimum. For the following examples, non-sampling error is assumed to be zero. In practice, the potential for non-sampling error adds to the uncertainty in the estimates that is caused by sampling variability.

#### Standard error calculation

Both Census microdata files can be treated, for the purposes of standard error calculations, as a simple random sample of dwellings from the private dwelling population. For some analytic purposes, the non-private dwelling population has only a minor influence on results, and it is sufficient to include each person counted in a non-private dwelling as a separate 'dwelling' when calculating standard errors.

##### Dwelling level estimates

Estimates of the SE of averages for dwelling-level items can be obtained using standard formulae for a simple random sample. These standard error formulae require computing the average value of an item of interest per dwelling on the Census microdata file. The formula for $$y_{A V}$$, the estimated average of an item that takes value $$y_d$$ for dwelling $$d$$ out of $$n$$ sampled dwellings in a geographic area, is:

$$y_{A V}=\frac{1}{n} \sum\limits_{d} y_{d}$$

where $$\sum\limits_{d}$$ represents summing over the $$n$$ dwellings.

The standard error estimate $$S E\left(y_{A V}\right)$$ is given by the following formula:

$$S E\left(y_{A V}\right)=\sqrt{\frac{1}{n} \frac{1}{n-1} \sum\limits_{d}\left(y_{d}-y_{A V}\right)^{2}}$$
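The two formulas above translate directly into code. A Python sketch, assuming `y` holds one value $$y_d$$ per sampled dwelling in the geographic area (including zeros); like the formula, it applies no finite population correction.

```python
import math


def average_and_se(y):
    """Return y_AV and SE(y_AV) for per-dwelling values y_d.

    Follows the simple-random-sample formula above (no finite
    population correction). `y` must contain one value per sampled
    dwelling in the geographic area, including zeros.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two dwellings")
    y_av = sum(y) / n
    sum_sq = sum((y_d - y_av) ** 2 for y_d in y)
    se = math.sqrt(sum_sq / (n * (n - 1)))
    return y_av, se
```

The SE here equals the sample standard deviation divided by $$\sqrt{n}$$, so it can be cross-checked with any standard statistics routine.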

The estimate $$y_{T O T}$$ of the total count for this item, and its corresponding SE estimate $$S E\left(y_{T O T}\right)$$, are obtained by multiplying the average per dwelling by the number of dwellings in the geographic area. The number of dwellings is approximated with minimal error by:

$$w \times n$$

where $$w$$ is the weight (20 on the detailed microdata file and 100 on the basic microdata file), since the construction of the Census microdata file ensures proportional representation of geographic areas.

The formulae are as follows:

$$y_{T O T}=w \times n \times y_{A V}$$

$$S E\left(y_{T O T}\right)=w \times n \times S E\left(y_{A V}\right)$$

Note that the geographic area to be used in these calculations should be the smallest geographic area containing the dwellings in question. For example, estimates for a single state should use state as the geographic area.
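Scaling the average and its SE by $$w \times n$$ can be sketched in Python as below. The inputs are hypothetical; `n` should be the number of sampled dwellings in the smallest geographic area containing the dwellings of interest, as the note above requires. Because $$w \times n \times y_{AV} = w \sum_d y_d$$, the total agrees with weighting up the sample count directly.

```python
def total_and_se(y_av, se_av, n, weight):
    """Scale y_AV and SE(y_AV) by the approximate dwelling count w * n.

    n is the number of sampled dwellings in the smallest geographic area
    containing the dwellings of interest; weight is 20 (detailed file)
    or 100 (basic file).
    """
    dwellings = weight * n
    return dwellings * y_av, dwellings * se_av


# Hypothetical: 1,000 sampled dwellings on the detailed file,
# average 1.5 per dwelling with SE 0.2.
y_tot, se_tot = total_and_se(1.5, 0.2, 1000, 20)
```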

##### Person level estimates

The above formulae can be applied to totals of persons by treating $$y_{d}$$ as a person count within the dwelling, i.e. $$y_{d}$$ is the number of persons from dwelling $$d$$ with the characteristic of interest. This makes $$y_{A V}$$ the average number of persons per dwelling having this characteristic, and $$y_{T O T}$$ the total number of persons in the geographic area with this characteristic.

##### Family level estimates

Similarly, estimates for family-level items can be obtained by treating $$y_{d}$$ as a family count within the dwelling, i.e. $$y_{d}$$ is the number of families from dwelling $$d$$ with the characteristic of interest, $$y_{A V}$$ is the average number of families per dwelling having the characteristic, and $$y_{T O T}$$ is the total number of families in the geographic area with the characteristic.
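For person and family estimates, the per-dwelling counts $$y_d$$ have to be built from the person or family records before the dwelling-level formulas are applied. A Python sketch, assuming each record carries ABSHID and that a list of all sampled dwelling IDs in the area is available; the `born_in_australia` flag is a hypothetical derived variable. Dwellings with no matching records are kept as zeros, since the formulas average over all $$n$$ sampled dwellings.

```python
from collections import Counter


def per_dwelling_counts(dwelling_ids, records, has_characteristic):
    """Build y_d: how many records in each sampled dwelling have the characteristic.

    `records` are person (or family) records carrying ABSHID.
    Dwellings with no matching records are kept as 0, because the
    averaging formula runs over all n sampled dwellings.
    """
    hits = Counter(r["ABSHID"] for r in records if has_characteristic(r))
    return [hits[d] for d in dwelling_ids]


# Made-up records; "born_in_australia" is a hypothetical derived flag.
persons = [
    {"ABSHID": "1", "born_in_australia": True},
    {"ABSHID": "1", "born_in_australia": True},
    {"ABSHID": "2", "born_in_australia": False},
    {"ABSHID": "3", "born_in_australia": True},
]
y = per_dwelling_counts(["1", "2", "3"], persons, lambda p: p["born_in_australia"])
```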

##### Clustering of the person sample

For some person level variables, it may be a reasonable approximation to treat the Census microdata files as a simple random sample of persons, even though it is in fact a sample of dwellings. This would involve letting $$d$$ in the above formulae indicate persons rather than dwellings, and replacing $$n$$ by the number of persons in the geographic area of interest. Person level means and associated standard errors could then be obtained by a standard tabulation package applied to the person level data.

Unfortunately, doing this will typically give an underestimate of the actual SE. The extent of this underestimation depends on how clustered the variable of interest is within dwellings - that is, on how often similar values of the variable tend to occur together in the same dwelling. The understatement of standard error will be greatest for variables that are highly clustered within dwellings, such as birthplace.

For this reason, when treating the Census microdata files as a sample of persons, it is appropriate to obtain a measure of the effect of clustering for the variables being investigated. A suitable measure is the design factor (DEFT), given by the ratio of the SE calculated correctly (with dwellings as units) to the SE calculated treating persons as units. Standard errors from the person level analysis can then be multiplied by this factor to adjust for clustering.

The SE ignoring clustering will be denoted by $$S E_{p}\left(y_{T O T}\right)$$, with the subscript $$p$$ indicating that it is calculated at the person level. For the total number of Australian born persons, for example, it can be obtained by taking the person level Census microdata file, creating a variable that takes the value 1 for Australian born persons and 0 otherwise, and using this variable to estimate the total and its SE.

An example using the 2011 Census microdata files showed that, for the total number of Australian born persons, the standard error produced ignoring clustering underestimated the actual standard error by a factor of 2 (that is, a DEFT of 2). Users could expect that other totals (e.g. for geographic regions) for the variable 'Australian-born' would have a similar design factor.
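The DEFT calculation described above can be sketched in Python: compute the SE of the total once with dwellings as units (summing the 0/1 person indicator within each dwelling) and once treating every person as a unit, then take the ratio. The input layout (one list of 0/1 values per sampled dwelling) and the data are assumptions for illustration.

```python
import math


def se_of_total(values, weight):
    """SE of a weighted-up total, w * n * SE(mean), treating each value as one sampled unit."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    return weight * n * math.sqrt(sum_sq / (n * (n - 1)))


def design_factor(indicators_by_dwelling, weight):
    """DEFT = SE with dwellings as units / SE with persons as units.

    indicators_by_dwelling holds, for each sampled dwelling, a list of
    0/1 person values (1 = has the characteristic, e.g. Australian born).
    """
    dwelling_totals = [sum(people) for people in indicators_by_dwelling]
    person_values = [v for people in indicators_by_dwelling for v in people]
    se_dwelling = se_of_total(dwelling_totals, weight)
    se_person = se_of_total(person_values, weight)  # SE_p(y_TOT)
    if se_person == 0:
        raise ValueError("person-level SE is zero; DEFT is undefined")
    return se_dwelling / se_person


# Made-up, fully clustered data: everyone in a dwelling shares the value.
clustered = [[1, 1], [0, 0], [1, 1], [0, 0]]
deft = design_factor(clustered, 20)  # about 1.53 for this made-up data
```

An adjusted person-level SE is then $$S E_{p}\left(y_{T O T}\right)$$ multiplied by the DEFT. With one person per dwelling the two calculations coincide and the DEFT is exactly 1.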

##### Proportions

Simple approximations can be used to estimate the standard error for a ratio of counts. If $$y_{T O T_{1}}$$ and $$y_{T O T_{2}}$$ are estimated totals for two nested categories (i.e. category 2 is a subset of category 1), then writing

$$R S E\left(y_{T O T}\right)=\frac{S E\left(y_{T O T}\right)}{y_{T O T}}$$

for the relative standard error gives the following approximation:

$$R S E\left(\frac{y_{T O T_{2}}}{y_{T O T_{1}}}\right)=\sqrt{R S E\left(y_{T O T_{2}}\right)^{2}-R S E\left(y_{T O T_{1}}\right)^{2}}$$

This formula depends on the two categories being nested, and should not be used for distinct categories.
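A direct Python translation of the nested-ratio approximation, with RSEs expressed as fractions. The example RSEs are hypothetical. The check for a negative difference is a guard added for this sketch: if the subset's RSE is smaller than the larger category's, the approximation does not apply.

```python
import math


def rse_of_proportion(rse_subset, rse_total):
    """Approximate RSE of y_TOT2 / y_TOT1 where category 2 is a subset of category 1.

    RSEs are given as fractions (0.05 = 5%). Only valid for nested categories.
    """
    diff = rse_subset ** 2 - rse_total ** 2
    if diff < 0:
        raise ValueError("expected the subset's RSE to be at least the total's RSE")
    return math.sqrt(diff)


# Hypothetical RSEs: 5% for Australian-born students, 3% for all Australian-born persons.
print(rse_of_proportion(0.05, 0.03))  # approximately 0.04
```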

##### Differences

If two totals are for distinct categories (e.g. in comparing estimates across states), then the difference between two totals has the following SE approximation:

$$S E\left(y_{T O T_{2}}-y_{T O T_{1}}\right)=\sqrt{S E\left(y_{T O T_{2}}\right)^{2}+S E\left(y_{T O T_{1}}\right)^{2}}$$

While this formula will only be exact for differences between separate and uncorrelated (unrelated) characteristics or sub-populations, it is expected to provide a good approximation for most differences likely to be of interest.
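The difference approximation in Python, with hypothetical SEs for two states' estimates:

```python
import math


def se_of_difference(se_1, se_2):
    """Approximate SE of y_TOT2 - y_TOT1 for distinct (non-overlapping) categories."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)


# Hypothetical SEs for two states' estimates.
print(se_of_difference(3000, 4000))  # 5000.0
```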

##### Regression estimates

One use of the sample file will be to examine relationships between variables using regression methods. By treating the dwelling as the sample unit, standard regression packages can be used unweighted and the resulting standard errors and test statistics will be good estimates. For example, a regression model could be derived for $$y_{i}$$, the number of persons in the dwelling needing assistance with core activities, against various characteristics $$x_{1 i}, x_{2 i}, \ldots, x_{k i}$$, such as $$x_{1 i}$$, the number of persons in the dwelling aged over 65 years, to fit the linear regression model:

$$y_{i}=a+b_{1} x_{1 i}+\ldots+b_{k} x_{k i}$$

Measures of model fit and of significance of the parameters $$a, b_{1}, \ldots, b_{k}$$ from the standard package will then be appropriate. Unfortunately, such a linear model may not adequately describe the relationships between variables at a dwelling level.
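To make the dwelling-as-unit approach concrete, the Python sketch below fits the model above with a single predictor ($$k = 1$$) by unweighted ordinary least squares and returns the usual SE of the slope. In practice a standard statistical package with several predictors would be used; the one-predictor version only keeps the arithmetic visible, and the data are made up.

```python
import math


def fit_simple_ols(x, y):
    """Unweighted least-squares fit of y = a + b*x with one predictor.

    Each (x_i, y_i) pair is one sampled dwelling. Returns a, b and the
    usual standard error of b.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return a, b, se_b


# Made-up dwellings: x = persons aged over 65, y = persons needing core-activity assistance.
x = [0, 0, 1, 2, 1, 0, 2, 1]
y = [0, 0, 1, 1, 0, 0, 2, 1]
a, b, se_b = fit_simple_ols(x, y)
```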

If a similar regression is performed treating person as the sample unit, the resulting standard errors and measures of significance could be inaccurate or misleading. This arises because the persons in the sample are clustered within dwellings, and so their responses may be "correlated" or affected by similar influences such as characteristics of the dwelling. The extent to which the measures of significance are affected will depend on how clustered the variable $$y_{i}$$ is likely to be within dwellings.

If a person level analysis is performed, such as a 'logistic analysis' of the probability of a person having a given characteristic, then the effect of clustering should be taken into account when interpreting the outcomes. In particular, SEs are likely to be understated, as discussed in the section Clustering of the person sample, and this will tend to increase the apparent significance of modelled effects.

Techniques are available to perform valid analyses at the person level for a sample that is clustered within dwellings, treating persons as being subject to both person and dwelling effects. These techniques include 'multi-level', 'random effect' and 'mixed' modelling (see footnotes 1 and 2).

By using these techniques, models can be used that do a better job of describing the actual relationships between variables at both person and dwelling level. Statistical packages are widely available to validly perform such analyses.

### Footnotes

1. Goldstein, H., 1995, 'Multilevel Statistical Models', 2nd ed., Edward Arnold, London; Halsted Press, New York.
2. Snijders, T.A.B. and Bosker, R.J., 1999, 'Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modelling', SAGE, London.

Data files

## Previous releases

• Census of Population and Housing, 2011: TableBuilder, Basic microdata, Detailed microdata
• Census of Population and Housing, 2006: TableBuilder, Basic microdata, Detailed microdata
• Census of Population and Housing, 2001: Basic microdata, Detailed microdata
• Census of Population and Housing, 1996: Basic microdata
• Census of Population and Housing, 1991: Basic microdata
• Census of Population and Housing, 1986: Basic microdata
• Census of Population and Housing, 1981: Basic microdata

## History of changes

##### 21/09/2022

Updated to include 2021 Census TableBuilder information.

##### 29/10/2019

2016 Experimental Index of Household Advantage and Disadvantage (IHAD) datasets made available via Census TableBuilder Pro. Release includes supporting changes to 'Introduction' and 'Using TableBuilder for Census Data' chapters, as well as the 'TableBuilder Guest, Basic and Pro Data Items List' in the Data downloads section.

##### 23/08/2019

Additional content: Census TableBuilder Pro system restrictions now included in the 'Using TableBuilder for Census Data' chapter. Changes also made to the 'TableBuilder Guest, Basic and Pro Data Items List' in the Data downloads section.

##### 11/04/2019

Basic CURF made available via Microdata Downloads. Release includes textual changes relating to sampling methodology and availability of Microdata products.

##### 10/01/2018

Updates to expected Basic CURF release date and minor corrections to Detailed Microdata data item list.

## Quality declaration

### Institutional environment

The microdata products addressed in this publication are released in accordance with the conditions specified in the Census and Statistics (Information Release and Access) Determination 2018, made under the Census and Statistics Act 1905. This Determination came into effect on 15 August 2018 and replaced the Statistics Determination 1983. These conditions ensure that confidentiality is maintained while enabling unit record level data to be released. More information on the confidentiality practices can be found in the Data confidentiality guide.

For information on the institutional environment of the ABS, including the legislative obligations of the ABS, please see ABS Legislative Framework.

### Relevance

Microdata is available as TableBuilder datasets, 5% detailed microdata files and a 1% basic microdata file. These microdata files are the most detailed information available about key characteristics of people in Australia on Census Night and are released to support advanced data analysis. These characteristics are generally responses to individual questions on the Census form or data derived from two or more questions.

### Timeliness

The Census and Statistics Act 1905 requires the Australian Statistician to conduct a Census on a regular basis. Since 1961, a Census has been required every five years. Microdata products are usually released within three years of the collection of Census data.

### Accuracy

The microdata files generally contain finer levels of detail for variables than is published in other formats, for example in QuickStats or Community Profiles. For more information on the level of detail provided, see the associated data item lists for the individual microdata products found in the Data downloads section.

Steps to confidentialise the data made available on the microdata files are taken in such a way as to maximise the usefulness of the data while maintaining the confidentiality of respondents. As a result, it may not be possible to exactly reconcile all the statistics produced from the microdata with other published statistics.

### Coherence

It is important for Census microdata to be comparable and compatible with previous censuses and related survey or administrative data sources. However:

• There are differences regarding how the sample has been created in relation to larger households in different Census years.
• The product types have changed over time in response to the evolving institutional environment. This enables more detailed information to be provided for Census variables compared to previous Census years.
• The classifications used for Census data topics change over time.

### Interpretability

The information within this publication should be referred to when using the microdata products. It explains the sample methodology, use of the microdata files, file structure, the data item lists and changes over time.

### Accessibility

Microdata files are available to approved users. Users wishing to access the microdata files should read the Responsible use of ABS microdata web page before applying for access. Users should also familiarise themselves with information available via the microdata entry page.

A full list of available microdata can be viewed via the Available microdata page. More detail regarding types and modes of access to microdata can be found on the Compare data services page.