Microdata and TableBuilder: Census of Population and Housing

Designed for complex data queries such as detailed analysis and modelling on appropriately confidentialised unit record data

Introduction

About this publication

This publication provides information about microdata from the Census made available via different methods for analytical research. Microdata products contain the most detailed information available from the Census. They contain data which is either the response to individual questions on the Census form or derived from answers to two or more questions.

This publication includes:

  • details about the methodology
  • how to apply for and use the microdata
  • information on the quality of the microdata.

About the microdata

Privacy

The ABS is given the authority to collect, hold and use personal information for Census and statistical purposes as legislated by the Australian Bureau of Statistics Act 1975 and the Census and Statistics Act 1905. Data are released under the Census and Statistics Act 1905, which has provision for the release of individual level records (unit records) where the information is not likely to enable the identification of a particular person or organisation. Census microdata products do not contain names or addresses, and each has different assessment processes conducted and measures applied to ensure it is sufficiently confidentialised.

Microdata products

The Census microdata files comprise:

  • TableBuilder datasets – build your own tables based on underlying microdata
  • 5% sample basic microdata files – download basic microdata from MicrodataDownload to use in your own environment
  • detailed microdata files – analyse detailed microdata within the ABS’s secure DataLab environment.

Use cases

Subject to limitations in the data classifications used, these files enable users to tabulate, manipulate and analyse data to their own specifications. Typical applications include:

  • production of papers, journal articles, books, PhD theses
  • microsimulation
  • modelling
  • conducting detailed analyses
  • producing detailed tabulations in a disaggregated form.

Data quality

For information about 2021 response rates and Census data quality, visit Census methodology.

Apply for access

Before applying for access, users should read the Responsible use of ABS microdata user guide to understand the obligations when using microdata.

To choose the best data product or service for you and to learn how to access, see Compare data services.

The list of variables (also referred to as data items) available within the different Census microdata products is available for download under Data downloads. The 2021 Census dictionary contains detailed information about the Census variables and concepts.

To apply, see How to access in TableBuilder, MicrodataDownload and DataLab.

Data available on request

Data obtained in the Census but not contained in the Census data products may be available from the ABS, on request, as statistics in tabulated form. Subject to confidentiality and sampling variability constraints, special tabulations can be produced incorporating variables, populations and geographic areas selected to meet individual requirements. These are available on a fee-for-service basis. Enquiries should be submitted via an Information consultancy form.

To view variables available for request, refer to either the current 2021 Census dictionary or the historical dictionaries from previous Census years.

Census data in TableBuilder

TableBuilder is an online data tool in which you can create tables of ABS microdata. It is designed to help you produce data specific to your needs through a flexible online user interface.

Within TableBuilder, you can: 

  • construct tables of Census data for a range of geographic areas, including small area geographies like Mesh Blocks, Statistical Area Level 1s or Postal Areas
  • display data by counts or percentages in your table
  • download tables as CSV, Excel and SDMX files
  • create, save and share customised geographic areas and recodes with other registered users.

Access to Census TableBuilder is free of charge. Visit the TableBuilder page for information on how to access and use TableBuilder.

Products

There are eleven 2021 Census TableBuilder products that contain different combinations of Census variables. Census releases both 'Basic' and 'Pro' TableBuilder datasets. The 'Pro' datasets contain a greater range of detail and variables than the 'Basic' datasets and are designed for more complex analysis needs. Each product is designed with different populations and variables to support different analysis scopes.

Further information on people experiencing homelessness, or marginally housed, as calculated from the Census of Population and Housing can be found in Estimating homelessness: Census. The methodological technique for those experiencing homelessness is outlined in Estimating homelessness: Census methodology.

The list of Census TableBuilder products, and the variables available in them, is available for download under Data downloads. Detailed information about Census variables and concepts can be found in either the current 2021 Census dictionary or the historical dictionaries from previous Census years. 

Restrictions

System restrictions have been implemented which prevent the cross-tabulation of certain variables within several Census datasets.

These restrictions have been applied to:

  • maintain the confidentiality of respondents
  • ensure the output of quality data
  • assist users by not allowing combinations of variables that statistically should not be combined.

When the restriction is triggered, the following error message will be displayed:

"The variable you are trying to add cannot be used with one of the variables already in the table."

Other similar variables may be available. For example, if you are using geographical areas from Mesh Blocks, you may be able to use another geographical area variable instead, such as Main Statistical Area Structure (Main ASGS).

Detailed microdata

Detailed microdata files are the ABS’s most detailed unit record data and have been designed specifically for use within the DataLab environment.

Data included on the microdata files comprise the key output items for the Census. This includes data collected in the Census which covers family, household and personal characteristics in topics including cultural diversity, disability and carers, education and training, health, income and work, and service in the Australian Defence Force.

This detailed microdata product includes all person, family, and dwelling records from the 2021 Census. As this is a full file, there is no supporting methodology information. The Methodology for 2016 basic and detailed microdata only applies to the 2016 product as this was a 5% sample file.

Changes to variables

Changes have been made to variables released on the detailed microdata files over the different Census cycles. For example, health and Australian Defence Force service variables are new to the 2021 product and are not available in preceding products. Lists of available variables and their correspondence with Census classifications within the different Census detailed microdata products are available for download under Data downloads. Census classifications can be found in either the current 2021 Census dictionary or the historical dictionaries from previous Census years.

The mnemonic for the Form type variable has changed in the 2021 product to FTCP. It was previously FTPP in earlier detailed microdata products. The variable is otherwise unchanged; that is, the codes and categories for this variable remain the same.

Geography

The detailed microdata file contains information on the geographic area of dwellings and each person’s usual residence geographies. Geographic areas have been based on the Australian Statistical Geography Standard (ASGS).

A list of the geographic variables available in the detailed microdata product is available in the data item list in the Data downloads section. 

Identifiers

Dwelling, family and person IDs

Each record level is given an identifier:

  • Dwelling (Household) - ABSHID
  • Family - ABSFID
  • Person - ABSPID.

To enable users to link records, the following identifiers are available across levels:

  • ABSFID and the related ABSHID on each family record
  • ABSPID and the related ABSFID and ABSHID on each person record.

Dwelling indicator for persons

The Dwelling indicator for persons (DWIP) variable was introduced in 2006 as a way of enabling users of the microdata files to more easily distinguish between those people enumerated in private dwellings and those enumerated in non-private dwellings (without the need to link to the household file).

The DWIP variable applies to all persons enumerated in an occupied private or non-private dwelling. Categories are:

  1. Enumerated in an occupied private dwelling
  2. Enumerated in a non-private dwelling
  3. Enumerated in other dwellings.

2021 detailed microdata files

CSV

These files contain the data in a comma delimited ASCII text format: 

  • census_2021_dwelling.csv contains the Dwelling level data
  • census_2021_family.csv contains the Family level data
  • census_2021_person.csv contains the Person level data.

SAS

These files contain the data in SAS for Windows format:

  • census_2021_dwelling.sas7bdat contains the Dwelling level data
  • census_2021_family.sas7bdat contains the Family level data
  • census_2021_person.sas7bdat contains the Person level data.

STATA

These files contain the data in STATA format:

  • census_2021_dwelling.dta contains the Dwelling level data
  • census_2021_family.dta contains the Family level data
  • census_2021_person.dta contains the Person level data.

Information files

This file is an Excel data item list, containing all variables, categories, and codes:

  • CENSUS 2021 Data Item List.xlsx

Basic microdata

Basic microdata files provide unit record information about persons, families and dwellings and are designed for statistical analysis in your own environment, for example modelling. Approved users will be able to download the files via MicrodataDownload (a secure download system).

Basic microdata contains highly confidentialised data items, most of which are provided in ranges or broad groupings. Lists of available variables and their correspondence with Census classifications within the different Census basic microdata products are available for download under Data downloads. Census classifications can be found in either the current 2021 Census dictionary or the historical dictionaries from previous Census years. 

Data included on the microdata files comprise the key output items for the Census. This includes data collected in the Census which covers family, household and personal characteristics in topics including cultural diversity, disability and carers, education and training, health, income and work, and service in the Australian Defence Force.

The data are released under the Census and Statistics Act 1905, which has provision for the release of individual level records (unit records) where the information is not likely to enable the identification of a particular person or organisation. Accordingly, there are no names or addresses on the microdata files and other steps, including the following list of actions, are taken to maintain respondent confidentiality.

  • Records from the Other Territories, comprising Jervis Bay, Cocos (Keeling) Islands, Norfolk Island, and Christmas Island, have been excluded from sampling, as have migratory, shipping and off-shore statistical areas.
  • Large households (with seven or more usual residents) have been replaced in the sample to ensure confidentiality of large households. A dwelling from a similar geographic region of a similar size (up to six residents) was chosen by random sampling as a replacement for each large household.
  • Some variables that were collected in the Census have been excluded from the files.
  • The level of detail of certain variables has been reduced by grouping, ranging or top coding values.
  • Some individual records from non-private dwellings have been suppressed (removed from the sample).
  • Minor edits were made to individual records.

The nature of the changes made, and the relatively small number of records involved, ensure that the effect on data for analysis purposes is considered negligible. These changes also mean that estimates produced from the microdata files may differ from those published in other Census tools and products.

Changes to variables

Changes have been made to variables released on the basic microdata files over the different Census periods. For example, health and Australian Defence Force service variables are new to the 2021 product and are not available in preceding products.

The following two variables that were available on the 2016 basic microdata are no longer available on the 2021 basic microdata:

  • Dwelling internet connection (NEDD) due to information on access to the internet no longer being collected as part of the Census.
  • Proficiency in spoken English (ENGP) has been replaced by Proficiency in spoken English (ENGLP). The ENGP variable was only applicable to those who used a language other than English or who did not state a language, whereas ENGLP is applicable to all persons.

The mnemonic for the Form type variable has changed in the 2021 product to FTCP. It was previously FTPP in earlier detailed microdata products. The variable is otherwise unchanged; that is, the codes and categories for this variable remain the same.

As a confidentialisation measure, Dwelling (ABSHID) and Family (ABSFID) identifiers are not available for non-private dwelling records in the 2021 basic microdata; instead, they are all set to 9999999999.

2021 basic microdata methodology

Selection of sample

Data in the Census basic microdata files represent samples of dwelling, family and person records from the Census. Systematic sampling was utilised to ensure a representative sample across states and territories in each microdata file.

The 5% basic microdata file provides a sample of one private dwelling record in every twenty from the Census, together with the associated family and person records. Large households (private dwellings with seven or more usual residents) have been replaced in the sample to ensure data confidentiality. A private dwelling from a similar geographic region, with the same Indigenous household indicator (INGDWTD), and of a similar size (up to six usual residents) was chosen as a replacement for each large private dwelling. For non-private dwellings, the sampling is applied to persons present: one person in every twenty is selected and the associated non-private dwelling records are included on the file.
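The one-in-twenty selection described above can be illustrated with a generic systematic sample. This is only a sketch under stated assumptions: the function name `systematic_sample`, the hypothetical frame of dwelling IDs and the random start are illustrative, and the replacement of large households is not modelled. It is not the ABS's actual selection procedure.

```python
import random


def systematic_sample(records, interval=20, seed=None):
    """Select every `interval`-th record after a random start.

    `records` is assumed to be already sorted (for example by state and
    region) so that the sample is spread proportionally across areas.
    """
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random offset in 0..interval-1
    return records[start::interval]


# Hypothetical frame of 1,000 private dwelling IDs (not real Census IDs).
frame = [f"D{i:04d}" for i in range(1000)]
sample = systematic_sample(frame, interval=20, seed=1)

print(len(sample))  # 50, i.e. 5% of the frame
```

Because the frame size here is a multiple of 20, every possible start gives exactly 50 selections.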

Sample and population counts for persons by dwelling type (1)

| State | Sample counts: persons in occupied private dwellings | Sample counts: persons in non-private dwellings | Sample counts: total persons in dwellings (2) | Population counts: persons in occupied private dwellings | Population counts: persons in non-private dwellings | Population counts: total persons in dwellings (2) |
| --- | --- | --- | --- | --- | --- | --- |
| New South Wales | 387,549 | 8,779 | 396,328 | 7,888,123 | 181,824 | 8,069,947 |
| Victoria | 312,057 | 6,416 | 318,473 | 6,338,365 | 133,987 | 6,472,350 |
| Queensland | 247,256 | 9,262 | 256,518 | 5,020,157 | 190,095 | 5,210,255 |
| South Australia | 85,447 | 2,297 | 87,744 | 1,726,836 | 49,998 | 1,776,838 |
| Western Australia | 125,668 | 5,811 | 131,479 | 2,554,139 | 119,354 | 2,673,495 |
| Tasmania | 26,582 | 814 | 27,396 | 536,831 | 16,705 | 553,532 |
| Northern Territory | 11,140 | 1,279 | 12,419 | 234,833 | 29,176 | 264,011 |
| Australian Capital Territory | 21,430 | 694 | 22,124 | 438,857 | 14,600 | 453,453 |
| Total | 1,217,129 | 35,352 | 1,252,481 | 24,738,138 | 735,746 | 25,473,880 |

  1. Counts are based on place of enumeration, and exclude persons in migratory, off-shore and shipping areas.
  2. Total dwelling counts include occupied private dwellings and non-private dwellings.

Sample and population counts for families in occupied private dwellings (1)

| State | Sample counts | Population counts |
| --- | --- | --- |
| New South Wales | 162,031 | 3,266,807 |
| Victoria | 131,390 | 2,644,759 |
| Queensland | 106,149 | 2,139,206 |
| South Australia | 37,893 | 761,256 |
| Western Australia | 54,047 | 1,088,266 |
| Tasmania | 12,113 | 243,462 |
| Northern Territory | 4,649 | 95,549 |
| Australian Capital Territory | 9,251 | 186,228 |
| Total | 517,523 | 10,425,533 |

  1. Counts exclude families in migratory, off-shore and shipping areas.

Sample and population counts for families by dwelling type (1)

| State | Sample counts: persons in occupied private dwellings | Sample counts: persons in non-private dwellings | Sample counts: total persons in dwellings (2) | Population counts: persons in occupied private dwellings | Population counts: persons in non-private dwellings | Population counts: total persons in dwellings (2) |
| --- | --- | --- | --- | --- | --- | --- |
| New South Wales | 152,436 | 3,315 | 155,751 | 3,058,263 | 6,982 | 3,065,244 |
| Victoria | 125,001 | 2,497 | 127,498 | 2,507,634 | 5,110 | 2,512,744 |
| Queensland | 99,532 | 3,147 | 102,679 | 1,998,020 | 5,085 | 2,003,106 |
| South Australia | 36,049 | 823 | 36,872 | 723,158 | 1,387 | 724,539 |
| Western Australia | 51,297 | 1,399 | 52,696 | 1,029,746 | 2,386 | 1,032,136 |
| Tasmania | 11,438 | 358 | 11,796 | 229,427 | 690 | 230,121 |
| Northern Territory | 4,246 | 370 | 4,616 | 85,370 | 715 | 86,083 |
| Australian Capital Territory | 8,718 | 140 | 8,858 | 174,972 | 187 | 175,168 |
| Total | 488,717 | 12,049 | 500,766 | 9,806,590 | 22,544 | 9,829,138 |

  1. Unoccupied private dwellings, and dwellings in Other Territories and migratory, off-shore and shipping areas are not included as they are out of scope for the basic sample.
  2. Total dwelling counts include occupied private dwellings and non-private dwellings.

Estimation procedure

An estimate of the total for an item can be obtained by totalling the item for the relevant Census microdata file and then multiplying the result by 20 for the basic microdata file. Note that this estimate of the total will not correspond exactly to the total that would be obtained from the full Census, firstly because of the sampling error due to the microdata files containing only a sample of Census records, and secondly, in the basic microdata file, because of the exclusion of large households.
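As a rough illustration of this weighting, the sketch below scales the sample total for persons (the Total row of the persons table above) by 20 and compares it with the full Census count from the same table. The function name `estimate_total` is an assumption made for this example.

```python
WEIGHT = 20  # each record on the 5% file represents about 20 Census records


def estimate_total(sample_total, weight=WEIGHT):
    """Scale a total taken from the basic microdata file to a population estimate."""
    return sample_total * weight


sample_persons = 1_252_481   # total persons on the sample file
census_persons = 25_473_880  # total persons in the full Census

estimate = estimate_total(sample_persons)
relative_gap = (estimate - census_persons) / census_persons

print(estimate)               # 25049620
print(f"{relative_gap:.1%}")  # -1.7%
```

The estimate falls short of the full count, which is consistent with the two sources of difference noted above (sampling error and the replacement of large households), although this sketch does not separate their contributions.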

Averages from the microdata files, such as the proportion of persons falling into a particular category, can be used as an estimate of the corresponding average in the Census. For example, the proportion of Australian born persons who are students is estimated by the proportion of students observed among Australian born persons on the microdata files. Note that if the denominator of such a proportion is known from the full Census then it can be multiplied by the estimated proportion to give an estimate of the numerator. For example, the total number of Australian born students could be estimated by multiplying the above proportion by the Australian born population. This gives an alternative estimate from using one of the microdata files (rather than counting the Australian born students on the basic microdata file and multiplying by 20) that may be preferred in some circumstances, since it is more compatible with the known full-Census count.
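The two approaches in this paragraph can be compared in a short sketch. All figures below are made up for illustration and are not Census results; `ratio_estimate` is a hypothetical helper name.

```python
def ratio_estimate(sample_numerator, sample_denominator, census_denominator):
    """Estimate a total by applying a sample proportion to a known Census count."""
    proportion = sample_numerator / sample_denominator
    return proportion * census_denominator


# Hypothetical sample counts and a hypothetical full-Census denominator.
sample_aus_born = 850_000
sample_aus_born_students = 212_500
census_aus_born = 17_020_000

direct = sample_aus_born_students * 20  # count on the file, multiplied by 20
via_ratio = ratio_estimate(
    sample_aus_born_students, sample_aus_born, census_aus_born
)

print(direct)     # 4250000
print(via_ratio)  # 4255000.0
```

The ratio-based figure is tied to the known full-Census denominator, whereas the direct figure depends only on the sample.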

For private dwellings, person, household, and family level estimates can be calculated. For non-private dwellings, only person level estimates can be calculated, due to the differing methodology for how non-private dwellings are sampled. 

Reliability of estimates

The sampling error should be taken into account when interpreting estimates from the Census microdata files. A measure of the likely difference between an estimate from the Census microdata files and the corresponding full Census value is given by the standard error (SE) of the estimate. The SE indicates the extent to which an estimate might have varied by chance because only a sample of persons was included. There are about two chances in three that a sample estimate will differ by less than one SE from the full Census value, and about 19 chances in 20 that the difference will be less than two SEs. Another measure of sampling variability is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate to which it refers.
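A minimal sketch of these two measures follows, using a hypothetical estimate and SE. The function names are assumptions for illustration; the interval simply applies the "two SEs" rule of thumb from the paragraph above.

```python
def relative_standard_error(estimate, se):
    """Express the SE as a percentage of the estimate it refers to."""
    return 100.0 * se / estimate


def approximate_interval(estimate, se, n_se=2):
    """Range of estimate +/- n_se standard errors (n_se=2: about 19 chances in 20)."""
    return (estimate - n_se * se, estimate + n_se * se)


# Hypothetical estimate and standard error, for illustration only.
estimate, se = 50_000, 1_250

print(relative_standard_error(estimate, se))      # 2.5
print(approximate_interval(estimate, se))         # (47500, 52500)
print(approximate_interval(estimate, se, n_se=1)) # (48750, 51250)
```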

Non-sampling errors may occur in any statistical collection (a full count or a sample) and should not be confused with imprecision due to sampling error, which is measured by the SE. Non-sampling errors in Census microdata files are differences due to the exclusion of large dwellings. In the Census as a whole, there may be inaccuracies that occur because of imperfections in reporting by respondents, errors made in collection (such as when recording responses) and errors made in processing the Census data. It is not possible to quantify non-sampling error, but every effort is made to reduce it to a minimum. For the following examples, non-sampling error is assumed to be zero. In practice, the potential for non-sampling error adds to the uncertainty in the estimates that is caused by sampling variability. 

Standard error calculation 

Census microdata files can be treated, for the purposes of standard error calculations, as a simple random sample of dwellings from the private dwelling population. For some analytic purposes, the non-private dwelling population has only a minor influence on results, and it is sufficient to include each person counted in a non-private dwelling as a separate 'dwelling' when calculating standard errors.

Dwelling level estimates

Estimates of the SE of averages for dwelling-level items can be obtained using standard formulae for a simple random sample. These standard error formulae require computing the average value of an item of interest per dwelling on the Census microdata file. The formula for \(y_{AV}\), the estimated average of an item that takes value \(y_d\) for dwelling \(d\) out of \(n\) sampled dwellings in a geographic area, is:

    \(y_{AV}=\frac{1}{n} \sum\limits_{d} y_{d}\)

where \(\sum\limits_{d}\) represents summing over the \(n\) dwellings.

The standard error estimate \(SE\left(y_{AV}\right)\) is given by the following formula:

    \(SE\left(y_{AV}\right)=\sqrt{\frac{1}{n} \frac{1}{n-1} \sum\limits_{d}\left(y_{d}-y_{AV}\right)^{2}}\)

The estimate \(y_{TOT}\) of the total count for this item, and its corresponding SE estimate \(SE\left(y_{TOT}\right)\), are obtained by multiplying the average per dwelling by the number of dwellings in the geographic area. The number of dwellings is approximated with minimal error by:

    \(w \times n\)

where \(w\) is the weight (20), since the construction of the Census microdata file ensures proportional representation of geographic areas.

The formulae are as follows:

    \(y_{TOT}=w \times n \times y_{AV}\)

    \(SE\left(y_{TOT}\right)=w \times n \times SE\left(y_{AV}\right)\)

Note that the geographic area to be used in these calculations should be the smallest geographic area containing the dwellings in question. For example, estimates for a single state should use state as the geographic area.
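The dwelling level formulae above can be sketched directly in code. The function name and the five-dwelling sample are hypothetical; a real calculation would use all sampled dwellings in the chosen geographic area.

```python
import math


def dwelling_level_estimates(y, weight=20):
    """Apply the simple random sample formulae to per-dwelling values y_d.

    Returns the average per dwelling, the estimated total, and their SEs.
    """
    n = len(y)
    if n < 2:
        raise ValueError("at least two sampled dwellings are needed")
    y_av = sum(y) / n
    sum_sq = sum((y_d - y_av) ** 2 for y_d in y)
    se_av = math.sqrt(sum_sq / (n * (n - 1)))
    scale = weight * n  # approximate number of dwellings in the area
    return {
        "y_av": y_av,
        "se_av": se_av,
        "y_tot": scale * y_av,
        "se_tot": scale * se_av,
    }


# Hypothetical item values for five sampled dwellings.
result = dwelling_level_estimates([2, 0, 1, 3, 4])
print(result["y_av"], result["y_tot"])  # 2.0 200.0
print(round(result["se_av"], 4), round(result["se_tot"], 2))
```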

Person level estimates

The above formulae can be applied to totals of persons by treating the \(y_{d}\) as person counts within the dwelling, i.e. \(y_{d}\) is the number of persons from dwelling \(d\) with the characteristic of interest. This makes \(y_{AV}\) the average number of persons per dwelling having this characteristic, and \(y_{TOT}\) the total number of persons in the geographic area with this characteristic.
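One way to build the per-dwelling person counts is to group person records by their dwelling identifier, as sketched below. The `AUS_BORN` field and the record values are hypothetical (it is not a Census mnemonic); only the ABSHID and ABSPID names come from this publication. Dwellings with no matching persons must still contribute a zero.

```python
from collections import Counter


def count_per_dwelling(persons, sampled_dwelling_ids, has_characteristic):
    """Return the person count y_d for every sampled dwelling, including zeros."""
    counts = Counter(
        p["ABSHID"] for p in persons if has_characteristic(p)
    )
    return [counts[hid] for hid in sampled_dwelling_ids]


# Hypothetical person records; AUS_BORN is an illustrative flag only.
persons = [
    {"ABSPID": "P1", "ABSHID": "H1", "AUS_BORN": True},
    {"ABSPID": "P2", "ABSHID": "H1", "AUS_BORN": True},
    {"ABSPID": "P3", "ABSHID": "H2", "AUS_BORN": False},
    {"ABSPID": "P4", "ABSHID": "H3", "AUS_BORN": True},
    {"ABSPID": "P5", "ABSHID": "H3", "AUS_BORN": False},
]
sampled_dwelling_ids = ["H1", "H2", "H3"]

y = count_per_dwelling(persons, sampled_dwelling_ids, lambda p: p["AUS_BORN"])
print(y)  # [2, 0, 1]
```

The resulting list can then be used as the per-dwelling values in the dwelling level formulae.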

Family level estimates

Similarly, estimates for family-level items can be obtained by treating the \(y_{d}\) as family counts within the dwelling, i.e. \(y_{d}\) is the number of families from dwelling \(d\) with the characteristic of interest, \(y_{AV}\) is the average number of families per dwelling having the characteristic, and \(y_{TOT}\) is the total number of families in the geographic area with the characteristic.

Clustering of the person sample

For some person level variables, it may be a reasonable approximation to treat the Census microdata files as a simple random sample of persons, even though it is in fact a sample of dwellings. This would involve letting \(d\) in the above formulae indicate persons rather than dwellings, and replacing \(n\) by the number of persons in the geographic area of interest. Person level means and associated standard errors could then be obtained by a standard tabulation package applied to the person level data.

Unfortunately, doing this will typically give an underestimate of the actual SE. The extent of this underestimation depends on how clustered the variable of interest is within dwellings - that is, on how often similar values of the variable tend to occur together in the same dwelling. The understatement of standard error will be greatest for variables that are highly clustered within dwellings, such as birthplace.

For this reason, it would be appropriate when treating the Census microdata files as a sample of persons to obtain a measure of the effect of clustering for the variables being investigated. A suitable measure is the design factor (DEFT), given by the ratio of the SE calculated correctly (with dwellings as units) to the SE calculated treating persons as units. Standard errors from the person level analysis can then be adjusted by this factor.

The SE ignoring clustering will be denoted by \(SE_{p}\left(y_{TOT}\right)\), with the subscript \(p\) indicating that it is calculated at the person level. For example, for the total number of Australian born persons, it can be obtained by taking the person level Census microdata file and creating a variable taking the value 1 for Australian born persons and 0 otherwise. This variable is then used to estimate the total and its SE.

An example using the 2011 Census microdata files showed that the standard error produced ignoring clustering underestimates the actual standard error by a factor of 2. Users could expect that other totals (e.g. for geographic regions) for the variable 'Australian-born' would have a similar design factor.
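The design factor can be sketched on a toy sample as the ratio of the two standard errors. The six dwellings below are invented and deliberately clustered; they are not intended to reproduce the factor of 2 reported above, and `se_of_total` is a hypothetical helper that applies the simple random sample formulae to whichever units it is given.

```python
import math


def se_of_total(values, weight=20):
    """SE of an estimated total, treating each value as one sampled unit."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    return weight * n * math.sqrt(sum_sq / (n * (n - 1)))


# Hypothetical sample: one inner list per dwelling, 1 = Australian born.
dwellings = [[1, 1, 1], [0, 0], [1, 1], [0, 0, 0], [1], [0, 0]]

per_dwelling = [sum(d) for d in dwellings]        # dwellings as units
per_person = [x for d in dwellings for x in d]    # persons as units

se_dwelling = se_of_total(per_dwelling)  # accounts for clustering
se_person = se_of_total(per_person)      # ignores clustering
deft = se_dwelling / se_person

print(round(deft, 2))  # greater than 1 because values cluster within dwellings
```

A person level SE could then be multiplied by a factor of this kind to approximate the SE that accounts for clustering.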

Standard errors for proportions and differences

Proportions

Simple approximations can be used to estimate the standard error for a ratio of counts. If \(y_{TOT_{1}}\) and \(y_{TOT_{2}}\) are estimated totals for two nested categories (i.e. category 2 is a subset of category 1), then writing

    \(RSE\left(y_{TOT}\right)=\frac{SE\left(y_{TOT}\right)}{y_{TOT}}\)

for the relative standard error gives the following approximation:

    \(RSE\left(\frac{y_{TOT_{2}}}{y_{TOT_{1}}}\right)=\sqrt{RSE\left(y_{TOT_{2}}\right)^{2}-RSE\left(y_{TOT_{1}}\right)^{2}}\)

This formula depends on the two categories being nested, and should not be used for distinct categories.
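The nested-category approximation can be sketched as follows. The function name and the RSE values are hypothetical; the check for a negative difference is an added safeguard for this example, not part of the published formula.

```python
import math


def rse_of_proportion(rse_subset, rse_superset):
    """Approximate RSE of (subset total / superset total) for nested categories.

    Both arguments must be on the same scale (both percentages or both
    fractions); the result is returned on that scale.
    """
    difference = rse_subset ** 2 - rse_superset ** 2
    if difference < 0:
        raise ValueError("subset RSE is smaller than superset RSE")
    return math.sqrt(difference)


# Hypothetical RSEs in per cent: 5% for the subset total, 3% for the superset.
print(rse_of_proportion(5.0, 3.0))  # 4.0
```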

Differences

If two totals are for distinct categories (e.g. in comparing estimates across states), then the difference between two totals has the following SE approximation:

    \(SE\left(y_{TOT_{2}}-y_{TOT_{1}}\right)=\sqrt{SE\left(y_{TOT_{2}}\right)^{2}+SE\left(y_{TOT_{1}}\right)^{2}}\)

While this formula will only be exact for differences between separate and uncorrelated (unrelated) characteristics or sub-populations, it is expected to provide a good approximation for most differences likely to be of interest.
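A short sketch of the difference formula, with hypothetical totals and SEs for two distinct categories. Comparing the difference with two SEs follows the rule of thumb given under Reliability of estimates; it is an illustration rather than a formal test.

```python
import math


def se_of_difference(se_a, se_b):
    """Approximate SE of the difference between two distinct estimated totals."""
    return math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)


# Hypothetical estimated totals and SEs for two states.
total_a, se_a = 12_000, 300
total_b, se_b = 10_500, 400

difference = total_a - total_b
se_diff = se_of_difference(se_a, se_b)

print(se_diff)                        # 500.0
print(abs(difference) > 2 * se_diff)  # True: the gap exceeds two SEs
```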

Regression estimates

One use of the sample file will be to examine relationships between variables using regression methods. By treating the dwelling as the sample unit, standard regression packages can be used unweighted and the resulting standard errors and test statistics will be good estimates. For example, a regression model could be derived for \(y_{i}\), the number of persons in the dwelling needing assistance with core activities, against various characteristics \(x_{1i}, x_{2i}, \ldots, x_{ki}\) such as \(x_{1i}\), the number of persons in the dwelling aged over 65 years, to fit the linear regression model:

    \(y_{i}=a+b_{1} x_{1 i}+\ldots+b_{k} x_{k i}\)

Measures of model fit and of significance of the parameters \(a, b_{1}, \ldots, b_{k}\) from the standard package will then be appropriate. Unfortunately, such a linear model may not adequately describe the relationships between variables at a dwelling level.

If a similar regression is performed treating person as the sample unit, the resulting standard errors and measures of significance could be inaccurate or misleading. This arises because the persons in the sample are clustered within dwellings, and so their responses may be "correlated" or affected by similar influences such as characteristics of the dwelling. The extent to which the measures of significance are affected will depend on how clustered the variable \(y_{i}\) is likely to be within dwellings.

If a person level analysis is performed, such as a 'logistic analysis' of the probability of a person having a given characteristic, then the effect of clustering should be taken into account when interpreting the outcomes. In particular, SE are likely to be understated, as discussed in the section Clustering of the person sample, and this will tend to increase the apparent significance of modelled effects.

Techniques are available to perform valid analyses at the person level for a sample that is clustered within dwellings, treating persons as being subject to both person and dwelling effects. These techniques include 'multi-level', 'random effect' and 'mixed' modelling (see Footnotes 1 and 2).

By using these techniques, models can be used that do a better job of describing the actual relationships between variables at both person and dwelling level. Statistical packages are widely available to validly perform such analyses.

Geography

The basic microdata file contains information on the geographic area of selected dwellings and for each person's usual residence. For 2021, geographic areas in the basic microdata file are based on the Australian Statistical Geography Standard (ASGS).

To ensure that the information on the file is not likely to enable identification of a person or household, all areas are defined using a minimum population size of 250,000 persons (except for the Northern Territory based on Place of Usual Residence which had a total population of 228,912 persons) from the full Census. Records are randomly ordered within a region to further reduce the likelihood of individual identification. 

All regions can be aggregated to the state level.

Geographic regions are formed from Statistical Area Level 4s and form the basis of the following data items:

  • AREAENUM (Area of enumeration)
  • REGUCP (Region of usual residence)
  • REGU1P (Region of usual residence one year ago)
  • REGU5P (Region of usual residence five years ago).

A full list of regions is included in the data item list, which is available from Data downloads.

Identifiers

Dwelling, family and person IDs

Each record level is given an identifier:

  • Dwelling (Household) - ABSHID
  • Family - ABSFID
  • Person - ABSPID.

To enable users to link records, the following identifiers are available across levels:

  • ABSFID and the related ABSHID on each family record
  • ABSPID and the related ABSFID and ABSHID on each person record.

As a confidentialisation measure, Dwelling (ABSHID) and Family (ABSFID) identifiers are not available for non-private dwelling records in the 2021 basic microdata; instead, they are all set to 9999999999.
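A minimal sketch of linking person records to dwelling records with these identifiers is shown below. The record values and the `TENURE` field are hypothetical, and identifiers are assumed to be read as strings (as they would be from a CSV file); only the ABSHID and ABSPID names and the 9999999999 placeholder come from this publication.

```python
NPD_PLACEHOLDER = "9999999999"  # value set on non-private dwelling records


def attach_dwelling(persons, dwellings):
    """Add the matching dwelling record to each person record.

    Persons carrying the placeholder ABSHID are left unlinked, because the
    placeholder is shared by all non-private dwelling records.
    """
    by_id = {d["ABSHID"]: d for d in dwellings}
    linked = []
    for person in persons:
        hid = person["ABSHID"]
        dwelling = None if hid == NPD_PLACEHOLDER else by_id.get(hid)
        linked.append({**person, "dwelling": dwelling})
    return linked


# Hypothetical records for illustration.
dwellings = [{"ABSHID": "0000000001", "TENURE": "Owned"}]
persons = [
    {"ABSPID": "P1", "ABSHID": "0000000001"},
    {"ABSPID": "P2", "ABSHID": NPD_PLACEHOLDER},
]

linked = attach_dwelling(persons, dwellings)
print(linked[0]["dwelling"]["TENURE"])  # Owned
print(linked[1]["dwelling"])            # None
```

The same pattern applies to family records via ABSFID, with the same caveat for the placeholder value.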

Dwelling indicator for persons

The Dwelling indicator for persons (DWIP) variable was introduced in 2006 as a way of enabling users of the microdata files to more easily distinguish between those people enumerated in private dwellings and those enumerated in non-private dwellings (without the need to link to the household file).

The DWIP variable applies to all persons enumerated in an occupied private or non-private dwelling. Categories are:

  • Enumerated in an occupied private dwelling
  • Enumerated in a non-private dwelling

As migratory, off-shore and shipping areas were not included in the sample, there is no 'Enumerated in other dwellings' category for this variable on the basic microdata.

2021 basic microdata files

CSV

These files contain the data in a comma delimited ASCII text format: 

  • census_2021_basic_dwelling.csv contains the Dwelling level data
  • census_2021_basic_family.csv contains the Family level data
  • census_2021_basic_person.csv contains the Person level data.

SAS

These files contain the data in SAS for Windows format:

  • census_2021_basic_dwelling.sas7bdat contains the Dwelling level data
  • census_2021_basic_family.sas7bdat contains the Family level data
  • census_2021_basic_person.sas7bdat contains the Person level data
  • census_2021_basic_formats.sas7bdat contains formats.

STATA

These files contain the data in STATA format:

  • census_2021_basic_dwelling.dta contains the Dwelling level data
  • census_2021_basic_family.dta contains the Family level data
  • census_2021_basic_person.dta contains the Person level data.

SPSS

These files contain the data in SPSS format:

  • census_2021_basic_dwelling.sav contains the Dwelling level data
  • census_2021_basic_family.sav contains the Family level data
  • census_2021_basic_person.sav contains the Person level data.

Information files

A data item list containing all variables, categories and codes in this basic microdata is available under Data downloads.

Footnotes

  1. Goldstein, H. and Arnold, E, 1995, 'Multilevel Statistical Models', 2nd ed. Halsted Press, New York.
  2. Snijders, Tom A. B. and Bosker, Roel J, 1999, 'Multilevel analysis: an introduction to basic and advanced multilevel modelling', SAGE, London.

Data downloads

Data files

Previous release data downloads

Data files
Release | TableBuilder data series | MicrodataDownload | DataLab
Census of Population and Housing, 2011 | TableBuilder | Basic microdata | Detailed microdata
Census of Population and Housing, 2006 | TableBuilder | Basic microdata | Detailed microdata
Census of Population and Housing, 2001 | - | Basic microdata | Detailed microdata
Census of Population and Housing, 1996 | - | Basic microdata | -
Census of Population and Housing, 1991 | - | Basic microdata | -
Census of Population and Housing, 1986 | - | Basic microdata | -
Census of Population and Housing, 1981 | - | Basic microdata | -

Previous releases

Using 2016 detailed microdata

About the variables

Detailed microdata files are the ABS's most detailed unit record data and have been designed specifically for use within the DataLab environment. A 5% sample of person, family and household unit record data from the 2016 Census has been released as detailed microdata files into the ABS' DataLab environment.

The full listing of the detailed microdata classifications and the corresponding Census classifications are detailed in the data item lists in the Data downloads section. In some cases these will differ marginally.

Further information about 2016 Census variables can be found in the 2016 Census dictionary. For information about response rates and Census data quality, please visit the Understanding the Census and Census data publication.

Identifiers

Dwelling, family and person IDs as well as DWIP for detailed microdata are the same as identifiers for basic microdata. 

Geography

The detailed microdata file contains information on the geographic area of selected dwellings and each person's usual residence geographies. For 2016, geographic areas in the file have been based on the ASGS.

A list of the geographic variables available in the detailed microdata file is available in the data item list in the Data downloads section. 

Files and file structures 

CSV

These files contain the data in a comma delimited ASCII text format: 

  • CDM16_dwelling.csv contains the Dwelling level data
  • CDM16_family.csv contains the Family level data
  • CDM16_person.csv contains the Person level data.

SAS

These files contain the data in SAS for Windows format:

  • CDM16_dwelling.sas7bdat contains the Dwelling level data
  • CDM16_family.sas7bdat contains the Family level data
  • CDM16_person.sas7bdat contains the Person level data.

SPSS

These files contain the data in SPSS for Windows format:

  • CDM16_dwelling.sav contains the Dwelling level data
  • CDM16_family.sav contains the Family level data
  • CDM16_person.sav contains the Person level data.

STATA

These files contain the data in STATA format:

  • CDM16_dwelling.dta contains the Dwelling level data
  • CDM16_family.dta contains the Family level data
  • CDM16_person.dta contains the Person level data.

Information files

This file is a SAS library containing formats.

  • FORMATS.sas7bcat

Using 2016 basic microdata

About the variables

The full classification structures for the 2016 basic microdata file variables can be found in the 2016 Census dictionary.

Many of the classifications in the basic microdata file have been collapsed. The full listings of the basic microdata classifications are detailed in the data item lists in the Data downloads section.

Identifiers

Dwelling, Family and Person IDs 

Each record level is given an identifier:

  • Dwelling (Household) - ABSHID
  • Family - ABSFID
  • Person - ABSPID.

To enable users to link records, the following Identifiers are available across levels:

  • ABSFID and the related ABSHID on each family record
  • ABSPID and the related ABSFID and ABSHID on each person record.

Dwelling indicator for persons

The DWIP (Dwelling indicator for persons) variable was introduced in 2006 as a way of enabling users of the microdata files to more easily distinguish between those people enumerated in private dwellings and those enumerated in non-private dwellings (without the need to link to the household file). This variable was applied in 2011 and is included in 2016 as well.

The DWIP variable applies to all persons enumerated in an occupied private or non-private dwelling. Categories are:

  1. Enumerated in an occupied private dwelling
  2. Enumerated in a non-private dwelling. 

As migratory, off-shore and shipping areas were not included in the sample, there is no 'Not applicable' category for this variable.

Geography

The basic microdata file contains information on the geographic area of selected dwellings. For 2016, geographic areas in the basic microdata file are based on the Australian Statistical Geography Standard (ASGS).

To ensure that the information on the file is not likely to enable identification of a person or household, all areas are defined using a minimum population size of 250,000 persons (except for the Northern Territory which had a total population of 228,833 persons) from the full Census. Records are randomly ordered within a region to further reduce the likelihood of individual identification. 

All regions can be aggregated to the state level.

Geographic regions are formed from Statistical Area Level 4 and form the basis of the following data items: 

  • AREAENUM (Area of enumeration)
  • REGUCP (Region of usual residence on Census Night)
  • REGU1P (Region of usual residence 1 year ago) and
  • REGU5P (Region of usual residence 5 years ago) data items.

A full list of regions is included in the data item list.

Files and file structures

Dwelling, family and person level files are available in the following formats:

  • CSV in a comma delimited ASCII text format
  • SAS for Windows
  • SPSS for Windows
  • STATA.

Methodology for 2016 basic and detailed microdata

Selection of sample

Data in the Census basic and detailed microdata files represent samples of dwelling, family and person records from the Census. Systematic sampling techniques were utilised to ensure a representative sample across states and territories in each microdata file.

The detailed microdata file contains a 5% sample of dwelling records, taken from occupied private dwellings and non-private dwellings, and their associated family and person records. That is, the detailed microdata file provides a sample of five occupied private and non-private dwelling records in every hundred from the Census with their associated family and person records.

The 1% basic microdata file provides a sample of one private dwelling record in every hundred from the Census and their associated family and person records. Dwellings with more than six usual residents were removed from the sample to ensure confidentiality of large dwellings. For non-private dwellings, the sampling is applied to persons present where one person in every hundred is selected and the associated dwelling records included on the file.

The data are released under the Census and Statistics Act 1905, which has provision for the release of individual level records (unit records) where the information is not likely to enable the identification of a particular person or organisation. Accordingly, there are no names or addresses on the microdata files and other steps, including the following list of actions, are taken to maintain respondent confidentiality.

In both the detailed and basic microdata files: 

  • Records from the Other Territories, comprising Jervis Bay, Cocos (Keeling) and Christmas Islands, have been excluded from sampling, as have migratory, shipping and off-shore statistical areas.
  • Some variables that were collected in the Census have been excluded from the files.

In the basic microdata file, additional confidentiality measures were undertaken:

  • Large households (with seven or more usual residents) have been replaced in the sample to ensure confidentiality of large households. A dwelling from a similar geographic region of a similar size (up to six residents) was chosen by random sampling as a replacement for each large household.
  • The level of detail of certain variables has been reduced by grouping, ranging or top coding values.
  • Where necessary, minor edits were made to individual records.

The nature of the changes made, and the relatively small number of records involved, ensure that the effect on data for analysis purposes is considered negligible. These changes also mean that estimates produced from the microdata files may differ from those published in other Census tools and products.

Changes from previous Census Microdata files

There have been 5 new variables included on the 2016 detailed microdata file and 4 new variables on the basic microdata file. These are:

  • Indigenous status (INGP) on the persons level
  • Indigenous household indicator (INGDWTD) on the dwelling level
  • Form type (FTPP) on the persons level
  • Status in employment (SIEMP), which is a new item for the 2016 Census and replaces Employment type (EMTP), which was used in 2011 Census output.
  • Type of non-private dwelling (NPDD) on the dwelling level (available on the detailed microdata file only).

The following variables underwent changes to their classifications in the 2016 Census:

  • Ancestry (ANC1P, ANC2P)
  • Birthplace of mother (BPMP)
  • Birthplace of father (BPFP)
  • Income classifications for persons (INCP), family (FINF, FINASF, FIDF) and household (HIND, HINASD, HIDD, HIED)
  • Religious affiliation (RELP)
  • Year of arrival in Australia (YARP), to accommodate the years between the 2011 and 2016 censuses.

For more information about these variables, refer to the 2016 Census dictionary.

Estimation procedure

An estimate of the total for an item can be obtained by totalling the item for the relevant Census microdata file and then multiplying the result by 20 for the detailed microdata file, or by 100 for the basic microdata file. Note that this estimate of the total will not correspond exactly to the total that would be obtained from the full Census, firstly because of the sampling error due to the microdata files containing only a sample of Census records, and secondly, in the basic microdata file, because of the exclusion of large households.

Averages from the microdata files, such as the proportion of persons falling into a particular category, can be used as an estimate of the corresponding average in the Census. For example, the proportion of Australian born persons who are students is estimated by the proportion of students observed among Australian born persons on the microdata files. Note that if the denominator of such a proportion is known from the full Census then it can be multiplied by the estimated proportion to give an estimate of the numerator. For example, the total number of Australian born students could be estimated by multiplying the above proportion by the Australian born population. This gives an alternative estimate from using one of the microdata files (rather than counting the Australian born students on the detailed microdata file and multiplying by 20) that may be preferred in some circumstances, since it is more compatible with the known full-Census count.
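The two approaches above (scaling a sample count by the weight, and applying a sample proportion to a known full-Census denominator) can be sketched as follows. The function names and all counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not Census figures.

```python
def estimate_total(sample_count, weight):
    """Scale a sample count to a population estimate.

    weight is 20 for the detailed microdata file and 100 for the basic file.
    """
    return sample_count * weight


def ratio_estimate(sample_numerator, sample_denominator, census_denominator):
    """Estimate a numerator total from a sample proportion and a
    denominator that is known from the full Census."""
    proportion = sample_numerator / sample_denominator
    return proportion * census_denominator


# Hypothetical counts, for illustration only.
students_in_sample = 1_200    # Australian born students on the detailed file
aus_born_in_sample = 8_000    # Australian born persons on the detailed file
aus_born_in_census = 161_000  # assumed known full-Census count

direct = estimate_total(students_in_sample, weight=20)
via_ratio = ratio_estimate(students_in_sample, aus_born_in_sample, aus_born_in_census)
```

With these made-up inputs the two estimates differ slightly, because the sample count of Australian born persons multiplied by 20 does not exactly reproduce the assumed full-Census count.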

Household, family and person estimates are available for private dwellings in both Census microdata files. For the detailed microdata file, person and household estimates are available for non-private dwellings. For the basic microdata file, only person estimates are available for non-private dwellings, due to the differing sampling methodologies. Family records are not applicable for non-private dwellings in either file.

Reliability of estimates

The sampling error should be taken into account when interpreting estimates from the Census microdata files. A measure of the likely difference between an estimate from the Census microdata files and the corresponding full Census value is given by the standard error (SE) of the estimate. The SE indicates the extent to which an estimate might have varied by chance because only a sample of persons was included. There are about two chances in three that a sample estimate will differ by less than one SE from the full Census value, and about 19 chances in 20 that the difference will be less than two SEs. Another measure of sampling variability is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate to which it refers.
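As a small illustration of these definitions, the sketch below computes an RSE and the range within two SEs of an estimate. The helper names and the input values are hypothetical.

```python
def relative_standard_error(estimate, se):
    """RSE: the SE expressed as a percentage of the estimate."""
    return 100.0 * se / estimate


def approx_interval(estimate, se, n_se=2):
    """Range of estimate +/- n_se standard errors.

    With n_se=2 this corresponds to roughly 19 chances in 20.
    """
    return (estimate - n_se * se, estimate + n_se * se)


# Hypothetical estimate and SE, for illustration only.
rse = relative_standard_error(24_000, 600)
low, high = approx_interval(24_000, 600)
```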

Non-sampling errors may occur in any statistical collection (a full count or a sample) and should not be confused with imprecision due to sampling error, which is measured by the SE. Non-sampling errors in both Census microdata files are differences due to the exclusion of large dwellings. In the Census as a whole, there may be inaccuracies that occur because of imperfections in reporting by respondents, errors made in collection (such as when recording responses) and errors made in processing the Census data. It is not possible to quantify non-sampling error, but every effort is made to reduce it to a minimum. For the following examples, non-sampling error is assumed to be zero. In practice, the potential for non-sampling error adds to the uncertainty in the estimates that is caused by sampling variability. 

Standard error calculation

Both Census microdata files can be treated, for the purposes of standard error calculations, as a simple random sample of dwellings from the private dwelling population. For some analytic purposes, the non-private dwelling population has only a minor influence on results, and it is sufficient to include each person counted in a non-private dwelling as a separate 'dwelling' when calculating standard errors.

Dwelling level estimates

Estimates of the SE of averages for dwelling-level items can be obtained using standard formulae for a simple random sample. These standard error formulae require computing the average value of an item of interest per dwelling on the Census microdata file. The formula for \(y_{A V}\), the estimated average of an item that takes value \(y_d\) for dwelling \(d\) out of \(n\) sampled dwellings in a geographic area, is:

       \(y_{A V}=\frac{1}{n} \sum\limits_{d} y_{d}\)

where \(\sum\limits_{d}\) represents summing over the \(n\) dwellings.

The standard error estimate \(S E\left(y_{A V}\right)\) is given by the following formula:

      \(S E\left(y_{A V}\right)=\sqrt{\frac{1}{n} \frac{1}{n-1} \sum\limits_{d}\left(y_{d}-y_{A V}\right)^{2}}\)

The estimate \(y_{T O T}\) of the total count for this item, and its corresponding SE estimate \(S E\left(y_{T O T}\right)\), are obtained by multiplying the average per dwelling by the number of dwellings in the geographic area. The number of dwellings is approximated with minimal error by:

      \(w \times n\)

where \(w\) is the weight (20 on the detailed microdata file and 100 on the basic microdata file), since the construction of the Census microdata file ensures proportional representation of geographic areas.

The formulae are as follows:

     \(y_{T O T}=w \times n \times y_{A V}\)

    \(S E\left(y_{T O T}\right)=w \times n \times S E\left(y_{A V}\right)\)

Note that the geographic area to be used in these calculations should be the smallest geographic area containing the dwellings in question. For example, estimates for a single state should use state as the geographic area.
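The dwelling-level formulae above can be sketched in Python as follows. The function names and the example values are hypothetical; the calculation assumes, as the text does, that the sampled dwellings in the area can be treated as a simple random sample.

```python
import math


def dwelling_mean_and_se(values):
    """Return (y_AV, SE(y_AV)) for per-dwelling values y_d."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two dwellings are needed to estimate an SE")
    y_av = sum(values) / n
    sum_sq = sum((y - y_av) ** 2 for y in values)
    se_av = math.sqrt(sum_sq / (n * (n - 1)))
    return y_av, se_av


def total_and_se(values, weight):
    """Return (y_TOT, SE(y_TOT)), both scaled by w * n."""
    n = len(values)
    y_av, se_av = dwelling_mean_and_se(values)
    return weight * n * y_av, weight * n * se_av


# Hypothetical item: number of bedrooms in five sampled dwellings of one area.
bedrooms = [2, 3, 3, 4, 3]
y_tot, se_tot = total_and_se(bedrooms, weight=20)
```

Here `values` should contain one entry for every sampled dwelling in the smallest geographic area that covers the dwellings of interest.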

Person level estimates

The above formulae can be applied to totals of persons by treating the \(y_{d}\) as person counts within the dwelling i.e. \(y_{d}\) is the number of persons from dwelling \(d\) with the characteristic of interest. This makes \(y_{A V}\) the average number of persons per dwelling having this characteristic, and \(y_{T O T}\) the total number of persons in the geographic area with this characteristic.

Family level estimates

Similarly, estimates for family-level items can be obtained by treating the \(y_{d}\) as family counts within the dwelling i.e. \(y_{d}\) is the number of families from dwelling \(d\) with the characteristic of interest, \(y_{A V}\) is the average number of families per dwelling having the characteristic, and \(y_{T O T}\) is the total number of families in the geographic area with the characteristic.
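One way to build the per-dwelling counts used for person-level or family-level estimates is sketched below. Only the ABSHID identifier comes from the publication; the function name, the `STUDENT` field and the records are hypothetical.

```python
from collections import Counter


def counts_per_dwelling(records, dwelling_ids, has_characteristic):
    """Return y_d for every sampled dwelling: the number of its records
    (persons or families) that have the characteristic of interest.

    Dwellings with no such record contribute y_d = 0 and must still be
    included, otherwise n and the average per dwelling would be wrong.
    """
    counts = Counter(r["ABSHID"] for r in records if has_characteristic(r))
    return [counts.get(hid, 0) for hid in dwelling_ids]


# Made-up records; STUDENT is a hypothetical 0/1 field, not a real data item.
dwelling_ids = ["H1", "H2", "H3"]
persons = [
    {"ABSHID": "H1", "STUDENT": 1},
    {"ABSHID": "H1", "STUDENT": 1},
    {"ABSHID": "H2", "STUDENT": 0},
    {"ABSHID": "H3", "STUDENT": 1},
]
y_d = counts_per_dwelling(persons, dwelling_ids, lambda p: p["STUDENT"] == 1)
```

The resulting list can then be used in the dwelling-level formulae for the average, the total and their standard errors.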

Clustering of the person sample

For some person level variables, it may be a reasonable approximation to treat the Census microdata files as a simple random sample of persons, even though it is in fact a sample of dwellings. This would involve letting \(d\) in the above formulae indicate persons rather than dwellings, and replacing \(n\) by the number of persons in the geographic area of interest. Person level means and associated standard errors could then be obtained by a standard tabulation package applied to the person level data.

Unfortunately, doing this will typically give an underestimate of the actual SE. The extent of this underestimation depends on how clustered the variable of interest is within dwellings - that is, on how often similar values of the variable tend to occur together in the same dwelling. The understatement of standard error will be greatest for variables that are highly clustered within dwellings, such as birthplace.

For this reason, it would be appropriate when treating the Census microdata files as a sample of persons to obtain a measure of the effect of clustering for the variables being investigated. A suitable measure is the design factor (DEFT), given by the ratio of the SE calculated correctly (with dwellings as units) to the SE calculated treating persons as units. Standard errors from the person level analysis can then be adjusted by this factor.

The SE ignoring clustering will be denoted by \(S E_{p}\left(y_{T O T}\right)\), with the subscript \(p\) indicating that it is calculated at the person level. For example, for an estimate of the total number of Australian born persons, this can be obtained by taking the person level Census microdata file and creating a variable taking the value 1 for Australian born persons and 0 otherwise. This variable is then used to estimate the total and its SE.

An example using the 2011 Census microdata files showed that the standard error produced ignoring clustering underestimates the actual standard error by a factor of 2. Users could expect that other totals (e.g. for geographic regions) for the variable 'Australian-born' would have a similar design factor.
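A design factor can be computed along the following lines. This is a sketch on a tiny made-up sample, not a reproduction of the 2011 example; the function names and the data layout (a list of 0/1 flags per dwelling) are assumptions made for illustration.

```python
import math


def se_of_mean(values):
    """SE of a mean, treating the given units as a simple random sample."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n * (n - 1)))


def design_factor(person_flags_by_dwelling, weight):
    """Ratio of the dwelling-based SE of a person total to the person-based SE."""
    # Correct calculation: dwellings are the units, y_d is a count per dwelling.
    y_d = [sum(flags) for flags in person_flags_by_dwelling]
    se_dwelling = weight * len(y_d) * se_of_mean(y_d)
    # Naive calculation: every person is treated as an independent unit.
    flags = [f for dwelling in person_flags_by_dwelling for f in dwelling]
    se_person = weight * len(flags) * se_of_mean(flags)
    return se_dwelling / se_person


# Hypothetical sample: 1 = has the characteristic, 0 = does not.
# Values are identical within each dwelling, i.e. fully clustered.
sample = [[1, 1], [0, 0], [1, 1], [0, 0]]
deft = design_factor(sample, weight=20)
```

A value above 1 indicates that person-level standard errors understate the true sampling variability and should be multiplied by the design factor.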

Standard errors for proportions and differences

Proportions

Simple approximations can be used to estimate the standard error for a ratio of counts. If \(y_{T O T_{1}}\) and \(y_{T O T_{2}}\) are estimated totals for two nested categories (i.e. category 2 is a subset of category 1) then writing 

   \(R S E\left(y_{T O T}\right)=\frac{S E\left(y_{T O T}\right)}{y_{T O T}}\)

for the relative standard error gives the following approximation:

    \(R S E\left(\frac{y_{T O T_{2}}}{y_{T O T_{1}}}\right)=\sqrt{R S E\left(y_{T O T_{2}}\right)^{2}-R S E\left(y_{T O T_{1}}\right)^{2}}\)

This formula depends on the two categories being nested, and should not be used for distinct categories.
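The approximation for nested categories can be written as a short function. The name `rse_of_proportion` and the input values are hypothetical; RSEs are passed as fractions here (0.05 means 5%).

```python
import math


def rse_of_proportion(rse_subset, rse_whole):
    """Approximate RSE of a ratio of two totals where the numerator
    category is a subset of the denominator category."""
    if rse_subset < rse_whole:
        # The approximation is undefined when the term under the root is negative.
        raise ValueError("the subset RSE should not be smaller than the RSE of the whole")
    return math.sqrt(rse_subset ** 2 - rse_whole ** 2)


# Hypothetical RSEs, for illustration only.
rse_ratio = rse_of_proportion(rse_subset=0.05, rse_whole=0.03)
```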

Differences

If two totals are for distinct categories (e.g. in comparing estimates across states), then the difference between two totals has the following SE approximation:

    \(S E\left(y_{T O T_{2}}-y_{T O T_{1}}\right)=\sqrt{S E\left(y_{T O T_{2}}\right)^{2}+S E\left(y_{T O T_{1}}\right)^{2}}\)

While this formula will only be exact for differences between separate and uncorrelated (unrelated) characteristics or sub-populations, it is expected to provide a good approximation for most differences likely to be of interest.
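For distinct categories, the SE of a difference combines the two SEs as shown in this minimal sketch. The function name and the input values are hypothetical.

```python
import math


def se_of_difference(se_a, se_b):
    """Approximate SE of the difference between two estimated totals
    for distinct (non-overlapping) categories."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical SEs of two state totals, for illustration only.
se_diff = se_of_difference(300.0, 400.0)
```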

Regression estimates

One use of the sample file will be to examine relationships between variables using regression methods. By treating the dwelling as the sample unit, standard regression packages can be used unweighted and the resulting standard errors and test statistics will be good estimates. For example, a regression model could be derived for \(y_{i}\), the number of persons in the dwelling needing assistance with core activities, against various characteristics \(x_{1 i}, x_{2 i}, \ldots, x_{k i}\) such as \(x_{1 i}\) , the number of persons in the dwelling aged over 65 years, to fit the linear regression model:

    \(y_{i}=a+b_{1} x_{1 i}+\ldots+b_{k} x_{k i}\)

Measures of model fit and of significance of the parameters \(a, b_{1}, \ldots, b_{k}\) from the standard package will then be appropriate. Unfortunately, such a linear model may not adequately describe the relationships between variables at a dwelling level.
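A dwelling-level regression with a single explanatory variable can be sketched with closed-form ordinary least squares, as below. This is an illustration only: the data are made up, the function name is hypothetical, and in practice a statistical package would be used to obtain standard errors and test statistics as well.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical dwelling-level data: one entry per sampled dwelling.
aged_over_65 = [0, 1, 2, 0, 1, 2]        # x_1i: persons aged over 65 years
needing_assistance = [0, 0, 1, 0, 1, 1]  # y_i: persons needing assistance
a, b = simple_ols(aged_over_65, needing_assistance)
```

Because each observation is one dwelling, the sample units match the sample design and no weighting is applied.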

If a similar regression is performed treating person as the sample unit, the resulting standard errors and measures of significance could be inaccurate or misleading. This arises because the persons in the sample are clustered within dwellings, and so their responses may be "correlated" or affected by similar influences such as characteristics of the dwelling. The extent to which the measures of significance are affected will depend on how clustered the variable \(y_{i}\) is likely to be within dwellings.

If a person level analysis is performed, such as a 'logistic analysis' of the probability of a person having a given characteristic, then the effect of clustering should be taken into account when interpreting the outcomes. In particular, SE are likely to be understated, as discussed in the section Clustering of the person sample, and this will tend to increase the apparent significance of modelled effects.

Techniques are available to perform valid analyses at the person level for a sample that is clustered within dwellings, treating persons as being subject to both person and dwelling effects. These techniques include 'multi-level', 'random effect' and 'mixed' modelling. (Footnotes ¹ and ²)

By using these techniques, models can be used that do a better job of describing the actual relationships between variables at both person and dwelling level. Statistical packages are widely available to validly perform such analyses.
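As an illustration only (the notation below is not taken from the original publication), a simple two-level 'random intercept' model for person \(i\) in dwelling \(j\) can be written as:

    \(y_{i j}=a+b_{1} x_{1 i j}+\ldots+b_{k} x_{k i j}+u_{j}+e_{i j}\)

where \(u_{j}\) is a dwelling-level effect shared by all persons in dwelling \(j\) and \(e_{i j}\) is a person-level residual. The shared term \(u_{j}\) is what allows responses within the same dwelling to be correlated, which a person-level regression without it ignores.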

History of changes

15/08/2023

Updated to add 2021 basic microdata information including the addition of the supporting data item list.

27/04/2023

Updated to include 2021 detailed microdata information and updates also made to 'Census data in TableBuilder'. An updated version of the 2021 Census TableBuilder data item list, and a 2021 detailed microdata data item list has been added.

21/09/2022

Updated to include 2021 Census TableBuilder information.

29/10/2019 

2016 Experimental Index of Household Advantage and Disadvantage (IHAD) datasets made available via Census TableBuilder Pro. Release includes supportive changes to 'Introduction' and 'Using TableBuilder for Census Data' chapters, as well as the 'TableBuilder Guest, Basic and Pro Data Items List' in the Data downloads section.

23/08/2019 

Additional content: Census TableBuilder Pro system restrictions now included in the 'Using TableBuilder for Census Data' chapter. Changes also made to the 'TableBuilder Guest, Basic and Pro Data Items List' in the Data downloads section.

11/04/2019

Basic CURF made available via Microdata Downloads. Release includes textual changes relating to sampling methodology and availability of Microdata products.

10/04/2019

Additional Content: Basic CURF data item list available via the Data downloads section.

10/01/2018

Updates to expected Basic CURF release date and minor corrections to Detailed Microdata data item list.

Quality declaration

Institutional environment

The microdata products addressed in this publication are released in accordance with the conditions specified in the Statistics Determination section of the Census and Statistics Act 1905, noting that the Census and Statistics (Information Release and Access) Determination 2018 came into effect on 15 August 2018 and has replaced the Statistics Determination 1983. This ensures that confidentiality is maintained whilst enabling unit record level data to be released. More information on the confidentiality practices can be found in the Data confidentiality guide.

For information on the institutional environment of the ABS, including the legislative obligations of the ABS, please see ABS Legislative Framework.

Relevance

Microdata files are the most detailed information available about key characteristics of people in Australia on Census Night and are released to support advanced data analysis. These characteristics are generally responses to individual questions on the Census form or data derived from two or more questions.

Timeliness

The Census and Statistics Act 1905 requires the Australian Statistician to conduct a Census on a regular basis. Since 1961, a Census has been required every five years. Microdata products are usually released within three years of the collection of Census data.

Accuracy

The microdata files generally contain finer levels of detail of variables than what is otherwise published in other formats, for example in QuickStats or Community Profiles. For more information on the level of detail provided, see the associated data item lists for the individual microdata products found in the Data downloads section.

Steps to confidentialise the data made available on the microdata files are taken in such a way as to maximise the usefulness of the data while maintaining the confidentiality of respondents. As a result, it may not be possible to exactly reconcile all the statistics produced from the microdata with other published statistics.

Coherence

It is important for Census microdata to be comparable and compatible with previous censuses and related survey or administrative data sources. However:

  • There are differences regarding how the sample has been created in relation to larger households in different Census years.
  • The product types have changed over time in response to the evolving institutional environment. This enables more detailed information to be provided for Census variables compared to previous Census years.
  • The classifications used for Census data topics change over time.

Interpretability

The information within this publication should be referred to when using the microdata products. It explains the sample methodology, use of the microdata files, file structure, the data item lists and changes over time.

For more information about Census content, refer to Guide to Census data.

Accessibility

Microdata files are available to approved users. Users wishing to access the microdata files should read the Responsible use of ABS microdata web page before applying for access. Users should also familiarise themselves with information available via the microdata entry page.

A full list of available microdata can be viewed via the Available microdata page. More detail regarding types and modes of access to microdata can be found on the Compare data services page.

Any questions regarding access to microdata can be forwarded to microdata.access@abs.gov.au

Previous catalogue number

This release previously used catalogue number 2037.0.30.001.
