2080.0 - Microdata: Australian Census Longitudinal Dataset, ACLD Quality Declaration 
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 20/03/2019   

METHODOLOGY


SCOPE AND COVERAGE

The ACLD is a random 5% sample of persons enumerated in Australia on each Census Night, with records linked across Censuses using statistical techniques. Three waves of data have contributed to the ACLD so far, from the 2006, 2011 and 2016 Censuses.

The Census covers all areas in Australia and includes persons living in both private and non-private dwellings but excludes:

  • diplomatic personnel of overseas governments and their families
  • Australian residents overseas on Census Night

Overseas visitors are excluded from the 2011 ACLD sample. Visitors within Australia to private and non-private dwellings on Census Night are included.

For more information on the scope and coverage of the Census:

SAMPLE DESIGN

In preparation for adding 2016 Census data to the ACLD, a new panel of 2011 Census records was selected as a representative sample of the 2011 population. The 2011 Panel was designed to include most of the 2011 Census records that were linked in the 2006 Panel, with new records added to account for missed links in the 2006 Panel, and new births and migrants since the 2006 Census.

Sample maintenance

Without sample maintenance, the ACLD would decline in its ability to accurately reflect the Australian population over time, due to:
  • people newly in scope of the ACLD (i.e. children born and immigrants who arrived in Australia since the previous Census) not being represented in the sample,
  • people no longer being in scope due to death or overseas migration, and
  • missing and/or incorrect links.

The 2011 Panel sample was increased slightly to 5.7% to achieve a linked sample size of no greater than 5% of the population, after allowing for missed links and people no longer being in scope due to death or overseas migration. The 2011 Panel sample of 1,221,057 records from the 2011 Census was linked to the 2016 Census, resulting in 927,520 linked records, a linkage rate of 76%. This gave a linked sample size of 4.3% of the population.
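
The relationship between the panel size, the linkage rate and the linked sample size can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative and are not ABS terminology.

```python
# Figures quoted in the text above.
panel_records = 1_221_057   # 2011 Panel sample records
linked_records = 927_520    # records linked to a 2016 Census record
panel_fraction = 0.057      # 2011 Panel as a share of the population

# Share of panel records that were linked.
linkage_rate = linked_records / panel_records

# Linked sample as a share of the population.
linked_fraction = panel_fraction * linkage_rate

print(f"linkage rate: {linkage_rate:.0%}")
print(f"linked sample size: {linked_fraction:.1%}")
```

Multiplying the 5.7% panel fraction by the 76% linkage rate reproduces the 4.3% linked sample size.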

Multi panel sample method

The ACLD sample is maintained through the application of the Multi-Panel framework, which provides an approach for selecting records in the ACLD to create panels that maintain the longitudinal and cross-sectional representativeness of the dataset over time, while minimising the impact of accumulated linkage bias on longitudinal analysis.

The Multi-Panel framework comprises multiple overlapping panels, with each panel representing a single Census population (2006, 2011, 2016, etc.). In each Census year, a panel is selected and linked to subsequent Censuses. The sample selection strategy for each panel is designed to maintain a linked sample size of 5%, maximise sample overlap between the panels, and introduce new records to the dataset in each panel to account for new births, migrants and missed links in previous panels.

This allows flexibility for users, who can draw on the most appropriate panel for their research question.

For further information on the Multi-Panel framework refer to Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2016 (cat. no. 2080.5).


LINKING METHODOLOGY

Linking Strategy

Data from the ACLD Panel samples and the Census files were brought together using data linkage techniques.

Data linkage is typically undertaken using deterministic and/or probabilistic methods:
  • Deterministic linkage: links record pairs across two datasets that match exactly or closely on common variables. This type of linkage is most applicable where records from different sources consistently report sufficient information, and it can be an efficient way to conduct linkage.
  • Probabilistic linkage: links record pairs based on the level of overall agreement on a set of variables common to the two datasets. This approach allows links to be assigned in spite of some missing or inconsistent information, provided there is enough agreement on other variables.
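
The difference between the two approaches can be illustrated with a small sketch. It is not the ABS implementation: the records, field names, agreement probabilities and threshold below are all invented for illustration, and the scoring uses log-likelihood-ratio agreement weights, a common textbook approach that the source does not confirm was used for the ACLD.

```python
import math

# Hypothetical records; field names and values are invented for illustration.
rec_a = {"name_code": "K7Q", "sex": "F", "birth_year": 1980, "mesh_block": "20345"}
rec_b = {"name_code": "K7Q", "sex": "F", "birth_year": 1980, "mesh_block": None}

FIELDS = ["name_code", "sex", "birth_year", "mesh_block"]

def deterministic_match(a, b, fields=FIELDS):
    """Link only if every field is present and equal on both records."""
    return all(a[f] is not None and a[f] == b[f] for f in fields)

# Assumed per-field probabilities (not ABS values):
# m = P(field agrees | same person), u = P(field agrees | different people).
M_U = {
    "name_code": (0.95, 0.001),
    "sex": (0.99, 0.5),
    "birth_year": (0.97, 0.02),
    "mesh_block": (0.60, 0.0001),
}

def probabilistic_score(a, b, m_u=M_U):
    """Sum agreement/disagreement weights; missing values contribute nothing."""
    score = 0.0
    for field, (m, u) in m_u.items():
        if a[field] is None or b[field] is None:
            continue  # missing information neither adds nor subtracts evidence
        if a[field] == b[field]:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score

THRESHOLD = 10.0  # assumed cut-off for accepting a link

print(deterministic_match(rec_a, rec_b))               # missing mesh block blocks the exact match
print(probabilistic_score(rec_a, rec_b) >= THRESHOLD)  # agreement on other fields is enough
```

In this example the deterministic rule rejects the pair because one mesh block is missing, while the probabilistic score still exceeds the assumed threshold on the strength of the remaining fields.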

Linking Variables

The variables on Census files that were used for linking include:
  • First name hash code
  • Surname hash code
  • Age
  • Sex
  • Date of birth
  • Indigenous status
  • Country of birth
  • Year of arrival
  • Marital status
  • Religion
  • Language spoken
  • Mother's age
  • Mother's day and month of birth
  • Mother's country of birth
  • Father's age
  • Father's day and month of birth
  • Father's country of birth
  • Mesh block
  • Statistical Areas 1, 2 and 4.

A number of linkage passes were conducted based on different combinations of these variables to ensure each record had the highest possible chance of being linked.

For more information about the linking variables used, see:

There were two main reasons why some records were not linked across Census files:

I. Records belonging to the same individual were present at both time points but these records failed to be linked because they contained missing or inconsistent information.
II. The person had no record in the later Census.

For detailed information on the linking methodology and an assessment of its quality see Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2016 (cat. no. 2080.5).

To protect the privacy of Census respondents, we used an ABS encoded Census name for linking 2011 and 2016 Census records in the ACLD. Encoding was undertaken in 2011 for the purpose of protecting privacy by anonymising name and improving the future quality and efficiency of the linking process.

The codes are created by grouping people with a combination of letters from their first and last names using a secure one-way process, meaning that a code cannot be reversed to deduce the original name information. Each code represents approximately 2,000 people drawn from many different letter combinations, and therefore is not unique to an individual. Actual name information from the 2016 Census was not used to link to 2011 Census records.
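
The idea of a one-way, many-to-one name code can be sketched as follows. This is purely illustrative and is not the ABS encoding: the choice of letters, the keyed hash, the secret key and the number of codes are all assumptions made for the example.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key-not-real"  # hypothetical secret, for illustration only
N_CODES = 10_000                      # hypothetical number of distinct codes

def name_code(first_name: str, surname: str) -> int:
    """Map letters from a name to one of N_CODES group codes (many-to-one)."""
    # Assumed letter selection: first two letters of each name.
    letters = (first_name[:2] + surname[:2]).upper()
    digest = hmac.new(SECRET_KEY, letters.encode("utf-8"), hashlib.sha256).digest()
    # Reducing the digest to a small range makes many names share each code.
    return int.from_bytes(digest[:8], "big") % N_CODES

# Different names can share a code, so a code does not identify an individual.
print(name_code("Maria", "Smith") == name_code("Mark", "Smyth"))
```

Because many letter combinations map to the same code and the hash cannot be inverted, a code on its own does not reveal the name it came from.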

For further information, see Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2016 (cat. no. 2080.5).

At the end of the linkage processes:
  • 800,759 (82%) of the 979,661 sample records from the 2006 Panel were linked to a 2011 Census record on the original 2006-11 linkage dataset
  • 927,520 (76%) of the 1,221,057 sample records from the 2011 Panel were linked to a 2016 Census record
  • 756,945 (77%) of the 979,662 sample records from the 2006 Panel were linked to a 2011 Census record, to form the relinked 2006-11 portion of the 2006-11-16 file. These record pairs were then linked to the 2016 Census via the 2011 Census record in each pair, which achieved 605,618 links (80% of the linked 2011 records in the 2006 Panel). Overall, 62% of the 2006 Panel sample records were linked to both the 2011 and 2016 Censuses.
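
The percentages for the 2006 Panel on the 2006-11-16 file use different denominators, which the following sketch makes explicit. It uses only the counts quoted above; the variable names are illustrative.

```python
# Counts quoted above for the 2006 Panel on the 2006-11-16 file.
panel_2006 = 979_662       # 2006 Panel sample records
linked_to_2011 = 756_945   # relinked to a 2011 Census record
linked_to_2016 = 605_618   # of those pairs, also linked to a 2016 Census record

rate_2006_11 = linked_to_2011 / panel_2006       # denominator: whole panel
rate_2011_16 = linked_to_2016 / linked_to_2011   # denominator: linked 2006-11 pairs
rate_all_three = linked_to_2016 / panel_2006     # denominator: whole panel

print(f"{rate_2006_11:.0%} {rate_2011_16:.0%} {rate_all_three:.0%}")
```

The three-wave rate is the product of the two stage rates: about 77% of 80% gives 62%.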


WEIGHTING, BENCHMARKING AND ESTIMATION

Weighting

Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, persons. The weight can be considered an indication of how many people in the relevant population are represented by each person in the sample. Weights were created for linked records in the ACLD to enable longitudinal population estimates to be produced.

Each Panel of the ACLD is a random 5% sample of persons enumerated in Australia on Census Night. As such, each person in the sample should represent about 20 people in the Australian population. Between Censuses, however, the Australian population in scope of the ACLD changes as people die or move overseas. In addition, Census net undercount and data quality can affect the capacity to link equivalent records across waves.

The ACLD weights benchmark the linked records to the estimated Australian in-scope population. The weights were based on four components: the design weight, undercoverage adjustment, missed link adjustment and population benchmarking.

For the 2006-11 weight, the original population benchmark was the 2011 Estimated Resident Population (ERP). The 2011 ERP was chosen over the 2006 ERP as the baseline population is more recent. The 2011 ERP was then adjusted to exclude people who were not in Australia in 2006.

For the 2011-16 weight, the population benchmark is based on the 2016 Estimated Resident Population (ERP). This population benchmark was adjusted by the estimated probability that a person was also in Australia in 2011. This probability was estimated using the 2016 Census 'reported 5 year ago address' variable.

For the 2006-2011-2016 weight, the population benchmark is based on the 2011 and 2016 Estimated Resident Population (ERP), adjusted by the estimated probability that a person belongs to the longitudinal population. This probability is estimated using the reported 'address five years ago' variable from the 2011 or 2016 Census.

Weights were benchmarked to the following population groups:
  • state/territory by age (ten year groups) by sex by mobility (interstate arrivals benchmarked separately), and
  • Indigenous status by state/territory.
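
How the four weight components might combine can be sketched as below. This is a simplified illustration under stated assumptions, not the ABS procedure: the adjustment factors, benchmark cells and benchmark totals are invented, and a single set of benchmark cells is used, whereas benchmarking to two sets of population groups would in practice need an iterative or calibration method.

```python
from collections import defaultdict

DESIGN_WEIGHT = 1 / 0.05  # a 5% sample: each person represents about 20 people

# Hypothetical linked records:
# (benchmark cell, undercoverage adjustment, missed link adjustment)
records = [
    ("NSW/20-29/F", 1.02, 1.20),
    ("NSW/20-29/F", 1.02, 1.25),
    ("NT/20-29/M", 1.10, 1.60),
]

# Hypothetical benchmark population totals for each cell.
benchmarks = {"NSW/20-29/F": 50.0, "NT/20-29/M": 40.0}

# Components 1-3: design weight multiplied by the two adjustments.
pre_weights = [DESIGN_WEIGHT * under * missed for _, under, missed in records]

# Component 4: scale weights so each cell sums to its benchmark total.
cell_totals = defaultdict(float)
for (cell, _, _), w in zip(records, pre_weights):
    cell_totals[cell] += w

final_weights = [
    w * benchmarks[cell] / cell_totals[cell]
    for (cell, _, _), w in zip(records, pre_weights)
]

print([round(w, 1) for w in final_weights])
```

After the final step, the weights within each cell sum to that cell's benchmark total, which is what benchmarking is intended to achieve.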

The 2006-11 weights (original 2006-11 dataset) have a mean value of 24 and range between 17 and 103. Higher weights are associated with people of Aboriginal and Torres Strait Islander origin, and people who moved interstate between 2006 and 2011.

The 2006-11 (re-link) weights available on the 2006-11-16 DataLab file have a mean value of 25.0 for females and 26.6 for males. The weights range between 16.1 and 176.9. The mean weight was higher for Aboriginal and Torres Strait Islander persons and for people in the Northern Territory.

The 2006-11-16 weights have a mean value of 29.4 for females and 31.5 for males. The weights range between 15.9 and 341.3. The mean weight was higher for Aboriginal and Torres Strait Islander persons and for people in the Northern Territory.

The 2011-16 weights have a mean value of 22.3 for females and 23.2 for males. The weights range between 14.8 and 83. The mean weight was higher for Aboriginal and Torres Strait Islander persons and for people in the Northern Territory.

Estimation

Estimates of population groups are obtained by summing the weights of persons with the characteristic(s) of interest.
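
As a minimal sketch of this estimation step, the example below sums weights over records meeting a condition. The records, weights and field names are hypothetical and do not correspond to actual ACLD data items.

```python
# Hypothetical linked records: a weight plus two characteristics.
people = [
    {"weight": 22.5, "state_2011": "VIC", "state_2016": "QLD"},
    {"weight": 19.0, "state_2011": "VIC", "state_2016": "VIC"},
    {"weight": 31.5, "state_2011": "NSW", "state_2016": "QLD"},
]

def estimate(records, predicate):
    """Population estimate: sum of weights for records meeting the predicate."""
    return sum(r["weight"] for r in records if predicate(r))

# Estimated number of people who moved to Queensland between the two Censuses.
moved_to_qld = estimate(
    people,
    lambda r: r["state_2011"] != "QLD" and r["state_2016"] == "QLD",
)
print(moved_to_qld)
```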

For further information about ACLD weighting and estimation refer to Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2016 (cat. no. 2080.5).


SOURCES OF ERROR

All reasonable attempts have been made to ensure the accuracy of the longitudinal dataset. Nevertheless, potential sources of error, including sampling error, linking error and Census quality error, should be kept in mind when interpreting the results.

Sampling Error

Sampling error occurs because only a small proportion of the total population is used to produce estimates that represent the whole population. Sampling error refers to the fact that for a given sample size, each sample will produce different results, which will usually not be equal to the population value.

There are two common ways of reducing sampling error - increasing sample size and/or utilising an appropriate selection method (for example, multi-stage sampling would be appropriate for household surveys). Given the large sample size for the ACLD (1 in 20 persons), and simple random selection, sampling error is minimal.
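
The effect of sample size on sampling error can be illustrated with the standard formula for the standard error of a proportion under simple random sampling without replacement. The sketch below is a rough guide only: the 10% proportion and the smaller comparison sample are hypothetical, and the calculation ignores weighting and linkage error.

```python
import math

def se_proportion(p: float, n: int, sampling_fraction: float) -> float:
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction."""
    return math.sqrt((1 - sampling_fraction) * p * (1 - p) / n)

n_linked = 927_520   # linked 2011-16 sample size quoted earlier
f_linked = 0.043     # linked sample as a share of the population

# Hypothetical characteristic held by 10% of the population.
se_acld = se_proportion(0.10, n_linked, f_linked)

# The same characteristic estimated from a hypothetical sample of 10,000 people.
n_small = 10_000
se_small = se_proportion(0.10, n_small, f_linked * n_small / n_linked)

print(f"{se_acld:.5f} {se_small:.5f}")
```

With these assumptions the standard error for the linked ACLD sample is roughly a tenth of that for the smaller sample, consistent with the statement that a larger sample reduces sampling error.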

Linking Accuracy

False links can occur during the linkage process because a record pair that matches on all or most linking fields may still not belong to the same individual. While the methodology is designed to ensure that the vast majority of links are true, some false links may be present within the ACLD dataset. The estimated false link rate is 5-10% for the original 2006-2011 linkage, 5% for the re-link of the 2006-2011 linkage and 1% for the 2011-2016 linkage.

For further detail on the accuracy of the linkage, see the Linkage Results sections in Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2016 (cat. no. 2080.5).

Managing Census Quality

The ABS aims to produce high quality data from the Census. To achieve this, extensive effort is put into Census form design, collection procedures and processing procedures.

There are four principal sources of error in Census data: respondent error, processing error, partial response and undercount. Quality management of the Census program aims to reduce error as much as possible, and to provide a measure of the remaining error to data users, to allow them to use the data in an informed way.

Information about the quality of the 2006, 2011 and 2016 Census data is available on the Data Quality page on the ABS website.

The Census Independent Assurance Panel concluded that the 2016 Census data is of comparable quality to 2011 and 2006 Census data so may be used with confidence. Information is available in Census of Population and Housing: Understanding the Census and Census Data, Australia, 2016 (cat. no. 2900.0).

For more detail see Managing Census Quality, in Census of Population and Housing: Census Dictionary, 2016 (cat. no. 2901.0).

Respondent Error

For most households in Australia, the Census is self-enumerated. This means that householders are required to complete the Census form themselves, rather than having the help of a Census collector. The Census form may be completed by one household member on behalf of others. Error can be introduced if the respondent does not understand the question, or does not know the correct information about other household members. Self-enumeration carries the risk that wrong answers could be given, either intentionally or unintentionally.

Processing Error

Much of the data on the Census form is captured using automatic processes, such as scanning and Intelligent Character Recognition. Quality assurance procedures are used during Census processing to ensure processing errors are kept at an acceptable level. Sample checking is undertaken during coding operations, and corrections are made where necessary.

Partial Response

When completing their Census form, some people do not answer all the questions which apply to them. While questions of a sensitive nature are generally excluded from the Census, all topics have a level of non-response. This can be measured and is generally low. In those instances where a householder fails to answer a question, a 'not stated' code is allocated during processing, with the exception of non-response to age, sex, marital status and place of usual residence. These variables are needed for population estimates, so they are imputed using other information on the Census form, as well as information from the previous Census.

Undercount

The goal of the Census is to obtain a complete measure of the number and characteristics of people in Australia on Census Night and their dwellings, but it is inevitable that a small number will be missed and some will be counted more than once. In Australia more people are missed from the Census than are counted more than once, thus the effect when both factors are taken into account is a net undercount.

For more detail see Managing Census Quality, in Census of Population and Housing: Census Dictionary, 2016 (cat. no. 2901.0).


DATA CONSISTENCY

A small percentage of linked records have inconsistent data, such as a different country of birth at the two time points or an age inconsistency of more than one year (when the expected five year difference is accounted for). Inconsistencies may be due to:
  • false link - the record pair does not belong to the same individual
  • reporting error - information for the same individual was reported differently at different time points
  • processing error - the value of a data item was inaccurately assigned or imputed during processing.
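
The age check described above, an inconsistency of more than one year after allowing for the expected five year difference, can be sketched as a simple rule. The function and parameter names are hypothetical and this is not the definition of the ACLD consistency flag itself.

```python
def age_inconsistent(age_earlier: int, age_later: int,
                     gap_years: int = 5, tolerance: int = 1) -> bool:
    """True if reported ages differ from the expected gap by more than the tolerance."""
    return abs((age_later - age_earlier) - gap_years) > tolerance

print(age_inconsistent(30, 35))  # exactly five years older
print(age_inconsistent(30, 36))  # one year off the expected gap
print(age_inconsistent(30, 37))  # two years off the expected gap
```

Only the last pair is treated as inconsistent under this rule, since it departs from the expected five year difference by more than one year.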

In most analyses, inconsistent information is likely to have only a small impact. Characteristics from the 2006, 2011 or 2016 data can be used in tables, and some exploration of consistency over time will assist in drawing appropriate conclusions.

No data editing was applied to the file beyond that which had already taken place during the relevant Census processing period. A set of consistency flags has been included on the ACLD file so that inconsistent data may be observed, quantified or excluded from calculations. Consistency flags, located in the Quality Indicators group of data items, have been created for Census variables that would not be expected to change over time or have unlikely transitions over time. These are as follows:
  • Age
  • Sex
  • Country of Birth
  • Birthplace of Person
  • Birthplace of Female Parent
  • Birthplace of Male Parent
  • Year of Arrival
  • Indigenous Status
  • Registered Marital Status
  • Highest Year of School Completed
  • Level of Highest Non-School Qualification
  • Country of Birth of Spouse or Partner
  • Number of Children Ever Born.

There are numerous ways to define 'consistency'. The consistency flags have fine level categories to allow users flexibility in using their own definition of 'consistent' or 'inconsistent'. For example, where one Census has 'not stated' for the year of arrival data item, a user can decide whether the record should be considered consistent or not. The same applies to where the response for one Census is 'not applicable'. The labels attached to each category suggesting consistency or inconsistency will assist the user in determining which records are consistent or inconsistent for their needs. The tables below use the relevant labels to define inconsistency.

See also Quality Indicators in the Data Items sections.


INCONSISTENT REPORTING ON THE ORIGINAL 2006-2011 LINKED ACLD FILE, By Selected Characteristics

Characteristic                              Proportion of linked records with inconsistent data between 2006 and 2011 (a)

Age (by more than 1 year)                    2.41%
Sex                                          0.11%
Birthplace of person                         2.09%
Birthplace of female parent                  4.01%
Birthplace of male parent                    4.41%
Year of arrival (b)                         17.86%
Indigenous status                            0.53%
Registered marital status                    0.71%
Highest year of school completed             6.27%
Level of highest non-school qualification   14.86%
Country of birth of spouse or partner (b)    3.85%
Number of children ever born                 2.79%

(a) Excludes records where a relevant data item was not stated, inadequately described or not applicable in both years.
(b) Excludes records where a response was not applicable one year and applicable the other.


INCONSISTENT REPORTING ON THE LINKED 2006-11-16 ACLD FILE, By Selected Characteristics

Characteristic                              Proportion of linked records with inconsistent data between 2006 and 2011 (a)

Age (by more than 1 year)                    0.42%
Sex                                          0.05%
Birthplace of person                         2.12%
Birthplace of female parent                  2.58%
Birthplace of male parent                    2.75%
Year of arrival (b)                         13.24%
Indigenous status                            0.74%
Registered marital status                    0.46%
Highest year of school completed             7.15%
Level of highest non-school qualification   14.44%
Country of birth of spouse or partner (b)    3.01%
Number of children ever born                 1.87%

(a) Excludes records where a relevant data item was not stated, inadequately described or not applicable in both years.
(b) Excludes records where a response was not applicable one year and applicable the other.


INCONSISTENT REPORTING ON THE 2011-2016 LINKED ACLD FILE, By Selected Characteristics

Characteristic                              Proportion of linked records with inconsistent data between 2011 and 2016 (a)

Age (by more than 1 year)                    0.92%
Sex                                          0.16%
Birthplace of person                         1.23%
Birthplace of female parent                  1.60%
Birthplace of male parent                    1.90%
Year of arrival (b)                         15.94%
Indigenous status                            0.83%
Registered marital status                    0.54%
Highest year of school completed             6.57%
Level of highest non-school qualification   12.87%
Country of birth of spouse or partner (b)    1.72%
Number of children ever born                 1.10%

(a) Excludes records where a relevant data item was not stated, inadequately described or not applicable in both years.
(b) Excludes records where a response was not applicable one year and applicable the other.


INCONSISTENT REPORTING ON THE LINKED 2006-11-16 ACLD FILE, By Selected Characteristics

Characteristic                              Proportion of linked records with inconsistent data between 2011 and 2016 (a)

Age (by more than 1 year)                    0.63%
Sex                                          0.10%
Birthplace of person                         1.03%
Birthplace of female parent                  1.49%
Birthplace of male parent                    1.82%
Year of arrival (b)                         15.53%
Indigenous status                            0.70%
Registered marital status                    0.44%
Highest year of school completed             6.47%
Level of highest non-school qualification   12.56%
Country of birth of spouse or partner (b)    1.45%
Number of children ever born                 0.99%

(a) Excludes records where a relevant data item was not stated, inadequately described or not applicable in both years.
(b) Excludes records where a response was not applicable one year and applicable the other.

