The data underpinning IHAD
This chapter looks at the data used to construct IHAD 2021. All data is from the 2021 Census of Population and Housing.
The candidate list of variables
Variables from the Census were included in the initial candidate variable list for IHAD if they were deemed to be related to the definition of advantage and disadvantage that the IHAD is intending to capture. The same candidate variable list from the Experimental IHAD 2016 was used for IHAD 2021, excluding the dwelling internet connection variable as noted in Variables underpinning IHAD. The candidate variables fall into a multi-dimensional framework. The dimensions are:
- housing,
- family,
- education,
- occupation, and
- miscellaneous.
Constructing the variables
Specifications
IHAD is constructed from 2021 Census data, with variables derived as binary indicators. Variables typically relate to persons but also relate to families or dwellings. Family and person level variables have been derived at the household level. For example, for the candidate variable ‘households where the person with the highest educational attainment has a Bachelor Degree or above’, the highest qualification for all in scope persons in the household is considered and if one person has a Bachelor Degree or higher, the derived variable has a value of 1. If no people in the household have a Bachelor Degree or higher, the value is 0. For the candidate variable ‘households where all people aged 15 years and over are unemployed’ if all in scope people aged 15 years and over in the household have labour force status unemployed, the derived variable will have a value of 1, otherwise it will have a value of 0. In most cases, the indicator specifications were based on the Experimental IHAD 2016 specifications. Some minor changes were made to reflect updates to the Census 2021 variable coding. The Appendix contains detailed descriptions of the indicator specifications used for all the IHAD variables.
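The two worked examples above can be sketched in a few lines of Python. This is a minimal illustration only, not the ABS implementation: the record layout, field order and household identifiers are hypothetical, and the treatment of a household with no in scope person aged 15 years and over (assigned 0 here) is an assumption.

```python
from collections import defaultdict

# Hypothetical person records:
# (household_id, age, has_bachelor_or_above, labour_force_status)
persons = [
    ("H1", 42, True, "employed"),
    ("H1", 40, False, "unemployed"),
    ("H2", 30, False, "unemployed"),
    ("H2", 12, False, None),  # under 15: no labour force status
    ("H3", 55, False, "not in labour force"),
]

def derive_household_indicators(persons):
    """Return {household_id: {"DEGREE": 0 or 1, "ALL_UNEMPLOYED": 0 or 1}}."""
    by_household = defaultdict(list)
    for household_id, age, degree, status in persons:
        by_household[household_id].append((age, degree, status))

    indicators = {}
    for household_id, members in by_household.items():
        # DEGREE: 1 if at least one person has a Bachelor Degree or above.
        degree_flag = int(any(degree for _, degree, _ in members))
        # ALL_UNEMPLOYED: 1 only if every person aged 15+ is unemployed.
        # Assumption: a household with nobody aged 15+ is assigned 0.
        adult_statuses = [status for age, _, status in members if age >= 15]
        unemployed_flag = int(
            bool(adult_statuses)
            and all(status == "unemployed" for status in adult_statuses)
        )
        indicators[household_id] = {
            "DEGREE": degree_flag,
            "ALL_UNEMPLOYED": unemployed_flag,
        }
    return indicators

indicators = derive_household_indicators(persons)
```

In this sketch, household H1 receives DEGREE = 1 because one of its two members holds a degree, and ALL_UNEMPLOYED = 0 because one member is employed.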
Scope
The scope of the IHAD is private dwellings that were occupied on Census Night. Non-classifiable occupied private dwellings (e.g. dwellings that only contained visitors) and unoccupied private dwellings were excluded. This accounted for approximately 1.6 million dwellings or 14.5% of all private dwellings. Approximately 1.0 million (9.6%) were unoccupied private dwellings; 0.5 million (4.9%) were non-classifiable occupied private dwellings. Non-private dwellings and offshore, migratory and shipping areas were also excluded. Note that residents in boarding houses and hostels are not included as these are classified as non-private dwellings.
All private dwellings minus in scope dwellings (Census) | Excluded from index | % of all private dwellings |
---|---|---|
10.85 million - 9.28 million dwellings | 1.58 million dwellings | 14.5 |
Persons temporarily away from home
The IHAD is calculated based on the characteristics of persons who are both usually resident in a household and enumerated in that household on Census Night. If all usual residents of a household aged 15 or more were away from home on Census Night, that dwelling would be out of scope of the IHAD.
Persons temporarily overseas on Census Night are out of scope of the Census, and thus Census data is not available for those persons. Persons staying elsewhere in Australia are in scope of the Census, but they are not able to be associated back to their dwelling of usual residence, and therefore their characteristics as measured in the Census are not able to be used in the derivation of the household level variables used in the index.
If one or more usual residents were away, but at least one person was at home on Census Night, then that dwelling remains in scope of the IHAD and an index value would be calculated for that household. However, the persons temporarily away from that dwelling would not have their characteristics contributing to the index value for that household; only those persons present will. This may result in a different level of advantage or disadvantage being calculated for that dwelling than would have been the case had all persons usually resident in that dwelling been at home on Census Night. Around 4.8% (0.44 million dwellings) of in scope dwellings had one or more usual residents away from home on Census Night.
An example of this situation would be a one family couple household with two adults usually resident. One member of the couple is unemployed and was at home on Census Night, while the other person was travelling for work-related reasons. The person characteristics used in the calculation of the IHAD are based only on the characteristics of the (unemployed) person that was home on Census Night.
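The couple household described above can be expressed as a short Python sketch. The record layout is hypothetical and the ALL_UNEMPLOYED rule is simplified; the point is only to show how restricting the calculation to persons present on Census Night can change the derived value.

```python
# Hypothetical usual residents of the couple household:
# (person_id, age, labour_force_status, at_home_on_census_night)
usual_residents = [
    ("P1", 38, "unemployed", True),
    ("P2", 41, "employed", False),  # travelling for work-related reasons
]

def all_unemployed(people):
    """Return 1 if every listed person aged 15+ is unemployed, else 0."""
    statuses = [status for _, age, status, _ in people if age >= 15]
    return int(bool(statuses) and all(s == "unemployed" for s in statuses))

# Only persons enumerated at home contribute to the household variables.
present = [person for person in usual_residents if person[3]]

as_enumerated = all_unemployed(present)        # value used in the index
if_all_home = all_unemployed(usual_residents)  # value had both been home
```

Here `as_enumerated` is 1 while `if_all_home` is 0, so the household appears more disadvantaged on this variable than it would have if both residents had been at home.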
Exclusions
Rules for the minimum number of persons and dwellings required for an area to receive an index score have been a feature of SEIFA since its inception following the 1986 Census. In the Statistical Area Level 1 (SA1) and Statistical Area Level 2 (SA2) data cubes available in the Data downloads section, IHAD quartile percentages are not provided for SA1s or SA2s that do not have a SEIFA score. Refer to Areas without a SEIFA score for more information about these excluded areas.
Missing responses
Data quality considerations for construction of an index at household level centre on the level of non-response to Census questions. Overall non-response was relatively low (around 0-6%) and fairly consistent across candidate variables, with the exception of equivalised total household income (HIED) and level of highest educational attainment (HEAP), for which it was higher (around 7-11%). Please refer to the Census of Population and Housing: Census Dictionary, 2021 for details about these variables.
Due to partial non-response from some Census respondents, some households could not be included in the IHAD construction without some action to account for missing values within candidate variables. For example, for the candidate variable ‘Households where all people aged 15 years and over are unemployed’, if someone within the household does not indicate their labour force status, it may not be possible to assign a value for this candidate variable.
Two actions to deal with missing data have been applied:
- removal of households with high numbers of non-response
- imputation of missing values
Removal of households with high numbers of non-response
Households with 10 or more missing candidate variable responses have been removed. This number is consistent with the Experimental IHAD 2016 and was chosen because it tended to correspond to dwellings where most person-based variables were coded as 'not stated' (Wise and Williamson, 2013). Approximately 0.5% of in scope households (43,726) were removed; the proportion of households with 3 or fewer missing candidate variable responses was approximately 97.5%.
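The removal rule can be illustrated with a minimal Python sketch. The household records and variable names (`VAR0`, `VAR1`, ...) are hypothetical, and a missing response is represented by `None`; only the threshold of 10 comes from the text above.

```python
MAX_MISSING = 10  # households with 10 or more missing responses are removed

# Hypothetical households: candidate variable -> 0, 1 or None (missing).
households = {
    "H1": {f"VAR{i}": 1 for i in range(12)},
    "H2": {**{f"VAR{i}": None for i in range(10)}, "VAR10": 0, "VAR11": 1},
    "H3": {**{f"VAR{i}": None for i in range(3)},
           **{f"VAR{i}": 0 for i in range(3, 12)}},
}

def count_missing(record):
    """Count candidate variables with no usable response."""
    return sum(value is None for value in record.values())

kept = {
    household_id: record
    for household_id, record in households.items()
    if count_missing(record) < MAX_MISSING
}
removed = sorted(set(households) - set(kept))
```

In this example H2 is removed because it has exactly 10 missing responses, while H3 is kept with 3 missing responses that would then go on to imputation.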
Imputation of missing values
Wise and Williamson (2013) noted that if ‘not stated’ responses are grouped with records that do not have a particular disadvantaging characteristic, then there is an implicit advantage being assigned to those individuals. They recommended that imputation should be performed where appropriate.
Missing values for household, family, and person level Census variables that were required have been imputed. The method used randomly assigned missing responses for a given variable to one of the allowed responses, based on the frequency proportions for the variable at the national level. As a result, the distribution of the imputed responses for most of the variables being treated aligned within reasonable bounds with the original distribution of non-missing responses.
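A minimal sketch of this kind of proportional random assignment is shown below. It is an illustration under stated assumptions rather than the ABS procedure: the function name, the response categories and the use of `None` for a missing response are all hypothetical, and the list passed in stands in for the national-level data.

```python
import random
from collections import Counter

def impute_by_national_proportions(values, rng):
    """Replace each None with a response drawn at random, weighted by the
    frequency of each allowed response among the non-missing values."""
    counts = Counter(v for v in values if v is not None)
    categories = sorted(counts)
    weights = [counts[c] for c in categories]
    return [
        v if v is not None else rng.choices(categories, weights=weights, k=1)[0]
        for v in values
    ]

# Hypothetical national responses for one variable, with two missing values.
responses = ["owned", "rented", "owned", None, "purchased", "owned", None, "rented"]

# A seeded generator keeps the illustration reproducible.
imputed = impute_by_national_proportions(responses, random.Random(2021))
```

Because each draw is weighted by the observed frequencies, the imputed responses follow the distribution of the non-missing responses on average, which is the property described above.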
This was also true for HIED (equivalised total household income) at the national level. However, within household composition categories the distribution of imputed values was more variable. By state, the rate of non-response for HIED was between 5% and 11%; across household compositions it ranged from 3% to 19%, with group and multiple family households at the higher end of that range. Similar results were observed for HEAP (level of highest educational attainment) by state and by ten-year age group.
Because of this variability, and consistent with the Experimental IHAD 2016, hot-deck imputation was applied to HIED and HEAP. This involved assigning a value for the missing HIED or HEAP variable from a donor that matched the recipient’s values for a selection of other Census variables. For HIED, the matching variables were household composition, state or territory, SA2, and the number of adults and employed people in the household. For HEAP, they were age in 10-year groups, state or territory, SA2, level of non-school qualification, highest year of school completed, total personal weekly income, and labour force status. The distribution of the imputed responses for each of the variables improved on that from the initial imputation approach and was deemed sufficient for the purpose of constructing the IHAD.
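The general idea of hot-deck imputation can be sketched as follows. This is a simplified illustration, not the ABS implementation: the field names and income band labels are hypothetical, SA2 is omitted from the matching variables for brevity, and leaving a recipient missing when no donor matches is an assumption, since the handling of that case is not described here.

```python
import random
from collections import defaultdict

# Hypothetical subset of the HIED matching variables (SA2 omitted).
MATCH_KEYS = ("composition", "state", "adults", "employed")

def hot_deck(records, target, match_keys, rng):
    """Fill missing `target` values from a randomly chosen donor record
    that has the same values for every variable in `match_keys`."""
    donors = defaultdict(list)
    for record in records:
        if record[target] is not None:
            donors[tuple(record[k] for k in match_keys)].append(record[target])

    imputed = []
    for record in records:
        new_record = dict(record)
        if new_record[target] is None:
            pool = donors.get(tuple(record[k] for k in match_keys))
            if pool:  # assumption: stay missing when no donor matches
                new_record[target] = rng.choice(pool)
        imputed.append(new_record)
    return imputed

records = [
    {"id": "H1", "composition": "couple", "state": "NSW",
     "adults": 2, "employed": 1, "HIED": "band_07"},
    {"id": "H2", "composition": "couple", "state": "NSW",
     "adults": 2, "employed": 1, "HIED": None},
    {"id": "H3", "composition": "group", "state": "VIC",
     "adults": 3, "employed": 2, "HIED": None},
]

result = {r["id"]: r["HIED"] for r in hot_deck(records, "HIED", MATCH_KEYS, random.Random(0))}
```

H2 takes its income band from H1, the only donor with matching characteristics, while H3 has no matching donor and remains missing in this sketch.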
Description of candidate IHAD variables
This section contains a description of each variable on the candidate variable list. The tables containing the variable descriptions also state whether the variable is an indicator of relative advantage (adv) or relative disadvantage (dis). Each subsection corresponds to one of the socio-economic dimensions listed in The candidate list of variables. The candidate list includes all variables considered for inclusion in IHAD before the principal component analysis (PCA) stage. The final list of variables included in IHAD can be found in the Technical details for IHAD: variables and loadings.
Housing variables
List of household variables
Variable | Variable description |
---|---|
NOCAR | Households with no car (dis) |
HIGHCAR | Households with three or more cars (adv) |
FEWBED | Households with one or no bedrooms (dis) |
HIGHBED | Households with four or more bedrooms (adv) |
OVERCROWD | Households requiring one or more extra bedrooms (based on Canadian National Occupancy Standard) (dis) |
SPAREBED | Households with one or more bedrooms spare (based on Canadian National Occupancy Standard) (adv) |
OTHER_HHLD | Households with a structure classified as "other" (e.g. caravan, tent) (dis) |
MULTI_FAMILY | Multi-family households (adv) |
LOWRENT | Households where rent payments are less than $250 per week, excluding employer landlords (excludes $0) (dis) |
HIGHRENT | Households where rent payments are more than $500 per week (adv) |
PUBLIC_RENT | Households being rented from a state or territory housing authority, or a housing co-operative/community/church group (dis) |
OWNED | Households owned outright (adv) |
PURCHASED | Households being purchased (adv) |
HIGHMORTGAGE | Households where mortgage repayments are greater than or equal to $2,900 per month (adv) |
AREA_RVR | Households in remote/very remote area (dis) |
AREA_MC | Households in major cities (adv) |
The cut-off values used to determine which dwellings are considered to have high or low income, mortgage repayments, and rent mostly align with those used for 2021 SEIFA. These were updated, based on the most recent Census, to reflect real-world changes. For the mortgage and rent variables, the high value cut-off captures the 9th and 10th deciles while the low value cut-off captures the 1st and 2nd deciles.
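The relationship between deciles and cut-offs can be illustrated with a short Python sketch. The rent values are hypothetical and the published cut-offs are the fixed dollar amounts listed in the table above; this sketch only shows how a low cut-off capturing the 1st and 2nd deciles and a high cut-off capturing the 9th and 10th deciles might be located in a distribution.

```python
from statistics import quantiles

# Hypothetical weekly rents (dollars) for ten renting households.
weekly_rents = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]

# With n=10, quantiles() returns the nine cut points between deciles.
decile_cuts = quantiles(weekly_rents, n=10, method="inclusive")
low_cutoff = decile_cuts[1]   # boundary between the 2nd and 3rd deciles
high_cutoff = decile_cuts[7]  # boundary between the 8th and 9th deciles

# Binary indicators in the style of LOWRENT and HIGHRENT.
low_rent = [int(rent < low_cutoff) for rent in weekly_rents]
high_rent = [int(rent > high_cutoff) for rent in weekly_rents]
```

With ten households, two fall below the low cut-off and two fall above the high cut-off, that is, one fifth of the distribution at each end.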
Family variables
List of family variables
Variable | Variable description |
---|---|
ONEPARENT | Households with a one-parent family, with dependent children only (dis) |
CHILDJOBLESS | Households with children aged under 15 years and parent(s) not employed (dis) |
Education variables
List of person variables - education
Variable | Variable description |
---|---|
NOYEAR11_OR_HIGHER | Households where the person with the highest educational attainment left school at year 10 or below, including those who did not go to school and with Certificate level I or II (excludes those currently studying secondary education) (dis) |
YEAR11 | Households where the person with the highest educational attainment left school at year 11 (excludes those currently studying secondary education) (dis) |
CERTIFICATE | Households where the person with the highest educational attainment has a Certificate III or IV (adv) |
DIPLOMA | Households where the person with the highest educational attainment has an Advanced Diploma or Diploma (adv) |
DEGREE | Households where the person with the highest educational attainment has a Bachelor Degree or above (adv) |
DEGREE_DEPENDENT* | Households with at least one dependent child and the person with the highest educational attainment has a Bachelor Degree or above (adv) |
NOYEAR12_DEPENDENT* | Households with at least one dependent child and the person with the highest educational attainment left school at year 11 or below, including those who did not go to school and with Certificate level I or II (excludes those currently studying secondary education) (dis) |
* Combining education level with dependent children represents the concept of household level advantage/disadvantage to children from having or not having educated parents. Dependent children are derived using CDCF (Counts the number of dependent children in the family). A dependent child is a person who is either a child under 15 years of age, or a dependent student aged 15-24 years.
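The dependent child definition and the DEGREE_DEPENDENT indicator can be sketched as below. This is a simplified illustration: the Census derives dependent children from CDCF, whereas this sketch applies the stated definition directly to hypothetical person records with hypothetical field names.

```python
def is_dependent_child(age, is_dependent_student):
    """A child under 15 years, or a dependent student aged 15-24 years."""
    return age < 15 or (15 <= age <= 24 and is_dependent_student)

def degree_dependent(household):
    """Return 1 if the household has at least one dependent child and at
    least one person with a Bachelor Degree or above, else 0."""
    has_dependent = any(
        is_dependent_child(p["age"], p["dependent_student"]) for p in household
    )
    has_degree = any(p["bachelor_or_above"] for p in household)
    return int(has_dependent and has_degree)

# Hypothetical households.
family_with_student = [
    {"age": 48, "dependent_student": False, "bachelor_or_above": True},
    {"age": 19, "dependent_student": True, "bachelor_or_above": False},
]
couple_only = [
    {"age": 35, "dependent_student": False, "bachelor_or_above": True},
    {"age": 33, "dependent_student": False, "bachelor_or_above": False},
]

flags = (degree_dependent(family_with_student), degree_dependent(couple_only))
```

The first household is flagged because it contains a 19 year old dependent student and a degree holder; the second has a degree holder but no dependent child, so it is not flagged.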
Occupation variables
List of person variables - occupation
Variable | Variable description |
---|---|
INC_LOW | Households with low annual equivalised income (between $1 and $25,999) (dis) |
INC_HIGH | Households with high annual equivalised income (greater than $90,999) (adv) |
ALL_UNEMPLOYED | Households where all people aged 15 years and over are unemployed (dis) |
HIGH_SKILL | Households where the highest skilled employed adult works in a skill level 1 occupation (adv) |
SKILL_LVL_2 | Households where the highest skilled employed adult works in a skill level 2 occupation (adv) |
SKILL_LVL_4 | Households where the highest skilled employed adult works in a skill level 4 occupation (dis) |
LOW_SKILL | Households where the highest skilled employed adult works in a skill level 5 occupation (dis) |
ALL_SHORT_DISTANCE | Households where all people aged 15 years and over who are employed, travel 0 to less than 2.5 km to work (adv) |
ALL_LONG_DISTANCE | Households where all people aged 15 years and over who are employed, travel 50 to less than 250 km to work (dis) |
ALL_VLONG_DISTANCE | Households where all people aged 15 years and over who are employed, travel 250 or more km to work (dis) |
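The four skill level indicators in the table above all depend on the single most highly skilled employed adult in the household. A minimal sketch of that logic follows, assuming skill levels are coded 1 (highest) to 5 (lowest) as the variable descriptions imply; the function name and input format are hypothetical.

```python
# Indicator assigned for each skill level of the highest skilled employed
# adult. Skill level 3 has no indicator in the candidate list.
SKILL_INDICATORS = {
    1: "HIGH_SKILL",
    2: "SKILL_LVL_2",
    4: "SKILL_LVL_4",
    5: "LOW_SKILL",
}

def skill_indicators(employed_adult_skill_levels):
    """Return the four skill indicators for one household, given the skill
    levels (1-5) of its employed adults."""
    flags = {name: 0 for name in SKILL_INDICATORS.values()}
    if employed_adult_skill_levels:
        best = min(employed_adult_skill_levels)  # level 1 is the most skilled
        name = SKILL_INDICATORS.get(best)
        if name is not None:
            flags[name] = 1
    return flags

example = skill_indicators([3, 1, 5])
```

At most one of the four indicators is set for a household; a household with no employed adults, or whose highest skilled employed adult works at skill level 3, has all four set to 0.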
Miscellaneous variables
List of person variables - miscellaneous
Variable | Variable description |
---|---|
SEP_DIVORCED | Households with one or more people aged 15 years and over separated or divorced (dis) |
ENGPOOR | Households with one or more people aged 15 years and over who do not speak English well (dis) |
ROM | Households with one or more people aged 15 years and over who arrived in Australia in the last 10 years (dis) |
UNENGAGED_YOUTH | Households with one or more people aged between 15 and 24 years who are not working or studying (dis) |
DISABILITY_UNDER70 | Households with one or more people aged under 70 years who need assistance with core activities (dis) |
DISABILITY_HH_PROP | Households where more than 50% of people need assistance with core activities (dis) |
CARER | Households with one or more people aged 15 years and over who provide unpaid assistance to a person with a disability (dis) |
VOLUNTEER | Households with one or more people aged 15 years and over who do voluntary work for an organisation or group (adv) |
RETIRED_NOT_OWNED | Households with a person aged 65 years and over who does not own the home, or occupy it under a life tenure scheme (dis) |
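Most of the variables in this table are "one or more people" indicators, but DISABILITY_HH_PROP is based on a proportion of household members. A minimal sketch of that rule follows, assuming each in scope person is represented by a hypothetical boolean flag for needing assistance with core activities.

```python
def disability_hh_prop(needs_assistance_flags):
    """Return 1 if more than 50% of people in the household need
    assistance with core activities, else 0."""
    if not needs_assistance_flags:
        return 0  # assumption: no in scope persons means no flag
    share = sum(needs_assistance_flags) / len(needs_assistance_flags)
    return int(share > 0.5)

two_person = disability_hh_prop([True, False])          # exactly 50%
three_person = disability_hh_prop([True, True, False])  # about 67%
```

Because the threshold is strictly more than 50%, a two-person household in which one person needs assistance is not flagged, while a three-person household in which two people need assistance is.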