Construction of IHAD

Latest release
Index of Household Advantage and Disadvantage (IHAD): Technical Paper
Reference period: 2021
Released: 11/02/2025
Next release: Unknown
First release

This chapter describes the methods used to construct IHAD, some important technical specifications and basic outputs.

Principal Component Analysis

Principal component analysis (PCA) has been used since the first release of SEIFA to summarise Census variables related to socio-economic advantage and disadvantage. The same methodology is used to create the IHAD, modified where necessary to use binary variables. The aim of PCA is to reduce a large number of correlated variables into a smaller set of transformed variables, called "principal components". Each component is a weighted linear combination of the original variables. It is possible to extract as many components as there are variables. If the original variables are highly correlated, much of the variation can be summarised by a single principal component.

The first principal component is the weighted linear combination of variables that captures the maximum amount of variation present in the original dataset. This is calculated using the correlations between the variables. In general, variables that are strongly correlated with many others in the list will receive high weights. The first principal component is used to create the IHAD index.

The PCA used the binary candidate variables and the correlation matrix of these variables to give an indication of how significantly each variable contributes to the measurement of the unobserved latent variable of interest, namely socio-economic advantage and disadvantage. Each variable receives a loading that indicates the correlation of that variable with the index. A positive loading indicates an advantaging variable whereas a negative loading indicates a disadvantaging variable. The variables with the highest loadings are the ones that have the highest correlation with the index value.
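
As a rough illustration of how a first principal component and its loadings can be obtained from a correlation matrix, the sketch below uses power iteration from the Python standard library. It is not the software used to produce IHAD. The function names and the small correlation matrix are assumptions made up for this example. Because a correlation matrix has ones on its diagonal, the eigenvalue divided by the number of variables gives the share of total variance explained.

```python
import math
import random


def first_component(corr, iterations=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric correlation
    matrix (list of lists), found by power iteration."""
    p = len(corr)
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient v'Cv gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


def first_component_loadings(corr):
    """Loadings: correlation of each variable with the first component."""
    eigenvalue, v = first_component(corr)
    return [math.sqrt(eigenvalue) * x for x in v]


# Illustrative (made-up) correlations: two advantage-type variables and one
# disadvantage-type variable that is negatively correlated with both.
corr = [
    [1.0, 0.6, -0.5],
    [0.6, 1.0, -0.4],
    [-0.5, -0.4, 1.0],
]
eigenvalue, _ = first_component(corr)
loads = first_component_loadings(corr)
print(round(eigenvalue / len(corr), 3))  # share of total variance explained
print([round(x, 2) for x in loads])      # the overall sign is arbitrary
```

In this toy matrix the third variable receives a loading of opposite sign to the first two, which mirrors how advantage and disadvantage indicators separate on the first component.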

Polychoric correlations were used instead of the standard Pearson correlations for the correlation matrix; this is appropriate for binary variables to ensure the correlation coefficients used in the PCA are unbiased. Using polychoric correlations is considered to be more accurate when running a PCA on discrete data such as the binary variables used in the IHAD.
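
For two binary variables, the polychoric correlation reduces to the tetrachoric correlation: the correlation of two underlying standard normal variables that would reproduce the observed 2x2 table when each is cut at a threshold. This paper does not state how the correlations were estimated, so the following is only a minimal moment-matching sketch using the Python standard library. The function names are hypothetical, and the inputs are assumed to be proportions strictly between 0 and 1.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bivariate_normal_cdf(h, k, rho, steps=400):
    """P(Z1 <= h, Z2 <= k) for standard normals with correlation rho.

    Uses the identity d/d(rho) Phi2(h, k; rho) = phi2(h, k; rho) and
    integrates the bivariate density from 0 to rho with Simpson's rule.
    """
    def density(r):
        one_minus = 1.0 - r * r
        exponent = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * one_minus)
        return math.exp(exponent) / (2.0 * math.pi * math.sqrt(one_minus))

    step = rho / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * step)
    return _STD_NORMAL.cdf(h) * _STD_NORMAL.cdf(k) + total * step / 3.0


def tetrachoric(p_x, p_y, p_xy, tol=1e-8):
    """Latent correlation implied by P(X=1)=p_x, P(Y=1)=p_y and
    P(X=1 and Y=1)=p_xy, found by bisection on rho."""
    h = _STD_NORMAL.inv_cdf(p_x)
    k = _STD_NORMAL.inv_cdf(p_y)
    low, high = -0.999, 0.999
    while high - low > tol:
        mid = (low + high) / 2.0
        if bivariate_normal_cdf(h, k, mid) < p_xy:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def phi_coefficient(p_x, p_y, p_xy):
    """Pearson correlation of the two 0/1 variables, for comparison."""
    return (p_xy - p_x * p_y) / math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
```

For example, with `p_x = p_y = 0.5` and `p_xy = 1/3`, the Pearson (phi) correlation is 1/3, while the latent correlation recovered by `tetrachoric` is close to 0.5. This shows how Pearson correlations computed directly on binary variables can understate the underlying association.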

The candidate variables listed in Description of candidate IHAD variables were used in the PCA for the IHAD. Variables were removed if the absolute value of their loading was less than or equal to 0.3, on the grounds that they were not particularly strong indicators of advantage or disadvantage. This process was performed iteratively until every remaining variable had a loading above 0.3 in absolute value. This is the same procedure used to create SEIFA. The final variables and their loadings following this process are presented in Technical details for IHAD: variables and loadings.

The first principal component scores were derived by taking the product of each standardised variable with its respective weight, then taking the sum. For convenience and consistency with the approach taken for SEIFA, these raw component scores were then standardised to a mean of 1,000 and a standard deviation of 100 to produce the index.

The sign of the PCA weights is arbitrary, but intuitively we want more disadvantaged households to have lower scores. For example, NOCAR is a disadvantage variable and so should have a negative weight. The weights were multiplied by -1 to give advantage indicators positive weights and loadings, and disadvantage indicators negative weights and loadings. Accordingly, high index scores indicate relative advantage, and low index scores indicate relative disadvantage.
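
The sign convention can be sketched as a single check against a reference variable. The helper name below is hypothetical, and the choice of NOCAR as the reference simply follows the example in the text. The input values reuse the magnitudes of three published loadings with the sign reversed, to imitate PCA output that came out in the opposite orientation.

```python
def orient_loadings(loadings, reference_index):
    """Flip every sign if the reference disadvantage variable loads positively.

    Multiplying all loadings (or weights) by -1 leaves the component
    unchanged apart from its direction.
    """
    if loadings[reference_index] > 0:
        return [-x for x in loadings]
    return list(loadings)


# Order: NOCAR, HIGHCAR, DEGREE. Signs are reversed here on purpose.
raw = [0.61, -0.43, -0.76]
oriented = orient_loadings(raw, reference_index=0)
print(oriented)
```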

Step-by-step process

With the preceding two sections providing context, a step-by-step process for constructing IHAD is presented below:

1: Creating the initial variable list

Given the data available, we created a list of variables related to the definition of relative household socio-economic advantage and disadvantage.

2: Removing households with 10+ missing responses and imputing missing responses

We applied the IHAD scope to the dataset, and then identified households with 10 or more applicable missing responses. We removed these households from the dataset, imputed missing responses for most of the required variables, and then applied Hotdeck imputation for HIED and HEAP to create the dataset we used to construct the candidate variables.

3: Constructing the variables

We created binary indicators from household, family, and person level variables. These indicators take a value of 1 if the characteristic is present, and 0 if it is not.
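
A minimal sketch of this step for four of the simpler indicators is shown below. The cut-offs follow the variable descriptions in Technical details for IHAD: variables and loadings, but the record layout (a dictionary with `cars` and `bedrooms` keys) is an assumption made for this example, and the handling of missing or not applicable responses is omitted.

```python
def household_indicators(household):
    """Return 0/1 indicators for one household record.

    `household` is a hypothetical dict with integer 'cars' and 'bedrooms'.
    """
    cars = household["cars"]
    bedrooms = household["bedrooms"]
    return {
        "NOCAR": int(cars == 0),         # no car
        "HIGHCAR": int(cars >= 3),       # three or more cars
        "FEWBED": int(bedrooms <= 1),    # one or no bedrooms
        "HIGHBED": int(bedrooms >= 4),   # four or more bedrooms
    }


example = household_indicators({"cars": 0, "bedrooms": 1})
print(example)
```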

4: Removing very highly correlated variables

We removed highly correlated variables to avoid over-representing any specific socio-economic characteristic. When two variables had a correlation coefficient greater than 0.8 in absolute value and were measuring conceptually similar aspects of advantage or disadvantage, we generally removed one of them. However, we applied some discretion, depending on the variables in question and the size of the correlation.
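
The screening part of this step can be sketched as a scan of the correlation matrix for pairs above the 0.8 cut-off. The function name is hypothetical, and the code only flags candidate pairs: deciding which variable to drop, or whether to keep both, relied on judgement that is not reproduced here. In the example, only the 0.83 correlation between DISABILITY_OVER70 and DISABILITY_HH_PROP is taken from this paper. The other values are made up.

```python
def highly_correlated_pairs(names, corr, threshold=0.8):
    """List variable pairs whose correlation exceeds the threshold in
    absolute value, as candidates for removing one of the two."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) > threshold:
                pairs.append((names[i], names[j], corr[i][j]))
    return pairs


names = ["DISABILITY_OVER70", "DISABILITY_HH_PROP", "NOCAR"]
corr = [
    [1.00, 0.83, 0.30],
    [0.83, 1.00, 0.35],
    [0.30, 0.35, 1.00],
]
flagged = highly_correlated_pairs(names, corr)
print(flagged)
```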

5: Conducting the initial PCA

We conducted principal component analysis (PCA) using the binary candidate variables and the correlation matrix of these variables, to obtain the loading for each variable on the first principal component.

6: Removing low loading variables

We excluded variables with loadings less than 0.3 in absolute value, on the grounds that they were not strong indicators of relative advantage or disadvantage. This limit is an accepted level in the PCA literature and has been used in past releases of SEIFA and IHAD. We removed variables one at a time, starting with the lowest loading variable.

7: Conducting PCA on the reduced list of variables

We conducted a PCA on the reduced variable list, and if any other variables loaded below 0.3, we repeated steps six and seven.
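
Steps 5 to 7 form a loop, which can be sketched as follows. This is an illustration only: it takes a correlation matrix as given, computes first-component loadings by power iteration, and removes the weakest variable one at a time until none falls below the cut-off. The function names and the correlation values are assumptions for this example, and the comparison follows the wording of step 6 (loadings less than 0.3 in absolute value).

```python
import math
import random


def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component, via power iteration."""
    p = len(corr)
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    return [math.sqrt(eigenvalue) * x for x in v]


def prune_low_loadings(names, corr, threshold=0.3):
    """Repeat the PCA, dropping the lowest-loading variable each time,
    until every remaining loading is at least `threshold` in size."""
    keep = list(range(len(names)))
    while len(keep) > 1:
        sub = [[corr[i][j] for j in keep] for i in keep]
        loads = first_component_loadings(sub)
        weakest = min(range(len(keep)), key=lambda idx: abs(loads[idx]))
        if abs(loads[weakest]) >= threshold:
            break
        del keep[weakest]  # remove one variable, then rerun the PCA
    return [names[i] for i in keep]


# Made-up correlations: VOLUNTEER is almost unrelated to the other three.
names = ["DEGREE", "HIGH_SKILL", "NOCAR", "VOLUNTEER"]
corr = [
    [1.00, 0.60, -0.50, 0.05],
    [0.60, 1.00, -0.40, 0.02],
    [-0.50, -0.40, 1.00, -0.03],
    [0.05, 0.02, -0.03, 1.00],
]
print(prune_low_loadings(names, corr))
```

Removing one variable per pass matters because the loadings of the remaining variables change each time the PCA is rerun.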

8: Calculating and standardising component/index scores

We derived the first principal component score for each household by taking the product of each standardised variable with its respective weight, then taking the sum across all variables. The weight for each variable was calculated by dividing its loading by the square root of the eigenvalue.

\(Z_{HH} = \sum\limits_{j = 1}^{p} \frac{L_j}{\sqrt{\lambda}} \times X_{j,HH}\)

where,

\(Z_{HH}\) = raw score for the household

\(X_{j,HH}\) = standardised value of the j-th variable for the household

\(L_j\) = loading for the j-th variable

\(\lambda\) = eigenvalue of the principal component

\(p\) = total number of variables in the index

For convenience of presentation, we then rescaled the raw scores to a mean of 1,000 and standard deviation of 100 to create a new set of scores that are the household index scores in IHAD.
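
The scoring and rescaling can be sketched as below. The function names, the toy data and the eigenvalue are assumptions for illustration. The two loadings reuse the published values for NOCAR and DEGREE. The sketch standardises with the population standard deviation, because this paper does not say whether a population or sample formula was used, and it assumes every indicator varies across households.

```python
import math
from statistics import mean, pstdev


def raw_scores(indicators, loadings, eigenvalue):
    """Raw first-component score per household.

    `indicators` is a list of households, each a list of 0/1 values in the
    same order as `loadings`. Each indicator is standardised, multiplied by
    its weight (loading / sqrt(eigenvalue)) and summed.
    """
    columns = list(zip(*indicators))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # assumes no constant column
    weights = [l / math.sqrt(eigenvalue) for l in loadings]
    return [
        sum(w * (x - m) / s for w, x, m, s in zip(weights, row, means, sds))
        for row in indicators
    ]


def rescale(raw, target_mean=1000.0, target_sd=100.0):
    """Rescale raw scores to the given mean and standard deviation."""
    m, s = mean(raw), pstdev(raw)
    return [target_mean + target_sd * (z - m) / s for z in raw]


# Toy data: columns are NOCAR (loading -0.61) and DEGREE (loading 0.76).
households = [[1, 0], [0, 0], [0, 1], [0, 1]]
index = rescale(raw_scores(households, [-0.61, 0.76], eigenvalue=1.5))
print([round(score) for score in index])
```

In this toy example the household with no car and no degree scores lowest, the household with neither indicator sits in between, and the two degree households score highest.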

Note that the principal components are arbitrary with respect to their sign (positive or negative), so we set the sign of the weights and loadings so that they make intuitive sense. That is, we gave advantage indicators positive weights and loadings, and disadvantage indicators negative weights and loadings. Accordingly, high scores indicate relative advantage, and low scores indicate relative disadvantage. This is consistent with previous editions of SEIFA and IHAD.

Technical details for IHAD: variables and loadings

This section gives the results of the principal component analysis carried out for IHAD, including variable loadings and percentage of variance explained. A list of variables initially considered for inclusion but removed due to high correlations with other variables or weak loadings is also provided.

IHAD summarises variables that indicate either relative socio-economic advantage or disadvantage, according to the concept described in Defining the concept behind IHAD. The final IHAD variables and loadings are listed below.

IHAD variables and loadings

IHAD indicators of disadvantage

The following variables are indicators of disadvantage. PUBLIC_RENT is the strongest indicator of disadvantage in the index. 

| Variable | Description | Loading |
| --- | --- | --- |
| PUBLIC_RENT | Households being rented from a state or territory housing authority, or a housing co-operative/community/church group (disadvantage) | -0.84 |
| LOWRENT | Households where rent payments are less than $250 per week, excluding employer landlords (excludes $0) (disadvantage) | -0.81 |
| INC_LOW | Households with low annual equivalised income (between $1 and $25,999) (disadvantage) | -0.71 |
| NOYEAR11_OR_HIGHER | Households where the person with the highest educational attainment left school at year 10 or below, including those who did not go to school and with Certificate level I or II (excludes those currently studying secondary education) (disadvantage) | -0.69 |
| NOCAR | Households with no car (disadvantage) | -0.61 |
| RETIRED_NOT_OWNED | Households with a person aged 65 years and over who does not own the home, or occupy it under a life tenure scheme (disadvantage) | -0.59 |
| DISABILITY_HH_PROP | Households where more than 50% of people need assistance with core activities (disadvantage) | -0.55 |
| NOYEAR12_DEPENDENT | Households with at least one dependent child and the person with the highest educational attainment left school at year 11 or below, including those who did not go to school and with Certificate level I or II (excludes those currently studying secondary education) (disadvantage) | -0.54 |
| FEWBED | Households with one or no bedrooms (disadvantage) | -0.45 |
| ALL_UNEMPLOYED | Households where all people aged 15 years and over are unemployed (disadvantage) | -0.44 |
| YEAR11 | Households where the person with the highest educational attainment left school at year 11 (excludes those currently studying secondary education) (disadvantage) | -0.41 |
| CHILDJOBLESS | Households with children aged under 15 years and parent(s) not employed (disadvantage) | -0.35 |

IHAD indicators of advantage

The following variables are indicators of advantage. DEGREE_DEPENDENT is the strongest indicator of advantage in the index.

| Variable | Description | Loading |
| --- | --- | --- |
| HIGHCAR | Households with three or more cars (advantage) | 0.43 |
| HIGHBED | Households with four or more bedrooms (advantage) | 0.50 |
| INC_HIGH | Households with high annual equivalised income (greater than $90,999) (advantage) | 0.68 |
| PURCHASED | Households being purchased (advantage) | 0.75 |
| DEGREE | Households where the person with the highest educational attainment has a Bachelor Degree or above (advantage) | 0.76 |
| HIGH_SKILL | Households where the highest skilled employed adult works in a skill level 1 occupation (advantage) | 0.78 |
| HIGHMORTGAGE | Households where mortgage repayments are greater than or equal to $2,900 per month (advantage) | 0.79 |
| DEGREE_DEPENDENT | Households with at least one dependent child and the person with the highest educational attainment has a Bachelor Degree or above (advantage) | 0.81 |

The 2021 IHAD index explains 41.4% of the total variance of the variables in the final variable list. The Experimental IHAD 2016 explained 43.2% of this total variance.
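
As a rough consistency check, the published loadings can be related to the variance explained. If each loading is the correlation between a variable and the first component, the squared loadings sum to the eigenvalue, and dividing by the number of variables gives the share of total variance. This relationship is assumed here rather than stated in the paper, and the inputs are the loadings rounded to two decimal places, so the result only approximates the published 41.4%.

```python
# Final IHAD loadings as published (12 disadvantage, then 8 advantage).
loadings = [
    -0.84, -0.81, -0.71, -0.69, -0.61, -0.59,
    -0.55, -0.54, -0.45, -0.44, -0.41, -0.35,
    0.43, 0.50, 0.68, 0.75, 0.76, 0.78, 0.79, 0.81,
]

# Sum of squared loadings approximates the eigenvalue of the component.
eigenvalue = sum(l * l for l in loadings)

# A correlation matrix of p variables has total variance p.
share = eigenvalue / len(loadings)

print(round(eigenvalue, 2), round(100 * share, 1))
```

The implied eigenvalue is roughly 8.27 and the share is about 41.3%, which agrees with the published figure to within rounding of the loadings.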

Removal of highly correlated variables

In most cases, highly correlated variables were removed from the initial candidate list. This was done to prevent instability in the variable weights and over-representation of any specific socio-economic characteristic. When two variables had a correlation coefficient of size greater than 0.8 in absolute value, one of them was generally removed. However, if they were deemed to be measuring different socio-economic characteristics (e.g. education and occupation), both were retained.

| Variable description | Reason for exclusion |
| --- | --- |
| Households with one or more people aged 15 years and over who are unemployed (UNEMPLOYED) (disadvantage) | Highly correlated with ALL_UNEMPLOYED, which highlights disadvantaged households better |
| Households with one or more people aged 70 years and over who need assistance with core activities (DISABILITY_OVER70) (disadvantage) | Highly correlated with DISABILITY_HH_PROP (0.83) and not as representative of the total population |
| Households where all people aged 15 years and over have no educational attainment (NOEDU) (disadvantage) | Small prevalence and highly correlated with NOYEAR11_OR_HIGHER |

Removal of low loading variables

The following variables were initially considered for the index but were excluded when the analysis showed that they were weak indicators of relative advantage or disadvantage.

| Variable | Variable description |
| --- | --- |
| OVERCROWD | Households requiring one or more extra bedrooms (based on Canadian National Occupancy Standard) (disadvantage) |
| SPAREBED | Households with one or more bedrooms spare (based on Canadian National Occupancy Standard) (advantage) |
| OTHER_HHLD | Households with a structure classified as "other" (e.g. caravan, tent) (disadvantage) |
| MULTI_FAMILY | Multi-family households (advantage) |
| HIGHRENT | Households where rent payments are more than $500 per week (advantage) |
| OWNED | Households owned outright (advantage) |
| ONEPARENT | Households with a one-parent family, with dependent children only (disadvantage) |
| CERTIFICATE | Households where the person with the highest educational attainment has a Certificate III or IV (advantage) |
| DIPLOMA | Households where the person with the highest educational attainment has an Advanced Diploma or Diploma (advantage) |
| SKILL_LVL_2 | Households where the highest skilled employed adult works in a skill level 2 occupation (advantage) |
| SKILL_LVL_4 | Households where the highest skilled employed adult works in a skill level 4 occupation (disadvantage) |
| LOW_SKILL | Households where the highest skilled employed adult works in a skill level 5 occupation (disadvantage) |
| ALL_SHORT_DISTANCE | Households where all people aged 15 years and over who are employed travel 0 to less than 2.5 km to work (advantage) |
| ALL_LONG_DISTANCE | Households where all people aged 15 years and over who are employed travel 50 to less than 250 km to work (disadvantage) |
| ALL_VLONG_DISTANCE | Households where all people aged 15 years and over who are employed travel 250 or more km to work (disadvantage) |
| SEP_DIVORCED | Households with one or more people aged 15 years and over separated or divorced (disadvantage) |
| ENGPOOR | Households with one or more people aged 15 years and over who do not speak English well (disadvantage) |
| ROM | Households with one or more people aged 15 years and over who arrived in Australia in the last 10 years (disadvantage) |
| UNENGAGED_YOUTH | Households with one or more people aged between 15 and 24 years who are not working or studying (disadvantage) |
| CARER | Households with one or more people aged 15 years and over who provide unpaid assistance to a person with a disability (disadvantage) |
| VOLUNTEER | Households with one or more people aged 15 years and over who do voluntary work for an organisation or group (advantage) |

Distribution of the IHAD

This section presents the frequency histogram of IHAD scores. The IHAD distributions have generally similar shapes to those from Experimental IHAD 2016.

The scores for IHAD range from 613 to 1,246. The table below presents the minimum and maximum scores of each IHAD quartile. These show that there is sufficient variation in the IHAD scores to allow for the formation of these groups.

Some households will not have any indicators of advantage or disadvantage (i.e. their values for the final binary candidate variables are all 0). They will still receive an IHAD score reflecting the middle of the IHAD score distribution, which places them in quartile 2.

Frequency distribution of ranked households index group
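| Household index group | Number of households*: frequency | Number of households*: percentage | Household index score: minimum | Household index score: maximum |
| --- | --- | --- | --- | --- |
| 1 | 2,307,765 | 25.0 | 613 | 943 |
| 2 | 2,308,638 | 25.0 | 943 | 992 |
| 3 | 2,338,786 | 25.3 | 992 | 1,070 |
| 4 | 2,275,872 | 24.7 | 1,070 | 1,246 |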

* The total number of in-scope households assigned an IHAD score is 9,231,061
