Construction of IHAD

Latest release

Index of Household Advantage and Disadvantage (IHAD): Technical Paper

Reference period

2021

Released

11/02/2025

Next release Unknown

First release

This chapter describes the methods used to construct IHAD, some important technical specifications and basic outputs.

Principal Component Analysis

Principal component analysis (PCA) has been used since the first release of SEIFA to summarise Census variables related to socio-economic advantage and disadvantage. The same methodology is used to create the IHAD, modified where necessary to use binary variables. The aim of PCA is to reduce a large number of correlated variables to a smaller set of transformed variables, called "principal components". Each component is a weighted linear combination of the original variables. It is possible to extract as many components as there are variables. If the original variables are highly correlated, much of the variation can be summarised by a single principal component.

The first principal component is the weighted linear combination of variables that captures the maximum amount of variation present in the original dataset. This is calculated using the correlations between the variables. In general, variables that are strongly correlated with many others in the list will receive high weights. The first principal component is used to create the IHAD index.
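The extraction of the first principal component can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the ABS production code: the function name `first_principal_component` and the 3-variable correlation matrix are hypothetical. The sketch keeps the eigenvector with the largest eigenvalue and converts it into loadings (the correlation of each variable with the component), weights (loading divided by the square root of the eigenvalue) and the share of total variance explained.

```python
import numpy as np


def first_principal_component(corr):
    """Eigenvalue, loadings, weights and variance share of the first
    principal component of a p x p correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    lam = eigenvalues[-1]              # largest eigenvalue
    vec = eigenvectors[:, -1]          # unit-length eigenvector (sign is arbitrary)
    loadings = vec * np.sqrt(lam)      # correlation of each variable with the component
    weights = loadings / np.sqrt(lam)  # L_j / sqrt(lambda), i.e. the eigenvector itself
    share = lam / corr.shape[0]        # the trace of a correlation matrix is p
    return lam, loadings, weights, share


# Hypothetical correlation matrix for three indicators
corr = np.array([[1.0, 0.6, 0.5],
                 [0.6, 1.0, 0.4],
                 [0.5, 0.4, 1.0]])
lam, loadings, weights, share = first_principal_component(corr)
```

Because the squared loadings sum to the eigenvalue, variables that are strongly correlated with many others raise the eigenvalue and receive large loadings, which is the behaviour described above.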

The PCA used the binary candidate variables and their correlation matrix to indicate how strongly each variable contributes to the measurement of the unobserved latent variable of interest, namely socio-economic advantage and disadvantage. Each variable receives a loading that indicates the correlation of that variable with the index. A positive loading indicates an advantaging variable, whereas a negative loading indicates a disadvantaging variable. The variables with the largest loadings in absolute value are the ones most strongly correlated with the index.

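Polychoric correlations were used instead of standard Pearson correlations to form the correlation matrix. For binary variables, Pearson correlations tend to understate the association between the underlying characteristics, whereas polychoric correlations (called tetrachoric correlations when both variables are binary) estimate the correlation between assumed underlying continuous variables, so that the correlation coefficients used in the PCA are not biased by the binary coding. Using polychoric correlations is therefore considered more accurate when running a PCA on discrete data such as the binary variables used in the IHAD.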
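For a pair of binary indicators, the polychoric correlation reduces to the tetrachoric correlation. The sketch below assumes each indicator is a standard normal latent variable cut at a threshold: the thresholds come from the share of 0s, and the correlation is the value of rho at which the bivariate normal probability of both indicators being 0 matches the observed share. With only four cells, matching the thresholds and this one cell reproduces the whole 2x2 table. The paper does not name the software the ABS used, so the function `tetrachoric` and the simulated data are illustrative only.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm


def tetrachoric(x, y):
    """Tetrachoric correlation of two 0/1 indicators, assuming each is a
    standard normal latent variable cut at a threshold."""
    x0 = np.asarray(x) == 0
    y0 = np.asarray(y) == 0
    # Latent thresholds: P(indicator == 0) = Phi(tau)
    tau_x = norm.ppf(x0.mean())
    tau_y = norm.ppf(y0.mean())
    p00 = np.mean(x0 & y0)  # observed share with both indicators equal to 0

    def gap(rho):
        cov = [[1.0, rho], [rho, 1.0]]
        return multivariate_normal.cdf([tau_x, tau_y], mean=[0.0, 0.0], cov=cov) - p00

    # The bivariate normal CDF increases with rho, so the root is bracketed
    return brentq(gap, -0.999, 0.999)


# Simulated illustration: latent correlation 0.6, then dichotomised
rng = np.random.default_rng(0)
latent = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=200_000)
x = (latent[:, 0] > 0.3).astype(int)
y = (latent[:, 1] > -0.5).astype(int)
r_tetra = tetrachoric(x, y)
r_pearson = np.corrcoef(x, y)[0, 1]
```

In this simulation `r_tetra` recovers a value close to the latent 0.6, while the Pearson correlation of the same 0/1 data is noticeably smaller, which is the attenuation the polychoric approach is meant to avoid.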

The candidate variables listed in Description of candidate IHAD variables were used in the PCA for the IHAD, and a variable was removed if its loading was less than or equal to 0.3 in absolute value, on the grounds that it was not a particularly strong indicator of advantage or disadvantage. This process was performed iteratively until all of the variables had a loading above 0.3 in absolute value. This is the same procedure used to create SEIFA. The final variables and their loadings following this process are presented in Technical details for IHAD: variables and loadings.

The first principal component scores were derived by taking the product of each standardised variable with its respective weight, then taking the sum. For convenience and consistency with the approach taken for SEIFA, these raw component scores were then standardised to a mean of 1,000 and a standard deviation of 100 to produce the index.

The sign of the PCA weights is arbitrary, but intuitively more disadvantaged households should have lower scores. For example, NOCAR is a disadvantage variable and so should have a negative weight. The weights were therefore multiplied by -1 so that advantage indicators have positive weights and loadings, and disadvantage indicators have negative weights and loadings. Accordingly, high index scores indicate relative advantage, and low index scores indicate relative disadvantage.
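The sign convention can be checked automatically by anchoring on a known disadvantage variable. The sketch below uses NOCAR as the anchor, following the example above; treating a single anchor variable as the test is an assumption made for illustration, and `orient_signs` is a hypothetical helper.

```python
import numpy as np


def orient_signs(loadings, names, disadvantage_anchor="NOCAR"):
    """Return loadings oriented so the anchor disadvantage variable is negative.

    Multiplying every loading (and hence every weight) by -1 reverses the
    order of the scores without changing how far apart they are.
    """
    loadings = np.asarray(loadings, dtype=float)
    if loadings[list(names).index(disadvantage_anchor)] > 0:
        loadings = -loadings
    return loadings


oriented = orient_signs([0.61, -0.76], ["NOCAR", "DEGREE"])
```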

Step-by-step process

With the preceding two sections providing context, a step-by-step process for constructing IHAD is presented below:

1: Creating the initial variable list

Given the data available, we created a list of variables related to the definition of relative household socio-economic advantage and disadvantage.

2: Removing households with 10+ missing responses and imputing missing responses

We applied the IHAD scope to the dataset and then identified households with 10 or more applicable missing responses. We removed these households from the dataset, imputed missing responses for most of the required variables, and then applied hot-deck imputation for HIED and HEAP to create the dataset used to construct the candidate variables.
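The paper does not specify the hot-deck donor rules, so the following is only a sketch under assumptions: it uses a simple random hot-deck within imputation classes, where each missing value is replaced by the value of a randomly chosen household from the same class. The function name, the class variable `region` and the toy data are hypothetical, and missing responses are counted over all listed columns rather than only the applicable ones.

```python
import numpy as np
import pandas as pd


def drop_and_hotdeck(df, check_cols, hotdeck_cols, class_col, max_missing=10, seed=0):
    """Drop households with too many missing responses, then hot-deck impute."""
    # Drop households with max_missing or more missing responses
    n_missing = df[check_cols].isna().sum(axis=1)
    kept = df.loc[n_missing < max_missing].copy()

    rng = np.random.default_rng(seed)
    # Random hot-deck: donors are households in the same (hypothetical) class
    for col in hotdeck_cols:
        for _, idx in kept.groupby(class_col).groups.items():
            values = kept.loc[idx, col]
            donors = values.dropna().to_numpy()
            missing = values.index[values.isna()]
            if len(missing) and len(donors):
                kept.loc[missing, col] = rng.choice(donors, size=len(missing))
    return kept


# Toy data; the threshold is lowered to 2 so the example drops one household
df = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B", "B"],
    "HIED": [1.0, np.nan, 1.0, 7.0, np.nan, np.nan],
    "HEAP": [2.0, 2.0, np.nan, 9.0, 9.0, np.nan],
})
out = drop_and_hotdeck(df, ["HIED", "HEAP"], ["HIED", "HEAP"], "region", max_missing=2)
```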

3: Constructing the variables

We created binary indicators from household-, family- and person-level variables. Each indicator takes a value of 1 if the characteristic is present for the household, and 0 if it is not.
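A few of the household-level indicators can be sketched with pandas, using the cut-offs given in the variable descriptions later in this chapter (NOCAR: no car; HIGHCAR: three or more cars; FEWBED: one or no bedrooms; HIGHBED: four or more bedrooms; HIGHMORTGAGE: repayments of $2,900 or more per month). The input column names `cars`, `bedrooms` and `mortgage_month` are hypothetical stand-ins for the underlying Census fields.

```python
import pandas as pd


def make_indicators(hh):
    """Build 0/1 indicators from hypothetical household-level columns."""
    out = pd.DataFrame(index=hh.index)
    out["NOCAR"] = (hh["cars"] == 0).astype(int)
    out["HIGHCAR"] = (hh["cars"] >= 3).astype(int)
    out["FEWBED"] = (hh["bedrooms"] <= 1).astype(int)
    out["HIGHBED"] = (hh["bedrooms"] >= 4).astype(int)
    # Households without a mortgage hold NaN, and NaN >= 2900 evaluates to False
    out["HIGHMORTGAGE"] = (hh["mortgage_month"] >= 2900).astype(int)
    return out


hh = pd.DataFrame({
    "cars": [0, 1, 3],
    "bedrooms": [1, 3, 4],
    "mortgage_month": [float("nan"), 1500.0, 3000.0],
})
indicators = make_indicators(hh)
```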

4: Removing very highly correlated variables

We removed highly correlated variables to avoid over-representing any specific socio-economic characteristic. When two variables had a correlation coefficient greater than 0.8 in absolute value and were measuring conceptually similar aspects of advantage or disadvantage, we generally removed one of them. However, we applied some discretion, depending on the variables in question and the size of the correlation.
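The first part of this step, flagging pairs above the 0.8 threshold, is mechanical; deciding which variable of a pair to drop remained a judgement call. The helper below is a hypothetical sketch that only produces the shortlist. The 0.83 correlation between DISABILITY_HH_PROP and DISABILITY_OVER70 is taken from the exclusion table later in this chapter; the other values are made up.

```python
import itertools

import numpy as np


def highly_correlated_pairs(corr, names, threshold=0.8):
    """List variable pairs whose absolute correlation exceeds the threshold."""
    corr = np.asarray(corr, dtype=float)
    pairs = []
    for i, j in itertools.combinations(range(len(names)), 2):
        if abs(corr[i, j]) > threshold:
            pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


names = ["DISABILITY_HH_PROP", "DISABILITY_OVER70", "NOCAR"]
corr = [[1.00, 0.83, 0.30],
        [0.83, 1.00, 0.20],
        [0.30, 0.20, 1.00]]
flagged = highly_correlated_pairs(corr, names)
```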

5: Conducting the initial PCA

We conducted principal component analysis (PCA) using the binary candidate variables and the correlation matrix of these variables, to obtain the loading for each variable on the first principal component.

6: Removing low loading variables

We excluded variables with loadings less than 0.3 in absolute value, on the grounds that they were not strong indicators of relative advantage or disadvantage. This limit is an accepted level in the PCA literature and has been used in past releases of SEIFA and IHAD. We removed variables one at a time, starting with the lowest loading variable.

7: Conducting PCA on the reduced list of variables

We conducted a PCA on the reduced variable list, and if any remaining variables had a loading below 0.3 in absolute value, we repeated steps 6 and 7.
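Steps 5 to 7 form a loop: run the PCA, drop the single weakest variable if its absolute loading is below 0.3, and repeat. A minimal sketch, assuming the (polychoric) correlation matrix has already been computed; the helper names and the 4-variable matrix are hypothetical, and a loading of exactly 0.3 is kept, following the wording of step 6.

```python
import numpy as np


def first_pc_loadings(corr):
    """Loadings of the first principal component of a correlation matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    return eigenvectors[:, -1] * np.sqrt(eigenvalues[-1])


def prune_low_loadings(corr, names, cutoff=0.3):
    """Drop the weakest variable and re-run the PCA until every |loading| >= cutoff."""
    corr = np.asarray(corr, dtype=float)
    keep = list(range(len(names)))
    while True:
        loadings = first_pc_loadings(corr[np.ix_(keep, keep)])
        abs_loadings = np.abs(loadings)
        weakest = int(np.argmin(abs_loadings))
        if abs_loadings[weakest] >= cutoff:
            return [names[k] for k in keep], loadings
        del keep[weakest]  # remove one variable at a time, lowest loading first


corr = np.array([[1.00, 0.60, 0.60, 0.05],
                 [0.60, 1.00, 0.60, 0.05],
                 [0.60, 0.60, 1.00, 0.05],
                 [0.05, 0.05, 0.05, 1.00]])
kept, final_loadings = prune_low_loadings(corr, ["A", "B", "C", "D"])
```

Re-running the PCA after each removal matters because dropping one variable changes the eigenvector, and therefore every remaining loading.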

8: Calculating and standardising component/index scores

We derived the first principal component score for each household by multiplying each standardised variable by its respective weight, then summing across all variables. The weight for each variable was calculated by dividing its loading by the square root of the eigenvalue.

$Z_h = \sum\limits_{j=1}^{p} \frac{L_j}{\sqrt{\lambda}} \times X_{j,h}$

where,

$Z_h$ = raw score for household $h$

$X_{j,h}$ = standardised value of the j-th variable for household $h$

$L_j$ = loading for the j-th variable

$\lambda$ = eigenvalue of the first principal component

$p$ = total number of variables in the index

For convenience of presentation, we then rescaled the raw scores to a mean of 1,000 and standard deviation of 100 to create a new set of scores that are the household index scores in IHAD.
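The scoring formula and the rescaling can be combined into one short function. This is a sketch, not the ABS implementation: the function name and the four toy households are hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, although with around 9 million households the choice makes no practical difference.

```python
import numpy as np


def household_scores(x_binary, loadings, eigenvalue):
    """Raw first-component scores rescaled to mean 1,000 and SD 100."""
    x = np.asarray(x_binary, dtype=float)
    # Standardise each indicator across households (assumes no constant columns)
    x_std = (x - x.mean(axis=0)) / x.std(axis=0)
    weights = np.asarray(loadings, dtype=float) / np.sqrt(eigenvalue)
    raw = x_std @ weights  # sum over variables of weight * standardised value
    return 1000 + 100 * (raw - raw.mean()) / raw.std()


# Four hypothetical households: one advantage and one disadvantage indicator
x = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = household_scores(x, loadings=[0.8, -0.7], eigenvalue=1.5)
```

The household with only the advantage indicator receives the highest score and the household with only the disadvantage indicator the lowest.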

Note that the principal components are arbitrary with respect to their sign (positive or negative), so we set the sign of the weights and loadings so that they make intuitive sense. That is, we gave advantage indicators positive weights and loadings, and disadvantage indicators negative weights and loadings. Accordingly, high scores indicate relative advantage, and low scores indicate relative disadvantage. This is consistent with previous editions of SEIFA and IHAD.

Technical details for IHAD: variables and loadings

This section gives the results of the principal component analysis carried out for IHAD, including variable loadings and percentage of variance explained. A list of variables initially considered for inclusion but removed due to high correlations with other variables or weak loadings is also provided.

IHAD summarises variables that indicate either relative socio-economic advantage or disadvantage, according to the concept described in Defining the concept behind IHAD. The final IHAD variables and loadings are listed below.

IHAD variables and loadings

IHAD indicators of disadvantage

The following variables are indicators of disadvantage. PUBLIC_RENT is the strongest indicator of disadvantage in the index.

Variable	Description	Loading
PUBLIC_RENT	Households being rented from a state or territory housing authority, or a housing co-operative/community/church group (disadvantage)	-0.84
LOWRENT	Households where rent payments are less than $250 per week, excluding employer landlords (excludes $0) (disadvantage)	-0.81
INC_LOW	Households with low annual equivalised income (between $1 and $25,999) (disadvantage)	-0.71
NOYEAR11_OR_HIGHER	Households where the person with the highest educational attainment left school at year 10 or below, including those who did not go to school and those with a Certificate level I or II (excludes those currently studying secondary education) (disadvantage)	-0.69
NOCAR	Households with no car (disadvantage)	-0.61
RETIRED_NOT_OWNED	Households with a person aged 65 years and over who does not own the home, or occupy it under a life tenure scheme (disadvantage)	-0.59
DISABILITY_HH_PROP	Households where more than 50% of people need assistance with core activities (disadvantage)	-0.55
NOYEAR12_DEPENDENT	Households with at least one dependent child and the person with the highest educational attainment left school at year 11 or below, including those who did not go to school and those with a Certificate level I or II (excludes those currently studying secondary education) (disadvantage)	-0.54
FEWBED	Households with one or no bedrooms (disadvantage)	-0.45
ALL_UNEMPLOYED	Households where all people aged 15 years and over are unemployed (disadvantage)	-0.44
YEAR11	Households where the person with the highest educational attainment left school at year 11 (excludes those currently studying secondary education) (disadvantage)	-0.41
CHILDJOBLESS	Households with children aged under 15 years and parent(s) not employed (disadvantage)	-0.35

IHAD indicators of advantage

The following variables are indicators of advantage. DEGREE_DEPENDENT is the strongest indicator of advantage in the index.

Variable	Description	Loading
DEGREE_DEPENDENT	Households with at least one dependent child and the person with the highest educational attainment has a Bachelor Degree or above (advantage)	0.81
HIGHMORTGAGE	Households where mortgage repayments are greater than or equal to $2,900 per month (advantage)	0.79
HIGH_SKILL	Households where the highest skilled employed adult works in a skill level 1 occupation (advantage)	0.78
DEGREE	Households where the person with the highest educational attainment has a Bachelor Degree or above (advantage)	0.76
PURCHASED	Households being purchased (advantage)	0.75
INC_HIGH	Households with high annual equivalised income (greater than $90,999) (advantage)	0.68
HIGHBED	Households with four or more bedrooms (advantage)	0.50
HIGHCAR	Households with three or more cars (advantage)	0.43

The 2021 IHAD explains 41.4% of the total variance of the variables in the final variable list. The Experimental IHAD 2016 explained 43.2% of the total variance of its own final variable list.
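The published loadings can be used as a rough consistency check on this figure. Because each loading is the correlation of a variable with the component, the squared loadings sum to the eigenvalue, and the total variance of a correlation matrix of p variables is p. The sketch below applies this to the 20 loadings in the two tables above; since they are rounded to two decimal places, the result is only approximate.

```python
import numpy as np

# Loadings as published in the two tables above (rounded to 2 decimal places)
disadvantage = [-0.84, -0.81, -0.71, -0.69, -0.61, -0.59,
                -0.55, -0.54, -0.45, -0.44, -0.41, -0.35]
advantage = [0.43, 0.50, 0.68, 0.75, 0.76, 0.78, 0.79, 0.81]
loadings = np.array(disadvantage + advantage)

p = loadings.size                         # 20 variables in the final list
eigenvalue = np.sum(loadings ** 2)        # sum of squared loadings
share = eigenvalue / p                    # proportion of total variance explained
weights = loadings / np.sqrt(eigenvalue)  # implied weights L_j / sqrt(lambda)
```

This gives an eigenvalue of about 8.27 and a share of about 41.3%, in line with the reported 41.4% once rounding of the loadings is allowed for.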

Removal of highly correlated variables

In most cases, highly correlated variables were removed from the initial candidate list. This was done to prevent instability in the variable weights and over-representation of any specific socio-economic characteristic. When two variables had a correlation coefficient greater than 0.8 in absolute value, one of them was generally removed. However, if they were deemed to be measuring different socio-economic characteristics (e.g. education and occupation), both were retained.

Variable description	Reason for exclusion
Households with one or more people aged 15 years and over who are unemployed (UNEMPLOYED) (disadvantage)	Highly correlated with ALL_UNEMPLOYED, which better identifies disadvantaged households
Households with one or more people aged 70 years and over who need assistance with core activities (DISABILITY_OVER70) (disadvantage)	Highly correlated with DISABILITY_HH_PROP (0.83) and not as representative of the total population
Households where all people aged 15 years and over have no educational attainment (NOEDU) (disadvantage)	Small prevalence and highly correlated with NOYEAR11_OR_HIGHER

Removal of low loading variables

The following variables were initially considered for the index but were excluded when the analysis showed that they were weak indicators of relative advantage or disadvantage.

Variable	Variable description
OVERCROWD	Households requiring one or more extra bedrooms (based on Canadian National Occupancy Standard) (disadvantage)
SPAREBED	Households with one or more bedrooms spare (based on Canadian National Occupancy Standard) (advantage)
OTHER_HHLD	Households with a structure classified as "other" (e.g. caravan, tent) (disadvantage)
MULTI_FAMILY	Multi-family households (advantage)
HIGHRENT	Households where rent payments are more than $500 per week (advantage)
OWNED	Households owned outright (advantage)
ONEPARENT	Households with a one-parent family, with dependent children only (disadvantage)
CERTIFICATE	Households where the person with the highest educational attainment has a Certificate III or IV (advantage)
DIPLOMA	Households where the person with the highest educational attainment has an Advanced Diploma or Diploma (advantage)
SKILL_LVL_2	Households where the highest skilled employed adult works in a skill level 2 occupation (advantage)
SKILL_LVL_4	Households where the highest skilled employed adult works in a skill level 4 occupation (disadvantage)
LOW_SKILL	Households where the highest skilled employed adult works in a skill level 5 occupation (disadvantage)
ALL_SHORT_DISTANCE	Households where all people aged 15 years and over who are employed travel 0 to less than 2.5 km to work (advantage)
ALL_LONG_DISTANCE	Households where all people aged 15 years and over who are employed travel 50 to less than 250 km to work (disadvantage)
ALL_VLONG_DISTANCE	Households where all people aged 15 years and over who are employed travel 250 km or more to work (disadvantage)
SEP_DIVORCED	Households with one or more people aged 15 years and over separated or divorced (disadvantage)
ENGPOOR	Households with one or more people aged 15 years and over who do not speak English well (disadvantage)
ROM	Households with one or more people aged 15 years and over who arrived in Australia in the last 10 years (disadvantage)
UNENGAGED_YOUTH	Households with one or more people aged between 15 and 24 years who are not working or studying (disadvantage)
CARER	Households with one or more people aged 15 years and over who provide unpaid assistance to a person with a disability (disadvantage)
VOLUNTEER	Households with one or more people aged 15 years and over who do voluntary work for an organisation or group (advantage)

Distribution of the IHAD

This section presents the frequency histogram of IHAD scores. The IHAD distributions have generally similar shapes to those from Experimental IHAD 2016.

The scores for IHAD range from 613 to 1,246. The frequency distribution table below presents the minimum and maximum scores of each IHAD quartile group; these show that there is sufficient variation in the IHAD scores to allow these groups to be formed.

Some households will not have any indicators of advantage or disadvantage (i.e. their values for the final binary candidate variables are all 0). They will still receive an IHAD score reflecting the middle of the IHAD score distribution, which places them in quartile 2.

Distribution of household index scores
IHAD score group (midpoint)	Number of households
563	0
588	0
613	2
638	21
663	4,056
688	18,371
713	30,992
738	48,733
763	59,239
788	89,802
813	115,610
838	206,239
863	346,385
888	346,280
913	752,604
938	1,104,857
963	696,955
988	988,534
1013	769,681
1038	705,848
1063	873,613
1088	462,249
1113	574,997
1138	337,998
1163	281,534
1188	218,916
1213	166,986
1238	30,559
1263	0
1288	0

Figure: Distribution of household index scores (histogram of the number of households by IHAD score group midpoint; the data are shown in the table above)

Frequency distribution of households by household index group
Household index group	Number of households*	Percentage of households (%)	Minimum IHAD score	Maximum IHAD score
1	2,307,765	25.0	613	943
2	2,308,638	25.0	943	992
3	2,338,786	25.3	992	1,070
4	2,275,872	24.7	1,070	1,246

* The total number of in-scope households assigned an IHAD score is 9,231,061
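The group sizes in the table are close to, but not exactly, 25% each. One plausible reason is tied scores: a household's score depends only on which binary indicators it has, so many households share identical scores, and households with equal scores would normally be kept in the same group. The sketch below assumes a simple rule (cut points at the 25th, 50th and 75th percentiles, with a score equal to a cut point placed in the lower group) and shows how ties can unbalance the groups. The ABS ranking rule is not described here, so `quartile_groups` and its tie handling are hypothetical.

```python
import numpy as np


def quartile_groups(scores):
    """Assign groups 1-4 using the 25th, 50th and 75th percentiles as cut points.

    Equal scores always land in the same group, so group sizes need not be
    exactly 25% each when many scores are tied.
    """
    scores = np.asarray(scores, dtype=float)
    cuts = np.quantile(scores, [0.25, 0.5, 0.75])
    # side="left": a score equal to a cut point goes to the lower group
    return np.searchsorted(cuts, scores, side="left") + 1


untied = quartile_groups(np.arange(1, 9))
tied = quartile_groups([1, 2, 2, 2, 2, 3, 4, 5])
```

With distinct scores the eight values split evenly into four groups of two; with the run of tied values, five of the eight scores fall into group 1 and group 2 is empty.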
