Construction of the indexes

Socio-Economic Indexes for Areas (SEIFA): Technical Paper

Reference period: 2021

Released: 27/04/2023

This chapter describes the methods used to construct the indexes, some important technical specifications of each index, and some basic outputs.

Principal Component Analysis

Each index is a weighted sum of SEIFA variables. As with past versions of SEIFA, principal component analysis (PCA) is used to determine the weights. This section introduces some technical concepts related to PCA to help the reader understand the SEIFA index construction process. Some references are given at the end of this section for readers interested in a comprehensive discussion of PCA.

PCA is a technique that involves summarising a large number of correlated variables into a set of new uncorrelated components, each of which is a linear combination of the original variables. There are as many principal components as there are variables. If the original variables are highly correlated, much of the variation can be summarised by a reduced set of components, enabling easier analysis. The first principal component accounts for the largest proportion of variance in the original dataset, with each following component explaining less of the variance. The principal component used for each SEIFA index is the one that can be interpreted as best explaining the variation in the concept of advantage and disadvantage for that index. For the four indexes in SEIFA 2021, the first principal component was used to create the index.

The PCA procedure gives an eigenvalue for each component, which indicates the amount of variance in the original data explained by the component. The proportion of variance explained by a principal component is its eigenvalue divided by the sum of all the eigenvalues. The 'loading' for a variable is calculated by multiplying the eigenvector by the square root of the eigenvalue. It gives a measure of the strength of the relationship between the variable and the component, though it should be noted that some sources use different definitions for the loadings and weights in PCA. The loadings are also useful in comparing results obtained from different sets of original variables (such as for the four indexes in SEIFA). Loadings for each index are presented in the following sections.
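The quantities described above can be illustrated with a minimal sketch. It assumes a small made-up correlation matrix and uses power iteration to find the first principal component; the function name, the starting vector and the matrix values are illustrative assumptions, not the software or data used for SEIFA.

```python
import math

def first_principal_component(corr, iterations=500):
    """Approximate the largest eigenvalue and its unit eigenvector by power iteration."""
    p = len(corr)
    # Assumption: an uneven starting vector, so it is unlikely to be
    # orthogonal to the dominant eigenvector.
    vec = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    rv = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v * r for v, r in zip(vec, rv))
    return eigenvalue, vec

# Made-up correlations: two disadvantage-type variables and one advantage-type variable.
corr = [
    [1.0, 0.6, -0.5],
    [0.6, 1.0, -0.4],
    [-0.5, -0.4, 1.0],
]

eigenvalue, eigenvector = first_principal_component(corr)

# The eigenvalues of a correlation matrix sum to the number of variables.
proportion_explained = eigenvalue / len(corr)

# Loading = eigenvector element multiplied by the square root of the eigenvalue.
loadings = [v * math.sqrt(eigenvalue) for v in eigenvector]
```

Because the eigenvector has unit length, the squared loadings sum to the eigenvalue. The overall sign of the eigenvector is arbitrary, so only the relative signs of the loadings are meaningful until a sign convention is chosen.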

To generate the component scores (otherwise known as raw scores), each loading is converted to a weight by dividing it by the square root of the eigenvalue. The products of the weights and the standardised variable values are then summed to produce the raw scores. The raw scores for each component will then have variance equal to the eigenvalue for that component. We then rescale the raw scores to a mean of 1,000 and standard deviation of 100 to create a new set of scores that are the index scores in SEIFA. This process is known as "standardisation".
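The statement that the raw scores have variance equal to the eigenvalue can be checked in two lines. This is a sketch under assumptions not stated in the text: the symbols $R$ (correlation matrix of the standardised variables), $\mathbf{e}$ (unit-length eigenvector), $\mathbf{L}$ (vector of loadings) and $\mathbf{w}$ (vector of weights) are introduced here for illustration, and no values are missing.

$\mathbf{w} = \frac{\mathbf{L}}{\sqrt{\lambda}} = \frac{\sqrt{\lambda}\,\mathbf{e}}{\sqrt{\lambda}} = \mathbf{e}$

$\mathrm{Var}(Z) = \mathbf{w}^{\top} R\,\mathbf{w} = \mathbf{e}^{\top} R\,\mathbf{e} = \lambda\,\mathbf{e}^{\top}\mathbf{e} = \lambda$

In other words, under this definition of the loadings the weights are simply the elements of the eigenvector.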

More detailed explanations of PCA can be found in Jolliffe (1986) and O’Rourke (2005).

Areas with no SEIFA score

Some SA1 areas do not receive an index score, either due to low populations or poor-quality data. The criteria used to identify these areas are called ‘exclusion rules’. SEIFA 2021 uses a similar exclusion rule framework to SEIFA 2016, with the aim of obtaining a reliable index score for as many areas as possible.

The 2021 exclusion rules use a two-phase approach. The first phase excludes areas (SA1s) that should not receive a SEIFA score because of the type of area, confidentiality or reliability concerns (e.g. low population or low response rates for particular key variables). The second phase excludes areas (SA1s) by looking specifically at the variables included in each index. For each SA1, if any of the variables have a low denominator count, it is deemed that there is not enough data to support a reliable calculation of an index score for that area.

Some additional comments on the exclusion rule framework:

The first phase rules are applied before PCA, whereas the second phase rules are applied following the PCA when the list of variables has been finalised. The step-by-step process provides details on how this is implemented.
SA1s excluded in the first phase will be excluded for all four indexes. The number of SA1s excluded in the second phase may be different for each index, because they have different sets of variables.
Following on from the point above, an area can receive a score for one index and not another depending on the make-up of its variables.
The low denominator cut-off of six is chosen based on past practice and a judgement on how many responses are required to calculate a reliable value for an area.

The exclusion of areas is based on the confidentialised counts for each SEIFA variable to ensure the confidentiality of respondents is upheld and the reliability of the indexes is maintained.

The specific exclusion rules and the number of areas meeting each rule are summarised in the table below. Note that areas might fall into multiple categories, which is why the column sum does not equal the final total number of excluded areas.

The proportions of excluded SA1s are similar to those for SEIFA 2016.

Summary of excluded areas - first phase
Exclusion criteria	Total SA1s excluded
Population = 0	1,357
No Usual Address SA1	9
Offshore, Shipping SA1	24
Population > 0 and ≤ 10	554
Employed persons ≤ 5	2,079
Classifiable(a) occupied private dwellings ≤ 5	2,118
People in private dwellings ≤ 20%	1,741
Total excluded due to any of the rules above	2,412

(a) These are dwellings where the type of household living in the dwelling could be determined during the collection process. For more information, refer to the 2021 Census Dictionary.

Summary of excluded areas - second phase
Index	Total SA1s excluded
IRSD	150
IRSAD	150
IER	127
IEO	20
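The two-phase framework can be sketched as follows. This is an illustrative sketch only: the dictionary keys, the `area_type` labels and the reading of the 20% rule as a share of the SA1 population are assumptions, and the real rules are applied to confidentialised counts.

```python
DENOMINATOR_CUTOFF = 6  # low denominator cut-off stated in the text

def first_phase_reasons(area):
    """Return the first phase criteria met by one SA1 (an empty list means none)."""
    reasons = []
    population = area["population"]
    # Hypothetical labels for the special purpose SA1 types.
    if area.get("area_type") in ("no usual address", "offshore or shipping"):
        reasons.append("special purpose SA1")
    if population == 0:
        reasons.append("population = 0")
    elif population <= 10:
        reasons.append("population > 0 and <= 10")
    if area["employed_persons"] <= 5:
        reasons.append("employed persons <= 5")
    if area["classifiable_dwellings"] <= 5:
        reasons.append("classifiable occupied private dwellings <= 5")
    # Assumption: the 20% rule is measured as a share of the SA1 population.
    if population > 0 and area["people_in_private_dwellings"] / population <= 0.20:
        reasons.append("people in private dwellings <= 20%")
    return reasons

def second_phase_excluded(denominators, index_variables):
    """True if any variable in the index has a denominator below the cut-off."""
    return any(denominators[name] < DENOMINATOR_CUTOFF for name in index_variables)

# A made-up SA1 that passes the first phase.
area = {
    "population": 400,
    "employed_persons": 200,
    "classifiable_dwellings": 150,
    "people_in_private_dwellings": 390,
}
first_phase = first_phase_reasons(area)

# Made-up denominators: only four one parent families were counted in this SA1.
denominators = {"UNEMPLOYED": 210, "ONEPARENT": 4}
excluded_with_oneparent = second_phase_excluded(denominators, ["UNEMPLOYED", "ONEPARENT"])
excluded_without_oneparent = second_phase_excluded(denominators, ["UNEMPLOYED"])
```

An area can meet several first phase criteria at once, which is why the function returns a list. The second phase result depends on the variable list, so the same SA1 can be scored for one index and excluded from another.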

Step-by-step process

With the preceding two sections providing context, a step-by-step process for constructing the indexes is presented below.

1: Creating the initial variable list

Given the data available, we created a list of variables related to our definition of relative socio-economic advantage and disadvantage.

2: Constructing the variables

We created all variables as proportions at the SA1 level (e.g. ‘percent of people aged 15 years and over attending secondary school’). We then standardised these proportions to a mean of zero and a standard deviation of one. The standardisation was used to prevent variables with larger prevalence, or larger ranges, from having a disproportionate influence on the index.
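A minimal sketch of this step is shown below, using made-up counts for four SA1s. The function names are illustrative, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics

def proportions(numerators, denominators):
    """Express each SA1 count as a per cent of its denominator."""
    return [100.0 * n / d for n, d in zip(numerators, denominators)]

def standardise(values):
    """Rescale values to a mean of zero and a standard deviation of one."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / sd for v in values]

# Made-up counts: unemployed people and labour force size in four SA1s.
unemployed = [12, 30, 8, 25]
labour_force = [200, 250, 160, 500]

pct_unemployed = proportions(unemployed, labour_force)
z_unemployed = standardise(pct_unemployed)
```

After standardisation every variable is on the same scale, so a variable with a wide range of percentages cannot dominate the correlation structure simply because of its units.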

3: Applying first phase exclusion rules

We excluded areas (SA1s) that should not receive an index score because of the type of area, confidentiality, or reliability concerns.

4: Calculating the correlation matrix

We set to missing any variable values that had denominators less than our prescribed cut-off of six. Note that we did not exclude areas based on this cut-off at this stage in the process; this occurred at step nine. We calculated the correlation matrix using pairwise deletion where areas (observations) contained missing values. Pairwise deletion is a method for dealing with missing data in which the correlation for each pair of variables is calculated from all areas that have non-missing values for both variables. This contrasts with listwise deletion, in which entire records (areas in our case) are removed from the analysis if any of their variables have missing values. Given the number of observations in our dataset and the low prevalence of missing values, the use of pairwise deletion had very little impact on the correlation matrix; however, it did provide a convenient way of implementing our second phase exclusion rules (refer to step nine).
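Pairwise deletion can be sketched as follows, with `None` standing in for a missing value. The function names and the data are illustrative assumptions, not the implementation used for SEIFA.

```python
import math

def mask_small_denominators(values, denominators, cutoff=6):
    """Set a value to missing (None) when its denominator is below the cut-off."""
    return [v if d >= cutoff else None for v, d in zip(values, denominators)]

def pairwise_correlation(x, y):
    """Pearson correlation using only the areas where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Correlation matrix built one pair of variables at a time."""
    return [[pairwise_correlation(x, y) for y in columns] for x in columns]

# Made-up values for three variables across five SA1s.
a = [10.0, 20.0, 30.0, 40.0, 50.0]
b = mask_small_denominators([12.0, 18.0, 33.0, 41.0, 52.0], [80, 75, 4, 90, 60])
c = [50.0, 40.0, 30.0, None, 10.0]

matrix = correlation_matrix([a, b, c])
```

Here the correlation between `a` and `c` uses four areas and the correlation between `a` and `b` uses a different four, whereas listwise deletion would leave only the three areas that are complete on all variables.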

5: Removing very highly correlated variables

We removed highly correlated variables to avoid over-representing any specific socio-economic characteristic. When two variables had a correlation coefficient greater than 0.8 in absolute value and were measuring conceptually similar aspects of advantage or disadvantage, we generally removed one of them. However, we applied some discretion, depending on the variables in question and the size of the correlation.
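The screening part of this step can be sketched as a simple scan of the correlation matrix. The two correlations involving DEGREE are the values reported for the IRSAD later in this chapter; the correlation between NOYR12ORHIGHER and OCC_PROF is a made-up placeholder, and the function name is illustrative.

```python
def highly_correlated_pairs(names, corr, threshold=0.8):
    """List variable pairs whose correlation exceeds the threshold in absolute value."""
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) > threshold:
                flagged.append((names[i], names[j], corr[i][j]))
    return flagged

names = ["DEGREE", "NOYR12ORHIGHER", "OCC_PROF"]
corr = [
    [1.00, -0.83, 0.88],
    [-0.83, 1.00, -0.75],  # -0.75 is a placeholder, not a published value
    [0.88, -0.75, 1.00],
]

flagged = highly_correlated_pairs(names, corr)
```

The scan only produces candidates. Whether a variable is actually removed remains a judgement about conceptual overlap, as described above.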

6: Conducting the initial PCA

Using the correlation matrix, we conducted principal component analysis (PCA) to obtain the loading for each variable on the first principal component.

7: Removing low loading variables

We excluded variables with loadings less than 0.3 in absolute value, on the grounds that they were not strong indicators of relative advantage or disadvantage. This limit is an accepted level in the PCA literature and has been used in past releases of SEIFA. We removed variables one at a time, starting with the lowest loading variable.

8: Conducting PCA on the reduced list of variables

We conducted a PCA on the reduced variable list, and if any other variables loaded below 0.3, we repeated steps seven and eight.
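The loop formed by steps seven and eight can be sketched as below. The `compute_loadings` argument is a hypothetical callable that stands in for a PCA on the current variable list; the toy version used here returns fixed made-up loadings, whereas real loadings change each time a variable is dropped.

```python
def prune_low_loadings(variables, compute_loadings, cutoff=0.3):
    """Drop the lowest loading variable and repeat until all loadings reach the cut-off."""
    kept = list(variables)
    removed = []
    while len(kept) > 1:
        loadings = compute_loadings(kept)  # stands in for re-running the PCA
        weakest = min(kept, key=lambda name: abs(loadings[name]))
        if abs(loadings[weakest]) >= cutoff:
            break
        # Record the loading from the iteration in which the variable was removed.
        removed.append((weakest, loadings[weakest]))
        kept.remove(weakest)
    return kept, removed

# Made-up loadings for four placeholder variables.
TOY_LOADINGS = {"A": -0.80, "B": 0.60, "C": -0.25, "D": 0.05}

def toy_loadings(names):
    """Toy stand-in for PCA: returns the same loading regardless of the variable list."""
    return {name: TOY_LOADINGS[name] for name in names}

kept, removed = prune_low_loadings(["A", "B", "C", "D"], toy_loadings)
```

Removing one variable at a time matters because dropping a weak variable can lift or lower the loadings of the remaining ones, so a variable just under the cut-off in one iteration may clear it in the next.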

9: Finalising the list of variables in the index and applying second phase exclusion rules

After the final list of variables in the index was determined, we excluded any SA1s that had denominators less than our prescribed cut-off of six for any of the variables on the final variable list.

10: Calculating and standardising component/index scores

We derived the first principal component scores for each SA1 by taking the product of each standardised variable with its respective weight, then taking the sum across all variables. Note that the weight for each variable was calculated by dividing the loading by the square root of the eigenvalue.

$Z_{SA1} = \sum\limits_{j = 1}^{p} \frac{L_j}{\sqrt{\lambda}} \times X_{j,SA1}$

where,

$Z_{SA1}$ = raw score for the SA1

$X_{j,SA1}$ = standardised value of the j-th variable for the SA1

${{L_j}}$ = loading for the j-th variable

$\lambda$ = eigenvalue of the principal component

$p$ = total number of variables in the index

For convenience of presentation, we then rescaled the raw scores to a mean of 1,000 and standard deviation of 100 to create a new set of scores that are the SA1 index scores in SEIFA.
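The calculation in this step can be sketched as follows, using two made-up variables and four made-up SA1s. The function names are illustrative, and the use of the population standard deviation in the rescaling is an assumption.

```python
import math
import statistics

def raw_scores(standardised_rows, loadings, eigenvalue):
    """Raw score for each SA1: sum over variables of (loading / sqrt(eigenvalue)) * value."""
    weights = [loading / math.sqrt(eigenvalue) for loading in loadings]
    return [sum(w * x for w, x in zip(weights, row)) for row in standardised_rows]

def rescale(raw, target_mean=1000.0, target_sd=100.0):
    """Rescale raw scores to the target mean and standard deviation."""
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)  # assumption: population standard deviation
    return [target_mean + target_sd * (z - mean) / sd for z in raw]

# Made-up loadings: one disadvantage indicator (negative) and one advantage indicator (positive).
loadings = [-0.9, 0.9]
eigenvalue = 1.62  # equals the sum of the squared loadings

# Made-up standardised values, one row per SA1: (disadvantage indicator, advantage indicator).
rows = [(1.0, -1.0), (0.0, 0.0), (-1.0, 1.0), (0.5, -0.5)]

raw = raw_scores(rows, loadings, eigenvalue)
index_scores = rescale(raw)
```

The first SA1 is high on the disadvantage indicator and low on the advantage indicator, so it receives the lowest score. The third SA1 has the opposite profile and receives the highest.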

Note that the principal components are arbitrary with respect to their sign (positive or negative), so we set the sign of the weights and loadings so that they make intuitive sense. That is, we gave advantage indicators positive weights and loadings, and disadvantage indicators negative weights and loadings. Accordingly, high scores indicate relative advantage, and low scores indicate relative disadvantage. This is consistent with previous editions of SEIFA.

11: Creating higher geographic level indexes

We constructed indexes for geographies higher than the SA1 level using population weighted averages of the constituent SA1s. We used the following formula:

$INDEX_{AREA} = \frac{\sum\limits_{i = 1}^{n} \left( INDEX_{SA1_i} \times POP_{SA1_i} \right)}{POP_{AREA}}$

where,

$INDEX$ = Index score for each SA1 or higher level area

$POP$ = Population for each SA1 or higher level area

$n$ = Total number of SA1s (with index scores) in the higher level area

The higher level area population is the sum of the populations from the constituent SA1s that received an index score. Populations in excluded SA1s are not included in this calculation.

Although we constructed the higher level indexes from standardised SA1 level indexes, they were not standardised themselves. Therefore the higher level area indexes do not necessarily have a mean of 1,000 or standard deviation of 100. Only SA1s with index scores were used to create the higher level indexes. In a small number of cases, where a higher level area contains a number of SA1s that were excluded, its index score may not be a good representation of its entire population.

For this reason, the output spreadsheets provide the proportion of each higher level area population that was in excluded SA1s. In general, we encourage users conducting analysis at higher level areas to keep in mind that the indexes were constructed at the SA1 level, and to consider using the distribution of SA1s within the higher level areas, rather than just the one index score for each higher level area.
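The population weighted average and the excluded population share can be sketched together. The dictionary keys and the example figures are made up for illustration, with `None` marking an SA1 that did not receive an index score.

```python
def higher_level_index(sa1s):
    """Return (population weighted index score, share of population in excluded SA1s)."""
    scored = [s for s in sa1s if s["index"] is not None]
    scored_population = sum(s["population"] for s in scored)
    total_population = sum(s["population"] for s in sa1s)
    if scored_population == 0:
        score = None  # no constituent SA1 received a score
    else:
        weighted_sum = sum(s["index"] * s["population"] for s in scored)
        score = weighted_sum / scored_population
    if total_population == 0:
        excluded_share = None
    else:
        excluded_share = (total_population - scored_population) / total_population
    return score, excluded_share

# Made-up higher level area with three SA1s, one of which was excluded.
sa1s = [
    {"index": 900.0, "population": 300},
    {"index": 1100.0, "population": 100},
    {"index": None, "population": 100},
]

score, excluded_share = higher_level_index(sa1s)
```

In this example the area score is 950 and 20% of the population lives in the excluded SA1, which is the kind of case where the single score may not represent the whole area well.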

Technical details of each index: variables and loadings

This section gives the results of the principal component analysis carried out for each index, including variable loadings and percentage of variance explained. We also list the variables initially considered for inclusion but removed due to high correlations with other variables or weak loadings.

Index of Relative Socio-economic Disadvantage

The IRSD summarises variables that indicate relative disadvantage at the SA1 level, according to the concept described in ‘Defining the concept behind each of the four indexes’. The final variable list and corresponding loadings are shown below.

Final IRSD variables and loadings
Variable name	Variable description	Variable loading
INC_LOW	Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles)	-0.87
CHILDJOBLESS	Per cent of families with children under 15 years of age who live with jobless parents	-0.78
NOYR12ORHIGHER	Per cent of people aged 15 years and over whose highest level of education is Year 11 or lower. Includes Certificate I and II	-0.75
LOWRENT	Per cent of occupied private dwellings paying rent less than $250 per week (excluding $0 per week)	-0.71
UNEMPLOYED	Per cent of people (in the labour force) unemployed	-0.68
OCC_LABOUR	Per cent of employed people classified as 'labourers'	-0.68
DISABILITYU70	Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age	-0.63
ONEPARENT	Per cent of one parent families with dependent offspring only	-0.58
OVERCROWD	Per cent of occupied private dwellings requiring one or more extra bedrooms (based on the Canadian National Occupancy Standard)	-0.51
OCC_DRIVERS	Per cent of employed people classified as Machinery Operators and Drivers	-0.51
SEPDIVORCED	Per cent of people aged 15 and over who are separated or divorced	-0.51
NOEDU	Per cent of people aged 15 years and over who have no educational attainment	-0.47
OCC_SERVICE_L	Per cent of employed people classified as Low Skill Community and Personal Service Workers	-0.45
NOCAR	Per cent of occupied private dwellings with no cars	-0.43
ENGLISHPOOR	Per cent of people who do not speak English well	-0.35

The 2021 IRSD index explains 37% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 43% (2016 IRSD), 44% (2011 IRSD), 39% (2006 IRSD) and 33% (2001 IRSD).

Removal of highly correlated variables

Of the variables considered for the IRSD, there were no two variables that had a correlation coefficient greater than 0.8 in absolute value.

Removal of low loading variables

The following table shows the variables that were dropped from the IRSD because their loading was below our prescribed cutoff of 0.3 in absolute value. The variables are shown in the order they were removed, with the loadings from the iteration when they were removed.

IRSD variables removed due to low loadings
Variable name	Variable description	Variable loading
OCC_SALES_L	Per cent of employed people classified as Low-Skill Sales Workers	-0.27
CERTIFICATE	Per cent of people aged 15 years and over whose highest level of educational attainment is a certificate III or IV qualification	-0.21
FEWBED	Per cent of occupied private dwellings with one or no bedrooms	-0.01

Index of Relative Socio-Economic Advantage and Disadvantage

The IRSAD summarises variables that indicate either relative socio-economic advantage or disadvantage, according to the concept described in ‘Defining the concept behind each of the four indexes’. The final variable list and corresponding loadings are shown below.

Final IRSAD variables and loadings
Variable name	Variable description	Variable loading
NOYR12ORHIGHER	Per cent of people aged 15 years and over whose highest level of education is Year 11 or lower. Includes Certificate I and II	-0.85
INC_LOW	Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles)	-0.83
OCC_LABOUR	Per cent of employed people classified as 'labourers'	-0.75
DISABILITYU70	Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age	-0.67
CHILDJOBLESS	Per cent of families with children under 15 years of age who live with jobless parents	-0.65
OCC_DRIVERS	Per cent of employed people classified as Machinery Operators and Drivers	-0.61
LOWRENT	Per cent of occupied private dwellings paying rent less than $250 per week (excluding $0 per week)	-0.58
SEPDIVORCED	Per cent of people aged 15 and over who are separated or divorced	-0.58
ONEPARENT	Per cent of one parent families with dependent offspring only	-0.55
UNEMPLOYED	Per cent of people (in the labour force) unemployed	-0.54
OCC_SERVICE_L	Per cent of employed people classified as Low Skill Community and Personal Service Workers	-0.49
CERTIFICATE	Per cent of people aged 15 years and over whose highest level of educational attainment is a certificate III or IV qualification	-0.45
OVERCROWD	Per cent of occupied private dwellings requiring one or more extra bedrooms (based on the Canadian National Occupancy Standard)	-0.32
NOEDU	Per cent of people aged 15 years and over who have no educational attainment	-0.32
OCC_SALES_L	Per cent of employed people classified as Low Skill Sales Workers	-0.32
ATUNI	Per cent of people aged 15 years and over at university or other tertiary institution	0.35
HIGHBED	Per cent of occupied private dwellings with four or more bedrooms	0.35
DIPLOMA	Per cent of people aged 15 years and over whose highest level of education attainment is a diploma qualification	0.38
HIGHRENT	Per cent of occupied private dwellings paying rent greater than $500 per week	0.51
OCC_MANAGER	Per cent of employed people classified as Managers	0.52
HIGHMORTGAGE	Per cent of occupied private dwellings paying mortgage greater than $3,000 per month	0.69
OCC_PROF	Per cent of employed people classified as Professionals	0.74
INC_HIGH	Per cent of people living in households with stated annual household equivalised income greater than $91,000 (approx. 9th and 10th deciles)	0.85

The 2021 IRSAD index explains 34% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 38% (2016 IRSAD), 39% (2011 IRSAD), 44% (2006 IRSAD) and 41% (2001 IRSAD).

Removal of highly correlated variables

The variable DEGREE had high correlations with NOYR12ORHIGHER (–0.83) and OCC_PROF (0.88). This suggested that the proportion of people in an area with a degree was explained by other variables in the index. Therefore DEGREE was dropped.

Removal of low loading variables

The table below shows the variables dropped from the IRSAD because of low loadings. The variables are shown in the order they were removed, with the loadings from the iteration when they were removed.

IRSAD variables removed due to low loadings
Variable name	Variable description	Variable loading
NOCAR	Per cent of occupied private dwellings with no cars	0.24
SPAREBED	Per cent of occupied private dwellings with one or more bedrooms spare	0.20
ENGLISHPOOR	Per cent of people who do not speak English well	-0.21
HIGHCAR	Per cent of occupied private dwellings with three or more cars	0.20
OWNING	Per cent of occupied private dwellings owning dwelling without a mortgage	0.19
FEWBED	Per cent of occupied private dwellings with one or no bedrooms	-0.01

Index of Economic Resources

The IER focuses on the financial aspects of relative socio-economic advantage and disadvantage, according to the concept described in ‘Defining the concept behind each of the four indexes’. The final variable list and corresponding loadings are shown below.

Final IER variables and loadings
Variable name	Variable description	Variable loading
INC_LOW	Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles)	-0.73
LOWRENT	Per cent of occupied private dwellings paying rent less than $250 per week (excluding $0 per week)	-0.71
NOCAR	Per cent of occupied private dwellings with no cars	-0.70
LONE	Per cent of occupied private dwellings that are lone person occupied private dwellings	-0.68
ONEPARENT	Per cent of one parent families with dependent offspring only	-0.54
OVERCROWD	Per cent of occupied private dwellings requiring one or more extra bedrooms (based on the Canadian National Occupancy Standard)	-0.51
UNEMPLOYED_IER	Per cent of people aged 15 years and over who are unemployed	-0.48
GROUP	Per cent of occupied private dwellings that are group occupied private dwellings	-0.39
OWNING	Per cent of occupied private dwellings owning dwelling without a mortgage	0.34
UNINCORP	Per cent of dwellings with at least one person who is an owner of an unincorporated enterprise	0.47
INC_HIGH	Per cent of people with stated annual household equivalised income greater than $91,000 (approx. 9th and 10th deciles)	0.52
HIGHMORTGAGE	Per cent of occupied private dwellings paying mortgage greater than $3,000 per month	0.64
MORTGAGE	Per cent of occupied private dwellings owning dwelling (with a mortgage)	0.66
HIGHBED	Per cent of occupied private dwellings with four or more bedrooms	0.75

The 2021 IER index explains 35% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 38% (2016 IER), 39% (2011 IER) and 35% (2006 IER).

Removal of highly correlated variables

No variables were dropped based on high correlations.

Removal of low loading variables

The table below shows the variable dropped from the IER because of a low loading.

IER variables removed due to low loadings
Variable name	Variable description	Variable loading
HIGHRENT	Per cent of occupied private dwellings paying rent greater than $500 per week	0.07

Index of Education and Occupation

The IEO summarises variables related to educational qualifications and vocational skills, according to the concept described in ‘Defining the concept behind each of the four indexes’. The final variable list and corresponding loadings are shown below.

Final IEO variables and loadings
Variable name	Variable description	Variable loading
NOYR12ORHIGHER	Per cent of people aged 15 years and over whose highest level of education is Year 11 or lower. Includes Certificate I and II	-0.87
OCC_SKILL5	Per cent of employed people who work in a Skill Level 5 occupation	-0.76
OCC_SKILL4	Per cent of employed people who work in a Skill Level 4 occupation	-0.75
CERTIFICATE	Per cent of people aged 15 years and over whose highest level of educational attainment is a certificate III or IV qualification	-0.65
UNEMPLOYED	Per cent of people (in the labour force) unemployed	-0.41
DIPLOMA	Per cent of people aged 15 years and over whose highest level of education attainment is a diploma qualification	0.37
ATUNI	Per cent of people aged 15 years and over at university or other tertiary institution	0.48
OCC_SKILL1	Per cent of employed people who work in a Skill Level 1 occupation	0.90

The 2021 IEO index explains 46% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 41% (2016 IEO), 47% (2011 IEO), 52% (2006 IEO) and 46% (2001 IEO).

Removal of highly correlated variables

DEGREE (% People aged 15 years and over with a degree or higher qualification) had high correlations with NOYR12ORHIGHER (–0.83) and OCC_SKILL1 (0.82). It was decided that the proportion of people with a degree was already well explained by the index, and DEGREE was removed.

Removal of low loading variables

The table below shows the variables dropped from the IEO because of low loadings. The variables are shown in the order they were removed, with the loadings from the iteration when they were removed.

IEO variables removed due to low loadings
Variable name	Variable description	Variable loading
NOEDU	Per cent of people aged 15 years and over who have no educational attainment	0.29
OCC_SKILL2	Per cent of employed people who work in a skill level 2 occupation	0.27
ATSCHOOL	Per cent of people aged 15 years and over who are still attending secondary school	0.05

Summary of variables included in indexes

The table below shows the final set of variables included in each index.

List of variables in each index, by socio-economic dimension
Dimension	Index of Relative Socio-Economic Disadvantage	Index of Relative Socio-Economic Advantage and Disadvantage	Index of Economic Resources	Index of Education and Occupation
Income	INC_LOW	INC_HIGH INC_LOW	INC_HIGH INC_LOW
Education	NOYR12ORHIGHER NOEDU	NOYR12ORHIGHER NOEDU CERTIFICATE ATUNI DIPLOMA		NOYR12ORHIGHER CERTIFICATE ATUNI DIPLOMA
Employment	UNEMPLOYED	UNEMPLOYED	UNEMPLOYED_IER	UNEMPLOYED
Occupation	OCC_LABOUR OCC_DRIVERS OCC_SERVICE_L	OCC_LABOUR OCC_DRIVERS OCC_SERVICE_L OCC_SALES_L OCC_MANAGER OCC_PROF		OCC_SKILL1 OCC_SKILL4 OCC_SKILL5
Housing	LOWRENT OVERCROWD	LOWRENT OVERCROWD HIGHRENT HIGHBED HIGHMORTGAGE	LOWRENT OVERCROWD OWNING MORTGAGE HIGHBED HIGHMORTGAGE
Other	CHILDJOBLESS ONEPARENT DISABILITYU70 ENGLISHPOOR NOCAR SEPDIVORCED	CHILDJOBLESS ONEPARENT DISABILITYU70 SEPDIVORCED	UNINCORP ONEPARENT LONE GROUP NOCAR

Distribution of the indexes

This section presents frequency histograms for each index at the SA1 level. The index distributions have generally similar shapes to those from SEIFA 2016.

Index of Relative Socio-Economic Disadvantage

The IRSD distribution shown below has a very long left tail. The values range from about 143 to 1207. This index contains only disadvantage indicators, so there is more scope to distinguish between disadvantaged areas than advantaged areas.

The steep peak for this distribution means that there will be little difference in the scores of SA1s in the middle deciles, and so the characteristics related to the IRSD variables may not vary much across SA1s in these middle deciles.

IRSD score distribution
IRSD score group (midpoint)	Number of SA1s
25	0
75	0
125	2
175	0
225	4
275	14
325	16
375	22
425	42
475	65
525	98
575	105
625	160
675	286
725	544
775	1,100
825	2,081
875	3,633
925	6,163
975	9,666
1025	14,075
1075	14,917
1125	6,105
1175	180
1225	2
1275	0
1325	0
1375	0
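The score groups in the table are 50 points wide and are labelled by their midpoints. A minimal sketch of assigning a score to its group is given below; it assumes groups of the form 0 to under 50, 50 to under 100 and so on, which is inferred from the midpoints rather than stated in the text.

```python
def score_group_midpoint(score, width=50):
    """Midpoint of the score group containing the score (groups assumed to start at 0)."""
    return int(score // width) * width + width // 2

# The approximate minimum and maximum IRSD scores quoted above.
lowest_group = score_group_midpoint(143)
highest_group = score_group_midpoint(1207)
```

Under this assumption the quoted minimum of about 143 falls in the group with midpoint 125 and the quoted maximum of 1207 falls in the group with midpoint 1225, which matches the first and last non-empty rows of the table.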

Index of Relative Socio-Economic Advantage and Disadvantage

The scores for IRSAD range from 435 to 1273. The right-hand slope is not as steep in the IRSAD distribution as it is in the IRSD distribution. This means that the IRSAD scores of SA1s in the upper deciles are more spread out than the IRSD scores in these deciles, and this index has a greater ability to differentiate between the more advantaged areas.

IRSAD score distribution
IRSAD score group (midpoint)	Number of SA1s
25	0
75	0
125	0
175	0
225	0
275	0
325	0
375	0
425	1
475	7
525	14
575	53
625	101
675	189
725	410
775	932
825	2,464
875	5,271
925	8,002
975	10,769
1025	11,590
1075	9,715
1125	6,512
1175	2,989
1225	260
1275	1
1325	0
1375	0

Index of Economic Resources

The scores for IER range from 299 to 1315.

IER score distribution
IER score group (midpoint)	Number of SA1s
25	0
75	0
125	0
175	0
225	0
275	1
325	4
375	7
425	21
475	39
525	76
575	88
625	108
675	189
725	356
775	955
825	2,136
875	4,595
925	8,064
975	10,887
1025	12,086
1075	10,922
1125	6,287
1175	2,208
1225	257
1275	15
1325	2
1375	0

Index of Education and Occupation

The scores for IEO range from 407 to 1372.

IEO score distribution
IEO score group (midpoint)	Number of SA1s
25	0
75	0
125	0
175	0
225	0
275	0
325	0
375	0
425	1
475	0
525	1
575	3
625	13
675	68
725	216
775	849
825	2,736
875	5,989
925	9,398
975	10,750
1025	10,257
1075	8,343
1125	6,401
1175	3,784
1225	593
1275	7
1325	0
1375	1

Basic output: scores, ranks, deciles and percentiles

Scores

The scores are a weighted combination of the selected indicators of advantage and disadvantage, standardised to a distribution with a mean of 1,000 and a standard deviation of 100. An area with all of its indicators equal to the national average will receive a score of 1,000. The score for an area will increase if the area has an indicator of advantage that is greater than the national average, or an indicator of disadvantage that is less than the national average. Conversely, the score for an area will decrease if the area has an indicator of disadvantage that is greater than the national average, or an indicator of advantage that is less than the national average. Indicators that are further away from the national average have a larger impact on the score.

For areas larger than SA1, the scores are a population weighted average of constituent SA1 scores, as described in Step 11 of the step by step process.
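A population-weighted average of this kind can be sketched as follows. The function name, the example scores and the populations are hypothetical, and the sketch assumes every constituent SA1 has both a score and a population; how SA1s without a score are handled is not covered here.

```python
def population_weighted_score(sa1_scores, sa1_populations):
    """Average SA1 scores, weighting each SA1 by its population."""
    if len(sa1_scores) != len(sa1_populations):
        raise ValueError("scores and populations must have the same length")
    total_population = sum(sa1_populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(s * p for s, p in zip(sa1_scores, sa1_populations))
    return weighted_sum / total_population


# Two made-up SA1s within one larger area: the more populous SA1
# pulls the area score towards its own score.
larger_area_score = population_weighted_score([950.0, 1050.0], [300, 100])
```

In this example the larger area receives a score of 975, closer to the score of the SA1 that holds three quarters of the population.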

It is important to remember that the scores are an ordinal measure (discussed in more detail in broad guidelines on appropriate use), so care should be taken when comparing scores. For example, an area with a score of 500 is not twice as disadvantaged as an area with a score of 1000; it simply has more markers of relative disadvantage.

Ranks, Deciles and Percentiles

Because the scores are an ordinal measure, it is often more appropriate to use alternative measures rather than the raw score. We have calculated ranks, deciles and percentiles and included these in the output spreadsheets. These measures are defined below.

Rank

The areas are ranked in order of their score, from lowest to highest, with rank one representing the most disadvantaged area. In the spreadsheets, rankings are provided on both a national basis and a state/territory basis. The same set of scores is used for each ranking; the scores are not recalculated for each state/territory.
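The sketch below illustrates ranking the same set of scores nationally and within each state/territory. The area identifiers, state codes and scores are made up, the function name is hypothetical, and ties are broken by sort order, which is an assumption of this sketch rather than a documented rule.

```python
from collections import defaultdict


def rank_by_score(scores):
    """Map each area to its rank, with rank 1 for the lowest score."""
    ordered = sorted(scores, key=lambda area: scores[area])
    return {area: position for position, area in enumerate(ordered, start=1)}


# Made-up areas: identifier -> (state/territory, score).
areas = {
    "A": ("NSW", 950.0),
    "B": ("VIC", 1020.0),
    "C": ("NSW", 1100.0),
    "D": ("VIC", 870.0),
}

# National ranking uses all areas together.
national_rank = rank_by_score({a: score for a, (_, score) in areas.items()})

# State rankings reuse the same scores, grouped by state/territory.
scores_by_state = defaultdict(dict)
for area, (state, score) in areas.items():
    scores_by_state[state][area] = score

state_rank = {}
for state_scores in scores_by_state.values():
    state_rank.update(rank_by_score(state_scores))
```

Area "A" is ranked second nationally but first within its state, even though its score is identical in both rankings.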

Deciles

All areas are ordered from lowest to highest score. The lowest 10% of areas are given a decile number of one, the next lowest 10% of areas are given a decile number of two, and so on, up to the highest 10% of areas, which are given a decile number of 10. This means that areas are divided into ten equal-sized groups, depending on their score.
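One way to assign areas to equal-sized groups is sketched below. The function name and example scores are hypothetical. The treatment of tied scores, and of area counts that are not an exact multiple of the number of groups, is an assumption of this sketch.

```python
def quantile_groups(scores, n_groups=10):
    """Return a group number (1..n_groups) for each score, 1 = lowest scores."""
    n_areas = len(scores)
    # Indices of the areas, ordered from lowest to highest score.
    order = sorted(range(n_areas), key=lambda i: scores[i])
    groups = [0] * n_areas
    for position, index in enumerate(order):
        # Integer arithmetic splits the ordered areas into n_groups blocks.
        groups[index] = position * n_groups // n_areas + 1
    return groups


# Twenty made-up scores, listed from highest to lowest.
scores = [1190 - 10 * k for k in range(20)]
deciles = quantile_groups(scores, n_groups=10)
```

With twenty areas, each decile contains exactly two areas; the two highest-scoring areas are in decile 10 and the two lowest in decile 1. Calling the same function with `n_groups=100` gives percentile-style groups.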

Percentiles

All areas are ordered from lowest to highest score. The lowest 1% of areas are given a percentile number of one, the next lowest 1% of areas are given a percentile number of two, and so on, up to the highest 1% of areas, which are given a percentile number of 100. This means that areas are divided into one hundred equal-sized groups, depending on their score. Deciles and percentiles are sometimes referred to generally as quantiles. Other commonly used quantiles include quintiles and quartiles. These are not included in the output spreadsheets, but they can easily be derived from the percentiles.
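As a sketch of how quintiles and quartiles might be derived from a published percentile, the function below collapses percentiles 1 to 100 into a smaller number of equal-width groups. The function name is hypothetical, and the mapping assumes the number of groups divides 100 exactly.

```python
def percentile_to_group(percentile, n_groups):
    """Collapse a percentile (1..100) into one of n_groups equal-width groups."""
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    if 100 % n_groups != 0:
        raise ValueError("n_groups must divide 100 exactly")
    width = 100 // n_groups  # number of percentiles per group
    return (percentile - 1) // width + 1


quintile = percentile_to_group(37, 5)   # percentiles 21-40 form quintile 2
quartile = percentile_to_group(37, 4)   # percentiles 26-50 form quartile 2
```

Groups derived this way may differ slightly at the boundaries from groups computed directly from the ordered scores, depending on how ties and uneven group sizes are handled.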

Geographic output levels for SEIFA 2021

The primary unit of analysis and the smallest area for which the indexes are available is the Statistical Area Level 1 (SA1). This is the recommended unit of analysis for SEIFA 2021.

For a selection of geographic areas larger than SA1, scores have been calculated by taking population-weighted averages of constituent SA1 scores. The output spreadsheets also contain some information about the distribution of SA1 index scores within larger areas. This enables users to consider the socio-economic diversity that can exist within a larger area.

The table below summarises the output available at the different geographic levels.

Geographic output summary for SEIFA 2021
Geographic unit	Index score	SA1 distribution information
Statistical Area level 1 (SA1)	Yes	N/A
Statistical Area level 2 (SA2)	Yes	Yes
Statistical Area level 3 (SA3)	No	Yes
Statistical Area level 4 (SA4)	No	Yes
Local Government Area (LGA)	Yes	Yes
Suburbs and Localities (SAL)	Yes	Yes
Postal Area (POA)	Yes	Yes
Commonwealth Electoral Division (CED)	No	Yes
State Electoral Division (SED)	No	Yes

For the geographies larger than SA1, and not in the ASGS (LGAs, SALs and POAs), a best fit correspondence of SA1s to the larger geographies was used. Local Government Areas (LGAs), Suburbs and Localities (SALs) and Postal Areas (POAs) are constructed from Mesh Blocks in the 2021 version of the ASGS. In some cases, particularly for certain SALs with small populations, the SA1 boundaries do not correspond closely to the higher level area. For this reason, SEIFA scores for SALs and POAs with small populations should be used with caution, as the scores may have been calculated from populations that do not correspond closely with the actual population in the area. Refer to ABS Maps for information useful for identifying areas that do not correspond closely to the SA1 structure.

The output spreadsheets contain specific references to the ABS publications from which the geography classifications and correspondences have been sourced.
