
Socio-Economic Indexes for Areas (SEIFA): Technical Paper

Ranks areas according to relative socio-economic advantage and disadvantage based on Census data.

Reference period

2021

Released

27/04/2023

What is SEIFA?

Socio-Economic Indexes for Areas (SEIFA) is a product developed by the ABS that ranks areas in Australia according to relative socio-economic advantage and disadvantage. The indexes are based on information from the five-yearly Census. SEIFA 2021 is based on Census 2021 data, and consists of four indexes, each focusing on a different aspect of socio-economic advantage and disadvantage, summarising a different subset of Census variables.

Some common uses of SEIFA include:

determining areas that require funding and services,
identifying new business opportunities, and
assisting research into the relationship between socio-economic disadvantage and various social outcomes.

Purpose of technical paper

This paper provides information on the concepts, data, and methods used to create SEIFA 2021. The paper also contains discussion of the correct interpretation and appropriate use of the indexes.

This paper is intended to be a comprehensive reference for SEIFA 2021. Refer to Methodology for basic information that has been prepared for a general audience.

Historic context

A relative measure of socio-economic disadvantage was first produced by the ABS following the 1971 Census. Socio-Economic Indexes for Areas (SEIFA), in its present form, was first produced from the 1986 Census data.

Features of SEIFA 2021

This section highlights some important features of SEIFA 2021, and how they compare with SEIFA 2016.

SEIFA 2021 consists of the same four indexes as produced for SEIFA 2001, 2006, 2011 and 2016, each referring to the general population:

the Index of Relative Socio-economic Disadvantage (IRSD),
the Index of Relative Socio-economic Advantage and Disadvantage (IRSAD),
the Index of Economic Resources (IER), and
the Index of Education and Occupation (IEO).

We have generally aimed to maintain consistency between SEIFA 2021 and the previous release. However, some changes have been made and are described below.

Updated geography standard

SEIFA 2021 uses the Australian Statistical Geography Standard (ASGS) Edition 3 (2021). The structure of the ASGS Edition 3 is similar to the structure of ASGS Edition 2 (2016), though there have been updates to SA1 boundaries in some areas. In this version of the ASGS, State Suburbs (SSCs) are now referred to as Suburbs and Localities (SALs). SALs and Postal Areas (POAs) are constructed from Mesh Blocks rather than SA1s. For more information about the ASGS, refer to Changes from the previous edition of the ASGS.

Variables underpinning the indexes

Some variables were updated in line with new classification standards. For example, SEIFA 2016 used the Australian and New Zealand Standard Classification of Occupations (ANZSCO), 2013, Version 1.2A. SEIFA 2021 used the updated ANZSCO Version 1.3, which resulted in some changes to skill levels and some occupation title changes. Variables using cut-off values in their definitions, such as high and low income, were updated to use new cut-off values. For more information about how the cut-off values were selected, refer to the description of candidate SEIFA variables. Census 2021 did not collect information about dwelling internet connection, so the NONET variable from SEIFA 2016 could not be considered for inclusion in SEIFA 2021.

Output

SEIFA output includes a general introduction to SEIFA 2021, a basic Methodology, this Technical Paper and data which can be sourced from:

Data cubes for a range of geographies
TableBuilder data
DataExplorer data (available after 11:30 on 27 April 2023)
Interactive maps (available on 9 May 2023).

Interpretation of the indexes

To set some context for the rest of this paper, it is worth briefly touching on some important characteristics of the indexes.

The indexes are assigned to areas, not to individuals. They indicate the collective socio-economic characteristics of the people living in an area.

As measures of socio-economic conditions, the indexes are best interpreted as ordinal measures that rank areas. The index scores are based on an arbitrary numerical scale and do not represent a quantity of advantage or disadvantage.

For ease of interpretation, we generally recommend using the index rankings and quantiles (e.g. deciles) for analysis, rather than using the index scores. However, index scores are still provided in the output and can be used for more sophisticated analyses.
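The sketch below shows one way to turn index scores into the rankings and deciles recommended above. It is a minimal sketch with several assumptions. Rank 1 goes to the lowest score. Deciles are ten equal-count groups of areas drawn from whatever comparison set is passed in. Ties simply keep their input order. The function name `rank_and_decile` is hypothetical, and the sketch does not reproduce the ABS's own ranking and decile rules.

```python
def rank_and_decile(scores: dict[str, float]) -> dict[str, tuple[int, int]]:
    """Map each area to (rank, decile).

    Rank 1 is the lowest score. Deciles are ten equal-count groups of the
    ranking, with decile 1 holding the lowest ranks. Ties keep their input
    order because sorted() is stable; this tie rule is an assumption.
    """
    ordered = sorted(scores, key=lambda area: scores[area])
    n = len(ordered)
    result: dict[str, tuple[int, int]] = {}
    for position, area in enumerate(ordered, start=1):
        # Integer ceiling of position * 10 / n avoids floating-point edge cases.
        decile = (position * 10 + n - 1) // n
        result[area] = (position, decile)
    return result


# Usage: twenty hypothetical areas with made-up scores 900..919.
example = rank_and_decile({f"A{i}": 900.0 + i for i in range(20)})
print(example["A0"], example["A19"])  # (1, 1) (20, 10)
```

Working from ranks rather than raw scores keeps the analysis ordinal. That matches the point above that score differences do not measure a quantity of advantage or disadvantage.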

Each index is constructed based on a weighted combination of selected variables. The indexes are dependent on the set of variables chosen for the analysis. A different set of underlying variables would result in a different index.

The indexes are primarily designed to compare the relative socio-economic characteristics of areas at a given point in time. It can be very difficult to perform useful longitudinal or time series analysis, and this sort of analysis should be undertaken with care.

There is more discussion of these points in Using and Interpreting SEIFA.

Conceptual framework

The concept of relative socio-economic advantage and disadvantage

For SEIFA 2021, the concept of relative socio-economic advantage and disadvantage is the same as that used for SEIFA 2006, 2011 and 2016. That is, the ABS broadly defines relative socio-economic advantage and disadvantage in terms of people's access to material and social resources, and their ability to participate in society. This is described as ‘broadly defined’ in recognition of the many concepts that have emerged in the literature to describe advantage and disadvantage. The dimensions included in SEIFA are guided by international research, given the constraints of Census data. The Census does collect information on the key dimensions of income, education, employment, occupation, housing, and other miscellaneous indicators of advantage and disadvantage. Variables have been selected from these dimensions and are discussed further in the description of candidate SEIFA variables.

Another point to note is that SEIFA measures relative advantage and disadvantage at an area level, not at an individual level. Area level and individual level disadvantage are separate though related concepts. Area level disadvantage depends on the socio-economic conditions of a community or neighbourhood as a whole. These are primarily the collective characteristics of the area’s residents, but may also be characteristics of the area itself, such as a lack of public resources, transport infrastructure or high levels of pollution. However, it is important to remember that SEIFA is restricted to the information that is included in the Census.

It is recommended that SEIFA users consider their research interests, the definition of each SEIFA index and the variables included in each index to determine the appropriate index to use. The ABS produces four indexes, each summarising a different subset of Census variables, because users may be interested in different aspects of socio-economic advantage and disadvantage. The next section, Defining the concept behind each of the four indexes, provides more information on the indexes included in SEIFA.

Defining the concept behind each of the four indexes

This section gives a description of the concept behind each of the four indexes. For a list of the variables included in each index, refer to the technical details for each index: variables and loadings.

The Index of Relative Socio-Economic Disadvantage

The IRSD summarises variables that indicate relative disadvantage. This index ranks areas on a continuum from most disadvantaged to least disadvantaged. A low score on this index indicates a high proportion of relatively disadvantaged people in an area. We cannot conclude that an area with a very high score has a large proportion of relatively advantaged people, as there are no variables in the index to indicate this. We can only conclude that such an area has a relatively low incidence of disadvantage.

The Index of Relative Socio-Economic Advantage and Disadvantage

The IRSAD summarises variables that indicate either relative advantage or disadvantage. This index ranks areas on a continuum from most disadvantaged to most advantaged.

An area with a high score on this index has a relatively high incidence of advantage and a relatively low incidence of disadvantage. Due to the differences in scope between this index and the IRSD, the scores of some areas can vary substantially between the two indexes. For example, consider a large area that has parts containing relatively disadvantaged people, and other parts containing relatively advantaged people. This area may have a low IRSD ranking, due to its pockets of disadvantage. However, its IRSAD ranking may be moderate, or even above average, because the pockets of advantage may offset the pockets of disadvantage.

The Index of Economic Resources

The IER summarises variables relating to the financial aspects of relative socio-economic advantage and disadvantage. These include indicators of high and low income, as well as variables that correlate with high or low wealth. Areas with higher scores have relatively greater access to economic resources than areas with lower scores.

The Index of Education and Occupation

The IEO summarises variables relating to the educational and occupational aspects of relative socio-economic advantage and disadvantage. This index focuses on the skills of the people in an area, both formal qualifications and the skills required to perform different occupations. A low score indicates that an area has a high proportion of people without qualifications, without jobs, and/or with low-skilled jobs. A high score indicates that an area has many people with high qualifications and/or highly skilled jobs.

The data underpinning the indexes

This chapter looks at the data used to construct the four indexes in SEIFA 2021. All data is from the 2021 Census of Population and Housing.

The candidate list of variables

The candidate variable list from SEIFA 2016 was used for SEIFA 2021 with one exception: the dwelling internet connection variable was not included in Census 2021, and therefore was not available for inclusion in SEIFA 2021. The candidate variables fall into a multi-dimensional framework. The dimensions are:

income
education
employment
occupation
housing
miscellaneous.

Variables typically relate to persons but can also relate to families or dwellings.

Constructing the variables

Specifications

The variables were expressed as the proportion of units in an area with a specific characteristic. Depending on the variable, the unit may be a person, family, or dwelling. As each variable was expressed as a proportion, a numerator and denominator were required. The numerator for each variable was a subset of the denominator. In most cases, the numerator and denominator specifications were based on SEIFA 2016 specifications. Some minor changes were made to reflect updates to the Census 2021 variable coding. The Appendix contains detailed descriptions of the numerators and denominators used for all the SEIFA variables. Note that for convenience of presentation in the following sections, the variable proportions are expressed as percentages.

Place of Usual Residence

A person may or may not be enumerated at their place of usual residence on Census Night. Where possible for SEIFA 2021, a person's usual residence was used as the basis of analysis. Counts compiled on a ‘place of usual residence’ basis are appropriate for SEIFA, because they are less likely to be influenced by seasonal factors such as school holidays and snow seasons. However, it is important to understand that certain areas, for example SA1s in popular tourist destinations, may receive scores influenced by the specific time at which the Census is conducted. For instance, the 2021 Census was conducted in August 2021, which is during the high season for ski resorts and the townships in those areas. This means that these areas may have higher property rental prices, higher employment figures and greater income levels than if the Census were conducted in the low season.

Not stated and not applicable

We excluded records with ‘Not stated’ and ‘Not applicable’ values (for the particular variable) from both the numerator and denominator counts. Overseas visitors were excluded implicitly by using usual residence summation, and explicitly in the few instances where this was not possible. For details, see the Appendix.

The numerator and denominator values were calculated from confidentialised Census counts, with the confidentialisation process being the same as that used for the TableBuilder product and other Census releases. Where necessary, the derived proportions were adjusted so that none of them were less than zero or greater than one.
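The sketch below shows how the numerator and denominator rules and the final adjustment might fit together for one variable in one area. It makes several assumptions. The counts are taken to already exclude 'Not stated' and 'Not applicable' records and overseas visitors. Confidentialisation is assumed to be why a numerator can occasionally exceed its denominator. A zero denominator is returned as `None`, because areas with low denominators are dealt with separately by the exclusion rules. The names `VariableCounts` and `seifa_proportion` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableCounts:
    """Counts for one SEIFA variable in one area (hypothetical container).

    Both counts are assumed to already exclude 'Not stated' and
    'Not applicable' records and overseas visitors. The numerator
    population is a subset of the denominator population.
    """
    numerator: int
    denominator: int


def seifa_proportion(counts: VariableCounts) -> Optional[float]:
    """Return numerator / denominator, forced into the range [0, 1].

    Confidentialised counts may be mutually inconsistent, for example a
    numerator slightly above its denominator; clamping keeps the result a
    valid proportion. Returns None when there is no denominator.
    """
    if counts.denominator <= 0:
        return None
    raw = counts.numerator / counts.denominator
    return min(max(raw, 0.0), 1.0)


print(seifa_proportion(VariableCounts(12, 40)))  # 0.3
print(seifa_proportion(VariableCounts(43, 40)))  # 1.0 after clamping
```

Multiplying the result by 100 gives the percentage form used in the variable descriptions.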

Description of candidate SEIFA variables

This section contains a description of each variable on the candidate variable list. There is a brief discussion of how each variable relates to our definition of relative socio-economic advantage or disadvantage. The tables containing the variable descriptions also state whether the variable is an indicator of relative advantage (adv) or relative disadvantage (dis). Each subsection corresponds to one of the socio-economic dimensions listed in the candidate list of variables.

Income variables

List of income variables
Variable mnemonic	Variable description
INC_LOW	Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles) (dis)
INC_HIGH	Per cent of people living in households with stated annual household equivalised income greater than or equal to $91,000 (approx. 9th and 10th deciles) (adv)

Income is an important economic resource and is a core component of our notion of relative socio-economic advantage or disadvantage. Income variables are used in all the SEIFA indexes except the Index of Education and Occupation. The income variables are constructed using equivalised household income. Equivalisation is a process in which household income is adjusted by an ‘equivalence scale’, based on the number of adults and children in the household. The SEIFA variables using equivalised household income are calculated from the Census 2021 Equivalised Total Household Income variable (HIED).

The low income variable has been defined for SEIFA 2021 to capture approximately the first and second deciles of the equivalised household income distribution, excluding negative and nil income. That is, those people living in dwellings with equivalised household income between $1 and $499 per week ($1 to $25,999 per year). While the first quintile of equivalised household income was a strong indicator of disadvantage, people reporting negative and nil incomes tended to have profiles with less association with disadvantage. The cut-off of $91,000 for the high income variable was chosen to approximately capture the highest income quintile (top 20%).
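The sketch below illustrates how INC_LOW and INC_HIGH relate to equivalised household income. This section does not state the equivalence scale, so the sketch assumes the modified OECD scale: 1.0 for the first person aged 15 and over, 0.5 for each other person aged 15 and over, and 0.3 for each child under 15. Check this assumption against the Census dictionary entry for HIED. The sketch also makes three simplifications:

- it works on continuous annual incomes rather than Census income ranges;
- it keeps households reporting nil or negative income in the denominator but outside both numerators;
- it drops households whose income is not stated.

All function names are hypothetical.

```python
from typing import Iterable, Optional

LOW_MIN, LOW_MAX = 1, 25_999  # INC_LOW range, annual equivalised income ($)
HIGH_MIN = 91_000             # INC_HIGH cut-off, annual equivalised income ($)


def equivalence_scale(persons_15_plus: int, children_under_15: int) -> float:
    """Modified OECD scale (assumed): 1.0 for the first person aged 15+,
    0.5 for each other person aged 15+, 0.3 for each child under 15."""
    if persons_15_plus < 1:
        raise ValueError("household needs at least one person aged 15 or over")
    return 1.0 + 0.5 * (persons_15_plus - 1) + 0.3 * children_under_15


def income_variables(
    households: Iterable[tuple[Optional[float], int, int]],
) -> dict[str, float]:
    """households: (annual total household income, or None if not stated;
    persons aged 15+; children under 15). Returns INC_LOW and INC_HIGH as
    percentages of people in households with stated income."""
    people = low = high = 0
    for income, adults, children in households:
        if income is None:  # not stated: excluded from numerator and denominator
            continue
        size = adults + children
        people += size
        equivalised = income / equivalence_scale(adults, children)
        # Nil or negative income falls below LOW_MIN, so it is counted
        # in the denominator only.
        if LOW_MIN <= equivalised < LOW_MAX + 1:
            low += size
        elif equivalised >= HIGH_MIN:
            high += size
    return {"INC_LOW": 100 * low / people, "INC_HIGH": 100 * high / people}


area = [(40_000, 2, 2), (200_000, 2, 0), (60_000, 1, 0), (0, 1, 0), (None, 3, 0)]
print(income_variables(area))  # {'INC_LOW': 50.0, 'INC_HIGH': 25.0}
```

Because each household is weighted by its size, both variables are shares of people, matching the "per cent of people living in households" wording of the variable descriptions.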

Education variables

List of education variables
Variable mnemonic	Variable description
ATUNI	Per cent of people aged 15 years and over attending university or other tertiary institution (adv)
ATSCHOOL	Per cent of people aged 15 years and over attending secondary school (adv)
CERTIFICATE	Per cent of people aged 15 years and over whose highest level of education is a Certificate Level III or IV qualification (dis)
DEGREE	Per cent of people aged 15 years and over whose highest level of education is a bachelor degree qualification or higher (adv)
DIPLOMA	Per cent of people aged 15 years and over whose highest level of education is a diploma or advanced diploma (adv)
NOEDU	Per cent of people aged 15 years and over who have no formal educational attainment (dis)
NOYR12ORHIGHER	Per cent of people aged 15 years and over whose highest level of educational attainment is Year 11 or lower (includes Certificate Levels I and II; excludes those still at secondary school) (dis)

Education is important when considering socio-economic advantage and disadvantage because the skills people obtain through school and post-school education can increase their own standard of living, as well as that of their community. Certificate Levels I and II are regarded as a lower educational attainment than Year 12 schooling, and are grouped in the NOYR12ORHIGHER variable, as opposed to the CERTIFICATE variable. This specific educational hierarchy was based on the ABS publication Education and Work, Australia. Note also that the CERTIFICATE variable is an indicator of relative disadvantage in SEIFA. It is true that having a certificate qualification gives a person an advantage over someone with no qualifications. However, at an area level, a high proportion of people with certificate qualifications correlates with other disadvantaging characteristics (e.g. lower skilled occupations).

Employment variables

List of employment variables
Variable mnemonic	Variable description
UNEMPLOYED	Per cent of people in the labour force who are unemployed (dis)
UNEMPLOYED_IER	Per cent of people aged 15 and over who are unemployed (dis)

For most people, employment is their main source of income. Employment can also contribute to social participation and self-esteem. An unemployment variable is included in each of the SEIFA indexes. The standard unemployment variable (UNEMPLOYED) is calculated as the number of unemployed people divided by the number of people in the labour force (the unemployment rate). The variable used in the Index of Economic Resources (UNEMPLOYED_IER) is the number of unemployed people divided by the entire adult population of the area. This enables us to distinguish the unemployed from those employed and those not in the labour force, as the latter two groups were found to have significantly higher average wealth.
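The sketch below shows the difference between the two unemployment denominators. It assumes the inputs are counts of people aged 15 and over with a stated labour force status. The function name is hypothetical. Two made-up areas with the same unemployment rate come apart on UNEMPLOYED_IER once people not in the labour force are counted in the denominator.

```python
def unemployment_variables(employed: int, unemployed: int,
                           not_in_labour_force: int) -> dict[str, float]:
    """Counts are people aged 15+ with a stated labour force status."""
    labour_force = employed + unemployed
    adults = labour_force + not_in_labour_force
    return {
        "UNEMPLOYED": 100 * unemployed / labour_force,  # unemployment rate
        "UNEMPLOYED_IER": 100 * unemployed / adults,    # share of all adults
    }


print(unemployment_variables(540, 60, 400))  # {'UNEMPLOYED': 10.0, 'UNEMPLOYED_IER': 6.0}
print(unemployment_variables(270, 30, 700))  # {'UNEMPLOYED': 10.0, 'UNEMPLOYED_IER': 3.0}
```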

Occupation variables

List of occupation variables
Variable mnemonic	Variable description
OCC_DRIVERS	Per cent of employed people classified as Machinery Operators and Drivers (dis)
OCC_LABOUR	Per cent of employed people classified as Labourers (dis)
OCC_MANAGER	Per cent of employed people classified as Managers (adv)
OCC_PROF	Per cent of employed people classified as Professionals (adv)
OCC_SALES_L	Per cent of employed people classified as Low-Skill Sales Workers (dis)
OCC_SERVICE_L	Per cent of employed people classified as Low-Skill Community and Personal Service Workers (dis)
OCC_SKILL1	Per cent of employed people who work in a Skill Level 1 occupation (adv)
OCC_SKILL2	Per cent of employed people who work in a Skill Level 2 occupation (adv)
OCC_SKILL4	Per cent of employed people who work in a Skill Level 4 occupation (dis)
OCC_SKILL5	Per cent of employed people who work in a Skill Level 5 occupation (dis)

Occupation plays a significant part in determining socio-economic advantage and disadvantage. The ability to accumulate economic resources varies greatly with occupation type. The SEIFA 2021 occupation variables have been classified using the Australian and New Zealand Standard Classification of Occupations, Version 1.3 (ANZSCO).

Each occupation in ANZSCO is assigned a skill level ranging from 1 (highest) to 5 (lowest), which indicates the range and complexity of the set of tasks performed in a particular occupation. These skill levels were used as the basis of the occupation variables in the Index of Education and Occupation. For the purposes of OCC_SALES_L and OCC_SERVICE_L, low skill was determined as skill levels 4 and 5. The aim was to include broad categories of both advantaging and disadvantaging occupations, which complement the education variables by introducing the aspect of vocational skills. For the IRSD and the IRSAD, we used the ANZSCO major groups in conjunction with the skill levels to construct the occupation variables. This was done to identify occupations, or groups of occupations, which contribute to relative advantage or disadvantage at an area level. Using the major groups as well as the skill levels also helped to maintain consistency with SEIFA 2016.
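The sketch below shows how ANZSCO major groups and skill levels could combine into the occupation variables listed above. It assumes each employed person's occupation code has already been mapped to an ANZSCO major group (1 Managers, 2 Professionals, 4 Community and Personal Service Workers, 6 Sales Workers, 7 Machinery Operators and Drivers, 8 Labourers) and a skill level. Any skill levels in the usage example are illustrative, not real ANZSCO assignments. The function and constant names are hypothetical.

```python
from collections import Counter

# ANZSCO major group codes used by these variables.
MANAGERS, PROFESSIONALS, COMMUNITY_PERSONAL_SERVICE = 1, 2, 4
SALES, MACHINERY_DRIVERS, LABOURERS = 6, 7, 8
LOW_SKILL = {4, 5}  # "low skill" for OCC_SALES_L and OCC_SERVICE_L

VARIABLES = ["OCC_DRIVERS", "OCC_LABOUR", "OCC_MANAGER", "OCC_PROF",
             "OCC_SALES_L", "OCC_SERVICE_L",
             "OCC_SKILL1", "OCC_SKILL2", "OCC_SKILL4", "OCC_SKILL5"]


def occupation_variables(workers: list[tuple[int, int]]) -> dict[str, float]:
    """workers: one (major_group, skill_level) pair per employed person with
    a stated occupation. Returns each variable as a percentage."""
    counts = Counter()
    for group, skill in workers:
        # Major-group variables (IRSD and IRSAD); groups are mutually exclusive.
        if group == MACHINERY_DRIVERS:
            counts["OCC_DRIVERS"] += 1
        elif group == LABOURERS:
            counts["OCC_LABOUR"] += 1
        elif group == MANAGERS:
            counts["OCC_MANAGER"] += 1
        elif group == PROFESSIONALS:
            counts["OCC_PROF"] += 1
        elif group == SALES and skill in LOW_SKILL:
            counts["OCC_SALES_L"] += 1
        elif group == COMMUNITY_PERSONAL_SERVICE and skill in LOW_SKILL:
            counts["OCC_SERVICE_L"] += 1
        # Skill-level variables (IEO) are counted independently of major group.
        if skill in (1, 2, 4, 5):
            counts[f"OCC_SKILL{skill}"] += 1
    return {name: 100 * counts[name] / len(workers) for name in VARIABLES}
```

Skill level 3 feeds none of the variables, consistent with the variable list above.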

Housing variables

List of housing variables
Variable mnemonic	Variable description
FEWBED	Per cent of occupied private dwellings with one or no bedrooms (dis)
HIGHBED	Per cent of occupied private dwellings with four or more bedrooms (adv)
HIGHMORTGAGE	Per cent of occupied private dwellings paying more than $3,000 per month in mortgage repayments (adv)
HIGHRENT	Per cent of occupied private dwellings paying more than $500 per week in rent (adv)
LOWRENT	Per cent of occupied private dwellings paying less than $250 per week in rent (excluding $0 per week) (dis)
MORTGAGE	Per cent of occupied private dwellings owning the dwelling they occupy (with a mortgage) (adv)
OVERCROWD	Per cent of occupied private dwellings requiring one or more extra bedrooms (based on Canadian National Occupancy Standard) (dis)
OWNING	Per cent of occupied private dwellings owning the dwelling they occupy (without a mortgage) (adv)
SPAREBED	Per cent of occupied private dwellings with one or more bedrooms spare (based on Canadian National Occupancy Standard) (adv)

All dwelling variables excluded dwellings whose inhabitants all usually resided elsewhere, whose inhabitants were all under 15, or which could not be classified due to insufficient information. For numerator and denominator specifications, refer to the appendix: variable specifications.

Having an adequate and appropriate place to live is fundamental to socio-economic wellbeing. There are many aspects to housing that affect the quality of people’s lives. Dwelling size, cost and security of tenure are all important in this regard, and are therefore considered in SEIFA. Housing size is measured by the variables FEWBED, HIGHBED, OVERCROWD and SPAREBED. The variable FEWBED measures dwellings with one or no bedrooms, whilst the variable HIGHBED measures dwellings with four or more bedrooms. The variable OVERCROWD measures dwellings that do not have enough bedrooms for their occupants. Conversely, the variable SPAREBED measures dwellings that have one or more bedrooms spare for their occupants. These last two variables are calculated using the Canadian National Occupancy Standard, which determines housing appropriateness using the number of bedrooms and the number, age, sex and relationships of household members. For more information, refer to Housing Occupancy and Costs, 2019-20.

Housing cost for SEIFA is measured using reported mortgage or rent payments. The cut-offs for the high and low groups were based on the ranges corresponding to the top and bottom quintiles. The high housing cost variables (HIGHMORTGAGE, HIGHRENT) are indicators of relative advantage, because they indicate greater financial capacity, as well as higher quality housing or locational advantage.

The low housing cost variable (LOWRENT) is an indicator of relative disadvantage, for similar reasons.
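The OVERCROWD and SPAREBED variables described above depend on the Canadian National Occupancy Standard (CNOS). The sketch below computes the CNOS bedroom requirement, assuming the rules as commonly stated:

- no more than two people per bedroom;
- couples share a bedroom;
- each single household member aged 18 and over has a separate bedroom;
- children under 18 of the same sex may share;
- children under 5 of different sexes may share, but children aged 5 and over of opposite sex should not.

The class and function names are hypothetical. The household is assumed to have already been split into couples, single adults and children. This is an illustration, not the ABS's production derivation.

```python
from dataclasses import dataclass


@dataclass
class Child:
    """A household member under 18 (hypothetical record type)."""
    age: int
    sex: str  # "M" or "F"


def cnos_bedrooms_needed(couples: int, single_adults: int, children: list[Child]) -> int:
    """Bedrooms required under the CNOS rules listed in the lead-in.

    couples: number of couples (including parent couples), one bedroom each.
    single_adults: single members aged 18+, one bedroom each.
    children: members under 18, who may share in pairs.
    """
    boys = [c for c in children if c.sex == "M"]
    girls = [c for c in children if c.sex == "F"]
    # Same-sex children under 18 can always be paired off within each sex.
    shared_pairs = len(boys) // 2 + len(girls) // 2
    # If both counts are odd, one boy and one girl are left over. They can
    # share only if both are under 5; choosing under-5 children as the
    # leftovers is possible whenever each sex has at least one.
    if len(boys) % 2 == 1 and len(girls) % 2 == 1:
        if any(c.age < 5 for c in boys) and any(c.age < 5 for c in girls):
            shared_pairs += 1
    child_bedrooms = len(children) - shared_pairs
    return couples + single_adults + child_bedrooms


def occupancy_status(bedrooms: int, needed: int) -> str:
    """Classify a dwelling for the OVERCROWD / SPAREBED variables."""
    if bedrooms < needed:
        return "OVERCROWD"  # one or more extra bedrooms required
    if bedrooms > needed:
        return "SPAREBED"   # one or more bedrooms spare
    return "neither"


# A couple with a 7-year-old boy and a 3-year-old girl needs 3 bedrooms.
needed = cnos_bedrooms_needed(1, 0, [Child(7, "M"), Child(3, "F")])
print(needed, occupancy_status(2, needed))  # 3 OVERCROWD
```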

Owning a house, with or without a mortgage, is an indicator of advantage. Ownership implies security of tenure, and for many Australian households the family home is their most valuable asset. Owning with a mortgage also indicates the financial capacity to make repayments, as well as the possession of a future asset. The denominator of the mortgage and rent variable proportions is based on all households in an area.

The Census captures limited household information and does not, for instance, capture housing affordability, housing stress, dwelling value or dwelling quality. Although some variables, such as number of bedrooms and amount of rent or mortgage payments, may provide a proxy in some instances, their relationship to dwelling quality and dwelling value is not uniform across all areas.

An investigation using SEIFA 2016 considered including a housing stress variable for lower income households only, with housing stress defined as housing costs of 30% or more of total household income. The analysis showed that including it had only a small impact on the overall distribution of SEIFA scores, and it was noted that this definition of housing stress had limitations.

Other indicators of relative advantage or disadvantage

List of other variables
Variable mnemonic	Variable description
CHILDJOBLESS	Per cent of families with children under 15 years of age and jobless parents (dis)
DISABILITYU70	Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age (dis)
ENGLISHPOOR	Per cent of people who do not speak English well (dis)
GROUP	Per cent of occupied private dwellings that are group occupied private dwellings (dis)
HIGHCAR	Per cent of occupied private dwellings with three or more cars (adv)
LONE	Per cent of occupied private dwellings that are lone person occupied private dwellings (dis)
NOCAR	Per cent of occupied private dwellings with no cars (dis)
ONEPARENT	Per cent of families that are one parent families with dependent offspring only (dis)
SEPDIVORCED	Per cent of people aged 15 and over who are separated or divorced (dis)
UNINCORP	Per cent of occupied private dwellings with at least one person who is an owner of an unincorporated enterprise (adv)

All dwelling variables excluded dwellings whose inhabitants all usually resided elsewhere, whose inhabitants were all under 15, or which could not be classified due to insufficient information. For numerator and denominator specifications refer to the appendix: variable specifications.

The CHILDJOBLESS variable is defined as the proportion of families with children under 15 years old and jobless parents. The variable could be an indicator for entrenched disadvantage since children who grow up in jobless families may be more likely to experience intergenerational unemployment and diminished opportunities to participate in society.

The disability variable (DISABILITYU70) provides an indication of the physical or health aspects of socio-economic disadvantage. It is based on the Census question on need for assistance, which was developed to provide an indication of whether people have a profound or severe disability. People with a profound or severe disability are defined as those people needing help or assistance in one or more of the three core activity areas of self-care, mobility and communication, because of a disability, long-term health condition (lasting six months or more) or old age. Disability can limit employment opportunities and access to community resources. For the purpose of indicating relative socio-economic disadvantage, we have limited the scope of the SEIFA disability variable to people aged under 70, as was done for SEIFA 2016.

Questions relating to long-term health conditions were asked for the first time in Census 2021. These were not added to the SEIFA candidate variables for 2021, as many health researchers are interested in measuring individual health outcomes and analysing their relationship with socio-economic advantage/disadvantage. If SEIFA included health variables, it would make these relationships less clear and significantly harder to interpret. It was determined that it would be beneficial to retain the established approach to SEIFA, which is to only include the DISABILITYU70 variable.

A lack of fluency in English may limit employment opportunities and the ability to participate in society.

A car is both a material resource and a means of transport that enables greater freedom. A limitation of the NOCAR variable is that the need for a car varies depending on the remoteness of the area and access to public transport.

A past analysis of wealth data collected by the ABS showed that lone person households have lower average wealth (per person) than other household types. A higher proportion of lone person households in an area is correlated with lower ability to access economic resources beyond what is measured by the equivalised household income variables. An analysis of group households yielded a similar conclusion – an association with low wealth. A high proportion of unincorporated enterprise owners was found to correlate with high wealth and access to economic resources. These three variables were used only in the Index of Economic Resources.

One parent families are disadvantaged compared with other family structures, because of the need to simultaneously provide and care for dependants. Aside from having lower equivalised household incomes, one parent families also have lower rates of employment and labour force participation, lower rates of home ownership and higher incidence of financial stress, as compared to couple family households – for example, refer to Australian Social Trends, 2007. There are significant correlations at the area level between the number of one parent families and many indicators of relative socio-economic disadvantage. The same patterns are evident for areas with high proportions of people who are separated or divorced.

Basic exploratory analysis of variables

The Census data was converted into the SEIFA variable proportions. Summary statistics for these proportions were analysed to identify significant changes since 2016. Overall, there were no unexpected changes to the SEIFA variable proportions.

Candidate variable list for each index

The following table shows the candidate variable list for each index. The candidate list includes all variables considered for inclusion in an index before the principal component analysis stage. The final list of variables included in each index can be found in technical details of each index: variables and loadings.

Candidate variable list for each index, by socio-economic dimension
Dimension	Index of Relative Socio-Economic Disadvantage	Index of Relative Socio-Economic Advantage and Disadvantage	Index of Economic Resources	Index of Education and Occupation
Income	INC_LOW	INC_HIGH INC_LOW	INC_HIGH INC_LOW
Education	NOYR12ORHIGHER NOEDU CERTIFICATE	NOYR12ORHIGHER NOEDU CERTIFICATE ATUNI DIPLOMA DEGREE		NOYR12ORHIGHER NOEDU CERTIFICATE ATUNI DIPLOMA DEGREE ATSCHOOL
Employment	UNEMPLOYED	UNEMPLOYED	UNEMPLOYED_IER	UNEMPLOYED
Occupation	OCC_LABOUR OCC_DRIVERS OCC_SERVICE_L OCC_SALES_L	OCC_LABOUR OCC_DRIVERS OCC_SERVICE_L OCC_SALES_L OCC_PROF OCC_MANAGER		OCC_SKILL1 OCC_SKILL2 OCC_SKILL4 OCC_SKILL5
Housing	LOWRENT OVERCROWD FEWBED	LOWRENT OVERCROWD HIGHBED HIGHRENT HIGHMORTGAGE OWNING SPAREBED	LOWRENT OVERCROWD MORTGAGE HIGHBED HIGHRENT HIGHMORTGAGE OWNING
Other	CHILDJOBLESS ONEPARENT NOCAR DISABILITYU70 ENGLISHPOOR SEPDIVORCED	CHILDJOBLESS ONEPARENT NOCAR DISABILITYU70 ENGLISHPOOR SEPDIVORCED HIGHCAR	UNINCORP ONEPARENT NOCAR GROUP LONE

Refer to the appendix: variable specifications for the definitions of each variable listed in this table.
The variables listed in this table are not the final list of variables included in the indexes. For the final list, refer to technical details of each index: variables and loadings.

Construction of the indexes

This chapter describes the methods used to construct the indexes, some important technical specifications of each index, and some basic outputs.

Principal Component Analysis

Each index is a weighted sum of SEIFA variables. As with past versions of SEIFA, principal component analysis (PCA) is used to determine the weights. This section introduces some technical concepts related to PCA to help the reader understand the SEIFA index construction process. Some references are given at the end of this section for readers interested in a comprehensive discussion of PCA.

PCA is a technique that involves summarising a large number of correlated variables into a set of new uncorrelated components, each of which is a linear combination of the original variables. There are as many principal components as there are variables. If the original variables are highly correlated, much of the variation can be summarised by a reduced set of components, enabling easier analysis. The first principal component accounts for the largest proportion of variance in the original dataset, with each subsequent component explaining less of the variance. The principal component used for each SEIFA index is the one that can be interpreted as best explaining the variation in the concept of advantage and disadvantage for that index. For all four indexes in SEIFA 2021, the first principal component was used to create the index.

The PCA procedure gives an eigenvalue for each component, which indicates the amount of variance in the original data explained by the component. The proportion of variance explained by a principal component is its eigenvalue divided by the sum of all the eigenvalues. The 'loading' for a variable is calculated by multiplying that variable's element of the component's eigenvector by the square root of the eigenvalue. It gives a measure of the strength of the relationship between the variable and the component, though some sources use different definitions for the loadings and weights in PCA. The loadings are also useful for comparing results obtained from different sets of original variables (such as the four indexes in SEIFA). Loadings for each index are presented in the following sections.

To generate the component scores (otherwise known as raw scores), each loading is converted to a weight by dividing it by the square root of the eigenvalue. The products of the weights and the standardised variable values are summed to produce the raw scores. The raw scores for each component then have a variance equal to the eigenvalue for that component. We then rescale the raw scores to a mean of 1,000 and standard deviation of 100 to create the index scores in SEIFA; this process is known as "standardisation".
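The relationship between eigenvalues, loadings, weights and raw scores can be checked numerically. The Python sketch below follows the definitions in this section using NumPy on randomly generated toy data rather than Census data; the function name `first_component` is illustrative only and is not ABS code.

```python
import numpy as np

def first_component(X):
    """Loadings, weights, raw scores and rescaled index scores for the first
    principal component of X, an (n_areas, p) array of standardised variables
    with no missing values (illustrative sketch only)."""
    R = np.corrcoef(X, rowvar=False)       # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    lam = eigvals[-1]                      # eigenvalue of the first component
    v = eigvecs[:, -1]                     # its unit-length eigenvector
    loadings = v * np.sqrt(lam)            # loading = eigenvector element x sqrt(eigenvalue)
    weights = loadings / np.sqrt(lam)      # weight = loading / sqrt(eigenvalue)
    raw = X @ weights                      # raw score = sum of weight x standardised value
    share = lam / eigvals.sum()            # proportion of variance explained
    # Rescale ("standardise") the raw scores to mean 1,000 and standard deviation 100.
    index = 1000 + 100 * (raw - raw.mean()) / raw.std(ddof=1)
    return lam, loadings, weights, raw, share, index

# Toy data: four correlated variables for 500 hypothetical areas.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
data = common + 0.5 * rng.normal(size=(500, 4))
X = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

lam, loadings, weights, raw, share, index = first_component(X)
print(round(lam, 3), round(raw.var(ddof=1), 3))  # the two values agree
```

Because each weight is simply the corresponding eigenvector element, the raw scores have a variance equal to the eigenvalue. The sign of the eigenvector returned by `eigh` is arbitrary; step 10 below describes how SEIFA fixes it.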

More detailed explanations of PCA can be found in Jolliffe (1986) and O’Rourke (2005).

Areas with no SEIFA score

Some SA1s do not receive an index score, either because of low populations or poor-quality data. The criteria used to identify these areas are called ‘exclusion rules’. SEIFA 2021 uses an exclusion rule framework similar to that of SEIFA 2016, with the aim of obtaining a reliable index score for as many areas as possible.

The 2021 exclusion rules use a two-phase approach. The first phase excludes areas (SA1s) that should not receive a SEIFA score because of the type of area, confidentiality or reliability concerns (e.g. low population or low response rates for particular key variables). The second phase excludes areas (SA1s) by looking specifically at the variables included in each index. For each SA1, if any of the variables have a low denominator count, it is deemed that there is not enough data to support a reliable calculation of an index score for that area.

Some additional comments on the exclusion rule framework:

The first phase rules are applied before PCA, whereas the second phase rules are applied following the PCA when the list of variables has been finalised. The step-by-step process provides details on how this is implemented.
SA1s excluded in the first phase will be excluded for all four indexes. The number of SA1s excluded in the second phase may be different for each index, because they have different sets of variables.
Following on from the point above, an area can receive a score for one index and not another depending on the make-up of its variables.
The low denominator cut-off of six was chosen based on past practice and a judgement about how many responses are required to calculate a reliable value for an area.

The exclusion of areas is based on the confidentialised counts for each SEIFA variable to ensure the confidentiality of respondents is upheld and the reliability of the indexes is maintained.

The specific exclusion rules and the number of areas meeting each rule are summarised in the table below. Note that areas might fall into multiple categories, which is why the column sum does not equal the final total number of excluded areas.

The proportions of excluded SA1s are similar to those for SEIFA 2016.

Summary of excluded areas - first phase
Exclusion criteria	Total SA1s excluded
Population = 0	1,357
No Usual Address SA1	9
Offshore, Shipping SA1	24
Population > 0 and ≤ 10	554
Employed persons ≤ 5	2,079
Classifiable(a) occupied private dwellings ≤ 5	2,118
People in private dwellings ≤ 20%	1,741
Total excluded due to any of the rules above	2,412

(a) These are dwellings where the type of household living in the dwelling could be determined during the collection process. For more information, refer to the 2021 Census Dictionary.
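The first-phase rules in the table above can be expressed as simple tests on SA1-level counts. The Python sketch below is illustrative only: the `SA1Counts` fields and rule labels are hypothetical names, and it reads "People in private dwellings ≤ 20%" as people in private dwellings making up 20% or less of the SA1's population, which is an assumption. It also shows why one SA1 can meet several rules at once.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SA1Counts:
    # Hypothetical field names for confidentialised Census counts.
    population: int
    employed: int
    classifiable_opd: int                 # classifiable occupied private dwellings
    people_in_private_dwellings: int
    special_type: str = ""                # "no usual address" or "offshore/shipping"

def first_phase_reasons(a: SA1Counts) -> List[str]:
    """Return every first-phase rule the SA1 meets (an empty list means not excluded)."""
    reasons = []
    if a.population == 0:
        reasons.append("population = 0")
    if a.special_type in ("no usual address", "offshore/shipping"):
        reasons.append(a.special_type + " SA1")
    if 0 < a.population <= 10:
        reasons.append("population > 0 and <= 10")
    if a.employed <= 5:
        reasons.append("employed persons <= 5")
    if a.classifiable_opd <= 5:
        reasons.append("classifiable occupied private dwellings <= 5")
    # Assumed reading of the 20% rule; skipped when the population is zero.
    if a.population > 0 and a.people_in_private_dwellings <= 0.2 * a.population:
        reasons.append("people in private dwellings <= 20%")
    return reasons

# An empty SA1 meets three rules at once, which is why the column sum
# in the table exceeds the total number of excluded SA1s.
print(first_phase_reasons(SA1Counts(0, 0, 0, 0)))
```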

Summary of excluded areas - second phase
Index	Total SA1s excluded
IRSD	150
IRSAD	150
IER	127
IEO	20

Step-by-step process

With the preceding two sections providing context, a step-by-step process for constructing the indexes is presented below.

1: Creating the initial variable list

Given the data available, we created a list of variables related to our definition of relative socio-economic advantage and disadvantage.

2: Constructing the variables

We created all variables as proportions at the SA1 level (e.g. ‘percent of people aged 15 years and over attending secondary school’). We then standardised these proportions to a mean of zero and a standard deviation of one. The standardisation was used to prevent variables with larger prevalence, or larger ranges, from having a disproportionate influence on the index.
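As an illustration of step 2, the sketch below turns hypothetical numerator and denominator counts into percentages and then standardises each variable to a mean of zero and a standard deviation of one. The function name, the use of the sample standard deviation (`ddof=1`) and the treatment of zero denominators as missing are assumptions made for this example.

```python
import numpy as np

def build_and_standardise(numerators, denominators):
    """numerators, denominators: (n_sa1, p) arrays of counts for p variables.
    Returns z-scores of the percentages; zero denominators give NaN."""
    num = np.asarray(numerators, dtype=float)
    den = np.asarray(denominators, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        pct = np.where(den > 0, 100.0 * num / den, np.nan)   # per cent of the denominator
    mean = np.nanmean(pct, axis=0)                            # column means, ignoring NaN
    sd = np.nanstd(pct, axis=0, ddof=1)                       # column standard deviations
    return (pct - mean) / sd

# Three hypothetical SA1s and two variables.
Z = build_and_standardise([[10, 5], [20, 0], [30, 8]],
                          [[100, 50], [100, 40], [100, 0]])
print(Z)
```

After this step every variable has the same spread, so no variable carries extra weight simply because it is more common or more variable.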

3: Applying first phase exclusion rules

We excluded areas (SA1s) that should not receive an index score because of the type of area, confidentiality, or reliability concerns.

4: Calculating the correlation matrix

We set to missing any variable values whose denominators were less than our prescribed cut-off of six. Note that we did not exclude areas based on this cut-off at this stage of the process; this occurred at step nine. We calculated the correlation matrix using pairwise deletion for areas (observations) containing missing values. Pairwise deletion is a method for dealing with missing data in which the correlation for each pair of variables is calculated from all areas that have non-missing values for both variables. This contrasts with listwise deletion, in which entire records (areas, in our case) are removed from the analysis if any of their variables have missing values. Given the number of observations in our dataset and the low prevalence of missing values, the use of pairwise deletion had very little impact on the correlation matrix; however, it did provide a convenient way of implementing our second phase exclusion rules (refer to step nine).
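The sketch below illustrates the pairwise deletion described in step 4 using NumPy. Values with a denominator below six are set to missing, and each correlation is then computed from the areas where both variables are present. The function names and toy data are hypothetical; pandas `DataFrame.corr` also excludes missing values pairwise by default.

```python
import numpy as np

def mask_low_denominators(Z, denominators, cutoff=6):
    """Set a standardised value to NaN wherever its denominator is below the cut-off."""
    Z = np.array(Z, dtype=float)                     # copy, so the input is unchanged
    Z[np.asarray(denominators) < cutoff] = np.nan
    return Z

def pairwise_corr(Z):
    """Correlation matrix with pairwise deletion: each pair of columns uses
    every row in which both values are non-missing."""
    p = Z.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            ok = ~np.isnan(Z[:, i]) & ~np.isnan(Z[:, j])
            R[i, j] = R[j, i] = np.corrcoef(Z[ok, i], Z[ok, j])[0, 1]
    return R

# Toy data: 50 hypothetical areas, 3 variables, one low denominator.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
den = np.full((50, 3), 100)
den[0, 2] = 3
R = pairwise_corr(mask_low_denominators(A, den))
# Listwise deletion would instead keep only complete rows:
# M[~np.isnan(M).any(axis=1)]
```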

5: Removing very highly correlated variables

We removed highly correlated variables to avoid over-representing any specific socio-economic characteristic. When two variables had a correlation coefficient greater than 0.8 in absolute value and were measuring conceptually similar aspects of advantage or disadvantage, we generally removed one of them. However, we applied some discretion, depending on the variables in question and the size of the correlation.
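Step 5 involves judgement, but its first part, finding pairs of variables whose correlation exceeds 0.8 in absolute value, is mechanical. The Python sketch below flags such pairs and counts how often each variable appears among them. It uses the two DEGREE correlations reported for the IRSAD later in this paper; the function names and the counting heuristic are illustrative assumptions, and the decision about which variable to drop still depends on conceptual similarity.

```python
from collections import Counter

def flag_high_correlations(correlations, threshold=0.8):
    """correlations: dict mapping (var_a, var_b) -> correlation coefficient.
    Returns the pairs whose absolute correlation exceeds the threshold."""
    return {pair: r for pair, r in correlations.items() if abs(r) > threshold}

def appearance_counts(flagged):
    """Number of flagged pairs each variable belongs to (a heuristic for removal candidates)."""
    return Counter(var for pair in flagged for var in pair)

# Correlations reported for DEGREE in the IRSAD section.
reported = {("DEGREE", "NOYR12ORHIGHER"): -0.83, ("DEGREE", "OCC_PROF"): 0.88}
flagged = flag_high_correlations(reported)
print(appearance_counts(flagged).most_common(1))   # [('DEGREE', 2)]
```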

6: Conducting the initial PCA

Using the correlation matrix, we conducted principal component analysis (PCA) to obtain the loading for each variable on the first principal component.

7: Removing low loading variables

We excluded variables with loadings less than 0.3 in absolute value, on the grounds that they were not strong indicators of relative advantage or disadvantage. This limit is an accepted level in the PCA literature and has been used in past releases of SEIFA. We removed variables one at a time, starting with the lowest loading variable.

8: Conducting PCA on the reduced list of variables

We conducted a PCA on the reduced variable list, and if any other variables had loadings below 0.3 in absolute value, we repeated steps seven and eight.
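Steps 6 to 8 form a loop: compute first-component loadings, drop the single variable with the smallest absolute loading if it is below 0.3, and repeat. The NumPy sketch below assumes the correlation matrix is already available and uses a small made-up matrix in which variable D is only weakly related to the others; the variable names and matrix values are hypothetical.

```python
import numpy as np

def first_pc_loadings(R):
    """Loadings on the first principal component of correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)
    return eigvecs[:, -1] * np.sqrt(eigvals[-1])

def drop_low_loadings(R, names, cutoff=0.3):
    """Remove variables one at a time, lowest |loading| first, until all
    remaining loadings are at least `cutoff` in absolute value."""
    keep = list(range(len(names)))
    removed = []
    while True:
        L = first_pc_loadings(R[np.ix_(keep, keep)])   # PCA on the reduced list
        k = int(np.argmin(np.abs(L)))
        if abs(L[k]) >= cutoff:
            return [names[i] for i in keep], removed
        removed.append((names[keep[k]], round(float(L[k]), 2)))
        del keep[k]

R = np.array([[1.00, 0.60, 0.60, 0.05],
              [0.60, 1.00, 0.60, 0.05],
              [0.60, 0.60, 1.00, 0.05],
              [0.05, 0.05, 0.05, 1.00]])
kept, removed = drop_low_loadings(R, ["A", "B", "C", "D"])
print(kept, removed)
```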

9: Finalising the list of variables in the index and applying second phase exclusion rules

After the final list of variables in the index was determined, we excluded any SA1s that had denominators less than our prescribed cut-off of six for any of the variables on the final variable list.
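The second-phase rule in step 9 depends only on the final variable list for an index, which is why the number of excluded SA1s differs between indexes. A minimal NumPy sketch, using hypothetical denominator counts and a hypothetical function name:

```python
import numpy as np

def second_phase_excluded(denominators, final_columns, cutoff=6):
    """Boolean mask of SA1s excluded from one index: True where any variable
    on that index's final list has a denominator below the cut-off."""
    den = np.asarray(denominators)[:, final_columns]
    return (den < cutoff).any(axis=1)

# Rows are SA1s; columns are the denominators of three hypothetical variables.
den = np.array([[10, 3, 20],
                [10, 10, 10],
                [2, 50, 50]])
print(second_phase_excluded(den, [0, 2]))   # an index using variables 0 and 2
print(second_phase_excluded(den, [1]))      # an index using variable 1 only
```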

10: Calculating and standardising component/index scores

We derived the first principal component scores for each SA1 by taking the product of each standardised variable with its respective weight, then taking the sum across all variables. Note that the weight for each variable was calculated by dividing the loading by the square root of the eigenvalue.

$Z_{SA1} = \sum\limits_{j=1}^{p} \frac{L_j}{\sqrt{\lambda}} \times X_{j,SA1}$

where,

${Z_{SA1}}$ = raw score for the SA1

$X_{j,SA1}$ = standardised value of the j-th variable for the SA1

${{L_j}}$ = loading for the j-th variable

$\lambda$ = eigenvalue of the principal component

$p$ = total number of variables in the index

For convenience of presentation, we then rescaled the raw scores to a mean of 1,000 and standard deviation of 100 to create a new set of scores that are the SA1 index scores in SEIFA.

Note that the principal components are arbitrary with respect to their sign (positive or negative), so we set the sign of the weights and loadings so that they make intuitive sense. That is, we gave advantage indicators positive weights and loadings, and disadvantage indicators negative weights and loadings. Accordingly, high scores indicate relative advantage, and low scores indicate relative disadvantage. This is consistent with previous editions of SEIFA.
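Eigenvector routines return a component with an arbitrary sign. One simple way to apply the convention above is to flip the component whenever its loadings disagree overall with the expected direction of the indicators. This rule is sketched in Python as an assumption, not as the ABS's documented procedure. The example input uses three IRSD loadings with their signs reversed, as an eigenvector routine might return them.

```python
import numpy as np

def orient_component(loadings, is_advantage):
    """Return (loadings, sign) with the sign chosen so that advantage indicators
    tend to load positively and disadvantage indicators negatively.
    Weights and raw scores must be multiplied by the same sign."""
    loadings = np.asarray(loadings, dtype=float)
    expected = np.where(np.asarray(is_advantage, dtype=bool), 1.0, -1.0)
    sign = 1.0 if expected @ loadings >= 0 else -1.0
    return sign * loadings, sign

# All three are disadvantage indicators, so the component is flipped.
oriented, sign = orient_component([0.87, 0.78, 0.75], [False, False, False])
print(sign, oriented)
```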

11: Creating higher geographic level indexes

We constructed indexes for geographies higher than the SA1 level using population weighted averages of the constituent SA1s. We used the following formula:

$INDEX_{AREA} = \frac{\sum\limits_{i=1}^{n} \left( INDEX_{SA1_i} \times POP_{SA1_i} \right)}{POP_{AREA}}$

where,

$INDEX$ = Index score for each SA1 or higher level area

$POP$ = Population for each SA1 or higher level area

$n$ = Total number of SA1s (with index scores) in the higher level area

The higher level area population is the sum of the populations from the constituent SA1s that received an index score. Populations in excluded SA1s are not included in this calculation.

Although we constructed the higher level indexes from standardised SA1 level indexes, the higher level indexes were not standardised themselves. Therefore, they do not necessarily have a mean of 1,000 or standard deviation of 100. Only SA1s with index scores were used to create the higher level indexes. In a small number of cases, where a higher level area contains a number of excluded SA1s, its index score may not be a good representation of its entire population.

For this reason, the output spreadsheets provide the proportion of each higher level area's population that was in excluded SA1s. In general, we encourage users conducting analysis at higher level areas to keep in mind that the indexes were constructed at the SA1 level, and to consider using the distribution of SA1s within the higher level areas, rather than just the single index score for each higher level area.
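The population-weighted average in step 11, together with the share of an area's population in excluded SA1s, can be computed as in the Python sketch below. The input format (score and population pairs, with `None` for an excluded SA1) and the toy numbers are assumptions for illustration.

```python
from typing import Iterable, Optional, Tuple

def area_index(sa1s: Iterable[Tuple[Optional[float], int]]) -> Tuple[float, float]:
    """sa1s: (index score, or None if excluded, and population) for each SA1 in an area.
    Returns (population-weighted index score, share of population in excluded SA1s)."""
    records = list(sa1s)
    scored = [(score, pop) for score, pop in records if score is not None]
    pop_scored = sum(pop for _, pop in scored)       # POP_AREA: scored SA1s only
    pop_total = sum(pop for _, pop in records)
    score = sum(s * pop for s, pop in scored) / pop_scored
    excluded_share = 1 - pop_scored / pop_total
    return score, excluded_share

score, excluded = area_index([(950, 400), (1050, 600), (None, 100)])
print(score, round(excluded, 3))   # 1010.0 0.091
```

The excluded SA1's 100 people do not enter either the numerator or $POP_{AREA}$, but they do count towards the excluded-population share.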

Technical details of each index: variables and loadings

This section gives the results of the principal component analysis carried out for each index, including variable loadings and percentage of variance explained. We also list the variables initially considered for inclusion but removed due to high correlations with other variables or weak loadings.

Index of Relative Socio-economic Disadvantage

The IRSD summarises variables that indicate relative disadvantage at the SA1 level, according to the concept described in defining the concept behind each of the four indexes. The final variable list and corresponding loadings are shown below.

Final IRSD variables and loadings
Variable name	Variable description	Variable loading
INC_LOW	Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles)	-0.87
CHILDJOBLESS	Per cent of families with children under 15 years of age who live with jobless parents	-0.78
NOYR12ORHIGHER	Per cent of people aged 15 years and over whose highest level of education is Year 11 or lower. Includes Certificate I and II	-0.75
LOWRENT	Per cent of occupied private dwellings paying rent less than $250 per week (excluding $0 per week)	-0.71
UNEMPLOYED	Per cent of people (in the labour force) unemployed	-0.68
OCC_LABOUR	Per cent of employed people classified as 'labourers'	-0.68
DISABILITYU70	Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age	-0.63
ONEPARENT	Per cent of one parent families with dependent offspring only	-0.58
OVERCROWD	Per cent of occupied private dwellings requiring one or more extra bedrooms (based on the Canadian National Occupancy Standard)	-0.51
OCC_DRIVERS	Per cent of employed people classified as Machinery Operators and Drivers	-0.51
SEPDIVORCED	Per cent of people aged 15 and over who are separated or divorced	-0.51
NOEDU	Per cent of people aged 15 years and over who have no educational attainment	-0.47
OCC_SERVICE_L	Per cent of employed people classified as Low Skill Community and Personal Service Workers	-0.45
NOCAR	Per cent of occupied private dwellings with no cars	-0.43
ENGLISHPOOR	Per cent of people who do not speak English well	-0.35

The 2021 IRSD index explains 37% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 43% (2016 IRSD), 44% (2011 IRSD), 39% (2006 IRSD) and 33% (2001 IRSD).
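Because the PCA is carried out on a correlation matrix, the eigenvalues sum to the number of variables p, and the eigenvalue of the first component equals the sum of its squared loadings. The percentage of variance explained can therefore be approximately recovered from the published loadings. The Python check below uses the IRSD loadings from the table above; the result is approximate because the loadings are rounded to two decimal places.

```python
IRSD_LOADINGS = {
    "INC_LOW": -0.87, "CHILDJOBLESS": -0.78, "NOYR12ORHIGHER": -0.75,
    "LOWRENT": -0.71, "UNEMPLOYED": -0.68, "OCC_LABOUR": -0.68,
    "DISABILITYU70": -0.63, "ONEPARENT": -0.58, "OVERCROWD": -0.51,
    "OCC_DRIVERS": -0.51, "SEPDIVORCED": -0.51, "NOEDU": -0.47,
    "OCC_SERVICE_L": -0.45, "NOCAR": -0.43, "ENGLISHPOOR": -0.35,
}

def variance_explained(loadings):
    """Share of total variance explained by one component of a correlation-matrix PCA:
    its eigenvalue (sum of squared loadings) divided by p (sum of all eigenvalues)."""
    values = list(loadings)
    eigenvalue = sum(x * x for x in values)
    return eigenvalue / len(values)

print(f"{variance_explained(IRSD_LOADINGS.values()):.1%}")   # 37.3%
```

Applying the same function to the IRSAD, IER and IEO loadings gives about 34%, 35% and 46%, matching the percentages reported for those indexes.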

Removal of highly correlated variables

Of the variables considered for the IRSD, there were no two variables that had a correlation coefficient greater than 0.8 in absolute value.

Removal of low loading variables

The following table shows the variables that were dropped from the IRSD because their loading was below our prescribed cut-off of 0.3 in absolute value. The variables are shown in the order they were removed, with the loadings from the iteration when they were removed.

IRSD variables removed due to low loadings
Variable name	Variable description	Variable loading
OCC_SALES_L	Per cent of employed people classified as Low-Skill Sales Workers	-0.27
CERTIFICATE	Per cent of people aged 15 years and over whose highest level of educational attainment is a certificate III or IV qualification	-0.21
FEWBED	Per cent of occupied private dwellings with one or no bedrooms	-0.01

Index of Relative Socio-Economic Advantage and Disadvantage

The IRSAD summarises variables that indicate either relative socio-economic advantage or disadvantage, according to the concept described in defining the concept behind each of the four indexes. The final variable list and corresponding loadings are shown below.

Final IRSAD variables and loadings
Variable name	Variable description	Variable loading
NOYR12ORHIGHER	Per cent of people aged 15 years and over whose highest level of education is Year 11 or lower. Includes Certificate I and II	-0.85
INC_LOW	Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles)	-0.83
OCC_LABOUR	Per cent of employed people classified as 'labourers'	-0.75
DISABILITYU70	Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age	-0.67
CHILDJOBLESS	Per cent of families with children under 15 years of age who live with jobless parents	-0.65
OCC_DRIVERS	Per cent of employed people classified as Machinery Operators and Drivers	-0.61
LOWRENT	Per cent of occupied private dwellings paying rent less than $250 per week (excluding $0 per week)	-0.58
SEPDIVORCED	Per cent of people aged 15 and over who are separated or divorced	-0.58
ONEPARENT	Per cent of one parent families with dependent offspring only	-0.55
UNEMPLOYED	Per cent of people (in the labour force) unemployed	-0.54
OCC_SERVICE_L	Per cent of employed people classified as Low Skill Community and Personal Service Workers	-0.49
CERTIFICATE	Per cent of people aged 15 years and over whose highest level of educational attainment is a certificate III or IV qualification	-0.45
OVERCROWD	Per cent of occupied private dwellings requiring one or more extra bedrooms (based on Canadian National Occupancy Standard)	-0.32
NOEDU	Per cent of people aged 15 years and over who have no educational attainment	-0.32
OCC_SALES_L	Per cent of employed people classified as Low Skill Sales Workers	-0.32
ATUNI	Per cent of people aged 15 years and over at university or other tertiary institution	0.35
HIGHBED	Per cent of occupied private dwellings with four or more bedrooms	0.35
DIPLOMA	Per cent of people aged 15 years and over whose highest level of educational attainment is a diploma qualification	0.38
HIGHRENT	Per cent of occupied private dwellings paying rent greater than $470 per week	0.51
OCC_MANAGER	Per cent of employed people classified as Managers	0.52
HIGHMORTGAGE	Per cent of occupied private dwellings paying mortgage greater than $2,800 per month	0.69
OCC_PROF	Per cent of employed people classified as Professionals	0.74
INC_HIGH	Per cent of people living in households with stated annual household equivalised income greater than $91,000 (approx. 9th and 10th deciles)	0.85

The 2021 IRSAD index explains 34% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 38% (2016 IRSAD), 39% (2011 IRSAD), 44% (2006 IRSAD) and 41% (2001 IRSAD).

Removal of highly correlated variables

The variable DEGREE had high correlations with NOYR12ORHIGHER (-0.83) and OCC_PROF (0.88). This suggested that the proportion of people in an area with a degree was already largely explained by other variables in the index. Therefore, DEGREE was dropped.

Removal of low loading variables

The table below shows the variables dropped from the IRSAD because of low loadings. The variables are shown in the order they were removed, with the loadings from the iteration when they were removed.

IRSAD variables removed due to low loadings
Variable name	Variable description	Variable loading
NOCAR	Per cent of occupied private dwellings with no cars	0.24
SPAREBED	Per cent of occupied private dwellings with one or more spare bedrooms	0.20
ENGLISHPOOR	Per cent of people who do not speak English well	-0.21
HIGHCAR	Per cent of occupied private dwellings with three or more cars	0.20
OWNING	Per cent of occupied private dwellings owning dwelling without a mortgage	0.19
FEWBED	Per cent of occupied private dwellings with one or no bedrooms	-0.01

Index of Economic Resources

The IER focuses on the financial aspects of relative socio-economic advantage and disadvantage, according to the concept described in defining the concept behind each of the four indexes. The final variable list and corresponding loadings are shown below.

Final IER variables and loadings
Variable name	Variable description	Variable loading
INC_LOW	Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles)	-0.73
LOWRENT	Per cent of occupied private dwellings paying rent less than $250 per week (excluding $0 per week)	-0.71
NOCAR	Per cent of occupied private dwellings with no cars	-0.70
LONE	Per cent of occupied private dwellings that are lone person occupied private dwellings	-0.68
ONEPARENT	Per cent of one parent families with dependent offspring only	-0.54
OVERCROWD	Per cent of occupied private dwellings requiring one or more extra bedrooms (based on Canadian National Occupancy Standard)	-0.51
UNEMPLOYED_IER	Per cent of people aged 15 years and over who are unemployed	-0.48
GROUP	Per cent of occupied private dwellings that are group occupied private dwellings	-0.39
OWNING	Per cent of occupied private dwellings owning dwelling without a mortgage	0.34
UNINCORP	Per cent of dwellings with at least one person who is an owner of an unincorporated enterprise	0.47
INC_HIGH	Per cent of people with stated annual household equivalised income greater than $91,000 (approx. 9th and 10th deciles)	0.52
HIGHMORTGAGE	Per cent of occupied private dwellings paying mortgage greater than $2,800 per month	0.64
MORTGAGE	Per cent of occupied private dwellings owning dwelling (with a mortgage)	0.66
HIGHBED	Per cent of occupied private dwellings with four or more bedrooms	0.75

The 2021 IER index explains 35% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 38% (2016 IER), 39% (2011 IER) and 35% (2006 IER).

Removal of highly correlated variables

No variables were dropped based on high correlations.

Removal of low loading variables

The table below shows the variable dropped from the IER because of a low loading.

IER variables removed due to low loadings
Variable name	Variable description	Variable loading
HIGHRENT	Per cent of occupied private dwellings paying rent greater than $470 per week	0.07

Index of Education and Occupation

The IEO summarises variables related to educational qualifications and vocational skills, according to the concept described in defining the concept behind each of the four indexes. The final variable list and corresponding loadings are shown below.

Final IEO variables and loadings
Variable name	Variable description	Variable loading
NOYR12ORHIGHER	Per cent of people aged 15 years and over whose highest level of education is Year 11 or lower. Includes Certificate I and II	-0.87
OCC_SKILL5	Per cent of employed people who work in a Skill Level 5 occupation	-0.76
OCC_SKILL4	Per cent of employed people who work in a Skill Level 4 occupation	-0.75
CERTIFICATE	Per cent of people aged 15 years and over whose highest level of educational attainment is a certificate III or IV qualification	-0.65
UNEMPLOYED	Per cent of people (in the labour force) unemployed	-0.41
DIPLOMA	Per cent of people aged 15 years and over whose highest level of educational attainment is a diploma qualification	0.37
ATUNI	Per cent of people aged 15 years and over at university or other tertiary institution	0.48
OCC_SKILL1	Per cent of employed people who work in a Skill Level 1 occupation	0.90

The 2021 IEO index explains 46% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 41% (2016 IEO), 47% (2011 IEO), 52% (2006 IEO) and 46% (2001 IEO).

Removal of highly correlated variables

DEGREE (per cent of people aged 15 years and over with a degree or higher qualification) had high correlations with NOYR12ORHIGHER (-0.83) and OCC_SKILL1 (0.82). It was decided that the proportion of people with a degree was already well explained by the index, and DEGREE was removed.

Removal of low loading variables

The table below shows the variables dropped from the IEO because of low loadings. The variables are shown in the order they were removed, with the loadings from the iteration when they were removed.

IEO variables removed due to low loadings
Variable name	Variable description	Variable loading
NOEDU	Per cent of people aged 15 years and over who have no educational attainment	0.29
OCC_SKILL2	Per cent of employed people who work in a skill level 2 occupation	0.27
ATSCHOOL	Per cent of people aged 15 years and over who are still attending secondary school	0.05

Summary of variables included in indexes

The table below shows the final set of variables included in each index.

List of variables in each index, by socio-economic dimension
Dimension	Index of Relative Socio-Economic Disadvantage	Index of Relative Socio-Economic Advantage and Disadvantage	Index of Economic Resources	Index of Education and Occupation
Income	INC_LOW	INC_HIGH INC_LOW	INC_HIGH INC_LOW
Education	NOYR12ORHIGHER NOEDU	NOYR12ORHIGHER NOEDU CERTIFICATE ATUNI DIPLOMA		NOYR12ORHIGHER CERTIFICATE ATUNI DIPLOMA
Employment	UNEMPLOYED	UNEMPLOYED	UNEMPLOYED_IER	UNEMPLOYED
Occupation	OCC_LABOUR OCC_DRIVERS OCC_SERVICE_L	OCC_LABOUR OCC_DRIVERS OCC_SERVICE_L OCC_SALES_L OCC_MANAGER OCC_PROF		OCC_SKILL1 OCC_SKILL4 OCC_SKILL5
Housing	LOWRENT OVERCROWD	LOWRENT OVERCROWD HIGHRENT HIGHBED HIGHMORTGAGE	LOWRENT OVERCROWD OWNING MORTGAGE HIGHBED HIGHMORTGAGE
Other	CHILDJOBLESS ONEPARENT DISABILITYU70 ENGLISHPOOR NOCAR SEPDIVORCED	CHILDJOBLESS ONEPARENT DISABILITYU70 SEPDIVORCED	UNINCORP ONEPARENT LONE GROUP NOCAR

Distribution of the indexes

This section presents frequency histograms for each index at the SA1 level. The index distributions have generally similar shapes to those from SEIFA 2016.

Index of Relative Socio-Economic Disadvantage

The IRSD distribution shown below has a very long left tail. The values range from about 143 to 1207. This index contains only disadvantage indicators, so there is more scope to distinguish between disadvantaged areas than advantaged areas.

The steep peak for this distribution means that there will be little difference in the scores of SA1s in the middle deciles, and so the characteristics related to the IRSD variables may not vary much across SA1s in these middle deciles.

IRSD score distribution
IRSD score group (midpoint)	Number of SA1s
25	0
75	0
125	2
175	0
225	4
275	14
325	16
375	22
425	42
475	65
525	98
575	105
625	160
675	286
725	544
775	1,100
825	2,081
875	3,633
925	6,163
975	9,666
1025	14,075
1075	14,917
1125	6,105
1175	180
1225	2
1275	0
1325	0
1375	0

Index of Relative Socio-Economic Advantage and Disadvantage

The scores for IRSAD range from 435 to 1273. The right-hand slope is not as steep in the IRSAD distribution as it is in the IRSD distribution. This means that the IRSAD scores of SA1s in the upper deciles are more spread out than the IRSD scores in these deciles, and this index has a greater ability to differentiate between the more advantaged areas.

IRSAD score distribution
IRSAD score group (midpoint)	Number of SA1s
25	0
75	0
125	0
175	0
225	0
275	0
325	0
375	0
425	1
475	7
525	14
575	53
625	101
675	189
725	410
775	932
825	2,464
875	5,271
925	8,002
975	10,769
1025	11,590
1075	9,715
1125	6,512
1175	2,989
1225	260
1275	1
1325	0
1375	0

Index of Economic Resources

The scores for IER range from 299 to 1315.

IER score distribution
IER score group (midpoint)	Number of SA1s
25	0
75	0
125	0
175	0
225	0
275	1
325	4
375	7
425	21
475	39
525	76
575	88
625	108
675	189
725	356
775	955
825	2,136
875	4,595
925	8,064
975	10,887
1025	12,086
1075	10,922
1125	6,287
1175	2,208
1225	257
1275	15
1325	2
1375	0

Index of Education and Occupation

The scores for IEO range from 407 to 1372.

IEO score distribution
IEO score group (midpoint)	Number of SA1s
25	0
75	0
125	0
175	0
225	0
275	0
325	0
375	0
425	1
475	0
525	1
575	3
625	13
675	68
725	216
775	849
825	2,736
875	5,989
925	9,398
975	10,750
1025	10,257
1075	8,343
1125	6,401
1175	3,784
1225	593
1275	7
1325	0
1375	1

Basic output: scores, ranks, deciles and percentiles

Scores

The scores are a weighted combination of the selected indicators of advantage and disadvantage which have been standardised to a distribution with a mean of 1000 and standard deviation of 100. An area with all of its indicators equal to the national average will receive a score of 1000. The score for an area will increase if an area has: an indicator of advantage that is greater than the national average; or an indicator of disadvantage that is less than the national average. Conversely, the score for an area will decrease if an area has: an indicator of disadvantage that is greater than the national average; or an indicator of advantage that is less than the national average. Indicators which are further away from the national average have a larger impact on the score.

For areas larger than SA1, the scores are a population weighted average of constituent SA1 scores, as described in Step 11 of the step by step process.

It is important to remember that the scores are an ordinal measure (discussed in more detail in broad guidelines on appropriate use), so care should be taken when comparing scores. For example, an area with a score of 500 is not twice as disadvantaged as an area with a score of 1000; it simply has more markers of relative disadvantage.

Ranks, Deciles and Percentiles

Because the scores are an ordinal measure, it is often more appropriate to use alternative measures rather than the raw scores. We have calculated ranks, deciles and percentiles and included these in the output spreadsheets. These measures are defined below.

Rank

The areas are ranked in order of their score, from lowest to highest, with rank one representing the most disadvantaged area. In the spreadsheets, rankings are provided on both a national basis and a state/territory basis. The same set of scores is used for each ranking; the scores are not recalculated for each state/territory.
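A minimal Python sketch of national and state/territory ranking from a single set of scores follows. Tied scores receive distinct ranks by input order here; how ties are handled in the published spreadsheets is not described in this section, so treat that as an assumption. The scores and state labels are made up.

```python
from typing import Dict, List

def ascending_ranks(scores: List[float]) -> List[int]:
    """Rank 1 = lowest score (most disadvantaged)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def state_ranks(scores: List[float], states: List[str]) -> List[int]:
    """Rank within each state/territory, reusing the same national scores."""
    groups: Dict[str, List[int]] = {}
    for i, st in enumerate(states):
        groups.setdefault(st, []).append(i)
    ranks = [0] * len(scores)
    for idx in groups.values():
        for i, r in zip(idx, ascending_ranks([scores[i] for i in idx])):
            ranks[i] = r
    return ranks

scores = [1010, 880, 1105, 950]
states = ["NSW", "VIC", "NSW", "VIC"]
print(ascending_ranks(scores))        # [3, 1, 4, 2]
print(state_ranks(scores, states))    # [1, 1, 2, 2]
```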

Deciles

All areas are ordered from lowest to highest score. The lowest 10% of areas are given a decile number of one, the next lowest 10% of areas are given a decile number of two, and so on, up to the highest 10% of areas, which are given a decile number of 10. This means that areas are divided into ten equal sized groups, depending on their score.

Percentiles

All areas are ordered from lowest to highest score. The lowest 1% of areas are given a percentile number of one, the next lowest 1% of areas are given a percentile number of two, and so on, up to the highest 1% of areas, which are given a percentile number of 100. This means that areas are divided into one hundred equal-sized groups, depending on their score. Deciles and percentiles are sometimes referred to generally as quantiles. Other commonly used quantiles include quintiles and quartiles; these are not included in the output spreadsheets, but they can easily be derived from the percentiles.
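Because 100 divides evenly by 5 and by 4, quintiles and quartiles can be derived by grouping consecutive percentiles, as in this sketch. It assumes the published percentiles run from 1 to 100 as defined above; the function names are hypothetical.

```python
def quintile_from_percentile(percentile):
    """Quintile (1..5): percentiles 1-20 -> 1, 21-40 -> 2, ..., 81-100 -> 5."""
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    return (percentile + 19) // 20


def quartile_from_percentile(percentile):
    """Quartile (1..4): percentiles 1-25 -> 1, 26-50 -> 2, ..., 76-100 -> 4."""
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    return (percentile + 24) // 25


quintiles = [quintile_from_percentile(p) for p in (1, 20, 21, 100)]
quartiles = [quartile_from_percentile(p) for p in (1, 25, 26, 100)]
```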

Geographic output levels for SEIFA 2021

The primary unit of analysis and the smallest area for which the indexes are available is the Statistical Area Level 1 (SA1). This is the recommended unit of analysis for SEIFA 2021.

For a selection of geographic areas larger than SA1, scores have been calculated by taking population-weighted averages of constituent SA1 scores. The output spreadsheets also contain some information about the distribution of SA1 index scores within larger areas. This enables users to consider the socio-economic diversity that can exist within a larger area.
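The kind of summary this enables can be sketched as below. The statistics chosen here (count, minimum, median, maximum and range) are illustrative assumptions; consult the output spreadsheets for the fields actually published. The SA1 scores are hypothetical.

```python
import statistics


def sa1_distribution(sa1_scores):
    """Summarise the spread of SA1 scores inside one larger area."""
    if not sa1_scores:
        raise ValueError("the larger area contains no scored SA1s")
    lowest, highest = min(sa1_scores), max(sa1_scores)
    return {
        "n_sa1": len(sa1_scores),
        "min": lowest,
        "median": statistics.median(sa1_scores),
        "max": highest,
        "range": highest - lowest,
    }


# Hypothetical SA1 scores within a single SA2.
summary = sa1_distribution([820, 990, 1010, 1150])
```

Two larger areas with similar overall scores can have very different ranges, which a single population-weighted score does not reveal.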

The table below summarises the output available at the different geographic levels.

Geographic output summary for SEIFA 2021
Geographic unit	Index score	SA1 distribution information
Statistical Area Level 1 (SA1)	Yes	N/A
Statistical Area Level 2 (SA2)	Yes	Yes
Statistical Area Level 3 (SA3)	No	Yes
Statistical Area Level 4 (SA4)	No	Yes
Local Government Area (LGA)	Yes	Yes
Suburbs and Localities (SAL)	Yes	Yes
Postal Area (POA)	Yes	Yes
Commonwealth Electoral Division (CED)	No	Yes
State Electoral Division (SED)	No	Yes

For geographies larger than SA1 that are not built up from SA1s (LGAs, SALs and POAs), a best fit correspondence of SA1s to the larger geographies was used. In the 2021 edition of the ASGS, Local Government Areas (LGAs), Suburbs and Localities (SALs) and Postal Areas (POAs) are constructed from Mesh Blocks. In some cases, particularly for certain SALs with small populations, the SA1 boundaries do not correspond closely to the higher-level area. For this reason, SEIFA scores for SALs and POAs with small populations should be used with caution, as the scores may have been calculated from populations that do not correspond closely to the actual population in the area. Refer to ABS Maps for information useful for identifying areas that do not correspond closely to the SA1 structure.
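One way to build a best fit correspondence is to allocate each SA1 to the larger area that contains most of its population, using Mesh Block population counts. The sketch below makes that assumption; it is not necessarily the rule behind the ABS correspondences, and the codes are hypothetical. The returned share indicates how closely an SA1 fits the area it was allocated to, which is the kind of mismatch that calls for caution with small SALs and POAs.

```python
from collections import defaultdict


def best_fit(mesh_blocks):
    """Allocate each SA1 to the larger area holding most of its population.

    mesh_blocks: iterable of (sa1_code, larger_area_code, population) tuples,
    one per Mesh Block. Returns sa1_code -> (larger_area_code, share), where
    share is the fraction of the SA1's population inside the chosen area.
    """
    population = defaultdict(lambda: defaultdict(int))
    for sa1, area, persons in mesh_blocks:
        population[sa1][area] += persons

    allocation = {}
    for sa1, by_area in population.items():
        chosen = max(by_area, key=by_area.get)  # first area wins a tie
        total = sum(by_area.values())
        share = by_area[chosen] / total if total else 0.0
        allocation[sa1] = (chosen, share)
    return allocation


# Hypothetical Mesh Blocks: SA1_A straddles two suburbs, SA1_B lies within one.
allocation = best_fit([
    ("SA1_A", "SAL_X", 120),
    ("SA1_A", "SAL_Y", 30),
    ("SA1_B", "SAL_Y", 200),
])
```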

The output spreadsheets contain specific references to the ABS publications from which the geography classifications and correspondences have been sourced.

Validation of the indexes

Once the indexes are calculated, they are checked to ensure that they are measuring the desired concept and that the results generally make sense. This validation is important to establish the credibility of the indexes and identify any issues that may have been missed in the construction of the indexes. The methods used to validate SEIFA 2021 include:

comparison of SEIFA 2021 rankings with 2016 rankings
identification of the drivers of change from SEIFA 2016 to 2021
seeking review from internal experts.

Relationships between the indexes

We examined SEIFA for internal consistency by looking at the correlations between the indexes. The table below shows the rank correlation matrix. All correlations are in the expected directions and show significant relationships. The IRSD is very highly correlated with the IRSAD (0.94).

Spearman's rank correlation matrix
Dimension	IRSD	IRSAD	IER	IEO
IRSD	1.00
IRSAD	0.94	1.00
IER	0.79	0.68	1.00
IEO	0.79	0.93	0.45	1.00
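Spearman's rank correlation is the Pearson correlation of the two sets of ranks, with tied values given their average rank. A minimal sketch follows for readers who want to run this kind of check on their own data; the input scores are hypothetical and it is not the ABS's own code.

```python
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation between two index scores for the same areas."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five areas on two indexes.
irsd = [900, 950, 1000, 1050, 1100]
irsad = [880, 990, 960, 1070, 1120]
rho = spearman(irsd, irsad)
```

Working on ranks rather than raw scores suits the ordinal nature of the indexes: only the ordering of areas affects the result.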

The indexes that measure specific dimensions of advantage and disadvantage (the IER and the IEO) are less strongly correlated with the other indexes, with the exception of the IEO and the IRSAD (0.93). The IER includes variables associated with high and low wealth that are not included in the other indexes. The IEO focuses solely on educational qualifications, employment and vocational skills.

The IER and the IEO are positively correlated (0.45), but this correlation is much weaker than that between any other pair of indexes. There is a significant difference between the concepts measured by these two indexes, and they do not share any common variables.

Comparing 2016 and 2021 rankings

The SA1 scores from 2021 were checked against comparable areas from 2016, where possible, to identify areas with large changes and determine whether these changes were plausible. Some changes are to be expected, particularly in areas with high population growth and areas that have been affected by economic changes in the region. This process did not identify any results that seemed unrealistic.
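A check of this kind can be sketched as follows. Comparing percentiles rather than raw scores avoids comparing scores that are each relative to their own Census year. The sketch assumes 2016 and 2021 areas have already been matched to common codes (in practice boundaries change between ASGS editions), and the 20-percentile threshold is a hypothetical choice.

```python
def flag_large_changes(pct_2016, pct_2021, threshold=20):
    """Areas present in both years whose national percentile moved by more than `threshold`.

    pct_2016, pct_2021: area code -> national percentile (1..100).
    Returns area code -> change in percentile (2021 minus 2016).
    """
    flagged = {}
    for code in sorted(pct_2016.keys() & pct_2021.keys()):
        change = pct_2021[code] - pct_2016[code]
        if abs(change) > threshold:
            flagged[code] = change
    return flagged


# Hypothetical matched areas; "C" and "D" appear in only one year and are skipped.
flags = flag_large_changes({"A": 10, "B": 50, "C": 80}, {"A": 45, "B": 55, "D": 30})
```

Flagged areas are candidates for closer inspection (for example, population growth or regional economic change), not evidence of an error.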

Validation of higher-level area indexes

Most of the validation focused on the SA1-level indexes, because SA1s are the primary unit of analysis and indexes for higher-level areas (e.g. SA2) are population-weighted averages of the SA1 scores. However, we conducted basic validation checks on all of the higher-level area indexes that we produced. This process did not identify any results that seemed unrealistic.

Using and interpreting SEIFA

This chapter provides information to assist in the appropriate use of SEIFA and to help users gain the most value from the product.

Broad guidelines on appropriate use

Area level indexes

The indexes are assigned to areas, not to individuals. They indicate the collective socio-economic characteristics of the people living in an area. A relatively disadvantaged area is likely to have a high proportion of relatively disadvantaged people. However, such an area is also likely to contain some people who are relatively advantaged. When area level indexes are used as proxy measures of individual level socio-economic advantage and disadvantage, many people are likely to be misclassified. This is known as the ecological fallacy. Wise and Mathews (2011) conducted an investigation into the extent of this issue as it relates to SEIFA.

Ordinal indexes

As measures of socio-economic level, the indexes are best interpreted as ordinal measures. They can be used to rank areas, and are also useful for understanding the distribution of socio-economic conditions across different areas. The index scores are on an arbitrary numerical scale and do not represent a quantity of advantage or disadvantage. For example, we cannot infer that an area with an index value of 1000 is twice as advantaged as an area with an index value of 500.

For ease of interpretation, we generally recommend using the index rankings and quantiles (e.g. deciles) for analysis, rather than using the index scores. Index scores are still provided in the output and can still be used for analysis when appropriate. For more information on index scores, rankings, and quantiles, refer to basic output: scores, ranks, deciles and percentiles.

Importance of the underlying variables

Each index is constructed using a weighted combination of selected variables. The indexes are dependent on the set of variables chosen for the analysis. A different set of underlying variables would result in a different index. However, due to the large number of variables in each index, removing or altering a single variable will usually not have a large effect.

Users should consider the aspect of socio-economic advantage and disadvantage in which they are interested and examine the underlying set of variables in each index. This will allow them to make an informed decision on whether an index is appropriate for their particular purpose. Choice of index provides some tips on choosing which of the four indexes to use.

Choice of index

Depending on the aim or context of the analysis, one of the SEIFA indexes may be more appropriate than the others. Below are some aspects to be considered.

The concept and variables underlying each index. The concepts behind each index are described in defining the concept behind each of the four indexes. The final variable lists for each index are in the technical details of each index: variables and loadings.
The degree to which the four indexes are correlated with each other – this is discussed in relationships between the indexes.
The IRSD ranks areas on a continuum from most disadvantaged to least disadvantaged, while the other three indexes (IRSAD, IER, IEO) rank areas on a continuum from most disadvantaged/least advantaged to most advantaged/least disadvantaged.
The IRSD and IRSAD are more general measures in the sense that they summarise variables from a wider range of socio-economic dimensions. The IER and IEO are more targeted measures aimed at capturing narrower concepts.
Simpler measures, such as income or employment status, may be more appropriate than SEIFA for some analysis. For an in-depth discussion on choosing a socio-economic measure, refer to Information Paper: Measures of Socioeconomic Status, New Issue for June 2011.

Using index scores for areas larger than SA1

Because the indexes are area level measures, they tend to mask some underlying diversity. In some applications of the indexes, it may be important to identify diversity of socio-economic characteristics within areas.

When using an index at a geographic level higher than SA1 (e.g. SA2s and LGAs), we do have some scope to assess the diversity within that area by looking at its constituent SA1s. There is further discussion about assessing diversity within areas in Wise and Mathews (2011) and Radisich and Wise (2012). The second paper also proposes an additional measure that can be used to identify diverse larger areas. This measure is called the ‘SA1-concentration score’ and can identify the presence of disadvantaged SA1s within an overall advantaged large area.

To enable the analyses described above, an additional type of output has been released for SEIFA 2021. For all geographic levels higher than SA1 for which index scores are released, the corresponding SA1 distributions within those areas have been presented in spreadsheets.

As noted previously, SEIFA scores for SALs and POAs with small populations should be used with caution, because the SA1 boundaries may not correspond closely to the higher level area. For more information, refer to geographic output levels for SEIFA 2021.

Mapping the indexes

Maps of the indexes are an excellent way of observing the spatial distribution of relative socio-economic advantage and disadvantage. Refer to interactive maps for available maps of the SEIFA 2021 indexes.

Using the indexes as contextual variables in social analysis

SEIFA index ranks and deciles are commonly merged onto a person level dataset based on the area in which that person resides. The indexes can then be used to help investigate the relationship between disadvantage or advantage and other variables of interest. This type of analysis can yield some very interesting findings; however, it is important to interpret the findings correctly. Some interpretive issues are discussed below.

A SEIFA index refers to the area in which a person lives. It is a contextual variable. It is incorrect to say that a person is very disadvantaged just because they live in a very disadvantaged area. It is true that living in a very disadvantaged area may disadvantage them to a certain extent, but it is possible that they are advantaged in other respects such as having a good education and earning a high income, and are therefore not typical of other residents in that area. The issue of diversity of individuals within areas is further investigated and discussed in SEIFA: Getting a Handle on Individual Diversity Within Areas, 2011.

It is desirable to use the smallest geographic unit possible when merging an index onto another dataset. In the case of SEIFA 2021, the SA1 is the smallest unit available, and, if possible, SA1s should be derived for the records in the dataset to which SEIFA scores are being appended.
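Appending SEIFA at SA1 level amounts to a left join on the SA1 code. The sketch below uses plain dictionaries; the field names (`sa1_code`, `irsd_decile`) and the codes are hypothetical. Records whose SA1 has no matching SEIFA value receive None rather than being dropped, so they can be reported separately.

```python
def attach_seifa(persons, seifa_by_sa1, field="irsd_decile"):
    """Left-join an SA1-level SEIFA value onto person-level records.

    persons: list of dicts, each with an "sa1_code" key.
    seifa_by_sa1: SA1 code -> SEIFA value (e.g. a decile).
    Returns new dicts; unmatched SA1 codes get None.
    """
    merged = []
    for person in persons:
        record = dict(person)  # copy so the input records are not modified
        record[field] = seifa_by_sa1.get(person["sa1_code"])
        merged.append(record)
    return merged


persons = [
    {"id": 1, "sa1_code": "10101100001"},
    {"id": 2, "sa1_code": "10101100002"},
    {"id": 3, "sa1_code": "10101100003"},
]
irsd_deciles = {"10101100001": 3, "10101100002": 8}
merged = attach_seifa(persons, irsd_deciles)
```

With pandas, the equivalent is `DataFrame.merge` with `how="left"` on the SA1 code column. Either way, the attached decile describes the person's area, not the person.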

Area-based quantiles versus population-based quantiles

The word ‘quantiles’ is used to collectively describe measures such as percentiles and deciles. In the spreadsheets in which the indexes are presented, quantiles (percentiles and deciles) are presented in addition to the index scores and rankings, as described in basic output: scores, ranks, deciles and percentiles. These quantiles are calculated by dividing the areas into groups that each contain an equal number of areas. These are called area-based quantiles.

An alternative way of defining the quantiles is to divide the areas into groups based on the number of people living in them. Each quantile would then contain an equal number of people (or as close to equal as can be achieved), rather than an equal number of areas. These are called population-based quantiles.

The ABS publishes area-based quantiles because they are easier to interpret, given that SEIFA is an area-based measure, and they serve most analytical purposes. In some instances, however, population-based quantiles are appropriate. Users can create their own population-based quantiles using information already available in the output spreadsheets, and population-based deciles are also available in Census TableBuilder. Because population-based quantiles can be difficult to interpret, users should take care in how they apply them. Population-based quantiles represent groups of individuals who live in similarly ranked areas, as opposed to groups of similarly ranked individuals.
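Population-based quantiles can be sketched from area scores and populations as below. Areas are sorted by score and each is assigned to a group according to where the midpoint of its population falls on the cumulative population scale. Whole areas are never split, so groups are only approximately equal in size. The midpoint rule is an assumption, and Census TableBuilder may use a different rule; the example uses two groups to stay small.

```python
def population_quantiles(areas, groups=10):
    """Assign each area to a population-based quantile group.

    areas: list of (area_code, score, population) tuples.
    An area goes to the group containing the midpoint of its population on the
    cumulative population scale (areas sorted from lowest to highest score).
    """
    ordered = sorted(areas, key=lambda area: area[1])
    total = sum(population for _, _, population in ordered)
    if total <= 0:
        raise ValueError("total population must be positive")

    result = {}
    people_before = 0
    for code, _, population in ordered:
        doubled_midpoint = 2 * people_before + population  # 2 x midpoint, keeps integers
        group = groups * doubled_midpoint // (2 * total) + 1
        result[code] = min(group, groups)
        people_before += population
    return result


areas = [("A", 900, 600), ("B", 950, 100), ("C", 1000, 100), ("D", 1100, 200)]
halves = population_quantiles(areas, groups=2)
```

Here area A alone holds 60% of the people, so the lower half contains one area and the upper half three; area-based halves would instead place A and B in the lower half. Each group describes people living in similarly ranked areas, not similarly ranked people.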

References

Australian Bureau of Statistics (Aug 2007), Australian Social Trends, 2007, ABS Website, accessed 20 April 2023.

Australian Bureau of Statistics (Jun 2011), Information Paper: Measures of Socioeconomic Status, New Issue for June 2011, ABS Website, accessed 20 April 2023.

Australian Bureau of Statistics (Nov 2019), ANZSCO - Australian and New Zealand Standard Classification of Occupations, 2013, Version 1.3, ABS Website, accessed 20 April 2023.

Australian Bureau of Statistics (2019-20), Housing Occupancy and Costs, ABS Website, accessed 20 April 2023.

Australian Bureau of Statistics (2021), Census of Population and Housing: Census dictionary, ABS Website, accessed 20 April 2023.

Australian Bureau of Statistics (Jul 2021–Jun 2026), Australian Statistical Geography Standard (ASGS) Edition 3, ABS Website, accessed 20 April 2023.

Australian Bureau of Statistics (May 2022), Education and Work, Australia, ABS Website, accessed 20 April 2023.

Jolliffe, I.T. (1986) Principal Component Analysis, Springer-Verlag, New York.

O’Rourke, N.; Hatcher, L. and Stepanski, E.J. (2005) A Step-by-Step Approach to Using SAS for Univariate and Multivariate Statistics, Second Edition, SAS Institute Inc., Cary, NC.

Radisich, P. and Wise, P. (2012) “Socio-Economic Indexes For Areas: Robustness, Diversity Within Larger Areas and the New Geography Standard”, Methodology Research Papers, cat. no. 1351.0.55.038, Australian Bureau of Statistics, Canberra.

Wise, P. and Mathews, R. (2011) “Socio-Economic Indexes For Areas: Getting a Handle on Individual Diversity Within Areas”, Methodology Research Papers, cat. no. 1351.0.55.036, Australian Bureau of Statistics, Canberra.

Historical research papers

Over the years, the ABS has released several research papers documenting its research and development on different aspects of the SEIFA indexes.

Wise, P. and Williamson, C. (2013) “Building on SEIFA: Finer Levels of Socio-Economic Summary Measures”, Methodology Research Papers, cat. no. 1352.0.55.135, Australian Bureau of Statistics, Canberra.

Baker, J. and Adhikari, P. (2007) “Socio-Economic Indexes for Individuals and Families”, Methodology Research Paper, cat. no. 1352.0.55.086, Australian Bureau of Statistics, Canberra.

Ciurej, M.; Tanton, R. and Sutcliffe, A. (2006) “Analysis of the Regional Distribution of Relatively Disadvantaged Areas using 2001 SEIFA”, Methodology Research Paper, cat. no. 1351.0.55.013, Australian Bureau of Statistics, Canberra.

Adhikari, P. (2006) “Socio-Economic Indexes for Areas: Introduction, Use and Future Directions”, Methodology Research Paper, cat. no. 1351.0.55.015, Australian Bureau of Statistics, Canberra.

Appendix: Variable specifications

This appendix describes each variable considered for inclusion in one of the 2021 indexes. Each variable description is followed by a description of its numerator and then of its denominator. The square brackets contain specifications for creating the numerator/denominator from Census data items, according to the mnemonics used in the Census of Population and Housing: Census Dictionary, 2021. The variables are arranged by socio-economic dimension.

Notes:

The Skill Level for each occupation can be found in ANZSCO – Australian and New Zealand Standard Classification of Occupations, Version 1.3
Household composition was ‘not classifiable’ if the household: contained only visitors or persons aged under 15 years on Census night; or was determined to be occupied on Census Night but the collector could not make contact; or could not be classified because there was insufficient information on the Census form.
The Canadian National Occupancy Standard determines housing appropriateness, using the number of bedrooms and the number, age, sex and relationships of household members. For more information refer to Housing Occupancy and Costs, Australia, 2019–20.

Income variables

Income variables - specification
Variable mnemonic	Variable description
INC_LOW	Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles). Numerator: number of people living in classifiable occupied private dwellings with stated annual household equivalised income between $1 and $25,999 [HIED = 02–05 and DWTD = 1 and UAICP = 1-2]. Denominator: number of people living in classifiable occupied private dwellings with stated household equivalised income [HIED = 01–16 and DWTD = 1 and UAICP = 1-2].
INC_HIGH	Per cent of people living in households with stated annual household equivalised income greater than or equal to $91,000 (approx. 9th and 10th deciles). Numerator: number of people living in classifiable occupied private dwellings with stated annual household equivalised income greater than or equal to $91,000 [HIED = 12–16 and DWTD = 1 and UAICP = 1-2]. Denominator: number of people living in classifiable occupied private dwellings with stated household equivalised income [HIED = 01–16 and DWTD = 1 and UAICP = 1-2].
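To illustrate how a specification translates into a variable, the sketch below computes INC_LOW from hypothetical person-level records carrying the HIED, DWTD and UAICP codes named in the table. The ABS builds the variables from Census data; the record layout here is an assumption, and any HIED code outside 01–16 is simply treated as excluded from the denominator.

```python
def percent(numerator, denominator):
    """Proportion as a percentage; None when the denominator is zero."""
    return None if denominator == 0 else 100.0 * numerator / denominator


HIED_STATED = {f"{code:02d}" for code in range(1, 17)}  # HIED = 01-16
HIED_LOW = {f"{code:02d}" for code in range(2, 6)}      # HIED = 02-05


def inc_low(persons):
    """INC_LOW for one area from hypothetical person records (dicts of codes)."""
    in_scope = [p for p in persons if p["DWTD"] == 1 and p["UAICP"] in (1, 2)]
    denominator = sum(1 for p in in_scope if p["HIED"] in HIED_STATED)
    numerator = sum(1 for p in in_scope if p["HIED"] in HIED_LOW)
    return percent(numerator, denominator)


persons = [
    {"HIED": "03", "DWTD": 1, "UAICP": 1},  # numerator and denominator
    {"HIED": "10", "DWTD": 1, "UAICP": 1},  # denominator only
    {"HIED": "&&", "DWTD": 1, "UAICP": 1},  # code outside 01-16: excluded
    {"HIED": "04", "DWTD": 2, "UAICP": 1},  # DWTD not 1: excluded
    {"HIED": "16", "DWTD": 1, "UAICP": 2},  # denominator only
]
value = inc_low(persons)
```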

Education variables

Education variables - specification
Variable mnemonic	Variable description
ATSCHOOL	Per cent of people aged 15 years and over who are attending secondary school. Numerator: number of people aged 15 years and over attending secondary school [AGEP > 14 and TYPP = 31, 32, 33, 39 and UAICP = 1-2]. Denominator: number of people aged 15 years and over (excluding educational institution attendance not stated) [AGEP > 14 and TYPP ne &&, VV and UAICP = 1-2].
ATUNI	Per cent of people aged 15 years and over attending a university or other tertiary institution. Numerator: number of people aged 15 years and over at a university or other tertiary institution [AGEP > 14 and TYPP = 41, 42, 49]. Denominator: number of people aged 15 years and over (excluding educational institution attendance not stated) [AGEP > 14 and TYPP ne &&, VV and UAICP = 1-2].
CERTIFICATE	Per cent of people aged 15 years and over whose highest level of education is a Certificate Level III or IV qualification. Numerator: number of people aged 15 years and over with a Certificate Level III or IV qualification [HEAP = 51 and UAICP = 1-2]. Denominator: number of people aged 15 years and over (excluding highest level of education not stated or inadequately described) [HEAP ne 001, @@@, VVV, &&&].
DEGREE	Per cent of people aged 15 years and over whose highest level of education is a bachelor degree qualification or higher. Numerator: number of people aged 15 years and over whose highest level of education is a bachelor degree qualification or higher [HEAP = 1–3]. Denominator: number of people aged 15 years and over (excluding highest level of education not stated or inadequately described) [HEAP ne 001, @@@, VVV, &&&].
DIPLOMA	Per cent of people aged 15 years and over whose highest level of education is a diploma or advanced diploma. Numerator: number of people aged 15 years and over whose highest level of education is a diploma or advanced diploma qualification [HEAP = 4]. Denominator: number of people aged 15 years and over (excluding highest level of education not stated or inadequately described) [HEAP ne 001, @@@, VVV, &&&].
NOEDU	Per cent of people aged 15 years and over who have no formal educational attainment. Numerator: number of people aged 15 years and over who have no formal educational attainment [HEAP = 998]. Denominator: number of people aged 15 years and over (excluding highest level of education not stated or inadequately described) [HEAP ne 001, @@@, VVV, &&&].
NOYR12ORHIGHER	Per cent of people aged 15 years and over whose highest level of educational attainment is Year 11 or lower (includes Certificate Levels I and II; excludes those still at secondary school). Numerator: number of people aged 15 years and over whose highest level of education is Year 11 or lower (includes Certificate Levels I and II; excludes those still at secondary school) [HEAP = 613, 621, 720, 721, 724, 811, 812, 998 and TYPP ne 31, 32, 33, 39]. Denominator: number of people aged 15 years and over (excluding highest level of education not stated or inadequately described) [HEAP ne 001, @@@, VVV, &&&].

Employment variables

Employment variables - specifications
Variable mnemonic	Variable description
UNEMPLOYED	Per cent of people in the labour force who are unemployed. Numerator: number of people aged 15 years and over who are unemployed and looking for work [LFSP = 4–5]. Denominator: number of people aged 15 years and over in the labour force [LFSP = 1–5].
UNEMPLOYED_IER	Per cent of people aged 15 years and over who are unemployed. Numerator: number of people aged 15 years and over who are unemployed and looking for work [LFSP = 4–5]. Denominator: number of people aged 15 years and over (excluding labour force status not stated) [LFSP = 1–6].
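The two unemployment variables share a numerator and differ only in their denominators. The sketch below, using hypothetical counts of people aged 15 years and over by LFSP code, shows how the choice of denominator changes the value.

```python
def unemployment_variables(lfsp_counts):
    """UNEMPLOYED and UNEMPLOYED_IER for one area.

    lfsp_counts: LFSP code (int) -> number of people aged 15 years and over.
    """
    def count(codes):
        return sum(lfsp_counts.get(code, 0) for code in codes)

    unemployed = count(range(4, 6))    # LFSP = 4-5
    labour_force = count(range(1, 6))  # LFSP = 1-5
    stated = count(range(1, 7))        # LFSP = 1-6
    return {
        "UNEMPLOYED": 100.0 * unemployed / labour_force if labour_force else None,
        "UNEMPLOYED_IER": 100.0 * unemployed / stated if stated else None,
    }


# Hypothetical counts for one area.
rates = unemployment_variables({1: 500, 2: 200, 3: 50, 4: 30, 5: 20, 6: 200})
```

Because UNEMPLOYED_IER divides by everyone with a stated labour force status, including people not in the labour force, it is never larger than UNEMPLOYED for the same area.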

Occupation variables

Occupation variables - specifications
Variable mnemonic	Variable description
OCC_DRIVERS	Per cent of employed people classified as Machinery Operators and Drivers. Numerator: number of employed people classified as Machinery Operators and Drivers [OCCP = 7]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].
OCC_LABOUR	Per cent of employed people classified as Labourers. Numerator: number of employed people classified as Labourers [OCCP = 8]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].
OCC_MANAGER	Per cent of employed people classified as Managers. Numerator: number of employed people classified as Managers [OCCP = 1]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].
OCC_PROF	Per cent of employed people classified as Professionals. Numerator: number of employed people classified as Professionals [OCCP = 2]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].
OCC_SALES_L	Per cent of employed people classified as Low-Skill Sales Workers. Numerator: number of employed people classified as Low-Skill Sales Workers [OCCP = 6 and OCSKP = 5]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].
OCC_SERVICE_L	Per cent of employed people classified as Low-Skill Community and Personal Service Workers. Numerator: number of employed people classified as Low-Skill Community and Personal Service Workers [OCCP = 4 and OCSKP = 4–5]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].
OCC_SKILL1	Per cent of employed people who work in a Skill Level 1 occupation. Numerator: number of employed people who work in a Skill Level 1 occupation [OCSKP = 1]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].
OCC_SKILL2	Per cent of employed people who work in a Skill Level 2 occupation. Numerator: number of employed people who work in a Skill Level 2 occupation [OCSKP = 2]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].
OCC_SKILL4	Per cent of employed people who work in a Skill Level 4 occupation. Numerator: number of employed people who work in a Skill Level 4 occupation [OCSKP = 4]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].
OCC_SKILL5	Per cent of employed people who work in a Skill Level 5 occupation. Numerator: number of employed people who work in a Skill Level 5 occupation [OCSKP = 5]. Denominator: number of employed people with a stated occupation [OCCP = 1–8].

Housing variables

Housing variables - specification
Variable mnemonic	Variable description
FEWBED	Per cent of occupied private dwellings with one or no bedrooms. Numerator: number of classifiable occupied private dwellings with one or no bedrooms [BEDD = 00-01 and HHCD = 11–32]. Denominator: number of classifiable occupied private dwellings with a stated number of bedrooms [BEDD ne &&, @@ and HHCD = 11–32].
HIGHBED	Per cent of occupied private dwellings with four or more bedrooms. Numerator: number of classifiable occupied private dwellings with four or more bedrooms [BEDD = 04–30 and HHCD = 11–32]. Denominator: number of classifiable occupied private dwellings with a stated number of bedrooms [BEDD ne &&, @@ and HHCD = 11–32].
HIGHMORTGAGE	Per cent of occupied private dwellings paying $3,000 or more per month in mortgage repayments. Numerator: number of mortgaged classifiable occupied private dwellings with monthly mortgage repayments greater than or equal to $3,000 [MRED = 3000–9999, HHCD = 11–32 and TEND = 1-7]. Denominator: number of classifiable occupied private dwellings (excluding those with tenure not stated or not applicable, mortgage not stated or not applicable and rent not stated) [TEND ne &, @, MRED ne &&&&, RNTD ne &&&& and HHCD = 11–32].
HIGHRENT	Per cent of occupied private dwellings paying $500 or more per week in rent. Numerator: number of rented classifiable occupied private dwellings with rent payments greater than or equal to $500 per week [RNTD = 500–9999, HHCD = 11–32 and TEND = 1-7]. Denominator: number of classifiable occupied private dwellings (excluding those with tenure not stated, mortgage not stated and rent not stated) [TEND ne &, @, MRED ne &&&&, RNTD ne &&&& and HHCD = 11–32].
LOWRENT	Per cent of occupied private dwellings paying less than $250 per week in rent (excluding $0 per week). Numerator: number of rented classifiable occupied private dwellings with rent payments less than $250 per week (excluding rent-free and renting from employer) [RNTD = 1–249 and HHCD = 11–32 and LLDD ne 51, 52, &&, @@]. Denominator: number of classifiable occupied private dwellings (excluding those with tenure not stated, mortgage not stated and rent not stated) [TEND ne &, @, MRED ne &&&&, RNTD ne &&&& and HHCD = 11–32].
OVERCROWD	Per cent of occupied private dwellings requiring one or more extra bedrooms (based on Canadian National Occupancy Standard). Numerator: number of classifiable occupied private dwellings needing one or more extra bedrooms (based on Canadian National Occupancy Standard) [HOSD = 01-04 and HHCD = 11–32]. Denominator: number of classifiable occupied private dwellings (excluding dwellings where housing utilisation cannot be determined or is not stated) [HOSD ne 10, &&, @@ and HHCD = 11–32].
OWNING	Per cent of occupied private dwellings owned by their occupants without a mortgage. Numerator: number of households owning the dwelling they occupy without a mortgage (includes caravans in parks) [TEND = 1 and HHCD = 11–32]. Denominator: number of classifiable occupied private dwellings (excluding tenure not stated) [TEND ne &, @ and HHCD = 11–32].
MORTGAGE	Per cent of occupied private dwellings owned by their occupants with a mortgage. Numerator: number of mortgaged classifiable occupied private dwellings (including those with mortgage not stated) [TEND = 2, 3, 6 and HHCD = 11–32]. Denominator: number of classifiable occupied private dwellings (excluding tenure not stated) [TEND ne &, @ and HHCD = 11–32].
SPAREBED	Per cent of occupied private dwellings with one or more bedrooms spare (based on Canadian National Occupancy Standard). Numerator: number of classifiable occupied private dwellings with one or more spare bedrooms (based on Canadian National Occupancy Standard) [HOSD = 06-09 and HHCD = 11–32]. Denominator: number of classifiable occupied private dwellings (excluding dwellings where housing utilisation cannot be determined or is not stated) [HOSD ne 10, &&, @@ and HHCD = 11–32].

Other variables

Other variables - specifications
Variable mnemonic	Variable description
CHILDJOBLESS	Per cent of families with children under 15 years of age and jobless parents. Numerator: number of families with children aged under 15 and jobless parents [FMCF = 21, 31 and LFSF = 16, 17, 19, 25, 26]. Denominator: number of families (excluding not applicable or not stated) [FMCF ne @@@@ and LFSF ne 06, 11, 15, 18, 20, 21, 27, @@].
DISABILITYU70	Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age. Numerator: number of people aged under 70 years needing assistance in one or more of the three core activity areas of self-care, mobility and communication, because of a disability, long-term health condition (lasting six months or more) or old age [AGEP < 70 and ASSNP = 1]. Denominator: number of people aged under 70 years (excluding need for assistance not stated) [AGEP < 70 and ASSNP = 1–2].
ENGLISHPOOR	Per cent of people who do not speak English well. Numerator: number of people aged 5 years and over who speak English either not well or not at all [AGEP > 4 and ENGLP = 4, 5]. Denominator: number of people aged 5 years and over (excluding those who did not state their English proficiency or main language) [AGEP > 4 and ENGLP = 1–5].
GROUP	Per cent of occupied private dwellings that are group occupied private dwellings. Numerator: number of classifiable occupied private dwellings that are occupied by group households (including caravans in parks) [HHCD = 32]. Denominator: number of classifiable occupied private dwellings (including caravans in parks) [HHCD = 11–32].
HIGHCAR	Per cent of occupied private dwellings with three or more cars. Numerator: number of classifiable occupied private dwellings which had 3 or more registered motor vehicles at or near the dwelling [VEHD = 03–30 and HHCD = 11–32]. Denominator: number of classifiable occupied private dwellings (excluding number of vehicles not stated) [VEHD ne &&, @@ and HHCD = 11–32].
LONE	Per cent of occupied private dwellings that are lone person occupied private dwellings. Numerator: number of classifiable occupied private dwellings that are occupied by lone person households (including caravans in parks) [HHCD = 31]. Denominator: number of classifiable occupied private dwellings (including caravans in parks) [HHCD = 11–32].
NOCAR	Per cent of occupied private dwellings with no cars. Numerator: number of classifiable occupied private dwellings which did not have a registered motor vehicle at or near the dwelling [VEHD = 00 and HHCD = 11–32]. Denominator: number of classifiable occupied private dwellings (excluding number of vehicles not stated) [VEHD ne &&, @@ and HHCD = 11–32].
ONEPARENT	Per cent of families that are one parent families with dependent offspring only. Numerator: number of families that are one parent families with dependent offspring only [FMCF = 3112, 3122, 3212]. Denominator: number of families [FMCF ne @@@@].
SEPDIVORCED	Per cent of people aged 15 years and over who are separated or divorced. Numerator: number of people aged 15 years or older who are separated or divorced [MSTP = 3, 4]. Denominator: number of people aged 15 years or older (excluding marital status not stated) [MSTP = 1–5].
UNINCORP	Per cent of occupied private dwellings with at least one person who is an owner of an unincorporated enterprise. Numerator: number of classifiable occupied private dwellings where at least one usual resident is the owner of an unincorporated enterprise (who was at their usual address upon enumeration) [SIEMP = 5-7, UAICP = 1 and HHCD = 11–32]. Denominator: number of classifiable occupied private dwellings (including caravans in parks) [HHCD = 11–32].


Scores

Ranks, Deciles and Percentiles

Rank

Deciles

Percentiles

Geographic output levels for SEIFA 2021

Validation of the indexes

Relationships between the indexes

Comparing 2016 and 2021 rankings

Validation of higher-level area indexes

Using and interpreting SEIFA

Broad guidelines on appropriate use

Area level indexes