Socio-Economic Indexes for Areas (SEIFA): Technical Paper
Ranks areas according to relative socio-economic advantage and disadvantage based on Census data.
What is SEIFA?
Socio-Economic Indexes for Areas (SEIFA) is a product developed by the ABS that ranks areas in Australia according to relative socio-economic advantage and disadvantage. The indexes are based on information from the five-yearly Census. SEIFA 2021 is based on Census 2021 data, and consists of four indexes, each focusing on a different aspect of socio-economic advantage and disadvantage, summarising a different subset of Census variables.
Some common uses of SEIFA include:
- determining areas that require funding and services,
- identifying new business opportunities, and
- assisting research into the relationship between socio-economic disadvantage and various social outcomes.
Purpose of technical paper
This paper provides information on the concepts, data, and methods used to create SEIFA 2021. The paper also contains discussion of the correct interpretation and appropriate use of the indexes.
This paper is intended to be a comprehensive reference for SEIFA 2021. Refer to Methodology for basic information that has been prepared for a general audience.
Historic context
A relative measure of socio-economic disadvantage was first produced by the ABS following the 1971 Census. Socio-Economic Indexes for Areas (SEIFA), in its present form, was first produced from the 1986 Census data.
Features of SEIFA 2021
This section highlights some important features of SEIFA 2021, and how they compare with SEIFA 2016.
SEIFA 2021 consists of the same four indexes as produced for SEIFA 2001, 2006, 2011 and 2016, each referring to the general population:
- the Index of Relative Socio-economic Disadvantage (IRSD),
- the Index of Relative Socio-economic Advantage and Disadvantage (IRSAD),
- the Index of Economic Resources (IER), and
- the Index of Education and Occupation (IEO).
We have generally aimed to maintain consistency between SEIFA 2021 and the previous release. However, some changes have been made and are described below.
Updated geography standard
SEIFA 2021 uses the Australian Statistical Geography Standard (ASGS) Edition 3 (2021). The structure of the ASGS Edition 3 is similar to the structure of ASGS Edition 2 (2016), though there have been updates to SA1 boundaries in some areas. In this version of the ASGS, State Suburbs (SSCs) are now referred to as Suburbs and Localities (SALs). SALs and Postal Areas (POAs) are constructed from Mesh Blocks rather than SA1s. For more information about the ASGS, refer to Changes from the previous edition of the ASGS.
Variables underpinning the indexes
Some variables were updated in line with new classification standards. For example, SEIFA 2016 used the Australian and New Zealand Standard Classification of Occupations, 2013 (ANZSCO), version 1.2A. SEIFA 2021 used the updated version, ANZSCO version 1.3, resulting in some changes to skill levels and some title changes. Variables using cut-off values in their definitions, such as high and low income, were updated to use new cut-off values. For more information about how the cut-off values were selected, refer to the description of candidate SEIFA variables. Census 2021 did not collect information about dwelling internet connection, and so the NONET variable from SEIFA 2016 could not be considered for inclusion in SEIFA 2021.
Output
SEIFA output includes a general introduction to SEIFA 2021, a basic Methodology, this Technical Paper and data which can be sourced from:
- Data cubes for a range of geographies
- TableBuilder data
- DataExplorer data (available after 11:30 on 27 April 2023)
- Interactive maps (available on 9 May 2023).
Interpretation of the indexes
To set some context for the rest of this paper, it is worth briefly touching on some important characteristics of the indexes.
The indexes are assigned to areas, not to individuals. They indicate the collective socio-economic characteristics of the people living in an area.
As measures of socio-economic conditions, the indexes are best interpreted as ordinal measures that rank areas. The index scores are based on an arbitrary numerical scale and do not represent a quantity of advantage or disadvantage.
For ease of interpretation, we generally recommend using the index rankings and quantiles (e.g. deciles) for analysis, rather than using the index scores. However, index scores are still provided in the output and can be used for more sophisticated analyses.
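As an illustration of the difference between scores, rankings and deciles, the following minimal sketch ranks a set of areas by score and splits them into ten groups of equal size. The area codes and scores are hypothetical, and the tie-breaking rule and the exact placement of decile boundaries are assumptions made for this example rather than the rules used in the published output.

```python
def rank_and_decile(scores):
    """Return {area: (rank, decile)}, where rank 1 is the lowest score.

    Ties are broken by area code, which is an assumption made for this sketch.
    """
    ordered = sorted(scores, key=lambda area: (scores[area], area))
    n = len(ordered)
    result = {}
    for position, area in enumerate(ordered):
        rank = position + 1
        decile = position * 10 // n + 1  # 1 (lowest scores) to 10 (highest scores)
        result[area] = (rank, decile)
    return result


# Hypothetical index scores for 20 areas (not real SEIFA data).
example_scores = {f"A{i:02d}": 900 + 10 * i for i in range(20)}
ranked = rank_and_decile(example_scores)
```

In this sketch, an area's decile depends only on its position in the ordering, so two areas in the same decile can have quite different scores.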
Each index is constructed based on a weighted combination of selected variables. The indexes are dependent on the set of variables chosen for the analysis. A different set of underlying variables would result in a different index.
The indexes are primarily designed to compare the relative socio-economic characteristics of areas at a given point in time. It can be very difficult to perform useful longitudinal or time series analysis, and this sort of analysis should be undertaken with care.
There is more discussion of these points in Using and Interpreting SEIFA.
Conceptual framework
The concept of relative socio-economic advantage and disadvantage
For SEIFA 2021, the concept of relative socio-economic advantage and disadvantage is the same as that used for SEIFA 2006, 2011 and 2016. That is, the ABS broadly defines relative socio-economic advantage and disadvantage in terms of people's access to material and social resources, and their ability to participate in society. This is described as ‘broadly defined’ in recognition of the many concepts that have emerged in the literature to describe advantage and disadvantage. The dimensions included in SEIFA are guided by international research, given the constraints of Census data. The Census does collect information on the key dimensions of income, education, employment, occupation, housing, and other miscellaneous indicators of advantage and disadvantage. Variables have been selected from these dimensions and are discussed further in the description of candidate SEIFA variables.
Another point to note is that SEIFA measures relative advantage and disadvantage at an area level, not at an individual level. Area level and individual level disadvantage are separate though related concepts. Area level disadvantage depends on the socio-economic conditions of a community or neighbourhood as a whole. These are primarily the collective characteristics of the area’s residents, but may also be characteristics of the area itself, such as a lack of public resources, transport infrastructure or high levels of pollution. However, it is important to remember that SEIFA is restricted to the information that is included in the Census.
It is recommended that SEIFA users consider their research interests, the definition of each SEIFA index and the variables included in each index to determine the appropriate index to use. The ABS produces four indexes, each summarising a different subset of Census variables, because users may be interested in different aspects of socio-economic advantage and disadvantage. Defining the concept behind each of the four indexes provides more information on the indexes included in SEIFA.
Defining the concept behind each of the four indexes
This section gives a description of the concept behind each of the four indexes. For a list of the variables included in each index, refer to the technical details for each index: variables and loadings.
The Index of Relative Socio-Economic Disadvantage
The IRSD summarises variables that indicate relative disadvantage. This index ranks areas on a continuum from most disadvantaged to least disadvantaged. A low score on this index indicates a high proportion of relatively disadvantaged people in an area. We cannot conclude that an area with a very high score has a large proportion of relatively advantaged people, as there are no variables in the index to indicate this. We can only conclude that such an area has a relatively low incidence of disadvantage.
The Index of Relative Socio-Economic Advantage and Disadvantage
The IRSAD summarises variables that indicate either relative advantage or disadvantage. This index ranks areas on a continuum from most disadvantaged to most advantaged.
An area with a high score on this index has a relatively high incidence of advantage and a relatively low incidence of disadvantage. Due to the differences in scope between this index and the IRSD, the scores of some areas can vary substantially between the two indexes. For example, consider a large area that has parts containing relatively disadvantaged people, and other parts containing relatively advantaged people. This area may have a low IRSD ranking, due to its pockets of disadvantage. However, its IRSAD ranking may be moderate, or even above average, because the pockets of advantage may offset the pockets of disadvantage.
The Index of Economic Resources
The IER summarises variables relating to the financial aspects of relative socio-economic advantage and disadvantage. These include indicators of high and low income, as well as variables that correlate with high or low wealth. Areas with higher scores have relatively greater access to economic resources than areas with lower scores.
The Index of Education and Occupation
The IEO summarises variables relating to the educational and occupational aspects of relative socio-economic advantage and disadvantage. This index focuses on the skills of the people in an area, both formal qualifications and the skills required to perform different occupations. A low score indicates that an area has a high proportion of people without qualifications, without jobs, and/or with low skilled jobs. A high score indicates many people with high qualifications and/or highly skilled jobs.
The data underpinning the indexes
This chapter looks at the data used to construct the four indexes in SEIFA 2021. All data is from the 2021 Census of Population and Housing.
The candidate list of variables
The candidate variable list from SEIFA 2016 was used for SEIFA 2021 with one exception: the dwelling internet connection variable was not included in Census 2021, and therefore was not available for inclusion in SEIFA 2021. The candidate variables fall into a multi-dimensional framework. The dimensions are:
- income
- education
- employment
- occupation
- housing
- miscellaneous.
Variables typically relate to persons but can also relate to families or dwellings.
Constructing the variables
Specifications
The variables were expressed as the proportion of units in an area with a specific characteristic. Depending on the variable, the unit may be a person, family, or dwelling. As each variable was expressed as a proportion, a numerator and denominator were required. The numerator for each variable was a subset of the denominator. In most cases, the numerator and denominator specifications were based on SEIFA 2016 specifications. Some minor changes were made to reflect updates to the Census 2021 variable coding. The Appendix contains detailed descriptions of the numerators and denominators used for all the SEIFA variables. Note that for convenience of presentation in the following sections, the variable proportions are expressed as percentages.
Place of Usual Residence
A person may or may not be enumerated at their place of usual residence on Census Night. Where possible for SEIFA 2021, a person's usual residence was used as the basis of analysis. Counts compiled on a ‘place of usual residence’ basis are appropriate for SEIFA, because they are less likely to be influenced by seasonal factors such as school holidays and snow seasons. However, it is important to understand that certain areas, for example SA1s in popular tourist destinations, may receive scores influenced by the specific time at which the Census is conducted. For instance, the 2021 Census was conducted in August 2021, which is during the high season for ski resorts and the townships in those areas. This means that these areas may have higher property rental prices, higher employment figures and greater income levels than if the Census were conducted in the low season.
Not stated and not applicable
We excluded records with ‘Not stated’ and ‘Not applicable’ values (for the particular variable) from both the numerator and denominator counts. Overseas visitors were excluded implicitly by using usual residence summation, and explicitly in the few instances where this was not possible. For details, see the Appendix.
The numerator and denominator values were calculated from confidentialised Census counts, with the confidentialisation process being the same as that used for the TableBuilder product and other Census releases. Where necessary, the derived proportions were adjusted so that none of them were less than zero or greater than one.
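The construction of a variable proportion described above can be sketched as follows. This is a minimal illustration only: the category labels and counts are hypothetical, and the function names are not taken from any ABS system. It shows ‘Not stated’ and ‘Not applicable’ records being dropped from both the numerator and the denominator, and the resulting proportion being kept within the range zero to one (which matters when confidentialised counts leave a numerator slightly larger than its denominator).

```python
EXCLUDED_CATEGORIES = {"Not stated", "Not applicable"}


def clamp_proportion(numerator, denominator):
    """Return numerator / denominator limited to [0, 1], or None if undefined."""
    if denominator <= 0:
        return None  # no usable denominator for this area
    return min(1.0, max(0.0, numerator / denominator))


def proportion_from_counts(counts, numerator_categories):
    """Build a variable proportion from category counts for one area.

    `counts` maps category label -> count; the labels are hypothetical.
    """
    denominator = sum(
        count for category, count in counts.items()
        if category not in EXCLUDED_CATEGORIES
    )
    numerator = sum(
        counts.get(category, 0) for category in numerator_categories
        if category not in EXCLUDED_CATEGORIES
    )
    return clamp_proportion(numerator, denominator)


# Hypothetical counts for one area.
area_counts = {"Has characteristic": 12, "Does not": 180,
               "Not stated": 8, "Not applicable": 50}
proportion = proportion_from_counts(area_counts, {"Has characteristic"})
```

Here the denominator is 192 rather than 250, because the 58 ‘Not stated’ and ‘Not applicable’ records are left out of both counts.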
Description of candidate SEIFA variables
This section contains a description of each variable on the candidate variable list. There is a brief discussion of how each variable relates to our definition of relative socio-economic advantage or disadvantage. The tables containing the variable descriptions also state whether the variable is an indicator of relative advantage (adv) or relative disadvantage (dis). Each subsection corresponds to one of the socio-economic dimensions listed in the candidate list of variables.
Income variables
Variable mnemonic | Variable description |
---|---|
INC_LOW | Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles) (dis) |
INC_HIGH | Per cent of people living in households with stated annual household equivalised income greater than or equal to $91,000 (approx. 9th and 10th deciles) (adv) |
Income is an important economic resource and is a core component of our notion of relative socio-economic advantage or disadvantage. Income variables are used in all the SEIFA indexes except the Index of Education and Occupation. The income variables are constructed using equivalised household income. Equivalisation is a process in which household income is adjusted by an ‘equivalence scale’, based on the number of adults and children in the household. The SEIFA variables using equivalised household income are calculated from the Census 2021 Equivalised Total Household Income variable (HIED).
The low income variable has been defined for SEIFA 2021 to capture approximately the first and second deciles of the equivalised household income distribution, excluding negative and nil income. That is, those people living in dwellings with equivalised household income between $1 and $499 per week ($1 to $25,999 per year). While the first quintile of equivalised household income was a strong indicator of disadvantage, people reporting negative and nil incomes tended to have profiles with less association with disadvantage. The cut-off of $91,000 for the high income variable was chosen to approximately capture the highest income quintile (top 20%).
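To make the equivalisation step and the income cut-offs concrete, here is a minimal sketch. The equivalence scale itself is not specified in this section, so the sketch assumes the commonly used ‘modified OECD’ scale (1.0 for the first adult, 0.5 for each additional adult and 0.3 for each child); that choice, the function names, and the treatment of income as a single dollar amount rather than a Census income range are all assumptions made for illustration. Only the $1 to $25,999 and $91,000 annual cut-offs are taken from the text.

```python
LOW_INCOME_MIN = 1         # annual, from the INC_LOW definition
LOW_INCOME_MAX = 25_999    # annual, from the INC_LOW definition
HIGH_INCOME_MIN = 91_000   # annual, from the INC_HIGH definition


def equivalence_factor(adults, children):
    """Assumed 'modified OECD' scale: 1.0 + 0.5 per extra adult + 0.3 per child."""
    if adults < 1:
        raise ValueError("a household needs at least one adult for this sketch")
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children


def equivalised_income(household_income, adults, children):
    """Adjust annual household income for household size and composition."""
    return household_income / equivalence_factor(adults, children)


def income_band(equivalised_annual_income):
    """Return 'INC_LOW', 'INC_HIGH', or None if the household is in neither group."""
    if LOW_INCOME_MIN <= equivalised_annual_income <= LOW_INCOME_MAX:
        return "INC_LOW"
    if equivalised_annual_income >= HIGH_INCOME_MIN:
        return "INC_HIGH"
    return None  # includes nil and negative incomes, which INC_LOW excludes


# Hypothetical households.
family = equivalised_income(52_000, adults=2, children=2)
single = equivalised_income(95_000, adults=1, children=0)
```

Under these assumptions, a two-adult, two-child household on $52,000 has an equivalised income of about $24,760 and falls in the low income group, while a nil-income household falls in neither group.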
Education variables
Variable mnemonic | Variable description |
---|---|
ATUNI | Per cent of people aged 15 years and over attending university or other tertiary institution (adv) |
ATSCHOOL | Per cent of people aged 15 years and over attending secondary school (adv) |
CERTIFICATE | Per cent of people aged 15 years and over whose highest level of education is a Certificate Level III or IV qualification (dis) |
DEGREE | Per cent of people aged 15 years and over whose highest level of education is a bachelor degree qualification or higher (adv) |
DIPLOMA | Per cent of people aged 15 years and over whose highest level of education is a diploma or advanced diploma (adv) |
NOEDU | Per cent of people aged 15 years and over who have no formal educational attainment (dis) |
NOYR12ORHIGHER | Per cent of people aged 15 years and over whose highest level of educational attainment is Year 11 or lower (includes Certificate Levels I and II; excludes those still at secondary school) (dis) |
Education is important when considering socio-economic advantage and disadvantage because the skills people obtain through school and post-school education can increase their own standard of living, as well as that of their community. Certificate Levels I and II are regarded as a lower educational attainment than Year 12 schooling, and are grouped in the NOYR12ORHIGHER variable, as opposed to the CERTIFICATE variable. This specific educational hierarchy was based on the ABS publication Education and Work, Australia. Note also that the CERTIFICATE variable is an indicator of relative disadvantage in SEIFA. It is true that having a certificate qualification gives a person an advantage over someone with no qualifications. However, at an area level, a high proportion of people with certificate qualifications correlates with other disadvantaging characteristics (e.g. lower skilled occupations).
Employment variables
Variable mnemonic | Variable description |
---|---|
UNEMPLOYED | Per cent of people in the labour force who are unemployed (dis) |
UNEMPLOYED_IER | Per cent of people aged 15 and over who are unemployed (dis) |
For most people, employment is their main source of income. Employment can also contribute to social participation and self-esteem. An unemployment variable is included in each of the SEIFA indexes. The standard unemployment variable (UNEMPLOYED) is calculated as the number of unemployed people divided by the number of people in the labour force (the unemployment rate). The variable used in the Index of Economic Resources (UNEMPLOYED_IER) is the number of unemployed people divided by the entire adult population of the area. This enables us to distinguish the unemployed from those employed and those not in the labour force, as the latter two groups were found to have significantly higher average wealth.
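The difference between the two unemployment variables is only in the denominator, as the following minimal sketch shows. The counts are hypothetical and the function name is an illustrative choice.

```python
def unemployment_variables(employed, unemployed, not_in_labour_force):
    """Return both unemployment proportions for one area.

    UNEMPLOYED uses the labour force as its denominator;
    UNEMPLOYED_IER uses everyone aged 15 and over.
    """
    labour_force = employed + unemployed
    adults = labour_force + not_in_labour_force
    return {
        "UNEMPLOYED": unemployed / labour_force if labour_force else None,
        "UNEMPLOYED_IER": unemployed / adults if adults else None,
    }


# Hypothetical area: 450 employed, 50 unemployed, 500 not in the labour force.
example = unemployment_variables(employed=450, unemployed=50,
                                 not_in_labour_force=500)
```

In this example the unemployment rate is 10%, while the proportion of all adults who are unemployed is 5%, because half of the adult population is outside the labour force.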
Occupation variables
Variable mnemonic | Variable description |
---|---|
OCC_DRIVERS | Per cent of employed people classified as Machinery Operators and Drivers (dis) |
OCC_LABOUR | Per cent of employed people classified as Labourers (dis) |
OCC_MANAGER | Per cent of employed people classified as Managers (adv) |
OCC_PROF | Per cent of employed people classified as Professionals (adv) |
OCC_SALES_L | Per cent of employed people classified as Low-Skill Sales Workers (dis) |
OCC_SERVICE_L | Per cent of employed people classified as Low-Skill Community and Personal Service Workers (dis) |
OCC_SKILL1 | Per cent of employed people who work in a Skill Level 1 occupation (adv) |
OCC_SKILL2 | Per cent of employed people who work in a Skill Level 2 occupation (adv) |
OCC_SKILL4 | Per cent of employed people who work in a Skill Level 4 occupation (dis) |
OCC_SKILL5 | Per cent of employed people who work in a Skill Level 5 occupation (dis) |
Occupation plays a significant part in determining socio-economic advantage and disadvantage. The ability to accumulate economic resources varies greatly with occupation type. The SEIFA 2021 occupation variables have been classified using the Australian and New Zealand Standard Classification of Occupations, Version 1.3 (ANZSCO).
Each occupation in ANZSCO is assigned a skill level ranging from 1 (highest) to 5 (lowest), which indicates the range and complexity of the set of tasks performed in a particular occupation. These skill levels were used as the basis of the occupation variables in the Index of Education and Occupation. For the purposes of OCC_SALES_L and OCC_SERVICE_L, low skill was determined as skill levels 4 and 5. The aim was to include broad categories of both advantaging and disadvantaging occupations, which complement the education variables by introducing the aspect of vocational skills. For the IRSD and the IRSAD, we used the ANZSCO major groups in conjunction with the skill levels to construct the occupation variables. This was done to identify occupations, or groups of occupations, which contribute to relative advantage or disadvantage at an area level. Using the major groups as well as the skill levels also helped to maintain consistency with SEIFA 2016.
Housing variables
Variable mnemonic | Variable description |
---|---|
FEWBED | Per cent of occupied private dwellings with one or no bedrooms (dis) |
HIGHBED | Per cent of occupied private dwellings with four or more bedrooms (adv) |
HIGHMORTGAGE | Per cent of occupied private dwellings paying more than $3,000 per month in mortgage repayments (adv) |
HIGHRENT | Per cent of occupied private dwellings paying more than $500 per week in rent (adv) |
LOWRENT | Per cent of occupied private dwellings paying less than $250 per week in rent (excluding $0 per week) (dis) |
MORTGAGE | Per cent of occupied private dwellings owning the dwelling they occupy (with a mortgage) (adv) |
OVERCROWD | Per cent of occupied private dwellings requiring one or more extra bedrooms (based on Canadian National Occupancy Standard) (dis) |
OWNING | Per cent of occupied private dwellings owning the dwelling they occupy (without a mortgage) (adv) |
SPAREBED | Per cent of occupied private dwellings with one or more bedrooms spare (based on Canadian National Occupancy Standard) (adv) |
- All dwelling variables excluded dwellings whose inhabitants all usually resided elsewhere, whose inhabitants were all under 15, or which could not be classified due to insufficient information. For numerator and denominator specifications, refer to the appendix: variable specifications.
Having an adequate and appropriate place to live is fundamental to socio-economic wellbeing. There are many aspects to housing that affect the quality of people’s lives. Dwelling size, cost and security of tenure are all important in this regard, and are therefore considered in SEIFA. Housing size is measured by the variables FEWBED, HIGHBED, OVERCROWD and SPAREBED. The variable FEWBED measures dwellings with one or no bedrooms, whilst the variable HIGHBED measures dwellings with four or more bedrooms. The variable OVERCROWD measures dwellings that do not have enough bedrooms for their occupants. Conversely, the variable SPAREBED measures dwellings that have one or more bedrooms spare for their occupants. These last two variables are calculated using the Canadian National Occupancy Standard, which determines housing appropriateness using the number of bedrooms and the number, age, sex and relationships of household members. For more information, refer to Housing Occupancy and Costs, 2019-20. Housing cost for SEIFA is measured using reported mortgage or rent payments. The cut-offs for the high and low groups were based on the ranges corresponding to the top and bottom quintiles. The high housing cost variables (HIGHMORTGAGE, HIGHRENT) are indicators of relative advantage, because they indicate greater financial capacity, as well as higher quality housing or locational advantage.
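The low housing cost variable (LOWRENT) is an indicator of relative disadvantage for the converse reasons: it indicates lower financial capacity, as well as lower quality housing or locational disadvantage.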
Owning a house, with or without a mortgage, is an indicator of advantage. Owning a house implies security of tenure, and for many Australian households the family home is their most valuable asset. Owning with a mortgage indicates the financial capacity to make repayments, as well as the possession of a future asset. The denominator of the mortgage and rent variable proportions is based on all households in an area.
The Census captures limited household information and does not, for instance, capture housing affordability, housing stress, dwelling value or dwelling quality. Although some variables, such as number of bedrooms and amount of rent or mortgage payments, may provide a proxy in some instances, their relationship to dwelling quality and dwelling value is not uniform across all areas.
An investigation using SEIFA 2016 was conducted on including housing stress, as defined by housing costs comprising 30% or more of the total household income, for lower income households only. The analysis showed that the impact on the overall distribution of SEIFA scores was small, and it was noted that the definition of housing stress had limitations.
Other indicators of relative advantage or disadvantage
Variable mnemonic | Variable description |
---|---|
CHILDJOBLESS | Per cent of families with children under 15 years of age and jobless parents (dis) |
DISABILITYU70 | Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age (dis) |
ENGLISHPOOR | Per cent of people who do not speak English well (dis) |
GROUP | Per cent of occupied private dwellings that are group occupied private dwellings (dis) |
HIGHCAR | Per cent of occupied private dwellings with three or more cars (adv) |
LONE | Per cent of occupied private dwellings that are lone person occupied private dwellings (dis) |
NOCAR | Per cent of occupied private dwellings with no cars (dis) |
ONEPARENT | Per cent of families that are one parent families with dependent offspring only (dis) |
SEPDIVORCED | Per cent of people aged 15 and over who are separated or divorced (dis) |
UNINCORP | Per cent of occupied private dwellings with at least one person who is an owner of an unincorporated enterprise (adv) |
- All dwelling variables excluded dwellings whose inhabitants all usually resided elsewhere, whose inhabitants were all under 15, or which could not be classified due to insufficient information. For numerator and denominator specifications, refer to the appendix: variable specifications.
The CHILDJOBLESS variable is defined as the proportion of families with children under 15 years old and jobless parents. The variable could be an indicator for entrenched disadvantage since children who grow up in jobless families may be more likely to experience intergenerational unemployment and diminished opportunities to participate in society.
The disability variable (DISABILITYU70) provides an indication of the physical or health aspects of socio-economic disadvantage. It is based on the Census question on need for assistance, which was developed to provide an indication of whether people have a profound or severe disability. People with a profound or severe disability are defined as those people needing help or assistance in one or more of the three core activity areas of self-care, mobility and communication, because of a disability, long term health condition (lasting six months or more) or old age. Disability limits employment opportunities, and possibly access to community resources. For the purpose of indicating relative socio-economic disadvantage, we have limited the scope of the SEIFA disability variable to people aged under 70, as was done for SEIFA 2016.
Questions relating to long-term health conditions were asked for the first time in Census 2021. These were not added to the SEIFA candidate variables for 2021, as many health researchers are interested in measuring individual health outcomes and analysing their relationship with socio-economic advantage/disadvantage. If SEIFA included health variables, it would make these relationships less clear and significantly harder to interpret. It was determined that it would be beneficial to retain the established approach to SEIFA, which is to only include the DISABILITYU70 variable.
A lack of fluency in English may limit employment opportunities and the ability to participate in society.
A car is both a material resource and a means of transport that enables greater freedom. A limitation of the NOCAR variable is that the need for a car varies depending on the remoteness of the area and access to public transport.
A past analysis of wealth data collected by the ABS showed that lone person households have lower average wealth (per person) than other household types. A higher proportion of lone person households in an area is correlated with lower ability to access economic resources beyond what is measured by the equivalised household income variables. An analysis of group households yielded a similar conclusion – an association with low wealth. A high proportion of unincorporated enterprise owners was found to correlate with high wealth and access to economic resources. These three variables were used only in the Index of Economic Resources.
One parent families are disadvantaged compared with other family structures, because of the need to simultaneously provide and care for dependants. Aside from having lower equivalised household incomes, one parent families also have lower rates of employment and labour force participation, lower rates of home ownership and higher incidence of financial stress, as compared to couple family households – for example, refer to Australian Social Trends, 2007. There are significant correlations at the area level between the number of one parent families and many indicators of relative socio-economic disadvantage. The same patterns are evident for areas with high proportions of people who are separated or divorced.
Basic exploratory analysis of variables
The Census data was converted into the SEIFA variable proportions. Summary statistics for these proportions were analysed to identify significant changes since 2016. Overall, there were no unexpected changes to the SEIFA variable proportions.
Candidate variable list for each index
The following table shows the candidate variable list for each index. The candidate list includes all variables considered for inclusion in an index before the principal component analysis stage. The final list of variables included in each index can be found in technical details of each index: variables and loadings.
Dimension | Index of Relative Socio-Economic Disadvantage | Index of Relative Socio-Economic Advantage and Disadvantage | Index of Economic Resources | Index of Education and Occupation |
---|---|---|---|---|
Income | INC_LOW | INC_HIGH | INC_HIGH | |
Education | NOYR12ORHIGHER | NOYR12ORHIGHER | | NOYR12ORHIGHER |
Employment | UNEMPLOYED | UNEMPLOYED | UNEMPLOYED_IER | UNEMPLOYED |
Occupation | OCC_LABOUR | OCC_LABOUR | | OCC_SKILL1 |
Housing | LOWRENT | LOWRENT | LOWRENT | |
Other | CHILDJOBLESS | CHILDJOBLESS | UNINCORP | |
- Refer to the appendix: variable specifications for the definitions of each variable listed in this table.
- The variables listed in this table are not the final list of variables included in the indexes. For the final list, refer to technical details of each index: variables and loadings.
Construction of the indexes
This chapter describes the methods used to construct the indexes, some important technical specifications of each index, and some basic outputs.
Principal Component Analysis
Each index is a weighted sum of SEIFA variables. As with past versions of SEIFA, principal component analysis (PCA) is used to determine the weights. This section introduces some technical concepts related to PCA to help the reader understand the SEIFA index construction process. Some references are given at the end of this section for readers interested in a comprehensive discussion of PCA.
PCA is a technique that involves summarising a large number of correlated variables into a set of new uncorrelated components, each of which is a linear combination of the original variables. There are as many principal components as there are variables. If the original variables are highly correlated, much of the variation can be summarised by a reduced set of components, enabling easier analysis. The first principal component accounts for the largest proportion of variance in the original dataset, with each following component explaining less of the variance. The principal component used for each SEIFA index is the one that can be interpreted as best explaining the variation in the concept of advantage and disadvantage for that index. For the four indexes in SEIFA 2016, the first principal component was used to create the index.
The PCA procedure gives an eigenvalue for each component, which indicates the amount of variance in the original data explained by the component. The proportion of variance explained by a principal component is its eigenvalue divided by the sum of all the eigenvalues. The 'loading' for a variable is calculated by multiplying the variable's element of the component's eigenvector by the square root of the component's eigenvalue. It gives a measure of the strength of the relationship between the variable and the component, though it should be noted that some sources use different definitions for the loadings and weights in PCA. The loadings are also useful in comparing results obtained from different sets of original variables (such as for the four indexes in SEIFA). Loadings for each index are presented in the following sections.
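The relationship between eigenvalues, the proportion of variance explained and the loadings can be illustrated with a short NumPy sketch. The data are hypothetical proportions for six areas and three variables, and the sketch assumes that PCA is run on the correlation matrix of the variables (consistent with the use of standardised variables, but an assumption here). Note that the sign of an eigenvector is arbitrary, so the sign of the loadings from this sketch carries no meaning on its own.

```python
import numpy as np

# Hypothetical proportions: six areas (rows) by three variables (columns).
# The values are illustrative only, not real SEIFA data.
data = np.array([
    [0.30, 0.12, 0.05],
    [0.25, 0.10, 0.08],
    [0.20, 0.09, 0.12],
    [0.15, 0.06, 0.18],
    [0.10, 0.05, 0.22],
    [0.08, 0.03, 0.30],
])


def pca_summary(values):
    """Eigenvalues, variance shares and first-component loadings."""
    correlation = np.corrcoef(values, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(correlation)  # ascending order
    order = np.argsort(eigenvalues)[::-1]                    # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Share of total variance explained by each component.
    explained = eigenvalues / eigenvalues.sum()
    # Loading = eigenvector element times the square root of the eigenvalue.
    loadings = eigenvectors[:, 0] * np.sqrt(eigenvalues[0])
    return eigenvalues, explained, loadings


eigenvalues, explained, loadings = pca_summary(data)
```

Because the variables are standardised, the eigenvalues sum to the number of variables, and the squared loadings on a component sum to that component's eigenvalue.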
To generate the component scores (otherwise known as raw scores), each loading is converted to a weight by dividing it by the square root of the eigenvalue. The products of the weights and the standardised variable values are then summed to produce the raw scores. The raw scores for each component will then have variance equal to the eigenvalue for that component. We then rescale the raw scores to a mean of 1,000 and standard deviation of 100 to create a new set of scores that are the index scores in SEIFA. This process is known as "standardisation".
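The relationships between eigenvalues, loadings, weights and raw scores described above can be illustrated with a minimal NumPy sketch. The toy data and all variable names are assumptions made for illustration, and the use of the sample standard deviation (`ddof=1`) is a choice made here rather than something stated in this paper.

```python
import numpy as np

# Hypothetical toy data: 6 areas (rows) x 3 variables (columns), as proportions.
X = np.array([
    [0.10, 0.20, 0.70],
    [0.15, 0.25, 0.60],
    [0.30, 0.40, 0.40],
    [0.35, 0.50, 0.35],
    [0.50, 0.55, 0.20],
    [0.60, 0.70, 0.10],
])

# Standardise each variable to mean 0 and standard deviation 1.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA on the correlation matrix: eigenvalues and eigenvectors.
R = np.corrcoef(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)        # returned in ascending order
order = np.argsort(eigvals)[::-1]           # put the largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

lam = eigvals[0]                            # eigenvalue of the first component
explained = lam / eigvals.sum()             # proportion of variance explained
loadings = eigvecs[:, 0] * np.sqrt(lam)     # loading = eigenvector element x sqrt(eigenvalue)
weights = loadings / np.sqrt(lam)           # weight = loading / sqrt(eigenvalue)

raw = Xs @ weights                          # raw (component) scores
index = 1000 + 100 * (raw - raw.mean()) / raw.std(ddof=1)  # rescaled scores
```

In this sketch the variance of `raw` equals `lam`, which matches the statement that raw scores have variance equal to the eigenvalue of the component.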
More detailed explanations of PCA can be found in Jolliffe (1986) and O’Rourke (2005).
Areas with no SEIFA score
Some SA1s do not receive an index score, either due to low populations or poor-quality data. The criteria used to identify these areas are called ‘exclusion rules’. SEIFA 2021 uses an exclusion rule framework similar to that of SEIFA 2016, with the aim of obtaining a reliable index score for as many areas as possible.
The 2021 exclusion rules use a two-phase approach. The first phase excludes areas (SA1s) that should not receive a SEIFA score because of the type of area, confidentiality or reliability concerns (e.g. low population or low response rates for particular key variables). The second phase excludes areas (SA1s) by looking specifically at the variables included in each index. For each SA1, if any of the variables have a low denominator count, it is deemed that there is not enough data to support a reliable calculation of an index score for that area.
Some additional comments on the exclusion rule framework:
- The first phase rules are applied before PCA, whereas the second phase rules are applied following the PCA when the list of variables has been finalised. The step-by-step process provides details on how this is implemented.
- SA1s excluded in the first phase will be excluded for all four indexes. The number of SA1s excluded in the second phase may be different for each index, because they have different sets of variables.
- Following on from the point above, an area can receive a score for one index and not another depending on the make-up of its variables.
- The low denominator cut-off of six is chosen based on past practice and a judgement on how many responses are required to calculate a reliable value for an area.
- The exclusion of areas is based on the confidentialised counts for each SEIFA variable to ensure the confidentiality of respondents is upheld and the reliability of the indexes is maintained.
The specific exclusion rules and the number of areas meeting each rule are summarised in the table below. Note that areas might fall into multiple categories, which is why the column sum does not equal the final total number of excluded areas.
The proportions of excluded SA1s are similar to those for SEIFA 2016.
Exclusion criteria | Total SA1s excluded |
---|---|
Population = 0 | 1,357 |
No Usual Address SA1 | 9 |
Offshore, Shipping SA1 | 24 |
Population > 0 and ≤ 10 | 554 |
Employed persons ≤ 5 | 2,079 |
Classifiable(a) occupied private dwellings ≤ 5 | 2,118 |
People in private dwellings ≤ 20% | 1,741 |
Total excluded due to any of the rules above | 2,412 |
(a) These are dwellings where the type of household living in the dwelling could be determined during the collection process. For more information, refer to the 2021 Census Dictionary.
Index | Total SA1s excluded |
---|---|
IRSD | 150 |
IRSAD | 150 |
IER | 127 |
IEO | 20 |
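As a rough illustration of how the first phase rules in the first table could be applied to a single SA1, the sketch below checks each rule in turn and collects every rule that is met. The `SA1` class, its field names and the `special_purpose` flag are hypothetical and are not taken from ABS systems.

```python
from dataclasses import dataclass

@dataclass
class SA1:
    # All field names are hypothetical, chosen for this sketch only.
    population: int
    employed: int
    classifiable_dwellings: int
    share_in_private_dwellings: float   # between 0 and 1
    special_purpose: bool = False       # e.g. No Usual Address, Offshore, Shipping

def first_phase_reasons(a: SA1) -> list[str]:
    """Return every first phase rule that the area meets."""
    reasons = []
    if a.special_purpose:
        reasons.append("special purpose SA1")
    if a.population == 0:
        reasons.append("population = 0")
    elif a.population <= 10:
        reasons.append("population > 0 and <= 10")
    if a.employed <= 5:
        reasons.append("employed persons <= 5")
    if a.classifiable_dwellings <= 5:
        reasons.append("classifiable occupied private dwellings <= 5")
    if a.share_in_private_dwellings <= 0.20:
        reasons.append("people in private dwellings <= 20%")
    return reasons

areas = [SA1(0, 0, 0, 0.0), SA1(420, 210, 160, 0.98), SA1(8, 3, 4, 1.0)]
excluded = [a for a in areas if first_phase_reasons(a)]
```

The first and third toy areas each meet several rules at once, which is why the counts for individual rules sum to more than the total number of excluded areas.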
Step-by-step process
With the preceding two sections providing context, a step-by-step process for constructing the indexes is presented below.
1: Creating the initial variable list
Given the data available, we created a list of variables related to our definition of relative socio-economic advantage and disadvantage.
2: Constructing the variables
We created all variables as proportions at the SA1 level (e.g. ‘percent of people aged 15 years and over attending secondary school’). We then standardised these proportions to a mean of zero and a standard deviation of one. The standardisation was used to prevent variables with larger prevalence, or larger ranges, from having a disproportionate influence on the index.
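The standardisation in this step can be written as follows. The symbols \(P_{j,SA1}\), \(\bar{P}_j\) and \(s_j\) are notation assumed here for illustration; only \(X_{j,SA1}\) is used elsewhere in this paper.
\(X_{j,SA1} = \frac{P_{j,SA1} - \bar{P}_j}{s_j}\)
where,
\(P_{j,SA1}\) = proportion for the j-th variable in the SA1
\(\bar{P}_j\) = mean of the j-th proportion across SA1s
\(s_j\) = standard deviation of the j-th proportion across SA1s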
3: Applying first phase exclusion rules
We excluded areas (SA1s) that should not receive an index score because of the type of area, confidentiality, or reliability concerns.
4: Calculating the correlation matrix
We set to missing any variable values with denominators less than our prescribed cut-off of six. Note that we did not exclude areas based on this cut-off at this stage in the process; this occurred at step nine. We calculated the correlation matrix using pairwise deletion where areas (observations) contained missing values. Pairwise deletion is a method for dealing with missing data: for each pair of variables, all areas with non-missing values for both variables are used in calculating that correlation. This contrasts with listwise deletion, in which entire records (areas in our case) are removed from the analysis if any of their variables have missing values. Given the number of observations in our dataset and the low prevalence of missing values, the use of pairwise deletion had very little impact on the correlation matrix. However, it did provide a convenient way of implementing our second phase exclusion rules (refer to step nine).
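The difference between pairwise and listwise deletion can be seen in a small sketch. The function name `pairwise_corr` and the toy data are assumptions for illustration, with `NaN` standing in for a value that has been set to missing.

```python
import numpy as np

def pairwise_corr(X: np.ndarray) -> np.ndarray:
    """Correlation matrix where each pair of columns uses all rows in which
    both values are present (NaN marks a value set to missing)."""
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            R[i, j] = R[j, i] = np.corrcoef(X[ok, i], X[ok, j])[0, 1]
    return R

X = np.array([
    [1.0, 2.0, 9.0],
    [2.0, 4.1, 7.0],
    [3.0, np.nan, 6.5],   # second variable set to missing for this area
    [4.0, 8.2, 4.0],
    [5.0, 9.9, np.nan],   # third variable set to missing for this area
])
R = pairwise_corr(X)

# Listwise deletion would instead drop any area with a missing value.
complete = X[~np.isnan(X).any(axis=1)]
```

Here the correlation between the first two variables uses four areas under pairwise deletion, whereas listwise deletion would leave only three areas for every correlation.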
5: Removing very highly correlated variables
We removed highly correlated variables to avoid over-representing any specific socio-economic characteristic. When two variables had a correlation coefficient greater than 0.8 in absolute value and were measuring conceptually similar aspects of advantage or disadvantage, we generally removed one of them. However, we applied some discretion, depending on the variables in question and the size of the correlation.
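A simple way to flag candidate pairs for this review is sketched below. The helper `high_corr_pairs` is hypothetical. The two correlations involving DEGREE are the values reported for the IRSAD later in this paper, while the NOYR12ORHIGHER and OCC_PROF correlation is made up for the example. The decision to remove a variable remains a matter of judgement and is not automated here.

```python
import numpy as np

def high_corr_pairs(R, names, cutoff=0.8):
    """List variable pairs whose absolute correlation exceeds the cutoff."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) > cutoff:
                pairs.append((names[i], names[j], float(R[i, j])))
    return pairs

names = ["DEGREE", "NOYR12ORHIGHER", "OCC_PROF"]
R = np.array([
    [1.00, -0.83, 0.88],
    [-0.83, 1.00, -0.70],   # -0.70 is a hypothetical value
    [0.88, -0.70, 1.00],
])
flagged = high_corr_pairs(R, names)
```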
6: Conducting the initial PCA
Using the correlation matrix, we conducted principal component analysis (PCA) to obtain the loading for each variable on the first principal component.
7: Removing low loading variables
We excluded variables with loadings less than 0.3 in absolute value, on the grounds that they were not strong indicators of relative advantage or disadvantage. This limit is an accepted level in the PCA literature and has been used in past releases of SEIFA. We removed variables one at a time, starting with the lowest loading variable.
8: Conducting PCA on the reduced list of variables
We conducted a PCA on the reduced variable list, and if any other variables loaded below 0.3, we repeated steps seven and eight.
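Steps six to eight form a loop, which can be sketched as below. The function names, the toy correlation matrix and the sign convention inside `first_component_loadings` are assumptions made for this illustration only.

```python
import numpy as np

def first_component_loadings(R):
    """Loadings on the first principal component of correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
    lam, vec = eigvals[-1], eigvecs[:, -1]
    if vec.sum() < 0:                        # sign is arbitrary; fixed here for readability
        vec = -vec
    return vec * np.sqrt(lam)

def drop_low_loading(R, names, cutoff=0.3):
    """Repeat the PCA, removing the lowest loading variable each time, until
    all remaining loadings are at least the cutoff in absolute value."""
    keep = list(range(len(names)))
    dropped = []
    while True:
        L = first_component_loadings(R[np.ix_(keep, keep)])
        worst = int(np.argmin(np.abs(L)))
        if abs(L[worst]) >= cutoff:
            return [names[k] for k in keep], dropped
        dropped.append((names[keep[worst]], round(float(L[worst]), 2)))
        keep.pop(worst)

# Hypothetical correlation matrix: A, B and C move together, D is nearly unrelated.
R = np.array([
    [1.00, 0.70, 0.60, 0.05],
    [0.70, 1.00, 0.65, 0.02],
    [0.60, 0.65, 1.00, 0.03],
    [0.05, 0.02, 0.03, 1.00],
])
kept, dropped = drop_low_loading(R, ["A", "B", "C", "D"])
```

Recording the loading at the iteration in which a variable is removed mirrors how the dropped variables are reported in the tables for each index.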
9: Finalising the list of variables in the index and applying second phase exclusion rules
After the final list of variables in the index was determined, we excluded any SA1s that had denominators less than our prescribed cut-off of six for any of the variables on the final variable list.
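The second phase rule can be sketched as a check across the denominators of the variables on an index's final list. The denominator counts and the two short variable lists below are hypothetical and are not the full final lists of any index.

```python
import numpy as np

CUTOFF = 6  # denominators below this are treated as too small

# Hypothetical denominator counts: rows are SA1s, columns are variables.
variables = ["INC_LOW", "CHILDJOBLESS"]
denominators = np.array([
    [250, 60],
    [180,  4],   # very few families with children under 15
    [ 90, 25],
])

def second_phase_excluded(denoms, all_vars, index_vars, cutoff=CUTOFF):
    """True for each SA1 with any denominator below the cutoff among the
    variables on this index's final list."""
    cols = [all_vars.index(v) for v in index_vars]
    return (denoms[:, cols] < cutoff).any(axis=1)

excl_a = second_phase_excluded(denominators, variables, ["INC_LOW"])
excl_b = second_phase_excluded(denominators, variables, ["INC_LOW", "CHILDJOBLESS"])
```

The second toy area is excluded only when CHILDJOBLESS is on the variable list, which illustrates how an area can receive a score for one index and not another.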
10: Calculating and standardising component/index scores
We derived the first principal component scores for each SA1 by taking the product of each standardised variable with its respective weight, then taking the sum across all variables. Note that the weight for each variable was calculated by dividing the loading by the square root of the eigenvalue.
\(Z_{SA1} = \sum\limits_{j = 1}^{p} \frac{L_j}{\sqrt{\lambda}} \times X_{j,SA1}\)
where,
\(Z_{SA1}\) = raw score for the SA1
\(X_{j,SA1}\) = standardised value of the j-th variable for the SA1
\({{L_j}}\) = loading for the j-th variable
\(\lambda\) = eigenvalue of the principal component
\(p\) = total number of variables in the index
For convenience of presentation, we then rescaled the raw scores to a mean of 1,000 and standard deviation of 100 to create a new set of scores that are the SA1 index scores in SEIFA.
Note that the principal components are arbitrary with respect to their sign (positive or negative), so we set the sign of the weights and loadings so that they make intuitive sense. That is, we gave advantage indicators positive weights and loadings, and disadvantage indicators negative weights and loadings. Accordingly, high scores indicate relative advantage, and low scores indicate relative disadvantage. This is consistent with previous editions of SEIFA.
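The formula and the sign convention in this step can be combined in a short sketch. The function `index_scores`, the argument `disadvantage_col`, the toy inputs and the use of the sample standard deviation are all assumptions made for illustration.

```python
import numpy as np

def index_scores(X_std, loadings, eigenvalue, disadvantage_col):
    """Raw scores Z = sum_j (L_j / sqrt(lambda)) * X_j, rescaled to a mean of
    1,000 and a standard deviation of 100."""
    L = np.asarray(loadings, dtype=float)
    if L[disadvantage_col] > 0:          # orient so disadvantage indicators load negatively
        L = -L
    weights = L / np.sqrt(eigenvalue)    # weight = loading / sqrt(eigenvalue)
    Z = np.asarray(X_std, dtype=float) @ weights
    return 1000 + 100 * (Z - Z.mean()) / Z.std(ddof=1)   # ddof=1 is an assumption

# Three toy areas; column 0 is a disadvantage indicator, column 1 an advantage indicator.
X_std = [[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]]
scores = index_scores(X_std, loadings=[0.9, -0.9], eigenvalue=1.62, disadvantage_col=0)
```

With these inputs the area with the most disadvantage and least advantage receives the lowest score, and the area at the average on both variables receives 1,000.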
11: Creating higher geographic level indexes
We constructed indexes for geographies higher than the SA1 level using population weighted averages of the constituent SA1s. We used the following formula:
\(INDEX_{AREA} = \frac{\sum\limits_{i = 1}^{n} \left(INDEX_{SA1_i} \times POP_{SA1_i}\right)}{POP_{AREA}}\)
where,
\(INDEX\) = Index score for each SA1 or higher level area
\(POP\) = Population for each SA1 or higher level area
\(n\) = Total number of SA1s (with index scores) in the higher level area
The higher level area population is the sum of the populations from the constituent SA1s that received an index score. Populations in excluded SA1s are not included in this calculation.
Although we constructed the higher level indexes from standardised SA1 level indexes, they were not standardised themselves. Therefore the higher level area indexes do not necessarily have a mean of 1,000 or standard deviation of 100. Only SA1s with index scores were used to create the higher level indexes. In a small number of cases, where a higher level area contains a number of SA1s that were excluded, its index score may not be a good representation of its entire population.
For this reason, the output spreadsheets provide the proportion of each higher area level population that was in excluded SA1s. In general, we encourage users conducting analysis at higher level areas to keep in mind that the indexes were constructed at the SA1 level, and to consider using the distribution of SA1s within the higher level areas, rather than just the one index score for each higher level area.
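The population weighted average, together with the share of population in excluded SA1s, can be sketched as follows. The function `area_index` and the example figures are hypothetical.

```python
def area_index(sa1s):
    """Population weighted average of SA1 scores for one higher level area.

    `sa1s` is a list of (score, population) pairs; score is None for an
    excluded SA1. Returns (area score, share of population in excluded SA1s).
    """
    scored = [(s, p) for s, p in sa1s if s is not None]
    pop_scored = sum(p for _, p in scored)      # POP_AREA: scored SA1s only
    pop_total = sum(p for _, p in sa1s)
    if pop_scored == 0:
        return None, (1.0 if pop_total else 0.0)
    score = sum(s * p for s, p in scored) / pop_scored
    return score, 1 - pop_scored / pop_total

# Two scored SA1s and one excluded SA1 within the same higher level area.
score, excluded_share = area_index([(950.0, 400), (1050.0, 600), (None, 250)])
```

In this example the area score is 1010.0 and 20% of the area's population lives in an excluded SA1, so the score describes only the remaining 80%.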
Technical details of each index: variables and loadings
This section gives the results of the principal component analysis carried out for each index, including variable loadings and percentage of variance explained. We also list the variables initially considered for inclusion but removed due to high correlations with other variables or weak loadings.
Index of Relative Socio-economic Disadvantage
The IRSD summarises variables that indicate relative disadvantage at the SA1 level, according to the concept described in defining the concept behind each of the four indexes. The final variable list and corresponding loadings are shown below.
Variable name | Variable description | Variable loading |
---|---|---|
INC_LOW | Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles) | -0.87 |
CHILDJOBLESS | Per cent of families with children under 15 years of age who live with jobless parents | -0.78 |
NOYR12ORHIGHER | Per cent of people aged 15 years and over whose highest level of education is Year 11 or lower. Includes Certificate I and II | -0.75 |
LOWRENT | Per cent of occupied private dwellings paying rent less than $250 per week (excluding $0 per week) | -0.71 |
UNEMPLOYED | Per cent of people (in the labour force) unemployed | -0.68 |
OCC_LABOUR | Per cent of employed people classified as 'labourers' | -0.68 |
DISABILITYU70 | Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age | -0.63 |
ONEPARENT | Per cent of one parent families with dependent offspring only | -0.58 |
OVERCROWD | Per cent of occupied private dwellings requiring one or more extra bedrooms (based on the Canadian National Occupancy Standard) | -0.51 |
OCC_DRIVERS | Per cent of employed people classified as Machinery Operators and Drivers | -0.51 |
SEPDIVORCED | Per cent of people aged 15 and over who are separated or divorced | -0.51 |
NOEDU | Per cent of people aged 15 years and over who have no educational attainment | -0.47 |
OCC_SERVICE_L | Per cent of employed people classified as Low Skill Community and Personal Service Workers | -0.45 |
NOCAR | Per cent of occupied private dwellings with no cars | -0.43 |
ENGLISHPOOR | Per cent of people who do not speak English well | -0.35 |
The 2021 IRSD index explains 37% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 43% (2016 IRSD), 44% (2011 IRSD), 39% (2006 IRSD) and 33% (2001 IRSD).
Removal of highly correlated variables
Of the variables considered for the IRSD, there were no two variables that had a correlation coefficient greater than 0.8 in absolute value.
Removal of low loading variables
The following table shows the variables that were dropped from the IRSD because their loading was below our prescribed cutoff of 0.3 in absolute value. The variables are shown in the order they were removed, with the loadings from the iteration when they were removed.
Variable name | Variable description | Variable loading |
---|---|---|
OCC_SALES_L | Per cent of employed people classified as Low-Skill Sales Workers | -0.27 |
CERTIFICATE | Per cent of people aged 15 years and over whose highest level of educational attainment is a certificate III or IV qualification | -0.21 |
FEWBED | Per cent of occupied private dwellings with one or no bedrooms | -0.01 |
Index of Relative Socio-Economic Advantage and Disadvantage
The IRSAD summarises variables that indicate either relative socio-economic advantage or disadvantage, according to the concept described in defining the concept behind each of the four indexes. The final variable list and corresponding loadings are shown below.
Variable name | Variable description | Variable loading |
---|---|---|
NOYR12ORHIGHER | Per cent of people aged 15 years and over whose highest level of education is Year 11 or lower. Includes Certificate I and II | -0.85 |
INC_LOW | Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles) | -0.83 |
OCC_LABOUR | Per cent of employed people classified as 'labourers' | -0.75 |
DISABILITYU70 | Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age | -0.67 |
CHILDJOBLESS | Per cent of families with children under 15 years of age who live with jobless parents | -0.65 |
OCC_DRIVERS | Per cent of employed people classified as Machinery Operators and Drivers | -0.61 |
LOWRENT | Per cent of occupied private dwellings paying rent less than $250 per week (excluding $0 per week) | -0.58 |
SEPDIVORCED | Per cent of people aged 15 and over who are separated or divorced | -0.58 |
ONEPARENT | Per cent of one parent families with dependent offspring only | -0.55 |
UNEMPLOYED | Per cent of people (in the labour force) unemployed | -0.54 |
OCC_SERVICE_L | Per cent of employed people classified as Low Skill Community and Personal Service Workers | -0.49 |
CERTIFICATE | Per cent of people aged 15 years and over whose highest level of educational attainment is a certificate III or IV qualification | -0.45 |
OVERCROWD | Per cent of occupied private dwellings requiring one or more extra bedrooms (based on Canadian National Occupancy Standard) | -0.32 |
NOEDU | Per cent of people aged 15 years and over who have no educational attainment | -0.32 |
OCC_SALES_L | Per cent of employed people classified as Low Skill Sales Workers | -0.32 |
ATUNI | Per cent of people aged 15 years and over at university or other tertiary institution | 0.35 |
HIGHBED | Per cent of occupied private dwellings with four or more bedrooms | 0.35 |
DIPLOMA | Per cent of people aged 15 years and over whose highest level of education attainment is a diploma qualification | 0.38 |
HIGHRENT | Per cent of occupied private dwellings paying rent greater than $470 per week | 0.51 |
OCC_MANAGER | Per cent of employed people classified as Managers | 0.52 |
HIGHMORTGAGE | Per cent of occupied private dwellings paying mortgage greater than $2,800 per month | 0.69 |
OCC_PROF | Per cent of employed people classified as Professionals | 0.74 |
INC_HIGH | Per cent of people living in households with stated annual household equivalised income greater than $91,000 (approx 9th and 10th deciles) | 0.85 |
The 2021 IRSAD index explains 34% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 38% (2016 IRSAD), 39% (2011 IRSAD), 44% (2006 IRSAD) and 41% (2001 IRSAD).
Removal of highly correlated variables
The variable DEGREE had high correlations with NOYR12ORHIGHER (–0.83) and OCC_PROF (0.88). This suggested that the proportion of people in an area with a degree was explained by other variables in the index. Therefore DEGREE was dropped.
Removal of low loading variables
The table below shows the variables dropped from the IRSAD because of low loadings. The variables are shown in the order they were removed, with the loadings from the iteration when they were removed.
Variable name | Variable description | Variable loading |
---|---|---|
NOCAR | Per cent of occupied private dwellings with no cars | 0.24 |
SPAREBED | Per cent of occupied private dwellings with one or more bedrooms spare | 0.20 |
ENGLISHPOOR | Per cent of people who do not speak English well | -0.21 |
HIGHCAR | Per cent of occupied private dwellings with three or more cars | 0.20 |
OWNING | Per cent of occupied private dwellings owning dwelling without a mortgage | 0.19 |
FEWBED | Per cent of occupied private dwellings with one or no bedrooms | -0.01 |
Index of Economic Resources
The IER focuses on the financial aspects of relative socio-economic advantage and disadvantage, according to the concept described in defining the concept behind each of the four indexes. The final variable list and corresponding loadings are shown below.
Variable name | Variable description | Variable loading |
---|---|---|
INC_LOW | Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles) | -0.73 |
LOWRENT | Per cent of occupied private dwellings paying rent less than $250 per week (excluding $0 per week) | -0.71 |
NOCAR | Per cent of occupied private dwellings with no cars | -0.70 |
LONE | Per cent of occupied private dwellings who are lone person occupied private dwellings | -0.68 |
ONEPARENT | Per cent of one parent families with dependent offspring only | -0.54 |
OVERCROWD | Per cent of occupied private dwellings requiring one or more extra bedrooms (based on Canadian National Occupancy Standard) | -0.51 |
UNEMPLOYED_IER | Per cent of people aged 15 years and over who are unemployed | -0.48 |
GROUP | Per cent of occupied private dwellings who are group occupied private dwellings | -0.39 |
OWNING | Per cent of occupied private dwellings owning dwelling without a mortgage | 0.34 |
UNINCORP | Per cent of dwellings with at least one person who is an owner of an unincorporated enterprise | 0.47 |
INC_HIGH | Per cent of people living in households with stated annual household equivalised income greater than $91,000 (approx. 9th and 10th deciles) | 0.52 |
HIGHMORTGAGE | Per cent of occupied private dwellings paying mortgage greater than $2,800 per month | 0.64 |
MORTGAGE | Per cent of occupied private dwellings owning dwelling (with a mortgage) | 0.66 |
HIGHBED | Per cent of occupied private dwellings with four or more bedrooms | 0.75 |
The 2021 IER index explains 35% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 38% (2016 IER), 39% (2011 IER) and 35% (2006 IER).
Removal of highly correlated variables
No variables were dropped based on high correlations.
Removal of low loading variables
The table below shows the variable dropped from the IER because of a low loading.
Variable name | Variable description | Variable loading |
---|---|---|
HIGHRENT | Per cent of occupied private dwellings paying rent greater than $470 per week | 0.07 |
Index of Education and Occupation
The IEO summarises variables related to educational qualifications and vocational skills, according to the concept described in defining the concept behind each of the four indexes. The final variable list and corresponding loadings are shown below.
Variable name | Variable description | Variable loading |
---|---|---|
NOYR12ORHIGHER | Per cent of people aged 15 years and over whose highest level of education is Year 11 or lower. Includes Certificate I and II | -0.87 |
OCC_SKILL5 | Per cent of employed people who work in a Skill Level 5 occupation | -0.76 |
OCC_SKILL4 | Per cent of employed people who work in a Skill Level 4 occupation | -0.75 |
CERTIFICATE | Per cent of people aged 15 years and over whose highest level of educational attainment is a certificate III or IV qualification | -0.65 |
UNEMPLOYED | Per cent of people (in the labour force) unemployed | -0.41 |
DIPLOMA | Per cent of people aged 15 years and over whose highest level of education attainment is a diploma qualification | 0.37 |
ATUNI | Per cent of people aged 15 years and over at university or other tertiary institution | 0.48 |
OCC_SKILL1 | Per cent of employed people who work in a Skill Level 1 occupation | 0.90 |
The 2021 IEO index explains 46% of the total variance of the variables in the final variable list. The corresponding percentages for previous indexes are: 41% (2016 IEO), 47% (2011 IEO), 52% (2006 IEO) and 46% (2001 IEO).
Removal of highly correlated variables
DEGREE (% People aged 15 years and over with a degree or higher qualification) had high correlations with NOYR12ORHIGHER (–0.83) and OCC_SKILL1 (0.82). It was decided that the proportion of people with a degree was already well explained by the index, and DEGREE was removed.
Removal of low loading variables
The table below shows the variables dropped from the IEO because of low loadings. The variables are shown in the order they were removed, with the loadings from the iteration when they were removed.
Variable name | Variable description | Variable loading |
---|---|---|
NOEDU | Per cent of people aged 15 years and over who have no educational attainment | 0.29 |
OCC_SKILL2 | Per cent of employed people who work in a Skill Level 2 occupation | 0.27 |
ATSCHOOL | Per cent of people aged 15 years and over who are still attending secondary school | 0.05 |
Summary of variables included in indexes
The table below shows the final set of variables included in each index.
Dimension | Index of Relative Socio-Economic Disadvantage | Index of Relative Socio-Economic Advantage and Disadvantage | Index of Economic Resources | Index of Education and Occupation |
---|---|---|---|---|
Income | INC_LOW | INC_HIGH | INC_HIGH | |
Education | NOYR12ORHIGHER | NOYR12ORHIGHER | | NOYR12ORHIGHER |
Employment | UNEMPLOYED | UNEMPLOYED | UNEMPLOYED_IER | UNEMPLOYED |
Occupation | OCC_LABOUR | OCC_LABOUR | | OCC_SKILL1 |
Housing | LOWRENT | LOWRENT | LOWRENT | |
Other | CHILDJOBLESS | CHILDJOBLESS | UNINCORP | |
Distribution of the indexes
This section presents frequency histograms for each index at the SA1 level. The index distributions have generally similar shapes to those from SEIFA 2016.
Index of Relative Socio-Economic Disadvantage
The IRSD distribution shown below has a very long left tail. The values range from about 143 to 1207. This index contains only disadvantage indicators, so there is more scope to distinguish between disadvantaged areas than advantaged areas.
The steep peak for this distribution means that there will be little difference in the scores of SA1s in the middle deciles, and so the characteristics related to the IRSD variables may not vary much across SA1s in these middle deciles.
Index of Relative Socio-Economic Advantage and Disadvantage
The scores for IRSAD range from 435 to 1273. The right-hand slope is not as steep in the IRSAD distribution as it is in the IRSD distribution. This means that the IRSAD scores of SA1s in the upper deciles are more spread out than the IRSD scores in these deciles, and this index has a greater ability to differentiate between the more advantaged areas.
Index of Economic Resources
The scores for IER range from 299 to 1315.
Index of Education and Occupation
The scores for IEO range from 407 to 1372.
Basic output: scores, ranks, deciles and percentiles
Scores
The scores are a weighted combination of the selected indicators of advantage and disadvantage which have been standardised to a distribution with a mean of 1000 and standard deviation of 100. An area with all of its indicators equal to the national average will receive a score of 1000. The score for an area will increase if an area has: an indicator of advantage that is greater than the national average; or an indicator of disadvantage that is less than the national average. Conversely, the score for an area will decrease if an area has: an indicator of disadvantage that is greater than the national average; or an indicator of advantage that is less than the national average. Indicators which are further away from the national average have a larger impact on the score.
For areas larger than SA1, the scores are a population weighted average of constituent SA1 scores, as described in Step 11 of the step by step process.
It is important to remember that the scores are an ordinal measure (discussed in more detail in broad guidelines on appropriate use), so care should be taken when comparing scores. For example, an area with a score of 500 is not twice as disadvantaged as an area with a score of 1000; it simply has more markers of relative disadvantage.
Ranks, Deciles and Percentiles
Because the scores are an ordinal measure, it is often more appropriate to use alternative measures rather than the scores themselves. We have calculated ranks, deciles and percentiles and included these in the output spreadsheets. These measures are defined below.
Rank
The areas are ranked in order of their score, from lowest to highest, with rank one representing the most disadvantaged area. In the spreadsheets, rankings are provided on both a national and a state/territory basis. The same set of scores is used for each ranking; the scores are not recalculated for each state/territory.
Deciles
All areas are ordered from lowest to highest score. The lowest 10% of areas are given a decile number of one, the next lowest 10% of areas are given a decile number of two, and so on, up to the highest 10% of areas, which are given a decile number of 10. This means that areas are divided into ten equal-sized groups, depending on their score.
Percentiles
All areas are ordered from lowest to highest score. The lowest 1% of areas are given a percentile number of one, the next lowest 1% of areas are given a percentile number of two, and so on, up to the highest 1% of areas, which are given a percentile number of 100. This means that areas are divided into one hundred equal-sized groups, depending on their score. Deciles and percentiles are sometimes referred to generally as quantiles. Other commonly used quantiles include quintiles and quartiles. We have not included these in the output spreadsheets, but they can easily be derived from the percentiles.
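The definitions of ranks, deciles and percentiles above can be sketched in a few lines. The function `ranks_and_quantiles` and the scores are hypothetical, and breaking ties by input order is an assumption of this sketch, not a documented ABS rule.

```python
import numpy as np

def ranks_and_quantiles(scores, groups):
    """Rank areas from lowest (rank 1) to highest score and split them into
    `groups` near-equal groups (10 for deciles, 100 for percentiles)."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    order = np.argsort(scores, kind="stable")   # ties keep input order (assumption)
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(1, n + 1)
    quantile = (rank * groups + n - 1) // n     # integer ceiling of rank * groups / n
    return rank, quantile

scores = [1012.5, 874.0, 1101.3, 990.2, 935.7, 1050.0, 1003.1, 820.4, 968.8, 1077.6]
rank, decile = ranks_and_quantiles(scores, groups=10)
_, percentile = ranks_and_quantiles(scores, groups=100)

# Quintiles derived from percentiles: percentiles 1-20 map to 1, 21-40 to 2, and so on.
quintile = (percentile + 19) // 20
```

With only ten toy areas each decile contains exactly one area, so the decile numbers coincide with the ranks.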
Geographic output levels for SEIFA 2021
The primary unit of analysis and the smallest area for which the indexes are available is the Statistical Area Level 1 (SA1). This is the recommended unit of analysis for SEIFA 2021.
For a selection of geographic areas larger than SA1, scores have been calculated by taking population-weighted averages of constituent SA1 scores. The output spreadsheets also contain some information about the distribution of SA1 index scores within larger areas. This enables users to consider the socio-economic diversity that can exist within a larger area.
The table below summarises the output available at the different geographic levels.
Geographic unit | Index score | SA1 distribution information |
---|---|---|
Statistical Area level 1 (SA1) | Yes | N/A |
Statistical Area level 2 (SA2) | Yes | Yes |
Statistical Area level 3 (SA3) | No | Yes |
Statistical Area level 4 (SA4) | No | Yes |
Local Government Area (LGA) | Yes | Yes |
Suburbs and Localities (SAL) | Yes | Yes |
Postal Area (POA) | Yes | Yes |
Commonwealth Electoral Division (CED) | No | Yes |
State Electoral Division (SED) | No | Yes |
For the geographies larger than SA1, and not in the ASGS (LGAs, SALs and POAs), a best fit correspondence of SA1s to the larger geographies was used. Local Government Areas (LGAs), Suburbs and Localities (SALs) and Postal Areas (POAs) are constructed from Mesh Blocks in the 2021 version of the ASGS. In some cases, particularly for certain SALs with small populations, the SA1 boundaries do not correspond closely to the higher level area. For this reason, SEIFA scores for SALs and POAs with small populations should be used with caution, as the scores may have been calculated from populations that do not correspond closely with the actual population in the area. Refer to ABS Maps for information useful for identifying areas that do not correspond closely to the SA1 structure.
The output spreadsheets contain specific references to the ABS publications from which the geography classifications and correspondences have been sourced.
Validation of the indexes
Once the indexes are calculated, they are checked to ensure that they are measuring the desired concept and that the results generally make sense. This validation is important to establish the credibility of the indexes and identify any issues that may have been missed in the construction of the indexes. The methods used to validate SEIFA 2021 include:
- comparison of SEIFA 2021 rankings with 2016 rankings
- identification of the drivers of change from SEIFA 2016 to 2021
- seeking review from internal experts.
Relationships between the indexes
We examined SEIFA for internal consistency by looking at the correlations between the indexes. The table below shows the rank correlation matrix. All correlations are in the expected directions and show significant relationships. The IRSD is very highly correlated with the IRSAD (0.94).
Index | IRSD | IRSAD | IER | IEO |
---|---|---|---|---|
IRSD | 1.00 | | | |
IRSAD | 0.94 | 1.00 | | |
IER | 0.79 | 0.68 | 1.00 | |
IEO | 0.79 | 0.93 | 0.45 | 1.00 |
The indexes that measure specific dimensions of advantage and disadvantage (IER and the IEO) have a lower correlation with the other indexes with the exception of IEO and IRSAD. The IER includes variables associated with high and low wealth that are not included in the other indexes. The IEO focuses solely on educational qualifications, employment and vocational skills.
The IER and the IEO are positively correlated, but their correlation (0.45) is much weaker than those between the other pairs of indexes. There is a significant difference between the concepts measured by these two indexes, and they do not share any common variables.
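A rank correlation between two indexes can be computed as sketched below. This paper does not state which rank correlation measure was used, so the choice of Spearman's coefficient is an assumption, as are the function name and the toy scores.

```python
import numpy as np

def rank_correlation(a, b):
    """Spearman rank correlation: the Pearson correlation of the ranks.
    Assumes no tied scores, so simple ordinal ranks are sufficient."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

# Hypothetical scores for six areas on two indexes.
index_a = [880.0, 940.5, 1001.2, 1030.8, 1075.4, 1120.9]
index_b = [905.3, 930.1, 1010.0, 995.6, 1090.2, 1150.7]
rho = rank_correlation(index_a, index_b)
```

Because it depends only on the ordering of areas, a rank correlation is consistent with treating the index scores as ordinal measures.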
Comparing 2016 and 2021 rankings
The SA1 scores from 2021 were checked against comparable areas from 2016, where possible, to identify areas with large changes and determine whether these changes were plausible. Some changes are to be expected, particularly in areas with high population growth and areas that have been affected by economic changes in the region. This process did not identify any results that seemed unrealistic.
Validation of higher-level area indexes
Most of the validation was focused on the SA1 level indexes because SA1s are the primary unit of analysis and indexes for higher level areas (e.g. SA2) are population weighted averages of the SA1 scores. However, we conducted basic validation checks on any higher level area indexes that we produced. This process did not identify any results that seemed unrealistic.
Using and interpreting SEIFA
This chapter provides information to assist in the appropriate use of SEIFA and to help users gain the most value from the product.
Broad guidelines on appropriate use
Area level indexes
The indexes are assigned to areas, not to individuals. They indicate the collective socio-economic characteristics of the people living in an area. A relatively disadvantaged area is likely to have a high proportion of relatively disadvantaged people. However, such an area is also likely to contain some people who are relatively advantaged. When area level indexes are used as proxy measures of individual level socio-economic advantage and disadvantage, many people are likely to be misclassified. This is known as the ecological fallacy. Wise and Mathews (2011) conducted an investigation into the extent of this issue as it relates to SEIFA.
Ordinal indexes
As measures of socio-economic level, the indexes are best interpreted as ordinal measures. They can be used to rank areas and are also useful to understand the distribution of socio-economic conditions across different areas. Also, the index scores are on an arbitrary numerical scale. The scores do not represent some quantity of advantage or disadvantage. For example, we cannot infer that an area with an index value of 1000 is twice as advantaged as an area with an index value of 500.
For ease of interpretation, we generally recommend using the index rankings and quantiles (e.g. deciles) for analysis, rather than using the index scores. Index scores are still provided in the output and can still be used for analysis when appropriate. For more information on index scores, rankings, and quantiles, refer to basic output: scores, ranks, deciles and percentiles.
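As a rough illustration of how ranks and area-based deciles relate to scores, the sketch below orders a set of hypothetical areas from lowest to highest score and splits them into ten equal-sized groups. The area identifiers and scores are invented, ties are not handled, and the grouping rule is an assumption rather than the exact rule used for the published output.

```python
def rank_and_decile(scores):
    """Map each area to (rank, decile), where rank 1 is the lowest score.

    scores: dict of area identifier -> index score (hypothetical data).
    """
    ordered = sorted(scores, key=scores.get)
    n_areas = len(ordered)
    result = {}
    for position, area in enumerate(ordered, start=1):
        # Integer ceiling of position * 10 / n_areas, so deciles run 1..10.
        decile = (position * 10 + n_areas - 1) // n_areas
        result[area] = (position, decile)
    return result


# Twenty hypothetical areas with evenly spaced scores.
example_scores = {f"A{i:02d}": 800 + 15 * i for i in range(20)}
ranked = rank_and_decile(example_scores)
print(ranked["A00"], ranked["A02"], ranked["A19"])
```

Only the ordering of the scores affects the output: rescaling every score by the same positive factor would leave the ranks and deciles unchanged, which is why they suit an ordinal interpretation.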
Importance of the underlying variables
Each index is constructed using a weighted combination of selected variables. The indexes are dependent on the set of variables chosen for the analysis. A different set of underlying variables would result in a different index. However, due to the large number of variables in each index, removing or altering a single variable will usually not have a large effect.
Users should consider the aspect of socio-economic advantage and disadvantage in which they are interested and examine the underlying set of variables in each index. This will allow them to make an informed decision on whether an index is appropriate for their particular purpose. The section Choice of index provides some tips on choosing which of the four indexes to use.
Choice of index
Depending on the aim or context of the analysis, one of the SEIFA indexes may be more appropriate than the others. Below are some aspects to be considered.
- The concept and variables underlying each index. The concepts behind each index are described in defining the concept behind each of the four indexes. The final variable lists for each index are in the technical details of each index: variables and loadings.
- The degree to which the four indexes are correlated with each other – this is discussed in relationships between the indexes.
- The IRSD ranks areas on a continuum from most disadvantaged to least disadvantaged, while the other three indexes (IRSAD, IER, IEO) rank areas on a continuum from most disadvantaged/least advantaged to most advantaged/least disadvantaged.
- The IRSD and IRSAD are more general measures in the sense that they summarise variables from a wider range of socio-economic dimensions. The IER and IEO are more targeted measures aimed at capturing narrower concepts.
- Simpler measures, such as income or employment status, may be more appropriate than SEIFA for some analysis. For an in-depth discussion on choosing a socio-economic measure, refer to Information Paper: Measures of Socioeconomic Status, New Issue for June 2011.
Using index scores for areas larger than SA1
Given that the indexes are area level measures, they tend to mask some underlying diversity. In some applications of the indexes, it may be important to identify diversity of socio-economic characteristics within areas.
When using an index at a geographic level higher than SA1 (e.g. SA2s and LGAs), we do have some scope to assess the diversity within that area by looking at its constituent SA1s. There is further discussion about assessing diversity within areas in Wise and Mathews (2011) and Radisich and Wise (2012). The second paper also proposes an additional measure that can be used to identify diverse larger areas. This measure is called the ‘SA1-concentration score’ and can identify the presence of disadvantaged SA1s within an overall advantaged large area.
To enable the analyses described above, an additional type of output has been released for SEIFA 2021. For all geographic levels higher than SA1 for which index scores are released, the corresponding SA1 distributions within those areas have been presented in spreadsheets.
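One simple way to examine diversity within a larger area is to tabulate the deciles of its constituent SA1s. The sketch below is illustrative only: the decile values are hypothetical, and the share of SA1s in the lowest deciles is a plain summary chosen for this example, not the 'SA1-concentration score' proposed by Radisich and Wise (2012).

```python
from collections import Counter


def sa1_decile_distribution(sa1_deciles):
    """Count the SA1s in each decile (1 to 10) within one larger area."""
    counts = Counter(sa1_deciles)
    return [counts.get(decile, 0) for decile in range(1, 11)]


def share_in_lowest_deciles(sa1_deciles, cutoff=2):
    """Proportion of SA1s whose decile is at or below the cutoff."""
    if not sa1_deciles:
        raise ValueError("no SA1s supplied")
    low = sum(1 for decile in sa1_deciles if decile <= cutoff)
    return low / len(sa1_deciles)


# Hypothetical SA1 deciles within one mostly advantaged larger area.
example_area = [9, 10, 8, 9, 1, 2, 10, 9]
distribution = sa1_decile_distribution(example_area)
low_share = share_in_lowest_deciles(example_area)
print(distribution, low_share)
```

In this invented example, most SA1s sit in deciles 8 to 10, yet a quarter fall in the two lowest deciles, which a single area-level score would not reveal.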
As noted previously, SEIFA scores for SALs and POAs with small populations should be used with caution, because the SA1 boundaries may not correspond closely to the higher level area. For more information, refer to geographic output levels for SEIFA 2021.
Mapping the indexes
Maps of the indexes are an excellent way of observing the spatial distribution of relative socio-economic advantage and disadvantage. Refer to interactive maps for available maps of the SEIFA 2021 indexes.
Using the indexes as contextual variables in social analysis
SEIFA index ranks and deciles are commonly merged onto a person level dataset based on the area in which that person resides. The indexes can then be used to help investigate the relationship between disadvantage or advantage and other variables of interest. This type of analysis can yield some very interesting findings; however, it is important to interpret the findings correctly. Some interpretive issues are discussed below.
A SEIFA index refers to the area in which a person lives. It is a contextual variable. It is incorrect to say that a person is very disadvantaged just because they live in a very disadvantaged area. It is true that living in a very disadvantaged area may disadvantage them to a certain extent, but it is possible that they are advantaged in other respects such as having a good education and earning a high income, and are therefore not typical of other residents in that area. The issue of diversity of individuals within areas is further investigated and discussed in SEIFA: Getting a Handle on Individual Diversity Within Areas, 2011.
It is desirable to use the smallest geographic unit possible when merging an index to another dataset. In the case of SEIFA 2021, the SA1 is the smallest unit available, and if possible, SA1s should be derived on the dataset to which SEIFA scores are being appended.
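Merging an index onto a person level dataset amounts to a lookup on the area code. The sketch below assumes each person record already carries an SA1 code; the field names (`sa1_code`, `irsd_decile`), the codes and the decile values are all hypothetical.

```python
def attach_seifa(person_records, seifa_by_sa1, field="irsd_decile"):
    """Return copies of the person records with an area-level value attached.

    person_records: list of dicts, each with an "sa1_code" key.
    seifa_by_sa1: dict of SA1 code -> index decile (or rank).
    Records whose SA1 has no entry receive None for the new field.
    """
    merged = []
    for person in person_records:
        row = dict(person)  # copy so the input records are not modified
        row[field] = seifa_by_sa1.get(person["sa1_code"])
        merged.append(row)
    return merged


# Hypothetical lookup table and person level dataset.
seifa_lookup = {"SA1_A": 3, "SA1_B": 8}
people = [
    {"person_id": 1, "sa1_code": "SA1_A"},
    {"person_id": 2, "sa1_code": "SA1_B"},
    {"person_id": 3, "sa1_code": "SA1_C"},
]
merged = attach_seifa(people, seifa_lookup)
print(merged)
```

The attached value describes the area in which the person lives, not the person, so it should be labelled and interpreted as a contextual variable in any later analysis.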
Area-based quantiles versus population-based quantiles
The word ‘quantiles’ is used to collectively describe measures such as percentiles and deciles. In the spreadsheets in which the indexes are presented, quantiles (percentiles and deciles) are presented in addition to the index scores and rankings, as described in basic output: scores, ranks, deciles and percentiles. These quantiles are calculated based on dividing the number of areas into equal groups. These are called area-based quantiles.
An alternative way of defining the quantiles is to divide them into equal groups based on the number of people living in those areas. The quantiles would then contain an equal number of people (or at least as can be best achieved) in each group, rather than an equal number of areas. These are called population-based quantiles.
The ABS publishes area-based quantiles because they are easier to interpret, since SEIFA is an area-based measure. They also serve most analytical purposes. There are some instances in which the use of population-based quantiles is appropriate. Users can create their own population-based quantiles using information already available in the output spreadsheets. Population-based deciles are also available in Census TableBuilder. As mentioned above, population-based quantiles can be difficult to interpret, so users should take care in how they are applied. The population-based quantiles represent groups of individuals who live in similarly ranked areas, as opposed to groups of similarly ranked individuals.
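The following sketch shows one way a user might derive population-based deciles from area scores and populations. The data are hypothetical, and the rule for areas that straddle a decile boundary (assigning each area by the midpoint of its share of the cumulative population) is an assumption made for this example, not a documented ABS rule.

```python
def population_based_deciles(areas):
    """Assign each area to a population-based decile.

    areas: list of (area_id, score, population) tuples (hypothetical layout).
    Areas are ordered from lowest to highest score, then grouped so that
    each decile covers roughly one tenth of the total population.
    """
    ordered = sorted(areas, key=lambda area: area[1])
    total_population = sum(area[2] for area in ordered)
    if total_population == 0:
        raise ValueError("total population is zero")
    deciles = {}
    cumulative = 0
    for area_id, _score, population in ordered:
        # Assumed rule: place the area by the midpoint of its population span.
        midpoint = cumulative + population / 2
        deciles[area_id] = min(10, int(midpoint * 10 / total_population) + 1)
        cumulative += population
    return deciles


# Five hypothetical areas; area C holds 60 per cent of the population.
example_areas = [
    ("A", 900.0, 100),
    ("B", 950.0, 100),
    ("C", 1000.0, 600),
    ("D", 1050.0, 100),
    ("E", 1100.0, 100),
]
pop_deciles = population_based_deciles(example_areas)
print(pop_deciles)
```

With so few areas, the groups cannot hold equal numbers of people, which illustrates the "as can be best achieved" caveat: one large area occupies a single decile even though its population spans several.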
References
Australian Bureau of Statistics (Aug 2007), Australian Social Trends, 2007, ABS Website, accessed 20 April 2023.
Australian Bureau of Statistics (Jun 2011), Information Paper: Measures of Socioeconomic Status, New Issue for June 2011, ABS Website, accessed 20 April 2023.
Australian Bureau of Statistics (Nov 2019), ANZSCO - Australian and New Zealand Standard Classification of Occupations, 2013, Version 1.3, ABS Website, accessed 20 April 2023.
Australian Bureau of Statistics (2019-20), Housing Occupancy and Costs, ABS Website, accessed 20 April 2023.
Australian Bureau of Statistics (2021), Census of Population and Housing: Census dictionary, ABS Website, accessed 20 April 2023.
Australian Bureau of Statistics (Jul 2021 - Jun 2026), Australian Statistical Geography Standard (ASGS) Edition 3, ABS Website, accessed 20 April 2023.
Australian Bureau of Statistics (May 2022), Education and Work, Australia, ABS Website, accessed 20 April 2023.
Jolliffe, I.T. (1986) Principal Component Analysis, Springer–Verlag, New York.
O’Rourke, N.; Hatcher, L. and Stepanski, E.J. (2005) A Step-by-Step Approach to Using SAS for Univariate and Multivariate Statistics, Second Edition, SAS Institute Inc., Cary, NC.
Radisich, P. and Wise, P. (2012) “Socio-Economic Indexes For Areas: Robustness, Diversity Within Larger Areas and the New Geography Standard”, Methodology Research Papers, cat. no. 1351.0.55.038, Australian Bureau of Statistics, Canberra.
Wise, P. and Mathews, R. (2011) “Socio-Economic Indexes For Areas: Getting a Handle on Individual Diversity Within Areas”, Methodology Research Papers, cat. no. 1351.0.55.036, Australian Bureau of Statistics, Canberra.
Historical research papers
Over the years, the ABS has released several research papers that have documented research and development the ABS has performed on different aspects of the SEIFA indexes.
Wise, P. and Williamson, C. (2013) “Building on SEIFA: Finer Levels of Socio-Economic Summary Measures”, Methodology Research Papers, cat. no. 1352.0.55.135, Australian Bureau of Statistics, Canberra.
Radisich, P. and Wise, P. (2012) “Socio-Economic Indexes For Areas: Robustness, Diversity Within Larger Areas and the New Geography Standard”, Methodology Research Papers, cat. no. 1351.0.55.038, Australian Bureau of Statistics, Canberra.
Wise, P. and Mathews, R. (2011) “Socio-Economic Indexes For Areas: Getting a Handle on Individual Diversity Within Areas”, Methodology Research Papers, cat. no. 1351.0.55.036, Australian Bureau of Statistics, Canberra.
Baker, J. and Adhikari, P. (2007) “Socio-Economic Indexes for Individuals and Families”, Methodology Research Paper, cat. no. 1352.0.55.086, Australian Bureau of Statistics, Canberra.
Ciurej, M.; Tanton, R. and Sutcliffe, A. (2006) “Analysis of the Regional Distribution of Relatively Disadvantaged Areas using 2001 SEIFA”, Methodology Research Paper, cat. no. 1351.0.55.013, Australian Bureau of Statistics, Canberra.
Adhikari, P. (2006) “Socio-Economic Indexes for Areas: Introduction, Use and Future Directions”, Methodology Research Paper, cat. no. 1351.0.55.015, Australian Bureau of Statistics, Canberra.
Appendix: Variable specifications
This appendix gives descriptions of each variable considered for inclusion in one of the 2021 indexes. The description of the variable proportion is followed by two bullet points; the first is a description of the numerator, the second is a description of the denominator. The square brackets contain specifications for creating the numerator/denominator from Census data items, according to the mnemonics used in the Census of Population and Housing: Census Dictionary, 2021. The variables are arranged by socio-economic dimension.
Notes:
- The Skill Level for each occupation can be found in ANZSCO – Australian and New Zealand Standard Classification of Occupations, Version 1.3.
- Household composition was ‘not classifiable’ if the household: contained only visitors or persons aged under 15 years on Census night; or was determined to be occupied on Census night but the collector could not make contact; or could not be classified because there was insufficient information on the Census form.
- The Canadian National Occupancy Standard determines housing appropriateness, using the number of bedrooms and the number, age, sex and relationships of household members. For more information refer to Housing Occupancy and Costs, Australia, 2019–20.
Income variables
Variable mnemonic | Variable description |
---|---|
INC_LOW | Per cent of people living in households with stated annual household equivalised income between $1 and $25,999 (approx. 1st and 2nd deciles) |
INC_HIGH | Per cent of people living in households with stated annual household equivalised income greater than or equal to $91,000 (approx. 9th and 10th deciles) |
Education variables
Variable mnemonic | Variable description |
---|---|
ATSCHOOL | Per cent of people aged 15 years and over who are attending secondary school |
ATUNI | Per cent of people aged 15 years and over attending university or other tertiary institution |
CERTIFICATE | Per cent of people aged 15 years and over whose highest level of education is a Certificate Level III or IV qualification |
DEGREE | Per cent of people aged 15 years and over whose highest level of education is a bachelor degree qualification or higher |
DIPLOMA | Per cent of people aged 15 years and over whose highest level of education is a diploma or advanced diploma |
NOEDU | Per cent of people aged 15 years and over who have no formal educational attainment |
NOYR12ORHIGHER | Per cent of people aged 15 years and over whose highest level of educational attainment is Year 11 or lower (includes Certificate Levels I and II; excludes those still at secondary school) |
Employment variables
Variable mnemonic | Variable description |
---|---|
UNEMPLOYED | Per cent of people in the labour force who are unemployed |
UNEMPLOYED_IER | Per cent of people aged 15 and over who are unemployed |
Occupation variables
Variable mnemonic | Variable description |
---|---|
OCC_DRIVERS | Per cent of employed people classified as Machinery Operators and Drivers |
OCC_LABOUR | Per cent of employed people classified as Labourers |
OCC_MANAGER | Per cent of employed people classified as Managers |
OCC_PROF | Per cent of employed people classified as Professionals |
OCC_SALES_L | Per cent of employed people classified as Low-Skill Sales Workers |
OCC_SERVICE_L | Per cent of employed people classified as Low-Skill Community and Personal Service Workers |
OCC_SKILL1 | Per cent of employed people who work in a Skill Level 1 occupation |
OCC_SKILL2 | Per cent of employed people who work in a Skill Level 2 occupation |
OCC_SKILL4 | Per cent of employed people who work in a Skill Level 4 occupation |
OCC_SKILL5 | Per cent of employed people who work in a Skill Level 5 occupation |
Housing variables
Variable mnemonic | Variable description |
---|---|
FEWBED | Per cent of occupied private dwellings with one or no bedrooms |
HIGHBED | Per cent of occupied private dwellings with four or more bedrooms |
HIGHMORTGAGE | Per cent of occupied private dwellings paying more than $3,000 per month in mortgage repayments |
HIGHRENT | Per cent of occupied private dwellings paying more than $500 per week in rent |
LOWRENT | Per cent of occupied private dwellings paying less than $250 per week in rent (excluding $0 per week) |
OVERCROWD | Per cent of occupied private dwellings requiring one or more extra bedrooms (based on Canadian National Occupancy Standard) |
OWNING | Per cent of occupied private dwellings owning the dwelling they occupy (without a mortgage) |
MORTGAGE | Per cent of occupied private dwellings owning the dwelling they occupy (with a mortgage) |
SPAREBED | Per cent of occupied private dwellings with one or more bedrooms spare (based on Canadian National Occupancy Standard) |
Other variables
Variable mnemonic | Variable description |
---|---|
CHILDJOBLESS | Per cent of families with children under 15 years of age and jobless parents |
DISABILITYU70 | Per cent of people aged under 70 who need assistance with core activities due to a long-term health condition, disability or old age |
ENGLISHPOOR | Per cent of people who do not speak English well |
GROUP | Per cent of occupied private dwellings that are group occupied private dwellings |
HIGHCAR | Per cent of occupied private dwellings with three or more cars |
LONE | Per cent of occupied private dwellings that are lone person occupied private dwellings |
NOCAR | Per cent of occupied private dwellings with no cars |
ONEPARENT | Per cent of families that are one parent families with dependent offspring only |
SEPDIVORCED | Per cent of people aged 15 and over who are separated or divorced |
UNINCORP | Per cent of occupied private dwellings with at least one person who is an owner of an unincorporated enterprise |