6523.0 - Household Income and Income Distribution, Australia, 2002-03

ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 03/12/2004

APPENDIX 1 ANALYSING INCOME DISTRIBUTION

INTRODUCTION

There are many ways to illustrate aspects of the distribution of income and to measure the extent of income inequality. In this publication, five main types of indicator are used - means and medians, frequency distributions, percentile ratios, income shares, and Gini coefficients. This Appendix describes how these indicators are derived.

The Gini coefficient is a single statistic summary of inequality. Analysts sometimes use other single statistic summaries in addition to, or instead of, the Gini coefficient. This appendix also compares the Gini coefficient with two such alternative summary measures: the Theil index and the Atkinson index.

MEAN AND MEDIAN

Mean household income (average household income) and median household income (the midpoint when all persons or households are ranked in ascending order of household income) are simple indicators that can be used to show income differences between subgroups of the population. Many tables in this publication include mean household income and median household income data.

In most cases, the income measure used is equivalised disposable household income. As described in Appendix 3, equivalised disposable household income can be viewed as an indicator of the economic resources available to each member of a household. In this publication, therefore, the mean and median values of equivalised disposable household income are always calculated with respect to the relevant number of persons, even where the table is describing households. Measures calculated in this way are sometimes known as person weighted measures. The method of calculation is described under 'Estimation' in the Explanatory Notes.

In some tables describing households, the mean and median of gross household income are also shown. These measures are calculated with respect to the relevant number of households, not persons. They are sometimes known as household weighted measures.
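The difference between person weighted and household weighted measures can be illustrated with a minimal sketch. It assumes unweighted toy records of (income, number of persons) and ignores the survey weights that real estimates would use. The same toy incomes are used for both weightings purely to show the effect of the weights; in the publication, household weighting is applied to gross household income. All function names are hypothetical.

```python
from statistics import median

# Hypothetical toy records (not SIH data):
# (equivalised disposable household income, number of persons in household)
households = [(300.0, 1), (450.0, 4), (600.0, 2), (1200.0, 1)]


def person_weighted_mean(records):
    """Mean over persons: each household's income counts once per member."""
    total_persons = sum(n for _, n in records)
    return sum(y * n for y, n in records) / total_persons


def person_weighted_median(records):
    """Median over persons, found by expanding each household into its members."""
    return median([y for y, n in records for _ in range(n)])


def household_weighted_mean(records):
    """Mean over households: each household counts once, whatever its size."""
    return sum(y for y, _ in records) / len(records)


print(person_weighted_mean(households))     # 562.5
print(person_weighted_median(households))   # 450.0
print(household_weighted_mean(households))  # 637.5
```

The four-person household pulls the person weighted measures towards its income, whereas the household weighted mean gives it the same weight as the single-person households.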

FREQUENCY DISTRIBUTION

A frequency distribution illustrates the location and spread of income within a population. It groups the population into classes by size of household income and gives the number or proportion of people in each income range. A graph of the frequency distribution is a good way to portray the essence of the income distribution. The graph in the Summary of Findings shows the proportion of people within $50 household income ranges.

Frequency distributions can provide considerable detail about variations in the income of the population being described, but it is difficult to describe the differences between two frequency distributions. They are therefore often accompanied by other summary statistics, such as the mean and median. Taken together, the mean and median can provide an indication of the shape of the frequency distribution. As can be seen in the graph (figure 4) in the Summary of Findings, the distribution of income tends to be asymmetrical, with a small number of people having relatively high household incomes and a larger number of people having relatively lower household incomes. The greater the asymmetry, the greater will be the difference between the mean and the median.
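A frequency distribution in $50 income ranges, and the gap between mean and median that signals asymmetry, can be sketched as follows. The incomes are invented for illustration and each record is treated as one person with equal weight; `frequency_distribution` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean, median


def frequency_distribution(incomes, width=50):
    """Proportion of people in each income range [k*width, (k+1)*width).

    Keys are the lower bounds of the ranges; empty ranges are omitted.
    """
    counts = Counter(int(y // width) for y in incomes)
    n = len(incomes)
    return {k * width: counts[k] / n for k in sorted(counts)}


# Hypothetical right-skewed weekly incomes (illustrative only, not SIH data).
incomes = [120, 180, 210, 240, 260, 280, 310, 330, 380, 420, 610, 1500]

dist = frequency_distribution(incomes)
gap = mean(incomes) - median(incomes)  # positive: the long upper tail pulls the mean up
print(dist)
print(round(gap, 2))
```

A single very high income moves the mean well above the median, while the median is determined only by the middle-ranked people.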

QUANTILE MEASURES

When persons (or any other units) are ranked from the lowest to the highest on the basis of some characteristic such as their household income, they can then be divided into equally sized groups. The generic term for such groups is quantiles.

Quintiles, deciles and percentiles

When the population is divided into five equally sized groups, the quantiles are called quintiles. If there are 10 groups, they are deciles, and division into 100 groups gives percentiles. Thus the first quintile will comprise the first two deciles and the first 20 percentiles.

This publication frequently presents data classified into income quintiles, supplemented by data relating to the 2nd and 3rd deciles. These deciles are included so that quintile style analysis can be carried out without undue influence from very low incomes, which may not accurately reflect levels of economic wellbeing (see paragraph 13 in the Explanatory Notes).

Equivalised disposable household income is the income measure used to define the quantiles shown in this publication, and the quantiles each comprise the same number of persons, that is, they are person weighted.
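A minimal sketch of person weighted quantile groups follows, assuming unweighted person-level records; `assign_quantile_groups` is a hypothetical helper. Each person is ranked by income and placed in one of a given number of equal-sized groups, which also shows why the first quintile comprises the first two deciles.

```python
def assign_quantile_groups(incomes, groups=5):
    """Rank people by income and split them into `groups` equal-sized groups.

    Returns each person's group number (1 = lowest), in the original order.
    If len(incomes) is not a multiple of `groups`, group sizes differ by at
    most one person; survey estimates would instead use weighted population shares.
    """
    n = len(incomes)
    order = sorted(range(n), key=lambda i: incomes[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * groups // n + 1
    return group


incomes = [50, 10, 40, 20, 30, 90, 70, 60, 80, 100]  # hypothetical persons
quintiles = assign_quantile_groups(incomes, 5)
deciles = assign_quantile_groups(incomes, 10)
print(quintiles)  # [3, 1, 2, 1, 2, 5, 4, 3, 4, 5]
```

With ten people, deciles 1 and 2 map to quintile 1, deciles 3 and 4 to quintile 2, and so on.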

Upper values and medians

In some analyses, the statistic of interest is the boundary between quantiles. This is usually expressed in terms of the upper value of a particular percentile. For example, the upper value of the first quintile is also the upper value of the 20th percentile and is described as P20. The upper value of the ninth decile is P90. The median of a whole population is P50, the median of the 3rd quintile is also P50, the median of the first quintile is P10, etc.

Percentile ratios

Percentile ratios summarise the relative distance between two points on the income distribution. To illustrate the full spread of the income distribution, the percentile ratio needs to refer to points near the extremes of the distribution, for example, the P90/P10 ratio. The P80/P20 ratio better illustrates the magnitude of the range within which the incomes of the majority of the population fall. The P80/P50 and P50/P20 ratios focus on comparing the ends of the income distribution with the midpoint.
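Upper values and percentile ratios can be sketched as below. The sketch assumes the nearest-rank convention, under which P_k is the income of the person at rank ceil(k x n / 100); other interpolation conventions give slightly different values, and the ABS convention is not stated here. Function names are hypothetical.

```python
def upper_value(incomes, k):
    """P_k under the nearest-rank convention: the income of the person at
    rank ceil(k * n / 100) when people are ranked from lowest to highest."""
    ranked = sorted(incomes)
    n = len(ranked)
    rank = max(1, (k * n + 99) // 100)  # integer ceiling avoids float rounding
    return ranked[rank - 1]


def percentile_ratios(incomes):
    """The four percentile ratios discussed in the text."""
    p = {k: upper_value(incomes, k) for k in (10, 20, 50, 80, 90)}
    return {
        "P90/P10": p[90] / p[10],
        "P80/P20": p[80] / p[20],
        "P80/P50": p[80] / p[50],
        "P50/P20": p[50] / p[20],
    }


print(percentile_ratios(list(range(100, 1100, 100))))
```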

Income share

Income shares can be calculated and compared for each income quintile (or any other subgrouping) of a population. The aggregate income of the units in each quintile is divided by the overall aggregate income of the entire population to derive income shares.
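The income share calculation can be sketched for quintiles as follows, assuming unweighted person-level records whose count is a multiple of five; `quintile_income_shares` is a hypothetical helper.

```python
def quintile_income_shares(incomes):
    """Share of aggregate income received by each person weighted quintile.

    Assumes len(incomes) is a multiple of 5 so quintiles hold equal numbers
    of people; survey estimates would use weighted population shares instead.
    """
    ranked = sorted(incomes)
    n = len(ranked)
    if n % 5:
        raise ValueError("sketch assumes the population size is a multiple of 5")
    size = n // 5
    total = sum(ranked)
    return [sum(ranked[q * size:(q + 1) * size]) / total for q in range(5)]


shares = quintile_income_shares(list(range(100, 1100, 100)))  # hypothetical incomes
print([round(s, 3) for s in shares])
```

The shares always sum to one, and under perfect equality each quintile would receive exactly 20%.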

SINGLE STATISTIC SUMMARIES OF INEQUALITY

Taken together, the simple measures of income distribution described above can provide an indication of changes in the income distribution of a population over time, or differences in the income distributions of two separate populations. However, none of the simple measures comprises a single statistic that summarises the whole income distribution in a way that directly takes into account the individual incomes of all members of the population.

The remainder of this appendix considers some of the issues associated with compiling a single statistic summary of inequality, and compares a number of alternative measures. The first is the Gini coefficient, which is the most commonly used summary measure, and in the past has been the only summary measure included in this publication. The Gini coefficient is compared with the Theil index and a number of Atkinson indexes.

Concept of income inequality

It is generally agreed that perfect equality in the distribution of income can be defined as the situation in which everyone in the population lives in a household with the same equivalised disposable household income (see Appendix 3, Equivalised Disposable Household Income). If any household has lower or higher equivalised disposable household income than any other household, there is inequality in the income distribution. (As for means and medians described above, inequality is measured with respect to the number of persons, but the concept of inequality applies equally if a population of households, income units or other units is under consideration.)

However, there is no unique, generally accepted way of summarising the degree to which a population does not have perfect equality, or, more practically, summarising the difference in inequality between two populations. Unequal distributions of income can occur in many different ways. The majority of people may have very similar incomes with pockets of very high or very low income. Or entire populations may be heavily clustered at the top and the bottom of the income distribution with few people receiving incomes in between these extremes. To evaluate one income distribution as having greater or lesser inequality than another income distribution, it is necessary to compare the distributions in terms of which segments of the population have a greater share of income and which segments have a lower share. It is then necessary to at least implicitly judge whether the relative gain in income by some people is more than offset or less than offset by the relative loss of income by some other people. Different observers may make different judgments about the same situation, depending on personal preferences, etc. Different summary measures of inequality embody different judgments about the relative gains and losses. As will be seen below, some measures allow the user to explicitly set a parameter to reflect the judgment of the user in this regard.

Simple examples of different patterns of inequality can be used to illustrate the issues under consideration.

For the first example, consider the equivalised disposable household income of the two populations A and B depicted in the diagram A1, Frequency Distributions I. Population A is derived from the 2000-01 Survey of Income and Housing (SIH) population after removing people in households with zero income (the reason for deleting households with zero income is explained later in this appendix). Population B covers the same people as population A, but everyone's income is transformed in a particular way that reduces the proportional differences in income across the population while retaining the same mean income for the population. There are therefore fewer people on very low or very high incomes and more people in between these extremes, with the median for population B closer to the mean, and less spread between P10 and P90.
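The appendix does not specify the transformation used to derive population B. As a purely hypothetical illustration of one transformation with the stated properties (same mean, smaller proportional differences, unchanged ranking), the sketch below raises each income to a power `beta` between 0 and 1 and then rescales so that the mean is unchanged. `compress_incomes` and `beta` are illustrative names, not the ABS method.

```python
def compress_incomes(incomes, beta=0.8):
    """Return c * y**beta for each income y, with c chosen to keep the mean unchanged.

    For 0 < beta < 1 every income ratio y_i / y_j becomes (y_i / y_j) ** beta,
    so proportional differences shrink, while each person keeps their rank.
    Assumes strictly positive incomes (zero-income people removed, as for population A).
    """
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    powered = [y ** beta for y in incomes]
    c = sum(incomes) / sum(powered)
    return [c * p for p in powered]


a = [1, 100, 400, 900, 2500]  # hypothetical incomes
b = compress_incomes(a)
print([round(v, 1) for v in b])
```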

The extent to which the income distributions for populations A and B vary from equality, and from each other, can be illustrated graphically another way, using Lorenz curves.

Lorenz curves

The Lorenz curve is a graph with the horizontal axis showing the cumulative proportion of the persons in the population ranked according to their income and with the vertical axis showing the corresponding cumulative proportion of equivalised disposable household income. The graph then shows the income share of any selected cumulative proportion of the population. The diagonal line represents a situation of perfect equality, that is, all people have the same equivalised disposable household income. The diagram A2, Lorenz Curves I shows the Lorenz curves for the two populations described above.
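The points of a Lorenz curve can be computed directly from person-level incomes, as in this minimal sketch (unweighted records assumed; `lorenz_curve` is a hypothetical helper). Each point pairs the cumulative share of people, ranked from lowest to highest income, with the cumulative share of income they receive.

```python
def lorenz_curve(incomes):
    """Points (p, L(p)) of the Lorenz curve, one per person, starting at (0, 0).

    p is the cumulative share of people ranked from lowest to highest income,
    and L(p) is the cumulative share of total income they receive.
    """
    ranked = sorted(incomes)
    n, total = len(ranked), sum(ranked)
    points, cumulative = [(0.0, 0.0)], 0.0
    for i, y in enumerate(ranked, start=1):
        cumulative += y
        points.append((i / n, cumulative / total))
    return points


print(lorenz_curve([1, 1, 1, 1]))   # lies on the diagonal: perfect equality
print(lorenz_curve([0, 0, 0, 10]))  # stays at zero until the last person
```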

Since the distribution of population B's income is uniformly less widely spread than for population A, all points of the Lorenz curve for population B are closer to the line of perfect equality than the corresponding points of the Lorenz curve for population A. In this situation, population B is said to be in a position of Lorenz dominance and can be regarded as having a more equal income distribution than population A.

However, if the Lorenz curves of two populations cross over there is no Lorenz dominance and there is no generally accepted way of defining which of the two populations has the more equal income distribution.
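Lorenz dominance can be checked by comparing the two curves point by point. The sketch below assumes two populations with the same number of people (as for populations A to D, which cover the same people); both curves are then straight lines between the points i/n, so comparing them at those points is sufficient. Function names are hypothetical.

```python
def lorenz_ordinates(incomes):
    """Cumulative income shares L(i/n), i = 1..n, for people ranked by income."""
    ranked = sorted(incomes)
    total = sum(ranked)
    shares, cumulative = [], 0.0
    for y in ranked:
        cumulative += y
        shares.append(cumulative / total)
    return shares


def lorenz_compare(a, b, tol=1e-12):
    """Classify two equal-sized populations by Lorenz dominance.

    Returns "a dominates" if a's curve is nowhere below b's and somewhere above,
    "b dominates" for the reverse, "identical" if the curves coincide,
    and "curves cross" otherwise.
    """
    if len(a) != len(b):
        raise ValueError("sketch assumes populations of equal size")
    la, lb = lorenz_ordinates(a), lorenz_ordinates(b)
    a_above = any(x > y + tol for x, y in zip(la, lb))
    b_above = any(y > x + tol for x, y in zip(la, lb))
    if a_above and b_above:
        return "curves cross"
    if a_above:
        return "a dominates"
    if b_above:
        return "b dominates"
    return "identical"


print(lorenz_compare([5, 5, 5, 5], [1, 2, 3, 14]))  # a dominates
print(lorenz_compare([1, 5, 5, 9], [3, 3, 3, 11]))  # curves cross
```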

Consider the income distributions of the populations in a second example, as shown in the diagram A3, Frequency Distributions II. Population A is the same as in the first example above. Populations C and D also cover the same people as population A, and all three populations have the same mean income. However, the incomes of populations C and D are transformed in such a way that lower income people are relatively better off than in population A and higher income people are also relatively better off than in population A. Conversely, the incomes of people in the middle of the distribution are relatively reduced, so that the mean income of the three populations remains the same. The transformations also leave each person's rank in the income distribution unchanged. For population A, the lowest income is $1; for population C it is about $180, and for population D it is about $150. The incomes of higher income people have received a relatively greater boost in population D than in population C.

The medians (not shown in the diagram) are higher for populations C and D than for A, but all are below the mean. As for population B in the earlier diagram, P10 for populations C and D is above P10 for population A. However, in contrast to population B, populations C and D also have P90 above that of population A.

The diagram A4, Lorenz Curves II shows the resultant differences in the Lorenz curves, with the curves for both populations C and D crossing that of population A. Therefore there is ambiguity about whether populations C and D have greater or less income inequality than population A. Comparing populations C and D to population A, both lower and higher income people have a greater share of total income and middle income people have less. In population C, the lower income people show a relatively greater gain than the higher income people. Conversely, in population D, the higher income people show a relatively greater gain than the lower income people. However, the curve for population C does not cross that of population D, and therefore population C has Lorenz dominance over population D, that is, income is unambiguously distributed more equally in population C than in population D.

Table A8 shows the years for which the income distribution has Lorenz dominance over the income distributions of other years. Table A8 also shows the years for which the lack of Lorenz dominance is due only to the crossing of the Lorenz curves in the bottom decile of the income distribution, that part of the income distribution for which income is not necessarily a good indicator of economic wellbeing.

The Lorenz curves described in this appendix depict the relativities between income distributions and do not show whether incomes overall have been growing, contracting or remaining static. Another form of Lorenz curve, known as the Generalised Lorenz curve, scales the Lorenz curve by the mean income of the population, so that it depicts cumulative income divided by the total number of people rather than cumulative income shares. Generalised Lorenz curves can therefore be used to analyse differences in the level of income as well as differences in distribution, but do not show differences in inequality as clearly (see, for example, Deaton (1997)). In this publication, differences in the level of income for the low, middle and high income segments of the population are described in the Summary of Findings.
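The Generalised Lorenz curve multiplies each Lorenz ordinate by mean income, so its ordinate at i/n equals the sum of the i lowest incomes divided by n. A minimal sketch (hypothetical helper, unweighted records) shows that doubling every income leaves the ordinary Lorenz curve unchanged but doubles the Generalised Lorenz curve.

```python
def generalised_lorenz(incomes):
    """Ordinates GL(i/n) = mean income x L(i/n) = (sum of the i lowest incomes) / n."""
    ranked = sorted(incomes)
    n = len(ranked)
    ordinates, cumulative = [], 0.0
    for y in ranked:
        cumulative += y
        ordinates.append(cumulative / n)
    return ordinates


print(generalised_lorenz([1, 2, 3]))
print(generalised_lorenz([2, 4, 6]))  # every ordinate doubles with every income
```

The final ordinate is always the mean income of the population.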

Summary indicators

The three commonly used summary inequality measures mentioned earlier - the Gini coefficient, the Theil index, and the Atkinson index - can be produced for populations A, B, C and D. Table A5 provides the values for these measures with respect to each population, and descriptions of the measures follow. The Atkinson index is considered with a number of different settings of a user defined parameter, as described later.

A5 Comparison of inequality summary measures

Gini coefficient

The Gini coefficient can be defined by referring to the Lorenz curve. It is the ratio of the area between the actual Lorenz curve and the diagonal (the line of equality) to the total area under the diagonal. The Gini coefficient equals zero when all people have the same level of income and approaches one when one person receives all the income. In other words, the smaller the Gini coefficient, the more equal the distribution of income, given the assumptions underlying the Gini coefficient.

Table A5 shows that the Gini coefficient for population B is substantially below the coefficient for population A. The coefficient for population C is a little above that for population A, and the coefficient for population D is somewhat further above. According to the Gini coefficient, therefore, population B has a more equal income distribution than population A, but populations C and D have less equal distributions.

Mathematically, the Gini coefficient can be expressed as

$$G = \frac{1}{2n^{2}\bar{y}}\sum_{i=1}^{n}\sum_{j=1}^{n}\left|y_i - y_j\right|$$

where

$n$ is the number of people in the population

$\bar{y}$ is the mean equivalised disposable household income of all people in the population

and

$y_i$ and $y_j$ are the equivalised disposable household income of the ith and jth persons in the population.

The Gini coefficient is a summary of the differences between each person in the population and every other person in the population. The differences are the absolute arithmetic differences, and therefore a difference of $x between two relatively high income people contributes as much to the index as a difference of $x between two relatively low income people.
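A direct translation of the pairwise formula G = (sum over all i and j of |y_i - y_j|) / (2 n^2 ybar) is sketched below, together with the equivalent rank-based form G = 2 sum(i y_(i)) / (n sum y) - (n + 1)/n, where y_(i) is the ith smallest income. Both are standard expressions for the Gini coefficient; the function names are hypothetical and survey weights are ignored.

```python
def gini(incomes):
    """Gini coefficient: sum_i sum_j |y_i - y_j| / (2 * n**2 * ybar).

    Direct O(n^2) translation of the pairwise formula; fine for small examples.
    """
    n = len(incomes)
    ybar = sum(incomes) / n
    total_abs_diff = sum(abs(yi - yj) for yi in incomes for yj in incomes)
    return total_abs_diff / (2 * n * n * ybar)


def gini_sorted(incomes):
    """Equivalent O(n log n) form using ranks: 2*sum(i*y_(i))/(n*sum y) - (n+1)/n."""
    ranked = sorted(incomes)
    n = len(ranked)
    weighted = sum(i * y for i, y in enumerate(ranked, start=1))
    return 2 * weighted / (n * sum(ranked)) - (n + 1) / n


print(gini([1, 2, 3]), gini_sorted([1, 2, 3]))  # both equal 2/9 up to rounding
```

When one person of n receives all the income, both forms give (n - 1)/n, which approaches one as the population grows.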

An increase in the income of a person with income greater than median income will always increase the sum of absolute differences in the formula, and a decrease in the income of a person with income lower than median income will also always increase it. The extent of the increase will depend on the proportion of people that have income in the range between median income and the income of the person with the changed income, both before and after the change in income. Because any such change also moves the mean in the denominator, the net effect on the coefficient itself depends on that shift as well.

At the extremes, increasing the income of the person with the lowest income by $x or increasing the income of the person with the highest income by $x will respectively decrease and increase the sum of absolute differences by the same amount (assuming the lowest income person remains the lowest income person after the change). Because the mean rises in both cases, the resulting changes in the coefficient itself are not exactly equal in size.

Theil index

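Another commonly used summary statistic is the Theil index, which (with the symbols as defined for the Gini coefficient) can be expressed mathematically as

$$T = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{\bar{y}}\ln\left(\frac{y_i}{\bar{y}}\right)$$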

The Theil index ranges from zero, when all incomes are equal, to ln n, when one person receives all the income. It therefore takes a higher value when one person receives all the income in a larger population than when one person receives all the income in a smaller population. However, it has the same value for two unequally sized populations if income is distributed in the same proportions in the two populations, that is, if they have identical Lorenz curves. (The other single statistic summary indicators discussed in this appendix also have this characteristic.)

As for the Gini coefficient, if one population has Lorenz dominance over another population, the Theil index for the first population will be lower. Table A5 shows, therefore, that population B has a lower Theil index than population A, and population C has a lower Theil index than population D. The Theil index for population A is also below that for populations C and D.

The construction of the Theil index is substantially different from that of the Gini coefficient. Instead of comparing the income of each person with the income of every other person, the Theil index compares the income of each person with the mean income of the population.
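The comparison of each income with the mean can be seen in a direct translation of the Theil formula T = (1/n) sum (y_i/ybar) ln(y_i/ybar). This is a minimal sketch assuming non-negative, unweighted incomes; zero incomes are given a contribution of zero, using the limit x ln x -> 0 as x -> 0. The function name is hypothetical.

```python
import math


def theil(incomes):
    """Theil index T = (1/n) * sum((y_i / ybar) * ln(y_i / ybar)).

    Assumes non-negative incomes; zero incomes contribute nothing,
    using the limit x * ln(x) -> 0 as x -> 0.
    """
    n = len(incomes)
    ybar = sum(incomes) / n
    total = 0.0
    for y in incomes:
        if y > 0:
            r = y / ybar  # each income compared with the population mean
            total += r * math.log(r)
    return total / n


print(theil([4, 4, 4]))      # 0.0: perfect equality
print(theil([0, 0, 0, 12]))  # ln 4: one person of four receives all income
```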

Atkinson index

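The Atkinson index is a more complex summary statistic. As in the Theil index, it contains a ratio comparison of each person's income with the population mean. But it also requires the user to set a parameter, ε, specifying a level of 'inequality aversion'. With the symbols as defined for the Gini coefficient, the mathematical expression is

$$A_{\varepsilon} = 1 - \left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i}{\bar{y}}\right)^{1-\varepsilon}\right]^{\frac{1}{1-\varepsilon}}$$

for ε not equal to one, and

$$A_{1} = 1 - \prod_{i=1}^{n}\left(\frac{y_i}{\bar{y}}\right)^{\frac{1}{n}}$$

for ε equal to one.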

An Atkinson index always has a value between zero and one, regardless of the value of ε. For any given value of ε, a lower value of the Atkinson index implies a greater degree of equality in the income distribution.

The 'inequality aversion' parameter, ε, in effect specifies how much more benefit the user thinks an extra dollar would provide to a person with lower income compared with the benefit an extra dollar would provide to a person with higher income. At the extreme of ε set to zero, the user has no 'inequality aversion'. The benefit of an extra dollar is assumed to be the same for everyone in the population, and the Atkinson index is always equal to zero regardless of whether the incomes in the population are widely dispersed or not.

The higher the setting of ε, the greater the relative benefit derived by a lower income person receiving an extra dollar compared with a higher income person receiving an extra dollar. Consequently, the higher the setting of ε, the more sensitive the Atkinson index is to the ratios of the lowest incomes in the population to the mean income of the population. In particular, if a population has a number of people with income very close to zero, that is, only a very small proportion of mean income, their influence can dominate the Atkinson index and it has a value close to one.
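The effect of the inequality aversion parameter can be explored with a direct translation of the two Atkinson formulas. This is a minimal sketch assuming non-negative, unweighted incomes; `atkinson` is a hypothetical helper. For ε of one or more, a zero income would require the logarithm of zero or zero raised to a negative power, so the sketch rejects that case.

```python
import math


def atkinson(incomes, epsilon):
    """Atkinson index with inequality-aversion parameter epsilon >= 0.

    epsilon != 1: A = 1 - [(1/n) * sum((y_i/ybar) ** (1 - epsilon))] ** (1 / (1 - epsilon))
    epsilon == 1: A = 1 - prod((y_i/ybar) ** (1/n)), computed through logarithms
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if min(incomes) < 0:
        raise ValueError("sketch assumes non-negative incomes")
    n = len(incomes)
    ybar = sum(incomes) / n
    ratios = [y / ybar for y in incomes]
    if epsilon >= 1 and min(ratios) == 0:
        # Needs log(0) (epsilon == 1) or 0 raised to a negative power (epsilon > 1).
        raise ValueError("index undefined for zero incomes when epsilon >= 1")
    if epsilon == 1:
        return 1 - math.exp(sum(math.log(r) for r in ratios) / n)
    mean_power = sum(r ** (1 - epsilon) for r in ratios) / n
    return 1 - mean_power ** (1 / (1 - epsilon))


incomes = [1, 4]  # hypothetical two-person population
for eps in (0.0, 0.5, 1.0, 2.0):
    print(eps, atkinson(incomes, eps))
```

For this two-person population the index rises from about 0 at ε = 0 to about 0.1, 0.2 and 0.36 at ε = 0.5, 1.0 and 2.0, showing the growing weight given to the lower income as ε increases.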

Table A5 presents the Atkinson index with various settings of ε between 0.5 and 2.0. As expected, the Atkinson indexes for population B are always lower than those for population A, reflecting the Lorenz dominance of population B over population A. Similarly, the Atkinson indexes for population C are always lower than those for population D. However, comparing populations C and D with populations A and B gives a mixed picture.

The higher the setting of ε, the more emphasis the Atkinson index gives to the lowest values in the income distribution. Populations A and B have some values less than one hundredth of the mean, but populations C and D do not. Therefore the Atkinson index increases more quickly for populations A and B as the setting of ε is increased. For ε set to 1.0 and above, population A is measured as having greater income inequality than population C; for ε set to 1.5 and above, population A has greater income inequality than population D; and for ε set to 2.0, population B also has greater income inequality than population C.

A complicating factor is that, for ε set to 1.0 or above, the Atkinson index cannot be calculated for a population containing zero incomes, because the formula then requires the logarithm of zero or zero raised to a negative power. Over one per cent of the SIH population has zero equivalised disposable household income, including people with reported negative incomes, which are set to zero when equivalised.

Comparison of summary measures

Table A6 provides the chosen summary measures for all years in which the SIH has been conducted, together with the standard errors of the estimates in 2002-03. In 1995-96, 1997-98 and 1999-2000, all indicators moved in the same direction, consistently pointing to either an increase or a decrease in inequality. In the other years there was a mixed picture. Over the whole period, all indicators show an increase in inequality, although none of the movements is significant at the 95% confidence level. Standard errors for years before 2002-03 tend to be higher than those for 2002-03 because the 2002-03 SIH had a larger sample than the earlier SIHs.

A6 Summary statistics of income inequality, 1994-95 to 2002-03

Sensitivity of summary measures to low incomes

Table A7 compares the impact on selected inequality summary statistics for the 2000-01 SIH population if persons with zero equivalised disposable household income have their income set to 1 cent, to 10 cents or to $1, or if they are omitted from the population altogether. Note that population A used in the first part of this appendix was the 2000-01 SIH population, after removing persons with zero income.

The table shows that the Atkinson indexes, but not the Gini or Theil measures, are sensitive to changes in the lowest incomes in the Australian data set that are small in dollar terms. It also shows that if persons with zero income are omitted from the population altogether, all indicators are affected, with the least impact on the Gini coefficient and an impact of over 50% on the Atkinson index with ε set to 2.0.
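The kind of comparison made in Table A7 can be sketched on synthetic data. The incomes below are invented for illustration, so the values produced say nothing about the SIH; the sketch only shows how the floor chosen for zero incomes (1 cent, 10 cents or $1), or their omission, feeds through to each measure. All function names are hypothetical.

```python
import math


def gini(y):
    n, m = len(y), sum(y) / len(y)
    return sum(abs(a - b) for a in y for b in y) / (2 * n * n * m)


def theil(y):
    m = sum(y) / len(y)
    return sum((v / m) * math.log(v / m) for v in y if v > 0) / len(y)


def atkinson(y, eps):
    """Assumes strictly positive incomes (guaranteed by the treatments below)."""
    m = sum(y) / len(y)
    if eps == 1:
        return 1 - math.exp(sum(math.log(v / m) for v in y) / len(y))
    return 1 - (sum((v / m) ** (1 - eps) for v in y) / len(y)) ** (1 / (1 - eps))


def summarise(y):
    return {"gini": gini(y), "theil": theil(y), "atkinson_0.5": atkinson(y, 0.5),
            "atkinson_1": atkinson(y, 1.0), "atkinson_2": atkinson(y, 2.0)}


def zero_income_treatments(incomes):
    """Recompute the measures with zero incomes set to 1c, 10c or $1, or dropped."""
    out = {}
    for label, floor in (("1 cent", 0.01), ("10 cents", 0.10), ("$1", 1.0)):
        out[label] = summarise([v if v > 0 else floor for v in incomes])
    out["omitted"] = summarise([v for v in incomes if v > 0])
    return out


# Synthetic incomes with two zero-income people (not SIH data).
incomes = [0, 0, 150, 220, 300, 380, 450, 520, 640, 800, 1100, 1600]
for label, measures in zero_income_treatments(incomes).items():
    print(label, {k: round(v, 4) for k, v in measures.items()})
```

In this synthetic example the Gini coefficient barely moves between the 1 cent and $1 treatments, while the Atkinson index with ε = 2.0 stays close to one until the zero incomes are omitted.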

A7 Comparison of alternative treatments of persons with zero household income, 2000-01

Given the likelihood that most of the very low incomes do not accurately represent the economic wellbeing of the respondents reporting such values, there is some doubt about the usefulness of summary indicators that are particularly sensitive to this segment of the population.

Choice of summary measures

There are several implicit and explicit assumptions underlying the measures discussed above. The Atkinson index explicitly requires the user to choose an 'inequality aversion' factor, but the other measures also implicitly embody judgements about how inequality is to be quantified.

Rather than considering just one summary measure, analysts will often look at a range of measures to see whether or not they give a consistent indication about changes in inequality, especially if there is no Lorenz dominance among the distributions being compared. Comparisons can be for the same population over time, or between different populations at a point in time.

Each of the indicators has its own particular advantages. For example, the Gini coefficient can be easily understood through the graphical interpretation of the Lorenz curve, and it is probably the most widely used indicator. The Theil index is particularly useful where analysts wish to decompose the measure of income inequality in a population into the inequality that exists within subpopulations and the inequality that exists between those subpopulations. The Atkinson indexes highlight that summary measures depend on the underlying assumptions about the quantification of inequality and assist the user in varying some of those assumptions. The Gini coefficient is sometimes criticised as being too sensitive to relative changes around the middle of the income distribution. This sensitivity arises because the derivation of the Gini coefficient reflects the ranking of the population, and ranking is most likely to change at the densest part of the income distribution, which is likely to be around the middle of the distribution.
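The decomposition property of the Theil index can be written as T = sum over g of s_g T_g + sum over g of s_g ln(ybar_g / ybar), where s_g is subpopulation g's share of total income, T_g is its own Theil index and ybar_g its mean income; the first sum is within-group inequality and the second is between-group inequality. A minimal sketch, assuming unweighted, non-negative incomes with at least one positive income per group (function names hypothetical):

```python
import math


def theil(incomes):
    """Theil index; zero incomes contribute nothing (limit x*ln(x) -> 0)."""
    n = len(incomes)
    ybar = sum(incomes) / n
    return sum((y / ybar) * math.log(y / ybar) for y in incomes if y > 0) / n


def theil_decomposition(groups):
    """Split total Theil inequality into within-group and between-group parts.

    groups: list of lists of incomes, one list per subpopulation.
    Returns (within, between, total), where within + between == total
    up to floating-point rounding.
    """
    everyone = [y for g in groups for y in g]
    total_income = sum(everyone)
    ybar = total_income / len(everyone)
    within = between = 0.0
    for g in groups:
        share = sum(g) / total_income  # group's share of total income
        ybar_g = sum(g) / len(g)
        within += share * theil(g)
        between += share * math.log(ybar_g / ybar)
    return within, between, theil(everyone)


print(theil_decomposition([[100, 200, 300], [400, 800]]))  # hypothetical groups
```

If every subpopulation has the same mean income, the between-group part is zero and all measured inequality is within groups.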

In choosing which income distribution indicators to present, whether for simple or summary measures, it is useful to recall that income alone is not a perfect measure of the economic resources available to people to maintain or enhance their wellbeing, but it is a reasonable proxy that will be suitable for most people. However, as explained in paragraph 13 of the Explanatory Notes of this publication, some respondents report extremely low and even negative incomes in the Survey of Income and Housing (SIH), often reflecting their business and investment arrangements rather than any distinctly low economic wellbeing of these respondents. In other cases, incomes may be underreported either accidentally or deliberately, so again they are not a good indicator of economic inequality. It has therefore been considered inappropriate for these records to have a disproportionate influence on a summary income inequality measure being used for assessing inequality in economic wellbeing, just as the bottom decile is excluded in this publication from analysis of low income growth over time.

The Gini coefficient has been retained in the main body of this publication because it is not overly sensitive to the extremely low incomes that can be reported, and it is relatively simple to interpret. In the Australian context, the other summary measures considered in this appendix are more sensitive to extremely low and negative incomes, which are assumed not to reflect economic wellbeing adequately.

A8 LORENZ DOMINANCE BETWEEN INCOME DISTRIBUTIONS, 1994-95 TO 2002-03

Survey year	Dominates	Almost dominates(a)	No dominance relationship(b)	Almost dominated by(a)	Is dominated by
1994-95	-	1999-00, 2000-01, 2002-03	1997-98	-	1995-96, 1996-97
1995-96	1994-95, 1997-98, 1999-00, 2000-01, 2002-03	-	1996-97	-	-
1996-97	1994-95, 1997-98, 1999-00, 2000-01, 2002-03	-	1995-96	-	-
1997-98	1999-00, 2002-03	2000-01	1994-95	-	1995-96, 1996-97
1999-00	-	-	2000-01, 2002-03	1994-95	1995-96, 1996-97, 1997-98
2000-01	-	-	1999-00, 2002-03	1994-95, 1997-98	1995-96, 1996-97
2002-03	-	-	1999-00, 2000-01	1994-95	1995-96, 1996-97, 1997-98

(a) Lorenz curves only cross in the first decile of the income distribution
(b) Lorenz curves cross at least once outside the first decile of the income distribution

Expected developments

In the 2003-04 Household Income and Expenditure Survey (a combination of the Household Expenditure Survey and the Survey of Income and Housing, currently being processed), the ABS collected additional information about the assets and liabilities of households. This wealth information, together with the reported expenditure of households, is expected to provide a better understanding of the characteristics of households with very low incomes.

Bibliography

Allison, P.D. (1978). Measures of Inequality. American Sociological Review, vol. 43 (December), pp. 865-880.

Atkinson, A.B. (1970). On the Measurement of Inequality. Journal of Economic Theory, vol. 2, pp. 244-263.

Chakravarty, S.R. and Muliere, P. (2003). Welfare Indicators: A Review and New Perspectives. 1. Measurement of Inequality. Metron - International Journal of Statistics, vol. LXI, no. 3, pp. 457-497.

Cowell, F. (1995). Measuring Inequality. LSE Handbook in Economics Series, 2nd edn, Prentice Hall, London.

Cowell, F. (2002). The Economics of Poverty and Inequality. London School of Economics, Edward Elgar, Cheltenham, UK.

Cowell, F. and Flachaire, E. (2002). Sensitivity of Inequality Measures to Extreme Values. Distributional Analysis and Research Program Discussion Paper No. 60.

Cowell, F. and Victoria-Feser, M-P. (1993). Robustness and Properties of Inequality Measures. Distributional Analysis and Research Program Discussion Paper No. 1.

Deaton, A. (1997). The Analysis of Household Surveys: A Microeconometric Approach to Development Policy. Johns Hopkins University Press and The World Bank.

Shorrocks, A.F. (1983). Ranking Income Distributions. Economica, vol. 50 (February), pp. 3-17.

Subramanian, S. (2004). Indicators of Inequality and Poverty. United Nations University - World Institute for Development Economics Research, Research Paper No. 2004/25.