Statistical geography explained

A statistical geography adds the dimension of location to statistics. It divides the area of interest, for which the statistics are collected, into spatial categories called statistical areas, which allow the user to see not just how the data varies but also where it varies. An effective statistical geography is one that supports many uses and enables comparisons over time.

Geography affects data

The size and shape of statistical areas affect the picture that the resulting data portrays, and this can directly affect the results of any subsequent analysis. As an extreme example, consider two adjacent suburbs, A and B. Suburb A is primarily composed of students with incomes around $20,000. Suburb B is primarily composed of people with incomes around $50,000. If the two populations are covered by a single statistical area, there is no information on how these two groups relate to each other spatially. If the data is divided into two statistical areas, each containing half of Suburb A and half of Suburb B, the data for each statistical area would suggest that the two suburbs are roughly similar in terms of income profile. However, if the two statistical areas are drawn so that one contains Suburb A and the other contains Suburb B, it becomes apparent that the two suburbs have very different incomes. This issue is known as the modifiable areal unit problem (MAUP).
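The suburb example can be sketched in a few lines of Python. This is only an illustration: the resident counts, the "west"/"east" halves and the function name are assumptions made for the sketch, and every resident is given exactly the typical income of their suburb.

```python
from statistics import mean

# Hypothetical residents as (suburb, half_of_suburb, income).
# The counts and the west/east halves are invented for illustration.
residents = (
    [("A", "west", 20_000)] * 50 + [("A", "east", 20_000)] * 50
    + [("B", "west", 50_000)] * 50 + [("B", "east", 50_000)] * 50
)

def mean_income_by_area(residents, assign_area):
    """Group residents into statistical areas and return each area's mean income."""
    incomes_by_area = {}
    for suburb, half, income in residents:
        incomes_by_area.setdefault(assign_area(suburb, half), []).append(income)
    return {area: mean(incomes) for area, incomes in incomes_by_area.items()}

# One statistical area covering both suburbs.
single = mean_income_by_area(residents, lambda suburb, half: "all")
# Two areas, each containing half of Suburb A and half of Suburb B.
mixed = mean_income_by_area(residents, lambda suburb, half: half)
# Two areas aligned with the suburb boundaries.
aligned = mean_income_by_area(residents, lambda suburb, half: suburb)

print(single)
print(mixed)
print(aligned)
```

Only the boundaries differ between the three calls. The underlying residents are identical, yet the first two zonings report the same mean income everywhere and only the third reveals the difference between the suburbs.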

Statistical relevance

Ideally, the way a region is divided into statistical areas should depend upon the type of statistics being presented. For example, statistical areas that each contain 10,000 people will result in many areas within a city and only a few areas covering remote agricultural regions, where the population density is much lower. These statistical areas would be useful for presenting data such as estimated residential populations. However, they would be inappropriate for agricultural data: the many statistical areas in the city would have no data, and the few very large statistical areas in remote regions would hide much of the spatial variation in the agricultural data. This explains why different data is presented on different statistical geographies.

Relevance over time

For statistical areas to be relevant to the statistics produced on them, they must reflect the real world. However, in the real world geography changes: towns grow, new roads are built that link some communities and divide others, and administrative boundaries change. Statistical areas must change to reflect this. For example, urban centre and locality boundaries that cover only half of an urban centre provide a poor representation of reality, which can lead to confusion. However, every time a statistical area changes it becomes more difficult to compare it with past statistical areas to see how places have changed over time. Ideally, statistical areas will balance these conflicting interests. As an example, the Statistical Area 2 regions in the ASGS are built with a buffer around towns to allow the town to grow within each statistical area. This minimises the need to change the boundary over time, allowing data to be compared over time on a single region. When the boundary of a statistical area does require change, managing this change through a process of splitting or amalgamation allows a simpler comparison between the old and new statistical areas.
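The following sketch illustrates why splitting and amalgamation keep comparisons simple: in both cases whole areas can be summed to a common region, with no need to apportion data across a moved boundary. The area codes, counts and function name are hypothetical and are not taken from the ASGS.

```python
# Hypothetical comparison regions. Each lists the old and new statistical
# areas that cover exactly the same ground.
comparison_regions = {
    "region_1": {"old": ["old_1"], "new": ["new_1a", "new_1b"]},  # split
    "region_2": {"old": ["old_2", "old_3"], "new": ["new_23"]},   # amalgamation
}

# Invented population counts on the old and new boundaries.
old_counts = {"old_1": 1200, "old_2": 300, "old_3": 450}
new_counts = {"new_1a": 700, "new_1b": 650, "new_23": 800}

def change_over_time(regions, old_counts, new_counts):
    """Sum whole areas on each side of the change and return the difference."""
    change = {}
    for name, codes in regions.items():
        before = sum(old_counts[code] for code in codes["old"])
        after = sum(new_counts[code] for code in codes["new"])
        change[name] = after - before
    return change

growth = change_over_time(comparison_regions, old_counts, new_counts)
print(growth)
```

If a boundary were instead shifted so that an old area only partly overlapped a new one, the counts would have to be apportioned between areas using some assumption about where people live, which introduces error into the comparison.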

Scale

In general, users want as much spatial detail as possible from a statistical geography. However, this needs to be balanced against both the quality of the statistics available for each statistical area and confidentiality. For Census data, where every person in the statistical area has data collected on them, accurate data can be created for very small areas. For confidentiality reasons, these statistical areas are required to contain several hundred people. However, for monthly survey data such as unemployment data, where only a small percentage of the state is sampled, each statistical area needs to contain enough of the sample to provide accurate data, and this can mean several hundred thousand people.
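The effect of sample size can be illustrated with the textbook standard error of a proportion under simple random sampling, sqrt(p(1 - p) / n). The sketch below is a simplification: the sampling fraction and unemployment rate are assumed values, and real surveys use more complex designs and estimators than this formula reflects.

```python
from math import sqrt

# Assumed values for illustration only.
SAMPLING_FRACTION = 0.005   # 0.5% of people in each area are sampled
UNEMPLOYMENT_RATE = 0.05    # true proportion being estimated

def standard_error(area_population, sampling_fraction=SAMPLING_FRACTION,
                   proportion=UNEMPLOYMENT_RATE):
    """Standard error of an estimated proportion under simple random sampling."""
    sample_size = sampling_fraction * area_population
    return sqrt(proportion * (1 - proportion) / sample_size)

# Compare a Census-sized small area with progressively larger areas.
for population in (500, 30_000, 300_000):
    se = standard_error(population)
    print(f"population {population:>7}: standard error {se:.4f}")
```

Under these assumptions, an area of 500 people yields a standard error larger than the rate being estimated, while an area of 300,000 people yields one of roughly half a percentage point.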

Using Mesh Blocks for flexibility

Ideally, a statistical geography will provide a range of statistical areas to suit different statistics, while also allowing a range of statistics to be presented on a common statistical area. To provide this flexibility, the ABS uses a "building block" geographical unit called a Mesh Block. A Mesh Block covers only a very small geographic area and contains on average around 30 dwellings. Statistical data is stored at the Mesh Block level, with individual identifiers removed for confidentiality reasons. Because Mesh Blocks are so small, they can be combined to accurately approximate a large range of other statistical areas. In the future, custom-designed groups of Mesh Blocks will be able to be combined to approximate an area of interest so that statistics can be obtained for that area.
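As a minimal sketch of the building-block idea, the example below sums Mesh Block values into larger areas and into a custom area of interest. The Mesh Block codes, dwelling counts, area names and function names are all hypothetical, and the sketch does not represent how the ABS stores or processes its data.

```python
# Hypothetical Mesh Block codes with invented dwelling counts.
mesh_block_dwellings = {"MB001": 28, "MB002": 35, "MB003": 31, "MB004": 26}

# Hypothetical allocation of each Mesh Block to a larger statistical area.
allocation = {
    "MB001": "area_X", "MB002": "area_X",
    "MB003": "area_Y", "MB004": "area_Y",
}

def aggregate(values, allocation):
    """Sum Mesh Block values into the larger areas they are allocated to."""
    totals = {}
    for mesh_block, value in values.items():
        area = allocation[mesh_block]
        totals[area] = totals.get(area, 0) + value
    return totals

def total_for_custom_area(values, mesh_blocks):
    """Sum values for an arbitrary group of Mesh Blocks approximating an area of interest."""
    return sum(values[mesh_block] for mesh_block in mesh_blocks)

area_totals = aggregate(mesh_block_dwellings, allocation)
custom_total = total_for_custom_area(mesh_block_dwellings, {"MB002", "MB003"})
print(area_totals, custom_total)
```

The same Mesh Block values serve both the standard areas and the custom area, which cuts across the boundary between area_X and area_Y.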