SSF Guidance Material – Using Geographic Boundaries and Classifications with Statistics
A typical statistic gives a numerical answer to a three-layered question:
- What was observed?
- Where was it observed?
- When was it observed?
Location and geography are integral to answering the ‘where’ part of the question, but must also be associated with the 'when' and the 'what'.
All statistics have a location associated with them. This can be very general such as knowing that a statistic is associated with a country or a state. Alternatively, it can be more specific, such as a city or region, a suburb, a building or even a single point on the ground.
The location information associated with statistics is used for a range of purposes in the statistical process (collection, processing and validation, analysis and dissemination), with perhaps the most prominent being the aggregation of data for external release (dissemination).
The Statistical Spatial Framework (SSF)¹ was developed by the ABS to provide a broad framework for integrating spatial (location and geographic) information with statistical information. It provides a distinct role for geographic boundaries and classifications by recommending the use of a common geography, the Australian Statistical Geography Standard (ASGS)².
Geographic classifications and boundaries
Geographic classifications and boundaries play a vital role in managing location information in the statistical process. A geographic classification breaks a very large area, such as a country, down into manageable pieces or regions. These classifications allow data to be summarised into regional statistics, so that the person using the data does not need to analyse all of the individual responses separately. This regional data can be used to make statistical comparisons between regions or analyse trends across regions. Some geographic frameworks, such as the Australian Statistical Geography Standard (ASGS), have multiple levels or region types, which allow efficient and flexible release of data at levels that are suitable for various types of data and analysis needs.
For statistical information, an ideal geographic classification must:
- Balance the size of regions so that detailed information can be released while the privacy and confidentiality of the information are maintained.
- Consist of regions that are clearly identifiable or relatable to the data user (i.e. be known units, like suburbs, or be clearly mapped).
- Have well-defined criteria for determining their coverage.
- Have boundaries that clearly define each region.
- Be suitable for releasing or analysing data across a range of statistical subject matter.
- Be comprehensive, covering the whole area to which the classification applies.
- Be stable over time to allow comparison of data over time.
The SSF recommends that data released or made available from any socio-economic dataset should include Australian Statistical Geography Standard (ASGS) regions. ASGS regions are the common geography in the SSF and will allow data from different sources to be integrated using these regions.
The ASGS is the ABS’s official geographic classification framework. It provides the geographic classifications used in the majority of the statistical processes undertaken by the ABS, and the majority of socio-economic statistics released by the ABS include an ASGS region type. The ASGS also provides geographic classifications for use by government agencies and other organisations in their statistical datasets and data releases. Use of common geographic classifications across datasets enables data from different sources to be integrated at a geographic level. This allows data from different sources for a region or region type to be quickly compared and contrasted, or used as compatible inputs in more detailed analysis.
In addition to the ASGS, a range of regions that are not included in the ASGS are used by organisations in their statistical databases and in their statistical analysis (e.g. Medicare Local regions, school catchments, hospital regions, environmental zones, etc.). Each of these regions may or may not meet all of the criteria for an ideal geographic classification for statistics. Therefore, use of these geographies for statistical outputs should be considered carefully, as it may limit the long-term usefulness of the data produced. If data is released on geographies not included in the ASGS, the SSF recommends that data also be released for an appropriate ASGS region.
The Australian Statistical Geography Standard (ASGS) defines all of the standard regions used by the ABS to output data. This includes the main Statistical Area geography structure, which includes four sub-state region types, and a range of other ABS defined geographies based on urbanisation, remoteness, major metropolitan labour markets, and the Aboriginal and Torres Strait Islander population.
The ASGS also encompasses some key external boundary sets that the ABS releases data for, such as Local Government Areas, officially gazetted State Suburbs, Commonwealth and State Electoral Divisions and Natural Resource Management Regions.
Each level of the ASGS has been designed to maximise the amount of data that can be released for specific types of collections, while also minimising the risk that a particular person or organisation could be identified. Despite this, all externally released datasets must undergo rigorous confidentiality checks, including assessment of possible geographic differencing risks. For more information on confidentiality and geographic differencing, see SSF Guidance Material paper “Protecting Privacy for Geospatially Enabled Statistics: Geographic Differencing” on the SSF webpage.
ASGS replaces the ASGC
The ASGS replaced the Australian Standard Geographical Classification (ASGC), which the ABS used and maintained between 1984 and 2011. The ASGS was first released in July 2011 in time for the 2011 Census of Population and Housing (the Census). All other ABS data collections have migrated to the ASGS.
The main reason the ASGC was replaced was that it was unstable as a result of being tied to externally defined and constantly changing Local Government Area (LGA) boundaries for each state and territory. This meant that small area regions at the lower levels of the ASGC framework changed every year to follow LGA changes, which made it difficult and costly to change statistical processes and produce data that could be compared over time.
In contrast, the main structure of the ASGS will remain more stable, with changes made only due to population growth, changes in the built environment and for statistical classification reasons. Changes to boundaries will be made and published every 5 years with each Census. This will allow better comparability of data between geographic regions and over time.
The ASGS Structure
The ASGS comprises a hierarchy of geographic regions and is split into two broad groups:
- ABS Structures
- Non-ABS Structures
ASGS structure diagram – ABS Structures
ASGS structure diagram – Non-ABS Structures
For further information on the Australian Statistical Geography Standard (ASGS), refer to the ASGS web page.
Selecting the right geography
Whilst the ASGS has been designed to be flexible for the release of different types of statistics, serious consideration needs to be given to what the most appropriate ASGS region is for different types of data. To select the most appropriate ASGS region for your data it is important to take into consideration the following issues early in the process:
- User needs
- Data quality – sampling and non-sampling error
- Purpose of the data – current and future
The size and shape of a region affect the picture the resulting data portrays, and this can directly affect the results of any subsequent analysis. This effect is known as the modifiable areal unit problem. The following link to an academic journal paper provides more information on this issue: Modifiable areal unit problem.
The ABS Geospatial Solutions Section can advise on which ASGS geography is the most appropriate for your needs. Please email: firstname.lastname@example.org
Adding geographic information to your data
The Statistical Spatial Framework (SSF) recommends that each unit record in socio-economic datasets be geocoded with a location coordinate (i.e. latitude and longitude) and an Australian Statistical Geography Standard (ASGS) Mesh Block code. This geocode information is usually obtained through geocoding address information for each statistical unit in the dataset. The SSF also recommends that regional data released or made available from any socio-economic dataset should include ASGS regions – the common geography in the SSF.
The most appropriate method to add geocodes, such as ASGS regions, to your data depends on whether it is unit record level data or aggregated data and what geographic information the data already contains. The table below summarises different data types, the geographic information they contain and the appropriate transformation process that should be used to add geocode information. A diagram that shows the different pathways can also be found in the Appendix – Pathways for location and regional information.
| Data type | Geographic information contained in the data | Appropriate transformation process to be used | Geography available after transformation | Level of accuracy |
|---|---|---|---|---|
| Unit record data | Location coordinate (latitude and longitude) | Point-in-polygon allocation | Any | Most accurate |
| Unit record data | Building/site address | Geocoding | Any, via coordinate | Most accurate |
| Unit record data | Locality information (e.g. Suburb and Postcode/State) | Coding index | SA2 and above | Moderately accurate |
| Aggregated data | ASGS region | Allocation table | Higher level ASGS region | Most accurate |
| Aggregated data | Other non-ASGS regions (e.g. Medicare Local regions, school catchments, hospital regions, environmental zones, etc.) | Correspondence | Higher or similar level ASGS region | Least accurate |
| Aggregated data | ASGS region | Correspondence | Region other than ASGS | Least accurate |
Unit record data
The options available for geocoding unit record data to the ASGS and other region types depend directly on the location information contained in the unit records.
For datasets that already have location coordinates attached to the unit records, these point references can be transformed into ASGS or other region types. This transformation uses GIS-based point-in-polygon processes and allows data to be allocated to the selected region. Ideally, the point-in-polygon allocation process would assign an ASGS Mesh Block to each record; all other ASGS geography can then be built up from this basic building block using allocation tables.
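The point-in-polygon step can be sketched in Python as below. This is a minimal illustration, not the ABS process: it assumes each Mesh Block is a single simple polygon (no holes or multipart shapes) supplied as a ring of (longitude, latitude) vertices, treats coordinates as planar, and uses the standard ray-casting test. The Mesh Block codes `MB_A` and `MB_B` are hypothetical. In practice a GIS package or spatial database with a spatial index would normally do this work.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x, y


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: pt is inside if a ray from it crosses the boundary an odd number of times."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of pt
                inside = not inside
    return inside


def allocate_mesh_block(pt: Point, mesh_blocks: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the code of the first Mesh Block polygon containing pt, or None if none does."""
    for mb_code, ring in mesh_blocks.items():
        if point_in_polygon(pt, ring):
            return mb_code
    return None


# Usage with two hypothetical unit-square Mesh Blocks side by side.
mesh_blocks = {
    "MB_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "MB_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
print(allocate_mesh_block((0.5, 0.5), mesh_blocks))  # MB_A
print(allocate_mesh_block((3.0, 3.0), mesh_blocks))  # None
```

Points lying exactly on a shared boundary are assigned by the convention of the test rather than by any real-world rule, which is one more reason the accuracy of the original coordinates matters.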
Generally, the point-in-polygon allocation process is very accurate; however, the accuracy of the allocations is dependent on the accuracy of the original geocoding process used to obtain the location coordinates for each unit record. Therefore, it is critical to understand the accuracy of the coordinates included in any dataset.
Where a dataset has a full building or site address for each unit record, address geocoding is the most accurate method to obtain geocodes from the address information. Address geocoding will provide a location coordinate for each address and, ideally, an ASGS Mesh Block code. Region information can then be obtained using allocation tables and point-in-polygon processes. Further information about geocoding is contained in the SSF guidance material paper, “Geocoding Unit Record Data Using Address and Location” on the SSF website.
If only partial address information is available (such as suburb, state and/or postcode) coding indexes may be used to code unit records to ASGS SA2 units or higher level geographies. Locality, Postcode and State are all part of an address and when used in conjunction can effectively code unit record data to the SA2 level and above. This can be done using a suburb/locality to SA2 coding index.
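A coding index is essentially a lookup table. The sketch below assumes a hypothetical index keyed on normalised (locality, state, postcode) values with one SA2 per key; real coding indexes may treat localities that span several SA2s differently. The index entry and SA2 code shown are placeholders, not a real locality or ASGS code. Records that fail to match return `None` so they can be reviewed rather than silently dropped.

```python
from typing import Dict, Optional, Tuple

# Hypothetical coding index: (locality, state, postcode) -> SA2 code.
CodingIndex = Dict[Tuple[str, str, str], str]


def normalise(text: str) -> str:
    """Upper-case and collapse runs of whitespace so minor formatting differences still match."""
    return " ".join(text.upper().split())


def code_to_sa2(locality: str, state: str, postcode: str, index: CodingIndex) -> Optional[str]:
    """Return the SA2 code for a partial address, or None if the index has no match."""
    key = (normalise(locality), normalise(state), postcode.strip())
    return index.get(key)


# Placeholder index entry; not a real locality or ASGS code.
index: CodingIndex = {("EXAMPLEVILLE", "NSW", "9999"): "SA2_PLACEHOLDER_1"}
print(code_to_sa2(" exampleville ", "nsw", "9999", index))  # SA2_PLACEHOLDER_1
print(code_to_sa2("Nowhere", "NSW", "9999", index))  # None
```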
There are several issues with using Postcode references alone to code data to the ASGS or other regions. There is no official geographic definition of Postcodes, and they do not cover the whole of Australia. In general, Postcodes are larger than suburbs and consequently cannot effectively code data to the more detailed structures of the ASGS (e.g. SA1, SA2) and other smaller regions. However, data can be coded reasonably accurately to the larger SA4 and GCCSA levels using a Postcode to SA4 index.
Aggregated data
Aggregated data is unit record data that has been summed together according to a characteristic or characteristics in the data, for example data for a region or for a particular age group. This data is commonly thought of in terms of data tables or reports, but can be presented in maps, graphs and other forms.
The options for converting aggregated data from an existing region to a new region depend on the region information the dataset already contains. If it contains:
- ASGS regions that need to be converted to a higher level ASGS region – use an allocation table.
- Other region information (not in the ASGS) that needs to be converted to an ASGS region – use a correspondence.
- ASGS regions that need to be converted to a new region not in the ASGS – use a correspondence.
If the data already contains ASGS region information then allocation tables can be used. This is very accurate as smaller regions of the ASGS directly aggregate to larger regions of the ASGS that are in the same hierarchy. However, these tables can only be used to aggregate data to higher levels of the ASGS. They cannot be used to disaggregate data from higher levels of the ASGS.
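Because each smaller region belongs to exactly one larger region in the same hierarchy, applying an allocation table reduces to a lookup and a sum. The sketch below assumes the allocation table is a simple mapping from child code to parent code; the Mesh Block and SA2 codes are hypothetical placeholders. It also shows why the process only works upwards: nothing in the mapping says how to split a parent total back among its children.

```python
from collections import defaultdict
from typing import Dict


def aggregate(values: Dict[str, float], allocation: Dict[str, str]) -> Dict[str, float]:
    """Sum child-region values into parent regions using an allocation table.

    allocation maps each child code (e.g. a Mesh Block) to exactly one parent code (e.g. an SA2).
    """
    totals: Dict[str, float] = defaultdict(float)
    for child, value in values.items():
        if child not in allocation:
            raise KeyError(f"{child} has no entry in the allocation table")
        totals[allocation[child]] += value
    return dict(totals)


# Hypothetical codes and counts.
allocation = {"MB1": "SA2_X", "MB2": "SA2_X", "MB3": "SA2_Y"}
print(aggregate({"MB1": 10, "MB2": 5, "MB3": 7}, allocation))  # {'SA2_X': 15.0, 'SA2_Y': 7.0}
```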
If the dataset already contains geographic region information and you want to convert it to a new unrelated region then a correspondence may be used. Correspondences are a mathematical method of converting data from one region to another.
Correspondences are less reliable than address coding or allocation tables and the results can be misleading in some circumstances. This is because they are based on the assumption that the data to be converted is distributed evenly across the original regions or in line with distribution of the total population (see details below). Either of these assumptions may or may not be reasonable depending on the circumstances and the type of data being converted. Therefore, correspondences need to be used with a great deal of care. A quality indicator is incorporated in ABS correspondences to assist users in determining whether they are fit for purpose.
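Applying a correspondence amounts to multiplying each “From” value by its allocation ratio and summing the results by “To” region. The sketch below assumes the correspondence is supplied as hypothetical `(from_code, to_code, ratio)` rows whose ratios sum to 1 for each “From” unit, and it checks that assumption before converting. It applies whatever ratios it is given; it cannot tell whether the even-distribution or population-distribution assumption behind those ratios holds for your data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each row: (from_code, to_code, ratio); ratio is the share of the "From" unit given to the "To" unit.
Correspondence = List[Tuple[str, str, float]]


def apply_correspondence(values: Dict[str, float], corr: Correspondence) -> Dict[str, float]:
    # Check the ratios for each "From" unit sum to 1, so no data is created or lost.
    ratio_sums: Dict[str, float] = defaultdict(float)
    for from_code, _, ratio in corr:
        ratio_sums[from_code] += ratio
    bad = [code for code, total in ratio_sums.items() if abs(total - 1.0) > 1e-6]
    if bad:
        raise ValueError(f"Ratios do not sum to 1 for: {bad}")
    # Apportion each "From" value to the "To" units in proportion to the ratios.
    # "From" units with no entry in `values` contribute nothing; values whose code
    # is missing from `corr` are not carried across.
    totals: Dict[str, float] = defaultdict(float)
    for from_code, to_code, ratio in corr:
        totals[to_code] += values.get(from_code, 0.0) * ratio
    return dict(totals)


# Ratios from the simple area-based example in this paper; the counts are made up.
corr = [("A", "Region 1", 1.0), ("B", "Region 1", 0.61), ("B", "Region 2", 0.39), ("C", "Region 2", 1.0)]
print(apply_correspondence({"A": 100, "B": 200, "C": 50}, corr))
# {'Region 1': 222.0, 'Region 2': 128.0}
```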
When correspondences are created, the ABS uses a weighting unit to assist with the allocation of "From" data to their respective "To" units. The unit that is used to weight the correspondence will influence how the "From" units are apportioned, which in turn can have a major impact on the data values and hence the quality of the converted, or corresponded, data.
Population-weighted correspondences are usually more effective for social and demographic data because populations are generally not evenly spread across a region. Area-weighted correspondences are usually more suitable for agricultural and environmental data, because this data tends to be more evenly distributed across a region.
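The role of the weighting unit can be sketched as follows. Assuming each weighting unit (for example a Mesh Block) is tagged with the “From” and “To” region it falls in and carries a weight (population for a population-weighted correspondence, area for an area-weighted one), the ratio for each From/To pair is that pair's share of the “From” unit's total weight. This is a minimal illustration of the general technique, not the ABS's published method; the region codes and weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WeightingUnit = Tuple[str, str, float]  # (from_code, to_code, weight)


def build_correspondence(units: List[WeightingUnit]) -> List[Tuple[str, str, float]]:
    """Return (from_code, to_code, ratio) rows, where ratio = pair weight / From-unit weight."""
    from_totals: Dict[str, float] = defaultdict(float)
    pair_totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for from_code, to_code, weight in units:
        from_totals[from_code] += weight
        pair_totals[(from_code, to_code)] += weight
    rows = []
    for (from_code, to_code), weight in sorted(pair_totals.items()):
        total = from_totals[from_code]
        # A "From" unit with zero total weight (e.g. no population) gets no rows here;
        # a real process would need a fallback, such as area weighting.
        if total > 0:
            rows.append((from_code, to_code, weight / total))
    return rows


# Hypothetical: 1,000 residents in region B, 610 of whom live in the part inside Region 1.
units = [("A", "Region 1", 50.0), ("B", "Region 1", 610.0), ("B", "Region 2", 390.0)]
for row in build_correspondence(units):
    print(row)
# ('A', 'Region 1', 1.0)
# ('B', 'Region 1', 0.61)
# ('B', 'Region 2', 0.39)
```

With the same geometry, switching from population weights to area weights can change the ratios substantially when people are concentrated in one part of a region, which is why the choice of weighting unit matters.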
Correspondences are more accurate where the regions in the original dataset are smaller than the new regions the data is being converted to.
An example of a simple correspondence
This diagram is an example of a simple area-based correspondence.
The “From” regions are A, B and C, for which there is data. The “To” areas are Region 1 and Region 2, which are the areas that data needs to be converted to.
- All of the area of A is contained within Region 1, so 100% of its data is allocated to Region 1.
- 61% of B is contained within Region 1, so 61% of its data is allocated to Region 1.
- The remaining part of B is contained within Region 2, so 39% of its data is allocated to Region 2.
- All of the area of C is contained within Region 2, so 100% of its data is allocated to Region 2.
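Written as equations, with $x_A$, $x_B$ and $x_C$ standing for the values held for A, B and C (notation introduced here for illustration), the allocations above give:

```latex
\begin{aligned}
\text{Region 1} &= 1.00\,x_A + 0.61\,x_B \\
\text{Region 2} &= 0.39\,x_B + 1.00\,x_C \\
\text{Region 1} + \text{Region 2} &= x_A + x_B + x_C
\end{aligned}
```

Because the shares for each “From” region sum to 100%, the two Region totals together equal the combined total of A, B and C.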
If this is applied to the following data:
The data conversion is shown in the table below:
To produce the following results:
| Region | Persons from A | Persons from B | Persons from C | Total persons |
|---|---|---|---|---|
Where can I get further information?
More information about geographic boundaries and classifications can be found by visiting the ABS website – Statistical geography homepage.
For more information about privacy and geospatial information see the SSF guidance material paper, “Protecting Privacy for Geospatially Enabled Statistics: Geographic Differencing” on the SSF webpage.
Any questions or comments on this paper or other statistical geography topics can be emailed to email@example.com