# TableBuilder: Participation in Sport and Physical Recreation, Australia

Data on persons aged 15 years and over who participated in sport and physical activities as players, competitors or physically undertook an activity

## Introduction

This product provides a range of information about the release of microdata on Participation in Sport and Physical Recreation from the Australian Bureau of Statistics 2013–14 Multipurpose Household Survey (MPHS), including details about the survey methodology and how to use TableBuilder. Data item lists and information on the conditions of use and the quality of the microdata, as well as the definitions used, are also provided.

Microdata are the most detailed information available from a survey. They are generally the responses to individual questions on the questionnaire, or data derived from two or more questions, and are released with the approval of the Australian Statistician.

### Available products

The following microdata product is available from this survey:

• TableBuilder – an online tool for creating tables and graphs.

Further information about this service, and other information to assist users in understanding and accessing microdata in general, is available from the Microdata Entry Page. Before applying for access, users should read and familiarise themselves with the information contained in the User Manual: TableBuilder.

### Further information

Further information about the survey and the microdata can be found in the various pages associated with this product, including:

• a detailed list of data items for the 2013–14 Participation in Sport and Physical Recreation TableBuilder, available in the Data downloads section
• the Quality Declaration, Abbreviations and Glossary sections.

## Survey methodology

General information about the 2013-14 MPHS Participation in Sport and Physical Recreation topic, including summary results, is available in the publication Participation in Sport and Physical Recreation, Australia, 2013-14 (cat. no. 4177.0).

Detailed information about the survey including scope and coverage, survey design, data collection methodology, weighting, estimation and benchmarking and the reliability of estimates can be accessed from the Explanatory Notes page of that publication. All published summary tables, in Excel spreadsheet format, can be accessed from the Data downloads section.

## File structure

### Data available by level

The 2013-14 Multipurpose Household Survey asked respondents across Australia a range of questions about their participation in sport and physical recreation activities over a 12-month period. Responses to these questions, along with a range of socio-demographic data, are available as microdata through TableBuilder files. The microdata files have three levels:

1. Person level
2. Activity level
3. Role level

These levels are hierarchical as each activity must be linked to a person. A person identifier exists on each level which allows data users to combine people's characteristics with the activities they undertake.
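The linkage between levels can be pictured with a small sketch. It is purely illustrative: the field names (`person_id`, `sex`, `weight`, `activity`) and all values are hypothetical and are not the data item names used in the file.

```python
# Hypothetical person-level records (one row per respondent).
persons = [
    {"person_id": 1, "sex": "Female", "weight": 1200.0},
    {"person_id": 2, "sex": "Male", "weight": 950.0},
]

# Hypothetical activity-level records (one row per activity per respondent).
activities = [
    {"person_id": 1, "activity": "Swimming"},
    {"person_id": 1, "activity": "Walking for exercise"},
    {"person_id": 2, "activity": "Cycling"},
]

# Index the person level by its identifier so each activity can find its owner.
person_by_id = {p["person_id"]: p for p in persons}

# Attach the person's characteristics to every activity record.
activity_with_person = [
    {
        **a,
        "sex": person_by_id[a["person_id"]]["sex"],
        "weight": person_by_id[a["person_id"]]["weight"],
    }
    for a in activities
]

for row in activity_with_person:
    print(row)
```

Each activity row now carries the characteristics of the person it belongs to, which is what allows an activity-level item to be cross-classified by a person-level item.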

#### Person level

The Person level contains all of the standard demographic characteristics of each person such as age, sex, country of birth, education and labour force status. The level also contains person characteristic data items relevant to participation in sport and physical recreation activities.

In addition, the level includes some household characteristics applicable to the respondent such as equivalised weekly household income and whether any children aged 14 years or under are present in the household.

All geographic identifiers are included on the Person level (i.e. state/territory of usual residence, remoteness area and capital city/balance of state).

#### Activity level

The Activity level contains details relating to each individual sport or physical recreation activity participated in by each respondent. A maximum of 10 sports and physical recreation activities were recorded in the survey for each respondent.

A complete data item list can be accessed from the Data downloads section.

#### Role level

The Role level contains data items relating to each particular role: frequency and duration of involvement, and payment details.

A complete data item list can be accessed from the Data downloads section.

### Weights and estimation

As the survey was conducted on a sample of households in Australia, it is important to take account of the method of sample selection when deriving estimates. This is particularly important as a person's chance of selection in the survey varied depending on the state or territory in which they lived. Survey 'weights' are values which indicate how many population units are represented by the sample unit.

There is one weight provided: a person weight. This should be used when analysing data at the person, activity and role level.

Where estimates are derived, it is essential that they are calculated by adding the weights of persons in each category, and not just by counting the number of records falling into each category. If each person's 'weight' were to be ignored, then no account would be taken of a person's chance of selection in the survey or of different response rates across population groups, with the result that counts produced could be seriously biased. The application of weights ensures that the person estimates conform to an independently estimated distribution of the population by age, sex, state/territory, part of state and labour force status.
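The difference between summing weights and counting records can be seen in a minimal sketch. The records, field names and weights are invented for illustration; the low weights for one territory simply stand in for a higher chance of selection there.

```python
# Hypothetical sample records; weights are made up for illustration.
records = [
    {"state": "NSW", "participated": True, "weight": 1500.0},
    {"state": "NSW", "participated": False, "weight": 1400.0},
    {"state": "NT", "participated": True, "weight": 200.0},
    {"state": "NT", "participated": True, "weight": 250.0},
]


def weighted_estimate(rows, predicate):
    """Estimate a population total by adding the weights of matching records."""
    return sum(r["weight"] for r in rows if predicate(r))


def unweighted_count(rows, predicate):
    """Count matching sample records, ignoring the weights."""
    return sum(1 for r in rows if predicate(r))


participants_est = weighted_estimate(records, lambda r: r["participated"])
total_est = weighted_estimate(records, lambda r: True)
weighted_rate = participants_est / total_est

participants_n = unweighted_count(records, lambda r: r["participated"])
unweighted_rate = participants_n / len(records)

print(f"weighted participation rate:   {weighted_rate:.3f}")
print(f"unweighted participation rate: {unweighted_rate:.3f}")
```

In this toy data the unweighted rate overstates participation because the over-sampled records all happen to be participants; the weighted rate corrects for that.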

### Not applicable categories

Most data items included in the microdata include a 'Not applicable' category. The 'Not applicable' category comprises those respondents who were not asked a particular question and hence are not part of the population to which the data item refers. The classification value of the 'Not applicable' category, where relevant, is shown in the data item lists in the Data downloads section.

### Special codes

For some data items certain classification values have been reserved as special codes and must not be added as if they were quantitative values. These special codes generally relate to data items such as income. For example, code 999999999 for the data item 'Weekly personal income from all sources - Parametric', refers to 'Income unknown or not stated'.
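A short sketch shows why special codes must be set aside before any arithmetic. Only the code 999999999 comes from the text above; the records, field names and weights are hypothetical.

```python
# Special code quoted in the text: 'Income unknown or not stated'.
INCOME_NOT_STATED = 999999999

# Hypothetical records: weekly personal income and a person weight.
rows = [
    {"income": 800, "weight": 1000.0},
    {"income": 1200, "weight": 500.0},
    {"income": INCOME_NOT_STATED, "weight": 700.0},
]


def weighted_mean(records):
    """Weighted mean of the 'income' field over the given records."""
    total_weight = sum(r["weight"] for r in records)
    return sum(r["income"] * r["weight"] for r in records) / total_weight


# Wrong: treats the special code as if it were a dollar amount.
naive_mean = weighted_mean(rows)

# Right: drop records carrying the special code first.
valid = [r for r in rows if r["income"] != INCOME_NOT_STATED]
valid_mean = weighted_mean(valid)

print(f"naive mean: {naive_mean:,.2f}")
print(f"valid mean: {valid_mean:,.2f}")
```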

### Populations

The population relevant to each data item is shown in the data item list and should be considered when extracting and analysing the microdata. The actual population count for each data item is equal to the total cumulative frequency minus the 'Not applicable' category.

Generally, all populations, including very specific populations, can be 'filtered' using other relevant data items. For example, if the population of interest is 'Employed persons', any data item with that population (excluding the 'Not applicable' category) can be used as a filter.
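As a rough illustration of the two points above, the sketch below derives a population count by removing the 'Not applicable' category and then uses the same item as a filter. The item name `hours_category` and the code 0 for 'Not applicable' are assumptions made for the example; the actual codes are given in the data item list.

```python
# Assumed code for 'Not applicable' on a hypothetical employed-persons item.
NOT_APPLICABLE = 0

# Hypothetical records with a person weight.
rows = [
    {"hours_category": 1, "weight": 100.0},
    {"hours_category": 2, "weight": 150.0},
    {"hours_category": NOT_APPLICABLE, "weight": 250.0},
]

# Total cumulative frequency, including 'Not applicable'.
total = sum(r["weight"] for r in rows)
not_applicable = sum(
    r["weight"] for r in rows if r["hours_category"] == NOT_APPLICABLE
)

# Population count for the item = total minus 'Not applicable'.
population = total - not_applicable

# The same item, excluding 'Not applicable', acts as a filter for that population.
filtered = [r for r in rows if r["hours_category"] != NOT_APPLICABLE]
filtered_total = sum(r["weight"] for r in filtered)

print(population, filtered_total)
```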

### Standard errors

Each record on the person level and activity level also contains 30 replicate weights and, by using these weights, it is possible to calculate Standard Errors (SEs) for weighted estimates produced from the microdata. This method is known as the 30 group Jack-knife variance estimator.

Under the Jack-knife method of replicate weighting, weights were derived as follows:

• 30 replicate groups were formed, with each group mirroring the overall sample (where units from a collection district all belong to the same replicate group and a unit can belong to only one replicate group)
• One replicate group was dropped from the file and then the remaining records were weighted in the same manner as for the full sample
• Records in that group that were dropped received a weight of zero

This process was repeated for each replicate group (i.e. a total of 30 times). Ultimately each record had 30 replicate weights attached to it with one of these being the zero weight.

Replicate weights enable variances of estimates to be calculated relatively simply. They also enable unit record analyses, such as chi-square and logistic regression, to be conducted in a way that takes the sample design into account. An estimate for any variable of interest can be calculated with each of the 30 sets of replicate weights, giving 30 replicate estimates. The distribution of this set of replicate estimates, in conjunction with the full sample estimate (based on the general weight), is then used to approximate the variance of the full sample estimate.

To obtain the SE of a weighted estimate y, the same estimate is calculated using each of the 30 replicate weights. The variability between these replicate estimates (denoted y(g) for group number g) is used to measure the SE of the original weighted estimate y using the formula:

$$S E(y)=\sqrt{(29 / 30) \sum \limits_{g=1}^{30}\left(y_{(g)}-y\right)^{2}}$$

where:
$$g$$ = the replicate group number
$$y_{(g)}$$ = the weighted estimate, having applied the weights for replicate group g
$$y$$ = the weighted estimate from the sample.

The 30 group Jack-knife method can be applied not just to estimates of the population total, but also where the estimate y is a function of estimates of the population total, such as a proportion, difference or ratio. For more information on the 30 group Jack-knife method of SE estimation, see Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee) (cat. no. 1352.0.55.029).
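The SE formula above translates directly into a few lines of code. The following is a minimal sketch, not ABS software: the function name `jackknife_se` and all figures are invented, and the scaling factor is written as (G - 1)/G so that it equals 29/30 when 30 replicate estimates are supplied.

```python
import math


def jackknife_se(full_estimate, replicate_estimates):
    """Standard error of a weighted estimate from its replicate estimates.

    Computes sqrt((G - 1) / G * sum((y_g - y) ** 2)), which is the formula
    in the text when G = 30.
    """
    g = len(replicate_estimates)
    if g < 2:
        raise ValueError("at least two replicate estimates are required")
    squared_diffs = sum(
        (y_g - full_estimate) ** 2 for y_g in replicate_estimates
    )
    return math.sqrt((g - 1) / g * squared_diffs)


# Made-up figures: a full-sample estimate and its 30 replicate estimates.
y = 1000.0
y_reps = [1010.0] * 15 + [990.0] * 15

se = jackknife_se(y, y_reps)
rse_percent = se / y * 100.0

print(f"SE = {se:.2f}, RSE = {rse_percent:.2f}%")
```

For a proportion, difference or ratio, the same function applies: the quantity is recalculated with each set of replicate weights and the resulting 30 values are passed as `replicate_estimates`.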

Use of the 30 group Jack-knife method for complex estimates, such as regression parameters from a statistical model, is not straightforward and may not be appropriate. The method as described does not apply to investigations where survey weights are not used, such as in unweighted statistical modelling.

## Using TableBuilder

For general information relating to the TableBuilder or instructions on how to use features of the TableBuilder product, please refer to the User Manual: TableBuilder (cat. no. 1406.0.55.005).

More detailed information relating to survey methodologies, such as the counting units and weights applied to the TableBuilder dataset, is provided in the Survey methodology section.

The TableBuilder dataset contains all of the person, activity and role level data applicable to the Participation in Sport and Physical Recreation topic. Information on the structure is provided in the File structure section.

### Continuous data items

TableBuilder includes a number of continuous variables which can have a response value at any point along a continuum. Some continuous data items are allocated special codes for certain responses (e.g. 9999 = 'Not applicable'). When creating ranges in TableBuilder for such continuous items, special codes will automatically be excluded. Therefore the total will show only 'valid responses' rather than all responses (including special codes).

For example:

The following shows the tabulation of the data items 'Number of times participated in sport/physical recreation as a player in the last 12 months' by 'Sex of person'. The continuous values of the data item are contained in the 'A valid response was recorded' row. To show the actual continuous values in a table, a range must be created.

Here is the same table with a range applied for the continuous values for the data item 'Number of times participated in sport/physical recreation as a player in the last 12 months' (Sport Example). Note that the numbers of respondents for the 'Don't know' category no longer contribute to the table.

Any special codes for continuous data items are listed in the Data Item List.
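The effect of applying a range can be mimicked in a small sketch. This is not how TableBuilder is implemented; it only reproduces the behaviour described above. The code 9999 is the example quoted in the text, while the range boundaries, field names and weights are hypothetical.

```python
# Special code quoted in the text; other items may use different codes.
SPECIAL_CODES = {9999: "Not applicable"}

# Hypothetical ranges for a 'number of times participated' item.
RANGES = [(1, 12), (13, 52), (53, 365)]


def tabulate_with_ranges(rows):
    """Sum weights into ranges, leaving out records that carry a special code."""
    counts = {f"{low}-{high}": 0.0 for low, high in RANGES}
    for r in rows:
        value = r["times"]
        if value in SPECIAL_CODES:
            continue  # special codes are excluded from every range
        for low, high in RANGES:
            if low <= value <= high:
                counts[f"{low}-{high}"] += r["weight"]
                break
    # The total covers valid responses only.
    counts["Total"] = sum(counts.values())
    return counts


rows = [
    {"times": 5, "weight": 100.0},
    {"times": 20, "weight": 200.0},
    {"times": 104, "weight": 50.0},
    {"times": 9999, "weight": 300.0},
]

table = tabulate_with_ranges(rows)
print(table)
```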

### Field exclusion rules

To ensure confidentiality, TableBuilder prevents the cross-tabulation of certain variables which could result in respondents being identified. These are known as field exclusion rules. These restrictions have been applied to the sub-state geographic and SEIFA data items such that only one sub-state geographic or SEIFA data item can be included in any one table.

The sub-state geographic and SEIFA data items available are:

• Greater Capital City Statistical Areas
• Remoteness Areas - ASGS
• Section of State - ASGS
• SEIFA - Index of Relative Socio-economic Advantage and Disadvantage - 2011 - SA1 - Deciles National
• SEIFA - Index of Relative Socio-economic Disadvantage - 2011 - SA1 - Deciles National
• SEIFA - Index of Relative Socio-economic Advantage and Disadvantage - 2011 - SA1 - Deciles State
• SEIFA - Index of Relative Socio-economic Disadvantage - 2011 - SA1 - Deciles State

If field exclusion rules exist for certain variables, users will see the following message: “Maximum number of fields in exclusion group exceeded.”
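When planning a set of tables, the rule can be checked in advance with a simple helper. The sketch below is a hypothetical pre-check written for illustration, not part of TableBuilder; the function name `check_table` is invented, and the item names and message text are copied from this section.

```python
# Sub-state geographic and SEIFA data items listed in this section.
EXCLUSION_GROUP = {
    "Greater Capital City Statistical Areas",
    "Remoteness Areas - ASGS",
    "Section of State - ASGS",
    "SEIFA - Index of Relative Socio-economic Advantage and Disadvantage - 2011 - SA1 - Deciles National",
    "SEIFA - Index of Relative Socio-economic Disadvantage - 2011 - SA1 - Deciles National",
    "SEIFA - Index of Relative Socio-economic Advantage and Disadvantage - 2011 - SA1 - Deciles State",
    "SEIFA - Index of Relative Socio-economic Disadvantage - 2011 - SA1 - Deciles State",
}

MESSAGE = "Maximum number of fields in exclusion group exceeded."


def check_table(fields):
    """Raise ValueError if more than one item from the exclusion group is used."""
    used = [f for f in fields if f in EXCLUSION_GROUP]
    if len(used) > 1:
        raise ValueError(MESSAGE)
    return True


# Allowed: a single sub-state geographic item.
check_table(["Sex of person", "Remoteness Areas - ASGS"])

# Not allowed: two items from the exclusion group in one table.
try:
    check_table(["Remoteness Areas - ASGS", "Section of State - ASGS"])
except ValueError as err:
    print(err)
```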

To minimise the risk of identifying individuals in aggregate statistics, a technique is used to randomly adjust cell values. This technique is called perturbation. Perturbation involves small random adjustment of the statistics and is considered the most satisfactory technique for avoiding the release of identifiable statistics while maximising the range of information that can be released. These adjustments have a negligible impact on the underlying pattern of the statistics. After perturbation, a given published cell value will be consistent across all tables. However, adding up cell values to derive a total will not necessarily give the same result as published totals. The introduction of perturbation in publications ensures that these statistics are consistent with statistics released via services such as TableBuilder.

### Zero value cells

Tables generated from sample surveys will sometimes contain cells with zero values because no respondents that satisfy the parameters of the cell were in the survey. This is despite there being people in the population with those characteristics. That is, the cell may have had a value above zero if all persons in scope of the survey had been enumerated. This is an example of sampling variability which occurs with all sample surveys. Relative Standard Errors cannot be generated for zero cells. Whilst the tables may include cells with zero values, the ABS does not publish such zero estimates in Participation in Sport and Physical Recreation, Australia (cat. no. 4177.0) and recommends that TableBuilder clients do not use these data either.
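The reason an RSE cannot be generated for a zero cell follows from its usual definition as the SE expressed as a percentage of the estimate, which requires division by the estimate. A minimal sketch, with an invented function name and made-up figures:

```python
def relative_standard_error(estimate, standard_error):
    """Return the RSE as a percentage, or None when the estimate is zero."""
    if estimate == 0:
        return None  # division by a zero estimate is undefined
    return standard_error / estimate * 100.0


rse_cell = relative_standard_error(12000.0, 1800.0)
rse_zero = relative_standard_error(0.0, 0.0)

print(rse_cell, rse_zero)
```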

### Multi-response data items

A number of the survey's data items allow respondents to provide more than one response. These are referred to as 'multi-response data items'. An example of such a data item is shown below. For this data item respondents can report all types of facilities they have used for sport/physical recreation activity in the last 12 months.

When a multi-response data item is tabulated, a person is counted against each response they have provided (e.g. a person who used "outdoor sports facilities" and "off-road cycleways or bike paths" will be counted once in each of these two categories).

As a result, each person in the appropriate population is counted at least once, and some persons are counted multiple times. Therefore, the total for a multi-response data item will be less than or equal to the sum of its components. Multi-response data items can be identified by the initials 'MR' in the data item list, which can be accessed from the Data downloads section. In the example below, the sum of the components is 27,039,900 whereas the total population is 18,474,000.
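The counting rule can also be illustrated with a toy calculation that is separate from the survey figures quoted above. The weights are made up, and 'indoor sports or fitness centres' is a hypothetical category added for the example.

```python
from collections import defaultdict

# Hypothetical respondents: a person weight and the set of facilities reported.
rows = [
    {
        "weight": 100.0,
        "facilities": {
            "outdoor sports facilities",
            "off-road cycleways or bike paths",
        },
    },
    {"weight": 200.0, "facilities": {"outdoor sports facilities"}},
    {
        "weight": 50.0,
        "facilities": {
            "outdoor sports facilities",
            "indoor sports or fitness centres",
        },
    },
]

# A person contributes their weight to every response they gave.
components = defaultdict(float)
for r in rows:
    for facility in r["facilities"]:
        components[facility] += r["weight"]

# The total counts each person once, however many responses they gave.
total = sum(r["weight"] for r in rows)
sum_of_components = sum(components.values())

print(dict(components))
print(f"total = {total}, sum of components = {sum_of_components}")
```

Here the total (350) is smaller than the sum of the components (500) because two of the three respondents reported more than one facility.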

## Data item list

A complete list of all data items included on the Participation in Sport and Physical Recreation TableBuilder file is provided in an Excel spreadsheet that can be accessed from the Data downloads section. The population applicable to each data item is also shown.

### TableBuilder data

Data items are generally available for cross-tabulation using the TableBuilder, although some restrictions may apply.

A list of data items available for use with the TableBuilder, including relevant population and classification details, can be found in the Data downloads section.

For a complete list of all data items included on the TableBuilder, refer to the Excel spreadsheet in the Data downloads section. The data item spreadsheet has 11 worksheets, including:

• data items on demographics
• data items on geography
• data items on labour force characteristics
• data items on household income characteristics
• data items on personal income characteristics
• data items on education characteristics
• data items on participation in sport and physical recreation: person level participation
• data items on participation in sport and physical recreation: activity level participation
• data items on involvement in organised sport and physical activity: person level IOSPA
• data items on involvement in organised sport and physical activity: role level IOSPA

## Conditions of use

### User responsibilities

The Census and Statistics Act 1905 includes a legislative guarantee to respondents that their confidentiality will be protected. This is fundamental to the trust the Australian public has in the ABS, and that trust is in turn fundamental to the excellent quality of ABS information. Without that trust, survey respondents may be less forthcoming or truthful in answering ABS questionnaires. For more information, see 'Avoiding inadvertent disclosure' and 'Microdata' on the ABS web page How the ABS keeps your information confidential.

In accordance with the Census and Statistics Act, data in TableBuilder are subjected to a confidentiality process before release. The release of microdata must satisfy the ABS legislative obligation to release information in a manner that is not likely to enable the identification of a particular person or organisation.

This confidentiality process is applied to avoid releasing information that may lead to the identification of individuals, families, households, dwellings or businesses.

Prior to being granted access to TableBuilder, users must agree to the following ABS Terms and Conditions of Microdata Access:

• understand that the ABS has taken great care to ensure that the information on the survey output record file is correct and as accurate as possible, and understand that the ABS does not guarantee, or accept any legal liability whatsoever arising from, or connected to, the use of any material contained within, or derived from, TableBuilder
• understand that all data extracted from the survey output record file through TableBuilder will be confidentialised prior to being supplied and that, as a result, no reliance should be placed on small cells as they are impacted by random adjustment and by respondent and processing errors
• inform the ABS, through their Contact Officer, upon leaving their organisation so that their access can be disabled.

