SUMMATION OPTIONS, RANGES AND QUANTILES
This section includes:
Summation options
Ranges
Quantiles
SUMMATION OPTIONS
UNDERSTANDING WHAT IS BEING COUNTED
Some datasets allow users to choose a summation option - what the table is counting. For example, a table may include options for counting people or number of drinks of alcohol consumed. For other datasets there is a default summation option, as it does not make sense to count anything other than the default option.
Census TableBuilder has default summation options set for all datasets, and the summation options cannot be changed. Information about the Default Summation can be found by clicking on the information i link.
The Summation Options control what is being measured in the table. If a Summation Option is not added to the table, TableBuilder automatically adds the Default Summation option for that dataset.
Summation Options allow users to add, remove and change whether the table contains estimates of the sum, median or mean of a continuous variable, or contains estimates of population sizes. For example, instead of counting the number of males and females who consume alcohol, a user can display the total number of standard drinks consumed by males and females. For survey data, this total is weighted based on population estimates.
Sum, Median and Mean are summation options that are available for continuous variables. Ranges can also be created for continuous variables. There are no continuous variables in Census TableBuilder datasets and therefore Sum, Median and Mean options are not available. An example of a continuous variable is Income, where the value can be any number between $0 and a set maximum, including decimal values. Variables that are not continuous are categorical variables. An example of a categorical variable is State, where there are set categories only, such as New South Wales or Victoria. Some variables can be loaded to TableBuilder as categorical or continuous, such as Age.
A table which includes a Sum, Median or Mean for a continuous variable will only include those records in the dataset that have a valid value. For example, records with responses for the continuous variable such as 'N/A' or 'Did not respond' will be excluded. The records which have valid responses for a variable are determined for each individual cell of the table, including total cells. For further information about valid values, see Interpreting sums, means and medians.
ADD A SUM, MEDIAN OR MEAN TO A TABLE
- Starting with an empty table, double-click on the Summation Options section at the top of the left panel of the Table View. If the dataset includes any continuous variables, or if users can select different summation options for the dataset, these are displayed here.
.
- Select the Sum, Median or Mean tick box and Add to Row, Column or Wafer. In this example, the Sum of Number of standard drinks consumed by type of drinks consumed has been added to Row.
.
- Only one Summation Option can be added to each table. If a user adds more than one Summation Option to a table by ticking Sum, Median and Mean at the same time, TableBuilder displays a message that 'You are adding more than one summation option to the table, only one summation option is allowed per table for the current dataset.'
.
- The Weighted sum of Number of standard drinks consumed by type of drinks consumed has been added to Row.
.
- Other variables can then be added to the table. The variable Sex is added to Column, and the data retrieved. This displays the Weighted sum of Number of standard drinks consumed in the last week, which is 62 million standard drinks for all males in Australia and 28 million for all females.
.
- To display the Median or Mean Number of standard drinks consumed instead of the Sum, select the new option from the Summation Options on the left, and Add to Row again. This automatically replaces the Summation Option currently in the table.
.
- After retrieving the data, the Weighted median Number of standard drinks consumed during the last week was 2.9 drinks for Males, 1.9 drinks for Females and 2.7 drinks for all persons.
.
- Similarly, when the Weighted mean Number of standard drinks consumed is added to the row, it automatically replaces the Median. The Weighted mean of Number of standard drinks consumed during the last week was 4.2 drinks for Males, 2.8 drinks for Females and 3.6 drinks for all persons.
.
- If all Summation Options are removed from the table, TableBuilder automatically adds in the Default summation option. If the Weighted mean is removed from the above table, by dragging it into Trash, TableBuilder confirms that 'You have removed all the summation options from the table. The table is now using the default summation option, listed in the Filter'. In this case, the table has reverted to counting Selected persons.
INTERPRETING SUMS, MEANS AND MEDIANS
Care must be taken when interpreting estimates of
sums and
means of continuous variables. Users should read the entry for the continuous variable in the Data Item List for the dataset and examine the Population, and the 'special response' categories. Information about each variable's population is available in the dataset's Data Item List on the Downloads tab via the i link.
A continuous variable on a survey dataset has an associated range of 'valid value' responses, and also may have various categories of response that are 'special', for example one special response may be 'Not applicable', and another 'Not stated'. These special responses may occur for a variety of reasons, such as the relevant survey question not applying to certain units, or the information being unable to be determined. The survey documentation should be consulted for detailed information, via the i link next to the dataset name in the top left corner.
Whenever a sum or mean is included in a table for a continuous variable, the statistic is estimated for the variable's reference population with a valid response.
A continuous variable that does have possible special responses appears in two different sections of the variables list panel. It appears as a selectable Summation Option, and is also present as a categorical variable under the relevant grouping. The version that appears as a categorical variable contains variable categories for each of the special response types, and one for 'valid' responses. Using this variable, population estimates of the various special response types (such as 'No Response') can be obtained. It is highly recommended when interpreting a table of means or sums of a variable that the corresponding categorical variable be used in separate tabulations of population counts.
When interpreting a table of
median estimates, it is also important to understand the population for which the estimate applies, and the valid responses. If there are a small number of units making up a cell's reference population with a valid response, the cell may be suppressed, showing a value of '0' or 'np'. The suppression occurs to prevent the release of disclosive information. The Relative Standard Error (RSE) for each median estimate is estimated using the Woodruff method, which is a replicate weight method. Further information is available in the
Relative Standard Error section.
- When working with Sum, Median and Mean, it is important to check the Data Item List for the dataset to see the population to which the variable applies. The Data Item List can be accessed via the i link at the top left of the Table View screen, next to the dataset name.
.
- From the microdata publication, go to the Downloads tab to open the TableBuilder Data Item List. Information for the variable that was used for the above example is available in the Data Item List for this dataset. In this example, respondents who Did not consume alcohol in the last week or Have never consumed alcohol and respondents who are younger than 15 years are not included in the Sum, Median or the Mean.
.
- From the above example, searching for Number of standard drinks consumed using the search box in the lower left corner, finds two results in the variables list. One is the Summation Option and one is a categorical variable that can be used to find the number of records that recorded a valid response, and that were used to calculate the Sum, Median and Mean.
.
- If a table is created using Sex and the categorical version of Number of standard drinks consumed, the weighted number of persons whose responses contributed to the Sum and Mean are displayed. This table is now counting persons (in 000's). The categorical variable for Number of standard drinks consumed provides estimates of the size of the population to which the above Sum, Median and Mean are calculated. It also provides the sizes of the populations to which it does not apply, for example, the category 'Have never consumed alcohol), including those people under the age of 15 years.
.
There were weighted counts of 5.8 million males and 4.5 million females whose responses were included in the Sum and Mean. A very different result would have occurred for Sum and Mean if the persons who did not consume any alcohol were recorded as consuming a valid value of 0 standard drinks.
RANGES
For each continuous variable on the dataset, users can create
Ranges and include them in a table. Ranges allow users to merge responses to continuous variables in a way that suits their analysis. For example, Age ranges can be created from single years to 15-17 years, 18-29 years, 30-49 years and 50+ if that is required.
All variables in Census TableBuilder datasets are categorical. Categorical variables can also be grouped. See the
Custom data section for further information.
When using a Ranges variable in a table, the ranges span all valid values for that variable. The table will not include units which did not provide a ‘valid’ response for that variable. Even if all the categories within a Ranges variable are added to a table, the grand total of the table may be reduced due to units without a valid value being excluded. See
Interpreting sums, means and medians for more information about valid responses for continuous variables. Users should also read the entry for the continuous variable in the Data Item List for the dataset and examine the Population, and the 'special response' categories. Information about each variable's population is available in the dataset's Data Item List on the Downloads tab via the i link.
CREATING RANGES
- In the left panel, open the Summation Options to select the variable to be put into ranges. Click on the Range button.
.
- The Range and Quantile dialogue box appears. Enter a name for the range (no longer than 25 characters). When selecting the From and To values, select less than or less than or equal to in the drop-down menu to choose whether to include the lower boundary or upper boundary in each range. Enter the increment for each range.
.
- If values are selected outside the allowable range values, TableBuilder displays information about the minimums and maximums for each field. Click Next.
.
- The ranges to be included are displayed. Edit by clicking Back or continue by clicking Create.
.
- This creates your range, and adds it to the variables list panel in the Table View. Click on the Ranges folder to view your range. The Ranges variable is saved, and when you log off and log back in at a later time it will still be available for use.
.
- Custom ranges can be added to a table like any other variable.
CLONE, DOWNLOAD AND DELETE RANGES
- To clone or delete a Ranges variable, find the Ranges folder in the left hand panel. Click the Manage button next to the variable.
.
- The Ranges Management dialogue box opens.
.
- To Clone a range, click the Clone button. The Ranges and Quantiles dialogue box opens with the parameter values of the existing Ranges variable entered. These parameter can be edited, and the name can be changed before clicking the Next button and continuing the creation process. Ranges variable names must be unique.
.
- To Delete a range permanently, click the Delete button. A confirmation dialogue box opens showing any Groups (created from Custom data) and any tables you have saved that use this Ranges variable. Clicking OK will delete these groups and tables along with the Ranges variable. Once deleted, that Ranges variable no longer appears in the left panel in the Tables view.
QUANTILES
Quantiles can be created for each continuous variable on the dataset. Quantiles are ranges that have an equal number of records in each range, such as Deciles (10 ranges), quintiles (5 ranges), quartiles (4 ranges) and the median (2 ranges). Quantiles can be applied to the whole population or to a sub-population by using a filter. Quantiles can also be saved as a Quantile Ranges variable and can then be used in tabulations.
All variables in Census TableBuilder datasets are categorical. Categorical variables can also be grouped. See the
Custom data section for further information.
GENERATING QUANTILE BOUNDARIES AND RANGES
- In the left panel, open the Summation Options to select the variable to be put into ranges. Click on the Range button.
.
- The Range and Quantile dialogue box appears, open at the Custom Ranges tab.
- Select the Quantile tab.
- Enter a name for the Quantile (no longer than 25 characters).
- Click the drop-down menu for the Number of Ranges to choose from Median, Quartile, Quintile, and Decile options.
- Click the drop-down menu next to Equal distribution of and choose the Distribution variable. For an ordinary quantile, ensure that the appropriate level is chosen as the Distribution variable, and do not choose a continuous variable here. See Equal distribution quantiles for more information.
.
- Choose the subpopulation for which quantiles will be generated by using the Filter By section. Navigate the left panel to the categories to be included in the filter. Click the tick boxes next to the variables to be included in the filter, for example Sex of person ‘Male’. Then click the Move button. The window on the right displays the filter, is updated to include the variables and categories moved. Leave the filter empty if the quantile is to be applied to the whole population.
.
- More variables and categories can be selected using the same process. Variables and categories can also be removed from the filter by selecting tick boxes in the window on the right and then clicking the Move button. See Interpreting quantiles for more details on how to interpret a quantile with a filter.
.
- Click the Next button. The window updates to show the estimated quantile boundaries along with their RSEs.
.
- Users may choose to:
- cancel by clicking Cancel
- make further changes to the quantile by clicking Back
- download the displayed table in csv format by clicking the Download button, or
- create a Quantile Ranges variable by clicking the Create button.
.
- By clicking Create, the boundaries are used to create a new Quantile variable with the name you entered. The Quantile variable can be accessed from the Ranges folder in the left panel of the Table View.
CLONE, DOWNLOAD AND DELETE QUANTILE RANGES
- To clone or delete a Quantile Ranges variable, find the Quantile Range in the left panel in the Ranges folder. Click the Manage button next to the variable.
.
- The Quantile Ranges Management dialogue box opens.
.
- To Clone a quantile range, click the Clone button. The Ranges and Quantiles dialogue box opens with the parameter values of the existing Quantile Ranges variable entered. These parameter can be edited, and the name can be changed before clicking the Next button and continuing the creation process. Quantile Ranges variable names must be unique.
.
- To Download a quantile range in csv format, click the Download button.
.
- To Delete a quantile range, click Delete to permanently delete the Quantile Ranges variable. A confirmation dialogue box opens showing any Groups (created from Custom data) and any tables you have saved that use this Quantile Ranges variable. Clicking OK will delete these groups or tables along with the Quantile Ranges variable. Once deleted, that Quantile Ranges variable no longer appears in the variables list panel.
INTERPRETING QUANTILES
When using a Quantile Ranges variable in a table, the ranges span all valid values for that variable. The table will not include units which did not provide a ‘valid’ response for that variable. Even if all categories within a Quantile Ranges variable are added to a table, the grand total of the table may be reduced due to units without a valid value being excluded. See
Interpreting sums, means and medians for more information about valid responses for continuous variables. Users should also read the entry for the continuous variable in the Data Item List for the dataset and examine the Population, and the 'special response' categories. Information about each variable's population is available in the dataset's Data Item List on the Downloads tab via the i link.
In the example, the estimates are quartiles of the variable
Body mass index (BMI) - score measured weighted by
Selected Persons level, for the subpopulation of persons with
Sex being
Male. The window on the right displays the quantiles, where the Range column shows the number of the quantile, the Max column shows the value of the quantile and the RSE column the estimate’s RSE. In the example, the first quartile (25th percentile) is 21.729 with RSE 0.43%, the second quartile (50th percentile) is 25.75 with RSE 0.29%, and the third quartile (75th percentile) is 29.39 with RSE 0.39%. The value for the final range is always shown with infinity, as this range has no maximum.
In some situations, it is possible that an estimate of a quantile may be displayed, but the estimate of the RSE is displayed as *. This occurs when the RSE cannot be calculated reliably and the estimate should be treated with caution. When a Quantile Ranges variable is created by clicking the Create button, the first range has no minimum value and the last range has no maximum value. This is why the value of Max for the last entry is displayed as infinity with an RSE of *.
When multiple variables with multiple categories are present in the Filter By panel, the population is filtered by restricting to records that belong to at least one of the variables, for each separate variable. For example, in the filter below, the subpopulation is all
Male persons who have a
Country of birth in
North-East Asia.
It is possible to use variables from different levels of the data in a filter. For example, it is possible to use a filter of
State or Territory being
NSW, a Household level variable, when requesting quantiles of a Person level variable such as
Age of Person. If this is the case, the population is filtered by restricting to the appropriate collection of units. The population over which quantiles are estimated would be those persons belonging to a household in NSW.
Once the Quantile Range variable has been created and is used in a table, it applies to the entire population, and not to the subpopulation that the quantile refers to.
For example, if the 1st Quantile Range above was selected from the Ranges Group and used in a table of estimated counts, the estimates would be estimated counts of the entire population, and not restricted to persons with particular values for Country of Birth and Sex.
EQUAL DISTRIBUTION QUANTILES
In addition to ordinary quantiles as described above,
Equal Distribution Quantiles can also be generated. By selecting a continuous variable as the
Equal Distribution by variable, groups are created that contribute the same proportion of the total of that variable, rather than groups that have the same proportion of the total number of records.
For example, select the variable
Average daily intake over week (in mls) is selected as the continuous variable (by clicking on the Range button next to this variable). Choose deciles and set a filter for
Sex is
Male. For the
Equal Distribution of variable:
- If Persons is selected as the Equal Distribution of variable, then the deciles of Average daily intake over week (in mls) is created. These deciles can be interpreted as equally distributing the number of persons into ten groups. Each group has an equal share of the number of persons: 10%.
- If Average daily intake over week (in mls) is chosen as the Equal distribution of variable, then the deciles created do not equally distribute the number of persons into ten groups. Instead, it is Average daily intake over week (in mls) that is equally distributed amongst the ten groups, so that each of the ten groups created represents a 10% share of the total of Average daily intake over week (in mls) within the subpopulation of all Males.
It is important when generating ordinary quantiles, that the
Equal distribution of variable is the appropriate weight, and is not a continuous variable.
The
Equal distribution of variable cannot contain negative values.