1406.0.55.005 - TableBuilder, User Guide  
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 15/12/2016   
   Page tools: Print Print Page Print all pages in this productPrint All RSS Feed RSS Bookmark and Share Search this Product

SUMMATION OPTIONS, RANGES AND QUANTILES

This section includes:
Summation options
Ranges
Quantiles


SUMMATION OPTIONS

UNDERSTANDING WHAT IS BEING COUNTED

Some datasets allow users to choose a summation option - what the table is counting. For example, a table may include options for counting people or number of drinks of alcohol consumed. For other datasets there is a default summation option, as it does not make sense to count anything other than the default option.

Census TableBuilder has default summation options set for all datasets, and the summation options cannot be changed. Information about the Default Summation can be found by clicking on the information i link.
Image: Default summation displaying for a Census dataset

The Summation Options control what is being measured in the table. If a Summation Option is not added to the table, TableBuilder automatically adds the Default Summation option for that dataset.

Summation Options allow users to add, remove and change whether the table contains estimates of the sum, median or mean of a continuous variable, or contains estimates of population sizes. For example, instead of counting the number of males and females who consume alcohol, a user can display the total number of standard drinks consumed by males and females. For survey data, this total is weighted based on population estimates.

Sum, Median and Mean are summation options that are available for continuous variables. Ranges can also be created for continuous variables. There are no continuous variables in Census TableBuilder datasets and therefore Sum, Median and Mean options are not available. An example of a continuous variable is Income, where the value can be any number between $0 and a set maximum, including decimal values. Variables that are not continuous are categorical variables. An example of a categorical variable is State, where there are set categories only, such as New South Wales or Victoria. Some variables can be loaded to TableBuilder as categorical or continuous, such as Age.

A table which includes a Sum, Median or Mean for a continuous variable will only include those records in the dataset that have a valid value. For example, records with responses for the continuous variable such as 'N/A' or 'Did not respond' will be excluded. The records which have valid responses for a variable are determined for each individual cell of the table, including total cells. For further information about valid values, see Interpreting sums, means and medians.

ADD A SUM, MEDIAN OR MEAN TO A TABLE

  1. Starting with an empty table, double-click on the Summation Options section at the top of the left panel of the Table View. If the dataset includes any continuous variables, or if users can select different summation options for the dataset, these are displayed here.
    .
  2. Select the Sum, Median or Mean tick box and Add to Row, Column or Wafer. In this example, the Sum of Number of standard drinks consumed by type of drinks consumed has been added to Row.
    Image: Summation options with Sum selected displaying on variables list panel
    .
  3. Only one Summation Option can be added to each table. If a user adds more than one Summation Option to a table by ticking Sum, Median and Mean at the same time, TableBuilder displays a message that 'You are adding more than one summation option to the table, only one summation option is allowed per table for the current dataset.'
    Image: Error message indicating more than one summation option cannot be added to the same table
    .
  4. The Weighted sum of Number of standard drinks consumed by type of drinks consumed has been added to Row.
    Image: Weighted sum of Number of standard drinks added to table
    .
  5. Other variables can then be added to the table. The variable Sex is added to Column, and the data retrieved. This displays the Weighted sum of Number of standard drinks consumed in the last week, which is 62 million standard drinks for all males in Australia and 28 million for all females.
    Image: Other variables added with weighted sum of Number of standard drinks
    .
  6. To display the Median or Mean Number of standard drinks consumed instead of the Sum, select the new option from the Summation Options on the left, and Add to Row again. This automatically replaces the Summation Option currently in the table.
    Image: Summation options with Median selected displaying on variables list panel
    .
  7. After retrieving the data, the Weighted median Number of standard drinks consumed during the last week was 2.9 drinks for Males, 1.9 drinks for Females and 2.7 drinks for all persons.
    Image: Weighted median displaying with Sex
    .
  8. Similarly, when the Weighted mean Number of standard drinks consumed is added to the row, it automatically replaces the Median. The Weighted mean of Number of standard drinks consumed during the last week was 4.2 drinks for Males, 2.8 drinks for Females and 3.6 drinks for all persons.
    Image: Weighted mean displaying with Sex
    .
  9. If all Summation Options are removed from the table, TableBuilder automatically adds in the Default summation option. If the Weighted mean is removed from the above table, by dragging it into Trash, TableBuilder confirms that 'You have removed all the summation options from the table. The table is now using the default summation option, listed in the Filter'. In this case, the table has reverted to counting Selected persons.
    Image: Message indicating that removing all summation options from a table reverts the table to default summation
INTERPRETING SUMS, MEANS AND MEDIANS

Care must be taken when interpreting estimates of sums and means of continuous variables. Users should read the entry for the continuous variable in the Data Item List for the dataset and examine the Population, and the 'special response' categories. Information about each variable's population is available in the dataset's Data Item List on the Downloads tab via the i link.

A continuous variable on a survey dataset has an associated range of 'valid value' responses, and also may have various categories of response that are 'special', for example one special response may be 'Not applicable', and another 'Not stated'. These special responses may occur for a variety of reasons, such as the relevant survey question not applying to certain units, or the information being unable to be determined. The survey documentation should be consulted for detailed information, via the i link next to the dataset name in the top left corner.

Whenever a sum or mean is included in a table for a continuous variable, the statistic is estimated for the variable's reference population with a valid response.

A continuous variable that does have possible special responses appears in two different sections of the variables list panel. It appears as a selectable Summation Option, and is also present as a categorical variable under the relevant grouping. The version that appears as a categorical variable contains variable categories for each of the special response types, and one for 'valid' responses. Using this variable, population estimates of the various special response types (such as 'No Response') can be obtained. It is highly recommended when interpreting a table of means or sums of a variable that the corresponding categorical variable be used in separate tabulations of population counts.

When interpreting a table of median estimates, it is also important to understand the population for which the estimate applies, and the valid responses. If there are a small number of units making up a cell's reference population with a valid response, the cell may be suppressed, showing a value of '0' or 'np'. The suppression occurs to prevent the release of disclosive information. The Relative Standard Error (RSE) for each median estimate is estimated using the Woodruff method, which is a replicate weight method. Further information is available in the Relative Standard Error section.
  1. When working with Sum, Median and Mean, it is important to check the Data Item List for the dataset to see the population to which the variable applies. The Data Item List can be accessed via the i link at the top left of the Table View screen, next to the dataset name.
    Image: i link next to the dataset name in top left corner
    .
  2. From the microdata publication, go to the Downloads tab to open the TableBuilder Data Item List. Information for the variable that was used for the above example is available in the Data Item List for this dataset. In this example, respondents who Did not consume alcohol in the last week or Have never consumed alcohol and respondents who are younger than 15 years are not included in the Sum, Median or the Mean.
    Image: Example Data Item List in Excel
    .
  3. From the above example, searching for Number of standard drinks consumed using the search box in the lower left corner, finds two results in the variables list. One is the Summation Option and one is a categorical variable that can be used to find the number of records that recorded a valid response, and that were used to calculate the Sum, Median and Mean.
    Image: Number of standard drinks as a categorical variable
    .
  4. If a table is created using Sex and the categorical version of Number of standard drinks consumed, the weighted number of persons whose responses contributed to the Sum and Mean are displayed. This table is now counting persons (in 000's). The categorical variable for Number of standard drinks consumed provides estimates of the size of the population to which the above Sum, Median and Mean are calculated. It also provides the sizes of the populations to which it does not apply, for example, the category 'Have never consumed alcohol), including those people under the age of 15 years.
    .
    There were weighted counts of 5.8 million males and 4.5 million females whose responses were included in the Sum and Mean. A very different result would have occurred for Sum and Mean if the persons who did not consume any alcohol were recorded as consuming a valid value of 0 standard drinks.
    Image: Table using categorical Number of standard drinks and Default summation of selected persons
RANGES

For each continuous variable on the dataset, users can create Ranges and include them in a table. Ranges allow users to merge responses to continuous variables in a way that suits their analysis. For example, Age ranges can be created from single years to 15-17 years, 18-29 years, 30-49 years and 50+ if that is required.

All variables in Census TableBuilder datasets are categorical. Categorical variables can also be grouped. See the Custom data section for further information.

When using a Ranges variable in a table, the ranges span all valid values for that variable. The table will not include units which did not provide a ‘valid’ response for that variable. Even if all the categories within a Ranges variable are added to a table, the grand total of the table may be reduced due to units without a valid value being excluded. See Interpreting sums, means and medians for more information about valid responses for continuous variables. Users should also read the entry for the continuous variable in the Data Item List for the dataset and examine the Population, and the 'special response' categories. Information about each variable's population is available in the dataset's Data Item List on the Downloads tab via the i link.

CREATING RANGES
  1. In the left panel, open the Summation Options to select the variable to be put into ranges. Click on the Range button.
    Image: Summation options displaying with Range button
    .
  2. The Range and Quantile dialogue box appears. Enter a name for the range (no longer than 25 characters). When selecting the From and To values, select less than or less than or equal to in the drop-down menu to choose whether to include the lower boundary or upper boundary in each range. Enter the increment for each range.
    Image: Range dialogue box with from, to and increment settings
    .
  3. If values are selected outside the allowable range values, TableBuilder displays information about the minimums and maximums for each field. Click Next.
    Image: Range dialogue box with errors where ranges are not able to be applied
    .
  4. The ranges to be included are displayed. Edit by clicking Back or continue by clicking Create.
    Image: Chosen ranges displayed
    .
  5. This creates your range, and adds it to the variables list panel in the Table View. Click on the Ranges folder to view your range. The Ranges variable is saved, and when you log off and log back in at a later time it will still be available for use.
    Image: Ranges displaying in variables list panel
    .
  6. Custom ranges can be added to a table like any other variable.
    Image: Adding custom range to a table
CLONE, DOWNLOAD AND DELETE RANGES
  1. To clone or delete a Ranges variable, find the Ranges folder in the left hand panel. Click the Manage button next to the variable.
    Image: Ranges in variable list panel showing Manage button
    .
  2. The Ranges Management dialogue box opens.
    Image: Ranges management dialogue box
    .
  3. To Clone a range, click the Clone button. The Ranges and Quantiles dialogue box opens with the parameter values of the existing Ranges variable entered. These parameter can be edited, and the name can be changed before clicking the Next button and continuing the creation process. Ranges variable names must be unique.
    .
  4. To Delete a range permanently, click the Delete button. A confirmation dialogue box opens showing any Groups (created from Custom data) and any tables you have saved that use this Ranges variable. Clicking OK will delete these groups and tables along with the Ranges variable. Once deleted, that Ranges variable no longer appears in the left panel in the Tables view.
    Image: Delete range dialogue box
QUANTILES

Quantiles can be created for each continuous variable on the dataset. Quantiles are ranges that have an equal number of records in each range, such as Deciles (10 ranges), quintiles (5 ranges), quartiles (4 ranges) and the median (2 ranges). Quantiles can be applied to the whole population or to a sub-population by using a filter. Quantiles can also be saved as a Quantile Ranges variable and can then be used in tabulations.

All variables in Census TableBuilder datasets are categorical. Categorical variables can also be grouped. See the Custom data section for further information.
GENERATING QUANTILE BOUNDARIES AND RANGES
  1. In the left panel, open the Summation Options to select the variable to be put into ranges. Click on the Range button.
    Image: Summation options showing selecting Range button
    .
  2. The Range and Quantile dialogue box appears, open at the Custom Ranges tab.
    - Select the Quantile tab.
    - Enter a name for the Quantile (no longer than 25 characters).
    - Click the drop-down menu for the Number of Ranges to choose from Median, Quartile, Quintile, and Decile options.
    - Click the drop-down menu next to Equal distribution of and choose the Distribution variable. For an ordinary quantile, ensure that the appropriate level is chosen as the Distribution variable, and do not choose a continuous variable here. See Equal distribution quantiles for more information.
    Image: Range dialogue box on the Quantile tab
    .
  3. Choose the subpopulation for which quantiles will be generated by using the Filter By section. Navigate the left panel to the categories to be included in the filter. Click the tick boxes next to the variables to be included in the filter, for example Sex of person ‘Male’. Then click the Move button. The window on the right displays the filter, is updated to include the variables and categories moved. Leave the filter empty if the quantile is to be applied to the whole population.
    Image: Adding filters to a range
    .
  4. More variables and categories can be selected using the same process. Variables and categories can also be removed from the filter by selecting tick boxes in the window on the right and then clicking the Move button. See Interpreting quantiles for more details on how to interpret a quantile with a filter.
    .
  5. Click the Next button. The window updates to show the estimated quantile boundaries along with their RSEs.
    Image: Custom ranges displaying before being created
    .
  6. Users may choose to:
    - cancel by clicking Cancel
    - make further changes to the quantile by clicking Back
    - download the displayed table in csv format by clicking the Download button, or
    - create a Quantile Ranges variable by clicking the Create button.
    .
  7. By clicking Create, the boundaries are used to create a new Quantile variable with the name you entered. The Quantile variable can be accessed from the Ranges folder in the left panel of the Table View.
    Image: Range displaying in variables list panel
CLONE, DOWNLOAD AND DELETE QUANTILE RANGES
  1. To clone or delete a Quantile Ranges variable, find the Quantile Range in the left panel in the Ranges folder. Click the Manage button next to the variable.
    Image: Ranges in variables panel showing the Manage button
    .
  2. The Quantile Ranges Management dialogue box opens.
    Image: Quantile ranges management dialogue box
    .
  3. To Clone a quantile range, click the Clone button. The Ranges and Quantiles dialogue box opens with the parameter values of the existing Quantile Ranges variable entered. These parameter can be edited, and the name can be changed before clicking the Next button and continuing the creation process. Quantile Ranges variable names must be unique.
    .
  4. To Download a quantile range in csv format, click the Download button.
    .
  5. To Delete a quantile range, click Delete to permanently delete the Quantile Ranges variable. A confirmation dialogue box opens showing any Groups (created from Custom data) and any tables you have saved that use this Quantile Ranges variable. Clicking OK will delete these groups or tables along with the Quantile Ranges variable. Once deleted, that Quantile Ranges variable no longer appears in the variables list panel.
    Image: Deleting quantiles dialogue box

INTERPRETING QUANTILES

When using a Quantile Ranges variable in a table, the ranges span all valid values for that variable. The table will not include units which did not provide a ‘valid’ response for that variable. Even if all categories within a Quantile Ranges variable are added to a table, the grand total of the table may be reduced due to units without a valid value being excluded. See Interpreting sums, means and medians for more information about valid responses for continuous variables. Users should also read the entry for the continuous variable in the Data Item List for the dataset and examine the Population, and the 'special response' categories. Information about each variable's population is available in the dataset's Data Item List on the Downloads tab via the i link.

In the example, the estimates are quartiles of the variable Body mass index (BMI) - score measured weighted by Selected Persons level, for the subpopulation of persons with Sex being Male. The window on the right displays the quantiles, where the Range column shows the number of the quantile, the Max column shows the value of the quantile and the RSE column the estimate’s RSE. In the example, the first quartile (25th percentile) is 21.729 with RSE 0.43%, the second quartile (50th percentile) is 25.75 with RSE 0.29%, and the third quartile (75th percentile) is 29.39 with RSE 0.39%. The value for the final range is always shown with infinity, as this range has no maximum.
Image: Range dialogue box with ranges displayed for one filter

In some situations, it is possible that an estimate of a quantile may be displayed, but the estimate of the RSE is displayed as *. This occurs when the RSE cannot be calculated reliably and the estimate should be treated with caution. When a Quantile Ranges variable is created by clicking the Create button, the first range has no minimum value and the last range has no maximum value. This is why the value of Max for the last entry is displayed as infinity with an RSE of *.

When multiple variables with multiple categories are present in the Filter By panel, the population is filtered by restricting to records that belong to at least one of the variables, for each separate variable. For example, in the filter below, the subpopulation is all Male persons who have a Country of birth in North-East Asia.
Image: Range dialogue box with ranges displayed for two filters

It is possible to use variables from different levels of the data in a filter. For example, it is possible to use a filter of State or Territory being NSW, a Household level variable, when requesting quantiles of a Person level variable such as Age of Person. If this is the case, the population is filtered by restricting to the appropriate collection of units. The population over which quantiles are estimated would be those persons belonging to a household in NSW.

Once the Quantile Range variable has been created and is used in a table, it applies to the entire population, and not to the subpopulation that the quantile refers to.

For example, if the 1st Quantile Range above was selected from the Ranges Group and used in a table of estimated counts, the estimates would be estimated counts of the entire population, and not restricted to persons with particular values for Country of Birth and Sex.

EQUAL DISTRIBUTION QUANTILES

In addition to ordinary quantiles as described above, Equal Distribution Quantiles can also be generated. By selecting a continuous variable as the Equal Distribution by variable, groups are created that contribute the same proportion of the total of that variable, rather than groups that have the same proportion of the total number of records.

For example, select the variable Average daily intake over week (in mls) is selected as the continuous variable (by clicking on the Range button next to this variable). Choose deciles and set a filter for Sex is Male. For the Equal Distribution of variable:
  • If Persons is selected as the Equal Distribution of variable, then the deciles of Average daily intake over week (in mls) is created. These deciles can be interpreted as equally distributing the number of persons into ten groups. Each group has an equal share of the number of persons: 10%.
  • If Average daily intake over week (in mls) is chosen as the Equal distribution of variable, then the deciles created do not equally distribute the number of persons into ten groups. Instead, it is Average daily intake over week (in mls) that is equally distributed amongst the ten groups, so that each of the ten groups created represents a 10% share of the total of Average daily intake over week (in mls) within the subpopulation of all Males.

It is important when generating ordinary quantiles, that the Equal distribution of variable is the appropriate weight, and is not a continuous variable.

The Equal distribution of variable cannot contain negative values.