4159.0.30.004 - Microdata: General Social Survey, Australia, 2014 Quality Declaration 
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 17/09/2015  First Issue

USING THE CURF

ABOUT THE CURF

The 2014 GSS Expanded CURF contains four separate record level files which are described in the File Structure document on the summary tab. Subject to the limitation of sample size, the data classifications used and the conditions of use, it is possible to interrogate the data, produce tabulations and undertake statistical analyses to individual specifications.

The data included in the CURF are released under the provisions of the Census and Statistics Act 1905. This legislation allows the Australian Statistician to release unit record data, or microdata, provided this is done "in a manner that is not likely to enable the identification of a particular person or organisation to which it relates." Accordingly, there are no names or addresses of survey respondents on the CURF and other steps, including the following list of actions, have been taken to protect the confidentiality of respondents:

-Excluding some data items that were collected.
-Applying value ranges, collapses or top-coding to some variables.
-Changing some demographic characteristics on a number of person records.

As a result, aggregated data obtained from the CURF will not exactly match estimates previously published in General Social Survey: Summary Results, Australia, 2014 (cat. no. 4159.0). Information about the impact of confidentialising actions on the CURF, and a comparison with published estimates for key populations, can be found in Table 1 below.

Table 1 shows the estimated population of persons aged fifteen years and over, in total and by state or territory, as previously published and as obtained from the CURF. Proportionally, the largest impact of the confidentialising process is on the Australian Capital Territory, where the estimate changed by less than one percent (0.9%).

CHANGE OF POPULATION ESTIMATES FOR PEOPLE AGED FIFTEEN YEARS AND OVER DUE TO RECORD MASKING, BY STATE

State or territory              Published ('000)    CURF ('000)    % change
New South Wales                          5 967.4        5 963.0        -0.1
Victoria                                 4 682.6        4 683.6         0.0
Queensland                               3 660.5        3 662.7         0.1
South Australia                          1 347.5        1 344.6        -0.2
Western Australia                        1 973.0        1 978.2         0.3
Tasmania                                   409.7          410.2         0.1
Northern Territory                         140.5          140.6         0.0
Australian Capital Territory               303.3          306.0         0.9
Total                                   18 486.0       18 488.8         0.0
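
The % change column appears to express the difference between the CURF and published estimates as a percentage of the published estimate. The Python sketch below assumes that formula (it is not stated on this page) and applies it to three rows of the table. The function name pct_change is illustrative only. Because the inputs are the rounded '000 figures shown above, a recomputed value can occasionally differ from the published column in the last decimal place.

```python
def pct_change(published: float, curf: float) -> float:
    """Change from the published estimate to the CURF estimate, in percent.

    Assumed formula: (CURF - published) / published * 100.
    """
    return (curf - published) / published * 100


# Estimates in '000, copied from the table above as (published, CURF).
estimates = {
    "New South Wales": (5967.4, 5963.0),
    "Western Australia": (1973.0, 1978.2),
    "Australian Capital Territory": (303.3, 306.0),
}

# Round to one decimal place, as in the table.
changes = {
    state: round(pct_change(published, curf), 1)
    for state, (published, curf) in estimates.items()
}

for state, change in changes.items():
    print(f"{state}: {change:+.1f}%")
```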


Steps to confidentialise the datasets made available on the CURF are undertaken in such a way as to ensure the integrity of the datasets and optimise the content, while maintaining the confidentiality of respondents. Intending purchasers should ensure that the data they require at the level of detail they require are available on the CURF; data obtained in the survey, but not contained on the CURF may be available in TableBuilder or in tabulated form on request. The Data Item Lists document on the Summary tab contains information about the list of data items, which is available as an Excel spreadsheet on the Downloads tab.

RECORD COUNTS

Table 2 shows the number of records on each level for the CURF dataset.

COUNTING UNITS AND NUMBER OF RECORDS, BY LEVEL

Level                       Counting unit                        Number of records
Household level             Households                                      12 932
Person level                Persons                                         12 932
Volunteering level          Instances of volunteering                       15 475
Access to services level    Services had difficulty accessing               15 198



IDENTIFIERS

There is a series of identifiers that can be used on records at each level of the file.

File level identifiers


The identifiers ABSHID, ABSPID, ABSVID and ABSDID appear on all levels of the file (as they are needed to create a hierarchical CSV file). Where an identifier is not relevant for a level, it has a value of 0.

Each household has a unique twelve-digit random identifier, ABSHID. This identifier appears on the Household level and is repeated on every other level. The Voluntary Work and Difficulty Accessing Service Providers episode levels are children of the Person level, so their unique identifiers are composed of the Household, Person and episode level identifiers. The composition of identifiers for each level is outlined below:

1. Household = ABSHID
2. Person = ABSHID, ABSPID
3. Voluntary Work = ABSHID, ABSPID, ABSVID
4. Difficulty Accessing Service Providers = ABSHID, ABSPID, ABSDID
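
The composition listed above can be sketched in Python as a composite key per level. This is a minimal illustration, not ABS code: the helper names (LEVEL_KEYS, record_key, keys_are_unique) and the two sample records are made up, and only the identifier names come from this page.

```python
# Identifier columns that together form the unique key on each level.
LEVEL_KEYS = {
    "Household": ("ABSHID",),
    "Person": ("ABSHID", "ABSPID"),
    "Voluntary Work": ("ABSHID", "ABSPID", "ABSVID"),
    "Difficulty Accessing Service Providers": ("ABSHID", "ABSPID", "ABSDID"),
}


def record_key(record, level):
    """Return the composite identifier of a record for the given level."""
    return tuple(record[name] for name in LEVEL_KEYS[level])


def keys_are_unique(records, level):
    """Check that no two records share the same composite identifier."""
    keys = [record_key(r, level) for r in records]
    return len(keys) == len(set(keys))


# Hypothetical Voluntary Work records: one person with two instances of
# volunteering. ABSDID is 0 because it is not relevant on this level.
volunteering = [
    {"ABSHID": "100000000001", "ABSPID": 1, "ABSVID": 1, "ABSDID": 0},
    {"ABSHID": "100000000001", "ABSPID": 1, "ABSVID": 2, "ABSDID": 0},
]

print(keys_are_unique(volunteering, "Voluntary Work"))  # full key
print(keys_are_unique(volunteering, "Person"))          # key without ABSVID
```

The second check fails because the person part of the key alone cannot tell the two volunteering episodes apart.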

Copying information across levels

Identifiers can be used to copy information from one level of the file to another. The following SAS code (or equivalent) merges the Person level file with the Household level file, matching records on ABSHID:

PROC SORT DATA=GSS14EP; *Person level file;
BY ABSHID;
RUN;

PROC SORT DATA=GSS14EH; *Household level file;
BY ABSHID;
RUN;

DATA MERGE_FILE;
MERGE GSS14EP (IN=A) GSS14EH (IN=B);
BY ABSHID;
IF A AND B THEN OUTPUT;
RUN;


The following SAS code (or equivalent) can be used to copy information from a higher level to a level below:

PROC SORT DATA=GSS14EP; *Person level file;
BY ABSHID;
RUN;

PROC SORT DATA=GSS14EV; *Volunteering level file;
BY ABSHID;
RUN;

DATA MERGE_FILE;
MERGE GSS14EV (IN=A) GSS14EP (IN=B);
BY ABSHID;
IF A AND B THEN OUTPUT; *Only keeps records which are present on both files;
RUN;

This merge will match one GSS14EP record to many GSS14EV records. The statement 'IF A AND B THEN OUTPUT;' ensures that only records present on both files are kept. If this statement were not used, GSS14EP records without a corresponding GSS14EV record would appear with missing values for all GSS14EV data items. Note that the data items copied from the GSS14EP level will now have the counting unit of the level they have been added to, which is instances of volunteering in this case.
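
For readers working outside SAS, the same one-to-many merge can be sketched in plain Python. This is an illustrative equivalent under stated assumptions, not ABS-supplied code: the function name inner_merge and all sample values are made up, and the sketch matches on both ABSHID and ABSPID (the Person level key listed earlier), whereas the SAS example matches on ABSHID alone.

```python
def inner_merge(volunteering_rows, person_rows, key=("ABSHID", "ABSPID")):
    """Copy Person level items onto each matching Voluntary Work record.

    Only records present on both files are kept, mirroring
    'IF A AND B THEN OUTPUT;' in the SAS example.
    """
    persons = {tuple(p[k] for k in key): p for p in person_rows}
    merged = []
    for v in volunteering_rows:
        p = persons.get(tuple(v[k] for k in key))
        if p is not None:
            # Volunteering values take precedence for shared names
            # (for example ABSVID, which is 0 on the Person level).
            merged.append({**p, **v})
    return merged


# Hypothetical sample data; the SEX and VOLSECT codes are invented.
person_rows = [
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 0, "SEX": 1},
    {"ABSHID": "H2", "ABSPID": 1, "ABSVID": 0, "SEX": 2},  # no volunteering
]
volunteering_rows = [
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 1, "VOLSECT": 1},
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 2, "VOLSECT": 2},
]

merged = inner_merge(volunteering_rows, person_rows)
for row in merged:
    print(row)
```

The person in household H2 has no volunteering records and is therefore dropped, while the person in H1 appears twice, once per instance of volunteering.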

Combining data from different levels can sometimes be confusing, both in selecting an appropriate item and in understanding the counting unit. For example, if you are interested in volunteering activity, and you want to analyse this by volunteers' characteristics such as sex or age, then you might cross-tabulate SEX by VOLSECT (organisation sector type). This would yield results indicating the estimate (or sample count) of instances of volunteering in each sector, split by sex, rather than the estimate (or sample count) of males or females and their respective activity as volunteers in each sector. When looking at the volunteering level, the counting unit is instances of volunteering, rather than persons.
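
The difference between the two counting units can be seen in a small Python sketch. The records and code values below are hypothetical, and the counts are unweighted sample counts, so this only illustrates the counting logic and does not produce population estimates.

```python
from collections import Counter

# Hypothetical volunteering level records with SEX copied from the Person level.
merged = [
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 1, "SEX": 1, "VOLSECT": 1},
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 2, "SEX": 1, "VOLSECT": 1},
    {"ABSHID": "H2", "ABSPID": 1, "ABSVID": 1, "SEX": 2, "VOLSECT": 1},
]

# Counting unit = instances of volunteering: every record counts once.
instances = Counter((r["SEX"], r["VOLSECT"]) for r in merged)

# Counting unit = persons: each person counts once per sector,
# however many instances of volunteering they reported in it.
distinct_persons = {
    (r["ABSHID"], r["ABSPID"], r["SEX"], r["VOLSECT"]) for r in merged
}
persons = Counter((sex, sector) for _, _, sex, sector in distinct_persons)

print("instances:", dict(instances))
print("persons:  ", dict(persons))
```

Here the person in household H1 volunteered twice in the same sector, so the instance count for that cell is 2 while the person count is 1.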

Example STATA code

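* Assumes the local macros GSS14EH and GSS14EP hold the paths to the Household and Person level files
use "`GSS14EH'", clear
table GCCSA, c( freq ) f(%11.0f) stubwidth(30)
table SF2SA1DN, c( freq ) f(%11.0f) stubwidth(30)

use "`GSS14EP'", clear
table DISSTAT, c( freq ) f(%11.0f) stubwidth(30)
table EDATTAIN, c( freq ) f(%11.0f) stubwidth(30)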

Example SPSS code

GET
FILE=GSS14EH.
EXECUTE.
FREQUENCIES
VARIABLES=GCCSA SF2SA1DN/ORDER=ANALYSIS.

GET
FILE=GSS14EP.
EXECUTE.
FREQUENCIES
VARIABLES=DISSTAT EDATTAIN/ORDER=ANALYSIS.


MULTI-RESPONSE ITEMS ON THE CURF

A number of questions included in the survey allowed respondents to provide one or more responses. Each response category for one of these 'multi-response questions' (or data items) is treated as a separate data item. On the CURF, these data items share the same general data item identifier (SASName), but each is suffixed with a letter: A for the first response, B for the second response, C for the third response, D for the fourth response and so on.

For example, the multi-response data item 'Long term health condition by type of condition' (with a general SASName of LTHCOND, see the data item list) has twenty-one response categories. Consequently, twenty-one data items have been produced: LTHCONDA, LTHCONDB, LTHCONDC and so on.

Each data item in the series (i.e. LTHCONDA to LTHCONDU) has two response codes: a 'Yes' response, coded with the position of the item in the series (code 1 for the first item, code 2 for the second, and so on), or a 'Null' response (code 0) indicating that the response was not relevant for the respondent. The last data item in the series represents a 'Not applicable' response, which comprises the respondents who were not asked the question (e.g. LTHCONDU, with values of 0 or 99).
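
The naming and coding scheme described above can be illustrated with a short Python sketch. The helper names (multi_response_names, selected_items) and the sample record are hypothetical. The sketch relies only on the rule that code 0 means 'Null' and any other code means the category applies.

```python
import string


def multi_response_names(stem, n_categories):
    """Build the item names for a multi-response series, e.g. LTHCONDA..LTHCONDU."""
    return [stem + letter for letter in string.ascii_uppercase[:n_categories]]


def selected_items(record, names):
    """Return the items in the series with a non-null (non-zero) code."""
    return [name for name in names if record.get(name, 0) != 0]


names = multi_response_names("LTHCOND", 21)

# Hypothetical respondent who reported the first and third categories.
record = {name: 0 for name in names}
record["LTHCONDA"] = 1
record["LTHCONDC"] = 3

print(names[0], "to", names[-1])
print(selected_items(record, names))
```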

It should be noted that the sum of individual multi-response categories will be greater than the population or number of people applicable to the particular data item as respondents are able to select more than one response. Multi-response data items can be identified in the data item list where the words <multiple response> appear next to the data item name.
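
A small Python sketch shows why the category totals exceed the number of people. The three records below are made up, use only the first three items of the series for brevity, and are unweighted.

```python
# Hypothetical Person level records for three respondents.
records = [
    {"LTHCONDA": 1, "LTHCONDB": 2, "LTHCONDC": 0},  # two responses
    {"LTHCONDA": 1, "LTHCONDB": 0, "LTHCONDC": 0},  # one response
    {"LTHCONDA": 0, "LTHCONDB": 0, "LTHCONDC": 3},  # one response
]
items = ["LTHCONDA", "LTHCONDB", "LTHCONDC"]

# Number of respondents selecting each category (any non-zero code).
category_counts = {
    item: sum(1 for r in records if r[item] != 0) for item in items
}

total_responses = sum(category_counts.values())
total_persons = len(records)

print(category_counts)
print("responses:", total_responses, "persons:", total_persons)
```

The first respondent contributes to two categories, so the sum across categories counts that person twice.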


CURF DATA FILES

The 2014 expanded CURF can be accessed via the RADL, and is available in SAS, SPSS and STATA formats. The CURF comprises the following files:

SAS files

These files contain the data for the CURF in SAS format.

GSS14EH.SAS7BDAT contains the Household level data
GSS14EP.SAS7BDAT contains the Person level data
GSS14EV.SAS7BDAT contains the Voluntary Work level data
GSS14ED.SAS7BDAT contains the Difficulty Accessing Service Providers data


SPSS files

These files contain the data for the CURF in SPSS format.

GSS14EH.SAV contains the Household level data
GSS14EP.SAV contains the Person level data
GSS14EV.SAV contains the Voluntary Work level data
GSS14ED.SAV contains the Difficulty Accessing Service Providers data


STATA files

These files contain the data for the CURF in STATA format.

GSS14EH.DTA contains the Household level data
GSS14EP.DTA contains the Person level data
GSS14EV.DTA contains the Voluntary Work level data
GSS14ED.DTA contains the Difficulty Accessing Service Providers data


INFORMATION FILES

Data item list

The Data item list contains all the data items, including details of categories and code values, that are available on the CURF.

Formats file

This file is a SAS library containing formats.

Frequency files


Household (HH)
A file containing documentation of the Household level data. Data item code values and category labels are provided with weighted household frequencies of each value. This file is in plain text format.

Person (PER)
A file containing documentation of the Person level data. Data item code values and category labels are provided with weighted person frequencies of each value. This file is in plain text format.

Voluntary Work (VOL)
A file containing documentation of the Voluntary Work level data. Data item code values and category labels are provided with weighted person frequencies of each value. This file is in plain text format.

Difficulty Accessing Service Providers (DASP)
A file containing documentation of the Difficulty Accessing Service Providers level data. Data item code values and category labels are provided with weighted person frequencies of each value. This file is in plain text format.

