USING THE CURF
CHANGE OF POPULATION ESTIMATES FOR PEOPLE AGED FIFTEEN YEARS AND OVER DUE TO RECORD MASKING, BY STATE
Steps to confidentialise the datasets made available on the CURF are undertaken in such a way as to ensure the integrity of the datasets and optimise the content, while maintaining the confidentiality of respondents. Intending purchasers should ensure that the data they require, at the level of detail they require, are available on the CURF. Data obtained in the survey but not contained on the CURF may be available in TableBuilder or in tabulated form on request. The Data Item Lists document on the Summary tab contains information about the list of data items, which is available as an Excel spreadsheet on the Downloads tab.
Table 2 shows the number of records on each level for the CURF dataset.
COUNTING UNITS AND NUMBER OF RECORDS, BY LEVEL
There is a series of identifiers that can be used on records at each level of the file.
File level identifiers
The identifiers ABSHID, ABSPID, ABSVID, ABSDID appear on all levels of the file (as they are needed to create a hierarchical CSV file). Where the information for the identifier is not relevant for a level, it has a value of 0.
Each household has a unique twelve-digit random identifier, ABSHID. This identifier appears on the Household level and is repeated on every other level. The Voluntary Work and Difficulty Accessing Service Providers episode levels are children of the Person level, and therefore their unique identifier is composed of the Household, Person and episode level identifiers. The composition of identifiers for each level is outlined below:
1. Household = ABSHID
2. Person = ABSHID, ABSPID
3. Voluntary Work = ABSHID, ABSPID, ABSVID
4. Difficulty Accessing Service Providers = ABSHID, ABSPID, ABSDID
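The composition listed above can be sketched in code. The following Python is a minimal illustration only and is not supplied with the CURF: the `LEVEL_KEYS` mapping and the `record_key` helper are hypothetical names, and the sample values are invented.

```python
# Hypothetical mapping from each CURF level to the identifiers that,
# taken together, uniquely identify a record on that level.
LEVEL_KEYS = {
    "Household": ("ABSHID",),
    "Person": ("ABSHID", "ABSPID"),
    "Voluntary Work": ("ABSHID", "ABSPID", "ABSVID"),
    "Difficulty Accessing Service Providers": ("ABSHID", "ABSPID", "ABSDID"),
}


def record_key(record, level):
    """Return the tuple of identifier values that identifies `record` on `level`."""
    return tuple(record[name] for name in LEVEL_KEYS[level])


# Invented Voluntary Work record. ABSDID is not relevant on this level,
# so it carries the value 0.
vol_record = {"ABSHID": "123456789012", "ABSPID": 1, "ABSVID": 2, "ABSDID": 0}

vol_key = record_key(vol_record, "Voluntary Work")  # full key on its own level
person_key = record_key(vol_record, "Person")       # key of the parent person
```

Because the Person key is a prefix of each episode-level key, the same record can be linked back to its parent person or household by dropping the trailing identifiers.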
Copying information across levels
Identifiers can be used to copy information from one level of the file to another. The following SAS code (or equivalent) can be used to copy information from one level to another:
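PROC SORT DATA=GSS14EP; BY ABSHID; RUN; *Person level file;
PROC SORT DATA=GSS14EH; BY ABSHID; RUN; *Household level file;
DATA MERGED; *MERGED is a placeholder name for the output file;
MERGE GSS14EP (IN=A) GSS14EH (IN=B);
BY ABSHID;
IF A AND B THEN OUTPUT; *Only keeps records which are present on both files;
RUN;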
The following SAS code (or equivalent) can be used to copy information from a higher level to a level below:
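PROC SORT DATA=GSS14EP; BY ABSHID ABSPID; RUN; *Person level file;
PROC SORT DATA=GSS14EV; BY ABSHID ABSPID; RUN; *Volunteering level file;
DATA MERGED; *MERGED is a placeholder name for the output file;
MERGE GSS14EV (IN=A) GSS14EP (IN=B);
BY ABSHID ABSPID;
IF A AND B THEN OUTPUT; *Only keeps records which are present on both files;
RUN;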
This merge will match one GSS14EP record to many GSS14EV records. The statement 'IF A AND B THEN OUTPUT;' ensures that only records present on both files are kept. If this statement were not used, GSS14EP records without a corresponding GSS14EV record would appear with a missing value for all GSS14EV data items. Note that the data items copied from the GSS14EP level will now have the counting unit of the level they have been added to, being instances of volunteering in this case.
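For users working outside SAS, the same one-to-many merge can be sketched in plain Python. This is an illustrative equivalent under stated assumptions, not ABS-supplied code: `inner_merge` is a hypothetical helper, the rows and code values are invented, and real CURF files would first need to be loaded into lists of dictionaries.

```python
def inner_merge(lower_rows, higher_rows, by):
    """Attach the matching higher-level record to each lower-level record.

    Mirrors 'IF A AND B THEN OUTPUT;': lower-level rows without a match are
    dropped, and higher-level rows with no lower-level rows never appear.
    """
    # Index the higher-level records by their identifier values.
    lookup = {tuple(row[k] for k in by): row for row in higher_rows}
    merged = []
    for row in lower_rows:
        match = lookup.get(tuple(row[k] for k in by))
        if match is not None:
            # Identifiers not relevant to the higher level are 0 there,
            # so the lower level's values are kept when names clash.
            merged.append({**match, **row})
    return merged


# Invented Person level rows (ABSVID is 0 because it is not relevant here).
persons = [
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 0, "SEX": 1},
    {"ABSHID": "H1", "ABSPID": 2, "ABSVID": 0, "SEX": 2},
]
# Invented Voluntary Work level rows: person 1 has two instances, person 2 none.
volunteering = [
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 1, "VOLSECT": 1},
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 2, "VOLSECT": 2},
]

merged = inner_merge(volunteering, persons, by=("ABSHID", "ABSPID"))
```

Here `merged` holds one row per instance of volunteering, each carrying the person's SEX value; the person with no volunteering records does not appear.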
Combining data from different levels can sometimes be confusing, both in selecting an appropriate item and in understanding the counting unit. For example, if you are interested in volunteering activity, and you want to analyse this by volunteers' characteristics such as sex or age, then you might cross-tabulate SEX by VOLSECT (organisation sector type). This would yield results indicating the estimate (or sample count) of instances of volunteering in each sector, split by sex, rather than the estimate (or sample count) of males or females and their respective activity as volunteers in each sector. When looking at the volunteering level, the counting unit is instances of volunteering, rather than persons.
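The difference between the two counting units can be made concrete with a small sketch. The rows and code values below are invented, the counts are unweighted sample counts (weights are not handled), and the variable names `instances` and `persons` are illustrative only.

```python
from collections import Counter

# Invented Voluntary Work level rows after SEX has been copied down from
# the Person level. The first person volunteered twice in the same sector.
rows = [
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 1, "SEX": 1, "VOLSECT": 1},
    {"ABSHID": "H1", "ABSPID": 1, "ABSVID": 2, "SEX": 1, "VOLSECT": 1},
    {"ABSHID": "H2", "ABSPID": 1, "ABSVID": 1, "SEX": 2, "VOLSECT": 1},
]

# Counting rows gives instances of volunteering by SEX and VOLSECT.
instances = Counter((r["SEX"], r["VOLSECT"]) for r in rows)

# Counting persons requires de-duplicating on the person identifiers first.
distinct = {(r["ABSHID"], r["ABSPID"], r["SEX"], r["VOLSECT"]) for r in rows}
persons = Counter((sex, sect) for _, _, sex, sect in distinct)
```

In this invented data, the cell for SEX 1 and VOLSECT 1 contains two instances of volunteering but only one person, which is the distinction the cross-tabulation above obscures.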
Example STATA code
table GCCSA, c( freq ) f(%11.0f) stubwidth(30)
table SF2SA1DN, c( freq ) f(%11.0f) stubwidth(30)
table DISSTAT, c( freq ) f(%11.0f) stubwidth(30)
table EDATTAIN, c( freq ) f(%11.0f) stubwidth(30)
Example SPSS code
MULTI-RESPONSE ITEMS ON THE CURF
A number of questions included in the survey allowed respondents to provide one or more responses. Each response category for one of these 'multi-response questions' (or data items) is treated as a separate data item. On the CURF, these data items have the same general data item identifier (SASName) but are each suffixed with a letter – A for the first response, B for the second response, C for the third response, D for the fourth response and so on.
For example, the multi-response data item 'Long term health condition by type of condition' (with a general SASName of LTHCOND – see data item list), has twenty-one response categories. Consequently, twenty-one data items have been produced - LTHCONDA, LTHCONDB, LTHCONDC and so on.
Each data item in the series (i.e. LTHCONDA to LTHCONDU) has two response codes: a 'Yes' response (code 1 for the first item in the series, code 2 for the second, and so on) or a 'Null' response (code 0) indicating that the response was not relevant for the respondent. The last data item in the series (i.e. the item with the last suffix letter) represents a 'Not Applicable' response, which comprises the respondents who were not asked the question (e.g. LTHCONDU with values of 0 or 99).
It should be noted that the sum of individual multi-response categories will be greater than the population or number of people applicable to the particular data item as respondents are able to select more than one response. Multi-response data items can be identified in the data item list where the words <multiple response> appear next to the data item name.
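A short sketch shows why the category totals exceed the number of people. It assumes the coding described above (item k in the series holds code k for 'Yes' and 0 for 'Null'), uses only three invented LTHCOND items for brevity, and leaves out the final 'Not Applicable' item; `yes_counts` is a hypothetical helper.

```python
import string

# Invented Person level records. The first person reports two conditions.
people = [
    {"LTHCONDA": 1, "LTHCONDB": 2, "LTHCONDC": 0},
    {"LTHCONDA": 0, "LTHCONDB": 2, "LTHCONDC": 0},
    {"LTHCONDA": 0, "LTHCONDB": 0, "LTHCONDC": 3},
]


def yes_counts(records, stem, n_items):
    """Count 'Yes' responses for each lettered item in a multi-response series."""
    counts = {}
    letters = string.ascii_uppercase[:n_items]
    for position, letter in enumerate(letters, start=1):
        name = stem + letter
        # A 'Yes' on the k-th item is stored as code k; 0 means 'Null'.
        counts[name] = sum(1 for r in records if r[name] == position)
    return counts


counts = yes_counts(people, "LTHCOND", 3)
total_responses = sum(counts.values())
```

With these invented records, `total_responses` is 4 even though there are only three people, because one person selected two categories.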
CURF DATA FILES
The 2014 expanded CURF can be accessed via the RADL, and is available in SAS, SPSS and STATA formats. The CURF comprises the following files:
These files contain the data for the CURF in SAS format.
GSS14EH.SAS7BDAT contains the Household level data
GSS14EP.SAS7BDAT contains the Person level data
GSS14EV.SAS7BDAT contains the Voluntary Work level data
GSS14ED.SAS7BDAT contains the Difficulty Accessing Service Providers data
These files contain the data for the CURF in SPSS format.
GSS14EH.SAV contains the Household level data
GSS14EP.SAV contains the Person level data
GSS14EV.SAV contains the Voluntary Work level data
GSS14ED.SAV contains the Difficulty Accessing Service Providers data
These files contain the data for the CURF in STATA format.
GSS14EH.DTA contains the Household level data
GSS14EP.DTA contains the Person level data
GSS14EV.DTA contains the Voluntary Work level data
GSS14ED.DTA contains the Difficulty Accessing Service Providers data
Data item list
The Data item list contains all the data items, including details of categories and code values, that are available on the CURF.
This file is a SAS library containing formats.
A file containing documentation of the Household level data. Data item code values and category labels are provided with weighted household frequencies of each value. This file is in plain text format.
A file containing documentation of the Person level data. Data item code values and category labels are provided with weighted person frequencies of each value. This file is in plain text format.
Voluntary Work (VOL)
A file containing documentation of the Voluntary Work level data. Data item code values and category labels are provided with weighted person frequencies of each value. This file is in plain text format.
Difficulty Accessing Service Providers (DASP)
A file containing documentation of the Difficulty Accessing Service Providers level data. Data item code values and category labels are provided with weighted person frequencies of each value. This file is in plain text format.
4159.0.30.004 - Microdata: General Social Survey, Australia, 2014 Quality Declaration
Latest ISSUE Released at 11:30 AM (CANBERRA TIME) 17/09/2015 First Issue