6361.0.55.002 - Employment Arrangements, Retirement and Superannuation, User Guide, Australia, April To July 2007  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 18/11/2008   

USING THE CURF DATA


Microdata from SEARS 2007 are available as an expanded Confidentialised Unit Record File (CURF), accessible only via the Remote Access Data Laboratory (RADL). The RADL is an online database query system in which microdata are held on a server at the ABS; users submit programs to interrogate and analyse the data, and then access the results. Further information about the RADL facility and about obtaining access to the CURF is available on the ABS website <www.abs.gov.au> (see Services We Provide/CURF Microdata/Accessing CURF Microdata).

This chapter provides details on how to use the microdata, content of the files and conditions of microdata release.


About the microdata

SEARS 2007 microdata are released under the provisions of the Census and Statistics Act 1905. This Act allows for the release of data in the form of unit records where the information is not likely to enable the identification of a particular person or organisation. Accordingly, there are no names or addresses of survey respondents on the CURF, and other steps have been taken to protect the confidentiality of respondents. These include removing some items from the CURF, reducing the level of detail shown for some items, and changing characteristics for some records. To further protect the confidentiality of unit record data, all dollar values have been perturbed; that is, each value has been adjusted up or down by a small, random amount. In addition, for each of these items, each value above an upper cut-off limit or below a lower cut-off limit has been set to the mean of all the values beyond that cut-off limit.
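The effect of the cut-off treatment on extreme dollar values can be illustrated with a minimal sketch. It is not the ABS procedure: the cut-off limits are not given in this guide, so the limits and values below are hypothetical, and the random perturbation step is omitted.

```python
from statistics import mean


def apply_cutoffs(values, lower_cutoff, upper_cutoff):
    """Set each value beyond a cut-off to the mean of all values beyond that cut-off.

    The cut-off limits are hypothetical; the real limits are not published here.
    """
    above = [v for v in values if v > upper_cutoff]
    below = [v for v in values if v < lower_cutoff]
    above_mean = mean(above) if above else None
    below_mean = mean(below) if below else None

    result = []
    for v in values:
        if v > upper_cutoff:
            result.append(above_mean)   # top-coded to the mean of the high values
        elif v < lower_cutoff:
            result.append(below_mean)   # bottom-coded to the mean of the low values
        else:
            result.append(v)            # values inside the limits are unchanged
    return result


# Illustrative weekly dollar amounts only.
treated = apply_cutoffs([-300, -100, 50, 200, 5000, 7000],
                        lower_cutoff=0, upper_cutoff=1000)
```

A practical consequence for analysis is that the largest and smallest dollar values on the file repeat, so maxima, minima and upper percentiles computed from the CURF describe the treated values rather than the values originally reported.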

As a consequence of the steps taken to ensure confidentiality, data on the CURF will not match published data exactly.

Steps to confidentialise the data on the CURF are taken in such a way as to ensure the integrity of the data and optimise its content, while maintaining the confidentiality of respondents. Intending purchasers should ensure that the data they require, at the level of detail they require, are available on the CURF. Data collected in the survey but not contained on the CURF may be available as statistics in tabulated form on request. A list of the data items on the expanded CURF is available on the ABS website <www.abs.gov.au> (see 6361.0.55.002 SEARS 2007 CURF Data Items).


FILE STRUCTURE AND USE

The SEARS 2007 expanded CURF contains a set of four files with confidentialised records. These files are in a hierarchical relationship and provide records at the following levels:

  • Household level: contains information about State or Territory, area of usual residence (capital city/balance of State), tenure type by landlord type, family composition of household, number of children in the household by age groups, household equivalised and gross weekly income, and two SEIFA indexes;
  • Family level: contains information relating to family specific items such as family composition, number of children in the family by age groups, age of oldest and youngest person in the family;
  • Person level: contains information only for persons aged 15 years and over such as age, sex, marital status, relationship in household, country of birth, year of arrival in Australia, education qualifications, employment arrangements including labour force details, working arrangements used to provide care, provision of care to adults and children, previous job details, superannuation account balances and contributions; and
  • Job level: contains information for up to three jobs for each employed person aged 15 years and over such as occupation, industry, sector of employment, paid leave entitlements, working arrangements and flexibility. Detailed information is available for the main and second jobs, although due to a small response size there is limited data available for the third job. To maintain the hierarchical structure of the file, each person will have job records for all three jobs, although some of these records will have null data due to no information being available or provided.

The table below shows the number of records on each level.

SEARS Expanded CURF record counts

Level              Number of records (no.)
Household level    14 059
Family level       14 770
Person level       26 955
Job level          28 119



USING THE EPISODIC DATASET

The job level is an episodic dataset. The episodic dataset in SEARS 2007 is a set of data with a counting unit (jobs) which may be repeated for a person.

The item 'Job number' (ABSJID) can be used to differentiate between main, second and third job. ABSJID=1 is used to create items for main job, ABSJID=2 to create items for second job, and ABSJID=3 to create items for third job. These items can be used to count persons based on the characteristic of interest. However, if job number is not restricted then the counts are for characteristics of jobs (not persons).
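The difference between counting jobs and counting persons can be sketched as follows. Only ABSJID and WEIGHTPN are item names from this guide; PERSON and SECTOR are hypothetical stand-ins for a person and a job characteristic, and the use of None for a null job record is an assumption about how missing jobs might be coded.

```python
# Hypothetical job-level records (three job records per person, some null).
jobs = [
    {"PERSON": "A", "ABSJID": 1, "SECTOR": "Public",  "WEIGHTPN": 500.0},
    {"PERSON": "A", "ABSJID": 2, "SECTOR": "Public",  "WEIGHTPN": 500.0},
    {"PERSON": "A", "ABSJID": 3, "SECTOR": None,      "WEIGHTPN": 500.0},
    {"PERSON": "B", "ABSJID": 1, "SECTOR": "Private", "WEIGHTPN": 800.0},
    {"PERSON": "B", "ABSJID": 2, "SECTOR": None,      "WEIGHTPN": 800.0},
    {"PERSON": "B", "ABSJID": 3, "SECTOR": None,      "WEIGHTPN": 800.0},
]


def weighted_count(records, predicate):
    """Sum the person weight over the job records that satisfy the predicate."""
    return sum(r["WEIGHTPN"] for r in records if predicate(r))


# Job number not restricted: every public sector job is counted,
# so person A contributes twice. This is an estimate of jobs.
public_sector_jobs = weighted_count(
    jobs, lambda r: r["SECTOR"] == "Public")

# Job number restricted to the main job: each person contributes at most once.
# This is an estimate of persons whose main job is in the public sector.
persons_public_main_job = weighted_count(
    jobs, lambda r: r["ABSJID"] == 1 and r["SECTOR"] == "Public")
```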


Use of weights

On each level of the CURF, every record contains a 'weight'. The weight indicates how many population units are represented by the sample unit. See the discussion under Weighting in Chapter 3: 'Data processing' for more information.

The person weight identifier is WEIGHTPN and the household weight identifier is WEIGHTHH. In addition, replicate weights have been included, with 60 person replicate weights (WPM0101 - WPM0160) and 60 household replicate weights (WHM0101 - WHM0160). The purpose of these replicate weights is to enable the calculation of the RSE for each estimate produced from the CURF. For more information on RSEs, please refer to Chapter 4: 'Data Quality'.
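As a rough illustration of how the replicate weights might be used, the sketch below assumes the delete-a-group jackknife formula often applied to grouped replicate weights, SE = sqrt((G - 1)/G x sum of (replicate estimate - main estimate) squared), with G = 60. That formula is an assumption here, not a statement of the ABS method; Chapter 4 remains the authority. The predicate and the records passed in are hypothetical.

```python
import math

# Replicate weight item names WPM0101 to WPM0160, as listed in this guide.
PERSON_REPLICATE_ITEMS = [f"WPM01{g:02d}" for g in range(1, 61)]


def weighted_total(records, weight_item, predicate):
    """Weighted estimate of the number of units satisfying the predicate."""
    return sum(r[weight_item] for r in records if predicate(r))


def jackknife_rse(records, predicate, main_weight="WEIGHTPN",
                  replicate_items=PERSON_REPLICATE_ITEMS):
    """Return (estimate, SE, RSE %) using the assumed jackknife formula."""
    estimate = weighted_total(records, main_weight, predicate)
    groups = len(replicate_items)
    squared_diffs = sum(
        (weighted_total(records, item, predicate) - estimate) ** 2
        for item in replicate_items
    )
    se = math.sqrt((groups - 1) / groups * squared_diffs)
    rse = 100.0 * se / estimate if estimate else float("nan")
    return estimate, se, rse
```

The same function could be applied to household estimates by passing WEIGHTHH and the WHM0101 to WHM0160 item names instead.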

The household weight can be found on both the household and family levels, while the person weight can be found on both the person and job levels. Where estimates are derived from the CURF, it is essential that they are calculated using the correct weight for that level of the file, and not just counting the number of records in each category. If person or household weights were to be ignored when analysing the data to draw inferences about the population, then no account would be taken of a person’s or household's chance of selection, or of different response rates across population groups, and the resulting estimates may be seriously biased. The application of weights ensures that estimates conform to an independently estimated distribution of the population by age and other characteristics, rather than to the distributions within the sample itself.
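A small sketch shows how far a weighted estimate can differ from a simple record count. The three person records and the RETIRED flag are hypothetical; only WEIGHTPN is an item name from this guide.

```python
# Hypothetical person-level records with very different weights.
persons = [
    {"RETIRED": True,  "WEIGHTPN": 100.0},
    {"RETIRED": True,  "WEIGHTPN": 100.0},
    {"RETIRED": False, "WEIGHTPN": 800.0},
]


def unweighted_proportion(records, predicate):
    """Share of sample records satisfying the predicate (not a population estimate)."""
    return sum(1 for r in records if predicate(r)) / len(records)


def weighted_proportion(records, predicate, weight_item="WEIGHTPN"):
    """Estimated share of the population satisfying the predicate."""
    total = sum(r[weight_item] for r in records)
    matching = sum(r[weight_item] for r in records if predicate(r))
    return matching / total


sample_share = unweighted_proportion(persons, lambda r: r["RETIRED"])
population_share = weighted_proportion(persons, lambda r: r["RETIRED"])
```

In this made-up sample two of three records are retired, yet they represent only 200 of an estimated 1,000 persons, so the weighted proportion is 20%.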

It should be noted that as a result of some of the changes made to protect confidentiality on the CURF, estimates of benchmarked items produced from the CURF may not equal benchmarked values.


Incomplete families and households

Some family and household records are incomplete; that is, one or more people belonging to the unit did not respond to the survey. These records have the identifier ICHHFLG set to '1', and the weight on the household and family level records has been set to nil. Consequently, incomplete units are not represented in weighted statistics produced from these files.


Record identifiers

There are several identifiers for each record on each of the levels.

Each household has a unique random identifier (ABSHID). This identifier appears on the household level, and is repeated on the family, person and job levels for each record relating to that household.

Each family within the household is numbered sequentially. Non-family members, single person households and persons in group households have a sequential 'family number' commencing at 50. 'Family number' (ABSFID) appears on the family, person and job levels. The combination of household identifier and family number uniquely identifies a family.

A family has one or more income units, and each income unit within the family is numbered sequentially. 'Income unit number' (ABSIID) appears on the person and job levels. SEARS does not output information at the income unit level, but the combination of household identifier, and family and income unit number uniquely identifies an income unit that can be used at the person and job levels.

An income unit has one or more persons and each person within the income unit is numbered sequentially. 'Person number' (ABSPID) appears on the person level. The combination of household identifier, and family, income unit and person number uniquely identifies a person.

A person may have one or more jobs and each of these is numbered sequentially. 'Job number' (ABSJID) appears on the job level. The combination of household identifier, and family, income unit, person and job number uniquely identifies a job.

At higher levels, identifiers for lower levels are set to zero. For example, on the household level, the identifiers for family, person, income unit and job are all set to zero.
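The identifier combinations described above can be expressed as composite keys. The following is a minimal sketch using the identifier names from this guide; the sample record values are hypothetical.

```python
# Identifier items that together identify a unit at each level.
LEVEL_KEYS = {
    "household":   ("ABSHID",),
    "family":      ("ABSHID", "ABSFID"),
    "income unit": ("ABSHID", "ABSFID", "ABSIID"),
    "person":      ("ABSHID", "ABSFID", "ABSIID", "ABSPID"),
    "job":         ("ABSHID", "ABSFID", "ABSIID", "ABSPID", "ABSJID"),
}


def record_key(record, level):
    """Build the composite key identifying the unit a record belongs to."""
    return tuple(record[item] for item in LEVEL_KEYS[level])


def keys_are_unique(records, level):
    """Check that no two records share the same composite key at this level."""
    keys = [record_key(r, level) for r in records]
    return len(keys) == len(set(keys))


# Hypothetical job-level records for one person with two jobs.
job_records = [
    {"ABSHID": "H001", "ABSFID": 1, "ABSIID": 1, "ABSPID": 2, "ABSJID": 1},
    {"ABSHID": "H001", "ABSFID": 1, "ABSIID": 1, "ABSPID": 2, "ABSJID": 2},
]
```

Checking uniqueness at the level of the file being read is a quick way to confirm that the intended identifiers have been used before matching files.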


Deriving items at a higher level

There may be instances when information is not provided at the level required by users, and items will need to be derived at a higher level. For example, the number of adults in a household is not available on the household level, so it must be derived from the person level and carried up to the household level. Deriving higher level estimates is possible using the person, family and household identifiers. Care should be taken to exclude persons from incomplete households and families by always using the weight applicable to the higher level (that is, WEIGHTHH for family or household level derived estimates).
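The number-of-adults example could be sketched as follows. ABSHID and WEIGHTHH are item names from this guide; the AGE item, the derived NADULTS item and the adult age threshold of 18 are hypothetical choices made for illustration, and the records are invented.

```python
from collections import Counter


def adults_per_household(persons, adult_age=18, age_item="AGE"):
    """Count person records at or above adult_age for each household identifier."""
    counts = Counter()
    for p in persons:
        if p[age_item] >= adult_age:
            counts[p["ABSHID"]] += 1
    return counts


def attach_counts(households, counts, new_item="NADULTS"):
    """Return household records with the derived count added as a new item."""
    return [dict(h, **{new_item: counts.get(h["ABSHID"], 0)}) for h in households]


def weighted_households(households, item, value):
    """Weighted household estimate; incomplete households carry a nil weight."""
    return sum(h["WEIGHTHH"] for h in households if h[item] == value)


# Hypothetical records: H2 is an incomplete household, so its weight is 0.
households = [
    {"ABSHID": "H1", "WEIGHTHH": 300.0},
    {"ABSHID": "H2", "WEIGHTHH": 0.0},
]
persons = [
    {"ABSHID": "H1", "AGE": 40},
    {"ABSHID": "H1", "AGE": 38},
    {"ABSHID": "H1", "AGE": 16},
    {"ABSHID": "H2", "AGE": 50},
]

derived = attach_counts(households, adults_per_household(persons))
two_adult_households = weighted_households(derived, "NADULTS", 2)
one_adult_households = weighted_households(derived, "NADULTS", 1)
```

Because the household weight is used for the final estimate, the incomplete household contributes nothing even though a count was derived for it.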


Copying data to a lower level

There may be instances when information is not provided at the lower levels as required by users. For example, characteristics of people such as age and sex are not included on the job level.

To copy data from the person level to the job level:
  • sort person and job level by person. All records for a particular person must be sorted together. This can be done by sorting on the identifiers for household, family, income unit, and person (ABSHID, ABSFID, ABSIID, and ABSPID);
  • match the records in person and job level by person. The records must be matched using the identifiers for household, family, income unit and person (ABSHID, ABSFID, ABSIID, and ABSPID); and
  • copy the person level items to the corresponding job level records.

These steps will result in a new dataset containing all the person level information (e.g. age) attached to the job level data (e.g. occupation in job).
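A minimal sketch of these steps follows. It swaps the sort-and-match approach for a dictionary lookup on the same four identifiers, which gives the same result in Python without sorting. The AGE, SEX and OCC items and all record values are hypothetical.

```python
# The four identifiers that together identify a person.
PERSON_KEY = ("ABSHID", "ABSFID", "ABSIID", "ABSPID")


def copy_person_items_to_jobs(persons, jobs, items):
    """Attach the named person level items to each matching job level record."""
    # Index person records by their composite identifier.
    lookup = {tuple(p[k] for k in PERSON_KEY): p for p in persons}

    merged = []
    for job in jobs:
        person = lookup[tuple(job[k] for k in PERSON_KEY)]
        row = dict(job)                 # keep every job level item
        for item in items:
            row[item] = person[item]    # copy the person level item down
        merged.append(row)
    return merged


# Hypothetical records: one person with a main job and a second job.
persons = [
    {"ABSHID": "H1", "ABSFID": 1, "ABSIID": 1, "ABSPID": 1, "AGE": 42, "SEX": 2},
]
jobs = [
    {"ABSHID": "H1", "ABSFID": 1, "ABSIID": 1, "ABSPID": 1, "ABSJID": 1, "OCC": 2},
    {"ABSHID": "H1", "ABSFID": 1, "ABSIID": 1, "ABSPID": 1, "ABSJID": 2, "OCC": 6},
]

jobs_with_person_items = copy_person_items_to_jobs(persons, jobs, ["AGE", "SEX"])
```

A job record with no matching person record would raise a KeyError here, which is a deliberate choice in this sketch so that mismatched identifiers are noticed rather than silently dropped.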


Special codes for income items

When analysing income totals at person, job and household level, it is necessary to exclude the reserved value of 99,999,998 for 'Not known or not stated' and 99,999,999 for 'Not applicable'. The data item list, Appendix 1 of this User Guide, lists the special codes for these items. Also, if more than one contributing income item at the person level has a value of 'Not known', then totals derived from these items, such as 'Gross personal income per week' are also set to 'Not known', as it was not possible to derive an accurate total. Similarly, if more than one contributing person record in a household has a value of 'Not known', then household income (both equivalised and gross) and derived income deciles are set to 'Not known/no income reported'. Care should be taken to exclude these codes when categorising higher income values, and when calculating means, medians and other summary statistics.
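The effect of leaving the reserved values in a calculation can be seen in a short sketch. The two reserved values are those given above; the INCOME item name and the records are hypothetical.

```python
INCOME_NOT_KNOWN = 99_999_998       # 'Not known or not stated'
INCOME_NOT_APPLICABLE = 99_999_999  # 'Not applicable'
INCOME_SPECIAL_CODES = {INCOME_NOT_KNOWN, INCOME_NOT_APPLICABLE}


def weighted_mean_income(records, income_item, weight_item):
    """Weighted mean of an income item, excluding records with a special code."""
    valid = [r for r in records if r[income_item] not in INCOME_SPECIAL_CODES]
    total_weight = sum(r[weight_item] for r in valid)
    if total_weight == 0:
        return None  # no valid, weighted records to average
    weighted_sum = sum(r[income_item] * r[weight_item] for r in valid)
    return weighted_sum / total_weight


# Hypothetical person-level records; the last one has income 'Not known'.
persons = [
    {"INCOME": 500,              "WEIGHTPN": 100.0},
    {"INCOME": 1500,             "WEIGHTPN": 300.0},
    {"INCOME": INCOME_NOT_KNOWN, "WEIGHTPN": 200.0},
]

mean_income = weighted_mean_income(persons, "INCOME", "WEIGHTPN")
```

If the reserved value were left in, the single 'Not known' record would push the mean into the tens of millions of dollars per week.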


Special codes for other continuous items

When analysing continuous items at person and job levels, it is necessary to exclude the special codes. For example, 'Number of hours would like to work (all jobs)' (NHRLWKA) has a reserved value of 998 for 'Not stated' and 999 for 'Not applicable'. The data item list, Appendix 1 of this User Guide, lists the special codes for continuous items. Care should be taken to exclude these codes when categorising higher values for ranges, and when calculating means, medians and other summary statistics.
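When grouping a continuous item into ranges, the special codes need to be tested before the open-ended top range, otherwise 998 and 999 fall into it. A minimal sketch for NHRLWKA follows; the range boundaries and the records are hypothetical.

```python
from collections import Counter

HOURS_NOT_STATED = 998
HOURS_NOT_APPLICABLE = 999


def hours_range(hours):
    """Classify NHRLWKA into illustrative ranges, keeping special codes separate."""
    # Special codes are checked first so they never reach the top range.
    if hours == HOURS_NOT_STATED:
        return "Not stated"
    if hours == HOURS_NOT_APPLICABLE:
        return "Not applicable"
    if hours < 15:
        return "Less than 15 hours"
    if hours < 35:
        return "15 to 34 hours"
    return "35 hours or more"


def weighted_tally(records, weight_item="WEIGHTPN"):
    """Weighted estimate of persons in each range."""
    tally = Counter()
    for r in records:
        tally[hours_range(r["NHRLWKA"])] += r[weight_item]
    return tally


# Hypothetical person-level records.
persons = [
    {"NHRLWKA": 40,  "WEIGHTPN": 100.0},
    {"NHRLWKA": 10,  "WEIGHTPN": 25.0},
    {"NHRLWKA": 998, "WEIGHTPN": 50.0},
    {"NHRLWKA": 999, "WEIGHTPN": 300.0},
]

tally = weighted_tally(persons)
```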


Deriving equivalised income

Equivalised gross household income per week is expressed in continuous dollars as well as deciles. Equivalised income is derived by calculating an equivalence factor according to the chosen equivalence scale, and then dividing income by the factor.

The equivalence factor derived using the 'modified OECD' equivalence scale is determined by allocating points to each person in a household. The first adult in the household is given a weight of 1 point, each additional person aged 15 years and over is allocated 0.5 points, and each child under 15 years is allocated 0.3 points. Equivalised household income is derived by dividing the total household income by a factor equal to the sum of the equivalence points allocated to the household members. The equivalised income of a lone person household is the same as its unequivalised income. The equivalised income of a household comprising more than one person lies between the total value and the per capita value of its unequivalised income.

Equivalised household income is an indicator of the economic resources available to each member of a household. It can be used for comparing the situation of individuals as well as comparing the situation of households. When unequivalised household income is negative, such as when a loss is reported for an individual's unincorporated business or other investment income, and this loss is greater than any positive income from any other source, then equivalised household income is set to zero.
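The derivation can be sketched as a short calculation using the 'modified OECD' points described above. The function names and the household compositions in the usage lines are hypothetical, and income values carrying a 'Not known' special code would need to be excluded before calling it.

```python
def equivalence_factor(persons_15_and_over, children_under_15):
    """Sum of 'modified OECD' points for a household.

    First adult: 1 point; each additional person aged 15 years and over:
    0.5 points; each child under 15 years: 0.3 points.
    """
    if persons_15_and_over < 1:
        raise ValueError("a household needs at least one person aged 15 or over")
    return 1.0 + 0.5 * (persons_15_and_over - 1) + 0.3 * children_under_15


def equivalised_income(gross_household_income, persons_15_and_over,
                       children_under_15):
    """Gross household income divided by the equivalence factor.

    Negative unequivalised income is set to zero, as described in this guide.
    """
    if gross_household_income < 0:
        return 0.0
    factor = equivalence_factor(persons_15_and_over, children_under_15)
    return gross_household_income / factor


# Two adults and two children under 15: factor = 1 + 0.5 + 0.3 + 0.3 = 2.1.
couple_with_children = equivalised_income(2100.0, 2, 2)

# Lone person: factor = 1, so equivalised income equals unequivalised income.
lone_person = equivalised_income(800.0, 1, 0)
```

For the four-person household, the equivalised value of about $1,000 lies between the total ($2,100) and the per capita value ($525), consistent with the description above.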


Equivalised gross household income per week decile boundaries

To assist in the use and interpretation of the deciles, the dollar amounts contained in each decile are shown in the following table. Cases where income was 'not stated' or 'not known' were excluded before the deciles were calculated.

The table below shows the Equivalised gross household income per week (EQUIVDEC) decile boundaries.

Equivalised gross household income per week (EQUIVDEC) decile boundaries, for Expanded CURF

Decile            Min ($)     Max ($)      Range ($)
Lowest 10%        -           249.57       Less than 250.00
Second decile     250.00      322.50       250.00 to less than 322.67
Third decile      322.67      427.78       322.67 to less than 427.83
Fourth decile     427.83      538.10       427.83 to less than 538.13
Fifth decile      538.13      656.00       538.13 to less than 656.09
Sixth decile      656.09      779.62       656.09 to less than 780.00
Seventh decile    780.00      934.00       780.00 to less than 934.29
Eighth decile     934.29      1 142.00     934.29 to less than 1142.86
Ninth decile      1 142.86    1 503.33     1142.86 to less than 1503.81
Highest 10%       1 503.81    26 288.00    1503.81 or greater

- nil or rounded to zero (including null cells)
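The ranges in the table can be applied to a continuous equivalised income value with a simple lookup. The sketch below copies the lower boundaries of the second to tenth deciles from the table; the function name is hypothetical, and 'not stated' or 'not known' values are assumed to have been excluded beforehand.

```python
from bisect import bisect_right

# Lower boundaries of the second to tenth deciles, from the table above.
DECILE_LOWER_BOUNDS = [250.00, 322.67, 427.83, 538.13, 656.09,
                       780.00, 934.29, 1142.86, 1503.81]


def income_decile(equivalised_income):
    """Return the decile (1 = lowest 10%, 10 = highest 10%) for a dollar value.

    A value equal to a lower boundary falls in the higher decile, matching
    the 'X to less than Y' ranges in the table.
    """
    return bisect_right(DECILE_LOWER_BOUNDS, equivalised_income) + 1
```

For example, a value of 249.57 falls in the lowest decile, while a value of exactly 250.00 falls in the second.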



Multiple response items

There are a number of data items on the SEARS 2007 expanded CURF that contain multiple responses. Respondents were able to select one or more response categories for these items, and the output data items are multi-response in nature. This section describes these items and provides some information on how to use them.

Multiple response items are shown on the data item list as a range of item names, one for each response position. For example, the item 'All reasons not currently working' (NOTCWKA - NOTCWKU) holds the first response in the first, or 'A', position (NOTCWKA), and any additional responses in the subsequent positions: the second in the 'B' position, the third in the 'C' position, and so on. If a person did not answer the question, they will have a value of 0 ('not applicable') in the first position (NOTCWKA). This 'Null response' (value of 0) is a default code and should not be used in data analysis.
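Because a given response can sit in any position, counting persons who gave a particular response means checking every position. The sketch below assumes the positions NOTCWKA to NOTCWKU are named with consecutive letters A to U; the response codes 3 and 5 and the records are hypothetical.

```python
import string

# Position items NOTCWKA ... NOTCWKU (assumed to use consecutive letters A to U).
NOTCWK_ITEMS = ["NOTCWK" + letter for letter in string.ascii_uppercase[:21]]


def gave_response(record, code, items=NOTCWK_ITEMS):
    """True if the response code appears in any position for this person."""
    return any(record.get(item) == code for item in items)


def weighted_persons_with_response(records, code, weight_item="WEIGHTPN"):
    """Weighted estimate of persons who gave the response in any position."""
    if code == 0:
        raise ValueError("0 is the default 'Null response' code, not a response")
    return sum(r[weight_item] for r in records if gave_response(r, code))


# Hypothetical person-level records; unused positions are omitted for brevity.
persons = [
    {"NOTCWKA": 3, "NOTCWKB": 5, "WEIGHTPN": 100.0},  # gave two reasons
    {"NOTCWKA": 5,               "WEIGHTPN": 200.0},  # gave one reason
    {"NOTCWKA": 0,               "WEIGHTPN": 50.0},   # did not answer
]

persons_reason_5 = weighted_persons_with_response(persons, 5)
persons_reason_3 = weighted_persons_with_response(persons, 3)
```

Since a person can give several responses, estimates for the individual responses will generally sum to more than the number of persons who answered.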


Geography

To give CURF users greater flexibility in their analyses, the ABS has included Socio-economic Indexes for Areas (SEIFA) and sub-state geography items on the SEARS 2007 expanded CURF. Cross-tabulating several of these items simultaneously produces cells relating to some small geographic regions. Tables showing multiple data items, cross-tabulated by more than one SEIFA and/or sub-state geography item at a time, are therefore not permitted, because of the detailed information about small geographic regions that they could present. However, simple cross-tabulations of population counts by SEIFA or sub-state geographic data items may be useful for clients in order to determine which SEIFA or geography item to include in their primary analysis, and such output is permitted.

See the Glossary for the definitions of the SEIFA data items included on the SEARS 2007 expanded CURF. For more information about SEIFA see Information Paper: Census of Population and Housing - Socio-economic Indexes for Areas, Australia (cat. no. 2039.0).


Children aged under 15 years

Children aged under 15 years do not have their own person level record. Information on the number and ages of such children was collected and is included on the household and family level files.


Changes relating to data items

Between the first and second iterations of this survey there were changes to specific data items. As SEARS 2007 included new question modules, conceptual and methodological changes, and an expansion of scope, these changes are too substantial to list individually. Issues affecting the comparability of SEAS 2000 and SEARS 2007 are outlined in Chapter 1: 'Introduction'. CURF users who are undertaking comparisons with the SEAS 2000 CURF should take care to compare items they are using for analysis by using the supporting information, user guide and attachments, and data item lists available for both SEAS 2000 and SEARS 2007, as well as Appendix 1 and Appendix 2 of Employment Arrangements, Retirement and Superannuation, Australia, April to July 2007 (cat. no. 6361.0). Particular attention should be paid to the definition of the data item, populations relating to the data item, and reference periods that apply.


SEARS 2007 CURF files

The CURF is only available via the RADL. It is available in several different formats (SAS, SPSS, STATA). The names of the expanded CURF files are listed below:

SAS files:

These files contain the data for the expanded CURF in SAS for Windows format:
  • ERS07EH.SAS7BDAT contains the Household level data;
  • ERS07EF.SAS7BDAT contains the Family level data;
  • ERS07EP.SAS7BDAT contains the Person level data; and
  • ERS07EJ.SAS7BDAT contains the Job level data.

SPSS files:

These files contain the data for the expanded CURF in SPSS for Windows format:
  • ERS07EH.SAV contains the Household level data;
  • ERS07EF.SAV contains the Family level data;
  • ERS07EP.SAV contains the Person level data; and
  • ERS07EJ.SAV contains the Job level data.

STATA files:

These files contain the data for the expanded CURF in STATA format:
  • ERS07EH.DTA contains the Household level data;
  • ERS07EF.DTA contains the Family level data;
  • ERS07EP.DTA contains the Person level data; and
  • ERS07EJ.DTA contains the Job level data.

Information files:
  • FORMATS.SAS7BCAT - This file is a SAS library containing formats.
  • README.TXT - This is a text file describing the contents of the CURF.
  • RESPONSIBLE ACCESS TO ABS CURFs TRAINING MANUAL_MAR05.PDF - This is an Acrobat file explaining the CURF users' role and obligations when using confidentialised data.
  • Employment Arrangements, Retirement and Superannuation, Australia: Confidentialised Unit Record File, User Guide, 2007 (cat. no. 6361.0.55.002)
  • Employment Arrangements, Retirement and Superannuation, Australia, April to July 2007 (cat. no. 6361.0)
  • ABS CONDITIONS OF SALE.PDF - This file describes ABS conditions of sale.
  • COPYRITE1.BAT - This file describes copyright obligations for CURF users.
  • IMPORTANT INFORMATION FOR CURF USERS_300903.PDF - This file directs users to the ABS website for more and up to date information on what is available from the ABS.
  • 6361.0.55.002 SEARS 2007 CURF Data Items.XLS - This file contains documentation of SEARS 2007 data items including data item labels, code values and category labels.
  • FREQUENCIES_ERS07EH.TXT - This file contains documentation of the Household level data. Data item code values and category labels are provided with weighted and unweighted household frequencies of each value. This file is in plain text format.
  • FREQUENCIES_ERS07EF.TXT - This file contains documentation of the Family level data. Data item code values and category labels are provided with weighted and unweighted family frequencies of each value. This file is in plain text format.
  • FREQUENCIES_ERS07EP.TXT - This file contains documentation of the Person level data. Data item code values and category labels are provided with weighted and unweighted person frequencies of each value. This file is in plain text format.
  • FREQUENCIES_ERS07EJ.TXT - This file contains documentation of the Job level data. Data item code values and category labels are provided with weighted and unweighted job frequencies of each value. This file is in plain text format.


CONDITIONS OF RELEASE

The SEARS 2007 expanded CURF has been released in accordance with a Ministerial Determination (Clause 7, Statutory Rules 1983, No. 19) in pursuance of section 13 of the Census and Statistics Act 1905. As required by the Determination, the CURF has been designed so that the information on the files is not likely to enable the identification of the particular person or organisation to which it relates. The Australian Statistician's approval is required for each release of a CURF. In addition, the ABS requires all organisations and individuals within organisations who purchase or are seeking to use a CURF to sign an undertaking to abide by the legislative restrictions on use, before access to the CURF will be granted. The undertaking includes, among other conditions, that in using the data people will:
  • use the information only for the statistical purposes specified in the Schedule to the Undertaking;
  • not attempt to identify particular persons or organisations;
  • not disclose, either directly or indirectly, the information to any other person or organisation other than members of this organisation who have been approved by the ABS to have individual access to the information;
  • not attempt to match, with or without using identifiers, the information with any other list of persons or organisations;
  • comply with any other direction or requirement specified in the ABS Responsible Access to ABS CURFs Training Manual; and
  • not attempt to access the information after the term of their authorisation expires, or after their authorisation is rescinded by the organisation which provided it, or after they cease to be a member of that organisation.

Use of the data for statistical purposes means use of the CURF data to produce information of a statistical nature. Examples of statistical purposes are:
  • manipulation of the data to produce means, correlations or other descriptive or summary measures;
  • estimation of population characteristics;
  • use of data as input to mathematical models or for other types of analysis (e.g. factor analysis); and
  • providing graphical or pictorial representations of the characteristics of the population or subsets of the population.

All CURF users are required to read and abide by the Responsible Access to ABS Confidentialised Unit Record Files (CURFs) Training Manual available on the ABS website <www.abs.gov.au> (see Services We Provide/CURF Microdata/Accessing CURF Microdata/Responsible Access to ABS CURFs).


Conditions of sale

All ABS products and services are provided subject to the ABS conditions of sale. Any queries relating to these Conditions of Sale should be referred to <intermediary.management@abs.gov.au>.


Access method

Due to the level of detail provided, the SEARS 2007 expanded CURF is only available via the ABS Remote Access Data Laboratory (RADL).


Price

The current recommended retail price of the SEARS 2007 expanded CURF is $1,320 (including GST).


Australian Universities

University clients should refer to the ABS website (see Services We Provide, Services to Universities). The SEARS 2007 expanded CURF can be accessed by universities participating in the ABS/Universities Australia CURF Agreement for research and teaching purposes.


Other clients

Other prospective clients should contact the Microdata Access Strategies Section of the ABS at <microdata.access@abs.gov.au> or on (02) 6252 7714.