Australian Bureau of Statistics
4402.0.55.001 - Microdata: Childhood Education and Care, Australia, June 2011
Previous ISSUE Released at 11:30 AM (CANBERRA TIME) 25/10/2012
USING THE CURF
Each level of the file carries a series of unique identifiers. Households (or Income Units) have a household identifier (ABSHID) and children have a child identifier (ABSPID). The repeating datasets also carry identifiers for the type of care used by the income unit (ABSIID) and for the child's care (ABSCID).
File level identifiers
The following are the identifiers:
1. Income Unit = ABSHID
2. Income Unit Care = ABSHID, ABSIID
3. Child = ABSHID, ABSPID
4. Child Care = ABSHID, ABSPID, ABSCID
All these identifiers are on each level.
As well as uniquely identifying all units, the identifiers are vital for copying attributes between associated units of different types. For example, an income unit variable such as the labour force status of parents can be copied to all the children within the family. The means by which this might be done in SAS is illustrated below:
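PROC SORT DATA=CEC11E.CEC11EHH OUT=CEC11EHH; BY ABSHID; RUN;
PROC SORT DATA=CEC11E.CEC11EPN OUT=CEC11EPN; BY ABSHID ABSPID; RUN;
/* Merge on ABSHID so each child record picks up its income unit's LFSPAR */
DATA MERGFILE (KEEP=ABSHID ABSIID ABSPID ABSCID LFSPAR);
MERGE CEC11EHH CEC11EPN;
BY ABSHID;
RUN;
The means by which this might be done in SPSS is illustrated below. SORTEDHH is an assumed name for the sorted Income Unit level file, and the GET FILE and SAVE OUTFILE steps depend on local file locations:
* Sort the Income Unit level file by ABSHID and save it as SORTEDHH.
SORT CASES BY ABSHID.
* Sort the Child level file by ABSHID and ABSPID and save it as SORTEDCH.
SORT CASES BY ABSHID ABSPID.
MATCH FILES FILE=SORTEDCH
 /TABLE=SORTEDHH
 /BY ABSHID
 /KEEP=ABSHID ABSIID ABSPID ABSCID LFSPAR.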
The following is an example of an income unit where the data item LFSPAR has been copied from the Income Unit level onto the Child level.
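As a toy illustration of the same copy step outside SAS, the Python sketch below attaches LFSPAR from Income Unit records to Child records by matching on ABSHID. The identifier values and LFSPAR codes are invented for illustration and are not CURF data.

```python
# Toy Income Unit level records: one row per income unit (values are invented).
income_units = [
    {"ABSHID": "H001", "LFSPAR": 1},
    {"ABSHID": "H002", "LFSPAR": 3},
]

# Toy Child level records: one row per child, keyed by ABSHID + ABSPID.
children = [
    {"ABSHID": "H001", "ABSPID": 1},
    {"ABSHID": "H001", "ABSPID": 2},
    {"ABSHID": "H002", "ABSPID": 1},
]

# Index income units by ABSHID so each child can look up its income unit.
lfspar_by_hid = {iu["ABSHID"]: iu["LFSPAR"] for iu in income_units}

# Copy the income unit attribute onto every child in that income unit.
merged = [dict(child, LFSPAR=lfspar_by_hid[child["ABSHID"]]) for child in children]

for row in merged:
    print(row)
```

Both children in income unit H001 receive the same LFSPAR value, which is the one-to-many behaviour the merge relies on.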
USING REPEATING DATASETS
The Income Unit and Child levels are counting units, whereas the Income Unit Care and Child Care levels are repeating datasets. A repeating dataset in the CEaCS is a set of data whose counting unit may be repeated for a child or an income unit. The 'one to many' relationships described in File Structure, which link the Income Unit level to the Income Unit Care level and the Child level to the Child Care level, show the connection between counting units and repeating datasets: an event or episode is repeated, so that multiple records with the same set of data items exist for the same child (or income unit).
For example, a child may have used more than one instance of child care such as (i) a long day care centre, (ii) family day care and (iii) grandparents. Consequently, three records would be present on the Child Care level for this child, representing a repeating dataset, with each record containing information for a common set of data items, e.g. Number of days of care used, Number of hours of care used, cost of the care and so on. Also, the child will have summary records in addition to the individual care records, described below.
In this example, although the three records all relate to a single child, any totals from the Child Care level are a count of child care arrangements.
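The difference between counting arrangements and counting children can be sketched as follows. This is a minimal illustration using invented base records only (summary records are left out), and the `care` labels are hypothetical stand-ins for coded values.

```python
# Toy Child Care level base records for one child (identifiers and labels invented).
child_care = [
    {"ABSHID": "H001", "ABSPID": 1, "ABSCID": 1, "care": "Long day care"},
    {"ABSHID": "H001", "ABSPID": 1, "ABSCID": 2, "care": "Family day care"},
    {"ABSHID": "H001", "ABSPID": 1, "ABSCID": 3, "care": "Grandparent"},
]

# Counting records on the repeating dataset counts care arrangements.
n_arrangements = len(child_care)

# Counting distinct (ABSHID, ABSPID) pairs counts children instead.
n_children = len({(r["ABSHID"], r["ABSPID"]) for r in child_care})

print(n_arrangements, n_children)
```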
Repeating datasets are only useful when common information is collected for each instance of a counting unit. For example, each child in a family may have several instances of care (CARINDX), and each instance has a cost of care after the Child Care Benefit (CCB) and the Child Care Rebate (CCR) (COSTCCR) associated with it, on both a last week and a usual basis (USLWFLG). Because every record on the Child Care level carries this cost, a table can be run on all instances of care.
To run a table on the dataset outlined above, the following SAS code (or equivalent) can be used. This will give you output that shows the frequency of each cost (dollar value) for each type of care usually used by the single child:
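PROC FREQ DATA=CEC11E.CEC11ECC;
WHERE USLWFLG = 2; /* care usually used, excluding preschool */
TABLES CARINDX*COSTCCR; /* type of care by cost after CCB and CCR */
RUN;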
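A comparable frequency count can be sketched in Python on toy records. The CARINDX labels and costs below are invented (on the CURF, CARINDX is a coded item), and the filter on USLWFLG = 2 mirrors the WHERE statement.

```python
from collections import Counter

# Toy Child Care level base records; CARINDX labels and costs are invented.
records = [
    {"CARINDX": "Long day care", "COSTCCR": 38, "USLWFLG": 2},
    {"CARINDX": "Family day care", "COSTCCR": 10, "USLWFLG": 2},
    {"CARINDX": "Long day care", "COSTCCR": 38, "USLWFLG": 4},
    {"CARINDX": "Grandparent", "COSTCCR": 5, "USLWFLG": 2},
]

# Keep only 'care usually used excluding preschool' (USLWFLG = 2), then count
# how often each (type of care, cost) pair occurs.
freq = Counter(
    (r["CARINDX"], r["COSTCCR"]) for r in records if r["USLWFLG"] == 2
)

for (care, cost), n in sorted(freq.items()):
    print(care, cost, n)
```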
Summary Records and Data Items
In addition to the general or base records present in the repeating datasets (i.e. on the Income Unit Care and Child Care levels) that provide details about each instance of child care, there are also 'summary' records that provide aggregate information for selected groupings of the types of care. For example, summary records are available for groupings of formal care, informal care and all care.
In the example of a child who attended long day care, family day care and also received care from a grandparent, there are three base records on the Child Care level because they attended three separate instances of child care. For each record the data item cost of care after CCB and CCR was reported as $38, $10 and $5 respectively. Therefore, the summary record for this child for the total cost of formal care (i.e. long day care and family day care) is recorded as $48 ($38 + $10). Similarly, the summary record for this child for the total cost of all care (i.e. all three types of care) is recorded as $53.
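The arithmetic behind these summary records can be checked with a short Python sketch. The `formal` field is a hypothetical helper used only to group the three base records; it is not a CURF data item.

```python
# Base records for the child in the example above: cost after CCB and CCR.
# The 'formal' field is a hypothetical helper, not a CURF data item.
base_records = [
    {"care": "Long day care", "formal": True, "COSTCCR": 38},
    {"care": "Family day care", "formal": True, "COSTCCR": 10},
    {"care": "Grandparent", "formal": False, "COSTCCR": 5},
]

# Summary values are sums over the base records in each grouping.
formal_total = sum(r["COSTCCR"] for r in base_records if r["formal"])
informal_total = sum(r["COSTCCR"] for r in base_records if not r["formal"])
all_care_total = sum(r["COSTCCR"] for r in base_records)

print(formal_total, informal_total, all_care_total)  # 48 5 53
```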
The following data items comprise the classifications that enable the data for these summary records to be tabulated:
Income Unit Care level - Type of care used by the family (SASName IUCINDX).
Child Care level - All types of care (SASName CARINDX).
Note that although the output from the PROC FREQ step above only relates to a single child, the totals are a count of all care arrangements for that child. That is, the table shows the frequency of different costs for each type of care for an individual child.
As with the Child level file, some data items in a repeating dataset are only applicable to a particular sub-population of the dataset. For instance, the item 'Main reason intends to claim for the cost of formal care' from the Child Care level is only applicable for formal care. Records outside the sub-population (e.g. children with only informal care or no care) appear as "Not applicable". Whenever the Child Care level is used, the usual or last week flag must be applied. Refer to 'Using Flag Items'.
In addition, note that if you want to create ranged hours or cost tables which include custom totals for type of care (for example, all formal care excluding occasional care) you need to sum hours and cost for the types of care included in your total to the Child level before ranging the result.
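The order of operations matters here, and a minimal Python sketch may make it concrete. The records, the grouping of care types and the range cut-offs are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Toy Child Care level base records (hours per week; values invented).
records = [
    {"ABSHID": "H001", "ABSPID": 1, "care": "Long day care", "hours": 20},
    {"ABSHID": "H001", "ABSPID": 1, "care": "Family day care", "hours": 15},
    {"ABSHID": "H001", "ABSPID": 1, "care": "Occasional care", "hours": 5},
    {"ABSHID": "H002", "ABSPID": 1, "care": "Long day care", "hours": 8},
]

# Custom total: formal care excluding occasional care (hypothetical grouping).
included = {"Long day care", "Family day care"}


def hours_range(hours):
    """Assign hours to an illustrative range; the cut-offs are assumptions."""
    if hours < 10:
        return "Less than 10"
    if hours < 30:
        return "10 to 29"
    return "30 or more"


# Step 1: sum the included types of care to the Child level.
child_hours = defaultdict(int)
for r in records:
    if r["care"] in included:
        child_hours[(r["ABSHID"], r["ABSPID"])] += r["hours"]

# Step 2: range the child-level total, not the individual records.
child_range = {child: hours_range(h) for child, h in child_hours.items()}
print(child_range)
```

Ranging each record first would place the first child's two arrangements in the "10 to 29" range, whereas the child's combined 35 hours belongs in "30 or more".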
USING FLAG ITEMS
To enable easier table specification and to ensure that the correct populations, and hence the correct data, are being tabulated, a number of 'flags' have been included in the CURF that should be used at all times when extracting data.
Usual or last week flag
There is a usual or last week care flag (USLWFLG) that allows users to look at a child's care usage for the reference week (last week) or their usual care usage. This flag is on the Child Care level. A similar flag at the Income Unit Care level (IUCSFLG) filters whether the care used by the family is on a usual or last week basis. These flags also include or exclude preschool from care used last week or usually.
It is imperative that the usual or last week care flags are used when any data items from the Child Care level or the Income Unit Care level are used, regardless of whether the care level data items are used alone or with other Child level or Income Unit level data items. If these flags are not used for child care or income unit care data items, the data will be incorrect.
The categories of the flags are:
1. Care usually used including preschool
2. Care usually used excluding preschool
3. Care used last week including preschool
4. Care used last week excluding preschool
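A small Python sketch of why a flag category should always be selected is shown below. It assumes, for illustration only, that the same child has records under each flag value; the hours are invented and `select_by_flag` is a hypothetical helper.

```python
# Category labels for the usual or last week flag, as listed above.
FLAG_LABELS = {
    1: "Care usually used including preschool",
    2: "Care usually used excluding preschool",
    3: "Care used last week including preschool",
    4: "Care used last week excluding preschool",
}

# Toy Child Care level records for one child (assumed layout, invented hours).
records = [
    {"ABSPID": 1, "USLWFLG": 1, "hours": 20},
    {"ABSPID": 1, "USLWFLG": 2, "hours": 12},
    {"ABSPID": 1, "USLWFLG": 3, "hours": 18},
    {"ABSPID": 1, "USLWFLG": 4, "hours": 10},
]


def select_by_flag(rows, flag):
    """Return only the rows for one flag category (hypothetical helper)."""
    if flag not in FLAG_LABELS:
        raise ValueError(f"Unknown USLWFLG category: {flag}")
    return [r for r in rows if r["USLWFLG"] == flag]


# Without the flag, hours from all four bases are mixed together.
unfiltered_hours = sum(r["hours"] for r in records)

# With the flag, the total relates to a single, well-defined basis.
usual_incl_preschool = sum(r["hours"] for r in select_by_flag(records, 1))

print(unfiltered_hours, usual_incl_preschool)
```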
Labour force scope flag
In households where all adults were out of scope of the Labour Force Survey (LFS), no information was obtained for the 2011 CEaCS. However, as long as at least one parent in the household was in scope for the LFS, information about children aged 0–12 years and some information about their parents were collected and included in the 2011 CEaCS.
There is a labour force scope flag (LFSFLAG) to indicate whether the income unit is out of scope. This flag (present on the Income Unit level) indicates whether one parent in a family was out of scope or coverage. Limited employment and demographic data are available for these families.
Information about the working arrangements used by parent/guardians to help care for their child was not available for parent/guardians who were out of scope or coverage of the LFS for any reason.
MULTI-RESPONSE DATA ITEMS
A number of questions included in the survey allowed respondents to provide one or more responses. Each response category for one of these 'multi-response questions' (or data items) is basically treated as a separate data item. These data items have the same general data item identifier (SASName) but are each suffixed with a letter – A for the first response, B for the second response, C for the third response, D for the fourth response and so on.
For example, the multi-response data item 'All sources of income of parent(s)' (with a general SASName of ASCIPAR – see data item list), has five response categories. Consequently, five data items have been produced - ASCIPARA, ASCIPARB, ASCIPARC, ASCIPARD and ASCIPARE.
Each data item in the series (i.e. ASCIPARA to ASCIPARE) has two response codes: a 'Yes' response (code 1 for the first in the series, code 2 for the second, and so on) and a 'Null' response (code 0) indicating that the response was not relevant for the respondent. The first data item in the series also includes a 'Not Applicable' response, which comprises the respondents not asked the question (e.g. ASCIPARA with a value of 9).
It should be noted that the sum of individual multi-response categories will be greater than the population or number of people applicable to the particular data item as respondents are able to select more than one response.
Multi-response data items can be identified in the data item list as SASNames followed by a range of letters in brackets; for example, ASCIPAR(A-E). They can also be identified in the CURF data item list with a # appended to the data item name (e.g. Usual education/care/parenting arrangements two years prior to attending school #).
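The coding of a multi-response item can be sketched in Python as follows. The three records are invented; the codes follow the scheme described above (a 'Yes' equals the item's position in the series, 0 is 'Null', and 9 on the first item is 'Not Applicable').

```python
# Data items in the ASCIPAR(A-E) series, in order.
ITEMS = ["ASCIPARA", "ASCIPARB", "ASCIPARC", "ASCIPARD", "ASCIPARE"]

# Toy records (invented). The third respondent was not asked the question.
records = [
    {"ASCIPARA": 1, "ASCIPARB": 2, "ASCIPARC": 0, "ASCIPARD": 0, "ASCIPARE": 0},
    {"ASCIPARA": 0, "ASCIPARB": 2, "ASCIPARC": 3, "ASCIPARD": 0, "ASCIPARE": 5},
    {"ASCIPARA": 9, "ASCIPARB": 0, "ASCIPARC": 0, "ASCIPARD": 0, "ASCIPARE": 0},
]

# Applicable population: respondents who were asked the question.
applicable = [r for r in records if r["ASCIPARA"] != 9]

# Count 'Yes' responses per item; the 'Yes' code is the item's 1-based position.
yes_counts = {
    item: sum(1 for r in applicable if r[item] == position)
    for position, item in enumerate(ITEMS, start=1)
}

total_responses = sum(yes_counts.values())
print(yes_counts, total_responses, len(applicable))
```

Here five 'Yes' responses come from only two applicable respondents, which is why category totals should not be expected to sum to the population.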
WEIGHTS AND ESTIMATION
As the survey was conducted on a sample of households in Australia, it is important to take account of the method of sample selection when deriving estimates. This is particularly important as a child's chance of selection in the survey varied depending on the state or territory in which they lived. Survey 'weights' are values which indicate how many population units are represented by the sample unit. See discussion in Survey Methodology.
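As a rough illustration of how weights turn sample records into population estimates, the Python sketch below sums a weight over toy records. The field names `weight` and `uses_formal_care` are hypothetical, and the values are invented.

```python
# Toy Child level records with a hypothetical weight field (values invented).
children = [
    {"ABSPID": 1, "uses_formal_care": True, "weight": 250.0},
    {"ABSPID": 2, "uses_formal_care": False, "weight": 310.5},
    {"ABSPID": 3, "uses_formal_care": True, "weight": 190.0},
]

# Each weight is the number of population units the record represents,
# so a population estimate is a sum of weights, not a count of records.
estimated_population = sum(c["weight"] for c in children)
estimated_formal = sum(c["weight"] for c in children if c["uses_formal_care"])

# A proportion is the ratio of two weighted totals.
proportion_formal = estimated_formal / estimated_population

print(estimated_population, estimated_formal, round(proportion_formal, 3))
```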
There are two weights provided on the CEaCS CURF, as follows:
The application of weights ensures that:
Each record on each of the levels also contains 60 replicate weights and, by using these weights, it is possible to calculate standard errors for weighted estimates produced from the microdata. This method is known as the 60 group Jack-knife variance estimator. When calculating standard errors, it is important to select the replicate weights which are most appropriate for the analysis being undertaken. The replicate weights are as follows:
Replicate weights enable variances of estimates to be calculated relatively simply. They also enable unit record analyses such as chi-square and logistic regression to be conducted which take into account the sample design. Replicate estimates for any variable of interest can be calculated by applying each of the 60 sets of replicate weights, giving 60 replicate estimates. The distribution of this set of replicate estimates, in conjunction with the full sample estimate (based on the general weight), is then used to approximate the variance of the full sample estimate.
To obtain the standard error of a weighted estimate y, the same estimate is calculated using each of the 60 replicate weights. The variability between these replicate estimates (denoted y(g) for group number g) is used to measure the standard error of the original weighted estimate y using the formula:
$$SE(y) = \sqrt{\frac{59}{60} \sum_{g=1}^{60} \left(y(g) - y\right)^2}$$
where:
g = the replicate group number
y(g) = the weighted estimate, having applied the weights for replicate group g
y = the weighted estimate from the sample.
The 60 group Jack-knife method can be applied not just to estimates of the population total, but also where the estimate y is a function of estimates of the population total, such as a proportion, difference or ratio. For more information on the 60 group Jack-knife method of SE estimation, see Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee), July 1999 (cat. no. 1352.0.55.029).
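A minimal Python sketch of the calculation is given below. It assumes the usual delete-a-group form, SE(y) = sqrt((G - 1)/G × Σ (y(g) - y)²) with G = 60, and uses invented replicate estimates rather than CURF replicate weights. For a ratio, the ratio is recomputed for every replicate group before the formula is applied.

```python
import math


def jackknife_se(full_estimate, replicate_estimates):
    """Standard error from G replicate estimates (delete-a-group Jack-knife).

    Assumes SE = sqrt((G - 1) / G * sum((y_g - y) ** 2)); G is 60 for this CURF.
    """
    g = len(replicate_estimates)
    if g < 2:
        raise ValueError("At least two replicate estimates are required")
    squared_diffs = sum((y_g - full_estimate) ** 2 for y_g in replicate_estimates)
    return math.sqrt((g - 1) / g * squared_diffs)


# Invented full-sample and replicate estimates of a population total.
full_total = 1000.0
rep_totals = [990.0, 1010.0] * 30  # 60 replicate estimates
se_total = jackknife_se(full_total, rep_totals)

# For a ratio, compute the ratio within each replicate group first.
full_ratio = 400.0 / 1000.0
rep_numerators = [398.0, 402.0] * 30
rep_ratios = [n / d for n, d in zip(rep_numerators, rep_totals)]
se_ratio = jackknife_se(full_ratio, rep_ratios)

print(round(se_total, 2), round(se_ratio, 4))
```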
Use of the 60 group Jack-knife method for complex estimates, such as regression parameters from a statistical model, is not straightforward and may not be appropriate. The method as described does not apply to investigations where survey weights are not used, such as in unweighted statistical modelling.
To enable analysis at a regional level, each record on the CURF contains a state/territory identifier (STATECF) and two sub-state identifiers: Capital city/Balance of state/Territory (AREAOUT) and Remoteness structure (AREAREMC). The AREAOUT geographic data item has two output categories, Capital city and Balance of state/Territory. Only the capital city statistical divisions of the six states, as defined in the Australian Standard Geographical Classification (ASGC) (cat. no. 1216.0), are included in the Capital city category. All other regions in Australia, including the territory capitals Darwin and Canberra, are classified to the Balance of state/Territory category.
Conditions of Use of Geographic Data Items
To provide CURF users with greater flexibility in their analyses, the ABS has included several sub-state geography data items (as described above) on the Expanded CURF.
Conditions are placed on the use of these items. Tables showing multiple data items cross-tabulated by more than one sub-state geography at a time are not permitted, because of the detailed information about small geographic regions that could be presented. However, simple cross-tabulations of population counts by sub-state geographic data items may be useful for clients in order to determine which geography item to include in their primary analysis, and such output is permitted.
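One way to guard against breaching this condition is a simple pre-check on the data items in a table specification, sketched below in Python. `table_permitted` is a hypothetical helper and a deliberate simplification: it flags any table with more than one sub-state geography item and does not recognise the permitted exception for simple population counts.

```python
# Sub-state geography items named in the text above.
SUB_STATE_GEOGRAPHY = {"AREAOUT", "AREAREMC"}


def sub_state_items(table_items):
    """Return the sub-state geography items used in a table specification."""
    return sorted(set(table_items) & SUB_STATE_GEOGRAPHY)


def table_permitted(table_items):
    """True when the table uses at most one sub-state geography item."""
    return len(sub_state_items(table_items)) <= 1


print(table_permitted(["STATECF", "AREAOUT", "CARINDX"]))
print(table_permitted(["AREAOUT", "AREAREMC", "COSTCCR"]))
```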
CURF FILE NAMES
The 2011 CEaCS Expanded CURF can be accessed through the RADL and is available in SAS, SPSS and STATA formats. The CURF comprises the following files:
These files contain the data for the CURF in SAS format.
CEC11EHH.SAS7BDAT - the CEaCS CURF Income Unit level file in SAS for Windows format.
CEC11EIC.SAS7BDAT - the CEaCS CURF Income Unit Care level file in SAS for Windows format.
CEC11EPN.SAS7BDAT - the CEaCS CURF Child level file in SAS for Windows format.
CEC11ECC.SAS7BDAT - the CEaCS CURF Child Care level file in SAS for Windows format.
These files contain the data for the CURF in SPSS format.
CEC11EHH.SAV - the CEaCS CURF Income Unit level file in SPSS format.
CEC11EIC.SAV - the CEaCS CURF Income Unit Care level file in SPSS format.
CEC11EPN.SAV - the CEaCS CURF Child level file in SPSS format.
CEC11ECC.SAV - the CEaCS CURF Child Care level file in SPSS format.
These files contain the data for the CURF in STATA format.
CEC11EHH.DTA - the CEaCS CURF Income Unit level file in STATA format.
CEC11EIC.DTA - the CEaCS CURF Income Unit Care level file in STATA format.
CEC11EPN.DTA - the CEaCS CURF Child level file in STATA format.
CEC11ECC.DTA - the CEaCS CURF Child Care level file in STATA format.
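The file names above follow a regular pattern (the prefix CEC11E, a two-letter level code and a format extension), which the Python sketch below captures. `curf_file_name` is a hypothetical convenience helper, not part of the CURF.

```python
# Two-letter level codes and extensions taken from the file names listed above.
LEVEL_CODES = {
    "Income Unit": "HH",
    "Income Unit Care": "IC",
    "Child": "PN",
    "Child Care": "CC",
}
EXTENSIONS = {"SAS": "SAS7BDAT", "SPSS": "SAV", "STATA": "DTA"}


def curf_file_name(level, file_format):
    """Build the data file name for a level and format (hypothetical helper)."""
    return f"CEC11E{LEVEL_CODES[level]}.{EXTENSIONS[file_format]}"


for level in LEVEL_CODES:
    print(level, "->", curf_file_name(level, "SAS"))
```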
Data Item List
The Data item list contains all the data items, including details of categories and code values, that are available on the Expanded CURF. It is available on the Downloads tab.
FORMATS.SAS7BCAT - the SAS format file which provides labels for associated code values in the SAS version of the CURF.
FREQUENCIES_CEC11EHH.TXT - contains weighted and unweighted frequency counts for all Income Unit level data items. The file is in plain text format.
FREQUENCIES_CEC11EIC.TXT - contains weighted and unweighted frequency counts for all Income Unit Care level data items. The file is in plain text format.
FREQUENCIES_CEC11EPN.TXT - contains weighted and unweighted frequency counts for all Child level data items. The file is in plain text format.
FREQUENCIES_CEC11ECC.TXT - contains weighted and unweighted frequency counts for all Child Care level data items. The file is in plain text format.
This page last updated 3 July 2015