Part B - The ABS data quality framework
The ABS has developed a Data Quality Framework (DQF) which is used to evaluate the quality of ABS statistical collections and products (e.g. survey data, statistical tables), including administrative data. The ABS DQF is used in the collection, processing and dissemination of GFS. The ABS DQF comprises seven dimensions of quality, reflecting a broad and inclusive approach to quality definition and assessment. The seven dimensions of quality are:
- Institutional environment;
- Relevance;
- Timeliness;
- Accuracy;
- Coherence;
- Interpretability; and
- Accessibility.
The first dimension of quality in the ABS DQF is the institutional environment. This dimension refers to the institutional and organisational factors which may have a significant influence on the effectiveness and credibility of the agency producing the statistics. Consideration of the institutional environment associated with a statistical product is important as it enables an assessment of the surrounding context, which may influence the validity, reliability or appropriateness of the product. The dimension of institutional environment can be evaluated by considering six key aspects:
- Impartiality and objectivity: whether the production and dissemination of data are undertaken in an objective, professional and transparent manner.
- Professional independence: the extent to which the agency producing statistics is independent from other policy, regulatory or administrative departments and bodies, as well as from private sector operators, and free from potential conflicts of interest.
- Mandate for data collection: the extent to which administrative organisations, businesses and households, and the public at large may be compelled by law to allow access to, or to provide data to, the agency producing statistics.
- Adequacy of resources: the extent to which the resources available to the agency are sufficient to meet its needs in terms of the production or collection of data.
- Quality commitment: the extent to which processes, staff and facilities are in place for ensuring the data produced are commensurate with their quality objectives.
- Statistical confidentiality: the extent to which the privacy of data providers (households, enterprises, administrations and other respondents), and the confidentiality of the information they provide, are guaranteed (if relevant).
The second dimension of quality in the ABS DQF is relevance. This dimension refers to how well the statistical product or release meets the needs of users in terms of the concept(s) measured, and the population(s) represented. Consideration of the relevance associated with a statistical product is important as it enables an assessment of whether the product addresses the issues most important to policy makers, researchers and to the broader Australian community. The dimension of relevance can be evaluated by considering the following key aspects:
- Scope and coverage: the purpose or aim for collecting the information, including identification of the target population, discussion of whom the data represent, who is excluded and whether there are any impacts or biases caused by exclusion of particular people, areas or groups.
- Reference period: this refers to the period for which the data were collected (e.g. the September-December quarter of the 2014-15 financial year), as well as whether there were any exceptions to the collection period (e.g., delays in receipt of data, changes to field collection processes due to natural disasters).
- Geographic detail: information about the level of geographical detail available for the data (e.g. postcode area, Statistical Local Area) and the actual geographic regions for which data are available.
- Main outputs / data items: whether the data measure the concepts meant to be measured for their intended uses.
- Classifications and statistical standards: the extent to which the classifications and standards used reflect the target concepts to be measured or the population of interest.
- Type of estimates available: this refers to the nature of the statistics produced, which could be index numbers, trend estimates, seasonally adjusted data, or original unadjusted data.
- Other cautions: information about any other relevant issue or caution that should be exercised in the use of the data.
Timeliness is the third dimension of quality in the ABS DQF. Timeliness refers to the delay between the reference period (to which the data pertain) and the date at which the data become available; and the delay between the advertised date and the date at which the data become available (i.e. the actual release date). These aspects are important considerations in assessing quality, as lengthy delays between the reference period and data availability, or between advertised and actual release dates, can have implications for the currency or reliability of the data. The dimension of timeliness can be evaluated by considering two key aspects:
- Timing: this refers to the time lag between the reference period and when the data actually become available (including the time lag between the advertised date for release and the actual date of release). For example, the reference period may be the 2004-05 financial year, but data may not become available for analysis until the middle of 2006.
- Frequency of survey: this refers to whether the survey or data collection was conducted on a one-off basis, or whether it is expected to be ongoing. If it is expected to be ongoing, frequency also includes information about the proposed frequency of repeated collections and when data will be released for subsequent reference periods.
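The two lags described under the timing aspect can be illustrated with a minimal Python sketch. The function name, dictionary keys and dates below are hypothetical and are not part of the ABS DQF; the dates are loosely modelled on the 2004-05 financial year example above.

```python
from datetime import date


def timeliness_lags(reference_period_end, advertised_release, actual_release):
    """Return the two timing lags, in days.

    All three arguments are datetime.date objects. The names are
    illustrative assumptions, not ABS terminology.
    """
    return {
        # Delay between the end of the reference period and data availability
        "reference_to_release_days": (actual_release - reference_period_end).days,
        # Delay between the advertised and the actual release date
        "advertised_to_release_days": (actual_release - advertised_release).days,
    }


# Hypothetical dates: reference period ends 30 June 2005,
# release advertised for 15 June 2006, data actually released 30 June 2006.
lags = timeliness_lags(
    reference_period_end=date(2005, 6, 30),
    advertised_release=date(2006, 6, 15),
    actual_release=date(2006, 6, 30),
)
print(lags)
```

A quality statement would normally report these lags alongside the frequency of the collection, rather than as standalone figures.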
The fourth dimension of quality in the ABS DQF is accuracy. Accuracy refers to the degree to which the data correctly describe the phenomenon they were designed to measure. This is an important component of quality as it relates to how well the data portray reality, which has clear implications for how useful and meaningful the data will be for interpretation or further analysis. In particular, when using administrative data, it is important to remember that statistical outputs for analysis are generally not the primary reason for the collection of the data.
Accuracy should be assessed in terms of the major sources of errors that potentially cause inaccuracy. Any factors which could impact on the validity of the information for users should be described in quality statements. The dimension of accuracy can be evaluated by considering a number of key aspects:
- Coverage error: this occurs when a unit in the sample is incorrectly excluded or included, or is duplicated in the sample (e.g., a field interviewer omits to interview a set of households or people in a household). Coverage of the statistical measures could be assessed by comparing the population included for the data collection to the target population.
- Sample error: where sampling is used, the impact of sample error can be assessed using information about the total sample size and the size of the sample in key output levels (e.g. number of sample units in a particular geographical area), the sampling error of the key measures, and the extent to which there are changes or deficiencies in the sample which could impact on accuracy.
- Non-response error: this refers to incomplete information provided by a respondent (e.g., when some data are missing, or the respondent has not answered all questions or provided all required information). Assessment should be based on non-response rates, or percentages of estimates imputed, and any statistical corrections or adjustment made to the estimates to address the bias from missing data.
- Response error: this refers to a type of error caused by respondents intentionally or accidentally providing inaccurate responses, or incomplete responses, during the provision of data. This occurs not only in statistical surveys, but also in administrative data collection where forms, or concepts on forms, are not well understood by respondents. Response errors are usually gauged by comparison with alternative sources of data and follow-up procedures.
- Other sources of errors: any other serious accuracy problems with the statistics should be considered. These may include errors caused by incorrect processing of data (e.g. erroneous data entry or recognition), alterations made to the data to ensure the confidentiality of the respondents (e.g. by adding "noise" to the data), rounding errors introduced during collection, processing or dissemination, and errors arising from other quality assurance processes.
- Revisions to data: the extent to which the data are subject to revision or correction, in light of new information or following rectification of errors in processing or estimation, and the time frame in which revisions are produced.
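Several of the accuracy aspects above are commonly summarised with simple indicators. The following Python sketch shows one possible set: coverage rates from comparing collected units with the target population, a unit non-response rate, and a relative standard error. The function names, formulas and figures are illustrative assumptions; the ABS DQF does not prescribe these particular calculations.

```python
def coverage_summary(target_ids, collected_ids):
    """Compare the units actually collected with the target population."""
    target = set(target_ids)
    collected = set(collected_ids)
    missed = target - collected   # units incorrectly excluded
    extra = collected - target    # units incorrectly included
    return {
        "undercoverage_rate": len(missed) / len(target),
        "overcoverage_rate": len(extra) / len(collected),
        # Units that appear more than once in the collected list
        "duplicates": len(list(collected_ids)) - len(collected),
    }


def non_response_rate(selected, responded):
    """Share of selected units that did not respond."""
    return (selected - responded) / selected


def relative_standard_error(estimate, standard_error):
    """Standard error expressed as a percentage of the estimate."""
    return 100.0 * standard_error / abs(estimate)


# Hypothetical figures for illustration only
coverage = coverage_summary(
    target_ids=range(1, 11),                       # units 1..10 are in scope
    collected_ids=[1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 12],
)
nr_rate = non_response_rate(selected=200, responded=150)
rse = relative_standard_error(estimate=5000, standard_error=250)
print(coverage, nr_rate, rse)
```

Indicators of this kind support, but do not replace, the descriptive assessment of error sources that a quality statement should contain.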
The fifth dimension of quality in the ABS DQF is coherence. Coherence refers to the internal consistency of a statistical collection, product or release, as well as its comparability with other sources of information, within a broad analytical framework and over time. The use of standard concepts, classifications and target populations promotes coherence, as does the use of common methodology across surveys. Coherence is an important component of quality as it provides an indication of whether the dataset can be usefully compared with other sources to enable data compilation and comparison. It is important to note that coherence does not necessarily imply full numerical consistency, rather consistency in methods and collection standards. Quality statements of statistical measures must include a discussion of any factors which would affect the comparability of the data over time. The coherence of a statistical collection, product or release can be evaluated by considering a number of key aspects:
- Changes to data items: to what extent a long time series of particular data items might be available, or whether significant changes have occurred to the way that data are collected.
- Comparison across data items: this refers to the capacity to be able to make meaningful comparisons across multiple data items within the same collection. The ability to make comparisons may be affected if there have been significant changes in collection, processing or estimation methodology which might have occurred across multiple items within a collection.
- Comparison with previous releases: the extent to which there have been significant changes in collection, processing or estimation methodology in this release compared with previous releases, or any 'real world' events which have impacted on the data since the previous release.
- Comparison with other products available: this refers to whether there are any other data sources with which a particular series has been compared, and whether these two sources tell the same story. This aspect may also include identification of any other key data sources with which the data cannot be compared, and the reasons for this, such as differences in scope or definitions.
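As an illustration of comparing a series with another available product, the sketch below computes the percentage difference between two sources for each period they share and flags differences above a tolerance. The function name, the 5 per cent tolerance and the figures are hypothetical. Because coherence does not imply full numerical consistency, a flagged difference is a prompt to check scope, definitions and methods, not evidence of an error.

```python
def compare_series(series_a, series_b, tolerance_pct=5.0):
    """Compare two series keyed by period label.

    Only periods present in both sources are compared. Values in
    series_b are assumed to be non-zero.
    """
    report = {}
    for period in sorted(set(series_a) & set(series_b)):
        a, b = series_a[period], series_b[period]
        diff_pct = 100.0 * (a - b) / b
        report[period] = {
            "diff_pct": diff_pct,
            "within_tolerance": abs(diff_pct) <= tolerance_pct,
        }
    return report


# Hypothetical values from two sources measuring the same concept
source_a = {"2013-14": 102.0, "2014-15": 120.0}
source_b = {"2013-14": 100.0, "2014-15": 100.0, "2015-16": 110.0}

report = compare_series(source_a, source_b)
print(report)
```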
Interpretability is the sixth dimension of quality in the ABS DQF. Interpretability refers to the availability of information to help provide insight into the data. Information that could assist interpretation may include the variables used and the available metadata, including concepts, classifications and measures of accuracy. Interpretability is an important component of quality as it enables the information to be understood and utilised appropriately. The interpretability of a statistical collection, product or release can be evaluated by considering two key aspects:
- Presentation of the information: the form of presentation and the use of analytical summaries to help draw out the key message of the data.
- Availability of information regarding the data: the availability of key material to support correct interpretation, such as concepts, sources and methods; manuals and user guides; and measures of accuracy of data.
Accessibility is the seventh and final dimension of quality in the ABS DQF. Accessibility refers to the ease of access to data by users, including the ease with which the existence of information can be ascertained, as well as the suitability of the form or medium through which information can be accessed. The cost of the information may also represent an aspect of accessibility for some users. Accessibility is a key component of quality as it relates directly to the capacity of users to identify the availability of relevant information, and then to access it in a convenient and suitable manner. The accessibility of a statistical collection, product or release can be evaluated by considering two key aspects:
- Accessibility to the public: the extent to which the data are publicly available, or the level of access restrictions. Additionally, special data services may include the availability of special or non-standard groupings of data items or outputs, if required.
- Data products available: this refers to the specific products available (e.g., publications, spreadsheets), the formats of these products, their cost, and the available data items which they contain.