Research being undertaken on macroediting
Editing is the activity aimed at detecting, resolving and treating anomalies in data to help make the data ‘fit for purpose’. Whereas microediting involves the editing of collection inputs such as unit records (i.e. microdata), macroediting involves the editing of collection outputs such as estimates, ratios of estimates, and standard errors (i.e. macrodata). Note that some collections have more complex collection outputs such as indexes, medians, or composites of estimates (as in the National Accounts) which must also be macroedited. For simplicity, this article will refer to all collection outputs as 'estimates'.
The Statistical Services Branch is currently researching methods for the efficient detection of anomalous estimates for macroediting. The aim is to extend the micro significance editing approach to macroediting by using a measure of significance to develop a ‘macro significance score’. The size of the score indicates how anomalous an estimate is considered to be, with higher scores indicating more suspicious estimates. Estimates can be ordered by descending score to create a ranking. The higher the score, the more likely it is that the estimate and/or standard error has been affected by important processing or estimation errors, important data errors, or outliers, or that it is correct but requires justification. The scoring and ranking system will allow the macroediting workload to be managed, so that the manager can balance the amount of macroediting against the time and resources available. This will assist macroeditors to achieve maximum benefit for their macroediting effort and, hopefully, free up macroediting time for the more complex and difficult problems.
Although macro significance scoring is a fairly simple idea, it has the advantage that it uses the same concepts as micro significance editing: the calculation of scores for estimates; the ranking of estimates by score size; the application of editing cut-offs (i.e. an editing cost-benefit analysis); and the identification of anomalous estimates. The scores are based on comparisons of estimates and standard errors with their expected values. This will usually involve comparing current estimates with previous estimates (possibly adjusted for trend or seasonality) and achieved standard errors with desired (or design) standard errors. In this case, the previous estimate is used as the expected estimate and the desired standard error is used as the expected standard error. It is envisaged that the user will be able to choose scores based on estimates only, scores based on standard errors only, or scores based on a combination of both. Cut-offs can optionally be used to choose a set of anomalous estimates for further investigation (it is expected that most cut-offs will be chosen interactively).
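The scoring, ranking and cut-off steps described above can be illustrated with a minimal Python sketch. The article does not give the score formulas, so everything specific here is an assumption made for illustration: the `Estimate` record, the relative-difference form of `estimate_score`, the excess-over-design form of `se_score`, the equal default weighting in `combined_score`, and the sample figures.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    label: str
    current: float      # current estimate
    expected: float     # assumed: previous estimate, possibly trend/seasonally adjusted
    achieved_se: float  # achieved standard error
    desired_se: float   # desired (design) standard error


def estimate_score(e: Estimate) -> float:
    """Assumed form: relative departure of the estimate from its expectation."""
    return abs(e.current - e.expected) / abs(e.expected)


def se_score(e: Estimate) -> float:
    """Assumed form: relative excess of the achieved over the desired standard error."""
    return max(0.0, e.achieved_se - e.desired_se) / e.desired_se


def combined_score(e: Estimate, weight: float = 0.5) -> float:
    """Assumed form: weighted average of the two component scores."""
    return weight * estimate_score(e) + (1.0 - weight) * se_score(e)


def rank(estimates, score_fn):
    """Order estimates by descending score."""
    return sorted(estimates, key=score_fn, reverse=True)


def flag(estimates, score_fn, cutoff):
    """Keep only the ranked estimates whose score reaches the cut-off."""
    return [e for e in rank(estimates, score_fn) if score_fn(e) >= cutoff]


# Hypothetical figures, for illustration only.
sample = [
    Estimate("NSW Mining", current=110.0, expected=100.0, achieved_se=5.0, desired_se=5.0),
    Estimate("NSW Retail", current=150.0, expected=100.0, achieved_se=8.0, desired_se=4.0),
    Estimate("TAS Mining", current=98.0, expected=100.0, achieved_se=6.0, desired_se=5.0),
]

for e in rank(sample, combined_score):
    print(e.label, round(combined_score(e), 3))
```

Passing `estimate_score`, `se_score` or `combined_score` to `rank` and `flag` mirrors the envisaged choice between scores based on estimates only, standard errors only, or both. In practice the cut-off would be chosen interactively rather than fixed in code.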
A variation of this approach can be applied to estimates for which no expectations exist. In this case, the scores will most likely be based on the contribution of estimates to higher level estimates (where the higher level estimates are aggregations of the lower level estimates). For example, State by Industry estimates can be ranked in terms of their contributions to both State and Australian estimates. This could be an advantage when there are many items and many lower level estimates for each item.
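As a rough illustration of contribution-based scoring, the sketch below ranks State by Industry cells by their share of the State total and of the national total. The function name `contribution_scores`, the weighted average of the two shares, the weight `state_weight` and the sample figures are all assumptions; the article does not specify how the contributions would be combined. The sketch also assumes non-negative estimates with non-zero totals.

```python
from collections import defaultdict


def contribution_scores(cells, state_weight=0.5):
    """cells maps (state, industry) -> estimate; returns a score for each cell.

    Assumed form: weighted average of the cell's share of its State total
    and its share of the national total.
    """
    state_totals = defaultdict(float)
    for (state, _industry), value in cells.items():
        state_totals[state] += value
    national_total = sum(cells.values())

    scores = {}
    for (state, industry), value in cells.items():
        share_of_state = value / state_totals[state]
        share_of_national = value / national_total
        scores[(state, industry)] = (
            state_weight * share_of_state
            + (1.0 - state_weight) * share_of_national
        )
    return scores


# Hypothetical figures, for illustration only.
cells = {
    ("NSW", "Mining"): 20.0,
    ("NSW", "Retail"): 80.0,
    ("TAS", "Mining"): 9.0,
    ("TAS", "Retail"): 1.0,
}

scores = contribution_scores(cells)
ranking = sorted(scores, key=scores.get, reverse=True)
for key in ranking:
    print(key, round(scores[key], 3))
```

With these figures, a cell that is small nationally but dominates its State still ranks above a larger cell that matters less within its own State. This is the kind of trade-off a two-level ranking is meant to expose.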
If macro significance editing is found to be a useful addition to the wider set of macroediting tools for business surveys, it is expected that the method will be added to the Significance Editing Engine functionality. This will bring together similar micro and macro significance editing concepts and system infrastructure into the one tool. A simple test version of one adaptation of the above ideas, called "Hierarchical Macroscores for Movements" (HMM), has been created. This prioritises lower level movements (such as State by Industry) in terms of their impact on two higher levels (such as State and Australia). HMM has been tried by a few surveys and initial indications are that it is very useful.
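The article does not describe how HMM calculates its scores. The sketch below shows only one plausible way to prioritise lower level movements by their impact on two higher levels, and should not be read as the HMM method itself. The function `movement_scores`, the tolerances `state_tol` and `national_tol`, the use of the larger of the two scaled impacts, and the sample figures are all assumptions. It also assumes positive previous totals at both higher levels.

```python
from collections import defaultdict


def movement_scores(previous, current, state_tol=0.05, national_tol=0.01):
    """previous/current map (state, industry) -> estimate for two periods.

    Assumed form: a cell's movement is expressed as a fraction of the previous
    State and national totals, each fraction is divided by a (hypothetical)
    tolerance for that level, and the larger result is taken as the score.
    """
    state_prev = defaultdict(float)
    for (state, _industry), value in previous.items():
        state_prev[state] += value
    national_prev = sum(previous.values())

    scores = {}
    for key, prev_value in previous.items():
        state = key[0]
        movement = current[key] - prev_value
        state_impact = abs(movement) / state_prev[state]
        national_impact = abs(movement) / national_prev
        scores[key] = max(state_impact / state_tol, national_impact / national_tol)
    return scores


# Hypothetical figures, for illustration only.
previous = {
    ("NSW", "Mining"): 20.0,
    ("NSW", "Retail"): 80.0,
    ("TAS", "Mining"): 5.0,
    ("TAS", "Retail"): 5.0,
}
current = {
    ("NSW", "Mining"): 26.0,
    ("NSW", "Retail"): 82.0,
    ("TAS", "Mining"): 6.0,
    ("TAS", "Retail"): 5.0,
}

hmm_like = movement_scores(previous, current)
priority = sorted(hmm_like, key=hmm_like.get, reverse=True)
for key in priority:
    print(key, round(hmm_like[key], 2))
```

Under these assumptions a score above 1 means the movement alone exceeds the tolerance at one of the two higher levels. A small absolute movement in a small State can therefore outrank a larger movement elsewhere.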
Looking some way ahead, it is possible to combine the scores with graphs such as scatterplots and scatterplot matrices where the anomalous estimates and standard errors can be displayed using symbols or colour coding. The macroeditor will be able to 'see' the result of the score cut-offs. With interactive graphics, the (objectively-chosen) anomalous estimate selections can be manually modified by the macroeditor thus incorporating the subjective component of macroediting. Ultimately, the user could click on points and drill down to more detailed decompositions of the estimates.
For further information contact Keith Farwell on (03) 6222 5889 or firstname.lastname@example.org.