A New Analytical Platform to Explore Linked Data
The Semantic Web framework provides an alternative approach to data representation, linking and retrieval that can unlock the full potential of interconnected and multi-dimensional datasets. Instead of organising datasets in a structured row-column tabular form, the Semantic Web approach models information in the form of a network of entities and relationships. The relationships are given strong computable semantics by precisely specifying their logical properties in a machine-interpretable format.
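As a rough illustration of the contrast with a row-column table, the sketch below stores information as subject-predicate-object statements and looks up relationships by following links. The firm, group and region identifiers and the predicate names (`ownedBy`, `locatedIn`, `succeeds`) are hypothetical and do not come from any ABS dataset or from GLIDE; a real Semantic Web store would use a standard such as RDF rather than plain tuples.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); toy data, not ABS data.
TRIPLES = [
    ("firm:A", "ownedBy", "group:X"),
    ("firm:B", "ownedBy", "group:X"),
    ("firm:A", "locatedIn", "region:NSW"),
    ("firm:B", "succeeds", "firm:A"),
]


def build_index(triples):
    """Index triples by subject so an entity's outgoing relationships can be looked up."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def objects_of(index, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [obj for pred, obj in index.get(subject, []) if pred == predicate]


index = build_index(TRIPLES)
print(objects_of(index, "firm:B", "succeeds"))
```

Adding a new kind of relationship here means adding new statements, not a new column, which is the flexibility the network representation is meant to provide.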
The Semantic Web approach opens up new avenues of data exploration, visualisation and network analysis. One example has been demonstrated with the prototype GLIDE, which was used to derive network statistics and build models that distinguish true firm deaths from spurious ones. The ABS has an established process for identifying firm exits, but cannot distinguish the type of exit: whether it is due to restructuring, a merger or takeover, or a genuine death.
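The summary does not say which network statistics were derived, so the following is only a minimal sketch of two common ones, node degree and graph density, computed from an edge list. The edge list and the function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical undirected links between firms (e.g. shared ownership); toy data only.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def degree(edges):
    """Count how many links touch each node."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return dict(counts)


def density(edges, n_nodes):
    """Share of all possible undirected links that are actually present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0


deg = degree(EDGES)
print(deg)
print(density(EDGES, n_nodes=4))
```

Statistics of this kind can be attached to each firm as extra explanatory variables alongside its ordinary attributes.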
Both multilevel logistic regression and Bayesian Network (BN) models were used to distinguish true and spurious firm deaths. Multilevel models were developed both with and without network statistics. The data were partitioned into modelling (training) and prediction (test) subsets to assess the quality of the models' out-of-sample predictions. The model with network statistics performed substantially better (95% accuracy vs 74% accuracy). The significant variables were then incorporated in a BN model. This approach took account of the relationships between all the variables, achieved similar prediction accuracy with a subset of variables, and also handled observations with missing variables in the test data. The intention was not to compare the two methods on their prediction outcomes but to build on the multilevel modelling results to provide a statistical framework for the BN model.
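The partitioning and accuracy assessment described above can be sketched as follows. This is not the ABS procedure: the 30% test fraction, the fixed seed and the helper names `split` and `accuracy` are assumptions, and no model is fitted here.

```python
import random


def split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test) subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(predictions, labels):
    """Proportion of test observations whose predicted class matches the true class."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


train, test = split(range(10))
print(len(train), len(test))
```

A model would be fitted on the training subset only, and `accuracy` would then be evaluated on predictions for the test subset, so that the reported figure reflects observations the model has not seen.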
The analytical results show that it is important to account for spurious firm deaths in statistical production. Failing to do so can result in continuing enterprises being incorrectly classified as deaths, which in turn affects both the quality of the survey frame and the accuracy of the resulting statistics. The conclusion is that the Semantic Web is a useful approach for statistical purposes, and that network analysis can be used to distinguish true and spurious firm deaths effectively.