As part of our ongoing commitment to protect the privacy of data providers, the ABS is investigating a constrained optimisation approach to tabular suppression as we move to modernise our business statistics production. The ABS currently applies an internally developed graphical method to determine effective suppression patterns, but we are now reassessing its suitability as more contemporary methodologies become available.
As a potential alternative suppression method, we are considering G-Confid, a software package developed by Statistics Canada. G-Confid builds upon the Controlled Tabular Adjustment algorithm (Cox, 2005). For each table cell subject to primary suppression, G-Confid uses linear programming techniques to represent table additivity and identifies the locations for complementary suppression that minimise loss of information.
G-Confid offers several potential improvements over the existing methodology, including:
- the ability to process multi-dimensional collections holistically, which is increasingly important as collections grow in size and detail; the existing ABS method can only process collections in two-dimensional slices
- the inclusion of a post-treatment procedure to reduce the number of redundant complementary suppressions
- the extensibility of constrained optimisation techniques, such as handling non-linearity, which may allow more complex data to be protected as the demand for more detailed statistics increases.
Another constrained optimisation method we are exploring is the Attacker Model (Fischetti & Salazar, 2001), which we are using as a means of auditing suppression patterns. The model takes the perspective of an attacker who tries to recover the possible values of suppressed cells from the published cells, again using a linear programming framework to describe additivity between table cells. If a suppressed cell can be estimated to within a close range of its true value, it is considered a disclosure risk.
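To illustrate the auditing idea, the following is a minimal sketch in Python rather than the ABS implementation. It assumes a two-dimensional table of non-negative values with published row and column totals, and uses `scipy.optimize.linprog` to find the smallest and largest value each suppressed cell could take. The function names `audit_suppression` and `is_at_risk`, the example table, and the 10% protection rule are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def audit_suppression(table, suppressed):
    """Return {(i, j): (low, high)} for every suppressed interior cell.

    Assumes the attacker knows all row and column totals, all
    unsuppressed cells, and that every cell is non-negative.
    """
    table = np.asarray(table, dtype=float)
    n_rows, n_cols = table.shape
    n = n_rows * n_cols

    # Additivity: each row and each column must sum to its published total.
    A_eq = np.zeros((n_rows + n_cols, n))
    for i in range(n_rows):
        A_eq[i, i * n_cols:(i + 1) * n_cols] = 1.0
    for j in range(n_cols):
        A_eq[n_rows + j, j::n_cols] = 1.0
    b_eq = np.concatenate([table.sum(axis=1), table.sum(axis=0)])

    # Published cells are fixed at their value; suppressed cells are
    # only known to be non-negative.
    bounds = []
    for i in range(n_rows):
        for j in range(n_cols):
            if (i, j) in suppressed:
                bounds.append((0.0, None))
            else:
                bounds.append((table[i, j], table[i, j]))

    ranges = {}
    for (i, j) in suppressed:
        c = np.zeros(n)
        c[i * n_cols + j] = 1.0
        # Minimise, then maximise, the value of the target cell.
        low = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
        high = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
        if not (low.success and high.success):
            raise RuntimeError(f"LP failed for cell {(i, j)}")
        ranges[(i, j)] = (low.fun, -high.fun)
    return ranges


def is_at_risk(value, low, high, protection=0.1):
    """Hypothetical rule: the feasible range must extend at least
    `protection` * value on both sides of the true value."""
    margin = protection * value
    return (value - low) < margin or (high - value) < margin


# Illustrative 3x3 table with a rectangular suppression pattern.
table = [[20, 50, 10],
         [8, 19, 22],
         [17, 32, 12]]
pattern = {(0, 0), (0, 2), (2, 0), (2, 2)}
for cell, (low, high) in sorted(audit_suppression(table, pattern).items()):
    value = table[cell[0]][cell[1]]
    print(cell, round(low, 6), round(high, 6), is_at_risk(value, low, high))
```

In this illustrative table, the cell in the first row and first column (true value 20) can only be bounded to the interval from 8 to 30, so it would pass the assumed 10% rule. If that cell were suppressed on its own, the row total would reveal it exactly and the audit would return a range of zero width.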
Using an implementation of the Attacker Model in Python, we are comparing the efficacy of G-Confid against the existing graphical suppression method. Initial findings are promising in terms of both data utility and protection levels. Our next steps are to expand the set of test cases and to address computation time and model complexity for large tables.
For more information, please contact Zanya Barns.