Unsupervised machine learning for text analysis
Unsupervised machine learning (ML) can be used to group similar free-text responses together. It is intended for situations where no pre-labelled data exists on which to train a model. Rather than learning from pre-existing examples, as supervised approaches do, it seeks to detect patterns or themes directly in the data. Two unsupervised ML approaches were tested.
The first method was ‘clustering’, in which similar text responses are grouped together into clusters. This method was tested on Business Impacts of COVID-19 Survey data and provided some insights into common effects of COVID-19 on businesses. The two predominant clusters were ‘staff working from home’ and ‘closure’ (responses describing something closing).
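To make the idea of clustering text responses concrete, the following is a minimal sketch only. The source does not say which clustering algorithm or text representation was used, so this example swaps in a simple single-pass "leader" clustering over word-count vectors with cosine similarity. The function names, the stop-word list, the similarity threshold and the four example responses are all invented for illustration and are not survey data.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the source).
STOP_WORDS = {"the", "a", "an", "to", "of", "from", "are", "is",
              "our", "we", "have", "has", "and", "all", "for"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(responses, threshold=0.3):
    """Single-pass clustering: put each response in the first cluster whose
    summed term vector is similar enough, otherwise start a new cluster.
    Returns one integer cluster label per response."""
    centroids = []  # one Counter of summed term counts per cluster
    labels = []
    for text in responses:
        vec = Counter(tokenize(text))
        for idx, centroid in enumerate(centroids):
            if cosine(vec, centroid) >= threshold:
                centroid.update(vec)  # fold the response into the cluster
                labels.append(idx)
                break
        else:
            # No existing cluster was close enough: open a new one.
            centroids.append(vec)
            labels.append(len(centroids) - 1)
    return labels


# Invented example responses (hypothetical, not real survey answers).
responses = [
    "Staff are working from home",
    "Most staff now working from home",
    "Temporary closure of our shop",
    "Closure of the shop to customers",
]
labels = leader_cluster(responses)
print(labels)  # [0, 0, 1, 1]
```

In practice, responses that do not resemble any other response end up in clusters of their own, which is one way a clustering approach can leave a share of responses without a coherent group.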
The second method used probability distributions to identify major themes in text responses by finding the words most commonly associated with each topic. This method was used to analyse the Staff Wellbeing Survey. Common themes identified when staff were feeling very good were ‘sun shining, good weather’ and ‘no commute time, sleep’. For staff who were feeling very bad, one common topic was identified: ‘personal pressure, children’.
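The description above is consistent with a probabilistic topic model such as latent Dirichlet allocation (LDA), although the source does not name the algorithm. As a hedged illustration only, the sketch below fits a very small LDA model by collapsed Gibbs sampling using just the Python standard library. The function names, hyperparameters (`alpha`, `beta`, `n_topics`, `n_iter`) and the tokenised example responses are assumptions made for this example, not details of the ABS work.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on already-tokenised documents.
    Returns (topic_word_probs, doc_topic_counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                        # token counts per topic
    doc_topic = [[0] * n_topics for _ in docs]          # topic counts per document
    assign = []                                         # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove this token's current assignment from the counts.
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * n_vocab)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1

    # Smoothed probability of each vocabulary word under each topic.
    topic_word_probs = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + beta * n_vocab)
         for w in vocab}
        for k in range(n_topics)
    ]
    return topic_word_probs, doc_topic


def top_words(word_probs, n=3):
    """The n most probable words for one topic."""
    return sorted(word_probs, key=word_probs.get, reverse=True)[:n]


# Invented, pre-tokenised example responses (hypothetical, not survey data).
docs = [
    "sun shining good weather".split(),
    "good weather sun today".split(),
    "no commute more sleep".split(),
    "no commute time extra sleep".split(),
]
topics, doc_topic = fit_lda(docs)
for k, probs in enumerate(topics):
    print(k, top_words(probs))
```

Each topic is a probability distribution over words, and a theme label such as ‘sun shining, good weather’ would be read off from a topic's most probable words. Because sampling is random, the topics found depend on the seed and on the number of iterations, so the output of a toy run like this should be inspected rather than assumed.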
Both methods were quick, processing hundreds of text responses in under ten minutes. Although neither method grouped every text response correctly, both worked well to reduce the amount of manual analysis required. The clustering approach, applied to Cycle 1 of the Business Impacts of COVID-19 Survey data, successfully clustered about 30% of the free-text responses into coherent groups. The probability distribution method, applied to the Staff Wellbeing Survey, outperformed clustering analysis and successfully classified over 90% of free-text responses. Given further development, these methods show great potential for use in other ABS applications.
For more information, please contact Lisa-Maree Gulino at email@example.com.