Abstract by Wilson Fearn
Feature Hashing to Improve Interactive Topic Modeling Speeds
Interactive topic models let users explore large bodies of text without expertise in machine learning. This capability is becoming more important as the amount of data we want to analyze and process grows. Current topic modeling methods support interaction with large datasets, but not with datasets at web scale. We propose feature hashing the corpus vocabulary to speed up topic modeling without sacrificing topic usefulness. Our experiments show that this method is feasible and may cut the running time of topic modeling by as much as half while retaining topic quality.
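As a rough illustration of what feature hashing a vocabulary involves, the sketch below maps every token to one of a fixed number of buckets, so the effective vocabulary size is bounded regardless of how many distinct words the corpus contains. It is a minimal sketch under stated assumptions, not the implementation evaluated in this work: the CRC32 hash, the bucket count, and the names `hash_token`, `hash_document`, and `NUM_BUCKETS` are all illustrative choices.

```python
import zlib
from collections import Counter

# Assumed bucket count, chosen only for illustration.
NUM_BUCKETS = 16


def hash_token(token: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a token to a bucket index in [0, num_buckets).

    CRC32 is used because it is deterministic across runs, unlike
    Python's built-in hash() for strings. It is an assumed choice.
    """
    return zlib.crc32(token.encode("utf-8")) % num_buckets


def hash_document(tokens, num_buckets: int = NUM_BUCKETS) -> Counter:
    """Return bucket counts for one document.

    Distinct tokens that collide share a bucket, which is the
    trade-off feature hashing accepts for a smaller vocabulary.
    """
    counts = Counter()
    for token in tokens:
        counts[hash_token(token, num_buckets)] += 1
    return counts


# Hypothetical toy corpus, already tokenized.
corpus = [
    ["topic", "models", "summarize", "text"],
    ["hashing", "shrinks", "the", "vocabulary"],
]
hashed_corpus = [hash_document(doc) for doc in corpus]
```

A topic model trained on `hashed_corpus` would see at most `NUM_BUCKETS` word types. Choosing the bucket count trades vocabulary size against collisions between unrelated words.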