Topic modeling encompasses a variety of machine learning techniques for identifying latent themes in a corpus of documents. Generating an exact solution (i.e., finding a global optimum) is computationally intractable, so various optimization techniques (e.g., Variational Bayes or Gibbs sampling) are employed to approximate topic solutions by finding local optima. Such approximations typically begin with a random initialization, which leads to different results across different initializations.
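As a minimal sketch of this sensitivity (assuming gensim's LdaModel; the text names no specific library, and the toy corpus below is purely illustrative), two runs that differ only in their random seed can produce different topic solutions:

```python
# Illustrative only: train LDA twice, varying only the random seed,
# and observe that the resulting topics can differ between runs.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [["topic", "model", "stability"],
        ["gibbs", "sampling", "inference"],
        ["variational", "bayes", "inference"],
        ["topic", "model", "inference"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

for seed in (1, 2):
    lda = LdaModel(corpus, num_topics=2, id2word=dictionary,
                   random_state=seed, passes=10)
    # The top terms per topic often differ from seed to seed.
    print(f"seed={seed}:", [lda.show_topic(t, topn=3) for t in range(2)])
```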
A highly stable topic model produces topic solutions that are partially or completely identical across multiple runs. The term stability, as used here, refers to the similarity of topic solutions across multiple runs of a single topic model.
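One common way to operationalize this similarity (a sketch under assumptions; the function names below are illustrative and not drawn from the paper) is to match topics across two runs and average the Jaccard similarity of their top-term sets:

```python
# Illustrative stability measure: optimally pair the topics of two runs
# (Hungarian algorithm) and average the Jaccard similarity of the paired
# topics' top-term sets. A score of 1.0 means identical solutions.
from itertools import product
import numpy as np
from scipy.optimize import linear_sum_assignment

def top_term_jaccard_stability(topics_a, topics_b):
    """topics_a, topics_b: lists of top-term sets, one set per topic."""
    sim = np.zeros((len(topics_a), len(topics_b)))
    for i, j in product(range(len(topics_a)), range(len(topics_b))):
        inter = len(topics_a[i] & topics_b[j])
        union = len(topics_a[i] | topics_b[j])
        sim[i, j] = inter / union
    # Negate similarities so the minimum-cost assignment maximizes them.
    rows, cols = linear_sum_assignment(-sim)
    return sim[rows, cols].mean()

run1 = [{"topic", "model", "stability"}, {"gibbs", "sampling", "bayes"}]
run2 = [{"gibbs", "sampling", "inference"}, {"topic", "model", "corpus"}]
print(top_term_jaccard_stability(run1, run2))
```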
This paper reviews different approaches to measuring stability and different techniques intended to improve stability. Although a number of works have analyzed, measured, and/or improved stability, no single paper has provided a thorough review of the different stability metrics and the various techniques that improve stability.
Under revision for ACM Computing Surveys