Anirban Gangopadhyay
Paper download is intended for registered attendees only and is
subject to the IEEE Copyright Policy. Any other use is strictly forbidden.
Papers from this author
An Empirical Bayes Approach to Topic Modeling
Auto-TLDR; An Empirical Bayes Based Framework for Topic Modeling in Documents
Given a corpus of documents, we consider the problem of finding latent topics and introduce a novel Empirical Bayes-based framework that lets us choose the optimal topic modeling algorithm given the observed variables in the data. We specifically consider three disparate algorithms (LDA, graph clustering, and non-negative matrix factorization) and provide a standardized framework that compares the statistical and generative assumptions each algorithm makes. We then provide a model selection algorithm that scores each model by how well its assumptions match the data. We illustrate the efficacy of our approach by applying the framework to several document corpora and measuring the results empirically.
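The abstract does not state the paper's scoring rule. What follows is a minimal sketch of the general empirical Bayes idea only, not the authors' method. Each candidate generative model has its prior hyperparameter fitted by maximizing the marginal likelihood (evidence) over a grid. The candidate with the highest maximized evidence is then selected. The two toy candidates are hypothetical stand-ins and are not LDA, graph clustering, or NMF: one assumes a single word distribution shared by all documents, the other an independent distribution per document, both under a symmetric Dirichlet prior. The names `log_evidence`, `select_model`, and `ALPHA_GRID` are likewise illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Docs = Sequence[Sequence[int]]  # each document is a vector of word counts
Model = Callable[[Docs, float], float]

def log_evidence(counts: Sequence[int], alpha: float) -> float:
    """Log probability of a fixed token sequence with these word counts under
    a symmetric Dirichlet(alpha)-multinomial model (no multinomial coefficient)."""
    v, n = len(counts), sum(counts)
    total = math.lgamma(v * alpha) - math.lgamma(v * alpha + n)
    for c in counts:
        total += math.lgamma(alpha + c) - math.lgamma(alpha)
    return total

def shared_topic(docs: Docs, alpha: float) -> float:
    # Hypothetical model A: every document draws words from ONE distribution,
    # so the joint evidence depends only on the pooled counts.
    pooled = [sum(col) for col in zip(*docs)]
    return log_evidence(pooled, alpha)

def per_document(docs: Docs, alpha: float) -> float:
    # Hypothetical model B: each document has its own word distribution,
    # drawn independently from the same Dirichlet prior.
    return sum(log_evidence(d, alpha) for d in docs)

# Assumed hyperparameter grid (0.01 up to about 164), chosen for illustration.
ALPHA_GRID: List[float] = [0.01 * 2 ** k for k in range(15)]

def empirical_bayes_score(model: Model, docs: Docs) -> Tuple[float, float]:
    """Type-II maximum likelihood: pick alpha that maximizes the evidence."""
    best_alpha = max(ALPHA_GRID, key=lambda a: model(docs, a))
    return model(docs, best_alpha), best_alpha

def select_model(docs: Docs, models: Dict[str, Model]):
    """Return the name of the model with the highest maximized evidence."""
    scores = {name: empirical_bayes_score(fn, docs) for name, fn in models.items()}
    best = max(scores, key=lambda name: scores[name][0])
    return best, scores

MODELS: Dict[str, Model] = {"shared topic": shared_topic,
                            "per-document topics": per_document}

similar = [[4, 0, 0, 0]] * 3              # all documents use the same word
disjoint = [[4, 4, 0, 0], [0, 0, 4, 4]]   # documents use disjoint vocabularies
print(select_model(similar, MODELS)[0])   # prints "shared topic"
print(select_model(disjoint, MODELS)[0])  # prints "per-document topics"
```

The evidence rewards a model whose generative assumptions fit the data. For the `similar` corpus, the shared model assigns higher evidence at every grid value of alpha. For the `disjoint` corpus, the per-document model does. This loosely mirrors the abstract's idea of scoring models by how well their assumptions match the data.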