Anirban Gangopadhyay

Papers from this author

An Empirical Bayes Approach to Topic Modeling

Anirban Gangopadhyay

Auto-TLDR; An Empirical Bayes Based Framework for Topic Modeling in Documents

Given a corpus of documents, we consider the problem of finding latent topics and introduce a novel Empirical Bayes framework that allows us to choose the optimal topic modeling algorithm given the observed variables in the data. We specifically consider three disparate algorithms (LDA, graph clustering, and non-negative matrix factorization) and provide a standardized framework that compares the statistical and generative assumptions each algorithm makes. We then provide a model selection algorithm that evaluates each model based on how well its assumptions match the data. We illustrate the efficacy of our approach by applying our framework to different document corpora and empirically measuring the results.