


Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models

Ceci, Michelangelo (Ed.). Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2017, Skopje, Macedonia, September 18–22, 2017, Proceedings, Part II. Cham: Springer International Publishing, 2017, pp. 189–204

Year of publication: 2017

ISBN/ISSN: 978-3-319-71245-1; 978-3-319-71246-8

Publication type: Book chapter (conference paper)

Language: English


Abstract


Topic models for text analysis are most commonly trained using either Gibbs sampling or variational Bayes. Recently, hybrid variational-Gibbs algorithms have been found to combine the best of both worlds. Variational algorithms are fast to converge and more efficient for inference on new documents. Gibbs sampling enables sparse updates since each token is only associated with one topic instead of a distribution over all topics. Additionally, Gibbs sampling is unbiased. Although Gibbs sampling takes longer to converge, it is guaranteed to arrive at the true posterior after infinitely many iterations. By combining the two methods it is possible to reduce the bias of variational methods while simultaneously speeding up variational updates. This idea has previously been applied to standard latent Dirichlet allocation (LDA). We propose a new sampling method that enables the application of the idea to the nonparametric version of LDA, hierarchical Dirichlet process topic models. Our fast sampling method leads to a significant speedup of variational updates as compared to other sampling methods. Experiments show that training of our topic model converges to a better log-likelihood than previously existing variational methods and converges faster than Gibbs sampling in the batch setting.
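
To illustrate the sparse token-topic updates the abstract refers to, the following is a minimal sketch of one collapsed Gibbs sampling sweep for standard LDA, in which each token carries a single topic assignment rather than a distribution over all topics. This is not the paper's hybrid variational-Gibbs algorithm or its HDP extension; the toy corpus, function names, and hyperparameter values are hypothetical.

import numpy as np

def gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, V, rng):
    """Resample the topic of every token once (one collapsed Gibbs sweep for LDA)."""
    K = n_k.shape[0]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[d][i]
            # Remove the token's current assignment from the count tables.
            n_dk[d, k_old] -= 1
            n_kw[k_old, w] -= 1
            n_k[k_old] -= 1
            # Full conditional p(z = k | rest) for collapsed LDA.
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            p /= p.sum()
            k_new = rng.choice(K, p=p)
            # Add the token back under its (possibly new) topic.
            z[d][i] = k_new
            n_dk[d, k_new] += 1
            n_kw[k_new, w] += 1
            n_k[k_new] += 1

# Tiny hypothetical corpus: documents as lists of word ids from a vocabulary of size V.
rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [3, 4, 3, 0]]
V, K, alpha, beta = 5, 2, 0.1, 0.01
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]   # one topic per token
n_dk = np.zeros((len(docs), K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

for _ in range(50):
    gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, V, rng)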

Authors


Burkhardt, Sophie (Author)
Kramer, Stefan (Author)

Classification


DFG subject area:
Computer Science

DDC subject group:
Computer Science