Efficient Collapsed Gibbs Sampling For Latent Dirichlet Allocation

Asian Conference on Machine Learning (ACML)

Authors: Han Xiao and Thomas Stibor
Year/month: 2010/
Booktitle: Asian Conference on Machine Learning (ACML)
Volume: 13
Series: JMLR W&CP
Address: Japan
Note: (AR: 31%)
Fulltext: XiaoStiborACML2.pdf

Abstract

Collapsed Gibbs sampling is a frequently applied method to approximate intractable integrals in probabilistic generative models such as latent Dirichlet allocation. However, this sampling method has the crucial drawback of high computational complexity, which limits its applicability to large data sets. We propose a novel dynamic sampling strategy to significantly improve the efficiency of collapsed Gibbs sampling. The strategy is explored in terms of efficiency, convergence and perplexity. In addition, we present a straightforward parallelization to further improve efficiency. Finally, we support our proposed improvements with a comparative study on data sets of different scales.
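
For context, the following is a minimal sketch of the standard (baseline) collapsed Gibbs sampler for latent Dirichlet allocation, the method whose cost the abstract refers to. It is not the dynamic sampling strategy or the parallelization proposed in the paper. The function name collapsed_gibbs_lda, the hyperparameter defaults, and the toy corpus are illustrative assumptions. Each sweep resamples the topic of every token from a distribution over all K topics, so one sweep over N tokens costs on the order of N times K operations.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, num_iters=50, seed=0):
    """Baseline collapsed Gibbs sampler for LDA (illustrative sketch only).

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). Returns the topic assignments and count tables.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic
    z = []                                                # topic of every token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(num_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Draw a new topic and add the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw, n_k


if __name__ == "__main__":
    toy_docs = [[0, 1, 2, 0], [3, 4, 3], [0, 3, 1, 4, 2]]  # hypothetical corpus
    z, n_dk, n_kw, n_k = collapsed_gibbs_lda(toy_docs, num_topics=2, vocab_size=5)
    print(z)
```

The inner list comprehension over all topics is the per-token cost that makes the baseline expensive on large corpora.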

Bibtex:

@inproceedings { hanxiao2010b,
author = { Han Xiao and Thomas Stibor},
title = { Efficient Collapsed Gibbs Sampling For Latent Dirichlet Allocation },
year = { 2010 },
booktitle = { Asian Conference on Machine Learning (ACML) },
volume = { 13 },
series = { JMLR W\&CP },
address = { Japan },
note = { (AR: 31\%) },
url = {https://www.sec.in.tum.de/i20/publications/efficient-collapsed-gibbs-sampling-for-latent-dirichlet-allocation/@@download/file/XiaoStiborACML2.pdf}
}