WETM: A word embedding-based topic model with modified collapsed Gibbs sampling for short text

Rashid, Junaid; Kim, Jungeun; Hussain, Amir; Naseem, Usman


Abstract

Short texts are a common source of knowledge, and extracting the valuable information they contain is beneficial for many purposes. Traditional topic models cannot analyze the internal structural information of topics: they rely mostly on word co-occurrence at the document level and, because short texts have limited length, often fail to extract semantically relevant topics from short text datasets. Even topic models that are sensitive to word order perform poorly on short texts because of the strong sparsity of the data. In this paper, we propose a novel word embedding-based topic model (WETM) for short text documents that discovers the structural information of topics and words and alleviates the sparsity problem. Moreover, a modified collapsed Gibbs sampling algorithm is proposed to strengthen the semantic coherence of topics in short texts. WETM extracts semantically coherent topics from short texts and finds relationships between words. Extensive experiments on two real-world datasets show that WETM achieves better topic quality, topic coherence, classification, and clustering results, while requiring less execution time than traditional topic models.
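For context on the sampling machinery the abstract refers to, the sketch below shows a minimal collapsed Gibbs sampler for plain LDA, the baseline that WETM's modified sampler builds on. This is an illustrative assumption, not the paper's algorithm: WETM's modification incorporates word-embedding information into the conditional distribution, which is not reproduced here, and all function and variable names are hypothetical.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iters=50, seed=0):
    """Minimal collapsed Gibbs sampler for standard LDA (illustrative only).

    docs is a list of documents, each a list of integer word ids.
    Returns the per-token topic assignments and the count tables.
    """
    rng = random.Random(seed)
    n_tw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    n_dt = [[0] * n_topics for _ in range(len(docs))]   # doc-topic counts
    n_t = [0] * n_topics                                # tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_tw[k][w] += 1
            n_dt[d][k] += 1
            n_t[k] += 1
        z.append(zd)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment from the counts.
                n_tw[k][w] -= 1; n_dt[d][k] -= 1; n_t[k] -= 1
                # Collapsed conditional (theta and phi integrated out):
                # p(k) ∝ (n_dt + alpha) * (n_tw + beta) / (n_t + V*beta)
                weights = [(n_dt[d][t] + alpha)
                           * (n_tw[t][w] + beta) / (n_t[t] + vocab_size * beta)
                           for t in range(n_topics)]
                r = rng.random() * sum(weights)
                for t, wgt in enumerate(weights):
                    r -= wgt
                    if r <= 0:
                        k = t
                        break
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_tw[k][w] += 1; n_dt[d][k] += 1; n_t[k] += 1
    return z, n_tw, n_dt
```

Because the document-topic and topic-word distributions are integrated out ("collapsed"), only the count tables need to be maintained, which is what makes Gibbs sampling for topic models fast and memory-light on short texts.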

Journal Article Type: Article
Acceptance Date: Jun 7, 2023
Online Publication Date: Jun 8, 2023
Publication Date: Aug 2023
Deposit Date: Aug 10, 2023
Publicly Available Date: Jun 9, 2024
Journal: Pattern Recognition Letters
Print ISSN: 0167-8655
Publisher: Elsevier
Peer Reviewed: Yes
Volume: 172
Pages: 158-164
DOI: https://doi.org/10.1016/j.patrec.2023.06.007