Junaid Rashid
WETM: A word embedding-based topic model with modified collapsed Gibbs sampling for short text
Rashid, Junaid; Kim, Jungeun; Hussain, Amir; Naseem, Usman
Abstract
Short texts are a common source of knowledge, and extracting that information is useful for many purposes. Traditional topic models cannot analyze the internal structural information of topics. They rely mostly on document-level word co-occurrence, and because short texts contain so few words, they often fail to extract semantically relevant topics from short text datasets. Some traditional topic models are sensitive to word order, but the strong sparsity of the data means they still do not perform well on short texts. In this paper, we propose a novel word embedding-based topic model (WETM) for short text documents that discovers the structural information of topics and words and eliminates the sparsity problem. We also propose a modified collapsed Gibbs sampling algorithm to strengthen the semantic coherence of topics in short texts. WETM extracts semantically coherent topics from short texts and finds relationships between words. Extensive experiments on two real-world datasets show that WETM achieves better topic quality, topic coherence, classification, and clustering results, and requires less execution time, than traditional topic models.
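For readers unfamiliar with the sampler that the abstract says WETM modifies, the following is a minimal sketch of standard collapsed Gibbs sampling for LDA. It is not the modified algorithm from the paper, whose details the abstract does not give, and it uses no word embeddings. The function name, the hyperparameters `alpha` and `beta`, and the toy corpus are assumptions made for illustration.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iters=50, seed=0):
    """Standard collapsed Gibbs sampling for LDA (illustrative only).

    docs is a list of documents, each a list of integer word ids.
    This is NOT the modified sampler proposed for WETM.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                              # n_k
    assign = []

    # Random initialisation of one topic per word token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return assign, doc_topic, topic_word


# Toy corpus of four very short documents over a 4-word vocabulary (made up).
docs = [[0, 1, 0], [2, 3, 3], [0, 1], [3, 2]]
assign, doc_topic, topic_word = collapsed_gibbs_lda(docs, num_topics=2, vocab_size=4)
```

With documents this short, the per-document counts `doc_topic` hold only two or three tokens each, which illustrates the sparsity problem the abstract describes for co-occurrence-based models.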
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Jun 7, 2023 |
Online Publication Date | Jun 8, 2023 |
Publication Date | Aug 2023 |
Deposit Date | Aug 10, 2023 |
Publicly Available Date | Jun 9, 2024 |
Journal | Pattern Recognition Letters |
Print ISSN | 0167-8655 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 172 |
Pages | 158-164 |
DOI | https://doi.org/10.1016/j.patrec.2023.06.007 |
Files
This file is under embargo until Jun 9, 2024 due to copyright reasons.
Contact repository@napier.ac.uk to request a copy for personal use.