Exploiting various word embedding models for query expansion in microblog

Ahmed, Shafayet; Chy, Abu Nowshed; Ullah, Md Zia

Authors

Shafayet Ahmed

Abu Nowshed Chy

Md Zia Ullah

Abstract

Microblogs, especially Twitter, make real-time communication easier and are treated as a valuable information source. As the volume of tweets grows, it becomes increasingly useful to extract essential information from such diverse content. However, because of Twitter's length constraint, users typically rely on unfamiliar short forms, ambiguous expressions, Twitter-specific syntax, and URLs to convey their thoughts briefly. These factors cause a severe vocabulary mismatch problem and make effective information retrieval (IR) on Twitter difficult. In this paper, we propose a query expansion method that augments the initial query with expansion terms that reflect the user’s intent. To select candidate expansion terms, we exploit several word embedding models, including Word2Vec, GloVe, and fastText, trained on different local and external corpora. Our ensemble word embedding approach helps to extract effective contextual features of terms. Next, we rank the candidate terms by the mean cosine similarity score of each query-term pair and use the top-ranked terms to augment the initial query. We performed experiments on the TREC Microblog 2011-2012 test sets, which cover the TREC Tweets2011 corpus. The experimental results demonstrate the efficacy of our query expansion method over other competitive approaches.
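The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cosine`, `expand_query`), the `top_k` parameter, the two-dimensional toy vectors, and the choice to average over every model and every query term at once are all assumptions made for illustration. In practice each dictionary would be replaced by a trained Word2Vec, GloVe, or fastText model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query_terms, candidates, models, top_k=2):
    """Append the top_k candidates ranked by mean cosine similarity.

    `models` is a list of {term: vector} dictionaries, one per embedding
    model. Assumption: the mean is taken over all (model, query term)
    pairs in which both the candidate and the query term have a vector.
    """
    scored = []
    for cand in candidates:
        if cand in query_terms:
            continue  # do not re-add a term that is already in the query
        sims = []
        for vectors in models:
            if cand not in vectors:
                continue  # this model has no vector for the candidate
            for q in query_terms:
                if q in vectors:
                    sims.append(cosine(vectors[q], vectors[cand]))
        if sims:
            scored.append((sum(sims) / len(sims), cand))
    # Highest mean similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return list(query_terms) + [cand for _, cand in scored[:top_k]]


# Hypothetical toy embeddings standing in for two trained models.
model_a = {
    "flu": [1.0, 0.0], "vaccine": [0.9, 0.1], "influenza": [1.0, 0.1],
    "shot": [0.7, 0.3], "football": [0.0, 1.0],
}
model_b = {
    "flu": [0.0, 1.0], "vaccine": [0.1, 0.9], "influenza": [0.0, 1.0],
    "shot": [0.4, 0.6], "football": [1.0, 0.0],
}

expanded = expand_query(
    ["flu", "vaccine"], ["influenza", "football", "shot"], [model_a, model_b]
)
print(expanded)
```

With these toy vectors, "influenza" and "shot" score well above "football" in both models, so they are the two terms appended to the query. Averaging across models is one simple way to combine an ensemble; other combinations, such as weighting each model, would fit the same structure.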

Citation

Ahmed, S., Chy, A. N., & Ullah, M. Z. (2020). Exploiting various word embedding models for query expansion in microblog. In 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC). https://doi.org/10.1109/R10-HTC49770.2020.9357016

Presentation Conference Type: Conference Paper (Published)
Conference Name: 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC)
Start Date: Dec 1, 2020
End Date: Dec 3, 2020
Online Publication Date: Feb 23, 2021
Publication Date: 2020
Deposit Date: Mar 22, 2023
Publisher: Institute of Electrical and Electronics Engineers
Series ISSN: 2572-7621
Book Title: 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC)
DOI: https://doi.org/10.1109/R10-HTC49770.2020.9357016
Keywords: microblog search, query expansion, contextual information, Word2Vec, GloVe, fastText