Shafayet Ahmed
Exploiting various word embedding models for query expansion in microblog
Ahmed, Shafayet; Chy, Abu Nowshed; Ullah, Md Zia
Abstract
Microblogs, especially Twitter, make it easy to communicate with others in real time and are treated as a valuable information source. As the number of tweets grows, it becomes increasingly appealing to extract essential information from such diverse content. However, due to the length constraint on Twitter, users typically rely on unfamiliar short forms, ambiguous expressions, Twitter-specific syntax, and URLs to convey their brief thoughts. All of these aspects cause a severe vocabulary mismatch problem and make it difficult to perform effective information retrieval (IR) on Twitter. In this paper, we propose a query expansion method that augments the initial queries with expansion terms that reflect the user's intent effectively. To select effective candidate expansion terms, we exploit various word embedding models, including Word2Vec, GloVe, and fastText, trained on different local and external corpora. Our ensemble word embedding approach helps to extract effective contextual features of terms. Next, we rank the candidate terms by the mean cosine similarity score of each query-term pair and use the top-ranked terms to augment the initial query. We performed experiments on the TREC Microblog 2011-2012 test sets, which cover the TREC Tweets2011 corpus. Experimental results demonstrate the efficacy of our query expansion method over other competitive approaches.
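The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the function names (`cosine`, `rank_candidates`, `expand_query`), the `top_k` parameter, and the choice to average similarities over every query term and every model are all assumptions made for illustration. In practice the vectors would come from trained Word2Vec, GloVe, and fastText models rather than hand-written dictionaries.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query_terms, candidates, models):
    """Score each candidate by its mean cosine similarity to the query terms.

    `models` is a list of dicts mapping term -> vector; each dict stands in
    for one embedding model (assumption). Similarities are only compared
    within a model, so the models may have different dimensions.
    """
    scores = {}
    for cand in candidates:
        sims = []
        for vectors in models:
            if cand not in vectors:
                continue  # this model has no vector for the candidate
            for q in query_terms:
                if q in vectors:
                    sims.append(cosine(vectors[q], vectors[cand]))
        if sims:
            scores[cand] = sum(sims) / len(sims)
    # Highest mean similarity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def expand_query(query_terms, candidates, models, top_k=2):
    """Append the top_k ranked candidate terms to the original query terms."""
    ranked = rank_candidates(query_terms, candidates, models)
    return list(query_terms) + [term for term, _ in ranked[:top_k]]


# Toy stand-ins for two embedding models (hypothetical values).
model_a = {
    "earthquake": [1.0, 0.0],
    "quake": [0.9, 0.1],
    "tremor": [0.8, 0.3],
    "pizza": [0.0, 1.0],
}
model_b = {
    "earthquake": [0.0, 1.0, 0.0],
    "quake": [0.1, 0.9, 0.0],
    "tremor": [0.2, 0.7, 0.1],
    "pizza": [1.0, 0.0, 0.0],
}

ranked = rank_candidates(["earthquake"], ["quake", "tremor", "pizza"], [model_a, model_b])
expanded = expand_query(["earthquake"], ["quake", "tremor", "pizza"], [model_a, model_b], top_k=2)
print(expanded)
```

Averaging over models is only one way to combine an ensemble; the paper may weight or select models differently.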
Citation
Ahmed, S., Chy, A. N., & Ullah, M. Z. (2020). Exploiting various word embedding models for query expansion in microblog. In 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC). https://doi.org/10.1109/R10-HTC49770.2020.9357016
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC) |
Start Date | Dec 1, 2020 |
End Date | Dec 3, 2020 |
Online Publication Date | Feb 23, 2021 |
Publication Date | 2020 |
Deposit Date | Mar 22, 2023 |
Publisher | Institute of Electrical and Electronics Engineers |
Series ISSN | 2572-7621 |
Book Title | 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC) |
DOI | https://doi.org/10.1109/R10-HTC49770.2020.9357016 |
Keywords | microblog search, query expansion, contextual information, Word2Vec, GloVe, fastText |