Dr Md Zia Ullah M.Ullah@napier.ac.uk
Lecturer
A bipartite graph-based ranking approach to query subtopics diversification focused on word embedding features
Ullah, Md Zia; Aono, Masaki
Authors
Masaki Aono
Abstract
Web search queries are often vague, ambiguous, or multi-intent, and different users issuing the same query may have different search intents. Understanding these intents by mining the subtopics underlying a query has attracted much interest in recent years. Query suggestions provided by search engines capture some intents of the original query; however, suggested queries are often noisy and contain groups of alternative queries with similar meaning. Therefore, identifying the subtopics that cover the possible intents behind a query is a formidable task. Moreover, because both the query and the subtopics are short, it is challenging to estimate the similarity between a pair of short texts and rank them accordingly. In this paper, we propose a method for mining and ranking subtopics in which we introduce multiple semantic and content-aware features, a bipartite graph-based ranking (BGR) method, and a similarity function for short texts. Given a query, we aggregate the suggested queries from search engines as candidate subtopics and estimate their relevance to the given query based on word embedding and content-aware features by modeling a bipartite graph. To estimate the similarity between two short texts, we propose a Jensen-Shannon divergence based similarity function that uses the probability distributions of the terms in the top retrieved documents from a search engine. A diversified ranked list of subtopics covering the possible intents of a query is assembled by balancing relevance and novelty. We evaluated our method on the NTCIR-10 INTENT-2 and NTCIR-12 IMINE-2 subtopic mining test collections. Our proposed method outperforms the baselines, known related methods, and the official participants of the INTENT-2 and IMINE-2 competitions.
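As a rough illustration of the short-text similarity idea described in the abstract, the sketch below compares two short texts through the term distributions of documents retrieved for each of them. It is a minimal sketch under stated assumptions, not the authors' formulation: the whitespace tokenizer, the unsmoothed maximum-likelihood term estimates, the choice of `1 - JSD` as the similarity, and the function names `term_distribution`, `js_divergence`, and `js_similarity` are all hypothetical, and the retrieved documents are passed in as plain strings rather than fetched from a search engine.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Unigram term distribution over a list of document strings.

    Assumption: lowercase whitespace tokenization and unsmoothed
    maximum-likelihood estimates; the paper may use something else.
    """
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2), which lies in [0, 1]."""
    divergence = 0.0
    for term in set(p) | set(q):
        pt = p.get(term, 0.0)
        qt = q.get(term, 0.0)
        mt = 0.5 * (pt + qt)  # mixture distribution
        if pt > 0:
            divergence += 0.5 * pt * math.log2(pt / mt)
        if qt > 0:
            divergence += 0.5 * qt * math.log2(qt / mt)
    return divergence


def js_similarity(docs_a, docs_b):
    """Similarity of two short texts via their retrieved documents.

    Assumption: similarity is taken as 1 - JSD, so identical
    distributions score 1 and disjoint vocabularies score 0.
    """
    return 1.0 - js_divergence(term_distribution(docs_a),
                               term_distribution(docs_b))
```

In use, `docs_a` and `docs_b` would hold the text of the top retrieved documents for the query and for a candidate subtopic, respectively; how many documents to retrieve is left open here.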
Citation
Ullah, M. Z., & Aono, M. (2016). A bipartite graph-based ranking approach to query subtopics diversification focused on word embedding features. IEICE Transactions on Information and Systems, 99(12), 3090-3100. https://doi.org/10.1587/transinf.2016EDP7190
Journal Article Type | Article |
---|---|
Publication Date | 2016-12 |
Deposit Date | Mar 13, 2023 |
Journal | IEICE Transactions on Information and Systems |
Print ISSN | 0916-8532 |
Electronic ISSN | 1745-1361 |
Publisher | Institute of Electronics, Information and Communication Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 99 |
Issue | 12 |
Pages | 3090-3100 |
DOI | https://doi.org/10.1587/transinf.2016EDP7190 |
Keywords | subtopic mining, query intent, diversification, word embedding, bipartite graph |