A bipartite graph-based ranking approach to query subtopics diversification focused on word embedding features

Ullah, Md Zia; Aono, Masaki



Web search queries are often vague or ambiguous, or carry multiple intents, and different users may have different intents when issuing the same query. Understanding these intents by mining the subtopics underlying a query has attracted much interest in recent years. Query suggestions provided by search engines capture some intents of the original query; however, suggested queries are often noisy and include groups of alternative queries with similar meanings. Identifying the subtopics that cover the possible intents behind a query is therefore a formidable task. Moreover, because both the query and its subtopics are short, it is challenging to estimate the similarity between a pair of short texts and rank them accordingly. In this paper, we propose a method for mining and ranking subtopics that introduces multiple semantic and content-aware features, a bipartite graph-based ranking (BGR) method, and a similarity function for short texts. Given a query, we aggregate the queries suggested by search engines as candidate subtopics and estimate their relevance to the query from word embedding and content-aware features by modeling a bipartite graph. To estimate the similarity between two short texts, we propose a Jensen-Shannon divergence-based similarity function computed over the probability distributions of the terms in the top documents retrieved from a search engine. A diversified ranked list of subtopics covering the possible intents of a query is then assembled by balancing relevance and novelty. We evaluated our method on the NTCIR-10 INTENT-2 and NTCIR-12 IMINE-2 subtopic mining test collections. The proposed method outperforms the baselines, known related methods, and the official participants of the INTENT-2 and IMINE-2 competitions.
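
As an illustration only, the following Python sketch shows one way the two ideas named in the abstract could fit together: a Jensen-Shannon divergence-based similarity between two short texts, each represented by the term distribution of its retrieved documents, and a greedy selection that trades relevance against novelty. The abstract does not give the exact formulas, so everything here is an assumption rather than the authors' published method: the helper names (`term_distribution`, `js_similarity`, `diversify`), whitespace tokenization, the `1 - JSD` mapping from divergence to similarity, and the MMR-style trade-off with weight `lam`.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Unigram distribution over whitespace tokens (tokenization is an assumption)."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the value lies in [0, 1]."""
    vocab = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a):
        # KL(A || M); terms with zero probability in A contribute nothing.
        return sum(pa * math.log2(pa / m[t]) for t, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def js_similarity(docs_a, docs_b):
    """Assumed mapping from divergence to similarity: 1 - JSD."""
    return 1.0 - js_divergence(term_distribution(docs_a), term_distribution(docs_b))


def diversify(relevance, similarity, k, lam=0.5):
    """Greedy MMR-style selection (a stand-in for the paper's diversification step).

    relevance:  dict mapping candidate subtopic -> relevance score
    similarity: function (candidate, candidate) -> similarity in [0, 1]
    """
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < k:
        def score(c):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In practice `docs_a` and `docs_b` would be the top documents (or snippets) a search engine returns for each short text, and `relevance` would come from the relevance estimation step; both are left as inputs here because the abstract does not specify them.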

Journal Article Type: Article
Publication Date: 2016-12
Deposit Date: Mar 13, 2023
Journal: IEICE Transactions on Information and Systems
Print ISSN: 0916-8532
Publisher: Institute of Electronics, Information and Communication Engineers
Peer Reviewed: Yes
Volume: 99
Issue: 12
Pages: 3090-3100
Keywords: subtopic mining, query intent, diversification, word embedding, bipartite graph