A novel multiple kernel fuzzy topic modeling technique for biomedical data

Rashid, Junaid; Kim, Jungeun; Hussain, Amir; Naseem, Usman; Juneja, Sapna

Authors

Junaid Rashid

Jungeun Kim

Usman Naseem

Sapna Juneja



Abstract

Background: Text mining in the biomedical field has received much attention and is regarded as an important research area, since a large share of biomedical data is in text format. Topic modeling is one of the most popular text mining techniques for discovering hidden semantic structures, so-called topics. However, discovering topics from biomedical data is challenging because the data are sparse, redundant, and unstructured.

Methods: In this paper, we propose a novel multiple kernel fuzzy topic modeling (MKFTM) technique for biomedical text mining that uses fusion probabilistic inverse document frequency and the multiple kernel fuzzy c-means clustering algorithm. In detail, the proposed fusion probabilistic inverse document frequency method estimates the weights of global terms, while MKFTM generates the frequencies of local and global terms with bag-of-words. In addition, principal component analysis is applied to eliminate higher-order negative effects on the term weights.

Results: Extensive experiments were conducted on six biomedical datasets. MKFTM achieved the highest classification accuracies of 99.04%, 99.62%, 99.69%, and 99.61% on the Muchmore Springer dataset and 94.10%, 89.45%, 92.91%, and 90.35% on the Ohsumed dataset. The CH index value of MKFTM is also higher, which shows that its clustering performance is better than that of state-of-the-art topic models.

Conclusion: The results confirm that the proposed MKFTM approach handles the sparsity and redundancy problems in biomedical text documents efficiently. MKFTM discovers semantically relevant topics with high accuracy and gives better classification and clustering results for biomedical documents. MKFTM is a new approach to topic modeling that has the flexibility to work with a variety of clustering methods.
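The pipeline outlined in the Methods part of the abstract (bag-of-words counts, inverse-document-frequency term weights, principal component analysis, then fuzzy clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the formula for the fusion probabilistic inverse document frequency, so a smoothed probabilistic IDF is assumed here, and plain single-kernel fuzzy c-means stands in for the multiple kernel variant. All function names, the toy documents, and the parameter values are hypothetical.

```python
import numpy as np

def bag_of_words(docs):
    """Return a document-term count matrix and its vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            counts[i, index[w]] += 1
    return counts, vocab

def probabilistic_idf(counts):
    """Assumed global term weight: a smoothed probabilistic IDF.
    The paper's fusion formula is not stated in the abstract."""
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)  # number of documents containing each term
    return np.log1p((n_docs - df + 0.5) / (df + 0.5))

def pca(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means (Euclidean distance, no kernels).
    Returns soft memberships (rows sum to 1) and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

# Hypothetical toy corpus, for illustration only.
docs = [
    "insulin glucose diabetes insulin",
    "glucose diabetes blood",
    "tumor biopsy cancer tumor",
    "cancer tumor therapy",
]
counts, vocab = bag_of_words(docs)
weights = probabilistic_idf(counts)          # global term weights
weighted = counts * weights                  # local frequency x global weight
reduced = pca(weighted, n_components=2)      # reduce the term space
memberships, centers = fuzzy_c_means(reduced, n_clusters=2)
print(np.round(memberships, 3))
```

Each row of `memberships` gives one document's degree of membership in each cluster, which plays the role of a soft document-topic assignment in this simplified setting.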

Citation

Rashid, J., Kim, J., Hussain, A., Naseem, U., & Juneja, S. (2022). A novel multiple kernel fuzzy topic modeling technique for biomedical data. BMC Bioinformatics, 23(1), Article 275. https://doi.org/10.1186/s12859-022-04780-1

Journal Article Type: Article
Acceptance Date: Jun 8, 2022
Online Publication Date: Jul 12, 2022
Publication Date: 2022
Deposit Date: Jul 18, 2022
Publicly Available Date: Jul 18, 2022
Journal: BMC Bioinformatics
Print ISSN: 1471-2105
Publisher: BMC
Peer Reviewed: Peer Reviewed
Volume: 23
Issue: 1
Article Number: 275
DOI: https://doi.org/10.1186/s12859-022-04780-1
Keywords: Topic modeling, Medical data, Multiple kernel fuzzy topic modeling, MKFTM, Classification, Clustering
Public URL: http://researchrepository.napier.ac.uk/Output/2889619

Files

A novel multiple kernel fuzzy topic modeling technique for biomedical data (2.1 Mb)
PDF

Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/

Copyright Statement
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.