S Muhammad Ahmed Hassan Shah
A Hybrid Neuro-Fuzzy Approach for Heterogeneous Patch Encoding in ViTs Using Contrastive Embeddings and Deep Knowledge Dispersion
Shah, S Muhammad Ahmed Hassan; Khan, Muhammad Qasim; Ghadi, Yazeed Yasin; Jan, Sana Ullah; Mzoughi, Olfa; Hamdi, Monia
Authors
Muhammad Qasim Khan
Yazeed Yasin Ghadi
Dr Sanaullah Jan S.Jan@napier.ac.uk
Lecturer
Olfa Mzoughi
Monia Hamdi
Abstract
Vision Transformers (ViTs) are commonly used in image recognition and related applications. They deliver impressive results when pre-trained on massive volumes of data and then applied to mid-sized or small-scale image recognition benchmarks such as ImageNet and CIFAR-100. A ViT converts an image into patches, and a patch encoding module then produces latent embeddings (linear projection and positional embedding). In this work, the patch encoding module is modified to produce heterogeneous embeddings by using new types of weighted encoding. A traditional transformer uses two embeddings: linear projection and positional embedding. The proposed model replaces these with a weighted combination of the linear projection embedding, the positional embedding, and three additional embeddings called Spatial Gated, Fourier Token Mixing, and Multi-layer Perceptron Mixture embeddings. Second, a Divergent Knowledge Dispersion (DKD) mechanism is proposed to propagate earlier latent information farther into the transformer network. It ensures that this latent knowledge is used in multi-headed attention for efficient patch encoding. Four benchmark datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) are used for comparative performance evaluation. The proposed model is named SWEKP-based ViT, where SWEKP stands for Stochastic Weighted Composition of Contrastive Embeddings & Divergent Knowledge Dispersion (DKD) for Heterogeneous Patch Encoding. The experimental results show that adding extra embeddings to the transformer and integrating the DKD mechanism increases performance on the benchmark datasets. The ViT was trained separately with combinations of these embeddings for encoding. In conclusion, the spatial gated embedding combined with the default embeddings outperforms the Fourier Token Mixing and MLP-Mixture embeddings.
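The patch encoding idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It splits an image into patches, builds a linear projection embedding, a positional embedding, and one extra embedding (an FNet-style Fourier token mixing), and then combines them with normalized weights. The function names, the random stand-ins for learned parameters, and the softmax-of-random-logits weighting are all assumptions made for illustration; the abstract does not specify how the weights are obtained, and the Spatial Gated and MLP-Mixture embeddings are omitted.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

def fourier_token_mixing(tokens):
    """FNet-style mixing: real part of a 2D FFT over tokens and features."""
    return np.fft.fft2(tokens).real

def encode_patches(image, patch=4, dim=16, seed=0):
    """Hypothetical weighted combination of three patch embeddings."""
    rng = np.random.default_rng(seed)
    patches = patchify(image, patch)           # (num_patches, patch_dim)
    n, p = patches.shape

    # Random stand-ins for parameters that would normally be learned.
    w_proj = rng.normal(0.0, 0.02, size=(p, dim))
    positional = rng.normal(0.0, 0.02, size=(n, dim))

    linear = patches @ w_proj                  # linear projection embedding
    fourier = fourier_token_mixing(linear)     # extra embedding

    # Assumption: mixing weights are a softmax over random logits.
    logits = rng.normal(size=3)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    mixed = (weights[0] * linear
             + weights[1] * positional
             + weights[2] * fourier)
    return mixed, weights

image = np.random.default_rng(1).random((32, 32, 3))   # CIFAR-sized input
tokens, weights = encode_patches(image)
print(tokens.shape)   # (64, 16): 8 x 8 patches, 16 features each
```

A 32 x 32 image with 4 x 4 patches yields 64 tokens, and each token is a convex combination of the three embeddings because the weights sum to one.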
Citation
Shah, S. M. A. H., Khan, M. Q., Ghadi, Y. Y., Jan, S. U., Mzoughi, O., & Hamdi, M. (2023). A Hybrid Neuro-Fuzzy Approach for Heterogeneous Patch Encoding in ViTs Using Contrastive Embeddings and Deep Knowledge Dispersion. IEEE Access, 11, 83171-83186. https://doi.org/10.1109/access.2023.3302253
Journal Article Type | Article |
---|---|
Acceptance Date | Jul 31, 2023 |
Online Publication Date | Aug 4, 2023 |
Publication Date | 2023 |
Deposit Date | Aug 8, 2023 |
Publicly Available Date | Aug 8, 2023 |
Journal | IEEE Access |
Electronic ISSN | 2169-3536 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 11 |
Pages | 83171-83186 |
DOI | https://doi.org/10.1109/access.2023.3302253 |
Keywords | vision transformer, patch encoding, spatial gated unit, Fourier token mixing, MLP-mixture embedding, computer vision |
Files
A Hybrid Neuro-Fuzzy Approach for Heterogeneous Patch Encoding in ViTs Using Contrastive Embeddings and Deep Knowledge Dispersion
(2.4 Mb)
PDF
Publisher Licence URL
http://creativecommons.org/licenses/by-nc-nd/4.0/