A transformer-based approach empowered by a self-attention technique for semantic segmentation in remote sensing

Authors
Wadii Boulila
Hamza Ghandorh
Sharjeel Masood
Ayyub Alzahem
Anis Koubaa
Fawad Ahmed
Zahid Khan
Dr Jawad Ahmad J.Ahmad@napier.ac.uk (Visiting Lecturer)
Abstract
Semantic segmentation of Remote Sensing (RS) images involves classifying each pixel of a satellite image into distinct, non-overlapping regions or segments. The task is crucial in domains such as land cover classification, autonomous driving, and scene understanding. While deep learning has shown promising results, little research has specifically addressed processing fine details in RS images while keeping computational demands manageable. To tackle this issue, we propose a novel approach that combines convolutional and transformer architectures. Our design uses convolutional layers with a small receptive field to generate fine-grained feature maps for small objects in very high-resolution images, while transformer blocks capture contextual information from the input. By combining convolution and self-attention in this manner, we reduce the need for extensive downsampling and enable the network to work with full-resolution features, which is particularly beneficial for handling small objects. Our approach also avoids the vast training datasets that purely transformer-based networks often require. Experimental results demonstrate the effectiveness of the method in generating local and contextual features with the convolutional and transformer layers, respectively. Our approach achieves a mean Dice score of 80.41%, outperforming well-known techniques such as UNet, the Fully Convolutional Network (FCN), the Pyramid Scene Parsing Network (PSPNet), and the recent Convolutional vision Transformer (CvT), which achieved mean Dice scores of 78.57%, 74.57%, 73.45%, and 62.97%, respectively, under the same training conditions and on the same training dataset.
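To make the hybrid design described above concrete, the following is a minimal PyTorch sketch of a block that pairs small-receptive-field convolutions (fine detail at full resolution) with multi-head self-attention (global context), plus a mean Dice helper matching the reported metric. The class name, layer sizes, and the `mean_dice` function are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn


class HybridSegBlock(nn.Module):
    """Illustrative conv + self-attention block (hypothetical, not the
    paper's exact architecture): 3x3 convolutions keep a low receptive
    field for fine-grained, full-resolution features, while multi-head
    self-attention adds contextual information across the whole image."""

    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        # Small-kernel convolution: fine-grained local feature maps
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Self-attention over flattened spatial positions for global context
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.local(x)                   # (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        ctx, _ = self.attn(seq, seq, seq)   # contextual features
        seq = self.norm(seq + ctx)          # residual connection + norm
        return seq.transpose(1, 2).reshape(b, c, h, w)


def mean_dice(pred: torch.Tensor, target: torch.Tensor,
              eps: float = 1e-6) -> torch.Tensor:
    """Mean Dice score over one-hot masks of shape (B, K, H, W)."""
    inter = (pred * target).sum(dim=(2, 3))
    denom = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return ((2 * inter + eps) / (denom + eps)).mean()
```

A full network would stack several such blocks and finish with a 1x1 convolution head mapping channels to class logits; the mean Dice score would then be computed on one-hot predictions at evaluation time.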
Citation
Boulila, W., Ghandorh, H., Masood, S., Alzahem, A., Koubaa, A., Ahmed, F., Khan, Z., & Ahmad, J. (2024). A transformer-based approach empowered by a self-attention technique for semantic segmentation in remote sensing. Heliyon, 10(8), Article e29396. https://doi.org/10.1016/j.heliyon.2024.e29396
| Journal Article Type | Article |
| --- | --- |
| Acceptance Date | Apr 8, 2024 |
| Online Publication Date | Apr 17, 2024 |
| Publication Date | 2024 |
| Deposit Date | Apr 30, 2024 |
| Publicly Available Date | Apr 30, 2024 |
| Journal | Heliyon |
| Print ISSN | 2405-8440 |
| Electronic ISSN | 2405-8440 |
| Publisher | Elsevier |
| Peer Reviewed | Peer Reviewed |
| Volume | 10 |
| Issue | 8 |
| Article Number | e29396 |
| DOI | https://doi.org/10.1016/j.heliyon.2024.e29396 |
| Keywords | Semantic segmentation, Self-attention, Vision transformer, Satellite images, Remote sensing |
| Public URL | http://researchrepository.napier.ac.uk/Output/3602254 |
Files
A transformer-based approach empowered by a self-attention technique for semantic segmentation in remote sensing
(1.6 MB)
PDF
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/