Research Repository

All Outputs (40)

Diverse features discovery transformer for pedestrian attribute recognition (2022)
Journal Article
Zheng, A., Wang, H., Wang, J., Huang, H., He, R., & Hussain, A. (2023). Diverse features discovery transformer for pedestrian attribute recognition. Engineering Applications of Artificial Intelligence, 119, Article 105708. https://doi.org/10.1016/j.engapp

Recently, Swin Transformer has been widely explored as a general backbone for computer vision, which helps to improve the performance of vision tasks due to the ability to establish associations for long-range dependencies of different spatial locati...

A Novel Frame Structure for Cloud-Based Audio-Visual Speech Enhancement in Multimodal Hearing-aids (2022)
Presentation / Conference Contribution
Bishnu, A., Gupta, A., Gogate, M., Dashtipour, K., Adeel, A., Hussain, A., Sellathurai, M., & Ratnarajah, T. (2022, October). A Novel Frame Structure for Cloud-Based Audio-Visual Speech Enhancement in Multimodal Hearing-aids. Presented at 2022 IEEE Intern

In this paper, we design a first of its kind transceiver (PHY layer) prototype for cloud-based audio-visual (AV) speech enhancement (SE) complying with high data rate and low latency requirements of future multimodal hearing assistive technology. The...

Multimodal salient object detection via adversarial learning with collaborative generator (2022)
Journal Article
Tu, Z., Yang, W., Wang, K., Hussain, A., Luo, B., & Li, C. (2023). Multimodal salient object detection via adversarial learning with collaborative generator. Engineering Applications of Artificial Intelligence, 119, Article 105707. https://doi.org/10.1016

Multimodal salient object detection (MSOD), which utilizes multimodal information (e.g., RGB image and thermal infrared or depth image) to detect common salient objects, has received much attention recently. Different modalities reflect different appe...

A novel approach of many-objective particle swarm optimization with cooperative agents based on an inverted generational distance indicator (2022)
Journal Article
Kouka, N., BenSaid, F., Fdhila, R., Fourati, R., Hussain, A., & Alimi, A. M. (2023). A novel approach of many-objective particle swarm optimization with cooperative agents based on an inverted generational distance indicator. Information Sciences, 623, 22

Most evolutionary algorithms, including particle swarm optimization (PSO), use Pareto dominance as a major selection criterion and face significant challenges when dealing with many-objective problems. To tackle this issue, this paper proposes a nove...

Towards Simple and Accurate Human Pose Estimation With Stair Network (2022)
Journal Article
Jiang, C., Huang, K., Zhang, S., Wang, X., Xiao, J., Niu, Z., & Hussain, A. (2023). Towards Simple and Accurate Human Pose Estimation With Stair Network. IEEE Transactions on Emerging Topics in Computational Intelligence, 7(3), 805-817. https://doi.org/10

In this paper, we focus on tackling the precise keypoint coordinates regression task. Most existing approaches adopt complicated networks with a large number of parameters, leading to a heavy model with poor cost-effectiveness in practice. To overcom...

Canonical cortical graph neural networks and its application for speech enhancement in audio-visual hearing aids (2022)
Journal Article
Passos, L. A., Papa, J. P., Hussain, A., & Adeel, A. (2023). Canonical cortical graph neural networks and its application for speech enhancement in audio-visual hearing aids. Neurocomputing, 527, 196-203. https://doi.org/10.1016/j.neucom.2022.11.081

Despite the recent success of machine learning algorithms, most models face drawbacks when considering more complex tasks requiring interaction between different sources, such as multimodal input data and logical time sequences. On the other hand, th...

Ellipse Encoding for Arbitrary-Oriented SAR Ship Detection Based on Dynamic Key Points (2022)
Journal Article
Gao, F., Huo, Y., Sun, J., Yu, T., Hussain, A., & Zhou, H. (2022). Ellipse Encoding for Arbitrary-Oriented SAR Ship Detection Based on Dynamic Key Points. IEEE Transactions on Geoscience and Remote Sensing, 60, Article 5240528. https://doi.org/10.1109/tgr

In recent years, there has been growing interest in developing oriented bounding box (OBB)-based deep learning approaches to detect arbitrary-oriented ship targets in synthetic aperture radar (SAR) images. However, most existing OBB-based detection m...

Fusing external knowledge resources for natural language understanding techniques: A survey (2022)
Journal Article
Wang, Y., Wang, W., Chen, Q., Huang, K., Nguyen, A., De, S., & Hussain, A. (2023). Fusing external knowledge resources for natural language understanding techniques: A survey. Information Fusion, 92, 190-204. https://doi.org/10.1016/j.inffus.2022.11.025

Knowledge resources, e.g. knowledge graphs, which formally represent essential semantics and information for logic inference and reasoning, can compensate for the unawareness nature of many natural language processing techniques based on deep neural...

A robust deep learning approach for tomato plant leaf disease localization and classification (2022)
Journal Article
Nawaz, M., Nazir, T., Javed, A., Masood, M., Rashid, J., Kim, J., & Hussain, A. (2022). A robust deep learning approach for tomato plant leaf disease localization and classification. Scientific Reports, 12(1), Article 18568. https://doi.org/10.1038/s41598

Tomato plants' disease detection and classification at the earliest stage can save the farmers from expensive crop sprays and can assist in increasing the food quantity. Although extensive work has been presented by the researcher for the tomato pla...

A Trimodel SAR Semisupervised Recognition Method Based on Attention-Augmented Convolutional Networks (2022)
Journal Article
Yan, S., Zhang, Y., Gao, F., Sun, J., Hussain, A., & Zhou, H. (2022). A Trimodel SAR Semisupervised Recognition Method Based on Attention-Augmented Convolutional Networks. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 1

Semisupervised learning in synthetic aperture radars (SARs) is one of the research hotspots in the field of radar image automatic target recognition. It can efficiently deal with challenging environments where there are insufficient labeled samples a...

PointGS: Bridging and fusing geometric and semantic space for 3D point cloud analysis (2022)
Journal Article
Jiang, C., Huang, K., Wu, J., Wang, X., Xiao, J., & Hussain, A. (2023). PointGS: Bridging and fusing geometric and semantic space for 3D point cloud analysis. Information Fusion, 91, 316-326. https://doi.org/10.1016/j.inffus.2022.10.016

Directly processing 3D point cloud data becomes dominant in classification and segmentation tasks. Present mainstream point based methods usually focus on learning in either geometric space (PointNet++) or semantic space (DGCNN). Owing to the irreg...

WikiDes: A Wikipedia-based dataset for generating short descriptions from paragraphs (2022)
Journal Article
Ta, H. T., Rahman, A. B. S., Majumder, N., Hussain, A., Najjar, L., Howard, N., …Gelbukh, A. (2023). WikiDes: A Wikipedia-based dataset for generating short descriptions from paragraphs. Information Fusion, 90, 265-282. https://doi.org/10.1016/j.inffus.

As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many Natural Language Processing (NLP) tasks, such as information retrieval, knowledge base building, machine translation, text classification, and text s...

Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions (2022)
Journal Article
Gandhi, A., Adhvaryu, K., Poria, S., Cambria, E., & Hussain, A. (2023). Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions. Information Fusion, 91, 424-444. ht

Sentiment analysis (SA) has gained much traction in the field of artificial intelligence (AI) and natural language processing (NLP). There is growing demand to automate analysis of user sentiment towards products or services. Opinions are increasingl...

Towards real-time privacy-preserving audio-visual speech enhancement (2022)
Presentation / Conference Contribution
Gogate, M., Dashtipour, K., & Hussain, A. (2022, September). Towards real-time privacy-preserving audio-visual speech enhancement. Presented at 2nd Symposium on Security and Privacy in Speech Communication, Incheon, Korea

Human auditory cortex in everyday noisy situations is known to exploit aural and visual cues that are contextually combined by the brain’s multi-level integration strategies to selectively suppress the background noise and focus on the target speaker...

DPb-MOPSO: A Dynamic Pareto bi-level Multi-objective Particle Swarm Optimization Algorithm (2022)
Journal Article
Aboud, A., Rokbani, N., Fdhila, R., Qahtani, A. M., Almutiry, O., Dhahri, H., …Alimi, A. M. (2022). DPb-MOPSO: A Dynamic Pareto bi-level Multi-objective Particle Swarm Optimization Algorithm. Applied Soft Computing, 129, Article 109622. https://doi.org/

Particle Swarm Optimization (PSO) system based on the distributed architecture over multiple sub-swarms is very efficient for static multi-objective optimization but has not been considered for solving dynamic multi-objective problems (DMOPs). Tracki...

A New Class of Efficient Adaptive Filters for Online Nonlinear Modeling (2022)
Journal Article
Comminiello, D., Nezamdoust, A., Scardapane, S., Scarpiniti, M., Hussain, A., & Uncini, A. (2023). A New Class of Efficient Adaptive Filters for Online Nonlinear Modeling. IEEE Transactions on Systems, Man and Cybernetics: Systems, 53(3), 1384-1396. https

Nonlinear models are known to provide excellent performance in real-world applications that often operate in nonideal conditions. However, such applications often require online processing to be performed with limited computational resources. To addr...

A Few-Shot Learning Method for SAR Images Based on Weighted Distance and Feature Fusion (2022)
Journal Article
Gao, F., Xu, J., Lang, R., Wang, J., Hussain, A., & Zhou, H. (2022). A Few-Shot Learning Method for SAR Images Based on Weighted Distance and Feature Fusion. Remote Sensing, 14(18), Article 4583. https://doi.org/10.3390/rs14184583

Convolutional Neural Network (CNN) has been widely applied in the field of synthetic aperture radar (SAR) image recognition. Nevertheless, CNN-based recognition methods usually encounter the problem of poor feature representation ability due to insuf...

Multimodal audio-visual information fusion using canonical-correlated Graph Neural Network for energy-efficient speech enhancement (2022)
Journal Article
Passos, L. A., Papa, J. P., Del Ser, J., Hussain, A., & Adeel, A. (2023). Multimodal audio-visual information fusion using canonical-correlated Graph Neural Network for energy-efficient speech enhancement. Information Fusion, 90, 1-11. https://doi.org/10.

This paper proposes a novel multimodal self-supervised architecture for energy-efficient audio-visual (AV) speech enhancement that integrates Graph Neural Networks with canonical correlation analysis (CCA-GNN). The proposed approach lays its foundati...

A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning (2022)
Presentation / Conference Contribution
Hussain, T., Diyan, M., Gogate, M., Dashtipour, K., Adeel, A., Tsao, Y., & Hussain, A. (2022, July). A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning. Presented at 2022 44th Annual International Conference

Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are often trained to minimise the feature distance between noise-free speech and enhanced speech signals. Despite improving the speech quality, su...

Pushing the limits of remote RF sensing by reading lips under the face mask (2022)
Journal Article
Hameed, H., Usman, M., Tahir, A., Hussain, A., Abbas, H., Cui, T. J., …Abbasi, Q. H. (2022). Pushing the limits of remote RF sensing by reading lips under the face mask. Nature Communications, 13(1), Article 5168. https://doi.org/10.1038/s41467-022-3223

The problem of Lip-reading has become an important research challenge in recent years. The goal is to recognise speech from lip movements. Most of the Lip-reading technologies developed so far are camera-based, which require video recording of the ta...