235 Results Found

  • Article
  • Open Access
10 Citations
3,551 Views
16 Pages

Scene Recognition for Visually-Impaired People’s Navigation Assistance Based on Vision Transformer with Dual Multiscale Attention

  • Yahia Said,
  • Mohamed Atri,
  • Marwan Ali Albahar,
  • Ahmed Ben Atitallah and
  • Yazan Ahmad Alsariera

24 February 2023

Notable progress was achieved by recent technologies. As the main goal of technology is to make daily life easier, we will investigate the development of an intelligent system for the assistance of impaired people in their navigation. For visually im...

  • Article
  • Open Access
3 Citations
2,809 Views
16 Pages

29 February 2024

Nowadays, the field of video-based action recognition is rapidly developing. Although Vision Transformers (ViT) have made great progress in static image processing, they are not yet fully optimized for dynamic video applications. Convolutional Neural...

  • Article
  • Open Access
22 Citations
3,545 Views
25 Pages

Crop Disease Identification by Fusing Multiscale Convolution and Vision Transformer

  • Dingju Zhu,
  • Jianbin Tan,
  • Chao Wu,
  • KaiLeung Yung and
  • Andrew W. H. Ip

29 June 2023

With the development of smart agriculture, deep learning is playing an increasingly important role in crop disease recognition. The existing crop disease recognition models are mainly based on convolutional neural networks (CNN). Although traditional...

  • Article
  • Open Access
2 Citations
2,475 Views
22 Pages

11 September 2025

In recent years, segmentation for medical applications using Magnetic Resonance Imaging (MRI) has received increasing attention. Working in this field has emerged as an ambitious task and a major challenge for researchers; particularly, brain tumor s...

  • Article
  • Open Access
5 Citations
1,876 Views
21 Pages

26 May 2025

Fire detection remains a challenging task due to varying fire scales, occlusions, and complex environmental conditions. This paper proposes the CN2VF-Net model, a novel hybrid architecture that combines vision Transformers (ViTs) and convolutional ne...

  • Article
  • Open Access
14 Citations
5,353 Views
17 Pages

12 December 2024

Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performances in image classification, especially in the domain of medical imaging analysis. However, ViTs struggle to capture high...

  • Article
  • Open Access
2 Citations
2,841 Views
21 Pages

26 September 2024

Recently, Vision Transformers (ViTs) have been actively applied to fine-grained visual recognition (FGVR). ViT can effectively model the interdependencies between patch-divided object regions through an inherent self-attention mechanism. In addition,...

  • Article
  • Open Access
397 Views
16 Pages

A Hybrid Multi-Scale Transformer-CNN UNet for Crowd Counting

  • Kai Zhao,
  • Chunhao He,
  • Shufan Peng and
  • Tianliang Lu

4 January 2026

Crowd counting is a critical computer vision task with significant applications in public security and smart city systems. While deep learning has markedly improved accuracy, persistent challenges include extreme scale variations, severe occlusion, a...

  • Article
  • Open Access
554 Views
24 Pages

26 November 2025

Colorectal cancer (CRC) is the second most common global malignancy with high mortality, and timely early polyp detection is critical to halt its progression. Yet, polyp image segmentation—an essential tool—faces challenges: blurred edges...

  • Article
  • Open Access
4 Citations
1,347 Views
20 Pages

29 November 2024

The normal operation of rolling bearings is crucial to the performance and reliability of rotating machinery. However, the collected vibration signals are often mixed with complex noise, and the transformer network cannot fully extract the characteri...

  • Article
  • Open Access
8 Citations
2,787 Views
25 Pages

7 February 2024

Salient Object Detection (SOD) is gradually applied in natural scene images. However, due to the apparent differences between optical remote sensing images and natural scene images, directly applying the SOD of natural scene images to optical remote...

  • Article
  • Open Access
1,148 Views
18 Pages

PID-NET: A Novel Parallel Image-Dehazing Network

  • Wei Liu,
  • Yi Zhou,
  • Dehua Zhang and
  • Yi Qin

Image dehazing is a critical task in image restoration, aiming to retrieve clear images from hazy scenes. This process is vital for various applications, including machine recognition, security monitoring, and aerial photography. Current dehazing alg...

  • Article
  • Open Access
7 Citations
2,906 Views
21 Pages

30 March 2025

Drone-based object detection faces critical challenges, including tiny objects, complex urban backgrounds, dramatic scale variations, and high-frequency detail loss during feature propagation. Current detection methods struggle to address these chall...

  • Article
  • Open Access
10 Citations
3,363 Views
19 Pages

24 November 2023

In the production process of metal industrial products, the deficiencies and limitations of existing technologies and working conditions can have adverse effects on the quality of the final products, making surface defect detection particularly cruci...

  • Article
  • Open Access
4 Citations
1,191 Views
17 Pages

Lightweight Transformer with Adaptive Rotational Convolutions for Aerial Object Detection

  • Sabina Umirzakova,
  • Shakhnoza Muksimova,
  • Abrayeva Mahliyo Olimjon Qizi and
  • Young Im Cho

7 May 2025

Oriented object detection in aerial imagery presents unique challenges due to the arbitrary orientations, diverse scales, and limited availability of labeled data. In response to these issues, we propose RASST—a lightweight Rotationally Aware S...

  • Article
  • Open Access
8 Citations
5,304 Views
30 Pages

31 October 2022

Transmission line fittings have been exposed to complex environments for a long time. Due to the interference of haze and other environmental factors, it is often difficult for the camera to obtain high quality on-site images, and the traditional ima...

  • Article
  • Open Access
2 Citations
2,139 Views
28 Pages

19 February 2025

Vision–language pre-training (VLP) faces challenges in aligning hierarchical textual semantics (words/phrases/sentences) with multi-scale visual features (objects/relations/global context). We propose a hierarchical VLP model (HieVLP) that addr...

  • Article
  • Open Access
5 Citations
2,108 Views
20 Pages

20 August 2024

Automated segmentation algorithms for dermoscopic images serve as effective tools that assist dermatologists in clinical diagnosis. While existing deep learning-based skin lesion segmentation algorithms have achieved certain success, challenges remai...

  • Article
  • Open Access
4 Citations
2,938 Views
21 Pages

Fresh Tea Leaf-Grading Detection: An Improved YOLOv8 Neural Network Model Utilizing Deep Learning

  • Zejun Wang,
  • Yuxin Xia,
  • Houqiao Wang,
  • Xiaohui Liu,
  • Raoqiong Che,
  • Xiaoxue Guo,
  • Hongxu Li,
  • Shihao Zhang and
  • Baijuan Wang

To facilitate the realization of automated tea picking and enhance the speed and accuracy of tea leaf grading detection, this study proposes an improved YOLOv8 network for fresh tea leaf grading recognition. This approach integrates a Hierarchical Vi...

  • Article
  • Open Access
34 Citations
6,947 Views
25 Pages

Large-Scale Date Palm Tree Segmentation from Multiscale UAV-Based and Aerial Images Using Deep Vision Transformers

  • Mohamed Barakat A. Gibril,
  • Helmi Zulhaidi Mohd Shafri,
  • Rami Al-Ruzouq,
  • Abdallah Shanableh,
  • Faten Nahas and
  • Saeed Al Mansoori

29 January 2023

The reliable and efficient large-scale mapping of date palm trees from remotely sensed data is crucial for developing palm tree inventories, continuous monitoring, vulnerability assessments, environmental control, and long-term management. Given the...

  • Article
  • Open Access
860 Views
17 Pages

26 May 2025

Effectively capturing multi-scale object features is crucial for vision sensors used in road object detection tasks. Traditional spatial pyramid pooling methods fuse multi-scale feature information but lack adaptability in dynamically adjusting convo...

  • Article
  • Open Access
8 Citations
2,862 Views
18 Pages

15 April 2025

Skin cancer is a significant global health concern, with melanoma being the most dangerous form, responsible for the majority of skin cancer-related deaths. Early detection of skin cancer is critical, as it can drastically improve survival rates. Whi...

  • Article
  • Open Access
1,346 Views
17 Pages

7 September 2025

Soybean rust, caused by the fungus Phakopsora pachyrhizi, is recognized as the most devastating disease affecting soybean crops worldwide. In practical applications, performing accurate Phakopsora pachyrhizi segmentation (PPS) is essential for elucid...

  • Article
  • Open Access
6 Citations
3,773 Views
18 Pages

26 June 2024

Convolutional neural networks (CNNs) have made significant progress in the field of facial expression recognition (FER). However, due to challenges such as occlusion, lighting variations, and changes in head pose, facial expression recognition in rea...

  • Article
  • Open Access
3 Citations
1,824 Views
17 Pages

15 September 2024

Wild mushrooms are popular for their taste and nutritional value; however, non-experts often struggle to distinguish between toxic and non-toxic species when foraging in the wild, potentially leading to poisoning incidents. To address this issue, thi...

  • Article
  • Open Access
55 Citations
6,867 Views
17 Pages

16 January 2023

Forest fires have continually endangered personal safety and social property. To reduce the occurrences of forest fires, it is essential to detect forest fire smoke accurately and quickly. Traditional forest fire smoke detection based on convolutiona...

  • Article
  • Open Access
4 Citations
3,442 Views
28 Pages

A Novel Multi-Scale Transformer for Object Detection in Aerial Scenes

  • Guanlin Lu,
  • Xiaohui He,
  • Qiang Wang,
  • Faming Shao,
  • Hongwei Wang and
  • Jinkang Wang

27 July 2022

Deep learning has promoted the research of object detection in aerial scenes. However, most of the existing networks are limited by the large-scale variation of objects and the confusion of category features. To overcome these limitations, this paper...

  • Article
  • Open Access
5 Citations
4,578 Views
20 Pages

HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images

  • Mahmoud SalahEldin Kasem,
  • Mohamed Mahmoud,
  • Bilel Yagoub,
  • Mostafa Farouk Senussi,
  • Mahmoud Abdalla and
  • Hyun-Soo Kang

15 January 2025

Table detection in document images is a challenging problem due to diverse layouts, irregular structures, and embedded graphical elements. In this study, we present HTTD (Hierarchical Transformer for Table Detection), a cutting-edge model that combin...

  • Article
  • Open Access
1 Citation
1,414 Views
22 Pages

25 November 2024

An unescapable trend of hyperspectral image (HSI) has been toward classification with high accuracy and splendid performance. In recent years, Transformers have made remarkable progress in the HSI classification task. However, Transformer-based metho...

  • Article
  • Open Access
21 Citations
3,681 Views
22 Pages

20 October 2021

The central goal of few-shot scene classification is to learn a model that can generalize well to a novel scene category (UNSEEN) from only one or a few labeled examples. Recent works in the Remote Sensing (RS) community tackle this challenge by deve...

  • Article
  • Open Access
1 Citation
1,319 Views
28 Pages

14 April 2025

Rain is a typical meteorological event that affects the visual appeal of outdoor pictures. The presence of rain streaks severely blurs image details, negatively impacting subsequent computer visual tasks. Due to the challenge of acquiring authentic p...

  • Article
  • Open Access
7 Citations
1,675 Views
22 Pages

30 October 2024

Power transmission line icing (PTLI) poses significant threats to the reliability and safety of electrical power systems, particularly in cold regions. Accumulation of ice on power lines can lead to severe consequences, such as line breaks, tower col...

  • Article
  • Open Access
16 Citations
4,480 Views
19 Pages

STMSF: Swin Transformer with Multi-Scale Fusion for Remote Sensing Scene Classification

  • Yingtao Duan,
  • Chao Song,
  • Yifan Zhang,
  • Puyu Cheng and
  • Shaohui Mei

16 February 2025

Emerging vision transformers (ViTs) are more powerful in modeling long-range dependences of features than conventional deep convolution neural networks (CNNs). Thus, they outperform CNNs in several computer vision tasks. However, existing ViTs fail t...

  • Article
  • Open Access
12 Citations
3,030 Views
21 Pages

2 August 2024

Transformers have recently gained significant attention in low-level vision tasks, particularly for remote sensing image super-resolution (RSISR). The vanilla vision transformer aims to establish long-range dependencies between image patches. However...

  • Article
  • Open Access
3 Citations
4,966 Views
21 Pages

3 February 2023

Vision Transformers (ViTs) have shown their superiority in various visual tasks for the capability of self-attention mechanisms to model long-range dependencies. Some recent works try to reduce the high cost of vision transformers by limiting the sel...

  • Article
  • Open Access
8 Citations
2,846 Views
17 Pages

MCEENet: Multi-Scale Context Enhancement and Edge-Assisted Network for Few-Shot Semantic Segmentation

  • Hongjie Zhou,
  • Rufei Zhang,
  • Xiaoyu He,
  • Nannan Li,
  • Yong Wang and
  • Sheng Shen

8 March 2023

Few-shot semantic segmentation has attracted much attention because it requires only a few labeled samples to achieve good segmentation performance. However, existing methods still suffer from insufficient contextual information and unsatisfactory ed...

  • Article
  • Open Access
11 Citations
3,915 Views
14 Pages

29 August 2023

Pose estimation plays a crucial role in recognizing and analyzing the postures, actions, and movements of humans and animals using computer vision and machine learning techniques. However, bird pose estimation encounters specific challenges, includin...

  • Article
  • Open Access
8 Citations
2,128 Views
27 Pages

U-Shaped Dual Attention Vision Mamba Network for Satellite Remote Sensing Single-Image Dehazing

  • Tangyu Sui,
  • Guangfeng Xiang,
  • Feinan Chen,
  • Yang Li,
  • Xiayu Tao,
  • Jiazu Zhou,
  • Jin Hong and
  • Zhenwei Qiu

17 March 2025

In remote sensing single-image dehazing (RSSID), adjacency effects and the multi-scale characteristics of the land surface–atmosphere system highlight the importance of a network’s effective receptive field (ERF) and its ability to captur...

  • Article
  • Open Access
1 Citation
1,565 Views
17 Pages

In recent years, the utilization of artificial intelligence methodologies in computer vision has markedly propelled the advancement of intelligent healthcare. A multimodal medical image segmentation algorithm is proposed by combining patient metadata...

  • Article
  • Open Access
13 Citations
4,027 Views
13 Pages

Haze-Aware Attention Network for Single-Image Dehazing

  • Lihan Tong,
  • Yun Liu,
  • Weijia Li,
  • Liyuan Chen and
  • Erkang Chen

21 June 2024

Single-image dehazing is a pivotal challenge in computer vision that seeks to remove haze from images and restore clean background details. Recognizing the limitations of traditional physical model-based methods and the inefficiencies of current atte...

  • Article
  • Open Access
1,203 Views
26 Pages

4 August 2025

Transformers have been extensively utilized as encoders in medical image segmentation; however, the information that an encoder can capture is inherently limited. In this study, we propose MINTFormer, which introduces a Heterogeneous encoder that int...

  • Article
  • Open Access
3 Citations
2,655 Views
30 Pages

Multi-Scale Vision Transformer with Optimized Feature Fusion for Mammographic Breast Cancer Classification

  • Soaad Ahmed,
  • Naira Elazab,
  • Mostafa M. El-Gayar,
  • Mohammed Elmogy and
  • Yasser M. Fouda

Background: Breast cancer remains one of the leading causes of mortality among women worldwide, highlighting the critical need for accurate and efficient diagnostic methods. Methods: Traditional deep learning models often struggle with feature redund...

  • Article
  • Open Access
3 Citations
2,517 Views
21 Pages

Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification

  • Dan Zhang,
  • Wenping Ma,
  • Licheng Jiao,
  • Xu Liu,
  • Yuting Yang and
  • Fang Liu

26 December 2024

The Transformer model can capture global contextual information but does not have an inherent inductive bias. In contrast, convolutional neural networks (CNNs) are highly praised in computer vision due to their strong inductive bias and local spatial...

  • Article
  • Open Access
1 Citation
4,064 Views
16 Pages

Unsupervised Image Translation Using Multi-Scale Residual GAN

  • Yifei Zhang,
  • Weipeng Li,
  • Daling Wang and
  • Shi Feng

19 November 2022

Image translation is a classic problem of image processing and computer vision for transforming an image from one domain to another by learning the mapping between an input image and an output image. A novel Multi-scale Residual Generative Adversaria...

  • Article
  • Open Access
32 Citations
4,432 Views
16 Pages

RCCT-ASPPNet: Dual-Encoder Remote Image Segmentation Based on Transformer and ASPP

  • Yazhou Li,
  • Zhiyou Cheng,
  • Chuanjian Wang,
  • Jinling Zhao and
  • Linsheng Huang

7 January 2023

Remote image semantic segmentation technology is one of the core research elements in the field of computer vision and has a wide range of applications in production life. Most remote image semantic segmentation methods are based on CNN. Recently, Tr...

  • Article
  • Open Access
6 Citations
2,964 Views
21 Pages

Hyperspectral Image Classification Using Multi-Scale Lightweight Transformer

  • Quan Gu,
  • Hongkang Luan,
  • Kaixuan Huang and
  • Yubao Sun

29 February 2024

The distinctive feature of hyperspectral images (HSIs) is their large number of spectral bands, which allows us to identify categories of ground objects by capturing discrepancies in spectral information. Convolutional neural networks (CNN) with atte...

  • Article
  • Open Access
8 Citations
4,055 Views
17 Pages

26 October 2023

Semantic segmentation is a fundamental task in remote sensing image analysis that aims to classify each pixel in an image into different land use and land cover (LULC) segmentation tasks. In this paper, we propose MeViT (Medium-Resolution Vision Tran...

  • Article
  • Open Access
383 Views
26 Pages

23 December 2025

Crowd counting is a significant task in computer vision. By combining the rich texture information from RGB images with the insensitivity to illumination changes offered by thermal imaging, the applicability of models in real-world complex scenarios...

  • Article
  • Open Access
12 Citations
2,987 Views
25 Pages

29 July 2024

Remote sensing (RS) images play an indispensable role in many key fields such as environmental monitoring, precision agriculture, and urban resource management. Traditional deep convolutional neural networks have the problem of limited receptive fiel...

  • Article
  • Open Access
9 Citations
4,553 Views
23 Pages

24 February 2023

The deep learning method for natural-image object detection tasks has made tremendous progress in recent decades. However, due to multiscale targets, complex backgrounds, and high-scale small targets, methods from the field of natural images frequent...
