Search Results (28)

Search Parameters:
Keywords = multihead pyramids

19 pages, 8033 KiB  
Article
SR-DETR: Target Detection in Maritime Rescue from UAV Imagery
by Yuling Liu and Yan Wei
Remote Sens. 2025, 17(12), 2026; https://doi.org/10.3390/rs17122026 - 12 Jun 2025
Viewed by 941
Abstract
The growth of maritime transportation has been accompanied by a gradual increase in accident rates, drawing greater attention to the critical issue of man-overboard incidents and drowning. Traditional maritime search-and-rescue (SAR) methods are often constrained by limited efficiency and high operational costs. Over the past few years, drones have demonstrated significant promise in improving the effectiveness of search-and-rescue operations. This is largely due to their exceptional ability to move freely and their capacity for wide-area monitoring. This study proposes an enhanced SR-DETR algorithm aimed at improving the detection of individuals who have fallen overboard. Specifically, the conventional multi-head self-attention (MHSA) mechanism is replaced with Efficient Additive Attention (EAA), which facilitates more efficient feature interaction while substantially reducing computational complexity. Moreover, we introduce a new feature aggregation module called the Cross-Stage Partial Parallel Atrous Feature Pyramid Network (CPAFPN). By refining spatial attention mechanisms, the module significantly boosts cross-scale target recognition capabilities in the model, especially offering advantages for detecting smaller objects. To improve localization precision, we develop a novel loss function for bounding box regression, named Focaler-GIoU, which performs particularly well when handling densely packed and small-scale objects. The proposed approach is validated through experiments and achieves an mAP of 86.5%, which surpasses the baseline RT-DETR model’s performance of 83.2%. These outcomes highlight the practicality and reliability of our method in detecting individuals overboard, contributing to more precise and resource-efficient solutions for real-time maritime rescue efforts. Full article
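The Focaler-GIoU loss above is not given in closed form in the abstract; the sketch below shows one plausible reading, pairing the standard GIoU penalty with the Focaler linear remapping of IoU (the thresholds `d` and `u` are illustrative, not the paper's values):

```python
def iou_and_giou(box_a, box_b):
    """IoU and Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # area of the smallest box enclosing both boxes
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou, iou - (c_area - union) / c_area

def focaler_giou_loss(box_a, box_b, d=0.0, u=0.95):
    """GIoU loss with the Focaler linear reconstruction of IoU onto [d, u],
    which re-weights easy samples so hard (small, dense) objects matter more."""
    iou, giou = iou_and_giou(box_a, box_b)
    iou_focaler = min(1.0, max(0.0, (iou - d) / (u - d)))
    return (1.0 - giou) + iou - iou_focaler
```

For two identical boxes the loss is zero; for disjoint boxes the GIoU term goes negative, so the loss exceeds 1 and still provides a gradient.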

18 pages, 2509 KiB  
Article
Lightweight Infrared Small Target Detection Method Based on Linear Transformer
by Bingshu Wang, Yifan Wang, Qianchen Mao, Jingzhuo Cao, Han Zhang and Laixian Zhang
Remote Sens. 2025, 17(12), 2016; https://doi.org/10.3390/rs17122016 - 11 Jun 2025
Viewed by 856
Abstract
With the flourishing of deep learning, transformer models have achieved remarkable performance on many computer vision tasks. However, their application to infrared small target detection is limited by two factors: (1) the high computational complexity of conventional transformer models reduces detection efficiency; (2) small targets are easily missed in visual environments with complex backgrounds. To address these issues, we propose a lightweight infrared small target detection method based on a linear transformer, named IstdVit, which achieves high accuracy and low latency in infrared small target detection. The model consists of two parts: a multi-scale linear transformer and a lightweight dual feature pyramid network. It combines the strengths of a lightweight feature extraction module and the multi-head attention mechanism, effectively representing small targets against complex backgrounds at an economical computational cost. Additionally, it incorporates rotational position encoding to improve understanding of spatial context. Experiments conducted on the NUDT-SIRST and IRSTD-1K datasets indicate that IstdVit achieves a good balance between speed and accuracy, outperforming other state-of-the-art methods while maintaining a low number of parameters. Full article
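Linear transformers of this kind replace softmax attention's N×N score matrix with a positive kernel feature map, so the key–value product can be formed first and the cost drops from O(N²d) to O(Nd²). A generic sketch (the feature map `phi` here is an illustrative ReLU+1, not necessarily IstdVit's choice):

```python
import numpy as np

def softmax_attention(Q, K, V):
    # O(N^2 d): materializes the full N x N attention matrix
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    # O(N d^2): associativity lets us form the d x d product (K^T V) first,
    # so the N x N matrix is never built
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                   # (d, d)
    z = Qp @ Kp.sum(axis=0)         # per-query normalizer, shape (N,)
    return (Qp @ kv) / z[:, None]
```

With a single key–value pair both variants reduce to returning V, which makes a convenient sanity check.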

28 pages, 2816 KiB  
Article
Enhancing Urban Understanding Through Fine-Grained Segmentation of Very-High-Resolution Aerial Imagery
by Umamaheswaran Raman Kumar, Toon Goedemé and Patrick Vandewalle
Remote Sens. 2025, 17(10), 1771; https://doi.org/10.3390/rs17101771 - 19 May 2025
Viewed by 649
Abstract
Despite the growing availability of very-high-resolution (VHR) remote sensing imagery, extracting fine-grained urban features and materials remains a complex task. Land use/land cover (LULC) maps generated from satellite imagery often fall short in providing the resolution needed for detailed urban studies. While hyperspectral imagery offers rich spectral information ideal for material classification, its complex acquisition process limits its use on aerial platforms such as manned aircraft and unmanned aerial vehicles (UAVs), reducing its feasibility for large-scale urban mapping. This study explores the potential of using only RGB and LiDAR data from VHR aerial imagery as an alternative for urban material classification. We introduce an end-to-end workflow that leverages a multi-head segmentation network to jointly classify roof and ground materials while also segmenting individual roof components. The workflow includes a multi-offset self-ensemble inference strategy optimized for aerial data and a post-processing step based on digital elevation models (DEMs). In addition, we present a systematic method for extracting roof parts as polygons enriched with material attributes. The study is conducted on six cities in Flanders, Belgium, covering 18 material classes—including rare categories such as green roofs, wood, and glass. The results show a 9.88% improvement in mean intersection over union (mIOU) for building and ground segmentation, and a 3.66% increase in mIOU for material segmentation compared to a baseline pyramid attention network (PAN). These findings demonstrate the potential of RGB and LiDAR data for high-resolution material segmentation in urban analysis. Full article
(This article belongs to the Special Issue Applications of AI and Remote Sensing in Urban Systems II)
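The mIoU figures reported above are computed per class from a confusion matrix of predicted versus reference labels; a minimal sketch:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union computed from a confusion matrix."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, t in zip(pred, target):
        cm[t, p] += 1
    inter = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    valid = union > 0          # ignore classes absent from both maps
    return (inter[valid] / union[valid]).mean()
```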

33 pages, 20017 KiB  
Article
Unified Deep Learning Model for Global Prediction of Aboveground Biomass, Canopy Height, and Cover from High-Resolution, Multi-Sensor Satellite Imagery
by Manuel Weber, Carly Beneke and Clyde Wheeler
Remote Sens. 2025, 17(9), 1594; https://doi.org/10.3390/rs17091594 - 30 Apr 2025
Viewed by 861
Abstract
Regular measurement of carbon stock in the world’s forests is critical for carbon accounting and reporting under national and international climate initiatives and for scientific research but has been largely limited in scalability and temporal resolution due to a lack of ground-based assessments. Increasing efforts have been made to address these challenges by incorporating remotely sensed data. We present a new methodology that uses multi-sensor, multispectral imagery at a resolution of 10 m and a deep learning-based model that unifies the prediction of aboveground biomass density (AGBD), canopy height (CH), and canopy cover (CC), as well as uncertainty estimations for all three quantities. The model architecture is a custom Feature Pyramid Network consisting of an encoder, decoder, and multiple prediction heads, all based on convolutional neural networks. It is trained on millions of globally sampled GEDI-L2/L4 measurements. We validate the capability of the model by deploying it over the entire globe for the year 2023 as well as annually from 2016 to 2023 over selected areas. The model achieves a mean absolute error for AGBD (CH, CC) of 26.1 Mg/ha (3.7 m, 9.9%) and a root mean squared error of 50.6 Mg/ha (5.4 m, 15.8%) on a globally sampled test dataset, demonstrating a significant improvement over previously published results. We also report the model performance against independently collected ground measurements published in the literature, which show a high degree of correlation across varying conditions. We further show that our pre-trained model facilitates seamless transferability to other GEDI variables due to its multi-head architecture. Full article
(This article belongs to the Special Issue Forest Biomass/Carbon Monitoring towards Carbon Neutrality)
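The paper's Feature Pyramid Network is custom; the sketch below shows only the generic FPN top-down merge it builds on, with nearest-neighbour upsampling and illustrative 1×1 lateral projections to a common channel width:

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour 2x upsampling; x has shape (C, H, W)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_topdown(features, lateral_weights):
    """features: backbone maps ordered fine -> coarse, each (C_i, H_i, W_i);
    lateral_weights: 1x1-conv matrices (d, C_i) projecting to width d."""
    laterals = [np.einsum('dc,chw->dhw', W, f)
                for W, f in zip(lateral_weights, features)]
    out = [laterals[-1]]                      # start from the coarsest level
    for lat in reversed(laterals[:-1]):
        out.append(lat + upsample2x(out[-1])) # upsample and add the lateral
    return out[::-1]                          # fine -> coarse, all d channels
```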

22 pages, 118441 KiB  
Article
CBLN-YOLO: An Improved YOLO11n-Seg Network for Cotton Topping in Fields
by Yufei Xie and Liping Chen
Agronomy 2025, 15(4), 996; https://doi.org/10.3390/agronomy15040996 - 21 Apr 2025
Viewed by 704
Abstract
The positioning of the top bud by the topping machine in cotton topping operations depends on the recognition algorithm. The detection results of traditional target detection algorithms contain much useless information, which hinders top bud positioning. To obtain a more efficient recognition algorithm, we propose CBLN-YOLO, a top bud segmentation algorithm based on the YOLO11n-seg model. Firstly, the standard convolution and multi-head self-attention (MHSA) mechanisms in YOLO11n-seg are replaced by linear deformable convolution (LDConv) and coordinate attention (CA) mechanisms to reduce the parameter growth rate of the original model and better mine detailed features of the top buds. In the neck, the feature pyramid network (FPN) is reconstructed using an enhanced interlayer feature correlation (EFC) module, and regression loss is calculated using the Inner CIoU loss function. When tested on a self-built dataset, the mAP@0.5 values of CBLN-YOLO for detection and segmentation are 98.3% and 95.8%, respectively, higher than those of traditional segmentation models. CBLN-YOLO also shows strong robustness under different weather conditions and times of day, and its recognition speed reaches 135 frames per second, providing strong support for cotton top bud positioning in the field environment. Full article
(This article belongs to the Collection AI, Sensors and Robotics for Smart Agriculture)

23 pages, 19686 KiB  
Article
ESO-DETR: An Improved Real-Time Detection Transformer Model for Enhanced Small Object Detection in UAV Imagery
by Yingfan Liu, Miao He and Bin Hui
Drones 2025, 9(2), 143; https://doi.org/10.3390/drones9020143 - 14 Feb 2025
Cited by 4 | Viewed by 2513
Abstract
Object detection is a fundamental capability that enables drones to perform various tasks. However, achieving a suitable equilibrium between performance, efficiency, and lightweight design remains a significant challenge for current algorithms. To address this issue, we propose an enhanced small object detection transformer model called ESO-DETR. First, we present a gated single-head attention backbone block, known as the GSHA block, which enhances the extraction of local details. In addition, ESO-DETR utilizes a multi-scale multi-head self-attention mechanism (MMSA) to efficiently manage complex features within its backbone network. We also introduce a novel and efficient feature fusion pyramid network for enhanced small object detection, termed ESO-FPN, which integrates large convolutional kernels with dual-domain attention mechanisms. Lastly, we introduce the EMASlideVariFocal loss (ESVF Loss), which dynamically adjusts weights to improve the model’s focus on more challenging samples. Compared with the baseline model, ESO-DETR demonstrates improvements of 3.9% and 4.0% in the mAP50 metric on the VisDrone and HIT-UAV datasets, respectively, while also reducing parameters by 25%. These results highlight the capability of ESO-DETR to improve detection accuracy while maintaining a lightweight and efficient structure. Full article

17 pages, 8715 KiB  
Article
Pose Estimation for Cross-Domain Non-Cooperative Spacecraft Based on Spatial-Aware Keypoints Regression
by Zihao Wang, Yunmeng Liu and E Zhang
Aerospace 2024, 11(11), 948; https://doi.org/10.3390/aerospace11110948 - 17 Nov 2024
Viewed by 1288
Abstract
Reliable pose estimation for non-cooperative spacecraft is a key technology for in-orbit service and active debris removal missions. Utilizing deep learning techniques to process monocular camera images is effective and is a hotspot of current research. To reduce errors and improve model generalization, researchers often design multi-head loss functions or use generative models for complex data augmentation, which makes the task complex and time-consuming. We propose a pyramid vision transformer spatial-aware keypoints regression network and a stereo-aware augmentation strategy to achieve robust prediction. Specifically, we primarily use the eight vertices of the cuboid satellite body as landmarks, and the observable surfaces can be transformed accordingly using the pose labels. The experimental results on the SPEED+ dataset show that, using the existing EPnP algorithm and a pseudo-label self-training method, we can achieve high-precision cross-domain pose estimation of the target. Compared to other existing methods, our model and strategy are more straightforward. The entire process does not require the generation of new images, which significantly reduces storage requirements and time costs. Combined with a Kalman filter, the robust and continuous output of the target position and attitude is verified on the SHIRT dataset. This work realizes deployment on mobile devices and provides strong technical support for the application of an automatic visual navigation system in orbit. Full article
(This article belongs to the Section Astronautics & Space Science)
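The Kalman filtering step mentioned above can be illustrated with a minimal scalar constant-position filter; the process and measurement noise values `q` and `r` are illustrative:

```python
def kalman_1d(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar constant-position Kalman filter: predict, then update."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                 # predict: state unchanged, uncertainty grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update with the measurement residual
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates
```

Fed a constant measurement, the estimate converges toward it while the gain settles to a small steady-state value, smoothing out per-frame pose noise.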

17 pages, 5437 KiB  
Article
ChartLine: Automatic Detection and Tracing of Curves in Scientific Line Charts Using Spatial-Sequence Feature Pyramid Network
by Wenjin Yang, Jie He and Qian Li
Sensors 2024, 24(21), 7015; https://doi.org/10.3390/s24217015 - 31 Oct 2024
Cited by 1 | Viewed by 1354
Abstract
Line charts are prevalent in scientific documents and commercial data visualization, serving as essential tools for conveying data trends. Automatic detection and tracing of line paths in these charts is crucial for downstream tasks such as data extraction, chart quality assessment, plagiarism detection, and visual question answering. However, line graphs present unique challenges due to their complex backgrounds and diverse curve styles, including solid, dashed, and dotted lines. Existing curve detection algorithms struggle to address these challenges effectively. In this paper, we propose ChartLine, a novel network designed for detecting and tracing curves in line graphs. Our approach integrates a Spatial-Sequence Attention Feature Pyramid Network (SSA-FPN) in both the encoder and decoder to capture rich hierarchical representations of curve structures and boundary features. The model incorporates a Spatial-Sequence Fusion (SSF) module and a Channel Multi-Head Attention (CMA) module to enhance intra-class consistency and inter-class distinction. We evaluate ChartLine on four line chart datasets and compare its performance against state-of-the-art curve detection, edge detection, and semantic segmentation methods. Extensive experiments demonstrate that our method significantly outperforms existing algorithms, achieving an F-measure of 94% on a synthetic dataset. Full article
(This article belongs to the Section Sensor Networks)
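The F-measure quoted above is the harmonic mean of precision and recall; a small helper (the counts in the test are illustrative, not the paper's):

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-score from detection counts; beta > 1 weights recall over precision."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```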

14 pages, 4951 KiB  
Article
Transmission Line Defect Target-Detection Method Based on GR-YOLOv8
by Shuai Hao, Kang Ren, Jiahao Li and Xu Ma
Sensors 2024, 24(21), 6838; https://doi.org/10.3390/s24216838 - 24 Oct 2024
Cited by 4 | Viewed by 1283
Abstract
In view of the low speed and precision of fault detection in transmission lines using traditional algorithms under resource constraints, a transmission line fault target-detection method for YOLOv8 (You Only Look Once version 8) based on the Rep (Representational Pyramid) Visual Transformer and incorporating an ultra-lightweight module is proposed. First, the YOLOv8 detection network was built. To address feature redundancy and the high computational load of the network, the Rep Visual Transformer module was introduced in the Neck part to integrate pixel information across the entire image through its multi-head self-attention, enabling the model to learn more global image features and improving its computational speed; then, a lightweight GSConv (Grouped and Separated Convolution, a combination of grouped convolution and separated convolution) module was added to the Backbone and Neck to share computing resources among channels and reduce computing time and memory consumption, balancing the computational cost and detection performance of the network while keeping the model lightweight and highly precise. Secondly, the Wise-IoU loss function was introduced as the Bounding-Box Regression (BBR) loss to optimize the predicted bounding boxes and shift them closer to the real target location, which reduces the harmful gradients caused by low-quality examples and further improves the detection precision of the algorithm. Finally, the algorithm was verified using a dataset of 3500 images compiled by a power-supply inspection department over the past four years.
The experimental results show that, compared with seven classic and improved algorithms, the proposed algorithm performs best: relative to the original YOLOv8 detection network, its recall rate and average precision improved by 0.058 and 0.053, respectively; its floating-point operations decreased by 2.3; and its detection speed increased to 114.9 FPS. Full article
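The parameter savings from grouped convolutions, which motivate GSConv, are easy to count. The `gsconv_params` breakdown below assumes the common dense-conv-plus-depthwise-conv reading of GSConv and is a sketch, not the paper's exact layer:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k

def gsconv_params(c_in, c_out, k):
    """A GSConv-style block: a standard conv produces c_out/2 channels,
    a cheap depthwise conv produces the other half, and the two halves
    are channel-concatenated."""
    half = c_out // 2
    dense = conv_params(c_in, half, k)                    # standard conv
    depthwise = conv_params(half, half, k, groups=half)   # depthwise conv
    return dense + depthwise
```

For a 64-to-64-channel 3×3 layer this cuts the weight count roughly in half (36,864 vs. 18,720), which is where the "ultra-lightweight" saving comes from.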

13 pages, 5820 KiB  
Article
Optic Nerve Sheath Ultrasound Image Segmentation Based on CBC-YOLOv5s
by Yonghua Chu, Jinyang Xu, Chunshuang Wu, Jianping Ye, Jucheng Zhang, Lei Shen, Huaxia Wang and Yudong Yao
Electronics 2024, 13(18), 3595; https://doi.org/10.3390/electronics13183595 - 10 Sep 2024
Viewed by 1311
Abstract
The diameter of the optic nerve sheath is an important indicator for assessing the intracranial pressure in critically ill patients. The methods for measuring the optic nerve sheath diameter are generally divided into invasive and non-invasive methods. Compared to the invasive methods, the non-invasive methods are safer and have thus gained popularity. Among the non-invasive methods, using deep learning to process the ultrasound images of the eyes of critically ill patients and promptly output the diameter of the optic nerve sheath offers significant advantages. This paper proposes a CBC-YOLOv5s optic nerve sheath ultrasound image segmentation method that integrates both local and global features. First, it introduces the CBC-Backbone feature extraction network, which consists of dual-layer C3 Swin-Transformer (C3STR) and dual-layer Bottleneck Transformer (BoT3) modules. The C3STR backbone’s multi-layer convolution and residual connections focus on the local features of the optic nerve sheath, while the Window Transformer Attention (WTA) mechanism in the C3STR module and the Multi-Head Self-Attention (MHSA) in the BoT3 module enhance the model’s understanding of the global features of the optic nerve sheath. The extracted local and global features are fully integrated in the Spatial Pyramid Pooling Fusion (SPPF) module. Additionally, the CBC-Neck feature pyramid is proposed, which includes a single-layer C3STR module and three-layer CReToNeXt (CRTN) module. During upsampling feature fusion, the C3STR module is used to enhance the local and global awareness of the fused features. During downsampling feature fusion, the CRTN module’s multi-level residual design helps the network to better capture the global features of the optic nerve sheath within the fused features. 
The introduction of these modules achieves the thorough integration of the local and global features, enabling the model to efficiently and accurately identify the optic nerve sheath boundaries, even when the ocular ultrasound images are blurry or the boundaries are unclear. The Z2HOSPITAL-5000 dataset collected from Zhejiang University Second Hospital was used for the experiments. Compared to the widely used YOLOv5s and U-Net algorithms, the proposed method shows improved performance on the blurry test set. Specifically, the proposed method achieves precision, recall, and Intersection over Union (IoU) values that are 4.1%, 2.1%, and 4.5% higher than those of YOLOv5s. When compared to U-Net, the precision, recall, and IoU are improved by 9.2%, 21%, and 19.7%, respectively. Full article
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)

24 pages, 13634 KiB  
Article
Exploring Factors Affecting the Performance of Neural Network Algorithm for Detecting Clouds, Snow, and Lakes in Sentinel-2 Images
by Kaihong Huang, Zhangli Sun, Yi Xiong, Lin Tu, Chenxi Yang and Hangtong Wang
Remote Sens. 2024, 16(17), 3162; https://doi.org/10.3390/rs16173162 - 27 Aug 2024
Cited by 2 | Viewed by 1327
Abstract
Detecting clouds, snow, and lakes in remote sensing images is vital due to their propensity to obscure underlying surface information and hinder data extraction. In this study, we utilize Sentinel-2 images to implement a two-stage random forest (RF) algorithm for image labeling and delve into the factors influencing neural network performance across six aspects: model architecture, encoder, learning rate adjustment strategy, loss function, input image size, and different band combinations. Our findings indicate the Feature Pyramid Network (FPN) achieved the highest MIoU of 87.14%. The multi-head self-attention mechanism was less effective compared to convolutional methods for feature extraction with small datasets. Incorporating residual connections into convolutional blocks notably enhanced performance. Additionally, employing false-color images (bands 12-3-2) yielded a 4.86% improvement in MIoU compared to true-color images (bands 4-3-2). Notably, variations in model architecture, encoder structure, and input band combination had a substantial impact on performance, with parameter variations resulting in MIoU differences exceeding 5%. These results provide a reference for high-precision segmentation of clouds, snow, and lakes and offer valuable insights for applying deep learning techniques to the high-precision extraction of information from remote sensing images, thereby advancing research in deep neural networks for semantic segmentation. Full article

17 pages, 6790 KiB  
Article
An Improved Method for Detecting Crane Wheel–Rail Faults Based on YOLOv8 and the Swin Transformer
by Yunlong Li, Xiuli Tang, Wusheng Liu, Yuefeng Huang and Zhinong Li
Sensors 2024, 24(13), 4086; https://doi.org/10.3390/s24134086 - 24 Jun 2024
Cited by 1 | Viewed by 1796
Abstract
In the realm of special equipment, significant advancements have been achieved in fault detection. Nonetheless, faults originating in the equipment manifest with diverse morphological characteristics and varying scales, and certain faults necessitate extrapolation from global information owing to their occurrence in localized areas. Simultaneously, the intricacies of the inspection area’s background easily interfere with intelligent detection processes. Hence, a refined YOLOv8 algorithm leveraging the Swin Transformer is proposed, tailored for detecting faults in special equipment. The Swin Transformer serves as the foundational network of the YOLOv8 framework, amplifying its capability to concentrate on comprehensive features during feature extraction, which is crucial for fault analysis. A multi-head self-attention mechanism regulated by a sliding window is utilized to expand the observation window’s scope. Moreover, an asymptotic feature pyramid network is introduced to augment spatial feature extraction for smaller targets. Within this network architecture, adjacent low-level features are merged, while high-level features are gradually integrated into the fusion process. This prevents loss or degradation of feature information during transmission and interaction, enabling accurate localization of smaller targets. Taking wheel–rail faults of lifting equipment as an illustration, the proposed method is employed to diagnose an expanded fault dataset generated through transfer learning. Experimental findings substantiate that the proposed method adeptly addresses numerous challenges encountered in the intelligent fault detection of special equipment. Moreover, it outperforms mainstream target detection models and achieves real-time detection capabilities. Full article
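Window-regulated self-attention of the Swin kind restricts attention to non-overlapping local windows; the partition/reverse reshapes at its core can be sketched as:

```python
import numpy as np

def window_partition(x, w):
    """Split a (H, W, C) feature map into non-overlapping w x w windows,
    so self-attention can be computed inside each window independently."""
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0
    x = x.reshape(H // w, w, W // w, w, C)
    x = x.transpose(0, 2, 1, 3, 4)          # (nH, nW, w, w, C)
    return x.reshape(-1, w * w, C)          # one token sequence per window

def window_reverse(windows, w, H, W):
    """Inverse of window_partition: reassemble the (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)
```

Attention cost then scales with the window size rather than the full image, and shifting the windows between layers (as Swin does) restores cross-window information flow.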

19 pages, 3011 KiB  
Article
Hyperspectral Image Classification Based on Multi-Scale Convolutional Features and Multi-Attention Mechanisms
by Qian Sun, Guangrui Zhao, Xinyuan Xia, Yu Xie, Chenrong Fang, Le Sun, Zebin Wu and Chengsheng Pan
Remote Sens. 2024, 16(12), 2185; https://doi.org/10.3390/rs16122185 - 16 Jun 2024
Cited by 4 | Viewed by 2735
Abstract
Convolutional neural network (CNN)-based and Transformer-based methods for hyperspectral image (HSI) classification have rapidly advanced due to their unique characterization capabilities. However, the fixed kernel sizes in convolutional layers limit the comprehensive utilization of multi-scale features in HSI land cover analysis, while the Transformer’s multi-head self-attention (MHSA) mechanism faces challenges in effectively encoding feature information across various dimensions. To tackle these issues, this article introduces an HSI classification method based on multi-scale convolutional features and multi-attention mechanisms (MSCF-MAM). Firstly, the model employs a multi-scale convolutional module to capture features across different scales in HSIs. Secondly, to enhance the integration of local and global channel features and establish long-range dependencies, a feature enhancement module based on pyramid squeeze attention (PSA) is employed. Lastly, the model leverages a classical Transformer Encoder (TE) and linear layers to encode and classify the transformed spatial–spectral features. The proposed method is evaluated on three publicly available datasets—Salina Valley (SV), WHU-Hi-HanChuan (HC), and WHU-Hi-HongHu (HH). Extensive experimental results demonstrate that the MSCF-MAM method outperforms several representative methods in terms of classification performance. Full article
(This article belongs to the Special Issue Advances in Hyperspectral Remote Sensing Image Processing)

16 pages, 6730 KiB  
Article
Real-Time Detection Technology of Corn Kernel Breakage and Mildew Based on Improved YOLOv5s
by Mingming Liu, Yinzeng Liu, Qihuan Wang, Qinghao He and Duanyang Geng
Agriculture 2024, 14(5), 725; https://doi.org/10.3390/agriculture14050725 - 7 May 2024
Cited by 3 | Viewed by 2128
Abstract
To address the low recognition accuracy of corn kernel breakage and mildew during corn kernel harvesting, this paper proposes a real-time detection method for corn kernel breakage and mildew based on an improved YOLOv5s, referred to as the CST-YOLOv5s model. The method continuously obtains images through a discrete uniform sampling device for corn kernels and generates dataset samples of whole, broken, and mildewed corn kernels. We targeted the problems of high similarity among some corn kernel features in the acquired images and the low precision of breakage and mildew recognition. Firstly, the CBAM attention mechanism is added to the backbone network of YOLOv5s to finely allocate and process feature information, highlighting the features of corn breakage and mildew. Secondly, the pyramid pooling structure SPPCPSC, which integrates cross-stage partial networks, is adopted to replace the SPPF in YOLOv5s; SPP and CPSC techniques are used to extract and fuse features of different scales, improving the precision of object detection. Finally, the original prediction head is converted into a transformer prediction head to explore the prediction potential of a multi-head attention mechanism. The experimental results show that the CST-YOLOv5s model significantly improves the detection of corn kernel breakage and mildew. Compared with the original YOLOv5s model, the average precision (AP) of breakage and mildew recognition increased by 5.2% and 7.1%, respectively; the mean average precision (mAP) across all corn kernel classes is 96.1%; and the frame rate is 36.7 FPS. Compared with the YOLOv4-tiny, YOLOv6n, YOLOv7, YOLOv8s, and YOLOv9-E detection algorithms, the CST-YOLOv5s model has better overall performance in terms of detection accuracy and speed. 
This study can provide a reference for real-time detection of breakage and mildew kernels during the harvesting process of corn kernels. Full article
(This article belongs to the Section Digital Agriculture)
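The CBAM mechanism mentioned in the abstract combines channel and spatial attention. Its channel branch can be sketched as follows; this is a minimal NumPy illustration of the general technique, not the authors' implementation, and all weights, dimensions, and the reduction ratio here are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_channel_attention(x, w1, w2):
    """CBAM-style channel attention on a feature map x of shape (C, H, W).

    A shared two-layer MLP (w1: (C//r, C), w2: (C, C//r)) is applied to the
    average- and max-pooled channel descriptors; the sigmoid of their sum
    rescales each channel.
    """
    avg = x.mean(axis=(1, 2))                      # (C,) global average pooling
    mx = x.max(axis=(1, 2))                        # (C,) global max pooling
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # shared MLP with ReLU
    scale = sigmoid(mlp(avg) + mlp(mx))            # (C,) weights in (0, 1)
    return x * scale[:, None, None]                # reweight each channel

rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4                           # illustrative sizes only
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = cbam_channel_attention(x, w1, w2)
print(y.shape)  # (16, 8, 8)
```

In the full CBAM module, a spatial attention map (a small convolution over channel-pooled features) is applied after this channel reweighting; the sketch above shows only the channel branch.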
20 pages, 6847 KiB  
Article
Prediction of Large-Scale Regional Evapotranspiration Based on Multi-Scale Feature Extraction and Multi-Headed Self-Attention
by Xin Zheng, Sha Zhang, Jiahua Zhang, Shanshan Yang, Jiaojiao Huang, Xianye Meng and Yun Bai
Remote Sens. 2024, 16(7), 1235; https://doi.org/10.3390/rs16071235 - 31 Mar 2024
Viewed by 1483
Abstract
Accurately predicting actual evapotranspiration (ETa) at the regional scale is crucial for efficient water resource allocation and management. While previous studies have mainly focused on predicting site-scale ETa, in-depth studies of regional-scale ETa remain relatively scarce. This study addresses this gap by proposing a MulSA-ConvLSTM model, which combines a multi-headed self-attention module with the Pyramidally Attended Feature Extraction (PAFE) method. By extracting feature information and spatial dependencies across various dimensions and scales, the model uses remote sensing data from ERA5-Land and TerraClimate to achieve regional-scale ETa prediction in Shandong, China. The MulSA-ConvLSTM model captures the trend of ETa more effectively, and its predictions are more accurate than those of the comparison models: the Pearson correlation coefficient between observed and predicted values reaches 0.908. The study demonstrates that MulSA-ConvLSTM performs better in forecasting various ETa scenarios and is more responsive to climatic changes than the comparison models. Using a convolutional feature extraction approach, the PAFE method extracts global features via convolutional kernels of various sizes. The customized MulSAM module allows the model to attend to data from distinct subspaces, focusing on feature changes in multiple directions. A block-based training method is employed for the large-scale regional ETa prediction and proves effective in mitigating the constraints posed by limited hardware resources. This research provides a novel and effective method for accurately predicting regional-scale ETa. Full article
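The multi-headed self-attention at the core of MulSA-ConvLSTM can be illustrated with a minimal NumPy sketch of the general mechanism. This is not the paper's MulSAM module; the sequence length, model width, head count, and weights below are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, n_heads):
    """Standard multi-head self-attention on a sequence x of shape (T, D).

    Each (D, D) projection is split into n_heads subspaces of width D // n_heads,
    so every head attends to a different representation subspace.
    """
    T, D = x.shape
    dh = D // n_heads
    # Project and split into heads: (T, D) -> (H, T, dh)
    q = (x @ wq).reshape(T, n_heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(T, n_heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(T, n_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (H, T, T)
    out = softmax(scores) @ v                         # (H, T, dh)
    out = out.transpose(1, 0, 2).reshape(T, D)        # merge heads
    return out @ wo                                   # output projection

rng = np.random.default_rng(0)
T, D, H = 6, 8, 2                                     # illustrative sizes only
x = rng.standard_normal((T, D))
wq, wk, wv, wo = (rng.standard_normal((D, D)) * 0.1 for _ in range(4))
y = multi_head_self_attention(x, wq, wk, wv, wo, n_heads=H)
print(y.shape)  # (6, 8)
```

In the paper's setting the inputs are spatial feature maps rather than token sequences, and the attention output feeds a ConvLSTM; the sketch only shows how the multiple heads partition the feature space.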