Search Results (41)

Search Parameters:
Keywords = intersection re-identification

22 pages, 5222 KB  
Article
A Two-Stage Concrete Crack Segmentation Method Based on the Improved YOLOv11 and Segment Anything Model
by Ru Zhang, Chaodong Guan, Yi Fang, Yuanfeng Duan and Xiaodong Sui
Buildings 2026, 16(4), 794; https://doi.org/10.3390/buildings16040794 - 14 Feb 2026
Viewed by 99
Abstract
During long-term service, concrete structures are exposed to various adverse factors, which often lead to the formation of numerous surface cracks. These cracks pose serious threats to structural safety and durability. Therefore, accurately identifying crack characteristics is essential for evaluating the service performance of concrete structures. A two-stage concrete crack segmentation method is presented in this study. Cracks are first located by an improved YOLOv11 that integrates three novel modules, namely Multi-scale Edge Information Enhancement, Efficient-Detection, and P2-Level Feature Integration, to form the MEP-YOLOv11 model. The detected regions are then used as input prompts for the Segment Anything Model (SAM) to achieve precise crack segmentation. This approach eliminates the need for manual prompting in SAM, enabling automatic crack feature identification. The average accuracy, precision, and Intersection over Union (IoU) for crack segmentation are 95.98%, 92.60%, and 0.77, respectively. To further enhance the robustness of the two-stage segmentation method under non-uniform illumination, a mask re-input strategy is introduced: the crack mask generated by SAM from bounding-box prompts is fed back into SAM to guide a second round of segmentation. Experimental results demonstrate that the improved method maintains high segmentation performance, with an average accuracy of 92.38%, precision of 85.70%, and IoU of 0.64. Overall, the proposed method meets engineering requirements for high-precision, efficient crack detection and segmentation, showing strong potential for practical inspection tasks. Full article
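As a reference for the pixel-level metrics quoted in this abstract, here is a minimal NumPy sketch of accuracy, precision, and IoU over binary segmentation masks (an illustrative helper, not code from the paper):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray):
    """Accuracy, precision, and IoU for binary masks (1 = crack pixel)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.sum(pred & gt)          # crack pixels correctly predicted
    fp = np.sum(pred & ~gt)         # background predicted as crack
    fn = np.sum(~pred & gt)         # crack pixels missed
    tn = np.sum(~pred & ~gt)
    accuracy = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, precision, iou

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
acc, prec, iou = segmentation_metrics(pred, gt)
```

Note that IoU penalizes both false positives and false negatives, which is why it is typically lower than accuracy on thin structures like cracks.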

24 pages, 5044 KB  
Article
Research on Fouling Shellfish on Marine Aquaculture Cages Detection Technology Based on an Improved Symmetric Faster R-CNN Detection Algorithm
by Pengshuai Zhu, Hao Li, Junhua Chen and Chengjun Guo
Symmetry 2025, 17(12), 2107; https://doi.org/10.3390/sym17122107 - 8 Dec 2025
Viewed by 369
Abstract
The development of detection and identification technologies for biofouling organisms on marine aquaculture cages is of paramount importance for automating cleaning by Autonomous Underwater Vehicles (AUVs). The present study proposes a method for detecting fouling shellfish on marine aquaculture cages based on an improved symmetric Faster R-CNN: the original Visual Geometry Group 16-layer (VGG16) backbone is replaced with a 50-layer Residual Network with Aggregated Transformations (ResNeXt50) incorporating a Convolutional Block Attention Module (CBAM) to enhance feature extraction; the anchor box dimensions and the Intersection over Union (IoU) threshold are jointly optimised so that anchors adapt to the scale of the objects; and the Multi-Scale Retinex with Single Scale Component and Color Restoration (MSRCR) algorithm is applied for image enhancement. Experiments demonstrate that the enhanced model attains an average precision of 94.27%, a 10.31% improvement over the original model, while requiring only one-fifth of the original model's weights. At an IoU threshold of 0.5, the model attains a mean average precision (mAP) of 93.14%, surpassing numerous prevalent detection models. Furthermore, training the detection models on the image-enhanced dataset yields an average precision 11.72 percentage points higher than training on the original dataset. In summary, the proposed technical approach enables accurate and efficient detection and identification of fouling shellfish on marine aquaculture cages. Full article
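The anchor-tuning step in this abstract hinges on box IoU, the overlap measure used to decide whether an anchor matches an object at a given threshold. A self-contained sketch (illustrative, not the paper's code):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# 1x1 overlap, union 4 + 4 - 1 = 7
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Raising the matching threshold (e.g. from 0.5 upward) makes anchor assignment stricter, which is why anchor sizes and the IoU threshold are tuned together.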
(This article belongs to the Special Issue Computer Vision, Robotics, and Automation Engineering)

19 pages, 2285 KB  
Article
Real-Time Detection and Segmentation of Oceanic Whitecaps via EMA-SE-ResUNet
by Wenxuan Chen, Yongliang Wei and Xiangyi Chen
Electronics 2025, 14(21), 4286; https://doi.org/10.3390/electronics14214286 - 31 Oct 2025
Viewed by 461
Abstract
Oceanic whitecaps are caused by wave breaking and play an important role in air–sea interactions. Whitecap coverage is usually considered a key factor in representing that role. However, accurately identifying whitecap coverage in videos under dynamic marine conditions is a difficult task. An EMA-SE-ResUNet deep learning model is proposed in this study to address this challenge. Built on a residual network (ResNet-50) encoder and a U-Net decoder, the model incorporates an efficient multi-scale attention (EMA) module and a squeeze-and-excitation network (SENet) module to improve its performance. By employing a dynamic weight allocation strategy and a channel attention mechanism, the model effectively strengthens feature representation for whitecap edges while suppressing interference from wave textures and illumination noise. The model's adaptability to complex sea surface scenarios was enhanced through data augmentation and an optimized joint loss function. Applied to a dataset collected by a shipborne camera system deployed during a comprehensive fishery resource survey in the northwest Pacific, the model outperformed mainstream segmentation algorithms, including U-Net, DeepLabv3+, HRNet, and PSPNet, in key metrics: whitecap intersection over union (IoUW) = 73.32%, pixel absolute error (PAE) = 0.081%, and whitecap F1-score (F1W) = 84.60. Compared to the traditional U-Net model, it achieved an absolute improvement of 2.1% in IoUW while reducing computational load (GFLOPs) by 57.3%, achieving a synergistic optimization of accuracy and real-time performance. This study provides reliable technical support for studies of air–sea flux quantification and marine aerosol generation. Full article
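The coverage-oriented metric in this abstract can be sketched as follows, assuming PAE is the absolute difference between predicted and true whitecap coverage fractions (the exact definition is not given in the listing, so this is an assumption; function names are illustrative):

```python
import numpy as np

def whitecap_coverage(mask: np.ndarray) -> float:
    """Fraction of pixels labelled as whitecap in a binary mask."""
    return float(np.mean(mask.astype(bool)))

def pixel_absolute_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Assumed definition: |predicted coverage - true coverage|."""
    return abs(whitecap_coverage(pred) - whitecap_coverage(gt))

pred = np.array([[1, 0], [0, 0]])   # 25% coverage predicted
gt   = np.array([[1, 1], [0, 0]])   # 50% true coverage
pae = pixel_absolute_error(pred, gt)
```

A coverage metric like this complements IoU: a model can get the total coverage fraction right while still misplacing individual whitecap pixels, so both are reported.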

18 pages, 16806 KB  
Article
Refined Extraction of Sugarcane Planting Areas in Guangxi Using an Improved U-Net Model
by Tao Yue, Zijun Ling, Yuebiao Tang, Jingjin Huang, Hongteng Fang, Siyuan Ma, Jie Tang, Yun Chen and Hong Huang
Drones 2025, 9(11), 754; https://doi.org/10.3390/drones9110754 - 30 Oct 2025
Viewed by 567
Abstract
Sugarcane, a vital economic crop and renewable energy source, requires precise monitoring of its planted area to ensure sugar industry security, optimize agricultural resource allocation, and allow the assessment of ecological benefits. Guangxi Zhuang Autonomous Region, leveraging its subtropical climate and abundant solar thermal resources, accounts for over 63% of China's total sugarcane cultivation area. In this study, we constructed an enhanced RCAU-net model and developed a refined extraction framework that considers different growth stages to enable rapid identification of sugarcane planting areas. This study addresses key challenges in remote-sensing-based sugarcane extraction, namely the difficulty of distinguishing spectrally similar objects, significant background interference, and insufficient multi-scale feature fusion. To enhance the accuracy and robustness of sugarcane identification, an improved RCAU-net model based on the U-Net architecture was designed. The model incorporates three key improvements: it replaces the original encoder with ResNet50 residual modules to enhance discrimination of similar crops; it integrates a Convolutional Block Attention Module (CBAM) to focus on critical features and effectively suppress background interference; and it employs an Atrous Spatial Pyramid Pooling (ASPP) module to bridge the encoder and decoder, thereby optimizing the extraction of multi-scale contextual information. A refined extraction framework that accounts for different growth stages was ultimately constructed to achieve rapid identification of sugarcane planting areas in Guangxi. The experimental results demonstrate that the RCAU-net model performed excellently, achieving an Overall Accuracy (OA) of 97.19%, a Mean Intersection over Union (mIoU) of 94.47%, a Precision of 97.31%, and an F1 Score of 97.16%. These results represent improvements of 7.20, 10.02, 6.82, and 7.28 percentage points in OA, mIoU, Precision, and F1 Score, respectively, relative to the original U-Net. The model also achieved a Kappa coefficient of 0.9419 and a Recall of 96.99%. The incorporation of residual structures significantly reduced the misclassification of similar crops, while the CBAM and ASPP modules minimized holes within large continuous patches and false extractions of small patches, resulting in smoother boundaries for the extracted areas. This work provides reliable data support for the accurate calculation of sugarcane planting area and enhances the decision-making value of remote sensing monitoring in modern sugarcane agriculture. Full article
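The OA and mIoU figures reported for semantic segmentation models like this one are conventionally derived from a pixel confusion matrix; a minimal sketch for the multi-class case (example matrix is made up for illustration):

```python
import numpy as np

def metrics_from_confusion(cm: np.ndarray):
    """Overall accuracy and mean IoU from a KxK pixel confusion matrix
    (rows = ground truth class, columns = predicted class)."""
    tp = np.diag(cm).astype(float)
    oa = tp.sum() / cm.sum()
    # Per-class union = predicted pixels + true pixels - intersection.
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    miou = np.mean(tp / union)
    return oa, miou

cm = np.array([[50, 2],      # class 0: sugarcane
               [3, 45]])     # class 1: background
oa, miou = metrics_from_confusion(cm)
```

mIoU averages the per-class IoU, so it is stricter than OA when one class (here, sugarcane) is confused with spectrally similar crops.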

21 pages, 5105 KB  
Article
A Dynamic Kalman Filtering Method for Multi-Object Fruit Tracking and Counting in Complex Orchards
by Yaning Zhai, Ling Zhang, Xin Hu, Fanghu Yang and Yang Huang
Sensors 2025, 25(13), 4138; https://doi.org/10.3390/s25134138 - 2 Jul 2025
Cited by 4 | Viewed by 1805
Abstract
With the rapid development of agricultural intelligence in recent years, automatic fruit detection and counting technologies have become increasingly significant for optimizing orchard management and advancing precision agriculture. However, existing deep learning-based models are primarily designed to process static, single-frame images, and therefore fail to meet the large-scale detection and counting demands of the dynamically changing scenes of modern orchards. To address these challenges, this paper proposes a multi-object fruit tracking and counting method, which integrates an improved YOLO-based object detection algorithm with a dynamically optimized Kalman filter. By optimizing the network structure, the improved YOLO detection model provides high-quality detection results for subsequent tracking tasks. A modified Kalman filter with a variable forgetting factor is then integrated to dynamically adjust the weighting of historical data, enabling the model to adapt to changes in observation and motion noise. Moreover, fruit targets are associated using a combined strategy based on Intersection over Union (IoU) and Re-Identification (Re-ID) features, improving the accuracy and stability of object matching. Consequently, continuous tracking and precise counting of fruits in video sequences are achieved. Experiments on fruit video sequences show that the proposed method performs robust, continuous tracking (MOTA of 95.0% and HOTA of 82.4%). For fruit counting, the method attains a high coefficient of determination (R²) of 0.85 and a low root-mean-square error (RMSE) of 1.57, exhibiting high accuracy and stability of fruit detection, tracking, and counting in video sequences under complex orchard environments. Full article
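The forgetting-factor idea in this abstract can be illustrated with a scalar Kalman filter: a factor λ ≥ 1 inflates the predicted covariance each step, discounting stale history so the filter reacts faster to changing noise. This is a generic sketch under an assumed constant-position model, not the paper's formulation (q, r, and lam values are illustrative):

```python
def kalman_step(x, p, z, q=0.01, r=0.1, lam=1.05):
    """One 1-D Kalman update with forgetting factor lam >= 1.
    x, p: state estimate and its variance; z: new measurement."""
    # Predict: the forgetting factor scales the prior covariance,
    # so older data carries less weight in the next update.
    p_pred = lam * p + q
    # Update: standard Kalman gain and correction.
    k = p_pred / (p_pred + r)
    x_new = x + k * (z - x)
    p_new = (1 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0
for z in [1.0, 1.2, 0.9, 1.1]:
    x, p = kalman_step(x, p, z)
```

With lam = 1.0 this reduces to the ordinary Kalman filter; larger values trade smoothness for responsiveness.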
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)

24 pages, 7924 KB  
Article
Optimizing Car Collision Detection Using Large Dashcam-Based Datasets: A Comparative Study of Pre-Trained Models and Hyperparameter Configurations
by Muhammad Shahid, Martin Gregurić, Amirhossein Hassani and Marko Ševrović
Appl. Sci. 2025, 15(13), 7001; https://doi.org/10.3390/app15137001 - 21 Jun 2025
Viewed by 2894
Abstract
The automatic identification of traffic collisions is an emerging topic in modern traffic surveillance systems. The increasing number of surveillance cameras at urban intersections connected to traffic surveillance systems has created new opportunities for leveraging computer vision techniques for automatic collision detection. This study investigates the effectiveness of transfer learning with pre-trained deep learning models for collision detection from dashcam images. We evaluated several state-of-the-art (SOTA) image classification models and fine-tuned them using different hyperparameter combinations to test their performance on the car collision detection problem. Our methodology systematically investigates the influence of optimizers, loss functions, schedulers, and learning rates on model generalization. A comprehensive analysis is conducted using seven performance metrics to assess classification performance. Experiments on a large dashcam-based image dataset show that ResNet50, optimized with AdamW, a learning rate of 0.0001, a CosineAnnealingLR scheduler, and Focal Loss, emerged as the top performer, achieving an accuracy of 0.9782, an F1-score of 0.9617, and an IoU of 0.9262, indicating a strong ability to reduce false negatives. Full article
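The Focal Loss named in this abstract down-weights easy, well-classified examples so training concentrates on hard ones; a minimal sketch for a single binary example (γ = 2 and α = 0.25 are common defaults, not values reported in the listing):

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example.
    p: predicted probability of class 1; y: true label in {0, 1}.
    (1 - p_t)**gamma shrinks the loss for confident correct predictions."""
    p_t = p if y == 1 else 1 - p
    a_t = alpha if y == 1 else 1 - alpha
    return -a_t * (1 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(0.9, 1)   # confident and correct -> tiny loss
hard = binary_focal_loss(0.1, 1)   # confident and wrong -> large loss
```

On imbalanced collision datasets (few positives among many negatives), this focusing effect is the usual motivation for choosing focal loss over plain cross-entropy.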

15 pages, 2794 KB  
Article
Improvement of Mask R-CNN Algorithm for Ore Segmentation
by Kai Tang, Yuguo Pei, Xiaobo Wang and Leilei Qu
Electronics 2025, 14(10), 2025; https://doi.org/10.3390/electronics14102025 - 16 May 2025
Cited by 5 | Viewed by 2822
Abstract
In response to the low precision of ore image segmentation under complex working conditions, an improved Mask R-CNN segmentation algorithm is proposed. The traditional Mask R-CNN uses a simple deconvolution operation to generate masks, which can lead to the loss of ore edge information and insufficient detail processing, degrading segmentation accuracy. Therefore, an improved model based on the Mask R-CNN framework is proposed in this paper. By introducing the Re-parameterized Refocus Convolution (RefConv) into the residual networks, the expressive power of the feature extraction network is enhanced. Meanwhile, Efficient Channel Attention (ECA) is embedded in the output part of the Feature Pyramid Network (FPN), enhancing the model's ability to capture key information. The improved Mask R-CNN network structure reduces the loss of ore detail information caused by convolution operations and improves segmentation accuracy. Comparative experiments between the improved algorithm and the original algorithm show that the mean Intersection over Union (mIoU) of the improved algorithm reached 92.8%, about a 6.8% increase over the original Mask R-CNN algorithm, and the mean pixel accuracy (mPA) is 97.2%, about a 5.1% increase over the original algorithm, indicating higher detection accuracy for ore identification and segmentation. Full article

21 pages, 5384 KB  
Article
A Video SAR Multi-Target Tracking Algorithm Based on Re-Identification Features and Multi-Stage Data Association
by Anxi Yu, Boxu Wei, Wenhao Tong, Zhihua He and Zhen Dong
Remote Sens. 2025, 17(6), 959; https://doi.org/10.3390/rs17060959 - 8 Mar 2025
Viewed by 2058
Abstract
Video Synthetic Aperture Radar (ViSAR) operates by continuously monitoring regions of interest to produce sequences of SAR imagery. The detection and tracking of ground-moving targets, through the analysis of their radiation properties and temporal variations relative to the background environment, represents a significant area of focus and innovation within the SAR research community. In this study, some key challenges in ViSAR systems are addressed, including the abundance of low-confidence shadow detections, high error rates in multi-target data association, and the frequent fragmentation of tracking trajectories. A multi-target tracking algorithm for ViSAR that utilizes re-identification (ReID) features and a multi-stage data association process is proposed. The algorithm extracts high-dimensional ReID features using the DenseNet121 network for enhanced shadow detection and calculates a cost matrix by integrating ReID feature cosine similarity with Intersection over Union similarity. A confidence-based multi-stage data association strategy is implemented to minimize missed detections and trajectory fragmentation. Kalman filtering is then employed to update trajectory states based on shadow detection. Both simulation experiments and actual data processing experiments have demonstrated that, in comparison to two traditional video multi-target tracking algorithms, DeepSORT and ByteTrack, the newly proposed algorithm exhibits superior performance in ViSAR multi-target tracking, yielding the highest MOTA and HOTA scores of 94.85% and 92.88%, respectively, on the simulated spaceborne ViSAR data, and the highest MOTA and HOTA scores of 82.94% and 69.74%, respectively, on airborne field data. Full article
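A cost matrix that mixes ReID cosine similarity with IoU similarity, as described in this abstract, can be sketched as below. The blending weight `lam` and the linear combination are assumptions for illustration; the paper's exact fusion rule is not given in the listing:

```python
import numpy as np

def association_cost(reid_tracks, reid_dets, iou_matrix, lam=0.5):
    """Cost matrix mixing ReID cosine similarity and IoU similarity.
    reid_tracks: (T, D) track features; reid_dets: (N, D) detection features;
    iou_matrix: (T, N) box IoU. Lower cost = better match."""
    a = reid_tracks / np.linalg.norm(reid_tracks, axis=1, keepdims=True)
    b = reid_dets / np.linalg.norm(reid_dets, axis=1, keepdims=True)
    cos_sim = a @ b.T                        # appearance similarity in [-1, 1]
    sim = lam * cos_sim + (1 - lam) * iou_matrix
    return 1.0 - sim

tracks = np.array([[1.0, 0.0], [0.0, 1.0]])  # two track features
dets   = np.array([[1.0, 0.0]])              # one detection feature
iou    = np.array([[0.8], [0.1]])
cost = association_cost(tracks, dets, iou)
```

The resulting matrix is what an assignment solver (e.g. the Hungarian algorithm) consumes to match tracks to detections.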
(This article belongs to the Special Issue Temporal and Spatial Analysis of Multi-Source Remote Sensing Images)

21 pages, 5349 KB  
Article
RST-DeepLabv3+: Multi-Scale Attention for Tailings Pond Identification with DeepLab
by Xiangrui Feng, Caiyong Wei, Xiaojing Xue, Qian Zhang and Xiangnan Liu
Remote Sens. 2025, 17(3), 411; https://doi.org/10.3390/rs17030411 - 25 Jan 2025
Cited by 4 | Viewed by 2094
Abstract
Tailing ponds are used to store tailings or industrial waste discharged after beneficiation. Identifying these ponds in advance can help prevent pollution incidents and reduce their harmful impacts on ecosystems. Tailing ponds are traditionally identified via manual inspection, which is time-consuming and labor-intensive; identification based on computer vision is therefore of practical significance for environmental protection and safety. In remote sensing imagery, identifying tailings ponds is challenging because high-resolution images capture extensive feature details, such as shape, location, and texture, complicated by the mixing of tailings with other waste materials. This results in substantial intra-class variance and limited inter-class variance, making accurate recognition difficult. To monitor tailing ponds, this study used an improved version of DeepLabv3+, a widely recognized deep learning model for semantic segmentation. We introduced the multi-scale attention modules ResNeSt and SENet into the DeepLabv3+ encoder. The split-attention module in ResNeSt captures multi-scale information when processing multiple sets of feature maps, while the SENet module focuses on channel attention, improving the model's ability to distinguish tailings ponds from other materials in images. Additionally, the tailing pond semantic segmentation dataset NX-TPSet was established based on Gaofen-6 imagery. Ablation experiments show that the recognition accuracy (Intersection over Union, IoU) of the RST-DeepLabv3+ model improved by 1.19% to 93.48% over DeepLabv3+. The multi-scale attention modules enable the model to integrate multi-scale features more effectively, which not only improves segmentation accuracy but also contributes directly to more reliable and efficient monitoring of tailings ponds. The proposed approach achieves top performance on two benchmark datasets, NX-TPSet and TPSet, demonstrating its effectiveness as a practical method for real-world tailing pond identification. Full article

20 pages, 4082 KB  
Article
A Comparative Study of Decoders for Liver and Tumor Segmentation Using a Self-ONN-Based Cascaded Framework
by Sidra Gul, Muhammad Salman Khan, Md Sakib Abrar Hossain, Muhammad E. H. Chowdhury and Md. Shaheenur Islam Sumon
Diagnostics 2024, 14(23), 2761; https://doi.org/10.3390/diagnostics14232761 - 8 Dec 2024
Cited by 4 | Viewed by 2442
Abstract
Background/Objectives: Accurate liver and tumor detection and segmentation are crucial in the diagnosis of early-stage liver malignancies. As opposed to manual interpretation, which is a difficult and time-consuming process, accurate tumor detection using a computer-aided diagnosis system can save both time and human effort. Methods: We propose a cascaded encoder–decoder technique based on self-organized neural networks, a recent variant of operational neural networks (ONNs), for accurate segmentation and identification of liver tumors. The first encoder–decoder CNN segments the liver. To generate the liver region of interest, the segmented liver mask is applied to the input computed tomography (CT) image, which is then fed to the second Self-ONN model for tumor segmentation. Three other distinct encoder–decoder architectures, U-Net, feature pyramid networks (FPNs), and U-Net++, were also investigated by altering the encoder backbones using ResNet and DenseNet variants for transfer learning. Results: For the liver segmentation task, Self-ONN with a ResNet18 backbone achieved a dice similarity coefficient (DSC) of 98.182% and an intersection over union (IoU) of 97.436%. Tumor segmentation with Self-ONN and the DenseNet201 encoder resulted in an outstanding DSC of 92.836% and IoU of 91.748%. Conclusions: The suggested method is capable of precisely locating liver tumors of various sizes and shapes, including tiny infection patches that earlier research reported as challenging to find. Full article
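The DSC and IoU figures in this abstract are tightly related: for binary masks, DSC = 2·IoU / (1 + IoU). A minimal sketch computing both (illustrative helper, not the paper's code):

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, gt: np.ndarray):
    """Dice similarity coefficient and IoU for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.sum(pred & gt)
    dsc = 2 * inter / (pred.sum() + gt.sum())
    iou = inter / np.sum(pred | gt)
    return dsc, iou

pred = np.array([1, 1, 0, 0])
gt   = np.array([1, 0, 1, 0])
dsc, iou = dice_and_iou(pred, gt)
```

Because of the 2·IoU / (1 + IoU) relationship, DSC is always at least as large as IoU, which matches the pattern in the reported results (98.182% DSC vs. 97.436% IoU).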
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

17 pages, 1492 KB  
Article
Deep Learning-Based Infrared Image Segmentation for Aircraft Honeycomb Water Ingress Detection
by Hang Fei, Hongfu Zuo, Han Wang, Yan Liu, Zhenzhen Liu and Xin Li
Aerospace 2024, 11(12), 961; https://doi.org/10.3390/aerospace11120961 - 22 Nov 2024
Cited by 1 | Viewed by 1903
Abstract
The presence of water accumulation on aircraft surfaces constitutes a considerable hazard to both performance and safety, necessitating vigilant inspection and maintenance protocols. In this study, we introduce an innovative semantic segmentation model, grounded in deep learning principles, for the precise identification and delineation of water accumulation areas within infrared images of aircraft exteriors. Our proposed model harnesses the robust features of ResNet, serving as the foundational architecture for U-Net, thereby augmenting the model's capacity for comprehensive feature characterization. The incorporation of channel attention mechanisms, spatial attention mechanisms, and depthwise separable convolution further refines the network structure, contributing to enhanced segmentation performance. Through rigorous experimentation, our model surpasses existing benchmarks, yielding a 22.44% reduction in computational effort and a substantial 38.89% reduction in parameter count, while registering a 92.67% mean intersection over union and a 97.97% mean pixel accuracy. The hallmark of our innovation lies in the model's efficacy in the precise detection and segmentation of water accumulation areas on aircraft skin. Beyond this, our approach holds promise for addressing analogous challenges in aviation and related domains. These quantitative outcomes, together with the demonstrated reductions in computational effort and parameter count, underscore the model's accuracy and efficiency and its relevance in broader contexts. Full article
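The parameter savings attributed to depthwise separable convolution in this abstract follow from simple counting: a standard k×k convolution uses c_in·c_out·k² weights, while the depthwise-plus-pointwise factorization uses c_in·k² + c_in·c_out. A sketch of the arithmetic (channel counts are illustrative):

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise conv."""
    return c_in * k * k + c_in * c_out

std = conv_params(64, 128, 3)                  # 64 * 128 * 9
sep = depthwise_separable_params(64, 128, 3)   # 64 * 9 + 64 * 128
reduction = 1 - sep / std
```

For typical channel widths the factorized form needs roughly k²-fold fewer weights, which is the mechanism behind the reported parameter and FLOP reductions.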
(This article belongs to the Section Aeronautics)

18 pages, 3237 KB  
Article
Lightweight Wheat Spike Detection Method Based on Activation and Loss Function Enhancements for YOLOv5s
by Jingsong Li, Feijie Dai, Haiming Qian, Linsheng Huang and Jinling Zhao
Agronomy 2024, 14(9), 2036; https://doi.org/10.3390/agronomy14092036 - 6 Sep 2024
Cited by 5 | Viewed by 1461
Abstract
Wheat spike count is one of the critical indicators for assessing the growth and yield of wheat. However, illumination variations, mutual occlusion, and background interference greatly affect wheat spike detection. A lightweight detection method based on YOLOv5s was proposed. Initially, the original YOLOv5s was improved by combining an additional small-scale detection layer with the ECA (Efficient Channel Attention) mechanism integrated into all C3 modules (YOLOv5s + 4 + ECAC3). After comparing GhostNet, ShuffleNetV2, and MobileNetV3, the GhostNet architecture was selected as the optimal lightweight model framework based on its superior performance in various evaluations. Subsequently, the incorporation of five different activation functions into the network led to the identification of the RReLU (Randomized Leaky ReLU) activation function as the most effective in augmenting the network's performance. Ultimately, the network's CIoU (Complete Intersection over Union) loss function was replaced with the EIoU (Efficient Intersection over Union) loss function. Despite a minor reduction of 2.17% in accuracy for the refined YOLOv5s + 4 + ECAC3 + G + RR + E network compared to YOLOv5s + 4 + ECAC3, there was a marginal improvement of 0.77% over the original YOLOv5s. Furthermore, the parameter count was reduced by 32% and 28.2% relative to YOLOv5s + 4 + ECAC3 and YOLOv5s, respectively. The model size was reduced by 28.0% and 20%, and the Giga Floating-point Operations Per Second (GFLOPs) were lowered by 33.2% and 9.5%, respectively, signifying a substantial improvement in the network's efficiency without significantly compromising accuracy. This study offers a methodological reference for the rapid and accurate detection of agricultural objects through the enhancement of a deep learning network. Full article
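The RReLU activation selected in this abstract samples a random negative slope during training and uses the fixed mean slope at inference; a scalar sketch (the bounds 1/8 and 1/3 are the common defaults, e.g. in PyTorch, not values stated in the listing):

```python
import random

def rrelu(x, lower=1/8, upper=1/3, training=False, rng=random):
    """Randomized Leaky ReLU for a scalar input.
    Training: negative slope drawn uniformly from [lower, upper].
    Inference: fixed slope (lower + upper) / 2."""
    if x >= 0:
        return x
    slope = rng.uniform(lower, upper) if training else (lower + upper) / 2
    return slope * x

y_pos = rrelu(2.0)    # positive inputs pass through unchanged
y_neg = rrelu(-2.0)   # scaled by the mean slope at inference
```

The randomized slope acts as a mild regularizer on the negative half of the activation, which is the usual reason it can edge out plain LeakyReLU.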
(This article belongs to the Section Precision and Digital Agriculture)

20 pages, 7094 KB  
Article
DualNet-PoiD: A Hybrid Neural Network for Highly Accurate Recognition of POIs on Road Networks in Complex Areas with Urban Terrain
by Yongchuan Zhang, Caixia Long, Jiping Liu, Yong Wang and Wei Yang
Remote Sens. 2024, 16(16), 3003; https://doi.org/10.3390/rs16163003 - 16 Aug 2024
Cited by 2 | Viewed by 1821
Abstract
For high-precision navigation, obtaining and maintaining high-precision point-of-interest (POI) data on the road network is crucial. In urban areas with complex terrains, the accuracy of traditional road network POI acquisition methods often falls short. To address this issue, we introduce DualNet-PoiD, a hybrid neural network designed for the efficient recognition of road network POIs in intricate urban environments. This method leverages multimodal sensory data, incorporating both vehicle trajectories and remote sensing imagery. Through an enhanced dual-attention dilated link network (DAD-LinkNet) based on ResNet18, the system extracts static geometric features of roads from remote sensing images. Concurrently, an improved gated recurrent unit (GRU) captures dynamic traffic characteristics implied by vehicle trajectories. The integration of a fully connected layer (FC) enables the high-precision identification of various POIs, including traffic light intersections, gas stations, parking lots, and tunnels. To validate the efficacy of DualNet-PoiD, we collected 500 remote sensing images and 50,000 taxi trajectory data samples covering road POIs in the central urban area of the mountainous city of Chongqing. Through comprehensive area comparison experiments, DualNet-PoiD demonstrated a high recognition accuracy of 91.30%, performing robustly even under conditions of complex occlusion. This confirms the network's capability to significantly improve POI detection in challenging urban settings. Full article

16 pages, 7412 KB  
Article
An Identification Method for Mixed Coal Vitrinite Components Based on An Improved DeepLabv3+ Network
by Fujie Wang, Fanfan Li, Wei Sun, Xiaozhong Song and Huishan Lu
Energies 2024, 17(14), 3453; https://doi.org/10.3390/en17143453 - 13 Jul 2024
Cited by 1 | Viewed by 1666
Abstract
To address the high complexity and low accuracy issues of traditional methods in mixed coal vitrinite identification, this paper proposes a method based on an improved DeepLabv3+ network. First, MobileNetV2 is used as the backbone network to reduce the number of parameters. Second, an atrous convolution layer with a dilation rate of 24 is added to the ASPP (atrous spatial pyramid pooling) module to further increase the receptive field. Meanwhile, a CBAM (convolutional block attention module) attention mechanism with a channel multiplier of 8 is introduced at the output part of the ASPP module to better filter out important semantic features. Then, a corrective convolution module is added to the network’s output to ensure the consistency of each channel’s output feature map for each type of vitrinite. Finally, images of 14 single vitrinite components are used as training samples for network training, and a validation set is used for identification testing. The results show that the improved DeepLabv3+ achieves 6.14% and 3.68% improvements in MIOU (mean intersection over union) and MPA (mean pixel accuracy), respectively, compared to the original DeepLabv3+; 12% and 5.3% improvements compared to U-Net; 9.26% and 4.73% improvements compared to PSPNet with ResNet as the backbone; 5.4% and 9.34% improvements compared to PSPNet with MobileNetV2 as the backbone; and 6.46% and 9.05% improvements compared to HRNet. Additionally, the improved ASPP module increases MIOU and MPA by 3.23% and 1.93%, respectively, compared to the original module. The CBAM attention mechanism with a channel multiplier of 8 improves MIOU and MPA by 1.97% and 1.72%, respectively, compared to the original channel multiplier of 16. The data indicate that the proposed identification method significantly improves recognition accuracy and can be effectively applied to mixed coal vitrinite identification.
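The channel-attention step described above can be illustrated in NumPy. This is a minimal sketch of the standard CBAM channel-attention formulation (shared two-layer MLP over pooled descriptors, with the bottleneck set by the reduction factor, which plays the role of the channel multiplier in the abstract); the weights are random toy values, not the authors' trained module:

```python
import numpy as np

def channel_attention(feat, reduction=8):
    """CBAM-style channel attention on a (C, H, W) feature map.

    Global average- and max-pooled channel descriptors pass through a
    shared two-layer MLP with bottleneck C // reduction, are summed,
    squashed with a sigmoid, and used to rescale each channel.
    """
    c = feat.shape[0]
    rng = np.random.default_rng(42)
    w1 = rng.normal(scale=0.1, size=(c // reduction, c))  # shared MLP, squeeze layer
    w2 = rng.normal(scale=0.1, size=(c, c // reduction))  # shared MLP, excite layer

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU bottleneck

    avg = feat.mean(axis=(1, 2))  # (C,) global average pool
    mx = feat.max(axis=(1, 2))    # (C,) global max pool
    scale = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # per-channel gate in (0, 1)
    return feat * scale[:, None, None]

feat = np.random.default_rng(0).normal(size=(16, 8, 8))
out = channel_attention(feat, reduction=8)
```

A smaller reduction factor (8 instead of 16) widens the MLP bottleneck, which is what the abstract's comparison of channel multipliers varies.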
(This article belongs to the Special Issue Factor Analysis and Mathematical Modeling of Coals)

15 pages, 5037 KB  
Article
Aerial Image Segmentation of Nematode-Affected Pine Trees with U-Net Convolutional Neural Network
by Jiankang Shen, Qinghua Xu, Mingyang Gao, Jicai Ning, Xiaopeng Jiang and Meng Gao
Appl. Sci. 2024, 14(12), 5087; https://doi.org/10.3390/app14125087 - 11 Jun 2024
Cited by 8 | Viewed by 2034
Abstract
Pine wood nematode disease, commonly referred to as pine wilt, poses a grave threat to forest health, leading to profound ecological and economic impacts. Originating from the pine wood nematode, this disease not only causes the demise of pine trees but also casts a long shadow over the entire forest ecosystem. The accurate identification of infected trees stands as a pivotal initial step in developing effective prevention and control measures for pine wilt. Nevertheless, existing identification methods face challenges in precisely determining the disease status of individual pine trees, impeding early detection and efficient intervention. In this study, we leverage the capabilities of unmanned aerial vehicle (UAV) remote sensing technology and integrate the classical small-kernel VGG convolutional network with U-Net to detect diseased pine trees. This approach captures the spatial and characteristic intricacies of infected trees, converting them into high-dimensional features through multiple convolutions within the VGG network. This method significantly reduces the parameter count while enhancing the sensing range. The results obtained from our validation set are remarkably promising, achieving a Mean Intersection over Union (MIoU) of 81.62%, a Mean Pixel Accuracy (MPA) of 85.13%, an Accuracy of 99.13%, and an F1 Score of 88.50%. These figures surpass those obtained using other methods such as ResNet50 and DeepLab v3+. The methodology presented in this research facilitates rapid and accurate monitoring of pine trees infected with nematodes, offering invaluable technical assistance in the prevention and management of pine wilt disease.
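The MIoU and MPA figures reported across these abstracts follow the usual confusion-matrix definitions: per-class intersection over union and per-class pixel accuracy, each averaged over classes. A minimal NumPy sketch (the toy label maps below are invented for illustration):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """MIoU and MPA from integer label maps, via a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)   # rows: ground truth, cols: prediction
    tp = np.diag(cm).astype(float)
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp)  # per-class intersection / union
    pa = tp / cm.sum(axis=1)                           # per-class pixel accuracy
    return iou.mean(), pa.mean()

# Toy 2x4 binary masks: one background pixel mislabeled as class 0.
gt   = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1]])
pred = np.array([[0, 0, 1, 0],
                 [0, 1, 1, 1]])
miou, mpa = segmentation_metrics(pred, gt, num_classes=2)  # 0.775, 0.9
```

Note that overall pixel Accuracy can sit far above MIoU (99.13% vs. 81.62% in this study) when one class, here healthy canopy and background, dominates the image.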
