Search Results (41)

Search Parameters:
Keywords = intersection re-identification

22 pages, 5222 KB  
Article
A Two-Stage Concrete Crack Segmentation Method Based on the Improved YOLOv11 and Segment Anything Model
by Ru Zhang, Chaodong Guan, Yi Fang, Yuanfeng Duan and Xiaodong Sui
Buildings 2026, 16(4), 794; https://doi.org/10.3390/buildings16040794 - 14 Feb 2026
Viewed by 99
Abstract
During long-term service, concrete structures are exposed to various adverse factors, which often lead to the formation of numerous surface cracks. These cracks pose serious threats to structural safety and durability. Therefore, accurately identifying crack characteristics is essential for evaluating the service performance of concrete structures. A two-stage concrete crack segmentation method is presented in this study. Cracks are first located by an improved YOLOv11 that integrates three novel modules, namely Multi-scale Edge Information Enhancement, Efficient-Detection, and P2-Level Feature Integration, to form the MEP-YOLOv11 model. The detected regions are then used as input prompts for the Segment Anything Model (SAM) to achieve precise crack segmentation. This approach eliminates the need for manual prompting in SAM, enabling automatic crack feature identification. The average accuracy, precision, and Intersection over Union (IoU) for crack segmentation are 95.98%, 92.60%, and 0.77, respectively. To further enhance the robustness of the two-stage segmentation method under non-uniform illumination, a mask re-input strategy is introduced: the crack mask generated by SAM from bounding-box prompts is fed back into SAM to guide a second round of segmentation. Experimental results demonstrate that the improved method maintains high segmentation performance, with an average accuracy of 92.38%, precision of 85.70%, and IoU of 0.64. Overall, the proposed method meets engineering requirements for high-precision, efficient crack detection and segmentation, showing strong potential for practical inspection tasks. Full article
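As a reference for the pixel-level metrics quoted in this abstract, here is a minimal NumPy sketch of accuracy, precision, and IoU over binary segmentation masks (an illustrative helper, not code from the paper):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray):
    """Accuracy, precision, and IoU for binary masks (1 = crack pixel)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.sum(pred & gt)          # crack pixels correctly predicted
    fp = np.sum(pred & ~gt)         # background predicted as crack
    fn = np.sum(~pred & gt)         # crack pixels missed
    tn = np.sum(~pred & ~gt)
    accuracy = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, precision, iou

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
acc, prec, iou = segmentation_metrics(pred, gt)
```

Note that IoU penalizes both false positives and false negatives, which is why it is typically lower than accuracy on thin structures like cracks.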

24 pages, 5044 KB  
Article
Research on Fouling Shellfish on Marine Aquaculture Cages Detection Technology Based on an Improved Symmetric Faster R-CNN Detection Algorithm
by Pengshuai Zhu, Hao Li, Junhua Chen and Chengjun Guo
Symmetry 2025, 17(12), 2107; https://doi.org/10.3390/sym17122107 - 8 Dec 2025
Viewed by 369
Abstract
The development of detection and identification technologies for biofouling organisms on marine aquaculture cages is of paramount importance for automating cleaning by Autonomous Underwater Vehicles (AUVs). The present study proposes a method for detecting fouling shellfish on marine aquaculture cages based on an improved symmetric Faster R-CNN: the original Visual Geometry Group 16-layer (VGG16) backbone is replaced with a 50-layer Residual Network with Aggregated Transformations (ResNeXt50) incorporating a Convolutional Block Attention Module (CBAM) to enhance feature extraction; the anchor box dimensions and the Intersection over Union (IoU) threshold are jointly optimised so that anchors adapt to the scale of the objects; and the Multi-Scale Retinex with Single Scale Component and Color Restoration (MSRCR) algorithm is applied for image enhancement. Experiments demonstrate that the enhanced model attains an average precision of 94.27%, a 10.31% improvement over the original model, while requiring only one-fifth of the original model's weights. At an IoU threshold of 0.5, the model attains a mean average precision (mAP) of 93.14%, surpassing numerous prevalent detection models. Furthermore, training the detection models on the image-enhanced dataset yields an average precision 11.72 percentage points higher than training on the original dataset. In summary, the proposed technical approach enables accurate and efficient detection and identification of fouling shellfish on marine aquaculture cages. Full article
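The anchor-tuning step in this abstract hinges on box IoU, the overlap measure used to decide whether an anchor matches an object at a given threshold. A self-contained sketch (illustrative, not the paper's code):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# 1x1 overlap, union 4 + 4 - 1 = 7
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Raising the matching threshold (e.g. from 0.5 upward) makes anchor assignment stricter, which is why anchor sizes and the IoU threshold are tuned together.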
(This article belongs to the Special Issue Computer Vision, Robotics, and Automation Engineering)

19 pages, 2285 KB  
Article
Real-Time Detection and Segmentation of Oceanic Whitecaps via EMA-SE-ResUNet
by Wenxuan Chen, Yongliang Wei and Xiangyi Chen
Electronics 2025, 14(21), 4286; https://doi.org/10.3390/electronics14214286 - 31 Oct 2025
Viewed by 461
Abstract
Oceanic whitecaps are caused by wave breaking and play an important role in air–sea interactions. Whitecap coverage is usually considered a key factor in representing that role. However, accurately identifying whitecap coverage in videos under dynamic marine conditions is a difficult task. An EMA-SE-ResUNet deep learning model is proposed in this study to address this challenge. Built on a residual network (ResNet-50) encoder and a U-Net decoder, the model incorporates an efficient multi-scale attention (EMA) module and a squeeze-and-excitation network (SENet) module to improve its performance. By employing a dynamic weight allocation strategy and a channel attention mechanism, the model effectively strengthens feature representation for whitecap edges while suppressing interference from wave textures and illumination noise. The model's adaptability to complex sea surface scenarios was enhanced through data augmentation and an optimized joint loss function. Applied to a dataset collected by a shipborne camera system deployed during a comprehensive fishery resource survey in the northwest Pacific, the model outperformed mainstream segmentation algorithms, including U-Net, DeepLabv3+, HRNet, and PSPNet, in key metrics: whitecap intersection over union (IoUW) = 73.32%, pixel absolute error (PAE) = 0.081%, and whitecap F1-score (F1W) = 84.60. Compared to the traditional U-Net model, it achieved an absolute improvement of 2.1% in IoUW while reducing computational load (GFLOPs) by 57.3%, achieving a synergistic optimization of accuracy and real-time performance. This study provides reliable technical support for studies of air–sea flux quantification and marine aerosol generation. Full article
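The coverage-oriented metric in this abstract can be sketched as follows, assuming PAE is the absolute difference between predicted and true whitecap coverage fractions (the exact definition is not given in the listing, so this is an assumption; function names are illustrative):

```python
import numpy as np

def whitecap_coverage(mask: np.ndarray) -> float:
    """Fraction of pixels labelled as whitecap in a binary mask."""
    return float(np.mean(mask.astype(bool)))

def pixel_absolute_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Assumed definition: |predicted coverage - true coverage|."""
    return abs(whitecap_coverage(pred) - whitecap_coverage(gt))

pred = np.array([[1, 0], [0, 0]])   # 25% coverage predicted
gt   = np.array([[1, 1], [0, 0]])   # 50% true coverage
pae = pixel_absolute_error(pred, gt)
```

A coverage metric like this complements IoU: a model can get the total coverage fraction right while still misplacing individual whitecap pixels, so both are reported.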

18 pages, 16806 KB  
Article
Refined Extraction of Sugarcane Planting Areas in Guangxi Using an Improved U-Net Model
by Tao Yue, Zijun Ling, Yuebiao Tang, Jingjin Huang, Hongteng Fang, Siyuan Ma, Jie Tang, Yun Chen and Hong Huang
Drones 2025, 9(11), 754; https://doi.org/10.3390/drones9110754 - 30 Oct 2025
Viewed by 567
Abstract
Sugarcane, a vital economic crop and renewable energy source, requires precise monitoring of its planted area to ensure sugar industry security, optimize agricultural resource allocation, and allow the assessment of ecological benefits. Guangxi Zhuang Autonomous Region, leveraging its subtropical climate and abundant solar thermal resources, accounts for over 63% of China's total sugarcane cultivation area. In this study, we constructed an enhanced RCAU-net model and developed a refined extraction framework that considers different growth stages to enable rapid identification of sugarcane planting areas. This study addresses key challenges in remote-sensing-based sugarcane extraction, namely the difficulty of distinguishing spectrally similar objects, significant background interference, and insufficient multi-scale feature fusion. To enhance the accuracy and robustness of sugarcane identification, an improved RCAU-net model based on the U-Net architecture was designed. The model incorporates three key improvements: it replaces the original encoder with ResNet50 residual modules to enhance discrimination of similar crops; it integrates a Convolutional Block Attention Module (CBAM) to focus on critical features and effectively suppress background interference; and it employs an Atrous Spatial Pyramid Pooling (ASPP) module to bridge the encoder and decoder, thereby optimizing the extraction of multi-scale contextual information. A refined extraction framework that accounts for different growth stages was ultimately constructed to achieve rapid identification of sugarcane planting areas in Guangxi. The experimental results demonstrate that the RCAU-net model performed excellently, achieving an Overall Accuracy (OA) of 97.19%, a Mean Intersection over Union (mIoU) of 94.47%, a Precision of 97.31%, and an F1 Score of 97.16%. These results represent improvements of 7.20, 10.02, 6.82, and 7.28 percentage points in OA, mIoU, Precision, and F1 Score, respectively, relative to the original U-Net. The model also achieved a Kappa coefficient of 0.9419 and a Recall of 96.99%. The incorporation of residual structures significantly reduced the misclassification of similar crops, while the CBAM and ASPP modules minimized holes within large continuous patches and false extractions of small patches, resulting in smoother boundaries for the extracted areas. This work provides reliable data support for the accurate calculation of sugarcane planting area and enhances the decision-making value of remote sensing monitoring in modern sugarcane agriculture. Full article
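The OA and mIoU figures reported for semantic segmentation models like this one are conventionally derived from a pixel confusion matrix; a minimal sketch for the multi-class case (example matrix is made up for illustration):

```python
import numpy as np

def metrics_from_confusion(cm: np.ndarray):
    """Overall accuracy and mean IoU from a KxK pixel confusion matrix
    (rows = ground truth class, columns = predicted class)."""
    tp = np.diag(cm).astype(float)
    oa = tp.sum() / cm.sum()
    # Per-class union = predicted pixels + true pixels - intersection.
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    miou = np.mean(tp / union)
    return oa, miou

cm = np.array([[50, 2],      # class 0: sugarcane
               [3, 45]])     # class 1: background
oa, miou = metrics_from_confusion(cm)
```

mIoU averages the per-class IoU, so it is stricter than OA when one class (here, sugarcane) is confused with spectrally similar crops.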

21 pages, 5105 KB  
Article
A Dynamic Kalman Filtering Method for Multi-Object Fruit Tracking and Counting in Complex Orchards
by Yaning Zhai, Ling Zhang, Xin Hu, Fanghu Yang and Yang Huang
Sensors 2025, 25(13), 4138; https://doi.org/10.3390/s25134138 - 2 Jul 2025
Cited by 4 | Viewed by 1805
Abstract
With the rapid development of agricultural intelligence in recent years, automatic fruit detection and counting technologies have become increasingly significant for optimizing orchard management and advancing precision agriculture. However, existing deep learning-based models are primarily designed to process static, single-frame images, and therefore fail to meet the large-scale detection and counting demands of the dynamically changing scenes of modern orchards. To address these challenges, this paper proposes a multi-object fruit tracking and counting method, which integrates an improved YOLO-based object detection algorithm with a dynamically optimized Kalman filter. By optimizing the network structure, the improved YOLO detection model provides high-quality detection results for subsequent tracking tasks. A modified Kalman filter with a variable forgetting factor is then integrated to dynamically adjust the weighting of historical data, enabling the model to adapt to changes in observation and motion noise. Moreover, fruit targets are associated using a combined strategy based on Intersection over Union (IoU) and Re-Identification (Re-ID) features, improving the accuracy and stability of object matching. Consequently, continuous tracking and precise counting of fruits in video sequences are achieved. Experiments on fruit video sequences show that the proposed method performs robust, continuous tracking (MOTA of 95.0% and HOTA of 82.4%). For fruit counting, the method attains a high coefficient of determination (R²) of 0.85 and a low root-mean-square error (RMSE) of 1.57, exhibiting high accuracy and stability of fruit detection, tracking, and counting in video sequences under complex orchard environments. Full article
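The forgetting-factor idea in this abstract can be illustrated with a scalar Kalman filter: a factor λ ≥ 1 inflates the predicted covariance each step, discounting stale history so the filter reacts faster to changing noise. This is a generic sketch under an assumed constant-position model, not the paper's formulation (q, r, and lam values are illustrative):

```python
def kalman_step(x, p, z, q=0.01, r=0.1, lam=1.05):
    """One 1-D Kalman update with forgetting factor lam >= 1.
    x, p: state estimate and its variance; z: new measurement."""
    # Predict: the forgetting factor scales the prior covariance,
    # so older data carries less weight in the next update.
    p_pred = lam * p + q
    # Update: standard Kalman gain and correction.
    k = p_pred / (p_pred + r)
    x_new = x + k * (z - x)
    p_new = (1 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0
for z in [1.0, 1.2, 0.9, 1.1]:
    x, p = kalman_step(x, p, z)
```

With lam = 1.0 this reduces to the ordinary Kalman filter; larger values trade smoothness for responsiveness.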
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)

24 pages, 7924 KB  
Article
Optimizing Car Collision Detection Using Large Dashcam-Based Datasets: A Comparative Study of Pre-Trained Models and Hyperparameter Configurations
by Muhammad Shahid, Martin Gregurić, Amirhossein Hassani and Marko Ševrović
Appl. Sci. 2025, 15(13), 7001; https://doi.org/10.3390/app15137001 - 21 Jun 2025
Viewed by 2894
Abstract
The automatic identification of traffic collisions is an emerging topic in modern traffic surveillance systems. The increasing number of surveillance cameras at urban intersections connected to traffic surveillance systems has created new opportunities for leveraging computer vision techniques for automatic collision detection. This study investigates the effectiveness of transfer learning with pre-trained deep learning models for collision detection from dashcam images. We evaluated several state-of-the-art (SOTA) image classification models and fine-tuned them using different hyperparameter combinations to test their performance on the car collision detection problem. Our methodology systematically investigates the influence of optimizers, loss functions, schedulers, and learning rates on model generalization. A comprehensive analysis is conducted using seven performance metrics to assess classification performance. Experiments on a large dashcam-based image dataset show that ResNet50, optimized with AdamW, a learning rate of 0.0001, a CosineAnnealingLR scheduler, and Focal Loss, emerged as the top performer, achieving an accuracy of 0.9782, an F1-score of 0.9617, and an IoU of 0.9262, indicating a strong ability to reduce false negatives. Full article
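The Focal Loss named in this abstract down-weights easy, well-classified examples so training concentrates on hard ones; a minimal sketch for a single binary example (γ = 2 and α = 0.25 are common defaults, not values reported in the listing):

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example.
    p: predicted probability of class 1; y: true label in {0, 1}.
    (1 - p_t)**gamma shrinks the loss for confident correct predictions."""
    p_t = p if y == 1 else 1 - p
    a_t = alpha if y == 1 else 1 - alpha
    return -a_t * (1 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(0.9, 1)   # confident and correct -> tiny loss
hard = binary_focal_loss(0.1, 1)   # confident and wrong -> large loss
```

On imbalanced collision datasets (few positives among many negatives), this focusing effect is the usual motivation for choosing focal loss over plain cross-entropy.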

15 pages, 2794 KB  
Article
Improvement of Mask R-CNN Algorithm for Ore Segmentation
by Kai Tang, Yuguo Pei, Xiaobo Wang and Leilei Qu
Electronics 2025, 14(10), 2025; https://doi.org/10.3390/electronics14102025 - 16 May 2025
Cited by 5 | Viewed by 2822
Abstract
In response to the low precision of ore image segmentation under complex working conditions, an improved Mask R-CNN segmentation algorithm is proposed. The traditional Mask R-CNN uses a simple deconvolution operation to generate masks, which can lead to the loss of ore edge information and insufficient detail processing, degrading segmentation accuracy. Therefore, an improved model based on the Mask R-CNN framework is proposed in this paper. By introducing the Re-parameterized Refocus Convolution (RefConv) into the residual networks, the expressive power of the feature extraction network is enhanced. Meanwhile, Efficient Channel Attention (ECA) is embedded in the output part of the Feature Pyramid Network (FPN), enhancing the model's ability to capture key information. The improved Mask R-CNN network structure reduces the loss of ore detail information caused by convolution operations and improves segmentation accuracy. Comparative experiments between the improved algorithm and the original algorithm show that the mean Intersection over Union (mIoU) of the improved algorithm reached 92.8%, about a 6.8% increase over the original Mask R-CNN algorithm, and the mean pixel accuracy (mPA) is 97.2%, about a 5.1% increase over the original algorithm, indicating higher detection accuracy for ore identification and segmentation. Full article

21 pages, 5384 KB  
Article
A Video SAR Multi-Target Tracking Algorithm Based on Re-Identification Features and Multi-Stage Data Association
by Anxi Yu, Boxu Wei, Wenhao Tong, Zhihua He and Zhen Dong
Remote Sens. 2025, 17(6), 959; https://doi.org/10.3390/rs17060959 - 8 Mar 2025
Viewed by 2058
Abstract
Video Synthetic Aperture Radar (ViSAR) operates by continuously monitoring regions of interest to produce sequences of SAR imagery. The detection and tracking of ground-moving targets, through the analysis of their radiation properties and temporal variations relative to the background environment, represents a significant area of focus and innovation within the SAR research community. In this study, some key challenges in ViSAR systems are addressed, including the abundance of low-confidence shadow detections, high error rates in multi-target data association, and the frequent fragmentation of tracking trajectories. A multi-target tracking algorithm for ViSAR that utilizes re-identification (ReID) features and a multi-stage data association process is proposed. The algorithm extracts high-dimensional ReID features using the DenseNet121 network for enhanced shadow detection and calculates a cost matrix by integrating ReID feature cosine similarity with Intersection over Union similarity. A confidence-based multi-stage data association strategy is implemented to minimize missed detections and trajectory fragmentation. Kalman filtering is then employed to update trajectory states based on shadow detection. Both simulation experiments and actual data processing experiments have demonstrated that, in comparison to two traditional video multi-target tracking algorithms, DeepSORT and ByteTrack, the newly proposed algorithm exhibits superior performance in ViSAR multi-target tracking, yielding the highest MOTA and HOTA scores of 94.85% and 92.88%, respectively, on the simulated spaceborne ViSAR data, and the highest MOTA and HOTA scores of 82.94% and 69.74%, respectively, on airborne field data. Full article
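A cost matrix that mixes ReID cosine similarity with IoU similarity, as described in this abstract, can be sketched as below. The blending weight `lam` and the linear combination are assumptions for illustration; the paper's exact fusion rule is not given in the listing:

```python
import numpy as np

def association_cost(reid_tracks, reid_dets, iou_matrix, lam=0.5):
    """Cost matrix mixing ReID cosine similarity and IoU similarity.
    reid_tracks: (T, D) track features; reid_dets: (N, D) detection features;
    iou_matrix: (T, N) box IoU. Lower cost = better match."""
    a = reid_tracks / np.linalg.norm(reid_tracks, axis=1, keepdims=True)
    b = reid_dets / np.linalg.norm(reid_dets, axis=1, keepdims=True)
    cos_sim = a @ b.T                        # appearance similarity in [-1, 1]
    sim = lam * cos_sim + (1 - lam) * iou_matrix
    return 1.0 - sim

tracks = np.array([[1.0, 0.0], [0.0, 1.0]])  # two track features
dets   = np.array([[1.0, 0.0]])              # one detection feature
iou    = np.array([[0.8], [0.1]])
cost = association_cost(tracks, dets, iou)
```

The resulting matrix is what an assignment solver (e.g. the Hungarian algorithm) consumes to match tracks to detections.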
(This article belongs to the Special Issue Temporal and Spatial Analysis of Multi-Source Remote Sensing Images)

21 pages, 5349 KB  
Article
RST-DeepLabv3+: Multi-Scale Attention for Tailings Pond Identification with DeepLab
by Xiangrui Feng, Caiyong Wei, Xiaojing Xue, Qian Zhang and Xiangnan Liu
Remote Sens. 2025, 17(3), 411; https://doi.org/10.3390/rs17030411 - 25 Jan 2025
Cited by 4 | Viewed by 2094
Abstract
Tailing ponds are used to store tailings or industrial waste discharged after beneficiation. Identifying these ponds in advance can help prevent pollution incidents and reduce their harmful impacts on ecosystems. Tailing ponds are traditionally identified via manual inspection, which is time-consuming and labor-intensive; identification based on computer vision is therefore of practical significance for environmental protection and safety. In remote sensing imagery, identifying tailings ponds is challenging because high-resolution images capture extensive feature details, such as shape, location, and texture, complicated by the mixing of tailings with other waste materials. This results in substantial intra-class variance and limited inter-class variance, making accurate recognition difficult. To monitor tailing ponds, this study used an improved version of DeepLabv3+, a widely recognized deep learning model for semantic segmentation. We introduced the multi-scale attention modules ResNeSt and SENet into the DeepLabv3+ encoder. The split-attention module in ResNeSt captures multi-scale information when processing multiple sets of feature maps, while the SENet module focuses on channel attention, improving the model's ability to distinguish tailings ponds from other materials in images. Additionally, the tailing pond semantic segmentation dataset NX-TPSet was established based on Gaofen-6 imagery. Ablation experiments show that the recognition accuracy (Intersection over Union, IoU) of the RST-DeepLabv3+ model improved by 1.19% to 93.48% over DeepLabv3+. The multi-scale attention modules enable the model to integrate multi-scale features more effectively, which not only improves segmentation accuracy but also contributes directly to more reliable and efficient monitoring of tailings ponds. The proposed approach achieves top performance on two benchmark datasets, NX-TPSet and TPSet, demonstrating its effectiveness as a practical method for real-world tailing pond identification. Full article

20 pages, 4082 KB  
Article
A Comparative Study of Decoders for Liver and Tumor Segmentation Using a Self-ONN-Based Cascaded Framework
by Sidra Gul, Muhammad Salman Khan, Md Sakib Abrar Hossain, Muhammad E. H. Chowdhury and Md. Shaheenur Islam Sumon
Diagnostics 2024, 14(23), 2761; https://doi.org/10.3390/diagnostics14232761 - 8 Dec 2024
Cited by 4 | Viewed by 2442
Abstract
Background/Objectives: Accurate liver and tumor detection and segmentation are crucial in the diagnosis of early-stage liver malignancies. As opposed to manual interpretation, which is a difficult and time-consuming process, accurate tumor detection using a computer-aided diagnosis system can save both time and human effort. Methods: We propose a cascaded encoder–decoder technique based on self-organized neural networks, a recent variant of operational neural networks (ONNs), for accurate segmentation and identification of liver tumors. The first encoder–decoder CNN segments the liver. To generate the liver region of interest, the segmented liver mask is applied to the input computed tomography (CT) image, which is then fed to the second Self-ONN model for tumor segmentation. Three other distinct encoder–decoder architectures, U-Net, feature pyramid networks (FPNs), and U-Net++, were also investigated by altering the encoder backbones using ResNet and DenseNet variants for transfer learning. Results: For the liver segmentation task, Self-ONN with a ResNet18 backbone achieved a dice similarity coefficient (DSC) of 98.182% and an intersection over union (IoU) of 97.436%. Tumor segmentation with Self-ONN and the DenseNet201 encoder resulted in an outstanding DSC of 92.836% and IoU of 91.748%. Conclusions: The suggested method is capable of precisely locating liver tumors of various sizes and shapes, including tiny infection patches that earlier research reported as challenging to find. Full article
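The DSC and IoU figures in this abstract are tightly related: for binary masks, DSC = 2·IoU / (1 + IoU). A minimal sketch computing both (illustrative helper, not the paper's code):

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, gt: np.ndarray):
    """Dice similarity coefficient and IoU for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.sum(pred & gt)
    dsc = 2 * inter / (pred.sum() + gt.sum())
    iou = inter / np.sum(pred | gt)
    return dsc, iou

pred = np.array([1, 1, 0, 0])
gt   = np.array([1, 0, 1, 0])
dsc, iou = dice_and_iou(pred, gt)
```

Because of the 2·IoU / (1 + IoU) relationship, DSC is always at least as large as IoU, which matches the pattern in the reported results (98.182% DSC vs. 97.436% IoU).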
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

17 pages, 1492 KB  
Article
Deep Learning-Based Infrared Image Segmentation for Aircraft Honeycomb Water Ingress Detection
by Hang Fei, Hongfu Zuo, Han Wang, Yan Liu, Zhenzhen Liu and Xin Li
Aerospace 2024, 11(12), 961; https://doi.org/10.3390/aerospace11120961 - 22 Nov 2024
Cited by 1 | Viewed by 1903
Abstract
The presence of water accumulation on aircraft surfaces constitutes a considerable hazard to both performance and safety, necessitating vigilant inspection and maintenance protocols. In this study, we introduce an innovative semantic segmentation model, grounded in deep learning principles, for the precise identification and delineation of water accumulation areas within infrared images of aircraft exteriors. Our proposed model harnesses the robust features of ResNet, serving as the foundational architecture for U-Net, thereby augmenting the model's capacity for comprehensive feature characterization. The incorporation of channel attention mechanisms, spatial attention mechanisms, and depthwise separable convolution further refines the network structure, contributing to enhanced segmentation performance. Through rigorous experimentation, our model surpasses existing benchmarks, yielding a 22.44% reduction in computational effort and a substantial 38.89% reduction in parameter count, while registering a 92.67% mean intersection over union and a 97.97% mean pixel accuracy. The hallmark of our innovation lies in the model's efficacy in the precise detection and segmentation of water accumulation areas on aircraft skin. Beyond this, our approach holds promise for addressing analogous challenges in aviation and related domains. These quantitative outcomes, together with the demonstrated reductions in computational effort and parameter count, underscore the model's accuracy and efficiency and its relevance in broader contexts. Full article
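The parameter savings attributed to depthwise separable convolution in this abstract follow from simple counting: a standard k×k convolution uses c_in·c_out·k² weights, while the depthwise-plus-pointwise factorization uses c_in·k² + c_in·c_out. A sketch of the arithmetic (channel counts are illustrative):

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise conv."""
    return c_in * k * k + c_in * c_out

std = conv_params(64, 128, 3)                  # 64 * 128 * 9
sep = depthwise_separable_params(64, 128, 3)   # 64 * 9 + 64 * 128
reduction = 1 - sep / std
```

For typical channel widths the factorized form needs roughly k²-fold fewer weights, which is the mechanism behind the reported parameter and FLOP reductions.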
(This article belongs to the Section Aeronautics)

18 pages, 3237 KB  
Article
Lightweight Wheat Spike Detection Method Based on Activation and Loss Function Enhancements for YOLOv5s
by Jingsong Li, Feijie Dai, Haiming Qian, Linsheng Huang and Jinling Zhao
Agronomy 2024, 14(9), 2036; https://doi.org/10.3390/agronomy14092036 - 6 Sep 2024
Cited by 5 | Viewed by 1461
Abstract
Wheat spike count is one of the critical indicators for assessing the growth and yield of wheat. However, illumination variations, mutual occlusion, and background interference greatly affect wheat spike detection. A lightweight detection method based on YOLOv5s was proposed. Initially, the original YOLOv5s was improved by combining an additional small-scale detection layer with the ECA (Efficient Channel Attention) mechanism integrated into all C3 modules (YOLOv5s + 4 + ECAC3). After comparing GhostNet, ShuffleNetV2, and MobileNetV3, the GhostNet architecture was selected as the optimal lightweight model framework based on its superior performance in various evaluations. Subsequently, the incorporation of five different activation functions into the network led to the identification of the RReLU (Randomized Leaky ReLU) activation function as the most effective in augmenting the network's performance. Ultimately, the network's CIoU (Complete Intersection over Union) loss function was replaced with the EIoU (Efficient Intersection over Union) loss function. Despite a minor reduction of 2.17% in accuracy for the refined YOLOv5s + 4 + ECAC3 + G + RR + E network compared to YOLOv5s + 4 + ECAC3, there was a marginal improvement of 0.77% over the original YOLOv5s. Furthermore, the parameter count was reduced by 32% and 28.2% relative to YOLOv5s + 4 + ECAC3 and YOLOv5s, respectively. The model size was reduced by 28.0% and 20%, and the Giga Floating-point Operations Per Second (GFLOPs) were lowered by 33.2% and 9.5%, respectively, signifying a substantial improvement in the network's efficiency without significantly compromising accuracy. This study offers a methodological reference for the rapid and accurate detection of agricultural objects through the enhancement of a deep learning network. Full article
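The RReLU activation selected in this abstract samples a random negative slope during training and uses the fixed mean slope at inference; a scalar sketch (the bounds 1/8 and 1/3 are the common defaults, e.g. in PyTorch, not values stated in the listing):

```python
import random

def rrelu(x, lower=1/8, upper=1/3, training=False, rng=random):
    """Randomized Leaky ReLU for a scalar input.
    Training: negative slope drawn uniformly from [lower, upper].
    Inference: fixed slope (lower + upper) / 2."""
    if x >= 0:
        return x
    slope = rng.uniform(lower, upper) if training else (lower + upper) / 2
    return slope * x

y_pos = rrelu(2.0)    # positive inputs pass through unchanged
y_neg = rrelu(-2.0)   # scaled by the mean slope at inference
```

The randomized slope acts as a mild regularizer on the negative half of the activation, which is the usual reason it can edge out plain LeakyReLU.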
(This article belongs to the Section Precision and Digital Agriculture)

20 pages, 7094 KB  
Article
DualNet-PoiD: A Hybrid Neural Network for Highly Accurate Recognition of POIs on Road Networks in Complex Areas with Urban Terrain
by Yongchuan Zhang, Caixia Long, Jiping Liu, Yong Wang and Wei Yang
Remote Sens. 2024, 16(16), 3003; https://doi.org/10.3390/rs16163003 - 16 Aug 2024
Cited by 2 | Viewed by 1821
Abstract
For high-precision navigation, obtaining and maintaining high-precision point-of-interest (POI) data on the road network is crucial. In urban areas with complex terrains, the accuracy of traditional road network POI acquisition methods often falls short. To address this issue, we introduce DualNet-PoiD, a hybrid neural network designed for the efficient recognition of road network POIs in intricate urban environments. This method leverages multimodal sensory data, incorporating both vehicle trajectories and remote sensing imagery. Through an enhanced dual-attention dilated link network (DAD-LinkNet) based on ResNet18, the system extracts static geometric features of roads from remote sensing images. Concurrently, an improved gated recurrent unit (GRU) captures dynamic traffic characteristics implied by vehicle trajectories. The integration of a fully connected layer (FC) enables the high-precision identification of various POIs, including traffic light intersections, gas stations, parking lots, and tunnels. To validate the efficacy of DualNet-PoiD, we collected 500 remote sensing images and 50,000 taxi trajectory data samples covering road POIs in the central urban area of the mountainous city of Chongqing. Through comprehensive area comparison experiments, DualNet-PoiD demonstrated a high recognition accuracy of 91.30%, performing robustly even under conditions of complex occlusion. This confirms the network's capability to significantly improve POI detection in challenging urban settings. Full article

16 pages, 7412 KB  
Article
An Identification Method for Mixed Coal Vitrinite Components Based on An Improved DeepLabv3+ Network
by Fujie Wang, Fanfan Li, Wei Sun, Xiaozhong Song and Huishan Lu
Energies 2024, 17(14), 3453; https://doi.org/10.3390/en17143453 - 13 Jul 2024
Cited by 1 | Viewed by 1666
Abstract
To address the high complexity and low accuracy issues of traditional methods in mixed coal vitrinite identification, this paper proposes a method based on an improved DeepLabv3+ network. First, MobileNetV2 is used as the backbone network to reduce the number of parameters. Second, an atrous convolution layer with a dilation rate of 24 is added to the ASPP (atrous spatial pyramid pooling) module to further increase the receptive field. Meanwhile, a CBAM (convolutional block attention module) attention mechanism with a channel multiplier of 8 is introduced at the output part of the ASPP module to better filter out important semantic features. Then, a corrective convolution module is added to the network’s output to ensure the consistency of each channel’s output feature map for each type of vitrinite. Finally, images of 14 single vitrinite components are used as training samples for network training, and a validation set is used for identification testing. The results show that the improved DeepLabv3+ achieves 6.14% and 3.68% improvements in MIOU (mean intersection over union) and MPA (mean pixel accuracy), respectively, compared to the original DeepLabv3+; 12% and 5.3% improvements compared to U-Net; 9.26% and 4.73% improvements compared to PSPNet with ResNet as the backbone; 5.4% and 9.34% improvements compared to PSPNet with MobileNetV2 as the backbone; and 6.46% and 9.05% improvements compared to HRNet. Additionally, the improved ASPP module increases MIOU and MPA by 3.23% and 1.93%, respectively, compared to the original module. The CBAM attention mechanism with a channel multiplier of 8 improves MIOU and MPA by 1.97% and 1.72%, respectively, compared to the original channel multiplier of 16. The data indicate that the proposed identification method significantly improves recognition accuracy and can be effectively applied to mixed coal vitrinite identification.
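The channel-attention step described above can be illustrated in NumPy. This is a minimal sketch of the standard CBAM channel-attention formulation (shared two-layer MLP over pooled descriptors, with the bottleneck set by the reduction factor, which plays the role of the channel multiplier in the abstract); the weights are random toy values, not the authors' trained module:

```python
import numpy as np

def channel_attention(feat, reduction=8):
    """CBAM-style channel attention on a (C, H, W) feature map.

    Global average- and max-pooled channel descriptors pass through a
    shared two-layer MLP with bottleneck C // reduction, are summed,
    squashed with a sigmoid, and used to rescale each channel.
    """
    c = feat.shape[0]
    rng = np.random.default_rng(42)
    w1 = rng.normal(scale=0.1, size=(c // reduction, c))  # shared MLP, squeeze layer
    w2 = rng.normal(scale=0.1, size=(c, c // reduction))  # shared MLP, excite layer

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU bottleneck

    avg = feat.mean(axis=(1, 2))  # (C,) global average pool
    mx = feat.max(axis=(1, 2))    # (C,) global max pool
    scale = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # per-channel gate in (0, 1)
    return feat * scale[:, None, None]

feat = np.random.default_rng(0).normal(size=(16, 8, 8))
out = channel_attention(feat, reduction=8)
```

A smaller reduction factor (8 instead of 16) widens the MLP bottleneck, which is what the abstract's comparison of channel multipliers varies.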
(This article belongs to the Special Issue Factor Analysis and Mathematical Modeling of Coals)

15 pages, 5037 KB  
Article
Aerial Image Segmentation of Nematode-Affected Pine Trees with U-Net Convolutional Neural Network
by Jiankang Shen, Qinghua Xu, Mingyang Gao, Jicai Ning, Xiaopeng Jiang and Meng Gao
Appl. Sci. 2024, 14(12), 5087; https://doi.org/10.3390/app14125087 - 11 Jun 2024
Cited by 8 | Viewed by 2034
Abstract
Pine wood nematode disease, commonly referred to as pine wilt, poses a grave threat to forest health, leading to profound ecological and economic impacts. Originating from the pine wood nematode, this disease not only causes the demise of pine trees but also casts a long shadow over the entire forest ecosystem. The accurate identification of infected trees stands as a pivotal initial step in developing effective prevention and control measures for pine wilt. Nevertheless, existing identification methods face challenges in precisely determining the disease status of individual pine trees, impeding early detection and efficient intervention. In this study, we leverage the capabilities of unmanned aerial vehicle (UAV) remote sensing technology and integrate the classical small-kernel VGG convolutional network with U-Net to detect diseased pine trees. This approach captures the spatial and characteristic intricacies of infected trees, converting them into high-dimensional features through multiple convolutions within the VGG network. This method significantly reduces the parameter count while enhancing the sensing range. The results obtained from our validation set are remarkably promising, achieving a Mean Intersection over Union (MIoU) of 81.62%, a Mean Pixel Accuracy (MPA) of 85.13%, an Accuracy of 99.13%, and an F1 Score of 88.50%. These figures surpass those obtained using other methods such as ResNet50 and DeepLab v3+. The methodology presented in this research facilitates rapid and accurate monitoring of pine trees infected with nematodes, offering invaluable technical assistance in the prevention and management of pine wilt disease.
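The MIoU and MPA figures reported across these abstracts follow the usual confusion-matrix definitions: per-class intersection over union and per-class pixel accuracy, each averaged over classes. A minimal NumPy sketch (the toy label maps below are invented for illustration):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """MIoU and MPA from integer label maps, via a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)   # rows: ground truth, cols: prediction
    tp = np.diag(cm).astype(float)
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp)  # per-class intersection / union
    pa = tp / cm.sum(axis=1)                           # per-class pixel accuracy
    return iou.mean(), pa.mean()

# Toy 2x4 binary masks: one background pixel mislabeled as class 0.
gt   = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1]])
pred = np.array([[0, 0, 1, 0],
                 [0, 1, 1, 1]])
miou, mpa = segmentation_metrics(pred, gt, num_classes=2)  # 0.775, 0.9
```

Note that overall pixel Accuracy can sit far above MIoU (99.13% vs. 81.62% in this study) when one class, here healthy canopy and background, dominates the image.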
