Search Results (75)

Search Parameters:
Keywords = atrous pyramid structure

27 pages, 23454 KB  
Article
Towards Accurate Prediction of Runout Distance of Rainfall-Induced Shallow Landslides: An Integrated Remote Sensing and Explainable Machine Learning Framework in Southeast China
by Xiaoyu Yi, Yuan Wang, Wenkai Feng, Jiachen Zhao, Zhenghai Xue and Ruijian Huang
Remote Sens. 2025, 17(22), 3660; https://doi.org/10.3390/rs17223660 - 7 Nov 2025
Abstract
This study addresses the challenge of predicting the runout distance of rainfall-induced shallow landslides by integrating deep learning and explainable machine learning. Using the June 2024 landslide disaster at the Fujian-Guangdong-Jiangxi border as a case study and remote sensing images as the data source, we developed an improved U-shaped convolutional neural network (RAC-Unet) that combines a deep residual structure, Atrous Spatial Pyramid Pooling, and a Convolutional Block Attention Module. The model identified 34,376 shallow landslides and built a dynamic parameter database of 8875 samples, which was used for data-driven model training. After model comparison, Extreme Gradient Boosting was selected as the best-performing model (R2 = 0.923), with its performance confirmed by Wilcoxon analysis and good generalization in external validation (R2 = 0.877). SHapley Additive Explanations analysis revealed how factors such as the area of the sliding source zone (SA), the length/width ratio of the sliding source zone (SLWR), and the average slope of the source zone (SS) affect landslide runout; a simplified model using only these three parameters was then constructed (R2 = 0.862). Compared with traditional models, this integrated framework addresses the pre-disaster impact-range estimation problem, deepens understanding of shallow landslide dynamics, and enables accurate pre- and post-disaster predictions, providing comprehensive support for disaster risk assessment and emergency response in southeastern hilly areas.
(This article belongs to the Special Issue Advances in AI-Driven Remote Sensing for Geohazard Perception)
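For readers unfamiliar with the atrous pyramid structure that these search results revolve around, here is a minimal PyTorch sketch of a generic Atrous Spatial Pyramid Pooling block. The dilation rates, channel counts, and layer layout are illustrative assumptions in the spirit of DeepLab-style ASPP, not the RAC-Unet configuration from this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Parallel atrous convolutions at several dilation rates plus a
    global-pooling branch; outputs are concatenated and projected."""
    def __init__(self, in_ch=256, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +   # 1x1 branch
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates])                              # atrous branches
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.pool(x), size=(h, w), mode="bilinear",
                          align_corners=False)             # image-level context
        return self.project(torch.cat(feats + [g], dim=1))

y = ASPP()(torch.randn(1, 256, 32, 32))  # -> torch.Size([1, 256, 32, 32])
```

Because each branch samples the same input at a different dilation rate, the block aggregates multi-scale context without reducing spatial resolution, which is why it recurs in nearly every result below.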

18 pages, 16806 KB  
Article
Refined Extraction of Sugarcane Planting Areas in Guangxi Using an Improved U-Net Model
by Tao Yue, Zijun Ling, Yuebiao Tang, Jingjin Huang, Hongteng Fang, Siyuan Ma, Jie Tang, Yun Chen and Hong Huang
Drones 2025, 9(11), 754; https://doi.org/10.3390/drones9110754 - 30 Oct 2025
Viewed by 213
Abstract
Sugarcane, a vital economic crop and renewable energy source, requires precise monitoring of its planting area to ensure sugar industry security, optimize agricultural resource allocation, and allow assessment of ecological benefits. Guangxi Zhuang Autonomous Region, leveraging its subtropical climate and abundant solar thermal resources, accounts for over 63% of China’s total sugarcane cultivation area. This study addresses key challenges in remote-sensing-based sugarcane extraction, namely the difficulty of distinguishing spectrally similar objects, significant background interference, and insufficient multi-scale feature fusion. To enhance the accuracy and robustness of sugarcane identification, an improved RCAU-net model based on the U-net architecture was designed. The model incorporates three key improvements: it replaces the original encoder with ResNet50 residual modules to enhance discrimination of similar crops; it integrates a Convolutional Block Attention Module (CBAM) to focus on critical features and effectively suppress background interference; and it employs an Atrous Spatial Pyramid Pooling (ASPP) module to bridge the encoder and decoder, thereby optimizing the extraction of multi-scale contextual information. A refined extraction framework that accounts for different growth stages was then constructed to achieve rapid identification of sugarcane planting areas in Guangxi. The experimental results demonstrate that the RCAU-net model performed excellently, achieving an Overall Accuracy (OA) of 97.19%, a Mean Intersection over Union (mIoU) of 94.47%, a Precision of 97.31%, and an F1 Score of 97.16%, improvements of 7.20, 10.02, 6.82, and 7.28 percentage points, respectively, over the original U-net. The model also achieved a Kappa coefficient of 0.9419 and a Recall of 96.99%. The residual structures significantly reduced misclassification of similar crops, while the CBAM and ASPP modules minimized holes within large continuous patches and false extractions of small patches, yielding smoother boundaries for the extracted areas. This work provides reliable data support for accurate calculation of sugarcane planting area and enhances the decision-making value of remote sensing monitoring in modern sugarcane agriculture.
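Several results in this list pair ASPP with a Convolutional Block Attention Module. As a rough sketch of the CBAM idea, sequential channel gating then spatial gating, assuming the standard formulation rather than this paper's exact variant:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (shared MLP over avg- and max-pooled descriptors)
    followed by spatial attention (7x7 conv over pooled channel maps)."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(ch, ch // reduction, 1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(ch // reduction, ch, 1))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                      # channel gate
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))            # spatial gate

y = CBAM(64)(torch.randn(1, 64, 32, 32))  # same shape as the input
```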

19 pages, 5891 KB  
Article
MS-YOLOv11: A Wavelet-Enhanced Multi-Scale Network for Small Object Detection in Remote Sensing Images
by Haitao Liu, Xiuqian Li, Lifen Wang, Yunxiang Zhang, Zitao Wang and Qiuyi Lu
Sensors 2025, 25(19), 6008; https://doi.org/10.3390/s25196008 - 29 Sep 2025
Viewed by 1378
Abstract
In remote sensing imagery, objects smaller than 32×32 pixels suffer from three persistent challenges that existing detectors inadequately resolve: (1) their weak signal is easily submerged in background clutter, causing high miss rates; (2) the scarcity of valid pixels yields few geometric or textural cues, hindering discriminative feature extraction; and (3) successive down-sampling irreversibly discards high-frequency details, while multi-scale pyramids still fail to compensate. To counteract these issues, we propose MS-YOLOv11, an enhanced YOLOv11 variant that integrates “frequency-domain detail preservation, lightweight receptive-field expansion, and adaptive cross-scale fusion.” Specifically, a 2D Haar wavelet first decomposes the image into multiple frequency sub-bands to explicitly isolate and retain high-frequency edges and textures while suppressing noise. Each sub-band is then processed independently by small-kernel depthwise convolutions that enlarge the receptive field without over-smoothing. Finally, the Mix Structure Block (MSB) employs the MSPLCK module to perform densely sampled multi-scale atrous convolutions for rich context around diminutive objects, followed by the EPA module, which adaptively fuses and re-weights features via residual connections to suppress background interference. Extensive experiments on DOTA and DIOR demonstrate that MS-YOLOv11 surpasses the baseline in mAP@50, mAP@95, parameter efficiency, and inference speed, validating its targeted efficacy for small-object detection.
(This article belongs to the Section Remote Sensors)
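The 2D Haar decomposition that MS-YOLOv11 uses to preserve high-frequency detail can be expressed as four fixed 2x2 strided filters applied per channel. A self-contained sketch (sub-band naming and filter signs follow one common convention; the paper's implementation may differ):

```python
import torch
import torch.nn.functional as F

def haar_dwt2d(x):
    """One-level 2D Haar transform via fixed 2x2 strided filters applied
    per input channel. Returns (LL, LH, HL, HH), each at half resolution."""
    filt = torch.tensor([[[0.5, 0.5], [0.5, 0.5]],     # LL: local average
                         [[0.5, 0.5], [-0.5, -0.5]],   # LH: row differences
                         [[0.5, -0.5], [0.5, -0.5]],   # HL: column differences
                         [[0.5, -0.5], [-0.5, 0.5]]],  # HH: diagonal detail
                        dtype=x.dtype, device=x.device)
    c = x.shape[1]
    k = filt.unsqueeze(1).repeat(c, 1, 1, 1)           # (4c, 1, 2, 2)
    y = F.conv2d(x, k, stride=2, groups=c)             # (N, 4c, H/2, W/2)
    return y[:, 0::4], y[:, 1::4], y[:, 2::4], y[:, 3::4]

img = torch.randn(1, 3, 64, 64)
ll, lh, hl, hh = haar_dwt2d(img)   # each sub-band: (1, 3, 32, 32)
```

The three high-frequency sub-bands carry exactly the edge and texture cues that plain strided down-sampling would discard, which is the motivation stated in the abstract.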

22 pages, 4965 KB  
Article
Thermal Imaging-Based Defect Detection Method for Aluminum Foil Sealing Using EAC-Net
by Zhibo Hao, Yitao Chen, Zhongqi Yu, Yongjin Qian and Leping Zhao
Appl. Sci. 2025, 15(18), 9964; https://doi.org/10.3390/app15189964 - 11 Sep 2025
Viewed by 667
Abstract
Aluminum foil sealing is widely employed in industrial packaging, and the quality of sealing plays a crucial role in ensuring product integrity and safety. Thermal infrared images frequently exhibit non-uniform heat distribution and indistinct boundaries within the sealing region. Additionally, variations in thermal response and local structural characteristics are observed across different defect types. Thus, traditional detection methods exhibit limitations regarding their stability and adaptability. In this paper, a novel thermal image recognition algorithm called EAC-Net is proposed for the classification and detection of sealing defects in thermal infrared images. In the proposed method, EfficientNet-B0 is utilized as the backbone network to improve its adaptability for industrial deployment. Furthermore, the Atrous Spatial Pyramid Pooling module is incorporated to enhance the multi-scale perception of defect regions, while the Channel–Spatial Attention Mixing with Channel Shuffle module is adopted to strengthen the focus on critical thermal features. Significant improvements in recognition performance were verified in experiments, while both computational complexity and inference latency were effectively kept at low levels. In the experiments, EAC-Net demonstrated an accuracy of 99.06% and a precision of 99.07%, indicating its high robustness and application potential.
(This article belongs to the Section Applied Thermal Engineering)
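The channel shuffle operation referenced in EAC-Net's attention module is a cheap permutation that lets grouped layers exchange information. A sketch of the standard ShuffleNet-style operation (the surrounding channel–spatial attention mixing is omitted):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups (as in ShuffleNet) so that
    subsequent grouped operations mix information between groups."""
    n, c, h, w = x.shape
    assert c % groups == 0
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()   # swap group and within-group axes
    return x.view(n, c, h, w)

x = torch.arange(8, dtype=torch.float32).view(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten().tolist())
# [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0]
```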

21 pages, 3725 KB  
Article
Pruning-Friendly RGB-T Semantic Segmentation for Real-Time Processing on Edge Devices
by Jun Young Hwang, Youn Joo Lee, Ho Gi Jung and Jae Kyu Suhr
Electronics 2025, 14(17), 3408; https://doi.org/10.3390/electronics14173408 - 27 Aug 2025
Viewed by 784
Abstract
RGB-T semantic segmentation, which uses thermal and RGB images simultaneously, is being actively researched to robustly recognize the surroundings of vehicles under challenging lighting and weather conditions, and such networks must operate in real time on edge devices. Because the transformer-based approaches adopted by most recent RGB-T semantic segmentation studies are very difficult to run on edge devices, this paper considers only CNN-based RGB-T semantic segmentation networks that can run on edge devices in real time. Although EAEFNet shows the best performance among CNN-based networks on edge devices, its inference speed is too slow for real-time operation, and even when channel pruning is applied, the speed improvement is minimal. An analysis of EAEFNet identifies the intermediate fusion of RGB and thermal features and the high complexity of the decoder as the main causes. To address these issues, this paper proposes a network using a ResNet encoder with an early-fused four-channel input and a U-Net decoder structure. To improve decoder performance, bilinear upsampling is replaced with PixelShuffle, and mini Atrous Spatial Pyramid Pooling (ASPP) and Progressive Transposed Module (PTM) modules are applied. Since the proposed network consists primarily of convolutional layers, channel pruning can be applied effectively; it significantly improves inference speed and enables real-time operation on the neural processing unit (NPU) of edge devices. The proposed network is evaluated on the MFNet dataset, one of the most widely used public datasets for RGB-T semantic segmentation, and achieves performance comparable to EAEFNet while running at over 30 FPS on an embedded board equipped with the Qualcomm QCS6490 SoC.
(This article belongs to the Special Issue New Insights in 2D and 3D Object Detection and Semantic Segmentation)
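Early fusion of RGB and thermal inputs, as described here, amounts to widening the encoder's first convolution to four channels; PixelShuffle then replaces bilinear upsampling in the decoder. A hedged sketch using torchvision's resnet18 as a stand-in encoder (the paper's encoder depth, decoder, and initialization details are not specified in the abstract, so the weight-copy scheme below is an assumption):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Early fusion: widen the first conv to accept RGB + thermal (4 channels).
net = resnet18()
old = net.conv1
net.conv1 = nn.Conv2d(4, old.out_channels, kernel_size=7, stride=2,
                      padding=3, bias=False)
with torch.no_grad():
    net.conv1.weight[:, :3] = old.weight                         # reuse RGB filters
    net.conv1.weight[:, 3:] = old.weight.mean(1, keepdim=True)   # init thermal channel

# Decoder upsampling via PixelShuffle instead of bilinear interpolation:
# a 1x1 conv expands channels 4x, then PixelShuffle trades them for 2x resolution.
up = nn.Sequential(nn.Conv2d(512, 256 * 4, 1), nn.PixelShuffle(2))

x = torch.randn(1, 4, 480, 640)
feats = net.maxpool(net.relu(net.bn1(net.conv1(x))))
feats = net.layer4(net.layer3(net.layer2(net.layer1(feats))))
print(up(feats).shape)  # torch.Size([1, 256, 30, 40])
```

Because PixelShuffle is itself a convolution-plus-permutation, the whole network stays convolutional, which is what makes channel pruning effective here.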

21 pages, 6925 KB  
Article
U2-LFOR: A Two-Stage U2 Network for Light-Field Occlusion Removal
by Mostafa Farouk Senussi, Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud and Hyun-Soo Kang
Mathematics 2025, 13(17), 2748; https://doi.org/10.3390/math13172748 - 26 Aug 2025
Viewed by 638
Abstract
Light-field (LF) imaging transforms occlusion removal by using multiview data to reconstruct hidden regions, overcoming the limitations of single-view methods. However, this advanced capability often comes at the cost of increased computational complexity. To overcome this, we propose the U2-LFOR network, an end-to-end neural network designed to remove occlusions in LF images without compromising performance, addressing the inherent complexity of LF imaging while ensuring practical applicability. The architecture employs Residual Atrous Spatial Pyramid Pooling (ResASPP) at the feature extractor to expand the receptive field, capture localized multiscale features, and enable deep feature learning with efficient aggregation. A two-stage U2-Net structure enhances hierarchical feature learning while maintaining a compact design, ensuring accurate context recovery. A dedicated refinement module, using two cascaded residual blocks (ResBlock), restores fine details in the occluded regions. Experimental results on both synthetic and real-world LF datasets demonstrate competitive performance, with an average Peak Signal-to-Noise Ratio (PSNR) of 29.27 dB and a Structural Similarity Index Measure (SSIM) of 0.875, two widely used metrics of reconstruction fidelity and perceptual quality, confirming its effectiveness in accurate occlusion removal.

22 pages, 2420 KB  
Article
BiEHFFNet: A Water Body Detection Network for SAR Images Based on Bi-Encoder and Hybrid Feature Fusion
by Bin Han, Xin Huang and Feng Xue
Mathematics 2025, 13(15), 2347; https://doi.org/10.3390/math13152347 - 23 Jul 2025
Viewed by 456
Abstract
Water body detection in synthetic aperture radar (SAR) imagery plays a critical role in applications such as disaster response, water resource management, and environmental monitoring. However, it remains challenging due to complex background interference in SAR images. To address this issue, a bi-encoder and hybrid feature fusion network (BiEHFFNet) is proposed for accurate water body detection. First, a bi-encoder structure based on ResNet and Swin Transformer is used to jointly extract local spatial details and global contextual information, enhancing feature representation in complex scenarios. Additionally, the convolutional block attention module (CBAM) is employed to suppress irrelevant information in the output features of each ResNet stage. Second, a cross-attention-based hybrid feature fusion (CABHFF) module is designed to interactively integrate local and global features through cross-attention, followed by channel attention to achieve effective hybrid feature fusion, thus improving the model’s ability to capture water structures. Third, a multi-scale content-aware upsampling (MSCAU) module is designed by integrating atrous spatial pyramid pooling (ASPP) with Content-Aware ReAssembly of FEatures (CARAFE), aiming to enhance multi-scale contextual learning while alleviating feature distortion caused by upsampling. Finally, a composite loss function combining Dice loss and Active Contour loss provides stronger boundary supervision. Experiments conducted on the ALOS PALSAR dataset demonstrate that the proposed BiEHFFNet outperforms existing methods across multiple evaluation metrics, achieving more accurate water body detection.
(This article belongs to the Special Issue Advanced Mathematical Methods in Remote Sensing)
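Of the composite loss mentioned above, the Dice term is the easier half to sketch; the Active Contour term is omitted here. A minimal soft Dice loss for binary masks (this is the common formulation, not necessarily BiEHFFNet's exact variant):

```python
import torch

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss for binary segmentation; `logits` are raw scores,
    `target` is a {0,1} mask of the same shape."""
    p = torch.sigmoid(logits).flatten(1)       # per-sample probability maps
    t = target.flatten(1)
    inter = (p * t).sum(dim=1)
    dice = (2 * inter + eps) / (p.sum(dim=1) + t.sum(dim=1) + eps)
    return 1 - dice.mean()

logits = torch.randn(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(soft_dice_loss(logits, mask))
```

Dice directly optimizes region overlap, so pairing it with a boundary-sensitive term such as Active Contour loss is a natural way to sharpen water edges.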

21 pages, 7528 KB  
Article
A Fine-Tuning Method via Adaptive Symmetric Fusion and Multi-Graph Aggregation for Human Pose Estimation
by Yinliang Shi, Zhaonian Liu, Bin Jiang, Tianqi Dai and Yuanfeng Lian
Symmetry 2025, 17(7), 1098; https://doi.org/10.3390/sym17071098 - 9 Jul 2025
Viewed by 606
Abstract
Human Pose Estimation (HPE) aims to accurately locate human key points in images or videos. However, HPE performance is often significantly degraded in practical applications by environmental interference. To address this challenge, we propose a ladder side-tuning method, based on multi-path feature fusion, for a pre-trained Vision Transformer (ViT) model, to improve the accuracy of HPE in highly interfering environments. First, we extract global features, frequency features, and multi-scale spatial features through the pre-trained ViT model, a discrete wavelet convolutional network, and an atrous spatial pyramid pooling (ASPP) network, respectively. By comprehensively capturing information about the human body and the environment, the model’s ability to analyze local details, textures, and spatial information is enhanced. To fuse these features efficiently, we devise an adaptive symmetric feature fusion strategy that dynamically adjusts the intensity of feature fusion according to the similarity among features to achieve the optimal fusion effect. Finally, a multi-graph feature aggregation method is developed: we construct graph structures from the different features and exploit the subtle differences among them through a dual fusion mechanism of points and edges to ensure information integrity. Experimental results demonstrate that our method improves the AP metric by 4.3% and 4.2% on the MS COCO dataset and a custom high-interference dataset, respectively, compared with HRNet, highlighting its superiority for human pose estimation in both general and interfering environments.
(This article belongs to the Special Issue Symmetry and Asymmetry in Computer Vision and Graphics)

18 pages, 3300 KB  
Article
Road Scene Semantic Segmentation Based on MPNet
by Chuanwang Song, Yinghao Ma, Yuanteng Zhou, Zhaoyu Wang, Qingshuo Qi, Keyong Hu and Xiaoling Gong
Electronics 2025, 14(13), 2565; https://doi.org/10.3390/electronics14132565 - 25 Jun 2025
Viewed by 606
Abstract
The increasing demand for high-precision semantic segmentation in applications such as autonomous driving, unmanned aerial vehicles, and robotics has made improving segmentation accuracy a major research focus. In this paper, we propose MPNet (Multi-Scale Progressive Network), a novel semantic segmentation model based on the DeepLabV3+ architecture. First, a lightweight MobileNetV2 was employed as the backbone network, and a new multi-scale feature fusion structure was constructed by integrating the backbone with the centralized feature pyramid network (CFPNet). Then, building on the ASPP module, a progressive dilated atrous spatial pyramid pooling (PDASPP) module was designed to further enhance feature extraction. Extensive experiments were conducted on the Cityscapes and PASCAL VOC 2012 Augmented datasets. MPNet achieved 75.23% mIoU and 83.54% mPA on Cityscapes, outperforming DeepLabV3+ by 3.01% and 3.31%, respectively, and 72.70% mIoU and 83.57% mPA on PASCAL VOC 2012 Augmented, improvements of 1.59% and 1.30% over DeepLabV3+. These results demonstrate that MPNet significantly improves segmentation accuracy while keeping model complexity under control, providing an effective solution for semantic segmentation in road scene understanding.
(This article belongs to the Section Artificial Intelligence)
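The abstract does not spell out how the PDASPP module differs from plain ASPP, so the following is only one plausible reading: dilated branches chained in sequence so the receptive field grows progressively, with all intermediate outputs fused at the end. Treat it as a hypothetical illustration, not the published design.

```python
import torch
import torch.nn as nn

class ProgressiveDilatedASPP(nn.Module):
    """Hypothetical sketch: dilated stages applied in sequence, each taking
    the previous stage's output, so the effective receptive field grows
    progressively; all intermediate outputs are fused by a 1x1 conv."""
    def __init__(self, ch=256, rates=(1, 2, 4, 8)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=r, dilation=r,
                                    bias=False),
                          nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
            for r in rates)
        self.fuse = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        outs, h = [], x
        for stage in self.stages:
            h = stage(h)        # each stage sees a progressively larger field
            outs.append(h)
        return self.fuse(torch.cat(outs, dim=1))

y = ProgressiveDilatedASPP()(torch.randn(1, 256, 32, 32))
```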

21 pages, 4072 KB  
Article
ST-YOLOv8: Small-Target Ship Detection in SAR Images Targeting Specific Marine Environments
by Fei Gao, Yang Tian, Yongliang Wu and Yunxia Zhang
Appl. Sci. 2025, 15(12), 6666; https://doi.org/10.3390/app15126666 - 13 Jun 2025
Viewed by 835
Abstract
Synthetic Aperture Radar (SAR) image ship detection faces challenges such as distinguishing ships from other terrain and structures, especially in complex marine environments. The motivation behind this work is to enhance detection accuracy while minimizing false positives, which is crucial for applications such as defense vessel monitoring and civilian search and rescue operations. To achieve this goal, we propose several architectural improvements to You Only Look Once version 8 Nano (YOLOv8n) and present Small Target-YOLOv8 (ST-YOLOv8), a novel lightweight SAR ship detection model based on the enhanced YOLOv8n framework. The C2f module in the backbone’s transition sections is replaced by the Conv_Online Reparameterized Convolution (C_OREPA) module, reducing convolutional complexity and improving efficiency. The Atrous Spatial Pyramid Pooling (ASPP) module is added to the end of the backbone to extract finer features from smaller and more complex ship targets. In the neck network, the Shuffle Attention (SA) module is employed before each upsampling step to improve upsampling quality. Additionally, we replace the Complete Intersection over Union (C-IoU) loss function with the Wise Intersection over Union (W-IoU) loss function, which enhances bounding box precision. We conducted ablation experiments on two widely used multimodal SAR datasets. The proposed model significantly outperforms the YOLOv8n baseline, achieving 94.1% accuracy, 82% recall, and an 87.6% F1 score on the SAR Ship Detection Dataset (SSDD), and 92.7% accuracy, 84.5% recall, and an 88.1% F1 score on the SAR Ship Dataset_v0 (SSDv0). Furthermore, ST-YOLOv8 outperforms several state-of-the-art multi-scale ship detection algorithms on both datasets. In summary, by integrating advanced architectural components and optimization techniques, ST-YOLOv8 significantly improves detection accuracy and reduces false detections, making it well suited for complex backgrounds and multi-scale ship detection. Future work will focus on lightweight model optimization for deployment on mobile platforms to broaden its applicability across different scenarios.

35 pages, 4507 KB  
Article
Liver Semantic Segmentation Method Based on Multi-Channel Feature Extraction and Cross Fusion
by Chenghao Zhang, Lingfei Wang, Chunyu Zhang, Yu Zhang, Peng Wang and Jin Li
Bioengineering 2025, 12(6), 636; https://doi.org/10.3390/bioengineering12060636 - 11 Jun 2025
Viewed by 918
Abstract
Semantic segmentation plays a critical role in medical image analysis, offering indispensable information for the diagnosis and treatment planning of liver diseases. However, due to the complex anatomical structure of the liver and significant inter-patient variability, the current methods exhibit notable limitations in feature extraction and fusion, which pose a major challenge to achieving accurate liver segmentation. To address these challenges, this study proposes an improved U-Net-based liver semantic segmentation method that enhances segmentation performance through optimized feature extraction and fusion mechanisms. Firstly, a multi-scale input strategy is employed to account for the variability in liver features at different scales. A multi-scale convolutional attention (MSCA) mechanism is integrated into the encoder to aggregate multi-scale information and improve feature representation. Secondly, an atrous spatial pyramid pooling (ASPP) module is incorporated into the bottleneck layer to capture features at various receptive fields using dilated convolutions, while global pooling is applied to enhance the acquisition of contextual information and ensure efficient feature transmission. Furthermore, a Channel Transformer module replaces the traditional skip connections to strengthen the interaction and fusion between encoder and decoder features, thereby reducing the semantic gap. The effectiveness of this method was validated on integrated public datasets, achieving an Intersection over Union (IoU) of 0.9315 for liver segmentation tasks, outperforming other mainstream approaches. This provides a novel solution for precise liver image segmentation and holds significant clinical value for liver disease diagnosis and treatment.
(This article belongs to the Special Issue Machine Learning and Deep Learning Applications in Healthcare)

16 pages, 3927 KB  
Article
TIANet: A Defect Classification Structure Based on the Combination of CNN and ViT
by Hongjuan Wang, Fangzheng Zhao, Xinjun An, Youjun Zhao, Kunxi Li and Quanbing Guo
Electronics 2025, 14(8), 1502; https://doi.org/10.3390/electronics14081502 - 9 Apr 2025
Cited by 2 | Viewed by 664
Abstract
Defect detection plays a crucial role in ensuring product quality. However, accurate and effective defect detection remains challenging due to features inherent in defect images, including scale and shape variations. We propose a new defect classification structure, TIANet, which includes a local feature extraction module (ISD), a global feature extraction module (DSViT), and an atrous spatial pyramid pooling (ASPP) module. ISD combines an inverted residual structure with a squeeze-and-excitation attention mechanism and a path-dropping mechanism to extract local features and learn complex patterns. DSViT combines a Vision Transformer with depthwise separable convolutions to extract global features and fuse them with local features, ensuring accurate feature expression for defects against similar backgrounds. ASPP enhances the model’s multi-scale feature extraction and contextual information capture, allowing it to effectively perceive defects of different shapes and scales. Experimental verification on a glass bottle dataset shows that TIANet performs well in classifying defects of national-standard white glass bottles, with an accuracy of 95.714%. Compared with the typical network models Vision Transformer and MobileNetV3, TIANet shows significant advantages, verifying its effectiveness and superiority for glass bottle defect classification.
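The squeeze-and-excitation attention inside the ISD module follows a well-known pattern: global-average pooling to a channel descriptor, a small bottleneck MLP, and a sigmoid gate. A minimal sketch of that standard block (the reduction ratio is an illustrative choice, not taken from the paper):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: a global-average 'squeeze' to a channel
    descriptor, a bottleneck MLP 'excitation', and a sigmoid channel gate."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // reduction),
                                nn.ReLU(inplace=True),
                                nn.Linear(ch // reduction, ch),
                                nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # (N, C) channel weights
        return x * w[:, :, None, None]

y = SEBlock(32)(torch.randn(1, 32, 16, 16))  # same shape as the input
```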

24 pages, 12658 KB  
Article
Camouflaged Object Detection with Enhanced Small-Structure Awareness in Complex Backgrounds
by Yaning Lv, Sanyang Liu, Yudong Gong and Jing Yang
Electronics 2025, 14(6), 1118; https://doi.org/10.3390/electronics14061118 - 12 Mar 2025
Cited by 6 | Viewed by 2111
Abstract
Small-Structure Camouflaged Object Detection (SSCOD) is a highly promising yet challenging task, as small-structure targets often exhibit weaker features and occupy a significantly smaller proportion of the image compared to normal-sized targets. Such data are not only prevalent in existing benchmark camouflaged object detection datasets but also frequently encountered in real-world scenarios. Although existing camouflaged object detection (COD) methods have significantly improved detection accuracy, research specifically focused on SSCOD remains limited. To further advance the SSCOD task, we propose a detail-preserving multi-scale adaptive network architecture that incorporates the following key components: (1) An adaptive scaling strategy designed to mimic human visual perception when observing blurry targets. (2) An Attentive Atrous Spatial Pyramid Pooling (A2SPP) module, enabling each position in the feature map to autonomously learn the optimal feature scale. (3) A scale integration mechanism, leveraging Haar Wavelet-based Downsampling (HWD) and bilinear upsampling to preserve both contextual and fine-grained details across multiple scales. (4) A Feature Enhancement Module (FEM), specifically tailored to refine feature representations in small-structure detection scenarios. Extensive comparative experiments and ablation studies conducted on three camouflaged object detection datasets, as well as our proposed small-structure test datasets, demonstrated that our framework outperformed existing state-of-the-art (SOTA) methods. Notably, our approach achieved superior performance in detecting small-structured targets, highlighting its effectiveness and robustness in addressing the challenges of SSCOD tasks. Additionally, we conducted polyp segmentation experiments on four datasets, and the results showed that our framework is also well-suited for polyp segmentation, consistently outperforming other recent methods.
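The A2SPP idea, letting each spatial position learn its own feature scale, can be approximated by predicting per-pixel softmax weights over atrous branches. The sketch below is a hypothetical reading of that mechanism, not the paper's published module:

```python
import torch
import torch.nn as nn

class AttentiveASPP(nn.Module):
    """Hypothetical sketch of attention over atrous branches: a 1x1 conv
    head predicts, at every spatial position, softmax weights over the
    dilation rates, so each position selects its own feature scale."""
    def __init__(self, ch=128, rates=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)
        self.att = nn.Conv2d(ch, len(rates), 1)   # per-pixel branch logits

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (N,B,C,H,W)
        w = torch.softmax(self.att(x), dim=1).unsqueeze(2)         # (N,B,1,H,W)
        return (w * feats).sum(dim=1)                              # (N,C,H,W)

y = AttentiveASPP()(torch.randn(1, 128, 32, 32))
```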

20 pages, 3819 KB  
Article
Research on Precise Segmentation and Center Localization of Weeds in Tea Gardens Based on an Improved U-Net Model and Skeleton Refinement Algorithm
by Zhiyong Cao, Shuai Zhang, Chen Li, Wei Feng, Baijuan Wang, Hao Wang, Ling Luo and Hongbo Zhao
Agriculture 2025, 15(5), 521; https://doi.org/10.3390/agriculture15050521 - 27 Feb 2025
Cited by 2 | Viewed by 778
Abstract
The primary objective of this research was to develop an efficient method for accurately identifying and localizing weeds in ecological tea garden environments, aiming to enhance the quality and yield of tea production. Weed competition poses a significant challenge to tea production, particularly due to the small size of weed plants, their color similarity to tea trees, and the complexity of their growth environment. A dataset comprising 5366 high-definition images of weeds in tea gardens was compiled to address this challenge. An enhanced U-Net model, incorporating a double attention mechanism and an Atrous Spatial Pyramid Pooling module, is proposed for weed recognition. Ablation experiments show that the model improves recognition accuracy and Mean Intersection over Union (MIoU) by 4.08% and 5.22%, respectively. In addition, to meet the demand for precise weed management, a method for determining the center of weed plants by integrating the center of mass and the skeleton structure was developed. The skeleton was extracted through a preprocessing step and a refinement algorithm, and the relative position of the skeleton’s intersection point and the center of mass was used to achieve up to 82% localization accuracy. These results provide technical support for the development of intelligent weeding equipment for tea gardens, helping to maintain tea garden ecology and improve production efficiency, and they also provide a reference for weed management in other natural ecological environments.
(This article belongs to the Special Issue Applications of Remote Sensing in Agricultural Soil and Crop Mapping)
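The centroid-plus-skeleton localization step can be prototyped with off-the-shelf tools: SciPy's center_of_mass and scikit-image's skeletonize. The toy sketch below snaps the centroid to the nearest skeleton pixel, a simplification of the paper's intersection-point rule:

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

# Toy binary weed mask (a plus-shaped blob).
mask = np.zeros((21, 21), dtype=bool)
mask[8:13, 3:18] = True
mask[3:18, 8:13] = True

cy, cx = ndimage.center_of_mass(mask)      # centroid of the plant mask
skel = skeletonize(mask)                   # 1-pixel-wide skeleton

# Snap the centroid to the nearest skeleton pixel so the reported
# center always lies on the plant body.
ys, xs = np.nonzero(skel)
d2 = (ys - cy) ** 2 + (xs - cx) ** 2
center = (int(ys[d2.argmin()]), int(xs[d2.argmin()]))
print(center)   # expected near (10, 10) for this symmetric toy mask
```

Anchoring the center to the skeleton matters for thin, irregular plants, where the raw centroid can fall outside the plant body entirely.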

17 pages, 5264 KB  
Article
Automated Road Extraction from Satellite Imagery Integrating Dense Depthwise Dilated Separable Spatial Pyramid Pooling with DeepLabV3+
by Arpan Mahara, Md Rezaul Karim Khan, Liangdong Deng, Naphtali Rishe, Wenjia Wang and Seyed Masoud Sadjadi
Appl. Sci. 2025, 15(3), 1027; https://doi.org/10.3390/app15031027 - 21 Jan 2025
Cited by 4 | Viewed by 2833
Abstract
Road extraction, a sub-domain of remote sensing applications, is a subject of extensive and ongoing research. Automatically extracting roads from satellite imagery encounters significant challenges due to the multi-scale and diverse structures of roads, and further improvement in this field is needed. Convolutional neural networks (CNNs), especially the DeepLab series known for its proficiency in semantic segmentation and its efficiency in interpreting multi-scale object features, address some of the challenges caused by the varying nature of roads. The present work builds on DeepLabV3+, the latest version of the DeepLab series, by introducing an innovative Dense Depthwise Dilated Separable Spatial Pyramid Pooling (DenseDDSSPP) module and integrating it in place of the conventional Atrous Spatial Pyramid Pooling (ASPP) module. This modification enhances the extraction of complex road structures from satellite images. This study hypothesizes that integrating DenseDDSSPP with a CNN backbone network and a Squeeze-and-Excitation block will generate an efficient dense feature map by focusing on relevant features, leading to more precise and accurate road extraction from remote sensing images. A comparison of our model’s performance against state-of-the-art models demonstrates better results, highlighting the effectiveness of the proposed approach.
(This article belongs to the Special Issue Convolutional Neural Networks and Computer Vision)
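A depthwise dilated separable convolution, the building block named in DenseDDSSPP, is a depthwise 3x3 convolution with a dilation rate followed by a 1x1 pointwise convolution. The sketch below also shows one hedged reading of the dense connectivity across rates (the paper's exact wiring may differ):

```python
import torch
import torch.nn as nn

class DepthwiseDilatedSeparable(nn.Module):
    """Depthwise 3x3 convolution at a given dilation rate, followed by a
    1x1 pointwise convolution: the separable counterpart of an ASPP branch."""
    def __init__(self, in_ch, out_ch, rate):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=rate,
                                   dilation=rate, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Dense connectivity across rates: each block consumes the concatenation
# of the input and all previous outputs (an assumed reading of 'Dense').
blocks = nn.ModuleList(DepthwiseDilatedSeparable(256 + 64 * i, 64, r)
                       for i, r in enumerate((3, 6, 12, 18)))
x = torch.randn(1, 256, 32, 32)
feats = [x]
for b in blocks:
    feats.append(b(torch.cat(feats, dim=1)))
print(feats[-1].shape)  # torch.Size([1, 64, 32, 32])
```

Separable convolutions cut the parameter count of each pyramid branch roughly by the kernel area, which is what makes stacking many dilation rates densely affordable.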
