Search Results (19)

Search Parameters:
Keywords = mixed depth-wise convolution

20 pages, 4529 KB  
Article
Intelligent Recognition of Muffled Blasting Sounds and Lithology Prediction in Coal Mines Based on RDGNet
by Gengxin Li, Hua Ding, Kai Wang, Xiaoqiang Zhang and Jiacheng Sun
Sensors 2025, 25(24), 7601; https://doi.org/10.3390/s25247601 - 15 Dec 2025
Viewed by 374
Abstract
In the Yangquan coal mining region of China, muffled blasting sounds commonly occur in the rock surrounding mine workings. They result from the instantaneous release of energy that follows elastic deformation of overlying brittle rock layers and are related to fracture development. Although these events rarely cause immediate hazards, their acoustic signatures contain critical information about cumulative rock damage. Conventional monitoring of muffled blasting sounds and surrounding-rock stability currently relies on microseismic systems and on-site sampling. However, these methods identify muffled blasting events inefficiently, have poor real-time performance, and are highly subjective because they depend on manual signal interpretation and empirically set thresholds. This article proposes the retentive depthwise gated network (RDGNet). By combining retentive network sequence modeling, depthwise separable convolution, and a gated fusion mechanism, RDGNet extracts and fuses multimodal features from acoustic emission sequences and audio Mel spectrograms, supporting real-time muffled blasting sound recognition and lithology classification. Results confirm the model’s robustness under noisy and multisource mixed-signal conditions (overall accuracy: 92.12%, area under the curve: 0.985, macro F1: 0.931). This work provides an efficient approach to intelligent monitoring of coal mine rock stability that can be extended to safety assessment in underground engineering, advancing the mining industry toward preventive management.
(This article belongs to the Section Intelligent Sensors)
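
The abstract does not spell out how RDGNet’s gate combines the acoustic emission and Mel-spectrogram branches. The sketch below shows one common form of gated fusion, assuming two equal-length feature vectors and a per-element sigmoid gate; the weights `w_a`, `w_b` and `bias` are hypothetical stand-ins for learned parameters, and RDGNet’s actual formulation may differ.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(a, b, w_a, w_b, bias):
    """Fuse two equal-length feature vectors with an element-wise sigmoid gate.

    g_i     = sigmoid(w_a[i] * a[i] + w_b[i] * b[i] + bias[i])
    fused_i = g_i * a[i] + (1 - g_i) * b[i]
    w_a, w_b and bias are hypothetical stand-ins for learned parameters.
    """
    if not (len(a) == len(b) == len(w_a) == len(w_b) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused, gates = [], []
    for ai, bi, wa, wb, c in zip(a, b, w_a, w_b, bias):
        g = sigmoid(wa * ai + wb * bi + c)  # share of modality a at this position
        gates.append(g)
        fused.append(g * ai + (1.0 - g) * bi)
    return fused, gates


# Zero weights give g = 0.5, i.e. an equal blend of the two modalities.
fused, gates = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
print(fused)  # [0.5, 0.5]
```

Because the gate is computed from both inputs, the network can lean on whichever modality is more informative at each position instead of using a fixed blend.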

25 pages, 8066 KB  
Article
Estimation of All-Weather Daily Surface Net Radiation over the Tibetan Plateau Using an Optimized CNN Model
by Bin Ma, Yaoming Ma and Weiqiang Ma
Remote Sens. 2025, 17(23), 3894; https://doi.org/10.3390/rs17233894 - 30 Nov 2025
Viewed by 603
Abstract
Accurate daily surface net radiation (Rn) estimation over the Tibetan Plateau’s complex and highly heterogeneous terrain is essential for advancing the understanding of land–atmosphere exchanges and regional climate processes. This study developed an optimized deep learning framework that systematically evaluates 19 CNN architectures using a per-pixel multivariate regression design (1 × 1 × 21). The channel-rich representation incorporates engineered neighborhood descriptors to statistically embed spatial context while fully avoiding the mosaic and boundary artifacts common in patch-based approaches. Among all tested networks, Xception delivered the best combination of accuracy (R² > 0.94), computational efficiency, and physical consistency. Its depthwise separable convolutions and skip connections enable hierarchical nonlinear cross-channel feature learning, effectively capturing the complex dependencies between surface variables and Rn. Independent validation confirmed stable performance under diverse weather conditions and substantially better skill than GLASS, especially across rugged terrain and high-albedo surfaces. SHAP analysis further highlights physically meaningful behavior, with astronomical and topographic factors contributing ~70% and surface properties ~25% to predictions. Remaining challenges include dependence on continuous high-quality multi-source inputs and scale effects from mixed pixels. Future work will enhance operational deployment through automated daily preprocessing, improved sub-diurnal characterization via multi-scale data fusion, and stronger physical constraints to increase reliability.
(This article belongs to the Section Atmospheric Remote Sensing)
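
Xception is built from depthwise separable convolutions, which factor a standard convolution into a per-channel (depthwise) spatial filter followed by a 1 × 1 (pointwise) convolution. The arithmetic below is a sketch that ignores biases and uses illustrative layer sizes, not values from the study; it shows why the factorization saves parameters, with a ratio to the standard convolution of 1/C_out + 1/k².

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution: one k x k x c_in filter per output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


k, c_in, c_out = 3, 32, 64  # illustrative layer sizes
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 4))  # 18432 2336 0.1267
```

For stride-1 layers with "same" padding, multiply-accumulate counts scale the same way (both are multiplied by H × W), so the ratio also approximates the compute saving.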

18 pages, 2703 KB  
Article
High-Frequency Guided Dual-Branch Attention Multi-Scale Hierarchical Dehazing Network for Transmission Line Inspection Images
by Jian Sun, Lanqi Guo and Rui Hu
Electronics 2025, 14(23), 4632; https://doi.org/10.3390/electronics14234632 - 25 Nov 2025
Viewed by 376
Abstract
To address edge blurring in drone inspection images of mountainous transmission lines caused by non-uniform haze, as well as the low operational efficiency of traditional dehazing algorithms caused by increasing network complexity, this paper proposes a high-frequency guided dual-branch attention multi-scale hierarchical dehazing network for transmission line scenarios. The network’s core architecture combines multi-block hierarchical processing with a multi-scale integration scheme, and each layer is built on an asymmetric encoder–decoder with residual channels. A Mix structure module embedded in the encoder forms a dual-branch attention mechanism: the low-frequency global perception branch cascades channel attention and pixel attention to model global features, while the high-frequency local enhancement branch uses multi-directional edge feature extraction to capture edge information, which suits the structural characteristics of transmission line conductors and towers. In addition, a fog density estimation branch based on the dark channel mean dynamically adjusts the weights of the two branches according to haze concentration, addressing the attention failure caused by the attenuation of high-frequency signals in dense haze regions. In the decoder, depthwise separable convolutions form lightweight residual modules that reduce running time while maintaining feature expression capability. At the output stage, an inter-block feature fusion module eliminates cross-block artifacts caused by multi-block processing through multi-strategy collaborative optimization.
Experimental results on the public datasets NH-HAZE20, NH-HAZE21, and O-HAZE and on a self-built foggy transmission line dataset show that the proposed algorithm significantly outperforms classic and state-of-the-art algorithms in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM), and its running time is 19% shorter than that of DMPHN. Subjectively, the restored images have continuous, complete edges and high color fidelity, meeting the practical needs of subsequent fault detection in transmission line inspection.
(This article belongs to the Section Computer Science & Engineering)
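
The dark channel of an image is the per-pixel minimum over the color channels, further minimized over a local patch; under the dark channel prior it is close to zero in haze-free outdoor regions and rises as haze thickens. A minimal sketch of a dark-channel-mean density estimate follows, assuming RGB values in [0, 1] and a 3 × 3 patch. How the network maps this value to branch weights is not described in the abstract and is not reproduced here.

```python
def dark_channel(img, patch=3):
    """Dark channel of an H x W image given as nested lists of (r, g, b) in [0, 1]."""
    h, w = len(img), len(img[0])
    r = patch // 2
    # Per-pixel minimum over the color channels.
    min_rgb = [[min(px) for px in row] for row in img]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Minimum over a patch x patch window, clipped at the image border.
            out[y][x] = min(
                min_rgb[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
    return out


def dark_channel_mean(img, patch=3):
    """Mean of the dark channel: larger values suggest denser haze."""
    dc = dark_channel(img, patch)
    return sum(map(sum, dc)) / (len(dc) * len(dc[0]))


hazy = [[(0.8, 0.8, 0.85)] * 4 for _ in range(4)]   # bright, low-contrast pixels
clear = [[(0.1, 0.6, 0.3)] * 4 for _ in range(4)]   # one dark channel per pixel
print(dark_channel_mean(hazy) > dark_channel_mean(clear))  # True
```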

19 pages, 5891 KB  
Article
MS-YOLOv11: A Wavelet-Enhanced Multi-Scale Network for Small Object Detection in Remote Sensing Images
by Haitao Liu, Xiuqian Li, Lifen Wang, Yunxiang Zhang, Zitao Wang and Qiuyi Lu
Sensors 2025, 25(19), 6008; https://doi.org/10.3390/s25196008 - 29 Sep 2025
Viewed by 2711
Abstract
In remote sensing imagery, objects smaller than 32×32 pixels suffer from three persistent challenges that existing detectors inadequately resolve: (1) their weak signal is easily submerged in background clutter, causing high miss rates; (2) the scarcity of valid pixels yields few geometric or textural cues, hindering discriminative feature extraction; and (3) successive down-sampling irreversibly discards high-frequency details, which multi-scale pyramids still fail to compensate for. To counteract these issues, we propose MS-YOLOv11, an enhanced YOLOv11 variant that integrates “frequency-domain detail preservation, lightweight receptive-field expansion, and adaptive cross-scale fusion.” Specifically, a 2D Haar wavelet first decomposes the image into multiple frequency sub-bands to explicitly isolate and retain high-frequency edges and textures while suppressing noise. Each sub-band is then processed independently by small-kernel depthwise convolutions that enlarge the receptive field without over-smoothing. Finally, the Mix Structure Block (MSB) uses the MSPLCK module to perform densely sampled multi-scale atrous convolutions that gather rich context for diminutive objects, followed by the EPA module, which adaptively fuses and re-weights features via residual connections to suppress background interference. Extensive experiments on DOTA and DIOR demonstrate that MS-YOLOv11 surpasses the baseline in mAP@50, mAP@95, parameter efficiency, and inference speed, validating its effectiveness for small-object detection.
(This article belongs to the Section Remote Sensors)
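
A single-level 2D Haar transform turns each 2 × 2 block into one average (LL) and three detail coefficients, so edges and textures land in separate high-frequency sub-bands at half resolution. The pure-Python sketch below uses the orthonormal scaling (division by 2), which preserves signal energy. Sub-band naming (LH versus HL) varies between libraries, and this is not MS-YOLOv11’s code.

```python
def haar2d(x):
    """Single-level orthonormal 2D Haar transform of an H x W nested list (H, W even).

    Returns (LL, LH, HL, HH) at half resolution; other libraries may swap
    the LH/HL labels.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)  # low-pass average
            r_lh.append((a + b - c - d) / 2)  # top minus bottom: horizontal edges
            r_hl.append((a - b + c - d) / 2)  # left minus right: vertical edges
            r_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


# A vertical step edge shows up only in the HL band.
print(haar2d([[0.0, 1.0], [0.0, 1.0]]))  # ([[1.0]], [[0.0]], [[-1.0]], [[0.0]])
```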

24 pages, 12392 KB  
Article
A Robust and High-Accuracy Banana Plant Leaf Detection and Counting Method for Edge Devices in Complex Banana Orchard Environments
by Xing Xu, Guojie Liu, Zihao Luo, Shangcun Chen, Shiye Peng, Huazimo Liang, Jieli Duan and Zhou Yang
Agronomy 2025, 15(9), 2195; https://doi.org/10.3390/agronomy15092195 - 15 Sep 2025
Viewed by 973
Abstract
Leaves are the key organs for photosynthesis and nutrient production, and leaf count is an important indicator of banana plant health and growth rate. However, in complex orchard environments, leaves often overlap, the background is cluttered, and illumination varies, making accurate segmentation and detection challenging. To address these issues, we propose a lightweight banana leaf detection and counting method deployable on embedded devices, which integrates a space–depth-collaborative reasoning strategy with multi-scale feature enhancement to achieve efficient and precise leaf identification and counting. To handle complex background interference and occlusion, we design a multi-scale attention-guided feature enhancement mechanism that employs a Mixed Local Channel Attention (MLCA) module and a Self-Ensembling Attention Mechanism (SEAM) to strengthen local salient feature representation, suppress background noise, and improve discriminability under occlusion. To mitigate feature drift caused by environmental changes, we introduce a task-aware dynamic scale-adaptive detection head (DyHead) combined with multi-rate depthwise separable dilated convolutions (DWR_Conv) to enhance multi-scale contextual awareness and adaptive feature recognition. Furthermore, to differentiate and count instances under occlusion and overlap, we develop a detection-guided space–depth position modeling method that builds on object detection results and models the distribution of occluded instances through space–depth feature description, outlier removal, and adaptive clustering analysis. Experimental results demonstrate that our YOLOv8n MDSD model outperforms the baseline by 2.08% in mAP50-95 and achieves a mean absolute error (MAE) of 0.67 and a root mean square error (RMSE) of 1.01 in leaf counting, showing excellent accuracy and robustness for automated banana leaf statistics.
(This article belongs to the Section Precision and Digital Agriculture)
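
The counting errors reported above follow the usual definitions, MAE = (1/n) Σ|ŷᵢ − yᵢ| and RMSE = √((1/n) Σ(ŷᵢ − yᵢ)²). A short sketch with hypothetical per-plant leaf counts (not data from the study):

```python
import math


def mae(pred, true):
    """Mean absolute error between predicted and true counts."""
    if len(pred) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean square error: squaring weights large miscounts more heavily."""
    if len(pred) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


pred = [5, 7, 6]  # hypothetical predicted leaf counts
true = [5, 6, 8]  # hypothetical ground-truth leaf counts
print(mae(pred, true))             # 1.0
print(round(rmse(pred, true), 4))  # 1.291
```

RMSE is always at least as large as MAE, and the gap grows when a few plants are badly miscounted, so reporting both conveys how errors are distributed.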

18 pages, 2565 KB  
Article
Rock Joint Segmentation in Drill Core Images via a Boundary-Aware Token-Mixing Network
by Seungjoo Lee, Yongjin Kim, Yongseong Kim, Jongseol Park and Bongjun Ji
Buildings 2025, 15(17), 3022; https://doi.org/10.3390/buildings15173022 - 25 Aug 2025
Cited by 1 | Viewed by 1086
Abstract
The precise mapping of rock joint traces is fundamental to the design and safety assessment of foundations, retaining structures, and underground cavities in building and civil engineering. Existing deep learning approaches either impose computational demands too high for on-site deployment or disrupt the topological continuity of the subpixel lineaments that govern rock mass behavior. This study presents BATNet-Lite, a lightweight encoder–decoder architecture optimized for joint segmentation on resource-constrained devices. The encoder introduces a Boundary-Aware Token-Mixing (BATM) block that separates feature maps into patch tokens and directionally pooled stripe tokens; a bidirectional attention mechanism then transfers global context to local descriptors while refining stripe features, capturing long-range connectivity with negligible overhead. A complementary Multi-Scale Line Enhancement (MLE) module combines depth-wise dilated and deformable convolutions to produce scale-invariant responses to joints of varying apertures. In the decoder, a Skeletal-Contrastive Decoder (SCD) uses dual heads to predict segmentation and skeleton maps simultaneously, while an InfoNCE-based contrastive loss enforces their topological consistency without requiring explicit skeleton labels. Training uses a composite focal Tversky and edge IoU loss under a curriculum-thinning schedule, improving edge adherence and continuity. Ablation experiments confirm that BATM, MLE, and SCD each contribute substantial gains in boundary accuracy and connectivity preservation. By delivering topology-preserving joint maps with a small parameter count, BATNet-Lite facilitates rapid geological data acquisition for tunnel face mapping, slope inspection, and subsurface digital twin development, supporting safer and more efficient building and underground engineering practice.
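
The Tversky index generalizes the Dice coefficient by weighting false negatives (α) and false positives (β) separately, and the focal variant raises (1 − TI) to a power to emphasize hard examples. The sketch below computes a soft focal Tversky loss on flattened pixel probabilities. The values α = 0.7, β = 0.3 and the exponent 0.75 are illustrative assumptions, not BATNet-Lite’s settings, and the edge IoU term and curriculum-thinning schedule are not shown.

```python
def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Soft focal Tversky loss for binary segmentation.

    probs:   predicted foreground probabilities in [0, 1] (flattened pixels)
    targets: ground-truth labels, 0 or 1 (same length)
    alpha weights false negatives, beta weights false positives; the default
    values here are assumptions for illustration.
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    tp = sum(p * t for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return max(0.0, 1.0 - tversky) ** gamma


# With alpha > beta, missing a joint pixel costs more than a false alarm.
loss_fn = focal_tversky_loss([1.0, 0.0], [1, 1])  # one false negative
loss_fp = focal_tversky_loss([1.0, 1.0], [1, 0])  # one false positive
print(loss_fn > loss_fp)  # True
```

Penalizing false negatives more heavily suits thin structures such as joint traces, where a few missed pixels can break the continuity of a line.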

28 pages, 19790 KB  
Article
HSF-DETR: A Special Vehicle Detection Algorithm Based on Hypergraph Spatial Features and Bipolar Attention
by Kaipeng Wang, Guanglin He and Xinmin Li
Sensors 2025, 25(14), 4381; https://doi.org/10.3390/s25144381 - 13 Jul 2025
Cited by 1 | Viewed by 1407
Abstract
Special vehicle detection in intelligent surveillance, emergency rescue, and reconnaissance faces significant accuracy and robustness challenges in complex environments, necessitating advanced detection algorithms for these critical applications. This paper proposes HSF-DETR (Hypergraph Spatial Feature DETR), which integrates four innovative modules: a Cascaded Spatial Feature Network (CSFNet) backbone with Cross-Efficient Convolutional Gating (CECG) for enhanced long-range detection through hybrid state-space modeling; a Hypergraph-Enhanced Spatial Feature Modulation (HyperSFM) network that uses hypergraph structures for high-order feature correlations and adaptive multi-scale fusion; a Dual-Domain Feature Encoder (DDFE) that combines Bipolar Efficient Attention (BEA) and a Frequency-Enhanced Feed-Forward Network (FEFFN) for precise feature weight allocation; and a Spatial-Channel Fusion Upsampling Block (SCFUB) that improves feature fidelity through depth-wise separable convolution and channel shift mixing. Experiments on a self-built special vehicle dataset of 2388 images show that HSF-DETR achieves mAP50 and mAP50-95 of 96.6% and 70.6%, respectively, improvements of 3.1% and 4.6% over the baseline RT-DETR, while maintaining computational efficiency at 59.7 GFLOPs and 18.07 M parameters. Cross-domain validation on the VisDrone2019 and BDD100K datasets confirms the method’s generalization capability and robustness across diverse scenarios, establishing HSF-DETR as an effective solution for special vehicle detection in complex environments.
(This article belongs to the Section Sensing and Imaging)

19 pages, 1536 KB  
Article
A Study on Energy Consumption in AI-Driven Medical Image Segmentation
by R. Prajwal, S. J. Pawan, Shahin Nazarian, Nicholas Heller, Christopher J. Weight, Vinay Duddalwar and C.-C. Jay Kuo
J. Imaging 2025, 11(6), 174; https://doi.org/10.3390/jimaging11060174 - 26 May 2025
Cited by 1 | Viewed by 2603
Abstract
As artificial intelligence advances in medical image analysis, its environmental impact remains largely overlooked. This study analyzes the energy demands of AI workflows for medical image segmentation using the popular Kidney Tumor Segmentation-2019 (KiTS-19) dataset. It examines how training and inference differ in energy consumption, focusing on factors that influence resource usage, such as computational complexity, memory access, and I/O operations. To address these aspects, we evaluated three variants of convolution (Standard Convolution, Depthwise Convolution, and Group Convolution) combined with optimization techniques such as Mixed Precision and Gradient Accumulation. While training is energy-intensive, the recurring nature of inference often results in significantly higher cumulative energy consumption over a model’s life cycle. Depthwise Convolution with Mixed Precision achieves the lowest energy consumption during training while maintaining strong performance, making it the most energy-efficient configuration among those tested. In contrast, Group Convolution fails to achieve energy efficiency due to significant input/output overhead. These findings emphasize the need for GPU-centric strategies and energy-conscious AI practices, offering actionable guidance for designing scalable, sustainable innovation in medical image analysis.
(This article belongs to the Special Issue Imaging in Healthcare: Progress and Challenges)
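
Gradient accumulation trades memory for extra steps: gradients from several micro-batches are summed, each scaled by 1/steps, before a single optimizer update, so for a mean loss over equal-sized micro-batches the result matches the full-batch gradient. The sketch below checks this on a hypothetical one-parameter linear model with squared error; it says nothing about energy, which the study measures empirically.

```python
def mse_grad(w, batch):
    """Gradient of the mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Average the gradients of equal-sized micro-batches instead of one big batch."""
    if micro_batch_size <= 0 or len(batch) % micro_batch_size:
        raise ValueError("batch size must be a positive multiple of micro_batch_size")
    steps = len(batch) // micro_batch_size
    total = 0.0
    for i in range(steps):
        micro = batch[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += mse_grad(w, micro) / steps  # scale each micro-batch contribution
    return total


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]  # hypothetical (x, y) pairs
print(mse_grad(0.5, data), accumulated_grad(0.5, data, 2))  # -22.0 -22.0
```

The equivalence holds for losses that average over samples; layers such as batch normalization compute statistics per micro-batch, so with them accumulation is only approximately equivalent to a larger batch.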

15 pages, 7036 KB  
Article
Detection of Fiber-Flaw on Pill Surface Based on Lightweight Network SA-MGhost-DVGG
by Jipei Lou, Hongyi Wang, Haodong Liang and Ziwei Wu
Computers 2025, 14(5), 200; https://doi.org/10.3390/computers14050200 - 21 May 2025
Viewed by 809
Abstract
Fiber-flaw detection on pill surfaces is a critical yet challenging task in industrial pharmacy due to diverse defect characteristics. To overcome the limitations of traditional methods in accuracy and real-time performance, this study introduces SA-MGhost-DVGG, a novel lightweight network for enhanced detection. The proposed network integrates an MGhost module for reducing parameters and computational load, a mixed-channel spatial attention (SA) module to refine features specific to fiber regions, and depthwise separable convolutions (DepSepConv) for efficient dimensionality reduction while preserving feature information. Experimental evaluations demonstrate that SA-MGhost-DVGG achieves a mean detection accuracy of 99.01% with an average inference time of 2.23 ms per pill. The findings confirm that SA-MGhost-DVGG effectively balances high accuracy with computational efficiency, offering a robust solution for industrial applications.
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)
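
MGhost is the authors’ variant of the Ghost module, and its exact structure is not given in the abstract. As a rough guide to where the savings come from, the sketch below counts parameters for an original GhostNet-style module, assuming a primary k × k convolution that produces C_out/s intrinsic maps and cheap d × d depthwise operations that generate the remaining maps (biases and batch normalization ignored; layer sizes are illustrative).

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out


def ghost_module_params(k, c_in, c_out, s=2, d=3):
    """Weights of a GhostNet-style module (an assumption, not the MGhost design).

    A primary k x k convolution produces m = c_out // s intrinsic maps; cheap
    d x d depthwise operations then generate the other (s - 1) * m maps.
    """
    if c_out % s:
        raise ValueError("c_out must be divisible by s")
    m = c_out // s
    primary = k * k * c_in * m
    cheap = (s - 1) * m * d * d
    return primary + cheap


std = conv_params(3, 64, 128)
ghost = ghost_module_params(3, 64, 128)
print(std, ghost, round(std / ghost, 2))  # 73728 37440 1.97
```

The compression ratio approaches s when the cheap depthwise operations are small relative to the primary convolution, which is why s = 2 roughly halves the parameter count here.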

19 pages, 4026 KB  
Article
Power Converter Fault Detection Using MLCA–SpikingShuffleNet
by Li Wang, Feiyang Zhu, Fengfan Jiang and Yuwei Yang
World Electr. Veh. J. 2025, 16(1), 36; https://doi.org/10.3390/wevj16010036 - 12 Jan 2025
Viewed by 2183
Abstract
As electric vehicles are widely adopted, the power converter plays a crucial role as one of their key components. Traditional fault detection methods often struggle with real-time performance and computational efficiency, making it difficult to meet the demand for efficient and accurate fault diagnosis of electric vehicle power converters. To address this challenge, this paper proposes a novel fault detection model, SpikingShuffleNet. First, an efficient SpikingShuffle Unit is designed that integrates grouped convolutions and channel shuffle, reducing the model’s computational complexity by optimizing feature extraction and channel interaction. Next, by stacking SpikingShuffle Units appropriately and refining the network architecture, a complete lightweight diagnostic network is constructed for real-time fault detection in electric vehicle power converters. Finally, the Mixed Local Channel Attention mechanism is introduced to address potential limitations in feature representation caused by grouped convolutions, further improving fault detection accuracy and robustness by balancing local detail preservation and global feature integration. Experimental results show that SpikingShuffleNet achieves excellent accuracy and robustness in power converter fault detection and meets the real-time fault diagnosis requirements of low-power embedded devices.
(This article belongs to the Special Issue Power Electronics for Electric Vehicles)
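
Grouped convolutions connect only channels within the same group; a channel shuffle interleaves channels across groups so the next grouped convolution sees information from all of them. A minimal sketch of the standard ShuffleNet reshape-transpose-flatten operation on a list of channel indices follows; where the SpikingShuffle Unit places this step is not specified in the abstract.

```python
def channel_shuffle(channels, groups):
    """ShuffleNet-style channel shuffle on a flat list of channels.

    Equivalent to reshaping to (groups, n // groups), transposing to
    (n // groups, groups), and flattening back to one list.
    """
    n = len(channels)
    if groups <= 0 or n % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


# Channels 0-2 form group 0 and channels 3-5 form group 1; after shuffling,
# neighbouring channels come from different groups.
print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
```

In a tensor framework the same operation is a reshape of the channel axis followed by a transpose and a flatten, so it adds no parameters and very little compute.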

16 pages, 10398 KB  
Article
U-Net Semantic Segmentation-Based Calorific Value Estimation of Straw Multifuels for Combined Heat and Power Generation Processes
by Lianming Li, Zhiwei Wang and Defeng He
Energies 2024, 17(20), 5143; https://doi.org/10.3390/en17205143 - 16 Oct 2024
Cited by 1 | Viewed by 1310
Abstract
This paper proposes a system for real-time estimation of the calorific value of mixed straw fuels based on an improved U-Net semantic segmentation model. The system aims to address the uncertainty in heat and power generation per unit time in combined heat and power generation (CHPG) systems caused by fluctuations in the calorific value of straw fuels. It integrates an industrial camera, a moisture detector, and quality sensors to capture images of the multi-fuel straw, and applies the improved U-Net network to segment the images semantically, accurately calculating the proportion of each type of straw. The improved U-Net introduces a self-attention mechanism in the skip connections of the final encoder layer, replaces standard convolutions with depthwise separable convolutions, and replaces the convolutional bottleneck layers with a Transformer encoder. These changes give the model high segmentation accuracy and strong generalization capability while maintaining good real-time performance. The segmentation results are used to calculate the proportions of the different straw types, which, combined with moisture content and quality data, allow the calorific value of the mixed fuel to be estimated in real time from the elemental composition of each straw type. Validation on images captured at an actual thermal power plant shows that, under the same conditions, the proposed model loses only 0.2% accuracy compared with the traditional U-Net, while its parameter count is reduced by 74% and its inference speed is improved by 23%.
(This article belongs to the Special Issue Application of New Technologies in Bioenergy and Biofuel Conversion)
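
The abstract estimates the mixture’s calorific value from each straw type’s elemental composition, which is not reproduced here. As a simplified stand-in, the sketch below applies a mass-weighted mixing rule to hypothetical dry-basis lower heating values and then corrects for moisture with the common as-received formula LHV_ar = LHV_dry·(1 − M) − 2.443·M (MJ/kg). It assumes the segmented proportions have already been converted to mass fractions, which in practice requires bulk-density information.

```python
WATER_LATENT_HEAT = 2.443  # MJ/kg, latent heat of vaporization of water at 25 °C


def mixed_fuel_lhv(fractions, dry_lhv, moisture):
    """Estimate the as-received lower heating value (MJ/kg) of a straw mixture.

    fractions: mass fraction of each straw type (assumed to sum to 1)
    dry_lhv:   dry-basis lower heating value of each straw type, MJ/kg
    moisture:  moisture content of the mixture, wet basis, in [0, 1)
    """
    if len(fractions) != len(dry_lhv):
        raise ValueError("fractions and dry_lhv must have the same length")
    if abs(sum(fractions) - 1.0) > 1e-6:
        raise ValueError("fractions must sum to 1")
    lhv_dry = sum(f * q for f, q in zip(fractions, dry_lhv))
    # Water dilutes the dry matter and absorbs heat when it evaporates.
    return lhv_dry * (1.0 - moisture) - WATER_LATENT_HEAT * moisture


# Hypothetical two-straw mixture: equal mass fractions, 10% moisture.
print(round(mixed_fuel_lhv([0.5, 0.5], [16.0, 18.0], 0.10), 4))  # 15.0557
```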

21 pages, 10847 KB  
Article
DLCH-YOLO: An Object Detection Algorithm for Monitoring the Operation Status of Circuit Breakers in Power Scenarios
by Riben Shu, Lihua Chen, Lumei Su, Tianyou Li and Fan Yin
Electronics 2024, 13(19), 3949; https://doi.org/10.3390/electronics13193949 - 7 Oct 2024
Cited by 5 | Viewed by 2226
Abstract
In power system monitoring, detecting the operating status of circuit breakers is often inaccurate because of variable object scales and background interference. This paper introduces DLCH-YOLO, an object detection algorithm for identifying the operating status of circuit breakers. First, we propose a novel C2f_DLKA module based on Deformable Large Kernel Attention, which adapts to objects of varying scales within a large receptive field and thereby extracts multi-scale features more effectively. Second, we propose a Semantic Screening Feature Pyramid Network to fuse multi-scale features; by filtering low-level semantic information, it suppresses background interference and improves localization accuracy. Finally, the feature extraction network incorporates Generalized-Sparse Convolution, which combines depth-wise separable convolution with channel mixing operations to reduce the computational load. DLCH-YOLO achieved 91.8% mAP on our self-built power equipment dataset, a 4.7% improvement over the baseline YOLOv8, and outperforms mainstream detection algorithms in both detection accuracy and real-time performance. The algorithm provides an efficient and viable solution for circuit breaker status detection.

21 pages, 11733 KB  
Article
LWSDNet: A Lightweight Wheat Scab Detection Network Based on UAV Remote Sensing Images
by Ning Yin, Wenxia Bao, Rongchao Yang, Nian Wang and Wenqiang Liu
Remote Sens. 2024, 16(15), 2820; https://doi.org/10.3390/rs16152820 - 31 Jul 2024
Cited by 5 | Viewed by 1711
Abstract
Wheat scab can reduce wheat yield and quality. Unmanned aerial vehicles (UAVs) are now widely used to monitor field crops. However, UAV platforms have limited on-board computational resources, and, compared with ground images, UAV images have complex backgrounds and smaller targets. To address these challenges, this paper proposes a lightweight wheat scab detection network for UAV imagery. Overlapping cropping and image contrast enhancement methods are designed to preprocess the UAV remote-sensing images. The lightweight detection network, called LWSDNet, is built with mixed depthwise convolution (MixConv) to monitor wheat scab in field environments. MixConv significantly reduces the number of LWSDNet parameters through depthwise and pointwise convolutions, and its kernels of different sizes extract rich scab features. To enable LWSDNet to extract more scab features, a scab feature enhancement module based on spatial attention and dilated convolution is designed. A MixConv adaptive feature fusion module is designed to detect lesions of different sizes accurately by fully exploiting the semantic and detailed information in the network. During training, a knowledge distillation strategy that integrates scab features and responses is employed to further improve the average precision of LWSDNet. Experimental results show that LWSDNet detects wheat scab with an average precision of 79.8%, higher than common object detection models and lightweight object detection models. LWSDNet has only 3.2 million (M) parameters, generally fewer than existing lightweight object detection networks.
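
MixConv partitions the input channels into groups and gives each group a depthwise kernel of a different size, then mixes channels with a 1 × 1 pointwise convolution. The sketch below counts the resulting parameters for an illustrative 48-channel layer with 3 × 3, 5 × 5 and 7 × 7 kernels; LWSDNet’s actual kernel sizes and channel splits are not stated in the abstract.

```python
def split_channels(total, groups):
    """Split `total` channels into `groups` near-equal parts (earlier parts take the remainder)."""
    base, rem = divmod(total, groups)
    return [base + (1 if i < rem else 0) for i in range(groups)]


def mixconv_params(c_in, c_out, kernel_sizes):
    """Weights of a MixConv-style block: mixed-size depthwise kernels + 1x1 pointwise."""
    splits = split_channels(c_in, len(kernel_sizes))
    depthwise = sum(c * k * k for c, k in zip(splits, kernel_sizes))
    pointwise = c_in * c_out
    return depthwise, pointwise


dw, pw = mixconv_params(48, 64, [3, 5, 7])  # illustrative sizes
print(dw, pw)      # 1328 3072
print(48 * 7 * 7)  # 2352: one 7x7 depthwise kernel on all 48 channels
```

Mixing kernel sizes lets some channels see large context while others stay cheap, so the block covers several receptive fields for fewer depthwise weights than using the largest kernel everywhere.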

17 pages, 6782 KB  
Article
MeViT: A Medium-Resolution Vision Transformer for Semantic Segmentation on Landsat Satellite Imagery for Agriculture in Thailand
by Teerapong Panboonyuen, Chaiyut Charoenphon and Chalermchon Satirapod
Remote Sens. 2023, 15(21), 5124; https://doi.org/10.3390/rs15215124 - 26 Oct 2023
Cited by 8 | Viewed by 4135
Abstract
Semantic segmentation is a fundamental task in remote sensing image analysis that classifies each pixel of an image into land use and land cover (LULC) classes. In this paper, we propose MeViT (Medium-Resolution Vision Transformer) for segmenting Landsat satellite imagery of the main economic crops in Thailand: (i) para rubber, (ii) corn, and (iii) pineapple. MeViT enhances vision transformers (ViTs), a modern deep learning approach to computer vision tasks, so that they learn semantically rich and spatially precise multi-scale representations by integrating medium-resolution multi-branch architectures with ViTs. We revised mixed-scale convolutional feedforward networks (MixCFN) by incorporating multiple depth-wise convolution paths that extract multi-scale local information, balancing the model’s performance and efficiency. To evaluate the effectiveness of the proposed method, we conduct extensive experiments on the publicly available dataset of Thailand scenes and compare the results with several state-of-the-art deep learning methods, using precision, recall, F1 score, and mean intersection over union (IoU) as evaluation metrics. MeViT achieves the best performance among the compared models on all of these metrics, with a precision of 92.22%, a recall of 94.69%, an F1 score of 93.44%, and a mean IoU of 83.63%. These results demonstrate the effectiveness of the proposed approach in accurately segmenting Thai Landsat-8 data.
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
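The multi-path depth-wise idea behind a mixed-scale convolutional feedforward network can be sketched in NumPy. This is a minimal illustration under stated assumptions, not MeViT's implementation: the expansion from 4 to 8 channels, the two kernel sizes (3 × 3 and 5 × 5), the GELU activation, and the function names (`depthwise_conv2d`, `pointwise_conv`, `mixed_scale_cfn`) are all hypothetical choices made here. A 1 × 1 convolution expands the channels, the expanded channels are split into groups that are each filtered by a depth-wise convolution at their own kernel size, and a second 1 × 1 convolution projects the result back.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Depth-wise 2-D convolution (cross-correlation) with zero 'same' padding.

    x: (C, H, W) feature map; kernels: (C, k, k), one odd-sized kernel per channel.
    Each output channel sees only its own input channel (no cross-channel mixing).
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H x W
    out = np.zeros((c, h, w))
    for i in range(k):
        for j in range(k):
            # Add the shifted window scaled by each channel's weight at (i, j)
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def pointwise_conv(x, weight):
    """1 x 1 convolution: weight (C_out, C_in) mixes channels at every pixel."""
    return np.einsum("oc,chw->ohw", weight, x)


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mixed_scale_cfn(x, w_in, dw_kernels, w_out):
    """Hypothetical mixed-scale conv FFN: expand, multi-path depth-wise, project."""
    hidden = pointwise_conv(x, w_in)  # (C_hidden, H, W)
    sizes = [k.shape[0] for k in dw_kernels]
    assert sum(sizes) == hidden.shape[0], "kernel groups must cover all hidden channels"
    groups = np.split(hidden, np.cumsum(sizes)[:-1], axis=0)
    # Each channel group is filtered at its own kernel size, then regrouped
    mixed = np.concatenate(
        [depthwise_conv2d(g, k) for g, k in zip(groups, dw_kernels)], axis=0
    )
    return pointwise_conv(gelu(mixed), w_out)  # back to (C, H, W)


# Usage: 4 input channels expanded to 8, split into a 3x3 path and a 5x5 path
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
w_in = rng.standard_normal((8, 4)) * 0.1
dw_kernels = [rng.standard_normal((4, 3, 3)), rng.standard_normal((4, 5, 5))]
w_out = rng.standard_normal((4, 8)) * 0.1
y = mixed_scale_cfn(x, w_in, dw_kernels, w_out)
print(y.shape)  # (4, 16, 16)
```

Because each depth-wise kernel touches only its own channel, a k × k path costs k² weights per channel instead of the k² × C_in weights per output channel of a full convolution, which is why several kernel sizes can run in parallel at modest cost.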

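For reference, the standard pixel-wise definitions of the reported metrics, in terms of true positives (TP), false positives (FP), and false negatives (FN), are given below; mean IoU averages the per-class IoU. The abstract does not state how scores are averaged across classes, but assuming the reported F1 is the harmonic mean of the reported precision P and recall R, the three figures are mutually consistent:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
\text{IoU} = \frac{TP}{TP + FP + FN}

F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9222 \times 0.9469}{0.9222 + 0.9469}
    = \frac{1.7465}{1.8691}
    \approx 0.9344
```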

16 pages, 9001 KB  
Article
MMNet: A Mixing Module Network for Polyp Segmentation
by Raman Ghimire and Sang-Woong Lee
Sensors 2023, 23(16), 7258; https://doi.org/10.3390/s23167258 - 18 Aug 2023
Cited by 1 | Viewed by 2532
Abstract
Traditional encoder–decoder networks like U-Net have been used extensively for polyp segmentation. However, such networks are limited in explicitly modeling long-range dependencies: because each convolutional kernel covers only a local subset of the pixels in the image, local patterns are emphasized over the global context. Several recent transformer-based networks have been shown to overcome this limitation. These networks encode long-range dependencies using self-attention and thus learn highly expressive representations. However, self-attention over the whole image is expensive to compute, because its cost grows quadratically with the number of pixels. Patch embedding, which groups small regions of the image into single input features, has therefore been used to reduce this cost. Nevertheless, these transformers still lack inductive bias, even when the image is treated as a 1D sequence of visual tokens; as a result, they generalize poorly to local contexts because low-level features are limited. We introduce a hybrid transformer combined with a convolutional mixing network to overcome both the computational-cost and long-range dependency issues. A pretrained transformer network serves as the feature-extracting encoder, and a mixing module network (MMNet) captures long-range dependencies at a reduced computational cost. Specifically, the mixing module network uses depth-wise convolution to establish spatial correlation and 1 × 1 convolution to establish cross-channel correlation, together modeling long-range dependencies. The proposed approach is evaluated qualitatively and quantitatively on five challenging polyp datasets across six metrics, and MMNet outperforms the previous best polyp segmentation methods. Full article
(This article belongs to the Special Issue Machine Learning and AI for Medical Data Analysis)
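To make the quadratic cost of self-attention concrete, the short sketch below counts the entries of the N × N attention matrix for an assumed 352 × 352 input (a size chosen only for illustration), first with one token per pixel and then with non-overlapping 4 × 4 and 16 × 16 patch embeddings. The helper `attention_entries` is hypothetical and ignores heads, channels, and overlapping patches.

```python
def attention_entries(height, width, patch=1):
    """Return (tokens, attention-matrix entries) for non-overlapping square patches.

    patch=1 means one token per pixel; the attention matrix has tokens**2 entries.
    """
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    n_tokens = (height // patch) * (width // patch)
    return n_tokens, n_tokens ** 2


for patch in (1, 4, 16):
    n, entries = attention_entries(352, 352, patch)
    print(f"patch {patch}: {n:,} tokens, {entries:,} attention entries")
# patch 1: 123,904 tokens, 15,352,201,216 attention entries
# patch 4: 7,744 tokens, 59,969,536 attention entries
# patch 16: 484 tokens, 234,256 attention entries
```

Grouping p × p pixels into one token divides the token count by p² and the attention cost by p⁴, at the price of discarding fine low-level detail inside each patch.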

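The division of labor between the two convolutions in the mixing module can be sketched in NumPy. This is a simplified ConvMixer-style block assumed for illustration, not MMNet's published design: ReLU stands in for whatever activation and normalization the authors use, the residual connection is an assumption, and `depthwise_conv2d`, `mixing_block`, and `receptive_field` are hypothetical names. The depth-wise convolution mixes spatial positions within each channel, the 1 × 1 convolution mixes channels at each position, and stacking blocks widens the receptive field without self-attention.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Per-channel 2-D convolution with zero 'same' padding.

    x: (C, H, W); kernels: (C, k, k) with k odd, one kernel per channel.
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c, h, w))
    for i in range(k):
        for j in range(k):
            # Each channel is weighted only by its own kernel entry at (i, j)
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def mixing_block(x, dw_kernels, pw_weight):
    """Hypothetical ConvMixer-style block.

    Spatial mixing: depth-wise conv within each channel (+ ReLU, + residual).
    Channel mixing: 1 x 1 conv with pw_weight of shape (C_out, C_in) at every pixel.
    """
    spatial = np.maximum(depthwise_conv2d(x, dw_kernels), 0.0) + x
    return np.maximum(np.einsum("oc,chw->ohw", pw_weight, spatial), 0.0)


def receptive_field(num_layers, kernel_size):
    """Receptive field (pixels per side) of stacked stride-1 k x k convolutions."""
    return 1 + num_layers * (kernel_size - 1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32, 32))
y = mixing_block(x, rng.standard_normal((8, 7, 7)) * 0.1, rng.standard_normal((16, 8)) * 0.1)
print(y.shape)                                       # (16, 32, 32)
print(receptive_field(num_layers=4, kernel_size=7))  # 25
```

Spatial information crosses channels only through the 1 × 1 step, and with 7 × 7 depth-wise kernels four stacked blocks already cover a 25 × 25 window.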