Search Results (257)

Search Parameters:
Keywords = MobileNetv2 backbone

24 pages, 5438 KB  
Article
An Improved DeepLabV3+-Based Method for Crop Row Segmentation and Navigation Line Extraction in Agricultural Fields
by Letian Wu, Yongzhi Cui, Huifeng Shi, Xiaoli Sun, Jiayan Yang, Xinwei Cao, Ping Zou and Ya Liu
Sensors 2026, 26(10), 3142; https://doi.org/10.3390/s26103142 - 15 May 2026
Viewed by 301
Abstract
Accurate crop row detection is a critical prerequisite for autonomous agricultural navigation, yet it remains challenging in complex field environments. To achieve a balance between segmentation accuracy, robustness, and real-time performance, an improved crop row segmentation and navigation method based on the DeepLabV3+ framework was developed. MobileNetV2 was adopted as the backbone to minimize computational costs, while feature representation was enhanced through integrated attention mechanisms and multi-scale fusion. Specifically, split-attention convolution was integrated into the backbone, a DenseASPP + SP module was employed for multi-scale contextual capture, and a Convolutional Block Attention Module (CBAM) was added to refine feature responses. Experimental results demonstrated that the proposed method outperformed mainstream models, achieving a mean Intersection over Union (mIoU) of 93.42% and an F1-score of 96.8%. The model maintained a lightweight architecture with 8.35 M parameters and a real-time speed of 32 FPS. Furthermore, crop row anchor points were extracted and processed via DBSCAN clustering and RANSAC fitting to generate high-precision navigation lines. Validation showed that the middle crop row yielded the highest fitting accuracy with minimal angular and lateral errors. This study provides an efficient visual perception solution for intelligent field operations.
(This article belongs to the Section Smart Agriculture)
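
The navigation-line step described in this abstract (anchor points fitted with RANSAC) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ransac_line`, the tolerance, the iteration count, and the sample points are all assumptions made for illustration, and the DBSCAN clustering stage is omitted.

```python
import math
import random


def ransac_line(points, n_iters=200, tol=0.5, seed=0):
    """Fit a line a*x + b*y + c = 0 to 2D points, ignoring outliers.

    Returns (line, inliers), where line is the normalised (a, b, c)
    of the candidate with the most points within distance `tol`.
    """
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iters):
        # Hypothesise a line through two randomly chosen points.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2  # normal vector of the candidate line
        norm = math.hypot(a, b)
        if norm == 0:  # identical points define no line
            continue
        c = -(a * x1 + b * y1)
        # Keep points whose perpendicular distance is within tolerance.
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) / norm <= tol]
        if len(inliers) > len(best_inliers):
            best_line = (a / norm, b / norm, c / norm)
            best_inliers = inliers
    return best_line, best_inliers


# Hypothetical anchor points: ten on y = 2x + 1, plus three outliers.
anchors = [(x, 2 * x + 1) for x in range(10)] + [(0, 10), (5, -7), (9, 3)]
line, inliers = ransac_line(anchors)
```

In a real pipeline the inlier set would typically be refit with least squares, and one line would be fitted per crop-row cluster.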

32 pages, 8414 KB  
Article
TVLightFormer: A Lightweight Cross-Modal Transformer for Language-Guided Target Localization in SAR Imagery
by Yuqiao Zhong, Haoqi Quan, Chenyu Nie, Yingmei Wei and Yanming Guo
Remote Sens. 2026, 18(9), 1430; https://doi.org/10.3390/rs18091430 - 4 May 2026
Viewed by 224
Abstract
We study language-guided target localization in synthetic aperture radar (SAR) imagery for deployment on resource-constrained platforms. Existing vision-language models either rely on heavy backbones unsuitable for edge devices or are designed for natural images, overlooking SAR-specific characteristics such as speckle noise, weak scattering responses, and geometric distortions. The proposed model, TVLightFormer, combines a lightweight dual-modal encoder (MobileNetV3 and TinyBERT) with a grouped-query attention (GQA) mechanism for efficient cross-modal interaction and an activation-free lightweight feature pyramid network (LFPN) to handle scale variation while preserving weak scattering signals. The individual modules are not claimed as newly invented components; the main contribution lies in their SAR-aware integration for edge-oriented cross-modal localization. We evaluate the model on five remote sensing datasets (SOMA-1M, ATRNet-STAR, GAIA, MLRSNet, and SODAS) under a unified localization setting, and we explicitly discuss the limitations introduced by weak or scene-level annotations. The results show that TVLightFormer achieves a favorable trade-off between accuracy and efficiency, reaching an average mIoU of 69.8% with 27.4 M parameters and 9.7 GFLOPs. Ablation studies quantify the contribution of each component. The model is suited for edge-oriented scenarios where computational resources are limited. We also provide a critical analysis of failure cases, SAR-specific disturbance factors, loss-function choices, and dataset-protocol sensitivity.
(This article belongs to the Special Issue Radar and Photo-Electronic Multi-Modal Intelligent Fusion)

19 pages, 10671 KB  
Article
A Vehicle Type Recognition Network Based on Feature Comparison and Mixture of Experts Model
by Taotao Hu, Xiufeng Zhao and Luxia Yang
Vehicles 2026, 8(5), 101; https://doi.org/10.3390/vehicles8050101 - 3 May 2026
Viewed by 293
Abstract
To address the challenges of insufficient feature fusion and incomplete multi-scale information capture in complex traffic scenarios, we propose a vehicle type recognition network based on feature comparison and the Mixture of Experts (MoE) model. Specifically, the MobileNetV4 backbone is introduced to enhance deep feature extraction for vehicle targets. Meanwhile, we design a Multi-scale Interleaving Fusion Module (MSIFM), which progressively transmits feature channels via an interleaving structure to capture multi-scale features while enhancing vehicle feature representation. Moreover, we devise a Feature Compare Enhancement Module (FCEM) to efficiently fuse feature maps with different semantic information. By performing feature comparison, it strengthens strongly correlated features while suppressing weakly correlated ones. Finally, we design a Mixture of Experts Feature Enhancement Module (MOEFEM) to aggregate multi-scale feature maps and adaptively capture detailed vehicle features through multiple expert units. Experimental results demonstrate that our method achieves mAP improvements of 2.2% and 2.4% over YOLOv11 on UA-DETRAC and BDD100K, respectively. The proposed method not only improves detection accuracy significantly but also maintains real-time efficiency, providing a practical solution for high-precision vehicle type recognition. It offers valuable technical support for intelligent transportation systems, smart city management, and autonomous driving safety.
(This article belongs to the Section Vehicle Dynamics and Control)

14 pages, 1377 KB  
Article
Multi-Centre Liver Tumour Classification via Federated Learning: Investigating Data Heterogeneity, Transfer Learning, and Model Efficiency
by Degang Zhu, Shiqi Wei and Xinming Zhang
Computers 2026, 15(5), 286; https://doi.org/10.3390/computers15050286 - 1 May 2026
Viewed by 337
Abstract
This paper investigates federated multi-centre liver tumour classification from contrast-enhanced CT under realistic data heterogeneity and domain shift. To address the practical constraint that medical data are often siloed across institutions, we develop a FedProx-based federated learning pipeline that enables collaborative training without exchanging raw patient data. Using the LiTS dataset as the training domain, we construct a slice-level binary classification task based on voxel-level annotations, while rigorously assessing out-of-distribution generalisation on an external held-out dataset, 3D-IRCADb. We conduct comprehensive experiments across multiple backbone architectures, including ResNet-50, EfficientNet-B3, ViT-B/16, and MobileNetV3-Small, comparing FedProx and FedAvg under three heterogeneity intensities (IID, mild non-IID, and severe non-IID). Furthermore, we evaluate transfer learning strategies, ranging from frozen backbones to partial fine-tuning of the last stage, and perform ablations on the proximal coefficient μ and local epochs E to characterise optimisation behaviour. Our results show that FedProx is generally comparable to FedAvg, with slightly more stable behaviour in some heterogeneous settings. We also observe a clear validation-to-external gap, indicating that external-domain robustness remains challenging and requires cautious interpretation for deployment. ImageNet pretraining yields consistent gains, particularly for data-sparse clients, while partial fine-tuning enhances adaptation to CT-specific features. Finally, MobileNetV3-Small offers a favourable performance–efficiency trade-off by reducing communication payload and computation cost, supporting practical deployment on resource-constrained clinical edge devices.
(This article belongs to the Special Issue Machine and Deep Learning in the Health Domain (3rd Edition))
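
The role of the proximal coefficient μ mentioned in this abstract can be shown with a small sketch of a FedProx-style local update, in which each client minimises its local loss plus (μ/2)·‖w − w_global‖². The toy quadratic loss, the helper names, the learning rate, and the step count below are assumptions for illustration only and say nothing about the paper's actual training setup.

```python
def fedprox_local_update(w_global, grad_fn, mu=0.5, lr=0.1, steps=500):
    """Run gradient steps on: local_loss(w) + (mu / 2) * ||w - w_global||^2.

    With mu = 0 this reduces to the plain local SGD used by FedAvg.
    """
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)
        # The proximal term adds mu * (w - w_global) to the local gradient,
        # pulling the client model back toward the global model.
        w = [wi - lr * (gi + mu * (wi - wgi))
             for wi, gi, wgi in zip(w, g, w_global)]
    return w


# Hypothetical client whose local loss is 0.5 * ||w - target||^2.
target = [2.0, -1.0]


def local_grad(w):
    return [wi - ti for wi, ti in zip(w, target)]


w_global = [0.0, 0.0]
w_local = fedprox_local_update(w_global, local_grad)
# For this toy loss the minimiser is (target + mu * w_global) / (1 + mu),
# so the client stops short of its own optimum at `target`.
```

Larger μ keeps heterogeneous clients closer to the shared model at the cost of slower local adaptation, which is the trade-off the ablation on μ would probe.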

25 pages, 8972 KB  
Article
Deep-MiSR: Multi-Scale Convolution and Attention-Enhanced DeepLabV3+ for Brain Tumor Segmentation in MRI
by Md Parvej Mosharaf, Jie Su and Jing Zhang
Appl. Sci. 2026, 16(8), 3900; https://doi.org/10.3390/app16083900 - 17 Apr 2026
Viewed by 273
Abstract
Accurate brain tumor segmentation in magnetic resonance imaging (MRI) is essential for diagnosis, treatment planning, and therapy monitoring. Conventional deep learning models often struggle with large variations in tumor shape, size, and contrast, as well as severe foreground–background imbalance. To address these challenges, this study presents Deep-MiSR, an enhanced encoder–decoder framework built upon DeepLabV3+ with a MobileNetV2 backbone, tailored for single-modality contrast-enhanced T1-weighted (T1CE) MRI segmentation. Three complementary components are integrated into the architecture: mixed depthwise convolution (MixConv) with heterogeneous kernels within the atrous spatial pyramid pooling module for multi-scale feature aggregation, a squeeze-and-excitation block for adaptive channel recalibration, and R-Drop regularization that enforces prediction consistency via symmetric Kullback–Leibler divergence. The model was evaluated on 3064 T1CE slices from 233 patients drawn from the publicly available Nanfang Hospital brain MRI dataset. Deep-MiSR achieved a Dice similarity coefficient of 0.9281, a mean intersection-over-union of 0.8738, a precision of 0.8839, and a 95th-percentile Hausdorff distance of 7.69 mm, demonstrating consistent improvements over both the DeepLabV3+ baseline and all prior methods evaluated on the same data. Ablation studies confirmed that each component contributes independently, with R-Drop providing the largest individual gain. These findings demonstrate that combining multi-scale convolution, channel attention, and consistency regularization constitutes an effective and computationally practical strategy for robust single-modality brain tumor segmentation.
(This article belongs to the Special Issue Advances in Deep Learning-Based Medical Image Analysis: 2nd Edition)
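
The consistency term named in this abstract, a symmetric Kullback–Leibler divergence between two predictions for the same input, can be sketched as follows. The function names and the two example distributions are hypothetical, and the 0.5 averaging factor is one common convention rather than something the abstract states; the sketch also assumes both distributions are strictly positive wherever the other is.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); zero only when p equals q."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical per-pixel class probabilities from two dropout passes.
pass_a = [0.7, 0.2, 0.1]
pass_b = [0.6, 0.3, 0.1]
consistency_penalty = symmetric_kl(pass_a, pass_b)
```

During training, a penalty of this kind would be added to the usual segmentation loss so that two stochastic forward passes are pushed toward the same output.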

18 pages, 4176 KB  
Article
An Attention-Enhanced Network for Visual Attitude Estimation
by Lu Liu, Jiahao Duan, Yaoyang Shen, Shihan Wang, Jiale Mao, Wei Liu, Yuyan Guo, Lan Wu, Ming Kong and Hang Yu
Algorithms 2026, 19(4), 309; https://doi.org/10.3390/a19040309 - 15 Apr 2026
Viewed by 255
Abstract
Accurate estimation of object attitude is essential for understanding motion behavior and achieving dynamic tracking. Existing image-based methods often suffer from low efficiency and limited accuracy, while the potential of deep learning has not been fully exploited in this field. To address these limitations, a lightweight deep learning method for attitude estimation is proposed and validated on spherical particles. A synthetic dataset is generated through VTK-based rendering and automatic annotation, providing large-scale training samples with known Euler angles. An improved MobileNetV1 backbone is developed by integrating Squeeze-and-Excitation blocks, a dual-scale Pyramid Pooling Module, global average pooling, and a regression-oriented multilayer perceptron, which enhances feature extraction and enables direct Euler angle prediction. Experimental results show that the proposed method achieves an average error of 0.308° on synthetic test images. Furthermore, a solid particle was fabricated through 3D printing and physical measurements were conducted, where the network combined with image preprocessing and augmentation achieved an average error of about 0.5° on real images, demonstrating a lightweight and deployment-friendly framework for practical attitude estimation. The results verify the effectiveness of the method and demonstrate its potential for accurate and computationally efficient attitude measurement in applications such as fluid dynamics, industrial inspection, and motion tracking.

15 pages, 2117 KB  
Article
TI-YOLO: A Lightweight and Efficient Anatomical Structure Detection Model for Tracheal Intubation
by Yu Tian, Congliang Yang, Lingfeng Sang, Cicao Ping, Lili Feng, Weixiong Chen, Hongbo Wang, Wenxian Li and Yuan Han
Bioengineering 2026, 13(4), 451; https://doi.org/10.3390/bioengineering13040451 - 13 Apr 2026
Viewed by 523
Abstract
Accurate and rapid detection of anatomical structures, such as the glottis, is critical during tracheal intubation (TI) to ensure patient safety and procedural success. However, it remains a challenge due to the limited field of view and computational resources of video laryngoscopy, especially for difficult airway situations. Existing deep learning (DL) models struggle to balance high accuracy and real-time clinical deployment. To address these issues, we propose TI-YOLO (TI-You Only Look Once), a lightweight and efficient object detection model built upon the YOLOv11 architecture. TI-YOLO introduces the Bidirectional Feature Pyramid Network (BiFPN) module for multi-scale feature fusion, effectively enhancing the ability to detect anatomical structures of different sizes. TI-YOLO integrates the Deformable Attention Transformer (DAT) module to enhance the perception of crucial regions, improving detection accuracy and robustness. To further reduce the consumption of computational resources while maintaining efficiency, TI-YOLO is optimized by reconstructing the backbone based on MobileNetV4. Furthermore, TI-YOLO employs the Slide Weight Function (SWF) as a loss function during model training to mitigate the class imbalance within the dataset. A self-built dataset is used to validate the effectiveness of TI-YOLO. Compared to the original YOLOv11, TI-YOLO achieves a mean Average Precision at IoU 0.50 (mAP50) of 0.902, an improvement of 3.8%. Meanwhile, TI-YOLO balances detection accuracy and computational efficiency with a 10.5% reduction in floating-point operations (FLOPs) and a 28.9% reduction in parameters, and the model weight is only 4.6 MB. Additionally, to evaluate TI-YOLO's real-time inference capability, we quantize and deploy it on a low-cost embedded OrangePi 5 platform. The inference speed reaches over 50 frames per second (FPS), meeting real-time clinical requirements.

22 pages, 2255 KB  
Article
Distributed Stochastic Multi-GPU Hyperparameter Optimization for Transfer Learning-Based Vehicle Detection Under Degraded Visual Conditions
by Zhi-Ren Tsai and Jeffrey J. P. Tsai
Algorithms 2026, 19(4), 296; https://doi.org/10.3390/a19040296 - 10 Apr 2026
Viewed by 331
Abstract
Robust vehicle detection in real-world traffic surveillance remains challenging due to degraded imagery caused by motion blur, adverse weather, and low illumination, which significantly increases detector sensitivity to hyperparameter configurations. This study proposes a “Frugal AI” distributed multi-GPU framework that optimizes hyperparameters via a stochastic simplex-based search coupled with five-fold cross-validation. Utilizing three low-cost NVIDIA GTX 1050 Ti GPUs, the framework performs parallel candidate exploration with an asynchronous model-level exchange mechanism to escape local optima without the overhead of gradient synchronization. Seven CNN backbones (VGG16, VGG19, GoogLeNet, MobileNetV2, ResNet18, ResNet50, and ResNet101) were evaluated within YOLOv2 and Faster R-CNN detectors. To address memory constraints (4 GB VRAM), YOLOv2 was selected for extensive benchmarking. Performance was measured using a harmonic precision–recall-based cost metric to strictly penalize imbalanced outcomes. Experimental results demonstrate that under identical wall-clock time budgets, the proposed framework achieves an average 1.38% reduction in aggregated cost across all models, with the highly sensitive VGG19 backbone showing a 4.00% improvement. Benchmarking against Bayesian optimization, genetic algorithms, and random search confirms that our method achieves superior optimization quality with statistical significance (p < 0.05). Under a rigorous IoU = 0.75 threshold, the optimized models consistently yielded F1-scores of 0.8444 ± 0.0346. Ablation studies further validate that the collaborative model exchange is essential for accelerating convergence in rugged loss landscapes. This research offers a practical, scalable, and cost-efficient solution for deploying robust AI surveillance in resource-constrained smart city infrastructure.
(This article belongs to the Special Issue Advances in Deep Learning-Based Data Analysis)

21 pages, 5426 KB  
Article
Deep Learning-Based Recognition and Classification of Jin Cang Embroidery Stitches
by Ke-Ke Sun, Lu-Fei Yang, Zi-Ning Lan and Lu Gao
Mathematics 2026, 14(8), 1259; https://doi.org/10.3390/math14081259 - 10 Apr 2026
Viewed by 445
Abstract
Jin Cang embroidery, characterized by elaborate metallic threadwork and intricate textural patterns, is an important form of intangible cultural heritage. The digital preservation of Jin Cang embroidery is hindered by the scarcity of specialized datasets and the lack of object detection models that balance high performance with computational efficiency for edge deployment. To address these challenges, a dedicated dataset comprising 3050 images across eight core stitch categories is introduced as the first dataset of its kind for Jin Cang embroidery. Building upon this foundation, Lite-YOLOv11s, a domain-specific lightweight detection framework, is proposed with MobileNetV4 as its backbone to improve the extraction of high-frequency texture cues associated with metallic threadwork. Experimental results show that Lite-YOLOv11s achieves an mAP@0.5 of 0.951, outperforming the YOLOv11s baseline (0.927) while reducing model parameters by 40% and FLOPs by 46%. EigenCAM visualizations further show that the model can localize discriminative stitch-level features even under complex backgrounds. This work provides an efficient and deployable solution for intelligent embroidery recognition and offers a useful reference for the digital preservation of other fine-grained cultural heritage crafts.

36 pages, 15892 KB  
Article
UAV Real-Time Image Recognition Using Lightweight YOLOv11
by Xin-Yu Zhang and Jih-Gau Juang
Appl. Sci. 2026, 16(7), 3468; https://doi.org/10.3390/app16073468 - 2 Apr 2026
Viewed by 502
Abstract
Unmanned aerial vehicles (UAVs) for environmental monitoring typically rely on embedded platforms with limited computational capacity, which constrains the deployment of highly accurate yet computationally demanding object-detection models. To address this challenge and enable real-time image recognition under resource limitations, this study develops three lightweight neural network architectures based on the YOLOv11 framework. The proposed designs aim to significantly reduce computational complexity and parameter count while maintaining stable and reliable detection performance, thereby improving inference efficiency and deployment flexibility on UAV platforms. YOLOv11-M is selected as the baseline model due to its favorable trade-off between detection accuracy and inference speed. Three lightweight strategies are then proposed and evaluated. First, a Ghost Convolution approach replaces portions of standard convolution with low-cost linear operations, effectively reducing both parameter size and computational overhead during feature extraction. Second, MobileNetV4 is employed as the backbone network; its optimized bottleneck structures and attention mechanisms enable substantial model compression without compromising recognition performance. Third, a MobileOne architecture with reparameterization is introduced, in which multi-branch structures enhance feature learning during training and are subsequently merged into a single-path network for inference, thereby significantly reducing computational cost and improving practical deployability.
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

31 pages, 12308 KB  
Article
An Improved MSEM-Deeplabv3+ Method for Intelligent Detection of Rock Mass Fractures
by Chi Zhang, Shu Gan, Xiping Yuan, Weidong Luo, Chong Ma and Yi Li
Remote Sens. 2026, 18(7), 1041; https://doi.org/10.3390/rs18071041 - 30 Mar 2026
Viewed by 433
Abstract
Fractures, as critical discontinuous structural planes in rock masses, directly govern their stability and serve as the core controlling factor in rock mechanics engineering. Existing deep learning models for fracture extraction face persistent challenges, including imbalanced integration of deep and shallow features, limited suppression of background noise, inadequate multi-scale feature representation, and large parameter sizes, which make it difficult to strike a balance between detection accuracy and deployment efficiency. Focusing on the Wanshanshan quarry in Yunnan, this study first constructs a high-precision digital model using close-range photogrammetry and 3D real-scene reconstruction. A lightweight yet high-accuracy intelligent detection method, termed MSEM-Deeplabv3+, is then proposed for rock mass fracture extraction. The model adopts lightweight MobileNetV2 as the backbone network, incorporating inverted residual modules and depthwise separable convolutions, resulting in a parameter size of only 6.02 MB and FLOPs of 30.170 G, substantially reducing computational overhead. Furthermore, the proposed MAGF (Multi-Scale Attention Gated Fusion) and SCSA (Spatial-Channel Synergistic Attention) modules are integrated to enhance the representation of fracture details and semantic consistency while effectively suppressing multi-source and multi-scale background interference. Experimental results demonstrate that the proposed model achieves an mPA of 89.69%, mIoU of 83.71%, F1-Score of 90.41%, and Kappa coefficient of 80.81%, outperforming the classic Deeplabv3+ model by 5.81%, 6.18%, 4.53%, and 9.2%, respectively. It also significantly surpasses benchmark models such as U-Net and HRNet. The method accurately captures fine and continuous fracture details, preserves the spatial distribution of long-range continuous fractures, and maintains robust performance on the CFD cross-scene dataset, showcasing strong adaptability and generalization capability. This approach effectively mitigates the risks associated with manual high-altitude inspections and provides a lightweight, high-precision, non-contact intelligent solution for fracture detection in high-steep rock slopes.
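
Why depthwise separable convolutions, as used in MobileNetV2-style backbones like the one in this abstract, cut the parameter count can be seen from a short calculation. The layer sizes below are hypothetical and biases are ignored; the sketch only compares weight counts and is not tied to the paper's architecture.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
reduction = standard / separable
```

For this layer the standard convolution needs 18,432 weights against 2,336 for the separable version, roughly an eightfold reduction, which is the kind of saving that makes such backbones attractive for deployment.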

25 pages, 260979 KB  
Article
RDAH-Net: Bridging Relative Depth and Absolute Height for Monocular Height Estimation in Remote Sensing
by Liting Jiang, Feng Wang, Niangang Jiao, Jingxing Zhu, Yuming Xiang and Hongjian You
Remote Sens. 2026, 18(7), 1024; https://doi.org/10.3390/rs18071024 - 29 Mar 2026
Viewed by 531
Abstract
Generating high-precision normalized digital surface models (nDSMs) from a single remote sensing image remains a challenging and ill-posed problem due to the absence of reliable geometric constraints. In this work, we show that monocular depth provides structurally stable cues of local geometry but lacks the global scale and vertical reference required for absolute height recovery. This intrinsic mismatch limits direct depth-to-height regression, particularly when transferring across heterogeneous terrains, land-cover compositions, and imaging conditions. Building on this idea, we propose the Relative Depth–Absolute Height Prediction Network (RDAH-Net), a framework that exploits relative depth as a geometry-aware prior while learning terrain-dependent height mappings from image appearance to absolute height. As the backbone, we employ a lightweight MobileNetV2 enhanced with a Convolutional Block Attention Module (CBAM), and further incorporate a cross-modal bidirectional attention fusion scheme with positional encoding to achieve a deep and effective fusion of image appearance and depth prior cues. Finally, a PixelShuffle-based upsampling strategy is used to sharpen prediction details and mitigate typical upsampling artifacts. Extensive experiments across diverse regions demonstrate that RDAH-Net achieves robust and generalizable height estimation, providing a practical alternative for large-scale mapping and rapid update scenarios.

22 pages, 4435 KB  
Article
Semantic Mapping in Public Indoor Environments Using Improved Instance Segmentation and Continuous-Frame Dynamic Constraint
by Yumin Lu, Xueyu Feng, Zonghuan Guo, Jianchao Wang, Lin Zhou and Yingcheng Lin
Electronics 2026, 15(7), 1392; https://doi.org/10.3390/electronics15071392 - 26 Mar 2026
Viewed by 546
Abstract
Reliable semantic perception is crucial for service robots operating in complex public indoor environments. However, existing semantic mapping approaches often face the dual challenges of high computational overhead and semantic redundancy in maps. To address these limitations, this paper proposes a low-resource semantic mapping framework based on improved instance segmentation and dynamic constraints from consecutive frames. First, we design the lightweight model MS-YOLO, which adopts MobileNetV4 as its backbone network and incorporates the SHViT neck module, effectively optimizing the balance between detection accuracy and computational cost. Second, we propose a consecutive frame dynamic constraint method that eliminates redundant object annotations through consecutive frame stability verification. Experimental results relating to both fusion and custom datasets demonstrate that compared to YOLOv8n-seg, MS-YOLO achieves improvements in accuracy, recall, and mAP@0.5, while reducing the number of parameters by 11.7% and floating-point operations (FLOPs) by 32.2%. Furthermore, compared to YOLOv11n-seg and YOLOv5n-seg, its FLOPs are reduced by 17.2% and 25.5%, respectively. Finally, the successful deployment and field validation of this system on the Jetson Orin NX platform demonstrate its real-time capability and engineering practicality for edge computing in public indoor service robots.
(This article belongs to the Section Artificial Intelligence)

38 pages, 150385 KB  
Article
ERD-YOLO-DMS: A Multi-Domain Fusion Framework for High-Speed Real-Time Online Plywood Veneer Detection
by Hongxu Li, Zhihong Liang, Mingming Qin, Shihuan Xie, Yuxiang Huang, Xinyu Tong and Linghao Dai
Forests 2026, 17(4), 404; https://doi.org/10.3390/f17040404 - 24 Mar 2026
Viewed by 315
Abstract
Plywood has emerged as a key sustainable material in modern building. Yet, ensuring its consistent performance requires rigorous quality control of the rotary-cut veneers used in its manufacture. This task is complicated by the high-speed nature of industrial conveyors, where motion blur and the complex, varying textures of eucalyptus wood drastically reduce the effectiveness of real-time surface inspection. This study proposes an intelligent, real-time defect detection system specifically optimized for the diverse defect morphology of eucalyptus veneers. A lightweight model, YOLOv11-DMS-Veneers, was developed by integrating MobileNetV4 as the backbone, a Dynamic Head for multi-scale feature extraction, and a Shape-IoU loss function to precisely localize irregular defects like cracks and knots. Additionally, an ERD video enhancement framework (combining ESRGAN, RIFE, and DnCNN) was implemented to mitigate motion blur in dynamic environments. Experimental results demonstrate that the proposed model achieves a mean Average Precision (mAP@50) of 96.0% and a Precision of 95.7% with a low computational cost of only 4.5 GFLOPs, significantly outperforming traditional algorithms. Notably, the detection precision for challenging linear cracks reached 93.9%. In dynamic tests at conveyor speeds up to 24 m/min, the video enhancement strategy increased the average detection confidence by 0.288, maintaining a maximum confidence of 0.890. This technology offers a robust solution for the automated quality control of eucalyptus veneers, facilitating the production of high-performance plywood and advancing the efficient application of engineered wood in the building industry.

27 pages, 5519 KB  
Article
An Approach to Crayfish Weight Estimation Based on Pose Awareness
by Xuhui Ye, Mingyang He, Jun Wang, Lilu Huang, Jing Xu, Rihui Zhang and Bo Li
Appl. Sci. 2026, 16(6), 3019; https://doi.org/10.3390/app16063019 - 20 Mar 2026
Viewed by 350
Abstract
To address the challenges of low accuracy and poor robustness in industrial crayfish weight estimation caused by variable postures, this paper proposes a lightweight method that integrates pose awareness. First, a multi-task perception model, Crayfish-YOLO, is developed based on the YOLOv8s-Seg framework. By reconstructing the backbone with MobileNetV3 and integrating Coordinate Attention (CA), CARAFE upsampling, and the Wise Intersection over Union (Wise-IoU) loss function, the model is significantly compressed while enhancing its ability to output high-fidelity pixel-level masks and pose categories. Second, a pose-adaptive weight estimation strategy is proposed, which leverages perceived pose information to dynamically invoke the optimal regression model from a pre-constructed heterogeneous model library. Using seven core geometric features extracted from the segmentation masks, the system achieves precise weight estimation. Experimental results on a self-built dataset show that Crayfish-YOLO reduces parameters by 75.2% compared to YOLOv8s-Seg, while core segmentation accuracy (mAP50-95 (Seg)) improves by 1.1%. The integrated end-to-end system achieves a Mean Absolute Error (MAE) of 2.1 g and a mean coefficient of determination (R²) of 0.92, significantly outperforming comparative algorithms. This research provides an efficient visual perception and estimation solution for the automated grading of crayfish and similar non-rigid aquatic products.
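
The two regression metrics reported in this abstract, Mean Absolute Error and the coefficient of determination, can be computed as in the sketch below. The weights are made-up values in grams chosen for easy arithmetic and are unrelated to the paper's dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and estimated crayfish weights, in grams.
measured = [20.0, 30.0, 40.0, 50.0]
estimated = [22.0, 28.0, 41.0, 49.0]
mae = mean_absolute_error(measured, estimated)  # 1.5 g
r2 = r_squared(measured, estimated)             # 0.98
```

MAE is in the same unit as the target (grams here), while R² is unitless and measures how much of the weight variance the estimator explains.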