
Search Results (966)

Search Parameters:
Keywords = fusion application environment

23 pages, 6446 KB  
Article
Lightweight GAFNet Model for Robust Rice Pest Detection in Complex Agricultural Environments
by Yang Zhou, Wanqiang Huang, Benjing Liu, Tianhua Chen, Jing Wang, Qiqi Zhang and Tianfu Yang
AgriEngineering 2026, 8(1), 26; https://doi.org/10.3390/agriengineering8010026 - 10 Jan 2026
Abstract
To address challenges such as small target size, high density, severe occlusion, complex background interference, and edge device computational constraints, a lightweight model, GAFNet, is proposed; built on YOLO11n, it is optimized for rice pest detection in field environments. To improve feature perception, we propose the Global Attention Fusion and Spatial Pyramid Pooling (GAM-SPP) module, which captures global context and aggregates multi-scale features. Building on this, we introduce the C3-Efficient Feature Selection Attention (C3-EFSA) module, which refines feature representation by combining depthwise separable convolutions (DWConv) with lightweight channel attention to enhance background discrimination. The model’s detection head, Enhanced Ghost Detect (EGDetect), integrates Enhanced Ghost Convolution (EGConv), Squeeze-and-Excitation (SE), and Sigmoid-Weighted Linear Unit (SiLU) activation, which reduces redundancy. Additionally, we propose the Focal-Enhanced Complete-IoU (FECIoU) loss function, incorporating stability and hard-sample weighting for improved localization. Compared to YOLO11n, GAFNet improves Precision, Recall, and mean Average Precision (mAP) by 3.5%, 4.2%, and 1.6%, respectively, while reducing parameters and computation by 5% and 21%. GAFNet can be deployed on edge devices, providing farmers with instant pest alerts. Furthermore, GAFNet is evaluated on the AgroPest-12 dataset, demonstrating enhanced generalization and robustness across diverse pest detection scenarios. Overall, GAFNet provides an efficient, reliable, and sustainable solution for early pest detection, precision pesticide application, and eco-friendly pest control, advancing smart agriculture.
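The EGConv-plus-SE-plus-SiLU combination in EGDetect maps naturally onto a ghost-convolution layout, where half the output channels come from a cheap depthwise operation. Below is a minimal PyTorch sketch of such a block, assuming a 50/50 primary/cheap channel split and an SE reduction ratio of 16; the paper's actual EGConv design may differ.

```python
import torch
import torch.nn as nn

class SE(nn.Module):
    """Squeeze-and-Excitation channel attention (reduction ratio assumed)."""
    def __init__(self, c, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c // r, 1), nn.SiLU(),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.fc(x)

class EGConvSketch(nn.Module):
    """Ghost-style conv: half the channels from a dense conv, half from a
    cheap depthwise conv, then SE reweighting. Hypothetical layout."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 3, padding=1, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.se = SE(c_out)
    def forward(self, x):
        y = self.primary(x)
        return self.se(torch.cat([y, self.cheap(y)], dim=1))

x = torch.randn(1, 64, 80, 80)
print(EGConvSketch(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```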

25 pages, 30724 KB  
Article
Prediction of Optimal Harvest Timing for Melons Through Integration of RGB Images and Greenhouse Environmental Data: A Practical Approach Including Marker Effect Analysis
by Kwangho Yang, Sooho Jung, Jieun Lee, Uhyeok Jung and Meonghun Lee
Agriculture 2026, 16(2), 169; https://doi.org/10.3390/agriculture16020169 - 9 Jan 2026
Abstract
Non-destructive prediction of harvest timing is increasingly important in greenhouse melon cultivation, yet image-based methods alone often fail to reflect environmental factors affecting fruit development. Likewise, environmental or fertigation data alone cannot capture fruit-level variation. This gap calls for a multimodal approach integrating both sources of information. This study presents a fusion model combining RGB images with environmental and fertigation data to predict optimal harvest timing for melons. A YOLOv8n-based model detected fruits and estimated diameters under marker and no-marker conditions, while an LSTM processed time-series variables including temperature, humidity, CO₂, light intensity, irrigation, and electrical conductivity. The extracted features were fused through a late-fusion strategy, followed by an MLP for predicting diameter, biomass, and harvest date. The marker condition improved detection accuracy; however, the no-marker condition also achieved sufficiently high performance for field application. Diameter and weight showed a strong correlation (R² > 0.9), and the fusion model accurately predicted the actual harvest date of August 28, 2025. These results demonstrate the practicality of multimodal fusion for reliable, non-destructive harvest prediction and highlight its potential to bridge the gap between controlled experiments and real-world smart farming environments.
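As a concrete picture of the late-fusion strategy described above, here is a minimal PyTorch sketch: a pooled image-feature vector is concatenated with the final LSTM state over the six environmental series, and an MLP regresses diameter, biomass, and days to harvest. All dimensions and the 48-step window are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class LateFusionSketch(nn.Module):
    """Late fusion of an image feature vector (e.g. pooled detector
    features) with LSTM-encoded greenhouse series; dims are assumptions."""
    def __init__(self, img_dim=256, n_env=6, hidden=64):
        super().__init__()
        # six series: temperature, humidity, CO2, light, irrigation, EC
        self.lstm = nn.LSTM(n_env, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(img_dim + hidden, 128), nn.ReLU(),
            nn.Linear(128, 3))  # diameter, biomass, days-to-harvest
    def forward(self, img_feat, env_seq):
        _, (h, _) = self.lstm(env_seq)           # h: (1, B, hidden)
        return self.head(torch.cat([img_feat, h[-1]], dim=1))

model = LateFusionSketch()
out = model(torch.randn(4, 256), torch.randn(4, 48, 6))  # 48 time steps
print(out.shape)  # torch.Size([4, 3])
```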
20 pages, 4726 KB  
Article
Enhancing SeeGround with Relational Depth Text for 3D Visual Grounding
by Hyun-Sik Jeon, Seong-Hui Kang and Jong-Eun Ha
Appl. Sci. 2026, 16(2), 652; https://doi.org/10.3390/app16020652 - 8 Jan 2026
Abstract
Three-dimensional visual grounding is a core technology that identifies specific objects within complex 3D scenes based on natural language instructions, enhancing human–machine interactions in robotics and augmented reality domains. Traditional approaches have focused on supervised learning, which relies on annotated data; however, zero-shot methodologies are emerging due to the high costs of data construction and limitations in generalization. SeeGround achieves state-of-the-art performance by integrating 2D rendered images and spatial text descriptions. Nevertheless, SeeGround exhibits vulnerabilities in clearly discerning relative depth relationships owing to its implicit depth representations in 2D views. This study proposes the relational depth text (RDT) technique to overcome these limitations, utilizing a Monocular Depth Estimation model to extract depth maps from rendered 2D images and applying the K-Nearest Neighbors algorithm to convert inter-object relative depth relations into natural language descriptions, thereby incorporating them into Vision–Language Model (VLM) prompts. This method distinguishes itself by augmenting spatial reasoning capabilities while preserving SeeGround’s existing pipeline, demonstrating a 3.54% improvement in the Acc@0.25 metric on the Nr3D dataset in a 7B VLM environment that is approximately 10.3 times lighter than the original model, along with a 6.74% increase in Unique cases on the ScanRefer dataset, albeit with a 1.70% decline in Multiple cases. The proposed technique enhances the robustness of grounding through viewpoint anchoring and candidate discrimination in complex query scenarios, and is expected to improve efficiency in practical applications through future multi-view fusion and conditional execution optimizations.
(This article belongs to the Special Issue Advances in Computer Graphics and 3D Technologies)
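The RDT idea (turning relative depths into sentences for the VLM prompt) can be illustrated with a toy function: compare each detected object against its k nearest neighbours in image space and emit a textual depth relation. The wording, k = 2, and the example inputs are illustrative assumptions, not SeeGround's implementation.

```python
import numpy as np

def relational_depth_text(names, depths, xy, k=2):
    """For each object's k nearest neighbours (in image space), state
    which is closer to the camera. Wording and k are assumptions."""
    xy = np.asarray(xy, dtype=float)
    lines = []
    for i, name in enumerate(names):
        d = np.linalg.norm(xy - xy[i], axis=1)
        d[i] = np.inf                      # exclude the object itself
        for j in np.argsort(d)[:k]:
            rel = ("closer to the camera than" if depths[i] < depths[j]
                   else "farther from the camera than")
            lines.append(f"The {name} is {rel} the {names[j]}.")
    return lines

names = ["chair", "table", "lamp"]
depths = [1.8, 2.4, 3.1]                   # e.g. from a monocular depth model
xy = [(100, 200), (180, 210), (260, 190)]  # object centroids in the render
print("\n".join(relational_depth_text(names, depths, xy)))
```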

29 pages, 17260 KB  
Article
IMTS-YOLO: A Steel Surface Defect Detection Model Integrating Multi-Scale Perception and Progressive Attention
by Pengzheng Fu, Hongbin Yuan, Jing He, Bangzhi Wu, Nuo Xu and Yong Gu
Coatings 2026, 16(1), 51; https://doi.org/10.3390/coatings16010051 - 2 Jan 2026
Abstract
In recent years, steel surface defect detection has emerged as a significant area of focus within intelligent manufacturing research. Existing approaches often exhibit insufficient accuracy and limited generalization capability, constraining their practical implementation in industrial environments. To overcome these shortcomings, this study presents IMTS-YOLO, an enhanced detection model based on the YOLOv11n architecture, incorporating several technical innovations designed to improve detection performance. The proposed framework introduces four key enhancements. First, an Intelligent Guidance Mechanism (IGM) refines the feature extraction process to address semantic ambiguity and enhance cross-scenario adaptability, particularly for detecting complex defect patterns. Second, a multi-scale convolution module (MulBk) captures and integrates defect features across varying receptive fields, thereby improving the characterization of intricate surface textures. Third, a triple-head adaptive feature fusion (TASFF) structure enables more effective detection of irregularly shaped defects while maintaining computational efficiency. Finally, a specialized bounding box regression loss function (Shape-IoU) optimizes localization precision and training stability. The model achieved a 5.0% improvement in mAP50 and a 3.2% improvement in mAP50-95 on the NEU-DET dataset, while also achieving a 4.4% improvement in mAP50 and a 3.1% improvement in mAP50-95 in the cross-dataset GC10-DET validation. These results confirm the model’s practical value for real-time industrial defect inspection applications.
(This article belongs to the Special Issue Solid Surfaces, Defects and Detection, 2nd Edition)
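A multi-scale convolution module of the kind MulBk describes (capturing defect features across varying receptive fields and integrating them) is commonly built from parallel branches with different kernel sizes. The sketch below assumes 1/3/5/7 kernels and a 1×1 fusion conv; it illustrates the pattern, not the published MulBk.

```python
import torch
import torch.nn as nn

class MulBkSketch(nn.Module):
    """Multi-scale convolution: parallel branches with different kernel
    sizes, concatenated and projected back. Branch layout is an assumption."""
    def __init__(self, c, kernels=(1, 3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c // len(kernels), k, padding=k // 2) for k in kernels)
        self.fuse = nn.Conv2d(c, c, 1)
    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 40, 40)
print(MulBkSketch(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```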

19 pages, 2314 KB  
Article
Occlusion Avoidance for Harvesting Robots: A Lightweight Active Perception Model
by Tao Zhang, Jiaxi Huang, Jinxing Niu, Zhengyi Liu, Le Zhang and Huan Song
Sensors 2026, 26(1), 291; https://doi.org/10.3390/s26010291 - 2 Jan 2026
Abstract
Addressing the issue of fruit recognition and localization failures in harvesting robots due to severe occlusion by branches and leaves in complex orchard environments, this paper proposes an occlusion avoidance method that combines a lightweight YOLOv8n model, developed by Ultralytics in the United States, with active perception. Firstly, to meet the stringent real-time requirements of the active perception system, a lightweight YOLOv8n model was developed. This model reduces computational redundancy by incorporating the C2f-FasterBlock module and enhances key feature representation by integrating the SE attention mechanism, significantly improving inference speed while maintaining high detection accuracy. Secondly, an end-to-end active perception model based on ResNet50 and multi-modal fusion was designed. This model can intelligently predict the optimal movement direction for the robotic arm based on the current observation image, actively avoiding occlusions to obtain a more complete field of view. The model was trained using a matrix dataset constructed through the robot’s dynamic exploration in real-world scenarios, achieving a direct mapping from visual perception to motion planning. Experimental results demonstrate that the proposed lightweight YOLOv8n model achieves a mAP of 0.885 in apple detection tasks, a frame rate of 83 FPS, a parameter count reduced to 1,983,068, and a model weight file size reduced to 4.3 MB, significantly outperforming the baseline model. In active perception experiments, the proposed method effectively guided the robotic arm to quickly find observation positions with minimal occlusion, substantially improving the success rate of target recognition and the overall operational efficiency of the system. The current research outcomes provide preliminary technical validation and a feasible exploratory pathway for developing agricultural harvesting robot systems suitable for real-world complex environments. It should be noted that the validation of this study was primarily conducted in controlled environments. Subsequent work still requires large-scale testing in diverse real-world orchard scenarios, as well as further system optimization and performance evaluation in more realistic application settings, which include natural lighting variations, complex weather conditions, and actual occlusion patterns.
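A minimal sketch of the visual-perception-to-motion mapping at the core of the active perception model: a ResNet50 backbone pooled to a 2048-d vector feeds a classifier over candidate movement directions for the arm. This omits the paper's multi-modal fusion, and the five-way direction set is an assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ActivePerceptionSketch(nn.Module):
    """Maps the current camera view to a discrete move direction
    (e.g. up/down/left/right/forward). Class set is an assumption."""
    def __init__(self, n_directions=5):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()          # keep 2048-d pooled features
        self.backbone = backbone
        self.head = nn.Linear(2048, n_directions)
    def forward(self, img):
        return self.head(self.backbone(img))

logits = ActivePerceptionSketch()(torch.randn(1, 3, 224, 224))
print(logits.argmax(dim=1))  # index of the predicted viewing direction
```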

18 pages, 14209 KB  
Article
A Real-Time Improved YOLOv10 Model for Small and Multi-Scale Ground Target Detection in UAV LiDAR Range Images of Complex Scenes
by Yu Zhai, Ziyi Zhang, Sen Xie, Chunsheng Tong, Xiuli Luo, Xuan Li, Liming Wang and Yingliang Zhao
Electronics 2026, 15(1), 211; https://doi.org/10.3390/electronics15010211 - 1 Jan 2026
Abstract
Low-altitude Unmanned Aerial Vehicle (UAV) detection using LiDAR range images faces persistent challenges. These include sparse features for long-range targets, large scale variations caused by viewpoint changes, and severe interference from complex backgrounds. To address these issues, we propose an improved detection framework based on YOLOv10. First, we design a Swin-Conv hybrid module that combines sparse attention with deformable convolution. This module enables the network to focus on informative regions and adapt to target geometry. These capabilities jointly strengthen feature extraction for sparse, long-range targets. Second, we introduce Attentional Feature Fusion (AFF) in the neck to replace naïve feature concatenation. AFF employs multi-scale channel attention to softly select and adaptively weight features from different levels, improving robustness to multi-scale targets. In addition, we systematically study how the viewpoint distribution in the training set affects performance. The results show that moderately increasing the proportion of low-elevation-view samples significantly improves detection accuracy. Experiments on a self-built simulated LiDAR range-image dataset demonstrate that our method achieves 88.96% mAP at 54.2 FPS, which is 4.78 percentage points higher than the baseline. Deployment on the Jetson Orin Nano edge device further validates the model’s potential for real-time applications. The proposed method remains robust under noise and complex backgrounds. The proposed approach achieves an effective balance between detection accuracy and computational efficiency, providing a reliable solution for real-time target detection in complex low-altitude environments.
(This article belongs to the Special Issue Image Processing for Intelligent Electronics in Multimedia Systems)
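AFF here refers to the published Attentional Feature Fusion module (Dai et al., WACV 2021), whose core recipe is a multi-scale channel attention map, computed from the sum of two inputs, that softly weights them in place of plain concatenation. A condensed PyTorch sketch, with the reduction ratio of 4 as an assumption:

```python
import torch
import torch.nn as nn

class AFFSketch(nn.Module):
    """Attentional Feature Fusion in the spirit of Dai et al.: local and
    global channel attention on x + y softly selects between the two
    inputs. Reduction ratio assumed."""
    def __init__(self, c, r=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(c, c // r, 1), nn.BatchNorm2d(c // r), nn.ReLU(),
            nn.Conv2d(c // r, c, 1), nn.BatchNorm2d(c))
        self.glob = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // r, 1), nn.ReLU(), nn.Conv2d(c // r, c, 1))
    def forward(self, x, y):
        s = x + y
        m = torch.sigmoid(self.local(s) + self.glob(s))
        return m * x + (1 - m) * y

x, y = torch.randn(1, 64, 20, 20), torch.randn(1, 64, 20, 20)
print(AFFSketch(64)(x, y).shape)  # torch.Size([1, 64, 20, 20])
```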

22 pages, 3277 KB  
Article
FusionBullyNet: A Robust English–Arabic Cyberbullying Detection Framework Using Heterogeneous Data and Dual-Encoder Transformer Architecture with Attention Fusion
by Mohammed A. Mahdi, Muhammad Asad Arshed and Shahzad Mumtaz
Mathematics 2026, 14(1), 170; https://doi.org/10.3390/math14010170 - 1 Jan 2026
Abstract
Cyberbullying has become a pervasive threat on social media, impacting the safety and wellbeing of users worldwide. Most existing studies focus on monolingual content, limiting their applicability in multilingual online environments. This study aims to develop an approach that accurately detects abusive content in bilingual settings. Given the large volume of online content in English and Arabic, we propose a bilingual cyberbullying detection approach designed to deliver efficient, scalable, and robust performance. Several datasets were combined, processed, and augmented before proposing a cyberbullying identification approach. The proposed model (FusionBullyNet) is based on fine-tuning of two transformer models (RoBERTa-base + bert-base-arabertv02-twitter), attention-based fusion, gradually unfreezing the layers, and label smoothing to enhance generalization. The test accuracy of 0.86, F1 scores of 0.83 for bullying and 0.88 for no bullying, and an overall ROC-AUC of 0.929 were achieved with the proposed approach. To assess the robustness of the proposed models, several multilingual models, such as XLM-RoBERTa-Base, Microsoft/mdeberta-v3-base, and google-bert/bert-base-multilingual-cased, were also trained in this study, and all achieved a test accuracy of 0.84. Furthermore, several machine learning models were trained in this study, and Logistic Regression, XGBoost Classifier, and LightGBM Classifier achieved the highest accuracy of 0.82. These results demonstrate that the proposed approach provides a reliable, high-performance solution for cyberbullying detection, contributing to safer online communication environments.
(This article belongs to the Special Issue Computational Intelligence in Addressing Data Heterogeneity)
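The attention-based fusion of the two encoders can be pictured as learned per-encoder weights over their sentence embeddings. The sketch below uses random tensors as stand-ins for the RoBERTa and AraBERT [CLS] vectors and adds the label smoothing mentioned in the abstract; dimensions and the softmax-score fusion are assumptions about FusionBullyNet's exact design.

```python
import torch
import torch.nn as nn

class DualEncoderFusionSketch(nn.Module):
    """Attention-based fusion of two sentence embeddings: a learned score
    per encoder decides its contribution. Dims/head are assumptions."""
    def __init__(self, dim=768, n_classes=2):
        super().__init__()
        self.score = nn.Linear(dim, 1)               # one score per encoder
        self.classifier = nn.Linear(dim, n_classes)
    def forward(self, emb_en, emb_ar):
        stacked = torch.stack([emb_en, emb_ar], dim=1)   # (B, 2, dim)
        w = torch.softmax(self.score(stacked), dim=1)    # (B, 2, 1)
        fused = (w * stacked).sum(dim=1)                 # (B, dim)
        return self.classifier(fused)

model = DualEncoderFusionSketch()
logits = model(torch.randn(4, 768), torch.randn(4, 768))
# label smoothing, as in the abstract (smoothing value assumed):
loss = nn.CrossEntropyLoss(label_smoothing=0.1)(logits, torch.tensor([0, 1, 1, 0]))
print(loss.item())
```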

73 pages, 3131 KB  
Review
Magnetic Barkhausen Noise Sensor: A Comprehensive Review of Recent Advances in Non-Destructive Testing and Material Characterization
by Polyxeni Vourna, Pinelopi P. Falara, Aphrodite Ktena, Evangelos V. Hristoforou and Nikolaos D. Papadopoulos
Sensors 2026, 26(1), 258; https://doi.org/10.3390/s26010258 - 31 Dec 2025
Abstract
Magnetic Barkhausen noise (MBN) represents a powerful non-destructive testing and material characterization methodology enabling quantitative assessment of microstructural features, mechanical properties, and stress states in ferromagnetic materials. This comprehensive review synthesizes recent advances spanning theoretical foundations, sensor design, signal processing methodologies, and industrial applications. The physical basis rooted in domain wall dynamics and statistical mechanics provides rigorous frameworks for interpreting MBN signals in terms of grain structure, dislocation density, phase composition, and residual stress. Contemporary instrumentation innovations including miniaturized sensors, multi-parameter systems, and high-entropy alloy cores enable measurements in challenging environments. Advanced signal processing techniques, encompassing time-domain analysis, frequency-domain spectral methods, time–frequency transforms, and machine learning algorithms, extract comprehensive material information from raw Barkhausen signals. Deep learning approaches demonstrate superior performance for automated material classification and property prediction compared to traditional statistical methods. Industrial applications span manufacturing quality control, structural health monitoring, railway infrastructure assessment, and predictive maintenance strategies. Key achievements include establishing quantitative correlations between material properties and stress states, with measurement uncertainties of ±15–20 MPa for stress and ±20 HV for hardness. Emerging challenges include standardization imperatives, characterization of advanced materials, machine learning robustness, and autonomous system integration. Future developments prioritizing international standards, physics-informed neural networks, multimodal sensor fusion, and wireless monitoring networks will accelerate industrial adoption supporting safe, efficient engineering practice across diverse sectors.
(This article belongs to the Special Issue Recent Trends and Advances in Magnetic Sensors)
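Some of the time- and frequency-domain descriptors the review surveys reduce to a few lines of NumPy; the sketch below extracts RMS, a peak count, and a spectral centroid from a raw MBN trace. The 2×RMS peak threshold is an arbitrary assumption for illustration.

```python
import numpy as np

def mbn_features(signal, fs):
    """Common Barkhausen-noise descriptors: RMS, count of peaks above a
    threshold, and spectral centroid. Threshold choice is an assumption."""
    rms = np.sqrt(np.mean(signal ** 2))
    thr = 2.0 * rms
    peaks = np.sum((signal[1:-1] > signal[:-2]) &
                   (signal[1:-1] > signal[2:]) & (signal[1:-1] > thr))
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    centroid = np.sum(freqs * spec) / np.sum(spec)
    return {"rms": rms, "peaks": int(peaks), "centroid_hz": centroid}

rng = np.random.default_rng(0)
print(mbn_features(rng.normal(size=10_000), fs=200_000))
```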

46 pages, 852 KB  
Systematic Review
The Intelligent Evolution of Radar Signal Deinterleaving: A Systematic Review from Foundational Algorithms to Cognitive AI Frontiers
by Zhijie Qu, Jinquan Zhang, Yuewei Zhou and Lina Ni
Sensors 2026, 26(1), 248; https://doi.org/10.3390/s26010248 - 31 Dec 2025
Abstract
The escalating complexity, density, and agility of today's complex electromagnetic environment (CME) pose unprecedented challenges to radar signal deinterleaving, a cornerstone of electronic intelligence. While traditional methods face significant performance bottlenecks, the advent of artificial intelligence, particularly deep learning, has catalyzed a paradigm shift. This review provides a systematic, comprehensive, and forward-looking analysis of the radar signal deinterleaving landscape, critically bridging foundational techniques with the cognitive frontiers. Previous reviews often focused on specific technical branches or predated the deep learning revolution. In contrast, our work offers a holistic synthesis. It explicitly links the evolution of algorithms to the persistent challenges of the CME. We first establish a unified mathematical framework and systematically evaluate classical approaches, such as PRI-based search and clustering algorithms, elucidating their contributions and inherent limitations. The core of our review then pivots to the deep learning-driven era, meticulously dissecting the application paradigms, innovations, and performance of mainstream architectures, including Recurrent Neural Networks (RNNs), Transformers, Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). Furthermore, we venture into emerging frontiers, exploring the transformative potential of self-supervised learning, meta-learning, multi-station fusion, and the integration of Large Language Models (LLMs) for enhanced semantic reasoning. A critical assessment of the current dataset landscape is also provided, highlighting the crucial need for standardized benchmarks. Finally, this paper culminates in a comprehensive comparative analysis, identifying key open challenges such as open-set recognition, model interpretability, and real-time deployment. We conclude by offering in-depth insights and a roadmap for future research, aimed at steering the field towards end-to-end intelligent and autonomous deinterleaving systems. This review is intended to serve as a definitive reference and insightful guide for researchers, catalyzing future innovation in intelligent radar signal processing.
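As a reminder of what the foundational algorithms look like, here is a toy version of the classic PRI difference-histogram step: histogram pairwise time-of-arrival differences and read off candidate PRIs (and their harmonics) as peaks. Bin width and search range are assumptions; practical SDIF/CDIF schemes add subharmonic suppression and sequence search on top of this.

```python
import numpy as np

def pri_histogram(toas_us, bin_us=2, max_pri_us=2000):
    """Histogram pairwise TOA differences; peaks mark candidate PRIs
    (and harmonics) of interleaved pulse trains. Bins are assumptions."""
    toas_us = np.sort(np.asarray(toas_us, dtype=float))
    diffs = toas_us[None, :] - toas_us[:, None]
    diffs = diffs[(diffs > 0) & (diffs <= max_pri_us)]
    hist, edges = np.histogram(
        diffs, bins=np.arange(0, max_pri_us + bin_us, bin_us))
    top = np.argsort(hist)[-4:][::-1]
    return [(edges[k], int(hist[k])) for k in top]

# two interleaved pulse trains with PRIs of 100 us and 150 us
toas = np.concatenate([np.arange(0, 5000, 100), np.arange(7, 5000, 150)])
print(pri_histogram(toas))
```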

18 pages, 2002 KB  
Article
YOLOv11-ASV: Research on Classroom Behavior Recognition Method Based on YOLOv11
by Zihao Wang and Tao Fan
Appl. Sci. 2026, 16(1), 432; https://doi.org/10.3390/app16010432 - 31 Dec 2025
Abstract
(1) Background: With the continuous development of intelligent education, classroom behavior recognition has become increasingly important in teaching evaluation and learning analytics. In response to challenges such as occlusion, scale differences, and fine-grained behavior recognition in complex classroom environments, this paper proposes an improved YOLOv11-ASV detection framework; (2) Methods: This framework introduces the Adaptive Spatial Pyramid Network (ASPN) based on YOLOv11, enhancing contextual modeling capabilities through block-level channel partitioning and multi-scale feature fusion mechanisms. Additionally, VanillaNet is adopted as the backbone network to improve the global semantic feature representation; (3) Conclusions: Experimental results show that on our self-built classroom behavior dataset (ClassroomDatasets), YOLOv11-ASV achieves 81.5% mAP50 and 62.1% mAP50–95, improving by 1.6% and 2.9%, respectively, compared to the baseline model. Notably, performance shows significant improvement in recognizing behavior classes such as “reading” and “writing”, which are often confused. The experimental results validate the effectiveness of the YOLOv11-ASV model in improving behavior recognition accuracy and robustness in complex classroom scenarios, providing reliable technical support for the practical application of smart classroom systems.
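The abstract's "block-level channel partitioning and multi-scale feature fusion" suggests a split-transform-merge layout. The sketch below is one plausible reading, processing channel chunks at different dilations before fusing; it is a guess at the pattern, not the published ASPN.

```python
import torch
import torch.nn as nn

class ASPNSketch(nn.Module):
    """Hypothetical block-level channel partitioning with multi-scale
    context: each channel group gets its own dilation, then a 1x1 conv
    fuses the groups. Group count and dilations are assumptions."""
    def __init__(self, c, dilations=(1, 2, 4, 8)):
        super().__init__()
        g = c // len(dilations)
        self.groups = nn.ModuleList(
            nn.Conv2d(g, g, 3, padding=d, dilation=d) for d in dilations)
        self.fuse = nn.Conv2d(c, c, 1)
    def forward(self, x):
        chunks = torch.chunk(x, len(self.groups), dim=1)
        return self.fuse(torch.cat(
            [conv(ch) for conv, ch in zip(self.groups, chunks)], dim=1))

print(ASPNSketch(64)(torch.randn(1, 64, 40, 40)).shape)
```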

33 pages, 5502 KB  
Article
Study on Lightweight Algorithm for Multi-Scale Target Detection of Personnel and Equipment in Open Pit Mine
by Erxiang Zhao, Caimou Qiu and Chunyang Zhang
Appl. Sci. 2026, 16(1), 354; https://doi.org/10.3390/app16010354 - 29 Dec 2025
Abstract
Personnel and equipment target detection algorithms in open pit mines have significantly improved mining safety, production efficiency, and management optimization. However, achieving precise target localization in complex backgrounds, addressing mutual occlusion among multiple targets, and detecting large-scale and spatially extensive targets remain challenges for current target detection algorithms in open pit mining areas. To address these issues, this study proposes a novel target detection algorithm named RSLH-YOLO, specifically designed for personnel and equipment detection in complex open pit mining scenarios. Based on the YOLOv11 (You Only Look Once version 11) framework, the algorithm enhances the backbone network by introducing receptive field attention convolution and dilated convolution to expand the model’s receptive field and reduce information loss, thereby improving target localization capability in complex environments. Additionally, a bidirectional fusion mechanism between high-resolution and low-resolution features is adopted, along with a dedicated small-target detection layer, to strengthen multi-scale target recognition. Finally, a lightweight detection head is implemented to reduce model parameters and computational costs while improving occlusion handling, making the model more suitable for personnel and vehicle detection in mining environments. Experimental results demonstrate that RSLH-YOLO achieves a mAP (mean average precision) of 89.1%, surpassing the baseline model by 3.2 percentage points while maintaining detection efficiency. These findings indicate that the proposed model is applicable to open pit mining scenarios with limited computational resources, providing effective technical support for personnel and equipment detection in mining operations.
(This article belongs to the Section Computing and Artificial Intelligence)
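The receptive-field benefit of dilated convolution, which motivates the backbone changes described above, is easy to make concrete: for stride-1 layers, each conv adds (k − 1) · dilation to the receptive field, so swapping in growing dilations roughly doubles coverage at the same parameter count.

```python
# Receptive-field arithmetic for stride-1 conv stacks (illustrative):
# each layer adds (k - 1) * dilation to the receptive field.
def receptive_field(layers):
    rf = 1
    for k, dilation in layers:
        rf += (k - 1) * dilation
    return rf

plain   = [(3, 1)] * 3               # three ordinary 3x3 convs
dilated = [(3, 1), (3, 2), (3, 4)]   # same parameter count, growing dilation
print(receptive_field(plain), receptive_field(dilated))  # 7 15
```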

23 pages, 6177 KB  
Article
RT-DETR Optimization with Efficiency-Oriented Backbone and Adaptive Scale Fusion for Precise Pomegranate Detection
by Jun Yuan, Jing Fan, Hui Liu, Weilong Yan, Donghan Li, Zhenke Sun, Hongtao Liu and Dongyan Huang
Horticulturae 2026, 12(1), 42; https://doi.org/10.3390/horticulturae12010042 - 29 Dec 2025
Abstract
To develop a high-performance detection system for automated harvesting on resource-limited edge devices, we introduce FSA-DETR-P, a lightweight detection framework that addresses challenges such as illumination inconsistency, occlusion, and scale variation in complex orchard environments. Unlike traditional computationally intensive architectures, this model optimizes real-time detection transformers by integrating an efficient backbone for fast feature extraction, a simplified aggregation structure to minimize complexity, and an adaptive mechanism for multi-scale feature fusion. The optimized backbone improves early-stage texture extraction while reducing computational demands. The streamlined aggregation design enhances multi-level interactions without losing spatial detail, and the adaptive fusion module strengthens the detection of small, partially occluded, or ambiguous fruits. We created a domain-specific pomegranate dataset, expanded to 13,840 images with a rigorous 8:1:1 split for training, validation, and testing. The results show that the pruned and optimized model achieves a Mean Average Precision (mAP50) of 0.928 and mAP50–95 of 0.632 with reduced parameters (13.73 M) and lower computational costs (34.6 GFLOPs). It operates at 24.6 FPS on an NVIDIA Jetson Orin Nano, indicating a strong balance between accuracy and deployability, making it well-suited for orchard monitoring and robotic harvesting in real-world applications.
(This article belongs to the Section Fruit Production Systems)
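The stated 8:1:1 split of the 13,840-image dataset works out to 11,072 / 1,384 / 1,384 images. A reproducible sketch (seeded shuffling is an assumption; the authors' split procedure is not specified):

```python
import random

def split_8_1_1(items, seed=0):
    """Reproducible 8:1:1 train/val/test split; seed is an assumption."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_8_1_1(range(13840))
print(len(train), len(val), len(test))  # 11072 1384 1384
```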

21 pages, 66751 KB  
Article
Real-Time Panoramic Surveillance Video Stitching Method for Complex Industrial Environments
by Jiuteng Zhu, Jianyu Guo, Kailun Ding, Gening Wang, Youxuan Zhou and Wenhong Li
Sensors 2026, 26(1), 186; https://doi.org/10.3390/s26010186 - 26 Dec 2025
Abstract
In complex industrial environments, surveillance videos often exhibit large parallax, low illumination, low texture, and low overlap rate, making it difficult to extract reliable image feature points and consequently leading to suboptimal video stitching performance. To address these challenges, this study proposes a real-time panoramic surveillance video stitching method specifically designed for complex industrial scenarios. In the image registration stage, the Efficient Channel Attention (ECA) and Channel Attention (CA) modules are integrated with ResNet to enhance the feature extraction layers of the UDIS algorithm, thereby improving feature extraction and matching accuracy. A loss function incorporating similarity loss Lsim and smoothness loss Lsmooth is designed to optimize registration errors. In the image fusion stage, gradient terms and motion terms are introduced to improve the energy function of the optimal seam line, enabling the optimal seam line to avoid moving objects in overlapping regions and thus achieve video stitching. Experimental validation is conducted by comparing the proposed image registration method with SIFT + RANSAC, UDIS, UDIS++, and NIS, and the proposed image fusion method with weighted average fusion, dynamic programming, and graph cut. The results show that, in image registration experiments, the proposed method achieves RMSE, PSNR, and SSIM values of 1.965, 25.338, and 0.8366, respectively. In image fusion experiments, the seam transition is smoother and effectively avoids moving objects, significantly improving the visual quality of the stitched videos. Moreover, the real-time stitching frame rate reaches 23 fps, meeting the real-time requirements of industrial surveillance applications.
(This article belongs to the Section Sensing and Imaging)
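One way to read "gradient terms and motion terms are introduced to improve the energy function of the optimal seam line" is as a per-pixel cost that the seam search minimizes, with a heavy penalty wherever motion is detected so the seam routes around moving objects. The sketch below is such a cost map; the weights are assumptions, and the actual seam search (graph cut or dynamic programming over this map) is omitted.

```python
import numpy as np

def seam_energy(img_a, img_b, motion_mask, w_grad=1.0, w_motion=10.0):
    """Per-pixel seam cost in the overlap: colour difference plus a
    gradient-difference term, plus a heavy penalty where motion was
    detected. Weights are assumptions."""
    colour = np.abs(img_a - img_b).mean(axis=2)
    gy, gx = np.gradient(img_a.mean(axis=2))
    hy, hx = np.gradient(img_b.mean(axis=2))
    grad = np.abs(gx - hx) + np.abs(gy - hy)
    return colour + w_grad * grad + w_motion * motion_mask.astype(float)

a = np.random.rand(120, 80, 3)
b = np.random.rand(120, 80, 3)
moving = np.zeros((120, 80), dtype=bool)
moving[40:60, 20:50] = True          # e.g. a worker crossing the overlap
E = seam_energy(a, b, moving)
print(E.shape, E[50, 30] > E[0, 0])  # motion region costs far more
```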

14 pages, 2571 KB  
Article
RMP: Robust Multi-Modal Perception Under Missing Condition
by Xin Ma, Xuqi Cai, Yuansheng Song, Yu Liang, Gang Liu and Yijun Yang
Electronics 2026, 15(1), 119; https://doi.org/10.3390/electronics15010119 - 26 Dec 2025
Abstract
Multi-modal perception is a core technology for edge devices to achieve safe and reliable environmental understanding in autonomous driving scenarios. In recent years, most approaches have focused on integrating complementary signals from diverse sensors, including cameras and LiDAR, to improve scene understanding in complex traffic environments, and this line of work has attracted significant attention. However, in real-world applications, sensor failures frequently occur; for instance, cameras may malfunction in scenarios with poor illumination, which severely reduces the accuracy of perception models. To overcome this issue, we propose a robust multi-modal perception pipeline designed to improve model performance under missing modality conditions. Specifically, we design a missing feature reconstruction mechanism to reconstruct absent features by leveraging intra-modal common clues. Furthermore, we introduce a multi-modal adaptive fusion strategy to facilitate adaptive multi-modal integration through inter-modal feature interactions. Extensive experiments on the nuScenes benchmark demonstrate that our method achieves SOTA-level performance under missing-modality conditions.
(This article belongs to the Special Issue Hardware and Software Co-Design in Intelligent Systems)
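The missing feature reconstruction mechanism can be pictured as a small network that synthesizes the absent modality's features from the surviving one, followed by gated fusion. The sketch below is a toy single-vector version (e.g. LiDAR features standing in for a failed camera); shapes and the gating scheme are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class MissingFeatureReconstructionSketch(nn.Module):
    """Reconstruct an absent modality's feature from the available one,
    then fuse with a learned gate. Shapes are assumptions."""
    def __init__(self, dim=256):
        super().__init__()
        self.cam_from_lidar = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
    def forward(self, cam_feat, lidar_feat):
        if cam_feat is None:                  # camera failed
            cam_feat = self.cam_from_lidar(lidar_feat)
        g = self.gate(torch.cat([cam_feat, lidar_feat], dim=1))
        return g * cam_feat + (1 - g) * lidar_feat

model = MissingFeatureReconstructionSketch()
print(model(None, torch.randn(2, 256)).shape)  # works without the camera
```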

27 pages, 4293 KB  
Article
SNR-Guided Enhancement and Autoregressive Depth Estimation for Single-Photon Camera Imaging
by Qingze Yin, Fangming Mu, Qinge Wu, Ding Ding, Ziyu Fan and Tongpo Zhang
Appl. Sci. 2026, 16(1), 245; https://doi.org/10.3390/app16010245 - 25 Dec 2025
Abstract
Recent advances in deep learning have intensified the need for robust low-light image processing in critical applications like autonomous driving, where single-photon cameras (SPCs) offer high photon sensitivity but produce noisy outputs requiring specialized enhancement. This work addresses this challenge through a unified framework integrating three key components: an SNR-guided adaptive enhancement framework that dynamically processes regions with varying noise levels using spatial-adaptive operations and intelligent feature fusion; a specialized self-attention mechanism optimized for low-light conditions; and a conditional autoregressive generation approach applied to robust depth estimation from enhanced SPC images. Our comprehensive evaluation across multiple datasets demonstrates improved performance over state-of-the-art methods, achieving a PSNR of 24.61 dB on the LOL-v1 dataset and effectively recovering fine-grained textures in depth estimation, particularly in real-world SPC applications, while maintaining computational efficiency. The integrated solution effectively bridges the gap between single-photon sensing and practical computer vision tasks, facilitating more reliable operation in photon-starved environments through its novel combination of adaptive noise processing, attention-based feature enhancement, and generative depth reconstruction.
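The SNR-guided adaptive enhancement idea (process high- and low-noise regions differently) can be sketched as a per-pixel gate built from a local mean-to-standard-deviation ratio, blending a light branch for high-SNR regions with a heavy denoising branch for low-SNR ones. Window size, threshold, and the pooling-based SNR estimate are assumptions.

```python
import torch
import torch.nn.functional as F

def snr_guided_blend(noisy, denoised, window=7, threshold=2.0):
    """Estimate local SNR as mean/std per pixel and blend two branches:
    the lightly processed input where SNR is high, the heavily denoised
    branch where it is low. Window and threshold are assumptions."""
    pad = window // 2
    mean = F.avg_pool2d(noisy, window, stride=1, padding=pad)
    var = F.avg_pool2d(noisy ** 2, window, stride=1, padding=pad) - mean ** 2
    snr = mean.abs() / var.clamp_min(1e-6).sqrt()
    gate = torch.sigmoid(snr - threshold)        # ~1 where SNR is high
    return gate * noisy + (1 - gate) * denoised

img = torch.rand(1, 1, 64, 64)
out = snr_guided_blend(img, F.avg_pool2d(img, 3, 1, 1))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```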
