Search Results (63)

Search Parameters:
Keywords = learned feature front-end

15 pages, 646 KB  
Article
VisualRNet: Lightweight Camera Rotation Estimation from Low-Resolution Optical Flow via Cross-Modal Supervision
by Xiong Yang, Hao Wang and Jiong Ni
Sensors 2026, 26(9), 2655; https://doi.org/10.3390/s26092655 - 24 Apr 2026
Abstract
Camera rotation estimation is a key component in video stabilization and motion analysis. In many practical scenarios, inertial measurements are unavailable or temporally unreliable, while classical geometric pipelines degrade under blur, low texture, and low illumination. This paper investigates whether substantially downsampled optical flow can retain sufficient structure for accurate frame-to-frame rotation regression. We present VisualRNet, a lightweight rotation-specific visual regression framework trained with cross-modal IMU supervision. Our design uses coordinate-aware feature encoding, depthwise separable convolutions, lightweight attention, and a compact 6D rotation head to model the spatial structure of rotational flow fields. On Deep-FVS, VisualRNet achieves a mean rotation error of 0.3151 on the test set. The VisualRNet regression head contains 7.7 K parameters, 0.002 GFLOPs, and runs at 729 FPS, while the full pipeline with the FastFlowNetv2 frontend contains 1.374 M parameters, 7.194 GFLOPs, and runs at approximately 113 FPS. A cross-camera adaptation experiment on TUM VI further indicates that the learned motion representation can be aligned to a new camera system with limited calibration data. These results support low-resolution optical flow as a practical input for visual rotation estimation and suggest particular value in stabilization-oriented and cost-sensitive applications where approximate rotational trend matters more than full scene geometry. Full article
(This article belongs to the Section Optical Sensors)
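The compact 6D rotation head mentioned in the abstract above most likely follows the continuous 6D rotation representation, in which the network emits six numbers that are orthonormalized into a rotation matrix by Gram-Schmidt. The paper does not spell out its exact construction, so the sketch below shows only the standard mapping:

```python
import numpy as np

def rotmat_from_6d(x):
    """Map a 6D vector to a rotation matrix via Gram-Schmidt.
    This is the standard continuous 6D representation; whether VisualRNet
    uses exactly this construction is an assumption."""
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)              # first basis vector
    b2 = a2 - np.dot(b1, a2) * b1             # remove component along b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)                     # completes a right-handed frame
    return np.stack([b1, b2, b3], axis=0)

R = rotmat_from_6d(np.array([1., 0., 0., 0., 1., 0.]))  # identity rotation
```

Any pair of non-parallel 3D vectors maps to a valid rotation, which is what makes the parameterization continuous and regression-friendly.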
25 pages, 7920 KB  
Article
MBA-Former: A Boundary-Aware Transformer for Synergistic Multi-Modal Representation in Pine Wilt Disease Detection from High-Resolution Satellite Imagery
by Rui Hou, Yantao Zhou, Ying Wang, Zhiquan Huang, Jing Yao, Quanjun Jiao, Wenjiang Huang and Biyao Zhang
Forests 2026, 17(5), 517; https://doi.org/10.3390/f17050517 - 23 Apr 2026
Abstract
Pine wilt disease (PWD) is a devastating biological forest disturbance, making its large-scale and high-precision remote sensing monitoring crucial for epidemic prevention and control. However, the performance of existing deep learning methods in high-resolution imagery is often limited by the confusion of spectral features among disparate ground objects and the complexity of forest boundaries. To address these challenges, this study proposes an innovative, end-to-end deep learning architecture termed MBA-Former. Built upon the robust Swin Transformer V2 backbone, the model systematically integrates two highly adaptable functional modules: (1) a front-end intelligent fusion module designed to adaptively fuse heterogeneous features, and (2) a back-end boundary refinement module that refines segmentation contours via dual-task learning. To train and evaluate the model, fine-grained manual annotations were first performed on Gaofen-2 satellite imagery acquired from multiple typical epidemic areas across northern and southern China. Information-enhanced datasets were constructed by fusing the original spectral bands, typical vegetation indices, and texture features. A comprehensive performance evaluation was then conducted, specifically targeting typical challenging scenarios characterized by complex ground object boundaries. The experimental results demonstrate that the Multi-modal Boundary-Aware Transformer (MBA-Former) significantly outperforms current state-of-the-art models. It achieved a mean Intersection over Union (mIoU) of 81.74%, an IoU of 77.58% for the most critical infected tree category, and a Boundary F1-Score of 78.62%. Compared to the best-performing baseline model, Swin-Unet, these three metrics exhibited notable improvements of 2.88%, 3.55%, and 4.46%, respectively. 
These findings convincingly demonstrate that MBA-Former provides a highly accurate and robust solution for the large-scale, automated remote sensing monitoring of forest diseases, offering immense value in preventing significant economic losses and preserving forest ecosystem integrity. Full article
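The mIoU figures quoted above are per-class intersection-over-union scores averaged over classes. As a reference for how such numbers are computed (the toy label maps below are illustrative, not from the paper):

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """IoU per class from two integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(inter / union if union else np.nan)  # nan if class absent
    return np.array(ious)

pred = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 2]])
gt   = np.array([[0, 1, 1], [1, 1, 2], [2, 2, 0]])
iou = per_class_iou(pred, gt, 3)     # per-class scores
miou = np.nanmean(iou)               # mean IoU over present classes
```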

18 pages, 2187 KB  
Article
DCN-KUnet: A DCNv3-Based Backbone and KAN Bottleneck for Chromosome Segmentation
by Yufei Yang and Min Chang
Electronics 2026, 15(8), 1649; https://doi.org/10.3390/electronics15081649 - 15 Apr 2026
Abstract
Chromosome foreground segmentation is a binary semantic segmentation problem that serves as a prerequisite for overlap reasoning, contact-region inspection, and automated karyotyping. Although simpler than full instance separation in formulation, it remains difficult in metaphase imagery because chromosomes are elongated, deformable, weakly bounded, and frequently touching or partially overlapping. To address these chromosome-specific difficulties, we present DCN-KUnet as a task-oriented integration rather than a new generic segmentation family. The encoder–decoder backbone embeds DCNv3 modules to perform geometry-adaptive sampling for bending-aware and boundary-aware representation learning, while a B-spline KAN bottleneck refines the compressed semantic representation through lightweight nonlinear transformation. In addition, a hybrid objective composed of mask supervision, semantic consistency regularization, and internal feature regularization (Lcd+LSCR+LIFD) jointly constrains prediction accuracy, cross-stage semantic agreement, and feature compactness during training. Experiments on the public overlapping-chromosome dataset and on AutoKary2022 converted to binary foreground masks show that DCN-KUnet achieves stronger Dice, IoU, and HD95 with a moderate parameter budget. These results support the proposed framework as a practical and lightweight semantic foreground front-end for chromosome analysis pipelines rather than a full instance-disentanglement solution. Full article
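Dice and IoU, two of the metrics reported above, are computed from the same overlap counts and are interconvertible via IoU = D / (2 − D). A minimal sketch on toy binary masks (not the paper's data):

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

pred = np.array([1, 1, 1, 0, 0], dtype=bool)
gt   = np.array([0, 1, 1, 1, 0], dtype=bool)
d = dice(pred, gt)       # 2 * 2 / (3 + 3) = 2/3
iou = d / (2.0 - d)      # identity IoU = D / (2 - D)
```

Because the two scores are monotonic transforms of each other, a model ranking by Dice is also a ranking by IoU; HD95 adds a complementary boundary-distance view not captured by overlap counts.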

33 pages, 7834 KB  
Article
Frequency-Domain Decoupling and Multi-Dimensional Spatial Feature Reconstruction for Occlusion-Aware Apple Detection in Complex Semi-Structured Orchard Environments
by Long Gao, Pengfei Wang, Lixing Liu, Hongjie Liu, Jianping Li and Xin Yang
Agronomy 2026, 16(8), 790; https://doi.org/10.3390/agronomy16080790 - 12 Apr 2026
Abstract
Apple detection is a core perception task for harvesting robots operating in complex orchard environments. Targets are frequently affected by branch–foliage occlusion, alternating front/side/back lighting, and strong local illumination fluctuations, which blur object boundaries against background textures and substantially increase detection difficulty. To improve target perception under these conditions, we propose an improved detector, YOLOv11-CBMES. First, based on YOLOv11, we replace the original neck with a weighted BiFPN to enhance cross-scale feature fusion under occlusion. Second, we introduce a Contrast-Driven Feature Aggregation (CDFA) module at the P5 stage, using Haar wavelet decomposition to decouple low-frequency illumination components from high-frequency structural components. Third, we reconstruct spatial feature learning and the upsampling pathway using CSP-based multi-scale blocks and efficient upsampling blocks, and embed a zero-parameter Shift-Context strategy to strengthen local neighbourhood interaction. Finally, we formulate apple detection as a three-class occlusion classification task (No Occlusion, Soft Occlusion, and Hard Occlusion) to support occlusion-aware target recognition. On the apple occlusion dataset, YOLOv11-CBMES achieves mAPNO = 83.50%, mAPSO = 67.36%, and mAPHO = 51.90% at IoU = 0.5. Compared with YOLOv11n under the same training protocol, the gains are +2.16 pp (NO), +3.68 pp (SO), and +5.31 pp (HO), with the largest improvement observed in Hard Occlusion (HO). The results indicate that introducing frequency-domain structural processing into the detection framework improves apple occlusion classification and object detection performance, and provides a theoretical basis for designing perception modules for end-effector operations in apple harvesting robots. Full article
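The mAP values above are evaluated at IoU = 0.5, i.e. a detection counts as a true positive only if its box overlaps a ground-truth box with intersection-over-union of at least 0.5. A minimal box-IoU sketch (coordinates are illustrative):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))   # intersection 1, union 7
```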

22 pages, 6170 KB  
Article
A Lightweight Net with Dual-Path Feature Enhancer and Bidirectional Gated Fusion for Cloud Detection
by Yan Mo, Puhui Chen, Shaowei Bai and Erbao Xiao
Sensors 2026, 26(5), 1727; https://doi.org/10.3390/s26051727 - 9 Mar 2026
Abstract
Cloud detection serves as a critical preprocessing step in remote sensing image processing and quantitative applications. However, prevailing deep learning-based models often depend on computationally intensive backbone networks to achieve high accuracy, which hinders their deployment in resource-constrained scenarios such as on-board processing or edge computing. To bridge the trade-off between accuracy and efficiency, this paper introduces a lightweight network for cloud detection. The core innovations of our network are twofold: (1) a dual-path feature enhancer that operates at the front end to extract and fuse multi-scale features through a parallel architecture, significantly enriching feature diversity and representational capacity, thereby alleviating the need for a complex backbone, and (2) a bidirectional gated fusion module, which adaptively integrates multi-scale features from the dual-path feature enhancer with deep semantic features from the backbone decoder through a gated attention mechanism and dynamic convolution, thereby enhancing feature discriminability. Comprehensive experiments on the public HRC_WHU dataset demonstrate that the proposed model achieves a high overall accuracy of 96.31% and a mean intersection-over-union of 92.82%, with only 12.04 GFLOPs of computational cost, outperforming several state-of-the-art methods. These results validate that our approach effectively balances high detection performance with computational efficiency, offering a practical solution for real-time, lightweight cloud detection in high-resolution remote sensing imagery. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems—2nd Edition)

16 pages, 2961 KB  
Article
Non-Destructive Determination of Hass Avocado Harvest Maturity in Colombia Based on Low-Cost Bioimpedance Spectroscopy and Machine Learning
by Froylan Jimenez Sanchez, Jose Aguilar and Marta Tabares-Betancur
Computers 2026, 15(3), 166; https://doi.org/10.3390/computers15030166 - 4 Mar 2026
Abstract
The export of Hass avocado (Persea americana Mill.) from Colombia requires accurate determination of harvest maturity, currently assessed through destructive dry matter (DM) measurements that are wasteful and limited in throughput. The objective of the article is to propose a low-cost, non-destructive approach to determine the maturity of the Hass avocado crop based on machine learning techniques. The approach consists of a low-cost, non-invasive bioimpedance spectroscopy system operating in the 1–10 kHz range, featuring a custom Analog Front End (AFE) and a tetrapolar surface probe to mitigate skin contact resistance, which collects data for predictive models of avocado maturity. To evaluate the quality of the approach, a longitudinal field study (n = 100) was conducted in a commercial orchard in Cundinamarca, Colombia, tracking complex impedance features—Magnitude, Phase Angle, Resistance, and Reactance—of tagged fruits over 8 weeks across four measurement timepoints. The predictive performance of a classical chemometric model (PLS-DA), non-linear classifiers (SVM, Random Forest), and a temporal Deep Learning (LSTM) architecture was compared using a Stratified Group K-Fold Cross-Validation scheme to prevent data leakage across fruits from the same tree. The 4-electrode configuration successfully isolated mesocarp impedance, identifying the 5–7.2 kHz band as the most sensitive to physiological maturation. In turn, the LSTM model achieved a mean accuracy of 92.0% and an AUC of 0.94, outperforming the other models by 4.0% in mean accuracy. The results demonstrate that modeling the temporal trajectory of impedance, rather than single-point measurements, improves harvest maturity classification in Hass avocados, providing a scalable, low-cost alternative to destructive testing. Full article
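The four impedance features tracked in the study (magnitude, phase angle, resistance, and reactance) are all views of one complex impedance spectrum. A sketch with made-up values (the numbers below are illustrative, not measurements from the paper):

```python
import numpy as np

# Hypothetical impedance spectrum: complex Z at a few frequencies in the
# band the paper identifies as most sensitive (values are illustrative).
freqs_hz = np.array([1000.0, 2500.0, 5000.0, 7200.0, 10000.0])
Z = np.array([820 - 310j, 760 - 280j, 705 - 240j, 660 - 215j, 630 - 190j])

features = {
    "magnitude":  np.abs(Z),                 # |Z| in ohms
    "phase_deg":  np.degrees(np.angle(Z)),   # phase angle
    "resistance": Z.real,                    # R (real part)
    "reactance":  Z.imag,                    # X (imaginary part, capacitive here)
}
```

Stacking these per-frequency features across the four timepoints yields exactly the kind of temporal trajectory the LSTM in the study models.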

24 pages, 2038 KB  
Article
Evaluating the Managerial Feasibility of an AI-Based Tooth-Percussion Signal Screening Concept for Dental Caries: An In Silico Study
by Stefan Lucian Burlea, Călin Gheorghe Buzea, Irina Nica, Florin Nedeff, Diana Mirila, Valentin Nedeff, Lacramioara Ochiuz, Lucian Dobreci, Maricel Agop and Ioana Rudnic
Diagnostics 2026, 16(4), 638; https://doi.org/10.3390/diagnostics16040638 - 22 Feb 2026
Abstract
Background: Early detection of dental caries is essential for effective oral health management. Current diagnostic workflows rely heavily on radiographic imaging, which involves infrastructure requirements, workflow coordination, and resource considerations that may limit frequent use in high-throughput or resource-constrained settings. These contextual factors motivate exploration of adjunct screening concepts that could support front-end triage decisions within existing care pathways. This study evaluates, in simulation, whether modeled tooth-percussion response signals contain sufficient discriminative information to justify further translational and managerial investigation. Implementation costs, workflow optimization, and economic outcomes are not evaluated directly; rather, the objective is to assess whether the technical preconditions for a potentially scalable screening concept are satisfied under controlled in silico conditions. Methods: An in silico model of tooth percussion was developed in which enamel, dentin, and pulp/root structures were represented as a simplified layered mechanical system. Impulse responses generated from simulated tapping were used to compute the modeled surface-vibration response (enamel-layer displacement), which served as a proxy for a measurable percussion-related signal (e.g., contact vibration), rather than a recorded acoustic waveform. Carious conditions were simulated through depth-dependent reductions in stiffness and effective mass and increases in damping to represent enamel and dentin demineralization. A synthetic dataset of labeled simulated signals was generated under varying structural parameters and measurement-noise assumptions. Machine-learning models using Mel-frequency cepstral coefficient (MFCC) features were trained to classify healthy teeth, enamel caries, and dentin caries at a screening (triage) level. 
Results: Under baseline simulation conditions, the classifier achieved an overall accuracy of 0.97 with balanced macro-averaged F1-score (0.97). Misclassifications occurred primarily between healthy and enamel-caries categories, whereas dentin-caries cases were most consistently identified. When measurement noise and structural variability were increased, performance declined gradually, reaching approximately 0.90 accuracy under the most challenging simulated scenario. These results indicate that discriminative information is present within the modeled signals at a screening (triage) level, meaning that higher-risk categories can be distinguished probabilistically rather than with definitive diagnostic certainty. Sensitivity and specificity trade-offs were not optimized in this study, as the objective was to assess separability rather than to define clinical decision thresholds. Conclusions: Within the constraints of the in silico model, simulated tooth-percussion response signals demonstrated discriminative patterns between healthy, enamel caries, and dentin caries categories at a screening (triage) level. These findings establish technical plausibility under controlled simulation conditions and support further investigation of percussion-based screening as a potential adjunct to clinical assessment. From a healthcare management perspective, the present results address a prerequisite question—whether such signals contain sufficient information to justify translational research, rather than demonstrating workflow optimization, cost reduction, or system-level impact. Clinical validation, threshold optimization, and implementation studies are required before managerial or operational benefits can be evaluated. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
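The in silico model above treats the tooth as a layered mechanical system whose percussion response is an impulse response, with caries simulated as reduced stiffness and increased damping. A toy single-degree-of-freedom version of that idea (parameter values are illustrative assumptions, not the paper's):

```python
import numpy as np

def impulse_response(m, k, c, t):
    """Underdamped mass-spring-damper impulse response
    x(t) = exp(-zeta*wn*t) * sin(wd*t) / (m*wd).
    A toy stand-in for the paper's layered tooth model."""
    wn = np.sqrt(k / m)                       # natural frequency
    zeta = c / (2.0 * np.sqrt(k * m))         # damping ratio
    wd = wn * np.sqrt(1.0 - zeta**2)          # damped frequency
    return np.exp(-zeta * wn * t) * np.sin(wd * t) / (m * wd)

t = np.linspace(0, 0.01, 1000)
healthy = impulse_response(m=1e-3, k=1e7, c=0.5, t=t)
carious = impulse_response(m=1e-3, k=6e6, c=2.0, t=t)  # softer, more damped
```

The carious signal rings at a lower frequency and decays faster, which is the kind of spectro-temporal difference MFCC features can pick up.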

25 pages, 6669 KB  
Article
G-CMTF Net: Spectro-Temporal Disentanglement and Reliability-Aware Gated Cross-Modal Temporal Fusion for Robust PSG Sleep Staging
by Jiongyao Ye and Pengfei Li
Symmetry 2026, 18(2), 316; https://doi.org/10.3390/sym18020316 - 9 Feb 2026
Abstract
Automatic sleep staging from polysomnography is challenged by marked spectro-temporal heterogeneity and non-stationary cross-channel artifacts, which often undermine naïve multimodal fusion. To address this, a Gated Cross-Modal and Temporal Fusion Network (G-CMTF Net) is proposed as an end-to-end model operating on 30 s EEG epochs and auxiliary EOG and EMG signals, in which cross-modal contributions are regulated through reliability-aware gating. A spectro-temporal disentanglement frontend learns multi-scale temporal features while incorporating FFT-derived band-power embeddings to preserve physiologically meaningful oscillatory cues. At the epoch level, gated fusion suppresses artifact-prone auxiliary inputs, thereby limiting noise transfer into a shared latent space. Long-range sleep dynamics are modeled via a convolution-augmented self-attention encoder that captures both local morphology and transition structure. On Sleep-EDF-20 and Sleep-EDF-78, G-CMTF Net achieves Macro-F1/ACC of 81.3%/85.5% and 78.2%/83.4%, respectively, while maintaining high sensitivity and geometric-mean performance on transitional epochs, consistent with the function of reliability-aware gated fusion under non-stationary auxiliary artifacts. From a symmetry perspective, the proposed framework enforces a structured balance between heterogeneous modalities by promoting representational consistency while adaptively suppressing asymmetric noise contributions. Full article
(This article belongs to the Section Computer)
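The FFT-derived band-power embeddings mentioned above summarize each 30 s epoch by the spectral energy in the classical EEG bands. A sketch under assumed band edges (the paper's exact bands and normalization are not stated here):

```python
import numpy as np

def band_powers(x, fs, bands):
    """Sum of FFT power within each named frequency band for one epoch."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return {name: spec[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in bands.items()}

fs = 100                                     # assumed sampling rate
t = np.arange(30 * fs) / fs                  # one 30 s epoch
x = np.sin(2 * np.pi * 10 * t)               # pure 10 Hz "alpha" tone
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
bp = band_powers(x, fs, bands)
```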

13 pages, 2304 KB  
Article
Hybrid Multi-Scale CNN and Transformer Model for Motor Fault Detection
by Prashant Kumar
Machines 2026, 14(1), 113; https://doi.org/10.3390/machines14010113 - 19 Jan 2026
Cited by 1
Abstract
Electric motors are the workhorse of industry owing to their precise speed and torque control. Despite their ruggedness, faults are inevitable due to wear and tear, prolonged usage, and other factors. Bearing faults are among the most frequently occurring faults in electric motors, and detecting them at an early stage is crucial for avoiding complete shutdown. Deep learning has gained significant attention in the fault detection domain owing to its inherent advantages. This paper proposes a hybrid multi-scale convolutional neural network and Transformer model for bearing fault detection. The model combines the strengths of multi-scale convolutional front-ends for fine-grained feature extraction with Transformer encoder blocks for capturing long-range temporal dependencies. The proposed method was tested on a bearing dataset and achieved high accuracy in bearing fault detection. Full article
(This article belongs to the Section Machines Testing and Maintenance)

23 pages, 1063 KB  
Article
A Comparative Experimental Study on Simple Features and Lightweight Models for Voice Activity Detection in Noisy Environments
by Bo-Yu Su, Berlin Chen, Shih-Chieh Huang and Jeih-Weih Hung
Electronics 2026, 15(2), 263; https://doi.org/10.3390/electronics15020263 - 7 Jan 2026
Abstract
This work presents a comparative study of voice activity detection in noise using simple acoustic features and relatively compact recurrent models within a controlled MATLAB-based framework. For each utterance, 9 baseline spectral-plus-periodicity features, MFCCs, and FBANKs are extracted and passed to several lightweight BiLSTM-based networks, either alone or preceded by a 1D CNN layer. The main experiments are carried out at a fixed SNR to separate the influence of the network structure and the feature type, and an additional series with four SNR levels is used to assess whether the same performance trends hold when the SNR varies. The results show that adding a compact CNN front-end before the BiLSTM consistently improves detection scores, that MFCCs generally outperform the baseline spectral–periodicity features and often give better recall/F1 than FBANKs for the considered lightweight models, and that CNN(3,32)+BiLSTM with 13-dimensional MFCCs offers a favorable trade-off between accuracy, robustness across SNRs, and model size. Because all conditions share a single MATLAB implementation with fixed noise types, SNR values, and evaluation metrics, this work is positioned as a benchmark and practical guideline publication for noise-robust, resource-constrained VAD, rather than as a proposal of a completely new deep-learning architecture. Full article
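The MFCC features that performed best in this comparison follow the standard pipeline: framing and windowing, power spectrum, triangular mel filterbank, log compression, and a DCT. The study's implementation is in MATLAB; the NumPy sketch below shows only the generic pipeline, with default sizes as assumptions:

```python
import numpy as np

def mel(f):       # Hz -> mel
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):   # mel -> Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC extractor; sizes are conventional defaults, not the
    paper's exact configuration."""
    # 1. frame and window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # 2. power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 3. triangular mel filterbank
    pts = mel_inv(np.linspace(mel(0.0), mel(fs / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    logmel = np.log(power @ fbank.T + 1e-10)
    # 4. DCT-II to decorrelate; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T

fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # 1 s test tone
feats = mfcc(x, fs)                                # (frames, 13) matrix
```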

29 pages, 808 KB  
Review
Spectrogram Features for Audio and Speech Analysis
by Ian McLoughlin, Lam Pham, Yan Song, Xiaoxiao Miao, Huy Phan, Pengfei Cai, Qing Gu, Jiang Nan, Haoyu Song and Donny Soh
Appl. Sci. 2026, 16(2), 572; https://doi.org/10.3390/app16020572 - 6 Jan 2026
Cited by 1
Abstract
Spectrogram-based representations have grown to dominate the feature space for deep learning audio analysis systems, and are often adopted for speech analysis also. Initially, the primary motivation behind spectrogram-based representations was their ability to present sound as a two-dimensional signal in the time–frequency plane, which not only provides an interpretable physical basis for analysing sound, but also unlocks the use of a range of machine learning techniques such as convolutional neural networks, which had been developed for image processing. A spectrogram is a matrix characterised by the resolution and span of its dimensions, as well as by the representation and scaling of each element. Many possibilities for these three characteristics have been explored by researchers across numerous application areas, with different settings showing affinity for various tasks. This paper reviews the use of spectrogram-based representations and surveys the state-of-the-art to question how front-end feature representation choice allies with back-end classifier architecture for different tasks. Full article
(This article belongs to the Special Issue AI in Audio Analysis: Spectrogram-Based Recognition)
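The review's core object, a spectrogram as a matrix whose dimensions trade time against frequency resolution, reduces to a short-time Fourier transform of windowed frames. A minimal sketch (window and hop sizes are arbitrary choices):

```python
import numpy as np

def spectrogram(x, n_fft, hop):
    """Magnitude spectrogram: rows = frames (time), cols = frequency bins.
    Larger n_fft gives finer frequency resolution but coarser time resolution."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.abs(np.fft.rfft(x[idx] * np.hanning(n_fft), axis=1))

fs = 8000
t = np.arange(fs) / fs                     # 1 s of signal
x = np.sin(2 * np.pi * 440 * t)            # 440 Hz tone
S = spectrogram(x, n_fft=256, hop=128)
# dominant bin ~ 440 / (fs / n_fft) = 440 / 31.25, i.e. bin 14
```

Resolution, span, and element scaling (e.g. log-magnitude, mel warping) are exactly the three characteristics the review surveys.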

19 pages, 38545 KB  
Article
Improving Dynamic Visual SLAM in Robotic Environments via Angle-Based Optical Flow Analysis
by Sedat Dikici and Fikret Arı
Electronics 2026, 15(1), 223; https://doi.org/10.3390/electronics15010223 - 3 Jan 2026
Abstract
Dynamic objects present a major challenge for visual simultaneous localization and mapping (Visual SLAM), as feature measurements originating from moving regions can corrupt camera pose estimation and lead to inaccurate maps. In this paper, we propose a lightweight, semantic-free front-end enhancement for ORB-SLAM that detects and suppresses dynamic features using optical flow geometry. The key idea is to estimate a global motion direction point (MDP) from optical flow vectors and to classify feature points based on their angular consistency with the camera-induced motion field. Unlike magnitude-based flow filtering, the proposed strategy exploits the geometric consistency of optical flow with respect to a motion direction point, providing robustness not only to depth variation and camera speed changes but also to different camera motion patterns, including pure translation and pure rotation. The method is integrated into the ORB-SLAM front-end without modifying the back-end optimization or cost function. Experiments on public dynamic-scene datasets demonstrate that the proposed approach reduces absolute trajectory error by up to approximately 45% compared to baseline ORB-SLAM, while maintaining real-time performance on a CPU-only platform. These results indicate that reliable dynamic feature suppression can be achieved without semantic priors or deep learning models. Full article
(This article belongs to the Section Computer Science & Engineering)
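The angular-consistency test described above can be sketched as follows: each feature's flow vector is compared with the direction radiating from the estimated motion direction point (MDP), and features whose angle exceeds a threshold are flagged as dynamic. The threshold and MDP values below are illustrative assumptions, and the paper's MDP estimation itself is not reproduced:

```python
import numpy as np

def dynamic_mask(points, flow, mdp, max_angle_deg=20.0):
    """Flag features whose flow direction disagrees with the camera-induced
    field radiating from the MDP (simplified sketch of the paper's idea)."""
    expected = points - mdp                   # radial direction from the MDP
    cos = np.sum(flow * expected, axis=1) / (
        np.linalg.norm(flow, axis=1) * np.linalg.norm(expected, axis=1) + 1e-9)
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles > max_angle_deg             # True = likely dynamic feature

pts  = np.array([[100., 100.], [200., 50.], [150., 150.]])
flow = np.array([[-1., -1.], [0.8, -0.7], [-0.5, 0.9]])
mask = dynamic_mask(pts, flow, mdp=np.array([120., 120.]))
```

Because only flow direction is tested, the criterion is insensitive to flow magnitude, which is what gives robustness to depth variation and camera speed changes.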

24 pages, 14385 KB  
Article
LDFE-SLAM: Light-Aware Deep Front-End for Robust Visual SLAM Under Challenging Illumination
by Cong Liu, You Wang, Weichao Luo and Yanhong Peng
Machines 2026, 14(1), 44; https://doi.org/10.3390/machines14010044 - 29 Dec 2025
Abstract
Visual SLAM systems face significant performance degradation under dynamic lighting conditions, where traditional feature extraction methods suffer from reduced keypoint detection and unstable matching. This paper presents LDFE-SLAM, a novel visual SLAM framework that addresses illumination challenges through a Light-Aware Deep Front-End (LDFE) architecture. Our key insight is that low-light degradation in SLAM is fundamentally a geometric feature distribution problem rather than merely a visibility issue. The proposed system integrates three synergistic components: (1) an illumination-adaptive enhancement module based on EnlightenGAN with geometric consistency loss that restores gradient structures for downstream feature extraction, (2) SuperPoint-based deep feature detection that provides illumination-invariant keypoints, and (3) LightGlue attention-based matching that filters enhancement-induced noise while maintaining geometric consistency. Through systematic evaluation of five method configurations (M1–M5), we demonstrate that enhancement, deep features, and learned matching must be co-designed rather than independently optimized. Experiments on EuRoC and TUM sequences under synthetic illumination degradation show that LDFE-SLAM maintains stable localization accuracy (∼1.2 m ATE) across all brightness levels, while baseline methods degrade significantly (up to 3.7 m). Our method operates normally down to severe lighting conditions (30% ambient brightness and 20–50 lux—equivalent to underground parking or night-time streetlight illumination), representing a 4–6× lower illumination threshold compared to ORB-SLAM3 (200–300 lux minimum). Under severe (25% brightness) conditions, our method achieves a 62% tracking success rate, compared to 12% for ORB-SLAM3, with keypoint detection remaining above the critical 100-point threshold, even under extreme degradation. Full article

33 pages, 2435 KB  
Article
Multi-Task Learning for Ocean-Front Detection and Evolutionary Trend Recognition
by Qi He, Anqi Huang, Lijia Geng, Wei Zhao and Yanling Du
Remote Sens. 2025, 17(23), 3862; https://doi.org/10.3390/rs17233862 - 28 Nov 2025
Abstract
Ocean fronts are central to upper-ocean dynamics and ecosystem processes, yet recognizing their evolutionary trends from satellite data remains challenging. We present a 3D U-Net-based multi-task framework that jointly performs ocean-front detection (OFD) and ocean-front evolutionary trend recognition (OFETR) from sea surface temperature gradient heatmaps. Instead of cascading OFD and OFETR in separate stages that pass OFD outputs downstream and can amplify upstream errors, the proposed model shares 3D spatiotemporal features and is trained end-to-end. We construct the Zhejiang–Fujian Coastal Front Mask (ZFCFM) and Evolutionary Trend (ZFCFET) datasets from ESA SST CCI L4 products for 2002–2021 and use them to evaluate the framework against 2D CNN baselines and traditional methods. Multi-task learning improves OFETR compared with single-task training while keeping OFD performance comparable, and the unified design reduces parameter count and daily computational cost. The model outputs daily point-level trend labels aligned with the dataset’s temporal resolution, indicating that end-to-end multi-task learning can mitigate error propagation and provide temporally resolved estimates. Full article
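As a minimal sketch of what end-to-end multi-task training can look like here (the loss forms, task weights, and function names below are illustrative assumptions, not the paper's actual objective), the shared 3D backbone would be optimized against a weighted sum of a pixel-wise detection loss and a per-point trend-classification loss:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy for the front-detection (OFD) mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def ce(probs, labels, eps=1e-7):
    """Cross-entropy over per-point evolutionary-trend (OFETR) class probabilities."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

def multitask_loss(mask_pred, mask_gt, trend_probs, trend_labels,
                   w_det=1.0, w_trend=1.0):
    """Joint objective: because both heads share 3D spatiotemporal features,
    one backward pass trains detection and trend recognition together,
    avoiding the error propagation of a cascaded two-stage pipeline."""
    return w_det * bce(mask_pred, mask_gt) + w_trend * ce(trend_probs, trend_labels)
```

Tuning `w_det` and `w_trend` trades off the two tasks; the abstract reports that joint training improves OFETR while keeping OFD comparable to single-task baselines.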

27 pages, 6859 KB  
Article
An Explainable Machine Learning Framework for the Hierarchical Management of Hot Pepper Damping-Off in Intensive Seedling Production
by Zhaoyuan Wang, Kaige Liu, Longwei Liang, Changhong Li, Tao Ji, Jing Xu, Huiying Liu and Ming Diao
Horticulturae 2025, 11(10), 1258; https://doi.org/10.3390/horticulturae11101258 - 17 Oct 2025
Abstract
Facility agriculture is the dominant production form of the global vegetable industry. As an important vegetable crop, hot peppers are threatened by many diseases in the facility microclimate. Traditional disease detection methods are time-consuming and allow disease to spread before intervention, so timely detection and suppression of disease development have become a focus of agricultural practice worldwide. This article proposes a generalizable and explainable machine learning model for hot pepper damping-off in intensive seedling production while maintaining high predictive accuracy. Using Kalman filter smoothing, SMOTE-ENN imbalanced-sample processing, feature selection, and other data preprocessing methods, 19 baseline models were developed for prediction. After statistical testing of the results, Bayesian optimization was used to tune the hyperparameters of the five best-performing models, and the Extremely Randomized Trees (ET) model was identified as best suited to this scenario. The model achieves an F1-score of 0.9734 and an AUC of 0.9969 for predicting the severity of hot pepper damping-off, and explainable analysis is carried out with SHAP (SHapley Additive exPlanations). Based on these results, hierarchical management strategies for different severity levels are interpreted. Combined with the model's deployed front-end visualization interface, this helps farmers anticipate disease development and precisely regulate the environmental factors of seedling raising, which is of great significance for disease prevention and control and for reducing the impact of disease on hot pepper growth and development. Full article
(This article belongs to the Special Issue New Trends in Smart Horticulture)
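The abstract names Kalman filter smoothing as one of the preprocessing steps applied to the sensor data. A minimal 1D sketch (the constant-state model, the noise variances `q` and `r`, and the function name are illustrative assumptions, not the paper's configuration) of smoothing a noisy environmental series such as greenhouse temperature:

```python
def kalman_smooth(measurements, q=1e-3, r=0.25):
    """Minimal 1D Kalman filter for smoothing a noisy sensor series.
    Assumes a slowly varying true signal: q is the process-noise variance,
    r the measurement-noise variance; larger r trusts measurements less."""
    x = measurements[0]   # state estimate, initialized to the first reading
    p = 1.0               # estimate variance
    smoothed = [x]
    for z in measurements[1:]:
        p += q                   # predict: state carried over, uncertainty grows
        k = p / (p + r)          # Kalman gain: how much to trust the new reading
        x += k * (z - x)         # update the estimate toward the measurement
        p *= (1 - k)             # shrink the estimate variance
        smoothed.append(x)
    return smoothed
```

On a steady signal corrupted by measurement noise, the filtered series converges to the true level with far smaller oscillation than the raw readings, which is the property that matters before feeding the features to the downstream classifiers.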
