MDPI - Publisher of Open Access Journals

19 pages, 4732 KB

Open AccessArticle

YOLO-OBB and Two-Stage Geometric Correction for RGB-LED Array Optical Camera Communication

by Jiaqi Ju, Pan Qiu, Yipeng Tan and Zhengguang Shi

Photonics 2026, 13(6), 599; https://doi.org/10.3390/photonics13060599 (registering DOI) - 20 Jun 2026

In Optical Camera Communication (OCC), precise localization of LED arrays under complex tilt conditions is a core challenge for reliable decoding. This paper proposes an OCC reception scheme for RGB-LED arrays that integrates YOLO-OBB rotated object detection with two-stage geometric correction. The system first employs a YOLOv8n-OBB model to extract a quadrilateral region of interest that tightly encloses the LED array boundary. This effectively suppresses background interference caused by superimposed perspective tilt and in-plane rotation. A coarse-to-fine two-stage correction framework is then applied. The first stage rapidly eliminates the dominant perspective distortion based on the detected bounding-box corners. The second stage performs a refined correction using the actual LED center positions. Two homography matrices are cascaded into a combined transformation, achieving two-stage correction accuracy through a single coordinate mapping. In the corrected image, K-Means clustering constructs a 16 × 16 LED topological grid. A locking strategy is adopted so that subsequent frames skip repeated LED detection and clustering. The steady-state per-frame processing time is reduced to approximately 78.9 ms. Experiments covered 16 cross-combinations of vertical tilt from 0° to 45° (0°, 15°, 30°, 45°) and in-plane rotation from 0° to 40° (0°, 15°, 30°, 40°). The uncorrected scheme and the horizontal-box scheme experienced severe bit errors or complete failure under complicated distortion. The proposed scheme maintained error-free transmission under all 16 tested conditions. The ratios of opposite sides of the corrected LED grid remained stable between 0.997 and 1.004. The system simultaneously achieves high reliability and low-latency real-time processing under complex geometric distortions.
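
The cascading of two homographies into a single coordinate mapping can be illustrated with a minimal sketch. The matrices H1 and H2 below are hypothetical placeholder values, not ones from the paper; the point is only that mapping a point through H1 and then H2 gives the same result as mapping it once through the product H2·H1.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_h(h, pt):
    """Map a 2D point through homography h using homogeneous coordinates."""
    x, y = pt
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


# Hypothetical stage matrices (assumed values for illustration only).
H1 = [[1.0, 0.2, 5.0], [0.0, 1.1, -3.0], [0.001, 0.0, 1.0]]    # coarse stage
H2 = [[0.98, 0.0, 1.5], [0.01, 1.0, 0.5], [0.0, 0.0005, 1.0]]  # fine stage

# Combined transform: H1 is applied first, so it sits on the right.
H = matmul3(H2, H1)

pt = (120.0, 80.0)
two_step = apply_h(H2, apply_h(H1, pt))  # two successive mappings
one_step = apply_h(H, pt)                # single combined mapping
```

With the combined matrix, each pixel is resampled once instead of twice, which is one plausible reason to cascade the matrices rather than warp the image in two passes.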

(This article belongs to the Special Issue Editorial Board Members’ Collection Series: Optical Wireless Communication)

16 pages, 19022 KB

Open AccessArticle

A Scanning Focal-Point Method for Enhancing the Signal Stability of Laser-Induced Acoustic Communication

by Changfei Yang, Zhuang Liu, Jiuhe Wei, Shuwan Yu, Qiang Fu and Chao Wang

Optics 2026, 7(3), 44; https://doi.org/10.3390/opt7030044 - 18 Jun 2026

Abstract

Laser-induced acoustic communication is a highly adaptable cross-medium technique that combines the advantages of optical transmission through air and acoustic transmission underwater. However, poor signal stability at high repetition frequencies currently hinders its widespread application. To address this, this paper proposes an innovative scanning focal-point method to enhance stability. Traditional methods such as beam scanning, focus control, and distributed interaction are primarily aimed at enhancing sound pressure in a specific direction, achieving near-field/far-field focusing, or improving the signal-to-noise ratio through coherent synthesis of ultrasonic intensity. In contrast, the method proposed in this paper is intended to avoid the interference of droplets and vapor generated by single-point breakdown under high repetition frequencies, which would otherwise degrade the laser-acoustic conversion efficiency. It is therefore an active defense strategy specifically targeting the stability of laser-induced acoustic communication. First, optical simulation software was used to analyze the effects of surface ripples and bubbles on focal spot displacement and size. Next, a single-pulse experimental system was developed to measure the range and duration of surface depressions caused by optical breakdown. Finally, a scanning focal-point system was constructed for comparative experiments, with results recorded via hydrophones and high-speed cameras. The maximum laser-induced acoustic signal generated by the scanning focal-point method is 7.4 times that produced by single-point breakdown. The experimental results demonstrate that the scanning focal-point method can effectively avoid the influence of water surface disturbance and steam on the optoacoustic conversion efficiency and significantly improve the amplitude and stability of the laser-induced acoustic signal.

(This article belongs to the Section Laser Sciences and Technology)

19 pages, 6317 KB

Open AccessArticle

FDARC: Frequency-Aware and Depth Association Radar–Camera Fusion

by Huiwei Wang, Xiong Duan and Chi Zhang

Electronics 2026, 15(12), 2672; https://doi.org/10.3390/electronics15122672 - 16 Jun 2026

Abstract

Autonomous driving necessitates a robust 3D perception system that includes accurate object detection, tracking, and segmentation. While recent low-cost camera-based methods have demonstrated promising results, these systems are prone to performance degradation under poor lighting conditions or adverse weather, resulting in considerable localization errors. In this paper, we present a novel approach called Frequency-aware Depth Association Radar-Camera (FDARC) Fusion. This method aims to generate semantically rich and spatially accurate Bird’s-Eye-View (BEV) feature maps by integrating data from both camera and radar sensors. Initially, the image features are enhanced using frequency-aware techniques. Subsequently, these features are transformed into BEV representation with the assistance of depth information estimated from both sensor modalities and radar measurements. This process, known as Depth Association (DA), facilitates more precise BEV representations. Following this, a Temporal and Deformable Cross-Fusion (TDCF) layer is utilized to encode multi-modal feature maps into a unified space-time dimension representation. Extensive experiments conducted on the nuScenes dataset show that FDARC achieves state-of-the-art performance in 3D detection tasks: with a ResNet-50 backbone, it attains 53.5% nuScenes Detection Score (NDS) and 44.7% mean Average Precision (mAP) on the nuScenes val set, markedly outperforming baseline models.

20 pages, 694 KB

Open AccessArticle

A Joint-Level Hybrid Framework for Gait Analysis Using Camera–IMU Fusion and LSTM-Based Temporal Correction

by Eunju Ha and Jong-Wook Kim

Sensors 2026, 26(12), 3828; https://doi.org/10.3390/s26123828 - 16 Jun 2026

Abstract

Gait analysis is an essential tool in clinical domains for diagnosing musculoskeletal disorders and evaluating rehabilitation, yet traditional marker-based systems are limited by high costs and spatial constraints. To overcome these challenges, this study proposes and evaluates a joint-level hybrid framework that integrates a single RGB camera with two shoe-mounted inertial measurement units (IMUs) to leverage their complementary strengths. The camera-based module estimates hip and knee sagittal joint angles using 3D pose estimation, where the DEAS optimization algorithm aligns estimated coordinates with a humanoid model, and an LSTM-based refinement network corrects hip angles by referencing more accurately estimated knee data. Simultaneously, the IMU-based module estimates sagittal ankle angles through kinematic chain relationships that combine camera-derived proximal joint information with IMU-measured foot orientation. Experimental validation with 11 healthy participants in a controlled laboratory environment demonstrates promising estimation performance, achieving an average mean absolute error (MAE) of 7.89° and RMSE of 10.09° on the held-out test set across sagittal hip, knee, and ankle angles. Leave-one-subject-out (LOSO) cross-validation of the LSTM correction model further confirmed its generalizability, yielding an average MAE of 6.40° across bilateral hip angles. By accurately mitigating the trunk-inclination-induced overestimation of hip angles with a minimal sensor configuration (one camera and two IMUs), the proposed framework provides a practical and interpretable approach for portable lower limb gait analysis.
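
For readers unfamiliar with the two error metrics reported above, the following is a minimal sketch of how MAE and RMSE are conventionally computed for joint-angle trajectories. The angle values are made-up illustrative numbers, not data from the study.

```python
import math


def mae(estimated, reference):
    """Mean absolute error between two equal-length angle sequences (degrees)."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def rmse(estimated, reference):
    """Root-mean-square error between two equal-length angle sequences (degrees)."""
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(squared / len(reference))


# Hypothetical hip-angle samples in degrees (illustration only).
estimated_deg = [10.0, 20.0, 30.0, 40.0]
reference_deg = [12.0, 18.0, 33.0, 40.0]

hip_mae = mae(estimated_deg, reference_deg)
hip_rmse = rmse(estimated_deg, reference_deg)
```

RMSE is never smaller than MAE on the same data, and a large gap between the two suggests that a few frames carry large errors.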

(This article belongs to the Section Biomedical Sensors)

17 pages, 11451 KB

Open AccessArticle

A Real-World Benchmark for Early Wildfire Detection Using Sequential Data with the PyroNear Dataset

by Mateo Lostanlen, Nicolás Isla, José Guillén, Renzo Zanca, Félix Veith, Cristian Buc and Valentín Barriere

Electronics 2026, 15(12), 2652; https://doi.org/10.3390/electronics15122652 - 15 Jun 2026

Abstract

Early wildfire detection (EWD) is of the utmost importance to enable rapid response efforts and thus minimize the negative impacts of wildfire spread. To this end, we present PyroNear₂₀₂₅, a new dataset composed of both images and videos, allowing for the training and evaluation of smoke plume detection models, including sequential models. The data is sourced from the following: (i) web-scraped videos of wildfires from public networks of cameras for wildfire detection in-the-wild, (ii) videos from our in-house network of cameras, and (iii) a small portion of synthetic and real images. This dataset includes around 150,000 manual annotations on 50,000 images, covering 640 wildfires; PyroNear₂₀₂₅ surpasses existing datasets in size and diversity. It includes data from France, Spain, Chile, and the United States. We ran cross-dataset experiments using a lightweight state-of-the-art object detection model, similar to the ones used in real-world applications, and found that the proposed dataset is particularly challenging, with an F1 score of around 70%, but it is more stable than existing datasets. Its use in conjunction with other public datasets helps to reach higher results overall. In addition, the video part of the dataset enables another technical contribution, as it can be used to train a lightweight sequential model, improving global recall while maintaining precision for earlier detections. The output of this work has real-life implications, as it is used to automatically detect wildfires, with our models running on Raspberry Pi in several countries. We will make both our code and data available online.

(This article belongs to the Special Issue Innovations in Deep Learning and Computer Vision for Early Fire and Smoke Detection)

23 pages, 11767 KB

Open AccessReview

Digital Implant Position Recording in Complete-Arch Prostheses: Intraoral and Extraoral Techniques

by Erhan Dilber and Kübra Yıldız Domaniç

Prosthesis 2026, 8(6), 60; https://doi.org/10.3390/prosthesis8060060 - 15 Jun 2026

Abstract

Background/Objective: Accurate digital recording of implant position is essential for achieving passive fit and predictable outcomes in complete-arch implant-supported prostheses. However, complete-arch cases remain challenging because of increased inter-implant distances, limited anatomical landmarks, soft tissue mobility, scan body-related variables, and cumulative errors during data acquisition and file registration. This narrative review aims to evaluate current intraoral and extraoral digital implant position recording techniques from a clinical decision-making perspective. Methods: A structured narrative literature search was conducted in PubMed from database inception to 15 May 2026 and was supplemented by manual screening of reference lists of key systematic reviews and eligible articles. Systematic reviews, meta-analyses, clinical studies, comparative in vitro studies, dental technique articles, and clinical reports relevant to complete-arch digital implant position recording were considered. Higher-level and clinically relevant evidence was prioritized, whereas technique reports were included primarily for emerging workflows with limited clinical evidence. Results: Intraoral techniques include non-splinted and splinted scan body protocols, calibrated implant scan bodies, calibrated frameworks, and auxiliary reference strategies. These methods may be clinically efficient but remain sensitive to scan path, scanner technology, landmark availability, scan body design, implant distribution, and operator-related factors. Extraoral techniques include stereophotogrammetry, camera- or smartphone-assisted photogrammetric systems, reverse impression workflows, and laboratory scanner-based digitization. These approaches may reduce intraoral stitching errors in complex edentulous arches, but usually require complementary datasets for soft tissue morphology, prosthetic contours, antagonist dentition, and maxillomandibular relationships. 
Conclusions: Direct intraoral scanner (IOS) protocols may be appropriate in favorable complete-arch situations with accessible scan bodies, limited inter-implant distances, and stable reference geometry. In clinically demanding cases requiring greater cross-arch accuracy, stereophotogrammetry, intraoral photogrammetry, or calibrated scanning approaches may provide more controlled implant position recording. Reverse impression and model-based workflows are particularly useful when a verified interim prosthesis, verification jig, or cast-based reference is available. Regardless of the selected technique, accurate integration of implant coordinates with soft tissue, prosthetic contour, antagonist arch, and occlusal data remains essential.

(This article belongs to the Special Issue Advances in Digital Prosthodontics: Innovations in CAD-CAM Technology and Material Science)

16 pages, 23623 KB

Open AccessArticle

Deep Learning-Based Blood Segmentation and Temporal Characterization for the Robin Heart Surgical Robot

by Klaudia Senator, Dariusz Krawczyk and Zbigniew Nawrat

Surgeries 2026, 7(2), 70; https://doi.org/10.3390/surgeries7020070 - 15 Jun 2026

Abstract

Background/Objectives: In laparoscopic and robot-assisted surgery, bleeding may rapidly impair operative-field readability and procedural safety. In the broader Robin Heart teleoperation framework, interpretation of such events is relevant not only for scene understanding but also as a potential prerequisite for future safety-oriented supervisory functions under communication-degraded conditions. The aim of this study was to assess whether a deep learning model for blood segmentation could provide outputs suitable for preliminary image-level temporal characterization of visible blood-region behavior in laparoscopic video. Methods: A U-Net-based binary blood-segmentation model was implemented in-house in PyTorch and evaluated on three paired image–mask datasets: a simulated bleeding dataset prepared under controlled laboratory conditions, an internal operative laparoscopic dataset, and an external-domain subset derived from the public GynSurg dataset. Segmentation performance was assessed using 5-fold cross-validation and reported using the Dice coefficient and Intersection over Union (IoU). Training dynamics were analyzed using training and validation loss and Dice curves. Additional baseline comparisons were performed on the internal operative dataset using U-Net++ and DeepLabV3+. Temporal analysis was performed on selected video fragments, including a low-motion reference sequence without active bleeding progression, internal bleeding-related sequences, and external-domain sequences, using mask-derived descriptors and auxiliary optical-flow-based motion descriptors computed after camera-motion compensation within the detected blood-related ROI. Results: In 5-fold cross-validation, the U-Net-based model achieved Dice coefficient and IoU values of 0.915 ± 0.012 and 0.851 ± 0.019 on the simulated dataset, 0.856 ± 0.013 and 0.756 ± 0.025 on the internal operative dataset, and 0.707 ± 0.053 and 0.570 ± 0.056 on the external-domain GynSurg subset, respectively. 
On the internal operative dataset, the proposed model performed comparably to U-Net++ and slightly above DeepLabV3+ under the same cross-validation protocol. The temporal descriptor set differentiated low-motion reference behavior, more spatially coherent progression, rapid coherent expansion, and dynamic or motion-active progression profiles. Peak dA/dt reflected abrupt visible blood-area expansion, temporal IoU described mask stability over time, and optical-flow-based descriptors provided additional information on local motion activity within the detected blood-related ROI. Conclusions: The results support the feasibility of combining deep-learning-based blood segmentation with temporal and optical-flow-based descriptors for exploratory image-level characterization of visible blood-region behavior in laparoscopic video. Within the Robin Heart development pathway, such descriptors may, in the future, serve as candidate components of image-analysis support modules for safety-oriented teleoperative scenarios. At this stage, they should be interpreted as exploratory image-derived indicators rather than clinically validated markers of bleeding severity.
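
The segmentation metrics and mask-derived temporal descriptors named above can be sketched as follows. This is an illustrative reading, not the authors' implementation: masks are flattened lists of 0/1 pixels, dA/dt is taken as a simple frame-to-frame finite difference, and the frame rate is a hypothetical value.

```python
def dice(pred, target):
    """Dice coefficient of two binary masks given as flat lists of 0/1."""
    inter = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total


def iou(pred, target):
    """Intersection over Union of two binary masks given as flat lists of 0/1."""
    inter = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    return 1.0 if union == 0 else inter / union


def area_rate(masks, fps):
    """Finite-difference dA/dt (pixels per second) between consecutive masks."""
    areas = [sum(m) for m in masks]
    return [(b - a) * fps for a, b in zip(areas, areas[1:])]


def temporal_iou(masks):
    """IoU between each mask and the next one, as a mask-stability measure."""
    return [iou(a, b) for a, b in zip(masks, masks[1:])]


# Toy 4-pixel masks (illustration only); fps is an assumed frame rate.
pred, target = [1, 1, 0, 0], [1, 0, 1, 0]
sequence = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
rates = area_rate(sequence, fps=25)
stability = temporal_iou(sequence)
```

For a single pair of masks, the two overlap metrics are linked by IoU = Dice / (2 - Dice); averages over folds or images need not satisfy this identity exactly.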

(This article belongs to the Special Issue The Application of Artificial Intelligence in Surgical Procedures)

23 pages, 53841 KB

Open AccessArticle

UDF-3D: Uncertainty-Driven Decision-Level Fusion for Camera–LiDAR 3D Object Detection

by Chongyang Hu, Chuangye Di and Yanwei Liu

Appl. Sci. 2026, 16(12), 5983; https://doi.org/10.3390/app16125983 - 12 Jun 2026

Abstract

Camera and LiDAR provide highly complementary information, and effective fusion of both modalities is desirable for 3D object detection. However, existing decision-level fusion methods mainly rely on the confidence of objects while neglecting the object uncertainty. To address this, we propose UDF-3D, an uncertainty-driven camera–LiDAR decision-level fusion method based on Dempster–Shafer evidence theory. First, object uncertainty is quantified by introducing the theory of subjective logic, where subjective opinions incorporate category belief masses and an uncertainty mass. Second, a cost matrix is designed for object matching, where each element is a weighted combination of geometric and semantic information from both sensors, and the weights are determined by the uncertainty parameters. Third, we construct a view-frustum constraint to re-evaluate unmatched objects, thereby reducing the false-negative rate. Finally, we design a novel evidence discounting factor within the Dempster–Shafer framework for matched objects, thereby mitigating cross-modal object conflicts during fusion and improving detection accuracy. Experiments on the KITTI dataset demonstrate that the proposed method outperforms existing decision-level fusion approaches, yielding improved detection accuracy.
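
To make the evidence-fusion step more concrete, here is a minimal sketch of classical Shafer discounting followed by Dempster's rule of combination, for opinions whose mass sits on single classes plus one uncertainty mass. The abstract does not specify the paper's novel discounting factor, so the fixed `alpha` below is a stand-in assumption, and the class names and mass values are invented for illustration.

```python
THETA = "theta"  # mass assigned to the whole frame, i.e. the uncertainty mass


def discount(opinion, alpha):
    """Shafer discounting: scale class masses by alpha, move the rest to THETA."""
    out = {k: alpha * v for k, v in opinion.items() if k != THETA}
    out[THETA] = 1.0 - sum(out.values())
    return out


def combine(m1, m2):
    """Dempster's rule for masses on singleton classes plus THETA."""
    classes = (set(m1) | set(m2)) - {THETA}
    fused = {}
    for c in classes:
        a, b = m1.get(c, 0.0), m2.get(c, 0.0)
        # Agreement on c, or one source says c while the other is uncertain.
        fused[c] = a * b + a * m2[THETA] + m1[THETA] * b
    fused[THETA] = m1[THETA] * m2[THETA]
    total = sum(fused.values())  # equals 1 minus the conflict mass
    if total == 0.0:
        raise ValueError("sources are in total conflict")
    return {k: v / total for k, v in fused.items()}


# Hypothetical per-object opinions from each sensor (illustration only).
camera = {"car": 0.6, "pedestrian": 0.1, THETA: 0.3}
lidar = {"car": 0.7, "pedestrian": 0.1, THETA: 0.2}

fused = combine(camera, lidar)
fused_discounted = combine(discount(camera, alpha=0.5), lidar)
```

Discounting the camera opinion before combination lowers its influence on the fused result, which is the general mechanism by which a discounting factor can soften cross-modal conflicts.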

(This article belongs to the Section Computing and Artificial Intelligence)

25 pages, 11251 KB

Open AccessArticle

Adaptive Sensor Fusion for Robust Perception in Dense Fog: A Gated Vision and LiDAR Integration Framework

by Fengyuan Zhang, Zixuan Guo, Jianbo Ding, Jingyun Yang and Wenhe Liu

Sensors 2026, 26(12), 3728; https://doi.org/10.3390/s26123728 - 11 Jun 2026

Abstract

Autonomous driving systems face critical perception failures in dense fog, where conventional RGB cameras suffer from severe degradation due to atmospheric scattering and reduced visibility. This paper presents an adaptive multi-modal fusion framework that synergistically integrates gated imaging with 3D LiDAR point clouds to achieve robust obstacle detection under visibility conditions as low as 50 m. Unlike standard cameras that passively capture scattered ambient light, gated cameras employ time-synchronized active illumination to physically filter backscattered photons, preserving structural features even in low-visibility scenarios. We propose a novel Adaptive Feature-Weighting Network (AFW-Net) that dynamically adjusts sensor modality contributions based on real-time environmental degradation assessment. The framework incorporates three key innovations: (1) a cross-modal feature extraction module that exploits the complementary physical properties of gated imaging and LiDAR, (2) an attention-based adaptive fusion mechanism that quantifies per-modality reliability through uncertainty estimation, and (3) a degradation-aware training strategy using weather-specific augmentation. Extensive experiments on the Princeton Automated Driving Dataset demonstrate that our approach maintains detection average precision (AP) above 82% under dense fog conditions (50 m visibility), representing a 23.7% improvement over state-of-the-art RGB-LiDAR fusion methods that exhibit substantial performance degradation to 58.4% AP. Ablation studies validate the necessity of each component, and cross-dataset evaluation confirms the generalization capability of the proposed framework. The adaptive weighting mechanism proves particularly effective, dynamically rebalancing modality contributions across the gated imaging and LiDAR branches while maintaining LiDAR geometric constraints. 
This work establishes a robust perception paradigm for safety-critical autonomous systems operating in low-visibility environmental conditions.
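
One simple way to picture uncertainty-driven modality weighting is shown below. It is a sketch under stated assumptions, not AFW-Net itself: the weights are a softmax over negative per-modality uncertainty scores, the fused feature is their weighted sum, and all numbers are hypothetical.

```python
import math


def reliability_weights(uncertainties):
    """Softmax over negative uncertainties: lower uncertainty, higher weight."""
    scores = [-u for u in uncertainties]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def fuse(features, weights):
    """Weighted sum of equal-length feature vectors, one per modality."""
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


# Hypothetical uncertainty scores: [gated camera, LiDAR] in dense fog.
weights = reliability_weights([0.2, 1.5])
fused = fuse([[1.0, 0.0], [0.0, 1.0]], weights)
```

In this toy case the modality with the lower uncertainty score dominates the fused vector, which mirrors the rebalancing behavior the abstract describes at a high level.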

(This article belongs to the Section Radar Sensors)

33 pages, 6102 KB

Open AccessArticle

From Detection Toward Decision Support: A Hierarchical Visual–Sensor Framework for Zamioculcas Monitoring in Indoor Environments

by Raikhan Amanova, Baurzhan Belgibayev, Yersaiyn Mailybayev, Gulnur Kazbekova, Zhadyra Akanova, Galiya Mamankyzy, Marzhana Amanova, Artem Bykov, Periuza Pirniyazova and Nurzhigit Smailov

Computers 2026, 15(6), 382; https://doi.org/10.3390/computers15060382 - 11 Jun 2026

Abstract

This paper proposes a prototype-level hierarchical visual–sensor framework for monitoring the Zamioculcas houseplant in complex indoor environments and supporting adaptive care-mode selection. The proposed framework combines a two-level visual pipeline, consisting of YOLO-based target plant detection and MobileViT-S-based leaf-condition classification, with a Plant Health Index (PHI) and a rule-based decision-support module for integrating visual and IoT-derived indicators. For the detection task, YOLOv8, YOLO12, and YOLO26 were compared, with YOLO26 showing the most balanced performance among the evaluated implementations. To improve robustness in real indoor scenes, negative training samples were added; this reduced the image-level false alarm rate on an independent negative-scene test set from 50.7% to 10.0% and increased specificity from 49.3% to 90.0%. For the second visual level, MobileViT-S achieved an accuracy of 0.9857 and an F1-score of 0.9857 on the independent cropped leaf test subset. To reduce the dependence of this result on a single data split, an additional 5-fold cross-validation experiment was conducted on the full cropped leaf dataset of 847 images, resulting in an accuracy of 0.9858 ± 0.0068 and an F1-score of 0.9853 ± 0.0070. To further address plant-level generalization, an additional unseen-plant validation subset of 60 newly collected cropped leaf images was evaluated, and MobileViT-S achieved an accuracy of 0.9500 and an F1-score of 0.9499. These results support the stability of the leaf-condition classifier within the available data, although larger external validation with strict plant-level and session-level separation remains necessary. In addition, an Arduino-based module-level validation was conducted using a capacitive soil-moisture sensor to verify the proposed sensor-based and Vision–IoT decision rules. 
The experiment demonstrated that the rule-based layer can distinguish dry, normal, and wet soil states and select conservative care actions depending on both soil moisture and visual-condition input. A brief real-time camera–sensor communication test further confirmed that live camera input, Arduino-based soil-moisture sensing, PHI computation, and care-mode selection can be connected within one decision-support pipeline. The proposed PHI and care-mode selection module are therefore presented as a formalized decision-support layer rather than as a fully validated autonomous irrigation system. Further calibration, actuator integration, and closed-loop validation remain necessary before practical autonomous deployment.
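
The false alarm rate and specificity figures reported for the negative-scene test set are complementary, as the short sketch below shows. The image counts are hypothetical (a 1000-image set chosen only so the percentages come out cleanly); the abstract does not state the actual test-set size.

```python
def false_alarm_rate(false_positives, true_negatives):
    """Share of negative scenes in which the detector raised an alarm."""
    return false_positives / (false_positives + true_negatives)


def specificity(false_positives, true_negatives):
    """Share of negative scenes correctly left without a detection."""
    return true_negatives / (false_positives + true_negatives)


# Hypothetical (false positive, true negative) counts on 1000 negative images.
before_negatives = (507, 493)  # without negative training samples
after_negatives = (100, 900)   # with negative training samples

far_before = false_alarm_rate(*before_negatives)
spec_before = specificity(*before_negatives)
far_after = false_alarm_rate(*after_negatives)
spec_after = specificity(*after_negatives)
```

Because every negative image is either a false alarm or a correct rejection, the two rates always sum to one, consistent with the reported pairs 50.7% / 49.3% and 10.0% / 90.0%.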

(This article belongs to the Section Internet of Things (IoT) and Industrial IoT)

29 pages, 10118 KB

Open AccessArticle

A Unified Explainable Autonomous Driving Framework via Cross-Attention Scene Selection and Semantic–Object Fusion

by Habib Dhahri, Fahad Alotaibi, Awais Mahmood and Mousa Jari

Machines 2026, 14(6), 677; https://doi.org/10.3390/machines14060677 - 10 Jun 2026

Abstract

Intelligent autonomous driving systems must not only predict the appropriate driving manoeuvre but also provide human-interpretable evidence that justifies the decision. However, existing methods typically address these objectives separately, leading to three practical limitations: multi-stage perception-to-language pipelines can propagate upstream perception errors into downstream explanations; post hoc saliency methods often produce pixel-level highlights that are difficult to interpret semantically; and decoupled decision and explanation modules cannot guarantee that the explanation reflects the same scene evidence used for behaviour prediction. In this paper, we propose a unified framework that jointly performs vehicle behaviour prediction and human-centric interpretation from a shared visual backbone. Specifically, a hierarchical Swin Transformer encodes the driving scene into a sequence of spatial tokens, which are processed by two complementary branches. The first branch, termed the Object Selection Module (OSM), learns a compact scene-level semantic representation through query-guided cross-attention, while the second branch extracts a small set of class-agnostic object-centric tokens without requiring bounding-box or segmentation supervision. These two representations are subsequently integrated by a Semantic–Object Fusion (SOF) module based on scaled dot-product attention, residual connections, and a feed-forward network. The behaviour prediction head operates on the fused representation, whereas the interpretation head leverages the semantic representation through a skip connection to preserve decision-relevant context. For surround-view perception, learnable per-camera embeddings are introduced to maintain viewpoint identity with negligible additional parameter cost. Furthermore, a compact language model fine-tuned via Low-Rank Adaptation (LoRA) generates fluent, label-conditioned natural-language justifications. 
Extensive experiments on two public benchmarks, BDD-OIA and nu-AD, demonstrate that the proposed framework consistently delivers superior performance and provides effective, human-readable interpretations of driving decisions.
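
Query-guided cross-attention built on scaled dot-product attention, as mentioned for the OSM and SOF modules, follows a standard pattern that can be sketched in a few lines. This is the generic textbook operation with a single query and no learned projections, not the paper's architecture; the token values are invented.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def cross_attention(query, keys, values):
    """Single-query scaled dot-product attention over a set of tokens."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    weights = softmax(scores)
    out_dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(out_dim)]
    return output, weights


# Hypothetical query and three spatial tokens (keys with matching values).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

attended, attention = cross_attention(query, keys, values)
```

The attention weights show which tokens the query selected, which is the property that makes this kind of module usable as scene evidence for an explanation.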

(This article belongs to the Special Issue Intelligent Sensing, Planning and Control for Autonomous Ground Vehicles)

31 pages, 30018 KB

Open AccessArticle

Sensors-Driven Multimodal Deepfake Detection: A Cross-Attention Fusion Approach with Adaptive Modality Gating

by Syeda Sitara Waseem, Noman Shabbir, Syed Rizwan Hassan and KangYoon Lee

Sensors 2026, 26(12), 3695; https://doi.org/10.3390/s26123695 - 10 Jun 2026

Abstract

Deepfakes threaten sensor-based authentication systems, including biometric sensors, surveillance cameras, and IoT edge devices. Unimodal detectors remain vulnerable to modality-specific attacks. We propose a multimodal deepfake detection framework optimized for resource-constrained edge devices, featuring a novel cross-modal attention fusion mechanism with adaptive gating. The architecture combines enhanced Res2Net for audio, temporal 3D CNN with SE attention for video, and bidirectional cross-modal attention with quality-based gates. On our benchmark (5472 audio + 1842 video samples), the fusion model achieves 96.7% accuracy, 96.6% F1-score, 0.988 AUC-ROC, and 3.3% EER. Adversarial testing shows 92.3% accuracy under the Fast Gradient Sign Method (FGSM) attack. The model has a 30.3 MB footprint and runs at 20 FPS on edge hardware. Modality contribution analysis reveals adaptive weighting (72% audio for TTS forgery, 78% video for lip-synced attacks). Cross-dataset evaluation on FakeAVCeleb achieves 92.3% overall accuracy, confirming generalization. Full article
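As a rough illustration of quality-based modality gating of the kind this abstract describes, the sketch below turns two scalar quality scores into gates that sum to one and uses them to blend audio and video features. The function names, the softmax form of the gate, the `temperature` parameter, and the idea of a single scalar quality score per modality are assumptions for the example; the paper's actual gating mechanism may differ.

```python
import math

def modality_gates(audio_quality, video_quality, temperature=1.0):
    """Softmax over two quality scores; returns (audio_gate, video_gate)."""
    a = audio_quality / temperature
    v = video_quality / temperature
    m = max(a, v)  # subtract the maximum for numerical stability
    ea, ev = math.exp(a - m), math.exp(v - m)
    return ea / (ea + ev), ev / (ea + ev)

def gated_fusion(audio_feat, video_feat, audio_quality, video_quality):
    """Blend two equal-length feature vectors using the quality-based gates."""
    ga, gv = modality_gates(audio_quality, video_quality)
    return [ga * a + gv * v for a, v in zip(audio_feat, video_feat)]

# Equal quality gives an even blend; a higher audio score shifts weight to audio.
even = gated_fusion([1.0, 0.0], [0.0, 1.0], 1.0, 1.0)
audio_gate, video_gate = modality_gates(2.0, 0.0)
```

Under this assumed form, a degraded modality receives a smaller gate, which is one plausible way to obtain input-dependent weightings such as the 72% audio and 78% video contributions reported above.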

(This article belongs to the Special Issue Secure and Resilient Solutions for CCTV, Small Sensor and IoT Device Security)

27 pages, 22077 KB

Open AccessArticle

Reliability of Thermal Conduction-Based Melt Pool Simulations Using In-Process Thermal Camera and Post-Process Single-Track Measurements

by Matheus De Araujo Soares, Donatien Campion, Aurore Leclercq, Alena Kreitcberg and Vladimir Brailovski

Appl. Sci. 2026, 16(12), 5850; https://doi.org/10.3390/app16125850 - 10 Jun 2026

Viewed by 107

Abstract

Laser Powder Bed Fusion (LPBF) is a complex manufacturing process that depends on precise control of printing parameters and melt pool geometry, which directly influence defect formation and final part quality. This study evaluated the reliability of a simplified thermal conduction-based melt pool model by combining post-process metallographic analysis with in situ dual-wavelength infrared thermal imaging. Experimental data were obtained through single-track printing on 316L, IN625, and CoCr alloys across a wide range of parameters. The simulated melt pool length showed strong agreement with thermal camera measurements (R²_adj > 0.78), while the width showed moderate but consistent correlation (R²_adj > 0.52). For melt pool depth, the model systematically deviated due to its inability to capture keyhole melting, although a strong linear correlation was still observed (R²_adj > 0.86). Cross-validation between metallographic measurements and thermal imaging revealed only a 6–9% discrepancy, confirming the reliability of both methods and the potential of dual-wavelength cameras for industrial process monitoring. Overall, the model proves to be a reliable tool for predicting melt pool surface geometry specifically within the conduction melting regime, while its predictive capability degrades significantly in the keyhole regime, where simulated peak temperatures reach up to 7000 °C and melt pool depth errors escalate due to the disregard of recoil pressure, liquid and vapor dynamics. Full article
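For readers unfamiliar with the adjusted coefficient of determination quoted in this abstract, the sketch below computes R² and then applies the standard adjustment 1 − (1 − R²)(n − 1)/(n − p − 1) for n samples and p predictors. The function names and the toy numbers are illustrative assumptions; the abstract does not state the sample sizes or regression setup the authors used.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalise R^2 for the number of predictors used in the fit."""
    if n_samples <= n_predictors + 1:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Toy usage: a perfect fit, and an R^2 of 0.8 from 11 samples with 1 predictor.
perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
adjusted = adjusted_r_squared(0.8, 11, 1)
```

The adjustment matters most for small sample counts, where an unadjusted R² tends to overstate agreement.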

(This article belongs to the Special Issue Modelling and Simulation of Mechanical Properties for Additive Manufacturing Material)

18 pages, 8478 KB

Open AccessArticle

Machine Learning-Enabled Layer-Wise Melting Quality Recognition for Laser Powder Bed Fusion Process via In Situ Monitoring

by Yuan Liu, Bowei Zou, Zhizhou Zhang, Yongxing Zhang and Shiqing Huang

Materials 2026, 19(12), 2463; https://doi.org/10.3390/ma19122463 - 9 Jun 2026

Viewed by 196

Abstract

Laser powder bed fusion (L-PBF) has emerged as a core metal additive manufacturing technology for high-end sectors, including aerospace and medical device manufacturing. However, melting anomalies that occur during fabrication accumulate layer by layer, leading to degraded surface quality and impaired mechanical performance of as-built components, a critical bottleneck limiting their large-scale industrial adoption. Accurate and robust layer-wise melting quality recognition remains a challenge due to the complex surface morphologies induced by such melting anomalies. This study presents a machine learning-enabled in situ monitoring approach for layer-wise melting quality identification in L-PBF. By systematically varying laser power and scanning speed, 24 parameter combinations were designed to fabricate specimens with three distinct melting states: over-melting (OM), lack of fusion (LOF), and normal melting. A high-resolution complementary metal–oxide–semiconductor (CMOS) camera was used to capture layer-wise surface images of the specimens, and following abnormal layer filtering and manual validation, a high-quality dataset comprising 5110 layer-wise images was constructed. Two mainstream machine learning approaches were systematically evaluated and optimized for melting quality classification: a support vector machine (SVM) model leveraging handcrafted gray-level co-occurrence matrix (GLCM) texture features achieved a classification accuracy of 96.77%, while a convolutional neural network (CNN) model with end-to-end feature learning directly from raw images attained a superior accuracy of 98.14%. In terms of computational efficiency, the CNN model exhibited a faster inference speed with a per-layer inference time of just 0.036 s, nearly half that of the SVM model (0.068 s per layer). Most critically, the CNN model completely eliminated fatal cross-class misclassification between OM and LOF, an error mode common in the SVM model that would trigger erroneous process corrective actions in practical industrial applications. The findings demonstrate that image-based machine learning provides a reliable technical foundation for intelligent in situ monitoring of the L-PBF process. With its high accuracy, strong robustness, and superior computational efficiency, the CNN model can effectively support on-site operational decision-making, reduce material and time losses, and enhance process stability in industrial settings, thus exhibiting significant potential for practical engineering deployment. Full article
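The handcrafted GLCM texture features mentioned in this abstract can be illustrated with a small pure-Python sketch. It builds a normalised gray-level co-occurrence matrix for one pixel offset and derives three common descriptors (contrast, angular second moment, homogeneity). The function names, the single horizontal offset, the non-symmetric matrix, and the tiny two-level images are assumptions for illustration; the abstract does not specify which offsets, gray levels, or descriptors the authors fed to the SVM.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for pixel pairs separated by (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[n / total for n in row] for row in counts]

def glcm_features(p):
    """Contrast, angular second moment (ASM), and homogeneity of a GLCM."""
    idx = range(len(p))
    contrast = sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx)
    asm = sum(p[i][j] ** 2 for i in idx for j in idx)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx)
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}

# Horizontally uniform rows give zero contrast; vertical stripes give high contrast.
smooth = glcm_features(glcm([[0, 0], [1, 1]], levels=2))
striped = glcm_features(glcm([[0, 1], [0, 1]], levels=2))
```

In practice such descriptors are usually computed over several offsets and angles and concatenated into the feature vector passed to the classifier.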

(This article belongs to the Special Issue AI-Driven Modeling and Monitoring Towards Advanced Additive Manufacturing)

19 pages, 2281 KB

Open AccessArticle

Light Attention Encoder–Decoder for Cattle Body Segmentation and Body Weight Estimation

by Sahilpreet Singh Mann, Halah K. Shehada, Sabrina T. Amorim, Dong S. Ha, Gota Morota and Sook Shin

Animals 2026, 16(12), 1773; https://doi.org/10.3390/ani16121773 - 8 Jun 2026

Viewed by 200

Abstract

Accurate, non-invasive body weight estimation is essential for management and performance monitoring in beef cattle systems, yet conventional scales and manual measurements require animal handling, infrastructure, and labor. This study presents an integrated pipeline that segments cattle from overhead depth images and predicts body weight from extracted image features. The approach uses a Light Attention Encoder–Decoder (LAED) segmentation model combining depthwise separable convolutions, Gaussian Context Transformer (GCT) attention, a multi-scale dilated bottleneck, and dual heads for region and boundary prediction. Depth videos were collected using an overhead Intel RealSense D435 RGB-D camera from 60 beef heifers. To reduce animal-level leakage, leave-one-animal-out cross-validation was used for segmentation. LAED + GCT achieved 96.91% Dice (95% confidence interval (CI): 96.56–97.21%) and 94.22% IoU (95% CI: 93.58–94.77%), while operating at 33.08 frames per second. For weight prediction, biometric traits and deep features were evaluated using random forest, support vector regression, and fully connected neural networks. The best primary-metric body-weight model used biometric traits with support vector regression, achieving MAPE = 6.75%, pooled R² = 0.68, MAE = 23.92 kg, and RMSE = 31.79 kg. Among FCNN models trained independently within each cattle-level fold, the best result used ResNet50 features and achieved MAPE = 7.76%, a pooled R² = 0.56, an MAE = 27.60 kg, and an RMSE = 37.07 kg. The mean signed prediction bias for the biometric-SVR model was −1.04 kg, using predicted minus observed body weight, with a bootstrap 95% confidence interval of −9.63 to 7.41 kg. These results support the promise of overhead depth imaging for non-invasive cattle body segmentation and weight estimation, while larger external validation remains necessary. Full article
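The weight-prediction metrics reported in this abstract (MAPE, MAE, RMSE, R², and the mean signed bias defined as predicted minus observed) can be computed as in the sketch below. The function name and the toy weights are assumptions for illustration; "pooled" is taken here to mean that predictions from all cross-validation folds are concatenated before the metrics are computed, which the abstract does not spell out.

```python
import math

def regression_metrics(observed, predicted):
    """Error metrics for paired observed and predicted values (e.g., body weight in kg)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # predicted minus observed
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    bias = sum(errors) / n  # mean signed error
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "mape": mape, "bias": bias, "r2": r2}

# Toy usage with two hypothetical animals (values in kg).
metrics = regression_metrics([100.0, 200.0], [110.0, 190.0])
```

In the toy case one animal is over-predicted and the other under-predicted by the same amount, so the signed bias cancels to zero even though MAE and RMSE are both 10 kg, which is why the abstract reports bias separately from the absolute-error metrics.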

(This article belongs to the Section Animal Products)
