Search Results (749)

Search Parameters:
Keywords = frame fusion

15 pages, 2791 KB  
Article
Tagging Fluorescent Reporter to Epinecidin-1 Antimicrobial Peptide
by Sivakumar Jeyarajan, Harini Priya Ramesh, Atchyasri Anbarasu, Jayasudha Jayachandran and Anbarasu Kumarasamy
J 2025, 8(4), 42; https://doi.org/10.3390/j8040042 (registering DOI) - 2 Nov 2025
Abstract
In this study, we successfully cloned the fluorescent proteins eGFP and DsRed in-frame with the antimicrobial peptide epinecidin-1 (FIFHIIKGLFHAGKMIHGLV) at the N-terminal. The cloning strategy involved inserting the fluorescent reporters into the expression vector, followed by screening for positive clones through visual fluorescence detection and molecular validation. The visually identified fluorescent colonies were confirmed as positive by PCR and plasmid migration assays, indicating successful cloning. This fusion of fluorescent reporters with a short antimicrobial peptide enables real-time visualization and monitoring of the peptide’s mechanism of action on membranes and within cells, both in vivo and in vitro. The fusion of eGFP and DsRed to epinecidin-1 did not impair the expression or fluorescence of the reporter protein. Full article
(This article belongs to the Special Issue Feature Papers of J—Multidisciplinary Scientific Journal in 2025)
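The abstract hinges on the reporters being cloned "in frame" with the peptide. As a rough illustration only (not the authors' protocol, and with placeholder sequences), the sketch below checks that a fused coding sequence stays in one reading frame with no premature stop codon:

```python
# Minimal sketch (not the published protocol): verify that a reporter CDS and a
# peptide CDS would be fused in frame, i.e. the joint ORF has no internal stop
# codon and its length is a multiple of three. Sequences below are placeholders.

STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame(fusion_cds: str) -> bool:
    """Return True if the fused CDS keeps one reading frame with no premature stop."""
    if len(fusion_cds) % 3 != 0:
        return False
    codons = [fusion_cds[i:i + 3] for i in range(0, len(fusion_cds) - 3, 3)]
    return not any(c in STOP_CODONS for c in codons)  # terminal stop is ignored

reporter_cds = "ATGGTGAGCAAGGGCGAG"        # hypothetical 5' fragment of an eGFP CDS
epinecidin_cds = "TTCATCTTCCACATCATCAAA"   # hypothetical codons for the peptide N-terminus
print(in_frame(reporter_cds + epinecidin_cds + "TAA"))  # True
```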

37 pages, 11970 KB  
Review
Sensor-Centric Intelligent Systems for Soybean Harvest Mechanization in Challenging Agro-Environments of China: A Review
by Xinyang Gu, Zhong Tang and Bangzhui Wang
Sensors 2025, 25(21), 6695; https://doi.org/10.3390/s25216695 (registering DOI) - 2 Nov 2025
Abstract
Soybean–corn intercropping in the hilly–mountainous regions of Southwest China poses unique challenges to mechanized harvesting because of complex topography and agronomic constraints. Addressing the soybean-harvesting bottleneck in these fields requires advanced sensing and perception rather than purely mechanical redesigns. Prior reviews emphasized flat-terrain machinery or single-crop systems, leaving a gap in sensor-centric solutions for intercropping on steep, irregular plots. This review analyzes how sensors enable the next generation of intelligent harvesters by linking field constraints to perception and control. We frame the core failures of conventional machines—instability, inconsistent cutting, and low efficiency—as perception problems driven by low pod height, severe slope effects, and header–row mismatches. From this perspective, we highlight five fronts: (1) terrain-profiling sensors integrated with adaptive headers; (2) IMUs and inclination sensors for chassis stability and traction on slopes; (3) multi-sensor fusion of LiDAR and machine vision with AI for crop identification, navigation, and obstacle avoidance; (4) vision and spectral sensing for selective harvesting and impurity pre-sorting; and (5) acoustic/vibration sensing for low-damage, high-efficiency threshing and cleaning. We conclude that compact, intelligent machinery powered by sensing, data fusion, and real-time control is essential, while acknowledging technological and socio-economic barriers to deployment. This review outlines a sensor-driven roadmap for sustainable, efficient soybean harvesting in challenging terrains. Full article
(This article belongs to the Section Smart Agriculture)
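Front (2) above concerns IMUs and inclination sensors for chassis stability on slopes. A minimal, hypothetical sketch of one common approach — a complementary filter blending gyro and accelerometer pitch estimates — is shown below; it is illustrative and not drawn from the review:

```python
import math

def complementary_filter(pitch_prev, gyro_rate, accel_x, accel_z, dt, alpha=0.98):
    """One update step: blend the integrated gyro rate with the accelerometer pitch estimate."""
    pitch_gyro = pitch_prev + gyro_rate * dt                   # short-term, drift-prone
    pitch_accel = math.degrees(math.atan2(accel_x, accel_z))   # long-term, noisy
    return alpha * pitch_gyro + (1 - alpha) * pitch_accel

# toy stream: constant 10-degree slope seen by the accelerometer, slight gyro bias
pitch = 0.0
for _ in range(200):
    pitch = complementary_filter(pitch, gyro_rate=0.05,
                                 accel_x=math.sin(math.radians(10)),
                                 accel_z=math.cos(math.radians(10)), dt=0.01)
print(round(pitch, 2))  # converges toward ~10 degrees
```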

32 pages, 1307 KB  
Systematic Review
Machine and Deep Learning for Wetland Mapping and Bird-Habitat Monitoring: A Systematic Review of Remote-Sensing Applications (2015–April 2025)
by Marwa Zerrouk, Kenza Ait El Kadi, Imane Sebari and Siham Fellahi
Remote Sens. 2025, 17(21), 3605; https://doi.org/10.3390/rs17213605 (registering DOI) - 31 Oct 2025
Abstract
Wetlands, among the most productive ecosystems on Earth, shelter a diversity of species and help maintain ecological balance. However, they face growing anthropogenic and climatic threats, which underscores the need for regular, long-term monitoring. This study presents a systematic review of 121 peer-reviewed articles published between January 2015 and 30 April 2025 that applied machine learning (ML) and deep learning (DL) for wetland mapping and bird-habitat monitoring. Despite rising interest, applications remain fragmented, especially for avian habitats; only 39 studies considered birds, and fewer explicitly framed wetlands as bird habitats. Following PRISMA 2020 and the SPIDER framework, we compare data sources, classification methods, validation practices, geographic focus, and wetland types. ML is predominant overall, with random forest the most common baseline, while DL (e.g., U-Net and Transformer variants) is underused relative to its broader land-cover adoption. Where reported, DL shows a modest but consistent accuracy advantage over ML for complex wetland mapping, and this advantage grows when synthetic aperture radar (SAR) and optical data are fused. Validation still relies mainly on overall accuracy (OA) and the Kappa coefficient (κ), with limited class-wise metrics. Salt marshes and mangroves dominate thematically, and China geographically, whereas peatlands, urban marshes, tundra, and many regions (e.g., Africa and South America) remain underrepresented. Multi-source fusion is beneficial yet not routine; the combination of unmanned aerial vehicles (UAVs) and DL is promising for fine-scale avian micro-habitats but constrained by disturbance and labeling costs. We conclude with actionable recommendations to enable more robust and scalable monitoring. To our knowledge, this review is the first comparative synthesis of ML/DL methods applied to wetland mapping and bird-habitat monitoring, and it highlights the need for more diverse, transferable, and ecologically and socially integrated AI applications in this field. Full article
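The review notes that validation still leans on overall accuracy (OA) and the Kappa coefficient (κ). For reference, a small sketch computing both from a confusion matrix (toy numbers, not from any reviewed study):

```python
import numpy as np

def overall_accuracy_and_kappa(cm: np.ndarray):
    """Overall accuracy and Cohen's kappa from a square confusion matrix (rows = reference)."""
    n = cm.sum()
    oa = np.trace(cm) / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, kappa

# toy 3-class wetland map vs. reference (water, marsh, upland)
cm = np.array([[50, 3, 2],
               [4, 40, 6],
               [1, 5, 39]])
oa, kappa = overall_accuracy_and_kappa(cm)
print(f"OA = {oa:.3f}, kappa = {kappa:.3f}")  # OA = 0.860, kappa = 0.789
```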

28 pages, 2524 KB  
Article
A Multimodal Analysis of Automotive Video Communication Effectiveness: The Impact of Visual Emotion, Spatiotemporal Cues, and Title Sentiment
by Yawei He, Zijie Feng and Wen Liu
Electronics 2025, 14(21), 4200; https://doi.org/10.3390/electronics14214200 - 27 Oct 2025
Viewed by 206
Abstract
To quantify the communication effectiveness of automotive online videos, this study constructs a multimodal deep learning framework. Existing research often overlooks the intrinsic and interactive impact of textual and dynamic visual content. To bridge this gap, our framework conducts an integrated analysis of both the textual (titles) and visual (frames) dimensions of videos. For visual analysis, we introduce FER-MA-YOLO, a novel facial expression recognition model tailored to the demands of computational communication research. Enhanced with a Dense Growth Feature Fusion (DGF) module and a multiscale Dilated Attention Module (MDAM), it enables accurate quantification of on-screen emotional dynamics, which is essential for testing our hypotheses on user engagement. For textual analysis, we employ a BERT model to quantify the sentiment intensity of video titles. Applying this framework to 968 videos from the Bilibili platform, our regression analysis—which modeled four distinct engagement dimensions (reach, support, discussion, and interaction) separately, in addition to a composite effectiveness score—reveals several key insights: emotionally charged titles significantly boost user interaction; visually, the on-screen proportion of human elements positively predicts engagement, while excessively high visual information entropy weakens it. Furthermore, neutral expressions boost view counts, and happy expressions drive interaction. This study offers a multimodal computational framework that integrates textual and visual analysis and provides empirical, data-driven insights for optimizing automotive video content strategies, contributing to the growing application of computational methods in communication research. Full article
(This article belongs to the Special Issue Advances in Data-Driven Artificial Intelligence)
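A bare-bones sketch of the kind of regression described above — an engagement dimension modeled on title sentiment, on-screen human proportion, and visual entropy — using synthetic data that follows the signs reported in the abstract; the variable names and coefficients are illustrative, not the study's:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 968  # same order of magnitude as the study's sample; the data here is synthetic
title_sentiment = rng.uniform(0, 1, n)     # e.g. BERT sentiment intensity of the title
human_proportion = rng.uniform(0, 1, n)    # share of frame area containing people
visual_entropy = rng.uniform(0, 8, n)      # Shannon entropy of frame histograms

# synthetic "interaction" outcome with the signs reported in the abstract
interaction = (2.0 * title_sentiment + 1.5 * human_proportion
               - 0.3 * visual_entropy + rng.normal(0, 0.5, n))

X = np.column_stack([title_sentiment, human_proportion, visual_entropy])
model = LinearRegression().fit(X, interaction)
print(dict(zip(["title_sentiment", "human_proportion", "visual_entropy"],
               model.coef_.round(2))))
```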

21 pages, 5023 KB  
Article
Robust 3D Target Detection Based on LiDAR and Camera Fusion
by Miao Jin, Bing Lu, Gang Liu, Yinglong Diao, Xiwen Chen and Gaoning Nie
Electronics 2025, 14(21), 4186; https://doi.org/10.3390/electronics14214186 - 27 Oct 2025
Viewed by 455
Abstract
Autonomous driving relies on multimodal sensors to acquire environmental information for supporting decision making and control. While significant progress has been made in 3D object detection regarding point cloud processing and multi-sensor fusion, existing methods still suffer from shortcomings—such as sparse point clouds of foreground targets, fusion instability caused by fluctuating sensor data quality, and inadequate modeling of cross-frame temporal consistency in video streams—which severely restrict the practical performance of perception systems. To address these issues, this paper proposes a multimodal video stream 3D object detection framework based on reliability evaluation. Specifically, it dynamically perceives the reliability of each modal feature by evaluating the Region of Interest (RoI) features of cameras and LiDARs, and adaptively adjusts their contribution ratios in the fusion process accordingly. Additionally, a target-level semantic soft matching graph is constructed within the RoI region. Combined with spatial self-attention and temporal cross-attention mechanisms, the spatio-temporal correlations between consecutive frames are fully explored to achieve feature completion and enhancement. Verification on the nuScenes dataset shows that the proposed algorithm achieves an optimal performance of 67.3% and 70.6% in terms of the two core metrics, mAP and NDS, respectively—outperforming existing mainstream 3D object detection algorithms. Ablation experiments confirm that each module plays a crucial role in improving overall performance, and the algorithm exhibits better robustness and generalization in dynamically complex scenarios. Full article
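A minimal sketch of the reliability-weighted fusion idea described above: each modality's RoI feature is scored, and softmax weights set the contribution ratios. Class and layer names are hypothetical; this is not the paper's implementation:

```python
import torch
import torch.nn as nn

class ReliabilityWeightedFusion(nn.Module):
    """Hypothetical sketch: score each modality's RoI feature, fuse with softmax weights."""
    def __init__(self, dim: int):
        super().__init__()
        self.cam_score = nn.Linear(dim, 1)    # predicts camera-feature reliability
        self.lidar_score = nn.Linear(dim, 1)  # predicts LiDAR-feature reliability

    def forward(self, cam_feat, lidar_feat):
        scores = torch.cat([self.cam_score(cam_feat), self.lidar_score(lidar_feat)], dim=-1)
        w = torch.softmax(scores, dim=-1)     # adaptive contribution ratios
        return w[..., :1] * cam_feat + w[..., 1:] * lidar_feat

fusion = ReliabilityWeightedFusion(dim=256)
cam, lidar = torch.randn(8, 256), torch.randn(8, 256)
print(fusion(cam, lidar).shape)  # torch.Size([8, 256])
```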

15 pages, 704 KB  
Article
PMRVT: Parallel Attention Multilayer Perceptron Recurrent Vision Transformer for Object Detection with Event Cameras
by Zishi Song, Jianming Wang, Yongxin Su, Yukuan Sun and Xiaojie Duan
Sensors 2025, 25(21), 6580; https://doi.org/10.3390/s25216580 - 25 Oct 2025
Viewed by 871
Abstract
Object detection in high-speed and dynamic environments remains a core challenge in computer vision. Conventional frame-based cameras often suffer from motion blur and high latency, while event cameras capture brightness changes asynchronously with microsecond resolution, high dynamic range, and ultra-low latency, offering a promising alternative. Despite these advantages, existing event-based detection methods still suffer from high computational cost, limited temporal modeling, and unsatisfactory real-time performance. We present PMRVT (Parallel Attention Multilayer Perceptron Recurrent Vision Transformer), a unified framework that systematically balances early-stage efficiency, enriched spatial expressiveness, and long-horizon temporal consistency. This balance is achieved through a hybrid hierarchical backbone, a Parallel Attention Feature Fusion (PAFF) mechanism with coordinated dual-path design, and a temporal integration strategy, jointly ensuring strong accuracy and real-time performance. Extensive experiments on Gen1 and 1 Mpx datasets show that PMRVT achieves 48.7% and 48.6% mAP with inference latencies of 7.72 ms and 19.94 ms, respectively. Compared with state-of-the-art methods, PMRVT improves accuracy by 1.5 percentage points (pp) and reduces latency by 8%, striking a favorable balance between accuracy and speed and offering a reliable solution for real-time event-based vision applications. Full article
(This article belongs to the Section Intelligent Sensors)
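A rough PyTorch sketch of a parallel attention/MLP dual-path block in the spirit of the PAFF mechanism described above; the published module's wiring may differ, and all names here are illustrative:

```python
import torch
import torch.nn as nn

class ParallelAttentionMLPBlock(nn.Module):
    """Rough sketch of a dual-path block: self-attention and an MLP run in parallel
    on the same tokens and their outputs are summed. Not the published PAFF code."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x):                       # x: (batch, tokens, dim)
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)        # global path
        return x + attn_out + self.mlp(h)       # local path fused by summation

block = ParallelAttentionMLPBlock(dim=128)
print(block(torch.randn(2, 64, 128)).shape)     # torch.Size([2, 64, 128])
```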

14 pages, 7476 KB  
Article
Development of 3D-Stacked 1Megapixel Dual-Time-Gated SPAD Image Sensor with Simultaneous Dual Image Output Architecture for Efficient Sensor Fusion
by Kazuma Chida, Kazuhiro Morimoto, Naoki Isoda, Hiroshi Sekine, Tomoya Sasago, Yu Maehashi, Satoru Mikajiri, Kenzo Tojima, Mahito Shinohara, Ayman T. Abdelghafar, Hiroyuki Tsuchiya, Kazuma Inoue, Satoshi Omodani, Alice Ehara, Junji Iwata, Tetsuya Itano, Yasushi Matsuno, Katsuhito Sakurai and Takeshi Ichikawa
Sensors 2025, 25(21), 6563; https://doi.org/10.3390/s25216563 - 24 Oct 2025
Viewed by 351
Abstract
Sensor fusion is crucial in numerous imaging and sensing applications. Integrating data from multiple sensors with different field-of-view, resolution, and frame timing poses substantial computational overhead. Time-gated single-photon avalanche diode (SPAD) image sensors have been developed to support multiple sensing modalities and mitigate this issue, but mismatched frame timing remains a challenge. Dual-time-gated SPAD image sensors, which can capture dual images simultaneously, have also been developed. However, the reported sensors suffered from medium-to-large pixel pitch, limited resolution, and inability to independently control the exposure time of the dual images, which restricts their applicability. In this paper, we introduce a 5 µm-pitch, 3D-backside-illuminated (BSI) 1Megapixel dual-time-gated SPAD image sensor enabling a simultaneous output of dual images. The developed SPAD image sensor is verified to operate as an RGB-Depth (RGB-D) sensor without complex image alignment. In addition, a novel high dynamic range (HDR) technique, utilizing pileup effect with two parallel in-pixel memories, is validated for dynamic range extension in 2D imaging, achieving a dynamic range of 119.5 dB. The proposed architecture provides dual image output with the same field-of-view, resolution, and frame timing, and is promising for efficient sensor fusion. Full article
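For context on the 119.5 dB figure: dynamic range in decibels is 20 × log10 of the ratio between the largest and smallest resolvable signal. The ratio below is back-calculated for illustration and is not taken from the paper:

```python
import math

def dynamic_range_db(max_signal: float, min_signal: float) -> float:
    """Dynamic range of an image sensor expressed in decibels."""
    return 20 * math.log10(max_signal / min_signal)

# a ratio of roughly 9.4e5 between largest and smallest detectable signal
print(round(dynamic_range_db(9.44e5, 1.0), 1))  # ~119.5 dB, the figure reported above
```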

24 pages, 3366 KB  
Article
Study of the Optimal YOLO Visual Detector Model for Enhancing UAV Detection and Classification in Optoelectronic Channels of Sensor Fusion Systems
by Ildar Kurmashev, Vladislav Semenyuk, Alberto Lupidi, Dmitriy Alyoshin, Liliya Kurmasheva and Alessandro Cantelli-Forti
Drones 2025, 9(11), 732; https://doi.org/10.3390/drones9110732 - 23 Oct 2025
Viewed by 583
Abstract
The rapid spread of unmanned aerial vehicles (UAVs) has created new challenges for airspace security, as drones are increasingly used for surveillance, smuggling, and potentially for attacks near critical infrastructure. A key difficulty lies in reliably distinguishing UAVs from visually similar birds in electro-optical surveillance channels, where complex backgrounds and visual noise often increase false alarms. To address this, we investigated recent YOLO architectures and developed an enhanced model named YOLOv12-ADBC, incorporating an adaptive hierarchical feature integration mechanism to strengthen multi-scale spatial fusion. This architectural refinement improves sensitivity to subtle inter-class differences between drones and birds. A dedicated dataset of 7291 images was used to train and evaluate five YOLO versions (v8–v12), together with the proposed YOLOv12-ADBC. Comparative experiments demonstrated that YOLOv12-ADBC achieved the best overall performance, with precision = 0.892, recall = 0.864, mAP50 = 0.881, mAP50–95 = 0.633, and per-class accuracy reaching 96.4% for drones and 80% for birds. In inference tests on three video sequences simulating realistic monitoring conditions, YOLOv12-ADBC consistently outperformed baselines, achieving a detection accuracy of 92.1–95.5% and confidence levels up to 88.6%, while maintaining real-time processing at 118–135 frames per second (FPS). These results demonstrate that YOLOv12-ADBC not only surpasses previous YOLO models but also offers strong potential as the optical module in multi-sensor fusion frameworks. Its integration with radar, RF, and acoustic channels is expected to further enhance system-level robustness, providing a practical pathway toward reliable UAV detection in modern airspace protection systems. Full article
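The drone/bird metrics above rest on IoU-matched detections. A compact, generic sketch of precision and recall at an IoU threshold of 0.5 (toy boxes, not the study's data or evaluation code):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def precision_recall(preds, gts, thr=0.5):
    """Greedy one-to-one matching of predictions to ground truth at an IoU threshold."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= thr:
                matched.add(i); tp += 1; break
    return tp / max(len(preds), 1), tp / max(len(gts), 1)

preds = [(10, 10, 50, 50), (200, 200, 240, 240)]   # toy "drone" detections
gts = [(12, 12, 52, 52)]                           # one annotated drone
print(precision_recall(preds, gts))                # (0.5, 1.0)
```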

23 pages, 4442 KB  
Article
Efficient and Lightweight LD-SAGE Model for High-Accuracy Leaf Disease Segmentation in Understory Ginseng
by Yanlei Xu, Ziyuan Yu, Dongze Wang, Chao Liu, Zhen Lu, Chen Zhao and Yang Zhou
Agronomy 2025, 15(11), 2450; https://doi.org/10.3390/agronomy15112450 - 22 Oct 2025
Viewed by 217
Abstract
Understory ginseng, with superior quality compared to field-cultivated varieties, is highly susceptible to diseases, which negatively impact both its yield and quality. Therefore, this paper proposes a lightweight, high-precision leaf spot segmentation model, Lightweight DeepLabv3+ with a StarNet Backbone and Attention-guided Gaussian Edge Enhancement (LD-SAGE). This study first introduces StarNet into the DeepLabv3+ framework to replace the Xception backbone, reducing the parameter count and computational complexity. Secondly, the Gaussian-Edge Channel Fusion module uses multi-scale Gaussian convolutions to smooth blurry areas, combining Scharr edge-enhanced features with a lightweight channel attention mechanism for efficient edge and semantic feature integration. Finally, the proposed Multi-scale Attention-guided Context Modulation module replaces the traditional Atrous Spatial Pyramid Pooling. It integrates Multi-scale Grouped Dilated Convolution, Convolutional Multi-Head Self-Attention, and dynamic modulation fusion. This reduces computational costs and improves the model’s ability to capture contextual information and texture details in disease areas. Experimental results show that the LD-SAGE model achieves an mIoU of 92.48%, outperforming other models in terms of precision and recall. The model’s parameter count is only 4.6% of the original, with GFLOPs reduced to 22.1% of the baseline model. Practical deployment experiments on the Jetson Orin Nano device further confirm the advantage of the proposed method in the real-time frame rate, providing support for the diagnosis of leaf diseases in understory ginseng. Full article
(This article belongs to the Section Pest and Disease Management)
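An OpenCV sketch of the ingredients named for the Gaussian-Edge Channel Fusion module — multi-scale Gaussian smoothing plus Scharr edge magnitudes stacked as features. It omits the channel attention step and is not the published code:

```python
import cv2
import numpy as np

def gaussian_edge_features(gray: np.ndarray) -> np.ndarray:
    """Stack multi-scale Gaussian-smoothed maps with a Scharr edge-magnitude map."""
    scales = [cv2.GaussianBlur(gray, (k, k), 0) for k in (3, 5, 7)]
    gx = cv2.Scharr(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Scharr(gray, cv2.CV_32F, 0, 1)
    edges = cv2.magnitude(gx, gy)
    return np.dstack(scales + [edges])   # H x W x 4 feature stack

gray = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # stand-in leaf image
print(gaussian_edge_features(gray).shape)  # (128, 128, 4)
```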

26 pages, 18261 KB  
Article
Fully Autonomous Real-Time Defect Detection for Power Distribution Towers: A Small Target Defect Detection Method Based on YOLOv11n
by Jingtao Zhang, Siwen Chen, Wei Wang and Qi Wang
Sensors 2025, 25(20), 6445; https://doi.org/10.3390/s25206445 - 18 Oct 2025
Viewed by 504
Abstract
Drones offer a promising solution for automating distribution tower inspection, but real-time defect detection remains challenging due to limited computational resources and the small size of critical defects. This paper proposes TDD-YOLO, an optimized model based on YOLOv11n, which enhances small defect detection through four key improvements: (1) SPD-Conv preserves fine-grained details, (2) CBAM amplifies defect salience, (3) BiFPN enables efficient multi-scale fusion, and (4) a dedicated high-resolution detection head improves localization precision. Evaluated on a custom dataset, TDD-YOLO achieves an mAP@0.5 of 0.873, outperforming the baseline by 3.9%. When deployed on a Jetson Orin Nano at 640 × 640 resolution, the system achieves an average frame rate of 28 FPS, demonstrating its practical viability for real-time autonomous inspection. Full article
(This article belongs to the Section Electronic Sensors)
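A simplified sketch of the space-to-depth idea behind SPD-Conv, improvement (1) above: each 2×2 spatial block is folded into channels before a convolution, so downsampling discards no pixels. Not the repository implementation:

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth followed by a convolution: downsample without discarding pixels,
    which helps preserve the fine detail that small defects depend on."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.conv = nn.Conv2d(4 * c_in, c_out, kernel_size=3, padding=1)

    def forward(self, x):
        # rearrange each 2x2 spatial block into channels (H, W halved, C x4)
        tl, tr = x[..., ::2, ::2], x[..., ::2, 1::2]
        bl, br = x[..., 1::2, ::2], x[..., 1::2, 1::2]
        return self.conv(torch.cat([tl, tr, bl, br], dim=1))

layer = SPDConv(c_in=64, c_out=128)
print(layer(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 128, 40, 40])
```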

22 pages, 1678 KB  
Article
Image Completion Network Considering Global and Local Information
by Yubo Liu, Ke Chen and Alan Penn
Buildings 2025, 15(20), 3746; https://doi.org/10.3390/buildings15203746 - 17 Oct 2025
Viewed by 278
Abstract
Accurate depth image inpainting in complex urban environments remains a critical challenge due to occlusions, reflections, and sensor limitations, which often result in significant data loss. We propose a hybrid deep learning framework that explicitly combines local and global modelling through Convolutional Neural Networks (CNNs) and Transformer modules. The model employs a multi-branch parallel architecture, where the CNN branch captures fine-grained local textures and edges, while the Transformer branch models global semantic structures and long-range dependencies. We introduce an optimized attention mechanism, Agent Attention, which differs from existing efficient/linear attention methods by using learnable proxy tokens tailored for urban scene categories (e.g., façades, sky, ground). A content-guided dynamic fusion module adaptively combines multi-scale features to enhance structural alignment and texture recovery. The framework is trained with a composite loss function incorporating pixel accuracy, perceptual similarity, adversarial realism, and structural consistency. Extensive experiments on the Paris StreetView dataset demonstrate that the proposed method achieves state-of-the-art performance, outperforming existing approaches in PSNR, SSIM, and LPIPS metrics. The study highlights the potential of multi-scale modeling for urban depth inpainting and discusses challenges in real-world deployment, ethical considerations, and future directions for multimodal integration. Full article
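A speculative sketch of attention routed through learnable proxy (agent) tokens, the idea named above; the paper's Agent Attention may differ in detail, and all names here are illustrative:

```python
import torch
import torch.nn as nn

class ProxyTokenAttention(nn.Module):
    """Sketch of attention routed through a small set of learnable proxy (agent) tokens:
    tokens attend to proxies and proxies attend to tokens, reducing the quadratic cost.
    Illustrative only; not the paper's Agent Attention implementation."""
    def __init__(self, dim: int, n_proxies: int = 8, heads: int = 4):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(1, n_proxies, dim))
        self.gather = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.broadcast = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                  # x: (batch, tokens, dim)
        p = self.proxies.expand(x.size(0), -1, -1)
        summary, _ = self.gather(p, x, x)                  # proxies summarize the scene
        out, _ = self.broadcast(x, summary, summary)       # tokens read the summary back
        return x + out

attn = ProxyTokenAttention(dim=96)
print(attn(torch.randn(2, 1024, 96)).shape)  # torch.Size([2, 1024, 96])
```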

19 pages, 2109 KB  
Article
SF6 Leak Detection in Infrared Video via Multichannel Fusion and Spatiotemporal Features
by Zhiwei Li, Xiaohui Zhang, Zhilei Xu, Yubo Liu and Fengjuan Zhang
Appl. Sci. 2025, 15(20), 11141; https://doi.org/10.3390/app152011141 - 17 Oct 2025
Viewed by 235
Abstract
With the development of infrared imaging technology and the integration of intelligent algorithms, the realization of non-contact, dynamic and real-time detection of SF6 gas leakage based on infrared video has been a significant research direction. However, the existing real-time detection algorithms exhibit low accuracy in detecting SF6 leakage and are susceptible to noise, which makes it difficult to meet the actual needs of engineering. To address this problem, this paper proposes a real-time SF6 leakage detection method, VGEC-Net, based on multi-channel fusion and spatiotemporal feature extraction. The proposed method first employs the ViBe-GMM algorithm to extract foreground masks, which are then fused with infrared images to construct a dual-channel input. In the backbone network, a CE-Net structure—integrating CBAM and ECA-Net—is combined with the P3D network to achieve efficient spatiotemporal feature extraction. A Feature Pyramid Network (FPN) and a temporal Transformer module are further integrated to enhance multi-scale feature representation and temporal modeling, thereby significantly improving the detection performance for small-scale targets. Experimental results demonstrate that VGEC-Net achieves a mean average precision (mAP) of 61.7% on the dataset used in this study, with a mAP@50 of 87.3%, which represents a significant improvement over existing methods. These results validate the effectiveness and advancement of the proposed method for infrared video-based gas leakage detection. Furthermore, the model achieves 78.2 frames per second (FPS) during inference, demonstrating good real-time processing capability while maintaining high detection accuracy, exhibiting strong application potential. Full article
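A minimal sketch of the dual-channel input construction described above, with OpenCV's MOG2 background subtractor standing in for the paper's ViBe-GMM stage (an assumption made for availability, not the authors' code):

```python
import cv2
import numpy as np

# MOG2 used here as a readily available stand-in for the paper's ViBe-GMM stage
subtractor = cv2.createBackgroundSubtractorMOG2(history=100, varThreshold=16)

def dual_channel_input(ir_frame: np.ndarray) -> np.ndarray:
    """Stack the infrared frame with its foreground mask into a 2-channel input."""
    mask = subtractor.apply(ir_frame)                       # 0/255 foreground mask
    return np.dstack([ir_frame, mask]).astype(np.float32) / 255.0  # H x W x 2

for _ in range(10):                                         # feed a toy infrared stream
    frame = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
    x = dual_channel_input(frame)
print(x.shape)  # (240, 320, 2)
```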

20 pages, 1288 KB  
Article
Spatio-Temporal Residual Attention Network for Satellite-Based Infrared Small Target Detection
by Yan Chang, Decao Ma, Qisong Yang, Shaopeng Li and Daqiao Zhang
Remote Sens. 2025, 17(20), 3457; https://doi.org/10.3390/rs17203457 - 16 Oct 2025
Viewed by 323
Abstract
With the development of infrared remote sensing technology and the deployment of satellite constellations, infrared video from orbital platforms is playing an increasingly important role in airborne target surveillance. However, due to the limitations of remote sensing imaging, the aerial targets in such videos are often small in scale, low in contrast, and slow in movement, making them difficult to detect against complex backgrounds. In this paper, we propose a novel detection network that integrates inter-frame residual guidance with spatio-temporal feature enhancement to address the challenge of small object detection in infrared satellite video. The method first extracts residual features to highlight motion-sensitive regions, then uses a dual-branch structure to encode spatial semantics and temporal evolution, and finally fuses them deeply through a multi-scale feature enhancement module. Extensive experiments show that this method outperforms mainstream methods on various infrared small-target video datasets and exhibits good robustness under low-signal-to-noise-ratio conditions. Full article
(This article belongs to the Section AI Remote Sensing)
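A tiny sketch of the inter-frame residual guidance idea: absolute differences between consecutive frames highlight motion-sensitive regions while the static background largely cancels. Illustrative only, not the paper's pipeline:

```python
import numpy as np

def residual_sequence(frames: np.ndarray) -> np.ndarray:
    """Absolute inter-frame residuals: slow-moving small targets show up as faint
    localized differences while the static background largely cancels out."""
    frames = frames.astype(np.float32)
    return np.abs(frames[1:] - frames[:-1])

clip = np.random.randint(0, 256, (8, 256, 256), dtype=np.uint8)  # toy T x H x W infrared clip
res = residual_sequence(clip)
print(res.shape)  # (7, 256, 256) motion-sensitive maps fed to the detection branches
```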

17 pages, 4777 KB  
Article
Robust Occupant Behavior Recognition via Multimodal Sequence Modeling: A Comparative Study for In-Vehicle Monitoring Systems
by Jisu Kim and Byoung-Keon D. Park
Sensors 2025, 25(20), 6323; https://doi.org/10.3390/s25206323 - 13 Oct 2025
Viewed by 419
Abstract
Understanding occupant behavior is critical for enhancing safety and situational awareness in intelligent transportation systems. This study investigates multimodal occupant behavior recognition using sequential inputs extracted from 2D pose, 2D gaze, and facial movements. We conduct a comprehensive comparative study of three distinct architectural paradigms: a static Multi-Layer Perceptron (MLP), a recurrent Long Short-Term Memory (LSTM) network, and an attention-based Transformer encoder. All experiments are performed on the large-scale Occupant Behavior Classification (OBC) dataset, which contains approximately 2.1 million frames across 79 behavior classes collected in a controlled, simulated environment. Our results demonstrate that temporal models significantly outperform the static baseline. The Transformer model, in particular, emerges as the superior architecture, achieving a state-of-the-art Macro F1 score of 0.9570 with a configuration of a 50-frame span and a step size of 10. Furthermore, our analysis reveals that the Transformer provides an excellent balance between high performance and computational efficiency. These findings demonstrate the superiority of attention-based temporal modeling with multimodal fusion and provide a practical framework for developing robust and efficient in-vehicle occupant monitoring systems. Implementation code and supplementary resources are available (see Data Availability Statement). Full article
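A small sketch of the sequence construction implied by the reported configuration — a 50-frame span with a step of 10 — applied to a toy per-frame feature stream; the feature dimension here is made up:

```python
import numpy as np

def sliding_windows(features: np.ndarray, span: int = 50, step: int = 10) -> np.ndarray:
    """Cut a per-frame feature stream (T x D) into overlapping windows (N x span x D),
    matching the 50-frame span / step-10 configuration reported above."""
    starts = range(0, len(features) - span + 1, step)
    return np.stack([features[s:s + span] for s in starts])

stream = np.random.randn(2000, 96)      # toy pose + gaze + face feature vectors per frame
windows = sliding_windows(stream)
print(windows.shape)                    # (196, 50, 96) sequences for the Transformer encoder
```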

40 pages, 4388 KB  
Article
Optimized Implementation of YOLOv3-Tiny for Real-Time Image and Video Recognition on FPGA
by Riccardo Calì, Laura Falaschetti and Giorgio Biagetti
Electronics 2025, 14(20), 3993; https://doi.org/10.3390/electronics14203993 - 12 Oct 2025
Viewed by 597
Abstract
In recent years, the demand for efficient neural networks in embedded contexts has grown, driven by the need for real-time inference with limited resources. While GPUs offer high performance, their size, power consumption, and cost often make them unsuitable for constrained or large-scale applications. FPGAs have therefore emerged as a promising alternative, combining reconfigurability, parallelism, and increasingly favorable cost–performance ratios. They are especially relevant in domains such as robotics, IoT, and autonomous drones, where rapid sensor fusion and low power consumption are critical. This work presents the full implementation of a neural network on a low-cost FPGA, targeting real-time image and video recognition for drone applications. The workflow included training and quantizing a YOLOv3-Tiny model with Brevitas and PyTorch, converting it into hardware logic using the FINN framework, and optimizing the hardware design to maximize use of the reprogrammable silicon area and inference time. A custom driver was also developed to allow the device to operate as a TPU. The resulting accelerator, deployed on a Xilinx Zynq-7020, could recognize 208 frames per second (FPS) when running at a 200 MHz clock frequency, while consuming only 2.55 W. Compared to Google’s Coral Edge TPU, the system offers similar inference speed with greater flexibility, and outperforms other FPGA-based approaches in the literature by a factor of three to seven in terms of FPS/W. Full article
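The closing comparison uses FPS per watt as the figure of merit. A quick check with the reported numbers (the competing design's values below are hypothetical placeholders):

```python
def fps_per_watt(fps: float, watts: float) -> float:
    """Energy-efficiency figure of merit used in the comparison above."""
    return fps / watts

zynq = fps_per_watt(208, 2.55)          # numbers reported for the Zynq-7020 accelerator
print(round(zynq, 1))                    # ~81.6 FPS/W
# a competing design at, say, 60 FPS and 5 W (hypothetical) would give 12 FPS/W,
# consistent in scale with the reported three-to-seven-fold advantage
print(round(zynq / fps_per_watt(60, 5), 1))
```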
