Search Results (527)

Search Parameters:
Keywords = sparse convolutional network

24 pages, 2660 KB  
Article
SpaA: A Spatial-Aware Network for 3D Object Detection from LiDAR Point Clouds
by Jianfeng Song, Chu Zhang, Cheng Zhang, Li Song, Ruobin Wang and Kun Xie
Remote Sens. 2026, 18(8), 1104; https://doi.org/10.3390/rs18081104 - 8 Apr 2026
Abstract
Grid-based 3D object detection methods effectively leverage mature point cloud processing techniques and convolutional neural networks for feature extraction and object localization. However, unlike the 2D object detection domain, the unique characteristics of point cloud data being unevenly and sparsely distributed in space necessitate that detection networks possess a certain level of spatial structural perception. Learning spatial information such as point cloud density and distribution patterns can significantly benefit 3D detection networks. This paper proposes a Spatial-aware Network for 3D object detection (SpaA). Based on the 3D sparse convolution network, we designed a Variable Sparse Convolution network (VS-Conv) capable of perceiving the importance of locations. To address the issue of set abstraction operations completely ignoring spatial structure during local feature aggregation, we proposed a Spatial-aware Density-based Local Aggregation (SDLA) method. Experiments demonstrate that enhancing the spatial-awareness capability of detection networks is crucial for complex 3D object detection. Detection results on the KITTI dataset validate the effectiveness of our method. The test set results of SpaA achieved 3D AP values of 82.20%, 44.04%, and 70.34% for the Car, Pedestrian, and Cyclist categories, respectively, and a competitive 3D mAP of 67.23%, outperforming several published methods. Full article
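The sparse 3D convolutions that SpaA builds on output features only at occupied voxel sites, so empty space costs nothing. As a minimal sketch of that idea (a toy submanifold convolution over a hash map of occupied voxels, not the paper's implementation):

```python
import numpy as np

def submanifold_sparse_conv(voxels, weights, bias):
    """Toy submanifold sparse 3D convolution on a dict of occupied voxels.

    voxels : {(x, y, z): feature vector of shape (C_in,)}
    weights: (3, 3, 3, C_in, C_out) kernel
    bias   : (C_out,)
    Outputs are computed only at already-occupied sites, so the sparsity
    pattern of the LiDAR grid is preserved layer after layer.
    """
    out = {}
    for (x, y, z) in voxels:
        acc = bias.copy()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nb = voxels.get((x + dx, y + dy, z + dz))
                    if nb is not None:
                        acc = acc + nb @ weights[dx + 1, dy + 1, dz + 1]
                    # empty neighbours contribute nothing: this is what
                    # makes the operation cheap on sparse point-cloud grids
        out[(x, y, z)] = acc
    return out
```

Real implementations hash coordinates and gather neighbours in bulk, but the occupancy-preserving rule is the same.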

20 pages, 12712 KB  
Article
Large-Scale Airborne LiDAR Point Cloud Building Extraction Based on Improved Voxelized Deep Learning Network
by Bai Xue, Yanru Song, Pi Ai, Hongzhou Li, Shuhan Liu and Li Guo
Buildings 2026, 16(7), 1450; https://doi.org/10.3390/buildings16071450 - 7 Apr 2026
Abstract
High-precision 3D building data are pivotal for smart city development, urban planning, and disaster management. However, large-scale building extraction from airborne LiDAR point clouds remains challenging due to semantic ambiguity, uneven point density, and complex architectural structures. To address these limitations, we propose a novel framework integrating geometric topology perception with cross-dimensional attention mechanisms within a Sparse Voxel Convolutional Neural Network (SPVCNN). The key contributions include: (1) an enhanced LaserMix++ multi-scale hybrid augmentation strategy featuring cross-scene block replacement, ground normal–constrained rotation, and non-uniform scaling; (2) a dual-branch SPVCNN architecture embedding a collaborative module of Geometric Self-Attention (GSA) and Cross-Space Residual Attention (CSRA) to preserve topological consistency and enable cross-dimensional feature interaction; and (3) a Boundary Enhancement Module (BEM) specifically designed to resolve boundary ambiguity and overlapping predictions. Evaluated on a 177 km² dataset covering Washington, D.C., our method significantly outperforms the baseline SPVCNN, improving accuracy by 12.04 percentage points (0.8212 to 0.9416) and Intersection over Union (IoU) by 9.96 percentage points (0.866 to 0.9656). Furthermore, it surpasses mainstream networks such as Cylinder3D and MinkResNet by over 50% in absolute accuracy gain. These results demonstrate the effectiveness of synergistically combining geometric perception with adaptive attention for robust building extraction from large-scale LiDAR data. Full article
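Voxel-based networks like SPVCNN start by binning raw points into a sparse grid. A minimal sketch of that voxelization step (averaging the points inside each cell; real pipelines also keep per-point features):

```python
import numpy as np

def voxelize(points, voxel_size):
    """Group raw LiDAR points into occupied voxels, averaging coordinates.

    points     : (N, 3) array of xyz coordinates
    voxel_size : scalar edge length of a cubic voxel
    Returns {voxel index (i, j, k): mean xyz of the points inside}.
    Only occupied cells appear, so memory scales with the points,
    not with the bounding volume.
    """
    grid = {}
    idx = np.floor(points / voxel_size).astype(int)
    for key, p in zip(map(tuple, idx), points):
        grid.setdefault(key, []).append(p)
    return {k: np.mean(v, axis=0) for k, v in grid.items()}
```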
(This article belongs to the Section Construction Management, and Computers & Digitization)

17 pages, 2174 KB  
Article
RadarSSM: A Lightweight Spatiotemporal State Space Network for Efficient Radar-Based Human Activity Recognition
by Rubin Zhao, Fucheng Miao and Yuanjian Liu
Sensors 2026, 26(7), 2259; https://doi.org/10.3390/s26072259 - 6 Apr 2026
Abstract
Millimeter-wave radar has gradually gained popularity as a sensor mode for Human Activity Recognition (HAR) in recent years because it preserves the privacy of individuals and is resistant to environmental conditions. Nevertheless, the fast inference of high-dimensional and sparse 4D radar data is still difficult to perform on low-resource edge devices. Current models, including 3D Convolutional Neural Networks and Transformer-based models, are frequently plagued by extensive parameter overhead or quadratic computational complexity, which restricts their applicability to edge applications. The present paper attempts to resolve these issues by introducing RadarSSM as a lightweight spatiotemporal hybrid network in the context of radar-based HAR. The explicit separation of spatial feature extraction and temporal dependency modeling helps RadarSSM decrease the overall complexity of computation significantly. Specifically, a spatial encoder based on depthwise separable 3D convolutions is designed to efficiently capture fine-grained geometric and motion features from voxelized radar data. For temporal modeling, a bidirectional State Space Model is introduced to capture long-range temporal dependencies with linear time complexity O(T), thereby avoiding the quadratic cost associated with self-attention mechanisms. Extensive experiments conducted on public radar HAR datasets demonstrate that RadarSSM achieves accuracy competitive with state-of-the-art methods while substantially reducing parameter count and computational cost relative to representative convolutional baselines. These results validate the effectiveness of RadarSSM and highlight its suitability for efficient radar sensing on edge hardware. Full article
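The O(T) claim rests on the fact that a state space model is just a linear recurrence scanned once over the sequence. A one-channel sketch of that recurrence (scalar parameters standing in for the learned diagonal matrices of a real SSM layer):

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Minimal diagonal state space recurrence, linear in sequence length.

    h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
    x : (T,) input sequence; a, b, c : scalars (one state channel).
    One pass over T steps costs O(T), versus the O(T^2) pairwise
    interactions of self-attention over the same sequence.
    """
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt   # state carries long-range history forward
        ys.append(c * h)
    return np.array(ys)
```

A bidirectional SSM, as in RadarSSM, simply runs one scan forward and one backward and combines the outputs.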
(This article belongs to the Special Issue Radar and Multimodal Sensing for Ambient Assisted Living)

28 pages, 5206 KB  
Article
CEA-DETR: A Multi-Scale Feature Fusion-Based Method for Wind Turbine Blade Surface Defect Detection
by Xudong Luo, Ruimin Wang, Jianhui Zhang, Junjie Zeng and Xiaohang Cai
Sensors 2026, 26(7), 2115; https://doi.org/10.3390/s26072115 - 28 Mar 2026
Abstract
Wind turbine blade surface defect detection remains challenging due to large variations in defect scales, blurred edge textures, and severe interference from complex backgrounds, which often lead to insufficient detection accuracy and high false and missed detection rates. To address these issues, this paper proposes an improved RTDETR-based detection framework, termed CEA-DETR, for wind turbine blade surface defect inspection. First, a Cross-Scale Multi-Edge feature Extraction (CSME) backbone is designed by integrating multi-scale pooling and edge-enhancement units with a dual-domain feature selection mechanism, enabling effective extraction of fine-grained texture and edge features across different scales. Second, an Efficient Multi-Scale Feature Fusion Network (EMSFFN) is constructed to facilitate deep cross-level feature interaction through adaptive weighted fusion and multi-scale convolutional structures, thereby enhancing the representation of multi-scale defects. Furthermore, an adaptive sparse self-attention mechanism is introduced to reconstruct the AIFI module, strengthening global dependency modeling and guiding the network to focus on critical defect regions under complex background conditions. Experimental results demonstrate that CEA-DETR achieves mAP50 and mAP50:95 of 89.4% and 68.9%, respectively, representing improvements of 3.1% and 6.5% over the RT-DETR-r18 baseline. Meanwhile, the proposed model reduces computational cost (GFLOPs) by 20.1% and parameter count by 8.1%. These advantages make CEA-DETR more suitable for deployment on resource-constrained unmanned aerial vehicles (UAVs), enabling efficient and real-time autonomous inspection of wind turbine blades. Full article
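One common way to make self-attention "adaptively sparse" is to let each query keep only its top-k scoring keys before the softmax. A numpy sketch of that masking idea (an illustration of the general technique, not CEA-DETR's exact AIFI reconstruction):

```python
import numpy as np

def sparse_self_attention(q, k, v, top_k):
    """Self-attention where each query attends only to its top-k keys.

    q, k, v : (N, d) arrays.  Scores outside the per-row top-k are masked
    to -inf before the softmax, concentrating attention mass on the most
    relevant positions and suppressing background clutter.
    """
    scores = q @ k.T / np.sqrt(q.shape[1])            # (N, N)
    kth = np.sort(scores, axis=1)[:, -top_k][:, None]  # k-th largest per row
    masked = np.where(scores >= kth, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ v
```

With top_k equal to N this reduces to ordinary dense attention.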
(This article belongs to the Section Industrial Sensors)

28 pages, 7008 KB  
Article
Multimodal Deep Learning Framework for Profiling Socio-Economic Indicators and Public Health Determinants in Urban Environments
by Esaie Dufitimana, Jean Pierre Bizimana, Ernest Uwayezu, Paterne Gahungu and Emmy Mugisha
Urban Sci. 2026, 10(4), 177; https://doi.org/10.3390/urbansci10040177 - 25 Mar 2026
Abstract
Urbanization significantly enhances socio-economic conditions, health, and well-being for many by improving access to services, education, and economic opportunities. However, socio-economic and public health disparities are also being exacerbated by urbanization. The reliable data required to monitor these conditions are often unavailable, outdated, or inconsistent. This study introduces a multimodal deep learning framework that integrates satellite imagery with street network datasets to predict urban socio-economic indicators and public health determinants at the sector level as a political administrative unit of public health planning in Rwanda. We extracted latent visual and topological embeddings of the urban built environment, using a Convolutional Neural Network (CNN) and Graph Neural Network (GNN). These embeddings were fused through an attentional mechanism to train a multi-task regression model that simultaneously predicts multiple socio-economic indicators and public health determinants. This framework was applied to the City of Kigali in Rwanda. Overall, the multimodal fusion model achieved the best average performance across targets, with an average correlation of 0.68 and MAE of 1.26 for socio-economic indicators, and 0.68 and 1.46 for public health determinants, demonstrating the benefit of integrating visual and topological information. The learned fused embedding space arranges socio-economic indicators and public health determinant deciles along a continuous morphological gradient from sparsely built rural settings to dense urban settings, demonstrating that the urban form encodes latent signals that capture socio-economic indicators and health determinants. Moreover, the study reveals a strong relationship between socio-economic indicators and the public health index, with education, cooking materials, and floor materials exhibiting a correlation above 0.96. 
This work demonstrates the utility of an integrated framework for socio-economic indicator profiling and public health planning in data-scarce urban contexts, offering a scalable approach for monitoring the indicators of Sustainable Development Goals in rapidly changing urban environments. Full article
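Attentional fusion of two modality embeddings can be sketched as a softmax-weighted sum, with the weights derived from the embeddings themselves. In the sketch below, `score_w` is a hypothetical stand-in for the trained scoring parameters (the paper learns its fusion weights end-to-end):

```python
import numpy as np

def attention_fuse(cnn_emb, gnn_emb, score_w):
    """Fuse a visual (CNN) and a topological (GNN) embedding of equal
    dimension via softmax attention weights.

    score_w : (d,) scoring vector -- a placeholder for learned parameters.
    Each modality gets a scalar relevance score; the softmax of the two
    scores weights the sum, letting the model lean on whichever modality
    is more informative for a given region.
    """
    embs = np.stack([cnn_emb, gnn_emb])   # (2, d)
    scores = embs @ score_w               # one scalar per modality
    w = np.exp(scores - scores.max())
    w = w / w.sum()                       # softmax attention weights
    return w @ embs                       # weighted sum, shape (d,)
```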
(This article belongs to the Topic Geospatial AI: Systems, Model, Methods, and Applications)

23 pages, 129074 KB  
Article
High-Resolution Air Temperature Estimation Using the Full Landsat Spectral Range and Information-Based Machine Learning
by Daniel Eitan, Asher Holder, Zohar Yakhini and Alexandra Chudnovsky
Remote Sens. 2026, 18(6), 954; https://doi.org/10.3390/rs18060954 - 22 Mar 2026
Abstract
Accurate mapping of near-surface air temperature (Tair) at fine spatial resolution is required for city-scale monitoring and remains a critical challenge in Earth Observation (EO). Reliance on ground-based measurements is constrained by their sparse spatial coverage and high operational costs. We present a novel, scalable machine learning framework designed to overcome this limitation. Our method utilizes interpretable Convolutional Neural Networks (CNNs) to fuse high-resolution Landsat data, integrating both thermal and reflective spectral bands, with contextual spatiotemporal metadata. This approach allows for inference, at 30 m resolution, of Tair fields without relying on dense, localized ground monitoring networks. Our hybrid CNN architecture is optimized for spatial generalization, maintaining strong and transferable performance (station-wise R² = 0.88) across diverse environments from humid coasts (R² = 0.89) to arid interiors (R² = 0.84). Although focused on a specific geographical region, our results suggest a robust and reproducible pathway for generating spatially consistent temperature fields from globally available EO archives, directly supporting urban heat island mitigation, climate policy development, and high-resolution public health assessment worldwide. Full article
(This article belongs to the Section AI Remote Sensing)

15 pages, 5485 KB  
Article
DC Series Arc Fault Detection in Electric Vehicle Charging Systems Using a Temporal Convolution and Sparse Transformer Network
by Kai Yang, Shun Zhang, Rongyuan Lin, Ran Tu, Xuejin Zhou and Rencheng Zhang
Sensors 2026, 26(6), 1897; https://doi.org/10.3390/s26061897 - 17 Mar 2026
Abstract
In electric vehicle (EV) charging systems, DC series arc faults, due to their high concealment and severe hazard, have become one of the important causes of electric vehicle fire accidents. An improved hybrid arc fault model of a charging system was established in Simulink for preliminary study. The results show that the high-frequency noise generated by arc faults affects the output voltage quality of the charger, and this noise is conducted to the battery voltage. Arc faults in a real electric vehicle charging experimental platform were further investigated, where it was found that, during arc fault events, the charging system provides no alarm indication, and the current signals exhibit significant large-amplitude random disturbances and nonlinear fluctuations. Moreover, under normal conditions during vehicle charging startup and the pre-charge stage, the current waveforms also present high-pulse spike characteristics similar to arc faults. Finally, a carefully designed deep neural network-based arc fault detection algorithm, Arc_TCNsformer, is proposed. The current signal samples are directly input into the network model without manual feature selection or extraction, enabling end-to-end fault recognition. By integrating a temporal convolutional network for multi-scale local feature extraction with a sparse Transformer for contextual information aggregation, the proposed method achieves strong robustness under complex charging noise environments. Experimental results demonstrate that the algorithm not only provides high detection accuracy but also maintains reliable real-time performance when deployed on embedded edge computing platforms. Full article
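The temporal convolutional branch of models like Arc_TCNsformer is built from causal dilated 1D convolutions: each output depends only on present and past samples, and stacking layers with growing dilation expands the receptive field exponentially. A minimal sketch of one such layer:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1D causal convolution with dilation, the TCN building block.

    x        : (T,) input signal (e.g. a charging-current sample)
    w        : (K,) kernel taps
    dilation : gap between taps
    y[t] sums w[k] * x[t - k*dilation], so it depends only on x[t],
    x[t-d], x[t-2d], ... -- no leakage from future samples.
    """
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            j = t - k * dilation
            if j >= 0:          # taps before the start of the signal are dropped
                y[t] += w[k] * x[j]
    return y
```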
(This article belongs to the Special Issue Deep Learning Based Intelligent Fault Diagnosis)

23 pages, 13051 KB  
Article
BAWSeg: A UAV Multispectral Benchmark for Barley Weed Segmentation
by Haitian Wang, Xinyu Wang, Muhammad Ibrahim, Dustin Severtson and Ajmal Mian
Remote Sens. 2026, 18(6), 915; https://doi.org/10.3390/rs18060915 - 17 Mar 2026
Abstract
Accurate weed mapping in cereal fields requires pixel-level segmentation from unmanned aerial vehicle (UAV) imagery that remains reliable across fields, seasons, and illumination. Existing multispectral pipelines often depend on thresholded vegetation indices, which are brittle under radiometric drift and mixed crop–weed pixels, or on single-stream convolutional neural network (CNN) and Transformer backbones that ingest stacked bands and indices, where radiance cues and normalized index cues interfere and reduce sensitivity to small weed clusters embedded in crop canopy. We propose VISA (Vegetation Index and Spectral Attention), a two-stream segmentation network that decouples these cues and fuses them at native resolution. The radiance stream learns from calibrated five-band reflectance using local residual convolutions, channel recalibration, spatial gating, and skip-connected decoding, which preserve fine textures, row boundaries, and small weed structures that are often weakened after ratio-based index compression. The index stream operates on vegetation-index maps with windowed self-attention to model local structure efficiently, state-space layers to propagate field-scale context without quadratic attention cost, and Slot Attention to form stable region descriptors that improve discrimination of sparse weeds under canopy mixing. To support supervised training and deployment-oriented evaluation, we introduce BAWSeg, a four-year UAV multispectral dataset collected over commercial barley paddocks in Western Australia, providing radiometrically calibrated blue, green, red, red edge, and near-infrared orthomosaics, derived vegetation indices, and dense crop, weed, and other labels with leakage-free block splits. On BAWSeg, VISA achieves 75.6% mean Intersection over Union (mIoU) and 63.5% weed Intersection over Union (IoU) with 22.8 M parameters, outperforming a multispectral SegFormer-B1 baseline by 1.2 mIoU and 1.9 weed IoU. 
Under cross-plot and cross-year protocols, VISA maintains 71.2% and 69.2% mIoU, respectively. The full BAWSeg benchmark dataset, VISA code, trained model weights, and protocol files will be released upon publication. Full article
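The index stream of a two-stream network like VISA consumes vegetation-index maps derived from the calibrated bands. The canonical example is NDVI, computed per pixel from the red and near-infrared reflectance:

```python
import numpy as np

def ndvi(nir, red, eps=1e-8):
    """Normalized Difference Vegetation Index from calibrated reflectance.

    nir, red : arrays of near-infrared and red reflectance in [0, 1].
    NDVI = (NIR - Red) / (NIR + Red).  Green canopy reflects strongly in
    NIR and absorbs red, pushing the ratio toward +1, while bare soil
    stays near 0 -- but the ratio compresses fine radiance texture, which
    is why VISA keeps a separate raw-reflectance stream.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)
```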

19 pages, 2968 KB  
Article
CBAM-Enhanced CNN-LSTM with Improved DBSCAN for High-Precision Radar-Based Gesture Recognition
by Shiwei Yi, Zhenyu Zhao and Tongning Wu
Sensors 2026, 26(6), 1835; https://doi.org/10.3390/s26061835 - 14 Mar 2026
Abstract
In recent years, radar-based gesture recognition technology has been widely applied in industrial and daily life scenarios. However, increasingly complex application scenarios have imposed higher demands on the accuracy and robustness of gesture recognition algorithms, and challenges such as clutter interference, inter-gesture similarity, and spatial–temporal feature ambiguity limit recognition performance. To address these challenges, a novel framework named CECL, which incorporates the Convolutional Block Attention Module (CBAM) into a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architecture, is proposed for high-accuracy radar-based gesture recognition. The CBAM adaptively highlights discriminative spatial regions and suppresses irrelevant background, and the CNN-LSTM network captures temporal dynamics across gesture sequences. During gesture signal processing, the Blackman window is applied to suppress spectral leakage. Additionally, a combination of wavelet thresholding and dynamic energy nulling is employed to effectively suppress clutter and enhance feature representation. Furthermore, an improved Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm further eliminates isolated sparse noise while preserving dense and valid target signal regions. Experimental results demonstrate that the proposed algorithm achieves 98.33% average accuracy in gesture classification, outperforming other baseline models. It exhibits excellent recognition performance across various distances and angles, demonstrating significantly enhanced robustness. Full article
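DBSCAN's appeal for radar denoising is that isolated sparse detections fall out naturally as noise (label -1) while dense target regions form clusters. A minimal textbook DBSCAN sketch (the paper's improved variant adds further refinements on top of this baseline):

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: grow clusters from core points, mark the rest -1.

    points  : (N, D) array
    eps     : neighbourhood radius
    min_pts : minimum neighbours (including self) for a core point
    Returns an (N,) array of cluster labels; isolated sparse points keep
    label -1, which is how DBSCAN-style filtering drops radar clutter.
    """
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    neighbors = [np.flatnonzero(row <= eps) for row in d]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                      # already claimed, or not a core point
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:                      # expand the cluster breadth-first
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:   # core point: keep growing
                    stack.extend(neighbors[j])
        cluster += 1
    return labels
```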

30 pages, 30836 KB  
Article
CrownViM: Context Clustering Meets Vision Mamba for Precise Tree Crown Segmentation in Aerial RGB Imagery
by Erkang Shi, Ziyang Shi, Fulin Su, Lin Li, Ruifeng Liu, Fangying Wan and Kai Zhou
Remote Sens. 2026, 18(6), 860; https://doi.org/10.3390/rs18060860 - 11 Mar 2026
Abstract
The proliferation of high-spatial-resolution remote sensing data is transforming forest attribute estimation, replacing traditional manual approaches with deep learning-based Individual Tree Crown Delineation (ITCD). Nevertheless, accurate ITCD boundary extraction from aerial RGB imagery faces persistent challenges: boundary ambiguity from complex crown occlusion in mixed forests, scarcity of high-quality annotations, and computational limitations of existing methods in dense forests. The latter manifests particularly in overlapping crown scenarios through constrained receptive fields, leading to substantial parameter requirements, computational inefficiency, and compromised accuracy. To overcome these limitations, we propose CrownViM, a novel architecture based on a bidirectional State Space Model (SSM). The framework integrates a linear-complexity Context Clustering Vision Mamba (CCViM) encoder for efficient global context modeling and employs a MaskFormer decoder for precise boundary prediction. We further introduce a partial-supervision loss function to reduce dependence on exhaustively annotated crown masks. Evaluations on OAM-TCD and the single-tree segmentation dataset (SSD) show CrownViM achieves significant segmentation accuracy improvements while maintaining a lightweight profile (39.6 M parameters). It substantially outperforms Convolutional Neural Network (CNN), Vision Transformer (ViT), and hybrid-based baselines when processing overlapping crowns and structurally complex scenes. As the first implementation of state space models in ITCD, CrownViM effectively addresses core limitations in global context capture, computational efficiency, and boundary definition. Our efficient architecture and sparse-annotation loss strategy enable high-accuracy, robust individual tree mapping, advancing tools for large-scale forest monitoring and accurate carbon stock quantification. Full article

20 pages, 21647 KB  
Article
Spatial Orthogonal and Boundary-Aware Network for Rotated and Elongated-Target Detection
by Yong Liu, Zhengbiao Jing, Yinghong Chang and Donglin Jing
Algorithms 2026, 19(3), 206; https://doi.org/10.3390/a19030206 - 9 Mar 2026
Abstract
In recent years, the refinement of bounding box representations has emerged as a major research focus in remote sensing. Nevertheless, mainstream detection algorithms typically ignore the disruptive impacts induced by the diverse morphologies and arbitrary orientations of high-aspect-ratio aerial objects throughout model training, thereby giving rise to several critical technical challenges: (1) Anisotropic information distribution: Target features are highly concentrated in one spatial dimension but sparse in the other, with significant feature differences across bounding box parameters, breaking the symmetry of feature distribution. (2) Missing high-quality positive samples: IoU-based assignment strategies fail to adequately capture the symmetric structural characteristics of elongated targets, resulting in incomplete coverage of critical features. (3) Loss function gradient instability: Small deviations in large-aspect-ratio bounding boxes cause drastic loss value fluctuations, as the asymmetric gradient changes hinder stable optimization directions during training. To address the challenges, we propose a Spatial Orthogonal and Boundary-Aware Network (SOBA-Net) for rotated and elongated target detection, leveraging symmetry-aware designs to enhance feature representation. Specifically, spatial staggered convolutions are constructed to fuse local and directional contextual features, effectively modeling long-range symmetric information across multiple spatial scales and reducing background noise interference. Secondly, the designed Symmetric-Constrained Label Assignment (SC-LA) introduces an IoU-weighted function, ensuring high-quality samples with symmetric structural features are classified as positive samples. Ultimately, the designed Gradient Dynamic Equilibrium Loss Function mitigates the problem of unstable gradients associated with high-aspect-ratio objects by enforcing symmetrical gradient regulation across samples with negligible localization deviations. 
Comprehensive evaluations across three representative remote sensing benchmarks—DOTA, UCAS-AOD, and HRSC2016—sufficiently corroborate the superiority of symmetry-aware enhancement schemes, which boast straightforward implementation and efficient inference deployment. Full article
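IoU-based label assignment, the starting point that SC-LA refines, marks an anchor positive when its best overlap with any ground-truth box clears a threshold. A sketch using axis-aligned IoU (rotated-box IoU, which SOBA-Net actually needs, reduces to this form at zero rotation, so it serves as an illustrative stand-in):

```python
import numpy as np

def iou(box_a, box_b):
    """Axis-aligned IoU between two (x1, y1, x2, y2) boxes."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    iw = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    ih = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = iw * ih
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    return inter / union if union > 0 else 0.0

def assign_labels(anchors, gt, pos_thr=0.5):
    """Mark each anchor positive (1) if its best IoU over the ground-truth
    boxes clears pos_thr, else negative (-1).  For elongated targets a
    plain threshold like this misses well-aligned anchors, which is the
    gap SC-LA's IoU-weighted function addresses."""
    labels = []
    for a in anchors:
        best = max(iou(a, g) for g in gt)
        labels.append(1 if best >= pos_thr else -1)
    return labels
```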
(This article belongs to the Special Issue Advances in Deep Learning-Based Data Analysis)

21 pages, 4170 KB  
Article
Real-Time Vibration Energy Prediction for Semi-Active Suspensions Using Inertial Sensors: A Physics-Guided Deep Learning Approach
by Jian Cheng, Fanhua Qin, Leyao Wang and Ruijuan Chi
Sensors 2026, 26(5), 1695; https://doi.org/10.3390/s26051695 - 7 Mar 2026
Abstract
Response latency and sensor noise are universal challenges in closed-loop control systems. In the context of semi-active suspensions, these issues also exist and manifest as critical bottlenecks. Due to the highly transient nature of road shocks, the inherent physical actuation delays of the hardware, combined with the phase lag introduced by traditional signal filtering, often cause the control response to significantly lag behind the physical excitation. To address this issue from a predictive perspective, this study proposes a Physics-Informed Gated Convolutional Neural Network (PI-GCNN) designed to predict future multi-modal energy evolution, thereby enabling feedforward control. Unlike traditional feedback mechanisms, the proposed framework employs the Continuous Wavelet Transform (CWT) to convert short-horizon inertial data into time–frequency scalograms, effectively isolating transient shock features from background vibrations. A novel physics-guided gating mechanism is embedded within the network architecture to regulate feature activation. This mechanism is trained using an asymmetric sparse physics loss, which combines L1 regularization with adaptive spectral consistency constraints to enforce noise suppression on flat roads while ensuring sensitivity to impacts. Extensive validation was conducted using high-fidelity heavy truck simulations and the public PVS 9 real-world dataset. The results confirm that the PI-GCNN achieves a predictive phase lead of approximately 100–200 ms over real-time baselines, creating a valuable actuation window for suspension dampers. Furthermore, the model demonstrates exceptional computational efficiency, with a parameter count of 0.10 M and a single-frame inference latency of 0.25 ms, making it highly suitable for deployment on resource-constrained automotive edge computing platforms. Full article
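An "asymmetric sparse" loss of the kind described can be sketched as an L1-style error term that penalises under-prediction (missing a shock) harder than over-prediction, plus an L1 penalty on the predictions themselves to keep the output near zero on flat roads. The weights `under_w` and `l1_w` below are illustrative placeholders, not the paper's values, and this is a sketch of the idea rather than the exact formulation:

```python
import numpy as np

def asymmetric_sparse_loss(pred, target, under_w=2.0, l1_w=0.01):
    """Asymmetric L1 loss with a sparsity penalty on the predictions.

    pred, target : arrays of predicted / true vibration energy.
    Under-prediction (target > pred) is weighted under_w times harder,
    since failing to anticipate a shock is costlier than a false alarm;
    the L1 term pushes predictions toward zero in quiet conditions.
    """
    err = target - pred
    asym = np.where(err > 0, under_w * np.abs(err), np.abs(err))
    return asym.mean() + l1_w * np.abs(pred).mean()
```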
(This article belongs to the Section Physical Sensors)

18 pages, 1637 KB  
Article
Spatio-Temporal Capsule Networks for Weakly Supervised Surveillance Video Anomaly Detection
by Mohammed Iqbal Dohan Almurumudhe and Olivér Hornyák
Appl. Sci. 2026, 16(5), 2567; https://doi.org/10.3390/app16052567 - 7 Mar 2026
Abstract
Real surveillance systems require weakly supervised video anomaly detection due to the fact that long untrimmed videos do not always have accurate temporal labels. Models will be required to label a video as normal or abnormal and also to identify sparse anomaly areas with mere video-level supervision. In this paper, we introduce ST-CapsNet, which is a spatio-temporal capsule network that enhances weakly supervised localization of anomalies by using a structured representation and temporal agreement. Every video is broken down into 32 parts and coded with 512-dimensional 3D CNN (Convolutional Neural Network) features. Primary capsules record patterns of segments as vectors, and temporal capsules are created by dynamic routing over time, enabling the related abnormal segments to provide support to a common event representation. Training is based on a multiple-instance learning model that has a bag-level BCE (Binary Cross-Entropy) loss, a ranking loss between abnormal and normal separation, and smoothness and sparsity regularization to impose temporal consistency and sparse event behavior. The weakly supervised FAST (Focused and Accelerated Subset Training) split experiments on the UCF-Crime weakly supervised FAST split demonstrate that ST-CapsNet is better than strong baselines. The findings indicate that capsule routing is an effective part of the whole temporal reasoning of weakly supervised surveillance anomaly detection. Full article
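Dynamic routing, the mechanism that lets related abnormal segments reinforce a shared event capsule, iteratively upweights primary capsules whose predictions agree with the emerging output vector. A numpy sketch of the standard routing-by-agreement procedure (the generic algorithm, with shapes chosen for illustration):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, iters=3):
    """Routing-by-agreement between primary and output capsules.

    u_hat : (n_primary, n_out, d) prediction vectors.
    Each iteration forms the output as a routing-weighted sum, squashes
    it to length < 1, then raises the logits of primary capsules whose
    predictions agree (dot product) with that output.
    """
    n_in, n_out, d = u_hat.shape
    b = np.zeros((n_in, n_out))                   # routing logits
    for _ in range(iters):
        c = softmax(b, axis=1)                    # routing weights per primary
        s = (c[:, :, None] * u_hat).sum(axis=0)   # (n_out, d) weighted sum
        norm = np.linalg.norm(s, axis=1, keepdims=True)
        v = (norm**2 / (1 + norm**2)) * s / (norm + 1e-8)  # squash nonlinearity
        b = b + (u_hat * v[None]).sum(axis=2)     # agreement update
    return v
```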
(This article belongs to the Section Computing and Artificial Intelligence)
20 pages, 3488 KB  
Article
Automatic Modulation Recognition for Radio Mixed Proximity Sensor Signals Based on a Time-Frequency Image Enhancement Network
by Jinyu Zhang, Xiaopeng Yan, Xinhong Hao, Tai An, Erwa Dong and Jian Dai
Sensors 2026, 26(5), 1677; https://doi.org/10.3390/s26051677 - 6 Mar 2026
Abstract
The automatic modulation recognition (AMR) of low probability intercept (LPI) signals has received considerable interest from researchers working on electronic reconnaissance. This recognition technology aims to design a classifier that identifies signals [...] Read more.
The automatic modulation recognition (AMR) of low probability intercept (LPI) signals has received considerable interest from researchers working on electronic reconnaissance. This recognition technology aims to design a classifier that identifies signals with different modulation types. Deep learning models such as convolutional neural networks (CNNs) can take the time-frequency images (TFIs) of a signal as input and extract features for classification. To improve recognition accuracy, especially under low signal-to-noise ratios (SNRs), we propose an AMR method for radio frequency proximity sensor signals based on a TFI enhancement network. The TFIs are denoised with a per-pixel kernel prediction network (KPN), which improves TFI quality and achieves denoising performance comparable to traditional TFI reconstruction methods (e.g., sparse representation-based and low-rank approximation methods) at significantly lower computational cost. The denoised TFIs, with enhanced signal quality and reduced noise, are then fed into the RetinalNet-based classifier as high-quality input features. This enhancement is crucial for the subsequent recognition stage, as it significantly improves modulation recognition accuracy, particularly under challenging low-SNR conditions. Simulation results show that the proposed method accurately identifies the modulation types of different radio frequency proximity sensors that are aliased in the time-frequency domain under low SNRs, and the average recognition accuracy remains above 97% at SNRs above −10 dB. Full article
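The per-pixel kernel denoising step can be illustrated with a toy sketch. This is an assumption-laden version: in the paper a CNN predicts the kernels, whereas here they are supplied directly, and the kernel size, normalization, and reflect padding are illustrative choices.

```python
import numpy as np

def apply_per_pixel_kernels(tfi, kernels):
    """Sketch of the KPN denoising step: each output pixel is a weighted
    sum of its KxK neighborhood, using a kernel predicted for that pixel.
    tfi: (H, W) noisy time-frequency image.
    kernels: (H, W, K, K), each pixel's kernel normalized to sum to 1."""
    H, W = tfi.shape
    K = kernels.shape[-1]
    pad = K // 2
    padded = np.pad(tfi, pad, mode="reflect")  # avoid border artifacts
    out = np.empty_like(tfi)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + K, j:j + K]   # KxK neighborhood of (i, j)
            out[i, j] = np.sum(patch * kernels[i, j])
    return out
```

With uniform 3×3 kernels this reduces to a box filter; the learned kernels instead adapt per pixel, preserving signal ridges while averaging out noise.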
(This article belongs to the Section Sensing and Imaging)
19 pages, 8337 KB  
Article
HPFNet: Hierarchical Perception Fusion Network for Infrared Small Target Detection
by Mingjin Zhang, Yixiong Huang and Shuangquan Li
Remote Sens. 2026, 18(5), 804; https://doi.org/10.3390/rs18050804 - 6 Mar 2026
Abstract
Infrared small target detection (IRSTD) is a fundamental task in remote sensing-based surveillance and early warning systems. However, extremely small target size, low signal-to-noise ratio, and complex background clutter make reliable detection highly challenging. To address these issues, we propose a Hierarchical Perception [...] Read more.
Infrared small target detection (IRSTD) is a fundamental task in remote sensing-based surveillance and early warning systems. However, extremely small target size, low signal-to-noise ratio, and complex background clutter make reliable detection highly challenging. To address these issues, we propose a Hierarchical Perception Fusion Network (HPFNet) for IRSTD. Specifically, the Patch-Wise Context Feature Extraction module (PCFE) jointly integrates the Patch Nonlocal Block, convolutional blocks, and an attention mechanism to enable global–local feature extraction and enhancement, thereby strengthening weak target representations. In addition, the Multi-Level Sparse Cross-Fusion module (MSCF) explicitly performs cross-level feature interaction between encoder and decoder representations, enabling effective fusion of low-level spatial details and high-level semantic cues. A dual Top-K sparsification mechanism is adopted to filter out irrelevant background features, enabling the attention mechanism to focus on the target region and thereby bolstering the discriminative power of the feature representation. Finally, the Efficient Upsampling Module (EUM) combines upsampling with multi-branch dilated convolutions to enhance feature reconstruction and improve localization accuracy. Extensive experiments on publicly available benchmark datasets demonstrate that HPFNet consistently outperforms existing state-of-the-art IRSTD methods. Full article
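A Top-K attention sparsification of the kind MSCF adopts can be sketched as follows. The single-head formulation, shapes, and row-wise thresholding are assumptions for illustration; the paper's dual Top-K mechanism operates inside its cross-fusion module rather than as a standalone layer.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=4):
    """Sketch of Top-K attention sparsification: for each query, keep only
    the top_k largest similarity scores and mask the rest to -inf before the
    softmax, so attention concentrates on the most relevant positions.
    q: (Nq, d); k, v: (Nk, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (Nq, Nk) similarities
    # Per-row threshold at the top_k-th largest score.
    thresh = np.sort(scores, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    # Row-wise softmax over the surviving entries; -inf entries become 0.
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)
    return attn @ v
```

When top_k equals the number of keys, this reduces to ordinary dense attention; smaller top_k zeroes out background positions before the softmax, which is the filtering effect described above.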