Search Results (1,709)

Search Parameters:
Keywords = feature detector

20 pages, 1455 KB  
Article
Design and Evaluation of a Hardware-Constrained, Low-Complexity Yelp Siren Detector for Embedded Platforms
by Elena Valentina Dumitrascu, Răzvan Rughiniș and Robert Alexandru Dobre
Electronics 2025, 14(17), 3535; https://doi.org/10.3390/electronics14173535 - 4 Sep 2025
Abstract
The rapid response of emergency vehicles is crucial but often hindered because sirens lose effectiveness in modern traffic due to soundproofing, noise, and distractions. Automatic in-vehicle detection can help, but existing solutions struggle with efficiency, interpretability, and embedded suitability. This paper presents a hardware-constrained Simulink implementation of a yelp siren detector designed for embedded operation. Building on a MATLAB-based proof-of-concept validated in an idealized floating-point setting, the present system reflects practical implementation realities. Key features include the use of a realistically modeled digital-to-analog converter (DAC), filter designs restricted to standard E-series component values, interrupt service routine (ISR)-driven processing, and fixed-point data type handling that mirrors microcontroller execution. For benchmarking, the dataset used in the earlier proof-of-concept to tune system parameters was also employed to train three representative machine learning classifiers (k-nearest neighbors, support vector machine, and neural network), serving as reference classifiers. To assess generalization, 200 test signals were synthesized with AudioLDM using real siren and road noise recordings as inputs. On this test set, the proposed system outperformed the reference classifiers and, when compared with state-of-the-art methods reported in the literature, achieved competitive accuracy while preserving low complexity. Full article
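The E-series constraint mentioned in this abstract amounts to snapping each computed filter component value to the nearest entry of a standard preferred-number series. A minimal sketch of that step (the E24 table is the standard ±5% series; the helper name and nearest-value rule are illustrative assumptions, not the authors' code):

```python
import math

# Standard E24 preferred values (the ±5% tolerance series), one decade
E24 = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
       3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1]

def snap_to_e24(value):
    """Return the E24 component value (ohms, farads, ...) closest to `value`."""
    exp = math.floor(math.log10(value))
    # include neighboring decades so boundary cases (e.g. 9.6 -> 10) resolve correctly
    candidates = [b * 10.0 ** e for e in (exp - 1, exp, exp + 1) for b in E24]
    return min(candidates, key=lambda c: abs(c - value))
```

For example, an ideal 1234 Ω resistor from a filter design equation would be realized as 1.2 kΩ.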
21 pages, 8753 KB  
Article
PowerStrand-YOLO: A High-Voltage Transmission Conductor Defect Detection Method for UAV Aerial Imagery
by Zhenrong Deng, Jun Li, Junjie Huang, Shuaizheng Jiang, Qiuying Wu and Rui Yang
Mathematics 2025, 13(17), 2859; https://doi.org/10.3390/math13172859 - 4 Sep 2025
Abstract
Broken or loose strands in high-voltage transmission conductors constitute critical defects that jeopardize grid reliability. Unmanned aerial vehicle (UAV) inspection has become indispensable for their timely discovery; however, conventional detectors falter in the face of cluttered backgrounds and the conductors’ diminutive pixel footprint, yielding sub-optimal accuracy and throughput. To overcome these limitations, we present PowerStrand-YOLO—an enhanced YOLOv8 derivative tailored for UAV imagery. The method is trained on a purpose-built dataset and integrates three technical contributions. (1) A C2f_DCNv4 module is introduced to strengthen multi-scale feature extraction. (2) An EMA attention mechanism is embedded to suppress background interference and emphasize defect-relevant cues. (3) The original loss function is superseded by Shape-IoU, compelling the network to attend closely to the geometric contours and spatial layout of strand anomalies. Extensive experiments demonstrate 95.4% precision, 96.2% recall, and 250 FPS. Relative to the baseline YOLOv8, PowerStrand-YOLO improves precision by 3% and recall by 6.8% while accelerating inference. Moreover, it also demonstrates competitive performance on the VisDrone2019 dataset. These results establish the improved framework as a more accurate and efficient solution for UAV-based inspection of power transmission lines. Full article

15 pages, 2951 KB  
Article
Fusing Residual and Cascade Attention Mechanisms in Voxel–RCNN for 3D Object Detection
by You Lu, Yuwei Zhang, Xiangsuo Fan, Dengsheng Cai and Rui Gong
Sensors 2025, 25(17), 5497; https://doi.org/10.3390/s25175497 - 4 Sep 2025
Abstract
In this paper, a high-precision 3D object detector—Voxel–RCNN—is adopted as the baseline detector, and an improved detector named RCAVoxel-RCNN is proposed. To address various issues present in current mainstream 3D point cloud voxelisation methods, such as the suboptimal performance of Region Proposal Networks (RPNs) in generating candidate regions and the inadequate detection of small-scale objects caused by overly deep convolutional layers in both 3D and 2D backbone networks, this paper proposes a Cascade Attention Network (CAN). The CAN is designed to progressively refine and enhance the proposed regions, thereby producing more accurate detection results. Furthermore, a 3D Residual Network is introduced, which improves the representation of small objects by reducing the number of convolutional layers while incorporating residual connections. In the Bird’s-Eye View (BEV) feature extraction network, a Residual Attention Network (RAN) is developed. This follows a similar approach to the aforementioned 3D backbone network, leveraging the spatial awareness capabilities of the BEV. Additionally, the Squeeze-and-Excitation (SE) attention mechanism is incorporated to assign dynamic weights to features, allowing the network to focus more effectively on informative features. Experimental results on the KITTI validation dataset demonstrate the effectiveness of the proposed method, with detection accuracy for cars, pedestrians, and bicycles improving by 3.34%, 10.75%, and 4.61%, respectively, under the KITTI hard level. The primary evaluation metric adopted is the 3D Average Precision (AP), computed over 40 recall positions (R40). The Intersection over Union (IoU) thresholds used are 0.7 for cars and 0.5 for both pedestrians and bicycles. Full article
(This article belongs to the Section Communications)
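The R40 protocol cited in this abstract — precision averaged over 40 equally spaced recall positions — can be sketched generically as follows (an illustration of the metric, not the authors' evaluation script; the input format is an assumption):

```python
import numpy as np

def ap_r40(recall, precision):
    """40-point interpolated Average Precision (KITTI-style R40 protocol).

    `recall` and `precision` describe a PR curve sorted by increasing recall."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    ap = 0.0
    for r in np.linspace(1.0 / 40, 1.0, 40):   # recall positions 1/40, 2/40, ..., 1
        mask = recall >= r
        # interpolated precision: best precision achievable at recall >= r
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 40
```

A detector whose precision stays at 1.0 over the whole recall range scores an AP of exactly 1.0 under this protocol.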

18 pages, 4451 KB  
Article
Radar Target Detection Based on Linear Fusion of Two Features
by Yong Huang, Yunhao Luan, Yunlong Dong and Hao Ding
Sensors 2025, 25(17), 5436; https://doi.org/10.3390/s25175436 - 2 Sep 2025
Abstract
The joint detection of multiple features significantly enhances radar’s ability to detect weak targets on the sea surface. However, issues such as large data requirements and the lack of robustness in high-dimensional decision spaces severely constrain the detection performance and applicability of such methods. In response, this paper proposes a radar target detection method based on linear fusion of two features from the perspective of feature dimension reduction. Firstly, a two-feature linear dimensionality reduction method based on distribution compactness is designed to form a fused feature. Then, the generalized extreme value (GEV) distribution is used to model the tail of the probability density function (PDF) of the fused feature, thereby designing an asymptotic constant false alarm rate (CFAR) detector. Finally, the detection performance of this detector is comparatively analyzed using measured data. Full article
(This article belongs to the Section Radar Sensors)
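The GEV-based CFAR step this abstract describes can be outlined in a few lines: fit a GEV distribution to clutter-only samples of the fused feature, then place the detection threshold at the inverse survival function of the desired false-alarm rate. A hedged sketch using SciPy (the paper models only the PDF tail and uses measured sea-clutter data; the whole-sample fit and synthetic samples below are simplifications for illustration):

```python
import numpy as np
from scipy.stats import genextreme

def cfar_threshold(clutter_samples, pfa=1e-2):
    """Threshold such that P(feature > threshold | clutter) is approximately `pfa`."""
    c, loc, scale = genextreme.fit(clutter_samples)   # maximum-likelihood GEV fit
    return genextreme.isf(pfa, c, loc, scale)         # inverse survival function

# Synthetic stand-in for clutter-only fused-feature samples
samples = genextreme.rvs(0.1, loc=5.0, scale=1.0, size=5000, random_state=0)
threshold = cfar_threshold(samples, pfa=0.01)
```

A test cell would then be declared a target whenever its fused feature exceeds `threshold`; because the threshold is set from the clutter distribution itself, the false-alarm rate stays approximately constant as clutter conditions change.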

31 pages, 5788 KB  
Article
Research on the Response Characteristics of Various Inorganic Scintillators Under High-Dose-Rate Irradiation from Charged Particles
by Junyu Hou, Ge Ma, Zhanzu Feng, Weiwei Zhang, Zong Meng and Yuhe Li
Sensors 2025, 25(17), 5431; https://doi.org/10.3390/s25175431 - 2 Sep 2025
Abstract
With the advent of novel scintillators featuring higher atomic numbers and enhanced radiation hardness, these materials exhibit potential applications under high-dose-rate irradiation. In this work, we systematically compared the photon output characteristics of ten mainstream or emerging inorganic scintillators under high-dose-rate irradiation with low-energy (0.1–1.7 MeV) electrons or protons. Initially, under electron irradiation at dose rates from ~0.1 to ~50 rad/s, responses exhibited saturation trends to varying degrees, with their variations conforming to the proposed saturation model. However, under proton irradiation from ~5 to ~150 rad/s, responses exhibited sigmoidal trends due to competition between radiation-induced defects and luminescence centers. Through dynamic derivation of the carriers, defects, and luminescence centers, a triple-balance model that demonstrated close agreement with these variations was established. Subsequently, energy-dependent responses under proton irradiation were markedly nonlinear and were well fitted by Birks’ law, confirming the validity of our measurements. In contrast, electron-induced responses remained nearly linear with increasing energy. Then, after high-dose-rate and prolonged irradiation, BGO showed the highest response degradation, while YAG(Ce) demonstrated the greatest resistance to radiation damage. Moreover, Ce-doped scintillators displayed higher afterglow levels after prolonged irradiation, particularly YAG(Ce). In summary, these experimental analyses can provide critical guidance for material selection and effective calibration of scintillator detectors operating under high-dose-rate radiation from charged particles. Full article
(This article belongs to the Section Physical Sensors)

9 pages, 1664 KB  
Article
Quantized Nuclear Recoil in the Search for Sterile Neutrinos in Tritium Beta Decay with PTOLEMY
by Wonyong Chung, Mark Farino, Andi Tan, Christopher G. Tully and Shiran Zhang
Universe 2025, 11(9), 297; https://doi.org/10.3390/universe11090297 - 2 Sep 2025
Abstract
The search for keV-scale sterile neutrinos in tritium beta decay is made possible through the theoretically allowed small admixture of electron flavor in right-handed, singlet, massive neutrino states. A distinctive feature of keV-scale sterile-neutrino–induced threshold distortions in the tritium beta spectrum is the presence of quantized nuclear-recoil effects, as predicted for atomic tritium bound to two-dimensional materials such as graphene. The sensitivities to the sterile neutrino mass and electron-flavor mixing are considered in the context of the PTOLEMY detector simulation with tritiated graphene substrates. The ability to scan the entire tritium energy spectrum with a narrow energy window, low backgrounds, and high-resolution differential energy measurements provides the opportunity to pinpoint the quantized nuclear-recoil effects, providing an additional tool for identifying the kinematics of the production of sterile neutrinos. Background suppression is achieved by transversely accelerating electrons into a high magnetic field, where semi-relativistic electron tagging can be performed with cyclotron resonance emission RF antennas, followed by deceleration through the PTOLEMY filter into a high-resolution differential energy detector operating in a zero-magnetic-field region. The PTOLEMY-based approach to keV-scale searches for sterile neutrinos involves a novel precision apparatus utilizing two-dimensional materials to yield high-resolution, sub-eV mass determination for electron-flavor mixing fractions of |Ue4|^2 ~ 10^-5 and smaller. Full article

18 pages, 3670 KB  
Article
Photovoltaic Cell Surface Defect Detection via Subtle Defect Enhancement and Background Suppression
by Yange Sun, Guangxu Huang, Chenglong Xu, Huaping Guo and Yan Feng
Micromachines 2025, 16(9), 1003; https://doi.org/10.3390/mi16091003 - 30 Aug 2025
Abstract
As the core component of photovoltaic (PV) power generation systems, PV cells are susceptible to subtle surface defects, including thick lines, cracks, and finger interruptions, primarily caused by stress and material brittleness during the manufacturing process. These defects substantially degrade energy conversion efficiency by inducing both optical and electrical losses, yet existing detection methods struggle to precisely identify and localize them. In addition, the complexity of background noise and other factors further increases the challenge of detecting these subtle defects. To address these challenges, we propose a novel PV Cell Surface Defect Detector (PSDD) that extracts subtle defects both within the backbone network and during feature fusion. In particular, we propose a plug-and-play Subtle Feature Refinement Module (SFRM) that integrates into the backbone to enhance fine-grained feature representation by rearranging local spatial features to the channel dimension, mitigating the loss of detail caused by downsampling. SFRM further employs a general attention mechanism to adaptively enhance key channels associated with subtle defects, improving the representation of fine defect features. In addition, we propose a Background Noise Suppression Block (BNSB) as a key component of the feature aggregation stage, which employs a dual-path strategy to fuse multiscale features, reducing background interference and improving defect saliency. Specifically, the first path uses a Background-Aware Module (BAM) to adaptively suppress noise and emphasize relevant features, while the second path adopts a residual structure to retain the original input features and prevent the loss of critical details. Experiments show that PSDD outperforms other methods, achieving the highest mAP50 score of 93.6% on the PVEL-AD dataset. Full article
(This article belongs to the Special Issue Thin Film Photovoltaic and Photonic Based Materials and Devices)
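SFRM's rearrangement of local spatial features into the channel dimension resembles a generic space-to-depth (pixel-unshuffle) operation, which trades resolution for channels without discarding detail the way strided downsampling does. A minimal NumPy sketch of that general operation (not the paper's exact module):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) feature map to (C*block**2, H/block, W/block).

    Each block x block spatial neighborhood becomes block**2 channels,
    so spatial resolution drops but no values are discarded."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * block * block, h // block, w // block)
```

Each output channel collects one position of the block, so a 1×4×4 input becomes a 4×2×2 output containing exactly the same sixteen values.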

30 pages, 25011 KB  
Article
Multi-Level Contextual and Semantic Information Aggregation Network for Small Object Detection in UAV Aerial Images
by Zhe Liu, Guiqing He and Yang Hu
Drones 2025, 9(9), 610; https://doi.org/10.3390/drones9090610 - 29 Aug 2025
Abstract
In recent years, detection methods for generic object detection have achieved significant progress. However, due to the large number of small objects in aerial images, mainstream detectors struggle to achieve a satisfactory detection performance. The challenges of small object detection in aerial images are primarily twofold: (1) Insufficient feature representation: The limited visual information for small objects makes it difficult for models to learn discriminative feature representations. (2) Background confusion: Abundant background information introduces more noise and interference, causing the features of small objects to easily be confused with the background. To address these issues, we propose a Multi-Level Contextual and Semantic Information Aggregation Network (MCSA-Net). MCSA-Net includes three key components: a Spatial-Aware Feature Selection Module (SAFM), a Multi-Level Joint Feature Pyramid Network (MJFPN), and an Attention-Enhanced Head (AEHead). The SAFM employs a sequence of dilated convolutions to extract multi-scale local context features and combines a spatial selection mechanism to adaptively merge these features, thereby obtaining the critical local context required for the objects, which enriches the feature representation of small objects. The MJFPN introduces multi-level connections and weighted fusion to fully leverage the spatial detail features of small objects in feature fusion and enhances the fused features further through a feature aggregation network. Finally, the AEHead is constructed by incorporating a sparse attention mechanism into the detection head. The sparse attention mechanism efficiently models long-range dependencies by computing the attention between the most relevant regions in the image while suppressing background interference, thereby enhancing the model’s ability to perceive targets and effectively improving the detection performance. Extensive experiments on four datasets, VisDrone, UAVDT, MS COCO, and DOTA, demonstrate that the proposed MCSA-Net achieves an excellent detection performance, particularly in small object detection, surpassing several state-of-the-art methods. Full article
(This article belongs to the Special Issue Intelligent Image Processing and Sensing for Drones, 2nd Edition)

22 pages, 4922 KB  
Article
PDE-Guided Diverse Feature Learning for SAR Rotated Ship Detection
by Mingjin Zhang, Zhongkai Yang, Jie Guo and Yunsong Li
Remote Sens. 2025, 17(17), 2998; https://doi.org/10.3390/rs17172998 - 28 Aug 2025
Abstract
Detecting ships in Synthetic Aperture Radar (SAR) images poses a complex challenge, with recent progress primarily attributed to the development of rotated detectors. However, existing methods often neglect the crucial influence of inherent characteristics in SAR images, such as common speckle noise. Moreover, a notable gap exists in modeling diverse features, particularly the fusion of rotational and high-frequency features. To address these challenges, this paper introduces a high-accuracy detector called PRDet, which builds on two key innovations: partial differential equation (PDE)-Guided Wavelet Transform (PGWT) and Diverse Feature Learning Block (DFLB). The PGWT enhances high-frequency features, such as edges and textures, while eliminating speckle noise by optimizing wavelet transform with PDE, leveraging the ability of PDE to model local variations and preserve structural details. The DFLB, with strong expressive capability, extracts and fuses multi-form ship features through three branches, enabling more accurate ship localization. Extensive experimental evaluations on the publicly available RSSDD and SRSDD-V1.0 benchmarks demonstrate PRDet’s superiority over other SAR rotated ship detectors. For example, on the RSSDD dataset, PRDet achieves an offshore precision of 0.938 and an mAP of 0.908, confirming its effectiveness for practical maritime surveillance applications. Full article

20 pages, 9232 KB  
Article
Anomaly-Detection Framework for Thrust Bearings in OWC WECs Using a Feature-Based Autoencoder
by Se-Yun Hwang, Jae-chul Lee, Soon-sub Lee and Cheonhong Min
J. Mar. Sci. Eng. 2025, 13(9), 1638; https://doi.org/10.3390/jmse13091638 - 27 Aug 2025
Abstract
An unsupervised anomaly-detection framework is proposed and field validated for thrust-bearing monitoring in the impulse turbine of a shoreline oscillating water-column (OWC) wave energy converter (WEC) off Jeju Island, Korea. Operational monitoring is constrained by nonstationary sea states, scarce fault labels, and low-rate supervisory logging at 20 Hz. To address these conditions, a 24 h period of normal operation was median-filtered to suppress outliers, and six physically motivated time-domain features were computed from triaxial vibration at 10 s intervals: absolute mean; standard deviation (STD); root mean square (RMS); skewness; shape factor (SF); and crest factor (CF, peak divided by RMS). A feature-based autoencoder was trained to reconstruct the feature vectors, and reconstruction error was evaluated with an adaptive threshold derived from the moving mean and moving standard deviation to accommodate baseline drift. Performance was assessed on a 2 h test segment that includes a 40 min simulated fault window created by doubling the triaxial vibration amplitudes prior to preprocessing and feature extraction. The detector achieved accuracy of 0.99, precision of 1.00, recall of 0.98, and F1 score of 0.99, with no false positives and five false negatives. These results indicate dependable detection at low sampling rates with modest computational cost. The chosen feature set provides physical interpretability under the 20 Hz constraint, and denoising stabilizes indicators against marine transients, supporting applicability in operational settings. Limitations associated with simulated faults are acknowledged. Future work will incorporate long-term field observations with verified fault progressions, cross-site validation, and integration with digital-twin-enabled maintenance. Full article
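The six time-domain features and the adaptive threshold are defined explicitly enough in this abstract to sketch directly. The window length and the `k` multiplier below are assumptions for illustration; the paper's exact settings are not restated here:

```python
import numpy as np

def window_features(x):
    """Six time-domain features for one 10 s vibration window (single axis)."""
    x = np.asarray(x, dtype=float)
    abs_mean = np.mean(np.abs(x))
    std = np.std(x)
    rms = np.sqrt(np.mean(x ** 2))
    skew = np.mean((x - x.mean()) ** 3) / std ** 3      # third standardized moment
    shape_factor = rms / abs_mean
    crest_factor = np.max(np.abs(x)) / rms              # peak divided by RMS
    return np.array([abs_mean, std, rms, skew, shape_factor, crest_factor])

def adaptive_threshold(errors, window=60, k=3.0):
    """Moving mean + k * moving std of reconstruction errors, tracking baseline drift."""
    e = np.asarray(errors, dtype=float)
    thr = np.empty_like(e)
    for i in range(len(e)):
        seg = e[max(0, i - window + 1): i + 1]
        thr[i] = seg.mean() + k * seg.std()
    return thr
```

An anomaly would then be flagged whenever the autoencoder's reconstruction error for a feature vector exceeds the threshold at that time step.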

11 pages, 1500 KB  
Article
Photon-Counting CT Enhances Diagnostic Accuracy in Stable Coronary Artery Disease: A Comparative Study with Conventional CT
by Mitsutaka Nakashima, Toru Miyoshi, Shohei Hara, Ryosuke Miyagi, Takahiro Nishihara, Takashi Miki, Kazuhiro Osawa and Shinsuke Yuasa
J. Clin. Med. 2025, 14(17), 6049; https://doi.org/10.3390/jcm14176049 - 26 Aug 2025
Abstract
Background/Objectives: Coronary CT angiography (CCTA) is a cornerstone in evaluating stable coronary artery disease (CAD), but conventional energy-integrating detector CT (EID-CT) has limitations, including calcium blooming and limited spatial resolution. Photon-counting detector CT (PCD-CT) may overcome these drawbacks through enhanced spatial resolution and improved tissue characterization. Methods: In this retrospective, propensity score–matched study, we compared CCTA findings from 820 patients (410 per group) who underwent either EID-CT or PCD-CT for suspected stable CAD. Primary outcomes included stenosis severity, high-risk plaque features, and downstream invasive coronary angiography (ICA) referral and yield. Results: The matched cohorts were balanced in demographics and cardiovascular risk factors (mean age 67 years, 63% male). PCD-CT showed a favorable shift in stenosis severity distribution (p = 0.03). High-risk plaques were detected less frequently with PCD-CT (22.7% vs. 30.5%, p = 0.01). Median coronary calcium scores did not differ (p = 0.60). Among patients referred for ICA, those initially evaluated with PCD-CT were more likely to undergo revascularization (62.5% vs. 44.1%), and fewer underwent potentially unnecessary ICA without revascularization (3.7% vs. 8.0%, p = 0.001). The specificity in diagnosing significant stenosis requiring revascularization was 0.74 with EID-CT and 0.81 with PCD-CT (p = 0.04). Conclusions: PCD-CT improved diagnostic specificity for CAD, reducing unnecessary ICA referrals while maintaining detection of clinically significant disease. This advanced CT technology holds promise for more accurate, efficient, and patient-centered CAD evaluation. Full article
(This article belongs to the Section Cardiovascular Medicine)

25 pages, 3053 KB  
Article
Enhanced YOLOv11 Framework for Accurate Multi-Fault Detection in UAV Photovoltaic Inspection
by Shufeng Meng, Yang Yue and Tianxu Xu
Sensors 2025, 25(17), 5311; https://doi.org/10.3390/s25175311 - 26 Aug 2025
Abstract
Stains, defects, and snow accumulation constitute three prevalent photovoltaic (PV) anomalies; each exhibits unique color and thermal signatures yet collectively curtail energy yield. Existing detectors typically sacrifice accuracy for speed, and none simultaneously classify all three fault types. To counter the identified limitations, an enhanced YOLOv11 framework is introduced. First, the hue-saturation-value (HSV) color model is employed to decouple hue and brightness, strengthening color feature extraction and cross-sensor generalization. Second, an outlook attention module integrated into the backbone precisely delineates micro-defect boundaries. Third, a mix structure block in the detection head encodes global context and fine-grained details to boost small object recognition. Additionally, the bounded sigmoid linear unit (B-SiLU) activation function optimizes gradient flow and feature discrimination through an improved nonlinear mapping, while the gradient-weighted class activation mapping (Grad-CAM) visualizations confirm selective attention to fault regions. Experimental results show that overall mean average precision (mAP) rises by 1.8%, with defect, stain, and snow accuracies improving by 2.2%, 3.3%, and 0.8%, respectively, offering a reliable solution for intelligent PV inspection and early fault detection. Full article
(This article belongs to the Special Issue Feature Papers in Communications Section 2025)

22 pages, 4036 KB  
Article
An Online Modular Framework for Anomaly Detection and Multiclass Classification in Video Surveillance
by Jonathan Flores-Monroy, Gibran Benitez-Garcia, Mariko Nakano-Miyatake and Hiroki Takahashi
Appl. Sci. 2025, 15(17), 9249; https://doi.org/10.3390/app15179249 - 22 Aug 2025
Abstract
Video surveillance systems are a key tool for the identification of anomalous events, but they still rely heavily on human analysis, which limits their efficiency. Current video anomaly detection models aim to automatically detect such events. However, most of them provide only a binary classification (normal or anomalous) and do not identify the specific type of anomaly. Although recent proposals address anomaly classification, they typically require full video analysis, making them unsuitable for online applications. In this work, we propose a modular framework for the joint detection and classification of anomalies, designed to operate on individual clips within continuous video streams. The architecture integrates interchangeable modules (feature extractor, detector, and classifier) and is adaptable to both offline and online scenarios. Specifically, we introduce a multi-category classifier that processes only anomalous clips, enabling efficient clip-level classification. Experiments conducted on the UCF-Crime dataset validate the effectiveness of the framework, achieving 74.77% clip-level accuracy and 58.96% video-level accuracy, surpassing prior approaches and confirming its applicability in real-world surveillance environments. Full article
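The modular clip-level flow this abstract describes — feature extractor, detector, then a classifier invoked only for anomalous clips — can be sketched as follows (function names and the 0.5 threshold are illustrative assumptions, not taken from the paper):

```python
def process_clip(clip, extractor, detector, classifier, threshold=0.5):
    """One step of the online pipeline: classify only clips flagged as anomalous."""
    features = extractor(clip)            # e.g. a spatiotemporal feature vector
    score = detector(features)            # anomaly score in [0, 1]
    if score < threshold:
        return "normal", score            # skip the classifier for normal clips
    return classifier(features), score    # multiclass label for anomalous clips
```

Because the multi-category classifier runs only on flagged clips, per-clip cost stays low enough for continuous video streams, which is the property that makes the framework usable online.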

102 pages, 17708 KB  
Review
From Detection to Understanding: A Systematic Survey of Deep Learning for Scene Text Processing
by Zhandong Liu, Ruixia Song, Ke Li and Yong Li
Appl. Sci. 2025, 15(17), 9247; https://doi.org/10.3390/app15179247 - 22 Aug 2025
Abstract
Scene text understanding, serving as a cornerstone technology for autonomous navigation, document digitization, and accessibility tools, has witnessed a paradigm shift from traditional methods relying on handcrafted features and multi-stage processing pipelines to contemporary deep learning frameworks capable of learning hierarchical representations directly from raw image inputs. This survey distinctly categorizes modern scene text recognition (STR) methodologies into three principal paradigms: two-stage detection frameworks that employ region proposal networks for precise text localization, single-stage detectors designed to optimize computational efficiency, and specialized architectures tailored to handle arbitrarily shaped text through geometric-aware modeling techniques. Concurrently, an in-depth analysis of text recognition paradigms elucidates the evolutionary trajectory from connectionist temporal classification (CTC) and sequence-to-sequence models to transformer-based architectures, which excel in contextual modeling and demonstrate superior performance. In contrast to prior surveys, this work uniquely emphasizes several key differences and contributions. Firstly, it provides a comprehensive and systematic taxonomy of STR methods, explicitly highlighting the trade-offs between detection accuracy, computational efficiency, and geometric adaptability across different paradigms. Secondly, it delves into the nuances of text recognition, illustrating how transformer-based models have revolutionized the field by capturing long-range dependencies and contextual information, thereby addressing challenges in recognizing complex text layouts and multilingual scripts. Furthermore, the survey pioneers the exploration of critical research frontiers, such as multilingual text adaptation, enhancing model robustness against environmental variations (e.g., lighting conditions, occlusions), and devising data-efficient learning strategies to mitigate the dependency on large-scale annotated datasets. By synthesizing insights from technical advancements across 28 benchmark datasets and standardized evaluation protocols, this study offers researchers a holistic perspective on the current state-of-the-art, persistent challenges, and promising avenues for future research, with the ultimate goal of achieving human-level scene text comprehension. Full article

27 pages, 7285 KB  
Article
Towards Biologically-Inspired Visual SLAM in Dynamic Environments: IPL-SLAM with Instance Segmentation and Point-Line Feature Fusion
by Jian Liu, Donghao Yao, Na Liu and Ye Yuan
Biomimetics 2025, 10(9), 558; https://doi.org/10.3390/biomimetics10090558 - 22 Aug 2025
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental technique in mobile robotics, enabling autonomous navigation and environmental reconstruction. However, dynamic elements in real-world scenes—such as walking pedestrians, moving vehicles, and swinging doors—often degrade SLAM performance by introducing unreliable features that cause localization errors. [...] Read more.
Simultaneous Localization and Mapping (SLAM) is a fundamental technique in mobile robotics, enabling autonomous navigation and environmental reconstruction. However, dynamic elements in real-world scenes—such as walking pedestrians, moving vehicles, and swinging doors—often degrade SLAM performance by introducing unreliable features that cause localization errors. In this paper, we define dynamic regions as areas in the scene containing moving objects, and dynamic features as the visual features extracted from these regions that may adversely affect localization accuracy. Inspired by biological perception strategies that integrate semantic awareness and geometric cues, we propose Instance-level Point-Line SLAM (IPL-SLAM), a robust visual SLAM framework for dynamic environments. The system employs YOLOv8-based instance segmentation to detect potential dynamic regions and construct semantic priors, while simultaneously extracting point and line features using Oriented FAST (Features from Accelerated Segment Test) and Rotated BRIEF (Binary Robust Independent Elementary Features), collectively known as ORB, and Line Segment Detector (LSD) algorithms. Motion consistency checks and angular deviation analysis are applied to filter dynamic features, and pose optimization is conducted using an adaptive-weight error function. A static semantic point cloud map is further constructed to enhance scene understanding. Experimental results on the TUM RGB-D dataset demonstrate that IPL-SLAM significantly outperforms existing dynamic SLAM systems—including DS-SLAM and ORB-SLAM2—in terms of trajectory accuracy and robustness in complex indoor environments. Full article
(This article belongs to the Section Biomimetic Design, Constructions and Devices)
