Search Results (1,709)

Search Parameters:
Keywords = feature detector

20 pages, 1455 KB  
Article
Design and Evaluation of a Hardware-Constrained, Low-Complexity Yelp Siren Detector for Embedded Platforms
by Elena Valentina Dumitrascu, Răzvan Rughiniș and Robert Alexandru Dobre
Electronics 2025, 14(17), 3535; https://doi.org/10.3390/electronics14173535 - 4 Sep 2025
Abstract
The rapid response of emergency vehicles is crucial but often hindered because sirens lose effectiveness in modern traffic due to soundproofing, noise, and distractions. Automatic in-vehicle detection can help, but existing solutions struggle with efficiency, interpretability, and embedded suitability. This paper presents a hardware-constrained Simulink implementation of a yelp siren detector designed for embedded operation. Building on a MATLAB-based proof-of-concept validated in an idealized floating-point setting, the present system reflects practical implementation realities. Key features include the use of a realistically modeled digital-to-analog converter (DAC), filter designs restricted to standard E-series component values, interrupt service routine (ISR)-driven processing, and fixed-point data type handling that mirrors microcontroller execution. For benchmarking, the dataset used in the earlier proof-of-concept to tune system parameters was also employed to train three representative machine learning classifiers (k-nearest neighbors, support vector machine, and neural network), serving as reference classifiers. To assess generalization, 200 test signals were synthesized with AudioLDM using real siren and road noise recordings as inputs. On this test set, the proposed system outperformed the reference classifiers and, when compared with state-of-the-art methods reported in the literature, achieved competitive accuracy while preserving low complexity. Full article
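The E-series constraint mentioned in this abstract amounts to snapping each computed filter component value to the nearest entry of a standard preferred-number series. A minimal sketch of that step (the E24 table is the standard ±5% series; the helper name and nearest-value rule are illustrative assumptions, not the authors' code):

```python
import math

# Standard E24 preferred values (the ±5% tolerance series), one decade
E24 = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
       3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1]

def snap_to_e24(value):
    """Return the E24 component value (ohms, farads, ...) closest to `value`."""
    exp = math.floor(math.log10(value))
    # include neighboring decades so boundary cases (e.g. 9.6 -> 10) resolve correctly
    candidates = [b * 10.0 ** e for e in (exp - 1, exp, exp + 1) for b in E24]
    return min(candidates, key=lambda c: abs(c - value))
```

For example, an ideal 1234 Ω resistor from a filter design equation would be realized as 1.2 kΩ.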
21 pages, 8753 KB  
Article
PowerStrand-YOLO: A High-Voltage Transmission Conductor Defect Detection Method for UAV Aerial Imagery
by Zhenrong Deng, Jun Li, Junjie Huang, Shuaizheng Jiang, Qiuying Wu and Rui Yang
Mathematics 2025, 13(17), 2859; https://doi.org/10.3390/math13172859 - 4 Sep 2025
Abstract
Broken or loose strands in high-voltage transmission conductors constitute critical defects that jeopardize grid reliability. Unmanned aerial vehicle (UAV) inspection has become indispensable for their timely discovery; however, conventional detectors falter in the face of cluttered backgrounds and the conductors’ diminutive pixel footprint, yielding sub-optimal accuracy and throughput. To overcome these limitations, we present PowerStrand-YOLO—an enhanced YOLOv8 derivative tailored for UAV imagery. The method is trained on a purpose-built dataset and integrates three technical contributions. (1) A C2f_DCNv4 module is introduced to strengthen multi-scale feature extraction. (2) An EMA attention mechanism is embedded to suppress background interference and emphasize defect-relevant cues. (3) The original loss function is superseded by Shape-IoU, compelling the network to attend closely to the geometric contours and spatial layout of strand anomalies. Extensive experiments demonstrate 95.4% precision, 96.2% recall, and 250 FPS. Relative to the baseline YOLOv8, PowerStrand-YOLO improves precision by 3% and recall by 6.8% while accelerating inference. Moreover, it also demonstrates competitive performance on the VisDrone2019 dataset. These results establish the improved framework as a more accurate and efficient solution for UAV-based inspection of power transmission lines. Full article

15 pages, 2951 KB  
Article
Fusing Residual and Cascade Attention Mechanisms in Voxel–RCNN for 3D Object Detection
by You Lu, Yuwei Zhang, Xiangsuo Fan, Dengsheng Cai and Rui Gong
Sensors 2025, 25(17), 5497; https://doi.org/10.3390/s25175497 - 4 Sep 2025
Abstract
In this paper, a high-precision 3D object detector—Voxel–RCNN—is adopted as the baseline detector, and an improved detector named RCAVoxel-RCNN is proposed. To address various issues present in current mainstream 3D point cloud voxelisation methods, such as the suboptimal performance of Region Proposal Networks (RPNs) in generating candidate regions and the inadequate detection of small-scale objects caused by overly deep convolutional layers in both 3D and 2D backbone networks, this paper proposes a Cascade Attention Network (CAN). The CAN is designed to progressively refine and enhance the proposed regions, thereby producing more accurate detection results. Furthermore, a 3D Residual Network is introduced, which improves the representation of small objects by reducing the number of convolutional layers while incorporating residual connections. In the Bird’s-Eye View (BEV) feature extraction network, a Residual Attention Network (RAN) is developed. This follows a similar approach to the aforementioned 3D backbone network, leveraging the spatial awareness capabilities of the BEV. Additionally, the Squeeze-and-Excitation (SE) attention mechanism is incorporated to assign dynamic weights to features, allowing the network to focus more effectively on informative features. Experimental results on the KITTI validation dataset demonstrate the effectiveness of the proposed method, with detection accuracy for cars, pedestrians, and bicycles improving by 3.34%, 10.75%, and 4.61%, respectively, under the KITTI hard level. The primary evaluation metric adopted is the 3D Average Precision (AP), computed over 40 recall positions (R40). The Intersection over Union (IoU) thresholds used are 0.7 for cars and 0.5 for both pedestrians and bicycles. Full article
(This article belongs to the Section Communications)
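The R40 protocol cited in this abstract — precision averaged over 40 equally spaced recall positions — can be sketched generically as follows (an illustration of the metric, not the authors' evaluation script; the input format is an assumption):

```python
import numpy as np

def ap_r40(recall, precision):
    """40-point interpolated Average Precision (KITTI-style R40 protocol).

    `recall` and `precision` describe a PR curve sorted by increasing recall."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    ap = 0.0
    for r in np.linspace(1.0 / 40, 1.0, 40):   # recall positions 1/40, 2/40, ..., 1
        mask = recall >= r
        # interpolated precision: best precision achievable at recall >= r
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 40
```

A detector whose precision stays at 1.0 over the whole recall range scores an AP of exactly 1.0 under this protocol.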

18 pages, 4451 KB  
Article
Radar Target Detection Based on Linear Fusion of Two Features
by Yong Huang, Yunhao Luan, Yunlong Dong and Hao Ding
Sensors 2025, 25(17), 5436; https://doi.org/10.3390/s25175436 - 2 Sep 2025
Abstract
The joint detection of multiple features significantly enhances radar’s ability to detect weak targets on the sea surface. However, issues such as large data requirements and the lack of robustness in high-dimensional decision spaces severely constrain the detection performance and applicability of such methods. In response, this paper proposes a radar target detection method based on linear fusion of two features from the perspective of feature dimension reduction. Firstly, a two-feature linear dimensionality reduction method based on distribution compactness is designed to form a fused feature. Then, the generalized extreme value (GEV) distribution is used to model the tail of the probability density function (PDF) of the fused feature, thereby designing an asymptotic constant false alarm rate (CFAR) detector. Finally, the detection performance of this detector is comparatively analyzed using measured data. Full article
(This article belongs to the Section Radar Sensors)
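The GEV-based CFAR step this abstract describes can be outlined in a few lines: fit a GEV distribution to clutter-only samples of the fused feature, then place the detection threshold at the inverse survival function of the desired false-alarm rate. A hedged sketch using SciPy (the paper models only the PDF tail and uses measured sea-clutter data; the whole-sample fit and synthetic samples below are simplifications for illustration):

```python
import numpy as np
from scipy.stats import genextreme

def cfar_threshold(clutter_samples, pfa=1e-2):
    """Threshold such that P(feature > threshold | clutter) is approximately `pfa`."""
    c, loc, scale = genextreme.fit(clutter_samples)   # maximum-likelihood GEV fit
    return genextreme.isf(pfa, c, loc, scale)         # inverse survival function

# Synthetic stand-in for clutter-only fused-feature samples
samples = genextreme.rvs(0.1, loc=5.0, scale=1.0, size=5000, random_state=0)
threshold = cfar_threshold(samples, pfa=0.01)
```

A test cell would then be declared a target whenever its fused feature exceeds `threshold`; because the threshold is set from the clutter distribution itself, the false-alarm rate stays approximately constant as clutter conditions change.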

31 pages, 5788 KB  
Article
Research on the Response Characteristics of Various Inorganic Scintillators Under High-Dose-Rate Irradiation from Charged Particles
by Junyu Hou, Ge Ma, Zhanzu Feng, Weiwei Zhang, Zong Meng and Yuhe Li
Sensors 2025, 25(17), 5431; https://doi.org/10.3390/s25175431 - 2 Sep 2025
Abstract
With the advent of novel scintillators featuring higher atomic numbers and enhanced radiation hardness, these materials exhibit potential applications under high-dose-rate irradiation. In this work, we systematically compared the photon output characteristics of ten mainstream or emerging inorganic scintillators under high-dose-rate irradiation with low-energy (0.1–1.7 MeV) electrons or protons. Initially, under electron irradiation at dose rates from ~0.1 to ~50 rad/s, responses exhibited saturation trends to varying degrees, with their variations conforming to the proposed saturation model. However, under proton irradiation from ~5 to ~150 rad/s, responses exhibited sigmoidal trends due to competition between radiation-induced defects and luminescence centers. Through dynamic derivation of the carriers, defects, and luminescence centers, a triple-balance model that demonstrated close agreement with these variations was established. Subsequently, energy-dependent responses under proton irradiation were markedly nonlinear and were well fitted by Birks’ law, confirming the validity of our measurements. In contrast, electron-induced responses remained nearly linear with increasing energy. Then, after high-dose-rate and prolonged irradiation, BGO showed the highest response degradation, while YAG(Ce) demonstrated the greatest resistance to radiation damage. Moreover, Ce-doped scintillators displayed higher afterglow levels after prolonged irradiation, particularly YAG(Ce). In summary, these experimental analyses can provide critical guidance for material selection and effective calibration of scintillator detectors operating under high-dose-rate radiation from charged particles. Full article
(This article belongs to the Section Physical Sensors)

9 pages, 1664 KB  
Article
Quantized Nuclear Recoil in the Search for Sterile Neutrinos in Tritium Beta Decay with PTOLEMY
by Wonyong Chung, Mark Farino, Andi Tan, Christopher G. Tully and Shiran Zhang
Universe 2025, 11(9), 297; https://doi.org/10.3390/universe11090297 - 2 Sep 2025
Abstract
The search for keV-scale sterile neutrinos in tritium beta decay is made possible through the theoretically allowed small admixture of electron flavor in right-handed, singlet, massive neutrino states. A distinctive feature of keV-scale sterile-neutrino–induced threshold distortions in the tritium beta spectrum is the presence of quantized nuclear-recoil effects, as predicted for atomic tritium bound to two-dimensional materials such as graphene. The sensitivities to the sterile neutrino mass and electron-flavor mixing are considered in the context of the PTOLEMY detector simulation with tritiated graphene substrates. The ability to scan the entire tritium energy spectrum with a narrow energy window, low backgrounds, and high-resolution differential energy measurements provides the opportunity to pinpoint the quantized nuclear-recoil effects, providing an additional tool for identifying the kinematics of the production of sterile neutrinos. Background suppression is achieved by transversely accelerating electrons into a high magnetic field, where semi-relativistic electron tagging can be performed with cyclotron resonance emission RF antennas, followed by deceleration through the PTOLEMY filter into a high-resolution differential energy detector operating in a zero-magnetic-field region. The PTOLEMY-based approach to keV-scale searches for sterile neutrinos involves a novel precision apparatus utilizing two-dimensional materials to yield high-resolution, sub-eV mass determination for electron-flavor mixing fractions of |Ue4|^2 ~ 10^-5 and smaller. Full article

18 pages, 3670 KB  
Article
Photovoltaic Cell Surface Defect Detection via Subtle Defect Enhancement and Background Suppression
by Yange Sun, Guangxu Huang, Chenglong Xu, Huaping Guo and Yan Feng
Micromachines 2025, 16(9), 1003; https://doi.org/10.3390/mi16091003 - 30 Aug 2025
Abstract
As the core component of photovoltaic (PV) power generation systems, PV cells are susceptible to subtle surface defects, including thick lines, cracks, and finger interruptions, primarily caused by stress and material brittleness during the manufacturing process. These defects substantially degrade energy conversion efficiency by inducing both optical and electrical losses, yet existing detection methods struggle to precisely identify and localize them. In addition, the complexity of background noise and other factors further increases the challenge of detecting these subtle defects. To address these challenges, we propose a novel PV Cell Surface Defect Detector (PSDD) that extracts subtle defects both within the backbone network and during feature fusion. In particular, we propose a plug-and-play Subtle Feature Refinement Module (SFRM) that integrates into the backbone to enhance fine-grained feature representation by rearranging local spatial features to the channel dimension, mitigating the loss of detail caused by downsampling. SFRM further employs a general attention mechanism to adaptively enhance key channels associated with subtle defects, improving the representation of fine defect features. In addition, we propose a Background Noise Suppression Block (BNSB) as a key component of the feature aggregation stage, which employs a dual-path strategy to fuse multiscale features, reducing background interference and improving defect saliency. Specifically, the first path uses a Background-Aware Module (BAM) to adaptively suppress noise and emphasize relevant features, while the second path adopts a residual structure to retain the original input features and prevent the loss of critical details. Experiments show that PSDD outperforms other methods, achieving the highest mAP50 score of 93.6% on the PVEL-AD dataset. Full article
(This article belongs to the Special Issue Thin Film Photovoltaic and Photonic Based Materials and Devices)
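SFRM's rearrangement of local spatial features into the channel dimension resembles a generic space-to-depth (pixel-unshuffle) operation, which trades resolution for channels without discarding detail the way strided downsampling does. A minimal NumPy sketch of that general operation (not the paper's exact module):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) feature map to (C*block**2, H/block, W/block).

    Each block x block spatial neighborhood becomes block**2 channels,
    so spatial resolution drops but no values are discarded."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * block * block, h // block, w // block)
```

Each output channel collects one position of the block, so a 1×4×4 input becomes a 4×2×2 output containing exactly the same sixteen values.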

30 pages, 25011 KB  
Article
Multi-Level Contextual and Semantic Information Aggregation Network for Small Object Detection in UAV Aerial Images
by Zhe Liu, Guiqing He and Yang Hu
Drones 2025, 9(9), 610; https://doi.org/10.3390/drones9090610 - 29 Aug 2025
Abstract
In recent years, detection methods for generic object detection have achieved significant progress. However, due to the large number of small objects in aerial images, mainstream detectors struggle to achieve a satisfactory detection performance. The challenges of small object detection in aerial images are primarily twofold: (1) Insufficient feature representation: The limited visual information for small objects makes it difficult for models to learn discriminative feature representations. (2) Background confusion: Abundant background information introduces more noise and interference, causing the features of small objects to easily be confused with the background. To address these issues, we propose a Multi-Level Contextual and Semantic Information Aggregation Network (MCSA-Net). MCSA-Net includes three key components: a Spatial-Aware Feature Selection Module (SAFM), a Multi-Level Joint Feature Pyramid Network (MJFPN), and an Attention-Enhanced Head (AEHead). The SAFM employs a sequence of dilated convolutions to extract multi-scale local context features and combines a spatial selection mechanism to adaptively merge these features, thereby obtaining the critical local context required for the objects, which enriches the feature representation of small objects. The MJFPN introduces multi-level connections and weighted fusion to fully leverage the spatial detail features of small objects in feature fusion and enhances the fused features further through a feature aggregation network. Finally, the AEHead is constructed by incorporating a sparse attention mechanism into the detection head. The sparse attention mechanism efficiently models long-range dependencies by computing the attention between the most relevant regions in the image while suppressing background interference, thereby enhancing the model’s ability to perceive targets and effectively improving the detection performance. Extensive experiments on four datasets, VisDrone, UAVDT, MS COCO, and DOTA, demonstrate that the proposed MCSA-Net achieves an excellent detection performance, particularly in small object detection, surpassing several state-of-the-art methods. Full article
(This article belongs to the Special Issue Intelligent Image Processing and Sensing for Drones, 2nd Edition)

22 pages, 4922 KB  
Article
PDE-Guided Diverse Feature Learning for SAR Rotated Ship Detection
by Mingjin Zhang, Zhongkai Yang, Jie Guo and Yunsong Li
Remote Sens. 2025, 17(17), 2998; https://doi.org/10.3390/rs17172998 - 28 Aug 2025
Abstract
Detecting ships in Synthetic Aperture Radar (SAR) images poses a complex challenge, with recent progress primarily attributed to the development of rotated detectors. However, existing methods often neglect the crucial influence of inherent characteristics in SAR images, such as common speckle noise. Moreover, a notable gap exists in modeling diverse features, particularly the fusion of rotational and high-frequency features. To address these challenges, this paper introduces a high-accuracy detector called PRDet, which builds on two key innovations: partial differential equation (PDE)-Guided Wavelet Transform (PGWT) and Diverse Feature Learning Block (DFLB). The PGWT enhances high-frequency features, such as edges and textures, while eliminating speckle noise by optimizing wavelet transform with PDE, leveraging the ability of PDE to model local variations and preserve structural details. The DFLB, with strong expressive capability, extracts and fuses multi-form ship features through three branches, enabling more accurate ship localization. Extensive experimental evaluations on the publicly available RSSDD and SRSDD-V1.0 benchmarks demonstrate PRDet’s superiority over other SAR rotated ship detectors. For example, on the RSSDD dataset, PRDet achieves an offshore precision of 0.938 and an mAP of 0.908, confirming its effectiveness for practical maritime surveillance applications. Full article

20 pages, 9232 KB  
Article
Anomaly-Detection Framework for Thrust Bearings in OWC WECs Using a Feature-Based Autoencoder
by Se-Yun Hwang, Jae-chul Lee, Soon-sub Lee and Cheonhong Min
J. Mar. Sci. Eng. 2025, 13(9), 1638; https://doi.org/10.3390/jmse13091638 - 27 Aug 2025
Abstract
An unsupervised anomaly-detection framework is proposed and field validated for thrust-bearing monitoring in the impulse turbine of a shoreline oscillating water-column (OWC) wave energy converter (WEC) off Jeju Island, Korea. Operational monitoring is constrained by nonstationary sea states, scarce fault labels, and low-rate supervisory logging at 20 Hz. To address these conditions, a 24 h period of normal operation was median-filtered to suppress outliers, and six physically motivated time-domain features were computed from triaxial vibration at 10 s intervals: absolute mean; standard deviation (STD); root mean square (RMS); skewness; shape factor (SF); and crest factor (CF, peak divided by RMS). A feature-based autoencoder was trained to reconstruct the feature vectors, and reconstruction error was evaluated with an adaptive threshold derived from the moving mean and moving standard deviation to accommodate baseline drift. Performance was assessed on a 2 h test segment that includes a 40 min simulated fault window created by doubling the triaxial vibration amplitudes prior to preprocessing and feature extraction. The detector achieved accuracy of 0.99, precision of 1.00, recall of 0.98, and F1 score of 0.99, with no false positives and five false negatives. These results indicate dependable detection at low sampling rates with modest computational cost. The chosen feature set provides physical interpretability under the 20 Hz constraint, and denoising stabilizes indicators against marine transients, supporting applicability in operational settings. Limitations associated with simulated faults are acknowledged. Future work will incorporate long-term field observations with verified fault progressions, cross-site validation, and integration with digital-twin-enabled maintenance. Full article
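The six time-domain features and the adaptive threshold are defined explicitly enough in this abstract to sketch directly. The window length and the `k` multiplier below are assumptions for illustration; the paper's exact settings are not restated here:

```python
import numpy as np

def window_features(x):
    """Six time-domain features for one 10 s vibration window (single axis)."""
    x = np.asarray(x, dtype=float)
    abs_mean = np.mean(np.abs(x))
    std = np.std(x)
    rms = np.sqrt(np.mean(x ** 2))
    skew = np.mean((x - x.mean()) ** 3) / std ** 3      # third standardized moment
    shape_factor = rms / abs_mean
    crest_factor = np.max(np.abs(x)) / rms              # peak divided by RMS
    return np.array([abs_mean, std, rms, skew, shape_factor, crest_factor])

def adaptive_threshold(errors, window=60, k=3.0):
    """Moving mean + k * moving std of reconstruction errors, tracking baseline drift."""
    e = np.asarray(errors, dtype=float)
    thr = np.empty_like(e)
    for i in range(len(e)):
        seg = e[max(0, i - window + 1): i + 1]
        thr[i] = seg.mean() + k * seg.std()
    return thr
```

An anomaly would then be flagged whenever the autoencoder's reconstruction error for a feature vector exceeds the threshold at that time step.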

11 pages, 1500 KB  
Article
Photon-Counting CT Enhances Diagnostic Accuracy in Stable Coronary Artery Disease: A Comparative Study with Conventional CT
by Mitsutaka Nakashima, Toru Miyoshi, Shohei Hara, Ryosuke Miyagi, Takahiro Nishihara, Takashi Miki, Kazuhiro Osawa and Shinsuke Yuasa
J. Clin. Med. 2025, 14(17), 6049; https://doi.org/10.3390/jcm14176049 - 26 Aug 2025
Abstract
Background/Objectives: Coronary CT angiography (CCTA) is a cornerstone in evaluating stable coronary artery disease (CAD), but conventional energy-integrating detector CT (EID-CT) has limitations, including calcium blooming and limited spatial resolution. Photon-counting detector CT (PCD-CT) may overcome these drawbacks through enhanced spatial resolution and improved tissue characterization. Methods: In this retrospective, propensity score–matched study, we compared CCTA findings from 820 patients (410 per group) who underwent either EID-CT or PCD-CT for suspected stable CAD. Primary outcomes included stenosis severity, high-risk plaque features, and downstream invasive coronary angiography (ICA) referral and yield. Results: The matched cohorts were balanced in demographics and cardiovascular risk factors (mean age 67 years, 63% male). PCD-CT showed a favorable shift in stenosis severity distribution (p = 0.03). High-risk plaques were detected less frequently with PCD-CT (22.7% vs. 30.5%, p = 0.01). Median coronary calcium scores did not differ (p = 0.60). Among patients referred for ICA, those initially evaluated with PCD-CT were more likely to undergo revascularization (62.5% vs. 44.1%), and fewer underwent potentially unnecessary ICA without revascularization (3.7% vs. 8.0%, p = 0.001). The specificity in diagnosing significant stenosis requiring revascularization was 0.74 with EID-CT and 0.81 with PCD-CT (p = 0.04). Conclusions: PCD-CT improved diagnostic specificity for CAD, reducing unnecessary ICA referrals while maintaining detection of clinically significant disease. This advanced CT technology holds promise for more accurate, efficient, and patient-centered CAD evaluation. Full article
(This article belongs to the Section Cardiovascular Medicine)

25 pages, 3053 KB  
Article
Enhanced YOLOv11 Framework for Accurate Multi-Fault Detection in UAV Photovoltaic Inspection
by Shufeng Meng, Yang Yue and Tianxu Xu
Sensors 2025, 25(17), 5311; https://doi.org/10.3390/s25175311 - 26 Aug 2025
Abstract
Stains, defects, and snow accumulation constitute three prevalent photovoltaic (PV) anomalies; each exhibits unique color and thermal signatures yet collectively curtail energy yield. Existing detectors typically sacrifice accuracy for speed, and none simultaneously classify all three fault types. To counter the identified limitations, an enhanced YOLOv11 framework is introduced. First, the hue-saturation-value (HSV) color model is employed to decouple hue and brightness, strengthening color feature extraction and cross-sensor generalization. Second, an outlook attention module integrated into the backbone precisely delineates micro-defect boundaries. Third, a mix structure block in the detection head encodes global context and fine-grained details to boost small object recognition. Additionally, the bounded sigmoid linear unit (B-SiLU) activation function optimizes gradient flow and feature discrimination through an improved nonlinear mapping, while the gradient-weighted class activation mapping (Grad-CAM) visualizations confirm selective attention to fault regions. Experimental results show that overall mean average precision (mAP) rises by 1.8%, with defect, stain, and snow accuracies improving by 2.2%, 3.3%, and 0.8%, respectively, offering a reliable solution for intelligent PV inspection and early fault detection. Full article
(This article belongs to the Special Issue Feature Papers in Communications Section 2025)

22 pages, 4036 KB  
Article
An Online Modular Framework for Anomaly Detection and Multiclass Classification in Video Surveillance
by Jonathan Flores-Monroy, Gibran Benitez-Garcia, Mariko Nakano-Miyatake and Hiroki Takahashi
Appl. Sci. 2025, 15(17), 9249; https://doi.org/10.3390/app15179249 - 22 Aug 2025
Abstract
Video surveillance systems are a key tool for the identification of anomalous events, but they still rely heavily on human analysis, which limits their efficiency. Current video anomaly detection models aim to automatically detect such events. However, most of them provide only a binary classification (normal or anomalous) and do not identify the specific type of anomaly. Although recent proposals address anomaly classification, they typically require full video analysis, making them unsuitable for online applications. In this work, we propose a modular framework for the joint detection and classification of anomalies, designed to operate on individual clips within continuous video streams. The architecture integrates interchangeable modules (feature extractor, detector, and classifier) and is adaptable to both offline and online scenarios. Specifically, we introduce a multi-category classifier that processes only anomalous clips, enabling efficient clip-level classification. Experiments conducted on the UCF-Crime dataset validate the effectiveness of the framework, achieving 74.77% clip-level accuracy and 58.96% video-level accuracy, surpassing prior approaches and confirming its applicability in real-world surveillance environments. Full article
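The modular clip-level flow this abstract describes — feature extractor, detector, then a classifier invoked only for anomalous clips — can be sketched as follows (function names and the 0.5 threshold are illustrative assumptions, not taken from the paper):

```python
def process_clip(clip, extractor, detector, classifier, threshold=0.5):
    """One step of the online pipeline: classify only clips flagged as anomalous."""
    features = extractor(clip)            # e.g. a spatiotemporal feature vector
    score = detector(features)            # anomaly score in [0, 1]
    if score < threshold:
        return "normal", score            # skip the classifier for normal clips
    return classifier(features), score    # multiclass label for anomalous clips
```

Because the multi-category classifier runs only on flagged clips, per-clip cost stays low enough for continuous video streams, which is the property that makes the framework usable online.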

102 pages, 17708 KB  
Review
From Detection to Understanding: A Systematic Survey of Deep Learning for Scene Text Processing
by Zhandong Liu, Ruixia Song, Ke Li and Yong Li
Appl. Sci. 2025, 15(17), 9247; https://doi.org/10.3390/app15179247 - 22 Aug 2025
Abstract
Scene text understanding, serving as a cornerstone technology for autonomous navigation, document digitization, and accessibility tools, has witnessed a paradigm shift from traditional methods relying on handcrafted features and multi-stage processing pipelines to contemporary deep learning frameworks capable of learning hierarchical representations directly from raw image inputs. This survey distinctly categorizes modern scene text recognition (STR) methodologies into three principal paradigms: two-stage detection frameworks that employ region proposal networks for precise text localization, single-stage detectors designed to optimize computational efficiency, and specialized architectures tailored to handle arbitrarily shaped text through geometric-aware modeling techniques. Concurrently, an in-depth analysis of text recognition paradigms elucidates the evolutionary trajectory from connectionist temporal classification (CTC) and sequence-to-sequence models to transformer-based architectures, which excel in contextual modeling and demonstrate superior performance. In contrast to prior surveys, this work uniquely emphasizes several key differences and contributions. Firstly, it provides a comprehensive and systematic taxonomy of STR methods, explicitly highlighting the trade-offs between detection accuracy, computational efficiency, and geometric adaptability across different paradigms. Secondly, it delves into the nuances of text recognition, illustrating how transformer-based models have revolutionized the field by capturing long-range dependencies and contextual information, thereby addressing challenges in recognizing complex text layouts and multilingual scripts. Furthermore, the survey pioneers the exploration of critical research frontiers, such as multilingual text adaptation, enhancing model robustness against environmental variations (e.g., lighting conditions, occlusions), and devising data-efficient learning strategies to mitigate the dependency on large-scale annotated datasets. By synthesizing insights from technical advancements across 28 benchmark datasets and standardized evaluation protocols, this study offers researchers a holistic perspective on the current state-of-the-art, persistent challenges, and promising avenues for future research, with the ultimate goal of achieving human-level scene text comprehension. Full article

27 pages, 7285 KB  
Article
Towards Biologically-Inspired Visual SLAM in Dynamic Environments: IPL-SLAM with Instance Segmentation and Point-Line Feature Fusion
by Jian Liu, Donghao Yao, Na Liu and Ye Yuan
Biomimetics 2025, 10(9), 558; https://doi.org/10.3390/biomimetics10090558 - 22 Aug 2025
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental technique in mobile robotics, enabling autonomous navigation and environmental reconstruction. However, dynamic elements in real-world scenes—such as walking pedestrians, moving vehicles, and swinging doors—often degrade SLAM performance by introducing unreliable features that cause localization errors. [...] Read more.
Simultaneous Localization and Mapping (SLAM) is a fundamental technique in mobile robotics, enabling autonomous navigation and environmental reconstruction. However, dynamic elements in real-world scenes—such as walking pedestrians, moving vehicles, and swinging doors—often degrade SLAM performance by introducing unreliable features that cause localization errors. In this paper, we define dynamic regions as areas in the scene containing moving objects, and dynamic features as the visual features extracted from these regions that may adversely affect localization accuracy. Inspired by biological perception strategies that integrate semantic awareness and geometric cues, we propose Instance-level Point-Line SLAM (IPL-SLAM), a robust visual SLAM framework for dynamic environments. The system employs YOLOv8-based instance segmentation to detect potential dynamic regions and construct semantic priors, while simultaneously extracting point and line features using Oriented FAST (Features from Accelerated Segment Test) and Rotated BRIEF (Binary Robust Independent Elementary Features), collectively known as ORB, and Line Segment Detector (LSD) algorithms. Motion consistency checks and angular deviation analysis are applied to filter dynamic features, and pose optimization is conducted using an adaptive-weight error function. A static semantic point cloud map is further constructed to enhance scene understanding. Experimental results on the TUM RGB-D dataset demonstrate that IPL-SLAM significantly outperforms existing dynamic SLAM systems—including DS-SLAM and ORB-SLAM2—in terms of trajectory accuracy and robustness in complex indoor environments. Full article
(This article belongs to the Section Biomimetic Design, Constructions and Devices)
