Ship detection in synthetic aperture radar (SAR) remote sensing imagery is of great significance in military and civilian applications. However, two factors limit detection performance: (1) a high prevalence of small-scale ship targets with limited information content and (2) interference affecting ship detection from speckle noise and land–sea clutter. To address these challenges, we propose a novel end-to-end (E2E) transformer-based SAR ship detection framework, called Flow-Aligned Nested Transformer for SAR Small Ship Detection (FANT-Det). Specifically, in the feature extraction stage, we introduce a Nested Swin Transformer Block (NSTB). The NSTB employs a two-level local self-attention mechanism to enhance fine-grained target representation, thereby enriching features of small ships. For multi-scale feature fusion, we design a Flow-Aligned Depthwise Efficient Channel Attention Network (FADEN). FADEN achieves precise alignment of features across different resolutions via semantic flow and filters background clutter through lightweight channel attention, further enhancing small-target feature quality. Moreover, we propose an Adaptive Multi-scale Contrastive Denoising (AM-CDN) training paradigm. AM-CDN constructs adaptive perturbation thresholds jointly determined by a target scale factor and a clutter factor, generating contrastive denoising samples that better match the physical characteristics of SAR ships. Finally, extensive experiments on three widely used open SAR ship datasets demonstrate that the proposed method achieves superior detection performance, outperforming current state-of-the-art (SOTA) benchmarks. Full article

(This article belongs to the Special Issue Synthetic Aperture Radar (SAR) Image Object Detection and Information Extraction: Methods and Applications (Second Edition))

27 pages, 7948 KB

Open Access Article

Attention-Driven Time-Domain Convolutional Network for Source Separation of Vocal and Accompaniment

by Zhili Zhao, Min Luo, Xiaoman Qiao, Changheng Shao and Rencheng Sun

Electronics 2025, 14(20), 3982; https://doi.org/10.3390/electronics14203982 (registering DOI) - 11 Oct 2025

Abstract

Time-domain signal models have been widely applied to single-channel music source separation tasks due to their ability to overcome the limitations of fixed spectral representations and phase information loss. However, the high acoustic similarity and synchronous temporal evolution between vocals and accompaniment make accurate separation challenging for existing time-domain models. These challenges are mainly reflected in two aspects: (1) the lack of a dynamic mechanism to evaluate the contribution of each source during feature fusion, and (2) difficulty in capturing fine-grained temporal details, often resulting in local artifacts in the output. To address these issues, we propose an attention-driven time-domain convolutional network for vocal and accompaniment source separation. Specifically, we design an embedding attention module to perform adaptive source weighting, enabling the network to emphasize components more relevant to the target mask during training. In addition, an efficient convolutional block attention module is developed to enhance local feature extraction. This module integrates an efficient channel attention mechanism based on one-dimensional convolution while preserving spatial attention, thereby improving the ability to learn discriminative features from the target audio. Comprehensive evaluations on public music datasets demonstrate the effectiveness of the proposed model and its significant improvements over existing approaches. Full article

(This article belongs to the Section Artificial Intelligence)

21 pages, 4636 KB

Open Access Article

Explainable Few-Shot Anomaly Detection for Real-Time Automotive Quality Control

by Safeh Clinton Mawah, Dagmawit Tadesse Aga, Shahrokh Hatefi, Farouk Smith and Yimesker Yihun

Processes 2025, 13(10), 3238; https://doi.org/10.3390/pr13103238 (registering DOI) - 11 Oct 2025

Abstract

Automotive manufacturing quality control faces persistent challenges such as limited defect samples, cross-domain variability, and the demand for interpretable decision-making. This work presents an explainable few-shot anomaly detection framework that integrates EfficientNet-based feature extraction, adaptive prototype learning, and component-specific attention mechanisms to address these requirements. The system is designed for rapid adaptation to novel defect types while maintaining interpretability through a multi-modal explainable AI module that combines visual, quantitative, and textual outputs. Evaluation on automotive datasets demonstrates promising performance, achieving 99.4% accuracy for engine wiring inspection and 98.8% for gear inspection, with improvements of 5.2–7.6% over state-of-the-art baselines, including traditional unsupervised methods (PaDiM, PatchCore), advanced approaches (FastFlow, CFA, DRAEM), and few-shot supervised methods (ProtoNet, MatchingNet, RelationNet, FEAT), and with only 0.63% cross-domain degradation between the wiring and gear inspection tasks. The architecture operates under real-time industrial constraints, with an average inference time of 18.2 ms, throughput of 60 components per minute, and memory usage below 2 GB on RTX 3080 hardware. Ablation studies confirm the importance of prototype learning (−4.52%), component analyzers (−2.79%), and attention mechanisms (−2.21%), with the K = 5 few-shot configuration providing the best trade-off between accuracy and adaptability. Beyond performance, the framework produces interpretable defect localization, root-cause analysis, and severity-based recommendations designed for integration with manufacturing execution systems via standardized industrial protocols. These results demonstrate a practical and scalable approach to intelligent quality control, enabling robust, interpretable, and adaptive inspection of the evaluated automotive components.
Full article

(This article belongs to the Special Issue Recent Trends in Advanced Manufacturing Technologies for Materials Processing and Production)

22 pages, 5120 KB

Open Access Article

Adapting Gated Axial Attention for Microscopic Hyperspectral Cholangiocarcinoma Image Segmentation

by Jianxia Xue, Xiaojing Chen and Soo-Hyung Kim

Electronics 2025, 14(20), 3979; https://doi.org/10.3390/electronics14203979 (registering DOI) - 11 Oct 2025

Abstract

Accurate segmentation of medical images is essential for clinical diagnosis and treatment planning. Hyperspectral imaging (HSI), with its rich spectral information, enables improved tissue characterization and structural localization compared with traditional grayscale or RGB imaging. However, the effective modeling of both spatial and spectral dependencies remains a significant challenge, particularly in small-scale medical datasets. In this study, we propose GSA-Net, a 3D segmentation framework that integrates Gated Spectral-Axial Attention (GSA) to capture long-range interband dependencies and enhance spectral feature discrimination. The GSA module incorporates multilayer perceptrons (MLPs) and adaptive LayerScale mechanisms to enable the fine-grained modulation of spectral attention across feature channels. We evaluated GSA-Net on a hyperspectral cholangiocarcinoma (CCA) dataset, achieving an average Intersection over Union (IoU) of 60.64 ± 14.48%, Dice coefficient of 74.44 ± 11.83%, and Hausdorff Distance of 76.82 ± 42.77 px. It outperformed state-of-the-art baselines. Further spectral analysis revealed that informative spectral bands are widely distributed rather than concentrated, and full-spectrum input consistently outperforms aggressive band selection, underscoring the importance of adaptive spectral attention for robust hyperspectral medical image segmentation. Full article
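The overlap metrics reported in the abstract above (IoU and Dice) can be illustrated with a minimal sketch. It assumes binary masks supplied as flat sequences of 0/1 values; the function name `iou_and_dice` and the convention for two empty masks are illustrative choices, not taken from the article.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equally sized binary masks (flat sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count pixels that are foreground in both masks, and in each mask separately.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice
```

For a single pair of masks the two scores are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. The identity does not carry over exactly to averages taken across images.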

(This article belongs to the Special Issue Image Segmentation, 2nd Edition)

38 pages, 5895 KB

Open Access Article

Beyond Accuracy: Benchmarking Machine Learning Models for Efficient and Sustainable SaaS Decision Support

by Efthimia Mavridou, Eleni Vrochidou, Michail Selvesakis and George A. Papakostas

Future Internet 2025, 17(10), 467; https://doi.org/10.3390/fi17100467 (registering DOI) - 11 Oct 2025

Abstract

Machine learning (ML) methods have been successfully employed to support decision-making for Software as a Service (SaaS) providers. While most published research emphasizes prediction accuracy, other important aspects, such as cloud deployment efficiency and environmental impact, have received comparatively less attention. It is also critical to account for factors such as training time, prediction time, and carbon footprint in production. SaaS decision support systems use the output of ML models to provide actionable recommendations, such as running reactivation campaigns for users who are likely to churn. To this end, we present a benchmark comparison of 17 ML models for churn prediction in SaaS that covers cloud deployment efficiency metrics (e.g., latency and prediction time) and sustainability metrics (e.g., CO₂ emissions and consumed energy) alongside predictive performance metrics (e.g., AUC and Log Loss). Two public datasets are employed, and experiments are repeated on four different machines, both locally and in the cloud. A new Green Efficiency Weighted Score (GEWS) is introduced as a step towards choosing the simpler, greener, and more efficient ML model. Experimental results indicate that XGBoost and LightGBM offer a good balance of predictive performance, fast training and inference times, and limited emissions, and they confirm the importance of region selection for minimizing the carbon footprint of ML models. Full article

(This article belongs to the Special Issue Distributed Machine Learning and Federated Edge Computing for IoT)

17 pages, 2920 KB

Open Access Article

Frequency Domain Reflectometry for Power Cable Defect Localization: A Comparative Study of FFT and IFFT Methods

by Wenbo Zhu, Baojun Hui, Jianda Li, Tao Han, Linjie Zhao and Shuai Hou

Energies 2025, 18(20), 5346; https://doi.org/10.3390/en18205346 - 10 Oct 2025

Abstract

At present, the development of power cables shows three notable trends: higher voltage, longer distance and more complex environments. Against this backdrop, the limitations of traditional detection techniques in locating local defects have become increasingly apparent. Frequency Domain Reflectometry (FDR) has garnered sustained research attention both domestically and internationally due to its high sensitivity and accuracy in detecting localized defects. This paper aims to compare the defect localization effectiveness of the Fast Fourier Transform (FFT) method and the Inverse Fast Fourier Transform (IFFT) method within FDR. First, the differences between the two methods are analyzed from a theoretical perspective. Then, field tests are conducted on cables of varying voltage levels and lengths, with comparisons made using parameters such as full width at half maximum (FWHM) and signal-to-noise ratio (SNR). The results indicate that the FFT method is more suitable for low-interference or short cables, while the IFFT method is more suitable for high-noise, high-resolution, or long cables. Full article

(This article belongs to the Special Issue Advanced Techniques for Power Transmission, Distribution and Transformation Equipment)

35 pages, 2444 KB

Open Access Review

The Photosynthetic Complexes of Thylakoid Membranes of Photoautotrophs and a Quartet of Their Polar Lipids

by Anatoly Zhukov and Vadim Volkov

Int. J. Mol. Sci. 2025, 26(20), 9869; https://doi.org/10.3390/ijms26209869 (registering DOI) - 10 Oct 2025

Abstract

The important function of polar lipids in the biochemical chains of photosynthesis, the outstanding biochemical process on our planet, has been mentioned in many publications. Over the last several years, apart from the known function of lipids in creating a matrix for photosynthetic complexes, most attention has been paid to the role of lipids in the assembly and functioning of the photosynthetic complexes. Lipid molecules are found inside the complexes of photosystem II (PSII), photosystem I (PSI), and cytochrome b₆f (Cyt b₆f) together with other cofactors that accompany proteins and chlorophyll molecules. The supercomplexes PSII-light-harvesting complex II (PSII-LHCII) and PSI-light-harvesting complex I (PSI-LHCI) also include lipid molecules; some of these are located at the borders between the separate monomers of the complexes. Our interest is in the exact localization of lipid molecules inside the monomers: which protein subunits have lipid molecules between them, and how do the lipids directly contact the amino acids of the proteins? The photosystems include very few of the polar lipid classes: three groups of glyceroglycolipids and one group of glycerophospholipids make up the quartet of polar lipids. Why have they been selected for this role? There is no doubt that the polar heads and the fatty acid chains of these lipids take part in the processes of photosynthesis. However, what are the distinct roles of each of them? The advantages and disadvantages of the head groups of thylakoid membrane lipids, and of those lipids that for various reasons could not take their place, are discussed. Attention is focused on the bound fatty acids that predominate in, or are characteristic of, each class of thylakoid lipids.
Emphasis is also placed on the content of each of the four lipids in all photosynthetic complexes, as well as on contacts of head groups and acyl chains of lipids with specific proteins, transmembrane chains, and their amino acids. This article is devoted to the search for answers to the questions posed. Full article

(This article belongs to the Special Issue New Insights of Fungal and Plant Lipids: Structural Diversity, Metabolism and Applications)

29 pages, 19561 KB

Open Access Article

Empirical Analysis of the Impact of Two Key Parameters of the Harmony Search Algorithm on Performance

by Geonhee Lee and Zong Woo Geem

Mathematics 2025, 13(20), 3248; https://doi.org/10.3390/math13203248 - 10 Oct 2025

Abstract

Metaheuristic algorithms are widely utilized as effective tools for solving complex optimization problems. Among them, the Harmony Search (HS) algorithm has garnered significant attention for its simple structure and excellent performance. The efficacy of the HS algorithm is heavily dependent on the configuration of its internal parameters, with the Harmony Memory Considering Rate (HMCR) and Pitch Adjusting Rate (PAR) playing pivotal roles. These parameters determine the probabilities of using the Random Generation (RG), Harmony Memory Consideration (HMC), and Pitch Adjustment (PA) operators, thereby controlling the balance between exploration and exploitation. However, a systematic empirical analysis of the interaction between these parameters and the characteristics of the problem at hand remains insufficient. Thus, this study conducts a comprehensive empirical analysis of the performance sensitivity of the HS algorithm to variations in HMCR and PAR values. The analysis is performed on a suite of 23 benchmark functions, encompassing diverse characteristics such as unimodality/multimodality and separability/non-separability, along with 5 real-world optimization problems. Through extensive experimentation, the performance for each parameter combination was evaluated on a rank-based system and visualized using heatmaps. The results experimentally demonstrate that the algorithm’s performance is most sensitive to the HMCR value across all function types, establishing that setting a high HMCR value (≥0.9) is a prerequisite for securing stable performance. Conversely, the optimal PAR value showed a direct correlation with the topographical features of the problem landscape. For unimodal problems, a low PAR value between 0.1 and 0.3 was more effective, whereas for complex multimodal problems with numerous local optima, a relatively higher PAR value between 0.3 and 0.5 proved more efficient in searching for the global optimum. 
This research provides guidance on the parameter settings of the HS algorithm and contributes to enhancing its practical applicability by proposing a systematic parameter tuning strategy based on problem characteristics. Full article
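How HMCR and PAR set the probabilities of the three operators can be sketched as follows. This is a minimal textbook-style Harmony Search, not the authors' experimental code. The `bandwidth` parameter, the memory size, the iteration count, and the `sphere` test function are assumptions introduced for illustration.

```python
import random


def operator_probabilities(hmcr, par):
    """Per-variable probability of each operator for given HMCR and PAR."""
    return {
        "RG": 1.0 - hmcr,            # Random Generation
        "HMC": hmcr * (1.0 - par),   # Harmony Memory Consideration only
        "PA": hmcr * par,            # memory consideration followed by Pitch Adjustment
    }


def harmony_search(cost, dim, lower, upper, hmcr=0.9, par=0.3,
                   bandwidth=0.05, memory_size=10, iterations=2000, seed=0):
    """Minimize `cost` over [lower, upper]^dim; returns (best_harmony, best_cost)."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(memory_size)]
    costs = [cost(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d in range(dim):
            if rng.random() < hmcr:
                # HMC: reuse the value of a randomly chosen stored harmony.
                value = memory[rng.randrange(memory_size)][d]
                if rng.random() < par:
                    # PA: small perturbation (bandwidth is an assumed setting).
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (upper - lower)
                    value = min(upper, max(lower, value))
            else:
                # RG: draw a fresh value from the whole range.
                value = rng.uniform(lower, upper)
            new.append(value)
        new_cost = cost(new)
        worst = max(range(memory_size), key=costs.__getitem__)
        if new_cost < costs[worst]:
            # Replace the worst stored harmony when the new one is better.
            memory[worst], costs[worst] = new, new_cost
    best = min(range(memory_size), key=costs.__getitem__)
    return memory[best], costs[best]


def sphere(x):
    """Simple unimodal, separable test function (illustrative only)."""
    return sum(v * v for v in x)
```

With HMCR = 0.9 and PAR = 0.3, a variable is drawn at random 10% of the time, copied from memory unchanged 63% of the time, and copied then pitch-adjusted 27% of the time. The sketch only shows where the two parameters enter the algorithm and does not reproduce the reported sensitivity results.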

20 pages, 1579 KB

Open Access Article

Towards Trustworthy and Explainable-by-Design Large Language Models for Automated Teacher Assessment

by Yuan Li, Hang Yang and Quanrong Fang

Information 2025, 16(10), 882; https://doi.org/10.3390/info16100882 (registering DOI) - 10 Oct 2025

Abstract

Conventional teacher assessment is labor-intensive and subjective. Prior LLM-based systems improve scale but rely on post hoc rationales and lack built-in trust controls. We propose an explainable-by-design framework that couples (i) Dual-Lens Hierarchical Attention—a global lens aligned to curriculum standards and a local lens aligned to subject-specific rubrics—with (ii) a Trust-Gated Inference module that combines Monte-Carlo-dropout calibration and adversarial debiasing, and (iii) an On-the-Spot Explanation generator that shares the same fused representation and predicted score used for decision making. Thus, explanations are decision-consistent and curriculum-anchored rather than retrofitted. On TeacherEval-2023, EdNet-Math, and MM-TBA, our model attains an Inter-Rater Consistency of 82.4%, Explanation Credibility of 0.78, Fairness Gap of 1.8%, and Expected Calibration Error of 0.032. Faithfulness is verified via attention-to-rubric alignment (78%) and counterfactual deletion tests, while trust gating reduces confidently wrong outputs and triggers reject-and-refer when uncertainty is high. The system retains 99.6% accuracy under cross-domain transfer and degrades only 4.1% with 15% ASR noise, reducing human review workload by 41%. This establishes a reproducible path to trustworthy and pedagogy-aligned LLMs for high-stakes educational evaluation. Full article

(This article belongs to the Special Issue Advancing Educational Innovation with Artificial Intelligence)

19 pages, 8850 KB

Open Access Article

Intelligent Defect Recognition of Glazed Components in Ancient Buildings Based on Binocular Vision

by Youshan Zhao, Xiaolan Zhang, Ming Guo, Haoyu Han, Jiayi Wang, Yaofeng Wang, Xiaoxu Li and Ming Huang

Buildings 2025, 15(20), 3641; https://doi.org/10.3390/buildings15203641 (registering DOI) - 10 Oct 2025

Abstract

Glazed components in ancient Chinese architecture hold profound historical and cultural value. However, over time, environmental erosion, physical impacts, and human disturbances gradually lead to various forms of damage, severely impacting the durability and stability of the buildings. Therefore, preventive protection of glazed components is crucial. The key to preventive protection lies in the early detection and repair of damage, thereby extending the component’s service life and preventing significant structural damage. To address this challenge, this study proposes a Restoration-Scale Identification (RSI) method that integrates depth information. By combining RGB-D images acquired from a depth camera with intrinsic camera parameters, and embedding a Convolutional Block Attention Module (CBAM) into the backbone network, the method dynamically enhances critical feature regions. It then employs a scale restoration strategy to accurately identify damage areas and recover the physical dimensions of glazed components from a global perspective. In addition, we constructed a dedicated semantic segmentation dataset for glazed tile damage, focusing on cracks and spalling. Both qualitative and quantitative evaluation results demonstrate that, compared with various high-performance semantic segmentation methods, our approach significantly improves the accuracy and robustness of damage detection in glazed components. The achieved accuracy deviates by only ±10 mm from high-precision laser scanning, a level of precision that is essential for reliably identifying and assessing subtle damages in complex glazed architectural elements. 
By integrating depth information, real-scale information can be obtained during intelligent recognition, so that the type and size of damage to glazed components are identified efficiently and accurately and two-dimensional (2D) pixel coordinates are converted to local three-dimensional (3D) coordinates. This provides a scientific basis for the protection and restoration of ancient buildings and helps ensure the long-term stability of cultural heritage and the inheritance of its historical value. Full article
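The conversion from 2D pixel coordinates to local 3D coordinates mentioned above can be sketched with the standard pinhole camera model. The abstract does not state which camera model the authors use, so the pinhole model is an assumption here. The function names and the intrinsic values in the usage note are hypothetical.

```python
import math


def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a measured depth into camera coordinates.

    fx, fy are focal lengths in pixels and cx, cy is the principal point.
    The result uses the same length unit as `depth`.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def physical_distance(p, q):
    """Euclidean distance between two 3D points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

As a usage illustration, with hypothetical intrinsics fx = fy = 500 px and principal point (320, 240), two pixels 100 px apart on the same image row, both at a depth of 1000 mm, back-project to points 200 mm apart. A damage region's pixel extent can be turned into a physical size in this way.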

(This article belongs to the Section Building Materials, and Repair & Renovation)

38 pages, 13748 KB

Open Access Article

MH-WMG: A Multi-Head Wavelet-Based MobileNet with Gated Linear Attention for Power Grid Fault Diagnosis

by Yousef Alkhanafseh, Tahir Cetin Akinci, Alfredo A. Martinez-Morales, Serhat Seker and Sami Ekici

Appl. Sci. 2025, 15(20), 10878; https://doi.org/10.3390/app152010878 - 10 Oct 2025

Abstract

Artificial intelligence is increasingly embedded in power systems to boost efficiency, reliability, and automation. This study introduces an end-to-end, AI-driven fault-diagnosis pipeline built around a Multi-Head Wavelet-based MobileNet with Gated Linear Attention (MH-WMG). The network takes time-series signals converted into images as input and branches into three heads that, respectively, localize the fault area, classify the fault type, and predict the distance bin for all short-circuit faults. Evaluation employs the canonical Kundur two-area four-machine system, partitioned into six regions, twelve fault scenarios (including normal operation), and twelve predefined distance bins. MH-WMG achieves high performance: perfect accuracy, precision, recall, and F1 (1.00) for fault-area detection; strong fault-type identification (accuracy = 0.9604, precision = 0.9625, recall = 0.9604, and F1 = 0.9601); and robust distance-bin prediction (accuracy = 0.8679, precision = 0.8725, recall = 0.8679, and F1 = 0.8690). The model is compact and fast (2.33 M parameters, 44.14 ms latency, 22.66 images/s) and outperforms baselines in both accuracy and efficiency. The pipeline decisively outperforms conventional time-series methods. By rapidly pinpointing and classifying faults with high fidelity, it enhances grid resilience, reduces operational risk, and enables more stable, intelligent operation, demonstrating the value of AI-driven fault detection for future power-system reliability. Full article

14 pages, 10073 KB

Open Access Article

Numerical Simulation of the Wind Speed Field Around Suburban Residential Buildings with Different Arrangements

by Xuchong Yi and Shuangxi Zhang

Symmetry 2025, 17(10), 1699; https://doi.org/10.3390/sym17101699 - 10 Oct 2025

Viewed by 25

Abstract

The wind environment in furnace cities has attracted considerable research attention. Investigating how suburban residential building arrangements in furnace cities affect inter-building wind speed fields is a useful and cost-effective way to optimize layouts scientifically. This study simulated 13 wind speed fields across six symmetric and asymmetric building arrangements (linear, inclined, convex, concave, M-shaped, and V-shaped) with varying building offsets and spacing widths. The simulations used the standard k–ε model with the finite element method. Results demonstrated that larger building offsets enhanced inter-building wind speeds and that, among the configurations, the concave arrangement enhanced the wind speed between buildings most effectively. V-shaped arrangements slightly underperformed concave layouts in wind speed uniformity. Based on summer wind direction data from the Wuhan Tianhe Meteorological Station, we propose two corresponding layouts, the concave and V-shaped arrangements, which are conducive to enhancing inter-building wind speed. In practical planning, the orientation of building clusters can be adjusted according to the local wind rose diagram. Full article

(This article belongs to the Special Issue Symmetry in Finite Element Modeling and Mechanics)

24 pages, 1545 KB

Open Access Article

Curvature-Aware Point-Pair Signatures for Robust Unbalanced Point Cloud Registration

by Xinhang Hu, Zhao Zeng, Jiwei Deng, Guangshuai Wang, Jiaqi Yang and Siwen Quan

Sensors 2025, 25(20), 6267; https://doi.org/10.3390/s25206267 - 10 Oct 2025

Viewed by 17

Abstract

Existing point cloud registration methods can effectively handle large-scale and partially overlapping point cloud pairs. However, registering unbalanced point cloud pairs with significant disparities in spatial extent and point density remains a challenging problem that has received limited research attention. This challenge primarily arises from the difficulty of achieving accurate local registration when the point clouds exhibit substantial scale variations and uneven density distributions. This paper presents a novel registration method for unbalanced point cloud pairs that uses a local point cluster structure feature for effective outlier rejection. The fundamental principle underlying our method is that the internal structure of a local cluster, comprising a point and its K-nearest neighbors, is preserved under rigid transformation and therefore remains invariant across different point clouds. The proposed pipeline operates in four sequential stages. First, keypoints are detected in both the source and target point clouds. Second, local feature descriptors are employed to establish initial one-to-many correspondences, a strategy that increases correspondence redundancy to enlarge the pool of potential inliers. Third, the proposed Local Point Cluster Structure Feature is applied to filter outliers from the initial correspondences. Finally, transformation hypotheses are generated and evaluated using RANSAC. To validate the efficacy of the proposed method, we construct a carefully designed benchmark named KITTI-UPP (KITTI-Unbalanced Point cloud Pairs) based on the KITTI odometry dataset. We further evaluate our method on the real-world TIESY dataset, a LiDAR-scanned dataset collected by the Third Railway Survey and Design Institute Group Co. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art methods in terms of both registration success rate and computational efficiency on the KITTI-UPP benchmark.
Moreover, it achieves competitive results on the real-world TIESY dataset, confirming its applicability and generalizability across diverse real-world scenarios. Full article

(This article belongs to the Section Sensing and Imaging)

17 pages, 7150 KB

Open Access Article

DeepFishNET+: A Dual-Stream Deep Learning Framework for Robust Underwater Fish Detection and Classification

by Mahdi Hamzaoui, Mokhtar Rejili, Mohamed Ould-Elhassen Aoueileyine and Ridha Bouallegue

Appl. Sci. 2025, 15(20), 10870; https://doi.org/10.3390/app152010870 - 10 Oct 2025

Viewed by 85

Abstract

The conservation and protection of fish species are crucial tasks for aquaculture and marine biology. Recognizing fish in underwater environments is highly challenging due to poor lighting and the visual similarity between fish and the background. Conventional recognition methods are extremely time-consuming and often yield unsatisfactory accuracy. This paper proposes a new method called DeepFishNET+. First, an Underwater Image Enhancement module was implemented for image correction. Second, a Global CNN Stream (ResNet50) and a Local Transformer Stream were implemented to generate the feature map and feature vector. Next, feature fusion was performed in the Cross-Attention Feature Fusion module. Finally, YOLOv8 was used for fish detection and localization, and softmax was applied for species recognition. This approach achieved a classification precision of 98.28% and a detection precision of 92.74%. Full article

(This article belongs to the Special Issue Advances in Aquatic Animal Nutrition and Aquaculture)

19 pages, 3418 KB

Open Access Article

WSVAD-CLIP: Temporally Aware and Prompt Learning with CLIP for Weakly Supervised Video Anomaly Detection

by Min Li, Jing Sang, Yuanyao Lu and Lina Du

J. Imaging 2025, 11(10), 354; https://doi.org/10.3390/jimaging11100354 - 10 Oct 2025

Viewed by 43

Abstract

Weakly Supervised Video Anomaly Detection (WSVAD) is a critical task in computer vision. It aims to localize and recognize abnormal behaviors using only video-level labels. Without frame-level annotations, it becomes significantly challenging to model temporal dependencies. Given the diversity of abnormal events, it is also difficult to model semantic representations. Recently, the cross-modal pre-trained model Contrastive Language-Image Pretraining (CLIP) has shown a strong ability to align visual and textual information. This provides new opportunities for video anomaly detection. Inspired by CLIP, WSVAD-CLIP is proposed as a framework that uses its cross-modal knowledge to bridge the semantic gap between text and vision. First, the Axial-Graph (AG) Module is introduced. It combines an Axial Transformer and Lite Graph Attention Networks (LiteGAT) to capture global temporal structures and local abnormal correlations. Second, a Text Prompt mechanism is designed. It fuses a learnable prompt with a knowledge-enhanced prompt to improve the semantic expressiveness of category embeddings. Third, the Abnormal Visual-Guided Text Prompt (AVGTP) mechanism is proposed to aggregate anomalous visual context for adaptively refining textual representations. Extensive experiments on UCF-Crime and XD-Violence datasets show that WSVAD-CLIP notably outperforms existing methods in coarse-grained anomaly detection. It also achieves superior performance in fine-grained anomaly recognition tasks, validating its effectiveness and generalizability. Full article

(This article belongs to the Section Computer Vision and Pattern Recognition)
