Search Results (511)

Search Parameters:
Keywords = pyramid training

27 pages, 11177 KiB  
Article
Robust Segmentation of Lung Proton and Hyperpolarized Gas MRI with Vision Transformers and CNNs: A Comparative Analysis of Performance Under Artificial Noise
by Ramtin Babaeipour, Matthew S. Fox, Grace Parraga and Alexei Ouriadov
Bioengineering 2025, 12(8), 808; https://doi.org/10.3390/bioengineering12080808 - 28 Jul 2025
Abstract
Accurate segmentation in medical imaging is essential for disease diagnosis and monitoring, particularly in lung imaging using proton and hyperpolarized gas MRI. However, image degradation due to noise and artifacts—especially in hyperpolarized gas MRI, where scans are acquired during breath-holds—poses challenges for conventional segmentation algorithms. This study evaluates the robustness of deep learning segmentation models under varying Gaussian noise levels, comparing traditional convolutional neural networks (CNNs) with modern Vision Transformer (ViT)-based models. Using a dataset of proton and hyperpolarized gas MRI slices from 56 participants, we trained and tested Feature Pyramid Network (FPN) and U-Net architectures with both CNN (VGG16, VGG19, ResNet152) and ViT (MiT-B0, B3, B5) backbones. Results showed that ViT-based models, particularly those using the SegFormer backbone, consistently outperformed CNN-based counterparts across all metrics and noise levels. The performance gap was especially pronounced in high-noise conditions, where transformer models retained higher Dice scores and lower boundary errors. These findings highlight the potential of ViT-based architectures for deployment in clinically realistic, low-SNR environments such as hyperpolarized gas MRI, where segmentation reliability is critical.
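As a generic illustration (not the authors' code), the Dice score used to compare these segmentation models can be computed for a pair of binary masks as follows:

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient for binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return float(2.0 * intersection / (pred.sum() + truth.sum() + eps))

# Two 4x4 toy masks: rows 0-1 vs. rows 1-2 -> the overlap is one row of 4 pixels.
a = np.zeros((4, 4), dtype=bool); a[:2] = True
b = np.zeros((4, 4), dtype=bool); b[1:3] = True
# dice_score(a, b) = 2 * 4 / (8 + 8) = 0.5
```

Unlike pixel accuracy, Dice is insensitive to the large background class, which is why it is the usual headline metric for lung segmentation.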

18 pages, 1941 KiB  
Article
Design of Virtual Sensors for a Pyramidal Weathervaning Floating Wind Turbine
by Hector del Pozo Gonzalez, Magnus Daniel Kallinger, Tolga Yalcin, José Ignacio Rapha and Jose Luis Domínguez-García
J. Mar. Sci. Eng. 2025, 13(8), 1411; https://doi.org/10.3390/jmse13081411 - 24 Jul 2025
Abstract
This study explores virtual sensing techniques for the Eolink floating offshore wind turbine (FOWT), which features a pyramidal platform and a single-point mooring system that enables weathervaning to maximize power production and reduce structural loads. To address the challenges and costs associated with monitoring submerged components, virtual sensors are investigated as an alternative to physical instrumentation. The main objective is to design a virtual sensor for mooring hawser loads using a reduced set of input features from GPS, anemometer, and inertial measurement unit (IMU) data. A virtual sensor is also proposed to estimate the bending moment at the joint of the pyramid masts. The FOWT is modeled in OrcaFlex, and a range of load cases is simulated for training and testing. Under defined sensor sampling conditions, both supervised and physics-informed machine learning algorithms are evaluated. The models are tested under aligned and misaligned environmental conditions, as well as in below-rated and above-rated operating regimes. Results show that mooring tensions can be estimated with high accuracy, while bending moment predictions also perform well, though with lower precision. These findings support the use of virtual sensing to reduce instrumentation requirements in critical areas of the floating wind platform.

17 pages, 3708 KiB  
Article
YOLOv8-DBW: An Improved YOLOv8-Based Algorithm for Maize Leaf Diseases and Pests Detection
by Xiang Gan, Shukun Cao, Jin Wang, Yu Wang and Xu Hou
Sensors 2025, 25(15), 4529; https://doi.org/10.3390/s25154529 - 22 Jul 2025
Abstract
To solve the challenges of low detection accuracy of maize pests and diseases, complex detection models, and difficulty in deployment on mobile or embedded devices, an improved YOLOv8 algorithm was proposed. Based on the original YOLOv8n, the algorithm replaced the Conv module with the DSConv module in the backbone network, which reduced the backbone network's parameters and computational load while improving detection accuracy. Additionally, BiFPN was introduced to construct a bidirectional feature pyramid structure, which realized efficient information flow and fusion between features at different scales and enhanced the feature fusion ability of the model. At the same time, the Wise-IoU loss function was combined to optimize the training process, which improved the convergence speed and regression accuracy of the loss function. The experimental results showed that the precision, recall, and mAP0.5 of the improved algorithm were improved by 1.4%, 1.1%, and 1.5%, respectively, compared with YOLOv8n, and the model parameters and computational costs were reduced by 6.6% and 7.3%, respectively. The experimental results demonstrate the effectiveness and superiority of the improved YOLOv8 algorithm, which provides an efficient, accurate, and easy-to-deploy solution for maize leaf disease and pest detection.
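The parameter savings behind depthwise separable convolutions of the kind DSConv builds on can be sketched with simple counting; the kernel and channel sizes below are illustrative, not taken from the paper:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Parameters of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def dsconv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise separable: one k x k filter per input channel, then a
    1 x 1 pointwise convolution to mix channels."""
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3 x 3 kernel, 64 -> 128 channels.
standard = conv_params(3, 64, 128)     # 3*3*64*128 = 73728
separable = dsconv_params(3, 64, 128)  # 3*3*64 + 64*128 = 8768
```

The roughly 8x reduction in this toy layer is the mechanism by which such backbones shrink while keeping accuracy.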
(This article belongs to the Section Smart Agriculture)

20 pages, 3898 KiB  
Article
Synergistic Multi-Model Approach for GPR Data Interpretation: Forward Modeling and Robust Object Detection
by Hang Zhang, Zhijie Ma, Xinyu Fan and Feifei Hou
Remote Sens. 2025, 17(14), 2521; https://doi.org/10.3390/rs17142521 - 20 Jul 2025
Abstract
Ground penetrating radar (GPR) is widely used for subsurface object detection, but manual interpretation of hyperbolic features in B-scan images remains inefficient and error-prone. In addition, traditional forward modeling methods suffer from low computational efficiency and strong dependence on field measurements. To address these challenges, we propose an unsupervised data augmentation framework that utilizes a CycleGAN-based model to generate diverse synthetic B-scan images by simulating varying geological parameters and scanning configurations. This approach achieves GPR data forward modeling and enhances the scenario coverage of training data. We then apply the EfficientDet architecture, which incorporates a bidirectional feature pyramid network (BiFPN) for multi-scale feature fusion, to enhance the detection of hyperbolic signatures in B-scan images under challenging conditions such as partial occlusions and background noise. The proposed method achieves a mean average precision (mAP) of 0.579 on synthetic datasets, outperforming YOLOv3 and RetinaNet by 16.0% and 23.5%, respectively, while maintaining robust multi-object detection in complex field conditions.
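BiFPN's weighted fusion, as described in the EfficientDet paper, blends resized feature maps with ReLU-normalized learnable scalar weights; a minimal numpy sketch with toy feature maps (the inputs here are illustrative stand-ins for two pyramid levels):

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """EfficientDet-style fast normalized fusion: ReLU the learnable scalar
    weights, normalize them to sum to ~1, and blend the (already resized)
    feature maps."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps weights non-negative
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))

# Toy inputs standing in for two feature levels resized to the same shape.
p_high = np.ones((2, 2))
p_low = 3.0 * np.ones((2, 2))
fused = fast_normalized_fusion([p_high, p_low], [1.0, 1.0])  # elementwise ~2.0
```

The normalization keeps the fusion a convex combination, so training can learn how much each scale contributes without a softmax.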
(This article belongs to the Special Issue Advanced Ground-Penetrating Radar (GPR) Technologies and Applications)

21 pages, 4936 KiB  
Article
A Lightweight Pavement Defect Detection Algorithm Integrating Perception Enhancement and Feature Optimization
by Xiang Zhang, Xiaopeng Wang and Zhuorang Yang
Sensors 2025, 25(14), 4443; https://doi.org/10.3390/s25144443 - 17 Jul 2025
Abstract
To address the current issue of large computations and the difficulty in balancing model complexity and detection accuracy in pavement defect detection models, a lightweight pavement defect detection algorithm, PGS-YOLO, is proposed based on YOLOv8, which integrates perception enhancement and feature optimization. The algorithm first designs the Receptive-Field Convolutional Block Attention Module Convolution (RFCBAMConv) and the Receptive-Field Convolutional Block Attention Module C2f-RFCBAM, based on which we construct an efficient Perception Enhanced Feature Extraction Network (PEFNet) that enhances multi-scale feature extraction capability by dynamically adjusting the receptive field. Secondly, the dynamic upsampling module DySample is introduced into the efficient feature pyramid, constructing a new feature fusion pyramid (Generalized Dynamic Sampling Feature Pyramid Network, GDSFPN) to optimize the multi-scale feature fusion effect. In addition, a shared detail-enhanced convolution lightweight detection head (SDCLD) was designed, which significantly reduces the model's parameters and computation while improving localization and classification performance. Finally, Wise-IoU was introduced to optimize the training performance and detection accuracy of the model. Experimental results show that PGS-YOLO increases mAP50 by 2.8% and 2.9% on the complete GRDDC2022 dataset and the Chinese subset, respectively, outperforming the other detection models. The number of parameters and computations are reduced by 10.3% and 9.9%, respectively, compared to the YOLOv8n model, with an average frame rate of 69 frames per second, offering good real-time performance. In addition, on the CRACK500 dataset, PGS-YOLO improved mAP50 by 2.3%, achieving a better balance between model complexity and detection accuracy.
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))

15 pages, 1142 KiB  
Technical Note
Terrain and Atmosphere Classification Framework on Satellite Data Through Attentional Feature Fusion Network
by Antoni Jaszcz and Dawid Połap
Remote Sens. 2025, 17(14), 2477; https://doi.org/10.3390/rs17142477 - 17 Jul 2025
Abstract
Analysis of surface, terrain, and even atmosphere from images or image fragments is important because it enables further processing, and satellite and drone imagery demands particular attention. Classifying image elements into given classes yields information about a space for autonomous systems, identifies landscape elements, and supports monitoring and maintenance of infrastructure and the environment. Hence, in this paper, we propose a neural classifier architecture that analyzes different features through parallel processing branches and combines them with a feature fusion mechanism. The model extracts different types of features, focusing on spatial structure, local patterns, and multi-scale representation. In addition, the classifier is guided by attention mechanisms over channels and spatial positions, as well as a feature pyramid mechanism. Atrous convolutional operators are also used in the architecture as better context feature extractors. The proposed classifier is the main element of the modeled framework for satellite data analysis, which supports training tailored to the client's needs. The proposed methodology was evaluated on three publicly available remote sensing classification datasets: satellite images, Visual Terrain Recognition, and USTC SmokeRS, where the proposed model achieved accuracy scores of 97.8%, 100.0%, and 92.4%, respectively. The obtained results indicate the effectiveness of the proposed attention mechanisms across different remote sensing challenges.
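The context-capturing effect of atrous (dilated) convolution comes from enlarging a kernel's spatial extent without adding parameters; a small sketch of the standard effective-kernel formula (the dilation rates below are illustrative, not the paper's):

```python
def effective_kernel(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel at a given dilation rate:
    k + (k - 1) * (dilation - 1)."""
    return k + (k - 1) * (dilation - 1)

# A 3 x 3 atrous kernel at increasing rates covers ever-larger context
# with the same 9 parameters per channel.
extents = [effective_kernel(3, d) for d in (1, 6, 12, 18)]  # [3, 13, 25, 37]
```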

18 pages, 2200 KiB  
Article
A Self-Supervised Adversarial Deblurring Face Recognition Network for Edge Devices
by Hanwen Zhang, Myun Kim, Baitong Li and Yanping Lu
J. Imaging 2025, 11(7), 241; https://doi.org/10.3390/jimaging11070241 - 15 Jul 2025
Abstract
With the advancement of information technology, human activity recognition (HAR) has been widely applied in fields such as intelligent surveillance, health monitoring, and human–computer interaction. As a crucial component of HAR, facial recognition plays a key role, especially in vision-based activity recognition. However, current facial recognition models on the market perform poorly in handling blurry images and dynamic scenarios, limiting their effectiveness in real-world HAR applications. This study aims to construct a fast and accurate facial recognition model based on novel adversarial learning and deblurring theory to enhance its performance in human activity recognition. The model employs a generative adversarial network (GAN) as the core algorithm, optimizing its generation and recognition modules by decomposing the global loss function and incorporating a feature pyramid, thereby solving the balance challenge in GAN training. Additionally, deblurring techniques are introduced to improve the model's ability to handle blurry and dynamic images. Experimental results show that the proposed model achieves high accuracy and recall rates across multiple facial recognition datasets, with an average recall rate of 87.40% and accuracy rates of 81.06% and 79.77% on the YTF, IMDB-WIKI, and WiderFace datasets, respectively. These findings confirm that the model effectively addresses the challenges of recognizing faces in dynamic and blurry conditions in human activity recognition, demonstrating significant application potential.
(This article belongs to the Special Issue Techniques and Applications in Face Image Analysis)

22 pages, 6645 KiB  
Article
Visual Detection on Aircraft Wing Icing Process Using a Lightweight Deep Learning Model
by Yang Yan, Chao Tang, Jirong Huang, Zhixiong Cen and Zonghong Xie
Aerospace 2025, 12(7), 627; https://doi.org/10.3390/aerospace12070627 - 12 Jul 2025
Abstract
Aircraft wing icing significantly threatens aviation safety, causing substantial losses to the aviation industry each year. High transparency and blurred edges of icing areas in wing images pose challenges to wing icing detection by machine vision. To address these challenges, this study proposes a detection model, Wing Icing Detection DeeplabV3+ (WID-DeeplabV3+), for efficient and precise detection of aircraft wing leading-edge icing under natural lighting conditions. WID-DeeplabV3+ adopts the lightweight MobileNetV3 as its backbone network to enhance the extraction of edge features in icing areas. Ghost Convolution and Atrous Spatial Pyramid Pooling modules are incorporated to reduce model parameters and computational complexity. The model is optimized using transfer learning, with pre-trained weights used to accelerate convergence and enhance performance. Experimental results show that WID-DeeplabV3+ segments the icing edge in a 1920 × 1080 image within 0.03 s. The model achieves an accuracy of 97.15%, an IoU of 94.16%, a precision of 97%, and a recall of 96.96%, representing respective improvements of 1.83%, 3.55%, 1.79%, and 2.04% over DeeplabV3+. The number of parameters and the computational complexity are reduced by 92% and 76%, respectively. With high accuracy, superior IoU, and fast inference speed, WID-DeeplabV3+ provides an effective solution for wing icing detection.
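The IoU metric reported here can be illustrated for binary masks in a few lines of plain Python (a generic sketch, not the authors' evaluation code):

```python
def iou(mask_a, mask_b):
    """Intersection over Union for two flattened binary masks."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 1.0

# Toy flattened masks: 2 shared pixels, 4 pixels in the union -> IoU = 0.5.
m1 = [1, 1, 1, 0, 0]
m2 = [0, 1, 1, 1, 0]
```

IoU penalizes both missed and spurious pixels, which is why it is a stricter score than per-pixel accuracy for thin structures such as icing edges.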
(This article belongs to the Section Aeronautics)

21 pages, 7528 KiB  
Article
A Fine-Tuning Method via Adaptive Symmetric Fusion and Multi-Graph Aggregation for Human Pose Estimation
by Yinliang Shi, Zhaonian Liu, Bin Jiang, Tianqi Dai and Yuanfeng Lian
Symmetry 2025, 17(7), 1098; https://doi.org/10.3390/sym17071098 - 9 Jul 2025
Abstract
Human Pose Estimation (HPE) aims to accurately locate the positions of human key points in images or videos. However, the performance of HPE is often significantly reduced in practical application scenarios due to environmental interference. To address this challenge, we propose a ladder side-tuning method for the Vision Transformer (ViT) pre-trained model based on multi-path feature fusion to improve the accuracy of HPE in highly interfering environments. First, we extract global features, frequency features, and multi-scale spatial features through the ViT pre-trained model, a discrete wavelet convolutional network, and an atrous spatial pyramid pooling network (ASPP). By comprehensively capturing information about the human body and the environment, the model's ability to analyze local details, textures, and spatial information is enhanced. To fuse these features efficiently, we devise an adaptive symmetric feature fusion strategy, which dynamically adjusts the intensity of feature fusion according to the similarity among features to achieve the optimal fusion effect. Finally, a multi-graph feature aggregation method is developed. We construct graph structures of the different features and deeply explore the subtle differences among them through a dual fusion mechanism of points and edges to ensure information integrity. The experimental results demonstrate that our method achieves 4.3% and 4.2% improvements in the AP metric on the MS COCO dataset and a custom high-interference dataset, respectively, compared with HRNet. This highlights its superiority for human pose estimation tasks in both general and interfering environments.
(This article belongs to the Special Issue Symmetry and Asymmetry in Computer Vision and Graphics)

24 pages, 2149 KiB  
Article
STA-3D: Combining Spatiotemporal Attention and 3D Convolutional Networks for Robust Deepfake Detection
by Jingbo Wang, Jun Lei, Shuohao Li and Jun Zhang
Symmetry 2025, 17(7), 1037; https://doi.org/10.3390/sym17071037 - 1 Jul 2025
Abstract
Recent advancements in deep learning have driven the rapid proliferation of deepfake generation techniques, raising substantial concerns over digital security and trustworthiness. Most current detection methods primarily focus on spatial or frequency domain features but show limited effectiveness when dealing with compressed videos and cross-dataset scenarios. Observing that mainstream generation methods use frame-by-frame synthesis without adequate temporal consistency constraints, we introduce the Spatiotemporal Attention 3D Network (STA-3D), a novel framework that combines a lightweight spatiotemporal attention module with a 3D convolutional architecture to improve detection robustness. The proposed attention module adopts a symmetric multi-branch architecture, where each branch follows a nearly identical processing pipeline to separately model temporal-channel, temporal-spatial, and intra-spatial correlations. Our framework additionally implements Spatial Pyramid Pooling (SPP) layers along the temporal axis, enabling adaptive modeling regardless of input video length. Furthermore, we mitigate the inherent asymmetry in the quantity of authentic and forged samples by replacing standard cross entropy with focal loss for training. This integration facilitates the simultaneous exploitation of inter-frame temporal discontinuities and intra-frame spatial artifacts, achieving competitive performance across various benchmark datasets under different compression conditions: for the intra-dataset setting on FF++, it improves the average accuracy by 1.09 percentage points compared to existing SOTA, with a more significant gain of 1.63 percentage points under the most challenging C40 compression level (particularly for NeuralTextures, achieving an improvement of 4.05 percentage points); for the cross-dataset setting, AUC is enhanced by 0.24 percentage points on the DFDC-P dataset.
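Focal loss, substituted here for standard cross entropy to handle the real/fake class imbalance, down-weights well-classified examples; a minimal binary-case sketch (the defaults γ = 2 and α = 0.25 follow the original focal loss paper, not necessarily this one):

```python
import math

def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t), where p_t
    is the predicted probability of the true class."""
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less loss than a
# badly misclassified one, so abundant easy samples stop dominating training.
easy = focal_loss(0.9, 1)  # well classified: heavily down-weighted
hard = focal_loss(0.1, 1)  # misclassified: dominates the gradient
```

Setting gamma = 0 and alpha = 1 recovers plain cross entropy, which makes the down-weighting effect easy to verify.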

16 pages, 3892 KiB  
Article
Fault Diagnosis Method for Shearer Arm Gear Based on Improved S-Transform and Depthwise Separable Convolution
by Haiyang Wu, Hui Zhou, Chang Liu, Gang Cheng and Yusong Pang
Sensors 2025, 25(13), 4067; https://doi.org/10.3390/s25134067 - 30 Jun 2025
Abstract
To address the limitations in time–frequency feature representation of shearer arm gear faults and the issues of parameter redundancy and low training efficiency in standard convolutional neural networks (CNNs), this study proposes a diagnostic method based on an improved S-transform and a Depthwise Separable Convolutional Neural Network (DSCNN). First, the improved S-transform is employed to perform time–frequency analysis on the vibration signals, converting the original one-dimensional signals into two-dimensional time–frequency images to fully preserve the fault characteristics of the gear. Then, a neural network model combining standard convolution and depthwise separable convolution is constructed for fault identification. The experimental dataset includes five gear conditions: tooth deficiency, tooth breakage, tooth wear, tooth crack, and normal. The performance of various frequency-domain and time–frequency methods—Wavelet Transform, Fourier Transform, S-transform, and Gramian Angular Field (GAF)—is compared using the same network model. Furthermore, Grad-CAM is applied to visualize the responses of key convolutional layers, highlighting the regions of interest related to gear fault features. Finally, four typical CNN architectures are analyzed and compared: Deep Convolutional Neural Network (DCNN), InceptionV3, Residual Network (ResNet), and Pyramid Convolutional Neural Network (PCNN). Experimental results demonstrate that frequency-domain representations consistently outperform raw time-domain signals in fault diagnosis tasks. Grad-CAM effectively verifies the model's accurate focus on critical fault features. Moreover, the proposed method achieves high classification accuracy while reducing both training time and the number of model parameters.
(This article belongs to the Section Fault Diagnosis & Sensors)

27 pages, 2591 KiB  
Article
MCRS-YOLO: Multi-Aggregation Cross-Scale Feature Fusion Object Detector for Remote Sensing Images
by Lu Liu and Jun Li
Remote Sens. 2025, 17(13), 2204; https://doi.org/10.3390/rs17132204 - 26 Jun 2025
Abstract
With the rapid development of deep learning, object detection in remote sensing images has attracted extensive attention. However, remote sensing images typically exhibit the following characteristics: significant variations in object scales, dense small targets, and complex backgrounds. To address these challenges, a novel object detection method named MCRS-YOLO is innovatively proposed. Firstly, a Multi-Branch Aggregation (MBA) network is designed to enhance information flow and mitigate challenges caused by insufficient object feature representation. Secondly, we construct a Multi-scale Feature Refinement and Fusion Pyramid Network (MFRFPN) to effectively integrate spatially multi-scale features, thereby augmenting the semantic information of feature maps. Thirdly, a Large Depth-wise Separable Kernel (LDSK) module is proposed to comprehensively capture contextual information while achieving an enlarged effective receptive field. Finally, the Normalized Wasserstein Distance (NWD) is introduced into hybrid loss training to emphasize small object features and suppress background interference. The efficacy and superiority of MCRS-YOLO are rigorously validated through extensive experiments on two publicly available datasets: NWPU VHR-10 and VEDAI. Compared with the baseline YOLOv11, the proposed method demonstrates improvements of 4.0% and 6.7% in mean Average Precision (mAP), which provides an efficient and accurate solution for object detection in remote sensing images.
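The Normalized Wasserstein Distance used in such hybrid losses models each box as a 2-D Gaussian and exponentiates the negated, scaled Wasserstein distance between them; a sketch following the original NWD formulation (the constant C is dataset-dependent; 12.8 here is illustrative):

```python
import math

def nwd(box_a, box_b, c: float = 12.8) -> float:
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes, each
    modeled as a 2-D Gaussian N([cx, cy], diag(w**2 / 4, h**2 / 4))."""
    (cx1, cy1, w1, h1), (cx2, cy2, w2, h2) = box_a, box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0)
    return math.exp(-math.sqrt(w2_sq) / c)

# Identical boxes give 1.0; NWD decays smoothly as boxes drift apart,
# unlike IoU, which collapses to 0 once small boxes stop overlapping.
same = nwd((10, 10, 4, 4), (10, 10, 4, 4))     # 1.0
shifted = nwd((10, 10, 4, 4), (16, 10, 4, 4))  # between 0 and 1
```

This smooth decay is what makes NWD a useful similarity measure for the dense small targets the abstract mentions.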

24 pages, 6594 KiB  
Article
GAT-Enhanced YOLOv8_L with Dilated Encoder for Multi-Scale Space Object Detection
by Haifeng Zhang, Han Ai, Donglin Xue, Zeyu He, Haoran Zhu, Delian Liu, Jianzhong Cao and Chao Mei
Remote Sens. 2025, 17(13), 2119; https://doi.org/10.3390/rs17132119 - 20 Jun 2025
Abstract
The problem of inadequate object detection accuracy in complex remote sensing scenarios has been identified as a primary concern. Traditional YOLO-series algorithms encounter challenges such as poor robustness in small object detection and significant interference from complex backgrounds. In this paper, a multi-scale feature fusion framework based on an improved version of YOLOv8_L is proposed. The combination of a graph attention network (GAT) and a Dilated Encoder network significantly improves the algorithm's detection and recognition performance for space remote sensing objects. It mainly includes abandoning the original Feature Pyramid Network (FPN) structure, proposing an adaptive fusion strategy based on multi-level features of the backbone network, enhancing the expression ability of multi-scale objects through upsampling and feature stacking, and reconstructing the FPN. The local features extracted by convolutional neural networks are mapped to graph-structured data, and the nodal attention mechanism of GAT is used to capture the global topological associations of space objects, which compensates for the deficiency of the convolutional operation in weight allocation and realizes GAT integration. The Dilated Encoder network is introduced to cover targets at different scales by differentiating receptive fields, and feature weight allocation is optimized by combining it with a Convolutional Block Attention Module (CBAM). According to the characteristics of space missions, an annotated dataset containing 8000 satellite and space station images is constructed, covering a variety of lighting, attitude, and scale scenes, and providing benchmark support for model training and verification. Experimental results on the space object dataset reveal that the enhanced algorithm achieves a mean average precision (mAP) of 97.2%, representing a 2.1% improvement over the original YOLOv8_L. Comparative experiments with six other models demonstrate that the proposed algorithm outperforms its counterparts. Ablation studies further validate the synergistic effect between the graph attention network (GAT) and the Dilated Encoder. The results indicate that the model maintains a high detection accuracy under challenging conditions, including strong light interference, multi-scale variations, and low-light environments.
(This article belongs to the Special Issue Remote Sensing Image Thorough Analysis by Advanced Machine Learning)

20 pages, 2511 KiB  
Article
MT-CMVAD: A Multi-Modal Transformer Framework for Cross-Modal Video Anomaly Detection
by Hantao Ding, Shengfeng Lou, Hairong Ye and Yanbing Chen
Appl. Sci. 2025, 15(12), 6773; https://doi.org/10.3390/app15126773 - 16 Jun 2025
Abstract
Video anomaly detection (VAD) faces significant challenges in multimodal semantic alignment and long-term temporal modeling within open surveillance scenarios. Existing methods are often plagued by modality discrepancies and fragmented temporal reasoning. To address these issues, we introduce MT-CMVAD, a hierarchically structured Transformer architecture that makes two key technical contributions: (1) A Context-Aware Dynamic Fusion Module that leverages cross-modal attention with learnable gating coefficients to effectively bridge the gap between RGB and optical flow modalities through adaptive feature recalibration, significantly enhancing fusion performance; (2) A Multi-Scale Spatiotemporal Transformer that establishes global-temporal dependencies via dilated attention mechanisms while preserving local spatial semantics through pyramidal feature aggregation. To address the sparse anomaly supervision dilemma, we propose a hybrid learning objective that integrates dual-stream reconstruction loss with prototype-based contrastive discrimination, enabling the joint optimization of pattern restoration and discriminative representation learning. Our extensive experiments on the UCF-Crime, UBI-Fights, and UBnormal datasets demonstrate state-of-the-art performance, achieving AUC scores of 98.9%, 94.7%, and 82.9%, respectively. The explicit spatiotemporal encoding scheme further improves temporal alignment accuracy by 2.4%, contributing to enhanced anomaly localization and overall detection accuracy. Additionally, the proposed framework achieves a 14.3% reduction in FLOPs and demonstrates 18.7% faster convergence during training, highlighting its practical value for real-world deployment. Our optimized window-shift attention mechanism also reduces computational complexity, making MT-CMVAD a robust and efficient solution for safety-critical video understanding tasks.

43 pages, 9269 KiB  
Article
A Machine Learning Approach for Predicting Particle Spatial, Velocity, and Temperature Distributions in Cold Spray Additive Manufacturing
by Lurui Wang, Mehdi Jadidi and Ali Dolatabadi
Appl. Sci. 2025, 15(12), 6418; https://doi.org/10.3390/app15126418 - 7 Jun 2025
Abstract
Masked cold spray additive manufacturing (CSAM) is investigated for fabricating nickel-based electrodes with pyramidal pin-fins that enlarge the active area for the hydrogen-evolution reaction (HER). To bypass the high cost of purely CFD-driven optimization, we construct a two-stage machine learning (ML) framework trained on 48 high-fidelity CFD simulations. Stage 1 applies sampling and a K-nearest-neighbor kernel-density-estimation algorithm that predicts the spatial distribution of impacting particles and re-allocates weights in regions of under-estimation. Stage 2 combines sampling, interpolation, and symbolic regression to extract key features, then uses a weighted random forest model to forecast particle velocity and temperature upon impact. The ML predictions closely match CFD outputs while reducing computation time by orders of magnitude, demonstrating that ML-CFD integration can accelerate CSAM process design. Although developed for a masked setup, the framework generalizes readily to unmasked cold spray configurations.
