Search Results (113)

Search Parameters:
Keywords = unsupervised fusion network

47 pages, 3959 KB  
Review
A Review of Deep Learning in Rotating Machinery Fault Diagnosis and Its Prospects for Port Applications
by Haifeng Wang, Hui Wang and Xianqiong Tang
Appl. Sci. 2025, 15(21), 11303; https://doi.org/10.3390/app152111303 - 22 Oct 2025
Abstract
As port operations rapidly evolve toward intelligent and heavy-duty applications, fault diagnosis for core equipment demands higher levels of real-time performance and robustness. Deep learning, with its powerful autonomous feature learning capabilities, demonstrates significant potential in mechanical fault prediction and health management. This paper first provides a systematic review of deep learning research advances in rotating machinery fault diagnosis over the past eight years, focusing on the technical approaches and application cases of four representative models: Deep Belief Networks (DBNs), Convolutional Neural Networks (CNNs), Auto-encoders (AEs), and Recurrent Neural Networks (RNNs). These models, respectively, embody four core paradigms, unsupervised feature generation, spatial pattern extraction, data reconstruction learning, and temporal dependency modeling, forming the technological foundation of contemporary intelligent diagnostics. Building upon this foundation, this paper delves into the unique challenges encountered when transferring these methods from generic laboratory components to specialized port equipment such as shore cranes and yard cranes—including complex operating conditions, harsh environments, and system coupling. It further explores future research directions, including cross-condition transfer, multi-source information fusion, and lightweight deployment, aiming to provide theoretical references and implementation pathways for the technological advancement of intelligent operation and maintenance in port equipment. Full article
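As a hedged illustration of the "spatial pattern extraction" paradigm the review attributes to CNNs, the sketch below shows a minimal 1D convolutional classifier that maps fixed-length vibration windows to fault classes. The window length, channel widths, and class count are illustrative assumptions, not models taken from the reviewed papers.

```python
# Minimal sketch (not a reviewed model): a 1D CNN mapping fixed-length
# vibration windows to fault classes, illustrating the CNN-based
# "spatial pattern extraction" paradigm. Window length 2048 and 4 fault
# classes are illustrative assumptions.
import torch
import torch.nn as nn

class VibrationCNN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8, padding=28), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),           # global pooling -> fixed-size feature
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                       # x: (batch, 1, window_length)
        z = self.features(x).squeeze(-1)        # (batch, 32)
        return self.classifier(z)               # raw logits per fault class

if __name__ == "__main__":
    model = VibrationCNN()
    windows = torch.randn(8, 1, 2048)           # 8 synthetic vibration windows
    print(model(windows).shape)                  # torch.Size([8, 4])
```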

21 pages, 14964 KB  
Article
An Automated Framework for Abnormal Target Segmentation in Levee Scenarios Using Fusion of UAV-Based Infrared and Visible Imagery
by Jiyuan Zhang, Zhonggen Wang, Jing Chen, Fei Wang and Lyuzhou Gao
Remote Sens. 2025, 17(20), 3398; https://doi.org/10.3390/rs17203398 - 10 Oct 2025
Viewed by 319
Abstract
Levees are critical for flood defence, but their integrity is threatened by hazards such as piping and seepage, especially during high-water-level periods. Traditional manual inspections for these hazards and associated emergency response elements, such as personnel and assets, are inefficient and often impractical. While UAV-based remote sensing offers a promising alternative, the effective fusion of multi-modal data and the scarcity of labelled data for supervised model training remain significant challenges. To overcome these limitations, this paper reframes levee monitoring as an unsupervised anomaly detection task. We propose a novel, fully automated framework that unifies geophysical hazards and emergency response elements into a single analytical category of “abnormal targets” for comprehensive situational awareness. The framework consists of three key modules: (1) a state-of-the-art registration algorithm to precisely align infrared and visible images; (2) a generative adversarial network to fuse the thermal information from IR images with the textural details from visible images; and (3) an adaptive, unsupervised segmentation module where a mean-shift clustering algorithm, with its hyperparameters automatically tuned by Bayesian optimization, delineates the targets. We validated our framework on a real-world dataset collected from a levee on the Pajiang River, China. The proposed method demonstrates superior performance over all baselines, achieving an Intersection over Union of 0.348 and a macro F1-Score of 0.479. This work provides a practical, training-free solution for comprehensive levee monitoring and demonstrates the synergistic potential of multi-modal fusion and automated machine learning for disaster management. Full article
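A hedged sketch of the unsupervised segmentation module described above: mean-shift clustering of per-pixel features from a fused IR/visible image, with the bandwidth chosen by a score-based search. The paper tunes the clustering hyperparameters with Bayesian optimization; the coarse search below is a simpler stand-in, and all names and values are illustrative.

```python
# Illustrative stand-in for the adaptive segmentation module: mean-shift
# clustering of fused-image pixels, with the bandwidth picked by a simple
# silhouette-score search (the paper uses Bayesian optimization instead).
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.metrics import silhouette_score

def segment_fused_image(fused: np.ndarray, bandwidths=(0.15, 0.25, 0.4)):
    h, w, c = fused.shape
    pixels = fused.reshape(-1, c).astype(np.float64)
    rng = np.random.default_rng(0)
    # subsample pixels so the bandwidth search stays cheap
    sample = pixels[rng.choice(len(pixels), size=min(1000, len(pixels)), replace=False)]

    best_bw, best_score = None, -np.inf
    for bw in bandwidths:
        labels = MeanShift(bandwidth=bw).fit_predict(sample)
        if len(np.unique(labels)) < 2:
            continue                                   # degenerate clustering
        score = silhouette_score(sample, labels)
        if score > best_score:
            best_bw, best_score = bw, score
    if best_bw is None:
        best_bw = bandwidths[-1]                       # fallback if nothing scored

    labels = MeanShift(bandwidth=best_bw, bin_seeding=True).fit_predict(pixels)
    return labels.reshape(h, w)                         # per-pixel cluster map

if __name__ == "__main__":
    fake_fused = np.random.rand(32, 32, 3)              # stand-in for a fused image
    print(np.unique(segment_fused_image(fake_fused)))
```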

35 pages, 5316 KB  
Review
Machine Learning for Quality Control in the Food Industry: A Review
by Konstantinos G. Liakos, Vassilis Athanasiadis, Eleni Bozinou and Stavros I. Lalas
Foods 2025, 14(19), 3424; https://doi.org/10.3390/foods14193424 - 4 Oct 2025
Viewed by 1516
Abstract
The increasing complexity of modern food production demands advanced solutions for quality control (QC), safety monitoring, and process optimization. This review systematically explores recent advancements in machine learning (ML) for QC across six domains: Food Quality Applications; Defect Detection and Visual Inspection Systems; Ingredient Optimization and Nutritional Assessment; Packaging—Sensors and Predictive QC; Supply Chain—Traceability and Transparency and Food Industry Efficiency; and Industry 4.0 Models. Following a PRISMA-based methodology, a structured search of the Scopus database using thematic Boolean keywords identified 124 peer-reviewed publications (2005–2025), from which 25 studies were selected based on predefined inclusion and exclusion criteria, methodological rigor, and innovation. Neural networks dominated the reviewed approaches, with ensemble learning as a secondary method, and supervised learning prevailing across tasks. Emerging trends include hyperspectral imaging, sensor fusion, explainable AI, and blockchain-enabled traceability. Limitations in current research include domain coverage biases, data scarcity, and underexplored unsupervised and hybrid methods. Real-world implementation challenges involve integration with legacy systems, regulatory compliance, scalability, and cost–benefit trade-offs. The novelty of this review lies in combining a transparent PRISMA approach, a six-domain thematic framework, and Industry 4.0/5.0 integration, providing cross-domain insights and a roadmap for robust, transparent, and adaptive QC systems in the food industry. Full article
(This article belongs to the Special Issue Artificial Intelligence for the Food Industry)

17 pages, 3603 KB  
Article
A Fault Diagnosis Method for the Train Communication Network Based on Active Learning and Stacked Consistent Autoencoder
by Yueyi Yang, Haiquan Wang, Xiaobo Nie, Shengjun Wen and Guolong Li
Symmetry 2025, 17(10), 1622; https://doi.org/10.3390/sym17101622 - 1 Oct 2025
Viewed by 247
Abstract
As a critical component of rail travel, the train communication network (TCN) is an integrated central platform that is used to realize the train control, condition monitoring, and data transmission, whose failure will disrupt the symmetry of TCN topology and endanger the security of rail trains. To enhance the reliability of TCN, an intelligent fault diagnosis method is proposed based on active learning (AL) and a stacked consistent autoencoder (SCAE), which is capable of building a competitive classifier with a limited amount of labeled training samples. SCAE can learn better feature presentations from electrical multifunction vehicle bus (MVB) signals by reconstructing the same raw input data layer by layer in the unsupervised feature learning phase. In the supervised fine-tuning phase, a deep AL-based fault diagnosis framework is proposed, and a dynamic fusion AL method is presented. The most valuable unlabeled samples are selected for labeling and training by considering uncertainty and similarity simultaneously, and the fusion weight is dynamically adjusted at the different training stages. A TCN experimental platform is constructed, and experimental results show that the proposed method achieves better performance under three different metrics with fewer labeled samples compared to the state-of-the-art methods; it is also symmetrically valid in class-imbalanced data. Full article
(This article belongs to the Special Issue Symmetry in Fault Detection and Diagnosis for Dynamic Systems)
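An illustrative sketch of the dynamic-fusion acquisition idea described in the abstract: unlabeled samples are scored by a weighted mix of predictive entropy (uncertainty) and dissimilarity to the already-labeled pool, with the weight shifting across training stages. The exact fusion rule in the paper may differ; the weighting schedule below is an assumption.

```python
# Hedged sketch of a dynamic uncertainty/similarity fusion score for
# active-learning sample selection; the linear stage schedule is assumed.
import numpy as np

def acquisition_scores(probs, unlabeled_feats, labeled_feats, stage, n_stages):
    # probs: (n_unlabeled, n_classes) softmax outputs of the current classifier
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)

    # cosine similarity of each unlabeled sample to its nearest labeled sample
    u = unlabeled_feats / np.linalg.norm(unlabeled_feats, axis=1, keepdims=True)
    l = labeled_feats / np.linalg.norm(labeled_feats, axis=1, keepdims=True)
    dissimilarity = 1.0 - (u @ l.T).max(axis=1)    # prefer samples unlike the labeled set

    alpha = stage / max(n_stages - 1, 1)           # 0 -> diversity-driven, 1 -> uncertainty-driven
    return alpha * entropy + (1.0 - alpha) * dissimilarity

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(5), size=100)    # fake classifier outputs, 5 classes
    scores = acquisition_scores(probs, rng.normal(size=(100, 16)),
                                rng.normal(size=(20, 16)), stage=2, n_stages=5)
    print(np.argsort(scores)[-10:])                # indices of the 10 samples to label next
```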

22 pages, 5746 KB  
Article
AGSK-Net: Adaptive Geometry-Aware Stereo-KANformer Network for Global and Local Unsupervised Stereo Matching
by Qianglong Feng, Xiaofeng Wang, Zhenglin Lu, Haiyu Wang, Tingfeng Qi and Tianyi Zhang
Sensors 2025, 25(18), 5905; https://doi.org/10.3390/s25185905 - 21 Sep 2025
Viewed by 519
Abstract
The performance of unsupervised stereo matching in complex regions such as weak textures and occlusions is constrained by the inherently local receptive fields of convolutional neural networks (CNNs), the absence of geometric priors, and the limited expressiveness of MLP in conventional ViTs. To address these problems, we propose an Adaptive Geometry-aware Stereo-KANformer Network (AGSK-Net) for unsupervised stereo matching. Firstly, to resolve the conflict between the isotropic nature of traditional ViT and the epipolar geometry priors in stereo matching, we propose Adaptive Geometry-aware Multi-head Self-Attention (AG-MSA), which embeds epipolar priors via an adaptive hybrid structure of geometric modulation and penalty, enabling geometry-aware global context modeling. Secondly, we design Spatial Group-Rational KAN (SGR-KAN), which integrates the nonlinear capability of rational functions with the spatial awareness of deep convolutions, replacing the MLP with flexible, learnable rational functions to enhance the nonlinear expression ability of complex regions. Finally, we propose a Dynamic Candidate Gated Fusion (DCGF) module that employs dynamic dual-candidate states and spatially aware pre-enhancement to adaptively fuse global and local features across scales. Experiments demonstrate that AGSK-Net achieves state-of-the-art accuracy and generalizability on Scene Flow, KITTI 2012/2015, and Middlebury 2021. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
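A rough illustration of one way an epipolar prior can be injected into self-attention; this is an assumption on my part, not the AG-MSA definition. For rectified stereo pairs, correspondences lie on the same scan line, so the sketch adds a penalty on attention between tokens from different rows.

```python
# Hedged sketch: scaled dot-product attention over image tokens with an
# additive penalty on the vertical offset between token rows, one simple
# way to bias attention toward the epipolar (same-row) direction.
import torch
import torch.nn.functional as F

def epipolar_biased_attention(q, k, v, rows, penalty=0.1):
    # q, k, v: (n_tokens, dim); rows: (n_tokens,) row index of each token
    scale = q.shape[-1] ** 0.5
    scores = q @ k.T / scale                              # (n_tokens, n_tokens)
    row_gap = (rows[:, None] - rows[None, :]).abs().float()
    scores = scores - penalty * row_gap                   # penalize cross-row attention
    return F.softmax(scores, dim=-1) @ v

if __name__ == "__main__":
    h, w, dim = 8, 8, 32
    tokens = torch.randn(h * w, dim)
    rows = torch.arange(h).repeat_interleave(w)           # row index per token
    out = epipolar_biased_attention(tokens, tokens, tokens, rows)
    print(out.shape)                                       # torch.Size([64, 32])
```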

19 pages, 3745 KB  
Article
Anomaly Detection in Mineral Micro-X-Ray Fluorescence Spectroscopy Based on a Multi-Scale Feature Aggregation Network
by Yangxin Lu, Weiming Jiang, Molei Zhao, Yuanzhi Zhou, Jie Yang, Kunfeng Qiu and Qiuming Cheng
Minerals 2025, 15(9), 970; https://doi.org/10.3390/min15090970 - 13 Sep 2025
Viewed by 400
Abstract
Micro-X-ray fluorescence spectroscopy (micro-XRF) integrates spatial and spectral information and is widely employed for multi-elemental analyses of rock-forming minerals. However, its inherent limitation in spatial resolution gives rise to significant pixel mixing, thereby hindering the accurate identification of fine-scale or anomalous mineral phases. Furthermore, most existing methods heavily rely on manually labeled data or predefined spectral libraries, rendering them poorly adaptable to complex and variable mineral systems. To address these challenges, this paper presents an unsupervised deep aggregation network (MSFA-Net) for micro-XRF imagery, aiming to eliminate the reliance of traditional methods on prior knowledge and enhance the recognition capability of rare mineral anomalies. Built on an autoencoder architecture, MSFA-Net incorporates a multi-scale orthogonal attention module to strengthen spectral–spatial feature fusion and employs density-based adaptive clustering to guide semantically aware reconstruction, thus achieving high-precision responses to potential anomalous regions. Experiments on real-world micro-XRF datasets demonstrate that MSFA-Net not only outperforms mainstream anomaly detection methods but also transcends the physical resolution limits of the instrument, successfully identifying subtle mineral anomalies that traditional approaches fail to detect. This method presents a novel paradigm for high-throughput and weakly supervised interpretation of complex geological images. Full article
(This article belongs to the Special Issue Gold–Polymetallic Deposits in Convergent Margins)
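A minimal sketch of the reconstruction-error core that the abstract builds on: a small autoencoder over per-pixel micro-XRF spectra, with the per-pixel reconstruction error used as an anomaly score. MSFA-Net adds multi-scale orthogonal attention and density-based clustering on top of this idea; the channel counts below are illustrative.

```python
# Hedged sketch: autoencoder reconstruction error as a per-pixel anomaly
# score for a spectral cube; 128 channels and the layer sizes are assumptions.
import torch
import torch.nn as nn

class SpectralAE(nn.Module):
    def __init__(self, n_channels: int = 128, latent: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_channels, 64), nn.ReLU(),
                                 nn.Linear(64, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                 nn.Linear(64, n_channels))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_map(model, cube):
    # cube: (H, W, C) spectral image; returns per-pixel reconstruction error
    h, w, c = cube.shape
    spectra = cube.reshape(-1, c)
    with torch.no_grad():
        err = ((model(spectra) - spectra) ** 2).mean(dim=1)
    return err.reshape(h, w)

if __name__ == "__main__":
    model = SpectralAE()
    cube = torch.rand(64, 64, 128)                  # synthetic micro-XRF cube
    print(anomaly_map(model, cube).shape)           # torch.Size([64, 64])
```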

20 pages, 2498 KB  
Article
Gray and White Matter Networks Predict Mindfulness and Mind Wandering Traits: A Data Fusion Machine Learning Approach
by Minah Chang, Sara Sorella, Cristiano Crescentini and Alessandro Grecucci
Brain Sci. 2025, 15(9), 953; https://doi.org/10.3390/brainsci15090953 - 1 Sep 2025
Viewed by 619
Abstract
Background: Mindfulness and mind wandering are cognitive traits central to attentional control and psychological well-being, yet their neural underpinnings are yet to be elucidated. This study aimed to identify structural brain networks comprising gray matter (GM) and white matter (WM) that predict individual differences in mindfulness and distinct mind wandering tendencies (deliberate and spontaneous). Methods: Using structural MRI data and self-report measures from 76 participants, we applied an unsupervised data-fusion machine learning technique (parallel independent component analysis) to identify GM and WM networks associated with mindfulness and mind wandering traits. Results: Our analysis revealed several distinct brain networks linked to these cognitive constructs. Specifically, one GM network involving subcortical regions, including the caudate and thalamus, positively predicted mindfulness and deliberate mind wandering, while negatively influencing spontaneous mind wandering through the mediating role of the mindfulness facet “acting with awareness.” In addition, two separate WM networks, predominantly involving frontoparietal and temporal regions, were directly associated with reduced spontaneous mind wandering. Conclusions: These findings advance our current knowledge by demonstrating that specific GM and WM structures are involved in mindfulness and different forms of mind wandering. Our results also show that the “acting with awareness” facet has a mediating effect on spontaneous mind wandering, which provides supporting evidence for attentional and executive control models. These new insights into the neuroanatomical correlates of mindfulness and mind wandering have implications for ongoing research in the growing topic of mindfulness and mind wandering, mindfulness-based interventions, and other clinical applications. Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
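A simplified stand-in for the study's parallel ICA: each modality (gray-matter and white-matter feature matrices) is decomposed with ordinary FastICA, and the subject loadings are correlated with a trait score. True parallel ICA optimizes both decompositions jointly; everything below, including the data shapes, is illustrative.

```python
# Hedged sketch: FastICA as a simpler stand-in for parallel ICA, correlating
# per-subject component loadings with a behavioral trait score.
import numpy as np
from sklearn.decomposition import FastICA
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_subjects, n_voxels = 76, 500
gm = rng.normal(size=(n_subjects, n_voxels))       # stand-in GM feature matrix
wm = rng.normal(size=(n_subjects, n_voxels))       # stand-in WM feature matrix
mindfulness = rng.normal(size=n_subjects)           # stand-in trait score

for name, data in [("GM", gm), ("WM", wm)]:
    ica = FastICA(n_components=5, random_state=0, max_iter=1000)
    loadings = ica.fit_transform(data)               # (subjects, components)
    for comp in range(loadings.shape[1]):
        r, p = pearsonr(loadings[:, comp], mindfulness)
        flag = " *" if p < 0.05 else ""
        print(f"{name} component {comp}: r={r:+.2f}, p={p:.3f}{flag}")
```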

20 pages, 1846 KB  
Article
Unsupervised Tablet Defect Detection Method Based on Diffusion Model
by Mengfan Zhang, Weifeng Liu, Linqing He and Di Wang
Sensors 2025, 25(17), 5254; https://doi.org/10.3390/s25175254 - 23 Aug 2025
Viewed by 807
Abstract
Reconstruction-based unsupervised detection methods have demonstrated strong generalization capabilities in the field of tablet anomaly detection, but there are still problems such as poor reconstruction effect and inaccurate positioning of abnormal areas. To address these problems, this paper proposes an unsupervised Diffusion-based Tablet Defect Detection (DTDD) method. This method uses an Assisted Reconstruction (AR) network to introduce original image information to assist in the reconstruction of abnormal areas, thereby improving the reconstruction effect of the diffusion model. It also uses a Scale Fusion (SF) network and an improved anomaly measurement method to improve the accuracy of abnormal area positioning. Finally, the effectiveness of the algorithm is verified on the tablet dataset. The experimental results show that the algorithm in this paper is superior to the algorithms in the same field, effectively improving the detection accuracy and abnormal positioning accuracy, and performing well in the tablet defect detection task. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
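For context, the generic reconstruction-comparison step that methods like this build on (not DTDD's improved anomaly measure): given a test image and a defect-free reconstruction produced by the generative model, each pixel is scored by a smoothed absolute difference and thresholded to localize defects. The Gaussian smoothing and threshold rule are illustrative choices.

```python
# Hedged sketch: reconstruction-difference anomaly map with a simple
# statistical threshold; sigma and k are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def defect_mask(image: np.ndarray, reconstruction: np.ndarray, k: float = 3.0):
    diff = np.abs(image.astype(np.float32) - reconstruction.astype(np.float32))
    if diff.ndim == 3:
        diff = diff.mean(axis=2)                    # collapse color channels
    score = gaussian_filter(diff, sigma=2)          # suppress pixel-level noise
    thresh = score.mean() + k * score.std()         # simple statistical threshold
    return score, score > thresh

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((128, 128, 3))
    test = clean.copy()
    test[40:50, 60:70] += 0.8                       # inject a synthetic defect
    score, mask = defect_mask(test, clean)
    print(mask.sum(), "pixels flagged as defective")
```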

29 pages, 959 KB  
Review
Machine Learning-Driven Insights in Cancer Metabolomics: From Subtyping to Biomarker Discovery and Prognostic Modeling
by Amr Elguoshy, Hend Zedan and Suguru Saito
Metabolites 2025, 15(8), 514; https://doi.org/10.3390/metabo15080514 - 1 Aug 2025
Cited by 1 | Viewed by 1862
Abstract
Cancer metabolic reprogramming plays a critical role in tumor progression and therapeutic resistance, underscoring the need for advanced analytical strategies. Metabolomics, leveraging mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy, offers a comprehensive and functional readout of tumor biochemistry. By enabling both targeted metabolite quantification and untargeted profiling, metabolomics captures the dynamic metabolic alterations associated with cancer. The integration of metabolomics with machine learning (ML) approaches further enhances the interpretation of these complex, high-dimensional datasets, providing powerful insights into cancer biology from biomarker discovery to therapeutic targeting. This review systematically examines the transformative role of ML in cancer metabolomics. We discuss how various ML methodologies—including supervised algorithms (e.g., Support Vector Machine, Random Forest), unsupervised techniques (e.g., Principal Component Analysis, t-SNE), and deep learning frameworks—are advancing cancer research. Specifically, we highlight three major applications of ML–metabolomics integration: (1) cancer subtyping, exemplified by the use of Similarity Network Fusion (SNF) and LASSO regression to classify triple-negative breast cancer into subtypes with distinct survival outcomes; (2) biomarker discovery, where Random Forest and Partial Least Squares Discriminant Analysis (PLS-DA) models have achieved >90% accuracy in detecting breast and colorectal cancers through biofluid metabolomics; and (3) prognostic modeling, demonstrated by the identification of race-specific metabolic signatures in breast cancer and the prediction of clinical outcomes in lung and ovarian cancers. Beyond these areas, we explore applications across prostate, thyroid, and pancreatic cancers, where ML-driven metabolomics is contributing to earlier detection, improved risk stratification, and personalized treatment planning. We also address critical challenges, including issues of data quality (e.g., batch effects, missing values), model interpretability, and barriers to clinical translation. Emerging solutions, such as explainable artificial intelligence (XAI) approaches and standardized multi-omics integration pipelines, are discussed as pathways to overcome these hurdles. By synthesizing recent advances, this review illustrates how ML-enhanced metabolomics bridges the gap between fundamental cancer metabolism research and clinical application, offering new avenues for precision oncology through improved diagnosis, prognosis, and tailored therapeutic strategies. Full article
(This article belongs to the Special Issue Nutritional Metabolomics in Cancer)
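A toy sketch of the biomarker-discovery workflow the review describes: a Random Forest classifier on a metabolite intensity matrix, evaluated with cross-validation and then mined for the most important features. The data are synthetic; real pipelines also need the batch-effect correction and missing-value handling the review discusses.

```python
# Hedged sketch: cross-validated Random Forest on a synthetic metabolite
# matrix, with feature importances used to rank candidate biomarkers.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_metabolites = 120, 200
X = rng.normal(size=(n_samples, n_metabolites))
y = rng.integers(0, 2, size=n_samples)              # 0 = control, 1 = cancer
X[y == 1, :5] += 1.0                                # make 5 metabolites informative

clf = RandomForestClassifier(n_estimators=300, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")

clf.fit(X, y)
top = np.argsort(clf.feature_importances_)[::-1][:10]
print("candidate biomarker indices:", top)           # informative ones should rank high
```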

18 pages, 4374 KB  
Article
Elevation-Aware Domain Adaptation for Semantic Segmentation of Aerial Images
by Zihao Sun, Peng Guo, Zehui Li, Xiuwan Chen and Xinbo Liu
Remote Sens. 2025, 17(14), 2529; https://doi.org/10.3390/rs17142529 - 21 Jul 2025
Viewed by 749
Abstract
Recent advancements in Earth observation technologies have accelerated remote sensing (RS) data acquisition, yet cross-domain semantic segmentation remains challenged by domain shifts. Traditional unsupervised domain adaptation (UDA) methods often rely on computationally intensive and unstable generative adversarial networks (GANs). This study introduces elevation-aware domain adaptation (EADA), a multi-task framework that integrates elevation estimation (via digital surface models) with semantic segmentation to address distribution discrepancies. EADA employs a shared encoder and task-specific decoders, enhanced by a spatial attention-based feature fusion module. Experiments on Potsdam and Vaihingen datasets under cross-domain settings (e.g., Potsdam IRRG → Vaihingen IRRG) show that EADA achieves state-of-the-art performance, with a mean IoU of 54.62% and an F1-score of 65.47%, outperforming single-stage baselines. Elevation awareness significantly improves the segmentation of height-sensitive classes, such as buildings, while maintaining computational efficiency. Compared to multi-stage approaches, EADA’s end-to-end design reduces training complexity without sacrificing accuracy. These results demonstrate that incorporating elevation data effectively mitigates domain shifts in RS imagery. However, lower accuracy for elevation-insensitive classes suggests the need for further refinement to enhance overall generalizability. Full article
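A schematic sketch of the shared-encoder, task-specific-decoder arrangement described above: one head predicts class logits and the other regresses a digital surface model, trained with a weighted sum of the two losses. The channel counts, heads, and loss weight are assumptions, not EADA's actual architecture.

```python
# Hedged sketch: multi-task network with a shared encoder, a segmentation
# head, and an elevation-regression head; all sizes are illustrative.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_classes: int = 6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, n_classes, 1)    # semantic segmentation logits
        self.elev_head = nn.Conv2d(64, 1, 1)           # per-pixel elevation estimate

    def forward(self, x):
        feat = self.encoder(x)
        return self.seg_head(feat), self.elev_head(feat)

if __name__ == "__main__":
    net = MultiTaskNet()
    img = torch.randn(2, 3, 64, 64)
    seg_gt = torch.randint(0, 6, (2, 64, 64))
    dsm_gt = torch.randn(2, 1, 64, 64)
    seg_logits, elev = net(img)
    loss = nn.CrossEntropyLoss()(seg_logits, seg_gt) + 0.5 * nn.MSELoss()(elev, dsm_gt)
    loss.backward()
    print(float(loss))
```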

20 pages, 3898 KB  
Article
Synergistic Multi-Model Approach for GPR Data Interpretation: Forward Modeling and Robust Object Detection
by Hang Zhang, Zhijie Ma, Xinyu Fan and Feifei Hou
Remote Sens. 2025, 17(14), 2521; https://doi.org/10.3390/rs17142521 - 20 Jul 2025
Cited by 1 | Viewed by 669
Abstract
Ground penetrating radar (GPR) is widely used for subsurface object detection, but manual interpretation of hyperbolic features in B-scan images remains inefficient and error-prone. In addition, traditional forward modeling methods suffer from low computational efficiency and strong dependence on field measurements. To address these challenges, we propose an unsupervised data augmentation framework that utilizes CycleGAN-based model to generate diverse synthetic B-scan images by simulating varying geological parameters and scanning configurations. This approach achieves GPR data forward modeling and enhances the scenario coverage of training data. We then apply the EfficientDet architecture, which incorporates a bidirectional feature pyramid network (BiFPN) for multi-scale feature fusion, to enhance the detection capability of hyperbolic signatures in B-scan images under challenging conditions such as partial occlusions and background noise. The proposed method achieves a mean average precision (mAP) of 0.579 on synthetic datasets, outperforming YOLOv3 and RetinaNet by 16.0% and 23.5%, respectively, while maintaining robust multi-object detection in complex field conditions. Full article
(This article belongs to the Special Issue Advanced Ground-Penetrating Radar (GPR) Technologies and Applications)

19 pages, 3619 KB  
Article
An Adaptive Underwater Image Enhancement Framework Combining Structural Detail Enhancement and Unsupervised Deep Fusion
by Semih Kahveci and Erdinç Avaroğlu
Appl. Sci. 2025, 15(14), 7883; https://doi.org/10.3390/app15147883 - 15 Jul 2025
Viewed by 676
Abstract
The underwater environment severely degrades image quality by absorbing and scattering light. This causes significant challenges, including non-uniform illumination, low contrast, color distortion, and blurring. These degradations compromise the performance of critical underwater applications, including water quality monitoring, object detection, and identification. To address these issues, this study proposes a detail-oriented hybrid framework for underwater image enhancement that synergizes the strengths of traditional image processing with the powerful feature extraction capabilities of unsupervised deep learning. Our framework introduces a novel multi-scale detail enhancement unit to accentuate structural information, followed by a Latent Low-Rank Representation (LatLRR)-based simplification step. This unique combination effectively suppresses common artifacts like oversharpening, spurious edges, and noise by decomposing the image into meaningful subspaces. The principal structural features are then optimally combined with a gamma-corrected luminance channel using an unsupervised MU-Fusion network, achieving a balanced optimization of both global contrast and local details. The experimental results on the challenging Test-C60 and OceanDark datasets demonstrate that our method consistently outperforms state-of-the-art fusion-based approaches, achieving average improvements of 7.5% in UIQM, 6% in IL-NIQE, and 3% in AG. Wilcoxon signed-rank tests confirm that these performance gains are statistically significant (p < 0.01). Consequently, the proposed method significantly mitigates prevalent issues such as color aberration, detail loss, and artificial haze, which are frequently encountered in existing techniques. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
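One ingredient of the pipeline, sketched in isolation: gamma correction of the luminance channel in Lab space, which the paper then fuses with the enhanced structural features via its unsupervised MU-Fusion network. The gamma value is an illustrative choice, not the paper's setting.

```python
# Hedged sketch: brighten an underwater frame by gamma-correcting only the
# luminance (L) channel in Lab space, leaving chrominance untouched.
import cv2
import numpy as np

def gamma_correct_luminance(bgr: np.ndarray, gamma: float = 0.7) -> np.ndarray:
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l_norm = l.astype(np.float32) / 255.0
    l_gamma = np.clip((l_norm ** gamma) * 255.0, 0, 255).astype(np.uint8)
    return cv2.cvtColor(cv2.merge([l_gamma, a, b]), cv2.COLOR_LAB2BGR)

if __name__ == "__main__":
    dark = (np.random.rand(100, 100, 3) * 80).astype(np.uint8)   # dim synthetic frame
    bright = gamma_correct_luminance(dark)
    print(dark.mean(), bright.mean())             # gamma < 1 lifts the luminance
```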

25 pages, 4232 KB  
Article
Multimodal Fusion Image Stabilization Algorithm for Bio-Inspired Flapping-Wing Aircraft
by Zhikai Wang, Sen Wang, Yiwen Hu, Yangfan Zhou, Na Li and Xiaofeng Zhang
Biomimetics 2025, 10(7), 448; https://doi.org/10.3390/biomimetics10070448 - 7 Jul 2025
Viewed by 774
Abstract
This paper presents FWStab, a specialized video stabilization dataset tailored for flapping-wing platforms. The dataset encompasses five typical flight scenarios, featuring 48 video clips with intense dynamic jitter. The corresponding Inertial Measurement Unit (IMU) sensor data are synchronously collected, which jointly provide reliable support for multimodal modeling. Based on this, to address the issue of poor image acquisition quality due to severe vibrations in aerial vehicles, this paper proposes a multi-modal signal fusion video stabilization framework. This framework effectively integrates image features and inertial sensor features to predict smooth and stable camera poses. During the video stabilization process, the true camera motion originally estimated based on sensors is warped to the smooth trajectory predicted by the network, thereby optimizing the inter-frame stability. This approach maintains the global rigidity of scene motion, avoids visual artifacts caused by traditional dense optical flow-based spatiotemporal warping, and rectifies rolling shutter-induced distortions. Furthermore, the network is trained in an unsupervised manner by leveraging a joint loss function that integrates camera pose smoothness and optical flow residuals. When coupled with a multi-stage training strategy, this framework demonstrates remarkable stabilization adaptability across a wide range of scenarios. The entire framework employs Long Short-Term Memory (LSTM) to model the temporal characteristics of camera trajectories, enabling high-precision prediction of smooth trajectories. Full article
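A simplified sketch of the trajectory-smoothing idea: an LSTM reads a window of jittery camera pose parameters and outputs a smoothed sequence, trained without labels by balancing fidelity to the measured trajectory against a second-difference smoothness penalty. The pose dimensionality and loss weight are assumptions, and FWStab additionally uses optical-flow residuals in its joint loss.

```python
# Hedged sketch: LSTM trajectory smoother with an unsupervised
# fidelity + smoothness loss; sizes and weights are illustrative.
import torch
import torch.nn as nn

class TrajectorySmoother(nn.Module):
    def __init__(self, pose_dim: int = 6, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, poses):                      # poses: (batch, time, pose_dim)
        out, _ = self.lstm(poses)
        return self.head(out)

def unsupervised_loss(smooth, measured, w_smooth: float = 10.0):
    fidelity = ((smooth - measured) ** 2).mean()
    accel = smooth[:, 2:] - 2 * smooth[:, 1:-1] + smooth[:, :-2]   # 2nd difference
    return fidelity + w_smooth * (accel ** 2).mean()

if __name__ == "__main__":
    model = TrajectorySmoother()
    jittery = torch.cumsum(torch.randn(4, 50, 6) * 0.05, dim=1)    # random-walk poses
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(5):                              # a few illustrative steps
        loss = unsupervised_loss(model(jittery), jittery)
        opt.zero_grad(); loss.backward(); opt.step()
    print(float(loss))
```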

19 pages, 3044 KB  
Review
Deep Learning-Based Sound Source Localization: A Review
by Kunbo Xu, Zekai Zong, Dongjun Liu, Ran Wang and Liang Yu
Appl. Sci. 2025, 15(13), 7419; https://doi.org/10.3390/app15137419 - 2 Jul 2025
Viewed by 2085
Abstract
As a fundamental technology in environmental perception, sound source localization (SSL) plays a critical role in public safety, marine exploration, and smart home systems. However, traditional methods such as beamforming and time-delay estimation rely on manually designed physical models and idealized assumptions, which struggle to meet practical demands in dynamic and complex scenarios. Recent advancements in deep learning have revolutionized SSL by leveraging its end-to-end feature adaptability, cross-scenario generalization capabilities, and data-driven modeling, significantly enhancing localization robustness and accuracy in challenging environments. This review systematically examines the progress of deep learning-based SSL across three critical domains: marine environments, indoor reverberant spaces, and unmanned aerial vehicle (UAV) monitoring. In marine scenarios, complex-valued convolutional networks combined with adversarial transfer learning mitigate environmental mismatch and multipath interference through phase information fusion and domain adaptation strategies. For indoor high-reverberation conditions, attention mechanisms and multimodal fusion architectures achieve precise localization under low signal-to-noise ratios by adaptively weighting critical acoustic features. In UAV surveillance, lightweight models integrated with spatiotemporal Transformers address dynamic modeling of non-stationary noise spectra and edge computing efficiency constraints. Despite these advancements, current approaches face three core challenges: the insufficient integration of physical principles, prohibitive data annotation costs, and the trade-off between real-time performance and accuracy. Future research should prioritize physics-informed modeling to embed acoustic propagation mechanisms, unsupervised domain adaptation to reduce reliance on labeled data, and sensor-algorithm co-design to optimize hardware-software synergy. These directions aim to propel SSL toward intelligent systems characterized by high precision, strong robustness, and low power consumption. This work provides both theoretical foundations and technical references for algorithm selection and practical implementation in complex real-world scenarios. Full article
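For contrast with the learned methods the review surveys, here is a compact version of the classical GCC-PHAT time-delay estimator between two microphone signals; the deep models discussed above replace such hand-designed pipelines with learned features.

```python
# Classical baseline sketch: GCC-PHAT time-delay estimation between two
# microphone signals (phase-transform-weighted cross-correlation).
import numpy as np

def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: float, max_tau: float = None):
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-15), n=n)      # PHAT weighting
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs       # delay in seconds

if __name__ == "__main__":
    fs, delay_samples = 16000, 23
    x = np.random.randn(4096)
    y = np.roll(x, delay_samples)                          # delayed copy of the source
    print(gcc_phat(y, x, fs) * fs)                         # ~23 samples
```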

22 pages, 580 KB  
Article
A Comparative Study of Advanced Transformer Learning Frameworks for Water Potability Analysis Using Physicochemical Parameters
by Enes Algül, Saadin Oyucu, Onur Polat, Hüseyin Çelik, Süleyman Ekşi, Faruk Kurker and Ahmet Aksoz
Appl. Sci. 2025, 15(13), 7262; https://doi.org/10.3390/app15137262 - 27 Jun 2025
Cited by 1 | Viewed by 3339
Abstract
Keeping drinking water safe is a critical aspect of protecting public health. Traditional laboratory-based methods for evaluating water potability are often time-consuming, costly, and labour-intensive. This paper presents a comparative analysis of four transformer-based deep learning models in the development of automatic classification systems for water potability based on physicochemical attributes. The models examined include the enhanced tabular transformer (ETT), feature tokenizer transformer (FTTransformer), self-attention and inter-sample network (SAINT), and tabular autoencoder pretraining enhancement (TAPE). The study utilized an open-access water quality dataset that includes nine key attributes such as pH, hardness, total dissolved solids (TDS), chloramines, sulphate, conductivity, organic carbon, trihalomethanes, and turbidity. The models were evaluated under a unified protocol involving 70–15–15 data partitioning, five-fold cross-validation, fixed random seed, and consistent hyperparameter settings. Among the evaluated models, the enhanced tabular transformer outperforms other models with an accuracy of 95.04% and an F1 score of 0.94. ETT is an advanced model because it can efficiently model high-order feature interactions through multi-head attention and deep hierarchical encoding. Feature importance analysis consistently highlighted chloramines, conductivity, and trihalomethanes as key predictive features across all models. SAINT demonstrated robust generalization through its dual-attention mechanism, while TAPE provided competitive results with reduced computational overhead due to unsupervised pretraining. Conversely, FTTransformer showed limitations, likely due to sensitivity to class imbalance and hyperparameter tuning. The results underscore the potential of transformer-based models, especially ETT, in enabling efficient, accurate, and scalable water quality monitoring. These findings support their integration into real-time environmental health systems and suggest approaches for future research in explainability, domain adaptation, and multimodal fusion. Full article
(This article belongs to the Special Issue Water Treatment: From Membrane Processes to Renewable Energies)
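The unified evaluation protocol described above, sketched on synthetic tabular data: a fixed seed, a 70-15-15 train/validation/test split, and 5-fold cross-validation on the training portion. A gradient-boosting classifier stands in for the transformer models, which are not reproduced here.

```python
# Hedged sketch of the comparison protocol only; the classifier and the
# synthetic 9-feature data are stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split, cross_val_score

SEED = 42
rng = np.random.default_rng(SEED)
X = rng.normal(size=(1000, 9))                      # 9 physicochemical features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# 70 % train, 15 % validation, 15 % test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=SEED)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=SEED)

clf = GradientBoostingClassifier(random_state=SEED)
cv_acc = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy")
print(f"5-fold CV accuracy: {cv_acc.mean():.3f} +/- {cv_acc.std():.3f}")

clf.fit(X_train, y_train)
print(f"validation accuracy: {clf.score(X_val, y_val):.3f}")
print(f"test accuracy:       {clf.score(X_test, y_test):.3f}")
```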
