MDPI - Publisher of Open Access Journals

22 pages, 12757 KB

Open Access Article

A Physics-Guided Deep Embedding Framework for Underwater Target Recognition Using Similarity-Based Decision

by Tianyang Xu, Hongjian Jia, Wensheng Zhu and Rui Xu

J. Mar. Sci. Eng. 2026, 14(12), 1088; https://doi.org/10.3390/jmse14121088 - 11 Jun 2026

Viewed by 56

In underwater target recognition, the scattering characteristics of small targets are weak and highly sensitive to observation angles, posing significant challenges to achieving stable and robust recognition in complex environments. Existing methods are mainly data-driven and rely on closed-set classifiers, which often lack physical interpretability and show limited generalization under different observation conditions. To address these issues, a physics-guided deep embedding framework for underwater target recognition is proposed. First, an encoder–decoder network is designed to learn representative and physically consistent scattering features from measured echo frequency spectra. The encoder is then extracted to construct a triplet-based embedding model, which maps high-dimensional scattering spectra into a discriminative low-dimensional feature space. In the embedding space, a similarity-based decision strategy replaces the traditional classifier, and recognition is achieved by evaluating the relationships among embedded features. Experimental results show that the proposed method achieves robust recognition performance under varying observation angles and establishes an interpretable connection between scattering characteristics and recognition results. The proposed framework provides an effective way to combine physics-guided feature learning with deep embedding methods for underwater target recognition.

(This article belongs to the Section Ocean Engineering)
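The triplet-based embedding and similarity-based decision described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the 3-D embeddings, the label names, the margin of 0.2, and the choice of cosine similarity against one reference embedding per class. The abstract does not state which distance or decision rule the authors use.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: the positive should be closer to the
    anchor than the negative by at least `margin` (assumed value)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_similarity(query, prototypes):
    """Return the label whose reference embedding is most similar to the query.
    This replaces a trained softmax classifier with a nearest-reference rule."""
    return max(prototypes, key=lambda label: cosine_similarity(query, prototypes[label]))


# Hypothetical reference embeddings, one per target class.
prototypes = {"target_A": [1.0, 0.1, 0.0], "target_B": [0.0, 1.0, 0.2]}
query = [0.9, 0.2, 0.1]

predicted = classify_by_similarity(query, prototypes)
loss = triplet_loss(query, prototypes["target_A"], prototypes["target_B"])
```

Because the decision only compares embeddings, a new target class could in principle be added by storing another reference embedding rather than retraining a classifier head, which is one common motivation for this kind of design.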

38 pages, 8516 KB

Open Access Article

Physics-Prior-Augmented Deep Learning for Acoustic Convergence Zone Identification in Data-Scarce Marine Environments

by Haoyu Wang, Shuai Chang, Hao Zheng, Shuo Yang, Jianxin He and Xiong Deng

J. Mar. Sci. Eng. 2026, 14(11), 1028; https://doi.org/10.3390/jmse14111028 - 31 May 2026

Viewed by 139

Abstract

High-precision identification of acoustic convergence zones (CZs) and acoustic shadow zones (SZs) is a core prerequisite for deep-sea sonar performance prediction and long-range underwater target detection. However, in data-scarce marine environments, traditional acoustic identification methods suffer from high environmental sensitivity and significant computational costs, while pure data-driven deep learning methods face dilemmas such as a lack of physical consistency and poor generalization on small samples. To address these issues, a three-level cascaded recognition framework based on physics-prior-augmented deep learning is proposed in this paper, enabling accurate segmentation of CZs and intelligent classification of sound field types under data-scarce scenarios. In this framework, physical acoustic principles are incorporated exclusively as priors through a training dataset generated by a Gaussian beam acoustic propagation code (Bellhop) and through hand-crafted geometric features derived post hoc from the initial segmentation outputs. Taking a typical deep-sea area in the Northwest Pacific Ocean as the research object, a hybrid dataset comprising 5000 simulated transmission loss images and 500 simulated images from a geographically distinct sea area is constructed. The sound field is categorized into four types: strong convergence, usable convergence, weak convergence, and shadow zone. In the first stage, the ResNet-34 backbone is improved by integrating deformable convolution and a global statistical feature module, which, combined with a joint loss function, achieves high-precision pixel-level segmentation of CZs and SZs, with the regional gray contrast reaching 86.9%. In the second stage, a customized dual-channel VGG16 architecture is designed to fuse the extracted geometric priors and visual features, achieving a sound field classification accuracy of 89.91%. 
In the third stage, a hybrid data augmentation technique combining Mixup and a convolutional autoencoder is adopted alongside a transfer learning strategy to mitigate data scarcity under cross-domain conditions, boosting the small-sample classification accuracy to 84.45%. The experimental results demonstrate that the models in each stage of the proposed framework significantly outperform traditional methods and baseline networks. This study provides a novel methodology and technical support for intelligent sound field identification in data-scarce marine environments. Finally, the core contributions and current limitations are summarized, and future research directions, such as constructing a dynamic hydrological parameter feedback mechanism and identifying three-dimensional complex sound fields, are outlined.

(This article belongs to the Section Ocean Engineering)
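The Mixup component of the third stage can be sketched as follows. This is a generic Mixup illustration, not the authors' pipeline: the `alpha` value, the two-pixel "images", and the one-hot label order (strong, usable, weak, shadow) are assumptions, and the convolutional autoencoder part of the hybrid augmentation is not shown.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.4, lam=None, rng=random):
    """Blend two samples and their one-hot labels with a shared coefficient.

    lam is drawn from Beta(alpha, alpha) unless supplied explicitly
    (a fixed lam is handy for reproducible examples)."""
    if lam is None:
        lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Hypothetical flattened transmission-loss "images" with two pixels each.
strong_img, strong_label = [0.0, 4.0], [1.0, 0.0, 0.0, 0.0]
shadow_img, shadow_label = [8.0, 0.0], [0.0, 0.0, 0.0, 1.0]

mixed_x, mixed_y, lam = mixup(strong_img, strong_label,
                              shadow_img, shadow_label, lam=0.75)
```

With `lam=0.75` the mixed sample is three parts strong-convergence and one part shadow-zone, and the soft label carries the same proportions.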

25 pages, 16006 KB

Open Access Article

Underwater Target Recognition with Fusion of Multi-Domain Temporal Features

by Xiaochun Liu, Chenyu Wang, Yunchuan Yang, Xiangfeng Yang, Youfeng Hu and Jianguo Liu

Acoustics 2026, 8(2), 22; https://doi.org/10.3390/acoustics8020022 - 25 Mar 2026

Viewed by 969

Abstract

The dynamic nature of acoustic environments, particularly the fluctuation of underwater channels and time-varying target observation angles, poses significant challenges for active sonar target recognition, a problem further aggravated by the scarcity of labeled training samples. To address these limitations, this paper proposes a novel recognition method enabling deep fusion of multi-domain temporal features extracted from target echoes. First, complementary features are extracted across spatial, time–frequency, and Doppler domains to achieve a comprehensive and discriminative representation of targets. Subsequently, we introduce a feature vector-level fusion mechanism designed specifically for few-shot learning, integrating a meta-knowledge-driven multi-stream feature extractor with an internal memory module within the feature tensor framework. This architecture constitutes the Multi-domain Temporal Feature Fusion Recognition Network (MTFF-RNet). The proposed approach is evaluated on a hybrid dataset combining simulated and experimental data, achieving a high recognition accuracy of 96.2% for both targets and interferents. Experimental results demonstrate that MTFF-RNet significantly enhances robustness and adaptability under varying underwater acoustic conditions and dynamic viewing geometries.

25 pages, 13561 KB

Open Access Article

An Underwater Target Recognition Method Based on Feature Fusion and Balanced Ensemble Transfer Learning

by Haoqian Zhang, Hong Liang, Linfeng Zhu and Wenbo Gou

J. Mar. Sci. Eng. 2026, 14(6), 579; https://doi.org/10.3390/jmse14060579 - 20 Mar 2026

Cited by 1 | Viewed by 407

Abstract

In underwater target recognition scenarios, challenges arise from the limited representational capability of acoustic images with single time–frequency features and from poor recognition performance caused by class imbalance in sample numbers. To tackle these issues, this paper proposes an underwater target recognition method based on feature fusion and balanced ensemble transfer learning. A LiT-INN dual-branch auto-encoder network architecture is employed for time–frequency image feature fusion to address the weak representation capability of single time–frequency features. The Restormer network serves as a shared feature encoder to extract fundamental features, enabling feature fusion of underwater target echo time–frequency image data and generating a fusion image dataset with richer feature information. To address class imbalance in sample sizes, a balanced ensemble transfer learning method is constructed using a two-stage decoupled fine-tuning procedure. The first stage employs a uniform sampler strategy to fine-tune the feature extraction module of a pre-trained transfer learning model. The second stage uses multiple balanced sampling optimization methods to fine-tune the classifier. Then, a weight averaging ensemble learning method performs decision-level fusion of multiple weak classifiers. Field test data from three target classes validated the performance of the algorithm, demonstrating a 3% improvement in average recognition accuracy compared to deep transfer learning methods under different imbalance ratios. This method effectively enhances recognition performance for classes with limited samples while significantly boosting overall recognition accuracy, offering a novel solution for underwater target recognition.

(This article belongs to the Section Ocean Engineering)
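Two ingredients named in this abstract, balanced sampling and decision-level fusion by weight averaging, can be sketched in a few lines. This is one plausible reading rather than the authors' method: inverse-frequency sampling weights stand in for the unspecified "balanced sampling optimization methods", and the fusion averages the class-probability outputs of several classifiers. All probabilities and labels below are made up.

```python
from collections import Counter


def class_balanced_weights(labels):
    """Per-sample sampling weights inversely proportional to class frequency,
    so that every class receives the same total sampling mass."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def average_ensemble(prob_lists, weights=None):
    """Decision-level fusion: weighted average of per-classifier class probabilities."""
    n = len(prob_lists)
    weights = weights or [1.0 / n] * n
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]


# Hypothetical imbalanced label set: three samples of class "a", one of class "b".
sample_weights = class_balanced_weights(["a", "a", "a", "b"])

# Hypothetical outputs of three classifiers over three target classes.
probs = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.50, 0.25, 0.25],
]
fused = average_ensemble(probs)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Averaging probabilities rather than hard votes lets a confident classifier outweigh two uncertain ones, which is a common reason to prefer it when the individual classifiers are weak.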

21 pages, 20926 KB

Open Access Article

Research on Neuro-Acoustic Human–Machine Collaborative Inter-Domain Global Attention Fusion for Underwater Acoustic Target Recognition

by Jiaqi Zhang, Zhangsong Shi, Huihui Xu, Zhe Rao, Songxue Bai and Junfeng Gao

J. Mar. Sci. Eng. 2026, 14(6), 578; https://doi.org/10.3390/jmse14060578 - 20 Mar 2026

Viewed by 469

Abstract

To enhance the adaptability of current underwater acoustic target recognition technology in complex marine environments and improve the performance of human–machine collaborative operations, this study proposes a human–machine collaborative underwater acoustic target recognition technology based on brain–computer interface technology. The method combines synchronized features from underwater acoustic signals and human neural signals, and proposes an inter-domain global attention fusion module that explores the fusion relationship of features at different depths and enhances joint feature expression by exploiting potentially complementary information between modalities. The experimental results show that the proposed network model can enhance feature discrimination and yield a more stable recognition model. Compared to a single feature, the human–machine collaborative fusion-feature model exhibits stronger classification performance, with an average classification accuracy of 96.4444%. This method can alleviate the limitations of single-mode underwater acoustic target recognition technology, combine the complementary advantages of humans and machines to achieve effective human–machine cooperation, and provide new insights for future underwater recognition technology and marine research.

(This article belongs to the Section Ocean Engineering)

27 pages, 28242 KB

Open Access Article

Physics-Informed Side-Scan Sonar Perception: Tackling Weak Targets and Sparse Debris via Geometric and Frequency Decoupling

by Bojian Yu, Rongsheng Lin, Hanxiang Zhou, Jianxiong Zhang and Xinwei Zhang

Sensors 2026, 26(6), 1938; https://doi.org/10.3390/s26061938 - 19 Mar 2026

Viewed by 575

Abstract

Side-scan sonar (SSS) serves as the primary perceptual instrument for Autonomous Underwater Vehicles (AUVs) in large-scale marine search and rescue (SAR) operations. However, the detection of critical targets is frequently hindered by severe hydro-acoustic noise, the spatial discontinuity of wreckage, and the weak visual signatures of small targets. To surmount these challenges, this paper presents WPG-DetNet. First, we introduce a Wavelet-Embedded Residual Backbone (WERB) to reconstruct the conventional downsampling paradigm. By substituting standard pooling with the Discrete Wavelet Transform (DWT), this architecture explicitly disentangles high-frequency noise from structural information in the frequency domain, thereby achieving the adaptive preservation of edge fidelity for large human-made targets while filtering out speckle interference. Then, addressing the distinct challenge of discontinuous aircraft wreckage, the framework further incorporates a Debris Graph Reasoning Module (D-GRM). This module models scattered fragments as nodes in a topological graph to capture long-range semantic dependencies, transforming isolated instance recognition into context-aware scene understanding. Finally, to bridge the gap between AI and underwater physics, we design a Shadow-Aided Decoupling Head (SADH) equipped with a physics-informed geometric loss. By enforcing mathematical consistency between target height and acoustic shadow length, this mechanism establishes a rigorous discriminative criterion capable of distinguishing weak-echo human bodies from seabed rocks based on shadow geometry. Experiments on the SCTD dataset demonstrate that WPG-DetNet achieves a mean Average Precision ($mAP_{50}$) of 97.5% and a Recall of 96.9%. Quantitative analysis reveals that our framework outperforms the classic Faster R-CNN by a margin of 12.8% in $mAP_{50}$ and surpasses the Transformer-based RT-DETR-R18 by 5.6% in high-precision localization metrics ($mAP_{50:95}$). Simultaneously, WPG-DetNet maintains superior efficiency with an inference speed of 62.5 FPS and a lightweight parameter count of 16.8 M, striking an optimal balance between robust perception and the real-time constraints of AUV operations.

(This article belongs to the Section Physical Sensors)
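The link between target height and acoustic shadow length that the SADH loss relies on can be illustrated with the textbook flat-seabed relation for side-scan sonar, obtained from similar triangles: h = L·H / (R + L), where H is the sonar altitude above the seabed, R the slant range to the start of the shadow, and L the shadow length along the slant range. The sketch below is an assumption-laden illustration, not the paper's loss: the function names, the absolute-difference penalty, and the numbers are all hypothetical, and the abstract does not give the actual formulation.

```python
def target_height(shadow_length, slant_range, sonar_altitude):
    """Height implied by a shadow on a flat seabed: h = L * H / (R + L)."""
    return shadow_length * sonar_altitude / (slant_range + shadow_length)


def expected_shadow_length(height, slant_range, sonar_altitude):
    """Inverse relation: L = h * R / (H - h); requires height < sonar_altitude."""
    if height >= sonar_altitude:
        raise ValueError("target must be lower than the sonar altitude")
    return height * slant_range / (sonar_altitude - height)


def shadow_consistency_penalty(predicted_height, shadow_length,
                               slant_range, sonar_altitude):
    """Hypothetical geometric penalty: disagreement between a predicted
    height and the height implied by the observed shadow."""
    implied = target_height(shadow_length, slant_range, sonar_altitude)
    return abs(predicted_height - implied)


# Hypothetical geometry in metres: towfish 10 m above the seabed,
# shadow starting at 30 m slant range and extending for 10 m.
h = target_height(shadow_length=10.0, slant_range=30.0, sonar_altitude=10.0)
L = expected_shadow_length(height=h, slant_range=30.0, sonar_altitude=10.0)
penalty = shadow_consistency_penalty(2.0, 10.0, 30.0, 10.0)
```

A low object such as a rock casts a short shadow relative to its echo footprint, while a taller object at the same range casts a longer one, so a penalty of this kind gives a detector a geometric cue that does not depend on echo strength.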

24 pages, 11178 KB

Open Access Article

FLAMA: Frame-Level Alignment Margin Attack for Scene Text and Automatic Speech Recognition

by Yikun Xu, Zhiheng Xu and Pengwen Dai

Electronics 2026, 15(5), 1064; https://doi.org/10.3390/electronics15051064 - 4 Mar 2026

Cited by 1 | Viewed by 525

Abstract

Scene text recognition (STR) and automatic speech recognition (ASR) translate visual or acoustic signals into linguistic sequences and underpin many modern perception systems. Although their front-ends and decoders differ (e.g., CTC-based, attention-based, or variants), both tasks ultimately rely on aligning input frames to output tokens using deep learning, which exposes a shared vulnerability to adversarial perturbations. Existing attacks commonly optimize global sequence-level objectives. As a result, decisive frames are treated implicitly, and optimization can become unnecessarily diffuse over long input sequences, hindering convergence and perceptual quality. To address these issues, we propose FLAMA, a unified Frame-Level Alignment Margin Attack that can be used for both STR and ASR models. FLAMA explicitly targets alignment by maximizing per-frame (or per-step) recognition margins. The design is decoder-agnostic and applies to both CTC-based and attention-based pipelines. It employs a recognition-score-aware Step/Halt gate that concentrates updates on the most critical frames, and a stabilization stage that suppresses late-iteration oscillations to improve optimization stability and perceptual control. Ablation analyses show that stabilization consistently enhances attack success and reduces distortion. We evaluate FLAMA on STR benchmarks (SVT, CUTE80, and IC13) with CRNN, STAR, and TRBA, and on the ASR benchmark (LibriSpeech) with a Wav2Vec 2.0 model. Across modalities and architectures, FLAMA achieves near-100% attack success while substantially reducing $\ell_2$ distortion and improving perceptual metrics compared with FGSM/PGD baselines. These results highlight frame-level alignment as a shared weak point across visual and audio sequence recognizers and suggest localized margin objectives as a principled route to effective sequence attacks.
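To make the idea of a per-frame margin and a Step/Halt gate concrete, here is a simplified sketch. It is not FLAMA itself: the margin definition (best competing logit minus the logit of the currently aligned label), the threshold `kappa`, and the gate rule are assumptions chosen for illustration, and the logits are invented.

```python
def attack_margins(frame_logits, aligned_labels):
    """Per-frame margin from the attacker's point of view:
    best competing logit minus the logit of the aligned (correct) label.
    A negative value means the frame is still recognized correctly."""
    margins = []
    for logits, label in zip(frame_logits, aligned_labels):
        best_other = max(v for k, v in enumerate(logits) if k != label)
        margins.append(best_other - logits[label])
    return margins


def step_gate(margins, kappa=0.0):
    """Hypothetical Step/Halt rule: keep updating (True) only frames whose
    margin has not yet reached the target kappa; halt (False) the rest."""
    return [m < kappa for m in margins]


# Hypothetical logits for four frames over three output symbols.
frame_logits = [
    [2.0, 0.5, 0.1],
    [0.2, 0.3, 1.5],
    [1.0, 0.9, 0.0],
    [3.0, 1.0, 0.0],
]
aligned_labels = [0, 2, 0, 1]

margins = attack_margins(frame_logits, aligned_labels)
gate = step_gate(margins)
```

In this toy example the last frame is already misrecognized, so the gate halts it and the perturbation budget can be spent on the three frames that still decode correctly.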

24 pages, 5073 KB

Open Access Review

Progress in Modern Pipeline Safety and Intelligent Technology

by Shaohua Dong, Lushuai Xu, Haotian Wei, Yong Li, Guanyi Liu, Feng Li and Yasir Mukhtar

Sustainability 2026, 18(4), 1728; https://doi.org/10.3390/su18041728 - 8 Feb 2026

Cited by 1 | Viewed by 1139

Abstract

Motivated by the need to reduce failure risks, enhance real-time situational awareness, and support data-driven decision-making, this article comprehensively reviews the latest progress in pipeline safety and intelligent technology, focusing on the effectiveness of integrity management technology in practice and the challenges it faces. A structured literature survey was conducted to outline the key role and significant achievements of smart technology in improving the efficiency and reliability of pipeline safety management. Using this methodology, the review synthesizes progress in pipeline integrity management and monitoring technology, including the application of distributed strain measurement technology, wireless sensor networks, and Internet of Things technology, as well as the practical effects of deep learning and machine learning in defect detection and incident recognition. Additionally, special attention is given to the latest achievements in applying large model technology, distributed optical fiber sensing technology, and acoustic analysis technology to leakage monitoring. Based on the reviewed research, the article identifies key technical challenges in the field of pipeline safety, together with targeted monitoring solutions and management strategies for addressing them. The findings indicate that intelligent technologies substantially strengthen the trend toward AI applications in this field. Hence, next-generation pipeline safety will rely on tightly coupled AI–IoT ecosystems. The review anticipates the future of pipeline safety management by providing theoretical reference and technical support for pipeline safety guarantees and intelligent operation and maintenance.

(This article belongs to the Special Issue Reliability, Sustainability and Risk Management of Energy and Pipeline Systems for a Low-Carbon Future)

33 pages, 40054 KB

Open Access Article

MVDCNN: A Multi-View Deep Convolutional Network with Feature Fusion for Robust Sonar Image Target Recognition

by Yue Fan, Cheng Peng, Peng Zhang, Zhisheng Zhang, Guoping Zhang and Jinsong Tang

Remote Sens. 2026, 18(1), 76; https://doi.org/10.3390/rs18010076 - 25 Dec 2025

Cited by 1 | Viewed by 1028

Abstract

Automatic Target Recognition (ATR) in single-view sonar imagery is severely hampered by geometric distortions, acoustic shadows, and incomplete target information due to occlusions and the slant-range imaging geometry, which frequently give rise to misclassification and hinder practical underwater detection applications. To address these critical limitations, this paper proposes a Multi-View Deep Convolutional Neural Network (MVDCNN) based on feature-level fusion for robust sonar image target recognition. The MVDCNN adopts a highly modular and extensible architecture consisting of four interconnected modules: an input reshaping module that adapts multi-view images to match the input format of pre-trained backbone networks via dimension merging and channel replication; a shared-weight feature extraction module that leverages Convolutional Neural Network (CNN) or Transformer backbones (e.g., ResNet, Swin Transformer, Vision Transformer) to extract discriminative features from each view, ensuring parameter efficiency and cross-view feature consistency; a feature fusion module that aggregates complementary features (e.g., target texture and shape) across views using max-pooling to retain the most salient characteristics and suppress noisy or occluded view interference; and a lightweight classification module that maps the fused feature representations to target categories. Additionally, to mitigate the data scarcity bottleneck in sonar ATR, we design a multi-view sample augmentation method based on sonar imaging geometric principles: this method systematically combines single-view samples of the same target via the combination formula and screens valid samples within a predefined azimuth range, constructing high-quality multi-view training datasets without relying on complex generative models or massive initial labeled data. 
Comprehensive evaluations on the Custom Side-Scan Sonar Image Dataset (CSSID) and Nankai Sonar Image Dataset (NKSID) demonstrate the superiority of our framework over single-view baselines. Specifically, the two-view MVDCNN achieves average classification accuracies of 94.72% (CSSID) and 97.24% (NKSID), with relative improvements of 7.93% and 5.05%, respectively; the three-view MVDCNN further boosts the average accuracies to 96.60% and 98.28%. Moreover, MVDCNN substantially elevates the precision and recall of small-sample categories (e.g., Fishing net and Small propeller in NKSID), effectively alleviating the class imbalance challenge. Mechanism validation via t-Distributed Stochastic Neighbor Embedding (t-SNE) feature visualization and prediction confidence distribution analysis confirms that MVDCNN yields more separable feature representations and more confident category predictions, with stronger intra-class compactness and inter-class discrimination in the feature space. The proposed MVDCNN framework provides a robust and interpretable solution for advancing sonar ATR and offers a technical paradigm for multi-view acoustic image understanding in complex underwater environments.

(This article belongs to the Special Issue Underwater Remote Sensing: Status, New Challenges and Opportunities)
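Two steps from this abstract lend themselves to a short sketch: max-pooling fusion of per-view features, and building multi-view samples by combining single-view samples of the same target and screening them by azimuth. The screening rule used here (the spread between the largest and smallest azimuth must not exceed a limit, with no wrap-around at 360 degrees) is an assumption, since the abstract only mentions "a predefined azimuth range". Sample identifiers and feature values are hypothetical.

```python
from itertools import combinations


def max_pool_fusion(view_features):
    """Element-wise maximum across the feature vectors of all views."""
    return [max(values) for values in zip(*view_features)]


def multi_view_samples(single_views, n_views, max_azimuth_span):
    """Combine single-view samples of one target into n_views-tuples.

    single_views: list of (sample_id, azimuth_deg) pairs.
    A combination is kept only if its azimuth spread is within
    max_azimuth_span (assumed screening rule)."""
    valid = []
    for combo in combinations(single_views, n_views):
        azimuths = [azimuth for _, azimuth in combo]
        if max(azimuths) - min(azimuths) <= max_azimuth_span:
            valid.append(tuple(sample_id for sample_id, _ in combo))
    return valid


# Hypothetical single-view samples of one target with azimuths in degrees.
views = [("v0", 0), ("v1", 20), ("v2", 45), ("v3", 90)]
pairs = multi_view_samples(views, n_views=2, max_azimuth_span=45)

# Hypothetical 3-D features extracted from two views by a shared backbone.
fused = max_pool_fusion([[0.1, 0.9, 0.3], [0.4, 0.2, 0.8]])
```

Four single views yield six possible pairs, of which four survive the screen here, which shows how the combination step multiplies the number of training samples without any generative model.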

13 pages, 284 KB

Open Access Article

Two-Stage Domain Adaptation for LLM-Based ASR by Decoupling Linguistic and Acoustic Factors

by Lin Zheng, Xuyang Wang, Qingwei Zhao and Ta Li

Appl. Sci. 2026, 16(1), 60; https://doi.org/10.3390/app16010060 - 20 Dec 2025

Viewed by 1228

Abstract

Large language models (LLMs) have been increasingly applied in Automatic Speech Recognition (ASR), achieving significant advancements. However, the performance of LLM-based ASR (LLM-ASR) models remains unsatisfactory when applied across domains due to domain shifts between acoustic and linguistic conditions. To address this challenge, we propose a decoupled two-stage domain adaptation framework that separates the adaptation process into text-only and audio-only stages. In the first stage, we leverage abundant text data from the target domain to refine the LLM component, thereby improving its contextual and linguistic alignment with the target domain. In the second stage, we employ a pseudo-labeling method with unlabeled audio data in the target domain and introduce two key enhancements: (1) incorporating decoupled auxiliary Connectionist Temporal Classification (CTC) loss to improve the robustness of the speech encoder under different acoustic conditions; (2) adopting a synchronous LLM tuning strategy, allowing the LLM to continuously learn linguistic alignment from pseudo-labeled transcriptions enriched with domain textual knowledge. The experimental results demonstrate that our proposed methods significantly improve the performance of LLM-ASR in the target domain, achieving a relative word error rate reduction of 19.2%.

(This article belongs to the Special Issue Speech Recognition: Techniques, Applications and Prospects)

32 pages, 5708 KB

Open Access Article

Affordable Audio Hardware and Artificial Intelligence Can Transform the Dementia Care Pipeline

by Ilyas Potamitis

Algorithms 2025, 18(12), 787; https://doi.org/10.3390/a18120787 - 12 Dec 2025

Viewed by 3152

Abstract

Population aging is increasing dementia care demand. We present an audio-driven monitoring pipeline that operates on mobile phones, microcontroller nodes, or smart television sets. The system combines audio signal processing with AI tools for structured interpretation. Preprocessing includes voice activity detection, speaker diarization, automatic speech recognition for dialogs, and speech-emotion recognition. An audio classifier detects home-care–relevant events (cough, cane taps, thuds, knocks, and speech). A large language model integrates transcripts, acoustic features, and a consented household knowledge base to produce a daily caregiver report covering orientation/disorientation (person, place, and time), delusion themes, agitation events, health proxies, and safety flags (e.g., exit seeking and falling). The pipeline targets real-time monitoring in homes and facilities, and it is an adjunct to caregiving, not a diagnostic device. Evaluation focuses on human-in-the-loop review, various audio/speech modalities, and the ability of AI to integrate information and reason. Intended users are low-income households in remote settings where in-person caregiving cannot be secured, enabling remote monitoring support for older adults with dementia.

(This article belongs to the Special Issue AI-Assisted Medical Diagnostics)

19 pages, 1786 KB

Open Access Article

Path-Routing Convolution and Scalable Lightweight Networks for Robust Underwater Acoustic Target Recognition

by Yue Zhao, Menghan Chen, Yuchen Lu, Liangliang Cheng, Cheng Chen, Yifei Li and Nizar Faisal Alkayem

Sensors 2025, 25(22), 7007; https://doi.org/10.3390/s25227007 - 17 Nov 2025

Viewed by 963

Abstract

Maritime traffic surveillance and ocean environmental protection urgently require the accurate identification of surface vessel types. Although deep learning methods have significantly improved underwater acoustic target recognition performance, existing models suffer from large parameter counts and fail to adapt to the multi-scale spectral features of radiated noise from different vessel types, restricting their practical deployment on power-constrained underwater sensors. To address these challenges, this paper proposes a novel path-routing convolution mechanism that achieves discriminative extraction of cross-scale acoustic features through multi-dilation-rate parallel paths and an adaptive routing strategy. It also designs the MobilePR-ConvNet unified architecture, which enables a single framework to adapt automatically to diverse hardware platforms through systematic width scaling. Experiments on the DeepShip and ShipsEar datasets demonstrate that the proposed method achieved 98.58% and 97.82% recognition accuracies, respectively, while maintaining a 77.8% robust performance under 10 dB low-signal-to-noise-ratio conditions, validating the cross-dataset generalization capability in complex marine environments and providing an effective solution for intelligent deployment on resource-constrained underwater devices.

(This article belongs to the Special Issue Advances in Automatic Speech Recognition, Audio and Underwater Acoustic Signal Analysis)
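The combination of multi-dilation-rate parallel paths with a routing step can be sketched in one dimension. This is a generic illustration, not the paper's path-routing convolution: a single shared kernel, zero "same" padding, and a softmax over fixed routing scores are all assumptions. In a trained network the routing scores would presumably come from a learned, input-dependent branch.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded 'same' 1-D dilated convolution (cross-correlation form)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def path_routing_conv(x, kernel, dilations, routing_scores):
    """Run one path per dilation rate and mix the outputs with softmax weights."""
    weights = softmax(routing_scores)
    paths = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [sum(w * p[i] for w, p in zip(weights, paths)) for i in range(len(x))]


# Hypothetical input: a unit impulse, so the output shows each path's footprint.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
mixed = path_routing_conv(impulse, kernel=[1.0, 1.0, 1.0],
                          dilations=[1, 2], routing_scores=[0.0, 0.0])
```

With equal routing scores the two paths contribute equally, and the impulse response spreads over five neighbouring positions instead of three, which is how larger dilation rates widen the receptive field without adding parameters.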

21 pages, 2679 KB

Open Access Article

Intelligent Feature Extraction and Event Classification in Distributed Acoustic Sensing Using Wavelet Packet Decomposition

by Artem Kozmin, Pavel Borozdin, Alexey Chernenko, Sergei Gostilovich, Oleg Kalashev and Alexey Redyuk

Technologies 2025, 13(11), 514; https://doi.org/10.3390/technologies13110514 - 11 Nov 2025

Cited by 2 | Viewed by 1051

Abstract

Distributed acoustic sensing (DAS) systems enable real-time monitoring of physical events across extended areas using optical fiber that detects vibrations through changes in backscattered light patterns. In perimeter security applications, these systems must accurately distinguish between legitimate activities and potential security threats by analyzing complex spatio-temporal data patterns. However, the high dimensionality and noise content of raw DAS data present significant challenges for effective feature extraction and event classification, particularly when computational efficiency is required for real-time deployment. Traditional approaches and current machine learning methods often struggle to balance information preservation against computational complexity. This study addresses the critical need for efficient and accurate feature extraction methods that can identify informative signal components while maintaining real-time processing capabilities in DAS-based security systems. Here we show that wavelet packet decomposition (WPD) combined with a cascaded machine learning approach achieves 98% classification accuracy while reducing computational load through intelligent channel selection and preliminary filtering. Our modified peak signal-to-noise ratio metric successfully identifies the most informative frequency bands, which we validate through comprehensive neural network experiments across all possible WPD channels. The integration of principal component analysis with logistic regression as a preprocessing filter eliminates a substantial portion of non-target events while maintaining a high recall level, significantly improving upon methods that process all available data. These findings establish WPD as a powerful preprocessing technique for distributed sensing applications, with immediate applications in critical infrastructure protection. 
The demonstrated gains in computational efficiency and accuracy suggest broad applicability to other pattern recognition challenges in large-scale sensor networks, seismic monitoring, and structural health monitoring systems, where real-time processing of high-dimensional acoustic data is essential.

(This article belongs to the Special Issue Application and Development of Distributed Acoustic Sensing (DAS) Technology)
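
As a rough illustration of the wavelet packet decomposition mentioned in the abstract above, the following minimal sketch splits a signal into equal-width sub-bands and reports the energy of each. It assumes a Haar wavelet and a plain energy score; the abstract does not state which wavelet the authors use, and their modified peak signal-to-noise ratio metric is not reproduced here. The function names `haar_wpd` and `band_energies` are hypothetical.

```python
import numpy as np

def haar_wpd(signal, level):
    """Full Haar wavelet packet tree; returns the 2**level leaf nodes.

    Assumption: Haar filters, chosen only for brevity. Leaves are in natural
    (Paley) order, which is not strictly increasing in frequency.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [x]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))  # low-pass half
            next_nodes.append((even - odd) / np.sqrt(2.0))  # high-pass half
        nodes = next_nodes
    return nodes

def band_energies(nodes):
    """Energy (sum of squared coefficients) of each sub-band."""
    return [float(np.sum(node ** 2)) for node in nodes]

if __name__ == "__main__":
    leaves = haar_wpd(np.arange(8.0), level=2)
    print(band_energies(leaves))
```

Because the Haar transform is orthonormal, the sub-band energies sum to the energy of the input, so a per-band score of this kind can be compared across channels before deciding which ones to pass on to a classifier.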

15 pages, 926 KB

Open Access Article

Cross-Corpus Speech Emotion Recognition Based on Attention-Driven Feature Refinement and Spatial Reconstruction

by Huawei Tao, Yixing Jiang, Qianqian Li, Li Zhao and Zhizhe Yang

Information 2025, 16(11), 945; https://doi.org/10.3390/info16110945 - 30 Oct 2025

Viewed by 1490

Abstract

In cross-corpus scenarios, inappropriate feature-processing methods tend to cause the loss of key emotional information. Additionally, deep neural networks contain substantial redundancy, which triggers domain shift issues and impairs the generalization ability of emotion recognition systems. To address these challenges, this study proposes a cross-corpus speech emotion recognition model based on attention-driven feature refinement and spatial reconstruction. Specifically, the proposed approach consists of three key components: first, an autoencoder integrated with a multi-head attention mechanism, which enhances the model's ability to focus on the emotional components of acoustic features during feature compression; second, a feature refinement and spatial reconstruction module designed to further improve the extraction of emotional features, with a gating mechanism employed to optimize the feature reconstruction process; finally, the Charbonnier loss function, adopted as the training loss to minimize the difference between features from the source domain and the target domain, thereby enhancing the cross-domain robustness of the model. Experimental results demonstrated that the proposed method achieved an average recognition accuracy of 46.75% across six sets of cross-corpus experiments, representing an improvement of 4.17% to 14.33% compared with traditional domain adaptation methods.
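
For readers unfamiliar with the Charbonnier loss named in the abstract above, a minimal sketch follows. It uses the standard form, the mean of sqrt(d^2 + eps^2) over element-wise differences d. How the authors pair source-domain and target-domain features, and the value of `eps`, are not given in the abstract; the pairing by position and the default `eps=1e-3` below are assumptions, and `charbonnier_loss` is a hypothetical name.

```python
import math

def charbonnier_loss(source, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(d**2 + eps**2) over paired values.

    Assumption: features are paired by position; eps is an illustrative default.
    """
    if len(source) != len(target) or len(source) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.sqrt((s - t) ** 2 + eps ** 2) for s, t in zip(source, target))
    return total / len(source)

if __name__ == "__main__":
    print(charbonnier_loss([0.2, 0.5, 0.9], [0.1, 0.5, 1.4]))
```

The penalty behaves like a squared error for differences much smaller than `eps` and like an absolute error for larger ones, which makes it smooth at zero while remaining less sensitive to outliers than a pure squared loss.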

19 pages, 2659 KB

Open Access Article

A Full Pulse Acoustic Monitoring Method for Detecting the Interface During Concrete Pouring in Cast-in-Place Pile

by Ming Chen, Jinchao Wang, Jiwen Zeng and Hao He

Appl. Sci. 2025, 15(20), 11205; https://doi.org/10.3390/app152011205 - 19 Oct 2025

Viewed by 934

Abstract

As a key form of deep foundation in civil engineering, the concrete pouring quality of cast-in-place piles directly determines the integrity and long-term bearing performance of the pile body. Accurate monitoring of the pouring interface is critical to preventing defects such as mud inclusion and pile breakage. To address the limitations of existing monitoring methods, this paper proposes a full-pulse acoustic monitoring method for the concrete pouring interface of cast-in-place piles. Firstly, by constructing a hardware system platform consisting of "multi-level in-borehole sound sources + interface acoustic wave sensors + orifice full-pulse receivers + ground processors", differential capture of signals propagating at different depths is achieved through multi-frequency excitation. Subsequently, a waveform data processing method is proposed to realize denoising, enhancement, and frequency discrimination of different signals, and a target feature recognition model that integrates cross-correlation functions and signal similarity analysis is established. Finally, by leveraging the differential characteristics of measurement signals at different depths, a near-field measurement mode and a far-field measurement mode are developed, thereby establishing a calculation model for the elevation of the pouring interface under different scenarios. The feasibility of the proposed method is verified through practical engineering cases. The results indicate that the proposed full-pulse acoustic monitoring method can achieve non-destructive, real-time, and high-precision monitoring of the pouring interface, providing an effective technical approach for quality control in pile foundation construction and exhibiting broad application prospects.
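
The abstract above mentions cross-correlation functions as part of the feature recognition model. As a generic illustration only, and not the authors' model, the sketch below estimates the arrival delay of a known pulse in a received trace by locating the peak of the cross-correlation. The function name `estimate_delay` and the synthetic pulse are hypothetical.

```python
import numpy as np

def estimate_delay(reference, received, sample_rate_hz):
    """Delay (seconds) of `received` relative to `reference`.

    Uses the lag at which the full cross-correlation is largest.
    """
    reference = np.asarray(reference, dtype=float)
    received = np.asarray(received, dtype=float)
    corr = np.correlate(received, reference, mode="full")
    # Index 0 of the full output corresponds to a lag of -(len(reference) - 1).
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / sample_rate_hz

if __name__ == "__main__":
    pulse = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
    reference = np.zeros(50)
    reference[:5] = pulse
    received = np.zeros(50)
    received[12:17] = pulse  # same pulse, arriving 12 samples later
    print(estimate_delay(reference, received, sample_rate_hz=1000.0))
```

Converting such a delay into an interface elevation would additionally require the propagation speed and the source-receiver geometry, neither of which is specified in the abstract.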

Search Results (126)
