Search Results (25)

Search Parameters:
Keywords = whistle model

19 pages, 8193 KB  
Article
Numerical and Experimental Analysis of Whistling Sound Generation and Suppression in Narrow-Gap Flow of Vehicle Side-View Mirror
by Kwongi Lee, Sangheon Lee, Cheolung Cheong, Sungnam Rim and Seongryong Shin
Appl. Sci. 2026, 16(1), 31; https://doi.org/10.3390/app16010031 - 19 Dec 2025
Viewed by 812
Abstract
This study investigates the generation and suppression of the whistling noise caused by flow through the narrow gap of a vehicle’s side mirror, an aerodynamic phenomenon often reported as a source of discomfort to passengers. The research employs a simultaneous approach, combining wind tunnel experiments to determine the geometries and wind conditions at a flow speed of 22 m/s contributing to whistle generation at between 7 kHz and 8 kHz with numerical simulations utilizing compressible Large Eddy Simulation (LES) techniques for an in-depth investigation of the underlying aerodynamics. The Simplified Side-mirror Model (SSM) is developed, enabling precise wind visualization, and facilitating the identification of fundamental aerodynamic sound sources via vortex sound theory. The analysis reveals that the whistling sound is intricately linked to edge tone phenomena, driven by vortex shedding and flow instabilities at the angled shape in a narrow gap. Building on these insights, the study introduces the Suppressed Whistle Model (SWM), a configuration including shapes resembling a vortex generator that successfully mitigates the whistling by disrupting the identified flow structures causing the whistling sound. The suggested design is validated through wind visualization, comparing the numerical flow structures with the experimental ones. The experimental whistling sound pressure level of SWM decreases by about 20 dB compared to SSM, and a similar trend can be confirmed in the numerical results. Full article
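As a quick sanity check on the reported reduction, a roughly 20 dB drop in sound pressure level from SSM to SWM corresponds to about a tenfold reduction in pressure amplitude; this is standard acoustics arithmetic, not a figure taken from the paper:

```python
# Sound pressure level is defined as SPL = 20 * log10(p / p_ref),
# so a 20 dB drop means the pressure amplitude falls by a factor of 10.
delta_db = 20
pressure_ratio = 10 ** (delta_db / 20)
print(pressure_ratio)  # → 10.0
```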
(This article belongs to the Section Acoustics and Vibrations)

18 pages, 2065 KB  
Article
Phoneme-Aware Augmentation for Robust Cantonese ASR Under Low-Resource Conditions
by Lusheng Zhang, Shie Wu and Zhongxun Wang
Symmetry 2025, 17(9), 1478; https://doi.org/10.3390/sym17091478 - 8 Sep 2025
Cited by 2 | Viewed by 1631
Abstract
Cantonese automatic speech recognition (ASR) faces persistent challenges due to its nine lexical tones, extensive phonological variation, and the scarcity of professionally transcribed corpora. To address these issues, we propose a lightweight and data-efficient framework that leverages weak phonetic supervision (WPS) in conjunction with two phoneme-aware augmentation strategies. (1) Dynamic Boundary-Aligned Phoneme Dropout progressively removes entire IPA segments according to a curriculum schedule, simulating real-world phenomena such as elision, lenition, and tonal drift while ensuring training stability. (2) Phoneme-Aware SpecAugment confines all time- and frequency-masking operations within phoneme boundaries and prioritizes high-attention regions, thereby preserving intra-phonemic contours and formant integrity. Built on the Whistle encoder—which integrates a Conformer backbone, Connectionist Temporal Classification–Conditional Random Field (CTC-CRF) alignment, and a multi-lingual phonetic space—the approach requires only a grapheme-to-phoneme lexicon and Montreal Forced Aligner outputs, without any additional manual labeling. Experiments on the Cantonese subset of Common Voice demonstrate consistent gains: Dynamic Dropout alone reduces phoneme error rate (PER) from 17.8% to 16.7% with 50 h of speech and 16.4% to 15.1% with 100 h, while the combination of the two augmentations further lowers PER to 15.9%/14.4%. These results confirm that structure-aware phoneme-level perturbations provide an effective and low-cost solution for building robust Cantonese ASR systems under low-resource conditions. Full article
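The boundary-confined masking idea can be illustrated with a short sketch; the function name, parameters, and the zero-fill masking value below are illustrative choices of ours, not details taken from the paper:

```python
import random

def phoneme_aware_time_mask(spec, boundaries, max_width=5, seed=0):
    """Mask a run of time frames, confined to one phoneme segment.

    spec       -- 2D list indexed [time][freq] of spectrogram values
    boundaries -- (start, end) frame intervals, one per phoneme
    """
    rng = random.Random(seed)
    start, end = rng.choice(boundaries)       # pick one phoneme segment
    width = min(max_width, end - start)       # the mask never crosses the boundary
    t0 = rng.randint(start, end - width)
    for t in range(t0, t0 + width):
        spec[t] = [0.0] * len(spec[t])        # zero out the masked frames
    return spec, (t0, t0 + width)
```

Frequency masking would follow the same pattern along the other axis; the paper additionally biases mask placement toward high-attention regions, which this sketch omits.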
(This article belongs to the Section Computer)

27 pages, 5228 KB  
Article
Detection of Surface Defects in Steel Based on Dual-Backbone Network: MBDNet-Attention-YOLO
by Xinyu Wang, Shuhui Ma, Shiting Wu, Zhaoye Li, Jinrong Cao and Peiquan Xu
Sensors 2025, 25(15), 4817; https://doi.org/10.3390/s25154817 - 5 Aug 2025
Cited by 6 | Viewed by 2485
Abstract
Automated surface defect detection in steel manufacturing is pivotal for ensuring product quality, yet it remains an open challenge owing to the extreme heterogeneity of defect morphologies—ranging from hairline cracks and microscopic pores to elongated scratches and shallow dents. Existing approaches, whether classical vision pipelines or recent deep-learning paradigms, struggle to simultaneously satisfy the stringent demands of industrial scenarios: high accuracy on sub-millimeter flaws, insensitivity to texture-rich backgrounds, and real-time throughput on resource-constrained hardware. Although contemporary detectors have narrowed the gap, they still exhibit pronounced sensitivity–robustness trade-offs, particularly in the presence of scale-varying defects and cluttered surfaces. To address these limitations, we introduce MBY (MBDNet-Attention-YOLO), a lightweight yet powerful framework that synergistically couples the MBDNet backbone with the YOLO detection head. Specifically, the backbone embeds three novel components: (1) HGStem, a hierarchical stem block that enriches low-level representations while suppressing redundant activations; (2) Dynamic Align Fusion (DAF), an adaptive cross-scale fusion mechanism that dynamically re-weights feature contributions according to defect saliency; and (3) C2f-DWR, a depth-wise residual variant that progressively expands receptive fields without incurring prohibitive computational costs. Building upon this enriched feature hierarchy, the neck employs our proposed MultiSEAM module—a cascaded squeeze-and-excitation attention mechanism operating at multiple granularities—to harmonize fine-grained and semantic cues, thereby amplifying weak defect signals against complex textures. 
Finally, we integrate the Inner-SIoU loss, which refines the geometric alignment between predicted and ground-truth boxes by jointly optimizing center distance, aspect ratio consistency, and IoU overlap, leading to faster convergence and tighter localization. Extensive experiments on two publicly available steel-defect benchmarks—NEU-DET and PVEL-AD—demonstrate the superiority of MBY. Without bells and whistles, our model achieves 85.8% mAP@0.5 on NEU-DET and 75.9% mAP@0.5 on PVEL-AD, surpassing the best-reported results by significant margins while maintaining real-time inference on an NVIDIA Jetson Xavier. Ablation studies corroborate the complementary roles of each component, underscoring MBY’s robustness across defect scales and surface conditions. These results suggest that MBY strikes an appealing balance between accuracy, efficiency, and deployability, offering a pragmatic solution for next-generation industrial quality-control systems. Full article
(This article belongs to the Section Sensing and Imaging)

18 pages, 1837 KB  
Article
Real-Time Dolphin Whistle Detection on Raspberry Pi Zero 2 W with a TFLite Convolutional Neural Network
by Rocco De Marco, Francesco Di Nardo, Alessandro Rongoni, Laura Screpanti and David Scaradozzi
Robotics 2025, 14(5), 67; https://doi.org/10.3390/robotics14050067 - 19 May 2025
Cited by 3 | Viewed by 3057
Abstract
The escalating conflict between cetaceans and fisheries underscores the need for efficient mitigation strategies that balance conservation priorities with economic viability. This study presents a TinyML-driven approach deploying an optimized Convolutional Neural Network (CNN) on a Raspberry Pi Zero 2 W for real-time detection of bottlenose dolphin whistles, leveraging spectrogram analysis to address acoustic monitoring challenges. Specifically, a CNN model previously developed for classifying dolphins’ vocalizations and originally implemented with TensorFlow was converted to TensorFlow Lite (TFLite) with architectural optimizations, reducing the model size by 76%. Both TensorFlow and TFLite models were trained on 22 h of underwater recordings taken in controlled environments and processed into 0.8 s spectrogram segments (300 × 150 pixels). Despite reducing model size, TFLite models maintained the same accuracy as the original TensorFlow model (87.8% vs. 87.0%). Throughput and latency were evaluated by varying the thread allocation (1–8 threads), revealing the best performance at 4 threads (quad-core alignment), achieving an inference latency of 120 ms and sustained throughput of 8 spectrograms/second. The system demonstrated robustness in 120 h of continuous stress tests without failure, underscoring its reliability in marine environments. This work achieved a critical balance between computational efficiency and detection fidelity (F1-score: 86.9%) by leveraging quantized, multithreaded inference. These advancements enable low-cost devices for real-time cetacean presence detection, offering transformative potential for bycatch reduction and adaptive deterrence systems. This study bridges artificial intelligence innovation with ecological stewardship, providing a scalable framework for deploying machine learning in resource-constrained settings while addressing urgent conservation challenges. Full article
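The scale of the training data is easy to reconstruct from the numbers quoted above; assuming non-overlapping windows (the hop size is not stated here), 22 h of audio cut into 0.8 s segments yields:

```python
# 22 h of recordings cut into 0.8 s spectrogram segments.
# Non-overlapping windows are assumed; the actual hop size may differ.
total_seconds = 22 * 3600          # 79200 s
segment_seconds = 0.8
n_segments = round(total_seconds / segment_seconds)
print(n_segments)  # → 99000
```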
(This article belongs to the Section Sensors and Control in Robotics)

18 pages, 6601 KB  
Article
Dolphin Health Classifications from Whistle Features
by Brittany Jones, Jessica Sportelli, Jeremy Karnowski, Abby McClain, David Cardoso and Maximilian Du
J. Mar. Sci. Eng. 2024, 12(12), 2158; https://doi.org/10.3390/jmse12122158 - 26 Nov 2024
Cited by 4 | Viewed by 3847
Abstract
Bottlenose dolphins often conceal behavioral signs of illness until they reach an advanced stage. Motivated by the efficacy of vocal biomarkers in human health diagnostics, we utilized supervised machine learning methods to assess various model architectures’ effectiveness in classifying dolphin health status from the acoustic features of their whistles. A gradient boosting classifier achieved a 72.3% accuracy in distinguishing between normal and abnormal health states—a significant improvement over chance (permutation test; 1000 iterations, p < 0.001). The model was trained on 30,693 whistles from 15 dolphins and the test set (15%) totaled 3612 ‘normal’ and 1775 ‘abnormal’ whistles. The classifier identified the health status of the dolphin from the whistles features with 72.3% accuracy, 73.2% recall, 56.1% precision, and a 63.5% F1 score. These findings suggest the encoding of internal health information within dolphin whistle features, with indications that the severity of illness correlates with classification accuracy, notably in its success for identifying ‘critical’ cases (94.2%). The successful development of this diagnostic tool holds promise for furnishing a passive, non-invasive, and cost-effective means for early disease detection in bottlenose dolphins. Full article
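The reported F1 score can be reproduced directly from the stated precision and recall, since F1 is their harmonic mean:

```python
# Precision and recall reported for the gradient boosting classifier.
precision = 0.561
recall = 0.732

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1 * 100, 1))  # → 63.5
```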
(This article belongs to the Special Issue Recent Advances in Marine Bioacoustics)

19 pages, 755 KB  
Article
Post-Quantum Secure ID-Based (Threshold) Linkable Dual-Ring Signature and Its Application in Blockchain Transactions
by Wen Gao, Haoyuan Yao, Baodong Qin, Xiaoli Dong, Zhen Zhao and Jiayu Zeng
Cryptography 2024, 8(4), 48; https://doi.org/10.3390/cryptography8040048 - 28 Oct 2024
Cited by 4 | Viewed by 6942
Abstract
Ring signatures are widely used in e-voting, anonymous whistle-blowing systems, and blockchain transactions. However, due to the anonymity of ring signatures, a signer can sign the same message multiple times, potentially leading to repeated voting or double spending in blockchain transactions. To address these issues in blockchain transactions, this work constructs an identity-based linkable ring signature scheme based on the hardness of the lattice-based Module Small Integer Solution (M-SIS) assumption, which is hard even for quantum attackers. The proposed scheme is proven to be anonymous, unforgeable, linkable, and nonslanderable in the random oracle model. Compared to existing identity-based linkable ring signature (IBLRS) schemes of linear size, our signature size is relatively smaller, and this advantage is more pronounced when the number of ring members is small. We provide approximate signature size data for ring members ranging from 2 to 2048. When the number of ring members is 16 (or 512, respectively), the signature size of our scheme is 11.40 KB (or 24.68 KB, respectively). Finally, a threshold extension is given as an additional scheme with specifications and security analysis. Full article

22 pages, 9907 KB  
Article
An Automatic Deep Learning Bowhead Whale Whistle Recognizing Method Based on Adaptive SWT: Applying to the Beaufort Sea
by Rui Feng, Jian Xu, Kangkang Jin, Luochuan Xu, Yi Liu, Dan Chen and Linglong Chen
Remote Sens. 2023, 15(22), 5346; https://doi.org/10.3390/rs15225346 - 13 Nov 2023
Cited by 4 | Viewed by 2555
Abstract
The bowhead whale is a vital component of the maritime environment. Using deep learning techniques to recognize bowhead whales accurately and efficiently is crucial for their protection. Marine acoustic remote sensing technology is currently an important method to recognize bowhead whales. Adaptive SWT is used to extract the acoustic features of bowhead whales. The CNN-LSTM deep learning model was constructed to recognize bowhead whale voices. Compared to STFT, the adaptive SWT used in this study raises the SCR for the stationary and nonstationary bowhead whale whistles by 88.20% and 92.05%, respectively. Ten-fold cross-validation yields an average recognition accuracy of 92.85%. The method efficiency of this work was further confirmed by the consistency found in the Beaufort Sea recognition results and the fisheries ecological study. The research results in this paper help promote the application of marine acoustic remote sensing technology and the conservation of bowhead whales. Full article
(This article belongs to the Special Issue Advanced Techniques for Water-Related Remote Sensing)

15 pages, 5638 KB  
Article
Underwater Biomimetic Covert Acoustic Communications Mimicking Multiple Dolphin Whistles
by Yongcheol Kim, Hojun Lee, Seunghwan Seol, Bonggyu Park and Jaehak Chung
Electronics 2023, 12(19), 3999; https://doi.org/10.3390/electronics12193999 - 22 Sep 2023
Cited by 5 | Viewed by 2599
Abstract
This paper presents an underwater biomimetic covert acoustic communication system that achieves high covertness and a high data rate by mimicking dolphin group whistles. The proposed method uses combined time–frequency shift keying modulation with continuous varying carrier frequency modulation, which mitigates the interference between two overlapping multiple whistles while maintaining a high data rate. The data rate and bit error rate (BER) performance of the proposed method were compared with conventional underwater covert communication through an additive white Gaussian noise channel, a modeled underwater channel, and practical ocean experiments. For the covertness test, the similarity of the proposed multiple whistles to real dolphin group whistles was assessed using a mean opinion score test. As a result, the proposed method demonstrated a higher data rate, better BER performance, and strong covertness, with its synthesized whistles closely resembling real dolphin group whistles. Full article
(This article belongs to the Special Issue New Advances in Underwater Communication Systems)

25 pages, 1672 KB  
Article
Drone-Based Environmental Emergency Response in the Brazilian Amazon
by Janiele Custodio and Hernan Abeledo
Drones 2023, 7(9), 554; https://doi.org/10.3390/drones7090554 - 27 Aug 2023
Cited by 4 | Viewed by 5348
Abstract
This paper introduces a location–allocation model to support environmental emergency response strategic planning using a drone-based network. Drones are used to verify potential emergencies, gathering additional information to support emergency response missions when time and resources are limited. The resulting discrete facility location–allocation model with mobile servers assumes a centralized network operated out of sight by first responders and government agents. The optimization problem seeks to find the minimal cost configuration that meets operational constraints and performance objectives. To test the practical applicability of the proposed model, a real-life case study was implemented for the municipality of Ji-Paraná, in the Brazilian Amazon, using demand data from a mobile whistle-blower application and from satellite imagery projects that monitor deforestation and fire incidents in the region. Experiments are performed to understand the model’s sensitivity to various demand scenarios and capacity restrictions. Full article
(This article belongs to the Special Issue UAV IoT Sensing and Networking)

17 pages, 4453 KB  
Article
Interactive Attention Learning on Detection of Lane and Lane Marking on the Road by Monocular Camera Image
by Wei Tian, Xianwang Yu and Haohao Hu
Sensors 2023, 23(14), 6545; https://doi.org/10.3390/s23146545 - 20 Jul 2023
Cited by 11 | Viewed by 5199
Abstract
Vision-based identification of lane area and lane marking on the road is an indispensable function for intelligent driving vehicles, especially for localization, mapping and planning tasks. However, due to the increasing complexity of traffic scenes, such as occlusion and discontinuity, detecting lanes and lane markings from an image captured by a monocular camera becomes persistently challenging. The lanes and lane markings have a strong position correlation and are constrained by a spatial geometry prior to the driving scene. Most existing studies only explore a single task, i.e., either lane marking or lane detection, and do not consider the inherent connection or exploit the modeling of this kind of relationship between both elements to improve the detection performance of both tasks. In this paper, we establish a novel multi-task encoder–decoder framework for the simultaneous detection of lanes and lane markings. This approach deploys a dual-branch architecture to extract image information from different scales. By revealing the spatial constraints between lanes and lane markings, we propose an interactive attention learning for their feature information, which involves a Deformable Feature Fusion module for feature encoding, a Cross-Context module as information decoder, a Cross-IoU loss and a Focal-style loss weighting for robust training. Without bells and whistles, our method achieves state-of-the-art results on tasks of lane marking detection (with 32.53% on IoU, 81.61% on accuracy) and lane segmentation (with 91.72% on mIoU) of the BDD100K dataset, which showcases an improvement of 6.33% on IoU, 11.11% on accuracy in lane marking detection and 0.22% on mIoU in lane detection compared to the previous methods. Full article
(This article belongs to the Section Vehicular Sensing)

18 pages, 4217 KB  
Article
Low-Resource Generation Method for Few-Shot Dolphin Whistle Signal Based on Generative Adversarial Network
by Huiyuan Wang, Xiaojun Wu, Zirui Wang, Yukun Hao, Chengpeng Hao, Xinyi He and Qiao Hu
J. Mar. Sci. Eng. 2023, 11(5), 1086; https://doi.org/10.3390/jmse11051086 - 22 May 2023
Cited by 3 | Viewed by 2782
Abstract
Dolphin signals are effective carriers for underwater covert detection and communication. However, environmental and cost constraints severely limit the amount of data available in dolphin signal datasets. Meanwhile, due to the low computational power and resource sensitivity of Unmanned Underwater Vehicles (UUVs), current methods for real-time generation of dolphin signals with favorable results are still subject to several challenges. To this end, a Masked AutoEncoder Generative Adversarial Network (MAE-GAN) model is hereby proposed. First, considering the few-shot condition, the dataset is extended by using data augmentation techniques. Then, to meet the tight computational budget, a denoising autoencoder with a mask is used to obtain latent codes through self-supervised learning. These latent codes are then utilized in a Conditional Wasserstein Generative Adversarial Network-Gradient Penalty (CWGAN-GP) to generate a whistle signal model for the target dataset, fully demonstrating the effectiveness of the proposed method for enhancing dolphin signal generation in data-limited scenarios. The whistle signals generated by the MAE-GAN and baseline models are compared with actual dolphin signals, and the findings indicate that the proposed approach achieves a discriminative score of 0.074, which is 28.8% higher than that of the current state-of-the-art techniques. Furthermore, it requires only 30.2% of the computational resources of the baseline model. Overall, this paper presents a novel approach to generating high-quality dolphin signals in data-limited situations, which can also be deployed on low-resource devices. The proposed MAE-GAN method provides a promising solution to address the challenges of limited data and computational power in generating dolphin signals. Full article
(This article belongs to the Special Issue Underwater Acoustics and Digital Signal Processing)

18 pages, 6835 KB  
Article
The Development of a Low-Cost Hydrophone for Passive Acoustic Monitoring of Dolphin’s Vocalizations
by Rocco De Marco, Francesco Di Nardo, Alessandro Lucchetti, Massimo Virgili, Andrea Petetta, Daniel Li Veli, Laura Screpanti, Veronica Bartolucci and David Scaradozzi
Remote Sens. 2023, 15(7), 1946; https://doi.org/10.3390/rs15071946 - 6 Apr 2023
Cited by 18 | Viewed by 7818
Abstract
Passive acoustics are widely used to monitor the presence of dolphins in the marine environment. This study aims to introduce a low-cost and homemade approach for assembling a complete underwater microphone (i.e., the hydrophone), employing cheap and easy to obtain components. The hydrophone was assembled with two piezo disks connected in a balanced configuration and encased in a plastic container filled with plastic foam. The hydrophone’s performance was validated by direct comparison with the commercially available AS-1 hydrophone (Aquarian Hydrophones, Anacortes, U.S.) on different underwater acoustic signals: artificial acoustic signals (ramp and multitone signals) and various dolphin vocalizations (whistle, echolocation clicks, and burst pulse signals). The sensitivity of the device’s performance to changes in the emission source position was also tested. The results of the validation procedure on both artificial signals and real dolphin vocalizations showed that the significant cost savings associated with cheap technology had a minimal effect on the recording device’s performance within the frequency range of 0–35 kHz. At this stage of experimentation, the global cost of the hydrophone could be estimated at a few euros, making it extremely price competitive when compared to more expensive commercially available models. In the future, this effective and low-cost technology would allow for continuous monitoring of the presence of free-ranging dolphins, significantly lowering the total cost of autonomous monitoring systems. This would permit broadening the monitored areas and creating a network of recorders, thus improving the acquisition of data. Full article
(This article belongs to the Special Issue Remote Sensing and Other Geomatics Techniques for Marine Applications)

20 pages, 4057 KB  
Article
A Compact and Powerful Single-Stage Network for Multi-Person Pose Estimation
by Yabo Xiao, Xiaojuan Wang, Mingshu He, Lei Jin, Mei Song and Jian Zhao
Electronics 2023, 12(4), 857; https://doi.org/10.3390/electronics12040857 - 8 Feb 2023
Cited by 9 | Viewed by 3491
Abstract
Multi-person pose estimation generally follows top-down and bottom-up paradigms. The top-down paradigm detects all human boxes and then performs single-person pose estimation on each ROI. The bottom-up paradigm locates identity-free keypoints and then groups them into individuals. Both of them use an extra stage to build the relationship between human instance and corresponding keypoints (e.g., human detection in a top-down manner or a grouping process in a bottom-up manner). The extra stage leads to a high computation cost and a redundant two-stage pipeline. To address the above issue, we introduce a fine-grained body representation method. Concretely, the human body is divided into several local parts and each part is represented by an adaptive point. The novel body representation is able to sufficiently encode the diverse pose information and effectively model the relationship between human instance and corresponding keypoints in a single-forward pass. With the proposed body representation, we further introduce a compact single-stage multi-person pose regression network, called AdaptivePose++, which is the extended version of AAAI-22 paper AdaptivePose. During inference, our proposed network only needs a single-step decode operation to estimate the multi-person pose without complex post-processes and refinements. Without any bells and whistles, we achieve the most competitive performance on representative 2D pose estimation benchmarks MS COCO and CrowdPose in terms of accuracy and speed. In particular, AdaptivePose++ outperforms the state-of-the-art SWAHR-W48 and CenterGroup-W48 by 3.2 AP and 1.4 AP on COCO mini-val with faster inference speed. Furthermore, the outstanding performance on 3D pose estimation datasets MuCo-3DHP and MuPoTS-3D further demonstrates its effectiveness and generalizability on 3D scenes. Full article
(This article belongs to the Section Artificial Intelligence)

19 pages, 7842 KB  
Article
Intelligent Whistling System of Rail Train Based on YOLOv4 and U-Net
by Kai Wang, Zhonghang Zhang, Chaozhi Cai, Jianhua Ren and Nan Zhang
Appl. Sci. 2023, 13(3), 1695; https://doi.org/10.3390/app13031695 - 29 Jan 2023
Cited by 1 | Viewed by 2553
Abstract
The whistle of the rail train is usually directly controlled by the driver. However, in long-distance transportation, there is a risk of traffic accidents due to driver fatigue or distraction. In addition, the noise pollution of the train whistle has also been criticized. In order to solve the above two problems, an intelligent whistling system for railway trains based on deep learning is proposed. The system judges whether to whistle and intelligently adjusts the volume of the whistle according to the road conditions of the train. The system consists of a road condition sensing module and a whistling decision module. The former includes the target detection model based on YOLOv4 and the semantic segmentation model based on U-Net, which can extract the key information of the road conditions ahead; the latter is to carry out logical analysis of the data after the intelligent recognition and processing and make the whistling decision. Based on the train-running data set, the intelligent whistle system model is tested. The results of this research show that the whistling accuracy of the model on the test set is 99.22%, the average volume error is 1.91 dB/time, and the Frames Per Second (FPS) is 18.7 f/s. Therefore, the intelligent whistle system model proposed in this paper has high reliability and is suitable for further development and application in actual scenes. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

22 pages, 11626 KB  
Article
A Deep Learning Semantic Segmentation Method for Landslide Scene Based on Transformer Architecture
by Zhaoqiu Wang, Tao Sun, Kun Hu, Yueting Zhang, Xiaqiong Yu and Ying Li
Sustainability 2022, 14(23), 16311; https://doi.org/10.3390/su142316311 - 6 Dec 2022
Cited by 27 | Viewed by 4617
Abstract
Semantic segmentation technology based on deep learning has developed rapidly. It is widely used in remote sensing image recognition, but is rarely used in natural disaster scenes, especially in landslide disasters. After a landslide disaster occurs, it is necessary to quickly carry out rescue and ecological restoration work, using satellite data or aerial photography data to quickly analyze the landslide area. However, the precise location and area estimation of the landslide area is still a difficult problem. Therefore, we propose a deep learning semantic segmentation method based on Encoder-Decoder architecture for landslide recognition, called the Separable Channel Attention Network (SCANet). The SCANet consists of a Poolformer encoder and a Separable Channel Attention Feature Pyramid Network (SCA-FPN) decoder. Firstly, the Poolformer can extract global semantic information at different levels with the help of transformer architecture, and it greatly reduces the computational complexity of the network by using pooling operations instead of a self-attention mechanism. Secondly, the SCA-FPN we designed can fuse multi-scale semantic information and complete pixel-level prediction of remote sensing images. Without bells and whistles, our proposed SCANet outperformed mainstream semantic segmentation networks with fewer model parameters on our self-built landslide dataset. Notably, its mIoU score is 1.95% higher than that of ResNet50-Unet. Full article
