Search Results (25)

Search Parameters:
Keywords = whistle model

19 pages, 8193 KB  
Article
Numerical and Experimental Analysis of Whistling Sound Generation and Suppression in Narrow-Gap Flow of Vehicle Side-View Mirror
by Kwongi Lee, Sangheon Lee, Cheolung Cheong, Sungnam Rim and Seongryong Shin
Appl. Sci. 2026, 16(1), 31; https://doi.org/10.3390/app16010031 - 19 Dec 2025
Viewed by 812
Abstract
This study investigates the generation and suppression of the whistling noise caused by flow through the narrow gap of a vehicle’s side mirror, an aerodynamic phenomenon often reported as a source of discomfort to passengers. The research employs a simultaneous approach, combining wind tunnel experiments to determine the geometries and wind conditions at a flow speed of 22 m/s contributing to whistle generation at between 7 kHz and 8 kHz with numerical simulations utilizing compressible Large Eddy Simulation (LES) techniques for an in-depth investigation of the underlying aerodynamics. The Simplified Side-mirror Model (SSM) is developed, enabling precise wind visualization, and facilitating the identification of fundamental aerodynamic sound sources via vortex sound theory. The analysis reveals that the whistling sound is intricately linked to edge tone phenomena, driven by vortex shedding and flow instabilities at the angled shape in a narrow gap. Building on these insights, the study introduces the Suppressed Whistle Model (SWM), a configuration including shapes resembling a vortex generator that successfully mitigates the whistling by disrupting the identified flow structures causing the whistling sound. The suggested design is validated through wind visualization, comparing the numerical flow structures with the experimental ones. The experimental whistling sound pressure level of SWM decreases by about 20 dB compared to SSM, and a similar trend can be confirmed in the numerical results. Full article
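As a quick sanity check on the reported reduction, a roughly 20 dB drop in sound pressure level from SSM to SWM corresponds to about a tenfold reduction in pressure amplitude; this is standard acoustics arithmetic, not a figure taken from the paper:

```python
# Sound pressure level is defined as SPL = 20 * log10(p / p_ref),
# so a 20 dB drop means the pressure amplitude falls by a factor of 10.
delta_db = 20
pressure_ratio = 10 ** (delta_db / 20)
print(pressure_ratio)  # → 10.0
```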
(This article belongs to the Section Acoustics and Vibrations)

18 pages, 2065 KB  
Article
Phoneme-Aware Augmentation for Robust Cantonese ASR Under Low-Resource Conditions
by Lusheng Zhang, Shie Wu and Zhongxun Wang
Symmetry 2025, 17(9), 1478; https://doi.org/10.3390/sym17091478 - 8 Sep 2025
Cited by 2 | Viewed by 1631
Abstract
Cantonese automatic speech recognition (ASR) faces persistent challenges due to its nine lexical tones, extensive phonological variation, and the scarcity of professionally transcribed corpora. To address these issues, we propose a lightweight and data-efficient framework that leverages weak phonetic supervision (WPS) in conjunction with two phoneme-aware augmentation strategies. (1) Dynamic Boundary-Aligned Phoneme Dropout progressively removes entire IPA segments according to a curriculum schedule, simulating real-world phenomena such as elision, lenition, and tonal drift while ensuring training stability. (2) Phoneme-Aware SpecAugment confines all time- and frequency-masking operations within phoneme boundaries and prioritizes high-attention regions, thereby preserving intra-phonemic contours and formant integrity. Built on the Whistle encoder—which integrates a Conformer backbone, Connectionist Temporal Classification–Conditional Random Field (CTC-CRF) alignment, and a multi-lingual phonetic space—the approach requires only a grapheme-to-phoneme lexicon and Montreal Forced Aligner outputs, without any additional manual labeling. Experiments on the Cantonese subset of Common Voice demonstrate consistent gains: Dynamic Dropout alone reduces phoneme error rate (PER) from 17.8% to 16.7% with 50 h of speech and 16.4% to 15.1% with 100 h, while the combination of the two augmentations further lowers PER to 15.9%/14.4%. These results confirm that structure-aware phoneme-level perturbations provide an effective and low-cost solution for building robust Cantonese ASR systems under low-resource conditions. Full article
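The boundary-confined masking idea can be illustrated with a short sketch; the function name, parameters, and the zero-fill masking value below are illustrative choices of ours, not details taken from the paper:

```python
import random

def phoneme_aware_time_mask(spec, boundaries, max_width=5, seed=0):
    """Mask a run of time frames, confined to one phoneme segment.

    spec       -- 2D list indexed [time][freq] of spectrogram values
    boundaries -- (start, end) frame intervals, one per phoneme
    """
    rng = random.Random(seed)
    start, end = rng.choice(boundaries)       # pick one phoneme segment
    width = min(max_width, end - start)       # the mask never crosses the boundary
    t0 = rng.randint(start, end - width)
    for t in range(t0, t0 + width):
        spec[t] = [0.0] * len(spec[t])        # zero out the masked frames
    return spec, (t0, t0 + width)
```

Frequency masking would follow the same pattern along the other axis; the paper additionally biases mask placement toward high-attention regions, which this sketch omits.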
(This article belongs to the Section Computer)

27 pages, 5228 KB  
Article
Detection of Surface Defects in Steel Based on Dual-Backbone Network: MBDNet-Attention-YOLO
by Xinyu Wang, Shuhui Ma, Shiting Wu, Zhaoye Li, Jinrong Cao and Peiquan Xu
Sensors 2025, 25(15), 4817; https://doi.org/10.3390/s25154817 - 5 Aug 2025
Cited by 6 | Viewed by 2485
Abstract
Automated surface defect detection in steel manufacturing is pivotal for ensuring product quality, yet it remains an open challenge owing to the extreme heterogeneity of defect morphologies—ranging from hairline cracks and microscopic pores to elongated scratches and shallow dents. Existing approaches, whether classical vision pipelines or recent deep-learning paradigms, struggle to simultaneously satisfy the stringent demands of industrial scenarios: high accuracy on sub-millimeter flaws, insensitivity to texture-rich backgrounds, and real-time throughput on resource-constrained hardware. Although contemporary detectors have narrowed the gap, they still exhibit pronounced sensitivity–robustness trade-offs, particularly in the presence of scale-varying defects and cluttered surfaces. To address these limitations, we introduce MBY (MBDNet-Attention-YOLO), a lightweight yet powerful framework that synergistically couples the MBDNet backbone with the YOLO detection head. Specifically, the backbone embeds three novel components: (1) HGStem, a hierarchical stem block that enriches low-level representations while suppressing redundant activations; (2) Dynamic Align Fusion (DAF), an adaptive cross-scale fusion mechanism that dynamically re-weights feature contributions according to defect saliency; and (3) C2f-DWR, a depth-wise residual variant that progressively expands receptive fields without incurring prohibitive computational costs. Building upon this enriched feature hierarchy, the neck employs our proposed MultiSEAM module—a cascaded squeeze-and-excitation attention mechanism operating at multiple granularities—to harmonize fine-grained and semantic cues, thereby amplifying weak defect signals against complex textures. 
Finally, we integrate the Inner-SIoU loss, which refines the geometric alignment between predicted and ground-truth boxes by jointly optimizing center distance, aspect ratio consistency, and IoU overlap, leading to faster convergence and tighter localization. Extensive experiments on two publicly available steel-defect benchmarks—NEU-DET and PVEL-AD—demonstrate the superiority of MBY. Without bells and whistles, our model achieves 85.8% mAP@0.5 on NEU-DET and 75.9% mAP@0.5 on PVEL-AD, surpassing the best-reported results by significant margins while maintaining real-time inference on an NVIDIA Jetson Xavier. Ablation studies corroborate the complementary roles of each component, underscoring MBY’s robustness across defect scales and surface conditions. These results suggest that MBY strikes an appealing balance between accuracy, efficiency, and deployability, offering a pragmatic solution for next-generation industrial quality-control systems. Full article
(This article belongs to the Section Sensing and Imaging)

18 pages, 1837 KB  
Article
Real-Time Dolphin Whistle Detection on Raspberry Pi Zero 2 W with a TFLite Convolutional Neural Network
by Rocco De Marco, Francesco Di Nardo, Alessandro Rongoni, Laura Screpanti and David Scaradozzi
Robotics 2025, 14(5), 67; https://doi.org/10.3390/robotics14050067 - 19 May 2025
Cited by 3 | Viewed by 3057
Abstract
The escalating conflict between cetaceans and fisheries underscores the need for efficient mitigation strategies that balance conservation priorities with economic viability. This study presents a TinyML-driven approach deploying an optimized Convolutional Neural Network (CNN) on a Raspberry Pi Zero 2 W for real-time detection of bottlenose dolphin whistles, leveraging spectrogram analysis to address acoustic monitoring challenges. Specifically, a CNN model previously developed for classifying dolphins’ vocalizations and originally implemented with TensorFlow was converted to TensorFlow Lite (TFLite) with architectural optimizations, reducing the model size by 76%. Both TensorFlow and TFLite models were trained on 22 h of underwater recordings taken in controlled environments and processed into 0.8 s spectrogram segments (300 × 150 pixels). Despite reducing model size, TFLite models maintained the same accuracy as the original TensorFlow model (87.8% vs. 87.0%). Throughput and latency were evaluated by varying the thread allocation (1–8 threads), revealing the best performance at 4 threads (quad-core alignment), achieving an inference latency of 120 ms and sustained throughput of 8 spectrograms/second. The system demonstrated robustness in 120 h of continuous stress tests without failure, underscoring its reliability in marine environments. This work achieved a critical balance between computational efficiency and detection fidelity (F1-score: 86.9%) by leveraging quantized, multithreaded inference. These advancements enable low-cost devices for real-time cetacean presence detection, offering transformative potential for bycatch reduction and adaptive deterrence systems. This study bridges artificial intelligence innovation with ecological stewardship, providing a scalable framework for deploying machine learning in resource-constrained settings while addressing urgent conservation challenges. Full article
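The scale of the training data is easy to reconstruct from the numbers quoted above; assuming non-overlapping windows (the hop size is not stated here), 22 h of audio cut into 0.8 s segments yields:

```python
# 22 h of recordings cut into 0.8 s spectrogram segments.
# Non-overlapping windows are assumed; the actual hop size may differ.
total_seconds = 22 * 3600          # 79200 s
segment_seconds = 0.8
n_segments = round(total_seconds / segment_seconds)
print(n_segments)  # → 99000
```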
(This article belongs to the Section Sensors and Control in Robotics)

18 pages, 6601 KB  
Article
Dolphin Health Classifications from Whistle Features
by Brittany Jones, Jessica Sportelli, Jeremy Karnowski, Abby McClain, David Cardoso and Maximilian Du
J. Mar. Sci. Eng. 2024, 12(12), 2158; https://doi.org/10.3390/jmse12122158 - 26 Nov 2024
Cited by 4 | Viewed by 3847
Abstract
Bottlenose dolphins often conceal behavioral signs of illness until they reach an advanced stage. Motivated by the efficacy of vocal biomarkers in human health diagnostics, we utilized supervised machine learning methods to assess various model architectures’ effectiveness in classifying dolphin health status from the acoustic features of their whistles. A gradient boosting classifier achieved a 72.3% accuracy in distinguishing between normal and abnormal health states—a significant improvement over chance (permutation test; 1000 iterations, p < 0.001). The model was trained on 30,693 whistles from 15 dolphins and the test set (15%) totaled 3612 ‘normal’ and 1775 ‘abnormal’ whistles. The classifier identified the health status of the dolphin from the whistles features with 72.3% accuracy, 73.2% recall, 56.1% precision, and a 63.5% F1 score. These findings suggest the encoding of internal health information within dolphin whistle features, with indications that the severity of illness correlates with classification accuracy, notably in its success for identifying ‘critical’ cases (94.2%). The successful development of this diagnostic tool holds promise for furnishing a passive, non-invasive, and cost-effective means for early disease detection in bottlenose dolphins. Full article
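The reported F1 score can be reproduced directly from the stated precision and recall, since F1 is their harmonic mean:

```python
# Precision and recall reported for the gradient boosting classifier.
precision = 0.561
recall = 0.732

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1 * 100, 1))  # → 63.5
```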
(This article belongs to the Special Issue Recent Advances in Marine Bioacoustics)

19 pages, 755 KB  
Article
Post-Quantum Secure ID-Based (Threshold) Linkable Dual-Ring Signature and Its Application in Blockchain Transactions
by Wen Gao, Haoyuan Yao, Baodong Qin, Xiaoli Dong, Zhen Zhao and Jiayu Zeng
Cryptography 2024, 8(4), 48; https://doi.org/10.3390/cryptography8040048 - 28 Oct 2024
Cited by 4 | Viewed by 6942
Abstract
Ring signatures are widely used in e-voting, anonymous whistle-blowing systems, and blockchain transactions. However, due to the anonymity of ring signatures, a signer can sign the same message multiple times, potentially leading to repeated voting or double spending in blockchain transactions. To address these issues in blockchain transactions, this work constructs an identity-based linkable ring signature scheme based on the hardness of the lattice-based Module Small Integer Solution (M-SIS) assumption, which is hard even for quantum attackers. The proposed scheme is proven to be anonymous, unforgeable, linkable, and nonslanderable in the random oracle model. Compared to existing identity-based linkable ring signature (IBLRS) schemes of linear size, our signature size is relatively smaller, and this advantage is more pronounced when the number of ring members is small. We provide approximate signature size data for ring members ranging from 2 to 2048. When the number of ring members is 16 (or 512, respectively), the signature size of our scheme is 11.40 KB (or 24.68 KB, respectively). Finally, a threshold extension is given as an additional scheme with specifications and security analysis. Full article

22 pages, 9907 KB  
Article
An Automatic Deep Learning Bowhead Whale Whistle Recognizing Method Based on Adaptive SWT: Applying to the Beaufort Sea
by Rui Feng, Jian Xu, Kangkang Jin, Luochuan Xu, Yi Liu, Dan Chen and Linglong Chen
Remote Sens. 2023, 15(22), 5346; https://doi.org/10.3390/rs15225346 - 13 Nov 2023
Cited by 4 | Viewed by 2555
Abstract
The bowhead whale is a vital component of the maritime environment. Using deep learning techniques to recognize bowhead whales accurately and efficiently is crucial for their protection. Marine acoustic remote sensing technology is currently an important method to recognize bowhead whales. Adaptive SWT is used to extract the acoustic features of bowhead whales. The CNN-LSTM deep learning model was constructed to recognize bowhead whale voices. Compared to STFT, the adaptive SWT used in this study raises the SCR for the stationary and nonstationary bowhead whale whistles by 88.20% and 92.05%, respectively. Ten-fold cross-validation yields an average recognition accuracy of 92.85%. The method efficiency of this work was further confirmed by the consistency found in the Beaufort Sea recognition results and the fisheries ecological study. The research results in this paper help promote the application of marine acoustic remote sensing technology and the conservation of bowhead whales. Full article
(This article belongs to the Special Issue Advanced Techniques for Water-Related Remote Sensing)

15 pages, 5638 KB  
Article
Underwater Biomimetic Covert Acoustic Communications Mimicking Multiple Dolphin Whistles
by Yongcheol Kim, Hojun Lee, Seunghwan Seol, Bonggyu Park and Jaehak Chung
Electronics 2023, 12(19), 3999; https://doi.org/10.3390/electronics12193999 - 22 Sep 2023
Cited by 5 | Viewed by 2599
Abstract
This paper presents an underwater biomimetic covert acoustic communication system that achieves high covertness and a high data rate by mimicking dolphin group whistles. The proposed method uses combined time–frequency shift keying modulation with continuous varying carrier frequency modulation, which mitigates the interference between two overlapping multiple whistles while maintaining a high data rate. The data rate and bit error rate (BER) performance of the proposed method were compared with conventional underwater covert communication through an additive white Gaussian noise channel, a modeled underwater channel, and practical ocean experiments. For the covertness test, the similarity of the proposed multiple whistles to real dolphin group whistles was assessed using a mean opinion score test. As a result, the proposed method demonstrated a higher data rate, better BER performance, and strong covertness, with its synthesized whistles closely resembling real dolphin group whistles. Full article
(This article belongs to the Special Issue New Advances in Underwater Communication Systems)

25 pages, 1672 KB  
Article
Drone-Based Environmental Emergency Response in the Brazilian Amazon
by Janiele Custodio and Hernan Abeledo
Drones 2023, 7(9), 554; https://doi.org/10.3390/drones7090554 - 27 Aug 2023
Cited by 4 | Viewed by 5348
Abstract
This paper introduces a location–allocation model to support environmental emergency response strategic planning using a drone-based network. Drones are used to verify potential emergencies, gathering additional information to support emergency response missions when time and resources are limited. The resulting discrete facility location–allocation model with mobile servers assumes a centralized network operated out of sight by first responders and government agents. The optimization problem seeks to find the minimal cost configuration that meets operational constraints and performance objectives. To test the practical applicability of the proposed model, a real-life case study was implemented for the municipality of Ji-Paraná, in the Brazilian Amazon, using demand data from a mobile whistle-blower application and from satellite imagery projects that monitor deforestation and fire incidents in the region. Experiments are performed to understand the model’s sensitivity to various demand scenarios and capacity restrictions. Full article
(This article belongs to the Special Issue UAV IoT Sensing and Networking)

17 pages, 4453 KB  
Article
Interactive Attention Learning on Detection of Lane and Lane Marking on the Road by Monocular Camera Image
by Wei Tian, Xianwang Yu and Haohao Hu
Sensors 2023, 23(14), 6545; https://doi.org/10.3390/s23146545 - 20 Jul 2023
Cited by 11 | Viewed by 5199
Abstract
Vision-based identification of lane area and lane marking on the road is an indispensable function for intelligent driving vehicles, especially for localization, mapping and planning tasks. However, due to the increasing complexity of traffic scenes, such as occlusion and discontinuity, detecting lanes and lane markings from an image captured by a monocular camera becomes persistently challenging. The lanes and lane markings have a strong position correlation and are constrained by a spatial geometry prior to the driving scene. Most existing studies only explore a single task, i.e., either lane marking or lane detection, and do not consider the inherent connection or exploit the modeling of this kind of relationship between both elements to improve the detection performance of both tasks. In this paper, we establish a novel multi-task encoder–decoder framework for the simultaneous detection of lanes and lane markings. This approach deploys a dual-branch architecture to extract image information from different scales. By revealing the spatial constraints between lanes and lane markings, we propose an interactive attention learning for their feature information, which involves a Deformable Feature Fusion module for feature encoding, a Cross-Context module as information decoder, a Cross-IoU loss and a Focal-style loss weighting for robust training. Without bells and whistles, our method achieves state-of-the-art results on tasks of lane marking detection (with 32.53% on IoU, 81.61% on accuracy) and lane segmentation (with 91.72% on mIoU) of the BDD100K dataset, which showcases an improvement of 6.33% on IoU, 11.11% on accuracy in lane marking detection and 0.22% on mIoU in lane detection compared to the previous methods. Full article
(This article belongs to the Section Vehicular Sensing)

18 pages, 4217 KB  
Article
Low-Resource Generation Method for Few-Shot Dolphin Whistle Signal Based on Generative Adversarial Network
by Huiyuan Wang, Xiaojun Wu, Zirui Wang, Yukun Hao, Chengpeng Hao, Xinyi He and Qiao Hu
J. Mar. Sci. Eng. 2023, 11(5), 1086; https://doi.org/10.3390/jmse11051086 - 22 May 2023
Cited by 3 | Viewed by 2782
Abstract
Dolphin signals are effective carriers for underwater covert detection and communication. However, environmental and cost constraints severely limit the amount of data available in dolphin signal datasets. Meanwhile, due to the low computational power and resource sensitivity of Unmanned Underwater Vehicles (UUVs), current methods for real-time generation of dolphin signals with favorable results are still subject to several challenges. To this end, a Masked AutoEncoder Generative Adversarial Network (MAE-GAN) model is hereby proposed. First, considering the few-shot condition, the dataset is extended by using data augmentation techniques. Then, to meet the tight computational budget, a denoising autoencoder with a mask is used to obtain latent codes through self-supervised learning. These latent codes are then utilized in a Conditional Wasserstein Generative Adversarial Network-Gradient Penalty (CWGAN-GP) to generate a whistle signal model for the target dataset, fully demonstrating the effectiveness of the proposed method for enhancing dolphin signal generation in data-limited scenarios. The whistle signals generated by the MAE-GAN and baseline models are compared with actual dolphin signals, and the findings indicate that the proposed approach achieves a discriminative score of 0.074, which is 28.8% higher than that of the current state-of-the-art techniques. Furthermore, it requires only 30.2% of the computational resources of the baseline model. Overall, this paper presents a novel approach to generating high-quality dolphin signals in data-limited situations, which can also be deployed on low-resource devices. The proposed MAE-GAN method provides a promising solution to address the challenges of limited data and computational power in generating dolphin signals. Full article
(This article belongs to the Special Issue Underwater Acoustics and Digital Signal Processing)

18 pages, 6835 KB  
Article
The Development of a Low-Cost Hydrophone for Passive Acoustic Monitoring of Dolphin’s Vocalizations
by Rocco De Marco, Francesco Di Nardo, Alessandro Lucchetti, Massimo Virgili, Andrea Petetta, Daniel Li Veli, Laura Screpanti, Veronica Bartolucci and David Scaradozzi
Remote Sens. 2023, 15(7), 1946; https://doi.org/10.3390/rs15071946 - 6 Apr 2023
Cited by 18 | Viewed by 7818
Abstract
Passive acoustics are widely used to monitor the presence of dolphins in the marine environment. This study aims to introduce a low-cost and homemade approach for assembling a complete underwater microphone (i.e., the hydrophone), employing cheap and easy to obtain components. The hydrophone was assembled with two piezo disks connected in a balanced configuration and encased in a plastic container filled with plastic foam. The hydrophone’s performance was validated by direct comparison with the commercially available AS-1 hydrophone (Aquarian Hydrophones, Anacortes, U.S.) on different underwater acoustic signals: artificial acoustic signals (ramp and multitone signals) and various dolphin vocalizations (whistle, echolocation clicks, and burst pulse signals). The sensitivity of the device’s performance to changes in the emission source position was also tested. The results of the validation procedure on both artificial signals and real dolphin vocalizations showed that the significant cost savings associated with cheap technology had a minimal effect on the recording device’s performance within the frequency range of 0–35 kHz. At this stage of experimentation, the global cost of the hydrophone could be estimated at a few euros, making it extremely price competitive when compared to more expensive commercially available models. In the future, this effective and low-cost technology would allow for continuous monitoring of the presence of free-ranging dolphins, significantly lowering the total cost of autonomous monitoring systems. This would permit broadening the monitored areas and creating a network of recorders, thus improving the acquisition of data. Full article
(This article belongs to the Special Issue Remote Sensing and Other Geomatics Techniques for Marine Applications)

20 pages, 4057 KB  
Article
A Compact and Powerful Single-Stage Network for Multi-Person Pose Estimation
by Yabo Xiao, Xiaojuan Wang, Mingshu He, Lei Jin, Mei Song and Jian Zhao
Electronics 2023, 12(4), 857; https://doi.org/10.3390/electronics12040857 - 8 Feb 2023
Cited by 9 | Viewed by 3491
Abstract
Multi-person pose estimation generally follows top-down and bottom-up paradigms. The top-down paradigm detects all human boxes and then performs single-person pose estimation on each ROI. The bottom-up paradigm locates identity-free keypoints and then groups them into individuals. Both of them use an extra stage to build the relationship between human instance and corresponding keypoints (e.g., human detection in a top-down manner or a grouping process in a bottom-up manner). The extra stage leads to a high computation cost and a redundant two-stage pipeline. To address the above issue, we introduce a fine-grained body representation method. Concretely, the human body is divided into several local parts and each part is represented by an adaptive point. The novel body representation is able to sufficiently encode the diverse pose information and effectively model the relationship between human instance and corresponding keypoints in a single-forward pass. With the proposed body representation, we further introduce a compact single-stage multi-person pose regression network, called AdaptivePose++, which is the extended version of AAAI-22 paper AdaptivePose. During inference, our proposed network only needs a single-step decode operation to estimate the multi-person pose without complex post-processes and refinements. Without any bells and whistles, we achieve the most competitive performance on representative 2D pose estimation benchmarks MS COCO and CrowdPose in terms of accuracy and speed. In particular, AdaptivePose++ outperforms the state-of-the-art SWAHR-W48 and CenterGroup-W48 by 3.2 AP and 1.4 AP on COCO mini-val with faster inference speed. Furthermore, the outstanding performance on 3D pose estimation datasets MuCo-3DHP and MuPoTS-3D further demonstrates its effectiveness and generalizability on 3D scenes. Full article
(This article belongs to the Section Artificial Intelligence)

19 pages, 7842 KB  
Article
Intelligent Whistling System of Rail Train Based on YOLOv4 and U-Net
by Kai Wang, Zhonghang Zhang, Chaozhi Cai, Jianhua Ren and Nan Zhang
Appl. Sci. 2023, 13(3), 1695; https://doi.org/10.3390/app13031695 - 29 Jan 2023
Cited by 1 | Viewed by 2553
Abstract
The whistle of the rail train is usually directly controlled by the driver. However, in long-distance transportation, there is a risk of traffic accidents due to driver fatigue or distraction. In addition, the noise pollution of the train whistle has also been criticized. In order to solve the above two problems, an intelligent whistling system for railway trains based on deep learning is proposed. The system judges whether to whistle and intelligently adjusts the volume of the whistle according to the road conditions of the train. The system consists of a road condition sensing module and a whistling decision module. The former includes the target detection model based on YOLOv4 and the semantic segmentation model based on U-Net, which can extract the key information of the road conditions ahead; the latter is to carry out logical analysis of the data after the intelligent recognition and processing and make the whistling decision. Based on the train-running data set, the intelligent whistle system model is tested. The results of this research show that the whistling accuracy of the model on the test set is 99.22%, the average volume error is 1.91 dB/time, and the Frames Per Second (FPS) is 18.7 f/s. Therefore, the intelligent whistle system model proposed in this paper has high reliability and is suitable for further development and application in actual scenes. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

22 pages, 11626 KB  
Article
A Deep Learning Semantic Segmentation Method for Landslide Scene Based on Transformer Architecture
by Zhaoqiu Wang, Tao Sun, Kun Hu, Yueting Zhang, Xiaqiong Yu and Ying Li
Sustainability 2022, 14(23), 16311; https://doi.org/10.3390/su142316311 - 6 Dec 2022
Cited by 27 | Viewed by 4617
Abstract
Semantic segmentation technology based on deep learning has developed rapidly. It is widely used in remote sensing image recognition, but is rarely used in natural disaster scenes, especially in landslide disasters. After a landslide disaster occurs, it is necessary to quickly carry out rescue and ecological restoration work, using satellite data or aerial photography data to quickly analyze the landslide area. However, the precise location and area estimation of the landslide area is still a difficult problem. Therefore, we propose a deep learning semantic segmentation method based on Encoder-Decoder architecture for landslide recognition, called the Separable Channel Attention Network (SCANet). The SCANet consists of a Poolformer encoder and a Separable Channel Attention Feature Pyramid Network (SCA-FPN) decoder. Firstly, the Poolformer can extract global semantic information at different levels with the help of transformer architecture, and it greatly reduces the computational complexity of the network by using pooling operations instead of a self-attention mechanism. Secondly, the SCA-FPN we designed can fuse multi-scale semantic information and complete pixel-level prediction of remote sensing images. Without bells and whistles, our proposed SCANet outperformed mainstream semantic segmentation networks with fewer model parameters on our self-built landslide dataset. Notably, its mIoU score is 1.95% higher than that of ResNet50-Unet. Full article
