Search Results (44)

Search Parameters:
Keywords = audio surveillance

25 pages, 1155 KiB  
Article
A Framework for Bluetooth-Based Real-Time Audio Data Acquisition in Mobile Robotics
by Sandeep Gupta, Udit Mamodiya, A. K. M. Zakir Hossain and Ahmed J. A. Al-Gburi
Signals 2025, 6(3), 31; https://doi.org/10.3390/signals6030031 - 2 Jul 2025
Viewed by 610
Abstract
This paper presents a novel framework addressing the fundamental challenge of concurrent real-time audio acquisition and motor control in resource-constrained mobile robotics. The ESP32-based system integrates a digital MEMS microphone with rover mobility through a unified Bluetooth protocol. Key innovations include (1) a dual-thread architecture enabling non-blocking concurrent operation, (2) an adaptive eight-bit compression algorithm optimizing bandwidth while preserving audio quality, and (3) a mathematical model for real-time resource allocation. A comprehensive empirical evaluation demonstrates consistent control latency below 150 ms with 90–95% audio packet delivery rates across varied environments. The framework enables mobile acoustic sensing applications while maintaining responsive motor control, validated through comprehensive testing in 40–85 dB acoustic environments at distances up to 10 m. A performance analysis demonstrates the feasibility of high-fidelity mobile acoustic sensing on embedded platforms, opening new possibilities for environmental monitoring, surveillance, and autonomous acoustic exploration systems. Full article
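The abstract does not spell out the compression scheme, so the sketch below is only a plausible illustration: block-adaptive scaling of 16-bit PCM down to 8 bits before Bluetooth transmission, with the per-block shift sent alongside the payload. Block size, names, and values are hypothetical.

```python
import numpy as np

def compress_block(samples_16bit: np.ndarray) -> tuple[int, np.ndarray]:
    """Block-adaptive 8-bit compression: right-shift each block just enough
    for its peak to fit into signed 8 bits, and send the shift with the payload."""
    peak = int(np.max(np.abs(samples_16bit.astype(np.int32)))) or 1
    shift = int(np.ceil(np.log2(peak / 127.0))) if peak > 127 else 0
    payload = (samples_16bit.astype(np.int32) >> shift).astype(np.int8)
    return shift, payload

def decompress_block(shift: int, payload: np.ndarray) -> np.ndarray:
    """Approximate reconstruction of the original 16-bit samples."""
    return (payload.astype(np.int32) << shift).astype(np.int16)

# Example: one 256-sample block from an I2S MEMS microphone (synthetic values).
block = (np.sin(np.linspace(0, 8 * np.pi, 256)) * 12000).astype(np.int16)
shift, packed = compress_block(block)
restored = decompress_block(shift, packed)
print(shift, packed.nbytes, int(np.max(np.abs(block - restored))))
```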

16 pages, 1093 KiB  
Article
A Lightweight Framework for Audio-Visual Segmentation with an Audio-Guided Space–Time Memory Network
by Yunpeng Zuo and Yunwei Zhang
Appl. Sci. 2025, 15(12), 6585; https://doi.org/10.3390/app15126585 - 11 Jun 2025
Viewed by 519
Abstract
As a multimodal fusion task, audio-visual segmentation (AVS) aims to locate sounding objects at the pixel level within a given image. This capability holds significant importance and practical value in applications such as intelligent surveillance, multimedia content analysis, and human–robot interaction. However, existing AVS models typically feature complex architectures, require a large number of parameters, and are challenging to deploy on embedded platforms. Furthermore, these models often lack integration with object tracking mechanisms and fail to address the issue of the mis-segmentation of unvoiced objects caused by environmental noise in real-world scenarios. To address these challenges, this research proposes a lightweight audio-visual segmentation framework incorporating an audio-guided space–time memory network (AG-STMNet). First, a mask generator with a scoring mechanism was developed to identify sounding objects from generated masks. This component integrates Fastsam, a lightweight, pre-trained, object-aware segmentation model, with WAV2CLIP, a parameter-efficient audio-visual alignment model. Subsequently, AG-STMNet, an audio-guided video object segmentation network, was introduced to track sounding objects using video object segmentation techniques while mitigating environmental noise. Finally, the mask generator and AG-STMNet were combined to form the complete framework. The experimental results demonstrate that the framework achieves a mean Intersection over Union (mIoU) score of 41.5, indicating its potential as a viable lightweight solution for practical applications. Full article
(This article belongs to the Special Issue Artificial Intelligence and Its Application in Robotics)
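The scoring mechanism itself is not detailed in the abstract; the sketch below only illustrates the general idea of ranking class-agnostic candidate masks by their similarity to an audio embedding in a shared audio-visual space (the role WAV2CLIP plays here). The embedding size and threshold are placeholders, not values from the paper.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def score_masks(audio_emb: np.ndarray, region_embs: list[np.ndarray],
                threshold: float = 0.2) -> list[tuple[int, float]]:
    """Rank candidate masks by audio-visual similarity and drop low-scoring ones.

    audio_emb:   embedding of the audio clip in a shared audio-visual space.
    region_embs: one visual embedding per candidate mask region, same space.
    """
    scores = [(i, cosine(audio_emb, e)) for i, e in enumerate(region_embs)]
    kept = [(i, s) for i, s in scores if s >= threshold]
    return sorted(kept, key=lambda t: t[1], reverse=True)

# Toy usage with random embeddings standing in for real model outputs.
rng = np.random.default_rng(0)
audio_emb = rng.normal(size=512)
region_embs = [rng.normal(size=512) for _ in range(5)]
print(score_masks(audio_emb, region_embs, threshold=-1.0))
```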

28 pages, 13595 KiB  
Article
Open-Set Recognition of Environmental Sound Based on KDE-GAN and Attractor–Reciprocal Point Learning
by Jiakuan Wu, Nan Wang, Huajie Hong, Wei Wang, Kunsheng Xing and Yujie Jiang
Acoustics 2025, 7(2), 33; https://doi.org/10.3390/acoustics7020033 - 28 May 2025
Viewed by 739
Abstract
While open-set recognition algorithms have been extensively explored in computer vision, their application to environmental sound analysis remains understudied. To address this gap, this study investigates how to effectively recognize unknown sound categories in real-world environments by proposing a novel Kernel Density Estimation-based Generative Adversarial Network (KDE-GAN) for data augmentation combined with Attractor–Reciprocal Point Learning for open-set classification. Specifically, our approach addresses three key challenges: (1) How to generate boundary-aware synthetic samples for robust open-set training: A closed-set classifier’s pre-logit layer outputs are fed into the KDE-GAN, which synthesizes samples mapped to the logit layer using the classifier’s original weights. Kernel Density Estimation then enforces Density Loss and Offset Loss to ensure these samples align with class boundaries. (2) How to optimize feature space organization: The closed-set classifier is constrained by an Attractor–Reciprocal Point joint loss, maintaining intra-class compactness while pushing unknown samples toward low-density regions. (3) How to evaluate performance in highly open scenarios: We validate the method using UrbanSound8K, AudioEventDataset, and TUT Acoustic Scenes 2017 as closed sets, with ESC-50 categories as open-set samples, achieving AUROC/OSCR scores of 0.9251/0.8743, 0.7921/0.7135, and 0.8209/0.6262, respectively. The findings demonstrate the potential of this framework to enhance environmental sound monitoring systems, particularly in applications requiring adaptability to unseen acoustic events (e.g., urban noise surveillance or wildlife monitoring). Full article
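The full KDE-GAN and attractor-reciprocal-point training is more involved than the abstract can convey; as a minimal, related illustration, the sketch below fits one kernel density estimate per known class on penultimate-layer features and rejects test samples whose best log-density falls below a threshold. Feature dimensions, bandwidth, and the threshold are made up for the toy example.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_class_densities(feats: np.ndarray, labels: np.ndarray, bandwidth: float = 0.5):
    """Fit one KDE per known class on penultimate-layer features."""
    return {c: KernelDensity(bandwidth=bandwidth).fit(feats[labels == c])
            for c in np.unique(labels)}

def predict_open_set(kdes: dict, feats: np.ndarray, reject_thresh: float):
    """Return the best-matching known class, or -1 ('unknown') when the
    highest per-class log-density falls below the rejection threshold."""
    log_dens = np.stack([kde.score_samples(feats) for kde in kdes.values()], axis=1)
    classes = np.array(list(kdes.keys()))
    preds = classes[log_dens.argmax(axis=1)]
    preds[log_dens.max(axis=1) < reject_thresh] = -1
    return preds

# Toy example: 2-D features for two known classes plus far-away "unknown" points.
rng = np.random.default_rng(1)
known = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
unknown = rng.normal(20, 1, (10, 2))
kdes = fit_class_densities(known, labels)
print(predict_open_set(kdes, unknown, reject_thresh=-10.0))
```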

26 pages, 7054 KiB  
Article
An Ensemble of Convolutional Neural Networks for Sound Event Detection
by Abdinabi Mukhamadiyev, Ilyos Khujayarov, Dilorom Nabieva and Jinsoo Cho
Mathematics 2025, 13(9), 1502; https://doi.org/10.3390/math13091502 - 1 May 2025
Viewed by 1093
Abstract
Sound event detection tasks are rapidly advancing in the field of pattern recognition, and deep learning methods are particularly well suited for such tasks. One of the important directions in this field is to detect the sounds of emotional events around residential buildings in smart cities and quickly assess the situation for security purposes. This research presents a comprehensive study of an ensemble convolutional recurrent neural network (CRNN) model designed for sound event detection (SED) in residential and public safety contexts. The work focuses on extracting meaningful features from audio signals using image-based representations, such as Discrete Cosine Transform (DCT) spectrograms, cochleagrams, and Mel spectrograms, to enhance robustness against noise and improve feature extraction. In collaboration with police officers, a two-hour dataset consisting of 112 clips covering four classes of emotional sounds (harassment, quarrels, screams, and breaking sounds) was prepared. In addition to the crowdsourced dataset, publicly available datasets were used to broaden the study’s applicability. Our dataset contains 5055 strongly labeled audio files of different lengths, totaling 14.14 h, across 13 separate sound categories. The proposed CRNN model integrates spatial and temporal feature extraction by processing these spectrograms through convolutional and bi-directional gated recurrent unit (GRU) layers. An ensemble approach combines predictions from three models, achieving F1 scores of 71.5% for segment-based metrics and 46% for event-based metrics. The results demonstrate the model’s effectiveness in detecting sound events under noisy conditions, even with a small, unbalanced dataset. This research highlights the potential of the model for real-time audio surveillance systems running on mini-computers, offering cost-effective and accurate solutions for maintaining public order. Full article
(This article belongs to the Special Issue Advanced Machine Vision with Mathematics)
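A minimal sketch of the convolution-plus-bidirectional-GRU pattern the abstract describes, with frame-wise sigmoid outputs for 13 sound categories; the layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal conv + bidirectional-GRU sound event detector (illustrative sizes)."""
    def __init__(self, n_mels: int = 64, n_classes: int = 13):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                      # pool frequency, keep time
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(64 * (n_mels // 4), 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, n_mels, n_frames) -> frame-wise class probabilities
        x = self.conv(spec)                            # (B, C, F', T)
        x = x.permute(0, 3, 1, 2).flatten(2)           # (B, T, C*F')
        x, _ = self.gru(x)
        return torch.sigmoid(self.head(x))             # (B, T, n_classes)

model = CRNN()
dummy = torch.randn(2, 1, 64, 200)                     # e.g. a Mel spectrogram batch
print(model(dummy).shape)                              # torch.Size([2, 200, 13])
```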

24 pages, 2941 KiB  
Article
Real-Time Acoustic Detection of Critical Incidents in Smart Cities Using Artificial Intelligence and Edge Networks
by Ioannis Saradopoulos, Ilyas Potamitis, Stavros Ntalampiras, Iraklis Rigakis, Charalampos Manifavas and Antonios Konstantaras
Sensors 2025, 25(8), 2597; https://doi.org/10.3390/s25082597 - 20 Apr 2025
Viewed by 1218
Abstract
We present a system that integrates diverse technologies to achieve real-time, distributed audio surveillance. The system employs a network of microphones mounted on ESP32 platforms, which transmit compressed audio chunks via an MQTT protocol to Raspberry Pi5 devices for acoustic classification. These devices host an audio transformer model trained on the AudioSet dataset, enabling the real-time classification and timestamping of audio events with high accuracy. The output of the transformer is kept in a database of events and is subsequently converted into JSON format. The latter is further parsed into a graph structure that encapsulates the annotated soundscape, providing a rich and dynamic representation of audio environments. These graphs are subsequently traversed and analyzed using dedicated Python code and large language models (LLMs), enabling the system to answer complex queries about the nature, relationships, and context of detected audio events. We introduce a novel graph parsing method that achieves low false-alarm rates. In the task of analyzing the audio from a 1 h and 40 min long movie featuring hazardous driving practices, our approach achieved an accuracy of 0.882, precision of 0.8, recall of 1.0, and an F1 score of 0.89. By combining the robustness of distributed sensing and the precision of transformer-based audio classification, our approach that treats audio as text paves the way for advanced applications in acoustic surveillance, environmental monitoring, and beyond. Full article
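As a rough illustration of the transport layer described here, the sketch below subscribes to audio chunks over MQTT with paho-mqtt and emits one JSON event per chunk; the broker address, topic layout, and the trivial stand-in classifier are placeholders for the transformer model used in the paper.

```python
import json
import numpy as np
import paho.mqtt.client as mqtt

BROKER = "192.168.1.10"      # hypothetical broker address
TOPIC = "sensors/+/audio"    # hypothetical topic layout: one node ID per subtopic

def classify(chunk: np.ndarray) -> str:
    """Placeholder for the transformer-based tagger described in the paper."""
    return "loud_event" if np.abs(chunk).mean() > 500 else "background"

def on_message(client, userdata, msg):
    # Each payload is assumed to be a raw block of 16-bit PCM samples.
    chunk = np.frombuffer(msg.payload, dtype=np.int16)
    event = {
        "node": msg.topic.split("/")[1],
        "label": classify(chunk),
        "n_samples": int(chunk.size),
    }
    print(json.dumps(event))  # in the real system this feeds the event database

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt >= 2.0
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(TOPIC)
client.loop_forever()
```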

18 pages, 3228 KiB  
Article
Automatic Detection and Unsupervised Clustering-Based Classification of Cetacean Vocal Signals
by Yinian Liang, Yan Wang, Fangjiong Chen, Hua Yu, Fei Ji and Yankun Chen
Appl. Sci. 2025, 15(7), 3585; https://doi.org/10.3390/app15073585 - 25 Mar 2025
Cited by 1 | Viewed by 549
Abstract
In the ocean environment, passive acoustic monitoring (PAM) is an important technique for the surveillance of cetacean species. Manual detection over large amounts of PAM data is inefficient and time-consuming. To extract useful features from large amounts of PAM data for classifying different cetacean species, we propose an automatic detection and unsupervised clustering-based classification method for cetacean vocal signals. This paper overcomes the limitations of the traditional threshold-based method by setting the threshold adaptively according to the mean signal energy in each frame. Furthermore, we address the high cost of training and labeling data in deep-learning-based methods by using an unsupervised clustering-based classification method. Firstly, the automatic detection method extracts vocal signals from PAM data and, at the same time, removes clutter information. Then, the vocal signals are analyzed for classification using a clustering algorithm. This method captures the acoustic characteristics of vocal signals and distinguishes them from environmental noise. We process 194 audio files totaling 25.3 h of vocal signals from two public marine mammal databases. Five kinds of vocal signals from different cetaceans are extracted and assembled into 8 datasets for classification. Verification experiments were conducted on four clustering algorithms using two performance metrics, and the results confirm the effectiveness of the proposed method. The proposed method automatically removes about 75% of the clutter data from 1581.3 MB of audio files and extracts 75.75 MB of detected features. Four classical unsupervised clustering algorithms run on the resulting datasets obtain an average accuracy of 84.83%. Full article
(This article belongs to the Special Issue Machine Learning in Acoustic Signal Processing)
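The exact detection rule is not given in the abstract; a minimal sketch of an adaptive energy threshold of the kind described (a threshold proportional to the mean per-frame energy of the recording) might look as follows, with the frame length and scaling factor chosen arbitrarily.

```python
import numpy as np

def detect_vocal_frames(x: np.ndarray, sr: int, frame_ms: float = 20.0,
                        alpha: float = 2.0) -> np.ndarray:
    """Flag frames whose short-time energy exceeds alpha times the mean
    frame energy of the recording (an adaptive, recording-specific threshold)."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames.astype(np.float64) ** 2).mean(axis=1)
    threshold = alpha * energy.mean()
    return energy > threshold          # boolean mask, one entry per frame

# Toy usage: background noise with a short, louder "call" in the middle.
sr = 16000
x = np.random.default_rng(2).normal(0, 0.01, sr * 2)
x[sr // 2 : sr // 2 + 4000] += 0.2 * np.sin(2 * np.pi * 900 * np.arange(4000) / sr)
print(np.flatnonzero(detect_vocal_frames(x, sr)))
```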

25 pages, 10241 KiB  
Article
Machine Learning-Based Acoustic Analysis of Stingless Bee (Heterotrigona itama) Alarm Signals During Intruder Events
by Ashan Milinda Bandara Ratnayake, Hartini Mohd Yasin, Abdul Ghani Naim, Rahayu Sukmaria Sukri, Norhayati Ahmad, Nurul Hazlina Zaini, Soon Boon Yu, Mohammad Amiruddin Ruslan and Pg Emeroylariffion Abas
Agriculture 2025, 15(6), 591; https://doi.org/10.3390/agriculture15060591 - 11 Mar 2025
Viewed by 891
Abstract
Heterotrigona itama, a widely reared stingless bee species, produces highly valued honey. These bees naturally secure their colonies within logs, accessed via a single entrance tube, but remain vulnerable to intruders and predators. Guard bees play a critical role in colony defense, exhibiting the ability to discriminate between nestmates and non-nestmates and employing strategies such as pheromone release, buzzing, hissing, and vibrations to alert and recruit hive mates during intrusions. This study investigated the acoustic signals produced by H. itama guard bees during intrusions to determine their potential for intrusion detection. Using a Jetson Nano equipped with a microphone and camera, guard bee sounds were recorded and labeled. After preprocessing the sound data, Mel Frequency Cepstral Coefficients (MFCCs) were extracted as features, and various dimensionality reduction techniques were explored. Among them, Linear Discriminant Analysis (LDA) demonstrated the best performance in improving class separability. The reduced feature set was used to train both Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers. KNN outperformed SVM, achieving a Precision of 0.9527, a Recall of 0.9586, and an F1 Score of 0.9556. Additionally, KNN attained an Overall Cross-Validation Accuracy of 95.54% (±0.67%), demonstrating its superior classification performance. These findings confirm that H. itama produces distinct alarm sounds during intrusions, which can be effectively classified using machine learning; thus, demonstrating the feasibility of sound-based intrusion detection as a cost-effective alternative to image-based approaches. Future research should explore real-world implementation under varying environmental conditions and extend the study to other stingless bee species. Full article
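A compact sketch of the reported pipeline (MFCC features, LDA for dimensionality reduction, KNN classification) using librosa and scikit-learn; the file paths, labels, and per-clip summary statistics are assumptions, not the authors' exact feature set.

```python
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def mfcc_features(path: str, sr: int = 22050, n_mfcc: int = 20) -> np.ndarray:
    """One fixed-length vector per clip: mean and std of each MFCC over time."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# 'paths' and 'labels' (e.g. 0 = normal hum, 1 = alarm) are assumed to exist.
# X = np.stack([mfcc_features(p) for p in paths])
# clf = make_pipeline(LinearDiscriminantAnalysis(), KNeighborsClassifier(n_neighbors=5))
# print(cross_val_score(clf, X, labels, cv=5).mean())
```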

22 pages, 872 KiB  
Article
The Walk of Guilt: Multimodal Deception Detection from Nonverbal Motion Behaviour
by Sharifa Alghowinem, Sabrina Caldwell, Ibrahim Radwan, Michael Wagner and Tom Gedeon
Information 2025, 16(1), 6; https://doi.org/10.3390/info16010006 - 26 Dec 2024
Viewed by 1245
Abstract
Detecting deceptive behaviour for surveillance and border protection is critical for a country’s security. With the advancement of technology in relation to sensors and artificial intelligence, recognising deceptive behaviour could be performed automatically. Following the success of affective computing in emotion recognition from verbal and nonverbal cues, we aim to apply a similar concept for deception detection. Recognising deceptive behaviour has been attempted; however, only a few studies have analysed this behaviour from gait and body movement. This research involves a multimodal approach for deception detection from gait, where we fuse features extracted from body movement behaviours from a video signal, acoustic features from walking steps from an audio signal, and the dynamics of walking movement using an accelerometer sensor. Using the video recording of walking from the Whodunnit deception dataset, which contains 49 subjects performing scenarios that elicit deceptive behaviour, we conduct multimodal two-category (guilty/not guilty) subject-independent classification. The classification results obtained reached an accuracy of up to 88% through feature fusion, with an average of 60% from both single and multimodal signals. Analysing body movement using single modality showed that the visual signal had the highest performance followed by the accelerometer and acoustic signals. Several fusion techniques were explored, including early, late, and hybrid fusion, where hybrid fusion not only achieved the highest classification results, but also increased the confidence of the results. Moreover, using a systematic framework for selecting the most distinguishing features of guilty gait behaviour, we were able to interpret the performance of our models. From these baseline results, we can conclude that pattern recognition techniques could help in characterising deceptive behaviour, where future work will focus on exploring the tuning and enhancement of the results and techniques. Full article
(This article belongs to the Special Issue Multimodal Human-Computer Interaction)
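For readers unfamiliar with the fusion terminology, the sketch below shows generic early fusion (feature concatenation) and late fusion (weighted averaging of per-modality class probabilities); the classifier choice and weights are placeholders and do not reproduce the paper's hybrid scheme.

```python
import numpy as np
from sklearn.svm import SVC

def early_fusion(feature_sets: list[np.ndarray]) -> np.ndarray:
    """Concatenate per-modality feature vectors before training one classifier."""
    return np.concatenate(feature_sets, axis=1)

def late_fusion(probas: list[np.ndarray], weights=None) -> np.ndarray:
    """Average (optionally weighted) class-probability outputs of per-modality models."""
    return np.average(np.stack(probas), axis=0, weights=weights)

# Toy data: 20 subjects, three modalities (video, audio, accelerometer), 2 classes.
rng = np.random.default_rng(3)
y = np.array([0, 1] * 10)
video, audio, accel = (rng.normal(size=(20, d)) for d in (30, 12, 9))

fused_X = early_fusion([video, audio, accel])
early_clf = SVC(probability=True).fit(fused_X, y)

per_modality = [SVC(probability=True).fit(X, y).predict_proba(X)
                for X in (video, audio, accel)]
late_pred = late_fusion(per_modality, weights=[0.5, 0.25, 0.25]).argmax(axis=1)
print(late_pred)
```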

16 pages, 3506 KiB  
Article
HADNet: A Novel Lightweight Approach for Abnormal Sound Detection on Highway Based on 1D Convolutional Neural Network and Multi-Head Self-Attention Mechanism
by Cong Liang, Qian Chen, Qiran Li, Qingnan Wang, Kang Zhao, Jihui Tu and Ammar Jafaripournimchahi
Electronics 2024, 13(21), 4229; https://doi.org/10.3390/electronics13214229 - 28 Oct 2024
Cited by 1 | Viewed by 1375
Abstract
Video surveillance is an effective tool for traffic management and safety, but it may face challenges in extreme weather, low visibility, areas outside the monitoring field of view, or during nighttime conditions. Therefore, abnormal sound detection is used in traffic management and safety as an auxiliary tool to complement video surveillance. In this paper, a novel lightweight method for abnormal sound detection based on 1D CNN and Multi-Head Self-Attention Mechanism on the embedded system is proposed, which is named HADNet. First, 1D CNN is employed for local feature extraction, which minimizes information loss from the audio signal during time-frequency conversion and reduces computational complexity. Second, the proposed block based on Multi-Head Self-Attention Mechanism not only effectively mitigates the issue of disappearing gradients, but also enhances detection accuracy. Finally, the joint loss function is employed to detect abnormal audio. This choice helps address issues related to unbalanced training data and class overlap, thereby improving model performance on imbalanced datasets. The proposed HADNet method was evaluated on the MIVIA Road Events and UrbanSound8K datasets. The results demonstrate that the proposed method for abnormal audio detection on embedded systems achieves high accuracy of 99.6% and an efficient detection time of 0.06 s. This approach proves to be robust and suitable for practical applications in traffic management and safety. By addressing the challenges posed by traditional video surveillance methods, HADNet offers a valuable and complementary solution for enhancing safety measures in diverse traffic conditions. Full article
(This article belongs to the Special Issue Fault Detection Technology Based on Deep Learning)
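A minimal sketch of the overall layout the abstract describes: 1D convolutions over the raw waveform followed by multi-head self-attention with a residual connection. Layer sizes and the number of output classes are illustrative only.

```python
import torch
import torch.nn as nn

class AudioAttnNet(nn.Module):
    """1D convolutions on raw audio followed by multi-head self-attention
    (an illustrative layout; layer sizes are not taken from the paper)."""
    def __init__(self, n_classes: int = 4, d_model: int = 64):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(32, d_model, kernel_size=9, stride=4, padding=4), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, 1, samples)
        x = self.frontend(wav).transpose(1, 2)      # (B, T, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm(x + a)                        # residual eases gradient flow
        return self.head(x.mean(dim=1))             # clip-level logits

model = AudioAttnNet()
print(model(torch.randn(2, 1, 16000)).shape)        # torch.Size([2, 4])
```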

21 pages, 1089 KiB  
Article
Cloud IaaS Optimization Using Machine Vision at the IoT Edge and the Grid Sensing Algorithm
by Nuruzzaman Faruqui, Sandesh Achar, Sandeepkumar Racherla, Vineet Dhanawat, Prathyusha Sripathi, Md. Monirul Islam, Jia Uddin, Manal A. Othman, Md Abdus Samad and Kwonhue Choi
Sensors 2024, 24(21), 6895; https://doi.org/10.3390/s24216895 - 27 Oct 2024
Cited by 8 | Viewed by 1964
Abstract
Security grids consisting of High-Definition (HD) Internet of Things (IoT) cameras are gaining popularity for organizational perimeter surveillance and security monitoring. Transmitting HD video data to cloud infrastructure requires high bandwidth and more storage space than text, audio, and image data. It becomes more challenging for large-scale organizations with massive security grids to minimize cloud network bandwidth and storage costs. This paper presents an application of Machine Vision at the IoT Edge (Mez) technology in association with a novel Grid Sensing (GRS) algorithm to optimize cloud Infrastructure as a Service (IaaS) resource allocation, leading to cost minimization. Experimental results demonstrated a 31.29% reduction in bandwidth and a 22.43% reduction in storage requirements. The Mez technology offers a network latency feedback module with knobs for transforming video frames to adjust to the latency sensitivity. The association of the GRS algorithm introduces its compatibility in the IoT camera-driven security grid by automatically ranking the existing bandwidth requirements by different IoT nodes. As a result, the proposed system minimizes the entire grid’s throughput, contributing to significant cloud resource optimization. Full article

21 pages, 3664 KiB  
Article
A Reduced Complexity Acoustic-Based 3D DoA Estimation with Zero Cyclic Sum
by Rigel Procópio Fernandes, José Antonio Apolinário and José Manoel de Seixas
Sensors 2024, 24(7), 2344; https://doi.org/10.3390/s24072344 - 7 Apr 2024
Cited by 1 | Viewed by 4873
Abstract
Accurate direction of arrival (DoA) estimation is paramount in various fields, from surveillance and security to spatial audio processing. This work introduces an innovative approach that refines the DoA estimation process and demonstrates its applicability in diverse and critical domains. We propose a two-stage method that capitalizes on the often-overlooked secondary peaks of the cross-correlation function by introducing a reduced complexity DoA estimation method. In the first stage, a low complexity cost function based on the zero cyclic sum (ZCS) condition is used to allow for an exhaustive search of all combinations of time delays between pairs of microphones, including primary peak and secondary peaks of each cross-correlation. For the second stage, only a subset of the time delay combinations with the lowest ZCS cost function need to be tested using a least-squares (LS) solution, which requires more computational effort. To showcase the versatility and effectiveness of our method, we apply it to the challenging acoustic-based drone DoA estimation scenario using an array of four microphones. Through rigorous experimentation with simulated and actual data, our research underscores the potential of our proposed DoA estimation method as an alternative for handling complex acoustic scenarios. The ZCS method demonstrates an accuracy of 89.4%±2.7%, whereas the ZCS with the LS method exhibits a notably higher accuracy of 94.0%±3.1%, showcasing the superior performance of the latter. Full article
(This article belongs to the Special Issue UAV Detection, Classification, and Tracking)
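The key idea is easy to show in code: collect the strongest primary and secondary cross-correlation peaks for each microphone pair, then keep the delay triple whose cyclic sum is closest to zero. The sketch below does this for three microphones with SciPy; the synthetic signals and peak counts are arbitrary.

```python
import itertools
import numpy as np
from scipy.signal import correlate, find_peaks

def candidate_delays(x: np.ndarray, y: np.ndarray, n_peaks: int = 3) -> np.ndarray:
    """Lags (in samples) of the strongest cross-correlation peaks between two mics."""
    c = correlate(x, y, mode="full")
    lags = np.arange(-len(y) + 1, len(x))          # full-mode lag axis
    peaks, props = find_peaks(c, height=0)
    top = peaks[np.argsort(props["peak_heights"])[-n_peaks:]]
    return lags[top]

def zcs_cost(t12: int, t23: int, t31: int) -> int:
    """Zero-cyclic-sum residual: exactly 0 for a physically consistent triple."""
    return abs(t12 + t23 + t31)

def best_delay_triple(x1, x2, x3, n_peaks: int = 3):
    """Exhaustive search over peak combinations, keeping the lowest ZCS cost."""
    c12 = candidate_delays(x1, x2, n_peaks)
    c23 = candidate_delays(x2, x3, n_peaks)
    c31 = candidate_delays(x3, x1, n_peaks)
    return min(itertools.product(c12, c23, c31), key=lambda t: zcs_cost(*t))

# Toy usage: the same burst arriving with different sample delays at three mics.
s = np.random.default_rng(4).normal(size=2000)
x1, x2, x3 = s, np.roll(s, 7), np.roll(s, 19)
print(best_delay_triple(x1, x2, x3))    # expect roughly (-7, -12, 19): cyclic sum 0
```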

12 pages, 394 KiB  
Article
MHAiR: A Dataset of Audio-Image Representations for Multimodal Human Actions
by Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam and Naveed Akhtar
Data 2024, 9(2), 21; https://doi.org/10.3390/data9020021 - 25 Jan 2024
Viewed by 2704
Abstract
The MHAiR (audio-image representations for multimodal human actions) dataset contains six different image representations of audio signals that capture the temporal dynamics of actions in a compact and informative way. The dataset was extracted from the audio recordings of an existing video dataset, UCF101. Each data sample is approximately 10 s long, and the overall dataset is split into 4893 training samples and 1944 testing samples. The resulting feature sequences were converted into images, which can be used for human action recognition and related tasks and can serve as a benchmark for evaluating the performance of machine learning models on these tasks. These audio-image representations could be suitable for a wide range of applications, such as surveillance, healthcare monitoring, and robotics. The dataset can also be used for transfer learning, where pre-trained models are fine-tuned on a specific task using the corresponding audio images. Thus, this dataset can facilitate the development of new techniques for improving the accuracy of human action-related tasks and also serve as a standard benchmark for testing the performance of different machine learning models and algorithms. Full article
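The six image representations in the dataset are not enumerated in the abstract; as one generic example of turning audio into an image that a pretrained vision model can consume, a log-Mel spectrogram can be rendered to a PNG as below (the file names are hypothetical).

```python
import numpy as np
import librosa
import matplotlib.pyplot as plt

def audio_to_spectrogram_image(path: str, out_png: str, sr: int = 22050) -> None:
    """Render a log-Mel spectrogram of a clip as a PNG so that image models
    (e.g. pretrained CNNs) can be applied to the audio."""
    y, sr = librosa.load(path, sr=sr, duration=10.0)   # ~10 s clips, as in MHAiR
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    plt.figure(figsize=(4, 4))
    plt.axis("off")
    plt.imshow(log_mel, origin="lower", aspect="auto", cmap="magma")
    plt.savefig(out_png, bbox_inches="tight", pad_inches=0)
    plt.close()

# audio_to_spectrogram_image("some_ucf101_clip.wav", "clip.png")  # hypothetical file
```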

28 pages, 23093 KiB  
Article
A Study on the High Reliability Audio Target Frequency Generator for Electronics Industry
by Changsik Park, Euntack Han, Ikjae Kim and Dongkyoo Shin
Electronics 2023, 12(24), 4918; https://doi.org/10.3390/electronics12244918 - 6 Dec 2023
Cited by 1 | Viewed by 1717
Abstract
A frequency synthesizer performs the seemingly simple function of generating a desired frequency from a reference frequency signal, but stable and precise frequency generation is essential for reliable operation of equipment in communication, control, surveillance, medical, and commercial fields. Frequency synthesis has traditionally been implemented with analog, digital, or hybrid methods. In communication in particular, a precise frequency synthesizer is required for each frequency band, from very low audio frequencies (AF) up to high-frequency microwaves. The purpose of this paper is to design and implement a highly reliable AF frequency synthesizer for railway track circuit systems using only the logic circuits of an FPGA (field programmable gate array), without a microprocessor. The development trends of analog, digital, and hybrid frequency synthesizers are reviewed, and a precise digital synthesis method is proposed. The frequency generated by the digital synthesizer, using an ultra-precision algorithm refined through extensive trial and error, matches the target frequency with an accuracy above 99.999% and a resolution in the mHz range, far finer than the 5 Hz resolution of the previous study. When used in transportation systems such as railways and subways, this highly precise AF-class frequency synthesizer contributes greatly to the safe operation of braking and signaling systems. Full article
(This article belongs to the Special Issue Advances in Intelligent Data Analysis and Its Applications)
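The paper's FPGA design is not described in the abstract, but one standard digital route to mHz-level resolution is a phase-accumulator (DDS-style) synthesizer, where the frequency step equals the clock frequency divided by two to the power of the accumulator width. The Python model below illustrates that relationship; the clock rate, accumulator width, and target tone are assumptions, not the paper's values.

```python
import numpy as np

def dds_tone(f_target_hz: float, f_clk_hz: float = 50e6, acc_bits: int = 32,
             n_samples: int = 1000) -> tuple[float, np.ndarray]:
    """Phase-accumulator (DDS-style) tone generation: the frequency step is
    f_clk / 2**acc_bits, e.g. 50 MHz / 2**32 ~= 0.0116 Hz (about 12 mHz)."""
    tuning_word = round(f_target_hz * 2**acc_bits / f_clk_hz)
    actual_f = tuning_word * f_clk_hz / 2**acc_bits
    phase = (np.arange(n_samples) * tuning_word) % 2**acc_bits
    samples = np.sin(2 * np.pi * phase / 2**acc_bits)  # phase-to-amplitude mapping
    return actual_f, samples

actual_f, _ = dds_tone(2593.0)  # hypothetical AF tone, not a value from the paper
print(f"requested 2593.000000 Hz, generated {actual_f:.6f} Hz")
```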

16 pages, 10854 KiB  
Article
Comparative Analysis of Audio Processing Techniques on Doppler Radar Signature of Human Walking Motion Using CNN Models
by Minh-Khue Ha, Thien-Luan Phan, Duc Hoang Ha Nguyen, Nguyen Hoang Quan, Ngoc-Quan Ha-Phan, Congo Tak Shing Ching and Nguyen Van Hieu
Sensors 2023, 23(21), 8743; https://doi.org/10.3390/s23218743 - 26 Oct 2023
Cited by 9 | Viewed by 3143
Abstract
Artificial intelligence (AI) radar technology offers several advantages over other technologies, including low cost, privacy assurance, high accuracy, and environmental resilience. Challenges it faces include the high cost of equipment and the lack of radar datasets for training deep-learning models. Moreover, conventional radar signal processing methods suffer from poor resolution or complex computation. This paper therefore discusses an innovative approach that integrates radar technology and machine learning for effective surveillance systems and overcomes the aforementioned limitations. The approach comprises three steps: signal acquisition, signal processing, and feature-based classification. A hardware prototype of the signal acquisition circuitry was designed for a continuous-wave (CW) 24 GHz K-band radar sensor. The collected radar motion data were categorized into non-human motion, human walking, and human walking without arm swing. Three signal processing techniques, namely the short-time Fourier transform (STFT), mel spectrogram, and mel frequency cepstral coefficients (MFCCs), were employed. The latter two are typically used for audio processing, but in this study they were applied to obtain micro-Doppler spectrograms for all motion data. The resulting micro-Doppler spectrograms were then fed to a simplified 2D convolutional neural network (CNN) architecture for feature extraction and classification. Additionally, artificial neural network (ANN) and 1D CNN models were implemented for comparative analysis. The experimental results demonstrated that the 2D CNN model trained on the MFCC feature outperformed the other two methods. The accuracy of the object classification models trained on micro-Doppler features was 97.93%, indicating the effectiveness of the proposed approach. Full article
(This article belongs to the Section Radar Sensors)
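A short sketch of how the three compared representations can be computed from a radar baseband signal with ordinary audio tooling (librosa); the toy Doppler signal, sampling rate, and FFT sizes are arbitrary stand-ins for real 24 GHz sensor data.

```python
import numpy as np
import librosa

def doppler_features(iq_baseband: np.ndarray, fs: float, n_fft: int = 256):
    """Compute the three time-frequency representations compared in the paper
    (STFT, mel spectrogram, MFCCs) from a radar baseband signal.
    The real part stands in for a single-channel Doppler signal here."""
    sig = np.real(iq_baseband).astype(np.float32)
    stft_db = librosa.amplitude_to_db(np.abs(librosa.stft(sig, n_fft=n_fft)))
    mel_db = librosa.power_to_db(
        librosa.feature.melspectrogram(y=sig, sr=fs, n_fft=n_fft, n_mels=40))
    mfcc = librosa.feature.mfcc(y=sig, sr=fs, n_mfcc=20)
    return stft_db, mel_db, mfcc

# Toy "walking" return: a slowly modulated Doppler shift around 80 Hz at fs = 2 kHz.
fs = 2000
t = np.arange(0, 4.0, 1 / fs)
sig = np.exp(1j * 2 * np.pi * (80 * t + 10 * np.sin(2 * np.pi * 0.8 * t)))
stft_db, mel_db, mfcc = doppler_features(sig, fs)
print(stft_db.shape, mel_db.shape, mfcc.shape)
```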

33 pages, 2312 KiB  
Article
Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding
by Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Driss Khalil, Srikanth Madikeri, Allan Tart, Igor Szoke, Vincent Lenders, Mickael Rigault and Khalid Choukri
Aerospace 2023, 10(10), 898; https://doi.org/10.3390/aerospace10100898 - 20 Oct 2023
Cited by 11 | Viewed by 7711
Abstract
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). Handling these voice communications requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts aim at integrating artificial intelligence (AI) into ATC communications in order to lessen the ATCos’ workload. However, the development of data-driven AI systems for understanding spoken ATC communications demands large-scale annotated datasets, which are currently lacking in the field. This paper explores the lessons learned from the ATCO2 project, which aimed to develop a unique platform to collect, preprocess, and transcribe large amounts of ATC audio data from airspace in real time. The paper reviews (i) robust automatic speech recognition (ASR), (ii) natural language processing, (iii) English language identification, and (iv) contextual ASR biasing with surveillance data. The pipeline developed during the ATCO2 project, along with the open-sourcing of its data, encourages research in the ATC field, while the full corpus can be purchased through ELDA. The ATCO2 corpora are suitable for developing ASR systems when little or no transcribed ATC audio data are available. For instance, an ASR system trained with ATCO2 reaches a WER as low as 17.9% on public ATC datasets, which is 6.6% absolute WER better than training on “out-of-domain” but gold transcriptions. Finally, the release of 5000 h of ASR-transcribed speech, covering more than 10 airports worldwide, is a step forward towards more robust automatic speech understanding systems for ATC communications. Full article
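Since the results are quoted as word error rate (WER), a small self-contained reference implementation of the metric may be useful; the ATC-style utterance in the example is invented, not taken from the corpus.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed with a word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / len(ref)

# Hypothetical ATC-style utterance and ASR output.
ref = "lufthansa four five two descend flight level eight zero"
hyp = "lufthansa four five two descend to level eight zero"
print(f"WER = {wer(ref, hyp):.2%}")   # one substitution out of nine words -> 11.11%
```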