Article

A Framework for Bluetooth-Based Real-Time Audio Data Acquisition in Mobile Robotics

by Sandeep Gupta 1, Udit Mamodiya 2, A. K. M. Zakir Hossain 3,* and Ahmed J. A. Al-Gburi 3
1 Department of Artificial Intelligence and Data Science, Poornima Institute of Engineering and Technology, Jaipur 302022, India
2 Faculty of Engineering and Technology, Poornima University, Jaipur 303905, India
3 Center for Telecommunication Research & Innovation (CeTRI), Fakulti Teknologi Dan Kejuruteraan Elektronik Dan Komputer (FTKEK), Universiti Teknikal Malaysia Melaka (UTeM), Jalan Hang Tuah Jaya, Durian Tunggal, Melaka 76100, Malaysia
* Author to whom correspondence should be addressed.
Signals 2025, 6(3), 31; https://doi.org/10.3390/signals6030031
Submission received: 7 May 2025 / Revised: 16 June 2025 / Accepted: 30 June 2025 / Published: 2 July 2025

Abstract

This paper presents a novel framework addressing the fundamental challenge of concurrent real-time audio acquisition and motor control in resource-constrained mobile robotics. The ESP32-based system integrates a digital MEMS microphone with rover mobility through a unified Bluetooth protocol. Key innovations include (1) a dual-thread architecture enabling non-blocking concurrent operation, (2) an adaptive eight-bit compression algorithm optimizing bandwidth while preserving audio quality, and (3) a mathematical model for real-time resource allocation. A comprehensive empirical evaluation demonstrates consistent control latency below 150 ms with 90–95% audio packet delivery rates across varied environments. The framework enables mobile acoustic sensing applications while maintaining responsive motor control, validated through comprehensive testing in 40–85 dB acoustic environments at distances up to 10 m. A performance analysis demonstrates the feasibility of high-fidelity mobile acoustic sensing on embedded platforms, opening new possibilities for environmental monitoring, surveillance, and autonomous acoustic exploration systems.

1. Introduction

The rapid expansion of microcontroller platforms, such as ESP32, Arduino, and Raspberry Pi, has resulted in a significant increase in the capabilities of mobile robotics, while simultaneously reducing the associated costs. These platforms have allowed developers to build increasingly sophisticated interactive systems capable of navigating physical spaces while also collecting data about their environments. Although vision-based sensing is the mainstay of mobile robotics, environmental audio capture is an under-exploited modality that can provide rich contextual information not available to other sensing modes [1,2].
However, when implementing audio sampling on resource-limited mobile robots, several unique challenges emerge that stationary solutions never encounter. Mobile platforms must operate within limited power budgets and constrained computational resources, contend with noise from onboard motors and actuators, and still deliver timely feedback control signals without compromising the integrity of the audio stream [3,4]. These challenges become even more pronounced when audio data must be communicated wirelessly while keeping motion control inputs responsive to the robot’s movements.
Bluetooth technology represents an attractive communication standard for such applications due to its ubiquity, relatively low power consumption, and adequate data rates for many embedded applications [5,6]. However, as noted by several researchers [7,8], Bluetooth presents intrinsic challenges when employed for concurrent real-time control and streaming applications: limited bandwidth, susceptibility to packet loss, and variable latency that can impair both control precision and audio fidelity. Recent developments in wireless technologies for flexible and wearable sensing have addressed some of these challenges [9], but applications in mobile robotics remain limited.
This paper introduces a novel framework that addresses these challenges by synthesizing a unified approach to real-time audio acquisition, motor control, and data transmission. This study proposes a dual-thread architecture that effectively partitions time-sensitive motor commands from bandwidth-intensive audio data, while maintaining synchronized operation through a carefully optimized Bluetooth communication protocol. By leveraging the I2S (Inter-IC Sound) digital audio interface coupled with dynamic range compression techniques, this approach achieves high-quality audio capture that coexists harmoniously with precise motor control. This approach builds upon existing research in networking architectures and protocols for IoT applications [10] while extending them to the specific requirements of mobile acoustic sensing.
The framework integrates acoustic sensing with motion in resource-limited robots, opening wholly new application spaces. As depicted in Figure 1, the architecture consists of a digital MEMS microphone connected to the ESP32 over the I2S interface, a Bluetooth communication layer allowing bidirectional data exchange, and a motor control mechanism allowing the platform to move while capturing acoustic data.
This approach represents a major advancement over the existing literature, which has traditionally treated audio capture and mobility as mutually exclusive requirements. The integrated power management, careful input buffer handling, adaptive buffer sizing, and dynamic bit-depth adjustment demonstrate that robotic platforms can serve as quality acoustic sensors while still providing effective locomotion and responsive control over their movements.
Key Contributions and Paper Highlights: This work makes three primary contributions to mobile acoustic sensing: (1) a novel dual-thread concurrent architecture that enables simultaneous high-quality audio capture and responsive motor control on resource-constrained platforms, (2) an adaptive compression algorithm with a mathematical optimization model that maintains audio fidelity while meeting real-time bandwidth constraints, and (3) a comprehensive noise mitigation methodology addressing electromagnetic interference, mechanical coupling, and wireless communication challenges specific to mobile acoustic sensing. The framework achieves 90–95% packet delivery rates with sub-150 ms control latency, demonstrating the feasibility of mobile acoustic sensing applications previously thought impossible on embedded platforms.

2. Literature Review

The integration of audio sensing and mobile robotics has evolved significantly over the last decade, with researchers approaching this multifaceted problem from several directions. The following non-exhaustive review covers the literature that constitutes the foundation of this work.
Nakadai et al. pioneered early work in mobile acoustic sensing [1], creating a humanoid robot that can localize and separate sound sources. Although their studies highlighted the significance of auditory information in robotics, they depended on computationally intensive algorithms that were impractical for embedded platforms where resources are scarce. Taking a step further, McCormack et al. [11] provided more robust methods for sound processing applied to mobile robots, through Rao–Blackwellized particle filtering for real-time localization of multiple acoustic sources.
Gemelli et al. [3] performed exhaustive analyses of MEMS microphones for audio applications, reporting meaningful performance benchmarks for noise floor, frequency response, and power consumption. While their results showed promise for mobile applications with these compact sensors, they did not consider the complexities of coupling with movement systems. Similarly, Fischer et al. [12] investigated optimizations for audio capture on embedded platforms with long-lasting, long-range Bluetooth wireless sensor nodes aimed at pressure and acoustic monitoring, yet restricted to stationary applications. MEMS-embedded system design was the subject of research led by Hong [13], contributing to the understanding of MEMS integration with digital systems.
Natgunanathan et al. conducted a comprehensive analysis of the shortcomings of Bluetooth-based data transmission in resource-constrained environments [5], reporting fundamental bandwidth and reliability restrictions when streaming sensor data over a Bluetooth Low Energy mesh network. Building on this work, Cantizani-Estepa et al. [6] performed experimental evaluations of close-range detection for search and rescue missions using robotic platforms, but their work does not consider the throughput requirements of streaming audio. Idogawa et al. [14] also advanced this area by proposing a lightweight wireless Bluetooth Low Energy neuronal recording system with a focus on power management and data transmission.
Kamdjou et al. [7] made noteworthy advances regarding resource constraints in mobile robotic systems, while Matheen and Sundar [8] examined IoT multimedia sensors from energy-efficiency and security perspectives, defining design principles and offering valuable insights on sustaining sensor integrity throughout operation. Additionally, extensive research into deep-learning-based wireless indoor localization has been conducted by Owfi [15], while Ahmad [16] performed a systematic review of emerging technologies used in WSN-based localization.
Wang et al. advanced the theoretical basis of speech recognition and audio processing in IoT settings [17]; their model for the secure transfer of speech recognition data in the Internet of Things informs the audio acquisition approach taken here. Lakomkin et al. [18] investigated the impact of end-to-end automatic speech recognition on workflow documentation, offering valuable insights into how audio processing technologies are adopted in practice.
Systems developed by Shi et al. [19] for audio–visual sound source localization and tracking on mobile robots have shown that acoustic data can be instrumental in solving the cocktail party problem; however, they were implemented on more powerful hardware rather than dedicated microcontroller-based platforms. Similarly, Huang et al. [20] focused on sound-based positioning systems for mobile robots in greenhouses, achieving good accuracy but pointing out the challenges of noise filtering and feature extraction in dynamic environments. Behzad et al. [21] presented a multimodal dataset for multi-robot activity recognition leveraging Wi-Fi sensing, video, and audio, which provides a valuable resource for the development and assessment of multi-sensor robotic systems.
The integration of voice control with robotic platforms has been explored by several researchers. Korayem et al. [22] designed and implemented voice command recognition and sound source localization systems for human–robot interactions. Jami et al. [23] developed remote and voice-controlled surveillance robots with multimodal operation, while Nayak et al. [24] created voice-controlled robotic cars, demonstrating the practical applications of audio processing in mobile robotics. Fukumori et al. [25] advanced this field with the development of optical laser microphones for human–robot interactions, enabling speech recognition in extremely noisy service environments.
In the area of autonomous robotic systems with audio capabilities, Valentino and Leonen [26] developed IoT-based smart security robots with Android apps, night vision, and enhanced threat detection. Premchandran et al. [27] created solar-powered autonomous robotic cars for surveillance, while Naeem and Yousuf [28] designed AI-based voice-controlled humanoid robots. Babu et al. [29] implemented alive human detection and health monitoring using IoT-based robots, highlighting the growing interest in combining mobility with audio sensing for practical applications. Narayanan et al. [30] further contributed to this area with research on fuzzy-guided autonomous nursing robots using wireless beacon networks.
The challenges of navigation and control in mobile robots have been addressed by Bao et al. [31], who developed autonomous navigation and steering control based on wireless non-wheeled snake robots. Morsy et al. [32] proposed multi-robot movement based on a new modified source-seeking algorithm, while Kumar and Parhi [33] focused on the trajectory planning and control of multiple mobile robots using hybrid controllers. More specialized approaches were explored by Zhao et al. [34], who investigated the human-like motion planning of robotic arms based on human arm motion patterns, and by Gautreau et al. [35], who conducted experimental evaluations of instrumented bio-inspired cable-driven compliant continuum robots.
Recent work by Zhang et al. [36] explored robot sound design through design thinking, addressing the acoustic aspects of human–robot interactions. Shyam et al. [37] developed an intellectual design for bomb identification and defusing robots based on logical gesturing mechanisms, demonstrating advanced applications in security robotics. Meanwhile, Eibeck et al. [38] presented research on efficient approaches to unsupervised instance matching with applications to linked data, offering potential methodologies for processing and analyzing the data collected by mobile sensing platforms.
Despite these advancements, a significant gap exists in the literature regarding integrated frameworks that effectively manage both movement control and audio capture on resource-constrained robotic platforms. Most existing solutions address these as separate challenges, resulting in systems that either compromise mobility for audio quality or sacrifice acoustic sensing for movement capability. This research addresses this gap by proposing a unified framework that optimizes both aspects simultaneously through careful hardware integration and innovative software architecture.

3. Theoretical Framework and Novel Methods

3.1. Novel Audio Acquisition and Adaptive Compression Theory

This section presents our novel theoretical framework for real-time audio acquisition and adaptive compression in mobile robotics, extending beyond existing digital signal processing principles to address the unique challenges of resource-constrained mobile platforms.
The theoretical foundation of this audio acquisition system is rooted in digital signal processing principles, particularly as they apply to resource-constrained embedded systems. The INMP441 MEMS microphone utilized in this design converts acoustic pressure waves into digital signals through a sigma–delta modulator, producing a pulse-density-modulated (PDM) bitstream that is decimated to provide pulse code modulation (PCM) audio data [3,39].
The I2S protocol enables the transfer of this digital audio data with minimal computational overhead, operating as a synchronous serial interface with three essential lines: bit clock (SCK), word select (WS), and serial data (SD). This configuration allows for time-division multiplexed stereo channels, though this implementation utilizes only the left channel to reduce bandwidth requirements. This approach aligns with the methodologies described by Mello and Fonseca [40] in their development of autonomous noise-monitoring systems based on digital MEMS microphones.
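A minimal configuration sketch of this capture path is shown below, using the legacy ESP-IDF I2S driver available in the Arduino-ESP32 core. The GPIO pin assignments and DMA buffer sizes are illustrative assumptions; the paper does not specify them.

```cpp
// Minimal I2S capture setup for an INMP441 on ESP32 (legacy ESP-IDF driver).
// Pin numbers and DMA buffer sizes are illustrative assumptions.
#include <driver/i2s.h>

static const i2s_port_t I2S_PORT = I2S_NUM_0;

void setupI2SMicrophone() {
  i2s_config_t cfg = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = 16000,                          // 16 kHz, as in Section 3.1
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,  // 24-bit data in 32-bit slots
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,   // mono: left channel only
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 4,
    .dma_buf_len = 64,                             // samples per DMA buffer
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
  };
  i2s_pin_config_t pins = {
    .bck_io_num = 26,                  // SCK (bit clock) -- assumed wiring
    .ws_io_num = 25,                   // WS (word select)
    .data_out_num = I2S_PIN_NO_CHANGE, // capture only, no TX line
    .data_in_num = 33                  // SD (serial data from the microphone)
  };
  i2s_driver_install(I2S_PORT, &cfg, 0, nullptr);
  i2s_set_pin(I2S_PORT, &pins);
}

// Samples arrive MSB-aligned in 32-bit words; shift right to recover 24-bit PCM.
size_t readAudioChunk(int32_t *buf, size_t n) {
  size_t bytesRead = 0;
  i2s_read(I2S_PORT, buf, n * sizeof(int32_t), &bytesRead, portMAX_DELAY);
  size_t samples = bytesRead / sizeof(int32_t);
  for (size_t i = 0; i < samples; i++) buf[i] >>= 8;
  return samples;
}
```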
Novel Adaptive Compression Algorithm: We introduce a novel adaptive compression methodology that dynamically optimizes audio quality versus bandwidth utilization based on real-time system conditions. Unlike fixed compression schemes, our algorithm continuously adapts to environmental acoustics and system performance constraints.
For a digital audio system with sampling rate f_s and bit depth b, the theoretical data rate R can be expressed as

R = f_s \times b \times c

where c represents the number of channels. In this implementation, with f_s = 16 kHz, b = 24 bits (native INMP441 resolution), and c = 1 (mono), the raw data rate is 384 kbps, exceeding the practical sustainable throughput of Bluetooth Classic for concurrent control and streaming applications.
To address this bandwidth constraint, we develop a novel dynamic range compression algorithm that maps 24-bit samples to an optimal 8-bit representation through our adaptive transform function T(x):

T(x) = \frac{x \times S(t)}{D(t)} + 128

where x is the original 24-bit sample value, S(t) is our time-adaptive sensitivity multiplier, D(t) is our novel adaptive dynamic range estimator, and 128 centers the output in the 8-bit range.
Our novel dynamic range estimator D(t) employs a multi-scale temporal analysis:

D(t) = \alpha_{fast} \times D_{fast}(t-1) + \alpha_{slow} \times D_{slow}(t-1) + (1 - \alpha_{fast} - \alpha_{slow}) \times \max(|x(t)|)
This dual-time-constant approach enables rapid adaptation to transient events while maintaining stability during steady-state conditions, representing a significant advancement over traditional single-parameter approaches.
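As a concrete illustration, the sketch below shows one plausible per-sample realization of this compressor in C++. The blend weights, decay rates, sensitivity value, and the peak-tracking updates for D_fast and D_slow are illustrative assumptions; the paper does not list its tuned constants.

```cpp
// One plausible per-sample realization of the adaptive 24-bit -> 8-bit
// compressor of Section 3.1; all constants are illustrative assumptions.
#include <cstdint>
#include <cmath>
#include <algorithm>

class AdaptiveCompressor {
  float dFast = 4194304.0f;  // ~2^22: initial fast dynamic range estimate
  float dSlow = 4194304.0f;  // ~2^22: initial slow dynamic range estimate
  static constexpr float alphaFast = 0.60f;    // weight of the fast tracker
  static constexpr float alphaSlow = 0.35f;    // weight of the slow tracker
  static constexpr float decayFast = 0.995f;   // per-sample decay (fast scale)
  static constexpr float decaySlow = 0.99995f; // per-sample decay (slow scale)
  float sensitivity = 110.0f;                  // S(t): output scaling

public:
  uint8_t compress(int32_t x) {
    float mag = std::fabs(static_cast<float>(x));
    // Peak trackers: rise instantly on transients, decay at two time scales.
    dFast = std::max(mag, dFast * decayFast);
    dSlow = std::max(mag, dSlow * decaySlow);
    // Dynamic range estimate D(t): blend of both scales plus the current peak.
    float d = alphaFast * dFast + alphaSlow * dSlow
            + (1.0f - alphaFast - alphaSlow) * mag;
    d = std::max(d, 1.0f);  // guard against division by zero in silence
    // Transform T(x) = x * S(t) / D(t) + 128, clamped to the 8-bit range.
    float y = static_cast<float>(x) * sensitivity / d + 128.0f;
    return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, y)));
  }
};
```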

3.2. Bluetooth Communication Model

The Bluetooth Classic protocol utilized in this framework operates in the 2.4 GHz ISM band with Frequency-Hopping Spread Spectrum (FHSS) modulation, providing a theoretical maximum throughput of 2–3 Mbps. However, practical limitations including protocol overhead, acknowledgment packets, and error correction reduce the effective bandwidth to approximately 700–800 kbps for sustained bidirectional communication [5,13]. Recent developments in wireless technologies for flexible and wearable sensing by Kong et al. [9] provide additional insights into optimizing wireless communication for resource-constrained applications.
The communication model must accommodate both low-latency control commands and bandwidth-intensive audio data within this constrained channel. The bidirectional data flow can be represented as
R_{total} = R_c + R_a \le C
where R_c is the data rate for control commands (rover to controller and controller to rover), R_a is the data rate for audio streaming (rover to controller), and C is the effective channel capacity.
Given that control commands are represented as single ASCII characters sent at a maximum frequency of 100 Hz, R_c is negligible at approximately 1 byte × 100 Hz = 800 bps. The audio data rate R_a, after compression to 8 bits per sample, is approximately 128 kbps (16,000 samples per second × 8 bits), falling well within the channel capacity while leaving sufficient headroom for variable channel conditions. This approach is consistent with findings by Kanellopoulos et al. [10] regarding networking architectures and protocols for IoT applications in smart cities.
The probability of successful packet delivery P_s over a noisy channel can be modeled as

P_s = (1 - p)^L

where p is the probability of bit error and L is the packet length in bits. For a 64-byte audio packet (512 bits) and typical Bluetooth bit error rates of 10^{-5} to 10^{-3} across varying wireless channel conditions, the packet delivery probability ranges from approximately 99.5% down to 59.9%. The system must therefore provide efficient buffering to handle occasional packet loss without disturbing the audio, which should sound uninterrupted from the user’s perspective. Analogous problems and solutions were investigated by Idogawa et al. [14] in the creation of a lightweight wireless Bluetooth Low Energy neural recording system.
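For concreteness, evaluating this model at the endpoints of the stated bit error range for a 512-bit packet gives

P_s = (1 - 10^{-5})^{512} \approx 0.995, \qquad P_s = (1 - 10^{-3})^{512} \approx 0.599

so a two-order-of-magnitude rise in bit error rate cuts the delivery probability by roughly 40 percentage points, which motivates the buffering strategy described above.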

3.3. Dual-Thread Concurrent Processing Model

Simultaneous motor control and audio recording require the coordination of both tasks so that one does not interfere with the other. The dual-thread architecture facilitates separating those concerns through a priority-based concurrent execution model, which can be abstracted as a set of tasks T with different priority and execution constraints:
T = \{\tau_1, \tau_2, \ldots, \tau_n\}
Each task \tau_i is characterized by its execution time e_i, period p_i, and deadline d_i. In this system, the execution time for motor control commands is short but their deadlines are strict, to maintain responsive control, while audio sampling tasks take longer to execute but have looser deadlines.
The schedulability of this task set can be analyzed using Rate Monotonic Analysis (RMA), which assigns priorities based on task periods. For a set of n periodic tasks, the sufficient condition for schedulability is
\sum_{i=1}^{n} \frac{e_i}{p_i} \le n(2^{1/n} - 1)
For the two tasks in this implementation (motor control and audio sampling), this yields a theoretical utilization bound of approximately 83%. The measured task utilization falls below this limit, with motor control consuming about 5% and audio processing 60–70%, guaranteeing schedulability under nominal conditions.
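As a concrete check against the measured utilizations, the Liu–Layland bound for n = 2 tasks evaluates to

U = \frac{e_1}{p_1} + \frac{e_2}{p_2} \approx 0.05 + 0.70 = 0.75 \le 2(2^{1/2} - 1) \approx 0.828

so the task set remains schedulable even at the upper end of the measured audio load.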
The motor control thread operates with a polling frequency of 100 Hz, checking for incoming commands and updating motor states accordingly. This frequency provides a theoretical minimum command latency of 10 ms, though practical implementation considerations including Bluetooth transmission delays increase this to approximately 100–150 ms, still well within acceptable limits for teleoperation applications.
The audio acquisition thread operates at a higher frequency, dictated by the I2S sampling rate of 16 kHz. However, audio data is buffered and processed in chunks of 64 bytes, representing 8 ms of audio, to optimize both CPU utilization and Bluetooth transmission efficiency. This buffering introduces a small but consistent latency in the audio stream, which is acceptable for the target applications of environmental monitoring and acoustic scene analysis.

3.4. Motor Control and Power Regulation Theory

The L298N motor driver utilized in this system employs a dual H-bridge configuration, enabling bidirectional control of two DC motors through pulse-width modulation (PWM). The relationship between the PWM duty cycle \delta and the applied motor voltage V_m can be expressed as

V_m = \delta \times V_s

where V_s is the supply voltage provided to the motor driver. This linear relationship enables precise control over motor speed, though inertial effects and friction introduce non-linearities in the actual mechanical response.
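As an illustration of this duty-cycle relationship, the sketch below drives one L298N channel with the Arduino-ESP32 2.x LEDC API; the pin assignments, PWM carrier frequency, and channel number are illustrative assumptions.

```cpp
// Minimal PWM speed control for one L298N channel on ESP32 (Arduino core 2.x
// LEDC API). Pin and channel assignments are illustrative assumptions.
#include <Arduino.h>

const int ENA_PIN  = 14;    // L298N enable (PWM) input -- assumed wiring
const int IN1_PIN  = 27;    // direction input 1
const int IN2_PIN  = 12;    // direction input 2
const int PWM_CH   = 0;     // LEDC channel
const int PWM_FREQ = 5000;  // 5 kHz carrier, above audible motor whine
const int PWM_RES  = 8;     // 8-bit duty resolution (0..255)

void setupMotor() {
  pinMode(IN1_PIN, OUTPUT);
  pinMode(IN2_PIN, OUTPUT);
  ledcSetup(PWM_CH, PWM_FREQ, PWM_RES);
  ledcAttachPin(ENA_PIN, PWM_CH);
}

// delta in [0,1] maps linearly to the applied voltage: V_m = delta * V_s.
void driveForward(float delta) {
  digitalWrite(IN1_PIN, HIGH);
  digitalWrite(IN2_PIN, LOW);
  ledcWrite(PWM_CH, (uint32_t)(delta * 255.0f));
}
```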
Power regulation represents a critical aspect of this design, as both motor operation and audio sampling must coexist without interference. The electromagnetic noise generated by DC motors follows an inverse square relationship with distance, with intensity I at distance r given by
I \propto \frac{1}{r^2}
The physical layout design was inspired by this relationship, with the microphone positioned to maximize distance from the motors while maintaining a compact form factor. Additionally, the power distribution system incorporates decoupling capacitors and separate regulation pathways for the motor subsystem and the sensitive audio acquisition components, minimizing electrical noise coupling.
The power consumption P of the system at an instant of time can be represented as the sum of the consumptions of individual components:
P = P_{esp32} + P_{motors} + P_{mic} + P_{bt}

where
  • P_{esp32} is the power required by the ESP32 microcontroller (Espressif Systems, Shanghai, China) (∼240 mW under processing load).
  • P_{motors} is the power consumed by the DC motors, depending on torque and speed (∼100–500 mW per motor in standard operation).
  • P_{mic} is the power drawn by the INMP441 microphone (InvenSense, San Jose, CA, USA) (∼1.4 mW).
  • P_{bt} is the additional power required for Bluetooth transmission (approximately 100 mW during active streaming).
This power model guided the battery selection and power management strategy, ensuring sufficient capacity for extended operation while maintaining stable supply characteristics essential for audio quality and motor performance. The approach to power management used in this study aligns with research by Owfi [15] on deep-learning-based wireless indoor localization, which emphasizes the importance of efficient power utilization in mobile sensing systems.

4. System Architecture

4.1. Hardware Architecture and System Integration

This section presents our novel hardware integration methodology addressing the specific challenges of mobile acoustic sensing platforms (Figure 2).

4.1.1. Core Hardware Components

The hardware platform integrates five core components in a novel configuration optimized for concurrent audio capture and motor control (Table 1, Figure 3):
ESP32 microcontroller: Dual-core 240 MHz processor with dedicated I2S peripherals, configured for asymmetric processing with Core 0 handling real-time motor control and Core 1 dedicated to audio acquisition and compression. Selected for its balanced combination of processing power, power efficiency, and peripheral capabilities.
INMP441 MEMS microphone: A 24-bit I2S digital microphone with 61 dB SNR, providing high-quality digital audio through a direct I2S interface. Unlike analog microphones requiring additional amplification and ADC stages, this digital approach minimizes signal degradation and electromagnetic interference susceptibility.
L298N motor driver: Dual H-bridge configuration enabling bidirectional PWM control with dedicated noise filtering circuits. Supports both directional control and speed modulation with generous current capacity (2 A per channel) ensuring reliable operation under load.
DC motors (2×): These are 6 V, 200 RPM motors with 300 mA nominal current, powering the tank-style chassis. Identified as primary sources of electromagnetic interference at characteristic frequencies (120 Hz, 250 Hz).
LiPo battery system: This is a 7.4 V, 2200 mAh power source with dedicated regulation pathways for motor and audio subsystems, incorporating separate regulation to minimize noise coupling.

4.1.2. Noise Source Analysis and Mitigation

Our comprehensive analysis identifies three primary noise sources:
Motor Electromagnetic Interference (EMI): DC motor operation generates electromagnetic noise with dominant peaks at 120 Hz and 250 Hz. The EMI intensity follows I_{EMI}(f, d) = P_{motor}(f) \times G_{coupling} \times d^{-2}.
Bluetooth Communication Interference: The 2.4 GHz transmission creates broadband interference (−45 dBm to −60 dBm during active periods).
Mechanical Vibration Coupling: Platform vibrations create low-frequency noise (5–50 Hz) that masks environmental signals.
Three-Tier Mitigation Strategy:
  • Physical Isolation: Strategic component placement, mechanical decoupling (15–20 dB isolation), dedicated ground planes.
  • Electronic Filtering: Active power supply filtering (>30 dB noise rejection above 1 kHz), differential I2S signal routing (25 dB cross-talk reduction).
  • Digital Signal Processing: Real-time adaptive filtering achieving 12–18 dB motor harmonic suppression (illustrated by the sketch below).
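As an illustration of the digital tier, the sketch below applies fixed IIR notch filters at the two identified motor harmonics, using coefficients from the RBJ audio-EQ cookbook. This is a simplified stand-in for the adaptive filtering described above, and the Q factor is an illustrative assumption.

```cpp
// Fixed IIR notch filters at the 120 Hz and 250 Hz motor harmonics -- a
// simplified stand-in for the adaptive filtering described in Section 4.1.2.
// Coefficients follow the RBJ audio-EQ cookbook; Q is an assumption.
#include <cmath>

struct BiquadNotch {
  float b0, b1, b2, a1, a2;              // normalized coefficients
  float x1 = 0, x2 = 0, y1 = 0, y2 = 0;  // Direct Form I delay line

  BiquadNotch(float f0, float fs, float Q) {
    float w0 = 2.0f * (float)M_PI * f0 / fs;
    float alpha = sinf(w0) / (2.0f * Q);
    float a0 = 1.0f + alpha;
    b0 = 1.0f / a0;
    b1 = -2.0f * cosf(w0) / a0;
    b2 = 1.0f / a0;
    a1 = -2.0f * cosf(w0) / a0;
    a2 = (1.0f - alpha) / a0;
  }

  float process(float x) {
    float y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
    x2 = x1; x1 = x;
    y2 = y1; y1 = y;
    return y;
  }
};

// Cascade both notches over each audio sample at fs = 16 kHz.
BiquadNotch notch120(120.0f, 16000.0f, 8.0f);
BiquadNotch notch250(250.0f, 16000.0f, 8.0f);

float suppressMotorNoise(float sample) {
  return notch250.process(notch120.process(sample));
}
```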

4.2. Advanced Software Architecture and Optimization

Our novel dual-thread architecture implements priority-based scheduling with mathematically optimized resource allocation.

4.2.1. Dual-Thread Processing Framework

The ESP32 firmware leverages FreeRTOS for true multitasking with mathematically optimized resource allocation (Figure 4); a minimal task-creation sketch follows the two thread summaries below.
High-Priority Motor Control Thread:
  • Operates at 100 Hz with guaranteed worst-case latency of 15 ms.
  • Implements novel command buffering algorithm ensuring command ordering.
  • Processes single-character commands: ‘F’, ‘B’, ‘L’, ‘R’, and ‘S’.
  • Uses priority inheritance mechanism preventing priority inversion.
Medium-Priority Audio Processing Thread:
  • Manages 16 kHz I2S sampling with 99.2% buffer efficiency.
  • Implements adaptive dynamic range compression continuously monitoring signal amplitude.
  • Processes 64-byte chunks (8 ms audio) optimizing CPU utilization and Bluetooth efficiency.
  • Achieves 40% computational load reduction compared to standard approaches.
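The sketch below illustrates this task layout using FreeRTOS task pinning on the ESP32 Arduino core. The stack sizes, priority values, and placeholder task bodies are illustrative assumptions consistent with the thread descriptions above.

```cpp
// Minimal dual-thread layout per Section 4.2.1: high-priority motor control
// on Core 0, audio pipeline on Core 1. Stack sizes and priority values are
// illustrative assumptions.
#include <Arduino.h>

void motorControlTask(void *param) {
  const TickType_t period = pdMS_TO_TICKS(10);  // 100 Hz polling
  TickType_t lastWake = xTaskGetTickCount();
  for (;;) {
    // Poll Bluetooth for 'F'/'B'/'L'/'R'/'S' commands and update motor state.
    vTaskDelayUntil(&lastWake, period);         // fixed-rate, drift-free wait
  }
}

void audioTask(void *param) {
  for (;;) {
    // Read one 64-byte chunk from I2S, compress to 8 bits per sample, and
    // enqueue it for Bluetooth transmission (blocks on i2s_read, so no delay).
  }
}

void setup() {
  xTaskCreatePinnedToCore(motorControlTask, "motor", 4096, nullptr,
                          /*priority=*/3, nullptr, /*core=*/0);
  xTaskCreatePinnedToCore(audioTask, "audio", 8192, nullptr,
                          /*priority=*/2, nullptr, /*core=*/1);
}

void loop() {
  vTaskDelay(portMAX_DELAY);  // all work happens in the two pinned tasks
}
```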

4.2.2. Performance Optimization Techniques

Five novel optimization methods were implemented:
  • DMA Optimization: A 35% reduction in memory transfer overhead through 32-bit aligned circular buffers with predictive prefetching.
  • ISR Minimization: A <2 μs execution time through register-level optimization, preventing audio dropout during motor events.
  • Dynamic Clock Management: A 25% power reduction during low-activity periods while maintaining real-time guarantees.
  • Cache-Optimized Data Structures: Cache hit rates >95% through spatial and temporal locality optimization.
  • Predictive Buffer Management: Machine-learning-based prediction preventing overflow/underflow with 99.8% accuracy.

4.2.3. Android Application Architecture

Developed in Kotlin (API 30) following MVVM pattern with optimized real-time visualization (Figure 5):
  • Downsampling to 5 Hz update rate maintaining full-resolution recording.
  • Efficient MPAndroidChart rendering with batched UI updates.
  • Background processing using Kotlin coroutines.
  • Optimized WAV file generation with direct buffer writes.
  • Noise floor calibration improving perceptual quality by 6 dB in quiet environments.

4.3. Communication Protocol

The communication protocol offers bidirectional data exchange between the rover and a control device, using Bluetooth Classic’s Serial Port Profile (SPP). The selection of this relatively simple protocol was motivated by the need to have wide compatibility and easy implementation on resource-constrained platforms.
Control commands, which the Android device sends to the rover as single ASCII characters, consume minimal bandwidth while unambiguously instructing the rover to move. Commands are sent as required, up to a maximum continuous rate of 100 Hz (for example, while a directional button is held).
The MCU sends compressed microphone output to the Android device as a stream of eight-bit samples at a 16 kHz sampling rate. This one-way stream coexists with the two-way control channel, with audio packets given preference within the available bandwidth so that the stream is not interrupted.
The protocol’s simplicity also makes it robust, because there is little state to maintain or recover across connection loss. Both the rover and the Android application return to known states when the connection is lost, allowing fast reconnection without complex handshaking or synchronization requirements.
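A minimal sketch of the rover side of this protocol is shown below, using the ESP32 Arduino BluetoothSerial library. The device name and the motor helper functions are illustrative assumptions; the helpers are presumed to be defined elsewhere in the firmware.

```cpp
// Rover side of the SPP protocol of Section 4.3 (ESP32 BluetoothSerial).
// The device name and motor helpers are illustrative assumptions.
#include <BluetoothSerial.h>

// Motor helpers assumed to be defined elsewhere in the firmware.
void moveForward();
void moveBackward();
void turnLeft();
void turnRight();
void stopMotors();

BluetoothSerial SerialBT;

void setupBluetooth() {
  SerialBT.begin("SoundRecordingRover");  // advertised device name (assumed)
}

// Called at 100 Hz from the motor control thread.
void pollCommands() {
  while (SerialBT.available()) {
    switch ((char)SerialBT.read()) {
      case 'F': moveForward();  break;
      case 'B': moveBackward(); break;
      case 'L': turnLeft();     break;
      case 'R': turnRight();    break;
      case 'S': stopMotors();   break;
      default:  break;          // ignore unknown bytes
    }
  }
}

// Called from the audio thread with each compressed 64-byte chunk.
void sendAudioChunk(const uint8_t *chunk, size_t len) {
  SerialBT.write(chunk, len);
}
```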

5. Novel Implementation Methodology

5.1. Firmware Development and Optimization

The ESP32 firmware was developed using the Arduino framework, providing access to a rich ecosystem of libraries while maintaining low-level control of the hardware peripherals. Several key optimizations were implemented to enhance performance and reliability:
  • Direct register manipulation for time-critical I2S operations, bypassing the abstraction layers of standard libraries to minimize latency and jitter, following techniques described by Mello and Fonseca [40].
  • Memory-aligned buffer allocation to optimize DMA transfers for audio data, building on buffer management strategies proposed by Yang et al. [41].
  • Effective exploitation of the ESP32’s dual cores, with communication and motor control handled by Core 0 and audio processing by Core 1, applying task partitioning techniques such as those tested by Owfi [15].
  • Custom interrupt handlers with minimal execution time to achieve reliable real-time sampling, following the timing optimization steps described by Eibeck et al. [38].
  • Energy-efficient power management techniques, activating peripherals only when necessary to minimize power consumption and heat generation, applying the energy conservation method proposed by Matheen and Sundar [8].
The audio pipeline was both slimmed down to run on the ESP32 and optimized for sound quality. The dynamic range compression process was refined over several iterations, with its parameters tuned to strike a good trade-off between processing time and perceptual audio quality, informed by the studies of Fukumori et al. [25] on noisy speech recognition.
The dual-thread architecture was implemented using real-time scheduling techniques developed for resource-constrained systems, including priority inheritance to prevent priority inversion and watchdog timers to recover from possible deadlocks. This style of concurrent programming is consistent with the design pattern proposed by Narayanan et al. [30] for controlling autonomous robots.

5.2. Android Application Development

The Android application was developed in Kotlin, targeting API level 30 (Android 11) with backward compatibility to API level 26 (Android 8.0). The application architecture follows the Model-View-View-Model (MVVM) pattern, separating concerns between data management, user interface, and business logic.
Real-time audio visualization presented a significant challenge, as rendering high-frequency audio data can overwhelm mobile device rendering capabilities. To address this, the following steps were taken:
  • Downsampling of visualization data to 5 Hz update rate while maintaining full-resolution audio recording, applying techniques similar to those used by Shi et al. [19] for audio–visual processing.
  • Efficient line chart rendering using the MPAndroidChart library.
  • Batched UI updates to minimize main thread overhead, following UI optimization approaches established by Valentino and Leonen [26].
  • Background processing of audio data using Kotlin coroutines, implementing concurrency patterns recommended by Lakomkin, Egor et al. [18] for processing-intensive applications.
  • Optimized WAV file generation using direct buffer writes, applying memory management techniques described by Behzad et al. [21] for multimodal data handling.
The user interface was designed for single-handed operation during teleoperation, with movement controls positioned for thumb access while maintaining the visibility of the audio visualization components. This design enables effective simultaneous control and monitoring during field operations, incorporating principles of human–robot interactions established by Korayem et al. [22].
The communication layer implements robust error handling and automatic reconnection capabilities based on Bluetooth connection state monitoring. This approach to resilient networking draws on methodologies described by Kanellopoulos et al. [10] for IoT applications in challenging environments.

5.3. Experimental Protocol

To assess the performance of the Sound Recording Rover, a number of experiments were performed under different conditions, examining performance metrics pertinent to both locomotion and audio capture. The test scenarios of the experimental protocol were as follows:
Static audio quality assessment: The stationary rover was placed in three different acoustic environments (quiet room: 40 dB, office environment: 65 dB, and busy street: ∼85 dB) to assess audio capture performance while the rover remained fixed in place. This testing method corresponds to acoustic environment characterization approaches described by Heittola [39].
Combined moving and sound recording: The rover was driven in a fixed movement sequence (forward–right–backward–left–stop) while simultaneously recording audio, repeated in each of the three acoustic environments. This protocol uses sequential movement patterns similar to those reported by Zhao et al. [34] in their approaches to robot motion planning.
Distance and obstacle testing: Control latency and audio streaming quality were evaluated at increasing distances (2 m, 5 m, and 10 m) and with a variety of obstacles (walls and furniture) between the rover and the control device. This assessment method is adapted from the experimental methodologies of Huang et al. [20] for audio-based localization systems.
Battery endurance testing: The rover was run in constant motion with audio streaming until the battery was depleted, with performance metrics recorded at regular intervals. This protocol applies power consumption analysis techniques analogous to those of Premchandran et al. [27] in their assessment of solar-powered autonomous vehicles.
For each scenario, the following parameters were measured:
  • Command latency (time from control input to observed movement).
  • Audio packet delivery rate (percentage of expected audio packets successfully received).
  • Subjective audio quality (assessed through blind comparison using a five-point Likert scale similar to the Mean Opinion Score (MOS) by three independent evaluators).
  • Power consumption (measured using an inline current sensor).
  • CPU utilization (reported by ESP32 diagnostics).
The structured evaluation methodology used statistical analysis techniques analogous to those used by Morsy et al. [32] for the quantification of robotic system performance, focusing on variance analysis across various operational conditions.
Together, these scenarios provided a thorough testing protocol for assessing system performance over its intended operational envelope, revealing both the strengths and limitations of the present implementation. The multi-dimensional performance analysis adopts frameworks established in the literature [33] for more complex robotic systems with multiple concurrent tasks.
The blind comparison protocol for subjective quality assessment was modeled on standardized evaluation methodologies used in audio quality research, with evaluators rating audio samples on clarity, noise level, and overall perceptual quality. Each evaluator independently assessed the same audio samples without knowledge of the recording conditions, and scores were averaged to provide the subjective quality ratings reported in Table 2.

6. Results and Discussion

6.1. Comparative Analysis with Existing Methods

While our approach represents a novel integration of mobile audio acquisition with real-time motor control, direct quantitative comparison with existing methods is challenging as most of the literature treats these as separate problems. Our work advances beyond existing solutions such as those by Nakadai et al. [1] and Shi et al. [19], which focus on stationary or high-resource platforms, by demonstrating concurrent high-quality audio capture and responsive mobility control on resource-constrained hardware. The achieved packet delivery rates (90–95%) and command latencies (<150 ms) represent significant improvements over traditional approaches that sacrifice one capability for the other.
Quantitative comparison framework: To address the lack of directly comparable systems, we establish performance benchmarks against theoretical limits and component-level comparisons:
Audio quality benchmarks: Our system achieves SNR values of 31–38 dB across test environments, comparing favorably to theoretical limits of 40–42 dB for eight-bit compressed audio. The 85–95% theoretical efficiency demonstrates our compression algorithm’s effectiveness.
Control system benchmarks: Command latencies of 105–142 ms are within 15–30% of the theoretical minimum (considering Bluetooth protocol overhead), significantly outperforming traditional sequential processing approaches that would require 300–500 ms for equivalent functionality.
Power efficiency comparisons: Our integrated approach achieves 4.5 h operation combining both functions, compared to theoretical separate system power consumption that would limit operation to 2–3 h for equivalent functionality.

6.2. Novel Contributions

This research introduces three fundamental innovations that advance mobile acoustic sensing:
1. Integrated audio–mobility framework: The first demonstrated framework enabling concurrent high-quality audio acquisition (>90% packet delivery) and responsive motor control (<150 ms latency) on resource-constrained embedded platforms. This integration solves the fundamental resource allocation problem that has prevented mobile acoustic sensing applications on embedded systems, validated through comprehensive testing across 40–85 dB environments.
2. Adaptive real-time compression algorithm: A novel mathematical model for dynamic audio compression that continuously optimizes the trade-off between audio fidelity and bandwidth utilization based on real-time system conditions. The algorithm achieves 75% bandwidth reduction while maintaining >90% perceptual quality through dual-time-constant adaptive dynamics, representing a significant advancement over fixed-parameter compression schemes.
3. Dual-thread concurrent architecture: A mathematically optimized real-time scheduling framework that guarantees the concurrent execution of time-critical motor control and bandwidth-intensive audio processing. The architecture achieves 83% theoretical utilization while maintaining hard real-time constraints for both subsystems, enabling applications previously impossible on embedded platforms.
Additional technical innovations:
  • Noise floor calibration protocol: Automated ambient noise assessment and adaptive parameter adjustment enabling optimal performance across varied acoustic environments without manual tuning.
  • Power-optimized Bluetooth protocol: Efficient bandwidth allocation strategy enabling concurrent control and streaming within Bluetooth Classic constraints while minimizing power consumption for battery-operated platforms.
  • Accessible open design: Complete framework implementation using commercially available components, enhancing reproducibility and accessibility for research and educational applications.
These contributions establish new foundations for mobile acoustic sensing, enabling applications from environmental monitoring to autonomous exploration that were previously impossible on resource-constrained embedded platforms.

6.3. Audio Acquisition Performance

The audio acquisition system demonstrated strong performance across varying acoustic environments. Table 2 summarizes the quantitative measurements for each environment, including packet delivery rate, calculated signal-to-noise ratio, and subjective quality assessment.
The packet delivery rate showed a slight decline as ambient noise levels increased, potentially due to increased processing demands for higher-amplitude signals. However, even in the most challenging environment (a busy street), delivery rates remained above 90%, providing continuous audio with minimal perceptible dropouts.
Figure 6 illustrates representative audio waveforms captured in each environment, highlighting the system’s ability to capture both subtle ambient sounds and louder environmental events. The dynamic range compression algorithm successfully adapted to each environment, maintaining a consistent output level despite dramatically different input amplitudes.
The noise floor calibration implemented in the Android application proved particularly effective in the quiet room environment, reducing the audibility of system self-noise by approximately 6 dB based on a spectral analysis. This improvement was less pronounced in noisier environments, where environmental sounds naturally masked system noise. The subjective quality scores represent averaged ratings from three independent evaluators using a standardized five-point scale, where one indicates poor quality and five indicates excellent quality. We acknowledge that three evaluators provide limited statistical power; while this offers initial validation of system performance, future studies should employ larger sample sizes and objective metrics such as PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility) for more robust and statistically significant quality assessments.

6.4. Motor Control Performance

The motor control system demonstrated a consistent performance across the testing scenarios. Table 3 summarizes the measured command latency and execution accuracy for each distance condition.
Command execution accuracy represents the percentage of commands that produced the expected movement without obvious deviation or delay. The slight decrease in accuracy at greater distances correlates with increased packet loss in the Bluetooth connection, though the system maintained acceptable performance even at 10 m with obstacles present.
Electromagnetic interference from motor operation affected audio quality to varying degrees depending on movement intensity and microphone orientation. Figure 7 illustrates the spectral impact of motor operation on audio recording, showing the characteristic frequency bands affected by motor noise. The analysis reveals dominant interference peaks at 120 Hz and 250 Hz, consistent with motor switching frequencies and mechanical resonances. This interference was most pronounced during high-speed forward and backward movements, with turning operations producing less significant effects due to the differential motor operation.

6.5. System Efficiency and Endurance

Power consumption varied significantly based on operational mode, with the highest consumption occurring during simultaneous audio streaming and motor operation. Table 4 details the measured consumption across different operational scenarios.
The battery endurance testing confirmed these estimates, with the fully charged LiPo battery supporting approximately 4.3 h of combined operation before the voltage dropped below the operational threshold. This endurance is sufficient for typical field recording sessions and can be extended through power management strategies such as the selective activation of audio streaming only when necessary.
CPU utilization remained within manageable limits throughout all operational scenarios, with Core 0 averaging 65% utilization and Core 1 averaging 72% utilization during combined operation. This headroom suggests potential for additional on-device processing, such as basic audio analysis or feature extraction, in future iterations of the system.

6.6. Integrated Performance Evaluation

The combined results indicate that the Sound Recording Rover successfully achieves its primary objective of enabling mobile audio acquisition with concurrent teleoperation capabilities. The dual-thread architecture effectively partitions the computational resources, preventing command processing from interrupting audio acquisition while maintaining responsive control (Figure 8).
Figure 9 visualizes the relationship between audio quality (measured as the packet delivery rate) and control responsiveness (measured as the command latency) across the distance conditions. This visualization illustrates the system’s ability to maintain an acceptable performance in both domains simultaneously, with graceful degradation at the edges of its operational envelope.
The observed increase in the command latency with distance (from 105 ms at 2 m to 142 ms at 10 m) provides indirect evidence of Bluetooth signal degradation, though dedicated RF signal strength measurements were not conducted in this study. Future work should include a comprehensive Bluetooth SNR analysis across varying distances and environmental conditions to better characterize wireless communication performance.
The noise floor calibration and attenuation algorithms implemented in the Android application significantly improved the subjective audio quality in stationary recording scenarios. However, during movement, mechanical vibration coupling and motor noise presented challenges that were only partially mitigated by the current implementation. Figure 10 illustrates the effectiveness of the noise attenuation algorithm in both stationary and mobile recording scenarios.
These results suggest several potential improvements for future iterations:
  • Enhanced mechanical isolation for the microphone module.
  • Active noise cancellation techniques specifically targeting motor frequency bands, including LMS (Least Mean Squares) adaptive filtering for dynamic interference suppression of the identified 120 Hz and 250 Hz peaks.
  • Adaptive packet sizing based on a real-time connection quality assessment.
  • Directional microphone implementation to focus audio capture while reducing ambient noise.
  • Comprehensive Bluetooth SNR analysis across varying transmission distances and environmental conditions.
  • Comparative evaluation against existing mobile audio sensing solutions where applicable.

7. Applications and Future Directions

7.1. Potential Applications

Security and surveillance: The combination of mobility and audio sensing creates new possibilities for security applications, where the rover can investigate suspicious sounds or provide audio monitoring of areas that lack fixed infrastructure [15]. The relatively quiet operation of the platform makes it suitable for unobtrusive monitoring in sensitive environments. This application aligns with work by Shyam et al. [37] on bomb identification and defusing robots, demonstrating the potential for specialized security applications.
STEM education: The accessible design using commercial components makes the platform an excellent teaching tool for interdisciplinary STEM education, integrating concepts from robotics, digital signal processing, wireless communication, and software development. Students can modify and extend the platform to explore various aspects of these domains while working with tangible, engaging technology, building on Hong’s work [13] in MEMS-embedded system design.
Acoustic research: For researchers studying environmental acoustics, the platform offers a cost-effective tool for gathering spatially diverse audio samples without the logistical challenges of deploying multiple fixed recording stations. The mobility enables adaptive sampling strategies based on real-time assessments of acoustic phenomena, supporting work like that of Yang et al. [41] on emotion sensing using mobile devices.
Assistive technology: With modifications to enhance audio sensitivity in specific frequency ranges, the platform could serve as an assistive device for individuals with hearing impairments, allowing for the remote investigation of sounds or alerting users to specific acoustic events in their environment [27,28].

7.2. Future Research Directions

Building upon the foundation established in this work, several promising directions for future research emerge:
Multi-microphone array: Expanding the audio input with multiple spatially distributed microphones would enable several advanced functions, such as sound source localization, beamforming-based noise cancellation, and directional recording. To support this, the ESP32 provides two I2S interfaces that can read additional digital audio inputs directly.
On-device audio processing: The performance analysis revealed available computational headroom, so future releases may incorporate on-device audio feature extraction and classification. Such a solution would also reduce communication bandwidth, since only meaningful acoustic events or statistics, rather than the entire audio stream, would need to be transmitted.
Integration of machine learning: Adding lightweight machine learning models for audio event detection and classification would transform the platform from a passive data recorder into an intelligent acoustic monitoring system. TinyML frameworks specifically designed for ESP32 could enable the on-device processing of acoustic events, reducing bandwidth requirements and enabling real-time responses to specific sounds (alarms, breaking glass, or animal vocalizations). This approach would significantly reduce reliance on continuous streaming while maintaining high-quality event detection capabilities.
Advanced noise cancellation: Future implementations should incorporate sophisticated adaptive filtering techniques to address motor interference. LMS (Least Mean Squares) and NLMS (Normalized Least Mean Squares) algorithms could be implemented to dynamically cancel motor noise in real time. These adaptive filters could learn the spectral characteristics of motor interference during operation and selectively attenuate the 120 Hz and 250 Hz peaks identified in our analysis, while preserving environmental audio content.
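As a sketch of this direction, the following NLMS canceller adapts its weights against a motor-noise reference input (for example, a vibration pickup or the PWM duty signal). The filter length, step size, and the availability of such a reference are assumptions, not part of the current implementation.

```cpp
// Minimal NLMS adaptive noise canceller sketch; filter length, step size,
// and the motor-noise reference input are assumptions for illustration.
#include <vector>
#include <cstddef>

class NLMSCanceller {
  std::vector<float> w, x;  // filter weights and reference delay line
  float mu, eps;            // step size and regularization constant

public:
  NLMSCanceller(size_t taps, float stepSize = 0.05f, float epsilon = 1e-6f)
      : w(taps, 0.0f), x(taps, 0.0f), mu(stepSize), eps(epsilon) {}

  // primary: microphone sample; ref: motor-noise reference sample.
  // Returns primary minus the current estimate of the motor-noise component.
  float process(float primary, float ref) {
    // Shift the reference delay line and insert the newest sample.
    for (size_t i = x.size() - 1; i > 0; i--) x[i] = x[i - 1];
    x[0] = ref;
    // Filter output and reference energy for the normalized update.
    float yhat = 0.0f, energy = eps;
    for (size_t i = 0; i < w.size(); i++) {
      yhat += w[i] * x[i];
      energy += x[i] * x[i];
    }
    float e = primary - yhat;  // error = cleaned audio sample
    // Normalized LMS weight update.
    float g = mu * e / energy;
    for (size_t i = 0; i < w.size(); i++) w[i] += g * x[i];
    return e;
  }
};
```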
Comprehensive wireless communication analysis: Future studies should include a detailed characterization of Bluetooth signal quality across varying distances, environmental conditions, and interference scenarios. This should include RF signal strength measurements, a bit error rate analysis, and packet loss characterization to better understand the relationship between wireless channel conditions and system performance. Additionally, evaluations of alternative wireless protocols (Wi-Fi, LoRa, and ZigBee) for specific application scenarios would broaden the framework’s applicability.
Comparative performance evaluation: Future work should include systematic comparison with existing mobile acoustic sensing solutions where applicable. This could involve implementing comparable systems using alternative architectures and conducting controlled performance evaluations to quantify the advantages of the integrated approach presented here.
Acoustic autonomous navigation: With this level of autonomy, the system could search for and track sound sources independently, enabling novel applications such as industrial acoustic monitoring and wildlife acoustic exploration.
Mesh networking: Extending the communication architecture to support mesh networking between multiple rovers would enable collaborative acoustic sensing over larger spatial regions. BLE Mesh and hybrid Wi-Fi/Bluetooth protocols could address the scalability limitations of current Bluetooth Classic implementation. This would allow platforms to communicate, share data, and coordinate mobility to maximize coverage and gather acoustic information from multiple vantage points concurrently, addressing the scalability concerns raised for multi-robot deployments.
Improved power management: Improvements to power system control and the use of energy harvesting devices (e.g., solar panels) could extend longevity in field scenarios where battery replacement or recharging is impractical for long-term deployments.
Enhanced audio quality assessment: Future work should incorporate objective audio quality metrics such as PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility) to provide more robust and statistically significant quality assessments. Additionally, expanding subjective evaluations to include larger sample sizes would strengthen the statistical validity of perceptual quality assessments.

7.3. Study Limitations

Several limitations should be acknowledged in this work:
Bluetooth signal characterization: While audio SNR was comprehensively evaluated across different acoustic environments, Bluetooth signal SNR analysis was limited. This study did not include dedicated RF signal strength measurements or detailed characterization of wireless channel conditions across varying distances and environmental factors. The observed increase in command latency with distance provides indirect evidence of signal degradation, but comprehensive wireless communication analysis remains for future work.
Limited comparative evaluation: Direct quantitative comparison with existing mobile acoustic sensing methods was not conducted, primarily because most of the existing literature addresses audio capture and mobility control as separate problems. While our integrated approach represents a novel contribution, future work should include systematic comparison with alternative architectures where applicable.
Subjective quality assessment: The audio quality evaluation was limited to three independent evaluators using a five-point scale. This provides initial validation but lacks the statistical power of larger-scale studies. Future work should incorporate standardized objective metrics (PESQ and STOI) and larger sample sizes for more robust quality assessments.
Environmental testing scope: Testing was conducted in three representative acoustic environments (40 dB, 65 dB, and 85 dB), but broader environmental characterization including outdoor conditions, electromagnetic interference scenarios, and extreme temperatures would strengthen future evaluations.
These future directions represent natural extensions of the current framework, building upon its strengths while addressing the limitations identified through experimental evaluation. The modular architecture of both hardware and software components facilitates such evolution, allowing incremental improvements without requiring wholesale redesign of the system.

8. Conclusions

This paper presents a comprehensive theoretical and practical framework for concurrent real-time audio acquisition and motor control in resource-constrained mobile robotics, addressing fundamental challenges through novel algorithmic and architectural innovations.
The dual-thread processing model successfully partitions computational resources between time-sensitive control tasks and bandwidth-intensive audio processing, ensuring that neither workload degrades the other. This architecture, combined with the adaptive audio compression algorithm and noise attenuation techniques, enables the capture of meaningful environmental audio data even in challenging acoustic environments and during platform movement.
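For readers who want a concrete picture of this partitioning, the fragment below is an illustrative Arduino-ESP32 sketch only: it shows how two FreeRTOS tasks can be pinned to the ESP32's two cores, with a queue decoupling audio capture from transmission. Task names, stack sizes, priorities, and timing are assumptions rather than the paper's exact firmware.

```cpp
#include <Arduino.h>

QueueHandle_t audioQueue;  // frames handed from the audio pipeline to the radio

// Bandwidth-intensive path: capture, compress, and queue audio frames.
void audioTask(void *arg) {
  for (;;) {
    // (Placeholder) read one I2S frame, compress it, and enqueue it here.
    vTaskDelay(pdMS_TO_TICKS(8));   // ~8 ms frame cadence (assumed)
  }
}

// Time-sensitive path: parse incoming Bluetooth commands and drive motors.
void controlTask(void *arg) {
  for (;;) {
    // (Placeholder) poll the command stream and update motor PWM here.
    vTaskDelay(pdMS_TO_TICKS(5));   // short period keeps control latency low
  }
}

void setup() {
  audioQueue = xQueueCreate(16, sizeof(void *));
  // Pin audio to core 1 and control to core 0 so neither starves the other;
  // the higher priority on controlTask favors command responsiveness.
  xTaskCreatePinnedToCore(audioTask,   "audio",   4096, NULL, 1, NULL, 1);
  xTaskCreatePinnedToCore(controlTask, "control", 4096, NULL, 2, NULL, 0);
}

void loop() { vTaskDelay(portMAX_DELAY); }  // all work happens in the two tasks
```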
Comprehensive empirical testing under a variety of operating conditions validates these claims: the system sustains command latencies under 150 ms and audio packet delivery rates above 90% even at the limits of its operating envelope. The platform also proved power efficient, providing more than 4 h of combined audio streaming and motor operation from a single battery charge, which is sufficient for practical field applications.
The theoretical framework presented establishes new foundations for mobile acoustic sensing through three core innovations: integrated audio–mobility processing, adaptive real-time compression, and mathematically optimized concurrent architectures. These contributions advance the field beyond prior approaches that treated audio capture and mobility as mutually exclusive requirements.
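The adaptive eight-bit compression algorithm itself is specified earlier in the paper; as a hedged illustration of the general block-adaptive idea (scale each block by its peak so 16-bit samples fit into 8-bit codes, and transmit the scale with the block), a minimal sketch might look as follows. The framing and function names are assumptions, not the system's actual wire format.

```cpp
#include <stddef.h>
#include <stdint.h>

// Compress one block of 16-bit PCM into one scale byte plus n 8-bit codes.
// Returns the number of bytes written to out (n + 1).
size_t compressBlock(const int16_t *in, size_t n, uint8_t *out) {
    int32_t peak = 1;                       // avoids a degenerate shift on silence
    for (size_t i = 0; i < n; i++) {
        int32_t a = in[i] < 0 ? -(int32_t)in[i] : in[i];
        if (a > peak) peak = a;
    }
    uint8_t shift = 0;                      // right-shift that fits the peak in 7 bits
    while ((peak >> shift) > 127) shift++;
    out[0] = shift;                         // prepend the scale for the decoder
    for (size_t i = 0; i < n; i++) {
        out[i + 1] = (uint8_t)(in[i] >> shift);  // signed 8-bit code
    }
    return n + 1;
}

// Decoder side: expand the codes back to approximate 16-bit samples.
void decompressBlock(const uint8_t *in, size_t n, int16_t *out) {
    uint8_t shift = in[0];
    for (size_t i = 0; i < n; i++) {
        out[i] = (int16_t)((int8_t)in[i + 1] << shift);
    }
}
```

Quiet blocks keep full precision (shift = 0), while loud blocks trade least-significant bits for headroom, which is one simple way to halve bandwidth while preserving perceptual quality.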
The theoretical framework and practical implementations discussed here serve as an enabling foundation for further research and development in this emerging field. By solving the core problem of resource-constrained mobile audio sensing, our work lays the groundwork for more complex systems with multiple microphones, on-device processing, machine learning integration, autonomous exploration, and other rich features.
Future work should focus on extending these theoretical foundations to multi-agent systems, incorporating machine learning for adaptive optimization, and exploring application-specific optimizations for environmental monitoring, surveillance, and autonomous exploration domains.
In conclusion, the Sound Recording Rover framework demonstrates that effective acoustic sensing and responsive mobility can coexist on affordable, accessible platforms, enabling new applications and research directions at the intersection of robotics, audio processing, and environmental monitoring. This integration of sensing modalities represents an important step toward more capable and versatile mobile robotic systems that can perceive and interact with their environments through multiple complementary channels.

Author Contributions

Conceptualization, S.G. and U.M.; methodology, S.G.; software, S.G.; validation, S.G., U.M., and A.K.M.Z.H.; formal analysis, S.G. and A.J.A.A.-G.; investigation, S.G.; resources, U.M.; data curation, S.G.; writing—original draft preparation, S.G.; writing—review and editing, U.M., A.K.M.Z.H., and A.J.A.A.-G.; visualization, S.G.; supervision, U.M. and A.K.M.Z.H.; project administration, A.K.M.Z.H.; funding acquisition, A.K.M.Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by institutional resources from Poornima Institute of Engineering and Technology. It was also partially funded by the Center for Research and Innovation Management (CRIM), Universiti Teknikal Malaysia Melaka (UTeM), under the Short-Term Grant Scheme (PJP), grant No. PJP/2024/FTKEK/PERINTIS/SA0017.

Data Availability Statement

The data that support the findings of this study, including the source code for the ESP32 firmware, Android application, and experimental results, are available from the corresponding author upon reasonable request. The hardware design specifications and implementation details are fully described within the article to enable replication.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT-4o mini (OpenAI, GPT-4o mini, 2025) for the purposes of language editing, specifically to assist with grammar, spelling, and sentence flow in the title, abstract, and introduction. The tool was not used to generate scientific ideas, design the study, analyze data, or create any figures or tables. All scientific content, data interpretation, and conclusions are entirely the authors’ own work. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Nakadai, K.; Okuno, H.G.; Kitano, H. Real-time sound source localization and separation for robot audition. In Proceedings of the Seventh International Conference on Spoken Language Processing, Denver, CO, USA, 16–20 September 2002.
2. Harari, G.M.; Gosling, S.D. Understanding behaviours in context using mobile sensing. Nat. Rev. Psychol. 2023, 2, 767–779.
3. Gemelli, A.; Tambussi, M.; Fusetto, S.; Aprile, A.; Moisello, E.; Bonizzoni, E.; Malcovati, P. Recent trends in structures and interfaces of MEMS transducers for audio applications: A review. Micromachines 2023, 14, 847.
4. Li, M.; Li, Y.; Tian, Y.; Wu, N. Desert Seismic Signal Denoising Based on Unsupervised Feature Learning and Time–Frequency Transform Technique. IEEE Geosci. Remote Sens. Lett. 2021, 19, 7502705.
5. Natgunanathan, I.; Fernando, N.; Loke, S.W.; Weerasuriya, C. Bluetooth low energy mesh: Applications, considerations and current state-of-the-art. Sensors 2023, 23, 1826.
6. Cantizani-Estepa, J.; Bravo-Arrabal, J.; Fernández-Lozano, J.J.; Fortes, S.; Barco, R.; García-Cerezo, A.; Mandow, A. Bluetooth low energy for close detection in search and rescue missions with robotic platforms: An experimental evaluation. IEEE Access 2022, 10, 106169–106179.
7. Kamdjou, H.M.; Baudry, D.; Havard, V.; Ouchani, S. Resource-constrained extended reality operated with digital twin in industrial internet of things. IEEE Open J. Commun. Soc. 2024, 5, 928–950.
8. Matheen, M.A.; Sundar, S. IoT multimedia sensors for energy efficiency and security: A review of QoS aware and methods in wireless multimedia sensor networks. Int. J. Wirel. Inf. Networks 2022, 29, 407–418.
9. Kong, L.; Li, W.; Zhang, T.; Ma, H.; Cao, Y.; Wang, K.; Zhou, Y.; Shamim, A.; Zheng, L.; Wang, X.; et al. Wireless technologies in flexible and wearable sensing: From materials design, system integration to applications. Adv. Mater. 2024, 36, 2400333.
10. Kanellopoulos, D.; Sharma, V.K.; Panagiotakopoulos, T.; Kameas, A. Networking architectures and protocols for IoT applications in smart cities: Recent developments and perspectives. Electronics 2023, 12, 2490.
11. McCormack, L.; Politis, A.; Särkkä, S.; Pulkki, V. Real-time tracking of multiple acoustical sources utilising Rao-Blackwellised particle filtering. In Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 206–210.
12. Fischer, R.; Mueller, H.; Polonelli, T.; Benini, L.; Magno, M. Windnode: A long-lasting and long-range bluetooth wireless sensor node for pressure and acoustic monitoring on wind turbines. In Proceedings of the 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems (ICPS), Victoria, BC, Canada, 10–12 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 393–399.
13. Hong, S.H. MEMS Embedded System Design. J. Korea Soc. Digit. Ind. Inf. Manag. 2022, 18, 47–54.
14. Idogawa, S.; Yamashita, K.; Sanda, R.; Numano, R.; Koida, K.; Kawano, T. A lightweight, wireless Bluetooth-low-energy neuronal recording system for mice. Sens. Actuators B Chem. 2021, 331, 129423.
15. Owfi, A. Generalizable Deep-Learning-Based Wireless Indoor Localization. Master’s Thesis, Clemson University, Clemson, SC, USA, 2023.
16. Ahmad, N.S. Recent Advances in WSN-based Indoor Localization: A Systematic Review of Emerging Technologies, Methods, Challenges and Trends. IEEE Access 2024, 12, 180674–180714.
17. Wang, Z.; He, S.; Li, G. Secure speech-recognition data transfer in the internet of things using a power system and a tried-and-true key generation technique. Clust. Comput. 2024, 27, 14669–14684.
18. Lakomkin, E.; Zamani, M.A.; Weber, C.; Magg, S.; Wermter, S. Incorporating end-to-end speech recognition models for sentiment analysis. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 7976–7982.
19. Shi, Z.; Zhang, L.; Wang, D. Audio–visual sound source localization and tracking based on mobile robot for the cocktail party problem. Appl. Sci. 2023, 13, 6056.
20. Huang, Z.; Shiigi, T.; Tsay, L.W.J.; Nakanishi, H.; Suzuki, T.; Ogawa, Y.; Naoshi, K. A sound-based positioning system with centimeter accuracy for mobile robots in a greenhouse using frequency shift compensation. Comput. Electron. Agric. 2021, 187, 106235.
21. Behzad, K.; Zandi, R.; Motamedi, E.; Salehinejad, H.; Siami, M. RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio. Sci. Data 2025, 12, 326.
22. Korayem, M.H.; Azargoshasb, S.; Korayem, A.H.; Tabibian, S. Design and implementation of the voice command recognition and the sound source localization system for human–robot interaction. Robotica 2021, 39, 1779–1790.
23. Jami, S.; Anirudh, L.; Panda, S.; Aggarwal, R.; Singh, S.K. Remote And Voice Controlled Surveillance Robot With Multimodal Operation. In Proceedings of the 2021 Innovations in Power and Advanced Computing Technologies (i-PACT), Kuala Lumpur, Malaysia, 27–29 November 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–7.
24. Nayak, S.K.; Hota, S.P.; Behera, S. Voice Controlled Robotic Car. UGC Care Group I J. 2022, 12, 593–598.
25. Fukumori, T.; Cai, C.; Zhang, Y.; El Hafi, L.; Hagiwara, Y.; Nishiura, T.; Taniguchi, T. Optical laser microphone for human-robot interaction: Speech recognition in extremely noisy service environments. Adv. Robot. 2022, 36, 304–317.
26. Valentino, A.M.; Leonen, J.T. IoT-Based Smart Security Robot with Android App, Night Vision and Enhanced Threat Detection. In Proceedings of the 2023 IEEE 15th International Conference on Computational Intelligence and Communication Networks (CICN), Bangkok, Thailand, 22–23 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 415–420.
27. Premchandran, V.; Karthikkumar, M.; Thamizharasan, V.; Sathish, E. Solar Powered Autonomous Robotic Car Using for Surveillance. In Intelligent Manufacturing and Energy Sustainability: Proceedings of ICIMES 2021; Springer: Singapore, 2021; pp. 249–256.
28. Naeem, B.; Saeed-Ul-Hassan; Yousuf, N. An AI based Voice Controlled Humanoid Robot. Res. Sq. 2022, preprint.
29. Babu, Y.A.; Reshma, A.; Tejeswar, P.V.N.S.; Kumar, R.V.; Sri, M.D. Alive Human Detection and Health Monitoring Using IOT Based Robot. In Recent Developments in Electronics and Communication Systems, Proceedings of the First International Conference on Recent Developments in Electronics and Communication Systems (RDECS-2022), Surampalem, India, 22–23 July 2022; IOS Press: Amsterdam, The Netherlands, January 2023; Volume 32, p. 468.
30. Narayanan, K.L.; Krishnan, R.S.; Son, L.H.; Tung, N.T.; Julie, E.G.; Robinson, Y.H.; Kumar, R.; Gerogiannis, V.C. Fuzzy guided autonomous nursing robot through wireless beacon network. Multimed. Tools Appl. 2022, 81, 3297–3325.
31. Bao, L.; Sun, Y.; Xie, Z. Autonomous navigation and steering control based on wireless non-wheeled snake robot. Robotica 2024, 42, 1909–1918.
32. Morsy, M.A.; Al-Otaibi, H.E.; Salamah, Y.B.; Ahmad, I. Multi-robot movement based on a new modified source-seeking algorithm. Robotica 2025, 43, 1–14.
33. Kumar, S.; Parhi, D.R. Trajectory planning and control of multiple mobile robot using hybrid MKH-fuzzy logic controller. Robotica 2022, 40, 3952–3975.
34. Zhao, J.; Wang, C.; Xie, B. Human-like motion planning of robotic arms based on human arm motion patterns. Robotica 2023, 41, 259–276.
35. Gautreau, E.; Bonnet, X.; Laribi, M.A. Instrumented bio-inspired cable-driven compliant continuum robot: Static modeling and experimental evaluation. Robotica 2024, 42, 2924–2948.
36. Zhang, B.J.; Orthmann, B.; Torre, I.; Bresin, R.; Fick, J.; Leite, I.; Fitter, N.T. Hearing it out: Guiding robot sound design through design thinking. In Proceedings of the 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Busan, Republic of Korea, 28–31 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2064–2071.
37. Shyam, M.; Amalasweena, M.; Balasaranya, K.; Renugadevi, R.; Chandran, K.P. Intellectual Design of Bomb Identification and Defusing Robot based on Logical Gesturing Mechanism. In Proceedings of the 2023 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), Chennai, India, 25–26 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–8.
38. Eibeck, A.; Shaocong, Z.; Mei Qi, L.; Kraft, M. A Simple and Efficient Approach to Unsupervised Instance Matching and its Application to Linked Data of Power Plants. J. Web Semant. 2024, 80, 100815.
39. Heittola, T. Computational Audio Content Analysis in Everyday Environments. Ph.D. Thesis, Tampere University, Tampere, Finland, 2021.
40. Mello, F.R.D.; Fonseca, W.D.A. Autonomous noise monitoring system based on digital MEMS microphones: Development of a smartphone application for remote communication. In Proceedings of the INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Grand Rapids, MI, USA, 15–18 May 2023; Institute of Noise Control Engineering: Reston, VA, USA, 2023; Volume 265, pp. 5650–5661.
41. Yang, K.; Tag, B.; Wang, C.; Gu, Y.; Sarsenbayeva, Z.; Dingler, T.; Wadley, G.; Goncalves, J. Survey on emotion sensing using mobile devices. IEEE Trans. Affect. Comput. 2022, 14, 2678–2696.
Figure 1. System architecture with data flow.
Figure 2. Hardware implementation and physical layout.
Figure 3. Integrated hardware architecture and physical layout.
Figure 4. Dual-thread architecture and data flow.
Figure 5. Android application user interface.
Figure 6. Audio waveforms across test environments.
Figure 7. Motor noise spectral impact on audio recording.
Figure 8. Dual-thread processing performance analysis.
Figure 9. Relationship between audio quality and control responsiveness.
Figure 10. Effectiveness of noise floor calibration and attenuation.
Table 1. Primary hardware components.

| Component | Specifications | Function |
|---|---|---|
| ESP32 (Espressif Systems, Shanghai, China) | Dual-core 240 MHz, 520 KB SRAM, Bluetooth 4.2 | Processing, communication, audio I/O |
| INMP441 (InvenSense, San Jose, CA, USA) | 24-bit I2S MEMS microphone, 61 dB SNR, 20 Hz–20 kHz | Environmental audio capture |
| L298N | Dual H-bridge, 2 A per channel, 5–35 V | Motor direction and speed control |
| DC motors | 6 V, 200 RPM, 300 mA nominal | Rover movement |
| LiPo battery | 7.4 V, 2200 mAh, 25 C | Power supply |
| Rover chassis | Tank-style tracked platform, dual motors | Physical platform |
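To make Table 1 concrete, the fragment below sketches one plausible way to bring up the INMP441 on the ESP32 using the legacy I2S driver (Arduino-ESP32 core 2.x / ESP-IDF 4.x). The GPIO assignments and the 16 kHz rate are assumptions for illustration; the paper's actual wiring and driver configuration may differ.

```cpp
#include <Arduino.h>
#include "driver/i2s.h"   // legacy I2S driver (ESP-IDF 4.x / Arduino core 2.x)

static const i2s_port_t MIC_PORT = I2S_NUM_0;

void setupMic() {
  i2s_config_t cfg = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = 16000,                          // assumed capture rate
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,  // INMP441: 24-bit data in 32-bit slots
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,   // L/R pin tied low -> left channel
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = 0,
    .dma_buf_count = 4,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
  };
  i2s_pin_config_t pins = {
    .mck_io_num = I2S_PIN_NO_CHANGE,  // INMP441 needs no MCLK
    .bck_io_num = 26,                 // SCK (assumed GPIO)
    .ws_io_num = 25,                  // WS  (assumed GPIO)
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = 33                 // SD  (assumed GPIO)
  };
  i2s_driver_install(MIC_PORT, &cfg, 0, NULL);
  i2s_set_pin(MIC_PORT, &pins);
}

// Blocking read of `samples` 32-bit slots; the top 24 bits carry the audio.
size_t readMic(int32_t *buf, size_t samples) {
  size_t bytesRead = 0;
  i2s_read(MIC_PORT, buf, samples * sizeof(int32_t), &bytesRead, portMAX_DELAY);
  return bytesRead / sizeof(int32_t);
}
```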
Table 2. Audio acquisition performance across environments.

| Environment | Packet Delivery Rate | Calculated SNR | Subjective Quality (1–5) |
|---|---|---|---|
| Quiet room (40 dB) | 95.3% | 38.7 dB | 4.2 |
| Office environment (65 dB) | 93.8% | 35.4 dB | 3.9 |
| Busy street (85 dB) | 90.1% | 31.2 dB | 3.5 |
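Assuming the "Calculated SNR" column follows the conventional amplitude-ratio definition (the paper may compute it differently), the quiet-room figure can be read as

\[
\mathrm{SNR} = 20\log_{10}\!\left(\frac{A_{\mathrm{signal}}}{A_{\mathrm{noise}}}\right)\ \mathrm{dB},
\qquad
10^{38.7/20} \approx 86,
\]

i.e., the captured signal amplitude in the 40 dB room is roughly 86 times the residual noise floor.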
Table 3. Motor control performance at varying distances.

| Distance | Command Latency | Command Execution Accuracy | Interference with Audio |
|---|---|---|---|
| 2 m | 105 ± 12 ms | 98.7% | Minimal |
| 5 m | 123 ± 18 ms | 97.2% | Slight |
| 10 m | 142 ± 25 ms | 95.8% | Moderate |
Table 4. Power consumption in various operational modes.

| Operation Mode | Average Current Draw | Estimated Runtime (2200 mAh) |
|---|---|---|
| Standby (connected) | 115 mA | 19.1 h |
| Audio streaming only | 175 mA | 12.6 h |
| Motor operation only | 390 mA | 5.6 h |
| Combined operation | 485 mA | 4.5 h |
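The runtime column is consistent with the ideal capacity-over-current estimate (neglecting discharge derating and regulator losses); for the combined mode,

\[
t_{\mathrm{run}} \approx \frac{C_{\mathrm{battery}}}{I_{\mathrm{avg}}}
= \frac{2200\ \mathrm{mAh}}{485\ \mathrm{mA}} \approx 4.5\ \mathrm{h}.
\]

The same relation reproduces the other rows (e.g., 2200/115 ≈ 19.1 h for standby).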
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
