1. Introduction
Noise is common in everyday life, affecting communication not only for individuals with hearing impairments but also for those with typical hearing (TH), i.e., bilateral pure-tone thresholds ≤ 20 dB HL for 0.5–8 kHz in octave bands. Although pure-tone audiometry is widely recognized as the “gold standard” for assessing hearing function, a normal audiogram does not necessarily guarantee satisfactory communication experiences in all settings. Over 10% of adults may experience hearing difficulty in noisy environments despite having normal audiograms. Tremblay et al. [1] reported that 12% of 682 individuals aged 21–67 years with TH reported hearing difficulties. Similarly, 15% of 2015 subjects aged 20–69 years with bilateral pure-tone averages of ≤25 dB HL for 0.5, 1, 2, and 4 kHz experienced similar issues [2]. However, standard clinical guidelines from the American Academy of Audiology [3] and the American Speech-Language-Hearing Association [4] provide limited guidance for addressing these communication challenges for listeners with TH, aside from recommendations to use hearing protectors to prevent occupational noise-induced hearing loss [5]. Mealings et al. [6] found that many clients express frustration when told they have “normal hearing” while still experiencing hearing difficulties. Additionally, clinicians expressed frustration with a lack of adequate training or scientific resources to assist these clients beyond suggesting the use of communication strategies [6].
Possible solutions for these challenges are reviewed below. The related work is divided into three sections.
Section 1.1 presents one potential technological solution, which is the use of clinically available remote microphone (RM) systems, designed to enhance the signal-to-noise ratio (SNR).
Section 1.2 addresses the use of smartphones as RM systems. Smartphones equipped with built-in microphones and Bluetooth capabilities (Bluetooth Special Interest Group, Kirkland, WA, USA) have been suggested as components of RM systems for use in small-group conversations for adults.
Section 1.3 includes a review of the available research regarding the acoustic properties and potential benefits of using the AirPods Pro (AP) as a hearing device.
1.1. Clinically Available Remote Microphone Systems
RM systems consist of a transmitter/microphone worn by the speaker and a receiver worn by the listener [7]. Research has shown that RM systems significantly improve speech recognition in noisy environments for individuals with hearing impairments and for those with typical hearing. Thibodeau [8] reported that the Phonak Roger RM system enhanced speech recognition performance in persons with moderate-to-severe hearing loss from 45% to 61% compared to using hearing aids (HAs) or cochlear implants alone. In 2024, Thibodeau et al. [9] reported that RM systems benefited adults with TH. The observed improvements were 21.37% and 9.87% when using the Phonak Roger Select and Roger Pen, respectively, compared to no RM system in 10 participants with TH, aged 20 to 63 years [9]. Shiels and colleagues [10] found RM benefits for children aged 6 to 12 years. Sentence recognition scores improved by an average of 3.03, 21.40, 21.61, and 49.95% when tested at SNRs of +12, +5, 0, and −10 dB, respectively. They also reported improvements in visual and auditory attention with RM use.
Although listeners with TH can potentially benefit from using RM systems designed for persons with hearing loss [9], the adoption of such devices is limited due to their high cost. For example, Mealings et al. [11] reported that none of the participants (n = 27) with TH expressed a preference for purchasing a pair of mild-gain HAs priced at approximately USD 3250, despite the potential benefits for their hearing difficulties. Additionally, the higher expenses associated with RM systems and the inconvenience of carrying multiple devices may deter individuals from using RM systems in daily communication. Furthermore, the social stigma often associated with using hearing devices can be a concern for some people. da Silva et al. [12] reported that persons with hearing impairment might be perceived as “less intelligent” or labelled as “deaf and dumb”, causing feelings of shame and embarrassment. These negative emotional feelings of self-imposed stigma may lead to social isolation and affect decision-making regarding the initial acceptance and pursuit of potential treatment [12]. Therefore, it is essential to find more affordable and approachable alternatives to increase the acceptance and use of hearing devices for addressing hearing difficulties.
1.2. Smartphone-Based Remote Microphone Systems
Although not designed to be worn at an optimal distance from the talker’s mouth, smartphones do not require external components or additional hardware to be used as remote microphones [13]. Furthermore, smartphones are readily available, with nearly everyone carrying one throughout the day. It has been reported that 98% of Americans own a cellphone or smartphone [14]. Moreover, Bluetooth-compatible headphones for iOS (Apple Inc., Cupertino, CA, USA) and Android (Open Handset Alliance, Mountain View, CA, USA) operating systems have been introduced, making it easier to integrate RM system functionality directly through smartphones [15]. However, there are limitations to using Bluetooth receivers as part of an RM system because of the delay caused by Bluetooth transmission. This delay has been reported to range from 30 to 274 ms [7,16]. Coyle reported that the average AP Bluetooth latency across 19 measurements was 144 ms, compared to earlier AirPods versions that had latencies up to 274 ms [16]. Goehring et al. [17] reported a tolerable delay limit of 10 ms for TH listeners (n = 20) and 10–30 ms for listeners with hearing loss (n = 20), with annoyance ratings significantly increasing beyond these values. These findings suggest that Bluetooth transmission delays may affect the use of smartphone-based RM systems.
One smartphone-based RM system involves the iPhone and AP earphones (Apple Inc., Cupertino, CA, USA). It is noteworthy that the software available on compatible versions of the AP was approved by the U.S. Food and Drug Administration in 2024 as an over-the-counter HA for adults with mild-to-moderate hearing loss [18]. The AP can connect via Bluetooth with iPhones and Android phones and serve as receivers worn by the listener, with the smartphone functioning as a transmitter/microphone when placed close to the talker. On Android phones (e.g., Samsung, Google Pixel), the RM features that can be used for enhancing hearing perception in noise include “Hearing Enhancement” in the Samsung series and “Sound Amplifier” on Google Pixel. On iPhone devices, Live Listen (LL) is a native feature, available since 2014, that enables microphone transmission over Bluetooth to compatible hearing devices such as HAs or AP headphones. When activated, LL uses the iPhone’s microphone to capture sound and transmit it wirelessly from the iPhone to the receiver (i.e., AP), amplifying and clarifying the sounds in the user’s immediate environment [19]. The AP can be set to one of three modes: transparency (TP; lets outside sound in), noise cancellation (NC; cancels the external sounds), and off. This could potentially be a lower-cost alternative to a dedicated RM system designed for persons with TH. One such RM system, a Phonak Roger Touchscreen transmitter used with a Roger Focus II receiver, was shown to enhance speech intelligibility by an average of 53% in 16 children (ages 8–16 years) with unilateral hearing loss, compared to their peers with TH [20].
1.3. Previous Research for Use of AirPods Pro as Hearing Devices
Despite growing research interest driven by the AP’s affordability, widespread adoption, and regulatory approval as an over-the-counter hearing aid [18], evidence supporting its clinical use remains limited. Prior to 2025, only six studies systematically evaluated the AP’s performance as a hearing device, as chronologically summarized in Table 1 [21,22,23,24,25,26]. These investigations, comprising peer-reviewed articles, conference proceedings, trade publications, and graduate theses, collectively suggest potential auditory benefits of using the AP for both adults with TH and adults with hearing loss. For instance, Lin et al. [21] verified that AP receivers have comparable electroacoustic results to HAs, providing adequate amplification for individuals with mild-to-moderate hearing loss. Hammond and Diedesch [22] found that listeners appreciated the custom audiogram-driven features in the AP, finding them easy to use and beneficial regardless of hearing status. Valderrama et al. [24] reported that using the AP and an iPhone mitigated hearing challenges for individuals with TH, resulting in a significant 11.8% increase in speech intelligibility and a +5.5 dB SNR advantage compared to baseline conditions without the AP (unaided). Only one study, a master’s thesis by Foroogozar [26], investigated the use of the AP as part of a smartphone-based remote microphone system, in 23 adults aged 60 and above with normal to mild/moderate hearing loss. Use of the LL feature on the smartphone versus the AP alone showed significant improvements in memory retention from 43.8% (no LL) to 59.4% (with LL) and in mean sentence recognition scores from 81.8% (no LL) to 94.4% (with LL). These promising findings suggest the need for more rigorous clinical studies to validate the AP’s efficacy in audiological practice.
Because the focus of this study was to evaluate the AP when used with a smartphone set to LL compared to a current clinically available RM system, it was essential to provide objective verification and limit potential confounding human factors (e.g., age, gender, education, personality, attention, linguistic familiarity, cognition). An objective, non-biased speech recognition method was developed following the COVID-19 pandemic, which disrupted traditional methods of collecting experimental data involving human subjects. This new approach uses voice-to-text transcription (VTT) and the Knowles Electronics Manikin for Acoustic Research (KEMAR), fitted with a standardized artificial ear (Zwislocki coupler), to replace human responses. Advancements in artificial intelligence have led to the creation of various transcription tools designed to convert spoken words into text with great accuracy as a way to facilitate closed captioning [27]. Such VTT applications include Otter.ai (https://otter.ai/ (accessed on 4 January 2024)), Sonix (https://sonix.ai (accessed on 4 January 2024)), Trint (https://trint.com/ (accessed on 4 January 2024)), and Google Cloud Speech-to-Text (https://cloud.google.com/speech-to-text/ (accessed on 4 January 2024)). In a comparative study assessing various VTT tools, Otter.ai was reported to provide the most accurate transcripts with an accuracy rate of 99.7% [28]. Given this high accuracy rate, Otter.ai was utilized for VTT in this study.
The aim of this study was to compare the VTT accuracy of two RM systems (the iPhone set to LL and the Roger RM), specifically designed for users with TH. The investigation involved three SNRs, one HINT list, and two types of noise. The research question was as follows: how does the transcription accuracy for AP in TP and NC modes change when used with an iPhone set to LL compared to the Roger RM system across two noise types and three SNR levels? It was hypothesized that transcription accuracy would be superior in speech-shaped noise compared to babble noise, for the higher SNR conditions, and with the Roger RM system relative to the smartphone-based RM system. The results confirmed these hypotheses. The key findings are as follows: (1) The transcription accuracy was significantly affected by the SNR, noise type, RM condition, and the two-way interaction between SNR and RM condition. (2) In the −5 dB SNR noise condition, the smartphone-based RM with the AP in NC mode using the LL feature yielded comparable VTT accuracy relative to the Roger RM system.
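As a rough illustration of what a VTT accuracy score measures, the sketch below scores a transcript against its reference sentence as the percentage of reference words recovered in the correct order. The scoring rule (longest common subsequence of words), the function names, and the example sentences are all assumptions made for illustration; they are not the scoring procedure or the stimuli used in this study.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def words_correct(reference, transcript):
    """Return (matched_words, reference_words) for one sentence.

    Matching uses the longest common subsequence of words, so a
    transcribed word counts only if it appears in the right order.
    This rule is an assumption, not the study's documented method.
    """
    ref, hyp = tokenize(reference), tokenize(transcript)
    # dp[i][j] = LCS length of ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, h in enumerate(hyp, 1):
            if r == h:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1], len(ref)


def transcription_accuracy(pairs):
    """Percentage of reference words recovered across (reference, transcript) pairs."""
    correct = total = 0
    for reference, transcript in pairs:
        c, n = words_correct(reference, transcript)
        correct += c
        total += n
    return 100.0 * correct / total


# Hypothetical sentences, not taken from the HINT list used in the study.
example_pairs = [
    ("The dog ran across the yard.", "the dog ran across the yard"),
    ("She poured milk into the cup.", "she poured milk in the cup"),
]
print(f"{transcription_accuracy(example_pairs):.2f}%")
```

Pooling words across sentences, rather than averaging per-sentence percentages, keeps long and short sentences weighted by their word counts; either convention would need to be stated when reporting scores.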
3. Results
The mean transcription accuracy scores for No RM conditions (Baseline) and RM conditions are presented in Table 4, Table 5, and Figure 2. The three factors of interest were technology condition, noise type, and SNR. In general, RM conditions yielded greater accuracy than the No RM conditions. Of initial interest was the comparison of baseline conditions, as shown in Table 4. The use of AP alone without the smartphone was compared to KEMAR alone to determine improvements related to the AP alone. Contrary to the results obtained with humans, the accuracy of the VTT was higher for the KEMAR alone condition compared to the two AP conditions, most likely due to the occlusion effect of the AP and the lack of binaural processing. Therefore, given the uniqueness of this arrangement, no statistical analyses were completed for the baseline conditions.
Of greater interest was the comparison of the RM technology conditions shown in Table 5. The statistical analyses included a repeated-measures ANOVA for the RM technology conditions of interest (Roger, AP NC + LL, and AP TP + LL), noise type (speech and babble), and SNR (+5, 0, and −5 dB). Significant main effects were observed for technology [F(2,34) = 47.45, p < 0.001, large partial η² (ηp²; effect size) = 0.74, Mean Square Error (MSE) = 20.55], noise type [F(1,34) = 8.65, p = 0.006, large ηp² = 0.20, MSE = 20.55], and SNR [F(2,34) = 42.91, p < 0.001, large ηp² = 0.72, MSE = 20.55]. The accuracy was significantly higher when tested in speech noise compared to babble noise (84.22% vs. 79.89%) with a small effect size (Cohen’s d = 0.35). A significant two-way interaction was only found between the SNR and technology [F(4,34) = 6.03, p = 0.001, large ηp² = 0.42, MSE = 20.55]. All other interactions were non-significant (p > 0.05).
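The reported partial eta-squared values can be cross-checked from the F statistics and degrees of freedom using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies that identity to the values reported above; the function and variable names are illustrative, and the calculation assumes the error degrees of freedom are 34 for every effect, as printed.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported in the text above.
reported_effects = {
    "technology": (47.45, 2, 34),
    "noise type": (8.65, 1, 34),
    "SNR": (42.91, 2, 34),
    "SNR x technology": (6.03, 4, 34),
}

for name, (f_value, df_effect, df_error) in reported_effects.items():
    print(f"{name}: {partial_eta_squared(f_value, df_effect, df_error):.2f}")
```

Rounded to two decimals, the four results agree with the effect sizes given in the text (0.74, 0.20, 0.72, and 0.42).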
Follow-up analyses were performed using Tukey’s adjustment for each significant effect (adjusted p-values are denoted padj). For the main effect of RM technology condition, the transcription accuracy of Roger (95.06%) was significantly better than that of AP TP + LL (81.22%) [t(34) = 9.15, padj < 0.001; large Cohen’s d = 1.50] but not significantly better than that of AP NC + LL (92.50%) [t(34) = 1.69, padj = 0.22; small Cohen’s d = 0.41]. The two AP conditions also differed significantly, with AP NC + LL yielding greater accuracy than AP TP + LL [t(34) = 7.46, padj < 0.001; large Cohen’s d = 1.17]. For the main effect of SNR, all comparisons (+5 vs. 0, 0 vs. −5, and +5 vs. −5 dB) were significant as expected, with t(34) = 3.35, 5.81, and 9.15, respectively, and all padj < 0.002. The corresponding effect sizes were all large (Cohen’s d = 0.80, 0.85, and 1.52, respectively).
Figure 3 illustrates the two-way interaction between SNR and RM technology condition across both noise types. The follow-up pairwise comparisons are shown in Table 6. The differences among the conditions were most evident at the more challenging SNR conditions (0 and −5 dB). The patterns of significance at these SNRs were the same as those for the main effects: Roger and AP NC + LL were each significantly greater than AP TP + LL, with no significant difference between Roger and AP NC + LL.
4. Discussion
The aim of the study was to compare the transcription accuracy in noise when using AP as part of an RM system (iPhone set to LL) and when using a sophisticated RM system (Phonak Roger). Three factors were involved, including technology conditions, noise type, and SNR. The results indicated that the transcription accuracy was significantly influenced by all three factors and a two-way interaction of SNR by technology. Overall, accuracy scores were the lowest at −5 compared to 0 and +5 dB SNR, and the highest when using an RM such as Roger On transmitting to a Roger receiver or using a smartphone with LL transmitting via Bluetooth low energy to an AP set to NC mode.
Regarding the baseline conditions (Table 4), it is interesting to note that using the AP alone did not increase the accuracy score. Because of the somewhat artificial nature of the VTT arrangement with a manikin, the baseline results are provided for informational purposes only. As mentioned earlier, the results may be impacted by the fitting of the AP on an artificial pinna. It should also be noted that in this test arrangement using KEMAR, the result is obtained monaurally, so the benefits of binaural listening that are available to human listeners were not observed.
However, when using an RM system (Table 5), there were improvements in the transcription accuracy relative to using KEMAR alone. Valderrama et al. [24] reported that, for TH adults with self-disclosed difficulties hearing in noise, using the AP set for “maximum ambient noise reduction” with “conversation boost” enabled provided a 5.4 dB SNR advantage and an 11.8% intelligibility increase, along with an 8% reduction in mental demand and listening effort. It is likely that if they had included the use of the smartphone as an RM system, the improvements would have been greater. In the present study, the most accurate VTT score with the AP was obtained when using LL and with the AP also set to NC mode (92.5%). This agrees with Foroogozar [26], who reported 94.4% accurate sentence recognition on average when testing adults with typical hearing wearing the AP with an iPhone set to LL. The NC mode deactivates the microphones on the AP and allows for signals with the highest quality as the input to the Otter transcription program. Further research is required to confirm such benefits in humans with different degrees of hearing impairment.
When considering the interaction between the SNR and technology, the most demanding listening conditions (0 and −5 dB SNR) revealed significant differences in accuracy among the three RM conditions. The accuracy was significantly lower when using AP TP + LL compared to Roger and AP NC + LL, although there was no significant difference between Roger and AP NC + LL. This suggests that RM features offer potential benefits relative to AP TP + LL mode, which allows the transmission of environmental noise. In the AP TP mode, the environmental noise is mixed with the signal arriving from the RM, thus reducing its potential benefit. Differences between smartphone-based RM and Roger systems may stem from Bluetooth transmission delays (averaging 144 ms in AP [16]), which are not present in Roger’s digital modulation transmission. These delays substantially exceed the reported tolerable thresholds of 10–30 ms for hearing devices [17]. The asynchrony becomes particularly apparent when visual cues from the speaker are present, creating a temporal mismatch between auditory and visual inputs. This incongruity may strain working memory capacity during noisy listening conditions, as listeners must reconcile limited acoustic information with potentially conflicting visual cues. Such cognitive demands can delay speech comprehension and increase processing load during ongoing communication.
These results have potential clinical and research implications. Averaged across SNRs and noise types, using the Roger or smartphone-based RM systems reduced the challenges caused by noise relative to not using any device, with benefits of 17.95, 15.39, and 4.11% for Roger, AP NC + LL, and AP TP + LL, respectively. Because the current device settings are intended mainly for those without hearing loss, such benefits may help the 12–15% of listeners with TH who report hearing difficulties in noise [1,2]. The benefits of the Roger RM system were confirmed by Thibodeau et al. [9] in 10 subjects with TH despite using different Roger transmitters (Pen, Select) and receivers (first-generation Roger Focus). As with the Roger RM system, benefits of smartphone-based RM systems for speech recognition in noise are expected in humans with TH. Such systems may also benefit those who have hearing aids that connect to smartphones when using a remote mic app on the phone, if the Bluetooth delay can be tolerated. However, many HA manufacturers now offer proprietary RM devices with personal hearing technology.
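The benefit figures quoted above can be related back to the mean accuracies reported in the Results. The sketch below subtracts each stated benefit from the corresponding RM mean to recover the baseline it implies. The baseline value is an inference from the numbers in this text, not a figure quoted here, and which No RM condition it corresponds to is an assumption; the variable names are illustrative.

```python
# Mean transcription accuracy (%) for each RM condition, from the Results.
rm_mean_accuracy = {"Roger": 95.06, "AP NC + LL": 92.50, "AP TP + LL": 81.22}

# Benefit (percentage points) relative to not using any device, from the Discussion.
reported_benefit = {"Roger": 17.95, "AP NC + LL": 15.39, "AP TP + LL": 4.11}

# Baseline implied by each pair: RM mean minus reported benefit.
implied_baseline = {
    condition: round(rm_mean_accuracy[condition] - reported_benefit[condition], 2)
    for condition in rm_mean_accuracy
}

for condition, baseline in implied_baseline.items():
    print(f"{condition}: implied baseline = {baseline:.2f}%")
```

All three conditions imply the same baseline of 77.11%, which indicates that the benefits were computed against a single no-device reference averaged across SNRs and noise types.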
In addition, an iPhone and the AP can work as a portable RM system in daily communication for listeners with TH. This is similar to the findings in Table 1 on the benefit of using the AP as a hearing assistance device [21,22,23,24,25,26]. Given the widespread adoption and convenience of smartphones and earbuds, along with the lower cost compared to Roger devices, smartphone-based RM systems can be considered as alternative RM systems for TH individuals with hearing difficulties in noise, particularly those concerned about cost or the stigma associated with traditional hearing devices. Additionally, with the governmental approval of the software for iPhones to adjust the AP for persons with mild-to-moderate hearing impairments, such systems may function both as RM systems and as hearing assistive devices. However, further research on humans is required before these solutions can be suggested as part of clinical protocols.
Using VTT and KEMAR has the potential to provide objective verification of new technological features, especially in situations where human participants are unavailable due to constraints such as COVID-19 or lack of funding. This approach can yield objective results that are not influenced by human factors such as age, gender, emotional status, personality, and cognitive function. However, to ensure comparability across different studies, the version of the transcription application (e.g., Otter) should be specified, given the rapid pace of feature development in such tools.
Although the results suggested a promising testing method for research and highlighted the potential benefits of using RM for listeners with TH, there are limitations to consider. Firstly, the use of KEMAR may restrict the applicability of the results to humans. Further research involving human participants with various profiles (e.g., age, cognition, and hearing status) is necessary to validate these findings before adopting such a smartphone-based RM system for rehabilitative solutions. Secondly, only one smartphone-based RM system using an iOS device and a single type of headphone was evaluated. This limitation restricts the generalizability of the results to other smartphone platforms (e.g., Android) and different headphones with similar features like LL. Finally, while the cost of an iPhone and the AP may be lower than that of a sophisticated Roger system, these devices may still be expensive for individuals, especially if they are not iPhone users.
This study provides preliminary evidence on the AP’s performance as part of an RM system relative to a Phonak RM system, without considering human participant variability. While the results suggest that the AP may help improve speech recognition in noise under controlled conditions, further research with humans is needed to validate these findings. Suggested future steps include the following: (1) Human subject validation: Evaluating performance across diverse populations (varying hearing levels, ages, genders, and cognitive abilities) to confirm real-world applicability. (2) Behavioral response analysis: Examining differences in human performance (speech recognition, listening effort) and subjective evaluation (sound quality perception, technology acceptance) under the same testing conditions in this study. (3) Signal transmission optimization: Engineering studies to address Bluetooth latency issues, ensuring real-time auditory synchronization meets clinical requirements for assistive listening. These investigations will improve the generalizability of the results and provide deeper insights into the clinical feasibility of AP as an assistive listening device.