Article

Combined Effects of Speech Features and Sound Fields on the Elderly’s Perception of Voice Alarms

School of Architecture, Tianjin University, Tianjin 300072, China
*
Author to whom correspondence should be addressed.
Acoustics 2026, 8(1), 2; https://doi.org/10.3390/acoustics8010002
Submission received: 29 October 2025 / Revised: 10 December 2025 / Accepted: 21 December 2025 / Published: 24 December 2025

Abstract

Using efficient voice alarms to ensure safe evacuation is important during emergencies, especially for the elderly. Factors that strongly influence speech perception have been investigated for several years. However, relatively few studies have specifically explored the key factors influencing perceptions of voice alarms in emergency situations. This study investigated the combined effects of speech rate (SR), signal-to-noise ratio (SNR), and reverberation time (RT) on older people’s perception of voice alarms. Thirty older adults were invited to evaluate speech intelligibility, listening difficulty, and perceived urgency after hearing 48 different voice alarm conditions. For comparison, 22 young adults were also recruited for the same experiment. The results for older adults showed that: (1) When SR increased, speech intelligibility significantly decreased, and listening difficulty significantly increased. Perceived urgency reached its maximum at the normal speech rate for older adults, in contrast to young adults, for whom urgency was greatest at the fast speech rate. (2) With rising SNR, speech intelligibility and perceived urgency significantly increased, and listening difficulty significantly decreased. In contrast, with rising RT, speech intelligibility and perceived urgency significantly decreased, while listening difficulty significantly increased. (3) RT exerted a relatively stronger independent influence on speech intelligibility and listening difficulty among older adults than among young adults, and this influence tended not to be substantially moderated by SR or SNR. The interactive effect of SR and RT on perceived urgency was significant for older people, but not for young people. These findings provide referential strategies for designing efficient voice alarms for the elderly.

1. Introduction

Safe evacuation from buildings has been studied extensively. Improving the evacuation efficiency during an emergency is important, especially for the elderly. Owing to a decline in speech perception ability, cognitive ability, and physical function, it is more difficult for the elderly to perceive the occurrence of an emergency and deal with danger, which results in higher requirements for evacuation safety [1]. Auditory alarms are one of the most common measures to disseminate evacuation information in case of an emergency because they make people recognize danger faster, reduce response time, and improve evacuation efficiency [2]. Auditory alarms can be categorized into alarm signals and voice alarms [3]; compared with alarm signals, voice alarms can reduce pre-movement response time and are more efficient in motivating people to evacuate [4].
The ideal aim of speech transmission is to enable the listeners to hear the voice easily and understand the information accurately [5]. Ensuring that voice alarm broadcasts are clear and understandable during an emergency facilitates safe evacuation. Therefore, the speech intelligibility to listeners is important in designing efficient voice alarms. It is very common for the elderly to have a certain degree of hearing loss [6], and have more difficulties in understanding speech even in an ideal acoustic environment, which makes speech intelligibility more important for them than for young people [7,8]. In fact, owing to the influence of noise and reverberation, people may exert much more cognitive effort to obtain a greater understanding of speech, and this effort may be overlooked when only speech intelligibility is measured [9]. To investigate the listening effort, some studies introduced “listening difficulty” and measured it with subjective rating [10,11], which has been widely used in speech assessment [12,13]. In addition, perceived urgency is an important indicator in voice alarm assessment since people’s low level of perceived risk and evacuation awareness always reduces evacuation efficiency [14,15]. Therefore, voice alarms should attract people’s attention and arouse a sense of urgency.
Previous studies found that speech features and sound fields are two essential attributes that affect speech perception. Speech features mainly refer to the acoustic characteristics of speech, and factors such as speech rate, loudness, and frequency have been shown to have significant effects on speech perception [16,17,18]. Speech rate has proven to be a very important factor, and in general, reducing speech rate is an effective way to increase speech intelligibility [19,20]. However, some studies indicated that speaking too slowly may not be appropriate for emergency broadcasts, because it would reduce the perceived urgency of the alarm and result in delayed evacuation [21,22]. Regarding sound fields, many studies have indicated that factors such as the signal-to-noise ratio and reverberation time have significant effects on speech perception [23,24]. In particular, these factors have a greater impact on older adults than on younger adults [7,25].
In the current acoustic and speech research fields, studies on the factors influencing speech perception in general circumstances are relatively comprehensive. However, few studies have focused on emergency situations and explored the combined influence of speech features and sound fields on the perception of voice alarms, especially for older adults. Therefore, this study aimed to comprehensively explore the combined effects of speech features and sound field factors on the elderly’s perception of voice alarms, determine the influencing mechanism, compare the differences between older and young people, and provide guidelines for the design of an efficient voice alarm for the elderly. The specific research objectives are to identify the following:
(1)
The main effects of speech features and sound fields on older adults’ perception of voice alarms.
(2)
The interactive effects of speech features and sound fields on older adults’ perception of voice alarms.
(3)
The differences between the young and the elderly in the effects of speech features and sound fields on the perception of voice alarms.

2. Materials and Methods

2.1. Participants

Thirty adults aged 60 years or older, with no cognitive disorders, participated in the experiment (12 males and 18 females, mean age = 63.43 ± 4.49 years). An air-conductive pure-tone test with a MAICO MA 51 audiometer was conducted to test the hearing level of each participant, and the average value of pure-tone audiometry (PTA) at octave frequencies of 500–4000 Hz for both ears was used to measure their hearing loss. According to the criteria for hearing loss from the World Health Organization in 2021, 22 older adults had a normal hearing condition, while eight older adults had a mild hearing loss (PTA = 20–35 dB), which was within the normal range for their ages and was not expected to have an influence on the effectiveness of the experiment. To compare different ages, 22 students from Tianjin University were also recruited to participate in the same experiment (10 males and 12 females, mean age = 25.18 ± 1.47 years), all of whom had normal hearing ability (PTA < 20 dB). Therefore, a total of 52 participants, who speak Mandarin as their first language, took part in the experiment.
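The screening rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thresholds for the example participant are hypothetical, the function names are ours, and the upper bound of the mild category is treated as exclusive (PTA < 35 dB), which the text does not specify.

```python
from statistics import mean

# Octave frequencies from 500 to 4000 Hz, as stated in the text.
PTA_FREQUENCIES_HZ = (500, 1000, 2000, 4000)


def pure_tone_average(left_db, right_db):
    """Average the hearing thresholds (dB) over both ears and all PTA frequencies."""
    thresholds = [ear[f] for ear in (left_db, right_db) for f in PTA_FREQUENCIES_HZ]
    return mean(thresholds)


def classify_hearing(pta_db):
    """Apply the two categories used in the text; other grades are out of scope here."""
    if pta_db < 20:
        return "normal"
    if pta_db < 35:  # assumption: upper bound of the mild range is exclusive
        return "mild"
    return "beyond mild"


# Hypothetical participant (not taken from the study data).
left = {500: 15, 1000: 20, 2000: 25, 4000: 40}
right = {500: 10, 1000: 20, 2000: 30, 4000: 40}
pta = pure_tone_average(left, right)
print(pta, classify_hearing(pta))
```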
This study was approved by the Ethics Committee of Tianjin University. All participants gave written consent before the experiment began.

2.2. Materials

2.2.1. Visual Materials

To create real-life settings, visual stimuli were provided in the experiment. Older people are familiar with hospital settings, which have high population density and considerable noise. As it is necessary to investigate how to improve evacuation efficiency in these special places, a 5 min video of public spaces in a large hospital was recorded as visual material (Figure 1).

2.2.2. Sound Materials

The voice alarm materials used in our study were formulated by referring to relevant research and the voice alarm broadcasts that were applied to actual scenes [4,26]. Each alarm sentence contained three parts: a call for attention, information about the emergency, and instructions on what to do. All sentences were controlled to be within 30 words, and all words were commonly used in daily life to ensure that the sentences were easy to understand. One example was: “Attention please! Attention please! There is an emergency situation reported around the building. Please stay at your current location and await further instructions. DO NOT exit the building.” In this study, 48 alarm sentences were formulated and pronounced by a professional Chinese male voice actor in a recording room and saved in WAV format. Signals were recorded at about 4.0 sps with 44,100 Hz sampling frequency.
Our previous study [27] found that the perception of voice alarms for the elderly is mainly affected by the speech rate (SR), signal-to-noise ratio (SNR), and reverberation time (RT). Therefore, in this study, the SR was selected as the factor of speech features, and the SNR and RT were selected as factors of the sound field to create different sound conditions.
(1)
Speech rate (SR)
SR is the number of words or syllables produced per unit of time and can be measured in words per minute (wpm) or syllables per second (sps). Some research has found that 4–4.3 sps can represent a normal Chinese SR [19]. Therefore, in our experiment, 4.0 sps was established as the normal rate. The iZotope Radius algorithm in Adobe Audition 2020 was used, with the Stretch parameter adjusted to compress or expand the time domain of voice alarm materials by 20%, thereby creating slow and fast speech rates. This approach ensured clear differences between rate conditions while avoiding noticeable voice distortion. Finally, three different SR levels were used in the experiment: slow (3.2 sps), normal (4.0 sps), and fast (4.8 sps).
(2)
Signal-to-noise ratio (SNR)
To determine the sound pressure level of the stimuli, the ambient noise levels in different hospital areas were measured before the experiment, ranging from 55 dB to 75 dB. To ensure that the sound stimuli contained the acoustic conditions existing in real hospital environments as much as possible, four levels of SNR (0 dB, 5 dB, 10 dB, 15 dB) were set in the experiment. Sounds in a hospital were recorded as the background noise of the experiment, with footsteps, human voices, and broadcasts as the main sound sources. Adobe Audition 2020 was used to control the background noise and adjust the sound pressure level of alarms to create different levels of SNR. The sound pressure level of the background noise was normalized to 60 dB using a multifunctional sound level meter (AWA6228) at the participants’ position.
(3)
Reverberation time (RT)
Before the experiment, RT conditions in real hospital environments were also measured, and the range was generally from 0.7 to 2.2 s. Based on the measured reverberation conditions, and in order to ensure that there were differences in the perception of each RT level, four RT conditions were set up for the experiment: 0.5 s, 1.0 s, 1.5 s, and 2.0 s. Two rectangular ODEON models with different volumes (16.0 m × 8.4 m × 5.1 m and 17.6 m × 17.6 m × 4.7 m) were constructed to conduct the acoustic simulation. By adjusting the sound absorption materials of the model surfaces, including walls, ceilings, and floors, the models were tuned to achieve the four RT conditions, and the corresponding impulse responses were subsequently generated. The alarm materials were convolved with the impulse responses of each RT condition in Adobe Audition 2020 to generate voice alarms with reverberation characteristics for the listening experiment.
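The convolution step can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which relied on impulse responses simulated in the room models and on Adobe Audition 2020. Here the impulse response is a synthetic stand-in (exponentially decaying noise whose amplitude falls by 60 dB after the chosen RT), the dry signal is a short sine tone, and all function names are hypothetical.

```python
import math
import random


def synthetic_impulse_response(rt60_s, sample_rate_hz, length_s):
    """Decaying noise whose amplitude drops by 60 dB (factor 10**-3) after rt60_s seconds."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    n = int(length_s * sample_rate_hz)
    return [
        rng.uniform(-1.0, 1.0) * 10 ** (-3.0 * (i / sample_rate_hz) / rt60_s)
        for i in range(n)
    ]


def convolve(signal, impulse_response):
    """Direct-form linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


fs = 1000  # low rate keeps the toy example fast; the recordings used 44,100 Hz
dry = [math.sin(2 * math.pi * 100 * i / fs) for i in range(50)]
ir = synthetic_impulse_response(rt60_s=0.5, sample_rate_hz=fs, length_s=0.5)
wet = convolve(dry, ir)
print(len(dry), len(ir), len(wet))
```

The reverberant signal is longer than the dry one by the length of the impulse response minus one sample, which is the decaying tail that a longer RT stretches out.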
Finally, 48 sound conditions with three SR levels, four SNR levels, and four RT levels were established for the experiment (Figure 2). All the auditory files were monaural in WAV format and presented via two loudspeakers from both sides of the participants.
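As a rough check of the design, the full factorial grid can be enumerated in Python. The SR, SNR, and RT levels and the 60 dB background level are taken from the text. The alarm level per condition is our own inference from the definition of SNR as the level difference between alarm and noise, and is not reported in the study.

```python
from itertools import product

NORMAL_RATE_SPS = 4.0   # normal speech rate from the text
RATE_CHANGE = 0.20      # 20% compression or expansion
NOISE_LEVEL_DB = 60     # background noise level at the listening position

speech_rates = {
    "slow": round(NORMAL_RATE_SPS * (1 - RATE_CHANGE), 1),
    "normal": NORMAL_RATE_SPS,
    "fast": round(NORMAL_RATE_SPS * (1 + RATE_CHANGE), 1),
}
snr_levels_db = (0, 5, 10, 15)
rt_levels_s = (0.5, 1.0, 1.5, 2.0)

# One dictionary per sound condition: 3 SR x 4 SNR x 4 RT.
conditions = [
    {
        "sr": name,
        "sr_sps": sps,
        "snr_db": snr,
        "rt_s": rt,
        # Assumption: alarm level = noise level + SNR.
        "alarm_level_db": NOISE_LEVEL_DB + snr,
    }
    for (name, sps), snr, rt in product(speech_rates.items(), snr_levels_db, rt_levels_s)
]

print(len(conditions))
print(speech_rates)
```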

2.3. Indicators

In this experiment, speech intelligibility was used to measure the clarity of voice alarms perceived by the participants, and two indicators, listening difficulty and perceived urgency, were selected to investigate the participants’ subjective assessments of voice alarms.
(1)
Speech intelligibility
Intelligibility refers to the recognition of speech stimuli and the accuracy of the verbal response [19]. In this experiment, the voice alarm materials were played through loudspeakers (e.g., “Attention! Attention! There has been a fire reported on the fifth floor. Please evacuate immediately to the nearest emergency exit. Do not use the elevator”). The underlined parts represented the key information of the alarm. Subjects were required to accurately repeat the key information orally after the voice alarm finished. The percentage of key information that the subjects repeated correctly was recorded as the intelligibility score. As for the key information of each underlined part, in addition to the example provided above, three to five additional types of information, such as emergency types and instructions on behaviors, were provided to form different alarm messages.
(2)
Subjective assessments
Subjective assessments included listening difficulty and perceived urgency. For the listening difficulty, the subjects answered a question, “Do you find it difficult to hear and understand the voice of the alarm as you listen to it?” A 5-point rating scale ranging from −2 (very easy) to 2 (very difficult) was used for the evaluation. For the perceived urgency, the subjects answered a question, “Do you think the voice alarm broadcast can get your attention and give enough urgency?” The question was evaluated using a 5-point rating scale from −2 (extremely no) to 2 (extremely yes).
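The intelligibility score described above is a simple percentage, sketched below in Python. The key items in the example are hypothetical, chosen by us from the sample sentence; the study does not list how its alarm sentences were segmented, and matching by exact lower-cased text is a simplification of the experimenter's judgement.

```python
def intelligibility_score(key_items, repeated_items):
    """Percentage of key information items that the listener repeated correctly."""
    if not key_items:
        raise ValueError("at least one key item is required")
    repeated = {item.strip().lower() for item in repeated_items}
    correct = sum(1 for item in key_items if item.strip().lower() in repeated)
    return 100.0 * correct / len(key_items)


# Hypothetical segmentation of the example alarm into key items.
key_items = [
    "fire",
    "fifth floor",
    "evacuate immediately",
    "nearest emergency exit",
    "do not use the elevator",
]
# Hypothetical response: the listener recalled three of the five items.
response = ["Fire", "fifth floor", "nearest emergency exit"]
score = intelligibility_score(key_items, response)
print(score)
```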

2.4. Procedures

The experiment was conducted in a semi-anechoic room to minimize the influence of the background noise and reverberation of the room itself when playing the voice alarms. A laptop connected to a projector (XGIMI, Chengdu, China) projected the recorded visual stimuli onto a screen directly in front of the seat, while another laptop, connected to a Steinberg sound card and two loudspeakers (Genelec 7350A, Iisalmi, Finland), played the sound materials. The loudspeakers were placed on the left and right sides of the subject, at a height of 1.5 m above the ground and a distance of 2 m from the subject. Before beginning the experiment, the sound pressure level of the stimuli was calibrated using a multifunctional sound level meter (AWA6228, Hangzhou, China) at the listening position. The connections and locations of the experimental instruments are shown in Figure 3.
Before the experiment, the subjects were informed of the test contents and procedures. Three practice materials were provided so that the participants could familiarize themselves with the sound stimuli. When the formal experiment began, the subjects were told to imagine that they were in a hospital with an errand to attend to, and the visual stimuli were first presented for approximately 20 s. Then, the voice alarm was played twice with a 2 s interval. Subsequently, the subject completed the speech intelligibility test and subjective assessments, and then watched the visual material and heard another sound stimulus. All voice alarms were presented in random order. To avoid the influence of fatigue on the effectiveness of the experiment, the participants had a regular 3 min break after evaluating 12 voice alarm stimuli, and they could ask for a break at any time when they felt tired. The average length of the experiment was about 1 h for each subject. The experimental procedure is illustrated in Figure 4.
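The presentation schedule can be sketched as follows, assuming a simple shuffle of the 48 conditions per participant split into blocks of 12, with a break between blocks. The study states only that the alarms were randomly presented, so the shuffling method, the seed, and the function name are illustrative assumptions.

```python
import random


def presentation_blocks(n_stimuli=48, block_size=12, seed=None):
    """Shuffle stimulus indices and split them into blocks separated by breaks."""
    rng = random.Random(seed)  # a per-participant seed makes the order reproducible
    order = list(range(n_stimuli))
    rng.shuffle(order)
    return [order[i:i + block_size] for i in range(0, n_stimuli, block_size)]


blocks = presentation_blocks(seed=1)
n_breaks = len(blocks) - 1  # regular 3 min breaks fall between consecutive blocks
print(len(blocks), n_breaks)
```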

2.5. Data Analysis

Data analysis was performed using SPSS software (version 25, IBM Corporation, Armonk, NY, USA). The normality of the residuals of each dependent variable was examined using histogram plots and Quantile-Quantile (Q-Q) plots. The histograms with normal curves showed that the distribution of residuals closely approximated a normal distribution; consistently, in the Q-Q plots, the points fell close to the diagonal line, indicating that the residuals were approximately normally distributed. Therefore, a series of factorial analyses of variance (ANOVA), which can be used to explore the effects of two or more factors on a single dependent variable, were performed to explore the main and interactive effects of SR, SNR, and RT on speech intelligibility and subjective assessments. The partial eta squared (η2p) was used to compare the effect size of each independent variable, where a higher η2p indicated a greater effect. Post hoc analyses were conducted using Bonferroni corrections for multiple comparisons when significant effects were observed. Each analysis was performed at a 95% confidence level, with statistical significance set at p < 0.05.
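For readers less familiar with these statistics, the two quantities can be written out in a short Python sketch. The analysis itself was run in SPSS; the functions below only restate the standard definitions (partial eta squared as the effect sum of squares divided by the sum of effect and error sums of squares, and the Bonferroni adjustment as multiplying each p value by the number of comparisons, capped at 1). The numbers in the example are hypothetical and are not results from this study.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Effect size: SS_effect / (SS_effect + SS_error)."""
    total = ss_effect + ss_error
    if ss_effect < 0 or ss_error < 0 or total == 0:
        raise ValueError("sums of squares must be non-negative and not both zero")
    return ss_effect / total


def bonferroni(p_values):
    """Adjust p values for multiple comparisons by multiplying by their count."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical sums of squares and pairwise p values, for illustration only.
eta = partial_eta_squared(ss_effect=20.0, ss_error=100.0)
adjusted = bonferroni([0.01, 0.04, 0.5])
significant = [p < 0.05 for p in adjusted]
print(round(eta, 3), significant)
```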

3. Results

3.1. The Main Effects of Speech Features and Sound Fields

Table 1 shows the main effects of speech features and sound fields on older people’s perceptions of voice alarms. The SR, SNR, and RT had significant independent effects on speech intelligibility, listening difficulty, and perceived urgency. RT had the greatest effect on speech intelligibility (η2p = 0.167) and listening difficulty (η2p = 0.216). SNR had the greatest effect on perceived urgency among the three factors, while the effect size fell within the medium range (η2p = 0.045), indicating a moderate influence.
As shown in Figure 5, the results of the post hoc test indicated that, for the effect of SR, the speech intelligibility significantly decreased from 64.49% to 56.94% and the listening difficulty significantly increased from −0.31 to 0.15 when SR increased from slow to normal. In the fast rate condition, speech intelligibility significantly decreased to below 50% and listening difficulty significantly rose to 0.34. Thus, a slow speech rate, which was 3.2 sps in our study, might be more helpful for older people to effortlessly hear and understand voice alarms. Conversely, the perceived urgency significantly improved from 0.07 to 0.31 when the SR changed from slow to normal. It then decreased to 0.20 in the fast rate condition, although the change was not significant. In general, all three SR conditions could maintain the perceived urgency at a relatively high level (> 0), but the normal speech rate, which was 4.0 sps in our study, tends to support older people in better perceiving the urgency of voice alarms.
As shown in Figure 6, regarding the effect of the SNR, the results showed that when the SNR increased, speech intelligibility and perceived urgency showed an increasing trend, while listening difficulty declined. Specifically, the speech intelligibility significantly increased from 44.24% to 58.01% and the listening difficulty significantly decreased from 0.49 to −0.06 when SNR was improved from 0 dB to 5 dB, but then the changes were not significant when SNR was higher than 5 dB. These results suggest that an SNR of around 5 dB tends to support better hearing and comprehension of voice alarms among older adults. As for perceived urgency, the changes were not significant when the SNR increased from 0 to 5 dB and from 10 to 15 dB. However, when the SNR increased from 5 to 10 dB, the perceived urgency increased significantly from 0.02 to 0.40, implying that an SNR near 10 dB might be particularly conducive to enhancing the perceived urgency of voice alarms.
In contrast to the effect of SNR, the results in Figure 7 showed that with the increase in RT, speech intelligibility and perceived urgency decreased, while listening difficulty increased. Specifically, in the 0.5 s condition, speech intelligibility was 71.17% and perceived urgency was 0.45, which were significantly higher than the scores in the other three RT conditions. The listening difficulty was −0.82 in the 0.5 s condition and significantly lower than that in other RT conditions. When RT was increased to 2.0 s, the speech intelligibility significantly reduced to 36.10% and the listening difficulty significantly rose to 0.85, while the perceived urgency did not significantly change when RT was higher than 0.5 s. These results suggested that an RT of less than 0.5 s might provide favorable conditions for older adults to clearly hear and understand voice alarms and to perceive urgency more easily, while the efficiency of voice alarms might particularly decline when RT exceeds 1.5 s.

3.2. The Interactive Effects of Speech Features and Sound Fields

The results showed that SR and SNR had significant interactive effects on speech intelligibility and listening difficulty (Table 1). As shown in Figure 8, in the slow-rate condition, speech intelligibility increased and listening difficulty decreased as the SNR improved, suggesting that when the speech rate is slow, enhancing the SNR might help older adults more clearly hear and understand voice alarms, particularly within the range from 0 to 5 dB. In the normal-rate condition, speech intelligibility increased and listening difficulty decreased when SNR varied from 0 to 10 dB, while speech intelligibility began to decrease and listening difficulty began to increase when SNR changed from 10 to 15 dB. Therefore, to maintain voice alarms that are clear and easy to listen to under a normal speech rate, it might be more helpful to maintain the SNR below approximately 10 dB. In the fast-rate condition, despite the rising SNR, speech intelligibility remained at a relatively low level and listening difficulty remained at a relatively high level unless SNR was 15 dB. This suggests that when the speech rate is fast, increasing SNR might not substantially improve speech intelligibility and reduce listening difficulty.
The interaction between SR and RT significantly affected perceived urgency (Table 1). As shown in Figure 9, in the 0.5 s RT condition, the perceived urgency significantly improved with the increase in SR, while in the higher RT conditions, the perceived urgency increased when SR increased from slow to normal, but decreased when the speech rate was fast, especially in the 2.0 s condition. These findings suggested that when the RT was below 0.5 s, increasing the speech rate might help enhance the perceived urgency of voice alarms. However, under conditions of higher reverberation, increasing the speech rate did not appear to substantially improve perceived urgency, and a normal speech rate tended to be the most effective for maintaining perceived urgency.

3.3. Differences Between Older and Young People

To further explore the characteristics of efficient voice alarms for elderly people, this study compared older and younger people regarding the effects of speech features and sound fields on speech intelligibility and subjective assessments. Table 2 presents the ANOVA results of the elderly and young groups. A comparison of the main effects of the three factors shows that the influence of SR on perceived urgency differed between the age groups. With an increase in SR, the perceived urgency among young adults showed an overall rising trend, suggesting that a faster rate may help enhance the perceived urgency of voice alarms for this group. For older adults, however, the highest urgency rating was observed under the normal speech rate condition, while both speeding up and slowing down the rate appeared to reduce perceived urgency (Figure 10).
Comparing the interactive effects of these factors, it can be seen that, in contrast to the elderly group, the interactive effects between RT and SR and between RT and SNR were significant in the young group. These results suggested that RT exerted a relatively stronger independent influence on speech intelligibility and listening difficulty among older adults, and this effect tended not to be substantially moderated by SR or SNR. For perceived urgency, the interaction between SR and RT differed across age groups, reaching significance for older adults but not for young adults.

4. Discussion

4.1. Factors Influencing Speech Intelligibility and Listening Difficulty

Our results indicated some important findings regarding the combined effects of SR, SNR, and RT on speech intelligibility and listening difficulty of voice alarms for the elderly. First, the results showed that as SR increased, speech intelligibility decreased and listening difficulty increased. This was consistent with previous studies suggesting that a slow speaking rate was advantageous to speech understanding or word recognition [14]. Some studies have also found that people in degraded listening conditions preferred a slightly slow speech rate [28,29], which supports our finding that older people performed better in the slow rate condition.
Regarding the effects of SNR and RT, our study was consistent with previous findings that noise and reverberation are important factors influencing speech intelligibility and listening difficulty [30,31]. Our study also showed that when SNR was greater than 5 dB, there was no significant change in speech intelligibility and listening difficulty, which corresponded with previous research showing that SNR has a saturation effect: above a certain value, its influence tends to become insignificant [10,23].
Regarding the interactive effects on speech intelligibility and listening difficulty, the results showed that the interaction between SR and SNR was statistically significant. For the elderly, when the speech rate was fast, speech intelligibility remained relatively low and listening difficulty remained relatively high despite the increase in SNR. This finding supports a previous study indicating that, because of the decline in cognitive processing speed, older adults are more sensitive to changes in speaking rate, and a slow speech rate is especially beneficial for them [32]. Our study also found that RT exerted a relatively stronger independent influence, which was also a crucial difference between the young and elderly. This might be because older people are generally much more influenced by informational masking [33]. Aging is not only accompanied by hearing loss, but also by age-related declines in cognition and auditory processing, which result in speech perception difficulties even for older adults with normal hearing thresholds [34]. In the condition of informational masking, for example, higher reverberation, speech perception requires more cognitive load, which leads to higher listening difficulty and worse speech intelligibility for normal hearing older adults than for normal hearing young adults [35]. This could explain our finding that the independent effect of RT was greater for the elderly and was not considerably influenced by other factors. Kwak et al. also found that compared to younger people, the older group was more affected by the reverberation condition in the sentence recognition task [36].

4.2. Factors Influencing Perceived Urgency

In our study, the SR, SNR, and RT had significant effects on perceived urgency. In general, the perceived urgency improved when the SNR increased and RT decreased. These results are in line with those of a previous study, indicating greater perceived urgency as the SNR increased when subjects were exposed to auditory warnings [37]. However, the effect of SR showed that the perceived urgency was highest in the normal rate condition for the elderly; speeding up the rate could not enhance perceived urgency, which was another main difference between the young and elderly groups. This result seemed not to be consistent with some previous findings that a fast speech rate could improve perceived urgency [38]. A possible explanation is that it is difficult for older people to understand rapid speech information, especially in high noise and reverberation conditions [2,39], which might influence their perception of the urgency of the alarm. The interactive effect of SR and RT in our study might support this explanation: perceived urgency kept increasing with rising SR only in the 0.5 s RT condition. When there was higher reverberation, the perceived urgency significantly decreased in the fast-rate condition.

4.3. Guidelines for the Design of Efficient Voice Alarms for the Elderly

Factors influencing speech perception have been studied for several years. However, the existing research mainly focuses on the evaluation and prediction of speech intelligibility in indoor environments using acoustic parameters [40,41,42], with few providing effective design strategies to enhance speech perception, especially for voice alarms. The present study revealed the effects of SR, SNR, and RT on older adults’ perception of voice alarms. By summarizing our results, some referential strategies can be provided for the design of efficient voice alarms in China that ensure both intelligibility and urgency. First, a slow SR (3.2 sps) appeared to be the most efficient condition overall. A normal SR (4.0 sps) tended to increase urgency, although it reduced intelligibility and increased listening difficulty. In addition, when the speech rate was slow, maintaining an SNR of around 5 dB or higher appeared beneficial for both intelligibility and urgency, whereas keeping the SNR below approximately 10 dB was more suitable when the rate was normal. Taken together, an SNR in the range of 5–10 dB might represent a generally advantageous condition. Finally, shorter reverberation times, particularly those below 0.5 s, were associated with better efficiency, which tended to decline when RT exceeded about 1.5 s.

4.4. Limitations and Further Research

Although this study provides evidence and useful guidelines for designing efficient voice alarms, it has some limitations. First, the effects might vary according to different characteristics of the elderly, such as gender or noise sensitivity. Future studies should consider these personal factors. In addition, it should be noted that our experiment was conducted in a controlled laboratory environment with participants remaining stationary. As a result, the study did not account for factors such as participants’ movement or competing voices in real-world settings, both of which may influence the perception of voice alarms. Future research should incorporate these ecologically relevant factors through a more realistic and rigorous experimental design. Finally, in addition to speech intelligibility and subjective assessments, evacuation behavior is also important for safe evacuation [43]. Therefore, in future research, it will be necessary to examine older people’s risk behaviors during evacuations in order to design voice alarms to reduce these behaviors.

5. Conclusions

This study investigated the main and interactive effects of SR, SNR, and RT on elderly people’s perception of voice alarms and compared the results with those of a young group. Three major results were obtained:
(1) SR had a significant main effect on speech intelligibility and subjective assessment. As SR increased, speech intelligibility significantly decreased and listening difficulty significantly increased. Perceived urgency, however, increased from the slow rate to the normal rate and then decreased at the fast rate. This differed from the young group, in which perceived urgency increased continuously and was highest at the fast rate.
(2) SNR and RT also had significant main effects on speech intelligibility and subjective assessments. With an increase in SNR, speech intelligibility and perceived urgency significantly increased, and listening difficulty significantly decreased. In contrast, with rising RT, speech intelligibility and perceived urgency significantly decreased, while listening difficulty significantly increased.
(3) In contrast to the young group, the only significant interactive effect on speech intelligibility and listening difficulty in the elderly group was that between SR and SNR. RT exerted a relatively stronger independent influence on speech intelligibility and listening difficulty among older adults, and this effect tended not to be substantially moderated by SR or SNR. Regarding perceived urgency, the interactive effect of SR and RT was significant for older adults but not for young adults. In the 0.5 s RT condition, perceived urgency increased significantly with SR, while in the higher RT conditions it increased from the slow to the normal rate but decreased at the fast rate.
These findings contribute to research on the efficiency of voice alarms and can provide reference strategies for designing efficient voice alarms for the elderly.

Author Contributions

Conceptualization, H.M. and C.W.; methodology, H.M., W.W. and C.W.; software, W.W. and C.W.; formal analysis, W.W.; investigation, W.W.; writing—original draft preparation, Q.C.; writing—review and editing, H.M.; visualization, Q.C.; funding acquisition, H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 51978454.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors are grateful to the elderly people and students for participating in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Füllgrabe, C.; Moore, B.C.J.; Stone, M.A. Age-group differences in speech identification despite matched audiometrically normal hearing: Contributions from auditory temporal processing and cognition. Front. Aging Neurosci. 2015, 6, 347.
  2. Xia, X.; Li, N.; Gonzalez, V.A. Exploring the Influence of Emergency Broadcasts on Human Evacuation Behavior during Building Emergencies Using Virtual Reality Technology. J. Comput. Civil Eng. 2021, 35, 04020065.
  3. Taylor, J.R.I.; Wogalter, M.S. Specific egress directives enhance print and speech fire warnings. Appl. Erg. 2019, 80, 57–66.
  4. Purser, D. Comparisons of Evacuation Efficiency and Pre-Travel Activity Times in Response to a Sounder and Two Different Voice Alarm Messages; Springer: Berlin/Heidelberg, Germany, 2009; pp. 121–134.
  5. Sato, H.; Morimoto, M.; Wada, M. Relationship between listening difficulty rating and objective measures in reverberant and noisy sound fields for young adults and elderly persons. J. Acoust. Soc. Am. 2012, 131, 4596–4605.
  6. Honghu, Z.; Jia, Y.; Jianxin, P. Chinese speech intelligibility of elderly people in environments combining reverberation and noise. Appl. Acoust. 2019, 150, 1–4.
  7. Zeng, J.; Peng, J.; Zhao, Y. Comparison of speech intelligibility of elderly aged 60–69 years and young adults in the noisy and reverberant environment. Appl. Acoust. 2020, 159, 107096.
  8. Chlasta, K.; Struzik, P.; Wójcik, G.M. Enhancing dementia and cognitive decline detection with large language models and speech representation learning. Front. Neuroinform. 2025, 19, 1679664.
  9. Visentin, C.; Prodi, N.; Cappelletti, F.; Torresin, S.; Gasparella, A. Using listening effort assessment in the acoustical design of rooms for speech. Build. Environ. 2018, 136, 38–53.
  10. Morimoto, M.; Sato, H.; Kobayashi, M. Listening difficulty as a subjective measure for evaluation of speech transmission performance in public spaces. J. Acoust. Soc. Am. 2004, 116, 1607–1613.
  11. Sato, H.; Bradley, J.S.; Morimoto, M. Using Listening Difficulty Ratings of Conditions for Speech Communication in Rooms. J. Acoust. Soc. Am. 2005, 117, 1157–1167.
  12. Yancey, C.M.; Barrett, M.E.; Gordon-Salant, S.; Brungart, D.S. Binaural advantages in a real-world environment on speech intelligibility, response time, and subjective listening difficulty. JASA Express Lett. 2021, 1, 14406.
  13. Dillon, H.; Gaikwad, S.; Luengtaweekul, P.; Buchholz, J.; Cameron, S. Development of the Test of Listening Difficulties-Universal and Australian Normative Data in Children and Adults. J. Speech Lang. Hear. Res. 2025, 68, 12.
  14. Ofuji, K.; Ogasawara, N. Verbal disaster warnings and perceived intelligibility, reliability, and urgency: The effects of voice gender, fundamental frequency, and speaking rate. Acoust. Sci. Technol. 2018, 39, 56–65.
  15. Kaspierowicz, I.; Sato, S. Predictive models for urgency perception in railway crossing alarm signals: Development and applications for Argentina. Appl. Acoust. 2025, 240, 110926.
  16. Peng, J.; Yan, N.; Wang, D. Chinese speech intelligibility and its relationship with the speech transmission index for children in elementary school classrooms. J. Acoust. Soc. Am. 2015, 137, 85–93.
  17. Hodoshima, N. Effects of urgent speech and preceding sounds on speech intelligibility in noisy and reverberant environments. In Proceedings of the 17th Annual Conference of the International Speech Communication Association (Interspeech 2016), San Francisco, CA, USA, 8–16 September 2016; Volume 1–5, pp. 1696–1699.
  18. Ghasemi, M.; Fullenkamp, A.M.; Whitfield, J.A. Consistency of Order Effects in Higher Effort Speaking Styles Between Sessions. J. Speech Lang. Hear. Res. 2025, 68, 3628–3645.
  19. Chan, A.H.S.; Lee, P.S.K. Intelligibility and preferred rate of Chinese speaking. Int. J. Ind. Ergon. 2005, 35, 217–228.
  20. Smiljanic, R.; Bradlow, A.R. Speaking and Hearing Clearly: Talker and Listener Factors in Speaking Style Changes. Lang. Linguist. Compass 2009, 3, 236–264.
  21. Yokoyama, S.; Tachibana, H. Subjective experiment on suitable speech-rate of public address announcement in public spaces. Proc. Mtgs. Acoust. 2013, 19, 015081.
  22. Hodoshima, N. Effects of urgent speech and congruent/incongruent text on speech intelligibility for older adults in the presence of noise and reverberation. Speech Commun. 2021, 134, 12–19.
  23. Rennies, J.; Schepker, H.; Holube, I.; Kollmeier, B. Listening effort and speech intelligibility in listening situations affected by noise and reverberation. J. Acoust. Soc. Am. 2014, 136, 2642–2653.
  24. Langerak, N.C.; Stronks, H.C.; van Marrewijk, E.F.; Briaire, J.J.; Lemercier, J.; Gerkmann, T.; Frijns, J.H.M. A Novel Artificial-Intelligence-Based Reverberation-Reduction Algorithm for Cochlear Implants Enhances Speech Intelligibility and User Experience. Ear Hear 2025.
  25. Mishra, S.; Shubhadarshan, A.; Behera, D.; Sahoo, R. Can working memory capacity predict speech perception in presence of noise in older adult. Int. J. Multidiscip. Educ. Res. 2021, 10, 18–23.
  26. Nilsson, D. Design of fire alarms: Selecting appropriate sounds and messages to promote fast evacuation. In Proceedings of the Sound, Safety & Society, Lund, Sweden, 28 April 2014; Lund University: Lund, Sweden, 2015.
  27. Wang, W.; Ma, H.; Wang, C. The effect of characteristics of voice and sound field on the elderly’s speech intelligibility and subjective evaluation of voice alarms. J. Appl. Acoust. 2023, 42, 844–852.
  28. Adams, E.M.; Moore, R.E. Effects of speech rate, background noise, and simulated hearing loss on speech rate judgment and speech intelligibility in young listeners. J. Am. Acad. Audiol. 2009, 20, 28.
  29. Moore, R.E.; Adams, E.M.; Dagenais, P.A.; Caffee, C. Effects of reverberation and filtering on speech rate preference. Int. J. Audiol. 2007, 46, 154–160.
  30. Fogerty, D.; Alghamdi, A.; Chan, W. The effect of simulated room acoustic parameters on the intelligibility and perceived reverberation of monosyllabic words and sentences. J. Acoust. Soc. Am. 2020, 147, EL396–EL402.
  31. Sato, H.; Sato, H.; Morimoto, M. Effects of Aging on Word Intelligibility and Listening Difficulty in Various Reverberant Fields. J. Acoust. Soc. Am. 2007, 121, 2915–2922.
  32. Gordon-Salant, S.; Fitzgibbons, P.J. Recognition of multiply degraded speech by young and elderly listeners. J. Speech Hear. Res. 1995, 38, 1150–1156.
  33. Goossens, T.; Vercammen, C.; Wouters, J.; van Wieringen, A. Masked speech perception across the adult lifespan: Impact of age and hearing impairment. Hear. Res. 2017, 344, 109–124.
  34. Profant, O.; Tintera, J.; Balogova, Z.; Ibrahim, I.; Jilek, M.; Syka, J. Functional Changes in the Human Auditory Cortex in Ageing. PLoS ONE 2015, 10, e0116692.
  35. Ben-David, B.M.; Tse, V.Y.Y.; Schneider, B.A. Does it take older adults longer than younger adults to perceptually segregate a speech target from a background masker? Hear. Res. 2012, 290, 55–63.
  36. Kwak, C.; Han, W.; Lee, J.; Kim, J.; Kim, S. Effect of noise and reverberation on speech recognition and listening effort for older adults. Geriatr. Gerontol. Int. 2018, 18, 1603–1608.
  37. Baldwin, C.L. Verbal collision avoidance messages during simulated driving: Perceived urgency, alerting effectiveness and annoyance. Ergonomics 2011, 54, 328–337.
  38. Arai, K. How to transmit disaster information effectively: A linguistic perspective on Japan’s Tsunami Warnings and Evacuation Instructions. Int. J. Disaster Risk Sci. 2013, 4, 150–158.
  39. Kallinen, K.; Ravaja, N. Effects of the rate of computer-mediated speech on emotion-related subjective and physiological responses. Behav. Inf. Technol. 2005, 24, 365–373.
  40. Wang, W.; Ma, H.; Wang, C.; Dong, S.; Hu, W.; He, B. Impact of Echo Interference on Speech Intelligibility in Extra-Large Spaces. Buildings 2025, 15, 3690.
  41. Li, X.; Zhao, Y. Exploring Factors Influencing Speech Intelligibility in Airport Terminal Pier-Style Departure Lounges. Buildings 2025, 15, 426.
  42. Pastusiak, A.; Błasiński, Ł.; Kociński, J. Listening Effort in Reverberant Rooms: A Comparative Study of Subjective Perception and Objective Acoustic Metrics. Arch. Acoust. 2025, 50, 321–329.
  43. van der Wal, C.N.; Robinson, M.A.; Bruine De Bruin, W.; Gwynne, S. Evacuation behaviors and emergency communications: An analysis of real-world incident videos. Saf. Sci. 2021, 136, 105121.
Figure 1. Visual materials of the hospital environment used in the experiment.
Figure 2. Sound conditions used in the experiment.
Figure 3. Layout of the equipment in the laboratory.
Figure 4. The procedure of the experiment.
Figure 5. The effect of SR on (a) speech intelligibility and (b) subjective assessment (** p < 0.01).
Figure 6. The effect of SNR on (a) speech intelligibility and (b) subjective assessment (** p < 0.01).
Figure 7. The effect of RT on (a) speech intelligibility and (b) subjective assessment (** p < 0.01).
Figure 8. The interactive effect of SR and SNR on (a) speech intelligibility and (b) listening difficulty.
Figure 9. The interactive effect of SR and RT on perceived urgency.
Figure 10. The main effect of SR on perceived urgency in the elderly and young groups.
Table 1. ANOVA results of the main and interactive effects on elderly people’s perception of voice alarms.
| Factors | Speech Intelligibility: F | Sig. | ηp² | Listening Difficulty: F | Sig. | ηp² | Perceived Urgency: F | Sig. | ηp² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SR | 48.509 | <0.001 *** | 0.065 | 40.947 | <0.001 *** | 0.056 | 4.395 | 0.013 * | 0.006 |
| SNR | 28.377 | <0.001 *** | 0.058 | 38.881 | <0.001 *** | 0.077 | 21.187 | <0.001 *** | 0.045 |
| RT | 93.095 | <0.001 *** | 0.167 | 127.423 | <0.001 *** | 0.216 | 9.462 | <0.001 *** | 0.021 |
| SR × SNR | 2.923 | 0.008 ** | 0.012 | 2.761 | 0.011 * | 0.012 | 0.391 | 0.885 | 0.002 |
| SR × RT | 1.877 | 0.082 | 0.008 | 1.462 | 0.188 | 0.006 | 3.134 | 0.005 ** | 0.014 |
| SNR × RT | 0.974 | 0.460 | 0.006 | 0.842 | 0.577 | 0.005 | 1.470 | 0.154 | 0.010 |
*** p < 0.001; ** p < 0.01; * p < 0.05.
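For readers less familiar with the effect size reported in Table 1, partial eta squared is conventionally defined as shown below. The symbols for sums of squares and degrees of freedom are standard ANOVA notation introduced here for illustration. They are not taken from the article, and the degrees of freedom are not reported in this section.

```latex
\eta_p^2
  = \frac{SS_{\mathrm{effect}}}{SS_{\mathrm{effect}} + SS_{\mathrm{error}}}
  = \frac{F \cdot df_{\mathrm{effect}}}{F \cdot df_{\mathrm{effect}} + df_{\mathrm{error}}}
```

The second form follows from $F = (SS_{\mathrm{effect}}/df_{\mathrm{effect}}) / (SS_{\mathrm{error}}/df_{\mathrm{error}})$. Under this definition, the largest value in Table 1, RT on listening difficulty (0.216), corresponds to the effect that accounts for the greatest share of variance once the other effects are set aside.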
Table 2. Comparison of ANOVA results between the elderly and young groups, with effect sizes and significance presented.
| Factors | Speech Intelligibility: Elderly | Speech Intelligibility: Young | Listening Difficulty: Elderly | Listening Difficulty: Young | Perceived Urgency: Elderly | Perceived Urgency: Young |
| --- | --- | --- | --- | --- | --- | --- |
| SR | 0.065 (***) | 0.069 (***) | 0.056 (***) | 0.104 (***) | 0.006 (*) | 0.070 (***) |
| SNR | 0.058 (***) | 0.047 (***) | 0.077 (***) | 0.207 (***) | 0.045 (***) | 0.176 (***) |
| RT | 0.167 (***) | 0.139 (***) | 0.216 (***) | 0.303 (***) | 0.021 (***) | 0.056 (***) |
| SR × SNR | 0.012 (**) | 0.008 | 0.012 (*) | 0.011 (*) | 0.002 | 0.009 |
| SR × RT | 0.008 | 0.024 (***) | 0.006 | 0.016 (**) | 0.014 (**) | 0.012 |
| SNR × RT | 0.006 | 0.026 (***) | 0.005 | 0.018 (*) | 0.010 | 0.016 |
*** p < 0.001; ** p < 0.01; * p < 0.05. Differences between the young and elderly groups are marked in bold in the original table.
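The group comparison in Table 2 can also be checked mechanically. The sketch below transcribes only the significance markers from the table and lists the effects that are significant in one group but not the other. This is one plausible reading of "difference", assumed here for illustration. The names `TABLE2_SIGNIFICANCE` and `significance_mismatches` are hypothetical.

```python
from typing import Dict, List, Tuple

# Significance markers transcribed from Table 2 as (elderly, young);
# an empty string means the effect was not significant.
TABLE2_SIGNIFICANCE: Dict[str, Dict[str, Tuple[str, str]]] = {
    "speech intelligibility": {
        "SR": ("***", "***"), "SNR": ("***", "***"), "RT": ("***", "***"),
        "SR × SNR": ("**", ""), "SR × RT": ("", "***"), "SNR × RT": ("", "***"),
    },
    "listening difficulty": {
        "SR": ("***", "***"), "SNR": ("***", "***"), "RT": ("***", "***"),
        "SR × SNR": ("*", "*"), "SR × RT": ("", "**"), "SNR × RT": ("", "*"),
    },
    "perceived urgency": {
        "SR": ("*", "***"), "SNR": ("***", "***"), "RT": ("***", "***"),
        "SR × SNR": ("", ""), "SR × RT": ("**", ""), "SNR × RT": ("", ""),
    },
}


def significance_mismatches(
    table: Dict[str, Dict[str, Tuple[str, str]]]
) -> List[Tuple[str, str, str]]:
    """List (measure, effect, group) where only one group shows a significant effect."""
    found = []
    for measure, effects in table.items():
        for effect, (elderly, young) in effects.items():
            if bool(elderly) != bool(young):
                found.append((measure, effect, "elderly only" if elderly else "young only"))
    return found


if __name__ == "__main__":
    for measure, effect, group in significance_mismatches(TABLE2_SIGNIFICANCE):
        print(f"{measure}: {effect} significant in {group}")
```

Comparing only the presence or absence of significance ignores differences in effect size, such as the much larger SNR effect on listening difficulty in the young group, so the listing is a starting point rather than a full comparison.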
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ma, H.; Chen, Q.; Wang, W.; Wang, C. Combined Effects of Speech Features and Sound Fields on the Elderly’s Perception of Voice Alarms. Acoustics 2026, 8, 2. https://doi.org/10.3390/acoustics8010002

Chicago/Turabian Style

Ma, Hui, Qujing Chen, Weiyu Wang, and Chao Wang. 2026. "Combined Effects of Speech Features and Sound Fields on the Elderly’s Perception of Voice Alarms" Acoustics 8, no. 1: 2. https://doi.org/10.3390/acoustics8010002
