Reaction Time to Amplitude-Modulated Tones Under Spectral Masking: Implications for Architectural Acoustic Design

Shimokura, Ryota; Soeta, Yoshiharu

doi:10.3390/app16083814

Open Access Article

by Ryota Shimokura ¹ and Yoshiharu Soeta ²,*

¹ Department of Systems Science, Graduate School of Engineering Science, Osaka University, Toyonaka 560-8531, Japan

² Molecular Biosystems Research Institute, Life Science and Biotechnology, National Institute of Advanced Industrial Science and Technology (AIST), Ikeda 563-8577, Japan

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(8), 3814; https://doi.org/10.3390/app16083814

Submission received: 13 March 2026 / Revised: 8 April 2026 / Accepted: 10 April 2026 / Published: 14 April 2026

(This article belongs to the Special Issue Architectural Acoustics: From Theory to Application—2nd Edition)

Abstract

Detectability of auditory signals in built environments is a critical issue in architectural acoustics, particularly in public spaces where notification sounds must be perceived reliably under background noise. This study investigated reaction times (RTs) to amplitude-modulated pure tones under silent, white-noise, and bandpass-noise conditions. Twenty young and twenty elderly participants responded to 1 and 2 kHz tones with flat, gentle, and steep onset envelopes. To describe perceptual detection in physically interpretable terms, a time-integrated sound-exposure level model, $L_{AE}(t)$, was applied. RT was defined as the moment when cumulative acoustic energy exceeded a criterion value relative to the hearing threshold. In silent conditions, RTs were accurately predicted by $L_{AE}(t)$, with onset-envelope shape influencing early energy accumulation. In noise conditions, RTs increased systematically with spectral proximity between target and masker, consistent with auditory filter theory. When spectral separation exceeded approximately four ERB numbers, masking effects were minimal, and RT approached silent-condition values. These findings demonstrate that perceptual detection timing is governed by cumulative acoustic energy and spectral masking rather than instantaneous sound pressure level. The $L_{AE}(t)$ model provides a detection-oriented metric that complements conventional room-acoustic parameters and may support evidence-based design of perceptually robust auditory signals in architectural environments.

Keywords:

pure tone; envelope; reaction time; sound-exposure level

1. Introduction

In architectural acoustics, the design of sound sources in built environments is not limited to achieving desirable reverberation characteristics or sound insulation performance; it also requires ensuring that auditory signals are perceptually effective within complex acoustic spaces. In public buildings such as railway stations, airports, and hospitals, auditory signs play a crucial role in guiding occupants, particularly visually impaired users. However, these signals are often presented in acoustically challenging environments characterized by background noise, spectral masking, and reflections. Therefore, understanding how listeners detect time-varying sounds under realistic acoustic conditions is essential for evidence-based acoustic design.

Reaction time (RT) provides a quantifiable index of auditory detectability that integrates perceptual, neural, and motor processes. In room-acoustic contexts, RT can serve as an objective measure of how effectively a sound source emerges from background noise and becomes perceptually salient. Although previous studies have investigated speech intelligibility and perception of loudness, fewer studies have examined the temporal dynamics of signal detection in noisy environments.

An RT is the sum of several time-consuming processes that cannot easily be separated. Luce [1] described five possible processes: (1) signal transduction into neural spikes, (2) transmission of spikes to the brain, (3) signal processing and motor programming for the target muscle (finger or mouth), (4) signal transmission to muscles, and (5) muscle contraction. Because RT reflects the combined duration of these perceptual and neural processes, it is sensitive to the physical and spectral characteristics of the acoustic stimulus.

Consequently, previous studies investigating RT for auditory signals have examined how variations in stimulus properties influence response latency. The target sounds most commonly studied in reaction-time experiments were tones or bandpass noises, and the RTs associated with these sounds were discussed in terms of frequency, masking, and loudness [2,3,4,5,6,7,8,9,10,11,12]. In unmasked conditions, RTs did not vary with frequency for pure tones when the sensation levels (SLs) were comfortable to hear [7,8,10,12]. In masked conditions, RTs increased when the target tone frequencies were close to the masker frequencies [4,6]. These findings are consistent with auditory filter theory, whereby the perceived loudness of a tone is reduced when its frequency overlaps with that of the masker [13]. However, the aforementioned studies [4,6] investigated only a few combinations of masker and maskee frequencies, which were insufficient to fully characterize the relationship between RT and spectral masking.

Some researchers have measured RT using amplitude-modulated tones [5,11]. As hearing stable tones is rare in real life, the findings of these studies can inform sound design, helping to identify sounds that effectively raise awareness. These studies reported that RT decreased as the onset slope increased but reached a plateau once the rise time exceeded approximately 50 ms. This plateau effect has been attributed to temporal integration mechanisms: when signal duration exceeds the critical duration (CD), loudness no longer increases with duration. The CD has been reported to range between 100 and 200 ms [14,15], suggesting that RT may be determined before loudness reaches its steady-state value.

In order to investigate easily detectable notification sounds, Shimokura and Soeta [16] measured the RTs for several types of birdsongs that are often used as auditory signals for visually challenged people in public spaces in Japan. Although the birdsongs had relatively stable pitch, their amplitude envelopes varied substantially. To account for this variation, they introduced a time-integrated sound-exposure level ($L_{AE}$) to estimate the RTs for the signals with time-varying loudness. The $L_{AE}$ is the sum of the squared sound pressure over a period typically less than 1 s [17], and it quantifies the loudness of a non-steady-state sound well [18]. Time-integrated $L_{AE}(t)$ was then used to estimate the time at which sounds conforming to the concept of integrated loudness become perceptually noticeable [14,15,19]. RTs estimated using the $L_{AE}(t)$ model for birdsongs were more accurate than RTs estimated using the Zwicker loudness model for time-varying sounds with instantaneous loudness [13,20].

Temporal integration has often been modeled using a “leaky integrator,” which accumulates input over time while allowing gradual decay governed by a time constant. Such models are physiologically plausible and have been used to explain hearing thresholds through neural spike integration [21,22,23]. Two-time-constant models, whether arranged in series or parallel [24,25], provide improved predictions of hearing thresholds and loudness and have been standardized in ISO procedures [13]. Alternatively, Heil and Neubauer [26,27] proposed a statistical neural model in which detection occurs when a criterion number of neural events is reached. These approaches share a common principle with the $L_{AE}(t)$ framework: detection occurs when cumulative neural or acoustic energy exceeds a threshold. In our previous study [16], the $L_{AE}(t)$ model was shown to predict reaction times more accurately than an instantaneous loudness model based on the Zwicker method, suggesting that cumulative energy integration better captures perceptual detection timing for time-varying sounds.

In their study on the estimation of simple RT, Miller and Ulrich [28] proposed a parallel grain model (PGM), in which each signal is represented by a number of grains of information or activation. They termed the time required for the activation of a single grain the ‘activation time’. These activation times depended on the signal intensity (e.g., 10 ms for an intense signal and 40 ms for a weak signal in their examples) and were found to be correlated with RT when combined with the transmission time required for the activated grain to reach a decision center. Applying the concept of arrival time to the time-integrated intensity model suggests that the timing at which a tone is noticed may be determined by time-integrated intensity, with the integration beginning after the activation.

Spectral masking effects on RT have been reported previously. The RTs to tones increased due to masking by noise when the tone and background noise were spectrally close [4,6]. In contrast, Emmerich et al. [6] reported that a tonal background could accelerate the RT to a tone when the spectral energies were distinct. This acceleration of RT could even be observed for the RT to birdsongs, especially for the elderly participants [16], and has been discussed in association with “stochastic resonance” [29,30,31,32,33,34] and an “inverted-U-shaped manner” [35,36,37]. As mentioned above, Miller and Ulrich [28] proposed the PGM to explain the attention timing, whereby grains of information were statistically stimulated not only by the target signal, but also by other auditory inputs. Therefore, we can conclude that the noise energy spectrally separated from a target signal may also contribute to decision-making activation, even if it does not mask the target signal.

The effect of noise on RT remains unclear because the interaction between noise and target sounds has not been systematically studied. The aim of this study is to clarify how spectral characteristics of background noise influence RT to pure tones. As the initial shape of the target sound has a significant impact on RT, the amplitudes of pure tones were modulated to control their onset slope. The modulated pure tones at 1 and 2 kHz were masked by white noise and six bandpass noises positioned at varying spectral distances from the target frequency. Twenty young and twenty elderly participants reacted to those tones. The obtained RTs were analyzed using our proposed $L_{AE}(t)$ model, whereby the envelopes of the pure tones were modulated in three fade-in ways to create differences in the corresponding $L_{AE}(t)$ curves. Taking into account the activation time in the PGM [28], the start of accumulation for the $L_{AE}(t)$ curve was delayed relative to the onset of the pure tones by several candidate intervals. As in our previous study [16], RT was estimated as the moment when $L_{AE}(t)$ reached a criterion value. By introducing a time-integrated sound-exposure level model, $L_{AE}(t)$, we aim to provide a physically grounded and psychoacoustically interpretable framework that links acoustic energy integration, spectral masking, and perceptual detection timing. Such an approach contributes to the design of detectable auditory signals in architectural spaces.

2. Materials and Methods

2.1. Participants

Twenty young adults (9 men and 11 women; mean age: $27.2 \pm 8.5$ years) and twenty elderly adults (13 men and 7 women; mean age: $70.9 \pm 4.1$ years) participated in the experiment. As hearing loss typically begins in a person’s 40s, the participants were divided into two groups based on their age: young and elderly [38]. All participants reported normal hearing for their age and had no history of neurological or auditory disorders. None of the participants used hearing-support devices. All participants took hearing threshold tests to confirm their hearing abilities and determine the sensation level (SL) of the target signal (see Section 2.3 for details). All participants provided informed consent prior to participation. Approval for the experimental protocol (Human 2020-0227L) was granted by the institutional ethics committee.

2.2. Acoustic Stimuli and Spectral Masking Conditions

Table 1 provides an overview of all signals and conditions. The target signals were pure tones of 1 and 2 kHz with three types of fade-in envelope (flat, gentle, and steep slopes), as shown in Figure 1. The present study focuses on the mid-to-low frequency range; similar investigations of other frequency ranges are left for future work. The envelope of the flat-slope signal was not modulated. The gentle- and steep-sloped signals had envelopes that increased as $0.5(t + 1)$ and $t$ ($t$: time [s]), respectively. The signals had a duration of 1 s, including a 5 ms Hanning taper at the onset to avoid clicking. As the RTs in this study were expected to fall within the signal duration, no taper was applied at the offset, as shown in Figure 1. The presentation levels for the three signals were set at 10, 20, and 30 dB above the participant’s hearing threshold, expressed in dBSL. According to the categorical unit of loudness, the pure tone at 10 dBSL was “very soft”, whereas the pure tone at 30 dBSL was “soft” [39].
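As a rough illustration of the stimulus description above, the following Python sketch synthesizes the six target tones. The 48 kHz sampling rate is borrowed from the converter settings reported in Section 2.3; the raised-cosine form of the 5 ms taper and all function names are assumptions made for illustration, and calibration to dBSL is omitted.

```python
import math

FS = 48_000        # sampling rate [Hz]; taken from the AD/DA settings in Section 2.3
DURATION_S = 1.0   # signal duration [s], taper included
TAPER_S = 0.005    # onset taper length [s]


def envelope(shape, t):
    """Fade-in envelope at time t [s] for the three stimulus types."""
    if shape == "flat":
        return 1.0
    if shape == "gentle":
        return 0.5 * (t + 1.0)
    if shape == "steep":
        return t
    raise ValueError(f"unknown envelope shape: {shape}")


def onset_taper(t):
    """Rising half of a Hann window over the first TAPER_S seconds (assumed form)."""
    if t >= TAPER_S:
        return 1.0
    return 0.5 * (1.0 - math.cos(math.pi * t / TAPER_S))


def make_tone(freq_hz, shape):
    """Return one second of an amplitude-modulated tone with unit peak amplitude."""
    samples = []
    for i in range(int(FS * DURATION_S)):
        t = i / FS
        carrier = math.sin(2.0 * math.pi * freq_hz * t)
        samples.append(envelope(shape, t) * onset_taper(t) * carrier)
    return samples  # no offset taper, as in the text


# Two frequencies x three envelope shapes = six target tones.
tones = {
    (freq, shape): make_tone(freq, shape)
    for freq in (1000.0, 2000.0)
    for shape in ("flat", "gentle", "steep")
}
```

All three envelopes reach the same amplitude at t = 1 s, so the stimuli differ only in how quickly energy accumulates after onset.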

In the noise conditions, white noise and bandpass noise were presented in the background. Although these noises differ from those typically heard in public places, they are ideal for demonstrating the masking effect. Based on preliminary RT data obtained from four young and four elderly individuals (who did not participate in the present experiment), signal-to-noise ratios (SNRs) were determined to ensure reliable detection across participants. For the young group, SNRs of 0 and $-20$ dB were selected, whereas for the elderly group, SNRs of 10 and $-10$ dB were selected. At these SNRs, the rate of no-response trials was less than 10% for each participant. These were categorized as high-SNR (0 dB for young; 10 dB for elderly) and low-SNR ($-20$ dB for young; $-10$ dB for elderly) conditions.

Bandpass noise was generated using a gammatone filter bank designed to approximate the human auditory filter (MATLAB R2021a, MathWorks, Natick, MA, USA) [40]. White noise was passed through a bank of gammatone filters that were equally spaced on an equivalent rectangular bandwidth (ERB) scale [41,42]. For example, a pure tone of 1 kHz will primarily activate the ERB filter with a number of 15 (center frequency: 1057 Hz). Figure 2 shows the spectra of the bandpass noise (solid lines) and the pure tone (dashed line), computed over 2048-sample frames and averaged over 1 s. Thus, the bandpass noise that masked the 1 kHz pure tone was set to numbers 10, 12, and 14 when shifted downwards, and 16, 18, and 20 when shifted upwards in the high-SNR condition (Figure 2a). In the low-SNR condition, the ERB numbers used were 9, 11, 13, 17, 19, and 21 (Figure 2b), as listed in Table 2. Preliminary experiments showed that when the bands closest to 1 kHz (i.e., 14 and 16) were used at a low SNR, participants failed to hear the targets several times. Therefore, we shifted the noise bands outward, away from the pure-tone frequency, by one ERB number. The ERB numbers are referred to as BD3, BD2, BD1, BU1, BU2, and BU3 in ascending order (BD: band down, BU: band up). The ERB numbers for a pure tone of 2 kHz were selected in the same manner and are listed in Table 2.

2.3. Apparatus

The sensation level, SL, of the target signal was controlled according to the individual’s hearing threshold. Therefore, prior to the RT measurements, we used a transformed up-down procedure [43] to determine the hearing thresholds for pure tones at 1 and 2 kHz, which served as the target signals. Threshold tests were conducted using a two-alternative forced-choice procedure with diotic listening in the soundproof chamber.

As there were no significant differences in hearing thresholds according to the envelope modulations of the pure tone (i.e., flat, gentle, or steep slopes as shown in Figure 1), the threshold measured with the flat-slope tone was used to represent all pure tones. Finally, the level of the pure tone was determined using the measured hearing thresholds. The averaged sound-pressure levels at 0 dBSL were $1.2 \pm 4.9$ dB at 1 kHz and $2.4 \pm 4.4$ dB at 2 kHz for young participants and $23.7 \pm 9.6$ dB at 1 kHz and $27.1 \pm 10.1$ dB at 2 kHz for elderly participants.

Participants were instructed to tap an empty carton with their fingers as quickly as possible when they detected the target tone (Figure 3). The target signals were presented through headphones (HD598, Sennheiser, Wedemark, Germany). A subminiature accelerometer (352A21; PCB Piezotronics, New York, NY, USA) was attached to the reverse side of the tapping surface, and the vibration produced at the moment of tapping was registered by a PC (X1 Carbon, Lenovo, Beijing, China). A looped output containing the original signal was sent to the AD/DA converter (FireFaceUC, RME, Haimhausen, Germany) with a sampling rate of 48 kHz and a sampling resolution of 24 bits, while the tapping vibration was recorded simultaneously on the other channel of the stereo recording. Then, the difference in timing (i.e., RT) between the onset of the signal and the initiation of tapping was calculated. Because the gradual rise of the steeply sloped signals obscured the onset time, the onset times for all the signals were calculated by subtracting 1 s from the offset time.
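The timing calculation described above can be sketched in Python as follows. The paper does not state how the signal offset or the tap was detected in the recording, so the simple amplitude-threshold detectors, the threshold values, and all names below are assumptions for illustration only; the signals used in the check are synthetic.

```python
import math

FS = 48_000      # sampling rate [Hz], as reported for the AD/DA converter
SIGNAL_S = 1.0   # known signal duration [s]


def last_index_above(x, threshold):
    """Index of the last sample whose magnitude exceeds the threshold."""
    for i in range(len(x) - 1, -1, -1):
        if abs(x[i]) > threshold:
            return i
    raise ValueError("signal never exceeds the threshold")


def first_index_above(x, threshold):
    """Index of the first sample whose magnitude exceeds the threshold."""
    for i, value in enumerate(x):
        if abs(value) > threshold:
            return i
    raise ValueError("no tap detected")


def reaction_time(loopback, tap, signal_threshold, tap_threshold):
    """RT [s] = tap time minus signal onset, with onset = offset - 1 s."""
    offset_s = (last_index_above(loopback, signal_threshold) + 1) / FS
    onset_s = offset_s - SIGNAL_S
    tap_s = first_index_above(tap, tap_threshold) / FS
    return tap_s - onset_s


# Synthetic check: a steep-sloped 1 kHz tone starting at 0.5 s and an
# idealized tap (single impulse) 0.4 s after the tone onset.
lead = round(0.5 * FS)
tone = [(i / FS) * math.sin(2.0 * math.pi * 1000.0 * i / FS) for i in range(FS)]
loopback = [0.0] * lead + tone + [0.0] * lead
tap = [0.0] * len(loopback)
tap[lead + round(0.4 * FS)] = 1.0

rt = reaction_time(loopback, tap, signal_threshold=0.05, tap_threshold=0.5)
```

Working backwards from the offset avoids having to locate the onset of a tone whose amplitude starts near zero, which is the motivation given in the text.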

RT measurements were conducted under three acoustic conditions: silent, white-noise, and bandpass-noise conditions (Table 1). In the silent condition, 18 types of pure tone (two frequencies, three envelope shapes, and three presentation levels) were presented four times in a random order. The inter-stimulus interval was randomly varied between 3 and 12 s to prevent participants from predicting the timing of the onset.
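The trial structure of the silent condition can be illustrated with a short Python sketch. The uniform distribution of the inter-stimulus interval and the function and field names are assumptions; the text specifies only the 3–12 s range, the 18 tone types, and the four repetitions.

```python
import itertools
import random

FREQS_HZ = (1000, 2000)
ENVELOPES = ("flat", "gentle", "steep")
LEVELS_DBSL = (10, 20, 30)
REPEATS = 4
ISI_RANGE_S = (3.0, 12.0)


def build_silent_session(seed=None):
    """Return a randomized trial list: 18 tone types, each presented four times."""
    rng = random.Random(seed)
    conditions = list(itertools.product(FREQS_HZ, ENVELOPES, LEVELS_DBSL))
    trials = conditions * REPEATS
    rng.shuffle(trials)
    return [
        {
            "freq_hz": freq,
            "envelope": shape,
            "level_dbsl": level,
            # Random interval so that the onset cannot be anticipated
            # (uniform sampling is an assumption).
            "isi_s": rng.uniform(*ISI_RANGE_S),
        }
        for freq, shape, level in trials
    ]


session = build_silent_session(seed=0)
```

Passing a fixed seed makes a session reproducible, which is convenient when the same order must be replayed for debugging.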

In the white noise and bandpass noise conditions, the target signals were limited to 30 dBSL, and the white and bandpass noises were presented continuously through headphones. To avoid the effects of fatigue, the experiment was separated into five sessions, each session lasting 6–7 min, with sufficient breaks between sessions. The RT values were averaged over four trials, and non-response trials were excluded.

2.4. Estimation of RT

Based on the RT measurements conducted using birdsong [16], a time-integrated $L_{AE}(t)$ was introduced to estimate the RT for pure tones. The names of the variables and parameters used here are listed in Table 3. $L_{AE}(t)$ was calculated as follows:

$$L_{AE}(t) = 10 \log_{10} \int_{0}^{t} \frac{P^{2}(s)}{P_{min}^{2}} \, ds, \tag{1}$$

where $P(s)$ is the sound pressure [Pa] and $P_{min}$ is the pressure at hearing threshold [Pa]. In this equation, 0 s represents both the start time of the integral and the onset of the signal. Figure 4 shows the $L_{AE}(t)$ curves for pure tones with flat, gentle, and steep slopes, with the sound pressure adjusted to 0 dBSL. (The $L_{AE}(t)$ curves shown in Figure 5 and Figure 6 of the RT study with birdsongs [16] were not divided by the sampling rate after the discrete integrated energy was calculated, and $L_{AE}(t)$ was normalized by the minimum audible pressure ($2.0 \times 10^{-5}$ Pa); consequently, the values on the vertical axis differ from those in Figure 4 of this paper.) For the present stimuli, $L_{AE}(t)$ therefore reached 0 dBSL after one second. As shown in Figure 4, the integration curve of the pure tone with a steep envelope rose gently. Unlike the $L_{AE}(t)$ of birdsongs, the curves rose smoothly. Therefore, approximations by power functions were unnecessary, and the curves could be expressed mathematically. The time-cumulative pressures on the flat ($CP_{F}$), gentle ($CP_{G}$), and steep ($CP_{S}$) slopes are expressed as follows:

$$CP_{F}(t) = \int_{0}^{t} \sin^{2}(2 \pi f s) \, ds, \tag{2}$$

$$CP_{G}(t) = \int_{0}^{t} \left( \frac{1}{2}(s + 1) \sin 2 \pi f s \right)^{2} ds, \tag{3}$$

$$CP_{S}(t) = \int_{0}^{t} \left( s \sin 2 \pi f s \right)^{2} ds, \tag{4}$$

where $f$ is the frequency of the pure tone [Hz]. After solving the above integrals and using the envelopes of the sounds, $L_{AE}(t)$ in flat ($L_{AEF}$), gentle ($L_{AEG}$), and steep ($L_{AES}$) slopes can be approximated using the following equations:

$$L_{AEF}(t) \approx 10 \log_{10} t, \tag{5}$$

$$L_{AEG}(t) \approx 10 \log_{10} \frac{t^{3} + 3 t^{2} + 3 t}{7}, \tag{6}$$

$$L_{AES}(t) \approx 10 \log_{10} t^{3}, \tag{7}$$

As $t$ approaches 1 s, $L_{AE}(t)$ converges to 0 dB. The frequency of the pure tone does not influence $L_{AE}(t)$.
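The closed-form approximations in Equations (5)–(7) can be checked numerically. The Python sketch below integrates the squared waveform with a simple midpoint sum and compares the result with the closed forms, using the energy accumulated over the full 1 s as the 0 dB reference. The integration step of 1/48,000 s, the omission of the 5 ms onset taper, and all names are assumptions made for this check.

```python
import math

FS = 48_000  # integration step = 1 / FS (assumed)

ENVELOPES = {
    "flat": lambda s: 1.0,
    "gentle": lambda s: 0.5 * (s + 1.0),
    "steep": lambda s: s,
}

# Arguments of 10 log10(.) in Equations (5)-(7).
CLOSED_FORMS = {
    "flat": lambda t: t,
    "gentle": lambda t: (t**3 + 3 * t**2 + 3 * t) / 7.0,
    "steep": lambda t: t**3,
}


def cumulative_energy(shape, freq_hz, t_end):
    """Midpoint sum of the squared waveform from 0 to t_end [s] (Equations (2)-(4))."""
    env = ENVELOPES[shape]
    total = 0.0
    for i in range(int(round(t_end * FS))):
        s = (i + 0.5) / FS
        total += (env(s) * math.sin(2.0 * math.pi * freq_hz * s)) ** 2
    return total / FS


def lae_numeric(shape, freq_hz, t):
    """L_AE(t) [dB] with the 1 s energy of the same tone as the 0 dB reference."""
    ratio = cumulative_energy(shape, freq_hz, t) / cumulative_energy(shape, freq_hz, 1.0)
    return 10.0 * math.log10(ratio)


def lae_closed(shape, t):
    """L_AE(t) [dB] from the closed-form approximations."""
    return 10.0 * math.log10(CLOSED_FORMS[shape](t))


# Largest deviation between numeric and closed-form curves at two test times.
max_error_db = max(
    abs(lae_numeric(shape, 1000.0, t) - lae_closed(shape, t))
    for shape in ENVELOPES
    for t in (0.1, 0.5)
)
```

The oscillating part of the squared sine averages out over whole carrier periods, which is why the closed forms contain only the envelope polynomials and why the tone frequency drops out.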

This model hypothesizes that a listener responds when the cumulative energy of the target signal reaches a criterion corresponding to 0 dBSL (i.e., the hearing threshold). Accordingly, the RT to pure tones above 0 dBSL was estimated using $t_{att}$, defined as the time at which $L_{AE}(t)$ reaches a criterion value. Figure 4 shows examples of determining the $t_{att}$ of the steep-sloped signal. The criterion values differ depending on the target level but increase linearly in dB units. Therefore, the values for the signals at 20 and 10 dBSL are 10 dB and 20 dB higher, respectively, than those at 30 dBSL ($L_{20} = L_{30} + 10$ and $L_{10} = L_{30} + 20$). When the target signal is close to the threshold, a longer time is required, and the listener reacts later. If the listener continues to hear the target signal throughout the entire time (1 s), the cumulative energy will reach the SPL at their hearing threshold, and they will clearly notice it. RT experiments using birdsong reported a logarithmic relationship between $t_{att}$ and RT with high accuracy [16]. Therefore, similarly in this study, we explored the most likely $L_{30}$, $L_{20}$, and $L_{10}$, which produced the highest correlation with the measured RT.
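Because Equations (5)–(7) are simple polynomials in $t$, $t_{att}$ can be obtained by inverting them directly. The Python sketch below does so for an integration that starts at the signal onset; the criterion value of −30 dB is a hypothetical placeholder rather than a fitted result, and the function name is an assumption.

```python
import math


def t_att(shape, criterion_db):
    """Time [s] at which L_AE(t) from Equations (5)-(7) reaches criterion_db (<= 0 dB)."""
    x = 10.0 ** (criterion_db / 10.0)  # normalized cumulative energy at the criterion
    if shape == "flat":    # t = x
        return x
    if shape == "gentle":  # (t^3 + 3t^2 + 3t) / 7 = x  <=>  (t + 1)^3 = 1 + 7x
        return (1.0 + 7.0 * x) ** (1.0 / 3.0) - 1.0
    if shape == "steep":   # t^3 = x
        return x ** (1.0 / 3.0)
    raise ValueError(f"unknown envelope shape: {shape}")


L30_DB = -30.0  # hypothetical criterion for a 30 dBSL tone (not a fitted value)
CRITERIA_DB = {30: L30_DB, 20: L30_DB + 10.0, 10: L30_DB + 20.0}

# t_att for every envelope shape and presentation level.
t_att_table = {
    (shape, level): t_att(shape, criterion)
    for shape in ("flat", "gentle", "steep")
    for level, criterion in CRITERIA_DB.items()
}
```

With any criterion below 0 dB, the steep envelope yields the longest $t_{att}$ and the flat envelope the shortest, and $t_{att}$ grows as the presentation level decreases, in line with the ordering of the RTs described in the text.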

In the noise condition, it is assumed that the amount of masking determines the detection criterion in the model. Previous masking studies have shown that the loudness of a pure tone is reduced by a masker within the same critical band, and the band sensation level is almost equal to the amount of masking [44,45]. The band sensation level was defined as the effective masking level within the auditory filter centered at the signal frequency. In this study, bandpass noises were generated using ERB-wide gammatone filters with center frequencies that were equally spaced on an ERB-number scale. To obtain the masking amount, the background noise was passed through the ERB filter corresponding to the signal frequency, and the sound energy within that filter was calculated. For example, the level of the white noise was 50 dBSL in the low-SNR condition for young participants. This follows from the fact that the pure tone was presented at 30 dBSL with an SNR of $-20$ dB, where SNR was defined as the difference between signal and noise levels. After generating white noise at 50 dBSL, the band sensation level for a pure tone of 1 kHz was quantified by the sound energy passing through the fifteenth ERB filter. The band sensation level for each bandpass noise was derived using the same procedure.

3. Results

3.1. RT in the Silent Condition

Figure 5 shows the average RT of the 20 young (upper panel) and 20 elderly (lower panel) participants. The pure tones at 1 and 2 kHz had three types of envelopes (flat, gentle, and steep), and the target signals were presented at three presentation levels (10, 20, and 30 dBSL). Regardless of the envelope shape, the RT increased as the SL decreased. In other words, the participants reacted faster to clearly audible pure tones. For the envelope, the RTs for the flat and gentle slopes were similar, whereas the RTs for the steep slopes were longer. These differences increased as SLs decreased. A two-way analysis of variance revealed that the effects of SL and envelope on the RT were significant for both tones and participant groups ($p < 0.01$ in all cases). As normal distributions were not observed for the RTs of 1 kHz and 2 kHz in each participant group, a non-parametric analysis (paired Wilcoxon signed-rank test) was conducted. No significant differences were observed between the RTs for 1 kHz and 2 kHz in young participants ($p = 0.85$), but significant differences were observed in elderly participants ($p < 0.05$). According to the unpaired Wilcoxon test, elderly participants had significantly shorter RTs than young participants ($p < 0.01$), as the SPLs of the target signals were higher at the same SL.

The data are in agreement with the model’s predictions. By varying $L_{30}$ in 0.5 dB steps, we identified the most likely values of $L_{30}$, $L_{20}$, and $L_{10}$, namely those for which the logarithm of $t_{att}$ correlated most highly with the measured RTs ($RT \approx a \log_{10} t_{att} + b$; $a$, $b$: constants). Furthermore, taking into account the potential for a delayed start time for integration, we varied the start time for calculating $L_{AE}(t)$ over 10 values (0–90 ms in 10 ms steps). For example, when the start time is 50 ms, zeros are inserted for the first 50 ms before $L_{AE}(t)$ is calculated. We estimated the average RTs at both 1 and 2 kHz using a single $L_{AE}(t)$ model for each participant group (young and elderly), as shown in Figure 5.

Table 4 shows the start time [s], $L_{30}$ [dB], $a$, $b$ [s], the correlation coefficient, and the averaged error [s] between the measured and estimated RTs when the correlation coefficient between the common logarithm of $t_{att}$ and the measured RT was the highest. These results are only valid for the 20 young and 20 elderly participants in this study. The model parameters were fitted to the data, and the best fit was achieved when a delayed integration starting 60 ms after onset was assumed for both young and elderly participants. The estimated $L_{30}$ values were $-33.5$ dB and $-38.0$ dB for young and elderly participants, respectively. Estimation accuracy was high for both groups, with correlation coefficients of 0.96 for young participants and 0.99 for elderly participants, both statistically significant at $p < 0.01$. These values were comparable to the accuracy previously reported for RTs to birdsong in young participants ($r = 0.98$, $p < 0.01$) [16].

3.2. RT in Noise Conditions

Figure 6 shows the average RT under white-noise conditions. For reference, the average RT for the tone at 30 dBSL in the silent condition is included. RT increased slightly when background noise was added in the high-SNR condition. In contrast, the RTs in the low-SNR condition were considerably longer. In particular, responses to tones with steep slopes were notably slower, and responses to tones with gentle slopes were slower than those to tones with flat slopes. The prolongation of RTs under the low-SNR condition was more pronounced for young participants because the assigned SNR was lower for young participants ($-20$ dB) than for elderly participants ($-10$ dB).

In the bandpass-noise condition, the RTs were arranged according to the ERB number for the different SNRs (Figure 7 and Figure 8). The vertical solid line indicates the ERB number containing the pure-tone frequency. According to auditory filter theory, the loudness of a pure tone is minimized when the ERB number of the bandpass noise is close to that of the tone (15 for 1 kHz and 20 for 2 kHz) [44,45]. The horizontal dotted lines show the RTs at 30 dBSL in the silent condition. When the ERB number of the bandpass noise approached the pure-tone frequency (i.e., BD1 and BU1), the RTs increased. By contrast, bandpass noises that were spectrally distinct from the pure tone resulted in RTs similar to those observed in the silent condition (i.e., BD3, BD2, BU2, and BU3). This tendency was clearly observed for steeply sloped tones, for which the RTs peaked more sharply around the pure-tone frequencies at a low SNR (Figure 8). In other words, if the bandpass noise was separated from the pure tone by more than approximately four ERB numbers (below 11 or above 19 for the 1 kHz tone at ERB number 15, and below 16 or above 24 for the 2 kHz tone at ERB number 20), the effect of the noise on RT could be minimized. The increase in RT around the pure-tone frequencies was larger for young participants, because the assigned SNR was lower for them than for elderly participants.

The RTs under background noise were estimated using the $L_{AE}(t)$ curves and the amount of masking. Figure 9 shows the relationship between RT and the masking amount calculated from white and bandpass noises. For the young participants (Figure 9a), RT remained low until a masking amount of 10.5 dB, after which it increased in two stages, at 22.0 and 29.4 dB. These three categories are referred to as “mask low,” “mask middle,” and “mask high.” In the high-SNR condition, the bandpass noises BD1 and BU1 were categorized as mask-middle, while in the low-SNR condition, the white and bandpass noises BD1 and BU1 were categorized as mask-high. All other noise conditions were categorized as mask-low. For elderly participants, the masking amounts in the mask-low, -middle, and -high categories were 0.37, 12.1, and 19.2 dB, respectively (Figure 9b).

As the RTs in the mask-low category were similar to those in the silent condition, the noise conditions in this masking category can be considered to have only a slight effect on RT. Therefore, if the $L_{AE}(t)$ value used to estimate the RTs in the mask-low category is assumed to be the same as $L_{30}$, the criterion values for the mask-middle ($L_{MM}$) and mask-high ($L_{MH}$) categories should be $L_{30} + 11.5$ (= 22.0 − 10.5) dB and $L_{30} + 18.9$ (= 29.4 − 10.5) dB, respectively, for the young participants. As in the silent condition, RT estimation in the noisy condition can be performed by exploring the $t_{att}$ values at which $L_{AE}(t)$ reaches $L_{30}$, $L_{MM}$, and $L_{MH}$. For the elderly participants, $L_{MM}$ and $L_{MH}$ were $L_{30} + 11.7$ (= 12.1 − 0.37) dB and $L_{30} + 18.8$ (= 19.2 − 0.37) dB, respectively. For both groups of participants, $L_{MM}$ and $L_{MH}$ were approximately 10 and 20 dB higher than $L_{30}$, respectively. Table 4 shows the estimation results, and Figure 10 illustrates the comparison between the measured and estimated RTs for each category. As this model estimated the average RTs in the three categories for pure tones at 1 and 2 kHz, the estimated RTs (dotted lines) were the same for pure tones at these frequencies. As shown in Figure 10, the estimation accuracy was very high for both groups, with correlation coefficients of 0.90 for young participants and 0.93 for elderly participants, both statistically significant at $p < 0.01$. The start time for calculating $L_{AE}(t)$ was shorter for the elderly participants. For both groups, the estimated $L_{30}$ values were close to those in the silent condition.
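The criterion shifts for the masking categories follow from simple differences between the masking amounts. The Python sketch below reproduces that arithmetic and converts each criterion into a $t_{att}$ for an integration starting at onset. Using the silent-condition $L_{30}$ values as stand-ins for the noise-condition estimates is an assumption (the text states only that the two were close), as are all names.

```python
MASKING_DB = {
    "young": {"low": 10.5, "middle": 22.0, "high": 29.4},
    "elderly": {"low": 0.37, "middle": 12.1, "high": 19.2},
}
# Silent-condition estimates used as stand-ins for the noise-condition L_30 (assumption).
L30_DB = {"young": -33.5, "elderly": -38.0}


def criteria_db(group):
    """Criterion per masking category: L_30 plus the masking amount above mask-low."""
    base = MASKING_DB[group]["low"]
    return {
        category: L30_DB[group] + (amount - base)
        for category, amount in MASKING_DB[group].items()
    }


def t_att(shape, criterion_db):
    """Invert Equations (5)-(7) for an integration that starts at onset."""
    x = 10.0 ** (criterion_db / 10.0)
    if shape == "flat":
        return x
    if shape == "gentle":
        return (1.0 + 7.0 * x) ** (1.0 / 3.0) - 1.0
    return x ** (1.0 / 3.0)  # steep


young = criteria_db("young")      # low = L_30, middle = L_30 + 11.5, high = L_30 + 18.9
elderly = criteria_db("elderly")  # low = L_30, middle = L_30 + 11.73, high = L_30 + 18.83

# t_att of the steep-sloped tone in each masking category for the young group.
steep_t_att_young = {category: t_att("steep", c) for category, c in young.items()}
```

Because the criterion rises by roughly 10 dB per category, moving from mask-low to mask-high has about the same effect on $t_{att}$ as lowering the presentation level from 30 to 10 dBSL in the silent condition.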

4. Discussion

4.1. Temporal Integration Mechanism and the $L_{A E} (t)$ Model

As shown in Figure 5, the RTs to pure tones with steep slopes were more extended for the sound pressures in the lower SL. These results were consistent with the extensions of

t_{a t t}

in the

L_{A E}

curves (Figure 4). The

L_{A E} (t)

model could then estimate the RTs to pure tones in the silent condition with a very high level of accuracy (

r = 0.96

for young participants and

r = 0.99

for elderly participants). For young participants, this accuracy was similar to that obtained for birdsong RTs (

r = 0.98

); for elderly participants, it was higher than the birdsong value (

r = 0.72

) [16]. Changing the targets to simple pure tones resulted in the

L_{A E} (t)

model performing better for elderly participants. In auditory temporal integration, the effective time constant (i.e., CD) just above the hearing threshold decreases to approximately 100 ms [27]. This means that little intensity accumulation occurs for determining the hearing threshold when the signal duration is longer than 100 ms. In this study, the presentation levels were at or below 30 dBSL (i.e., close to the thresholds) and the signals were one second long; the remaining 900 ms may therefore have had little influence on the shift in hearing threshold. However,

L_{A E}

curves that accumulated for one second were important for estimating the RTs of the one-second-long signals.

Unlike the models used in the birdsong experiment, the

L_{A E} (t)

calculation was normalized to the hearing threshold, as shown in Equation (1). This improvement helps us to understand the model’s meaning. Since the

L_{A E} (t)

is calculated from the target signal at the auditory threshold, the

L_{A E}

at 1 s (the target’s duration) is 0 dBSL. In other words, any participant could detect the target signal if they listened to it for its entire duration. The

L_{A E}

values required to notice the pure tones at 30, 20, and 10 dBSL were

- 33.5

,

- 23.5

, and

- 13.5

dB, respectively, for the young participants. Interestingly, the absolute values of the

L_{A E}

were almost equal to the SLs. Therefore, we can hypothesize that participants may notice the signal at 30 dBSL when the

L_{A E}

value is lower than their hearing threshold by 30 dB. Drawing lines for

L_{40}

and

L_{50}

in Figure 4 additionally shows that the RTs to the pure tones at 40 and 50 dBSL were 470.16 and 469.97 ms, respectively, for flat-sloped signals. For signals with a steep slope, the estimated RTs at 40 and 50 dBSL were 479.67 and 470.70 ms, respectively. The estimated RTs remained almost unchanged at higher SLs compared to 30 dBSL. The RTs converged at sufficiently high SLs, but differed from previous studies due to variations in the number and age of participants, instructions, and the amount of training. However, there was a common tendency: the RTs increased rapidly when the SLs of the signals were less than 30 dB [10,11]. Even when the target signals were narrow-band noise, the measured RTs were prolonged at the low levels [46]. As shown in Figure 4, the

L_{A E} (t)

curves indicate a slight increase in the RT for an SL higher than 30 dBSL, and they may be applicable to any envelope modulation of the target signal.

Another characteristic of

L_{A E} (t)

was its independence from the frequency of the pure tone. The

L_{A E} (t)

curves can be approximated using Equations (5)–(7), which do not include frequency as a variable. In fact, there was little difference in the measured RTs between the 1 and 2 kHz pure tones. This independence from frequency has been observed in previous studies, not only for pure-tone targets [10], but also for narrowband noise targets [46]. However, this study validated the results at only two frequencies (1 and 2 kHz) and at relatively low sensation levels under binaural presentation. Further research should be conducted to expand the frequency range and validate the results at higher sensation levels.
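The frequency independence described above can be checked numerically for the flat-sloped tone. The following is a minimal sketch, not the authors' implementation: it assumes the integrand of Equation (8), sin²(2πfs), integrated from signal onset, and it assumes that dividing by the energy of the whole 1 s signal stands in for the threshold normalization of Equation (1). The function names are hypothetical.

```python
import math

def cumulative_energy(t, freq_hz):
    """Closed form of the integral of sin^2(2*pi*f*s) ds from 0 to t."""
    return t / 2.0 - math.sin(4.0 * math.pi * freq_hz * t) / (8.0 * math.pi * freq_hz)

def normalized_level_db(t, freq_hz, duration_s=1.0):
    """Cumulative energy at time t relative to the full-duration energy, in dB.

    Assumption: normalizing by the energy of the whole signal makes the level
    0 dB at t = duration_s, mirroring the statement that L_AE at 1 s is 0 dBSL.
    """
    ratio = cumulative_energy(t, freq_hz) / cumulative_energy(duration_s, freq_hz)
    return 10.0 * math.log10(ratio)

# Compare the 1 kHz and 2 kHz tones with the frequency-free curve 10*log10(t).
for t in (0.0101, 0.0503, 0.2507):
    level_1k = normalized_level_db(t, 1000.0)
    level_2k = normalized_level_db(t, 2000.0)
    reference = 10.0 * math.log10(t)
    print(f"t = {t:.4f} s: 1 kHz {level_1k:.2f} dB, 2 kHz {level_2k:.2f} dB, 10log10(t) {reference:.2f} dB")
```

Under these assumptions the oscillating term shrinks with 1/f and is already small a few periods after onset, which is why a frequency-free approximation is plausible for tones in the kilohertz range.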

4.2. Influence of Spectral Masking

As in previous studies, RT increased with louder background noise [9,47,48]. The effect of noise on the RT may be influenced by the spectral masking of the pure tone, as shown in Figure 7 and Figure 8, in a manner similar to that observed for loudness [44,45]. Using the

L_{A E} (t)

model to apply the masking amount to obtain

t_{a t t}

produced a highly accurate estimate of RT for both participant groups, as shown in Figure 10. For instance, the young participants’ RT did not increase significantly up to a masking amount of 10.5 dB (mask low), and they reacted more slowly to background noise that was 11.5 dB (mask middle) and 18.9 dB (mask high) louder than the mask-low category (Figure 9). As the most likely

L_{30}

in the noise condition was

- 31

dB (Table 3), the

L_{M M}

and

L_{M H}

were

- 19.5

and

- 12.1

dB, respectively. If the SNR decreased by a further 10 dB (i.e., to SNR

- 30

dB), the

L_{A E}

would reach almost 0 dBSL. In the preliminary experiments, almost none of the participants could hear the pure tones in white noise at an SNR of

- 30

dB; therefore, the

L_{A E} (t)

model could explain the upper limit of the masking amount.
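The threshold arithmetic in this paragraph can be written out as a short sketch. The numbers are those quoted in the text for the young participants; the function name is hypothetical, and the last step assumes, as the text implies, that a further 10 dB drop in SNR raises the threshold by about 10 dB.

```python
# Values quoted in the text for the young participants (noise condition).
L30_DB = -31.0  # most likely L_30 in the noise condition
OFFSETS_DB = {
    "mask-middle": 11.5,  # masking increase relative to the mask-low category
    "mask-high": 18.9,
}

def category_threshold(l30_db, offset_db):
    """Threshold level for a masking category: L_30 shifted up by the masking offset."""
    return l30_db + offset_db

thresholds = {
    name: round(category_threshold(L30_DB, offset), 1)
    for name, offset in OFFSETS_DB.items()
}
print(thresholds)

# Assumed extrapolation: 10 dB more masking on top of the mask-high category.
extra_masking = round(thresholds["mask-high"] + 10.0, 1)
print(extra_masking)
```

The extrapolated level lies about 2 dB below 0 dB, which matches the statement that the threshold would almost reach the level attained only after the full signal duration.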

The RTs under the noise condition were influenced not only by the noise energy but also by the spectral distance from the target signal’s frequency. These spectral effects can be seen in the measured RTs in the bandpass noise (Figure 7 and Figure 8). The bandpass noises used in this measurement were synthesized using a Gammatone filter scaled by auditory critical bands. Thus, the bandpass noises BD1 and BU1, which overlapped spectrally with the pure-tone band (ERB 15 for 1 kHz and ERB 20 for 2 kHz), increased the masking amount, resulting in longer RTs. However, bandpass noise that was spectrally distant (BD3, BD2, BU2, and BU3) produced RTs that were similar to those in the silent condition. In other words, the masking effect had little impact on RT. When the center frequency of the noise was more than four ERBs away from the pure tone frequency, the masking effect was almost nullified. Although previous studies, which relied on small amounts of RT data measured under noise, could only suggest that spectral masking affects RTs [4,6], our structured results based on the Gammatone filters clearly demonstrate how RT varies along the frequency axis.
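The relation between the ERB numbers and the center frequencies in Table 2 can be illustrated with a short sketch. It uses the widely used ERB-number formula of Glasberg and Moore (1990) and treats each ERB number as the 1-based index of a band in a filter bank whose center frequencies are equally spaced on that scale. The bank size (32 bands) and the range (50 to 8000 Hz) are assumptions chosen for illustration; they are not stated in the article, although they give center frequencies within about 1 Hz of those in Table 2.

```python
import math

def hz_to_erb_number(freq_hz):
    """ERB-number scale of Glasberg and Moore (1990)."""
    return 21.4 * math.log10(0.00437 * freq_hz + 1.0)

def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / 21.4) - 1.0) / 0.00437

def band_center_hz(index, n_bands=32, f_low=50.0, f_high=8000.0):
    """Center frequency of band `index` (1-based) for equal spacing on the ERB-number scale.

    Assumption: n_bands, f_low and f_high are illustrative defaults,
    not values taken from the article.
    """
    e_low = hz_to_erb_number(f_low)
    e_high = hz_to_erb_number(f_high)
    step = (e_high - e_low) / (n_bands - 1)
    return erb_number_to_hz(e_low + (index - 1) * step)

# Band indices used for the high-SNR bandpass noises around the 1 kHz tone.
for index in (10, 12, 14, 16, 18, 20):
    print(index, round(band_center_hz(index), 1))
```

With this spacing, neighboring bands are roughly one ERB apart, so a difference of four band indices corresponds to a separation of about four ERBs.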

4.3. Start Time for Calculating $L_{A E}$

To maximize the correlation between the measured and estimated RTs, the most suitable start time for calculating

L_{A E}

was 60 ms after the onset of the signal. As the Hanning taper of 5 ms was applied at the onset of the pure tones, the start time may include the taper length. We hypothesized that the start time might be caused by the activation time; however, we may need to reconsider this interpretation. A limitation of the

L_{A E} (t)

model is that the

t_{a t t}

approaches 0 s as the signal becomes more intense, as illustrated in Figure 3 and Equation (5). As the

t_{a t t}

and RT were logarithmically related (

RT \approx a \log_{10} t_{a t t} + b

), the RT diverges to

- \infty

in this case. This problem can be solved by changing the start time of integration. For example, considering the 60-ms delay, the time-cumulative pressure of a pure tone with a flat slope (CPF) changes to

C P_{F} (t) = \int_{0.06}^{t} \sin^{2} (2 \pi f s) \, d s,

(8)

and the

L_{A E F}

can be approximated to

L_{A E F} (t) \approx 10 \log_{10} (t - 0.06).

(9)

According to Equation (9), the

t_{a t t}

approaches, but does not fall below, 60 ms regardless of the intensity of the signal. For young participants and flat-sloped pure tones, the estimated RT was 460 ms, and the RT for pure tones with a flat slope is likely to be around 460 ms if the presentation level exceeds 30 dBSL, as shown in Figure 5a,b. Interestingly, the start time defines a possible minimum RT in this condition. As discussed above, the delayed start time of integration is also consistent with the previous studies [10,11,46] showing that RTs increase rapidly only when the level falls below approximately 30 dBSL.
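The effect of the delayed start time can be sketched by inverting Equation (9) and applying the logarithmic relation between RT and the attainment time. This is a rough illustration rather than the authors' code: the function names are hypothetical, the coefficients a and b are the rounded Table 4 values for young participants in the silent condition, and the −43.5 dB threshold is a hypothetical extrapolation beyond the measured levels. Because the coefficients are rounded, the results only approximate the RTs reported in the text.

```python
import math

START_TIME_S = 0.06  # start time of integration (Table 4, silent condition)
A_COEF = 0.38        # regression slope a for young participants (Table 4)
B_COEF = 0.92        # regression intercept b in seconds (Table 4)

def t_att_flat(threshold_db, start_time_s=START_TIME_S):
    """Invert Equation (9): time at which the flat-tone level reaches threshold_db."""
    return start_time_s + 10.0 ** (threshold_db / 10.0)

def estimated_rt(t_att_s, a=A_COEF, b=B_COEF):
    """RT estimated as a * log10(t_att) + b, in seconds."""
    return a * math.log10(t_att_s) + b

# Thresholds for 10, 20 and 30 dBSL, plus a hypothetical lower value.
for threshold_db in (-13.5, -23.5, -33.5, -43.5):
    t_att = t_att_flat(threshold_db)
    rt_ms = estimated_rt(t_att) * 1000.0
    print(f"{threshold_db:6.1f} dB: t_att = {t_att * 1000.0:6.1f} ms, RT = {rt_ms:5.0f} ms")

# Lower bound implied by the start time (t_att approaching 60 ms).
print(f"floor: {estimated_rt(START_TIME_S) * 1000.0:.0f} ms")
```

In this sketch the attainment time can never drop below the start time, so the estimated RT settles at a floor instead of diverging as the signal becomes more intense.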

4.4. Age-Related Effects

Although the

L_{30}

value for young participants was almost the same as the SL of the target signal, the

L_{30}

value for elderly participants was lower, as shown in Table 4. According to our model, elderly participants noticed the sound more quickly than younger participants. Although the presentation level was normalized by the individual hearing threshold, the measured RTs were significantly faster in elderly participants. As the presentation levels were determined based on the SL, the SPL of the pure tone for the elderly participants was approximately 24 dB higher than that for the young participants. Simply aligning the SLs of young and elderly participants made it difficult to compare their RTs. The larger variance of hearing thresholds in the elderly participant group is another issue that needs to be addressed when discussing the effect of ageing on RT.

The start times for both participant groups were the same in the silent condition (60 ms), whereas in the noise condition, there was a significant difference between the two groups (80 ms for young participants and 20 ms for elderly participants). For pure tones with a flat slope, the minimum RTs were 488 ms for young participants and 430 ms for elderly participants. This difference in RTs can be seen in Figure 10. In order to enable the elderly participants to react to the signals as many times as possible, the assigned SNRs were higher than for the young participants. However, such experimental adjustments made discussions on the effects of aging difficult, as with the silent conditions.

RT experiments using birdsong have reported that some bandpass noises can shorten the RTs, particularly for elderly participants [16]. In this study, the bandpass noise in BD3, BD2, BU2, and BU3 shortened the RTs for

38 %

of the responses from young participants and

39 %

of the responses from elderly participants. However, this tendency was not specific to elderly participants. One possible explanation is that the spectral overlap between the target tone and masker was insufficient to reveal age-related differences in auditory filtering. To investigate this possibility, additional RT measurements should be conducted using noisy targets whose spectra span multiple auditory filters. Under such conditions, individual audiograms may have a stronger influence on RTs and may reveal age-related differences. Elderly people often have sensorineural hearing loss, which broadens their auditory filters and makes it difficult to separate a target signal from background noise [49]. Therefore, before conducting such RT measurements, participants’ audiograms should be examined, and notched-noise masking tests should also be conducted to estimate individual auditory filter bandwidths.

In many studies of RT, the time taken for a motor process (e.g., pushing a button) is often treated as a constant, with minimal variation between individuals [50,51]. However, this study distinguishes between young and elderly participants, so differences in RT may include the decline in physical ability. Reaction time provides a comprehensive assessment of various aspects of ageing, which makes it difficult to observe auditory behavior in isolation. However, since the

L_{A E} (t)

model demonstrates high estimation accuracy also in elderly participants, the acoustic characteristics of the target sound and background noise alone can serve as sufficient explanatory variables.

4.5. Practical Implications for Architectural Sound Design

From the perspective of architectural acoustics, the present findings have practical implications for the design of perceptually effective auditory signals in built environments. In transportation facilities, hospitals, and other public buildings, notification sounds must be detectable within complex acoustic fields characterized by background noise and room reflections. The results indicate that detectability is determined not solely by overall sound pressure level but by cumulative acoustic energy relative to spectral masking within auditory critical bands.

Conventional architectural acoustic metrics—such as reverberation time (RT60), clarity (C50/C80), and speech transmission index (STI)—primarily describe spatial acoustic characteristics. The

L_{A E} (t)

-based approach complements these metrics by providing a detection-oriented parameter grounded in temporal energy accumulation. Designing notification sounds that (1) minimize spectral overlap with dominant environmental noise components and (2) promote rapid early energy accumulation may enhance perceptual salience without increasing overall sound levels. Further study is needed to investigate the interaction between the temporal

L_{A E} (t)

model and the spatial features of a room. For example, reverberation in a room modifies the

L_{A E} (t)

curves (Figure 4) of pure tones and changes the

t_{a t t}

. It would be interesting to examine whether the

L_{A E} (t)

model could also be applied in reverberant environments.

Of the birdsongs researched, the cuckoo’s call, which is often used in Japanese public spaces, was the easiest to notice subjectively [52] and had the shortest

t_{a t t}

[16]. In a noisy environment, sign sounds should differ from the dominant frequency of the noise source by at least four ERB numbers to minimize masking effects. For example, the sound energy of train noise is distributed below 500 Hz (ERB number 10) [53], so sign sounds in stations should be designed using sounds above 924 Hz (ERB number 14). Because high-frequency sounds can be unpleasant, the sign sound with the lowest frequency that still satisfies this separation should be selected according to the noise frequency.
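The four-ERB design rule can be turned into a small helper that returns the lowest signal frequency lying a given number of ERBs above a dominant noise frequency. This is a sketch under stated assumptions: it uses the continuous ERB-number formula of Glasberg and Moore (1990) rather than the band indices of Table 2, and the function names are hypothetical. For noise at 500 Hz it gives a value somewhat below the 924 Hz quoted above, which is the center of a band rather than a point on the continuous scale.

```python
import math

def hz_to_erb_number(freq_hz):
    """ERB-number scale of Glasberg and Moore (1990)."""
    return 21.4 * math.log10(0.00437 * freq_hz + 1.0)

def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / 21.4) - 1.0) / 0.00437

def min_signal_frequency_hz(noise_freq_hz, separation_erb=4.0):
    """Lowest frequency that lies separation_erb ERBs above the noise frequency."""
    return erb_number_to_hz(hz_to_erb_number(noise_freq_hz) + separation_erb)

# Train noise with energy concentrated below about 500 Hz, as in the example above.
lowest_hz = min_signal_frequency_hz(500.0)
print(f"lowest sign-sound frequency: {lowest_hz:.0f} Hz")
```

Choosing the lowest frequency that satisfies the separation follows the recommendation above to avoid unnecessarily high-pitched sign sounds.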

The cuckoo’s call is just one example of a sound in nature with a sharp, abrupt beginning. For example, the sounds of nuts cracking and shrimp snapping are well known to have rapid early energy accumulation. Analyzing these sounds using an

L_{A E} (t)

model makes it possible to determine whether listeners can react to them easily in the noise environment of the target public space. Finally, the sense of urgency or familiarity that the sound conveys needs to be adjusted according to its intended use.

5. Conclusions

The

L_{A E} (t)

model is suitable for estimating RTs for pure tones with different amplitude-modulated envelopes. When the

L_{A E} (t)

is calculated from pure tones at the listener’s threshold, the

t_{a t t}

that is highly correlated with the RTs can be approximated to a moment at an

L_{A E}

lower than 0 dB (the hearing threshold) by almost the SL of the pure tone. For example, RTs measured from a pure tone at 30 dB SL can be estimated at the

t_{a t t}

at which

L_{A E}

reaches

- 33.5

dB for young participants. However, note that the calculation of

L_{A E} (t)

begins 60 ms (start time) after the onset of the signal.

When the amplitude-modulated pure tones overlap with the background noise, the RT can be estimated using the

L_{A E} (t)

model, as in the silent condition. However, the value used to specify

t_{a t t}

corresponds to the masking amount against the target pure tone. Therefore, as with subjective loudness, RT is influenced by spectral masking. Unlike subjective loudness, however, the pure tones must be at least four ERBs away from the noise frequency to minimize the effect of spectral masking on RT.

Future studies should examine RT estimation using a wider diversity of acoustic stimuli and more complex acoustic environments. In particular, targets with richer spectral and temporal structures and signals that span multiple auditory filters should be investigated to better understand how cumulative acoustic energy interacts with auditory filtering in realistic listening situations. In addition, incorporating individual audiograms and auditory-filter bandwidths into the

L_{A E} (t)

framework may improve the prediction accuracy of RTs and help clarify age-related differences in auditory processing.

Author Contributions

R.S.: Conceptualization, Methodology, Software, Investigation, Analysis, Writing—original draft. Y.S.: Implementation of psychoacoustic tests, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by a Grant-in-Aid for Scientific Research (B) from the Japan Society for the Promotion of Science (JP22H03916).

Institutional Review Board Statement

Approval for the experimental protocol (Human 2020-0227L) was granted by the ethics committee of the National Institute of Advanced Industrial Science and Technology (AIST).

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the participants for their cooperation during the experiments and the Human Resources Centers for the Aged at Ikeda (Osaka). In addition, the authors thank Takako Nakazawa, who helped implement the psychoacoustic experiments.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Luce, R.D. Response Times: Their Role in Inferring Elementary Mental Organization; Oxford University Press: Oxford, UK, 1986.
2. Chocholle, R. Variation des temps de réaction auditifs en fonction de l’intensité à diverses fréquences. Année Psychol. 1940, 41, 65–124.
3. Burke, K.S.; Crestone, M.E.; Shutts, R.E. Hearing loss and reaction time. Arch. Otolaryngol. 1965, 81, 49–56.
4. Chocholle, R.; Greenbaum, H. La sonie de sons purs partiellement masqués. Étude comparative par une méthode d’égalisation et par la méthode des temps de réaction. J. Psychol. Norm. Pathol. 1966, 63, 385–414.
5. Warm, J.S.; Foulke, E. Effects of rate of signal rise and decay on reaction time to the onset and offset of acoustic stimuli. Percept. Psychophys. 1970, 7, 159–160.
6. Emmerich, D.S.; Pitchford, L.J.; Becker, C.A. Reaction time to tones in tonal backgrounds and a comparison of reaction time to signal onset and offset. Percept. Psychophys. 1976, 20, 210–214.
7. Marshall, L.; Brandt, J. The relationship between loudness and reaction time in normal hearing listeners. Acta Otolaryngol. 1980, 90, 244–249.
8. Kohfeld, D.L.; Santee, J.L.; Wallace, N.D. Loudness and reaction time: I. Percept. Psychophys. 1981, 29, 535–549.
9. Kemp, S. Reaction time to a tone in noise as a function of the signal-to-noise ratio and tone level. Percept. Psychophys. 1984, 36, 473–476.
10. Epstein, M.; Florentine, M. Reaction time to 1- and 4-kHz tones as a function of level. Ear Hear. 2006, 27, 424–429.
11. Schlittenlacher, J.; Ellermeier, W. Simple reaction time to the onset of time-varying sounds. Atten. Percept. Psychophys. 2015, 77, 2424–2437.
12. Schlittenlacher, J.; Ellermeier, W.; Avci, G. Simple reaction time for broadband sounds compared to pure tones. Atten. Percept. Psychophys. 2017, 79, 628–636.
13. ISO 532-1; Acoustics-Methods for Calculating Loudness-Part 1: Zwicker Method. International Organization for Standardization: Geneva, Switzerland, 2017.
14. Florentine, M.; Buus, S.; Poulsen, T. Temporal integration of loudness as a function of level. J. Acoust. Soc. Am. 1996, 99, 1633–1644.
15. Buus, S.; Florentine, M.; Poulsen, T. Temporal integration of loudness, loudness discrimination and the form of the loudness function. J. Acoust. Soc. Am. 1997, 101, 669–680.
16. Shimokura, R.; Soeta, Y. Estimation of reaction time for birdsongs and effects of background noise and listener’s age. Appl. Acoust. 2022, 194, 108785.
17. ISO 1996-1:2016; Acoustics-Description, Measurement and Assessment of Environmental Noise—Part 1: Basic Quantities and Assessment Procedures. International Organization for Standardization: Geneva, Switzerland, 2016.
18. Namba, S.; Kuwano, S.; Fastl, H. Loudness of non-steady-state sound. Jpn. Psychol. Res. 2008, 50, 154–166.
19. Munson, W.A. The growth of auditory sensation. J. Acoust. Soc. Am. 1947, 19, 584–591.
20. Glasberg, B.R.; Moore, B.C.J. A model of loudness applicable to time-varying sounds. J. Audio Eng. Soc. 2002, 50, 331–342.
21. Plomp, R.; Bouman, M.A. Relationship between hearing threshold and duration for tone pulse. J. Acoust. Soc. Am. 1959, 31, 749–758.
22. Zwislocki, J.J. Theory of temporal auditory summation. J. Acoust. Soc. Am. 1960, 32, 1046–1060.
23. Zwislocki, J.J. Temporal summation of loudness: An analysis. J. Acoust. Soc. Am. 1969, 46, 431–441.
24. Poulsen, T. Loudness of tone pulses in a free field. J. Acoust. Soc. Am. 1981, 69, 1786–1790.
25. Hots, J.; Rennies, J.; Verhey, J.L. Influence of time constants and comparison on the prediction of temporal integration of loudness. In Proceedings of the AIA-DAGA 2013 International Conference on Acoustics, Merano, Italy, 18–21 March 2013; pp. 1266–1268.
26. Heil, P.; Neubauer, H. A unifying basis of auditory thresholds based on temporal summation. Proc. Natl. Acad. Sci. USA 2003, 100, 6151–6156.
27. Heil, P.; Matysiak, A.; Neubauer, H. A probabilistic Poisson-based model accounts for an extensive set of absolute auditory threshold measurement. Hear. Res. 2017, 353, 135–161.
28. Miller, J.; Ulrich, R. Simple reaction time and statistical facilitation: A parallel grains model. Cogn. Psychol. 2003, 46, 101–151.
29. Jaramillo, F.; Wiesenfeld, K. Mechanoelectrical transduction assisted by Brownian motion: A role for noise in the auditory system. Nat. Neurosci. 1998, 1, 384–388.
30. Henry, K.R. Noise improves transfer of near-threshold, phase-locked activity of the cochlear nerve: Evidence for stochastic resonance? J. Comp. Physiol. A. 1999, 184, 577–584.
31. Zeng, F.G.; Fu, Q.J.; Morse, R. Human hearing enhanced by noise. Brain Res. 2000, 869, 251–255.
32. Moss, F.; Ward, L.M.; Sannita, W.G. Stochastic resonance and sensory information processing: A tutorial and review of application. Clin. Neurophysiol. 2004, 115, 267–281.
33. Ries, D.T. The influence of noise type and level upon stochastic resonance in human audition. Hear. Res. 2007, 228, 136–143.
34. Ward, L.M.; MacLean, S.E.; Kirschner, A. Stochastic resonance modulates neural synchronization within and between cortical sources. PLoS ONE 2010, 5, e14371.
35. Yerkes, R.M.; Dodson, J.D. The relation of strength of stimulus to rapidity of habit-formation. J. Comp. Neurol. Psychol. 1908, 18, 459–482.
36. Broadbent, D.E. A reformulation of the Yerkes-Dodson law. Br. J. Math. Stat. Psychol. 1965, 18, 145–157.
37. Mendl, M. Performing under pressure: Stress and cognitive function. Appl. Anim. Behav. Sci. 1999, 65, 221–244.
38. Wasano, K.; Kaga, K.; Ogawa, K. Patterns of hearing changes in women and men from denarians to nonagenarians. Lancet Reg. Health West. Pac. 2021, 9, 100131.
39. Heeren, W.; Hohmann, V.; Appell, J.E.; Verhey, J.L. Relation between loudness in categorical units and loudness in phons and sones. J. Acoust. Soc. Am. 2013, 133, EL314–EL319.
40. Available online: https://jp.mathworks.com/help/audio/ref/gammatonefilterbank-system-object.html (accessed on 19 February 2026).
41. Moore, B.C.J.; Glasberg, B.R. Derivation of auditory filter shapes from notched-noise data. Hear. Res. 1990, 47, 103–138.
42. Hartmann, W.M. Signals, Sound, and Sensation; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2004; p. 251.
43. Levitt, H. Transformed up-down methods in psychoacoustics. J. Acoust. Soc. Am. 1971, 49, 467–477.
44. Garner, W.R.; Miller, G.A. The masked threshold of pure tones as a function of duration. J. Exp. Psychol. 1947, 37, 293–303.
45. Hawkins, J.E.; Stevens, S.S. The masking of pure tones and of speech by white noise. J. Acoust. Soc. Am. 1950, 22, 6–13.
46. Wagner, E.; Florentine, M.; Buus, S.; McCormack, J. Spectral loudness summation and simple reaction time. J. Acoust. Soc. Am. 2004, 116, 1681–1686.
47. Raab, D.H.; Grossberg, M. Reaction time to changes in the intensity of white noise. J. Exp. Psychol. 1965, 69, 609–612.
48. Kohfeld, D.L.; Goedecke, D.W. Intensity and predictability of background noise as determinants of simple reaction time. Bull. Psychon. Soc. 1978, 12, 129–132.
49. Glasberg, B.R.; Moore, B.C.J. Auditory filter shapes in subjects with unilateral and bilateral cochlear impairments. J. Acoust. Soc. Am. 1986, 79, 1020–1033.
50. Ulrich, R.; Stapf, K.H. A double response paradigm to study stimulus intensity effects upon the motor system. Percept. Psychophys. 1984, 36, 545–558.
51. Ulrich, R.; Wing, A.M. A recruitment theory of force-time relations in production of brief force pulses: The parallel force unit model. Psychol. Rev. 1991, 98, 268–294.
52. Soeta, Y.; Ariki, A. Subjective salience of birdsong and insect song with equal sound pressure level and loudness. Int. J. Environ. Res. Public Health 2020, 17, 8858.
53. Shimokura, R.; Soeta, Y. Characteristics of train noise in above-ground and underground stations with side and island platforms. J. Sound Vib. 2011, 330, 1621–1633.

Figure 1. Waveforms of pure tones with (a) flat, (b) gentle, and (c) steep slopes.

Figure 2. Spectra of bandpass noises (solid lines) and a pure tone (dotted line) in (a) high- and (b) low-SNR conditions.

Figure 3. Experimental conditions.

Figure 4. Time-functional sound-exposure levels (

L_{A E} (t)

) of pure tones with flat, gentle, and steep slopes and example of

t_{a t t}

for the steep-sloped tone at 10 (

L_{10}

), 20 (

L_{20}

), and 30 (

L_{30}

) dBSL.

Figure 5. Averaged reaction times in the silent condition at each SL (error bar: standard deviation).

Figure 6. Averaged reaction times under white noise at each SNR (error bar: standard deviation).

Figure 7. Averaged reaction times under the high-SNR bandpass noise for each ERB number (error bar: standard deviation). Vertical lines: ERB numbers containing the pure tones; dotted horizontal lines: RTs at 30 dBSL in the silent condition.

Figure 8. Averaged reaction times under the low-SNR bandpass noise for each ERB number (error bar: standard deviation). Vertical lines: ERB numbers containing the pure tones; dotted horizontal lines: RTs at 30 dBSL in the silent condition.

Figure 9. Relationships between reaction time and masking amount for (a) young and (b) elderly participants.

Figure 10. Measured (solid lines) and estimated (dotted lines) reaction times for mask-low, -middle, and -high categories.

Table 1. Summary list of signals and experimental conditions.

Signal	Frequency	1 and 2 kHz
Signal	Envelope	Flat, Gentle and Steep
Silent condition	Signal level	10, 20 and 30 dBSL
Noise condition	Signal level	30 dBSL
	Noise type	White noise and Six bandpass noises
	SNR	High and Low

Table 2. ERB number and center frequency of bandpass noise.

		1 kHz Pure tone
	Name	BD3	BD2	BD1	BU1	BU2	BU3
High-SNR	ERB number	10	12	14	16	18	20
High-SNR	Frequency [Hz]	516	698	924	1206	1556	1991
Low-SNR	ERB number	9	11	13	17	19	21
Low-SNR	Frequency [Hz]	439	602	805	1371	1762	2247
		2 kHz Pure tone
	Name	BD3	BD2	BD1	BU1	BU2	BU3
High-SNR	ERB number	15	17	19	21	23	25
High-SNR	Frequency [Hz]	1057	1371	1762	2247	2852	3603
Low-SNR	ERB number	14	16	18	22	24	26
Low-SNR	Frequency [Hz]	924	1206	1556	2533	3207	4045

Table 3. Variables in the

L_{A E} (t)

model.

Variables	Explanation
P	Sound pressure of the target signal
$P_{m i n}$	Sound pressure at hearing threshold
f	Frequency of the target signal
$C P_{F}, C P_{G}, C P_{S}$	Time-cumulative pressures for flat, gentle and steep slopes
$L_{A E F}, L_{A E G}, L_{A E S}$	Approximated $C P_{F}, C P_{G}, C P_{S}$
$L_{10}, L_{20}, L_{30}$	Threshold levels for noticing sounds with 10, 20, and 30 dBSL
$t_{a t t}$	Time until the threshold level is reached

Table 4. Parameters when

t_{a t t}

s were the most correlated with RTs.

		Start Time [s]	$L_{30}$ [dB]	a	b [s]	Correlation Coefficient	Averaged Error [s]
Silent condition	Young	0.06	$-33.5$	0.38	0.92	0.96	0.02
Silent condition	Elderly	0.06	$-38$	0.57	1.11	0.99	0.01
Noise condition	Young	0.08	$-31$	0.60	1.14	0.90	0.04
Noise condition	Elderly	0.02	$-40$	0.25	0.85	0.93	0.02
