1. Introduction
Speech intelligibility is an important requirement in rooms intended for speech, such as classrooms, lecture halls, and auditoria. Adequate intelligibility depends on acoustic conditions, which are mostly determined by room geometry and acoustic treatment. The required acoustic conditions are specified in standards and building regulations [1,2,3,4]. The present study focuses on room acoustics as part of the architectural design process, where the objective is to achieve satisfactory speech intelligibility through appropriate control of acoustic parameters.
The state-of-the-art procedure for the acoustic design of large auditoria, theaters, and other specialized spaces is to perform 3D acoustic simulations, mostly using geometrical acoustics methods such as ray tracing [5], sometimes complemented by modal calculations at low frequencies [6]. A simplified procedure for smaller rooms, such as multiple classrooms for a new school, usually relies on reverberation time (RT) estimates obtained with the Sabine formula in SI units (1) [7], the Eyring formula (2) [8], or other empirical formulas:

$$T = \frac{0.16\,V}{S\,\bar{\alpha}} \qquad (1)$$

where V is the room volume, S is the surface area of the room, and $\bar{\alpha}$ is the average absorption coefficient. The empirical coefficient 0.16 varies from 0.16 to 0.164 [9], depending on the assumed speed of sound. In general, Sabine's formula is applicable to a diffuse field in rectangular rooms with a flat ceiling and homogeneous boundary conditions without highly sound-absorptive materials.
Eyring's reverberation time equation is given as

$$T = \frac{0.16\,V}{-S\,\ln\left(1-\bar{\alpha}\right)} \qquad (2)$$

where the average absorption is expressed as $-\ln(1-\bar{\alpha})$ to account for high absorption values of the room surfaces.
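As a minimal sketch of Eqs. (1) and (2), the two estimates can be compared for a room. The room dimensions and the mean absorption coefficient below are illustrative assumptions, not data from this study.

```python
import math


def sabine_rt(volume: float, surface: float, mean_alpha: float, k: float = 0.16) -> float:
    """Sabine reverberation time in seconds, Eq. (1): T = k*V / (S * mean_alpha)."""
    if mean_alpha <= 0.0:
        raise ValueError("mean_alpha must be positive")
    return k * volume / (surface * mean_alpha)


def eyring_rt(volume: float, surface: float, mean_alpha: float, k: float = 0.16) -> float:
    """Eyring reverberation time in seconds, Eq. (2): T = k*V / (-S * ln(1 - mean_alpha))."""
    if not 0.0 < mean_alpha < 1.0:
        raise ValueError("mean_alpha must lie strictly between 0 and 1")
    return k * volume / (-surface * math.log(1.0 - mean_alpha))


# Hypothetical 10 m x 7 m x 3 m classroom with a mean absorption coefficient of 0.2
V = 10 * 7 * 3                      # 210 m^3
S = 2 * (10 * 7 + 10 * 3 + 7 * 3)   # 242 m^2
print(f"Sabine: {sabine_rt(V, S, 0.2):.2f} s")  # 0.69 s
print(f"Eyring: {eyring_rt(V, S, 0.2):.2f} s")  # 0.62 s
```

For any $0 < \bar{\alpha} < 1$, $-\ln(1-\bar{\alpha}) > \bar{\alpha}$, so Eyring's estimate is always the shorter of the two. The gap widens as absorption increases.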
In his classical paper [10], Bradley stated that reverberation time alone is not sufficient to properly estimate speech intelligibility. The study showed that the 50 ms useful-to-detrimental energy ratio ($U_{50}$) and combinations of RT with background noise were the strongest predictors of speech intelligibility. Bradley's paper established the basis for modern studies on speech intelligibility in classrooms. One important feature of his study was that the measurements were conducted in occupied rooms.
Room shape also has a significant impact on the interrelations between acoustic parameters, as shown by Barron and Lee [11]. Their acoustic measurements were conducted in 15 concert halls and two multipurpose music spaces in the UK, focusing on early decay time (EDT), the early (up to 80 ms) to late sound index $C_{80}$, and overall sound level. The prediction model matched the measured data well on average, although deviations occurred depending on factors such as ceiling diffusivity, hall geometry (e.g., fan-shaped plans), and absorption in the stage area. To establish well-grounded estimates, the present study includes only rectangular rooms and spaces.
The amount of absorption influences the reverberation time and the early-to-late sound ratios. However, the distribution of absorption (on the ceiling only, on the walls only, or on both) and the presence of sound-diffusing elements (furniture or dedicated diffusers) have a significant impact on speech intelligibility while having relatively little influence on RT [12,13]. As noted by Harvie-Clark [14], RT often fails to provide precise estimates of speech intelligibility because of nonlinear decay in non-diffuse rooms. Nilsson [15] also showed that in spaces with non-uniform absorption (e.g., ceiling absorption), reliance on classical RT formulas such as Sabine's is misleading. The author introduced a two-sound-field approach, separating grazing and non-grazing sound fields, which gave more accurate results. This approach was further developed in [16], where a statistical energy model was proposed to predict reverberation time, speech clarity, and strength in rectangular rooms with absorbent ceilings. The model separates grazing and non-grazing sound fields and incorporates the scattering effects of furniture, demonstrating significantly better agreement with measurements than classical diffuse-field models. However, the model relies on knowledge of surface impedance, which is commonly not provided by material manufacturers, and it applies only to rooms with ceiling absorption.
Room size is another significant factor influencing speech intelligibility. Pelegrín-García et al. [17] concluded that rooms with a volume below 210 m³ should have a reverberation time of approximately 0.6–0.7 s in unoccupied conditions and 0.45–0.6 s when occupied (with fewer than 40 students). The authors emphasized the importance of voice support ($ST_V$) for speakers, as well as the influence of background noise levels on listeners. A prediction model was also introduced.
Apart from Bradley [10], the importance of background noise was also stressed by Nijs and Rychtáriková [18], who used the $U_{50}$ parameter. Their predictive model, based on $U_{50}$, incorporates reverberation time and the signal-to-noise ratio (SNR), among other parameters. $U_{50}$ is defined in the literature as the difference between the early sound pressure level (SPL) and the late SPL combined with background noise. Including background noise in the metric is beneficial, as it yields more realistic estimates of speech intelligibility. For the design process, however, it may not be practical, because the background noise level depends on the number of students and the type of learning activity, and behavioral and cultural aspects also play a role. For a designer, the only controllable background noise sources are mechanical installations, such as HVAC (heating, ventilation, and air conditioning) systems. It was found that under noisy conditions, when the SNR is low, a slightly higher reverberation time can improve speech intelligibility because of increased sound strength (G) and useful reflections. Notably, the authors provide room design guidelines for architects and acoustic consultants in the form of graphs.
Harvie-Clark and Dobinson [14] concluded that G and $C_{50}$ correlate well with perceived loudness and speech intelligibility. They showed that the spatial variation of these parameters follows predictable relationships with room absorption, geometry, and source–receiver distance, closely aligning with theoretical models. Similar results had previously been demonstrated by Nilsson [13].
One of the more recent studies, by Arvidsson et al. [19], reached conclusions similar to Harvie-Clark's: absorptive treatments consistently reduced strength G and reverberation time, while diffusers preserved G values and improved $C_{50}$, particularly when vertically oriented. In another paper, Nilsson [20] concluded that including the scattering effects of furniture significantly improved prediction accuracy for reverberation time, $C_{50}$, and G compared to models assuming homogeneous absorption. Thus, the presence of furniture in a room is another factor that must be taken into account.
Despite substantial research on speech intelligibility, current regulations in several European countries still rely primarily on reverberation time as the main design criterion for classrooms and similar spaces. As an exception, Latvian regulations [3] include the musical clarity parameter $C_{80}$ for rooms intended for speech, although this choice remains debatable for speech-focused spaces. Overall, speech clarity parameters such as $C_{50}$ are rarely included in practical design workflows because of modeling complexity, which motivates the development of simplified analytical estimation tools.
Acoustic simulations require dedicated software, which, from a consultant's perspective, increases the cost of the design process because more time is needed to perform multiple calculations for similar yet non-identical rooms. Designing a series of such rooms for speech in new or renovated buildings becomes cumbersome when speech intelligibility parameters are added as control criteria alongside reverberation time. Therefore, an analytical tool that allows estimation of speech intelligibility measures, such as speech clarity $C_{50}$, is beneficial for ensuring satisfactory acoustic conditions throughout the entire room.
1.1. C50 Models
Speech clarity $C_{50}$ is defined in ISO 3382-1 [21] as the ratio, in decibels, between the early sound energy (arriving from 0 to 50 ms after the direct sound, including the direct sound itself) and the late sound energy (arriving after 50 ms).
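Following the ISO 3382-1 definition, $C_{50} = 10\log_{10}\left(\int_0^{50\,\mathrm{ms}} p^2\,dt \big/ \int_{50\,\mathrm{ms}}^{\infty} p^2\,dt\right)$ dB, where p(t) is the (octave-band-filtered) impulse response and t = 0 is the arrival of the direct sound. The Python sketch below approximates the integrals by sums. As a simplifying assumption, it takes the sample with the largest absolute amplitude as the direct-sound arrival. It also omits band filtering, noise compensation, and the onset-detection rule given in the standard.

```python
import numpy as np


def c50_from_ir(h: np.ndarray, fs: float) -> float:
    """Speech clarity C50 (dB) of an impulse response h sampled at fs Hz.

    Simplification: t = 0 is the sample with the largest absolute amplitude,
    used as a proxy for the direct-sound arrival (no band filtering applied).
    """
    h = np.asarray(h, dtype=float)
    onset = int(np.argmax(np.abs(h)))
    energy = h[onset:] ** 2              # squared sound pressure from t = 0
    n50 = int(round(0.050 * fs))         # 50 ms boundary in samples
    early = energy[:n50].sum()           # 0 to 50 ms
    late = energy[n50:].sum()            # after 50 ms
    return float(10.0 * np.log10(early / late))


# Check on an ideal exponential decay with T = 1.0 s (energy drops 60 dB in T)
fs, T = 48_000, 1.0
t = np.arange(int(2.0 * fs)) / fs
h = 10.0 ** (-3.0 * t / T)               # amplitude envelope: -60 dB energy at t = T
print(round(c50_from_ir(h, fs), 2))      # about -0.02 dB
```

For the ideal decay, the result matches the closed form $10\log_{10}(10^{0.3/T}-1)$, and it is independent of the overall gain of the impulse response.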
Bradley [10] introduced a second-order polynomial model of $C_{50}$ at 1 kHz, based on measured RT values, with a standard error of approximately ±1 dB:
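Barron and Lee (1988) [11] predicted early-to-late index values in 15 unoccupied concert halls from RT, using the classical theory [22,23]:

$$C_{80} = 10\log_{10}\left(e^{1.1/T} - 1\right)$$

where T is the reverberation time, the term 1.1 is related to the 80 ms threshold as $0.08 \times 13.82 \approx 1.1$, and 13.82 is the assumed linear reverberant sound decay slope ($13.82 = 6\ln 10$, corresponding to a 60 dB decay over the time T). In the same paper, Barron and Lee introduced an improved (revised) theory, in which the total sound energy at a receiver consists of the direct sound $d$, the early reflected sound $e$, and the late sound $l$, so that the early-to-late index is

$$C_{80} = 10\log_{10}\frac{d + e}{l}$$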
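To make the revised theory concrete, the Python sketch below assumes the commonly cited form of Barron and Lee's energy terms, which are not reproduced in the text above. The assumed terms are $d = 100/r^2$, $e = (31200\,T/V)\,e^{-0.04r/T}\left(1 - e^{-1.11/T}\right)$, and $l = (31200\,T/V)\,e^{-0.04r/T}e^{-1.11/T}$, with r in m, T in s, and V in m³. All energies are taken relative to the direct sound at 10 m. The hall values in the usage lines are hypothetical.

```python
import math


def c80_revised(r: float, T: float, V: float) -> float:
    """C80 (dB) from the commonly cited form of Barron and Lee's revised theory.

    r: source-receiver distance (m), T: reverberation time (s), V: volume (m^3).
    Energies are relative to the direct sound at 10 m (assumed formulation).
    """
    d = 100.0 / r**2                                   # direct sound
    reflected = 31200.0 * T / V * math.exp(-0.04 * r / T)
    e = reflected * (1.0 - math.exp(-1.11 / T))        # early reflected, 0 to 80 ms
    l = reflected * math.exp(-1.11 / T)                # late, after 80 ms
    return 10.0 * math.log10((d + e) / l)


# Hypothetical hall: V = 12000 m^3, T = 2.0 s
for r in (5.0, 15.0, 30.0):
    print(f"r = {r:4.1f} m -> C80 = {c80_revised(r, 2.0, 12000.0):.2f} dB")
```

Because $e/l = e^{1.11/T} - 1$ reproduces the classical ratio, the direct-sound term always lifts the revised $C_{80}$ above the classical value. The lift is largest close to the source.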
Based on Barron and Lee's theory, the authors of [18] formulated $C_{50}$ as the difference between the SPL arriving early (before 50 ms) at the listener's position and the SPL arriving late (after 50 ms), and derived the theoretical formula:
where $L_W$ is the sound power level of the source, Q is the source directivity, r is the source–receiver distance, $\bar{\alpha}$ is the average absorption coefficient, and mfp is the mean free path of the room, together with a distance factor. The distance factor accounts for the change in the effective density of reflected sound energy at the receiver with increasing source–receiver distance. It reflects the fact that early reflected energy increases with source–receiver distance more rapidly than predicted by ideal diffuse sound field models. The value of the distance factor adopted from [24] was found to provide good agreement between measured and predicted early-to-late energy ratios. This value reflects typical classroom geometries with dominant wall reflections.
AFMG [25], the developer of the EASERA software, and one of its authors, W. Ahnert [26], introduced an equation for the anticipated speech clarity:
where the model uses the half-room diffuse-field distance and the front-to-random factor of the speaker characteristic (directivity). This theoretical model is a revised version of (8), with a 50 ms threshold instead of 80 ms and with the speaker directivity added.
In [27], $C_{50}$ is formulated as a function of reverberation time for distances far enough from the source that the direct sound is not significant, and for rooms where the strength G is not much below 15 dB:

$$C_{50} = 10\log_{10}\left(e^{0.69/T} - 1\right)$$

where $0.69 \approx 0.05 \times 13.82$. This is a simplified form of both (8) and (11) for the case of exponential decay, without taking distance into account because the source is sufficiently far away.
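Under the exponential-decay assumption, both the far-field $C_{50}$ case and the classical $C_{80}$ expression used by Barron and Lee are instances of one function of the time limit $t_e$: $C_{t_e} = 10\log_{10}\left(e^{13.82\,t_e/T} - 1\right)$, with $13.82 \approx 6\ln 10$. Here is a minimal Python sketch, with illustrative reverberation times:

```python
import math

DECAY = 6.0 * math.log(10.0)   # about 13.82: energy falls by 10^6 (60 dB) over T


def clarity_exponential(T: float, te: float = 0.050) -> float:
    """Clarity (dB) for an ideal exponential decay, ignoring direct sound and distance.

    te = 0.050 s gives C50 and te = 0.080 s gives C80.
    """
    if T <= 0.0:
        raise ValueError("T must be positive")
    return 10.0 * math.log10(math.exp(DECAY * te / T) - 1.0)


for T in (0.4, 0.6, 0.8, 1.2):
    print(f"T = {T:.1f} s -> C50 = {clarity_exponential(T):+.1f} dB")
```

This form depends on T only. It therefore cannot reproduce the spatial variation of clarity with source–receiver distance that motivates the empirical models in this study.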
Previous work for this research [28] resulted in a speech clarity model for rooms with ceiling and back-wall absorption, averaged over the 125–4000 Hz octave bands:
This model was developed with typical modern classroom design in mind, for non-diffuse sound field conditions and inhomogeneous boundary conditions, using empirical data from 181 individual measurements in 9 classrooms of a similar type.
The present paper uses a more diverse dataset of 455 entries from 30 different rooms. As shown above, clarity depends largely on both the distance r and the reverberation time, so RT should also be included in an empirical model. Thus, the aim of this study is to develop an empirical, practical calculation model for speech clarity based on reverberation time and distance in rooms of different sizes. The resulting models demonstrate good practical applicability in the mid-to-high frequency range; however, their accuracy at low frequencies is reduced by modal effects.
1.2. Artificial Intelligence Use
Artificial intelligence tools were used for text editing and the generation of R scripts based on the algorithms and theory provided in the prompts by the author. The scripts were tested and validated by the author. The use of AI did not influence the research design, data collection, data analysis, interpretation of results, or scientific conclusions. All scientific content, analysis, and conclusions were produced by the author.
2. Room Acoustics Measurements
The study defines the following tasks:
Collect speech clarity $C_{50}$, reverberation time, and source–receiver distance data for 27 different rooms.
Select 80% of the data for model training and leave 20% for cross-validation (CV).
Perform regression analysis on training data using selected mathematical models and evaluate the models on real data using statistical metrics.
Perform a cross-validation check using the CV dataset.
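The tasks above can be sketched in Python with NumPy. This is not the author's R implementation. The random 80/20 split, the two candidate models, and the Gaussian least-squares form of AIC are assumptions chosen for illustration. The candidate models are logarithmic, $C_{50} = a + b\ln x$, and quadratic, $C_{50} = a + bx + cx^2$, in a predictor $x$ such as the product of distance and reverberation time. The data are synthetic.

```python
import numpy as np


def design_matrix(x: np.ndarray, kind: str) -> np.ndarray:
    """Columns for y = a + b*ln(x) ('log') or y = a + b*x + c*x**2 ('quad')."""
    if kind == "log":
        return np.column_stack([np.ones_like(x), np.log(x)])
    if kind == "quad":
        return np.column_stack([np.ones_like(x), x, x**2])
    raise ValueError(f"unknown model kind: {kind}")


def fit(x: np.ndarray, y: np.ndarray, kind: str) -> np.ndarray:
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(design_matrix(x, kind), y, rcond=None)
    return coef


def predict(coef: np.ndarray, x: np.ndarray, kind: str) -> np.ndarray:
    return design_matrix(x, kind) @ coef


def scores(y: np.ndarray, y_hat: np.ndarray, k: int) -> dict:
    """Adjusted R^2, RMSE and AIC; k = number of fitted coefficients incl. intercept."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    return {
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - k),
        "rmse": float(np.sqrt(rss / n)),
        "aic": float(n * np.log(rss / n) + 2 * k),   # additive constant omitted
    }


# Synthetic data standing in for measured (x, C50) pairs
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 40.0, size=300)                 # e.g. r*T in m*s
y = 8.0 - 3.5 * np.log(x) + rng.normal(0.0, 1.5, size=x.size)

idx = rng.permutation(x.size)                        # random 80/20 split
n_train = int(round(0.8 * x.size))
train, cv = idx[:n_train], idx[n_train:]

for kind in ("log", "quad"):
    coef = fit(x[train], y[train], kind)
    fit_stats = scores(y[train], predict(coef, x[train], kind), len(coef))
    cv_rmse = scores(y[cv], predict(coef, x[cv], kind), len(coef))["rmse"]
    rounded = {name: round(value, 2) for name, value in fit_stats.items()}
    print(kind, np.round(coef, 2), rounded, "CV RMSE:", round(cv_rmse, 2))
```

Adjusted $R^2$ penalizes the extra coefficient of the quadratic model, and AIC does so more explicitly. The RMSE on the held-out 20% is the cross-validation check.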
The reverberation parameter used in this study was chosen over the other reverberation parameters because it best reflects the classic analytical formulations of RT (Sabine's, Eyring's, or others), which are the standard way of estimating reverberation time in practical acoustics.
The data were collected partly from existing room acoustics measurements provided by Akukon and partly from measurements performed at Riga Technical University. All rooms were measured unoccupied. The studied rooms are divided into three acoustic categories:
Scatter reverberant: homogeneous boundary (HB), no or little absorption, semi- or fully scattering due to furniture, 7 different rooms/halls (Figure 1a).
Empty reverberant: HB, no or little absorption, no scattering (without furniture), 5 rooms (Figure 1b).
Directional absorptive: inhomogeneous boundary (IB), ceiling absorption and scattering due to furniture, 9 similar classrooms at the RTU campus in Riga and 6 more rooms/halls (Figure 1c).
All tested rooms are rectangular in shape. During the measurements, the temperature ranged between 18 and 23 °C, and the relative humidity was between 40% and 60%.
Room acoustic parameters were measured according to ISO 3382-1:2009 Acoustics—Measurement of room acoustic parameters—Part 1: Performance spaces [21]. The equipment used for the measurements was a Brüel & Kjær OmniPower Sound Source Type 4292-L with Power Amplifier Type 2734, a calibrated Dayton Audio EMM-6 measurement microphone powered by a Presonus AudioBox 22VSL sound interface, and the Odeon Auditorium measurement software. Impulse responses were measured and processed to obtain results in six octave bands (125–4000 Hz), mainly for $C_{50}$ and reverberation time. The principal geometry of the rooms (length, width, and ceiling height), as well as their acoustic conditions, was recorded. Source and receiver positions were also recorded, which allowed source–receiver distances to be calculated.
Category 1 includes two conference halls and two school auditoria of approximately 100 m², two classrooms of 70 m² and 61 m², and one music hall of 300 m², representing 96 data entries in total. The dimensions of these rooms range from 11 to 27 m in length, 6 to 11 m in width, and 2.6 to 5.6 m in height.
Category 2 consists of three sports halls of 708 m², 294 m², and 268 m², a showroom of 125 m², and a historic conference hall of 204 m², representing 95 data entries in total. The dimensions of these rooms range from 12 to 33 m in length, 9 to 22 m in width, and 3.5 to 10 m in height.
Category 3 has the largest dataset. Most of the measured rooms are university auditoria. These rooms have mineral wool acoustic ceiling tiles and mineral wool panels on the back wall, a conventional design for teaching premises. The only exception is the 27 m long room, which has a sound-reflecting glass cabinet. The rooms are furnished with tables and wooden chairs. The corridor-side walls have outward protrusions with a depth of 50–70 cm, and the windows have a similar shape. The rooms are therefore not perfectly rectangular and have at least some degree of scattering. All rooms have an average ceiling height of 2.66 ± 0.05 m and a width of 5.78 ± 0.5 m, so these dimensions can be considered similar for all rooms. The room lengths vary from 8.84 m to 27 m. In 8 of the 9 rooms, there were 3 separate source positions and 5 to 10 receiver positions for each source, producing 3 sets of measurements per room. Only one room, measured initially as a pilot test, had a single source position. Four further rooms belong to this category: a sports hall of 156 m² with ceiling absorption (CA), two school auditoria of 420 m² (CA) and 550 m² (CA and wall absorption, WA), and the previously mentioned showroom after acoustic treatment (CA, WA). In addition, one sports hall and one school auditorium were measured before and after additional sound absorption treatment, essentially adding two more rooms to the set. The total number of data entries for the third category is 264.
6. Discussion
The regression analysis reveals some differences between frequency bands and room categories in terms of model structure and prediction quality. A consistent pattern can be seen across all three categories: the lowest two octave bands behave differently from the mid–high frequency range.
6.1. Behavior of the 125 Hz Band
For 125 Hz, the best-fitting model differs across categories. Categories 1 and 2 show a quadratic dependence on the combined predictor $r \cdot T$, whereas Category 3 follows a logarithmic trend. This inconsistency between categories indicates that the 125 Hz band is governed not by diffuse-field statistical behavior but by the modal characteristics of each room. Moreover, Category 3 rooms have a much lower mean reverberation time (0.8 s) than the first and second categories (2.0 s and 3.7 s, respectively), which supports the assumption that the sound field in Category 3 differs from that in Categories 1 and 2.
In all three categories, the RMSE exceeds 2 dB and the adjusted coefficient of determination (adjusted $R^2$) remains low, confirming that none of the analytical models provides a reliable description of $C_{50}$ at 125 Hz.
This observation aligns with room acoustics theory: at low frequencies, the sound field is dominated by modal distributions rather than diffuse decay, and reverberation time fails to represent the underlying energy decay mechanism. Nilsson's investigations of non-diffuse fields and ceiling-dominated absorption [16] emphasized that classical RT-based models are not applicable where modal behavior prevails. The present data confirm that the product $r \cdot T$ is not an appropriate predictor of $C_{50}$ below approximately 200 Hz.
6.2. Behavior of the 250 Hz Band
The 250 Hz band presents an intermediate case. While the RMSE still exceeds 2 dB in all categories, the adjusted $R^2$ consistently rises above 0.5. Thus, although prediction errors remain relatively high, the models capture more than half of the variance, suggesting that diffuse-field behavior is emerging but not yet fully established at 250 Hz.
The differences between categories are also more pronounced at this frequency. In Category 1, the average adjusted $R^2$ between 500 and 4000 Hz is 0.67, whereas at 250 Hz it reaches only 0.53 (approximately 22% lower). A relative reduction is also observed for Category 2 (0.73 vs. 0.83; approximately 11% lower) and Category 3 (0.53 vs. 0.67; approximately 20% lower).
This transitional behavior is consistent with the interpretation that the 250 Hz band sits at the boundary between modal effects and diffuse-field energy decay and therefore exhibits larger variability across rooms with different boundary conditions.
6.3. Mid–High Frequency Behavior (500–4000 Hz)
In the frequency range of 500–4000 Hz, the regression models achieve substantially better performance in all categories. The RMSE values are typically around 1.6–1.8 dB on average, and the adjusted coefficients of determination fall within the range of 0.67 (Category 1 and Category 3) to 0.83 (Category 2).
In this frequency region, the results indicate that the predictor $r \cdot T$ provides a reliable basis for estimating $C_{50}$. The dependence of clarity on the combined effect of source–receiver distance and reverberation time follows a clear nonlinear trend, which is consistently captured by the logarithmic model formulations. This behavior is observed across all room categories, including those with non-diffuse or strongly inhomogeneous boundary conditions, suggesting that the underlying relationship is robust even when the sound field departs from ideal diffuse assumptions. The improved model performance in the 500–4000 Hz bands is consistent with the statistical nature of the sound field at these frequencies, where modal behavior is negligible. Overall, the results confirm that $r \cdot T$ is an effective predictor of $C_{50}$ in practical room conditions within this frequency range. The logarithmic formulations also align with existing theoretical models of clarity in rectangular and semi-diffuse rooms, which capture the direct and reverberant parts of the sound decay [10,11,31].
6.4. Comparison Between Room Categories
An evaluation across the three categories shows that Category 2 (empty rooms with homogeneous boundaries) consistently achieves the highest goodness-of-fit metrics. In almost every band, the RMSE is typically lower and the adjusted $R^2$ higher than in Category 1 and Category 3.
An unexpected result was the relative performance of the room categories. Category 2 rooms, which contain the fewest scattering elements, produced the most consistent regression behavior. It was initially assumed that Category 1 rooms would yield lower variability, as these spaces have more homogeneous boundary conditions and include some furniture that could introduce scattering. The observations indicate that this assumption does not hold. The furniture typically present in Category 1 rooms (mainly chairs and tables) provides only partial and uneven scattering, particularly at mid–high frequencies, and introduces additional non-uniformity rather than improving diffuseness. Consequently, the regression models for Category 2 outperform those for Category 1 by approximately 38%, suggesting that the acoustic behavior in Category 2 is closer to that predicted by classical Sabine decay, despite the minimal presence of scattering surfaces.
Category 1 and Category 3 rooms show more variability. Category 1 (reverberant) exhibits greater spatial irregularities due to furniture and geometry, while Category 3 (directional absorptive) includes inhomogeneous boundary conditions and scattering from furniture. These effects introduce deviations from idealized diffuse decay, which is reflected in higher dispersion in the measurements and correspondingly lower goodness-of-fit.
These observations indicate that the differences in regression performance depend not only on the reverberation time but also on the degree of scattering and boundary homogeneity.
7. Conclusions
This work presented an empirical approach for estimating the speech clarity index $C_{50}$ from the combined predictor $r \cdot T$. A dataset of measured acoustic parameters from rooms of different types was analyzed. Several regression models were evaluated, and their performance was assessed using the adjusted $R^2$, the Akaike information criterion (AIC), and the RMSE, followed by cross-validation on independent data.
The study showed that the behavior of $C_{50}$ can be clearly separated by frequency. For 125–250 Hz, neither quadratic nor logarithmic models provided reliable estimates because of dominant modal effects. For 500–4000 Hz, the models produced stable and consistent results, with RMSE typically below 2 dB and adjusted $R^2$ up to 0.83. Logarithmic formulations demonstrated the most robust behavior across all room categories in this frequency region.
Residual analysis confirmed that prediction errors follow an approximately Gaussian distribution, which supports the use of the standard deviation as an accuracy indicator. The cross-validation residual spread for mid–high frequencies was 1.7–2.7 dB, which represents the expected uncertainty range for practical predictions based on the proposed models. To obtain an approximately 95% range for practical estimates of $C_{50}$, the variability should be taken as twice the residual standard deviation. For the present models, this corresponds to an interval of about ±3.4 to ±5.4 dB in the mid–high frequency bands.
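The ±2σ rule above can be sketched in Python, assuming Gaussian residuals. The function name and the example prediction of 2.0 dB are hypothetical. For a normal distribution, about 95.4% of values lie within ±2σ; the exact 95% factor is 1.96.

```python
import numpy as np


def c50_interval(c50_pred: float, residual_sd: float, k: float = 2.0) -> tuple[float, float]:
    """Approximate 95% range for a C50 prediction: prediction +/- k * residual SD."""
    return c50_pred - k * residual_sd, c50_pred + k * residual_sd


# Residual spreads reported for the mid-high bands: 1.7 dB and 2.7 dB
for sd in (1.7, 2.7):
    lo, hi = c50_interval(2.0, sd)
    print(f"sd = {sd} dB -> [{lo:+.1f}, {hi:+.1f}] dB (half-width {2 * sd:.1f} dB)")

# Empirical check of the coverage of +/- 2 SD for Gaussian residuals
rng = np.random.default_rng(0)
res = rng.normal(0.0, 1.7, size=100_000)
coverage = np.mean(np.abs(res) <= 2.0 * res.std(ddof=1))
print(f"coverage within +/-2 SD: {coverage:.3f}")
```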
The main contribution of this study is the demonstration that $C_{50}$ can be estimated directly from distance and an analytically obtained reverberation time, for example, using Eyring's formula or more recent approaches to RT estimation [16], without room simulation. The combined predictor $r \cdot T$ captures the dominant decay trends in real rooms with both homogeneous and inhomogeneous boundaries. The resulting regression formulas can be applied for a preliminary assessment of speech clarity in early room design stages, for both architectural and acoustic purposes. The models are not intended to replace simulations but to complement them during preliminary design and verification.
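The early-design workflow described above can be outlined in Python. The sketch estimates T with Eyring's formula from the room geometry and mean absorption, forms the product of distance and T, and evaluates a logarithmic model. The model form $C_{50} = a + b\ln(rT)$ and the coefficients `A_HYP` and `B_HYP` are hypothetical placeholders. They must be replaced by the fitted values for the relevant octave band and room category; the sketch does not reproduce the regression results of this study. The rectangular-room geometry and the example inputs are also assumptions.

```python
import math


def eyring_rt_rect(length: float, width: float, height: float, mean_alpha: float) -> float:
    """Eyring reverberation time (s) of a rectangular room, Eq. (2) with k = 0.16."""
    volume = length * width * height
    surface = 2.0 * (length * width + length * height + width * height)
    return 0.16 * volume / (-surface * math.log(1.0 - mean_alpha))


def c50_log_model(r: float, T: float, a: float, b: float) -> float:
    """Hypothetical logarithmic model C50 = a + b*ln(r*T) (dB); a, b must come from a fit."""
    return a + b * math.log(r * T)


# Illustrative inputs only: a 9 m x 6 m x 2.7 m room with mean alpha 0.25,
# and placeholder coefficients A_HYP, B_HYP (not values from this study).
A_HYP, B_HYP = 10.0, -4.0
T = eyring_rt_rect(9.0, 6.0, 2.7, 0.25)
for r in (2.0, 4.0, 8.0):
    print(f"r = {r:.0f} m, T = {T:.2f} s -> C50 = {c50_log_model(r, T, A_HYP, B_HYP):.1f} dB")
```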
The method is limited at low frequencies, where modal behavior dominates and statistical decay parameters are no longer valid predictors. Future work may include extending the approach to incorporate background noise or directional source characteristics and validating the model on a larger sample of strongly inhomogeneous rooms.