Empirical Modeling of Speech Clarity C50 as a Function of Distance and Reverberation Time T30

Mironovs, Deniss

doi:10.3390/buildings16040749

Empirical Modeling of Speech Clarity C₅₀ as a Function of Distance and Reverberation Time T₃₀ ^†

by

Deniss Mironovs

Institute of High-Performance Materials and Structures, Riga Technical University, LV-1048 Riga, Latvia

^† This article is a revised and expanded version of a paper entitled “Linking Speech Clarity, Reverberation, and Distance for Classroom Design Optimization”, which was presented at the Forum Acusticum/EuroNoise 2025, Malaga, Spain, 23–26 June 2025.

Buildings 2026, 16(4), 749; https://doi.org/10.3390/buildings16040749

Submission received: 9 January 2026 / Revised: 3 February 2026 / Accepted: 10 February 2026 / Published: 12 February 2026

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

Abstract

The acoustic design of learning spaces is commonly carried out using geometrical acoustics simulations or analytical calculations. While 3D simulations provide high accuracy, they are time-consuming and resource-intensive, whereas analytical calculations are limited to reverberation time. This study proposes a set of empirical regression formulas for estimating the speech clarity index $C_{50}$ from the reverberation time $T_{30}$ and the source–receiver distance r. The models are intended for design verification, allowing a quick assessment of whether a proposed acoustic solution meets speech clarity criteria without using numerical simulations. A total of 455 measurement entries from 28 rooms were analyzed, representing three categories of acoustic conditions. Polynomial and logarithmic regression models were developed and evaluated using three statistical criteria: the adjusted coefficient of determination ($R_{adj}^{2}$), the Akaike Information Criterion (AIC), and the root mean square error (RMSE). The results show that logarithmic models generally provide better fit consistency across room types, whereas polynomial models describe lower frequency bands more accurately. The proposed relationships demonstrate practical potential for predicting $C_{50}$ for mid–high frequencies in real rooms using analytically obtained $T_{30}$ values and geometric distances. The proposed models are intended for early-stage building and classroom design, where numerical simulations are not yet available.

Keywords: room acoustics; speech clarity; reverberation time; regression analysis

1. Introduction

Speech intelligibility is an important requirement in rooms intended for speech, such as classrooms, lecture halls, and auditoria. Adequate intelligibility depends on acoustic conditions, which are mostly determined by room geometry and acoustic treatment. Acoustic conditions are set in design requirements specified in standards and building regulations [1,2,3,4]. The present study focuses on room acoustics as part of the architectural design process, where the objective is to achieve satisfactory speech intelligibility through appropriate control of acoustic parameters.

The state-of-the-art procedure for acoustic design of large auditoria, theaters, and other specific spaces is to perform 3D acoustic simulations, mostly using geometrical acoustics methods, such as ray-tracing [5], sometimes including modal calculations for low frequencies [6]. A simplified procedure for smaller rooms, such as multiple classrooms for a new school, usually relies on reverberation time (RT) estimates using the Sabine formula in SI units (1) [7], Eyring (2) [8], or other empirical formulas.

RT(f) = \frac{0.16 V}{S \bar{α}(f)}

(1)

where V is the room volume, S is the surface area of the room, and $\bar{α}(f)$ is the average absorption coefficient. The empirical coefficient, written here as 0.16, is reported with values between 0.16 and 0.164 [9]. In general, Sabine’s formula is applicable for a diffuse field in rectangular rooms with a flat ceiling and homogeneous boundary conditions without highly sound-absorptive materials.

Eyring’s reverberation time equation is given as

RT(f) = - \frac{0.161 V}{S \ln (1 - \bar{α}(f))}

(2)

where the average absorption is expressed as ${\hat{α}}_{E}(f) = - \ln (1 - \bar{α}(f))$ to account for high absorption values of the room surfaces.
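As an illustration, Equations (1) and (2) can be evaluated with a few lines of Python. This is a minimal sketch: the function names and the room (a hypothetical 9 m × 6 m × 3 m classroom) are assumptions made for the example and are not taken from the measured dataset.

```python
import math


def sabine_rt(volume, surface, alpha_mean, coeff=0.16):
    """Sabine reverberation time, Equation (1), in seconds (SI units)."""
    return coeff * volume / (surface * alpha_mean)


def eyring_rt(volume, surface, alpha_mean, coeff=0.161):
    """Eyring reverberation time, Equation (2), in seconds (SI units)."""
    return -coeff * volume / (surface * math.log(1.0 - alpha_mean))


# Hypothetical shoebox classroom (illustrative values only).
length, width, height = 9.0, 6.0, 3.0
volume = length * width * height
surface = 2.0 * (length * width + length * height + width * height)

for alpha in (0.1, 0.3, 0.5):
    print(f"alpha={alpha:.1f}  Sabine={sabine_rt(volume, surface, alpha):.2f} s  "
          f"Eyring={eyring_rt(volume, surface, alpha):.2f} s")
```

For small mean absorption the two estimates nearly coincide, because $- \ln (1 - \bar{α}) \approx \bar{α}$; the gap widens as the absorption increases.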

In his classical paper [10], Bradley stated that reverberation time alone is not enough to properly estimate speech intelligibility. The study showed that the 50 ms useful-to-detrimental energy ratio ($U_{50}$) and combinations of RT with background noise were the strongest predictors of speech intelligibility. Bradley’s paper established the basis for modern studies on speech intelligibility in classrooms. One important feature of his study was that measurements were conducted in occupied rooms.

Room shape also has a significant impact on the interrelations between acoustic parameters. This was shown by Barron and Lee [11]. Acoustic measurements were conducted across 15 concert halls and two multipurpose music spaces in the UK, focusing on early decay time (EDT), the early (up to 80 ms) to late sound index C80, and overall sound level. The prediction model matched measured data well on average, though variations occurred depending on factors such as ceiling diffusivity, hall geometry (e.g., fan-shaped plans), and stage area absorption. In order to establish well-grounded estimates for the present study, only rectangular rooms and spaces are included.

The amount of absorption influences the reverberation time and the early-to-late sound ratios. However, the distribution of absorption (only on the ceiling, only on the walls, or the ceiling and walls together), as well as sound-diffusing elements (furniture or specific diffusers), has a significant impact on speech intelligibility while having relatively little influence on RT [12,13]. As mentioned by Harvie-Clark [14], RT often fails to provide precise estimates of speech intelligibility due to nonlinear decay in non-diffuse rooms. Nilsson [15] also showed that in spaces with non-uniform absorption (e.g., ceiling absorption), reliance on classical RT formulas (such as Sabine’s) is misleading. The author introduced an approach with two sound fields, grazing and non-grazing, which gave more accurate results. This approach was further developed in [16], where a statistical energy model was proposed to predict reverberation time, speech clarity, and strength in rectangular rooms with absorbent ceilings. The model separates grazing and non-grazing sound fields and incorporates the scattering effects of furniture, demonstrating significantly improved agreement with measurements compared to classical diffuse-field models. However, it relies on knowledge of surface impedance, which is commonly not provided by material manufacturers, and it is applicable only to rooms in which absorption is confined to the ceiling.

Room size is another significant factor influencing speech intelligibility. In the paper by Pelegrín-García et al. [17], it was concluded that rooms with a volume below 210 m³ should have a reverberation time of approximately 0.6–0.7 s in unoccupied conditions and 0.45–0.6 s when occupied (with fewer than 40 students). The authors emphasized the importance of voice support $ST_{V}$ for speakers as well as the influence of background noise levels on listeners, and introduced a prediction model for $ST_{V}$.

Apart from Bradley [10], the importance of background noise was also stressed by Nijs and Rychtáriková [18], who used the $U_{50}$ parameter. Their predictive model, based on $U_{50}$, incorporates reverberation time, signal-to-noise ratio (SNR), and $C_{50}$. $U_{50}$ is defined in the literature as the difference between the early sound pressure level (SPL) and the combined SPL of late sound and background noise. The inclusion of background noise in the metric is beneficial, as it provides realistic estimates of speech intelligibility. However, for the design process, it may not be practical, as the background noise level depends on the number of students and the type of learning activity, as well as on behavioral and cultural aspects. For a designer, the only controllable background noise factor is noise from mechanical installations, such as HVAC (heating, ventilation, and air conditioning). Nijs and Rychtáriková found that under noisy conditions, when the SNR is low, a slightly higher reverberation time can improve speech intelligibility due to increased sound strength (G) and useful reflections. Notably, the authors provide room design guidelines for architects and acoustic consultants in the form of graphs.

Harvie-Clark and Dobinson [14] concluded that G and $C_{50}$ correlate well with perceived loudness and speech intelligibility. They showed that spatial variation of these parameters follows predictable relationships with room absorption, geometry, and source–receiver distance, aligning closely with theoretical models. Similar results were previously demonstrated by Nilsson [13].

One of the more recent studies, by Arvidsson et al. [19], reached conclusions similar to those of Harvie-Clark: absorptive treatments consistently reduced strength G and reverberation time, while diffusers preserved G values and improved $C_{50}$, particularly when vertically oriented. In another paper, Nilsson [20] concluded that including the scattering effects of furniture significantly enhanced prediction accuracy for $T_{60}$, $C_{50}$, and G, compared to models assuming homogeneous absorption. Thus, the presence of furniture in a room is another factor that must be taken into account.

Despite substantial research on speech intelligibility, current regulations in several European countries still rely primarily on reverberation time as the main design criterion for classrooms and similar spaces. As an exception, Latvian regulations [3] include the musical clarity parameter $C_{80}$ for rooms intended for speech, although this choice remains debatable in the context of speech-focused spaces. Overall, speech clarity parameters such as $C_{50}$ are rarely included in practical design workflows due to modeling complexity, which motivates the development of simplified analytical estimation tools.

Acoustic simulations require dedicated software, which, from a consultant’s perspective, increases the cost of the design process because more time is needed to perform multiple calculations for similar yet non-identical rooms. Designing a series of such rooms for speech in new or renovated buildings becomes cumbersome when speech intelligibility parameters are added as control criteria alongside reverberation time. Therefore, it is beneficial to have an analytical tool that allows estimation of speech intelligibility measures, such as speech clarity $C_{50}$, to ensure satisfactory acoustic conditions throughout the entire room.

1.1. C₅₀ Models

Speech clarity $C_{50}$ is defined in ISO 3382-1 [21] as the logarithmic ratio of the sound energy arriving early (from 0 to 50 ms) to the sound energy arriving late (after 50 ms).

Bradley [10] introduced a second-order polynomial model for $C_{50}$ at 1 kHz based on measured RT values, with a standard error of approximately ±1 dB:

C_{50} = - 20.83 RT + 7.020 RT^{2} + 14.204 \pm 0.98 \text{ dB}

(3)
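A short sketch of how Equation (3) behaves numerically is given below. The function name and the RT values are assumptions chosen for illustration; the ±0.98 dB term is the reported standard error and is not part of the computed value.

```python
def bradley_c50(rt):
    """C50 at 1 kHz in dB from Equation (3); rt is the reverberation time in s."""
    return -20.83 * rt + 7.020 * rt ** 2 + 14.204


# Location of the parabola's minimum: beyond it the polynomial rises again.
rt_at_minimum = 20.83 / (2.0 * 7.020)

for rt in (0.4, 0.6, 0.8, 1.0):
    print(f"RT={rt:.1f} s -> C50 ~ {bradley_c50(rt):.1f} dB")
print(f"polynomial minimum at RT ~ {rt_at_minimum:.2f} s")
```

Because the fitted parabola has its minimum near RT ≈ 1.48 s, the expression presumably applies only to shorter reverberation times; the valid range is not stated here and should be checked against [10].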

Barron and Lee in 1988 [11] made predictions of early-to-late index values in 15 unoccupied concert halls based on RT, using [22,23]

C = 10 \log (e^{1.1 / T} - 1)

(4)

where T is the reverberation time; the term 1.1 corresponds to the 80 ms threshold as $13.82 \times 0.08$, and 13.82 is the assumed linear reverberant sound decay slope. In the same paper, an improved theory is introduced, in which the total sound energy at a receiver consists of the direct sound d, the early reflected sound $e_{r}$, and the late sound l:

d = \frac{100}{r^{2}},

(5)

e_{r} = \left( \frac{31\,200 \, T}{V} \right) e^{- 0.04 r / T} (1 - e^{- 1.11 / T}),

(6)

l = \left( \frac{31\,200 \, T}{V} \right) e^{- 0.04 r / T} e^{- 1.11 / T},

(7)

so the early-to-late index is

C = 10 \log \left[ \frac{d + e_{r}}{l} \right].

(8)
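The effect of the distance-dependent terms can be seen by evaluating Equations (4) to (8) side by side. The following is a sketch only; the function names and the hall (V = 10 000 m³, T = 2.0 s) are hypothetical values chosen for the example.

```python
import math


def c80_classical(t):
    """Equation (4): early-to-late index from reverberation time alone, in dB."""
    return 10.0 * math.log10(math.exp(1.1 / t) - 1.0)


def c80_revised(r, t, volume):
    """Equations (5)-(8): index including direct sound and distance, in dB."""
    direct = 100.0 / r ** 2
    reflected = (31200.0 * t / volume) * math.exp(-0.04 * r / t)
    early = reflected * (1.0 - math.exp(-1.11 / t))
    late = reflected * math.exp(-1.11 / t)
    return 10.0 * math.log10((direct + early) / late)


# Hypothetical hall (illustrative values only).
for r in (5.0, 10.0, 20.0, 30.0):
    print(f"r={r:4.0f} m  revised={c80_revised(r, 2.0, 10000.0):5.2f} dB  "
          f"classical={c80_classical(2.0):5.2f} dB")
```

The common factor of Equations (6) and (7) cancels in the ratio $e_{r} / l$, so the distance dependence of Equation (8) enters only through the direct-sound term d relative to the late energy.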

Based on Barron and Lee’s theory, the authors of [18] formulated $C_{50}$ as the difference between the sound pressure level (SPL) arriving early (before 50 ms) at the listener’s position, $L_{p,early}$, and the SPL arriving late (after 50 ms), $L_{p,late}$, and derived the theoretical formulas:

L_{p,early} = L_{W} + 10 \log \left( \frac{Q}{4 π r^{2}} + \frac{4 {(1 - α)}^{fb \cdot r / mfp}}{α S} \left( 1 - \exp (- 0.69 / RT) \right) \right)

(9)

L_{p,late} = L_{W} + 10 \log \left( \frac{4 {(1 - α)}^{fb \cdot r / mfp}}{α S} \exp (- 0.69 / RT) \right)

(10)

C_{50} = L_{p,early} - L_{p,late}

(11)

where $L_{W}$ is the sound source power level, Q is the source directivity, r is the source–receiver distance, α is the absorption coefficient, S is the surface area of the room, mfp is the mean free path of the room, and $fb$ is a distance factor. The distance factor $fb$ accounts for the change in the effective density of reflected sound energy at the receiver with increasing source–receiver distance. It takes into consideration the fact that reflected energy decreases with source–receiver distance more rapidly than predicted by ideal diffuse sound field models. The distance factor $fb = 2$ [24] was found to provide good agreement between measured and predicted early-to-late energy ratios. This value reflects typical classroom geometries with dominant wall reflections.
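Equations (9) to (11) can be sketched as a single function, since $L_{W}$ cancels in the difference. The function name is hypothetical, and the mean free path is taken as 4V/S, the usual diffuse-field expression, which is an assumption because the text above does not define it; the room values are illustrative only.

```python
import math


def c50_nijs(r, rt, alpha, surface, volume, q=1.0, fb=2.0):
    """C50 in dB from Equations (9)-(11); the source power level L_W cancels."""
    mfp = 4.0 * volume / surface          # assumed mean free path, 4V/S
    attenuation = (1.0 - alpha) ** (fb * r / mfp)
    reverberant = 4.0 * attenuation / (alpha * surface)
    direct = q / (4.0 * math.pi * r ** 2)
    early = direct + reverberant * (1.0 - math.exp(-0.69 / rt))
    late = reverberant * math.exp(-0.69 / rt)
    return 10.0 * math.log10(early / late)


# Hypothetical classroom: V = 162 m^3, S = 198 m^2, alpha = 0.2, RT = 0.6 s.
for r in (1.0, 2.0, 4.0, 6.0):
    print(f"r={r:.0f} m -> C50 ~ {c50_nijs(r, 0.6, 0.2, 198.0, 162.0):.1f} dB")
```

As the direct term becomes negligible, the result approaches $10 \log [(1 - e^{- 0.69 / RT}) / e^{- 0.69 / RT}]$, the purely reverberant limit.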

EASERA software developer AFMG [25] and one of its authors, W. Ahnert [26], introduce an equation for anticipated speech clarity:

C_{50} = 10 \log \left( \frac{γ_{s} {\left( \frac{r_{H}}{r_{x}} \right)}^{2} + 1 - e^{- \frac{13.8 \cdot 0.05}{RT}}}{e^{- \frac{13.8 \cdot 0.05}{RT}}} \right) \text{ dB}

(12)

where $r_{H} = 0.057 \cdot \sqrt{\frac{V}{RT}}$ is a half-room diffuse-field distance, and $γ_{s}$ is the front-to-random factor of the speaker characteristic (directionality). This theoretical model is a revised version of (8) with a 50 ms threshold instead of 80 ms and with the speaker directionality $γ_{s}$ added.

In [27], $C_{50}$ is formulated as a function of reverberation time for distances well away from the source, such that direct sound is not significant, and in rooms where the strength G is not much below 15 dB:

C_{50} = 10 \log \left( \frac{1 - e^{- 0.69 / RT}}{e^{- 0.69 / RT}} \right),

(13)

which is a simplified form of both (8) and (11) for the case of exponential decay, without taking distance into account because the receiver is far enough from the source.
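The relationship between Equations (12) and (13) can be checked numerically: when the distance in Equation (12) grows, the direct-sound term vanishes and the result tends to Equation (13). The sketch below assumes that $r_{x}$ denotes the source–receiver distance (it is not defined explicitly above); the function names and room values are hypothetical.

```python
import math


def c50_diffuse(rt):
    """Equation (13): C50 in dB for purely exponential decay, no direct sound."""
    late = math.exp(-0.69 / rt)
    return 10.0 * math.log10((1.0 - late) / late)


def c50_with_direct(r_x, rt, volume, gamma_s=1.0):
    """Equation (12): C50 in dB including a direct-sound term."""
    r_h = 0.057 * math.sqrt(volume / rt)
    late = math.exp(-13.8 * 0.05 / rt)
    return 10.0 * math.log10((gamma_s * (r_h / r_x) ** 2 + 1.0 - late) / late)


# Hypothetical room: V = 162 m^3, RT = 0.6 s.
for r_x in (1.0, 2.0, 5.0, 50.0):
    print(f"r_x={r_x:5.1f} m  Eq.(12)={c50_with_direct(r_x, 0.6, 162.0):.2f} dB  "
          f"Eq.(13)={c50_diffuse(0.6):.2f} dB")
```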

Previous work within this research [28] resulted in a speech clarity model for rooms with ceiling and backwall absorption, averaged over the 125–4000 Hz octave bands:

C_{50} = 9.65 - 0.8 r + 0.02 r^{2}.

(14)

This model was developed with typical modern classroom design in mind, non-diffuse sound field conditions, and inhomogeneous boundary conditions using empirical data from 181 individual measurements from 9 classrooms of a similar type.

The present paper uses a more diverse dataset of 455 entries from 30 different rooms. As shown above, clarity largely depends on both distance r and reverberation time, so it is important to include RT in an empirical model as well. Thus, the aim of this study is to develop an empirical and practical calculation model for speech clarity $C_{50}$ based on the reverberation time and distance in rooms of different sizes. The resulting models demonstrate good practical applicability in the mid-to-high frequency range. However, their accuracy in the low-frequency region is reduced due to modal effects.

1.2. Artificial Intelligence Use

Artificial intelligence tools were used for text editing and the generation of R scripts based on the algorithms and theory provided in the prompts by the author. The scripts were tested and validated by the author. The use of AI did not influence the research design, data collection, data analysis, interpretation of results, or scientific conclusions. All scientific content, analysis, and conclusions were produced by the author.

2. Room Acoustics Measurements

The study defines the following tasks:

Collect speech clarity $C_{50}$, reverberation time $T_{30}$, and source–receiver distance data for 27 different rooms.
Select 80% of the data for model training and leave 20% for cross-validation (CV).
Perform regression analysis on training data using selected mathematical models and evaluate the models on real data using statistical metrics.
Perform a cross-validation check using the CV dataset.

The $T_{30}$ was chosen among other reverberation parameters because it corresponds most closely to the classical analytical formulations of RT (Sabine’s, Eyring’s, or others), which are the standard way to estimate reverberation time in practical acoustics.

The data were collected partly by using the available data from room acoustics measurements provided by Akukon and partly by performing measurements at Riga Technical University. All rooms were measured without students in them. The studied rooms are divided into three acoustic categories:

Scatter reverberant—homogeneous boundary (HB), no or little absorption, semi- or fully scattering due to furniture, 7 different rooms/halls (Figure 1a).
Empty reverberant—HB, no or little absorption, no scattering (without furniture), 5 rooms (Figure 1b).
Directional absorptive—inhomogeneous boundary (IB), ceiling absorption and scattering due to furniture, 9 similar classrooms at the RTU campus in Riga, and 6 more rooms/halls (Figure 1c).

All tested rooms are rectangular in shape. During measurements, the temperature ranged between 18 and 23 °C, while the relative humidity was between 40% and 60%.

The room acoustic parameters were measured according to ISO 3382-1:2009, Acoustics: Measurement of room acoustic parameters, Part 1: Performance spaces [21]. The equipment used for the measurements was a Brüel & Kjær OmniPower Sound Source Type 4292-L with Power Amplifier Type 2734, a calibrated Dayton Audio EMM-6 measurement microphone powered by a PreSonus AudioBox 22VSL audio interface, and Odeon Auditorium measurement software. Impulse responses were measured and processed to obtain results in six octave frequency bands, mainly for $C_{50}$ and $T_{30}$. The principal geometry of the rooms (length, width, and ceiling height) as well as their acoustic conditions were recorded. Source and receiver positions were also recorded, which allowed the source–receiver distances to be calculated.
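Computing the source–receiver distance from recorded positions is a plain Euclidean distance. A minimal sketch follows, assuming positions are stored as (x, y, z) coordinates in metres; the coordinate convention, function name, and example positions are hypothetical.

```python
import math


def source_receiver_distance(source_xyz, receiver_xyz):
    """Euclidean distance in metres between two (x, y, z) positions."""
    return math.dist(source_xyz, receiver_xyz)


# Hypothetical positions in room coordinates (metres).
source = (1.0, 3.0, 1.5)
receivers = [(3.0, 3.0, 1.2), (5.0, 2.0, 1.2), (8.0, 4.0, 1.2)]
distances = [source_receiver_distance(source, rec) for rec in receivers]
print([round(d, 2) for d in distances])
```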

Category 1 includes two conference halls and two school auditoria of approximately 100 m², two classrooms of 70 m² and 61 m², and one music hall of 300 m², representing 96 data entries in total. The dimensions of these rooms range from 11 to 27 m in length, 6 to 11 m in width, and 2.6 to 5.6 m in height.

Category 2 consists of three sports halls of 708 m², 294 m², and 268 m², a showroom of 125 m², and a historic conference hall of 204 m², representing 95 data entries in total. The dimensions of these rooms range from 12 to 33 m in length, 9 to 22 m in width, and 3.5 to 10 m in height.

Category 3 has the largest dataset. The majority of measured rooms are university auditoria. These rooms have mineral wool acoustic ceiling tiles and mineral wool panels on the back wall, a conventional design for teaching premises. The only exception is the 27 m long room, which had a sound-reflecting glass cabinet. There are tables and wooden chairs. The walls of the corridor have protrusions to the outside with a depth of 50–70 cm. A similar shape applies to the windows. The rooms are not perfectly rectangular and have at least some degree of scattering. All rooms have an average ceiling height of 2.66 ± 0.05 m and a width of 5.78 ± 0.5 m; thus, it is argued that these dimensions are similar for all rooms. The length of the rooms varies from 8.84 m to 27 m. In 8 of the 9 rooms, there were 3 separate sources and 5 to 10 individual receivers for each source, thus producing 3 sets of measurements for each room. Only one room had a single sound source, which was initially done as a pilot test. There are 4 extra rooms in this category—a sports hall of 156 m² with ceiling absorption (CA), two school auditoria of 420 (CA) and 550 m² (CA and wall absorption WA), and the previously mentioned showroom after acoustic treatment (CA, WA). One sports hall and one school auditorium were measured before and after additional sound absorption treatment, essentially providing two more rooms to the room set. The total number of data entries for the third category is 264.

3. Results

The summary of geometrical parameters for the three room categories is given in Table 1. The table also shows minimum, maximum, and average values of $T_{30}$ and $C_{50}$ across all data for each category, averaged for 500–2000 Hz octave bands.

In Category 1, the shapes of $T_{30}$ frequency curves show more or less uniform absorption across the six octave frequency bands, with some rooms having more low-frequency absorption relative to mid and high frequencies than others.

In Category 2, a large sports hall exhibits $T_{30}$ from 4 to 10 s. The other three rooms exhibit similar $T_{30}$ values between 1.5 and 1.9 s across mid frequencies.

Category 3 is the largest set of rooms and therefore provides the largest statistical sample of the three categories. The classrooms show an average $T_{30}$ of 0.6 s, and larger spaces have $T_{30}$ up to 2 s.

4. Theoretical Background for Regression Analysis

4.1. Polynomial and Logarithmic Regression Models

The mathematical formulations of clarity presented above show logarithmic and quadratic dependencies of $C_{50}$ on distance r and RT. The mathematical models used for regression analysis are therefore second-order polynomial, log-linear, and log-quadratic models. The predictor was chosen to be the product of the source–receiver distance r and the measured reverberation time $T_{30}$, denoted as $rT$. First, simple linear expressions (first-order polynomials) were used, but, as expected, they proved to be highly inaccurate in approximating the relationship between $C_{50}$ and $rT$. Third-order polynomial models were also tested; however, their $C_{50}$ estimates for higher r and $T_{30}$ values were unrealistically high or low, and extrapolation beyond the measured range was even less physically meaningful.

In the present study, empirical modeling of the clarity index $C_{50}$ as a function of distance and reverberation time $T_{30}$ was carried out using polynomial and logarithmic regression formulations. The general form of the second-order polynomial model is

C_{50} = β_{0} + β_{1} rT + β_{2} {(rT)}^{2},

(15)

where $β_{0}$, $β_{1}$, and $β_{2}$ are regression coefficients. The inclusion of the quadratic term allows the model to account for curvature in the spatial or temporal decay that cannot be described by a first-order (linear) polynomial model.

The log-linear model (base-10 logarithm) is defined as

C_{50} = β_{0} + β_{1} \log (rT),

(16)

and the log-quadratic model as

C_{50} = β_{0} + β_{1} \log (rT) + β_{2} {[\log (rT)]}^{2}.

(17)

Logarithmic dependencies may be advantageous for describing nonlinear sound decay, which varies with distance.
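The three model forms in Equations (15) to (17) are all linear in their coefficients, so they can be fitted by ordinary least squares. The sketch below is a standard-library Python illustration and not the R scripts used in the study; the function names are hypothetical, the quadratic terms are read as $(rT)^{2}$ and $[\log (rT)]^{2}$, and the data are synthetic values generated from an exact log-linear law rather than measurements.

```python
import math


def basis(x, model):
    """Regressor values for one predictor value x = r * T30."""
    if model == "poly2":
        return [1.0, x, x * x]
    lx = math.log10(x)
    if model == "log_linear":
        return [1.0, lx]
    if model == "log_quadratic":
        return [1.0, lx, lx * lx]
    raise ValueError(f"unknown model: {model}")


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(x_values, y_values, model):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    rows = [basis(x, model) for x in x_values]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, y_values)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, x, model):
    """Model estimate of C50 in dB for predictor value x."""
    return sum(b * f for b, f in zip(beta, basis(x, model)))


# Synthetic check data (not measurements): C50 = 8 - 6 log10(rT).
rt_product = [0.5 + 0.5 * i for i in range(40)]
c50 = [8.0 - 6.0 * math.log10(x) for x in rt_product]

for model in ("poly2", "log_linear", "log_quadratic"):
    print(model, [round(b, 3) for b in fit(rt_product, c50, model)])
```

The normal equations are adequate for two or three coefficients; for larger or poorly conditioned problems, a QR-based solver from a numerical library would be the safer choice.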

4.2. Regression Analysis Metrics

The coefficient of determination $R^{2}$ shows how well a regression model fits the data [29]. It measures the share of variation in the dependent variable explained by the predictors and ranges from 0 to 1, where higher values indicate a better fit; values above 0.5 are usually expected of an acceptable model:

R^{2} = \frac{\sum_{i} {({\hat{y}}_{i} - \bar{y})}^{2}}{\sum_{i} {(y_{i} - \bar{y})}^{2}}

(18)

where $y_{i}$ is the observed value for observation i, $\bar{y}$ is the mean of the observed values, and ${\hat{y}}_{i}$ is the predicted value. The numerator is the regression sum of squares, i.e., the variation explained by the model; the denominator is the total sum of squares, i.e., the total variation in the data.

The adjusted R-squared, $R_{adj}^{2}$ [29], is a modified version of $R^{2}$ that takes into account the number of independent variables p in the model. It penalizes the addition of uninformative variables and increases only if a new variable improves the accuracy of the model:

R_{adj}^{2} = 1 - \frac{(1 - R^{2}) (n - 1)}{n - p - 1}

(19)

where n is the total number of observations used to fit the regression model. The adjusted R-squared is more useful than $R^{2}$ here, as this study employs two different types of mathematical models with different numbers of coefficients.

The Akaike Information Criterion (AIC) provides a relative measure of model quality that balances goodness-of-fit with model simplicity [30]. An effective model explains the data well with the fewest possible parameters. AIC estimates the information loss associated with each candidate model and is defined as

AIC = 2 k - 2 \ln (L)

(20)

where k is the number of model parameters, and L is the maximum likelihood of the regression model assuming normally distributed residuals of $C_{50}$. It is obtained directly from the least-squares fit to the measured $C_{50}$. Lower AIC values indicate a more efficient model, achieving a better trade-off between accuracy and complexity.

The root mean square error (RMSE) is the square root of the mean of the squared residuals, where a residual is the difference between the observed value and the value predicted by the model:

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}

(21)

For $C_{50}$, expressed in dB, RMSE is also expressed in dB. To conclude, three criteria were selected to evaluate the goodness-of-fit of the regression models: the adjusted coefficient of determination ($R_{adj}^{2}$), which assesses how well $C_{50}$ is represented by the chosen variable; the Akaike Information Criterion (AIC), which balances model fit with complexity; and the root mean square error (RMSE), which evaluates prediction errors.
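The four quantities in Equations (18) to (21) can be computed from observed and fitted values as sketched below. Two points are assumptions made for this example: the likelihood is the Gaussian likelihood maximised over the residual variance, and the parameter count k includes the intercept and the residual variance in addition to the p slopes (conventions differ between software packages). The four data points are a small made-up example with its own least-squares line, not measured values.

```python
import math


def regression_metrics(y_obs, y_fit, n_predictors):
    """Return R^2, adjusted R^2, AIC, and RMSE for a fitted regression model."""
    n = len(y_obs)
    y_mean = sum(y_obs) / n
    ss_reg = sum((f - y_mean) ** 2 for f in y_fit)        # explained variation
    ss_tot = sum((y - y_mean) ** 2 for y in y_obs)        # total variation
    ss_res = sum((f - y) ** 2 for f, y in zip(y_fit, y_obs))

    r2 = ss_reg / ss_tot                                            # Eq. (18)
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)    # Eq. (19)
    rmse = math.sqrt(ss_res / n)                                    # Eq. (21)

    # Gaussian log-likelihood maximised over the residual variance ss_res / n.
    log_l = -0.5 * n * (math.log(2.0 * math.pi * ss_res / n) + 1.0)
    k = n_predictors + 2   # assumed count: slopes + intercept + variance
    aic = 2.0 * k - 2.0 * log_l                                     # Eq. (20)
    return {"r2": r2, "r2_adj": r2_adj, "aic": aic, "rmse": rmse}


# Made-up example: y_fit is the least-squares line 0.5 + 0.8 x for x = 1..4.
y_obs = [1.0, 3.0, 2.0, 4.0]
y_fit = [1.3, 2.1, 2.9, 3.7]
print(regression_metrics(y_obs, y_fit, n_predictors=1))
```

Equation (18) coincides with the alternative form $1 - SS_{res} / SS_{tot}$ only for a least-squares fit that includes an intercept, which is the case for all three model forms considered here.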

5. Regression Results

The regression models introduced in Section 4 were fitted to the training data. Figure 2, Figure 3 and Figure 4 show the regression models for the 1000 Hz octave frequency band.

The model selection procedure was carried out in a hierarchical manner. The first criterion was the AIC value, which reduces the risk of overfitting. Models with lower AIC were given priority. For cases where AIC values were similar, the RMSE served as the next discriminating measure, favoring the models with lower prediction error on the training data. If both AIC and RMSE did not show a clear distinction, the adjusted coefficient of determination $R_{adj}^{2}$ was used as the final criterion, selecting the model that explains the largest share of variance.
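The hierarchical procedure can be sketched as a small selection function. The tolerances that decide when two AIC or RMSE values count as "similar" are not quantified in the text, so the thresholds below (`aic_tol`, `rmse_tol`) are assumptions, as are the function name and the metric values in the example, which are invented for illustration and are not results of this study.

```python
def select_model(candidates, aic_tol=2.0, rmse_tol=0.05):
    """Pick a model by AIC first, then RMSE, then adjusted R^2.

    candidates: list of dicts with keys 'name', 'aic', 'rmse', 'r2_adj'.
    aic_tol, rmse_tol: assumed thresholds below which values count as similar.
    """
    best_aic = min(c["aic"] for c in candidates)
    shortlist = [c for c in candidates if c["aic"] - best_aic <= aic_tol]
    if len(shortlist) > 1:
        best_rmse = min(c["rmse"] for c in shortlist)
        shortlist = [c for c in shortlist if c["rmse"] - best_rmse <= rmse_tol]
    # Any remaining tie is resolved by the largest adjusted R^2.
    return max(shortlist, key=lambda c: c["r2_adj"])


# Invented metric values, for illustration only.
candidates = [
    {"name": "poly2", "aic": 500.0, "rmse": 1.80, "r2_adj": 0.60},
    {"name": "log_linear", "aic": 480.5, "rmse": 1.62, "r2_adj": 0.66},
    {"name": "log_quadratic", "aic": 480.0, "rmse": 1.60, "r2_adj": 0.67},
]
print(select_model(candidates)["name"])
```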

As an example, in Table 2, one can see the regression models’ performance metrics at 1000 Hz for Category 3. The model with the best fit is log-quadratic.

Table 3 presents the best regression models based on the selection process. The regression coefficients β are rounded to two decimals.

Cross-Validation

Every fifth data entry (20%) for each category was selected for cross-validation (CV). The data, which were not used for model training, are called test data. They were applied to the model, and the statistical evaluation metrics were also estimated. In Figure 2, Figure 3 and Figure 4, triangle data points are the test data.
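A systematic hold-out of every fifth entry can be sketched as follows. Which entry within each group of five is held out is not specified above, so taking the fifth, tenth, and so on is an assumption, as is the function name; the list of integers merely stands in for the data entries of one category.

```python
def split_every_fifth(entries):
    """Hold out every fifth entry as test data; the rest is training data."""
    test = entries[4::5]
    train = [e for i, e in enumerate(entries) if (i + 1) % 5 != 0]
    return train, test


# Stand-in for the 96 data entries of Category 1.
entries = list(range(96))
train, test = split_every_fifth(entries)
print(len(train), len(test))
```

With a number of entries that is not a multiple of five, the held-out share is slightly below 20%.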

On average across all six octave bands, the difference in RMSE values between the training and CV test datasets is around −0.3 dB. This indicates that the regression models approximate new, independent data satisfactorily.

The standard deviation of the residuals (the difference between the regression model and the measured data) was also estimated for the cross-validation data. For Category 1, $σ_{1} = 2$ dB; for Category 2, $σ_{2} = 2.7$ dB; and for Category 3, $σ_{3} = 1.7$ dB. The predicted versus measured $C_{50}$ plot at 1000 Hz (Figure 5) shows the agreement between model estimates and experimental data, with deviations from the 1:1 line indicating the magnitude and direction of prediction errors. The majority of residual values range between −3 and +3 dB, with some outliers having absolute errors of up to 4–5 dB.

The histogram of residuals (Figure 6) shows an approximately Gaussian distribution for all categories, including Category 3, despite its rooms not having a diffuse sound field. This is expected: the room’s characteristics are described by the model, whereas the scatter of values around the regression model is random. The histogram is slightly skewed toward negative values, which indicates that the model tends to underestimate the real $C_{50}$ values.

6. Discussion

The regression analysis reveals some differences between frequency bands and room categories in terms of model structure and prediction quality. A consistent pattern can be seen across all three categories: the lowest two octave bands behave differently from the mid–high frequency range.

6.1. Behavior of the 125 Hz Band

For 125 Hz, the best-fitting model differs across categories. Category 1 and Category 2 show a quadratic dependence on the combined predictor $rT$, whereas Category 3 follows a logarithmic trend. This inconsistency between categories indicates that the 125 Hz band is governed not by diffuse-field statistical behavior but by the modal characteristics of each room. Also, Category 3 rooms have a much lower average reverberation time ($T_{30}$ = 0.8 s) compared to the first and second categories (2.0 and 3.7 s, respectively). This observation supports the assumption that the sound field in Category 3 is different from that in Categories 1 and 2.

In all three categories, the RMSE exceeds 2 dB, and the adjusted coefficient of determination remains low ($R_{adj}^{2} < 0.5$), confirming that none of the analytical models provides a reliable description of $C_{50}$ at 125 Hz.

This observation aligns with room acoustics theory: at low frequencies, the sound field is dominated by modal distributions rather than diffuse decay, and reverberation time fails to represent the underlying energy decay mechanism. Nilsson’s investigations of non-diffuse fields and ceiling-dominated absorption [16] emphasized that classical RT-based models are not applicable where modal behavior prevails. The present data confirm that the product $rT$ is not an appropriate predictor for $C_{50}$ below approximately 200 Hz.

6.2. Behavior of the 250 Hz Band

The 250 Hz band presents an intermediate case. While the RMSE still exceeds 2 dB in all categories, $R_{adj}^{2}$ consistently rises above 0.5. Thus, although prediction errors remain relatively high, the models capture more than half of the variance, suggesting that diffuse-field behavior is emerging but not fully established at 250 Hz.

The differences between categories are also more pronounced at this frequency. In Category 1, the average $R_{adj}^{2}$ between 500 and 4000 Hz is 0.67, whereas at 250 Hz, the value reaches only 0.53 (approximately 22% lower). A similar relative reduction is observed for Category 2 (0.73 vs. 0.83; approximately 11% lower) and Category 3 (0.53 vs. 0.67; approximately 20% lower).

This transitional behavior is consistent with the interpretation that the 250 Hz band sits at the boundary between modal effects and diffuse-field energy decay and therefore exhibits larger variability across rooms with different boundary conditions.

6.3. Mid–High Frequency Behavior (500–4000 Hz)

In the frequency range of 500–4000 Hz, the regression models achieve substantially better performance in all categories. The RMSE values are typically around 1.6–1.8 dB on average, and the adjusted coefficients of determination fall within the range of 0.67 (Category 1 and Category 3) to 0.83 (Category 2).

In this frequency region, the results indicate that the predictor $r T$ provides a reliable basis for estimating $C_{50}$. The dependence of clarity on the combined effect of source–receiver distance and reverberation time follows a clear nonlinear trend, which is consistently captured by the logarithmic model formulations. This behavior is observed across all room categories, including those with non-diffuse or strongly inhomogeneous boundary conditions, suggesting that the underlying relationship is robust even when the sound field departs from ideal diffuse assumptions. The improved model performance in the 500–4000 Hz bands is consistent with the statistical nature of the sound field at these frequencies, where modal behavior is practically nonexistent. Overall, the results confirm that $r T$ is an effective predictor for $C_{50}$ in practical room conditions within this frequency range. The logarithmic formulations also align with existing theoretical models of clarity in rectangular and semidiffuse rooms, capturing the direct and reverberant parts of the sound decay [10,11,31].
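
To illustrate how such a logarithmic formulation is applied, the following minimal sketch evaluates one of the Table 3 equations for a given distance and reverberation time. The function name `c50_log_model`, its parameters, and the example room values are hypothetical. The logarithm base is not stated in this section, so base 10 is assumed here and exposed as a parameter.

```python
import math


def c50_log_model(r, t30, b0, b1, b2=0.0, log_base=10.0):
    """Evaluate C50 = b0 + b1*log(rT) + b2*log(rT)**2 in dB.

    r is the source-receiver distance in m and t30 the reverberation time in s.
    log_base=10 is an assumption; this section does not state the base.
    """
    rt = r * t30
    if rt <= 0:
        raise ValueError("r * T30 must be positive")
    x = math.log(rt, log_base)
    return b0 + b1 * x + b2 * x * x


# Category 1, 1000 Hz coefficients from Table 3 (log-quadratic model).
CAT1_1000HZ = dict(b0=7.58, b1=-14.23, b2=3.74)

# Hypothetical example: receiver 5 m from the source, T30 = 2.0 s, so rT = 10.
print(round(c50_log_model(5.0, 2.0, **CAT1_1000HZ), 2))
```

Under the base-10 assumption, $r T = 10$ gives about −2.9 dB for these coefficients. Setting `b2=0.0` reduces the same function to the log-linear form.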

6.4. Comparison Between Room Categories

An evaluation across the three categories shows that Category 2 (empty, homogeneous boundary rooms) consistently achieves the highest goodness-of-fit metrics. The RMSE is typically lower, and $R_{adj}^{2}$ is higher relative to Category 1 and Category 3 in almost every band.

An unexpected result was the relative performance of the room categories. Category 2 rooms, which contain the fewest scattering elements, produced the most consistent regression behavior. It was initially assumed that Category 1 rooms would yield lower variability, as these spaces have more homogeneous boundary conditions and include some furniture that could introduce scattering. The observations indicate that this assumption does not hold. The furniture typically present in Category 1 rooms (mainly chairs and tables) provides only partial and uneven scattering, particularly in the mid–high frequencies, and introduces additional non-uniformity rather than improving diffuseness. Consequently, the regression models for Category 2 outperform those for Category 1 by approximately 38%, suggesting that the acoustic behavior in Category 2 is closer to that predicted by classical Sabine decay, despite the minimal presence of scattering surfaces.

Category 1 and Category 3 rooms show more variability. Category 1 (reverberant) exhibits greater spatial irregularities due to furniture and geometry, while Category 3 (directional absorptive) includes inhomogeneous boundary conditions and scattering from furniture. These effects introduce deviations from idealized diffuse decay, which is reflected in higher dispersion in the $C_{50}$ measurements and correspondingly lower goodness-of-fit.

These observations indicate that the differences in regression performance depend not only on the reverberation time but also on the degree of scattering and boundary homogeneity.

7. Conclusions

This work presented an empirical approach for estimating the speech clarity index $C_{50}$ from the combined predictor $r T = r \times T_{30}$. A dataset of measured acoustic parameters from rooms of different types was analyzed. Several regression models were evaluated, and their performance was assessed using adjusted $R^{2}$, AIC, and RMSE, followed by cross-validation on independent data.

The study showed that the behavior of $C_{50}$ can be clearly separated by frequency. For 125–250 Hz, neither quadratic nor logarithmic models provided reliable estimates due to dominant modal effects. For 500–4000 Hz, the models produced stable and consistent results, with RMSE typically below 2 dB and adjusted $R^{2}$ up to 0.83. Logarithmic formulations demonstrated the most robust behavior across all room categories in this frequency region.

Residual analysis confirmed that prediction errors follow an approximately Gaussian distribution, which supports the use of standard deviation as an accuracy indicator. Cross-validation residual spread for mid–high frequencies was 1.7–2.7 dB, which represents the expected uncertainty range for practical predictions based on the proposed model. To reach a 95% confidence range for practical estimates of $C_{50}$, the variability should be considered within twice the residual standard deviation. For the present models, this corresponds to an interval of about $\pm 3.4$ to $\pm 5.4$ dB in the mid–high frequency bands.
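
The interval arithmetic described above can be sketched as follows. The helper `c50_interval` and the example prediction of 3.0 dB are hypothetical; only the factor of two and the 1.7–2.7 dB residual spread come from the text.

```python
def c50_interval(c50_predicted, residual_sd, coverage_factor=2.0):
    """Return (lower, upper) bounds in dB around a predicted C50.

    coverage_factor=2 follows the "twice the residual standard deviation"
    rule in the text, i.e. roughly 95% coverage for Gaussian residuals.
    """
    if residual_sd < 0:
        raise ValueError("residual_sd must be non-negative")
    half_width = coverage_factor * residual_sd
    return c50_predicted - half_width, c50_predicted + half_width


# Hypothetical prediction of 3.0 dB, evaluated at both ends of the
# 1.7-2.7 dB cross-validation residual spread quoted above.
for sd in (1.7, 2.7):
    low, high = c50_interval(3.0, sd)
    print(f"sd = {sd} dB -> {low:.1f} to {high:.1f} dB")
```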

The main contribution of this study is the demonstration that $C_{50}$ can be estimated directly from distance and analytically obtained reverberation time, for example, using Eyring’s formula or the most recent approaches to RT estimation [16], without room simulation. The combined predictor $r T$ captures the dominant decay trends in real rooms with both homogeneous and inhomogeneous boundaries. The resulting regression formulas can be applied for preliminary assessment of speech clarity in early room design stages, both for architectural and acoustic purposes. The models are not intended to replace simulations but to complement them during preliminary design and verification.
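
As an illustration of the analytical route mentioned above, the sketch below estimates the reverberation time with Eyring's formula and forms the combined predictor. The room dimensions, the mean absorption coefficient, and the function names are hypothetical, and air absorption is neglected as a simplifying assumption.

```python
import math


def eyring_rt(volume_m3, surface_m2, mean_alpha):
    """Eyring reverberation time in s: T = 0.161 V / (-S ln(1 - alpha)).

    Air absorption is neglected, which is an assumption of this sketch.
    """
    if not 0.0 < mean_alpha < 1.0:
        raise ValueError("mean_alpha must lie strictly between 0 and 1")
    return 0.161 * volume_m3 / (-surface_m2 * math.log(1.0 - mean_alpha))


def rt_predictor(distance_m, t_seconds):
    """Combined predictor rT = r * T used by the regression formulas."""
    return distance_m * t_seconds


# Hypothetical 10 m x 7 m x 3 m classroom with mean absorption of 0.25.
length, width, height = 10.0, 7.0, 3.0
volume = length * width * height
surface = 2.0 * (length * width + length * height + width * height)
t_est = eyring_rt(volume, surface, 0.25)
print(f"T = {t_est:.2f} s, rT at 4 m = {rt_predictor(4.0, t_est):.2f} m*s")
```

The resulting value of $r T$ would then be inserted into the regression equation for the relevant room category and frequency band.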

The method is limited at low frequencies, where modal behavior dominates and statistical decay parameters are no longer valid predictors. Future work may include extending the approach to incorporate background noise or directional source characteristics and validating the model on a larger sample of strongly inhomogeneous rooms.

Funding

This work was supported by a postdoctoral grant No. RTU-PG-2024/1-0037 under the EU Recovery and Resilience Facility funded project No. 5.2.1.1.i.0/2/24/I/CFLA/003 “Implementation of consolidation and management changes at Riga Technical University, Liepaja University, Rezekne Academy of Technology, Latvian Maritime Academy and Liepaja Maritime College for the progress towards excellence in higher education, science, and innovation”.

Data Availability Statement

The original data presented in the study are openly available in Zenodo at https://doi.org/10.5281/zenodo.18223659.

Acknowledgments

Many thanks go to industrial partner Akukon for technological support. The author expresses gratitude to Cheol-Ho Jeong and Jonas Brunskog for scientific consultations. The author acknowledges the use of artificial intelligence-based tools for text editing and programming code generation.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. SFS 5907:2022; Rakennusten akustinen Suunnittelu ja Laatuluokitus [Acoustical Design and Quality Classes of Buildings]. Finnish Standards Association (SFS): Helsinki, Finland, 2022.
2. DIN 18041:2016; Hörsamkeit in Räumen—Anforderungen, Empfehlungen und Hinweise für die Planung (Acoustical Quality in Small to Medium-Sized Rooms—Requirements, Recommendations and Guidance for Design). German Institute for Standardization (DIN): Berlin, Germany, 2016.
3. Ministru Kabinets. Noteikumi par Latvijas Būvnormatīvu LBN 016-15 “Būvakustika” (Ministru Kabineta Noteikumi Nr. 312). Latvijas Vēstnesis, 124, 30.06.2015. 2015. Available online: https://likumi.lv/ta/id/274976-noteikumi-par-latvijas-buvnormativu-lbn-016-15-buvakustika- (accessed on 12 November 2025).
4. UK Department for Education. BB93: Acoustic Design of Schools—Performance Standards; Technical Report, Building Bulletin 93, Revision 2015; Department for Education: London, UK, 2015.
5. ODEON A/S. Odeon Room Acoustics Software, Version 17: User Manual; ODEON A/S: Lyngby, Denmark, 2023.
6. Treble Technologies. Treble Acoustic Simulation Platform: Online Documentation. 2024. Available online: https://docs.treble.tech/ (accessed on 25 October 2025).
7. Sabine, W.C. Collected Papers on Acoustics; Harvard University Press: Cambridge, MA, USA, 1922.
8. Eyring, C.F. Reverberation time in “dead” rooms. J. Acoust. Soc. Am. 1930, 1, 217–241.
9. Prawda, K.; Schlecht, S.J.; Välimäki, V. Calibrating the Sabine and Eyring formulas. J. Acoust. Soc. Am. 2022, 152, 1158–1169.
10. Bradley, J.S. Speech intelligibility studies in classrooms. J. Acoust. Soc. Am. 1986, 80, 846–854.
11. Barron, M.; Lee, L.J. Energy relations in concert auditoria. J. Acoust. Soc. Am. 1988, 84, 618–628.
12. Campbell, C.; Nilsson, E.; Svensson, C. The same reverberation time in two identical rooms does not necessarily mean the same levels of speech clarity and sound levels when we look at impact of different ceiling and wall absorbers. In Proceedings of the Euronoise 2015, Maastricht, The Netherlands, 31 May–3 June 2015.
13. Nilsson, E. Room acoustic measures for classrooms. In Proceedings of the Internoise 2010, Lisbon, Portugal, 13–16 June 2010.
14. Harvie-Clark, J.; Dobinson, N. The practical application of G and C50 in classrooms. In Proceedings of the Internoise 2013, Innsbruck, Austria, 15–18 September 2013.
15. Nilsson, E. Decay processes in rooms with non-diffuse sound fields. Part I: Ceiling treatment with absorbing material. Acta Acust. United Acust. 2004, 90, 459–466.
16. Nilsson, E.; Arvidsson, E. An energy model for the calculation of room acoustic parameters in rectangular rooms with absorbent ceilings. Appl. Sci. 2021, 11, 6607.
17. Pelegrín-García, D.; Brunskog, J.; Rasmussen, B. Speaker-oriented classroom acoustics design guidelines in the context of current regulations in European countries. Build. Environ. 2015, 94, 13–22.
18. Nijs, L.; Rychtáríková, M. Calculating the optimum reverberation time and absorption coefficient for good speech intelligibility in classroom design using U50. Acta Acust. United Acust. 2011, 97, 507–515.
19. Arvidsson, E.; Nilsson, E.; Hagberg, D.B.; Karlsson, O.J.I. The effect on room acoustical parameters using a combination of absorbers and diffusers—An experimental study in a classroom. Acoustics 2020, 2, 505–523.
20. Nilsson, E. Input data for acoustical design calculations for ordinary public rooms. Build. Acoust. 2017, 24, 3–19.
21. ISO 3382-1:2009; Acoustics—Measurement of Room Acoustic Parameters—Part 1: Performance Spaces. International Organization for Standardization: Geneva, Switzerland, 2009.
22. Cremer, L.; Müller, H.A.; Schultz, T.J. Principles and Applications of Room Acoustics, Volume 1; Applied Science Publishers: London, UK, 1982.
23. Schroeder, M.R.; Atal, B.S.; Sessler, G.M.; West, J.E. Acoustical measurements in Philharmonic Hall (New York). J. Acoust. Soc. Am. 1966, 40, 434–440.
24. Sato, H.; Bradley, J.S. Evaluation of acoustical conditions for speech communication in working elementary school classrooms. J. Acoust. Soc. Am. 2008, 123, 2064–2077.
25. AFMG Technologies GmbH. EASERA Appendix: Fundamentals to Perform Acoustical Measurements; AFMG Technologies GmbH: Berlin, Germany, 2021.
26. Ahnert, W.; Schmidt, W. Akustik in Kulturbauten; Institut für Kulturbauten: Berlin, Germany, 1980.
27. Harvie-Clark, J. Use of G and C50 for classroom design. In Proceedings of the Institute of Acoustics; Curran Associates, Inc.: Red Hook, NY, USA, 2014; Volume 36, pp. 220–227.
28. Mironovs, D. Linking Speech Clarity, Reverberation, and Distance for Classroom Design Optimization. In Proceedings of the 11th Convention of the European Acoustics Association—Forum Acusticum/EuroNoise 2025, Málaga, Spain, 23–26 June 2025; European Acoustics Association (EAA): Málaga, Spain, 2025; pp. 3949–3952. ISBN 978-84-87985-35-5.
29. Seber, G.A.F.; Lee, A.J. Linear Regression Analysis, 2nd ed.; Wiley: Hoboken, NJ, USA, 2012.
30. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
31. Bradley, J.S. A new look at acoustical criteria for classrooms. Build. Acoust. 2009, 16, 175–188.

Figure 1. Examples of measured classrooms: (a) Category 1 classic reverberant; (b) Category 2 empty; (c) Category 3 directional absorptive.

Figure 2. Regression models at 1000 Hz for Category 1. Triangle markers indicate the cross-validation (test) subset.

Figure 3. Regression models at 1000 Hz for Category 2. Triangle markers indicate the cross-validation (test) subset.

Figure 4. Regression models at 1000 Hz for Category 3. Triangle markers indicate the cross-validation (test) subset.

Figure 5. Comparison between predicted and measured $C_{50}$ values at 1000 Hz for the best-performing regression model; the dashed line indicates the ideal 1:1 relationship.

Figure 6. Histogram of prediction residuals for the Category 3 1000 Hz octave band. The distribution illustrates the statistical spread of the differences between measured and modeled $C_{50}$ values.

Table 1. Summary of geometrical and acoustic parameters for the three room categories. The $T_{30}$ and $C_{50}$ values are given as 500–2000 Hz averages.

Category	Statistic	L (m)	W (m)	H (m)	$T_{30}$ (s)	$C_{50}$ (dB)
Cat 1	min	10.7	5.7	2.6	0.7	−8.7
Cat 1	max	26.8	11.2	5.6	2.9	5.4
Cat 1	avg	15.6	7.3	3.8	2.0	−2.5
Cat 2	min	11.5	8.5	3.5	1.4	−13.0
Cat 2	max	32.8	21.6	10.0	9.4	7.8
Cat 2	avg	23.4	12.8	5.7	3.7	−2.6
Cat 3	min	8.8	5.2	2.6	0.4	−2.9
Cat 3	max	37.2	15.0	6.8	2.1	14.0
Cat 3	avg	17.6	7.7	3.4	0.8	5.1

Table 2. Regression model performance metrics at 1000 Hz for Category 3.

Model	AIC	RMSE (dB)	$R_{adj}^{2}$
quadratic	877	1.88	0.50
log-linear	830	1.69	0.60
log-quadratic	827	1.67	0.61
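
The kind of fit behind the log-linear and log-quadratic rows above can be outlined as follows. This is a sketch under stated assumptions, not the author's code: it uses synthetic data rather than the measured dataset, assumes base-10 logarithms, takes RMSE as the root of the mean squared residual, and omits AIC because its exact form depends on the convention adopted. The function name `fit_log_model` is illustrative.

```python
import numpy as np


def fit_log_model(rt, c50, degree):
    """Least-squares polynomial in log10(rT); degree 1 or 2.

    Returns the coefficients (highest power first), RMSE in dB,
    and adjusted R^2. Base-10 logarithms are an assumption.
    """
    x = np.log10(np.asarray(rt, dtype=float))
    y = np.asarray(c50, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    n, p = y.size, degree  # p = number of predictors, intercept excluded
    ss_res = float(np.sum(residuals**2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    rmse = float(np.sqrt(ss_res / n))
    r2_adj = 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))
    return coeffs, rmse, r2_adj


# Synthetic data for illustration only (NOT the measured dataset).
rng = np.random.default_rng(0)
rt = rng.uniform(1.0, 30.0, size=200)
c50 = 9.0 - 7.0 * np.log10(rt) + rng.normal(0.0, 1.7, size=200)

for name, degree in (("log-linear", 1), ("log-quadratic", 2)):
    _, rmse, r2_adj = fit_log_model(rt, c50, degree)
    print(f"{name}: RMSE = {rmse:.2f} dB, adjusted R^2 = {r2_adj:.2f}")
```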

Table 3. Selected regression models for the three room categories.

Frequency	Model	AIC	RMSE (dB)	$R_{adj}^{2}$	Equation
Category 1
125 Hz	quadratic	395	2.98	0.43	$C_{50} = 2.28 - 0.52 r T + 0.01 (r T)^{2}$
250 Hz	log-linear	346	2.21	0.53	$C_{50} = 3.98 - 6.54 \log (r T)$
500 Hz	log-quadratic	309	1.71	0.64	$C_{50} = 5.12 - 10.13 \log (r T) + 1.89 \log^{2} (r T)$
1000 Hz	log-quadratic	311	1.73	0.68	$C_{50} = 7.58 - 14.23 \log (r T) + 3.74 \log^{2} (r T)$
2000 Hz	log-quadratic	299	1.60	0.69	$C_{50} = 6.42 - 12.37 \log (r T) + 3.01 \log^{2} (r T)$
4000 Hz	log-quadratic	295	1.56	0.68	$C_{50} = 6.48 - 11.20 \log (r T) + 2.66 \log^{2} (r T)$
Category 2
125 Hz	quadratic	389	2.87	0.47	$C_{50} = 2.85 - 0.19 r T + 0.00 (r T)^{2}$
250 Hz	log-linear	358	2.37	0.73	$C_{50} = 10.27 - 9.56 \log (r T)$
500 Hz	log-quadratic	326	1.91	0.85	$C_{50} = 7.83 - 6.06 \log (r T) - 1.43 \log^{2} (r T)$
1000 Hz	log-quadratic	318	1.81	0.84	$C_{50} = 7.67 - 5.82 \log (r T) - 1.37 \log^{2} (r T)$
2000 Hz	log-linear	320	1.86	0.82	$C_{50} = 9.86 - 9.57 \log (r T)$
4000 Hz	log-linear	321	1.87	0.81	$C_{50} = 9.76 - 9.22 \log (r T)$
Category 3
125 Hz	log-linear	1003	2.54	0.37	$C_{50} = 7.85 - 4.98 \log (r T)$
250 Hz	log-linear	927	2.12	0.53	$C_{50} = 8.29 - 6.17 \log (r T)$
500 Hz	log-linear	871	1.86	0.60	$C_{50} = 8.86 - 6.26 \log (r T)$
1000 Hz	log-quadratic	827	1.67	0.61	$C_{50} = 8.93 - 6.89 \log (r T) + 1.16 \log^{2} (r T)$
2000 Hz	log-quadratic	807	1.59	0.68	$C_{50} = 9.36 - 8.16 \log (r T) + 1.62 \log^{2} (r T)$
4000 Hz	log-linear	729	1.33	0.79	$C_{50} = 9.56 - 6.66 \log (r T)$
