3.2. Discussion
The experimental dataset supports a detailed interpretation of the acoustic behaviour of the San Michele di Mezzo sanctuary, which comprises three sacred spaces: the lower cave, the upper cave, and the later church. These three spaces correspond to different degrees of natural and artificial transformation within the same sacred complex. The lower cave is the space in which natural rock-cut features are most prevalent and where no complete flooring or architectural regularization is present, apart from limited devotional additions and the stair elements beside the altar. The upper cave preserves a rock-cut character but also includes more artificial elements, particularly around the altar and towards the present entrance, which was obtained by partially closing the original cave opening with a built wall connected to the later architectural phase of the sanctuary. The church, in contrast, is a built worship space and represents the most architecturally regular component of the complex.
Therefore, the acoustic results should be interpreted not as a ranking of spaces from “better” to “worse”, but as evidence of a progressive differentiation between a predominantly natural cave, a partially transformed cave, and a built church. This distinction is important for the cultural interpretation of the site. The internal cave spaces were not acoustically optimized in a modern or intentional sense, and such an interpretation would be historically inappropriate. Rather, the question is whether their measured acoustic response is compatible with voice-based devotional practices, such as spoken prayer, chant or liturgical recitation, and how this response differs from that of the later church. Accordingly, the following interpretation refers to the retained indicators obtained under the adopted source–receiver configurations, and not to an exhaustive acoustic mapping of the sanctuary. The comparison among the lower cave, upper cave, and church should therefore be understood as configuration-specific and repeatability-screened.
The background sound-level descriptors provide an important context for interpreting the acoustic survey. The outdoor reference position was characterized by substantially higher levels than the indoor spaces, with Lmean = 53.79 dB. By contrast, the church, upper cave, and lower cave showed lower mean levels, equal to 41.00 dB, 40.36 dB, and 39.05 dB, respectively. This confirms that the indoor measurements were acquired under relatively quiet conditions, with a clear reduction in the external acoustic background inside the sanctuary spaces. The lower cave was the quietest and most stable indoor environment, with a standard deviation of 1.48 dB, which indicates that the background during the acquisition window was not dominated by strongly fluctuating noise. The upper cave and church showed slightly higher variability, with standard deviations of 2.45 dB and 2.07 dB, respectively. However, their lower percentile levels remained close to 40 dB or below. These values support the reliability of the impulse-response measurements, because the acoustic indicators were obtained in conditions where background noise was low and sufficiently stable within the indoor spaces.
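The reduction of the background level inside the sanctuary can be made explicit with a short calculation. The sketch below only subtracts the mean levels reported above from the outdoor reference; the dictionary keys and the function name are illustrative, and the result is a plain difference of reported means, not a measured sound insulation value.

```python
# Mean background levels (dB) as reported in the text.
l_mean = {
    "outdoor": 53.79,
    "church": 41.00,
    "upper cave": 40.36,
    "lower cave": 39.05,
}


def reduction_vs_outdoor(levels, reference="outdoor"):
    """Return the difference (dB) between the reference level and each other space."""
    ref = levels[reference]
    return {
        name: round(ref - value, 2)
        for name, value in levels.items()
        if name != reference
    }


reductions = reduction_vs_outdoor(l_mean)
for name, delta in reductions.items():
    print(f"{name}: {delta:.2f} dB below the outdoor reference")
```

With the reported values, the three indoor spaces lie roughly 13 to 15 dB below the outdoor reference position.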
The decay-related indicators show a clear separation between the church and the two caves. At 250 Hz, EDT is 2.136 ± 0.090 s in the lower cave and 2.044 ± 0.089 s in the upper cave, while it reaches 3.144 ± 0.273 s in the church. The same pattern is observed for T20 and T30, which are higher in the church than in both caves over the low- and mid-frequency range. At 500 Hz, for example, T30 is 2.17 ± 0.20 s in the lower cave and 2.27 ± 0.30 s in the upper cave, whereas it is 3.13 ± 0.08 s in the church. At 1000 Hz, T30 remains higher in the church, with 2.82 ± 0.07 s, compared with 1.93 ± 0.24 s in the lower cave and 2.19 ± 0.81 s in the upper cave. These results indicate that the later church provides a more persistent reverberant field than the cave spaces, especially in the frequency range most relevant to vocal sound.
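For readers less familiar with how decay indicators are derived, the following minimal sketch shows the conventional route described in ISO 3382-1: backward (Schroeder) integration of the squared impulse response, followed by a linear fit over a fixed level range and extrapolation to a 60 dB decay. It is an illustration on a synthetic, ideally exponential decay, not the processing chain used in this survey; the sampling rate, the 2.0 s decay time, and all function names are assumptions made for the example.

```python
import math


def schroeder_decay_db(energy):
    """Backward-integrate squared impulse-response samples; return the decay curve in dB."""
    total = 0.0
    backward = [0.0] * len(energy)
    for i in range(len(energy) - 1, -1, -1):
        total += energy[i]
        backward[i] = total
    return [10.0 * math.log10(value / backward[0]) for value in backward]


def decay_time(decay_db, fs, upper_db, lower_db):
    """Least-squares line fit between upper_db and lower_db, extrapolated to 60 dB."""
    points = [
        (i / fs, level)
        for i, level in enumerate(decay_db)
        if lower_db <= level <= upper_db
    ]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_lev = sum(lev for _, lev in points) / n
    num = sum((t - mean_t) * (lev - mean_lev) for t, lev in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return -60.0 / (num / den)


# Synthetic squared impulse response: energy falls by 60 dB in true_rt seconds.
fs = 1000          # assumed sampling rate (Hz), kept low for speed
true_rt = 2.0      # assumed decay time (s)
energy = [math.exp(-math.log(1e6) * (i / fs) / true_rt) for i in range(8 * fs)]

curve = schroeder_decay_db(energy)
edt = decay_time(curve, fs, 0.0, -10.0)    # EDT: 0 to -10 dB
t20 = decay_time(curve, fs, -5.0, -25.0)   # T20: -5 to -25 dB
t30 = decay_time(curve, fs, -5.0, -35.0)   # T30: -5 to -35 dB
print(f"EDT = {edt:.2f} s, T20 = {t20:.2f} s, T30 = {t30:.2f} s")
```

For an ideal exponential decay the three estimates coincide. In real spaces they diverge, which is why EDT and T30 are reported separately for the caves and the church.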
The comparison between the two caves is more subtle. Their EDT values are close over most of the analyzed frequency range, with the lower cave showing 2.136 ± 0.090 s at 250 Hz, 1.659 ± 0.090 s at 500 Hz, 1.391 ± 0.029 s at 1000 Hz, 1.161 ± 0.047 s at 2000 Hz, 0.983 ± 0.043 s at 4000 Hz, and 0.666 ± 0.024 s at 8000 Hz. The corresponding upper-cave values are 2.044 ± 0.089 s, 1.594 ± 0.117 s, 1.379 ± 0.079 s, 1.198 ± 0.059 s, 0.992 ± 0.035 s, and 0.627 ± 0.088 s. These values show that the two caves have broadly comparable decay behaviours, despite their different degrees of artificial transformation. This suggests that the rock-cut morphology remains the dominant factor controlling the acoustic decay of both cave spaces.
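The statement that the two caves have comparable EDT values can be checked band by band. The sketch below compares the difference between the reported means with the two dispersions combined in quadrature. It assumes that the ± values are standard deviations and that the two sets of measurements are independent; it is a rough plausibility check, not a formal significance test, and the variable names are illustrative.

```python
import math

bands_hz = [250, 500, 1000, 2000, 4000, 8000]

# (mean, dispersion) in seconds, as reported in the text.
edt_lower = [(2.136, 0.090), (1.659, 0.090), (1.391, 0.029),
             (1.161, 0.047), (0.983, 0.043), (0.666, 0.024)]
edt_upper = [(2.044, 0.089), (1.594, 0.117), (1.379, 0.079),
             (1.198, 0.059), (0.992, 0.035), (0.627, 0.088)]


def compare(a, b):
    """Return (difference of means, dispersions combined in quadrature)."""
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return mean_a - mean_b, math.hypot(sd_a, sd_b)


results = {
    band: compare(lower, upper)
    for band, lower, upper in zip(bands_hz, edt_lower, edt_upper)
}
all_within = all(abs(diff) <= combined for diff, combined in results.values())

for band, (diff, combined) in results.items():
    print(f"{band} Hz: difference = {diff:+.3f} s, combined dispersion = {combined:.3f} s")
print("All bands within one combined dispersion:", all_within)
```

Under these assumptions, the difference between the two caves stays below the combined dispersion in every band from 250 Hz to 8000 Hz.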
However, the comparison of T30 indicates some differences in the reverberant tail. At 250 Hz, T30 is 2.68 ± 0.27 s in the lower cave and 3.07 ± 0.57 s in the upper cave, suggesting a slightly longer low-frequency reverberant tail in the upper cave. At 500 Hz and 2000–4000 Hz the two caves are again close, with T30 values around 2.17–2.27 s at 500 Hz, 1.48–1.53 s at 2000 Hz, and 1.16 s at 4000 Hz. At 8000 Hz, both caves show short decay times, with T30 = 0.74 ± 0.02 s in the lower cave and 0.79 ± 0.07 s in the upper cave. Therefore, the caves are acoustically differentiated, but not in a simple hierarchical way. Their main common feature is a frequency-dependent decay, with longer persistence at low frequencies and progressively shorter decay at high frequencies.
The church shows a different trend. Its EDT and T20 values decrease with frequency, but they remain clearly higher than those measured in the caves up to 4000 Hz. At 4000 Hz, EDT is 1.913 ± 0.169 s in the church, compared with 0.983 ± 0.043 s in the lower cave and 0.992 ± 0.035 s in the upper cave. At 8000 Hz, the church comes closer to the caves, with EDT = 1.030 ± 0.089 s and T20 = 1.21 ± 0.03 s, but it remains more persistent than both cave spaces. This confirms that the church behaves as a built reverberant worship environment, whereas the caves behave as compact rock-cut spaces with shorter high-frequency decay.
The clarity and definition indicators must be interpreted with particular caution, because only values satisfying the repeatability criterion were retained. Missing values in Table 2 therefore do not indicate an absence of processing, but an insufficient repeatability for reliable interpretation. This is especially relevant for C50 and C80, which are sensitive to source directivity, receiver position, and local reflections. In the lower cave, the available C50 values are negative, with −10.10 ± 1.47 dB at 250 Hz, −4.88 ± 0.79 dB at 500 Hz, and −3.05 ± 0.60 dB at 1000 Hz. These values do not support a claim of high speech clarity under modern room-acoustic criteria. Nevertheless, the trend from 250 Hz to 1000 Hz shows a progressive improvement of the early-to-late energy balance in the lower cave.
The available C80 and D50 values provide additional information. In the lower cave, C80 changes from −4.24 ± 0.68 dB at 250 Hz to 4.65 ± 0.63 dB at 8000 Hz, suggesting that the balance between early and late energy becomes more favourable at higher frequencies for sustained vocal sound. D50 also increases with frequency: 24.68 ± 3.47% at 500 Hz, 33.21 ± 3.01% at 1000 Hz, 37.87 ± 4.59% at 2000 Hz, 40.52 ± 6.03% at 4000 Hz, and 50.94 ± 5.44% at 8000 Hz. This progressive increase in D50 indicates a growing early-to-total energy ratio towards the high-frequency range. Such behaviour is not evidence of acoustic optimization, but it shows that the lower cave is not acoustically incompatible with vocal practices.
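Because D50 and C50 are built from the same 50 ms energy split, one can be converted into the other through C50 = 10·log10(D50 / (1 − D50)), with D50 expressed as a fraction. The sketch below applies this standard relation to the lower-cave D50 values reported above and compares the result with the reported C50 values at the two bands where both indicators are available. The function name is illustrative, and small discrepancies are expected because the conversion of a mean is not the mean of the conversions.

```python
import math


def c50_from_d50(d50_percent):
    """Convert definition D50 (in percent) to clarity C50 (in dB)."""
    d = d50_percent / 100.0
    return 10.0 * math.log10(d / (1.0 - d))


# Lower cave: band (Hz) -> (reported D50 in %, reported C50 in dB).
lower_cave = {
    500: (24.68, -4.88),
    1000: (33.21, -3.05),
}

converted = {}
for band, (d50, c50_reported) in lower_cave.items():
    converted[band] = c50_from_d50(d50)
    print(f"{band} Hz: C50 from D50 = {converted[band]:.2f} dB, "
          f"reported C50 = {c50_reported:.2f} dB")
```

The converted values agree with the reported C50 means to within about 0.05 dB, well inside the stated dispersions. The same relation shows that a D50 above 50% corresponds to a positive C50.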
The upper cave shows a similar but partly distinct behaviour. D50 is 31.06 ± 6.16% at 1000 Hz, 41.05 ± 6.68% at 2000 Hz, 45.78 ± 5.18% at 4000 Hz, and 62.18 ± 11.92% at 8000 Hz. These values are generally comparable with or higher than those measured in the lower cave at the same frequencies, but the associated standard deviations are larger. This is consistent with the upper cave being a more spatially differentiated environment, partly because it includes artificial boundaries around the altar and the present entrance. The upper cave therefore appears less as a uniformly improved acoustic space and more as a hybrid environment, where rock-cut morphology and later architectural closure jointly influence the early-energy distribution.
The church provides a further reference for interpreting the cave spaces. Its decay times are longer than those of the caves, but the available D50 values are not uniformly higher. At 2000 Hz, D50 is 23.07 ± 4.60% in the church, lower than both the lower cave and the upper cave. At 8000 Hz, D50 reaches 51.81 ± 4.96%, close to the lower cave and lower than the upper cave. This suggests that the church, although more reverberant and architecturally regular, does not necessarily provide better early sound definition in all frequency bands. In other words, the built worship space is acoustically more persistent, but not automatically more favourable in terms of early-to-total energy ratio. This result is relevant because it shows that the cave spaces cannot be considered merely acoustically deficient predecessors of the later church. They have their own measurable acoustic profile.
Before connecting these measurements with the devotional role of the lower cave, alternative acoustic explanations must be acknowledged. The measured indicators may be affected by the adopted altar-based source position, source orientation, source–receiver distance, local geometry around the altar, nearby rock surfaces, the partial architectural closure of the upper cave, and modal or frequency-dependent behaviour typical of small irregular cavities. Therefore, the retained values should not be interpreted as general intrinsic properties of the whole cave volumes, nor as direct evidence that any space was selected because of its acoustic response. They indicate how each space behaved under the adopted source–receiver configurations and within the repeatability-screened dataset. This limitation is particularly important for clarity and definition indicators, which are more sensitive than decay-related parameters to early reflections, local geometry, and receiver position.
From the point of view of heritage interpretation, the most important result is therefore not acoustic superiority, but acoustic compatibility. The lower cave, which preserves the oldest devotional nucleus of the sanctuary, remains predominantly natural and only minimally transformed. Within these limits, its measured acoustic response does not indicate intentional design or optimization, but remains compatible with voice-based devotional use under the adopted measurement conditions. The available indicators suggest that the lower cave combines moderate decay in the mid-frequency range, a progressive reduction in reverberation towards the high-frequency range, and an increasing early-energy contribution with frequency. These features would not prevent spoken prayer, recitation, or chant, although they do not meet modern criteria for high speech clarity.
The upper cave adds a second level to this interpretation. Compared with the lower cave, it is more affected by artificial elements, including built surfaces around the altar and the partial closure of the entrance. Its decay-related indicators are close to those of the lower cave, suggesting that the cave morphology remains acoustically dominant. At the same time, its D50 values at medium–high and high frequencies are higher than those of the lower cave, but with larger dispersion. This may indicate that the built additions and the modified entrance affect the distribution of early reflections, increasing the early-energy fraction in some receiver configurations while also increasing spatial variability. Therefore, the upper cave should not be interpreted as simply acoustically better or worse than the lower cave; rather, it represents a partially transformed acoustic environment within the same rock-cut system.
The comparison with published data from other cave and cave-like environments [9,10,12], reported in Appendix A, further clarifies the position of San Michele di Mezzo. The sanctuary does not behave as a large highly reverberant cave hall. Its EDT values are lower than those reported for the large Pertosa spaces and closer to compact cave environments such as La Pasiega, Tito Bustillo, Paphos, and El Castillo in the mid-frequency range. The same applies to T30: San Michele shows values lower than those of the Large Hall and Throne Hall of Pertosa over most of the frequency range, while remaining closer to the Castle Hall and to smaller cave environments. This confirms that the site should be interpreted as a compact rock-cut sanctuary rather than as a large reverberant cavity.
The comparison of clarity indicators supports an intermediate interpretation. The available C50 values in the lower cave remain negative, and therefore San Michele should not be described as an exceptionally clear acoustic environment. However, these values are less unfavourable than those reported for some acoustically complex prehistoric cave sites, such as El Castillo, La Pasiega Turret, and Tito Bustillo, where C50 values are markedly lower. Conversely, San Michele does not reach the favourable clarity conditions reported for sites such as La Garma or Las Chimeneas. The comparative evidence therefore places San Michele in an intermediate position: it is neither a highly clear cave environment nor a strongly penalizing large reverberant cavity.
The comparison of C80 and D50 is also informative, although it must be treated cautiously because the available San Michele values are incomplete after repeatability screening. The lower cave has a C80 value at 250 Hz that is comparable with or slightly less unfavourable than values reported for several large cave spaces, while its C80 at 8000 Hz is positive and higher than the values reported for some larger environments. The D50 values of the San Michele caves increase with frequency and become comparable with, or higher than, those reported for several Pertosa spaces in the medium–high and high-frequency range. This reinforces the interpretation that the most relevant favourable feature of San Michele is not high C50-based speech clarity, but the progressive strengthening of the early-to-total energy ratio at higher frequencies.
Overall, the comparative data support a nuanced interpretation. San Michele di Mezzo is neither an acoustically optimized cave nor a large reverberant hall. It is better described as a compact rock-cut sanctuary with frequency-dependent acoustic behaviour: cave-like low-frequency persistence, moderate mid-frequency decay, and increasing early-energy contribution at higher frequencies. The later church, by contrast, shows a more persistent reverberant response typical of built worship spaces. The acoustic identity of the sanctuary therefore lies in the coexistence of these different environments rather than in the superiority of one space over another.
From the perspective of intangible heritage, the measured indicators provide a quantitative but limited connection between the physical response of the spaces and historically plausible sound-related practices. Spoken prayer, chant, and liturgical recitation depend on a balance between reverberant support and intelligibility. EDT, T20, and T30 describe the temporal persistence of vocal sound, while C50, C80, and D50 describe different aspects of early-to-late and early-to-total energy balance. In the present case, these indicators do not reconstruct past rituals and do not allow the medieval sound experience to be reproduced. They do, however, provide evidence that the older lower cave was acoustically compatible with vocal devotional practices, while the later church developed a more reverberant built acoustic environment.
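For reference, the energy-ratio indicators mentioned above follow the conventional ISO 3382-1 definitions, written here for an impulse response p(t) with t = 0 at the arrival of the direct sound. The symbol t_e for the early time limit is introduced only for this summary.

```latex
% Clarity: early-to-late energy ratio, with t_e = 50 ms (C50) or 80 ms (C80)
C_{t_e} = 10 \log_{10}
  \frac{\int_{0}^{t_e} p^{2}(t)\,\mathrm{d}t}
       {\int_{t_e}^{\infty} p^{2}(t)\,\mathrm{d}t}
  \quad \text{[dB]}

% Definition: early-to-total energy ratio, with a 50 ms early limit
D_{50} =
  \frac{\int_{0}^{0.050\,\mathrm{s}} p^{2}(t)\,\mathrm{d}t}
       {\int_{0}^{\infty} p^{2}(t)\,\mathrm{d}t}
```

C50 and C80 therefore compare early energy with late energy on a logarithmic scale, whereas D50 expresses early energy as a fraction of the total, usually reported as a percentage.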
The methodological contribution of the study lies in the conservative use of directly measured quantities, repeated measurements, and repeatability-based screening. The analysis does not rely on unsupported transfer-function estimates between adjacent spaces or on an uncalibrated acoustic model of the sanctuary. Instead, it uses time histories, third-octave-band spectra, and impulse-response-derived indicators obtained under controlled field conditions. The exclusion of indicators with excessive dispersion avoids overinterpreting unstable estimates, especially for clarity and definition parameters, which are particularly sensitive to source directivity, receiver position, and local reflections.
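The logic of repeatability-based screening can be illustrated with a minimal sketch. The acceptance rule, the threshold, and the repeated values below are all hypothetical and chosen only for illustration; the criterion actually applied in the survey is the one defined in the methods and is not reproduced here.

```python
import statistics


def screen(repeats, max_sd):
    """Return (mean, sample standard deviation) if the spread is acceptable, else None."""
    mean = statistics.mean(repeats)
    sd = statistics.stdev(repeats)
    return (mean, sd) if sd <= max_sd else None


# Hypothetical repeated C50 estimates (dB) for two octave bands.
repeated_c50_db = {
    500: [-4.1, -5.0, -5.5],   # consistent repetitions
    2000: [-6.0, 1.5, -2.2],   # unstable repetitions
}
MAX_SD_DB = 1.5  # hypothetical acceptance threshold, not taken from the study

retained = {
    band: screen(values, MAX_SD_DB)
    for band, values in repeated_c50_db.items()
}

for band, result in retained.items():
    if result is None:
        print(f"{band} Hz: excluded (insufficient repeatability)")
    else:
        mean, sd = result
        print(f"{band} Hz: {mean:.2f} ± {sd:.2f} dB retained")
```

An excluded band is reported as missing rather than as a value with a large uncertainty, which is why gaps in the results table reflect screening and not an absence of processing.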
The applicability of the proposed approach may extend beyond the specific heritage interpretation of San Michele di Mezzo, but only at the methodological level. Natural caves, show caves, rock-cut sanctuaries, crypts, and other confined or semi-confined spaces share some acoustic features with the investigated site, including irregular geometry, non-diffuse sound fields, and frequency-dependent responses. Previous studies on tourist caves and natural underground spaces have shown that room-acoustic indicators such as EDT, T30, C50, C80, D50, and STI can support the assessment of guided-tour communication, visitor experience, and performance suitability [11,12]. In this sense, the present survey may be considered as an example of a portable first-level acoustic assessment potentially transferable to other cavity-like environments. However, this transferability concerns the measurement logic, including repeated measurements, uncertainty screening, and conservative interpretation, not the specific acoustic conclusions, which remain site-specific and configuration-specific.
Nevertheless, the study has limitations that affect the universality, but not the internal consistency, of the conclusions. The survey uses selected concepts and positioning criteria inspired by ISO 3382-1, but it should not be interpreted as a standard-compliant room-acoustic characterization. The use of a directional loudspeaker instead of a standardized omnidirectional source may influence the distribution of early acoustic energy, particularly in irregular cave geometries where reflections depend on source orientation, nearby surfaces, and openings. This choice was made because the survey focuses on vocal practices and because no documentary or material evidence currently supports the historical use of musical instruments in the investigated devotional context. The conclusions are therefore site-specific and configuration-specific: they demonstrate measured acoustic differentiation among the lower cave, the upper cave, and the church of San Michele di Mezzo, but they do not define universal acoustic criteria for cave sanctuaries or rock-cut worship spaces. Future research should extend the survey by increasing the number of source and receiver positions, repeating measurements under different environmental conditions and integrating the experimental data with three-dimensional geometric documentation or numerical acoustic modelling.