Article

Wavelet-Based Analysis of Soundscape Dynamics in a Riparian Woodland: The Bernate-Ticino River Park

by Roberto Benocci 1,*, Giorgia Guagliumi 1, Andrea Potenza 1, Valentina Zaffaroni-Caorsi 1, Hector Eduardo Roman 2,* and Giovanni Zambon 1
1 Department of Earth and Environmental Sciences (DISAT), University of Milano-Bicocca, Piazza della Scienza 1, 20126 Milano, Italy
2 Department of Physics, University of Milano-Bicocca, Piazza della Scienza 3, 20126 Milano, Italy
* Authors to whom correspondence should be addressed.
Sensors 2025, 25(23), 7248; https://doi.org/10.3390/s25237248
Submission received: 21 October 2025 / Revised: 16 November 2025 / Accepted: 26 November 2025 / Published: 27 November 2025

Abstract

Passive acoustic monitoring (PAM) is a valuable tool for ecological research, but many eco-acoustic indices show inconsistent correlations with biodiversity due to methodological variability and environmental noise. We propose a complementary, physically interpretable approach using energy-derived metrics. We analyzed audio recordings from three sites near a major highway in the Ticino River Park (Milan, Italy) using 1 sec equivalent continuous sound pressure level (Leq1s), peak interval statistics, maximal-overlap discrete-wavelet transform (MODWT), and temporal fractal analysis. This multi-resolution type of approach enabled frequency-specific tracking of acoustic energy and temporal structure. Our results reveal site-specific differences: Site 3, the most distant from the highway, showed higher high-frequency energy and longer temporal persistence, suggesting richer biophonic activity. Site 1, the closest to the highway, displayed flatter spectral profiles and faster autocorrelation decay. Diel patterns were reflected in hourly Leq trends, while fractal analysis revealed frequency- and site-dependent acoustic memory. These automated findings were corroborated by expert annotations of bird activity and traffic. The integration of Leq1s, peak metrics, and wavelet decomposition offers a suitable framework for soundscape characterization, with strong potential for long-term ecoacoustic monitoring and habitat quality assessment in complex environments.

1. Introduction

The term soundscape was introduced to describe the interplay between the physical landscape and the ensemble of biological (biophony), geophysical (geophony), and human-made (technophony) sounds that occupy it. In recent decades, passive acoustic monitoring (PAM) has enabled the continuous, non-invasive sampling of these sonic patterns, leading to the emergence of soundscape ecology and a suite of descriptors known as eco-acoustic indices [1,2,3,4].
Widely used metrics such as Acoustic Complexity Index (ACI) [5,6,7] and Acoustic Entropy Index (H) [8,9,10] condense information on pitch, modulation, and amplitude into single-value indicators. As a result, they have been promoted as proxies for species assemblage diversity and habitat quality. Despite their potential, recent reviews have highlighted that many eco-acoustic indices show inconsistent or unclear correlations with biodiversity. Their performance is often strongly influenced by various non-biological factors, including recorder characteristics, background geophony (e.g., wind, rain, or river noise), and analytical design choices, i.e., decisions made during data processing and analysis that can substantially affect both results and interpretation [11,12,13,14,15].
Among these choices, the selection of specific eco-acoustic indices plays a critical role. Different indices respond variably to soundscape characteristics: some are more sensitive to insects, others to bird vocalizations, while some may be influenced by anthropogenic or abiotic sounds, leading to markedly different interpretations [11,16,17]. Indeed, temporal parameters such as the analysis window size and resolution, e.g., using 1 s vs. 1 min segments, affect data granularity. Short windows capture rapid, transient events but may increase noise sensitivity, whereas longer windows can smooth over variability that is typical of biotic signals [13]. Similarly, the selection of frequency bands (e.g., 2–8 kHz for birds) and threshold levels can either enhance detection of target species or inadvertently exclude significant acoustic activity. Misaligned frequency ranges may also overemphasize geophonic or anthropogenic sounds [18,19,20,21].
Noise filtering and pre-processing techniques, such as filters for wind or low-frequency background noise, reduce environmental interference but also risk removing important biotic sounds, including specific amphibian calls or low-frequency mammal vocalizations [22]. Moreover, the different software packages and scripts used for calculating eco-acoustic indices, such as Wavesurfer [23], Soundecology [24], MonitoR [25], and Seewave [26], involve varying default settings and processing algorithms. Key parameters like FFT window size, smoothing methods, and detection thresholds can significantly affect outcomes, even when using the same raw audio data [13].
In addition to these important factors, it has been shown that values and trends of eco-acoustic indices are not always comparable across studies. This is partly due to the wide range of commercially available recorders, which differ in cost, sensitivity, and frequency response, introducing biases and inconsistencies in index calculations [27]. These variations contribute to a broader lack of standardization in eco-acoustic research. As a result, the generalizability of acoustic indices as reliable proxies for biodiversity remains limited. This highlights the need for calibration across different environments and validation against independent biodiversity measures.
Researchers are increasingly urged to combine multiple metrics [28] or use composite indices tailored to each study area [29], either through statistical methods or by analyzing the autocorrelation and long-term memory of environmental sounds. Efforts to develop a universal index, such as the Soundscape Ranking Index, are still ongoing. Preliminary results aimed at providing an overall assessment of an area’s soundscape are promising, although further research is needed to refine these approaches [30,31].
At the same time, PAM deployments are expanding from short-term recordings to thousands of hours of audio data. This growing volume has renewed interest in simple descriptors that can be extracted quickly and compared across sites. For instance, the equivalent continuous sound pressure level, Leq, a widely used and standardized acoustic metric, meets this need: it can be calculated at any temporal resolution and, when using calibrated devices, can be directly compared to regulatory noise limits. Furthermore, analyzing the temporal dynamics of short-window Leq values (e.g., Leq1s) allows the identification of diel cycles in biophonic activity and anthropogenic disturbances, without requiring complex spectral processing. Preliminary unsupervised analyses have already demonstrated that peak-based metrics can effectively distinguish functional periods of bird, insect, and anthropogenic activity in urban soundscapes [18,32,33].
In this work, we study a highway-bisected woodland within the Ticino River Regional Park (northern Italy), an ecologically valuable area shaped by both natural processes and anthropogenic pressures. Using passive acoustic monitoring (PAM) recordings collected at multiple locations, we aim at: (1) computing the equivalent continuous sound pressure level at one-second resolution, Leq1s, (2) extracting inter-peak intervals as a proxy for vocal activity rhythms, and (3) assessing whether these simple, energy-based metrics can effectively capture soundscape structure. However, the complexity of overlapping acoustic sources often requires more nuanced analysis.
Our main goal is to develop a methodological approach, based on wavelet analysis [34], that enhances our understanding of complex soundscape dynamics. Specifically, we use the maximal-overlap discrete wavelet transform (MODWT) [35]. The method decomposes each signal into a series of frequency-resolved components that retain the original time resolution, which makes it particularly well-suited for analyzing complex environmental recordings. Although this study is grounded in an ecological context, its primary contribution is methodological. We evaluate whether a MODWT-based decomposition can enhance the extraction and interpretation of simple energy-based acoustic descriptors (Leq1s, peak intervals) that are widely used in soundscape ecology, with the intent to help quantify ecological disturbance. Thus, the ecological differences described in this study are site-specific exploratory observations, while the MODWT analysis is the primary methodological contribution.
We compute the sound pressure level and peak statistics from both the full-band signal and its frequency-resolved wavelet components. In addition, we calculate the autocorrelation function (ACF) of the wavelet components to estimate their respective decay times. Further validation is provided by an expert listening to the audio files to quantify the presence and relative contribution of different sound sources. We expect that decomposing acoustic energy across wavelet levels will reveal distinct temporal signatures of biophonic, geophonic, and anthropogenic sources, thereby improving the robustness of simple eco-acoustic indicators.
The paper is organized as follows. In Section 2, we describe the area of study within the Ticino River Regional Park, the recording setup, how we detect peaks within the Leq1s time series, and the wavelet decomposition used. Details of the validation procedure performed by an acoustic expert are also discussed. In Section 3, we present the results covering the wavelet decomposition and filter used, peak distributions, Leq1h as a function of hour of the day, and the autocorrelation function of the Leq1s time series. Section 4 is devoted to a discussion of the results and the concluding remarks.

2. Materials and Methods

We describe the study area, recording setup, energy normalization procedure, wavelet decomposition, autocorrelation decay-time computation, and expert annotation process.

2.1. Study Area

The Ticino River Regional Park is a protected area that includes extensive forests, wetlands, and meadows along the river. These riparian woodlands play a vital ecological role by supporting high biodiversity and ensuring habitat connectivity; however, they are particularly vulnerable to fragmentation caused by infrastructure such as roads and highways (Figure 1). Three monitoring sites were selected: Site 1, Site 2, and Site 3, situated approximately 100 m, 300 m, and 500 m, respectively, from the main technophonic noise sources (a highway and a high-speed railway).

2.2. Recording Setup and Data Collection

Acoustic data were collected between 26 May 2021 and 10 June 2021 using two Soundscape Explorer–Terrestrial (SET, Lunilettronik, Fivizzano MS, Italy) devices. Each device was equipped with environmental sensors measuring humidity, temperature, light, and atmospheric pressure, and with two microphones: one optimized for low-frequency sounds (up to 48 kHz) and the other for high-frequency sounds (up to 192 kHz). Although each SET recorder includes both microphones, all analyses in this study were based on the recordings from the low-frequency microphone channel, sampled at 48 kHz with 16-bit resolution. This bandwidth (0–24 kHz) fully covers the range of interest for both biophonic and anthropogenic components while ensuring homogeneous data across all sites. The SET units were mounted approximately 4 m above ground level on trees positioned along a transect oriented perpendicular to the main sources of anthropogenic noise in the area, namely the A4 highway and a high-speed railway (see Figure 1).
Three sites were selected for monitoring, with each site observed for approximately one week. Due to the availability of only two SET devices, a sequential monitoring scheme was adopted. Initially, the devices were deployed at Site 1 and Site 3. After one week, the device at Site 1 was relocated to Site 2. As a result, data collection was divided into two continuous monitoring periods:
  • Period 1: From 26 May 2021 (13:00) to 2 June 2021 (23:54), recordings were collected from Site 1 and Site 3.
  • Period 2: From 3 June 2021 (13:00) to 10 June 2021 (23:54), recordings were collected from Site 2 and Site 3.
Site 3 was intentionally kept active during both periods to serve as a temporal reference, making it possible to compare the data from the two monitoring weeks and to control for day-to-day environmental variability (e.g., weather or diel patterns). In contrast, Site 1 and Site 2 were alternated to capture the spatial gradient of anthropogenic disturbance, with Site 1 being closer to the highway and Site 2 located at an intermediate distance. Recordings were made at a 48 kHz sampling rate with 16-bit resolution and stored in WAV format. The recording schedule followed a duty cycle of 1 min of recording followed by a 5 min pause, resulting in 10 recordings per hour at each site. For simplicity, we selected just one day per site, specifically 1 June for Site 1 and 8 June for Site 2 and Site 3.

2.3. Peak Detection

For each 1 s interval, we computed the equivalent continuous sound pressure level as follows,
$$\mathrm{Leq}_{1\,\mathrm{s}} = 10 \log_{10}\left( \overline{p^2} / p_0^2 \right), \qquad (1)$$
where $\overline{p^2}$ is the mean squared sound pressure over the 1 s interval, and $p_0 = 20\ \mu\mathrm{Pa}$ is the reference pressure. Peaks were detected using the findpeaks function from the pracma package in R (version 2025.09.1) [36]. This procedure was applied identically to both the full-band signal and each wavelet component, enabling consistent multi-scale comparison of temporal soundscape patterns. More generally, we often use the standard definition,
$$\mathrm{Leq}_{\tau} = 10 \log_{10}\left[ \frac{1}{\tau} \int_{t_1}^{t_2} dt\, \frac{p^2(t)}{p_0^2} \right], \qquad \tau = t_2 - t_1, \qquad (2)$$
and depending on the problem considered, we use $\tau = 1$ s, 1 min or 1 h.
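As an illustration of Equations (1) and (2), the following minimal Python sketch computes window-wise Leq values from calibrated pressure samples and energy-averages them into a longer-window Leq. It is not the authors' code: the function names `leq_from_pressure` and `combine_leq`, the synthetic test tone, and the 8 kHz sampling rate are assumptions chosen only to keep the example small.

```python
import math

P0 = 20e-6  # reference pressure in Pa (20 µPa), as stated in the text


def leq_from_pressure(pressure, fs, tau=1.0):
    """Return one Leq value (dB) per consecutive, non-overlapping window of tau seconds.

    `pressure` is assumed to hold calibrated sound pressure in Pa.
    A window of exact digital silence would make log10 fail and is not handled here.
    """
    n = int(round(fs * tau))  # samples per window
    levels = []
    for start in range(0, len(pressure) - n + 1, n):
        window = pressure[start:start + n]
        mean_sq = sum(p * p for p in window) / n  # mean squared pressure
        levels.append(10.0 * math.log10(mean_sq / P0 ** 2))
    return levels


def combine_leq(levels):
    """Energy-average equal-length Leq values (dB) into one longer-window Leq (dB)."""
    mean_energy = sum(10.0 ** (lv / 10.0) for lv in levels) / len(levels)
    return 10.0 * math.log10(mean_energy)


# Synthetic check: a 1 kHz sine with 1 Pa RMS corresponds to about 94 dB re 20 µPa.
fs = 8000  # hypothetical sampling rate, kept small for speed
tone = [math.sqrt(2.0) * math.sin(2 * math.pi * 1000 * k / fs) for k in range(2 * fs)]
leq_1s = leq_from_pressure(tone, fs, tau=1.0)  # two 1 s values
leq_2s = combine_leq(leq_1s)                   # Leq over the full 2 s
```

The aggregation step averages energies rather than decibel values, which is what Equation (2) implies when a long interval is split into equal sub-intervals; averaging the dB values directly would underestimate the level whenever the sub-interval levels differ.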
The Leq time series allows for the extraction of inter-peak intervals, or inter-peak lags, defined as the time between two consecutive peaks in the Leq1s time series. The median inter-peak lag is the median of all such intervals, calculated for each recording and wavelet level; using the median rather than the mean minimizes the influence of outliers.
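The inter-peak lag statistics can be sketched as follows. The peak times are made-up values (in seconds) used only to show why the median is less sensitive to a single long silent gap than the mean; `inter_peak_lags` is a hypothetical helper name.

```python
import statistics


def inter_peak_lags(peak_times):
    """Time differences between consecutive peaks (same unit as the input)."""
    return [later - earlier for earlier, later in zip(peak_times, peak_times[1:])]


# Hypothetical peak positions in a Leq1s series, in seconds.
peak_times = [3, 12, 20, 31, 75]
lags = inter_peak_lags(peak_times)     # [9, 8, 11, 44]

mean_lag = statistics.mean(lags)       # pulled upward by the 44 s gap
median_lag = statistics.median(lags)   # stays near the typical spacing
```

In this toy series the mean lag is 18 s while the median is 10 s, because one long gap dominates the mean.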
The energy normalization of the signal and each wavelet component is discussed in Appendix A. The wavelet decomposition used here is summarized in Appendix B, and the MODWT frequency bands are discussed in Appendix C. We proceed with a brief discussion on the type of wavelet filters employed.

2.4. Choice of Wavelet Filters

The performance of MODWT in analyzing acoustic signals depends on the choice of wavelet filter. Here, we evaluated three wavelet bases commonly used in environmental and bioacoustic applications: Daubechies-4 (d4), Daubechies-8 (d8), and Symlet-8 (or Least Asymmetric 8, la8). These filters differ in terms of interpolating polynomial order, symmetry, and time–frequency resolution—all of which influence their ability to capture specific spectral and temporal features of soundscapes.
Daubechies wavelets [37,38,39,40,41,42] form a family widely used in signal processing. Each wavelet is identified by its order N (e.g., d4, d8), which determines two key properties: the order of the interpolating polynomial (also known as the number of vanishing moments) and the filter length (equal to 2 N taps). The filter length governs the time–frequency tradeoff: longer filters offer better frequency resolution but poorer time resolution, and vice versa.
Daubechies wavelets are particularly effective for detecting sharp acoustic events or filtering out background trends. Symlet (la8) wavelets, introduced as a modification of the Daubechies family, retain similar properties but are designed to be nearly symmetric. For example, la8 has the same number of vanishing moments as d8 but introduces less phase distortion. This makes Symlets especially useful in applications where preserving waveform shape and timing is important.
In our context of soundscape monitoring, we tested three wavelet filters which may offer distinct advantages:
  • d4: Offers better time localization and is suited to detecting short transients, chirps, and sharp onsets in birdsong or anthropogenic pulses.
  • d8: Provides better frequency resolution and is more effective at capturing harmonic or tonal components such as whistles or environmental hums.
  • la8: A nearly symmetric modification of d8, which can be useful for detecting signals such as trills or insect calls.

2.5. Temporal Fractal Analysis

Fractals are structures characterized by scale-invariant patterns [43]. They provide a powerful framework for describing the complexity inherent in natural systems and have been applied across numerous disciplines [44,45], including ecology [46,47,48], which is particularly relevant to this study.
Fractal scaling offers an effective methodology for investigating acoustic complexity. The specific method used often depends on the nature of the signal being analyzed and the goals of the study [48]. In the fields of acoustics and music, several notable applications of fractal analysis have been reported [49,50,51]. However, fractal methods are generally more suitable for assessing the complexity of acoustic environments as a whole, rather than individual acoustic events. Recent studies have applied fractal analysis to estimate the fractal dimension of tropical acoustic communities and urban parks [52,53,54,55].
Here, we study the scaling behavior of time series derived from Leq values of both the full-band signal and its wavelet decomposition levels. Temporal scaling is quantified using the Hurst exponent, H, and several robust techniques are available for accurately estimating H [56,57]. Here, we use an indirect approach via the well-known relation,
$$H = 1 - \gamma / 2, \qquad (3)$$
where $\gamma$ is the scale-invariant exponent describing a possible power-law decay of the autocorrelation function (ACF) of the time series. We estimate $\gamma$ by identifying the temporal interval over which the ACF follows a power-law decay, typically observed at shorter time scales. Values of $\gamma$ in the range $0 < \gamma < 1$ correspond to $1 > H > 1/2$, indicating scale-invariant persistent correlations and long time memory. Conversely, when $\gamma > 1$, correlations are short-ranged, and the system behaves like a standard random walk (RW), with $H = 1/2$.
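The mapping between the ACF exponent and the Hurst exponent can be written as a small helper. This is only a sketch of Equation (3) and the two regimes described above; the function name `hurst_from_gamma` and the choice to return `None` outside the range covered by the text are assumptions.

```python
def hurst_from_gamma(gamma):
    """Map the ACF power-law exponent gamma to a Hurst exponent H.

    0 < gamma < 1 : persistent regime, H = 1 - gamma / 2 (Equation (3)).
    gamma >= 1    : short-range correlations, random-walk value H = 1/2.
    gamma <= 0    : outside the range discussed in the text, so None is returned.
    """
    if gamma <= 0:
        return None
    if gamma >= 1:
        return 0.5
    return 1.0 - gamma / 2.0


h_persistent = hurst_from_gamma(0.5)   # 0.75, persistent correlations
h_random_walk = hurst_from_gamma(1.8)  # 0.5, standard random walk
```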
In our context, we expect natural soundscapes, such as those dominated by soniferous species, to exhibit intrinsic complexity and long-range temporal correlations. As a final remark, the term long memory in this study refers to time series whose autocorrelations decay according to a power-law at short time scales.

2.6. Validation of Acoustic Analysis Through Listening-Based Annotation

To validate the automated analysis, we conducted a listening-based annotation of the recordings to quantify the presence of biological sounds, anthropogenic noise, and natural non-biological sounds at each site. For each one-minute recording selected from the representative days, an expert listener evaluated and annotated the following:
  • Biological activity, primarily bird vocalizations. Bird numerosity: classified into three levels: none (value 0), few (value 1), many (value 2). Bird singing duration: classified as fraction of occupied singing time in each recording (range 0 to 1). Bird species: classified into none (value 0), one species (value 1), more than one (value 2). Bird distance: classified into none (value 0), close (value 1), far (value 2).  
  • Anthropogenic noise, with a focus on traffic-related sounds. Traffic activity: categorized as none (value 0), continuous (value 1), or intermittent (value 2). Traffic distance: classified into none (value 0), close (value 1), far (value 2). Train presence: classified into none (value 0) and present (value 1).
Specifically, each categorical score (0, 1, 2) was linearly rescaled to the [0–1] interval for visualization purposes. The perceptual thresholds were defined through preliminary listening sessions to ensure consistent semi-quantitative classification across recordings: 0 = absent, 1 = intermittent < 30% of the minute, 2 = dominant > 30–40%.
The presence of natural non-biological sounds was not significant for the selected analyzed period. To ensure consistency and minimize variability related to individual hearing sensitivity, all recordings were annotated by a single expert listener. Multiple listening trials were conducted to establish clear annotation criteria and enhance the reliability of the perceptual assessment. This iterative process helped refine the identification of key features, leading to a more robust qualitative classification of soundscape components.

3. Results

For each site, the recordings were processed in hourly batches of 10 wav files. Each file was energy-normalized to a spectrogram-based reference before analysis. One-second Leq1s values were then computed for both the broadband (full) signal and for 10 wavelet bands (W1–W10). Finally, peak intervals in the 1 s series were extracted to characterize the temporal distribution of high-energy acoustic events within each hour.
To select the most appropriate wavelet filter for our analysis, we compared Leq calculations using a representative recording from the study area. The results, presented in Figure 2, show Leq1min values across wavelet levels for the d4, d8, and la8 filters.
As shown in Figure 2, the d8 and la8 filters produce nearly identical results across all wavelet decomposition levels. In contrast, the d4 filter enhances energy in the higher-frequency wavelet levels W1–W4, which correspond to the mid-to-high frequency range ($f > 3$ kHz). This range also includes non-biological components near the Nyquist limit. However, since the 3–24 kHz band generally includes the most common biophonic components in woodland soundscapes, such as bird vocalizations and certain insect calls, we selected the d4 wavelet filter to better capture high-frequency biological activity. The d4 wavelet also possesses superior temporal localization, allowing improved detection of short, transient acoustic events such as bird calls or anthropogenic pulses.
Before proceeding, we verified that the energy distribution across all the wavelet decomposition levels was consistent with the total energy of the original signal. An example of this distribution, calculated using the d4 filter for the same recording, is shown in Figure 3. The figure shows that most of the acoustic energy is concentrated in the set of levels (W6–W10), which correspond to the mid-to-low frequency range.
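The energy-consistency check described above can be illustrated with a minimal pure-Python MODWT based on circular filtering. This is a sketch rather than the authors' R implementation. It assumes that "d4" denotes the four-tap Daubechies filter (the convention of several R wavelet packages); with a longer filter the coefficients change but the energy identity tested here does not. The test signal and the number of levels are arbitrary.

```python
import math

SQ3 = math.sqrt(3.0)
# Four-tap Daubechies scaling filter with unit energy (assumed to be the "d4" filter).
G = [c / (4.0 * math.sqrt(2.0)) for c in (1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3)]
# Wavelet filter from the quadrature-mirror relation h_l = (-1)^l g_{L-1-l}.
H = [(-1) ** l * G[len(G) - 1 - l] for l in range(len(G))]
# MODWT filters are the DWT filters rescaled by 1/sqrt(2).
G_T = [c / math.sqrt(2.0) for c in G]
H_T = [c / math.sqrt(2.0) for c in H]


def modwt(x, n_levels):
    """Pyramid MODWT with circular boundaries.

    Returns (details, smooth): details[j-1] holds the level-j wavelet
    coefficients W_j, and smooth holds the final scaling coefficients V_J.
    """
    n = len(x)
    v = list(x)
    details = []
    for j in range(1, n_levels + 1):
        step = 2 ** (j - 1)  # filter taps are spread out by 2^(j-1) at level j
        w = [sum(H_T[l] * v[(t - step * l) % n] for l in range(len(H_T)))
             for t in range(n)]
        v = [sum(G_T[l] * v[(t - step * l) % n] for l in range(len(G_T)))
             for t in range(n)]
        details.append(w)
    return details, v


def energy(seq):
    """Sum of squares of a sequence."""
    return sum(s * s for s in seq)


# Arbitrary synthetic signal standing in for an audio segment.
x = [math.sin(0.3 * k) + 0.5 * math.cos(1.7 * k) for k in range(256)]
details, smooth = modwt(x, n_levels=4)

total = sum(energy(w) for w in details) + energy(smooth)
fractions = [energy(w) / energy(x) for w in details]  # fractional energy per level
smooth_fraction = energy(smooth) / energy(x)
```

Because the MODWT is energy-preserving, `total` matches `energy(x)` up to rounding, so the fractional energies per level plus the smooth fraction sum to one. This direct implementation costs on the order of N operations per level times the filter length, which is adequate for a demonstration but slow for full 48 kHz recordings.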
We also computed and compared the fractional wavelet energy distribution across levels for three pure tones (100 Hz, 1 kHz, and 10 kHz) using MODWT in R. The results, shown in Figure 4, illustrate that lower frequencies (e.g., 100 Hz) concentrate more energy in higher wavelet levels (such as W10 and W9), while 1 kHz maps to mid-levels (around W5), and higher frequencies (e.g., 10 kHz) are represented in the lower wavelet levels (W1 and W2). These results confirm the frequency localization capability of wavelet decomposition.
To investigate temporal patterns in acoustic activity, we computed the equivalent continuous sound pressure level at 1 s resolution, Leq1s, for all recordings across the three sites. From these time series, we extracted peaks as proxies for sound events, distinguishing between two detection modes: (1–1) peaks, which require a single decrease on either side of the peak, and (2–2) peaks, which require at least two consecutive decreases on both sides. While the (1–1) mode captures rapid fluctuations and short-lived events, the (2–2) mode acts as a low-pass filter, emphasizing more prominent and structured events.
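The two detection modes can be sketched with a simple strict-monotonicity rule. This is a simplified stand-in, not a reimplementation of the R routine used in the study: it assumes that an (n–n) peak is a sample preceded by n strictly increasing steps and followed by n strictly decreasing steps. The function name `find_peaks` and the short Leq series are hypothetical.

```python
def find_peaks(series, n_sides=1):
    """Indices i with n_sides strict rises before i and n_sides strict falls after i."""
    peaks = []
    for i in range(n_sides, len(series) - n_sides):
        rising = all(series[k] < series[k + 1] for k in range(i - n_sides, i))
        falling = all(series[k] > series[k + 1] for k in range(i, i + n_sides))
        if rising and falling:
            peaks.append(i)
    return peaks


# Hypothetical Leq1s values in dB.
leq = [50, 52, 51, 53, 55, 54, 52, 56, 50]

peaks_11 = find_peaks(leq, n_sides=1)  # every local maximum
peaks_22 = find_peaks(leq, n_sides=2)  # only maxima with a broader rise and fall
```

On this toy series the (1–1) mode returns three peaks (indices 1, 4 and 7) and the (2–2) mode keeps only index 4, which illustrates why the stricter mode is more selective and favors broader, more structured events.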
The intervals between successive peaks were then used to estimate sound-activity dynamics, providing insight into the typical temporal spacing between acoustic events. Figure 5 shows the hourly peak count rate by wavelet level for both (1–1) and (2–2) modes, using the d4 filter. Overall, the peak counts for the (1–1) and (2–2) modes differ significantly, confirming that the (2–2) mode is, as expected, more selective. Additionally, the two modes exhibit opposite trends across wavelet decomposition levels. The (1–1) mode shows lower peak counts in the lower decomposition levels (W1–W4), with values steadily increasing toward higher levels. Within this pattern, Site 3 consistently shows higher peak counts, particularly in (W1–W4) and (W8–W10). In contrast, the (2–2) mode yields higher peak counts in the lower decomposition levels and an almost flat trend across the remaining levels. In this case as well, Site 3 shows the highest peak counts for (W1–W3), followed by Site 2.
Figure 6 shows the density distribution of inter-peak lags (see Section 2.3) calculated for the (2–2) mode using the d4 filter. In contrast, the (1–1) mode did not reveal any substantial differences. In Figure 6, noticeable variations are observed mainly for wavelet decomposition levels W5, W6, and W10.
As shown in Table 1, the median inter-peak lag is consistently around 10 s across all sites and wavelet levels, indicating a relatively stable underlying pattern of sound activity. However, Site 3 generally exhibits shorter mean lags in the mid-frequency wavelet levels (W2–W6), suggesting denser acoustic activity in these bands. For instance, at level W5, Site 3 has a mean lag of 11.8 s and a median lag of 9 s, compared to 10 s for both Site 1 and Site 2. Similarly, at W6, Site 3 shows a mean lag of 11.7 s and a median lag of 9 s, while Site 2 records a longer mean of 12.7 s and a median of 11 s. These shorter lag values indicate more frequent peaks (i.e., shorter intervals between sound events), which may reflect either higher sound activity or more tightly clustered acoustic events at Site 3 in those frequency bands. Furthermore, Site 3 consistently shows a higher number of peaks in the lower wavelet levels, e.g., (W2–W4), reinforcing the interpretation of denser acoustic activity. For example, Site 3 records 876 peaks at W2 and 916 peaks at W3, which align with the observed shorter lags. This increased activity may point to specific species or sound sources that are more active or acoustically dominant in that frequency range.
Figure 7 shows the hourly Leq1h distribution by wavelet level and site using the d4 filter. The FULL (original signal) band exhibits the highest Leq (≃ 60 dB) across all three sites, while W1 and W2 have the lowest. The increase in Leq from W1 to W5 indicates a general upward trend, peaking around (W5–W7), implying that mid-frequency bands dominate the acoustic energy at all sites. Site 1 and Site 2 have lower W1/W2 medians than Site 3, suggesting less energetic high-frequency content.
Between W7 and W10, Leq values plateau, suggesting that low-frequency components are relatively consistent in energy across sites. Site 3 generally shows slightly higher Leq values at lower levels (W1–W4), possibly indicating more high-frequency or impulsive events (e.g., insect calls), while Site 2 has higher median values at (W5–W7), reflecting persistent mid-frequency sounds. At higher levels (W8–W10), differences between sites converge, though Site 1 maintains a higher median value, likely due to greater exposure to technophonic sound sources.
Figure 8 illustrates the hourly evolution of Leq values across wavelet levels for Site 1, Site 2, and Site 3, respectively, using the d4 filter. The lines represent the mean Leq at each hour for the full signal (FULL) as well as for individual wavelet levels (W1–W10). At all sites, the FULL signal remains elevated throughout the day, averaging around 60 dB, with minor fluctuations that are difficult to discern due to the axis scale. In contrast, the wavelet-decomposed levels exhibit more dynamic behavior, particularly at finer temporal scales. For example, levels (W1–W4) display pronounced diurnal variations, whereas higher-level components (W8–W10) are comparatively stable.
Site 1 shows a well-defined diurnal rhythm. The finest level (W1) presents a clear midday trough, with higher values at dusk and in the early morning. Similar, though less pronounced, patterns appear in the mid-frequency bands (W3–W6), with Leq values typically decreasing from late morning to early afternoon before rising again in the evening. The low-frequency levels (W9–W10) remain relatively flat, indicating more stationary contributions at those scales.
At Site 2, the pattern is more irregular than for Site 1. Although a midday decrease is evident in many bands, the finest levels (W1–W3) occasionally show sudden increases, most notably around 04:00 and 19:00—spikes that are less prominent at Site 1. The mid-frequency bands (W4–W6) follow a broadly similar trend to Site 1 but with greater variability. As with Site 1, W9 and W10 remain relatively constant over the day, though slight evening increases are observable.
Site 3 displays a more complex and irregular pattern than the other two sites. For higher frequencies (W1–W4), Leq values are both higher and more variable. Notably, (W1–W4) exhibit a dip at 15:00 and abrupt changes around 03:00 and 10:00. The mid-frequency bands are noisier than at the other sites, and unlike the smooth profiles of Site 1, the fluctuations at Site 3 are more erratic. Once again, W9 and W10 remain relatively flat, with minor evening increases, as observed at Site 2.
We now present the results of our empirical analysis of temporal scaling in both the original signal and its wavelet decomposition. The aim is to quantify the possible presence of a power-law correction to the temporal decay of the ACF of the broadband Leq1s time series for the three sites, described by the relation
$$y(t) = y_0\, t^{-\gamma} \exp(-t/\beta), \qquad (4)$$
which contains three fitting parameters, $y_0$, $\gamma$ and $\beta$, with the idea of making contact with the fractal analysis discussed in Section 2.5. To this end, we attempt to estimate the exponent $\gamma$ using Equation (4) from the time series derived from both the full broadband signal and the wavelet-decomposed (W1–W10) Leq representations. As outlined in Section 2.5, ACFs were computed for each site and wavelet component over a 24 h period.
We first perform a fit with the three parameters for each site and evaluate the mean value $\langle\beta\rangle$ over the three sites. By keeping $\beta = \langle\beta\rangle \simeq 29.7$ fixed in Equation (4), thus reducing the number of fit parameters to two, we obtain new fits for $y_0$ and $\gamma$ that describe the data accurately. Note that we do not impose the constraint $\gamma > 0$ when performing the fit, so negative values of $\gamma$ can occur. In those circumstances, our approach based on Equation (3) does not apply (see below). As shown in Figure 9, Site 1 exhibits the steepest decay ($\gamma \simeq 0.121$), while Site 2 and Site 3 show flatter decays ($\gamma \simeq 0.033$ and $\gamma \simeq 0.001$, respectively, both close to zero), indicating slightly different sound dynamics between Site 1 and the pair Site 2–Site 3 at short time scales. These results are only preliminary, and longer time series are needed before drawing general conclusions on the behavior of the ACF. Figure 10 shows the fitted $\gamma$ values for each wavelet decomposition level (W1–W10) and site.
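The two-parameter step can be sketched as follows. With β held fixed, taking logarithms of $y(t) = y_0 t^{-\gamma} \exp(-t/\beta)$ gives $\ln y + t/\beta = \ln y_0 - \gamma \ln t$, which is linear in $\ln t$. The sketch below uses ordinary least squares on this linearized form; this is a substitute chosen for simplicity and is not necessarily the fitting procedure used in the study. The function names, the lag range, and the synthetic parameters are assumptions, and the fit requires strictly positive ACF values.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag, normalized so that acf[0] = 1."""
    n = len(series)
    mean = sum(series) / n
    dev = [s - mean for s in series]
    var = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + lag] for i in range(n - lag)) / var
            for lag in range(max_lag + 1)]


def fit_gamma(lags, y, beta):
    """Fit y(t) = y0 * t**(-gamma) * exp(-t / beta) with beta held fixed.

    Uses least squares on ln(y) + t/beta = ln(y0) - gamma * ln(t).
    Requires lags > 0 and y > 0. Returns (y0, gamma); gamma may be negative.
    """
    xs = [math.log(t) for t in lags]
    zs = [math.log(v) + t / beta for t, v in zip(lags, y)]
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    slope = (sum((a - mx) * (b - mz) for a, b in zip(xs, zs))
             / sum((a - mx) ** 2 for a in xs))
    intercept = mz - slope * mx
    return math.exp(intercept), -slope


# Synthetic ACF values generated from the model itself, to check parameter recovery.
beta_fixed = 29.7                      # mean beta quoted in the text, in units of lag
lags = list(range(1, 61))              # hypothetical fitting range
y_true = [0.8 * t ** (-0.12) * math.exp(-t / beta_fixed) for t in lags]
y0_fit, gamma_fit = fit_gamma(lags, y_true, beta_fixed)
```

On noise-free synthetic data the fit returns the generating values ($y_0 = 0.8$, $\gamma = 0.12$). On real ACFs, a log-linear fit weights small ACF values more heavily than a direct nonlinear fit would, so the two approaches can give somewhat different exponents.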
The results in Figure 10 reveal a strong frequency dependence in temporal scaling behavior: high-frequency components (W1–W8) have negative $\gamma$ values, whereas the low-frequency ones (W9–W10) yield positive values, $0 < \gamma < 1$, consistent with a conspicuous persistence described by anomalous diffusion exponents $1/2 < H < 1$ (Equation (3)). Site-specific differences are evident across all levels, pointing to spatial variability in the temporal dynamics of the acoustic environment. The results suggest that the ACF for Site 1, which is closer to the highway, decays faster in time than for Sites 2 and 3, as one may expect. Tentatively, we may interpret the negative values of $\gamma$ found for the high-frequency wavelets as representing bird sound activity almost unaffected by the relatively low-frequency disturbances produced by road and railroad traffic.
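The two-parameter fit with fixed $\beta$ can be sketched as follows. Reading Equation (4) as $y(t) = y_0 t^{-\gamma} \exp(-t/\beta)$, fixing $\beta$ gives the linear relation $\ln y + t/\beta = \ln y_0 - \gamma \ln t$, so $\gamma$ and $y_0$ follow from an ordinary least-squares line. This linearized fit is an assumption chosen for simplicity (the fitting routine actually used is not stated in the text), and the function name and the synthetic parameters are hypothetical.

```python
import math


def fit_power_law_with_cutoff(times, values, beta):
    """Fit y(t) = y0 * t**(-gamma) * exp(-t / beta) with beta held fixed.

    Returns (y0, gamma). Uses least squares on the linearized form
    ln y + t/beta = ln y0 - gamma * ln t, so every t and y must be > 0.
    The sign of gamma is not constrained.
    """
    xs, zs = [], []
    for t, y in zip(times, values):
        if t <= 0 or y <= 0:
            raise ValueError("the linearized fit needs t > 0 and y > 0")
        xs.append(math.log(t))
        zs.append(math.log(y) + t / beta)
    n = len(xs)
    if n < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic check with hypothetical parameters (not the recorded ACFs).
BETA = 29.7  # mean value of beta quoted in the text
lags = list(range(1, 61))
synthetic = [0.8 * t ** (-0.121) * math.exp(-t / BETA) for t in lags]
y0_hat, gamma_hat = fit_power_law_with_cutoff(lags, synthetic, BETA)
```

Because the linearization takes logarithms, it only applies to lags where the ACF is positive; a nonlinear least-squares routine would be needed to include non-positive ACF values.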

4. Discussion and Concluding Remarks

This study is experimental in nature, based on a limited dataset composed of three sites and a single 24 h recording period at each location. As such, the ecological differences we report, such as higher biophonic activity at Site 3 or stronger anthropogenic signatures at Site 1, should be interpreted strictly as site-specific patterns rather than generalizable ecological trends. Our primary aim is not to draw broad ecological conclusions but to use these recordings as a controlled test bed for evaluating the performance of MODWT-based, frequency-resolved energy metrics and peak-interval analysis. Therefore, the ecological observations serve mainly as illustrative examples that demonstrate how the proposed methodology responds to real soundscape variability.
We utilize the MODWT to decompose soundscape audio into localized components. We selected the d4 wavelet filter because its shorter support provides better temporal resolution for detecting fast, transient events and enhances energy representation in the higher frequency bands (W1–W4), which are typically rich in biophonic signals (e.g., bird and insect calls). While d8 and la8 produced comparable overall Leq distributions, they suppressed energy in these higher bands. The decomposition confirmed the expected spectral localization: higher wavelet levels (W8–W10) captured low-frequency energy, which is likely dominated by abiotic sounds (wind, traffic), while lower levels (W1–W4) were sensitive to high-frequency components, typically associated with biophonic sources.
Short-term acoustic events were isolated using the peak-search modes ((1–1) and (2–2)) to differentiate transient from structured events. The (1–1) mode, sensitive to rapid fluctuations, showed a general increase in peaks toward lower frequencies, with Site 3 dominating, especially at (W1–W4) and (W8–W10). The (2–2) mode, which emphasizes structured events (due to its temporal smoothing), revealed distinctive activity patterns with Site 2 and Site 3 exhibiting higher peak counts. Inter-peak lag distributions (Figure 6 and Table 1) strongly supported denser acoustic activity at Site 3, particularly in the high-to-mid-frequency bands (W1–W6). Although all sites had a median lag near 10 s, Site 3’s shorter mean lags in these key bands suggest a more clustered vocal landscape, potentially indicating overlapping or competitive calling behavior.
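The peak-search and inter-peak lag statistics can be illustrated with a short Python sketch. It assumes that a (k–k) peak is a sample preceded by k strictly rising steps and followed by k strictly falling steps; this reading of the mode labels, as well as the function names and the toy series, are assumptions made for this example and are not taken from the study.

```python
from statistics import mean, median


def find_peaks(series, n_up=1, n_down=1):
    """Indices i where the series rises for n_up steps up to i
    and then falls for n_down steps after i (assumed (k-k) mode)."""
    peaks = []
    for i in range(n_up, len(series) - n_down):
        rising = all(series[j] < series[j + 1] for j in range(i - n_up, i))
        falling = all(series[j] > series[j + 1] for j in range(i, i + n_down))
        if rising and falling:
            peaks.append(i)
    return peaks


def inter_peak_lags(peak_indices, dt=1.0):
    """Time differences between consecutive peaks, with dt seconds per sample."""
    return [(b - a) * dt for a, b in zip(peak_indices, peak_indices[1:])]


# Hypothetical Leq1s values in dB (not data from the study).
toy_leq = [50, 52, 51, 53, 55, 54, 52, 53, 56, 58, 57, 55]
peaks_11 = find_peaks(toy_leq, 1, 1)   # sensitive to rapid fluctuations
peaks_22 = find_peaks(toy_leq, 2, 2)   # keeps only smoother, structured peaks
lags_11 = inter_peak_lags(peaks_11)
mean_lag_11 = mean(lags_11)
median_lag_11 = median(lags_11)
```

In this toy series the (2–2) criterion discards the isolated one-sample bump at index 1, which mirrors the distinction between transient and structured events drawn in the text.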
Hourly Leq trends (Figure 7) revealed distinct site-specific differences. For instance, Site 3 consistently showed higher Leq in the high-frequency (W1–W4) levels, indicating a richer high-frequency biophony (e.g., birds, insects). Site 2 prevailed in the mid-frequency (W5–W7) range (likely technophony), while Site 1 exhibited higher Leq at the low-frequency (W8–W10) levels. Overall, Site 3 concentrated its maximal energy in the mid-to-low bands (W6–W10), reinforcing the presence of rich high-frequency biophony in (W1–W4).
The hourly Leq trends across all sites (Figure 8) reveal the diel rhythm of biological activity, primarily through dawn and dusk peaks in levels (W1–W6). The mid-day decline in acoustic energy is likely due to reduced vocalization, possibly from thermal or light constraints. Conversely, the flatter profiles observed in the lowest-frequency bands (W9 and W10) at all sites represent temporally stable background sounds or abiotic sources (e.g., geophonies like wind or distant traffic) with minimal hourly variation. These results validate MODWT-based Leq tracking as an effective method for capturing these diel acoustic signatures.
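Hourly levels of this kind are obtained by averaging energies rather than decibel values. The sketch below applies the standard energy average, $L_{eq} = 10 \log_{10}\big(\tfrac{1}{N}\sum_i 10^{L_i/10}\big)$, to consecutive blocks of one-second levels; the function names and the block handling (a shorter final block is kept) are assumptions of this example.

```python
import math


def leq(levels_db):
    """Energy-average a sequence of levels given in dB."""
    if not levels_db:
        raise ValueError("at least one level is required")
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def hourly_leq(leq_1s, samples_per_hour=3600):
    """Aggregate one-second levels into consecutive hourly Leq values.

    A final block shorter than samples_per_hour is still averaged.
    """
    return [
        leq(leq_1s[start:start + samples_per_hour])
        for start in range(0, len(leq_1s), samples_per_hour)
    ]
```

For example, `leq([50.0, 60.0])` is about 57.4 dB rather than the arithmetic mean of 55 dB, because the louder second dominates the energy.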
Site 1 shows a relatively smooth, structured diurnal pattern at finer wavelet levels, reflecting the natural ecological dynamics and predictable daily cycles of the biophonic activity. The proximity to the highway is evident in the similar Leq trends of the low-frequency (W8–W10) decomposition levels. Site 2 also displays an almost regular pattern but with more pronounced peaks (around 04:00 and 19:00), suggesting transient or irregular sound sources, such as intense bird choruses, indicative of a more complex acoustic environment. Site 3 presents a more erratic pattern in the high-mid-frequency bands (W1–W5); its Leq profiles are not as smoothly modulated as those of Site 1. Crucially, the Leq levels in these bands are much higher than at the other two sites, strongly suggesting the presence of closer and louder biophonies.
Temporal fractal analysis, describing the complexity and persistence of sounds, shows that Site 2 and Site 3 display similar persistence (up to rounding errors) in sound activity, followed by Site 1 (Figure 9). Lower values of the γ exponent (indicating a slower decay of the ACF) support this conclusion. Site 1, which is the closest to road and railroad traffic, is characterized by the fastest ACF decay.
A multiscale analysis of the numerically obtained $\gamma$ values across wavelet levels (W1–W10) revealed a strong frequency dependence in temporal scaling (Figure 10). High-frequency levels (W1–W7) even exhibited negative $\gamma$ values, suggesting that our approach based on Equation (3) no longer applies. We interpret this result by suggesting that high-frequency wavelet time series are unaffected by the low-frequency perturbations originating from road and railroad traffic. In addition, site-specific differences in persistence were consistent across all levels, reflecting spatial variability in the temporal dynamics of the acoustic environment.
To validate the MODWT-based spectral analysis, we performed extensive listening-based annotations on one-minute recordings from each site over the analyzed period (see Figure 11). It is found that bird activity and abundance are negatively related to proximity to anthropogenic disturbance. Bird abundance and songs were lowest at Site 1 (closest to disturbance) and highest at the more distant Site 2 and Site 3, suggesting birds are more vocally active where noise is less intense [18,58]. Species richness did not strongly correlate with distance, but Site 3 maintained the most stable species diversity. Spatially, birds were heard farther from the recorder at Site 1, suggesting avoidance behavior in noisier environments. Traffic intensity and train counts were also highest at Site 1. Overall, these findings support the hypothesis that anthropogenic noise negatively impacts bird activity, diversity, and spatial behavior, with the strongest effects near human infrastructure [6,59].
Expert manual annotations (Table 2) confirmed the automated MODWT findings. First, the duration and intensity of birdsong temporally aligned with peak Leq values in (W1–W4) and high peak rates, especially at Sites 2 and 3 during dawn/afternoon. Annotations also confirmed a higher biophonic presence at Site 3, validating its elevated wavelet-level energy and shorter inter-peak intervals. Second, traffic profiles confirmed site distinctions: Sites 1 and 2 have more significant traffic disturbance, while Site 3 is primarily affected by continuous, distant, low-frequency residual noise. This aligns with W9–W10 energy distributions and explains the flat Leq values at Site 2, suggesting some of its mid-band energy may originate from biophonic sources.
Overall, the convergence between manual annotations and automated metrics reinforces the robustness of MODWT-based acoustic monitoring and validates its potential for long-term eco-acoustic assessment in multi-source soundscapes. Further in-depth analysis focusing on species and source recognition could help to better interpret these findings.
In summary, this study stresses the advantage of integrating simple energy-based metrics with multi-resolution wavelet analyses to characterize complex environmental soundscapes. By combining the equivalent continuous sound pressure level at one-second resolution (Leq1s), inter-peak interval statistics, and the maximal-overlap discrete-wavelet transform (MODWT), we were able to capture both the spectral and temporal dynamics of acoustic activity across a highway-bisected woodland in the Ticino River Park near Bernate, Italy. Our findings show that wavelet decomposition enhances the interpretability of Leq and peak-based metrics by isolating frequency-specific patterns of biophony, geophony, and technophony. Site-specific differences—such as the higher biophonic activity and vocal persistence observed at Site 3, or the spectral flattening and lower fractal persistence near the highway at Site 1—underscore the sensitivity of these methods to both ecological and anthropogenic influences. Hourly trends in wavelet-level Leq and autocorrelation metrics revealed diel periodicity and long-range temporal structures associated with biological activity, findings corroborated by expert listening.
The convergence between spectral analysis and expert annotation validates MODWT-based acoustic metrics as a suitable tool for eco-acoustic monitoring. Furthermore, the use of physically interpretable descriptors, such as Leq1s and inter-peak intervals, provides a valuable complement to traditional eco-acoustic indices, particularly in heterogeneous, multi-source environments. As this study was based on a limited temporal dataset and only three monitoring sites, our results should be interpreted as a methodological validation rather than a full ecological assessment. Future work will focus on extending this approach to longer time series, additional habitats, and integration with machine-learning-based source identification to improve ecological interpretation.
Because of the small dataset employed, consisting of only three sites monitored over one day each, the ecological patterns identified in this study cannot be generalized beyond the specific sampling context. The conclusions with ecological relevance should therefore be considered exploratory. In contrast, the methodological findings, that is, the ability of MODWT to reveal frequency-specific energy dynamics, enhance peak-based descriptors, and highlight temporal decay properties, are the core contribution of this work (see the comparison with traditional ecoacoustic indices discussed in Appendix C). The study should thus be viewed primarily as a proof-of-concept demonstration of a multi-resolution acoustic analysis framework, with ecological interpretation offered only as an application example.
This study also presents limitations in its experimental design, particularly regarding the non-simultaneous nature of the recordings. The three sites were monitored on different days, which introduces potential variability especially related to weather. To mitigate these issues, we examined meteorological and physical data from the ARPA weather station in Magenta during the recording period. During the measurement days, no hourly average wind speed exceeded 5 m/s and no hourly average precipitation exceeded 2 mm/h, so no recordings required exclusion. While these checks confirm that weather conditions were broadly comparable and free from major disturbances, we acknowledge that a full analysis of temperature effects was not performed. These environmental variables can influence vocal activity and soundscape structure, and future studies using simultaneous or repeated sampling designs will be necessary to better control and quantify their role.

Author Contributions

R.B.: Writing—review & editing, Writing—original draft, Visualization, Validation, Supervision, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. H.E.R.: Writing—review & editing, Writing—original draft, Visualization, Validation, Investigation, Formal analysis, Data curation, Conceptualization. G.G. and V.Z.-C.: Writing—review & editing, Validation, Conceptualization. A.P.: Writing—review & editing, Validation, Conceptualization, Data curation. G.Z.: Writing—review & editing, Validation, Conceptualization, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data used in this study are available upon request.

Conflicts of Interest

The authors declare no conflict of interest. There is no financial interest to report. The authors certify that the submission is an original work and is not under review at any other journal.

Appendix A. Energy Normalization

The energy normalization makes the full-band signal (original recording) and each wavelet component directly comparable. The full-band signal is first analyzed using a short-time Fourier transform (STFT) to derive a time-frequency representation of sound pressure levels. The reference energy is computed from the spectrogram as follows. First, the spectrogram amplitude is converted from dB to squared pressure,
$$p^2(f,t) = p_0^2 \cdot 10^{A_{\mathrm{dB}}(f,t)/10},$$
where $A_{\mathrm{dB}}(f,t)$ is the spectrogram amplitude in dB and $p_0$ is the reference pressure. Then, the average squared pressure across all frequencies at time frame $t$ is computed as
$$p^2(t) = \frac{1}{F} \sum_{f=1}^{F} p^2(f,t).$$
Finally, a reference energy (proportional to Pa$^2$) is obtained by averaging over time,
$$E_{\mathrm{ref}} = \frac{1}{T} \sum_{t=1}^{T} p^2(t),$$
where $F$ is the number of frequency bins (in our case we use 1024 FFT points) and $T$ is the number of time frames. The reference energy, $E_{\mathrm{ref}}$, serves as a common baseline for normalizing both the full waveform and its wavelet components.
Now, to ensure consistency between spectral and time-domain representations, the full-band signal is rescaled to match the reference energy obtained from the spectrogram. To do this, we first compute the average energy of the original audio signal,
$$E_{\mathrm{signal}} = \frac{1}{N} \sum_{n=1}^{N} s^2[n],$$
where $s[n]$ is the signal amplitude at time sample $n$, and $N$ is the total number of samples. Next, we apply a scaling factor to the signal to match the baseline energy:
$$s'[n] = s[n] \cdot \sqrt{\frac{E_{\mathrm{ref}}}{E_{\mathrm{signal}}}},$$
so that the normalized waveform $s'[n]$ obeys the sum rule,
$$\frac{1}{N} \sum_{n=1}^{N} s'^2[n] = E_{\mathrm{ref}}.$$
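The normalization steps above can be sketched in a few lines of Python. The spectrogram is represented here as a list of time frames, each a list of dB values per frequency bin; this layout, the function names, the toy values, and the reference pressure of 20 µPa (the conventional value for airborne sound, not stated in the text) are assumptions of this example.

```python
import math

P0 = 20e-6  # assumed reference pressure in Pa (conventional value in air)


def reference_energy(spectrogram_db, p0=P0):
    """E_ref from a spectrogram given as frames[t][f] in dB.

    Each dB value is converted to squared pressure, averaged over
    frequency bins within a frame, and then averaged over frames.
    """
    frame_means = []
    for frame in spectrogram_db:
        squared = [p0 ** 2 * 10 ** (a_db / 10) for a_db in frame]
        frame_means.append(sum(squared) / len(squared))
    return sum(frame_means) / len(frame_means)


def mean_energy(signal):
    """Average of the squared samples."""
    return sum(x * x for x in signal) / len(signal)


def normalize_to_reference(signal, e_ref):
    """Rescale the waveform so that its mean energy equals e_ref."""
    scale = math.sqrt(e_ref / mean_energy(signal))
    return [x * scale for x in signal]


# Hypothetical toy inputs (not data from the study).
toy_spectrogram = [[60.0, 60.0], [60.0, 60.0]]
toy_signal = [0.1, -0.2, 0.3, -0.1]
e_ref = reference_energy(toy_spectrogram)
normalized = normalize_to_reference(toy_signal, e_ref)
```

After the call, `mean_energy(normalized)` equals `e_ref` up to floating-point rounding, which is the sum rule stated above.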

Appendix B. Wavelet Decomposition

We apply MODWT to the normalized signal $s'[n]$ obtained in Appendix A. This transformation decomposes the input signal into $J$ levels of wavelet coefficients, where each level corresponds to a specific frequency band. Specifically,
$$\mathrm{MODWT}(s') \rightarrow \{w_j[n]\}, \quad j = 1, \ldots, J.$$
Here, $w_j[n]$ represents the wavelet coefficient at level $j$ and time index $n$. Unlike the standard discrete wavelet transform (DWT) [60], which downsamples the signal, MODWT preserves the original sampling rate. That is, the number of samples $N$ remains the same for all wavelet levels. This allows temporal alignment between the original signal and all its wavelet components. Each wavelet level captures signal fluctuations at a different frequency scale. We compute the average energy at each level $j$ using
$$E_j = \frac{1}{N} \sum_{n=1}^{N} w_j^2[n],$$
and therefore the total energy across all levels in the decomposition is given by
$$E_{\mathrm{MODWT}} = \sum_{j=1}^{J} E_j.$$
In general, the total MODWT energy, $E_{\mathrm{MODWT}}$, does not match the energy of the original signal due to the non-orthogonal nature of the MODWT filters; we therefore apply a uniform scaling factor to all wavelet levels. This scaling adjusts each wavelet coefficient as follows:
$$\tilde{w}_j[n] = w_j[n] \cdot \sqrt{\frac{E_{\mathrm{ref}}}{E_{\mathrm{MODWT}}}},$$
where the square root ensures that energy (which scales quadratically with amplitude) is properly matched. After scaling, the energy of the transformed wavelet components equals the reference energy:
$$\sum_{j=1}^{J} \frac{1}{N} \sum_{n=1}^{N} \tilde{w}_j^2[n] = E_{\mathrm{ref}}.$$
This procedure allows for a comparison of Leq indicators for the full signal and their wavelet decompositions. For the latter, the levels were computed using the modwt function from the wavelets package in R [61].
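The uniform rescaling can be illustrated with a self-contained Python sketch. For brevity it uses a circular Haar MODWT written by hand instead of the d4 filter and the R modwt function used in the study, so the filter, the function names, and the synthetic signal are all assumptions of this example.

```python
import math


def haar_modwt(signal, levels):
    """Circular Haar MODWT (a stand-in for the d4 filter of the study).

    Returns (wavelet_levels, smooth): one coefficient list per level,
    each as long as the input, plus the final scaling coefficients.
    """
    n = len(signal)
    smooth = list(signal)
    wavelet_levels = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)
        detail = [(smooth[i] - smooth[(i - shift) % n]) / 2 for i in range(n)]
        smooth = [(smooth[i] + smooth[(i - shift) % n]) / 2 for i in range(n)]
        wavelet_levels.append(detail)
    return wavelet_levels, smooth


def mean_energy(values):
    """Average of the squared coefficients."""
    return sum(v * v for v in values) / len(values)


def rescale_levels(wavelet_levels, e_ref):
    """Apply one common factor so that the level energies sum to e_ref."""
    e_modwt = sum(mean_energy(w) for w in wavelet_levels)
    if e_modwt == 0:
        raise ValueError("wavelet levels carry no energy")
    scale = math.sqrt(e_ref / e_modwt)
    return [[c * scale for c in w] for w in wavelet_levels]


# Hypothetical synthetic signal (not a recording from the study).
signal = [math.sin(0.3 * i) + 0.5 * math.sin(2.1 * i) for i in range(64)]
e_ref = mean_energy(signal)  # stands in for the spectrogram-based E_ref
levels, smooth = haar_modwt(signal, 3)
scaled = rescale_levels(levels, e_ref)
```

In this Haar sketch the wavelet levels alone carry less energy than the signal, the remainder sitting in the final scaling coefficients `smooth`; the common factor restores the total to `e_ref` while leaving the relative distribution of energy across levels unchanged.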

Appendix C. MODWT Frequency Bands

We used 10 MODWT decomposition levels since this depth fully covers the ecologically relevant frequency range of our recordings given the 48 kHz sampling rate. In practice, 10 levels allow separation of:
  • <250 Hz: low-frequency anthropogenic rumble and river noise (geophony/technophony).
  • 0.25–2 kHz: mammal, amphibian, and low bird vocalizations.
  • 2–8 kHz: dominant bird biophony.
  • >8 kHz: insect stridulation and high-frequency cues.
For each level $W_j$, the frequency interval is given by $\left[2^{-(j+1)} f_s,\ 2^{-j} f_s\right]$ Hz, where $f_s = 48$ kHz is the sampling rate. The approximate frequency bands are reported in Table A1.
Table A1. Approximate frequency bands for each MODWT level decomposition. For the last band, the precise value 23.4 was rounded to 24. The residual signal, s10, covers 0–24 Hz.
Level    Frequency Band (Hz)
W1       12,000–24,000
W2       6000–12,000
W3       3000–6000
W4       1500–3000
W5       750–1500
W6       375–750
W7       188–375
W8       94–188
W9       47–94
W10      24–47
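The band edges in Table A1 follow directly from the dyadic rule $[f_s/2^{j+1}, f_s/2^{j}]$. The short Python sketch below reproduces them and maps a frequency to its nominal level; the function names are hypothetical, and the bands are nominal only, since real wavelet filters leak energy into neighboring levels.

```python
import math


def modwt_band_edges(level, fs):
    """Nominal (low, high) edges in Hz of wavelet level `level`."""
    return fs / 2 ** (level + 1), fs / 2 ** level


def nominal_level(freq_hz, fs):
    """Level j whose nominal band contains freq_hz (0 < freq_hz < fs / 2)."""
    if not 0 < freq_hz < fs / 2:
        raise ValueError("frequency must lie between 0 and the Nyquist frequency")
    return math.floor(math.log2(fs / freq_hz))


FS = 48_000  # sampling rate in Hz, as stated in the text
bands = {j: modwt_band_edges(j, FS) for j in range(1, 11)}
```

With these definitions, `bands[10]` is `(23.4375, 46.875)`, which Table A1 rounds to 24–47 Hz, and pure tones at 100 Hz, 1 kHz, and 10 kHz fall nominally in W8, W5, and W2, respectively.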
As mentioned in Appendix B, MODWT does not downsample the signal. Therefore, each wavelet coefficient series retains the same length as the input signal. As a result, the transform is redundant and not orthogonal, meaning that energy is not uniquely partitioned across the decomposition levels. However, this redundancy allows for precise temporal alignment across all wavelet levels.
Note that no field calibration tone or microphone sensitivity correction was applied here. Consequently, the reported Leq values should not be interpreted as absolute sound pressure levels. However, because the same devices and settings were used consistently across all sites, the results remain suitable for comparing sound level variations, spectral energy distributions, and wavelet decomposition levels between sites.
The MODWT-based metrics provide clearer frequency-specific temporal dynamics than the conventional ecoacoustic indices, examples of which are plotted in Figure A1. While indices such as ACI or H summarize broadband patterns, they are known to be sensitive to geophony and recording conditions and often compress complex signals into single-value descriptors. In contrast, the MODWT decomposition allows $\gamma$, Leq1s, and peak statistics to be examined separately within ecologically meaningful frequency bands, making it easier to identify whether slow-decaying autocorrelation originates from biophony (mid–high levels) or from low-frequency anthropogenic noise (high-level MODWT components W7–W10). This complements and in some cases improves the interpretability of traditional indices.
Figure A1. Conventional ecoacoustic indices, ACI and H, vs. hour of day.

References

  1. Rajan, S.C.; Athira, K.; Jaishanker, R.; Sooraj, N.P.; Sarojkumar, V. Rapid assessment of biodiversity using acoustic indices. Biodivers. Conserv. 2019, 28, 2371–2383.
  2. Smith, D.G.; Truskinger, A.; Roe, P.; Watson, D.M. Do acoustically detectable species reflect overall diversity? A case study from Australia’s arid zone. Remote Sens. Ecol. Conserv. 2020, 6, 286–300.
  3. Alcocer, I.; Lima, H.; Moreira-Sugai, L.S.; Llusia, D. Acoustic indices as proxies for biodiversity: A meta-analysis. Biol. Rev. 2022, 97, 2209–2236.
  4. Hending, D. Cryptic species conservation: A review. Biol. Rev. 2025, 100, 258–274.
  5. Pieretti, N.; Farina, A.; Morri, D. A new methodology to infer the singing activity of an avian community: The Acoustic Complexity Index (ACI). Ecol. Indic. 2011, 11, 868–873.
  6. Farina, A.; Pieretti, N.; Piccioli, L. The soundscape methodology for long-term bird monitoring: A Mediterranean Europe case-study. Ecol. Inform. 2011, 6, 354–363.
  7. Fairbrass, A.J.; Rennert, P.; Williams, C.; Titheridge, H.; Jones, K.E. Biases of acoustic indices measuring biodiversity in urban areas. Ecol. Indic. 2017, 83, 169–177.
  8. Sueur, J.; Pavoine, S.; Hamerlynck, O.; Duvail, S. Rapid acoustic survey for biodiversity appraisal. PLoS ONE 2008, 3, e4065.
  9. Sugai, L.S.M.; Silva, T.S.F.; Ribeiro, J.W., Jr.; Llusia, D. Terrestrial passive acoustic monitoring: Review and perspectives. BioScience 2019, 69, 15–25.
  10. Gibb, R.; Browning, E.; Glover-Kapfer, P.; Jones, K.E. Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring. Methods Ecol. Evol. 2019, 10, 169–185.
  11. Sueur, J.; Farina, A.; Gasc, A.; Pieretti, N.; Pavoine, S. Acoustic indices for biodiversity assessment and landscape investigation. Acta Acust. United Acust. 2014, 100, 772–781.
  12. Sueur, J.; Farina, A. Ecoacoustics: The ecological investigation and interpretation of environmental sound. Biosemiotics 2015, 8, 493–502.
  13. Bradfer-Lawrence, T.; Gardner, N.; Bunnefeld, N.; Bunnefeld, L.; Dent, D.H.; Willis, S.G. Guidelines for the use of acoustic indices in environmental research. Methods Ecol. Evol. 2019, 10, 1796–1807.
  14. Eldridge, A.; Guyot, P.; Moscoso, P.; Johnston, A.; Eyre-Walker, Y.C.; Peck, M. Sounding out ecoacoustic metrics: Avian species richness is predicted by acoustic indices in temperate but not tropical habitats. Ecol. Indic. 2020, 113, 106206.
  15. Bradfer-Lawrence, T.; Bunnefeld, N.; Gardner, N.; Willis, S.G.; Dent, D.H. Rapid assessment of avian species richness and abundance using acoustic indices. Ecol. Indic. 2020, 115, 106400.
  16. Lahoz-Monfort, J.J.; Magrath, M.J. A comprehensive overview of technologies for species and habitat monitoring and conservation. BioScience 2021, 71, 1038–1062.
  17. Ross, S.R.J.; O’Connell, D.P.; Deichmann, J.L.; Desjonquères, C.; Gasc, A.; Phillips, J.N.; Sethi, S.S.; Wood, C.M.; Burivalova, Z. Passive acoustic monitoring provides a fresh perspective on fundamental ecological questions. Funct. Ecol. 2023, 37, 959–975.
  18. Fuller, S.; Axel, A.C.; Tucker, D.; Gage, S.H. Connecting soundscape to landscape: Which acoustic index best describes landscape configuration? Ecol. Indic. 2015, 58, 207–215.
  19. Buxton, R.; McKenna, M.; Clapp, M.; Meyer, E.; Stabenau, E.; Angeloni, L.; Crooks, K.; Wittemyer, G. Efficacy of extracting indices from large-scale acoustic recordings to monitor biodiversity. Conserv. Biol. 2018, 32, 1174–1184.
  20. Wägele, J.W.; Bodesheim, P.; Bourlat, S.J.; Denzler, J.; Diepenbroek, M.; Fonseca, V.; Frommolt, K.H.; Geiger, M.F.; Gemeinholzer, B.; Glöckner, F.O.; et al. Towards a multisensor station for automated biodiversity monitoring. Basic Appl. Ecol. 2022, 59, 105–138.
  21. Guagliumi, G.; Canedoli, C.; Potenza, A.; Zaffaroni-Caorsi, V.; Benocci, R.; Padoa-Schioppa, E.; Zambon, G. Unraveling Soundscape Dynamics: The Interaction Between Vegetation Structure and Acoustic Patterns. Sustainability 2025, 17, 4204.
  22. Farina, A.; Gage, S.H. Ecoacoustics: The Ecological Role of Sounds; John Wiley and Sons: Hoboken, NJ, USA, 2017.
  23. Farina, A.; Lattanzi, E.; Malavasi, R.; Pieretti, N.; Piccioli, L. Avian soundscapes and cognitive landscapes: Theory, application and ecological perspectives. Landsc. Ecol. 2011, 26, 1257–1267.
  24. R-Project. Soundecology. Available online: https://cran.r-project.org/package=soundecology (accessed on 15 November 2025).
  25. R-Project. MonitoR. Available online: https://cran.r-project.org/web/packages/monitoR/index.html (accessed on 15 November 2025).
  26. R-Project. Seewave. Available online: https://CRAN.R-project.org/package=seewave (accessed on 15 November 2025).
  27. Potenza, A.; Zaffaroni-Caorsi, V.; Benocci, R.; Guagliumi, G.; Fouani, J.M.; Bisceglie, A.; Zambon, G. Biases in Ecoacoustics Analysis: A Protocol to Equalize Audio Recorders. Sensors 2024, 24, 4642.
  28. Benocci, R.; Guagliumi, G.; Potenza, A.; Zaffaroni-Caorsi, V.; Roman, H.E.; Zambon, G. Application of Transfer Entropy Measure to Characterize Environmental Sounds in Urban and Wild Parks. Sensors 2025, 25, 1046.
  29. Benocci, R.; Potenza, A.; Roman, H.E.; Bisceglie, A.; Zambon, G. Mapping of the acoustic environment at an urban park in the city area of Milan, Italy, using very low-cost sensors. Sensors 2022, 22, 3528.
  30. Benocci, R.; Afify, A.; Potenza, A.; Roman, H.E.; Zambon, G. Toward the Definition of a Soundscape Ranking Index (SRI) in an Urban Park Using Machine Learning Techniques. Sensors 2023, 23, 4797.
  31. Benocci, R.; Afify, A.; Potenza, A.; Roman, H.E.; Zambon, G. Self-Consistent Soundscape Ranking Index: The Case of an Urban Park. Sensors 2023, 23, 3401.
  32. Morrison, C.; Auniņš, A.; Benkő, Z.; Brotons, L.; Chodkiewicz, T.; Chylarecki, P.; Escandell, V.; Eskildsen, D.; Gamero, A.; Herrando, S.; et al. Bird population declines and species turnover are changing the acoustic properties of spring soundscapes. Nat. Commun. 2021, 12, 6217.
  33. Zhuang, Y.; Kang, Y.; Fei, T.; Bian, M.; Du, Y. From hearing to seeing: Linking auditory and visual place perceptions with soundscape-to-image generative artificial intelligence. Comput. Environ. Urban Syst. 2024, 110, 102122.
  34. Walnut, D.F. An Introduction to Wavelet Analysis; Springer: New York, NY, USA, 2004.
  35. Percival, D.B.; Walden, A.T. Wavelet Methods for Time Series Analysis; Cambridge University Press: Cambridge, UK, 2000.
  36. Borchers, H.W. Pracma: Practical Numerical Math Functions, R Package Version 2.2.9. 2019. Available online: https://CRAN.R-project.org/package=pracma (accessed on 15 November 2025).
  37. Daubechies, I. Orthonormal Bases of Compactly Supported Wavelets. Commun. Pure Appl. Math. 1988, 41, 909–996.
  38. Mallat, S. Multiresolution approximations and wavelet orthonormal bases of L2(R). Trans. Am. Math. Soc. 1989, 315, 69–87.
  39. Rioul, O.; Vetterli, M. Wavelets and signal processing. IEEE Signal Process. Mag. 1991, 8, 14–38.
  40. Donoho, D.; Johnstone, I. Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81, 425–455.
  41. Li, S.; Liu, W. Meshfree and particle methods and their applications. Appl. Mech. Rev. 2002, 55, 1–34.
  42. Jiang, Z.; Xie, W.; Zhou, W.; Sornette, D. Multifractal analysis of financial markets: A review. Rep. Prog. Phys. 2019, 82, 125901.
  43. Mandelbrot, B.B. The Fractal Geometry of Nature; Freeman: New York, NY, USA, 1983.
  44. Frontier, S. Application of Fractal Theory to Ecology. In Developments in Numerical Ecology; Legendre, S., Legendre, L., Eds.; Nato ASI Series; Springer: Berlin/Heidelberg, Germany, 1987; Volume G14, pp. 335–378.
  45. Feder, J. Fractals; Plenum Press: New York, NY, USA, 1988.
  46. Sugihara, G.; May, R.M. Applications of fractals in ecology. Trends Ecol. Evol. 1990, 5, 79–86.
  47. Hastings, H.M.; Sugihara, G. Fractals: A User’s Guide for the Natural Sciences; Oxford University Press: Oxford, UK, 1993.
  48. Halley, J.M.; Hartley, S.; Kallimanis, A.S.; Kunin, W.E.; Lennon, J.J.; Sgardelis, S.P. Uses and abuses of fractal methodology in ecology. Ecol. Lett. 2004, 7, 254–271.
  49. Lyamshev, L.M.; Adreev, M.N. Fractals in underwater acoustics. In Proceedings of the Hydroacoustics and Ultrasonics: EAA Symposium, Jurata, Poland, 12–16 May 1997.
  50. Makabe, Y.; Muto, K. Application of fractal dimension to the evaluation of environmental sound. In Proceedings of the Inter-Noise 2014, Melbourne, Australia, 16–19 November 2014.
  51. Bigerelle, M.; Iost, A. Fractal dimension and classification of music. Chaos Solitons Fractals 2000, 11, 2179–2192.
  52. Monacchi, D.; Farina, A. A Multiscale Approach to Investigate the Biosemiotic Complexity of Two Acoustic Communities in Primary Forests with High Ecosystem Integrity Recorded with 3D Sound Technologies. Biosemiotics 2019, 12, 329–347.
  53. Orloci, L. An agglomerative method for classification of plant communities. J. Ecol. 1967, 55, 193–206.
  54. Legendre, P.; Gallagher, E. Ecologically meaningful transformations for ordination of species data. Oecologia 2001, 129, 271–280.
  55. Benocci, R.; Roman, H.E.; Bisceglie, A.; Angelini, F.; Brambilla, G.; Zambon, G. Auto-correlations and long time memory of environment sound: The case of an Urban Park in the city of Milan (Italy). Ecol. Indic. 2022, 134, 108492.
  56. Koscielny-Bunde, E.; Bunde, A.; Havlin, S.; Roman, H.E.; Goldreich, Y.; Schellnhuber, H.J. Indication of a universal persistence law governing atmospheric variability. Phys. Rev. Lett. 1998, 81, 729.
  57. Koscielny-Bunde, E.; Roman, H.E.; Bunde, A.; Havlin, S.; Schellnhuber, H.J. Long-range power-law correlations in local daily temperature fluctuations. Philos. Mag. B 1998, 77, 1331–1340.
  58. Ulloa, J.S.; Gasc, A.; Gaucher, P.; Aubin, T.; Réjou-Méchain, M.; Sueur, J. Screening large audio datasets to determine the time and space distribution of Screaming Piha birds in a tropical forest. Ecol. Inform. 2016, 31, 91–99.
  59. Gasc, A.; Anso, J.; Sueur, J.; Jourdan, H.; Desutter-Grandcolas, L. Cricket calling communities as an indicator of the invasive ant Wasmannia auropunctata in an insular biodiversity hotspot. Biol. Invasions 2018, 20, 1099–1111.
  60. van Fleet, P.J. Discrete Wavelet Transformations: An Elementary Approach with Applications; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2019.
  61. Aldrich, E. Functions for Computing Wavelet Filters, Wavelet Transforms and Multiresolution Analyses. 2025. Available online: https://cran.r-project.org/web/packages/wavelets/wavelets.pdf (accessed on 15 November 2025).
Figure 1. Study area with indications of the three monitoring sites. The study was conducted near Bernate Ticino (45°27′ N, 8°48′ E), located in the western part of Lombardy, approximately 30 km west of Milan. Highlighted are the high-speed railway and the A4 highway.
Figure 2. Leq1min [dB] as a function of wavelet level for the d4, d8, and la8 filters, calculated for a typical recording of 1 min duration (Equation (2)). Note that the results for d8 and la8 are indistinguishable at this resolution.
Figure 3. Leq1min [dB] vs. wavelet decomposition level. Shown is the distribution of energy across wavelet levels, using the d4 filter, calculated for a typical recording (cf. Figure 2).
Figure 4. Fraction of total energy across wavelet levels for three pure tones (100 Hz, 1 kHz, 10 kHz), computed with the d4 filter and the MODWT in R.
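As a rough guide to reading Figure 4, the following minimal sketch computes the nominal octave band usually associated with each MODWT detail level j, namely [fs/2^(j+1), fs/2^j], and reports the level in which a pure tone nominally falls. The sampling rate of 48 kHz and the function names `nominal_band` and `nominal_level` are assumptions made for illustration; they are not taken from the article. Short filters such as d4 have limited frequency selectivity, so in practice some energy is expected to leak into neighbouring levels.

```python
def nominal_band(level, fs):
    """Return the nominal (low, high) band in Hz of detail level `level`."""
    return fs / 2 ** (level + 1), fs / 2 ** level


def nominal_level(freq, fs, max_level=10):
    """Return the detail level whose nominal band contains `freq`.

    Returns None when `freq` lies outside the bands of levels 1..max_level.
    """
    for level in range(1, max_level + 1):
        low, high = nominal_band(level, fs)
        if low <= freq < high:
            return level
    return None


FS = 48_000  # assumed sampling rate in Hz (hypothetical, not from the article)

for tone in (100, 1_000, 10_000):
    level = nominal_level(tone, FS)
    low, high = nominal_band(level, FS)
    print(f"{tone} Hz -> W{level} (nominal band {low:.2f}-{high:.2f} Hz)")
```

With a different sampling rate the level assigned to each tone shifts accordingly, so the mapping should be recomputed for the actual recording settings.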
Figure 5. Hourly peak count rate as a function of wavelet level for the (1–1) and (2–2) modes using the d4 filter, for the three sites.
Figure 6. Distribution of inter-peak lags calculated for the (2–2) mode using the d4 filter.
Figure 7. Leq1h [dB] distribution vs. wavelet level, for each site, using d4 filter. The boxplots display median (central line), quartile and outlier information (bars) for each wavelet level and site.
Figure 8. Leq1h [dB] values vs. hour of the day, for W1–W10 and full signal, using d4 filter.
Figure 9. ACF of Leq1s time-series for: Site 1 (blue circles), Site 2 (red circles) and Site 3 (gray circles). The fits using two parameters, y_0 and γ, are reported in the inset (dashed lines).
Figure 10. Fitted γ across all wavelet decomposition levels and for each site. For W7, γ coincides for Site 1 and Site 2. The horizontal line corresponds to γ = 0, and only positive values can be interpreted using Equation (3). Additional measurements are required to better resolve the temporal behavior of the ACF.
Figure 11. Expert-based annotations for: (a) bird numerosity, (b) bird singing duration, (c) bird species, (d) bird distance, (e) traffic activity, (f) traffic distance and (g) train presence, across 24 h at each site. Values are normalized to the [0, 1] range per variable defined in Section 2.6. We note that in (e) the results for Site 1 and Site 2 coincide.
Table 1. Summary of inter-peak lag statistics by site (1st column) and wavelet level (2nd column). The mean lag, <Lag> (3rd column), is expressed in [s]. The relative differences of <Lag> between sites, R [%] (4th column), are reported in the order R12, R13, R23, and defined as: R = (<Lag>larger − <Lag>smaller)/<Lag>smaller. The largest differences occur for W2: R12 = (11.5 − 10.0)/10.0 = 15%, and R13 = (11.5 − 10.2)/10.2 = 12.7%. We find <R> = 4.3% and σ_R = 3.45%. The values in bold are larger than <R> + σ_R ≈ 7.8%. Median lags [s] are reported in the 5th column. Median lag differences between sites, D [s] (6th column), are denoted as D12, D13, and D23, and defined as Dij = Di − Dj, with Di the median lag at Site i. Non-vanishing differences are highlighted in bold. The 7th column reports the number of peaks found for each site and wavelet level.
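| Site | Level | <Lag> [s] | R [%] | Median Lag [s] | D [s] | N Peaks |
|---|---|---|---|---|---|---|
| Site 1 | FULL | 12.7 | 2.4 | 10 | 0 | 551 |
| Site 2 | FULL | 12.4 | 0.8 | 10 | 0 | 533 |
| Site 3 | FULL | 12.6 | 1.6 | 10 | 0 | 550 |
| Site 1 | W1 | 11.9 | 8.2 | 10 | 1 | 623 |
| Site 2 | W1 | 11.0 | 8.2 | 9 | 1 | 710 |
| Site 3 | W1 | 11.0 | 0.0 | 9 | 0 | 722 |
| Site 1 | W2 | 11.5 | 15.0 | 9 | 1 | 722 |
| Site 2 | W2 | 10.0 | 12.7 | 8 | 1 | 842 |
| Site 3 | W2 | 10.2 | 2.0 | 8 | 0 | 876 |
| Site 1 | W3 | 10.4 | 5.9 | 8 | 0 | 875 |
| Site 2 | W3 | 9.82 | 4.0 | 8 | 0 | 888 |
| Site 3 | W3 | 10.0 | 1.8 | 8 | 0 | 916 |
| Site 1 | W4 | 10.7 | 1.0 | 9 | 0 | 829 |
| Site 2 | W4 | 10.6 | 3.9 | 9 | 1 | 833 |
| Site 3 | W4 | 10.3 | 2.9 | 8 | 1 | 877 |
| Site 1 | W5 | 12.1 | 1.7 | 10 | 0 | 641 |
| Site 2 | W5 | 12.3 | 2.5 | 10 | 1 | 592 |
| Site 3 | W5 | 11.8 | 4.2 | 9 | 1 | 713 |
| Site 1 | W6 | 12.3 | 3.3 | 10 | −1 | 653 |
| Site 2 | W6 | 12.7 | 5.1 | 11 | 1 | 611 |
| Site 3 | W6 | 11.7 | 8.5 | 9 | 2 | 638 |
| Site 1 | W7 | 12.8 | 5.8 | 10 | 0 | 570 |
| Site 2 | W7 | 12.1 | 0.8 | 10 | 0 | 641 |
| Site 3 | W7 | 12.7 | 5.0 | 10 | 0 | 622 |
| Site 1 | W8 | 12.4 | 2.4 | 10 | 0 | 599 |
| Site 2 | W8 | 12.7 | 4.8 | 10 | 0 | 601 |
| Site 3 | W8 | 13.0 | 2.4 | 10 | 0 | 563 |
| Site 1 | W9 | 12.4 | 0.0 | 10 | 0 | 582 |
| Site 2 | W9 | 12.4 | 2.4 | 10 | 0 | 540 |
| Site 3 | W9 | 12.7 | 2.4 | 10 | 0 | 559 |
| Site 1 | W10 | 13.2 | 6.6 | 10 | 0 | 524 |
| Site 2 | W10 | 12.4 | 3.0 | 10 | −1 | 553 |
| Site 3 | W10 | 13.6 | 9.8 | 11 | −1 | 504 |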
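The definitions of R and D given in the caption of Table 1 can be checked with a short script. The sketch below is illustrative only: the helper names `relative_difference` and `pairwise_stats` are hypothetical, the input values are the W2 mean and median lags from Table 1, and the threshold is the sum <R> + σ_R quoted in the caption.

```python
from itertools import combinations


def relative_difference(a, b):
    """R = (larger - smaller) / smaller, expressed in percent."""
    larger, smaller = max(a, b), min(a, b)
    return 100.0 * (larger - smaller) / smaller


def pairwise_stats(mean_lag, median_lag):
    """Return {(i, j): (R_ij [%], D_ij [s])} for all site pairs with i < j."""
    stats = {}
    for i, j in combinations(sorted(mean_lag), 2):
        r_ij = relative_difference(mean_lag[i], mean_lag[j])
        d_ij = median_lag[i] - median_lag[j]  # D_ij = D_i - D_j
        stats[(i, j)] = (r_ij, d_ij)
    return stats


# W2 values from Table 1, keyed by site number.
w2_mean = {1: 11.5, 2: 10.0, 3: 10.2}
w2_median = {1: 9, 2: 8, 3: 8}

# Threshold from the caption: <R> + sigma_R = 4.3 % + 3.45 %.
threshold = 4.3 + 3.45

w2 = pairwise_stats(w2_mean, w2_median)
for (i, j), (r, d) in w2.items():
    flag = "above threshold" if r > threshold else "below threshold"
    print(f"R{i}{j} = {r:.1f} % ({flag}), D{i}{j} = {d} s")
```

For W2 this reproduces the tabulated values R12 = 15.0%, R13 = 12.7%, R23 = 2.0% and D12 = 1 s, D13 = 1 s, D23 = 0 s.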
Table 2. Summary of bird and environmental metrics across a disturbance gradient: Bird numerosity (abundance), bird singing (duration), bird species (diversity). Arrows represent the direction of change from Site 1 to Site 3.
| Variable | Site 1 (Near) | Site 2 (Mid) | Site 3 (Far) | Ecological Implication |
|---|---|---|---|---|
| Bird numerosity | | | | Birds avoid areas near high disturbance |
| Bird singing duration | | | | Higher vocal activity at quieter sites |
| Bird species | | | | Little influence of anthropogenic noise |
| Bird spatial distance | Farther | | Closer | Birds farther from recorder near disturbance |
| Traffic/trains | | | | Confirms proximity to roads/railways |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Benocci, R.; Guagliumi, G.; Potenza, A.; Zaffaroni-Caorsi, V.; Roman, H.E.; Zambon, G. Wavelet-Based Analysis of Soundscape Dynamics in a Riparian Woodland: The Bernate-Ticino River Park. Sensors 2025, 25, 7248. https://doi.org/10.3390/s25237248
