Eco-Acoustic Indices to Evaluate Soundscape Degradation Due to Human Intrusion

: The characterization of environmental quality and the detection of the ﬁrst sign of environmental stress, with reference to human intrusion, is currently a very important goal to prevent further environmental degradation, and consequently habitat destruction, in order to take appropriate preservation measures. Besides the traditional ﬁeld observation and satellite remote sensing, geophonic and / or biophonic sounds have been proposed as potential indicators of terrestrial and aquatic settings’ status. In this work, we analyze a series of short audio-recordings taken in urban parks and bushes characterized by the presence of di ﬀ erent human-generated-noise and species abundance. This study aims to propose a tool devoted to the investigation of urban and natural environments in a context with di ﬀ erent soundscape qualities, such as, for example, those that can be found in urban parks. The analysis shows the ways in which it is possible to distinguish among di ﬀ erent habitats by the use of a combination of di ﬀ erent acoustic and sound ecology indices. Principles, methods, technologies and applications


Introduction
The biodiversity in different habitats is threatened mostly as a result of human activity. Monitoring the environment is the only way to capture the early indications of ecosystem degradation and the fragmentation of natural environments [1], causing the endangerment of many species [2]. Compared to the sampling strategies that rely on the collection of animals and vegetable specimen, the use of passive acoustic monitoring (PAM) is spreading as a complementary method of environmental monitoring [3][4][5][6][7]. PAM consists in recording and analysing the sounds of an environment, extracting information on the abundance of species, the ecosystem condition, and degradation due to human intrusion [8][9][10][11]. The complexity of the sound blends is usually referred to as a soundscape. The soundscape at a location includes sounds of different natures which are generally recognized as: (i) biophony, with frequency emissions between 2 and 14 kHz [12]; (ii) anthrophony, corresponding to mechanical signals, with a frequency band between 0.1 and 2 kHz; and (iii) geophony (e.g., due to wind or rain), which tends to cover the entire spectrum, with more energy in the lower frequencies [13].
Discrimination among such sound sources is usually performed by the visual inspection of sonograms. However, such analysis is time consuming, especially if it is applied to long time recordings, even though automated methods have been developed in order to investigate the presence of different biological sound sources from the spectrogram of long-term recordings [14].
However, the introduction of eco-acoustic indices simplifies the analysis, as they focus on different characteristics of the sound, such as pitch, modulation, saturation, and amplitude. These indices, also referred to as eco-acoustic metrics, have been developed in order to evaluate the complexity and dynamics of the acoustic environment, and to act as proxies of species assemblage diversity measures in order to characterize environmental quality [15][16][17]. In general, eco-acoustic metrics provide additional information with respect to the barely-physical properties of a sound (like sound pressure level or power spectrum density), and have been extensively used in ecological studies in order to investigate diel cycles in tropical forests [18] and seasonal activities in temperate and tropical habitats [19,20]. It has been also demonstrated how the different soundscapes of different types of habitat can be successfully recognized, showing that eco-acoustic indices have great potentiality for environmental research [21][22][23]. On the other hand, contradictory results have also been reported [9,[24][25][26]. Such an evolution may have been caused by the non-congruity of the collected recordings and the misuse of indices for soundscape appraisal. The fact that guidelines for the assessment of faunal activity have been issued, but not regarding soundscapes themselves, may underlie a methodological insufficiency [17,27].
Many issues are still open: for example, geophony is always present in natural soundscapes, but typically high levels of geophony are often ruled out from analyses, even though a threshold has not been defined [28][29][30]. The same considerations hold for anthrophony, as some indices can be strongly influenced by these elements [31]. Many studies often take into consideration one or two indices, without justifying for their selection, thus limiting the efficacy of acoustic indices in the monitoring of biodiversity. As each index reflects different spectro-temporal features [32], the consideration solely of a limited number of indices could provide only a partial representation of the soundscape. This issue has been pointed out in Bradfer-Lawrence et al.'s [33] research. Here, the authors address the issue regarding the most representative sample recording sequence, its length, and the most promising combination of indices that reflects the environmental dynamics.
In this paper, we extend the above analysis using a statistical analysis of six eco-acoustic indices extracted from short audio-recordings, and by defining the most significant statistical descriptor of the distribution with the aim to quantify the soundscape of a site and to discriminate among habitats with different degrees of human intrusion.

Background
The analysis performed in a previous work [34] showed the behaviour of different indices under completely different background and biophonic activity conditions. The aim of that preliminary study was to evaluate the potential of soundecology indicators for the discrimination of the different types of sounds present in medium-large urban parks. For this purpose, two environmental settings were considered: a large urban park surrounded by busy roads, and a shrub-dominated area, the latter of which was used as a reference site for natural areas. The two sonic environments were characterized by the presence of different anthropogenic noises, especially in the urban park, and sounds from avian species. Soundecology indicators, or their combinations, could help to identify the urban park areas with higher biophonic activities and, therefore, the areas that offer greater relief and restorative value.

Study Sites
Twenty audio-recordings were considered for the analysis. They were taken in areas characterized by different degrees of road traffic noise and faunal activity (Table 1). Site A was located in the 'Parco Nord' of Milan, at the northern bound of the city. It is a big metropolitan park within the town of Milan and its hinterland, recovering green areas which once were industrial or uncultivated lands surrounded by congested roads. The recorder's position was approximately 90 m from a peri-urban arterial road with continuous road traffic noise emissions (see Figure 1a); more specifically, the daily mean traffic flow is approximately 22,000/24,000 vehicles, while the traffic flow during the sound recordings was approximately 1500 vehicles per hour; the estimated mean traffic speed was approximately 50-60 km/h, with very rare congested flow events. Site B was a bush area, taken as a reference setting of a natural undisturbed area, whereas sites C and D refer to natural areas, but with the presence of vehicular disturbance with intermittent events, and with a markedly different naturalistic footprint with respect to site A, in terms of avian species abundance. Sites B, C and D are located in the Appennino mountains (central Italy), but their exact locations were not available. Table 1. List of the audio-recordings used for the analysis, grouped by location. The type of sound, as identified by an operator and some characteristics of the birds' chorus distance (1 = near; 2 = distant), the birds' chorus duration (percentage with respect to the total recording duration; 1: <25%; 2: 25-50%; 3: 50-75%; 4: >75%), and the road traffic type (0 = no traffic; 1 = continuous; 2 = intermittent) are reported.   Table 1 reports the information related to the audio-recordings used for the analysis, grouped by location. The recordings were labelled according to visual inspections of the sonograms while listening to the recorded acoustic signals in order to annotate the events they contain. As indicated by the information on the type of sounds identified by the operator, the audio-recordings contain sounds of bird vocalizations differing in intensity (1 = near; 2 = distant), duration of singing activity (percentage with respect to the total recording duration; 1: <25%; 2: 25-50%; 3: 50-75%; 4: >75%), and degree of road traffic noise: absent, quite uniform backgrounds, and intermittent time patterns (0 = no traffic; 1 = continuous; 2 = intermittent). The audio-recordings were taken on 2nd May 2019 for Site A and 3rd May 2016 for the other sites, starting at 7:00 a.m. Each measurement was one-minute long, followed by a pause of 59 min (from 7:00 to 11.00 a.m.). The recordings were taken in springtime, when the singing activity of birds is at its peak, and when it is easy to detect their presence and study their singing dynamics. The weather conditions were stable during the recordings: cloud cover was present at site A, and it was partly covered at the other sites. Details about the recording times and average weather conditions are reported in Table 2.

Recording and Instrument Characteristics
A Soundscape Explorer Terrestrial (SET) instrument was used, which is a sound and meteorological data recorder for terrestrial environments. It is equipped with two microphones, an electronic control board, a full set weather sensors, and a rechargeable lithium battery pack, all contained in a waterproof plastic case. The two microphones are: • a low frequency microphone, used to acquire data in the audio range (up to 24 kHz, sampling rate 48 kHz); • a high frequency microphone, used to acquire data in the audio and ultrasonic range.
For this specific application, only the low frequency microphone was used. The electronic board contains all of the recording and processing devices, as well as the atmospheric sensors (pressure, temperature, relative humidity, ambient light). It is based on a high performance low power DSP (Digital Signal Processor). The SET recorder was mounted on a tree, approximately 5 m high above the ground, as shown in Figure 1b for site A.1.

Analysis and Indices Computation
The growing interest in the ecological applications of acoustic indices is mainly motivated by different approaches to the measurement of the variations in acoustic activity, which are predominantly derived from statistical summaries of the amplitude variation in time domains, or the magnitude differences between the frequency bands of a spectrogram. All of these indices aim to provide a quick analysis of the sound characteristics of an environment, but with different emphases. The analysis and computation of the indices were performed in the 'R' environment, version 1.2.5033 [35]. In particular, the Fast Fourier Transform (FFT) was computed by the function 'spectro' within the R package 'seewave' [36], setting as the frequency bounds the interval between 100 Hz and 24 kHz, and 1024 points for the computation. This setting corresponds to a frequency bin resolution (FR) of FR = 46,875 Hz and, therefore, to a time resolution (TR) of TR = 1/FR = 0.0213 s. All of the indices reported in this paper were computed using the R package 'soundecology' [37], with the exception of the Dynamic Spectral Centroid (DSC), for which a dedicated script was written.
A detailed description of the indices considered in this paper is given in [34], and a short summary is reported here to help readers. The indices are the DSC, Acoustic Complexity Index (ACI), Acoustic Diversity Index (ADI), Acoustic Evenness Index (AEI), Bioacoustic Index (BI), and Normalized Difference Soundscape Index (NDSI).
The DSC is based on the time evolution (with resolution of 100 ms) of the gravity centre of the spectrum, and thus reveals both the spectral and temporal analysis of the composition of the sound events and the background sounds [38,39]. The ACI measures the absolute difference of two adjacent intensities in a specific temporal interval and within a single frequency bin [16,19]. This latter is strictly linked by the FFT computation window.
The ADI [13,40] divides the spectrogram into bins and applies the Shannon index [41][42][43] to the proportion of the signals in each bin above a threshold. Thus, it provides the diversity of the frequency bins weighted by the intensity (level) of each bin. The AEI [40] applies the same scheme as the ADI, but using the Gini index applied to the signals in each bin, and, therefore, aims to measure the degree of the inequality in a distribution [44]. The BI is calculated as the area under the mean frequency spectrum, which provides a measure of both the sound level and the number of frequency bands used by the species [45]. The NDBI simply computes the ratio between the anthrophony, for example in the frequency range 0.1-2 kHz, and the biophony acoustic components, typically in the frequency range between 2 and 11 kHz, in order to evaluate the disturbance of a habitat [46].

Cluster Analysis
Cluster analysis is a data-mining tool employed in order to group together the data or objects found to be 'close' to one another, with the purpose of uncovering some inherent structure within the data. Clustering is an unsupervised method that works on datasets in which there is no outcome (target) variable, nor anything known about the relationship between the observations, that is, the unlabelled data. The most common clustering algorithms, which were considered in the present study, are: 'hierarchical' agglomeration using Ward algorithm [47], 'k-means' [48], 'DIvisive ANAlysis' (DIANA) [49], 'Self Organizing Tree Algorithm' (SOTA) [50], ' Partition Around Medoids' (PAM) [49], 'Clustering for LARge Applications' (CLARA) [49] and 'AGglomerative NESting' (AGNES) [49].
In the Ward hierarchical agglomeration, the Ward's minimum variance criterion minimizing the total within-cluster variance is used. To implement this method, at each step, the pair of clusters that leads to the minimum increase in the total within-cluster variance is found after merging. This increase is a weighted squared distance between the cluster centers. The result is a tree-based representation of the objects named a dendrogram.
The k-means algorithm aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest cluster centroid. k-means clustering minimizes within-cluster variances.
The DIvisive ANAlysis (DIANA) algorithm starts by including all of the objects in a single large cluster. At each step of iteration, the most heterogeneous cluster is divided into two. The process is iterated until all of the objects are in their own cluster.
The Self-Organizing Tree Algorithm (SOTA) is an unsupervised neural network with a binary tree topology. The clustering process is performed from the top to the bottom, i.e., the highest hierarchical levels are resolved before entering the lowest levels.
The PAM algorithm is considered to be a more robust version of the k-means because, unlike the k-means, the PAM function does not introduce medoids randomly, but uses data points as the medoids. This makes it less sensitive to outliers. Medoids are representative objects of a data set or a cluster with a data set whose average dissimilarity to all of the objects in the cluster is minimal. Medoids are similar in concept to means or centroids, as in the case of the k-means algorithm.
Clustering for LARge Applications (CLARA) is an extension to k-medoids (PAM) methods to deal with data containing a large number of objects (more than several thousand observations) in order to reduce the computing time.
AGNES (Agglomerative Nesting) is the most common type of hierarchical clustering used to group objects in clusters based on their similarity. The algorithm starts by treating each object as a singleton cluster. Next, pairs of clusters are successively merged until all of the clusters are merged into one big cluster containing all of the objects. The result is a tree-based representation of the objects named a dendrogram.
Given the large number of available algorithms, deciding which clustering method to use and the number of clusters that are most appropriate for the data can therefore be a daunting task. Ideally, the resulting clusters should not only have good statistical properties (compact, well-separated, connected, and stable), but also yield results that are significant.
The package "clValid" [51-53] contains a variety of methods for the validation of the results from a cluster analysis. The available validation measures fall into the three general categories of 'internal', 'stability', and 'biological'. For the scope of our study, the latter has not been considered.
Internal validation measures take only the dataset and the clustering partition as their inputs, and use the intrinsic information in the data to assess the quality of the clustering. Such validation considers three features of the cluster partitions: connectedness, compactness, and separation. Connectedness relates to the extent to which observations are placed in the same cluster, and is measured by the connectivity [54], a parameter between zero and ∞ that should be minimized. Compactness assesses the clusters' homogeneity, usually by looking at the intra-cluster variance. Separation quantifies the degree of separation between the clusters. Because compactness and separation demonstrate opposing trends (compactness increases with the number of clusters, and separation decreases it), some popular methods combine the two measures into a single score, such as the Dunn index [55] or silhouette width [56]. The Dunn index has a value between zero and ∞, and should be maximized. The silhouette width lies in the interval (−1, 1), and should be maximized. High clustering performance is indicated by these indices being minimized.
The stability measures are a special version of the internal measures. They evaluate the consistency of a clustering result by comparing it with the clusters obtained after each column is sequentially removed. The included measures are the average proportion of non-overlap, the average distance, the average distance between the means, and the figure of merit. The figure of merit is related to the average intra-cluster variance of the observations in the deleted column, where the clustering is based on the remaining (undeleted) samples [57,58]. High clustering performance is indicated by these indices being minimized. All the clustering algorithms were ranked based on their performance, as determined simultaneously by all the validation measures [59].

Correlation Analysis
In order to obtain more insights into the use of acoustic and eco-acoustic indices, and to figure out the most representative ones that are suitable to highlight the sensitivity among different environmental and biophonic conditions, we decided to adopt an approach that was as independent of the analyst as possible. For each sound clip (described in Table 1), the temporal analysis of DSC, ACI, ADI, AEI, BI, and NDSI was calculated. The time resolution of each time series was set at 0.1 s for all of the indices except NDSI, for which 1 s was chosen because it could not be calculated at a higher time resolution due to the 'soundecology' package's constraints. The first selection among the indices can be obtained by calculating the correlation among the time series corresponding to each of the indices with the same time resolution. In this case, the idea is to retain only those indices which have a low correlation, and which therefore carry different/independent information. The scheme of the adopted method is illustrated in Figure 2. Two time series are considered correlated when their Pearson's correlation coefficient is greater than 0.4 (absolute value). We recall the definition of Pearson's correlation coefficient between two time series s(t) and s'(t) as the ratio between the covariance-cov(s, s')-of the two time series and the product of their standard deviations, σ s , σ s : where the covariance is a measure of the joint variability of the two times series for s and s', and it is defined as the mean value of the product of the deviations from their mean values [60]. This correlation analysis is well suited for continuous variables. Another important issue regards the identification of the most representative statistical metrics describing the distribution of the values of each index. For this purpose, seven descriptors were considered: the mean value, median, mode, standard deviation (SD), interquartile range (IQR), skewness, and kurtosis. The choice was also determined, in this case, by calculating the correlation analysis based on Pearson's method, and keeping those with low correlation.

Results
The results of the correlation analysis obtained according to Pearson's method (see Equation (1) are reported in Figure 3. The outcome of the analysis suggests the keeping of DSC, ACI and ADI as low-correlated indices. NDSI was also considered, as it cannot be included in the correlation analysis.
As for the most representative statistical metrics describing the distribution of the values of each index, the Pearson's correlation coefficients were computed between all of the pairs of statistical metrics for all of the selected indices, NDSI included. The results of the analysis are reported in Figure 4, which shows a strong correlation among the mean, median, mode, SD and IQR, and a relatively high correlation between the skewness and kurtosis. For this reason, we decided to retain the mean and the kurtosis values as the most representative descriptors of the indices' distribution.
After the selection of the less-correlated indices (DSC, ACI and ADI, plus NDSI) and statistical metrics (mean and kurtosis), we applied the cluster analysis in order to find out the similarities among the observations. The analysis was initially performed on each single index, keeping as variables the two statistical metrics.  Because of the data range within the different intervals, they needed to be scaled (mean = 0 and standard deviation = 1) before clustering; in addition, the Euclidean distance was considered to represent the similarity between the data pairs. The optimal solution for the clustering in terms of the agglomeration algorithm and the number of clusters can be obtained by the 'clValid' R package. The selected clustering algorithms to be compared were ranked based on their performance, as determined simultaneously by all of the validation measures considered in the package.
In order to give a rough distinction between the two extreme situations, the clValid procedure was set to provide the optimal clusterization at two clusters, and to compare among the following clustering algorithms: 'hierarchical', 'k-means', 'DIvisive ANAlysis' (DIANA), 'Self Organizing Tree Algorithm' (SOTA), ' Partition Around Medoids' (PAM), 'Clustering for LARge Applications' (CLARA) and 'AGglomerative NESting' (AGNES). The performance ranking was based on 'internal' validation measures only, as the 'stability' measures were not available due to constrains on the limited number of variables (only two, i.e., mean and kurtosis) as descriptors of the distribution of each index value versus time. The results of clValid, reported in Table 3, clearly show that the DIANA algorithm and two clusters was the optimal solution. Thus, this clustering setting was applied to all of the indices.
In order to identify the possible correlations among the obtained results, Spearman's correlation analysis was undertaken, as it is suitable for categorical/ordinal variables (the cluster membership). The results are given in Figure 5.
This result shows that the ACI and NDSI indices have very similar cluster composition, and this implies that they act almost equally on the recorded time series. On the other hand, DSC and ADI are completely uncorrelated, and thus carry very different information. Table 4 reports the results of the cluster analysis with the information of Table 1 for a direct comparison between the cluster membership and the observed recording characteristics.  Table 3. The distribution of each variable is displayed on the diagonal. On the bottom of the diagonal, the bivariate scatter plots are displayed with a fitted line. On the top of the diagonal, the values of the Spearman's correlation coefficient plus the 95% significance level are shown as stars ('***' corresponds to a p-values < 0.001, no stars correspond to a p-values > 0.1). Table 3. Performance of the clustering algorithms set at two clusters obtained by the clValid package for each index represented by the two least-correlated parameters: mean and kurtosis.  Figure 6 shows the box plots of the two metrics (mean and kurtosis) in the two clusters obtained for each index (NDSI excluded, because its cluster membership is strongly correlated to that of ACI). For the DSC and ADI indices, the kurtosis can separate the two clusters quite well, whereas this happens for the mean value for the ACI index. The clusterization for DSC based on the mean and kurtosis variables shows that cluster 1 (16 recordings) shows a median value slightly above 1 kHz, but with a high variability. This explains why many sites with completely different characteristics are included in the same group. The kurtosis of about 1 indicates a platykurtic distribution. Cluster 2 (only 4 recordings) shows lower values of DSC (around 500 Hz) but a high kurtosis (around 3), and thus has a normal distribution. This result seems to suggest that the discrimination among the recordings is not satisfactory, since it does not reveal significant differences among the recordings. The analysis of the ACI index shows the way in which cluster 1 (14 recordings) is mainly composed of roads with a lower ACI index than cluster 2 (6 recordings). This means that cluster 1 includes sites with different environmental noise pollution, but with low frequency modulation (for example, file 8 shows non-vigorous bird singing activity). The kurtosis of about 1 is quite similar for both clusters, but shows different variability.

Ranking of Solutions from the Optimal
The ADI index presents quite similar mean values (about 6 for cluster 1 and 6.1 for cluster 2). The higher diversity in cluster 2 is most likely due to the presence of different noise sources, such as a motorbike pass-by (file 2), sirens (file 4), and a bird species that was very close-by (file 5). The kurtosis indicates a distribution with larger tails than the reference gaussian (kurtosis = 3) for cluster 1, and an almost flat distribution for cluster 2 (kurtosis <−1).
This analysis shows that this simplistic approach is not able to describe all of the variability and differences in the environment.
In order to obtain a deeper insight into the obtained results, a cluster analysis was performed on the indices all together. Thus, a matrix formed of 12 variables (2 for each index) and 20 observations was considered as the input. Additionally, in this case, the optimal solution for clustering was obtained by the 'clValid' R package, the performance of which was determined simultaneously by all of the validation measures, namely 'internal' and 'stability'. The number of clusters was set between two (corresponding to the minimal discrimination between the data) and nine (corresponding to a sufficiently high number of groups). The best solution was the AGNES algorithm with two clusters, followed by the hierarchical and k-means agglomeration algorithms, both with a grouping into two clusters. The cluster membership turned out to coincide with the one provided by the analysis of the ACI index, as reported in Table 4. This result can be explained by the high correlation between ACI and NSDI that might drive the cluster formation. This agglomeration tends to separate quite clearly two extreme recording environments: the urban park (Site A in Table 4) belonging to cluster 1 and the pristine bush (Site B in Table 4) belonging to cluster 2. The other recordings are attributed to cluster 1 because of the presence, besides the singing activity, of a road traffic noise background. The only exception is clip #17 belonging to Site D, because of the presence of just two vehicles passing by.
In order to highlight the smaller differences between the recordings, the number of clusters was set at four, thus mimicking the number of different sites. Additionally, in this setting, the cluster performance ranking provided by clValid was considered, using all of the validation measures. The obtained optimal solution was the hierarchical clustering, followed by k-means and AGNES. The cluster membership provided by this grouping algorithm is given the Table 5 (column 'Cluster membership for 4 clusters'). The split of recordings is in quite good agreement with the four sites, with some exception due to the differences within each site. The boxplot of the mean values for each index in the four clusters is given in Figure 7. Cluster 1 is mainly composed of recordings belonging to Site A (4 out of 5), in addition to clip #15 belonging to Site C. They are characterized by very low values of ACI, meaning low sound modulation, which is typical of road traffic noise, low mean SC values (<500 Hz), and a mean NDSI very close to −1, indicating the prevalence of anthropogenic noise sources. The mean value of the ADI index is almost the same for all of the clusters and, therefore, it does not provide additional information. Cluster 2 is very similar to cluster 1, but with a slightly higher value of mean ACI. It is made up of just two recordings. Cluster 3 is made up of 6 sound clips belonging to Site B, in addition to clip # 17. This cluster has the same composition as the one found in the previous analysis with two clusters. It is characterized by higher values of mean ACI, SC and NDSI, emphasizing the presence of biophony with a larger sound modulation, higher frequencies, and low values of road traffic noise background. Cluster 4 presents similar mean index values to clusters 1 and 2, but with slightly higher values of mean SC and NDSI, indicating the presence of more biophonic activity. Surely, in the clustering process formation, the kurtosis is also involved, meaning that the distribution of the index in each recording is also an important clustering factor. Different types of sound sources were observed in the recordings at each site (see the 'Comments' column in Table 5), leading to an 'a priori' classification into seven groups. Thus, a further clustering was performed with the number of clusters set to seven. Again, the algorithm clValid provided the following ranking of clustering algorithms (descending from the optimal solution): k-means, DIANA, and PAM. The corresponding cluster membership is reported in Table 5 (column 'Cluster membership for 7 clusters'). Cluster 1 contains two sound clips, one belonging to Site A and the other to Site C. They are characterized by low values of both mean SC and NDSI, meaning that an important road traffic noise background is present, but also by relatively high values of mean ACI and ADI, suggesting higher sound modulation and species diversity due to the presence of biophonic sources (see Figure 8). Cluster 2 contains three recordings, two belonging to Site B (pristine bush) and one to Site D (bush with only two passers-by). It shows the higher values of all of the mean indices, indicating high sound modulation with a high frequency content, a relatively high species (sound) diversity, and low background noise. Cluster 3 is made up solely of a sound clip belonging to Site C, characterized by high background road traffic noise and bird vocalization. The analysis provides lower mean values of ACI and ADI than cluster 1; thus, the algorithm provided the formation of a stand-alone cluster. Cluster 4 is made of sound clips belonging to Site C (one) and D (three), and is characterized by mean values of SC, ACI, ADI and NDSI in-between the highest and the lowest calculated values. This indicates both the presence of biophonic (with quite an important diversity) and anthrophonic sources (relatively low mean NDSI value). Cluster 5 is very similar to cluster 4, and is made up of two sound clips belonging to Site C and Site D. Cluster 6 is made up of three sound clips belonging to Site B. The high values of mean SC, ACI and NDSI indicate that this site is dominated by biophonic sound sources. The lower value of ADI than the one observed for cluster 2 indicates the presence of lower diversity, most likely due to the presence of fewer singing species. Finally, cluster 7 (four clips belonging to Site A and one belonging to Site C) presents similar characteristics to cluster 1 and cluster 3, but with relatively lower mean values of ACI than cluster 1, and relatively higher values of mean ADI than cluster 3.

Discussion
The advantage of the proposed method is its ability to ascertain the extent to which two or more acoustic communities or soundscapes are different, and to evaluate the changes between landscapes and biophonic activity by using both the uncorrelated indices and synthesis descriptors of their distribution. This aspect cannot be fully described by the use of either a single index [59] or a generic descriptor (which is not representative of the distribution) and the consideration of all of the indices simultaneously, as they carry correlated information [33].
The consideration of more indices in the analysis led to a finer discrimination among the different soundscape scenarios, enabling us to differentiate among urban environments and natural sound abundance. Our preliminary study aimed to provide a rough distinction between two extreme settings [34]: a shrub-dominated area and an urban park with both biophonic activity and the presence of anthropogenic sources. In this further investigation, the analysis based on the use of two statistical descriptors (mean and kurtosis) and each single uncorrelated index proved not to be satisfactory, since it cannot catch the subtle differences among the habitats. The subsequent analysis performed on all of the indices provided, as the 'optimal solution', a grouping into two clusters overlapping the cluster analysis based on the ACI (NDSI) index alone. This agglomeration tends to separate quite clearly two extreme recording environments: the urban park (Site A in Table 4) belonging to cluster 1 and the pristine bush (Site B in Table 4) belonging to cluster 2. The other recordings are attributed to cluster 1 because of the presence, besides the singing activity, of an anthropogenic noise background.
By forcing the number of clusters to four, smaller differences can be revealed. In this case, the singularity of each site matches the group division quite well. Eventually, an 'a priori' classification into seven groups, representing the different types of sound sources recognized by an operator, allows subtle changes among the habitats to emerge.
As in previous works [16,61,62], our statistical approach is still lacking information on community-level dynamics, as was already pointed out by the work of Eldridge et al. [32], owing to the restricted time-frame that was analyzed, and to our use of statistical summaries of the frequency or time domain signal. However, this method may potentially highlight the correlation dynamics between different indices, thus revealing similarities among the soundscapes or monitoring slight changes in complex ecosystems which are crucial for ecological research.

Conclusions
The complexity of information in the acoustic environment makes its interpretation quite difficult and even misleading, especially if it is based on a single acoustic indicator. In this work, 20 short audio-recordings were analyzed. They were taken in urban parks and bushes, and were characterized by the presence of different human-generated-noise and species abundance, with the aim of deriving information on the capability of a single sound acoustic index or combination of indices to distinguish among different habitats. A statistically-based analysis was applied, in which both the choice of the index and of the distribution descriptors were chosen according to a correlation analysis. DSC, ACI ADI and NDSI are the indices that were found to be uncorrelated, and the mean and kurtosis were the descriptors considered.
Higher discrimination among different soundscape scenarios can be achieved by considering more indices in the analysis. This allowed the differentiation of urban environments and natural sound abundance. Conversely, the use of only one single index provides partial information, making the discrimination among different habitats less effective. The analysis also pointed out the importance of an 'a priori' discrimination by the operator, who may supervise the statistical analysis in the diversification of the habitats when they are characterized by complex sound components. The results obtained, despite being preliminary, are encouraging, even though deeper insights are needed in order to assess the robustness of the findings. In particular, a longer duration of the analysed recordings (or a wider distribution across the day) can strengthen the results regarding the significance of the correlation dynamics among the indicators. This could help to reveal similarities among the soundscapes, and to monitor subtle changes in complex ecosystems, which are crucial for ecological research. to environment ecological survey and monitoring', on the occasion of the 10th IALE World Congress, from July 1st to July 5th, 2019, Milan, Italy. Thanks also go to F. Angelini for classifying the recordings.

Conflicts of Interest:
The authors declare no conflict of interest.