1. Introduction
The loss of biodiversity on a global scale is an internationally recognized crisis that has intensified dramatically in recent decades [1,2]. Changes in land and sea use, together with the direct exploitation of natural resources, are among the main drivers of recent biodiversity loss worldwide [3,4,5]. Among the industries that exploit natural resources, mining has grown rapidly in recent decades owing to increasing global demand for the resources it provides [6]. Mining is a fundamental source of raw materials for the manufacturing, transportation, construction, and energy sectors [7]. World mining production increased from 9.6 billion metric tons in 1984 to 19.2 billion in 2023 [8], and demand for raw materials is expected to keep rising as the global population grows and many low-income economies become middle-income countries. For instance, concrete is the most consumed material in the world after water [9]. Amid growing concerns about sustainability, aggregates underpin modern society: they are the most extracted material on Earth and, in terms of mass, the most important component of most urban development and infrastructure projects [7]. The mining sector is also considered essential to enabling the green transition. However, mining processes can produce significant environmental impacts [10], even more so in the current context of the climate crisis [11].
Surface mining is a temporary and specific use of land that necessarily involves a major transformation of the land occupied [12]. In Europe, there are an estimated 26,000 aggregate extraction sites [13], and the sustainability of this activity depends, among other factors [10], on restoring the mined land to reduce the environmental impacts of the post-activity stage [14]. Companies are expected to restore these degraded areas and create new ecosystems, sometimes with the aim of returning them to their pre-mining (natural) state [15]. At the continental scale, approximately 80% of Europe’s habitats are estimated to be in poor or bad condition, and not only because of mining. To address this, the European Union (EU) has recently adopted the Nature Restoration Law (NRL), which frames their restoration between now and 2050 [16] and includes mechanisms for monitoring and reporting progress [17]. Implementing NRL monitoring will require a major effort by numerous stakeholders and Member States to halt and reverse biodiversity loss [18]. Because birds are a good candidate taxon for monitoring the health of many ecosystems [19,20], the regulation proposes, for instance, using indices of bird population trends to assess biodiversity progress in certain habitats [21].
Among other monitoring techniques, passive acoustic monitoring (PAM) has proven effective for large-scale population surveys of acoustically active species, including single indicator taxa such as birds. This makes it a valuable tool for monitoring threatened species [22] and, in particular, for monitoring ecological restoration [19,23,24]. In addition, PAM is considered a non-invasive method [25]. Acoustic recording units (ARUs) are increasingly used in long-term PAM because open-source software and low-cost recorders make it feasible to collect large audio datasets and quickly extract the vocalizations of interest from them [26,27].
Manual analysis of audio recordings by experts is time-consuming, so automated analysis is a solution that is still under development but whose impact on biodiversity monitoring is increasingly evident in the scientific community [28,29]. Among the machine learning tools capable of identifying animals by sound, BirdNET Analyzer is a free bird sound recognizer that uses convolutional neural networks to identify bird vocalizations in short segments (3 s) of longer audio recordings [30]. The current BirdNET Species Range Model V2.4 [31] can identify more than 6000 bird species worldwide [32], and its output provides a confidence score between 0 and 1 for each prediction, indicating the program’s degree of confidence that a species is present in the recording [33].
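To make the role of the confidence score concrete, the following minimal Python sketch filters per-segment predictions with a minimum confidence threshold and then counts the remaining detections and distinct species, the two quantities that drive expert validation effort. The `Prediction` record, the helper names, and the example values are hypothetical; they do not reproduce BirdNET Analyzer's output format or internal code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record: one species score for one 3 s segment."""
    start_s: float      # segment start within the recording, in seconds
    species: str
    confidence: float   # score in [0, 1]


def filter_predictions(preds: Iterable[Prediction], min_conf: float = 0.25) -> List[Prediction]:
    """Keep predictions whose confidence reaches the threshold (>= is an assumption)."""
    if not 0.0 <= min_conf <= 1.0:
        raise ValueError("min_conf must lie in [0, 1]")
    return [p for p in preds if p.confidence >= min_conf]


def summarize(preds: List[Prediction]) -> Tuple[int, int]:
    """Return (number of detections to validate, number of distinct species)."""
    return len(preds), len({p.species for p in preds})


if __name__ == "__main__":
    # Illustrative values only, not data from this study.
    preds = [
        Prediction(0.0, "Fulica atra", 0.91),
        Prediction(3.0, "Fulica atra", 0.40),
        Prediction(3.0, "Anas platyrhynchos", 0.12),
        Prediction(6.0, "Cettia cetti", 0.27),
    ]
    kept = filter_predictions(preds, min_conf=0.25)
    print(summarize(kept))  # (3, 2)
```

Lowering the threshold keeps more low-confidence predictions, which raises both counts and therefore the number of items an expert must review.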
BirdNET has several tuning parameters that can be used to adjust key recognizer functions [33,34,35]. For instance:
Overlap: the number of seconds by which consecutive extracted spectrograms (3 s segments) overlap. It can be set between 0.0 and 2.9 s (default 0.0).
Sensitivity: the detection sensitivity. It can be set between 0.50 and 1.50 (default 1.0) in BirdNET Analyzer v2.3.0 and in versions before v2.1.0, and between 0.75 and 1.25 (default 1.0) in BirdNET Analyzer v2.1.0, v2.1.1, and v2.2.0.
Minimum confidence: the threshold below which results are ignored. It can be set between 0.05 and 0.95 (default 0.25).
Year round/Week: users can indicate whether the samples refer to the entire year or to a specific week. Because each month is considered to be 4 weeks long, a value between 1 and 48 must be entered when the “week” option is selected.
Species list: several options are available to filter the species included in the outputs. If the “species by location” option is selected, the coordinates of the location must be specified.
Location (latitude/longitude): the coordinates of the recording location, which are used to generate species lists automatically.
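The parameter ranges listed above lend themselves to a simple pre-flight check before a batch analysis. The sketch below encodes those ranges in a hypothetical `BirdNETSettings` record and a `validate` helper; neither is part of BirdNET Analyzer. Two further assumptions are ours: `week = -1` stands for the year-round option, and dates are mapped to the 48-week year by folding days 29 to 31 into the fourth week of each month.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]

# Sensitivity ranges as listed in the text; other versions are not covered here.
_NARROW_RANGE_VERSIONS = {(2, 1, 0), (2, 1, 1), (2, 2, 0)}


def sensitivity_range(version: Version) -> Tuple[float, float]:
    """Allowed Sensitivity range for a BirdNET Analyzer version (per the list above)."""
    if version in _NARROW_RANGE_VERSIONS:
        return (0.75, 1.25)
    if version < (2, 1, 0) or version == (2, 3, 0):
        return (0.50, 1.50)
    raise ValueError(f"Sensitivity range for version {version} is not documented here")


@dataclass(frozen=True)
class BirdNETSettings:
    """Hypothetical container for the settings described in the text."""
    version: Version
    overlap_s: float = 0.0
    sensitivity: float = 1.0
    min_conf: float = 0.25
    week: int = -1                      # assumption: -1 means "year round"
    species_by_location: bool = False
    lat: Optional[float] = None
    lon: Optional[float] = None


def validate(s: BirdNETSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems: List[str] = []
    if not 0.0 <= s.overlap_s <= 2.9:
        problems.append("Overlap must lie between 0.0 and 2.9 s")
    lo, hi = sensitivity_range(s.version)
    if not lo <= s.sensitivity <= hi:
        problems.append(f"Sensitivity must lie between {lo} and {hi} in this version")
    if not 0.05 <= s.min_conf <= 0.95:
        problems.append("Minimum confidence must lie between 0.05 and 0.95")
    if s.week != -1 and not 1 <= s.week <= 48:
        problems.append("Week must be -1 (year round) or between 1 and 48")
    if s.species_by_location and (s.lat is None or s.lon is None):
        problems.append("'species by location' requires latitude and longitude")
    return problems


def week_of_year(month: int, day: int) -> int:
    """Map a date to a 48-week index, 4 weeks per month (days 29-31 -> week 4)."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("invalid month or day")
    week_in_month = min((day - 1) // 7 + 1, 4)
    return (month - 1) * 4 + week_in_month


if __name__ == "__main__":
    print(validate(BirdNETSettings(version=(2, 2, 0), sensitivity=1.25)))  # []
    print(validate(BirdNETSettings(version=(2, 2, 0), sensitivity=1.5)))
    print(week_of_year(3, 15))  # 11
```

Encoding the version-dependent Sensitivity range explicitly makes it harder to reuse a setting such as 1.5 with a release that does not accept it.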
Effective PAM is not simply a matter of collecting huge amounts of audio recordings; it must be properly planned and analyzed to obtain high-quality information [29]. The literature increasingly stresses the need to guide other researchers on the optimal combination of configuration parameters when using new technologies for biodiversity monitoring, such as BirdNET [36]. In this regard, studies on BirdNET performance stand out for their applicability both to the ecological monitoring of bird species and to the characterization of bird communities [37]. For instance, Fairbairn et al. [38] recently demonstrated that, with appropriate parameter settings and some basic validation, BirdNET can yield results comparable to those of experts without the need for time-consuming estimation of species-specific thresholds. This represents a significant advance in the study of the influence of other BirdNET configuration parameters (e.g., Overlap and Sensitivity) according to the objectives of each case study.
Among the challenges highlighted in the literature on BirdNET are its limited use for monitoring birds in real-world conditions [1,30] and the lack of information on the effect of the Sensitivity and Overlap parameters on BirdNET performance [24,39,40,41]. Indeed, some authors cite this scarcity of studies as justification for using the software’s default parameters [42]. However, default BirdNET parameter settings may perform poorly [38].
Therefore, the main objective of this study is to evaluate BirdNET performance under different configurations of the Overlap and Sensitivity parameters, applied to 3 years of bird monitoring within a broader ongoing monitoring program on the ecological restoration of a former limestone quarry in central Spain. In addition, we conducted a literature review on the use of BirdNET and on the level of detail and justification that the scientific literature provides for these configuration parameters.
4. Results
As expected, both the number of vocalizations and the number of species identified by BirdNET increase with the values of the Overlap and Sensitivity parameters (Table 1), so the effort required for experts to validate BirdNET predictions also increases with these values. The confusion matrix also shows a substantial increase in the number of FP when Sensitivity is 1.25 compared with the other settings of this parameter (Table 2). This increase is much larger than the corresponding increase in the number of TP. As a result, Precision is markedly low (Table 3 and Table 4) in the combinations of Overlap and Sensitivity that identify the highest number of bird species. The best performance according to our objectives (minimal SL) is achieved with Precision values between 0.29 and 0.36 in both Table 3 and Table 4.
With the lowest allowed Sensitivity value (0.75), no combination with the possible Overlap values identified all the bird species whose vocalizations were recorded in the analyzed audio samples. In contrast, the lowest Overlap value (0.0 s) does allow, in combination with some Sensitivity values (e.g., 1.25), the identification of the 68 species of the La Chanta bird inventory whose vocalizations are included in the recordings (Table 2). In fact, all 68 bird species recorded in the audio samples were identified when Sensitivity was 1.25, with every assessed Overlap value (0.0, 1.0, 2.0, and 2.9 s).
When Sensitivity is 1.25, the maximum Recall (or TPR) is 0.85 in scenario A, which refers to the complete inventory of avifauna in La Chanta (Table 3), or 1.00 when calculated with respect to the 68 bird species identified by experts in the recordings (scenario B). In scenario A, SL is 0.15 (15% of the birds inventoried in La Chanta are not recorded in the collected audio samples); in scenario B, SL is 0.00 (BirdNET identified all 68 species for which the metrics are calculated). As for the other metrics used to assess BirdNET performance, FPR increases with the values of Overlap and Sensitivity (Table 3 and Table 4), whereas F1-Score and MCC follow the opposite trend. F1-Score and MCC values closer to 0 indicate poorer BirdNET performance, and the corresponding configurations are also more time-consuming because a greater number of predicted vocalizations and bird species must be reviewed.
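The Recall and SL values reported above follow from simple arithmetic on species counts. The sketch below assumes, as these figures imply, that species-level Recall is the fraction of reference species that BirdNET identified and that SL is the fraction missed (SL = 1 − Recall); the function names are ours. The counts 68, 80, and 68 are taken from the text.

```python
def _check(detected: int, reference_total: int) -> None:
    """Reject counts that cannot describe a species-level comparison."""
    if reference_total <= 0 or not 0 <= detected <= reference_total:
        raise ValueError("require 0 <= detected <= reference_total and reference_total > 0")


def species_recall(detected: int, reference_total: int) -> float:
    """Species-level Recall (TPR): share of reference species that BirdNET identified."""
    _check(detected, reference_total)
    return detected / reference_total


def species_loss(detected: int, reference_total: int) -> float:
    """SL: share of reference species that BirdNET missed (1 - species-level Recall)."""
    _check(detected, reference_total)
    return (reference_total - detected) / reference_total


if __name__ == "__main__":
    detected = 68  # species identified with Sensitivity = 1.25 (from the text)
    for scenario, reference in (("A", 80), ("B", 68)):
        print(scenario, species_recall(detected, reference), species_loss(detected, reference))
    # A 0.85 0.15
    # B 1.0 0.0
```

Only the reference set changes between scenarios, which is why the same detections yield different Recall and SL values.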
5. Discussion
The main finding of this work is that the lowest SL value (the main objective of applying PAM techniques in the study area) is achieved when the Sensitivity parameter equals 1.25, in all four Overlap configurations evaluated (0.0, 1.0, 2.0, and 2.9 s). In other words, the most appropriate Sensitivity setting for this objective is the highest value that BirdNET allowed. However, recent research on the effect of the Sensitivity parameter (based on an earlier version of BirdNET that allowed it to be set between 0.5 and 1.5) suggested that its minimum setting (0.5) maximizes overall performance for community-level analyses across all confidence thresholds [37]. Our result is closer to the criterion adopted by Funosas et al. [28] for configuring BirdNET in the specific context of their study, which comprised datasets from 194 different sites. Nevertheless, both studies prioritized the area under the Precision-Recall curve (PR AUC). They also calculated the area under the Receiver Operating Characteristic curve (ROC AUC) and the F1-Score, with varying results. In this respect, it should be noted that PR AUC is also affected by imbalanced datasets [73].
Another reason that may explain this difference between our study and previous literature is that most published studies compare the algorithm’s predictions with visual or aural inspection performed by experts on the same audio recordings [30]. In our case, the evaluation of BirdNET performance aims to verify the usefulness of PAM for bird monitoring in a real restoration project at a former quarry, with the purpose of inventorying the birds of the study area against a catalogue of species whose presence has been verified in situ. The objective of using BirdNET may determine the choice of metrics for assessing predictions and, similarly, the configuration parameters of the tool [38,74,75]. Because we did not intend to calculate population densities or to characterize species activity, Recall is the only one of the usual BirdNET evaluation metrics that allows us to identify the combination of Overlap and Sensitivity settings that meets our objective (minimizing SL).
Fairbairn et al. [38] conducted another of the few studies that have analyzed the influence of Overlap and Sensitivity on BirdNET performance. They recommended an overlap of 1 to 2 s for short recording schemes of 1 to 5 min, while noting that overlap may not be necessary in some cases. This recommendation may be consistent with some of our optimal combinations in terms of the Overlap parameter. However, based on their findings, they also recommended keeping the default Sensitivity value (1.0) when maintaining higher minimum confidence thresholds, a recommendation that does not apply to objectives such as ours. Furthermore, confidence scores vary with the Sensitivity setting, as indicated in the Materials and Methods section [32,40].
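The dependence of confidence scores on Sensitivity can be illustrated with a logistic curve. The sketch below is a simplified model, not BirdNET's code: it assumes that a raw model score (logit) is passed through a sigmoid whose slope flattens as the user-facing Sensitivity increases, with the hypothetical mapping `slope = 2 - sensitivity`. Under that assumption, a weak signal can cross a low threshold such as 0.25 at Sensitivity 1.25 but not at 1.0.

```python
import math


def confidence(logit: float, sensitivity: float = 1.0) -> float:
    """Map a raw score (logit) to a confidence in (0, 1).

    Simplified model (assumption): logistic curve with slope = 2 - sensitivity,
    so a higher Sensitivity gives a flatter curve.
    """
    slope = 2.0 - sensitivity
    if slope <= 0:
        raise ValueError("sensitivity must be below 2 in this model")
    z = max(-15.0, min(15.0, slope * logit))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def accepted(logit: float, sensitivity: float, min_conf: float = 0.25) -> bool:
    """True if the prediction survives the minimum confidence threshold."""
    return confidence(logit, sensitivity) >= min_conf


if __name__ == "__main__":
    weak, strong = -1.3, 2.0  # hypothetical logits: faint call vs clear call
    for s in (0.75, 1.0, 1.25):
        print(s, round(confidence(weak, s), 3), accepted(weak, s),
              round(confidence(strong, s), 3))
```

In this model a flatter curve pushes weak scores toward 0.5 and strong scores away from 1, so raising Sensitivity admits more detections (and more false positives) while also reshaping the whole score distribution.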
Sensitivity controls the shape of the sigmoid curve, which determines how strict or permissive the model is when deciding whether a sound belongs to a species [32]. With a high Sensitivity value, the model accepts weaker signals as possible detections. Although this increases the number of false positives, it also reduces SL, which can be key in studies such as ours, depending on the objective [37,38,75]. Higher Overlap values, in turn, increase the number of vocalizations and the processing time. In large datasets, the extra processing and validation time may not be justified if it brings little gain in Recall [35]. Therefore, once we have identified the Sensitivity values that minimize SL, we should consider Precision as an indicator of the validation effort in PAM. In this case, very small variations in Precision imply a substantial change in the number of individual vocalizations and species predicted by the BirdNET algorithm. This is largely due to the possible double counting (or more) of the same bird vocalization when it is detected in two or more overlapping segments.
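The double-counting effect can be quantified by enumerating the 3 s analysis windows produced by each Overlap value. The sketch below works in integer milliseconds and assumes, as a simplification, that every window starting inside the recording is analyzed (the last one padded if needed); BirdNET's handling of the final partial segment may differ. The helper names are hypothetical.

```python
from typing import List

WINDOW_MS = 3000  # BirdNET analyses 3 s segments


def window_starts_ms(duration_ms: int, overlap_ms: int, window_ms: int = WINDOW_MS) -> List[int]:
    """Start times (ms) of consecutive analysis windows for a given Overlap.

    Integer milliseconds avoid floating-point drift (3.0 - 2.9 != 0.1 exactly).
    Simplification (assumption): every window that starts inside the recording
    is analysed, the last one being padded if it runs past the end.
    """
    step = window_ms - overlap_ms
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    return list(range(0, duration_ms, step))


def windows_containing(call_start_ms: int, call_end_ms: int, duration_ms: int,
                       overlap_ms: int, window_ms: int = WINDOW_MS) -> int:
    """Count the windows that intersect a vocalization spanning [call_start, call_end)."""
    return sum(1 for s in window_starts_ms(duration_ms, overlap_ms, window_ms)
               if s < call_end_ms and s + window_ms > call_start_ms)


if __name__ == "__main__":
    for overlap_s in (0.0, 1.0, 2.0, 2.9):
        ov = round(overlap_s * 1000)
        n_windows = len(window_starts_ms(60_000, ov))
        n_hits = windows_containing(10_000, 10_500, 60_000, ov)
        print(f"Overlap {overlap_s} s: {n_windows} windows per minute; "
              f"a 0.5 s call at 10 s falls in {n_hits} of them")
```

Under these assumptions, a one-minute recording yields 20, 30, 60, and 600 windows for Overlap values of 0.0, 1.0, 2.0, and 2.9 s, and the same 0.5 s call intersects 1, 2, 3, and 34 of them, so a single vocalization can generate many predictions to validate at high Overlap.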
As already mentioned, our results show that, at equal SL values, the highest Precision is obtained with the combination Overlap = 0 s and Sensitivity = 1.25 (Table 5 and Table 6). This is therefore the optimal combination of Overlap and Sensitivity for minimizing the loss of species in the PAM-based bird inventory of La Chanta wetland while requiring the least validation effort. This approach to the optimization problem is close to the concept of the Pareto frontier, a method for solving multi-objective optimization problems [72,76]. In our case, as already indicated, the second criterion is to minimize the number of BirdNET predictions to be verified, i.e., the validation time of expert birders (Table 5 and Table 6). We therefore give secondary priority to solutions on the Pareto frontier that reduce the number of vocalizations or species to be validated. This is highly relevant given the impact on Precision and FPR in our case study (Table 3 and Table 4).
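The two-step selection described above (first minimize SL, then minimize the number of predictions to validate) can be sketched as a Pareto filter followed by a lexicographic choice. The `Config` record, the helper names, and all numbers below are placeholders chosen for illustration; they are not the values reported in Tables 3 to 6.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    overlap_s: float
    sensitivity: float
    sl: float            # species loss (lower is better)
    n_predictions: int   # predictions to validate (lower is better)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return (a.sl <= b.sl and a.n_predictions <= b.n_predictions
            and (a.sl < b.sl or a.n_predictions < b.n_predictions))


def pareto_front(configs: List[Config]) -> List[Config]:
    """Configurations not dominated by any other configuration."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def lexicographic_best(configs: List[Config]) -> Config:
    """Minimize SL first, then the validation effort."""
    return min(configs, key=lambda c: (c.sl, c.n_predictions))


if __name__ == "__main__":
    # Placeholder numbers, not results from this study.
    configs = [
        Config(0.0, 1.00, 0.05, 400),
        Config(0.0, 1.25, 0.00, 900),
        Config(2.9, 1.25, 0.00, 5000),
        Config(1.0, 0.75, 0.10, 300),
        Config(2.0, 1.00, 0.05, 1200),
    ]
    for c in pareto_front(configs):
        print(c)
    print("chosen:", lexicographic_best(configs))
```

The lexicographic choice always lies on the Pareto frontier: no other configuration can match its SL while requiring fewer predictions to validate.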
As Table 5 and Table 6 show, Precision is the same in both comparison scenarios, because it is not affected by the maximum number of detectable species (Maxsp). However, the TPR (Recall) differs between the two scenarios, because the reference is either the maximum number of species known in the study area (80 species) or the maximum number of species identified in the audio recordings (68 out of 80). For this reason, the F1-Score also changes with the reference scenario, since it is defined as the harmonic mean of Precision and Recall [67]. These changes affect MCC in a similar way. Both indicators show very modest values (0.5 or lower) for the solutions that minimize SL. Furthermore, the variability of F1-Score and MCC (Table 3 and Table 4) does not allow us to identify the combination Overlap = 0 s and Sensitivity = 1.25 as the optimal solution. These assessment metrics are therefore not useful for the purpose of our study.
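Why Precision stays fixed while Recall, F1-Score, and MCC shift with the reference scenario follows directly from the standard definitions. In the sketch below, TP = 68 comes from the text and FN is taken as Maxsp − TP; the FP and TN counts (TN is needed for MCC) are hypothetical placeholders, and TP and FP are held fixed across scenarios, as the identical Precision values in both scenarios imply.

```python
import math


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of Precision and Recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient (0.0 by convention if undefined)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0


if __name__ == "__main__":
    TP = 68    # from the text
    FP = 150   # hypothetical placeholder
    TN = 500   # hypothetical placeholder
    for scenario, maxsp in (("A", 80), ("B", 68)):
        FN = maxsp - TP  # reference species missed by BirdNET
        print(scenario,
              round(precision(TP, FP), 3),   # identical in both scenarios
              round(recall(TP, FN), 3),
              round(f1(TP, FP, FN), 3),
              round(mcc(TP, FP, FN, TN), 3))
```

Because Maxsp enters only through FN, Precision is unchanged while every metric that depends on FN moves between scenarios.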
In our case, the BirdNET configurations that produce the lowest SL are those with the highest FPR. The objective of our monitoring therefore sets it apart from classic quantitative evaluation methods for binary classification on imbalanced datasets [58,59]. In this respect, some authors consider machine learning evaluation measures such as Recall, Precision, and F-Score a common but poorly motivated way of evaluating results [77]. We therefore consider it essential to introduce a metric such as SL when deciding on the optimal configuration of BirdNET in community-level studies.
Regarding the lack of motivation or detail in many published studies using BirdNET, it should be noted that only 4 of the 69 studies reviewed departed from the default BirdNET configuration parameters, and even these included only minimal justification for their decision. This finding reveals a lack of effort in the literature to date to understand how inappropriate use of the tool may affect the results themselves [30,32,78]. At present, the combination of a freely accessible, user-friendly tool such as BirdNET [38] with the falling prices of ARUs is popularizing the widespread use of PAM [79,80]. The appeal of a tool that offers immediate results without requiring extensive ornithological knowledge [46], together with the habitual use of controversial metrics for evaluating machine learning outcomes [60,61,62,65,66], can be a dangerous combination.
Regarding the ecological implications of our findings, failing to detect a particular species can affect our understanding of the bird community and its functioning and, therefore, ecosystem management or the design and implementation of conservation measures, for example, if the missing species require them [81]. Furthermore, in ecological restoration and its monitoring, the absence of a species can lead to an inaccurate assessment if there are specific restoration objectives related to the presence or absence of that species [82]. For this reason, adequate monitoring is fundamental to determine whether restoration projects are reaching their goals and to verify their overall effectiveness and success [83]. Our study does not evaluate bird behavior or interactions between birds and different sound sources, aspects that could be addressed with audio recordings [37,56,84,85]. However, the bird inventory is the basis for the more comprehensive ecological analyses that are very common in acoustic ecology studies [22], so our findings also bear on these more advanced inferences, and the accuracy of the inventory on which they rest must be ensured. In this regard, Huang et al. [86] show how different data sources may produce different inventories, which affects the inference of correlations between variables such as functional diversity and environmental factors.
As a final consideration, it is worth investigating how variables such as sounds from other sources (natural or otherwise), species-specific signal degradation, recording device quality, and signal-to-noise ratio affect BirdNET’s automatic signal recognition capabilities [84,87]. This issue has been partially addressed in the literature, but not in relation to changes in the Overlap and Sensitivity settings. This could be considered a limitation of our study; however, we point it out as a topic for further research because of its potential interest. Our goal was not to evaluate how these variables may affect false positive or false negative rates in each scenario, because all our scenarios are based on the same audio files obtained with a single recorder. Therefore, the presence of and interactions between different sound sources, the characteristics of the recorder, signal degradation, etc., are constant across all combinations of BirdNET configuration parameters and across both scenarios analyzed in our work: scenario A, which compares predictions with the complete list of species with confirmed presence in La Chanta wetland, and scenario B, which takes as baseline the list of birds with confirmed presence in the audio files.
We consider our findings relevant for two reasons. First, our main result differs from prior research exploring which input values of BirdNET parameters optimize monitoring outcomes for community-level analyses (i.e., the best settings to correctly identify the species that appear in a collection of recordings). Second, setting a Sensitivity value other than 1 poses a problem for long-term monitoring programs. The range of Sensitivity values allowed in BirdNET Analyzer has changed between versions, from 0.5 to 1.5, then 0.75 to 1.25, and again 0.5 to 1.5. Changes in the Sensitivity parameter modify the relative distribution of prediction confidence scores [32,40]. Therefore, apart from Sensitivity = 1, Sensitivity values may not be compatible across BirdNET versions [88]. This is a significant change that may force researchers to make additional adjustments when comparing results from long-term monitoring datasets.
Our findings can be applied by practitioners in various fields related to biodiversity monitoring programs anywhere in the world, not only in anthropogenically restored environments but also in natural areas and in any biodiversity monitoring scenario that requires, as in this case study, monitoring the composition of bird communities. For instance, at the European level, the EU Nature Restoration Law (Regulation (EU) 2024/1991) came into force in 2024. It is considered crucial for restoring biodiversity in the EU and is also a key instrument for helping Member States meet international biodiversity commitments. The Regulation requires the use of biodiversity indicators, such as the common forest bird index based on Brlík et al. [89], as key targets in the forestry and agricultural sectors, both as evidence of the success of ecological restoration and, through their trends, to monitor ecosystem health.