Search Results (13)

Search Parameters:
Keywords = bird sound recognition

13 pages, 1305 KiB  
Article
Fine-Tuning BirdNET for the Automatic Ecoacoustic Monitoring of Bird Species in the Italian Alpine Forests
by Giacomo Schiavo, Alessia Portaccio and Alberto Testolin
Information 2025, 16(8), 628; https://doi.org/10.3390/info16080628 - 23 Jul 2025
Viewed by 253
Abstract
The ongoing decline in global biodiversity constitutes a critical challenge for environmental science, necessitating the prompt development of effective monitoring frameworks and conservation protocols to safeguard the structure and function of natural ecosystems. Recent progress in ecoacoustic monitoring, supported by advances in artificial intelligence, might finally offer scalable tools for systematic biodiversity assessment. In this study, we evaluate the performance of BirdNET, a state-of-the-art deep learning model for avian sound recognition, in the context of selected bird species characteristic of the Italian Alpine region. To this end, we assemble a comprehensive, manually annotated audio dataset targeting key regional species, and we investigate a variety of strategies for model adaptation, including fine-tuning with data augmentation techniques to enhance recognition under challenging recording conditions. As a baseline, we also develop and evaluate a simple Convolutional Neural Network (CNN) trained exclusively on our domain-specific dataset. Our findings indicate that BirdNET performance can be greatly improved by fine-tuning the pre-trained network with data collected within the specific regional soundscape, outperforming both the original BirdNET and the baseline CNN by a significant margin. These findings underscore the importance of environmental adaptation and data variability for the development of automated ecoacoustic monitoring devices while highlighting the potential of deep learning methods in supporting conservation efforts and informing soundscape management in protected areas. Full article
(This article belongs to the Special Issue Signal Processing Based on Machine Learning Techniques)
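BirdNET's fixed analysis window is central to how recordings like those in this study are processed before fine-tuning or inference. A minimal sketch, assuming BirdNET's documented defaults of 48 kHz audio and 3-second windows (the classifier call itself is omitted; all names here are illustrative):

```python
# Sketch: split a recording into the fixed 3-second windows that BirdNET
# analyzes. Window length and sample rate follow BirdNET's documented
# defaults; the classification of each window is omitted.

def segment(n_samples, sr=48000, win_s=3.0, hop_s=3.0):
    """Return (start, end) sample indices of consecutive analysis windows."""
    win = int(win_s * sr)
    hop = int(hop_s * sr)
    out = []
    start = 0
    while start + win <= n_samples:
        out.append((start, start + win))
        start += hop
    return out

# A 10-second clip at 48 kHz yields three non-overlapping 3-second windows.
windows = segment(10 * 48000)
```

Overlapping windows (a smaller `hop_s`) are a common way to avoid cutting a call at a window boundary, at the cost of more inference passes.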

33 pages, 38812 KiB  
Article
What Creates Unsafe Feelings in Rural Landscapes: A Study of Perceived Safety Based on Facial Expression Recognition
by Jiayi Wang, Zhenhong Yang, Yu Lei, Tianhang Peng, Tao Long, Jiayi Liu, Haonan Li, Jie Yang and Miao Lu
Land 2025, 14(3), 575; https://doi.org/10.3390/land14030575 - 9 Mar 2025
Viewed by 1033
Abstract
Over 3 billion people live in rural, unincorporated areas globally, which are vital for habitation and production. The perceived safety of these landscapes significantly impacts health and well-being. However, rural areas, as natural environments for urban populations to connect with nature, have not been sufficiently addressed in terms of safety concerns. Negative factors often outweigh those promoting safety, limiting the restorative potential of rural landscapes. This study collected rural audio–visual samples through photography and recording, captured facial emotional responses using facial expression recognition models, collected psychological response data using the rural perceived unsafety scale, and statistically evaluated safety perceptions in rural landscapes. Results indicate that (1) audio stimuli exert a stronger influence on perceived unsafety than visual stimuli, with an EUPI (Emotional Unsafety Perception Index) value 44.8% higher under audio conditions than visual conditions; (2) artificial sounds amplify perceived unsafety by 30.9% compared to natural sounds; (3) different animal sounds show significant variations in reducing perceived unsafety, with birds and pigs identified as positive factors; (4) visual factors like plant shading and buildings strongly increase perceived unsafety; and (5) audio–visual matching complicates perceived safety. For the first time, we identify auditory stimuli as the dominant factor in perceived safety in rural landscapes. These insights establish a scientific foundation and practical guidance for improving perceived safety in rural environments. Full article
(This article belongs to the Section Land Planning and Landscape Architecture)

19 pages, 3589 KiB  
Article
Investigation of Bird Sound Transformer Modeling and Recognition
by Darui Yi and Xizhong Shen
Electronics 2024, 13(19), 3964; https://doi.org/10.3390/electronics13193964 - 9 Oct 2024
Cited by 1 | Viewed by 1860
Abstract
Birds play a pivotal role in ecosystem and biodiversity research, and accurate bird identification contributes to the monitoring of biodiversity, understanding of ecosystem functionality, and development of effective conservation strategies. Current methods for bird sound recognition often involve processing bird songs into various acoustic features or fusion features for identification, which can result in information loss and complicate the recognition process. At the same time, recognition methods based on raw bird audio have not received widespread attention. Therefore, this study proposes a bird sound recognition method that utilizes multiple one-dimensional convolutional neural networks to directly learn feature representations from raw audio data, simplifying the feature extraction process. We also apply positional embedding convolution and multiple Transformer modules to enhance feature processing and improve accuracy. Additionally, we introduce a trainable weight array to control the importance of each Transformer module for better generalization of the model. Experimental results demonstrate our model's effectiveness, with an accuracy rate of 99.58% on the public dataset Birds_data, 98.77% on the Birdsound1 dataset, and 99.03% on the UrbanSound8K environmental sound dataset. Full article
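The raw-audio front end described above, learning features directly from the waveform, can be illustrated with a single strided 1-D convolution. A sketch only: the kernel below is a fixed moving-average filter, not one of the paper's learned filters, and all names are illustrative:

```python
import numpy as np

# Sketch: one strided 1-D convolution over raw audio, the kind of
# building block stacked before the Transformer modules. The kernel here
# is a hand-chosen moving average, not a learned filter.

def conv1d(signal, kernel, stride=1):
    """Valid-mode strided 1-D convolution returning a feature sequence."""
    k = len(kernel)
    n = (len(signal) - k) // stride + 1
    return np.array([np.dot(signal[i * stride:i * stride + k], kernel)
                     for i in range(n)])

audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
features = conv1d(audio, np.ones(400) / 400, stride=160)    # 98 feature frames
```

In a real model, many such filters run in parallel and their outputs form the token sequence the Transformer encoder attends over.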

14 pages, 1216 KiB  
Article
Living Together, Singing Together: Revealing Similar Patterns of Vocal Activity in Two Tropical Songbirds Applying BirdNET
by David Amorós-Ausina, Karl-L. Schuchmann, Marinez I. Marques and Cristian Pérez-Granados
Sensors 2024, 24(17), 5780; https://doi.org/10.3390/s24175780 - 5 Sep 2024
Cited by 3 | Viewed by 2203
Abstract
In recent years, several automated and noninvasive methods for wildlife monitoring, such as passive acoustic monitoring (PAM), have emerged. PAM consists of the use of acoustic sensors followed by sound interpretation to obtain ecological information about certain species. One challenge associated with PAM is the generation of a significant amount of data, which often requires the use of machine learning tools for automated recognition. Here, we couple PAM with BirdNET, a free-to-use sound recognition algorithm, to assess, for the first time, the precision of BirdNET in detecting three tropical songbirds and to describe their patterns of vocal activity over a year in the Brazilian Pantanal. The precision of the BirdNET method was high for all three species (ranging from 72 to 84%). We were able to describe the vocal activity patterns of two of the species, the Buff-breasted Wren (Cantorchilus leucotis) and the Thrush-like Wren (Campylorhynchus turdinus). Both species presented very similar vocal activity patterns during the day, with a maximum around sunrise, and throughout the year, with peak vocal activity occurring between April and June, when food availability for insectivorous species may be high. Further research should improve our knowledge regarding the ability of coupling PAM with BirdNET for monitoring a wider range of tropical species. Full article
(This article belongs to the Special Issue Advanced Acoustic Sensing Technology)
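The precision figures reported above (72 to 84%) are simply the fraction of automated detections that a human annotator confirms as correct. A trivial sketch with invented numbers:

```python
# Sketch: precision of automated detections against manual verification,
# the per-species metric reported in the study. Counts here are invented.

def precision(n_detections, n_true_positives):
    """Fraction of automated detections confirmed by a human annotator."""
    return n_true_positives / n_detections if n_detections else 0.0

# e.g. 84 of 100 BirdNET detections of a species verified as correct
p = precision(100, 84)
```

Recall is harder to obtain in PAM studies, since it requires exhaustively annotating what the recognizer missed, which is why precision is often the headline metric.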

14 pages, 2351 KiB  
Article
A Novel Concept-Cognitive Learning Method for Bird Song Classification
by Jing Lin, Wenkan Wen and Jiyong Liao
Mathematics 2023, 11(20), 4298; https://doi.org/10.3390/math11204298 - 16 Oct 2023
Cited by 4 | Viewed by 1639
Abstract
Bird voice classification is a crucial issue in wild bird protection work. However, existing static classification strategies are often unable to achieve the desired outcomes in a dynamic data stream context, as standard machine learning approaches mainly focus on static learning, which is not suitable for mining dynamic data and has the disadvantages of high computational overhead and hardware requirements. These shortcomings greatly limit the application of standard machine learning approaches. This study aims to quickly and accurately distinguish bird species by their sounds in bird conservation work. To this end, a novel concept-cognitive computing system (C3S) framework, namely PyC3S, is proposed for bird sound classification in this paper. The proposed system uses feature fusion and concept-cognitive computing technology to construct a Python version of a dynamic bird song classification and recognition model on a dataset containing 50 species of birds. The experimental results show that the model achieves 92.77% accuracy, 92.26% precision, 92.25% recall, and a 92.41% F1-score on the given 50-species dataset, validating the effectiveness of PyC3S compared to state-of-the-art stream learning algorithms. Full article
(This article belongs to the Special Issue Nature Inspired Computing and Optimisation)

20 pages, 17450 KiB  
Article
A Novel Bird Sound Recognition Method Based on Multifeature Fusion and a Transformer Encoder
by Shaokai Zhang, Yuan Gao, Jianmin Cai, Hangxiao Yang, Qijun Zhao and Fan Pan
Sensors 2023, 23(19), 8099; https://doi.org/10.3390/s23198099 - 27 Sep 2023
Cited by 16 | Viewed by 5882
Abstract
Birds play a vital role in the study of ecosystems and biodiversity. Accurate bird identification helps monitor biodiversity, understand the functions of ecosystems, and develop effective conservation strategies. However, previous bird sound recognition methods often relied on single features and overlooked the spatial information associated with these features, leading to low accuracy. Recognizing this gap, the present study proposed a bird sound recognition method that employs multiple convolutional neural networks and a transformer encoder to provide a reliable solution for identifying and classifying birds based on their unique sounds. We manually extracted various acoustic features as model inputs, and feature fusion was applied to obtain the final set of feature vectors. Feature fusion combines the deep features extracted by the various networks, resulting in a more comprehensive feature set and thereby improving recognition accuracy. The multiple integrated acoustic features, such as mel frequency cepstral coefficients (MFCC), chroma features (Chroma) and Tonnetz features, were encoded by a transformer encoder. The transformer encoder effectively extracted the positional relationships between bird sound features, resulting in enhanced recognition accuracy. The experimental results demonstrated the exceptional performance of our method, with an accuracy of 97.99%, a recall of 96.14%, an F1 score of 96.88% and a precision of 97.97% on the Birdsdata dataset. Furthermore, our method achieved an accuracy of 93.18%, a recall of 92.43%, an F1 score of 93.14% and a precision of 93.25% on the Cornell Bird Challenge 2020 (CBC) dataset. Full article
(This article belongs to the Section Intelligent Sensors)
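At its simplest, the feature-fusion step described above amounts to concatenating per-clip vectors from several extractors into one input vector. A sketch with conventional dimensions (13 MFCCs, 12 chroma bins, 6 Tonnetz dimensions), which are illustrative defaults, not necessarily the paper's:

```python
import numpy as np

# Sketch of feature fusion: per-clip feature vectors from different
# extractors (MFCC, Chroma, Tonnetz) are concatenated into one vector.
# Dimensions are common defaults; the extractors themselves are omitted.

def fuse(*feature_vectors):
    """Concatenate per-clip feature vectors from multiple extractors."""
    return np.concatenate(feature_vectors)

mfcc = np.zeros(13)      # e.g. 13 MFCCs averaged over the clip
chroma = np.zeros(12)    # 12 chroma bins
tonnetz = np.zeros(6)    # 6 Tonnetz dimensions
fused = fuse(mfcc, chroma, tonnetz)  # 31-dimensional fused vector
```

In the paper's pipeline the fused representation is then fed to the transformer encoder rather than used directly as a classifier input.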

11 pages, 978 KiB  
Article
Hearing to the Unseen: AudioMoth and BirdNET as a Cheap and Easy Method for Monitoring Cryptic Bird Species
by Gerard Bota, Robert Manzano-Rubio, Lidia Catalán, Julia Gómez-Catasús and Cristian Pérez-Granados
Sensors 2023, 23(16), 7176; https://doi.org/10.3390/s23167176 - 15 Aug 2023
Cited by 28 | Viewed by 6371
Abstract
The efficient analysis of sound recordings obtained through passive acoustic monitoring (PAM) can be challenging owing to the vast amount of data collected with this technique. The development of species-specific acoustic recognizers (e.g., through deep learning) may reduce the time required to process sound recordings, but such recognizers are often difficult to create. Here, we evaluate the effectiveness of BirdNET, a new machine learning tool freely available for automated recognition and acoustic data processing, for correctly identifying and detecting two cryptic forest bird species. BirdNET precision was high for both the Coal Tit (Periparus ater) and the Short-toed Treecreeper (Certhia brachydactyla), with mean values of 92.6% and 87.8%, respectively. Using the default values, BirdNET successfully detected the Coal Tit and the Short-toed Treecreeper in 90.5% and 98.4% of the annotated recordings, respectively. We also tested the impact of variable confidence scores on BirdNET performance and estimated the optimal confidence score for each species. Vocal activity patterns of both species, obtained using PAM and BirdNET, reached their peak during the first two hours after sunrise. We hope that our study may encourage researchers and managers to utilize this user-friendly and ready-to-use software, thus contributing to advancements in acoustic sensing and environmental monitoring. Full article
(This article belongs to the Special Issue Acoustic Sensing and Monitoring in Urban and Natural Environments)
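Estimating a species-specific optimal confidence score, as the study does, can be sketched as a threshold sweep over labeled detections that keeps the threshold maximizing F1. The exact procedure in the paper may differ, and the detections below are invented:

```python
# Sketch: pick a per-species BirdNET confidence threshold by sweeping
# candidate values over human-labeled detections and maximizing F1.
# Detections are (confidence, is_correct) pairs; all data are invented.

def best_threshold(detections, n_true, thresholds):
    """Return (best_f1, best_threshold) over the candidate thresholds."""
    best = (0.0, 0.0)
    for t in thresholds:
        kept = [c for c, ok in detections if c >= t]
        tp = sum(1 for c, ok in detections if c >= t and ok)
        if not kept or n_true == 0:
            continue
        prec = tp / len(kept)
        rec = tp / n_true
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best[0]:
            best = (f1, t)
    return best

dets = [(0.9, True), (0.8, True), (0.4, False), (0.3, True), (0.2, False)]
f1, thr = best_threshold(dets, n_true=3, thresholds=[0.1, 0.5])
```

A higher threshold trades recall for precision; the optimum depends on how costly false positives are for the species and question at hand.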

11 pages, 1700 KiB  
Article
Analysis of the Territorial Vocalization of the Pheasants Phasianus colchicus
by Piotr Czyżowski, Sławomir Beeger, Mariusz Wójcik, Dorota Jarmoszczuk, Mirosław Karpiński and Marian Flis
Animals 2022, 12(22), 3209; https://doi.org/10.3390/ani12223209 - 19 Nov 2022
Viewed by 2007
Abstract
The aim of the study was to assess the impact of the duration of the mating season and the time of day on the parameters of pheasant vocalization (duration of vocalization, frequency of the sound wave, intervals between vocalizations). In the study, pheasant vocalization recorded in the morning (6:00–8:00) and in the afternoon (16:00–18:00) between April and June 2020 was analyzed. In total, the research material consisted of 258 separate vocalizations. After recognition of the individual songs of each bird, frequency-time indicators were collected from the samples to perform statistical analysis of the recorded sounds. The duration of the first syllable [s], the duration of the second syllable [s], the duration of the pause between the syllables [s], the intervals between successive vocalizations [min], and the peak frequency of syllables I and II [Hz] were specified for each song. The duration of the syllables and the pauses between the syllables and vocalizations were determined through evaluation of spectrograms. The peak amplitude frequencies of the syllables were determined via time-frequency STFT analysis. Statistically significant differences in the distributions of the values of all variables between the analyzed months were demonstrated. The longest duration of total vocalization and the shortest time between vocalizations were recorded in May. This month is therefore characterized by the highest frequency and longest duration of vocalization, which is related to the peak of the reproductive period. The time of day was found to exert a significant effect on all variables except the duration of syllable II. The duration of vocalization was significantly shorter in the morning, which indicates that the cocks are more active at this time of day in the study area. The highest peak amplitude frequencies of both syllables were recorded in April, but they decreased in the subsequent months of observation. The time of day was also shown to have an impact on the peak amplitude frequencies, which had the highest values in the morning. Full article
(This article belongs to the Section Birds)
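The peak amplitude frequency of a syllable is the location of the maximum in its magnitude spectrum. A self-contained sketch on a synthetic tone (the study worked on STFT spectrograms of real recordings; the function and signal here are illustrative):

```python
import numpy as np

# Sketch: recover a syllable's peak (amplitude) frequency from the
# location of the spectral maximum, demonstrated on a synthetic tone.

def peak_frequency(signal, sr):
    """Frequency (Hz) of the largest-magnitude bin in the spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return freqs[np.argmax(spectrum)]

sr = 8000
t = np.arange(sr) / sr                    # 1 second of samples
syllable = np.sin(2 * np.pi * 1000 * t)   # pure 1 kHz tone
f = peak_frequency(syllable, sr)          # -> 1000.0 Hz
```

With a one-second window the bin spacing is 1 Hz, so the 1 kHz tone falls exactly on a bin; shorter STFT windows trade frequency resolution for time resolution.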

22 pages, 1441 KiB  
Review
Bird Communities in a Changing World: The Role of Interspecific Competition
by Alban Guillaumet and Ivory Jordan Russell
Diversity 2022, 14(10), 857; https://doi.org/10.3390/d14100857 - 11 Oct 2022
Cited by 15 | Viewed by 8980
Abstract
Significant changes in the environment have the potential to affect bird species abundance and distribution, both directly, through a modification of the landscape, habitats, and climate, and indirectly, through a modification of biotic interactions such as competitive interactions. Predicting and mitigating the consequences of global change thus requires not only a sound understanding of the role played by biotic interactions in current ecosystems, but also the recognition and study of the complex and intricate effects that result from the perturbation of these ecosystems. In this review, we emphasize the role of interspecific competition in bird communities by focusing on three main predictions derived from theoretical and empirical considerations. We provide numerous examples of population decline and displacement that appeared to be, at least in part, driven by competition, and were amplified by environmental changes associated with human activities. Beyond a shift in relative species abundance, we show that interspecific competition may have a negative impact on species richness, ecosystem services, and endangered species. Despite these findings, we argue that, in general, the role played by interspecific competition in current communities remains poorly understood due to methodological issues and the complexity of natural communities. Predicting the consequences of global change in these communities is further complicated by uncertainty regarding future environmental conditions and the speed and efficacy of plastic and evolutionary responses to fast-changing environments. Possible directions of future research are highlighted. Full article
(This article belongs to the Special Issue Wildlife Population Ecology and Spatial Ecology under Global Change)

17 pages, 682 KiB  
Article
A Randomized Bag-of-Birds Approach to Study Robustness of Automated Audio Based Bird Species Classification
by Burooj Ghani and Sarah Hallerberg
Appl. Sci. 2021, 11(19), 9226; https://doi.org/10.3390/app11199226 - 3 Oct 2021
Cited by 17 | Viewed by 3153
Abstract
The automatic classification of bird sounds is an ongoing research topic, and several results have been reported for the classification of selected bird species. In this contribution, we use an artificial neural network fed with pre-computed sound features to study the robustness of bird sound classification. We investigate, in detail, if and how the classification results depend on the number of species and the selection of species in the subsets presented to the classifier. Specifically, a bag-of-birds approach is employed to randomly create balanced subsets of sounds from different species for repeated classification runs. The number of species present in each subset is varied between 10 and 300 by randomly drawing sounds of species from a dataset of 659 bird species taken from the Xeno-Canto database. We observed that the shallow artificial neural network trained on pre-computed sound features was able to classify the bird sounds. The quality of the classifications was at least comparable to some previously reported results when the number of species allowed for a direct comparison. The classification performance is evaluated using several common measures, such as precision, recall, accuracy, mean average precision, and the area under the receiver operating characteristic curve. All of these measures indicate a decrease in classification success as the number of species present in the subsets is increased. We analyze this dependence in detail and compare the computed results to an analytic explanation assuming dependencies for an idealized perfect classifier. Moreover, we observe that the classification performance depended on the individual composition of the subset and varied across 20 randomly drawn subsets. Full article
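The bag-of-birds subset draw itself is a plain balanced random sample from the species pool, repeated for each classification run. A sketch with illustrative names (the study's pool size of 659 species is the only detail taken from the abstract):

```python
import random

# Sketch of the bag-of-birds draw: sample a balanced subset of species
# from the full pool for one classification run; repeat per run.

def draw_subset(species_pool, n_species, seed=None):
    """Randomly draw n_species distinct species for one run."""
    rng = random.Random(seed)
    return rng.sample(species_pool, n_species)

pool = [f"species_{i}" for i in range(659)]  # 659 species, as in the study
subset = draw_subset(pool, 10, seed=0)       # one 10-species run
```

Seeding each draw makes individual runs reproducible, while varying `n_species` from 10 to 300 reproduces the sweep the robustness analysis is built on.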

18 pages, 1634 KiB  
Article
An Ensemble of Convolutional Neural Networks for Audio Classification
by Loris Nanni, Gianluca Maguolo, Sheryl Brahnam and Michelangelo Paci
Appl. Sci. 2021, 11(13), 5796; https://doi.org/10.3390/app11135796 - 22 Jun 2021
Cited by 84 | Viewed by 9434
Abstract
Research in sound classification and recognition is rapidly advancing in the field of pattern recognition. One important area in this field is environmental sound recognition, whether it concerns the identification of endangered species in different habitats or the type of interfering noise in urban environments. Since environmental audio datasets are often limited in size, a robust model able to perform well across different datasets is of strong research interest. In this paper, ensembles of classifiers are combined that exploit six data augmentation techniques and four signal representations for retraining five pre-trained convolutional neural networks (CNNs); these ensembles are tested on three freely available environmental audio benchmark datasets: (i) bird calls, (ii) cat sounds, and (iii) the Environmental Sound Classification (ESC-50) database for identifying sources of noise in environments. To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification. The best-performing ensembles are compared and shown to either outperform or perform comparatively to the best methods reported in the literature on these datasets, including on the challenging ESC-50 dataset. We obtained a 97% accuracy on the bird dataset, 90.51% on the cat dataset, and 88.65% on ESC-50 using different approaches. In addition, the same ensemble model trained on the three datasets managed to reach the same results on the bird and cat datasets while losing only 0.1% on ESC-50. Thus, we have managed to create an off-the-shelf ensemble that can be trained on different datasets and reach performances competitive with the state of the art. Full article
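One common way to combine such an ensemble of CNNs is to average the members' class-probability outputs and take the argmax; the paper's exact fusion rule may differ, and the numbers below are invented:

```python
import numpy as np

# Sketch: fuse the class-probability outputs of several CNNs by simple
# averaging, one standard fusion rule for ensembles of classifiers.

def ensemble_predict(prob_matrices):
    """Average per-model class probabilities, then take the argmax class."""
    avg = np.mean(prob_matrices, axis=0)
    return np.argmax(avg, axis=-1)

# Three models scoring one clip over four classes; they lean toward class 2.
probs = np.array([
    [0.1, 0.2, 0.6, 0.1],
    [0.2, 0.1, 0.5, 0.2],
    [0.3, 0.3, 0.3, 0.1],
])
label = ensemble_predict(probs)
```

Averaging probabilities (soft voting) generally behaves better than majority voting when member models are calibrated differently, since confident members contribute proportionally more.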

14 pages, 3928 KiB  
Article
A Sound Source Localisation Analytical Method for Monitoring the Abnormal Night Vocalisations of Poultry
by Xiaodong Du, Fengdan Lao and Guanghui Teng
Sensors 2018, 18(9), 2906; https://doi.org/10.3390/s18092906 - 1 Sep 2018
Cited by 47 | Viewed by 7627
Abstract
Due to the increasing scale of farms, it is increasingly difficult for farmers to monitor their animals in an automated way. Because of this, we focused on a sound-based technique to monitor laying hens. Sound analysis has become an important tool for studying the behaviour, health and welfare of animals in recent years. A surveillance system using the microphone arrays of Kinect sensors was developed for automatically monitoring birds' abnormal vocalisations during the night. Based on the time-difference-of-arrival (TDOA) principle of the sound source localisation (SSL) method, the Kinect sensors' direction estimations were very accurate. The system had an accuracy of 74.7% in laboratory tests and 73.6% in small poultry group tests for different-area sound recognition. Additionally, flocks produced an average of 40 sounds per bird during feeding time in small group tests. It was found that, on average, each normal chicken produced more than 53 sounds during the daytime (noon to 6:00 p.m.) and less than one sound at night (11:00 p.m.–3:00 a.m.). This system can be used to detect anomalous poultry status at night by monitoring the number of vocalisations and their area distributions, which provides a practical and feasible method for the study of animal behaviour and welfare. Full article
(This article belongs to the Special Issue Sensors in Agriculture 2018)
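The TDOA step underlying such localisation estimates the inter-microphone delay of a sound from the peak of the cross-correlation between two channels. A self-contained sketch with synthetic signals (converting the delay to a bearing would additionally require the microphone geometry and the speed of sound):

```python
import numpy as np

# Sketch of the TDOA step: estimate how many samples later a sound
# arrives at one microphone than another via the cross-correlation peak.

def tdoa_samples(sig_a, sig_b):
    """Lag (in samples) of sig_b relative to sig_a."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    return np.argmax(corr) - (len(sig_a) - 1)

rng = np.random.default_rng(0)
src = rng.standard_normal(1000)                   # broadband source signal
mic_a = src
mic_b = np.concatenate([np.zeros(5), src[:-5]])   # same sound, 5 samples later
lag = tdoa_samples(mic_a, mic_b)                  # -> 5
```

At a 16 kHz sample rate a 5-sample lag corresponds to roughly 0.3 ms, i.e. about 10 cm of extra path length, which is why array spacing and sample rate bound the angular resolution.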

12 pages, 6251 KiB  
Article
Automatic Taxonomic Classification of Fish Based on Their Acoustic Signals
by Juan J. Noda, Carlos M. Travieso and David Sánchez-Rodríguez
Appl. Sci. 2016, 6(12), 443; https://doi.org/10.3390/app6120443 - 17 Dec 2016
Cited by 34 | Viewed by 9481
Abstract
Fish as well as birds, mammals, insects and other animals are capable of emitting sounds for diverse purposes, which can be recorded through microphone sensors. Although fish vocalizations have been known for a long time, they have been poorly studied and applied in their taxonomic classification. This work presents a novel approach for automatic remote acoustic identification of fish through their acoustic signals by applying pattern recognition techniques. The sound signals are preprocessed and automatically segmented to extract each call from the background noise. Then, the calls are parameterized using Linear and Mel Frequency Cepstral Coefficients (LFCC and MFCC), Shannon Entropy (SE) and Syllable Length (SL), yielding useful information for the classification phase. In our experiments, 102 different fish species have been successfully identified with three widely used machine learning algorithms: K-Nearest Neighbors (KNN), Random Forest (RF) and Support Vector Machine (SVM). Experimental results show an average classification accuracy of 95.24%, 93.56% and 95.58%, respectively. Full article
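Two of the simpler features named above, Shannon entropy and syllable length, can be computed directly. A sketch with toy inputs, not the paper's parameterization (in practice the entropy would be taken over a spectral or amplitude distribution estimated from each segmented call):

```python
import math

# Sketch: two call-level features used alongside LFCC/MFCC in the paper -
# Shannon entropy of a normalised distribution and syllable length.

def shannon_entropy(probabilities):
    """Shannon entropy (bits) of a normalised probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def syllable_length(n_samples, sr):
    """Duration of a segmented call in seconds."""
    return n_samples / sr

h = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # uniform 4-bin case: 2.0 bits
sl = syllable_length(4410, 44100)              # 4410 samples at 44.1 kHz: 0.1 s
```

Noisy, broadband calls push the entropy toward its maximum (log2 of the bin count), while tonal calls concentrate energy in few bins and score lower, which is what makes it a useful discriminative feature.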
