Search Results (102)

Search Parameters:
Keywords = vocalization classification

28 pages, 1634 KiB  
Review
AI-Powered Vocalization Analysis in Poultry: Systematic Review of Health, Behavior, and Welfare Monitoring
by Venkatraman Manikandan and Suresh Neethirajan
Sensors 2025, 25(13), 4058; https://doi.org/10.3390/s25134058 - 29 Jun 2025
Viewed by 824
Abstract
Artificial intelligence and bioacoustics represent a paradigm shift in non-invasive poultry welfare monitoring through advanced vocalization analysis. This comprehensive systematic review critically examines the transformative evolution from traditional acoustic feature extraction—including Mel-Frequency Cepstral Coefficients (MFCCs), spectral entropy, and spectrograms—to cutting-edge deep learning architectures encompassing Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, attention mechanisms, and groundbreaking self-supervised models such as wav2vec2 and Whisper. The investigation reveals compelling evidence for edge computing deployment via TinyML frameworks, addressing critical scalability challenges in commercial poultry environments characterized by acoustic complexity and computational constraints. Advanced applications spanning emotion recognition, disease detection, and behavioral phenotyping demonstrate unprecedented potential for real-time welfare assessment. Through rigorous bibliometric co-occurrence mapping and thematic clustering analysis, this review exposes persistent methodological bottlenecks: dataset standardization deficits, evaluation protocol inconsistencies, and algorithmic interpretability limitations. Critical knowledge gaps emerge in cross-species domain generalization and contextual acoustic adaptation, demanding urgent research prioritization. The findings underscore explainable AI integration as essential for establishing stakeholder trust and regulatory compliance in automated welfare monitoring systems. This synthesis positions acoustic AI as a cornerstone technology enabling ethical, transparent, and scientifically robust precision livestock farming, bridging computational innovation with biological relevance for sustainable poultry production systems. Future research directions emphasize multi-modal sensor integration, standardized evaluation frameworks, and domain-adaptive models capable of generalizing across diverse poultry breeds, housing conditions, and environmental contexts while maintaining interpretability for practical farm deployment.
(This article belongs to the Special Issue Feature Papers in Smart Agriculture 2025)
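
Since the review centers on hand-crafted features such as MFCCs, spectral entropy, and mel spectrograms, here is a minimal sketch of how such features are commonly extracted in Python with librosa; the file name, sampling rate, and frame parameters are illustrative assumptions, not settings from any reviewed study.

```python
# Minimal sketch: classical acoustic features for poultry vocalization
# analysis. File name and parameters are illustrative assumptions.
import librosa
import numpy as np

y, sr = librosa.load("chicken_call.wav", sr=22050)  # hypothetical clip

# Mel-frequency cepstral coefficients (MFCCs): 13 coefficients per frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Log-mel spectrogram, the typical input to CNN/LSTM classifiers
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)

# Spectral entropy per frame (one common formulation)
spec = np.abs(librosa.stft(y)) ** 2
p = spec / (spec.sum(axis=0, keepdims=True) + 1e-12)
spectral_entropy = -(p * np.log2(p + 1e-12)).sum(axis=0)

print(mfcc.shape, log_mel.shape, spectral_entropy.shape)
```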

22 pages, 1595 KiB  
Review
Machine Learning Applications for Diagnosing Parkinson’s Disease via Speech, Language, and Voice Changes: A Systematic Review
by Mohammad Amran Hossain, Enea Traini and Francesco Amenta
Inventions 2025, 10(4), 48; https://doi.org/10.3390/inventions10040048 - 27 Jun 2025
Viewed by 576
Abstract
Parkinson’s disease (PD) is a progressive neurodegenerative disorder leading to movement impairment, cognitive decline, and psychiatric symptoms. Key manifestations of PD include bradykinesia (slowness of movement), changes in voice or speech, and gait disturbances. The quantification of neurological disorders through voice analysis has emerged as a rapidly expanding research domain, offering the potential for non-invasive and large-scale monitoring. This review explores existing research on the application of machine learning (ML) in speech, voice, and language processing for the diagnosis of PD. It comprehensively analyzes current methodologies, highlights key findings and their associated limitations, and proposes strategies to address existing challenges. A systematic review was conducted following PRISMA guidelines. We searched four databases: PubMed, Web of Science, Scopus, and IEEE Xplore. The primary focus was on the diagnosis, detection, or identification of PD through voice, speech, and language characteristics. We included 34 studies that used ML techniques to detect or classify PD based on vocal features. The most commonly used approaches involved free-speech and reading-speech tasks. In addition to widely used feature extraction toolkits, several studies implemented custom-built feature sets. Although nearly all studies reported high classification performance, significant limitations were identified, including challenges in comparability and incomplete integration with clinical applications. Emerging trends in this field include the collection of real-world, everyday speech data to facilitate longitudinal tracking and capture participants’ natural behaviors. Another promising direction involves the incorporation of additional modalities alongside voice analysis, which may enhance both analytical performance and clinical applicability. Further research is required to determine optimal methodologies for leveraging speech and voice changes as early biomarkers of PD, thereby enhancing early detection and informing clinical intervention strategies.

18 pages, 368 KiB  
Article
Stacked Ensemble Learning for Classification of Parkinson’s Disease Using Telemonitoring Vocal Features
by Bolaji A. Omodunbi, David B. Olawade, Omosigho F. Awe, Afeez A. Soladoye, Nicholas Aderinto, Saak V. Ovsepian and Stergios Boussios
Diagnostics 2025, 15(12), 1467; https://doi.org/10.3390/diagnostics15121467 - 9 Jun 2025
Viewed by 684
Abstract
Background: Parkinson’s disease (PD) is a progressive neurodegenerative condition that impairs motor and non-motor functions. Early and accurate diagnosis is critical for effective management and care. Leveraging machine learning (ML) techniques, this study aimed to develop a robust prediction system for PD using a stacked ensemble learning approach, addressing challenges such as imbalanced datasets and feature optimization. Methods: An open-access PD dataset comprising 22 vocal attributes and 195 instances from 31 subjects was utilized. To prevent data leakage, subjects were divided into training (22 subjects) and testing (9 subjects) groups, ensuring no subject appeared in both sets. Preprocessing included data cleaning and normalization via min–max scaling. The synthetic minority oversampling technique (SMOTE) was applied exclusively to the training set to address class imbalance. Feature selection techniques—forward search, gain ratio, and Kruskal–Wallis test—were employed using subject-wise cross-validation to identify significant attributes. The developed system combined support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and decision tree (DT) as base classifiers, with logistic regression (LR) as the meta-classifier in a stacked ensemble learning framework. Performance was evaluated using both recording-wise and subject-wise metrics to ensure clinical relevance. Results: The stacked ensemble learning model achieved realistic performance with a recording-wise accuracy of 84.7% and subject-wise accuracy of 77.8% on completely unseen subjects, outperforming individual classifiers including KNN (81.4%), RF (79.7%), and SVM (76.3%). Cross-validation within the training set showed 89.2% accuracy, with the performance difference highlighting the importance of proper validation methodology. Feature selection results showed that using the top 10 features ranked by gain ratio provided optimal balance between performance and clinical interpretability. The system’s methodological robustness was validated through rigorous subject-wise evaluation, demonstrating the critical impact of validation methodology on reported performance. Conclusions: By implementing subject-wise validation and preventing data leakage, this study demonstrates that proper validation yields substantially different (and more realistic) results compared to flawed recording-wise approaches. The findings underscore the critical importance of validation methodology in healthcare ML applications and provide a template for methodologically sound PD classification research. Future research should focus on validating the model with larger, multi-center datasets and implementing standardized validation protocols to enhance clinical applicability.
(This article belongs to the Special Issue Machine-Learning-Based Disease Diagnosis and Prediction)
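
The described pipeline (min–max scaling, SMOTE on the training set only, subject-wise splitting, and an SVM/RF/KNN/DT stack with a logistic-regression meta-classifier) maps closely onto scikit-learn and imbalanced-learn. Below is a minimal sketch assuming the familiar Oxford/UCI Parkinson's data layout, with a `name` column encoding subject IDs and a `status` label; the file path and ID pattern are assumptions, and the feature-selection steps are omitted.

```python
# Sketch: subject-wise split, SMOTE on training data only, and a stacked
# ensemble (SVM, RF, KNN, DT -> LR). Column names assume the UCI layout.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("parkinsons.data")                 # hypothetical path
groups = df["name"].str.extract(r"(S\d+)")[0]       # subject ID per recording
X, y = df.drop(columns=["name", "status"]), df["status"]

# Subject-wise split: no subject appears in both train and test
train_idx, test_idx = next(GroupShuffleSplit(
    n_splits=1, test_size=9 / 31, random_state=0).split(X, y, groups))
X_tr, X_te = X.iloc[train_idx], X.iloc[test_idx]
y_tr, y_te = y.iloc[train_idx], y.iloc[test_idx]

scaler = MinMaxScaler().fit(X_tr)                   # min-max normalization
X_res, y_res = SMOTE(random_state=0).fit_resample(scaler.transform(X_tr), y_tr)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier()),
                ("knn", KNeighborsClassifier()),
                ("dt", DecisionTreeClassifier())],
    final_estimator=LogisticRegression())
stack.fit(X_res, y_res)
print("subject-wise test accuracy:", stack.score(scaler.transform(X_te), y_te))
```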

33 pages, 3861 KiB  
Article
The Importance of Being Onset: Tuscan Lenition and Stops in Coda Position
by Giuditta Avano and Piero Cossu
Languages 2025, 10(6), 129; https://doi.org/10.3390/languages10060129 - 30 May 2025
Viewed by 2174
Abstract
This paper examines Gorgia Toscana (GT), a phenomenon of stop lenition observed in Tuscan varieties of Italian. Traditionally, this process has been understood to occur in post-vocalic positions, which, in the native lexicon, correspond to onset position due to the absence of stops in syllable codas in Italian, apart from geminate consonants that straddle the coda and onset of adjacent syllables. However, stops in coda position are found in both loanwords (e.g., admin, Batman) and bookwords (e.g., ritmo, tecnica). Drawing on original acoustic data collected from 42 native speakers of Florentine Italian, we investigated the realization of stops in such lexical items through allophonic classification and quantitative analysis. Our primary aim was to test the Onset Hypothesis, which posits that Gorgia exclusively affects stops in onset position, implying that coda stops should not undergo lenition. Our findings support this hypothesis. We provide a phonological analysis within the frameworks of Strict CV and Coda Mirror, emphasizing the importance of syllable structure in understanding the manifestation of Gorgia Toscana, which we argue cannot be adequately captured solely by considering the linear order of segments.
(This article belongs to the Special Issue Speech Variation in Contemporary Italian)

26 pages, 13565 KiB  
Article
Marine Mammal Call Classification Using a Multi-Scale Two-Channel Fusion Network (MT-Resformer)
by Xiang Li, Chao Dong, Guixin Dong, Xuerong Cui, Yankun Chen, Peng Zhang and Zhanwei Li
J. Mar. Sci. Eng. 2025, 13(5), 944; https://doi.org/10.3390/jmse13050944 - 13 May 2025
Viewed by 506
Abstract
The classification of high-frequency marine mammal vocalizations often faces challenges due to the limitations of acoustic features, which are sensitive to mid-to-low frequencies but offer low resolution in high-frequency ranges. Additionally, single-channel networks can restrict overall classification performance. To tackle these challenges, we introduce MT-Resformer, an innovative dual-channel model with a multi-scale framework designed for classifying marine mammal vocalizations. Our approach introduces a feature fusion strategy that combines the constant-Q spectrogram with Mel filter-based spectrogram features, effectively overcoming the low resolution of Mel spectrograms in high frequencies. The MT-Resformer model incorporates two key components: a multi-scale parallel residual network (MResNet) and a Transformer network channel. The model employs a multilayer perceptron (MLP) to dynamically regulate the weighting of the two channels, enabling flexible feature fusion. Experimental findings validate the proposed approach, yielding classification accuracies of 99.17% on the Watkins dataset and 95.22% on the ChangLong dataset. These results underscore its strong performance.
(This article belongs to the Section Marine Biology)
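
A hedged sketch of the feature-fusion step only: pairing a constant-Q spectrogram with a mel spectrogram as two input channels, which is how the abstract describes compensating for the mel representation's coarse high-frequency resolution. The clip, bin counts, and tensor layout are assumptions; the MT-Resformer network itself is not reproduced here.

```python
# Sketch: fuse a constant-Q spectrogram and a mel spectrogram as two
# channels for a dual-channel network. Parameters are illustrative.
import librosa
import numpy as np
import torch

y, sr = librosa.load("dolphin_whistle.wav", sr=None)  # hypothetical clip

cqt = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr, n_bins=84)))
mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=84))

# Align time axes and stack into a (channels, freq, time) tensor
t = min(cqt.shape[1], mel.shape[1])
x = torch.tensor(np.stack([cqt[:, :t], mel[:, :t]])).unsqueeze(0).float()
print(x.shape)  # (1, 2, 84, T), ready for a two-channel CNN branch
```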

18 pages, 4885 KiB  
Article
Decoding Poultry Welfare from Sound—A Machine Learning Framework for Non-Invasive Acoustic Monitoring
by Venkatraman Manikandan and Suresh Neethirajan
Sensors 2025, 25(9), 2912; https://doi.org/10.3390/s25092912 - 5 May 2025
Cited by 2 | Viewed by 1348
Abstract
Acoustic monitoring presents a promising, non-invasive modality for assessing animal welfare in precision livestock farming. In poultry, vocalizations encode biologically relevant cues linked to health status, behavioral states, and environmental stress. This study proposes an integrated analytical framework that combines signal-level statistical analysis with machine learning and deep learning classifiers to interpret chicken vocalizations in a welfare assessment context. The framework was evaluated using three complementary datasets encompassing health-related vocalizations, behavioral call types, and stress-induced acoustic responses. The pipeline employs a multistage process comprising high-fidelity signal acquisition, feature extraction (e.g., mel-frequency cepstral coefficients, spectral contrast, zero-crossing rate), and classification using models including Random Forest, HistGradientBoosting, CatBoost, TabNet, and LSTM. Feature importance analysis and statistical tests (e.g., t-tests, correlation metrics) confirmed that specific MFCC bands and spectral descriptors were significantly associated with welfare indicators. LSTM-based temporal modeling revealed distinct acoustic trajectories under visual and auditory stress, supporting the presence of habituation and stressor-specific vocal adaptations over time. Model performance, validated through stratified cross-validation and multiple statistical metrics (e.g., F1-score, Matthews correlation coefficient), demonstrated high classification accuracy and generalizability. Importantly, the approach emphasizes model interpretability, facilitating alignment with known physiological and behavioral processes in poultry. The findings underscore the potential of acoustic sensing and interpretable AI as scalable, biologically grounded tools for real-time poultry welfare monitoring, contributing to the advancement of sustainable and ethical livestock production systems.
(This article belongs to the Special Issue Sensors in 2025)
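
The feature stage named in the abstract (MFCCs, spectral contrast, and zero-crossing rate feeding classifiers such as Random Forest) can be sketched as follows; the file names, labels, and mean/std summaries are placeholders, not the study's datasets.

```python
# Sketch: per-clip acoustic feature summaries feeding a Random Forest.
# Paths and labels are hypothetical placeholders.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def clip_features(path):
    y, sr = librosa.load(path, sr=22050)
    feats = [librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),
             librosa.feature.spectral_contrast(y=y, sr=sr),
             librosa.feature.zero_crossing_rate(y)]
    # Summarize each frame-level feature by its mean and std over time
    return np.concatenate([np.r_[f.mean(axis=1), f.std(axis=1)] for f in feats])

clips = ["healthy_01.wav", "distress_01.wav"]   # hypothetical files
labels = [0, 1]
X = np.vstack([clip_features(c) for c in clips])
clf = RandomForestClassifier(random_state=0)
# With a real dataset, evaluate with stratified cross-validation, e.g.:
# sklearn.model_selection.cross_val_score(clf, X, labels, cv=5, scoring="f1")
```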

20 pages, 2817 KiB  
Article
Escalate Prognosis of Parkinson’s Disease Employing Wavelet Features and Artificial Intelligence from Vowel Phonation
by Rumana Islam and Mohammed Tarique
BioMedInformatics 2025, 5(2), 23; https://doi.org/10.3390/biomedinformatics5020023 - 30 Apr 2025
Viewed by 1294
Abstract
Background: This work presents an artificial intelligence-based algorithm for detecting Parkinson’s disease (PD) from voice signals. The detection of PD at pre-symptomatic stages is imperative to slow disease progression. Speech signal processing-based PD detection can play a crucial role here, as it has been reported in the literature that PD affects the voice quality of patients at an early stage. Hence, speech samples can be used as biomarkers of PD, provided that suitable voice features and artificial intelligence algorithms are employed. Methods: Advanced signal-processing techniques are used to extract audio features from the sustained vowel ‘/a/’ sound. The extracted audio features include baseline features, intensities, formant frequencies, bandwidths, vocal fold parameters, and Mel-frequency cepstral coefficients (MFCCs) to form a feature vector. This feature vector is then enriched with wavelet-based features to form a second feature vector. For classification purposes, two popular machine learning models, namely support vector machine (SVM) and k-nearest neighbors (kNN), are trained to identify patients with PD. Results: The results demonstrate that the inclusion of wavelet-based voice features enhances the performance of both the SVM and kNN models for PD detection. However, kNN provides better accuracy, detection speed, training time, and misclassification cost than SVM. Conclusions: This work concludes that wavelet-based voice features are important for detecting neurodegenerative diseases like PD and can enhance the classification performance of machine learning models. It also concludes that kNN is preferable to SVM for the investigated voice features, whether or not the wavelet features are included.
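
As a rough illustration of the wavelet-enrichment idea, the sketch below appends discrete-wavelet-transform statistics of a sustained /a/ recording to a baseline MFCC vector before kNN classification; the wavelet family, decomposition level, and band statistics are assumptions, not the paper's exact feature set.

```python
# Sketch: enrich a baseline feature vector with DWT band statistics.
# Clip, wavelet ("db4"), and level are illustrative assumptions.
import librosa
import numpy as np
import pywt
from sklearn.neighbors import KNeighborsClassifier

y, sr = librosa.load("sustained_a.wav", sr=16000)   # hypothetical vowel clip

baseline = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

# 4-level DWT; keep simple energy/spread statistics of each band
coeffs = pywt.wavedec(y, "db4", level=4)
wavelet_feats = np.array([v for c in coeffs
                          for v in (np.log(np.sum(c**2) + 1e-12), c.std())])

features = np.concatenate([baseline, wavelet_feats])
print(features.shape)

# With feature matrices X_train/X_test built this way:
# KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).score(X_test, y_test)
```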

19 pages, 2225 KiB  
Article
A Bird Vocalization Classification Method Based on Bidirectional FBank with Enhanced Robustness
by Chizhou Peng, Yan Zhang, Jing Lu, Danjv Lv and Yanjiao Xiong
Appl. Sci. 2025, 15(9), 4913; https://doi.org/10.3390/app15094913 - 28 Apr 2025
Viewed by 409
Abstract
Recent advances in audio signal processing and pattern recognition have made the classification of bird vocalization a focus of bioacoustic research. However, the accurate classification of birdsongs is challenged by environmental noise and the limitations of traditional feature extraction methods. This study proposes the iWAVE-BiFBank method, an innovative approach combining improved wavelet adaptive denoising (iWAVE) and a bidirectional Mel-filter bank (BiFBank) for effective birdsong classification with enhanced robustness. The iWAVE method achieves adaptive optimization using the autocorrelation coefficient and peak-sum-ratio (PSR), overcoming the manual adjustment requirements and incompleteness of traditional methods. BiFBank combines FBank and inverse FBank (iFBank) to enhance feature representation. This fusion addresses the shortcomings of FBank and introduces novel transformation methods and filter designs to iFBank, with a focus on high-frequency components. The iWAVE-BiFBank method creates a robust feature set, which effectively reduces the noise of audio signals and captures both low- and high-frequency information. Experiments were conducted on a dataset of 16 bird species, and the proposed method was verified with a random forest (RF) classifier. The results show that iWAVE-BiFBank achieves an accuracy of 94.00%, with other indicators, including the F1 score, exceeding 93.00%, outperforming all other tested methods. Overall, the proposed method effectively reduces audio noise, comprehensively captures the characteristics of bird vocalization, and provides improved classification performance.
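
A rough stand-in for the pipeline, with loud caveats: the sketch uses standard universal-threshold wavelet denoising (iWAVE's PSR-based adaptive tuning is not reproduced), and the "inverse" high-frequency-oriented filter bank below is only a guessed illustration of the bidirectional idea, not the authors' iFBank design.

```python
# Sketch: wavelet-threshold denoising plus forward and flipped mel
# filter banks. The flipped bank is a guess at the "inverse" idea.
import librosa
import numpy as np
import pywt

y, sr = librosa.load("birdsong.wav", sr=22050)      # hypothetical clip

# Soft-threshold wavelet denoising with a universal threshold
coeffs = pywt.wavedec(y, "db8", level=5)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
thr = sigma * np.sqrt(2 * np.log(len(y)))
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
y_den = pywt.waverec(coeffs, "db8")[: len(y)]

# FBank: log mel filter-bank energies
power = np.abs(librosa.stft(y_den, n_fft=1024)) ** 2
mel_fb = librosa.filters.mel(sr=sr, n_fft=1024, n_mels=40)
fbank = np.log(mel_fb @ power + 1e-12)

# Guessed "inverse" bank: the same triangles flipped along frequency,
# so resolution is densest at high frequencies
ifbank = np.log(mel_fb[:, ::-1] @ power + 1e-12)
bifbank = np.vstack([fbank, ifbank])                # fused feature map
print(bifbank.shape)
```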

21 pages, 2798 KiB  
Article
High-Speed Videoendoscopy and Stiffness Mapping for AI-Assisted Glottic Lesion Differentiation
by Magdalena M. Pietrzak, Justyna Kałuża-Olszewska, Ewa Niebudek-Bogusz, Artur Klepaczko and Wioletta Pietruszewska
Cancers 2025, 17(8), 1376; https://doi.org/10.3390/cancers17081376 - 21 Apr 2025
Viewed by 478
Abstract
Objectives: This study evaluates the potential of high-speed videoendoscopy (HSV) in differentiating between benign and malignant glottic lesions, offering a non-invasive diagnostic tool for clinicians. Moreover, a new HSV-derived parameter has been proposed and implemented in the analysis for an objective assessment of vocal fold stiffness. Methods: HSV was conducted on 102 participants, including 21 normophonic individuals, 39 patients with benign vocal fold lesions, and 42 with glottic cancer. A laryngotopographic parameter describing vocal fold stiffness (SAI) and kymographic parameters describing amplitude, symmetry, and glottal dynamics were quantified. Statistical differences between groups were assessed using receiver operating characteristic (ROC) analysis, and lesion classification was performed using a machine learning model. Results: Univariate ROC analysis revealed that SAI (AUC = 0.91, 95% CI: 0.839–0.962) and weighted amplitude asymmetry (AUC = 0.92, 95% CI: 0.85–0.974) were highly effective in distinguishing normophonic participants from those with organic lesions (p < 0.01). Further multivariate analysis using machine learning models demonstrated improved accuracy, with the SVM classifier achieving an AUC of 0.93 for detecting organic lesions and 0.83 for distinguishing benign from malignant lesions. Conclusions: The study demonstrates the potential value of a parameter describing the pliability of the infiltrated vocal fold (SAI) as a non-invasive tool to support histopathological evaluation in laryngeal lesions, with machine learning models enhancing diagnostic performance.
(This article belongs to the Special Issue Application of Biostatistics in Cancer Research)
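
The evaluation strategy (univariate ROC analysis for a single parameter such as SAI, then a multivariate SVM) can be sketched with scikit-learn on synthetic placeholder arrays, since the clinical features are not public; only the cohort size is taken from the abstract.

```python
# Sketch: univariate ROC for one parameter, then a multivariate SVM.
# All arrays are synthetic placeholders, not the study's data.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 102                                   # matches the study's cohort size
y = rng.integers(0, 2, n)                 # 0 = normophonic, 1 = organic lesion
sai = rng.normal(loc=y * 1.5, scale=1.0)  # synthetic stiffness-like parameter
X = rng.normal(size=(n, 6)) + y[:, None]  # synthetic kymographic features

print("univariate AUC for SAI:", roc_auc_score(y, sai))

svm = make_pipeline(StandardScaler(), SVC(probability=True))
print("SVM ROC AUC (5-fold):",
      cross_val_score(svm, X, y, cv=5, scoring="roc_auc").mean())
```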

12 pages, 2593 KiB  
Article
Multiclass CNN Approach for Automatic Classification of Dolphin Vocalizations
by Francesco Di Nardo, Rocco De Marco, Daniel Li Veli, Laura Screpanti, Benedetta Castagna, Alessandro Lucchetti and David Scaradozzi
Sensors 2025, 25(8), 2499; https://doi.org/10.3390/s25082499 - 16 Apr 2025
Cited by 1 | Viewed by 871
Abstract
Monitoring dolphins in the open sea is essential for understanding their behavior and the impact of human activities on marine ecosystems. Passive Acoustic Monitoring (PAM) is a non-invasive technique for tracking dolphins, providing continuous data. This study presents a novel approach for classifying dolphin vocalizations from PAM recordings using a convolutional neural network (CNN). Four types of common bottlenose dolphin (Tursiops truncatus) vocalizations were identified from underwater recordings: whistles, echolocation clicks, burst pulse sounds, and feeding buzzes. To enhance classification performance, edge-detection filters were applied to spectrograms with the aim of removing unwanted noise components. A dataset of nearly 10,000 spectrograms was used to train and test the CNN through a 10-fold cross-validation procedure. The results showed that the CNN achieved an average accuracy of 95.2% and an F1-score of 87.8%. The class-specific results showed high accuracy for whistles (97.9%), followed by echolocation clicks (94.5%), feeding buzzes (94.0%), and burst pulse sounds (92.3%). The highest F1-score was obtained for whistles, exceeding 95%, while the other three vocalization types maintained an F1-score above 80%. This method provides a promising step toward improving the passive acoustic monitoring of dolphins, contributing to both species conservation and the mitigation of conflicts with fisheries.
(This article belongs to the Section Intelligent Sensors)
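
Two ingredients from the abstract (an edge-detection filter applied to spectrograms and a multiclass CNN) can be sketched as follows; the Sobel filter and the tiny architecture are generic placeholders, not the network from the paper.

```python
# Sketch: Sobel edge filtering of a spectrogram and a small 4-class CNN.
# The architecture is a generic placeholder.
import numpy as np
import torch
import torch.nn as nn
from scipy import ndimage

spec = np.random.rand(128, 128).astype(np.float32)   # placeholder spectrogram
edges = np.hypot(ndimage.sobel(spec, axis=0), ndimage.sobel(spec, axis=1))

class DolphinCNN(nn.Module):
    def __init__(self, n_classes=4):                 # whistle, click, burst, buzz
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 32 * 32, n_classes))
    def forward(self, x):
        return self.net(x)

x = torch.from_numpy(edges).unsqueeze(0).unsqueeze(0)  # (1, 1, 128, 128)
logits = DolphinCNN()(x)
print(logits.shape)                                    # (1, 4)
```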

25 pages, 647 KiB  
Article
Multiscale Sample Entropy-Based Feature Extraction with Gaussian Mixture Model for Detection and Classification of Blue Whale Vocalization
by Oluwaseyi Paul Babalola, Olayinka Olaolu Ogundile and Vipin Balyan
Entropy 2025, 27(4), 355; https://doi.org/10.3390/e27040355 - 28 Mar 2025
Viewed by 863
Abstract
A multiscale sample entropy (MSE) algorithm is presented as a time domain feature extraction method to study the vocal behavior of blue whales through continuous acoustic monitoring. Additionally, MSE is applied to the Gaussian mixture model (GMM) for blue whale call detection and classification. The performance of the proposed MSE-GMM algorithm is experimentally assessed and benchmarked against traditional methods, including principal component analysis (PCA), wavelet-based feature (WF) extraction, and dynamic mode decomposition (DMD), all combined with the GMM. This study utilizes recorded data from the Antarctic open source library. To improve the accuracy of classification models, a GMM-based feature selection method is proposed, which evaluates both positively and negatively correlated features while considering inter-feature correlations. The proposed method demonstrates enhanced performance over conventional PCA-GMM, DMD-GMM, and WF-GMM methods, achieving higher accuracy and lower error rates when classifying the non-stationary and complex vocalizations of blue whales.
(This article belongs to the Special Issue 25 Years of Sample Entropy)
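
For readers unfamiliar with the feature, here is a compact sketch of multiscale sample entropy with per-class Gaussian mixtures; the parameters (m, r, the scales) and the synthetic signals are common defaults and placeholders, not necessarily the paper's settings.

```python
# Sketch: coarse-grain the signal at several scales, compute sample
# entropy per scale, and fit per-class GMMs to the feature vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

def sample_entropy(x, m=2, r=0.2):
    x = np.asarray(x, dtype=float)
    r *= x.std()
    def count(mm):
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.max(np.abs(templ[:, None] - templ[None, :]), axis=2)
        return (np.sum(d <= r) - len(templ)) / 2      # exclude self-matches
    b, a = count(m), count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def mse(x, scales=range(1, 6)):
    # Non-overlapping coarse-graining, then sample entropy at each scale
    return np.array([sample_entropy(
        np.asarray(x[: len(x) // s * s]).reshape(-1, s).mean(axis=1))
        for s in scales])

# Per-class GMMs; classify a new call by maximum likelihood
calls = {0: np.random.randn(8, 1000), 1: np.random.randn(8, 1000)}  # placeholders
gmms = {c: GaussianMixture(n_components=2, random_state=0)
           .fit(np.vstack([mse(sig) for sig in sigs]))
        for c, sigs in calls.items()}
test = mse(np.random.randn(1000))
print(max(gmms, key=lambda c: gmms[c].score([test])))
```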

18 pages, 3228 KiB  
Article
Automatic Detection and Unsupervised Clustering-Based Classification of Cetacean Vocal Signals
by Yinian Liang, Yan Wang, Fangjiong Chen, Hua Yu, Fei Ji and Yankun Chen
Appl. Sci. 2025, 15(7), 3585; https://doi.org/10.3390/app15073585 - 25 Mar 2025
Cited by 1 | Viewed by 525
Abstract
In the ocean environment, passive acoustic monitoring (PAM) is an important technique for the surveillance of cetacean species. Manual detection in large amounts of PAM data is inefficient and time-consuming. To extract useful features from large amounts of PAM data for classifying different cetacean species, we propose an automatic detection and unsupervised clustering-based classification method for cetacean vocal signals. The proposed method overcomes the limitations of traditional fixed-threshold detection by setting the threshold adaptively according to the mean signal energy in each frame. We also address the high cost of data training and labeling in deep-learning-based methods by using an unsupervised clustering-based classification method. First, the automatic detection method extracts vocal signals from PAM data while removing clutter. Then, the vocal signals are analyzed for classification using a clustering algorithm. This method captures the acoustic characteristics of vocal signals and distinguishes them from environmental noise. We process 194 audio files totaling 25.3 h of recordings from two public marine mammal databases. Five kinds of vocal signals from different cetaceans are extracted and assembled into eight datasets for classification. Verification experiments were conducted with four clustering algorithms based on two performance metrics, and the results confirm the effectiveness of the proposed method. The proposed method automatically removes about 75% of the clutter from 1581.3 MB of audio data, retaining 75.75 MB of detected vocal signal features, and the four classical unsupervised clustering algorithms achieve an average accuracy of 84.83% on the assembled datasets.
(This article belongs to the Special Issue Machine Learning in Acoustic Signal Processing)
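
The detector can be sketched as frame-energy thresholding with a mean-based adaptive threshold followed by clustering of the detected frames; the multiplier k, the MFCC summary, and the cluster count are assumptions layered on the abstract's description.

```python
# Sketch: adaptive energy-threshold detection, then unsupervised
# clustering of the detected frames. k and n_clusters are assumptions.
import numpy as np
import librosa
from sklearn.cluster import KMeans

y, sr = librosa.load("pam_recording.wav", sr=None)   # hypothetical PAM file

frames = librosa.util.frame(y, frame_length=2048, hop_length=512)
energy = (frames ** 2).mean(axis=0)

k = 2.0                                              # assumed multiplier
threshold = k * energy.mean()                        # mean-based, data-driven
voiced = energy > threshold                          # detected vocal frames

# Cluster MFCC summaries of the detected frames into candidate call types
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=512)
X = mfcc[:, : len(voiced)][:, voiced].T
labels = KMeans(n_clusters=5, random_state=0, n_init=10).fit_predict(X)
print(np.bincount(labels))
```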

18 pages, 2018 KiB  
Article
Adapting a Large-Scale Transformer Model to Decode Chicken Vocalizations: A Non-Invasive AI Approach to Poultry Welfare
by Suresh Neethirajan
AI 2025, 6(4), 65; https://doi.org/10.3390/ai6040065 - 25 Mar 2025
Cited by 2 | Viewed by 1282
Abstract
Natural Language Processing (NLP) and advanced acoustic analysis have opened new avenues in animal welfare research by decoding the vocal signals of farm animals. This study explored the feasibility of adapting a large-scale Transformer-based model, OpenAI’s Whisper, originally developed for human speech recognition, to decode chicken vocalizations. Our primary objective was to determine whether Whisper could effectively identify acoustic patterns associated with emotional and physiological states in poultry, thereby enabling real-time, non-invasive welfare assessments. To achieve this, chicken vocal data were recorded under diverse experimental conditions, including healthy versus unhealthy birds, pre-stress versus post-stress scenarios, and quiet versus noisy environments. The audio recordings were processed through Whisper, producing text-like outputs. Although these outputs did not represent literal translations of chicken vocalizations into human language, they exhibited consistent patterns in token sequences and sentiment indicators strongly correlated with recognized poultry stressors and welfare conditions. Sentiment analysis using standard NLP tools (e.g., polarity scoring) identified notable shifts in “negative” and “positive” scores that corresponded closely with documented changes in vocal intensity associated with stress events and altered physiological states. Despite the inherent domain mismatch—given Whisper’s original training on human speech—the findings clearly demonstrate the model’s capability to reliably capture acoustic features significant to poultry welfare. Recognizing the limitations associated with applying English-oriented sentiment tools, this study proposes future multimodal validation frameworks incorporating physiological sensors and behavioral observations to further strengthen biological interpretability. To our knowledge, this work provides the first demonstration that Transformer-based architectures, even without species-specific fine-tuning, can effectively encode meaningful acoustic patterns from animal vocalizations, highlighting their transformative potential for advancing productivity, sustainability, and welfare practices in precision poultry farming.
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
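
Both stages (Whisper inference and off-the-shelf polarity scoring) are available through public libraries, so a hedged sketch is straightforward; the checkpoint size, file names, and the choice of TextBlob as the polarity tool are assumptions, and, as the abstract stresses, the outputs are token patterns rather than literal transcriptions.

```python
# Sketch: run audio through a Whisper checkpoint, then score the
# text-like output with a generic polarity tool. Names are placeholders.
from transformers import pipeline
from textblob import TextBlob

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

for clip in ["hen_prestress.wav", "hen_poststress.wav"]:  # hypothetical clips
    text = asr(clip)["text"]          # token sequence treated as a pattern,
                                      # not a real transcription
    polarity = TextBlob(text).sentiment.polarity
    print(clip, repr(text), f"polarity={polarity:+.2f}")
```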

20 pages, 3271 KiB  
Article
Fine-Tuned Machine Learning Classifiers for Diagnosing Parkinson’s Disease Using Vocal Characteristics: A Comparative Analysis
by Mehmet Meral, Ferdi Ozbilgin and Fatih Durmus
Diagnostics 2025, 15(5), 645; https://doi.org/10.3390/diagnostics15050645 - 6 Mar 2025
Viewed by 1343
Abstract
Background/Objectives: Early and precise diagnosis of Parkinson’s Disease (PD), which affects both motor and non-motor functions, is essential for better disease control and patient outcomes. This study assesses the effectiveness of machine learning algorithms optimized to classify PD based on vocal characteristics, to serve as a non-invasive and easily accessible diagnostic tool. Methods: This study used a publicly available dataset of vocal samples from 188 people with PD and 64 controls. Acoustic features such as baseline characteristics, time-frequency components, Mel Frequency Cepstral Coefficients (MFCCs), and wavelet transform-based metrics were extracted and analyzed. The Chi-Square test was used for feature selection to determine the most important attributes that enhanced the accuracy of the classification. Six machine learning classifiers, namely SVM, k-NN, DT, NN, Ensemble, and Stacking models, were developed and optimized via Bayesian Optimization (BO), Grid Search (GS), and Random Search (RS). Accuracy, precision, recall, F1-score, and AUC-ROC were used for evaluation. Results: Stacking models, especially those fine-tuned via Grid Search, yielded the best performance, with 92.07% accuracy and an F1-score of 0.95. In addition, selecting relevant vocal features in conjunction with the Chi-Square method greatly enhanced computational efficiency and classification performance. Conclusions: This study highlights the potential of combining advanced feature selection techniques with hyperparameter optimization strategies to enhance machine learning-based PD diagnosis using vocal characteristics. Ensemble models proved particularly effective in handling complex datasets, demonstrating robust diagnostic performance. Future research may focus on deep learning approaches and temporal feature integration to further improve diagnostic accuracy and scalability for clinical applications.
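
The abstract's ingredients (min-max scaling, which chi-square selection requires for non-negative inputs, Chi-Square feature ranking, and a grid-searched stacked classifier) combine naturally in a scikit-learn pipeline; the stand-in data, feature count, and parameter grid below are assumptions.

```python
# Sketch: Chi-Square feature selection inside a grid-searched stacking
# pipeline. Data are synthetic stand-ins sized like the cohort (188+64).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=252, n_features=50, random_state=0)

pipe = Pipeline([
    ("scale", MinMaxScaler()),          # chi2 needs non-negative inputs
    ("select", SelectKBest(chi2)),
    ("stack", StackingClassifier(
        estimators=[("svm", SVC()), ("knn", KNeighborsClassifier()),
                    ("dt", DecisionTreeClassifier())],
        final_estimator=LogisticRegression())),
])
grid = GridSearchCV(pipe, {"select__k": [10, 25, 50],
                           "stack__svm__C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, f"CV accuracy: {grid.best_score_:.3f}")
```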

21 pages, 12814 KiB  
Article
Multi-Scale Deep Feature Fusion with Machine Learning Classifier for Birdsong Classification
by Wei Li, Danju Lv, Yueyun Yu, Yan Zhang, Lianglian Gu, Ziqian Wang and Zhicheng Zhu
Appl. Sci. 2025, 15(4), 1885; https://doi.org/10.3390/app15041885 - 12 Feb 2025
Cited by 1 | Viewed by 1278
Abstract
Birds are significant bioindicators in the assessment of habitat biodiversity, ecological impacts and ecosystem health. As bird vocalization data become easier to acquire, and with deep learning and machine learning as technical support, the search for recognition and classification networks suited to bird calls has become a focus of bioacoustics research. Because the spectral differences among bird calls are much greater than those between human languages, birdsong classification networks built directly on human speech recognition architectures do not yield satisfactory results. Effectively capturing the differences in birdsong across species is a crucial factor in improving recognition accuracy. To address these feature differences, this study proposes multi-scale deep features and separates classification from the deep network, using machine learning classifiers to accommodate the distinct feature differences in birdsong. We validate the effectiveness of multi-scale deep features on a publicly available dataset of 20 bird species. The experimental results show that the accuracy of the multi-scale deep features on a log-wavelet spectrum, log-Mel spectrum and log-power spectrum reaches 94.04%, 97.81% and 95.89%, respectively, achieving an improvement over single-scale deep features on these three spectrograms. Comparative experimental results show that the proposed multi-scale deep feature method is superior to five state-of-the-art birdsong identification methods, which provides new perspectives and tools for birdsong identification research, and is of great significance for ecological monitoring, biodiversity conservation and forest research.
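
One plausible reading of multi-scale deep features, separated from the classifier as the abstract describes, is pooling activations from several depths of a pretrained CNN over a spectrogram image; the ResNet-18 backbone and layer choices below are assumptions, not the paper's network.

```python
# Sketch: concatenate globally pooled activations from several network
# depths, then train a classical classifier on the vectors.
import torch
import torchvision.models as models

backbone = models.resnet18(weights="IMAGENET1K_V1").eval()

def multiscale_features(spec_img):                  # (1, 3, H, W) tensor
    feats, x = [], spec_img
    x = backbone.conv1(x); x = backbone.bn1(x)
    x = backbone.relu(x); x = backbone.maxpool(x)
    for stage in [backbone.layer1, backbone.layer2,
                  backbone.layer3, backbone.layer4]:
        x = stage(x)
        feats.append(x.mean(dim=(2, 3)))            # global-average-pool each scale
    return torch.cat(feats, dim=1)                  # (1, 64+128+256+512)

with torch.no_grad():
    v = multiscale_features(torch.rand(1, 3, 224, 224))  # stand-in spectrogram
print(v.shape)                                       # (1, 960)
# Vectors from many clips would then train, e.g., an SVM or random forest.
```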
