Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations
Abstract
1. Introduction
- We demonstrate not only that it is possible, with caution, to use transfer learning with single pretrained CNNs on the small datasets available in this problem domain, but also that it is valuable to build ensembles of these deep learners.
- We compare the performance of many state-of-the-art audio representations. We also evaluate the LM spectrogram and the Stockwell transform, to the best of our knowledge for the first time, on a bioacoustic problem.
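As a hedged sketch of the ensemble idea described above (not the paper's exact fusion pipeline), per-class scores from several CNNs can be combined with a simple sum rule; the two score matrices below are hypothetical stand-ins for pretrained-network outputs:

```python
import numpy as np

def softmax(z, axis=-1):
    """Convert raw network scores into per-class probabilities."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sum_rule_ensemble(score_sets):
    """Average per-class probabilities across networks, then pick argmax."""
    probs = [softmax(s) for s in score_sets]
    fused = np.mean(probs, axis=0)
    return fused.argmax(axis=-1)

# Hypothetical scores from two networks for 3 samples over 4 lion classes:
net_a = np.array([[2.0, 0.1, 0.3, 0.1],
                  [0.2, 1.5, 0.3, 0.4],
                  [0.1, 0.2, 0.2, 1.9]])
net_b = np.array([[1.8, 0.4, 0.2, 0.2],
                  [0.9, 1.1, 0.1, 0.2],
                  [0.3, 0.1, 2.2, 1.0]])
print(sum_rule_ensemble([net_a, net_b]))  # → [0 1 3]
```

Score-level fusion keeps each network independent, so a new representation can be added to the ensemble without retraining the others.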
2. Related Work in Bioacoustic Classification
2.1. Convolutional Neural Networks (CNNs)
2.2. Visual Audio Representations
3. Materials and Methods
3.1. Overview of the System
3.2. Pretrained CNNs
3.2.1. AlexNet
3.2.2. VGG-16
3.2.3. ResNet-50
3.3. Audio Representations
3.3.1. Spectrogram
3.3.2. Mel Spectrogram
3.3.3. LM, L2M, L3M
3.3.4. MFCC
3.3.5. Stockwell
3.3.6. VGGish Features
4. Data Collection and Cross-Validation Techniques
4.1. Data Collection
4.2. Cross-Validation Dataset Design
- Day: each test set consists of the full-throated samples collected on a single day, with all remaining samples making up the training set. The Day dataset is thus a 20-fold dataset.
- Bout: each test set is a single bout of roars (between 1 and 3 samples), with the rest becoming the training set. Bout is thus a larger design, with around 74 folds.
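Both fold designs are instances of leave-one-group-out cross-validation, with either the recording day or the bout as the group. A minimal sketch (the day labels and sample counts are hypothetical):

```python
import numpy as np

def leave_one_group_out(groups):
    """Yield (train_idx, test_idx) pairs: each fold holds out one group."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield np.where(groups != g)[0], np.where(groups == g)[0]

# Hypothetical recording days for 7 roar samples:
days = ["d1", "d1", "d2", "d2", "d2", "d3", "d3"]
folds = list(leave_one_group_out(days))
print(len(folds))  # → 3 (one fold per day)
```

Grouping by day (or bout) keeps correlated samples out of the training set of their own fold, which gives a more honest estimate than sample-level shuffling.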
4.3. Equal Error Rate (EER) Dataset Design
5. Experimental Results
- Min-max: the pattern is scaled into the range [0, 1] with this formula: x' = (x − min(x)) / (max(x) − min(x));
- db: the pattern is compressed logarithmically; the formula for this adjustment is x' = 10 · log10(x);
- box_n (box normalize): same as min-max, except that the pattern is scaled into the range [−1, 1]: x' = 2 · (x − min(x)) / (max(x) − min(x)) − 1.
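A minimal sketch of the three adjustments, assuming the standard definitions; the 10·log10 constant in `to_db` is an assumption (a 20·log10 amplitude convention is equally common):

```python
import numpy as np

def min_max(x):
    """Scale a pattern into [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def to_db(x, eps=1e-10):
    """Logarithmic (decibel-style) compression of a power pattern."""
    x = np.asarray(x, dtype=float)
    return 10.0 * np.log10(np.maximum(x, eps))

def box_n(x):
    """Same as min-max, but scaled into [-1, 1]."""
    return 2.0 * min_max(x) - 1.0

p = np.array([0.0, 5.0, 10.0])
print(min_max(p))  # scales to 0, 0.5, 1
print(box_n(p))    # scales to -1, 0, 1
```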
EER Dataset Validation
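Under the EER design, verification performance is summarized by the equal error rate: the operating point where the false-acceptance and false-rejection rates coincide. A hedged numpy sketch with hypothetical genuine/impostor score lists:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Scan candidate thresholds and return the (approximate) EER at the
    point where false-accept and false-reject rates are closest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = (2.0, None)  # (|FAR - FRR|, EER estimate)
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors accepted
        frr = np.mean(genuine < t)    # genuines rejected
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]

genuine = np.array([0.9, 0.8, 0.7, 0.6])
impostor = np.array([0.4, 0.3, 0.2, 0.1])
print(equal_error_rate(genuine, impostor))  # perfectly separated → 0.0
```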
6. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
List of Acronyms
| Acronym | Full Term |
|---|---|
| CDCN | Convolutional Deep Clustering Neural Network |
| CNN | Convolutional Neural Network |
| EER | Equal Error Rate |
| HMM | Hidden Markov Model |
| K-NN | K-Nearest Neighbors |
| LOOCV | Leave-One-Out Cross-Validation |
| MFC | Mel-Frequency Cepstrum |
| MFCCs | Mel-Frequency Cepstral Coefficients |
| PAM | Passive Acoustic Monitoring |
| ROC | Receiver Operating Characteristic |
| SVM | Support Vector Machine |
| Acronym | Explanation |
|---|---|
| Bout Dataset | Each test set is a single bout of roars, between 1 and 3 samples (see Section 4.2) |
| box_n | Adjustment to a representation: scaling into [−1, 1] |
| Day Dataset | Each test set is the full-throated samples collected on a single day (see Section 4.2) |
| EER Day | Equal Error Rate Day dataset design (see Section 4.3) |
| EER Bout | Equal Error Rate Bout dataset design (see Section 4.3) |
| db | Adjustment to a representation: logarithmic (decibel) scaling |
| LM and L2M | Features derived from the Mel spectrogram (see Section 3.3.3) |
| Min-max | Adjustment to a representation: scaling into [0, 1] |
| S | Spectrogram |
Network and Feature | Day Accuracy | Bout Accuracy |
---|---|---|
VGG16 LM S | 95.67% | 98.9% |
VGG16 dB LM S | 94.50% | 100% |
AlexNet custom LM S | 93.66% | 96.8% |
ResNet50 S | 91.97% | 97.6% |
VGG16 S | 91.10% | 97.2% |
ResNet50 min-max S | 90.59% | 95.9% |
AlexNet box_n Mel S | 90.17% | 96.3% |
VGG16 MFCC | 90.10% | 94.8% |
VGG16 L2M S | 89.78% | 94.6% |
VGG16 min-max S | 89.69% | 97.6% |
Network 1 | Network 2 | Day Accuracy | Bout Accuracy |
---|---|---|---|
VGG16 MFCC | VGG16 LM S | 97.64% | 99.3% |
ResNet50 S | AlexNet LM S | 97.61% | 98.7% |
ResNet50 S | VGG16 LM S | 97.61% | 99.4% |
ResNet50 S | VGG16 dB LM S | 97.45% | 99.4% |
ResNet50 min–max S | VGG16 LM S | 97.31% | 100% |
AlexNet min–max scaled S | VGG16 LM S | 97.06% | 100% |
ResNet50 min–max scaled S | VGG16 LM S | 96.90% | 99.1% |
ResNet50 Mel S | VGG16 LM S | 96.78% | 99.5% |
AlexNet dB LM S | VGG16 LM S | 96.78% | 99.5% |
VGG16 min-max scaled S | VGG16 LM S | 96.68% | 97.6% |
Network 1 | Network 2 | Network 3 | Day Accuracy | Bout Accuracy |
---|---|---|---|---|
VGG16 min–max S | AlexNet LM S | VGG16 LM S | 98.67% | 100% |
ResNet50 S | ResNet50 Mel S | VGG16 dB LM S | 98.42% | 100% |
ResNet50 min–max S | AlexNet LM S | VGG16 dB LM S | 98.42% | 100% |
ResNet50 S | VGG16 MFCC | VGG16 LM S | 98.42% | 100% |
ResNet50 S | VGG16 dB LM S | VGG16 LM S | 98.19% | 100% |
AlexNet min–max Mel S | VGG16 dB LM S | VGG16 LM S | 98.17% | 100% |
VGG16 dB LM S | AlexNet box_n Mel S | VGG16 LM S | 98.17% | 100% |
ResNet50 S | AlexNet VGGish | VGG16 dB LM S | 98.14% | 100% |
ResNet50 S | AlexNet LM S | VGG16 LM S | 98.14% | 100% |
ResNet50 min–max S | AlexNet dB LM S | VGG16 LM S | 98.14% | 100% |
Network and Feature | Day EER (%) | Bout EER (%) |
---|---|---|
VGG16 LM S | 5.62 | 0.53 |
VGG16 dB LM S | 3.67 | 0.69 |
AlexNet LM S | 4.87 | 2.38 |
ResNet50 S | 6.97 | 1.92 |
VGG16 S | 7.27 | 2.38 |
ResNet50 min–max S | 7.87 | 1.84 |
AlexNet box_n Mel S | 3.82 | 2.99 |
VGG16 MFCC | 11.6 | 6.07 |
VGG16 L2M Mel S | 9.82 | 6.07 |
VGG16 min-max S | 7.49 | 2.45 |
Network 1 | Network 2 | Day EER (%) | Bout EER (%) |
---|---|---|---|
VGG16 MFCC | VGG16 LM S | 6.82 | 1.53 |
ResNet50 S | AlexNet LM S | 2.69 | 0.53 |
ResNet50 S | VGG16 LM S | 5.02 | 0.53 |
ResNet50 S | VGG16 dB LM S | 4.34 | 0 |
ResNet50 min–max S | VGG16 LM S | 2.47 | 0.53 |
AlexNet min–max S | VGG16 LM S | 5.47 | 0.53 |
ResNet50 min–max S | VGG16 LM S | 4.20 | 0.53 |
ResNet50 Mel S | VGG16 LM S | 4.49 | 0.53 |
AlexNet dB LM S | VGG16 LM S | 6.07 | 1.15 |
VGG16 min–max S | VGG16 LM S | 6.75 | 0.69 |
Network 1 | Network 2 | Network 3 | Day EER (%) | Bout EER (%) |
---|---|---|---|---|
VGG16 min–max S | AlexNet custom LM S | VGG16 LM S | 3.00 | 0.53 |
ResNet50 S | ResNet50 Mel S | VGG16 dB LM S | 5.09 | 0 |
ResNet50 min–max S | AlexNet LM S | VGG16 LM S | 3.00 | 0.53 |
ResNet50 S | VGG16 MFCC | VGG16 LM S | 5.47 | 0.61 |
ResNet50 S | VGG16 dB LM S | VGG16 LM S | 3.89 | 0.07 |
AlexNet min–max Mel S | VGG16 dB LM S | VGG16 LM S | 3.60 | 0.53 |
VGG16 dB LM S | AlexNet box_n Mel S | VGG16 LM S | 2.40 | 0.15 |
ResNet50 S | AlexNet VGGish | VGG16 dB LM S | 6.22 | 0.07 |
ResNet50 S | AlexNet LM S | VGG16 LM S | 2.47 | 0.23 |
ResNet50 min–max S | AlexNet dB LM S | VGG16 LM S | 3.29 | 0.53 |
Network and Feature | EER Bout |
---|---|
VGG19 LM S | 98.9% |
ResNet101 LM S | 97.6% |
MobileNetV2 LM S | 96.3% |
Network | Classification Time (s) |
---|---|
AlexNet | 0.148 |
ResNet50 | 0.299 |
VGG16 | 0.688 |
Representation | Computation Time (s) |
---|---|
Spectrograms | 0.015 |
MFCC | 0.009 |
Stockwell | 0.340 |
VGGish | 0.015 |
Mel spectrogram | 0.055 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Trapanotto, M.; Nanni, L.; Brahnam, S.; Guo, X. Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations. J. Imaging 2022, 8, 96. https://doi.org/10.3390/jimaging8040096