Benchmarking an Integrated Deep Learning Pipeline for Robust Detection and Individual Counting of the Greater Caribbean Manatee
Abstract
1. Introduction
Related Work
2. Materials and Methods
2.1. Training and Testing DBs
2.2. Computational Infrastructure
2.3. Feature Extraction and Pre-Processing
2.4. Data Mining and Augmentation
2.5. Model Building and Configuration
2.6. Model Training, Evaluation, and Cross-Validation
2.7. Model Inference and MIR-FE
2.8. Unsupervised Learning
3. Results
3.1. Acoustic Data Processing
3.2. Model Benchmarking
3.3. Manatee Call Detection
3.4. Individual Manatee Count
4. Discussion
Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References

| CPUs | Memory | GPUs | OS |
|---|---|---|---|
| 2× Intel Xeon Silver 4214R @ 2.40 GHz | 31 GiB | 1× Tesla V100-PCIE-32GB | Linux 3.10.0-64bit |
| 1× Intel Xeon Silver 4416+ @ 2.00 GHz | 256 GiB | 1× NVIDIA L40S-48GB | Linux 5.14.0-64bit |

| Model | Duration [s] | Accuracy | Loss | AUC-ROC |
|---|---|---|---|---|
| CNN | 1.80 | 98.01% | 6.05% | 97.97% |
| VGG-16 | 2.85 | 98.72% | 4.58% | 98.59% |
| VGG-19 | 5.04 | 98.46% | 5.32% | 98.33% |

| Model | Class | Precision | Recall | F1-Score |
|---|---|---|---|---|
| CNN | false vocalization | 98.12% | 98.91% | 98.52% |
| CNN | true vocalization | 97.83% | 96.28% | 97.04% |
| VGG-16 | false vocalization | 98.98% | 99.11% | 99.04% |
| VGG-16 | true vocalization | 98.21% | 97.94% | 98.08% |
| VGG-19 | false vocalization | 98.66% | 99.01% | 98.83% |
| VGG-19 | true vocalization | 98.00% | 97.32% | 97.66% |
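
The F1-Score column in the per-class table above can be cross-checked against the Precision and Recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the dictionary layout are illustrative choices, not taken from the article. Because the table reports precision and recall rounded to two decimals, the recomputed values should only be expected to agree to within about 0.01 percentage points.

```python
# Precision, recall, and F1 (all in %) copied from the per-class table.
reported = {
    ("CNN", "false vocalization"): (98.12, 98.91, 98.52),
    ("CNN", "true vocalization"): (97.83, 96.28, 97.04),
    ("VGG-16", "false vocalization"): (98.98, 99.11, 99.04),
    ("VGG-16", "true vocalization"): (98.21, 97.94, 98.08),
    ("VGG-19", "false vocalization"): (98.66, 99.01, 98.83),
    ("VGG-19", "true vocalization"): (98.00, 97.32, 97.66),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for every (model, class) pair from the rounded table values.
recomputed = {key: f1_score(p, r) for key, (p, r, _) in reported.items()}

# Largest absolute difference between recomputed and reported F1.
max_gap = max(abs(recomputed[key] - reported[key][2]) for key in reported)

for (model, cls), value in recomputed.items():
    print(f"{model:7s} {cls:19s} F1 = {value:.2f}%")
print(f"largest gap to the reported F1: {max_gap:.4f} percentage points")
```

Under this definition, all six reported F1 values are reproduced up to rounding of the inputs.
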

| Sample | [ | Duration [ | WAV-FE [ | MCD [ |
|---|---|---|---|---|
|  | 96 | 20.37 | 15.98 | 1.35 |
|  | 48 | 18.71 | 12.14 | 0.13 |
|  | 96 | 34.08 | 21.71 | 0.18 |
|  | 96 | 7.39 | 5.07 | 0.12 |
|  | 96 | 17.36 | 12.35 | 0.13 |
|  | 48 | 13.07 | 9.77 | 0.12 |
|  | 48 | 16.35 | 10.58 | 0.13 |
|  | 48 | 3.78 | 2.56 | 0.12 |
|  | 96 | 59.36 | 38.26 | 0.17 |
|  | 96 | 57.32 | 36.72 | 0.18 |

| Sample | Expected | Predicted | Valid | Error |
|---|---|---|---|---|
|  | 9 | 10 | 9 | 0.11 |
|  | 6 | 7 | 7 | 0.17 |
|  | 6 | 17 | 16 | 1.83 |
|  | 5 | 5 | 5 | 0.00 |
|  | 4 | 7 | 7 | 0.75 |
|  | 4 | 5 | 4 | 0.25 |
|  | 3 | 9 | 3 | 2.00 |
|  | 2 | 1 | 1 | 0.50 |
|  | 16 | 22 | 22 | 0.38 |
|  | 14 | 17 | 16 | 0.21 |
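
The Error column of the count table above is not defined in the surviving text. Its values are numerically consistent with a relative absolute error of the predicted count against the expected count, |Predicted − Expected| / Expected. The sketch below checks that reading row by row; the formula is an inference from the numbers rather than a definition given by the authors, and the function name `relative_count_error` is hypothetical. The Valid column is carried along but not used in the check.

```python
# Rows copied from the count table: (expected, predicted, valid, reported_error).
rows = [
    (9, 10, 9, 0.11),
    (6, 7, 7, 0.17),
    (6, 17, 16, 1.83),
    (5, 5, 5, 0.00),
    (4, 7, 7, 0.75),
    (4, 5, 4, 0.25),
    (3, 9, 3, 2.00),
    (2, 1, 1, 0.50),
    (16, 22, 22, 0.38),
    (14, 17, 16, 0.21),
]


def relative_count_error(expected, predicted):
    """Assumed metric: absolute count difference relative to the expected count."""
    return abs(predicted - expected) / expected


# Recompute the error for each row and compare it with the reported value.
errors = [relative_count_error(expected, predicted) for expected, predicted, _, _ in rows]
max_gap = max(abs(err - row[3]) for err, row in zip(errors, rows))

for (expected, predicted, _, reported), err in zip(rows, errors):
    print(f"expected={expected:2d} predicted={predicted:2d} "
          f"recomputed={err:.3f} reported={reported:.2f}")
print(f"largest gap to the reported error: {max_gap:.4f}")
```

Every row agrees with the reported value to within two-decimal rounding, which supports this reading without confirming it.
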

| Model | Duration [ | Accuracy (Std) | Loss (Std) |
|---|---|---|---|
| CNN | 27.00 | 97.93% (±0.08%) | 5.78% (±0.14%) |
| VGG-16 | 35.13 | 98.92% (±0.08%) | 4.06% (±0.08%) |
| VGG-19 | 45.22 | 98.55% (±0.08%) | 4.94% (±0.13%) |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Quirós-Corella, F.; Rycyk, A.; Brady, B.; Cubero-Pardo, P. Benchmarking an Integrated Deep Learning Pipeline for Robust Detection and Individual Counting of the Greater Caribbean Manatee. Appl. Sci. 2026, 16, 2446. https://doi.org/10.3390/app16052446

