Article

Dolphin Health Classifications from Whistle Features

1 Naval Information Warfare Center Pacific, 53560 Hull Street, San Diego, CA 92152, USA
2 National Marine Mammal Foundation, 2240 Shelter Island Dr, San Diego, CA 92106, USA
3 University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
4 San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA
5 Stanford University, 450 Jane Stanford Way, Stanford, CA 94305, USA
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(12), 2158; https://doi.org/10.3390/jmse12122158
Submission received: 13 August 2024 / Revised: 17 November 2024 / Accepted: 21 November 2024 / Published: 26 November 2024
(This article belongs to the Special Issue Recent Advances in Marine Bioacoustics)

Abstract

Bottlenose dolphins often conceal behavioral signs of illness until they reach an advanced stage. Motivated by the efficacy of vocal biomarkers in human health diagnostics, we utilized supervised machine learning methods to assess various model architectures’ effectiveness in classifying dolphin health status from the acoustic features of their whistles. A gradient boosting classifier achieved a 72.3% accuracy in distinguishing between normal and abnormal health states, a significant improvement over chance (permutation test; 1000 iterations, p < 0.001). The model was trained on 30,693 whistles from 15 dolphins, and the test set (15%) totaled 3612 ‘normal’ and 1775 ‘abnormal’ whistles. The classifier identified the health status of the dolphin from the whistle features with 72.3% accuracy, 73.2% recall, 56.1% precision, and a 63.5% F1 score. These findings suggest the encoding of internal health information within dolphin whistle features, with indications that the severity of illness correlates with classification accuracy, notably in the classifier’s success in identifying ‘critical’ cases (94.2%). The successful development of this diagnostic tool holds promise for furnishing a passive, non-invasive, and cost-effective means for early disease detection in bottlenose dolphins.

1. Introduction

Bottlenose dolphins are stoic mammals, often not exhibiting signs or symptoms of disease until the disease is far progressed [1]. For dolphins in professional care, timely diagnosis is imperative, as delays can lead to critical illness and considerable healthcare costs. One of the fastest-growing biotechnology sectors is the development of classifiers of health status and/or disease from features of the human voice [2,3]. Leveraging machine and deep learning strategies, researchers have developed impressive models that can classify diseases such as respiratory infection [4,5,6,7,8], dementia and Alzheimer’s disease [2,9,10,11,12], Parkinson’s disease [13,14,15,16,17], depression [18,19,20,21,22,23,24,25,26], suicide risk [22,27,28], and even cardiac failure [29], all from features of the voice (i.e., vocal biomarkers).
Acoustic data classification for marine mammals has historically been performed manually by human analysts, either by listening to the audio directly or by visually inspecting spectrograms (i.e., visual representations of the sound with time, frequency, and amplitude represented on the x-axis, y-axis, and as color intensity, respectively). These methods are time- and labor-intensive, which presents a great opportunity for applications of machine learning, both for cost/time reduction and improved classification in the acoustic domain [30,31,32,33,34,35,36]. Supervised machine learning is a method where human analysts provide labeled data (i.e., a dataset with known classification values) to an algorithm for it to iteratively learn the optimal strategy for classifying the desired output from features of the inputs [37,38]. Once the model is trained on the labeled dataset, a test set (i.e., new data that it was not trained on) is used to evaluate the model’s performance [35]. Machine learning for image classification has far outpaced that of audio classification to date. This computer vision bias has led to audio classification techniques largely relying on the image classification of spectrograms [39,40]. More recently, though, there has been a push towards developing workflow pipelines for classifying from 1D time-series data and/or acoustic features [7,26,41,42,43]. Both machine learning and deep learning strategies have been developed for improving the automated detection of dolphin species from acoustic recordings [44,45,46]. Roch et al. [47] used Gaussian Mixture Models (GMMs) with feature vectors from a combination of clicks, burst pulses, and tonal sounds to classify acoustic recordings of free-ranging delphinids to species. Frasier et al. [48] had moderate success at classifying whistles of three delphinid species by subunits of whistle emissions. More recent work by these authors found success in using deep learning strategies for species recognition from echolocation clicks [49]. Jiang et al. [50] successfully distinguished the whistles of killer whales from those of short-finned pilot whales (accuracy = 95%) in a binary classification task, but the application of machine learning to dolphin whistle classification tasks has been relatively sparse to date.
In the agriculture domain, acoustic monitoring has been used to detect and classify changes in the health and welfare of livestock [51,52,53,54,55,56]. Machine learning has recently been applied to automate these classifications. da Silva et al. [57] were able to identify pain with 98% accuracy, distinguishing it from other welfare statuses (e.g., hunger, thirst, cold, normal) in piglets, using 20 features extracted from their squeals and a decision tree algorithm. Sadeghi et al. [58] similarly trained a support vector machine to diagnose chickens with or without infectious bronchitis based on 23 features extracted from the time series of the chickens’ vocalizations. Machine learning strategies have been increasingly applied to the detection and classification of marine mammal species from passive acoustic monitoring systems [36,59,60,61,62,63]. These advanced methodologies have yet to be applied to the classification of the health status of marine mammals.
Bottlenose dolphins produce whistles to communicate with conspecifics [64]. They produce multiple whistle types, most commonly their individually distinctive signature whistle [65]. Shared whistle contours are also commonly produced by affiliated conspecifics [66,67,68,69]. Whistles are produced by moving pressurized air from the nasal sacs past the phonic lips located just underneath the dolphin’s blowhole [70,71,72,73]. This complex sound production system presents multiple opportunities for changes or defects in physiology and arousal to affect the control and consistency of these output signals. A number of studies have reported that dolphins changed the frequency and intensity of their whistle emissions during times of distress [64,65,74,75,76,77,78]. Janik et al. [79] further showed that nine out of fourteen measured whistle features differed depending on whether the dolphin was alone or interacting with group mates. These results suggested that although a whistle contour may remain fairly consistent over time, subtle feature changes may communicate additional information.
Growing interest in utilizing acoustic monitoring for the welfare of animals, both in professional care and in the wild, has been driven by recent studies that highlight the benefits of these non-invasive techniques and the knowledge gaps they can fill with expansive datasets [80,81,82]. For example, monitoring changes to historically recorded whistle rates for a population of dolphins can alert researchers to moments of distress or identify indicators of positive welfare [80]. While the Welfare Acoustic Monitoring System (WAMS) previously developed by our team has interesting applications for group monitoring of whistle behavior, an innovative solution for the identification of changes in health status outside of severe distress was still needed. Inspired by the success of human vocal biomarkers for the early detection and classification of illness, we hypothesized that supervised machine learning strategies would be able to classify a dolphin’s health status from features of their whistles better than chance. Successful development of this innovative diagnostic tool has the potential to provide an unobtrusive, passive, non-invasive, and inexpensive tool for early detection of disease states in the Navy’s bottlenose dolphins.

2. Materials and Methods

The U.S. Navy Marine Mammal Program (MMP) houses and cares for a population of dolphins in San Diego Bay, CA. The MMP is AAALAC-accredited and adheres to the national standards of the United States Public Health Service Policy on the Humane Care and Use of Laboratory Animals and the Animal Welfare Act. The MMP’s animal care and use program is routinely reviewed by an institutional animal care and use committee (IACUC) and the Navy Bureau of Medicine and Surgery (BUMED). BUMED concurred with the approval of MMP IACUC protocol #143-2021 and assigned NRD#1264 to the protocol for this study.

2.1. Acoustic Data

The focal dolphins (N = 15) were primarily housed in natural seawater enclosures with other dolphins of the same sex. Individual dolphin whistles were recorded during periods when the dolphins were isolated from conspecifics while free swimming in an above-ground pool as part of their standard training procedures. For this group of dolphins, this is a common occurrence in order to maintain their transport and travel behaviors for their Navy activities. Acoustic recordings from above-ground pools were collected and analyzed over the period from 20 September 2018 to 18 July 2024. The signature whistle of each dolphin was identified as the most commonly produced whistle contour when the dolphin was isolated from conspecifics [65]. A vocal catalogue for each dolphin detailing the signature whistle, shared whistles [67], and non-signature whistles is maintained by the Sound and Health team at the MMP.
Additionally, because time spent in above-ground pools is relatively infrequent, each dolphin’s whistle was ‘captured’ (i.e., the dolphin was trained to respond with a whistle when cued with a discriminative stimulus, in this case a hand signal) in order to ensure weekly whistle recordings from each dolphin. In most cases, except for two individuals, this was their signature whistle. For those two, the whistle contour produced on cue was accepted as their whistle. Once a week, the focal dolphins each participated in ‘acoustic physicals’. The ‘acoustic physical’ behavior consisted of the dolphin submerging its blowhole below the surface of the water while a hydrophone was placed ~1 m underwater and about 1 m from the dolphin’s melon. The animal was stationed facing the hydrophone and asked to emit its whistle five times for a variable fish reward. This data collection setup intentionally included both freely produced whistles and elicited whistles, which is similar to data collection strategies for identifying vocal biomarkers in the human voice [2]. All whistle types recorded were included in the data analyses for all dolphins.
All of the whistles were recorded with one of the following systems: (1) a self-calibrating, single-channel Soundtrap (Ocean Instruments) with a bandwidth of 20 Hz to 150 kHz and a 192 kHz sampling rate; (2) a self-calibrating, four-channel Soundtrap (i.e., a self-contained, waterproof recording and digitizing device with inputs for four external hydrophones) with a 20 Hz to 90 kHz bandwidth and a 192 kHz sampling rate, with a High Tech Inc. high-frequency hydrophone (2 Hz to 125 kHz frequency response) plugged into one of the four channel inputs; or (3) a High Tech Inc. (Long Beach, MS, USA) high-frequency hydrophone (2 Hz to 125 kHz, ±3 dB) connected to a Behringer UMC202HD sound digitizing card with real-time communication to a Panasonic CF-31 Toughbook laptop via a USB interface. Recordings were analyzed in Raven Pro 1.6 (window type: Hann, 2100 samples; DFT: 4096 samples; hop size: 1050 samples; 50 percent overlap; frequency y-axis: 0–50.0 kHz; time x-axis: 0–5.0 s). A researcher identified and boxed whistle emissions using the selection tool and subsequently exported each whistle as an individual .wav file (16 bit; 96 kHz sampling rate).

2.2. Health Data

The MMP maintains a comprehensive medical record database for all Navy dolphins. Operational definitions of ‘normal’ and ‘abnormal’ health status were developed with on-site veterinarians (Table 1). Dolphins were considered to have a ‘normal’ health status when they were exhibiting normal behavior as per the veterinary and training staff, with no clinical signs consistent with abnormal health (e.g., refusing to eat, lethargy or decreased energy, abnormal body position in the water column, signs of discomfort or trauma, etc.). Additional inclusion criteria for a ‘normal’ health status were that the dolphin was not on any of the described medication types (Table 1) and had normal routine bloodwork (complete blood count, biochemistry, erythrocyte sedimentation rate, and fibrinogen) within 48 h of the date of the recorded whistle (MMP in-house reference ranges). Hereafter, any reference to ‘normal whistles’ is defined as a whistle recorded from a dolphin that was confirmed to be experiencing a ‘normal’ health condition based on these conditions. Dolphins were considered to have an ‘abnormal’ health status if any queried medical records were consistent with the listed abnormal conditions (Table 1). A label of 0 for normal or 1 for abnormal was assigned to each dolphin on each recording date. These were then assigned as the labels for any whistle recorded from that dolphin on that date. If more than one abnormal condition was identified on a recording date, the primary veterinarian for that dolphin advised what the primary case concern was (i.e., gastrointestinal, infection, or critical). Hereafter, any reference to ‘abnormal whistles’ is defined as a whistle recorded from a dolphin that was confirmed to be experiencing an ‘abnormal’ health condition based on these conditions. Dolphin ID was not included in the model because we did not want the classifier to consider a dolphin’s health status history in deciding how to classify the whistle. Further, the study goals include the model being able to generalize to dolphins that are not in the training set, and therefore we did not want dolphin ID to be a necessary feature.
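To make this labeling step concrete, the following sketch (Python with pandas) shows how a per-dolphin, per-date label can be propagated to every whistle recorded on that date; the table layout, column names, and values are hypothetical and do not reflect the MMP medical record schema.

import pandas as pd

# One row per dolphin per recording date: 0 = normal, 1 = abnormal (hypothetical values)
health = pd.DataFrame({
    "dolphin_id":     ["A", "A", "B"],
    "recording_date": ["2022-01-03", "2022-02-07", "2022-01-03"],
    "label":          [0, 1, 0],
})

# One row per extracted whistle .wav file (hypothetical values)
whistles = pd.DataFrame({
    "dolphin_id":     ["A", "A", "B"],
    "recording_date": ["2022-01-03", "2022-02-07", "2022-01-03"],
    "wav_file":       ["a_001.wav", "a_014.wav", "b_002.wav"],
})

# Every whistle inherits the health label assigned to its dolphin x date pair
labeled = whistles.merge(health, on=["dolphin_id", "recording_date"], how="inner")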
To be included in the model, a dolphin had to have at least two recording dates during a time of normal health status and two during a time of abnormal health status. As of 18 July 2024, we had 36,080 whistles produced by 15 different dolphins that met this qualification. Of those whistles, 22,685 were from confirmed normal health status recording dates and 13,395 were from abnormal health status dates (Figure 1, Table 2). Because our applied goal is anomaly detection, we allowed the mismatch between normal and abnormal whistle counts, as this imbalance is indicative of what we expect in the real-world application.

2.3. Features

The Python libraries librosa, audio_metadata, matplotlib, and scikit-maad [85,86,87] were utilized to load audio files and extract whistle features. Whistles are composed of frequency, temporal, and amplitude components. In order to characterize these three domains in our feature set, we extracted both ‘FFT features’, each of which was an average power spectral density (PSD) value for a specific frequency bandwidth over the duration of the signal, and ‘maad features’, which included temporal and entropy characteristics of the whistles (see below for more details).
The audio files were all downsampled to 96 kHz. The resulting audio signal was then converted to a floating-point representation. A matrix of PSD over time was created by applying a Fourier transform to the audio signal with a window length of 256 points (NFFT = 256). To focus on the frequency range of interest, the matrix was bandpassed to include frequency bins 6–85, each corresponding to a 375 Hz bandwidth (i.e., 1875 Hz–32,250 Hz). The first five FFT features (0–1875 Hz) were removed to exclude low-frequency noise from the pool’s life support systems and low-frequency anthropogenic noise sources in San Diego Bay. The resulting matrix was transformed using a base-10 logarithm and then flipped vertically for ease of interpretation. Finally, the mean PSD of each frequency slice was computed along the time axis, resulting in one value per frequency band; each element represented the average PSD in dB per Hz over a 375 Hz bandwidth. The open-source Python acoustic feature package scikit-maad produced eight additional features (i.e., MEANt, VARt, SKEWt, KURTt, Ht, SNRt, MED, ZCR) via its ‘all_temporal_alpha_indices’ function (see [87] for feature descriptions). This resulted in a total of 89 unique whistle features, which were used as the inputs for the model.
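A minimal sketch of this feature extraction is given below using the libraries named above (librosa, SciPy, and scikit-maad). The parameter values follow the text, but the specific calls, the bin indexing, and the handling of the scikit-maad output are illustrative assumptions rather than the authors’ exact processing code.

import numpy as np
import librosa
from scipy.signal import spectrogram
from maad import features

def whistle_features(wav_path, sr=96_000, nfft=256, lo_bin=6, hi_bin=85):
    # Load the whistle and resample to 96 kHz as floating point
    y, fs = librosa.load(wav_path, sr=sr)

    # Power spectral density over time; with fs = 96 kHz and nfft = 256,
    # each frequency bin spans 375 Hz
    f, t, psd = spectrogram(y, fs=fs, nperseg=nfft, nfft=nfft)

    # Keep only bins 6-85, discarding low-frequency pool/bay noise
    band = psd[lo_bin:hi_bin + 1, :]

    # Base-10 log transform, then average each frequency slice over time,
    # giving one mean PSD value per 375 Hz band
    fft_feats = np.log10(band + 1e-12).mean(axis=1)

    # Temporal/entropy indices (MEANt, VARt, SKEWt, KURTt, Ht, SNRt, MED, ZCR)
    # from scikit-maad's all_temporal_alpha_indices (returns a DataFrame)
    maad_feats = features.all_temporal_alpha_indices(y, fs).to_numpy().ravel()

    return np.concatenate([fft_feats, maad_feats])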

2.4. Model

Multiple commonly applied model architectures were assessed to identify the model that was most robust for our dataset (Figure 2). Early model exploration compared Random Forest, Support Vector Machine, Logistic Regression, Neural Network, Light Gradient Boosting Machine (LightGBM) Classifier, Gradient Boosting Classifier, and XGBoost Classifier models. Because anomaly detection was the highest priority, recall (i.e., the proportion of true positives to true positives + false negatives) was considered the highest-priority metric, after overall accuracy, in the evaluation of model performance. The boosting classifiers performed best on this dataset with the ‘default’ parameters and were chosen as the model type to carry forward.
Training was conducted with the Gradient Boosting Classifier algorithm from the scikit-learn Python package. This algorithm builds an ensemble of decision trees, and the final prediction is a weighted combination of the individual trees. It works by first creating and fitting an initial base model (a decision tree) to the chosen training data. Subsequent models (decision trees) are then trained sequentially, each fitted to the negative gradient of the loss function with respect to the predictions of the previous ensemble (the combination of previously trained models). This is performed to reduce the residual errors (the differences between the actual and predicted values) made by the previous models; gradient boosting thereby optimizes the loss function using gradient descent. The number of estimators parameter indicates the number of boosting stages (decision trees) used in the ensemble during training, max depth sets the depth of each decision tree, and learning rate scales the impact each new decision tree has on the final ensemble prediction. Once the training process reaches the specified number of estimators (boosting stages), training terminates and produces a model containing the gradient boosting ensemble, which is then used to make predictions on new data.
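The sketch below shows how such a classifier is instantiated and fit with scikit-learn. The feature matrix and labels are random placeholders standing in for the 89 whistle features and the 0/1 health labels, and the parameter values mirror the defaults noted in Figure 2 rather than the final model.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Random placeholder data standing in for the 89 whistle features and 0/1 health labels
rng = np.random.default_rng(5)
X_train = rng.normal(size=(1000, 89))
y_train = rng.integers(0, 2, size=1000)

gb = GradientBoostingClassifier(
    n_estimators=200,   # number of boosting stages (trees) in the ensemble
    max_depth=10,       # depth of each individual decision tree
    learning_rate=0.1,  # weight of each new tree's contribution to the ensemble prediction
    random_state=5,
)
gb.fit(X_train, y_train)
probs = gb.predict_proba(X_train)[:, 1]  # predicted probability of the 'abnormal' class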
Hyperparameter optimization grids were run to identify the optimal combination of hyperparameters for the boosting classifiers. The parameters tested were all combinations of n_estimators: 100, 200, 300, 400, 500; max_depth: 5, 7, 10, 12, 15; learning_rate: 0.2, 0.1, 0.05, 0.01; colsample_bytree: 0.8, 1.0; and num_leaves: 15, 31, 63, 127. The combination of parameters that produced the highest recall while maintaining a high level of overall accuracy was chosen as the final model parameters (see Table 3 for metric definitions).
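A grid search over these values could be run as sketched below with scikit-learn’s GridSearchCV; the text does not state which search utility was used, so this is an assumption, and colsample_bytree and num_leaves are omitted because they apply only to the XGBoost/LightGBM variants. Recall is used for refitting to reflect the stated priority on detecting abnormal whistles.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators":  [100, 200, 300, 400, 500],
    "max_depth":     [5, 7, 10, 12, 15],
    "learning_rate": [0.2, 0.1, 0.05, 0.01],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=5),
    param_grid,
    scoring=["recall", "accuracy"],  # track both metrics for every combination
    refit="recall",                  # select the final parameters by recall
    cv=3,
)
search.fit(X_train, y_train)  # X_train, y_train as in the previous sketch
print(search.best_params_)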
Throughout the model exploration and hyperparameter optimization trials, a validation set comprising 10% of the dataset was utilized to compare multiple runs. For all reported results, the model was trained on 85% of the dataset and tested on the remaining 15%. Training, validation, and testing sets were chosen at random from the pool of data available.
In order to ensure that the model was not learning specific characteristics of whistles produced on a certain date (e.g., background noise from that date) and generalizing the label to whistles with similar background noise levels or other non-whistle indicators, all of the data were grouped by animal × date to improve sample independence. Therefore, all of the whistles produced by a given dolphin on a given date were either in the training set or in the test set, but not spread across the two. This is also indicative of the real-world application, in which the model would be used to classify new whistles from an animal on a new date.
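One way to implement this grouping, sketched below under the assumption of a feature matrix X built with the earlier extraction sketch and the hypothetical labeled table from the labeling sketch above, is scikit-learn’s GroupShuffleSplit, which keeps every dolphin-and-date group entirely in either the training set or the test set.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# X: assumed (n_whistles x 89) feature matrix; y and groups come from the
# hypothetical 'labeled' table in the labeling sketch above
y = labeled["label"].to_numpy()
groups = (labeled["dolphin_id"] + "_" + labeled["recording_date"]).to_numpy()  # dolphin x date key

splitter = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=5)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

X_train, X_test = X[train_idx], X[test_idx]  # whistle features split by group
y_train, y_test = y[train_idx], y[test_idx]  # 0 = normal, 1 = abnormal labels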
All experiments were executed using Python 3.11.5 in an Anaconda environment on a Dell laptop. Figure 3 details the process from data collection to classification output.

3. Results

The final gradient boosting classifier (n_estimators = 500, learning_rate = 0.2, max_depth = 10) was able to classify a dolphin’s health status (i.e., normal or abnormal) from its whistle features 72.3% of the time, which was significantly better than chance (permutation test with 1000 iterations of randomly assigned whistle labels, p < 0.001). The model was trained on 30,693 whistles from 15 dolphins (19,073 randomly selected from the 22,685 normal whistles; 11,620 randomly selected from the 13,395 abnormal whistles). The test set consisted of 3612 normal whistles and 1775 abnormal whistles (15% of the data). The final model classified a whistle as abnormal using a threshold of 0.034. The classifier performed with 72.3% accuracy, 73.2% recall, 56.1% precision, and a 63.5% F1 score on the test dataset.
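A permutation test along these lines could be run with scikit-learn’s permutation_test_score, as sketched below; the authors’ exact permutation design is not described beyond 1000 iterations of randomly assigned whistle labels, so the cross-validation setup and the use of the full feature matrix X and label vector y from the earlier sketches are assumptions.

from sklearn.model_selection import permutation_test_score

score, perm_scores, p_value = permutation_test_score(
    gb, X, y,              # classifier and full feature/label arrays
    cv=5,
    n_permutations=1000,   # 1000 random reassignments of the whistle labels
    scoring="accuracy",
    random_state=5,
)
print(score, p_value)      # observed accuracy and permutation p-value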
We chose the optimal model threshold for detecting abnormal whistles by comparing the default threshold (i.e., 0.5) to the ROC-AUC optimal performance threshold (i.e., 0.034) (Figure 4).
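One common way to derive such an operating threshold from the ROC curve, assumed here for illustration, is to take the point that maximizes Youden’s J (true positive rate minus false positive rate) over the predicted abnormal-class probabilities, as sketched below with scikit-learn’s roc_curve and the test split from the earlier sketches.

import numpy as np
from sklearn.metrics import roc_curve

probs_test = gb.predict_proba(X_test)[:, 1]     # abnormal-class probabilities on the test set
fpr, tpr, thresholds = roc_curve(y_test, probs_test)

best = np.argmax(tpr - fpr)                     # Youden's J: maximize TPR - FPR
threshold = thresholds[best]
y_pred = (probs_test >= threshold).astype(int)  # 1 = abnormal at the chosen operating point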
Post hoc assessment of the specific health condition the dolphin was experiencing for each of the abnormal whistles in the test set showed that the classifier performed best on whistles emitted during a critical health condition (94.2%) and worst during infection cases (40.8%). It is important to note that critical cases are typically characterized by an infection that is far progressed; therefore, the most extreme infection cases are removed from that category, potentially making the infection category more difficult to classify. As the data are based on opportunistic availability, there are varying sample sizes for each condition, which likely play a role in the accuracy of the classifier across diseases (Table 4).
Based on availability, the number of whistles contributed by each dolphin was not equal. The minimum number of whistles a dolphin contributed to the model was 387. Table 5 presents the breakdown of how the model performed for each dolphin on the test set (15% of total whistle data). It is important to note that each dolphin had to have both normal and abnormal whistle dates in order to be included in the model, to ensure that the model did not learn to categorize a specific animal’s signature whistle into one health category. Table 5 demonstrates that the model was not simply categorizing a dolphin’s whistle contour as one health status or the other. The model correctly classified between 42.9% and 97.7% of whistles per dolphin, with an average (mean ± standard error) of 77.4 ± 3.99%. The false alarm rate ranged from 0 to 51.4% with an average of 14.3 ± 3.71%, and the average miss rate was 8.23 ± 1.43%. Finally, Table 6 provides a breakdown of the test dataset for each whistle type. Signature whistles were classified best, which was expected as they made up the majority of the dataset.
Analysis of the feature relevance revealed that six (out of 89) acoustic features had a relative importance greater than 0.02, collectively accounting for 0.41 of the cumulative importance to model performance. See Figure 5 for the features with a relative importance > 0.01. The feature MEANt, representing the average value of the whistle’s amplitude distribution, ranked the highest in relative importance (0.16). The zero-crossing rate (ZCR), a measure of the frequency of sign changes in the signal, was the second most important feature (0.10). MED, the median of the audio signal envelope (0.065), further contributed to model performance by capturing an alternative measure of central tendency within the amplitude distribution. The feature from the 4500 Hz frequency band (0.044), which represents the mean power spectral density (PSD) within the range of 4500–4875 Hz, was also among the most relevant features. Temporal kurtosis (KURTt; 0.023), which describes the “tailedness” or intensity of amplitude peaks in the signal, and temporal entropy (Ht; 0.022), a measure of the signal’s unpredictability and variation over time, also ranked within the top six features.
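These rankings correspond to the impurity-based importances exposed on a fitted scikit-learn gradient boosting model; a brief sketch of extracting them is below, where feature_names is a hypothetical list naming the 89 features in the same order as the columns of the feature matrix.

import pandas as pd

# feature_names: hypothetical list of the 89 feature labels (FFT bands plus the maad indices)
importances = pd.Series(gb.feature_importances_, index=feature_names)
ranked = importances.sort_values(ascending=False)

top = ranked[ranked > 0.02]   # the six features above the 0.02 relative-importance threshold
print(top)
print(top.sum())              # their cumulative importance (~0.41 reported above)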

4. Discussion

Leveraging machine learning, a gradient boosting classifier was able to classify health status from acoustic features of dolphin whistles with 72.3% accuracy, which was significantly better than chance. Classification performance further appears to be related to the severity of illness, as ‘critical’ illness had the highest accuracy compared to less extreme ailments such as gastrointestinal issues and mild infection. Future research with larger sample sizes for each primary case concern will be needed to assess whether this is a byproduct of small and unequal sample sizes or whether the severity of illness is actually encoded in whistle features. However, it is interesting that the model performed best when identifying critical cases, even though this was not the health condition with the largest sample size. This finding is encouraging for future work in which generalizing and adapting this model to free-ranging cetaceans may benefit rehabilitated animals and help passively monitor recovery with less need to handle those animals [88]. It also supports previous reports suggesting that dolphins in ‘distress’ emit signature whistles at a higher repetition rate and intensity and with slightly altered parameters [64,65,74,75,76,77,78]. This likely explains part of the variance seen in Table 6, where signature whistles were better classified than other whistle types. That said, signature whistles made up the majority of the training data (which is similar to the whistle output of free-swimming dolphins). Future research will be needed to better understand whether features of signature whistles are more likely to change during health changes compared to non-signature contours, or whether there is a physiological driver that would result in similar feature changes across any whistle production.

Dolphins are stoic animals that often hide behavioral signs of illness until it has progressed considerably. It is possible that they also conceal acoustic indicators of health until the illness reaches a more critical level. These findings support Dr. Sam Ridgway’s original hypothesis that information about the internal health status of a dolphin may be either intentionally or unintentionally encoded in characteristics of their whistles (Jones, personal communication). The model performed worst in detecting infection cases that did not fit the inclusion criteria of a critical case, with 40.8% accuracy. Despite the lower accuracy, the model could still aid veterinarians in detecting milder infections that can often be difficult to detect behaviorally without more invasive sampling such as blood draws. It is also important to note that the critical category was made up of whistles from dolphins with more severe infections or illnesses. Therefore, the health category of ‘infection’ becomes a more difficult classification task, as the most severe or obvious cases were put into another category. Additionally, detecting mild infections even with more invasive diagnostic techniques like blood draws or biopsies can be a difficult task in dolphins, and an actual etiologic agent is not often determined. Treatment may be initiated early based on mild changes in bloodwork or behavior that may not yet have caused changes to the whistle at the time treatment was initiated.
Along with variations in veterinary practice, dolphins that met the inclusion criteria for the ‘infection’ health category may have been treated early enough to prevent an infection, or may have had mild behavioral or bloodwork changes that were not due to an infectious agent but were treated as such. These variations could also have decreased the accuracy of the model for this health category.
By prioritizing recall, we chose to use a model with reduced overall accuracy, but we believe this is a truer representation of the model’s ability to identify abnormal health statuses from dolphin whistles. It is important in anomaly detection to prioritize recall, especially when using an unbalanced dataset, to ensure that the accuracy metric is not inflated by the control class. In addition to undersampling our dataset, we prioritized recall over precision because the veterinary staff suggested that a false positive would be less detrimental to the animal’s health and welfare than a missed positive. This may not be true for all classification tasks, and therefore care should be taken when choosing which metrics to optimize. Although we demonstrate that the model works better than chance, it will be important to consider the benefit of early identification compared to the costs of false alarms. The next steps of this project will aim to increase the sample sizes of whistles and dolphins participating in order to improve model performance and bring the classification accuracy and recall to a level that is useful on a daily basis.
While prior studies have established that whistle parameters change during distress calls and various behavioral contexts, this study represents an initial investigation into whether these parameters may also shift in relation to health changes. The results of the feature relevance analysis indicate that summary statistic features, such as the temporal mean and median, zero crossing rate, kurtosis, and entropy of the 1D signal, tended to rank higher than the power spectral density of specific frequency band features. This is noteworthy, as it is a common practice to include many frequency-specific features (e.g., maximum, minimum, start, end, etc.) in bioacoustic research, while less emphasis is placed on summary statistics. It remains unclear which whistle features are most salient and perceptible to dolphins; thus, incorporating a mixture of ‘point’ characteristics and summarizing statistics may enhance future models. Furthermore, it is uncertain whether any of these features are universally related to health across all dolphins or if individual dolphins exhibit unique patterns in their whistle parameter changes. Interestingly, relatively low-frequency components emerged as more relevant to the model than higher frequency components. These findings raise more questions than answers; future research should test hypotheses regarding resonant frequencies and potential shifts in lower frequency components during health changes, paralleling observations in human vocalizations that tend to lower in pitch during certain illnesses. While these preliminary findings suggest that characteristic shifts do occur, a more comprehensive analysis is needed to elucidate which parameters are affected, whether these shifts are consistent across dolphins, the underlying mechanisms driving these changes, and whether specific types of disease are encoded in different features.
One limitation of our study was that our recordings reflected the conservative end of ‘abnormal health status’, as the MMP comprises a very healthy group of dolphins relative to wild counterparts [89]. Unlike many of the studies in livestock that induce pain (such as castration), experimentally inject infectious agents, perform hunger trials, or expose the animals to severe temperatures, the majority of whistles collected for this study were obtained during relatively non-stressful routine sessions with passive sampling. This is important, as what would be considered abnormal health in this group of animals with around-the-clock care from animal caretakers and veterinary staff may be considered normal for free-ranging dolphins. Another limitation is that the signature whistle contour of a dolphin could allow the model to learn the ‘health history’ of that animal and use it to classify a certain whistle shape into a specific health category. In order to mitigate this potential bias, we ensured each dolphin had at least two dates in both normal and abnormal health. Table 5 demonstrates that the model is not typically classifying all of the whistles for a specific dolphin into one category or another, even though, with the random test data split, all of the whistles tested could have belonged to the same health category. For the applied goals of the project, the model taking into account the health history of the individual would be welcome, just as a veterinarian’s previous experience with a specific dolphin can provide valuable insights into the potential diagnosis.
This initial success opens the door for many future research endeavors. An important first step will be exploring whether the model can generalize and perform above chance for Navy dolphins on which it was not trained. Further, whether it can generalize to dolphins at other zoos and aquaria, and even to free-ranging dolphins, will allow us to consider applications for translational goals. Finally, an exploration of which whistle features were most heavily weighted in the model will give us insights into the vocal biomarkers of health status for marine mammals. The continued development of these novel diagnostic tools holds promise for furnishing an unobtrusive, passive, non-invasive, and cost-effective means for early disease detection in bottlenose dolphins.

5. Conclusions

Here we have demonstrated an interesting use case for supervised machine learning in bioacoustics. A gradient boosting classifier was able to successfully classify a dolphin’s health from the acoustic features of their whistles in a binary classification task. This seemed to be related to the severity of illness, with critical cases being identified most successfully. This suggests that health ailments may be encoded in the features of dolphin whistles similar to the way that vocal biomarkers can be utilized to identify illness in humans.

Author Contributions

Conceptualization: B.J.; methodology: B.J., J.S., A.M. and J.K.; programming/software development: J.K., D.C., M.D. and B.J.; formal analysis and investigation: B.J. and J.S.; writing—original draft preparation: B.J.; writing—review and editing: J.S., A.M., J.K. and D.C.; funding acquisition: B.J.; supervision: B.J. All authors have read and agreed to the published version of the manuscript.

Funding

The authors are extremely grateful to Dr. Laura Kienker and Dr. Sandra Chapman at the Office of Naval Research (ONR) for their support of ‘Sound as Indicators of Health and Welfare of the Navy’s Dolphin’ (ONR Grant # N00014-18-1-2643) and ‘Developing Machine Learning of Vocal Biomarkers for Non-Invasive Monitoring of the Health and Welfare Status of the Navy’s Mine-Hunting Dolphins’ (ONR Grant # N00014-21-1-2414).

Institutional Review Board Statement

The focal population for the present study comprised dolphins in the Navy Marine Mammal Program (MMP) residing in San Diego Bay, CA, USA. These dolphins are housed in natural seawater enclosures in the bay and routinely perform open-water exercises in the Pacific Ocean. Recordings occurred during routine training procedures to which the animals were already habituated. The program is AAALAC-accredited and follows the national standards of the United States Public Health Service Policy on the Humane Care and Use of Laboratory Animals and the Animal Welfare Act. The MMP’s animal care and use program is routinely reviewed by an institutional animal care and use committee (IACUC) and the Navy Bureau of Medicine and Surgery (BUMED). BUMED concurred with the approval of MMP IACUC protocol #130-2018 and assigned NRD#1134 to the protocol for this study. As noted above, the research for this study adhered to the ASAB and ABS Guidelines for the Use of Animals in Research.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated and/or analyzed during the current study are not publicly available as they are owned by the U.S. Navy Marine Mammal Program. Data may be made available upon approval of a formal sample request. The Python code for the FFT and MAAD feature creation is available from https://scikit-maad.github.io/generated/maad.features.all_temporal_alpha_indices.html#maad.features.all_temporal_alpha_indices (accessed on 17 November 2024).

Acknowledgments

The authors are very grateful to the U.S. Navy Marine Mammal Program for allowing us to opportunistically record the dolphins in its care. Many thanks to Risa Daniels and Mark Baird for all of their time and contributions throughout all aspects of these efforts. A special thank you to Yohan Sequiera for his work as a student data scientist helping to optimize the software program efficiency. We appreciate Juan Sebastián Ulloa, Jérôme Sueur, Thierry Aubin, Sylvain Haupert, and Juan Felipe Latorre for sharing their feature extraction programs openly on GitHub. This study would not be possible without the support of Braden Duryee, Jaime Bratis, Megan Sereyko-Dunn, Amanda Naderer, Sarah Hammar, Courtney Luni, Brit Swenberg, Dani Werneth, and their incredible animal care teams. Thank you to Mark Xitco and Eric Jensen for their constant support of the study and for their review and feedback on an earlier draft of this manuscript. Thank you to the National Marine Mammal Foundation for their support of the sound and health studies. The authors are extremely grateful to the Office of Naval Research (ONR) for their support of this study through ONR Grant # N00014-18-1-2643 and ONR Grant # N00014-21-1-2414.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yeates, L.C.; Carlin, K.P.; Baird, M.; Venn-Watson, S.K.; Ridgway, S.H. Nitric oxide in the breath of bottlenose dolphins: Effects of breath hold duration, feeding, and lung disease. Mar. Mam. Sci. 2013, 30, 272–281. [Google Scholar] [CrossRef]
  2. Fagherazzi, G.; Fischer, A.; Ismael, M.; Despotovic, V. Voice for health: The use of vocal biomarkers from research to clinical practice. Digit. Biomark. 2021, 5, 78–88. [Google Scholar] [CrossRef] [PubMed]
  3. Rahman, S.M.A.; Ibtisum, S.; Bazgir, E.; Barai, T. The significance of machine learning in clinical disease diagnosis: A review. arXiv 2023, arXiv:2310.16978. [Google Scholar] [CrossRef]
  4. Aljbawi, W.; Simmons, S.O.; Urovi, V. Developing a multi-variate prediction model for the detection of COVID-19 from crowd-sourced respiratory voice data. arXiv 2022, arXiv:2209.03727. [Google Scholar] [CrossRef]
  5. Casado, C.Á.; Cañellas, M.L.; Pedone, M.; Wu, X.; López, M.B. Audio-based classification of respiratory diseases using advanced signal processing and machine learning for assistive diagnosis support. arXiv 2023, arXiv:2309.07183. [Google Scholar] [CrossRef]
  6. Lella, K.K.; Pja, A. A literature review on COVID-19 disease diagnosis from respiratory sound data. AIMS Bioeng. 2021, 8, 140–153. [Google Scholar] [CrossRef]
  7. Lella, K.K.; Pja, A. Automatic COVID-19 disease diagnosis using 1D convolutional neural network and augmentation with human respiratory sound based on parameters: Cough, breath, and voice. AIMS Public Health 2021, 8, 240–264. [Google Scholar] [CrossRef]
  8. Suppakitjanusant, P.; Sungkanuparph, S.; Wongsinin, T.; Virapongsiri, S.; Kasemkosin, N.; Chailurkit, L.; Ongphiphadhanakul, B. Identifying individuals with recent COVID-19 through voice classification using deep learning. Sci. Rep. 2021, 11, 19149. [Google Scholar] [CrossRef]
  9. Calzà, L.; Gagliardi, G.; Favretti, R.R.; Tamburini, F. Linguistic features and automatic classifiers for identifying mild cognitive impairment and dementia. Comput. Speech Lang. 2021, 65, 101113. [Google Scholar] [CrossRef]
  10. Chen, L.; Dodge, H.H.; Asgari, M. Measures of voice quality as indicators of mild cognitive impairment. Alzheimer’s Dement. 2022, 18, e067393. [Google Scholar] [CrossRef]
  11. Kong, W.; Jang, H.; Carenini, G.; Field, T.S. Exploring neural models for predicting dementia from language. Comput. Speech Lang. 2021, 68, 101181. [Google Scholar] [CrossRef]
  12. López-de-Ipiña, K.; Solé-Casals, J.; Eguiraun, H.; Alonso, J.B.; Travieso, C.M.; Ezeiza, A.; Barroso, N.; Ecay-Torres, M.; Martinez-Lage, P.; Beitia, B. Feature selection for spontaneous speech analysis to aid in Alzheimer’s disease diagnosis: A fractal dimension approach. Comput. Speech Lang. 2015, 30, 43–60. [Google Scholar] [CrossRef]
  13. Bayestehtashk, A.; Asgari, M.; Shafran, I.; McNames, J. Fully automated assessment of the severity of Parkinson’s disease from speech. Comput. Speech Lang. 2015, 29, 172–185. [Google Scholar] [CrossRef] [PubMed]
  14. Bhattacharjee, S.; Xu, W. VoiceLens: A multi-view multi-class disease classification model through daily-life speech data. Smart Health 2021, 23, 100233. [Google Scholar] [CrossRef]
  15. Karan, B.; Sahu, S.S.; Orozco-Arroyave, J.R.; Mahto, K. Non-negative matrix factorization-based time-frequency feature extraction of voice signal for Parkinson’s disease prediction. Comput. Speech Lang. 2021, 69, 101216. [Google Scholar] [CrossRef]
  16. Khan, T.; Lundgren, L.E.; Anderson, D.G.; Nowak, I.; Dougherty, M.; Verikas, A.; Pavel, M.; Jimison, H.; Nowaczyk, S.; Aharonson, V. Assessing Parkinson’s disease severity using speech analysis in non-native speakers. Comput. Speech Lang. 2020, 61, 101047. [Google Scholar] [CrossRef]
  17. Warule, P.; Mishra, S.P.; Deb, S. Time-frequency analysis of speech signal using Chirplet transform for automatic diagnosis of Parkinson’s disease. Biomed. Eng. Lett. 2023, 13, 613–623. [Google Scholar] [CrossRef]
  18. Hashim, N.W.; Wilkes, M.; Salomon, R.; Meggs, J.; France, D.J. Evaluation of voice acoustics as predictors of clinical depression scores. J. Voice 2017, 31, 256.e1–256.e6. [Google Scholar] [CrossRef]
  19. Lee, S.; Suh, S.W.; Kim, T.; Kim, K.; Lee, K.H.; Lee, J.R.; Han, G.; Hong, J.W.; Han, J.W.; Lee, K.; et al. Screening major depressive disorder using vocal acoustic features in the elderly by sex. J. Affect. Disord. 2021, 291, 15–23. [Google Scholar] [CrossRef]
  20. Lin, D.; Nazreen, T.; Rutowski, T.; Lu, Y.; Harati, A.; Shriberg, E.; Chlebek, P.; Aratow, M. Feasibility of a Machine Learning-Based Smartphone Application in Detecting Depression and Anxiety in a Generally Senior Population. Front. Psychol. 2022, 13, 811517. [Google Scholar] [CrossRef]
  21. Mundt, J.C.; Vogel, A.P.; Feltner, D.E.; Lenderking, W.R. Vocal acoustic biomarkers of depression severity and treatment response. Biol. Psychiatry 2012, 72, 580–587. [Google Scholar] [CrossRef] [PubMed]
  22. Ozdas, A.; Shiavi, R.G.; Silverman, S.E.; Silverman, M.K.; Wilkes, D.M. Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk. IEEE Trans. Biomed. Eng. 2004, 51, 1530–1540. [Google Scholar] [CrossRef] [PubMed]
  23. Silva, W.J.; Lopes, L.; Galdino, M.K.C.; Almeida, A.A. Voice acoustic parameters as predictors of depression. J. Voice 2024, 38, 77–85. [Google Scholar] [CrossRef] [PubMed]
  24. Sturim, D.; Torres-Carrasquillo, P.A.; Quatieri, T.F.; Malyska, N.; McCree, A. Automatic detection of depression in speech using gaussian mixture modeling with factor analysis. In Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, Italy, 28–31 August 2011. [Google Scholar]
  25. Wasserzug, Y.; Degani, Y.; Bar-Shaked, M.; Binyamin, M.; Klein, A.; Hershko, S.; Levkovitch, Y. Development and validation of a machine learning-based vocal predictive model for major depressive disorder. J. Affect. Disord. 2022, 325, 627–632. [Google Scholar] [CrossRef]
  26. Weiner, L.; Guidi, A.; Doignon-Camus, N.; Giersch, A.; Bertschy, G.; Vanello, N. Vocal features obtained through automated methods in verbal fluency tasks can aid the identification of mixed episodes in bipolar disorder. Transl. Psychiat 2021, 11, 415. [Google Scholar] [CrossRef]
  27. France, D.J.; Shiavi, R.G.; Silverman, S.; Silverman, M.; Wilkes, D.M. Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Trans. Biomed. Eng. 2000, 47, 829–837. [Google Scholar] [CrossRef]
  28. Iyer, R.; Meyer, D. Detection of suicide risk using vocal characteristics: Systematic review. JMIR Biomed. Eng. 2022, 7, e42386. [Google Scholar] [CrossRef]
  29. Firmino, J.V.; Melo, M.; Salemi, V.; Bringel, K.; Leone, D.; Pereira, R.; Rodrigues, M. Heart failure recognition using human voice analysis and artificial intelligence. Evol. Intell. 2023, 16, 2015–2027. [Google Scholar] [CrossRef]
  30. Gnyś, P.; Szczęsna, G.; Domínguez-Brito, A.C.; Cabrera-Gámez, J. Automated audio dataset generation tool for classification tasks in marine science. Res. Sq. 2024, preprint. [Google Scholar] [CrossRef]
  31. Malde, K.; Handegard, N.O.; Eikvil, L.; Salberg, A.B. Machine intelligence and the data-driven future of marine science. ICES J. Mar. Sci. 2019, 77, 1274–1285. [Google Scholar] [CrossRef]
  32. Oswald, J.N.; Rankin, S.; Barlow, J.; Lammers, M.O. A tool for real-time acoustic species identification of delphinid whistles. J. Acoust. Soc. Am. 2007, 122, 587–595. [Google Scholar] [CrossRef]
  33. Oswald, J.N.; Erbe, C.; Gannon, W.L.; Madhusudhana, S.; Thomas, J.A. Detection and Classification Methods for Animal Sounds. In Exploring Animal Behavior Through Sound; Erbe, C., Thomas, J.A., Eds.; Springer: Cham, Switzerland, 2022; pp. 269–317. [Google Scholar]
  34. Roch, M.A.; Brandes, T.S.; Patel, B.; Barkley, Y.; Baumann-Pickering, S.; Soldevilla, M.S. Automated extraction of odontocete whistle contours. J. Acoust. Soc. Am. 2011, 130, 2212–2223. [Google Scholar] [CrossRef]
  35. Ryazanov, I.; Nylund, A.T.; Basu, D.; Hassellöv, I.-M.; Schliep, A. Deep Learning for Deep Waters: An Expert-in-the-Loop Machine Learning Framework for Marine Sciences. J. Mar. Sci. Eng. 2021, 9, 169. [Google Scholar] [CrossRef]
  36. Shiu, Y.; Palmer, K.J.; Roch, M.A.; Fleishman, E.; Liu, X.; Nosal, E.-M.; Helble, T.; Cholewiak, D.; Gillespie, D.; Klinck, H. Deep neural networks for automated detection of marine mammal species. Sci. Rep. 2020, 10, 607. [Google Scholar] [CrossRef]
  37. Bianco, M.J.; Gerstoft, P.; Traer, J.; Ozanich, E.; Roch, M.A.; Gannot, S.; Deledalle, C.-A. Machine learning in acoustics: Theory and applications. J. Acoust. Soc. Am. 2019, 146, 3590–3628. [Google Scholar] [CrossRef]
  38. Caruso, F.; Dong, L.; Lin, M.; Liu, M.; Gong, Z.; Xu, W.; Alonge, G.; Li, S. Monitoring of a nearshore small dolphin species using passive acoustic platforms and supervised machine learning techniques. Front. Mar. Sci. 2020, 7, 267. [Google Scholar] [CrossRef]
  39. Ferrari, M.; Glotin, H.; Marxer, R.; Asch, M. Open access dataset of marine mammal transient studies and end-to-end CNN classification. In Proceedings of the HAL Open Science, Glasgow, UK, 19–24 July 2020. [Google Scholar]
  40. Kong, Q.; Cao, Y.; Iqbal, T.; Wang, Y.; Wang, W.; Plumbley, M.D. PANNs: Large-scale pretrained audio neural networks for audio pattern recognition. arXiv 2019, arXiv:2020.3030497. [Google Scholar] [CrossRef]
  41. Lai, T.; Ho, T.K.K.; Armanfard, N. Open-Set multivariate time-series anomaly detection. arXiv 2023, arXiv:2310.12294v3. [Google Scholar] [CrossRef]
  42. Ravanelli, M.; Bengio, Y. Speaker recognition from raw waveform with sincnet. In Proceedings of the 2018 IEEE Spoken Language Technology Workshop, Athens, Greece, 18–21 December 2018; pp. 1021–1028. [Google Scholar] [CrossRef]
  43. September, M.A.K.; Passino, F.S.; Goldmann, L.; Hinel, A. Extended deep adaptive input normalization for preprocessing time series data for neural networks. arXiv 2023, arXiv:2310.14720. [Google Scholar]
  44. Li, P.; Liu, X.; Palmer, K.J.; Fleishman, E.; Gillespie, D.; Nosal, E.-M.; Shiu, Y.; Klinck, H.; Cholewiak, D.; Helble, T.; et al. Learning deep models from synthetic data for extracting dolphin whistle contours. In Proceedings of the International Joint Conference of Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020. [Google Scholar] [CrossRef]
  45. Nanni, L.; Cuza, D.; Brahnam, S. Building ensemble of resnet for dolphin whistle detection. Appl. Sci. 2023, 13, 8029. [Google Scholar] [CrossRef]
  46. Usman, A.M.; Ogundile, O.O.; Versfeld, D.J.J. Review of automatic detection and classification techniques for cetacean vocalization. IEEE Access 2020, 8, 105181–105206. [Google Scholar] [CrossRef]
  47. Roch, M.A.; Soldevilla, M.; Hoenigman, R.; Wiggins, S.M.; Hildebrand, J.A. Comparison of machine learning techniques for the classification of echolocation clicks from three species of odontocetes. Can. Acoust. 2008, 36, 41–47. [Google Scholar]
  48. Frasier, K.E.; Henderson, E.E.; Bassett, H.R.; Roch, M.A. Automated identification and clustering of subunits within delphinid vocalizations. Mar. Mammal. Sci. 2016, 32, 911–930. [Google Scholar] [CrossRef]
  49. Frasier, K.E. A machine learning pipeline for classification of cetacean echolocation clicks in large underwater acoustic datasets. PLoS Comput. Biol. 2021, 17, e1009613. [Google Scholar] [CrossRef]
  50. Jiang, J.; Bu, L.; Duan, F.; Wang, X.; Liu, W.; Sun, Z.; Li, C. Whistle detection and classification for whales based on convolutional neural networks. Appl. Acoust. 2019, 150, 169–178. [Google Scholar] [CrossRef]
  51. Devi, I.; Dudi, K.; Singh, Y.; Lathwal, S.S. Bioacoustics features as a tool for early diagnosis of pneumonia in riverine buffalo (Bubalus bubalis) calves. Buffalo Bull. 2023, 40, 399–407. [Google Scholar]
  52. Exadaktylos, V.; Silva, M.; Aerts, J.M.; Taylor, C.J.; Berckmans, D. Real-time recognition of sick pig cough sounds. Comput. Electron. Agric. 2008, 63, 207–214. [Google Scholar] [CrossRef]
  53. Laurijs, K.A.; Briefer, E.F.; Reimert, I.; Webb, L.E. Vocalisations in farm animals: A step towards positive welfare assessment. Appl. Anim. Behav. Sci. 2021, 236, 105264. [Google Scholar] [CrossRef]
  54. Manteuffel, G.; Puppe, B.; Schön, P.C. Vocalization of farm animals as a measure of welfare. Appl. Anim. Behav. Sci. 2004, 88, 163–182. [Google Scholar] [CrossRef]
  55. Marx, G.; Horn, T.; Thielebein, J.; Knubel, B.; von Borell, E. Analysis of pain-related vocalization in young pigs. J. Sound. Vib. 2003, 266, 687–698. [Google Scholar] [CrossRef]
  56. Mcloughlin, M.P.; Stewart, R.; McElligott, A.G. Automated bioacoustics: Methods in ecology and conservation and their potential for animal welfare monitoring. J. Roy. Soc. Interface 2019, 16, 20190225. [Google Scholar] [CrossRef] [PubMed]
  57. da Silva, J.P.; de Alencar Nääs, I.; Abe, J.M.; da Silva Cordeiro, A.F. Classification of piglet (Sus Scrofa) stress conditions using vocalization pattern and applying paraconsistent logic Eτ. Comput. Electron. Agric. 2019, 166, 105020. [Google Scholar] [CrossRef]
  58. Sadeghi, M.; Khazaee, M.; Soleimani, M.R.; Banakar, A. An intelligent procedure for the detection and classification of chickens infected by clostridium perfringens based on their vocalization. Braz. J. Poult. Sci. 2015, 17, 537–544. [Google Scholar] [CrossRef]
  59. Bergler, C.; Smeele, S.Q.; Tyndel, S.A.; Barnhill, A.; Ortiz, S.T.; Kalan, A.K.; Cheng, R.X.; Brinkløv, S.; Osiecka, A.N.; Tougaard, J.; et al. ANIMAL-SPOT enables animal-independent signal detection and classification using deep learning. Sci. Rep. 2022, 12, 21966. [Google Scholar] [CrossRef]
  60. Bermant, P.C.; Bronstein, M.M.; Wood, R.J.; Gero, S.; Gruber, D.F. Deep Machine Learning Techniques for the Detection and Classification of Sperm Whale Bioacoustics. Sci. Rep. 2019, 9, 12588. [Google Scholar] [CrossRef]
  61. Lu, T.; Han, B.; Yu, F. Detection and classification of marine mammal sounds using AlexNet with transfer learning. Ecol. Inform. 2021, 62, 101277. [Google Scholar] [CrossRef]
  62. Zhong, M.; LeBien, J.; Campos-Cerqueira, M.; Dodhia, R.; Ferres, J.L.; Velev, J.P.; Aide, T.M. Multispecies bioacoustic classification using transfer learning of deep convolutional neural networks with pseudo-labeling. Appl. Acoust. 2020, 166, 107375. [Google Scholar] [CrossRef]
  63. Zhong, M.; Torterotot, M.; Branch, T.A.; Stafford, K.M.; Royer, J.-Y.; Dodhia, R.; Ferres, J.L. Detecting, classifying, and counting blue whale calls with Siamese neural networks. J. Acoust. Soc. Am. 2021, 149, 3086–3094. [Google Scholar] [CrossRef]
  64. Janik, V.M.; Sayigh, L.S. Communication in bottlenose dolphins: 50 years of signature whistle research. J. Comp. Physiol. A 2013, 199, 479–489. [Google Scholar] [CrossRef]
  65. Caldwell, M.C.; Caldwell, D.K. Individualized whistle contours in bottlenosed dolphins (Tursiops truncatus). Nature 1965, 207, 434–435. [Google Scholar] [CrossRef]
  66. Janik, V.M.; Slater, P.J.B. Context-specific use suggests that bottlenose dolphin signature whistles are cohesion calls. Anim. Behav. 1998, 56, 829–838. [Google Scholar] [CrossRef]
  67. Jones, B.L.; Daniels, R.; Tufano, S.; Ridgway, S.H. Five members of a mixed-sex group of bottlenose dolphins share a stereotyped whistle contour in addition to maintaining their individually distinctive signature whistles. PLoS ONE 2020, 15, e0233658-15. [Google Scholar] [CrossRef]
  68. Sayigh, L.S.; Wells, R.S.; Janik, V.M. What’s in a voice? Dolphins do not use voice cues for individual recognition. Anim. Cogn. 2017, 20, 1067–1079. [Google Scholar] [CrossRef]
  69. Watwood, S.L.; Tyack, P.L.; Wells, R.S. Whistle sharing in paired male bottlenose dolphins, Tursiops truncatus. Behav. Ecol. Sociobiol. 2004, 55, 531–543. [Google Scholar] [CrossRef]
  70. Madsen, P.T.; Jensen, F.H.; Carder, D.; Ridgway, S.H. Dolphin whistles: A functional misnomer revealed by heliox breathing. Biol. Lett. 2012, 8, 211–213. [Google Scholar] [CrossRef]
  71. Madsen, P.T.; Siebert, U.; Elemans, C.P.H. Toothed whales use distinct vocal registers for echolocation and communication. Science 2023, 379, 928–933. [Google Scholar] [CrossRef]
  72. Ridgway, S.; Carder, D. Nasal pressure and sound production in an echolocating white whale, Delphinapterus leucas. In Animal Sonar; Nachtigall, P.E., Moore, P., Eds.; Plenum Publishing Corporation: New York, NY, USA, 1988; pp. 53–60. [Google Scholar]
  73. Sportelli, J.J.; Jones, B.L.; Ridgway, S.H. Non-linear phenomena: A common acoustic feature of bottlenose dolphin (Tursiops truncatus) signature whistles. Bioacoustics 2022, 32, 241–260. [Google Scholar] [CrossRef]
  74. Esch, H.C.; Sayigh, L.S.; Blum, J.E.; Wells, R.S. Whistles as potential indicators of stress in bottlenose dolphins (Tursiops truncatus). J. Mammal. 2009, 90, 638–650. [Google Scholar] [CrossRef]
  75. Eskelinen, H.C.; Richardson, J.L.; Tufano, S. Stress, whistle rate, and cortisol. Mar. Mammal. Sci. 2020, 38, 765–777. [Google Scholar] [CrossRef]
  76. Kuczaj, S.A.K.; Frick, E.E.; Jones, B.L.; Lea, J.S.E.; Beecham, D.; Schnöller, F. Underwater observations of dolphin reactions to a distressed conspecific. Learn. Behav. 2015, 43, 289–300. [Google Scholar] [CrossRef]
  77. Ridgway, S.H. Dolphin hearing and sound production in health and illness. In Hearing and Other Senses; Fay, R.R., Gourevitch, G., Eds.; The Amphora Press: Andover, MA, USA, 1983; pp. 247–296. [Google Scholar]
  78. Watwood, S.L.; Owen, E.C.G.; Tyack, P.L.; Wells, R.S. Signature whistle use by temporarily restrained and free-swimming bottlenose dolphins, Tursiops truncatus. Anim. Behav. 2005, 69, 1373–1386. [Google Scholar] [CrossRef]
  79. Janik, V.M.; Dehnhardt, G.; Todt, D. Signature whistle variations in a bottlenosed dolphin, Tursiops truncatus. Behav. Ecol. Sociobiol. 1994, 35, 243–248. [Google Scholar] [CrossRef]
  80. Jones, B.L.; Oswald, M.; Tufano, S.; Baird, M.; Mulsow, J.; Ridgway, S.H. A system for monitoring acoustics to supplement an animal welfare plan for bottlenose dolphins. J. Zool. Bot. Gard. 2021, 2, 222–233. [Google Scholar] [CrossRef]
  81. Stevens, P.E.; Hill, H.M.; Bruck, J.N. Cetacean acoustic welfare in wild and managed-care settings: Gaps and opportunities. Animals 2021, 11, 3312. [Google Scholar] [CrossRef]
  82. Winship, K.A.; Jones, B.L. Acoustic monitoring of professionally managed marine mammals for health and welfare insights. Animals 2023, 13, 2124. [Google Scholar] [CrossRef]
  83. Bossart, G.D.; Romano, T.A.; Peden-Adams, M.M.; Schaefer, A.M.; Rice, C.D.; Fair, P.A.; Reif, J.S. Comparative innate and adaptive immune responses in Atlantic bottlenose dolphins (Tursiops truncatus) with viral, bacterial, and fungal infections. Front. Immunol. 2019, 10, 1125. [Google Scholar] [CrossRef]
  84. Handin, R.; Lux, S.; Stossel, T. (Eds.) Blood: Principles and Practice of Hematology; Lippincott Williams and Wilkins: Philadelphia, PA, USA, 2003; 2304p. [Google Scholar]
  85. McFee, B.; Raffel, C.; Liang, D.; Ellis, D.P.W.; McVicar, M.; Battenberg, E.; Nieto, O. librosa: Audio and music signal analysis in python. In Proceedings of the 14th Python in Science Conference, Austin, TX, USA, 6–12 July 2015; pp. 18–24. [Google Scholar] [CrossRef]
  86. Tosi, S. Matplotlib for Python Developers; Packt: Birmingham, UK, 2009; 293p. [Google Scholar]
  87. Ulloa, J.S.; Haupert, S.; Latorre, J.F.; Aubin, T.; Sueur, J. scikit-maad: An open-source and modular toolbox for quantitative soundscape analysis in Python. Methods Ecol. Evol. 2021, 12, 2334–2340. [Google Scholar] [CrossRef]
  88. Ramos, E.A.; Jones, B.L.; Austin, M.; Eierman, L.; Collom, K.A.; Melo-Santos, G.; Castelblanco-Martínez, N.; Arreola, M.R.; Sánchez-Okrucky, R.; Rieucau, G. Signature whistle use and changes in whistle emission rate in a rehabilitated rough-toothed dolphin. Front. Mar. Sci. 2023, 10, 1278299. [Google Scholar] [CrossRef]
  89. Venn-Watson, S.K.; Jensen, E.D.; Smith, C.R.; Xitco, M.; Ridgway, S.H. Evaluation of annual survival and mortality rates and longevity of bottlenose dolphins (Tursiops truncatus) at the United States Navy Marine Mammal Program from 2004 through 2013. J. Am. Vet. Med. Assoc. 2015, 246, 893–898. [Google Scholar] [CrossRef]
Figure 1. Whistle examples. Spectrograms (time on the x-axis and frequency on the y-axis) of whistles collected from three dolphins during normal and abnormal health states.
Figure 2. Model exploration results. Initial model exploration found that the boosting classifiers performed best on this dataset. Default parameters were used, with the boosting classifiers all utilizing random_state = 5, n_estimators = 200, max_depth = 10, and learning_rate = 0.1.
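To make the Figure 2 configuration concrete, the sketch below sets up a comparable model comparison in scikit-learn. The synthetic data, the non-boosting baselines, and the variable names are illustrative assumptions rather than the study's pipeline; only the boosting hyperparameters (random_state = 5, n_estimators = 200, max_depth = 10, learning_rate = 0.1) come from the caption.

```python
# Minimal sketch of a model-exploration step like the one summarized in Figure 2.
# X and y stand in for the whistle feature matrix and its normal/abnormal labels;
# here they are synthetic placeholders so the script runs end to end.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=40, weights=[0.65, 0.35], random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=5)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=5),
    # Boosting settings taken from the Figure 2 caption.
    "gradient_boosting": GradientBoostingClassifier(
        random_state=5, n_estimators=200, max_depth=10, learning_rate=0.1
    ),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```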
Figure 3. Process schematic. Workflow for data collection, data splitting, feature generation, model optimization, and testing of the health classifier.
Figure 4. ROC_AUC plot. Plot of the model's true positive rate (sensitivity) against the false positive rate (1 − specificity) at varying thresholds. The G-mean was calculated for each threshold, and the threshold corresponding to the maximum G-mean (Best) was 0.034, with an AUROC = 0.803.
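The threshold selection shown in Figure 4 can be reproduced with standard scikit-learn utilities. The sketch below assumes a fitted binary classifier and its held-out data; the variable names (clf, X_test, y_test) are carried over from the model-exploration sketch above and are not taken from the paper.

```python
# Sketch of G-mean-based threshold selection as depicted in Figure 4, assuming
# `clf`, `X_test`, and `y_test` from the model-exploration sketch above.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

scores = clf.predict_proba(X_test)[:, 1]       # predicted probability of the 'abnormal' class
fpr, tpr, thresholds = roc_curve(y_test, scores)

gmeans = np.sqrt(tpr * (1 - fpr))              # geometric mean of sensitivity and specificity
best = int(np.argmax(gmeans))

print(f"AUROC = {roc_auc_score(y_test, scores):.3f}")
print(f"Best threshold = {thresholds[best]:.3f} (G-mean = {gmeans[best]:.3f})")
```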
Figure 5. Feature relevance analysis. The relative importance (RI) of the whistle parameters, with features plotted on the x-axis and their corresponding RI values on the y-axis. Features with an RI greater than 0.01 are included. MEANt: the temporal mean, calculated as the average amplitude of the whistle signal. ZCR: the zero-crossing rate, calculated as the rate of sign changes in the signal. MED: the median amplitude of the signal, calculated as the midpoint value of the amplitude waveform. KURTt: kurtosis, calculated by taking the fourth moment of the amplitude values about the mean and normalizing by the variance squared. Ht: entropy of the audio signal, calculated as the measure of energy dispersion within the signal. VarT: variance, calculated as the average of the squared differences from the mean amplitude value. Skewt: skewness, calculated by taking the average of the cubed deviations from the mean, normalized by the variance raised to the power of 1.5. SNRt: signal-to-noise ratio, calculated as the ratio of the mean power of the signal to the mean power of the background noise. Any feature labelled with a numeric value represents the frequency in hertz at the start of the corresponding 375 Hz bandwidth from which the mean power spectral density (PSD) was calculated.
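For readers who want to see how such amplitude-domain descriptors are obtained, the sketch below re-implements several of the Figure 5 features with NumPy/SciPy. These are illustrative definitions following the caption, not the study's exact extraction code; the toolboxes cited in the references (librosa [85], scikit-maad [87]) provide comparable routines that may differ in detail, and the SNR term is omitted here because it requires a background-noise estimate.

```python
# Illustrative re-implementation of several Figure 5 whistle features (assumed
# definitions following the caption; not the study's exact extraction code).
import numpy as np
from scipy import signal
from scipy.stats import kurtosis, skew

def whistle_features(s, fs, band_hz=375):
    s = np.asarray(s, dtype=float)
    env = np.abs(s)                                   # amplitude envelope (rectified signal)
    p = env**2 / np.sum(env**2)                       # normalized energy distribution
    crossings = np.count_nonzero(np.signbit(s[1:]) != np.signbit(s[:-1]))
    feats = {
        "MEANt": env.mean(),                          # temporal mean amplitude
        "MED": np.median(env),                        # median amplitude
        "VARt": env.var(),                            # amplitude variance
        "SKEWt": skew(env),                           # amplitude skewness
        "KURTt": kurtosis(env, fisher=False),         # fourth-moment kurtosis
        "Ht": -np.sum(p * np.log2(p + 1e-12)) / np.log2(len(p)),  # normalized temporal entropy
        "ZCR": crossings / (len(s) / fs),             # sign changes per second
    }
    # Mean power spectral density in consecutive fixed-width frequency bands,
    # keyed by the band's start frequency in Hz (as labelled in Figure 5).
    f, psd = signal.welch(s, fs=fs, nperseg=min(1024, len(s)))
    for lo in range(0, int(f[-1]), band_hz):
        mask = (f >= lo) & (f < lo + band_hz)
        if mask.any():
            feats[str(lo)] = psd[mask].mean()
    return feats

# Example: a 0.5 s synthetic upsweep standing in for a whistle sampled at 192 kHz.
fs = 192_000
t = np.linspace(0, 0.5, int(0.5 * fs), endpoint=False)
whistle = signal.chirp(t, f0=5_000, f1=20_000, t1=0.5)
print({k: round(v, 4) for k, v in list(whistle_features(whistle, fs).items())[:7]})
```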
Table 1. Operational definitions of health statuses. Health definitions were written in collaboration with MMP on-site veterinarians.
Health Classification | Case Type | Definition

'Normal' | Normal | A focal dolphin had an isolated recording on a date where they did not have any administration of an oral pain medication
AND did not have a veterinary observation pertaining to injury, trauma, or abnormal body appearance
AND did not have any administration of an antimicrobial medication
AND were not administered any gastroprotectant or anti-nausea medication
AND did not have evidence of gastritis
AND did not have any administration of oral analgesics or ophthalmic medications
AND did not have a veterinary observation indicating the presence of an injury, trauma, or abnormal eye appearance.

'Abnormal' * | Critical case | A focal dolphin had an isolated recording on a date where they had a blood value within 48 h of the recording that had a white blood cell count > 14 × 10³/µL (MMP in-house reference ranges)
AND/OR the dolphin received intravenous fluids and/or medications via intravenous or subcutaneous routes
AND/OR a focal dolphin had an isolated recording on a date where they had an erythrocyte sedimentation rate > 35 mm/h within 48 h of the recording
AND received medical intervention from the veterinary team.

'Abnormal' * | Infectious case (non-critical) | A focal dolphin had an isolated recording on a date within the first 10 days of treatment with an antimicrobial
OR from the beginning to the end of the course of antimicrobial therapy if less than 10 days; antimicrobials are prescribed for clinical signs indicating general malaise and/or inappetence
AND bloodwork indicating a likelihood of bacterial and/or fungal infection (specific analytes or combinations of analytes outside of the normal reference range [83,84])
AND/OR a relevant clinical sample is culture positive for pathogenic microbe(s).

'Abnormal' * | Gastrointestinal case | A focal dolphin had an isolated recording on a date where they had a gastric sample score that was greater than one, indicating gastritis,
AND/OR the isolated recording was on a date within the first five days of treatment with a gastrointestinal medication (i.e., gastroprotectant, anti-emetic)
OR between the start and end date of the course of that treatment if less than five days, but was also not prescribed meloxicam (a dolphin may be prescribed a prophylactic gastroprotectant medication when given meloxicam).

* If a dolphin fit into more than one category, the primary case concern was determined by the attending veterinarian.
Table 2. Dolphin characteristics. Characteristics of the 15 focal dolphins and the number of whistles contributed to the study.
Animal ID | Sex | Age | Normal Whistles | Abnormal Whistles | Total Whistles
0 | M | 20 | 1565 | 125 | 1690
1 | F | 45 | 2861 | 934 | 3795
2 | F | 21 | 1091 | 162 | 1253
3 | M | 25 | 924 | 197 | 1121
4 | M | 32 | 778 | 241 | 1019
5 | M | 25 | 1227 | 85 | 1312
6 | F | 42 | 4031 | 2146 | 6177
7 | M | 43 | 4546 | 6837 | 11,383
8 | M | 20 | 2904 | 280 | 3184
9 | F | 11 | 503 | 2117 | 2620
10 | M | 20 | 442 | 24 | 466
11 | F | 18 | 553 | 160 | 713
12 | M | 16 | 368 | 19 | 387
13 | M | 14 | 365 | 22 | 387
14 | F | 41 | 527 | 46 | 573
Total | | | 22,685 | 13,395 | 36,080
Table 3. Metrics, definitions, and equations. Metrics utilized for model analyses, their definition, and the way they were calculated.
Metric | Definition | Equation
Accuracy | Ratio of correct classifications to all classifications | (True Positives + True Negatives)/Total
Precision | Ratio of how often an abnormal classification was true | True Positives/(True Positives + False Alarms)
Recall | Ratio of how often the actual abnormals were correctly classified | True Positives/Total Abnormals
F1 Score | A metric describing the balance between precision and recall | 2(Precision × Recall)/(Precision + Recall)
G-Mean | Balanced metric between sensitivity (true positive rate) and specificity (true negative rate) | √(Sensitivity × Specificity)
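Written out from confusion-matrix counts, the Table 3 metrics reduce to a few lines of arithmetic. The counts in the sketch below are arbitrary placeholders used only to illustrate the formulas, with 'abnormal' treated as the positive class.

```python
# Table 3 metrics computed from illustrative confusion-matrix counts
# (positives = 'abnormal' classifications, negatives = 'normal' classifications).
import math

tp, fp = 420, 180    # abnormal whistles correctly flagged; normal whistles falsely flagged
tn, fn = 1100, 140   # normal whistles correctly passed; abnormal whistles missed
total = tp + fp + tn + fn

accuracy    = (tp + tn) / total
precision   = tp / (tp + fp)          # how often an 'abnormal' classification was true
recall      = tp / (tp + fn)          # sensitivity: share of actual abnormals caught
specificity = tn / (tn + fp)          # share of actual normals correctly classified
f1          = 2 * precision * recall / (precision + recall)
g_mean      = math.sqrt(recall * specificity)

print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  recall={recall:.3f}  "
      f"F1={f1:.3f}  G-mean={g_mean:.3f}")
```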
Table 4. Test data accuracy. Breakdown of test data accuracy by the primary health concern associated with each whistle. Abnormal cases are further broken down into gastrointestinal, infection, and critical. The whistles tested represent the 15% of the data that was held back from the training set.
Primary Concern | Whistles Tested | % Correct
Normal | 3612 | 71.8%
Gastrointestinal | 1126 | 63.4%
Infection | 49 | 40.8%
Critical | 600 | 94.2%
Table 5. Test data results. Results breakdown by dolphin for the test set (i.e., 15% of total whistle data).
Dolphin | Sex | Age | Normal Whistles | Abnormal Whistles | Whistles Tested | % Correct | % True Abnormal | % True Normal | % False Alarm | % Missed Abnormal
0 | M | 20 | 1565 | 125 | 125 | 76.8 | 2.4 | 74.4 | 13.6 | 9.6
1 | F | 45 | 2861 | 934 | 376 | 75 | 16.76 | 58.24 | 14.36 | 10.64
2 | F | 21 | 1091 | 162 | 105 | 85.71 | 3.81 | 81.9 | 7.62 | 6.67
3 | M | 25 | 924 | 197 | 90 | 76.67 | 15.56 | 61.11 | 12.22 | 11.11
4 | M | 32 | 778 | 241 | 87 | 65.52 | 9.2 | 56.32 | 14.94 | 19.54
5 | M | 25 | 1227 | 85 | 114 | 89.47 | 8.77 | 80.7 | 7.02 | 3.51
6 | F | 42 | 4031 | 2146 | 589 | 76.38 | 37.18 | 39.22 | 15.28 | 8.32
7 | M | 43 | 4546 | 6837 | 1320 | 87.58 | 63.41 | 24.17 | 5 | 7.42
8 | M | 20 | 2904 | 280 | 280 | 81.77 | 6.43 | 75.36 | 7.86 | 10.36
9 | F | 11 | 503 | 2117 | 354 | 97.17 | 89.55 | 7.63 | 0 | 2.82
10 | M | 20 | 442 | 24 | 23 | 91.3 | 0 | 91.3 | 0 | 8.7
11 | F | 18 | 553 | 160 | 62 | 82.26 | 17.74 | 64.52 | 6.45 | 11.29
12 | M | 16 | 368 | 19 | 5 | 80 | 0 | 80 | 20 | 0
13 | M | 14 | 365 | 22 | 18 | 72.22 | 0 | 72.22 | 11.11 | 16.67
14 | F | 41 | 527 | 46 | 26 | 92.31 | 7.69 | 84.62 | 0 | 7.69
Table 6. Whistle type breakdown. Results breakdown of the different whistle types for the test set (i.e., 15% of total whistle data).
Whistle Type | Whistles Tested | % Correct | % Incorrect | % True Abnormal | % True Normal | % False Alarms | % Missed Abnormal
Signature Whistle | 2452 | 82.11 | 17.89 | 30.26 | 51.88 | 8.28 | 9.58
Group Whistle | 962 | 61.63 | 38.37 | 17.15 | 44.49 | 26.61 | 11.75
Shared Whistle | 348 | 55.74 | 44.26 | 10.92 | 44.83 | 39.37 | 4.89
Non-Signature Whistle | 1574 | 67.56 | 32.44 | 21.79 | 45.81 | 25.92 | 6.48
Signature Whistle Copy | 51 | 56.86 | 43.14 | 21.57 | 35.29 | 25.49 | 17.65
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
