1. Introduction
Understanding the grazing behavior of ruminants can have a positive impact on productivity, product quality and metabolic well-being. Due to the difficulty of collecting behavioral data through direct observation, which involves following animals at short distances and recording their activities during grazing, alternative detection systems have been developed. Among the various systems, the detection of the head position by means of triaxial accelerometers is probably the method most frequently adopted by researchers. Herbage prehension, chewing, and rumination activities or resting can be recognized using this technique [1,2], although the data are not always reliable due to the incorrect classification of head movements as prehensions [3]. This can lead to overestimation of prehension and rumination movements [4].
Video and audio recording represents another available tool for detecting feeding behavior. Sound studies, mostly conducted on cattle, have often been carried out under extremely controlled conditions, for example, with micro-swards fed to animals for short periods, in conditions of almost no external sounds [4,5,6,7]. Several sound classification systems have been developed with good performances [6,8,9,10,11]. The adaptation of the above classification systems to the ovine species could represent a possible alternative to the creation of new tools. Nevertheless, under less controlled grazing conditions, as occurs in actual sheep farming practice, loud background noises, such as the sound of bells, the wind, or the noise of nearby animals, can make the detection of feeding behavior through audio recording more challenging than in the controlled situations adopted in the abovementioned studies. Jin et al. (2022) [12] emphasize that automatic classification systems designed for cattle should not be adapted to sheep due to behavioral and physiological differences between species.
Few researchers have focused on sheep feeding behavior using sounds. Milone et al. (2009) [6], in highly controlled grazing conditions using micro-swards, showed good performance of an automatic method for recognizing chewing events in sheep. Galli et al. (2020) [4], in similarly controlled conditions, found that accuracy in classifying jaw movements was better for cows than for sheep. Wang et al. (2021) [13] reported a model that classifies foraging behavior in sheep with an accuracy of 93%. However, the classification was based on a very limited dataset, and the experimental conditions were very uncomfortable for the animals. Indeed, the sheep wore a collar connected to a rope, so that the experienced staff affected the sheep’s natural movement during grazing activity.
To the best of our knowledge, only one study available in the literature [14] has been conducted with sheep fed in real grazing conditions and not on artificial swards. These authors found that automatic recognition of jaw movements was more accurate in cattle and goats than in sheep, where the false positive rate was 24% compared to 7 and 4%, respectively. In fact, when sheep graze, sound is often severely disturbed by noises resulting from a different ingestion behavior compared to cattle.
In most studies on the modeling of sounds detected during grazing, preliminary recognition of grasping, chewing, ruminating, and other activities was achieved through video recordings with standard digital cameras. Therefore, to film the animal’s mouth, it was necessary to follow it, risking disturbing its natural feeding behavior. The alternative often used is to film the animals with fixed cameras positioned in front of the animal, but these involve very controlled grazing situations, on specially created artificial swards. The use of point-of-view (POV) cameras overcomes several limitations of traditional video recording, such as poor lighting conditions, distance from the animals, and the potential disturbance caused by following them.
Animal-borne cameras have been widely used to study the behavior of wild animals [15] and domestic cats [16]. To our knowledge, only a few studies have tested the use of these cameras to detect the behavior of grazing sheep, during daylight hours [17] and during the night [18]. The same authors considered this approach the only valid system for detecting grazing animal behavior, demonstrating that POV cameras provide a tool for highlighting the microstructure of the bites in animals grazing on complex grasslands rich in diverse botanical species. For this reason, we believe that the use of POV cameras represents the most effective system to characterize with certainty the feeding behavior of grazing animals.
The aim of the study was to evaluate, using POV cameras, the effectiveness of audio detection in recognizing typical feeding sounds in free-ranging sheep. The study also aimed to assess whether the recognition of these sounds could be influenced by pasture characteristics.
2. Materials and Methods
All of the animals were managed according to the guidelines of the Animal Ethics Committee (O.P.B.A.) of the University of Catania (prot. No. 232046).
The study was conducted using 12 dry Valle del Belice ewes. All sheep grazed on two swards of 0.5 ha with different botanical composition according to the following plan: 10 May, mixed sward with a predominance of grass (G); 13 May, mixed sward with a predominance of legumes (L) (Table 1).
The temperature during grazing hours on 10 and 13 May was 24 and 25 °C, respectively; there was no precipitation and the wind speed ranged between 7.5 and 8.0 km/h on both days.
2.1. Biomass, Chemical and Botanical Composition of Swards
At the beginning of the two grazing days, the herbage within three 0.5 × 0.5 m squares, randomly distributed throughout the grazed areas, was cut at the base of each plant. The cut herbage was weighed, and each species was separated, classified into three classes (grasses, legumes and other essences) and weighed. All samples of the individual species from each 0.5 × 0.5 m square were subjected to near-infrared reflectance spectroscopy according to ISO 12099:2017 [19]. The chemical composition of each plot was then calculated using weighted averages.
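The weighted-average step can be illustrated with a minimal sketch. The text does not state which quantity was used as the weight; the example below assumes that the weight of each species in the cut sample is used, and the function name `weighted_mean` and the numbers are purely illustrative.

```python
def weighted_mean(values, weights):
    """Weighted average of per-species values (e.g. NDF, % DM).

    Assumption: weights are the masses of each species in the cut sample;
    the paper does not specify the weighting variable.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("the sum of the weights must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical plot: two species with NDF of 60 and 40 % DM,
# weighing 300 g and 100 g respectively.
plot_ndf = weighted_mean([60.0, 40.0], [300.0, 100.0])
print(plot_ndf)
```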
2.2. Grazing Behavior Detection
All ewes were fitted with a collar equipped with a POV camera (GOPRO HERO 11, GoPro, Inc., San Mateo, CA, USA) (Figure 1).
The animals were left to graze for about 90 min per day. Each video was automatically segmented by the POV camera into 7–8 videos of approximately 12 min. Each of these short videos was analyzed separately and, therefore, for statistical purposes, it was considered as an experimental unit. A total of approximately 36,000 herbage prehensions were classified.
Selective behavior was assessed for each video by noting the selected species every 60 s. At the end of all the observations, the percentage of grasses, legumes, and other essences was calculated. The selectivity index (SI) for the different forage species, calculated as the ratio between the percentage of selected species and the percentage of the same species present in the pasture, was evaluated according to the classification proposed by Stuth (1991) [20]: SI < 0.7, rejected; SI 0.7–1.3, neutral; SI 1.3–2.5, preferred; SI > 2.5, highly preferred.
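As an illustration of the selectivity index and its classification, the following minimal sketch computes SI and maps it to the four classes. The function names are hypothetical, and the treatment of values falling exactly on 1.3 or 2.5 (assigned here to the lower class) is an assumption, since the ranges as listed share their boundaries.

```python
def selectivity_index(selected_pct, available_pct):
    """SI = percentage of a class in the selected diet / percentage in the sward."""
    if available_pct <= 0:
        raise ValueError("available_pct must be positive")
    return selected_pct / available_pct


def classify_si(si):
    """Map an SI value to the classes listed in the text.

    Assumption: a value exactly equal to 1.3 or 2.5 is assigned to the
    lower class, because the published ranges share these boundaries.
    """
    if si < 0.7:
        return "rejected"
    if si <= 1.3:
        return "neutral"
    if si <= 2.5:
        return "preferred"
    return "highly preferred"


# Hypothetical example: a class making up 40% of the sward but 20% of the diet.
si = selectivity_index(20.0, 40.0)
print(si, classify_si(si))
```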
2.3. Audio File Analysis
In the first phase, all audio files produced by the cameras (without viewing the videos) were listened to by specially trained researchers. By listening to the sounds in WAV format and viewing the graphic representation of the sound wave, sounds recognized as herbage prehension (Figure 2) and rumination activity were highlighted using a dedicated web application, named BioAcoustic Labeler (BAL), specifically developed by our research team at the University of Catania. The tool was designed to visualize the audio signal, annotate prehension and rumination events, and listen to specific segments in a loop. The BioAcoustic Labeler produces CSV files with timestamps indicating the start and end of each activity, which were then used to calculate, for each audio file, the time spent eating, ruminating, and performing other activities during the observation period.
To evaluate whether bites on legumes and grass differ in terms of sound characteristics, 60 audio segments of grass and 50 of legumes were randomly selected by observing the corresponding videos. For each audio segment, a set of temporal and spectral acoustic features was extracted to characterize the prehension events. The Duration of each segment was computed from the annotated start and end timestamps, providing an estimate of the bite length. The Power, in decibels relative to full scale (dBFS), offered a logarithmic measure of the signal energy, independent of normalization. The Pitch, representing the perceived fundamental frequency, was estimated to capture harmonic patterns related to the biting action. The Zero Crossing Rate (ZCR) was used to quantify the number of times the signal crosses zero, serving as an indicator of high-frequency content or signal discontinuity. Each segment was also converted into its frequency domain representation using a spectrogram on a logarithmic scale, allowing for visual comparison of spectral content between classes with a consistent color range from –80 dB to 0 dB (Figure 3).
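To make the feature definitions concrete, a minimal sketch of three of the four features is given here. It is not the extraction code used in the study: it assumes samples scaled to the range −1.0 to 1.0, defines power as the RMS level relative to a full-scale amplitude of 1.0, and expresses the ZCR as the fraction of consecutive sample pairs that change sign. Pitch is omitted because the text does not state which estimator was used.

```python
import math


def duration_s(start_s, end_s):
    """Segment duration from the annotated start and end timestamps (seconds)."""
    return end_s - start_s


def power_dbfs(samples):
    """RMS level in dBFS, assuming samples scaled to [-1.0, 1.0].

    Assumption: 0 dBFS corresponds to an RMS value of 1.0.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms)


def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose sign differs."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


# Hypothetical four-sample segment annotated from 12.30 s to 12.55 s.
segment = [0.5, -0.5, 0.5, -0.5]
print(duration_s(12.30, 12.55), power_dbfs(segment), zero_crossing_rate(segment))
```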
2.4. Video File Analysis
Following audio analysis, all video files were subjected to observation. The aim was to validate the audio file analysis through visual recording of the same behavioral aspects detected with audio recordings alone: number of herbage prehensions and rumination activity, or other activities. Each observed event was highlighted using the same tool used for the audio, juxtaposing and synchronizing each audio file with the corresponding video (Figure 4). This allowed us to identify true positives, false positives, and false negatives.
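The counting of true positives, false positives, and false negatives can be sketched as follows. The matching rule is not described in the text, so the example assumes that an audio-labelled prehension and a video-confirmed prehension refer to the same event when their timestamps differ by no more than a tolerance; the function name `match_events` and the 0.5 s tolerance are hypothetical.

```python
def match_events(audio_times, video_times, tolerance_s=0.5):
    """Return (true_positives, false_positives, false_negatives).

    Assumption: events are matched one-to-one, in time order, when their
    timestamps differ by at most tolerance_s seconds.
    """
    audio = sorted(audio_times)
    video = sorted(video_times)
    i = j = tp = 0
    while i < len(audio) and j < len(video):
        diff = audio[i] - video[j]
        if abs(diff) <= tolerance_s:
            tp += 1  # audio label confirmed by video
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # audio label with no video counterpart
        else:
            j += 1  # video event missed in the audio labelling
    fp = len(audio) - tp
    fn = len(video) - tp
    return tp, fp, fn


# Hypothetical timestamps (seconds) for one short clip.
print(match_events([1.0, 2.0, 5.0], [1.1, 3.5, 5.2]))
```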
The botanical composition of the selected diet was determined by video observation: for each video the choice of botanical species was observed every 1 min. Therefore, for each ewe, this choice was observed approximately 90 times.
2.5. Calculation of Time of Ingestion, Rumination and Other Activities
The estimated (audio) and actual (video) ingestion times were calculated from the CSV files generated by listening to the audio and watching the videos, according to the following criterion: ingestion is characterized by phases dedicated to prehension and rapid, coarse chewing of the bite, interspersed with phases in which the sheep is stationary or moves in search of other areas to graze. The ingestion time was therefore calculated by summing, over all ingestion phases, the time elapsed from the first to the last prehension of the phase, where a phase ends when a prehension is followed by an interval of more than 10 s. This interval was chosen arbitrarily based on the observations made, considering that, in our experimental conditions, the rapid chewing of the bite just ingested does not last beyond this interval. Therefore, it can be assumed that during any interval longer than 10 s, the animal is stationary or moving in search of other pasture areas. All intervals between prehensions exceeding 10 s were considered “other activities.” The time spent ruminating was calculated by adding the rumination periods (from bolus regurgitation and chewing to swallowing), excluding inactivity exceeding 10 s.
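The 10 s criterion for ingestion time can be expressed as a short sketch. It assumes, for simplicity, that each prehension is represented by a single timestamp in seconds (the annotation files store a start and an end for each event); the function name `ingestion_time` is hypothetical.

```python
def ingestion_time(prehension_times, max_gap_s=10.0):
    """Sum the duration of ingestion phases.

    A phase runs from its first to its last prehension and is closed when
    the next prehension occurs more than max_gap_s seconds later.
    Assumption: one timestamp (in seconds) per prehension.
    """
    times = sorted(prehension_times)
    if not times:
        return 0.0
    total = 0.0
    phase_start = prev = times[0]
    for t in times[1:]:
        if t - prev > max_gap_s:
            total += prev - phase_start  # close the current phase
            phase_start = t              # open a new phase
        prev = t
    total += prev - phase_start          # close the last phase
    return total


# Hypothetical clip: two phases (0-5 s and 30-36 s) separated by a 25 s gap.
print(ingestion_time([0.0, 2.0, 5.0, 30.0, 33.0, 36.0]))
```

In this sketch, the 25 s between the two phases would be counted as other activities.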
2.6. Statistical Analysis
The SPSS 26 package was used for statistical analysis. Data relating to ingestion time, rumination time, time spent on other activities, and number of prehensions were subjected to a univariate general linear model, considering the detection method (audio only; audio supported by video) and the type of pasture (grasses; legumes) as fixed factors, as well as their interaction. Correctly detected prehensions (true positives), false positives, false negatives, chemical and botanical composition of the pasture, selected essences, and selectivity indices were subjected to a univariate general linear model, considering the type of pasture (grasses; legumes) as a fixed factor. Duration, Power, Pitch, and Zero Crossing Rate (ZCR) data were analyzed by a univariate general linear model, considering the type of plants (grass or legumes) as a fixed factor.
All data detected by listening to the audio files relating to ingestion and rumination time and number of prehensions were subjected to simple linear regression analysis, using the respective data collected by observing the video files as independent variables.
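For readers who wish to reproduce the regression outside SPSS, the following minimal sketch fits an ordinary least-squares line of the audio estimates on the video observations and returns the coefficient of determination. The function name `linear_fit` and the example values are illustrative and are not data from the study.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x, with R squared.

    x: values observed on video (independent variable)
    y: values estimated from audio alone (dependent variable)
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical prehension counts per clip: video (x) versus audio (y).
video_counts = [400.0, 450.0, 500.0, 550.0]
audio_counts = [390.0, 455.0, 495.0, 560.0]
print(linear_fit(video_counts, audio_counts))
```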
To explore the distribution of the extracted audio features, descriptive statistical analysis of prehension classes (grass and legumes) was performed using boxplots. This approach provides a clear visual representation of each feature across classes, highlighting differences in central tendency, variability, and the presence of potential outliers.
4. Discussion
The subjective framing of POV cameras clearly highlighted all the animals’ behavioral activities without any disturbance that could interfere with their normal behavior. Therefore, in our experimental conditions, the sounds detected in the presence of POV camera videos, clearly identified and classified as prehension, rumination, or other activity, represented real data, while the sounds classified in the absence of videos were considered estimated data. No animals showed behavioral signs of discomfort due to the collar-mounted cameras.
In the majority of studies on grazing animal sounds, the recorders were mounted on the forehead [4,6,13]. In this position, the recorders captured sounds transmitted through the skull during the various mouth activities. This highlighted sounds that were classified as bite, chew, and chew-bite, the latter resulting from chewing during grasping. In our conditions, since the sounds were acquired from the collars, events such as chew-bites or chewing alone, often very recognizable but at other times much less so, were not recognized. This was because, in our case, the vibration of the skull was missing. However, in our opinion, recording sounds from a collar has the advantage of capturing a clearer sound of individual herbage bites, independent of chewing activity. These bites, in fact, represent the most important variable for herbage intake estimation; indeed, herbage intake is calculated as the product of eating time × number of bites × bite mass [21].
Sheep are grazers with a high capacity for selection, especially when using pastures characterized by high biodiversity in terms of botanical composition [22], and therefore, they often penetrate the vegetation in search of the most palatable plants or plant parts. Furthermore, as gregarious animals, they tend to graze very close together, so the noises produced by neighboring animals can make the recognition of feeding sounds challenging.
In our experimental conditions, the noises associated with the selective activity for preferred species produced strong rustling sounds, which nevertheless did not prevent the recognition of prehensions. Furthermore, even the grasping noises of adjacent ewes did not prevent us from recognizing the herbage prehensions. The weather conditions during the 2 days of recordings, characterized by a lack of rain and a light wind, were particularly favorable and did not interfere with the acquisition of grazing sounds. However, based on our previous experience with grazing cattle [11] in wind reaching up to 48 km/h, we believe that even the presence of wind should not pose any problems in detecting prehension sounds. Despite the very loud noise recorded in that case, the sound of the herbage prehensions, which stands out for its peculiar characteristics even against loud background noise, was still audible. This is because the grasping sound is recorded at a very short distance from the animal’s mouth.
The R² values obtained from the regression analysis between the values estimated using sound alone and the actual values monitored through video, together with the absence of significant differences between mean values acquired with audio and video, demonstrate that listening to the sounds obtained through a recorder placed in the animal’s collar provides a reliable estimate of eating and rumination time, as well as the number of prehensions.
The second objective of the study was to evaluate whether the different types of pasture used on the two days of observation (grass sward vs. legume sward) could influence the recognition of grazing activity events (prehension, rumination, other activities). This hypothesis was formulated during the sound labeling phase; in fact, the noise of tearing grass generally seemed more intense and different from the sound perceived when the sheep ingested legumes or other essences. Observing the video together with the sound allowed us to note that when the sheep chose grass, the bite consisted of a group of stems, which are characterized by a higher level of fiber than legumes (NDF value: 63.5 vs. 35.8% DM in grass and legume forages, respectively), whereas when the sheep chose legumes, the grasps were mostly directed at individual stems or leaves. Each grass bite therefore produced a rapid succession of sounds due to the severing of its individual stems. A legume bite, on the other hand, consisted of the severing of an individual leaf or stem. The hypothesis was therefore that prehensions of grasses could be easier to recognize than those of legumes.
The two swards were very different from each other in terms of both available biomass and botanical composition, with consequent differences in chemical composition as well. However, statistical significance of the differences between pastures was achieved only for a few parameters (percentage of legume forages, WSC, and fat). The lack of statistical significance for all other chemical-nutritional parameters of the two pastures is probably due to the high variability found between the different samples collected within each pasture.
The acoustic analysis of sheep prehension events on grass and legume forages, based on ANOVA results and on boxplot inspection of key audio features, reveals systematic differences between the two classes of sward. These differences pertain to temporal, spectral, and energy-related aspects of the recorded bite events. Segment duration emerged as a key differentiating factor. Legume bites were longer and more variable. In contrast, the grass bites showed more consistent, shorter events, indicating quicker and more stereotyped prehensions. Pitch analysis showed higher fundamental frequencies in the grass class, possibly due to more abrupt or forceful mandibular activity, while the legume class exhibited lower and more stable pitch values. This could reflect a smoother, less percussive prehension process when dealing with softer legumes. Power (dBFS) also showed clear separability between bite classes. The grass class exhibited both higher average power and greater variability, potentially due to more vigorous chewing or denser plant structure. Notably, while the current analysis was conducted on sheep, previous studies in cattle reported partially contrasting observations. For instance, Galli et al. (2020) [
4], although not detecting significant differences between grasses and legumes, noted that chewing sounds associated with legumes tended to exhibit greater energy and shorter duration. Li et al. (2021) [
23] found that tall fescue produced bite sounds with greater amplitude and duration compared to alfalfa. Our results, which indicate higher energy and shorter durations for grass forage prehensions, align only partially with these trends, suggesting that species-specific chewing strategies, forage texture, and bite mechanics may play a crucial role in shaping the acoustic profiles. These discrepancies highlight the importance of species-specific investigations and reinforce the need for dedicated acoustic datasets in sheep. Despite the observed differences in the acoustic characteristics between grass and legume prehensions, no significant differences were found between the two swards in the recognition of the different grazing sounds, such as eating, rumination, and other activities. This may be partially explained by the intense selective activity towards legumes also observed in the sward with a prevalence of grasses. Grass was the least selected (rejected according to the classification proposed by Stuth (1991) [20]) in both swards, despite the high percentage present in the grass-rich one. Legumes were highly preferred in the grass-rich sward where, despite the minimal percentage present in the pasture (3.4%), the selection reached 10.1%, corresponding to a selectivity index of 2.96. The other species were preferred in both forage fields, compared to the available percentages.
Based on these results, it cannot be stated with certainty that the presence of grasses can improve the recognition of herbage prehensions, as initially hypothesized. Statistical analysis showed no effect of pasture type on ingestion and rumination times or on the number of prehensions.