An Intelligent Monitoring System for Sheep Behavior Based on ActiGraph Sensors

Ghadir, Setayesh; Ghadir, Delaram; Mehari Berhe, Tesfalem; Adami, Davide; Giordano, Stefano; Pagano, Michele; Rossi, Pietro; Sotgiu, Francesca Daniela; Mossa, Francesca; Berlinguer, Fiammetta

doi:10.3390/network6020031

Open Access Article

by Setayesh Ghadir ¹, Delaram Ghadir ¹, Tesfalem Mehari Berhe ¹, Davide Adami ¹, Stefano Giordano ², Michele Pagano ²,*, Pietro Rossi ³, Francesca Daniela Sotgiu ³, Francesca Mossa ³ and Fiammetta Berlinguer ³

¹ Consorzio Nazionale Interuniversitario per le Telecomunicazioni (CNIT), Department of Information Engineering, University of Pisa, 56122 Pisa, Italy

² Department of Information Engineering, University of Pisa, 56122 Pisa, Italy

³ Department of Veterinary Medicine, University of Sassari, 07100 Sassari, Italy

* Author to whom correspondence should be addressed.

Network 2026, 6(2), 31; https://doi.org/10.3390/network6020031

Submission received: 10 February 2026 / Revised: 29 April 2026 / Accepted: 12 May 2026 / Published: 20 May 2026

Abstract

Continuous and objective monitoring of livestock behavior plays a key role in precision farming, animal welfare assessment, and reproductive management. This study proposes a non-invasive framework for sheep behavior and reproductive activity monitoring that integrates wearable actigraphy, machine learning, and a cloud-based data processing architecture. Tri-axial accelerometer data were collected at 30 Hz using collar-mounted ActiGraph sensors under real farming conditions. Raw acceleration signals were processed without temporal aggregation, preserving full temporal resolution; the resulting feature set comprises axis-specific acceleration, vector magnitude, and delta vector magnitude. Several supervised learning models were evaluated for behavior classification, including BLSTM, LSTM, CNN–BLSTM, Random Forest, and Support Vector Machine, targeting behaviors such as standing, walking, grazing, lying, flehmen, and mating. The results indicate that both deep learning and classical machine learning approaches achieve high classification performance, with Random Forest obtaining an overall accuracy of 0.82, while deep sequential models effectively capture temporal patterns and behavioral transitions. Furthermore, a scalable cloud architecture is introduced to automate data ingestion, preprocessing, inference, storage in InfluxDB, and visualization through an interactive web application. The proposed framework supports continuous monitoring and offers practical tools for precision livestock management.

Keywords: precision livestock farming (PLF); sheep behavior classification; tri-axial accelerometry (TA); wearable actigraphy; machine learning (ML); bidirectional long short-term memory (BLSTM); cloud-based data pipeline

1. Introduction

Effective monitoring of animal behavior is essential for ensuring animal welfare, optimizing reproductive performance, and enabling timely health interventions in precision livestock systems. In sheep, behavioral changes often provide early indications of physiological states such as estrus, stress, illness, or parturition. Continuous and objective observation of these changes is therefore critical, yet traditional visual monitoring methods are labor-intensive, subjective, and unsuitable for large-scale or extensive farming systems. Recent advances in wearable sensing technologies [1] and artificial intelligence offer new opportunities for automated and objective assessment of livestock behavior under both controlled and real-world farming conditions, particularly through the analysis of inertial sensor data [2].

Among wearable sensing modalities, tri-axial accelerometers [3] have proven particularly valuable due to their portability, low power consumption, and ability to capture high-resolution motion dynamics. Accelerometer-based actigraphy enables continuous recording of posture, locomotion, and fine-scale activity patterns that are difficult to observe manually. Barwick et al. [4] demonstrated that tri-axial accelerometers can be effectively used to classify sheep activities under grazing conditions. When combined with advanced analytical techniques such as feature extraction, clustering, and supervised learning, accelerometer-derived representations can discriminate between postures and activities, including standing, walking, grazing, and mating. More recent studies have shown that preserving the temporal structure of accelerometer signals is crucial for recognizing short-duration and transitional behaviors, which are often smoothed out by coarse temporal aggregation [5].

This capability is particularly relevant for reproductive management in sheep, where detecting courtship and mating-related behaviors plays a critical role in breeding efficiency. Automated behavior recognition approaches have been explored to identify health-related activities in grazing livestock [6]. In addition, accelerometer data have been successfully used to evaluate the sexual activity of rams, reinforcing the feasibility of sensor-based detection of reproductive behaviors [7]. Sexual performance traits in rams have been shown to be heritable and repeatable, highlighting mating behavior as a valuable indicator for selective breeding and reproductive efficiency [8].

Despite significant progress, several challenges remain. Accelerometer-based systems must be further validated under extensive farming conditions, and their robustness in recognizing rare, short-duration, and transitional activities requires improvement. Issues such as class imbalance, inter-animal variability, and sensor placement variability continue to limit the generalization of machine learning models [9]. Moreover, many existing studies focus on offline data analysis and lack integration with scalable infrastructures capable of supporting automated data ingestion, inference, and visualization.

From a system-level perspective, deploying accelerometer-based behavior monitoring in precision livestock farming requires an end-to-end architecture capable of handling continuous sensor data streams, automating data preprocessing and inference, and delivering results through scalable services. Recent cloud-centric IoT architectures demonstrate how data ingestion, stream processing, and machine learning pipelines can be integrated to support automated analytics and timely decision-making in operational environments [10].

To address these challenges, this work investigates the application of collar-mounted ActiGraph wGT3X-BT accelerometers (ActiGraph LLC, Pensacola, FL, USA) combined with full-resolution signal processing, unsupervised clustering, and supervised learning to characterize sheep activity and reproductive behavior. By analyzing motion patterns at multiple sampling frequencies and evaluating behavioral classification using both classical machine learning and deep sequential models, this study demonstrates the potential of wearable actigraphy as a non-invasive measurement tool for precision welfare assessment and reproductive monitoring in sheep. Furthermore, the proposed approach is designed to support integration with cloud-based data pipelines, enabling scalable processing and practical deployment in precision livestock systems.

The remainder of this paper is organized as follows. Section 2 presents the materials and data processing pipeline, including the sensor system, data collection procedures, data preprocessing, feature construction, and learning framework. Section 3 describes the architecture of the proposed end-to-end IoT- and AI-driven sheep monitoring system, covering network communication, cloud-based data processing, and intelligent user interaction. Section 4 reports the experimental results and discussion, including feature-level analysis, a comparative evaluation of the considered machine learning and deep learning models, and a dedicated subsection addressing the limitations of the study. Finally, Section 5 concludes the paper and outlines directions for future research.

2. Materials and Data Processing

2.1. Sensor System and Local Data Collection

Sheep behavior data were collected on a farm in Sardinia, Italy, using collar-mounted ActiGraph wGT3X-BT sensors (ActiGraph LLC, Pensacola, FL, USA), each equipped with a tri-axial accelerometer and positioned around the neck of the animal [11].

The experimental group consisted of Suffolk rams (aged 1.5–5 years) and Suffolk ewes. All animals were clinically healthy and maintained under standard farm conditions, with ad libitum access to food and water. When available, body condition was assessed according to standard veterinary scoring systems to ensure a homogeneous physiological status among individuals. The sensors recorded raw acceleration signals along the x, y, and z axes at a sampling frequency of 30 Hz, ensuring high temporal resolution suitable for fine-grained activity recognition. Data acquisition can be performed locally on the devices during normal farm operations, enabling non-invasive and continuous monitoring of animal movement. After the recording phase, raw sensor data were downloaded using the ActiLife software (version 7.3.0), which provides access to ActiGraph data formats and associated metadata.

ActiLife is widely used for ActiGraph data extraction and preprocessing, supporting consistent handling of raw tri-axial accelerometer recordings and device information [12]. Moreover, raw data can be sent to the cloud using the network architecture detailed in Section 3.1.

For the analysis and data labeling process, Table 1 summarizes the validated video recordings used for behavioral annotation. The dataset comprises multiple video segments collected from five rams and two ewes, with individual recordings lasting approximately 10–17 min. It is important to note that performance metrics are computed at the window level rather than the event level. Therefore, even behaviors with a small number of annotated events contribute multiple samples to the evaluation, enabling the calculation of meaningful accuracy values. For instance, each Flehmen event has an average duration of approximately 10 s, and when segmented into fixed-length windows, a single event may produce multiple window-level samples used for training and evaluation.

For ram recordings, a total of 1 h 44 min 55 s of valid footage was retained. Within this duration, 20 mating events and 7 flehmen events were manually annotated. These reproductive behaviors were observed exclusively in rams, while ewe recordings were included only for general activity reference and therefore contain no mating or flehmen annotations. Overall, the table provides an overview of the temporal coverage and frequency of reproductive behaviors in the cleaned dataset used for subsequent analysis.

The dataset thus covered a total duration of 6295 s, recorded at a sampling frequency of 30 Hz, resulting in 188,850 raw accelerometer samples, each composed of tri-axial acceleration values. After preprocessing, the raw time series were converted into fixed-length representations using 1-s windows. Each window contained 150 features: 30 samples for each of the three acceleration axes, plus 30 values for each of two derived features, namely the vector magnitude (VM), which captures the overall movement intensity, and the delta vector magnitude (ΔVM), which represents temporal changes in motion intensity between consecutive samples. The mathematical formulation of VM and ΔVM is detailed in the following section. The final dataset therefore consisted of 6295 samples suitable for supervised learning.
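The dataset bookkeeping above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the constant names are hypothetical, and it assumes non-overlapping 1-s windows, which is what the reported window count implies.

```python
# Dataset bookkeeping for fixed-length windows.
# Assumption: windows do not overlap (one window per recorded second).
SAMPLING_RATE_HZ = 30                      # sensor sampling frequency
DURATION_S = 1 * 3600 + 44 * 60 + 55       # 1 h 44 min 55 s of valid footage
WINDOW_S = 1                               # window length in seconds
CHANNELS = 5                               # x, y, z, VM, delta-VM

raw_samples = DURATION_S * SAMPLING_RATE_HZ
samples_per_window = WINDOW_S * SAMPLING_RATE_HZ
n_windows = raw_samples // samples_per_window
features_per_window = samples_per_window * CHANNELS

print(DURATION_S, raw_samples, n_windows, features_per_window)
# prints: 6295 188850 6295 150
```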

Behavioral labels were defined based on established ethological descriptions and used for manual video annotation. Video annotation criteria, including synchronization between sensor and video data, followed procedures similar to those used in previous sheep accelerometer studies [13]. The annotated behaviors were grouped according to animal type (rams and ewes).

For rams, reproductive-related behaviors included mating and flehmen. Mating was characterized by the male projecting forward with the chest contacting the ewe’s rump, or by intromission accompanied by pelvic thrusts and followed by a brief period of disinterest. Flehmen was identified by inhalation with the upper lip curled upward, exposing the incisors.

Behaviors common to both rams and ewes included walking, defined as slow forward locomotion with independent hoof placement; grazing, corresponding to foraging activity while standing or moving with the head lowered; standing, characterized by a stationary upright posture supported by all four legs; and lying, defined as a recumbent position on the ground.

2.2. Data Preprocessing and Feature Construction

Raw acceleration signals were processed to generate structured samples suitable for supervised learning. Each data sample corresponds to a 1-s time window [14], containing 30 consecutive measurements per axis (30 × x, 30 × y, and 30 × z), reflecting the original sensor frequency.

In addition to the axis-specific acceleration values, two derived features were computed, so that each timestep is described by the following quantities:

Raw acceleration signals: $x_{i}$, $y_{i}$, and $z_{i}$, representing the acceleration components along the three orthogonal axes at sample $i$.
Vector Magnitude ($VM$): The Euclidean norm of the tri-axial acceleration vector [15], computed as:

${VM}_{i} = \sqrt{x_{i}^{2} + y_{i}^{2} + z_{i}^{2}},$

which quantifies the overall intensity of movement independently of sensor orientation.
Delta Vector Magnitude ($\Delta VM$): The absolute difference between consecutive vector magnitude values, defined as:

$\Delta {VM}_{i} = |{VM}_{i} - {VM}_{i - 1}|,$

capturing abrupt motion changes and transitions between behavioral or postural states.

These features enhance sensitivity to changes in motion intensity and transitions between behavioral states. The resulting representation retains full temporal resolution while enabling efficient model training.
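As a minimal sketch of the two formulas, the function below builds the five per-timestep values for one window. The function name is hypothetical, and the handling of the first sample is an assumption: the text does not say how $\Delta {VM}_{i}$ is defined when no previous sample exists inside the window, so it is set to zero here.

```python
import math

def window_features(x, y, z):
    """Return per-timestep rows [x, y, z, VM, dVM] for one window."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("axes must have the same length")
    # Vector magnitude: Euclidean norm of each tri-axial sample.
    vm = [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]
    # Assumption: the first sample has no predecessor, so its delta is 0.
    dvm = [0.0] + [abs(vm[i] - vm[i - 1]) for i in range(1, len(vm))]
    return [list(row) for row in zip(x, y, z, vm, dvm)]

rows = window_features([3.0, 0.0, 0.0], [4.0, 0.0, 1.0], [0.0, 1.0, 0.0])
print([r[3] for r in rows])  # VM values:  [5.0, 1.0, 1.0]
print([r[4] for r in rows])  # dVM values: [0.0, 4.0, 0.0]
```

For a real 1-s window the three input lists would each hold 30 values, giving a 30 × 5 array.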

2.3. Supervised Learning Framework

Supervised learning techniques were employed to classify sheep behaviors using the preprocessed tri-axial accelerometer data. Behavioral labels were assigned through expert annotation and were aligned with visually observed activities, ensuring consistency between sensor measurements and ground-truth behavior.

Livestock behavior datasets are typically characterized by significant class imbalance due to the unequal duration and frequency of activities. To mitigate this issue and improve model generalization, oversampling techniques were applied exclusively to the training data. This strategy prevented bias toward dominant classes while preserving the integrity of the validation and test sets. Multiple machine learning and deep learning models were evaluated to capture both short-term motion patterns and longer temporal dependencies. Model performance was assessed using standard classification metrics, including accuracy, precision, recall, and F1-score.
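For reference, the metrics named above can be computed per class from window-level labels as follows. This is a generic sketch with a hypothetical function name and toy labels, not the evaluation code used in the study.

```python
def classification_report(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report

# Toy example: one standing window is misclassified as grazing.
y_true = ["standing", "standing", "grazing", "mating"]
y_pred = ["standing", "grazing", "grazing", "mating"]
accuracy, report = classification_report(y_true, y_pred)
print(accuracy)            # 0.75
print(report["grazing"])   # precision 0.5, recall 1.0
```

Reporting per-class values matters here because overall accuracy is dominated by the frequent classes.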

2.3.1. Supervised Learning Framework and Model Configuration

Supervised learning models are used to classify sheep behaviors from raw tri-axial accelerometer windows. Each sample corresponds to a 1-s segment at 30 Hz and is represented as a multivariate sequence of length 30. For each timestep, five features were used: the three axis accelerations $(x, y, z)$, $VM$, and $\Delta VM$. Consequently, the final neural network input tensor has shape $30 \times 5$.

To address the strong imbalance between rare behaviors (e.g., flehmen) and frequent behaviors (e.g., standing, grazing), oversampling was applied using the Synthetic Minority Over-sampling Technique (SMOTE) [14].

SMOTE was applied in the z-score-normalized feature space, where synthetic samples of minority behaviors were generated through k-nearest-neighbor interpolation of entire temporal sequences. This approach increases the representation of under-observed behaviors, such as flehmen, without duplicating existing samples. The number of nearest neighbors was selected conservatively to accommodate classes with very limited samples. Overall, this strategy improves the robustness of the learning process under highly imbalanced conditions. As a result, minority classes such as mating and flehmen were increased to 681 samples each (8.69%), while the majority class (standing) remained dominant at 43.45%, preserving a realistic distribution.
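The interpolation idea behind SMOTE can be sketched in plain Python. This is a simplified illustration rather than the implementation used in the study: the function name, the default `k`, and the seed are assumptions, and each minority sample is assumed to be a flattened, already normalized window.

```python
import random

def smote_like(minority, n_new, k=3, seed=42):
    """Create n_new synthetic vectors by interpolating a randomly picked
    minority sample towards one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)   # small classes limit the neighbourhood
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()          # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with four 3-dimensional samples.
flehmen = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5], [1.0, 1.0, 3.0], [0.2, 0.8, 2.2]]
new_samples = smote_like(flehmen, n_new=6)
print(len(new_samples))  # 6
```

Because every synthetic vector lies on a segment between two real samples, it stays inside the range already covered by the minority class instead of copying an existing window.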

Five classifiers were evaluated: Bidirectional Long Short-Term Memory (BLSTM), Long Short-Term Memory (LSTM), a hybrid CNN–BLSTM architecture, Random Forest (RF), and Support Vector Machine (SVM).

The evaluated classifiers include both deep learning and traditional machine learning approaches, selected to represent complementary learning paradigms for time-series behavior recognition. The LSTM network is a recurrent architecture specifically designed to model long-term temporal dependencies through gated memory units, making it suitable for sequential accelerometer signals where behavior patterns evolve over time [16]. The BLSTM extends this capability by processing the input sequence in both forward and backward directions, enabling the model to exploit past and future contextual information and improving discriminability in behaviors with similar local motion characteristics [17,18].

The CNN–BLSTM architecture combines convolutional and recurrent processing, where the CNN layers act as a feature extractor that captures local temporal patterns and invariant representations, while the BLSTM layer models higher-level temporal dependencies. This hybrid design is widely adopted in sensor-based activity recognition, as it leverages both short-term feature learning and long-term temporal modeling [19,20].

In contrast, the Random Forest classifier is an ensemble learning method based on aggregating multiple decision trees trained on randomized subsets of the data and features. This strategy improves generalization, reduces overfitting, and often performs strongly on structured and engineered feature spaces, which contributes to its robust and balanced performance in the current study [21]. Finally, the SVM is a margin-based classifier that seeks an optimal separating hyperplane in a transformed feature space, and nonlinear kernels allow it to capture complex decision boundaries. SVMs remain a competitive baseline for activity recognition tasks, particularly when classes are well-separated in the feature space [22]. Deep models were trained using categorical cross-entropy loss and the Adam optimizer, and their performance was monitored on a validation split. All experiments used a fixed train/validation/test split of 60%/20%/20% ensuring reproducibility across models. The dataset consists of 6295 samples, which were split into 3777 training samples (60%), 1259 validation samples (20%), and 1259 test samples (20%).
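The reported split sizes follow directly from the 60%/20%/20% proportions. The sketch below shows one way to obtain a fixed, reproducible split; the shuffling strategy and the seed value are assumptions, since the text does not state how windows were assigned to each subset.

```python
import random

def split_indices(n, seed=42):
    """Shuffle indices 0..n-1 with a fixed seed and split them 60/20/20."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n * 60 // 100          # integer arithmetic avoids float rounding
    n_val = n * 20 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    held_out = indices[n_train + n_val:]
    return train, val, held_out

train_idx, val_idx, test_idx = split_indices(6295)
print(len(train_idx), len(val_idx), len(test_idx))
# prints: 3777 1259 1259
```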

2.3.2. Deep Learning Architectures

Recurrent neural network-based models have been shown to be effective for classifying livestock behavior directly from accelerometer time series, motivating the use of LSTM-based architectures in this study [23]. Three deep neural architectures were evaluated to model temporal patterns in accelerometer sequences: LSTM, BLSTM, and CNN–BLSTM. All networks used the same input representation consisting of 30 timesteps and 5 features per timestep.

LSTM Model

The LSTM classifier consists of a single LSTM layer with 64 hidden units followed by dropout regularization ($p = 0.3$), a fully connected layer with 64 neurons using ReLU activation, and a final softmax layer producing class probabilities. The LSTM uses the final hidden state (i.e., return_sequences=False).

BLSTM Model

The BLSTM model mirrors the LSTM configuration but replaces the recurrent layer with a bidirectional LSTM with 64 units, enabling the network to learn dependencies in both forward and backward temporal directions. This is followed by dropout ($p = 0.3$), a 64-unit ReLU dense layer, and a softmax output layer.

CNN–BLSTM Model

The hybrid CNN–BLSTM architecture combines convolutional feature extraction with bidirectional temporal modeling. The network begins with a 1D convolution layer (64 filters, kernel size 3, ReLU activation), followed by max pooling (pool size 2) and dropout ($p = 0.3$). A second convolution layer (128 filters, kernel size 3) is applied, again followed by max pooling and dropout. The extracted temporal feature maps are then passed to a bidirectional LSTM layer with 64 units, followed by dropout and two dense layers (64-unit ReLU and softmax output).
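To see how the convolution and pooling stages shorten the 30-step input before it reaches the bidirectional LSTM, the sequence length can be traced layer by layer. The sketch assumes unpadded convolutions with stride 1 and pooling with stride equal to the pool size; the padding mode is not stated in the text, so the resulting lengths are illustrative only.

```python
def conv1d_len(n, kernel):
    """Output length of an unpadded, stride-1 1D convolution."""
    return n - kernel + 1

def maxpool1d_len(n, pool):
    """Output length of max pooling with stride equal to the pool size."""
    return n // pool

def cnn_trace(steps=30, features=5):
    """Return (layer, timesteps, channels) for the convolutional front end."""
    trace = [("input", steps, features)]
    steps = conv1d_len(steps, 3)
    trace.append(("conv1d_1", steps, 64))
    steps = maxpool1d_len(steps, 2)
    trace.append(("maxpool_1", steps, 64))
    steps = conv1d_len(steps, 3)
    trace.append(("conv1d_2", steps, 128))
    steps = maxpool1d_len(steps, 2)
    trace.append(("maxpool_2", steps, 128))
    return trace

for layer, steps, channels in cnn_trace():
    print(f"{layer:10s} {steps:3d} x {channels}")
# Timesteps go 30 -> 28 -> 14 -> 12 -> 6 under these assumptions.
```

Under these assumptions the recurrent layer receives 6 timesteps of 128 channels each; with zero-padded ("same") convolutions it would receive 7 timesteps instead.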

For all deep models, training was performed for 40 epochs using a batch size of 64.

2.3.3. Classical Machine Learning Models

In addition to deep learning models, two classical classifiers were evaluated on the standardized feature set.

Random Forest (RF)

The RF classifier was trained using 300 decision trees (n_estimators=300) with bootstrap aggregation enabled and out-of-bag estimation (bootstrap=True, oob_score=True). No explicit maximum depth constraint was imposed (max_depth=None). Training used parallel execution (n_jobs=-1) and a fixed random seed (random_state=42).

Support Vector Machine (SVM)

The SVM classifier used a radial basis function kernel (kernel='rbf') with regularization parameter $C = 10$ and gamma='scale'. Probability estimation was enabled for probabilistic outputs.
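The hyperparameters listed above map directly onto scikit-learn estimators. The following sketch shows one possible configuration; the helper name and the synthetic data are assumptions introduced only to make the example runnable, and each window is assumed to be flattened to 150 standardized features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_classifiers():
    """Instantiate RF and SVM with the hyperparameters reported in the text."""
    rf = RandomForestClassifier(
        n_estimators=300, bootstrap=True, oob_score=True,
        max_depth=None, n_jobs=-1, random_state=42,
    )
    svm = SVC(kernel="rbf", C=10, gamma="scale", probability=True)
    return rf, svm

# Synthetic stand-in data: 120 flattened windows, 3 well-separated classes.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 40)
X = rng.normal(size=(120, 150)) + y[:, None]
X_std = StandardScaler().fit_transform(X)

rf, svm = build_classifiers()
rf.fit(X_std, y)
svm.fit(X_std, y)
print(rf.oob_score_)                     # out-of-bag accuracy estimate
print(svm.predict_proba(X_std).shape)    # (120, 3)
```

In practice the scaler would be fitted on the training split only and then applied to the validation and test splits.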

During manuscript preparation, ChatGPT-5.2 was used exclusively for language proofreading and assistance in generating graphical illustrations.

3. Architecture of the End-to-End IoT and AI-Driven Sheep Monitoring System

3.1. Network Communication and Data Acquisition Architecture

Tri-axial accelerometer data generated by the ActiGraph wGT3X-BT sensor impose a continuous uplink requirement that is incompatible with LoRaWAN in practical deployments. According to the ActiGraph user manual, acceleration data are sampled using a 12-bit analog-to-digital converter (ADC) at user-defined rates starting from 30 Hz and stored as raw, non-filtered measurements [24]. Assuming the ADC resolution is preserved for transmission, a single 3-axis sample corresponds to $3 \times 12 = 36$ bits. Therefore, at 30 Hz the raw stream produces $36 \times 30 = 1080$ bits/s, that is, $1080 / 8 = 135$ bytes/s, corresponding to about 8.1 KB/min, 0.49 MB/h, and 11.6 MB/day of raw accelerometer traffic (excluding protocol overhead). While LoRaWAN physical-layer data rates may theoretically reach values on the order of tens of kbps depending on modulation parameters and spreading factor, the sustainable application throughput is constrained by payload size, network capacity, and regulatory limitations [25].
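The data-rate estimate can be reproduced step by step. The sketch assumes decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes) and ignores protocol overhead, as in the text; the variable names are hypothetical.

```python
ADC_BITS = 12        # bits per axis sample
AXES = 3             # x, y, z
RATE_HZ = 30         # sampling frequency

bits_per_sample = ADC_BITS * AXES             # 36 bits
bits_per_second = bits_per_sample * RATE_HZ   # 1080 bits/s
bytes_per_second = bits_per_second // 8       # 135 bytes/s

kb_per_minute = bytes_per_second * 60 / 1_000
mb_per_hour = bytes_per_second * 3_600 / 1_000_000
mb_per_day = bytes_per_second * 86_400 / 1_000_000

print(bits_per_sample, bits_per_second, bytes_per_second)
print(kb_per_minute, mb_per_hour, mb_per_day)
```

With these units the daily volume evaluates to 11.664 MB per device, in line with the figure of roughly 11.6 MB/day quoted in the text.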

In particular, LoRaWAN is optimized for low-duty-cycle transmissions with small packets rather than continuous streaming; thus, transmitting raw 30 Hz actigraphy would require frequent uplinks and excessive airtime occupancy, making it unsuitable for real-time raw accelerometer transfer. LoRaWAN is therefore better suited for transmitting low-rate summaries, extracted features, or event-based notifications than raw high-frequency actigraphy streams.

For these reasons, cellular connectivity (4G/LTE and 5G) is a suitable solution for stable real-time streaming of raw actigraphy data to the cloud. Cellular networks provide sufficient bandwidth, low latency, and robust outdoor coverage, enabling uninterrupted data ingestion and real-time monitoring without the limitations of low-power wide-area networks.

The ActiGraph wGT3X-BT sensor is a wearable tri-axial accelerometer that provides high-resolution motion measurements but does not include direct wide-area connectivity (e.g., Wi-Fi or cellular). Instead, it relies on Bluetooth Low Energy (BLE) for short-range communication; therefore, raw accelerometer samples cannot be uploaded directly to the cloud.

To enable streaming, as shown in Figure 1, a Bluetooth-to-cellular IoT gateway was adopted. The sensor transmits raw acceleration measurements (x, y, z) via BLE to a nearby gateway, which buffers and packages the samples into time-stamped packets (e.g., 1-s chunks) and forwards them to the cloud through a 4G/LTE or 5G uplink for storage, inference, and visualization.
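The buffering step performed by the gateway can be sketched as follows. The packet layout, field names, and function name are hypothetical; the text only specifies that samples are grouped into time-stamped chunks of about 1 s before being forwarded.

```python
import json

def packetize(samples, start_ts, device_id, rate_hz=30):
    """Group (x, y, z) samples into complete 1-s packets.

    A trailing chunk with fewer than rate_hz samples is not emitted,
    on the assumption that it stays in the buffer until it fills up.
    """
    packets = []
    for k in range(0, len(samples) - rate_hz + 1, rate_hz):
        packets.append({
            "device_id": device_id,
            "t_start": start_ts + k // rate_hz,   # seconds since epoch
            "rate_hz": rate_hz,
            "samples": [list(s) for s in samples[k:k + rate_hz]],
        })
    return packets

# 75 buffered samples yield two complete packets; 15 samples remain buffered.
buffered = [(0.01 * i, 0.0, 1.0) for i in range(75)]
packets = packetize(buffered, start_ts=1_767_225_600, device_id="collar-01")
payload = json.dumps(packets[0])   # serialized body for the cellular uplink
print(len(packets), len(packets[0]["samples"]))
# prints: 2 30
```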

It is important to note that BLE is used only for short-range communication between the sensor and a nearby gateway device, rather than for long-range transmission across the entire pasture. The experimental data collection was conducted in a controlled farm environment with a relatively limited and partially indoor area, where BLE communication is feasible and reliable. Data are transmitted intermittently when the sensor is within range of the gateway, minimizing power consumption while ensuring efficient data transfer.

3.2. Cloud-Based Application Architecture and Automated Data Processing Pipeline

A cloud-based application architecture was designed to support automated data processing [26], inference, and visualization. The system integrates:

GitHub repositories for version control and workflow automation;
Cloud storage (Google Drive) for intermediate data handling;
InfluxDB as a time-series database for storing raw and predicted behavioral data;
Streamlit [27] for deploying an interactive web application accessible to end users.

Automated pipelines transform raw sensor data into structured inputs, apply trained models for behavior prediction, and store results in the database. The web application enables users to visualize both raw signals and inferred behaviors in real time, supporting practical decision-making for farmers and veterinarians. Several recent works have proposed end-to-end IoT analytics and real-time processing pipelines that integrate scalable messaging systems, stream processing, and machine learning inference for industrial applications [10].

To assess the practical feasibility of the proposed architecture, system-level aspects were considered. The cloud infrastructure is designed to operate on standard virtualized environments (e.g., multi-core CPU, ≥8 GB RAM, SSD-based storage), enabling efficient ingestion and processing of time-series data using InfluxDB 3. Data processing is performed in near real-time using batch-based ingestion (1-s windows). The end-to-end latency, including preprocessing and inference, ranges from sub-second to a few seconds, depending on network conditions. This level of performance is sufficient for livestock monitoring applications, where continuous observation rather than strict real-time response is required. The architecture supports concurrent data streams from multiple devices and users through asynchronous processing and scalable storage. Given the average data rate of approximately 11.6 MB/day per device (the raw 30 Hz stream estimated in Section 3.1), the system can scale to multiple animals by increasing computational resources or distributing workloads across multiple instances. The modular design enables horizontal scalability and can be extended with containerized deployment and load balancing strategies.

The overall workflow, illustrated in Figure 2, follows a modular pipeline that transforms raw sensor measurements into high-level behavioral insights that can be accessed by farmers and veterinarians through an interactive interface.

The system begins with wearable sheep sensors, which continuously collect motion-related measurements. These measurements are periodically exported as raw files and uploaded to cloud storage (Google Drive), which acts as an intermediate repository for storing sensor logs before processing. This design choice enables scalable access to data from multiple animals while also simplifying integration with automated workflows.

To ensure reproducibility and automation, the project is managed through GitHub repositories that provide version control for both data processing scripts and machine learning components. In addition, GitHub Actions/Workflows are used to orchestrate the pipeline execution. These workflows automatically retrieve newly uploaded sensor files from the cloud storage, trigger preprocessing tasks, and generate structured datasets in standardized formats (e.g., CSV). This preprocessing stage includes essential operations such as timestamp alignment, cleaning invalid or missing values, feature extraction, and formatting data into model-ready inputs.

Once the data is structured, the pipeline executes machine learning model inference to perform behavior classification. The trained model produces predictions that map sensor patterns into meaningful behavioral categories (e.g., mating, walking).

Both the processed sensor data and the generated predictions are stored in InfluxDB, which was selected due to its efficiency in handling high-frequency time-series data. InfluxDB enables fast querying over large temporal datasets, supporting both real-time monitoring and retrospective analysis. Storing predictions alongside raw measurements allows direct comparison between sensor signals and inferred behaviors, facilitating validation and interpretability of the model outputs.
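One way to store a prediction is to serialize it in InfluxDB line protocol before writing it. The sketch below only formats the record; the measurement name, tag key, and field names are hypothetical, and the actual schema used by the system is not described in the text.

```python
def _escape_tag(value):
    """Escape the characters that are special in line-protocol tag values."""
    return (str(value).replace(",", "\\,")
                      .replace("=", "\\=")
                      .replace(" ", "\\ "))

def to_line_protocol(animal_id, behavior, confidence, ts_ns,
                     measurement="sheep_behavior"):
    """Format one predicted window as: measurement,tags fields timestamp."""
    label = behavior.replace("\\", "\\\\").replace('"', '\\"')
    return (f"{measurement},animal_id={_escape_tag(animal_id)} "
            f'behavior="{label}",confidence={float(confidence)} {int(ts_ns)}')

line = to_line_protocol("ram 01", "mating", 0.93, 1_767_225_600_000_000_000)
print(line)
# sheep_behavior,animal_id=ram\ 01 behavior="mating",confidence=0.93 1767225600000000000
```

Here the animal identifier is a tag so that queries can filter by animal efficiently, while the predicted label and its confidence are fields; storing the label as a tag instead would be an equally valid design choice.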

Finally, an interactive Streamlit dashboard provides a user-friendly interface for exploring the collected data and the inference results. The application supports visualization of trends, temporal patterns, and behavior distributions, enabling end users to assess the status of each animal. The dashboard is designed to display timestamps alongside the predicted behaviors, offering practical insights that can support decision-making in real farming conditions. Moreover, the architecture includes a chatbot interface that allows users to interact with the system using natural language queries (e.g., requesting the activity history of a specific animal or summarizing behavior changes over time), improving accessibility for non-technical users.

Overall, the proposed architecture emphasizes modularity, automation, and scalability. By combining cloud storage, automated workflows, time-series database management, and an interactive visualization layer, the system provides a robust foundation for animal monitoring and future integration of advanced AI-driven analytics.

3.3. Intelligent User Interaction and Large Language Model Integration Architecture

To enhance system usability, a chatbot interface was integrated using a combination of Groq-based large language models [28] and database query mechanisms, as illustrated in Figure 3. The chatbot allows non-technical users to interact with the system using natural language, enabling queries such as behavior summaries and temporal trends, without requiring knowledge of database query languages.

The interaction mechanism between the LLM and the database follows a structured processing pipeline. First, the user submits a natural language query through the chatbot interface. The LLM processes the input and extracts key entities such as the requested metric (e.g., behavior type or physiological parameter), time range, and filtering conditions. Based on this information, the query is transformed into a structured format compatible with the InfluxDB query language. The generated query is then executed on the time-series database to retrieve the relevant data. Finally, the results are returned to the LLM, which generates a human-readable response grounded in the retrieved data. This ensures that the system responses are both context-aware and data-driven. The prompt design plays a key role in ensuring reliable interaction between the LLM and the database. Each user query is embedded into a structured prompt template that includes the user request, contextual information (e.g., selected filters such as animal ID and time range), and instructions to generate a database-compatible query. The prompt explicitly constrains the LLM to produce structured outputs (e.g., query statements or summarized statistics) rather than free-form responses, reducing the risk of hallucinations. In addition, retrieved data from InfluxDB are injected back into the prompt to ground the final response, ensuring that the generated answers are consistent with the underlying dataset. The selected LLM is deployed via the Groq platform due to its low-latency inference and suitability for real-time IoT applications.
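The step that turns the LLM's structured output into a database query can be sketched as below. Everything specific here is an assumption for illustration: the JSON keys, the table and column names, the validation limits, and the SQL-style query text, whose exact dialect depends on the deployment. No LLM call is made; the input string stands in for the model's constrained output.

```python
import json
import re

ALLOWED_BEHAVIORS = {"standing", "walking", "grazing",
                     "lying", "flehmen", "mating"}
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,32}")

def build_query(llm_output, table="sheep_behavior"):
    """Validate the extracted entities and build a query string."""
    request = json.loads(llm_output)
    animal_id = str(request["animal_id"])
    if not ID_PATTERN.fullmatch(animal_id):
        raise ValueError("invalid animal_id")
    hours = int(request.get("last_hours", 24))
    if not 1 <= hours <= 24 * 30:
        raise ValueError("time range out of bounds")
    query = (f"SELECT time, behavior FROM {table} "
             f"WHERE animal_id = '{animal_id}' "
             f"AND time >= now() - INTERVAL '{hours} hours'")
    behavior = request.get("behavior")
    if behavior is not None:
        # Only labels from the known behavior set may reach the query.
        if behavior not in ALLOWED_BEHAVIORS:
            raise ValueError("unknown behavior")
        query += f" AND behavior = '{behavior}'"
    return query

llm_output = '{"animal_id": "ram_01", "behavior": "mating", "last_hours": 48}'
print(build_query(llm_output))
```

Validating every extracted entity against a whitelist or pattern before it reaches the query string keeps malformed or hallucinated values from being executed against the database.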

By combining structured time-series data with natural language interaction, the system provides an intuitive interface for interpreting complex sensor outputs and machine learning predictions.

Figure 4 illustrates the interface of the sheep monitoring application, where users can select different options to refine the dataset before analysis. This screen enables filtering by date and time range, sheep ID, behavior type, and other relevant categories, allowing users to focus on specific animals or behavioral events. Figure 5 presents the results interface after the selected filters are applied, where the system displays the processed output using visual diagrams and summaries to support interpretation of behavioral patterns and trends. The results interface also includes an integrated chatbot module that allows users to interact with the data using natural language, enabling them to request summaries, explanations, and detailed information derived from the filtered dataset. As shown in the application interface, users can pose questions such as “What is the most frequent behaviour in this sheep?”, and the system generates responses by grounding the LLM on the filtered dataset retrieved from the time-series database. The answers are therefore data-driven and consistent with the selected filters (e.g., animal ID, time interval, or behavior subset), so that responses reflect the actual statistical distribution of the monitored behaviors.
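A question such as the one quoted above reduces to a simple aggregation over the filtered predictions. The sketch below shows the kind of summary that could be injected into the prompt; the function name and the input format (a flat list of predicted labels) are assumptions made for illustration.

```python
from collections import Counter


def behaviour_summary(labels):
    """Return the most frequent label and the share of each label.

    `labels` is assumed to be the list of predicted behaviors that
    remains after the user's filters have been applied.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return None, {}
    shares = {behaviour: n / total for behaviour, n in counts.items()}
    top_behaviour, _ = counts.most_common(1)[0]
    return top_behaviour, shares
```

Passing the resulting shares to the LLM, rather than asking it to count raw rows, keeps the arithmetic outside the model.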

4. Experimental Results and Discussion

This section presents the experimental results obtained from the analysis of tri-axial accelerometer data collected during different behavioral states. As stated in the previous section, the extracted features include the vector magnitude (VM) and its change over time (ΔVM), which quantifies the temporal variation of movement. Results are reported for multiple behavioral classes, with a particular focus on identifying distinctive motion patterns associated with mating behavior.
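For reference, the two magnitude features can be computed from raw tri-axial samples as sketched below. VM is the usual Euclidean norm of the three axes; taking ΔVM as the absolute difference between consecutive VM values is an assumption of this sketch, since the exact definition is given in the earlier section.

```python
import math


def vector_magnitude(x, y, z):
    """Euclidean norm of one tri-axial sample."""
    return math.sqrt(x * x + y * y + z * z)


def vm_and_delta(samples):
    """Compute VM and ΔVM for a sequence of (x, y, z) samples.

    ΔVM is taken here as |VM[t] - VM[t-1]|, with 0.0 for the first
    sample (an assumption made for this illustration).
    """
    vm = [vector_magnitude(x, y, z) for x, y, z in samples]
    delta = [0.0] + [abs(b - a) for a, b in zip(vm, vm[1:])]
    return vm, delta
```

For example, the samples `(3, 4, 0)`, `(0, 0, 5)` and `(6, 8, 0)` give VM values of 5, 5 and 10, so ΔVM is 0, 0 and 5: a change of direction alone leaves ΔVM at zero, while a change of intensity does not.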

4.1. Feature-Level Analysis Across Behavioral Classes

Figure 6 presents a heatmap of the mean accelerometer-derived features for each behavioral class, normalized using z-score normalization to emphasize relative differences across behaviors. The visualization reveals distinct feature patterns associated with different activity levels and motion dynamics.
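The normalization behind such a heatmap can be sketched as follows, assuming that each feature is standardized across the per-class means (one column of the heatmap at a time). The function name and the nested-dictionary layout are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(class_means):
    """Z-score each feature across classes.

    `class_means` maps class name -> {feature name -> mean value}.
    Each feature is centered and scaled using the mean and population
    standard deviation of that feature over all classes.
    """
    if not class_means:
        return {}
    features = list(next(iter(class_means.values())).keys())
    out = {cls: {} for cls in class_means}
    for feat in features:
        column = [class_means[cls][feat] for cls in class_means]
        mu, sigma = mean(column), pstdev(column)
        for cls in class_means:
            # A constant feature carries no contrast, so map it to 0.
            out[cls][feat] = 0.0 if sigma == 0 else (class_means[cls][feat] - mu) / sigma
    return out
```

After this step a positive cell means "above the average class for this feature" and a negative cell means "below it", which is why the heatmap emphasizes relative rather than absolute differences.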

High-intensity behaviors exhibit strongly elevated VM and ΔVM, reflecting abrupt and impulsive movements. Mating behavior is characterized by pronounced X-axis acceleration and increased movement variability compared to locomotion and passive behaviors, indicating irregular motion patterns associated with mounting activity.

In contrast, grazing, standing, and lying show lower normalized VM and ΔVM values, consistent with repetitive or low-mobility postures. Overall, the heatmap demonstrates that axis-specific acceleration components and ΔVM provide discriminative information for separating mating behavior from both aggressive interactions and low-activity states.

4.2. Classification Models

We recall that the evaluated models include five classifiers: BLSTM, LSTM, CNN–BLSTM, RF and SVM.

Figures 7 and 8 show the confusion matrices obtained for each model. Overall, all approaches demonstrate high classification performance across most behavioral classes, particularly for behaviors characterized by distinct motion patterns, such as mating and lying.

Among the deep learning approaches, the BLSTM and LSTM models achieve consistently high recognition rates by effectively capturing temporal dependencies in the accelerometer signals.

The Random Forest classifier nevertheless achieves the highest overall accuracy (0.8272, Table 2), with balanced performance across both high-activity and low-activity behaviors, indicating strong generalization when using engineered features. In particular, Random Forest shows improved classification of standing compared to sequential deep learning models, highlighting the effectiveness of ensemble-based feature learning.

The SVM model also exhibits strong performance for well-separated behavioral classes, such as mating. However, increased confusion is observed between standing and walking, a trend that is consistent across all evaluated models and reflects similarities in their underlying motion patterns rather than limitations of a specific classifier.

Across all algorithms, the most frequent misclassifications occur between standing and walking, which exhibit comparable acceleration magnitudes and limited temporal variability. In contrast, behaviors characterized by higher temporal variability are consistently distinguished with high accuracy. These findings align with the feature-level patterns observed in Figure 6, confirming that overall acceleration magnitude (VM) and ΔVM play a key role in enabling reliable behavior classification.

From a practical deployment perspective, Random Forest offers a favorable trade-off between accuracy, computational efficiency, and interpretability, making it suitable for real-time and resource-constrained environments such as edge or farm-level systems. In contrast, deep learning models (LSTM, BLSTM, and CNN–BLSTM) are more effective in capturing complex temporal dependencies, but require higher computational resources and training time, making them more appropriate for cloud-based inference scenarios.

The grouped feature importance analysis in Figure 9 shows that directional motion features contribute most significantly to the classification process, with the Y-axis exhibiting the highest impact, followed by the Z-axis and X-axis. This indicates that spatial motion patterns play a dominant role in distinguishing behavioral classes. Magnitude and delta magnitude features also provide meaningful contributions by capturing overall movement intensity and temporal variations, respectively. By contrast, the delta statistical features, which represent aggregated summaries of motion changes over time (e.g., mean and variability of movement differences), show relatively low importance, suggesting that aggregated summaries are less informative than direct and dynamic motion representations. Overall, these results highlight that both spatial and temporal motion characteristics are critical for accurate behavior classification, with directional features being the most influential.
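A grouped importance of this kind is typically obtained by summing per-feature importances within each group. The sketch below assumes feature names that carry their group as a prefix (for example `y_mean`); the naming scheme, the helper functions, and the numbers in the demo are placeholders and are not values from Figure 9.

```python
from collections import defaultdict


def prefix_group(feature_name):
    """Map a feature name such as 'y_mean' to its group 'y' (assumed naming)."""
    return feature_name.split("_", 1)[0]


def group_importances(importances, group_of=prefix_group):
    """Sum per-feature importances by group, sorted from highest to lowest."""
    grouped = defaultdict(float)
    for name, value in importances.items():
        grouped[group_of(name)] += value
    return dict(sorted(grouped.items(), key=lambda kv: kv[1], reverse=True))


# Placeholder importances for illustration only (not from the paper).
demo = {"y_mean": 0.20, "y_std": 0.10, "x_mean": 0.05, "vm_mean": 0.15}
demo_grouped = group_importances(demo)
```

Because tree-based importances sum to one over all features, the grouped values can be read directly as each group's share of the total.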

Table 2 reports the test-set performance of the evaluated classifiers using accuracy, precision, recall, and F1-score (weighted averages). Accuracy represents the proportion of correctly classified samples over the total number of test instances. Precision quantifies the reliability of the predicted labels by measuring the fraction of true positive predictions among all predicted positives, while recall (also referred to as sensitivity) measures the fraction of correctly identified positive samples among all actual positives. The F1-score is defined as the harmonic mean of precision and recall, providing a balanced metric when both false positives and false negatives are relevant.

These metrics are computed as follows [29]:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1-score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively.
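In the multi-class setting these quantities are computed per class (one class against the rest) and then averaged with weights proportional to each class's support. The sketch below is a plain-Python illustration of that weighted averaging from a confusion matrix; the function name and the matrix layout are assumptions of the example.

```python
def weighted_metrics(cm):
    """Accuracy and support-weighted precision, recall and F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision = recall = f1 = 0.0
    for i in range(n):
        tp = cm[i][i]
        support = sum(cm[i])                         # TP + FN for class i
        predicted = sum(cm[r][i] for r in range(n))  # TP + FP for class i
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1
```

With support weights, the weighted recall is the sum of the diagonal divided by the total number of samples, which is exactly the accuracy. This is why the Accuracy and Recall columns of Table 2 coincide for every model.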

Overall, the Random Forest model achieves the highest performance, with an accuracy of 0.8272 and an F1-score of 0.8197, indicating strong discriminative capability and consistent classification across the test distribution. Among the deep learning approaches, the CNN–BLSTM yields the best results (Accuracy = 0.7921, F1 = 0.7843), outperforming both the BLSTM and the baseline LSTM. This improvement suggests that the convolutional layers enhance feature representation by capturing short-term local patterns, while the bidirectional recurrent layer models temporal dependencies in both directions. The BLSTM performs slightly better than the unidirectional LSTM, confirming the advantage of bidirectional sequence modeling for behavior recognition. Finally, the SVM ranks second overall (Accuracy = 0.8017, F1 = 0.7953), ahead of the three deep learning models but below Random Forest.

Based on Table 3, the BLSTM model achieved the strongest performance for lying and reproductive-related behaviors such as flehmen, while walking remained the most challenging class, mainly due to its overlap with standing. Mating showed high precision but lower recall, indicating that when detected it was usually classified correctly, although a portion of mating samples was still confused with standing.
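As a worked check of Equation (4) against Table 3, the walking row can be recomputed from its reported precision and recall, with intermediate values rounded to four decimals:

```latex
\text{F1}_{\text{walking}}
  = 2 \cdot \frac{0.7226 \cdot 0.4308}{0.7226 + 0.4308}
  = \frac{0.6226}{1.1534}
  \approx 0.5398
```

The low F1-score is driven by recall: about 72% of walking predictions are correct, but only about 43% of the true walking samples are recovered.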

Figure 10 illustrates the training and validation accuracy and loss curves for the BLSTM, LSTM, and CNN–BLSTM models across training epochs. For all architectures, accuracy increases steadily while the loss decreases rapidly during the initial epochs, indicating effective learning of discriminative motion patterns from the accelerometer signals.

The BLSTM and LSTM models exhibit smooth convergence, with training and validation curves remaining closely aligned, suggesting stable optimization and limited overfitting. The CNN–BLSTM model shows slightly faster convergence and achieves the highest validation accuracy, reflecting the benefit of combining convolutional feature extraction with bidirectional temporal modeling.

Across all models, the small gap between training and validation performance indicates good generalization to unseen data. Minor fluctuations in the validation loss at later epochs are expected in sequential sensor data and do not indicate performance degradation. Overall, these learning curves confirm that the selected architectures are capable of learning robust temporal representations for behavior recognition.

Epoch-wise accuracy and loss curves are shown only for deep learning models, as Random Forest and SVM classifiers do not employ iterative, epoch-based training and therefore do not yield learning curves.

The results of this study confirm that collar-mounted tri-axial accelerometers, combined with machine learning and deep learning techniques, enable accurate and non-invasive classification of sheep behavior under real farming conditions. Preserving the full temporal resolution of accelerometer signals proved effective in capturing both steady-state activities and short-duration, high-intensity behaviors, supporting the working hypothesis that fine-grained temporal information is essential for reliable behavioral and reproductive monitoring.

Consistent with previous studies on accelerometer-based livestock monitoring, VM and ΔVM emerged as key discriminative features, particularly for identifying high-energy behaviors such as mating. The strong performance of both classical and deep learning models indicates that meaningful behavioral patterns are well encoded in raw actigraphy data. Ensemble-based methods, such as Random Forest, showed robust and balanced performance, while sequential deep learning models better captured temporal dependencies, in line with earlier findings in animal activity recognition literature.

Misclassification between standing and walking was observed across all models, reflecting inherent similarities in motion dynamics rather than model limitations, a challenge widely reported in related work. At the system level, the proposed cloud-based architecture extends prior offline approaches by enabling continuous data ingestion, real-time inference, and interactive visualization, enhancing practical applicability in precision livestock farming.

To further contextualize the performance of the proposed approach, a comparative analysis with representative studies from the literature is presented in Table 4. The selected works include both traditional machine learning and deep learning approaches applied to livestock behavior classification using accelerometer data. As shown, most existing studies focus on common daily activities (e.g., grazing, lying, and walking) and often report high accuracy under controlled conditions. In contrast, the present work addresses a more challenging scenario by incorporating rare and complex behaviors, such as mating and flehmen, which are inherently difficult to capture due to their limited occurrence and variability.

It should be noted that direct comparison across studies is not strictly equivalent due to differences in datasets, sensor configurations, and behavioral definitions.

4.3. Limitations of the Study

Despite the promising results, several limitations should be acknowledged. First, the dataset is relatively limited, particularly for rare reproductive behaviors such as mating and flehmen, which are inherently seasonal and occur infrequently under real farm conditions. Although each event generates multiple samples through high-frequency acquisition (30 Hz) and window-based segmentation, the number of independent events remains limited, which may affect model generalization.
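The gap between the number of samples and the number of independent events can be made concrete with a small sketch. The window length and stride used here (60 and 30 samples, i.e. 2 s windows with 50% overlap at 30 Hz) are assumptions chosen for illustration, not the settings used in the study.

```python
def sliding_windows(n_samples, window, stride):
    """Start indices of all full windows over a signal of n_samples."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if n_samples < window:
        return []
    return list(range(0, n_samples - window + 1, stride))


# One hypothetical 10 s event recorded at 30 Hz yields 300 raw samples.
event_samples = 10 * 30
starts = sliding_windows(event_samples, window=60, stride=30)
```

Under these assumed settings the single event produces nine overlapping windows. All nine originate from the same animal and the same moment, so they add training samples without adding independent observations.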

It is important to emphasize that the primary objective of this work is the design and validation of an end-to-end intelligent monitoring system, integrating sensing, machine learning, cloud processing, and user interaction. In this context, the dataset serves as an initial real-world validation of the proposed pipeline.

Future work will focus on extending data collection across multiple seasons, animals, and environments to improve robustness, balance, and generalization.

5. Conclusions

This study, conducted on a commercial farm in Sardinia, Italy (40°50′ N, 8°5′ E), presents an end-to-end intelligent monitoring system for sheep behavior and reproductive activity. The proposed framework integrates collar-mounted ActiGraph sensors, machine learning models, and a scalable cloud-based architecture to enable continuous, non-invasive monitoring under real farm conditions. By preserving full-resolution 30 Hz accelerometer data and leveraging VM and ΔVM features, the system demonstrates the feasibility of automatically recognizing both common and rare behaviors.

Beyond classification performance, the main contribution of this work lies in the design and implementation of a complete data pipeline that supports acquisition, preprocessing, inference, storage, visualization, and user interaction. The integration of a Streamlit-based web application and a natural-language chatbot further enhances usability, enabling farmers and veterinarians to access and interpret behavioral data in an intuitive way. This highlights the practical potential of the proposed system for real-world precision livestock farming applications.

The experimental results show promising performance across different machine learning and deep learning models, with Random Forest achieving the highest overall accuracy, while sequential models effectively capture temporal dependencies in behavioral patterns. However, these results should be interpreted in the context of the dataset characteristics, particularly the limited number of independent observations for rare reproductive behaviors.

Future research will focus on improving generalization across animals and environments, extending data collection across mating seasons, and increasing the number of observed rare behavioral events such as mating and flehmen, which are inherently seasonal and difficult to capture. In addition, future work will address class imbalance more robustly and explore the integration of additional sensing modalities to further enhance behavioral discrimination and system scalability.

Author Contributions

Conceptualization, S.G. (Stefano Giordano), D.A., F.M. and F.B.; Methodology, S.G. (Setayesh Ghadir), D.G., T.M.B. and P.R.; Software, S.G. (Setayesh Ghadir) and D.G.; Validation, S.G. (Setayesh Ghadir), D.G., T.M.B. and D.A.; Investigation, S.G. (Setayesh Ghadir), D.G., T.M.B. and P.R.; Resources, S.G. (Setayesh Ghadir), D.G., T.M.B., D.A., P.R., F.D.S., F.M. and F.B.; Data Curation, P.R., F.M., F.D.S. and F.B.; Writing—Original Draft Preparation, S.G. (Setayesh Ghadir), D.G. and T.M.B.; Writing—Review & Editing, D.A., S.G. (Stefano Giordano) and M.P.; Visualization, S.G. (Setayesh Ghadir), D.G. and T.M.B.; Supervision, D.A., S.G. (Stefano Giordano), M.P., F.M. and F.B.; Application Deployment and System Integration, S.G. (Setayesh Ghadir), D.G. and T.M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was developed within the framework of the project e.INS—Ecosystem of Innovation for Next Generation Sardinia (ECS00000038), funded by the Italian Ministry of University and Research (MUR) under the National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.5.

Institutional Review Board Statement

The animal study was reviewed and approved by the Organismo Preposto al Benessere e alla Sperimentazione Animale (OPBSA), Università degli Studi di Sassari (Italy). A favorable ethical opinion was issued for the project “From conception to birth: investigation on accelerometers and artificial intelligence algorithms as innovative tools to improve sheep fertility” (Protocol No. 144395; submission date: 15 December 2025; approval date: 7 January 2026).

Data Availability Statement

The datasets analyzed during the current study are publicly available on Kaggle at: https://www.kaggle.com/datasets/setayesh73/sheep-dataset, accessed on 11 May 2026.

Acknowledgments

This work was partially supported by the Italian Ministry of University and Research (MUR) in the framework of the FoReLab project (Departments of Excellence). During the preparation of this manuscript, the authors used ChatGPT-5.2 for proofreading assistance and to generate the graphical illustrations presented in Figure 1, Figure 2 and Figure 3. The authors carefully reviewed and edited the content as necessary and take full responsibility for the final published work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
BLSTM	Bidirectional Long Short-Term Memory
CNN	Convolutional Neural Network
CNN–BLSTM	Convolutional Neural Network–Bidirectional Long Short-Term Memory
CSV	Comma-Separated Values
ΔVM	Delta Vector Magnitude
Hz	Hertz
IoT	Internet of Things
LTE	Long-Term Evolution
LSTM	Long Short-Term Memory
LoRaWAN	Long Range Wide Area Network
ML	Machine Learning
PLF	Precision Livestock Farming
RF	Random Forest
ReLU	Rectified Linear Unit
SMOTE	Synthetic Minority Over-sampling Technique
SVM	Support Vector Machine
TA	Tri-Axial Accelerometry
VM	Vector Magnitude

References

1. Ding, L.; Zhang, C.; Yue, Y.; Yao, C.; Li, Z.; Hu, Y.; Yang, B.; Ma, W.; Yu, L.; Gao, R.; et al. Wearable Sensors-Based Intelligent Sensing and Application of Animal Behaviors: A Comprehensive Review. Sensors 2025, 25, 4515.
2. Riaboff, L.; Shalloo, L.; Smeaton, A.F.; Couvreur, S.; Madouasse, A.; Keane, M.T. Predicting Livestock Behaviour Using Accelerometers: A Systematic Review of Processing Techniques for Ruminant Behaviour Prediction from Raw Accelerometer Data. Comput. Electron. Agric. 2022, 192, 106610.
3. Barwick, J.; Lamb, D.W.; Dobos, R.; Welch, M.; Trotter, M. Categorising Sheep Activity Using a Tri-Axial Accelerometer. Comput. Electron. Agric. 2018, 145, 289–297.
4. Barwick, J.; Lamb, D.W.; Dobos, R.; Welch, M.; Schneider, D.; Trotter, M. Identifying Sheep Activity from Tri-Axial Acceleration Signals Using a Moving Window Classification Model. Remote Sens. 2020, 12, 646.
5. Martiskainen, P.; Järvinen, M.; Skön, J.-P.; Tiirikainen, J.; Kolehmainen, M.; Mononen, J. Cow Behaviour Pattern Recognition Using a Three-Dimensional Accelerometer and Support Vector Machines. Appl. Anim. Behav. Sci. 2009, 119, 32–38.
6. Asmare, B. A Review of Sensor Technologies Applicable for Domestic Livestock Production and Health Management. Adv. Agric. 2022, 2022, 1599190.
7. Mozo, R.; Alabart, J.L.; Rivas, E.; Folch, J. New Method to Automatically Evaluate the Sexual Activity of the Ram Based on Accelerometer Records. Small Rumin. Res. 2019, 172, 16–22.
8. Snowder, G.D.; Stellflug, J.N.; Van Vleck, L.D. Heritability and Repeatability of Sexual Performance Scores of Rams. J. Anim. Sci. 2002, 80, 1508–1511.
9. Rahman, A.; Smith, D.; Little, B.; Ingham, A.B.; Greenwood, P.L.; Bishop-Hurley, G.J. Cattle Behaviour Classification from Collar, Halter, and Ear Tag Sensors. Inf. Process. Agric. 2018, 5, 124–136.
10. Khattach, O.; Moussaoui, O.; Hassine, M. End-to-End Architecture for Real-Time IoT Analytics and Predictive Maintenance Using Stream Processing and ML Pipelines. Sensors 2025, 25, 2945.
11. Giovanetti, V.; Decandia, M.; Molle, G.; Acciaro, M.; Mameli, M.; Cabiddu, A.; Cossu, R.; Serra, M.G.; Manca, C.; Rassu, S.P.G.; et al. Automatic classification system for grazing, ruminating and resting behaviour of dairy sheep using a tri-axial accelerometer. Livest. Sci. 2017, 196, 42–48.
12. Jakobsson, T.; Lauruschkus, K.; Tornberg, Å.B. An Evaluation of Data Processing When Using the ActiGraph GT3X Accelerometer in Non-Ambulant Children and Adolescents with Cerebral Palsy. Clin. Physiol. Funct. Imaging 2022, 43, 85–95.
13. Goldsmith, E.L.; Rickard, J.P.; Mercorelli, L.R.; González, L.A.; de Graaf, S.P. The Use of Accelerometers for the Remote Detection of Mounting in Rams and Testosterone-Treated Wethers. Comput. Electron. Agric. 2022, 199, 107129.
14. Turner, K.E.; Thompson, A.; Harris, I.; Ferguson, M.; Sohel, F. Deep Learning Based Classification of Sheep Behaviour from Accelerometer Data with Imbalance. Inf. Process. Agric. 2023, 10, 377–389.
15. Lee, S.-M.; Yoon, S.M.; Cho, H. Human Activity Recognition from Accelerometer Data Using Convolutional Neural Network. In Proceedings of the IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, Republic of Korea, 13–16 February 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 131–134.
16. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
17. Schuster, M.; Paliwal, K.K. Bidirectional Recurrent Neural Networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
18. Graves, A.; Schmidhuber, J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Netw. 2005, 18, 602–610.
19. Ordóñez, F.J.; Roggen, D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 2016, 16, 115.
20. Zhao, B.; Lu, H.; Chen, S.; Liu, J.; Wu, D. Convolutional Neural Networks for Time Series Classification. J. Syst. Eng. Electron. 2017, 28, 162–169.
21. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
22. Burges, C.J.C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Min. Knowl. Discov. 1998, 2, 121–167.
23. Wang, L.; Arablouei, R.; Alvarenga, F.A.P.; Bishop-Hurley, G.J. Classifying animal behavior from accelerometry data via recurrent neural networks. Comput. Electron. Agric. 2023, 206, 107647.
24. ActiGraph. User Guide: ActiGraph™ wGT3X-BT + ActiLife; Revision A; ActiGraph: Pensacola, FL, USA, 2016; Available online: https://actigraphcorp.jp/support/pdf/gt3xbt_usersguide.pdf (accessed on 22 January 2026).
25. Adelantado, F.; Vilajosana, X.; Tuset-Peiró, P.; Martínez, B.; Melia-Segui, J.; Watteyne, T. Understanding the Limits of LoRaWAN. IEEE Commun. Mag. 2017, 55, 34–40.
26. Botta, A.; de Donato, W.; Persico, V.; Pescapé, A. Integration of Cloud Computing and Internet of Things: A Survey. Future Gener. Comput. Syst. 2016, 56, 684–700.
27. Streamlit. Streamlit Documentation. Available online: https://docs.streamlit.io/ (accessed on 1 December 2025).
28. Groq. Groq Documentation (GroqCloud API/LPU Inference). Available online: https://console.groq.com/docs (accessed on 26 January 2026).
29. Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. J. Mach. Learn. Technol. 2011, 2, 37–63.

Figure 1. Network Architecture.

Figure 2. End-to-end architecture of the proposed sheep behavior monitoring system.

Figure 3. Architecture of chatbot.

Figure 4. Overview of web application.

Figure 5. Overview of chatbot in web application.

Figure 6. Heatmap visualization of normalized mean acceleration features for different behavioral states.

Figure 7. Confusion matrices of BLSTM, LSTM, and CNN–BLSTM on the test set (values in %).

Figure 8. Confusion matrices of Random Forest and SVM on the test set (values in %).

Figure 9. Important features identified by the Random Forest model.

Figure 10. Validation curves of the evaluated models.

Table 1. Video-level annotation summary. Each event corresponds to an annotated behavioral event within a time interval.

Date	Time Interval	ID	Category	Duration	Mating Events	Flehmen Events
7–16 July 2025	10:22–10:34	1	Ram	11:31	9	0
	16:11–16:23	1	Ram	11:45	6	2
	11:54–12:11	2	Ram	16:52	0	1
	13:04–13:15	3	Ram	11:06	0	3
	10:37–10:47	4	Ram	10:18	0	0
	12:42–12:52	4	Ram	10:14	5	1
	07:52–08:02	5	Ram	10:07	0	0
	10:22–10:34	1	Ewe	11:31	–	–
	10:22–10:34	2	Ewe	11:31	–	–
Total (Rams)				01:44:55	20	7

Table 2. Performance comparison of the evaluated models on the test set (weighted average metrics).

Model	Accuracy	Precision	Recall	F1-Score
LSTM	0.7468	0.7475	0.7468	0.7328
BLSTM	0.7704	0.7673	0.7704	0.7635
CNN–BLSTM	0.7921	0.7894	0.7921	0.7843
SVM	0.8017	0.7980	0.8017	0.7953
Random Forest	0.8272	0.8312	0.8272	0.8197

Table 3. Per-class performance metrics of the BLSTM model on the test set.

Class	Precision	Recall	F1-Score
Flehmen	0.7826	0.8750	0.8262
Grazing	0.8091	0.8203	0.8146
Lying	0.8440	0.9675	0.9015
Mating	0.8880	0.7101	0.7880
Standing	0.7556	0.8039	0.7790
Walking	0.7226	0.4308	0.5398

Table 4. Comparison with related studies on livestock behavior classification.

Study	Animal	Behaviors	Method	Accuracy
Barwick et al. (2020) [4]	Sheep	Basic	ML (window-based classification)	48–95% (sensor-dependent)
Martiskainen et al. (2009) [5]	Cow	Daily	SVM-based approach	∼75–85%
Rahman et al. (2018) [9]	Cattle	Multiple	ML models	∼15–93%
Turner et al. (2022) [14]	Sheep	Imbalanced	Deep learning models	∼88%
Mozo et al. (2019) [7]	Ram	Mounting	ML-based detection	∼78–94%
This work	Sheep	Rare + common	RF/SVM/DL	82.7%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
