1. Introduction
Given the intensification of livestock production and increasing emphasis on animal welfare, dairy cow health management has become a critical component for ensuring milk yield and quality, and lameness is one of the most severe diseases affecting production performance and welfare levels in cows [
1]. It has been shown that lameness in cows results in an average 23% reduction in milk yield [
2], and treatment costs associated with lameness rank second among common diseases. Without timely intervention, more serious health complications may arise [
3]. Early detection and timely treatment of lame cows are essential for interrupting the chronic cycle associated with this condition [
4]. Therefore, early lameness detection in cows facilitates the prompt identification and treatment of hoof disorders, thereby reducing economic losses.
Manual identification exhibits drawbacks such as low efficiency, high cost, and strong subjectivity [
5]. Computer vision-based detection methods extract lameness-related characteristic parameters and construct diagnostic models through techniques such as foreground detection, image segmentation, keypoint localization, and trajectory tracking. Related research has also been conducted on subtle motion feature extraction [
5], key behavior detection [
6], multi-scale keypoint detection [
7], three-dimensional information from depth cameras [
8], and thermal infrared temperature measurement [
9]. Livestock behavior recognition using computer vision has been found to face substantial challenges in complex, dynamic farm environments due to highly variable scenes, lighting, and background conditions, which hinder reliable performance and limit industrial-scale adoption [
10,
11]. Moreover, accurate operation often requires animals to traverse dedicated alleys under fixed camera mounts, and long-term monitoring may be compromised by dust, humidity, and fluctuating light, making large-scale deployment technically challenging [
12].
Recently, increasing attention has been devoted to monitoring cow behavior and health using wearable devices, particularly through the extraction and modeling of gait data to recognize movement states and early lameness signs. Early studies predominantly employed low-frequency accelerometers (10–50 Hz) combined with traditional machine learning algorithms, with devices deployed on the cow’s neck [
13] or limbs [
14]. Basic features such as activity level and Stride Frequency were extracted to construct binary classification models, achieving sensitivities up to 77% but remaining susceptible to environmental interference [
15]. Limited feature representation capability made differentiation between early lameness and normal gait deviations difficult. To improve detection accuracy, research shifted to high-frequency sampling devices (100–400 Hz) and refined gait analysis, applying methods such as peak detection and time-frequency transforms to extract gait parameters, including Step Amplitude, gait-cycle duration, and root mean square values [
16,
17]. However, these approaches depend on high-frequency sensors, which inherently suffer from high power consumption and short battery life, and impose substantial challenges in data storage and transmission. In cows’ activity, actual walking time accounts for only a small proportion; continuous acquisition without filtering generates large volumes of redundant stationary data, further burdening the system. Recent efforts have sought to integrate deep learning to enhance model generalization, such as employing LSTM-Autoencoders to process six-axis sensor time series data [
18] or augmenting time-frequency feature expression via wavelet transforms [
19]. Although these methods can capture subtle gait anomalies, they typically rely on high-quality annotated data and complex offline analysis pipelines, and they lack pre-screening mechanisms for raw data, making them unsuitable for online processing on low-power devices.
With the widespread application of time-series detection methods in fields such as electrocardiography [
20,
21] and industrial equipment monitoring [
22,
23], techniques for online detection and rapid alerting have attracted increasing academic attention. Existing studies have demonstrated that machine-learning-based online ECG classification systems can achieve a recognition accuracy of 90.89% [
24] and that multivariate time-series models exhibit high robustness in the presence of artifacts, effectively reducing false alarms for myocardial infarction [
21]. Furthermore, LSTM networks have been widely employed to construct online anomaly-detection frameworks in smart-grid and IoT scenarios [
25]. These approaches highlight the potential of deep-learning-based time-series models for real-time monitoring in complex environments.
Inspired by these advances, this study sought to introduce mature sequence-modeling techniques into the task of early lameness detection in cows by identifying an architecture suited to the characteristics of dairy cow gait signals.
Comparative analyses indicated that the InceptionTime network, owing to its multi-scale convolutional structure, offers significant advantages in mining temporal patterns and exhibits strong feature-extraction capability and adaptability [
26], making it well suited to handling non-stationary, highly variable gait data [
27]. The network’s ability to automatically learn hierarchical time-domain features has led to excellent performance across various time-series classification tasks. However, the relatively large size of the InceptionTime architecture renders it ill-suited for deployment on resource-constrained edge devices. Moreover, static periods dominate in dairy farms, and walking data are sparse. As a result, a conventional InceptionTime network tends to overfit on the limited samples and struggles to balance detection accuracy with online processing efficiency [
28]. Consequently, it is necessary to optimize the network architecture and integrate additional feature-fusion strategies tailored to the specific demands of cow gait analysis.
To date, early online detection of dairy cow gait abnormalities still faces several technical bottlenecks. First, traditional sensor-based monitoring methods rely heavily on manual data screening, and the collected data typically require offline analysis, which does not satisfy the need for real-time early detection. Second, although deep-learning techniques offer powerful feature-extraction capabilities, their complexity hinders on-site deployment in farm environments; moreover, the limited availability of annotated dairy cow gait samples increases the risk of model overfitting. Third, because static intervals comprise a large proportion of dairy cow gait recordings and motion data are sparse, general time-series models (e.g., LSTM) often fail to perform reliably on such data. These challenges point to the urgent need for a more robust, adaptive method that supports early online lameness detection.
To address these bottlenecks, this study proposes a lightweight online lameness detection model featuring three key innovations. First, a threshold discrimination module based on variance was designed to automatically distinguish between static and motion data windows, thereby filtering out redundant stationary data and enhancing online-analysis efficiency. Second, the InceptionTime architecture was upgraded by replacing standard Conv1D layers with YOLOConv1D and SeparableConv1D modules and inserting Dropout layers after each Inception block, which reduced model parameter count and helped mitigate overfitting. Finally, handcrafted expert features (Stride Frequency, Signal Amplitude, Step Amplitude, and Support Time) were fused with the multi-scale temporal features learned by the deep network, resulting in improved discrimination accuracy on limited gait datasets. Literature review indicates that no prior research has applied an improved InceptionTime network to dairy cow lameness detection.
2. Materials and Methods
2.1. Wearable Sensor Data Acquisition
Data acquisition was carried out in September 2024 at the Beijing Agricultural Machinery Experimental Station. A short habituation period was included to minimize behavioral artifacts associated with unfamiliar equipment. One week before formal data collection, each experimental cow was fitted with the same limb-mounted IMU (LPMS-B2, Alubi, Guangzhou, China) at the intended attachment point once per day. Sensor-wear time was increased progressively—from roughly 2 h on the first 2 days, to 6 h on days 3–5, and to a full working day on the day immediately preceding the trial. Throughout this week, trained stock personnel observed the animals at 15-min intervals and recorded signs of head-shaking, limb-kicking, altered stride length, or changes in feed intake. No abnormal behaviors were noted after the third day, and stride parameters measured during the final acclimation session overlapped the baseline range obtained without sensors. These observations indicate that the cows had adapted to the device before the start of data acquisition in September 2024, reducing the likelihood that novelty effects influenced the recorded gait signals.
Wearable IMU sensors with a sampling frequency of 30 Hz were employed to capture acceleration signals during cows’ locomotion. To ensure the accuracy and consistency of the collected data, all sensors were rigorously calibrated prior to deployment and mounted on the metatarsal region of each cow’s limb to prevent displacement during walking; the sensor X-axis was aligned parallel to each cow’s walking direction. Video recordings were used for time synchronization.
At the onset of each trial, cows were guided from the milking parlor along a 30 m corridor to the pen (
Figure 1). Continuous back-and-forth walking within the corridor was enforced for at least 20 min to acquire sufficient high-quality walking data. Subsequently, cows entered the pen, and acceleration data were collected for the stationary states (standing and recumbency) without interference with their natural behavior.
2.2. Animal Ethics Approval
The animal study protocol was approved by the Institutional Review Board (or Ethics Committee) of the Agricultural Information Institute of the Chinese Academy of Agricultural Sciences (protocol code AII2021-810, approved on 10 August 2021).
2.3. Dataset Definition
The health status of 36 lactating Holstein cows (36–58 months old) was scored by three experts using the five-point Numerical Rating System (NRS) [
29], where 1 denotes sound and 5 denotes severely lame. Mean scores were used to classify the animals into three groups: healthy (mean < 1.5;
n = 12), early lameness (1.5 ≤ mean ≤ 2.5;
n = 12), and severe lameness (mean > 2.5;
n = 12) [
16]. Because locomotion predominantly occurs along the sensor
X-axis and its acceleration signal most directly reflects key gait-cycle events (acceleration, deceleration, and impact peaks), only the 30 Hz
X-axis acceleration data were retained for analysis to reduce interference from other axes.
Processing full tri-axial signals on resource-limited edge devices would significantly increase computational load, memory transfers, and energy consumption. Moreover, the sensor is mounted on the distal metatarsus with its X-axis closely aligned to the walking direction, providing clear cyclic patterns of swing, hoof contact, and stride timing. The Z-axis, being perpendicular to the walking direction, offered no additional discriminative information, and the Y-axis, though capable of capturing limb-lift and impact events, was omitted to simplify the processing pipeline and remain within the embedded platform's resource budget. Only the X-axis acceleration was therefore retained for both variance-threshold screening and deep-learning analysis.
A sliding-window approach (window length λ = 120 frames, ≈ 4 s at 30 Hz; step size = 1 frame) was applied to the continuous
X-axis time series, yielding 418,863 windowed samples in total (
Table 1). A total of 94,373 windows corresponded to stationary states and 324,490 to motion states (healthy: 136,790; early: 80,893; severe: 106,807). To avoid data leakage and evaluate generalization on unseen individuals, a stratified split was performed at the cow level: one cow per health category (3 cows total) was randomly held out as an independent test set, contributing 15,840 windows (healthy: 4517; early: 5377; severe: 5946). The remaining 33 cows (308,650 windows) were randomly shuffled and partitioned into a training set (246,920 windows) and a validation set (61,730 windows) in an 8:2 ratio, with class proportions maintained (
Table 1). This ensured that all test samples originated from cows not represented in training or validation, thereby providing a realistic assessment of model performance on “unseen” subjects.
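As a minimal sketch of this windowing and cow-level split (the helper name, cow identifiers, and random series below are illustrative placeholders, not the study's actual data-handling code):

```python
import numpy as np

def segment_windows(signal: np.ndarray, length: int = 120, step: int = 1) -> np.ndarray:
    """Slide a fixed-length window over one cow's continuous X-axis series.

    length=120 frames corresponds to ~4 s at 30 Hz; step=1 frame yields the
    maximally overlapping windows described in the text."""
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step : i * step + length] for i in range(n)])

# Cow-level hold-out: every window of a held-out cow goes to the test set,
# so no test cow is ever seen during training or validation.
series_by_cow = {"cow_01": np.random.randn(10_000), "cow_02": np.random.randn(10_000)}
held_out = {"cow_02"}  # one cow per health category in the study
train_val = np.concatenate([segment_windows(s) for c, s in series_by_cow.items() if c not in held_out])
test = np.concatenate([segment_windows(s) for c, s in series_by_cow.items() if c in held_out])
```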
2.4. Online Lameness Detection Model Architecture Design
The overall architecture of the proposed Online Early Lameness Detection Model is shown in
Figure 2. It comprised the following modules: sensor-data preprocessing, preliminary state assessment, feature extraction, deep-feature learning, and feature fusion. Acceleration data were segmented into fixed-length windows, and an initial classification was performed by computing the variance of the
X-axis acceleration within each window using a threshold discrimination module. When a window was classified as motion, a dual-branch feature-extraction pipeline was activated to distinguish among healthy gait, early lameness, and severe lameness.
In the handcrafted branch, four key gait features (Stride Frequency, Signal Amplitude, Step Amplitude, and Support Time) were extracted to form a 4 × 1 feature vector. Simultaneously, the raw windowed data were fed into multiple improved Inception modules to automatically learn deep temporal features, yielding a deep-feature vector. The handcrafted and deep feature vectors were concatenated along the channel dimension to create a fused feature vector. This fused vector was then passed through a fully connected layer to produce predicted probabilities for each lameness state (healthy, early lameness, and severe lameness).
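A minimal PyTorch sketch of this fusion step is given below, assuming only the dimensions stated in the text (a 4-dim handcrafted vector, a 128-dim deep vector, three output classes); the class and layer names are illustrative:

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Concatenate the handcrafted gait vector with the deep-feature vector
    and map the fused representation to the three lameness classes."""
    def __init__(self, deep_dim: int = 128, handcrafted_dim: int = 4, n_classes: int = 3):
        super().__init__()
        self.fc = nn.Linear(deep_dim + handcrafted_dim, n_classes)

    def forward(self, deep_feat: torch.Tensor, hand_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([deep_feat, hand_feat], dim=1)  # shape: (batch, 132)
        return self.fc(fused)  # logits; softmax yields class probabilities
```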
To meet the low-latency requirements of edge-device deployment, the entire pipeline (from feature extraction through to final classification) was executed online at short, regular intervals (e.g., one complete inference every 4 s), ensuring rapid response.
2.5. Variance-Threshold Discrimination Module
To enable online differentiation between stationary and motion states, a variance-threshold discrimination module was designed. The module exploits the statistical characteristics of stationary and motion time-series samples in the training set by computing the variance of X-axis acceleration within each fixed-length window.
For each window of length $\lambda$, sensor bias was eliminated by adding an offset of +5 to all acceleration values, ensuring that all processed values, denoted as $x_i$, lie in the positive domain. The variance $\sigma^2$ of a window was then computed as
$$\sigma^2 = \frac{1}{\lambda}\sum_{i=1}^{\lambda}\left(x_i - \bar{x}\right)^2,$$
where $\bar{x}$ represents the mean acceleration within the window.
During module construction, the variances of all windows in the stationary dataset $D_s$ and the motion dataset $D_m$ were computed. If
$$\max_{w \in D_s}\sigma^2(w) < \min_{w \in D_m}\sigma^2(w),$$
then the discrimination threshold $T$ was defined as
$$T = \frac{1}{2}\left[\max_{w \in D_s}\sigma^2(w) + \min_{w \in D_m}\sigma^2(w)\right].$$
For each newly acquired window, its variance $\sigma^2$ was calculated; if $\sigma^2 < T$, the window was classified as stationary, otherwise as motion. It should be noted that cow motion was predominantly walking, whereas other actions (e.g., turning, standing up) were transient, and their sporadic windows have a negligible impact on overall lameness detection.
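The module reduces to a few lines of NumPy; the following is a sketch of the logic described above, not the deployed implementation (function names are illustrative):

```python
import numpy as np

OFFSET = 5.0  # shifts all acceleration values into the positive domain

def window_variance(window: np.ndarray) -> float:
    x = window + OFFSET
    return float(np.var(x))  # (1/λ) Σ (x_i − x̄)²

def fit_threshold(stationary: np.ndarray, motion: np.ndarray) -> float:
    """Midpoint between the largest stationary variance and the smallest
    motion variance, valid only when the two ranges do not overlap."""
    v_s = max(window_variance(w) for w in stationary)
    v_m = min(window_variance(w) for w in motion)
    if v_s >= v_m:
        raise ValueError("variance ranges overlap; no separating threshold")
    return (v_s + v_m) / 2.0

def is_motion(window: np.ndarray, threshold: float) -> bool:
    return window_variance(window) >= threshold
```

With the 120-frame statistics reported in Section 3.1 (maximum stationary variance 0.076, minimum motion variance 0.275), the midpoint rule gives (0.076 + 0.275)/2 ≈ 0.175, matching the threshold used there.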
2.6. Improved InceptionTime Architecture Design
The InceptionTime network, a classical deep neural network in the time-series processing domain [
26], employs multiple stacked Inception modules to extract multi-scale features and excels at capturing temporal patterns. A conventional InceptionTime architecture consists of six Inception modules, each comprising four parallel branches (
Figure 3a). In branch 1, a 1 × 1 convolution is used for inter-channel feature fusion and dimensionality reduction; branch 2 applies a 3 × 3 convolution to capture local temporal variations; branch 3 employs a 5 × 5 convolution to gather information over a larger receptive field; and branch 4 performs 3 × 3 max-pooling followed by a 1 × 1 convolution to extract robust local statistical features. The outputs of all branches are concatenated along the channel dimension, then passed through Batch Normalization and a ReLU activation to form the module output. Finally, global average pooling aggregates all temporal features across the network, and a fully connected layer maps to the three lameness classes (healthy, early lameness, severe lameness).
The strength of the traditional InceptionTime lies in its multi-scale convolutional structure, which effectively captures subtle variations in time-series data, and has the ability to enhance feature representation through module stacking [
30]. However, the large model size and high parameter count hinder direct deployment on resource-constrained edge devices and can exacerbate overfitting issues [
31].
An improved InceptionTime network was constructed by replacing the standard Conv1D layers with YOLOConv1D and SeparableConv1D modules, as illustrated in
Figure 3b. The detailed architecture parameters are provided in
Table 2. In the YOLOConv1D branch, Batch Normalization and LeakyReLU were applied immediately after each convolution, which enhanced training stability [
32] and alleviated the vanishing-gradient issue [
33]. The SeparableConv1D branch employed a combination of Depthwise and Pointwise convolutions, substantially reducing the model’s parameter count while preserving its capacity for local temporal-feature extraction [
34]. Finally, a Dropout layer (node-retention probability set to 0.5) was inserted after the concatenation of each Inception module to mitigate overfitting and bolster the model’s generalization ability [
35].
In the improved InceptionTime architecture, the network comprises six stacked enhanced Inception modules. Temporal features are aggregated via an adaptive global average pooling layer, then passed through a Dropout layer and a fully connected layer to produce the three-class output.
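A condensed PyTorch sketch of one improved Inception module follows; the branch widths, LeakyReLU slope, and padding choices are illustrative assumptions, with the authoritative values given in Table 2:

```python
import torch
import torch.nn as nn

class YOLOConv1D(nn.Module):
    """Conv1d immediately followed by BatchNorm and LeakyReLU."""
    def __init__(self, c_in: int, c_out: int, k: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(c_in, c_out, k, padding=k // 2, bias=False),
            nn.BatchNorm1d(c_out),
            nn.LeakyReLU(0.1, inplace=True),  # slope is an assumed value
        )
    def forward(self, x):
        return self.block(x)

class SeparableConv1D(nn.Module):
    """Depthwise conv (one filter per channel) + pointwise 1x1 channel mixing."""
    def __init__(self, c_in: int, c_out: int, k: int):
        super().__init__()
        self.depthwise = nn.Conv1d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False)
        self.pointwise = nn.Conv1d(c_in, c_out, 1, bias=False)
    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class ImprovedInceptionModule(nn.Module):
    """Four parallel branches concatenated on the channel axis, then Dropout."""
    def __init__(self, c_in: int, c_branch: int = 32, p_drop: float = 0.5):
        super().__init__()
        self.b1 = YOLOConv1D(c_in, c_branch, 1)
        self.b2 = YOLOConv1D(c_in, c_branch, 5)
        self.b3 = SeparableConv1D(c_in, c_branch, 3)
        self.b4 = nn.Sequential(nn.MaxPool1d(3, stride=1, padding=1),
                                YOLOConv1D(c_in, c_branch, 1))
        self.drop = nn.Dropout(p_drop)
    def forward(self, x):
        out = torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
        return self.drop(out)
```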
2.7. Lameness Feature Extraction Module
Lameness features were extracted from each fixed-length signal window by means of a dedicated feature-extraction module. This module was designed to leverage expert knowledge to derive physically meaningful gait indicators from the preprocessed acceleration signal, thereby compensating for potential deficiencies in feature representation by the deep neural network and enabling their subsequent fusion. The extraction process consisted of two stages: data preprocessing and handcrafted feature computation.
For data preprocessing, raw acceleration signals were first filtered using a fourth-order Butterworth band-pass filter with a low cutoff frequency of 0.5 Hz and a high cutoff frequency of 5 Hz. The filter's magnitude response followed the standard Butterworth form
$$\left|H(j\omega)\right|^2 = \frac{1}{1 + \left(\omega/\omega_c\right)^{2n}},$$
where $\omega_c$ denotes the cutoff angular frequency (calculated from the sampling rate) and $n$ is the filter order. This filtering step removed low-frequency drift and high-frequency noise, retaining only the frequency components relevant to the cow gait cycle (0.5–5 Hz).
Subsequently, the filtered signal was smoothed using a Savitzky-Golay filter. A second-order polynomial with a window length of five was fitted within each sliding window to attenuate noise while preserving signal peaks and waveform morphology. The smoothed signal was then detrended by subtracting its mean, producing a zero-mean signal that removed static offsets arising from sensor bias or environmental factors, thus ensuring that subsequent feature extraction focused solely on dynamic variations. These preprocessing steps preserved only the gait-related frequency components, reduced noise interference, and yielded signals with more stable statistical properties.
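These two stages map directly onto standard SciPy calls; the sketch below assumes zero-phase filtering (filtfilt), which the text does not specify, and is intended only to make the pipeline concrete:

```python
import numpy as np
from scipy.signal import butter, filtfilt, savgol_filter

FS = 30  # sampling rate (Hz)

def preprocess(raw: np.ndarray) -> np.ndarray:
    # 4th-order Butterworth band-pass, 0.5–5 Hz (gait-related components only).
    b, a = butter(N=4, Wn=[0.5, 5.0], btype="bandpass", fs=FS)
    x = filtfilt(b, a, raw)  # zero-phase filtering avoids shifting gait events
    # Savitzky-Golay smoothing: 2nd-order polynomial, 5-sample window,
    # attenuating noise while preserving peaks and waveform morphology.
    x = savgol_filter(x, window_length=5, polyorder=2)
    # Detrend to zero mean so features reflect only dynamic variation.
    return x - x.mean()
```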
From each zero-mean window, four lameness-related features were extracted. First, the Signal Amplitude $A$, which reflects the dynamic range within the window, was calculated as
$$A = \max_{i} s_i - \min_{i} s_i,$$
where $s_i$ denotes the preprocessed, zero-mean signal.
Stride Frequency denotes the number of gait cycles completed by the cow per unit time. Local maxima were identified in the preprocessed, zero-mean signal using a peak-detection algorithm. To prevent multiple detections within a single gait cycle, a minimum peak-to-peak interval of 0.3 s was enforced ($0.3 \times FS = 9$ samples, where $FS = 30$ Hz). Additionally, to exclude spurious peaks caused by non-cyclic signal fluctuations, a prominence threshold of 3 was applied. As illustrated in Figure 4, this procedure reliably isolates peaks corresponding to true gait cycles.
Let $N_p$ denote the number of detected peaks and $\lambda$ the window length (in samples). The window duration is $\lambda/FS$ seconds, converted to minutes (denoted as $t_{\min}$) as
$$t_{\min} = \frac{\lambda}{60\,FS}.$$
The Stride Frequency $f_s$ is then calculated as
$$f_s = \frac{N_p}{t_{\min}}.$$
Support Time reflects the duration during which the signal remained near zero (i.e., $\lvert s_i \rvert < \varepsilon$, where $\varepsilon$ is a small threshold), typically corresponding to the phase of foot-ground contact. The signal was first scanned to identify all frames satisfying $\lvert s_i \rvert < \varepsilon$. Thereafter, the frame count of each contiguous near-zero interval was divided by the sampling rate $FS$ to obtain its duration. Finally, the mean duration across all intervals was computed to yield the Support Time $T_{sup}$.
An approximately linear relationship between Step Amplitude and Stride Frequency was assumed to simplify computation. The time required for a single step (i.e., the gait cycle) was first calculated as
$$T_{step} = \frac{60}{f_s}.$$
The gait-cycle duration $T_{step}$ was then multiplied by a calibration coefficient $k$ to derive the Step Amplitude $L$. In this study, $k$ was set to 0.5 based on the walking speeds of healthy and lame cows (0.52–1.37 m·s⁻¹) and the regression slope of dairy cow gait data [36], yielding
$$L = k\,T_{step} = 0.5\,T_{step}.$$
The four handcrafted features were assembled into a feature vector and concatenated with the deep temporal features automatically extracted by the improved InceptionTime network to form the final feature representation.
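A sketch of the four feature computations, following the formulas above; the near-zero band EPS used for Support Time is an assumed illustrative value, since the text does not specify it numerically:

```python
import numpy as np
from scipy.signal import find_peaks

FS = 30        # sampling rate (Hz)
EPS = 0.05     # near-zero band for Support Time (illustrative assumption)
K_CALIB = 0.5  # calibration coefficient k for Step Amplitude

def gait_features(s: np.ndarray) -> np.ndarray:
    """Return [Stride Frequency, Signal Amplitude, Step Amplitude, Support Time]
    for one preprocessed, zero-mean window."""
    amplitude = s.max() - s.min()                    # dynamic range A

    # Peaks at least 0.3 s apart (9 samples at 30 Hz), prominence >= 3.
    peaks, _ = find_peaks(s, distance=int(0.3 * FS), prominence=3)
    t_min = len(s) / FS / 60.0                       # window duration in minutes
    stride_freq = len(peaks) / t_min                 # gait cycles per minute

    # Support Time: mean duration of contiguous near-zero runs (foot contact).
    near_zero = np.abs(s) < EPS
    runs, run = [], 0
    for flag in near_zero:
        if flag:
            run += 1
        elif run:
            runs.append(run); run = 0
    if run:
        runs.append(run)
    support = np.mean(runs) / FS if runs else 0.0

    # Step Amplitude: gait-cycle duration scaled by the calibration coefficient.
    t_step = 60.0 / stride_freq if stride_freq else 0.0
    step_amp = K_CALIB * t_step
    return np.array([stride_freq, amplitude, step_amp, support])
```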
All processing and analysis were performed on a workstation equipped with an Intel i7-12650H CPU @ 2.30 GHz, 32 GB RAM, and an NVIDIA GeForce RTX 4060 GPU, with PyTorch 2.7.0 as the deep-learning framework.
3. Results and Discussion
3.1. Variance-Threshold Discrimination Module Rapidly Distinguished Cow Motion States
Six window lengths (30, 50, 60, 80, 100, and 120 frames) were selected to generate six datasets of windowed time-series samples for stationary and motion states, each of which was split into training and validation subsets. The threshold discrimination module was applied to the training sets under each window length to compute the variance of every time-window sample. As shown in
Figure 5, red and green dots denote the variance values of individual motion and stationary samples, respectively, while the blue horizontal line indicates the discrimination threshold determined for each window length.
For a window length of 120 frames, the variance range for stationary windows was 1.33 × 10⁻⁵ to 0.076, and for motion windows, 0.275 to 2.18; accordingly, a threshold of 0.175 was defined. Smaller window lengths correspond to faster model response and stronger online detection capability. As illustrated in Figure 5, decreasing the window length narrowed the gap between the minimum variance of motion windows and the maximum variance of stationary windows. When the window length was reduced to 50 frames, the minimum motion variance no longer exceeded the maximum stationary variance, making threshold determination infeasible. Consequently, the threshold discrimination module was effective for window lengths ≥ 60 frames, corresponding to a minimum supported discrimination interval of 2 s.
The algorithm was then evaluated on the test set, which comprised 94,373 stationary windows and 15,630 motion windows. When the window length was reduced to 60 frames (2 s), the misclassification rate for motion windows was only 1.2%; extending the window to 90 frames (3 s) reduced this rate to 0.1%. Furthermore, at a window length of 120 frames, the variance-threshold module achieved a stationary-state detection accuracy of 100% (no false positives) and a motion-state detection accuracy of 99.7% (false-negative rate of 0.3%), with a single-window inference latency of just 0.8 ms on the CPU used.
Unlike other approaches, the variance-threshold module proposed here enables preliminary filtering of invalid data before model inference, effectively removing redundant information from stationary periods and substantially reducing both data storage requirements and communication load. In future deployments, this mechanism is expected to alleviate data backlog issues associated with high-frequency sensors and to enhance the system’s online processing efficiency and response speed.
3.2. Cross-Model Evaluation Revealed Superior Performance of InceptionTime
Firstly, the InceptionTime modules were employed in the network architecture to train a lameness detection model (Exp 1,
Figure 6a). Experiments were conducted using three window lengths (60 frames, 90 frames, and 120 frames) to investigate the impact of time-window size on model performance. The training results for each window length are presented in
Figure 6c. Both validation and test accuracies increased as the window length grew. However, regardless of window size, validation accuracy approached 100% while test accuracy remained far lower, indicating overfitting. Moreover, the InceptionTime-based lameness detection model exceeded 5 MB in size and contained millions of parameters, making it unsuitable for deployment on lightweight hardware such as microcontrollers.
To assess InceptionTime’s performance relative to prevalent industry models, TCN, FNet, Transformer, and LSTM architectures were selected as baselines, with results collated in
Table 3. Both TCN and LSTM achieved >99% accuracy on the validation set, but their test-set accuracies declined to 63.7% and 58.8%, respectively, indicating the most severe overfitting. FNet and Transformer demonstrated mediocre performance under small-sample conditions, with test accuracies below 62%. The InceptionTime baseline, leveraging its multi-scale convolutional structure, obtained the highest test accuracy of 66.24% among the evaluated models; however, it also possessed the largest model size and parameter count and exhibited overfitting, thereby motivating subsequent architectural enhancements.
3.3. YOLOConv1D and SeparableConv1D Substantially Lightened the Network
To reduce model size and mitigate overfitting, the standard Conv1D layers within each Inception module were replaced with YOLOConv1D (1 × 1, 1 × 5) and SeparableConv1D (3 × 3) in Experiment 2, and a Dropout layer (p = 0.5) was appended after each Inception block, resulting in the improved InceptionTime architecture (Figure 3b).
An ablation study was designed (
Table 4) to assess the impact of these modifications on model compactness and generalization. InceptionTime-YOLO1D involved replacing only the Conv1D layers with YOLOConv1D (1 × 1, 1 × 5). InceptionTime-YoloSep further substituted the remaining Conv1D layers with SeparableConv1D (3 × 3, expansion = 2) on top of the YOLOConv1D replacements. Although the lameness-detection model built on the original InceptionTime architecture attained higher accuracy on the validation set, its performance declined sharply on the test set, revealing pronounced overfitting. The architectural refinements in the improved InceptionTime effectively mitigated this problem.
Across all sliding-window lengths, the baseline InceptionTime exhibited consistently higher validation accuracy than its improved counterpart, but substantially lower test accuracy. At a window size of 120 frames, test accuracy rose from 66.24% to 73.14% (an improvement of 6.90%), while model size shrank from 5.39 MB to 0.72 MB and parameter count fell from 1.40 M to 0.17 M. These findings demonstrate that the Improved InceptionTime effectively reduced overfitting on the validation set and enhanced generalization to unseen cows. Furthermore, the structural optimizations greatly decreased both parameter count and computational complexity, producing a lightweight model that not only shortened inference time but is also more suitable for deployment on edge devices or in resource-constrained environments.
YOLOConv1D augments the standard Conv1D layer with Batch Normalization and LeakyReLU, which improves training stability and mitigates the vanishing-gradient problem. Because LeakyReLU is less susceptible than ReLU to "dying" neurons, it also helps preserve fine-grained details in the feature maps. SeparableConv1D first applies a depthwise convolution (one filter per input channel; parameters = in_channels × kernel_size) and then a pointwise 1 × 1 convolution that mixes channels (parameters = in_channels × out_channels), drastically reducing the total parameter count relative to a conventional convolution (in_channels × out_channels × kernel_size). The savings become more pronounced as the kernel size grows. As a result, SeparableConv1D yields a lighter model without sacrificing its ability to extract local temporal features, thereby maintaining robust detection performance.
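For illustration (channel counts chosen for the example, not taken from Table 2): with 32 input channels, 64 output channels, and a kernel of width 5, a standard Conv1D holds 32 × 64 × 5 = 10,240 weights, whereas the separable variant holds only 32 × 5 + 32 × 64 = 2,208, a roughly 4.6-fold reduction that widens further as the kernel grows.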
Window length did not affect model size or parameter count. Based on overall accuracy, real-time responsiveness, and resource usage, a window size of 120 frames (4 s per inference) was identified as the optimal compromise: it yielded the highest test accuracy while satisfying the storage and computational constraints of edge devices.
As shown by
Table 3 and
Table 4, the improved InceptionTime network achieved an 86% reduction in model size and a 6.9% increase in test accuracy, surpassing mainstream architectures such as TCN, LSTM, FNet, and Transformer. Thus, the proposed model strikes an optimal balance between accuracy and lightweight design, rendering it well-suited for deployment on microcontrollers and other resource-constrained platforms.
3.4. Feature Extraction Further Enhanced the Detection Accuracy of Lameness
After the improved InceptionTime achieved a 6.9% increase in test accuracy, its overall performance surpassed existing models; however, a gap remained relative to production-level requirements, and end-to-end convolutional networks risked under-extracting key physiological gait features (e.g., Stride Frequency, Step Amplitude). To address this, handcrafted gait features were incorporated to create a “feature-fusion” version of the improved InceptionTime (
Figure 7a).
First, raw acceleration signals were preprocessed using a fourth-order Butterworth band-pass filter and a Savitzky-Golay filter to retain only gait-related frequency components (0.5–5 Hz), reducing noise interference and facilitating the extraction of Stride Frequency, Step Amplitude, Signal Amplitude, and Support Time. Next, each preprocessed window was passed through the feature-extraction module to derive these four gait indicators, which were assembled into a 4 × 1 handcrafted feature vector. Concurrently, the improved InceptionTime branch produced a 128 × 1 deep-feature vector from the same window. These two vectors were concatenated and fed through a fully connected layer to yield the final predicted probabilities for each lameness state.
After the feature-extraction module was incorporated, the training and inference results for each window size are presented in
Figure 7c. Across all window sizes, the lameness detection model augmented with handcrafted features achieved further improvements in accuracy, with the largest gain observed at a window size of 120 frames: validation and test accuracies increased to 84.71% and 80.6%, respectively (
Table 4). Validation accuracy rose by 20.63% relative to the model without the feature-extraction module, while neither model size nor parameter count changed. The confusion matrix (
Figure 7b) demonstrates high accuracy in classifying both early and severe lameness, indicating that symptoms at these stages (early and severe) can be distinguished reliably, producing a recall of 89.11%.
End-to-end deep-learning modules often rely solely on data-driven feature learning, which can be hindered by noise and limited samples, resulting in a failure to capture subtle gait variations. By introducing an expert-knowledge-based feature-extraction module, four handcrafted gait indicators were explicitly computed from the preprocessed signals. These features not only provided physically interpretable representations of dairy cow gait but also supplied additional discriminative cues. In the feature-fusion model, the 128-dimensional deep-feature vector was concatenated with the four-dimensional handcrafted feature vector, enabling the integration of data-driven and knowledge-driven information. Automatically learned features captured complex patterns and nonlinear relationships, while handcrafted features offered clear, physically meaningful metrics. Their complementary effects allowed the model to preserve both local detail and overall trends across different data distributions (validation vs. test), thereby markedly enhancing detection accuracy. Moreover, the presence of handcrafted features reduced dependence on deep layers to extract all discriminative information, mitigating overfitting and substantially improving generalization on unseen data.
3.5. External-Site Validation Under Different Farm and Environmental Conditions
To examine whether the proposed pipeline generalizes beyond the original experimental site, an additional dataset was recorded at Dongxin Dairy Farm, Kaifeng, Henan Province, on 25 June 2025. Six Holstein cows were instrumented with the same metatarsal-mounted IMU sensors and walked along a 40 m concrete lane following the protocol of the primary study. Although breed and attachment protocol were identical, the validation environment differed in several respects: (i) early-summer ambient temperature instead of late-autumn conditions, (ii) lower hygiene scores for the walking surface, and (iii) distinct housing and feeding management. These factors introduce realistic variation in substrate cleanliness, hoof condition, and background limb motion.
After standard preprocessing (30 Hz sampling, 120-frame windows, variance-threshold filtering), 20,433 windows were obtained: 7122 healthy, 4520 mildly lame, and 8791 severely lame. The frozen model trained on the primary site was applied directly without any parameter update. The resulting confusion matrix is shown in
Figure 7d; diagonal elements denote correctly classified windows.
Overall accuracy reached 75.5%. When the mildly and severely lame classes were pooled as a single “lameness” category, the recall for lameness was 85.4%, only 4 percentage points lower than the in-site score (89.1%). Most errors arose from misclassifying healthy windows as lame, likely reflecting greater gait variability caused by higher temperature and less uniform flooring. Nevertheless, the results demonstrate that the learned feature representations retain substantial discriminative power under a new farm, season, and hygiene context, even without retraining. Future work will incorporate this and other external datasets into the training schedule and explore lightweight fine-tuning strategies to further close the domain gap, thereby enhancing robustness across breeds, climates, and management systems.
3.6. X-Axis–Only Signal Selection Balanced Embedded Constraints
All signal preprocessing and model inference in the current system used the forward-aligned acceleration component from the limb-mounted IMU. The sensor was strapped at the distal metatarsus such that its
X-axis was oriented as closely as practical to the direction of limb progression during walking. Under this mounting, the forward component provided a clear cyclic structure related to swing, hoof contact, and stride timing, and therefore served as the input to both the variance-threshold screening (
Section 3.1) and the subsequent deep learning stages (
Section 3.2,
Section 3.3 and
Section 3.4). The study was designed for long-term, real-time monitoring on embedded edge hardware; processing full tri-axial data would substantially increase computational load, memory movement, and energy consumption, which is unfavorable for device miniaturization and operating duration. For these reasons, we prioritized a computationally efficient, low-power input configuration.
Data from the other axes were not incorporated into the present analysis. The sensor’s Z-axis (approximately perpendicular to the walking direction in the limb reference frame) showed no clear additional contribution to gait discrimination in our preliminary inspection. The Y-axis (nominally vertical) could contain information related to limb lift and impact events, but was excluded in order to control experimental variables, simplify the modeling pipeline, and remain within the resource budget of the embedded platform. This forward-axis focus may reduce sensitivity to gait adaptations that manifest outside the sagittal plane—such as lateral displacement, abduction/adduction, or asymmetric weight bearing across contralateral limbs. Future work will evaluate the incremental value of incorporating the Y-axis and, where feasible, full tri-axial signals, together with orientation handling and lightweight feature-fusion strategies, to improve robustness and early detection sensitivity under field conditions.
3.7. Lightweight Design and Data Structure Support Integration with Commercial Herd-Monitoring Systems
Commercial precision-livestock platforms such as AfiAct leg sensor/AfiTag family, CowManager ear-sensor networks, and SenseHub (formerly SCR) collar or tag systems (Afimilk Ltd., Kibbutz Afikim, Israel; CowManager B.V., Harmelen, Netherlands; SCR Engineers Ltd., Netanya, Israel) are widely deployed on dairy farms to generate continuous activity, rumination/ingestion, reproductive, and general health alerts at the individual-cow and group levels. These systems already provide on-farm wireless data acquisition, device addressing, and herd-management software environments into which additional sensor data streams can be ingested. Leveraging such existing infrastructure could substantially reduce the incremental hardware and management burden associated with rolling out specialized lameness analytics.
The proposed early-lameness pipeline produces low-bandwidth, window-level summary outputs (variance screening, feature vectors, class probabilities) rather than requiring continuous high-rate waveform upload, which aligns well with the communication and battery constraints typical of commercial tag networks. In practical deployment, this early-lameness detection pipeline can be embedded directly within a limb-mounted module analogous to current leg-activity tags to preprocess the raw forward-axis acceleration signal and apply variance-threshold screening; after motion is identified, the system transmits only downsampled or event-flagged window-level summaries via the farm’s existing wireless base-station network; finally, the resulting lameness risk scores are integrated seamlessly into herd-management dashboards alongside existing activity, rumination, health, and reproductive alerts to support decision-making.
4. Conclusions
An online early lameness detection method for cows based on an improved InceptionTime architecture and feature fusion was proposed. Acceleration data were acquired using wearable IMU sensors, downsampled, and segmented into sliding windows to construct a library of dairy cow locomotion time-series samples. To achieve a lightweight model, low-latency processing, and high accuracy, three strategies were implemented. First, a threshold discrimination module based on X-axis acceleration variance was designed to rapidly distinguish stationary from motion states within each fixed-length window. On the test set, this module achieved 100% accuracy for stationary windows and 99.7% accuracy for motion windows, with a per-window inference time of only 0.8 ms, thereby alleviating data-storage burdens under high-frequency sampling. Second, within the standard InceptionTime backbone, ordinary Conv1D layers were replaced with YOLOConv1D and SeparableConv1D modules, and Dropout (p = 0.5) was added after each Inception block. These modifications reduced model size and parameter count, helped mitigate overfitting, and markedly enhanced generalization performance on the test set. Finally, four handcrafted features (Signal Amplitude, Stride Frequency, Step Amplitude, and Support Time) were extracted based on dairy cow kinematic characteristics and fused with the 128-dimensional deep features automatically obtained by the improved InceptionTime network. This fusion further improved early-lameness detection, yielding a recall of 89.11% for early and severe lameness with a model size of only 0.72 MB. This approach offers a low-cost, easily deployable solution for real-time early-lameness monitoring and intervention in large-scale dairy farming.
Despite the strong performance demonstrated on both the primary research station and the independent Henan farm, two practical constraints remain. First, the current pipeline processes only the fore-aft (X-axis) acceleration signal, which may limit its ability to detect asymmetric or lateral gait impairments; when hardware resources permit, future work will incorporate multi-axis IMU data to capture three-dimensional movement and improve sensitivity to complex step deviations. Second, although the external-site validation broadens geographic coverage, the overall dataset still represents a single breed and a limited set of housing and management conditions; forthcoming studies will therefore include recordings from multiple commercial farms with varied climates, flooring materials, and production systems, complemented by targeted data augmentation to further enhance generalization. These enhancements are expected to strengthen the pipeline’s robustness across diverse production settings while preserving its low-latency, low-bandwidth operation.