Article

Creation of Machine Learning Models Trained on Multimodal Physiological, Behavioural, Blood Biochemical, and Milk Composition Parameters for the Identification of Lameness in Dairy Cows

by Karina Džermeikaitė 1,*, Justina Krištolaitytė 1, Samanta Grigė 1, Akvilė Girdauskaitė 1, Greta Šertvytytė 1, Gabija Lembovičiūtė 1, Mindaugas Televičius 1, Vita Riškevičienė 2 and Ramūnas Antanaitis 1
1 Animal Clinic, Veterinary Academy, Lithuanian University of Health Sciences, Tilžės Str. 18, LT-47181 Kaunas, Lithuania
2 Department of Veterinary Pathobiology, Faculty of Veterinary, Veterinary Academy, Lithuanian University of Health Sciences, Tilžės Str. 18, 44307 Kaunas, Lithuania
* Author to whom correspondence should be addressed.
Biosensors 2025, 15(11), 722; https://doi.org/10.3390/bios15110722
Submission received: 9 October 2025 / Revised: 22 October 2025 / Accepted: 30 October 2025 / Published: 31 October 2025
(This article belongs to the Special Issue Sensors for Human and Animal Health Monitoring)

Abstract

Lameness remains a significant welfare and productivity challenge in dairy farming, often underdiagnosed due to the limitations of conventional detection methods. Unlike most previous approaches to lameness detection that rely on a single-sensor or gait-based measurement, this study integrates four complementary data domains—behavioural, physiological, biochemical, and milk composition parameters—collected from 272 dairy cows during early lactation to enhance diagnostic accuracy and biological interpretability. The main objective of this study was to evaluate and compare the diagnostic classification performance of multiple machine learning (ML) algorithms trained on multimodal data collected at the time of clinical lameness diagnosis during early lactation, and to identify the most influential physiological and biochemical traits contributing to classification accuracy. Specifically, six algorithms—random forest (RF), neural network (NN), Ensemble, support vector machine (SVM), k-nearest neighbors (KNN), and logistic regression (LR)—were assessed. The input dataset integrated physiological parameters (e.g., water intake, body temperature), behavioural indicators (rumination time, activity), blood biochemical biomarkers (non-esterified fatty acids (NEFA), aspartate aminotransferase (AST), lactate dehydrogenase (LDH), gamma-glutamyl transferase (GGT)), and milk quality traits (fat, protein, lactose, temperature). Among all models, RF achieved the highest validation accuracy (97.04%), perfect validation specificity (100%), and the highest normalized Matthews correlation coefficient (nMCC = 0.94), as determined through Monte Carlo cross-validation on independent validation sets. Lame cows showed significantly elevated NEFA and body temperatures, reflecting enhanced lipid mobilization and inflammatory stress, alongside reduced water intake, milk protein, and lactose content, indicative of systemic energy imbalance and impaired mammary function. 
These physiological and biochemical deviations emphasize the multifactorial nature of lameness. Linear models like LR underperformed, likely due to their inability to capture the non-linear and interactive relationships among physiological, biochemical, and milk composition features, which were better represented by tree-based and neural models. Overall, the study demonstrates that combining sensor data with blood biomarkers and milk traits using advanced ML models provides a powerful, objective tool for the clinical classification of lameness, offering practical applications for precision livestock management by supporting early, data-driven decision-making to improve welfare and productivity on dairy farms.


1. Introduction

Lameness is one of the most prevalent and economically significant health issues affecting dairy farms worldwide. It compromises not only animal welfare but also reproductive efficiency, milk production, and the overall sustainability of dairy operations [1]. Despite its major impact, lameness often remains underdiagnosed, especially during its early stages, primarily due to the limitations of conventional detection methods. Visual locomotion scoring—the most commonly used approach—is subjective, labour-intensive, and highly dependent on the observer’s experience, leading to inconsistent and delayed identification of affected animals [2,3]. It has been shown that 55% of lactations are linked to lameness-related health issues, and 15% to mastitis or uterine infections. This indicates that, despite extensive information regarding dairy cow management, some unresolved challenges persist [4]. Biochemical markers such as non-esterified fatty acids (NEFA) and beta-hydroxybutyrate (BHB) reflect energy balance and are linked to postpartum metabolic disorders, which can affect cow health, productivity, and welfare [5]. Lameness not only affects locomotion but also notably alters cow behaviour—lame dairy cows spend less time feeding, exhibit decreased feed intake, and are less active overall, while rumination time is significantly reduced [5,6]. These behavioural changes often correspond with declines in milk yield, and milk composition is also altered, with lame cows producing less milk and showing higher fat and lower lactose content compared to healthy cows [6].
In recent years, the dairy industry has undergone a technological transformation, with increasing adoption of precision livestock farming (PLF) tools that generate high-frequency, individualized health and productivity data [7]. This advancement opens new opportunities to develop objective, data-driven diagnostic solutions. Yet, most existing efforts to apply data analytics to lameness detection have focused on single domains—such as motion sensors, video-based gait analysis, or milk yield deviations—providing only a partial understanding of the complex physiological disruptions associated with lameness [8,9]. Few studies have attempted to combine diverse biological and production-related indicators into a unified, interpretable model for classifying clinically lame versus healthy cows.
Machine learning (ML), a subfield of artificial intelligence, employs statistical methods to identify or forecast bovine performance or illness occurrences by utilising extensive datasets and managing complex relationships arising from an ever-expanding array of variables [10]. Researchers have employed ML techniques to identify or forecast many health conditions, including clinical mastitis, utilising random forest (RF), naïve Bayes, eXtreme Gradient boosting [11], neural networks (NN) [12], decision-tree induction [13], and logistic generalised linear mixed models [14]. Furthermore, the identification of metritis [15] and the assessment of metabolic status [16] in dairy cows during early lactation have been concurrently executed using five and eight ML algorithms, respectively.
The present study addresses this gap by evaluating the diagnostic performance of six ML models (RF, support vector machine (SVM), logistic regression (LR), NN, k-nearest neighbors (KNN), and an Ensemble approach) in classifying lameness based on a comprehensive, multimodal dataset. Data were collected at the time of clinical diagnosis and included physiological metrics (such as water intake and reticulorumen temperature), behavioural patterns (rumination time, cow activity), blood biomarkers (NEFA, aspartate aminotransferase (AST), lactate dehydrogenase (LDH), gamma-glutamyl transferase (GGT), triglycerides (TRIG), total protein (TP), and iron (Fe)), and milk quality parameters (such as milk protein, lactose, temperature, and fat-to-protein ratio). By integrating these diverse features, this study moves beyond reductionist, single-sensor approaches and reflects the multifactorial nature of lameness as a clinical condition. In ML, it is customary to assess multiple algorithms on integrated data (e.g., management, health, milking), as the efficacy of each algorithm may be influenced by the features, sample size, structure, and other attributes of the data set [16]. Prior research has utilised accelerometer-derived data to predict lameness in dairy cows, revealing a minor to moderate correlation between these behavioural metrics (lying and standing) and lameness detection [17], with an accuracy of 87% [18], sensitivity of 90.2%, and specificity of 91.7% [19]. What sets this study apart is its methodological rigor and biological breadth. The integration of phenotypic and biochemical information allows for a robust evaluation of model performance, highlighting the superiority of tree-based and neural network models in capturing complex patterns associated with lameness.
The novelty of this work lies in its integrative approach, the comparative evaluation of multiple ML algorithms, and the inclusion of underexplored health indicators such as metabolic and inflammatory biomarkers. By bridging sensor data with blood chemistry and milk traits, this study introduces a new paradigm for understanding and diagnosing lameness in dairy cattle. It represents a meaningful advancement in both scientific methodology and real-world applicability, aligning with the broader goals of sustainable and welfare-oriented livestock farming.
The aim of this study was to evaluate and compare the diagnostic classification performance of six ML models—RF, SVM, LR, NN, KNN, and an Ensemble model—in distinguishing clinically lame versus healthy dairy cows during early lactation. All multimodal data, including physiological sensor measurements, behavioural traits, blood biomarkers, and milk quality parameters, were collected at the time of clinical lameness diagnosis. This design enables an objective evaluation of diagnostic model performance, rather than pre-clinical prediction, while emphasizing the biological interpretability of integrated multimodal features.

2. Materials and Methods

2.1. Ethical Approval

All animal-related procedures received approval from the Institutional Animal Care and Use Committee of the Lithuanian University of Health Sciences (LSMU) (Protocol No. G2-227, adopted on 7 March 2025). The research complied with the stipulations of EU Directive 2010/63/EU regarding the protection of animals utilised for scientific purposes.

2.2. Study Design and Animals

This observational case–control study was conducted at the dairy farm of the Practical Training and Research Center of LSMU, using high-producing Holstein-Friesian cows in early lactation. The experiment ran from 5 January 2025 to 31 August 2025. The study population consisted of two groups: cows clinically diagnosed with lameness (n = 110) and clinically healthy controls (n = 162), matched by parity and lactation stage. All cows were housed in free-stall barns equipped with a DeLaval ventilation system (DeLaval Inc., Tumba, Sweden) under uniform management and feeding conditions. The trial period coincided with the early lactation phase (up to 60 days in milk). Cows were milked with DeLaval milking robots (DeLaval Inc., Tumba, Sweden). The mean body weight of the cows was 550 ± 45 kg. In 2024, the average milk production per cow was 10 310 kg, with 4.2% fat and 3.5% protein. Throughout the year, the animals were fed a total mixed ration (TMR) formulated by a nutritionist to meet their physiological requirements and to supply all nutrients needed for optimal health and milk output; ration calculations followed existing formulas [20]. Residual feed was removed daily between 5 a.m. and 2 p.m. Feeding took place daily at 8:00 and 16:00. All cows had unrestricted access to drinking water. The TMR composition is detailed in Table 1 and Table 2.
The study followed a case–control design, in which clinically lame cows (cases) were identified at the time of diagnosis through routine locomotion scoring and clinical examination. Each case was matched with a healthy control cow from the same herd, at a similar stage of lactation and parity, to minimize confounding effects related to production level or physiological stage. Matching was performed within ±5 days of lactation stage. Data collection for both groups was carried out simultaneously, ensuring that environmental and management conditions were equivalent. This design allowed for a direct comparison of physiological, behavioural, biochemical, and milk quality parameters between clinically lame and healthy animals under similar herd conditions.

2.3. Clinical Lameness Assessment

Cows on the farm were systematically assessed for lameness, and those displaying signs of lameness were transferred to an examination pen for the study of gait-related disorders. All cows on the farm receive corrective claw clipping at least twice a year. The herd analysed in this study exhibited a history of lameness associated with foot rot, white line disease, and digital dermatitis.
The classification of cows as either healthy or lame was based on farm herd health records, which ensured that cases of lameness were reported in a timely manner. Rapid detection and notification enabled prompt intervention and treatment, thereby contributing to better herd health outcomes. Veterinary expertise was essential in confirming lameness diagnoses and delivering appropriate therapy, underlining the value of close cooperation between veterinarians and farm personnel. Although the exact interval between the first signs of lameness and veterinary examination was not precisely established, the farm was visited by a veterinarian on a daily basis, excluding weekends. This routine presence likely minimized any delay between the onset of clinical symptoms and initiation of treatment. Continuous veterinary involvement allowed for swift management of lameness cases and helped safeguard animal welfare and herd productivity. All veterinary services were provided by the Large Animal Clinic of the LSMU, where clinicians were trained to apply standardized protocols for diagnosis and therapy. Such collaboration between academic veterinary services and farm staff promoted consistent herd health management and supported more effective livestock production practices.
Lameness was evaluated using a five-point locomotion scoring system, as described by Thomsen et al. [21] and Sprecher et al. [22]. A score of 1 represented normal gait, where cows walked with even, steady strides and showed no signs of discomfort or weight shifting. A score of 2 indicated a slightly irregular gait, with subtle deviations such as shortened steps or minor inconsistencies in movement, although obvious lameness was not apparent. A score of 3 reflected mild lameness, characterized by a clearly uneven gait and visibly shortened strides in one or more limbs, suggesting moderate discomfort though the animal was still able to walk. A score of 4 corresponded to moderate lameness, where cows showed pronounced difficulty in locomotion, often reducing weight-bearing on the affected limb(s), with restricted mobility and obvious pain. Finally, a score of 5 denoted severe lameness, marked by extreme locomotor impairment, avoidance of weight-bearing, and movement achieved only with great effort, indicating substantial pain and compromised functionality.
Cows were classified according to their locomotion scores, with animals scoring 1 or 2 considered non-lame, while those with scores of 3 to 5 were categorized as lame, following the classification system proposed by Winckler and Willen [23]. Veterinary examination further specified the clinical causes of lameness, which included foot ulcers, digital dermatitis, and white line disease. Management of affected cows consisted of hoof trimming combined with pharmacological treatment [24]. Depending on the case, therapy involved either a single subcutaneous injection of ceftiofur sodium (Naxcel® 200 mg/mL, Zoetis Belgium SA, Zaventem, Belgium) at a dosage of 1 mL per 30 kg body weight, or the administration of nonsteroidal anti-inflammatory drugs to relieve pain. In one case, meloxicam (Melovem® 20 mg/mL, Dopharma B.V., Raamsdonksveer, The Netherlands) was administered subcutaneously at a dose of 2.5 mL per 100 kg body weight.
Altogether, 272 cows were enrolled and divided into two study groups based on health status: 162 non-lame cows and 110 lame cows. Health status was determined through veterinary assessment and the locomotion scoring system described above.

2.4. Data Collection and Sensor Measurements

Daily data were collected using automated PLF technologies, including the Brolis HerdLine in-line milk analyser (Brolis Sensor Technology, Vilnius, Lithuania), intraruminal SmaXtec boluses (SmaXtec Animal Care GmbH, Graz, Austria), DeLaval milking robots (DeLaval Inc., Tumba, Sweden), and the SmaXtec climate station. These systems continuously recorded milk composition, behavioural, physiological, reticulorumen, and environmental parameters (including temperature–humidity index (THI)). In addition, blood samples were collected and analysed for biochemical and inflammatory biomarkers. An overview of all collected variables, measurement methods, and instruments is presented in Table 3.

2.5. Blood Sampling and Biochemical Analysis

Blood samples were collected from the coccygeal vein on the day of clinical lameness diagnosis (or the corresponding day for control cows), approximately two hours after feeding. Sampling was performed during the clinical examination, with cows restrained in headlocks, and blood was drawn using a needle and syringe. For biochemical profiling, samples were collected into evacuated tubes without anticoagulant (BD Vacutainer®, Eysin, Switzerland), kept upright, and allowed to clot at ambient temperature (~22 °C) for approximately 30 min. Samples were transported at +4 °C within one hour of collection to the Laboratory of Clinical Tests at the Large Animal Clinic, Veterinary Academy, LSMU, for further analysis. In the laboratory, the samples were centrifuged at 1500× g for 15 min, and the serum was then analysed with Randox clinical chemistry reagent kits on an automated wet chemistry analyser (RX Daytona, Randox Laboratories Ltd., London, UK). The following biomarkers were assessed: liver enzymes (AST, GGT, LDH), metabolic indicators (NEFA, TP, TRIG), and Fe. All analyses were conducted in duplicate for quality assurance.

2.6. Sensor-Based Monitoring of Cow and Environmental Parameters

Reticulorumen monitoring was carried out using orally administered SmaXtec boluses, which were introduced into the reticulorumen with a specialized applicator device in accordance with the manufacturer’s instructions. Prior to use, each bolus was activated, calibrated, and digitally linked to the corresponding animal’s identification number to ensure accurate data assignment. Once applied, the devices were connected to a base station that enabled uninterrupted data transmission throughout the study period.
The boluses continuously measured key physiological and behavioural parameters, including water intake, internal body temperature, rumination time, and cow activity. Data were automatically collected at ten-minute intervals, providing a detailed and dynamic profile of animal health and behaviour.
Environmental monitoring was performed simultaneously using a SmaXtec climate sensor (SmaXtec Animal Care GmbH, Graz, Austria), which continuously recorded ambient temperature and relative humidity. These values were used to calculate the THI, a widely recognized indicator of heat stress in dairy cattle. All physiological, behavioural, and environmental data were processed and integrated through the SmaXtec Messenger® software (version 3.2, SmaXtec Animal Care GmbH, Graz, Austria), creating a comprehensive and real-time overview of health status and environmental conditions.

2.7. Milk Quality Assessment

In this investigation, the Brolis HerdLine in-line milk analyser (Brolis Sensor Technology, Vilnius, Lithuania) was used to record milk composition. During each milking, the milk was analysed for the following parameters: protein, fat, lactose, fat-to-protein ratio, and milk temperature.
The analyser uses a specialised GaSb-based external cavity laser spectrometer that operates in the 2100–2400 nm spectral range and is widely tunable. It monitored the milk flow continuously in transmission mode from the start to the end of milking. The collected absorption spectra were analysed to determine the concentrations of the main milk constituents: lactose, protein, and fat. This compact “mini spectroscope” can be integrated into milking stalls or robotic milking systems, eliminating the need for supplementary chemicals or maintenance. During each milking, the milk composition was measured at five-second intervals. To derive final values representative of the full milking session, weighted averages of fat, protein, and lactose were calculated, with the weights obtained from the dynamics of milk flow.
During calibration, the accuracy of every Brolis HerdLine in-line milk analyser was evaluated at the Eurofins laboratory. The root mean square error of prediction (RMSEP) was 0.21% for fat, 0.19% for protein, and 0.19% for lactose.
Milk yield was automatically recorded at each milking using a DeLaval milking robot.

2.8. Data Processing and Feature Selection

Prior to model development, all data underwent systematic preprocessing. The dataset was first examined for missing values. Observations with more than 10% missing data were excluded, whereas sporadic missing entries (<5% of the dataset) were imputed using variable means to preserve the sample size.
To reduce multicollinearity, Pearson correlation coefficients were computed between all continuous variables. Pairs with a correlation coefficient of r > 0.90 were considered collinear, and one variable from each pair was removed based on biological relevance, measurement reliability, and interpretability.
All continuous predictor variables were standardized using z-score normalization (mean = 0, standard deviation = 1) to ensure equal weighting and improve the performance of algorithms sensitive to feature scale (e.g., SVM, KNN, NN). The categorical outcome variable (lame vs. healthy) was binary encoded (0 = healthy, 1 = lame).
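The preprocessing steps described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the column names and synthetic values are hypothetical, absolute correlations are used, and the rule for choosing which variable of a collinear pair to drop (here simply the later column) is an assumption, since the study made that choice on biological relevance, measurement reliability, and interpretability.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, max_missing: float = 0.10, r_max: float = 0.90) -> pd.DataFrame:
    """Drop sparse rows, mean-impute, remove collinear columns, z-score."""
    # Exclude observations (rows) with more than 10% missing values.
    df = df.loc[df.isna().mean(axis=1) <= max_missing].copy()
    # Impute the remaining sporadic gaps with the column mean.
    df = df.fillna(df.mean())
    # Absolute Pearson correlations; the upper triangle visits each pair once.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Assumption: drop the later column of each pair with |r| > r_max.
    to_drop = [col for col in upper.columns if (upper[col] > r_max).any()]
    df = df.drop(columns=to_drop)
    # z-score normalization: mean 0, standard deviation 1.
    return (df - df.mean()) / df.std(ddof=0)

# Hypothetical demo data: ten independent features plus one near-duplicate.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 10)), columns=[f"f{i}" for i in range(10)])
demo["f0_copy"] = 2.0 * demo["f0"] + 0.001 * rng.normal(size=50)
demo.loc[3, "f1"] = np.nan                   # 1 of 11 values missing: row kept, gap imputed
demo.loc[5, ["f1", "f2", "f3"]] = np.nan     # 3 of 11 values missing: row excluded
clean = preprocess(demo)
print(clean.shape)
```

In a modelling pipeline, the imputation means and scaling parameters would normally be estimated on the training portion only; the source does not state how this was handled.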
Descriptive statistics and group comparisons were performed using independent sample t-tests, with the significance level set at p < 0.05.
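As an illustration of the group comparison, the sketch below runs an independent-samples t-test with SciPy on synthetic values. The numbers are invented for demonstration and are not study data, and the equal-variance (Student) form is an assumption, as the text does not say whether Student's or Welch's test was used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical NEFA-like values for illustration only (not study data).
healthy = rng.normal(0.30, 0.08, 162)
lame = rng.normal(0.45, 0.10, 110)

# Student's t-test assumes equal variances; equal_var=False gives Welch's test.
t_stat, p_value = stats.ttest_ind(lame, healthy, equal_var=True)
significant = p_value < 0.05
print(f"t = {t_stat:.2f}, significant at 0.05: {significant}")
```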

2.9. Description of ML Models and Performance Evaluation

Datasets were initially organized using Microsoft Excel and subsequently imported into Python 3.10 for statistical analysis and ML modelling. Data processing and model development were carried out using the scikit-learn library (v1.2.2) and complementary Python packages (NumPy, Pandas, Matplotlib, and Seaborn).
To evaluate model performance, the full dataset was randomly partitioned into 80% training and 20% validation sets. Stratified sampling was applied to preserve the original class distribution (healthy: 162; lame: 110) within both subsets, ensuring proportional representation of both classes in each training and validation sample. This approach minimizes the risk of bias caused by unequal class proportions and ensures fair comparison across models. Additionally, each validation sample was required to contain at least one lame cow, so that classification metrics could be computed reliably in every iteration.
Model performance was assessed using Monte Carlo cross-validation. Specifically, the random splitting procedure was repeated 10 times, each time generating independent training and validation sets. For each repetition, models were retrained from scratch on the new training data and evaluated on the corresponding validation set. This approach provided a more robust estimate of model generalization performance compared to a single split, as it captures variability associated with data partitioning.
For each model and repetition, performance metrics were computed on the validation set (see Section 2.10 for details). The mean and standard deviation of all evaluation metrics (accuracy, sensitivity, specificity, precision, F1 score, receiver operating characteristic–area under the curve (ROC–AUC), and Matthews correlation coefficient (MCC)) were calculated across the 10 Monte Carlo iterations to summarize the stability and overall performance of each model.
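The repeated stratified 80/20 splitting described above corresponds closely to scikit-learn's `StratifiedShuffleSplit`. The sketch below is one possible implementation under stated assumptions: the data are synthetic stand-ins with the study's class sizes, the random seeds are arbitrary, and a random forest with 100 trees is used only to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in data with the study's class sizes (162 healthy, 110 lame).
rng = np.random.default_rng(0)
y = np.array([0] * 162 + [1] * 110)
X = rng.normal(size=(272, 15))
X[y == 1, :3] += 1.5  # hypothetical signal in the first three features

# Ten independent stratified 80/20 splits (Monte Carlo cross-validation).
splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.20, random_state=0)
acc, mcc, lame_in_val = [], [], []
for train_idx, val_idx in splitter.split(X, y):
    # The model is retrained from scratch on every new training set.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    acc.append(accuracy_score(y[val_idx], pred))
    mcc.append(matthews_corrcoef(y[val_idx], pred))
    lame_in_val.append(int(y[val_idx].sum()))  # stratification keeps lame cows in every split

print(f"accuracy: {np.mean(acc):.3f} (SD {np.std(acc):.3f})")
print(f"MCC:      {np.mean(mcc):.3f} (SD {np.std(mcc):.3f})")
```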

2.9.1. Random Forest

RF, a tree-based ensemble learning method, was chosen due to its robustness to noisy features, ability to handle nonlinear relationships, and built-in feature importance estimation.
The model was trained with 500 decision trees (n_estimators = 500) and a maximum tree depth of 10, which provided a balance between model complexity and generalization. The Gini impurity criterion was used to evaluate split quality, and bootstrap sampling was enabled to introduce variability among trees.
Hyperparameters (n_estimators, max_depth, min_samples_split, and min_samples_leaf) were optimized using grid search during cross-validation. Feature importance rankings were later extracted to interpret which physiological, biochemical, or milk-related traits most influenced classification.
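A random forest tuned by grid search might be set up as in the sketch below. The grid values, the 3-fold inner cross-validation, and the synthetic data are assumptions made for illustration; the source names the tuned hyperparameters but does not report the candidate ranges.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not study data).
rng = np.random.default_rng(0)
y = np.array([0] * 162 + [1] * 110)
X = rng.normal(size=(272, 12))
X[y == 1, :3] += 1.5

# Illustrative grid; it includes the reported setting (500 trees, depth 10).
param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [10],
    "min_samples_split": [2],
    "min_samples_leaf": [1, 2],
}
rf = RandomForestClassifier(criterion="gini", bootstrap=True, random_state=0)
search = GridSearchCV(rf, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

best_rf = search.best_estimator_
importances = best_rf.feature_importances_  # Gini-based importances, summing to 1
print(search.best_params_)
```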

2.9.2. Support Vector Machine

SVM was selected for its effectiveness in high-dimensional spaces and its ability to find non-linear decision boundaries through kernel functions. A Radial Basis Function (RBF) kernel was applied, enabling the model to capture complex patterns in the data.
Two key hyperparameters were optimized:
C (regularization parameter): controlling the trade-off between misclassification and margin width.
γ (gamma): defining the influence of single training points on decision boundaries.
A grid search explored combinations of C and γ values on a logarithmic scale (e.g., 10⁻³ to 10³). Standardized input features ensured appropriate scaling for kernel-based distance computations.
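The logarithmic grid over C and γ could look like the following sketch. The seven grid points per parameter, the 5-fold inner cross-validation, and the synthetic data are assumptions; placing the scaler inside the pipeline is one common way to keep standardization within each training fold.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not study data).
rng = np.random.default_rng(0)
y = np.array([0] * 162 + [1] * 110)
X = rng.normal(size=(272, 12))
X[y == 1, :3] += 1.5

# Standardize, then fit an RBF-kernel SVM.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = {
    "svc__C": np.logspace(-3, 3, 7),      # 0.001, 0.01, ..., 1000
    "svc__gamma": np.logspace(-3, 3, 7),
}
search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```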

2.9.3. Logistic Regression

LR served as a baseline linear model, enabling comparison with more complex, non-linear algorithms. An L2 regularization penalty was applied to reduce overfitting, and the ‘lbfgs’ solver was used for parameter estimation. Regularization strength (C) was optimized through grid search. Despite its simplicity, LR provides interpretable coefficients, useful for understanding the direction and magnitude of effects of individual features.

2.9.4. Neural Network

A feed-forward fully connected neural network was implemented to model potentially complex, non-linear interactions between features. The network architecture consisted of:
Input layer with dimension equal to the number of features,
Two hidden layers with 64 and 32 neurons, respectively,
Output layer with one neuron using a sigmoid activation function for binary classification.
The ReLU activation function was used in hidden layers to improve gradient propagation and computational efficiency. The model was trained using the Adam optimizer with a learning rate of 0.001, binary cross-entropy loss, and a batch size of 32.
Training was performed over 200 epochs, with early stopping based on validation loss to avoid overfitting. Hyperparameters (number of layers, neurons, learning rate, batch size) were optimized through grid search combined with 5-fold cross-validation.
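The architecture described above can be approximated with scikit-learn's `MLPClassifier`, as sketched below. This is an assumption about tooling: the text does not state which library implemented the network. Note also that `MLPClassifier` with `early_stopping=True` monitors a held-out validation score rather than validation loss, so it only approximates the described stopping rule. The data are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (not study data), already on a standardized scale.
rng = np.random.default_rng(0)
y = np.array([0] * 162 + [1] * 110)
X = rng.normal(size=(272, 12))
X[y == 1, :3] += 1.5

mlp = MLPClassifier(
    hidden_layer_sizes=(64, 32),   # two hidden layers
    activation="relu",
    solver="adam",
    learning_rate_init=0.001,
    batch_size=32,
    max_iter=200,                  # upper bound on epochs
    early_stopping=True,           # holds out part of the training data
    validation_fraction=0.1,
    n_iter_no_change=10,
    random_state=0,
)
mlp.fit(X, y)
# For binary targets the output unit is logistic (sigmoid) and the loss is log-loss.
print(mlp.out_activation_, mlp.n_iter_)
```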

2.9.5. k-Nearest Neighbors

KNN was included as a non-parametric, instance-based learning method that classifies new observations based on similarity to the k nearest neighbors in the feature space. The Euclidean distance metric was used to compute similarity, and distance-based weighting was applied so that closer neighbors had a greater influence on classification decisions.
The number of neighbors (k) was optimized within the range k = 3–15, and the optimal value k = 5 was selected based on cross-validation accuracy. Because KNN performance depends strongly on feature scaling, standardized z-score normalized inputs were used.
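To make the distance-weighted vote concrete, the sketch below classifies a single observation by hand with NumPy. The function name `knn_predict` and the toy points are hypothetical; the inverse-distance weighting mirrors what scikit-learn's `KNeighborsClassifier(weights="distance")` does, which is presumably the setting used in the study.

```python
import numpy as np

def knn_predict(X_train: np.ndarray, y_train: np.ndarray, x_new: np.ndarray, k: int = 5) -> int:
    """Distance-weighted k-NN vote for one observation (labels 0 = healthy, 1 = lame)."""
    dist = np.linalg.norm(X_train - x_new, axis=1)       # Euclidean distances
    nearest = np.argsort(dist)[:k]                       # indices of the k closest points
    weights = 1.0 / np.maximum(dist[nearest], 1e-12)     # closer neighbours weigh more
    votes = np.bincount(y_train[nearest], weights=weights, minlength=2)
    return int(np.argmax(votes))

# Toy standardized feature vectors (hypothetical).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

label_a = knn_predict(X_train, y_train, np.array([2.5, 2.5]))
label_b = knn_predict(X_train, y_train, np.array([0.2, 0.2]))
print(label_a, label_b)
```

With k = 5, both queries have neighbours from each class, so the outcome is decided by the weights rather than by a simple count of neighbours.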

2.9.6. Ensemble Model (Stacked RF + NN + SVM)

To combine the strengths of individual classifiers, an ensemble model was constructed using hard voting. The ensemble integrated the predictions from RF, SVM, NN, KNN, and LR. Each model contributed one vote, and the final classification was based on majority voting. Ensemble methods are known to improve generalization by reducing the variance of individual models, especially when combining algorithms with different inductive biases.
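Hard majority voting over the five base models, as described in the paragraph above, can be sketched with scikit-learn's `VotingClassifier`. The base-model settings below are simplified assumptions (untuned SVM and LR, synthetic data), not the tuned models from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not study data).
rng = np.random.default_rng(0)
y = np.array([0] * 162 + [1] * 110)
X = rng.normal(size=(272, 12))
X[y == 1, :3] += 1.5
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=500, max_depth=10, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
        ("nn", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=200, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5, weights="distance")),
        ("lr", LogisticRegression(max_iter=1000)),  # L2 penalty and lbfgs are the defaults
    ],
    voting="hard",  # each model casts one vote; the majority class wins
)
ensemble.fit(X_tr, y_tr)
pred = ensemble.predict(X_val)
val_accuracy = float((pred == y_val).mean())
print(round(val_accuracy, 3))
```

With five voters and two classes a tie cannot occur, which is one practical reason to use an odd number of base models.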

2.9.7. Model Evaluation

Model performance was evaluated on the independent test set using the following metrics: accuracy, sensitivity (recall), specificity, precision, F1 score, ROC-AUC, and the MCC.
Confusion matrices were generated for each model to visualize the distribution of true positives, true negatives, false positives, and false negatives. The MCC was included to provide a balanced evaluation even in the presence of potential class imbalance.
All results were averaged across 10 Monte Carlo repetitions with random train–test splits to ensure the stability and reproducibility of model performance estimates. This repeated resampling approach provides a robust estimate of model generalization performance while mitigating overfitting to specific data partitions. Similar to the dual-layer filtering strategies proposed for correlated network systems by Zhao et al. [25], the ensemble and cross-validation framework adopted in this study addresses the challenges of temporal and physiological correlations within multimodal datasets, supporting reliable model evaluation.

2.10. Measures of Accuracy

To assess the models’ performance, a confusion matrix was created for each classification task. The rows denote the classes predicted by the models, and the columns denote the actual classes. Each entry of the confusion matrix is defined as follows:
True Positives (TP): Instances identified by the model as lame cows that are indeed afflicted cases.
False Positives (FP): Instances identified by the model as lame cows that are, in fact, non-lame.
True Negatives (TN): Instances identified by the model as non-lame cows that are indeed non-lame.
False Negatives (FN): Instances identified by the model as non-lame cows that are, in fact, lame cases.
The subsequent metrics were derived from the confusion matrix:
Accuracy is defined as the ratio of correctly classified observations to all observations. Nonetheless, when the classes are imbalanced, the result may be excessively optimistic [26].
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$
Sensitivity, also known as the True Positive Rate or recall, is the proportion of lame animals that the model correctly identifies.
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity: This is the proportion of negative cases that the model accurately identifies.
$$\mathrm{Specificity} = \frac{TN}{FP + TN}$$
ROC–AUC: the ROC curve is a plot that provides a thorough evaluation of a model’s discriminative ability across classification thresholds. In classification tasks, predicted probabilities are converted into class labels at a defined threshold, which was set to 0.5 in our study: cows were categorised as lame when the predicted probability exceeded 0.5 and as non-lame when it was below 0.5. The false positive rate (FPR), equal to 1 − specificity, is plotted on the x-axis of the ROC curve, and sensitivity is plotted on the y-axis. The area under the ROC curve, ranging from 0 to 1, is termed the ROC–AUC. It quantifies the model’s capacity to distinguish between positive and negative instances; a higher ROC–AUC value indicates a better ability to differentiate between lame and non-lame cows.
Positive predictive value (PPV): This denotes the probability that a cow classified as lame by the model is truly lame. It is affected by the prevalence of the condition within the sample.
$$\mathrm{PPV} = \frac{TP}{TP + FP}$$
Negative predictive value (NPV): This is the probability that a cow classified as non-lame by the model is truly non-lame. Like PPV, this value is affected by the prevalence of the condition in the sample.
$$\mathrm{NPV} = \frac{TN}{TN + FN}$$
The MCC is an alternative statistic that, as noted by Chicco and Jurman [26], is less affected by imbalanced datasets than accuracy. The authors described it as the Pearson product-moment correlation coefficient between the actual and predicted classes. The range of this metric is [−1, +1]. An MCC nearing +1 signifies that the model generates highly accurate predictions. An MCC nearing −1 signifies inadequate model performance, with predictions that systematically disagree with the actual classes. An MCC of 0 signifies performance equivalent to random prediction.
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
Chicco and Jurman [26] suggested that the MCC value can be represented within the interval [0, 1]. This is known as normalised MCC (nMCC), computed using the formula nMCC = (MCC + 1)/2, where 0 signifies the worst-case scenario and 1 indicates the best-case scenario.
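The metric definitions above can be illustrated with a short Python sketch. It is not the study's own script; the function name `confusion_metrics` and the example counts (a hypothetical validation fold of 11 lame and 16 non-lame cows) are assumptions made purely for illustration.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Compute the metrics defined above from the four confusion-matrix counts."""
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero
        return num / den if den else float("nan")

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, fp + tn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "mcc": mcc,
        "nmcc": (mcc + 1) / 2,  # rescales [-1, +1] to [0, 1]
    }

# Hypothetical fold (illustrative counts, not study data):
# 10 of 11 lame cows detected, no healthy cow flagged as lame
m = confusion_metrics(tp=10, fp=0, tn=16, fn=1)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical fold, a single missed lame cow leaves specificity and PPV at 1 while sensitivity drops to 10/11, which shows how small validation sets make sensitivity estimates coarse.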

2.11. Feature Importance Analysis

Feature importance was assessed using two complementary approaches applied to the RF and Ensemble models: Gini importance (mean decrease in impurity) and permutation importance.
The Gini index was used to quantify the average decrease in node impurity contributed by each feature across all trees in the model, providing a measure of how strongly each variable contributed to model splitting decisions.
Permutation importance was calculated by randomly shuffling each feature and measuring the resulting decrease in model performance (ROC-AUC). This method provides a model-agnostic assessment of each variable’s contribution and helps mitigate potential biases associated with Gini-based rankings (e.g., preference for variables with more categories or higher variance).
The aim was to identify which physiological, behavioural, biochemical, and milk traits most strongly contributed to the model’s ability to discriminate between lame and healthy cows, thereby providing both predictive insight and biological interpretation.
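The permutation procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_score` stands in for a fitted model's predicted probability, the data are synthetic, and the number of shuffles (`n_repeats=20`) is an arbitrary choice. ROC-AUC is computed with the rank-based (Mann–Whitney) formulation.

```python
import random

def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_importance(score_fn, X, y, n_repeats=20, seed=0):
    """Mean drop in ROC-AUC after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = roc_auc(y, [score_fn(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = roc_auc(y, [score_fn(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Synthetic data: feature 0 carries the class signal, feature 1 is pure noise
data_rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[label + data_rng.gauss(0, 0.5), data_rng.gauss(0, 1)] for label in y]

def toy_score(row):
    # Hypothetical stand-in for a model's predicted probability; uses only feature 0
    return row[0]

baseline, imp = permutation_importance(toy_score, X, y)
print(f"baseline AUC: {baseline:.3f}, importances: {imp}")
```

Because the stand-in scorer ignores feature 1, shuffling that column leaves the ROC-AUC unchanged, whereas shuffling feature 0 reduces it towards 0.5.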
Descriptive statistics were calculated using SPSS version 29.0 (IBM Corp., Armonk, NY, USA). Data normality was evaluated, and differences between lame and healthy cows were assessed using independent Student’s t-tests for normally distributed variables and one-way ANOVA where appropriate. For non-normally distributed variables, non-parametric tests (Mann–Whitney U) were applied. A 95% confidence interval (CI) was applied, and results were considered statistically significant when p < 0.05. Data are presented as means ± standard deviations for normally distributed variables, and as medians with interquartile ranges for non-normally distributed variables.

3. Results

This section provides a detailed comparison of six ML algorithms applied to the early identification of lameness in dairy cows. Key performance metrics—including accuracy, sensitivity, specificity, predictive values, and overall classification ability—are presented for each model. Moreover, the analysis highlights the most influential physiological, behavioural, blood-based, and milk quality indicators contributing to the classification accuracy, offering insights into their predictive importance for detecting lameness in early lactation.

3.1. Classification Model Performance Based on Normalised MCC

Table 4 presents the performance of six ML models in classifying lame and healthy dairy cows during early lactation, as evaluated by the nMCC. The nMCC metric is particularly well-suited for imbalanced datasets and offers a robust measure of overall model performance by considering true and false positives and negatives. Among the evaluated models, RF achieved the highest nMCC score of 0.94, indicating near-perfect predictive capability and strong agreement between predicted and actual classifications. Both the NN and the Ensemble model also exhibited excellent performance, each attaining an nMCC of 0.90, suggesting their high reliability in capturing complex nonlinear patterns associated with lameness. The KNN algorithm followed with an nMCC of 0.78, demonstrating relatively strong discriminative ability. In contrast, SVM achieved a moderate nMCC of 0.71, indicating somewhat lower classification precision compared to the top-performing models. The LR model, with an nMCC of 0.30, exhibited the weakest performance, likely due to its linear nature and limited capacity to model complex interactions among physiological, behavioural, and biochemical indicators. These findings underscore the superiority of tree-based and neural network approaches for detecting lameness in dairy cows in early lactation, particularly when working with heterogeneous and multidimensional sensor-derived data.

3.2. Comparative Analysis of Classification Models and Influential Diagnostic Features

Table 5 presents a comprehensive evaluation of six ML models—RF, SVM, LR, NN, KNN, and an Ensemble model—for classifying dairy cows with and without lameness during early lactation. Among all tested models, RF achieved the highest performance, with a sensitivity of 92.73 ± 12.06 and a specificity of 100.00 ± 0.00, indicating excellent capability to correctly classify both healthy and lame cows. Its overall accuracy was 97.04 ± 4.91, and its area under the curve (AUC) (99.77 ± 0.68) and MCC (0.94 ± 0.10) confirmed its strong generalizability and robust predictive power. The NN model showed comparably high performance, with a sensitivity of 94.55 ± 11.64, specificity of 95.07 ± 6.04, and an AUC of 97.73 ± 5.73, suggesting its suitability for capturing complex nonlinear patterns related to lameness. The Ensemble model, which integrates predictions from RF, SVM, and NN, also achieved high scores across metrics, including an MCC of 0.90 ± 0.09 and AUC of 96.82 ± 4.43, confirming the benefits of model aggregation.
In contrast, LR demonstrated the weakest performance, with a sensitivity of 52.73 ± 9.79, specificity of 75.85 ± 8.75, and an AUC of only 74.89 ± 6.16, indicating limited utility in this complex classification task. SVM provided intermediate results, showing strong specificity (94.49 ± 5.09) but lower sensitivity (73.64 ± 11.10), and an AUC of 94.31 ± 5.89, reflecting a tendency to favour healthy classifications. KNN achieved moderate performance across all metrics, with a sensitivity of 89.09 ± 12.06, specificity of 88.38 ± 8.29, and an AUC of 89.39 ± 7.53.
Among the tested models, RF demonstrated the most consistent and outstanding performance, achieving near-perfect scores in all four metrics (sensitivity = 92.73 ± 12.06; specificity = 100.00 ± 0.00; PPV = 100.00 ± 0.00; accuracy = 97.04 ± 4.91) (Figure 1). NN and the Ensemble model also performed strongly, with almost identical metric profiles (e.g., accuracy = 94.84 ± 4.73; PPV = 93.85 ± 7.54), reflecting their robust ability to capture complex patterns in high-dimensional physiological, behavioural, and milk-based features. SVM displayed strong specificity (94.49 ± 5.09) and PPV (90.53 ± 8.28), but a lower sensitivity (73.64 ± 11.10), suggesting a tendency to under-detect lame cows. LR showed the weakest performance overall, particularly in sensitivity (52.73 ± 9.79) and PPV (60.46 ± 9.01), indicating a limited ability to generalize beyond linear patterns. KNN achieved intermediate results with balanced though slightly lower metrics compared to NN and the Ensemble model.
Among all models, RF consistently achieved the highest scores across most metrics, including perfect specificity and PPV (100.00 ± 0.00), high accuracy (97.04 ± 4.91), and AUC (99.77 ± 0.68), confirming its superior ability to correctly classify both lame and healthy cows with minimal error (Figure 2). NN and the Ensemble model closely followed, showing strong performance across all criteria, particularly with balanced sensitivity (94.55 ± 11.64), NPV (96.89 ± 6.53), and AUC values above 96%. SVM demonstrated good specificity and PPV but was notably lower in sensitivity (73.64 ± 11.10), suggesting its limited ability to detect all lame cows. KNN also performed relatively well, especially in accuracy and AUC, outperforming SVM in sensitivity (89.09 ± 12.06) but slightly underperforming in specificity. Conversely, LR consistently exhibited the lowest metrics among all models, particularly in sensitivity (52.73 ± 9.79) and AUC (74.89 ± 6.16), indicating poor discrimination capacity for the lameness condition based on complex, nonlinear physiological and biomarker signals.
This comparative visualization clearly illustrates that ensemble-based and nonlinear models (RF, NN, Ensemble) are more reliable and accurate in detecting early-lactation lameness than linear models, which are less adaptable to the multimodal and noisy nature of sensor and biological data.

3.3. Feature Importance Evaluation

To explore which parameters contributed most to the model’s decision-making, a feature importance analysis was performed using the Random Forest classifier. Variable importance was estimated through permutation analysis on the validation set, quantifying the decrease in predictive performance when each feature was randomly permuted. Although the analysis identified consistent trends, the overall ranking stability was affected by the moderate class imbalance (162 healthy vs. 110 lame cows) and by intercorrelations among several predictors—particularly milk composition traits (fat, protein, lactose) and biochemical parameters (GGT, AST, NEFA). As a result, importance scores exhibited small absolute differences, making precise ordering between features less meaningful. Future studies using larger datasets and model-agnostic interpretability techniques (e.g., SHAP analysis) could provide more stable feature ranking across models.

3.4. Descriptive Comparison of Lame and Healthy Cows

Descriptive statistical analysis was conducted to compare a broad set of indicators between dairy cows diagnosed with lameness and clinically healthy counterparts during early lactation (Table 6).
Among environmental parameters, the THI and heat index showed no significant differences between groups (p = 0.722 and p = 0.686, respectively), indicating that lameness was not confounded by ambient climatic conditions in this dataset. However, significant physiological and behavioural deviations were observed. Water intake was significantly lower in lame cows (127.35 ± 29.77 L/day) compared to healthy cows (145.06 ± 29.92 L/day, p < 0.001), reflecting altered hydration or feeding behaviour possibly due to discomfort or reduced mobility. Reticulorumen temperature without drinking cycles and normal body temperature were slightly elevated in lame cows (p = 0.003 and p < 0.001, respectively), potentially indicating low-grade inflammation or altered thermoregulation. In contrast, overall cow activity and rumination time did not differ significantly between groups (p = 0.534 and p = 0.461), suggesting that compensatory behavioural adaptations may mask clinical signs in some cases.
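As a rough plausibility check of the water intake comparison above, the pooled two-sample t statistic can be recomputed from the reported summary values. This sketch rests on assumptions not stated in the text: that the values are means ± standard deviations, that all 162 healthy and 110 lame cows contributed a record, and that Student’s t-test was the test applied to this variable. The function name `pooled_t` is illustrative.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's two-sample t statistic (equal variances) from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Water intake (L/day): healthy vs. lame cows; group sizes are an assumption
t_stat, df = pooled_t(145.06, 29.92, 162, 127.35, 29.77, 110)
print(f"t = {t_stat:.2f}, df = {df}")
```

Under these assumptions the statistic is roughly 4.8 on 270 degrees of freedom, which is in line with the reported p < 0.001.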
Biochemically, lame cows exhibited significantly lower levels of AST, GGT, and LDH, with respective p-values of 0.001, <0.001, and <0.001. These findings may indicate hepatic strain or systemic inflammatory responses in association with lameness. Notably, NEFA concentrations were significantly elevated in lame cows (0.14 ± 0.14 mmol/L vs. 0.09 ± 0.08 mmol/L in healthy cows, p < 0.001), suggesting increased lipomobilization and metabolic stress. While TP and TRIG levels did not differ significantly between groups, Fe levels showed a borderline reduction in lame cows (p = 0.076).
Milk production traits revealed further disparities. Although milk yield was slightly higher in lame cows (37.68 ± 9.69 kg/day vs. 36.44 ± 8.88 kg/day), the difference was not statistically significant (p = 0.278), possibly due to large variability within groups. However, milk protein content was significantly lower in lame cows (3.21 ± 0.38%) compared to healthy cows (3.35 ± 0.27%, p < 0.001), while milk lactose was also lower in the lame group (4.71 ± 0.51% vs. 4.84 ± 0.18%, p = 0.003). No significant differences were detected for milk fat content, fat-to-protein ratio, or milk temperature.

4. Discussion

4.1. Main Features Associated with Clinical Lameness in Dairy Cattle

This study provides compelling evidence that ML algorithms, particularly RF, NN, and Ensemble models, can accurately classify dairy cows with clinically diagnosed lameness using a comprehensive, multimodal dataset encompassing physiological, behavioural, blood-based, and milk composition variables. The high classification accuracy achieved by RF (97.04%), coupled with perfect specificity and PPV, demonstrates the strong predictive capability of tree-based models in handling heterogeneous, nonlinear data. NN and the Ensemble model also performed consistently well across all performance metrics, further validating their capacity to capture complex biological interactions associated with lameness. Similar findings have been reported in other contexts; for example, in a study evaluating classification models for lameness treatment events, RF was compared with LR and Gaussian Naïve Bayes. Although the RF model achieved a more modest performance (AUC = 0.71) for lameness detection based on sensor-derived variables such as pedometer activity and feed intake, it was still recommended as a benchmark model owing to its interpretability and consistent predictive ability. Interestingly, other authors also observed that oversampling techniques did not enhance AUC, reinforcing RF’s robustness under different modelling conditions [27]. Indeed, previous research in dairy cattle health monitoring confirms the suitability of RF in such contexts: for example, Dineva et al. [28] applied RF to classify cow health status using heterogeneous IoT and sensor data and achieved an accuracy of 0.959, with recall 0.954 and precision 0.97 [28].
The superiority of RF and NN observed in this study aligns with previous research indicating that models capable of handling nonlinear, high-dimensional data outperform simpler linear methods in animal health applications [9,29,30]. In contrast, LR, a linear model, showed limited predictive value, with the lowest sensitivity (52.73%) and AUC (74.89%) among all tested models. This aligns with earlier findings, where logistic regression models for lameness detection generally achieved only moderate accuracy, with AUC values ranging from 0.70 to 0.77 [31]. One explanation for this limited performance is that LR assumes linear relationships between predictors and outcomes, whereas lameness in dairy cows arises from multi-factorial and nonlinear physiological processes. Pain-induced alterations in gait, weight redistribution, and locomotor asymmetry interact with changes in feed intake, rumination time, and milk yield, all of which are influenced by metabolic and inflammatory states [32]. Such complex interactions are not easily captured by linear models.
Among the models evaluated, SVM and KNN demonstrated intermediate performance. SVM showed high specificity (94.49%) but lower sensitivity (73.64%), suggesting a bias toward correctly identifying healthy animals while under-detecting truly lame cows. This could lead to an increased risk of false negatives in practical settings. KNN displayed balanced but slightly lower overall performance, which may be attributed to its sensitivity to feature scaling and data noise. In line with these observations, previous work has shown that a Long Short-Term Memory (LSTM) model trained on step-size feature vectors achieved a lameness detection accuracy of 98.57%, outperforming SVM, KNN, and Decision Tree Classifier (DTC) by margins of 2.93%, 3.88%, and 9.25%, respectively [33], further emphasizing the advantage of advanced nonlinear models in capturing gait-related abnormalities. Complementing our findings, Neupane et al. [34] evaluated ML models using accelerometer-derived behavioural data (day-to-day lying time, step counts, and their trends) for detecting lameness and the need for corrective or therapeutic claw treatments. They found that the ROCKET time-series classifier, particularly when combining conventional and slope features, significantly outperformed RF, Naïve Bayes, and LR, achieving accuracies > 90%, ROC–AUCs > 0.74, and F1-scores > 0.61 for identifying cows needing intervention [34]. Post et al. [27] developed classification models for both mastitis and lameness treatment events using daily individual sensor data, such as milking parameters, pedometer activity, feed and water intake, and body weight. They compared various ML methods (LR, SVM, KNN, Gaussian Naïve Bayes, Extra Trees (ET), and RF) and found that the ET classifier achieved the highest mean AUC of 0.79 for mastitis and 0.71 for lameness, closely followed by Gaussian Naïve Bayes, LR, and RF, highlighting good interpretability alongside competitive performance [27]. A recent study by Lemmens et al. [35] demonstrated that integrating sensor data, automated milking system (AMS) parameters, and farm-level information substantially improved the detection of mild lameness (locomotion score ≥ 2). Their RF model achieved an accuracy of approximately 0.75, with a sensitivity of 0.72 and a specificity of 0.78. Importantly, eating time, low activity, medium activity, and activity trends differed significantly between lame and non-lame cows, whereas rumination time remained largely unaffected [35].
The Ensemble model, integrating predictions from RF, NN, and SVM, provided robust performance across all metrics and may serve as a practical compromise when aiming to balance specificity, sensitivity, and interpretability. Its consistent classification accuracy and relatively low variation across Monte Carlo cross-validations point to its potential as a stable and generalizable tool for on-farm decision support. Similarly, Yuhao Shen et al. [36] described an ensemble learning approach for detecting cow lameness, where an improved YOLOv8-Pose model was used to identify key points on the hooves, knees, hips, and head, and motion features were fused through a stacking ensemble method, achieving an overall accuracy of 97.2% [36].
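One common way to integrate the predictions of several classifiers is soft voting, in which the predicted probabilities are averaged and then thresholded. The text does not state how the Ensemble model combined RF, NN, and SVM outputs, so the sketch below is only an assumed illustration of the idea; the function name `soft_vote` and the probability values are hypothetical.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-cow lameness probabilities across models, then threshold."""
    n_models = len(prob_lists)
    averaged = [sum(probs) / n_models for probs in zip(*prob_lists)]
    labels = [int(p > threshold) for p in averaged]  # 1 = lame, 0 = non-lame
    return averaged, labels

# Hypothetical predicted probabilities of lameness for three cows
rf_probs = [0.90, 0.20, 0.55]
svm_probs = [0.70, 0.40, 0.30]
nn_probs = [0.80, 0.10, 0.60]

averaged, labels = soft_vote([rf_probs, svm_probs, nn_probs])
print(averaged, labels)
```

For the third cow, two of the three models exceed 0.5, yet the averaged probability falls below the threshold. Soft voting therefore weighs model confidence rather than simply counting votes.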
The clinical relevance of the models’ performance is underscored by the biological interpretation of key input features. For example, lame cows exhibited significantly lower water intake and higher reticulorumen and body temperatures, traits likely reflective of discomfort, inflammatory responses, or altered metabolic function. Biomarkers such as NEFA, GGT, AST, and LDH were also significantly altered in lame cows, supporting the hypothesis that lameness is accompanied by systemic physiological and metabolic changes. Interestingly, Meléndez et al. [37] demonstrated that acute health disorders in dairy cows can induce measurable shifts in serum metabolic profiles, including GGT activity, thereby supporting the notion that alterations in this enzyme are part of a broader systemic response to disease rather than isolated anomalies [37]. Reduced AST levels observed in our study may also point toward compromised metabolic processes; however, it is important to acknowledge that the relationship between lameness and AST activity remains inconsistent in the literature, with some reports describing unchanged values [38,39]. These findings concur with Praxitelous et al. [40], who documented higher GGT concentrations in lame cows during the puerperium period (25.83 vs. 23.56, p = 0.02) [40], as well as with untargeted metabolomics work by He et al. [41], which identified lipid-metabolism metabolites as discriminative markers in lame dairy cattle [41]. Furthermore, scoping reviews such as Sadiq et al. [42] consistently list NEFA and liver enzymes among the most frequently studied biomarkers linked to lameness in dairy cows [42]. In addition, Dineva et al. [28] demonstrated that an RF classifier using heterogeneous sensor and physiological data achieved high predictive accuracy (0.959), recall (0.954) and precision (0.97), which is evidence for the utility of combining diverse biological signals in health monitoring [28].
These discrepancies suggest that biochemical responses to lameness are likely influenced by the severity, chronicity, and underlying causes of locomotor impairment, as well as by concurrent metabolic and inflammatory conditions. Notably, elevated NEFA levels in lame cows suggest increased lipomobilization, which may reflect an underlying energy imbalance or stress-induced metabolic shift. These findings are consistent with prior studies that have identified NEFA and liver enzymes as important indicators of systemic inflammation and metabolic stress in dairy cattle.
From a milk production perspective, lame cows showed reduced milk protein and lactose content, further highlighting the systemic impact of lameness on metabolic efficiency and mammary gland function. Although overall milk yield was not significantly different, the compositional shifts in milk underline the potential of milk traits as non-invasive biomarkers for health monitoring. Kass et al. [43], in a study of Estonian Holstein cows, demonstrated that lame individuals produced significantly less milk overall, with concomitant decreases in milk protein and fat yield compared to their non-lame counterparts [43]. Furthermore, our previous studies showed that milk lactose dynamics are also altered around the onset of lameness: healthy cows had significantly lower lactose levels compared to lame cows both on the day of diagnosis (−2.15%) and seven days thereafter (−1.73%), suggesting that lameness is associated with transient changes in lactose synthesis and secretion [44]. Indeed, in the study by Jukna et al. [45], severe lameness was associated with a decrease in milk lactose concentration by 0.16 percentage points (p < 0.001) as lameness severity intensified, supporting the notion that lameness alters milk composition [45]. Moreover, Bonfatti et al. [46] explored the use of milk mid-infrared spectra to predict lameness scores, suggesting that deviations in milk spectral traits may reflect underlying metabolic disturbances, which in turn influence milk composition [46].
The effectiveness of our predictive models for lameness is underscored by the biological and physiological significance of their key input features. We observed that lame cows exhibit reduced water intake and elevated body temperature. The elevated body temperature is a clear physiological marker of inflammation, a process where the cow redirects energy and resources away from productive behaviours like milk synthesis and toward pain management and tissue repair. Further supporting this link, a separate study by Antanaitis et al. [47] demonstrated that changes in reticulorumen temperature patterns coincided with the onset of clinical lameness, highlighting the strong connection between these physiological indicators and the manifestation of the disease [47]. Therefore, our models’ reliance on these specific behavioural and physiological features is biologically justified, reinforcing their clinical relevance for early lameness detection.
Interestingly, rumination time and cow activity did not differ significantly between groups, which may suggest behavioural compensation in lame animals or highlight the difficulty of relying on these measures in isolation for lameness detection. Weigele et al. [48] observed that moderate lameness had no significant impact on rumination time, number of ruminating chews, or boluses, suggesting that basic rumination behaviour remains relatively stable despite locomotor impairment [48]. Likewise, Thorup et al. [49] reported that lameness did not affect daily rumination time, number of rumination events, or overall rumination behaviour, even though feeding behaviour was clearly altered. From a physiological standpoint, this stability may be explained by the cow’s strong homeostatic drive to maintain rumen function and fiber digestion, which are essential for sustaining microbial fermentation and volatile fatty acid production, even when mobility is compromised [50]. Consequently, lame cows may preserve rumination behaviour while reducing other energy-demanding activities, such as locomotion or the frequency of visits to the feed bunk [51]. These findings underscore that rumination and activity data, when used in isolation, may fail to capture the multifactorial nature of lameness, highlighting the need to integrate them with gait-related metrics, weight distribution patterns, or biochemical indicators to achieve more reliable detection. This further justifies the use of multimodal data integration in model development. The inclusion of both objective sensor data and clinically relevant biomarkers strengthens the interpretability and real-world applicability of the resulting models.

4.2. Reflections on Strengths, Limitations, and Scientific Outlook

One of the main strengths of this study lies in its integrative approach to lameness classification, combining physiological, behavioural, blood biochemical, and milk composition data into a comprehensive ML framework. This multimodal design reflects the multifactorial nature of lameness and allows for more biologically meaningful classification models than approaches that rely on single data streams. By evaluating six different algorithms and incorporating robust cross-validation, the study offers a clear and comparative picture of model performance and stability. The consistent superiority of Random Forest, Neural Networks, and the Ensemble model demonstrates the power of non-linear and ensemble methods to capture complex patterns in real-world, farm-derived data. Although preliminary feature importance analysis was conducted, the relatively limited sample size and intercorrelated structure of the dataset may have affected the stability of variable ranking. Future studies with larger, longitudinal datasets are needed to derive more robust interpretability measures.
Another important strength is the biological interpretability of the input features. Several of the indicators that differed significantly between lame and healthy cows—such as water intake, NEFA levels, liver enzymes (GGT, AST, LDH), milk protein, and lactose content—are not only statistically significant but also physiologically relevant, reinforcing the clinical validity of the model outputs. These findings support the growing role of sensor and biomarker-based monitoring systems in veterinary diagnostics.
Despite these contributions, the study has several limitations. Most importantly, the models serve as diagnostic aids rather than predictive tools: because the data were collected at the time of clinical diagnosis, they classify cows as lame or healthy according to their current physiological and biochemical status and do not predict the future onset of lameness. This distinction is important for setting realistic expectations about their practical application. Moreover, the cross-sectional nature of the study limits conclusions about the temporal progression of lameness or the dynamics of the involved indicators over time.
Another constraint is the absence of external validation. While the models performed well under internal cross-validation, their generalizability to other farms, management systems, or cow populations remains to be tested. Additionally, behavioural variables such as rumination time and activity did not show significant differences between groups, possibly due to adaptation or masking behaviours in lame cows. This suggests that sensor-based behavioural indicators alone may not be sufficiently sensitive or specific for detecting lameness without additional physiological or biochemical context. It should also be noted that a moderate class imbalance existed in the dataset (healthy cows = 162; lame cows = 110). Although stratified sampling and balanced evaluation metrics such as MCC and nMCC were used to mitigate its effects, linear models like Logistic Regression may still be more sensitive to such imbalance, which could have contributed to their comparatively lower performance.
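The stratified sampling mentioned above, combined with Monte Carlo (repeated random) cross-validation, can be sketched as follows. The validation fraction (20%) and the number of repetitions (10) are assumptions chosen for illustration and are not taken from the study; the function name `stratified_mc_splits` is likewise hypothetical.

```python
import random
from collections import Counter

def stratified_mc_splits(labels, val_fraction=0.2, n_splits=10, seed=0):
    """Yield repeated random train/validation index splits that keep the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(n_splits):
        train, val = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            n_val = round(len(shuffled) * val_fraction)  # same fraction from every class
            val.extend(shuffled[:n_val])
            train.extend(shuffled[n_val:])
        yield train, val

# Class sizes as reported: 162 healthy (0) and 110 lame (1) cows
labels = [0] * 162 + [1] * 110
splits = list(stratified_mc_splits(labels))
val_counts = [Counter(labels[i] for i in val) for _, val in splits]
print(val_counts[0])
```

With these assumed settings, every validation set contains 32 healthy and 22 lame cows, so each repetition is evaluated at the same class ratio as the full dataset. Metrics computed on each repetition can then be summarised as mean ± standard deviation.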
Looking ahead, future research should focus on longitudinal data collection and repeated measurements across the disease trajectory to evaluate whether these multimodal ML models can be adapted for pre-clinical prediction, relapse monitoring, or progression assessment. External validation using independent datasets from commercial farms is also essential to assess model scalability and real-world applicability. Lastly, integrating genetic, environmental, and management variables could further refine model accuracy and resilience across diverse production settings.
In summary, this study contributes a robust, multimodal framework for clinical classification of lameness in dairy cows and highlights the utility of combining biosensor data with milk and blood biomarkers. While not predictive, the models offer a promising diagnostic support tool that can enhance the objectivity, consistency, and efficiency of veterinary decision-making within precision dairy systems.

5. Conclusions

This study confirms the effectiveness of ML models, particularly RF, NN, and Ensemble methods, in accurately classifying clinically lame dairy cows using multimodal data collected at the time of diagnosis. The RF model achieved the highest accuracy (97.04%) and perfect specificity and PPV, while NN and the Ensemble model also demonstrated strong and consistent diagnostic performance. In contrast, linear models such as LR showed substantially lower sensitivity and diagnostic value. Significant differences in physiological, biochemical, and milk-related indicators between lame and healthy cows, including elevated NEFA concentrations and body temperature, altered liver enzyme activities (AST, GGT, LDH), and reduced milk protein and lactose, confirm the biological relevance of the selected features. These findings highlight the multifactorial nature of lameness and justify the integration of such parameters into data-driven diagnostic tools.
While the models were trained on data collected during clinical lameness diagnosis—and therefore do not constitute pre-clinical prediction—they provide a robust, objective, and high-accuracy diagnostic framework for detecting lameness. This approach represents a significant step forward in precision dairy health monitoring and establishes a foundation for future integration of multimodal biosensor and biomarker data into intelligent herd management systems.

Author Contributions

Conceptualization, K.D. and R.A.; methodology, K.D.; software, K.D.; formal analysis, S.G., A.G., G.Š., J.K. and G.L.; investigation, S.G., A.G., G.Š., J.K. and G.L.; data curation, K.D., V.R. and M.T.; writing—original draft preparation, K.D. and J.K.; writing—review and editing, V.R., M.T. and J.K.; visualization, K.D.; project administration, R.A. All authors have read and agreed to the published version of the manuscript.

Funding

The Research Council of Lithuania (LMTLT) and the Ministry of Education, Science, and Sport of the Republic of Lithuania have provided financial support for this initiative under agreement No: S-A-UEI-23-7.

Institutional Review Board Statement

The research was performed in compliance with the Declaration of Helsinki and received approval from the Ethics Committee (approval number G2-227, dated 7 March 2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

All pertinent data are presented within the manuscript. The machine learning scripts developed for this study were written in Python and can be obtained from the corresponding author upon reasonable request to support transparency and reproducibility of the findings.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ML: Machine learning
RF: Random forest
SVM: Support vector machine
NN: Neural network
KNN: k-nearest neighbors
LR: Logistic regression
NEFA: Non-esterified fatty acids
AST: Aspartate aminotransferase
LDH: Lactate dehydrogenase
GGT: Gamma-glutamyl transferase
nMCC: Normalized Matthews correlation coefficient
BHB: Beta-hydroxybutyrate
PLF: Precision livestock farming
TRIG: Triglycerides
ROC: Receiver operating characteristic
AUC: Area under the curve
MCC: Matthews correlation coefficient
FP: False positives
TN: True negatives
FN: False negatives
FPR: False positive rate
PPV: Positive predictive value
NPV: Negative predictive value
TP: Total protein
Fe: Iron
LSMU: Lithuanian University of Health Sciences
TMR: Total mixed ration
DM: Dry matter
THI: Temperature–humidity index
RMSEP: Root mean square error of prediction
RBF: Radial basis function
TP: True positives
CI: Confidence interval
LSTM: Long short-term memory
DTC: Decision tree classifier
ET: Extra trees
AMS: Automated milking system

References

  1. Barney, S.; Dlay, S.; Crowe, A.; Kyriazakis, I.; Leach, M. Deep Learning Pose Estimation for Multi-Cattle Lameness Detection. Sci. Rep. 2023, 13, 4499. [Google Scholar] [CrossRef]
  2. Taneja, M.; Byabazaire, J.; Jalodia, N.; Davy, A.; Olariu, C.; Malone, P. Machine Learning Based Fog Computing Assisted Data-Driven Approach for Early Lameness Detection in Dairy Cattle. Comput. Electron. Agric. 2020, 171, 105286. [Google Scholar] [CrossRef]
  3. Randall, L.V.; Kim, D.-H.; Abdelrazig, S.M.A.; Bollard, N.J.; Hemingway-Arnold, H.; Hyde, R.M.; Thompson, J.S.; Green, M.J. Predicting Lameness in Dairy Cattle Using Untargeted Liquid Chromatography-Mass Spectrometry-Based Metabolomics and Machine Learning. J. Dairy Sci. 2023, 106, 7033–7042. [Google Scholar] [CrossRef]
  4. Cockburn, M. Review: Application and Prospective Discussion of Machine Learning for the Management of Dairy Farms. Animals 2020, 10, 1690. [Google Scholar] [CrossRef] [PubMed]
  5. Overton, T.R.; McArt, J.A.A.; Nydam, D.V. A 100-Year Review: Metabolic Health Indicators and Management of Dairy Cattle. J. Dairy Sci. 2017, 100, 10398–10417. [Google Scholar] [CrossRef] [PubMed]
  6. Bruinjé, T.C.; LeBlanc, S.J. Invited Review: Inflammation and Health in the Transition Period Influence Reproductive Function in Dairy Cows. Animals 2025, 15, 633. [Google Scholar] [CrossRef]
  7. Schmeling, L.; Elmamooz, G.; Hoang, P.T.; Kozar, A.; Nicklas, D.; Sünkel, M.; Thurner, S.; Rauch, E. Training and Validating a Machine Learning Model for the Sensor-Based Monitoring of Lying Behavior in Dairy Cows on Pasture and in the Barn. Animals 2021, 11, 2660. [Google Scholar] [CrossRef]
  8. Dhaliwal, Y.; Bi, H.; Neethirajan, S. Bimodal Data Analysis for Early Detection of Lameness in Dairy Cows Using Artificial Intelligence. J. Agric. Food Res. 2025, 21, 101837. [Google Scholar] [CrossRef]
  9. Riaz, M.U.; O’Grady, L.; McAloon, C.G.; Logan, F.; Gormley, I.C. Comparison of machine learning and validation methods for high-dimensional accelerometer data to detect foot lesions in dairy cattle. PLoS ONE 2025, 20, e0325927. [Google Scholar] [CrossRef]
  10. Zhou, X.; Xu, C.; Wang, H.; Xu, W.; Zhao, Z.; Chen, M.; Jia, B.; Huang, B. The Early Prediction of Common Disorders in Dairy Cows Monitored by Automatic Systems with Machine Learning Algorithms. Animals 2022, 12, 1251. [Google Scholar] [CrossRef]
  11. Fadul-Pacheco, L.; Delgado, H.; Cabrera, V.E. Exploring Machine Learning Algorithms for Early Prediction of Clinical Mastitis. Int. Dairy J. 2021, 119, 105051. [Google Scholar] [CrossRef]
  12. Ankinakatte, S.; Norberg, E.; Løvendahl, P.; Edwards, D.; Højsgaard, S. Predicting mastitis in dairy cows using neural networks and generalized additive models: A comparison. Comput. Electron. Agric. 2013, 99, 1–6. [Google Scholar] [CrossRef]
  13. Kamphuis, C.; Mollenhorst, H.; Heesterbeek, J.A.P.; Hogeveen, H. Detection of Clinical Mastitis with Sensor Data from Automatic Milking Systems Is Improved by Using Decision-Tree Induction. J. Dairy Sci. 2010, 93, 3616–3627. [Google Scholar] [CrossRef] [PubMed]
  14. Khatun, M.; Thomson, P.C.; Kerrisk, K.L.; Lyons, N.A.; Clark, C.E.F.; Molfino, J.; García, S.C. Development of a New Clinical Mastitis Detection Method for Automatic Milking Systems. J. Dairy Sci. 2018, 101, 9385–9395. [Google Scholar] [CrossRef]
  15. Džermeikaitė, K.; Krištolaitytė, J.; Antanaitis, R. Application of Machine Learning Models for the Early Detection of Metritis in Dairy Cows Based on Physiological, Behavioural and Milk Quality Indicators. Animals 2025, 15, 1674. [Google Scholar] [CrossRef]
  16. Xu, W.; van Knegsel, A.T.M.; Vervoort, J.J.M.; Bruckmaier, R.M.; van Hoeij, R.J.; Kemp, B.; Saccenti, E. Prediction of Metabolic Status of Dairy Cows in Early Lactation with On-Farm Cow Data and Machine Learning Algorithms. J. Dairy Sci. 2019, 102, 10186–10201. [Google Scholar] [CrossRef]
  17. O’Leary, N.W.; Byrne, D.T.; O’Connor, A.H.; Shalloo, L. Invited Review: Cattle lameness detection with Accelerometers. J. Dairy Sci. 2020, 103, 3895–3911. [Google Scholar] [CrossRef]
  18. Byabazaire, J.; Olariu, C.; Taneja, M.; Davy, A. Lameness Detection as a Service: Application of Machine Learning to an Internet of Cattle. In Proceedings of the 2019 16th IEEE Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 11–14 January 2019; pp. 1–6. [Google Scholar]
  19. Beer, G.; Alsaaod, M.; Starke, A.; Schuepbach-Regula, G.; Müller, H.; Kohler, P.; Steiner, A. Use of Extended Characteristics of Locomotion and Feeding Behavior for Automated Identification of Lame Dairy Cows. PLoS ONE 2016, 11, e0155796. [Google Scholar] [CrossRef]
  20. Hall, M.B. Invited Review: Corrected Milk: Reconsideration of Common Equations and Milk Energy Estimates. J. Dairy Sci. 2023, 106, 2230–2246. [Google Scholar] [CrossRef] [PubMed]
  21. Thomsen, P.T.; Munksgaard, L.; Tøgersen, F.A. Evaluation of a Lameness Scoring System for Dairy Cows. J. Dairy Sci. 2008, 91, 119–126. [Google Scholar] [CrossRef] [PubMed]
  22. Sprecher, D.J.; Hostetler, D.E.; Kaneene, J.B. A Lameness Scoring System That Uses Posture and Gait to Predict Dairy Cattle Reproductive Performance. Theriogenology 1997, 47, 1179–1187. [Google Scholar] [CrossRef]
  23. Winckler, C.; Willen, S. The Reliability and Repeatability of a Lameness Scoring System for Use as an Indicator of Welfare in Dairy Cattle. Acta Agric. Scand. Sect. A-Anim. Sci. 2001, 51, 103–107. [Google Scholar] [CrossRef]
  24. Kloosterman, P. Laminitis—Prevention, Diagnosis and Treatment. WCDS Adv. Dairy Technol. 2007, 19, 157–166. [Google Scholar]
  25. Zhao, H.; Yan, L.; Hou, Z.; Lin, J.; Zhao, Y.; Ji, Z.; Wang, Y. Error Analysis Strategy for Long-Term Correlated Network Systems: Generalized Nonlinear Stochastic Processes and Dual-Layer Filtering Architecture. IEEE Internet Things J. 2025, 12, 33731–33745. [Google Scholar] [CrossRef]
  26. Chicco, D.; Jurman, G. The Advantages of the Matthews Correlation Coefficient (MCC) over F1 Score and Accuracy in Binary Classification Evaluation. BMC Genom. 2020, 21, 6. [Google Scholar] [CrossRef]
  27. Post, C.; Rietz, C.; Büscher, W.; Müller, U. Using Sensor Data to Detect Lameness and Mastitis Treatment Events in Dairy Cows: A Comparison of Classification Models. Sensors 2020, 20, 3863. [Google Scholar] [CrossRef]
  28. Dineva, K.; Atanasova, T. Health Status Classification for Cows Using Machine Learning and Data Management on AWS Cloud. Animals 2023, 13, 3254. [Google Scholar] [CrossRef] [PubMed]
  29. Moawed, S.A.; Mahrous, E.; Elaswad, A.; Gouda, H.F.; Fathy, A. Milk Yield Prediction in Friesian Cows Using Linear and Flexible Discriminant Analysis under Assumptions Violations. BMC Vet. Res. 2024, 20, 392. [Google Scholar] [CrossRef] [PubMed]
  30. Li, D.; Yan, G.; Li, F.; Lin, H.; Jiao, H.; Han, H.; Liu, W. Optimized Machine Learning Models for Predicting Core Body Temperature in Dairy Cows: Enhancing Accuracy and Interpretability for Practical Livestock Management. Animals 2024, 14, 2724. [Google Scholar] [CrossRef]
  31. Lavrova, A.I.; Choucair, A.; Palmini, A.; Stock, K.F.; Kammer, M.; Querengässer, F.; Doherr, M.G.; Müller, K.E.; Belik, V. Leveraging Accelerometer Data for Lameness Detection in Dairy Cows: A Longitudinal Study of Six Farms in Germany. Animals 2023, 13, 3681. [Google Scholar] [CrossRef]
  32. Antanaitis, R.; Džermeikaitė, K.; Krištolaitytė, J.; Ribelytė, I.; Bespalovaitė, A.; Bulvičiūtė, D.; Rutkauskas, A. Alterations in Rumination, Eating, Drinking and Locomotion Behavior in Dairy Cows Affected by Subclinical Ketosis and Subclinical Acidosis. Animals 2024, 14, 384. [Google Scholar] [CrossRef]
  33. Wu, D.; Wu, Q.; Yin, X.; Jiang, B.; Wang, H.; He, D.; Song, H. Lameness Detection of Dairy Cows Based on the YOLOv3 Deep Learning Algorithm and a Relative Step Size Characteristic Vector. Biosyst. Eng. 2020, 189, 150–163. [Google Scholar] [CrossRef]
  34. Neupane, R.; Aryal, A.; Haeussermann, A.; Hartung, E.; Pinedo, P.; Paudyal, S. Evaluating Machine Learning Algorithms to Predict Lameness in Dairy Cattle. PLoS ONE 2024, 19, e0301167. [Google Scholar] [CrossRef] [PubMed]
  35. Lemmens, L.; Schodl, K.; Fuerst-Waltl, B.; Schwarzenbacher, H.; Egger-Danner, C.; Linke, K.; Suntinger, M.; Phelan, M.; Mayerhofer, M.; Steininger, F.; et al. The Combined Use of Automated Milking System and Sensor Data to Improve Detection of Mild Lameness in Dairy Cattle. Animals 2023, 13, 1180. [Google Scholar] [CrossRef]
  36. Shen, Y.; Li, B.; Wang, Y.; Li, Q.; Zhang, Z. An Algorithm for Detecting Cow Lameness Based on Ensemble Learning of Keypoint Motion Features. J. Dairy Sci. 2025, 108, 11520–11534. [Google Scholar] [CrossRef]
  37. Melendez, P.; Gomez, V.; Bothe, H.; Rodriguez, F.; Velez, J.; Lopez, H.; Bartolome, J.; Archbald, L. Ultrasonographic Ovarian Dynamic, Plasma Progesterone, and Non-Esterified Fatty Acids in Lame Postpartum Dairy Cows. J. Vet. Sci. 2018, 19, 462–467. [Google Scholar] [CrossRef]
  38. Barbosa, A.A.; de Araújo, M.C.N.; Krusser, R.H.; Martins, C.F.; Schmitt, E.; Rabassa, V.R.; Pino, F.A.B.D.; Brauner, C.C.; Corrêa, M.N. Prepartum Lameness on Subsequent Lactation in Holstein Dairy Cows. Ciênc. Rural 2020, 50, e20190333. [Google Scholar] [CrossRef]
  39. Nozad, S.; Ramin, A.G.; Moghaddam, G.; Asri-Rezaei, S.; Kalantary, L. Monthly Evaluation of Blood Hematological, Biochemical, Mineral, and Enzyme Parameters during the Lactation Period in Holstein Dairy Cows. Comp. Clin. Pathol. 2014, 23, 275–281. [Google Scholar] [CrossRef]
  40. Praxitelous, A.; Katsoulos, P.D.; Tsaousioti, A.; Schmicke, M.; Basioura, A.; Boscos, C.M.; Tsousis, G. Metabolic Characteristics of Lame Cows During Puerperium and the Beginning of the Reproductive Period. Ruminants 2025, 5, 8. [Google Scholar] [CrossRef]
  41. He, W.; Cardoso, A.S.; Hyde, R.M.; Green, M.J.; Scurr, D.J.; Griffiths, R.L.; Randall, L.V.; Kim, D.-H. Metabolic Alterations in Dairy Cattle with Lameness Revealed by Untargeted Metabolomics of Dried Milk Spots Using Direct Infusion-Tandem Mass Spectrometry and the Triangulation of Multiple Machine Learning Models. Analyst 2022, 147, 5537–5545. [Google Scholar] [CrossRef]
  42. Sadiq, M.B.; Ramanoon, S.Z.; Mansor, R.; Mossadeq, W.M.S.; Syed-Hussain, S.S.; Yimer, N.; Kaka, U.; Ajat, M.; Abdullah, J.F.F. Potential biomarkers for lameness and claw lesions in dairy cows: A scoping review. J. Dairy Res. 2024, 91, 202–210. [Google Scholar] [CrossRef]
  43. Kass, M.; Karis, P.; Leming, R.; Haskell, M.J.; Ling, K.; Henno, M. Associations of Lameness with Milk Composition, Fatty Acid Profile, and Milk Coagulation Properties in Mid-Lactation High-Yielding Holstein Cows. Int. Dairy J. 2024, 153, 105908. [Google Scholar] [CrossRef]
  44. Džermeikaitė, K.; Krištolaitytė, J.; Anskienė, L.; Šertvytytė, G.; Lembovičiūtė, G.; Arlauskaitė, S.; Girdauskaitė, A.; Rutkauskas, A.; Baumgartner, W.; Antanaitis, R. Effects of Lameness on Milk Yield, Milk Quality Indicators, and Rumination Behaviour in Dairy Cows. Agriculture 2025, 15, 286. [Google Scholar] [CrossRef]
  45. Jukna, V.; Meškinytė, E.; Urbonavičius, G.; Bilskis, R.; Antanaitis, R.; Kajokienė, L.; Juozaitienė, V. Association of Lameness Prevalence and Severity in Early-Lactation Cows with Milk Traits, Metabolic Profile, and Dry Period. Agriculture 2024, 14, 2030. [Google Scholar] [CrossRef]
  46. Bonfatti, V.; Ho, P.N.; Pryce, J.E. Usefulness of Milk Mid-Infrared Spectroscopy for Predicting Lameness Score in Dairy Cows. J. Dairy Sci. 2020, 103, 2534–2544. [Google Scholar] [CrossRef]
  47. Antanaitis, R.; Juozaitienė, V.; Urbonavičius, G.; Malašauskienė, D.; Televičius, M.; Urbutis, M.; Džermeikaitė, K.; Baumgartner, W. Identification of Risk Factors for Lameness Detection with Help of Biosensors. Agriculture 2021, 11, 610. [Google Scholar] [CrossRef]
  48. Weigele, H.C.; Gygax, L.; Steiner, A.; Wechsler, B.; Burla, J.-B. Moderate Lameness Leads to Marked Behavioral Changes in Dairy Cows. J. Dairy Sci. 2018, 101, 2370–2382. [Google Scholar] [CrossRef] [PubMed]
  49. Thorup, V.M.; Nielsen, B.L.; Robert, P.-E.; Giger-Reverdin, S.; Konka, J.; Michie, C.; Friggens, N.C. Lameness Affects Cow Feeding But Not Rumination Behavior as Characterized from Sensor Data. Front. Vet. Sci. 2016, 3, 37. [Google Scholar] [CrossRef]
  50. Goff, J.P. The Monitoring, Prevention, and Treatment of Milk Fever and Subclinical Hypocalcemia in Dairy Cows. Vet. J. 2008, 176, 50–57. [Google Scholar] [CrossRef]
  51. Abutarbush, S.M. Veterinary Medicine—A Textbook of the Diseases of Cattle, Horses, Sheep, Pigs and Goats, 10th Edition. Can. Vet. J. 2010, 51, 541. [Google Scholar]
Figure 1. Performance metrics (Mean ± SD) of ML models for lameness detection in dairy cows. Grouped bar chart displaying sensitivity, specificity, positive predictive value (PPV), and accuracy (mean ± SD) of six classification models: Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Neural Network (NN), k-Nearest Neighbors (KNN), and an Ensemble model. Metrics were derived using Monte Carlo cross-validation based on a dataset including physiological, behavioural, blood biomarker, and milk quality traits. RF, NN, and Ensemble models showed superior and stable classification performance.
Figure 2. Comparative performance of six ML models in classifying lameness in dairy cows (Mean ± SD). Bar plot comparing the performance of Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Neural Network (NN), k-Nearest Neighbors (KNN), and Ensemble models across six classification metrics: sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and area under the curve (AUC). Values represent mean percentage scores ± standard deviation from repeated Monte Carlo cross-validations using physiological, behavioural, blood biomarker, and milk quality features for early-lactation lameness detection.
Table 1. Chemical composition of TMR for lactating dairy cows.

| TMR Component | Value |
| --- | --- |
| Dry matter | 50.7% |
| Neutral detergent fiber | 28.3% of DM |
| Acid detergent fiber | 19.8% of DM |
| Net energy for lactation | 1.6 Mcal/kg |
| Crude protein | 15.8% of DM |
| Non-fiber carbohydrates | 38.7% of DM |

TMR: total mixed ration; DM: dry matter.
Table 2. Composition of TMR for lactating dairy cows.

| TMR Component | Value |
| --- | --- |
| Mineral mix | 6.0% |
| Grain concentrate mash | 49.0% |
| Grass silage | 10.0% |
| Corn silage | 31.0% |
| Alfalfa grass hay | 4.0% |

TMR: total mixed ration.
Table 3. Summary of collected variables and laboratory analyses.

| Type of Measurement | Measured Variable | Unit | Instrument |
| --- | --- | --- | --- |
| Production parameters | Milk yield | kg/day | DeLaval milking robots |
| Milk parameters | Milk composition (fat, protein, lactose, milk fat-to-protein ratio) | % | Brolis HerdLine in-line milk analyzer (Brolis Sensor Technology, Vilnius, Lithuania) |
| Environmental parameters | THI | - | SmaXtec climate station (SmaXtec Animal Care GmbH, Graz, Austria) |
| Behavioural parameters | Rumination time | min/day | SmaXtec bolus (SmaXtec Animal Care GmbH, Graz, Austria) |
| Behavioural parameters | Activity | h/day | SmaXtec bolus |
| Behavioural parameters | Water intake | L/day | SmaXtec bolus |
| Physiological parameters | Body temperature | °C | SmaXtec bolus |
| Physiological parameters | Milk temperature | °C | Brolis HerdLine in-line milk analyzer (Brolis Sensor Technology, Vilnius, Lithuania) |
| Blood biomarkers | AST | U/L | Randox clinical chemistry (RX Daytona, Randox Laboratories Ltd., London, UK) |
| Blood biomarkers | NEFA | mmol/L | Randox clinical chemistry |
| Blood biomarkers | GGT | U/L | Randox clinical chemistry |
| Blood biomarkers | LDH | U/L | Randox clinical chemistry |
| Blood biomarkers | TP | g/L | Randox clinical chemistry |
| Blood biomarkers | TRIG | mmol/L | Randox clinical chemistry |
| Blood biomarkers | Fe | µmol/L | Randox clinical chemistry |

THI: temperature-humidity index; NEFA: non-esterified fatty acids; AST: aspartate aminotransferase activity; Fe: iron; GGT: gamma-glutamyl transferase activity; LDH: lactate dehydrogenase; TP: total protein; TRIG: triglycerides.
Table 4. Performance of classification models based on nMCC obtained through 10 Monte Carlo cross-validations for identifying lame and healthy cows during early lactation.

| Model ² | nMCC ¹ |
| --- | --- |
| RF | 0.94 |
| SVM | 0.71 |
| LR | 0.30 |
| NN | 0.90 |
| KNN | 0.78 |
| Ensemble | 0.90 |

¹ Values correspond to the normalised Matthews correlation coefficient (nMCC) obtained by the models in classifying cows with or without lameness based on physiological, behavioural, blood, and milk quality traits. The nMCC ranges from 0 (random prediction) to 1 (perfect classification) and provides a robust evaluation metric even for imbalanced data.
² RF: random forest; SVM: support vector machine; NN: neural network; LR: logistic regression; KNN: k-nearest neighbors; Ensemble: integrated model combining predictions from RF, SVM, and NN.
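As an illustration of the metric reported in Table 4, the sketch below computes the Matthews correlation coefficient from confusion-matrix counts and then rescales it. The rescaling nMCC = (MCC + 1) / 2 is an assumption here: it is one commonly used normalisation, under which 0.5 (not 0) corresponds to chance-level prediction, and the source does not state which formula the authors applied. The function names and the example counts are hypothetical. Note also that the values in Table 4 coincide with the MCC column of Table 5.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a whole row or column of the matrix is empty
        return 0.0
    return (tp * tn - fp * fn) / denom


def normalised_mcc(value: float) -> float:
    """Rescale MCC from [-1, 1] to [0, 1] (assumed definition, see lead-in)."""
    return (value + 1.0) / 2.0


# Hypothetical validation split: 20 lame and 30 healthy cows (made-up counts)
example_mcc = mcc(tp=18, tn=30, fp=0, fn=2)
example_nmcc = normalised_mcc(example_mcc)
print(f"MCC = {example_mcc:.3f}, nMCC = {example_nmcc:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative when the two classes are of unequal size, as they are here (110 lame versus 162 healthy records).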
Table 5. Performance metrics of ML models for lameness classification in dairy cows during early lactation.

| Model | Sensitivity (%) | Specificity (%) | Accuracy (%) | PPV (%) | NPV (%) | AUC (%) | MCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RF | 92.73 ± 12.06 | 100.00 ± 0.00 | 97.04 ± 4.91 | 100.00 ± 0.00 | 95.78 ± 6.85 | 99.77 ± 0.68 | 0.94 ± 0.10 |
| SVM | 73.64 ± 11.10 | 94.49 ± 5.09 | 86.04 ± 5.64 | 90.53 ± 8.28 | 84.42 ± 6.06 | 94.31 ± 5.89 | 0.71 ± 0.12 |
| LR | 52.73 ± 9.79 | 75.85 ± 8.75 | 66.53 ± 6.09 | 60.46 ± 9.01 | 70.39 ± 4.79 | 74.89 ± 6.16 | 0.30 ± 0.13 |
| NN | 94.55 ± 11.64 | 95.07 ± 6.04 | 94.84 ± 4.73 | 93.85 ± 7.54 | 96.89 ± 6.53 | 97.73 ± 5.73 | 0.90 ± 0.09 |
| KNN | 89.09 ± 12.06 | 88.38 ± 8.29 | 88.61 ± 6.27 | 84.79 ± 9.52 | 92.91 ± 7.39 | 89.39 ± 7.53 | 0.78 ± 0.13 |
| Ensemble | 94.55 ± 11.64 | 95.07 ± 6.04 | 94.84 ± 4.73 | 93.85 ± 7.54 | 96.89 ± 6.53 | 96.82 ± 4.43 | 0.90 ± 0.09 |

Values are mean ± standard deviation across 10 Monte Carlo cross-validation repetitions for six classification models: Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Neural Network (NN), k-Nearest Neighbors (KNN), and an Ensemble model. Metrics include sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), area under the curve (AUC), and Matthews correlation coefficient (MCC). Classification was based on physiological, behavioural, blood biomarker, and milk quality traits to distinguish lame cows (class label 1) from healthy cows (class label 0).
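The threshold-based metrics in Table 5 can be sketched with the standard library alone. The block below is a minimal illustration, not the authors' script: it shows how repeated random hold-out splits (Monte Carlo cross-validation) can be generated and how per-repeat metrics are summarised as mean ± SD. The validation fraction of 0.2, the seed, all function names, and the confusion counts in the usage lines are assumptions chosen for illustration.

```python
import random
import statistics


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = lame, 0 = healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def diagnostic_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy, PPV and NPV in percent."""
    def pct(num, den):
        return 100.0 * num / den if den else float("nan")
    return {
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "accuracy": pct(tp + tn, tp + tn + fp + fn),
        "ppv": pct(tp, tp + fp),
        "npv": pct(tn, tn + fn),
    }


def monte_carlo_splits(n_samples, n_repeats=10, val_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx) for repeated random hold-out splits."""
    rng = random.Random(seed)
    n_val = max(1, round(n_samples * val_fraction))
    for _ in range(n_repeats):
        shuffled = list(range(n_samples))
        rng.shuffle(shuffled)
        yield shuffled[n_val:], shuffled[:n_val]


def summarise(per_repeat):
    """Mean and sample SD of each metric across repeats."""
    return {
        key: (statistics.mean([m[key] for m in per_repeat]),
              statistics.stdev([m[key] for m in per_repeat]))
        for key in per_repeat[0]
    }


# Usage with two hypothetical repeats (made-up confusion counts)
runs = [diagnostic_metrics(18, 30, 0, 2), diagnostic_metrics(20, 29, 1, 0)]
summary = summarise(runs)
splits = list(monte_carlo_splits(272, n_repeats=10))
```

Unlike k-fold cross-validation, the validation sets drawn this way may overlap between repeats, which is why the number of repetitions and the hold-out fraction are chosen independently.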
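Table 6. Descriptive statistics of physiological, behavioural, and biochemical parameters in healthy and lame dairy cows during early lactation.

| Trait | Cow group | N records | Mean | Std. deviation | Std. error | 95% CI lower bound | 95% CI upper bound | Minimum | Maximum | p-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| THI | Healthy | 162 | 61.05 | 3.17 | 0.25 | 60.56 | 61.54 | 56.69 | 64.53 | 0.722 |
| THI | Lame | 110 | 61.19 | 3.26 | 0.31 | 60.57 | 61.81 | 56.69 | 64.53 | |
| Water intake (L/day) | Healthy | 162 | 145.06 | 29.92 | 2.35 | 140.42 | 149.70 | 74.93 | 225.72 | <0.001 |
| Water intake (L/day) | Lame | 110 | 127.35 | 29.77 | 2.84 | 121.73 | 132.98 | 72.80 | 179.19 | |
| Cow activity (h/day) | Healthy | 162 | 7.32 | 3.36 | 0.26 | 6.80 | 7.84 | 2.31 | 20.01 | 0.534 |
| Cow activity (h/day) | Lame | 110 | 7.06 | 3.31 | 0.32 | 6.43 | 7.69 | 1.31 | 15.92 | |
| Body temperature (°C) | Healthy | 162 | 39.43 | 0.13 | 0.01 | 39.41 | 39.45 | 39.14 | 39.72 | <0.001 |
| Body temperature (°C) | Lame | 110 | 39.49 | 0.15 | 0.01 | 39.46 | 39.52 | 39.16 | 40.00 | |
| Rumination time (min/day) | Healthy | 162 | 493.93 | 66.39 | 5.22 | 483.63 | 504.23 | 327.62 | 610.13 | 0.461 |
| Rumination time (min/day) | Lame | 110 | 488.14 | 59.08 | 5.63 | 476.97 | 499.30 | 329.43 | 575.43 | |
| AST (U/L) | Healthy | 162 | 95.75 | 32.92 | 2.59 | 90.65 | 100.86 | 39.70 | 218.60 | 0.001 |
| AST (U/L) | Lame | 110 | 84.37 | 21.01 | 2.00 | 80.40 | 88.34 | 48.60 | 144.20 | |
| Fe (µmol/L) | Healthy | 162 | 22.74 | 5.52 | 0.43 | 21.88 | 23.60 | 12.40 | 39.40 | 0.076 |
| Fe (µmol/L) | Lame | 110 | 21.43 | 6.54 | 0.62 | 20.19 | 22.67 | 6.90 | 36.60 | |
| GGT (U/L) | Healthy | 162 | 38.00 | 14.23 | 1.12 | 35.79 | 40.21 | 12.50 | 77.50 | <0.001 |
| GGT (U/L) | Lame | 110 | 32.19 | 8.90 | 0.85 | 30.51 | 33.87 | 15.90 | 49.90 | |
| LDH (U/L) | Healthy | 162 | 1393.41 | 277.32 | 21.79 | 1350.38 | 1436.44 | 911.00 | 2471.00 | 0.001 |
| LDH (U/L) | Lame | 110 | 1284.51 | 257.48 | 24.55 | 1235.85 | 1333.17 | 935.00 | 1967.00 | |
| NEFA (mmol/L) | Healthy | 162 | 0.09 | 0.08 | 0.01 | 0.07 | 0.10 | 0.02 | 0.42 | <0.001 |
| NEFA (mmol/L) | Lame | 110 | 0.14 | 0.14 | 0.01 | 0.11 | 0.16 | 0.02 | 0.58 | |
| TP (g/L) | Healthy | 162 | 80.52 | 6.76 | 0.53 | 79.47 | 81.57 | 68.90 | 100.40 | 0.397 |
| TP (g/L) | Lame | 110 | 81.26 | 7.43 | 0.71 | 79.85 | 82.66 | 62.70 | 101.10 | |
| TRIG (mmol/L) | Healthy | 162 | 0.09 | 0.02 | 0.00 | 0.08 | 0.09 | 0.00 | 0.14 | 0.294 |
| TRIG (mmol/L) | Lame | 110 | 0.09 | 0.03 | 0.00 | 0.09 | 0.10 | 0.05 | 0.15 | |
| Milk yield (kg) | Healthy | 162 | 36.44 | 8.88 | 0.70 | 35.06 | 37.82 | 15.73 | 56.57 | 0.278 |
| Milk yield (kg) | Lame | 110 | 37.68 | 9.69 | 0.92 | 35.85 | 39.51 | 20.40 | 60.00 | |
| Milk temperature (°C) | Healthy | 162 | 36.42 | 0.59 | 0.05 | 36.33 | 36.51 | 34.90 | 37.43 | 0.880 |
| Milk temperature (°C) | Lame | 110 | 36.43 | 0.88 | 0.08 | 36.26 | 36.60 | 33.82 | 37.82 | |
| Milk fat (%) | Healthy | 162 | 4.03 | 0.52 | 0.04 | 3.95 | 4.11 | 2.78 | 5.18 | 0.148 |
| Milk fat (%) | Lame | 110 | 3.91 | 0.83 | 0.08 | 3.75 | 4.06 | 1.77 | 6.88 | |
| Milk protein (%) | Healthy | 162 | 3.35 | 0.27 | 0.02 | 3.31 | 3.39 | 2.75 | 3.99 | <0.001 |
| Milk protein (%) | Lame | 110 | 3.21 | 0.38 | 0.04 | 3.14 | 3.28 | 1.51 | 3.84 | |
| Milk fat-to-protein ratio | Healthy | 162 | 1.20 | 0.14 | 0.01 | 1.18 | 1.22 | 0.80 | 1.55 | 0.431 |
| Milk fat-to-protein ratio | Lame | 110 | 1.22 | 0.23 | 0.02 | 1.18 | 1.26 | 0.62 | 1.96 | |
| Milk lactose (%) | Healthy | 162 | 4.84 | 0.18 | 0.01 | 4.82 | 4.87 | 4.43 | 5.19 | 0.003 |
| Milk lactose (%) | Lame | 110 | 4.71 | 0.51 | 0.05 | 4.62 | 4.81 | 2.17 | 5.08 | |

The p-value for each trait refers to the comparison between the healthy and lame groups. THI: temperature-humidity index; NEFA: non-esterified fatty acids; AST: aspartate aminotransferase activity; Fe: iron; GGT: gamma-glutamyl transferase activity; LDH: lactate dehydrogenase; TP: total protein; TRIG: triglycerides; CI: confidence interval. Units for blood biomarkers follow Table 3.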
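The standard errors in Table 6 follow directly from the reported standard deviations and group sizes (SE = SD / √n), and the group comparison can be approximated from the same summary statistics. The sketch below does this for water intake. The choice of Welch's unequal-variance t statistic is an assumption made for illustration, since this section does not state which test produced the reported p-values; the function names are hypothetical.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean from a sample SD and sample size."""
    return sd / math.sqrt(n)


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = sd_a ** 2 / n_a  # squared standard error, group A
    var_b = sd_b ** 2 / n_b  # squared standard error, group B
    t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Water intake (L/day), values taken from Table 6
se_healthy = standard_error(29.92, 162)  # reported as 2.35
se_lame = standard_error(29.77, 110)     # reported as 2.84
t_stat, df = welch_t(145.06, 29.92, 162, 127.35, 29.77, 110)
print(f"SE healthy = {se_healthy:.2f}, SE lame = {se_lame:.2f}")
print(f"Welch t = {t_stat:.2f} on about {df:.0f} degrees of freedom")
```

A t statistic of roughly 4.8 on about 235 degrees of freedom is consistent with the reported p < 0.001 for water intake, although the exact p-value depends on the test the authors actually used.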
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
