Article

Automated Body Condition Scoring in Dairy Cows Using 2D Imaging and Deep Learning

Reagan Lewis, Teun Kostermans, Jan Wilhelm Brovold, Talha Laique and Marko Ocepek

1 Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432 Ås, Norway
2 Department of Animal Sciences, Egerton University, P.O. Box 536-20115, Njoro 20115, Kenya
3 Department of Animal Production, University of Juba, Juba P.O. Box 82, Central Equatoria, South Sudan
4 Department of Animal Sciences, Wageningen University, P.O. Box 338, 6700 AH Wageningen, The Netherlands
* Author to whom correspondence should be addressed.
AgriEngineering 2025, 7(7), 241; https://doi.org/10.3390/agriengineering7070241
Submission received: 24 May 2025 / Revised: 4 July 2025 / Accepted: 11 July 2025 / Published: 18 July 2025
(This article belongs to the Special Issue Precision Farming Technologies for Monitoring Livestock and Poultry)

Abstract

Accurate body condition score (BCS) monitoring in dairy cows is essential for optimizing health, productivity, and welfare. Traditional manual scoring methods are labor-intensive and subjective, driving interest in automated imaging-based systems. This study evaluated the effectiveness of 2D imaging and deep learning for BCS classification using three camera perspectives—front, back, and top-down—to identify the most reliable viewpoint. The research involved 56 Norwegian Red milking cows at the Centre for Livestock Experiments (SHF) of Norges miljø- og biovitenskapelige universitet (NMBU) in Norway. Images were classified into BCS categories of 2.5, 3.0, and 3.5 using a YOLOv8 model. The back view achieved the highest overall classification performance (mAP@0.5 = 0.439), confirming that key morphological features for BCS assessment are best captured from this angle. Challenges included misclassification due to overlapping features, especially in Class 2.5 and background data. The study recommends improvements in algorithmic feature extraction, dataset expansion, and multi-view integration to enhance accuracy. Integration with precision farming tools enables continuous monitoring and early detection of health issues. This research highlights the potential of 2D imaging as a cost-effective alternative to 3D systems, particularly for small and medium-sized farms, supporting more effective herd management and improved animal welfare.

1. Introduction

Body condition scoring (BCS) is a crucial tool in dairy cow management, providing valuable insights into an animal’s nutritional status, health, and productivity [1]. BCS is widely recognized as an essential welfare indicator, influencing reproductive efficiency, milk yield, and disease susceptibility [2]. Traditional BCS assessments rely on manual scoring systems, such as the Welfare Quality® protocol, which are subjective and prone to inconsistency among evaluators [3]. These limitations have driven the development of automated imaging technologies, which enable more objective, reliable, and continuous assessments of dairy cow body condition [4].
Advanced BCS monitoring systems utilizing 3D imaging have demonstrated high accuracy and precision by capturing depth-related body shape features in dairy cows [5]. These systems provide detailed morphological data, allowing for consistent evaluation of body reserves and early detection of metabolic imbalances [6]. However, the widespread adoption of 3D imaging is limited due to high costs, complex installations, and maintenance requirements, making it less accessible to small and medium-sized farms [7]. As a result, alternative 2D imaging solutions have emerged, offering a more cost-effective and scalable approach to BCS automation.
Recent advancements in machine learning and computer vision have greatly improved the accuracy of 2D imaging-based BCS systems [7]. Although these systems do not capture depth information like 3D cameras, they utilize high-resolution images and deep learning algorithms to identify key morphological patterns linked to body condition [8]. Two-dimensional cameras are easier to integrate into existing farm infrastructure, cost less to implement, and can provide real-time, 24/7 monitoring [9]. Additionally, they can serve multiple functions, such as behavioral analysis and early detection of health issues, helping to improve farm management efficiency [10].
A significant challenge for smaller dairy farms is their limited access to costly robotic milking systems, which often feature integrated 3D imaging solutions [11]. For farms that do not use automated milking robots, standalone 2D imaging systems could be a practical alternative. These systems can be placed in key locations within barns, such as feeding stations, weighing platforms, or milking parlors, enabling farmers to continuously monitor BCS without manual assessment [11]. The potential for real-time decision-making based on automated BCS evaluations could help optimize nutritional strategies, reproductive planning, and health interventions, making dairy farming more efficient and sustainable [12].
This study aims to assess the potential of 2D imaging for BCS classification by identifying the most effective imaging angles through automated image analysis. The hypothesis is that 2D imaging, combined with deep learning methods, can offer an accurate, cost-efficient, and scalable alternative to manual BCS evaluation and costly 3D systems. This research concentrates on analyzing three camera views (front, rear, and top-down) to evaluate their effectiveness in classifying BCS. It also investigates how image preprocessing, feature extraction, and dataset augmentation techniques influence classification performance.
Recent research indicates that deep learning-based imaging can facilitate cost-effective automation of BCS scoring; however, there is still limited agreement on the best camera positioning or system design for real-time on-farm use. This study addresses these gaps by determining which of the three camera perspectives (front, back, and top-down) yields the most accurate YOLOv8-based classification for scalable, affordable deployment, and by evaluating how image preprocessing and augmentation methods affect model performance.

2. Materials and Methods

2.1. Data Collection

2.1.1. Study Location and Duration

The study was conducted at the Animal Research Centre of the Norwegian University of Life Sciences (NMBU) in Ås, Norway, in accordance with legal requirements for the keeping of dairy cattle (Ministry of Agriculture and Food, 2007) [13]. Since standard cattle-rearing practices were followed and no potentially harmful procedures were performed, no separate permission for research involving animals was required (Ministry of Agriculture and Food, 2015) [14]. The research took place over a two-week period from 5 November to 19 November 2024, at a dairy cattle farm equipped with an Automated Milking System (AMS). The study involved 56 Norwegian Red milking cows.

2.1.2. Subjects and Data Acquisition

  • Cows’ Access to AMS: Cows voluntarily accessed the AMS between one and four times per day, where various physiological and behavioral parameters were recorded.
  • BCS Reference Standard: A 3D camera system installed above the AMS served as the reference standard for BCS evaluation.
  • Additional Data Recorded: The AMS automatically logs cow identification (ID), milking session timestamps, and body weights, all organized into a structured Excel database for further analysis.

2.2. Digital Tools and Camera Setup

2.2.1. Camera Specifications

  • Two-Dimensional Cameras: Three Foscam G4EP PoE 4MP cameras, each equipped with 128 GB SD cards for motion-triggered image capture.
  • Three-Dimensional Camera: Installed above the AMS, capturing depth-related morphological features of cows during milking.
  • Data Storage and Processing: Cameras were connected to an Ethernet network, and recording schedules were configured via the Foscam mobile application.
The camera system was installed in the AMS lane of the dairy barn, with three fixed RGB cameras strategically positioned to capture top, back, and front views of each cow as it moved through the milking stall. As shown in Figure 1, the top-view camera was mounted directly above the stall’s central lane on the overhead metal frame, providing a downward angle focused on the cow’s spine and loin area. The back-view camera was set up behind the cow’s exit path. The front-view camera was placed near the stall entry gate, facing the cow’s head and chest to capture frontal features important for body condition scoring, such as the brisket and shoulders. All cameras were secured using mounting brackets and aligned to reduce occlusion while maintaining a stable view during image collection.
Recent studies have identified 3D imaging systems as the gold standard for body condition scoring and morphological assessment in livestock because of their ability to accurately capture depth and contour information [16]. However, these systems are expensive and complicated to implement on commercial farms, which has encouraged research into 2D-based alternatives.

2.2.2. Camera Placement

To capture comprehensive BCS data, both 3D and 2D cameras were strategically positioned within the AMS and the weighing station after milking.
  • 3D Camera Placement
    Mounted directly above the AMS milking unit to provide a top-down view of cows for depth-based morphological analysis.
    This 3D-based BCS evaluation served as the gold standard for validation.
  • 2D Camera Placement
    Front View Camera—Positioned at the entrance of the weighing station, capturing the head and shoulder regions.
    Rear View Camera—Mounted behind the weighing scale, providing a backward perspective of the cow’s hindquarters.
    Top-Down Camera—Mounted above the weighing platform, providing a bird’s-eye view of the cow’s topline and body structure.
All captured frames from 2D cameras were systematically linked to cow ID and BCS data, ensuring a high-quality dataset for training and validating the YOLOv8-based BCS classification model.
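For orientation, this kind of frame-to-record linkage can be expressed as a time-based join. The sketch below is illustrative only and assumes hypothetical file names and column labels (a frames.csv index with cow IDs and capture times, and an AMS Excel export with milking timestamps and BCS values); in this study, the link was established via screenshots taken while each cow stood on the weighing scale.

```python
import pandas as pd

# Hypothetical inputs: a frame index exported from the 2D cameras and the AMS Excel log.
frames = pd.read_csv("frames.csv", parse_dates=["captured_at"])      # cow_id, captured_at, image_path
ams = pd.read_excel("ams_export.xlsx", parse_dates=["milking_end"])  # cow_id, milking_end, bcs, body_weight

# merge_asof requires both tables to be sorted on the time key.
frames = frames.sort_values("captured_at")
ams = ams.sort_values("milking_end")

# Attach the nearest AMS record (within 15 min) of the same cow to each frame.
linked = pd.merge_asof(
    frames,
    ams,
    left_on="captured_at",
    right_on="milking_end",
    by="cow_id",
    direction="nearest",
    tolerance=pd.Timedelta("15min"),
)

# Discard frames that could not be matched to a milking session.
linked = linked.dropna(subset=["bcs"])
linked.to_csv("frames_with_bcs.csv", index=False)
```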

2.3. Image Data Processing and Annotation

Image Collection and Preprocessing
  1. Data Collection: Video footage and snapshots were collected daily from 8:00 a.m. to 8:00 p.m., with images taken whenever movement was detected.
  2. Frame Extraction: At the end of each milking session, a screenshot was taken of each cow standing on the weighing scale to ensure a direct link to BCS and ID records.
  3. Image Sorting: Screenshots were organized by cow ID and BCS score.
  4. BCS Score Adjustment: Scores were rounded to the nearest half or whole grade (e.g., BCS 1.7 → 1.5; BCS 2.8 → 3.0); a minimal helper for this rounding is sketched after this list.
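As a minimal illustration of step 4, the rounding rule can be written as a one-line helper (the function name is ours, not from the study):

```python
def round_bcs(score: float) -> float:
    """Round a raw BCS value to the nearest half grade (e.g., 1.7 -> 1.5, 2.8 -> 3.0)."""
    return round(score * 2) / 2


assert round_bcs(1.7) == 1.5
assert round_bcs(2.8) == 3.0
```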
Dataset Annotation and Preparation
  1. Manual Annotation: The dataset was uploaded to Roboflow and annotated with its SAM annotation tool; in some cases, the manual polygon tool was used to assign object detection labels to the front, rear, and top views.
  2. Dataset Splitting: After annotation, the dataset was divided into three sets:
     70% training data, used to train the YOLOv8 model;
     20% validation data, used for hyperparameter tuning and overfitting prevention;
     10% testing data, used for final model evaluation.
The initial dataset included images across five BCS categories: 1.5, 2.0, 2.5, 3.0, and 3.5. However, classes 1.5 and 2.0 were severely underrepresented, with fewer than 30 usable annotated images combined due to the low prevalence of cows with these BCS values during the data collection period. To avoid training bias and class imbalance issues during model development, these minority classes were excluded from the final classification task. This decision was made to ensure model stability and to prevent overfitting to rare classes with insufficient feature diversity. As a result, the final model was trained on three balanced BCS classes (2.5, 3.0, 3.5), for which we ensured adequate image representation across all views.

2.4. Object Detection Model

2.4.1. Model Architecture and Justification

For automated BCS classification, we used YOLOv8, a cutting-edge object detection model developed by Ultralytics. YOLOv8 has three main components.
  • Backbone: Extracts hierarchical image features using convolutional layers, C2f blocks, and spatial pyramid pooling.
  • Neck: Fuses multi-scale feature maps, optimizing object detection across different perspectives (rear, front, and top).
  • Head: Generates final predictions, including bounding-box coordinates, class labels, and confidence scores.
YOLOv8 was selected for its high detection accuracy, real-time inference capabilities, and efficient feature extraction, making it well suited for multi-view BCS assessment in dairy cows.
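As a point of reference, the snippet below shows how a pretrained YOLOv8 checkpoint is loaded and applied to a single frame with the Ultralytics API; the checkpoint size, image path, and confidence threshold are placeholders rather than the settings used in this study.

```python
from ultralytics import YOLO

# Load a small pretrained checkpoint; the backbone, neck, and head described above are built internally.
model = YOLO("yolov8n.pt")

# Run detection on one frame (placeholder path). Each result carries boxes, class indices, and confidences.
results = model.predict("example_cow_frame.jpg", imgsz=640, conf=0.25)

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])         # predicted class index
        confidence = float(box.conf[0])  # confidence score
        xyxy = box.xyxy[0].tolist()      # bounding-box coordinates
        print(model.names[cls_id], round(confidence, 3), xyxy)
```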

2.4.2. Data Preprocessing and Augmentation

A total of 1764 annotated images were collected for model development. These images represented three body condition score (BCS) classes: 521 images labeled as BCS 2.5, 648 as BCS 3.0, and 595 as BCS 3.5. Regarding imaging perspectives, the dataset included 588 back-view, 593 front-view, and 583 top-view images. A stratified split divided the dataset into training (1234 images), validation (353 images), and testing (177 images) sets, maintaining class balance across views and BCS classes.
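A stratified 70/20/10 partition of this kind can be reproduced with scikit-learn; the sketch below assumes a hypothetical list of (image_path, bcs_class, view) records assembled during annotation and stratifies on the combined class/view label so that both factors remain balanced.

```python
from sklearn.model_selection import train_test_split

# records: hypothetical list of (image_path, bcs_class, view) tuples for all 1764 annotated images.
strata = [f"{bcs}_{view}" for _, bcs, view in records]

# First carve off 70% for training, then split the remainder 2:1 into validation and test,
# approximating the 1234/353/177 partition reported above.
train_set, rest, _, rest_strata = train_test_split(
    records, strata, train_size=0.70, stratify=strata, random_state=42
)
val_set, test_set = train_test_split(
    rest, train_size=2 / 3, stratify=rest_strata, random_state=42
)

print(len(train_set), len(val_set), len(test_set))
```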
To improve model generalization, the following image augmentation techniques were applied:
  • Auto-orientation (standardized image alignment).
  • Resize (all images resized to 640 × 640 pixels).
  • Contrast and Brightness Normalization (to enhance feature visibility).
  • Data Augmentation
    Cropping: 0–15% random zoom.
    Hue and Saturation Adjustments: Random variations (−6 to +6).
    Brightness: Randomly adjusted (−15% to +15%).
    Blur and Noise: Minor Gaussian blur and noise applied.
To address class imbalances, underrepresented BCS categories (BCS 1.5 and 2.0) were removed, leaving three final classes.
  • BCS 2.5 → Class 25
  • BCS 3.0 → Class 30
  • BCS 3.5 → Class 35
The image annotation was performed using Roboflow, a web-based platform for annotation and dataset management, which allowed manual labeling of bounding boxes for each image. The labeled dataset was exported in a format compatible with YOLOv8. Image preprocessing and augmentation were performed with the Albumentations library (version 1.3.1), offering a comprehensive set of computer vision transformations. Preprocessing included resizing all images to 640 × 640 pixels, normalizing pixel values based on ImageNet mean and standard deviation, and maintaining consistent aspect ratios. During training, augmentation techniques such as horizontal flipping, random rotations up to ±15°, brightness and contrast adjustments, CLAHE (Contrast Limited Adaptive Histogram Equalization), and random cropping were applied on the fly using the Ultralytics YOLOv8 training script (version 8.0.104), which is fully integrated with a PyTorch-based (PyTorch version 1.13.1) training environment.
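The preprocessing and on-the-fly augmentation described above can be approximated with an Albumentations pipeline such as the following sketch; the probabilities and exact parameter values are illustrative and not taken from the study's configuration.

```python
import albumentations as A

# Training-time pipeline: resize, light geometric and photometric augmentation, ImageNet normalization.
train_transform = A.Compose(
    [
        A.Resize(640, 640),
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=15, p=0.5),  # random rotations up to ±15°
        A.RandomBrightnessContrast(brightness_limit=0.15, contrast_limit=0.15, p=0.5),
        A.CLAHE(p=0.3),             # contrast-limited adaptive histogram equalization
        A.RandomSizedBBoxSafeCrop(height=640, width=640, p=0.3),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage (hypothetical variables): augmented = train_transform(image=img, bboxes=yolo_boxes, class_labels=labels)
```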
YOLOv8 was chosen for this study because of its balance of speed, accuracy, and deployability. It is a state-of-the-art one-stage object detection architecture that improves on YOLOv5 by using a decoupled head, anchor-free design, and enhanced feature extraction with only a slight increase in model size. These features make it especially suitable for deployment in real-time livestock monitoring systems where latency and resource constraints are critical.
While heavier models like Faster R-CNN or RetinaNet can achieve higher accuracy on controlled benchmarks, they usually require more memory, have slower inference times, and are less practical for edge-based systems. Conversely, ultra-light models such as YOLO-Nano or MobileNet-SSD provide faster speeds but sacrifice some detection accuracy—especially in complex farm environments with overlapping features and lighting variation.
In this study, the priority was to identify a model that could provide adequate classification performance (mAP > 0.40) while remaining computationally feasible for eventual on-site or mobile deployment. YOLOv8 achieves this balance effectively, which is why it was selected. However, future work will explore benchmarking against lighter models (e.g., YOLOv5n, YOLOv7-tiny, MobileNet-SSD) and heavier ones (e.g., EfficientDet, Faster R-CNN) to further quantify trade-offs in BCS classification tasks.

2.4.3. Model Training and Optimization

Training was conducted on Google Colab, utilizing an NVIDIA Tesla T4 GPU (16 GB RAM).
  • Pretrained Weights: Transfer learning was applied using MS COCO pretrained weights.
  • Optimizer: Adam optimizer with learning rate = 0.001.
  • Training Setup
    Epochs: 50
    Batch size: 16
    Loss functions:
    Bounding box (BBox) loss.
    Classification loss.
    Distribution Focal Loss (DFL).
To assess model stability and mitigate the impact of random initialization, we conducted three independent training runs of the YOLOv8 model using identical hyperparameters (50 epochs, batch size 16, and Adam optimizer) and randomized weight initialization. Each run used the same data splits (70% training, 20% validation, 10% testing). The results were consistent across runs, with standard deviation in mAP@0.5 values <0.015 across all views and classes. While formal k-fold cross-validation was not conducted due to computational limitations, the consistent performance across trials supports the robustness of our findings.
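Under these settings, a single training run corresponds roughly to the Ultralytics call sketched below; the checkpoint size, dataset YAML path, and seed value are placeholders, and in practice only the random seed differed between the three independent runs.

```python
from ultralytics import YOLO

# Transfer learning from MS COCO pretrained weights (checkpoint size is a placeholder).
model = YOLO("yolov8n.pt")

results = model.train(
    data="bcs_dataset.yaml",  # placeholder path to the Roboflow-exported dataset configuration
    epochs=50,
    batch=16,
    imgsz=640,
    optimizer="Adam",
    lr0=0.001,
    seed=1,                   # varied between the independent runs
)
```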

2.4.4. Model Evaluation

The trained model’s performance was evaluated using the following:
  • Mean Average Precision (mAP@0.5 and mAP@0.5–0.95).
  • Precision/Recall Curves.
  • Confusion Matrices (misclassification analysis).
  • Confidence Threshold Optimization (for front, rear, and top views).
After training, the model was deployed and tested on an Intel Core i7-12700H (32 GB RAM) for real-time performance validation. The training and validation loss plots of the models can be seen in Figure 2, Figure 3 and Figure 4 below.
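For completeness, these metrics can be recomputed from a trained checkpoint with the Ultralytics validation interface; the checkpoint and dataset paths below are placeholders.

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # placeholder path to a trained checkpoint

# Evaluate on the held-out split defined in the dataset YAML.
metrics = model.val(data="bcs_dataset.yaml", split="test", iou=0.5)

print("mAP@0.5      :", metrics.box.map50)
print("mAP@0.5:0.95 :", metrics.box.map)
print("per-class AP :", metrics.box.maps)  # one value per class (25, 30, 35)

# In recent Ultralytics releases, the confusion matrix is also exposed on the metrics object:
print(metrics.confusion_matrix.matrix)
```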

2.5. Data Analysis

A total of 1764 labeled images were used in this study, distributed across the three BCS classes (2.5, 3.0, and 3.5). The dataset was split into training (70%), validation (20%), and test (10%) sets using stratified sampling to maintain class balance. This resulted in 1234 images for training, 353 images for validation, and 177 images for testing. These partitions were fixed throughout all experiments to ensure consistent evaluation of model performance.
The trained model’s performance was evaluated using three key assessment methods. First, precision/recall curves were employed to assess classification accuracy across BCS categories (2.5, 3.0, 3.5), offering insight into the balance between precision, which measures the correctness of positive predictions, and recall, which reflects the detection rate. These curves were calculated separately for each camera view, including the front, top, and back perspectives.
Next, confusion matrices were analyzed to assess misclassification rates and error patterns, helping to identify systematic misclassification trends. Overlapping feature representations between BCS classes were examined to determine where the model had difficulty distinguishing between categories.
Finally, Confidence Threshold Optimization was used to find the best probability cutoffs for classification performance. Thresholds were optimized separately for the front, top, and back views to enhance detection accuracy and provide more precise predictions.
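The threshold optimization can be illustrated by a simple sweep over candidate cutoffs; the sketch below assumes hypothetical per-detection arrays of confidence scores and correctness flags collected from the validation predictions of one camera view.

```python
import numpy as np

def precision_at_thresholds(confidences: np.ndarray, is_correct: np.ndarray, thresholds=None):
    """Precision of the detections retained at each candidate confidence cutoff."""
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)
    results = []
    for t in thresholds:
        kept = confidences >= t
        precision = is_correct[kept].mean() if kept.any() else float("nan")
        results.append((float(t), float(precision)))
    return results

# Hypothetical usage: find the lowest threshold at which precision reaches 1.0 for a given view.
# confidences, is_correct = ...  # gathered from validation predictions
# for t, p in precision_at_thresholds(confidences, is_correct):
#     print(f"threshold={t:.2f}  precision={p:.3f}")
```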
Model training was performed on Google Colab using a GPU-accelerated environment with an NVIDIA Tesla T4 GPU and 16 GB RAM. After training, the model was tested on a local machine equipped with an Intel Core i7-12700H processor and 32 GB RAM, where CPU-based inference validation was conducted to evaluate computational performance in a real-world application.
The final annotated dataset included a total of 705 images across the three BCS classes after preprocessing and class adjustment: 210 images for BCS 2.5, 315 for BCS 3.0, and 180 for BCS 3.5. These were distributed approximately 35% from the front view, 40% from the rear view, and 25% from the top view. The confusion matrices and precision/recall (PR) curves were generated from a test set consisting of 10% of the entire dataset, maintaining proportional class representation. Apparent discrepancies between the confusion matrices and PR curves are due to differences in evaluation methods: the PR curves evaluate performance across multiple thresholds, while the confusion matrices show performance at a fixed confidence threshold (0.5). To improve clarity, we have included a supplementary table summarizing the number of images per class and dataset split (training, validation, test). This ensures complete traceability of classification performance across all BCS categories.
The overview of the research design includes (a) placement of 2D cameras around the AMS weighing station; (b) an example of cow ID and BCS tracking in an Excel spreadsheet; (c) the object detection annotation process in Roboflow, as shown in Figure 5; (d) sample annotated images from the front, top, and back views; (e) the dataset split and augmentation process; (f) YOLOv8 model training in Google Colab, as depicted in Figure 6; (g) a comparison of precision/recall curves for the front, rear, and top views; and (h) a confusion matrix displaying misclassification patterns, as shown in Figure 9.

3. Results

3.1. Model Performance Based on Precision/Recall Curves

The precision/recall (PR) curve was used to evaluate the classification performance of the model across three different camera views, including front, top, and back.
The class-wise Average Precision (AP) was calculated for three body condition score (BCS) categories (2.5, 3.0, and 3.5), along with the mean Average Precision (mAP@0.5) across all classes.
For the front view, Class 2.5 achieved an AP of 0.519, Class 3.0 reached an AP of 0.518, and Class 3.5 had an AP of 0.260. This resulted in an overall mAP@0.5 of 0.432.
For the top view, Class 2.5 obtained an AP of 0.402, Class 3.0 an AP of 0.490, and Class 3.5 an AP of 0.372. The overall mAP@0.5 for this view was 0.421.
For the back view, Class 2.5 recorded an AP of 0.428, Class 3.0 an AP of 0.483, and Class 3.5 an AP of 0.439. This view led to the highest overall mAP@0.5 of 0.439, as shown in Figure 8.
The model achieved a mean Average Precision (mAP@0.5:0.95) of 0.41 on the validation set after 50 epochs. Class-specific mAPs were 0.35 for BCS 2.5, 0.47 for BCS 3.0, and 0.42 for BCS 3.5. These values show that the model generalizes reasonably well across all classes, with slightly higher precision for mid-range BCS scores.
To quantitatively support our conclusion that the top view performed the worst, we analyzed per-class precision and recall across all views (Table 1). The top view had the lowest precision and recall scores for all classes, with Class 3.0 achieving a recall of only 0.03 and Class 2.5 a recall of 0.15. Similarly, Class 2.5 showed the lowest classification performance across all camera views, with high misclassification rates and markedly lower recall (e.g., 0.15 in the top view, 0.50 in the front view, and 0.52 in the back view). These metrics confirm the textual statements and highlight Class 2.5 as the most difficult class to classify reliably due to feature overlap with the background and neighboring BCS classes.

3.2. Precision/Confidence Analysis

The precision/confidence curves were created to evaluate the model’s confidence at various probability thresholds for classification accuracy.
For the front view, the model achieved 100% precision at a confidence threshold of 0.415. Class 2.5 showed consistent precision growth, while Classes 3.0 and 3.5 experienced fluctuations in confidence scores (Figure 8).
In the top view, the optimal confidence threshold was 0.276, where precision achieved 100%. Classes 2.5 and 3.0 showed moderate improvements in precision, while Class 3.5 displayed greater variability.
In the back view, the model achieved 100% precision at a confidence threshold of 0.512. Classes 2.5 and 3.0 showed consistent improvements, but Class 3.5 experienced significant fluctuations in detection accuracy (Figure 8).

3.3. Classification Performance Based on Confusion Matrices

In Figure 9, the confusion matrices show classification accuracy and misclassification patterns across various BCS categories and camera views.
For the front view, Class 3.0 was accurately classified in 64 cases, while Class 3.5 was correctly identified in 57 cases. However, Class 2.5 had only 10 correct predictions, with 25 instances misclassified as background. Background data was often misclassified as Class 3.0, with 64 occurrences, indicating significant overlap in feature representation (Figure 9).
For the top view, a strong classification bias was observed, with nearly all predictions assigned to the background class. Class 2.5 was misclassified as background in 29 instances, Class 3.0 in 35 instances, and Class 3.5 in 25 instances. Only one correct prediction was recorded for Class 3.0, indicating poor classification performance in this view (Figure 9).
For the back view, Class 3.0 achieved 29 correct classifications. However, Class 2.5 was misclassified as Class 3.0 in 18 instances, and background data was frequently misclassified as Class 3.0 in 39 instances. Class 3.5 had just 13 correct predictions, highlighting significant misclassification across categories.
Table 2 shows test set performance for each BCS class and view, including sample counts, correct predictions, misclassifications, precision, and recall. For example, while Class 2.5 in the back view had moderate average precision (AP = 0.428) on the PR curve, it also had a high number of misclassifications due to feature overlap with background and Class 3.0, especially at lower thresholds. This highlights the importance of interpreting PR and confusion matrix metrics together. The back view achieved higher overall mAP but still faced challenges with some boundary cases in Class 2.5.

4. Discussion

This study evaluated the effectiveness of a deep learning-based body condition scoring (BCS) system using 2D images captured from three camera perspectives: front, top, and back. The goal was to determine which view provided the most discriminative features for BCS classification and to identify the system’s limitations for different classes. The results show that the back view consistently achieved the best performance, with the highest mAP and class-specific accuracy, especially for BCS 3.0. This is likely because the back view exposes the tailhead, hooks, and pins—areas known to be important for manual BCS scoring [17]. These anatomical landmarks are more visually accessible and less affected by lighting distortions or occlusion in the back view compared to the front or top.
In contrast, the top view produced the weakest performance, with significantly reduced precision and recall across all BCS categories. For example, Class 3.0 achieved a recall of only 0.03, and Class 2.5 achieved just 0.15. This suggests that top-down imaging may lack the visual granularity necessary to differentiate between subcutaneous fat layers and bony structures, which are essential for accurate BCS estimation. These results are consistent with previous studies, which have shown that top views are prone to perspective flattening and poor region localization in livestock applications [17].
The front view exhibited moderate performance, but it also faced significant class overlap—especially between Classes 2.5 and 3.0. This likely results from limited visibility of flank and pelvic landmarks from the front, which makes it harder for the model to identify distinguishing features. Additionally, cows in motion or poorly centered within the frame may have added to the prediction noise.
A consistent finding across all viewpoints was that Class 2.5 was the hardest to identify, with the highest rates of misclassification and the lowest precision and recall. This probably stems from both biological and imaging difficulties. Morphologically, Class 2.5 cows show subtle visual differences from nearby classes, and their lean body mass often merges with background or barn elements in 2D images. Confusion matrices and PR curves support this, especially in the top view, where Class 2.5 had a recall of just 0.15 [18]. Improving contrast, increasing annotation density, or using depth data could help address this.
These results have practical implications for on-farm deployment of automated BCS systems. Specifically, rear-mounted imaging appears to be the most effective strategy for capturing biologically informative views while minimizing classification errors. Integrating such a system into milking parlors, chutes, or alleyways would require minimal behavioral disruption. However, further improvements are needed to increase sensitivity to under-conditioned animals, which are often of greatest concern in herd management decisions [19,20].
From a technical perspective, we conducted three separate training trials to evaluate model stability. The results demonstrated consistent classification outcomes, with minimal variation in mAP and class-wise recall across runs (standard deviation < 0.015). Although this does not replace full cross-validation, it supports the reliability of the architecture given our dataset constraints. In future work, we plan to implement k-fold cross-validation, test generalizability across breeds and farm environments, and expand the BCS range to include classes 1.5 and 2.0 once enough data is available [21].
Compared to the existing literature, this study introduces a novel camera-perspective analysis using YOLOv8 and a simplified annotation framework, making it more practical for small- to mid-sized farms with limited infrastructure [22]. The focus on 2D RGB imaging—without needing depth or thermal sensors—shows that acceptable performance is possible even in resource-limited settings [23,24]. However, adding multi-modal sensors and temporal features (from video streams) could improve robustness in changing field conditions.
Overall, the results highlight the importance of camera placement and balanced datasets in AI-based animal monitoring. By tackling both technical and operational challenges of BCS classification, this work paves the way for scalable, cost-effective deployment of precision livestock technologies.

5. Conclusions

This study presents an innovative method for automated BCS classification that combines 2D imaging with deep learning. The strategic assessment of multiple camera angles sets this work apart from previous studies, which mainly relied on single views or expensive 3D systems. The results show that the rear view provides the most accurate capture of key BCS features, confirming that this cost-effective approach is practical for farm use.
By addressing the limitations of manual scoring and costly automated systems, this study lays the foundation for the wider adoption of precision livestock monitoring, especially among small and medium-sized dairy farms. Future research should focus on optimizing multi-view integration and improving model robustness to further increase classification accuracy. Additionally, expanding datasets, refining model architectures, and integrating complementary sensor technologies will strengthen the reliability and practicality of 2D-imaging-based BCS assessment in various farm environments. Efforts should also be made to make these systems more adaptable to small and medium-sized farms, ensuring that automated BCS monitoring can be easily implemented regardless of farm infrastructure. By incorporating these advancements, automated BCS systems can enhance herd health monitoring, boost productivity, and promote more sustainable dairy farming practices.

Author Contributions

T.K. and R.L. contributed to the conceptualization of the study; R.L. developed the methodology; T.K. was responsible for the software implementation and conducted the formal analysis; validation was carried out by T.L. and M.O.; investigation was undertaken by M.O.; resources were provided by R.L. and T.K.; data curation was performed by J.W.B., R.L. and T.K.; the original draft was prepared by R.L.; review and editing were conducted by M.O.; visualization was handled by M.O.; and supervision was provided by M.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data generated and analyzed during this study will be made available upon reasonable request.

Acknowledgments

The authors express their gratitude to the Centre for Livestock Experiments (SHF) at Norges miljø- og biovitenskapelige universitet (NMBU), Norway, for their invaluable support in providing the research facilities essential for the successful execution of this study. We also acknowledge the contributions of the research staff and farm personnel who facilitated data collection and ensured the welfare of the animals involved in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AMS: Automated Milking System
AP: Average Precision
BCS: Body Condition Score
Colab: Google Colaboratory
RFID: Radio-Frequency Identification
SD: Secure Digital (card)

References

  1. Lemmens, L.; Schodl, K.; Fuerst-Waltl, B.; Schwarzenbacher, H.; Egger-Danner, C.; Linke, K.; Suntinger, M.; Phelan, M.; Mayerhofer, M.; Steininger, F.; et al. The Combined Use of Automated Milking System and Sensor Data to Improve Detection of Mild Lameness in Dairy Cattle. Animals 2023, 13, 1180. [Google Scholar] [CrossRef] [PubMed]
  2. Roche, J.R.; Friggens, N.C.; Kay, J.K.; Fisher, M.W.; Stafford, K.J.; Berry, D.P. Body condition score and its association with dairy cow productivity, health, and welfare. J. Dairy Sci. 2009, 92, 5769–5801. [Google Scholar] [CrossRef] [PubMed]
  3. Bewley, J.M. Automated Body Condition Scoring of Dairy Cattle: Technical and Economic Feasibility. Doctoral Dissertation, Purdue University, West Lafayette, IN, USA, 2008. [Google Scholar]
  4. Krukowski, M.; McKague, K.; Valerio, A. Limitations of manual BCS evaluation in dairy farms: A systematic review. Appl. Anim. Behav. Sci. 2021, 241, 105378. [Google Scholar] [CrossRef]
  5. Li, X.; Lv, C.; Wang, W.; Li, G.; Yang, L.; Yang, J. Generalized focal loss: Towards efficient representation learning for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 3139–3153. [Google Scholar] [CrossRef] [PubMed]
  6. López-Gatius, F.; Yániz, J.; Madriles-Helm, D. Effects of body condition score and score change on the reproductive performance of dairy cows: A meta-analysis. Theriogenology 2003, 59, 801–812. [Google Scholar] [CrossRef] [PubMed]
  7. Halachmi, I.; Klopčič, M.; Polak, P.; Roberts, D.J.; Bewley, J.M. Automatic assessment of dairy cow body condition score using 3D image analysis. J. Dairy Sci. 2019, 96, 8047–8059. [Google Scholar] [CrossRef]
  8. Mottram, T.T.F.; den Uijl, I. Health and welfare monitoring of dairy cows. In Digital Agritechnology: Applications and Future Prospects; Khan, M.I.R., Jamal, M.M., Eds.; Elsevier: Amsterdam, The Netherlands, 2022; pp. 113–142. [Google Scholar] [CrossRef]
  9. Simitzis, P.; Tzanidakis, C.; Tzamaloukas, O.; Sossidou, E. Contribution of Precision Livestock Farming Systems to the Improvement of Welfare Status and Productivity of Dairy Animals. Dairy 2022, 3, 12–28. [Google Scholar] [CrossRef]
  10. Zhao, K.; Zhang, M.; Shen, W.; Liu, X.; Ji, J.; Dai, B.; Zhang, R. Automatic body condition scoring for dairy cows based on efficient net and convex hull features of point clouds. Comput. Electron. Agric. 2023, 205, 107588. [Google Scholar] [CrossRef]
  11. Islam, A.; Lomax, S.; Doughty, A.; Islam, M.R.; Jay, O.; Thomson, P.; Clark, C. Automated monitoring of cattle heat stress and its mitigation. Front. Anim. Sci. 2021, 2, 737213. [Google Scholar] [CrossRef]
  12. Mottram, T.; den Uijl, I. Integrating AI into dairy farming: A future perspective. Anim. Sci. J. 2022, 93, 13671. [Google Scholar] [CrossRef]
  13. Ministry of Agriculture and Food. Regulations on the Keeping of Cattle; Norwegian Ministry of Agriculture and Food: Oslo, Norway, 2007; Available online: https://nibio.brage.unit.no/nibio-xmlui/bitstream/handle/11250/2494661/NILF-Diskusjonsnotat-2007-02.pdf?isAllowed=y&sequence=2&utm (accessed on 18 May 2025).
  14. Ministry of Agriculture and Food. Regulations Relating to the Use of Animals in Research; Norwegian Ministry of Agriculture and Food: Oslo, Norway, 2015; Available online: https://www.regjeringen.no/no/dokumenter/forskrift-om-bruk-av-dyr-i-forsok/id2425378/ (accessed on 18 May 2025).
  15. Lewis, L. Overview of the Barn and Camera Placement; Original Figure, Created by the Author During Data Collection at the Animal Research Centre; NMBU: Ås, Norway, 2024. [Google Scholar]
  16. Siachos, N.; Lennox, M.; Anagnostopoulos, A.; Griffiths, B.E.; Neary, J.M.; Smith, R.F.; Oikonomou, G. Development and validation of a fully automated 2-dimensional imaging system generating body condition scores for dairy cows using machine learning. J. Dairy Sci. 2024, 107, 2499–2511. [Google Scholar] [CrossRef] [PubMed]
  17. Van Nuffel, A.; Zwertvaegher, I.; Van Weyenberg, S.; Pastell, M.; Thorup, V.M.; Bahr, C.; Sonck, B.; Saeys, W. Lameness detection in dairy cows: Part 2. Use of sensors to automatically register changes in locomotion or behavior. Animals 2015, 5, 861–885. [Google Scholar] [CrossRef] [PubMed]
  18. Biscarini, F.; Nicolazzi, E.L.; Stella, A.; Boettcher, P.J.; Gandini, G. Challenges and opportunities in genetic improvement of local livestock breeds. Front. Genet. 2015, 6, 33. [Google Scholar] [CrossRef] [PubMed]
  19. Neethirajan, S. Transforming the adaptation physiology of farm animals through sensors. Animals 2020, 10, 1512. [Google Scholar] [CrossRef] [PubMed]
  20. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000. [Google Scholar] [CrossRef]
  21. Wurtz, K.; Camerlink, I.; D’Eath, R.B.; Fernández, A.P.; Norton, T.; Steibel, J.; Siegford, J.; Raboisson, D. Recording behaviour of indoor-housed farm animals automatically using machine vision technology: A systematic review. PLoS ONE 2019, 14, e0226669. [Google Scholar] [CrossRef] [PubMed]
  22. Yaseen, M. What is YOLOv8: An in-depth exploration of the internal features of the next-generation object detector. arXiv 2024, arXiv:2408.15857. [Google Scholar] [CrossRef]
  23. Summerfield, G.I.; De Freitas, A.; van Marle-Koster, E.; Myburgh, H.C. Automated Cow Body Condition Scoring Using Multiple 3D Cameras and Convolutional Neural Networks. Sensors 2023, 23, 9051. [Google Scholar] [CrossRef] [PubMed]
  24. Zhang, Y.; Zhang, Q.; Zhang, L.; Li, J.; Li, M.; Liu, Y.; Shi, Y. Progress of machine vision technologies in intelligent dairy farming. Appl. Sci. 2023, 13, 7052. [Google Scholar] [CrossRef]
Figure 1. Overview of the barn and camera placement [15].
Figure 2. Training losses of the back-view model.
Figure 3. Training losses of the front-view model.
Figure 4. Training losses of the top-view model.
Figure 5. Roboflow annotations: (top left) Roboflow project initialization; (top right, middle left, middle right) SAM annotation tool; (bottom center) augmentations.
Figure 6. Google Colab.
Figure 7. Class imbalance (left) and adjusted classes (right).
Figure 8. Precision/recall curves ((top left) front view; (top right) top view; (middle left) back view) and precision/confidence curves ((middle right) front view; (bottom left) top view; (bottom right) back view).
Figure 9. Confusion matrices ((top): front view; (middle): top view; (bottom): back view). Rows represent true labels, and columns represent predicted labels. Each cell shows the number of test samples per class. The back view exhibits the highest classification accuracy.
Table 1. Per-view and per-class classification performance metrics, including test sample counts, correct predictions, misclassifications, precision, and recall for BCS categories 2.5, 3.0, and 3.5.
BCS Class | View | Test Samples | Correct | Misclassified | Precision | Recall
2.5 | Back | 21 | 11 | 10 | 0.55 | 0.52
2.5 | Front | 20 | 10 | 10 | 0.50 | 0.50
2.5 | Top | 20 | 3 | 17 | 0.15 | 0.15
Table 2. Classification performance per camera view and BCS class, including sample counts, correct predictions, misclassifications, precision, and recall.
View | BCS Class | Test Samples | Correct Predictions | Misclassifications | Precision | Recall
Back | 2.5 | 21 | 11 | 10 | 0.55 | 0.52
Back | 3.0 | 32 | 29 | 3 | 0.87 | 0.91
Back | 3.5 | 18 | 13 | 5 | 0.72 | 0.72
Front | 2.5 | 20 | 10 | 10 | 0.50 | 0.50
Front | 3.0 | 30 | 24 | 6 | 0.80 | 0.80
Front | 3.5 | 18 | 13 | 5 | 0.72 | 0.72
Top | 2.5 | 20 | 3 | 17 | 0.15 | 0.15
Top | 3.0 | 30 | 1 | 29 | 0.03 | 0.03
Top | 3.5 | 18 | 2 | 16 | 0.11 | 0.11
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

