1. Introduction
In recent years, the dairy farming industry has faced workforce challenges due to aging populations and declining numbers of farm workers. The result is fewer farms managing larger herds, as evidenced by trends in Japan, where the number of dairy farming households has decreased while the average herd size per farm has increased. To maintain productivity with limited labor, farmers are turning to automation and smart agriculture technologies such as robotics and IoT. Key areas of interest include camera surveillance systems, automated individual animal identification, behavior classification, and tracking of livestock movements [1,2]. Among these, individual cow identification and tracking of their location (or staying frequency in certain areas of the barn) are fundamental for health monitoring, estrus detection, and behavior analysis [3,4]. Developing reliable and efficient methods for these tasks is therefore of high importance in precision livestock farming.
Current prevalent solutions for cow monitoring often rely on contact-based devices attached to each animal. For example, wireless RFID tags and sensors can log body weight, feed intake, or milk yield on a per-cow basis [5,6]. Activity monitors like pedometers or accelerometers are also used to gather movement and health data [7,8]. While effective, these approaches have notable drawbacks: each cow requires a dedicated device or tag, which increases cost and maintenance effort, and handling animals to attach or service devices can cause stress and added labor. The need for lower-cost, less intrusive monitoring has driven interest in non-contact methods that leverage cameras and computer vision.
Research on vision-based cattle identification has shown promising results [9,10,11,12,13]. Shen et al. (2020) [10] utilized fixed surveillance cameras in a barn to capture side-profile images of cows; by applying the YOLO object detector with a convolutional neural network (CNN) classifier, they achieved 96.65% accuracy in distinguishing individual dairy cows from such images. Phyo et al. (2018) [11] similarly obtained 96.3% identification accuracy using a neural network on top-down images of cows taken at a milking station. These studies underscore that deep learning models can recognize individual animals given clear images of each cow. Other efforts have combined identification with location tracking: for instance, Zin et al. (2020) [12] installed a camera aimed at a feeding area to capture multiple cows’ faces while eating. They first detected cow heads with a YOLO model, then read the ID numbers on each cow’s ear tag using a specialized CNN, thereby determining which cow was at each feeder station. Their system reported 100% success in detecting cow heads and 92.5% accuracy in recognizing the ear tag numbers, effectively tracking individual feeding locations.
While promising, existing vision-based approaches often have practical limitations. Many require images of one cow at a time or a small group under controlled angles. In a typical free-stall barn, however, dozens of cows roam and intermingle freely, making it difficult to consistently obtain clear views of each individual without significant infrastructure or manual effort. Fixed cameras only monitor specific zones, so an animal is observed only when it enters that camera’s field of view. Monitoring a single cow’s movement across the entire barn with fixed cameras would require installing many units to cover all areas, which is costly and logistically challenging. These constraints motivate a more flexible and cost-effective solution.
In this paper, we propose an IoT-enabled, contactless monitoring system that overcomes the above challenges by using a single Pan–Tilt–Zoom (PTZ) camera to cover a wide area. A PTZ camera can rotate horizontally (pan), vertically (tilt), and zoom in or out, allowing one device to surveil the entire barn from various angles. We pair the camera with a state-of-the-art YOLOv8 object detection model to perform real-time identification of a target cow within the herd. By automatically controlling the PTZ camera’s orientation and logging the camera’s viewing angle as metadata, our system knows which barn area each image represents. This enables it to estimate the target cow’s location in the barn for each detection and to aggregate the cow’s area-wise staying frequency over time.
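Because the camera’s viewing angle is logged as metadata with each image, the barn area an image represents can be recovered by a simple lookup. The sketch below illustrates one way such an angle-to-zone mapping could work; the zone boundaries, zone names, and function name are illustrative assumptions, not values from this study.

```python
# Hypothetical mapping from a logged PTZ pan angle (degrees) to a barn
# zone label. Boundaries and zone names are illustrative assumptions.
ZONE_BOUNDARIES = [
    (-90, -30, "feeding area"),
    (-30, 30, "resting stalls"),
    (30, 90, "water trough"),
]

def angle_to_zone(pan_deg: float) -> str:
    """Return the barn zone the camera covers at this pan angle."""
    for lo, hi, name in ZONE_BOUNDARIES:
        if lo <= pan_deg < hi:
            return name
    return "out of coverage"
```

With such a table, every positive detection can be stamped with a zone label at capture time, without any image-based localization.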
The key contribution of this work is a low-cost, vision-based proof-of-concept system demonstrating that a single network-connected PTZ camera can perform targeted cow tracking—identifying and following one specific individual within a herd—using only image data. This binary formulation (“target cow” vs. “not target”) serves as a preliminary step toward scalable multi-cow identification frameworks: individual identification is foundational for many herd management tasks, and the approach can be extended to multiple cows in future work. Our results demonstrate that the system can reliably monitor the target cow’s presence in different zones of the barn, which could be used to infer behavioral patterns or health issues. By covering the entire barn with one device, we obtain more comprehensive data on the cow’s activity than methods confined to a small area.
In typical dairy barn monitoring setups, complete spatial coverage using fixed cameras often requires between four and six units positioned at different angles to avoid blind spots, depending on barn size and layout. Each camera unit (hardware, installation, cabling, and maintenance) can cost approximately USD 300–500, leading to a total installation cost exceeding USD 2000 for a single barn. In contrast, our system achieves comparable spatial coverage using one PTZ camera costing roughly USD 400–600, with only a single network and power connection. This quantitative contrast highlights the potential of the proposed PTZ-based approach to significantly reduce both equipment and installation costs while maintaining comprehensive visual coverage.
While many studies have investigated cow identification or zone-specific monitoring using fixed cameras, there remains a clear research gap in developing a scalable, low-cost, multi-zone monitoring system that can cover the entire barn using only a single vision device. Existing systems typically require multiple fixed cameras or complex infrastructure, limiting their affordability and scalability for small- and medium-scale farms. Therefore, this study aims to address this gap by proposing an IoT-based PTZ camera framework that provides continuous coverage across multiple barn zones while maintaining minimal equipment cost and setup complexity.
The rest of this article is organized as follows. Section 2 (Related Work) reviews prior ICT applications in cattle management, contrasting contact-based and vision-based methods and positioning our approach among them. Section 3 (Methodology) describes the proposed system architecture, including the PTZ camera setup, data annotation, and YOLOv8-based identification model. Section 4 (Experimental Setup) details the implementation environment, dataset preparation, and evaluation metrics used. Section 5 (Results) presents experimental results for cow identification accuracy and location tracking performance, with comparative analysis against ground truth. Section 6 (Discussion) interprets the results, discusses the system’s practicality, and outlines limitations and future improvements. Finally, Section 7 (Conclusions) summarizes the findings and the contribution to smart agriculture IoT systems.
2. Related Work
Early adoption of ICT in cattle management has been dominated by wearable sensor systems. For example, RFID-based identification allows automatic logging of individual cow data such as weight, feed intake, and milk production by scanning an ear tag or collar sensor. Other studies attach accelerometers or pedometers to cows to monitor activity levels, feeding and drinking behavior, or even signs of lameness. Adrion et al. (2020) [14] developed an ultra-high-frequency RFID setup to track dairy cow feeding behavior, demonstrating the feasibility of monitoring individual visits to feed troughs via ear tags. While effective, these contact-type solutions share common drawbacks of cost and maintenance: each animal needs a device and regular upkeep (battery changes and repairs), and the initial deployment is expensive for large herds. There are also animal welfare considerations, as frequent close contact to attach or adjust devices can induce stress and affect natural behavior. These issues drive the exploration of non-contact methods that can passively observe animals without physical intervention.
Advancements in computer vision and deep learning have enabled visual identification of livestock using standard cameras [15,16]. As mentioned, Shen et al. [10] used convolutional neural networks to identify individual Holstein cows from barn images and achieved high accuracy (over 96%). Their approach underlined that coat patterns or physical features can distinguish one cow from another when the cow’s body is clearly visible. Phyo et al. [11] extended this concept to top-view images at milking stations, indicating that identification is possible even from partial views like a cow’s back. In addition to direct identification, vision systems have targeted related tasks such as tracking cow locations within facilities. Zin et al. [12] developed an automatic cow tracking system utilizing cameras at feed bunks: their method detected cow heads and recognized ear tag numbers to log which cow occupied each feeding area. This approach effectively integrated identification with spatial monitoring, providing farmers with real-time knowledge of feeding activity (with reported accuracies of 100% for detecting presence and 92.5% for reading tags).
Additionally, recent studies have explored multi-target tracking (MTT) techniques to simultaneously follow several animals in dynamic environments. DeepSORT and ByteTrack algorithms have been employed in livestock video analysis to maintain unique IDs across frames using motion and appearance features. Qiao et al. [17] proposed a unified architecture combining YOLO detection with deep re-identification for cattle, achieving over 90% tracking accuracy in controlled settings.
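The core idea behind such trackers—associating each new detection with an existing track so that IDs persist across frames—can be illustrated with a minimal sketch. The code below is a didactic simplification using only bounding-box overlap (IoU), not DeepSORT or ByteTrack themselves, which additionally use motion models and appearance embeddings; all names are our own.

```python
# Minimal IoU-based tracker sketch: assign each detection in a frame to the
# existing track with the highest bounding-box overlap, or start a new track.
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2]-a[0]) * (a[3]-a[1]) + (b[2]-b[0]) * (b[3]-b[1]) - inter
    return inter / union if union else 0.0

class SimpleTracker:
    def __init__(self, iou_thresh=0.3):
        self.tracks = {}       # track_id -> last seen box
        self.next_id = 0
        self.iou_thresh = iou_thresh

    def update(self, detections):
        """Assign each detection a persistent ID; returns [(id, box), ...]."""
        assigned, used = [], set()
        for box in detections:
            best_id, best_iou = None, self.iou_thresh
            for tid, prev in self.tracks.items():
                if tid in used:
                    continue
                overlap = iou(box, prev)
                if overlap > best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:        # no sufficient overlap: new track
                best_id = self.next_id
                self.next_id += 1
            self.tracks[best_id] = box
            used.add(best_id)
            assigned.append((best_id, box))
        return assigned
```

Production trackers replace this greedy matching with Hungarian assignment and add Kalman-filter motion prediction and re-ID features, but the data flow is the same.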
However, most vision-based studies to date have limitations in scope or scalability. They often consider scenarios with a limited field of view or assume that the cow of interest is well-separated from others in the image. In practice, obtaining high-quality images of every individual in a group-housed setting is challenging. Cows can occlude each other or move unpredictably, and lighting or barn structures (e.g., stalls and fences) may obstruct views. Systems that rely on a fixed camera observing a fixed location (such as a water trough or a single walkway) only capture data when the target animal happens to visit that spot. This yields an incomplete picture of the animal’s overall activity. Covering the entire barn with enough fixed cameras to catch all movements would require a large number of devices installed at different angles (e.g., several cameras to cover one barn), which is impractical and costly for farms. Each additional camera increases installation and maintenance burden and generates more data to manage.
Our work differentiates itself by using a single moving camera to monitor a large area. A PTZ camera can be programmed to periodically scan across different sections of the barn, effectively acting as multiple cameras in one, as shown in Figure 1. Prior research has not extensively explored PTZ cameras for cattle monitoring, even though they offer a clear advantage in coverage flexibility. By capturing images from various angles and locations using one device, we reduce the infrastructure needs. Our approach also accepts that barn images will contain multiple cows simultaneously (reflecting reality) and focuses on robustly picking out the target individual among them. This is more challenging than scenarios where cows are imaged one at a time, but it increases the system’s practicality and applicability to real barns.
Importantly, our system performs not only identification but also continuous tracking of the cow’s position over time throughout the barn. In contrast to earlier systems that monitor a single behavior (like feeding at a station), we obtain a broader view of the cow’s daily activities by logging which zone of the barn it stays in and how often. This information can feed into higher-level analyses such as detecting changes in routine or early illness indicators. By using a networked camera and automated analytics, the system aligns well with IoT frameworks: data can be transmitted to cloud services or farm management software, enabling remote supervision and data-driven decision-making in smart agriculture.
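Aggregating the zone log into area-wise staying frequencies is straightforward once each positive detection carries a zone label. The sketch below shows one plausible form of that aggregation; the log schema and function name are illustrative assumptions, not the paper’s implementation.

```python
# Sketch: turn a detection log into area-wise staying frequencies.
# Each entry is (timestamp, zone) for a frame where the target cow was
# identified; the schema is an illustrative assumption.
from collections import Counter

def staying_frequency(detections):
    """Fraction of positive detections falling in each barn zone."""
    counts = Counter(zone for _, zone in detections)
    total = sum(counts.values())
    return {zone: n / total for zone, n in counts.items()}

log = [(0, "feeding"), (3, "feeding"), (6, "resting"), (9, "feeding")]
freq = staying_frequency(log)  # {"feeding": 0.75, "resting": 0.25}
```

Because the denominator is the number of detections rather than elapsed time, frequent misses in one zone would bias the estimate; in practice the PTZ scan gives each zone repeated, roughly equal observation opportunities.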
Overall, the proposed system builds upon the strengths of vision-based identification while addressing their limitations through a more dynamic imaging strategy. It aims to deliver a practical, scalable solution that farmers could deploy with minimal equipment—just one camera and an internet connection—to continuously monitor individual animals. The following sections detail the system design and demonstrate its effectiveness in a real-world barn setting.
6. Discussion
The experimental results indicate that our approach is viable for real-world smart farming applications. By using a PTZ camera and a deep learning model, we demonstrated continuous, contactless monitoring of individual animals. The system dramatically reduces the equipment needed to monitor cow behavior. Instead of outfitting each cow with a sensor or installing a network of cameras, a farmer can deploy a single PTZ camera to cover a large area. As analyzed in the related work, covering the same barn with fixed cameras might require 4–6 units from different angles, incurring high costs in purchase, installation, and maintenance. Our solution uses one device to achieve comparable coverage. Moreover, since the data needed is just images, the infrastructure is simplified—images can be sent over Wi-Fi or wired network to a central system, aligning with IoT architecture where each camera is a smart sensor node.
The contactless nature of our system means the cow does not need to wear any equipment. This eliminates the stress and potential injury that can occur when attaching devices to animals. It also reduces labor for farm staff, as they do not have to routinely check or fix sensors on the cows. The cows in our test barn were unaware of the monitoring process, continuing their normal routine. This suggests that behavior data collected (like area preferences, resting times, feeding times) are natural and not influenced by the monitoring method, which is ideal for animal welfare and for the validity of the observations.
Through our evaluation, we found that prioritizing Precision was essential for reliable monitoring. In many IoT sensing scenarios (e.g., health alerts), missing a few events might be acceptable, but false alarms can be problematic. In our context, if the system falsely identifies the cow in a zone where it is not, it could lead to incorrect conclusions (for example, thinking the cow visited the water trough when it did not). By tuning the model to be conservative (high confidence threshold), we ensured that when a detection is recorded, it is highly likely to be correct. The trade-off is that the system might momentarily lose track of the cow (e.g., if it does not detect the cow for a few seconds), but as discussed, the PTZ scanning mitigates this by giving repeated opportunities to catch the cow on subsequent passes. In practice, the cow’s movement speed is slow enough that missing one frame is not critical—it will still be in roughly the same area in the next frame or two.
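The Precision/Recall trade-off described above can be made concrete with a small worked example: raising the confidence threshold discards low-confidence detections, which tends to raise Precision at the cost of Recall. The scores and labels below are made up for illustration; only the metric definitions are standard.

```python
# Sketch of threshold tuning: compute Precision and Recall at a given
# confidence threshold over a list of (confidence, ground_truth) pairs.
def precision_recall(preds, threshold):
    """preds: list of (confidence, target_truly_present) tuples."""
    tp = sum(1 for conf, truth in preds if conf >= threshold and truth)
    fp = sum(1 for conf, truth in preds if conf >= threshold and not truth)
    fn = sum(1 for conf, truth in preds if conf < threshold and truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up detections: three true positives, two false candidates.
preds = [(0.95, True), (0.90, True), (0.60, False), (0.55, True), (0.40, False)]
p_low, r_low = precision_recall(preds, 0.5)    # permissive threshold
p_high, r_high = precision_recall(preds, 0.8)  # conservative threshold
```

Here the conservative threshold accepts only the two highest-confidence detections, eliminating the false alarm (Precision rises to 1.0) while dropping one genuine sighting (Recall falls), mirroring the behavior our tuning favored.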
The system we built can be considered an IoT node in a larger smart farm ecosystem. The PTZ camera with edge computing (Raspberry Pi) could run the detection model locally or send the images to a cloud service for processing. In our experiments, images were processed after collection, but an online deployment could use edge AI hardware to run YOLOv8 in real-time and stream results. The output—identified cow and location—can be transmitted to farm management software. For example, if the system is monitoring a cow that needs special attention (perhaps one that is sick or in estrus), alerts could be generated if the cow does not show up at the feeder for a certain period or if it remains in an unusual location. The data collected (stay frequencies) can feed into analytics for space utilization in barns, helping optimize barn design or stocking density by understanding how cows use the space.
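An alert rule of the kind sketched above (flagging a cow that has not visited the feeder for some period) reduces to a query over the same detection log. The window length, zone names, and function below are illustrative assumptions, not part of the deployed system.

```python
# Hypothetical absence-alert rule: fire if the target cow has not been
# detected in a given zone within the last `window_s` seconds.
def absence_alert(detections, zone, now, window_s=3600):
    """detections: list of (timestamp, zone). True if no recent sighting."""
    recent = [t for t, z in detections if z == zone and now - t <= window_s]
    return len(recent) == 0

log = [(100, "feeder"), (5000, "resting")]
# Last feeder visit was 5900 s ago, outside the 3600 s window -> alert.
needs_attention = absence_alert(log, "feeder", now=6000, window_s=3600)
```

Such a rule could run on the edge device or in farm management software, pushing a notification only when the condition holds rather than streaming raw detections.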
In terms of computational performance, inference using the fine-tuned YOLOv8x model on a GPU (NVIDIA T4, Google Colab) required approximately 35–40 ms per 640 × 640 image, corresponding to a processing speed of roughly 25 frames per second. On a Raspberry Pi 4 (8 GB RAM) without hardware acceleration, inference latency increased to about 1.2 s per frame. This implies that real-time performance is achievable with embedded AI accelerators such as NVIDIA Jetson Nano/Orin or Coral TPU modules. Since the PTZ camera captures one image every 3 s during rotation, the current system already meets the temporal requirement for sequential monitoring. Future deployment will incorporate on-device inference to minimize transmission delay and enable instant visualization of detections on the farm network.
Despite the positive results, our approach has some limitations. The system was evaluated for a single target cow. Scaling to identify and track all individuals in a herd would require either training multiple models (one per cow) or training one multi-class model that recognizes each cow as a distinct class. The former does not scale well as the number of cows grows, while the latter would require substantially more labeled data (each cow must be labeled, and model complexity increases). Alternatively, multi-object tracking or re-identification (re-ID) techniques could be integrated: the detection model could find all cows in an image, and a secondary re-ID network could then distinguish which one is the target or assign IDs to each. Our work focused on proving the concept for one cow; future work will explore multi-cow scenarios.
Another limitation is that YOLOv8, while powerful, can sometimes be confused by similar-looking animals or certain poses. For instance, if two cows with similar coat patterns stand close, the model might identify the wrong one. Ensuring a diverse training set (images of the target cow from different angles, in groups, at different positions) helped mitigate this. In practice, farmers might choose target cows that have distinctive visual features if they want to deploy such a system (e.g., a cow with unique markings could be easier to track). Otherwise, additional markers (like a colored collar visible to cameras) could be a practical compromise—still much simpler than an active sensor, just a visual aid for the algorithm.
The barn environment presents challenges such as variable lighting (day vs. night) and occlusions (other cows, posts, feeding racks). Our experiments were conducted in daytime; at night, additional lighting or IR-capable cameras would be needed, and the model might require adaptation to different light conditions. Occlusions remain a challenge: if the target cow is completely obscured behind others or lies in a hard-to-see corner, no vision system can identify it. The PTZ camera’s flexibility in angles can reduce occlusions (it can capture from different sides) but not eliminate them. In the future, combining this with a second PTZ camera on the opposite side could ensure that any cow hidden from one is visible to the other, still with fewer cameras than a fully fixed rig.

It is insightful to compare our results with a hypothetical multi-camera setup. The identification accuracy (Precision 86%; Recall 68%) is likely on par with what a fixed camera might achieve if it had a continuous view of the target, since it is largely a function of the model and image clarity. The Recall loss in our case partly comes from the cow sometimes not being in view (the camera is looking elsewhere). A fixed array of cameras might catch the cow more consistently (higher Recall) but at a much higher cost. Our single-camera Recall of 68% was sufficient to accurately gauge location usage, as evidenced by the match to ground-truth frequencies. This indicates a diminishing-returns scenario: doubling or tripling the number of cameras might yield higher raw Recall, but the improvement in practical insight (such as knowing where the cow spends time) might be marginal, not justifying the expense.
Because the proposed system relies on continuous network connectivity between the PTZ camera, local server, and potential cloud services, data security is a critical concern. All image streams should be transmitted through encrypted protocols (e.g., HTTPS or secure RTSP) and stored within protected local servers. Although the system captures only barn scenes without human subjects, compliance with farm data-management policies and national privacy regulations remains essential. Future implementations will integrate lightweight encryption and edge-storage options to ensure that sensitive operational data (such as animal IDs or farm layout) are safeguarded while maintaining system responsiveness.
Overall, this discussion underscores that the PTZ + YOLOv8 approach is a practical alternative for farms that cannot invest in extensive hardware. It brings IoT and AI to the barn in an accessible way. There are certainly scenarios where a hybrid approach (some critical points covered by fixed cameras, plus a PTZ for general coverage) could be optimal. The system could also be extended beyond identification: for example, once the cow is detected, additional analyses like body posture detection or gait assessment could be done on the image to infer health conditions. The modular nature of our pipeline allows for such extensions.
Additionally, although the system demonstrated satisfactory Precision (85.96%) and practical tracking capability, several limitations should be acknowledged. First, the current implementation focuses on a single target cow and daytime conditions; performance under low-light or crowded scenarios requires further validation. Second, the Recall (68%) indicates that occasional misses occurred when the cow was partially occluded or outside the current camera view. In comparison, fixed multi-camera systems reported Recalls above 85% (e.g., Zin et al. [12]; Shen et al. [10]), though at a significantly higher infrastructure cost. Our single-camera design sacrifices some temporal continuity for broader spatial coverage, reducing hardware by approximately 80%. Finally, the limited training dataset (510 images) restricts generalization; larger, multi-farm datasets will be collected to evaluate model robustness across environments. Despite these constraints, the achieved balance between cost, accuracy, and simplicity positions the PTZ + YOLOv8 framework as a practical foundation for scalable precision livestock monitoring.
In the next section, we conclude the paper by summarizing the achievements and outlining future work directions, including how this system can be expanded for broader use in precision livestock farming.