1. Introduction
The growing population of the white-tailed deer (
Odocoileus virginianus, hereafter deer) across the United States poses a serious challenge for farmers, who bear significant economic losses from deer depredation of corn, soybeans, cotton, and wheat. For instance, Mississippi has the highest deer density in the nation, and a survey of its row crop producers revealed that 17,830 total acres of farmland were affected by deer damage, resulting in a staggering annual economic loss of
$4.6 million [
1]. The issue occurs nationwide. One study estimated the combined loss from wildlife damage across corn, soybeans, wheat, and cotton at
$592.6 million [
2], and another identified white-tailed deer as the primary wildlife species responsible for most of this damage across the U.S. [
3].
People have explored a variety of control methods to reduce wildlife-related crop damage, such as fencing, hunting, trapping, and repellents. Each of these approaches, however, presents significant limitations in terms of cost, scalability, and long-term sustainability. Fencing is one of the most widely used deterrents, and high-tensile or electrified fences can be effective in reducing deer damage. However, the costs are prohibitive for large-scale agricultural operations, averaging
$13,000 per mile of fencing, and ongoing maintenance is required to repair damage from storms, fallen trees, or determined animals [
4]. Moreover, persistent deer often learn to breach or circumvent barriers, reducing effectiveness over time. Hunting and culling programs represent another strategy. Regulated hunting seasons can provide some localized relief, but they are typically seasonal and sometimes low-intensity, decreasing their effectiveness in reducing deer damage. A recent study on hunting behavior in the U.S. projected that the number of hunters is declining by 1.5% to 2.2% per year [
5]. This further calls into question the long-term efficacy of hunting as a strategy for mitigating deer intrusion on agricultural lands. More intensive damage control programs may reduce local populations more significantly; yet, they often face public opposition, animal welfare concerns, and ecological debates about altering wildlife populations. These programs also require permits from relevant wildlife management agencies, limiting farmers’ autonomy and flexibility [
6].
Repellents have also been widely studied, particularly for deer [
7]. While chemical or odor-based repellents can reduce browsing pressure, they typically degrade quickly after exposure to rain, wind, or sun and therefore require frequent reapplication. Overall, these approaches are largely reactive rather than proactive, addressing the problem only after damage has occurred. They are often unsustainable in the long term and may cause unnecessary stress, injury, or mortality to wildlife, raising ethical as well as ecological concerns [
8].
Sustainable wildlife management that actively prevents deer from entering farms represents a foundational solution to these challenges [
9,
10]. Future systems can leverage advanced computer vision and deep learning techniques, most notably You Only Look Once (YOLO) models [
11], to achieve highly accurate and real-time identification of deer presence and movement [
12]. Detailed outputs, such as the location of deer within an image or enhanced segmentation masks, can then trigger tailored, dynamic deterrents, including species-specific sound or light emissions, or autonomous maneuvers by Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) [
13]. A common design pattern in emerging research involves a hierarchical process in which initial motion detection, often through Passive Infrared (PIR) sensors, activates more sophisticated vision-based models for animal detection. For example, Ref. [
14] integrated a Convolutional Neural Network (CNN) to identify animals, which then activated non-lethal deterrents such as ultrasonic sounds, flashing lights, or sprinklers. Similarly, Ref. [
15] employed a PIR sensor to trigger a Region-based Convolutional Neural Network (R-CNN), which in turn activated species-specific ultrasonic frequencies to mitigate habituation. More advanced systems have also been developed for other species. For instance, Ref. [
16] designed a wild boar deterrent system that uses YOLOv5n [
17] to confirm presence before escalating deterrents from ultrasonic sound to predator scent (wolf urine) and vocalizations.
Machine vision serves as the foundation for these emerging deterrence systems, acting as the “eyes” for real-time deer detection. Among the available approaches, YOLO models have been preferred over two-stage detectors (e.g., Faster R-CNN or EfficientDet), due to their balance of accuracy and speed, making them well-suited for field deployment [
18,
19]. For instance, Ref. [
18] demonstrated the superiority of YOLOv10 over YOLOv5 for wildlife detection on an NVIDIA Jetson Nano, achieving an AP@0.5 of 0.934. Similarly, Ref. [
20] implemented a YOLOv8m-based surveillance system on a Raspberry Pi, showing its feasibility on low-power edge devices. In a more recent application, Ref. [
21] deployed a TensorRT-optimized YOLOv5 model on an NVIDIA Jetson Orin Nano for a UAV-based deterrence system, achieving an inference time of only 0.025 s per frame. Collectively, these studies affirm that the YOLO family of models has become the de facto standard for practical, high-performance wildlife detection systems on edge hardware.
Despite recent advancements, most deer and wildlife detection studies continue to rely on private datasets or general-purpose object detection datasets. For instance, Ref. [
22] applied YOLOv5 to the PASCAL VOC dataset, while others used aggregated datasets from Roboflow Universe that include multiple animal classes [
18,
20]. In deer-specific research, custom datasets have been used for detecting Sika deer from camera traps with YOLOv8n [
12] and for identifying various deer species from UAV imagery with YOLOv8-seg [
23]. However, these datasets are often limited to simplified scenarios focused on targeted deer populations and are generally not publicly available, making reproducibility and broad benchmarking difficult (see
Table 1). This lack of open-source domain-specific data creates a reproducibility gap, making it difficult to benchmark progress or validate whether models trained in isolation can generalize to new environments.
Furthermore, the existing literature has primarily focused on evaluating individual YOLO iterations in isolation, often neglecting systematic comparison of emerging architectures. Crucially, performance is frequently reported on high-end desktop GPUs (e.g., NVIDIA RTX series), which obscures the practical limitations of deploying these models on resource-constrained edge devices. Therefore, a critical gap exists between the theoretical capability of modern architectures and their verified feasibility in the field.
To address these gaps, we present a publicly available dataset of 3095 annotated deer images with bounding-box labels, derived from the Idaho Cameratraps project, representing challenging real-world scenarios. Using this dataset, we establish a comprehensive benchmark of YOLOv8–YOLOv11 models, evaluating performance not only on a high-end NVIDIA RTX 5090 GPU but also on resource-constrained platforms, including the CPU-based Raspberry Pi 5 and the GPU-accelerated NVIDIA Jetson AGX Xavier. This rigorous approach provides a validated reference for the deployment of autonomous wildlife detection systems in real-world agricultural and conservation scenarios.
The key contributions of this work are as follows:
An open-source dataset of 3095 annotated deer images with bounding-box labels, covering diverse environmental conditions and lighting scenarios.
A systematic evaluation and benchmarking of four YOLO architectures (v8–v11), encompassing 12 model variants for deer detection.
Inference benchmarking of the 12 YOLO model variants on edge devices, including the CPU-based Raspberry Pi 5 and the GPU-accelerated NVIDIA Jetson AGX Xavier.
This paper is organized as follows.
Section 2 describes the data acquisition methods, model selection, training, evaluation, and inference metrics.
Section 3 presents the model evaluation results, along with the limitations of the study.
Section 4 interprets the results and provides a broad discussion regarding the implications and limitations of this study. Finally,
Section 5 provides concluding remarks and potential directions for future work.
2. Materials and Methods
This section describes the data acquisition process, the YOLO models employed, the training procedure, and the model evaluation strategy, followed by inference experiments conducted on edge devices.
2.1. Data Acquisition
To enable a comprehensive comparison, we curated two deer image datasets. We first selected a dataset from Roboflow Universe containing annotated images of deer, which will be referred to as the “Roboflow dataset”. The Roboflow dataset [
30] consists of 2339 images of different deer species captured in diverse environments, mostly under favorable lighting conditions and with minimal motion, as is common in manually photographed images (
Figure 1). The dataset was divided into a training set of 2043 images and a validation set of 296 images. However, this type of dataset does not fully capture the challenges of real-world field conditions, such as low-light environments, motion blur, partial occlusion, and varying camera trap settings.
To better represent these scenarios, we curated a second dataset from the Idaho Cameratraps project, shared by the Idaho Department of Fish and Game via the LILA BC online repository [
31]. The Cameratraps project contains over 1.5 million camera trap images collected from video sequences across multiple regions. While the original dataset provides only sequence-level labels without bounding-box annotations, we filtered approximately 7000 images containing deer (confidence score > 0.9) and manually annotated 3095 images with bounding boxes using the Computer Vision Annotation Tool (CVAT) [
32]. This “Cameratrap dataset” was then divided into a training set with 2578 images and a validation set with 517 images (
Figure 1). For both datasets, the training set was used exclusively for training, while the validation set was held out solely for evaluation throughout the study.
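The confidence-based filtering and train/validation split described above can be sketched as follows. This is a minimal illustration, not the actual curation script: the record schema (`image` and `confidence` keys) and the helper `filter_and_split` are assumptions, and the validation fraction mirrors the 2578/517 division (≈16.7%).

```python
import random

def filter_and_split(records, conf_threshold=0.9, val_fraction=0.167, seed=42):
    """Keep images whose deer confidence exceeds the threshold, then split.

    `records` is assumed to be a list of dicts with 'image' and 'confidence'
    keys; the actual LILA BC / Idaho Cameratraps metadata schema may differ.
    """
    keep = [r["image"] for r in records if r["confidence"] > conf_threshold]
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(keep)
    n_val = round(len(keep) * val_fraction)
    return keep[n_val:], keep[:n_val]  # (training set, validation set)
```

A fixed random seed keeps the split reproducible, so the validation images stay isolated from training across repeated runs.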
The two datasets differ significantly in complexity and realism. The Roboflow dataset largely consists of clean, manually photographed images of deer, mostly under good lighting and with limited motion or occlusion, making it more suitable for baseline model training. In contrast, the Cameratrap dataset captures deer under challenging real-world conditions, including varying backgrounds, night/low-light settings, occlusion, and camouflage. As shown in
Figure 1, the Cameratrap dataset therefore provides a more realistic benchmark for evaluating model robustness in field deployments.
2.2. Deep Learning Models
The YOLO framework made a groundbreaking contribution to computer vision, particularly in object detection, by introducing a single-pass approach that framed detection as a regression problem. This innovation enabled real-time inference, which was lacking in other state-of-the-art models at the time [
33]. Since then, the YOLO family has evolved rapidly, producing numerous variants that build on the original design with architectural and performance enhancements [
11].
In this study, we focus on four recent versions, YOLOv8, YOLOv9, YOLOv10, and YOLOv11, for comparative benchmarking. The study is further constrained to the three most lightweight variants of each model family: nano (n), small (s), and medium (m). Notably, for YOLOv9, the tiny (t) variant was used because its parameter count is similar to that of the nano-scale variants. YOLOv8 introduced notable improvements, including an anchor-free detection method for enhanced accuracy, architectural refinements in the backbone and detection head to better capture objects of varying scales, and a decoupled head design that further improved precision [
34,
35]. We conclude our comparisons with YOLOv11. While alternative architectures, such as YOLO-NAS (Neural Architecture Search) or the latest transformer-hybrid variants (e.g., YOLOv12), offer promising capabilities, they represent divergent evolutionary branches or a gradual shift towards attention-based vision transformer architectures [
11,
36,
37]. This study specifically bounds its scope to the continuous CNN-based lineage of YOLOv8 through YOLOv11 to provide a controlled comparative analysis.
2.2.1. YOLOv8
YOLOv8, developed by Ultralytics, represents a state-of-the-art object detection model that introduced several architectural improvements over its predecessor, YOLOv5 [
38]. One of its key innovations is anchor-free estimation, which accelerates post-processing, particularly Non-Maximum Suppression (NMS), for selecting prediction boxes [
39]. Additionally, the model employs the CSPDarknet backbone, which reduces computational complexity while maintaining accuracy by splitting feature maps and incorporating the Sigmoid Linear Unit (SiLU) activation function [
38]. This design enhances gradient flow and feature reuse, resulting in smaller, more efficient models well-suited for edge devices. Furthermore, YOLOv8 integrates a PANet neck, which improves multi-scale feature learning by augmenting the traditional Feature Pyramid Network (FPN) with a bottom-up path alongside the top-down pathway [
40]. Another improvement is the C2f (Cross Stage Partial network with two fusion blocks) module, which captures complex patterns more effectively, further boosting model accuracy [
38,
41]. Ultralytics provides YOLOv8 in five sizes (“n”, “s”, “m”, “l”, and “x”), offering flexibility depending on accuracy and resource constraints. The smallest model, YOLOv8n, has 3.2 million parameters, while the largest, YOLOv8x, has approximately 68.2 million parameters.
2.2.2. YOLOv9
YOLOv9 architecture marks a significant improvement over earlier YOLO versions in terms of speed and accuracy of object detection. A key challenge it addressed is the information bottleneck problem in deep neural networks, where gradients diminish or useful information is lost as they propagate through successive layers of large models. To overcome this, YOLOv9 introduced Programmable Gradient Information (PGI), which ensures more reliable gradient flow and enhances the network’s ability to learn from complex image features. As shown in
Figure 2, PGI is implemented through a reversible auxiliary branch that updates the main branch by generating useful gradients through a supervision mechanism. Moreover, YOLOv9 introduced the Generalized Efficient Layer Aggregation Network (GELAN), an advancement over the original Efficient Layer Aggregation Network (ELAN) architecture. Unlike ELAN, which relied solely on convolutional blocks, GELAN can flexibly incorporate different computational blocks, improving both efficiency and generalization. Together, the integration of PGI and GELAN enables YOLOv9 to achieve robust gradient flow, efficient computation, and superior detection performance [
42,
43].
2.2.3. YOLOv10
YOLOv10 introduced several architectural innovations aimed at improving both efficiency and accuracy, while addressing key limitations in earlier YOLO versions [
45]. One major issue identified in prior models was the reliance on NMS during post-processing, which added computational overhead and delayed inference. To overcome this, YOLOv10 adopted NMS-free training by incorporating a one-to-one prediction head alongside the traditional one-to-many head during training, allowing the network to optimize using both. However, at inference time, only the one-to-one head is used, which eliminates the need for NMS and enables faster, end-to-end deployment. Another notable improvement is the simplification of the classification head. The authors observed that the regression head plays a more critical role in YOLO performance, so the classification head was streamlined to reduce computational cost without sacrificing accuracy.
YOLOv10 also redesigned its convolutional blocks to reduce redundancy. Traditional YOLO architectures used 3 × 3 convolutions with a stride of 2 to simultaneously downsample spatial dimensions (from H × W to H/2 × W/2) and expand channels (from C to 2C). In contrast, YOLOv10 decouples these operations: a pointwise convolution first handles channel transformations, followed by a depthwise convolution for spatial downsampling. This design significantly reduces computational overhead. To further enhance accuracy, YOLOv10 integrates partial self-attention and employs improved large-kernel convolutions, particularly for lightweight models. Together, these modifications result in a more efficient and accurate object detection framework suitable for real-time applications.
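As a back-of-the-envelope illustration of this saving, the multiply-accumulate (MAC) counts of the fused and decoupled designs can be compared directly. The feature-map size and channel count below are illustrative placeholders, not YOLOv10's actual layer dimensions.

```python
def conv_macs_standard(h, w, c, k=3):
    """Fused 3x3 stride-2 conv: H x W x C -> H/2 x W/2 x 2C in one step."""
    return (h // 2) * (w // 2) * (2 * c) * (k * k * c)

def conv_macs_decoupled(h, w, c, k=3):
    """Pointwise channel expansion followed by depthwise spatial downsampling."""
    pointwise = h * w * (2 * c) * c                       # 1x1 conv: C -> 2C at H x W
    depthwise = (h // 2) * (w // 2) * (2 * c) * (k * k)   # one 3x3 filter per channel
    return pointwise + depthwise

# Example with illustrative mid-network sizes
h, w, c = 80, 80, 256
print(conv_macs_standard(h, w, c) / conv_macs_decoupled(h, w, c))  # ratio ≈ 2.2
```

For large channel counts the pointwise term dominates, so the decoupled design needs less than half the MACs of the fused convolution while preserving the same channel expansion.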
2.2.4. YOLOv11
YOLOv11 represents the most recent advancement in the YOLO family, developed by Ultralytics as a more sophisticated yet efficient model. It extends the versatility of YOLO by supporting multiple computer vision tasks, including object detection, instance segmentation, and pose estimation, while maintaining strong performance relative to its size [
46]. Built upon the YOLOv8 architecture, YOLOv11 introduces two key innovations. The C2PSA (Cross Stage Partial with Spatial Attention) block enables the model to focus on the most relevant regions of an image, thereby improving its ability to detect small and occluded objects. Meanwhile, the C3k2 (Cross Stage Partial with 3 Convolutions and 2 Kernels) block reduces computational complexity while preserving rich feature representations, making the model more efficient without compromising accuracy.
Ultralytics maintains the YOLO family of models and provides open-source implementations of the versions it primarily developed, such as YOLOv8 and YOLOv11. In practice, YOLO models present a tradeoff between model size and accuracy. Although they are widely applicable across various computer vision tasks, our study focuses specifically on their object detection capabilities. To capture the balance between performance and computational efficiency, we selected the “n”, “s”, and “m” variants of each model. Among these, YOLOv8m is the largest, with 25.84 million parameters, while YOLOv9t is the smallest, with only 1.94 million parameters (see
Table 2). Each version incorporates different techniques for gradient flow, optimization, and image processing. By evaluating performance across a wide range of architectural variations and model sizes, we provide deeper insights that support informed decision-making when selecting YOLO models for deer detection tasks.
2.3. Training
The YOLO models were trained on an NVIDIA GeForce RTX 5090 GPU with 32 GB of VRAM, a high-end platform well-suited for deep learning workloads. The training was conducted on a Linux machine with CUDA version 12.9 (
Table 3). While the default training configurations provided by Ultralytics are effective, adjustments were made to better leverage the available hardware. In particular, the batch size and number of workers were tuned for different models based on their parameter size. Larger batch sizes enable faster training by improving GPU utilization [
47], but they are constrained by memory usage, especially for larger models. Therefore, careful experimentation was conducted to identify the optimal values for batch size and the number of workers, given the model size and available VRAM. All models were trained using Common Objects in Context (COCO) pretrained weights for a total of 100 epochs with an image size of 640, an initial learning rate of 0.01, and the Stochastic Gradient Descent (SGD) optimizer with standard augmentation techniques (
Table 4).
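A training run with these settings via the Ultralytics Python API might look like the following configuration sketch. `deer.yaml` is a placeholder dataset config, and the `batch`/`workers` values stand in for the per-model tuned settings; this is an illustration, not the exact script used in this study.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")    # COCO-pretrained weights
model.train(
    data="deer.yaml",          # placeholder dataset config (train/val paths)
    epochs=100,
    imgsz=640,
    lr0=0.01,
    optimizer="SGD",
    batch=64,                  # tuned per model size and available VRAM
    workers=8,                 # tuned alongside batch size
)
```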
2.4. Testing and Performance Evaluation
In this study, the model needs to correctly identify the presence of deer in an image and localize them with bounding boxes. In addition, we need to assess the computational efficiency of the models and the feasibility of real-time deployment. There are several standard metrics to evaluate the performance of models.
Intersection over Union (IoU): IoU is a widely adopted metric used to evaluate the correctness of detections produced by an object detection model. The model is trained to predict bounding boxes around objects in an image, and IoU measures how closely these predictions match the ground-truth bounding boxes [
48].
Precision (P): Precision quantifies the proportion of detections that are correct, i.e., true positives relative to all predicted positives. In object detection, a detection is considered a true positive if both the class label and the localization (IoU above the chosen threshold) are correct.
Recall (R): Recall measures the proportion of ground-truth objects correctly detected, i.e., true positives relative to all actual objects. Missed detections or those failing the IoU threshold contribute to false negatives.
F1 Score: The F1-score is the harmonic mean of precision and recall, providing a single balanced measure of detection accuracy at a given confidence threshold.
AP (Average Precision): AP summarizes the tradeoff between precision and recall across confidence thresholds by computing the area under the precision–recall curve. Higher AP values indicate better overall detection performance [
48].
Mean Average Precision (mAP): mAP is the mean of AP across all predicted classes. In this study, it reduces to the AP of the single “Deer” class. Commonly reported values include mAP@0.5 (IoU threshold of 0.5) and mAP@0.5:0.95 (averaged across thresholds from 0.5 to 0.95 with step size 0.05).
Inference Time: Inference time is the average time required for a model/computational graph (e.g., ONNX model) to execute a forward pass on the hardware accelerator (CPU/GPU). It reflects the raw computational capability of the device and is reported as the batch-amortized time per image.
Processing Time (System Overhead): Processing time includes the time required for additional CPU-bound operations to prepare the data and interpret the results. This encompasses pre-processing (disk I/O, resizing, and normalization) and post-processing (decoding raw tensors, coordinate scaling, and confidence thresholding) time. Visualization is excluded from this metric to isolate detection performance. Thus, total time is reported in this study as the sum of inference time and system overhead.
Frames Per Second (FPS): FPS is derived strictly from inference time to benchmark the maximum computational throughput of the edge devices. This helps to isolate the accelerator’s performance from system-level bottlenecks related to processing time (system overhead).
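As a concrete reference, the box-level metrics above can be sketched as follows, assuming axis-aligned boxes in (x1, y1, x2, y2) pixel coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

At an IoU threshold of 0.5, a prediction with the correct class and IoU ≥ 0.5 against a ground-truth box counts as a true positive; the resulting counts of true positives, false positives, and false negatives then yield precision, recall, and F1.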
In this study, we use IoU, precision, recall, F1-score, and AP@0.5 to measure detection accuracy, while inference time and processing time are used to evaluate computational capability [
48]. To ensure consistency in performance reporting, a standardized evaluation procedure was followed for all timing measurements. Each inference session was preceded by a pre-load phase, consisting of 3 iterations over 8 sample images to stabilize hardware clock frequencies and initialize the computational graph within system memory. Following this initialization, inference was executed sequentially across the entire validation dataset (517 images). The reported values represent the arithmetic mean derived from this full-set evaluation. This approach ensures that the performance metrics reflect sustained hardware capability across diverse environmental inputs rather than isolated peak performance.
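The warm-up and averaging procedure can be sketched as follows; `run_inference` is a hypothetical stand-in for the per-image ONNX Runtime session call, not a function from any specific library.

```python
import time

def benchmark(run_inference, images, warmup_images, warmup_iters=3):
    """Return (mean inference time per image, FPS) after a warm-up phase."""
    for _ in range(warmup_iters):       # pre-load phase: stabilize clock
        for img in warmup_images:       # frequencies, initialize the graph
            run_inference(img)
    start = time.perf_counter()
    for img in images:                  # sequential pass over the full
        run_inference(img)              # validation set
    mean_t = (time.perf_counter() - start) / len(images)
    return mean_t, 1.0 / mean_t         # FPS derived strictly from inference time
```

Deriving FPS from the timed loop alone excludes pre- and post-processing, matching the separation of inference time and system overhead described above.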
2.5. Inference on Edge Devices
The YOLO models were trained on high-performance workstations. However, for applications such as deer detection, real-time identification and localization are essential for system effectiveness. One approach is to outsource inference to a cloud-based server, where live video frames are continuously transmitted and processed, with detection results returned to the client [
49]. However, such solutions often face challenges related to bandwidth requirements, latency, and network security. Edge computing provides an alternative by enabling local image or video processing, thereby reducing latency and ensuring rapid response. Performing all computation on the local device also creates an independent detection and tracking system, which is particularly advantageous in remote agricultural settings where reliable cloud connectivity cannot be guaranteed [
50]. A wide range of off-the-shelf edge devices are available for such applications, including single-board computers (SBCs) such as the Raspberry Pi, NVIDIA Jetson platforms, USB accelerators, and mobile devices with embedded CPUs and GPUs [
51]. We evaluate inference performance on two representative devices: a CPU-based Raspberry Pi and a GPU-accelerated NVIDIA Jetson platform (
Figure 3).
2.5.1. Raspberry Pi 5
Raspberry Pi is a series of small single-board computers that offer low-cost, small-sized hardware for computing. They are often credit-card-sized but deliver excellent computing power for their size. In this study, we test the performance of trained models on Raspberry Pi 5, which is the newest model of the Raspberry Pi series (
Figure 3). It features a quad-core 64-bit Arm Cortex-A76 CPU. The Raspberry Pi is widely used in applications such as real-time image or video processing since it supports a dedicated Camera Serial Interface (CSI). Moreover, it can operate on low power, making it suitable for remote deployment as an independent computer capable of acquiring real-time data from the environment, such as the presence of deer in our case. For this study, a device with 16 GB of system memory running Ubuntu 22.04 was used.
2.5.2. Jetson AGX Xavier
NVIDIA Jetson AGX Xavier is a powerful edge device designed explicitly to address the rigorous demands of robotics and edge AI applications. It is centered around a highly integrated System-on-Chip (SoC) that combines a Volta GPU, ARM-based CPUs, and specialized hardware accelerators within a unified memory framework. The GPU incorporates 512 CUDA cores and 64 tensor cores, capable of delivering up to 11 TFLOPS (FP16) or 22 TOPS (INT8). It operates at a maximum frequency of 1.37 GHz with dynamic scaling, enabling fine-grained control of power and performance [
52]. The tensor cores provide significant efficiency for AI workloads, as they are optimized for matrix multiply–accumulate (MMA) operations, which underpin most deep learning algorithms. Moreover, the AGX Xavier integrates dedicated accelerators such as Deep Learning Accelerators (DLA) and Programmable Vision Accelerators (PVA), providing hardware-level support for diverse workloads in autonomous and embedded systems. In addition, this device comes with a compact form factor of 4.2 × 4.2 × 4 inches (
Figure 3 (right)). Together, these features make the Jetson AGX Xavier a versatile platform for real-time robotics and edge AI deployment.
2.5.3. Model Deployment and Optimization
While powerful hardware provides the foundation for high-speed computing, the performance of AI models is equally dependent on their software implementations [
53]. To fully exploit the capabilities of the edge devices, specialized software frameworks and optimizations are required. Ultralytics provides YOLO models with optimizations for accuracy and speed; however, the default PyTorch 2.7.1 [
54] exports are not always the most efficient for deployment on storage- and performance-constrained devices. To address this, several open-source model optimization methods enable faster inference without compromising accuracy. Common export and deployment formats include ONNX, OpenVINO, TorchScript, TensorFlow Lite, NCNN, and TensorRT [
55]. Each format supports different types of AI models while incorporating optimizations to improve inference speed on specific hardware. In many cases, the creators of these model frameworks also provide dedicated runtime platforms to ensure efficient execution across devices.
In this study, we evaluated the YOLO models on both devices by converting them to ONNX format. ONNX provides a framework-agnostic representation of deep learning models as computation graphs, enabling portability across platforms [
56]. Inference was performed using ONNX Runtime, a lightweight, high-performance engine that reduces deployment overhead and supports hardware-specific optimizations via execution providers (e.g., ARM CPUs or CUDA GPUs). By utilizing a unified deployment pipeline, we eliminate software-specific optimization as a confounding variable, ensuring that observed performance discrepancies are primarily attributable to hardware architecture and model design. While specialized compilers, such as NVIDIA TensorRT for Jetson devices or TFLite for the Raspberry Pi, can unlock significant performance gains, they often introduce platform-specific conversion artifacts [
57,
58,
59]. This study prioritizes a reproducible, portable baseline to benchmark the intrinsic capabilities of each device under a common software stack. The specific inference and evaluation settings used across both devices for inference on ONNX models are summarized in
Table 5. Minor platform-specific variations are expected despite identical evaluation settings.
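As an illustration of the shared ONNX Runtime pipeline, a minimal letterbox pre-processing step and session call might look like the sketch below. The input tensor name `"images"` and the gray (114) letterbox fill follow common Ultralytics export conventions but are assumptions that should be verified against the exported graph; the nearest-neighbor resize keeps the sketch dependency-free, whereas a real pipeline would typically use cv2.resize with bilinear interpolation.

```python
import numpy as np

def preprocess(img, size=640):
    """Letterbox an HxWx3 uint8 frame to the square model input."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = round(h * scale), round(w * scale)
    # nearest-neighbor resize via index lookup (illustrative only)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)  # gray letterbox fill
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    # HWC uint8 -> NCHW float32 in [0, 1]
    return canvas.transpose(2, 0, 1)[None].astype(np.float32) / 255.0

# Inference with ONNX Runtime (tensor name assumed from Ultralytics export):
# import onnxruntime as ort
# sess = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
# outputs = sess.run(None, {"images": preprocess(frame)})
```

Swapping `CPUExecutionProvider` for `CUDAExecutionProvider` is how the same model graph targets the Jetson's GPU, keeping the rest of the pipeline identical across devices.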
4. Discussion
4.1. Assessment of the Domain Gap
One of the primary outcomes of this study is the empirical quantification of the domain gap between controlled detection environments and the complex conditions encountered in real-world wildlife monitoring. While contemporary YOLO architectures frequently report near-perfect precision on curated benchmark datasets (e.g., COCO or cleaned Roboflow subsets), our results demonstrate a substantial degradation in performance under field conditions characterized by camouflage, occlusion, and heterogeneous illumination.
Specifically, models that perform at or near the state of the art in benchmark settings declined to approximately 0.74 AP when evaluated on the Cameratrap dataset, corresponding to an approximate 25% reduction in detection performance. This finding underscores the limitations of benchmark-centric evaluation and highlights the importance of domain-specific data in ecological monitoring applications. Our results suggest that architectural advances alone are insufficient to ensure reliable deployment performance and instead emphasize the need for data-centric strategies and evaluation protocols tailored to operational environments. Consequently, this work provides a realistic performance baseline and informs future efforts toward robust, field-ready computer vision systems for agricultural and wildlife monitoring.
4.2. Choices of CPU or GPU Edge Devices
The choice between CPU- and GPU-based edge devices depends largely on the requirements of the target application. CPU platforms, such as the Raspberry Pi, are low-cost, energy-efficient, and suitable for lightweight tasks where occasional or event-triggered detections are sufficient, for example, confirming the presence of a stationary animal or supporting long-term monitoring in power-constrained environments. By contrast, GPU-accelerated platforms, such as the NVIDIA Jetson AGX Xavier, are better suited for real-time applications that demand high-frame-rate video analysis, continuous tracking, and rapid decision-making. The Jetson provides the computational capacity to run larger models at interactive speeds, making it ideal for dynamic scenarios where animals are moving, and system responsiveness is critical. In practice, the Raspberry Pi offers accessibility and low-power operation, while the Jetson delivers the performance necessary for advanced, real-time wildlife detection and deterrence. The decision between the two thus reflects a trade-off between efficiency, cost, and the level of intelligence required at the edge.
4.3. Robustness to Other Species
In this work, we curated the Cameratraps dataset, which covers a diverse and challenging range of environmental conditions (see
Figure 1). However, the ability of the models to generalize to other deer species, or to avoid misclassifying non-deer animals as deer, has not yet been evaluated. For deployment in a real-world automatic deer deterrence system, robustness against such incorrect classifications is essential. Out-of-distribution (OOD) detection methods could be applied to mitigate these risks [
60]. Achieving this will require additional studies, including the development of a multi-species dataset to rigorously test cross-species generalization and misclassification robustness.
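To make the idea of post-hoc rejection concrete, the sketch below shows one way such a gate might sit between a detector and a deterrence trigger. This is a hypothetical illustration, not part of the evaluated system: the threshold values and the `ood_score` field (a placeholder for whatever score an OOD method such as those in [60] would produce) are assumptions.

```python
# Minimal sketch of a post-hoc rejection gate for a deer detector.
# Detections below a calibrated confidence threshold, or flagged by an
# auxiliary OOD score, are suppressed rather than passed to the deterrent.
# Both thresholds are hypothetical operating points, not values from this study.

CONF_THRESHOLD = 0.60   # hypothetical minimum detection confidence
OOD_THRESHOLD = 0.50    # hypothetical OOD cutoff (higher = more in-distribution)

def filter_detections(detections):
    """Keep only detections that are both confident and in-distribution.

    Each detection is a dict with keys: 'label', 'conf', 'ood_score'.
    """
    kept = []
    for det in detections:
        if det["conf"] < CONF_THRESHOLD:
            continue  # too uncertain: likely background or an unfamiliar animal
        if det["ood_score"] < OOD_THRESHOLD:
            continue  # input looks unlike the training distribution
        kept.append(det)
    return kept

detections = [
    {"label": "deer", "conf": 0.91, "ood_score": 0.85},  # confident, in-distribution
    {"label": "deer", "conf": 0.55, "ood_score": 0.80},  # fails the confidence gate
    {"label": "deer", "conf": 0.88, "ood_score": 0.30},  # flagged as OOD (e.g., elk)
]
print([d["conf"] for d in filter_detections(detections)])  # [0.91]
```

In a deployed deterrence system, only detections passing both gates would trigger an actuation, trading a small loss in recall for robustness against false activations on non-target species.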
4.4. Dataset Limitation and Scope
While the Cameratraps dataset presents a critical benchmark for deer detection in challenging environmental conditions, the scope and generalization of the presented findings are subject to some constraints. First, the dataset is characterized by the specific vegetation and terrain of the intermountain West (Idaho), making it most suitable as a testbed for regional deployments. Second, the modest scale of 3095 images does not allow large models to be trained entirely from scratch without risking overfitting. However, this scale effectively simulates the data-constrained scenarios typical of wildlife monitoring and demonstrates the effectiveness of transfer learning from pre-trained YOLO weights. Finally, because the current annotations aggregate all detections into a single “Deer” class, the taxonomic scope is limited. This unified class, however, establishes a robust baseline for target-species detection and cleanly isolates detection capability. It will be fruitful to extend this dataset in the future to explicitly characterize interactions with non-target species, such as elk or livestock, which are often encountered in the same geography.
4.5. Real-World Deployment
Our evaluation on edge devices showed strong detection accuracy across platforms, but real-world deployment also requires speed, efficiency, and reliability. The NVIDIA Jetson AGX Xavier offered a favorable balance of performance and accuracy, supporting real-time operation. In contrast, smaller low-power devices such as the Raspberry Pi 5 performed poorly with larger models, whose high compute demands (GFLOPs) quickly overwhelmed the CPU. The study further indicates that simply converting PyTorch models to ONNX and running them with ONNX Runtime is insufficient for achieving real-time performance on the Raspberry Pi. However, specialized optimization frameworks and model compression techniques can significantly shift these boundaries. For instance, TensorRT on Jetson devices applies hardware-specific optimizations that push throughput toward the hardware’s upper bound. Similarly, on the Raspberry Pi 5, lightweight frameworks such as NCNN or TFLite, particularly when combined with INT8 quantization, have been shown to improve frame rates [
57]. These measures are essential to minimize resource usage while maintaining acceptable detection performance in practical field settings. The benchmarking in this study serves as a critical reference point for researchers to determine whether a model requires such intensive optimization for a specific deployment setting.
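To illustrate what INT8 quantization does in principle, the following is a minimal sketch of symmetric post-training weight quantization, the kind of compression that frameworks such as TFLite apply to shrink models for CPU-only devices like the Raspberry Pi 5. Real toolchains also calibrate activation ranges and use per-channel scales; this simplified sketch covers a single weight vector with one scale factor.

```python
# Minimal sketch of symmetric INT8 post-training quantization.
# Floats are mapped to int8 with a single scale factor chosen so that the
# largest-magnitude weight maps near the int8 limit (127).

def quantize_int8(weights):
    """Map float weights to int8 with a single symmetric scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [qi * scale for qi in q]

weights = [0.52, -1.27, 0.003, 0.98]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

# Round-trip error stays within half a quantization step (scale / 2),
# which is the accuracy cost traded for 4x smaller weights than float32.
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
assert max_err <= scale / 2 + 1e-9
```

The 4x reduction in weight size (int8 vs. float32), together with faster integer arithmetic on ARM CPUs, is the mechanism behind the frame-rate gains reported for quantized models in [57].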
Furthermore, a limitation of the current performance evaluation is its reliance on steady-state average metrics recorded in a controlled environment. While the results provide a consistent baseline for performance comparison, they may not account for the performance variance caused by thermal throttling or system interrupts encountered in unpredictable field conditions. Future extensions should involve long-term operational stability tests that report latency variance (e.g., standard deviation and worst case) over extended deployment periods to better characterize the deployment reliability of these models. Subsequent studies should also incorporate physical power metering to profile the energy consumption per inference across edge devices, a critical constraint for battery-powered remote agricultural deployments.
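The kind of long-run profiling proposed above can be sketched as follows: instead of reporting a single steady-state average, per-inference latency is recorded over an extended run so that mean, standard deviation, and worst case are all visible, making throttling-induced variance measurable. The `run_inference` function here is a stand-in for the real detector call, which is an assumption of this sketch.

```python
# Minimal sketch of long-run latency profiling on an edge device.
# Recording every per-inference latency (rather than one average) exposes
# tail behavior caused by thermal throttling or system interrupts.

import statistics
import time

def run_inference():
    # Placeholder workload; replace with the actual model invocation.
    sum(i * i for i in range(10_000))

def profile(n_runs=200):
    latencies_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_inference()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "std_ms": statistics.stdev(latencies_ms),
        "worst_ms": max(latencies_ms),
    }

stats = profile()
print(f"{stats['mean_ms']:.2f} ms ± {stats['std_ms']:.2f} ms "
      f"(worst {stats['worst_ms']:.2f} ms)")
```

Reporting the mean together with its spread and worst case, rather than the mean alone, is what allows a deployment decision (e.g., "sustains 25 FPS even under throttling") to be made with confidence.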
5. Conclusions
In this paper, we introduced an open-source dataset of 3095 annotated deer images with bounding-box labels, capturing diverse environmental conditions and lighting scenarios. We benchmarked 12 model variants from four recent YOLO architectures (v8, v9, v10, and v11) and evaluated their viability on two representative edge devices: the CPU-based Raspberry Pi 5 and the GPU-powered NVIDIA Jetson AGX Xavier. Our results quantify a critical domain gap, demonstrating that while modern architectures are theoretically capable of high precision, their real-world feasibility varies significantly across hardware. The larger m-series models achieved the highest accuracy on high-end hardware, with AP@0.5 scores exceeding 0.94. However, when run through a standardized, framework-agnostic ONNX Runtime, their computational demands made them unsuitable for real-time deployment on CPU-only devices such as the Raspberry Pi 5, where throughput dropped below 1 FPS. In contrast, the Jetson AGX Xavier provided an optimal balance, sustaining real-time processing speeds above 25 FPS while maintaining high detection accuracy (AP@0.5 > 0.85). These findings demonstrate that GPU-accelerated hardware is a prerequisite for real-time wildlife tracking at the edge when relying on universal deployment stacks.
Overall, this study provides clear, actionable guidance for the design of effective autonomous deer detection systems that can be deployed on edge devices. Since fast and accurate deer detection is fundamental to advanced monitoring, tracking, and deterrence applications, this study plays a foundational role in the development of such systems. Future work will focus on hardware-specific optimization techniques and lightweight frameworks, such as TensorFlow Lite, to further improve performance on constrained devices. Additionally, we plan to expand the dataset by collecting and annotating images that capture a wider range of challenging conditions, including adverse weather (e.g., snow, heavy rain, fog), diverse agricultural landscapes (e.g., cornfields, soybean fields), and a greater variety of deer species and behaviors.