Article

A Lightweight YOLO-Based Architecture for Apple Detection on Embedded Systems

by Juan Carlos Olguín-Rojas 1, Juan Irving Vasquez 2,*, Gilberto de Jesús López-Canteñs 1, Juan Carlos Herrera-Lozada 2 and Canek Mota-Delfin 1

1 Departamento de Ingeniería Mecánica Agrícola, Universidad Autónoma Chapingo, Carretera México-Texcoco km 38.5, Texcoco de Mora 56230, Mexico
2 Centro de Innovación y Desarrollo Tecnológico en Cómputo (CIDETEC), Instituto Politécnico Nacional (IPN), Juan de Dios Bátiz s/n esq. Miguel Othón de Mendizábal, Col. Nueva Industrial Vallejo, Del. Gustavo A. Madero, Mexico City 07738, Mexico
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(8), 838; https://doi.org/10.3390/agriculture15080838
Submission received: 22 February 2025 / Revised: 1 April 2025 / Accepted: 7 April 2025 / Published: 13 April 2025
(This article belongs to the Section Digital Agriculture)

Abstract:
Apples are among the most important agricultural products worldwide. Ensuring apple quality with minimal effort is crucial for both large-scale and local producers. In Mexico, the manual detection of damaged apples has led to inconsistencies in product quality, a problem that can be addressed by integrating vision systems with machine learning algorithms. The YOLO (You Only Look Once) neural network has significantly improved fruit detection through image processing and has automated several related tasks. However, training and deploying YOLO models typically require substantial computational resources, making it essential to develop lightweight and cost-effective detection systems, especially for edge computing. This paper presents a mechatronic system designed to detect apple varieties and potential damage in apples (Malus domestica) within the visible spectrum. The cultivated apple varieties considered were Gala, Golden, Granny Smith, and Red Delicious. Our contribution lies in developing a lightweight neural network architecture optimized specifically for embedded systems. The proposed architecture was compared against YOLOv3-Tiny, YOLOv4-Tiny, and YOLOv5s. Our optimized model achieved high accuracy and sensitivity (94–99%) and was successfully implemented on a Jetson Xavier NX board, where it reached a processing speed of 37 FPS.

1. Introduction

Apples (Malus domestica) are among the most important horticultural crops worldwide, with an annual production of approximately 716.93 million tons. While China and the United States lead global production, Mexico ranks thirteenth in apple cultivation [1].
Apple quality assessment is performed throughout the supply chain to classify the product and determine its commercial value. Therefore, accurately detecting color defects, physical damage, and diseases is essential to minimize losses and meet market standards. Additionally, during the storage of products, physiological disorders, such as bitter pit, internal browning, sunburn, and superficial scald, can further reduce the fruit’s market value [1,2].
In Mexico, the manual inspection of apple quality presents limitations that can be addressed through automation [2]. In recent years, computer vision has advanced rapidly, and artificial intelligence (AI) has become a powerful tool in this field [3]. AI algorithms can analyze images of agricultural products to assess attributes such as the size, shape, color, ripeness, quality, and defects [4,5]. However, many current solutions require considerable computational resources, often relying on specialized processing units (GPUs) to handle the large number of operations demanded by AI’s deep neural networks. These networks consist of numerous layers and millions of interconnected neurons, making them highly powerful in terms of processing and learning capacity [6]. Nonetheless, training such deep networks requires substantial amounts of data and considerable computational resources.
This high cost can hinder the adoption of vision systems in rural or medium-scale agricultural settings, where budgets are often more limited and infrastructure is restricted [7]. For instance, real-time fruit inspection in orchards or packing lines demands fast data processing and immediate feedback; however, many rural facilities lack large GPU clusters or a stable power supply. Consequently, there is a need for efficient and lightweight solutions that maintain a high accuracy without requiring expensive hardware or energy resources.
The objective of this study was to develop a solution for damage detection in four apple cultivars (Gala, Golden, Granny Smith, and Red Delicious) that addresses the limitations of low-efficiency in situ vision systems while also meeting strict real-time constraints. In this sense, our main contribution lay in modifying the YOLO architecture [8] by reducing the number of layers and filters, as well as removing layers from the detection heads. In a controlled environment, such as a conveyor belt, factors like the apple’s aspect ratio, lighting conditions, and fruit occlusion remain relatively stable, allowing the network to be simplified and enabling real-time inference.
Our research methodology was based on designing an ablation study that involved several baseline architectures in order to achieve a good trade-off between computational reduction and performance. To validate the effectiveness and efficiency of the resulting lightweight architecture, we conducted a two-step evaluation. First, we compared its performance against the YOLOv3-Tiny, YOLOv4-Tiny, and YOLOv5-s architectures. Our lightweight architecture achieved high accuracy and sensitivity values that ranged from 94% to 99%. Second, we implemented the architecture on a Jetson Xavier NX board, which achieved a processing speed of 37 FPS for apple detection and the classification of potential damage.
This document is organized as follows: the Related Work section provides an overview of techniques and methods reported in the literature for apple damage detection. Section 2.1 describes the design and construction of a conveyor belt system used for collecting apple images. Section 2.2 details the RGB image dataset utilized in this work. Section 2.3 outlines the proposed solution for apple damage detection, while Section 2.6 elaborates on the experimental design, including the metrics and hardware requirements for implementation. Section 3 presents the experimental results, Section 4 discusses these findings, and Section 5 concludes the paper.

Related Work

Numerous scientific studies have explored fruit quality assessment, many of which relied on destructive techniques. In contrast, artificial vision has proven effective at classifying damage in various fruits. For instance, Zhang et al. [9] developed a self-propelled apple harvester equipped with an in-field sorting machine that utilizes computer vision to make decisions based on color, size, shape, and defects, and it demonstrated a high inspection speed at low cost. Similarly, Lu and Lu [10] enhanced defect detection by incorporating fruit surface characteristics, including the presence of stems and calyxes.
The methods commonly used for classifying apple varieties and detecting surface defects are based on digital image processing techniques involving background removal, defect segmentation, and stem and calyx identification [11]. Sun et al. [12] emphasized that an automatic apple detection system should consider factors such as the color, weight, size, and defects, highlighting the importance of surface shine as a freshness indicator. Li et al. [13] introduced a novel methodology that integrates compressed sensing (CS) with lidar technology to acquire single-pixel three-dimensional imaging, which is also regarded as a noninvasive detection approach.
Sofu et al. [14] identified spotting and decay as primary factors that affect apple classification quality. Sachin et al. [15] utilized the YOLO algorithm for vegetable detection, preprocessing images by manually drawing bounding boxes using OpenCV. Their method demonstrated a fast and efficient way to identify objects in images and videos, and achieved an accuracy of 61.6%.
Tian et al. [16] focused on recognizing anthracnose lesions on apple surfaces using YOLOv3-Dense, a deep learning-based strategy that outperformed Faster R-CNN with VGG16-NET. Their proposed model attained a mean average precision at a 50% IoU threshold (mAP50) of 0.917, compared with 0.903 for YOLOv3 and 0.863 for Faster R-CNN with VGG16-NET. Siddiqi [17] worked on automated defect detection in apples using SSD (single-shot detector) and YOLOv2 models after creating a dataset of 244 defective apple images. Their results show that the SSD-based system (mAP50 of 0.878) outperformed the YOLOv2-based system (mAP50 of 0.725).
Chen et al. [18] proposed a fruit quality identification system using AI and the YOLOv3 algorithm to detect round fruits, such as apples, oranges, and lemons. Their system featured a graphical user interface for data collection and model evaluation, where it achieved an accuracy of up to 88% with 6000 fruit images. Similarly, Huang et al. [19] introduced an improved YOLOv3 model for detecting immature apples in orchard environments, where it achieved a mean average precision (mAP) of 0.675 and a detection speed of up to 83 FPS on a 1080Ti GPU.
Hu et al. [20] developed a system for detecting and classifying apples in the field using multi-feature fusion that incorporates size, color, shape, and surface defects into a classification model based on a Support Vector Machine (SVM). Their system achieved an average classification accuracy of 95.49% and demonstrated high effectiveness in field conditions. Nguyen et al. [21] explored the early detection of mild bruises in apples using near-infrared (NIR) imaging, where they concluded that this approach holds promise for detecting damage in apples and other fruits with soft and thin skin.
Numerous adaptations of YOLO architectures for object detection have been documented in the literature, with promising results.
Tian et al. [22] presented an enhanced YOLO-V3 architecture meticulously tailored for the real-time detection of apples within orchard ecosystems. The investigators procured image datasets utilizing a camera with a resolution of 3000 × 4000 pixels, which captured images under a diverse array of lighting and meteorological conditions. To enrich the dataset, methodologies for data augmentation were employed, thereby enhancing its variability. In order to optimize the low-resolution feature layers intrinsic to YOLO-V3, the authors incorporated the DenseNet methodology with the objective of promoting feature propagation and augmenting the overall efficacy of the network. The proposed YOLOv3-Dense model exhibited a significant enhancement in performance compared with the original YOLO-V3 network, particularly regarding the detection accuracy and real-time operational capabilities.
Sharpe and colleagues developed a precision applicator aimed at achieving the effective control of goosegrass within Florida’s plant plasticulture production systems [23]. The investigation assessed the efficacy of the YOLOv3-Tiny detector for on-site identification and subsequent spraying. The image processing encompassed a diverse array of plant species, including strawberries and tomatoes, along with various weed species, to facilitate the training and evaluation of the neural network.
Junos et al. [24] introduced YOLO-P, an advanced object detection framework that facilitates the recognition and localization of objects (such as the FFB, grabber, and palm tree) within oil palm plantations. Following a comprehensive series of experiments, the proposed framework demonstrated an average accuracy of 98.68% and an F1 score of 0.97. Extensive experimental outcomes indicate that YOLO-P can accurately and reliably detect objects within palm oil plantations, thereby contributing to enhanced productivity and optimized operational expenditures within the agricultural domain.
Sekharamantry et al. [25] introduced a YOLO-based apple detection approach that integrates an attention mechanism and an optimized loss function. Meanwhile, Liu et al. [26] presented an enhanced YOLOv5s-based method capable of detecting Fuji apple ripeness and diameter in real time. By incorporating feature fusion modules and a dual attention mechanism, they achieved a 98.7% accuracy rate in ripeness detection. Additionally, the model demonstrated high computational efficiency, with inference speeds of 155 FPS for ripeness determination and 56 FPS for diameter measurement, making it highly suitable for automated apple harvesting. Zuo et al. [27] proposed LBDC-YOLO, a YOLOv8-based model focused on precise and efficient broccoli head detection in complex agricultural settings. By adopting a triple attention mechanism, the model significantly reduces the parameter count and computational load, achieving 94.44% accuracy. Compared with previous models, such as YOLOv5n and YOLOv7-tiny, LBDC-YOLO delivers enhanced computational efficiency without sacrificing accuracy. Han et al. [28] presented Rep-ViG-Apple, a hybrid CNN-GCN framework designed to improve apple detection in challenging environments characterized by branch occlusion and varying lighting conditions. The reparameterized feature extraction blocks, combined with a graph-based attention mechanism, enhance object detection across multiple scenarios. The model achieved an average accuracy of 93.3%, surpassing earlier designs, and reduced its size by 22%, making it more practical for devices with limited computational resources.
Meng et al. [5] presented a spatiotemporal convolutional neural network architecture that capitalizes on the Shifted Window Transformer Fusion Region Convolutional Neural Network framework for the purpose of pineapple detection. The investigation encompassed a comparative evaluation of the findings in relation to those derived from traditional models.
Li et al. [7] introduced an optimized, lightweight YOLOv5s architecture tailored for the detection of pitayas within both diurnal and nocturnal supplemental lighting conditions, and successfully deployed it on an Android platform. This architecture initially employed the shufflenetv2 module to restructure the YOLOv5s backbone network. Subsequently, this study advanced a Concentrated Comprehensive Convolution Receptive Field Enhancement (C3RFE) module aimed at augmenting the accuracy of pitaya detection. Furthermore, a Bidirectional Feature Pyramid Network (BiFPN) feature fusion technique was utilized to bolster multi-scale feature integration.
Finally, Chen et al. [29] achieved significant advancements in citrus detection through the development of the CitrusYolo algorithm. The enhancements made to the YoloV4 model encompassed the integration of a 152 × 152 feature detection layer, the establishment of dense connections for multi-scale fusion, and the incorporation of deeply separable convolution and attention mechanism modules. These enhancements culminated in an improved detection accuracy and real-time operational efficiency. CitrusYolo demonstrated outstanding performance, where it surpassed traditional deep learning algorithms in terms of both accuracy and temporal efficiency. Notwithstanding these advancements, the research recognized certain limitations and potential pathways for enhancement. The algorithm’s effectiveness under diverse lighting conditions and its applicability to other fruits or objects in varying environments remain yet to be determined. While experimental results corroborated its effectiveness, real-world applications, such as crop yield estimation and robotic fruit harvesting, necessitate further validation.

2. Materials and Methods

2.1. Data Collection

For the classification, a conveyor belt system equipped with artificial vision was designed and built in our previous work [30]. The system consists of an aluminum frame supporting a food-grade PVC conveyor belt, integrated with an electronic control system and designed to transport apples simultaneously for mass analysis. The design and construction are illustrated in Figure 1. The conveyor belt includes a stainless steel compartment isolated from external lighting conditions; inside, controlled illumination is provided by LED strips emitting 300 lumens per meter. The capture area measures 200 cm², with the camera and LED lighting positioned at a height of 30 cm each. For the image capture, a Logitech C920 (Logitech Inc., Newark, CA, USA) camera with a maximum resolution of 1080p at 30 FPS was installed. However, for this study, images were stored at 800 × 600 pixels to optimize the balance between image quality and resource usage.

2.2. Dataset

The dataset collected for this study consisted of 4800 RGB images, each with a resolution of 800 × 600 pixels, that corresponded to four apple varieties. The images were manually labeled using LabelImg (version 1.8.6; https://github.com/tzutalin/labelImg, accessed on 15 January 2025) based on the condition of the fruit to distinguish between healthy and damaged apples.
To ensure reproducibility, specific criteria were defined to classify an apple as “damaged”. Only fruit that exhibited visible signs of mechanical injuries were labeled as such, including bruises, punctures, scratches, and cuts. The classification was based on direct visual inspection, taking into account features such as indentations, skin ruptures, abnormal discoloration, and changes in the surface texture.
Consequently, the 4800 images were organized into eight classes with a uniform distribution of 600 images per class: Gala healthy (see Figure 2A), Gala damaged (Figure 2E), Golden Delicious healthy (Figure 2B), Golden Delicious damaged (Figure 2F), Granny Smith healthy (Figure 2C), Granny Smith damaged (Figure 2G), Red Delicious healthy (Figure 2D), and Red Delicious damaged (Figure 2H). For the experiments, 70% of the images were allocated for training, 15% for validation, and 15% for testing.
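The paper does not list the data-organization scripts; the following minimal Python sketch only illustrates how such a stratified 70/15/15 per-class split could be produced. The directory layout, class folder names, and file extensions are assumptions for illustration and are not the authors' setup.

```python
# Hypothetical sketch of a per-class 70/15/15 split for a YOLO-style dataset.
# Folder layout, class names, and file extensions are assumptions, not the authors' setup.
import random
import shutil
from pathlib import Path

CLASSES = ["gala_healthy", "gala_damaged", "golden_healthy", "golden_damaged",
           "granny_healthy", "granny_damaged", "red_healthy", "red_damaged"]
SRC = Path("dataset_raw")    # dataset_raw/<class>/*.jpg with matching *.txt label files
DST = Path("dataset_split")  # dataset_split/{train,val,test}/<class>/

random.seed(42)
for cls in CLASSES:
    images = sorted((SRC / cls).glob("*.jpg"))
    random.shuffle(images)
    n = len(images)
    splits = {"train": images[: int(0.70 * n)],
              "val": images[int(0.70 * n): int(0.85 * n)],
              "test": images[int(0.85 * n):]}
    for split, files in splits.items():
        out = DST / split / cls
        out.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy(img, out / img.name)     # copy the image
            label = img.with_suffix(".txt")      # YOLO label file with the same stem
            if label.exists():
                shutil.copy(label, out / label.name)
```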

2.3. General Workflow

Damage detection in apples during the packing and marketing process remains a labor-intensive task. Automating this process is critical to improving the efficiency and accuracy. However, achieving the high-precision and real-time detection of fruit damage is challenging, as it requires distinguishing subtle changes in texture and coloration, which are key indicators of potential damage. These changes, such as bruises, scrapes, and wounds, are directly linked to the internal quality of the fruit.
The proposed lightweight architecture addresses this challenge by leveraging the YOLO architecture [8], a single-stage object detection system capable of detecting multiple objects in a single image while providing precise localization information. YOLO-based architectures are particularly suited for this task due to their ability to handle images of varying sizes and scales, making them ideal for detecting and classifying apple varieties (Gala, Golden, Granny Smith, and Red Delicious) and their potential damage.
The workflow of the proposed system begins with image acquisition, where apples are captured under controlled lighting conditions to ensure consistent quality. These images are then processed by the YOLO-based architecture, which performs two key tasks: (1) detecting the presence of apples and (2) classifying them as healthy or damaged based on texture and coloration features. The architecture has been optimized for deployment on low-cost embedded systems, such as the Jetson Xavier NX board (NVIDIA Corporation, Santa Clara, CA, USA), ensuring real-time processing capabilities with a small computational overhead.
The subsequent illustration (Figure 3) provides a comprehensive depiction of the operational framework of the mechatronic system designed for the identification of damage in apples. This system possesses the capacity to accommodate unsorted apples, capture images of the fruit, and transmit these images to the integrated system, which encompasses a lightweight architecture based on YOLOv5 that is proficient in detecting damage to apples. The outcomes are relayed to a user interface and actuators that are capable of redirecting apples that do not conform to established quality standards away from the designated pathway.

2.4. Apple Detection with Neural Networks

A critical step in object detection is identifying the key components of the problem [31]. For apple damage detection, selecting an appropriate architecture is essential. The architecture must balance the detection accuracy, model size, and inference speed to achieve real-time performance. Implementing this on a conveyor belt requires GPU capabilities, as the number of parameters typically reaches 10⁶, necessitating an optimized algorithm for efficient processing.
YOLO [8] is a state-of-the-art object detection framework that divides the input image into a grid of cells. Each cell predicts bounding boxes, confidence scores, and class probabilities for objects within its region. Unlike traditional two-stage detectors, YOLO performs detection in a single forward pass, enabling real-time inference. This approach reduces the computational complexity while maintaining a high accuracy, making YOLO suitable for real-time applications, like apple damage detection.
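As a point of reference, the single-pass behavior described above can be reproduced with the publicly available Ultralytics YOLOv5 hub interface. This is only an illustrative sketch, assuming the public yolov5s weights and a sample image named apple_sample.jpg are available; it is not the authors' detection pipeline.

```python
# Minimal single-forward-pass inference sketch with YOLOv5 via torch.hub (illustrative only).
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)  # downloads code + weights
model.conf = 0.25                      # confidence threshold for reported detections
results = model("apple_sample.jpg")    # one forward pass returns boxes, scores, and classes
print(results.pandas().xyxy[0])        # detections as a DataFrame (xmin, ymin, xmax, ymax, conf, class)
results.save()                         # saves an annotated copy of the image
```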
This study began with YOLOv5s, a compact version of the YOLO architecture, and proposed a lightweight adaptation tailored to the task at hand. Through experimentation, hyperparameters were optimized using a grid search and transfer learning. The goal was to determine whether the proposed architecture outperformed other YOLO variants in terms of the detection speed and effectiveness. The experimental process included hyperparameter optimization, model training, and performance evaluation.

2.4.1. Baseline Neural Network YOLOv3-Tiny

The first approach involved implementing the YOLOv3-Tiny architecture, a compact variant of the original YOLOv3 model. It is specifically designed to be faster and more efficient, enabling real-time object detection in images on devices with limited computational resources. The YOLOv3-Tiny architecture was built on a convolutional neural network (CNN) structure consisting of multiple convolutional and pooling layers, followed by fully connected layers. In contrast to the full YOLOv3 architecture, which comprises 106 layers, YOLOv3-Tiny is streamlined to only 23 layers. To address the saturation problem commonly associated with the ReLU activation function, this architecture employs the LeakyReLU activation function in its convolutional layers.

2.4.2. Baseline Neural Network YOLOv4-Tiny

The second approach was the YOLOv4-Tiny architecture, which uses a convolutional neural network (CNN) composed of 17 layers, including an input layer and an output layer. Another feature of this architecture is the use of the Mish activation function, which replaces the ReLU function. It was demonstrated in the literature that Mish is more effective in terms of performance and accuracy for object detection tasks [31]. The YOLOv4-Tiny architecture employs a learning optimization method called “SPP-Block”, which is a reduced version of the “Spatial Pyramid Pooling” method used in the full YOLOv4 architecture. This method reduces the number of computations required and, consequently, decreases the processing time needed for object detection.
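For reference, the two activation functions mentioned in Sections 2.4.1 and 2.4.2 have the following standard definitions, where α is the small negative-slope coefficient of LeakyReLU:

```latex
\mathrm{LeakyReLU}(x) =
\begin{cases}
x, & x \ge 0 \\
\alpha x, & x < 0
\end{cases}
\qquad
\mathrm{Mish}(x) = x \tanh\!\big(\ln(1 + e^{x})\big)
```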

2.4.3. Baseline Neural Network YOLOv5s

In 2020, Glenn Jocher introduced YOLOv5, managed by Ultralytics (Ultralytics LLC, Frederick, MD, USA). YOLOv5 uses PyTorch (https://pytorch.org/, accessed on 15 January 2025) as its development framework instead of Darknet, incorporating a variety of enhancements aimed at improving its performance in object detection tasks. Central to YOLOv5 is the CSP (Cross-Stage Partial) Network, which was derived from the ResNet architecture and features a cross-stage partial connection to bolster network efficiency. The CSPNet is further augmented by several SPP (Spatial Pyramid Pooling) blocks that facilitate feature extraction across multiple scales.
The architecture’s neck incorporates a PAN (Path Aggregation Network) module, along with subsequent upsampling layers to enhance the resolution of the feature maps. The head of YOLOv5 employs convolutional layers to generate predictions pertaining to bounding boxes and class labels. YOLOv5 utilizes anchor-based predictions, linking each bounding box to pre-established anchor boxes characterized by specific shapes and dimensions. The loss function utilized in YOLOv5 integrates Binary Cross-Entropy with Complete Intersection over Union (CIoU) for the calculation of class, objectness, and localization losses.
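The CIoU term mentioned above is not restated in the paper; for completeness, its commonly used formulation is given below, where b and b^gt are the centers of the predicted and ground-truth boxes, ρ is the Euclidean distance between them, c is the diagonal length of the smallest box enclosing both, and w and h denote box width and height:

```latex
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v,
\qquad
v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
```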
YOLOv5s represents one of the most lightweight and rapid variants of the YOLOv5 model, where it is specifically optimized for speed and operational efficiency (Figure 4). It employs a reduced number of parameters and necessitates less computational power when juxtaposed with its counterparts, such as YOLOv5m, YOLOv5l, and YOLOv5x. Owing to its compact dimensions, YOLOv5s exhibits remarkably low latency, rendering it particularly suitable for deployment on resource-constrained devices; however, its accuracy is inferior to that of its larger variants due to its utilization of fewer layers and parameters within its neural architecture.

2.5. Evaluation Metrics

The metrics used to compare each treatment were the precision, recall, F1-score, accuracy, and mean average precision (mAP). The precision evaluates the closeness between a prediction result and the true value. It is calculated as the ratio of the correct predictions to the total number of positive predictions. The recall (also known as the true positive rate) is the proportion of positive cases correctly identified by the model. The F1-score combines the precision and recall to facilitate a performance comparison, especially when the class distribution is imbalanced. The accuracy refers to the proportion of correct predictions relative to the total number of cases. Finally, another relevant metric in object detection tasks is the mAP. To calculate it, the average precision (AP) is first obtained, which consists of the maximum precision achieved by the model at various recall points ranging from 0% to 100%. If there are multiple classes, the AP is computed for each class, and the mAP is obtained as their mean.
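For clarity, these metrics correspond to the standard definitions, where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, p(r) is the precision as a function of recall, and N is the number of classes:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_{1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},
```

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{AP} = \int_{0}^{1} p(r)\, dr, \qquad
\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_{i}.
```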

2.6. Design of the Experiment

Our design of experiments aimed to develop a custom lightweight architecture for efficient apple detection, enabling its implementation in an embedded system. To achieve this, the experiment was structured as a sequential process comprising three key studies (training stages): (i) an initial search, (ii) a comparison of baseline architectures, and (iii) an ablation study. For a visual representation of the experimental design, refer to Figure 5.

2.6.1. Initial Hyperparameter Search

In the first step, a short training phase of only 30 epochs was conducted for each baseline architecture to identify promising hyperparameters and reduce the search space for subsequent experiments. The choice of 30 epochs was based on empirical evidence showing that this duration is adequate to observe meaningful trends in model performance without excessive computational cost. During this experiment, each hyperparameter listed in Table 1 was adjusted within a predefined range, and the configurations that demonstrated the best performance, as measured by the validation accuracy and loss, were selected. The resulting hyperparameters are detailed in Table 1.
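The paper does not list the exact training commands; as an illustration only, a 30-epoch screening run with the fixed settings of Table 1 could be launched through the Ultralytics YOLOv5 repository roughly as follows. The dataset file apples.yaml is a placeholder, and the flag set is an assumption based on the public train.py interface.

```python
# Hypothetical 30-epoch screening run (assumes a local clone of the Ultralytics YOLOv5 repo).
import subprocess

subprocess.run([
    "python", "train.py",
    "--img", "416",            # input size from Table 1
    "--batch", "64",           # batch size from Table 1
    "--epochs", "30",          # short screening phase
    "--optimizer", "SGD",      # optimizer from Table 1
    "--data", "apples.yaml",   # placeholder dataset definition (8 classes)
    "--weights", "yolov5s.pt", # pretrained weights for transfer learning
], check=True)
```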

2.6.2. Baseline Architectures Comparison

Once the factors of Table 1 were fixed, a random search [32] was defined to train each baseline architecture in order to find the one that best fit the task. The hyperparameters that were optimized with random search were the learning rate, momentum, and weight decay. Table 2 describes the resulting combinations of hyperparameters.
Each configuration was combined with two training approaches: transfer learning (T.L.) using pretrained weights from general datasets, such as COCO, and training from scratch (T.S.), which initialized the weights randomly. The full combination gave a total of 12 experimental units (named EU1 to EU12) (Table 3). The results of the experiment are discussed in Section 3.
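A random search of this kind can be sketched as follows; the sampling ranges span the values in Table 2, and train_and_validate is a placeholder for a full training run that returns the validation mAP50. This is an illustrative outline, not the authors' script.

```python
# Illustrative random search over learning rate, momentum, and weight decay.
import random

def sample_configuration():
    """Draw one hyperparameter configuration from ranges spanning Table 2."""
    return {
        "lr0": 10 ** random.uniform(-6, -2),                # 1e-6 ... 1e-2
        "momentum": random.uniform(0.5, 0.95),              # 0.5 ... 0.95
        "weight_decay": 10 ** random.uniform(-4.3, -1.3),   # ~5e-5 ... 5e-2
    }

def random_search(n_trials, train_and_validate):
    """Keep the configuration with the best validation mAP50."""
    best_cfg, best_map = None, -1.0
    for _ in range(n_trials):
        cfg = sample_configuration()
        score = train_and_validate(cfg)   # placeholder: trains a model, returns mAP50
        if score > best_map:
            best_cfg, best_map = cfg, score
    return best_cfg, best_map
```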

2.6.3. Ablation Study of the Best Baseline Architecture

The evaluation of the 12 described experimental units allowed for the selection of the best-performing architecture and hyperparameters in terms of the accuracy, sensitivity, and inference speed. The best neural network was then modified in its architecture to reduce the processing time while maintaining its performance. This procedure is commonly referred to as an ablation study. The modifications are summarized in Table 4.
Figure 6 presents the modifications introduced to the YOLOv5s architecture, which allowed it to operate with only two detection scales. This change reduced the total number of network parameters, which corresponded to the M13 treatment listed in Table 4.
Figure 7 shows the simplification applied to the YOLOv5s architecture (M11) to function with a single detection scale. These adjustments corresponded to the M14 and M15 treatments in Table 4. The only distinction was that M15 employed fewer filters, which led to a reduced overall number of network parameters.

2.7. Implementation

For the training and validation, the experiments were conducted using the Google Colab platform, which is equipped with Tesla P100-PCIE-16GB graphics cards. For the real-time detection testing, an image acquisition system based on a conveyor belt was utilized. The system was equipped with two machines. The first one (hardware1) consisted of a computer with an Intel® i5-10300H processor (Intel Corporation, Santa Clara, CA, USA) running at 2.5 GHz with 8 GB of RAM. It was also equipped with an NVIDIA GeForce GTX-1650Ti® graphics processing unit (GPU) (NVIDIA Corporation, Santa Clara, CA, USA), as illustrated in Figure 8. This first machine was used only for monitoring. The second machine (hardware2), depicted in Figure 9, featured an NVIDIA Jetson Xavier NX DEVELOPER KIT-SUB® board with the following specifications: CPU: 6-core NVIDIA Carmel ARM® v8.2 64-bit CPU with 6 MB L2 + 4 MB L3 cache; GPU: NVIDIA Volta™ architecture incorporating 384 NVIDIA® CUDA® cores and 48 Tensor cores. On the second machine (hardware2), the proposed lightweight architecture was implemented using PyTorch (version 2.6.0; https://pytorch.org/), which allowed the entire real-time detection process to be carried out locally. For this purpose, a Logitech C920 USB camera was employed for the real-time video acquisition, with its stream processed directly on the Jetson Xavier NX by the previously trained YOLO network.
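The real-time loop on the Jetson Xavier NX can be sketched as below. This is an illustrative outline, assuming the trained weights are exported as yolo_optimal.pt (a placeholder name) and the C920 appears as camera index 0; it is not the authors' exact implementation.

```python
# Illustrative real-time detection loop for the embedded platform.
import time
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="yolo_optimal.pt")  # placeholder weights
cap = cv2.VideoCapture(0)              # Logitech C920 USB camera (assumed index 0)
frames, t0 = 0, time.time()
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)                  # model expects RGB input
    results = model(rgb)                                          # single forward pass
    annotated = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)  # boxes drawn, back to BGR
    cv2.imshow("apple detection", annotated)
    frames += 1
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
print(f"average FPS: {frames / (time.time() - t0):.1f}")
cap.release()
cv2.destroyAllWindows()
```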

3. Results

Table 5 presents the results of the comparison of the baseline architectures (trained with random search) for the object detection task. These results include the average precision (P), average recall (R), mean average precision at 50% IoU threshold (mAP50), total number of parameters for each model (T.P.), and average frames per second (AVG-FPS) measured during inference.
In Table 5, we can observe that the average precision values ranged from 0.56 to 0.99, while the average recall fell between 0.85 and 0.99. Regarding the mAP50 metric, values between 0.710 and 0.998 were observed. Specifically, the EU11 experiment, which was based on the YOLOv5s architecture, achieved the highest precision and recall (both 0.99), along with an mAP50 of 0.998. This highlights the model’s high detection capability, positioning it as the best alternative among the 12 proposed configurations. Figure 10 presents the confusion matrices for the best baseline architecture and its modifications.
To illustrate the balance between the precision and recall, as well as the model’s stability across different thresholds, Figure 11 presents the precision–recall (P-R) curves for the base model (EU11) and its modifications (M13, M14, and M15). These curves provide insights into how well each model differentiated between classes while minimizing false positives.
To complement the analysis, Figure 12 displays the F1-score curves for each model. These graphs show how the models balance precision and recall at different decision thresholds, which provided a measure of the overall classification performance.
Finally, Figure 13 shows an example of object detection using the M14 (YOLO-Optimal2) architecture. These images demonstrate the correct identification of objects, reaffirming the usefulness of the reduced versions of the YOLOv5s architecture for applications with limited computational resources.

On the Proposed Lightweight YOLOv5

As a result of the experiment, the proposed solution emerged as the architecture with the best performance. It is an optimized and lightweight YOLO-based architecture, specifically engineered to operate swiftly and efficiently on devices with constrained resources (see the modifications in Figures 6 and 7). In contrast to YOLOv5s, the resulting architecture reduces the number of detection scales and convolutional layers, yielding a lighter and faster model through the reduction of the overall number of parameters and floating point operations (FLOPs), which results in enhanced real-time inference capabilities, particularly on low-end GPUs and CPUs.

4. Discussion

The results of experimental unit EU11 (Table 5) exceed those reported by [33], where a YOLOv3 architecture trained on a dataset of 452 apple images (226 healthy and 226 defective) achieved an mAP50 of 0.7430, compared with 0.6959 for the SSD (single-shot detector) architecture.
In the work of [18], it is reported that their YOLOv4 architecture achieved an mAP50 of 97.13% for apple detection, a sensitivity rate of 90%, and a detection speed of 51 frames per second. The results of this work show that EU11 (Table 5) not only exceeded the detection performance reported by [18] but also achieved a detection speed of 63.2 frames per second.
Apple detection using a YOLOv5s architecture is reported in the work of [34], where the results indicate a sensitivity, precision, mAP50, and F1 score of 91.48%, 83.83%, 86.75%, and 87.49%, respectively. The results of EU11 (Table 5) reported in this work also outperform those of [34].
Real-time apple detection systems require optimized architectures to enable harvesting robots to detect apples quickly and accurately in complex environments [35]. To measure the size of a YOLO network, it was necessary to count the total number of parameters in the network. The total number of parameters depends on the network architecture, as well as the input image size, the number of convolutional layers, and the size of the convolutional filters, among other factors. Experimental unit EU11 has a total of approximately 7.04 × 10⁶ parameters, and to optimize this architecture, modifications (M13, M14, and M15) were made, as reported in Table 4, with total parameters of 5.24 × 10⁶, 4.78 × 10⁶, and 1.20 × 10⁶, respectively.
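The parameter counts quoted above can be verified directly in PyTorch; the following short sketch assumes the trained checkpoint is available locally (the file name yolov5s_eu11.pt is a placeholder, not a published artifact).

```python
# Sketch for counting trainable parameters of a trained YOLO model.
import torch

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

eu11 = torch.hub.load("ultralytics/yolov5", "custom", path="yolov5s_eu11.pt")  # placeholder checkpoint
print(f"EU11 trainable parameters: {count_parameters(eu11):.2e}")  # expected ~7.04e6
```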
Finally, [35] reported that their optimized Des-YOLOv4 architecture achieved an mAP50 of 97.13% for apple detection, a sensitivity rate of 90%, and a detection speed of 51 frames per second, evaluated on hardware with an Intel® Core™ i7-9700KF CPU @ 3.60 GHz, an NVIDIA RTX2080 GPU (8 GB VRAM), and 32 GB of RAM. The lightest architecture in this work corresponded to modification M15, which achieved an mAP50 of 97.76% and a detection speed of 37 frames per second when evaluated on lower-capacity hardware (the Jetson Xavier NX board described in Figure 9).

5. Conclusions

In this study, a mechatronic system was developed for detecting apple varieties (Malus domestica) and potential damage to them. The system was designed for on-site operation with low computational demands. Our primary contribution lay in the proposal of a lightweight neural network architecture, which delivered a high performance with minimal computational overhead.
The experimental results demonstrate that the proposed modifications yielded a good performance. The architecture achieved 100% accuracy when detecting the healthy Gala and healthy Red Delicious apple categories. For the remaining six apple categories, the detection rates ranged from 94% to 99%. Furthermore, the system achieved a mean average precision (mAP50) of 99.76%, underscoring its high detection accuracy. These findings confirm the effectiveness of the architecture for apple detection, showcasing its capability to reliably distinguish between different apple categories and possible damage.
A second key finding of this work is that the proposed modifications resulted in significantly lighter architectures, achieving size reductions of up to 82.96% while maintaining a high precision and sensitivity, with average values that ranged between 94% and 99%. This confirms that the streamlined architecture is suitable for implementation on low-cost computational hardware. Additionally, real-time detection was demonstrated, with the system achieving a detection speed of 37 frames per second on the embedded platform. Based on these findings, the proposed architecture can be applied on-site to solve the proposed task.
Future research could explore the scalability of the proposed lightweight architecture to other fruit varieties and agricultural applications. Integrating multispectral or hyperspectral imaging may also improve the robustness under varying lighting and environmental conditions. Finally, deploying the system in real-world agricultural settings and evaluating its performance in large-scale operations and with different platforms would provide valuable insights for practical implementations.

Author Contributions

Conceptualization: J.C.O.-R., G.d.J.L.-C., J.C.H.-L. and J.I.V.; Methodology: J.C.O.-R., J.I.V. and G.d.J.L.-C.; Writing—Review and Editing: J.C.O.-R., J.I.V. and C.M.-D.; Design of Experiments: J.C.O.-R., C.M.-D. and J.C.H.-L.; Experimentation: J.C.O.-R., C.M.-D. and G.d.J.L.-C. All authors read and agreed to the published version of this manuscript.

Funding

This research was supported by SIP Innovation project award number 20242883.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank the support of the Secretaría de Ciencia, Humanidades, Tecnología e Innovación (SECIHTI) and the Secretaría de Investigación y Posgrado (SIP-IPN).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Leo, L.S.C.; Hernández-Martínez, D.M.; Meza-Márquez, O.G. Analysis of physicochemical parameters, phenolic compounds and antioxidant capacity of peel, pulp and whole fruit of five apple varieties (Malus domestica) harvested in Mexico. BIOtecnia 2019, 22, 166–174. [Google Scholar] [CrossRef]
  2. Nguyen, N.H.; Michaud, J.; Mogollon, R.; Zhang, H.; Hargarten, H.; Leisso, R.; Torres, C.A.; Honaas, L.; Ficklin, S. Rating pome fruit quality traits using deep learning and image processing. Plant Direct 2024, 8, e70005. [Google Scholar] [CrossRef] [PubMed]
  3. Ghazal, S.; Munir, A.; Qureshi, W.S. Computer vision in smart agriculture and precision farming: Techniques and applications. Artif. Intell. Agric. 2024, 13, 64–83. [Google Scholar] [CrossRef]
  4. Szeliski, R. Computer Vision: Algorithms and Applications; Springer Nature: Berlin, Germany, 2022. [Google Scholar]
  5. Meng, F.; Li, J.; Zhang, Y.; Qi, S.; Tang, Y. Transforming unmanned pineapple picking with spatio-temporal convolutional neural networks. Comput. Electron. Agric. 2023, 214, 108298. [Google Scholar] [CrossRef]
  6. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2017; Volume 1. [Google Scholar]
  7. Li, H.; Gu, Z.; He, D.; Wang, X.; Huang, J.; Mo, Y.; Li, P.; Huang, Z.; Wu, F. A lightweight improved YOLOv5s model and its deployment for detecting pitaya fruits in daytime and nighttime light-supplement environments. Comput. Electron. Agric. 2024, 220, 108914. [Google Scholar] [CrossRef]
  8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  9. Zhang, Z.; Pothula, A.K.; Lu, R. Economic evaluation of Apple Harvest and In-Field sorting technology. Trans. ASABE 2017, 60, 1537–1550. [Google Scholar] [CrossRef]
  10. Lu, Y.; Lu, R. Non-Destructive defect detection of Apples by Spectroscopic and Imaging Technologies: A review. Trans. ASABE 2017, 60, 1765–1790. [Google Scholar] [CrossRef]
  11. Wang, S.H.; Chen, Y. Fruit category classification via an eight-layer convolutional neural network with parametric rectified linear unit and dropout technique. Multimed. Tools Appl. 2020, 79, 15117–15133. [Google Scholar] [CrossRef]
  12. Sun, K.; Li, Y.; Peng, J.; Tu, K.; Pan, L. Surface gloss Evaluation of Apples based on Computer Vision and Support Vector Machine Method. Food Anal. Methods 2017, 10, 2800–2806. [Google Scholar] [CrossRef]
  13. Li, X.; Hu, Y.; Jie, Y.; Zhao, C.; Zhang, Z. Dual-Frequency LiDAR for Compressed Sensing 3D Imaging Based on All-Phase Fast Fourier Transform. J. Opt. Photonics Res. 2024, 1, 74–81. [Google Scholar] [CrossRef]
  14. Sofu, M.; Er, O.; Kayacan, M.; Cetişli, B. Design of an automatic apple sorting system using machine vision. Comput. Electron. Agric. 2016, 127, 395–405. [Google Scholar] [CrossRef]
  15. Sachin, C.; Manasa, N.L.; Sharma, V.; Kumaar, N.A.A. Vegetable Classification Using You Only Look Once Algorithm. In Proceedings of the 2019 International Conference on Cutting-Edge Technologies in Engineering (ICon-CuTE), Uttar Pradesh, India, 14–16 November 2019; pp. 101–107. [Google Scholar]
  16. Tian, Y.; Yang, G.; Wang, Z.; Li, E.; Liang, Z. Detection of Apple Lesions in Orchards Based on Deep Learning Methods of CycleGAN and YOLOV3-Dense. J. Sens. 2019, 2019, 7630926. [Google Scholar] [CrossRef]
  17. Siddiqi, R. Automated apple defect detection using state-of-the-art object detection techniques. SN Appl. Sci. 2019, 1, 1345. [Google Scholar] [CrossRef]
  18. Chen, W.; Zhang, J.; Guo, B.; Wei, Q.; Zhu, Z. An Apple Detection Method Based on Des-YOLO v4 Algorithm for Harvesting Robots in Complex Environment. Math. Probl. Eng. 2021, 2021, 7351470. [Google Scholar] [CrossRef]
  19. Huang, Z.; Zhang, P.; Liu, R.; Li, D. Immature Apple detection method based on improved Yolov3. Asp. Trans. Internet Things 2021, 1, 9–13. [Google Scholar] [CrossRef]
  20. Hu, G.; Zhang, E.; Zhou, J.; Zhao, J.; Gao, Z.; Sugirbay, A.; Jin, H.; Zhang, S.; Chen, J. Infield Apple Detection and grading based on Multi-Feature Fusion. Horticulturae 2021, 7, 276. [Google Scholar] [CrossRef]
  21. Nguyen, C.N.; Lam, V.L.; Le, P.H.; Ho, H.T.; Nguyen, C.N. Early detection of slight bruises in apples by cost-efficient near-infrared imaging. Int. J. Electr. Comput. Eng. (IJECE) 2022, 12, 349–357. [Google Scholar] [CrossRef]
  22. Tian, Y.; Yang, G.; Wang, Z.; Wang, H.; Li, E.; Liang, Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426. [Google Scholar] [CrossRef]
  23. Sharpe, S.M.; Schumann, A.W.; Boyd, N.S. Goosegrass Detection in Strawberry and Tomato Using a Convolutional Neural Network. Sci. Rep. 2020, 10, 9548. [Google Scholar] [CrossRef]
  24. Junos, M.H.; Khairuddin, A.S.M.; Thannirmalai, S.; Dahari, M. An optimized YOLO-based object detection model for crop harvesting system. IET Image Process. 2021, 15, 2112–2125. [Google Scholar] [CrossRef]
  25. Sekharamantry, P.K.; Melgani, F.; Malacarne, J. Deep Learning-Based Apple Detection with Attention Module and Improved Loss Function in YOLO. Remote Sens. 2023, 15, 1516. [Google Scholar] [CrossRef]
  26. Liu, J.; Zhao, G.; Liu, S.; Liu, Y.; Yang, H.; Sun, J.; Yan, Y.; Fan, G.; Wang, J.; Zhang, H. New Progress in Intelligent Picking: Online Detection of Apple Maturity and Fruit Diameter Based on Machine Vision. Agronomy 2024, 14, 721. [Google Scholar] [CrossRef]
  27. Zuo, Z.; Gao, S.; Peng, H.; Xue, Y.; Han, L.; Ma, G.; Mao, H. Lightweight Detection of Broccoli Heads in Complex Field Environments Based on LBDC-YOLO. Agronomy 2024, 14, 2359. [Google Scholar] [CrossRef]
  28. Han, B.; Lu, Z.; Zhang, J.; Almodfer, R.; Wang, Z.; Sun, W.; Dong, L. Rep-ViG-Apple: A CNN-GCN Hybrid Model for Apple Detection in Complex Orchard Environments. Agronomy 2024, 14, 1733. [Google Scholar] [CrossRef]
  29. Chen, W.; Lu, S.; Liu, B.; Chen, M.; Li, G.; Qian, T. CitrusYOLO: A Algorithm for Citrus Detection under Orchard Environment Based on YOLOv4. Multimed. Tools Appl. 2022, 81, 31363–31389. [Google Scholar] [CrossRef]
  30. Olguín-Rojas, J.C.; Vasquez-Gomez, J.I.; López-Canteñs, G.D.J.; Herrera-Lozada, J.C. Clasificación de manzanas con redes neuronales convolucionales. Rev. Fitotec. Mex. 2022, 45, 369. [Google Scholar] [CrossRef]
  31. Kolosova, T.; Berestizhevsky, S. Supervised Machine Learning: Optimization Framework and Applications with SAS and R; CRC Press: Boca Raton, FL, USA, 2020. [Google Scholar]
  32. Kelleher, J.D.; Mac Namee, B.; D’Arcy, A. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies; MIT Press: Cambridge, MA, USA, 2020. [Google Scholar]
  33. Valdez, P. Apple Defect Detection Using Deep Learning Based Object Detection for Better Post Harvest Handling. arXiv 2020, arXiv:2005.06089. [Google Scholar] [CrossRef]
  34. Yan, B.; Fan, P.; Lei, X.; Liu, Z.; Yang, F. A Real-Time Apple Targets Detection Method for Picking Robot Based on Improved YOLOv5. Remote Sens. 2021, 13, 1619. [Google Scholar] [CrossRef]
  35. Wang, Z.; Jin, L.; Wang, S.; Xu, H. Apple Stem/Calyx Real-Time Recognition Using YOLO-v5 Algorithm for Fruit Automatic Loading System. Postharvest Biol. Technol. 2021, 185, 111808. [Google Scholar] [CrossRef]
Figure 1. Design of the conveyor belt (a) and constructed conveyor belt (b) for detecting damage in apples (Olguín-Rojas et al. [30]).
Figure 2. Examples of apple images collected on the conveyor belt (adapted from Olguín-Rojas et al. [30]).
Figure 3. Depiction of the operational framework of the mechatronic system designed for the identification of damage in apples (figure by the authors).
Figure 4. Original YOLOv5s architecture (figure by the authors). The asterisk (*) indicates the base structure to be scaled by a factor x to generate other YOLOv5 variants (e.g., YOLOv5m, YOLOv5l, YOLOv5x).
Figure 5. Design of the experiment to obtain the proposed architecture. First, an initial search was performed to reduce the number of hyperparameters. Second, a random search was performed to train each baseline architecture in the target task. Finally, the best baseline architecture was used in an ablation study to obtain a lightweight architecture (figure by the authors).
Figure 6. Modified YOLOv5s architecture designed to operate with two detection scales (M13) (figure by the authors). The asterisk (*) indicates the base structure to be scaled by a factor x to generate other YOLOv5 variants (e.g., YOLOv5m, YOLOv5l, YOLOv5x).
Figure 7. Modified YOLOv5s architecture configured for a single detection scale (M14 and M15) (figure by the authors). The asterisk (*) indicates the base structure to be scaled by a factor x to generate other YOLOv5 variants (e.g., YOLOv5m, YOLOv5l, YOLOv5x).
Figure 8. Conveyor system. The system included the image capturing and a machine for monitoring purposes (figure by the authors).
Figure 9. System implemented with a Jetson Xavier NX DEVELOPER KIT-SUB® board (figure by the authors).
Figure 10. Confusion matrices for (a) M11-YOLOv5s, (b) M13-YOLO-Optimal1 (two detection scales), (c) M14-YOLO-Optimal2 (a single scale), and (d) M15-YOLO-Optimal3 (a single scale with reduced depth) (figure by the authors).
Figure 11. Precision–recall (P-R) curves for treatments M11 (YOLOv5s), M13, M14, and M15. These curves illustrate the model’s performance in distinguishing between classes at different recall levels (figure by the authors).
Figure 12. F1-score curves for treatments M11 (YOLOv5s), M13, M14, and M15. These graphs illustrate how the precision and recall were balanced to achieve the highest classification accuracy (figure by the authors).
Figure 13. Example of object detection using the M14 (YOLO-Optimal2) architecture on different apple varieties (figure by the authors).
Table 1. Parameters established after the initial training phase (also known as orthogonal factors for the following experiments). Parameters 3, 4, 5, and 6 were introduced to the image augmentation process.
ID | Hyperparameter | Value
1 | Image size | 416 × 416
2 | Image channels | 3
3 | Random rotation angle | 0
4 | Random hue | 0.1
5 | Random saturation | 1.5
6 | Random exposure | 1.5
7 | Optimizer | SGD
8 | Batch size | 64
Table 2. Search space. Combination of hyperparameters that were used in the training of each baseline architecture. HC: hyperparameter configuration.
ID | Learning Rate | Momentum | Decay
HC-1 | 0.000001 | 0.5 | 0.00005
HC-2 | 0.00001 | 0.75 | 0.0005
HC-3 | 0.0001 | 0.9 | 0.005
HC-4 | 0.01 | 0.95 | 0.05
Table 3. Experimental units for the comparison of the baseline architectures. EU: experimental unit. HC: hyperparameter configuration. T.L.: transfer learning. T.S.: training from scratch.
ID | Architecture | Hyperparameters | T.M.
EU1–EU3 | YOLOv3-Tiny | HC-1 to HC-3 | T.L.
EU4 | YOLOv3-Tiny | HC-4 | T.S.
EU5–EU7 | YOLOv4-Tiny | HC-1 to HC-3 | T.L.
EU8 | YOLOv4-Tiny | HC-4 | T.S.
EU9–EU11 | YOLOv5s | HC-1 to HC-3 | T.L.
EU12 | YOLOv5s | HC-4 | T.S.
Table 4. Proposed modifications to optimize the best-performing architecture.
ID | Architecture | Modification
M13 | Yolo-Optimal1 | Removal of one detection scale (only two)
M14 | Yolo-Optimal2 | Use of a single detection scale
M15 | Yolo-Optimal3 | Single detection scale and reduced depth
Table 5. Results of hyperparameter search and performance in object detection. ID.: identifier, P: average precision, R: average recall, T.P.: total number of parameters, AVG-FPS: average frames per second (on the conveyor belt, with hardware1: Intel® i5-10300H CPU and NVIDIA® GeForce GTX-1650Ti® GPU).
ID | P | R | mAP50 | T.P. | AVG-FPS
EU1 | 0.95 | 0.97 | 0.993 | 8.91 × 10⁶ | 49.2
EU2 | 0.80 | 0.92 | 0.944 | 8.91 × 10⁶ | 48.3
EU3 | 0.96 | 0.98 | 0.994 | 8.91 × 10⁶ | 45.9
EU4 | 0.94 | 0.96 | 0.988 | 8.91 × 10⁶ | 47.4
EU5 | 0.98 | 0.98 | 0.991 | 10.2 × 10⁶ | 46.4
EU6 | 0.92 | 0.97 | 0.983 | 10.2 × 10⁶ | 45.8
EU7 | 0.98 | 0.99 | 0.994 | 10.2 × 10⁶ | 45.2
EU8 | 0.96 | 0.97 | 0.989 | 10.2 × 10⁶ | 43.6
EU9 | 0.87 | 0.85 | 0.922 | 7.04 × 10⁶ | 59.4
EU10 | 0.56 | 0.95 | 0.710 | 7.04 × 10⁶ | 60.4
EU11 | 0.99 | 0.99 | 0.998 | 7.04 × 10⁶ | 63.2
EU12 | 0.88 | 0.90 | 0.970 | 7.04 × 10⁶ | 62.5
