Flaw-YOLOv5s: A Lightweight Potato Surface Defect Detection Algorithm Based on Multi-Scale Feature Fusion

Wu, Haitao; Zhu, Ranhui; Wang, Hengren; Wang, Xiangyou; Huang, Jie; Liu, Shuwei

doi:10.3390/agronomy15040875


by Haitao Wu, Ranhui Zhu, Hengren Wang, Xiangyou Wang *, Jie Huang and Shuwei Liu

College of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo 255200, China

* Author to whom correspondence should be addressed.

Agronomy 2025, 15(4), 875; https://doi.org/10.3390/agronomy15040875

Submission received: 7 March 2025 / Revised: 29 March 2025 / Accepted: 29 March 2025 / Published: 31 March 2025

(This article belongs to the Special Issue Advanced Machine Learning in Agriculture)


Abstract

Accurate and rapid detection of potato surface defects is crucial for advancing intelligent potato sorting. To improve detection accuracy while reducing the computational load of the model, this paper proposes a lightweight Flaw-YOLOv5s algorithm for potato surface defect detection. Firstly, Depthwise Separable Convolution (DWConv) replaces the original Conv in the YOLOv5s network to reduce the computational burden and the number of parameters. Then, the SPPF in the backbone network is replaced by SPPELAN, which combines SPP with ELAN to enable multi-scale pooling and feature extraction, improving the detection of small targets on potatoes. Finally, the lightweight convolution PConv is used to build a new structure, CSPC, which substitutes for the C3 in the baseline network; this decreases redundant computations and reduces the model parameters, yielding a lightweight network model. Experimental results demonstrate that the Flaw-YOLOv5s algorithm obtains a mean Average Precision (mAP) of 95.6% and a precision of 94.6%, improvements of 1.6 and 1.8 percentage points, respectively, over the YOLOv5s network. With only 4.33 million parameters, this lightweight and efficient model satisfies the requirements for detecting surface defects in potatoes. This research provides a reference for the online detection of potato surface defects and deployment on mobile devices.

Keywords:

potato; surface detection; Flaw-YOLOv5s; depthwise separable convolution

1. Introduction

The potato, recognized as the world's fourth most important food crop, is considered one of the most promising economic crops of the 21st century. It is valued for its nutritional and potential medicinal properties and is crucial in maintaining global food security and stability [1,2]. However, potatoes are susceptible to diseases and damage during growth and storage, often displaying external defects such as mechanical damage, rot, and worm damage, which not only directly affect the economic benefits of the product but also present a significant challenge for potato packaging, storage, and deep processing. Currently, the detection technology for surface defects in potatoes faces several challenges. Firstly, the variety and complexity of surface defects make them difficult to identify effectively through traditional image processing methods [3]. Secondly, traditional manual sorting is inefficient and prone to high error rates, so it cannot meet the demands of large-scale potato production [4,5]. Therefore, developing an accurate automatic detection method for potato defects is essential for improving production efficiency and product quality.

With advancements in science and technology, the detection of potato defects now relies primarily on spectral detection methods and machine vision techniques [6,7]. Spectral technology draws on the optical principles of Near-Infrared Spectroscopy (NIR) and hyperspectral technology [8] to acquire valid data about the target surface, from which potato surface defects are analyzed and judged. LS-SVM [9] was used to classify seven types of potato defects from the spectral and texture characteristics of multispectral images at 690, 757, and 927 nm, categorizing them accurately as external defects, but the preprocessing steps can be quite complex. To address the challenge of sorting potatoes with various defects, Deng et al. [10] conducted principal component analysis on reflective surface images to extract key features. Utilizing spectral data, they developed a support vector machine (SVM) model for defect detection, achieving promising results in their experiments. While combining spectral technology with traditional machine vision [11,12] can enhance the average accuracy of defect detection, it still has shortcomings, such as the need for extensive preprocessing and the influence of subjective human factors, both of which affect the actual detection of different potato surface defects. Additionally, the research equipment required for these methods tends to be expensive.

Nowadays, with the rapid progress of science and technology, deep learning has been increasingly applied to the detection of surface defects in agricultural products [13,14,15]. For instance, Noordam et al. [16] developed a high-speed sorting machine that graded potatoes according to measurements, appearance, and color based on various image processing and classification models. They achieved color separation by combining linear discriminant analysis with Mahalanobis distance classifiers and distinguished defects of similar colors using estimators for appearance classification, like central moments, area, and oddity. Hassankhani et al. [17] used an image processing technique implemented in MATLAB 7.6.0 to distinguish surface defects based on appearance and physical properties, achieving a classification accuracy of up to 97.67%. However, this method could not accomplish specific evaluation, and its throughput could not meet the current need for efficient detection of potato surface defects. In addition, Wang et al. [18] studied the application of convolutional neural networks (CNNs) to the detection of surface defects on potatoes. They employed transfer learning to optimize three models: SSD Inception V2, RFCN ResNet101, and Faster RCNN ResNet101. Experimental results demonstrated defect detection accuracies of 92.5%, 95.6%, and 98.7%, respectively, for these models. Although the two-stage detection algorithms showed high precision, they lacked processing speed and real-time capability.

As a one-stage algorithm, the YOLO series excels in object detection due to its high detection accuracy and lightweight network design, consistently outperforming traditional two-stage algorithms. For instance, Tian et al. [19] conducted experiments using high-resolution apple images and developed an improved YOLOv3 framework specifically designed for detecting apple growth stages. Notably, this algorithm demonstrated strong efficiency across diverse detection scenarios, with an average processing time per image not exceeding 0.31 s. Li et al. [20] devised a lightweight potato surface defect detection network based on YOLOv5s to enhance the identification of external defect features. By integrating ASFF into the PANet structure of the baseline algorithm and combining it with the ASPP module for enhanced multi-scale feature fusion, the model demonstrated superior detection performance for defects such as sprouting, greening, scab disease, and mechanical damage. Experimental comparisons showed that the proposed approach outperformed mainstream models, including Faster R-CNN, YOLOv6, and YOLOv7. Zhang et al. [21] combined the CBAM attention mechanism with the BiFPN feature fusion structure to achieve a dual enhancement of the YOLOv5s framework. Specifically, spatial and channel attention was introduced at the feature extraction layer to improve the detection of small target features, while the decoupled head structure was reconstructed to accelerate model convergence. This enhanced approach demonstrated significant advantages over the baseline model in potato sprout and eye detection. Nevertheless, while introducing the attention module brings high accuracy, it also increases the parameter volume and the consumption of computing resources.

In summary, scholars have conducted extensive research on recognizing surface defects in potatoes and have made significant progress. However, due to the variety and different forms of potato surface defects, as well as the complex architecture and heavy computational load of the detection models, it is not easy to satisfy the high throughput and real-time requirements of production while ensuring accurate detection [22]. To overcome these limitations, this paper builds on the YOLOv5s network, which is known for its high detection accuracy and lightweight network design, and proposes Flaw-YOLOv5s, a lightweight algorithm for potato surface defect detection. The three main contributions are as follows. Firstly, DWConv replaces the original Conv in the baseline network, decreasing the computational load. Then, the SPPF in the backbone network is replaced by SPPELAN, which combines SPP with ELAN to enable multi-scale pooling and feature extraction, improving the detection of small targets on potatoes. Finally, the lightweight convolution PConv is used to build a new structure, CSPC, which substitutes for the C3 in the YOLOv5s network; this reduces redundant computations and decreases the model parameters, yielding a lightweight network model.

2. Materials and Methods

2.1. Data Collection

The collection of potato samples and the establishment of a dataset are critical steps for research on machine vision-based detection of external defects in potatoes. Due to the absence of publicly available benchmark datasets for external potato defects, this study utilizes potato samples sourced from Shandong Star Agricultural Equipment Co., Ltd., located in Dezhou, Shandong Province, China. Images were collected from various commonly cultivated potato varieties to increase the model’s generalization ability. The potato image dataset [23] used in this study was self-constructed using a potato image capture box, as shown in Figure 1. The setup primarily consists of an annular light source, a Daheng industrial camera, a camera mounting bracket, and a light source support frame to capture and identify the potatoes’ surface image features. The annular light source provides uniform and stable illumination to counteract varying external lighting conditions, enabling the camera to capture clearer features of the potatoes. The industrial camera is a Daheng Image Mercury II MER2-G series high-performance area array camera, which supports multiple acquisition methods with a resolution of 2448 × 2048, meeting the requirements for potato image acquisition.

To improve the quality of the dataset, the potato images collected by the industrial camera were manually screened to remove duplicates and images of poor quality, resulting in a set of qualified sample images. From the collected potato images, we saved 900 images in JPG format at 2448 × 2048 pixels, of which 224 showed dry rot, 223 showed insect eyes, 226 showed damage, and 227 were normal potato samples. Some sample images are shown in Figure 2.

2.2. Dataset Construction

The potato samples were annotated using the LabelImg tool, which classified them into four categories: normal, damaged, rotten, and worm-eaten, labeled as “Normal”, “Damage”, “Rot”, and “Worm-eaten”, respectively. This annotation process generated corresponding TXT format label files for algorithm training.

To improve the accuracy and generalization capability of the improved potato surface defect detection model, image augmentation techniques were applied to the potato image dataset. Since the lighting conditions during image acquisition were consistent, there was no need for additional random noise processing on the potato images. To simulate different poses of the potatoes during flipping, the annotated image samples were flipped vertically and horizontally, resulting in 900 vertically flipped and 900 horizontally flipped images. This augmentation process increased the overall dataset to 2700 labeled potato images. The annotated potato image dataset was randomly divided into training, validation, and testing sets in a ratio of 8:1:1 for algorithm development and evaluation.
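The flipping and splitting steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the TXT labels use the normalized YOLO layout (`class x_center y_center width height`), and the helper names `flip_yolo_label` and `split_dataset` are hypothetical.

```python
import random

def flip_yolo_label(line, horizontal):
    """Mirror one label line, assuming 'class x_center y_center width height' in [0, 1]."""
    cls, x, y, w, h = line.split()
    x, y = float(x), float(y)
    if horizontal:
        x = 1.0 - x   # left-right mirror: only the x centre changes
    else:
        y = 1.0 - y   # top-bottom mirror: only the y centre changes
    return f"{cls} {x:.6f} {y:.6f} {float(w):.6f} {float(h):.6f}"

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle, then cut into train/val/test parts according to integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# A made-up label line and 2700 placeholder image indices
flipped = flip_yolo_label("2 0.250000 0.400000 0.100000 0.200000", horizontal=True)
train, val, test = split_dataset(range(2700))
print(flipped)                           # 2 0.750000 0.400000 0.100000 0.200000
print(len(train), len(val), len(test))   # 2160 270 270
```

The box width and height are unchanged by a mirror flip, so only one centre coordinate needs updating per flip direction.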

2.3. Depthwise Separable Convolution Module

Model compression and acceleration are crucial for both mobile and server-side applications. On mobile devices, they enhance user experience and application compatibility, while on servers, they improve response times and system throughput to meet real-time demands, ultimately reducing operational costs. To further compress network parameters, decrease the model size, enhance inference speed, and minimize resource consumption, the improved Flaw-YOLOv5s model incorporates Depthwise Separable Convolution (DWConv) which replaces the Conv in the YOLOv5s backbone network [24].

Depthwise separable convolution can reduce parameter count, making it an essential option for deployment on mobile devices. The conventional convolution operation is simplified into two sequential stages through this technique: depthwise convolution followed by pointwise convolution. Depthwise convolution performs convolution separately for each input channel without involving cross-channel computations, ensuring that each processed input channel produces one output channel. Pointwise convolution then uses a 1×1 convolution kernel to process the output of the depthwise convolution, mixing the various output channels to generate the final output channel [25].

In practical applications, DWConv lowers computational complexity and decreases the number of parameters, enhancing operational speed. For instance, with a 3 × 3 convolution kernel, the parameters and computational load of depthwise separable convolution are approximately one-ninth of those of standard convolution. The structural diagram of DWConv is illustrated in Figure 3. This significant reduction in computational complexity eases the implementation and deployment of models across various devices, particularly mobile and embedded systems with limited computational resources. Therefore, selecting DWConv can reduce the model size and decrease computational complexity.
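The "approximately one-ninth" figure can be checked with a short parameter count. The sketch below ignores bias terms, and the layer sizes are illustrative values rather than ones taken from the paper. For a k × k kernel the ratio works out to 1/c_out + 1/k², which approaches 1/9 for k = 3 as the number of output channels grows.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dw_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

c_in, c_out, k = 128, 256, 3      # illustrative layer sizes, not from the paper
standard = conv_params(c_in, c_out, k)
separable = dw_separable_params(c_in, c_out, k)
ratio = separable / standard
print(standard, separable, round(ratio, 4))   # 294912 33920 0.115
```

With the same stride and output resolution, the multiply-accumulate counts scale by the same ratio, since both are the weight counts multiplied by the number of output positions.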

2.4. Principle of SPPELAN Module

The introduction of Spatial Pyramid Pooling Fast (SPPF) and SPP Enhanced with ELAN (SPPELAN) has significantly enhanced the performance of YOLOv5s [26]. SPPF is an optimized spatial pyramid pooling technique that retains the ability of traditional SPP to strengthen the detection of potato surface defects through multi-scale pooling. Unlike SPP, which applies pooling layers of varying scales (e.g., 5 × 5, 9 × 9, 13 × 13) for multi-scale feature extraction, SPPF streamlines this process by using three consecutive 5 × 5 pooling layers. This refinement replaces the large pooling windows with repeated small ones, reducing computational complexity and alleviating the impact on real-time performance caused by the substantial computational demands of traditional SPP.
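The relationship between the parallel SPP pools and the sequential SPPF pools can be illustrated with a small pure-Python check. This is a sketch under simplifying assumptions: stride-1 max pooling with "same" padding on a single random feature map, leaving out the 1 × 1 convolutions and the concatenation that the real modules wrap around the pooling. The helper `max_pool_same` is hypothetical.

```python
import random

def max_pool_same(fmap, k):
    """Stride-1 max pooling with 'same' padding; cells outside the map are ignored."""
    h, w, r = len(fmap), len(fmap[0]), k // 2
    return [[max(fmap[y][x]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]

rng = random.Random(0)
fmap = [[rng.random() for _ in range(12)] for _ in range(10)]

# SPP-style: three parallel pools with growing kernels.
spp = [max_pool_same(fmap, k) for k in (5, 9, 13)]

# SPPF-style: the same 5 x 5 pool applied three times in sequence.
p1 = max_pool_same(fmap, 5)
p2 = max_pool_same(p1, 5)
p3 = max_pool_same(p2, 5)
sppf = [p1, p2, p3]

print(spp == sppf)   # True
```

Under these assumptions, chaining 5 × 5 pools reproduces the 9 × 9 and 13 × 13 outputs exactly, while each pooling step only scans a 5 × 5 window.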

In the improved YOLOv5s object detection network, the SPPF is replaced by SPPELAN. The structures of SPPF and SPPELAN are illustrated in Figure 4. Regarding network structure, SPPELAN introduces the Efficient Local Aggregation Network (ELAN) based on SPP, leveraging the multi-scale characteristics of SPP while incorporating an attention mechanism through the ELAN [27]. The SPPELAN module leverages a local attention mechanism to focus on localized information within images, facilitating finer feature extraction across various scales. This enhancement significantly improves the detection accuracy of small targets, such as potato defects. Additionally, while reducing computational load, SPPELAN effectively addresses the limitations of SPPF and SPP in feature extraction capabilities, thereby offering distinct advantages for practical applications. SPPELAN’s training strategy enhances the algorithm’s focus on critical regions by calculating the correlation between feature maps and assigning different weights to various regions, thus enhancing local attention and improving detection precision.

In summary, the integration of SPPELAN into YOLOv5s enhances the model by retaining the advantages of the original SPP module while improving its sensitivity to local details. This refinement strengthens the extraction of potato defect features, thereby contributing to improved detection accuracy and computational efficiency. This provides a new reference for developing the YOLO series and the entire field of object detection. As research in deep learning and computer vision technologies continues, these innovative improvements will promote the development of the artificial intelligence industry.

2.5. Principle of CSPC Module

The C3 module is a key component of the YOLOv5s backbone network; it strengthens feature extraction, which leads to a significant enhancement in target detection accuracy [28]. However, the design of the C3 exhibits certain parameter and computational redundancies, particularly when the number of feature map channels is high. The parameter count of the concatenated convolutional layers increases significantly, and the redundant convolutional layers and branching structures raise the number of floating-point operations (FLOPs). This adds a computational burden to the model and hampers the efficiency of extracting potato surface defect features, especially on edge devices or in real-time applications.

To address this, we introduce a novel structure called the CSPC module, which overcomes the computational redundancy of the C3 module. This design achieves a lightweight architecture and improves the accuracy of potato surface defect detection. Progress in machine vision has largely come from increasing the depth of neural networks and the number of feature layers, which enables continuous performance gains but also leads to information redundancy and increased computational costs. This paper therefore uses Partial Convolution (PConv) to build the new CSPC structure, which replaces the C3 structure in the YOLOv5s network, as illustrated in Figure 5. As a core component of the CSPC module [29], PConv enhances the efficiency of the potato surface defect detection algorithm by reducing the computational redundancy of traditional convolution operations.

In conventional image processing frameworks, convolutional layers compute over all input channels, which significantly elevates computational cost. To mitigate this shortcoming, we introduce the lightweight convolution PConv, as depicted in Figure 6. The symbol * in the diagram represents the convolution operation. The key idea of PConv is to apply a conventional convolution to only a portion of the input channels for spatial feature extraction while leaving the remaining channels unchanged, effectively reducing the computational load [30]. During computation, PConv takes a contiguous block of channels as representative of the whole feature map, which aids memory access optimization. Replacing the redundant computational components in the C3 with partial convolution operations enhances the predictive performance of the model while optimizing computational efficiency.
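The channel-splitting idea behind PConv can be sketched in pure Python. The code below is an illustration only, not the authors' implementation: feature maps are nested lists, the convolution uses zero padding, the first c_p channels are taken as the convolved block, and the helper names `conv2d_same` and `partial_conv` are hypothetical.

```python
def conv2d_same(channels, weights):
    """Plain k x k convolution with zero padding; weights[o][i] is a k x k kernel."""
    h, w = len(channels[0]), len(channels[0][0])
    k = len(weights[0][0])
    r = k // 2
    out = []
    for kernels in weights:                      # one output channel per kernel set
        plane = [[0.0] * w for _ in range(h)]
        for src, kernel in zip(channels, kernels):
            for y in range(h):
                for x in range(w):
                    for dy in range(k):
                        for dx in range(k):
                            yy, xx = y + dy - r, x + dx - r
                            if 0 <= yy < h and 0 <= xx < w:
                                plane[y][x] += kernel[dy][dx] * src[yy][xx]
        out.append(plane)
    return out

def partial_conv(channels, weights):
    """Convolve only the first c_p channels; pass the remaining channels through untouched."""
    c_p = len(weights)
    return conv2d_same(channels[:c_p], weights) + channels[c_p:]

# Toy input: 4 channels of size 4 x 4, channel c filled with the value c + 1
feature = [[[float(c + 1)] * 4 for _ in range(4)] for c in range(4)]
box = [[1.0] * 3 for _ in range(3)]              # 3 x 3 kernel of ones
weights = [[box]]                                # c_p = 1: one input and one output channel
result = partial_conv(feature, weights)
print(len(result), result[0][1][1], result[1:] == feature[1:])   # 4 9.0 True
```

Only the first channel is filtered here; the other three are returned as they came in, which is where the savings in computation and memory traffic come from.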

The floating-point operations (FLOPs) and memory access of PConv are given by Formulas (1) and (2), respectively.

\mathrm{FLOPs}_{\mathrm{PConv}} = h \times w \times k^{2} \times c_{p}^{2} \qquad (1)

\mathrm{MAC}_{\mathrm{PConv}} = h \times w \times 2 c_{p} + k^{2} \times c_{p}^{2} \approx h \times w \times 2 c_{p} \qquad (2)

From the equations above, it can be deduced that when PConv processes 1/4 of the input channels, its FLOPs are merely 1/16 of those required by a traditional convolution [31]. This significant reduction in computational demand illustrates the method’s effectiveness in decreasing computational overhead. In the above equations, h and w denote the height and width of the feature map, k the filter size, and c_p the number of channels on which the partial convolution operates. Thus, the introduction of PConv can enhance the detection performance of Flaw-YOLOv5s, making it applicable for edge devices or real-time inspection applications.
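The 1/16 figure follows directly from Formula (1), because FLOPs grow with the square of the channel count. The sketch below evaluates both formulas for illustrative sizes (h, w, k, and c are assumptions, not values from the paper) and also shows how close the approximation in Formula (2) is for those sizes.

```python
def conv_flops(h, w, k, c):
    """FLOPs of a regular k x k convolution with c input and c output channels."""
    return h * w * k * k * c * c

def pconv_flops(h, w, k, c_p):
    """Formula (1): FLOPs of PConv applied to c_p channels."""
    return h * w * k * k * c_p * c_p

def pconv_memory_access(h, w, k, c_p):
    """Formula (2), left-hand side before the approximation."""
    return h * w * 2 * c_p + k * k * c_p * c_p

h, w, k, c = 80, 80, 3, 128       # illustrative sizes, not taken from the paper
c_p = c // 4                      # PConv convolves one quarter of the channels

flops_ratio = pconv_flops(h, w, k, c_p) / conv_flops(h, w, k, c)
exact = pconv_memory_access(h, w, k, c_p)
approx = h * w * 2 * c_p          # right-hand side of Formula (2)

print(flops_ratio)                # 0.0625, i.e. 1/16
print(exact, approx)              # 418816 409600
```

For these sizes the dropped term k² × c_p² accounts for roughly 2% of the memory access, which is why the approximation in Formula (2) is reasonable when the feature map is large relative to the kernel weights.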

2.6. Algorithm Improvement

2.6.1. YOLOv5s Detection Algorithm

To address the hardware limitations of automated potato sorting equipment and meet the accuracy requirements for identifying various surface defects [32], this study adopts the YOLOv5s network, renowned for its high detection precision and real-time performance. Similarly, it aims to provide insights for the development of future algorithms in agricultural product defect detection, as detailed in the following sections.

The construction of the YOLOv5s network consists of four primary components: Input, Backbone, Neck, and Head [33,34]. Figure 7 illustrates that the input section for potato sample images begins with Mosaic data augmentation, which employs stochastic cropping and scaling to composite images by concatenating four distinct samples. Subsequent processing stages involve anchor box computation for optimizing bounding box dimensions and randomly scaling images. The adaptive anchor box computation mechanism utilized in YOLOv5s dynamically adjusts anchor box sizes and ratios to accommodate objects of varying sizes and shapes, significantly enhancing the detection accuracy of potato defects. The refined dataset is further delivered to the backbone architecture for feature extraction and deep learning operations. The Backbone is made up of CBS layers, C3 layers, and an SPPF layer, which are primarily responsible for extracting features from the potato surface images. In this version, a Conv module has been employed to replace the Focus slicing operation, while a sequential SPPF has replaced the parallel SPP, reducing computational load and enhancing detection speed under the same accuracy conditions. The Neck employs a dual-branch architecture consisting of FPN and PAN, where FPN facilitates hierarchical semantic aggregation through top-down pathways while PAN strengthens spatial precision via bottom-up feature propagation, enhancing the detection performance for potato images of different sizes. Finally, the Head outputs the categories of external defects, along with the predicted bounding box locations and the confidence scores for the detected objects.

2.6.2. Flaw-YOLOv5s Detection Algorithm

To strengthen the precision of potato defect detection and achieve rapid recognition of the model on mobile devices [35,36], we propose the Flaw-YOLOv5s detection algorithm. The structural modification introduced in this study effectively enhances the model’s detection performance with reduced computational expenditure, which has been experimentally verified through the results presented in Figure 8.

With the limitations imposed by diverse potato surface defects, model optimization is critical for accurate detection of small targets amidst intricate conditions. To achieve this, the original Conv module in the baseline network is replaced with DWConv, which includes both depthwise and pointwise convolution operations. This modification facilitates feature extraction and integration across channels, enhancing contextual information consolidation and multi-scale feature extraction capabilities.

To better extract small objects such as potato surface defects, the SPPF in the YOLOv5s network was replaced with the SPPELAN module, which combines SPP and ELAN. This combination allows for multi-scale pooling and feature extraction, reducing model complexity while greatly improving detection capability, particularly for small targets [37,38].

The presence of redundant computations in the C3 of YOLOv5s and the increased model size limit its deployment on equipment with limited resources. To strengthen the model’s efficiency and reduce the consumption of computing resources, we use the lightweight convolution PConv to build a new structure called CSPC, which replaces the C3 module. This approach minimizes redundant computations while maintaining real-time performance and lightweight characteristics. The model’s accuracy remains uncompromised while computational overhead is significantly lowered.

3. Experimental Results and Analysis

3.1. Experimental Settings

The operating environment used in this experiment was Windows 10, equipped with an 11th-generation Intel Core i7-11800H processor running at a frequency of 2.30 GHz. The system was configured with 512 GB of storage as well as 16 GB of RAM. An NVIDIA GeForce RTX 3060 Laptop GPU was used for graphics processing. Development was performed in the PyCharm environment. The software setup was configured to include Python 3.8 and CUDA 11.3.

In this experiment, the Adam optimizer was used with an initial learning rate of 0.01, a decay factor of 0.05, and a momentum of 0.937. The input image size was set to 640 × 640, and a total of 200 training iterations, with a batch size of 16, were utilized. The trend of training results is shown in Figure 9.

3.2. Evaluation Metrics

To comprehensively evaluate the Flaw-YOLOv5s model in detecting surface defects in potatoes, this study uses three accuracy metrics: precision (P), recall (R), and mAP. It also uses three further metrics, GFLOPs, parameter count, and model size, to assess how lightweight the model is. The corresponding formulas are given below:

P = \frac{TP}{TP + FP} \qquad (3)

R = \frac{TP}{TP + FN} \qquad (4)

mAP = \frac{\sum_{i=1}^{N} AP_{i}}{N} \times 100\% \qquad (5)

In the above equations, TP denotes true positives, that is, defects that are correctly identified. FP denotes false positives, where defect-free samples are incorrectly classified as defects. FN denotes false negatives, where defective samples are wrongly classified as defect-free. AP_i is the average precision of the i-th class, and N is the number of classes.
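As a small worked example of Formulas (3) to (5), the sketch below computes the three metrics from hypothetical counts and per-class AP values. The numbers are made up for illustration and are not results from the paper; only the four class names come from the dataset description.

```python
def precision(tp, fp):
    """Formula (3): share of predicted defects that are real defects."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Formula (4): share of real defects that were detected."""
    return tp / (tp + fn)

def mean_average_precision(ap_per_class):
    """Formula (5): mean of the per-class AP values, expressed as a percentage."""
    return sum(ap_per_class) / len(ap_per_class) * 100

# Hypothetical counts and AP values, for illustration only
tp, fp, fn = 90, 10, 30
ap = {"Normal": 0.75, "Damage": 0.5, "Rot": 1.0, "Worm-eaten": 0.75}

p = precision(tp, fp)
r = recall(tp, fn)
m = mean_average_precision(list(ap.values()))
print(p, r, m)   # 0.9 0.75 75.0
```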

3.3. Generalization Analysis Between Algorithms Before and After Enhancement

The training outcomes of the original YOLOv5s algorithm and the enhanced Flaw-YOLOv5s model are compared in Figure 10, both conducted under identical experimental conditions and parameters. As depicted in the figure, the Flaw-YOLOv5s model demonstrates a higher mAP in detecting surface defects on potatoes, showcasing a significant advantage.

Figure 11 presents a visual comparison of detection results between the baseline and improved models on different samples, highlighting the improved adaptability of the algorithm. The results show that the Flaw-YOLOv5s algorithm not only detects the features of potato surface defects but also identifies surface defects in edge regions more accurately, especially features that are difficult to distinguish. For instance, in Figure 11a, YOLOv5s shows clear missed detections, whereas the Flaw-YOLOv5s model successfully identifies the surface defects associated with rot. Additionally, in Figure 11b,c, the potato features in the edge regions are not accurately detected by the YOLOv5s model. In comparison, the Flaw-YOLOv5s model detects subtle defect features and accurately classifies different kinds of potato defects, achieving superior detection accuracy.

3.4. Ablation Experiments

To assess the contribution of each improvement to potato defect detection, we conducted ablation experiments on five models under identical experimental parameters [37,38] and analyzed the results in detail using the evaluation metrics described in Section 3.2. The five models were: the original YOLOv5s network (YOLOv5s); YOLOv5s with the new CSPC structure (YOLOv5s-C); the model further integrated with the DWConv module (YOLOv5s-DC); the model further combined with the SPPELAN module (YOLOv5s-DS); and the improved Flaw-YOLOv5s model. Each improvement focused on increasing model precision and decreasing model parameters. To show the impact of the individual modules, Table 1 lists the ablation results for each metric.

Table 1 shows that, compared to the YOLOv5s model, incorporating the new CSPC structure (YOLOv5s-C) improved precision (P) by 0.7 percentage points and recall (R) by 2.5 percentage points. Moreover, mAP improved by 1.2 percentage points over the original architecture. These data suggest that the CSPC structure effectively strengthens the spatial feature extraction capacity within the backbone network of YOLOv5s. Notably, the structural optimization also reduced model complexity: the parameter count and memory footprint decreased by 0.89 M and 1.3 MB, respectively, and the computation cost fell by 1.2 GFLOPs. This demonstrates that the CSPC structure replaces the redundant components in C3 with lightweight PConv structures on specific channels, reducing computational resource consumption and improving the performance of the enhanced model.

Compared to YOLOv5s-C, YOLOv5s-DC further reduces parameters by optimizing the Conv module. With this improvement, the parameter count and weight size are reduced by 1.15 M and 2.2 MB, respectively, and the model’s mAP increases by 0.3 percentage points. This indicates that integrating the DWConv module reduces parameters and model size while also improving model performance. The YOLOv5s-DS architecture further improves detection performance, achieving a gain of 1.7 percentage points in precision and 2.7 percentage points in recall compared to the baseline YOLOv5s model, which indicates superior target recognition and localization capabilities. Notably, the optimized framework reduces computational overhead through parameter compression, with model complexity decreasing by 1.5 M parameters and memory footprint contracting by 1.4 MB. The structural optimization achieved through integrating the SPPELAN module is identified as the principal factor driving the enhanced model performance. Finally, the Flaw-YOLOv5s algorithm reached a precision of 94.6% and an mAP of 95.6%. As a result, the model demonstrated higher accuracy while maintaining a lightweight network design.

Figure 12 compares the effects of the ablation experiments on the improved model. The training trends and outcomes for each model indicate rapid growth during the initial 50 rounds, followed by a plateau and gradual stabilization by the end of the 200 training iterations, demonstrating good convergence without signs of overfitting. Moreover, compared with the other models, the enhanced Flaw-YOLOv5s model achieves better detection performance.

3.5. Experimental Results Among Different Surface Defect Detection Models

To verify the practicality of the Flaw-YOLOv5s model, a detailed comparison was conducted between the enhanced Flaw-YOLOv5s model and several recent mainstream detection algorithms. The models compared include YOLOv3, YOLOv3-tiny, YOLOv5s, YOLOv6s, YOLOv8s, and the Flaw-YOLOv5s model. The experimental results are given in Table 2.

As shown in Table 2, the enhanced Flaw-YOLOv5s model achieved an mAP of 95.6%, an improvement of 4.0, 7.2, 1.6, 1.5, and 0.1 percentage points over YOLOv3, YOLOv3-tiny, YOLOv5s, YOLOv6s, and YOLOv8s, respectively. Relative to the YOLOv5s framework, mAP and precision improved by 1.6 and 1.8 percentage points, respectively. With only 4.33 million parameters, this lightweight and efficient model satisfies the requirements for detecting surface defects in potatoes. Furthermore, Flaw-YOLOv5s has the fewest parameters, the lowest FLOPs, and the smallest weight file of the six models in Table 2, confirming the practicality of its improvements. In summary, the Flaw-YOLOv5s model detects potato surface defects with high precision, making it especially suitable for deployment on mobile devices.
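The comparison with the other detectors can be reproduced from Table 2 with a few lines of Python. This is only an illustrative sketch: the rows are transcribed from Table 2 (using the model names from the running text), and `TABLE_2`, `map_gain`, and `lightest` are hypothetical names introduced here.

```python
# Rows transcribed from Table 2:
# (P %, R %, mAP %, params in M, FLOPs in G, weights in MB)
TABLE_2 = {
    "YOLOv3":       (92.3, 84.5, 91.6, 61.51, 154.6, 123.4),
    "YOLOv3-tiny":  (92.0, 82.9, 88.4, 8.08, 18.4, 16.3),
    "YOLOv5s":      (92.8, 88.3, 94.0, 7.02, 15.8, 13.7),
    "YOLOv6s":      (94.3, 88.1, 94.1, 16.31, 44.2, 32.9),
    "YOLOv8s":      (94.9, 91.3, 95.5, 11.14, 28.6, 22.5),
    "Flaw-YOLOv5s": (94.6, 91.1, 95.6, 4.33, 13.8, 8.9),
}
MAP, PARAMS, FLOPS, WEIGHTS = 2, 3, 4, 5  # column indices

ours = TABLE_2["Flaw-YOLOv5s"]

# mAP gain of Flaw-YOLOv5s over each other model, in percentage points.
map_gain = {
    name: round(ours[MAP] - row[MAP], 1)
    for name, row in TABLE_2.items()
    if name != "Flaw-YOLOv5s"
}

# Model with the smallest value in each cost column.
lightest = {
    label: min(TABLE_2, key=lambda name: TABLE_2[name][col])
    for label, col in (("params", PARAMS), ("flops", FLOPS), ("weights", WEIGHTS))
}

# Relative parameter reduction with respect to the YOLOv5s baseline, in percent.
param_reduction_pct = 100 * (TABLE_2["YOLOv5s"][PARAMS] - ours[PARAMS]) / TABLE_2["YOLOv5s"][PARAMS]

print(map_gain)
print(lightest)
print(f"{param_reduction_pct:.1f}% fewer parameters than YOLOv5s")
```

The `lightest` dictionary checks the lightweight claim column by column instead of relying on a single cost measure.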

To compare the models more intuitively, the Flaw-YOLOv5s model and the other detection models were evaluated on a randomly selected set of potato images. The results are presented in Figure 13, where the differences between the models are evident. In subfigures a and c of Figure 13, the Flaw-YOLOv5s model accurately identifies the surface defect features, whereas other detection models misclassify potato skin texture as surface defects, producing false positives. Subfigures b and d of Figure 13 show that the YOLOv3, YOLOv5s, YOLOv6s, and YOLOv8s models also miss detections, failing to capture subtle defect features. Overall, while conspicuous potato defects are generally detected by all models, the other models have limited ability to identify less conspicuous defects, which may restrict their use in practical potato sorting tasks. The Flaw-YOLOv5s model, enhanced by integrating the PConv module, extracts fine defect features more effectively, addressing these limitations and meeting the requirements of current potato surface defect detection applications. In summary, the enhanced Flaw-YOLOv5s balances lightweight design with high detection precision, making it well suited to real-world applications.

4. Conclusions

This study presents an algorithm for detecting surface defects in potatoes, based on an enhanced version of the YOLOv5s model, to improve defect detection accuracy on mobile devices. Firstly, the Conv module of the baseline network was replaced with DWConv, which consists of a depthwise convolution followed by a pointwise convolution. This reduces the parameter count, an important consideration for deployment on mobile devices. Secondly, the SPPF in the YOLOv5s network was replaced with SPPELAN, which combines SPP and ELAN to enable multi-scale pooling and feature extraction, enhancing the algorithm’s capacity to detect small targets such as potato defects. Finally, a new structure, CSPC, built on PConv, was proposed to replace C3 in the YOLOv5s network, reducing redundant computations and making the model more lightweight.
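The parameter savings attributed to DWConv and PConv follow from simple weight counts. The sketch below compares a standard convolution, a depthwise separable convolution, and a partial convolution applied to a quarter of the channels. The layer size (128 channels, 3 × 3 kernel) and the 1/4 channel ratio are illustrative assumptions, not the configuration of the Flaw-YOLOv5s layers, and the function names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def dwconv_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def pconv_params(channels, k, ratio=0.25):
    """Partial convolution: a standard k x k convolution on a fraction of channels.

    The remaining channels are passed through unchanged and add no weights.
    The default ratio of 1/4 is an assumption made for this example.
    """
    c_p = int(channels * ratio)
    return k * k * c_p * c_p


# Illustrative layer size (assumed, not taken from the paper).
C, K = 128, 3
standard = conv_params(C, C, K)
separable = dwconv_params(C, C, K)
partial = pconv_params(C, K)

print(f"standard conv:       {standard} weights")
print(f"depthwise separable: {separable} weights ({standard / separable:.1f}x fewer)")
print(f"partial conv (1/4):  {partial} weights ({standard / partial:.1f}x fewer)")
```

For equal input and output channel counts, the depthwise separable count is k² · C + C² instead of k² · C², and a partial convolution on C/4 channels needs only 1/16 of the standard weights, which is why both substitutions reduce model size.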

To validate the effectiveness of the modified algorithm, generalization analysis of the models before and after improvement, ablation studies, and comparative tests against different models were performed. The experimental results demonstrate that the Flaw-YOLOv5s model achieved an mAP of 95.6% and a precision of 94.6%, improvements of 1.6 and 1.8 percentage points over the YOLOv5s framework, respectively. With only 4.33 million parameters, this lightweight and efficient model satisfies the requirements for detecting surface defects in potatoes. Furthermore, the developed framework compares favorably with current mainstream detection models, providing a reference for the implementation of potato surface defect detection technologies.

Currently, variations in surface defect morphology and characteristics across potato varieties and growth stages may pose challenges for the improved algorithm in practical applications. To address these issues, future work will focus on constructing a more diverse and representative dataset to improve the algorithm’s generalization and adaptability in complex agricultural environments. The network architecture and training strategies will be further optimized to improve sorting efficiency, reduce false detections, and lower operational costs, aligning the model more closely with the requirements of large-scale industrial sorting. Subsequent research will investigate the model’s transferability to surface defect detection in other crops. By integrating IoT technologies, an intelligent sorting system will be developed to support sustainable agricultural practices.

Author Contributions

Conceptualization, H.W. (Haitao Wu) and R.Z.; software, H.W. (Haitao Wu) and R.Z.; investigation, X.W. and H.W. (Hengren Wang); data curation, and resources, H.W. (Haitao Wu), R.Z. and X.W.; writing—original draft preparation, H.W. (Haitao Wu); writing—review and editing, H.W. (Haitao Wu), J.H. and S.L.; supervision, H.W. (Hengren Wang) and S.L.; project administration, X.W. and H.W. (Hengren Wang); funding acquisition, X.W. and J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (31972144) and the Shandong Province Major Agricultural Application Technology Innovation Project (SD2019NJ010).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data can be obtained from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Structural diagram of image acquisition box. 1. Camera mounting bracket. 2. Daheng industrial camera. 3. Annular light source. 4. Light source support frame. 5. Potatoes.

Figure 2. Partial sample images. 1. Worm-eaten. 2. Damaged. 3. Normal. 4. Rotten.

Figure 3. Structure diagram of DWConv.

Figure 4. Structure diagram of SPPF and SPPELAN.

Figure 5. The network structure of CSPC.

Figure 6. Schematic diagram of PConv architecture. The symbol * in the diagram represents the convolution operation.

Figure 7. YOLOv5s network structure.

Figure 8. Flaw-YOLOv5s network illustration.

Figure 9. Training results of the improved model.

Figure 10. Comparison of training results.

Figure 11. Comparison of algorithmic performance outcomes. In (a), the YOLOv5s model shows obvious missed detections, whereas the Flaw-YOLOv5s model successfully identifies the surface defects associated with rot. In (b,c), the potato features in the edge regions are not accurately detected by the YOLOv5s model.

Figure 12. Trends in training changes across models.

Figure 13. Experimental results among different surface defect detection models. In (a,c), the Flaw-YOLOv5s model successfully and accurately identifies surface defect features. In contrast, other detection models misclassify potato skin texture as surface defects, leading to occasional errors due to false positives. (b,d) highlight that the YOLOv3, YOLOv5s, YOLOv6s, and YOLOv8s models also exhibit missed detection issues, where subtle potato defect features are not accurately captured.

Table 1. Flaw-YOLOv5s model ablation experimental results.

Model	P/%	R/%	mAP/%	Params (M)	FLOPs (G)	Weights/MB
YOLOv5s	92.8	88.3	94.0	7.02	15.8	13.7
YOLOv5s-C	93.5	90.8	95.2	6.13	14.6	12.4
YOLOv5s-DC	93.0	91.3	95.5	4.98	14.8	10.2
YOLOv5s-DS	94.5	91.0	95.0	6.02	15.3	12.3
Flaw-YOLOv5s	94.6	91.1	95.6	4.33	13.8	8.9

Table 2. Experimental results with surface defect detection models.

Model	P/%	R/%	mAP/%	Params (M)	FLOPs (G)	Weights/MB
YOLOv3	92.3	84.5	91.6	61.51	154.6	123.4
YOLOv3-tiny	92.0	82.9	88.4	8.08	18.4	16.3
YOLOv5s	92.8	88.3	94.0	7.02	15.8	13.7
YOLOv6s	94.3	88.1	94.1	16.31	44.2	32.9
YOLOv8s	94.9	91.3	95.5	11.14	28.6	22.5
Flaw-YOLOv5s	94.6	91.1	95.6	4.33	13.8	8.9

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
