Intelligent Vision System with Pruning and Web Interface for Real-Time Defect Detection on African Plum Surfaces
Abstract
1. Introduction
- Developed models based on YOLOv9, YOLOv8, YOLOv5, Faster R-CNN, Mask R-CNN, VGG-16, DenseNet-121, MobileNet, and ResNet for African plum quality assessment;
- Implemented pruning techniques to optimize the YOLOv9, YOLOv8, YOLOv5, MobileNet, and ResNet models, yielding more efficient, computationally lightweight variants (a minimal pruning sketch follows this list);
- Collected a new labeled dataset of 2892 African plum samples, the first of its kind for this fruit crop (see Figure 1).
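As a concrete illustration of the pruning contribution above, the sketch below applies magnitude-based (L1) unstructured pruning to a model's convolutional and fully connected layers at the ratios evaluated later (10/20/30%). The paper does not specify its exact pruning implementation; PyTorch's `torch.nn.utils.prune` utilities are one plausible realization, not the authors' confirmed method.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_model(model: nn.Module, amount: float) -> nn.Module:
    """Zero out the `amount` fraction of smallest-magnitude weights per layer."""
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the sparsity into the weights
    return model

# e.g., sweep the ratios reported in the results tables:
# for amount in (0.10, 0.20, 0.30): evaluate(prune_model(load_model(), amount))
```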
2. Related Works on Plum
3. Data Collection and Dataset Description
4. Model Architecture
4.1. Model and Technique Descriptions
- You Only Look Once (YOLO): YOLO frames object detection as a single-stage regression problem, directly predicting bounding boxes and class probabilities in one pass [39]. We experiment with YOLOv5, YOLOv8, and YOLOv9, which build on smaller, more efficient backbone networks such as CSPDarknet53 compared to earlier YOLO variants. These models divide the image into a grid and associate each grid cell with bounding box data. YOLOv8 and YOLOv9 improve accuracy through an optimized neck architecture that enhances the flow of contextual information between the backbone and prediction heads [17]. YOLOv5's architecture is shown in Figure 2 (a minimal inference sketch follows this list).
- Faster R-CNN: Faster R-CNN is a two-stage detector that uses a Region Proposal Network (RPN) to propose regions of interest (RoIs), which are then classified and refined [40]. It employs a Region-of-Interest Pooling (RoIPool) layer to extract fixed-size feature maps from the backbone's feature maps for each candidate box; its predecessor, Fast R-CNN, applied the same RoIPool stage to externally computed proposals rather than a learned RPN.
- Mask R-CNN: Building on Faster R-CNN, Mask R-CNN introduces a parallel branch for predicting segmentation masks on each RoI, in addition to bounding boxes and class probabilities [20]. It utilizes a mask prediction branch with a Fully Convolutional Network (FCN) to predict a binary mask for each RoI. This per-pixel segmentation ability enables instance segmentation tasks alongside object detection.
- DenseNet-121: DenseNet-121 is a widely used convolutional neural network model that features densely connected layers, which improve gradient flow and reduce the number of parameters required compared to traditional architectures. Although it is primarily a classification model, it has been integrated into object detection frameworks like Mask R-CNN to enhance feature extraction and improve performance in complex object detection tasks [22].
- VGG16: VGG16 is a widely adopted CNN architecture that has shown strong performance in image classification and, as a feature-extraction backbone, in object detection tasks [21]. Its deep network structure and large receptive field help it capture and represent complex visual patterns.
- MobileNet: MobileNet, introduced by Howard et al. [23], is a lightweight CNN architecture built on depthwise separable convolutions, which factorize a standard convolution into a depthwise filtering step followed by a 1 × 1 pointwise combination. This design sharply reduces the parameter count and computational cost, enabling efficient training and inference on resource-constrained devices.
- ResNet: ResNet, proposed by He et al. [24], addresses the degradation problem in deep neural networks by introducing residual connections. These skip connections allow the gradients to flow more easily during training, enabling the training of very deep networks. ResNet has achieved state-of-the-art performance in various computer vision tasks, including image classification and object detection.
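To make the detection pipeline concrete, here is a minimal single-image inference sketch using the ultralytics package; `plum_best.pt` is a hypothetical fine-tuned weights file and the class names are assumptions, not artifacts shipped with this paper.

```python
from ultralytics import YOLO

model = YOLO("plum_best.pt")                    # hypothetical fine-tuned weights
results = model.predict("plum.jpg", conf=0.25)  # 0.25 confidence threshold
for box in results[0].boxes:
    label = model.names[int(box.cls)]           # e.g., "good" or "defective"
    print(label, float(box.conf), box.xyxy[0].tolist())
```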
4.2. Key Features
- YOLO: YOLO models provide real-time object detection capabilities due to their single-stage regression approach and optimized architecture.
- Faster R-CNN: The two-stage design of Faster R-CNN, with the RPN and RoIPool layer, enables accurate localization and classification of objects in images.
- Mask R-CNN: In addition to bounding boxes and class probabilities, Mask R-CNN introduces per-pixel segmentation to enable instance-level object detection and segmentation.
- DenseNet-121: DenseNet-121 is a convolutional neural network known for its densely connected architecture, in which each layer receives the feature maps of all preceding layers within a dense block, facilitating better gradient flow and feature reuse.
- VGG16: With its deep network structure and large receptive field, VGG16 has demonstrated strong performance in previous image classification studies.
- MobileNet: MobileNet's depthwise separable convolutions allow it to extract visual features efficiently, delivering good performance with far fewer parameters.
- ResNet: ResNet’s residual connections address the degradation problem in deep networks, enabling the training of very deep architectures and achieving state-of-the-art performance in various computer vision tasks.
4.3. Supporting Evidence
4.4. Framework and Dataset
4.5. Application Relevance
5. Experimental Results and Analysis
5.1. Data Preprocessing and Augmentation
- Labeling for object detection models: The dataset of 2892 images was manually annotated using the Roboflow platform. Each image was labeled to delineate the regions corresponding to good and defective plums. Additionally, a background class was used to indicate areas where no fruit was present in the image (see Figure 4).
- Labeling for classification models: For the classification models, a simplified labeling approach was used. Two separate annotation files were created, one for good plums and one for defective plums. The images were labeled with their respective class, without the inclusion of a background class. This approach was suitable for the classification task performed by these models.
- Augmentation: To increase the diversity of the dataset and improve the models' generalization ability, online data augmentation was applied during training, including rotations, flips, zooms, and hue/saturation shifts. These augmentations introduced additional variation and strengthened the models' robustness to different imaging conditions (a minimal augmentation sketch follows this list).
- Data splitting: The dataset was split into three subsets: a training set comprising 70% of the data, a validation set comprising 20%, and a test set comprising the remaining 10%. The splitting was performed in a stratified manner to ensure a balanced distribution of good and defective plums in each subset (see the splitting sketch after this list).
- Image resizing: The input resolution for each model was selected according to its requirements and constraints. The YOLOv5, YOLOv8, and YOLOv9 models, designed for real-time object detection, used higher input resolutions (416 × 416, 800 × 800, and 640 × 640, respectively) to capture more detailed visual information and better detect small objects. The Mask R-CNN and Faster R-CNN models, used for instance segmentation and detection, likewise required 640 × 640 inputs to accurately delineate object boundaries and capture fine-grained details. In contrast, the classification models (VGG16, DenseNet-121, MobileNet, and ResNet), pretrained on ImageNet, used the standard 224 × 224 input size, which suffices for recognizing high-level visual features rather than performing detailed detection or segmentation.
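A minimal sketch of the online augmentation described above, using torchvision transforms; the specific magnitudes (rotation degrees, jitter strengths) are illustrative assumptions rather than the authors' exact settings, and the 224 × 224 crop matches the classification models.

```python
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomRotation(degrees=15),                # rotations (assumed magnitude)
    T.RandomHorizontalFlip(p=0.5),               # flips
    T.RandomVerticalFlip(p=0.5),
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),  # zoom via random crop + resize
    T.ColorJitter(hue=0.05, saturation=0.2),     # hue/saturation shifts
    T.ToTensor(),
])
```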
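And a sketch of the 70/20/10 stratified split, assuming a pandas DataFrame `df` with `path` and `label` columns (both names are illustrative):

```python
from sklearn.model_selection import train_test_split

# 70% train; stratifying keeps the good/defective ratio constant per subset.
train_df, rest_df = train_test_split(
    df, train_size=0.70, stratify=df["label"], random_state=42
)
# Split the remaining 30% into validation (20% overall) and test (10% overall).
val_df, test_df = train_test_split(
    rest_df, train_size=2 / 3, stratify=rest_df["label"], random_state=42
)
```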
5.2. Model Training
6. Evaluation and Results
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ajibesin, K.K. Dacryodes edulis (G. Don) HJ Lam: A review on its medicinal, phytochemical and economical properties. Res. J. Med. Plant 2011, 5, 32–41. [Google Scholar]
- Schreckenberg, K.; Degrande, A.; Mbosso, C.; Baboulé, Z.B.; Boyd, C.; Enyong, L.; Kanmegne, J.; Ngong, C. The social and economic importance of Dacryodes edulis (G. Don) HJ Lam in Southern Cameroon. For. Trees Livelihoods 2002, 12, 15–40. [Google Scholar] [CrossRef]
- Rimlinger, A.; Carrière, S.M.; Avana, M.L.; Nguegang, A.; Duminil, J. The influence of farmers’ strategies on local practices, knowledge, and varietal diversity of the safou tree (Dacryodes edulis) in Western Cameroon. Econ. Bot. 2019, 73, 249–264. [Google Scholar] [CrossRef]
- Swana, L.; Tsakem, B.; Tembu, J.V.; Teponno, R.B.; Folahan, J.T.; Kalinski, J.C.; Polyzois, A.; Kamatoa, G.; Sandjo, L.P.; Chamcheu, J.C.; et al. The Genus Dacryodes Vahl.: Ethnobotany, Phytochemistry and Biological Activities. Pharmaceuticals 2023, 16, 775. [Google Scholar] [CrossRef] [PubMed]
- Leakey, R.R.; Tientcheu Avana, M.L.; Awazi, N.P.; Assogbadjo, A.E.; Mabhaudhi, T.; Hendre, P.S.; Degrande, A.; Hlahla, S.; Manda, L. The future of food: Domestication and commercialization of indigenous food crops in Africa over the third decade (2012–2021). Sustainability 2022, 14, 2355. [Google Scholar] [CrossRef]
- Sarker, I.H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 160. [Google Scholar] [CrossRef]
- Zhou, Z.H. Machine Learning; Springer Nature: London, UK, 2021. [Google Scholar]
- Szeliski, R. Computer Vision: Algorithms and Applications; Springer Nature: London, UK, 2022. [Google Scholar]
- Szeliski, R. Concise Computer Vision. An Introduction into Theory and Algorithms; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]
- Apostolopoulos, I.D.; Tzani, M.; Aznaouridis, S.I. A General Machine Learning Model for Assessing Fruit Quality Using Deep Image Features. AI 2023, 4, 812–830. [Google Scholar] [CrossRef]
- Xiao, F.; Wang, H.; Xu, Y.; Zhang, R. Fruit Detection and Recognition Based on Deep Learning for Automatic Harvesting: An Overview and Review. Agronomy 2023, 13, 1625. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
- Xu, B.; Cui, X.; Ji, W.; Yuan, H.; Wang, J. Apple grading method design and implementation for automatic grader based on improved YOLOv5. Agriculture 2023, 13, 124. [Google Scholar] [CrossRef]
- Rahat, S.M.S.S.; Al Pitom, M.H.; Mahzabun, M.; Shamsuzzaman, M. Lemon Fruit Detection and Instance Segmentation in an Orchard Environment Using Mask R-CNN and YOLOv5. In Computer Vision and Image Analysis for Industry 4.0; CRC Press: Boca Raton, FL, USA, 2023; pp. 28–40. [Google Scholar]
- Mao, D.; Zhang, D.; Sun, H.; Wu, J.; Chen, J. Using filter pruning-based deep learning algorithm for the real-time fruit freshness detection with edge processors. J. Food Meas. Charact. 2024, 18, 1574–1591. [Google Scholar] [CrossRef]
- Solawetz, J. What Is YOLOv5? A Guide for Beginners; Roboflow: Des Moines, IA, USA, 2022. [Google Scholar]
- Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677. [Google Scholar] [CrossRef]
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Available online: https://shorturl.at/bShHi (accessed on 7 October 2024).
- YOLOV8 Running Instance. Available online: https://shorturl.at/hmrzF (accessed on 7 October 2024).
- Pachylobus edulis G.Don. Gen. Hist. 1832, 18, 89. Available online: https://powo.science.kew.org/taxon/urn:lsid:ipni.org:names:128214-1 (accessed on 7 October 2024).
- Chu, P.; Li, Z.; Lammers, K.; Lu, R.; Liu, X. Deep learning-based apple detection using a suppression mask R-CNN. Pattern Recognit. Lett. 2021, 147, 206–211. [Google Scholar] [CrossRef]
- Asriny, D.M.; Rani, S.; Hidayatullah, A.F. Orange fruit images classification using convolutional neural networks. IOP Conf. Ser. Mater. Sci. Eng. 2020, 803, 012020. [Google Scholar] [CrossRef]
- Lamb, N.; Chuah, M.C. A strawberry detection system using convolutional neural networks. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 2515–2520. [Google Scholar]
- Nithya, R.; Santhi, B.; Manikandan, R.; Rahimi, M.; Gandomi, A.H. Computer vision system for mango fruit defect detection using deep convolutional neural network. Foods 2022, 11, 3483. [Google Scholar] [CrossRef]
- Khan, A.I.; Quadri, S.M.K.; Banday, S.; Shah, J.L. Deep diagnosis: A real-time apple leaf disease detection system based on deep learning. Comput. Electron. Agric. 2022, 198, 107093. [Google Scholar] [CrossRef]
- Liu, X.; Li, G.; Chen, W.; Liu, B.; Chen, M.; Lu, S. Detection of dense Citrus fruits by combining coordinated attention and cross-scale connection with weighted feature fusion. Appl. Sci. 2022, 12, 6600. [Google Scholar] [CrossRef]
- Kusrini, K.; Suputa, S.; Setyanto, A.; Agastya, I.M.A.; Priantoro, H.; Pariyasto, S. A comparative study of mango fruit pest and disease recognition. TELKOMNIKA (Telecommun. Comput. Electron. Control) 2022, 20, 1264–1275. [Google Scholar] [CrossRef]
- Čakić, S.; Popović, T.; Krčo, S.; Nedić, D.; Babić, D. Developing object detection models for camera applications in smart poultry farms. In Proceedings of the 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS), Barcelona, Spain, 1–3 August 2022; pp. 1–5. [Google Scholar]
- Yumang, A.N.; Samilin, C.J.N.; Sinlao, J.C.P. Detection of Anthracnose on Mango Tree Leaf Using Convolutional Neural Network. In Proceedings of the 2023 15th International Conference on Computer and Automation Engineering (ICCAE), Sydney, Australia, 3–5 March 2023; pp. 220–224. [Google Scholar]
- Palakodati, S.S.S.; Chirra, V.R.R.; Yakobu, D.; Bulla, S. Fresh and Rotten Fruits Classification Using CNN and Transfer Learning. Rev. D’Intell. Artif. 2020, 34, 617–622. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
- Zhu, L.; Geng, X.; Li, Z.; Liu, C. Improving YOLOv5 with attention mechanism for detecting boulders from planetary images. Remote Sens. 2021, 13, 3776. [Google Scholar] [CrossRef]
- Lin, Q.; Ye, G.; Wang, J.; Liu, H. RoboFlow: A data-centric workflow management system for developing AI-enhanced Robots. Conf. Robot. Learn. 2022, 164, 1789–1794. [Google Scholar]
Model | Input Resolution | Batch Size | Optimizer | Training Epochs |
---|---|---|---|---|
YOLOv5 | 416 × 416 | 16 | Adam | 150 |
YOLOv8 | 800 × 800 | 16 | Adam | 80 |
YOLOv9 | 640 × 640 | 16 | Adam | 30 |
Mask R-CNN | 640 × 640 | 8 | SGD | 10,000 |
Faster R-CNN | 640 × 640 | 64 | SGD | 1500 |
VGG16 | 224 × 224 | 32 | Adam | 15 |
DenseNet-121 | 224 × 224 | 32 | SGD | 50 |
MobileNet | 224 × 224 | 32 | Adam | 40 |
ResNet | 224 × 224 | 32 | Adam | 16 |
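For reference, the YOLO rows of the table above translate directly into an ultralytics training call; `plum.yaml` is a hypothetical dataset descriptor, and only the resolution, batch size, optimizer, and epoch values come from the table (shown here for YOLOv9).

```python
from ultralytics import YOLO

model = YOLO("yolov9c.pt")  # pretrained checkpoint as the starting point
model.train(
    data="plum.yaml",   # hypothetical dataset config (paths + class names)
    imgsz=640,          # input resolution per the table
    batch=16,           # batch size per the table
    optimizer="Adam",   # optimizer per the table
    epochs=30,          # training epochs per the table
)
```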
Model | Precision (%) | Recall (%) | F1-Score (%) | mAP (%) |
---|---|---|---|---|
YOLOv5 | 80 | 85 | 82.5 | 89.5 |
YOLOv8 | 87 | 90 | 89 | 93.6 |
YOLOv9 | 85.9 | 90 | 87.9 | 93.1 |
Faster R-CNN | 84.8 | 86.4 | 85.6 | 84.8 |
Mask R-CNN | 61.3 | 68.2 | 64.6 | 61.3 |
Model | Precision (%) | Recall (%) | F1-Score (%) | Accuracy (%) |
---|---|---|---|---|
VGG16 | 78 | 80 | 79 | 91 |
DenseNet-121 | 80 | 82 | 81 | 86 |
MobileNet | 87 | 97 | 92 | 86 |
ResNet | 91 | 98 | 94 | 90 |
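The classification metrics reported above follow their standard definitions; a minimal sketch with scikit-learn (toy labels, 1 = defective, purely illustrative):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]   # ground-truth labels (toy data)
y_pred = [1, 0, 1, 0, 0, 1]   # model predictions (toy data)
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
```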
Model | Pruning (%) | mAP (%) |
---|---|---|
YOLOv5 | 0 | 89.5 |
YOLOv5 | 10 | 89.8 |
YOLOv5 | 20 | 89.1 |
YOLOv5 | 30 | 87.5 |
YOLOv8 | 0 | 93.6 |
YOLOv8 | 10 | 90.2 |
YOLOv8 | 20 | 81.0 |
YOLOv8 | 30 | 59.7 |
YOLOv9 | 0 | 93.1 |
YOLOv9 | 10 | 93.0 |
YOLOv9 | 20 | 92.4 |
YOLOv9 | 30 | 71.9 |
Model | Pruning (%) | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|---|
ResNet | 0 | 90.9 | 91.3 | 98.6 | 94.8 |
ResNet | 10 | 86.1 | 94.3 | 89.6 | 91.9 |
ResNet | 20 | 79.7 | 79.3 | 100 | 88.5 |
ResNet | 30 | 65.5 | 96.5 | 59.8 | 73.8 |
MobileNet | 0 | 86.1 | 87.3 | 97.5 | 92.1 |
MobileNet | 10 | 89.1 | 89.7 | 98.2 | 93.8 |
MobileNet | 20 | 88.2 | 89.4 | 97.5 | 93.3 |
MobileNet | 30 | 86.1 | 89.4 | 93.7 | 91.5 |