
Innovative Machine Learning and Image Processing Methodology for Enhanced Detection of Aleurothrixus Floccosus

by Manuel Alejandro Valderrama Solis 1,†, Javier Valenzuela Nina 1,†, German Alberto Echaiz Espinoza 2,*,†, Daniel Domingo Yanyachi Aco Cardenas 2,†, Juan Moises Mauricio Villanueva 3,†, Andrés Ortiz Salazar 4,† and Elmer Rolando Llanos Villarreal 5,†
1 Professional School of Engineering Telecommunications, Universidad Nacional de San Agustin de Arequipa, Arequipa 04002, Peru
2 Department of Engineering Electronics, Universidad Nacional de San Agustin de Arequipa, Arequipa 04002, Peru
3 Department of Electrical Engineering, Center for Alternative and Renewable Energies (CEAR-UFPB), Federal University of Paraíba, João Pessoa 58051-900, PB, Brazil
4 Department of Computer Engineering and Automation, Federal University of Rio Grande do Norte (DCA-UFRN), Natal 59072-970, RN, Brazil
5 Department of Natural Sciences, Mathematics, and Statistics, Federal Rural University of Semi-Arid (DCME-UFERSA), Mossoró 59625-900, RN, Brazil
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2025, 14(2), 358; https://doi.org/10.3390/electronics14020358
Submission received: 1 December 2024 / Revised: 5 January 2025 / Accepted: 14 January 2025 / Published: 17 January 2025

Abstract

This paper presents a methodology for detecting the pest Aleurothrixus floccosus in citrus crops in Pedregal de Arequipa, Peru. The study employs simple random sampling during image collection to minimize bias, alternating among different citrus trees when extracting leaves. Image processing techniques, including noise reduction, edge smoothing, and segmentation, are applied for pest detection. Machine learning algorithms are then used to classify the images, culminating in a robust detection methodology. A dataset of 1200 images was analyzed during the study.

1. Introduction

Cultivating citrus fruits in regions with high humidity and temperatures, like southern Arequipa, creates an ideal environment for the proliferation of insects and bacteria, which can lead to plant health issues. The presence of numerous citrus diseases and pests, their long damage cycles, and the challenges in control seriously affect citrus yield and quality [1]. Advances in technology and industry have helped mankind to increase the quality of life and life expectancy. This includes delegating laborious processes traditionally performed by hand to machines, which can perform these tasks more efficiently [2].
Plants become more prone to disease due to the many pathogens surrounding them. Pest and disease attacks are significant causes of reduced crop yields. Accurate and timely prediction of plant diseases helps apply appropriate prevention and protection measures, thereby improving yield quality and increasing crop productivity. Plant diseases are detected through various symptoms such as lesions, color changes, damaged leaves, stem damage, and abnormal growth of the stem, leaf, bud, flower, and/or root. In addition, leaves show symptoms such as spotting, dryness, and premature drooping as indicators of disease [3].
Similarly, early classification of plant diseases will help farmers use the best strategies to combat them. Using sensors, machine vision, AI models, and robots allows harvesting processes to be performed on behalf of workers with greater accuracy and speed. In addition, it helps to reduce the crop wastage in the field that occurs with traditional harvesting methods. Finally, early detection of pests in plants helps to reduce economic losses, since prevention protocols can be executed when pests are detected at an early stage [4].
The publications reviewed during our research on precision agriculture and crop pest detection are listed below. Note that several of these papers use neural networks because of their high accuracy in image classification and detection.
First, we examine agricultural monitoring systems with IoT and machine learning.
In [5], the authors proposed a real-time recommendation system for farmers using sensors, IoT (Internet of Things) devices, and machine learning algorithms. The proposed architecture consists of three layers: the data acquisition layer, which continuously monitors water level, temperature, humidity, light intensity, and rainfall; the data processing layer, in which a master node receives information from the previous layer and sends it via Wi-Fi to the cloud; and the visualization and analysis layer, which preprocesses the data on the cloud server with XGBoost and sends recommendations in real time, completing the optimization of the system in the cloud.
In [6], the authors proposed a model for disease prediction in apple orchards in the Kashmir valley, combining data analysis methods, machine learning algorithms such as linear regression, and an IoT system based on a WSN (wireless sensor network) with additional sensors and ZigBee communication. Finally, they surveyed farmers to examine the different challenges encountered while incorporating technology into agriculture.
Second, we examine pest and disease detection in plants.
In [7], they analyzed pest detection in plants using images and two machine learning models: Support Vector Machine (SVM) and AlexNet deep learning. The study considered three key factors: computational power, the amount of input data, and model architecture. The results showed 92.67% accuracy for SVM and 97% accuracy for the neural network.
In [1], the authors developed a methodology for detecting pests and diseases in citrus using Self-Attention YOLOV8 with hyperspectral and multispectral imaging techniques to analyze different wavelengths, focusing on artificial vision technology and taking into account characteristics such as texture, color, and shape. However, convolutional neural networks such as YOLOV8 require a large variety of training data, and it is more difficult to understand how they reach their conclusions compared with more transparent conventional methods.
In [8], they reviewed pest detection and classification using deep learning techniques. The study covered different neural network models, such as CNNs (Convolutional Neural Networks), and their application in agriculture to improve the accuracy of disease identification. It also compared these approaches with conventional methods and highlighted the advantages and limitations of deep learning in this emerging field.
In [9], they developed a methodology for early detection of crop pests using CNN-type artificial neural networks. This methodology considers several relevant attributes of plant images. Among the main results, the high accuracy in identifying pests in the early stages of infestation is highlighted. However, the paper also discusses some limitations and areas for future improvements in the approach used.
In [10], they present an approach for disease detection in citrus fruits and leaves using DenseNet (Dense convolutional network), a deep convolutional neural network architecture. DenseNet takes advantage of dense connectivity between layers to improve detection efficiency and accuracy. The study demonstrates that this model is effective in identifying multiple citrus diseases, outperforming other techniques in terms of accuracy and speed, and offering a valuable tool for precision agriculture.
Next, we examine citrus pest-specific studies.
In [11], the authors studied the population fluctuations of the citrus woolly whitefly in a 4-hectare orchard in the citrus region of Chlef. The pest population was first sampled every two weeks from July 2013 to June 2014; the infestation rate was then calculated with the Townsend–Heuberger formula, followed by the rate of parasitism by C. noacki, the phenology of the orange tree in relation to climatic data, and statistical analysis. These stages included parasite counts per square centimeter, analysis of the orange tree's growth cycle, and data evaluation using ANOVA (Analysis of Variance) and GLM (General Linear Model) methods. The temporal variation results show the evolution of the abundance indexes, with three abundance peaks indicating the periods of fluctuation of the pest.
Finally, we present new contributions in citrus pest detection.
The objective of this article is to study a new proposal for machine learning and image processing techniques for the early detection of diseases in citrus plant leaves, using methods such as filters, transformations, and segmentation. For the study, a digital camera was used to capture images of the leaves in the Pedregal region of Arequipa. Additionally, around 1200 images of citrus leaves were used to obtain the results.
Among the main contributions of this article is the proposal of a methodology for detecting Aleurothrixus floccosus in citrus plants, using a set of leaf images for farmers or agricultural studies.
This article is divided into five sections. Section 1 presents the Introduction, detailing the preliminaries of the work. Section 2 presents the Background definitions. Section 3 presents the Materials and Methods. Section 4 presents the Results and Discussion. Finally, Section 5 presents the Conclusions, summarizing the achievement of the article’s objectives.

2. Background Definitions

2.1. Grayscale

The RGB image is transformed into grayscale values due to its ease of processing and manipulation, as it simplifies working in this format by converting a color pixel (with red, green and blue components) into a single grayscale value. This process eliminates chromatic components while retaining only the brightness, facilitating analysis in the proposed methodology.

2.2. Equalization

The equalization process aims to obtain, from the original histogram, a new histogram with a uniform distribution of the different intensity levels. By transforming any continuous distribution into a uniform distribution, the amount of information it contains is maximized. Although, as already noted, it is impossible to increase the amount of information in the discrete case, histogram equalization improves the visual quality of saturated images. This effect occurs because the intensity values of the saturated areas are changed, areas in which some objects originally cannot be adequately distinguished when visually inspecting the image [12]. In short, equalization redistributes the pixel intensities of an image evenly along the histogram.

2.3. Median Filter

The median filter selects the median value from each pixel’s neighborhood. Median values can be computed in expected linear time using randomized select algorithms and incremental variants. Since the shot noise value usually lies well outside the true values, the median filter can filter away such bad pixels [13].
It is mainly used to remove impulsive or salt-and-pepper noise; unlike linear filters, it preserves edges well, which makes it ideal for image processing.

2.4. Salt and Pepper Noise

Salt-and-pepper noise is one of the types of image noise; it usually appears as pixels at the minimum extreme (0) or the maximum extreme (255) of an 8-bit grayscale image [14]. It can arise for different reasons, such as bad camera calibration or unwanted energy. This type of noise is usually identified by black and white dots scattered throughout the image.
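As a brief, hedged illustration (not code from the paper), the following sketch injects salt-and-pepper noise into an 8-bit grayscale image; the function name and the noise fraction `amount` are our own choices:

```python
import numpy as np

def add_salt_and_pepper(gray, amount=0.02, seed=None):
    """Corrupt a fraction `amount` of pixels with the extreme values 0 or 255."""
    rng = np.random.default_rng(seed)
    noisy = gray.copy()
    u = rng.random(gray.shape)
    noisy[u < amount / 2] = 0          # "pepper": minimum extreme
    noisy[u > 1 - amount / 2] = 255    # "salt": maximum extreme
    return noisy
```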

2.5. GLCM Matrix

GLCM (gray-level co-occurrence matrices) is a second-order statistical method that estimates the frequency of pixel pairs having the same gray levels in an image and applies additional knowledge gained from spatial relationships [15]. The GLCM is calculated for a selected pair of distance and angle. For a specific distance and angle pair, the relative recurrences of that pair are calculated for each pixel and its neighbors [16]. In conclusion, the GLCM is a statistical method that is easily applicable to images to obtain texture-related characteristics.
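As a minimal sketch of this computation (assuming scikit-image rather than the paper's own tooling), the snippet below builds a GLCM for one distance/angle pair and reads off texture metrics used later in the paper; the synthetic image and the choice of d = 1, θ = 0° are illustrative:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Synthetic 8-level grayscale image standing in for a quantized leaf photo.
img = (np.random.default_rng(0).integers(0, 256, (64, 64)) // 32).astype(np.uint8)

# Relative co-occurrence frequencies of gray-level pairs at distance 1, angle 0°.
glcm = graycomatrix(img, distances=[1], angles=[0], levels=8,
                    symmetric=True, normed=True)

texture = {prop: graycoprops(glcm, prop)[0, 0]
           for prop in ('contrast', 'homogeneity', 'energy', 'correlation')}
print(texture)
```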

2.6. Data Augmentation

Data augmentation is one of the effective regularization techniques that aims to prevent the overfitting of a network and increase its generalization performance. Data augmentation techniques create richer training data, transformed from the original, so that the trained network obtains a higher generalization performance on unseen test data [17]. It is a strategy for increasing the number of records in a dataset when there are not enough samples for the machine learning algorithm to learn adequately.
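The paper does not list the exact transformations used, so the following is only a plausible sketch: simple label-preserving flips and rotations that multiply the number of leaf images per class:

```python
import numpy as np

def augment(image):
    """Return simple label-preserving variants of a leaf image."""
    return [image,
            np.fliplr(image),    # horizontal flip
            np.flipud(image),    # vertical flip
            np.rot90(image, 1),  # rotate 90 degrees
            np.rot90(image, 3)]  # rotate 270 degrees
```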

3. Materials and Methods

3.1. Methodology

The methodology proposed in this article is summarized in the block diagram in Figure 1:
Breaking down the block diagram for better understanding, Figure 2, Figure 3, Figure 4 and Figure 5 represent the stages of the methodology used in this paper.
  • Input Image: We obtained images of citrus leaves affected by the pest through a digital camera.
    Figure 2. Methodology—Input.
  • Preprocessing: We transform the obtained RGB image, which is composed of three channels, into a single grayscale image for subsequent equalization. This process redistributes pixel intensities, improving contrast. Finally, we apply noise reduction using a median filter to remove unwanted noise from the image.
    Figure 3. Methodology—Preprocessing.
  • Processing: We apply dimensionality reduction to simplify the dataset while preserving as much relevant information as possible, data augmentation to balance the obtained images, and finally classification using algorithms such as SVM, Decision Tree, and XGBoost.
    Figure 4. Methodology—Processing.
  • Output Image: We obtain an image and classify it as either pest-affected or healthy.
    Figure 5. Methodology—Output.

3.2. Data Description

The obtained sample consists of 1200 images of leaves from a series of citrus plants, primarily orange and lemon. These images were captured using a digital camera. Furthermore, the data was collected from production centers located in the Valle de Majes, Arequipa, Peru. The photographs show different stages of the plant when the pest infiltrates and develops, affecting the growth and production of the citrus plants.
These images are divided into three categories: Aleurothrixus floccosus, healthy leaves, and aphids. During the leaf collection process, a random sampling of various trees is conducted to minimize bias during the sample collection process.

3.3. Image Preprocessing

3.3.1. Image Type and Size

A digital camera was used to photograph the leaves in the Pedregal region of Arequipa. The type and size of the image are essential for the preprocessing stage. The format used for the images is BMP (bitmap), an uncompressed format that preserves the full image information, in contrast with formats such as JPG, which is compressed and loses image information (although this also gives a much smaller file size). The images are rescaled to 1000 × 473 px.

3.3.2. Transform

The RGB image is transformed into grayscale because it is easier to process and manipulate, making it easier to work in this format. When converting a color pixel with components r, g, and b to a grayscale value gv, instead of taking the average of the RGB values, the following conversion formula is applied to calculate gv [18]:
gv = f(r, g, b) = round(0.299r + 0.587g + 0.114b)
where f(.) is the color-to-grayscale conversion function and round(x) rounds the value of x to an integer representing the grayscale value gv [19].
Figure 6 presents the original image and the grayscale image.
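A minimal sketch of this conversion in Python (NumPy assumed; equivalent in effect, up to rounding, to OpenCV's RGB-to-gray conversion):

```python
import numpy as np

def rgb_to_grayscale(rgb):
    """Apply gv = round(0.299 r + 0.587 g + 0.114 b) to every pixel."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    gv = np.round(0.299 * r + 0.587 * g + 0.114 * b)
    return gv.astype(np.uint8)
```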

3.3.3. Histogram Equalization with the CDF (Cumulative Distribution Function)

Equalization distributes the pixel intensities of an image evenly along the histogram, as shown in Figure 7. It is applied to the whole image set to adjust the contrast appropriately for each image. As a result, the captured images gain greater definition of the object of interest, since leaving the set of pictures unpreprocessed can interfere with feature extraction and image segmentation. The image equalization process involves spreading the pixel intensities over the different gray levels; it is a very common method for adjusting the contrast of an image and is performed on all captured images. Figure 7 presents the original histogram and the equalized histogram.
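A compact sketch of CDF-based equalization (our own implementation of the standard mapping; OpenCV's cv2.equalizeHist computes the same result):

```python
import numpy as np

def equalize(gray):
    """Remap intensities through the normalized CDF of the image histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Standard equalization mapping: stretch the CDF to the full 0-255 range.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]
```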

3.3.4. Noise Reduction

Noise in images refers to small distortions in pixel intensity, as shown in Figure 8. This noise can affect the results of various image processing techniques that are sensitive to the disturbance, so it is essential to eliminate or reduce it. In this context, “salt and pepper” noise, which is common in digital images, can be identified. To reduce this noise, a median filter with a 5 × 5 kernel is applied to the whole image set. The filter is passed over the entire image, effectively eliminating or reducing the white spots in both the outer and inner areas of the leaf, as shown in Figure 8. Additionally, in Figure 8, the effect of the median filter on the obtained images can be observed: the first two images at the top show the originals without the filter, and the same images appear at the bottom after applying the median filter.
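In code, this step reduces to a single OpenCV call; the 5 × 5 kernel matches the one described above, and the file name is hypothetical:

```python
import cv2

gray = cv2.imread('leaf.bmp', cv2.IMREAD_GRAYSCALE)  # hypothetical file name
denoised = cv2.medianBlur(gray, 5)                   # median filter, 5 x 5 kernel
```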

3.4. Image Processing

As seen in Table 1, each GLCM cell shows how many times the corresponding gray-level pair occurs in the original matrix for the defined spatial relationship (θ = 0°).

3.4.1. Feature Extraction

In this stage, plant leaf images were used to identify the presence of Aleurothrixus floccosus through the analysis of radiomic features. For feature extraction, the PyRadiomics system was used, a software package specialized in obtaining radiomic data from medical images [20]. In this study, PyRadiomics and the wavelet transform were adapted to analyze leaf images to detect textural patterns associated with the presence of Aleurothrixus floccosus. The configuration and installation of PyRadiomics were carried out through its official GitHub repository (version 3.1.0, https://github.com/Radiomics/pyradiomics, accessed on 17 May 2023), using the Python programming language. Among the radiomic features extracted, those derived from the gray-level co-occurrence matrix (GLCM) play a prominent role, as they allow the examination of leaf texture by describing the spatial relationships between the pixel intensity levels. These features include metrics such as contrast, homogeneity, energy, entropy, and correlation, which are essential for identifying textural patterns and subtle variations in the leaves that may be related to the presence of Aleurothrixus floccosus. A total of 29 radiomic features were extracted, which were subsequently analyzed to classify and detect Aleurothrixus floccosus based on the textures and patterns identified in the regions of interest (ROI).
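A hedged sketch of such an extraction with PyRadiomics is shown below; the file names and the force2D setting are assumptions, not the paper's exact configuration:

```python
import SimpleITK as sitk
from radiomics import featureextractor

# Restrict extraction to GLCM features on original and wavelet-filtered images.
extractor = featureextractor.RadiomicsFeatureExtractor(force2D=True)  # 2D leaf photos
extractor.disableAllFeatures()
extractor.enableFeatureClassByName('glcm')   # contrast, homogeneity, energy, ...
extractor.enableImageTypeByName('Original')
extractor.enableImageTypeByName('Wavelet')

image = sitk.ReadImage('leaf.bmp')       # hypothetical file names
mask = sitk.ReadImage('leaf_roi.bmp')    # binary ROI covering the leaf
features = extractor.execute(image, mask)
glcm_features = {k: v for k, v in features.items() if 'glcm' in k}
```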

3.4.2. Dimensionality Reduction

Principal Component Analysis (PCA) generates new components that retain the most valuable information from the features by capturing high variance [15], as shown in Table 2. The first eight components account for the highest percentage of variance in the dataset. This technique not only simplifies the dataset structure by reducing its dimensionality but also preserves its most critical information. A dimensionality reduction algorithm was employed to select the most representative principal components. The analysis resulted in eight principal components that condense the key characteristics of the dataset, thereby facilitating interpretation, as illustrated in Figure 9.
Additionally, Table 3 presents the dataset components, with a total of eight components, where Energy has the highest contribution to the dataset.
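A minimal PCA sketch with scikit-learn (the random matrix is a placeholder for the real 1200 × 29 radiomic feature matrix):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).random((1200, 29))   # placeholder feature matrix
X_std = StandardScaler().fit_transform(X)         # standardize before PCA

pca = PCA(n_components=8)                         # eight components, as in Table 2
X_reduced = pca.fit_transform(X_std)
print(pca.explained_variance_ratio_)              # per-component variance share
```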

3.4.3. Classification Algorithms

At this stage, corresponding to the classification of the dataset, the algorithms SVM (Support Vector Machines), Decision Tree, and XGBoost will be evaluated. The dataset will be divided into 70% for training and 30% for testing.
SVM
For the classification of the dataset, the SVM (Support Vector Machines) classification algorithm is used. SVM performs the classification by detecting hyperplanes that separate the different classes in the dataset. These hyperplanes are the decision boundaries that maximize the margin between data points of different classes. In essence, the algorithm finds the optimal hyperplane that best divides the data, ensuring that the margin, or distance, between the hyperplane and the closest data points from each class (called support vectors) is as large as possible. The SVM algorithm is executed with the following configuration. The parameter C, with values 0.1, 1, and 10, regulates the trade-off between a wider decision boundary and classification errors, where smaller values prioritize generalization and larger values aim for higher classification accuracy. The kernel, set to ‘rbf’ (Radial Basis Function), enables handling non-linear relationships by transforming the data into a higher-dimensional space to facilitate separation. Finally, gamma, with the values ‘scale’, 0.1, and 0.01, determines the influence of individual data points, where smaller values lead to a more generalized model and larger values result in a more specific model, while the ‘scale’ option automatically adjusts its value based on the dataset’s characteristics.
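A sketch of this grid search with scikit-learn follows; the placeholder data, the cross-validation setting, and the random seed are our assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((1200, 8))        # placeholder for the eight PCA components
y = rng.integers(0, 3, 1200)     # placeholder labels: pest / healthy / aphids

# 70/30 train/test split, as stated above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
                                                    random_state=42)

param_grid = {'C': [0.1, 1, 10],
              'kernel': ['rbf'],
              'gamma': ['scale', 0.1, 0.01]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```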
DECISION TREE
One of the algorithms also used for classification is the decision tree. The decision tree chooses its splits at each level based on metrics such as the Gini index. For the application of this algorithm to the obtained dataset, it is essential to divide the dataset into a training part and a test part, and the resulting accuracy is compared with that of the two other algorithms. In the decision tree model parameters, max_depth, with values 4, 5, and 10, controls the maximum depth of the tree, limiting its growth to prevent overfitting on complex data. The min_samples_split parameter, with values 2, 5, and 10, defines the minimum number of samples required to split a node, where higher values promote a more generalized model. Lastly, min_samples_leaf, with values 1, 2, and 5, sets the minimum number of samples needed at each leaf, helping to regulate the tree’s complexity and avoid excessive fitting to the training data.
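The corresponding grid for the decision tree, under the same assumptions (X_train and y_train as in the SVM sketch):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {'max_depth': [4, 5, 10],
              'min_samples_split': [2, 5, 10],
              'min_samples_leaf': [1, 2, 5]}
tree_search = GridSearchCV(DecisionTreeClassifier(criterion='gini'),
                           param_grid, cv=5)
tree_search.fit(X_train, y_train)          # X_train, y_train as in the SVM sketch
print(tree_search.best_params_, tree_search.score(X_test, y_test))
```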
XGBOOST
XGBoost is an algorithm based on the concept of boosting, which combines several decision trees built sequentially, where each model has the task of correcting the errors of the previous ones; the model creates several small trees and optimizes them progressively to improve the final accuracy. It is suitable for tasks where accuracy is crucial, which makes it valuable for farmers. Likewise, the XGBoost algorithm is applied to the obtained dataset in the same way as in the previous cases, dividing the dataset into 70 percent training and 30 percent test data.
During the tuning of the XGBoost model, the best results were achieved with the following hyperparameters: max_depth = 6, learning_rate = 0.1, and n_estimators = 300. With this configuration, the model reached a precision of 0.82 on the test set. These optimized parameters allowed the model to improve its performance, enhancing its generalization ability and boosting its predictive power on the data.
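With the reported best hyperparameters, the XGBoost model reduces to the following sketch (data split as in the SVM example; the xgboost package is assumed):

```python
from xgboost import XGBClassifier

# Best hyperparameters reported in the text.
model = XGBClassifier(max_depth=6, learning_rate=0.1, n_estimators=300)
model.fit(X_train, y_train)                # X_train, y_train as in the SVM sketch
print(model.score(X_test, y_test))         # ~0.82 on the paper's real test set
```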

3.4.4. Segmentation

Segmentation in image processing is one of the most important stages in the study, as it is crucial to obtain the detection of the object in the image. Currently, there are several segmentation methods. The method employed in the research consists of converting the image to the HSV color space and extracting the S (saturation) channel for segmentation. Then, thresholding is applied to the S channel to isolate the areas of interest. To improve the segmentation, a morphological operation is used to close any imperfections in the image. Finally, Canny edge detection is applied to highlight the boundaries of the segmented object.
The Canny method works by taking grayscale images as input and producing edge images as output. In its application, the Canny method of edge detection consists of the following steps: smoothing or noise filtering, calculation of gradient magnitude and direction, non-maximum suppression, and hysteresis thresholding. Clustering is a multivariate technique that aims to group similar observations into clusters based on the observed values of several variables for each individual [21]. Figure 10 presents the segmented image and the edges detected using the Canny method.
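A hedged OpenCV sketch of this pipeline (the Otsu thresholding, the kernel size, and the Canny thresholds are our own choices; the paper does not specify them):

```python
import cv2

img = cv2.imread('leaf.bmp')                        # hypothetical file name
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
s = hsv[:, :, 1]                                    # S (saturation) channel

# Threshold the saturation channel to isolate the areas of interest.
_, binary = cv2.threshold(s, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Morphological closing fixes small imperfections in the mask.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# Canny edge detection highlights the boundaries of the segmented object.
edges = cv2.Canny(closed, 100, 200)
```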

3.4.5. Detection and Classification

Pest detection involves classifying leaves using our proposed method for identifying whitefly or Aleurothrixus floccosus. The detected edges are used to outline the target image, which in this case is the leaf with the pest. This marked leaf is illustrated in Figure 11.
By capturing the values of the segmented edges, which exhibit sharp intensity changes in the binary image, we can overlay these edges onto the original image. This combined image, along with the classification algorithm, enables the detection and identification of the target pest.
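Continuing the segmentation sketch above (reusing its img and edges), the overlay and bounding boxes of Figure 11 can be approximated as follows; the minimum-area threshold is an assumption:

```python
import cv2

overlay = img.copy()
overlay[edges > 0] = (0, 0, 255)            # draw detected edges in red

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h > 100:                          # skip tiny specks (assumed threshold)
        cv2.rectangle(overlay, (x, y), (x + w, y + h), (0, 255, 0), 2)
```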

4. Results and Discussions

The results of the methodology proposed in Section 3 will be presented. To achieve the proposed objectives, the following three cases are presented: (a) pre-processing, (b) processing, and (c) classification.

4.1. Pre-Processing

The dataset collected from the leaves of citrus trees underwent preprocessing due to the noise present: transformation to grayscale for easier manipulation, equalization to improve the image intensities, and a median filter to reduce the noise present in the images. Some images showed a noise reduction of 87%, with a little salt-and-pepper noise remaining, whereas other images showed 98% noise elimination, leaving few traces of noise in the image.

4.2. Processing

The processing stage focuses on the operations performed on the image, such as obtaining features calculated from the GLCM (Gray Level Co-occurrence Matrix). The result involves transforming the images into numerical values, converting each image into a feature vector, and extracting the most important features in the process. The following features are highlighted: contrast, correlation, energy, homogeneity, dissimilarity, entropy, and variance, with a particular emphasis on the variables energy and entropy.

4.3. Classification

According to the proposed methodology, the results indicate that, on the test set, the XGBoost model achieved an accuracy of 82%, with an F1-score of 81% and a recall of 81%. The SVM model achieved an accuracy of 75%, an F1-score of 74%, and a recall of 75%, while the Decision Tree model achieved an accuracy of 65%, an F1-score of 65%, and a recall of 65%, placing it below both XGBoost and SVM. Additionally, the proposed methodology was compared with several convolutional neural network (CNN) architectures, namely ResNet, DenseNet, and VGG19, for image classification. Each architecture was evaluated with a learning rate of 0.0001, 15 epochs, and a batch size of 226, yielding the following results: the ResNet model recorded an accuracy of 66%, with an F1-score of 69% and a recall of 75%; the DenseNet model achieved an accuracy of 73%, with an F1-score of 77% and a recall of 82%; and the VGG19 model reached an accuracy of 76%, although with a lower F1-score of 57% and a recall of 61%. Figure 12 illustrates the comparison of these metrics among the SVM, XGBoost, Decision Tree, ResNet, DenseNet, and VGG models. Additionally, Figure 13 presents the comparison of training and test metrics.
Moreover, Figure 14 presents the loss curve of each CNN architecture.

5. Conclusions

In some cases, pattern detection problems in images can be solved with image processing techniques, without resorting to complex algorithms and robust models, as demonstrated in our research; nevertheless, complementing them with current techniques tends to lead to better development of the research and of other areas of knowledge.
This article aimed to study a proposal for the early detection of diseases in citrus plant leaves using machine learning and image processing techniques such as filters, transformations, and segmentation. The results were obtained using 1200 images of citrus leaves.
The images captured at different stages of the pest infestation greatly influenced the detection results, which is essential for understanding the problem and proposing a solution. However, image quality is not usually the primary determinant of a solution, given that badly captured images can be a factor and prompt other questions, such as image reconstruction.

Author Contributions

M.A.V.S., J.V.N., G.A.E.E., D.D.Y.A.C. and A.O.S. conceived and designed the study; M.A.V.S., J.V.N., G.A.E.E., D.D.Y.A.C. and J.M.M.V. were responsible for the methodology; M.A.V.S., J.V.N. and J.M.M.V. performed the simulations and experiments; M.A.V.S., J.V.N., J.M.M.V. and E.R.L.V. reviewed the manuscript and provided valuable suggestions; M.A.V.S., J.V.N., J.M.M.V. and E.R.L.V. wrote the paper; G.A.E.E., D.D.Y.A.C. and A.O.S. were responsible for supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universidad Nacional de San Agustin Arequipa (UNSA), through UNSA INVESTIGA (Contract N° PI-01-2024-UNSA).

Data Availability Statement

Data are contained within the article.

Acknowledgments

We would like to thank the National University of San Agustín de Arequipa.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GLCM      Gray level co-occurrence matrices
GLM       General linear model
CDF       Cumulative Distribution Function
SVM       Support Vector Machine
IoT       Internet of Things
WSN       Wireless sensor networks
CNN       Convolutional Neural Networks
DenseNet  Dense convolutional network
ANOVA     Analysis of variance
RGB       Red Green Blue

References

  1. Luo, D.; Xue, Y.; Deng, X.; Yang, B.; Chen, H.; Mo, Z. Citrus Diseases and Pests Detection Model Based on Self-Attention YOLOV8. IEEE Access 2023, 11, 139872–139881. [Google Scholar] [CrossRef]
  2. Hachimi, C.E.; Belaqziz, S.; Khabba, S.; Sebbar, B.; Dhiba, D.; Chehbouni, A. Smart Weather Data Management Based on Artificial Intelligence and Big Data Analytics for Precision Agriculture. Agriculture 2023, 13, 95. [Google Scholar] [CrossRef]
  3. Dhaka, V.S.; Meena, S.V.; Rani, G.; Sinwar, D.; Kavita; Ijaz, M.F.; Woźniak, M. A survey of deep convolutional neural networks applied for prediction of plant leaf diseases. Sensors 2021, 21, 4749. [Google Scholar] [CrossRef] [PubMed]
  4. Elbasi, E.; Mostafa, N.; Alarnaout, Z.; Zreikat, A.I.; Cina, E.; Varghese, G. Artificial Intelligence Technology in the Agricultural Sector: A Systematic Literature Review. IEEE Access 2023, 11, 171–202. [Google Scholar] [CrossRef]
  5. Choudhury, S.; Singh, R.; Gehlot, A.; Kuchhal, P.; Akram, S.V.; Priyadarshi, N.; Khan, B. Agriculture Field Automation and Digitization Using Internet of Things and Machine Learning. J. Sens. 2022, 2022, 9042382. [Google Scholar] [CrossRef]
  6. Akhter, R.; Sofi, S.A. Precision agriculture using IoT data analytics and machine learning. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 5602–5618. [Google Scholar] [CrossRef]
  7. Abdu, A.M.; Mokji, M.M.; Sheikh, U.U. Machine learning for plant disease detection: An investigative comparison between support vector machine and deep learning. IAES Int. J. Artif. Intell. 2020, 9, 670–683. [Google Scholar] [CrossRef]
  8. Li, L.; Zhang, S.; Wang, B. Plant Disease Detection and Classification by Deep Learning—A Review. IEEE Access 2021, 9, 56683–56698. [Google Scholar] [CrossRef]
  9. Hosny, K.M.; El-Hady, W.M.; Samy, F.M.; Vrochidou, E.; Papakostas, G.A. Multi-Class Classification of Plant Leaf Diseases Using Feature Fusion of Deep Convolutional Neural Network and Local Binary Pattern. IEEE Access 2023, 11, 62307–62317. [Google Scholar] [CrossRef]
  10. Shireesha, G.; Reddy, B.E. Citrus Fruit and Leaf Disease Detection Using DenseNet. In Proceedings of the 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Bangalore, India, 23–25 December 2022; pp. 1–5. [Google Scholar]
  11. Mahmoudi, A.; Benfekih, L.A.; Yigit, A.; Goosen, M.F.A. An assessment of population fluctuations of citrus pest woolly whitefly Aleurothrixus floccosus (Maskell, 1896) (Homoptera, Aleyrodidae) and its parasitoid Cales noacki Howard, 1907 (Hymenoptera, Aphelinidae): A case study from Northwestern Algeria. Acta Agric. Slov. 2018, 111, 407–417. [Google Scholar] [CrossRef]
  12. Vélez Serrano, J.F. Vision por Computador; Chapter 3; Paraninfo: Madrid, Spain, 2021; p. 72. [Google Scholar]
  13. Jain, A.K. Fundamentals of Digital Image Processing; Prentice-Hall: Upper Saddle River, NJ, USA, 1989. [Google Scholar]
  14. Kumar, A.; Rout, K.N.; Kumar, S. High Density Salt and Pepper Noise Removal by a Threshold Level Decision based Mean Filter. In Proceedings of the 2018 International Conference on Applied Electromagnetics, Signal Processing and Communication (AESPC), Bhubaneswar, India, 22–24 October 2018; Volume 1, pp. 1–5. [Google Scholar]
  15. Gárate-Escamila, A.K.; El Hassani, A.H.; Andrès, E. Classification models for heart disease prediction using feature selection and PCA. Inform. Med. Unlocked 2020, 19, 100330. [Google Scholar] [CrossRef]
  16. Alazawi, S.A.; Shati, N.M.; Abbas, A.F. Texture features extraction based on GLCM for face retrieval system. Period. Eng. Nat. Sci. 2019, 7, 1459–1467. [Google Scholar] [CrossRef]
  17. Kim, Y.; Uddin, A.F.M.S.; Bae, S.H. Local Augment: Utilizing Local Bias Property of Convolutional Neural Networks for Data Augmentation. IEEE Access 2021, 9, 15191–15199. [Google Scholar] [CrossRef]
  18. Hong, W.; Chen, J.; Chang, P.S.; Wu, J.; Chen, T.S.; Lin, J. A Color Image Authentication Scheme With Grayscale Invariance. IEEE Access 2021, 9, 6522–6535. [Google Scholar] [CrossRef]
  19. Chen, R.C.; Dewi, C.; Zhuang, Y.C.; Chen, J.K. Contrast Limited Adaptive Histogram Equalization for Recognizing Road Marking at Night Based on Yolo Models. IEEE Access 2023, 11, 92926–92942. [Google Scholar] [CrossRef]
  20. Van Griethuysen, J.J.; Fedorov, A.; Parmar, C.; Hosny, A.; Aucoin, N.; Narayan, V.; Beets-Tan, R.G.; Fillion-Robin, J.C.; Pieper, S.; Aerts, H.J. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017, 77, e104–e107. [Google Scholar] [CrossRef]
  21. Riyana Putri Putri, F.N.; Wibowo, N.C.; Mustofa, H. Clustering of Tuberculosis and Normal Lungs Based on Image Segmentation Results of Chan-Vese and Canny with K-Means. Indones. J. Artif. Intell. Data Min. 2023, 6, 18–28. [Google Scholar] [CrossRef]
Figure 1. Block Diagram.
Figure 6. Original Image (a), Grayscale Image (b).
Figure 7. Original Histogram (a), Equalized Histogram (b).
Figure 8. First Original Image in Grayscale (a), Second Original Image in Grayscale (b), Filter for the First Original Image (c), Filter for the Second Original Image (d).
Figure 9. PCA.
Figure 10. Segmented Image (a), Detected Edges with the Canny method (b).
Figure 11. Bounding Boxes.
Figure 12. Accuracy by Model (a), F1—Score by Model (b), Recall by Model (c), Support by Model (d).
Figure 13. Accuracy Comparison.
Figure 14. Loss Curves: VGG19 (a), ResNet (b), DenseNet (c).
Table 1. GLCM co-occurrence counts for the defined spatial relationship (θ = 0°).
Table 2. Variance explained by each principal component.
Principal Component   Variance   Variance (%)
PC1                   0.274846   57.372779
PC2                   0.088262   18.424324
PC3                   0.029825    6.225807
PC4                   0.027170    5.671614
PC5                   0.022058    4.604421
PC6                   0.014767    3.082623
PC7                   0.011572    2.415664
PC8                   0.010552    2.202767
Table 3. Principal Component Analysis (PCA).
Feature        PC1       PC2       PC3       PC4        PC5        PC6        PC7        PC8
Energy         0.330514  0.060444  0.008409  −0.047678  −0.066937  −0.004838  −0.050243  −0.027284
Variance_Sum   0.277826  0.044041  0.022526  −0.074000  −0.184435  0.013523   0.051554   −0.015736
Variance       0.277826  0.044041  0.022526  −0.074000  −0.184435  0.013523   0.051554   −0.015736
Homogeneity_2  0.277826  0.044041  0.022526  −0.074000  −0.184435  0.013523   0.051554   −0.015736
Sum_Squares    0.277826  0.044041  0.022526  −0.074000  −0.184435  0.013523   0.051554   −0.015736
