Estimation of Fractal Dimensions and Classification of Plant Disease with Complex Backgrounds

Tariq, Muhammad Hamza; Sultan, Haseeb; Akram, Rehan; Kim, Seung Gu; Kim, Jung Soo; Usman, Muhammad; Gondal, Hafiz Ali Hamza; Seo, Juwon; Lee, Yong Ho; Park, Kang Ryoung

doi:10.3390/fractalfract9050315


by Muhammad Hamza Tariq, Haseeb Sultan, Rehan Akram, Seung Gu Kim, Jung Soo Kim, Muhammad Usman, Hafiz Ali Hamza Gondal, Juwon Seo, Yong Ho Lee and Kang Ryoung Park *

Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul 04620, Republic of Korea

* Author to whom correspondence should be addressed.

Fractal Fract. 2025, 9(5), 315; https://doi.org/10.3390/fractalfract9050315

Submission received: 15 April 2025 / Revised: 30 April 2025 / Accepted: 13 May 2025 / Published: 14 May 2025

(This article belongs to the Special Issue Advances in Pattern Recognition—Image and Time Series Analyses—through Fractal Geometry and Complexity Theory)


Abstract

Accurate classification of plant disease by farming robot cameras can increase crop yield and reduce the unnecessary use of agricultural chemicals, making it a fundamental task in sustainable and precision agriculture. However, disease classification has so far mostly been performed manually, for example by visual inspection, which is labor-intensive and often leads to misclassification of disease types. Previous studies have therefore proposed disease classification methods based on machine learning or deep learning; however, most did not consider real-world plant images with complex backgrounds and incurred high computational costs. To address these issues, this study proposes a computationally efficient residual convolutional attention network (RCA-Net) for classifying plant diseases in field images with complex backgrounds. RCA-Net leverages attention mechanisms and multiscale feature extraction to enhance salient features while suppressing background noise. The experiments used two publicly available datasets: the sugarcane leaf disease and potato leaf disease datasets. In addition, we apply fractal dimension estimation to analyze the structural complexity and irregularity of the class activation maps of healthy and diseased leaves, confirming that our model extracts the features that are important for correct plant disease classification. The experimental results show that RCA-Net outperforms state-of-the-art methods, with an accuracy of 93.81% on the first dataset and 78.14% on the second. Furthermore, we confirm that our method can run on an embedded system for farming robots or mobile devices at a fast processing speed (78.7 frames per second).

Keywords:

artificial intelligence; plant disease classification; residual convolution attention network; fractal dimension estimation

1. Introduction

According to expert predictions, an increasing number of people are migrating to cities as the population grows, and by 2050, over seven out of ten people will live in urban areas [1], which may leave less productive land available for farming. In addition, food consumption is expected to increase by 70% in the coming years, placing great stress on the agricultural system [2]. Moreover, a significant portion of crop yields is lost to various diseases, resulting in substantial waste in agricultural production and worldwide economic losses of millions of dollars [3]. Therefore, it is essential to classify the type of disease quickly and correctly and to evaluate its severity so that effective measures can be taken to prevent yield loss [4]. However, traditional disease classification in the agricultural sector has been based on manual evaluation of leaf appearance. Owing to the complex patterns of leaves, these methods depend on human judgment, which is prone to error [5]. To overcome such problems, previous studies proposed disease classification methods based on artificial intelligence (AI), including machine learning (ML) and deep learning (DL) techniques [6]. In recent years, the integration of the Internet of Things (IoT) with advanced DL and ML algorithms has elevated disease detection to a new level, enabling intelligent, robust, and highly efficient systems. This fusion of technologies supports real-time, precise disease monitoring and analysis, providing unprecedented support for proactive agricultural management and minimizing yield losses [7,8]. Despite the advances IoT technology has brought to various fields, it still faces challenges in transmitting data from sensing devices to cloud servers and other devices. These limitations stem from constraints on available bandwidth and on the reliability of internet connectivity or alternative communication protocols [9].

To address this issue, we propose an IoT-based precision agriculture system, illustrated in Figure 1. In this approach, agricultural robots or mobile devices classify diseased plant leaf images captured by a built-in camera. For disease classification in complex backgrounds, we propose a DL-based convolutional neural network (CNN) architecture, the residual convolutional attention network (RCA-Net), which runs on an embedded system at a fast processing speed (78.7 frames per second). Instead of sending many images to the cloud server, the device sends only the classification results, which addresses the bandwidth limitations of the IoT environment. The cloud server then commands the sprayer to apply the appropriate type of pesticide based on the classification results, and farmers can access the data on the cloud server to monitor the growth of their crops. Because only classification results are uploaded and only spraying commands are downloaded, rather than numerous images, this design suits intelligent IoT deployments in large farming areas with poor internet access.
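To make the bandwidth argument concrete, the following Python sketch compares the size of one uncompressed 224 × 224 RGB frame (the network's input resolution) with a small JSON classification message. The message format and its field names (`device_id`, `plot`, `label`, `confidence`) are hypothetical and are not the system's actual protocol; real deployments would usually compress images, which narrows the gap.

```python
import json

# Uncompressed RGB frame at the network's 224 x 224 input resolution, 1 byte per channel value.
HEIGHT, WIDTH, CHANNELS = 224, 224, 3
raw_image_bytes = HEIGHT * WIDTH * CHANNELS

# Hypothetical result message; the field names are illustrative, not taken from the paper.
result = {"device_id": "robot-01", "plot": "A3", "label": "healthy", "confidence": 0.97}
result_bytes = len(json.dumps(result).encode("utf-8"))

print(raw_image_bytes)                   # 150528
print(result_bytes)                      # size of the JSON message in bytes
print(raw_image_bytes // result_bytes)   # approximate per-frame reduction factor
```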

This study makes four major contributions to the literature:

-: We propose a residual convolution attention (RCA) block that enhances feature maps by focusing on disease-affected leaf regions, while suppressing irrelevant background noise. It applies attention weights to emphasize important disease-related features and uses residual connections to retain the original information.
-: Complex backgrounds in plant leaf disease images increase inter-class similarities and amplify intra-class differences. To address this issue, we propose a residual concatenated block (RCB) that uses parallel convolutions to capture fine and coarse features, thereby increasing the inter-class differences. Additionally, batch normalization within the RCB module can help normalize the feature distributions by minimizing internal covariate shift and reducing the variation within the same class. Through a residual connection, the module combines the original input features, which contain crucial information about the disease region, with the learned features.
-: The analyzed datasets contain various interfering background elements, such as leaves, branches, or soil. Therefore, we propose a parallel dilated convolution block (PDCB) with four parallel convolutional layers, each with a different dilation rate, to expand the receptive field without increasing the kernel size and thus acquire features at multiple scales. This enables each layer to capture a wider context from the image, which is useful for identifying leaf patterns in infected areas against a complex background.
-: We introduce fractal dimension estimation to analyze the complexity and irregularity of the class activation maps of healthy and diseased plant classes, confirming that our model extracts the features important for correct plant disease classification. In addition, we confirm that our method can run on an embedded system for farming robots or mobile devices at a fast processing speed (78.7 frames per second). Furthermore, our model and code are publicly available on GitHub [10] for fair comparison.
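This section does not specify how the fractal dimension is computed. Box counting on a binarized map is a common estimator, so the following Python sketch assumes it: a class activation map is first thresholded into a binary mask (the threshold is not given here), the number N(s) of s × s boxes containing foreground is counted, and the dimension is the negative least-squares slope of log N(s) versus log s. The box sizes and the test masks are assumptions for illustration, not the authors' settings.

```python
import math

def box_count(mask, s):
    """Number of s x s boxes that contain at least one foreground pixel."""
    n = len(mask)
    return sum(
        any(mask[r][c] for r in range(i, min(i + s, n)) for c in range(j, min(j + s, n)))
        for i in range(0, n, s)
        for j in range(0, n, s)
    )

def fractal_dimension(mask, sizes=(1, 2, 4, 8, 16, 32)):
    """Box-counting estimate: negative slope of log N(s) versus log s (least squares)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(box_count(mask, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope

# Sanity checks on 64 x 64 masks: a filled region is 2-D, a thin diagonal line is 1-D.
N = 64
filled = [[1] * N for _ in range(N)]
diagonal = [[1 if r == c else 0 for c in range(N)] for r in range(N)]
print(fractal_dimension(filled), fractal_dimension(diagonal))
```

Masks from irregular, fragmented activation patterns fall between these two extremes, which is what makes the estimate usable as a complexity measure.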

An overall mind map of the proposed study is presented in Figure 2. The remainder of this paper is organized as follows: Section 2 presents previous studies on plant disease classification. Section 3 explains the proposed methodology. Section 4 compares the performance of the proposed approach with that of existing methods. Section 5 presents an in-depth discussion of the performance results. Finally, Section 6 concludes the study.

2. Related Work

This section reviews previous studies on plant disease classification. It is divided into two subsections: disease classification using images with simple backgrounds and that with complex backgrounds.

2.1. Disease Classification of Images with Simple Background

2.1.1. ML-Based Methods

ML-based methods have primarily examined the physical characteristics of plant leaves, such as color, texture, and type, to classify plant diseases. Support vector machine (SVM) [11], K-nearest neighbors (KNN), and K-means clustering [12] are well-known ML methods for this purpose. Hossain et al. [13] employed the gray-level co-occurrence matrix (GLCM) to extract texture and color characteristics and used KNN to classify plant leaf diseases. Mokhtar et al. [14] investigated tomato leaf diseases using GLCM features with an SVM classifier. Bhagat and Kumar [15] used the speeded-up robust features (SURF) method and a bag of words (BoW) to extract important features from images of diseases affecting potatoes, tomatoes, and bell peppers, with SVM as the classifier. Pantazi et al. [16] developed a one-class SVM classifier for disease recognition using features generated by local binary patterns (LBP). Most of these studies performed well in disease classification but performed poorly on large image sets containing various types of diseases captured in real-world, uncontrolled environments. This limitation motivated researchers to investigate DL as a potential remedy, as explained in the following subsection.
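The GLCM-plus-KNN pipeline described above can be sketched in plain Python. This is a minimal illustration, not the exact features or settings of [13]: the toy 4-level "leaf patches", the single horizontal pixel offset, and the choice of contrast and homogeneity as features are all assumptions.

```python
from collections import Counter

def glcm(image, dx=1, dy=0, levels=4):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dy, dx)."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]

def texture_features(P):
    """Two classic GLCM statistics: contrast and homogeneity."""
    n = len(P)
    contrast = sum(P[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
    homogeneity = sum(P[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))
    return (contrast, homogeneity)

def knn_predict(train, query, k=1):
    """Majority label among the k training samples closest to `query` (squared Euclidean distance)."""
    ranked = sorted(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)))
    labels = [label for _, label in ranked[:k]]
    return max(set(labels), key=labels.count)

# Hypothetical 4-level patches: a smooth "healthy" patch and a blotchy "lesion" patch.
smooth = [[2] * 4 for _ in range(4)]
blotchy = [[0, 3, 0, 3] for _ in range(4)]
train = [(texture_features(glcm(smooth)), "healthy"),
         (texture_features(glcm(blotchy)), "diseased")]

query = [[2, 2, 3, 2], [2, 2, 2, 2], [2, 1, 2, 2], [2, 2, 2, 2]]
print(knn_predict(train, texture_features(glcm(query))))
```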

2.1.2. DL-Based Methods

In red-green-blue (RGB) image classification tasks, CNNs excel because of their capacity for automatic feature extraction [17]. Paymode and Malode [18] studied the identification of leaf diseases in grapes and tomatoes using a transfer learning approach; by modifying the pre-trained visual geometry group (VGG)-16 network, they attained classification accuracies of 95.71% and 98.40% on the grape and tomato datasets, respectively. Agarwal et al. [19] proposed a custom CNN model to classify 10 classes from a tomato leaf disease dataset, which outperformed the state-of-the-art (SOTA) model. Atila et al. [20] presented an effective customized DL-based CNN model using the PlantVillage dataset [21]. In another study, biotic stress and its severity were measured in coffee leaves using residual network (ResNet)-50, VGG-16, MobileNetV2, and other CNNs [22]. Nag et al. [23] used transfer learning to fine-tune SOTA models for classification. Zhao et al. [24] later proposed a convolution and self-attention transformer network (CAST-Net) that combines convolution and self-attention to recognize plant leaf diseases. They experimented with the PlantVillage dataset and a 10-class tomato dataset that was also derived from PlantVillage.

Although these studies performed well on many images covering various types of diseases, they did not consider images with complex backgrounds, in which background elements such as soil, other leaves, bushes, or tree branches can degrade performance [25]. Therefore, a method is needed that can classify plant diseases effectively even against complex backgrounds; the studies reviewed in the next subsection investigate this topic.

2.2. Disease Classification of Images with Complex Background

Accurately classifying plant diseases in images with complex backgrounds is a significant challenge because of their intricate and varied visual patterns. Various advanced models have been developed to address this issue and have demonstrated promising results. Among ML-based methods, Madhavan et al. [26] used ML and image processing techniques for disease classification and recognition in pomegranate leaves. Their dataset consisted of five classes: four disease classes and one healthy class. Preprocessing was performed using image processing techniques such as color enhancement, resizing, and conversion of RGB plant leaf images to luminosity. The researchers used the K-means algorithm to cluster the diseased regions in the leaf images and a multi-class SVM as the classifier. The method shows promise for the early detection of diseases, which can help crop health management and reduce economic losses.
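The K-means step described above, which groups pixels so that diseased regions form their own cluster, can be illustrated with a minimal Lloyd's K-means in plain Python. This is a generic sketch rather than the preprocessing of [26]: the pixel colours, k = 2, and the initial centres are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two colour vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda i: dist2(p, centers[i])) for p in points]

def kmeans(points, centers, max_iters=20):
    """Lloyd's K-means: alternate nearest-centre assignment and centre re-estimation."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iters):
        labels = assign(points, centers)
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # An empty cluster keeps its previous centre.
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical RGB pixels: three green (healthy tissue) and three brown (lesion).
pixels = [(40, 160, 40), (50, 170, 45), (45, 150, 50),
          (140, 90, 40), (150, 95, 35), (135, 85, 45)]
centers, labels = kmeans(pixels, [(0, 255, 0), (255, 0, 0)])
print(labels, labels.count(1) / len(labels))   # cluster labels and lesion-pixel fraction
```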

Zhao et al. [27] introduced a multi-scale attention fusion with discriminative enhancement deep nearest neighbor neural network (MAFDE-DN4) and a few-shot learning strategy with meta-attention mechanisms, which also achieve promising results on images with complex backgrounds. However, their method requires high-quality labeled images, as low-quality samples may lead to misclassification. Ding et al. [28] proposed a receptive field and coordinate attention ResNet (RFCA ResNet), based on dual attention and multiscale feature extraction, for classifying apple leaf diseases and achieved 89.61% accuracy. They used focal loss along with class-balancing techniques to mitigate the effect of imbalanced data. Wang et al. [29] presented DL methods based on an improved multi-scale Retinex with color restoration (MSRCR) to enhance image contrast and restore true colors, and proposed a self-calibration convolutional residual network (OSCRNet) for plant disease classification. Although MSRCR improves image quality, it also amplifies noise, which may lower the efficiency of the model. Wang et al. [30] proposed the Efficient Channel Attention (ECA)-ConvNeXt model for classifying six categories of rice leaf diseases, achieving an accuracy of 94.82%. The ECA module significantly improved ConvNeXt's ability to capture essential features and enhanced its efficiency. However, the ECA-ConvNeXt model may still be resource-intensive and computationally complex, which limits its deployment on mobile devices or agricultural robots. Daphal and Koli [31] proposed an ensemble model for classifying diseases in sugarcane leaves. They enhanced the model's feature extraction capability by stacking multiple CNN models and incorporating a spatial attention mechanism. Although this approach achieved an accuracy of 86.53%, the added computational complexity of stacking models limits its feasibility for real-world deployment.
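For readers unfamiliar with focal loss, the following sketch shows the standard form introduced by Lin et al. for dense object detection, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), for a single sample. Whether [28] used exactly this form, and with which γ and class weights α, is not stated here; the values below are assumptions. The factor (1 - p_t)^γ shrinks the loss on well-classified samples, and per-class α weights can up-weight rare classes.

```python
import math

def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for a single sample.

    probs: predicted class probabilities (summing to 1); target: index of the true class;
    gamma: focusing parameter (gamma = 0 recovers cross-entropy);
    alpha: optional per-class weights, e.g. larger for rare classes (class balancing).
    """
    p_t = probs[target]
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss([0.9, 0.05, 0.05], target=0)              # confident, correct prediction
hard = focal_loss([0.1, 0.6, 0.3], target=0)                # poor prediction for the true class
ce_easy = focal_loss([0.9, 0.05, 0.05], target=0, gamma=0.0)  # plain cross-entropy
print(easy, hard, ce_easy)
```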

Although these methods performed well on images with complex backgrounds, their models were computationally expensive. In real-world agricultural scenarios, plant disease classification is often performed on the embedded system of an agricultural robot or mobile device with low computing power, so computational cost must be considered. Therefore, this study develops RCA-Net, a computationally efficient CNN architecture that classifies plant leaf diseases in images with complex backgrounds.

Several DL architectures integrate residual connections with attention mechanisms; however, they are not specifically designed for plant disease classification. In contrast, our proposed RCA-Net is tailored to the challenges of plant disease classification in real-world images with complex backgrounds. To this end, we introduce scenario-specific attention mechanisms that incorporate residual connections. For example, RCAF-Net [32], which is unrelated to plant disease classification, is designed for road extraction through multi-modal fusion of remote sensing images and GPS trajectories. Its attention mechanism (RSTCA) fuses features from different modalities using Swin transformer and channel attention blocks, which increases computational complexity. In contrast, the RCA block in RCA-Net is integrated directly into the feature extraction pipeline, so that attention is focused on disease-relevant regions while maintaining the computational efficiency required for real-time deployment on embedded and IoT devices.

In another study [33], a residual attention network employs generic stacked attention modules with a bottom-up top-down attention structure. Stacking attention modules in this way can degrade accuracy because it risks amplifying unwanted noise along with useful features. Unlike [33], which relies on fully connected (FC) layers within its attention modules, the attention module of RCA-Net uses lightweight convolutional layers to generate attention maps, making it more efficient and better suited to suppressing background noise. Additionally, the residual and attention modules of RCA-Net are engineered to maintain strong gradient flow and preserve features, leading to higher classification accuracy and robustness in noisy, real-world agricultural environments; as shown in Section 4.3.3, RCA-Net outperforms the compared SOTA models that use generic and fusion-based attention mechanisms.

Table 1 presents a comprehensive comparison of the methodologies adopted in previous studies and in our proposed method.

3. Proposed Method

3.1. Workflow Overview of the Proposed Method

Figure 3 presents a schematic of the overall procedure of the proposed method. After RCA-Net is trained in the training phase, it is tested on an unseen subset of the dataset in the testing phase, where it outputs one of n classes: one healthy class and n − 1 disease classes.

3.2. Structure of RCA-Net

This section outlines the structure of the proposed RCA-Net architecture, shown in Figure 4. In this study, a pre-trained MobileNetV3-Large [34] was used as the backbone network of RCA-Net; for its implementation, we used the source code of the pre-trained MobileNetV3-Large from the GitHub repository [35]. Apart from this, we did not use any other source code as the foundation of our work. RCA-Net comprises 23 blocks, including the 15 bottleneck blocks of the backbone network. In addition, this study introduces three new blocks, RCA, RCB, and PDCB, which are shown in Figure 5, Figure 6, and Figure 7, respectively.

The proposed RCA-Net utilizes multi-scale feature extraction, residual connections, and attention mechanisms, which are effectively combined with the bottleneck blocks of the base model, MobileNetV3-Large, to accomplish the classification task as shown in Figure 4. The design employs element-wise addition to fuse the output features from the attention and residual blocks into the bottleneck block of the next stage in the base model. Element-wise addition merges these features in a way that enhances mutual information without redundancy. This approach enables the model to emphasize key enhanced features while simultaneously retaining essential information from the residual block. The use of residual connections in the proposed blocks helps maintain gradient flow and preserves important details that are critical for classification.

An input image of 224 × 224 × 3 pixels is fed to the model and first passes through a convolutional block consisting of convolution, batch normalization (BN), and hard-swish (HS) activation, which produces a 112 × 112 × 16 feature map. This feature map is then passed to the bottleneck block in the first stage. The bottleneck block is used twice during this stage, and the output feature map is reduced to 56 × 56 × 24 pixels. Throughout the core structure of the proposed model, the bottleneck block, which includes depth-wise convolution and a residual connection, acts as the feature extractor while performing channel reduction and expansion. After the first stage, we integrated two modules, RCA and RCB, which operate in parallel and receive the same input from the previous bottleneck. The RCA module enhances the most relevant features extracted by the bottleneck while preserving informative spatial regions.
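The HS activation used in the convolutional blocks is the hard-swish function defined for MobileNetV3, a piecewise-linear approximation of swish: HS(x) = x · ReLU6(x + 3) / 6. The scalar Python sketch below is for illustration only; deep learning frameworks provide vectorized versions (for example, `torch.nn.Hardswish` in PyTorch).

```python
def relu6(x):
    """ReLU6: clip x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)

def hard_swish(x):
    """Hard-swish: x * ReLU6(x + 3) / 6 (zero for x <= -3, identity for x >= 3)."""
    return x * relu6(x + 3.0) / 6.0

for x in (-4.0, -1.5, 0.0, 1.0, 3.0, 5.0):
    print(x, hard_swish(x))
```

Because it uses only clipping and multiplication, hard-swish avoids the exponential in the sigmoid-based swish, which is one reason it suits low-power devices.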

The RCB further refines the multi-scale features using two parallel convolutional layers with kernel sizes of 3 × 3 and 5 × 5. The outputs of the RCA and RCB modules are combined by element-wise addition, as shown in Figure 4, and passed to the bottleneck block in the next stage of the model. In the next two stages, the pattern of feeding the combined output of the RCA and RCB modules into the subsequent bottleneck blocks is repeated. After each repetition, the output feature maps are 56 × 56 × 24, 28 × 28 × 40, and 14 × 14 × 80 pixels. Combining these modules improves the representation of low- and intermediate-level features. The next three stages of the base model are kept unchanged, and multiple bottleneck blocks process the feature map, which is reduced to 7 × 7 × 160 pixels after the final bottleneck block. The new PDCB then takes the output of this final bottleneck as its input. Although the attention and residual blocks extract low- and intermediate-level features, the diseased region occupies only a very small part of the image in some cases. This necessitates expanding the receptive field to capture more detailed information and focus on the key regions of the disease, which is the purpose of the PDCB and enhances the model's ability to classify diseases effectively. The PDCB is designed to capture useful dilated features with an enlarged receptive field, ensuring that the model can gather spatially extended features, and it outputs a feature map with dimensions 7 × 7 × 160.
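The spatial sizes quoted above can be checked with the standard convolution output-size formula, floor((n + 2p − d(k − 1) − 1) / s) + 1. The sketch below assumes that every spatial reduction is a stride-2 convolution with "same"-style padding (3 × 3 with padding 1 here; a 5 × 5 kernel with padding 2 gives the same result), and that the RCB and PDCB layers use stride 1 with padding chosen to preserve the map size. These padding choices are assumptions, not values taken from Table 2.

```python
def conv_out(n, k, s=1, p=0, d=1):
    """Output height/width of a convolution: floor((n + 2p - d(k - 1) - 1) / s) + 1."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1

# Assumed: each spatial reduction is a stride-2, 3 x 3 convolution with padding 1
# (the stem plus the first layer of each downsampling stage).
sizes = [224]
for _ in range(5):
    sizes.append(conv_out(sizes[-1], k=3, s=2, p=1))
print(sizes)                                   # [224, 112, 56, 28, 14, 7]

# Channel widths reported in the text for the 112, 56, 28, 14 and 7 pixel maps.
channels = [16, 24, 40, 80, 160]
print(list(zip(sizes[1:], channels)))          # [(112, 16), (56, 24), (28, 40), (14, 80), (7, 160)]

# RCB-style layers (stride 1, padding k // 2) keep the map size, so element-wise
# addition with the bottleneck output is shape-compatible.
same_rcb = [conv_out(56, k, s=1, p=k // 2) for k in (3, 5)]
# A PDCB branch (3 x 3, dilation d, padding d) keeps the 7 x 7 map size.
same_pdcb = [conv_out(7, 3, s=1, p=d, d=d) for d in (6, 12, 18, 24)]
```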

The resulting feature map is then fed into convolutional layer 2, which consists of a 1 × 1 convolutional layer followed by BN and HS layers. Next, average pooling reduces the spatial dimensions to 1 × 1 × 960. Subsequently, the two FC layers produce output feature maps of 1 × 1 × 960 and 1 × 1 × 1280. Finally, a SoftMax layer at the end assigns the final probability score to each class. The purpose of introducing the new blocks is to focus on the crucial low-, intermediate-, and high-level contextual features of the base network. This design gives much better results without a significant increase in model complexity.
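The classification head described above (global average pooling, FC layers, SoftMax) can be sketched with NumPy. The layer sizes follow the stock MobileNetV3-Large head (a 960-dimensional pooled vector, a 1280-dimensional hidden FC layer with hard-swish, and a final FC layer with one output per class) as an assumption; Table 2 gives RCA-Net's exact configuration. The random weights and the five-class output are purely illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable SoftMax: subtracting the max does not change the result."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hard_swish(x):
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

def classify(feature_map, W1, b1, W2, b2):
    """feature_map: (C, H, W) output of convolutional layer 2."""
    pooled = feature_map.mean(axis=(1, 2))   # global average pooling -> (C,)
    hidden = hard_swish(W1 @ pooled + b1)    # first FC layer (weights times input plus bias)
    logits = W2 @ hidden + b2                # second FC layer -> one score per class
    probs = softmax(logits)
    return probs, int(np.argmax(probs))      # assign the class with the highest probability

rng = np.random.default_rng(0)
n_classes = 5                                # hypothetical number of classes
fmap = rng.standard_normal((960, 7, 7))
W1, b1 = 0.01 * rng.standard_normal((1280, 960)), np.zeros(1280)
W2, b2 = 0.01 * rng.standard_normal((n_classes, 1280)), np.zeros(n_classes)
probs, pred = classify(fmap, W1, b1, W2, b2)
print(probs.round(3), pred)
```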

We placed the modules of RCA-Net between the bottleneck blocks at different stages of MobileNetV3-Large, optimizing feature transmission across the network. The bottleneck blocks in MobileNetV3-Large are designed to efficiently compress information, capturing essential features. This scenario-based module placement allows the network to refine the combination of features from modules, resulting in an enriched feature representation. Table 2 presents details related to the layer-by-layer configuration of the proposed model, and detailed explanations of each RCA-Net module are presented in the following sections.

3.2.1. RCA

We propose a residual convolution attention (RCA) module designed specifically for the challenges of plant disease recognition in agricultural contexts, as shown in Figure 5. Unlike standard attention mechanisms designed for generic image datasets, the RCA module incorporates a scenario-specific adaptation to handle the noise introduced by complex backgrounds such as leaves and branches. This design effectively suppresses irrelevant features while maintaining focus on disease-affected leaf regions. The RCA module is integrated into the backbone network, MobileNetV3-Large, to enhance the feature representation, and it plays an important role in improving the accuracy of the model. The attention mechanism enhances the network's ability to focus on relevant disease-related features while suppressing irrelevant background noise. Traditional attention mechanisms contain an FC layer [36], whereas we use convolutional layers, which makes the RCA lightweight and can reduce the number of parameters and the computational cost.

The RCA also uses a residual connection that retains the original features alongside the attention-weighted features. This design ensures that critical disease-related information is preserved throughout our model, preventing feature loss. By integrating both the attention-enhanced and original features, the model can leverage both refined and unaltered information, ultimately enhancing the effectiveness and robustness of the attention mechanism.

The RCA is used in the first three stages of RCA-Net, as shown in Table 2, to exploit both low- and intermediate-level features; its details are shown in Figure 5. The input feature map $F_{in} \in \mathbb{R}^{C_{in} \times H_{in} \times W_{in}}$ first passes through an average pooling layer, which reduces its dimensions while preserving the channel information. Here, $C_{in}$ is the number of input channels, and $H_{in}$ and $W_{in}$ are the height and width of the feature maps, respectively. This pooling operation aggregates spatial information and provides a global context for the features. The pooled map is then processed by a convolutional layer with a 7 × 7 filter, followed by a BN layer, yielding the feature map $f_{1}$ as shown in Equation (1); this step captures the relevant features while maintaining spatial information and normalizing them. A 1 × 1 convolutional layer then projects $f_{1}$ onto a different channel space, and the sigmoid function produces the attention coefficient $A$, as shown in Equation (2). Finally, $A$ is multiplied element-wise with the original input, and the result is added back to the module's input through the residual connection, as shown in Equation (3). This helps the network preserve the original information. RCA can be expressed mathematically as follows:

$$f_{1} = \mathrm{Conv}^{7 \times 7}\left(\mathrm{AvgPool}(F_{in})\right) \tag{1}$$

$$A = \mathrm{Sigmoid}\left(\mathrm{Conv}^{1 \times 1}(f_{1})\right) \tag{2}$$

$$F_{RCA} = (A \otimes F_{in}) \oplus F_{in} \tag{3}$$

where ⊗ and ⊕ denote element-wise multiplication and addition, respectively.
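Equations (1) to (3) can be traced with a small NumPy sketch. The pooling window is not specified here, so this sketch assumes global average pooling: the attention coefficient A then holds one weight per channel and is broadcast over H and W, and the 7 × 7 convolution (with zero padding 3) acting on a 1 × 1 map reduces to its centre tap. The BN layer is omitted, and the intermediate width `C_mid` is a hypothetical choice.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rca(F_in, W7, b7, W1, b1):
    """Sketch of Eqs. (1)-(3), assuming global average pooling and omitting BN.

    F_in: (C, H, W) input; W7: (C_mid, C, 7, 7) and b7: (C_mid,) for the 7x7 convolution;
    W1: (C, C_mid) and b1: (C,) for the 1x1 convolution.
    """
    pooled = F_in.mean(axis=(1, 2))            # AvgPool (assumed global): (C,)
    # Eq. (1): on a 1x1 map with zero padding 3, only the centre tap of the 7x7 kernel sees data.
    f1 = W7[:, :, 3, 3] @ pooled + b7          # (C_mid,)
    A = sigmoid(W1 @ f1 + b1)                  # Eq. (2): 1x1 conv + sigmoid -> (C,)
    return A[:, None, None] * F_in + F_in      # Eq. (3): broadcast A over H and W, then add F_in

rng = np.random.default_rng(0)
C, C_mid = 24, 8
F = rng.standard_normal((C, 56, 56))
out = rca(F, rng.standard_normal((C_mid, C, 7, 7)), np.zeros(C_mid),
          rng.standard_normal((C, C_mid)), np.zeros(C))
# With all-zero weights, A = sigmoid(0) = 0.5 everywhere, so the output is 1.5 * F_in.
out_zero = rca(F, np.zeros((C_mid, C, 7, 7)), np.zeros(C_mid), np.zeros((C, C_mid)), np.zeros(C))
```

Because A lies in (0, 1), each channel of the output is scaled by a factor between 1 and 2 of the input, so the residual term guarantees the original signal is never suppressed entirely.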

3.2.2. RCB

Complex backgrounds in plant leaf disease images increase inter-class similarities and amplify intra-class differences. The RCB is designed to increase the inter-class differences and decrease the intra-class variations. The module comprises two parallel convolutional layers, each followed by a BN layer and a rectified linear unit 6 (ReLU6) activation layer. The detailed structure is shown in Figure 6.

The parallel structure enhances the module's ability to capture a diverse set of features from the input feature map by using a different filter size in each layer, thereby increasing the inter-class differences. The same input feature map $F_{in} \in \mathbb{R}^{C_{in} \times H_{in} \times W_{in}}$ from the previous bottleneck is fed separately to each of the two layers of the RCB, as shown in Figure 4. The first convolutional layer applies a 3 × 3 filter to the input, and the second applies a 5 × 5 filter.

The RCB uses multiple kernel sizes to enlarge the receptive field, allowing each output to cover a large area of the input. This enables the network to capture contextual information from a broader spatial region, especially in the early stages of the model, which is essential for capturing disease-related features effectively. BN within the RCB module can help normalize the feature distributions by minimizing internal covariate shift and reducing the variation within the same class. BN normalizes the activations of each batch to a mean of zero and a standard deviation of one and then scales and shifts these normalized activations using learnable parameters [37]. The outputs of the two layers, which have different receptive fields, are then concatenated along the channel dimension, as shown in Equation (4). This concatenation facilitates the extraction of a broader range of spatial information.

Finally, the concatenated feature map $f_{cat}$ is added element-wise to the input through a residual connection, which helps reduce the intra-class differences. The residual connection preserves the original input features and carries them forward, easing learning and mitigating the vanishing gradient problem. By blending the original features with the learned features (from both the 3 × 3 and 5 × 5 convolutional layers), the model leverages both types of information: the original input features may contain essential information, whereas the learned features capture more complex and detailed variations. Combining them reduces background noise and other variations that could lead to intra-class differences, so that the model can focus on the distinct features necessary for class identification. The final output feature map $F_{RCB} \in \mathbb{R}^{C_{out} \times H_{out} \times W_{out}}$ is obtained as shown in Equation (5). Employing RCB at different stages of the network helps capture discriminative features from complex patterns in the input data. RCB can be represented by the following equations:

$$f_{cat} = \mathrm{Concat}\left(\mathrm{Conv}^{3 \times 3}(F_{in}),\ \mathrm{Conv}^{5 \times 5}(F_{in})\right) \tag{4}$$

$$F_{RCB} = f_{cat} \oplus F_{in} \tag{5}$$

where ⊕ is the element-wise addition operation.
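A minimal NumPy sketch of the RCB, following Equations (4) and (5) with the Conv-BN-ReLU6 branches described above. Two details are assumptions: the channel counts are not given in this section, so each branch is assumed to output C_in / 2 channels (otherwise the concatenated map could not be added to F_in), and BN uses the statistics of a single sample with the default scale 1 and shift 0. The convolution is a plain stride-1 cross-correlation with zero "same" padding.

```python
import numpy as np

def conv2d(x, w, dilation=1):
    """Stride-1 convolution (cross-correlation) with zero 'same' padding.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with k odd."""
    k = w.shape[-1]
    pad = dilation * (k // 2)
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros((w.shape[0], H, W))
    for i in range(k):
        for j in range(k):
            window = xp[:, i * dilation:i * dilation + H, j * dilation:j * dilation + W]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], window)
    return out

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Per-channel normalization to zero mean and unit variance, then scale and shift.
    Statistics are taken over the spatial positions of a single sample (batch size 1)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def rcb(F_in, w3, w5):
    """Eqs. (4)-(5): two parallel Conv-BN-ReLU6 branches, concatenation, residual addition."""
    b3 = relu6(batch_norm(conv2d(F_in, w3)))   # 3 x 3 branch (finer features)
    b5 = relu6(batch_norm(conv2d(F_in, w5)))   # 5 x 5 branch (coarser, wider context)
    f_cat = np.concatenate([b3, b5], axis=0)   # Eq. (4): concatenate along channels
    return f_cat + F_in                        # Eq. (5): residual addition

rng = np.random.default_rng(0)
C, H, W = 24, 16, 16
F = rng.standard_normal((C, H, W))
w3 = 0.1 * rng.standard_normal((C // 2, C, 3, 3))   # assumed C/2 output channels per branch
w5 = 0.1 * rng.standard_normal((C // 2, C, 5, 5))
out = rcb(F, w3, w5)
```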

3.2.3. PDCB

In the PDCB, we leverage dilated convolution to exploit the high-level features acquired from the stage-6 bottleneck, as shown in Table 2. Compared with conventional convolutional layers, dilated convolution captures a broader range of contextual information [38]. The proposed module contains four parallel dilated convolutional layers with different dilation rates, as shown in Figure 7.

The use of dilation convolution helps the network to extract a wider range of features by increasing the receptive field without adding additional parameters, which is useful for identifying leaf patterns with infected areas from a complex background. Dilation rates of 6, 12, 18, and 24 are applied to the four layers and the filter size is 3 × 3. Various dilation rates can be used to capture multi-scale features. The input feature tensor

F_{i n} \in R^{C_{i n} \times H_{i n} \times W_{i n}}

of size 7 × 7 × 160 is passed through each layer, and four output feature tensors (

f_{d 1}, f_{d 2}, f_{d 3}, {a n d f}_{d 4}

) are regenerated. The mathematical representation is as follows:

(f_{d 1}, f_{d 2}, f_{d 3}, {a n d f}_{d 4}) = D C (m, n)

(6)

DC(m, n) = \sum_{q} \sum_{r} F_{in}(m + d \times q,\ n + d \times r) \times k(q, r)

(7)

where DC(m, n) is the output of the dilated convolution at position (m, n), F_in is the input feature map, k is the 3 × 3 convolution filter indexed by (q, r), and d is the dilation rate (d = 6, 12, 18, and 24 for the four layers).
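Equation (7) can be sketched directly for a single-channel map. The sketch centres the 3 × 3 filter so that q and r run over −1, 0, and 1, and treats positions outside the map as zeros; both choices are our assumptions, because the text does not specify the indexing or padding. It also computes the side length spanned by a dilated K × K filter, K + (K − 1)(d − 1). The number of weights of such a layer, K · K · C_in · C_out, does not depend on d, which is why dilation adds no parameters.

```python
from typing import List

Map = List[List[float]]


def dilated_conv(f_in: Map, k: Map, d: int) -> Map:
    """DC(m, n) = sum_q sum_r F_in(m + d*q, n + d*r) * k(q, r), Eq. (7).

    q and r run over -(K//2)..K//2 for an odd K x K filter; positions of
    F_in outside the map are treated as zero (zero padding, assumed)."""
    h, w = len(f_in), len(f_in[0])
    half = len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for m in range(h):
        for n in range(w):
            acc = 0.0
            for q in range(-half, half + 1):
                for r in range(-half, half + 1):
                    i, j = m + d * q, n + d * r
                    if 0 <= i < h and 0 <= j < w:
                        acc += f_in[i][j] * k[q + half][r + half]
            out[m][n] = acc
    return out


def effective_extent(kernel_size: int, d: int) -> int:
    """Side length spanned by a dilated filter: K + (K - 1)(d - 1)."""
    return kernel_size + (kernel_size - 1) * (d - 1)


for d in (6, 12, 18, 24):
    print(d, effective_extent(3, d))   # extents 13, 25, 37, 49
```

All four extents exceed the 7 × 7 input of the PDCB. Under the zero-padding assumption of this sketch, the off-centre taps of the d = 12, 18, and 24 branches never land inside a 7 × 7 map, so those branches reduce to scaling by the centre weight; the padding used in the actual implementation may differ.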

The four feature maps extracted by the parallel dilated convolutional layers are then fused by concatenation along the channel dimension, as shown in Equation (8). The concatenated feature map

f_{C}

is fed to an additional convolutional layer, which reduces the channel dimensionality while preserving spatial information. Finally, the sigmoid activation function is applied to obtain the output

F_{PDCB} \in \mathbb{R}^{C_{out} \times H_{out} \times W_{out}},

as shown in Equation (9).

f_{C} = \mathrm{Concat}(f_{d1}, f_{d2}, f_{d3}, f_{d4})

(8)

F_{PDCB} = \mathrm{Sigmoid}\left(\mathrm{Conv}^{3 \times 3}(f_{C})\right)

(9)
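The data flow of Equations (6) to (9) can be traced with the minimal sketch below. It is an illustration under stated assumptions, not the authors' implementation: each branch produces a single output channel, all filters are 3 × 3 with zero padding, BN and intermediate activations are omitted, and the kernels in the usage are hypothetical. Because feature maps are represented as lists of channels, the concatenation of Equation (8) is simply joining the branch channel lists.

```python
import math
from typing import List, Sequence

Map = List[List[float]]   # one H x W channel
Tensor = List[Map]        # C x H x W feature map


def conv_same(x: Tensor, kernel: Tensor, d: int = 1) -> Map:
    """Multi-channel input -> one output channel. Stride 1, zero padding,
    odd K x K kernel centred on (m, n), dilation d (d = 1: plain conv).
    kernel[c] is the K x K filter for input channel c."""
    h, w = len(x[0]), len(x[0][0])
    half = len(kernel[0]) // 2
    out = [[0.0] * w for _ in range(h)]
    for m in range(h):
        for n in range(w):
            for c, chan in enumerate(x):
                for q in range(-half, half + 1):
                    for r in range(-half, half + 1):
                        i, j = m + d * q, n + d * r
                        if 0 <= i < h and 0 <= j < w:   # zero padding
                            out[m][n] += chan[i][j] * kernel[c][q + half][r + half]
    return out


def pdcb(f_in: Tensor, branch_kernels: List[Tensor], fuse_kernel: Tensor,
         rates: Sequence[int] = (6, 12, 18, 24)) -> Map:
    """Eqs. (6)-(9) with one output channel per branch (illustrative only)."""
    # Eqs. (6)-(7): parallel dilated 3x3 convolutions on the same input
    f_d = [[conv_same(f_in, k, d)] for k, d in zip(branch_kernels, rates)]
    # Eq. (8): concatenate the branch outputs along the channel dimension
    f_c = [ch for branch in f_d for ch in branch]
    # Eq. (9): 3x3 convolution over the concatenated channels, then sigmoid
    fused = conv_same(f_c, fuse_kernel)
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in fused]
```

With identity branch kernels and a fusion kernel that averages the four channels, the block returns the sigmoid of its input, and an all-zero input yields 0.5 everywhere, which is a quick sanity check of the wiring.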

The 7 × 7 × 160 feature map produced by the PDCB is passed to a 1 × 1 convolution layer, which performs an efficient linear transformation across channels, as shown in Figure 7. Global average pooling is then applied to compute the average value of each channel of the feature map over all spatial locations. Next, an FC layer composed of two 1 × 1 convolutional layers converts the high-dimensional features into low-dimensional features, as shown in Figure 7. Before the FC layer, the feature map is flattened; the FC layer then refines the discriminative features and outputs a 1 × 1 × 1280 feature vector. Each FC layer multiplies its input feature vector by a weight matrix and adds a bias vector. Finally, the SoftMax function converts the final feature vector into class probabilities, and each sample is assigned to the class with the highest probability. The SoftMax function is defined as follows:

SM(z_{i}) = \frac{k^{z_{i}}}{\sum_{p=1}^{q} k^{z_{p}}}

(10)

where z_i is the i-th element of the input vector z, q is the total number of classes, and k is the base of the exponential function (k = e for the standard SoftMax).
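The classification head described above (global average pooling, a fully connected layer, and the SoftMax of Equation (10)) can be sketched as follows. The sketch collapses the two 1 × 1 convolutional layers into a single hypothetical linear layer with made-up weights; `base` corresponds to k in Equation (10). Subtracting max(z) before exponentiation does not change the result but avoids overflow.

```python
import math
from typing import List, Sequence

Tensor = List[List[List[float]]]   # C x H x W


def global_avg_pool(x: Tensor) -> List[float]:
    """Average each H x W channel to one value (C x H x W -> C)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def linear(v: Sequence[float], weights: List[List[float]],
           bias: Sequence[float]) -> List[float]:
    """FC layer: weighted sum of the inputs plus a bias, per output unit."""
    return [sum(w_i * v_i for w_i, v_i in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def softmax(z: Sequence[float], base: float = math.e) -> List[float]:
    """Eq. (10): SM(z_i) = k**z_i / sum_p k**z_p, with k = base."""
    z_max = max(z)
    powers = [base ** (z_i - z_max) for z_i in z]
    total = sum(powers)
    return [p / total for p in powers]


x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 4.0], [2.0, 4.0]]]   # 2 channels, 2 x 2
logits = linear(global_avg_pool(x), [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                [0.0, 0.0, -4.0])
probs = softmax(logits)
print(logits, max(range(len(probs)), key=probs.__getitem__))   # [1.0, 3.0, 0.0] 1
```

The predicted class is the index of the largest probability, which is also the index of the largest logit, because SoftMax is monotonic for any base k > 1.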

4. Experimental Results and Analysis

4.1. Experimental Dataset and Setup

To evaluate the performance of the proposed network, we used two public datasets: the sugarcane leaf disease dataset (SCLD) [39] and the uncontrolled environment potato leaf dataset (UPLD) [40]. The SCLD [39] consists of 2569 images of sugarcane leaves collected directly from different farms in Pune, India. The images were captured with smartphone cameras with resolutions ranging from 13 to 48 megapixels (MP). The SCLD is divided into five classes, namely four disease classes (mosaic, red rot, rust, and yellow) and a healthy class, each containing approximately 500 images. The original images had different resolutions and were resized to a uniform 224 × 224 pixels. Figure 8 presents sample images from the SCLD.

The UPLD [40] is a dataset of potato leaf diseases captured in an uncontrolled environment. It was collected from fields in Magelang and Wonosobo, Central Java, Indonesia, using smartphone cameras, with a variety of backgrounds. The original images, with a resolution of 1500 × 1500 pixels, were resized to 224 × 224 pixels. The dataset contains 3076 potato leaf images in seven classes: six disease classes caused by viruses, phytophthora, nematodes, fungi, bacteria, and pests, and one healthy class. Example images from the UPLD are shown in Figure 9, and the details of both datasets are listed in Table 3.

For the experiments and evaluation of RCA-Net, we used a desktop computer equipped with an Intel Core i5-4690 central processing unit (CPU), manufactured by Intel Corporation, Santa Clara, CA, USA, with 16 GB of RAM and an NVIDIA GTX 1070 graphics processing unit (GPU) [41], manufactured by NVIDIA, Santa Clara, CA, USA. The proposed algorithm was implemented in PyTorch (version 1.12.0) [42].

4.2. Training of the Proposed Method

For training RCA-Net, each dataset was divided into 80% training, 10% validation, and 10% testing sets according to the data distribution schemes of SCLD [39] and UPLD [40], and 10-fold cross-validation was performed for fair comparison with SOTA methods. Images from both datasets were uniformly resized to 224 × 224 pixels. The network was optimized with adaptive moment estimation (Adam), using an initial learning rate of 0.001 for SCLD and 0.0001 for UPLD. Because RCA-Net is a lightweight model that requires few epochs, it was trained for only 20 epochs on SCLD and 50 epochs on UPLD; increasing the number of epochs did not improve accuracy. Training took approximately 4 min on SCLD and 10 min on UPLD.
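The text does not state exactly how the 80/10/10 split is combined with 10-fold cross-validation, and the actual splits follow the schemes of [39] and [40]. One common scheme, sketched below purely as an assumption, shuffles the sample indices once, splits them into ten folds, and in round i uses fold i for testing, the next fold for validation, and the remaining eight folds for training. The seed and function name are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def ten_fold_splits(n: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, val, test) index lists, roughly 80/10/10 per round."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]          # 10 near-equal folds
    for i in range(10):
        v = (i + 1) % 10                             # validation fold index
        test, val = folds[i], folds[v]
        train = [j for k, fold in enumerate(folds) if k not in (i, v) for j in fold]
        yield train, val, test


for train, val, test in ten_fold_splits(2569):       # SCLD image count
    print(len(train), len(val), len(test))
    break
```

Every index appears in exactly one test fold across the ten rounds. For class-balanced folds, scikit-learn's `StratifiedKFold` is a common alternative to the plain shuffle used here.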

The training and validation accuracy and loss curves of RCA-Net are depicted in Figure 10 and Figure 11. As the number of epochs increased, the training accuracy increased and the training loss decreased until both converged, confirming that RCA-Net was sufficiently trained with the training data. Likewise, the validation accuracy increased and the validation loss decreased until both converged, indicating that RCA-Net did not overfit the training data.

4.3. Testing of Proposed Method

4.3.1. Evaluation Metrics

Four performance indicators, namely accuracy, precision, recall, and F1 score, are used in this study; these are standard metrics widely used in previous studies [43,44,45]. Accuracy is the most common indicator and reflects the overall correctness of a model across all classes: it is the ratio of correctly identified samples to the total number of samples, and higher values indicate better results. Precision is the ratio of correctly classified positive samples to all samples predicted as positive. Recall measures the ability of the model to identify positive samples, that is, the ratio of correctly classified positive samples to all actual positive samples; a high recall indicates that the model detected most of the positive samples. The F1 score is the harmonic mean of precision and recall. The four performance indicators are defined as follows:

\mathrm{Accuracy} = \frac{T_{P} + T_{N}}{T_{P} + T_{N} + F_{P} + F_{N}}

(11)

\mathrm{Precision} = \frac{T_{P}}{T_{P} + F_{P}}

(12)

\mathrm{Recall} = \frac{T_{P}}{T_{P} + F_{N}}

(13)

\mathrm{F1\,score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(14)

where T_P, T_N, F_P, and F_N are the numbers of true positives, true negatives, false positives, and false negatives, respectively. T_P is the number of positive samples that the model correctly classifies as positive; T_N is the number of negative samples that the model correctly predicts as negative; F_P is the number of negative samples that the model incorrectly identifies as positive; and F_N is the number of positive samples that the model incorrectly classifies as negative.
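Equations (11) to (14) are written for a single positive class. For multi-class data such as SCLD and UPLD, a common approach, assumed here because the text does not specify it, is to derive T_P, T_N, F_P, and F_N for each class in a one-vs-rest manner from the confusion matrix and then average over classes (macro-averaging). The sketch below follows that approach; the confusion matrix in the usage is hypothetical.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """One-vs-rest metrics for every class of a confusion matrix.

    cm[t][p] counts samples of true class t predicted as class p."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as another class
        fp = sum(cm[t][c] for t in range(n)) - tp  # other classes predicted as c
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0        # Eq. (12)
        recall = tp / (tp + fn) if tp + fn else 0.0           # Eq. (13)
        f1 = (2 * precision * recall / (precision + recall)   # Eq. (14)
              if precision + recall else 0.0)
        results.append({"accuracy": (tp + tn) / total,         # Eq. (11)
                        "precision": precision, "recall": recall, "f1": f1})
    return results


def overall_accuracy(cm: List[List[int]]) -> float:
    """Multi-class accuracy: correctly classified samples / all samples."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


def macro_average(results: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric over all classes."""
    return sum(r[key] for r in results) / len(results)


cm = [[8, 2], [1, 9]]   # hypothetical 2-class confusion matrix
print(overall_accuracy(cm), macro_average(per_class_metrics(cm), "recall"))
```

Note that the one-vs-rest accuracy of a single class generally differs from the overall multi-class accuracy (trace divided by total) once there are more than two classes.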

Fractal dimension (FD) is the fifth evaluation metric used in this study. FD is a mathematical measure that quantifies the irregularity of a shape or surface. Recent studies have applied FD in several DL domains, including medical disease recognition [46], pattern recognition [47], plant disease detection [45], and urbanization studies [48]. This study uses FD estimation to analyze pattern complexity and structural irregularity. This analysis provides insight into the morphological differences between healthy and diseased leaves and thereby helps evaluate how effectively the proposed method analyzes diseased plant leaf images. We calculated the FD by converting the activated leaf images into binary images based on the predictions of RCA-Net for the healthy and diseased classes. In these binary images, the activated region is represented by 1 (white) and all other regions by 0 (black). The FD of such images ranges between 1 and 2, and a higher value indicates greater shape complexity. We adopted the widely used box-counting method [49], which can be expressed mathematically as follows:

FD = \lim_{\beta \to 0} \frac{\log M(\beta)}{\log (1/\beta)}

(15)

where M(β) is the number of boxes of side length β that contain at least one activated (white, value 1) pixel. For real objects, FD typically satisfies 1 < FD < 2, which is consistent with previous research [50,51]. The pseudocode for FD estimation using the box-counting method is provided in Algorithm 1.

Algorithm 1: FD estimation pseudo-code [46]

Input: Img: binarized Grad-CAM activation image from the output of RCA-Net

Output: Fractal dimension (FD)

1: Set the largest box size to the nearest power of 2 that covers the image:
p = ⌈log(max(size(Img)))/log 2⌉, β = 2^p

2: Zero-pad Img if its dimensions are smaller than β:
if size(Img) < β:
    Img = padding(Img, β × β)
end

3: Initialize the array b that stores the box counts M(β):
b = zeros(1, p + 1)

4: Count the boxes of size 1, i.e., the activated pixels:
b(p + 1) = sum(Img(:))

5: Starting from β = 2^p, compute M(β), the number of β × β boxes containing at least one activated pixel, halving β while β > 1.

6: Compute log(M(β)) and log(1/β) for each value of β.

7: Fit a straight line to the points (log(1/β), log(M(β))) using least-squares regression:
FD = slope of the fitted line
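The steps of Algorithm 1 can be sketched in plain Python as follows. This is our own illustration rather than the code of [46]: it assumes a 0/1 image in which 1 marks activated pixels, skips box sizes with zero count (log 0 is undefined), and includes a `binarize` helper whose threshold is a hypothetical stand-in for selecting the red region of a normalized Grad-CAM map.

```python
import math
from typing import List, Sequence

Image = List[List[int]]   # 1 = activated (white) pixel, 0 = background (black)


def binarize(heatmap: Sequence[Sequence[float]], threshold: float) -> Image:
    """Mark pixels whose normalized activation is >= threshold as 1."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


def box_counting_fd(img: Image) -> float:
    """Estimate FD following the box-counting steps of Algorithm 1."""
    rows, cols = len(img), len(img[0])
    # Step 1: smallest power of two that covers the larger image side
    size = 2 ** math.ceil(math.log2(max(rows, cols)))
    # Step 2: zero-pad the image to size x size
    grid = [[img[i][j] if i < rows and j < cols else 0 for j in range(size)]
            for i in range(size)]
    xs: List[float] = []   # log(1 / beta)
    ys: List[float] = []   # log(M(beta))
    beta = size
    # Steps 3-6: for beta = size, size/2, ..., 1, count the boxes that
    # contain at least one activated pixel and record the log-log pair
    while beta >= 1:
        count = 0
        for bi in range(0, size, beta):
            for bj in range(0, size, beta):
                if any(grid[i][j] for i in range(bi, bi + beta)
                       for j in range(bj, bj + beta)):
                    count += 1
        if count > 0:
            xs.append(math.log(1.0 / beta))
            ys.append(math.log(count))
        beta //= 2
    if len(xs) < 2:
        raise ValueError("need at least two box sizes with activated pixels")
    # Step 7: least-squares slope of log M(beta) versus log(1 / beta)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

As a sanity check, a completely filled square yields FD = 2 and a single straight line of pixels yields FD = 1; a 224 × 224 activation map is padded to 256 × 256 in Step 2.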

4.3.2. Ablation Studies

To analyze the significance of the newly introduced modules in RCA-Net, we conducted an ablation study to assess their relative contributions to the effectiveness of the network, systematically analyzing how different combinations of modules influence its classification capability. The results of these experiments are presented in Table 4. Five different combinations of the proposed modules with a base network were evaluated. First, the base network (MobileNetV3-Large) was trained using ImageNet weights (Case 1). Second, we integrated the RCA module to strengthen the important low-level features (Case 2), which improved the performance metrics compared with Case 1.

Third, we embedded only the RCB module in the base network (Case 3). This configuration also improved the accuracy compared with the base network; its accuracy was slightly lower than that of Case 2, whereas its other three performance indicators were higher. Next, we extracted more global contextual information related to plant leaf disease from the high-level features using the PDCB module (Case 4). Overall, adding each of these three modules individually improved the results compared with the base network. We then combined the RCA and RCB modules to highlight the most relevant features in the initial stages of the network (Case 5), which significantly improved the performance measures. Finally, RCA-Net was trained with all three proposed modules incorporated into the base network (Case 6). This model performed very well and accurately distinguished between the different classes of plant leaf disease. The key driver of this improvement was the enhanced feature representation achieved by integrating low-, medium-, and high-level features. In the RCA module, the attention map strengthens the features of diseased regions while suppressing background noise. The RCB uses parallel convolutional layers to reduce intra-class variation and inter-class similarity. Together, these blocks enhance the ability of the model to learn discriminative features of plant leaf diseases in the initial stages. The PDCB captures multiscale features through parallel dilated convolutions. Overall, all of these modules contributed significantly to the performance of the proposed RCA-Net.

To further investigate the impact of the residual connection in the RCA module of RCA-Net, we conducted experiments under two scenarios: without the residual connection in the RCA module, and with the residual connection, as shown in Figure 4. The best performance was achieved with the skip connection in the RCA module, which significantly enhanced the learning capacity and feature representation. The detailed results are presented in Figure 12.

A third ablation study was conducted to evaluate the influence of RCB on the performance of the proposed network. We performed various experiments by altering the number of paths and removing the residual connection from the RCB. In Case 1, we removed the skip connection from the RCB and used a concatenated feature map as the output. Additionally, we tested a combination of the upper path 1 (3 × 3 Conv + BN + ReLU6 in Figure 5) and skip connection (Case 2), but the results were lower than those of the proposed network. We also experimented using the lower path 2 (5 × 5 Conv + BN + ReLU6 in Figure 5) and a residual connection (Case 3), which again resulted in decreased performance. Overall, the third ablation study indicated that the feature learning capability of the model was reduced in Cases 1–3 compared to the proposed network, leading to a decline in performance, as shown in Figure 13.

The residual connection in the proposed model is implemented as element-wise addition, an efficient mechanism for merging features at low computational cost. Although we recognize that it may lack fine control over feature selection, we prioritized computational efficiency and avoided unnecessary complexity. To examine the choice of residual connection, we conducted a fourth ablation study comparing our simple residual connection with gated [52] and weighted [53] residual connections in the RCB module. The results in Figure 14 demonstrate that the simple residual connection, despite its simplicity, achieves the highest performance, even in the presence of complex backgrounds. Simple feature addition improves training stability and reduces the risk of vanishing gradients without adding complexity. In contrast, weighted and gated residual connections offer more control over feature selection but increase the number of learnable parameters, which may introduce redundancy and hinder performance. Moreover, RCA-Net already incorporates an attention mechanism that effectively emphasizes essential information, so the additional control offered by gated or weighted residual connections is redundant in our framework, can lead to overfitting, and does not significantly improve the model’s ability to handle complex backgrounds.
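The difference in learnable parameters among the three residual variants can be illustrated as below. The exact formulations of the gated [52] and weighted [53] connections are not given here, so the forms used are common ones that we assume for illustration: a weighted sum with two learnable scalars, and a per-channel sigmoid gate that interpolates between the input and the learned features. Vectors stand in for per-channel values with the spatial dimensions omitted.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]   # e.g. one value per channel (spatial dims omitted)


def simple_residual(x: Vec, f: Vec) -> List[float]:
    """Element-wise addition as used in RCA-Net: no learnable parameters."""
    return [xi + fi for xi, fi in zip(x, f)]


def weighted_residual(x: Vec, f: Vec, alpha: float, beta: float) -> List[float]:
    """Assumed weighted form: alpha * x + beta * f (2 learnable scalars)."""
    return [alpha * xi + beta * fi for xi, fi in zip(x, f)]


def gated_residual(x: Vec, f: Vec, w: Vec, b: Vec) -> List[float]:
    """Assumed gated form: g = sigmoid(w * f + b) per channel,
    output = g * f + (1 - g) * x (2 learnable values per channel)."""
    out = []
    for xi, fi, wi, bi in zip(x, f, w, b):
        g = 1.0 / (1.0 + math.exp(-(wi * fi + bi)))
        out.append(g * fi + (1.0 - g) * xi)
    return out


print(simple_residual([1.0, 2.0], [3.0, 4.0]))                      # [4.0, 6.0]
print(gated_residual([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]))  # [2.0, 3.0]
```

With alpha = beta = 1 the weighted form reduces to simple addition, and a gate fixed at 0.5 averages the two inputs; the extra parameters only pay off if the gate learns something the attention modules do not already provide.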

In the fifth ablation study, we assessed the performance of RCA-Net with different numbers of dilated convolutional layers in the PDCB module, testing three, four, and five layers. The PDCB with four layers outperformed the other two configurations and delivered the best results, whereas performance declined when the model was trained with three or five layers. This suggests that four dilated convolutional layers provide the best balance, enhancing the ability of the model to learn complex patterns without over- or under-fitting. The detailed results are presented in Figure 15 and provide useful insights for optimizing the architecture of RCA-Net.

We also compared dynamic and static dilated convolution. Dynamic dilated convolution [54] offers flexibility by adjusting the receptive field to different diseased regions based on the input feature map, allowing the model to capture features at varying scales dynamically. However, it introduces additional computational cost because it must predict dilation rates and adjust the receptive field according to image characteristics. These additional operations increase the processing time, making deployment on agricultural robots with limited computational resources more challenging. Our experiments demonstrate that static dilation rates are more effective than dynamic dilated convolution [54] for the current application, as shown in Figure 16. Static dilation rates provide robust and efficient multi-scale feature extraction, which is particularly important for the diverse and complex scenarios encountered in agricultural environments.

In the final ablation study, we conducted a series of experiments varying the number of RCA modules and RCBs in the proposed network, with the aim of balancing model performance and overall complexity. By iteratively varying the number of RCA modules and RCBs, we observed their impact on both the predictive capability and the computational efficiency of the model; the results are presented in Figure 17. The proposed architecture yielded the best results with three repetitions of the RCA module and RCB. When these blocks were repeated twice, the number of parameters decreased but the performance declined. Conversely, repeating them four or five times increased the model complexity and resulted in overfitting.

4.3.3. Comparison of the RCA-Net with SOTA Models

This section compares the proposed RCA-Net framework with SOTA models. VGG-19 [55], VGG-16 [56], ResNet-50 [57], XceptionNet [58], MobileNetV2 [59], EfficientNet-B3 [60], MobileNetV3-Large [61], ConvNeXt-Tiny [62], DenseNet121 [23], ResNet-101 [63], ShuffleNetV2 [64], ECA-ConvNeXt [30], and Ensemble Net [31] are compared for the classification of plant leaf disease images. We fine-tuned the SOTA CNN models on our datasets starting from pretrained weights. For fair comparison, the SOTA models were trained for the same numbers of epochs as our model (Section 4.2) and with the same hyperparameters. The reported results were obtained from our own implementations of the SOTA models rather than taken from the original papers. As listed in Table 5, RCA-Net surpassed all existing models across all performance metrics on SCLD.

To further validate the robustness and effectiveness of our proposed RCA-Net, we conducted a comprehensive series of experiments on the UPLD. In these experiments, we benchmarked its performance against that of various SOTA methods. RCA-Net consistently outperformed these SOTA models across all evaluation metrics, as shown in Table 5. This comprehensive set of results underscores the enhanced capabilities and reliability of the proposed RCA-Net for various classification tasks.

The difference in the performance of the proposed model between the two datasets in Table 5 arises from differences in dataset characteristics. Specifically, the UPLD contains images with more complex backgrounds, more varied lighting conditions, and higher noise levels than the SCLD, which makes accurate classification more challenging on the UPLD. Figure 8 shows that the SCLD samples have consistent lighting. In contrast, Figure 9e,g show examples in which parts of the leaf are illuminated by sunlight while other parts remain in shadow, which hinders the extraction of meaningful features. In addition, similar-looking leaves in the background further increase the noise level. For instance, in Figure 9e,g, the foreground leaves belong to the pest and healthy classes, respectively, whereas the leaves in the background belong to other classes, which can lead to misclassification. Although these factors increase the likelihood of misclassification, RCA-Net still outperforms the other SOTA models. Some SOTA models show a similar decrease in performance between the two datasets in Table 5, whereas others do not. This variation is caused by the dataset characteristics described above rather than by the hyperparameters presented in Section 4.2.

We also applied data augmentation to address the severe class imbalance in the UPLD, which contains 748 fungi images but only 68 nematode images. Brightness adjustment, horizontal and vertical flipping, in-plane rotation, zooming, and translation were used to increase the number of images in each class to 1000. However, the performance of the proposed model degraded compared with training on the original, non-augmented dataset. This drop in accuracy may be attributed to the limited semantic diversity within the original minority classes, which could not be meaningfully enriched through augmentation; as a result, our lightweight model may have overfitted to redundant patterns. It is also possible that augmentation amplified existing label noise and changed its distribution, preventing the proposed model from achieving higher performance. The experimental results without and with augmentation are shown in Table 5 and Table 6, respectively.

4.3.4. Comparisons of Processing Time and Model Complexity

To analyze the complexity of our proposed architecture, we compared the number of parameters (Param), the number of floating-point operations (FLOPs), and the total memory usage of our method with those of the SOTA methods, as shown in Table 7. Although our method ranks only fourth lowest in Param, FLOPs, memory usage, and inference time among all methods, RCA-Net achieves the best classification performance (Table 7), which was the main goal of this study. Table 7 also compares the inference time per image of our method and the SOTA models on a desktop computer and on a Jetson TX2 embedded system. The Jetson TX2 can serve as an edge-computing platform in mobile devices or farming robots. It features an NVIDIA Pascal™-family GPU with 256 compute unified device architecture (CUDA) cores, 8 GB of memory shared between the CPU and GPU, and a memory bandwidth of 59.7 GB/s, while consuming less than 7.5 W, as shown in Figure 18 [65].

Modern agricultural robots and mobile devices are equipped with cameras that can capture images in real time. These images can be processed for disease classification in two ways: by transferring them to a server for classification via cloud computing, or by classifying them on the spot with an embedded system installed in the robot or mobile device. Cloud computing requires a high-bandwidth internet connection between the robot (or mobile device) and the server, which is often not feasible across wide agricultural fields. Therefore, an onboard embedded system that processes the images and identifies the disease directly on the robot (or mobile device) can be a better solution. Table 7 lists the inference time per image with a batch size of 32, which allows the images to be processed in parallel on the many cores of the GPU. As shown in Table 7, our method requires 7.26 and 12.70 ms per image on the desktop and Jetson TX2 system, respectively, corresponding to processing speeds of 137.7 (1000/7.26) and 78.7 (1000/12.70) frames per second (fps). Although our method is only the third and fourth fastest among all methods on the desktop and Jetson TX2 systems, respectively, RCA-Net exhibits the best classification performance (Table 7), which is the main goal of this study.
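The conversion from per-image latency to frame rate, and one way to obtain a per-image latency from batched inference, can be sketched as follows. `run_batch` is a hypothetical placeholder for the model's forward pass; the warm-up call and the number of repeats are our assumptions, not the measurement protocol of the paper.

```python
import time
from typing import Callable, Sequence


def per_image_ms(run_batch: Callable[[Sequence], object], batch: Sequence,
                 repeats: int = 10) -> float:
    """Average wall-clock time per image (ms) for batched inference."""
    run_batch(batch)                         # warm-up call, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        run_batch(batch)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(batch))


def fps(ms_per_image: float) -> float:
    """Frames per second from milliseconds per image: 1000 / ms."""
    return 1000.0 / ms_per_image


print(round(fps(7.26), 1), round(fps(12.70), 1))   # 137.7 78.7
```

When timing a PyTorch model on a GPU, `torch.cuda.synchronize()` should be called before reading the clock, because CUDA kernels are launched asynchronously and the timer would otherwise stop before the computation finishes.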

5. Discussions

5.1. Confusion Matrix

A comprehensive analysis of the class-wise performance of the proposed RCA-Net on the SCLD is presented as a confusion matrix in Figure 19. A confusion matrix is used to evaluate the effectiveness of a classification model [66]. Each value along the diagonal in Figure 19 represents the correctly predicted samples of the corresponding class. Most classes exhibit high classification accuracy with our proposed model. However, two disease classes, mosaic and yellow, show slightly lower accuracy than the other classes owing to inter-class similarity and background noise. For example, 12.24% of the yellow disease samples were incorrectly predicted as red rot owing to background noise. In addition, 6.52% of the mosaic disease samples were wrongly predicted as healthy, illustrating the resemblance between these classes, as shown in Figure 8.

Figure 20 shows the per-class performance of RCA-Net on the UPLD as a confusion matrix. The high accuracies for the bacteria and Phytophthora classes indicate that our model differentiates these classes well. The pest class has the lowest accuracy (63.33%), followed by nematode (66.66%), owing to background noise, and virus (69.81%), owing to its resemblance to the healthy class and to background noise, as shown in Figure 9.

In Figure 20, the Nematode (true) × Fungi (predicted) and Fungi (true) × Nematode (predicted) cells need not be identical, because each row is normalized by the number of images in its true class and these numbers differ. In the former cell, the ground-truth class is Nematode, and 16.67% of the Nematode data are misclassified as Fungi (false negatives for Nematode). In the latter cell, the ground-truth class is Fungi, and 1.35% of the Fungi data are misclassified as Nematode (false negatives for Fungi). Because the Nematode class contains far fewer images than the Fungi class, as shown in Table 3, the percentages in these two cells differ.
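The effect of row normalization can be reproduced with a short worked example. The counts below are hypothetical, restricted to the two classes in question, and chosen only to show that a single misclassification in each direction gives very different percentages when the class sizes differ; the actual counts behind Figure 20 are not reported in the text.

```python
from typing import List


def row_normalize(cm: List[List[int]]) -> List[List[float]]:
    """Convert counts cm[true][pred] to percentages of each true class."""
    return [[100.0 * v / sum(row) for v in row] for row in cm]


# Hypothetical two-class excerpt [Nematode, Fungi]; rows are true classes.
counts = [[5, 1],     # 6 Nematode test images, 1 predicted as Fungi
          [1, 73]]    # 74 Fungi test images, 1 predicted as Nematode
pct = row_normalize(counts)
print(f"{pct[0][1]:.2f} {pct[1][0]:.2f}")   # 16.67 1.35
```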

5.2. Grad-CAM

In our study, gradient-weighted class activation mapping (Grad-CAM) [67] was adopted to visualize the classified image samples. Grad-CAM highlights the image regions that most strongly influence a prediction, which helps verify that the model relies on relevant regions and allows us to extract meaningful insights from the proposed architecture. We obtained the Grad-CAM activation maps from the PDCB, as shown in Figure 4 and Table 2. Correctly predicted results from both datasets are presented in Figure 21 and Figure 22, where the first row shows the original input images and the second row shows the corresponding Grad-CAM maps. Reddish colors indicate regions of important features, whereas bluish colors indicate regions of unimportant features. Figure 21 and Figure 22 confirm that RCA-Net emphasizes the discriminative and important features of the input samples within the leaf and diseased areas, irrespective of complex backgrounds.
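The core Grad-CAM computation of [67] can be sketched as follows, given the feature maps A^k of the chosen layer and the gradients of the class score with respect to them. Each channel weight is the spatial mean of its gradient, and the map is the ReLU of the weighted sum of channels, scaled here to [0, 1] for display. In PyTorch, the activations and gradients are typically captured with `register_forward_hook` and `register_full_backward_hook` on the target module; the hook wiring, the upsampling to the 224 × 224 input size, and the toy inputs below are outside or beyond this sketch and are assumptions.

```python
from typing import List

Map = List[List[float]]


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Grad-CAM map from feature maps A^k and gradients dy^c/dA^k.

    alpha_k = spatial mean of the gradients of channel k;
    cam = ReLU(sum_k alpha_k * A^k), then scaled to [0, 1] for display."""
    h, w = len(activations[0]), len(activations[0][0])
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[max(0.0, sum(a * act[i][j] for a, act in zip(alphas, activations)))
            for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]     # toy A^k
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))   # [[1.0, 0.0], [0.0, 0.0]]
```

The channel with negative average gradient is suppressed by the ReLU, so only regions that increase the class score remain highlighted.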

Incorrect predictions by our proposed model for the disease classes of SCLD and UPLD are shown in Figure 23 and Figure 24. The input images are shown in Figure 23a and Figure 24a, the corresponding Grad-CAM heat maps for each class in Figure 23b and Figure 24b, and image samples from the predicted classes in Figure 23c and Figure 24c. These incorrect classifications reveal several issues. Because the datasets were collected directly from fields with complex backgrounds, background noise makes misclassification likely. Uneven lighting, multiple leaves in the image, bushes, and soil made accurate prediction difficult for our model. In the first row of Figure 23, a sample from the mosaic disease class is misclassified as healthy because it shares characteristics with the healthy sample presented in the first row of Figure 23c.

Uneven illumination is one cause of this misclassification because it misleads the model when extracting discriminative features. Figure 21c,d show that the rust and red rot disease classes differ in the spatial position of the lesions: red rot targets the midrib of the leaf, whereas rust spreads to the lateral parts of the leaf. However, in some cases, such as the second row of Figure 23, a rust sample was incorrectly predicted as red rot because some rust appeared along the midrib. The third misclassified sample in Figure 23 belongs to the yellow disease class, but because of background noise and a small brown spot near the midrib, the model classified it as red rot.

The confusion matrix in Figure 20 indicates that the fungi class shares common features with the pest and phytophthora classes. The first row of Figure 24 shows a fungi sample that was falsely predicted as phytophthora. The second row of Figure 24 shows a pest sample that was mistakenly classified as fungi because of its similar appearance and complex background. Similarly, leaves in the virus and healthy classes resemble each other, as shown in the third row of Figure 24, which caused misclassification.

In this paper, we did not consider datasets of plants growing in water or submerged in water. Because RCA-Net is designed to efficiently capture detailed leaf features through its attention and residual modules, we believe that, with proper training, it can adapt well to the characteristics of plants grown in water. Although aquatic plants may present additional challenges, such as reflections or texture patterns caused by water, the core design of the proposed method is flexible enough to handle such variations, as it already handles complex datasets containing images with different backgrounds. In future work, we plan to further generalize the proposed method by including datasets of plants growing in or soaked in water and validating its performance under these conditions.

5.3. Evaluating RCA-Net’s Performance by FD Estimation

To gain deeper insight into RCA-Net’s predicted outcomes, we applied FD estimation, which helps examine the structural complexity and irregularities of the healthy and diseased categories. For this purpose, the box-counting method was applied to the activation maps generated by the proposed model. The activation maps were obtained using Grad-CAM, as explained in the previous subsection, which renders the leaf regions in a red, green, and blue color scheme. The most important regions of the leaves, which contain a large number of important features, are highlighted in red. For binarization, we therefore converted the red region of each class's activation map to white and all other colored areas to black, as illustrated in Figure 25.

Table 8 compares the structural intricacy of the healthy and diseased leaf classes in the UPLD based on FD values, the coefficient of determination (R²), and the correlation coefficient (C). Higher FD values suggest greater complexity and irregularity in leaf patterns, whereas lower values imply more uniform and simpler structures. R² and C values close to 1 indicate a better fit of the regression line to the observed box-counting data. Figure 25 and Table 8 show that the healthy class has the highest FD value (1.7465) among all listed examples. Likewise, its R² and C values, 0.998539 and 0.999269, respectively, are the highest and closest to 1, reinforcing the reliability of the FD values as indicators of structural complexity and highlighting the effective performance of our proposed model. The noticeable differences between the values of the healthy and diseased classes demonstrate the model’s effectiveness in distinguishing between categories. These findings suggest that the proposed model holds significant potential for enhancing smart agricultural technologies through accurate plant disease classification.
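The text does not detail how R² and C are computed; a plausible reading, assumed here, is that both describe the least-squares fit of log M(β) against log(1/β) in Algorithm 1, whose slope is the FD. Under that assumption, R² of a straight-line fit with intercept equals C², which the sketch below checks.

```python
import math
from typing import Sequence, Tuple


def loglog_fit(log_inv_beta: Sequence[float],
               log_count: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit log M = FD * log(1/beta) + b.

    Returns (FD, R^2, C), where C is Pearson's correlation coefficient."""
    n = len(log_inv_beta)
    mx, my = sum(log_inv_beta) / n, sum(log_count) / n
    sxx = sum((x - mx) ** 2 for x in log_inv_beta)
    syy = sum((y - my) ** 2 for y in log_count)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_inv_beta, log_count))
    fd = sxy / sxx                      # slope of the fitted line
    intercept = my - fd * mx
    ss_res = sum((y - (fd * x + intercept)) ** 2
                 for x, y in zip(log_inv_beta, log_count))
    r2 = 1.0 - ss_res / syy             # coefficient of determination
    c = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient
    return fd, r2, c


print(round(0.999269 ** 2, 6))   # 0.998539
```

The printed value shows that the healthy-class C of 0.999269 squared gives the reported R² of 0.998539, which is consistent with this reading.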

5.4. Statistical Analysis

Additionally, we performed a statistical analysis using a t-test [68] to assess the difference in performance between the proposed model and the second-best model. This analysis was needed to verify that the performance gap between our model and the other models is statistically significant. We also computed Cohen’s d [69] to measure the effect size, which quantifies the magnitude of the difference. A large Cohen’s d indicates a substantial effect size, meaning that the difference is meaningful for practical implementation and not merely statistically detectable.

As shown in Figure 26, the obtained p-value was 0.001, which is below the 0.01 threshold and indicates statistical significance at the 99% confidence level. Moreover, Cohen’s d was 1.5, exceeding the conventional threshold of 0.8 for a large effect size. These results statistically confirm the superiority of the proposed RCA-Net over the second-best model.
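Both statistics can be computed directly from per-run scores of the two models. The sketch below assumes an independent two-sample Student’s t-test with equal variances and Cohen’s d with a pooled standard deviation; this section does not state which t-test variant or which per-run scores were used, so the accuracy lists are hypothetical placeholders, not the paper’s data. The function names are also hypothetical.

```python
import math
import statistics


def students_t_and_cohens_d(a, b):
    """Independent two-sample Student's t statistic (equal variances assumed),
    its degrees of freedom, and Cohen's d based on the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    mean_diff = statistics.mean(a) - statistics.mean(b)
    # statistics.variance uses the sample (n - 1) denominator.
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    t = mean_diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    d = mean_diff / pooled_sd
    return t, n1 + n2 - 2, d


def effect_size_label(d):
    """Cohen's conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"


def is_significant(p_value, alpha=0.01):
    """Significance at the (1 - alpha) confidence level, e.g., 99% for alpha = 0.01."""
    return p_value < alpha


# Hypothetical per-run accuracies (%), NOT the paper's data.
proposed = [93.9, 93.6, 94.1, 93.7, 93.8]
second_best = [92.1, 92.6, 91.8, 92.4, 92.2]
t, df, d = students_t_and_cohens_d(proposed, second_best)
print(f"t = {t:.2f} (df = {df}), d = {d:.2f} ({effect_size_label(d)})")
print(effect_size_label(1.5), is_significant(0.001))  # values reported in the text → large True
```

The p-value is the two-sided tail probability of the t distribution with `df` degrees of freedom; `scipy.stats.ttest_ind(a, b)` (equal variances by default) returns the same t statistic together with that p-value. If both models were evaluated on the same folds, a paired test (`scipy.stats.ttest_rel`) would be the more appropriate variant.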

We acknowledge that weather conditions and lighting variations can significantly affect the quality and appearance of the leaves in the images, which can degrade the performance of the proposed network. Agricultural scenarios also pose specific challenges, such as variable lighting, diverse viewing angles, and complex plant morphologies, all of which can reduce plant disease recognition accuracy. To address these challenges, the proposed RCA mechanism enhances feature extraction by focusing on disease-relevant regions while suppressing irrelevant background noise. The RCA module uses a large receptive field to capture broader contextual information and residual connections to preserve critical features, keeping the focus on disease-related patterns. To further improve adaptability, the model integrates an RCB with parallel convolutions of different kernel sizes, which captures multi-scale features and mitigates intra-class variation and inter-class similarity. This design helps the model handle variations in image scale caused by different viewing angles and plant structures. Although the PDCB currently employs static dilation rates, the architecture still performs effective multi-scale feature extraction, which helps it adapt to the variability of agricultural environments, as shown in Figure 16. Our experiments further confirm the model’s robustness and effectiveness in classifying plant leaf diseases under complex background conditions, as shown in Table 5. In future work, we will experiment with additional data covering diverse weather conditions and lighting variations.
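The effect of kernel size and static dilation rate on the receptive field can be checked with simple arithmetic: a k × k kernel with dilation d spans k + (k − 1)(d − 1) pixels, and stride-1 layers applied in sequence add their spans minus one. The sketch below applies this to hypothetical parallel branches; the actual kernel sizes and dilation rates of RCB and PDCB are defined in the method description and figures, not in this section, so the values here are illustrative only.

```python
def effective_kernel(kernel_size, dilation):
    """Span (in pixels) covered by a kernel of the given size and dilation rate."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of stride-1 convolutions applied in sequence.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


# Hypothetical parallel branches: different kernel sizes (as in RCB) and
# different static dilation rates (as in PDCB). Values are illustrative only.
branches = {
    "1x1": [(1, 1)],
    "3x3": [(3, 1)],
    "5x5": [(5, 1)],
    "3x3, d=2": [(3, 2)],
    "3x3, d=3": [(3, 3)],
}
for name, layers in branches.items():
    print(f"{name:>9}: receptive field {stacked_receptive_field(layers)} px")
```

Because PDCB operates on 7 × 7 feature maps (Table 2), a 3 × 3 kernel with a dilation rate of 3 already spans the full width of the map; with rates above 3, every position would have outer kernel taps falling outside the map.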

Combining our proposed modules (RCA, RCB, and PDCB) with the bottleneck blocks of MobileNetV3-Large enables efficient classification of plant disease images with complex backgrounds. RCA and RCB are integrated in the early stages of RCA-Net (Stages 1 to 3 in Table 2), where the feature maps still have large spatial dimensions. The attention module enhances key features, while the residual block extracts essential disease-related information. Their outputs are combined through element-wise addition, as shown in Figure 4, allowing the model to refine critical features that are further processed by the subsequent bottleneck blocks. At the final stage, PDCB is applied for multi-scale feature extraction. Together, these components enhance RCA-Net’s ability to classify plant diseases effectively.
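The fusion step can be illustrated with NumPy. The RCA and RCB internals are defined in Figures 5 and 6 and are not reproduced here; `attention_branch` and `residual_branch` below are hypothetical stand-ins, chosen only to show that both branches receive the same Stage 1 feature map and return tensors of identical shape (56 × 56 × 24 in Table 2), which is what element-wise addition requires.

```python
import numpy as np

rng = np.random.default_rng(0)


def attention_branch(x):
    """Stand-in attention branch (hypothetical, not the paper's RCA): a spatial
    mask from the channel-wise mean, applied with a residual connection."""
    mask = 1.0 / (1.0 + np.exp(-x.mean(axis=-1, keepdims=True)))  # (H, W, 1), values in (0, 1)
    return x + x * mask


def residual_branch(x, weights):
    """Stand-in residual branch (hypothetical, not the paper's RCB): ReLU of a
    1x1 convolution (per-pixel channel mixing) plus an identity skip."""
    return x + np.maximum(x @ weights, 0.0)


x = rng.standard_normal((56, 56, 24))          # H x W x C of the Stage 1 output in Table 2
weights = 0.1 * rng.standard_normal((24, 24))  # hypothetical 1x1 convolution weights
fused = attention_branch(x) + residual_branch(x, weights)  # element-wise addition
print(fused.shape)  # → (56, 56, 24)
```

Element-wise addition keeps the channel count unchanged, unlike concatenation, which would double it and add parameters to the following bottleneck block.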

In summary, AI plays a significant role in transforming modern agriculture by providing farmers with fast, accurate, and automated monitoring of plant health. With the proposed RCA-Net framework, we presented an AI-driven methodology that effectively classifies plant diseases even in complex real-world environments. Experiments on two publicly available datasets show that the proposed methodology achieves superior performance while maintaining computational efficiency suitable for embedded systems. Furthermore, FD analysis confirmed the model’s ability to capture structural complexity in diseased leaves. Integrating RCA-Net into intelligent IoT-based agricultural systems can enable real-time disease detection and contribute to sustainable and precision farming practices. These results highlight the broader potential of AI to move conventional agriculture toward smart and efficient crop management.

6. Conclusions

In this study, we proposed a novel RCA-Net for classifying plant leaf diseases in images with complex backgrounds. The model’s feature representation is strengthened by three proposed modules, RCA, RCB, and PDCB, which are integrated with the base model (MobileNetV3-Large). Experiments were conducted on the SCLD and UPLD. Compared with SOTA models, the proposed model showed clear superiority in terms of accuracy, precision, recall, and F1 score. RCA-Net effectively handles the inter-class similarities and intra-class differences caused by diverse backgrounds and other noise factors, such as uneven illumination, and correctly discriminates between the various disease and healthy classes in both datasets. This indicates that its feature representation of plant leaf disease images is applicable to real-world agricultural imaging. In addition, we confirmed that our model can run on a resource-constrained embedded system (NVIDIA Jetson TX2) for farming robots or mobile devices at a fast processing speed (78.7 frames per second). We also introduced FD estimation to analyze the complexity and irregularity of the class activation maps of healthy and diseased plants, confirming that our model extracts important features for the correct classification of plant disease. Statistically, the t-test and Cohen’s d also confirmed the superiority of the proposed architecture over the second-best SOTA method. Nevertheless, our model exhibited classification errors on images with small inter-class differences, severely complex backgrounds, and uneven illumination.

In future work, we will investigate methods for amplifying inter-class differences based on important features. We will also research methods for reducing the effect of complex backgrounds by focusing on the region of interest based on a heatmap. We will consider additional image preprocessing to normalize the uneven illumination. Additionally, we will apply our model to other plant and fruit disease classification tasks to verify the generality of the proposed method.

Author Contributions

Methodology, writing—original draft, M.H.T.; conceptualization, H.S.; investigation, R.A. and S.G.K.; software, J.S.K.; data curation, M.U. and H.A.H.G.; validation, J.S. and Y.H.L.; supervision, writing—review and editing, K.R.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Ministry of Science and ICT (MSIT), Korea, through the Information Technology Research Center (ITRC) Support Program under Grant IITP-2025-2020-0-01789, and in part by the Artificial Intelligence Convergence Innovation Human Resources Development supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP) under Grant IITP-2025-RS-2023-00254592.

Data Availability Statement

Our model and code are publicly available on GitHub (https://github.com/mhamza92/RCA-Net, accessed on 15 April 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Tian, H.; Wang, T.; Liu, Y.; Qiao, X.; Li, Y. Computer vision technology in agricultural automation—A review. Inf. Process. Agric. 2020, 7, 1–19.
Sharma, A.; Jain, A.; Gupta, P.; Chowdary, V. Machine learning applications for precision agriculture: A comprehensive review. IEEE Access 2020, 9, 4843–4873.
Iqbal, Z.; Khan, M.A.; Sharif, M.; Shah, J.H.; ur Rehman, M.H.; Javed, K. An automated detection and classification of citrus plant diseases using image processing techniques: A review. Comput. Electron. Agric. 2018, 153, 12–32.
Ampatzidis, Y.; De Bellis, L.; Luvisi, A. iPathology: Robotic applications and management of plants and plant diseases. Sustainability 2017, 9, 1010.
Cruz, A.; Ampatzidis, Y.; Pierro, R.; Materazzi, A.; Panattoni, A.; De Bellis, L.; Luvisi, A. Detection of grapevine yellows symptoms in Vitis vinifera L. with artificial intelligence. Comput. Electron. Agric. 2019, 157, 63–76.
Shirahatti, J.; Patil, R.; Akulwar, P. A survey paper on plant disease identification using machine learning approach. In Proceedings of the 3rd International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 15–16 October 2018.
Mohyuddin, G.; Khan, M.A.; Haseeb, A.; Mahpara, S.; Waseem, M.; Saleh, A.M. Evaluation of Machine Learning approaches for precision Farming in Smart Agriculture System-A comprehensive Review. IEEE Access 2024, 12, 60155–60184.
Delfani, P.; Thuraga, V.; Banerjee, B.; Chawade, A. Integrative approaches in modern agriculture: IoT, ML and AI for disease forecasting amidst climate change. Precis. Agric. 2024, 25, 2589–2613.
Naseer, A.; Shmoon, M.; Shakeel, T.; Ur Rehman, S.; Ahmad, A.; Gruhn, V. A Systematic Literature Review of the IoT in Agriculture-Global Adoption, Innovations, Security Privacy Challenges. IEEE Access 2024, 12, 60986–61021.
RCA-Net. Available online: https://github.com/mhamza92/RCA-Net (accessed on 15 April 2025).
Narla, V.L.; Suresh, G. Multiple feature-based tomato plant leaf disease classification using SVM classifier. In Machine Learning, Image Processing, Network Security, and Data Sciences; Doriya, R., Soni, B., Shukla, A., Gao, X.-Z., Eds.; Lecture Notes in Electrical Engineering; Springer Nature: Singapore, 2023; Volume 946, pp. 443–455.
Applalanaidu, M.V.; Kumaravelan, G. A review of machine learning approaches in plant leaf disease detection and classification. In Proceedings of the 3rd International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), Tirunelveli, India, 4–5 February 2021.
Hossain, E.; Hossain, M.F.; Rahaman, M.A. A color and texture based approach for the detection and classification of plant leaf disease using KNN classifier. In Proceedings of the International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox’s Bazar, Bangladesh, 7–9 February 2019; IEEE: Piscataway, NJ, USA, 2019.
Mokhtar, U.; El Bendary, N.; Hassenian, A.E.; Emary, E.; Mahmoud, M.A.; Hefny, H.; Tolba, M.F. SVM-based detection of tomato leaves diseases. In Proceedings of the 7th IEEE International Conference Intelligent Systems IS, Warsaw, Poland, 24–26 September 2014.
Bhagat, M.; Kumar, D. Efficient feature selection using BoWs and SURF method for leaf disease identification. Multimed. Tools Appl. 2023, 82, 28187–28211.
Pantazi, X.E.; Moshou, D.; Tamouridou, A.A. Automated leaf disease detection in different crop species through image features analysis and One Class Classifiers. Comput. Electron. Agric. 2019, 156, 96–104.
Pandey, A.; Jain, K. An intelligent system for crop identification and classification from UAV images using conjugated dense convolutional neural network. Comput. Electron. Agric. 2022, 192, 106543.
Paymode, A.S.; Malode, V.B. Transfer learning for multi-crop leaf disease image classification using convolutional neural network VGG. Artif. Intell. Agric. 2022, 6, 23–33.
Agarwal, M.; Abhishek, S.; Siddhartha, A.; Amit, S.; Suneet, G. ToLeD: Tomato leaf disease detection using convolution neural network. Procedia Comput. Sci. 2020, 167, 293–301.
Atila, Ü.; Uçar, M.; Akyol, K.; Uçar, E. Plant leaf disease classification using EfficientNet deep learning model. Ecol. Inform. 2021, 61, 101182.
Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv 2016, arXiv:1511.08060.
Esgario, J.G.; Krohling, R.A.; Ventura, J.A. Deep learning for classification and severity estimation of coffee leaf biotic stress. Comput. Electron. Agric. 2020, 169, 105162.
Nag, A.; Chanda, P.R.; Nandi, S. Mobile app-based tomato disease identification with fine-tuned convolutional neural networks. Comput. Electr. Eng. 2023, 112, 108995.
Zhao, Y.; Li, Y.; Wu, N.; Xu, X. Neural network based on convolution and self-attention fusion mechanism for plant leaves disease recognition. Crop Prot. 2024, 180, 106637.
Liu, J.; Wang, X. Plant diseases and pests detection based on deep learning: A review. Plant Methods 2021, 17, 1–18.
Madhavan, M.V.; Thanh, D.N.H.; Khamparia, A.; Pande, S.; Malik, R.; Gupta, D. Recognition and classification of pomegranate leaves diseases by image processing and machine learning techniques. Comput. Mater. Contin. 2021, 66, 2939–2955.
Zhao, Y.; Zhang, Z.; Wu, N.; Zhang, Z.; Xu, X. MAFDE-DN4: Improved Few-shot plant disease classification method based on Deep Nearest Neighbor Neural Network. Comput. Electron. Agric. 2024, 226, 109373.
Ding, J.; Zhang, C.; Cheng, X.; Yue, Y.; Fan, G.; Wu, Y.; Zhang, Y. Method for classifying apple leaf diseases based on dual attention and multi-scale feature extraction. Agriculture 2023, 13, 940.
Wang, P.; Xiong, Y.; Zhang, H. Maize leaf disease recognition based on improved MSRCR and OSCRNet. Crop Prot. 2024, 183, 106757.
Wang, X.; Wang, Y.; Zhao, J.; Niu, J. ECA-ConvNext: A rice leaf disease identification model based on ConvNext. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023.
Daphal, S.D.; Koli, S.M. Enhancing sugarcane disease classification with ensemble deep learning: A comparative study with transfer learning techniques. Heliyon 2023, 9, e18261.
Xu, Y.; Shi, Z.; Xie, X.; Chen, Z.; Xie, Z. Residual channel attention fusion network for road extraction based on remote sensing images and GPS trajectories. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 8358–8369.
Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual attention network for image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164.
Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 17 October–2 November 2019.
MobilenetV3. Available online: https://github.com/xiaolai-sqlai/mobilenetv3.git (accessed on 10 January 2024).
Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015.
Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016.
Daphal, S.D.; Koli, S.M. Sugarcane leaf disease dataset. Mendeley Data 2022, 1.
Shabrina, N.H.; Indarti, S.; Maharani, R.; Kristiyanti, D.A.; Prastomo, N. A novel dataset of potato leaf disease in uncontrolled environment. Data Brief 2024, 52, 109955.
NVIDIA GeForce GTX 1070. Available online: https://www.nvidia.com/en-us/geforce/10-series/ (accessed on 12 December 2023).
Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Conference of Advances in Neural Information Processing Systems 32 (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019.
Pham, T.D.; Lee, Y.W.; Park, C.; Park, K.R. Deep Learning-Based Detection of Fake Multinational Banknotes in a Cross-Dataset Environment Utilizing Smartphone Cameras for Assisting Visually Impaired Individuals. Mathematics 2022, 10, 1616.
Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
Akram, R.; Hong, J.S.; Kim, S.G.; Sultan, H.; Usman, M.; Gondal, H.A.H.; Tariq, M.H.; Ullah, N.; Park, K.R. Crop and weed segmentation and fractal dimension estimation using small training data in heterogeneous data environment. Fractal Fract. 2024, 8, 285.
Sultan, H.; Ullah, N.; Hong, J.S.; Kim, S.G.; Lee, D.C.; Jung, S.Y.; Park, K.R. Estimation of fractal dimension and segmentation of brain tumor with parallel features aggregation network. Fractal Fract. 2024, 8, 357.
Kim, S.G.; Hong, J.S.; Kim, J.S.; Park, K.R. Estimation of fractal dimension and detection of fake finger-vein images for finger-vein recognition. Fractal Fract. 2024, 8, 646.
Chen, Y.; Wang, Y.; Li, X. Fractal dimensions derived from spatial allometric scaling of urban form. Chaos Solitons Fractals 2019, 126, 122–134.
Wu, J.; Jin, X.; Mi, S.; Tang, J. An Effective Method to Compute the Box-counting Dimension Based on the Mathematical Definition and Intervals. Results Eng. 2020, 6, 100106.
Wu, J.; Xie, D.; Yi, S.; Yin, S.; Hu, D.; Li, Y.; Wang, Y. Fractal Study of the Development Law of Mining Cracks. Fractal Fract. 2023, 7, 696.
Cheng, J.; Chen, Q.; Huang, X. An algorithm for crack detection, segmentation, and fractal dimension estimation in low-light environments by fusing fft and convolutional neural network. Fractal Fract. 2023, 7, 820.
Savarese, P.H.; Mazza, L.O.; Figueiredo, D.R. Learning identity mappings with residual gates. arXiv 2016, arXiv:1611.01260.
Shen, F.; Gan, R.; Zeng, G. Weighted residuals for very deep networks. In Proceedings of the 3rd International Conference of System and Informatics (ICSAI), Shanghai, China, 19–21 November 2016.
Yao, J.; Wang, D.; Hu, H.; Xing, W.; Wang, L. ADCNN: Towards learning adaptive dilation for convolutional neural networks. Pattern Recognit. 2022, 123, 108369.
Paul, S.G.; Biswas, A.A.; Saha, A.; Zulfikar, M.S.; Ritu, N.A.; Zahan, I.; Rahman, M.; Islam, M.A. A real-time application-based convolutional neural network approach for tomato leaf disease classification. Array 2023, 19, 100313.
Paul, H.; Udayangani, H.; Umesha, K.; Lankasena, N.; Liyanage, C.; Thambugala, K. Maize leaf disease detection using convolutional neural network: A mobile application based on pre-trained VGG16 architecture. N. Z. J. Crop Hortic. Sci. 2025, 53, 367–383.
Zhang, R.; Zhu, Y.; Ge, Z.; Mu, H.; Qi, D.; Ni, H. Transfer learning for leaf small dataset using improved ResNet50 network with mixed activation functions. Forests 2022, 13, 2072.
Sutaji, D.; Yıldız, O. LEMOXINET: Lite ensemble MobileNetV2 and Xception models to predict plant disease. Ecol. Inform. 2022, 70, 101698.
Srivastava, M.; Meena, J. Plant leaf disease detection and classification using modified transfer learning models. Multimed. Tools Appl. 2024, 83, 38411–38441.
Adnan, F.; Awan, M.J.; Mahmoud, A.; Nobanee, H.; Yasin, A.; Zain, A.M. EfficientNetB3-adaptive augmented deep learning (AADL) for multi-class plant disease classification. IEEE Access 2023, 11, 85426–85440.
Bi, C.; Xu, S.; Hu, N.; Zhang, S.; Zhu, Z.; Yu, H. Identification method of corn leaf disease based on improved Mobilenetv3 model. Agronomy 2023, 13, 300.
Li, H.; Qi, M.; Du, B.; Li, Q.; Gao, H.; Yu, J.; Bi, C.; Yu, H.; Liang Ye, G.; Tang, Y. Maize disease classification system design based on improved ConvNeXt. Sustainability 2023, 15, 14858.
Ji, Z.; Bao, S.; Chen, M.; Wei, L. ICS-ResNet: A Lightweight Network for Maize Leaf Disease Classification. Agronomy 2024, 14, 1587.
Zhou, H.; Chen, J.; Niu, X.; Dai, Z.; Qin, L.; Ma, L.; Li, J.; Su, Y.; Wu, Q. Identification of leaf diseases in field crops based on improved ShuffleNetV2. Front. Plant Sci. 2024, 15, 1342123.
NVIDIA Jetson TX2 Module. Available online: https://developer.nvidia.com/embedded/jetson-tx2 (accessed on 10 May 2024).
Confusion Matrix. Available online: https://en.wikipedia.org/wiki/Confusion_matrix (accessed on 5 January 2024).
Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
Student’s T-Test. Available online: http://en.wikipedia.org/wiki/Student%27s_t-test (accessed on 16 June 2024).
Cohen, J. A power primer. Psychol. Bull. 1992, 112, 155–159.

Figure 1. Proposed method of an intelligent IoT-based agriculture system.

Figure 2. Mind-map of the proposed study.

Figure 3. Workflow overview of the proposed method. (RCA-Net, residual convolutional attention network; Conv, convolution; BN, batch normalization; HS, hard-swish activation function; RCA, residual convolutional attention; RCB, residual concatenated block; PDCB, parallel dilated convolutional block).

Figure 4. Structure of RCA-Net.

Figure 5. Structure of the RCA Block.

Figure 6. Structure of RCB.

Figure 7. Structure of PDCB.

Figure 8. Example images from the SCLD. Four disease classes of (a) rust, (b) red rot, (c) yellow, (d) mosaic, and (e) healthy.

Figure 9. Example images from the UPLD. Six disease classes of (a) virus, (b) phytophthora, (c) fungi, (d) bacteria, (e) pest, (f) nematode, and (g) healthy.

Figure 10. Training and validation accuracy and loss achieved by RCA-Net with SCLD.

Figure 11. Training and validation accuracy and loss achieved by RCA-Net with UPLD.

Figure 12. Performance comparison with different combinations of the RCA module (Unit: %).

Figure 13. Performance comparison with different combinations of RCB module (Unit: %).

Figure 14. Performance comparison with gated, weighted and simple residual connection of the RCB module (Unit: %). * [52], ** [53].

Figure 15. Performance comparison with varied number of dilated convolutional layers of PDCB (No.: Number of) (Unit: %).

Figure 16. Performance comparison with dynamic and static dilated convolutional layers of PDCB (Unit: %). * [54].

Figure 17. Performance comparison with different numbers of blocks of RCA and RCB modules (No.: Number of) (Unit: %).

Figure 18. NVIDIA Jetson TX2 embedded system.

Figure 19. Class-wise performance of RCA-Net in a confusion matrix on SCLD (unit: %).

Figure 20. Class-wise performance of RCA-Net in a confusion matrix on UPLD (unit: %).

Figure 21. Correctly classified examples from SCLD. The first row shows the input image samples, and the second row shows the corresponding Grad-CAM images. (a) Healthy class and disease classes of (b) mosaic, (c) red rot, (d) rust, and (e) yellow.

Figure 22. Correctly classified examples from UPLD. The first row shows the input image samples, and the second row shows the corresponding Grad-CAM images. (a) Healthy class and disease classes of (b) bacteria, (c) fungi, (d) nematode, (e) pest, (f) phytophthora, and (g) virus.

Figure 23. Incorrectly classified samples from the SCLD. (a) Input images: mosaic disease class → healthy class in the first row; rust disease class → red rot disease class in the second row; yellow disease class → red rot disease class in the third row. (b) Grad-CAM images of (a), and (c) examples of the predicted class.

Figure 24. Incorrectly classified samples from the UPLD. (a) Input images: fungi disease class → phytophthora disease class in the first row; pest disease class → fungi disease class in the second row; virus disease class → healthy class in the third row. (b) Grad-CAM images of (a), and (c) examples of the predicted class.

Figure 25. FD analysis of the activation maps generated by RCA-Net. In each of (a–e), from left to right, the first column shows RCA-Net’s Grad-CAM output, the second column the corresponding binarized image, and the third column the FD graph. (a) Healthy, (b) mosaic, (c) red rot, (d) rust, and (e) yellow.

Figure 26. t-test results of the proposed and second-best models.

Table 1. Comparison of our proposed and previous methods for the classification of plant leaf diseases.

Categories		Method	Dataset	Strengths	Limitations
Disease classification of images with simple background	ML-based	GLCM texture features and KNN-based classification [12]	- Arkansas plant disease dataset - Reddit-plant leaf disease dataset	- High accuracy - Texture features can be extracted effectively	KNN cannot easily adapt to various changes in a new and untrained pattern
		BoWs and SVM classifier are used along with the SURF technique for feature extraction [14]	Tomato, potato, and pepper dataset	Feature reduction makes the model robust	Both the SVM and SURF techniques increase computational complexity
		LBP feature extractor and one-class SVM [15]	Vine leaf dataset	- One class classifier reduces the complexity and cost of obtaining labeled data - It learns dynamically from newly added images and expands its recognition ability	The conflict resolution algorithm impacts the model’s interpretability and transparency
		GLCM and SVM-based detection of tomato leaves disease [13]	Self-collected tomato leaves dataset	- GLCM enhances the ability to differentiate between the classes - SVM with different classes offers robustness and flexibility	Appropriate kernel selection is time-consuming
	DL-based	CNN-based VGG-16 for MCLD [17]	- PlantVillage dataset - Self-collected dataset	Performance improved by the hyper-parameter tuning of the VGG-16 network	Additional preprocessing is required
		Customized CNN architecture for tomato leaf disease classification [18]	PlantVillage dataset	- Shallow network and less time required for training - A small number of trainable parameters	- Training for 1000 epochs increases the chance of overfitting - Only compared with three pre-trained models
		EfficientNet, VGG-16, ResNet-50 and Inception V3 [19]	PlantVillage dataset	High accuracy was achieved by fine-tuning the hyper-parameters	Computationally complex
		CAST-Net [23]	PlantVillage dataset	An increased receptive field and self-attention mechanism help to increase efficiency	Additional post-processing is required
		ResNet-50, MobileNetV2, VGG-16, etc. [21]	Arabica coffee leaves dataset	Data augmentation increases accuracy and makes the system robust	Only one type of dataset is used
Disease classification of images with complex background	ML-based	K-means algorithm and multi-class SVM classifier [26]	Pomegranate leaves dataset	- ROI extraction helps the system to extract the important features - Multi-class SVM captures the complexities of leaf patterns more easily	Multi-class SVM requires a long training time
	DL-based	MAFDE-DN4 + few-shot learning [27]	PlantVillage, FGVC8 and miniImageNet datasets	Few-shot learning along with a meta-attention mechanism achieved promising results on complex backgrounds	Requires high-quality labeled images
		MSRCR + OSCRNet [29]	- Maize dataset from the 2018 AI Challenger crop disease detection competition - Self-collected dataset	- Multi-scale Retinex with color restoration - Self-calibration convolutional residual network	Additional noise introduced by the color restoration technique
		ECA-ConvNeXt [30]	Rice leaf disease image sample dataset	The performance is enhanced by the addition of ECA in the ConvNeXt architecture	Increased computational complexity
		Ensemble model [31]	Sugarcane leaf disease dataset	- Better performance even on small datasets - Low amount of time required for the training	Computationally expensive and has a chance of overfitting
		Proposed method (RCA-Net)	- Sugarcane leaf disease dataset - Uncontrolled environment potato leaf dataset	- Robust and computationally effective for disease classification with actual field images - Higher accuracy than SOTA methods	Exhibits low accuracy for images with severely complex backgrounds

Table 2. Layer details of RCA-Net (C indicates the number of classes).

Block	Layer Type	Input	Output	Number of Filters	Stride Info
Input	Input	224 × 224 × 3	-	-	-
Conv. Layer_1	3 × 3 Conv, BN, HS	224 × 224 × 3	112 × 112 × 16	16	2
Stage 1	Bottleneck × 2	112 × 112 × 16	56 × 56 × 24	16, 24	1, 2
	RCA_1	56 × 56 × 24	56 × 56 × 24	24	1
	RCB_1	56 × 56 × 24	56 × 56 × 24	24	1
	Addition_1	56 × 56 × 24	56 × 56 × 24	24	-
Stage 2	Bottleneck × 2	56 × 56 × 24	28 × 28 × 40	24, 40	1, 2
	RCA_2	28 × 28 × 40	28 × 28 × 40	40	1
	RCB_2	28 × 28 × 40	28 × 28 × 40	40	1
	Addition_2	28 × 28 × 40	28 × 28 × 40	40	-
Stage 3	Bottleneck × 3	28 × 28 × 40	14 × 14 × 80	40, 40, 80	1, 1, 2
	RCA_3	14 × 14 × 80	14 × 14 × 80	80	1
	RCB_3	14 × 14 × 80	14 × 14 × 80	80	1
	Addition_3	14 × 14 × 80	14 × 14 × 80	80	-
Stage 4	Bottleneck × 4	14 × 14 × 80	14 × 14 × 112	80, 80, 80, 112	1
Stage 5	Bottleneck × 2	14 × 14 × 112	7 × 7 × 160	112, 160	1, 2
Stage 6	Bottleneck × 2	7 × 7 × 160	7 × 7 × 160	160	1
	PDCB	7 × 7 × 160	7 × 7 × 160	160	1
Conv. Layer_2	1 × 1 Conv, BN, HS	7 × 7 × 160	7 × 7 × 960	960	1
Pooling	Avg. pooling	7 × 7 × 960	1 × 1 × 960	-	1
FC1	1 × 1 Conv, HS	1 × 1 × 960	1 × 1 × 1280	1280	1
FC2	1 × 1 Conv	1 × 1 × 1280	1 × 1 × C	C	-
Output	SoftMax	1 × 1 × C	C	-	-
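
The spatial sizes in Table 2 follow from the strides alone, because the RCA, RCB, Addition, and PDCB rows preserve their input shape. The sketch below traces that bookkeeping under the assumption of "same" padding (output size = ceil(input / stride)). It is not an implementation of RCA-Net: collapsing each Bottleneck stack into one entry whose stride is the product of its per-layer strides is our simplification, and the default `num_classes=5` simply matches the five SCLD classes in Table 3.

```python
import math

# (block, output channels, overall stride), taken from Table 2.
# Each "Bottleneck x k" stack is collapsed into one entry whose stride is the
# product of its per-layer strides (an assumption for illustration only).
# Shape-preserving blocks (RCA, RCB, Addition, PDCB) are omitted.
BLOCKS = [
    ("Conv. Layer_1", 16, 2),
    ("Stage 1", 24, 2),
    ("Stage 2", 40, 2),
    ("Stage 3", 80, 2),
    ("Stage 4", 112, 1),
    ("Stage 5", 160, 2),
    ("Stage 6", 160, 1),
    ("Conv. Layer_2", 960, 1),
]


def trace_shapes(height=224, width=224, channels=3, num_classes=5):
    """Return (block, (H, W, C)) pairs, assuming 'same' padding everywhere."""
    shapes = [("Input", (height, width, channels))]
    for name, out_channels, stride in BLOCKS:
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        channels = out_channels
        shapes.append((name, (height, width, channels)))
    # Global average pooling collapses the spatial dimensions to 1 x 1.
    shapes.append(("Avg. pooling", (1, 1, channels)))
    # The two 1 x 1 convolutions act as fully connected layers.
    shapes.append(("FC1", (1, 1, 1280)))
    shapes.append(("FC2", (1, 1, num_classes)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:<14} {shape}")
```

Each printed row should agree with the Output column of Table 2, ending at 7 × 7 × 960 before pooling.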

Table 3. Class-wise data distribution of the SCLD and UPLD.

Dataset	Class Name		Number of Images	Total Number of Images
SCLD	Disease	Rust	514	2569
		Red Rot	519
		Yellow	505
		Mosaic	511
	Healthy		520
UPLD	Disease	Virus	532	3076
		Phytophthora	347
		Fungi	748
		Bacteria	569
		Pest	611
		Nematode	68
	Healthy		201
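
The totals in Table 3 can be checked directly from the per-class counts, and the same counts show how differently balanced the two datasets are: SCLD's largest and smallest classes differ by about 3%, whereas UPLD's largest class (Fungi, 748 images) is eleven times the size of its smallest (Nematode, 68 images). A minimal sketch of that check, with helper names of our own choosing:

```python
# Per-class image counts copied from Table 3.
SCLD = {"Rust": 514, "Red Rot": 519, "Yellow": 505, "Mosaic": 511, "Healthy": 520}
UPLD = {"Virus": 532, "Phytophthora": 347, "Fungi": 748, "Bacteria": 569,
        "Pest": 611, "Nematode": 68, "Healthy": 201}


def summarize(counts):
    """Return (total, largest class, smallest class, largest/smallest ratio)."""
    total = sum(counts.values())
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    return total, largest, smallest, counts[largest] / counts[smallest]


for name, counts in (("SCLD", SCLD), ("UPLD", UPLD)):
    total, largest, smallest, ratio = summarize(counts)
    print(f"{name}: total={total}, largest={largest}, "
          f"smallest={smallest}, imbalance ratio={ratio:.2f}")
```

A ratio near 1 indicates a nearly uniform class distribution; SCLD gives about 1.03 and UPLD gives 11.00.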

Table 4. Performance of different combinations of the proposed modules (RCA, RCB, and PDCB) added to the base network (MobileNetV3-Large) on SCLD (Unit: %).

RCA	RCB	PDCB	Accuracy	Precision	Recall	F1 Score
			84.52	87.93	84.52	86.03
✓			88.33	88.04	86.77	87.07
	✓		87.78	88.9	86.89	88.45
		✓	86.84	87.01	86.32	86.88
✓	✓		91.10	92.02	90.80	91.4
✓	✓	✓	93.81	94.09	93.68	93.87
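
Tables 4 to 6 report accuracy, precision, recall, and F1 score, but this excerpt does not state how the per-class values are averaged (the definitions are in Section 4.3.1). The sketch below computes both macro and support-weighted averages from a confusion matrix; the function name and the toy matrix are hypothetical. One property helps when reading the tables: support-weighted recall always equals accuracy, so rows in which recall matches accuracy exactly (such as the MobileNetV3-Large baseline, 84.52 and 84.52) are consistent with weighted averaging, although the excerpt does not confirm which scheme was used.

```python
def classification_metrics(cm, average="macro"):
    """Accuracy and averaged precision/recall/F1 from a confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    average is "macro" (every class weighted equally) or "weighted"
    (classes weighted by their number of true samples).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(n)]
    support = [sum(cm[i]) for i in range(n)]                        # row sums
    predicted = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # column sums

    precision = [tp[i] / predicted[i] if predicted[i] else 0.0 for i in range(n)]
    recall = [tp[i] / support[i] if support[i] else 0.0 for i in range(n)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]

    if average == "macro":
        weights = [1.0 / n] * n
    elif average == "weighted":
        weights = [s / total for s in support]
    else:
        raise ValueError("average must be 'macro' or 'weighted'")

    def mean(values):
        return sum(w * v for w, v in zip(weights, values))

    return {
        "accuracy": sum(tp) / total,
        "precision": mean(precision),
        "recall": mean(recall),
        "f1": mean(f1),
    }


# Hypothetical two-class example with unequal class sizes (20 vs 10 samples).
cm = [[18, 2],
      [3, 7]]
print(classification_metrics(cm, "macro"))
print(classification_metrics(cm, "weighted"))
```

In this toy case the macro recall is 0.80, while the weighted recall equals the accuracy of 25/30.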

Table 5. Comparison of RCA-Net with SOTA models using SCLD and UPLD (Unit: %).

Model	SCLD Accuracy	SCLD Precision	SCLD Recall	SCLD F1 Score	UPLD Accuracy	UPLD Precision	UPLD Recall	UPLD F1 Score
VGG-19 [55]	70.83	58.83	57.70	58.25	75.94	74.51	75.85	75.17
VGG-16 [56]	64.67	70.33	64.66	67.18	59.81	60.5	59.81	60.16
ResNet-50 [57]	80.64	81.21	83.60	82.38	68.17	70.06	68.17	69.10
XceptionNet [58]	79.17	84.36	78.79	81.47	64.45	63.13	64.14	63.63
MobileNetV2 [59]	81.65	85.24	80.08	82.83	76.15	71.9	76.99	74.36
EfficientNet-B3 [60]	76.91	79.74	78.01	78.85	72.35	73.78	72.35	73.08
MobileNetV3-Large [61]	84.52	87.93	84.52	86.03	72.03	73.16	72.03	72.57
ConvNeXt-Tiny [62]	85.80	89.07	85.96	87.45	59.72	63.65	60.16	61.86
DenseNet121 [23]	79.28	84.92	79.13	81.88	59.16	60.58	59.16	59.88
ResNet-101 [63]	77.12	83.57	77.17	80.78	65.21	68.92	65.26	67.04
ShuffleNetV2 [64]	88.23	89.73	87.94	88.50	64.48	66.27	64.29	65.26
ECA-ConvNeXt [30]	82.34	86.46	83.12	83.77	62.78	66.13	62.52	64.28
Ensemble Net [31]	86.53	87.00	88.00	87.50	-	-	-	-
RCA-Net (proposed)	93.81	94.09	93.68	93.87	78.14	75.39	78.01	76.91
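
Reading Table 5 as margins makes the comparison concrete: on SCLD the strongest competitor by accuracy is ShuffleNetV2 (88.23%), 5.58 percentage points below RCA-Net, while on UPLD it is MobileNetV2 (76.15%), 1.99 points below. The sketch below recomputes these margins from the accuracy columns; entries reported as "-" are skipped, and the helper name is our own.

```python
# Accuracy (%) on (SCLD, UPLD), copied from Table 5; None marks "-" entries.
ACCURACY = {
    "VGG-19": (70.83, 75.94),
    "VGG-16": (64.67, 59.81),
    "ResNet-50": (80.64, 68.17),
    "XceptionNet": (79.17, 64.45),
    "MobileNetV2": (81.65, 76.15),
    "EfficientNet-B3": (76.91, 72.35),
    "MobileNetV3-Large": (84.52, 72.03),
    "ConvNeXt-Tiny": (85.80, 59.72),
    "DenseNet121": (79.28, 59.16),
    "ResNet-101": (77.12, 65.21),
    "ShuffleNetV2": (88.23, 64.48),
    "ECA-ConvNeXt": (82.34, 62.78),
    "Ensemble Net": (86.53, None),
}
RCA_NET = (93.81, 78.14)
DATASETS = ("SCLD", "UPLD")


def margin_over_best(column):
    """Return the best competing model and RCA-Net's lead over it (points)."""
    scores = {m: acc[column] for m, acc in ACCURACY.items() if acc[column] is not None}
    best = max(scores, key=scores.get)
    return best, RCA_NET[column] - scores[best]


for column, dataset in enumerate(DATASETS):
    best, margin = margin_over_best(column)
    print(f"{dataset}: best competitor {best}, margin {margin:.2f} points")
```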

Table 6. Comparison of RCA-Net with SOTA models using Augmented UPLD (Unit: %).

Model	Accuracy	Precision	Recall	F1 Score
VGG-19 [55]	73.97	74.71	73.80	73.24
VGG-16 [56]	56.27	56.97	56.27	56.57
ResNet-50 [57]	66.24	66.59	66.29	66.49
XceptionNet [58]	62.85	61.66	62.35	61.93
MobileNetV2 [59]	74.03	72.88	74.79	73.82
EfficientNet-B3 [60]	72.35	73.78	72.35	72.99
MobileNetV3-Large [61]	70.42	70.92	70.42	70.79
ConvNeXt-Tiny [62]	65.89	68.72	65.01	66.84
DenseNet121 [23]	58.52	58.52	58.52	58.52
ResNet-101 [63]	68.42	70.10	67.68	69.84
ShuffleNetV2 [64]	69.12	68.67	68.03	68.35
ECA-ConvNeXt [30]	60.63	62.97	60.02	61.75
RCA-Net (Proposed)	74.59	67.72	73.14	70.32
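
Ranking Table 6 column by column gives a more detailed picture than accuracy alone: on augmented UPLD, RCA-Net has the highest accuracy, whereas VGG-19 has the highest precision and MobileNetV2 the highest recall and F1 score. The sketch below finds the leader for each metric from the table values; the helper name is our own.

```python
# (Accuracy, Precision, Recall, F1 score) in %, copied from Table 6.
METRICS = ("Accuracy", "Precision", "Recall", "F1 Score")
AUGMENTED_UPLD = {
    "VGG-19": (73.97, 74.71, 73.80, 73.24),
    "VGG-16": (56.27, 56.97, 56.27, 56.57),
    "ResNet-50": (66.24, 66.59, 66.29, 66.49),
    "XceptionNet": (62.85, 61.66, 62.35, 61.93),
    "MobileNetV2": (74.03, 72.88, 74.79, 73.82),
    "EfficientNet-B3": (72.35, 73.78, 72.35, 72.99),
    "MobileNetV3-Large": (70.42, 70.92, 70.42, 70.79),
    "ConvNeXt-Tiny": (65.89, 68.72, 65.01, 66.84),
    "DenseNet121": (58.52, 58.52, 58.52, 58.52),
    "ResNet-101": (68.42, 70.10, 67.68, 69.84),
    "ShuffleNetV2": (69.12, 68.67, 68.03, 68.35),
    "ECA-ConvNeXt": (60.63, 62.97, 60.02, 61.75),
    "RCA-Net": (74.59, 67.72, 73.14, 70.32),
}


def leaders(table):
    """Map each metric name to the model with the highest value for it."""
    return {
        metric: max(table, key=lambda model: table[model][i])
        for i, metric in enumerate(METRICS)
    }


for metric, model in leaders(AUGMENTED_UPLD).items():
    value = AUGMENTED_UPLD[model][METRICS.index(metric)]
    print(f"{metric}: {model} ({value:.2f})")
```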

Table 7. Comparison of the number of parameters, FLOPs, memory usage, and inference time (desktop and Jetson TX2) of the proposed and SOTA models.

Model	Param (M)	FLOPs (G)	Memory Usage (MB)	Inference Time on Desktop (ms)	Inference Time on Jetson TX2 (ms)
VGG-19 [55]	139.59	19.63	532.50	11.20	112.68
VGG-16 [56]	134.28	15.46	512.24	10.46	80.38
ResNet-50 [57]	23.52	4.13	89.72	8.02	26.62
XceptionNet [58]	22.85	8.34	88	10.14	44.28
MobileNetV2 [59]	2.23	0.32	8.51	7.11	11.26
EfficientNet-B3 [60]	10.70	1.01	40.83	8.21	41.1
MobileNetV3-Large [61]	4.21	0.23	16.05	7.15	12.63
ConvNeXt-Tiny [62]	28.57	4.46	109.03	10.27	53.44
DenseNet121 [23]	6.95	2.89	30.8	8.64	31.71
ResNet-101 [63]	42.51	7.86	162.16	10.20	43.17
ShuffleNetV2 [64]	1.25	0.15	4.80	7.67	9.61
ECA-ConvNeXt [30]	87.51	15.35	333.99	12.78	135.96
RCA-Net (proposed)	5.91	0.41	22.55	7.26	12.70
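
Two columns of Table 7 can be related to other figures by simple arithmetic. The frame rate quoted in the abstract (78.7 frames per second) is the reciprocal of RCA-Net's 12.70 ms Jetson TX2 inference time. The memory column also appears to track the size of the weights stored as 32-bit floats, in units of 2^20 bytes: 5.91 M parameters give about 22.5 MiB and 139.59 M give about 532.5 MiB, close to the listed 22.55 and 532.50 MB. This is our reading rather than a stated definition, and not every row fits it (DenseNet121's 6.95 M parameters would give about 26.5 MiB, not 30.8 MB). The helper names below are hypothetical.

```python
BYTES_PER_FLOAT32 = 4
BYTES_PER_MIB = 2 ** 20


def fps_from_latency(latency_ms):
    """Frames per second for a per-image inference time in milliseconds."""
    return 1000.0 / latency_ms


def float32_weight_mib(params_millions):
    """Storage for the weights alone, assuming 4-byte float32 parameters."""
    return params_millions * 1e6 * BYTES_PER_FLOAT32 / BYTES_PER_MIB


# RCA-Net row of Table 7: 5.91 M parameters, 12.70 ms on Jetson TX2.
print(f"{fps_from_latency(12.70):.1f} fps")    # -> 78.7 fps
print(f"{float32_weight_mib(5.91):.1f} MiB")   # -> 22.5 MiB
```

The small residual differences are expected, since the parameter counts in the table are rounded to two decimals.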

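Table 8. FD, coefficient of determination (R²), and correlation coefficient (C) values for the healthy and diseased classes of SCLD.

Metric	Healthy	Mosaic	Red Rot	Rust	Yellow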
FD	1.7465	1.5591	1.3891	1.4746	1.4707
R²	0.998539	0.99565	0.99162	0.99303	0.99590
C	0.999269	0.99783	0.99580	0.99651	0.99795
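
The FD estimation procedure is described in Section 5.3, which is not part of this excerpt. As an illustration only, the sketch below assumes the widely used box-counting estimator applied to a binarized map (for example, a thresholded Grad-CAM activation map): it counts the boxes of side s that contain foreground pixels, fits log N(s) against log(1/s) by least squares, and reports the slope as FD together with R² and C. Table 8 is consistent with C being the correlation coefficient of such a fit, since each C equals the square root of the corresponding R² to the printed precision (for example, √0.998539 ≈ 0.999269). The box sizes, the binarization, and the function names are assumptions.

```python
import math


def box_count(mask, size):
    """Count size x size boxes that contain at least one foreground pixel."""
    rows, cols = len(mask), len(mask[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(mask[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count


def box_counting_fd(mask, sizes=(1, 2, 4, 8, 16)):
    """Return (FD, R², C) from a least-squares fit of log N(s) on log(1/s).

    mask is a 2D list of 0/1 values with at least one foreground pixel.
    """
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(mask, s)) for s in sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    fd = sxy / sxx                                  # slope of the fitted line
    r2 = sxy ** 2 / (sxx * syy) if syy else 1.0     # constant counts fit exactly
    return fd, r2, math.sqrt(r2)                    # C = |Pearson correlation|


# Sanity checks on synthetic 32 x 32 masks: a filled square and a single line.
square = [[1] * 32 for _ in range(32)]
line = [[1] * 32] + [[0] * 32 for _ in range(31)]
print(box_counting_fd(square))   # FD close to 2
print(box_counting_fd(line))     # FD close to 1
```

A filled region yields an FD near 2 and a thin line an FD near 1; under this estimator, irregular activation patterns fall between the two, which is the range covered by the values in Table 8.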

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tariq, M.H.; Sultan, H.; Akram, R.; Kim, S.G.; Kim, J.S.; Usman, M.; Gondal, H.A.H.; Seo, J.; Lee, Y.H.; Park, K.R. Estimation of Fractal Dimensions and Classification of Plant Disease with Complex Backgrounds. Fractal Fract. 2025, 9, 315. https://doi.org/10.3390/fractalfract9050315

