Article

WT-ResNet: A Non-Destructive Method for Determining the Nitrogen, Phosphorus, and Potassium Content of Sugarcane Leaves Based on Leaf Image

1 School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
2 State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning 530004, China
3 School of Agriculture, Guangxi University, Nanning 530004, China
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(16), 1752; https://doi.org/10.3390/agriculture15161752
Submission received: 18 July 2025 / Revised: 12 August 2025 / Accepted: 13 August 2025 / Published: 15 August 2025

Abstract

Traditional nutritional diagnosis is inefficient, costly, and destructive when used to determine the nitrogen, phosphorus, and potassium content of sugarcane leaves. Non-destructive nutritional diagnosis of sugarcane leaves based on traditional machine learning and deep learning suffers from poor generalization and limited accuracy. To address these issues, this study proposes a novel convolutional neural network called WT-ResNet. This model incorporates the wavelet transform into the residual network structure, enabling effective feature extraction from sugarcane leaf images and facilitating regression prediction of nitrogen, phosphorus, and potassium content in the leaves. By employing a cascade of decomposition and reconstruction, the wavelet transform extracts multi-scale features, which allows the capture of different frequency components in images. Through the use of shortcut connections, residual structures facilitate the learning of identity mappings within the model. The results show that by analyzing sugarcane leaf images, our model achieves R2 values of 0.9420 for nitrogen content prediction, 0.9084 for phosphorus content prediction, and 0.8235 for potassium content prediction. The accuracy rate for nitrogen prediction reaches 88.24% within a 0.5 tolerance, 58.82% for phosphorus prediction within a 0.1 tolerance, and 70.59% for potassium prediction within a 0.5 tolerance. Compared to other algorithms, WT-ResNet demonstrates higher accuracy. This study aims to provide algorithms for non-destructive sugarcane nutritional diagnosis and technical support for precise sugarcane fertilization.

1. Introduction

As a globally important sugar crop, sugarcane is mainly used for producing sugar and ethanol [1]. Regarding its nutritional needs, the growth of sugarcane is primarily influenced by the balanced supply of nitrogen, phosphorus, and potassium in the soil [2]. To prevent hindered growth in sugarcane, farmers often apply excessive fertilizer during the initial planting stage and critical growth periods. While over-fertilization ensures the growth of sugarcane, it also leads to soil non-point source pollution and resource waste [3]. To conserve resources and protect the environment, the nutrient diagnosis methodology is commonly applied to guide sugarcane fertilization [4].
Traditional crop nutrient diagnosis primarily consists of morphological diagnosis, rapid tissue testing, and soil analysis [5]. Specifically, morphological diagnosis involves observing changes in the phenotypes of crop parts that are particularly sensitive to specific nutrient deficiencies, thereby evaluating the crop's nutritional status. However, morphological diagnosis suffers from a high misdiagnosis rate and poor timeliness. The rapid tissue test method uses fresh crop tissues that are sensitive to nutrient levels for chemical nutrient analysis, but this method is destructive to crops and inefficient. Soil analysis directly assesses the availability of soil nutrients to guide fertilization, but is costly. Furthermore, more advanced spectroscopic analysis methods are also available [6]. Nutrient deficiency in crops leads to changes in key indicators such as chlorophyll and water content in the leaves, affecting their spectral reflectance characteristics [7]. Spectroscopic analysis establishes a relationship between crop nutritional status and multi-spectral data to evaluate the crop's nutritional level [8]. However, this technique is costly and has limited generalizability. Therefore, crop nutrition diagnosis is in urgent need of a non-destructive, efficient, and cost-effective diagnostic technique.
Advances in artificial intelligence technology have led to the wide application of computer vision in agricultural monitoring and management [9]. The color, texture, and shape of crop leaves vary in response to environmental factors such as nutrient absorption, reflecting the regulation of growth and development by internal signaling mechanisms [10], and these visual traits are therefore associated with crop nutrient uptake [11]. Previous studies mainly employed manually selected features from crop leaves to develop models and make predictions. For instance, Li Y et al. used the green and red channels extracted from wheat leaf images to predict wheat nitrogen content, achieving an R2 value of 0.71 [12]; Haider et al. used the green index derived from spinach leaf images to estimate the nitrogen content in spinach leaves, achieving an R2 value of 0.9198 [13]; Sulistyo et al. used deep sparse extreme learning machines and genetic algorithms to predict the nutritional parameters of wheat leaves, achieving an R2 value of 0.83 [14]; and Hui Y et al. utilized Principal Component Analysis (PCA) dimensionality reduction to integrate leaf color and texture information for regression prediction of nitrogen content in sugarcane leaves, achieving an R2 value of 0.9264 [15]. Nevertheless, conventional machine learning techniques necessitate manual feature engineering, making efficient and robust feature extraction difficult.
With the advent of convolutional neural networks, crop leaf feature extraction has entered a new era. Lei S et al. employed a convolutional neural network (CNN) model to predict the nitrogen content in corn leaves, achieving an average prediction accuracy of 95% [16]. However, they did not specify the criteria used to assess this accuracy. Janani et al. extracted color features from peanut leaves and fed them into a CNN-based model for prediction [17]. The model classified nitrogen levels into deficient, adequate, and excessive categories, achieving a classification accuracy of 92%. Lu et al. used complex pre-trained models for transfer learning to enable nitrogen prediction in sugarcane, achieving an R2 value of 0.9349 [18]. However, current research primarily focuses on predicting the content of a single element in crops. In contrast, research on predicting multiple common elements remains limited.
The introduction of drones and multispectral imaging has ushered agricultural machine vision into the low-altitude era [19]. Studies utilizing high-resolution RGB images have successfully estimated plant height, biomass, density, and yield [20]. Zheng et al. employed multiple linear regression equations to validate the effectiveness of a model based on drone imagery [21]. This model integrated texture and color features for estimating rice nitrogen content, achieving an R2 value of 0.84. Zhang Y et al. compared stepwise regression and random forest regression methods in predicting the leaf area index of kiwifruit orchards [22]. The random forest regression method achieved an R2 value of 0.972. Zhang X et al. utilized drone multispectral imagery integrated with texture features to develop a novel approach termed the “Spectral-Texture Fusion Index” (STFI) for predicting rice leaf nitrogen content, achieving an R2 value of 0.8740 [23]. Zhang S et al. integrated machine learning with multi-source remote sensing data, thereby enhancing the estimation accuracy of the winter wheat nitrogen nutrition index, achieving an R2 value of 0.89 [24]. Yang et al. developed a framework for classifying rice nitrogen fertilizer levels by integrating drone multispectral imagery and machine learning techniques. This framework achieved a 90.0% accuracy in distinguishing between three nitrogen content categories: deficient, adequate, and sufficient [25]. Current research predominantly focuses on single-period nutrient regression prediction, whereas fewer studies conduct comprehensive analyses spanning the entire growth cycle. Furthermore, many studies have developed prediction models based on univariate image features; models that integrate multiple feature types remain relatively scarce.
This study introduces a novel convolutional neural network named WT-ResNet. WT-ResNet employs the residual structure of ResNet and utilizes the wavelet transform from image processing. Compared to other studies, this research offers the following innovations and improvements:
(1)
This study pioneers the use of deep learning models to establish a correlation between sugarcane leaf images and the phosphorus and potassium levels within the leaves.
(2)
The CNN model design integrates wavelet transform, a technique from image processing, into the residual network architecture, enabling multi-scale feature extraction.
(3)
The nitrogen, phosphorus, and potassium prediction model based on sugarcane leaf images achieves the best performance among comparable prediction models. Additionally, the introduction of the tolerance concept facilitates a more objective evaluation of the model.
This study aims to provide algorithms for non-destructive sugarcane nutritional diagnosis and technical support for precise sugarcane fertilization. Moreover, it offers novel insights and algorithmic approaches for the design of related models.

2. Materials and Methods

2.1. Field Experiment

Field trials were conducted in the Agricultural High-Tech Industry Demonstration Zone of Quli Town, Fusui County, Chongzuo City, Guangxi Zhuang Autonomous Region (107.8° E, 22.5° N), an area characterized by a typical subtropical monsoon climate. The experimental sugarcane variety used was ROC22. The nutrient levels in the soil of the sugarcane field are presented in Table 1.
Samples were collected in March 2024, when the sugarcane plants were transitioning from the seedling stage to the tillering stage. The experiment targeted sugarcane leaves free from pest and disease infestation. Leaf sampling was performed from 8:00 a.m. to 10:00 a.m. Within the designated sampling area, the S-shaped multi-point method was applied to select sugarcane plants, with one leaf collected from each plant. Consistency in sugarcane density, growth vigor, and growth stage was maintained throughout the sampling period. Leaves showing signs of pest or disease infestation were excluded from the collection. The third or fourth leaf down from the top of the stalk was harvested. The upper and lower ends of each elongated leaf were removed, retaining a middle section of about 20 cm. After collection, the leaves were rinsed with clean water and dried, then stored in opaque bags to prevent nutrient loss. The fresh samples were first placed in an oven at 80–90 °C for 15 to 30 min to deactivate the enzymes. The temperature was then reduced to 65 °C, and the samples were dried for 48 h to completely eliminate moisture. Lastly, the samples were pulverized and passed through a sieve.
Soil sampling was performed concurrently with leaf sampling. Within the designated sampling area, the X-shaped multi-point method was used to select sampling locations. Soil sampling used a single-point approach, with a sampling depth of about 15 cm. To ensure an adequate quantity of material, each individual sample weighed at least 500 g. After collection, the samples were air-dried, then mechanically ground and sieved through a 2 mm mesh for the measurement of the relevant parameters listed in Table 1. The parameters measured and the corresponding methods used in the experiment are detailed in Table 2.

2.2. Field Experiment Result

Following leaf collection, images of the leaves were captured using a smartphone. Given that sugarcane leaves rapidly dehydrate and wilt once detached from the stalk, images were captured promptly to prevent phenotypic alterations. Background removal, performed using U-Net [26], eliminated irrelevant information from the background, and cropping maximized the coverage of the effective region within each image while conforming to the model's input requirements [27]. Specifically, a pre-trained U-Net model was first employed to segment the sugarcane leaves from the background of the collected images. Each segmented leaf was then processed to generate three square sub-images, the side length of which was defined as the maximum leaf width plus a 1 cm margin on both sides. To preserve the spatial distribution of nutrients, one square was cropped from the center of the leaf's upper, middle, and lower sections. Finally, all resulting square images were uniformly resized to 224 × 224 pixels as model inputs. This standardization and data augmentation ensured the robustness of the data collection process. The preprocessing steps are illustrated in Figure 1.
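The cropping-and-resizing step described above can be sketched as follows. The function names are illustrative stand-ins, the binary leaf mask is assumed to come from the U-Net segmentation, and the nearest-neighbour resize is a simple placeholder for a library resize:

```python
import numpy as np

def crop_three_squares(image, mask, margin_px):
    """Crop squares centered on the upper, middle, and lower thirds of the leaf.

    `image` is an H x W x 3 array and `mask` a boolean H x W leaf mask from
    the segmentation step. The square side is the maximum leaf width plus a
    margin on both sides, mirroring the protocol described in the text.
    """
    rows = np.flatnonzero(mask.any(axis=1))      # rows containing leaf pixels
    widths = mask.sum(axis=1)                    # leaf width per row
    side = int(widths.max()) + 2 * margin_px     # max width + margin each side
    crops = []
    for frac in (1 / 6, 1 / 2, 5 / 6):           # centers of the three sections
        r = rows[int(frac * (len(rows) - 1))]
        cols = np.flatnonzero(mask[r])
        c = int(cols.mean()) if cols.size else mask.shape[1] // 2
        top = int(np.clip(r - side // 2, 0, image.shape[0] - side))
        left = int(np.clip(c - side // 2, 0, image.shape[1] - side))
        crops.append(image[top:top + side, left:left + side])
    return crops

def resize_nearest(img, size=224):
    """Nearest-neighbour resize to size x size (stand-in for a library resize)."""
    h, w = img.shape[:2]
    ri = np.arange(size) * h // size
    ci = np.arange(size) * w // size
    return img[ri][:, ci]
```

The margin here is expressed in pixels; converting the paper's 1 cm margin to pixels would require the camera's spatial resolution.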
The experiment yielded 170 valid sugarcane leaf images, from which 510 images were obtained after preprocessing; each group of 3 images derived from the same leaf shared the same nutrient content labels. The images were split into training, validation, and test sets in a 4:1:1 ratio. Figure 2 illustrates that the label distributions across the training, validation, and test sets are nearly identical, and each approximates a normal distribution. This resemblance confirms that the partitioning of the dataset was effective, yielding representative subsets without introducing systematic bias.
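One way to realize this split is sketched below; `split_by_leaf` is a hypothetical helper, and an assumption made here is that the 4:1:1 proportions are applied at the leaf level (with integer rounding) so that the three crops of any one leaf never straddle subsets:

```python
import random

def split_by_leaf(num_leaves=170, crops_per_leaf=3, ratio=(4, 1, 1), seed=42):
    """Split crops into train/val/test so all crops of a leaf share a subset.

    Returns three lists of (leaf_id, crop_id) pairs, approximately 4:1:1 by leaf.
    """
    leaf_ids = list(range(num_leaves))
    random.Random(seed).shuffle(leaf_ids)
    total = sum(ratio)
    n_train = num_leaves * ratio[0] // total
    n_val = num_leaves * ratio[1] // total
    groups = (leaf_ids[:n_train],
              leaf_ids[n_train:n_train + n_val],
              leaf_ids[n_train + n_val:])
    return [[(leaf, c) for leaf in g for c in range(crops_per_leaf)]
            for g in groups]
```

Splitting by leaf rather than by image avoids leakage: crops of the same leaf carry identical labels, so placing them in different subsets would inflate validation scores.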

2.3. WT-ResNet

The hardware and software platforms of the deep learning experiment platform are shown in Table 3.
As shown in Figure 3, we designed a convolutional neural network named WT-ResNet. WT-ResNet builds upon the residual and hierarchical structures of ResNet. It incorporates wavelet convolution kernels to substitute the traditional convolutional kernels in both the initial layer and the residual blocks. ResNet, a classic convolutional neural network, has strong feature extraction capabilities [28]. Its residual structure and modular design allow for the flexible construction of deep neural networks. Wavelet convolution kernels are a novel type of convolution kernel based on wavelet transform functions in signal processing [29]. Wavelet convolution kernels offer superior feature extraction capabilities and a reduced number of parameters compared to traditional convolutional kernels. Combining these two elements results in WT-ResNet having a simple yet highly effective structure. The original image, in RGB format with a size of 224 × 224 × 3, is first processed through two downsampling steps, which reduce the spatial dimensions of the feature maps while increasing their depth. Subsequently, the data enters a layer composed of four WT residual blocks, which facilitate effective information flow and learning. Next, the features pass through a global average pooling layer (GAP) and a flattening layer, where their dimensionality is reduced and they are reshaped into a one-dimensional vector. Finally, this vector is fed into a linear layer (FC) to produce the predicted values. This study focuses on predicting the nitrogen, phosphorus, and potassium content of sugarcane leaves. A WT-ResNet model was developed and trained to establish a functional mapping between sugarcane leaf images and their corresponding nutrient content.

2.3.1. Residual Structure

In ResNet, the residual structure incorporates shortcut connections, which facilitate the learning of identity mappings by the network [28]. This effectively mitigates the gradient vanishing problem encountered in deep networks, thereby enabling the training of deeper networks with enhanced performance. The fundamental concept behind the residual structure is to add the input x directly to the output that results from processing it through a residual block (a sequence of layers). As illustrated in Figure 4, within a residual network, the model learns a residual mapping F(x) rather than directly learning H(x), establishing the relationship H(x) = F(x) + x.
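The relationship H(x) = F(x) + x can be made concrete with a minimal NumPy sketch (a dense two-layer branch stands in for the convolutional branch; all names are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """H(x) = F(x) + x with a two-layer residual branch F.

    With zero weights F(x) = 0, so the block reduces to the identity
    mapping, which is exactly what the shortcut connection makes easy
    for the network to learn.
    """
    f = relu(x @ w1) @ w2        # residual branch F(x)
    return relu(f + x)           # shortcut: add the input back

x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
# Zero-weight branch: the block passes the (non-negative) input through unchanged.
print(residual_block(x, w_zero, w_zero))   # [1. 2. 3.]
```

Because the identity is the "easy" solution, adding layers cannot make the representable function class worse, which is why residual networks train reliably at greater depth.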

2.3.2. Wavelet Convolution

Wavelet convolution employs wavelet transform functions to capture multi-scale features across various frequencies within images, thereby addressing the limitations of conventional convolution which often fails to adequately capture mid- and low-frequency details [29]. The transformation and convolution process proceeds as follows. Initially, wavelet transform (WT) is applied to filter and down-scale the low-frequency and high-frequency components of the input. Subsequently, convolution is performed on these resulting frequency maps, and then inverse wavelet transform (IWT) is utilized to reconstruct the output. Specifically, this process is detailed in Formula (1).
Y = IWT(Conv(W, WT(X)))    (1)
where X is the input tensor, Y is the output tensor, and W is the weight tensor of the convolutional kernel, whose input channel count is four times that of X.
This operation not only performs convolution separately within different frequency components, but also enables convolution over a larger region of the original input using a smaller kernel, thus enlarging the receptive field. For efficient computation, 2D Haar Wavelet Transform (WT) is employed as the transformation here [30,31]. The process is carried out as follows. First, wavelet transform (WT) and convolution (Conv) operations are performed, as given in Formulas (2) and (3)
X_LL^(i), X_H^(i) = WT(X_LL^(i−1))    (2)
Y_LL^(i), Y_H^(i) = Conv(W_i, (X_LL^(i), X_H^(i)))    (3)
where X_LL^(0) is the input tensor; X_LL^(i) denotes the low-frequency map obtained by decomposing X_LL^(i−1); X_H^(i) denotes the three high-frequency maps obtained from the same decomposition; Y_LL^(i) and Y_H^(i) are the output tensors corresponding to X_LL^(i) and X_H^(i), respectively; and W_i is the weight tensor of the convolutional kernel at level i.
Then, a cascaded inverse transform (IWT) is performed, followed by summation, as shown in Formula (4).
Z^(i) = IWT(Y_LL^(i) + Z^(i+1), Y_H^(i))    (4)
where Z^(0) denotes the final output tensor, and Z^(i) represents the output tensor from the i-th level. The other notations follow the definitions given in the previous equations.
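The 2D Haar analysis and synthesis steps underlying Formulas (2)–(4) reduce to 2 × 2 averaging and differencing, which a minimal NumPy sketch makes concrete (function names are illustrative; in the model these transforms are applied channel-wise inside the convolution):

```python
import numpy as np

def haar_wt2d(x):
    """One level of the 2D Haar wavelet transform.

    Splits an even-sized 2D array into a low-frequency map LL and three
    high-frequency detail maps (LH, HL, HH), each at half resolution.
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, (lh, hl, hh)

def haar_iwt2d(ll, highs):
    """Inverse transform: reconstruct the original array from LL and details."""
    lh, hl, hh = highs
    a = (ll + lh + hl + hh) / 2.0
    b = (ll - lh + hl - hh) / 2.0
    c = (ll + lh - hl - hh) / 2.0
    d = (ll - lh - hl + hh) / 2.0
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2], out[0::2, 1::2] = a, b
    out[1::2, 0::2], out[1::2, 1::2] = c, d
    return out
```

Because the four subbands together keep all of the input's information, convolving them and applying the inverse transform loses nothing while letting a small kernel see a larger region of the original input, which is the receptive-field enlargement described above.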

2.3.3. WT Residual Block

As illustrated in the comparative diagram of Figure 5, the residual block in ResNet18 employs two 3 × 3 convolutional kernels stacked sequentially for feature extraction. In contrast, the residual block in WT-ResNet requires only a single 5 × 5 wavelet convolutional kernel to deliver superior feature extraction capability. In comparison to ResNet18, which features an 18-layer architecture, WT-ResNet consists of only 10 layers, resulting in a more streamlined structure. Notably, the feature extraction power of stacking two 3 × 3 traditional convolutional kernels is comparable to that of a single 5 × 5 traditional convolutional kernel, while also requiring fewer parameters [32]. In constructing WT-ResNet, a single 5 × 5 wavelet convolutional kernel was selected to substitute the two originally stacked 3 × 3 traditional convolutional kernels. This decision stems from two key considerations: Firstly, the ablation study results in Section 3.4 indicate that the single 5 × 5 wavelet convolutional kernel design achieves superior performance. Secondly, the wavelet convolutional kernel design incorporates depthwise separable convolution techniques, which significantly cuts down the parameter count [33]. Ultimately, these improvements reduce the number of parameters in WT-ResNet to one-tenth of those in the baseline model.
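The parameter comparison between two stacked 3 × 3 kernels and a single 5 × 5 kernel can be checked with simple arithmetic; the channel count of 64 is an arbitrary example, and bias terms are omitted:

```python
def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

c = 64  # example channel count
two_3x3 = 2 * conv_params(3, c, c)   # two stacked 3x3 layers: 2 * 9 * C^2
one_5x5 = conv_params(5, c, c)       # a single 5x5 layer: 25 * C^2
print(two_3x3, one_5x5)              # 73728 102400
```

Both options cover the same 5 × 5 receptive field, but the stacked pair needs 18C² weights versus 25C² for the single kernel, which is the efficiency argument cited above; the wavelet kernel's depthwise separable design then recovers and exceeds that saving.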

2.4. Evaluation Metrics

The evaluation metrics used for the model in this study are presented in Table 4. Because traditional statistical indicators are standardized, it is difficult to grasp the model's actual performance from them alone; the concept of tolerance accuracy is therefore introduced. The tolerance is set to 0.5 for both nitrogen and potassium, and to 0.1 for phosphorus, because the average phosphorus content is about one-fifth of the average nitrogen and potassium contents.
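Tolerance accuracy as defined here is simply the fraction of predictions falling within the tolerance band around the true value; a minimal sketch with hypothetical values:

```python
import numpy as np

def tolerance_accuracy(y_true, y_pred, tol):
    """Fraction of predictions within +/- tol of the true value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) <= tol))

# Hypothetical nitrogen values: 3 of 4 predictions fall within the 0.5 tolerance.
print(tolerance_accuracy([2.0, 1.8, 2.5, 3.0], [2.3, 1.2, 2.6, 3.4], tol=0.5))  # 0.75
```

Unlike R2, this metric is expressed in the units of the measured nutrient, so a practitioner can read it directly as "how often the prediction is close enough to act on."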

3. Results

3.1. WT-ResNet Training Results

During model training, the Mean Squared Error (MSE) is used as the loss function, and the AdamW optimizer is employed. The initial learning rate is set at 0.0001. Following a 5-epoch warm-up, the model is trained for 100 epochs using cosine annealing with a single cycle. The best epoch is chosen from the final 10 stable epochs and then tested on the test set. The model predicts the nitrogen (N), phosphorus (P), and potassium (K) content in sugarcane leaves. The training process is illustrated in Figure 6.
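The warm-up plus single-cycle cosine schedule can be sketched as a function of the epoch index; `lr_at` is an illustrative helper, and the linear shape of the warm-up is an assumption not stated in the text:

```python
import math

def lr_at(epoch, base_lr=1e-4, warmup=5, total=100):
    """Linear warm-up for `warmup` epochs, then single-cycle cosine annealing.

    Epochs are 0-indexed; the learning rate ramps up to base_lr during
    warm-up and then decays along half a cosine toward zero at `total`.
    """
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    t = (epoch - warmup) / (total - warmup)      # progress through the cosine cycle
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * t))
```

In a PyTorch pipeline the same shape is typically obtained by chaining a warm-up scheduler with `CosineAnnealingLR` on top of the AdamW optimizer.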
WT-ResNet achieves rapid convergence within the first 20 epochs. The loss for phosphorus (P) starts lower because its target values are smaller. Some fluctuations appear later because of the relatively high learning rate, which can help the model escape local optima; as the learning rate decreases, the fluctuations shrink. Overall, there is a negative correlation between the loss and R2. Over the final 10 epochs, the highest R2 values achieved on the validation set for predicting sugarcane leaf nitrogen, phosphorus, and potassium content are 0.9428, 0.9384, and 0.8921, respectively. On the test set, the model achieves R2 values of 0.9420 for nitrogen prediction, 0.9084 for phosphorus prediction, and 0.8235 for potassium prediction. The accuracy rate within a 0.5 tolerance for nitrogen prediction is 88.24%, within a 0.1 tolerance for phosphorus prediction is 58.82%, and within a 0.5 tolerance for potassium prediction is 70.59%.

3.2. Grad-CAM

To better elucidate the model inference process, Gradient-weighted Class Activation Mapping (Grad-CAM) is employed to provide visualizations of the reasoning steps [34]. Grad-CAM is a visualization technique designed to interpret the decision-making process of deep learning models, particularly convolutional neural networks. To gain an intuitive understanding of WT-ResNet feature perception capability, the last convolutional layer of Layer 4 was selected, and Grad-CAM was used to generate heatmaps. As illustrated in Figure 7, the red regions indicate areas with significant contribution, whereas the blue regions indicate areas with lesser contribution. The model effectively concentrates on the leaf’s inherent color and texture, exhibiting lower focus on the non-informative regions at the edges. Consequently, the model is capable of performing effective feature extraction from sugarcane leaf images. For different tasks, models have varying focuses, but they consistently concentrate on the main vein of the leaf and its nearby areas. This might be due to the fact that, under abiotic stress, nitrogen, phosphorus, and potassium can function as signals influencing crop leaf growth and development [35,36]. Alternatively, it could be because crops exhibit patterns of nutrient uptake and allocation, leading to certain correlations among these nutrients [37,38]. A more direct observation is that when crops lack essential nutrients like nitrogen, phosphorus, and potassium, they exhibit corresponding deficiency symptoms [39,40].
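The Grad-CAM computation reduces to weighting each feature map by its spatially averaged gradient; a NumPy sketch under that formulation (a real implementation hooks the activations and gradients of the trained network's chosen layer, here assumed given):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one conv layer's activations and gradients.

    `activations`: K x H x W feature maps; `gradients`: gradients of the
    regression output with respect to those maps (same shape). Channel
    weights alpha_k are the spatially averaged gradients; the heatmap is
    the ReLU of the weighted sum, normalized to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))               # alpha_k: GAP of gradients
    cam = np.tensordot(weights, activations, axes=1)    # sum_k alpha_k * A_k
    cam = np.maximum(cam, 0.0)                          # keep positive contributions
    return cam / cam.max() if cam.max() > 0 else cam
```

Upsampling this H × W map to the 224 × 224 input size and overlaying it on the leaf image yields the red/blue heatmaps shown in Figure 7.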

3.3. Backbone Comparison Experiments

To validate the accuracy of WT-ResNet in predicting sugarcane nitrogen, phosphorus, and potassium content, a comparative experiment was performed using common backbone networks. The results are illustrated in Figure 8, sorted in descending order of R2 value; "avg" denotes the performance achieved when using the average value as the prediction. WT-ResNet achieved the best performance in all cases. It is noteworthy that generalization performance varies among models: ResNeXt50, for instance, achieved high R2 values for nitrogen and phosphorus prediction, yet performed suboptimally for potassium. Although ResNet18 is not the best-performing model among them, its performance consistently remains at a high level, so we selected it as the base architecture for further improvement. The metrics of the lower-performing models are similar to those of the average-value predictor, indicating that these models did not effectively extract features or produce accurate predictions. Notably, a high R2 value alone does not necessarily indicate practical applicability; the introduction of tolerance accuracy addresses this incomplete assessment. The R2 value for phosphorus is high, but the actual prediction accuracy is low, because the narrow range of phosphorus levels allows the model to achieve a high R2 value by merely fitting the general trend. While MAE and MSE also provide some insight into prediction performance, such an evaluation requires referencing several metrics and is less direct than tolerance accuracy. R2 value and tolerance accuracy exhibit a positive correlation.
Even when the R2 value is low, some predictions may coincidentally fall within the tolerance range; however, such chance agreement is not meaningful. Figure 9 in Section 3.4 (the ablation experiment) visualizes prediction performance on the test set. Suitably widening the tolerance would yield acceptable accuracy, but a stricter tolerance was adopted to reflect the models' true performance.

3.4. Ablation Study

To verify the effectiveness of the modifications, we conducted an ablation study on WT-ResNet. More precisely, the model was partitioned into initial layers and residual layers. An ablation study was conducted on the initial layers, focusing on the replacement of wavelet convolutions. For the residual layers, both alternative structures (either two 3 × 3 convolutional layers or one 5 × 5 convolutional layer) and the substitution with wavelet convolutions were explored. In total, eight different configurations were tested, and the results of the ablation experiments are presented in Table 5, Table 6 and Table 7. Here, "2*3 × 3" indicates two 3 × 3 standard convolutional kernels within the residual block, and "1*5 × 5" indicates a single 5 × 5 standard convolutional kernel; "WT+" indicates substituting the convolution in the model's input layer with wavelet convolution; and "3 × 3 WT" and "5 × 5 WT" indicate substituting the corresponding traditional convolutions with wavelet convolutions. The ablation experiment results show that, compared to the baseline model, the R2 values for nitrogen, phosphorus, and potassium prediction improved by 8.52%, 9.37%, and 30.98%, respectively, while the accuracy improved by 25.00%, 66.68%, and 42.89%, respectively. The results also show little difference in performance between the double-layer 3 × 3 and single-layer 5 × 5 traditional convolutions. While replacing the double-layer 3 × 3 convolution with wavelet convolution yields some improvement, the enhancement is inferior to that achieved by replacing the single-layer 5 × 5 convolution.
The 5 × 5 wavelet convolution offers a multi-scale feature extraction capability through cascaded wavelet decomposition, which is suitable for this task. In summary, the WT-ResNet model we have proposed demonstrates enhanced modeling capabilities.
To more intuitively demonstrate the performance improvement of WT-ResNet over the baseline model, we plotted predicted-versus-true scatter plots for both models on the test set. These plots show the true value on the x-axis and the predicted value on the y-axis, together with the y = x line (indicating perfect prediction) and the corresponding tolerance lines. Figure 9 clearly shows that the prediction points of WT-ResNet mostly fall within the tolerance lines and cluster more tightly around the y = x line, demonstrating that the improvement in prediction performance of WT-ResNet is significant.

4. Discussion

This research successfully developed and validated a predictive model for estimating nitrogen, phosphorus, and potassium nutrient levels in sugarcane leaves using leaf images. The development of this model preliminarily proves the viability of using computer vision technology to substitute for or complement traditional chemical analysis methods in crop nutrition diagnosis, thereby offering a novel technological approach for advancing precision and smart agriculture. Nevertheless, while affirming the model's potential, it is crucial to acknowledge the limitations inherent in the current study. Building upon this understanding, we can then outline potential avenues for future research.
(1)
Although the model demonstrates strong performance on the current dataset, its generalization ability and robustness still encounter several limitations. The size and variety of the dataset represent key limitations. The dataset employed in this study is limited in sample size, the range of sugarcane growth stages represented, the number of varieties included, and the variety of environmental stressors encountered (such as drought, pest infestations, and diseases). Consequently, the model's predictive accuracy may decline considerably when estimating nutrient levels in diverse and unpredictable field environments not represented in the training data. The model's dependability hinges directly on how well the training data represents the full spectrum of conditions it will encounter. To overcome this limitation, we are actively and systematically enriching the dataset, aiming to include a broader range of sugarcane varieties, the complete growth cycle, and abiotic stress conditions. Furthermore, we will investigate unsupervised and self-supervised learning methods. A key benefit of this strategy is its capacity to leverage vast amounts of unlabeled field images for pre-training, allowing the model to independently learn common visual characteristics of leaves, including texture, shape, and structure.
(2)
Difficulties arise in automating the process of acquiring and preparing field images. In natural and unstructured agricultural settings, the efficient and standardized acquisition of images remains a significant challenge. Complex environmental factors, such as soil, weeds, and other plants, significantly disrupt the model’s ability to focus on the target leaf and extract its features, thereby decreasing prediction accuracy [41]. Furthermore, fluctuations in lighting conditions are critical determinants of image quality [42]. Uncompensated differences in illumination can generate significant noise, thereby directly impacting the model’s stability. The current automated acquisition and preprocessing methods are still inadequate in addressing these complex challenges. To ensure the practical implementation of the technology, it is essential to address the challenges encountered in engineering practice. Developing image acquisition protocols, including standardized shooting distance, angle, and the use of reference objects, is fundamental to ensuring data consistency. Additionally, incorporating illumination correction algorithms into the preprocessing pipeline can effectively mitigate the effects caused by variations in lighting conditions.
(3)
Although deep learning models offer high performance, they typically require substantial computational resources. This poses a challenge for deployment on edge devices such as smartphones and other portable hardware. Furthermore, existing models still have room for improvement in balancing the capture of the leaf's global structure against its local fine details, and may therefore not fully exploit all the phenotypic information present in the images. Regarding the model architecture, we will investigate more advanced network designs to enhance both performance and efficiency. Lightweight architectures leverage techniques such as depthwise separable convolution [33], model pruning, and quantization [43] to significantly reduce both the number of parameters and the computational complexity while preserving high accuracy, allowing deployment on mobile platforms such as drones and smartphones.
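To see why depthwise separable convolution shrinks models, compare parameter counts: a standard k × k layer needs k·k·C_in·C_out weights, whereas the depthwise-plus-pointwise factorization needs only k·k·C_in + C_in·C_out. A small arithmetic sketch (the layer sizes below are illustrative, not those of WT-ResNet):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (one spatial filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

# Example: a 3 x 3 layer mapping 64 channels to 128 channels.
standard = conv_params(3, 64, 128)                   # 73728 weights
separable = depthwise_separable_params(3, 64, 128)   # 8768 weights
reduction = standard / separable                     # roughly 8.4x fewer
```

The same factorization underlies Xception-style networks [33], which is why such designs are attractive for drone- and smartphone-class hardware.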
(4)
The quality of raw data collected by visual sensors directly sets the upper bound for the performance of subsequent model analysis [44]. Portable agricultural devices, such as drones and handheld detectors, impose stringent requirements on sensor size and power consumption. Currently available high-precision visual sensors, such as 3D cameras and hyperspectral cameras, tend to be large in size and consume significant energy during prolonged operation, which makes it challenging to meet the demands of mobile applications in field environments. To address the complexities of crop shapes, such as vines and clusters of fruits, bionic visual sensors are being investigated. This approach aims to widen the field of view and minimize blind spots. Meanwhile, the use of flexible electronic technology holds promise for enabling sensors to adhere closely to crop surfaces for monitoring purposes.

5. Conclusions

This study developed a convolutional neural network, WT-ResNet, for extracting features from sugarcane leaf images, enabling relatively accurate, non-destructive prediction of the nitrogen, phosphorus, and potassium content of the leaves. The results show that, by analyzing sugarcane leaf images, our model achieves R2 values of 0.9420 for nitrogen content prediction, 0.9084 for phosphorus content prediction, and 0.8235 for potassium content prediction. Compared to the baseline model, the R2 values for nitrogen, phosphorus, and potassium prediction improved by 8.52%, 9.37%, and 30.98%, respectively. The accuracy rate for nitrogen prediction reaches 88.24% within a 0.5 tolerance, 58.82% for phosphorus prediction within a 0.1 tolerance, and 70.59% for potassium prediction within a 0.5 tolerance. Compared to the baseline model, the accuracy of nitrogen, phosphorus, and potassium prediction improved by 25.00%, 66.68%, and 42.89%, respectively.
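The multi-scale decomposition underlying WT-ResNet can be illustrated with a single-level 2D Haar wavelet transform, which splits an image into a low-frequency approximation band and three detail bands at half resolution. The following plain-Python sketch is for illustration only and is not the model's actual implementation:

```python
def haar_dwt2(img):
    """Single-level 2D Haar wavelet transform of a grayscale image
    (list of rows with even height and width). Returns four half-resolution
    sub-bands: LL (approximation) plus LH, HL, HH detail bands capturing
    horizontal, vertical, and diagonal differences within each 2 x 2 block.
    Coefficients are scaled by 1/2 so the transform is orthonormal."""
    h, w = len(img), len(img[0])
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, h, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll_row.append((a + b + c + d) / 2.0)  # low-pass on both axes
            lh_row.append((a - b + c - d) / 2.0)  # horizontal difference
            hl_row.append((a + b - c - d) / 2.0)  # vertical difference
            hh_row.append((a - b - c + d) / 2.0)  # diagonal difference
        LL.append(ll_row); LH.append(lh_row)
        HL.append(hl_row); HH.append(hh_row)
    return LL, LH, HL, HH
```

A flat region produces zero detail coefficients, while edges and texture concentrate energy in the detail bands, which is what lets wavelet-augmented residual blocks separate coarse leaf structure from fine venation cues.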
Compared with other mainstream models, WT-ResNet exhibits strong accuracy: the backbone comparison experiments in Section 3.3 show that it achieves the best performance on our dataset. In previous studies on predicting nutrient levels in sugarcane leaves, the highest R2 values achieved for nitrogen prediction were 0.9264 and 0.9349 [15,18]. This study achieved an R2 of 0.9420 for nitrogen prediction, representing improvements of 1.68% and 0.76% over that prior work. Furthermore, the introduction of accuracy metrics brings the methodology closer to practical application. Moreover, this study made the first attempt to predict phosphorus and potassium levels; while the R2 values, at 0.9084 and 0.8235, respectively, were slightly lower than that for nitrogen, these results still offer a valuable reference for future research in this area. In summary, WT-ResNet enables relatively accurate, non-destructive assessment of nitrogen, phosphorus, and potassium content in sugarcane leaves, facilitates the precise application of fertilizers, and offers a new algorithm for sustainable agricultural practices and intelligent agricultural systems.
The significance of this research extends beyond sugarcane nutrient prediction. The methodology introduced in this research is highly scalable. Its core framework can be generalized and applied to the extraction of leaf phenotypes and the analysis of physiological indicators for other primary food crops and economic crops, including rice, wheat, and corn. This will offer an efficient and non-destructive phenotypic screening tool for crop breeding, thereby expediting the selection process for improved varieties. Furthermore, it can also offer real-time nutritional diagnosis and fertilization guidance to agricultural extension workers and growers, facilitating the transition of agriculture to a sustainable development model characterized by resource conservation, environmental friendliness, high yield, and high efficiency. Despite certain limitations in the current model, the potential for significant development and wide-ranging applications of deep learning-based crop phenotyping analysis technology is evident through systematic optimization across data, algorithms, and practical implementation.

Author Contributions

Conceptualization, methodology, software, formal analysis, visualization, and writing—original draft preparation, C.S. and J.D.; resources and data curation, J.D., B.H., and Y.C.; writing—review and editing, project administration, and funding acquisition, C.S. and C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangxi Young Elite Scientist Sponsorship Program (grant number 2025014) and the Science and Technology Major Project of Guangxi (grant numbers Guike AA22117005, Guike AA22117007, and Guike AA22117004).

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgments

The relevant data of the field experiment were provided by Guangxi University Agricultural New Town.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Phiri, R.; Rangappa, S.M.; Siengchin, S. Sugarcane Bagasse for Sustainable Development of Thermoplastic Biocomposites. Ind. Crops Prod. 2024, 222, 120115.
2. Liu, H.; Wang, Y.; Cai, T.; He, K.; Tian, X.; Chen, Z.; Yin, Y.; Cui, Z. Integrated Management to Achieve Synergy in Sugarcane Production and Quality in China. Field Crops Res. 2024, 317, 109552.
3. Zhang, Z.; Hua, C.; Ayyamperumal, R.; Wang, M.; Wang, S. The Impact of Specialization and Large-Scale Operation on the Application of Pesticides and Chemical Fertilizers: A Spatial Panel Data Analysis in China. Environ. Impact Assess. Rev. 2024, 106, 107496.
4. Pereira Da Silva, G.; Justino Chiaia, H.L. Limitation Due to Nutritional Deficiency and Excess in Sugarcane Using the Integral Diagnosis and Recommendation System (DRIS) and Nutritional Composition Diagnosis (CND). Commun. Soil Sci. Plant Anal. 2021, 52, 1458–1467.
5. Lemaire, G.; Tang, L.; Bélanger, G.; Zhu, Y.; Jeuffroy, M.-H. Forward New Paradigms for Crop Mineral Nutrition and Fertilization towards Sustainable Agriculture. Eur. J. Agron. 2021, 125, 126248.
6. Mahlayeye, M.; Darvishzadeh, R.; Nelson, A. Characterising Maize and Intercropped Maize Spectral Signatures for Cropping Pattern Classification. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103699.
7. Siedliska, A.; Baranowski, P.; Pastuszka-Woźniak, J.; Zubik, M.; Krzyszczak, J. Identification of Plant Leaf Phosphorus Content at Different Growth Stages Based on Hyperspectral Reflectance. BMC Plant Biol. 2021, 21, 28.
8. Shankar, T.; Malik, G.C.; Banerjee, M.; Dutta, S.; Praharaj, S.; Lalichetti, S.; Mohanty, S.; Bhattacharyay, D.; Maitra, S.; Gaber, A.; et al. Prediction of the Effect of Nutrients on Plant Parameters of Rice by Artificial Neural Network. Agronomy 2022, 12, 2123.
9. Kolhar, S.; Jagtap, J. Plant Trait Estimation and Classification Studies in Plant Phenotyping Using Machine Vision—A Review. Inf. Process. Agric. 2023, 10, 114–135.
10. VanHook, A.M. Nitrogen Assimilation Gets a HY5. Sci. Signal. 2016, 9, ec59.
11. Akkem, Y.; Biswas, S.K.; Varanasi, A. Smart Farming Using Artificial Intelligence: A Review. Eng. Appl. Artif. Intell. 2023, 120, 105899.
12. Li, Y.; Chen, D.; Walker, C.N.; Angus, J.F. Estimating the Nitrogen Status of Crops Using a Digital Camera. Field Crops Res. 2010, 118, 221–227.
13. Haider, T.; Farid, M.S.; Mahmood, R.; Ilyas, A.; Khan, M.H.; Haider, S.T.-A.; Chaudhry, M.H.; Gul, M. A Computer-Vision-Based Approach for Nitrogen Content Estimation in Plant Leaves. Agriculture 2021, 11, 766.
14. Sulistyo, S.B.; Wu, D.; Woo, W.L.; Dlay, S.S.; Gao, B. Computational Deep Intelligence Vision Sensing for Nutrient Content Estimation in Agricultural Automation. IEEE Trans. Autom. Sci. Eng. 2018, 15, 1243–1257.
15. You, H.; Zhou, M.; Zhang, J.; Peng, W.; Sun, C. Sugarcane Nitrogen Nutrition Estimation with Digital Images and Machine Learning Methods. Sci. Rep. 2023, 13, 14939.
16. Sun, L.; Yang, C.; Wang, J.; Cui, X.; Suo, X.; Fan, X.; Ji, P.; Gao, L.; Zhang, Y. Automatic Modeling Prediction Method of Nitrogen Content in Maize Leaves Based on Machine Vision and CNN. Agronomy 2024, 14, 124.
17. Janani, M.; Jebakumar, R. Detection and Classification of Groundnut Leaf Nutrient Level Extraction in RGB Images. Adv. Eng. Softw. 2023, 175, 103320.
18. Lu, Z.; Sun, C.; Dou, J.; He, B.; Zhou, M.; You, H. SC-ResNeXt: A Regression Prediction Model for Nitrogen Content in Sugarcane Leaves. Agronomy 2025, 15, 175.
19. Agrawal, J.; Arafat, M.Y. Transforming Farming: A Review of AI-Powered UAV Technologies in Precision Agriculture. Drones 2024, 8, 664.
20. Bendig, J.; Yu, K.; Aasen, H.; Bolten, A.; Bennertz, S.; Broscheit, J.; Gnyp, M.L.; Bareth, G. Combining UAV-Based Plant Height from Crop Surface Models, Visible, and near Infrared Vegetation Indices for Biomass Monitoring in Barley. Int. J. Appl. Earth Obs. Geoinf. 2015, 39, 79–87.
21. Zheng, H.; Cheng, T.; Li, D.; Zhou, X.; Yao, X.; Tian, Y.; Cao, W.; Zhu, Y. Evaluation of RGB, Color-Infrared and Multispectral Images Acquired from Unmanned Aerial Systems for the Estimation of Nitrogen Accumulation in Rice. Remote Sens. 2018, 10, 824.
22. Zhang, Y.; Ta, N.; Guo, S.; Chen, Q.; Zhao, L.; Li, F.; Chang, Q. Combining Spectral and Textural Information from UAV RGB Images for Leaf Area Index Monitoring in Kiwifruit Orchard. Remote Sens. 2022, 14, 1063.
23. Zhang, X.; Hu, Y.; Li, X.; Wang, P.; Guo, S.; Wang, L.; Zhang, C.; Ge, X. Estimation of Rice Leaf Nitrogen Content Using UAV-Based Spectral–Texture Fusion Indices (STFIs) and Two-Stage Feature Selection. Remote Sens. 2025, 17, 2499.
24. Zhang, S.; Duan, J.; Qi, X.; Gao, Y.; He, L.; Liu, L.; Guo, T.; Feng, W. Combining Spectrum, Thermal, and Texture Features Using Machine Learning Algorithms for Wheat Nitrogen Nutrient Index Estimation and Model Transferability Analysis. Comput. Electron. Agric. 2024, 222, 109022.
25. Yang, M.-D.; Hsu, Y.-C.; Chen, Y.-H.; Yang, C.-Y.; Li, K.-Y. Precision Monitoring of Rice Nitrogen Fertilizer Levels Based on Machine Learning and UAV Multispectral Imagery. Comput. Electron. Agric. 2025, 237, 110523.
26. Qin, X.; Zhang, Z.; Huang, C.; Dehghan, M.; Zaiane, O.R.; Jagersand, M. U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection. Pattern Recognit. 2020, 106, 107404.
27. Levin, A.; Lischinski, D.; Weiss, Y. A Closed-Form Solution to Natural Image Matting. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 228–242.
28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 770–778.
29. Finder, S.E.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet Convolutions for Large Receptive Fields. In Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2025; pp. 363–380. ISBN 978-3-031-72948-5.
30. Huang, H.; He, R.; Sun, Z.; Tan, T. Wavelet-SRNet: A Wavelet-Based CNN for Multi-Scale Face Super Resolution. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: New York, NY, USA, 2017; pp. 1698–1706.
31. Gal, R.; Hochberg, D.C.; Bermano, A.; Cohen-Or, D. SWAGAN: A Style-Based Wavelet-Driven Generative Model. ACM Trans. Graph. 2021, 40, 1–11.
32. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 2818–2826.
33. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017; pp. 1800–1807.
34. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: New York, NY, USA, 2017; pp. 618–626.
35. De Bang, T.C.; Husted, S.; Laursen, K.H.; Persson, D.P.; Schjoerring, J.K. The Molecular–Physiological Functions of Mineral Macronutrients and Their Consequences for Deficiency Symptoms in Plants. New Phytol. 2021, 229, 2446–2469.
36. Houmani, H.; Corpas, F.J. Can Nutrients Act as Signals under Abiotic Stress? Plant Physiol. Biochem. 2024, 206, 108313.
37. Li, M.; Zhang, H.; Yang, X.; Ge, M.; Ma, Q.; Wei, H.; Dai, Q.; Huo, Z.; Xu, K.; Luo, D. Accumulation and Utilization of Nitrogen, Phosphorus and Potassium of Irrigated Rice Cultivars with High Productivities and High N Use Efficiencies. Field Crops Res. 2014, 161, 55–63.
38. Guo, Y.; Yan, Z.; Gheyret, G.; Zhou, G.; Xie, Z.; Tang, Z. The Community-level Scaling Relationship between Leaf Nitrogen and Phosphorus Changes with Plant Growth, Climate and Nutrient Limitation. J. Ecol. 2020, 108, 1276–1286.
39. Liang, C.; Tian, J.; Liao, H. Proteomics Dissection of Plant Responses to Mineral Nutrient Deficiency. Proteomics 2013, 13, 624–636.
40. Xue, Y.; Zhu, S.; Schultze-Kraft, R.; Liu, G.; Chen, Z. Dissection of Crop Metabolome Responses to Nitrogen, Phosphorus, Potassium, and Other Nutrient Deficiencies. Int. J. Mol. Sci. 2022, 23, 9079.
41. Gao, J.; Liao, W.; Nuyttens, D.; Lootens, P.; Xue, W.; Alexandersson, E.; Pieters, J. Cross-Domain Transfer Learning for Weed Segmentation and Mapping in Precision Farming Using Ground and UAV Images. Expert Syst. Appl. 2024, 246, 122980.
42. Xu, J.; Hou, Y.; Ren, D.; Liu, L.; Zhu, F.; Yu, M.; Wang, H.; Shao, L. STAR: A Structure and Texture Aware Retinex Model. IEEE Trans. Image Process. 2020, 29, 5022–5037.
43. Jiang, Z.; Xu, Y.; Xu, H.; Wang, Z.; Liu, J.; Chen, Q.; Qiao, C. Computation and Communication Efficient Federated Learning with Adaptive Model Pruning. IEEE Trans. Mob. Comput. 2024, 23, 2003–2021.
44. Fu, J.; Nie, C.; Sun, F.; Li, G.; Shi, H.; Wei, X. Bionic Visual-Audio Photodetectors with in-Sensor Perception and Preprocessing. Sci. Adv. 2024, 10, eadk8199.
Figure 1. Preprocessing of sugarcane leaf images: (a) Sugarcane leaf images after background removal; (b) processed sugarcane leaf images.
Figure 2. Histograms of sugarcane leaf N, P and K content: (a) training set; (b) validation set; (c) test set.
Figure 3. WT-ResNet network architecture.
Figure 4. The residual structure.
Figure 5. Comparison of residual blocks: (a) ResNet18 residual block; (b) WT-ResNet residual block.
Figure 6. WT-ResNet training process: (a) training process for predicting nitrogen (N); (b) training process for predicting phosphorus (P); (c) training process for predicting potassium (K).
Figure 7. Grad-CAM of WT-ResNet Layer 4: (a) original image; (b) heatmap of predicted nitrogen; (c) heatmap of predicted phosphorus; (d) heatmap of predicted potassium.
Figure 8. Performance of different backbones in sugarcane leaf nutrition prediction: (a) performance in predicting nitrogen; (b) performance in predicting phosphorus; (c) performance in predicting potassium.
Figure 9. Ground Truth images in sugarcane leaf nutrition prediction: (a) performance in predicting nitrogen (N); (b) performance in predicting phosphorus (P); (c) performance in predicting potassium (K).
Table 1. Soil nutrient content of the experimental field.

pH | Organic Carbon (g/kg) | Total Nitrogen (mg/kg) | Total Phosphorus (mg/kg) | Total Potassium (mg/kg)
5.13 ± 0.01 | 11.45 ± 0.04 | 97.36 ± 1.84 | 64.37 ± 1.53 | 102.32 ± 1.45
Table 2. Measurement items and methods.

Items | Methods
pH of soil | Acidity meter method
Organic matter of soil | Potassium dichromate volumetric method (external heating method)
Nitrogen and phosphorus of soil and crop | Measurement using a semi-automatic analyzer (AMS, Italy; Model: SMARTCHEM 200) after H2SO4-H2O2 digestion
Potassium of soil and crop | Flame photometry after H2SO4-H2O2 digestion
Table 3. Deep learning environment parameter table.

Items | Detail
Operating System | Linux
CPU | Intel Xeon W-2235
GPU | NVIDIA GeForce RTX 3090
Acceleration Environment | CUDA 12.6
Language | Python 3.8.20
Framework | PyTorch 2.4.1
Table 4. Model evaluation metrics.

Metric | Definition | Formula
MAE (Mean Absolute Error) | The average absolute difference between predicted values and actual values. | $\frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$
MSE (Mean Squared Error) | The average of the squared differences between predicted values and actual values. | $\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$
R2 (Determination coefficient) | The proportion of variance in the target variable explained by the model. | $1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(\bar{y} - y_i\right)^2}$
Accuracy within tolerance | Accuracy within a specified allowable error. | $\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\left(\left|\hat{y}_i - y_i\right| \le \varepsilon\right)$

Where $\hat{y}_i$ is the predicted value; $y_i$ is the actual value; $\bar{y}$ is the sample mean; $n$ is the sample size; $\mathbb{1}(\cdot)$ is the indicator function, taking the value 1 if the condition is true and 0 otherwise; and $\varepsilon$ is the tolerance.
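The metrics defined in Table 4 can be computed directly from prediction and ground-truth vectors. A minimal plain-Python sketch (function names are illustrative, not part of the study's code):

```python
def mae(y_pred, y_true):
    """Mean absolute error between predictions and ground truth."""
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)

def mse(y_pred, y_true):
    """Mean squared error between predictions and ground truth."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

def r2(y_pred, y_true):
    """Coefficient of determination: 1 minus residual over total variance."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    ss_tot = sum((mean - t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def accuracy_within_tolerance(y_pred, y_true, eps):
    """Fraction of predictions whose absolute error is at most eps."""
    hits = sum(1 for p, t in zip(y_pred, y_true) if abs(p - t) <= eps)
    return hits / len(y_true)
```

The tolerance-based accuracy makes the regression output directly interpretable for fertilization decisions: a prediction counts as correct only if it lands within the agronomically acceptable error band.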
Table 5. Ablation experiment results on sugarcane leaf nitrogen prediction.

Model | R2 | Accuracy | MAE | MSE
2*3 × 3 | 0.8679 † | 0.7059 † | 0.3986 † | 0.2555 †
2*3 × 3 WT | +0.061 | +0.1058 | −0.0245 | −0.0592
1*5 × 5 | +0.0269 | +0.0941 | −0.0462 | −0.052
1*5 × 5 WT | +0.0712 | +0.1646 | −0.1400 | −0.1376
WT + 2*3 × 3 | +0.0128 | +0.0235 | −0.0347 | −0.0247
WT + 2*3 × 3 WT | +0.0364 | +0.0941 | −0.0883 | −0.0705
WT + 1*5 × 5 | +0.0535 | +0.0706 | −0.0844 | −0.1034
WT + 1*5 × 5 WT (Ours) | +0.0741 | +0.1765 | −0.1456 | −0.1433

The † symbol represents the performance of the baseline model on the test set. The values for other models indicate the increase or decrease relative to this baseline.
Table 6. Ablation experiment results on sugarcane leaf phosphorus prediction.

Model | R2 | Accuracy | MAE | MSE
2*3 × 3 | 0.8306 † | 0.3529 † | 0.1734 † | 0.0468 †
2*3 × 3 WT | +0.0311 | +0.1059 | −0.0230 | −0.0087
1*5 × 5 | −0.0432 | −0.0235 | +0.0131 | +0.0119
1*5 × 5 WT | +0.0132 | +0.1059 | −0.0180 | −0.0037
WT + 2*3 × 3 | −0.0297 | −0.0235 | +0.0079 | +0.0081
WT + 2*3 × 3 WT | +0.0550 | +0.1177 | −0.0443 | −0.0152
WT + 1*5 × 5 | −0.0567 | −0.0941 | +0.0302 | +0.0156
WT + 1*5 × 5 WT (Ours) | +0.0778 | +0.2353 | −0.0573 | −0.0215

The † symbol represents the performance of the baseline model on the test set. The values for other models indicate the increase or decrease relative to this baseline.
Table 7. Ablation experiment results on sugarcane leaf potassium prediction.

Model | R2 | Accuracy | MAE | MSE
2*3 × 3 | 0.6287 † | 0.4940 † | 0.5643 † | 0.4964 †
2*3 × 3 WT | +0.1400 | +0.1766 | −0.1251 | −0.1870
1*5 × 5 | +0.0203 | +0.0119 | −0.0033 | −0.0268
1*5 × 5 WT | +0.1505 | +0.2001 | −0.1243 | −0.2011
WT + 2*3 × 3 | +0.0338 | +0.0589 | −0.0213 | −0.0450
WT + 2*3 × 3 WT | +0.1864 | +0.1884 | −0.1722 | −0.2558
WT + 1*5 × 5 | −0.0696 | −0.0352 | +0.0687 | +0.0934
WT + 1*5 × 5 WT (Ours) | +0.1948 | +0.2119 | −0.1789 | −0.2603

The † symbol represents the performance of the baseline model on the test set. The values for other models indicate the increase or decrease relative to this baseline.
