1. Introduction
Natural rubber is an important strategic resource, and the rubber tree is its main commercial source [1]. To manage nutrients in rubber plantations scientifically, monitor them efficiently, and guarantee safe production [2], hyperspectral imaging (HSI) has attracted extensive attention for non-destructive nutrient diagnosis of rubber leaves [3,4,5]. According to the “Technical Regulations for Nutrient Diagnosis of Rubber Leaves” (GB/T 29570-2013 [6]), and given its important role in natural rubber production [7], potassium is one of the key indicators in the nutrient diagnosis of rubber leaves. Therefore, establishing an accurate HSI-based potassium diagnosis model for rubber leaves is of practical significance to the natural rubber industry.
In mainstream modeling research on non-destructive nutrient diagnosis of crop leaves, performing mask extraction on HSI and calculating the average leaf spectrum is a robust approach [8,9]. The advantage of HSI over probe spectroscopy is that it avoids the problem of too little spectral information failing to represent the complete sample, and thereby improves the robustness of the model [10]. While providing comprehensive spectral information, HSI also makes both the spatial and the spectral features of a sample available for mining at the same time [11].
However, this comes with a huge number of variable dimensions, a problem often referred to as the “curse of dimensionality” in machine learning [12]. Therefore, even when HSI with high spatial and spectral resolution is used for feature acquisition, the data processing route dominated by mask extraction and average spectrum calculation remains mainstream. This approach allows the established scheme of probe-based spectral analysis to be carried over smoothly to HSI datasets. Under this method, however, the rich spatial features in HSI are neglected, and the empirical advantage that probe devices offer through discrete, targeted spectral acquisition is sacrificed as well. Therefore, effective and organic spatial–spectral feature fusion is an important research topic in leaf-scale HSI modeling.
There are no clear research conclusions indicating whether rubber leaves have specific leaf surface response characteristics under different potassium levels, but there are general empirical summaries in the agricultural field regarding the relationship between potassium and leaf surface response characteristics. In cases of potassium deficiency, yellow or brown areas appear at the edges of crop leaves [13]. Meanwhile, by affecting leaf expansion and development, potassium causes the key characteristics related to its content to tend to be present at the leaf edges [14]. Potassium usually exists in ionic form in crop leaves [15]. Its solute form and its impact on lateral cell growth are, respectively, correlated with the vascular bundle tissues, water content, and edge regions of leaves. In research conclusions targeting different crops, the characteristics formed by potassium status on leaf surfaces all point to a local rather than global pattern. This indicates that, in HSI, averaging the pixel spectra of the entire leaf to represent its spectral characteristics is inappropriate for establishing a potassium diagnosis model. In the process of potassium diagnosis, it is necessary to conduct differentiated detection on different regions. This also puts forward specific requirements for reasonable spatial–spectral feature fusion strategies in HSI.
The widespread heterogeneity among local spectra on crop leaf surfaces is an important source of spatial feature extraction in HSI. Although no studies have yet demonstrated the heterogeneity of spectral characteristics on rubber leaf surfaces from the perspective of HSI, conclusions drawn from banana leaves [16] also support this characteristic of leaf HSI. This widespread heterogeneity stems from physical and chemical factors resulting from leaf cell differentiation. Taking vascular bundle tissue and photosynthetic tissue as examples: in terms of chemical factors, differences in lignin content and chlorophyll content directly affect spectral characteristics; in terms of physical factors, there are also significant differences in the scattering properties between leaf veins and mesophyll. Features related to leaf growth and expansion should be concentrated in the leaf edge area, but in rubber leaves, the separability of their spectral characteristics needs to be verified through experiments.
In studies on HSI-based spatial–spectral fusion strategies aimed at improving modeling accuracy, previous research has demonstrated that integrating leaf structural features [17], texture features [18], and the uneven distribution of features [19] can effectively enhance the accuracy of disease detection and nutrient content estimation. Specifically, Song et al. divided the corn leaf surface into 30 different types of regions through the extraction of vein structures, thereby decomposing the average spectrum based on the spatial characteristics of the leaf [17]. In the nitrogen diagnosis results for the different genotypes modeled, the coefficient of determination increased from 0.752 to 0.870 and from 0.879 to 0.973, respectively, when moving from the average spectral model to the feature fusion model. In another of their studies, the surface of soybean leaves was divided into multiple sub-regions based on leaf veins, and more accurate nitrogen indices were calculated from the spectral features of these sub-regions. This is essentially an update of the contribution weights of each sub-region on the soybean leaf surface to modeling. The study labeled the coordinate system defined by the leaf veins as NLCS, and labeled the nitrogen index derived from it as NLCS-N. Across all models compared, the coefficient of determination of NLCS-N was on average 31% higher than that of the leaf-average NDVI. In the HSI-based study on the diagnosis of pear leaf anthracnose, Zhang et al. extracted texture features from HSI by calculating the gray-level co-occurrence matrix [19]. The discrimination accuracy of the model that fused vegetation index features, characteristic band screening, and texture features reached 0.986, which was higher than the 0.925 obtained with the average spectrum. Wu et al. extracted the texture features on the leaf surface as spatial features via the gray-level co-occurrence matrix [20]. Compared with average spectral modeling, adding vegetation index features and texture features increased the modeling result from 0.950 to 0.963.
As shown in Table 1, these studies have revealed two important conclusions in HSI-based crop leaf nutrient diagnosis. First, spatial features are an important source of information for more accurate modeling results. Second, each local part of the leaf surface should have a differentiated weight rather than a uniform weight in nutrient diagnosis. Therefore, developing feature extraction and fusion methods different from average spectral modeling is an important step toward efficiently utilizing the information abundance of HSI and achieving accurate modeling. However, the methods in the aforementioned studies have certain issues. Manual regional division based on the direction of leaf veins is itself a labor-intensive process and also relies on experience. Moreover, the sub-regional division of leaf surfaces based on Local Binary Pattern (LBP) is only sensitive to structures such as leaf veins; leaf edges, which carry response characteristics under specific nutrient states, are not salient and are therefore difficult to extract. Additionally, while research conclusions based on texture feature fusion support its effectiveness for disease diagnosis, no studies have shown that such features have a positive impact on nutrient diagnosis.
Therefore, this study proposes to use a clustering method to achieve dynamic division of sub-regions on the leaf surface, with the regional division results determined by the spectral characteristics of local leaf pixels. Subsequently, an elitist-preserving genetic algorithm is adopted to optimize the modeling contribution weights of each sub-region and the leaf spectral features used for modeling. The optimal sub-region weight allocation results and the optimal partial least squares model are obtained through cross-validation.
2. Methods and Materials
2.1. Sample Collection and Measurement
The rubber leaf samples used in this study were collected in Danzhou City, Hainan Province, China (109.47°E, 19.54°N). Only mature and healthy leaves without conspicuous stress characteristics were gathered; the samples were carefully preserved and then subjected to HSI measurements. As an application study of HSI in rubber leaf nutrient diagnosis, sample collection followed the standard technical specifications (GB/T 29570-2013). Samples were collected from stably aged top canopy leaves on the main lateral branches in the lower layer of the crown of mature sample plants, excluding diseased leaves and wind-damaged leaves. The standard technical specifications evaluate the nutritional status of a sample plant based on the nutrient status of two leaf samples from it, which requires accurate measurement of leaf nutrient content. Therefore, in addition to evaluating leaf status and sampling height, a total of 9 batches (1800 samples) were collected in accordance with the diagnostic technical specifications recorded in the national standard. From this sample set, a total of 201 rubber leaf samples that met the standards of deficiency (Def), adequacy (Adq), and excess (Exc) were selected for modeling research. After collection, the leaf samples were sent to the laboratory, where indoor hyperspectral images were acquired for each sample one by one.
The HSI system primarily comprises a computer, a darkroom (GaiaSorter, Zolix Instruments Co., Ltd., Beijing, China), a hyper-spectrometer (GaiaField-F-N17, Zolix Instruments Co., Ltd., Beijing, China) with a spectral range of 900–1700 nm and a total of 254 bands (of which the first 30 and last 20 band slices were removed due to noise interference, leaving 204 bands), and a stepper motor-driven stage. Within the scanning field of view of the hyper-spectrometer, apart from the leaf to be measured, there are a black background plate, a standard whiteboard, and the whiteboard's metal border. The standard whiteboard is used for the black-and-white correction of the HSI. The main process of black-and-white correction can be expressed by Equation (1),

$$R = \frac{I_{\mathrm{raw}} - I_{\mathrm{dark}}}{I_{\mathrm{white}} - I_{\mathrm{dark}}} \tag{1}$$

where $I_{\mathrm{white}}$ represents the standard whiteboard, $I_{\mathrm{dark}}$ represents the black frame, $I_{\mathrm{raw}}$ represents the original spectral data, and $R$ represents the corrected HSI.
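The black-and-white correction described above can be illustrated with a minimal NumPy sketch. It assumes the common per-band form (raw minus dark, divided by white minus dark); the function name `calibrate_reflectance`, the `eps` guard, and the toy values are illustrative assumptions rather than part of the acquisition software used in this study.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-8):
    """Black-and-white correction: (raw - dark) / (white - dark), per pixel and band."""
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    # Hypothetical safeguard: avoid dividing by zero where white and dark coincide.
    denom = np.maximum(white - dark, eps)
    return (raw - dark) / denom

# Toy cube: 2 x 2 pixels and 3 bands (the cubes in this study keep 204 bands).
raw = np.full((2, 2, 3), 60.0)
white = np.array([110.0, 110.0, 110.0])  # per-band whiteboard signal
dark = np.array([10.0, 10.0, 10.0])      # per-band dark signal
reflectance = calibrate_reflectance(raw, white, dark)
```

In this toy case every pixel ends up with a reflectance of 0.5 in each band, because the raw signal lies halfway between the dark and white references.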
After the completion of HSI measurements, the leaf samples were dispatched for physicochemical analysis to determine the leaf potassium content (LKC). All 201 collected leaf samples were categorized into three groups, namely potassium deficiency, normal, and potassium enrichment, per China’s national standard GB/T 29570-2013 for rubber tree cultivation in Hainan Province. According to this standard, the potassium content thresholds in leaves are severely deficient (<0.7% dry weight), normal (0.9–1.1%), and excessive (>1.5%), with these thresholds specifically adapted to Hainan’s lateritic soils and high-yield rubber clones.
As shown in Table 2, the study labeled all samples falling outside the normal range as either deficient or excessive, and systematically partitioned them into training and validation sets. Consequently, the discriminant model developed in this study is designed to classify three primary LKC status categories: deficient (K-Def), adequate (K-Adq), and excessive (K-Exc).
2.2. Spectral Preprocessing
In the field of spectral analysis, mainstream preprocessing methods include Multiplicative Scatter Correction (MSC) [21,22,23,24], Standard Normal Variate transform (SNV), and Z-Score [25,26,27]. In this study, the average spectra used for comparison were processed using these traditional preprocessing methods. For the spatial–spectral fusion strategy proposed in this study, the segmentation of leaf regions relies on pixel-level spectral reflectance, which implies that the feature domains of leaf-level average spectra and pixel-level spectral data differ.
The main purpose of traditional hyperspectral preprocessing is to eliminate spectral feature differences caused by physical factors and retain key spectral features resulting from chemical factor differences [28]. This is common and reasonable in average spectral modeling. However, for the pixel-level spectral reflectance feature space, the relative influence of physical and chemical factors on the features is uncertain. Even under uniform conditions in the indoor spectral measurement environment, different sub-regions on the leaf surface will naturally exhibit significant differences in both physical and chemical factors due to functional differentiation. Therefore, in this study, the preprocessing of pixel spectra during leaf sub-region segmentation needs to be discussed based on experimental results.
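To make the two pixel-level preprocessing options concrete, the following is a minimal NumPy sketch of SNV and MSC in their standard textbook forms. The function names and the choice of the mean spectrum as the default MSC reference are assumptions for illustration; the exact implementation used in this study is not specified in the text.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum.

    Assumption: the reference defaults to the mean of the supplied spectra.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        # Fit s ~ intercept + slope * ref, then undo the additive and multiplicative terms.
        slope, intercept = np.polyfit(ref, s, deg=1)
        corrected[i] = (s - intercept) / slope
    return corrected

# Three toy pixel spectra that differ only by a scale factor and an offset.
base = np.array([0.2, 0.4, 0.5, 0.3])
pixels = np.vstack([base, 2.0 * base + 0.1, 0.5 * base - 0.05])
pixels_snv = snv(pixels)
pixels_msc = msc(pixels, reference=base)
```

Because the toy spectra are exact affine transforms of `base`, both corrections map them onto a single common curve. Real pixel spectra differ in shape as well, which is why the effect of these corrections on sub-region segmentation has to be judged experimentally.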
Spectra of the leaf region exhibit significant differences from those of the black background region and the standard board region, and this difference provides a basis for extracting the binary mask of the leaf ROI. In this study, in the binary mask of rubber leaf HSI, the leaf region is assigned a value of 1, while other regions are assigned a value of 0. Using the mask as a standard, all pixel-level spectra on the surface of rubber leaves can be obtained, and the average spectrum of rubber leaves can also be calculated quickly.
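As a rough sketch of how a binary mask yields both the pixel-level spectra and the leaf average spectrum, the snippet below indexes a toy cube with a 0/1 mask. The array shapes and the helper name `leaf_pixel_spectra` are illustrative assumptions, and the mask is simply given here rather than derived from the background contrast.

```python
import numpy as np

def leaf_pixel_spectra(cube, mask):
    """Return an (n_leaf_pixels, n_bands) array of the spectra where mask == 1."""
    return cube[mask.astype(bool)]

rng = np.random.default_rng(0)
cube = rng.random((4, 5, 6))             # rows x columns x bands (toy size)
mask = np.zeros((4, 5), dtype=np.uint8)  # 0 = background or whiteboard
mask[1:3, 1:4] = 1                       # 1 = leaf ROI (6 pixels here)

pixels = leaf_pixel_spectra(cube, mask)  # pixel-level spectra of the leaf
mean_spectrum = pixels.mean(axis=0)      # leaf average spectrum
```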
2.3. Modeling Methods for Comparison
In this study, to demonstrate the necessity of implementing spatial–spectral feature fusion from leaf-scale HSI for modeling, the leaf average spectral modeling method is taken as the primary baseline for comparison. The leaf average spectrum begins with the extraction of the leaf ROI from HSI, which is the mask extraction described above. After obtaining the binarized leaf ROI mask, the average spectrum of the leaf is derived by extracting all pixel spectra in the leaf region and calculating their average. Once the average spectrum and the modeling target are obtained, a model can be established using shallow machine learning algorithms. Among these, partial least squares (PLS) is a widely used algorithm that can effectively mitigate the interference caused by collinear and redundant features in hyperspectral information [29,30,31].
Besides extracting the leaf average spectrum from HSI, calculating the gray-level co-occurrence matrix (GLCM) and texture features [32] from the band-by-band slices of HSI is another method for HSI feature extraction, and the resulting features are a type of spatial feature [19]. In this study, the GLCM calculation derives the spatial features of each band slice from 3 pixel distances (1, 2, and 3), 4 directions (0°, 45°, 90°, and 135°), and 5 texture feature indices (contrast, dissimilarity, homogeneity, energy, and correlation).
After removing band slices severely affected by noise, 204 bands are used to calculate the average spectrum and texture features. Therefore, each sample can generate spatial features with a dimensionality of $204 \times 3 \times 4 \times 5 = 12{,}240$. To quickly screen out effective variables, the absolute values of Pearson correlation coefficients and Spearman correlation coefficients are calculated, and the top 50 variables with the highest correlation from each are selected, resulting in a total of 100 variables. These variables are used to explore whether they can improve the modeling performance.
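The feature count and the correlation-based screening can be sketched as follows. This is a simplified illustration, not the study's code: the Spearman coefficient is approximated by applying Pearson to ordinal ranks (ties are not averaged), the data are synthetic, and names such as `screen_features` are hypothetical.

```python
import numpy as np

# Feature count stated in the text: bands x distances x directions x indices.
n_texture_features = 204 * 3 * 4 * 5

def ordinal_rank(values):
    """Ranks 0..n-1; ties are broken by position (a simplification of Spearman)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def abs_pearson(X, y):
    """Absolute Pearson correlation between every column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom

def screen_features(X, y, top=50):
    """Indices of the `top` columns by |Pearson| and by |Spearman| (rank-based)."""
    pearson = abs_pearson(X, y)
    spearman = abs_pearson(np.apply_along_axis(ordinal_rank, 0, X), ordinal_rank(y))
    top_pearson = np.argsort(pearson)[::-1][:top]
    top_spearman = np.argsort(spearman)[::-1][:top]
    return top_pearson, top_spearman

# Synthetic demo: 60 samples, 300 candidate features, column 7 drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 300))
y = 2.0 * X[:, 7] + 0.01 * rng.normal(size=60)
top_pearson, top_spearman = screen_features(X, y, top=5)
```

Note that the two selections can overlap, so the number of distinct variables obtained by combining them may be smaller than the sum of the two list lengths.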
2.4. Proposed Method
The quantity of leaf pixels is exceedingly large, and the conventional K-means algorithm incurs substantial computational resource demands. The segmentation of leaf sub-regions can be achieved either by batch-based deep learning or by batch-based machine learning clustering algorithms. In this study, the Mini-Batch K-means [33] algorithm is employed for clustering the pixels. This algorithm must be initialized with the number of cluster centers. To study the influence of the number of cluster centers on the key feature extraction process, experiments are conducted over a range of cluster numbers.
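For readers unfamiliar with the mini-batch variant, the following is a simplified NumPy sketch of the idea: centers are updated from small random batches with a per-center learning rate of one over the number of points seen so far. It is not the implementation used in this study; in practice a library routine such as scikit-learn's `MiniBatchKMeans` would normally be preferred. The `init` argument and the toy data are assumptions made so that the example is deterministic.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def mini_batch_kmeans(X, n_clusters, batch_size=1024, n_iter=50, init=None, seed=0):
    """Simplified mini-batch k-means; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    if init is None:
        init = X[rng.choice(len(X), n_clusters, replace=False)]
    centers = np.array(init, dtype=float)
    counts = np.zeros(n_clusters)
    size = min(batch_size, len(X))
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size, replace=False)]
        labels = assign(batch, centers)  # assignments are fixed for the whole batch
        for x, c in zip(batch, labels):
            counts[c] += 1
            centers[c] += (x - centers[c]) / counts[c]  # learning rate 1 / count
    return centers

# Toy "pixel spectra": two well-separated groups with 4 bands each.
rng = np.random.default_rng(1)
pixels = np.vstack([rng.normal(0.2, 0.02, size=(200, 4)),
                    rng.normal(0.6, 0.02, size=(200, 4))])
centers = mini_batch_kmeans(pixels, n_clusters=2, batch_size=64, init=pixels[[0, 200]])
labels = assign(pixels, centers)
proportions = np.bincount(labels, minlength=2) / len(labels)
```

The `proportions` vector corresponds to the area proportions of the sub-regions that are used when the component spectra are built.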
After clustering the leaf pixels, the corresponding leaf sub-regions are also partitioned, and the area proportion of each type of pixel sub-region within the leaf ROI is computed. The average spectrum of each sub-region is multiplied by its area proportion to derive the leaf component spectra based on area division. Equation (2) represents the relationship between the component spectra and the average leaf spectrum,

$$\bar{\mathbf{x}} = \sum_{i=1}^{k} a_i \, \bar{\mathbf{s}}_i \tag{2}$$

where $k$ represents the number of clustering centers, $a_i$ represents the area proportion of sub-region $i$, and $\bar{\mathbf{s}}_i$ represents the corresponding average spectrum of the sub-region. This also implies that allotting additional weights to the components $a_i \bar{\mathbf{s}}_i$ in the leaf ROI modifies the sensitivity of the modeling process to different leaf regions.
The spectrum obtained after the reweighting process is denoted as $\tilde{\mathbf{x}}$. The specific calculation is shown in Equation (3).

$$\tilde{\mathbf{x}} = \mathbf{w}^{\top} \operatorname{diag}(\mathbf{a}) \, \mathbf{S} = \sum_{i=1}^{k} w_i \, a_i \, \bar{\mathbf{s}}_i \tag{3}$$

Here, $w_i$ represents the optimized weight of each sub-region spectral component, with a range between 0 and 1, which is used to emphasize or ignore the spectral components of a specific leaf surface sub-region. Meanwhile, $\mathbf{w}$, as a vector composed of the modeling contribution weights of each sub-region of the leaf, has the same number of elements as $\mathbf{a}$ and as the number of rows in $\mathbf{S}$, the matrix whose rows are the sub-region average spectra $\bar{\mathbf{s}}_i$. This number is denoted by $k$ in the subsequent text, which stands for the number of clustering centers.
In Equation (3), the calculation result of $\operatorname{diag}(\mathbf{a}) \, \mathbf{S}$ is a feature matrix, where each row is the product of the average spectrum of a sub-region and its area proportion. The sum of all rows of this matrix therefore yields the leaf average spectrum. It should be noted that, as the number of cluster centers decreases or increases, the vector lengths of $\mathbf{a}$ and $\mathbf{w}$ decrease or increase accordingly. $\mathbf{a}$ is obtained by calculating the area proportion of sub-regions, that is, the ratio of the number of pixels assigned to each cluster to the total number of leaf pixels, while $\mathbf{w}$ needs to be optimized through subsequent methods. If the length of this vector is too small, the division of the leaf surface is not detailed enough; if it is too large, the subsequent optimization becomes difficult or prone to overfitting. Therefore, in this study, the range of the number of cluster centers is set to 2–16, that is, pixels are divided into at least 2 categories and at most 16 categories.
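The decomposition and reweighting described above can be checked numerically with a small NumPy sketch. The function name `component_matrix`, the toy spectra, and the weight values are illustrative assumptions; in the study the weights come from the genetic algorithm rather than being set by hand.

```python
import numpy as np

def component_matrix(pixels, labels, n_clusters):
    """Rows are (area proportion) x (mean spectrum) for each sub-region."""
    n_pixels, n_bands = pixels.shape
    comps = np.zeros((n_clusters, n_bands))
    for i in range(n_clusters):
        members = pixels[labels == i]
        if len(members):  # an empty cluster contributes a zero row
            comps[i] = (len(members) / n_pixels) * members.mean(axis=0)
    return comps

# Four toy pixel spectra with two bands, assigned to three sub-regions.
pixels = np.array([[0.2, 0.4], [0.4, 0.6], [0.9, 0.1], [0.5, 0.5]])
labels = np.array([0, 0, 1, 2])
comps = component_matrix(pixels, labels, n_clusters=3)

# With all weights equal to 1 the leaf average spectrum is recovered.
recovered_mean = np.ones(3) @ comps

# Hypothetical weights in [0, 1]: keep region 0, ignore region 1, halve region 2.
w = np.array([1.0, 0.0, 0.5])
reweighted = w @ comps
```

Here `recovered_mean` equals the plain pixel average `[0.5, 0.4]`, while `reweighted` is `[0.2125, 0.3125]`, which shows how the weights emphasize or suppress individual sub-regions.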
The reweighting procedure optimizes the accuracy of the partial least squares discriminant analysis (PLS-DA) model, with the weights updated via a strengthened elitist genetic algorithm. The optimization objective is defined as the cross-validation accuracy obtained by the PLS-DA model within the training set. After this optimization process is complete, the heatmap of the weight distribution on the leaf surface can be examined to analyze which local parts of the leaf retain more significant potassium content characteristics and contribute more to accurate discrimination.
The proposed method implements a five-step modeling framework, and the main technical route is shown in Figure 1:
Baseline PLS-DA optimization: Extracted whole-leaf average spectra and determined optimal component number through 5-fold cross-validation on the training set.
Pixel-level preprocessing: Processed all pixel spectra using both Multiplicative Scatter Correction (MSC) and Standard Normal Variate (SNV) transformations, while maintaining raw spectra as reference.
Spatial clustering: Performed Mini-Batch K-Means clustering (batch size = 1024) separately on raw, MSC-, and SNV-processed spectra, then calculated sub-region pixel proportions and cluster-specific mean spectra for each sample.
Weight optimization: Defined a cluster-dimensional weight vector and employed genetic algorithm optimization (fitness = CV accuracy) to determine optimal sub-region weights for PLS-DA modeling.
Model validation: Evaluated the weighted PLS-DA model’s generalization performance on the independent validation set using standard classification metrics.
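The weight optimization step in the list above can be illustrated with a generic elitist genetic algorithm. The sketch below is an assumption-laden toy: it uses binary tournament selection, uniform crossover, and Gaussian mutation, which may differ from the operators of the algorithm used in this study, and its fitness function is a stand-in (closeness to a hidden target vector) where the study uses the cross-validated PLS-DA accuracy on the reweighted spectra.

```python
import numpy as np

def elitist_ga(fitness, n_genes, pop_size=30, n_gen=60, mut_sigma=0.1, seed=0):
    """Maximize `fitness` over weight vectors in [0, 1]^n_genes, keeping the elite."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_genes))
    best, best_fit, history = None, -np.inf, []
    for _ in range(n_gen):
        fits = np.array([fitness(ind) for ind in pop])
        i = int(fits.argmax())
        if fits[i] > best_fit:
            best, best_fit = pop[i].copy(), float(fits[i])
        history.append(best_fit)
        # Binary tournament selection.
        a, b = rng.integers(0, pop_size, (2, pop_size))
        parents = np.where((fits[a] >= fits[b])[:, None], pop[a], pop[b])
        # Uniform crossover with randomly permuted mates.
        mates = parents[rng.permutation(pop_size)]
        take_parent = rng.random((pop_size, n_genes)) < 0.5
        children = np.where(take_parent, parents, mates)
        # Gaussian mutation, clipped back into the valid weight range.
        children = np.clip(children + rng.normal(0.0, mut_sigma, children.shape), 0.0, 1.0)
        children[0] = best  # elitism: the best individual always survives
        pop = children
    return best, best_fit, history

# Stand-in fitness (hypothetical): in the study this would be the CV accuracy.
target = np.array([0.9, 0.1, 0.5, 0.0])
def toy_fitness(w):
    return -float(((w - target) ** 2).sum())

best_w, best_fit, history = elitist_ga(toy_fitness, n_genes=4)
```

Because the elite individual is copied into every new generation, the best fitness recorded in `history` never decreases, which is the property that makes the elitist strategy attractive for a noisy objective such as cross-validation accuracy.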
The two methods used for comparison are also shown in Figure 1. The average spectral modeling method and the GLCM approach were implemented as follows: after mask extraction, the average spectra of rubber leaf samples were extracted, which served as the basis for modeling in the average spectral modeling method. For the GLCM-based approach, textural features were calculated for each band slice of all rubber leaf HSI datasets. Subsequently, the average spectra and textural features were concatenated, which provided the basis for establishing the spatial–spectral feature fusion model based on textural features.
2.5. Evaluation Metrics
In the weight optimization stage of this study, the cross-validation discrimination accuracy was used as the main optimization goal. In the validation stage, accuracy, macro-F1, macro-precision, and macro-recall [34,35] were adopted as the evaluation indicators, as shown in Equations (4)–(7).
The counts therein are obtained by treating each category as a yes/no (one-versus-rest) discrimination, which yields True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts for each category. In addition, N in the equations represents the total number of samples.
When calculating macro-F1, each category $c$ has a precision $P_c = \frac{TP_c}{TP_c + FP_c}$ and a recall $R_c = \frac{TP_c}{TP_c + FN_c}$, and $F1_c = \frac{2 P_c R_c}{P_c + R_c}$. Macro-F1 is then the unweighted mean over the $C$ categories, $\text{macro-F1} = \frac{1}{C} \sum_{c=1}^{C} F1_c$.
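As a worked illustration of these metrics, the following pure-Python sketch computes accuracy and the macro-averaged precision, recall, and F1 from one-versus-rest counts. The labels and predictions are made up for the example, and the convention of returning 0 when a denominator is zero is an assumption.

```python
def macro_metrics(y_true, y_pred, classes):
    """Accuracy plus macro-averaged precision, recall and F1 (one-vs-rest counts)."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Assumed convention: a metric with a zero denominator is reported as 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }

# Made-up example with the three potassium status categories.
y_true = ["Def", "Def", "Adq", "Adq", "Exc", "Exc"]
y_pred = ["Def", "Adq", "Adq", "Adq", "Exc", "Def"]
scores = macro_metrics(y_true, y_pred, classes=["Def", "Adq", "Exc"])
```

For this example the per-category F1 values are 0.5 (Def), 0.8 (Adq), and 2/3 (Exc), so macro-F1 is about 0.656, while accuracy is 4/6.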
3. Results
3.1. Modeling Results Under the Proposed Method
As presented in Table 3, the weight optimization via genetic algorithm yielded varying accuracies under different numbers of cluster centers, and the model achieved optimal classification performance when the number of cluster centers was set to 15. In Table 3, Table 4 and Table 5, the number of cluster centers is given for each model, the Accuracy CV column indicates the cross-validation accuracy in the training set, and the subsequent columns represent the accuracy evaluation metrics for the validation set.
Table 4 and Table 5 present the modeling results obtained using MSC or SNV preprocessing for pixel-level spectral pretreatment. Both techniques aim to address baseline drift and other artifacts induced by environmental sampling conditions. From the perspective of modeling accuracy alone, eliminating the influence of physical factors on spectral reflectance has a positive impact on the modeling results after sub-regional segmentation of leaf surfaces. The model based on pixel-level MSC preprocessing and the proposed method achieved a validation discrimination accuracy of 0.97.
In terms of the cross-validation discrimination accuracy in the training set shown in Table 3, Table 4 and Table 5, the results of weight optimization for leaf sub-regions exhibit a trend: as the degree of subdivision of the pixel spectral sets increases, the cross-validation accuracy of the model becomes higher. This indicates that, to achieve more accurate modeling for potassium diagnosis of rubber leaves, a sufficient degree of subdivision of sub-regions on the leaf surface based on leaf-scale HSI is required.
3.2. Comparative Analysis
To compare the results of the different modeling methods in this study, the accuracy results of the leaf average spectrum method, the texture feature fusion method, and the proposed method are compiled in Table 6. The screening of texture features is based on the ranking of Pearson correlation coefficients, from which the 50 best features are selected. Ultimately, the best feature in cross-validation is the “correlation” texture feature calculated from the 1500 nm band slice, with a pixel distance of 2 and an angle of 0°. In the table, (Pixel-Level) denotes that the preprocessing method was applied to the set of pixel spectra; (Leaf Average) denotes that the preprocessing was applied to leaf average spectra. TF denotes texture feature. The Accuracy CV column indicates the cross-validation accuracy in the training set, and the subsequent columns represent the accuracy evaluation metrics for the validation set.
The screening results of texture features are elaborated in subsequent paragraphs. Based on the results, however, even though these features show a certain degree of correlation, they are not guaranteed to provide sufficiently valuable information for the potassium diagnosis of rubber leaves. When combined with spectral features, they failed to improve modeling accuracy.
3.3. Differences in Sub-Region Spectral Characteristics of Leaves
The leaf surface can be divided into regions dynamically via clustering algorithms, and the segmentation model is trained on the pixel-level hyperspectral reflectance vectors of each local part of the leaf in rubber leaf HSI. To investigate the results of region division and the differences in spectral reflectance vectors among leaf sub-regions, the study visualized the sub-region segmentation results and the average spectra of sub-regions in rubber leaves, taking the optimal model in Table 6 as an example.
Figure 2 shows the results of sub-region segmentation on the leaf surface, i.e., the prediction results of the Mini-Batch K-means model. From the segmentation results, it can be observed that regions such as mesophyll, leaf edge, and leaf vein were accurately segmented. Unlike traditional digital image analysis methods (e.g., Local Binary Pattern, LBP), this segmentation depends only on the inherent characteristics of pixel spectra and does not require adjustment of relevant parameters.
Furthermore, the problem of baseline drift in pixel spectra has been alleviated via the pixel-level MSC method, and the spectral characteristics of each sub-region shown in Figure 2 also exhibit significant differences. In the wavelength range of 900–1200 nm, veins (Sub-region 5) show the highest reflectance, attributed to the absence of chlorophyll and dense cell arrangement that enhances light scattering. The mesophyll (Sub-region 1) exhibits intermediate reflectance, due to residual chlorophyll absorption and loose cell structure. In contrast, the leaf margin (Sub-region 2) displays the lowest reflectance, resulting from its thin cell layer and weak scattering ability.
Within the 1200–1500 nm range (a strong water absorption band), veins present the deepest reflectance valley, driven by abundant free water in xylem vessels that intensifies infrared absorption. The leaf margin shows a moderately deep valley, as its young cells contain a relatively high proportion of free water. The mesophyll has the shallowest valley, since most of its water exists as bound water (associated with proteins and chloroplasts), reducing absorption intensity.
Beyond 1500 nm, as water absorption diminishes, the leaf margin exhibits the steepest reflectance rise, and its low water content causes rapid dissipation of residual water absorption. The mesophyll shows an intermediate rise rate, while veins display the slowest rise, as their high free water content prolongs the residual effect of water absorption.
As shown in Figure 3, after applying the segmentation model with the optimal results in Table 6 and plotting the spectral reflectance curves of different regions for all samples, it can be observed that the differences in spectral characteristics vary in degree among regions. Subfigures (a), (b), and (c) in this figure correspond to Sub-regions 1, 2, and 5, as shown in Figure 2. By comparing the spectral characteristics of these sub-regions, it can be found that the leaf vein and leaf edge regions exhibit more significant differences under potassium-deficient conditions; compared with the mesophyll region, these differences are more easily captured by discriminant models. However, due to the small area proportion of these two regions (leaf vein and leaf edge), their prominent characteristics are diluted to a large extent in the average spectra. Meanwhile, under different potassium content levels, the spectral variation trends in the 900–1200 nm wavelength band are relatively similar among sub-regions, but distinct variation trends emerge after 1400 nm. This difference also indicates that the response patterns of water-related activities to potassium content levels differ among sub-regions.
3.4. Visualization of Leaf Sub-Region Weight
The study employed Mini-Batch K-means clustering to segment rubber leaf surfaces, followed by genetic algorithm-based optimization of sub-regional feature weights for modeling. Using the leaf surface segmentation module and the optimized weights of the optimal model in Table 6, the optimized weights of sub-regions on the leaf surface are presented in the form of a heatmap, as shown in Figure 4. Among samples with three potassium content levels, high weights are concentrated in the leaf edge areas, though there are still slight differences. In K-Def samples, this high-weight area is wider, while it is narrower in the other categories. Moreover, a low-weight (blue) region lies immediately adjacent to the high-weight (red) region, which constitutes a source of instability in manual point selection. Leaf sub-region segmentation incorporating clustering algorithms enables dynamic segmentation of sensitive regions, which provides a premise for more accurate modeling in potassium diagnosis of rubber leaves.
The output includes the weight optimization results under all cluster-number settings after MSC preprocessing of pixel spectra. As shown in Figure 5, the details of the leaf surface are gradually subdivided. Given that the genetic algorithm optimizes cross-validation accuracy, the features of leaf vein regions are weakened throughout the entire range of cluster numbers. As the number of cluster centers increases, leaf edges can be divided into separate regions and thus assigned higher weights to participate in modeling. This result is consistent with the problem of local feature dilution that the present method was proposed to address. Analysis based on the optimization results indicates that leaf edge regions, whose width and range cannot be directly determined, are important potassium-sensitive regions. Dynamic segmentation and higher weights are important prerequisites for accurate diagnosis. The visualization results also reveal how rubber leaves respond to potassium content. Judging from the weight distribution shown in the heatmap, the response pattern of rubber leaves has a certain degree of similarity with the response patterns of other crops cited in the introduction.
3.5. Results of Texture Feature Extraction and Fusion
After evaluating the 12,240 texture features using Pearson correlation coefficients and Spearman correlation coefficients, 100 features were selected to verify the impact of texture features participating in modeling on accuracy.
As shown in
Figure 6, superimposing the selected features with the average spectrum fails to enable the PLS-DA algorithm to build a model with higher accuracy.
Among the texture features, there are extensive features that have a correlation with potassium content. As shown in
Figure 7, among all the spectral bands, the water absorption band ranging from 1400 to 1500 nm can yield higher Pearson correlation coefficient results. In particular, the “contrast” and “correlation” features exhibit this characteristic. However, this result is only relative. The linear correlation represented by the Pearson coefficient is only around 0.2, which cannot indicate a significant correlation between the potassium content in rubber tree leaves and these features.
This evaluation also applies to the results of feature selection using the Spearman coefficient. As shown in
Figure 8, the best correlation in each band slice does not exceed 0.4, which means that the texture features cannot provide additional valuable features for model building.
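The correlation screening described in this subsection can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names, the rule of ranking features by the larger of |Pearson r| and |Spearman ρ|, and the array layout (samples in rows, texture features in columns) are all assumptions made for the example.

```python
import numpy as np

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)                # highest rank within each tie group
    mean_rank = last_rank - (counts - 1) / 2.0   # average rank of each tie group
    return mean_rank[inverse]

def pearson_per_feature(features, target):
    """Pearson r between every column of `features` and `target`."""
    fc = features - features.mean(axis=0)
    tc = target - target.mean()
    denom = np.sqrt((fc ** 2).sum(axis=0) * (tc ** 2).sum())
    # Constant columns have zero variance; report r = 0 for them.
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, fc.T @ tc / safe, 0.0)

def spearman_per_feature(features, target):
    """Spearman rho, computed as Pearson r on the ranks."""
    ranked = np.apply_along_axis(average_ranks, 0, features)
    return pearson_per_feature(ranked, average_ranks(target))

def select_top_features(features, target, n_keep):
    """Indices of the n_keep features with the largest |r| or |rho|.

    Combining the two coefficients by taking their maximum is an
    assumption of this sketch, not a rule stated in the study.
    """
    score = np.maximum(np.abs(pearson_per_feature(features, target)),
                       np.abs(spearman_per_feature(features, target)))
    return np.argsort(score)[::-1][:n_keep]
```

With a feature matrix of shape (number of leaves, 12,240) and a vector of measured potassium contents, `select_top_features(features, potassium, 100)` would return the indices of 100 candidate features under these assumptions.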
3.6. Discriminant Model and Regression Coefficient
The regression coefficient curves of the PLS-DA models were plotted to identify the spectral bands to which the models are sensitive. As shown in
Figure 9, all the models respond markedly across the water absorption bands, which is consistent with the relationship between potassium and water in crop leaves.
The water absorption bands are located near 950, 1150, 1450, 1950, and 2350 nm [
36]. The sensitive bands shown in the figure indicate that the PLS-DA models relied on characteristics related to leaf moisture content when judging the potassium content of rubber leaves.
The band near 1390 nm has been shown to be a key feature for effectively discriminating K content in crop leaves [
37]. The water content and structure of the leaf also affect the reflectance in this band, and both water balance and cellulose synthesis are influenced by the K level [
14,
38,
39].
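As a rough illustration of how the agreement between sensitive bands and water absorption bands could be checked numerically, the sketch below picks the wavelengths with the largest absolute regression coefficients and tests whether they lie near the band centers listed above. The function names and the 50 nm tolerance are hypothetical choices made for the example; the study itself reports this agreement by inspecting the coefficient curves.

```python
import numpy as np

# Water absorption band centers (nm) as listed in the text.
WATER_BANDS_NM = (950, 1150, 1450, 1950, 2350)

def top_coefficient_bands(wavelengths, coefficients, n_top=10):
    """Wavelengths of the n_top largest absolute regression coefficients."""
    order = np.argsort(np.abs(coefficients))[::-1][:n_top]
    return np.asarray(wavelengths)[order]

def near_water_band(wavelengths, tolerance_nm=50.0):
    """Boolean mask: True where a wavelength lies within tolerance_nm
    of any water absorption band center (the tolerance is an assumption)."""
    centers = np.asarray(WATER_BANDS_NM, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    distance = np.abs(wl[:, None] - centers[None, :])
    return distance.min(axis=1) <= tolerance_nm
```

The fraction `near_water_band(top_coefficient_bands(wl, coef)).mean()` then gives a simple summary of how many of the most influential bands fall inside water absorption regions.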
4. Discussion
4.1. Spectral Characteristics of Rubber Leaf Surface Pixels
In rubber leaves and broadly across crop species, leaf surfaces exhibit intricate functional zonation, coupled with spatially heterogeneous nutrient distribution patterns. This richness and complexity are directly observable at the pixel-spectral level. As demonstrated in
Figure 10a, the reflectance spectra vectors extracted from HSI data of rubber leaf surfaces exhibit remarkable diversity, reflecting the underlying biochemical and structural heterogeneity. Furthermore, as demonstrated in
Figure 10b, the distribution of these features in the three-dimensional linear mapping space exhibits non-Gaussian characteristics. With the application of high-resolution spectral acquisition technology, the rich structural and biochemical features within leaves are comprehensively captured and manifested as discriminative spectral signatures. These signatures serve as the fundamental basis for sub-regional segmentation of leaf surfaces in this study.
4.2. Sub-Regional Weight Optimization and Leaf Potassium Response Mechanisms
As shown in
Figure 4, the heatmap shows that the model tends to strengthen the spectra at the edge of the leaf when discriminating the potassium content level of the leaf. Lin et al. showed that reducing solute K provision may limit lateral cell expansion, and a similar response may also occur in the rubber leaves used in this study [
15]. Hu et al. showed that the potassium content level affects the morphological characteristics of leaves. In their experimental results, an increase in potassium content was accompanied by an increase in leaf area and a decrease in leaf thickness [
14]. Rawat et al. noted that K-deficient plants show symptoms such as brown or yellow edges along their leaves [
13]. These findings suggest that, after the weights are optimized, the model emphasizes the key features more strongly.
4.3. Rationality of Modeling
Potassium diagnosis in crop leaves usually fails to achieve the same accuracy as nitrogen diagnosis, as potassium in leaves does not directly affect the spectral response information of crop leaves [
40]. The spatial distribution of sensitive regions on the surface of crop leaves has been well established in previous work, which supports the rationality of the visualization results of sub-regional optimized weights in this study.
Figure 9, which displays the regression coefficient curves, shows that the models in this study are sensitive across a wide range of water absorption bands. This implies that, in rubber leaves, potassium diagnosis is closely related to leaf water content, which is consistent with the relationship between potassium and water in most crops and is reflected in the spectral characteristics. Potassium also affects the synthesis of lignin and cellulose in crop leaves; therefore, the high correlation observed in the 1400–1500 nm [
41] range also supports the rationality of the modeling results in this study.
Through feature selection, Yu et al. and Wang et al. identified characteristic spectral bands of potassium in crop leaves at around 950, 1150, and 1450 nm [
42,
43], which also supports the rationality of the modeling in this study.
4.4. Comparison of Research Methods
Compared with LBP and with methods that divide leaf sub-regions manually, the method proposed in this study segments leaf sub-regions based on the inherent characteristics of pixel spectra in HSI. This approach is more dynamic and avoids the labor cost of manual division. Leaf edges alone could be segmented simply via erosion, a standard operation in digital image processing. However, as the visualization results of this study show, the highly sensitive edge regions are uncertain in extent and irregular in shape. Therefore, the segmentation–optimization–modeling framework proposed in this study is conducive to higher-precision modeling of leaf-scale HSI.
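For comparison, the fixed-width edge segmentation by erosion mentioned above can be sketched in a few lines. This is an illustrative baseline only, assuming a binary leaf mask and a 3 × 3 square structuring element; the function names and the `width` parameter are hypothetical. It also makes the limitation concrete: the edge band has the same width everywhere, whereas the sensitive edge regions found in this study are irregular.

```python
import numpy as np

def erode(mask, iterations=1):
    """Binary erosion with a 3x3 square structuring element."""
    out = mask.astype(bool)
    height, width = out.shape
    for _ in range(iterations):
        # Pad with background so pixels on the image border are eroded too.
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        eroded = np.ones_like(out)
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                eroded &= padded[dy:dy + height, dx:dx + width]
        out = eroded
    return out

def edge_band(mask, width):
    """Leaf pixels removed by `width` erosions, i.e. a fixed-width edge band."""
    mask = mask.astype(bool)
    return mask & ~erode(mask, iterations=width)
```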
By segmenting leaf-scale HSI into regions, this study separates out a new optimizable dimension from the mainstream leaf average spectrum, namely the weight with which each sub-region participates in modeling. As the degree of subdivision increases, the weight vector to be optimized becomes longer. When the segmentation is too coarse, the features of key sub-regions cannot be highlighted; when it is too fine, the weight vector becomes excessively long, which makes optimization by the genetic algorithm more difficult. Therefore, for the regional segmentation of rubber leaf HSI, approximately 10 cluster centers are sufficient to achieve adequately complete and detailed segmentation.
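The role of the weight vector can be illustrated with a small sketch. It assumes, purely for illustration, that the leaf-level feature is a normalized weighted sum of sub-region mean spectra; the exact combination rule used in the study is not restated here, and the function names are hypothetical. Under this assumption, weights proportional to the pixel counts of the sub-regions reproduce the conventional leaf average spectrum, so average spectral modeling appears as one special case of the weighted scheme.

```python
import numpy as np

def subregion_means(pixel_spectra, labels, n_regions):
    """Mean spectrum and pixel count of each sub-region.

    pixel_spectra: array of shape (n_pixels, n_bands)
    labels: integer cluster label (0 .. n_regions - 1) of each pixel
    """
    counts = np.bincount(labels, minlength=n_regions)
    means = np.zeros((n_regions, pixel_spectra.shape[1]))
    for region in range(n_regions):
        if counts[region] > 0:
            means[region] = pixel_spectra[labels == region].mean(axis=0)
    return means, counts

def weighted_leaf_spectrum(pixel_spectra, labels, weights):
    """Normalized weighted sum of sub-region mean spectra (assumed rule)."""
    weights = np.asarray(weights, dtype=float)
    means, counts = subregion_means(pixel_spectra, labels, len(weights))
    # Sub-regions with no pixels on this leaf contribute nothing.
    effective = np.where(counts > 0, weights, 0.0)
    total = effective.sum()
    if total <= 0:
        raise ValueError("weights must be positive for at least one non-empty sub-region")
    return (effective / total) @ means
```

The length of `weights` equals the number of cluster centers, which is why a finer segmentation enlarges the search space that the optimizer has to explore.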
4.5. Limitations and Future Research Directions
This study focuses on potassium diagnosis through the fusion of spatial–spectral features for leaf-scale rubber leaf HSI. In the HSI acquisition system adopted in this study, samples are placed on the mobile platform in their natural state. The natural curling and tilting of leaf samples may affect spectral feature acquisition and, in turn, the outcomes of leaf region segmentation. Jin et al. developed a handheld HSI acquisition device [
44], whose design enables the collection of crop leaf samples under fully vertical conditions, thereby providing more stable HSI outputs for the method proposed in this study.
Potassium diagnosis is crucial to the natural rubber industry. According to the diagnostic technical regulations, there are still two undefined potassium content intervals. The potassium characteristics in these intervals are not sufficiently prominent, so sufficient accuracy remains difficult to achieve when building classification or regression models. In subsequent studies, achieving high-precision modeling over the complete potassium content range will require methods for the co-optimization of the spatial–spectral feature space. Specifically, feasible methods need to be developed to screen different combinations of key characteristic bands in different sub-regions of the leaf; by making the models more compact in this way, more robust and widely applicable modeling results can be obtained.
5. Conclusions
This study addresses the issue where mainstream average spectral modeling ignores spatial information in hyperspectral imaging for potassium content diagnosis in rubber leaves, and proposes a leaf-scale spatial–spectral feature fusion method based on sub-region segmentation to improve modeling accuracy.
By fully considering the objectively existing differences in spectral characteristics among various local regions on the leaf surface, the method dynamically segments sub-regions on the leaf surface via a clustering algorithm and optimizes the modeling weights of each sub-region using a genetic algorithm. Experiments show that its performance is significantly superior to traditional average spectral modeling and texture feature fusion methods. In particular, after applying MSC to pixel-level spectra, when the leaf pixel spectra are clustered into nine subsets, the accuracy, precision, macro-F1, and macro-recall for potassium content diagnosis all reach 0.97, which is much higher than the 0.87 achieved by average spectral modeling.
Visualization of sub-region weights reveals that enhancing the modeling contribution of leaf edge regions can improve diagnostic accuracy, which is consistent with the response pattern of potassium in leaves, confirming the rationality of this spatial–spectral fusion strategy.
In summary, this study provides an effective technical approach for accurate non-destructive diagnosis of potassium content in rubber leaves, which is of positive significance for scientific nutrient management in rubber plantations and the sustainable development of the natural rubber industry. Its framework can also serve as a reference for nutrient diagnosis in other crops.