Using Field Spectroradiometer to Estimate the Leaf N/P Ratio of Mixed Forest in a Karst Area of Southern China: A Combined Model to Overcome Overfitting

The ratio of nitrogen to phosphorus (N/P) in plant leaves has been widely used to assess nutrient availability. However, it is challenging to rapidly and accurately estimate the leaf N/P ratio, especially for mixed forest. In this study, we collected 301 samples from nine typical karst areas in Guangxi Province during the growing seasons of 2018 to 2020. We then used five models (partial least squares regression (PLSR), backpropagation neural network (BPNN), generalized regression neural network (GRNN), PLSR+BPNN, and PLSR+GRNN) to estimate the leaf N/P ratio of plants based on these samples. We also applied fractional differentiation to extract additional information from the original spectra of each sample. The results showed that the average leaf N/P ratio of plants was 17.97 and that plant growth was primarily limited by phosphorus in these karst areas. The spectra sensitive to the leaf N/P ratio had wavelengths ranging from 400–730 nm. Considering both accuracy and robustness, the prediction capabilities of the five models can be ranked in descending order as PLSR+GRNN, PLSR+BPNN, PLSR, GRNN, and BPNN. The PLSR+GRNN model yielded a high R² and ratio of performance to deviation (RPD) and a low root mean squared error (RMSE), with values of 0.91, 3.15, and 1.98, respectively, for the training set and 0.81, 2.25, and 2.46, respectively, for the validation set. Compared with the PLSR model, both the PLSR+BPNN and PLSR+GRNN models had higher accuracy and were more stable. Moreover, both combined models overcame the issue of overfitting, which occurs when a single model is used to predict the leaf N/P ratio. Therefore, both the PLSR+BPNN and PLSR+GRNN models can be used to predict the leaf N/P ratio of plants in karst areas. Fractional differentiation is a promising spectral preprocessing technique that can improve the accuracy of models. We conclude that the leaf N/P ratio of mixed forest can be effectively estimated using combined models based on field spectroradiometer data in karst areas.


Introduction
Nitrogen (N) and phosphorus (P) are crucial functional elements for organisms [1] and play a vital role in plant physiological processes [2–4]. Changes in N and P concentrations are important for plant growth, as they are closely related to photosynthesis, respiration, N₂ fixation, and organic matter mineralization [5]. N and P are the primary limiting nutrients for plant growth in most natural systems. An N/P ratio greater than 16 indicates that plant growth is limited by P, while an N/P ratio of less than 14 indicates that it is limited by N. Values between 14 and 16 suggest either N or P can be limiting, or plant growth is [...] meaning that overfitting is a critical issue that has to be considered in the estimation of biochemical parameters. It has been shown that increasing the number of samples, cross-validation [20], regularization [19], noise removal [21], and integration of multiple models [22] are effective ways to overcome overfitting [23,24]. Therefore, combining linear and nonlinear models may be a way to reduce overfitting.
Spectral differentiation transform methods play an essential role in estimating plant parameters, and the most commonly used are spectral first-order and second-order differentiation [25]. It has been demonstrated that differential transformation of spectra can improve model performance for estimating plant water [10], chlorophyll [9], and N content [17,26]. However, the application of integer-order differentiation is not always sufficient, as the spectral curve shifts in shape from n-order to n + 1-order in a sharply fluctuating way, and there is no smooth transition between the intermediate stages. Fractional order transform methods allow differentiation from zero to arbitrary real numbers [27]. Using fractional order differentiation transforms spectra more continuously and produces more detailed information about the spectra.
The prediction of leaf N/P ratio of a particular species by field spectroradiometer has been reported [8,28], but directly estimating the leaf N/P ratio of all plants worldwide is difficult to achieve with current methods. Therefore, improving model performance and applying these methods to estimate leaf N/P ratio of regional mixed-species ecosystems is a pressing issue. Karst landscapes are one of the most crucial landform types globally, accounting for roughly 15% of the Earth's total land area and inhabited by about 1 billion people [29]. Southwestern China has the largest karst area on the planet [30] and is one of 25 global biodiversity hotspots that contain many endemic and threatened species [31]. The karst areas of Guangxi Province are an important part of the southwest karst region, containing diverse landscape types, a wide variety of plants, and representing a key area for biodiversity conservation. However, anthropogenic disturbances have led to species loss in karst areas [32]. Non-destructive and rapid estimation of leaf N/P ratio of plants is required for ecological restoration and conservation in karst areas.
In this paper, we estimated the leaf N/P ratio of mixed forest in karst areas of Guangxi Province using fractional spectral differentiation and multiple models, combined with field spectroradiometer data. The primary objectives of this study are to (1) explore the predictive capabilities of a linear regression model (partial least squares regression, PLSR) and nonlinear regression models (backpropagation neural network (BPNN) and generalized regression neural network (GRNN)) to estimate the leaf N/P ratio of mixed species based on field spectroradiometer data, (2) evaluate the contribution of fractional differentiation to improving the performance of these linear and nonlinear regression models, and (3) propose the best models that can overcome the insufficient accuracy of the PLSR model and the overfitting of the BPNN and GRNN models when estimating the leaf N/P ratio of plants in karst areas.

Study Area
The study area is located in Guangxi Province (20°54′–26°24′N, 104°28′–112°04′E) in south China (Figure 1). The elevation of the study area ranges from 0 to 2141.50 m. This area borders the South China Sea and has a tropical and subtropical climate. The average annual temperature ranges from 17.30 to 23.80 °C, and annual precipitation ranges from 1024.60 to 2358.60 mm. Karst landforms are distributed widely throughout Guangxi and cover about 97,000 km², accounting for 41% of the total area of the province. There are more than 4000 species of vascular plants, including more than 2000 species of medicinal plants, in the karst areas of Guangxi Province [33]. Nine typical karst experimental plots were selected, containing primary forests, secondary forests, and shrubs that represent the vegetation succession in karst areas. The area of each plot was about 200 m². Detailed information on each plot is given in Table 1.

Data Collection
The leaves were sampled from July 2018 to September 2020. In each experimental plot, leaf samples from 8–15 plants of locally dominant species were collected. In total, the database includes 301 samples covering 37 families, 59 genera, and 70 species. As plants are susceptible to light conditions [34], we collected leaves from three directions for each plant (0–120°, 120–240°, and 240–360°, with 0° due north) to reduce random errors in samples.
Remote Sens. 2021, 13, 3368
Leaf spectral reflectance was measured on attached leaves using a spectroradiometer (FieldSpec 4, Analytical Spectral Devices, ASD, Boulder, CO, USA), with a spectral resolution of 3 nm in the visible and near-infrared (NIR, 350–1000 nm) and 8 nm in the shortwave-infrared (SWIR, 1000–2500 nm) [14]. White reference plate calibration was performed every 10 min during the measurement. Three branches of each tree were selected for measurement. As the instrument battery only supports about 4 h of continuous operation in the field, only two mature and healthy leaves per branch were scanned. Finally, the scanned spectral reflectance values of all leaves on each branch were arithmetically averaged, and the average value was taken as the spectral sample of each tree.
After the spectral measurements, the healthy and mature leaves on the branches were collected. They were kept intact in self-sealing bags and immediately placed in an incubator (ICERSICE940). Leaf samples were transported back to the laboratory within 24 h and dried at 75 °C. The dry samples were passed entirely through a 100-mesh sieve for physicochemical analysis. Finally, the total nitrogen (TN) content of plant leaves was measured using the Kjeldahl method [35], and the total phosphorus (TP) content of plant leaves was measured by the phosphomolybdate blue spectrophotometry method [36]. The ratio of TN to TP was taken as the leaf N/P ratio.

Fractional Differentiation (FD)
Fractional differentiation extends integer-order differentiation to arbitrary orders [37] and is widely applied in electromagnetic field theory, control systems, nonlinear dynamics, biomedicine, and digital signal processing [38]. The most common definitions of the fractional derivative are the Riemann–Liouville, Grünwald–Letnikov, and Caputo forms [25,35]. The Grünwald–Letnikov form is a finite-difference expression:

$$d^{v} f(x) = \lim_{h \to 0} \frac{1}{h^{v}} \sum_{m=0}^{\left[\frac{t-a}{h}\right]} (-1)^{m} \frac{\Gamma(v+1)}{m!\,\Gamma(v-m+1)} f(x - mh)$$

where v is the order of differentiation, h is the step size, and t and a are the upper and lower limits of differentiation, respectively. Γ(·) is the Gamma function:

$$\Gamma(\beta) = \int_{0}^{\infty} e^{-u} u^{\beta - 1} \, du$$

where β is an arbitrary variable (we defined it as the order of differentiation in this study). In this paper, the plant leaf spectra were differentiated for orders from 0 to 3 (at an interval of 0.1 order). The integer orders are the zero, first, second, and third orders, while the other values are fractional orders.
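As a minimal sketch of how the Grünwald–Letnikov expression can be evaluated on a sampled spectrum, the following Python code truncates the sum at the first band and computes the coefficients by recurrence. The function names, the default step of 1 nm, and the example reflectance values are assumptions made for illustration and are not taken from the original processing chain.

```python
def gl_coefficients(order, n_terms):
    """Return c_0..c_{n_terms-1}, with c_m = (-1)^m * Gamma(v+1) / (m! * Gamma(v-m+1))."""
    coeffs = [1.0]
    for m in range(1, n_terms):
        # The recurrence avoids evaluating Gamma at its poles for integer orders.
        coeffs.append(coeffs[-1] * (m - 1 - order) / m)
    return coeffs


def gl_derivative(reflectance, order, step=1.0):
    """Truncated Grünwald-Letnikov derivative of a spectrum sampled every `step` nm."""
    coeffs = gl_coefficients(order, len(reflectance))
    scale = step ** order
    result = []
    for n in range(len(reflectance)):
        # Sum over the current band and every shorter-wavelength band before it.
        total = sum(coeffs[m] * reflectance[n - m] for m in range(n + 1))
        result.append(total / scale)
    return result


# Orders 0.0, 0.1, ..., 3.0, matching the interval described in the text.
orders = [round(0.1 * i, 1) for i in range(31)]
spectrum = [1.0, 4.0, 9.0, 16.0]  # hypothetical reflectance values
derivatives = {v: gl_derivative(spectrum, v) for v in orders}
```

With this truncation, order 0 returns the input spectrum and order 1 returns backward differences; the first band has no predecessor, so it keeps its raw value.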

Partial Least Squares Regression (PLSR)
The partial least squares regression (PLSR) model (Höskuldsson 1988) combines the merits of principal component analysis, canonical correlation analysis, and multiple linear regression. Suppose that the sample size is n and that the data sets for the independent and dependent variables are $Z = [z_1, z_2, \cdots, z_k]_{n \times k}$ and $Q = [q]_{n \times 1}$, respectively. The first component $f_1$ is extracted from Z; $f_1$ is a linear combination of $z_1, z_2, \cdots, z_k$ that carries the maximum variance information in Z and reaches the maximum correlation with q. If the accuracy of the model is satisfactory, the component extraction stops. Otherwise, the next principal component is extracted until the requirement is satisfied. The resulting model can be written as

$$y = a_1 f_1 + a_2 f_2 + \cdots + a_m f_m$$

$$f_i = w_{i1} z_1 + w_{i2} z_2 + \cdots + w_{ik} z_k, \quad i = 1, \ldots, m$$

where y is the estimate of q, m is the number of principal components, k is the number of independent variables, a is the regression coefficient of y on f, and w is the linear coefficient of f on z. In this study, the fractional differentiation spectral reflectance of each order that had a significant correlation (p < 0.05) with the leaf N/P ratio of karst plants was used as the independent variable Z. The same variable selection was used for the BPNN and GRNN models described later.
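The component extraction described above can be sketched with the NIPALS algorithm for a single response variable. This is one common way to fit PLSR and is an assumption here, since the text does not state which algorithm or software routine was used; the function name `pls1_fit` and the example data are hypothetical.

```python
def pls1_fit(Z, q, n_components):
    """Extract PLS components of Z for a single response q (NIPALS sketch)."""
    n, k = len(Z), len(Z[0])
    z_mean = [sum(row[j] for row in Z) / n for j in range(k)]
    q_mean = sum(q) / n
    X = [[row[j] - z_mean[j] for j in range(k)] for row in Z]  # centred predictors
    y = [value - q_mean for value in q]                        # centred response
    scores, coefs = [], []
    for _ in range(n_components):
        # Weight vector: direction in Z with maximal covariance with q.
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(k)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        # Component scores f and their squared length.
        f = [sum(X[i][j] * w[j] for j in range(k)) for i in range(n)]
        ff = sum(v * v for v in f)
        # Loadings of Z on f and regression coefficient of q on f.
        p = [sum(X[i][j] * f[i] for i in range(n)) / ff for j in range(k)]
        a = sum(y[i] * f[i] for i in range(n)) / ff
        # Deflate Z and q so the next component explains what is left.
        X = [[X[i][j] - f[i] * p[j] for j in range(k)] for i in range(n)]
        y = [y[i] - a * f[i] for i in range(n)]
        scores.append(f)
        coefs.append(a)
    fitted = [q_mean + sum(coefs[c] * scores[c][i] for c in range(n_components))
              for i in range(n)]
    return scores, coefs, fitted


# Hypothetical data: five samples, two predictors, exactly linear response.
Z_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0]]
q_demo = [2.0 * z1 + 3.0 * z2 for z1, z2 in Z_demo]
scores_demo, coefs_demo, fitted_demo = pls1_fit(Z_demo, q_demo, 2)
```

The sketch returns the component scores, so the same scores could be reused as low-dimensional inputs to another model. It does not guard against requesting more components than the rank of Z.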

Back Propagation Neural Network (BPNN)
Back Propagation Neural Network (BPNN) is a multilayer feed-forward neural network [39]. The key traits of this method are the forward transmission of signals and the backpropagation of errors [39]. During forward transmission, the input signal is transferred from the input layer through the hidden layer and is then output. The neuron state of each layer only affects the neuron state of the next layer. If the output layer does not return the expected results, it transfers to backpropagation. The weights and thresholds of the network are adjusted according to the prediction error, resulting in the predicted output of the BPNN continuously approximating the expected results.
In this study, we built a neural network with one hidden layer using the tansig transfer function and an output layer using the purelin transfer function. The number of nodes in the hidden layer significantly impacts the output result [40], so 5-fold cross-validation was used to select the optimal number of hidden layer nodes (from 4 to 20). To reduce run-to-run fluctuations of the neural network, we used the arithmetic mean of 10 consecutive runs of the BPNN model as the final result. This averaging procedure was also applied to the GRNN, PLSR+BPNN, and PLSR+GRNN models.
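The network structure described above (one tansig hidden layer feeding a purelin output) can be illustrated with a forward pass in plain Python. This is only a structural sketch: it omits training by backpropagation, and the weight initialization, dictionary layout, and function names are assumptions for illustration.

```python
import math
import random


def tansig(x):
    # MATLAB's tansig is mathematically equivalent to the hyperbolic tangent.
    return math.tanh(x)


def purelin(x):
    # Linear (identity) transfer function for the output layer.
    return x


def init_network(n_inputs, n_hidden, seed=0):
    """Random weights for an n_inputs -> n_hidden -> 1 network (hypothetical scheme)."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


def forward(net, x):
    """Forward transmission: input -> tansig hidden layer -> purelin output."""
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w_hidden"], net["b_hidden"])]
    return purelin(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])


# Candidate hidden-layer sizes to compare by cross-validation, as in the text.
hidden_sizes = list(range(4, 21))
```

In a full workflow, each candidate size in `hidden_sizes` would be trained and scored on held-out folds, and the predictions of repeated runs would be averaged.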

Generalized Regression Neural Network (GRNN)
Generalized regression neural network (GRNN) is a radial basis function neural network model proposed by Specht (1991) [41]. GRNN essentially derives the maximum probability estimate from the training data, which can be considered as an arbitrary function between input and output vectors. Unlike BPNN, GRNN does not require an iterative training procedure, making it significantly faster than BPNN in terms of computational efficiency [42]. It also displays strong prediction capability for nonlinear estimation. The prediction function can be expressed as:

$$\hat{Y}(X) = \frac{\sum_{i=1}^{j} Y_i \exp\left(-\frac{D_i^2}{2\delta^2}\right)}{\sum_{i=1}^{j} \exp\left(-\frac{D_i^2}{2\delta^2}\right)}, \qquad D_i^2 = (X - X_i)^{T}(X - X_i)$$

where j is the number of training samples, δ is the smoothing factor, X is the network input variable, and $X_i$ is the learning sample corresponding to the ith neuron. The weight of each observation $Y_i$ is determined by the squared Euclidean distance $D_i^2$ between the corresponding sample $X_i$ and the input variable X. The smoothing factor δ has a significant effect on the model. In this study, 5-fold cross-validation was used to identify the value of δ.
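A GRNN prediction is a kernel-weighted average of the training targets, which can be sketched in a few lines of Python together with a simple 5-fold search for the smoothing factor. The fold assignment (every fifth sample), the mean squared error criterion, and the candidate values are assumptions for illustration; the text does not specify these details.

```python
import math


def grnn_predict(x, train_x, train_y, delta):
    """Kernel-weighted average of training targets for one input vector x."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_x]
    # Subtracting the smallest distance leaves the ratio unchanged but
    # prevents every weight from underflowing to zero for small delta.
    d2_min = min(d2)
    weights = [math.exp(-(d - d2_min) / (2.0 * delta ** 2)) for d in d2]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


def select_delta(xs, ys, candidates, n_folds=5):
    """Pick the candidate delta with the lowest cross-validated squared error."""
    best_delta, best_mse = None, float("inf")
    for delta in candidates:
        squared_error = 0.0
        for fold in range(n_folds):
            held_out = list(range(fold, len(xs), n_folds))
            held = set(held_out)
            tr_x = [x for i, x in enumerate(xs) if i not in held]
            tr_y = [y for i, y in enumerate(ys) if i not in held]
            for i in held_out:
                squared_error += (grnn_predict(xs[i], tr_x, tr_y, delta) - ys[i]) ** 2
        mse = squared_error / len(xs)
        if mse < best_mse:
            best_delta, best_mse = delta, mse
    return best_delta


# Hypothetical one-dimensional example.
xs_demo = [[i / 10.0] for i in range(20)]
ys_demo = [2.0 * x[0] for x in xs_demo]
delta_demo = select_delta(xs_demo, ys_demo, [0.05, 10.0])
```

A very small δ makes the prediction follow the nearest training sample, while a very large δ pulls every prediction toward the mean of the training targets, which is why δ needs to be tuned.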

Combined Models, Sample Segmentation, and Accuracy Assessment
The principal components extracted from PLSR were used as input variables for the BPNN and GRNN models to overcome the overfitting problem of artificial neural network models. The purpose of extracting the principal components is to reduce the dimensionality of the spectral data, thus reducing the complexity of the BPNN and GRNN models. Therefore, the number of principal components should not be too large. The number of input variables can influence the structure and performance of the BPNN and GRNN models. We found that, for most of the fractional differential spectra, six principal components represent more than 60% of the variability. To better compare the performance of different fractional differential spectra, we set the number of extracted principal components to six. The performance of the combined PLSR+BPNN and PLSR+GRNN models was then compared against that of PLSR, BPNN, and GRNN.
The field samples were randomly split into two datasets using the randperm function in MATLAB R2020a, with the training dataset accounting for 3/4 and the validation dataset for 1/4 of the total samples (Table 2). The accuracy of the model was assessed using the coefficient of determination (R²), root mean squared error (RMSE), and the ratio of performance to deviation (RPD) [15]. An RPD greater than or equal to 2 indicates that the model has excellent predictive ability, while an RPD of at least 1.4 but less than 2 indicates that the model can make a rough estimate of the sample. An RPD of less than 1.4 indicates that the model cannot predict the sample [43].
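The split and the three accuracy measures can be sketched as follows. This is an illustration under stated assumptions: Python's `random.shuffle` stands in for MATLAB's randperm, R² is computed as one minus the ratio of residual to total sum of squares, and RPD uses the sample standard deviation of the observations; the original computation may differ in these details. All function names and the category labels are hypothetical.

```python
import math
import random
import statistics


def split_samples(n_samples, train_fraction=0.75, seed=0):
    """Randomly permute sample indices and split them into training and validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def rmse(observed, predicted):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    # Ratio of the standard deviation of the observations to the RMSE.
    return statistics.stdev(observed) / rmse(observed, predicted)


def interpret_rpd(value):
    """Map an RPD value to the three categories described in the text."""
    if value >= 2.0:
        return "excellent"
    if value >= 1.4:
        return "rough estimate"
    return "not predictive"


train_idx, val_idx = split_samples(301)
```

Because training and validation metrics are computed with the same functions, a large gap between the two R² values is a direct, comparable indicator of overfitting.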

Leaf N/P Ratio, Fractional Differentiation of Reflectance, and Their Correlation
The mean values of leaf TN, TP, and N/P ratio are 18.51 mg/g, 1.17 mg/g, and 17.97, respectively (Figure 2). The TN content in our study is slightly lower than the global and continental levels of 20.1 mg/g [44] and 20.20 mg/g [45], respectively. The TP content is significantly lower than the global average of 1.80 mg/g [44] or 1.99 mg/g [46]. The leaf N/P ratio from our study is similar to the results of Yang et al. [47]. The minimum and maximum values of the N/P ratio are 1.34 and 36.94, respectively, with a coefficient of variation of 33.68%.
Figure 2. The leaf N/P ratio frequency distribution. TN-mean, TP-mean, and N/P-mean are the mean values of total nitrogen, total phosphorus, and N/P ratio, respectively. N/P-minimum, N/P-maximum, N/P-std, N/P-cv, and N/P-ks are the N/P ratio minimum, maximum, standard deviation, coefficient of variation, and one-sample Kolmogorov-Smirnov test, respectively. n is the number of samples.
Figure 3 shows the variation of the differential spectral reflectance from 0 to 3 orders. The shape of the spectral curves transitions smoothly between orders. Compared to integer differentiation (FD (0.0), FD (1.0), FD (2.0), and FD (3.0)), fractional differentiation methods produce more detailed information about the spectrum, and these data subsequently allow for more complex leaf N/P ratio inversion training methods.
Figure 3. Fractional differential curves of plant leaf spectral reflectance (average of all the collected samples). Where curves overlap, later curves are drawn over earlier ones.
The leaf N/P ratio of plants displays a significant correlation with the fractional differential spectra for wavelengths ranging from 400–730 nm (Figure 4). As the differentiation order increases from FD (0) to FD (3), the maximum absolute value of the correlation coefficient displays a unimodal distribution, with a peak value of 0.44 at FD (1.6).
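The band-wise correlation analysis can be sketched as a Pearson correlation between the leaf N/P ratio and the reflectance at each band, followed by a search for the band with the largest absolute coefficient. The function names and example values are hypothetical, and the significance test (p < 0.05) used for band selection is not reproduced here.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def band_correlations(spectra, target):
    """Correlate each band (column of `spectra`) with the target variable."""
    n_bands = len(spectra[0])
    return [pearson([sample[b] for sample in spectra], target) for b in range(n_bands)]


def most_sensitive_band(spectra, target):
    """Return the index and coefficient of the band with the largest |r|."""
    r = band_correlations(spectra, target)
    best = max(range(len(r)), key=lambda b: abs(r[b]))
    return best, r[best]


# Hypothetical data: four samples, three bands.
spectra_demo = [[1.0, 5.0, 2.0], [2.0, 3.0, 9.0], [3.0, 4.0, 4.0], [4.0, 1.0, 7.0]]
np_ratio_demo = [1.0, 2.0, 3.0, 4.0]
best_band, best_r = most_sensitive_band(spectra_demo, np_ratio_demo)
```

Repeating this search for each differentiation order would give the maximum absolute correlation as a function of order. A band with constant reflectance would need to be excluded first, since its correlation is undefined.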

Performance of a Single Model Using Fractional Differentiation of Reflectance
The accuracy of the PLSR model for predicting the leaf N/P ratio of plants continuously increases with increasing fractional order for training sets but displays a unimodal distribution for validation sets (Figure 5a), with the highest R² value of 0.66 at FD (2.1). The prediction capability of the PLSR model gradually improves as the fractional order increases from FD (0.6) to FD (2.1). However, the PLSR model was more accurate for the training sets than for the validation sets, especially after FD (2.1). This finding suggests that overfitting of the PLSR model is an issue when using this method to predict the leaf N/P ratio of plants.
The accuracy of the BPNN model in predicting the leaf N/P ratio of plants increases with increasing orders of fractional differentiation. The highest R² value for this method is 0.48 for validation sets and 0.92 for training sets around FD (1.1), with values remaining stable across higher fractional orders (Figure 5b). However, the overfitting problem is still present in the BPNN model when predicting the leaf N/P ratio of plants, as shown by the large differences in R² between the training and validation sets.
The R² of the GRNN model displays a generally increasing trend as the fractional differentiation order increases from FD (0) to FD (1.2) for training sets and then remains stable (Figure 5c). The R² value for the validation set increases with increasing fractional order from FD (0.0) to FD (1.7). However, similar to the BPNN model, the overfitting problem still exists in the GRNN model, as shown by the differences in R² between the training and validation sets.

Performance of Combined Models Using Fractional Differentiation of Reflectance
To overcome the overfitting problem of models in predicting the N/P ratio of plant leaves, a combined model using PLSR+BPNN methods was applied. This PLSR+BPNN model used the principal components extracted from the PLSR model as input variables.
The overfitting issue appears to be well-controlled by this method, as shown by the minor differences in R² between the training and validation sets (Figure 6a). The PLSR+BPNN model displays the best performance in predicting the leaf N/P ratio of plants when fractional differentiation was set to FD (2.3), yielding R², RMSE, and RPD values of 0.90, 1.94, and 3.21, respectively, for training sets and 0.79, 2.71, and 2.04, respectively, for validation sets (Figure 7).
The combined PLSR+GRNN model also uses principal components extracted from the PLSR model as input variables. When the fractional differentiation order is larger than 1.7, the differences in R² values between the training and validation sets are minor (Figure 6b), suggesting that overfitting is well-controlled compared to using the PLSR or GRNN models individually. The PLSR+GRNN model performed well when the fractional differentiation was set to FD (2.6), with R², RMSE, and RPD values of 0.91, 1.98, and 3.15, respectively, for the training sets, and 0.81, 2.46, and 2.25, respectively, for validation sets (Figure 7).

Model Comparison and Optimal Model Selection
In this study, five models, namely PLSR, BPNN, GRNN, PLSR+BPNN, and PLSR+GRNN, combined with fractional differentiation techniques, were used to predict the leaf N/P ratio of plants in the karst area of Guangxi Province. The prediction results of each model at its optimal fractional differentiation order are shown in Figure 7. The prediction accuracy of these five models can be ranked in descending order as GRNN, BPNN, PLSR+GRNN, PLSR+BPNN, and PLSR according to the coefficient of determination (R²) of the training sets, and as PLSR+GRNN, PLSR+BPNN, PLSR, GRNN, and BPNN according to the R² of the validation sets.
The PLSR+BPNN and PLSR+GRNN models are excellent models for predicting the leaf N/P ratio of plants in karst areas, as they display high prediction accuracy and successfully control overfitting. The PLSR+GRNN model is slightly better than the PLSR+BPNN model and is selected as the optimal model in this study.

Advantages of Fractional Differentiation
Fractional differentiation of spectra can improve the performance of models in predicting the N/P ratio of plant leaves. For all five models used in this study, the optimal differentiation order is fractional rather than integer (Table 3). For example, the best fractional differentiation for the PLSR model is FD (2.1), with an RPD of 2.45 for the training sets and an RPD of 1.57 for the validation sets. The PLSR model with a fractional differentiation of 2.1 produced more accurate and robust values than zero-, first-, second-, and third-order differentiation. Additionally, the optimal fractional differentiation for the PLSR+BPNN and PLSR+GRNN models is FD (2.3) and FD (2.6), respectively, both of which produce better values than zero-, first-, second-, and third-order differentiation. These results suggest that the fractional differential transform plays a positive role in predicting the leaf N/P ratio of plants.

Distribution of Sensitive Wavelengths
Wavelengths from 400–730 nm, and especially 520–650 nm, are sensitive for predicting the leaf N/P ratio of plants. This result is consistent with previous studies, such as Cui et al. [8], who found that wavelengths near 650 nm were the best for estimating the N/P ratio. Hansen and Schjoerring [48] also showed that spectral reflectance near 530 nm and 720 nm is important for estimating the N concentration of wheat. In addition, Xu et al. [49] found that spectral reflectance at 540–560 nm and 760–780 nm is sensitive to the C/N ratio of wheat and barley leaves. There are some differences in sensitive wavelengths related to the leaf N/P ratio between karst and non-karst plants, which may be caused by adaptations to survive in karst environments. For example, plants growing in karst areas have decreased stomatal conductance, thickened palisade tissue, and increased keratinization due to thin soil layers and relatively low air humidity [50]. These physiological differences can impact the radiative transfer processes of plant leaves, leading to changes in the sensitive wavebands.

Control Overfitting
We reduced the noise in the spectral reflectance by fractional differentiation and improved the representativeness of the feature variables used as model input. Moreover, we applied the PLSR model to extract principal components to reduce the dimensionality of the spectral data. The reduced-dimensional spectral data were used as input variables of the BPNN and GRNN models. In this way, we proposed two composite models, PLSR+BPNN and PLSR+GRNN. The prediction performance of the PLSR+BPNN and PLSR+GRNN models is expected to be better than that of simple models such as spectral indices and ordinary linear regression. Simple models are better suited to a single plant species [14,51,52] and are very sensitive to the dataset used [16]. In contrast, our model is more adaptable to complex environments, as it was built on a mixed forest database. In addition, this method decreases the complexity of the model while ensuring minimal loss of spectral information and reducing the occurrence of overfitting. These results are consistent with previous studies [53] showing that overfitting issues could be overcome by using non-negative principal component analysis (NPCA) to extract principal components as input variables for machine learning. Although combining PCA or PLSR models with machine learning methods does not necessarily improve the model prediction capability, it effectively minimizes the overfitting problem of machine learning methods.

Application for Mixed Forest
Empirical models tend to be site-, time-, and species-specific and are therefore unsuitable for large-scale analyses [54]. Previous studies investigating the inversion of plant biochemical parameters are also biased towards specific regions, such as wetlands [24] or grasslands [28], or species such as wheat [55] or rice [17]. Although high accuracy can be obtained from the inversion of biochemical parameters for a single species [15], plants are more likely to co-exist in mixed communities in the natural environment, and studies on only one species are not applicable to diverse ecosystems. Studies across various species need to be conducted before these methods can be applied to mixed forest environments. However, it is crucial to improve model accuracy and robustness before inverting the biochemical parameters of mixed species. We applied five models to predict the leaf N/P ratio of plants in a karst area. Among them, the PLSR+BPNN and PLSR+GRNN models have the best performance, with high prediction accuracy and robustness. The performance of both the PLSR+BPNN and PLSR+GRNN models was better than that of the models used by Cheng et al. [16] in terms of the coefficient of determination (R²). This improved performance is mainly due to the full consideration of both linear and nonlinear relationships between leaf biochemical parameters and field spectroradiometer data in our study.
Sample composition also impacts model performance. For example, there was a significant difference in the performance of the BPNN model between our study and Cui et al. [8] when comparing the coefficient of determination for the validation sets, and no overfitting was observed in the results of Cui et al. [8]. These differences in performance were likely due to the large variability in sample composition, with the leaf N/P ratio of plants in this study ranging from 1.34 to 36.94, compared to a Phragmites communis N/P ratio ranging from 6.7 to 15.9 in the study of Cui et al. [8].

Conclusions
Estimating the N/P ratio of plant leaves using field spectroradiometer data is challenging. We estimated the variation of the leaf N/P ratio of plants in a karst area of Guangxi Province using five models, namely PLSR, BPNN, GRNN, PLSR+BPNN, and PLSR+GRNN. The wavelengths sensitive to the leaf N/P ratio of plants are mainly in the range of 400–730 nm. A single model (PLSR, BPNN, or GRNN) can estimate the leaf N/P ratio of plants, but all three produce significant overfitting of the data. In contrast, the combined PLSR+BPNN and PLSR+GRNN models can avoid the overfitting problem in predicting the leaf N/P ratio of plants and have high prediction accuracy. In addition, using fractional differentiation methods can effectively improve the prediction capability of the models in estimating the leaf N/P ratio across a variety of plant species. This study provides a valuable scientific basis for long-term dynamic monitoring of plant biochemical parameters using field spectroradiometer data.