Optimized Estimation of Leaf Mass per Area with a 3D Matrix of Vegetation Indices

: Leaf mass per area (LMA) is a key plant functional trait closely related to leaf biomass. Estimating LMA in fresh leaves remains challenging due to its masked absorption by leaf water in the short-wave infrared region of reﬂectance. Vegetation indices (VIs) are popular variables used to estimate LMA. However, their physical foundations are not clear and the generalization ability is limited by the training data. In this study, we proposed a hybrid approach by establishing a three-dimensional (3D) VI matrix for LMA estimation. The relationship between LMA and VIs was constructed using PROSPECT-D model simulations. The three-VI space constituting a 3D matrix was divided into cubical cells and LMA values were assigned to each cell. Then, the 3D matrix retrieves LMA through the three VIs calculated from observations. Two 3D matrices with different VIs were established and validated using a second synthetic dataset, and two comprehensive experimental datasets containing more than 1400 samples of 49 plant species. We found that both 3D matrices allowed good assessments of LMA (R 2 = 0.76 and 0.78, RMSE = 0.0016 g/cm 2 and 0.0017 g/cm 2 , respectively for the pooled datasets), and their results were superior to the corresponding single Vis, 2D matrices, and two machine learning methods established with the same VI combinations.


Introduction
Leaf biochemical properties are important indicators of changes in ecosystem functions and environment, and are necessary components in dynamic vegetation models [1][2][3]. Leaf mass per area (LMA), defined as the ratio of dry mass of a leaf to its one-sided surface area, is one of these key plant traits. Closely related to plant growth, LMA is often used in characterizing leaf biomass [4][5][6][7]. It also affects plant adaptation, including plant growth strategy and leaf lifespans, and the lower part of the canopy and longer-lived plants usually contain higher LMA [8][9][10][11]. At a broader scale, LMA can explain the differences in functional groups and is closely related to environmental stress [12]. Therefore, accurate LMA prediction is of great significance to improve our understanding of the functioning of the biosphere.
Remote sensing methods have been widely used in predicting plant traits for nondestructive and large-scale measurements [3,[13][14][15]. Foliar spectroscopy-based approaches for estimating leaf biochemistry include physical and empirical methods [16][17][18]. Though LMA is reported to have obvious absorption peaks in shortwave infrared (SWIR) region, its retrieval remains difficult due to the lower absorption coefficient compared with leaf water content (C w ) [19][20][21]. This can lead to serious ill-posed problems in physics-based LMA inversion [22][23][24]. To improve its LMA estimation, Féret et al. [25] determined the range of 1700 to 2400 nm for LMA retrieval through the PROSPECT model. Qiu et al. [4] developed a new version of PROSPECT-g to consider the anisotropic scattering within leaves. The physical method has a rigorous mathematical foundation, yet its inversion can be complicated and time-consuming because of the numerous amounts of parameters [26]. Among the empirical methods, fitting relationships to vegetation indices (VIs) are one of the most popular. Several VIs have been developed for LMA estimation [27][28][29]. They require observations of only a few spectral bands and are very convenient to use. Constructing regression models between vegetation indices (VIs) and leaf physiological traits with machine learning (ML) algorithms have also achieved success [30,31]. Partial least squares regression (PLSR) was used by Serbin et al. [32] to establish a widely applicable LMA prediction model. However, a substantial amount of prior information is needed to train the regression equations, and their quality is found to be restricted by the training data, especially when the model is to be applied on a dataset independent from the training set [25].
In order to combine the benefits of physical methods and VIs, this study proposes a hybrid VI-combination approach for optimizing LMA estimation from leaf reflectance. The relationship between LMA and three VIs constituting a three-dimensional (3D) matrix was established using PROSPECT-D simulations. Six different VIs were used in the construction of two 3D matrices for LMA estimation. Their performance was tested using a second synthetic dataset and two comprehensive experimental datasets against the corresponding single VIs and 2D matrices. This method is expected to precisely retrieve LMA without prior information and has the potential to aid the estimation of other leaf components.

PROSPECT Model
Based on the plate model [33], the PROSPECT model was developed in 1990 [34] and has been widely used in modeling of radiative transfer in a leaf. PROSPECT-D is the version developed in 2017 and was reported to perform better than previous versions [35]. The input parameters of PROSPECT-D include leaf structure index (N), leaf chlorophyll content (C ab ), carotenoid content (C ar ), anthocyanin content (C ant ), brown pigment content (C brown ), leaf water content (C w ), and leaf mass per area (LMA). The forward running of the model can generate accurate simulations of leaf hemispherical reflectance and transmittance from 400 nm to 2500 nm at a resolution of 1 nm. Leaf chemical traits can be inverted using PROSPECT-D by assigning the component content of the most similar spectrum to the measured electromagnetic spectrum.

Description of the Synthetic Datasets
By running the PROSPECT-D model in the forward mode, two synthetic datasets were generated. The model parameters were assumed to have a Gaussian distribution, and their statistical characteristics were listed in Table 1 based on previous studies [20,32]. The model parameters of C anth and C brown were not considered here as they are insensitive to leaf reflectance in the SWIR domain based on sensitivity analysis and therefore set at 0 µg/cm 2 [36]. The first synthetic dataset was composed of 200,000 simulations (hereinafter dataset T) for training the 3D matrices. The second synthetic dataset (hereinafter dataset E) with 1000 simulations was generated for evaluating the 3D matrices. Given that synthetic data have a stable distribution feature, further enlarging the size of validation datasets would not greatly affect the quantification of the accuracy of the simulated LMA.

Description of the Experimental Datasets
Since both the synthetic training and validation data sets were generated with the same settings in the PROSPECT-D model, they cannot be considered independent. Hence, in order to do a truly independent evaluation, we compiled two experimental datasets. These also take the various realistic factors in actual measurements into account, and validate the inversion results of LMA based on the synthetic T dataset. The first dataset was the Leaf Optical Properties Experiment (LOPEX) dataset [37], which was released by the Joint Research Center of the European Commission in 1993. This dataset consists of 330 leaves from 45 plant species. Leaf reflectance spectra within 400-2500 nm were measured with a spectrophotometer equipped with an integrating sphere. Five corn dry leaves were excluded because the PROSPECT-D model is intended for fresh leaves [35].
The second dataset was collected at Madison, Wisconsin (hereinafter called MA) in the summer of 2016 by the University of Wisconsin Environmental Spectroscopy Laboratory [38]. Leaf reflectance spectra within 350-2500 nm were measured using an ASD FieldSpec 3 instrument with a leaf clip. This dataset contains 1143 leaves of four plant types: grasses, trees, vines, and forbs. The two experimental datasets contain a total of 1468 leaves, representing different leaf internal structures, leaf spectra, and biochemical components. The spectral resolution of the measured leaf reflectance in both datasets was 1 nm. As the absorption characteristics of LMA are weak even at its sensitive wavelengths [20,36], LMA estimation using a single VI or physically-based method is often inaccurate [39]. Thus, in the current study, six VIs correlated with LMA were combined in two different 3D matrices using PROSPECT-D simulations to enhance the retrieval of LMA. The VIs selected have been demonstrated to have comparatively good performance by previous studies.

3D Matrix
The first 3D matrix is composed of modified simple ratio (MSR), normalized difference (ND), and simple reflectivity (R2300) indices. The MSR index was developed to compensate for leaf surface reflectance [28,40]. The ND index was a commonly used indicator for predicting LMA [27]. As to the R2300 index, leaf reflectance at 2300 nm is reported to be sensitive to LMA [28]. The expressions of MSR, ND, and R2300 are presented in Table 2. Table 2. Spectral indices constructing the first 3D matrix.

Index ID Formula Reference
Modified simple ratio-type index MSR (R2265 − R2400) (R1620 − R2400) [28,40] Normalized difference-type index ND [27,39] Single reflectivity-type index R2300 R2300 [28] The second 3D matrix consisted of single reflectivity-type (R1800), modified ND (MND) and difference-type (D) indices. R1800 is based on the SWIR band, being one of the most relevant bands indicating LMA variability [32]. The MND is a more sophisticated index than ND, which has a similar capability of MSR to compensate for leaf surface reflectance [28,40]. Previous studies showed that the use of spectral information from 1700 nm to 2400 nm was more conducive, improving the accuracy of LMA estimation, and the D index was particularly consistent with this conclusion [25]. The R1800, MND, and D indices are defined in Table 3. Table 3. Spectral indices constructing the second 3D matrix.

Index
Index ID Formula Reference [28,41] Single reflectivity-type index R1800 R1800 [32] The two 3D matrices contained VIs of similar forms and wavelengths. For instance, wavelengths around 1300 nm, 1700 nm, and 2300 nm were found to be effective for extracting LMA [20].

Establishment of the 3D Matrices
To construct the LMA 3D matrices, we first divided the VI 1 -VI 2 -VI 3 space into several small cells of the same size ( Figure 1). Each cell was linked to an LMA value, which corresponded to a small range of the three VI values. Thus, the smaller the size of the cells in the 3D matrix, the more simulations are needed for building up the whole 3D matrix. In the present study, the 3-VI space was partitioned into 100 × 100 × 100 combining both accuracy and efficiency. In this way, two LMA 3D matrices were generated by using the two VI combinations in Section 2.2.1.
(MND) and difference-type (D) indices. R1800 is based on the SWIR band, being one of the most relevant bands indicating LMA variability [32]. The MND is a more sophisticated index than ND, which has a similar capability of MSR to compensate for leaf surface reflectance [28,40]. Previous studies showed that the use of spectral information from 1700 nm to 2400 nm was more conducive, improving the accuracy of LMA estimation, and the D index was particularly consistent with this conclusion [25]. The R1800, MND, and D indices are defined in Table 3. Table 3. Spectral indices constructing the second 3D matrix.

Index
Index [28,41] Single reflectivity-type index R1800 R1800 [32] The two 3D matrices contained VIs of similar forms and wavelengths. For instance, wavelengths around 1300 nm, 1700 nm, and 2300 nm were found to be effective for extracting LMA [20].

Establishment of the 3D Matrices
To construct the LMA 3D matrices, we first divided the VI1-VI2-VI3 space into several small cells of the same size ( Figure 1). Each cell was linked to an LMA value, which corresponded to a small range of the three VI values. Thus, the smaller the size of the cells in the 3D matrix, the more simulations are needed for building up the whole 3D matrix. In the present study, the 3-VI space was partitioned into 100 × 100 × 100 combining both accuracy and efficiency. In this way, two LMA 3D matrices were generated by using the two VI combinations in Section 2.2.1.  The training process of an LMA 3D matrix is as follows. First, for all samples in synthetic dataset T, the three VIs adopted by the 3D matrix were calculated. Second, the 3D domain of each cell in the 3D matrix was determined by dividing the full variation range of each VI by 100. Finally, the corresponding LMA of each sample in synthetic dataset T was assigned to a specific cell in the 3D matrix, directed by the coordinates of calculated VIs. In cases that different LMA values were mapped to the same cell, their average was considered the final cell value. The process of establishing a 2D matrix follows a similar procedure. The linear regression model of each VI was generated using the Remote Sens. 2021, 13, 3761 5 of 15 synthetic dataset T. Furthermore, the standard deviation of each cell in the 3D matrices were established to indicate the uncertainty of the 3D matrices.

Estimation of the LMA
After the establishment of a 3D matrix, LMA can be estimated through referring to the three VIs calculated from observations of a sample. The working mode of a 3D matrix includes two cases: Case 1. Direct retrieval: The three VI values calculated from leaf reflectance are traced back to one cell in the matrix. If the cell has a non-null value, the matrix returns it directly as the LMA value retrieved by the 3D matrix. Case 2. Indirect retrieval (nearest neighbor retrieval): In some rare cases, the cell directly searched by the matrix is not a number (NaN), and a second retrieval is required. All cells with LMA values nearest to the current cell are identified, and their average is retrieved as the LMA by the 3D matrix.

Estimation of LMA through ML Algorithms
Support vector machine (SVM) is a supervised learning model that is widely used for regression of high-dimensional samples [42]. When it fits linear model in a highdimensional space, it limits the complexity of the model by minimizing the risk of overfitting [43]. PLSR is often used in spectroscopy to handle high predictive collinearity and insufficient observed variables. PLSR can reduce the impact of these circumstances by orthogonalizing the predictor variables [32,44]. In this study, we utilized SVM with linear kernel function and PLSR methods as comparison to estimate LMA. The ML models were parameterized using the VI combinations and the LMA simulations of the synthetic dataset T (described in Section 2.2.1).

Accuracy Evaluation
Each of the three VIs constituting the 3D matrix can also constitute three 2D matrices. For the evaluation of their performance, both 3D matrices were compared with their 2D versions and single Vis, and the ML methods in terms of estimating LMA using the independent experimental datasets and synthetic E dataset. The following statistical parameters were used: coefficient of determination (R 2 ), root mean square error (RMSE), and normalized root mean square error (NRMSE) [45,46]: where y i and y i is the estimated and measured LMA values, respectively; y max and y min are the maximum and minimum values of y i , respectively; n is the number of measurements, and i is the serial number of the measured LMA.

3D Matrices of VIs for LMA
The distribution of samples in the synthetic dataset T in the 3D matrix shows a trend, where most points formed a curved face, with the rest distributed below the surface (Figures 2 and 3). Several LMA iso-surfaces can be observed in both 3D matrices, indicating a progressive trend from the minimum to the maximum. Notably, a converging point existed in both matrices, in which almost all the LMA iso-surfaces converged. The coordinate values of this point were theoretically close to the extreme values of VIs calculated based on synthetic dataset T (approximately MSR of 0.4353, ND of 0.0321, and R2300 of 0.0201 for MSR-ND-R2300; R1800 of 0.0896, MND of −1, and D of 0.0078 for R1800-MND-D). The farther away from the converging point, the more obvious the differences between different LMA values were, thereby facilitating LMA inversion.
The distribution of samples in the synthetic dataset T in the 3D matrix shows a trend, where most points formed a curved face, with the rest distributed below the surface (Figures 2 and 3). Several LMA iso-surfaces can be observed in both 3D matrices, indicating a progressive trend from the minimum to the maximum. Notably, a converging point existed in both matrices, in which almost all the LMA iso-surfaces converged. The coordinate values of this point were theoretically close to the extreme values of VIs calculated based on synthetic dataset T (approximately MSR of 0.4353, ND of 0.0321, and R2300 of 0.0201 for MSR-ND-R2300; R1800 of 0.0896, MND of -1, and D of 0.0078 for R1800-MND-D). The farther away from the converging point, the more obvious the differences between different LMA values were, thereby facilitating LMA inversion.  where most points formed a curved face, with the rest distributed below the surface (Figures 2 and 3). Several LMA iso-surfaces can be observed in both 3D matrices, indicating a progressive trend from the minimum to the maximum. Notably, a converging point existed in both matrices, in which almost all the LMA iso-surfaces converged. The coordinate values of this point were theoretically close to the extreme values of VIs calculated based on synthetic dataset T (approximately MSR of 0.4353, ND of 0.0321, and R2300 of 0.0201 for MSR-ND-R2300; R1800 of 0.0896, MND of -1, and D of 0.0078 for R1800-MND-D). The farther away from the converging point, the more obvious the differences between different LMA values were, thereby facilitating LMA inversion.  Representing the uncertainty of the two 3D matrices in LMA estimation, two 3D matrices of standard deviation (STD) were calculated (Figures 4 and 5). In these 3D matrices, the farther away from the converging point, the smaller the standard deviation. This indicated that the sensitivity of the 3D matrix to LMA increases from the converging point to the extended direction. Cells with high LMA values in both 3D matrices usually have high STD. Besides, the overall STD of MSR-ND-R2300 was lower than that of R1800-MND-D. The average STD of R1800-MND-D (0.003 g/cm 2 ) was approximately three times that of MSR-ND-R2300 (0.001 g/cm 2 ). These results showed that MSR-ND-R2300 had a lower uncertainty than R1800-MND-D for LMA estimation. dicated that the sensitivity of the 3D matrix to LMA increases from the converging point to the extended direction. Cells with high LMA values in both 3D matrices usually have high STD. Besides, the overall STD of MSR-ND-R2300 was lower than that of R1800-MND-D. The average STD of R1800-MND-D (0.003 g/cm 2 ) was approximately three times that of MSR-ND-R2300 (0.001 g/cm 2 ). These results showed that MSR-ND-R2300 had a lower uncertainty than R1800-MND-D for LMA estimation.   high STD. Besides, the overall STD of MSR-ND-R2300 was lower than that of R1800-MND-D. The average STD of R1800-MND-D (0.003 g/cm 2 ) was approximately three times that of MSR-ND-R2300 (0.001 g/cm 2 ). These results showed that MSR-ND-R2300 had a lower uncertainty than R1800-MND-D for LMA estimation.

Evaluation against Synthetic Data
Both 3D matrices provided substantially higher accuracy estimating LMA for the synthetic dataset E compared to the corresponding estimates of the 2D matrices and the single VIs (Tables 4 and 5). In most cases, 3D matrices performed better than the 2D matrices, and the 2D matrices performed better than the single VIs. Of all combinations, the matrix MSR-ND-R2300 had the highest accuracy (R 2 = 0.99, RMSE = 0.0005 g/cm 2 , NRMSE = 1.7%). The NRMSE of LMA retrieved by either of the two 3D matrices was less than 2%, indicating that the 3D matrices significantly optimized LMA estimation, and the effects of involving more than three VIs may not be obvious.

Evaluation against the Experimental Datasets
The retrieval performances of LMA using MSR-ND-R2300, corresponding 2D matrices, single VIs, and ML models were compared against the experimental datasets (Table 6). For both experimental datasets, the MSR-ND-R2300 matrix consistently performed best in estimating LMA. The improvement of the LMA estimation by the MSR-ND-R2300 was more significant on the dataset LOPEX (RMSE decreased by 36%) than on the MA dataset (RMSE decreased by 5%). Given the superior performance of the MSR-ND-R2300 on both datasets, it yielded the highest accuracy of LMA estimation on the pooled dataset (R 2 = 0.78, RMSE = 0.0017 g/cm 2 , NRMSE = 10.5%). Thus, the matrix of MSR-ND-R2300 was considered a more robust and high-precision method than the relevant 2D matrices, single VIs, and ML algorithms.
The retrieval performances of LMA using R1800-MND-D and corresponding 2D matrices, single VIs, and the two ML models were also compared against the experimental datasets ( Table 7). The R1800-MND-D performed best for estimating LMA using the pooled experimental dataset (R 2 = 0.76, RMSE = 0.0016 g/cm 2 , NRMSE = 9.9%). For both experimental datasets, the retrieval of LMA using matrix of R1800-MND-D yielded the highest accuracy (RMSE = 0.0016 g/cm 2 for MA, RMSE = 0.0015 g/cm 2 for LOPEX). To conclude, the matrix of R1800-MND-D is also a superior method for estimating LMA compared with the corresponding 2D matrices, single VIs, and the two ML models. Table 6. Corresponding to the MSR-ND-R2300 matrix, the coefficient of determination (R 2 ), root mean square error (RMSE, in g/cm 2 ), and normalized RMSE (NRMSE, %) of the estimated LMA from experimental datasets using the 3D matrix, corresponding 2D matrices, single VIs, support vector machine (SVM), and partial least-squares regression (PLSR) (the optimal results are indicated in bold).  Comparing the two 3D matrices, MSR-ND-R2300 had higher R 2 (0.85 for MA, 0.74 for LOPEX, and 0.78 for pooled dataset, Figure 6a,c,e in all experimental datasets, whereas R1800-MND-D achieved lower RMSE (0.0016 g/cm 2 for MA, 0.0015 g/cm 2 for LOPEX, and 0.0016 g/cm 2 for pooled dataset, Figure 6b,d,f. Moreover, the estimation bias of R1800-MND-D for a few samples with LMA around 0.005 g/cm 2 was relatively large, which may lead to the decrease of R 2 . Nevertheless, the RMSE of LMA estimation using the matrix of R1800-MND-D was still lower than that using MSR-ND-R2300. Consequently, the 3D matrix of R1800-MND-D was considered more promising than MSR-ND-R2300 for estimating LMA.

Improvements over Traditional VIs
Traditional methods for estimating LMA with a single VI commonly use two or several spectral bands related to LMA, and the regression is relatively simple to implement. Nevertheless, as light absorption of LMA is relatively weak and often masked by C w in its sensitive bands, the detection of variations in LMA by VIs was limited [39]. The 3D matrix proposed in this study can effectively strengthen the relationship between VIs and LMA, and thus facilitate the retrieval of LMA. The range of LMA in the 3D matrices established in this study was between 0.001 and 0.036 g/cm 2 (Figures 2 and 3), which covered the range of most vegetation species and growing stages according to the TRY database [47] and previous researches [32,48]. In addition, these 3D matrices had achieved good accuracy when estimating LMA on all the validation datasets (Tables 4-7). The especially high accuracy obtained based on synthetic dataset was partly due to that the same Gaussian distribution used in generating synthetic datasets G and T, and that random and systematic noises were not considered. Yet it proved that the three dimensions were adequate for LMA estimation. Using the 3D matrix improved LMA retrieval compared with any single corresponding VI because inversion is based on similarity of the values of multiple VIs rather than an inaccurate regression relationship susceptible to C w absorption interference.
In addition, our results showed that the proposed method has superior performance compared with the hybrid models based on SVM and PLSR approaches (Tables 6 and 7). We suggested that this method can be applied to indicate LMA when no prior measurements are available. The 3D matrix works very similarly to a lookup table (LUT). The minor difference between the proposed method and the standard LUT with VIs is that the LUT is generated by partitioning the ranges of LMA and other parameters, and then obtaining the corresponding VIs. The 3D matrix, by contrast, is generated by partitioning the ranges of the three VIs, and then the samples with different LMAs are distributed in the matrix. Those samples in the same cell were averaged. Besides, the 3D matrix holds advantages in manifesting a visible and clearer structure of the distribution of the VIs and LMA.
The estimation results of LMA through 3D matrix in this study were also compared with previous reports. Féret et al. [25] used an optimal spectral domain to invert LMA with the PROSPECT model with an RMSE of 0.002 g/cm 2 using the LOPEX dataset. Wu et al. [49] adopted the spectral invariants theory at the leaf level. The inversion accuracy of LMA was 0.0018 g/cm 2 (RMSE) using the same dataset. Validation using LOPEX showed that our method had a slightly higher inversion accuracy (RMSE = 0.0015 g/cm 2 ).

Impact of VI Selection in 3D Matrix Construction
Two factors that greatly affect the performance of a VI-based matrix are the sensitivity and structure of the VIs. VIs that are sensitive to a parameter of interest or confounding parameters can be combined in the form of a matrix for the improvement of the estimation of this parameter. Selecting VIs sensitive to the parameter of interest is easy to understand, whereas involving VIs sensitive to confounding factors can help alleviate the influence of these factors. Given that the contribution of C ab and LAI is hard to differentiate on the canopy scale, Xu et al. [50] proposed the use of two VIs sensitive to LAI and C ab in generating a matrix to decrease the influence of LAI and were then able to improve the accuracy of the model predictions. In the case of LMA, although C w affects the detection of the spectral signals of LMA in the SWIR domain, combining VIs related to LMA and C w would not result in the same degree of improvement in LMA estimation according to our prior experiments (not shown here). This is due to that the correlation between C ab and LAI is strong, while that between LMA and C w is much poorer [27]. As a result, a C w -sensitive VI, while acting as an axis of the matrix, does not assist retrieval of LMA because it has little contribution to LMA differentiation. The results in this study showed that a 3D matrix based on VIs related to LMA yields a high accuracy for LMA estimation, indirectly making up the absorption of LMA. This 3D matrix-based method can potentially be applied to invert other leaf parameters, particularly those with weak light absorption features, such as leaf carotenoid content.
In this study, we made no specific restrictions on the types of VIs constituting the 3D matrices, showing that a broad range of VI types can be used in generating a 3D matrix. In future studies, the redundancy of spectral information in selected VIs may be a factor to consider. The VIs of LMA containing bands with low correlations are expected to improve LMA estimation.

Sources of Error and Further Development of the 3D Matrix Approach
Although the 3D matrix-based approach optimizes the LMA estimations, uncertainty exists in three aspects: (1) inaccuracy of the PROSPECT model; (2) the included Vis, and (3) NaN existed in the 3D matrix. The PROSPECT model assumes the specific absorption coefficient of LMA as the integration of the optical influences of various organic constituents, which may lead to inaccuracies [25]. Additionally, PROSPECT can be inaccurate in describing leaf surface features, such as surface roughness [34]. As a synthetic dataset simulated by the PROSPECT model was used, these uncertainties were propagated when establishing the 3D matrix. Given that here the range of biochemical parameters adopted to generate the synthetic dataset was determined by integrating TRY database and other comprehensive studies, the 3D matrices established in this study hold good generalization ability. However, for specific vegetation types or growth phases, their performance may not be optimal. Using prior information in adjusting the ranges and distributions of biochemical parameters and the relationships between parameters during a 3D matrix establishment can help reduce uncertainty.
Uncertainty also stems from the inherent limitations of VIs, for disparate LMA levels may be misclassified to the same cell because they had similar spectral characteristics within the three VIs. This error may be difficult to eliminate completely because VIs cannot fully reflect variations in LMA. Still, in this respect, the 3D matrix has a lower degree of uncertainty than a 2D one because the third VI corrects some misallocation of the LMA into cells. However, the performance of some 2D matrices on the experimental datasets was inferior to that of single VIs, which may be caused by the interaction between VIs.
Finally, the 3D matrices contained NaN (Figures 2 and 3). The main reason is that some combinations of VI values represented by cells do not exist in real leaf samples, and these cells are theoretically NaN. In addition, the possible inadequacy of PROSPECT model simulations may result in NaN in some cells. In the inversion process, when the cells with NaN were matched, the estimation was based on the average of the nearest cells with non-values, which may result in errors.
The results in Section 3.1 showed that the average STD of cells in MSR-ND-R2300 was lower than that in R1800-MND-D, which reflected a theoretically lower uncertainty of the former 3D matrix. However, when estimating LMA from both experimental datasets, the RMSE obtained by MSR-ND-R2300 was higher than that of R1800-MND-D (Section 3.2). The synthetic dataset E with the same configuration can detect the training accuracy of the 3D matrix, with higher R 2 often accompanied by smaller errors, thus the average STD of MSR-ND-R2300 was relatively lower. The utilization of synthetic data set may introduce bias to the 3D matrix when applied in experimental datasets. Therefore, the R 2 of the two 3D matrices from experimental datasets was therefore consistent with the STD matrices, while the RMSE was not.
This study proposed a novel method for LMA estimation at leaf scale, and the problem of atmospheric absorption could thereby be ignored. However, this could be a serious problem in airborne and satellite imagery applications [51], as it often leads to the weakening or unavailability of some effective reflection features [52]. For instance, VIs with bands covering the spectral region of high-water vapor absorption (such as 1800 nm and 2400 nm) should be avoided. Currently, several VIs for LMA inversion at the canopy scale have been developed [28,53,54], providing a promising basis for the development of a 3D LMA matrix at the canopy scale. The 3D matrix suitable for the canopy scale needs to Remote Sens. 2021, 13, 3761 13 of 15 be further investigated. Additionally, the PROSPECT model needs to be coupled with a canopy reflectance model, and the signal-to-noise ratio of VIs at the canopy scale should be considered.

Conclusions
The remote sensing of LMA is often challenging due to its lower light absorption coefficient compared with C w in the SWIR region. Thus, this study developed a novel approach for optimizing LMA estimation using a 3D VI matrix based on PROSPECT-D simulations. The results showed that compared with the corresponding single VIs and 2D matrices, the 3D matrix improved the estimation of LMA effectively. Compared with the machine learning models constructed using the same VI combinations, the 3D matrices estimate LMA more accurately. Between the two 3D matrices, R1800-MND-D achieved slightly better comprehensive performance than MSR-ND-R2300, with a lower RMSE for both experimental datasets. Given the physical principle used in constructing the 3D matrix, and that little prior information was involved in calibrating the models, the 3D matrix shows high potential for a stronger generalization capability. The proposed method can provide valuable guidance for reflecting vegetation growth, development stages, as well as physiological response to environmental stresses. Data Availability Statement: All data used in this manuscript are publicly available through EcoSIS spectral database, including LOPEX (https://ecosis.org/package/leaf-optical-propertiesexperiment-database--lopex93-) and MA (https://ecosis.org/package/7433af7d-fbbd-4617-8df4-4 d892f0d4357 (accessed on 1 January 2019)).