Regional Scale Dryland Vegetation Classification with an Integrated Lidar-Hyperspectral Approach

—Nitrogen (N) has been linked to different ecosys- tem processes, and retrieving this important foliar biochemical constituent from remote sensing observations is of widespread interest. Since N is not explicitly represented in physically based radiative transfer models, empirical methods have been used as an alternative. The spectral bands selected during the calibration of empirical methods have been interpreted in the context of light-N interactions and, consequently, ecosystem function. The low amount of leaves on shrubs and their sparse distribution in drylands create an environment, in which the canopy structure and the bright background soil play an important role in the canopy total radiation budget. In this paper, we examine the assumption that removing the impact of canopy structure and soil will result in improved N retrieval using the empirical methods. We report the inconsistencies in the selection of spectral bands among the empirical approaches. Moreover, these methods are sensitive to the scale of the study and band transformations. Using the generalized theory of canopy spectral invariants, we found that at the canopy scale, a combination of canopy structure and soil dominates the total canopy radiation. At the plot scale, soil be statistically achievable, caution should be taken when interpreting their results in the context of N-light interactions and ecosystem function. Our approach can be extended to all terrestrial ecosystems with multiple layers of canopy and understory.


I. INTRODUCTION
T ERRESTRIAL ecosystem processes have been interpreted from the remote sensing estimates of foliar nitrogen and other leaf biochemicals. Canopy nitrogen (N) has been related to forest albedo and linked to climate change [1], nutrient limitation [2], the Amazonian functional biodiversity [3], and the role of the plant community in controlling canopy biochemistry [4].
There are two general approaches for remote sensing of canopy chemistry: physical methods based on the concept of radiative transfer models (RTMs) and empirical/statistical methods based on regression analysis. A combination of these two approaches, known as the hybrid methods, can also be used [5]. Since there are no reliable RTMs that include leaf N, this foliar variable is mostly identified by empirical methods. Acceptable estimates of N have been reported using a range of empircal methods (average R 2 and root-mean-square error (RMSE) of 0.72 and ± 0.16, respectively, [6]). These include multiple linear regression [7], [8], partial least squares (PLSs) regression [7], [9]- [11], stepwise multiple linear regression [12], and, more recently, popular methods, such as neural networks [13], [14], support vector machines (SVMs) [15], [16], Bayesian regression (BR) [17], and random forest (RF) [18].
The goal of statistical analysis is to fit a model between N and the feature space (i.e., spectral bands) or a transformation of the feature space. The developed model is then tested for its predictive power using cross validation. The most influential variables on the model fit are then discussed in the context of light-N interactions. There are three issues related to this type of study. First, it is known that multiple chemical, physical, and structural properties of vegetation and background soil control the spectral signal received at leaf, canopy, and plot scales. In many cases, such as in sparse vegetation cover, N does not dominate the spectra [19], [20]. Ideal empirical studies usually include relevant predictor and response variables. It is thus essential to consider whether the empirical relationships between reflectance spectra and N are suitable. Another consideration is that there is a limited consistency between empirical studies in the selected bands for N prediction [6], [21]. More importantly, in some cases, the selected bands are not consistent with the concepts of radiative transfer of N absorption but rather driven by the canopy structure or other factors that may or may not covary with N. For example, in dense forests with the assumption of a zero canopy background contribution (i.e., black soil) to the total canopy bidirectional reflectance factor (BRF), it has been shown that the canopy structure derives positive correlation between the near-infrared (NIR, 800-850 nm) and N [22]. In fact, multiple studies have identified NIR as the important predictors of N [1], [23], [24]. Finally, further investigation on the generalizability of empirical studies with cross validation is needed. The number of successful N retrieval studies using statistical methods has been used as the affirmation for the replicability of these models [25]. There is an urgent need to examine the interpretability of empirical methods and their fundamental meaning to the remote sensing and ecology community.
One way to study the interpretation of empirical methods is to investigate their link to underlying light-canopy physical processes. Knyazikhin et al. [22] used the theory of spectral invariants [26], which is based on radiative transfer. [27], and introduced a directional area scattering factor (DASF) as a new measure of canopy structure. DASF, in concept, is an estimate of the ratio of the leaf area that forms the canopy boundary, as seen along a given direction, to the total leaf area. Normalization of BRF to DASF results in canopy scattering coefficients (W ) that are insensitive to canopy structure. In contrast to empirical findings, W exhibits either negative or no correlation with N [22], [28]. A complicating factor is that while the DASF approach assumes a black soil background, in many ecosystems, this assumption is violated, and indeed, impacts of soil can be larger than those from the canopy structure itself.
While empirical methods are widely used for canopy N retrieval, comprehensive studies linking these results to physical processes, such as canopy radiative transfer, are lacking. Our goal is to examine the empirical methods used for more than two decades in the remote sensing community to answer the fundamental question of whether we can rely on these methods to predict N in the context of the confounding factors of canopy structure and soil. Our null hypothesis is that correcting for confounding factors will improve the N predictions using empirical methods. To test our hypothesis, we implement a range of empirical models and physical analyses based on the generalized theory of canopy spectral invariants [22], [29]. Our study advances the community discussion of the light-N interactions beyond dense forests to include ecosystems with multiple layers of canopy and understory.

II. MATERIALS AND METHODS
Five sites were selected across western USA in the semiarid ecosystem, known as the Great Basin (GB), for the field study and data collection (see Fig. S1 in the Supplementary Material). The Reynolds Creek Experimental Watershed, Birds of Prey, and Hollister sites are located in Idaho, and the Big Pine and Lone Pine sites are located in California on the eastern side of the Sierra Mountains. The dominant vegetation cover in the GB is the sparsely distributed shrubs. These dryland study sites provide the opportunity to study the impact of canopy structure and soil on remote sensing of N and extend the previous work in dense forests to xeric ecosystems. Most of the ecosystems follow the same pattern, in which an understory layer (e.g., soil, grass, and so on) contributes to the total pixel radiation budget. Field data sampling was conducted during 2014-2015 (see Table S1 and data set S1 in the Supplementary Material). Considering the dominance of sagebrush (Artemisia tridentata) and bitterbrush (Purshia tridentata) in the study sites, plots were selected based on the dominant cover of one of these two species. We define three scales for the study: at the leaf scale, the spectra were collected from dry leaves; at the canopy scale, the spectrometer was held above the top of the canopy; and at plot scale (10 m × 10 m), the spectra are acquired from the Airborne Visible/Infrared Imaging Spectrometer-Next Generation (see data set S2 in the Supplementary Material) airborne hyperspectral sensor. An extended version of the data collection can be found in Text1 in the Supplementary Material. Four different types of spectral transformations were applied to the spectra, the smoothed data set using the Savitzky-Golay filter [30] with a frame size of 11, second-degree polynomial, logarithmic transformation, first derivative of smoothed data set, and logarithm of the first derivative. These transformations are widely used in remote sensing of canopy chemistry and are known to enhance the absorption features [31]- [33]. For the statistical analysis, we implemented PLS [34], SVM [35], RF [36], and BR regression [17] methods and a newly developed multimethod ensemble variable selection (VS) based on the integration of PLS, SVM, and RF [37]. In this Ensemble approach, a spectral band is important if it is identified as important by all three methods. Each method returns band importance, which will be weighted by the explained variance of selected model for each method according to the following equation: where I iw is the weighted importance of band i , I i is the raw measure of the band importance of the regression method, R 2 is the explained variance of the model in cross validation or out-of-the-bag testing, and σ I is the standard deviation across the raw measures of importance of each model. The product of the three weighted importance values is considered as the Ensemble importance. As a base method, we calculate the variable importance in projection (VIP) by developing 1000 PLS models [37], [38]. VIP is the most common approach for VS based on PLS outputs. In our approach, a band was considered important when its average VIP value along with one standard deviation (derived from 1000 iterations) was above 1. We refer to the mean of the 1000 PLS models as the PLS Ref model. The reported R 2 and the coefficient of variation (CV) for PLS ref is the mean of all 1000 model runs. K-fold cross validation and also leave one data set out validation has been used to assess the model's performance. In the leave one data set out approach, a complete data set that has been collected in a given year and site was kept out of the calibration step. A synthetic data set was created to test the performance of all methods in an ideal case. This data set contained 200 observations with 500 correlated predictors; 10 of the predictors were set to have a coefficient value of 1 (i.e., relevant predictors), and all other predictors are 0 (unrelated predictors). A complete description of a similar data set construction is presented in [17]. The purpose of the statistical analysis is not to test a comprehensive list of the algorithms but to implement the most common ones. The physical analysis is based on the theory of canopy spectral invariants. According to this theory, under the assumption of black soil, canopy scattering s(λ) and absorption a(λ) are expressed in the following equations: where i 0 is the probability of canopy interceptance, ω l is the single-scattering albedo of an average phytoelement at any wavelength, and p is the recollision probability [39]. The recollision probability can be interpreted as the probability that a photon interacted with canopy elements will interact within the canopy again [40]. This theory can be generalized to the situation with multiple end-members, and the interaction between photons and end-members can be treated as an infinite-state Markov chain [29]. Then, (2) and (3) have the following form: where q = (I−P n )1 is the vector of escape probability after n interactions, is a diagonal matrix of single-scattering albedo associated with endmembers, and ∝ (λ) = (I − (λ))1 is the vector of the end-member's absorptance. The quantity q defines the probability that a photon scattered by phytoelements will escape vegetation toward a given direction [41]. This generalization includes both photons from the canopy end-members r c and background end-members r B or s = r c + r B . Furthermore, based on the principle of energy conservation, we can calculate the canopy radiation budget (CRB) and quantify canopy and surface contribution to the CRB. We assume that there are two layers consisting of a canopy layer on top of a flat soil layer. The solution of CRB with reflective surface is built by integrating the black soil solution (BS) problem, in which soil impact is negligible, and the second solution where soil is considered as the source of illumination (S problem). Then, CRB can be expressed in the following equations: where r c , a c , t c, ρ s , and q s are canopy reflectance, canopy absorptance, canopy transmittance, soil reflectance, and photon escape probability from soil, respectively. The BS' and S' reflectance, absorptance, and transmittance are defined in the following equations: and where q l , i 0 , and t 0 are canopy escape probability, canopy interceptance, and uncollided transmittance of the BS problem, respectively. p_{LL}, p_{LS}, p_{SL}, and p_{SS} are the leaf-leaf, leaf-soil, soil-leaf, and soil-soil recollision probabilities. The second term in (6) accounts for the influence of soil to the CRB. Photons that escape directly from the surface are not part of the canopy reflectance even if they interact with canopy before reaching the surface. Subtracting this term from r c leave us with r BS , which has the form noted in (9). Not surprisingly, r BS is similar to the model developed for the black soil problem [39, eq. (2)] with recollision probability p = p_{LL}. Thus, one should note that the contribution of soil to the total reflectance is the sum of the photons that escape from soil (r B ) and reach the sensor and the portion of photons that escape from soil and influence the canopy reflectance (r c ). We refer to the first component as soil contribution to the total reflectance and second component as the soil contribution to the reflectance of CRB. A complete derivation and description of the terms in (4)- (14) are provided in [29] and [42]. With the assumption that soil is flat (no interaction between soil end-members), p_{SS} is much lower than the other recollision probabilities and can be neglected ( p_{SS} = 0). In order to estimate p_{LL}, p_{LS}, p_{SL}, and i 0 , we fit the spectra at each scale to (4) using the covariance matrix adaptation evolution strategy (CMAES) optimization approach [43]. The mean RMSE between simulated and measured total canopy spectra at the canopy scale is 0.01 and at the plot scale is 0.02. CMAES is a state-of-the-art evolutionary algorithm developed for nonlinear, nonconvex black-box optimization problems. We added 4% Gaussian noise to the spectrums and performed inversion. Our preliminary analysis (not shown in this paper) using 100 different random initial points and a noisy objective function showed that CMAES is not sensitive to the initial points or noise and consistently returns the global minima.
Using the recollision probabilities and (6)- (8), CRB was calculated. For the retrieval of DASF, r BS was used. The DASF was calculated using the algorithm developed in [22]; the only modification was instead of using a reference leaf albedo, we used measured green leaf reflectance of sagebrush and bitterbrush [44]. In order to calculate the leaf albedo, leaf transmittance was acquired by inversion of PROSPECT-5 leaf model [45] using the measured leaves reflectance. In summary, the steps of our physical analysis are as follows. 1) Fit the spectrum to (4) to estimate the recollision probabilities. 2) Use estimated recollision probabilities to calculate r BS using (6)- (14). 3) Use r BS to calculate DASF and canopy scattering. 4) Canopy scattering is then used for statistical analysis.

A. Variable Selection Is Sensitive to Transformation and Scale
Using an ideal data set (materials and method), Fig. 1 shows the performance of an Ensemble model and its submodels (PLS, SVM, and RF), VIP, and BR VS techniques. We start by discussing the VIP results. Between methods, VIP performs the worst in VS by identifying numerous unrelated bands as important. In cases where there is high covariation between variables (e.g., spectral bands) but small correlation to the target (e.g., N), the standardization used by PLS can address the latter but at the cost of increasing the weight of minor variables with low signal-to-noise ratios [46]. This directly impacts the VIP as it is a filter method and uses the PLS output with no further postprocessing. This behavior can also be seen at the leaf scale (see Fig. 2), though to a lesser degree than the canopy and plot scales. Our first observation shows that the VIP is sensitive to the type of transformation. Across all transformations, the VIP identified the NIR region (∼800-1350 nm) as important predictors of leaf N. This is controversial since most of the incident radiation is reflected and transmitted (∼50% each) in the NIR region by the leaf mesophyll [47], and it is extremely difficult to identify weak N absorption bands (e.g., 910 and 1020 nm [48]) using VS techniques. While many hyperspectral studies use VIP for band selection [32], [49], [50] and, specifically, in foliar N estimation [10], [24], [51], [52], the selection of unrelated bands may require further examination of VIP. Fig. 1 shows that the BR produced the closest results to the ideal model for band selection, and the Ensemble approach follows with less ideal but comparable results. In the Ensemble approach, a spectral band is important if it is considered important by all three regression methods (PLS, RF, and SVM). The peaks in the PLS match the coefficients of the ideal model, but PLS has assigned weights to many unrelated variables. RF is more restrictive, and consequently, many unrelated bands are close to zero. This restrictive behavior, however, caused RF to miss some of the important variables. The SVM results are similar to PLS, which has been observed in other studies [37]. The improved performance of the BR and Ensemble methods can be attributed to the fact that both methods are ensembles of competing models. BR is based on the Bayesian model averaging and has theoretical advantages over standard regression analysis [17]. The Ensemble method, on the other hand, uses the relative merits of PLS, SVM, and RF. For example, the restrictive VS method of RF is balanced with the more inclusive PLS and SVM in the Ensemble approach.
When we used field data at the leaf scale, both the Ensemble and RF methods show less sensitivity to the transformations. However, the individual models of the Ensemble in isolation show sensitivity to the transformation. These transformations have been accepted as a standard preprocessing method for remote sensing of foliar biochemistry. Thus, two studies with the same data set, but different spectral transformations for predicting N can suggest different spectral bands as important predictors. At the leaf scale, our findings show that the log-transformed data set leads to more meaningful bands when compared with known N absorption regions (see Table S2 in the Supplementary Material). If assuming the leaf reflectance spectrum has the same shape as leaf transmittance, then the log transformation is an approximation of the foliar absorption spectra and is consistent with the Beer's law absorption coefficient [53].
Using the log transformation and the Ensemble approach for interpretation, most of the selected bands at the leaf scale are in the visible and mid-infrared (>1350 nm) with no bands selected in the NIR region. Selection of the visible spectrum can be attributed more explicitly to the chlorophyll content [54]. Moderate correlation is reported between chlorophyll and N [6]. In the shortwave-infrared region identified bands centered near 1655, 1715, 1900, and 2200 nm can be attributed to N absorption at 1690, 1940, and 2240 nm. Among them, the bands near 2240 and 2300 nm are more directly related to the N or proteins that carry N, while other bands can be associated with absorption by biochemicals, such as lignin, cellulose, and starch. The complex interactions between N and these biochemicals may result in misinterpretation of the selected bands and, consequently, unreliable statistical estimates of N. The Ensemble method is a VS method and does not provide predictions. Since the VS performance of BR and Ensemble is close, BR can be used for both VS and model fitting at the leaf scale. We conclude that at the leaf scale, with the assumption of no confounding factors, robust empirical methods, such as BR, are likely to provide a meaningful model for N predictions. However, it is important to choose the appropriate spectral transformation (e.g., logarithmic transformation), and results of the VS should be checked with known absorption regions.
Moving from the leaf to canopy and plot scales, the selected bands for N differ, regardless of the method used. Fig. 3 shows VS for the log-transformed data set. Other transformations can be found in Fig. S3 in the Supplementary Material. The fundamental assumption of empirical predictions is foliar N controls the reflectance [55], [56], and thus, we would expect consistency in the bands identified as important across scales. However, Fig. 3 indicates a lack of this consistency across scales. This is the first evidence that there are other elements that may have more dominant role in controlling the canopy radiation.
Using log-transformed data and focusing on the Ensemble or BR methods, the spectral regions identified as important for N are similar to the regions identified as important for a similar analysis of leaf area index (LAI) at the canopy and plot scales (see Fig. 4, and Fig. S4 in the Supplementary Material). Most of these bands lie in the red edge and the NIR region. The  NIR region in particular is known to be attributed to canopy structure (i.e., LAI). However, the predictive power of these bands for LAI is much less than N (see Tables S3 and S4 in the Supplementary Material). For example, the cross-validated mean R 2 and CV of all methods for the log-transformed prediction of N at the canopy scale are 0.61 and 16.67, respectively, while for LAI, they are 0.26 and 35.39, respectively. It is tempting to discuss this in terms of ecology and the association between canopy N and LAI; however, it is more likely a statistical problem. An explanation is that statistically significant explanatory variables (e.g., spectral bands) that have an association with a target variable might not necessarily carry the most predictive power, and the most predictive variables are not necessarily the most significant ones [57], [58].
A key distinction that makes a variable significant or predictive lies in the properties of their underlying distribution. This issue has been observed in different disciplines from genome-wide association studies [59] for disease predictions to social studies predicting civil wars [60]. Thus, the problem can be statistically framed by asking if the research goal is to find highly significant or highly predictive variables, whereas searching for both significant and predictive variables can lead to conflicting directions [57]. The notion of predictability and significance of variables has not been explored in remote sensing of canopy biochemistry. In Section III-B, we discuss the problem associated with cross validation and demonstrate that these predictive variables for N lose their predictive power when applied to an external data set, of which time or location is different from the calibration data set.

B. Cross Validation Is Overoptimistic
Generalization of a statistical method is a key concept and refers to applying a model based on a particular target population to other populations [61]. As discussed in the classical paper by Verstraete et al. [55] (1996), "a relation, obtained by statistically correlating remote sensing measurements and field observations, is useful only for those locations and times other than those used to establish the correlation." Otherwise, remote sensing provides no new information. Cross validation is accepted as the de facto standard method used in remote sensing communities to test for the generalization of the developed model. Table S5 in the Supplementary Material shows results for normal cross-validation versus validation results for the data set that was kept out of calibration (see Section II). The data sets are different in either time (year of data collection) or spatial location. While cross validation shows well enough performance of different methods, when applying the same models to an independent data set, R 2 of at least one of the methods is close to 0. Similar results are reported in other disciplines [62], [63]. Even strict cross validations may still be overoptimistic due to heterogeneity between data sets [64]. Thus, our results indicate that indeed these statistical methods are not replicable.
Changes in time and location change the distribution of feature space [65], [66] due to differences in measurements or state variables that control the soil-plant-reflectance interactions. The state variables are those that are clearly represented in RTMs [55]. The simplest statistical fix for the replicability of statistical methods is to seek models that perform well enough in the context of the leave one data set out test. None of the methods used in this paper show strong overall performance with this test. More sophisticated solutions are based on the methods that can compensate (or correct) for distributional shifts that may also be referred to as "domain adaptation" [67], [68]. The recent framework developed in [65] would enable us to identify and correct the distributional shift. The important point here is even if we correct for distribution shifting and ignore the fact that we do not have enough information about the vegetation to derive N (see Section III-C), our model is still, at best, a predictive model and is not correlative. Because we are changing the distribution of features, as a consequence, we might obtain a new set of features that might have good predictive power, even for external data sets, but with limited correlative relation with N. Our conclusion is that if we ignore the impacts of canopy structure and soil on the total canopy reflectance and if the question at hand is just prediction not interpretation, then with some statistical manipulations, we are able to produce predictive models. However, caution must be observed when interpreting these models and, in particular, at scales larger than the leaf scale. Fig. 5 shows the box plot of the estimated spectral invariants at the canopy and plot scales. Here, we assume a two-layer system, in which a layer of a canopy is on the top of a layer of soil, an assumption similar to many open canopy ecosystems. However, this approach can be extended to the layers of multiple canopies from different species and understories. The probabilities of a photon intercepted by the canopy and soil are i 0 and 1 − i 0 , respectively. P_{LL}, P_{LS}, and P_{SL} are the probabilities of photon interactions between canopy-canopy, canopy-soil, and soil-canopy, respectively. Fig. 5 shows that changing scale from canopy to plot should affect P_{SL} and i 0 but not P_{LL} and P_{LS}. This is because, the probability that a photon escapes from the canopy (1 − P_{LL} − P_{LS}) should remain independent of the soil condition (e.g., reflective versus nonreflective). At the canopy scale, the mean of i 0 is 0.17, and at the plot scale, it is 0.05. This low number simply shows the large impact of the soil on the total reflectance at both canopy and plot scales. For example, if we assume no additional interaction between photons from vegetation and soil, the total canopy reflectance is composed of 17% information from the shrub and 83% from the soil. At the plot scale, the contribution of soil can be represented in two forms: photons leaving the soil toward the sensor and photons leaving the soil and contributing to the CRB. Thus, the impacts of the soil at the plot scale are more than the canopy. Fig. 6 compares the simulation for two shrubs. The contribution of canopy reflectance to total reflectance is much higher in the green shrub. If we remove the contribution  of soil in the CRB for the dry shrub, then the residual (which is reflectance of the canopy itself) is close to 0. This is expected since there is no leaf on the canopy. Finally, Fig. 7 shows the simulations of different components of the total canopy and plot reflectance for all samples. Not surprisingly, the greater impact of soil at the plot scale, compared to the canopy scale, is observable. The next step is to estimate the impact of canopy structure on the total BRF. We removed the soil contribution in total BRF, and the residual was used to calculate the DASF (see Section II). Among 151 samples at the canopy scale, we could not calculate DASF for 49 samples. These shrubs are mostly located in California sites, suffering drought, that have few small leaves [e.g., Fig. 6 (Left)]. Thus, after removing the impact of soil, the residual is close to 0, and calculating DASF is meaningless. As expected, this problem is worse at the plot scale. In fact, we could not calculate DASF for any of the plot samples after removing soil. This is another piece of evidence that supports the rejection of using a cross-validated statistical N estimate at the plot scale (see Table S3 in the Supplementary Material). In conclusion, the majority of the information contained in the plot reflectance is from the soil (up to 95%), and after removing it, there might not be enough information to infer canopy structure or N. Thus, our statistical prediction of foliar N at the plot scale is unreliable.  R 2 of the BRF/leaf albedo ratio versus BRF relationship is an estimate of the DASF quality retrieval. A note of caution is that, in theory, it is still possible to estimate DASF values with an R 2 close to 1 for small BRF. However, the estimated DASF is very small. Normalizing canopy BRF to a small DASF will result in large canopy scattering (W more than 1), which is impossible. This is important since filtering sparse vegetation based on this R 2 has been recently suggested [69], which may lead to incorrect canopy scattering. For green shrubs, DASF has a positive correlation (see Fig. 8) with the shoulder of the NIR region (800-850 nm). This is in line with the radiative transfer process and the findings by Knyazikhin et al. [22] (2013). Fig. 9 shows canopy scattering for the shrubs, for which we were able to calculate DASF. Canopy scattering mimics a typical leaf albedo and is insensitive to canopy structure [22], demonstrating success with the DASF approach. In Section III-D, we experiment with using canopy scattering across all wavelengths to predict foliar N. Table I shows the results of corrected canopy samples versus their counterpart samples, which were not corrected for soil and canopy structure. It can be seen that after correction for canopy structure and soil, none of the methods or transformations predict N for canopy-scale data. Since we have selected the log transformation for our statistical analysis in Section III-A, we also provided its 1:1 plots between measured and predicted N before and after corrections (SI, Fig. S5 in the Supplementary Material). The correlation between N and reflectance implies association (dependence) rather than causation. Association can exist between two variables in the presence or absence of causality [70]. It is common to deduce a causal relationship from a correlation. For any causal claim to be verified, one should consider the might be condition. For example, what might be the case for the N-light relationship if canopy structure and soil did not exist? This is known as the theory of counterfactual [71]. Causality is the fundamental property of a system, which means a causal relationship would be invariant to the changes of the system. Our physical analysis provides the basis to test for the counterfactual outcomes. Since the results of the N prediction before and after correction for soil and canopy structure have changed, a causal relationship between reflectance and N cannot be concluded. Consequently, we reject our null hypothesis that removing confounding factors improves predictions. One argument is that N-reflectance correlation implies a functional association [72], which is consistent with ecological understanding (i.e., plant physiology). From our analysis, correction for canopy structure and soil leads to no correlation. This does not, however, invalidate the functional association. Undoubtedly, N plays an important role in different canopy processes; however, not all associations lead to correlation [70]. The functional association can be translated into a priori information. Currently, remote sensing alone is not able to incorporate such a priori information into the predictions of N. Dynamic vegetation models (DVMs) can be used to reconcile the theory of remote sensing and ecology. For example, these models incorporate both ecological processes (e.g., photosynthesis) and light-canopy interactions (e.g., RTMs).

D. Correlation and Causation and the Concept of Counterfactual
To account for the role of canopy structure-and to some extent soil, two solutions have been proposed. First, empirical models are applied to adjusted spectra that have been filtered with a normalized difference vegetation index and height and, then, adjusted with a brightness normalization [73]. A second approach is to add light detection and ranging (LiDAR)derived canopy structural parameters, such as canopy height or fractional cover, to the feature space to construct the statistical model [52]. In both approaches, multiple scattering is not explicitly solved. The impact of canopy geometry, such as orientation and arrangement of leaves and branches, as well as multiple scatterings between the canopy and different layers of understory, including soil, confound the N signal. Adding LiDAR variables to the feature space makes the final model more complicated to interpret rather than simpler.

IV. CONCLUSION
Due to the lack of inclusion of N in leaf RTMs, N, historically, has been estimated with remote sensing data using statistical methods [6]. The interpretation of the statistical models depends on the spectral bands selected during the process of model fitting. We have shown that different models can identify different important bands (see Fig. 1). Moreover, each model is sensitive to the type of transformation applied to the spectra before model fitting. These experiments show that common VS routines for foliar biochemistry studies at scales coarser than the leaf may be misleading. Strong prediction rates reported in remote sensing studies are often based on cross validation, which may be overoptimistic. None of our empirical methods could reproduce cross-validation results when applied to an external data set. Thus, we concluded that these methods are not replicable.
We extended the physical work of [22] to cases, where vegetation is sparse, and soil cannot be ignored. At the plot scale, the impact of soil is a dominant confounding factor, and in more extreme cases, such as drylands, there might not be enough information to retrieve biochemistry or some canopy structure variables, such as DASF. Recent developments using full-waveform LiDAR may solve the problem of canopy structure [74]. Removing confounding factors at the canopy and plot scales leads to different statistical models that might or might not have prediction power. We discussed this against the theory of counterfactuals, leading to rejection of our null hypothesis that removing confounding factors will improve empirical predictions. The idea of functional association, which is used to justify statistical methods, is best suited for remote sensing coupled with DVMs.
Our overall conclusion is that if we are interested in predicting N with remote sensing, then we might be able to produce such empirical models, in particular with the growing body of machine learning algorithms. However, one must be cautious in interpreting these models, particularly in complex ecosystems, because they may be affected by the canopy structure, soil, spectral transformation, and the type of model implemented. With advancements in spaceborne hyperspectral and full-waveform LiDAR observations and, thus, sophisticated measurements of ecological processes (e.g., photosynthesis, solar-induced fluorescence, and so on), there is strong potential to gain new insights of changing ecosystems [75]. Empirical methods serve an important role in analyzing the products of these newly developed sensors. Given the issues related to empirical methods, the remote sensing community should be cautious about the statistical tools they use. We encourage deeper discussions of these methods and the need to explore new fields in statistics for remote sensing, such as a rigorous investigation of causality and predictivity versus significance of variables.
Our results might imply that radiative transfer modeling is the best approach to estimate N. However, currently there is no reliable leaf scale RTM that incorporates N because of the lack of absorption coefficients and other information due to the many different types of bonds N carries [76]. Moreover, RTMs at both the leaf and canopy scales carry multiple implicit and explicit assumptions that might cause relative error up to 70% [77]. For example, the assumption of considering canopy as turbid media in 1-D RTMs leads to the definition of effective LAI, which is different from true LAI measured [78]. Regardless, these deficiencies do not make RTMs useless. Most DVMs require RTMs to simulate the CRB. Thus, a potentially better approach to solve the problem of N retrieval is to incorporate it with the DVMs (i.e., through data assimilation) rather than using remote sensing data in insolation. For example, ED2 [79], [80] and CLM [81] DVMs use a simple two-stream canopy RTM [82]. Incorporation of more sophisticated 3-D RTMs into these models might improve their performance [83] and, consequently, facilitate better N predictions. In addition, the canopy spectral invariants theory has produced results close to 3-D RTMs [84] with much faster performance, and thus, a logical next step is to incorporate the generalized canopy spectral invariants theory into DVMs. He is currently an Assistant Professor with Ohio State University, Columbus, OH, USA. His research interests include mapping, monitoring, modeling, and managing terrestrial environments across scales, especially in the context of global environmental changes.
Lucas P. Spaete received the B.A. degree in program in the environment from the University of Michigan, Ann Arbor, MI, USA, and the M.S. degree in forest ecology and management from the Michigan Technological University, Houghton, MI, USA.
He is currently a Senior Spatial Analyst with Superior Geospatial, Superior, WI, USA. His research interests include remote sensing of vegetation using a combination of optical and LiDAR sensors.
Marie-Anne de Graaff received the B.S. degree in forestry and nature management and the Ph.D. degree in environmental science from Wageningen University, Wageningen, The Netherlands.
She is currently an Associate Professor with Boise State University, Boise, ID, USA. Her lab broadly studies how changes in climate and land use affect ecosystem processes that drive the global carbon cycle.