Enhanced automated canopy characterization from hyperspectral data by a novel two step radiative transfer model inversion approach

: Automated, image based methods for the retrieval of vegetation biophysical and biochemical variables are often hampered by the lack of a priori knowledge about land cover and phenology, which makes the retrieval a highly underdetermined problem. This study addresses this problem by presenting a novel approach, called CRASh, for


Introduction
In agriculture, remote sensing in the reflective optical domain is often used as a cost-effective method to determine spatial and temporal variations in canopy state variables such as leaf area index (LAI) and leaf chlorophyll content.Such variables are driving variables in models predicting growth and yield, and are foreseen to play a major role in precision farming, which aims at optimizing crop management practices to spatial and temporal variability within fields in order to maximize outputs and profit while minimizing ecological impacts [1,2].In order to be effective, methods providing such indirect estimates from remote sensing data should be highly accurate [3].
Canopy reflectance is controlled by a large number of factors, including the optical properties of vegetation elements (e.g., leaves, stems, and fruits) and soil, and their arrangement, size, and area share in the vegetation canopy.Moreover, at-sensor radiance is highly dependent on the configuration of viewing and illumination angles as well as on the prevailing atmospheric conditions [4,5].
Remote sensing techniques for estimating vegetation characteristics from reflective optical measurements are either based on statistical-empirical or physical approaches, both having their advantages and disadvantages [6].Statistical-empirical methods are typically based on a regression function between measured biochemical or biophysical properties and spectral measurements in the form of a vegetation index (VI).A VI is a combination of a limited number of spectral bands, selected and combined in a way that enhances spectral features related to the variable of interest while reducing undesired effects caused by variations in soil reflectance, sun and view geometry, atmospheric composition, and other leaf or canopy properties [6].Classical VIs such as the NDVI [7] and SAVI [8] have been based on broad spectral bands in the visible and near infrared (NIR) part of the spectrum and are particularly suitable to monitor structural vegetation variables such as LAI and fraction of green cover [9].With the recent development of hyperspectral sensors, new indices have been developed that exploit the information contained in narrow absorption features and the shape of the spectral curve.Such indices have facilitated more accurate estimations, especially of leaf biochemicals like chlorophyll, water, and cellulose [10][11][12].VI based approaches are computationally fast, but a fundamental problem is their lack of generality.Since canopy reflectance depends on a complex interaction of internal and external factors that may vary significantly in time and space and from one crop type to another, no universal relationship between a single canopy variable and a spectral signature can be expected to exist [13].Consequently, to be valid over a wide range of species, canopy conditions, and view/sun configurations, these relationships need to be recalibrated for each specific situation, which is a tedious and costly task [14][15][16].For this reason, the relationships between VIs and biochemical and physical variables are often established by the use of canopy radiative transfer models (RTMs), which explicitly describe the interactions between solar radiation and the elements constituting the canopy using physical laws.RTMs thus allow to generate a virtually endless number of canopy reflectance spectra for which the input variables are known and illumination and view conditions can be controlled [10,[17][18][19].Regression equations developed using RTMs are therefore often more robust than those based on in situ measurements.They are often referred to as predictive equations or predictive relationships [20].
Pure physical retrieval methods involve the inversion of a RTM.The great advantage of RTM inversion is the fact that all variables influencing top of canopy reflectance can be explicitly accounted for.Various strategies have been proposed for the inversion including numerical optimization methods [21][22][23], look-up table approaches [24][25][26] and artificial neural network methods (e.g., [27][28][29]), which all have their strengths and weaknesses in specific situations [6].The inversion of canopy RTMs is by nature an ill-posed problem mainly for two reasons [24,25].On the one hand, several combinations of canopy biophysical and biochemical variables have a mutually compensating effect on canopy reflectance thus leading to very similar remote sensing signals.On the other hand, measurement (e.g., radiometric calibration, georectification, atmospheric correction) and model uncertainties and simplifications (e.g., assumptions on canopy architecture) may induce large inaccuracies in the modelled canopy reflectance [30].Thus, for one and the same input spectrum a large range of plausible results is usually obtained [21].
To reduce the effect of multiple solutions, regularization is required, which involves either the introduction of additional spectral constraints or the introduction of some a priori information about the expected estimates of the variables.Spectral constraints can be obtained by enclosing neighbourhood information through an object-based approach [28] or by increasing the spectral dimensionality of the observation, either by including multiple view angles [25,31,32], or by increasing the number of spectral bands [33,34].Although hyperspectral remote sensing systems are known to provide additional information on vegetation vitality in waveband ranges that are not accounted for by most multispectral sensors [35,36], up to date such systems are rarely used on an operational basis for the characterization of agricultural crops.Several sources of a priori information about the variables can be considered for remote sensing applications, including ancillary data measured on site, estimates provided by another sensor, knowledge of the dynamic evolution of the biophysical variables over time [37,38], and knowledge about the typical distribution of the input variables in a particular development stage [24].Land cover classification schemes help to split the problem into sub-domains for which prior information is attributed separately [39,40].Nevertheless, land cover classification may introduce problems due to misclassification and attribution errors and lead to artefacts at the limit between classes translating into more chaotic spatial or temporal variation of the solution [41].Moreover, many of the biomes found at the Earth surface are better characterized by gradual transitions then by a crisp classification.
In order to guarantee timely decision support for crop management strategies, remote sensing based products for agricultural applications should be delivered typically within a few days after data acquisition, which implies the use of automated methods.Therefore, this study explores the possibilities of developing a completely automated and image based RTM inversion approach for the concurrent retrieval of the key crop variables LAI, leaf chlorophyll content, leaf water content, leaf dry matter content, and average leaf angle.The approach, which we call CRASh (Canopy variable Retrieval Approach based on PROSPECT and SAILh) is based on the combined leaf reflectance model PROSPECT [36,42] and the canopy radiative transfer model SAILh [43,44].Innovation of the approach is not the automated model inversion per se, but the exploitation of novel methods to ensure stable RTM inversions for mono-temporal high resolution data in an operational setting where inclusion of a priori information on the variables is strongly hampered.Whereas operational algorithms for medium to low resolution imagery can rely on more frequent coverage, which facilitates land cover classifications in support of retrievals optimized for specific biomes [13,41], a similar option is not offered for the locally operating high resolution systems.Thus, alternative ways are exploited.To regularize the inverse problem in the variable domain, RTM inversion is coupled with an automated land cover classification and a semi-empirical approach based on vegetation indices and RTM simulations.As suggested by [33,34], hyperspectral data is used to further constrain the range of possible solutions in the spectral domain.Even though the CRASh approach has been developed for the inversion of entire images, in this study we confine validation of the approach to simulated and field measured hyperspectral reflectance spectra.In this way we can focus on the performance of the approach itself while excluding inaccuracies resulting e.g., from co-registration errors or scaling issues.Performance of the approach is analysed in terms of prediction accuracy and bias.Its performance is also discussed with regard to a broadly used conventional RTM inversion approach.

Methodological Overview
A schematic overview of the automated CRASh canopy variable retrieval approach is given in Figure 1.The approach is based on the inversion of a canopy radiative transfer model (Section 2.2.).In order to let the inversion take place in a restricted domain of the space of canopy realization, the top of canopy spectra input to the inversion are first submitted to an automated spectral land cover classifier (Section 2.3.).RTM inversion is based on lookup tables (LUTs, Section 2.4.) which are generated for each land cover class separately (Section 2.4.1.).Minimization between the observed reflectance spectrum and the reflectance spectra contained in the LUT takes place first in the radiometric space, and then in the canopy variable space (Sections 2.4.2., and 2.4.4., respectively).The latter is based on a first estimate of the solution based on predictive regression equations between selected vegetation indices (VI) and each variable input to the radiative transfer model (Section 2.4.3.).The individual components are discussed in more detail in the respective sections.Notice that although the inversion scheme was originally designed for image data, it just as well works with a collection of field spectra.

The Radiative Transfer Model
An automated RTM approach for local applications postulates a lack of a priori knowledge about the land cover and the actual status of the vegetation canopy under consideration.This impedes the use of a complex 3-D RTM requiring many input variables, as insufficiently characterizing canopy structure in such a context leads to an increased number of solutions to the ill-posed problem [45].For this reason, we based our approach on the leaf reflectance model PROSPECT [36,42] and the canopy model SAILh [43,44] which proved their value in a large number of studies and applications (For a comprehensive historic review see [46]).The coupled 1-D turbid medium PROSPECT+SAILh model requires only nine input variables: leaf chlorophyll content Cab (μg cm −2 ), leaf water content Cw (g cm −2 ), leaf dry matter content Cdm (g cm −2 ), leaf brown pigment content Cdm (unitless), leaf mesophyll structure index N, leaf area index LAI (m 2 m −2 ), average leaf angle ALA (°), hot spot variable HS, and soil brightness BS.Canopies demonstrating a complex 3-D structure are not accurately enough characterized with PROSPECT+SAILh, so that application of the model should be confined to canopies obeying the turbid medium assumption in which scattering elements (leaves) are randomly distributed in the canopy volume.This assumption is generally legitimate for most agricultural crops.

Land Cover Classification
To let the inversion take place in a restricted domain, spectra are first automatically classified with the SPECL spectral land cover classifier [47].SPECL performs a spectral classification of the reflectance cube based upon template spectra for the Landsat Thematic Mapper (TM) reference wavelengths.The template spectra consist of typical vegetation covers, soil, sand, and water.Rather than explicitly assigning land use or cover type, vegetated surfaces are segregated into various "greenness" classes such as "sparse", "dark", "average", or "bright vegetation" (Figure 2).SPECL uses a framework of decision rules to decide whether a spectrum is assigned to a specific class or not.An overview of the land cover types and decision rules can be found in Table 1.The choice to base land cover classification on the Landsat TM band configurations and not on the full hyperspectral reflectance cube is motivated by the fact that the specific absorption features of the land cover types considered in this study are broad enough to allow for a classification based only on this subset of bands.Besides, using Landsat TM template spectra with a limited number of broad wavebands ascertains that spectral classification can be performed independently of the band configuration of the considered hyperspectral sensor.In this way, the spectral classifier does not need to be recalibrated for each sensor which facilitates its use in an operational environment.

Lookup Table Inversion
Model inversion is based on a lookup table (LUT) approach.LUT based approaches are a trade-offs between the commonly used, highly flexible, but expensive iterative approaches and neural networks which, once trained, are very fast, but more difficult to adapt to changing local conditions [6,37].Using a LUT approach involves two steps: i) generation of a synthetic data set, the LUT, and ii) definition of the solution corresponding to the LUT entries that best match a given measurement.
The cost function to minimize between measurement and model may be derived from the maximum a posteriori estimator [24,48].The solution is thus found by minimizing: where r is the ensemble of radiometric observations (depending on wavebands and observation geometries), r s the radiometric signal simulated with the RTM, v the vector of variables used to simulate r s and v p the vector of expected values of the variables.M and P are the variance-covariance matrices of the measurements and the prior estimates, respectively.
In the case of an automated approach, a priori information about the variables is generally absent or only very vaguely defined.Therefore, we propose a new method for a definition of a first a priori estimate (Section 2.4.3.).As we will see, the first estimates obtained by the proposed method partly depend on the radiometric measurement.Thus, it would be incorrect to speak of a priori knowledge in a strict sense and we decided to abandon common theoretical perception where both parts of the cost function are simultaneously explored and weighted according to the respective covariance matrices.Instead we explore radiometric information and information on the variables in two successive steps (Figure 1).The successive exploration of radiometric and a priori information has been successfully implemented by other authors too [37,49].

Lookup table generation
Based on the SPECL land cover classification, PROSPECT+SAILh is used to generate a LUT for every available class, using the atmospheric characteristics, and view/sun configuration during data acquisition.A soil reflectance spectrum corresponding to an average of local measurements is used to characterise background reflectance.Canopy variables in every class are sampled according to an "experimental plan" [27] in which the total range of expected values is subdivided into a fixed number of intervals from which canopy variables are randomly drawn.The range of variation of each variable depends on the land cover class, although some overlap is maintained between adjoining classes to account for misclassifications.As the number of canopy reflectance realizations increases exponentially with the number of free variables, physically based logics were used to keep computation efforts within reasonable limits.The number of sampling intervals per variable for a given land cover class is determined by the sensitivity of canopy reflectance to the variable in question.For example, BS and Cbp (indicating senescence) receive only one interval for dense, vigorous vegetation classes such as dark and bright vegetation, while LAI and Cab ranges for these classes are subdivided into 8 intervals.The LUTs constructed in this way contain between 30,375 and 141,750 entries in total.Distribution laws of each variable and for each class were determined in such a way that the probability density was proportional to the sensitivity of the reflectance to the variable considered.This allowed better sampling of the domains where reflectance is more sensitive to the considered variables.Where appropriate, a transformation of the distribution according to [26] and [24] was applied.A detailed overview of the employed sampling schemes can be found in Appendix I.

Exploiting radiometric information
Since the (co-)variance between measurement errors is very difficult to characterize, this is commonly ignored.Most authors therefore usually minimize for radiometric information by calculating the root-mean-squared-error (RMSE) between simulated spectrum and LUT entry (e.g., [37,50,51]) where R i and R i LUT are the reflectance of the input spectrum and the LUT reflectance in band i, respectively, and N equals the number of spectral bands.Nevertheless, in our case we can approximate measurement uncertainty by calculating the spectral covariance from all spectra attributed to a specific SPECL land cover class.Thus, for all spectra within a certain land cover class the same covariance matrix is used.This class based covariance is similar to the object based signatures proposed by [28].If the covariance matrix appears non-invertible, the variance of the wavebands is used instead.Under this assumption, measurement errors are assumed to be uncorrelated between bands and M in Equation 1 becomes a diagonal matrix.It should be noticed however that M only accounts for non-systematic uncertainties, thus ignoring possible biases due to model assumptions and parameterizations.J(r) is calculated for each LUT entry.The 20% of LUT entries with the best radiometric match (i.e., the smallest J(r) values) are considered possible solutions and subjected to minimization in the variable space (Section 2.3.3).The 20% threshold is consistent with what other authors proposed in earlier studies [37,49].

Using predictive equations for a first guess solution
Apart from the variable ranges and statistical distributions used to simulate the LUTs for the respective land cover classes, no further a priori information on variables is available in an automated approach where each input data set is regarded independently.We therefore propose the use of predictive equations based on RTM simulations to obtain a first estimate of the solution.In the current approach, no additional RTM simulations have to be carried out, since a large number of canopy realizations and accompanying variables for each land cover class are already contained in the LUTs.Moreover, canopy realizations have been optimized for the prevailing illumination/observation geometry, atmospheric conditions and local soil background.In order to make the regression function more robust and realistic, approximations for sensor, measurement, and model uncertainties are attributed to the simulated spectra in the LUT on a random Gaussian basis.According to the average specifications of various sensors (e.g., HyMap, MODIS, Landsat TM), the sensor noise level is set to 0.01 (1%) for wavelengths severely affected by water absorption and to 0.003 (0.3%) for the other wavelengths [52].Measurement and atmospheric uncertainties, mainly resulting from errors in atmospheric correction, are set to 0.01 (1%; in reflectance units) for the blue range, and 0.006 (0.6%; reflectance units) for the rest of the spectrum [53].Finally, modelling errors are approximated by the relative standard deviation of spectral measurements of a garden cress performed under laboratory conditions [54].Such a laboratory set-up excludes any disturbing influences of atmosphere, positioning, or view constellation and is therefore representative for variations in the canopy alone.According to the wavelength range these relative errors range between 0.05 (5%; NIR region) and 0.4 (40%; bands severely affected by atmospheric absorption).
The predictive performance of a certain vegetation index depends on the variable to be estimated, and the canopy, atmospheric, soil, and illumination/view conditions at the time of data recording.Therefore, it was decided not to select one specific vegetation index, but to test for every new situation, land cover class, and variable the predictive performance of a wide range of selected VIs.For this purpose, several VIs developed for vegetation analysis are calculated and plotted versus the different canopy variables.The selected VIs and their appearance are listed in Appendix II.They can be roughly subdivided into four different categories: i) Broadband vegetation indices, originally designed to estimate canopy structural variables such as LAI and fractional cover; ii) Chlorophyll indices which are narrow band indices designed to estimate leaf chlorophyll content; iii) Narrow band water indices, designed to estimate plant water content or detect water stress and having one or more bands in spectral regions sensitive to water absorption, and iv) dry matter indices which are sensitive to one or more of the chemicals composing Cdm, such as lignin and cellulose.For a detailed discussion of characteristics and performance the reader is referred to the original publications or to several publications describing and comparing various VIs [10,18,55].In order to find the best predictive equation, both linear and exponential fits are tested for each combination of variable and VI.For each variable and for each land cover class, the VI and regression function providing the highest predictability (in terms of highest coefficient of determination R 2 and lowest root mean squared error of the residuals) were selected to compute a first estimate of the solution from the measured spectra.

Minimizing for first guess of the solution and defining the final solution
Minimization between a priori estimates and the variables in the reduced LUT was performed using: where v p is the vector with the prior estimates of variables that are left free in the inversion.v s is the set of free variables that was used to simulate r s in Equation 1.The covariance matrix P accounts for the non-systematic model errors related to the predictive equations themselves and for the random errors in the predicted values resulting from uncertainties in reflectance (e.g., instrument noise and coregistration errors).This matrix is determined individually for each land cover class and calculated from the a priori estimates of all spectra attributed to this class during land cover classification.W is a diagonal matrix containing the weights that different variables receive in the cost function.These weights correspond to the coefficients of determination that were assigned while establishing the predictive regression equations, with high R 2 values corresponding to high weights.The coefficients in W are indicative for the quality of the regression model.They represent the systematic errors and are therefore complementary to P. Thus, prior estimates that are based on high correlation regression models are emphasized while those based on poor predictive equations are suppressed.The smallest 20% of sorted J(v) values is used to select those entries from the reduced LUT that are used for calculating the final solution of variable v j .The latter is done in a way that the set of variables leading to the smallest J(v) receives largest weight: where N is the number of entries selected from the LUT reduced during variable minimization and v j k the value of variable v j for LUT entry k.The weight w k , attributed to each particular entry selected from the LUT, is defined by: where J k (v) is the result of the cost function for LUT entry k.In principle, all variables input to PROSPECT+SAILh (Section 2.2.) are output by CRASh, although not all variables have importance for further applications.

Testing Model Performance
The CRASh inversion approach was first tested on synthetic data, which offers the possibility to study its performance for a large range of differing canopy realizations.In this way, the approach can be tested for all variables left free during inversion.Based on a synthetic data set, especially characteristics and trends concerning the inversion approach itself can be identified, since the physical correctness of the RTM does not have to be accounted for.Secondly, the CRASh approach was used to characterize intensively used grasslands in Central Europe from field spectrometer measurements while validating the results with in situ measurements.In this way also the effect of including actual measurement and model uncertainties and biases could be studied, yet under a more limited range of canopy realizations.

Synthetic Data Sets
Top of canopy hemispherical directional reflectance factors representing a wide range of crop types and growth stages were calculated with PROSPECT+SAILh using the input variables given in Table 2. Cab, Cw, N, LAI, ALA, and BS varied, whereas Cw and Cdm were fixed at a ratio of 1:4, representing a leaf water content of 80%.Cbp, being of little relevance for green crops, was set to a value close to zero while HS was fixed.For background characterization we used a local soil spectrum measured at the Waging-Taching test site (Section 3.2.1.).Canopy reflectance was calculated for the sensor configuration of the HyMap airborne imaging spectrometer.This sensor covers the electromagnetic spectrum between 0.438 and 2.483 μm with a total of 126 contiguous bands having a full-width-half maximum between 11 and 22 nm [52].View direction was assumed nadir, the solar zenith angle was set to 35°, and the fraction of diffuse radiation was calculated with MODTRAN4 [53] for a typical mid-European atmosphere (rural aerosol type, visibility = 35 km, water vapour column = 2.0 cm).We considered two scenarios: in the first scenario, the PROSPECT+SAILh model used to generate the synthetic data set was considered free of uncertainties and bias; in the second scenario, we attributed realistic errors to the synthetic spectra according to the approximations for sensor, measurement, and model uncertainties outlined in Section 2.4.3.Both synthetic datasets contained a total number of 270 spectra.In order to gain insight into the improvements provided by the CRASh inversion scheme, it was compared to a more conventional radiative transfer model inversion scheme based on a single LUT and minimizing only for spectral reflectance using the RMSE (Equation 2).Selection of the final solution is in line with the one proposed in Section 2.4.2. for the CRASh approach, i.e., the 20% of LUT entries with the best radiometric match were used for calculating the final solution of the variables according to the weighting functions presented in Equations 4 and 5.The accuracy of both model inversion approaches were determined by calculating the RMSE and average relative bias between retrieved values and the variable values used for generating the synthetic spectra.Assuming automated model inversion, no a priori information on land cover or crop status is available.Therefore, the LUT used for the conventional inversion approach should cover all canopy realizations possibly found for agricultural land use and thus span all the LUTs created for the CRASh approach (Appendix I).The resulting comprehensive LUT contains 388,000 entries.Variable ranges and distribution are given in Table 3.

Test site
Field spectrometer measurements and validation data were taken in the catchment of Lake Waging-Taching, close to Salzburg in Southeast Germany (Figure 3).Within the study area, two intensively managed meadows were selected for detailed in situ measurements.The two grasslands were sampled by spectroradiometric and biometric field measurements at June 30 and August 4, 2003.At both dates, the first meadow (MEA1) was characterized by a fully developed canopy, while the second meadow (MEA2) had been recently cut and hence contained a considerable amount of dry material.On each field five to seven sample plots, covering an area of 1 × 1 m 2 , were selected on a stratified basis.

Biometric sampling
Leaf fresh weight was determined by harvesting the total above ground biomass of the square meter while leaf dry weight was determined after oven-drying the sample at 70 °C for 36 hours.The total amount of water was calculated by the difference between fresh and dry leaf weight.LAI of the entire above ground vegetation of each sample was determined by applying a previously established linear relationship between scanned leaf area and wet biomass (see [56] for details).Cdm and Cw were calculated by dividing leaf dry weight and the total amount of water, respectively, by the LAI [17].Summary statistics of the measured variable are presented in Table 4.

Field spectrometer measurements and RTM inversion
Spectral properties of each single sample plot were measured exactly on the location where subsequently the biometric sampling would take place.Per sample plot, ten spectroradiometric measurements were taken from nadir using a portable Fieldspec PRO FR spectrometer (Analytical Spectral Devices, Inc.).The radiance measurements were directly converted into hemispherical directional reflectance factors by taking a Spectralon™ panel as white reference.The single spectra were corrected for the spectral properties of the applied Spectralon panel, deviations of the white reference off the 100 % reflectance line, and spectral jumps between the VNIR and SWIR1 detector [57].Subsequently, the average reflectance per sample plot was calculated and resampled to the spectral characteristics of the HyMap airborne imaging spectrometer [52].Bands strongly affected by atmospheric gases and aerosols (i.e., wavelengths < 0.470 μm, between 1.300 and 1.460 μm, between 1.780 and 1.970 μm, and > 2.400 μm) were not considered during inversion, which resulted in a number of 106 bands.Due to technical problems with the spectrometer, only 19 of the initially 23 sample plots could be used for further analysis.Both the spectral RMSE minimization (Equation 2) based on the comprehensive LUT described in Section 3.1.and the CRASh inversion scheme were applied to these spectra.

Synthetic Data Sets
Table 5 shows that the CRASh approach generally provides more accurate estimates than the ones obtained using a comprehensive LUT and the RMSE of reflectance values as cost function (Equation 2).Estimates of Cab, Cw, Cdm, N, and HS are significantly more accurate for the CRASh approach, while accuracies of LAI and ALA retrievals are similar for both approaches.Only estimates of BS are slightly less accurate when using CRASh.Concerning the average relative bias, improvements provided by the CRASh approach for leaf variable estimates and HS are even larger.Only for LAI there is a significant increase in bias compared to the spectral RMSE based inversion.Figure 4 compares the retrievals of the conventional spectral RMSE based approach with those of the CRASh approach for the variables Cab, Cw, Cdm, and LAI.
The plots show the generally higher retrieved values for the conventional RMSE based approach.For the leaf variables this confirms the higher positive biases reported in Table 5.For Cab also the smaller range of solutions for the spectral RMSE based approach compared to CRASh becomes obvious.For the other three considered variables the ranges of estimated values are very similar for both approaches.For LAI the estimates from both approaches coincide well for high and low values, while for intermediate values spectral RMSE based estimates are higher than CRASh based ones.Given the average LAI bias close to zero for the conventional RMSE based approach and the values below zero for the CRASh approach, this might indicate that CRASh slightly underestimates LAI especially in the intermediate range.Adding uncertainty to the synthetic observations appears to have only little influence on the retrieval accuracy, both for the CRASh inversion procedure and the spectral RMSE based inversion scheme.Shifts in relative RMSE and bias values for all cases are within ±5%.These results are indicative for the robustness of the two presented RTM inversion approaches to uncertainties in the input measurements.The large number of spectral bands simultaneously considered during inversion probably has a positive influence in this respect, by levelling the random noise.For model inversion schemes considering a reduced number of bands the effect of adding noise might become more significant.Relative RMSE values obtained with the CRASh approach range between 20.5% and 74.0% for the case were a perfect model is assumed, and between 20.6 and 78.0% when uncertainties on the input spectra are assumed.In both cases these values were obtained for ALA and HS, respectively.

Field Spectrometer Measurements
Table 6 and Figure 5 show the performance of the spectral RMSE and CRASh in estimating Cw, Cdm and LAI for meadows from spectral field measurements.For the CRASh approach, obtained RMSE values of Cw are comparable to those obtained for the synthetic data set.Biases of Cw retrievals from the spectrometer data are on average more negative than those obtained for the synthetic data sets.Although correlation between measured and estimated Cw is nearly absent, all data pairs, except for one outlier, are concentrated close to the 1:1 line (Figure 5).Accuracies of Cdm retrievals are significantly higher than those obtained for the synthetic data set.This improved accuracy is also expressed by the significantly reduced relative bias, being close to zero.Nevertheless, even though estimates and in situ measurements are located close to the 1:1 line, variation of the differences between measured and estimated differences is still considerable and no clear correlation between measured and estimated values is observed.Also for LAI observed bias improved compared to the synthetic case.However, RMSE is about 10% larger.Of the considered variables only LAI shows a high correlation between measured and estimated values.In contrast to the synthetic data set, intermediate LAI values were slightly overestimated, whereas large LAI values were underestimated (Figure 5).Estimates obtained with the spectral RMSE as a cost function showed clear differences with respect to the simulated data sets.Accuracies of Cw and Cdm clearly improved for the measured data set, whereas the bias encountered for Cdm is still high.Nevertheless, estimation accuracy obtained for Cw, both in terms of RMSE and average bias, is higher than that obtained with CRASh.Regarding LAI, the accuracy and bias obtained for the measured field data are much higher than for the simulated data set and results are clearly inferior to those obtained with CRASh.The scatterplot for spectral RMSE based LAI estimates in the bottom row of Figure 5 shows that especially low measured LAI values are significantly overestimated whereas the estimates for higher measured LAI values are generally in agreement with those obtained using the CRASh approach.

Land Cover Classification
Classification results of the 19 field spectra considered in this study are found in the second column of Table 7.The classification results generally match the expectancy, i.e., the spectra taken at the recently cut meadow (MEA2) were attributed to SPECL class "dark vegetation", "average vegetation" and "mixture of soil and vegetation, whereas spectra taken at the vigorous MEA1 were predominantly attributed to the classes "average vegetation" and "bright vegetation".Three spectra taken at MEA1 on plots with lower LAI values were attributed to the class "mix vegetation/soil".Despite the classification of spectra taken at one and the same field into different classes, no inconsistencies in retrieval accuracy resulting from these classification anomalies could be observed.Most likely, this is due to the overlap in parameterization between the classes.Applying CRASh to image data will have to reveal if the chaotic spatial or temporal variation of the solution observed by other authors [41] is overcome with the parameterization used in this study.
Including the SPECL land cover classification attributes leads to improved model inversion due to several reasons: First, it facilitates a more explicit characterization of spectral uncertainties of the measurements and calculating the covariance between different wavebands (Section 2.4.2.).Even though accounting for the covariance theoretically leads to the soundest solution, its influence on the obtained results strongly depends on the band configuration of the sensor and the considered variable [33].Second, introducing the SPECL land cover classification made it possible to optimize the LUTs for each land cover type and, hence, to let model inversion take place in a restricted variable domain.This in turn reduces the uncertainties related to the individual inversion results.Finally, the LUTs that were calculated separately for each land cover class and illumination/observation geometry allowed for the generation of semi-empirical predictive equations between spectral reflectance and vegetation indices optimized for each specific situation and hence to improved first guesses of the solution.For a detailed discussion on the influence of the single components we refer to [33] and [56].

A Priori Estimates of the Solution
The quality of the a priori estimates, and thus the final estimate of a certain variable, hinges on the accuracy of the predictive regression equations.As expected, the performance of the predictive regression equations strongly depends on the land cover class and the ensemble of canopy, observation, and illumination characteristics, which, in turn, determines the radiometric effect of each variable (Table 7).
In general, regression functions for Cab, Cw, Cdm, LAI, and ALA perform well with medium to high R 2 values and moderate RMS errors.The predictive power of VIs in estimating LAI (expressed by R 2 and RMSE) decreases with increasing canopy density.Saturation of spectral reflectance and, hence, an accurate retrieval for LAI values is a well-known phenomenon [22,24].A reverse trend can be observed for Cw and, to a smaller degree, for Cdm.The trend observed for Cdm is somewhat remarkable since the spectral influence of this leaf constituent is usually masked by leaf water in healthy vegetation [58].Correlations obtained for ALA are similar for all classes.N, Cbp, HS, and BS show only poor correlations, independent of land cover type.The obtained correlations imply that according to Equation 3 minimization between a priori estimates and variables in the LUT is dominated by Cab, Cw, Cdm, LAI, and ALA while other variables only play a small role.Table 7. Best performing predictive equations based on simulated spectra in different SPECL classes.The second column shows the results of the SPECL classification applied to the field spectra considered in this study, categorised according to the meadow on which they were taken.For a description of the VI abbreviations see Appendix II.For variables that have a fixed value in the LUT, no predictive equations could be generated.The best performing VIs selected for each variable were in good agreement with the purpose they were designed for (Table 7).For example, for the canopies and observation/insolation geometry considered in this study Cab was best predicted with the chlorophyll indices LCI and the Red Edge Inflection Point formulation of [59].Variations in both Cw and Cdm for all classes were best predicted with LWVI1 and LWVI2.The reason that not one of the typical dry matter indices was most suited for predicting Cdm has to be sought in the low sensitivity of spectral reflectance to changes in Cdm in the considered green canopies [58].Finally, LAI and ALA were best predicted with broad band indices (RDVI, TVI, MTVI1, TSAVI) specifically designed for estimating structural properties.The fact that exclusively narrow band indices performed best in predicting leaf constituents underpins the relevance of hyperspectral data for monitoring crop status.

SPECL class
The question still remains whether the final result is actually improved by minimizing simultaneously for all a priori estimates (Equation 3) instead of directly using the a priori estimates provided by the predictive equations as the final solution.Evidence is provided in Table 8 which shows that the estimates of Cw, Cdm, and LAI as provided by the predictive equations in Table 7 are considerably less accurate than those obtained after minimization in the variable space (cf.Table 6).The reason for the improved estimates based on RTM inversion is twofold.First of all, the predictive equations are based only on a limited number of bands whereas the RTM approach is a full spectrum approach.As a consequence, the semi-empirical approach is more strongly underdetermined leading to more ambiguous solutions.Secondly, the minimization in the variable space offered by CRASh simultaneously accounts for all variables and searches for the most likely solution for all variables together.Moreover, by introducing the covariance between variables, the compensating effect on spectral reflectance of different variables can be accounted for [21,30].

Overall Performance
The trends in correlations observed for the predictive equations were partly reflected by the final estimates (Figure 5), showing more accurate LAI retrievals for plots with low measured LAI values and slightly better estimates of Cw and Cdm for plots with high vegetation densities.Similar trends were found for the synthetic data set: The highest accuracies were obtained for those variables that have the largest influence on the TOC spectrum, i.e., Cab, LAI, and ALA.This is conform to what several other studies observed [22,24,60].Cw and, especially, Cdm were retrieved with less accuracy.The difficulty of accurately retrieving leaf water and leaf dry matter contents from top of canopy green vegetation spectra is explained by the low sensitivity of spectral reflectance to changes in their concentrations and the only small variations usually observed for such canopies.This has already been recognized by many authors [21,22,24,51,60,61].Remarkably, also N and BS which, according to the predictive equations show only low retrievability, were estimated with reasonable accuracy from the synthetic top of canopy reflectance spectra.The low retrieval accuracies of Cbp and the hotspot parameter HS are in line with the low predictability of this parameter for the considered canopy types (green vegetation) and observation/insolation geometry.
Not all variables output by CRASh have relevance for further applications, e.g., as input to yield models or in the form of stress or drought indicators.Among the variables provided by the CRASh approach, Cab, LAI, and ALA are the most important inputs to process models [62,63].Based on the synthetic data set with simulated uncertainties and the measured field spectra these variables were retrieved with RMSE values between 20.6 and 34.8%.These accuracies can be considered reasonable given the fully automated nature of the approach and the assumption of total absence of a priori knowledge on land cover and development stage.They are well within the range of accuracies obtained for hyperspectral data in studies where significantly more knowledge on crop type and canopy was a priori available and, hence, the parameterization of model inversion could be better tailored to the prevailing conditions [21,22,24,50].The regularization proposed by CRASh in most cases provided clear improvements compared to the case where no a priori info at all is available and minimization only takes place in the radiometric domain.Especially variables that are an important input to process models such as Cab and LAI were retrieved with a significantly higher accuracy.Nevertheless, the studied data sets are still too small to draw comprehensive conclusions about the performance of the presented radiative transfer model approach.Specifically, additional spectroradiometric and biophysical measurements performed over different crop types and for a larger range of environmental and phenologic conditions should be included in order to better evaluate model performance under operational conditions.

Conclusions and Outlook
In this study we presented the CRASh approach, based on the inversion of the PROSPECT and SAILh radiative transfer models, for automated crop canopy characterization from high resolution hyperspectral data sets.In general, the major limitation of an automated approach is the lack of a priori information on land cover and phenology which strongly hinders the regularization of the ill-posed nature of model inversion.Therefore, we incorporated in CRASh two novel regularization techniques: an automated SPECL land cover classifier [47] and a first guess of the solution based on vegetation indices and radiative transfer model simulations optimized for land cover and view/sun geometry.The use of hyperspectral data provides additional regularization in the spectral domain compared to multispectral systems [33,34].
The results obtained in this study for synthetic data sets and field spectrometer measurements over temperate grassland canopies provided estimates of leaf chlorophyll content and leaf area index that in terms of accuracy are comparable to the results obtained by various authors for better constrained inversions.Even if CRASh does not yet provide the accuracy of 10% required for precision agriculture applications [3], the open structure of the approach allows input from the user side, such as a validated land use map or the definition of plausible phenological properties.Such viable a priori information will most certainly further improve estimation accuracy.Nevertheless, the automated CRASh approach as presented in this study showed the potential of a fully automated, image based retrieval of vital agroecosystem variables, and could be an important contribution towards the incorporation of land surface products in operational chains for upcoming high resolution airborne and spaceborne imaging spectrometers such as APEX [64], ARES [65,66], and EnMap [67].

Acknowledgements
This study was carried out in the framework of the 'High-Tech Offensive Zukunft Bayern' project No.290 ('Pilotprojekt Waging-Tachinger See') funded by the Bavarian State Ministry of Science, Research, and the Arts.Special thanks go out to Wout Verhoef and Tomi Schneider for their valuable feedback on the topic.Greenness Index (GI) 677 554 R R [18] Photochemical Reflectance Index (PRI)

Figure 1 .
Figure 1.Overview of CRASh inversion scheme.The upper part shows the general sequence, the grey box shows the inversion approach itself.

Figure 2 .
Figure 2. Template vegetation spectra used in SPECL and typical vegetation types falling into the respective class (in brackets).

Figure 4 .
Figure 4. Inversion results of CRASh approach versus conventional spectral RMSE based approach.

Figure 5 .
Figure 5.In situ measurements of Cw, Cdm and LAI versus estimates obtained from spectral reflectance using the CRASh approach (above) and the spectral RMSE (below).Circles refer to the measurements taken at MEA1, bullets to the measurements taken at MEA2.The dotted line indicates the 1:1 line.

Table 1 .
Decision rules used in SPECL classification, based on reflectance measured at Landsat TM central wave bands: b1 is located at 0.48 μm, b2 at 0.56 μm, b3 at 0.66 μm, b4 at 0.83 μm, b5 at 1.6 μm, b7 at 2.2 μm. a These expressions are optional and only used if b5 is present.b Decision rule depends on presence of b5.c Decision rule depends on presence of b7.

Table 2 .
Parameterization used for the synthetic data set.

Table 3 .
Distribution and sampling plan of the input variables used to construct a comprehensive LUT matching all agricultural land cover types.

Table 4 .
Summary statistics of the measured biophysical and biochemical variables of grassland sample plots.

Table 5 .
Model inversion results based on synthetic data set for CRASh approach and minimization based on spectral RMSE.All values are given in %. Results for Cbp were not included as these showed values close to infinity due to fixing Cbp to value close to zero.

Table 6 .
Absolute and relative RMSE between in situ measurements of Cw, Cdm and LAI and estimates obtained from field spectrometer measurements using the CRASh approach.

Table 8 .
Absolute and relative RMSE, and relative bias between in situ measurements of Cw, Cdm and LAI and estimates obtained from field spectrometer measurements using predictive equations.