Do Red Edge and Texture Attributes from High-Resolution Satellite Data Improve Wood Volume Estimation in a Semi-Arid Mountainous Region?

Abstract: Remote sensing-based woody biomass quantification in sparsely-vegetated areas is often limited when using only common broadband vegetation indices as input data for correlation with ground-based measured biomass information. Red edge indices and texture attributes are often suggested as a means to overcome this issue. However, clear recommendations on the suitability of specific proxies to provide accurate biomass information in semi-arid to arid environments are still lacking. This study contributes to the understanding of using multispectral high-resolution satellite data (RapidEye), specifically red edge and texture attributes, to estimate wood volume in semi-arid ecosystems characterized by scarce vegetation. LASSO (Least Absolute Shrinkage and Selection Operator) and random forest were used as predictive models relating in situ-measured aboveground standing wood volume to satellite data. Model performance was evaluated based on cross-validation bias, standard deviation and Root Mean Square Error (RMSE) at the logarithmic and non-logarithmic scales. Both models achieved rather limited performances in wood volume prediction. Nonetheless, model performance increased with red edge indices and texture attributes, which shows that they play an important role in semi-arid regions with sparse vegetation.


Introduction
Standing biomass in semi-arid to arid regions plays a significant role in preventing soil erosion and degradation and can be considered an important carbon pool due to the vast extent of drylands over the Earth's land surface. It also provides a year-round source of firewood and construction timber for the local population [1][2][3]. For gaining quantitative information on aboveground biomass, the utilization of remote sensing-based applications has become increasingly feasible in recent years. Earth Observation (EO) datasets are available for large areas, and rapid advances in remote sensing techniques allow fast, frequent and continuous biomass observations over various scales in time and space [4,5]. As optical EO data alone cannot directly generate reliable quantitative biomass information [6], a common approach correlates satellite-derived parameters, primarily vegetation indices (VIs) measuring photosynthetic vigor, with ground-based measured biomass information, e.g., [7][8][9][10]. This allows an indirect prediction of quantitative biomass information.
To overcome this methodological barrier and to find proxies that could improve the accuracy in retrieving biomass information in sparsely-vegetated areas, different techniques and sensors have been tested, including the combination of high-resolution satellite data with multi- or hyperspectral information [6,10,26,27]. Nonetheless, clear recommendations on the suitability of specific sensors for semi-arid to arid environments are lacking, and the development of an operational technique that is consistently accurate and reproducible still remains challenging [6,13].
Synthesizing these suggestions, the red edge band (spectral range between 690 and 730 nm) is supposed to be more effective in differentiating the reflectance of the soil background from woody reflectance characteristics due to its position at the edge between red and near infrared [28]. This position covers chlorophyll absorption, as well as leaf cell structure reflection, adding information for vegetation characterization. Thus, red edge indices are expected to favor biomass and wood volume estimation in semi-arid landscapes more than traditional VIs [6,26,29,30]. Besides the use of red edge indices, it has been suggested to additionally include texture attributes of satellite images [1,5,10,30-34]. Image texture discriminates the spatial variability of neighboring pixels independent from image tone [35]. A review of the existing scientific literature shows that little research has been conducted and published on biomass or wood volume estimation in semi-arid regions using high-resolution imagery in combination with indices that include the red edge band or with texture attributes, e.g., [10,27,36,37].
The main objective of this study is therefore to improve the understanding of the interrelationship between multispectral high-resolution satellite data and ground-based measured wood volume in semi-arid ecosystems with scarce woody vegetation. Our hypothesis is that red edge indices in combination with texture attributes are more effective for wood volume estimation than conventional broadband spectral information. This hypothesis is tested by linking high-resolution RapidEye satellite data with in situ field data of wood volume obtained in the semi-arid high mountainous region of Tajikistan.

Study Area
Sampled forest plots for obtaining field measurements of woody biomass are located in a valley in the southwestern part of Gorno Badakhshan Autonomous Oblast (also known as the Tajik Pamirs) in the eastern high mountains of Tajikistan (Figure 1). In this region, the local energy demand for cooking and heating is high [38,39]. Woody biomass, used as firewood, is of major importance to cover the energy needs of the local population [39]. However, due to the mountainous topography and a continental climate with long winters of up to six months and considerable dryness, habitats for woody biomass are scarce [40,41]. Only fertile riparian zones and alluvial fans can provide larger habitats for denser woody vegetation [42]. However, these fertile areas are also important for crop cultivation and livestock farming, which implies that different ecosystem services compete on a local scale [43]. Given the dependence of the local population on firewood and the lack of related research, information on forest woody biomass is required to support the sustainable management of the stocks.

Material and Methods
The methodological approach of the study includes three steps, as shown in Figure 2.

Field Data
Field data were collected in August and early September 2013 in seven forest stands distributed along the valley (Figure 1). The total area of sampled forest stands amounts to 255 ha. These forest stands are found inside defined plots according to the cadaster and serve as a source of firewood and timber for the adjacent villages. The demarcation and size of the plots are defined by the cadaster. To homogeneously distribute sampling transects in relation to the size and vegetation density of the stands, a pre-stratification of the stands was conducted. Four vegetation cover classes were defined within the forest stands (Figure 3) based on very high resolution Google Earth images from 2008, a land cover map based on QuickBird imagery from 2008 [45] and a guided walk through the whole forest stand with local foresters. The guided walk, considered as purposive sampling, was conducted to update the pre-stratification of the stands, as satellite images from a corresponding date were not available prior to the time of field data collection. Based on the pre-stratification, 95 sampling transects were placed in the four different classes. Data collection in each sampling transect followed the line intercept method [46]. This method is especially straightforward when field measurements over a larger area are required [46] and when boundaries of plant growth are relatively easy to determine, as is the case for semi-arid shrubby vegetation types [47]. Each sampling transect measured 20 × 2 m, leading to a total sampled area of 3800 m². The sampling intensity amounts to 0.15% in relation to the total area of sampled forest stands. Within each transect, the start and end points were recorded with a GPS (Trimble Juno 3B with an accuracy of 3-5 m), together with the quantitative data of each individual standing tree and shrub.
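The sampled area and sampling intensity quoted above follow directly from the transect geometry. The short check below reproduces the arithmetic; the function name is hypothetical and chosen only for illustration.

```python
def sampling_summary(n_transects, length_m, width_m, stand_area_ha):
    """Return (sampled area in m^2, sampling intensity in percent)."""
    sampled_area_m2 = n_transects * length_m * width_m
    stand_area_m2 = stand_area_ha * 10_000  # 1 ha = 10,000 m^2
    intensity_percent = 100.0 * sampled_area_m2 / stand_area_m2
    return sampled_area_m2, intensity_percent


# Values taken from the text: 95 transects of 20 x 2 m within 255 ha of stands.
area_m2, intensity = sampling_summary(95, 20, 2, 255)
print(area_m2, round(intensity, 2))
```

The unrounded intensity is slightly below 0.15%, which is consistent with the rounded figure reported in the text.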

Sampled Woody Species
The small-leaved mountain forest, also known as riparian or floodplain forest [48,49], is predominant in the region and consists of the following main species: willow species (Salix turanica, Salix shugnanica, Salix wilhelmsiana, Salix alba), poplar species (Populus pyramidalis, Populus pamirica) and sea buckthorn (Hippophae rhamnoides) [48][49][50][51]. All sampled transects were dominated by willow (Salix spec.) (N = 472) and sea buckthorn (H. rhamnoides) (N = 560), whereas poplars (Populus spec.) (N = 67) were only present in some plots in smaller abundance (Figure 4).

Field Measurements and Wood Volume Calculation
Wood volume was derived from non-destructive measurements of the most common dimensions (stem diameter and height) to determine the stem volume of a standing tree or shrub consisting of stem plus bark, which is widely considered as merchantable stem and bark volume [52][53][54]. Stem volume was estimated as a function of the stem basal area, derived from the diameter or circumference, and height [54]. Circumference was measured with a measuring tape at breast height (d_1.3m) for single-stemmed trees, whereas for multiple-stemmed shrubs, the diameter of each single stem sprouting from the ground was determined with a caliper at knee height (d_0.3m) and summed up. A clinometer was used to measure total height (h_tot) for single-stemmed trees and average height (h_avg) for multiple-stemmed shrubs. As Hoyer [54] does not consider the shape of a stem or branch in his formula, a form quotient approximately representing the shape of a stem was integrated to increase accuracy [46,55]. This quotient was not derived from empirical destructive measurements either, but from on-site measurements at the standing tree or shrub. As with form factors, several types of form quotients were developed [53,55]. The quotient chosen follows an approach developed by Jonson [53,55], who suggested integrating an absolute form quotient, consisting of the ratio between measured diameters at two points (breast height and half the stem height above breast height) to capture the tree shape. In this case, we used points at knee height (d_0.3m) and breast height (d_1.3m) at the standing stem or branch, because most of the tree stems and branches were heavily branched and not much higher than breast height. Subsequently, the ratio of these two values was derived to gain an approximate quotient (ƒ) for the taper shape. The form quotient was calculated on the basis of 100 individuals each of willow and sea buckthorn, as well as 35 poplar individuals. The derived form quotient (for poplar and sea buckthorn 0.73; for willow 0.69) was compared to the literature. According to Cannell [52], empirical studies on 640 forest and woodland stands around the world found that heavily-branched stands had form factors in the range of 0.6-0.8. This suggests that the derived form quotient represents a solid reference value for the stem shape and is not too far from reality given the time constraints and conditions for field work in the research area. Integrating all parameters, equations for single-stemmed trees (V_tree) and for multiple-stemmed shrubs (V_shrub) are as follows (Equations (1) and (2)). The calculated volume per individual was summed up for the whole transect and, thus, served as input data for the statistical regression with satellite data.
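Equations (1) and (2) are not reproduced in this text, so the following is only a minimal sketch of a volume calculation of the kind described: basal area times height times form quotient. It rests on several assumptions that are not confirmed by the source: the form quotient is taken as d_1.3m divided by d_0.3m (which matches quotients below 1), shrub stems are aggregated by summing per-stem basal areas (the text only states that stem diameters were "summed up"), and all function names and input values are hypothetical.

```python
import math


def basal_area_from_circumference(circumference_m):
    """Cross-sectional area of a circular stem from its circumference."""
    return circumference_m ** 2 / (4.0 * math.pi)


def basal_area_from_diameter(diameter_m):
    """Cross-sectional area of a circular stem from its diameter."""
    return math.pi * diameter_m ** 2 / 4.0


def form_quotient(d_breast_m, d_knee_m):
    """ASSUMPTION: upper diameter divided by lower diameter (taper < 1)."""
    return d_breast_m / d_knee_m


def tree_volume(circumference_m, h_tot_m, f):
    """Single-stemmed tree: basal area x total height x form quotient."""
    return basal_area_from_circumference(circumference_m) * h_tot_m * f


def shrub_volume(stem_diameters_m, h_avg_m, f):
    """Multi-stemmed shrub. ASSUMPTION: per-stem basal areas are summed."""
    total_basal_area = sum(basal_area_from_diameter(d) for d in stem_diameters_m)
    return total_basal_area * h_avg_m * f


def per_hectare(volume_m3, transect_area_m2=40.0):
    """Scale a transect total (20 x 2 m = 40 m^2) to a per-hectare value."""
    return volume_m3 * 10_000.0 / transect_area_m2


# Hypothetical poplar: 0.2 m diameter at breast height, 10 m tall, f = 0.73.
v_tree = tree_volume(math.pi * 0.2, 10.0, 0.73)
# Hypothetical willow with two 5 cm stems, 3 m average height, f = 0.69.
v_shrub = shrub_volume([0.05, 0.05], 3.0, 0.69)
print(round(v_tree, 4), round(v_shrub, 4), round(per_hectare(v_tree + v_shrub), 1))
```

Summing per-individual volumes over a transect and scaling by the 40 m² transect area gives per-hectare values comparable in kind to those used in the regression, under the stated assumptions.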

Satellite Data
High-resolution RapidEye satellite images (8 tiles; 25 × 25 km each), consisting of 5 bands, including a red edge band covering the spectral range between 690 and 730 nm, were obtained from the RapidEye Science Archive with a pixel spacing of 6.5 m, resampled to 5 m. The acquisition dates of the images were very close to each other (13 July 2013; 19 July 2013) and close to the dates of the fieldwork, which facilitates a comparison with the field measurements.
Image data were atmospherically corrected using the FLAASH (Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes) module in ENVI Version 4.7, cf. [10,56]. Required parameters to run FLAASH were mainly set to standard conditions matching the research area. Mid-latitude winter was set as the atmospheric model with a low value for water vapor and below zero surface air temperature. Visibility was set to 80 km, because weather conditions in the research area are mainly clear, especially in summer, cf. [57]. Other parameters (water retrieval and aerosol) were set to none. As a further step, images were mosaicked using a feathering algorithm.
To use spectral and spatial information for computing indices, mean reflectance values of each band were extracted for the pixels covered by the sampling transects. Various indices were calculated with the derived reflectance values (Table 1). The indices can be categorized into: (i) single bands, representing reflectance values within the spectral range; (ii) band ratios, which detect differences in surface properties; (iii) broadband greenness vegetation indices, measuring photosynthetic activity [58]; (iv) red edge indices, which use reflectance measurements in the narrow red edge reflectance portion, showing maximum sensitivity for detecting the state of the vegetation [59]; (v) soil adjusted vegetation indices, which attempt to minimize the effect of soil background [60]; and (vi) leaf pigments vegetation indices, which do not measure chlorophyll, but stress-related pigments present in vegetation [59]. Besides spectral information, spatial information was extracted from the satellite data. A co-occurrence-based filter embedded in ENVI software (Version 4.7) was used to extract the following image texture parameters: mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment and correlation [61]. The filter window size was kept low (3 × 3 pixels), in order not to lose spatial information due to over-smoothing of textural variations. In a relatively narrow forest plot, a larger window size increasingly contains non-forest information. In addition, a 3 × 3 window size shows good performance in deriving biomass in forests [62].
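As an illustration of how such indices are derived from the mean band reflectance of a transect, the sketch below computes two indices listed in Table 1 (GNDVI and the Chlorophyll Index Green) together with NDVI and a red edge variant of NDVI. The latter two are standard formulations included here as examples; whether they match the exact entries of Table 1 is an assumption. The reflectance values and all function names are hypothetical.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b)."""
    return (a - b) / (a + b)


def ndvi(nir, red):
    return normalized_difference(nir, red)


def gndvi(nir, green):
    return normalized_difference(nir, green)


def ci_green(nir, green):
    """Chlorophyll Index Green: NIR / G - 1."""
    return nir / green - 1.0


def red_edge_ndvi(nir, red_edge):
    """Red edge variant of NDVI (illustrative; not taken from Table 1)."""
    return normalized_difference(nir, red_edge)


# Hypothetical mean surface reflectance of one transect for the five
# RapidEye bands (B1 blue, B2 green, B3 red, B4 red edge, B5 near infrared).
bands = {"blue": 0.05, "green": 0.08, "red": 0.10, "red_edge": 0.18, "nir": 0.30}

indices = {
    "NDVI": ndvi(bands["nir"], bands["red"]),
    "GNDVI": gndvi(bands["nir"], bands["green"]),
    "CI_green": ci_green(bands["nir"], bands["green"]),
    "NDVI_red_edge": red_edge_ndvi(bands["nir"], bands["red_edge"]),
}
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```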

Single Bands
Represent reflectance values within the respective spectral range (B1-B5).

Broadband Greenness VIs
Try to measure and display the overall amount of photosynthetic material in vegetation (Tucker [58]).
Chlorophyll Index Green/Chlorophyll Green Model: $CGM = \frac{NIR}{G} - 1$ (Gitelson et al. [63])
Green Normalized Difference Vegetation Index: $GNDVI = \frac{NIR - G}{NIR + G}$

Modeling Wood Volume
The large number of potential predictors (160) exceeds the number of ground observations (95), which creates a high-dimensional problem that can lead to overfitting of models [75]. To prevent this issue, two different models were selected, which are stated to be effective in this context [10,76,77]. Firstly, a linear model with variable selection based on the LASSO (Least Absolute Shrinkage and Selection Operator) technique was chosen as a method that uses shrinkage heuristics and performs variable subset selection prior to prediction [10]. This method is therefore able to deal with large numbers of variables, and it is also robust when using unequally-distributed variables [76]. An internal cross-validation was used to optimize the shrinkage penalty. Secondly, the Random Forest (RF) technique was used, which has become popular in remote sensing applications when dealing with high-dimensional data [77]. This technique is based on a large number of decision trees (in this study, 500) fitted to random subsets of the training sample. Both LASSO and RF were fitted to logarithmic wood volume data (to base 10) to better account for non-negativity and nonlinearity. One outlier value in four predictors was trimmed to a value near the second most extreme observed value in the sample.
Predictive model performances of LASSO and RF were estimated using spatial cross-validation [78]. Considering the spatial clustering of field sites and the expected autocorrelation of observations within forest stands, the dataset was subdivided into 5 spatial subsets containing between 9 and 29 observations. Five-fold cross-validation was performed using this partitioning. Within this process, one subset at a time was used as a test set, while the other four were used as training sets for a predictive model [75]. Predictions from all five test sets were combined in order to calculate cross-validation bias, standard deviation and Root Mean Square Error (RMSE) at the logarithmic and non-logarithmic scale of wood volume. In addition, three different predictor sets of indices (1: predictor set with only broadband VIs, including single bands and band ratios; 2: Predictor Set 1 + red edge indices; 3: Predictor Sets 1 + 2 + texture attributes) were fed into the model to assess the importance of red edge indices and texture attributes. Topographic information (e.g., digital elevation model) was not integrated, as sampled forest stands were located only in flat riparian zones.
One model was selected based on model performance and computational complexity in order to map the wood volume for all sampled forest stands in the study region. To account for the bias introduced when back-transforming predictions from the logarithmic scale, predicted values were multiplied by an empirical correction factor based on Baskerville [79]. In order to assess each predictor's relative predictive importance, the permutation-based approach was used [80,81]. Specifically, each predictor was randomly permuted in order to obtain degraded predictions on spatial cross-validation test sets, and the increase in cross-validation RMSE was used as a measure of variable importance. In the case of the LASSO model, linear model coefficients were furthermore extracted as indicators of variable selection and importance. Nevertheless, variable importance measures in this high-dimensional setting should be interpreted with caution due to the high degree of redundancy in the data. For example, correlation among predictors (collinearity) was not considered in the model, although 25% of all pairwise Pearson's correlations among predictors were >0.80 in absolute value, indicating potential collinearity among multiple predictors.
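To make the shrinkage and variable selection behavior of LASSO concrete, the following is a minimal didactic sketch of LASSO fitted by cyclic coordinate descent with soft-thresholding. It is not the software used in the study, which is not named in this text; the function names, the fixed penalty values and the toy data are assumptions for illustration only. In the study, the penalty was tuned by internal cross-validation and the response was log10 wood volume.

```python
def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; exactly zero inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/(2n)) * ||y - b0 - X b||^2 + penalty * ||b||_1.

    X is a list of rows, y a list of responses. Returns (intercept, coefs).
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]  # residual when all coefs are zero
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # constant predictor carries no information
            # Covariance of predictor j with the partial residual.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * coef[j])
                      for i in range(n)) / n
            new_coef = soft_threshold(rho, penalty) / z
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                coef[j] = new_coef
    intercept = y_mean - sum(c * m for c, m in zip(coef, col_means))
    return intercept, coef


# Toy data: the response depends on the first predictor only.
X = [[1, 1], [2, -1], [3, 1], [4, -1], [5, 1], [6, -1]]
y = [2.0 * row[0] for row in X]

b0_mid, coef_mid = lasso_coordinate_descent(X, y, penalty=0.5)
b0_big, coef_big = lasso_coordinate_descent(X, y, penalty=10.0)
print(coef_mid, coef_big)
```

With the moderate penalty, the informative coefficient is shrunk below its least-squares value of 2 and the uninformative one is set exactly to zero; with the large penalty, both are zero and the model reduces to the mean of the response.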
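The spatial cross-validation scheme described above can be sketched as a leave-one-group-out loop in which every spatial subset serves once as the test set. The example below is a simplified illustration under stated assumptions: the group labels and data are hypothetical, the error is defined as prediction minus observation, and a trivial mean model stands in for LASSO or RF.

```python
import math


def spatial_cv_metrics(groups, X, y, fit, predict):
    """Leave-one-spatial-group-out CV; returns (bias, sd, rmse) of errors.

    fit(X_train, y_train) returns a model; predict(model, row) a prediction.
    Errors from all held-out groups are pooled before computing the metrics.
    """
    errors = []
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            errors.append(predict(model, X[i]) - y[i])
    n = len(errors)
    bias = sum(errors) / n
    sd = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    return bias, sd, rmse


def fit_mean(X_train, y_train):
    """Placeholder model: predicts the training mean of the response."""
    return sum(y_train) / len(y_train)


def predict_mean(model, row):
    return model


# Hypothetical example: two spatial groups with different volume levels
# (responses could be log10 wood volume per transect).
groups = ["a", "a", "b", "b"]
X = [[0.2], [0.3], [0.6], [0.7]]
y = [1.0, 1.0, 3.0, 3.0]
bias, sd, rmse = spatial_cv_metrics(groups, X, y, fit_mean, predict_mean)
print(bias, round(sd, 3), rmse)
```

Because whole spatial groups are held out, a model cannot profit from autocorrelation between neighboring transects, which is why spatial cross-validation tends to give more conservative error estimates than random partitioning.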
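The permutation-based importance measure can be illustrated as follows: the values of one predictor are shuffled in the test data, and the resulting increase in RMSE is averaged over repetitions. This is a minimal sketch, assuming a single test set and a hypothetical, already fitted prediction function; in the study, the procedure was applied to the spatial cross-validation test sets.

```python
import math
import random


def rmse(predicted, observed):
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def permutation_importance(predict, X_test, y_test, column, n_repeats=20, seed=0):
    """Mean increase in RMSE after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = rmse([predict(row) for row in X_test], y_test)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X_test]
        rng.shuffle(shuffled)
        permuted = [list(row[:column]) + [value] + list(row[column + 1:])
                    for row, value in zip(X_test, shuffled)]
        degraded = rmse([predict(row) for row in permuted], y_test)
        increases.append(degraded - baseline)
    return sum(increases) / n_repeats


def toy_model(row):
    """Hypothetical fitted model that uses only the first predictor."""
    return 2.0 * row[0]


X_test = [[1, 5], [2, 6], [3, 7], [4, 8]]
y_test = [2.0, 4.0, 6.0, 8.0]
importance_used = permutation_importance(toy_model, X_test, y_test, column=0)
importance_unused = permutation_importance(toy_model, X_test, y_test, column=1)
print(importance_used > 0.0, importance_unused)
```

A predictor that the model ignores receives an importance of zero, whereas permuting a predictor the model relies on degrades the predictions. With strongly correlated predictors, as in this study, the importance can be spread across redundant variables, which is one reason for interpreting the ranking cautiously.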

Ground-Based Measured and Calculated Parameters
Focusing on height and diameter as the main input variables for estimating wood volume, poplars were the tallest individuals measured (max. 11.7 m), whereas sea buckthorn individuals were relatively small. Willow individuals had a high abundance of stems sprouting from the ground (on average 11.6 stems over all individuals measured), leading to high values for diameter. Poplars had primarily single solid stems. Diameter values for sea buckthorn individuals were rather low (on average 5.6 cm over all individuals measured). In comparison to willow, the stems were relatively small. Table 2 shows the most relevant statistical parameters for each forest stand over all individuals measured, divided by species. Wood volume was calculated for each recorded individual. In Table 2, median wood volume per ha is indicated for each species per forest stand. Here, the values vary depending on the median number of trees per ha found in each forest stand. For poplars, the height of the single stems was the decisive factor for volume values. Willow individuals do not grow very tall, but due to a high abundance of stems, the wood volume was similar to that of poplars. Sea buckthorn individuals showed the lowest values for wood volume. Even though the individuals often appear rather bulky, the actual wood volume was comparatively low.

Empirical Wood Volume Models and Variable Importance
LASSO and RF both achieved rather limited performance in wood volume prediction. This is true for all predictor sets of VIs (Table 3). The performance of both LASSO and RF increases when red edge indices are added (Predictor Set 2) and is best when both red edge indices and texture attributes are added (Predictor Set 3). The best models, using Predictor Set 3 as input data, show Spearman correlations of 0.51 for LASSO and 0.50 for RF between measured values and predictions on cross-validated test sets. High values of RMSE at the non-logarithmic scale (LASSO: 687.5 m³·ha⁻¹; RF: 667.5 m³·ha⁻¹) may seem disappointing compared to the standard deviation of the observed wood volume of 672.5 m³·ha⁻¹. However, these values are inflated by extreme values, especially in the case of the less robust LASSO method as a linear model. The Spearman correlations mentioned above and the somewhat lower Pearson correlations of 0.31 (LASSO) and 0.40 (RF) confirm the influence of outliers and the (limited) predictive capability (p-values < 0.01 for both correlation tests). On a logarithmic scale, the cross-validated RMSEs are 0.71 for LASSO (687.5 m³·ha⁻¹ back-transformed to absolute real values) and 0.70 for RF (667.5 m³·ha⁻¹ back-transformed to absolute real values) for Predictor Set 3. On this scale, the Pearson correlation increases to 0.52 (LASSO) and 0.51 (RF), which shows some predictive capability.
Among the top 20 predictors in the permutation-based assessment for RF were nine texture attributes, eight red edge indices and three band ratios, also including the red edge band.The most important variables for the sparser LASSO models include four texture attributes, four red edge indices and one band ratio of red edge and green bands.
In the scatterplots of predicted versus observed wood volume for the better performing LASSO and RF models, volume predictions tend to be biased towards lower values (Figure 5).
The volume map, obtained with the best performing model (LASSO: Predictor Set 3; log-scale; R² = 0.27; PC = 0.52; RMSE = 687.5 m³·ha⁻¹; 120%), generally predicted values between 10 and 400 m³ of wood volume per ha (Figure 6). Wood volume is mainly scattered over the plots, tending to decrease towards the edges of the plots and to increase along small canals (e.g., Hisor, where small canals meander through the forest plot) and other water sources, such as gorges (e.g., Narkhun). Only in Tugoz is the predicted volume distributed homogeneously over the whole plot. This distribution corresponds to the observations in the field. The highest predicted average stock of wood volume per hectare can be found in the forest plot in Tugoz (343.2 m³·ha⁻¹ of wood volume in comparison to 230.2 m³·ha⁻¹ observed in the field). For the other plots, predicted wood volume ranges between 150 and 250 m³·ha⁻¹, whereas the range of observed values is much larger. However, absolute values need to be interpreted with caution.

Model Performance
This study uses high-resolution red edge and texture attributes retrieved from RapidEye satellite images to tackle the challenge of remote sensing-based wood volume estimation in semi-arid regions. We demonstrate that red edge indices and texture attributes improve the predictive performance in comparison to conventional methods limited to broadband VIs. In this respect, our findings are consistent with several other studies [26,85-87]. However, many studies testing red edge indices focus either on regions with high biomass levels or on crops. For example, in Eitel et al. [26], red edge information improved the detection of stress-related shifts in foliar chlorophyll in conifer woodland. Ali et al. [85] tested the red edge band for estimating winter wheat and found a better correlation of red edge indices with the leaf area index of winter wheat than with conventional VIs. Kross et al. [86] found similar results estimating biomass in corn and soybean crops.
Findings for woody biomass estimation testing red edge indices particularly in semi-arid to arid regions are diverse. In Li et al. [87], the red edge VI outperformed the commonly-used NDVI in estimating vegetation fraction in arid regions. For savanna [88], desert steppe [89] and grass/herb vegetation [90], the analyses showed similar results. In other studies, however, the additional red edge band was not superior to other model inputs [10,30]. Here, SAV (Spectral Angle Values)-based variables, soil adjusted vegetation indices or topographic variables were more important.
The findings are similar for texture measures, and the general picture of the studies is in line with our findings: texture measures play an important role in predicting biomass. In, e.g., [5,33,91], texture variables distinctively improved forest biomass estimates and carbon prediction. Again, these studies were carried out in biomass-rich areas. Eckert [32] found that biomass correlates more with texture measures than with conventional spectral parameters, especially in degraded forest areas. In strongly arid regions, texture measures were also found to be important variables [10,30], but not as decisive as in the former studies. In these two studies, it is also suggested to further exploit texture variables not only from single bands, but also from indices, such as soil adjusted indices. In doing so, model performance was shown to improve in Vanselow and Samimi [30].

Model Uncertainties
Even though the predictive capacity of the model increased with red edge indices and texture attributes, the overall model accuracy is rather moderate. Therefore, uncertainties need to be taken into consideration when looking at the predicted wood volume distribution map. High predictive errors can also be found in other studies that relate biomass in semi-arid regions to optical remote sensing data [10,18,22,92], although a direct comparison of statistical results must also be treated with caution.
Predictive uncertainty can partly be attributed to the fact that the photosynthetic signal captured by most spectral bands and indices is biased. This is especially true in semi-arid landscapes where sparse woody vegetation is predominant. Pixels are partially mixed with a strong soil background or herbaceous vegetation and with plants consisting of photosynthetic and non-photosynthetic woody material. A multi-temporal approach to map soil, herb, shrub and tree cover according to seasonal phenological differences may be an appropriate way of overcoming this issue, as suggested and successfully implemented in Shoshany and Svoray [93]. This study took only a snapshot within one year, which is clearly a limitation. However, in the cold semi-arid environment of the research area, the vegetation period is rather short. A reliable, stable year-round phenology pattern is not given and is, therefore, not as decisive as in other regions. The growth of vegetation does not depend so much on the rainfall distribution, but on water availability from the rivers, irrigation and groundwater fed by snow melt and glacial runoff. With the launch of Sentinel-2, red edge images with a high spatial resolution are freely available at a high temporal frequency. The 10-day revisiting period allows future studies to take the phenological differences of plants into account, leading to a potential improvement in predictions of wood volume. Moreover, as compared to RapidEye, Sentinel-2 provides three red edge bands, and recent research confirms their high value for vegetation monitoring [94,95]. Even though Sentinel-2 has a coarser spatial resolution (10-20 m) than RapidEye, a study by Radoux et al. (2016) demonstrates its potential for detecting sub-pixel landscape features. This makes Sentinel-2 an interesting alternative to commercial satellites like RapidEye in the context of woody biomass estimations in semi-arid areas [95]. In addition, hyperspectral satellite data may reduce model uncertainties for satellite-based vegetation analysis in drylands, as the higher spectral resolution is more capable of capturing the non-photosynthetic part of woody plants [27].
Furthermore, limitations regarding field observations and the related wood volume calculation need to be noted. Due to the limited time and the difficulty in obtaining official permission for collecting field data, destructive harvesting techniques or allometric equations could not be applied. Besides, allometric equations, which could be helpful in such a case, do not exist for the research area. The transferability of allometric equations from other regions is challenging due to the site and species specificity and was therefore not applicable in this context. This indicates the tradeoff between gaining accurate measurements and having a less time- and labor-intensive method.

Statistical Performance and Importance of Predictors
The challenge of having a large number of predictors was addressed by choosing LASSO, which integrates a shrinkage technique, and the tree-based RF model. The suitability of both models in such a high-dimensional setting has been demonstrated in different studies. In the case of LASSO, Zandler et al. [10] and Lazaridis et al. [96] tested shrinkage regression techniques in comparison to other standard methods and concluded that LASSO performs particularly well. According to the cross-validated model performances, the RF model produced results similar to those of LASSO. This is contrary to Zandler et al. [10], where the RF model performed worse than most other models. However, in Powell et al. [92], RF produced the lowest RMSE for forest biomass estimation.
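To make the evaluation scheme concrete, the following is a minimal sketch of how a LASSO fit can be cross-validated and summarized by bias, standard deviation and RMSE at the logarithmic and back-transformed scales. It is not the implementation used in this study: the data are synthetic, the names `fit_lasso`, `cross_validate`, `X` and `y_log`, the penalty value and the five folds are all assumptions made for illustration, the back-transformation is a plain exponential without bias correction, and the RF model is omitted.

```python
import math
import random
import statistics


def fit_lasso(X, y, alpha, n_iter=100):
    """Coordinate-descent LASSO on z-scored predictors (rows of X are samples)."""
    n, p = len(X), len(X[0])
    mu = [statistics.fmean([row[j] for row in X]) for j in range(p)]
    sd = [statistics.pstdev([row[j] for row in X]) or 1.0 for j in range(p)]
    Z = [[(row[j] - mu[j]) / sd[j] for row in X] for j in range(p)]  # column-major
    y_mean = statistics.fmean(y)
    resid = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(Z):
            # Covariance of predictor j with the partial residual; columns
            # have unit variance, so no further rescaling is needed.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = math.copysign(max(abs(rho) - alpha, 0.0), rho)  # soft threshold
            delta = new - beta[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return {"intercept": y_mean, "beta": beta, "mu": mu, "sd": sd}


def predict(model, row):
    return model["intercept"] + sum(
        b * (x - m) / s
        for b, x, m, s in zip(model["beta"], row, model["mu"], model["sd"])
    )


def cross_validate(X, y_log, alpha, k=5, seed=0):
    """k-fold CV; returns bias, SD and RMSE of the test-set errors at both scales."""
    idx = list(range(len(y_log)))
    random.Random(seed).shuffle(idx)
    pred = [0.0] * len(y_log)
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_lasso([X[i] for i in train], [y_log[i] for i in train], alpha)
        for i in fold:
            pred[i] = predict(model, X[i])

    def summary(err):
        return {
            "bias": statistics.fmean(err),
            "sd": statistics.stdev(err),
            "rmse": math.sqrt(statistics.fmean([e * e for e in err])),
        }

    err_log = [p - o for p, o in zip(pred, y_log)]
    # Naive back-transformation (assumption: no bias correction applied).
    err_raw = [math.exp(p) - math.exp(o) for p, o in zip(pred, y_log)]
    return {"log": summary(err_log), "raw": summary(err_raw)}


# Synthetic stand-in data: 95 samples, 12 predictors, only two informative.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(95)]
y_log = [0.8 * row[0] - 0.5 * row[1] + rng.gauss(0, 0.3) for row in X]

lasso_cv = cross_validate(X, y_log, alpha=0.1)
null_cv = cross_validate(X, y_log, alpha=1e6)  # all coefficients zero: mean-only baseline
print("LASSO:", lasso_cv["log"])
print("Mean-only baseline:", null_cv["log"])
```

Comparing against a mean-only baseline, as in the last lines, is one simple way to judge whether a set of predictors adds any cross-validated skill at all.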
In the context of variable importance, red edge indices and texture attributes are highly dominant among the top predictors. However, the significance of the variable importance has to be interpreted carefully in such a high-dimensional setting. Additionally, collinearity was not considered when adding further predictors.
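A simple way to screen for the collinearity mentioned above is to flag predictor pairs with a high absolute Pearson correlation before interpreting variable importance. The sketch below is illustrative only: the predictor names (`ndvi`, `savi_like`, `texture`), the synthetic values and the 0.9 threshold are assumptions, not values from this study.

```python
import math
import random
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def collinear_pairs(predictors, threshold=0.9):
    """Return (name_a, name_b, r) for all pairs with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Synthetic stand-in predictors (hypothetical): two nearly redundant
# vegetation indices and one independent texture measure.
rng = random.Random(1)
ndvi = [rng.uniform(0.05, 0.4) for _ in range(95)]
predictors = {
    "ndvi": ndvi,
    "savi_like": [1.1 * v + rng.gauss(0, 0.01) for v in ndvi],
    "texture": [rng.gauss(0, 1) for _ in range(95)],
}

flagged = collinear_pairs(predictors)
print(flagged)
```

Flagged pairs tend to share importance between them, so the ranking of either member of such a pair should be read with caution.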

Conclusions
In summary, our study improved the understanding of estimating wood volume in a semi-arid ecosystem with scarce vegetation using high-resolution multispectral satellite data. Many studies using optical satellite data have tested the suitability of red edge and texture measures for crop and grass vegetation or in areas with high biomass levels. Knowledge of their potential for woody and shrubby vegetation in a semi-arid to arid context is still limited. Our study showed that red edge indices and texture measures play an important role in wood volume estimation, as the model performance significantly improved in direct comparison to conventional vegetation indices. Still, as the achieved model performance highlights, biomass mapping in these environments requires further improvement.
As a research outlook, we suggest focusing on high-resolution hyperspectral data to achieve better model performance for wood volume estimation in semi-arid areas. In this regard, the high temporal frequency of Sentinel 2, which offers a pixel resolution comparable to that of RapidEye as well as red edge bands, provides further opportunities. A number of studies have already shown its suitability in arid environments. Furthermore, airborne (including unmanned) laser scanning could be utilized, depending on the scale of assessment. It is very accurate in assessing forest characteristics, such as stand height or the distribution of biomass or volume, and can generate better training data for the correlation with satellite data. From a field methodological point of view, it may be useful to increase the number of sampling points or to develop allometric functions for local woody species to increase the accuracy of ground data.
Figure 2. Flowchart of the forest wood volume estimation model. The workflow comprises three steps: (1) obtaining ground-based measured woody biomass information and deriving wood volume; (2) processing of satellite images, including the development of indices; and (3) linking both ground- and satellite-based data sources to find a correlation and to spatially predict wood volume for sampled forest plots.

Figure 3. Assigned vegetation cover classes according to the stratification of forest stands in the research area.

Figure 5. Scatterplots of predicted versus observed wood volume (N = 95) in m³/transect for the best performing model, with predictions derived from the relationship between spectral data and in situ measurements: (a) LASSO (R² = 0.27; Pearson correlation (PC) = 0.52; Spearman correlation (SC) = 0.51); and (b) random forest (R² = 0.26; PC = 0.51; SC = 0.50). Predicted values correspond to cross-validation test sets.

Table 1. Categorized indices used in this study (selected).

Table 2. Statistical parameters of all forest stands and of all individuals measured per species per forest stand (based on median values).

Table 3. Statistics of the cross-validated LASSO and RF model results for different predictor sets.