Article

Forest Height and Volume Mapping in Northern Spain with Multi-Source Earth Observation Data: Method and Data Comparison

by Iyán Teijido-Murias 1,*, Oleg Antropov 2, Carlos A. López-Sánchez 1, Marcos Barrio-Anta 1 and Jukka Miettinen 2,*
1 SMartForest Research Group, University of Oviedo, 33600 Mieres, Spain
2 VTT Technical Research Centre of Finland, 02044 Espoo, Finland
* Authors to whom correspondence should be addressed.
Forests 2025, 16(4), 563; https://doi.org/10.3390/f16040563
Submission received: 26 February 2025 / Revised: 12 March 2025 / Accepted: 17 March 2025 / Published: 24 March 2025

Abstract

Accurate forest monitoring is critical for achieving the objectives of the European Green Deal. While national forest inventories provide consistent information on the state of forests, their temporal frequency is inadequate for monitoring fast-growing species with 15-year rotations when inventories are conducted every 10 years. Earth observation (EO) satellite systems can help address this challenge: remote sensing satellites acquire land cover data continuously, at high temporal frequency (annually or more often) and at a spatial resolution of 10–30 m per pixel. This study focused on northern Spain, a highly productive forest region, and aimed to improve models for predicting forest variables in its forest plantations by integrating optical (Sentinel-2) and imaging radar (Sentinel-1, ALOS-2 PALSAR-2, and TanDEM-X) datasets supported by climatic and terrain variables. Five popular machine learning algorithms were compared, namely k-nearest neighbors (kNN), LightGBM, Random Forest, multiple linear regression (MLR), and XGBoost. R² improved from 0.24 when only Sentinel-2 data were used with MLR to 0.49 when XGBoost was used with multi-source EO data. We conclude that combining multi-source datasets substantially enhances model performance regardless of the model used, with TanDEM-X data standing out for their ability to provide valuable radar information on forest height and volume, particularly in complex terrain such as that of northern Spain.

1. Introduction

Rapid and precise assessment of forest ecosystem structure is becoming increasingly important in the context of climate change monitoring and the European Green Deal. Forests can sequester CO₂ and also provide numerous other ecosystem services. They therefore play an important role in climate change adaptation and in maintaining ecosystem health. To support sustainable use of forests and to improve understanding of the status of European forests, the European Union (EU) has implemented a number of policies that prioritize biodiversity conservation and sustainable forest management. These include the Biodiversity Strategy [1], the Carbon Removals and Carbon Farming Certification (CRCF) Regulation [2], and the Regulation on a Forest Monitoring Framework [3], all of which require forest monitoring with unprecedented spatial detail and temporal frequency.
Information about forests is traditionally obtained through national forest inventories (NFIs). However, the laborious and expensive fieldwork involved limits the temporal frequency of data acquisition to such an extent that in some areas, the measuring interval can be close to the rotation period (harvest years) of fast-growing forest plantations [4]. It is thus evident that NFI sampling alone cannot fulfill the new requirements for the temporal and spatial detail of forest information. Two possible approaches can be used to address this challenge: (i) reducing the interval of field inventories for fast-growing species (as in the Spanish NFI) and/or (ii) using remote sensing to acquire data.
In addition to meeting monitoring requirements, forestry stakeholders and managers are often interested in quantifying forest resources to support forest management decisions such as the development of management plans and forest resource planning as well as the implementation of silviculture or timber harvesting operations. Updated forest information is needed at the national, regional (e.g., autonomous communities in Spain), and local levels. At the higher levels, the information supports the formulation of forest policies and forest monitoring, while at local levels, it is needed for forest management, helping forest owners to optimize their management activities. Earth observation (EO) satellite systems provide high-frequency monitoring at 10–30 m resolution, depending on the type of sensor and mission, thus enabling continuous forest monitoring at the stand level [5]. In comparison with traditional inventories, in which sampling plots are established in forest stands and the data obtained are extrapolated to the entire stand on an area basis, remote sensing enables the production of spatially explicit data, which are particularly useful for forest management planning purposes. However, the most suitable data sources and methods can vary from one region to another and are strongly affected by local conditions (e.g., type of forestry activity, topography, and cloudiness).
Many remote sensing datasets require complex pre-processing, which demands both theoretical expertise and dedicated processing infrastructure and thus complicates standardization and the transfer of data and models to other areas. However, various space agencies apply standardized processing to their data and deliver end-user datasets that are ready for analysis. The Committee on Earth Observation Satellites (CEOS) has introduced the concept of Analysis-Ready Data (ARD), defined as geospatial data that have been processed to a minimum set of requirements and organized in a way that allows immediate analysis with minimal additional user effort, ensuring interoperability over time and with other datasets. Using ARD makes it possible to apply models to images from successive years, to time series, and so on, with minimal image-processing effort. The present study therefore focused on assessing the practical use of ARD in a topographically complex region of northern Spain.
Remote sensing sensors can be grouped into two categories: (i) active sensors, such as Airborne LiDAR Scanners (ALSs) and radar, and (ii) passive sensors, such as optical instruments. Data from optical sensors (e.g., Landsat and Sentinel-2) have traditionally been used in forest monitoring [6,7], although radar data can provide information on forest structure and have been used to map forest growing stock volume (L-band and C-band SAR datasets) [8,9,10,11,12]. Various models based on ALS and optical data [13] have been developed and tested for predicting forest variables (e.g., total volume, dominant height, and basal area) in the region of interest. However, radar imaging data, such as TanDEM-X, ALOS-2 PALSAR-2, or Sentinel-1 data, have not yet been tested for forest mapping in the study region. In theory, including radar observables should add information on the structural characteristics of the forest. In addition to radar backscatter, repeat-pass interferometric coherence can be used in the analysis. Coherence measures the degree of similarity between radar signals reflected from the same area at different times; it enables better differentiation of forest types and conditions and can be suitable for forest attribute retrieval [14,15]. A further, unique data source is the bistatic interferometric SAR satellite mission TanDEM-X [16], which can be used to produce very accurate digital surface models, with interferometric coherence uniquely suited to retrieving vegetation vertical structure [17,18].
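Repeat-pass coherence is usually estimated as the normalized magnitude of the locally averaged cross-product of two co-registered single-look complex (SLC) images. The NumPy sketch below illustrates that estimator; the boxcar window and its size `win` are illustrative assumptions, not the settings used to produce the coherence products in this study.

```python
import numpy as np


def coherence(s1: np.ndarray, s2: np.ndarray, win: int = 5) -> np.ndarray:
    """Sample coherence magnitude of two co-registered complex SLC images.

    gamma = |<s1 * conj(s2)>| / sqrt(<|s1|^2> * <|s2|^2>), where <.> is a sum
    over a win x win boxcar window. Only fully covered windows are returned,
    so the output shrinks by win - 1 pixels along each axis.
    """

    def boxcar_sum(a: np.ndarray) -> np.ndarray:
        # 2-D cumulative sum with a leading row/column of zeros (summed-area table).
        c = np.pad(np.cumsum(np.cumsum(a, axis=0), axis=1), ((1, 0), (1, 0)))
        return c[win:, win:] - c[:-win, win:] - c[win:, :-win] + c[:-win, :-win]

    cross = boxcar_sum(s1 * np.conj(s2))
    power1 = boxcar_sum(np.abs(s1) ** 2)
    power2 = boxcar_sum(np.abs(s2) ** 2)
    return np.abs(cross) / np.sqrt(power1 * power2)
```

Identical scenes (up to a constant phase shift) give a coherence of 1, while unrelated signals give values near 0; for small windows the estimate is biased upwards because it relies on a finite number of looks.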
Several authors [19,20,21,22,23] have used optical images to study forest parameters, with positive results. Others have complemented this information with data from additional sensors, generally radar [24,25,26,27]. These studies have demonstrated that the wavelength of the radar instrument can either favor or hinder the estimation of biomass or growing stock, so the choice of wavelength is important.
Various approaches can be used to analyze this type of data, including parametric and non-parametric models and machine learning. Machine learning approaches have been widely proven useful in land monitoring for a variety of purposes [28,29,30,31,32,33,34]. Several authors have compared machine learning algorithms for classification [35,36] and regression [37,38,39,40], showing that model fit depends largely on the training database and, to a lesser extent, on the type of variables used.
In Spain, several studies have examined the prediction of forest parameters from remote sensing data [41,42,43], and a small number have analyzed the contribution of radar information alone to forest predictions [44]. However, we are not aware of any study that has investigated forest parameter prediction in the region of interest with a multi-source EO approach that includes radar information, a region with complex orography and large forested areas that combine native forests with plantations.
In this study, we used several of the machine learning algorithms most widely used in forestry, namely multiple linear regression [45], k-Nearest Neighbors [46], and Random Forest [47], together with two gradient boosting ensemble methods that are less common in forestry but innovative, LightGBM [48] and XGBoost [49]. We evaluated the usability of these methods with different combinations of datasets to understand the potential of multi-source EO datasets for forest monitoring in the region.
Existing monitoring approaches used in forest management in northern Spain require annual updates of forest data, which can be challenging because of the complexity of the forest landscape, the small size of stands, the presence of mixed stands, difficult topography, and other factors. Monitoring plantations is particularly challenging because both the extent and the characteristics of the stands change rapidly. In this study, models for predicting forest variables of interest for the major commercial timber plantations were developed and evaluated. The main objective of this article is to find the optimal combinations of methods and datasets for forest monitoring in the region and to establish a benchmark against which the integration of novel deep learning techniques in the study area can be compared. To achieve this, the specific objectives were (i) to evaluate any improvement in the models from including radar variables acquired by the Sentinel-1, ALOS-2 PALSAR-2, and TanDEM-X satellites, as well as terrain and climate variables; (ii) to select the best machine learning (ML) algorithm for each forest variable; and (iii) to compare the results of a generic model with those of species-specific models. For these purposes, different combinations of satellite data were tested using Spanish NFI plot measurements as reference data. In addition to Sentinel-2 optical data, three radar sensors (C-band, Sentinel-1; L-band, ALOS-2 PALSAR-2; and X-band, TanDEM-X) and ancillary data were used.

2. Materials and Methods

2.1. Study Area

The region of interest in this study was northern Spain, which is characterized by a temperate climate with abundant rainfall well distributed throughout the year. As a result of these conditions, the region is the most productive forest area in the country [50]. This allows the use of fast-growing species, such as Eucalyptus globulus, with harvesting cycles of less than 15 years. Pinus pinaster is almost as important as Eucalyptus globulus in the region, while Pinus radiata is less common, except in the eastern area (Basque Country), where it is the most important commercial species. Most of the stands are privately owned plantations in which management is complicated by the small size of forest ownership, the lack of forestry associations, and the scarce or non-existent management of the stands, among other factors. The profitability of these small plantations is not sufficient to warrant contracting qualified technicians, who could recommend optimal silvicultural schemes. The treatments applied are therefore often those recommended by local companies, which primarily involve planting. The lack of control over the stands complicates forest management, leading to a lack of data required for proper regional planning. In this context, the use of remotely sensed data to detect changes and produce land cover, production, and productivity maps is invaluable for forestry managers.
The study area includes four administrative regions in northern Spain (Galicia, Asturias, Cantabria, and the Basque Country), which cover a total area of 5.2 million hectares. The area is mainly located in the European Atlantic Biogeographical Region, except for the southeastern part of Galicia, which belongs to the Mediterranean Biogeographical Region [51]. Forest cover amounts to 2.8 million hectares [4], representing 56% of the total surface area of the study region. Forest plantations are of great importance in the region, which is one of the most productive forest areas in Europe. The plantations are mainly composed of three fast-growing species: blue gum (Eucalyptus globulus Labill.), maritime pine (Pinus pinaster Aiton), and radiata pine (Pinus radiata D. Don). These plantations are of great socio-economic importance, as indicated by the average of 10.8 million m³ of wood cut annually in the four communities during the period 2005 to 2021. This volume of timber represents 67% of the total wood cut annually in Spain during that period, with the three species mentioned accounting for 92% of the 10.8 million m³ [4].

2.2. Field Reference Data

For this study, we used data from the most recent Spanish NFI inventory update (SNFI-4.5) (Figure 1), conducted in 2018, focusing on the three most productive forest species.
We only considered plots established in pure stands, in which the basal area of the main species accounted for at least 80% of the total basal area of the plot. This resulted in 1471 plots being available for analysis in the whole region. Of these, 589 were dominated by E. globulus, 474 by P. pinaster, and 408 by P. radiata.
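The 80% purity criterion can be expressed as a simple filter. This is a minimal sketch assuming each plot is summarized by its basal area per species; the `Plot` class and `dominant_species` function are hypothetical names, not part of the SNFI data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional

TARGET_SPECIES = ("Eucalyptus globulus", "Pinus pinaster", "Pinus radiata")


@dataclass
class Plot:
    plot_id: str
    basal_area: Dict[str, float]  # basal area (m2/ha) per species in the plot


def dominant_species(plot: Plot, min_share: float = 0.80) -> Optional[str]:
    """Return the main species if it is a target species holding at least
    min_share of the plot's total basal area; otherwise return None."""
    total = sum(plot.basal_area.values())
    if total <= 0:
        return None
    species, ba = max(plot.basal_area.items(), key=lambda kv: kv[1])
    if species in TARGET_SPECIES and ba / total >= min_share:
        return species
    return None
```

Counting the non-`None` results per species over all plots would then give per-species plot totals such as those reported above.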
The SNFI plots [50] are systematically located at the intersections of a 1 km × 1 km grid within the forested areas. Each field plot consists of four concentric circular subplots with fixed areas, with radii of 5, 10, 15, and 25 m. In the 25 m circle, trees with a DBH ≥ 42.5 cm are measured; in the 15-m circle, trees with a DBH ≥ 22.5 cm are measured; in the 10-m circle, trees with a DBH ≥ 12.5 cm are measured; and in the 5-m circle, all trees with a DBH ≥ 7.5 cm are measured, while trees with a DBH between 2.5 and 7.5 cm are only counted. Field data collected include the azimuth and distance of each measured tree from the plot center, plot identification, forest type, tree measurements (DBH and height of the measured trees), erosion factors, anthropogenic activities, tree damage, shrub species, and shrub cover within the 10-m subplot.
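The text lists the subplot radii and DBH thresholds but not how measurements are scaled to per-hectare values. For concentric plots, each measured tree is usually weighted by 10 000/(πr²), where r is the radius of the subplot in which trees of its size are measured; the sketch below assumes that convention, and the function names are hypothetical.

```python
import math

# (minimum DBH in cm, subplot radius in m), checked from the largest subplot down.
SUBPLOTS = ((42.5, 25.0), (22.5, 15.0), (12.5, 10.0), (7.5, 5.0))


def trees_per_ha(dbh_cm: float) -> float:
    """Assumed expansion factor: trees per hectare represented by one measured tree."""
    for min_dbh, radius in SUBPLOTS:
        if dbh_cm >= min_dbh:
            return 10_000.0 / (math.pi * radius ** 2)
    return 0.0  # below 7.5 cm DBH: trees are only counted, not measured


def per_hectare(trees) -> float:
    """Scale a per-tree quantity (e.g. stem volume in m3) to a per-hectare total.

    `trees` is an iterable of (dbh_cm, value) pairs for the measured trees.
    """
    return sum(trees_per_ha(dbh) * value for dbh, value in trees)
```

For example, a tree with a DBH of 30 cm is measured within the 15 m subplot and therefore represents about 14.1 trees per hectare.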
Key forest variables, including the dominant height ($H_0$) and total over-bark volume per hectare ($TV_{ha}$), were derived from the tree measurements.
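The section does not define how dominant height is computed. One widely used definition, assumed here only for illustration, is Assmann's dominant height: the mean height of the 100 thickest trees per hectare. With per-tree expansion factors it can be sketched as follows (`dominant_height` is a hypothetical helper).

```python
import math
from typing import Iterable, Tuple


def dominant_height(trees: Iterable[Tuple[float, float, float]], n_dom: float = 100.0) -> float:
    """Mean height of the n_dom thickest trees per hectare (Assmann definition).

    `trees` holds (dbh_cm, height_m, trees_per_ha) for each measured tree, where
    trees_per_ha is the tree's expansion factor. Trees are taken in order of
    decreasing DBH; the last one contributes only the share needed to reach n_dom.
    If the stand has fewer than n_dom trees/ha, all trees are averaged.
    """
    remaining = n_dom
    weighted_sum = 0.0
    weight = 0.0
    for _dbh, height, n in sorted(trees, key=lambda t: t[0], reverse=True):
        w = min(n, remaining)
        weighted_sum += w * height
        weight += w
        remaining -= w
        if remaining <= 0:
            break
    return weighted_sum / weight if weight > 0 else math.nan
```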

2.3. EO Data

2.3.1. Sentinel-2

Sentinel-2 is a satellite mission operated by the European Space Agency and designed for high-resolution optical imaging. It consists of two satellites that provide multi-spectral data across 13 bands, with a revisit time of 5 days at the Equator (2–3 days at mid-latitudes). For our study area, we selected images with minimal observation angles, taken under cloud-free conditions during the summer months. We used Sentinel-2 Level-2A products, which include atmospheric and topographic correction. The bands used to estimate the dependent variable and to compute spectral indices were B2, B3, B4, B5, B6, B7, B8, B8A, B11, and B12. Additionally, the Quality Assessment band (BQA) was used for outlier detection. All images (Table 1) were downloaded using the Forestry TEP platform developed by VTT (https://f-tep.com/, accessed on 19 March 2025). All bands were resampled to 10 m spatial resolution using nearest-neighbour resampling. Because 10 m is the highest resolution available from Sentinel-2 (bands B2, B3, B4, and B8), this choice preserves the spatial detail of those bands and improves the precision and representativeness of the resulting map.
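On Sentinel-2 Level-2A tiles the 20 m grid is aligned with the 10 m grid, so nearest-neighbour resampling from 20 m to 10 m amounts to copying each 20 m pixel into a 2 × 2 block. A minimal NumPy sketch, assuming two already co-registered grids with a shared origin (the reflectance values are made up):

```python
import numpy as np


def upsample_nearest(band: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling by an integer factor (20 m -> 10 m uses factor=2).

    Assumes the coarse and fine grids share the same origin, so each coarse
    pixel maps exactly onto a factor x factor block of fine pixels.
    """
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


b11_20m = np.array([[0.21, 0.25], [0.30, 0.18]])  # illustrative SWIR reflectances
b11_10m = upsample_nearest(b11_20m, 2)            # shape (4, 4)
```

Nearest-neighbour resampling keeps the original reflectance values unchanged, which is one reason to prefer it over interpolating methods when the values feed spectral indices.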

2.3.2. Sentinel-1

As a source of preprocessed Sentinel-1 data, we used an extract from the “Global Seasonal Sentinel-1 Interferometric Coherence and Backscatter” dataset [52]. The dataset provides a global spatial representation of multi-seasonal synthetic aperture radar (SAR) repeat-pass interferometric coherence and backscatter signatures computed from single-look complex (SLC) images acquired by the ESA Sentinel-1 sensors in Interferometric Wide-Swath mode between 1 December 2019 and 30 November 2020.
We used the following summer-season mosaic metrics: median 6-day repeat-pass coherence for C-band VV-polarized data, mean backscatter for VV and VH polarizations, and local incidence angle, accounting for layover/shadow areas, for all relative orbits of Sentinel-1A and Sentinel-1B within the study site (Table 2).
The summer-season dataset was acquired in 2020, whereas our field measurements and the other EO data date from 2018. Despite this two-year gap between the reference data and the Sentinel-1 coherence, we included the dataset to assess the usefulness of Sentinel-1 images in the absence of other ARD datasets.

2.3.3. ALOS-2 PALSAR-2

We also used L-band SAR data from the 2018 ALOS-2 PALSAR-2 mosaic product provided by JAXA. It contains orthorectified and radiometrically corrected backscatter at HH and HV polarizations, as well as information on the observation date and local incidence angle. See Table 3 for the layers used.
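If the mosaic layers are distributed as amplitude digital numbers (DNs), they are commonly converted to γ⁰ backscatter in decibels with JAXA's published calibration factor, γ⁰ = 10·log10(DN²) + CF with CF = −83.0 dB. The sketch below assumes that convention and treats DN = 0 as no-data; both should be checked against the documentation of the mosaic version actually used.

```python
import numpy as np

# Assumed JAXA mosaic calibration factor in dB; verify for the product version used.
CALIBRATION_FACTOR_DB = -83.0


def dn_to_gamma0_db(dn: np.ndarray) -> np.ndarray:
    """Convert mosaic amplitude DNs to gamma-nought backscatter in dB.

    DN = 0 is assumed to mark no-data and is returned as NaN.
    """
    dn = np.asarray(dn).astype(np.float64)  # cast first to avoid integer overflow in DN**2
    out = np.full(dn.shape, np.nan)
    valid = dn > 0
    out[valid] = 10.0 * np.log10(dn[valid] ** 2) + CALIBRATION_FACTOR_DB
    return out
```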
L-band SAR data are well known as one of the best sources for satellite image-based predictions of forest variables, with demonstrated potential to map forest growing stock volume, above-ground biomass, and structural forest variables [53,54,55].

2.3.4. TanDEM-X

To evaluate the usability of TanDEM-X interferometric phase-based products, we derived a canopy height model estimate from the TanDEM-X-based digital surface model (DSM) and an ALS-based local terrain model. For this purpose, we first combined the Copernicus DEM (a DSM for 2013/2014) with the TanDEM-X 30 m DEM Change Maps, which provide height differences relative to the Copernicus DEM up to 2018/2019. Combining these datasets yields an updated DSM close to our observation period. By subtracting an accurate regional terrain model from this updated DSM, we obtained relative InSAR height information close to our observation time, which is assumed to correlate well with the target forest variables.
In other words, the TanDEM-X-based height for 2013–2014 is combined with the TanDEM-X change map for 2018–2019 to obtain the TanDEM-X height for 2018–2019. To normalize this height and derive the relative InSAR height of the forest, we subtract an ALS-derived digital terrain model of the study area. The result is the TanDEM-X variable that we relate to the forest variables.
$$H_{\mathrm{tdx}} = H_{\mathrm{TanDEM\text{-}X}} + \Delta H_{\mathrm{TanDEM\text{-}X}} - \mathrm{DEM}_{\mathrm{ALS}}$$
This formula gives the relative height from TanDEM-X ($H_{\mathrm{tdx}}$), where
  • $H_{\mathrm{TanDEM\text{-}X}}$ is the initial TanDEM-X height;
  • $\Delta H_{\mathrm{TanDEM\text{-}X}}$ is the change in TanDEM-X height;
  • $\mathrm{DEM}_{\mathrm{ALS}}$ is the digital elevation model (DEM) derived from ALS data.
This process combines the initial height from TanDEM-X with the height changes observed over a specific time period and then subtracts the ALS-based terrain model to derive the relative forest height for the study area.
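The relative-height calculation is a per-pixel raster operation. This NumPy sketch assumes the three inputs have already been resampled to a common grid and referenced to the same vertical datum (a conversion step not described in the text); `tandemx_relative_height` is a hypothetical name.

```python
import numpy as np


def tandemx_relative_height(dsm_initial, delta_h, dtm_als):
    """Relative InSAR height: H_tdx = (H_TanDEM-X + dH_TanDEM-X) - DEM_ALS.

    dsm_initial: surface heights for 2013/2014 (Copernicus DEM)
    delta_h:     height change from the TanDEM-X DEM Change Maps (to 2018/2019)
    dtm_als:     ALS-derived terrain heights
    All arrays must share one grid and vertical datum; NaN marks no-data.
    """
    dsm_updated = np.asarray(dsm_initial, dtype=float) + np.asarray(delta_h, dtype=float)
    return dsm_updated - np.asarray(dtm_als, dtype=float)


dsm = np.array([[120.0, 135.0, np.nan]])
dh = np.array([[0.0, -14.0, 0.0]])  # second pixel: surface lowered, e.g. by harvesting
dtm = np.array([[100.0, 118.0, 90.0]])
h_tdx = tandemx_relative_height(dsm, dh, dtm)  # [[20.0, 3.0, nan]]
```

In the second pixel the negative height change brings the updated surface close to the terrain, giving a low relative height; no-data (NaN) pixels propagate to the output.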
Despite limited radiowave penetration at X-band, TanDEM-X data products have proved to be a useful source of forest height information [56,57,58] and can be combined with other EO data sources to estimate other forest parameters, such as volume, dominant height, and basal area.

2.4. Other Features

In addition to the optical and radar features described above, 19 spectral indices were derived from the Sentinel-2 bands, as listed in Table A1.
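Most Sentinel-2 spectral indices are normalized band differences. The sketch below computes NDVI (B8, B4) and two other normalized differences using B5 and B11 as examples; Table A1 holds the actual list of 19 indices, and the reflectance values here are invented.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), returning NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total != 0, (a - b) / total, np.nan)


# Illustrative Level-2A surface reflectances for two pixels.
b4 = np.array([0.04, 0.10])   # red
b5 = np.array([0.09, 0.12])   # red edge
b8 = np.array([0.35, 0.15])   # near infrared
b11 = np.array([0.15, 0.20])  # shortwave infrared

ndvi = normalized_difference(b8, b4)          # NDVI
red_edge_nd = normalized_difference(b8, b5)   # NIR / red-edge normalized difference
nir_swir_nd = normalized_difference(b8, b11)  # NIR / SWIR (moisture-sensitive) difference
```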
The environmental (climate and elevation) data used are shown in Table 4.
We obtained the climate information from the Digital Climatic Atlas of the Iberian Peninsula developed by [59], a set of digital climatic maps of air temperature (minimum, mean, and maximum), precipitation, and solar radiation at 200 m spatial resolution. The series used to generate these maps cover 15 years for temperature and 20 years for precipitation.
A digital elevation model (DEM) with a spatial resolution of 5 m, developed by the Spanish National Center for Geographic Information (CNIG), was obtained for the four administrative areas. This model is available for free download at https://centrodedescargas.cnig.es/CentroDescargas/home, accessed on 15 September 2024. The DEM was resampled to 10 m using cubic convolution, and the terrain variables included in the models were calculated in ArcGIS Pro 3.4.
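Terrain variables such as slope are derived from the DEM by finite differences. The sketch below uses central differences via `np.gradient`; it is a simpler stand-in for ArcGIS Pro's own 3 × 3 neighbourhood algorithm and will give slightly different values, and slope is used only as an example because the full variable list is in Table 4.

```python
import numpy as np


def slope_degrees(dem: np.ndarray, cell_size: float = 10.0) -> np.ndarray:
    """Slope in degrees from a DEM using central differences (one-sided at edges)."""
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# A plane rising 1 m per 10 m cell towards the east.
dem = np.tile(np.arange(5, dtype=float), (5, 1))
slope = slope_degrees(dem, cell_size=10.0)
```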

2.5. Data Analysis

To obtain the best models for the study area, we divided this study into three parts: (i) a study of the sensitivity of total volume to remote sensing data using three different datasets (optical, radar, and optical–radar), followed by improvement of the resulting model through the inclusion of terrain and climatic variables; (ii) a comparison of five different regression algorithms; and (iii) the development of models for forest plantations as a whole and for each species.
Finally, the results of the plantation-level models were compared with those obtained by applying the same models separately to each species. This also enabled us to assess whether a precise species-level cover classification map is needed, or whether a map that only differentiates between plantations and broadleaf forests is sufficient.
The software used can be found in Appendix A.1.

2.5.1. Workflow

A detailed description of the data processing workflow is shown in Figure 2.

2.5.2. Outlier Detection

We excluded plots located in radar shadow/layover areas from analyses involving radar information, as well as plots with values other than 4 (vegetation) in the Sentinel-2 Quality Assessment band (BQA) (Table 5). In addition, plots that were clear-cut in 2019 were excluded from analyses involving TanDEM-X data.
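The exclusion rules can be written as a per-plot predicate. A sketch under stated assumptions: the `PlotRecord` fields are hypothetical, and the Sentinel-2 QA check is assumed to apply to every dataset combination, while the radar and clear-cut checks depend on which data a model uses.

```python
from dataclasses import dataclass

VEGETATION_CLASS = 4  # Sentinel-2 QA value retained, per the text


@dataclass
class PlotRecord:
    plot_id: str
    radar_shadow_layover: bool  # plot falls in SAR shadow/layover
    s2_qa_class: int            # Sentinel-2 QA value at the plot
    clear_cut_2019: bool        # plot clear-cut in 2019


def keep_plot(p: PlotRecord, uses_radar: bool, uses_tandemx: bool) -> bool:
    """Return True if the plot passes the outlier rules for a given dataset mix."""
    if p.s2_qa_class != VEGETATION_CLASS:  # assumed to apply to every model
        return False
    if uses_radar and p.radar_shadow_layover:
        return False
    if uses_tandemx and p.clear_cut_2019:
        return False
    return True
```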

2.5.3. Models, Feature Selection, and Performance

The analysis consisted of a regression modeling process with feature selection and hyperparameter optimization to enhance the accuracy of the predictive models. Initially, the data were normalized to ensure that all features were on a similar scale, which is crucial for many machine learning algorithms.
In the first phase, the importance of the different available data sources was analyzed using the multiple linear regression (MLR) algorithm [45], and the best combination was selected as a preliminary step before evaluating the five proposed algorithms.
Subsequently, Lasso feature selection [60] was used to identify the most influential variables, while hyperparameters were optimized simultaneously through a cross-validated grid search to maximize the predictive performance of each model.
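A scikit-learn sketch of Lasso-based selection combined with a cross-validated grid search. The exact configuration is not given here (final parameters are in Table A2), so the pipeline layout, the synthetic data, and the parameter grid are assumptions. Placing the scaler and the Lasso selector inside the pipeline means they are refitted on each training fold, so feature selection never sees the validation data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 300 plots, 30 candidate features, 5 informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 30))
y = (150 + 60 * X[:, 0] + 40 * X[:, 1] - 30 * X[:, 2] + 20 * X[:, 3] + 10 * X[:, 4]
     + rng.normal(scale=20, size=300))

pipe = Pipeline([
    ("scale", StandardScaler()),                          # common feature scale
    ("select", SelectFromModel(Lasso(max_iter=10_000))),  # keep non-zero Lasso coefficients
    ("model", RandomForestRegressor(random_state=0)),
])

param_grid = {
    "select__estimator__alpha": [0.1, 1.0],  # larger penalty -> fewer features kept
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 8],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
n_selected = search.best_estimator_.named_steps["select"].get_support().sum()
```

`search.best_params_` then holds the selected Lasso penalty and forest settings, and `n_selected` the number of features retained by the best pipeline.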
Several popular regression models were evaluated, including multiple linear regression (MLR) [45], k-Nearest Neighbors (kNN) [46], Random Forest [47], LightGBM [48], and XGBoost [49].
The primary objective was to identify the model and feature combination that provided the best fit for the data and yielded the most accurate predictions. The aim of combining feature selection and grid search was to find the simplest and most effective model possible, avoiding overfitting and improving generalization to new data. Furthermore, exhaustive evaluation of different models enabled comparison of the performance and selection of the most suitable model for the task at hand.

2.5.4. Uncertainty Evaluation

In this study, 10-fold cross-validation was performed to assess the goodness of fit of the models. This method divides the dataset into 10 parts (folds) of approximately equal size; each fold is used once as the validation set, while the remaining nine are used for training.
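The fold logic can be sketched in plain Python. Because 1471 plots cannot be split into 10 exactly equal parts, fold sizes here differ by at most one; the shuffling seed is an arbitrary assumption.

```python
import random


def kfold_indices(n: int, k: int = 10, seed: int = 0):
    """Yield (train, validation) index lists; every sample is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size
```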
The models were evaluated using metrics averaged over the 10 cross-validation folds: the root mean squared error (RMSE), the pseudo-coefficient of determination (R²), prediction bias, and relative RMSE (RMSE (%)), which together provide a comprehensive view of their performance.
The following equations were used as evaluation metrics:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
Root Mean Squared Error (RMSE) quantifies the difference between predicted values ($\hat{y}_i$) and observed values ($y_i$). A lower RMSE indicates a better fit of the model to the data.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
The coefficient of determination (R²) measures the proportion of variance in the dependent variable that is predictable from the independent variables. A higher R² indicates a better model fit.
$$\mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)$$
Bias is the average difference between the observed and predicted values. A bias close to zero suggests no systematic error in the predictions.
$$\mathrm{RMSE}(\%) = \frac{\mathrm{RMSE}}{\bar{y}} \times 100$$
Relative RMSE (RMSE (%)) normalizes the RMSE by the mean of the observed data ($\bar{y}$), expressing model error as a percentage of the magnitude of the target variable.
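The four metrics can be computed as follows. This plain-Python sketch assumes that bias in percent, which the results report, is the bias divided by the observed mean and multiplied by 100, by analogy with RMSE (%).

```python
import math


def evaluate(observed, predicted):
    """RMSE, R2, bias (observed - predicted), and bias/RMSE in % of the observed mean."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)                 # residual sum of squares
    sst = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    rmse = math.sqrt(sse / n)
    bias = sum(residuals) / n
    return {
        "RMSE": rmse,
        "R2": 1 - sse / sst,
        "Bias": bias,
        "Bias(%)": 100 * bias / mean_obs,  # assumed definition
        "RMSE(%)": 100 * rmse / mean_obs,
    }


metrics = evaluate([100, 200, 300], [110, 190, 330])
```

Applied to each validation fold and then averaged, these values correspond to the cross-validated metrics described above; pooling all out-of-fold predictions before computing them is an alternative that can give slightly different numbers.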

3. Results and Discussion

3.1. Database Selection

Total volume was used as the target variable for comparing the contribution of different EO and auxiliary datasets to select the best feature configuration for forest variable prediction.
Table 6 summarizes the performance of the machine learning algorithms (kNN, Random Forest, LightGBM, XGBoost, and multiple linear regression) with two datasets (Sentinel-2 alone and Sentinel-2 combined with TanDEM-X). This comparison aimed to assess whether the choice of dataset influences the relative performance of the models in subsequent modeling phases. Although adding TanDEM-X data improved predictive accuracy for all models (higher R² values and lower RMSE and RMSE (%) values), the relative performance of the algorithms remained very similar.
The multiple linear regression (MLR) results presented in Table 7 show the performance of different models combining optical, radar, and ancillary variables (terrain and climate). In the first step, models based on spectral bands (SBs) and spectral indices (SIs) yielded low R² values (0.24–0.26), high bias (−8.1% to −9.0%), and high RMSE values (131–134), indicating substantial discrepancies between predicted and observed values. The relative RMSE of 66% highlights the limited predictive capacity of models that include only optical data. These findings are consistent with those of [61,62], who reported similarly low predictive performance, although other studies, such as [63], achieved better results with different modeling strategies.
Performance also remained limited for radar-only models. The Sentinel-1 model yielded a very low R² (0.0122), and the ALOS-2 PALSAR-2 model a slightly higher one (0.07). Despite the low R² values, the bias and RMSE values are in line with what can be expected given the limitations of radar data when small plots are used for training. This is not surprising, since small field plots are not optimal reference data for evaluating radar-based predictions; field data measured over larger areas would be more suitable.
Combining radar data sources substantially improved model performance. The TanDEM-X model yielded an R² of 0.33, and the X + L and C + X band combinations increased R² to 0.37 and 0.34, respectively. The model including all radar data (Sentinel-1, ALOS-2 PALSAR-2, and TanDEM-X) yielded the highest R² among the radar-based models (0.39), with minimal bias (−0.17%) and the lowest RMSE (119.38). These results highlight the value of combining different radar bands and sources to improve prediction accuracy.
In step 2, integrating optical and radar data substantially improved model performance. The best combination of optical and radar data (SB + SI + all radar) yielded an R² of 0.47, with a bias of −4.08% and an RMSE of 111.49. Adding terrain variables marginally improved R² (to 0.48), with slight reductions in bias and RMSE. Including climate variables, or terrain and climate variables together, yielded R² values between 0.46 and 0.47. Thus, although terrain and climate variables slightly improved the model, they did not substantially increase predictive performance relative to the best optical + radar model. This is consistent with similar studies that report limited incremental gains from ancillary variables for volume prediction [61].
Radar data often have coverage gaps, especially in rugged terrain, where obtaining wall-to-wall data for the whole study area is challenging. When only the optical + radar model (combination 3 + 6) was used, R² reached 0.44, with a slightly higher RMSE (114.09), making it comparable to the Phase 2 models. Despite these limitations, the inclusion of radar bands can be justified by the model improvement obtained.
In the study area, approximately 4% of the surface is affected by radar shadow or layover (Figure 3). Optical data, which are limited only by cloud cover, provide complete spatial coverage, so combining them with other auxiliary variables avoids reliance on radar data and their shadow gaps. Including terrain or climate data slightly increased R² (to 0.46) and reduced RMSE (to 112), demonstrating the value of these data for generating continuous maps without relying on radar.
Based on the Phase 1 results, the combination selected for Phase 2 was the model combining Sentinel-2, Sentinel-1, ALOS-2 PALSAR-2, TanDEM-X, and terrain variables (model 12), which does not include climate variables. However, because Sentinel-1 and ALOS-2 PALSAR-2 lack continuous coverage, an additional analysis was conducted using only variables with full coverage. Although this model (combination 15) did not reach the same R² and RMSE values, it was deemed suitable for wall-to-wall mapping applications, similar to the approach used in [62]. Option 18, which incorporates terrain and climate variables, emerges as an optimal choice for continuous variable prediction and subsequent mapping.

3.2. Algorithm Selection and Parameter Optimization

We ran a grid search for each model to identify its optimal parameter configuration and also performed feature selection to reduce dimensionality.
Table 8 summarizes the performance of five machine learning models (kNN, Random Forest, LightGBM, XGBoost, and MLR) in predicting the two target variables, total volume (TVha) and dominant height (H0), with all variables included, using different statistical metrics. The R2 values indicate that the gradient boosting models (LightGBM and XGBoost) are the most effective at explaining data variability, yielding R2 values above 0.49 for both variables. By contrast, kNN performed worst, with R2 values close to 0.34 for H0 and 0.39 for TVha. Multiple linear regression (MLR) remained competitive, producing adequate results for metrics such as bias and relative error, probably owing to the correlation between the target variables and TanDEM-X height.
In terms of error and bias, LightGBM and XGBoost yielded the lowest RMSE and RMSE (%) values, and Random Forest performed at an intermediate level. Overall, LightGBM and XGBoost appear best suited to modeling forest variables, although MLR offers the highest interpretability. The parameters obtained for each model by grid search in Python are shown in Table A2.
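Because a grid search fits every combination of candidate values, the size of each grid largely determines the tuning cost. The sketch below transcribes the candidate values listed in Table A2 and enumerates them with `itertools.product`, which is the same exhaustive Cartesian enumeration that scikit-learn's `GridSearchCV` performs. The five-fold cross-validation used to count model fits is an assumption for illustration; the number of folds is not stated in this section.

```python
from itertools import product

# Candidate values transcribed from Table A2.
PARAM_GRIDS = {
    "kNN": {
        "n_neighbors": [5, 7, 9, 11],
        "weights": ["uniform", "distance"],
        "metric": ["minkowski"],
        "p": [1, 2],
    },
    "RandomForest": {
        "n_estimators": [50, 100, 200, 300, 500],
        "max_depth": [None, 5, 10, 15, 20, 30],
        "min_samples_split": [2, 5, 10, 20],
        "min_samples_leaf": [1, 2, 4, 8],
        "max_features": ["sqrt", "log2", None],
    },
    "LightGBM": {
        "num_leaves": [20, 31, 50, 70, 100],
        "learning_rate": [0.01, 0.05, 0.1, 0.2],
        "n_estimators": [50, 100, 200, 300],
        "boosting_type": ["gbdt", "dart", "goss"],
        "max_depth": [-1, 5, 10, 15],
    },
    "XGBoost": {
        "n_estimators": [50, 100, 200, 300],
        "max_depth": [3, 5, 7, 9, 11],
        "learning_rate": [0.01, 0.05, 0.1, 0.2],
        "subsample": [0.6, 0.8, 1.0],
        "colsample_bytree": [0.6, 0.8, 1.0],
    },
    "MLR": {"fit_intercept": [True, False]},
}


def iter_configs(grid):
    """Yield every parameter combination of a grid as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_sizes(grids, n_folds=5):
    """Return {model: (n_configurations, n_model_fits)}.

    n_folds is a hypothetical number of cross-validation folds.
    """
    sizes = {}
    for model, grid in grids.items():
        n_configs = sum(1 for _ in iter_configs(grid))
        sizes[model] = (n_configs, n_configs * n_folds)
    return sizes


for model, (n_configs, n_fits) in grid_sizes(PARAM_GRIDS).items():
    print(f"{model}: {n_configs} configurations, {n_fits} fits")
```

The Random Forest grid alone holds 1440 configurations, which illustrates why exhaustive grids become expensive as more parameters are added.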
We evaluated a Lasso-based feature reduction approach, analyzing three feature sets for each model and each target variable (TVha and H0). Variables were selected on the basis of their importance (Figure 4). The first group comprised the top-ranked variables whose combined importance exceeded 40%; the second group, those whose combined importance exceeded 75%; and the third group, all variables with an individual importance above 1%.
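The three grouping rules can be expressed as a short selection routine. The sketch below takes any non-negative importance scores (for example, absolute standardized Lasso coefficients, which is an assumption, since this section does not state how the importances in Figure 4 were computed), normalizes them to shares, and builds the 40%, 75%, and >1% groups. The variable names and scores in the example are hypothetical.

```python
def importance_groups(importances, cum_thresholds=(0.40, 0.75), min_share=0.01):
    """Build feature groups from importance scores.

    importances: dict mapping feature name -> non-negative importance.
    Scores are normalized to shares summing to 1 before grouping.
    Returns a list of feature-name lists:
      - one group per cumulative threshold: the top-ranked features whose
        summed share first exceeds that threshold;
      - a final group with every feature whose own share exceeds min_share.
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    shares = [(name, value / total) for name, value in ranked]

    groups = []
    for threshold in cum_thresholds:
        selected, running = [], 0.0
        for name, share in shares:
            selected.append(name)
            running += share
            if running > threshold:
                break
        groups.append(selected)
    groups.append([name for name, share in shares if share > min_share])
    return groups


# Hypothetical importance scores for illustration only.
example_importances = {
    "tdx_height": 35, "s2_b8": 15, "palsar_hv": 12, "slope": 10,
    "s1_vh": 8, "ndvi": 7, "elevation": 6, "tcw": 4,
    "temperature": 2.5, "precipitation": 0.5,
}
for group in importance_groups(example_importances):
    print(len(group), group)
```

Ranking before accumulating is what makes the cumulative groups nested: every variable in the 40% group also appears in the 75% group.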
Table 9 shows the performance metrics for five regression models (kNN, LGBM, MLR, RF, and XGBoost) under three configurations with increasing numbers of variables (N Var = 5, 13, 23). The metrics evaluated include R2, bias (absolute and percentage), RMSE, and relative RMSE (RMSE (%)). In general, the performance of all models improved as the number of variables increased; only kNN performed better with a medium number of variables. XGBoost produced the best results, with an R2 of 0.49, the lowest RMSE (109.07), and a small percentage bias of −1.30% when using 23 variables, an improvement on the result obtained with all features. LGBM was the next-best model, performing well in terms of RMSE and RMSE (%), but it did not surpass XGBoost in overall performance.
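For reference, the five metrics used in Tables 8 to 11 can be computed as in the sketch below. The exact conventions are assumptions, not taken from the paper: bias is taken as mean(predicted − observed), so that a negative bias means underestimation (consistent with how bias is interpreted in Section 3.3), and the percentage metrics are expressed relative to the mean observed value.

```python
import math


def evaluation_metrics(observed, predicted):
    """R2, bias, bias (%), RMSE and RMSE (%) for paired values.

    Assumed conventions:
      bias   = mean(predicted - observed); negative means underestimation
      bias % = 100 * bias / mean(observed)
      RMSE % = 100 * RMSE / mean(observed)
      R2     = 1 - SS_res / SS_tot
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    bias = sum(residuals) / n
    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "bias": bias,
        "bias_pct": 100.0 * bias / mean_obs,
        "RMSE": rmse,
        "RMSE_pct": 100.0 * rmse / mean_obs,
    }


# Hypothetical plot volumes (m3/ha) and model predictions.
obs = [100.0, 200.0, 300.0, 400.0]
pred = [90.0, 190.0, 280.0, 400.0]
print(evaluation_metrics(obs, pred))
```

Because the percentage metrics divide by the mean observed value, datasets with a wide volume range (such as 0.68 to about 700 m³/ha) are not directly comparable with narrower ones on RMSE (%) alone.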
The same analysis was conducted for dominant height (Table 10), with three configurations of varying numbers of predictors (N Var = 2, 7, 16). As with total volume, the performance of all models except kNN consistently improved as the number of predictors increased. Across the metrics considered (R2, bias, percentage bias, RMSE, and RMSE (%)), LGBM performed best overall, yielding the highest R2 (0.49) and the lowest RMSE (4.99) and RMSE (%) (24.12%) with 16 predictors. XGBoost and MLR also showed high predictive capacity, with R2 and error metrics comparable to those of LGBM, particularly when more predictors were included.
The performance of kNN and RF was intermediate, with slightly higher RMSE and RMSE (%) values than those of XGBoost and LGBM. However, the low percentage bias of RF (0.99% with 15 variables) indicates consistent predictions, although this model falls short of XGBoost in terms of R2.
Overall, the gradient boosting models emerged as the most effective for this dataset owing to their high predictive accuracy and low error rates across configurations: XGBoost for TVha and LightGBM for H0.

3.3. Forest Plantation Models

We developed eight models, four for each target variable (one for each species and one for all forest plantations combined), to examine the influence of species on the spectral signature and the benefit of modeling each species separately. We selected the best models from the previous analysis: LightGBM for H0 and XGBoost for TVha.
Table 11 shows the statistical results for the two target variables, H0 and TVha, for three tree species (Eucalyptus globulus, Pinus radiata, and Pinus pinaster) and for the Forest Plantations group as a whole. For H0, the R2 values indicate moderate predictive power for all species and for the forest plantations, with Pinus pinaster yielding the highest value (0.51) and Eucalyptus globulus and Pinus radiata yielding similar values close to 0.44. The bias was generally low, with Pinus radiata showing a bias close to zero. The lowest RMSE corresponded to Pinus pinaster (4.19) and the lowest RMSE (%) to Pinus radiata (19%). Compared with previous studies, which reported R2 values of around 0.37 for each species, the inclusion of TanDEM-X height provided valuable information and improved the results.
For TVha, the R2 value was highest for Pinus pinaster (0.61) and lowest for Pinus radiata (0.35). The largest negative bias was obtained for Pinus radiata (−2.87), indicating underestimation, whereas Pinus pinaster showed a small positive bias (0.44), indicating slight overestimation. The RMSE values for TVha show that these models performed worse than those for H0. In terms of relative error, Eucalyptus globulus yielded the highest RMSE (%) (60%), indicating greater variability in its predictions than for the other species. Previous studies, such as [61], reported R2 values of around 0.45 for this variable; in the current study, lower values were obtained for some species.
In general, these results show that the predictions were best for Pinus pinaster for both variables. This may be due to the type of stand formed by Pinus pinaster, whose annual growth is the lowest of the three species, so the spread of radar acquisition dates over 2018–2019 had less influence than for Eucalyptus globulus, which has higher average growth rates. Pinus pinaster stands therefore remained more stable over time, which could explain the better fit for this species. The current annual increment (CAI) in dominant height for each species, obtained for the study area from the Galician site quality curves of [64], was 1.82 m year−1 for E. globulus, 0.80 m year−1 for P. radiata, and 0.55 m year−1 for P. pinaster. Therefore, the temporal mismatch between field data collection and image acquisition had less influence on the species with the slowest growth rates.
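A simple calculation shows the size of this effect. The sketch below converts the CAI values quoted above into the dominant-height change that a given mismatch between plot measurement and imagery would leave unrecorded. The linear-growth assumption and the 2-year offset are hypothetical simplifications chosen only for illustration.

```python
# Dominant-height current annual increment (CAI, m/year) quoted in the text
# for the study area, from the site quality curves of [64].
CAI_M_PER_YEAR = {
    "E. globulus": 1.82,
    "P. radiata": 0.80,
    "P. pinaster": 0.55,
}


def expected_height_drift(cai_m_per_year, offset_years):
    """Approximate dominant-height change accumulated over a time offset
    between field measurement and image acquisition.

    Assumes growth continues linearly at the CAI rate, which ignores the
    curvature of real height-growth curves.
    """
    if offset_years < 0:
        raise ValueError("offset_years must be non-negative")
    return cai_m_per_year * offset_years


# Hypothetical 2-year mismatch between plot measurement and imagery.
for species, cai in CAI_M_PER_YEAR.items():
    drift = expected_height_drift(cai, 2)
    print(f"{species}: ~{drift:.2f} m of unrecorded height growth")
```

Under this simplification, E. globulus accumulates more than three times the unrecorded height change of P. pinaster for the same offset.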
Compared with the results obtained by [61], there was a substantial improvement for Pinus pinaster, for which an R2 value of 0.45 was obtained in the earlier study. The forest plantation models generally yielded results intermediate between those of the individual species. For H0, the R2 value for the forest plantations was 0.49, higher than that for Pinus radiata but lower than that for Pinus pinaster. For TVha, the forest plantations yielded an R2 of 0.49, also between the values for the individual species. Likewise, the bias and RMSE of the plantation models tended to be intermediate, providing more stable predictions than the species-specific models.
When we compare the results of this study with those of similar studies on tree height or forest stock (m³/ha), our results are in line with those reported by [19] (R2 of 0.52 for total volume of E. globulus) and [39] (R2 of 0.58 for total volume). However, studies using similar EO datasets but different field data, such as [26] (R2 of 0.78 and RMSE (%) of 23% for total volume of E. globulus) and [24] (R2 of 0.65–0.74 for total volume), obtained better results than this study. This highlights the influence of field data on model development, in which the range of the target variable and the shape and number of plots play a crucial role. For example, total volume in the plots of this study ranges from 0.68 to around 700 m³/ha, whereas in [24] it ranges from 12 to 253 m³/ha.
It should also be noted that this study used analysis-ready data (ARD) sources, which allows the analysis to be conducted on cloud computing platforms without additional image processing. Although precise topographic corrections have been shown to affect predictions [61,65], the trade-off between improving these corrections and the time and resources required should be assessed.
Moreover, an approach focused on forest plantations as a whole enables the development of more stable models. The larger training dataset and the greater variability in the data are considered the main reasons for this improved stability. Integrating data from different species introduces greater diversity in spectral and growth characteristics, which allows the model to generalize better across conditions, reduces overfitting to a particular species, and yields more consistent predictions. Such an approach could even accommodate mixed plantation plots, although these were not used in this study.
Plots of reference vs. predicted values for the final models are shown in Figure 5 and Figure 6.
Figure 7 shows the application of one of the generated models. Comparison with the high-resolution image shows that the model-generated predictions follow the spatial patterns of the study area. However, as stated by [61], maps derived from spatially continuous variables should be clipped with accurate land cover classification maps to precisely delineate the area occupied by forest plantations on the ground.
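The clipping step amounts to masking the prediction grid with the land cover grid. The sketch below assumes both grids are already aligned on the same pixel grid and uses plain nested lists to stay self-contained; the class codes, the use of `None` as nodata, and the example values are hypothetical and do not correspond to any specific land cover product.

```python
NODATA = None  # hypothetical nodata marker for pixels outside plantations


def clip_to_land_cover(prediction, land_cover, plantation_classes):
    """Keep predictions only where the land cover map marks a forest
    plantation; every other pixel becomes NODATA.

    prediction: 2-D list of predicted values (e.g. TVha or H0)
    land_cover: 2-D list of class codes on the same, aligned grid
    plantation_classes: set of class codes treated as forest plantation
    """
    if len(prediction) != len(land_cover):
        raise ValueError("grids must have the same shape")
    clipped = []
    for pred_row, lc_row in zip(prediction, land_cover):
        if len(pred_row) != len(lc_row):
            raise ValueError("grids must have the same shape")
        clipped.append([
            value if code in plantation_classes else NODATA
            for value, code in zip(pred_row, lc_row)
        ])
    return clipped


# Hypothetical 2 x 2 example: codes 1 and 2 are plantations, 3 is not.
pred_map = [[120.0, 85.5], [40.2, 210.0]]
lc_map = [[1, 3], [2, 1]]
print(clip_to_land_cover(pred_map, lc_map, {1, 2}))
```

With real rasters the same logic would typically be applied with array masking after reprojecting both layers to a common grid.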

3.4. Limitations and Further Studies

The main limitation of this study is the temporal harmonization of the data: TanDEM-X data are only available as derived products for 2018–2019, highlighting the need for annual mosaics to improve forest height prediction. It should also be remembered that the models depend heavily on the field reference datasets, as discussed above; the findings of this study therefore cannot be directly generalized to other regions.
At least three aspects need further research and could substantially improve the approaches tested in this study: the development of models for mixed forest plantations, the use of L-band repeat-pass interferometry, and the development of deep learning methodologies. The current study provides a benchmark for evaluating the improvements achievable with additional datasets and deep learning models. Other future lines of research include combining several types of sensors with deep learning techniques [66] and analyzing time series [67,68].
This article is part of a broader study on prediction with deep learning algorithms, and its results will serve as a baseline for comparing models developed with deep learning and with traditional machine learning algorithms.

4. Conclusions

The results of this study indicate that models combining optical data with imaging radar data acquired by the Sentinel-1, ALOS-2 PALSAR-2, and TanDEM-X satellites generally perform best in terms of R2, bias, and RMSE, outperforming models that use data from only one type of sensor. Terrain variables contribute little to model improvement. The addition of climate variables does not appear to improve prediction accuracy substantially, although it does help refine the predictions in some cases. Models enabling wall-to-wall mapping perform slightly worse than the best optical + radar combinations but yield acceptable results for operational purposes.
The best machine learning (ML) algorithms were LightGBM for dominant height (H0) and XGBoost for TVha, specifically for the study area and the forest stands analyzed, namely Pinus pinaster, Pinus radiata, and Eucalyptus globulus. The variable reduction analysis suggests that only the kNN method benefited from optimized feature reduction.
The models developed for all forest plantations combined were more stable than the species-specific models, although the best results, obtained for Pinus pinaster, were probably due to the stability of these stands over time. Moreover, generic plantation models could be useful for predicting volume in mixed plantations of these species in the study area.
The temporal harmonization of the data is a limitation, as TanDEM-X data are only available as derived products for 2018–2019, highlighting the need for annual mosaics to improve forest height prediction. Future research should focus on mixed forest plantations, L-band repeat-pass interferometry, and deep learning methodologies, with this article serving as a baseline for comparing deep learning and traditional machine learning models.

Author Contributions

Conceptualization, J.M., O.A., and I.T.-M.; methodology, I.T.-M. and O.A.; software, I.T.-M. and O.A.; resources, J.M. and C.A.L.-S.; data curation, O.A. and I.T.-M.; writing—original draft preparation, I.T.-M.; writing—review and editing, C.A.L.-S., M.B.-A., J.M., and O.A.; supervision, C.A.L.-S. and J.M.; funding acquisition, C.A.L.-S., J.M., and M.B.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the research project PID2020-112839RB-I00, funded by the Spanish State Research Agency (AEI) of the Ministry of Science and Innovation (MCIN/AEI/10.13039/501100011033). The work of O.A. and J.M. was supported by the European Space Agency (ESA), contract 4000135015/21/I-NB, Forest Carbon Monitoring, under the EOEP5 program. This work was carried out while the first author was on a research stay at VTT funded by the Government of the Principality of Asturias through its Grants for Short Stays at Research Centers (code EB24-26). While undertaking the present study, the first author held a Severo Ochoa Fellowship from the Government of Asturias (ref. BP21-125).

Data Availability Statement

The databases used are all derived from public sources, as mentioned in the text. However, the authors may share the data upon a reasonable and justified request to the first author.

Acknowledgments

The first author is grateful to VTT and its representatives for the opportunity to carry out this stay.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary Data

Appendix A.1. Software

We used Forest-TEP to visualize and download Sentinel-2 images. Python 3.11.4 was used for data analysis in Jupyter Notebook on a local server, with the following libraries: pandas, sklearn.model_selection, sklearn.pipeline, sklearn.linear_model, sklearn.neighbors, sklearn.ensemble, lightgbm, xgboost, mlxtend.feature_selection, numpy, sklearn.metrics, and matplotlib.pyplot. For map visualization, we used QGIS 3.34.14, and for processing the terrain variables, we used ArcGIS.
Table A1. Spectral index variables used in this study.
| Spectral Index | Equation / Description |
| --- | --- |
| Anthocyanin Reflectance Index (ARI) | $\mathrm{ARI} = \frac{1}{B_3} - \frac{1}{B_5}$ |
| Chlorophyll Red-Edge (CRE) | $\mathrm{CRE} = \frac{B_{8A}}{B_5} - 1$ |
| Enhanced Vegetation Index (EVI) | $\mathrm{EVI} = 2.5 \cdot \frac{B_8 - B_4}{B_8 + 6 \cdot B_4 - 7.5 \cdot B_2 + 1}$ |
| Enhanced Vegetation Index 2 (EVI2) | $\mathrm{EVI2} = 2.5 \cdot \frac{B_8 - B_4}{B_8 + B_4 + 1}$ |
| Green Normalized Difference Vegetation Index (GNDVI) | $\mathrm{GNDVI} = \frac{B_8 - B_3}{B_8 + B_3}$ |
| Modified Anthocyanin Reflectance Index (MARI) | MARI = B 8 B 4 B 5 |
| Modified Chlorophyll Absorption in Reflectance Index (MCARI) | MCARI = B 5 B 4 B 5 B 4 B 3 |
| Modified Soil-Adjusted Vegetation Index (MSAVI) | $\mathrm{MSAVI} = \frac{2 \cdot B_8 + 1 - \sqrt{(2 \cdot B_8 + 1)^2 - 8 \cdot (B_8 - B_4)}}{2}$ |
| Moisture Stress Index (MSI) | $\mathrm{MSI} = \frac{B_{11}}{B_8}$ |
| Normalized Burn Ratio (NBR) | $\mathrm{NBR} = \frac{B_8 - B_{12}}{B_8 + B_{12}}$ |
| Normalized Burn Ratio 2 (NBR2) | $\mathrm{NBR2} = \frac{B_{11} - B_{12}}{B_{11} + B_{12}}$ |
| Normalized Difference Moisture Index (NDMI) | $\mathrm{NDMI} = \frac{B_8 - B_{11}}{B_8 + B_{11}}$ |
| Normalized Difference Vegetation Index (NDVI) | $\mathrm{NDVI} = \frac{B_8 - B_4}{B_8 + B_4}$ |
| Pigment-Specific Simple Ratio (PSSR) | $\mathrm{PSSR} = \frac{B_8}{B_4}$ |
| Soil-Adjusted Vegetation Index (SAVI) | $\mathrm{SAVI} = \frac{(B_8 - B_4) \cdot (1 + L)}{B_8 + B_4 + L}$, with $L = 0.5$ |
| Tasseled Cap Brightness (TCB) | $\mathrm{TCB} = 0.3029 \cdot B_2 + 0.2786 \cdot B_3 + 0.4733 \cdot B_4 + 0.5599 \cdot B_8 + 0.508 \cdot B_{11} + 0.1872 \cdot B_{12}$ |
| Tasseled Cap Greenness (TCG) | $\mathrm{TCG} = -0.2941 \cdot B_2 - 0.243 \cdot B_3 - 0.5424 \cdot B_4 + 0.7276 \cdot B_8 + 0.0713 \cdot B_{11} - 0.1608 \cdot B_{12}$ |
| Tasseled Cap Wetness (TCW) | $\mathrm{TCW} = 0.1511 \cdot B_2 + 0.1973 \cdot B_3 + 0.3283 \cdot B_4 + 0.3407 \cdot B_8 - 0.7117 \cdot B_{11} - 0.4559 \cdot B_{12}$ |
| Tasseled Cap Angle (TCA) | $\mathrm{TCA} = \tan^{-1}\left(\frac{\mathrm{TCG}}{\mathrm{TCB}}\right)$ |
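Most of the Table A1 indices can be computed directly from per-pixel band values. The sketch below implements the indices whose expressions are legible in the table (MARI and MCARI are omitted because their expressions are not recoverable here). It assumes surface reflectance expressed as unitless fractions (e.g. 0.25 rather than a scaled integer such as 2500), which matters for the additive constants in EVI, EVI2, SAVI, and MSAVI; the sample band values are hypothetical.

```python
import math


def spectral_indices(b):
    """Compute a subset of the Table A1 indices for one pixel.

    b: dict with keys "B2", "B3", "B4", "B5", "B8", "B8A", "B11", "B12"
       holding Sentinel-2 surface reflectance as unitless fractions.
    """
    def nd(x, y):
        """Normalized difference (x - y) / (x + y)."""
        return (x - y) / (x + y)

    L = 0.5  # SAVI soil-adjustment factor, as in Table A1
    tcb = (0.3029 * b["B2"] + 0.2786 * b["B3"] + 0.4733 * b["B4"]
           + 0.5599 * b["B8"] + 0.508 * b["B11"] + 0.1872 * b["B12"])
    tcg = (-0.2941 * b["B2"] - 0.243 * b["B3"] - 0.5424 * b["B4"]
           + 0.7276 * b["B8"] + 0.0713 * b["B11"] - 0.1608 * b["B12"])
    tcw = (0.1511 * b["B2"] + 0.1973 * b["B3"] + 0.3283 * b["B4"]
           + 0.3407 * b["B8"] - 0.7117 * b["B11"] - 0.4559 * b["B12"])
    nir, red = b["B8"], b["B4"]
    return {
        "ARI": 1.0 / b["B3"] - 1.0 / b["B5"],
        "CRE": b["B8A"] / b["B5"] - 1.0,
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * b["B2"] + 1),
        "EVI2": 2.5 * (nir - red) / (nir + red + 1),
        "GNDVI": nd(nir, b["B3"]),
        "MSAVI": (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "MSI": b["B11"] / nir,
        "NBR": nd(nir, b["B12"]),
        "NBR2": nd(b["B11"], b["B12"]),
        "NDMI": nd(nir, b["B11"]),
        "NDVI": nd(nir, red),
        "PSSR": nir / red,
        "SAVI": (nir - red) * (1 + L) / (nir + red + L),
        "TCB": tcb,
        "TCG": tcg,
        "TCW": tcw,
        "TCA": math.atan(tcg / tcb),  # radians; TCB > 0 for valid reflectance
    }


# Hypothetical reflectances for a vegetated pixel.
pixel = {"B2": 0.03, "B3": 0.06, "B4": 0.05, "B5": 0.10,
         "B8": 0.35, "B8A": 0.37, "B11": 0.18, "B12": 0.09}
for name, value in spectral_indices(pixel).items():
    print(f"{name}: {value:.4f}")
```

Applying the same function over every pixel of an image stack yields the index layers used as predictors; in practice this would usually be vectorized with array operations.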
Table A2. Parameter selection for each model and variable.
| Model | Parameter | Values Tested | Selected (TVha) | Selected (H0) |
| --- | --- | --- | --- | --- |
| kNN | n_neighbors | 5, 7, 9, 11 | 11 | 11 |
| | weights | uniform, distance | distance | distance |
| | metric | minkowski | minkowski | minkowski |
| | p | 1, 2 | 2 | 2 |
| Random Forest | n_estimators | 50, 100, 200, 300, 500 | 200 | 300 |
| | max_depth | None, 5, 10, 15, 20, 30 | None | None |
| | min_samples_split | 2, 5, 10, 20 | 2 | 2 |
| | min_samples_leaf | 1, 2, 4, 8 | 1 | 1 |
| | max_features | sqrt, log2, None | sqrt | sqrt |
| LightGBM | num_leaves | 20, 31, 50, 70, 100 | 20 | 20 |
| | learning_rate | 0.01, 0.05, 0.1, 0.2 | 0.05 | 0.1 |
| | n_estimators | 50, 100, 200, 300 | 100 | 50 |
| | boosting_type | gbdt, dart, goss | gbdt | gbdt |
| | max_depth | −1, 5, 10, 15 | 5 | −1 |
| XGBoost | n_estimators | 50, 100, 200, 300 | 100 | 100 |
| | max_depth | 3, 5, 7, 9, 11 | 5 | 3 |
| | learning_rate | 0.01, 0.05, 0.1, 0.2 | 0.05 | 0.1 |
| | subsample | 0.6, 0.8, 1.0 | 0.8 | 0.6 |
| | colsample_bytree | 0.6, 0.8, 1.0 | 1.0 | 1.0 |
| MLR | fit_intercept | True, False | True | True |

References

1. European Commission. EU Biodiversity Strategy for 2030; European Commission: Brussels, Belgium, 2020.
2. European Commission. Union Certification Framework for Permanent Carbon Removals, Carbon Farming and Carbon Storage in Products; European Commission: Brussels, Belgium, 2024.
3. European Commission. Proposal for a Regulation on a Forest Monitoring Framework—European Commission; European Commission: Brussels, Belgium, 2024.
4. MITECO. Anuario de Estadistica Forestal, Avance; MITECO: Madrid, Spain, 2024.
5. Fassnacht, F.E.; White, J.C.; Wulder, M.A.; Næsset, E. Remote sensing in forestry: Current challenges, considerations and directions. For. Int. J. For. Res. 2024, 97, 11–37.
6. Wulder, M. Remote Sensing of Forest Environments; Springer: Berlin/Heidelberg, Germany, 2003.
7. Gómez, C.; Alejandro, P.; Hermosilla, T.; Montes, F.; Pascual, C.; Ángel Ruiz, L.; Álvarez Taboada, F.; Tanase, M.A.; Valbuena, R. Remote sensing for the Spanish forests in the 21st century: A review of advances, needs, and opportunities. For. Syst. 2019, 28, eR001.
8. Antropov, O.; Rauste, Y.; Ahola, H.; Hame, T. Stand-level stem volume of boreal forests from spaceborne SAR imagery at L-Band. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 35–44.
9. Cartus, O.; Santoro, M. Exploring combinations of multi-temporal and multi-frequency radar backscatter observations to estimate above-ground biomass of tropical forest. Remote Sens. Environ. 2019, 232, 111313.
10. Rodríguez-Veiga, P.; Quegan, S.; Carreiras, J.; Persson, H.J.; Fransson, J.E.; Hoscilo, A.; Ziółkowski, D.; Stereńczak, K.; Lohberger, S.; Stängel, M.; et al. Forest biomass retrieval approaches from earth observation in different biomes. Int. J. Appl. Earth Obs. Geoinf. 2019, 77, 53–68.
11. Ge, S.; Tomppo, E.; Rauste, Y.; McRoberts, R.E.; Praks, J.; Gu, H.; Su, W.; Antropov, O. Sentinel-1 time series for predicting growing stock volume of boreal forest: Multitemporal analysis and feature selection. Remote Sens. 2023, 15, 3489.
12. Santoro, M.; Cartus, O.; Antropov, O.; Miettinen, J. Estimation of forest growing stock volume with synthetic aperture radar: A comparison of model-fitting Methods. Remote Sens. 2024, 16, 4079.
13. Novo-Fernández, A.; Barrio-Anta, M.; Recondo, C.; Cámara-Obregón, A.; López-Sánchez, C.A. Integration of National Forest Inventory and Nationwide Airborne Laser Scanning Data to Improve Forest Yield Predictions in North-Western Spain. Remote Sens. 2019, 11, 1693.
14. Giménez, M.H.; López-Martinez, C.; Antropov, O.; Lopez-Sanchez, J.M. Role of temporal decorrelation in C-Band SAR interferometry over boreal and temperate Forests. In Proceedings of the IGARSS 2024—2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; pp. 4230–4234.
15. Cartus, O.; Santoro, M.; Wegmüller, U.; Labrière, N.; Chave, J. Sentinel-1 coherence for mapping above-ground biomass in semiarid forest areas. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
16. Krieger, G.; Moreira, A.; Fiedler, H.; Hajnsek, I.; Werner, M.; Younis, M.; Zink, M. TanDEM-X: A Satellite Formation for High-Resolution SAR Interferometry. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3317–3341.
17. Kugler, F.; Schulze, D.; Hajnsek, I.; Pretzsch, H.; Papathanassiou, K.P. TanDEM-X Pol-InSAR Performance for Forest Height Estimation. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6404–6422.
18. Bispo, P.D.C.; Pardini, M.; Papathanassiou, K.P.; Kugler, F.; Balzter, H.; Rains, D.; dos Santos, J.R.; Rizaev, I.G.; Tansey, K.; dos Santos, M.N.; et al. Mapping forest successional stages in the Brazilian Amazon using forest heights derived from TanDEM-X SAR interferometry. Remote Sens. Environ. 2019, 232, 111194.
19. dos Reis, A.A.; Carvalho, M.C.; de Mello, J.M.; Gomide, L.R.; Ferraz Filho, A.C.; Acerbi Junior, F.W. Spatial prediction of basal area and volume in Eucalyptus stands using Landsat TM data: An assessment of prediction methods. N. Z. J. For. Sci. 2018, 48, 1–17.
20. Rahimzadeh-Bajgiran, P.; Hennigar, C.; Weiskittel, A.; Lamb, S. Forest potential productivity mapping by linking remote-sensing-derived metrics to site variables. Remote Sens. 2020, 12, 2056.
21. Immitzer, M.; Vuolo, F.; Atzberger, C. First experience with Sentinel-2 data for crop and tree species classifications in central Europe. Remote Sens. 2016, 8, 166.
22. Keskes, M.I.; Mohamed, A.H.; Borz, S.A.; Niţă, M.D. Improving National Forest Mapping in Romania Using Machine Learning and Sentinel-2 Multispectral Imagery. Remote Sens. 2025, 17, 715.
23. Han, H.; Wan, R.; Li, B. Estimating forest aboveground biomass using Gaofen-1 images, Sentinel-1 images, and machine learning algorithms: A case study of the Dabie Mountain Region, China. Remote Sens. 2021, 14, 176.
24. Liu, Y.; Gong, W.; Xing, Y.; Hu, X.; Gong, J. Estimation of the forest stand mean height and aboveground biomass in Northeast China using SAR Sentinel-1B, multispectral Sentinel-2A, and DEM imagery. ISPRS J. Photogramm. Remote Sens. 2019, 151, 277–289.
25. Latifi, H.; Nothdurft, A.; Koch, B. Non-parametric prediction and mapping of standing timber volume and biomass in a temperate forest: Application of multiple optical/LiDAR-derived predictors. Forestry 2010, 83, 395–407.
26. Dube, T.; Mutanga, O.; Abdel-Rahman, E.M.; Ismail, R.; Slotow, R. Predicting Eucalyptus spp. stand volume in Zululand, South Africa: An analysis using a stochastic gradient boosting regression ensemble with multi-source data sets. Int. J. Remote Sens. 2015, 36, 3751–3772.
27. Jiang, F.; Kutia, M.; Ma, K.; Chen, S.; Long, J.; Sun, H. Estimating the aboveground biomass of coniferous forest in Northeast China using spectral variables, land surface temperature and soil moisture. Sci. Total Environ. 2021, 785, 147335.
28. Wu, H.; Wu, F.; Cai, Y.; Li, Z. Assessing the spatiotemporal impacts of land use change on ecological environmental quality using a regionalized territorial impact assessment framework. Sustain. Cities Soc. 2024, 112, 105623.
29. Wang, X.; Zhang, Y.; Atkinson, P.M.; Yao, H. Predicting soil organic carbon content in Spain by combining Landsat TM and ALOS PALSAR images. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102182.
30. Li, F.; Yigitcanlar, T.; Nepal, M.; Nguyen, K.; Dur, F.; Li, W. Mapping heat vulnerability in Australian capital cities: A machine learning and multi-source data analysis. Sustain. Cities Soc. 2025, 119, 106079.
31. Alonso, L.; Picos, J.; Armesto, J. Automatic Identification of Forest Disturbance Drivers Based on Their Geometric Pattern in Atlantic Forests. Remote Sens. 2022, 14, 697.
32. Hermosilla, T.; Wulder, M.A.; White, J.C.; Coops, N.C.; Hobart, G.W. Regional detection, characterization, and attribution of annual forest change from 1984 to 2012 using Landsat-derived time-series metrics. Remote Sens. Environ. 2015, 170, 121–132.
33. Thapa, B.; Lovell, S.; Wilson, J. Remote sensing and machine learning applications for aboveground biomass estimation in agroforestry systems: A review. Agrofor. Syst. 2023, 97, 1097–1111.
34. Wulder, M.A.; Coops, N.C.; Roy, D.P.; White, J.C.; Hermosilla, T. Land cover 2.0. Int. J. Remote Sens. 2018, 39, 4254–4284.
35. Alshari, E.A.; Gawali, B.W. Analysis of Machine Learning Techniques for Sentinel-2A Satellite Images. J. Electr. Comput. Eng. 2022, 2022, 9092299.
36. Thanh Noi, P.; Kappas, M. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery. Sensors 2018, 18, 18.
37. Yu, J.W.; Yoon, Y.W.; Baek, W.K.; Jung, H.S. Forest vertical structure mapping using two-seasonal optic images and LiDAR DSM acquired from UAV platform through random forest, XGBoost, and support vector machine approaches. Remote Sens. 2021, 13, 4282.
38. Liu, Z.; Ye, Z.; Xu, X.; Lin, H.; Zhang, T.; Long, J. Mapping forest stock volume based on growth characteristics of crown using multi-temporal Landsat 8 OLI and ZY-3 stereo images in planted eucalyptus forest. Remote Sens. 2022, 14, 5082.
39. Hu, Y.; Xu, X.; Wu, F.; Sun, Z.; Xia, H.; Meng, Q.; Huang, W.; Zhou, H.; Gao, J.; Li, W.; et al. Estimating forest stock volume in Hunan Province, China, by integrating in situ plot data, Sentinel-2 images, and linear and machine learning regression models. Remote Sens. 2020, 12, 186.
40. Ahmadi, K.; Kalantar, B.; Saeidi, V.; Harandi, E.K.G.; Janizadeh, S.; Ueda, N. Comparison of Machine Learning Methods for Mapping the Stand Characteristics of Temperate Forests Using Multi-Spectral Sentinel-2 Data. Remote Sens. 2020, 12, 3019.
41. Esteban, J.; McRoberts, R.E.; Fernández-Landa, A.; Tomé, J.L.; Naesset, E. Estimating Forest Volume and Biomass and Their Changes Using Random Forests and Remotely Sensed Data. Remote Sens. 2019, 11, 1944.
42. Guerra-Hernández, J.; Narine, L.L.; Pascual, A.; Gonzalez-Ferreiro, E.; Botequim, B.; Malambo, L.; Neuenschwander, A.; Popescu, S.C.; Godinho, S. Aboveground biomass mapping by integrating ICESat-2, SENTINEL-1, SENTINEL-2, ALOS2/PALSAR2, and topographic information in Mediterranean forests. GIScience Remote Sens. 2022, 59, 1509–1533.
43. Gómez, C.; Wulder, M.A.; Montes, F.; Delgado, J.A. Modeling Forest Structural Parameters in the Mediterranean Pines of Central Spain using QuickBird-2 Imagery and Classification and Regression Tree Analysis (CART). Remote Sens. 2012, 4, 135–159.
44. He, W.; Zhu, J.; Lopez-Sanchez, J.M.; Gómez, C.; Fu, H.; Xie, Q. Forest Height Inversion by Combining Single-Baseline TanDEM-X InSAR Data with External DTM Data. Remote Sens. 2023, 15, 5517.
45. Montgomery, D.; Peck, E.; Vining, G. Introduction to Linear Regression Analysis, 5th ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012; Volume 81, pp. 318–319.
46. Cover, T.M.; Hart, P.E. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
47. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
48. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
49. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
50. Alberdi, I.; Cañellas, I.; Bombín, R.V. The Spanish National Forest Inventory: History, development, challenges and perspectives. Pesqui. Florest. Bras. 2017, 37, 361.
51. Rivas-Martínez, S. Global Bioclimatics. Clasif. Bioclimática Tierra 2004, 16, 1–23.
52. Kellndorfer, J.; Cartus, O.; Lavalle, M.; Magnard, C.; Milillo, P.; Oveisgharan, S.; Osmanoglu, B.; Rosen, P.; Wegmüller, U. Global seasonal Sentinel-1 interferometric coherence and backscatter data set. Sci. Data 2022, 9, 73.
53. Antropov, O.; Rauste, Y.; Häme, T.; Praks, J. Polarimetric ALOS PALSAR time series in mapping biomass of boreal forests. Remote Sens. 2017, 9, 999.
54. Richards, J.A.; Sun, G.Q.; Simonett, D.S. L-Band Radar Backscatter Modeling of Forest Stands. IEEE Trans. Geosci. Remote Sens. 1987, GE-25, 487–498.
55. Ye, Z.; Long, J.; Zhang, T.; Lin, B.; Lin, H. L-Band Synthetic Aperture Radar and Its Application for Forest Parameter Estimation, 1972 to 2024: A Review. Plants 2024, 13, 2511.
56. Treuhaft, R.; Goncalves, F.; Santos, J.R.D.; Keller, M.; Palace, M.; Madsen, S.N.; Sullivan, F.; Graca, P.M. Tropical-forest biomass estimation at X-band from the spaceborne tandem-X interferometer. IEEE Geosci. Remote Sens. Lett. 2015, 12, 239–243.
57. Olesk, A.; Praks, J.; Antropov, O.; Zalite, K.; Arumäe, T.; Voormansik, K. Interferometric SAR Coherence Models for Characterization of Hemiboreal Forests Using TanDEM-X Data. Remote Sens. 2016, 8, 700.
58. Schlund, M.; Kotowska, M.M.; Brambach, F.; Hein, J.; Wessel, B.; Camarretta, N.; Silalahi, M.; Jaya, I.N.S.; Erasmi, S.; Leuschner, C.; et al. Spaceborne height models reveal above ground biomass changes in tropical landscapes. For. Ecol. Manag. 2021, 497, 119497.
59. Ninyerola, M.; i Fernández, X.P.; Roure, J.M. Atlas Climático Digital de la Península Ibérica. Metodología y Aplicaciones en Bioclimatología y Geobotánica; Universitat Autònoma de Barcelona: Bellaterra, Spain, 2005; ISBN 932860-8-7.
60. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
61. Novo-Fernández, A.; López-Sánchez, C.A.; Cámara-Obregón, A.; Barrio-Anta, M.; Teijido-Murias, I. Estimating Forest Variables for Major Commercial Timber Plantations in Northern Spain Using Sentinel-2 and Ancillary Data. Forests 2024, 15, 99.
62. Huong, N.T.T.; Quynh, C.T.N.; Dinh, N.D.; Hoai, C.T.; Hang, P.T.; Bao, H.D.; Son, N.T.; Dan, L.Q.; Anh, P.T. Estimating tropical forest stand volume using Sentinel-2A imagery. In Proceedings of the 2021 2nd International Conference on Intelligent Data Science Technologies and Applications, IDSTA 2021, Tartu, Estonia, 15–17 November 2021; Institute of Electrical and Electronics Engineers Inc.: Piscataway Township, NJ, USA, 2021; pp. 130–137.
63. Astola, H.; Häme, T.; Sirro, L.; Molinier, M.; Kilpi, J. Comparison of Sentinel-2 and Landsat 8 imagery for forest variable prediction in boreal region. Remote Sens. Environ. 2019, 223, 257–273.
64. Diéguez-Aranda, U.; Rojo-Alboreca, A.; Castedo-Dorado, F.; Álvarez González, J.; Anta, M.; Crecente-Campo, F.; González-González, J.M.; Pérez Cruzado, C.; Rodríguez-Soalleiro, R.; López-Sánchez, C.; et al. Herramientas selvícolas para la gestión forestal sostenible en Galicia. Forestry 2009, 82, 1–16.
65. Teijido-Murias, I.; Barrio-Anta, M.; López-Sánchez, C.A. Evaluation of Correction Algorithms for Sentinel-2 Images Implemented in Google Earth Engine for Use in Land Cover Classification in Northern Spain. Forests 2024, 15, 2192.
66. Lahssini, K.; Teste, F.; Dayal, K.R.; Durrieu, S.; Ienco, D.; Monnet, J.M. Combining LiDAR Metrics and Sentinel-2 Imagery to Estimate Basal Area and Wood Volume in Complex Forest Environment via Neural Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4337–4348.
67. Liu, P.; Ren, C.; Wang, Z.; Jia, M.; Yu, W.; Ren, H.; Xia, C. Evaluating the Potential of Sentinel-2 Time Series Imagery and Machine Learning for Tree Species Classification in a Mountainous Forest. Remote Sens. 2024, 16, 293.
68. Li, Q.; Lin, H.; Long, J.; Liu, Z.; Ye, Z.; Zheng, H.; Yang, P. Mapping Forest Stock Volume Using Phenological Features Derived from Time-Serial Sentinel-2 Imagery in Planted Larch. Forests 2024, 15, 995.
Figure 1. Plots used for the SFNI-4.5 (a), Sentinel-2 granules used (b), and region of interest (ROI) in Spain (Europe) (c); coordinate reference system EPSG:32629.
Figure 2. Logical workflow of the study.
Figure 3. Layover and shadow effects in the region of interest.
Figure 4. Feature selection: relative importance of each independent variable in relation to the modeled dependent variable.
Figure 5. Observed vs. predicted total volume per hectare for the XGBoost model. Each blue dot is an observed/predicted value pair, and the red line is the 1:1 line.
Figure 6. Observed vs. predicted dominant height for the LightGBM model. Each blue dot is an observed/predicted value pair, and the red line is the 1:1 line.
Figure 7. Wall-to-wall map of total volume ($TV_{ha}$) obtained with the XGBoost algorithm at 10 m resolution (a), Sentinel-2 image used as an independent variable (10 m) (b), and high-resolution orthophoto (0.25 m) from 2017 (c), for a subset of the study region.
Table 1. Sentinel-2 satellite data acquisition details.
| Satellite/Granule | Acquisition Date | Solar Zenith (°) | Solar Azimuth (°) |
|---|---|---|---|
| S2A/29TMH | 11 August 2018 | 30.86 | 148.82 |
| S2A/29TNG | 19 June 2018 | 22.94 | 138.83 |
| S2A/29TNH | 11 August 2018 | 30.42 | 150.94 |
| S2A/29TNJ | 11 August 2018 | 31.22 | 151.58 |
| S2A/29TPG | 19 June 2018 | 22.36 | 141.25 |
| S2B/29TPH | 14 June 2018 | 23.10 | 143.23 |
| S2B/29TPJ | 24 June 2018 | 23.95 | 143.16 |
| S2B/29TQH | 24 June 2018 | 22.67 | 144.43 |
| S2B/29TQJ | 24 June 2018 | 23.41 | 145.61 |
| S2A/30TUN | 5 August 2018 | 29.46 | 146.64 |
| S2A/30TUP | 5 August 2018 | 30.25 | 147.34 |
| S2A/30TVN | 5 August 2018 | 29.00 | 148.81 |
| S2A/30TVP | 5 August 2018 | 29.80 | 149.51 |
| S2B/30TWN | 27 August 2018 | 35.52 | 153.22 |
| S2B/30TWP | 27 August 2018 | 36.34 | 153.70 |
Table 2. Sentinel-1 features used in this study.
| Feature Name | Description |
|---|---|
| vh_AMP | Mean backscatter ($\gamma^0$) for VH polarization |
| vv_AMP | Mean backscatter ($\gamma^0$) for VV polarization |
| vh_COH06 | Median 6-day repeat coherence estimates for C-band VV polarized data |
| inc | Local incidence angle |
| lsmap | Layover/shadow regions: 1 = no shadow or layover, 5 = layover, 17 = shadow, 21 = shadow in layover |
Table 3. ALOS-2 PALSAR-2 features used in this study.
| Feature Name | Description |
|---|---|
| hh_F02DAR | Normalized radar backscattering coefficient ($\gamma^0$) for HH polarization |
| hv_F02DAR | Normalized radar backscattering coefficient ($\gamma^0$) for HV polarization |
| linc_F02DAR | Local incidence angle image |
| lsmask_F02DAR | Processing mask information image: 0 = no data, 50 = ocean and water, 100 = layover, 150 = shadowing, 255 = land |
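Tables 2 and 3 encode radar geometry artefacts as integer codes. As a minimal sketch, the lookup below decodes both masks and applies a hypothetical screening rule (`is_clear_radar_pixel`: keep a pixel only when Sentinel-1 reports no shadow or layover and PALSAR-2 reports land). The code values are copied from the tables; the function names and the rule itself are illustrative assumptions, not necessarily the outlier criterion applied in the study.

```python
# Code tables transcribed from Table 2 (Sentinel-1 "lsmap") and Table 3 (PALSAR-2 "lsmask").
S1_LSMAP = {
    1: "no shadow or layover",
    5: "layover",
    17: "shadow",
    21: "shadow in layover",
}

PALSAR2_LSMASK = {
    0: "no data",
    50: "ocean and water",
    100: "layover",
    150: "shadowing",
    255: "land",
}


def describe(lsmap: int, lsmask: int) -> tuple[str, str]:
    """Translate one pixel's two mask codes into readable labels."""
    return (
        S1_LSMAP.get(lsmap, "unknown code"),
        PALSAR2_LSMASK.get(lsmask, "unknown code"),
    )


def is_clear_radar_pixel(lsmap: int, lsmask: int) -> bool:
    """Hypothetical rule: keep a pixel only if Sentinel-1 flags no shadow or
    layover (lsmap == 1) and PALSAR-2 flags land (lsmask == 255)."""
    return lsmap == 1 and lsmask == 255


# Hypothetical plot pixels given as (lsmap, lsmask) pairs.
plots = {"plot_a": (1, 255), "plot_b": (5, 255), "plot_c": (1, 150)}
for name, (lsmap, lsmask) in plots.items():
    print(name, describe(lsmap, lsmask), is_clear_radar_pixel(lsmap, lsmask))
```

Keeping the code tables as plain dictionaries makes unknown values explicit ("unknown code") instead of silently treating them as valid.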
Table 4. Other variables used in this study.
| Feature Name | Description |
|---|---|
| mtri | T mean (the average daily temperature over a year) |
| mxri | T max (the average of the daily maximum temperatures over a year) |
| mnri | T min (the average of the daily minimum temperatures over a year) |
| plrs | Precipitation (the accumulated precipitation per pixel over a year) |
| radge | Radiation (the accumulated radiation per pixel over a year) |
| X_25830 | X coordinate in EPSG:25830 |
| Y_25830 | Y coordinate in EPSG:25830 |
| asp | Aspect (the orientation of the slope) |
| asr | Aspect/slope ratio (the relationship between aspect and slope) |
| cu | Curvature (the concavity or convexity of the terrain) |
| elv | Elevation (the height above sea level) |
| hli | Heat load index (an index representing the potential heat load) |
| plc | Plan curvature (the curvature of the terrain in a horizontal plane) |
| pfc | Profile curvature (the curvature of the terrain in a vertical plane) |
| slp | Slope (the steepness of the terrain) |
| tsi | Terrain shape index (a measure of terrain shape) |
| wi | Wetness index (an index representing soil moisture conditions) |
Table 5. Criteria and number of outliers for the different data sources.
SourceCriterionn Outliers
S2 band QABand_QA = 452
Radar shadow/layoverlsmap ! = 0 & lsmask ! = 25541
TanDEM-X temporalHtdx_first >= 0100
Table 6. Model performance metrics for Sentinel-2 and Sentinel-2 + TanDEM-X datasets.
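| Dataset | Model | $R^2$ | Bias | RMSE | RMSE (%) |
|---|---|---|---|---|---|
| Sentinel-2 | kNN | 0.25 | −2.87 | 132.93 | 67 |
| | Random Forest | 0.26 | 1.39 | 132.14 | 66 |
| | LightGBM | 0.27 | 2.33 | 131.42 | 66 |
| | XGBoost | 0.27 | −1.06 | 131.68 | 66 |
| | Linear Regression | 0.24 | −0.20 | 133.54 | 67 |
| Sentinel-2 + TanDEM-X | kNN | 0.40 | −8.33 | 118.76 | 60 |
| | Random Forest | 0.43 | −0.06 | 116.15 | 58 |
| | LightGBM | 0.44 | 2.99 | 114.46 | 57 |
| | XGBoost | 0.44 | −0.87 | 113.89 | 57 |
| | Linear Regression | 0.44 | −0.40 | 114.43 | 57 |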
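The tables in this section report $R^2$, bias, bias (%), RMSE, and RMSE (%). Their exact formulas are not given in this part of the article, so the sketch below uses common conventions as stated assumptions: bias as the mean of observed minus predicted values, relative metrics normalized by the observed mean, and $R^2$ as one minus the ratio of the residual to the total sum of squares. The sample volumes are invented for illustration.

```python
from math import sqrt
from statistics import fmean


def regression_metrics(observed: list[float], predicted: list[float]) -> dict[str, float]:
    """Goodness-of-fit statistics of the kind reported in Tables 6 to 11.

    Assumed conventions (not confirmed by the article):
      bias     = mean(observed - predicted)
      bias (%) = 100 * bias / mean(observed)
      RMSE (%) = 100 * RMSE / mean(observed)
      R2       = 1 - SS_res / SS_tot
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = fmean(observed)
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    bias = fmean(residuals)
    rmse = sqrt(ss_res / len(observed))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "bias": bias,
        "bias_pct": 100.0 * bias / mean_obs,
        "RMSE": rmse,
        "RMSE_pct": 100.0 * rmse / mean_obs,
    }


# Hypothetical plot-level total volumes, not data from the study.
obs = [120.0, 250.0, 310.0, 90.0, 180.0]
pred = [140.0, 230.0, 280.0, 110.0, 190.0]
print(regression_metrics(obs, pred))
```

If bias were instead defined as predicted minus observed, only the signs of `bias` and `bias_pct` would flip; $R^2$ and RMSE are unaffected by that choice.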
Table 7. Results of the multiple linear regression (MLR) model for different combinations of optical and radar data.
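| Model | MLR Total Volume | No. | $R^2$ | Bias | Bias (%) | RMSE | RMSE (%) |
|---|---|---|---|---|---|---|---|
| Step 1 | *Optical* | | | | | | |
| | SB | (1) | 0.24 | −0.20 | −8.11 | 133.54 | 67 |
| | SI | (2) | 0.25 | −0.16 | −9.07 | 131.67 | 66 |
| | SB + SI (1+2) | (3) | 0.26 | −0.04 | −8.82 | 132.04 | 66 |
| | *Radar* | | | | | | |
| | Sentinel-1 | (4) | 0.01 | −0.04 | −4.32 | 152.92 | 77 |
| | ALOS-2 PALSAR-2 | (5) | 0.07 | −0.20 | −1.46 | 148.25 | 74 |
| | TanDEM-X | (6) | 0.33 | −0.02 | −2.42 | 125.45 | 63 |
| | C + L bands (4+5) | (7) | 0.10 | −0.22 | −2.36 | 146.11 | 73 |
| | X + L bands (6+5) | (8) | 0.37 | −0.07 | −0.96 | 121.15 | 61 |
| | C + X bands (4+6) | (9) | 0.34 | 0.07 | −1.26 | 123.70 | 62 |
| | All radar (4+5+6) | (10) | 0.39 | 0.02 | −0.17 | 119.38 | 60 |
| Step 2 | *Best Optical + Radar* | | | | | | |
| | Best op + best rad (1+10) | (11) | 0.47 | −0.14 | −4.09 | 111.49 | 56 |
| | Terrain (11+terr) | (12) | 0.48 | −0.18 | −3.51 | 110.24 | 55 |
| | Climate (11+clim) | (13) | 0.46 | −0.10 | −3.75 | 111.98 | 56 |
| | Ter-clim (11+both) | (14) | 0.47 | −0.18 | −3.16 | 110.74 | 56 |
| | *Continuous maps* | | | | | | |
| | Optical + Radar (3+6) | (15) | 0.44 | −0.23 | −5.80 | 114.09 | 57 |
| | Terrain (15+terr) | (16) | 0.46 | −0.18 | −4.44 | 112.15 | 56 |
| | Climate (15+clim) | (17) | 0.43 | −0.21 | −5.25 | 114.56 | 57 |
| | Ter-clim (15+both) | (18) | 0.46 | −0.16 | −3.98 | 112.46 | 56 |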
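The model numbers in Table 7 are built cumulatively: (11) joins the feature sets of (1) and (10), and (12) to (14) add terrain and/or climate variables on top of (11). The sketch below reproduces that bookkeeping. Radar, terrain, and climate names come from Tables 2 to 4 (incidence angles, masks, and coordinates are left out by assumption), while the Sentinel-2 and TanDEM-X feature names are hypothetical placeholders.

```python
# Feature groups. Radar, terrain and climate names are taken from Tables 2-4;
# the Sentinel-2 names ("B2" ... "NDVI") and "tdx_height" are HYPOTHETICAL placeholders.
GROUPS = {
    "SB": ["B2", "B3", "B4", "B8"],        # hypothetical Sentinel-2 band names
    "SI": ["NDVI"],                        # hypothetical spectral index
    "S1": ["vh_AMP", "vv_AMP", "vh_COH06"],
    "PALSAR2": ["hh_F02DAR", "hv_F02DAR"],
    "TDX": ["tdx_height"],                 # hypothetical TanDEM-X feature
    "terr": ["asp", "asr", "cu", "elv", "hli", "plc", "pfc", "slp", "tsi", "wi"],
    "clim": ["mtri", "mxri", "mnri", "plrs", "radge"],
}

# Step 1 models of Table 7, expressed as lists of group keys.
MODELS = {
    1: ["SB"], 2: ["SI"], 3: ["SB", "SI"],
    4: ["S1"], 5: ["PALSAR2"], 6: ["TDX"],
    7: ["S1", "PALSAR2"], 8: ["TDX", "PALSAR2"], 9: ["S1", "TDX"],
    10: ["S1", "PALSAR2", "TDX"],
}
# Step 2 models reuse earlier ones, mirroring labels such as "(1+10)" and "(11+terr)".
MODELS[11] = MODELS[1] + MODELS[10]
MODELS[12] = MODELS[11] + ["terr"]
MODELS[13] = MODELS[11] + ["clim"]
MODELS[14] = MODELS[11] + ["terr", "clim"]
MODELS[15] = MODELS[3] + MODELS[6]
MODELS[16] = MODELS[15] + ["terr"]
MODELS[17] = MODELS[15] + ["clim"]
MODELS[18] = MODELS[15] + ["terr", "clim"]


def feature_set(model_id: int) -> list[str]:
    """Flatten a model's groups into an ordered list of unique feature names."""
    names: list[str] = []
    for group in MODELS[model_id]:
        for name in GROUPS[group]:
            if name not in names:
                names.append(name)
    return names


print(len(feature_set(14)), feature_set(15))
```

Defining later models in terms of earlier ones keeps the code aligned with the "(a+b)" labels, so changing a base group propagates to every model that uses it.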
Table 8. Statistics for the different models using all variables.
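| Feature | Statistic | kNN | Random Forest | LightGBM | XGBoost | MLR |
|---|---|---|---|---|---|---|
| $TV_{ha}$ | $R^2$ | 0.40 | 0.44 | 0.49 | 0.49 | 0.48 |
| | Bias | −9.67 | 3.47 | 0.31 | 0.42 | −0.18 |
| | Bias (%) | −7.88 | −2.43 | −1.47 | −0.13 | −3.51 |
| | RMSE | 119.10 | 115.52 | 108.92 | 108.93 | 110.24 |
| | RMSE (%) | 60 | 58 | 55 | 55 | 55 |
| $H_0$ | $R^2$ | 0.34 | 0.43 | 0.49 | 0.50 | 0.47 |
| | Bias | −0.37 | 0.05 | −0.02 | −0.03 | −0.01 |
| | Bias (%) | −1.57 | 1.09 | 1.66 | 1.55 | 1.26 |
| | RMSE | 5.65 | 5.28 | 4.96 | 4.94 | 5.09 |
| | RMSE (%) | 27 | 25 | 24 | 24 | 24 |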
Table 9. Performance of the five methods for total volume prediction as a function of the number of features.
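| Model | N Var | $R^2$ | Bias | Bias (%) | RMSE | RMSE (%) |
|---|---|---|---|---|---|---|
| MLR | 5 | 0.41 | −0.16 | −6.25 | 117.78 | 59 |
| | 13 | 0.47 | −0.15 | −4.06 | 111.08 | 56 |
| | 23 | 0.49 | −0.27 | −3.77 | 109.33 | 55 |
| kNN | 5 | 0.40 | −4.48 | −4.34 | 118.97 | 60 |
| | 13 | 0.44 | −7.73 | −6.10 | 114.74 | 58 |
| | 23 | 0.40 | −9.97 | −5.95 | 118.57 | 59 |
| RF | 5 | 0.42 | 0.21 | −3.43 | 116.45 | 58 |
| | 13 | 0.47 | 0.60 | −2.38 | 112.36 | 56 |
| | 23 | 0.46 | 1.15 | −3.29 | 112.87 | 57 |
| LGBM | 5 | 0.42 | 3.84 | −2.61 | 116.37 | 58 |
| | 13 | 0.47 | 3.72 | 0.76 | 111.13 | 56 |
| | 23 | 0.48 | 4.55 | −0.77 | 110.15 | 55 |
| XGBoost | 5 | 0.44 | 0.13 | −4.16 | 114.96 | 58 |
| | 13 | 0.49 | 0.04 | −1.34 | 109.74 | 55 |
| | 23 | 0.49 | −0.02 | −1.30 | 109.07 | 55 |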
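Tables 9 and 10 evaluate each method on subsets of 5, 13, and 23 predictors for total volume and 2, 7, and 16 predictors for dominant height ($H_0$). Assuming these are nested subsets taken from the top of an importance ranking such as the one in Figure 4, they could be generated as sketched below. The importance scores and the `nested_subsets` helper are hypothetical, and the model fitting and validation that would follow for each subset are not shown.

```python
def nested_subsets(importance: dict[str, float], sizes: list[int]) -> dict[int, list[str]]:
    """Return the top-k features for each k, ranked by descending importance.

    Ties are broken alphabetically so the ranking is deterministic. Because every
    subset is a prefix of the same ranking, smaller subsets are contained in larger ones.
    """
    ranked = sorted(importance, key=lambda name: (-importance[name], name))
    if any(k < 1 or k > len(ranked) for k in sizes):
        raise ValueError("each subset size must be between 1 and the number of features")
    return {k: ranked[:k] for k in sizes}


# Invented importance scores; the study's actual ranking is shown in Figure 4.
importance = {
    "tdx_height": 0.31,   # hypothetical TanDEM-X feature name
    "B8": 0.12,           # hypothetical Sentinel-2 band name
    "elv": 0.09,
    "hv_F02DAR": 0.07,
    "mtri": 0.05,
    "slp": 0.03,
    "vh_AMP": 0.02,
}

for k, names in nested_subsets(importance, [2, 5]).items():
    # In a full workflow, each subset would be used to fit and validate every model.
    print(k, names)
```

Taking prefixes of a single ranking keeps the comparison across subset sizes easy to interpret; selecting each subset independently (for example, with a separate stepwise search per size) would be an alternative design.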
Table 10. Performance of the five methods in dominant height prediction as a function of the number of features.
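| Model | N Var | $R^2$ | Bias | Bias (%) | RMSE | RMSE (%) |
|---|---|---|---|---|---|---|
| MLR | 2 | 0.40 | 0.00 | 2.23 | 5.39 | 26 |
| | 7 | 0.45 | 0.00 | 1.89 | 5.15 | 25 |
| | 16 | 0.47 | 0.00 | 1.97 | 5.05 | 24 |
| kNN | 2 | 0.36 | −0.10 | 2.13 | 5.60 | 27 |
| | 7 | 0.42 | −0.31 | −1.40 | 5.32 | 26 |
| | 16 | 0.39 | −0.42 | −1.82 | 5.45 | 26 |
| RF | 2 | 0.40 | 0.02 | 2.33 | 5.43 | 26 |
| | 7 | 0.44 | 0.03 | 1.80 | 5.22 | 25 |
| | 16 | 0.46 | 0.00 | 0.99 | 5.15 | 25 |
| LGBM | 2 | 0.39 | 0.07 | 2.21 | 5.44 | 26 |
| | 7 | 0.45 | 0.11 | 2.20 | 5.18 | 26 |
| | 16 | 0.49 | 0.01 | 1.93 | 4.99 | 24 |
| XGBoost | 2 | 0.40 | 0.01 | 2.26 | 5.43 | 26 |
| | 7 | 0.45 | 0.00 | 1.96 | 5.16 | 25 |
| | 16 | 0.48 | −0.05 | 2.06 | 5.01 | 24 |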
Table 11. Summary of statistical metrics for each species and for all forest plantations.
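| Feature | Statistic | *E. globulus* | *P. radiata* | *P. pinaster* | Forest Plantations |
|---|---|---|---|---|---|
| $H_0$ | $R^2$ | 0.44 | 0.43 | 0.51 | 0.49 |
| | Bias | 0.20 | −0.00 | 0.04 | 0.01 |
| | Bias (%) | −1.16 | 2.81 | −2.29 | 1.93 |
| | RMSE | 5.38 | 4.54 | 4.19 | 5.05 |
| | RMSE (%) | 25 | 19 | 24 | 24 |
| $TV_{ha}$ | $R^2$ | 0.49 | 0.35 | 0.61 | 0.49 |
| | Bias | 1.25 | −2.87 | 0.44 | −0.02 |
| | Bias (%) | 0.83 | −0.79 | −5.21 | −1.30 |
| | RMSE | 92.37 | 136.82 | 80.41 | 109.07 |
| | RMSE (%) | 61 | 50 | 43 | 55 |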
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Teijido-Murias, I.; Antropov, O.; López-Sánchez, C.A.; Barrio-Anta, M.; Miettinen, J. Forest Height and Volume Mapping in Northern Spain with Multi-Source Earth Observation Data: Method and Data Comparison. Forests 2025, 16, 563. https://doi.org/10.3390/f16040563

AMA Style

Teijido-Murias I, Antropov O, López-Sánchez CA, Barrio-Anta M, Miettinen J. Forest Height and Volume Mapping in Northern Spain with Multi-Source Earth Observation Data: Method and Data Comparison. Forests. 2025; 16(4):563. https://doi.org/10.3390/f16040563

Chicago/Turabian Style

Teijido-Murias, Iyán, Oleg Antropov, Carlos A. López-Sánchez, Marcos Barrio-Anta, and Jukka Miettinen. 2025. "Forest Height and Volume Mapping in Northern Spain with Multi-Source Earth Observation Data: Method and Data Comparison" Forests 16, no. 4: 563. https://doi.org/10.3390/f16040563

APA Style

Teijido-Murias, I., Antropov, O., López-Sánchez, C. A., Barrio-Anta, M., & Miettinen, J. (2025). Forest Height and Volume Mapping in Northern Spain with Multi-Source Earth Observation Data: Method and Data Comparison. Forests, 16(4), 563. https://doi.org/10.3390/f16040563

