Sentinel-2 Time Series Analysis for Identification of Underutilized Land in Europe

Biomass and bioenergy play a central role in Europe’s Green Transition. Currently, biomass is representing half of the renewable energy sources used. While the role of renewables in the energy mix is undisputed, there have been many controversial discussions on the use of biomass for energy due to the “food versus fuel” debate. Using previously underutilized lands for bioenergy is one possibility to prevent this discussion. This study supports the attempts to increase biomass for bioenergy through the provision of improved methods to identify underutilized lands in Europe. We employ advanced analysis methods based on time series modelling using Sentinel-2 (S2) data from 2017 to 2019 in order to distinguish utilized from underutilized land in twelve study areas in different bio-geographical regions (BGR) across Europe. The calculated parameters of the computed model function combined with temporal statistics were used to train a random forest classifier (RF). The achieved overall accuracies (OA) per study area vary between 80.25 and 96.76%, with confidence intervals (CI) ranging between 1.77% and 6.28% at a 95% confidence level. All in all, nearly 500,000 ha of underutilized land potentially available for agricultural bioenergy production were identified in this study, with the greatest amount mapped in Eastern Europe.


Introduction
Biomass as a source of renewable energy production plays an essential role in Europe's Green Transition [1]. The European Union's (EU) Revised Renewable Energy Directive requests 32% of EU's energy production to originate from renewables. Moreover, bioenergy represents a valuable option to support the implementation of the Sustainable Development Goals (SDGs) of the United Nations (UN), especially SDG Nr. 7: "Ensure access to affordable, reliable, sustainable and modern energy for all" and Nr. 13: "Take urgent action to combat climate change and its impacts" [2]. The IPCC special report on the impacts of global warming 1.5 • also highlights the importance of bioenergy "due to its multiple roles in decarbonizing energy" [3]. The IPCC report and the EU explicitly state that bioenergy should be produced in a sustainable manner at all levels along the entire value chain and must not affect agricultural or food systems, biodiversity and various other ecosystem functions and services [3,4]. One approach to prevent the food vs. fuel competition is the use of contaminated, underutilized and/or marginal land that cannot be used for food or feed production but still retains the potential to produce biomass feedstock for bioenergy purposes [2,5,6]. Several studies concluded that using these lands for bioenergy production could have positive environmental and socio-economic impacts [7][8][9][10].
Generally, a land is considered to be underutilized when there is no sign of any human intervention over a certain period. The Food and Agriculture Organization (FAO) of the United Nations (UN) [11], for example, uses an idle period of five years in its definition. In this study, we used a period of three years due to the availability of S2 data from 2017 onwards. With the availability of satellite image time series of high temporal resolution, remote sensing offers the possibility to generate area-wide information on land use and therefore the presence and absence of human interventions. The opening of the entire Landsat data archive and the publicly accessible data of the S2 and Moderate Resolution Imaging Spectroradiometer (MODIS) missions with global coverage, especially, increased the use of dense satellite image time series for land use monitoring purposes [12]. Various studies used MODIS data for large-area assessments of underutilized or abandoned (farm) land in Europe and Asia due to their high temporal resolution [13][14][15][16][17]. Within Europe, the former communist countries in the Eastern part of the continent have been the regions of interest in these studies, since large agricultural areas were left fallow following the collapse of the Soviet Union [13,18]. However, the main drawback of MODIS data lies in the spatial resolution of 500 m, which prevents accurate and detailed regional and local assessments in areas of small structured agriculture, typical of large parts of Europe. To bridge the gap between continental assessment and high-resolution mapping requirements [19], Landsat 8 (L8) image time series from 2015-2019 with a spatial resolution of 30 m and a temporal resolution of 16 days are used to generate a European-wide map of underutilized land. The authors employed temporal features for random forest classification using Google Earth Engine (GEE) [20]. Due to the pan-European extent, only temporal features like minimum, maximum, standard deviation, etc., are used in this approach [19]. At the regional level, Landsat time series data from 1986 until 2008 have been used [21] to map post-socialism farmland abandonment in western Ukraine.
Being part of the EU's Copernicus program since 2017, the S2 satellites provide both higher spatial (10 m) and higher temporal (5 days) resolution optical data compared to Landsat. The potential of Sentinel-2 data for mapping and monitoring land abandonment has already been tested in several, mostly local studies, e.g., in Lithuania to map abandoned farmland [22]. Similarly, [23] used all 10 m resolution S2 bands and four derived vegetation indices of three observations to map farmland abandonment in western Slovakia. The performance of S2 to map abandoned citrus plantations on a local level in Valencia (Spain) was done [24] by employing mono-temporal S2 data in combination with airborne images from 2019 with a very high resolution (VHR). On the regional level, [25] employed S2 time series data from 2017-2020 to detect abandoned agricultural land in Valencia, including five spectral indices to evaluate the performance of different machine learning and deep learning classifiers [25].
Previous studies have shown that underutilized land (UU) has a different spectral reflectance behavior over time compared to utilized land (U) due to missing human interventions [14,19,25]. Typical human interventions are mowing and ploughing, which result in clear changes in the spectral reflectance of the respective patch of land. It was shown, that UU land usually shows different magnitudes and standard deviations of changes over time compared to utilized land due to the missing above-mentioned interventions [14,25]. Sometimes, however, temporal statistical features (TSFs) like minimum, maximum or standard deviation can be similar for different land use classes, while the temporal curve, which can be represented by time series model features (TSMFs), is very different (see Figure 1).
The aim of this study is thus to derive alternative features based on a time series model fitting of S2 imagery that captures the differences in spectral behavior of different land use classes over the course of time, other than temporal statistics. The calculated model function reconstructing the continuous spectral curve is described by a set of parameters, which depend on the employed mathematical model. Examples of these model parameters are the amplitude of the sinus or cosine, model variance or observation variance. Instead of using the modeled images (i.e., synthetic images), we analyze the performance of the model parameters to serve as classification input features to train a random forest classifier.
Therefore, the research questions to be analyzed in this study are:

1.
Which S2 time series model features of which spectral bands work best for the differentiation between utilized and underutilized land? 2.
What is the level of accuracy that can be achieved in different bio-geographical regions of Europe using a common classification approach?

Study Area
The selection of the study areas was done by the project consortium of BIOPLAT-EU (https://bioplat.eu accessed date 8 November 2021) based on statistical data and previous knowledge of underutilized lands in these areas. The twelve study areas (see Figure 2) are spread over three biogeographical regions (BGR) and thus cover much of the variability of the appearance of underutilized lands, due to unequal environmental conditions (e.g., climate, soil etc.), to be expected within Europe, except for Scandinavia and the British and Irish islands. Table 1 covers the main parameters of the study areas. Since the two study areas in Germany as well as in Spain are adjacent, they are treated as one single study area in the assessment, and no separate results will be reported. More detailed information on the selection of study areas can be found here [26].

Satellite Imagery
The Copernicus S2 mission provides high-resolution dense optical time series data for the development of Earth-Observation-based environmental monitoring systems [27]. The high temporal resolution of 5 days is obtained by having two satellites operating in the orbit carrying identical sensor systems. The whole S2 system has been fully operational since mid-2017, delivering imagery with 13 spectral bands [28]. In this study, S2 data including all 10 m and 20 m spectral bands (see Table 2) from 2017-2019 were used to derive the time series features used for the classification.

Training Data
The visual interpretation of Google Earth VHR image time series was the main source for the compilation of a reference dataset for underutilized lands. Though being a static product, the "Land Use/Cover Area frame statistical Survey" (LUCAS) database proved to be of valuable assistance for the identification of underutilized lands. LUCAS is a harmonized in situ (terrestrial) LCLU data collection procedure [29]. Its classification key includes two LU classes that can serve as indicators for underutilized lands (U410 "Abandoned Areas" and U420 "Semi-Natural and Natural Areas not in Use"). Details and the datasets can be obtained from the Eurostat Website [30]. LUCAS is conducted every three years, with the most recent survey having taken place in 2018. All U410 and U420 points in the study areas were visually checked and converted into polygon information, if the areas proved to be underutilized for the past three years according to data available in Google Earth. Additional reference data for underutilized lands were also gathered for the study areas in Germany, Hungary, Romania and Ukraine from local stakeholders.
A training dataset that covers all types of used land, in particular forests, settlements, annual and permanent cropland and managed grassland, was generated using a random point sampling approach with the following European-wide COPERNICUS Land Monitoring Service Products: 1.
Due to their wall-to-wall structure and large amounts, no further visual interpretation was needed for these training datasets. Since the above-mentioned datasets do not cover the state of Ukraine, we used a separate land use classification provided by [31]. For the classification, all training data for different types of used land were summarized in one single class, e.g., the same label was assigned to all of them.

Reference Data for Exclusion of Specific Areas
As laid out in the introduction, the production of biomass for bioenergy purposes should not affect any existing agricultural or food systems or jeopardize nature's biodiversity and various other ecosystem functions and services. Therefore, areas known to feature these functions are removed from the results. This step can be seen as a "safeguarding" procedure to ensure that, ideally, no areas used for food production are assigned to underutilized land. Due to the project's ambition to generate close-to-practice results, we had to include this step, removing known areas used for food or feed production, steep slopes not suitable for bioenergy production and protected areas for example. All details on the exclusion of specific areas can be found in [19], where the same procedure was applied.
For the assessment, only those parts within the selected study areas that are not covered by these existing dataset are considered in the assessment and are referred as "area of interest" (AOI) in the further course of this paper (see Table 3).

Reference Data for Validation
To report accuracies of the mapping results, validation data were derived based on a stratified sampling approach. Based on an independent classification of underutilized lands (the map produced by [19]), we sampled the same number of points per class, e.g., utilized and underutilized land, randomly. These points were then visually interpreted using Google Earth VHR time series data using the same procedure described by [19] with two adaptations: first, the minimum mapping unit (MMU) was changed to 0.5 ha to be in line with the product definition, and second, the reference period was adapted to 2017-2019 instead of 2015-2019, since satellite image time series data from this period were employed. The weights for each class were calculated using the ratio of the area of the respective class from the independent classification [19] and the number of interpreted points. Using such an approach, the resulting accuracy measures are unbiased. Compared to the study of [19], it has been necessary to adapt two parameters for the visual interpretation of the validation points: first, the minimum mapping unit (MMU) was changed to 0.5 ha, and second, the reference period was adapted to 2017-2019 instead of 2015-2019, since satellite image time series data from this period were employed. The final number of available validation points per study area is given in Table 4.

Methods
The entire mapping and validation approach is schematically shown in Figure 3. The first component of the mapping procedure was the S2 imagery pre-processing. This step included the transformation of S2 L1C top of atmosphere data to surface reflectance values using the Sen2Cor processor provided by ESA [32], topographic normalization to correct terrain effects influencing reflectance values as well as cloud and cloud shadow masking using the FMask algorithm [33].
Following our research questions, the main focus of this study is on the selection of the best image time series features for distinguishing underutilized and utilized land. Therefore, all other parameters of the entire classification process, such as the S2 preprocessing as well as the training data, the used classification algorithm (random forest) and the post-processing are kept constant. In this study, we distinguish two categories of time series features. First, "statistical temporal features" (STFs), which are calculated from the time series of pre-processed images. These are, for example, minimum, maximum, mean, median, standard deviation, trend, etc. The period used to calculate the STFs depends on the research purpose and, therefore, can vary from days to years. STFs do not require a time series model to be fit to the data stack. Instead, they are calculated from the pixel values directly (either spectral values or combinations of them, such as ratios or indices). STFs have been used successfully in the past, e.g., for forest disturbance mapping [34]. The second group of time series features are based on a model fit, thus called "time series model features" (TSMFs). These TSMFs correspond to the parameters of the fitted model function. While STFs for different land use classes can be similar (see Figure 1), TSMFs can vary significantly and thus be used for improved differentiation. The model used to derive TSMFs is variable, e.g., can be a harmonic regression model or a more complex model, like a logistic or double logistic model. The number of TSMFs varies with the selected model. For a harmonic regression model, the TMSFs are the three model parameters defining the function (offset, sinus and cosine), the related three model variance parameters and the observation variance. In this study, it was decided to use a harmonic model based on a Fourier series (a series of superimposed sine and cosine functions), which has been used in various studies based on image time series to model vegetation phenology [35][36][37][38][39][40]. TSMFs based on harmonic regression techniques have already been used for crop type mapping [41][42][43][44] and within various forest-related LCLU classification [45][46][47][48] and disturbance mapping [49,50] approaches. Regarding the EO data used to derive TSMFs, the majority of these studies used vegetation indices or components of the Tasseled Cap Transformation derived from Landsat time series data [41,[45][46][47]49,50]. The authors of [42] employed MODIS NDVI data and SAR Sentinel-1 data. Equally based on Landsat data, but using spectral bands solely to calculate harmonic time series models, the authors of [49] developed a forest disturbance mapping and LCLU classification approach. More recently, a couple of studies have been published evaluating the use of harmonic TSMFs based on S2 spectral bands and/or indices time series data for different LCLU applications [44,48].
The basic formula of a Fourier series is written as: In this formula, f t j corresponds to the modeled reflectance value; a 0 is the mean reflectance value over the modeled time series; a i and b i are the amplitudes of the cosine and sinus wave of the harmonic component i; and M is the number of harmonic components used (M ≥ 1), i.e., the order of the harmonic function. T describes the length of the time period, and t j corresponds to a certain point in time.
Based on Equation (1), for the actual observations of a time series the following equation applies: where f t j is the observed value, and the term ε t j corresponds to the residual error. The observation variance used in this study is the variance of all residual errors. The harmonic model was used to calculate the time series model for two reasons. The first reason is performance. Logistic and double logistic models are non-linear models and, therefore, only approximate solutions can be calculated. They are highly demanding in computational power, and the accuracies of these approximate solutions rely heavily on the initial conditions and the quality of the raw input time series data [51,52].
The second reason is that we expect a harmonic model behavior in underutilized land. If a patch of land is untouched, e.g., it is not influenced by any human induced activities, such as mowing or ploughing, natural processes causes the spectral value's temporal trajectory to behave similarly to a 1st order harmonic curve (M = 1) during the growing season. This is shown in Figure 4 for several land use categories and for underutilized land. In contrast, the temporal spectral behavior of cropland, especially when harvested during the growing season, as well as managed grassland with related mowing events, show more changes and therefore do not fit a 1st order harmonic model. Hence, we decided to use a 1st order harmonic model (M = 1) to calculate the time series model. Moreover, regarding the length of the time period T, this value was set for study areas located in the Mediterranean region from April until the end of October, and for all other study areas observations from the beginning of May until the end of October were used. This selection was necessary to remove data affected by snow, low sun illumination and deep shadows.
The harmonic model was calculated for each spectral band and each year separately. This resulted in a TSMF set of four features per band. They are: offset a 0 , amplitude of cosine a i , amplitude of sinus b i and the observation variance (i.e., the average residual error). These four features times three years times ten bands leads to 120 TSMFs for each study area. In addition to these 120 TSMFs, the temporal standard deviation of the Normalized Difference Vegetation Index (NDVI) was calculated for each year separately and added to the pool of classification input features. Therefore, the final feature dataset per study area to train the random forest (RF) classifier was composed of 123 features. No time series model of the NDVI or any other vegetation index or band combination was calculated. As laid out in Section 1, the objective was to investigate the performance of spectral bands. Moreover, since all 10 m and 20 m S2 bands are included in the investigation, the information of indices or band combinations is already included and therefore available for the classifier. However, the NDVI temporal standard deviation was included as an additional feature because previous studies showed that this feature is very valuable for the detection of human-induced activities [19,53]. RF is an ensemble learning method belonging to the group of non-metric decision tree classifiers. It constructs several independent decision trees modeling the relationship between the predictor (classification input features) and response variable (used versus underutilized land). The final response is calculated using the majority vote [54][55][56]. In this study, all classifiers are generated with 500 trees, and the employed reference data were split into a set of training data (90%) and a set of RF internal model validation data (10%). Though the accuracies of the results are assessed in an independent validation, this internal validation was used as an a priori indication on the performance of the classification to evaluate the need for modifications in the training process.
In the process of training, RF allowed us to calculate the importance of each feature for the classification, i.e., which features comprise a lot of valuable information to distinguish between the response variables [57]. Since computation time does not only depend on the number of trees but also on the number of predictor variables, a threshold of 0.01 for this feature importance was defined to narrow down the number of predictor variables. The threshold value marks the percentage of importance that must be reached by a feature to be included in the final set of features used by the classifier. Consequently, this resulted in a different number of features contributing to the classification for each study area. In the final post-processing step, a MMU of 0.5 ha was applied.

Feature Importance
The first part of this chapter presents the results of the feature importance analysis, per BGR. As laid out in Section 2.3, the input feature set for each study area to train the RF classifier consists of 123 features, 120 TSMFs (4 parameters × 10 bands × 3 years) and 3 STFs (standard deviation of NDVI per year). According to Table 1, six study areas are located in the Continental, four in the Mediterranean and two in the Pannonian BGR. Therefore, the maximum number of time a certain parameter can be used for classifications within a BGR is not identical for all BGRs. For the Continental BGR the maximum is 15, considering that the two German study areas are classified in the same run (5 study areas × 3 years), for the Mediterranean BGR it is 12 (4 study areas × 3 years) and for the Pannonian BGR it is 6 (2 study areas × 3 years). To compare the BGRs directly, Figures 5-7 depict the relative frequency of usage of each TSMF for the RF classification in the respective bio-geographical region (BGR).    Figure 6). For the visible domain of the electromagnetic spectrum (B2, B3, B4), the results reveal that only the observation variance of these three bands is of importance for the classification in all three BGR's, with the red band (B4) being used more often than the green (B3) and blue (B2) bands. It can also be observed that the observation variance has a higher importance in the dryer BGRs (Mediterranean and Pannonian, Figures 6 and 7) than in the humid Continental region ( Figure 5). Apart from the B11 and B12 amplitude of cosine in the Pannonian region, all three figures indicate that the amplitude of both the cosine and the sinus are of less importance than the offset and observation variance.
In addition to the TSMFs, the temporal standard deviation of the NDVI per year as one TSF complemented the input feature dataset to train the RF classifier. The analysis of the frequency of this feature among the most important features strongly depends on the BGR. In the Pannonian BGR the temporal standard deviations of the NDVI significantly contributed to the classification with 5 out of 6 NDVI TSF features included according to the RF feature importance report (83%). In the Mediterranean region, almost 60% of the combinations (7 out of 12) included the standard deviation of the NDVI in the classification. In the Continental region, however, it was found to be less important, with presence in only about one third of the combinations (5 out of 15).

Classification Results
This chapter describes the classification results. Table 5 provides the area of detected underutilized lands, their shares of the entire AOI as well as the average and median size of single patches per study area. The largest absolute area of UU land has been detected in the Spanish study area, followed by Chernihiv in Ukraine and Gorj in Romania. The least absolute area was detected in Sulcis (Italy) and Hungary-North. The absolute figures are important for the stakeholders and users in order to decide on investments in the bioenergy sector. However, since the size of AOIs differs significantly, it is difficult to draw any conclusions from the overall area of underutilized land expected in the individual BGRs or countries. Therefore, the area share of underutilized lands per AOI was calculated. The highest shares can be found in Ukraine and Romania, followed by Spain and Sulcis, in Italy (see Table 5). In terms of patch size, the average patch size is largest in Chernihiv (Ukraine), with 11.12 ha followed by the Spanish study area (5.65 ha), and the second Ukrainian study area, with 5.01 ha. Regarding the average patch sizes per BGR, the continental BGR has the largest patch size, but with a large internal variation (2.76 ha in Germany versus 11.12 ha in Chernihiv). The smallest patch sizes were found in the Pannonian BGR. Since the mean value is sensitive to outliers, the median value is also reported. Splitting the data into a lower and an upper half, this value is expected to indicate more reliably what the "typical" UU patch size is. Results show the same pattern as for the average size, and the highest median values are found in the Continental BGR, followed by the Mediterranean BGR and the Pannonian BGR. However, the median values lie within a range of 0.45 ha across all study areas, with the smallest value (0.95 ha) for Bacs-Kiskun & Csongrad and the highest value for Chernihiv (1.40 ha).

Accuracy Assessment
This part of the results reports the achieved unbiased classification accuracies calculated based on the validation data listed in Table 4. In addition, count-based accuracy measures are reported in the Table 6 for comparison. In Table 7 the achieved overall accuracies (OA), commission errors (CE) and omission errors (OE) are reported. In brackets, the values for the respective confidence intervals (CI) at the 95% confidence level are reported. This means that with a probability of 95%, the respective accuracy value lies within the CI around the given value. For example, the highest OA with 96.76% and a confidence interval (CI) of 1.83% for Hungary-North means that with a probability of 95%, the OA in Hungary-North lies between 94.93 and 98.59%. The second highest OA was reported for Albacete & Cuenca (94.89%), followed by the second Hungarian study area, Bacs-Kiskun & Csongrad (92.34%). The smallest OAs are slightly above 80% and were obtained for both Italian and Romanian study areas. Considering the before-mentioned food versus fuel debate, we considered the CE to be more critical than OA because high CEs mean that a lot of underutilized lands detected by the mapping approach actually are used. According to Table 6, the lowest CEs were achieved in Hungary-North (0%/no CE), followed by Val Basento (2.77%) and Sulcis (5.83%). The highest CEs are reported for Dahme-Spreewald & Spree-Neiße (91.45%) and the two study areas in Romania, Gorj (43.93%) and Bacau (20.53%), respectively.

Feature Importance
To our knowledge, there are no other studies employing TSMFs for the differentiation of underutilized and utilized lands. Thus, we have to broaden the discussion of the feature importance to other vegetation and agricultural classification approaches while recognizing the limited comparability. Our finding, that offset and observation variance are the most valuable information sources, was also reported by [48], who used TSMFs from S2 spectral bands and spectral indices time series to estimate the canopy height.
Figures 5-7 also show that the difference between utilized and underutilized land mainly occurs in the red, near infrared, red-edge and shortwave infrared bands. This is in line with previous studies on the vitality of vegetation [58]. A study applying a monotemporal classification approach [24] also reported high importance for the near infrared and shortwave infrared S2 bands as well as derived indices from these bands to map land abandonment.
The importance of the observation variance supports the hypothesis stated in Section 2.3, i.e., that we expect the temporal spectral behavior of underutilized land to fit a harmonic model better that utilized land (see Figures 1 and 3). Regarding the amplitudes of the cosine and the sinus, Figures 5-7 suggest that they contain less relevant information for the differentiation of the target classes. This implies that the maximum of the modeled curve and the time of reaching during the growing season can be quite similar for some utilized and underutilized lands. This implication is supported by Figure 3, e.g., comparing underutilized land and maize.
The importance of the temporal standard deviation of the NDVI for the study areas in the Mediterranean and Pannonian BGR indicates that especially in BGRs, characterized by dry climate, STF comprise valuable information to differentiate between utilized and underutilized land. The reason may be found in the agricultural practice of irrigating cropland, which is necessary in wide areas of the Mediterranean BGR to increase crop yields. Similar results in this respect were also found in earlier works of the authors [19].

Classification Results
The greatest amount of underutilized land, not only in terms of absolute area but also of area share, was mapped in the eastern part of the Continental BGR (study areas in Romania and Ukraine). This result is perfectly in line with previous assessments [13,14].
In comparison to the use of Landsat data [19,59] or MODIS data [13,14], employing S2 image time series has two major advantages: the revisit interval and the higher spatial resolution, which enables a much smaller MMU. Table 5 shows that the average patch size in all study areas, except Cernihiv, is below 10 ha, and the median does not even exceed 2 ha. Comparing this with results achieved by [19], where a MMU of 10 ha was used with Landsat 8 time series data, it can be deduced that a great amount of potentially underutilized lands mapped in this study could never have been detected with a lower resolution data.
In addition to the feasibility of mapping smaller underutilized land patches, the higher spatial resolution of S2 images also allowed us to delineate the boundaries of the identified underutilized land patches more accurately. Figure 8 shows a comparison of underutilized land delineated in this study and the pan-European approach using Landsat 8 time series data employed by [19].

Accuracy Assessment
Generally, the quality of a calculated time series model strongly depends on the number of valid observations (e.g., valid pixel values) and the quality of these observations. Invalid observations are induced by clouds, cloud shadows, haze or snow. Pixels representing these conditions are filtered and eliminated through a separate pre-processing algorithm. It further needs to be considered that adjacent S2 granules overlap vertically and horizontally, leading to more observations for certain parts of a granule, thus increasing the amount of potentially useful observations. Figure 9 shows how the conditions of the atmosphere, which are highly diverse across Europe, and the overlapping of S2 granules impact the available number of valid pixels in the case the following study regions: Gorj (left), Dahme-Spreewald & Spree-Neiße (middle) and Sulcis (right).  The number of valid pixel values strongly affects time series modeling: if there is a large gap in the time series, it is more likely to miss a change caused by human intervention. In this case, the classification algorithm also fails to recognize these patterns. This situation would result in higher omission errors for the utilized land class and higher commission errors for the underutilized land class. Figure 9 clearly highlights the fact of the significantly reduced availability of valid pixels for the German study areas, explaining the low OE and CE obtained for this study area (see Table 6).
A further source of misclassifications that needs to be kept in mind is the method of generating training data for utilized land. As mentioned in Section 2.2.2, existing datasets are used to generating training data for the utilized category using a random sampling approach. Since the sampled points were not revised manually, errors may be present in the training data. Moreover, the existing products have different MMUs compared to the results produced within this study, which can also lead to errors in the training data. Finally, most of the existing datasets represent status products of 2018 as compared to the three-year interval captured in our study.
Looking at Table 6, it can be noticed that for both study areas in Hungary an extremely high OE for UU land is reported while the CE for UU land is low. One reason for this might be found in the training data. Since the CE is low, the conclusion can be drawn that the UU training data does not represent the entire variety of UU lands in these study areas. Therefore, it is to be expected that there is a considerable amount of underutilized land not detected with the proposed approach.
For the German study areas, not only a high OE but also a high CE are obtained for UU land. One possible explanation could be the lower number of valid observations available to calculate the harmonic model (see Figure 9). A second reason, specifically for the high OE, can be found in inaccurate pre-processing due to difficult atmospheric conditions. Valid pixels influenced by remaining clouds, haze or snow, lead to a higher observation variance. This may induce the classifier to assign the specific pixel that actually represents underutilized land to the utilized class. A third reason could be the similar spectral curve of maize, which is very common in this region of Germany, and underutilized land (see Figure 4). The high CI at the 95% confidence level (15.82%) is related to the low number of validation points for the underutilized class (see Table 4). As in the case of these two districts, Dahme-Spreewald and Spree-Neiße, it is challenging to generate a validation dataset with a reasonable amount of points representing underutilized lands, since the area of underutilized land in the reference map [19] used for stratification is small. This limited number of validation points automatically leads to higher confidence intervals. Moreover, the small area of underutilized lands leads to a small weight for underutilized validation points, while the weight of utilized points is comparably high. The consequence of this can be observed very well when comparing unbiased (Table 6) and count-based (Table 7) accuracy measures. The much lower count-based underutilized land CE of 38.46% illustrates the impact of a small amount of wrongly classified utilized validation points with a high weight on the unbiased CE.
After the Dahme-Spreewald & Spree-Neiße study area, the highest CE was achieved for Gorj. A possible cause for this might be the small structured agriculture found in large parts of the study area, leading to mixed-pixels representing both utilized as well as underutilized land (see Figure 10)

Conclusions
In this study, S2 satellite image time series from 2017-2019 were employed to map underutilized lands in twelve different study areas in six different European countries across three biogeographical regions. The mapping approach was based on TSMFs derived from a 1st order harmonic function. The time series model was calculated for each year (2017-2019) and each S2 band (10 m and 20 m bands). In addition, the temporal standard deviation of the NDVI complemented the dataset. It was successfully demonstrated that the retrieved model parameters offset, the amplitude of the cosine, the amplitude of the sinus and the observation variance in combination with the temporal standard deviation of the NDVI can serve as predictor variables for a RF classification approach to map underutilized land. With this study we aimed to investigate the following research questions:

1.
Which S2 time series model parameters of which spectral bands work best for the differentiation between utilized and underutilized land? 2.
What is the level of accuracy that can be achieved in different bio-geographical regions of Europe using a common classification approach?
Regarding the first research question, the study revealed that, regardless of the BGR, the TSMFs offset and observation variance are of great relevance to distinguish between utilized and underutilized land. In particular, the importance of the observation variance supports our hypothesis that utilized land does not fit a harmonic model well due to human interventions, such as mowing or ploughing. Concerning the importance of different spectral bands, it turned out that the near infrared, red-edge and short waved infrared bands comprise more important information than the bands sensitive to the visible domain. The importance of the temporal standard deviation of the NDVI strongly depends on the BGR. All in all, nearly 500,000 ha of underutilized land were detected across all study areas, with the greatest amounts found in the Mediterranean BGR and in Eastern Europe.
The topic of the second research question is the achievable accuracy. In terms of OA, the results range between 80.25% and 96.76%, with the highest OA achieved for Hungary-North (around 96%), followed by the study area in Spain (around 94%). The lowest accuracy was obtained for Sulcis, in Italy, followed by Chernihiv, in Ukraine (both slightly over 80%). Despite the unequal environmental conditions, it was successfully shown that the same mapping approach works very well in different BGRs. The produced maps provide a valuable basis for further assessment of using so far underutilized land for sustainable bioenergy production and, consequently, supporting Europe's Green Transition. Data Availability Statement: Publicly available datasets were analyzed in this study. Sentinel-2 data can be downloaded here: https://scihub.copernicus.eu (accessed on 8 November 2021). COPERNICUS HRL layers are accessible here: https://land.copernicus.eu/pan-european/highresolution-layers (accessed on 8 November 2021). CLC data is available here: https://land.copernicus. eu/pan-european/corine-land-cover (accessed on 8 November 2021). OSM data can be found here: http://download.geofabrik.de/europe.html (accessed on 8 November 2021). LUCAS data are available here: https://ec.europa.eu/eurostat/web/lucas/data/primary-data/2018 (accessed on 8 November 2021). Natura2000 data are available here: https://ec.europa.eu/environment/nature/ natura2000/access_data/index_en.htm (accessed on 8 November 2021).