Testing the Effect of Relative Pollen Productivity on the REVEALS Model: A Validated Reconstruction of Europe-Wide Holocene Vegetation

Abstract: Reliable quantitative vegetation reconstructions for Europe during the Holocene are crucial to improving our understanding of landscape dynamics, making it possible to assess the past effects of environmental variables and land-use change on ecosystems and biodiversity, and mitigating their effects in the future. We present here the most spatially extensive and temporally continuous pollen-based reconstructions of plant cover in Europe (at a spatial resolution of 1° × 1°) over the Holocene (last 11.7 ka BP) using the ‘Regional Estimates of VEgetation Abundance from Large Sites’ (REVEALS) model. This study has three main aims. First, to present the most accurate and reliable generation of REVEALS reconstructions across Europe so far. This has been achieved by including a larger number of pollen records compared to former analyses, in particular from the Mediterranean area. Second, to discuss methodological issues in the quantification of past land cover by using alternative datasets of relative pollen productivities (RPPs), one of the key input parameters of REVEALS, to test model sensitivity. Finally, to validate our reconstructions with the global forest change dataset. The results suggest that the RPPs.st1 (31 taxa) dataset is best suited to producing regional vegetation cover estimates for Europe. These reconstructions offer a long-term perspective providing unique possibilities to explore spatial-temporal changes in past land cover and biodiversity.


Introduction
The IPBES 2019 global report ranked changes in land use and land cover as the greatest drivers of declines in nature and biodiversity [1]. Anthropogenic biodiversity decline and anthropogenic climate change have largely been driven by the direct exploitation of nature through deforestation and conversion for agriculture and livestock production (IPBES, [1,2]). Loss of biodiversity and the transformation of nature by humans are often considered to be recent impacts on the environment, stretching over recent decades and centuries, and reflected in instrumental records and detailed ecological surveys. However, the reshaping of terrestrial nature began in the Paleolithic and Mesolithic ages, with practices including hunting, fishing, and gathering having the potential to modify existing ecosystems at localized scales through, for example, selective gathering or depletion of local resources [3–6]. Forest clearance for agriculture started at least 6000 years ago in western Europe and probably earlier in the regions in which agriculture developed [7]. Sustainable practices over time took the form of appropriation, colonization, and land-use intensification, which led to ecosystem transformations [8]. The intensification of human activities and the loss of sustainable practices make Europe one of the regions of the world where human-induced effects on vegetation are most notable [7]. Increased species extinction and soil erosion, and altered biogeochemical processes, fire frequency, and hydrology left long-term legacies across the biosphere and have shaped most of terrestrial nature for at least 12,000 years [8]. A worldwide acceleration in the rates of vegetation compositional change starting between 4600 and 2900 years ago was demonstrated using a global set of over 1000 fossil pollen records and a new method to estimate the "rate of change" [9]. This human-induced acceleration was shown to exceed the climate-driven transformations of the last deglaciation. That study highlights the legacies of past land use and environmental forcings in relation to the strong anthropogenic imprint of the past decades on contemporary communities and biodiversity trends. Times of technological development (e.g., the introduction of metals, innovations in plough shape, new cropping systems) transformed ecological structures and dynamics, including vegetation (impacting species richness, evenness, and biomass), which led to progressive replacement of semi-natural or natural ecosystems by human-modified ones [3,9–11]. In order to fully understand past and contemporary ecological processes, rates of biodiversity changes, and ecological thresholds at continental scales and globally, it is essential to have an overview of long-term land-cover dynamics [12–18].
Attempts to reconstruct past land cover have been made to model land-use and land-cover change (LULCC) over Holocene time scales (e.g., HYDE 3.2 [19] and KK10 [20]). Such LULCC scenarios have been used in combination with dynamic vegetation models to understand interactions between different components of the Earth system in the past (Earth system modeling (ESM), [21]). However, there can be considerable disagreement between different LULCC scenarios, and this highlights the need to use independent and empirical datasets of land use and land cover [22]. Pollen archives remain the best empirical data to address differences between LULCC scenarios, as they provide a direct proxy for vegetation cover [22–25].
Efforts to improve our understanding of past vegetation using pollen have led to the development of models that correct the non-linear pollen-vegetation relationship and can compensate for plant taxon-specific differences in pollen production, dispersal, and deposition [26,27]. Currently, the 'Regional Estimates of VEgetation Abundance from Large Sites' (REVEALS) model is the most appropriate method to reconstruct plant cover at a regional spatial scale of ca. 100 km × 100 km [27]. The REVEALS model was developed to transform pollen data from large lakes but can also produce regional vegetation cover estimates from multiple small-sized sites [27–30].
The REVEALS model has previously been applied at regional to continental scale. Githumbi et al. (2022) published the most detailed estimates of past plant cover across Europe and part of the eastern Mediterranean-Black Sea-Caspian corridor [31]. REVEALS reconstructions were performed at a spatial scale of 1° × 1° (grid cell of ca. 100 km × 100 km) and a temporal resolution of 500 years between 11.7 and 0.7 ka BP, and three shorter time windows (0.7−0.35 ka BP, 0.35−0.1 ka BP, and 0.1 ka BP−present) for use in climate modeling studies (e.g., [32]). The accuracy and reliability of gridded REVEALS estimates have been discussed in several studies [28,31,33,34]. Gridded REVEALS estimates are influenced by the quality of individual records used (pollen count size, taxonomic resolution, and chronological uncertainty), basin size, and type of sites (lakes or bogs), the number of pollen records used in each grid cell, and the reliability of the relative pollen productivities (RPPs) used. Where gridded REVEALS estimates are based on low numbers of small sites, there is greater uncertainty in the reliability of the REVEALS estimates; if the RPPs for taxa that are considered important components of the regional vegetation are based on limited empirical studies, this can further impact the quality of the estimates. For instance, more work has been undertaken on RPPs of temperate and boreal taxa than those that characterize the Mediterranean region.
RPPs and their standard deviations exist for more than 131 Northern Hemisphere taxa, with the longest research effort in Europe, and several syntheses of RPPs have been published [28,31,34–36]. As the REVEALS model assumes that RPP values are constant within the region of interest and through time [27], studies working at the sub-continental scale have calculated mean RPPs considering all available RPP values. This can overcome the variability of RPP estimates within one taxon. Mazier et al. (2012) produced the first RPP-means dataset for Europe [28], comprising 25 pollen taxa that were used in the "first generation" of REVEALS reconstruction for Europe [33]. Githumbi et al. (2022) published an updated RPP-means dataset for 50 taxa from Europe [31]. Thirty-nine of the taxa were from boreal and temperate Europe, and for the first time, 11 taxa characteristic of Mediterranean Europe were included.
The first RPP-means dataset for Europe was used to evaluate the effect of entomophilous taxa on gridded REVEALS estimates for the Czech Republic [28]. The authors showed that entomophilous taxa tend to affect the REVEALS estimates because the REVEALS model assumes that all pollen is airborne [27] and justified excluding as many entomophilous taxa as possible from REVEALS reconstructions. Githumbi et al. (2022) included taxa with mixed wind and insect pollen transport such as Artemisia, Amaranthaceae/Chenopodiaceae, Ericaceae (Calluna excluded), Rubiaceae, and Plantago lanceolata [31]. The application of multiple sets of RPP values for different climatic regions in a single REVEALS reconstruction cannot be achieved without independent data on past climatic changes, which can shift the boundaries between climate regimes over the Holocene [31]. Some taxa are more difficult to handle, such as the family Ericaceae, which contains a morphologically diverse range of taxa, including herbs, dwarf shrubs, shrubs, and trees [37]. Only two RPP values across Europe are available for Ericaceae [31]. The first is from the Mediterranean area, where Ericaceae species are mainly tree forms, produce abundant pollen, and therefore have a high RPP. The second is based on low-growth shrubs in northern Europe and has a lower RPP value than that from the Mediterranean.
In this paper, we use an updated version of the REVEALS reconstruction from [31]. This third generation produces grid-based estimates at 1° × 1° (ca. 100 km × 100 km) across 30°-71° N, 20° W-47° E (northwestern, central Europe, Mediterranean area, and part of the East until 47° E, Figures 1 and A1) for 25 contiguous time windows across the Holocene. The number of pollen records used (1607) and the area covered (539 grid cells across most of Europe) represent a significant advance on the results presented by [31], which were based on 1128 pollen records for Europe and part of the Eastern Mediterranean-Black Sea-Caspian corridor. We used three RPP-means datasets and evaluated the extent to which the selection of a set of RPP-means influences the REVEALS estimates. The three datasets are (i) the RPP-means dataset from [31] (RPPs.st1: 31 taxa); (ii) a new synthesis proposed in this study inclusive of a larger number of entomophilous taxa (RPPs.st2: 46 taxa); and (iii) a composite dataset, derived from [28,31] (RPPs.st3: 31 taxa).

Figure 1. Grid-cell reliability depends on the number and type of pollen records for the 25 time windows (TWs). Reliable: ≥1 large lake(s), ≥2 small lake(s) and/or small bog(s), mix of ≥1 large lake(s) and ≥1 small lake(s) and/or small bog(s); less reliable: 1 bog (large or small) or 1 small lake. Grey grid cells: less reliable results for all TWs. Colours indicate, for each grid cell, the % of the total number of TWs with reliable REVEALS reconstructions of plant cover. For instance, light yellow grid cells imply that reconstructions are reliable for 8-21% of the TWs, while they are less reliable for 79-92% of the TWs; and dark green grid cells indicate that reconstructions are reliable for 98-100% of the TWs, while they are less reliable for 0-2% of the TWs.
The specific aims of this paper are: (1) to improve the accuracy and reliability of REVEALS estimates across all of Europe through significantly increasing the number of pollen records used for the reconstruction (particularly in the Mediterranean region); (2) to explore how three different RPP-means datasets impact the model output; (3) to identify the geographical location of the differences between the three REVEALS reconstructions and which plant taxa may explain these differences; and (4) to determine which RPP-means dataset is best to use, by validating REVEALS estimates against recent vegetation cover across Europe.

The REVEALS Model
REVEALS, a generalized form of the R-value model [38], estimates past regional vegetation abundance using fossil pollen counts from large sites [27] and expresses the regional vegetation composition as "the ratio of the pollen counts of each taxon weighted by its pollen productivity and dispersal term to the total sum of those for all taxa" [39]. REVEALS and its assumptions are described in detail in [27]. Here we briefly list the main assumptions: (1) the major agent of pollen transport is wind, and wind direction is even in all directions; (2) the site shape is circular; (3) no source plants for pollen grow on the basin surface; (4) relative pollen productivities are constant through time and space [27].
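The ratio quoted above can be illustrated with a minimal sketch. It assumes that "weighted" means dividing each taxon's pollen count by the product of its RPP and its dispersal-deposition term, and it takes that term as a ready-made input rather than computing it from basin radius, fall speed, and wind speed as the full model does. The function name, taxa, and all numbers are hypothetical.

```python
def reveals_like_cover(counts, rpp, dispersal_term):
    """Return the relative cover of each taxon as a proportion (sums to 1).

    counts          pollen counts per taxon
    rpp             relative pollen productivity per taxon (Poaceae = 1)
    dispersal_term  dispersal-deposition term per taxon, assumed to be
                    precomputed elsewhere (not derived in this sketch)
    """
    weighted = {
        taxon: counts[taxon] / (rpp[taxon] * dispersal_term[taxon])
        for taxon in counts
    }
    total = sum(weighted.values())
    return {taxon: value / total for taxon, value in weighted.items()}


# Hypothetical inputs, for illustration only.
counts = {"Pinus": 600, "Poaceae": 300, "Betula": 100}
rpp = {"Pinus": 6.0, "Poaceae": 1.0, "Betula": 5.0}
dispersal_term = {"Pinus": 1.0, "Poaceae": 1.0, "Betula": 1.0}

cover = reveals_like_cover(counts, rpp, dispersal_term)
```

In this toy case Pinus dominates the pollen sum but Poaceae dominates the estimated cover, which is the kind of correction for productivity differences that the model is designed to make.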
REVEALS was developed for pollen records from large lakes (>50-100 ha) [27]. Several empirical studies tested its performance using pollen counts from multiple small-sized sites, showing that REVEALS estimates based on pollen records from small lakes or bogs are similar to REVEALS estimates based on pollen records from large lakes [28–30]. In the absence of pollen records from large lakes, the larger the number of small sites (lakes or bogs), the better the REVEALS results. Simulations showed that increasing the number of pollen records significantly decreased the standard error of the REVEALS estimates [27]. However, bogs (large and small) violate one of the assumptions of the REVEALS model, i.e., "no source plants for pollen grow on the basin surface" [27]. Violation of this assumption has been shown to bias REVEALS results most significantly in the case of large bogs, while pollen records from multiple small bogs have been shown to produce REVEALS estimates that are similar to those from large lakes and can thus be used to provide reliable estimates of plant cover [28,30]. In Figure 1, we present a measure of the reliability of the REVEALS reconstructions presented in this paper. It is based on the number of sites (pollen records) and the type and size of sites used in each grid-based REVEALS estimate of plant cover and expressed in percentage of all 25 contiguous time windows of the Holocene with reliable REVEALS reconstructions (see further details on this issue in the Discussion).
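The reliability measure can be sketched as a simple rule applied per grid cell and time window. The sketch follows the site-type criteria given for Figure 1. Two points are assumptions rather than statements from the text: combinations not listed there (for example several large bogs and nothing else) are treated as less reliable, and time windows without any record are counted as not reliable. Function and argument names are hypothetical.

```python
def is_reliable(n_large_lakes=0, n_small_lakes=0, n_small_bogs=0, n_large_bogs=0):
    """Classify one grid cell in one time window.

    Reliable: at least one large lake, or at least two small sites
    (small lakes and/or small bogs). Large bogs never make a cell
    reliable on their own (assumption for cases the text leaves open).
    """
    if n_large_lakes >= 1:
        return True
    return (n_small_lakes + n_small_bogs) >= 2


def percent_reliable(time_windows):
    """Percentage of time windows with a reliable reconstruction.

    time_windows is a list with one dict of site counts per time window.
    """
    flags = [is_reliable(**window) for window in time_windows]
    return 100.0 * sum(flags) / len(flags)


# Hypothetical grid cell described for four time windows only.
example_cell = [
    {"n_large_lakes": 1},
    {"n_small_lakes": 1, "n_small_bogs": 1},
    {"n_small_bogs": 1},
    {"n_large_bogs": 1},
]
share = percent_reliable(example_cell)
```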
REVEALS accounts for inter-taxonomic differences in pollen productivity and dispersal properties as well as the size and type of sedimentary basins. Two major modelling schemes have been implemented in REVEALS to describe the dispersal and deposition of pollen grains in the air. Pollen dispersal is approximated either by a Gaussian plume model (GPM) of small particles from a ground-level source under various atmospheric conditions [40–45] or by a Lagrangian stochastic model (LSM) under more realistic wind fields and atmospheric turbulence conditions [46–48]. Theoretically, the choice of dispersal model needs to be consistent for both obtaining RPPs and reconstructions of past vegetation using those RPPs [49]. Because of the limited number of LSM-based RPPs available in Europe and elsewhere, most of the REVEALS applications have so far used GPM-based RPPs [28,31,33,34].
The input parameters to run the REVEALS model are original pollen counts, relative pollen productivities (RPPs) and their standard deviations (SDs), fall speed of pollen (FSP), basin type (lake or bog), size (radius, m), maximum extent of the regional vegetation (km), wind speed (m s⁻¹), and atmospheric conditions. We followed the protocols and criteria published in [28,33] and more recently in [31]. The selection and preparation of individual pollen records and the values of model parameters used are described in the following sections.

Fossil Pollen Records: Data Compilation and Preparation
A total of 1607 pollen records (923 and 684 sites from bogs and lakes, respectively) (see TERRANOVA_metadata in https://doi.org/10.48579/PRO/J5GZUO, accessed on 24 April 2023) were compiled from 41 countries covering all European countries, the western part of Russia and the eastern Mediterranean-Black Sea-Caspian corridor (Appendix A, Figure A1). This work benefited from earlier efforts and projects (Landclim I and II), which compiled pollen datasets from open-access databases and individual data contributors [31,33]. The Landclim II pollen dataset includes pollen data from the European Pollen Database [50,51], the Alpine Palynological Database (ALPADABA; Institute of Plant Sciences, University of Bern), the Czech Quaternary Palynological Database (PALYCZ; [52]), the Pyrenean Paleoenvironmental Database (PALEOPYR; [53]), and datasets compiled within synthesis projects from the Mediterranean region [7,11] and the eastern Mediterranean-Black Sea-Caspian corridor (EMBSeCBIO project; [54]). Most of the cited datasets are now archived in NEOTOMA [55]. The 479 new records that have been used here are either datasets added to Neotoma or the EPD until the end of 2020 or collated from individual data contributors that fulfill the protocols applied in the Landclim projects. Pollen records are from natural terrestrial basins (lakes or bogs) with calibrated chronologies based on ≥3 dates. Where necessary, new age-depth models expressed as (calibrated) calendar years before the present (i.e., cal BP = before 1950 CE, hereafter referred to as BP) were established in collaboration with the data contributors or database managers using the R-package clam [56].
Site radius information was obtained from original publications where possible. Where a site's radius could not be determined from the publication, it was geolocated in Google Earth, and the area of the site was measured. A radius value was extracted, assuming that a site shape is circular [28]. Available pollen records were filtered based on criteria including basin type (to exclude archaeological sites and marine records) and quality of chronological control (excluding sites with poor age-depth models or fewer than three radiocarbon dates).
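Under the circular-site assumption, the radius follows directly from the measured area as r = sqrt(A / π). The short sketch below shows the conversion; the function names and the 50 ha example are illustrative and not taken from the study's workflow.

```python
import math


def radius_from_area_m2(area_m2):
    """Radius (m) of a circle with the given area (m^2)."""
    return math.sqrt(area_m2 / math.pi)


def radius_from_hectares(area_ha):
    """Radius (m) of a circular site whose area is given in hectares."""
    return radius_from_area_m2(area_ha * 10_000.0)


# A hypothetical 50 ha lake treated as a circle.
radius_m = radius_from_hectares(50.0)
```

A 50 ha basin corresponds to a radius of roughly 400 m under this assumption.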
Pollen counts were aggregated into 25 time windows across the Holocene (present-11,700 BP). The use of consecutive 500-year-long time windows results in REVEALS reconstructions with low SEs. The 500-year-long time windows are meaningful for the study of past land-cover changes over several millennia [31,34] and maximize the pollen-count size within time windows. This minimizes standard errors by decreasing variability between samples. Because human-induced land-cover changes were often more rapid since the early Middle Ages than through the earlier millennia, the three most recent time windows were fixed to present-100 BP (where the present is the year of coring), 100-350 BP, and 350-700 BP. An additional modern window was considered to evaluate the quality of the REVEALS reconstruction (see Section 2.3).
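The window scheme described above (three short recent windows followed by 500-year windows back to 11,700 BP) can be written down explicitly. The sketch below is only illustrative: because "present" is the year of coring and differs between records, the youngest bound is passed in as a hypothetical parameter, and the choice to treat each window as closed at its younger bound and open at its older bound is an assumption.

```python
from bisect import bisect_right


def build_time_windows(youngest_bp=-75):
    """Return 25 (younger_bound, older_bound) pairs in cal BP.

    youngest_bp stands in for "present" (the year of coring); the
    default value is a placeholder, not a value from the study.
    """
    edges = [youngest_bp, 100, 350, 700]
    edges += list(range(1200, 11700 + 1, 500))  # 1200, 1700, ..., 11700
    return list(zip(edges[:-1], edges[1:]))


def window_index(age_bp, windows):
    """Index of the window containing age_bp, or None if outside all windows."""
    starts = [younger for younger, _ in windows]
    i = bisect_right(starts, age_bp) - 1
    if i < 0 or age_bp >= windows[-1][1]:
        return None
    return i


windows = build_time_windows()
```

For example, a sample dated to 500 BP falls in the third window (350-700 BP), while a sample older than 11,700 BP is not assigned to any window.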
The taxonomy of each of the 1607 original pollen data files was harmonized. Pollen morphological types were assigned to 31 and 46 taxa (Table 1) using an updated dictionary table from [31] following the protocol described in [33]. Samples from each harmonized record were aggregated in time windows using the assigned calibrated ages BP from each age-depth model. The pollen records were then filtered to remove time windows with fewer than 100 pollen grains to avoid sterile samples that would compromise the correctness of the REVEALS estimates.
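The aggregation and filtering steps for a single record can be sketched as follows. The 100-grain threshold comes from the text; the data layout (a list of samples already labelled with a time-window index and harmonized taxon names) and the function names are assumptions made for illustration.

```python
from collections import defaultdict

MIN_GRAINS = 100  # threshold stated in the text


def aggregate_by_window(samples):
    """Sum pollen counts per taxon within each time window.

    samples is an iterable of (window_index, {taxon: count}) pairs.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for window, counts in samples:
        for taxon, n in counts.items():
            totals[window][taxon] += n
    return {window: dict(counts) for window, counts in totals.items()}


def drop_small_windows(aggregated, min_grains=MIN_GRAINS):
    """Keep only time windows whose total pollen sum reaches min_grains."""
    return {
        window: counts
        for window, counts in aggregated.items()
        if sum(counts.values()) >= min_grains
    }


# Hypothetical record with three samples falling in two time windows.
samples = [
    (0, {"Pinus": 60, "Poaceae": 30}),
    (0, {"Pinus": 20}),
    (1, {"Pinus": 40}),
]
kept = drop_small_windows(aggregate_by_window(samples))
```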
RPP (relative to Poaceae, RPP = 1) is one of the most important input parameters required to run the REVEALS model [27]. We test the inclusion or exclusion of plant taxa with dominant entomophily and the effect of RPP values on the grid-based REVEALS estimates (Gb-RVest). The selection of RPP studies, RPP values, and calculation of mean RPP and their standard deviation (SD) for Europe (Table 1) are explained in Appendix B, Table A1. This paper uses three alternative RPP-means datasets (Table 1) to evaluate the effect of RPP selection on REVEALS results.
RPPs.st1 (31 taxa) is the European RPP-means dataset from [31]. It includes plant taxa from boreal, temperate, and Mediterranean Europe for the calculation of the RPP-mean values. In this selection, most entomophilous herbs are excluded, except the most common taxa with mixed wind and insect transport, such as Amaranthaceae/Chenopodiaceae, Filipendula, Rumex acetosa t., and Plantago lanceolata. Note that the entomophilous tree Tilia and partly entomophilous tree/shrub Salix are included.
RPPs.st2 (46 taxa) is a new synthesis proposed in this study, inclusive of a larger number of entomophilous taxa. It uses the same 31 taxa and mean values of RPP used in the first dataset [31] and 13 additional entomophilous taxa (some with mixed wind and insect transport), i.e., Empetrum, Acer, Sambucus nigra t., Fabaceae, Apiaceae, Compositae SF. Cichorioideae, Comp. Leucanthemum (Anthemis) t., Plantago media, Plantago montana, Ranunculus acris t., Potentilla t., Rubiaceae, and Trollius, as well as Populus and Urtica (mainly anemophilous). For these additional taxa, the mean was calculated using all available European RPP values (Appendix B, Table A1) based on standard 2 from [28]. We excluded values that were not significantly different from zero considering the lower bound of their SD (e.g., Empetrum, 0.07) and values assumed to be outliers or unreliable in the original publications. The RPPs.st2 is used to test the sensitivity of the REVEALS model to the use of pollen types from entomophilous plant taxa, although the model assumes that all pollen is transported by wind (see 2.1).
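The exclusion rule and the averaging can be sketched as below. Only the zero test (a value is dropped when its lower bound, value minus SD, does not exceed zero) is taken from the text. The way the SD of the mean is propagated here (square root of the summed variances divided by the number of values, assuming independent estimates) is an assumption and may differ from the rules of standard 2 in [28]; outlier removal is left to the caller. All numbers are hypothetical.

```python
import math


def mean_rpp(estimates):
    """Mean RPP and its SD from a list of (rpp, sd) pairs.

    Values whose lower bound (rpp - sd) is not above zero are excluded.
    Returns None when no value survives the filter.
    """
    kept = [(value, sd) for value, sd in estimates if value - sd > 0]
    if not kept:
        return None
    n = len(kept)
    mean = sum(value for value, _ in kept) / n
    # Assumed propagation rule: independent estimates, variances add.
    sd_mean = math.sqrt(sum(sd ** 2 for _, sd in kept)) / n
    return mean, sd_mean


# Hypothetical estimates for one taxon; the third has rpp - sd < 0 and is dropped.
result = mean_rpp([(2.0, 0.3), (4.0, 0.4), (0.07, 0.1)])
```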
Fall speed of pollen (FSP) values are listed in [31].

Modern Vegetation and Pollen Datasets
To perform an evaluation of the quality of the REVEALS results and the effect of RPP-means datasets (RPPs.st1, RPPs.st2, and RPPs.st3) on the grid-based REVEALS estimates (Gb-RVest), we compared the sum of REVEALS-based tree cover for the most recent decades (RV-Trees) with modern measurements of tree cover (GFC-Trees).
GFC-Trees was derived from the global forest change dataset [57]. This is based on the analysis of Landsat 7 Enhanced Thematic Mapper Plus data at a 30-m spatial resolution to characterize forest extent, loss, and gain from 2000 to 2012. We used tree cover data for the year 2000 that expresses tree cover (defined as vegetation taller than 5 m in height) as a percentage per output grid cell. All forms of natural forests or plantations across a range of canopy densities are considered. Broadleaved and coniferous trees are not differentiated. Original tree cover data are viewable and downloadable at full resolution at http://earthenginepartners.appspot.com/science-2013-global-forest.
RV-Trees values are based on the Gb-RVest results from the core top samples (−45 to −55 BP, i.e., 1995 AD to 2005 AD). Gb-RVest results for taxa in the summer green trees and evergreen tree groups (Table 1) have been summed. The number of core top records from large lakes is limited in Europe (N = 9), and thus we have included Gb-RVest based on multiple small-sized sites in our evaluation of RPP-means datasets. The total number of grid cells used is 111. This results in 36 grid cells with lakes and 75 grid cells with both lakes and bogs (of which 47 have only ≥1 bog(s)) of all radii. The modern pollen dataset therefore covers 20 European countries from the Mediterranean to the boreal vegetation zones, with a tree-cover gradient from 1% to 80% (Figure 2). The same pollen dataset was used for the comparison between raw pollen data (RW-data) and GFC-Trees for 31 taxa and 46 taxa.
Comparison of RV-Trees with GFC-Trees required a transformation of the spatial resolution of the modern tree cover. While the tree cover is available at 1 arc-second resolution (ca. 30 m), the REVEALS reconstructions were prepared for 1° grid cells. We aggregated the tree-cover data to the REVEALS grid by averaging the tree-cover percentages in each REVEALS grid cell (Figure 2).
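The aggregation step amounts to averaging all tree-cover values that fall inside each 1° × 1° cell. The sketch below uses an unweighted mean and identifies each cell by the integer latitude and longitude of its south-west corner; both choices, the input layout, and the function name are assumptions, and no-data handling is omitted.

```python
import math
from collections import defaultdict


def aggregate_to_one_degree(pixels):
    """Average tree cover (%) per 1-degree grid cell.

    pixels is an iterable of (lat, lon, tree_cover_percent) tuples.
    Returns {(lat_floor, lon_floor): mean_tree_cover_percent}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, pixel count]
    for lat, lon, cover in pixels:
        cell = (math.floor(lat), math.floor(lon))
        sums[cell][0] += cover
        sums[cell][1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}


# Hypothetical pixels: two in the cell (45, 3), one in the cell (46, -1).
pixels = [(45.2, 3.1, 80.0), (45.9, 3.8, 40.0), (46.1, -0.5, 10.0)]
gridded = aggregate_to_one_degree(pixels)
```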

REVEALS Run and Data Analysis
The REVEALS function within the LRA R-package [58] was used to produce grid-based REVEALS estimates (Gb-RVest). In this study, we selected the Gaussian Plume model to describe pollen dispersal as selected RPP values are derived from the GPM model. Depending on the type of site, the REVEALS function used a different deposition model, Sugita's model for lakes and ponds [40] or Prentice's model for bogs and mires [43,44]. Pollen records from all sites, regardless of their size, are used to maximize the number of pollen records within each 1° × 1° grid cell across the studied area. For the grid cells that include pollen data from both lakes and bogs, we apply REVEALS separately for the lake and bog data and then combine results to produce a single mean Gb-RVest and its standard error (SE) for each taxon.
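The text does not spell out how the lake-based and bog-based results are combined, so the following is only one plausible reading: an unweighted mean of the two estimates, with the SE propagated as for the mean of two independent quantities. Both choices are assumptions and may not match what the LRA package or the authors' scripts actually do. The function name and numbers are hypothetical.

```python
import math


def combine_lake_bog(lake, bog):
    """Combine two (estimate, se) pairs for one taxon into a single pair.

    Assumes an unweighted mean and independent errors.
    """
    mean = (lake[0] + bog[0]) / 2.0
    se = math.sqrt(lake[1] ** 2 + bog[1] ** 2) / 2.0
    return mean, se


# Hypothetical cover proportions for one taxon in one grid cell.
combined = combine_lake_bog(lake=(0.30, 0.03), bog=(0.20, 0.04))
```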
When running REVEALS, neutral atmospheric conditions and wind speed of 3 m s⁻¹ as in [27,28,31,33] are assumed. Zmax, the maximum spatial extent of the regional vegetation from the centre of the site, is set to 50 km, roughly corresponding to a 1° × 1° grid cell [28,31,33].
REVEALS results are extracted by time window, producing 25 matrices of mean Gb-RVest and 25 matrices of corresponding mean SEs for each of the RPP taxa and each grid cell. As three RPP-means datasets are tested, three REVEALS results are produced per time window. The taxon-based Gb-RVest are then grouped into land-cover types (LCTs, Table 1), hereafter named Gb-RVest-LCTs. For the modern time window (−45 to −55 BP), as the RV-Trees do not separate the contributions of evergreen and summer green species, the sum of the two Gb-RVest-LCTs was calculated for comparison with GFC-Trees (Table 1). The SEs of each Gb-RVest-LCT and overall tree cover were calculated using the delta method [59].
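Grouping taxa into land-cover types is a sum of cover estimates, and for a sum the delta method gives a variance equal to the sum of the taxon variances plus twice the sum of their covariances. The sketch below sets the covariances to zero, which is a simplifying assumption and not necessarily what was done with [59]. The taxon-to-LCT mapping shown is illustrative and does not reproduce Table 1.

```python
import math


def sum_by_lct(estimates, lct_of):
    """Sum taxon cover per land-cover type and propagate the SE.

    estimates  {taxon: (cover, se)}
    lct_of     {taxon: land-cover type}
    Covariances between taxa are ignored (assumption).
    """
    acc = {}
    for taxon, (cover, se) in estimates.items():
        lct = lct_of[taxon]
        total, variance = acc.get(lct, (0.0, 0.0))
        acc[lct] = (total + cover, variance + se ** 2)
    return {lct: (total, math.sqrt(variance)) for lct, (total, variance) in acc.items()}


# Hypothetical estimates and an illustrative mapping.
estimates = {"Pinus": (0.30, 0.03), "Picea": (0.10, 0.04), "Betula": (0.20, 0.02)}
lct_of = {"Pinus": "evergreen trees", "Picea": "evergreen trees", "Betula": "summer-green trees"}
lcts = sum_by_lct(estimates, lct_of)
```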
We use major axis (MA) regression (see Appendix C) [60,61] to explore the bivariate relationships between pairs of variables, or data series: among different RPPs.sts over the Holocene (RPPs.st2 vs. RPPs.st1 and RPPs.st3 vs. RPPs.st1) and between modern vegetation and REVEALS results.
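For readers unfamiliar with MA regression, the slope can be obtained in closed form from the sample variances and covariance of the two series: b = (s_yy − s_xx + sqrt((s_yy − s_xx)² + 4 s_xy²)) / (2 s_xy), with the line passing through the means. The sketch below implements this standard formula; it is not the authors' code (their analysis is described in Appendix C), and the function name is hypothetical.

```python
import math


def major_axis(x, y):
    """Major axis regression of y on x; returns (slope, intercept).

    Unlike ordinary least squares, MA treats both variables as subject
    to error and minimizes perpendicular distances to the line.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("zero covariance: the MA slope is undefined")
    slope = (s_yy - s_xx + math.sqrt((s_yy - s_xx) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lying exactly on y = 2x + 1 recover that line.
slope, intercept = major_axis([0, 1, 2, 3], [1, 3, 5, 7])
```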
Further, pairs of Gb-RVest-LCTs (RPPs.st1 vs. RPPs.st2, RPPs.st1 vs. RPPs.st3) for all time windows together were compared by calculating the difference between the values in each grid cell and mapping the negative and positive values across Europe.
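The per-cell comparison can be sketched as a difference between two gridded results followed by a split on sign. The direction of the subtraction (first dataset minus second) and the dictionary layout keyed by grid cell are assumptions, since the text does not state them; the values are hypothetical.

```python
def grid_differences(first, second):
    """Difference per grid cell, split into positive and negative values.

    first, second  {grid_cell: cover value} for the same land-cover type
    Only cells present in both inputs are compared; zero differences
    appear in neither output.
    """
    common = first.keys() & second.keys()
    diff = {cell: first[cell] - second[cell] for cell in common}
    positive = {cell: d for cell, d in diff.items() if d > 0}
    negative = {cell: d for cell, d in diff.items() if d < 0}
    return positive, negative


# Hypothetical cover values (%) for three grid cells.
st1 = {(45, 3): 60.0, (46, 3): 20.0, (47, 3): 35.0}
st2 = {(45, 3): 50.0, (46, 3): 30.0, (48, 3): 10.0}
positive, negative = grid_differences(st1, st2)
```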

Effect of Type and Number of Pollen Taxa in RPP-Means Datasets on Gb-RVest-LCTs
In this section, we evaluate how Gb-RVest-LCTs, using RPPs.st1 as a reference, compare to RPPs.st2 and RPPs.st3 (Figure 3) and formulate hypotheses on the nature of these relationships and how they vary across RPP-means datasets.
The strongest association in the MA regression results is between Gb-RVest.

Geographical Pattern of the Gb-RVest-LCTs Differences between RPP-Means Datasets
Differences between pairs of Gb-RVest-LCTs were mapped using the positive and negative results of the differences (i.e., diff.A, Figure 4a,b; diff.B, Figure 5a,b). Furthermore, the negative and positive values of the differences are explained by maximum values of the most representative taxa over all time windows (Figure 4c-f; Figure 5c,d). These taxa influence the over- or under-representation of LCT in the analyses.

REVEALS Validation for All Europe
Validation was undertaken on two groups of sites (lakes only: RV-Trees.L.st1; lakes plus bogs: RV-Trees.LnB.st1) to test (a) which RPPs.sts to use in the REVEALS model in order to have robust reconstructions on a wide scale, and (b) whether the inclusion of bog sites influences the goodness of fit between REVEALS model results and modern vegetation.
The best-fit relationship in both MA regression analyses between GFC-Trees and RV-Trees.L.sts/RV-Trees.LnB.sts shows that trees are over-represented in the RV results (Figures 6 and 7a-c). The residuals (Figures 6 and 7f-h) are normally distributed across the RV-Trees gradient.
RW-data (for 31 and 46 taxa, for lakes, and lakes and bogs) was used instead of RVest, to test whether the use of raw pollen-Trees rather than RV-Trees better represented the actual forest cover (GFC-Trees) (Figures 6 and 7d,e). The results show that raw pollen-Trees have a worse association with GFC-Trees than RV-Trees. The regression between raw pollen-Trees (31 and 46 taxa) for lakes and bogs and GFC-Trees only has a weak association (R² = 0.082) (Figure 6d,e), and for the lakes-only group, an even lower R² (0.073) (Figure 7d,e). The residuals in both cases are not normally distributed along the regression line (Figures 6 and 7i,j).

Discussion
Our discussion focuses on (i) proposing a robust RPP-means dataset, through validation, for a reliable representation of vegetation for the last 11,700 years BP at a European scale, (ii) the influence of different RPP-means datasets on Gb-RVest as a test of the sensitivity of the REVEALS approach, (iii) the evaluation of some challenging taxa, and (iv) the importance of the number of pollen records for high-quality Gb-RVest to capture transient vegetation change at a sub-millennial time scale through the Holocene.

New Insight after Validation
Testing the reliability of REVEALS-based reconstructions relies on comparison with different datasets, such as remote sensing data, and here we have used the global forest change dataset (GFC). Neither RVest nor GFC provides a completely accurate reflection of the "actual" vegetation. Both are subject to a number of potential sources of errors, as already observed in [62] in the correspondence between CORINE [63] and pollen-based land-cover classes. This study shares with [62] some of the challenges of comparing remotely sensed and pollen-inferred land cover, which might have influenced the validation as a whole. These include: (1) georeferencing inaccuracies: misplaced pollen site locations can affect both GFC (by extracting the wrong forest cover data for sites) and RVest (by placing sites in the wrong grid cells); (2) misclassification of land cover, as remote sensing techniques make it difficult to differentiate between land-cover types (e.g., the determination of different forest types) and not all land-cover types are detectable via remote sensing; (3) the normalization factor applied to both RVest and GFC to make the datasets comparable leads to loss of some details. However, our comparison differs from that of [62] for several reasons, which are: (1) the approach used to transform pollen data into records of land-cover change, as we have been able to compare quantified values in both RVest and GFC, rather than compare classification results (via biomization techniques used in [62]); (2) the modern vegetation used; (3) the use of modern surface pollen samples.
The global forest change dataset is >80% accurate [57]. It is the most widely used forest cover product for global and regional analyses due to its high resolution (30 m), standardized classes, yearly updates, and convenient and cost-free use [64]. It also has the advantage of being a time series of changes in tree cover [65]. Nevertheless, a number of sources of error are specific to GFC. In [65,66], accuracy issues for this dataset are reported. GFC is less accurate in mountainous regions due to a combination of intense cloudiness and topographic shadowing [67]. GFC is also unable to distinguish between natural forest cover and agricultural tree crops [66]. Lower accuracy is also reported in regions with sparse and variable tree cover due to the background signal or seasonal variability in phenology or cloud cover. Another limiting factor can be ascribed to the rescaling process that limits the ability of the dataset to capture tree canopy cover values on the ground at 30 m × 30 m as it is derived from a coarser-resolution product [64].
In this study, the number of available modern pollen datasets with core-top samples was limited. For the sake of complete Europe-wide validation, as many sites (191) as possible were considered, including small lakes and bogs. A total of 41% of the grid cells (111 out of 539) had core-top samples useful for comparison. Our comparison revealed that RPPs.st1 is the most suitable to represent modern vegetation in Europe, whether bogs and lakes or only lakes are used, as suggested by [31]. The REVEALS model improved the accuracy of vegetation reconstruction significantly over the pollen proportions alone. Both RV-Trees and uncorrected pollen-Trees over-represented forest cover compared to GFC-Trees; however, the best match was found between RV-Trees and GFC-Trees. Trees in GFC are defined as plants taller than 5 m. Some common European trees begin to produce and disperse pollen even before reaching a height of 5 m (e.g., Betula and Alnus), particularly in regions where tree growth might be more stunted or where there are many shrubby trees [68][69][70]. This may explain a greater representation of trees in RVest than in GFC. On the other hand, a young Pinus woodland may not produce substantial volumes of pollen but will appear in the remote sensing dataset as a forest. Thus, it is important to bear in mind the characteristics of the plant taxa and take into consideration their flowering age (i.e., the number of years a particular plant taxon needs before it produces a significant amount of pollen), location in the landscape (within or outside a woodland), or location within a woodland (with flowering parts below or within the upper woodland canopy) in order to better interpret the pollen-based reconstructions of plant cover. This validation not only identifies the most suitable RPP-means dataset so far that can be used at the European scale for the Holocene but also highlights the complexity of land cover, whose different sets of conditions, history, and dynamics are difficult to interpret from pixelated data. Each set of land-cover maps has its own limitations and biases, which should not overshadow the value of these products.
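The comparison between GFC tree cover and the pollen-based estimates relies on major axis regression (Figures 6 and 7). As a minimal sketch of that technique, assuming paired per-grid-cell percentages, the slope and intercept can be obtained from the sample covariance matrix. The function name `major_axis_fit`, the variables `gfc` and `rv`, and all numbers are hypothetical and chosen for illustration; this is not the code used in the study.

```python
import numpy as np

def major_axis_fit(x, y):
    """Return (slope, intercept) of the major axis (orthogonal) regression line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y)  # 2 x 2 sample covariance matrix
    s_xx, s_yy, s_xy = cov[0, 0], cov[1, 1], cov[0, 1]
    if s_xy == 0:
        raise ValueError("x and y are uncorrelated; the major axis slope is undefined")
    diff = s_yy - s_xx
    # Slope of the first principal axis of the (x, y) point cloud
    slope = (diff + np.sqrt(diff ** 2 + 4.0 * s_xy ** 2)) / (2.0 * s_xy)
    # The major axis passes through the centroid of the data
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Hypothetical tree cover (%) per grid cell: remote sensing vs. pollen-based estimate
gfc = [10.0, 25.0, 40.0, 55.0, 70.0]
rv = [18.0, 30.0, 52.0, 60.0, 81.0]
slope, intercept = major_axis_fit(gfc, rv)
# Vertical residuals from the fitted line
residuals = np.asarray(rv) - (slope * np.asarray(gfc) + intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Unlike ordinary least squares, the major axis treats both variables as subject to error, which suits a comparison in which neither the remote sensing product nor the pollen-based estimate is an error-free reference.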

Influence of RPPs.sts on REVEALS Model Sensitivity and Pattern of Difference at the Spatial Level
The second aim of this paper was to explore how different RPP-means datasets impact the model output. It has previously been shown that RVest is strongly influenced by the choice of RPPs [39,71,72]. The RPP values for a given taxon may, in some cases, differ between study areas, although it is less clear whether the differences are related to environmental factors (e.g., climate, soils, land-use practices) or to field methodologies in pollen sampling and vegetation survey [35,73,74]. The solution to variable RPPs between studies has been the calculation of mean RPP values that are applied within single studies (e.g., [28,31,33]), and these have facilitated comparison between studies. We explored in our analysis the impact that different RPP-means datasets may have on the results, with a particular focus on the inclusion of entomophilous taxa and on taxa whose values differ greatly depending on the environment from which they derive. This is the case in particular for Ericaceae (see Appendix B, Table A1), which, to date, has only two extreme values available: 0.07 from central Sweden [75] and 4.265 from Mediterranean France [31]. All values are relative to Poaceae.
The results of our analysis show that Gb-RVest is more sensitive to changes in RPP values (when the comparison sets have the same number of RPP taxa) than to the addition of taxa. The addition of a larger number of entomophilous taxa did not significantly impact the overall results, despite REVEALS assuming that the major agent of pollen transport is wind [27]. The main differences in Gb-RVest are found between runs using the RPPs.st1 and RPPs.st3 datasets (Figure 3), which are caused by differences in RPP values for 19 taxa (see Table 1). Experimentation using different RPPs.sts, and careful comparison to modern forest cover data, has enabled us to test different values for the same taxa by observing which taxa are responsible for the over- or under-representation of the Gb-RVest-LCTs. We have shown in our experimentation that uncertainty in RPPs for two challenging taxa (Ericaceae and Empetrum) has the greatest impact on our Gb-RVest. This is mainly due to factors that influence the RPP values.

Challenging Taxa: Ericaceae and Empetrum
The pollen morphotype Ericaceae comprises a wide range of species with highly varied growth forms. Species growing in the Mediterranean area, such as Arbutus unedo and Erica arborea [76][77][78][79], can grow as shrubs up to 5 m in height and have a large number of inflorescences [79,80]. Low-growth species are characteristic of central and northern Europe, e.g., Andromeda polifolia, Erica cinerea, and Vaccinium spp. [81,82]. The observed variability in RPP values (4.265 in the Mediterranean, used in RPPs.st1, and 0.07 in central Sweden, used in RPPs.st3) is most likely a result of both growth form and the number of inflorescences. The use of the higher value (RPPs.st1) led to the under-representation of Ericaceae in Gb-RVest in central northern Europe (Figure 4). The lower value (RPPs.st3) resulted in an overrepresentation of Ericaceae in Gb-RVest in the Mediterranean region (Figure 5). In the case of Ericaceae, we might employ two different values in different regions of Europe; however, without independent climate data it is not possible to use several different values in a single reconstruction, because the extent of the Mediterranean biome is likely to have shifted during the Holocene [31].
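To illustrate why a single RPP value for Ericaceae pulls the estimates in opposite directions, the sketch below applies only the productivity correction at the heart of REVEALS (pollen count divided by RPP, then renormalised). It deliberately assumes identical dispersal and deposition for all taxa, which the full model does not, and the pollen counts are invented. The two Ericaceae RPPs (4.265 and 0.07) and the Poaceae reference value of 1 are those discussed above; the function name is hypothetical.

```python
def productivity_corrected_cover(counts, rpp):
    """Convert pollen counts to cover fractions by dividing by RPP and renormalising.

    Simplification: the dispersal-deposition term of REVEALS is treated as
    identical for all taxa, so this is not a full REVEALS run.
    """
    weighted = {taxon: counts[taxon] / rpp[taxon] for taxon in counts}
    total = sum(weighted.values())
    return {taxon: value / total for taxon, value in weighted.items()}

counts = {"Poaceae": 500, "Ericaceae": 500}  # hypothetical pollen counts

# Higher Ericaceae RPP (Mediterranean France) vs. lower RPP (central Sweden)
cover_high_rpp = productivity_corrected_cover(counts, {"Poaceae": 1.0, "Ericaceae": 4.265})
cover_low_rpp = productivity_corrected_cover(counts, {"Poaceae": 1.0, "Ericaceae": 0.07})

print(round(cover_high_rpp["Ericaceae"], 2), round(cover_low_rpp["Ericaceae"], 2))
```

With equal counts of the two taxa, the estimated Ericaceae share is about 0.19 with the higher RPP and about 0.93 with the lower one, which mirrors the direction of the under- and overrepresentation described above.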
The overrepresentation of Empetrum in Gb-RVest-ET.st2 in northern Europe, mainly in the British Isles, has causes similar to those for Ericaceae. The abundance of Empetrum in some grid cells may reflect the habitat of the species; the basin type (lake or bog) may also play a role. Empetrum is generally found in regions with high rainfall, at low altitudes in northwest England and at sea level in western Ireland, as shown by our results. Empetrum is most characteristic of ombrogenous bogs but is also present in some open pine and birch woodland [83]. As a result, we are more likely to reconstruct greater land cover of Empetrum when bogs are used rather than lakes, and the same is probably true for Ericaceae.
Besides the different basin types, there are inherent characteristics of Ericaceae and Empetrum that may amplify the importance of heathland pollen taxa. The morphology of Ericaceae flowers (e.g., exserted stamens), as in Calluna vulgaris, Erica umbellata, E. vagans, and E. erigena, can promote anemophilous pollination and, therefore, a wider pattern of pollen dispersal. The buoyancy and hydrodynamic characteristics of the pollen shape of Ericaceae (i.e., tetragonal tetrads) may intensify transport by water (e.g., streams or surface runoff), with subsequent accumulation at the margin of the water body (in our case, lakes) [84]. The combination of these factors can influence the relative abundance of Ericaceae pollen in sediments.

Importance of the Number of Pollen Records in Europe: Data Reliability
The reliability (quality) of grid-based REVEALS estimates across the Holocene depends on three key elements: the number of pollen records used and their distribution in each grid cell; the type and size of pollen records; and the variation of these factors across time windows [31,33]. The REVEALS model was developed to reconstruct regional vegetation abundance using pollen data from large lakes (>100-500 ha) [27]. Studies using pollen records from the Czech Republic [28], Britain and Ireland [29], and southern Sweden [30] have shown that REVEALS estimates based on pollen records from 2 to 3 small sites (<50 ha) are similar to REVEALS estimates based on pollen records from large lakes. The minimum number of small sites required to obtain reliable outcomes is difficult to define [33]. In this study, we used the "protocol of reliability" proposed earlier [31], considering as "reliable" the grid cells with one large lake or at least three small sites. Grid cells with fewer than three small sites (lake or bog), or with a large bog that violates the assumptions of the model (i.e., no pollen-bearing plants grow on the sedimentary basin), were considered "unreliable" (Figure 1). The availability of pollen records from large sites in Europe remains limited, which means that the multiple small sites approach [27] had to be implemented to obtain a larger spatial density of REVEALS estimates across Europe.
Through time, the reliability of an individual grid cell may change, as not all pollen sequences cover the whole Holocene. In this dataset, 186 out of 539 grid cells are sufficiently reliable (Figure 1) because at least 50% of the time windows are based on large lakes or more than 3 small sites. A total of 62 grid cells are partly reliable, as fewer than 50% of the time windows are based on reliable sites or groups of sites. A total of 291 grid cells are unreliable. Caution should be applied when using REVEALS estimates from unreliable grid cells. These values may still represent regional vegetation if the vegetation in the grid cell was homogeneous in the past, but if the vegetation was heterogeneous, Gb-RVest from pollen sites that represent local vegetation cover are unlikely to wholly reflect regional vegetation.
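The grid-cell classification described above can be sketched as a small rule set. The sketch below is an interpretation, not the authors' implementation: the names `WindowSites`, `window_is_reliable`, and `classify_cell` are hypothetical, the small-site threshold is exposed as a parameter because it is taken from the "at least three small sites" wording of the protocol, and a cell with no reliable time window at all is assumed to be the "unreliable" case.

```python
from dataclasses import dataclass

@dataclass
class WindowSites:
    """Pollen records available in one grid cell for one time window."""
    large_lakes: int
    small_sites: int  # small lakes and small bogs

def window_is_reliable(window, min_small_sites=3):
    # Reliable if there is a large lake or enough small sites (assumed threshold)
    return window.large_lakes >= 1 or window.small_sites >= min_small_sites

def classify_cell(windows, min_small_sites=3):
    """Classify a grid cell from the share of its time windows that are reliable."""
    if not windows:
        return "unreliable"
    n_reliable = sum(window_is_reliable(w, min_small_sites) for w in windows)
    share = n_reliable / len(windows)
    if share >= 0.5:
        return "reliable"
    if share > 0:
        return "partly reliable"
    return "unreliable"

# Hypothetical grid cell with four time windows
cell = [WindowSites(1, 0), WindowSites(0, 3), WindowSites(0, 1), WindowSites(0, 2)]
print(classify_cell(cell))
```

Keeping the per-window rule separate from the per-cell rule makes it easy to rerun the classification with a different small-site threshold and see how many grid cells change category.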
The precision of Gb-RVest is indicated by their SEs. Increasing the size of the pollen count for a time window results in RVest with a smaller SE [27,28,31,33]. The 500-year-long time windows (except for the three most recent ones) help to maximize the size of the pollen count for each time window. Caution should be applied when using Gb-RVest whose SEs are equal to or greater than the RVest themselves [31].
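In practice, this caution can be applied as a simple screening step before the estimates are used. The sketch below assumes that each estimate is stored with its SE as a `(mean, se)` pair; the function name and the values are hypothetical.

```python
def flag_uncertain(estimates):
    """Return taxa whose standard error is equal to or greater than the estimate."""
    return [taxon for taxon, (mean, se) in estimates.items() if se >= mean]

# Hypothetical REVEALS estimates (% cover, SE) for one grid cell and time window
window = {"Pinus": (32.0, 4.1), "Ericaceae": (1.2, 1.5), "Poaceae": (20.5, 3.0)}
print(flag_uncertain(window))
```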
The results presented here are based on 1607 pollen sequences, 40% more than in [31], which has greatly improved the availability of reliable Gb-RVest, particularly in southern Europe. Future work should build on this effort by using more pollen sequences, both to improve the reliability of values (more sites in each grid cell) and to cover regions with unreliable grid cells (Figure 1) or where open-source or well-dated sequences are currently lacking. These regions include the Balkan Peninsula, northern Scandinavia, and eastern and southern Europe.

Conclusions
This paper describes how a different selection of input parameters (three RPP-means datasets, RPPs.sts) affects grid-based REVEALS estimates (Gb-RVest) across the Holocene at a pan-European scale. Using major axis regression, we have shown that the choice of RPP values can result in significant differences in Gb-RVest. RPPs.sts were validated for the first time on a European scale. We have shown that REVEALS performed better when RPPs.st1 was used. This RPP set excluded entomophilous taxa but included those with mixed dispersal mechanisms. Thus, the addition of a larger number of entomophilous taxa does not significantly improve the overall result, and it is more important to obtain reliable RPP values for taxa than to broaden the number of taxa used. The validation process confirmed earlier studies that have demonstrated that the REVEALS model improves the accuracy of vegetation reconstruction (RVest) significantly over the pollen proportions (raw pollen data) alone.
This study points out the complexity of the variables acting on Gb-RVest. In particular, RPP values, intrinsic plant characteristics (e.g., entomophily/anemophily, flower and pollen morphology), and the type of site where pollen grains were sampled (lakes or peat bogs) can influence quantified vegetation reconstructions. The larger number of pollen records used in this study across Europe and through the Holocene has increased the quality and accuracy of vegetation estimates at both the spatial and temporal levels.
This emphasizes the importance of all inputs used in the model and is intended to foster consideration of the numerous factors that act on pollen production, dispersal, and deposition when interpreting the estimated results. It thus encourages new studies to improve RPPs and pollen records in Europe in order to make the reconstructions increasingly accurate.
The great improvement in the accuracy and spatial coverage of REVEALS-based reconstructions enables better and more detailed use of these results. Within the Terranova Project, examples of use are the exploration of spatial-temporal changes in past land cover and biodiversity over long time periods at a European scale, and the evaluation of model-simulated vegetation cover from a dynamic vegetation model (CARAIB [85]) and from agent-based models (ABM [86]).
lake sediments (Sugita's lake model [40]), except for the study by [95], where RPP values were calculated using the Lagrangian stochastic model.
Even though the REVEALS model assumes that RPP values are constant within the region of interest and through time [27], it has been suggested that RPPs may vary between regions, with the variation caused by environmental variability (climate), vegetation structure, or methodological design differences [28,36,71,102].
In the case of multiple RPP values for one taxon in Europe, the mean was calculated to even out within-taxon variability. In the synthesis, we sought to select and calculate mean values coming from boreal, temperate, and Mediterranean Europe without separating the datasets on the basis of region. Separating them is not straightforward because the borders of these regions shifted over the Holocene [31].
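As a minimal sketch of how a dataset-specific mean could be computed, the code below pairs each RPP value with the set of datasets from which it is excluded, following the superscript convention of Table A1. The Ericaceae exclusions are inferred from the statement that 4.265 is used in RPPs.st1 and 0.07 in RPPs.st3; the second taxon, its values, and the function name are hypothetical. How the SDs of the means were derived is not reproduced here.

```python
from statistics import mean

def dataset_mean(values, dataset):
    """Mean of the RPP values that are not excluded from the given dataset."""
    kept = [rpp for rpp, excluded_from in values if dataset not in excluded_from]
    if not kept:
        raise ValueError(f"no RPP value available for dataset {dataset!r}")
    return mean(kept)

# (RPP value, datasets from which the value is excluded)
ericaceae = [(0.07, {"st1"}), (4.265, {"st3"})]  # exclusions inferred, see lead-in
example_taxon = [(2.0, set()), (4.0, set()), (6.0, {"st2"})]  # hypothetical values

print(dataset_mean(ericaceae, "st1"), dataset_mean(ericaceae, "st3"))
print(dataset_mean(example_taxon, "st1"), dataset_mean(example_taxon, "st2"))
```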
Table A1. Available RPP values for the selected taxa used to calculate the RPPs-means and their SDs for the 31 and 46 taxa in the datasets (i.e., RPPs.st1, RPPs.st2, RPPs.st3). RPPs (with their standard deviations, SDs) of 26 tree pollen taxa and 23 herbs and small shrubs in 17 study areas. The type of surface samples and the ERV submodel used to calculate the original RPPs are indicated. The superscript numbers 1, 2, 3 indicate the values that were not included in RPPs.st1, RPPs.st2, and RPPs.st3, respectively. If there is no superscript number, the value is included in all three datasets. Poaceae was selected as the reference taxon. * RPPs from Germany [68], reference taxon Pinus. RPPs converted to Poaceae as reference taxon. The RPP estimates selected in this case were obtained with a vegetation dataset including only the trees that had reached their flowering age. ** RPPs from Germany [95]; in the original publication, the ERV analysis was performed with the Lagrangian stochastic model (LSM) for the dispersal of pollen and with Pinus as a reference taxon. In Githumbi et al. (2022), Martin Theuerkauf redid the analysis with the Gaussian Plume model for the dispersal of pollen [45,90] and with Poaceae as a reference taxon.
Poaceae (reference taxon) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00) 1.00 (0.00)
Herb taxa
Amaranthaceae/Chenopodiaceae (mainly Amaranthus retroflexus and Chenopodium album) 4.28 (0.27)
Apiaceae 0.26 (0.009)

Figure 1. Study area showing grid coverage with available REVEALS-based reconstructions of land cover. Grid-cell reliability depends on the number and type of pollen records for the 25 time windows (TWs). Reliable: ≥1 large lake(s), ≥2 small lake(s) and/or small bog(s), mix of ≥1 large lake(s) and ≥1 small lake(s) and/or small bog(s); less reliable: 1 bog (large or small) or 1 small lake. Grey grid cells: less reliable results for all TWs. Colours indicate, for each grid cell, the percentage of the total number of TWs with reliable REVEALS reconstructions of plant cover. For instance, light yellow grid cells imply that reconstructions are reliable for 8-21% of the TWs, while they are less reliable for 79-92% of the TWs; and dark green grid cells indicate that reconstructions are reliable for 98-100% of the TWs, while they are less reliable for 0-2% of the TWs.

Figure 2. Grid cells with sites (bogs and lakes) used for validation and tree cover at 2000 AD according to the global forest change dataset [57] in mean percentage cover of the grid cell (1° × 1°). For blue and red grid cells, see the legend. See Methods for the definition of "reliable number of sites" (i.e., implying reliable REVEALS estimates of plant cover). Red grid cells represent less reliable REVEALS reconstructions. Note that the information on the reliability of results in this figure is valid for the time window -45 to -55 (1995-2005 CE) only.

Figure 4. Geolocalisation of the diff.A (see Section 3.2) between grid-based REVEALS estimates (Gb-RVest) for Land-cover types using RPPs.st1 and RPPs.st3, shown as negative values of diff.A for evergreen trees (ET) in panel (a) and positive values of diff.A for open land (OL) in panel (b).Panel (c) shows the maximum values of Ericaceae using RPPs.st3.Panels (d-f) illustrate the maximum

Figure 5. Geolocalisation of the diff.B (see Section 3.2) between grid-based REVEALS estimates (Gb-RVest) for land-cover types using RPPs.st1 and those using RPPs.st2, shown as negative values of diff.B for evergreen trees (ET) in panel (a) and positive values of diff.B for open land (OL) in panel (b). Panel (c) shows the maximum values of Empetrum using RPPs.st2. Panel (d) illustrates the maximum values of Calluna vulgaris using RPPs.st1. For the scale range, see the caption of Figure 4.

Figure 6. Major axis regression, from left to right, between global forest change trees and REVEALS estimates for tree cover (lakes and bogs) using RPPs.st1, RPPs.st2, and RPPs.st3 (panels a-c), with the corresponding residuals below (panels f-h), and between global forest change trees and raw pollen data for 31 taxa and 46 taxa (panels d,e), with the corresponding residuals below (panels i,j). Black dots correspond to the grid-cell values used, the dark line is the 1:1 relationship, and the red line is the best-fitted relationship.

Figure 7. Major axis regression, from left to right, between global forest change trees and REVEALS estimates for tree cover (lakes only) using RPPs.st1, RPPs.st2, and RPPs.st3 (panels a-c), with the corresponding residuals below (panels f-h), and between global forest change trees and raw pollen data for 31 taxa and 46 taxa (panels d,e), with the corresponding residuals below (panels i,j). Black dots correspond to the grid-cell values used, the dark line is the 1:1 relationship, and the red line is the best-fitted relationship.

Table 1. Land-cover types (LCTs) and their corresponding pollen morphological types. Fall speed of pollen (FSP) and the mean relative pollen productivity estimates (RPPs) for three different RPP-means datasets (RPPs.sts), with their standard deviations (SDs) in brackets (see text for more explanations). We highlighted the values that remain fixed across the RPPs.sts in green, the additional values considered in RPPs.st3 in blue, and the 15 additional RPP values (mostly entomophilous taxa) in RPPs.st2 in orange. For more information on species involved in the calculation of original RPP values, see Appendix B, Table A1.