Enough Is Enough? Searching for the Optimal Sample Size to Monitor European Habitats: A Case Study from Coastal Sand Dunes

: A robust survey method that samples the main characteristics of plant assemblages is needed to assess the conservation status of European habitat in the Natura 2000 network. A measure of variability, called pseudo-multivariate dissimilarity-based standard error (MultSE), was recently proposed for assessing sample-size adequacy in ecological communities. Here, we used it on coastal sand dune systems in three Special Areas of Conservation (SACs) in Tuscany. Our aim was to assess the minimum number of replicates necessary to adequately characterize sand dune environments in terms of di ﬀ erences between habitats and SACs, after a preliminary baseline assessment of plant diversity. Analysis of α and β diversity indicated that especially between habitats the three SACs protect di ﬀ erent plant communities. The study of the MultSE proﬁles showed that the minimum number of replicates needed to assess di ﬀ erences among habitats varied between 10 and 25 plots. Two-way PERMANOVA and SIMPER analysis on the full and reduced datasets conﬁrmed that SACs and habitats host di ﬀ erent plant communities, and that the contribution of the target species remained unchanged even with a reduced sample size. The proposed methodological approach can be used to develop cost-e ﬀ ective monitoring programs and it can be useful for plant ecologists and biodiversity managers for assessing ecosystem health and changes. habitat, SAC and their interaction. Breaking points of MultSE (i.e., where the linear relation changes) were estimated by segmented relationships.


Introduction
The Habitats Directive (92/43/EEC) obliges Member States of the European Union (EU) to monitor the conservation status of habitats and species listed in the Directive Annexes, and to report the results every 6 years [1,2]. In the Natura 2000 network, it is therefore essential to assess the actual distribution, natural variation and information on the quality of habitats in each site, and at the same time to provide solid data useful for objectively and quantitatively evaluating changes due to any conservation and/or restoration activities [1]. Nevertheless, no robust exhaustive method is available to detect the main characteristics of plant assemblages (presence and abundance) and to monitor habitat health along with habitat changes and conservation status. Probabilistic samplings and representativeness Diversity 2020, 12, 138 2 of 16 assessment have not yet been used in monitoring schemes of European habitats [3], although some attempts based on species accumulation curves have been made [4]. Monitoring schemes have been established for many different purposes, and three aspects, namely sampling design, sample size and type of statistical analysis, are regarded as generally relevant in determining the scientific quality of the information derived from biodiversity monitoring [5]. Efficient sampling design is essential for accuracy, i.e., correspondence between real and measured biodiversity trends [6]. The sample size, namely the number of measurements made, is central for data precision (i.e., the ability to measure the same value under identical conditions). Finally, appropriate statistical analysis is needed to translate the data collected into useful information with relative uncertainty, which also depends on sampling design [7]. In plant ecology, when comparing attributes of plant communities in space or in time, it is fundamental to estimate how adequate a sample is for capturing the species diversity, taxonomic composition and relative abundance of the entire survey population, avoiding bias and dependency on sample size [8]. Sampling effort can influence the possibility of differentiating ecological communities [8][9][10][11], to the detriment of monitoring for community conservation and restoration purposes. In relation to habitat comparisons, these are usually evaluated through multivariate differences in the composition of plant communities (e.g., Anderson and Santana-Garcon [11], Tordoni et al. [12]), classifying assemblages and inferring species-environment relationships [8]. Anderson and Santana-Garcon [11] recently proposed a measure of precision for dissimilarity-based multivariate analysis of ecological communities called pseudo multivariate dissimilarity-based standard error (MultSE) for assessing sample-size adequacy within ecological communities. This statistic, which is the multivariate analog of the univariate standard error, measures the variability in the position of the sample centroid under repeated sampling for a given sample size in the space of a chosen dissimilarity matrix [11].
Here we apply this measure of multivariate precision in the context of habitat monitoring, aiming to simulate the effect of sampling size reduction on the discrimination power of statistical analysis based on multivariate (species composition) characteristics of biodiversity useful for discriminating habitats.
As a working example, we focused on a recent monitoring of protected coastal sand dune habitats in Special Areas of Conservation (SACs) of the Natura 2000 network in Tuscany, central Italy. Coastal sand dunes are usually characterized by marked vegetation zonation; the different zones often host rare or exclusive species important for dune formation and stabilization because they enhance sand deposition [13]. These habitats have suffered a heavy loss of biodiversity and fragmentation in recent decades, chiefly due to human encroachment in the form of tourism, urban sprawl and shoreline erosion whose consequences for biodiversity and related ecosystem services have been severe [14][15][16][17][18][19]. Furthermore, biological invasions pose a serious threat to sand dune ecosystems, threatening local plant diversity and related functional aspects and may lead to long-term alterations [12,[20][21][22].
The general aim of this study was therefore to assess the number of replicates needed to adequately characterize sand dune environments in terms of differences between habitat types, SACs and habitat types within SACs, after obtaining a baseline assessment of plant diversity at habitat and site scale. We postulated that the decrease in MultSE would be more evident for species assemblages closer to the drift line than for those in the landward part of the beach.

Study Area and Sampling Design
The study was performed in three SACs on the Tyrrhenian coast of central  (Figure 1). Geologically, the sites are mainly composed of Quaternary sand sediments, mostly Holocenic [23]. Macrobioclimate is Mediterranean with upper meso-mediterranean thermotype and ombrotype ranging from upper dry (PU) to upper humid (TL and SP) [24]. EU habitat maps was detected from the available information provided by the HaSCITu (Habitat in the Sites of Conservation Interest in Tuscany) program of the Tuscan Regional Diversity 2020, 12, 138 3 of 16 Administration (http://www.regione.toscana.it/-/la-carta-degli-habitat-nei-siti-natura-2000-toscani) where dune habitats form an intricate shifting mosaic hard to map and included in large patches that create a serious difficulty when planning sampling design. We solved these problems by adopting the European Nature Information System (EUNIS), at the third classification level, generally more inclusive respect to EU habitats because mainly based on physiognomic and physical attributes (i.e., B1.3-Shifting coastal dunes, including EU habitats 2110 and 2120). Moreover, EUNIS is a pan-European system for hierarchical habitat classification and its commonly accepted nomenclature facilitates comparison of the results between European countries [25,26]. Mediterranean with upper meso-mediterranean thermotype and ombrotype ranging from upper dry (PU) to upper humid (TL and SP) [24]. EU habitat maps was detected from the available information provided by the HaSCITu (Habitat in the Sites of Conservation Interest in Tuscany) program of the Tuscan Regional Administration (http://www.regione.toscana.it/-/la-carta-degli-habitat-nei-sitinatura-2000-toscani) where dune habitats form an intricate shifting mosaic hard to map and included in large patches that create a serious difficulty when planning sampling design. We solved these problems by adopting the European Nature Information System (EUNIS), at the third classification level, generally more inclusive respect to EU habitats because mainly based on physiognomic and physical attributes (i.e., B1.3-Shifting coastal dunes, including EU habitats 2110 and 2120). Moreover, EUNIS is a pan-European system for hierarchical habitat classification and its commonly accepted nomenclature facilitates comparison of the results between European countries [25,26]. Three EUNIS habitat types (for more information see Davies et al. [27]), were mapped in the dune systems: (a) shifting coastal dunes (B1.3, including EU habitats 2110-embryonic shifting dunes-and 2120-shifting dunes along the shoreline with Ammophila arenaria); (b) coastal stable dune grasslands (B1.4, including EU habitats 2210-Crucianellion maritimae fixed beach dunes-and 2230-Malcolmietalia dune grasslands) and (c) coastal dune scrub (B1.6, including habitats 2230-Malcolmietalia dune grasslands-2240-Brachypodietalia dune grasslands with annuals and 2250-Coastal dunes with Juniperus sp. pl.). The target species for each EUNIS habitat type are reported in Table 1. 206 squared plots of 4 m 2 were randomly allocated in proportion to the surface area of the Three EUNIS habitat types (for more information see Davies et al. [27]), were mapped in the dune systems: (a) shifting coastal dunes (B1.3, including EU habitats 2110-embryonic shifting dunes-and 2120-shifting dunes along the shoreline with Ammophila arenaria); (b) coastal stable dune grasslands (B1.4, including EU habitats 2210-Crucianellion maritimae fixed beach dunes-and 2230-Malcolmietalia dune grasslands) and (c) coastal dune scrub (B1.6, including habitats 2230-Malcolmietalia dune grasslands-2240-Brachypodietalia dune grasslands with annuals and 2250-Coastal dunes with Juniperus sp. pl.). The target species for each EUNIS habitat type are reported in Table 1. 206 squared plots of 4 m 2 were randomly allocated in proportion to the surface area of the EUNIS habitats in each SAC (1 plot/ha for B1.3 and B1.4 and 1 plot/3-ha for B1.6; see Table 2 for further details) where occurrence and abundance (% visual cover estimation) of vascular plant species was recorded.

Workflow of the Analysis
Habitats and SACs were first characterized in terms of species richness and compositional similarity. These diversity characteristics were obtained from our whole dataset and served as a baseline for evaluating how much information was lost when sampling size was reduced. The following analyses were performed.

Description of Diversity Patterns
We first evaluated overall sampling efficiency and diversity, computing classical sample-based rarefaction curves (RC) and spatial-explicit rarefaction curves (SER, see Chiarucci et al. [29] for more details on methodology) using the function available in Bacaro et al. [30], and those available in the 'vegan' R package [31]. Unlike RCs which do not account for spatial autocorrelation, SERs take adjacency of sampling units into account and consequently the spatial structure of the data [32]. RCs and SERs were calculated for the three habitats separately (across SACs) and at SAC scale (across habitats), RC was also computed for the whole dataset (Random curve). The consistency of species diversity patterns across spatial scales (plot, habitat and site) was also assessed using additive partitioning techniques [33,34] in R package 'vegan' [31]. To test for significance, a null model was generated, permutating the original data matrix 999 times to assess deviation from random expectations.

Species Composition Variation across Spatial Scales
To investigate how SAC (fixed effect, three levels: PU, TL and SP), EUNIS habitats (fixed effect, three levels: B1.3, B1.4 and B1.6) and their interaction determined community composition, we performed a permutational multivariate analysis of variance (PERMANOVA, [35]) on the whole dataset, where: (a) Bray-Curtis dissimilarities where calculated on log (x + 1)-transformed abundance data and (b) Jaccard dissimilarity on species occurrences (i.e., p/a matrix). All tests were performed using 9999 permutations of residuals under a reduced model and α = 0.05; this method yields the best power and the most accurate type I error for multi-factorial designs [36]. The significant interaction term was then investigated using a posteriori pairwise comparisons with the PERMANOVA t statistic and 9999 permutations. We also calculated the pseudo multivariate variance component (expressed as percentages) for each source of variation. The analysis was performed using the PERMANOVA routine in the PRIMER v6 computer program [37], including the add-on package PERMANOVA+ [38]. After PERMANOVA, the same data underwent similarity percentage analysis (SIMPER, [39]) to identify the species that contribute most to the average Bray-Curtis dissimilarity between habitats across SACs.

Measurement of Multivariate Precision and Associated Dissimilarities (MultSE)
MultSE was calculated according to Equations (1) and (2) using the code and functions available in [11] and the method described therein. To compute MultSE, the community composition matrix was log (x + 1) transformed and Bray-Curtis dissimilarity calculated. This statistic was calculated between habitats (pooling all SACs) and between SACs (pooling all habitats). It can be considered a direct analogue of univariate SE, but is based on the chosen dissimilarity measure, thus providing a powerful tool to examine the relative precision of a sampling procedure. This statistic was calculated as follows: where V is a multivariate measure of pseudo variance in the space of the chosen dissimilarity measure: where n is the number of sampling units and d represents the squared distance between individual sampling points to their centroid, given a chosen dissimilarity measure. We computed 95% confidence intervals by a double resampling method based on permutations for means calculation and bias-adjusted bootstrap-based error bars (5000 resamples). As in the case of its univariate counterpart, when the profile of MultSE as a function of increasing sampling size reaches an asymptote, this measure can be considered indicative of adequate sampling effort. Beyond this threshold, in fact, the relationship becomes flat and only negligible changes in sampling precision can be expected. The breaking point of the MultSE profile was estimated using R package 'segmented' [40,41]. The statistic is unbiased if and only if the sampling procedure is representative of the statistical population and an equal probability is given to each sample by appropriate sampling methods [11]. In a similar way, we computed the expected increase in community dissimilarity related to sampling effort. In other words, using simple randomization procedures, we randomly extracted an increasing number of replicates (from 1 to n−1) 999 times, and we calculated the average Bray-Curtis and Jaccard dissimilarity profiles that indicate the centroid of the species assemblage, for a given sampling size.

Effect of Sampling Size Reduction
In order to describe how a reduction in sampling size affects the ecological conclusions obtained from the analysis of the complete dataset, and in particular the ability to: (1) characterize the species composition of single habitats, (2) discriminate compositional differences between habitats, and (3) provide an acceptable level of sampling precision under reduced sampling effort, we resampled plots virtually by means of a stratified random sampling approach. The plots were resampled from the whole species assemblage, using the number of plots derived from MultSE estimation for each level of the crossed factor SAC × habitat. Then this subset was used to compute PERMANOVA and do SIMPER analysis as described above. PERMANOVA and SIMPER were calculated at each randomization (999), and the resulting statistics were compared with those of the whole dataset. All the analyses, except for PERMANOVA, were computed using R 3.6.1 [42].

Results
Rarefaction curves (Figure 2) did not reach an asymptote: their comparison revealed that coastal dune scrubs (B1.6) accumulated more species than shifting coastal dunes (B1.3) and dune grasslands (B1.4) across sampling sites; whereas, among SACs, TL showed the highest plant richness followed by SP and PU. It is worth noting that the RC of dune scrubs lay above the random curve, whereas the corresponding SER did not, again corroborating the need to include the spatial structure of the data in order to avoid biased results. Additive partitioning showed that overall diversity in each site was mainly due to variation among habitats rather than plots (Figure 3).  Two-way PERMANOVA revealed that all sources of variation significantly affected community composition; pairwise comparisons for the interaction SAC × habitat were significant for all pairs examined, except coastal dune scrub of SP and TL when considering abundance data ( Table 3, Table  S1). Figure 4 and Figure S1 summarize the relationship between MultSE and the number of replicates among habitats in each SAC. Based on this, we estimated that among habitats, approximately 10 plots were enough to grasp overall diversity in the study area, even though slight differences could be detected in relation to habitat and SAC ( Table 4). The dissimilarity profiles computed among habitats, pooling all SACs, flattened out at about 20 plots (Figure 5), suggesting an effect of the SAC on the plant species pool. Notably, there were different patterns of diversity accumulation when abundances or incidence matrix were used, especially in habitat B1.6. Interestingly, the overall signal remained constant in the reduced dataset which was approximately one-quarter of the original dataset (54 vs. 206 plots, Table 5) whether we considered species abundance or species presence/absence. EUNIS habitat type accounted for the highest variance component in both datasets (original vs. reduced dataset), further corroborating the output in Figure 3; on the other hand, there were slight differences in the role of SAC and its interaction with habitat, depending on the type of data (abundance vs. presence/absence data). The contribution of the single species characterizing each habitat was concordant in the two datasets (Table S2 and Figure S2 of supplementary material).

Figure 2.
Spatially-explicit rarefaction curves (SERs, solid lines) and plot-based rarefaction curves (RCs, dashed lines) calculated among habitats (left) and SACs (right). The dotted line (Random) represents the RC calculated from the whole species pool. Please note that this curve was truncated according to the maximum number of plots detected in each factor. TL = Torre del Lago; PU = Parco Uccelllina; SV = Selva Pisana.    Table 5) whether we considered species abundance or species presence/absence. EUNIS habitat type accounted for the highest variance component in both datasets (original vs. reduced dataset), further corroborating the output in Figure 3; on the other hand, there were slight differences in the role of SAC and its interaction with habitat, depending on the type of data (abundance vs. presence/absence data). The contribution of the single species characterizing each habitat was concordant in the two datasets (Table S2 and Figure S2 of supplementary material).   regional/local scale, as in our case study, communities of fixed dunes also have similar floristic compositions [47,51]. This confirms that in habitats with strong environmental gradients, local variability is more important in shaping communities than larger scale variability. This was also found in Tuscan badland environments, where salinity and erosion create a complex mosaic of habitat types recognized at a local scale [52,53].

Spatial Variation of Reduced Dataset
The differences between habitats and SACs and between habitats within SACs remained appreciable with the reduced dataset which was approximately one-quarter of the original dataset

Diversity Patterns Across Habitats and SACs
Our analysis provided evidence that overall, the three EUNIS habitats host different plant communities and that the three SACs protect different dune vegetation. This information is confirmed by the diversity patterns and variations in species composition between different scales of analysis. Regarding diversity, the rarefaction curves described how species richness differed substantially between habitats, where an increasing number of species for a given sample size was detected moving from shifting dunes (B1.3) to dune grasslands (B1.4) and dune scrub (B1.6). Other studies have shown a correlation between species diversity and the coast-to-inland environmental gradient. Indeed, species richness generally increases as one moves from the annual communities of the upper beach (more unstable habitats and stressful conditions) to the fixed dunes (more stable environments) along the psammophile sequence [43][44][45][46] and our results are in line with these findings. Dune scrub has higher richness, as this habitat is dominated by shrub communities characterized by open areas with many annual species, as already shown by Acosta et al. [44]. However, interesting differences in rarefaction curve patterns were observed with respect to those of Ciccarelli and Bacaro [46] for the same study area and habitats: while these authors observed an asymptotic pattern for all curves, our rarefaction curves showed a constantly increasing trend. Sampling design and sampling size can be considered the main factors responsible for these differences: while our study was based on a randomly chosen plots, the study of the authors was based on contiguous transects. The overall sampling effort was also different: only 206 plots in our study versus a total of 980 plots. The total number of species collected per habitat in the two studies was nevertheless comparable, implying good sampling design efficiency.
α and β diversity partitioning across spatial scales showed substantial similarity for PU and SP, where β habitat and β plot components gave the same contribution to total gamma diversity. Notably, β habitat in TL showed the highest relative contribution to total γ diversity, meaning that in this SAC, communities are clearly distinguished from each other: TL has a more stable coastal configuration, allowing a more ample dune system with well-defined habitats. The second-largest variation was found at EUNIS habitat level in relation to the strong environmental gradients in coastal dunes. The latter ensure the development of floristically different vegetation types that host species with a narrow ecological range [47][48][49][50] and the existence of vegetation zonation which inevitably controls not only diversity patterns [13,44] but also community structure [26].
Based on the sea-inland environmental gradient, the species that contribute to average dissimilarities between EUNIS habitats are target species for EU dune habitats (key species or diagnostic species sensu Biondi et al. [28], Angiolini et al. [50], Sperandii et al. [26]). Elymus farctus and Ammophila arenaria are considered constructor species of embryonic and mobile dunes, respectively, linked to shifting coastal dune habitat (B1.3). Helichrysum stoechas, Seseli tortuosum and Vulpia fasciculata are considered structural species of fixed beach dune garrigues or grasslands, characterizing B1.4. Woody species such as Juniperus oxycedrus subsp. macrocarpa, Smilax aspera and Pistacia lentiscus are typical of fixed coastal dunes dominated by Juniperus sp. pl. such as in dune scrubs (B1.6). At the European scale, this is mostly true for communities of mobile dunes, even if at regional/local scale, as in our case study, communities of fixed dunes also have similar floristic compositions [47,51]. This confirms that in habitats with strong environmental gradients, local variability is more important in shaping communities than larger scale variability. This was also found in Tuscan badland environments, where salinity and erosion create a complex mosaic of habitat types recognized at a local scale [52,53].

Spatial Variation of Reduced Dataset
The differences between habitats and SACs and between habitats within SACs remained appreciable with the reduced dataset which was approximately one-quarter of the original dataset (54 vs. 206 plots). This suggests that a reduced sample can also capture the structure and composition of plant communities and environmental gradients, at least in the type of community considered here. A reduced dataset can, therefore, provide information on habitat conditions and be useful for monitoring habitat conservation status over time, since ecological groups such as target species contribute substantially to ecosystem structure and function, being particularly responsive to threats and habitat modifications [26,50,54]. Obviously, in order to ensure its representativeness, the reduced dataset must be based on information obtained by adequate sampling methods.
As expected, a smaller number of replicates was enough to distinguish B1.3 (shifting coastal dunes) from other EUNIS habitats. In line with Acosta et al. [44] and Angiolini et al. [50], this habitat showed the lowest number of species per plot. Conversely, greater sampling effort was needed in dune grasslands (B1.4), probably due to the presence of more stable and heterogeneous communities favored by deeper, more evolved soil along with lower exposure to the already mentioned limiting factors [13,55]. We also observed that a greater number of replicates was needed to characterize habitat when site variation was not considered. This was somehow expected, since SACs contribute to community composition, even if the greatest percentage of variance is accounted for by habitat type (see Table 3). In addition, another explanation for the low number of replicates needed to characterize dune community structure can be explained by the relatively low number of species thanks to species' highly restricted ecological preferences and specific functional features [22].

The Lesson We Learned
To date, few monitoring programs reach the standards necessary (e.g., survey design, hypothesis formulation, statistical power) to be considered statistically unbiased [7]. In recent years, the use of a probabilistic sampling design has proved useful in monitoring natural vegetation both for collecting reliable quantitative information and for representing of different physiognomic vegetation types, also allowing for generalizations [56]. In this light, it has now been widely acknowledged that it is not appropriate to examine biodiversity patterns such as species abundance by preferential sampling (e.g., Diekmann et al. [57]; Lájer [58]). It has also been shown that preferential sampling may lead to biased results by narrowing the environmental gradient or artificially restricting the species pool which may cause overestimation of rare and underestimation of common ones such as generalist or alien species [4,59]. Nonetheless, an urgent need for a quantitative measure of sampling adequacy in plant communities is advocated, especially by conservation technicians and plant ecologists. Quantitatively speaking, for instance, Stohlgren et al. [60] found that as few as ten replicates of 1 m 2 are satisfactory to detect fine-scale species richness patterns along an elevation gradient in the Rocky Mountains, and similar outcomes have been reported for semi-natural vegetation in Eastern Europe [61]. Our results agree with these studies suggesting that sampling effort in dune ecosystems could be moderately reduced, because biodiversity patterns remain quite stable and detection bias is relatively low. As a cautionary note, however, it worth noting that this method may be inappropriate for taxa with low detectability such as rare or cryptic species [61]; in this case, a more intense sampling effort is needed (e.g., Bried and Pellet [62]).

Conclusions
To the best of our knowledge, this is the first attempt to apply a MultSE approach to terrestrial habitats. MultSE proved useful for characterizing sampling adequacy and habitat features in a cost-effective way and highlighted that the three SACs protect different plant communities. The methodology proposed here evaluated different aspects of the monitoring of plant communities: in particular, it offers a flexible solution for plant ecologists and biodiversity managers wishing to optimize sampling design for habitat monitoring, facilitating the assessment of habitat quality and conservation status over time as specified in Art. 11 and 17 of EEC Directive 92/43. The type of response variable (abundance or occurrence) affects the restoration and conservation monitoring [63] and sampling costs, but this does not seem to occur for dune habitats (at least as far as the ability to distinguish different habitats and SACs is concerned). In any case, sand dune environments are usually characterized by a few abundant species that structure community composition facilitating the collection of abundance data which is often preferred for a quantitative assessment of the effects of conservation measures or habitat changes through time. Thus, before implementing habitat surveys, we recommend plant ecologists and biodiversity managers to consider the following aspects: • conduct pilot studies testing different sampling probabilistic methods; • wisely plan sampling efforts taking into account resource availability (i.e., time and costs); • approaches based on plant functional traits and remote sensing may provide novel insights on ecosystem functioning, the latter revealed to be also a cost-effective way to handle biodiversity measurements and to predict species changes through time.
In conclusion, we advocate the use of the present methodological approach in other habitat types and geographical areas in order to test its reproducibility and effectiveness and to develop cost-effective monitoring programs for other European protected areas under Habitats Directive.