The Big Five: Species Distribution Models from Citizen Science Data as Tool for Preserving the Largest Protected Saproxylic Beetles in Italy

: Background. Volunteers’ participation in scientiﬁc research has increased in recent decades. Citizen science (CS) data have been used in quantitative ecology to analyse species ranges by means of species distribution models. We investigated the Italian distribution of ﬁve large saproxylic beetles ( big ﬁve ), to describe their niche space, paramount areas for their conservation, and conservation gaps. Methods. CS data from two projects, climate and environmental variables were used to produce Habitat suitability (HS) maps for each species and averaged HS maps. The big ﬁve ’s conservation status was assessed interpolating HS maps with the distribution of protected areas, concomitantly identifying conservation gaps. Results. The pre-alpine and Apennines arcs, north-eastern Sicily and eastern Sardinia, were identiﬁed as conservation’s hotspots. Ranking HS levels from minimum to optimal, the extent of conservation gaps decreases as environmental suitability for the big ﬁve increases. Conclusions. For the ﬁrst time in Italy, CS data have been used to investigate niche space of the largest protected saproxylic beetles and analyse the distribution of their suitable habitat. The resulting HS raster maps and vector layers, reporting HS value in all Italian protected areas (n ◦ 3771), were provided and discussed, reporting an application example for conservation purposes.


Introduction
The participation of non-scientist volunteers in scientific research, referred to as citizen science (CS), has consistently increased in recent decades [1].Following a period of rapid growth and a process of professionalisation, the non-expert contribution to global research can now be considered fundamental in many disciplines [2][3][4][5].Particularly, CS has proved to be a powerful tool for facing conservation challenges, from public awareness-raising regarding environmental issues to building scientific knowledge [6].In recent years, CS data have been increasingly used in quantitative ecology to predict current species range by means of species distribution models (SDM) [5,7,8].SDM, besides being useful in theoretical research on species ecology and distribution, represents a powerful tool in nature conservation and management [9].Developing robust SDMs needs a large dataset, but collecting occurrence data over a large spatial scale poses a severe challenge to researchers; CS projects offer an effective alternative to tackle this issue [1,4].The distribution of a species can change in response to a large number of interrelated natural and anthropogenic processes.Climate and land use changes, urbanisation, deforestation, increasing extent of lands earmarked for agriculture and invasive species introduction are the main factors driving the Anthropocene's biodiversity loss [10][11][12][13].Even though the processes contributing to the decline of many organisms are well known, many endangered species still have poorly studied geographic distributions [14].Unfortunately, collecting occurrence data on invertebrates, particularly on endangered insect species, can be difficult, due to their short adult activity period, hidden life cycle, low detection probability, the great sampling effort needed and, in many cases, the engagement of expert entomologists for species identification.In this regard, citizens' involvement in entomological studies has recently become a pillar of conservation-oriented research [15,16].In Italy, several CS projects focusing on insects have been recently carried out: Farfalle in ToUr (2013-present), Life MIPP (2012-2017), InNat (2017-2022), Life ESC360 (2018-2022) and X-Polli:Nation (2021-present), just to cite a few.Some of these projects focused on endangered saproxylic species, mainly dealing with the conservation of saproxylic beetles (order Coleoptera).Saproxylic beetles depend upon dead or dying wood during some part of their life-cycle and represent one of the largest taxa contributing to forest saproxylic biodiversity [17,18].Playing a remarkable role in the trophic web of the forest ecosystem [19,20], they are considered good indicators of forest health [21].Indeed, saproxylic beetles have been the subject of CS and research projects, which demonstrated their relevance as flagship and umbrella species, whose conservation leads to the preservation of many other forestdwelling species [22][23][24][25][26].
We selected data on saproxylic beetle occurrence from two CS projects carried out in Italy.Five saproxylic beetle taxonomic units (TU) were chosen: Morimus Brullé, 1832, Lucanus cervus (Linnaeus, 1758), Cerambyx cerdo Linnaeus, 1758, Rosalia alpina (Linnaeus, 1758) and Osmoderma Serville, 1828.These five TUs (hereafter the big five) are easily recognizable by the citizen due to their relatively large size (from 15 mm of the smallest R. alpina to ~90 mm of the largest L. cervus) [27] and thanks to in-depth sheets for each target TU provided in the project-related apps and websites.
For the TU Morimus (Cerambycidae), we considered two taxa occurring in Italy as subspecies of Morimus asper (Sulzer 1776) (hereafter M. asper asper/funereus), as asserted by many authors [28][29][30], omitting the formerly Morimus asper ganglbaueri Reitter, 1894.M. asper asper/funereus adults measure 15-40 mm in length [31], have an elongated oval body and can be chromatically and morphologically distinguished from similar longhorn beetles of the Italian fauna [32].The European stag beetle L. cervus (Lucanidae) is one of the best-known coleopteran species and the largest stag beetle species in Europe, characterized by a pronounced sexual dimorphism in adults [33].The species is widely distributed in northern and central regions, whereas only the congeneric Lucanus tetraodon Thunberg, 1806 occurs in the central and southern portion of the Italian peninsula and Sicily [34,35].Differences in the morphology of the male mandibles make the species easily identifiable from the closely related L. tetraodon [33,36].Adults of the great Capricorn beetle C. cerdo (Cerambycidae), can be distinguished from other similar (and co-occurring) congeneric species, such as C. welensii (Küster, 1845), by expert entomologists thanks to their heavy sculptured pronotum, blackish body and reddish elytral apex [31,37,38].Adults of the Rosalia longicorn R. alpina (Cerambycidae) are easily identifiable thanks to their peculiar colour and the black spots on the elytra [31,39,40].Due to the difficulties in morphological identification of the European Osmoderma spp.(Scarabaeidae), species assignment remains a task for specialists.Hence, for the TU Osmoderma, we considered the species of this genus occurring in Italy, which share the same micro-habitat, inhabiting mature and hallow broad-leaved trees: O. eremita (Scopoli, 1763), O. cristinae Sparacio 1994 and the putative Italian subspecies O. eremita italicum Sparacio, 2000 [41].
Although distributed across Europe in old-growth forests, populations of the big five have been, in the past centuries threatened by forest practices, such as the removal of dead wood, which dramatically reduced their larval habitat [32,42].The conservation of these species is enforced by the European Habitats Directive 92/43/EC, which lists all TUs in the Annexes II (together with the subspecies M. asper funereus), and three TUs (C.cerdo, R. alpina and Osmoderma spp.) in Annexes IV, with R. alpina and O. eremita as priority species.Additionally, they are classified as Least Concern (C.cerdo, L. cervus, M. asper asper), Near Threatened (R. alpina), Vulnerable (M.asper funereus, O. eremita) and Endangered (O.cristinae) in the Italian IUCN Red List [43].
In view of their relevance as flagship and umbrella species, for the protection of the local saproxylic communities, and considering their conservation status, SDMs of the big five were developed to provide a useful tool to concentrate conservation efforts [22,43].
The present study aims to: (i) identify suitable areas and explore the ecological niches of the TUs; (ii) analyse TU habitat suitability maps in order to highlight hotspots areas for their conservation; (iii) match the identified suitable areas with the Italian protected areas to develop a useful tool for conservation purpose, providing a case of practical application.

Materials and Methods
SDMs for the big five were developed using a comprehensive modelling and simulation framework technique based on the R package 'sdm' [44].The generated models' ensemble was used to predict species potential distribution and highlight any possible gap in their conservation status by comparison with current extent and distribution of protected areas in Italy.Predicted big five distributions were obtained combining remote sensing data on climate and vegetation indices, with a high resolution (~1 km), with species occurrence data (presence and pseudo-absence).Details regarding the specific attributes of occurrence data, environmental variables and modelling are outlined henceforth.

Occurrence Data
Occurrence data came from two CS projects: a European project financed by the EU Life programme, Monitoring Insects with Public Participation (Life 11/NAT/IT 000252) and a national project financed by the Ministry of the Ecological Transition (InNat).The data came from non-expert participants that used the websites and mobile apps of the CS projects to send records of the target species, accompanied by photographs to allow the validation by experts.Pictures were uploaded on the project websites or through the mobile apps, identified and validated by expert entomologists.The raw dataset consisted of 3538 reports from 865 people, uploaded from 2014 to 2020 (accession data: January 2021), even though some records (n = 125) date back to the early 2000s and have been uploaded later.A data cleaning procedure was performed to increase the quality of occurrence data and the model's performance [45,46].Data preparation consisted in removing occurrence data lying outside Italian borders and in eliminating duplicate records.Furthermore, a grid with the same resolution (~1 km), extent and origin of environmental variables was created to select a single occurrence record in each occupied cell, avoiding multiple records which would lead to resampling the same environmental predictors.This procedure led to the following occurrence data per TU: 696 for M. asper asper/funereus, 894 for L. cervus, 124 for C. cerdo, 243 for R. alpina and 32 for Osmoderma spp.(Figure 1).
ords.Furthermore, a grid with the same resolution (~1 km), extent and origin of environmental variables was created to select a single occurrence record in each occupied cell, avoiding multiple records which would lead to resampling the same environmental predictors.This procedure led to the following occurrence data per TU: 696 for M. asper asper/funereus, 894 for L. cervus, 124 for C. cerdo, 243 for R. alpina and 32 for Osmoderma spp.(Figure 1).

Environmental Variables Selection
Given the fundamental role of climate, soil properties, vegetation indices, forest structure and land use for the saproxylic beetle fauna [42,[47][48][49], we compiled a dataset consisting of 47 biotic and abiotic variables (Table 1).The set of predictors for the SDMs consisted in 19 bioclimatic variables, 6 seasonal variables (6 × 4 = 24 in total), among which were vegetation and soil water indices, 3 geomorphological variables (altitude, slope and aspect) and the human modification of terrestrial systems index.Climate data were obtained from WorldClim website (https://www.worldclim.org/data/index.html)(accessed on 22 March 2021), recently updated to the 2.1 version (January 2020), which provides 19 bioclimatic variables at very high spatial resolution (~1 km at the equator).These bioclimatic variables represent historical climate data  on annual trends, seasonality and extreme or limiting environmental factors [50].Elevation above sea level was obtained from WorldClim; slope and aspect were computed using the terrain function of the R package 'raster' [51].Seasonal wind speed, water vapour pressure and solar radiation were calculated from the available monthly variables on WorldClim, dividing seasons according to meteorological breakdown into four three-month periods (i.e., winter starts in December to end in February) [52].As soil properties represent fundamental factors for the larval stage of saproxylic beetles [53], the soil water index (SWI) from Copernicus Global Land Service database [54] was used as a predictor variable.Daily SWI data were downloaded for a six years period (2015-2020) and averaged to obtain seasonal SWI using the Geostatistical Analyst module of ESRI ArcGis TM .The SWI quantifies the moisture condition at various depths in the soil, providing a 1 km resolution product covering Europe (Version 1) [55], which has been demonstrated to yield high agreement with ground observations [56].The normalised difference vegetation index (NDVI) and the enhanced vegetation index (EVI) were chosen as independent variables, being correlated to vegetation biomass and canopy biophysical parameters such as photosynthetic activity [57][58][59].NDVI and EVI were downloaded through the online Data Pool at the NASA Land Processes Distributed Active Archive Center (LP DAAC, https://lpdaac.usgs.gov)(accessed on 22 March 2021), provided by the MODIS 6 collection products.A monthly temporal aggregation over a ten years period (2000-2010) at a spatial resolution of 1 km provided the chosen features for the data acquisition.Seasonal averages of SWI, NDVI and EVI were then computed.To account for the human impact on the environment, data on global human modification of terrestrial systems (HMTS) were downloaded from the NASA data centre of Earth Observing System Data and Information System (EOSDIS).HMTS index quantifies human modification of lands, at a spatial resolution of 1 km, modelling 13 anthropogenic stressors within five major categories: human settlement, agriculture, transportation, mining and energy production, and electrical infrastructure [60].The HMTS index output is a continuous 0-1 metric that reflects the proportion of modified landscape.

Species Distribution Model
As it is likely that the 19 bioclimatic variables from WorldClim are subject to multicollinearity issue [61], to check the overall correlation a variance inflation factor (VIF) stepwise procedure was performed using the vifstep function in the 'usdm' R package [62].The spatial coordinates of occurrence data of each TU (separately) were used to extract values of the 19 bioclimatic variables, then the VIF analysis was performed.Vifstep, calculates VIF for all variables, excludes the one with highest VIF (threshold = 10), then repeats the procedure until no variable with VIF greater than threshold remains [62].Only variables that had VIF > 10 were excluded from subsequent analyses [63,64].Four among the most commonly used modelling methods were used in the SDM framework: generalised linear model (GLM), boosted regression tree (BRT), random forest (RF) and maximum entropy (MaxEnt) [65,66].Pseudo-absence data were generated for each TU, obtaining more than twice the amount of occurrence data, in order to yield the most reliable distribution models [67].For this purpose, a random method in geographic space was used, which removes points located in presence cells.Two data-splitting procedures, subsampling and bootstrapping, were used to produce partitions of the data.For the subsampling procedure, 30% of the initial data was used.Partitions were then used to train the models, whilst the remaining data were used for models' evaluation.Both procedures were applied to each modelling method and repeated three times, leading to a total of 24 model outcomes for each TU, considering the whole SDM framework (4 modelling methods × 2 data-splitting procedures × 3 replications).Model outcomes were selected according to their Area Under Curve value (AUC), effective in summarizing the overall accuracy [68], choosing those with an AUC greater than 0.8, i.e., with an excellent discriminating ability [69].The chosen model outcomes were then combined using a weighted averaging based on true skill statistic (TSS) and a threshold optimization criterion to maximise sensitivity and specificity [44,70].The result of the best model outcomes' combination was used: (i) to build five habitat suitability (HS) maps, (ii) to calculate the relative importance of environmental variables for each TU (based on Pearson correlation test) and (iii) to draw niche-space plots, characterising the big five's ecological requirements.Finally, to provide useful tools for saproxylic beetle conservation, two HS maps were produced summarising the big five's HS maps.The first HS map (HS mean ), resulting from averaging the HS for each TU, highlighted those areas suitable for big five conservation as a whole.To generate the second HS map (HS max ), the maximum HS value of each 1 km 2 cell was used, regardless of the TU.The HS max map can be useful to pinpoint meaningless areas for big five conservation, together with areas where protection's actions should target few Tus.The HS range (max-min value) of both maps has been divided into five equal ranks (hereafter HS levels), defined as follows: minimum, low, medium, high and optimal HS.Then, the raster layer unique value tool in QGIS [71] was applied to the projected HS mean and HS max maps (EPSG: 6875, RDN2008/Italy zone) to calculate the extent (km 2 ) of each HS level.Considering that L. cervus, Osmoderma spp.and R. alpina do not occur in Sardinia [28] and L. cervus does not occur in part of central and in southern Italy [33,72], their HS maps were clipped according to the known distributions, to avoid over/under estimation in calculating the HS mean and HS max maps.

Gap Analysis
Boundaries of 3948 Italian reserves and national parks were downloaded, as three shapefile layers, from the world database on protected areas [73].The protected areas predominantly or entirely marine were excluded, leading to a subset of 3771 protected areas.Portions of these protected areas on the sea were excluded by clipping the vector layers with the mainland boundary of Italy.The zonal statistic tool in QGIS was used to calculate basic statistics (mean, median and maximum value) for each protected area, based on HS mean and HS max raster layers.Then, the resulting statistics were stored in the attribute table of the shapefiles containing the 3771 protected areas (provided in Supplementary Materials).Finally, HS mean and HS max raster layers were clipped according to the maximum extent of protected areas, merging the three shapefiles.
The output raster layers were reprojected as above described to calculate HS levels' extent (km 2 ) within protected areas and, by subtraction, the conservation's gaps.In order to show how our results can contribute to conservation action, we provided a concrete example at a smaller scale.We chose to use the SDM results to identify a conservation gap within a recognized hotspot area, by overlaying the HS mean map (raster file) with the borders of protected areas (shape file).

Species Distribution Model
According to the VIF results, 7 out of 19 bioclimatic variables were excluded from all SDMs due to collinearity problems (VIF > 10).Among the 12 remaining bioclimatic variables, those showing a VIF < 10 for each TU were used as predictors together with the remaining 28 environmental variables (Table 1).Considering the whole SDMs, the RF modelling method reported the best performances, i.e., highest AUC, TSS values and lowest deviance (AUC = 0.90, TSS = 0.71, deviance = 0.67) (Table 2).Details on the 24 model outcomes, for each TU, are reported in Table S1.The outcomes with an excellent discriminating ability, chosen to be combined, were 17 for M. asper asper/funereus, 24 for L. cervus, 9 for C. cerdo, 24 for R. alpina and 13 for Osmoderma spp.The big five HS maps, resulting from the model ensemble, were provided as geoTIFF raster layers (Supplementary Materials and illustrated in Figure 1).Environmental variables' importance, assessed by the model ensemble of each TU, are reported in Figure S1 (Supplementary Materials).Two niche space plots for each TU were reported to illustrate the role of the more relevant environmental variables (according to the Pearson correlation results) in predicting the occurrence of suitable habitat (Figure 2 and Figure S1).Morimus asper asper/funereus shows the widest distribution among the big five, with the highest HS values in northeast Italy and along the northern Apennines (Emilia-Romagna and Tuscany regions).Suitable habitat of this TU is distributed along the whole pre-alpine arc, within the northern and central   Table 2. Summary of the SDM framework results, reporting the actual number of occurrence data, the generated pseudo-absence for each taxonomic unit (TU), together with the means of each calculated metric per modelling method.Metrics' means were calculated from the model outcomes (n° 24) for each TU (reported in supplementary material Table A1).Table 2. Summary of the SDM framework results, reporting the actual number of occurrence data, the generated pseudo-absence for each taxonomic unit (TU), together with the means of each calculated metric per modelling method.Metrics' means were calculated from the model outcomes (n • 24) for each TU (reported in Supplementary Materials Table S1).

Gap Analysis
The overall result of the zonal statistic tool applied to the shapefiles of protected areas was reported as Supplementary Materials (Protected_areas_ITA_vector_layers.zip) and graphically represented showing the differences among protected areas in mean value of HS mean and HS max maps (Figure 4a,b).The protected areas reporting high value of suitable habitat for the big five as a whole (HS mean ) are located in the north-east, in the southern portion of the northern Apennine, in the central Apennines and in the southern Apennine (Calabria region) (Figure 4a).Considering the zonal statistic result for HS mean raster layer, the protected area reporting the highest mean, median and maximum value is the state nature reserve of Sasso Fratino (Emilia-Romagna region).Conversely, the lowest values were recorded in the regional nature reserve Monte Conca (Sicily region).As regards the HS max map raster layer, the special area of conservation of Torrente Lerada (Friuli-Venezia Giulia region) reported the highest mean and median values, whereas the National Park of Foreste Casentinesi showed the highest maximum value (Emilia-Romagna and Tuscany regions) (Figure 4b).The amount of conservation gaps decreases at increasing HS level, from ~80% of the minimum to ~57% of the optimal HS level, considering the suitable habitat for the big five as a whole (HS mean ) (Table 3).On the other hand, the extent of conservation gaps concerning HS max does not show any decreasing trend (Table 3).A practical example of how protected area shapefiles can overlay HS mean map to highlight conservation gaps and possibly plan targeted actions is illustrated in Figure 5.A relevant area for the big five conservation (marked by red squares in Figure 5) is located in the northern Apennines (central Italy), between SACs (Special Areas of Conservation-Habitat Directive 92/43/CE).This area, consisting of approximately 100 km 2 , holds a wide territory characterized by high HS mean values (range 43-57%), deserving of being placed under environmental protection.
Table 3. Results of the raster layer unique value tool applied to the habitat suitability (HS) maps.The extent of the five HS levels was reported for the Italian mainland and for the protected areas (considering their maximum extent).The conservation gaps were reported as the percentage of non-protected areas for each HS level over the Italian mainland.Maps showing the distribution of the Italian protected areas and their relevance for the conservation of big five's suitable habitat.In (a), each protected area was coloured according to the average HSmean value, dividing the range (min-max) in five HS levels: red = minimum, orange = low, yellow = medium, light green = high and dark green = optimal HS.In (b), the shape of each protected area was coloured according to the average HSmax value, following the same ranking procedure.Average, together with median and maximum value, of both HS maps by protected areas were stored in supplementary material (Protected_areas_ITA_vector_layers.zip).

Discussion
This is the first study modelling the distribution of protected saproxylic beetles in Italy using CS data.Our findings confirmed that citizens' participation in conservation projects, as long as their contributions are validated by experts, can play a crucial role in studying the distribution and habitat preferences of endangered insect species.Involvement of the citizen already proved to be a useful tool in monitoring saproxylic beetle species [74].Besides, as previously suggested, CS projects can improve knowledge Maps showing the distribution of the Italian protected areas and their relevance for the conservation of big five's suitable habitat.In (a), each protected area was coloured according to the average HS mean value, dividing the range (min-max) in five HS levels: red = minimum, orange = low, yellow = medium, light green = high and dark green = optimal HS.In (b), the shape of each protected area was coloured according to the average HS max value, following the same ranking procedure.Average, together with median and maximum value, of both HS maps by protected areas were stored in Supplementary Materials (Protected_areas_ITA_vector_layers.zip).

Discussion
This is the first study modelling the distribution of protected saproxylic beetles in Italy using CS data.Our findings confirmed that citizens' participation in conservation projects, as long as their contributions are validated by experts, can play a crucial role in studying the distribution and habitat preferences of endangered insect species.Involvement of the citizen already proved to be a useful tool in monitoring saproxylic beetle species [74].Besides, as previously suggested, CS projects can improve knowledge

Discussion
This is the first study modelling the distribution of protected saproxylic beetles in Italy using CS data.Our findings confirmed that citizens' participation in conservation projects, as long as their contributions are validated by experts, can play a crucial role in studying the distribution and habitat preferences of endangered insect species.Involvement of the citizen already proved to be a useful tool in monitoring saproxylic beetle species [74].Besides, as previously suggested, CS projects can improve knowledge about saproxylic beetle distribution, accelerating the process of gathering occurrence data [16].In particular, our results pinpointed that suitable habitat has a greater extent for some TUs (i.e., M. asper asper/funereus and L. cervus), while it might be restricted for some others (i.e., R. alpina and Osmoderma spp.).Indeed, M. asper asper/funereus and L. cervus are among the most widespread protected saproxylic beetle species in Italy [16].A wide area, covering the Pre-alpine arc and the northern Apennines, reported the highest values of suitable habitat either for M. asper asper/funereus and L. cervus.This result is in line with the preferences of these species for mature and well-structured deciduous forests, especially for and medium-altitude oak woodlands [33,43,75].Conversely, the habitat of Osmoderma spp.and R. alpina can be considered so narrow to be properly described as a series of suitable woodland "islands" that remain in a "sea" of unsuitable land.According to previous findings [76], the suitable habitat for R. alpina occurs mainly in the mountainous altitudinal zone (Figures 2g and 1d), following almost entirely the distribution of beech forest (Fagus sylvatica) [77].Indeed, even if R. alpina can colonise plenty of deciduous tree species, from the coastline to about 2000 m asl, it is considered a montane species, associated with beech forests [23,78].The preference of R. alpina for cold climates is supported by the niche space results which indicate that sites with an annual mean temperature within the range 1-13 • C and a low to medium spring solar radiation are suitable for this species.Contrastingly, other TUs, such as C. cerdo, prefer high summer temperatures and present high HS values in the lowland of the Po valley, characterised by medium-high values of HMTS owing to the intensive agricultural activity.These findings corroborate previous considerations about habitat openness and isolated trees as important parameters for predicting the presence of this species [79].In Italy C. cerdo generally occurs in semi-open wood stands and viable populations can also be found in tree-lined avenues [37,43].It has to be remarked that, our predicted HS maps are based on remote sensing data, which did not take into account some fine-scale habitat features, such as presence of hallow or dead trees, stands and logs, crucial for saproxylic beetles' life-cycle.Nevertheless, the highlighted hotspots might facilitate the process of fine-scale assessment of suitable habitat, targeting field surveys where the HS index reported the highest values.In addition, it might be interesting to conduct surveys in those areas where high HS values have been predicted and no data on big five's occurrence have been reported.This would help both to validate the proposed model and to add new data on the presence of the big five.
To summarise, as it is evident by comparing the HS maps of each TU (Figure 1), the big five showed some differences in the extent and geographical location of suitable habitat, a direct consequence of dissimilarities in their niche space.Nevertheless, the considerations made regarding HS values for each TU might also be applied to some congeneric species, as in the case of Osmoderma spp.For instance, Cerambyx welensii, which is not protected by Habitat Directive, shares the same ecological niche with C. cerdo.
As a consequence of the differences in niche space between TUs, the area of the highlighted hotspots (HS mean map in Figure 3a) has a reduced extent and is marked by lower HS values if compared to the HS maps of single TUs.However, we believe it is critical to focus conservation actions on the highlighted hotspots.Primarily because it might be a way to make the most of the frequently scarce resources earmarked for conservation; secondly, as an umbrella effect, the conservation status of many other species sharing the big five's habitat, may also be improved.Furthermore, the provided HS maps of each TU (Supplementary Materials: HS_maps_raster_layers.zip) could be useful for conservation managers to highlight more accurately TU-specific sites, within the boundaries of the protected area under their own jurisdiction, crucial for conservation purposes.
Alongside their contribution to conservation management, the provided HS maps for the big five fill the gaps in their distribution and provide insights about their habitat preferences.Besides, HS maps can be considered a baseline for further research aiming at studying the range contraction of these endangered species (extinction risk) under future scenarios [80][81][82].Monitoring changes in species distribution may indeed be considered an early warning signal for habitat and species loss.
As regards levels of suitable habitat within the network of protected areas, the results of the zonal statistic tool can be compared to highlight the difference between mean value of HS mean and HS max within each reserve.This match highlights: those protected areas that represent hot spots for the conservation of the big five as a whole, those reserves holding relevant sites for just some target species and the protected areas less valuable for the big five protection (Figure 4a,b).In particular, some protected areas of the central Po plain and the central Apennines (Umbria-Marche Apennine), while being less pivotal for the big five (low HS mean value), own critical sites for the conservation of certain target species (high HS max value).For instance, the Po plain contains protected areas that show marked improvements in the HS level-from low HS mean to high HS max -such as the Special Protection Areas of Valli di Novellara and Valli di Gruppo (Emilia Romagna Region).This finding entails that, even an area characterised by a high level of human landscape modification (logging, farming and agriculture activities), might contain suitable sites for species such as C. cerdo or O. eremita, which benefit from individual old growth trees, albeit placed in an anthropized context.The same reasoning could be applied to some protected areas of central Italy, which have suitable sites for M. asper asper/funereus, L. cervus and R. alpina, turning from medium HS mean to high HS max level (e.g., Gran Sasso e Monti della Laga National Park) (Figure 4a,b).
Finally, as regards the gap analysis results, conservation measures are focused on the big five's hot spots (HS mean ), since nearly the 50% of optimal HS level falls within the protected areas network.Considering HS max , the absence of a decreasing trend in conservation gaps might be the result of the habitat preferences of some TUs for anthropized landscape, where protected areas are mainly small and far apart (e.g., the presence of suitable habitat for C. cerdo in the Po plain).In this regard, the creation of ecological corridors, to connect protected areas, could be a priority to prevent the risk of local extinction.
From a research point of view, CS data inherently present some biases, e.g., due to the different detection probability of the species in relation to their abundance and behaviours.Among the five TUs, Osmoderma spp. is probably the less suited for CS monitoring in Italy, due to its low detectability by non-expert entomologists.Furthermore, the difference in the number of visitors among protected areas (attractiveness, accessibility, etc.), should be taken into account to remove spatial autocorrelation among occurrence data [83].By contrast, it should be mentioned that museum data-frequently used in SDM-could suffer from biases too, e.g., spatial and temporal problems due to old records, which require careful examination to avoid introducing error in distribution models [84,85].On the other hand, we would like to remark that public participation represents a pivotal way to educate new generations and increase their awareness on nature conservation issues [86][87][88].Indeed, even citizens, though to a lesser extent with respect to researchers, might notice changes in abundance and distribution of target species through a continuative participation in CS projects.Finally, the provided practical example in the northern Apennines allowed us to highlight the process to identify conservation gaps and to target managing actions.Hence, we demonstrated how CS data, once validated and analysed by researchers, could contribute to increase knowledge on species ecology and highlight those areas where conservation actions should be focused.Consequently, the final task of national and local authorities, together with park managers, should be to concentrate their efforts on expanding/establishing protected areas located in the highlighted hotspots.

Supplementary Materials:
The following supporting information can be downloaded at: https: //doi.org/10.5281/zenodo.7523948:HS mean , HS max and HS maps of each TU (raster layers); shape files of protected areas with their mean, median and maximum values of HS mean and HS max raster layers (vector layers); Table S1 and Figure S1.

Figure 1 .
Figure 1.Maps of the suitable habitat for each TU, where the colour scale represents the habitat suitability (HS) from minimum (red) to optimal HS (blue).Black points represent the distribution of the big five occurrence data used in the SMD framework.Reports from the Life MIPP and InNat citizen science projects were selected excluding duplicate occurrences and doubletons within the same ~1 km 2 cell.In (a), the HS map of Morimus asper asper/funereus; in (b), Lucanus cervus; in (c),

Figure 1 .
Figure 1.Maps of the suitable habitat for each TU, where the colour scale represents the habitat suitability (HS) from minimum (red) to optimal HS (blue).Black points represent the distribution of the big five occurrence data used in the SMD framework.Reports from the Life MIPP and InNat citizen science projects were selected excluding duplicate occurrences and doubletons within the same ~1 km 2 cell.In (a), the HS map of Morimus asper asper/funereus; in (b), Lucanus cervus; in (c), Cerambyx cerdo; in (d), Rosalia alpina; in (e), Osmoderma spp.The L. cervus HS map was clipped in central Italy according to its known occurrence data and considering that only L. tetraodon is present in southern Italy.

Figure 2 .
Figure 2. Graphical representation of the niche space for the big five: (a,b) Morimus asper asper/funereus, (c,d) Lucanus cervus, (e,f) Cerambyx cerdo, (g,h) Rosalia alpina and (i,j) Osmoderma spp.The colour scale indicates habitat suitability (HS) according to a two-dimensional environmental space, based on the selected predictors, from low HS (dark red) to high HS (dark blue).Environmental variables are reported

Figure 3 .
Figure 3. Maps of the suitable habitat for the big five, where the colour scale represents the habitat suitability (HS) from minimum (red) to optimal HS (green).In (a), the HSmean map indicates the hot spots for the big five conservation as a whole, derived from the raster layer produced by averaging the HS of each TU.In (b), the HSmax map, produced by selecting the maximum value of each cell among the TUs' HS maps, shows those relevant areas for the conservation of at least one TU (green) and meaningless areas for big five conservation (red).For the HS maps of each TU and the raster layers of HSmean and HSmax please refer to the supplementary material (HS_maps_raster_layers.zip).

Figure 3 .
Figure 3. Maps of the suitable habitat for the big five, where the colour scale represents the habitat suitability (HS) from minimum (red) to optimal HS (green).In (a), the HS mean map indicates the hot spots for the big five conservation as a whole, derived from the raster layer produced by averaging the HS of each TU.In (b), the HS max map, produced by selecting the maximum value of each cell among the TUs' HS maps, shows those relevant areas for the conservation of at least one TU (green) and meaningless areas for big five conservation (red).For the HS maps of each TU and the raster layers of HS mean and HS max please refer to the Supplementary Materials (HS_maps_raster_layers.zip).

Figure 4 .
Figure 4. Maps showing the distribution of the Italian protected areas and their relevance for the conservation of big five's suitable habitat.In (a), each protected area was coloured according to the average HSmean value, dividing the range (min-max) in five HS levels: red = minimum, orange = low, yellow = medium, light green = high and dark green = optimal HS.In (b), the shape of each protected area was coloured according to the average HSmax value, following the same ranking procedure.Average, together with median and maximum value, of both HS maps by protected areas were stored in supplementary material (Protected_areas_ITA_vector_layers.zip).

Figure 5 .
Figure 5. Map showing the habitat suitability hotspot in northern Apennines of the Tuscany region.The area marked by red squares indicates the conservation gap highlighted by overlaying the national protected area boundaries to the HSmean map of the big five.

Figure 4 .
Figure 4.Maps showing the distribution of the Italian protected areas and their relevance for the conservation of big five's suitable habitat.In (a), each protected area was coloured according to the average HS mean value, dividing the range (min-max) in five HS levels: red = minimum, orange = low, yellow = medium, light green = high and dark green = optimal HS.In (b), the shape of each protected area was coloured according to the average HS max value, following the same ranking procedure.Average, together with median and maximum value, of both HS maps by protected areas were stored in Supplementary Materials (Protected_areas_ITA_vector_layers.zip).

Diversity 2023 , 18 Figure 4 .
Figure 4.Maps showing the distribution of the Italian protected areas and their relevance for the conservation of big five's suitable habitat.In (a), each protected area was coloured according to the average HSmean value, dividing the range (min-max) in five HS levels: red = minimum, orange = low, yellow = medium, light green = high and dark green = optimal HS.In (b), the shape of each protected area was coloured according to the average HSmax value, following the same ranking procedure.Average, together with median and maximum value, of both HS maps by protected areas were stored in supplementary material (Protected_areas_ITA_vector_layers.zip).

Figure 5 .
Figure 5. Map showing the habitat suitability hotspot in northern Apennines of the Tuscany region.The area marked by red squares indicates the conservation gap highlighted by overlaying the national protected area boundaries to the HSmean map of the big five.

Figure 5 .
Figure 5. Map showing the habitat suitability hotspot in northern Apennines of the Tuscany region.The area marked by red squares indicates the conservation gap highlighted by overlaying the national protected area boundaries to the HS mean map of the big five.

Table 1 .
Variables used to model habitat suitability of the big five saproxylic beetle species.The results of the variance inflation factor (VIF) stepwise procedure indicate the BIO variables used in the SDM framework for each TU (VIF < 10).