1. Introduction
Fireflies have long been considered familiar beings in Korea due to their light-emitting properties and have been recognized as insects with emotional symbolism [
1]. In addition, because their habitat is limited and they can survive only in ecologically stable environments, they are highly valued as environmental indicator insects that reflect the level of environmental pollution [
1,
2,
3]. However, despite these ecological characteristics and environmental sensitivity, industrialization and urbanization have damaged firefly habitats through reckless development, environmental destruction, ecosystem disturbance, and landscape damage, and both the population and the habitat area are rapidly decreasing [
3,
4,
5]. In particular, the number and intensity of artificial light sources are increasing, which reduces mutual recognition opportunities between females and males, thereby decreasing the population [
5,
6,
7]. Consequently, fireflies are increasingly valued as environmental indicators of the extent of environmental pollution and the need for restoration [
4].
The most commonly encountered fireflies in South Korea are
Luciola lateralis, Lychnuris rufa, and
Luciola unmunsana; conservation studies have mainly focused on
Luciola lateralis and
Lychnuris rufa [
8]. Of these,
Luciola unmunsana is endemic to South Korea [
9] and is difficult to collect because females lack hindwings and have low mobility [
6,
10]; therefore, there is a relative lack of research on its distribution and restoration compared to other firefly species in South Korea.
Currently, species distribution models (SDMs) are used in various studies, including biodiversity assessment, protected area designation, habitat management and restoration, population or community ecosystem modeling, and climate change prediction [
11]. In particular, by identifying the geographical distribution and properties of populations, they reveal priority areas for protection and potentially threatened areas, and thus provide important information for establishing conservation plans and management measures [
12,
13]. Maximum Entropy (MaxEnt), a single model, is effective for modeling the potential distribution of rare and endangered species because it performs better with small sample sizes than other species distribution modeling methods, and it is widely used in Korea and abroad because it can estimate the ecological niche of a species from occurrence information alone [
13,
14]. However, the accuracy of single models applied alone has been questioned because different algorithms produce different predictions [
15]. Therefore, ensemble models that integrate multiple models have recently been used because they offset the shortcomings of individual models and reduce their uncertainties [
11,
13]. Accordingly, a number of studies have used MaxEnt and ensemble models to predict potential habitats for specific species [
11,
16,
17,
18,
19,
20]. However, these prior studies were conducted primarily for endangered or tree-damaging pest species, and few studies have been conducted to predict potential habitats for species with emotional/cultural value and environmental indicator properties, such as fireflies. Abroad, habitat prediction studies using SDMs have been actively conducted on various firefly species such as
Luciola cruciata,
Photinus signaticollis,
Atyphella Olliff [
21,
22,
23]. These studies consider fireflies to be environmental indicator species and actively utilize SDM results in habitat characteristic analysis and conservation area designation. Reflecting this research trend, this study also predicted potential habitats for
Luciola unmunsana, a species native to Korea.
Regarding
Luciola unmunsana habitat characteristics and restoration, studies have been conducted by the Daegu Gyeongbuk Research Institute (2012) [
24], the Daegu Provincial Environment Agency (2015) [
25], and Kim (2015) [
8]. More recent studies have been conducted by the Jeonbuk Green Environment Support Center (2021) [
26], Jeonbuk Green Environment Support Center (2022) [
10], and Lim et al. (2022) [
27]. However, these studies analyzed specific occurrence points or limited administrative areas, and none performed analysis at a national spatial scale.
Therefore, the aim of this study was to predict the potential habitats of Luciola unmunsana, a major environmental indicator species in South Korea, by creating a species distribution model for the entire country. We expect these results to serve as baseline data for surveying the occurrence of Luciola unmunsana in South Korea.
2. Materials and Methods
The spatial scope of this study was set to South Korea to predict potential habitats for
Luciola unmunsana. The temporal range reflects a 30-year climate normal, a period length generally regarded as sufficient for reliable estimates [
28].
In addition, we reviewed environmental factors influencing the habitat of Luciola unmunsana based on previous studies and reflected them in the construction of ecoclimatic indices, topographic variables, and land cover maps. To ensure consistency in the analysis, all variables were standardized to a spatial resolution of 1 km × 1 km.
In many previous studies [
21,
22,
23] on fireflies, species distribution models such as MaxEnt or Random Forest have often been used individually depending on the study’s objectives and species characteristics, which can lead to limitations in terms of prediction variability and uncertainty between models. To address these issues, this study applied both a MaxEnt model and an ensemble model using MaxEnt 3.4.4 and RStudio 4.2.1. Based on these models, we analyzed the contribution and importance of the environmental variables affecting the predicted potential habitats of
Luciola unmunsana, and also evaluated the prediction accuracy of both models.
2.1. Building Input
2.1.1. Occurrence Point Data
To build a species distribution model, we needed data on the target species’ occurrence points; in this study, we obtained
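Luciola unmunsana occurrence points from the Jeonbuk Green Environment Support Center (JGESC), the Global Biodiversity Information Facility (GBIF; survey period 2000–2004), and the National Institute of Biological Resources (NIBR) [
29]. In addition, we georeferenced the occurrence points reported in previous studies on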
Luciola unmunsana [
8,
25], yielding GPS coordinates for 39 occurrence points in total (
Figure 1).
2.1.2. Ecological Climate Index
In general, the data for the Ecological Climate Index comprise global-scale input data [
30] provided by WorldClim, Climatologies at High Resolution for the Earth’s Land Surface Areas (CHELSA), and CliMond (global climatologies for bioclimatic modeling). However, to improve the accuracy of the analysis, this study utilized ecoclimatic index data based on the shared socioeconomic pathway (SSP) scenario [
31] at a 1 km resolution produced by the Korea Rural Development Administration. This dataset was generated using the 19 bioclimatic indices (Bio01–Bio19) proposed by O’Donnell and Ignizio (2012) [
32] (
Table A2). Most previous studies based on SSP and RCP scenarios [
33,
34] have applied the 1981–2010 period as the temporal range for current bioclimatic variables. In contrast, nonclimatic variables such as topography and land use are generally based on more recent data sources [
35,
36]. Accordingly, this study applied bioclimatic variables based on 1981–2010 climate data.
When modeling using the Ecological Climate Index, a high correlation between variables can reduce efficiency and adversely affect the interpretation of results [
37,
38]. Therefore, multicollinearity between variables was addressed using Pearson’s correlation coefficient, the most widely used statistic for measuring the correlation between variables on an interval or ratio scale [
39]. In this study, Pearson’s correlation coefficients were computed in RStudio 4.3.3, and variables with an absolute correlation of 0.85 or higher were excluded, resulting in the selection and analysis of Bio01, Bio02, Bio04, Bio12, Bio14, and Bio15.
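The correlation-based screening described above can be sketched as follows. This is a minimal illustration only: the greedy keep-or-drop order, the function names, and the sample values are assumptions made for the example and are not the procedure or data used in this study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(variables, threshold=0.85):
    """Keep a variable only if |r| with every already-kept variable is below threshold.

    The greedy order (first listed, first kept) is an assumption of this sketch.
    """
    kept = []
    for name in variables:
        if all(abs(pearson(variables[name], variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical values sampled at five grid cells (not data from this study).
layers = {
    "Bio01": [10.0, 11.0, 12.0, 13.0, 14.0],
    "Bio05": [20.1, 22.0, 24.2, 25.9, 28.0],  # nearly collinear with Bio01
    "Bio12": [1300.0, 1100.0, 1500.0, 1200.0, 1400.0],
}
selected = drop_correlated(layers)
print(selected)
```

In practice, which member of a correlated pair is retained is usually decided on ecological grounds rather than by listing order.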
2.1.3. Terrain Variables
In general, fireflies occur at high densities in low-slope sites [
25]. These slopes are less prone to soil runoff, allowing the accumulation of organic matter and moisture, which can lead to diverse vegetation [
19]. In particular, fireflies prefer dark and shady environments and thrive in areas with diffused light or short periods of sunlight [
24].
Luciola unmunsana is also generally found in terrains where stable humidity can be maintained, such as forest edges on gentle slopes, which tend to be located near water resources, such as streams and ponds, or around broadleaf forest stands with multi-layered vegetation that are often associated with agricultural ditches and streams [
25,
26,
40]. Terrestrial snails, the main food source for
Luciola unmunsana, are found in shady forests with little direct sunlight and stable humidity [
26].
Therefore, to derive nonclimatic variables affecting the habitat of
Luciola unmunsana, a digital elevation model (DEM) with a resolution of 90 m × 90 m was obtained from CGIAR-CSI [
41], and slope and shade gradient analyses were conducted. In addition, a water network analysis map was generated using the Environmental Big Data Platform [
42] to derive variables related to the distance from water bodies.
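As a rough illustration of how slope can be derived from a DEM, the sketch below applies central differences to a small elevation grid. The function name, the choice of central differences (GIS packages often use a 3 × 3 weighted kernel instead), and the toy elevations are assumptions for the example, not the tool or data used here.

```python
import math

def slope_degrees(dem, cell_size):
    """Slope in degrees at interior cells, using central differences.

    Edge cells are left as None because they lack neighbours on one side.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out

# Hypothetical 3 x 3 elevation grid (m) on 90 m cells, rising 90 m per cell eastward.
dem = [
    [100, 190, 280],
    [100, 190, 280],
    [100, 190, 280],
]
slope = slope_degrees(dem, cell_size=90)
print(slope[1][1])
```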
2.1.4. Land Cover Map
To reflect land cover and use in Korea, we used WorldCover V2 2021, a 10 m × 10 m spatial resolution land cover map provided by the European Space Agency (ESA) [
43]. It is based on Sentinel-1 and Sentinel-2, and has an overall accuracy of 76.7% [
44].
The results from the Daegu Provincial Environment Agency (2015) [
25] showed that
Luciola unmunsana occurred mainly in coniferous forests with mixed broadleaf trees and in areas dominated by coniferous forests. In addition,
Luciola unmunsana is found in broadleaf forests, but its food source is also found in forests such as bamboo forests and coniferous forests [
10]. Therefore, in this study, we used the tree cover class of the land cover map and also retained the data for non-forested areas, because the model response in non-forested areas is expected to affect the prediction of potential habitats in forested areas.
2.1.5. Enhanced Vegetation Index (EVI)
As vegetation develops, dead leaves and fallen branches accumulate on the surface, and microorganisms in the soil decompose them, increasing the organic matter content [
25]. This increases water retention during rainfall, creating conditions for
Luciola unmunsana larvae to live under fallen leaves, fallen branches, organic matter layers, and stones [
25]. Therefore, in this study, the vegetation index (VI) was additionally entered to reflect information on vegetation abundance and vegetation vigor in the species distribution model.
The EVI is an index developed to correct for atmospheric conditions, canopy background effects, and saturation in areas with high vegetation density; it provides an improved vegetation index by using atmospheric correction factors, a canopy background adjustment factor, and blue-band reflectance [
45]. Compared to the Normalized Difference Vegetation Index (NDVI), it reduces errors due to atmospheric residuals and can be used more effectively in seasonal and process-based models of forest vegetation [
46,
47,
48]. Therefore, in this study, the EVI was used to construct a species distribution model.
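For reference, the standard MODIS formulation of the EVI can be written as a short function. The coefficient values (G = 2.5, C1 = 6, C2 = 7.5, L = 1) are those of the standard MODIS product; the function name and the reflectance values are hypothetical and serve only to illustrate the calculation.

```python
def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, l=1.0):
    """Enhanced Vegetation Index from surface reflectances (each in 0-1).

    EVI = G * (NIR - Red) / (NIR + C1 * Red - C2 * Blue + L)
    """
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + l)

# Hypothetical reflectances for a densely vegetated pixel.
value = evi(nir=0.5, red=0.1, blue=0.05)
print(round(value, 4))
```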
To obtain the EVI, we used monthly averaged values for 2022 from Moderate Resolution Imaging Spectroradiometer (MODIS) [
49,
50] satellite imagery.
2.2. Creation of SDM
2.2.1. MaxEnt
The MaxEnt model is a machine learning model based on the Maximum Entropy Approach, first introduced by Berger et al. (1996) [
51], which estimates the most probable distribution from incomplete information by maximizing entropy [
19] and shows high accuracy compared to other SDMs that use only species occurrence data [
52,
53,
54]. It is also commonly used in species distribution modeling because it can represent nonlinear, non-parametric relationships between variables [
55,
56].
Although MaxEnt provides default parameter settings, these do not always yield an optimal model [
57,
58]. Therefore, it is necessary to compare model complexity across different parameter combinations and to select a combination with lower complexity in order to optimize the model. Two main selectable parameters affect model performance: feature class (FC) and regularization multiplier (RM) [
59]. FC refers to a set of mathematical transformations of the independent variables used in the model, and there are five types: linear (L), quadratic (Q), product (P), hinge (H), and threshold (T) [
55,
60]. RM is a numerical parameter that controls the strength of the penalty applied to the features used in the model; larger values produce smoother, simpler models, whereas smaller values allow more complex fits [
33,
61]. The corrected Akaike information criterion (AICc) is a statistic that quantifies the degree of discrepancy between the true and candidate models, reflecting both the fit and the complexity of the model; the model with the minimum AICc value, that is, a delta AICc of zero, is considered the best model [
62,
63].
In this study, 60 models were generated using six FCs (L, LQ, H, LQH, LQHP, and LQHPT) and 10 RMs (0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, and 5). MaxEnt 3.4.4 and the ENMeval package were run together in RStudio with 10-fold cross-validation, and the model with a delta AICc value of ‘0’ was finally selected.
In the modeling settings, FC was set to LQHP, RM was set to 3, and ‘Replicated run type’ was set to ‘Bootstrap’ with 10 replicates.
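The tuning procedure above amounts to enumerating the 6 × 10 grid of settings and ranking the candidates by AICc. The sketch below shows that logic with the textbook AICc formula; the log-likelihoods and parameter counts are hypothetical, and ENMeval computes the likelihood for MaxEnt models in its own way, so this is an illustration of the selection rule rather than a reproduction of the package.

```python
from itertools import product

def aicc(log_likelihood, k, n):
    """Corrected AIC: 2k - 2 ln(L) + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)

feature_classes = ["L", "LQ", "H", "LQH", "LQHP", "LQHPT"]
reg_multipliers = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5]
grid = list(product(feature_classes, reg_multipliers))  # 60 candidate settings

# Hypothetical (log-likelihood, number of parameters) for three of the candidates.
fitted = {
    ("LQ", 1): (-250.0, 6),
    ("LQHP", 3): (-240.0, 9),
    ("H", 0.5): (-238.0, 20),
}
n_occurrences = 39
scores = {key: aicc(ll, k, n_occurrences) for key, (ll, k) in fitted.items()}
best = min(scores, key=scores.get)
delta = {key: value - scores[best] for key, value in scores.items()}
print(len(grid), best, delta[best])
```

In this toy example the most complex candidate has the highest likelihood but is penalized heavily because the sample is small, which is the behaviour that motivates AICc for small occurrence datasets.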
Background points used during model training represent environmental information from locations where the presence of the species has not been confirmed, unlike occurrence points [
61]. The ‘Max number of background points’ is a parameter that limits the number of these background samples, and it is generally set to 10,000 when the number of available background points exceeds that threshold [
64]. In this study, 98,928 potential background points were identified within the spatial extent, and the number was set to 10,000. The results were generated using logistic output.
2.2.2. Ensemble Model
Ensemble modeling is a recently developed approach to predicting species distributions that combines multiple algorithms and statistical models to reduce the uncertainty of a single model. It has been proposed to improve outcomes such as predictions of the current distribution of species, patterns of species richness, and species diversity [
11,
65,
66]. It has the advantage of providing a variety of validation methods for the model that can overcome the shortcomings of other commonly used models, and is currently widely used [
66].
In this study, RStudio 4.3.3 was used to perform the analysis for developing an ensemble model to predict the potential habitats of
Luciola unmunsana. The analysis employed the Biomod2 package within RStudio, and following the approach of previous studies by Liu et al. (2019) [
67] and Čengić et al. (2020) [
68], 1000 pseudo-absence points were randomly generated based on 39 occurrence records to construct pseudo-absence data for model training.
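We used six individual models to build the ensemble model, selected from those provided by Biomod2: the generalized linear model (GLM), generalized boosting model (GBM), classification tree analysis (CTA), flexible discriminant analysis (FDA), multivariate adaptive regression splines (MARS), and random forest (RF). The selected models require both presence and absence data as input and are known to be more accurate than models based on presence data alone [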
13].
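One simple way to combine several binary habitat predictions is to keep only the cells that every model predicts as suitable. The sketch below illustrates this overlap rule on toy data; the function name and the cell values are hypothetical, the rule is shown as one possible combination scheme, and it does not use or reproduce the Biomod2 API, which also offers other schemes such as mean and weighted mean.

```python
def consensus_overlap(binary_maps):
    """Cell-wise overlap: 1 only where every model predicts suitable habitat."""
    return [int(all(cells)) for cells in zip(*binary_maps.values())]

# Hypothetical binary predictions (1 = suitable) for five 1 km x 1 km cells.
predictions = {
    "GLM":  [1, 1, 0, 1, 0],
    "GBM":  [1, 1, 1, 1, 0],
    "CTA":  [1, 0, 1, 1, 0],
    "FDA":  [1, 1, 1, 1, 1],
    "MARS": [1, 1, 0, 1, 0],
    "RF":   [1, 1, 1, 1, 0],
}
overlap = consensus_overlap(predictions)
area_km2 = sum(overlap) * 1  # each cell covers 1 km^2
print(overlap, area_km2)
```

Because a single dissenting model removes a cell, an overlap rule of this kind tends to yield a smaller predicted area than any individual model.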
2.3. Model Accuracy Validation
2.3.1. MaxEnt Model Accuracy Validation
To verify the accuracy of the MaxEnt model, an area under the curve (AUC) test was performed. The receiver operating characteristic (ROC) curve plots the rate at which occurrences are correctly predicted (true positive rate) against the rate at which non-occurrences are incorrectly predicted as occurrences (false positive rate), and accuracy is measured as the area under this curve [
17,
69]. In general, AUC values are interpreted as follows: 0.5–0.6 (failure), 0.6–0.7 (no value), 0.7–0.8 (poor), 0.8–0.9 (good), and >0.9 (excellent) [
70].
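The AUC can equivalently be read as the probability that a randomly chosen occurrence point receives a higher suitability score than a randomly chosen background point. The sketch below computes it directly from that definition; the function name and the scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """AUC as the share of (presence, background) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Hypothetical suitability scores (not results from this study).
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.2, 0.1]
score = auc(presence, background)
print(round(score, 4))
```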
2.3.2. Ensemble Model Accuracy Validation
Kappa, the true skill statistic (TSS), and the AUC were used to verify the accuracy of the ensemble model. Kappa measures the overall accuracy of model predictions while correcting for agreement expected by chance. It is mainly used to validate accuracy against occurrence and non-occurrence data [
11]. In particular, it is widely used as a means of validating models in ecology and validating the accuracy of land cover classification using satellite imagery [
11,
71]. The value of the coefficient ranges between −1 and 1, with values of 0.2 or less indicating poor agreement, values of 0.21 to 0.4 indicating fair agreement, values of 0.41 to 0.6 indicating moderate agreement, values of 0.61 to 0.8 indicating high agreement, and values of 0.81 to 1 indicating perfect agreement [
72].
The TSS value includes an assessment of the accuracy of both the occurrence and non-occurrence data. Unlike the AUC, it is not dependent on the distribution area or shape of the target species and is therefore often used to validate SDMs [
71]. A TSS coefficient value of 0.4 to 0.6 indicates ‘moderate agreement’, of 0.6 to 0.7 indicates ‘high agreement’, and of 0.7 or higher indicates ‘near agreement’ [
12].
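Both statistics can be computed from a confusion matrix of predicted versus observed occurrence. The sketch below uses the standard definitions of Cohen's Kappa and of the TSS (sensitivity + specificity − 1); the function name and the counts, chosen to total 39 presences and 1000 pseudo-absences, are hypothetical and are not the validation results of this study.

```python
def kappa_and_tss(tp, fp, fn, tn):
    """Cohen's Kappa and true skill statistic from confusion-matrix counts."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return kappa, sensitivity + specificity - 1

# Hypothetical counts: 39 presences and 1000 pseudo-absences.
kappa, tss = kappa_and_tss(tp=35, fp=50, fn=4, tn=950)
print(round(kappa, 3), round(tss, 3))
```

With these counts the TSS is considerably higher than Kappa, which illustrates that Kappa is sensitive to the imbalance between presences and absences whereas the TSS is not.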
4. Discussion
This study utilized a single model, MaxEnt, and an ensemble model to predict the potential habitats of
Luciola unmunsana. Importance analysis of the variables in both models showed that the EVI, land cover, hydrological network analysis, and annual precipitation (Bio12) were highly important. According to the response curve analysis for each variable, the response value for the EVI increased as vegetation vigor increased. In the water network analysis, the response decreased as the distance from water bodies increased, which is consistent with the ecological characteristics of
Luciola unmunsana, a species that prefers moist environments near water sources [
25,
27,
40]. In the case of land cover maps, the response was higher in forested areas than in non-forested areas, and the response increased as annual precipitation increased.
When the highly important variables were overlaid on the predicted potential habitat, the habitat corresponded to EVI values of 0.4 to 0.5, distances from water bodies of 0–100 m, and annual precipitation of 1500–2000 mm. Taken together, these results suggest that the most suitable areas for Luciola unmunsana are forested areas relatively close to water systems, where humidity is stable.
The predicted area of the potential habitats was smaller in the ensemble model (6971 km²) compared to the MaxEnt model (8785 km²). This result was likely due to the tendency of the MaxEnt model to overestimate potential habitats, as in previous studies [
74,
75], and the fact that the ensemble model only identified potential habitats where the predictions of all six models used to build the model overlapped.
5. Conclusions
The aim of this study was to predict the potential habitats of Luciola unmunsana, a major environmental indicator species in South Korea. To this end, we determined the occurrence points of Luciola unmunsana and predicted potential habitats using MaxEnt and ensemble models for South Korea. To predict potential habitats, we reviewed the main environmental factors that affected the habitat of Luciola unmunsana in previous studies and utilized them as variables for analysis. Subsequently, the contribution and significance of the variables were evaluated, and the prediction accuracy of the two models was verified.
The main findings of this study are as follows: First, both models showed that the EVI, hydrological network analysis, land cover, and annual precipitation (Bio12) were relatively influential in predicting Luciola unmunsana potential habitats. The response curve analysis of MaxEnt showed that the response value increased as the EVI increased, and the response tended to decrease with increasing distance from water systems. In the case of the land cover map, the response was higher in forested areas, and the response value increased with higher annual precipitation.
Second, we overlaid the predicted potential habitats with the variables that showed high importance in determining their distribution and found that forested areas with high vegetation vigor, close proximity to water systems, and relatively high annual precipitation, which allows humidity to remain stable, were identified as potential habitats for
Luciola unmunsana. These results are consistent with the ecological characteristics of
Luciola unmunsana, which prefers forest edges or low-light forests with developed understory vegetation and stable humidity [
25,
27,
40], as well as the habitat characteristics of its main food source, terrestrial snails [
26].
Third, literature surveys confirmed the occurrence of
Luciola unmunsana in areas such as Geumsan-gun, Chungcheongnam-do [
76,
77], Yeongam-gun, Jeollanam-do [
78], Mudeungsan Mountain, Gwangju-si [
79], and Gijang-gun, Busan-si [
80], which were predicted as potential habitats but were not included as occurrence points in the model. In the model accuracy validation, the MaxEnt model was evaluated as ‘good,’ with an AUC value of 0.810. The ensemble model was evaluated as ‘good’ with a Kappa value of 0.741 and as ‘near agreement’ with a TSS value of 0.808, and its AUC value of 0.961 was evaluated as ‘excellent.’ Given this relatively high model accuracy, the potential habitat predictions of this study can be considered reliable, and we believe that key habitats were predicted even in areas for which no occurrence points were entered.
This study is significant in that it is the first to establish a national-level species distribution model for
Luciola unmunsana, which is declining in population owing to industrialization and urbanization, and to predict potential habitats by applying various environmental variables reflecting ecological characteristics, thereby providing a foundation for the conservation and utilization of a species of ecological and cultural importance in Korea. In particular, by applying both a single model (MaxEnt) and an ensemble model, this study offers more reliable predictions that can serve both academic and practical purposes in habitat conservation efforts. However, the 39 occurrence points used in this study constitute a relatively small sample for an analysis at the national scale. In general, when the number of samples is small, the number of environmental variables that can be included in the model is limited [
81] and the risk of overfitting increases [
81,
82]. Moreover, with a limited number of occurrence points, it is challenging to achieve an even spatial distribution, which may lead to spatial bias if data are concentrated in specific areas [
83,
84]. Therefore, to improve the accuracy and generalizability of future species distribution models, it is necessary to collect a larger number of occurrence points and construct models using spatially well-distributed data. Furthermore, when on-site surveys based on the derived potential habitats are carried out to preserve and restore habitats, it is necessary to confirm that the species actually occurs there and to comprehensively assess the physical and ecological characteristics of each site, the threats to it, and the surrounding land use. In addition, all variables were resampled to a spatial resolution of 1 km × 1 km to analyze the whole of South Korea. Consequently, a single pixel may contain various environmental and topographical characteristics, and some details may have been lost. Therefore, future studies at regional extents may need to use input variables with higher spatial resolutions to improve the precision and predictive power of the model.