Exploring the Potential Distribution of Relic Trochodendron aralioides : An Approach Using Open-Access Resources and Free Software

: Trochodendron aralioides Siebold & Zucc. is a relic tree that is discontinuously scattered across the mountainous areas of Japan, Taiwan, and South Korea, but the origin of T. aralioides in South Korea is still unclear and debated. To conﬁrm its distribution and explore its origins, we constructed a streamlined framework to examine potential species distribution using multiple open access data and free and open-source software, as well as employing maximum entropy principles to predict the potential distribution of T. aralioides . The results showed reasonably good discrimination and were used to examine and discuss the explicit distribution of T. aralioides . The potential distribution of T. aralioides in Japan extended from Iriomote Island to approximately 37 ◦ N in Honshu on the Paciﬁc Ocean side. In Taiwan, the potential distribution of T. aralioides was more common than in Japan. It occurred at 1500–3000 m a.s.l. across the Central Mountain Range and decreased toward the northern and southern tips, correlating to the descending pattern of the cloud belt. Thermal and moisture conditions were important factors to determine the distribution of T. aralioides . The potential distribution indicated that Jeju island had high potential as a habitat for T. aralioides , and that may indirectly imply its existence and origins in South Korea, as some researchers have noted.


Introduction
Trochodendron aralioides Siebold and Zuccarini (Trochodendraceae Prantl) is a distinctive broad-leaved tree without a vessel in the woody tissue and is the only living species of this genus [1,2]. Although Trochodendron was widely distributed in the Northern Hemisphere during the geological ages in the past [3][4][5][6], the distribution of relic T. aralioides is currently restricted to East Asia [2]. In general, T. aralioides is distributed in the prevalent cloud zones of temperate or subtropical mountains in East Asia, but only a few distribution maps of T. aralioides were found in the Angiosperm Phylogeny website (http://www.mobot.org/MOBOT/research/APweb/; accessed on 19 September 2021) and GBIF georeferenced data map (http://www.gbif.org/species/4191567; accessed on 19 September 2021). The spatially explicit distribution of T. aralioides still remains to be further explored.
Whether T. aralioides is a native species in South Korea is not clear due to the inconsistent descriptions of its natural distribution in different sources. It has been consistently described as a native species in Taiwan and Japan. In Taiwan, T. aralioides is moderately Since the number of occurrences in Japan was relatively small in the Global Biodiversity Information Facility (GBIF) database, we merged it with the occurrence data from the Phytosociological Relevé database (PRDB). PRDB is a phytosociological database composed of relevé data throughout Japan, while the original data sources were digitized from a variety of published and unpublished literature [36]. GBIF was established by recommendation from the Organization for Economic Cooperation and Development in 2001 to encourage free and open access to biodiversity data. Through a global network of countries and organizations, GBIF promotes and facilitates the mobilization, access, discovery, and use of biodiversity information across spatial and temporal scales. Therefore, many researchers (e.g., Guralnick et al. [37]; Porretta et al. [38]) have used records sourced from GBIF search results to represent species occurrence samples, and many front-end interfaces have been developed with programming languages R, Python, and Ruby to access GBIF occurrence data [39]. Since the data were collected by different methods and the data quality would be important for accurate modeling, we adopted the data-cleaning framework employed by Maletic and Marcus [40], which included three steps: (1) to define error types, (2) to identify error instances, and (3) to correct the defined errors. Chapman (2005) also introduced several data-cleaning principles for biodiversity spatial data, such as verifying georeferences against other information, and checking spatial accuracy against other information using GIS, etc. In this study, we defined three error types: (1) occurrence data out of the target area, such as occurrences in the United Kingdom; (2) impossible locality, such as for occurrences in the Pacific Ocean; and (3) duplicate entries. We used QGIS to test for and remove the aforementioned errors to ensure the accuracy of the data used for modeling.

Open-Access Resources and Free Software
Many open-access resources can be accessed via the Internet, including geographical data and software. In this paper, we used the following five freely available online resources (Table 1) to model the distribution of T. aralioides. Figure 2 shows the flowchart of the integrated applications.

Species Occurrences
Since the number of occurrences in Japan was relatively small in the Global Biodiversity Information Facility (GBIF) database, we merged it with the occurrence data from the Phytosociological Relevé database (PRDB). PRDB is a phytosociological database composed of relevé data throughout Japan, while the original data sources were digitized from a variety of published and unpublished literature [36]. GBIF was established by recommendation from the Organization for Economic Cooperation and Development in free and open access to biodiversity data. Through a global network of countries and organizations, GBIF promotes and facilitates the mobilization, access, discovery, and use of biodiversity information across spatial and temporal scales. Therefore, many researchers (e.g., Guralnick et al. [37]; Porretta et al. [38]) have used records sourced from GBIF search results to represent species occurrence samples, and many front-end interfaces have been developed with programming languages R, Python, and Ruby to access GBIF occurrence data [39]. Since the data were collected by different methods and the data quality would be important for accurate modeling, we adopted the data-cleaning framework employed by Maletic and Marcus [40], which included three steps: (1) to define error types, (2) to identify error instances, and (3) to correct the defined errors. Chapman (2005) also introduced several data-cleaning principles for biodiversity spatial data, such as verifying georeferences against other information, and checking spatial accuracy against other information using GIS, etc. In this study, we defined three error types: (1) occurrence data out of the target area, such as occurrences in the United Kingdom; (2) impossible locality, such as for occurrences in the Pacific Ocean; and (3) duplicate entries. We used QGIS to test for and remove the aforementioned errors to ensure the accuracy of the data used for modeling.

Environmental Variables
High-resolution climate data are essential for ecological studies at a global or regional scale, and interpolation methods are typically required with meteorological station data, such as PRISM, WorldClim [28], and CHELSA [29], etc. Many researchers have used World-Clim layers as environmental variables, such as Ashraf et al. [41], Meneguzzi et al. [42], Porto et al. [43], and Rupprecht et al. [44]. However, the CHELSA datasets integrate wind and topological factors to correct for bias, which is why they are considered more robust and accurate than WorldClim datasets [29]. CHELSA provides global climatic layers downscaled to 30 arcseconds (~1 km), which are therefore more precise than the other free global climate databases, such as CliMond [45], for ecological modeling and geographic information system. The climate conditions data include monthly precipitation; mean, minimum, and maximum temperature, as well as altitude; and 19 bioclimatic layers; however, some of the environmental factors are estimated [44]. Therefore, the most common method of variable selection for SDM was based on pairwise Pearson correlation coefficients [46,47]. We excluded bioclimatic variables that had a correlation coefficient > 0.75. However, the determining climatic variables of Trochodendron aralioides were still unclear, so we also used jackknife resampling analysis (a leave-one-out resampling technique to evaluate the environmental variables) to assess the most important variables for model building.

Environmental Variables
High-resolution climate data are essential for ecological studies at a global or regional scale, and interpolation methods are typically required with meteorological station data, such as PRISM, WorldClim [28], and CHELSA [29], etc. Many researchers have used WorldClim layers as environmental variables, such as Ashraf et al. [41], Meneguzzi et al. [42], Porto et al. [43], and Rupprecht et al. [44]. However, the CHELSA datasets integrate wind and topological factors to correct for bias, which is why they are considered more robust and accurate than WorldClim datasets [29]. CHELSA provides global climatic layers downscaled to 30 arcseconds (~1 km), which are therefore more precise than the other free global climate databases, such as CliMond [45], for ecological modeling and geographic information system. The climate conditions data include monthly precipitation; mean, minimum, and maximum temperature, as well as altitude; and 19 bioclimatic layers; however, some of the environmental factors are estimated [44]. Therefore, the most common method of variable selection for SDM was based on pairwise Pearson correlation coefficients [46,47]. We excluded bioclimatic variables that had a correlation coefficient >

GIS: GRASS GIS and QGIS
GRASS GIS is a free GIS software used for geospatial data management and analyses, image processing, cartographic production, spatial modeling, and visualization. GRASS has been developing for more than 34 years, and it includes many geographical algorithms that have been used regularly for environmental modeling [48]. QGIS is also a free GIS desktop software; in addition to its tools for vector and raster processing, the most advanced feature is that it can function as a wrapper of third-party GISs (e.g., call the function or programs of GRASS GIS, Saga GIS, PostgreSQL/PostGIS or even R programs while working inside QGIS). In this study, we used QGIS as the main GIS processing tool and called GRASS functions by a GRASS plugin.

MaxEnt
MaxEnt is a machine-learning method based on maximum entropy principles, and its presence-only method is suitable for species investigation datasets [49]. Although there are many SDM techniques [19,20,50], MaxEnt was selected to model the potential distribution of T. aralioides for the following two reasons. First, the accuracy associated with occurrences has been a common concern when using location data [51,52]. The spatial positions of T. aralioides in GBIF have often been in error. MaxEnt is more robust than other methods, especially in regression-based modeling, in spatial errors of species occurrence [53]. Second, the sample size (occurrence records) of T. aralioides is relatively small in the scale of East Asia. MaxEnt has performed better than other methods with small sample sizes [44,54,55].

Model Building and Validation
We developed the SDMs with MaxEnt, version 3.4.1, using nine important environmental variables and cleaned data. Before predicting our model, we used the ENMeval package of R to evaluate and test a range of parameters to optimize MaxEnt's model parameters [56,57]. We randomly created 10,000 background points and applied a checkboard2 algorithm to create spatial blocks in order to avoid model-overfitting issues in MaxEnt. A set of features including linear (L), quadratic (Q), product (P), threshold (T), and hinge (H) as well as regularization multipliers (RMs) from 0.25 to 1.5 with 0.25 intervals were tested in the ENMevaluate function and to calculate the corrected Akaike information criterion (AICc) and determine an optimized model. Furthermore, in order to achieve statistical significance, we also executed 100 replications with 5000 iterations per replication in MaxEnt to obtain the mean probability of the potential distribution map.
We used the area under the receiver operating characteristics curve (AUC); a thresholdindependent method that characterizes the performance of a model at all possible thresholds by a single number to evaluate the modeling performance [49]. The modeling performance was categorized into three classes, as described by Swets [58]: poor discrimination (0.5 < AUC < 0.7), reasonable discrimination (0.7 < AUC < 0.9), and very good discrimination (0.9 < AUC < 1.0). However, several studies reported that the AUC would not be a good measure of performance in SDMs [59,60], so we also used the AICc from the ENMeval package to increase the discrimination of the model.

Species Dataset
The numbers of T. aralioides records contained in the PRDB and GBIF database obtained in July 2018 were 529 and 1729, respectively (Table S1 from Supplementary Materials). Among them, only 943 records had been georeferenced. We excluded 214 occurrences based on a point-in-polygon check (i.e., joining attributes by location in QGIS) and data-cleaning procedures (e.g., remove duplicates, doubtful records, etc.) available in the QGIS and PostgreSQL databases. Besides the outliers of occurrences, there were several plots located in residential areas, such as in a garden or a park. Therefore, we had 806 occurrence records of T. aralioides with verified coordinates as the species dataset, with 308 records in Taiwan, 498 records in Japan, and none in South Korea (see Figure 1). The species-occurrence samples were exported from the PostgreSQL database via a comma-separated-values (.csv) file, which served as the input data for MaxEnt. The 806 samples were located within an area of occurrence spread across 683 modeling environmental grids (with a resolution of 30 arcseconds) due to the occurrence of several records in the same grid. Table 2 presents the statistical summary of the bioclimatic layers in our modeling area and in the occurrence grids.

MaxEnt Modeling and Performance
The ENMevaluate results revealed that the lowest AICc (18,897.19) was the combination of LQ and LQPT features with an RM = 0.25 ( Figure S1). The average tested AUC was 0.8821, and the trained AUC was 0.8873. The distribution modeling by MaxEnt for T. aralioides obtained reasonable discrimination with an AUC mean value of 0.8577 (0.6738-0.9733) for the training data and 0.8588 (0.8555-0.8619) for the testing data. The modeling distribution map of T. aralioides is shown in Figure 3. The highest probabilities of predicting hot spots were in the Central Mountain Range in Taiwan (including the low elevation areas of the northern and southern tips that include winter northeast monsoon-facing slopes); the mountainous ranges of Honshu, Shikoku, Kyushu, Yakushima Island, and Iriomote Island in Japan; and Jeju Island in South Korea. Figure S2 shows the results of the jackknife test and the training gains for evaluating the relative contributions of the predictor environmental variables for the MaxEnt T. aralioides model. The estimates of the relative contributions of five environmental variables are given in Table S2. Based on the permutation importance, the most important variables were BIO5, BIO12, and BIO1, explaining 36.4, 30.8, and 12.7 of variation in the distribution of T. aralioides, respectively. The aforementioned three variables contributed to almost 80% of our model; the other variables had relatively low contributions in terms of permutation importance. The response curves, shown in Figure 4, further detailed how each of the environmental variables altered the model prediction. relative contributions of five environmental variables are given in Table S2. Based on the permutation importance, the most important variables were BIO5, BIO12, and BIO1, explaining 36.4, 30.8, and 12.7 of variation in the distribution of T. aralioides, respectively. The aforementioned three variables contributed to almost 80% of our model; the other variables had relatively low contributions in terms of permutation importance. The response curves, shown in Figure 4, further detailed how each of the environmental variables altered the model prediction.

Data Quality for Species Occurrence
GBIF was the largest single source of global biodiversity occurrenc study, there were 1729 records of T. aralioides accessed from GBIF's onlin 786 (~45%) records did not have georeferenced information. Most of thes enced records were obtained from herbariums and museums prior to th global positioning systems and would likely have temporal sampling biase paratively, the PRDB data had better quality overall, as compared to the G only 10 occurrences showed georeferenced errors, ~2%), particularly since currence records were based on a phytosociological survey and were digi barium specimens with low geographical resolution. Furthermore, amon records with georeferenced information, 444 (47%) records did not pass g

Data Quality for Species Occurrence
GBIF was the largest single source of global biodiversity occurrence data. In this study, there were 1729 records of T. aralioides accessed from GBIF's online database, but 786 (~45%) records did not have georeferenced information. Most of these non-georeferenced records were obtained from herbariums and museums prior to the invention of global positioning systems and would likely have temporal sampling biases [61,62]. Comparatively, the PRDB data had better quality overall, as compared to the GBIF data (e.g., only 10 occurrences showed georeferenced errors,~2%), particularly since most of the occurrence records were based on a phytosociological survey and were digitized from herbarium specimens with low geographical resolution. Furthermore, among the 943 GBIF records with georeferenced information, 444 (47%) records did not pass geographic validation tests or were determined to be duplicates through the data cleaning process in QGIS (Table S1). This determination was made based on conditions noted by Yesson et al. [63], including undersea coordinates and georeferenced errors, as well as conditions considered in this study, including coordinates located in distinctly residential areas and improbable altitudes (e.g., ranging 300-3000 m a.s.l. in Taiwan) [1,7,64]. The spatial bias of the species records had been widely discussed regarding both theoretical errors and real cases [63,[65][66][67][68][69]. Filtering out these biases in occurrence records would be a pioneering achievement as the bias data heavily influences the accuracy and reliability of the SDMs [70][71][72]. Of the 308 acceptable records in Taiwan, 217 (70.5%) were from vegetation diversity inventories and mapping plans (Chiou et al. [73]; also see Table S1, GBIF dataset key: 9edd4990-4138-11de-b28f-b8a03c50a862). Therefore, integrating vegetation inventory databases, such as the Global Index of Vegetation-Plot Databases (http://www.givd.info; accessed on 19 September 2021) with the 313 databases and 3,630,327 plots registered (date accessed: October 2021), into the GBIF database was a critical factor for broadening and refining biodiversity occurrence data [74,75].

Distribution of T. aralioides 4.2.1. Japan
As shown in Figure 3, T. aralioides was distributed across the mountains of Honshu, Shikoku, and Kyushu. The northeast occurrence of T. aralioides was on the Iwate Prefecture, but there was only one point that would need further validation. However, the probable northern limit of T. aralioides was near Yamagata City, Yamagata Prefecture, such as in Mount Funagata and Mount Zao, at approximately 38 • N as denoted by Wu et al. [13]. Around 35 • N, T. aralioides occurred on the piedmont slopes of Mount Fuji and the southern parts of the Akaishi and Kiso Mountains, but not on the alpine slopes where coniferous trees were dominant. In addition to the mainland of Japan, T. aralioides also occurred on the Izu [76] and Ryukyu Islands [6,7,10,[14][15][16]. On Yakushima Island (30.2 • N, 130.3 • E, Figure 3b), which is one UNESCO's World Heritage Sites largely due to its unique vegetation, T. aralioides was a frequent feature in forest vegetation at 500-1800 m a.s.l., along with Cryptomeria japonica D.Don and Tsuga sieboldii Carrière [77][78][79][80][81][82].
Our results indicated that temperature-related variables, such as BIO5, were very important based on our models, and BIO5 was a limiting factor for the lower or southern distribution of T. aralioides. The potential distribution map (Figure 3) also showed the same pattern, while highly potential habitats were located on the Pacific side of the Honshu region. Figure 4d shows the probability of T. aralioides abruptly decreasing under BIO5 > 23 • C conditions. However, the precipitation-related variables, such as BIO12 (annual precipitation) also played important roles in the distribution of T. aralioides (see Figure 4f), but they required further research that was outside the scope of this study. Comparing the altitudinal ranges of T. aralioides in Japan and Taiwan (Huang et al. [83]; also see Table S1), T. aralioides in Japan (at high latitudes) occurred at lower altitudes than those in Taiwan (at low latitudes). Therefore, the thermal conditions were critical factors in T. aralioides distribution.

Taiwan
In contrast with the potential T. aralioides distribution in Japan, the distribution was relatively wide in Taiwan. It was moderately common in the mid-altitude forests of the Central Mountain Range [7], in the 1500-3000 m a.s.l. regions. Wu et al. [13] noted that central Taiwan was a probable refuge for T. aralioides during the last glacial periods due to those populations having the greatest genetic diversity. The altitudinal range of T. aralioides decreased considerably to the north and south [8,13,64,83,84]. As shown in Figure 3, the lowest altitude of T. aralioides distribution was at approximately 400 m a.s.l. in the northern region, then gradually increased up to 1300 m a.s.l. in the central region, and then finally descended to 500 m a.s.l. in the southern region. The altitudinal pattern of T. aralioides has been correlated with the position of cloud belts [8]. T. aralioides favor cool, humid habitats. In Japan, for example, T. aralioides tended to occur in steep slopes and sometimes near bodies of water in the mountains, whereas in Taiwan, it had extended widely across the mountainous regions, where cloud belts are known to form. T. aralioides has been considered as an indicator for prevalent cloud zones where Quercus oak trees may also be found [8]. Within the cloud belt of the Central Mountain Range, T. aralioides is typically mixed with coniferous trees (e.g., Chamaecyparis formosensis Matsum. and Cunninghamia konishii Hayata) and other broad-leaved trees (e.g., Quercus morii Hayata and Neolistea spp.; [8]). In the northern and southern regions of Taiwan, T. aralioides occurred in areas coinciding with cloud belts, but where coniferous trees were absent, suggesting that both were subject to thermal conditions. In this study, we used the novel effective warmth index (EWI), which associates the temperature sum with the thermal seasonality [85], to describe the T. aralioides habitat and applied the equal training sensitivity and specificity logistic threshold test [86] to classify our model predictions into suitable and unsuitable habitats. As shown in Figure 5, EWI = 160 isoline (yellow line) approximately matched the lowest limit of T. aralioides highly suitable habitat in the Central Mountain Range. Consequently, the distribution of T. aralioides was related to thermal variables and seasonal precipitation rather than to a simple climate factor.

Conclusions
Trochodendron aralioides is an important indicator of prevalent cloud zones in Taiwan as well as Chamaecyparis species, whose disjointed distribution also reflects the historical impacts of prior climate changes. This study constructed a streamlined framework to predict the potential distribution of T. aralioides by applying species distribution modeling with multiple free software and open data sources. Our results showed that the most im-

T. aralioides in South Korea?
There were two opposing views regarding whether T. aralioides is native to South Korea or not. Most of the literature [6,10,11,15,16,87] has noted that the natural distribution of T. aralioides includes Taiwan, Japan, and South Korea. However, Wu et al. [13] and Andrews [14] opposed this point of view, as did Korean scientist Koh [88], all of whom agreed that Trochodendron aralioides did not originate on the Korean peninsula and its adjacent islands, though many foreign publications have suggested that this species could be found in the southern part of Korea. Furthermore, the flora identified in the Hallasan Natural Reserve in Jeju island did not include T. aralioides [89]. However, our results ( Figure 3a) showed that Jeju Island in South Korea had a high probability (around 0.7-0.8) of T. aralioides. Nonetheless, whether T. aralioides is native to South Korea could not be determined and will require additional research.
On one hand, T. aralioides may not be native in South Korea. Our SDM was based on the concept of the realized niche [90] despite the arguments of Pearson and Dawson [91], as well as Soberón [92]. The probability of occurrence did not represent the real occurrence because the actual range was part of the potential range [93,94]. However, our SDM did not consider the potential effects of biotic exclusion, historical contingency, dispersal limitations, land usage, or species ranges that restrict the geographic expression of the fundamental niche [95,96]. Thus, Jeju Island may be a suitable habitat for T. aralioides; even so, T. aralioides may never have spread to there. On the other hand, T. aralioides may be native in South Korea. Brown et al. [97] noted that species tended to prosper in the center of their distribution and become rarer and more restricted to specific habitats towards the distribution margins. Our results indicated Jeju Island was a potential habitat of T. aralioides, but the presence of T. aralioides on the island has never been verified. However, an undiscovered or unrecorded species cannot prove a negative, i.e., that T. aralioides has never existed on Jeju Island. To answer the question about T. aralioides' native origins in South Korea, we could perhaps examine the surveys of Jeju island, especially the natural vegetation in Hallasan National Park. The discovery of a new species requires effort (e.g., vegetation surveys, specimen collections, time, etc.) and, in many cases, fortunate circumstances. In addition, it is possible that T. aralioides found refuge in central Taiwan during the last glacier period [13] and has not yet migrated to and settled in South Korea. Since one of the largest GBIF species occurrence datasets can be found in iNaturalist research-grade data, future research should consider further investigation concerning the origins and spread of T. aralioides in South Korea, namely on Jeju Island.

Conclusions
Trochodendron aralioides is an important indicator of prevalent cloud zones in Taiwan as well as Chamaecyparis species, whose disjointed distribution also reflects the historical impacts of prior climate changes. This study constructed a streamlined framework to predict the potential distribution of T. aralioides by applying species distribution modeling with multiple free software and open data sources. Our results showed that the most important environmental variables that influence the distribution of T. aralioides were BIO5 (maximum temperature of the warmest month), BIO12 (annual precipitation), and BIO1 (annual mean temperature), which implied that the maximum temperature of the warmest month would be the limitation factor and may be vulnerable to climate changes. In general, the distribution of T. aralioides in Japan was not common, except in the mountainous area of Yakushima Island. It was, however, common in the cloud zones in Taiwan. Therefore, conservation strategies can be developed based on the potential distribution probabilities of different countries. Regarding the uncertainty about the origins of T. aralioides in South Korea, we suggest that future research should consider examining Jeju island further, based on the results of our SDM, and develop a complete distribution map according to free, expansive data sources, such as iNaturalist.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/f12121749/s1, Table S1: The occurrence table of Trochodendron aralioides, Table S2: Average relative contribution and permutation importance of the environmental variables of the MaxEnt model, Figure S1: Delta AICc of regularization multipliers for optimization of models, Figure S2: Results of jackknife analyses of regularized training gain (a) and test gain (b) for Trochodendron aralioides.