Distribution of Suitable Habitats for Soft Corals (Alcyonacea) Based on Machine Learning

Dong, Minxing; Yang, Jichao; Fu, Yushan; Fu, Tengfei; Zhao, Qing; Zhang, Xuelei; Xu, Qinzeng; Zhang, Wenquan

doi:10.3390/jmse12020242

by Minxing Dong (1,2,3), Jichao Yang (1), Yushan Fu (2,3), Tengfei Fu (2,3,*), Qing Zhao (1), Xuelei Zhang (4), Qinzeng Xu (4) and Wenquan Zhang (5,*)

(1) College of Ocean Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

(2) Key Laboratory of Marine Geology and Metallogeny, First Institute of Oceanography, Ministry of Natural Resources, Qingdao 266061, China

(3) Key Laboratory of Deep Sea Mineral Resources Development, Shandong (Preparatory), Qingdao 266061, China

(4) Key Laboratory of Marine Eco-Environmental Science and Technology, First Institute of Oceanography, Ministry of Natural Resources, Qingdao 266061, China

(5) National Deep Sea Center, Ministry of Natural Resources, Qingdao 266237, China

(*) Authors to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2024, 12(2), 242; https://doi.org/10.3390/jmse12020242

Submission received: 17 November 2023 / Revised: 24 January 2024 / Accepted: 25 January 2024 / Published: 30 January 2024

(This article belongs to the Section Marine Ecology)

Abstract

The soft coral order Alcyonacea is common in the deep sea and plays a crucial role in deep-sea ecosystems. This study aims to predict the distribution of Alcyonacea in the western Pacific Ocean using four machine learning-based species distribution models and to evaluate the performance of these models. The results indicate high consistency among the predictions of the different models. Alcyonacea is primarily distributed in the Kuril Basin, the Japan Trench, and the Kuril Trench. Water depth and silicate content are identified as important environmental factors influencing the distribution of Alcyonacea. The RF, Maxent, and XGBoost models achieve high accuracy, with the RF model exhibiting the highest prediction accuracy. However, the Maxent model outperforms the other three models in computational efficiency. Developing a high-resolution, high-accuracy, and high-precision habitat suitability model for soft corals can provide a scientific basis and reference for China’s deep-sea exploration and research and aid in the planning of protected areas on the high seas.

Keywords:

species distribution models; machine learning; maximum entropy model; random forest; Alcyonacea

1. Introduction

Species distribution models (SDMs) are models that link the distribution of target species with environmental data to estimate their distribution in a specific area. These models have been widely used in ecology and conservation, providing support for predicting the range of invasive species, studying the impact of climate change on species distribution, and planning biological reserves [1].

There are various modeling methods for SDMs, each with different principles and algorithms, and the choice of method can greatly influence the prediction results. Studies have shown that machine learning-based models (such as Maxent and RF) outperform traditional regression models (such as generalized additive models, GAM, and generalized linear models, GLM) [2,3]. However, the application of machine learning-based SDMs in the marine environment is still being explored because of the distinctive characteristics of the ocean, such as depth, hydrostatic pressure, light, and complex hydrodynamics. Although most studies have focused on terrestrial organisms, hundreds of papers on marine SDMs have been published, many of which compare model performance [4].

Deep-sea corals, which are foundation species in the oceans, are typically found at depths greater than 1000 m and are concentrated in global deep-sea biodiversity hotspots [5]. They play a crucial role in deep-sea benthic environments, enhancing complexity and biodiversity [6]. Deep-sea coral ecosystems are Vulnerable Marine Ecosystems (VMEs) and are highly vulnerable to global climate change and human activities, such as deep-sea mineral resource exploration and development [7]. Due to their slow growth and slow recovery from damage, deep-sea coral ecosystems require urgent management and protection [8].

Traditionally, the distribution of deep-sea corals has been estimated from fisheries surveys or catch records [9]. Advances in technology now allow more accurate information on the distribution and density of deep-sea corals to be obtained through underwater imaging from manned submersibles or remotely operated vehicles. Despite its accuracy, this method is expensive and time-consuming, which limits it to small-scale studies [10,11]. At present, the regions surveyed with visual methods account for only a small portion of the global ocean. Although previous habitat suitability models for deep-sea corals have improved substantially in terms of environmental variable resolution, number of presence points, pseudo-absence selection methods, model selection, and spatial coverage, there is still an urgent need for a high-resolution, high-accuracy, and high-precision deep-sea coral habitat suitability model, both to understand the potential spatial distribution of deep-sea corals and to explore the mechanisms by which environmental factors shape that distribution.

In this study, we focused on the soft coral order Alcyonacea (class Anthozoa) and examined its distribution in the western Pacific Ocean. We constructed distribution models for Alcyonacea using four modeling techniques: an artificial neural network (ANN), the maximum entropy model (Maxent), random forest (RF), and XGBoost. By comparing the performance of these models, we aimed to identify the most suitable modeling method for deep-sea species. This research provides practical recommendations for the future development of high-resolution, high-accuracy, and high-precision global habitat suitability models. It also offers insights for the selection and delineation of marine protected areas, marine spatial planning, marine ecological protection, and the design of marine scientific surveys, and it serves as a scientific basis for China’s deep-sea surveys and research and for the planning of protected areas on the high seas.

2. Materials and Methods

2.1. Materials

2.1.1. Study Area

The study area spans from 10.00° N to 50.00° N and from 120.00° E to 180.00° E (Figure 1). Its seafloor topography is complex, including features such as the Philippine Basin, the Mariana Trench, the Japan Trench, and the Kyushu-Palau Ridge. The area is known for its rich biodiversity, including deep-sea corals, deep-sea sponges, dolphins, seals, mollusks, and numerous other species.

2.1.2. Different Species Distribution Models

In this study, we selected four machine learning algorithms to construct the species distribution model: maximum entropy model [12], random forest [13], ANN [14], and XGBoost [15]. Maximum entropy model, random forest, and ANN have been widely used in previous studies, while XGBoost is a newly proposed method that is currently undergoing rapid development.

1. Maximum entropy models

The maximum entropy model (Maxent) was introduced by Phillips et al. [12]. It combines the principle of maximum entropy with species distribution modeling and has been continuously updated and improved since its inception. Compared with other species distribution models, Maxent offers better accuracy with limited data and uses regularization to prevent overfitting. Maxent uses species occurrence and environmental data to define constraints on the species’ ecological niche and then finds the probability distribution over the study area that has maximum entropy, H(X), subject to these constraints. This maximum-entropy distribution is used as the species distribution model at the geographical scale.
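The constraint-matching idea can be illustrated with a toy fit. The sketch below is a simplification, not the Maxent software: it assumes a handful of background cells and presence cells described by features already scaled to [0, 1], uses linear features only (no quadratic, product, threshold, or hinge features), and omits regularization. It finds weights λ for a Gibbs distribution p(x) ∝ exp(λ·f(x)) over the background cells by gradient ascent; at the optimum the expected features under p equal the mean features at the presences, and that distribution is the maximum-entropy one satisfying those constraints.

```python
import math


def gibbs(lam, background):
    """Probability of each background cell under p(x) proportional to exp(lam . f(x))."""
    scores = [sum(l * v for l, v in zip(lam, row)) for row in background]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def fit_maxent(background, presence, lr=0.5, n_iter=2000):
    """Fit lam so that feature expectations over background cells match presence means."""
    n_feat = len(background[0])
    # Empirical constraints: mean of each feature over the presence cells
    target = [sum(row[k] for row in presence) / len(presence) for k in range(n_feat)]
    lam = [0.0] * n_feat
    for _ in range(n_iter):
        p = gibbs(lam, background)
        expected = [sum(pi * row[k] for pi, row in zip(p, background)) for k in range(n_feat)]
        # Gradient of the mean presence log-likelihood is (target - expected)
        lam = [l + lr * (t - e) for l, t, e in zip(lam, target, expected)]
    return lam, gibbs(lam, background)


def entropy(p):
    """Shannon entropy H = -sum(p ln p) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)
```

With one feature and presences at 0.75 and 1.0, the fitted distribution shifts weight toward high feature values and its entropy falls below that of the uniform distribution, log 5.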

2. Random forest models

The Random Forest (RF) model, introduced by Breiman [13], is a machine learning algorithm that combines multiple decision trees. Each decision tree is trained independently, and the final output is determined by voting or averaging the predictions of the individual trees. The RF model enhances model diversity through random sampling and feature selection, reducing the risk of overfitting. It is particularly effective in handling high-dimensional data and large-scale datasets.
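The two sources of randomness described above (bootstrap samples and random feature subsets), followed by majority voting, can be sketched in plain Python. For brevity each tree here is a single-split decision stump, so the per-node feature draw (mtry) happens once per tree; this is a simplifying assumption for readability, and real implementations such as the randomForest package grow much deeper trees.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Best single split among the given features, scored by misclassification count."""
    best = None
    for f in features:
        for t in sorted(set(row[f] for row in X)):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(v != left_label for v in left) + sum(v != right_label for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no usable split: predict the majority class everywhere
        label = Counter(y).most_common(1)[0][0]
        best = (0, features[0], float("inf"), label, label)
    return best


def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Bagged stumps; each stump sees a bootstrap sample and mtry randomly chosen features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows sampled with replacement
        feats = rng.sample(range(p), mtry)          # random feature subset for this node
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the stumps."""
    votes = Counter(l if row[f] <= t else r for _, f, t, l, r in forest)
    return votes.most_common(1)[0][0]
```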

3. Artificial neural network models

The Artificial Neural Network (ANN) model is a machine learning model inspired by the connections between neurons in the brain [14]. It consists of an input layer, one or more hidden layers, and an output layer, connected by weights and biases. The ANN learns by adjusting these weights and biases, which allows it to capture complex nonlinear relationships and can improve predictive accuracy; its tendency to overfit is usually controlled with regularization such as weight decay.
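To make the layer structure concrete, the sketch below computes the forward pass of a network with one hidden layer of logistic units and a logistic output that can be read as a suitability score, plus an L2 weight-decay penalty of the kind commonly added to the training loss to curb overfitting (R's nnet exposes a similar idea through its `decay` argument). The weights are hypothetical, and training by backpropagation is not shown.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> one hidden layer of logistic units -> logistic output in (0, 1).

    w_hidden: one weight row per hidden neuron; b_hidden: one bias per hidden neuron;
    w_out: one weight per hidden neuron; b_out: the output bias.
    """
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return logistic(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def weight_decay_penalty(w_hidden, w_out, decay):
    """L2 penalty on the connection weights (biases excluded here), added to the training loss."""
    squares = sum(w * w for row in w_hidden for w in row) + sum(w * w for w in w_out)
    return decay * squares
```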

4. The XGBoost model

The XGBoost (eXtreme Gradient Boosting) model, proposed by Chen and Guestrin [15], is an algorithmic improvement over gradient-boosting decision trees (GBDT). Unlike GBDT, which uses only first-order derivatives for optimization, XGBoost uses a second-order approximation of the loss function that incorporates both first- and second-order derivatives. XGBoost also adds a regularization term to the objective function that penalizes the complexity of the tree model, which helps prevent overfitting. Inspired by random forests, XGBoost can use a random subset of samples and features in each training iteration, which improves the model’s generalization ability and further mitigates overfitting. Moreover, XGBoost supports parallel computation, enabling faster training.
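The second-order idea can be shown for the logistic loss used in presence/absence classification. Following the formulation in Chen and Guestrin [15], each sample contributes a gradient g = p - y and a Hessian h = p(1 - p); a leaf's optimal weight is -G/(H + λ), and a split's gain compares the regularized scores of the two children with that of the parent, minus a complexity cost γ. The plain-Python sketch below computes only these quantities and is not the xgboost library's code.

```python
import math


def grad_hess_logistic(label, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw (log-odds) score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - label, p * (1.0 - p)


def leaf_weight(grads, hessians, lam=1.0):
    """Optimal leaf value -G / (H + lambda)."""
    return -sum(grads) / (sum(hessians) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Reduction in the regularized objective from splitting a node into left and right."""
    def score(G, H):
        return G * G / (H + lam)

    G_L, H_L, G_R, H_R = sum(g_left), sum(h_left), sum(g_right), sum(h_right)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma
```

With four samples at raw score 0 (p = 0.5), separating the two presences from the two absences gives each child a leaf weight of magnitude 2/3 and a split gain of 2/3 when λ = 1 and γ = 0.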

2.2. Data Sources

2.2.1. Species Distribution Data

In this study, we obtained presence data for Alcyonacea from several sources, including OBIS¹, the NOAA Deep-Sea Coral Database², and PANGAEA³, a German geo-environmental database. A total of 5102 presence points at water depths greater than 1000 m were collected within the study area.

2.2.2. Data on Marine Environmental Variables

For this study, we selected 21 environmental variables, including depth (bathymetry), seafloor topography, and seafloor chemical variables (Table 1). These variables were obtained from four sources: Bio-ORACLE v2.0 [16], Satellite Geodesy (SRTM15PLUS) [17], Davies and Guinotte [18], and Steinacher [19].

2.3. Construction of 15 Arc Sec Resolution Marine Environmental Dataset and Sample Data Establishment

A conceptual model was developed to understand the environmental factors that influence deep-sea coral habitats, taking into account the specific biological characteristics of Alcyonacea. To create this model, we utilized a variety of marine near-bottom environmental data from the past few decades. These data sources had varying collection periods and resolutions. By prioritizing data with higher resolution, we reconstructed a high-resolution marine environmental dataset at a 15 arc sec resolution. This dataset is specifically focused on the distribution and ecology of Alcyonacea and is based on the marine Digital Elevation Model (DEM) known as SRTM 15+ [17].

2.3.1. Marine Environmental Dataset

We used ArcGIS 10.8 and the ocean DEM to create a depth dataset with a resolution of 15 arc sec for the study area (Figure 2). We then used Landserf 2.3, ArcGIS 10.8, and its extension Benthic Terrain Modeler (v 3.0) to calculate seafloor terrain parameters from the ocean DEM, including Slope, Slope Direction (aspect), Slope Direction-East (eastness), Slope Direction-North (northness), Slope of Aspect, Slope of Slope, and Ground Roughness. Two calculation scales were used, with moving windows of 3 × 3 and 9 × 9 cells, respectively. Finally, we used the ArcGIS 10.8 resampling function to bring the remaining environmental variables, such as temperature and salinity, to the same 15 arc sec resolution.
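Slope, aspect, eastness, and northness from a 3 × 3 moving window can be sketched with the finite-difference scheme usually attributed to Horn, which GIS slope tools commonly use. Assumptions: elevations (negative below sea level) and a square cell size in the same units (15 arc sec cells are not square in metres away from the equator, so a projected or converted cell size is assumed), row 0 of the window is the northern row, and aspect is reported as the compass bearing of the downslope direction. This is an illustration, not the ArcGIS, Landserf, or Benthic Terrain Modeler implementation.

```python
import math


def horn_gradients(win, cell):
    """win: 3x3 elevations, row 0 = north, column 0 = west; cell: cell size in z units."""
    (a, b, c), (d, e, f), (g, h, i) = win
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell)  # > 0 when z rises to the east
    dz_dy = ((a + 2 * b + c) - (g + 2 * h + i)) / (8 * cell)  # > 0 when z rises to the north
    return dz_dx, dz_dy


def slope_deg(win, cell):
    gx, gy = horn_gradients(win, cell)
    return math.degrees(math.atan(math.hypot(gx, gy)))


def aspect_deg(win, cell):
    """Compass bearing (0 = north, clockwise) of the downslope direction; None on flat cells."""
    gx, gy = horn_gradients(win, cell)
    if gx == 0 and gy == 0:
        return None
    return math.degrees(math.atan2(-gx, -gy)) % 360.0


def eastness_northness(win, cell):
    """sin(aspect) and cos(aspect); flat cells get (0, 0)."""
    asp = aspect_deg(win, cell)
    if asp is None:
        return 0.0, 0.0
    r = math.radians(asp)
    return math.sin(r), math.cos(r)
```

For a plane rising 10 m per 10 m cell toward the east, the sketch returns a slope of 45° and an aspect of 270°, i.e. a west-facing cell with eastness -1.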

To account for collinearity among the environmental variables and its effect on the predictions of the distribution model, we ran a preliminary test with all 21 environmental variables and calculated their importance with R code [20,21,22] (Figure 3). Previous studies have shown that the presence of Alcyonacea is influenced by environmental factors such as near-bottom seawater temperature, and because corals mainly feed on material derived from primary production near the surface, variables such as chlorophyll may limit their spatial distribution [23]. Based on their importance and their relevance to Alcyonacea, we finally selected 8 representative environmental variables: Chlorophyll, Dissolved Iron, Silicate, Nitrate, Temperature, Ground Roughness, Slope, and Depth. These variables were used to construct the distribution prediction model for Alcyonacea.
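One common way to screen collinear predictors, sketched below under stated assumptions, is to rank variables by importance and keep a variable only if its absolute Pearson correlation with every already-kept variable is at most a threshold. The 0.7 threshold, the greedy rule, and the toy values are assumptions; the text does not say which rule or cut-off produced the final 8 variables.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def screen_collinear(columns, importance, max_abs_r=0.7):
    """columns: {name: values}; importance: {name: score}.

    Visit variables from most to least important and keep one only if it is not
    strongly correlated with any variable kept so far.
    """
    kept = []
    for name in sorted(columns, key=lambda k: importance[k], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept
```

In the toy data used to check the sketch, temperature is strongly (negatively) correlated with depth and is dropped, while silicate is retained.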

2.3.2. Species Distribution Data Processing

The data used for model prediction consisted of presence points and absence points. Presence coordinates were assigned to cells of the 15 arc sec grid, and duplicate records were removed from the initial 5102 presence records so that each grid cell contained at most one presence point. In total, 230 presence points were retained (Figure 4).
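Thinning presence records to one per 15 arc sec cell can be sketched by snapping coordinates to integer cell indices, using 15 arc sec = 1/240 degree. The sketch assumes a grid whose cell edges fall on multiples of 1/240 degree from 0° and keeps the first record encountered in each cell; the actual SRTM15+ grid registration and the rule for choosing which duplicate to keep may differ.

```python
import math

CELLS_PER_DEGREE = 3600 // 15  # 15 arc sec cells -> 240 cells per degree


def cell_index(lon, lat, cells_per_degree=CELLS_PER_DEGREE):
    """Integer (column, row) of the grid cell containing a point."""
    return math.floor(lon * cells_per_degree), math.floor(lat * cells_per_degree)


def thin_to_grid(points, cells_per_degree=CELLS_PER_DEGREE):
    """Keep the first (lon, lat) record in each grid cell, preserving input order."""
    seen = set()
    kept = []
    for lon, lat in points:
        key = cell_index(lon, lat, cells_per_degree)
        if key not in seen:
            seen.add(key)
            kept.append((lon, lat))
    return kept
```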

When true absence data are not available, appropriate pseudo-absence points must be selected to obtain accurate predictions and model evaluations. In this study, we randomly selected 5000 raster points as candidate pseudo-absence points and extracted the values of the environmental variables at these points and at the Alcyonacea presence points. To place the different environmental variables on a common footing, we used principal component analysis (PCA) to transform the points from geographic space to environmental space, retaining only the first few principal components. Assuming three principal components, with point A at coordinates (A₁, A₂, A₃) and point B at (B₁, B₂, B₃), the Euclidean distance Dist(A, B) between the two points in environmental space is:

\mathrm{Dist}(A, B) = \sqrt{\sum_{i=1}^{3} (A_{i} - B_{i})^{2}}

(1)

Given N presence points O_i (i = 1, …, N), the geometric mean distance DistM(C_j) from a pseudo-absence point C_j to all presence points in environmental space is:

\mathrm{DistM}(C_{j}) = \sqrt[N]{\prod_{i=1}^{N} \mathrm{Dist}(C_{j}, O_{i})}

(2)

We then ranked the candidate pseudo-absence points by their geometric mean distance and selected those with the largest values, that is, the points environmentally most dissimilar to the presence points, as pseudo-absence points. This procedure was implemented with ArcGIS 10.8 and the R package factoextra (v 1.0.7) and yielded a total of 1000 pseudo-absence points.
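Equations (1) and (2) and the ranking step can be sketched as follows, assuming the PCA scores (three components, as in the text) have already been computed for the candidate and presence points. The geometric mean is evaluated in log space to avoid overflow for large N; the toy coordinates are hypothetical.

```python
import math


def euclid(a, b):
    """Eq. (1): Euclidean distance between two points in principal-component space."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def geometric_mean_distance(candidate, presences):
    """Eq. (2): N-th root of the product of distances, computed via the mean of logs."""
    dists = [euclid(candidate, o) for o in presences]
    if any(d == 0.0 for d in dists):
        return 0.0  # a candidate coinciding with a presence has zero geometric mean
    return math.exp(sum(math.log(d) for d in dists) / len(dists))


def select_pseudo_absences(candidates, presences, k):
    """Rank candidates by DistM and keep the k most environmentally distant ones."""
    ranked = sorted(candidates, key=lambda c: geometric_mean_distance(c, presences), reverse=True)
    return ranked[:k]
```

In the toy check, the candidate far from both presences along the first component ranks first, and the candidate lying between the two presences is discarded.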

2.4. Model Framework

Figure 5 shows the workflow used to predict the potential distribution of cold-water corals with the models.

2.4.1. Construction of Training Dataset

The ocean environment dataset constructed in Section 2.3.1 is used as the model input, and the presence and pseudo-absence data obtained in Section 2.3.2 are used as sample data. Overall, 75% of the data are selected as the training set, and the remaining 25% are used as the testing set. Environment-based cross-validation (env) [22] is used as the cross-validation method.
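The text does not detail how the env folds in [22] are built. One widely used approach, offered for example by the blockCV R package, clusters records in environmental space and treats each cluster as a fold, so that test folds are environmentally distinct from training folds. The sketch below assumes plain k-means with deterministic farthest-point initialization and also shows a seeded random 75/25 split; both are illustrations rather than the authors' code.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance in environmental space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def env_folds(env, k, n_iter=20):
    """Assign each record to one of k folds by k-means clustering of its environmental values."""
    # Deterministic farthest-point initialisation of the k cluster centres
    centers = [list(env[0])]
    while len(centers) < k:
        far = max(env, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(env)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in env]
        for j in range(k):
            members = [p for p, lab in zip(env, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_test_split(records, train_frac=0.75, seed=0):
    """Seeded random split into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```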

2.4.2. Model Training

Model parameters were tuned with Bayesian optimization [24], because each algorithm requires careful tuning of specific parameters to achieve optimal performance. For Maxent, the main targets were the constraint parameters and the regularization of model complexity, to balance model fit and generalization. For the ANN model, the number of hidden layers and the number of neurons in each layer were optimized, as these are critical for capturing the intricate relationships between species occurrences and environmental variables. For the RF model, the key parameters are the number of trees (num_trees), the number of features randomly selected at each node (mtry), and the maximum tree depth (max_depth), which together influence the model’s predictive accuracy and robustness. Similarly, the performance of the XGBoost model was improved by optimizing the tree depth (max_depth), the number of leaves per tree (num_leaves), and the learning rate (learning_rate), among other parameters. L1 regularization parameters were selected to prevent overfitting; the trained models were then used to generate habitat predictions and to analyze the relative importance of the environmental variables in the modeling process.
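The tool used for Bayesian optimization is not named in the text. A minimal sketch of the idea under simplifying assumptions is given below: a single hyperparameter rescaled to [0, 1] and searched over a discrete grid, a Gaussian-process surrogate with a fixed squared-exponential kernel (length scale 0.2) fitted to centred observations, and the expected-improvement acquisition function for minimization. In practice the objective would be something like 1 - cross-validated AUC and several hyperparameters would be tuned jointly, usually with a dedicated library.

```python
import math
import random


def rbf(a, b, length=0.2):
    """Squared-exponential kernel on a 1-D hyperparameter."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][m] * x[m] for m in range(i))) / L[i][i])
    return x


def backward_sub_t(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, noise=1e-5):
    """Factorise the kernel matrix once per iteration; returns (L, alpha, y_mean)."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub_t(L, forward_sub(L, [y - y_mean for y in ys]))
    return L, alpha, y_mean


def predict_gp(model, xs, x_new):
    """Posterior mean and standard deviation at x_new."""
    L, alpha, y_mean = model
    k_star = [rbf(x_new, a) for a in xs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, sd, best):
    """Expected improvement over the best observed value, for minimisation."""
    z = (best - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + sd * pdf


def bayes_opt(objective, candidates, n_init=3, n_iter=15, seed=0):
    rng = random.Random(seed)
    xs = rng.sample(candidates, n_init)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        model = fit_gp(xs, ys)
        pool = [c for c in candidates if c not in xs]
        ei = [expected_improvement(*predict_gp(model, xs, c), min(ys)) for c in pool]
        x_next = pool[ei.index(max(ei))]
        xs.append(x_next)
        ys.append(objective(x_next))
    i = ys.index(min(ys))
    return xs[i], ys[i]
```

On the toy objective (x - 0.3)² over a grid of 101 points, the search should return a point near x = 0.3 after 18 evaluations.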

2.4.3. Model Prediction

The models trained in Section 2.4.2 were applied to the full marine environmental dataset to produce habitat suitability predictions, and their prediction accuracy was evaluated with the held-out test data.

The above work was implemented using the R packages “maxent”, “randomForest”, “nnet”, “xgboost”, and “biomod2”.

2.5. Model Assessment

After the 15 arc sec resolution marine environmental dataset and the sample data were constructed, 75% of the data were used as the training set and the remaining 25% as the test set. Each of the four models was run 10 times, and three commonly used evaluation metrics, AUC, TSS, and Kappa, were used to evaluate model performance.

All samples were first divided into four categories:

True Positive (TP): predicted positive, actual positive.

False Positive (FP): predicted positive, actual negative.

True Negative (TN): predicted negative, actual negative.

False Negative (FN): predicted negative, actual positive.

The area under the curve (AUC) is the area under the receiver operating characteristic (ROC) curve. Its advantage is that the AUC value is independent of the classification threshold and of the class distribution of the samples. The horizontal axis of the ROC curve is the false positive rate (FPR), and the vertical axis is the true positive rate (TPR).

FPR is the proportion of actual negatives that are predicted positive, and TPR is the proportion of actual positives that are predicted positive. They are defined as follows:

\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}

(3)

\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}

(4)
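Both the ROC construction and the AUC can be sketched in plain Python: sweeping the threshold over every distinct predicted score yields the (FPR, TPR) points of Eqs. (3) and (4), and the trapezoidal area under them equals the probability that a randomly chosen presence receives a higher score than a randomly chosen absence, with ties counted as one half. The scores and labels in the check are toy values, not results from this study.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs from Eqs. (3)-(4), sweeping the threshold over every distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores, labels):
    """Probability that a random presence outscores a random absence (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```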

The true skill statistic (TSS) measures a model’s ability to distinguish “positive” from “negative” cases. Its value depends on the threshold but is not affected by the sample distribution (prevalence). It is calculated as:

\mathrm{TSS} = \mathrm{TPR} - \mathrm{FPR}

(5)

The Kappa statistic measures the accuracy of predictions relative to the agreement expected by chance; it depends on both the threshold and the sample distribution and is calculated as:

\mathrm{Kappa} = \frac{p_{0} - p_{e}}{1 - p_{e}}

(6)

p_{0} = \frac{\mathrm{TP} + \mathrm{TN}}{N}

(7)

p_{e} = \frac{(\mathrm{TP} + \mathrm{FN})(\mathrm{TP} + \mathrm{FP}) + (\mathrm{FP} + \mathrm{TN})(\mathrm{FN} + \mathrm{TN})}{N^{2}}

(8)

where N = TP + FP + TN + FN is the total number of samples.
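The threshold-dependent metrics follow directly from the confusion-matrix counts. The sketch below implements Eqs. (5) to (8) in plain Python; the cut-off passed to `confusion` is an arbitrary assumption, since the thresholds used for TSS and Kappa are not stated here.

```python
def confusion(scores, labels, threshold):
    """TP, FP, TN, FN when scores >= threshold are predicted as presences (1)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def tss(tp, fp, tn, fn):
    """Eq. (5): TPR - FPR."""
    return tp / (tp + fn) - fp / (fp + tn)


def kappa(tp, fp, tn, fn):
    """Eqs. (6)-(8): observed agreement corrected for chance agreement."""
    n = tp + fp + tn + fn
    p0 = (tp + tn) / n
    pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (p0 - pe) / (1 - pe)
```

For example, 40 true positives, 10 false positives, 40 true negatives, and 10 false negatives give TSS = 0.8 - 0.2 = 0.6 and Kappa = (0.8 - 0.5)/(1 - 0.5) = 0.6.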

All three evaluation metrics can be computed using the R package “dismo”, and the evaluation criteria are shown in Table 2 [25,26].

3. Results

3.1. Statistical Comparison of Model Performance

The performance of the four models (Maxent, RF, XGBoost, and ANN) was evaluated using three metrics: AUC, TSS, and Kappa. Table 3 presents the evaluation metrics for the prediction results, allowing a direct comparison of model performance.

The statistical analysis showed that all four models had mean AUC values above 0.9, mean TSS values above 0.8, and mean Kappa values above 0.7, indicating that their predictions are reliable. Specifically, the Maxent, RF, and XGBoost models had mean AUC, TSS, and Kappa values all above 0.9, so these three models performed excellently. Although the ANN model performed worst of the four models, its mean values for the three evaluation metrics were still satisfactory; both its mean TSS and Kappa values exceeded 0.9, indicating good overall performance.

The Maxent model had a substantially shorter runtime than the other three models while still producing accurate predictions.

3.2. Predicted Distribution of Species

3.2.1. Potential Distribution Projections

For each of the four models, the run with the highest AUC value was selected and mapped in Figure 6 to show the projected distribution of Alcyonacea habitats in the study area.

The predictions of the different models are highly consistent. All four models suggest that Alcyonacea are primarily found in the Kuril Basin, the Japan Trench, and the Kuril Trench. The Maxent and XGBoost models show similar predicted distributions, while the ANN and RF models differ noticeably in certain areas, particularly in the size of the highly suitable areas. For instance, the ANN, Maxent, and XGBoost models predict a wide distribution of Alcyonacea in the Mariana Trench and the East Mariana Basin, but the RF model does not. The ANN model predicts that areas with complex seafloor topography in the study area are highly suitable for Alcyonacea, and the highly suitable areas predicted by the Maxent and XGBoost models fall within those predicted by the ANN model. However, the ANN model’s prediction of the vicinity of the Ryukyu Islands as highly suitable cannot be validated because distribution data there are insufficient.

Based on the prediction results, the potential presence of Alcyonacea in the study area was divided into four suitability classes (high-suitability, moderate-suitability, low-suitability, and unsuitable areas) using the natural breaks (Jenks) method (Table 4). The cumulative percentage of the area in each suitability class was calculated using ArcGIS 10.8 and is presented in Figure 7.
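Natural breaks classification chooses class limits that minimize the total within-class sum of squared deviations. An exact dynamic-programming version for small samples is sketched below; it returns the upper bound of each class and is an illustration only, since GIS software may sample large rasters or use a different optimization routine. The toy values are hypothetical.

```python
def natural_breaks(values, n_classes):
    """Upper bound of each class for the partition of sorted values minimising within-class SSD."""
    xs = sorted(values)
    n = len(xs)
    if n < n_classes:
        raise ValueError("need at least as many values as classes")
    # Prefix sums give the sum of squared deviations of any slice in O(1)
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):  # sum of squared deviations of xs[i:j], j > i
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]  # cost[k][j]: best SSD of xs[:j] in k classes
    back = [[0] * (n + 1) for _ in range(n_classes + 1)]    # start index of the last class
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(xs[j - 1])
        j = back[k][j]
    return sorted(bounds)


def classify(value, bounds):
    """0-based class index: the first class whose upper bound is >= value."""
    for idx, upper in enumerate(bounds):
        if value <= upper:
            return idx
    return len(bounds) - 1
```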

Figure 7 shows that the Maxent and XGBoost models predict similar proportions for each suitability class, with highly suitable areas for Alcyonacea accounting for approximately 18% of the study area. The proportions differ more between the ANN and RF predictions: the highly suitable area is as high as 25% in the ANN predictions but only 6% in the RF predictions. Different models can therefore yield different results from the same data, highlighting the unavoidable uncertainty in model predictions.

3.2.2. Analysis of Environmental Factors Affecting Species Distribution

Figure 8 shows the percentage importance of each environmental variable. All four models identify bathymetry as the most important variable, with an importance markedly higher than that of the other factors, so depth dominates the predicted distribution.
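The text does not state how the importance percentages were computed. A model-agnostic option, sketched below, shuffles one variable at a time and scores it as 1 minus the Pearson correlation between the original and the shuffled predictions, a correlation-based permutation score of the kind biomod2 reports; the scores are then rescaled to sum to 100%. The predictor and data in the check are hypothetical.

```python
import random


def pearson(u, v):
    """Pearson correlation; returns 0 if either series is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def permutation_importance(predict, X, n_repeats=5, seed=0):
    """Percentage importance of each column of X (rows are lists) for a predict(row) function."""
    rng = random.Random(seed)
    ref = [predict(row) for row in X]
    raw = []
    for k in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[k] for row in X]
            rng.shuffle(col)  # break the link between variable k and the predictions
            X_perm = [row[:k] + [c] + row[k + 1:] for row, c in zip(X, col)]
            total += 1.0 - pearson(ref, [predict(row) for row in X_perm])
        raw.append(max(total / n_repeats, 0.0))
    s = sum(raw)
    return [100.0 * r / s if s > 0 else 0.0 for r in raw]
```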

Based on the performance comparison of the four models, the Maxent model was selected to draw response curves, and the curves for water depth and silicate were plotted individually (Figure 9). The range with a predicted probability P > 0.762 was taken as the most suitable range of each environmental parameter for Alcyonacea. The optimal conditions for Alcyonacea are water depths from 1000 m to 1900 m and silicate content above 175 mol·m⁻³.
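A marginal response curve of the kind Maxent draws can be sketched by varying one variable over a grid while holding all others at their sample means, then reading off the interval where the predicted probability exceeds the cut-off (0.762 in the text). The predictor in the check is a hypothetical stand-in peaking at 1450 m, not the fitted Maxent model.

```python
def response_curve(predict, X, k, grid):
    """(value, prediction) pairs for variable k over grid, other variables at their sample means."""
    means = [sum(col) / len(col) for col in zip(*X)]
    curve = []
    for v in grid:
        row = means[:]
        row[k] = v
        curve.append((v, predict(row)))
    return curve


def suitable_range(curve, threshold):
    """Smallest and largest grid values with prediction above threshold, or None if none."""
    above = [v for v, p in curve if p > threshold]
    return (min(above), max(above)) if above else None
```

For the stand-in predictor 1 / (1 + ((depth - 1450) / 200)²) on a 50 m grid, the sketch reports the interval 1350 m to 1550 m as exceeding 0.762.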

4. Discussion

In this study, four common machine learning models were used to model and predict the distribution of Alcyonacea in the study area and to analyze differences in their performance in predicting the distribution of this species.

The results show that Alcyonacea are mainly distributed in the Kuril Basin, the Japan Trench, the Kuril Trench, and other areas, with a minority in areas such as the Mariana Trench. In the comparison of model performance, RF achieved the best predictive performance, followed by Maxent and XGBoost. Runtime is another important aspect of a modeling approach, although it is often not reported in distribution modeling studies [27,28]. The results show that the predictive performance of the four models differs little, but their computing costs differ considerably: Maxent achieved remarkable predictive performance in a much shorter time than the other methods.

Both environmental and species distribution data are required to build a species distribution model, so both can influence the results. Because deep-sea investigation and surveying are difficult, and because of errors from other objective causes, not all presences in the study area can be detected, although scientific sampling surveys can effectively reduce this error [29]. In this study, pseudo-absence data were generated by random sampling combined with principal component analysis (PCA), which helps reduce sampling error and improve the accuracy of the sampling results. In addition, research has shown that population characteristics, distribution patterns, the distribution span of species, and interactions among species affect model predictions [30,31].

Previous studies by Kinlan et al. [32] and Doherty et al. [33] have shown a strong correlation between water depth and the presence of Alcyonacea. This correlation is particularly pronounced at depths exceeding 1500 m. Additionally, deep-sea silicate content, as observed in the study by Chu et al. [34] in the Canadian region, also greatly influences the presence of Alcyonacea.

The sample size of a species affects model predictions to some extent. When the sample size is small, model stability is poorer, but once the sample size reaches an adequate scale, predictions change little and stability is relatively high [35]. The sample size in this study was adequate, and no large fluctuations in the results occurred during prediction.

Corals and their habitats support extremely diverse biological communities. Therefore, by combining modeling techniques with geospatial tools to construct high-resolution, high-accuracy, and high-precision coral habitat suitability models, the distribution of the associated biological community can be approximated [36]. Such results can help researchers and managers build valuable models of biodiversity distribution [37] and can inform the formulation and implementation of marine biodiversity conservation policies in the western Pacific Ocean [38].

5. Conclusions

This study compares different machine learning-based modeling algorithms to determine the most suitable method for predicting the distribution of marine organisms. The RF model achieves the highest prediction accuracy on all three evaluation metrics, but it also has the longest running time. The XGBoost model, a newer method, performs similarly to the RF model with a shorter running time; however, it occasionally overfits and is not yet widely used in species distribution modeling. The Maxent model, a well-established method in this field, performs stably and is superior to the other models in computing time, an advantage that becomes more apparent as the sample size increases. The ANN model offers no significant advantage in either performance or computing time and is therefore not recommended for predicting the distribution of marine organisms.

Therefore, when predicting the distribution of deep-sea species, the RF model is recommended for small study areas, low resolution, and few occurrence records, while the Maxent model is suitable for large study areas, high precision, and many occurrence records.

Author Contributions

Conceptualization, M.D. and T.F.; methodology, M.D. and J.Y.; software, M.D. and Y.F.; validation, M.D. and Q.X.; formal analysis, M.D.; investigation, Q.Z.; resources, T.F.; data curation, T.F.; writing—original draft preparation, M.D. and J.Y.; writing—review and editing, J.Y. and Q.X.; visualization, X.Z.; supervision, T.F.; project administration, T.F.; funding acquisition, T.F. and W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by the National Natural Science Foundation of China (42276226, 42276223), the MNR Key Laboratory of Eco-Environmental Science and Technology, China (MEEST-2021-02), Shandong Provincial Natural Science Foundation (ZR2022MD048), Key Research and Development Program of Shandong Province (2020JMRH0101), Open Fund of the 801 Institute of Hydrogeology and Engineering Geology (801KF2021-6), Shandong Institute of Chinese Engineering S&T Strategy for Development (202203SDYB08).

Institutional Review Board Statement

Authors have signed the statement.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are from the experiments we conducted in this work.

Acknowledgments

We thank the observation and research station of seawater intrusion and soil salinization, Laizhou Bay, for its support. We would also like to thank the anonymous reviewers and the editor, who helped improve the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Notes

1	Available online: https://mapper.obis.org/ (Accessed: 10 March 2023).
2	Available online: https://www.ncei.noaa.gov/maps/deep-sea-corals/mapSites.htm (Accessed: 13 June 2023).
3	Available online: https://pangaea.de (Accessed: 19 June 2023).

References

Zhonglin, X.; Huanhua, P.; Shouzhang, P. The development and evaluation of species distribution models. Acta Ecol. Sin. 2015, 35, 11.
Hallgren, W.; Santana, F.; Low-Choy, S.; Zhao, Y.; Mackey, B. Species distribution models can be highly sensitive to algorithm configuration. Ecol. Model. 2019, 408, 108719.
Valavi, R.; Guillera-Arroita, G.; Lahoz-Monfort, J.J.; Elith, J. Predictive performance of presence-only species distribution models: A benchmark study with reproducible code. Ecol. Monogr. 2022, 92, e01486.
Melo-Merino, S.M.; Fath, B.D. Ecological niche models and species distribution models in marine environments: A literature review and spatial analysis of evidence. Ecol. Model. 2020, 415, 108837.
Vohsen, S.A. The Chemical and Microbial Ecology of Deep-Sea Corals; The Pennsylvania State University: State College, PA, USA, 2019.
Quintanilla, E.; Rodrigues, C.F.; Henriques, I.; Hilário, A. Microbial Associations of Abyssal Gorgonians and Anemones (>4000 m Depth) at the Clarion-Clipperton Fracture Zone. Front. Microbiol. 2022, 13, 828469.
Long, S.; Sparrow-Scinocca, B.; Blicher, M.E.; Arboe, N.H.; Fuhrmann, M.; Kemp, K.M.; Nygaard, R.; Zinglersen, K.; Yesson, C. Identification of a Soft Coral Garden Candidate Vulnerable Marine Ecosystem (VME) Using Video Imagery, Davis Strait, West Greenland. Front. Mar. Sci. 2020, 7, 460.
Wagner, D.; Friedlander, A.; Pyle, R.L.; Wilhelm, T. Coral Reefs of the High Seas: Hidden Biodiversity Hotspots in Need of Protection. Front. Mar. Sci. 2020, 7, 776.
Jorgensen, L.L.; Ljubin, P.; Skjoldal, H.R.; Ingvaldsen, R.B.; Anisimova, N.; Manushin, I. Distribution of benthic megafauna in the Barents Sea: Baseline for an ecosystem approach to management. ICES J. Mar. Sci. 2015, 72, 595–613.
Tong, R.; Purser, A.; Guinan, J.; Unnithan, V.; Yu, J.; Zhang, C. Quantifying relationships between abundances of cold-water coral Lophelia pertusa and terrain features: A case study on the Norwegian margin. Cont. Shelf Res. 2016, 116, 13–26.
Buhl-Mortensen, L.; Serigstad, B.; Buhl-Mortensen, P.; Olsen, M.N.; Ostrowski, M.; Błażewicz-Paszkowycz, M.; Appoh, E. First observations of the structure and megafaunal community of a large Lophelia reef on the Ghanaian shelf (the Gulf of Guinea). Deep-Sea Res. Part II Top. Stud. Oceanogr. 2017, 137, 148–156.
Phillips, S.J.; Anderson, R.P.; Schapire, R.E. Maximum entropy modeling of species geographic distributions. Ecol. Model. 2006, 190, 231–259.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Lek, S.; Guégan, J.-F. Artificial Neuronal Networks. In Environmental Science; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2000.
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
Assis, J.; Tyberghein, L.; Bosch, S.; Verbruggen, H.; Serrao, E.A.; De Clerck, O. Bio-ORACLE v2.0: Extending marine data layers for bioclimatic modelling. Glob. Ecol. Biogeogr. 2018, 27, 277–284.
Tozer, B.; Sandwell, D.T.; Smith, W.H.F.; Olson, C.; Beale, J.R.; Wessel, P. Global Bathymetry and Topography at 15 Arc Sec: SRTM15+. Earth Space Sci. 2019, 6, 1847–1864.
Davies, A.J.; Guinotte, J.M. Global Habitat Suitability for Framework-Forming Cold-Water Corals. PLoS ONE 2011, 6, e18483.
Steinacher, M.; Joos, F.; Frölicher, T.L.; Plattner, G.K.; Doney, S.C. Imminent ocean acidification in the Arctic projected with the NCAR global coupled carbon cycle-climate model. Biogeosciences 2009, 6, 515–533.
Cutler, D.R.; Edwards, T.C., Jr.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J. Random forests for classification in ecology. Ecology 2007, 88, 2783–2792.
Elith, J.; Leathwick, J.R. Species Distribution Models: Ecological Explanation and Prediction Across Space and Time. Annu. Rev. Ecol. Evol. Syst. 2009, 40, 677–697.
Vignali, S.; Barras, A.G.; Arlettaz, R.; Braunisch, V. SDMtune: An R package to tune and evaluate species distribution models. Ecol. Evol. 2020, 10, 11488–11506.
Dullo, W.C.; Flögel, S.; Rüggeberg, A. Cold-water coral growth in relation to the hydrography of the Celtic and Nordic European continental margin. Mar. Ecol. Prog. Ser. 2008, 371, 165–176.
Xia, Y.; Liu, C.; Li, Y.; Liu, N. A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring. Expert Syst. Appl. 2017, 78, 225–241.
Allouche, O.; Tsoar, A.; Kadmon, R. Assessing the accuracy of species distribution models: Prevalence, kappa and the true skill statistic (TSS). J. Appl. Ecol. 2006, 43, 1223–1232.
Thuiller, W.; Lafourcade, B.; Engler, R.; Araújo, M.B. BIOMOD—A platform for ensemble forecasting of species distributions. Ecography 2009, 32, 369–373.
Breiner, F.T.; Nobis, M.P.; Bergamini, A.; Guisan, A. Optimizing ensembles of small models for predicting the distribution of species with few occurrences. Methods Ecol. Evol. 2018, 9, 802–808.
Ingram, M.; Vukcevic, D.; Golding, N. Multi-output Gaussian processes for species distribution modelling. Methods Ecol. Evol. 2020, 11, 1587–1598.
Marchetto, E.; Da Re, D.; Tordoni, E.; Bazzichetto, M.; Zannini, P.; Celebrin, S.; Chieffallo, L.; Malavasi, M.; Rocchini, D. Testing the effect of sample prevalence and sampling methods on probability- and favourability-based SDMs. Ecol. Model. 2023, 477, 110248.
Marmion, M.; Luoto, M.; Heikkinen, R.K.; Thuiller, W. The performance of state-of-the-art modelling techniques depends on geographical distribution of species. Ecol. Model. 2009, 220, 3512–3520.
Godsoe, W.; Harmon, L.J. How do species interactions affect species distribution models? Ecography 2012, 35, 811–820.
Kinlan, B.P.; Poti, M.; Drohan, A.F.; Packer, D.B.; Dorfman, D.S.; Nizinski, M.S. Predictive modeling of suitable habitat for deep-sea corals offshore the Northeast United States. Deep Sea Res. Part I Oceanogr. Res. Pap. 2020, 158, 103229.
Doherty, B.; Cox, S.P.; Rooper, C.N.; Johnson, S.D.; Kronlund, A.R. Species distribution models for deep-water coral habitats that account for spatial uncertainty in trap-camera fishery data. Mar. Ecol. Prog. Ser. 2021, 660, 69–93.
Chu, J.W.; Nephin, J.; Georgian, S.; Knudby, A.; Rooper, C.; Gale, K.S. Modelling the environmental niche space and distributions of cold-water corals and sponges in the Canadian northeast Pacific Ocean. Deep Sea Res. Part I Oceanogr. Res. Pap. 2019, 151, 103063.
Yu, H.; Cooper, A.R.; Infante, D.M. Improving species distribution model predictive accuracy using species abundance: Application with boosted regression trees. Ecol. Model. 2020, 432, 109202.
Georgian, S.E.; Shedd, W.; Cordes, E.E. High-resolution ecological niche modelling of the cold-water coral Lophelia pertusa in the Gulf of Mexico. Mar. Ecol. Prog. Ser. 2014, 506, 145–161.
Yesson, C.; Taylor, M.L.; Tittensor, D.P.; Davies, A.J.; Guinotte, J.; Baco, A.; Black, J.; Hall-Spencer, J.M.; Rogers, A.D. Global habitat suitability of cold-water octocorals. J. Biogeogr. 2012, 39, 1278–1292.
Rathore, M.K.; Sharma, L.K. Efficacy of species distribution models (SDMs) for ecological realms to ascertain biological conservation and practices. Biodivers. Conserv. 2023, 32, 3053–3087.

Figure 1. Study area.

Figure 2. Depth raster map.

Figure 3. Percentage of importance.

Figure 4. Presence points of Alcyonacea.

Figure 5. Flow chart.

Figure 6. Prediction results of different models: (a) Maxent; (b) RF; (c) ANN; (d) XGBoost.

Figure 7. Cumulative percentage of area in suitability classes.

Figure 8. Percentage of the importance of environmental factors.

Figure 9. Response curves of environmental factors: (a) depth; (b) silicate.

Table 1. Environmental data sources.

Source	Variable Name	Unit	Data Collection Period
Bio-ORACLE v2.0	Temperature	°C	2000–2014
	Salinity	PSS	2000–2014
	Silicate	μmol·m⁻³	2000–2014
	Phosphate	μmol·m⁻³	2000–2014
	Nitrate	μmol·m⁻³	2000–2014
	Dissolved Molecular Oxygen	μmol·m⁻³	2000–2014
	Dissolved Iron	μmol·m⁻³	2000–2014
	Phytoplankton	μmol·m⁻³	2000–2014
	Chlorophyll	mg·m⁻³	2000–2014
	Primary Productivity	g·m⁻³·day⁻¹	2000–2014
	Current Velocity	m·s⁻¹	2000–2014
Satellite Geodesy	Depth	m	1903–2019
	Slope	°	1903–2019
	Slope Direction	°	1903–2019
	Slope Direction-East	dimensionless	1903–2019
	Slope Direction-North	dimensionless	1903–2019
	Slope of Aspect	dimensionless	1903–2019
	Slope of Slope	dimensionless	1903–2019
	Ground Roughness	dimensionless	1903–2019
Davies and Guinotte	VGPM	mg·m⁻³·day⁻¹	2002–2007
Steinacher	Calcite	Ω_ARAG	2003–2018
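Table 1 lists aspect ("Slope Direction") alongside separate east and north components. Because aspect is a circular variable (1° and 359° face almost the same way, yet their raw values are far apart), SDM studies commonly replace it with eastness = sin(aspect) and northness = cos(aspect). This section does not state how the two component layers were derived, so the sketch below assumes that standard transform, with aspect in degrees clockwise from north; the function name `eastness_northness` is hypothetical.

```python
import math


def eastness_northness(aspect_deg: float) -> tuple[float, float]:
    """Split a compass aspect into east and north components.

    Assumption (not confirmed by the source): aspect is in degrees,
    clockwise from north, with eastness = sin(aspect) and
    northness = cos(aspect). Flat cells, which have no defined aspect,
    must be handled before calling this function.
    """
    radians = math.radians(aspect_deg % 360.0)
    return math.sin(radians), math.cos(radians)


if __name__ == "__main__":
    # 1 and 359 degrees are nearly the same direction: their raw values are
    # far apart, but their (eastness, northness) pairs are close.
    for aspect in (1.0, 359.0, 90.0, 180.0):
        east, north = eastness_northness(aspect)
        print(f"aspect={aspect:6.1f}  eastness={east:+.3f}  northness={north:+.3f}")
```

On a raster, the same transform would be applied cell by cell, for example with NumPy's `np.sin` and `np.cos` on an array of aspects converted to radians.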

Table 2. AUC, TSS, and Kappa evaluation criteria.

Evaluation Criterion	Poor	Fair	Good	Excellent
AUC	0.5–0.7	0.7–0.8	0.8–0.9	0.9–1.0
TSS	0.5–0.6	0.6–0.7	0.7–0.9	0.9–1.0
Kappa	0.5–0.6	0.6–0.7	0.7–0.9	0.9–1.0
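Table 2 grades models on AUC, TSS and Cohen's Kappa. A minimal sketch of how these three scores can be computed for presence/absence data follows: TSS is sensitivity + specificity − 1, Kappa compares observed agreement with the agreement expected by chance, and AUC is computed here in its Mann–Whitney form, as the probability that a random presence cell scores higher than a random absence cell. The 0.5 binarization threshold, the toy data and all function names are hypothetical. The band lookup treats each lower bound in Table 2 as inclusive (the table leaves shared boundaries ambiguous) and returns 1 to 4 for the table's rating columns from left (lowest) to right (highest), or 0 for values below 0.5.

```python
# Lower bound of each rating column in Table 2, left (lowest) to right (highest).
BAND_LOWER_BOUNDS = {
    "AUC": (0.5, 0.7, 0.8, 0.9),
    "TSS": (0.5, 0.6, 0.7, 0.9),
    "Kappa": (0.5, 0.6, 0.7, 0.9),
}


def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels, 1 = presence, 0 = absence."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def tss(tp, fn, fp, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def cohen_kappa(tp, fn, fp, tn):
    """Cohen's Kappa: agreement beyond what chance alone would produce."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the observed and predicted class margins.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


def auc(presence_scores, absence_scores):
    """AUC as P(presence score > absence score); ties count one half."""
    wins = 0.0
    for s_pos in presence_scores:
        for s_neg in absence_scores:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def table2_band(metric, value):
    """Return 1-4 for the Table 2 column containing value, or 0 below 0.5."""
    band = 0
    for index, lower in enumerate(BAND_LOWER_BOUNDS[metric], start=1):
        if value >= lower:
            band = index
    return band


if __name__ == "__main__":
    # Toy data for illustration only, not from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    counts = confusion_counts(y_true, y_pred)
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    results = {"AUC": auc(pos, neg), "TSS": tss(*counts), "Kappa": cohen_kappa(*counts)}
    for name, value in results.items():
        print(f"{name}: {value:.3f} (Table 2 column {table2_band(name, value)})")
```

With these toy values, 15 of the 16 presence/absence pairs are ranked correctly, so AUC = 15/16 ≈ 0.94, while TSS and Kappa are both 0.5: a model can rank cells well yet agree only modestly with the data once a single threshold is imposed.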

Table 3. Comparison of the performance of the four models.

Model	AUC	TSS	Kappa	Time (s)
Maxent	0.957	0.852	0.864	670
RF	0.994	0.984	0.962	3930
ANN	0.926	0.837	0.723	2508
XGBoost	0.964	0.887	0.897	2888
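The comparison in Table 3 can be checked by ranking the four models on each column. The values below are copied from Table 3; the dictionary layout and the `rank` helper are illustrative only. Ranking confirms the pattern stated in the abstract: RF leads on AUC, TSS and Kappa, whereas Maxent has the shortest run time, roughly 5.9 times shorter than RF (3930 s / 670 s).

```python
# Scores and run times copied from Table 3.
RESULTS = {
    "Maxent": {"AUC": 0.957, "TSS": 0.852, "Kappa": 0.864, "time_s": 670},
    "RF": {"AUC": 0.994, "TSS": 0.984, "Kappa": 0.962, "time_s": 3930},
    "ANN": {"AUC": 0.926, "TSS": 0.837, "Kappa": 0.723, "time_s": 2508},
    "XGBoost": {"AUC": 0.964, "TSS": 0.887, "Kappa": 0.897, "time_s": 2888},
}


def rank(column, higher_is_better=True):
    """Return model names ordered from best to worst on one Table 3 column."""
    return sorted(RESULTS, key=lambda model: RESULTS[model][column],
                  reverse=higher_is_better)


if __name__ == "__main__":
    for column in ("AUC", "TSS", "Kappa"):
        print(column, rank(column))
    # For run time, lower is better.
    print("time_s", rank("time_s", higher_is_better=False))
    ratio = RESULTS["RF"]["time_s"] / RESULTS["Maxent"]["time_s"]
    print(f"RF run time / Maxent run time = {ratio:.1f}")
```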

Table 4. Classification of suitability.

Suitability Class	Predicted Suitability (P)
Optimum Distribution Area	0.762 < P < 1.000
More Suitable Distribution Area	0.409 < P < 0.762
Low-Suitability Distribution Area	0.119 < P < 0.409
Unsuitable Distribution Area	0.000 < P < 0.119
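Table 4 maps each predicted suitability value P to one of four classes, and Figure 7 reports how much of the study area falls in each class. A minimal sketch of both steps follows, under two assumptions labeled in the code: a value exactly on a threshold (left unassigned by the strict inequalities in Table 4) goes to the higher class, and, when no cell areas are supplied, every grid cell is weighted equally. On a regular latitude–longitude grid, real cell areas shrink toward the poles, so area weights should be passed in for accurate percentages. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Class boundaries from Table 4, in ascending order.
THRESHOLDS = (0.119, 0.409, 0.762)
CLASSES = (
    "Unsuitable Distribution Area",
    "Low-Suitability Distribution Area",
    "More Suitable Distribution Area",
    "Optimum Distribution Area",
)


def suitability_class(p):
    """Map a predicted suitability P in [0, 1] to a Table 4 class.

    Assumption: a value equal to a threshold is assigned to the higher class.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"P must lie in [0, 1], got {p}")
    return CLASSES[bisect_right(THRESHOLDS, p)]


def class_area_percentages(probabilities, cell_areas=None):
    """Percentage of the total area falling in each suitability class.

    Assumption: without cell_areas, every grid cell counts equally.
    """
    if cell_areas is None:
        cell_areas = [1.0] * len(probabilities)
    totals = Counter()
    for p, area in zip(probabilities, cell_areas):
        totals[suitability_class(p)] += area
    grand_total = sum(totals.values())
    return {name: 100.0 * totals[name] / grand_total for name in CLASSES}


if __name__ == "__main__":
    # Toy predictions for illustration only, not model output from the study.
    toy = [0.05, 0.2, 0.5, 0.8, 0.9]
    for name, pct in class_area_percentages(toy).items():
        print(f"{name}: {pct:.1f}%")
```

A cumulative curve like the one in Figure 7 is then a running sum of these percentages (for example with `itertools.accumulate`) in whichever class order the figure uses.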

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Dong, M.; Yang, J.; Fu, Y.; Fu, T.; Zhao, Q.; Zhang, X.; Xu, Q.; Zhang, W. Distribution of Suitable Habitats for Soft Corals (Alcyonacea) Based on Machine Learning. J. Mar. Sci. Eng. 2024, 12, 242. https://doi.org/10.3390/jmse12020242

AMA Style

Dong M, Yang J, Fu Y, Fu T, Zhao Q, Zhang X, Xu Q, Zhang W. Distribution of Suitable Habitats for Soft Corals (Alcyonacea) Based on Machine Learning. Journal of Marine Science and Engineering. 2024; 12(2):242. https://doi.org/10.3390/jmse12020242

Chicago/Turabian Style

Dong, Minxing, Jichao Yang, Yushan Fu, Tengfei Fu, Qing Zhao, Xuelei Zhang, Qinzeng Xu, and Wenquan Zhang. 2024. "Distribution of Suitable Habitats for Soft Corals (Alcyonacea) Based on Machine Learning" Journal of Marine Science and Engineering 12, no. 2: 242. https://doi.org/10.3390/jmse12020242

APA Style

Dong, M., Yang, J., Fu, Y., Fu, T., Zhao, Q., Zhang, X., Xu, Q., & Zhang, W. (2024). Distribution of Suitable Habitats for Soft Corals (Alcyonacea) Based on Machine Learning. Journal of Marine Science and Engineering, 12(2), 242. https://doi.org/10.3390/jmse12020242

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
