Coral Reef Bleaching under Climate Change: Prediction Modeling and Machine Learning

Abstract: Coral reefs are important ecosystems that protect underwater life and coastal areas. They are also a natural attraction that draws many tourists to undersea eco-tourism. However, climate change has led to coral reef bleaching and elevated mortality rates. This paper therefore models and predicts coral reef bleaching under climate change using machine learning techniques, to provide data that support coral reef protection. Supervised machine learning was used to predict the level of coral damage from previous information, while unsupervised machine learning was applied to model the coral reef bleaching area and to discover relationships among bleaching factors. For supervised machine learning, three widely used algorithms were included: Naïve Bayes, support vector machine (SVM), and decision tree. The accuracy of classifying coral reef bleaching under climate change was compared among these three models. Unsupervised machine learning based on a clustering technique was used to group coral reefs with similar bleaching characteristics. Then, the correlation between bleaching conditions and characteristics was examined. We used a 5-year dataset obtained from the Department of Marine and Coastal Resources, Thailand, during 2013–2018. The results showed that SVM was the most effective classification model with 88.85% accuracy, followed by decision tree and Naïve Bayes, which achieved 80.25% and 71.34% accuracy, respectively. In unsupervised machine learning, coral reef characteristics were clustered into six groups, and we found that seawater pH and sea surface temperature correlated with coral reef bleaching.


Introduction
Coral reefs are important and valuable ecosystems that protect underwater life and coastal areas. Moreover, the presence of many coral reefs supports commercial livelihoods in fishing and tourism. Healthy coral reefs attract many tourists to undersea eco-tourism. In Thailand, the income derived from the marine tourism industry in the south drives the country's GDP growth [1]. For this reason, if the state of marine fertility can be predicted, the prediction can inform an assessment of how well fishing and tourism livelihoods can be sustained. It can also support the zoning of tourism areas for further conservation. Thus, this research presents a modeling and predictive approach to guide the maintenance of healthy coral reefs, with consideration of coral reef bleaching. There are many causes of coral reef bleaching, but higher ocean temperatures resulting from climate change are the leading cause of bleaching and elevated mortality rates [2,3]. Global temperature currently shows an increasing trend, with simultaneous slight warming of the ocean waters. Warmer water is a major contributor to coral bleaching [4–7]. Coral bleaching is caused by climate change and by other processes, such as runoff and pollution (storm-generated precipitation can rapidly dilute ocean water, and runoff can carry pollutants that can bleach near-shore corals), overexposure to sunlight (when temperatures are high, high solar irradiance contributes to bleaching in shallow-water corals), and extreme low tides (exposure to the air during extreme low tides can cause bleaching in shallow corals). Because warming is the leading cause, sea surface temperature is the main factor used for the predictive model in this work.
Another contributing factor is wind speed, which has an interaction effect with the seawater temperature. When the wind speed is low, a large amount of solar energy can penetrate the water surface, and this increases the water temperature [8]. Therefore, when corals are exposed to very high sunlight in combination with low wind speeds, the algae in the corals can be harmed by the sunlight, making coral bleaching more likely [9,10]. In addition to the factors mentioned above, many other factors affect coral reefs, such as the levels of chemical contaminants in the sea, which largely have negative effects. Some of these contaminants are natural and some are anthropogenic. Contaminants can cause a change in pH, which can harm the corals. Coral tissue that is irritated tends to contract and form a gel layer [11,12]. Thus, pH, which is explicit and measurable, was chosen as one factor for the proposed modeling.
Predictive models are widely used for prevention and protection activities [10,13–16]. In particular, machine learning techniques have been applied to predict the bleaching of coral reefs. These techniques determine statistical relationships and patterns in a dataset by fitting a limited predefined model structure, or by statistical inference [13]. Most studies have used machine learning on remote-sensed imagery and in geospatial image processing for predicting coral reef bleaching. This approach is advantageous because it enables data visualization for monitoring coral reef bleaching. However, it requires costly data processing when an organization or a government agency provides big data for a retrospective study. Other studies have reviewed data collected for coral reefs and coral bleaching [17,18], but only a few have used machine learning techniques to predict and model such collected data. Thus, the current study applied both supervised and unsupervised machine learning techniques to collected data.
In this study, several alternative models were studied to select the one giving the highest accuracy. Three types of models were used, namely, Naïve Bayes, SVM, and decision tree. The Naïve Bayes model makes probabilistic predictions of a classification label. A trained SVM also predicts coral damage levels as a classification problem; it is a model that uses coefficients to create boundaries between classes. A decision tree model can be easily interpreted by people as a set of simple rules at the decision nodes. In this research, the natural factors that change over time are seawater temperature and wind speed. A further factor that may be influenced naturally or by anthropogenic effects is the pH of the water. The retrospective data for these factors were collected from various databases for a period of 5 years. The study area is made up of coastal and island areas in southern Thailand, and the data were provided by the Department of Marine and Coastal Resources [19].
This article is organized as follows: Section 2 reviews the study factors of coral reef bleaching. Section 3 provides the materials and methods used for predictive modelling. Section 4 presents the proposed methodology, Section 5 presents the results and discussion, and finally, Section 6 concludes the article.

Related Works
In this section, we address the factors studied in relation to coral reef bleaching. Coral bleaching is mainly caused by stress due to climate change, under which corals expel their symbiotic algae and turn white. Coral may also bleach for other reasons, such as extremely low tides, pollution, or too much sunlight. For this reason, many studies have addressed the factors affecting coral reef bleaching [4–7,20,21]. Melissa et al. [20] and Steve et al. [21] studied the temperature factor affecting coral bleaching. In an experimental study, samples of Acropora formosa were taken to study the effects of temperature on coral bleaching. The coral samples were divided into three treatment groups, with some placed in cool water, some in temperature-controlled water, and some in water with elevated temperature. The study lasted 20 days, and some changes were observed on day 5, suggesting a decrease in algal density in the corals kept in cool or warm water. However, the corals in temperature-controlled water showed an increase in the algae inhabiting the coral. From day 10 onwards, it became apparent that the warm temperature caused a continual decrease in coral algal density, and this was a major contributor to coral bleaching. This is because the algae that live in corals leave their original habitat to find a new habitat with a suitable water temperature. Thus, the temperature factor has a significant effect on coral bleaching.
In addition, climate change warms the ocean, which leads to higher sea levels and changes ocean conditions through decreases in salinity. Thus, seawater conditions such as turbidity and pH have implications for coral bleaching risk. Turbidity is partially caused by wind speed.
Paparella et al. [22] studied the effect of wind speed and daily temperature change. They found that the two factors were highly correlated and affected coral bleaching. Faster winds caused cooling, with the magnitude of the temperature decline increasing with wind speed [8]. Recent studies found that pH change led to changes in the calcium carbonate content of corals [11,12]. Tresguerres et al. [23] studied how corals maintain the steady pH needed for all metabolic processes to function. They found that acid-base homeostasis mechanisms affect coral physiological responses. Understanding the physiological interactions between temperature stress and acid-base homeostasis is critical for predicting coral performance and acclimatization potential in a changing environment. However, little is known about how climate change affects acid-base homeostasis in corals. The potential effects of heat stress on acid-base regulation pose a particular challenge for maintaining coral calcification, as biomineralization is highly pH-dependent [24,25]. For these reasons, this work selected the three factors mentioned above (sea surface temperature, wind speed, and pH) to model and forecast coral reef bleaching.

Materials and Methods
Coral reefs are crucial for maintaining diverse ecosystems in the sea. Many studies have investigated the major causes of coral bleaching in various areas of the world [20]. Modern technologies such as machine learning can be used in such investigations. Machine learning is a part of artificial intelligence that automates analytical model building: it builds predictive models from data without being explicitly programmed for the task and requires little human involvement [26–28]. Machine learning is finding its way into every facet not only of society but also of the natural world [6,29]. Most studies on coral reefs have applied machine learning to detect the health of coral reefs [29–32]. Machine learning algorithms learn to make decisions or predictions based on data. For example, if the state of marine fertility can be predicted, the prediction can inform an assessment of how well fishing and tourism livelihoods can be sustained. These algorithms are traditionally classified into three main categories based on learning feedback [28,33]: supervised learning, unsupervised learning, and reinforcement learning. In this study, we focused on supervised and unsupervised learning.
Supervised learning is used to predict the level of coral bleaching by learning from previous information. Many algorithms are available for building such a predictive model. This work selected three popular algorithms that fit a model in different ways: Naïve Bayes, SVM, and decision tree are built on probability, functional, and information gain theory, respectively. Unsupervised learning, on the other hand, is used to cluster coral reef bleaching data, which leads to organized zoning by the level of coral reef health. Moreover, association rule mining, also an unsupervised technique, is used to discover the relationships among the studied coral bleaching factors.

Supervised Learning
Supervised learning algorithms can be used to build mathematical models of a set of data that contains both inputs and target outputs as training data (i.e., inputs and known outputs caused by them) [33]. The training data set is used to build a representative model that has learned the relationship between the input and output. The trained or fitted model is then evaluated on test data. After testing, when an unseen case is fed into the system, the model can be used to predict the expected output. Algorithms for training such models have been developed under many approaches. We describe the following supervised learning algorithms for classification tasks that are applied in this study: Naïve Bayes, Support Vector Machine (SVM), and Decision Tree models. These procedures used the R program as the management and analytical tool.
Naïve Bayes
A Naïve Bayes or Bayesian classifier can be used to predict class membership probabilities, i.e., the probability that a given sample belongs to a particular class [34]. This type of model can also reach high accuracy and speed when applied to large databases. Moreover, training the model requires only a small number of exemplars to learn the model parameters. The principles of classification are based on Bayes' theorem for conditional probability:
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$
where $P(H)$ and $P(X)$ are the probabilities of observing $H$ and $X$ without regard to each other, while $P(H \mid X)$ is the conditional probability of $H$ given $X$, $P(X \mid H)$ is the conditional probability of $X$ given $H$, and $P(X \mid H)/P(X)$ is called the likelihood ratio or Bayes factor. The workflow is summarized as follows:
• Represent each exemplar with a parameter vector $X = (x_1, x_2, \ldots, x_n)$, where $x_i$ is the value of factor attribute $i$.
• Compute the probability of each class label among the $m$ classes $C_1, C_2, \ldots, C_m$ according to Bayes' theorem, $P(C_i \mid X) = P(X \mid C_i)\,P(C_i)/P(X)$, where the class label in this work is the condition level of the coral reef.
• Compute $P(X \mid C_i)\,P(C_i)$ for every class, where $P(X \mid C_i)$ is calculated from the product rule for independent events, $P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$.
• Classify an unknown case $X$ to the class $C_i$ that gives the maximal $P(X \mid C_i)\,P(C_i)$.
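The workflow above can be sketched in a few lines of Python. This is a minimal illustration and not the R implementation used in the study: it assumes a Gaussian likelihood for each numeric attribute (the paper does not state how P(x_k|C_i) was estimated), and the toy records, the two class labels shown, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate the prior P(C_i) and per-attribute mean/variance for each class."""
    grouped = defaultdict(list)
    for x, c in zip(rows, labels):
        grouped[c].append(x)
    model = {}
    for c, xs in grouped.items():
        n = len(xs)
        columns = list(zip(*xs))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[c] = (n / len(rows), means, variances)
    return model

def log_scores(model, x):
    """Return log P(X|C_i) + log P(C_i) for every class (product rule in log form)."""
    scores = {}
    for c, (prior, means, variances) in model.items():
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                      for v, m, var in zip(x, means, variances))
        scores[c] = math.log(prior) + log_lik
    return scores

def predict(model, x):
    """Classify x to the class with the maximal P(X|C_i) P(C_i)."""
    scores = log_scores(model, x)
    return max(scores, key=scores.get)

# Hypothetical toy records: (pH, sea surface temperature in deg C, wind speed in m/s).
X_TOY = [(8.2, 28.5, 5.0), (8.1, 28.9, 4.5), (8.2, 29.0, 5.5),
         (7.8, 31.5, 1.5), (7.9, 31.0, 2.0), (7.8, 31.8, 1.0)]
Y_TOY = ["luxuriant", "luxuriant", "luxuriant", "damaged", "damaged", "damaged"]

nb_model = fit_gaussian_nb(X_TOY, Y_TOY)
print(predict(nb_model, (7.85, 31.2, 1.8)))
```

Working with log probabilities avoids numerical underflow when many attribute likelihoods are multiplied; the ranking of classes is unchanged because the logarithm is monotonic.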

Support Vector Machine
SVM is another widely used supervised learning method that reduces the empirical risk while maximizing the margin between a separating hyperplane and the separated classes [35]. With a linear kernel function, an SVM is a linear classifier that finds a hyperplane to separate two classes of data. Linear kernel functions work effectively if the classes are linearly separable. Nonlinear separation can be achieved with other kernel functions. Several kernel functions are available, including linear, polynomial, and radial basis function (RBF) kernels. Polynomial and RBF kernels are commonly used, depending on the training dataset. With such a kernel, SVM maps each training exemplar to a point in a high-dimensional space and separates the classes with a hyperplane that maximizes the margin to the two categories; mapped back to the original lower-dimensional space, this boundary appears as a nonlinear separation. SVM natively supports two-class classification. To manage multi-class classification, there are various strategies, such as one-against-rest and error-correcting output coding. The advantage of SVM is that it has a firm mathematical foundation and often provides high-performance classification for both high- and low-dimensional data.
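To make the kernel functions and the one-against-rest strategy concrete, the following Python sketch writes them out directly. It does not train an SVM; the hyperparameter defaults (degree, coef0, gamma) and the function names are illustrative assumptions, not the settings used in the study.

```python
import math

def linear_kernel(x, z):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x . z + coef0) ** degree; degree and coef0 are assumed defaults."""
    return (linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2); gamma is an assumed default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def one_against_rest(decision_values):
    """Given one binary decision value per class, pick the class with the largest value."""
    return max(decision_values, key=decision_values.get)

# Hypothetical decision values from three binary (class versus rest) classifiers.
print(one_against_rest({"damaged": -0.4, "moderately luxuriant": 0.2, "luxuriant": 1.3}))
```

The RBF kernel equals 1 for identical points and decays towards 0 as points move apart, which is what lets the separating boundary bend around local groups of exemplars.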

Decision Tree
A decision tree is a learning model that performs classification through binary branchings at decision nodes. In the tree, each internal node denotes an attribute and a threshold, each branch represents a value range of the attribute, and the final leaf nodes hold the class labels. This model was applied using the algorithm in [34]. To classify an unknown sample, the attribute values of the sample are tested against the decision tree, and the path is traversed from the root to a leaf node, which gives the class label for that sample or exemplar [34]. Decision trees can be easily converted into classification rules, taken from the binary decision nodes. Thus, an unseen case without a class label can be classified simply by comparing attribute values with the nodes of the decision tree. The advantages of a decision tree are intuitively appealing knowledge expression, simple implementation, and high classification accuracy. Thus, we selected this model to compare its performance with the other candidate models.
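As an illustration of how a decision node can choose its attribute threshold, the sketch below computes entropy-based information gain for a binary split. It is a hedged example rather than the exact algorithm of [34]: the midpoint candidate thresholds, the toy temperatures, and the function names are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on value <= threshold versus value > threshold."""
    left = [c for v, c in zip(values, labels) if v <= threshold]
    right = [c for v, c in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values (needs at least two distinct values)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))

# Hypothetical sea surface temperatures (deg C) and coral condition labels.
SST_TOY = [28.5, 28.9, 29.0, 31.0, 31.5, 31.8]
LABELS_TOY = ["luxuriant"] * 3 + ["damaged"] * 3

threshold = best_threshold(SST_TOY, LABELS_TOY)
print(threshold, information_gain(SST_TOY, LABELS_TOY, threshold))
```

A full tree repeats this search over every attribute at every node and recurses on the two resulting subsets until the leaves are sufficiently pure.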

Unsupervised Learning
An unsupervised learning algorithm learns patterns from an unlabeled set of inputs (there is no target output). Practically, this requires finding patterns, structures, or knowledge from unlabeled data. In this study, we used k-means clustering and association rules for unsupervised learning.

Clustering
Clustering is a method often used in exploratory analysis. Closely similar cases or exemplars are assigned to the same cluster, which should differ from the objects in the other clusters. A popular method is k-means clustering, which we use in this study. The k-means algorithm takes the training dataset and the number of clusters k as required inputs. The steps in the k-means algorithm are as follows:
• Select the number of clusters k and choose an initial guess for each of the k centroids.
• Calculate the distance between each data point and every centroid, and assign each point to the cluster of its nearest centroid.
• Calculate the new centroid of each cluster from the assignments in Step 2.
• Repeat Steps 2 and 3 until the algorithm converges to an answer.
An advantage of the k-means algorithm is that it is easy to use and its cluster results are easy to interpret. The k clusters produced by the algorithm are groups of similar data, but the choice of k should be determined by testing several alternatives. Commonly, the within sum of squares (WSS) metric is used to select the value of k. WSS is the sum of the squared distances between each data point and its closest centroid, and these sums are compared over a range of k values.
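The four steps and the WSS metric can be sketched as follows. This is an illustrative Python version, not the R code used in the study; taking the first k points as initial centroids is an assumption made for reproducibility, and the toy (pH, sea surface temperature) points are hypothetical.

```python
import math

def assign(points, centroids):
    """Step 2: put every point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, max_iter=100):
    # Step 1: initial guess for the centroids (assumption: the first k points).
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Step 3: recompute each centroid; an empty cluster keeps its old centroid.
        updated = [tuple(sum(col) / len(members) for col in zip(*members)) if members
                   else centroids[j] for j, members in enumerate(clusters)]
        if updated == centroids:  # Step 4: stop once the centroids no longer move.
            break
        centroids = updated
    return centroids, assign(points, centroids)

def wss(centroids, clusters):
    """Within sum of squares: squared distance from each point to its centroid."""
    return sum(math.dist(p, c) ** 2 for c, members in zip(centroids, clusters) for p in members)

# Hypothetical toy points: (pH, sea surface temperature in deg C).
POINTS_TOY = [(8.2, 28.5), (7.8, 31.5), (8.1, 28.9), (7.9, 31.0), (8.2, 29.0), (7.8, 31.8)]

wss_by_k = {k: wss(*kmeans(POINTS_TOY, k)) for k in range(1, 5)}
for k, value in wss_by_k.items():
    print(k, round(value, 3))
```

Plotting the WSS values against k and looking for the point where the curve flattens is the usual way to pick k. In practice the attributes would typically be rescaled first, since unscaled temperature differences dominate the much smaller pH differences in the distance calculation.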

Association Rules
Association rules record the associations or correlations among a large set of data [35]. The process of rule generation consists of two main steps: finding all frequent itemsets and creating rules from them. In the first step, the frequency of each itemset is counted over the data records, and a cut-off is applied based on a predetermined minimum support. The frequent itemsets are then converted to association rules. The rules must satisfy both minimum support and minimum confidence.
In this study, we used the Apriori algorithm in the WEKA software to generate association rules. Association rules are formed using if/then statements that determine the logical relationships in coral reef bleaching data. These rules summarize the relationships between causal factors and the condition of coral reefs. The minimum support and confidence used in this study were 0.80 and 0.95, respectively.
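The two thresholds can be illustrated with a small Python sketch that computes support and confidence for a single candidate rule. It is not the WEKA Apriori implementation, which also enumerates the frequent itemsets; the discretised items such as "sst=high" and the toy transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    joint = support(transactions, frozenset(antecedent) | frozenset(consequent))
    return joint / base if base else 0.0

# Hypothetical discretised survey records.
TRANSACTIONS_TOY = [
    frozenset({"sst=high", "ph=low", "bleached"}),
    frozenset({"sst=high", "ph=low", "bleached"}),
    frozenset({"sst=high", "ph=low"}),
    frozenset({"sst=low", "ph=high"}),
    frozenset({"sst=high", "ph=low", "bleached"}),
]

MIN_SUPPORT, MIN_CONFIDENCE = 0.80, 0.95  # thresholds reported in the text

rule_support = support(TRANSACTIONS_TOY, {"sst=high", "ph=low"})
rule_confidence = confidence(TRANSACTIONS_TOY, {"sst=high"}, {"ph=low"})
keep_rule = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
print(rule_support, rule_confidence, keep_rule)
```

In this toy data the rule "if sst=high then ph=low" passes both thresholds, whereas "if sst=high then bleached" would be rejected because its support and confidence fall below them.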

Research Methodology
In this study, the modeling and prediction of coral reef bleaching associated with climate change had three phases: data preparation, data modeling, and evaluation and deployment. The data preparation phase first provided the data set needed to build the predictive model using machine learning. Next, the data modeling phase involved fitting the predictive models. Supervised machine learning, i.e., Naïve Bayes, SVM, and decision tree, was used to predict the level of coral damage from previous data. Likewise, unsupervised machine learning was used to cluster the data and to recover the relationships among bleaching factors, with k-means clustering and association rules, respectively. The final phase, evaluation and deployment, covered the validation methods. Each phase was implemented as described below.

Data Preparation
This study examined the bleaching of corals in the southern marine area of Thailand. This area is rich in marine nature and has become a major tourist attraction of the country. In Thailand, the income derived from the marine tourism industry drives the country's GDP growth. Coral regions are ecological areas with high biodiversity. However, the occurrence of coral reef bleaching indicates serious damage to those ecosystems. Thus, this study tested the modeling and prediction of coral reef bleaching to prepare guidelines for managing or preventing this phenomenon. We used data for the years 2013 to 2018 from the Department of Marine and Coastal Resources [19], which collects statistical information on marine resources in Thailand. This study focused on three factors that affect coral bleaching: pH, sea surface temperature, and wind speed. We used these factors to model and predict the level of coral reef bleaching.
The data were collected through a staff survey in each quarter to explore the level of coral reef bleaching, where the damage level was classified based on the ratio of the total area of coral to the area of bleached coral in each locality. This study examined 287 coral areas in the southern marine area of Thailand (Figure 1; border shading shows the coastal and island areas). The overall condition of coral reefs was defined in five levels: completely damaged coral, damaged coral, moderately luxuriant coral, luxuriant coral, and perfectly luxuriant coral. These are also the target class labels in the classifier models. The meaning of each level is shown in Table 1. The data set contained the count of coral colonies showing bleaching (i.e., the percentage of coral reef recorded as bleached), which is an indication of the relative area of bleaching (Table 1). The obtained data set was prepared for further analysis by removing the columns unnecessary for each analysis, e.g., the record I.D. Missing values were handled by removing the records that contained them. Afterwards, the data were consolidated into forms appropriate for data analysis. Finally, we stored the output data in .csv format for the next step.

Data Model
The planning process was based on a study of algorithms used for the classification of coral bleaching conditions. In this study, we modeled with and compared the performance of three classifier types: Naïve Bayes, SVM, and decision tree. Figure 1 shows the study area. Moreover, we applied clustering to group similar cases and then used association rules to determine dependent factors, suggesting the relevance of the data used to examine coral reef bleaching. Figure 2 shows the steps, including data segmentation, modeling, testing, and benchmarking. This study performed modeling in two parts: supervised learning (classification task) and unsupervised learning.
In the supervised learning part, Naïve Bayes, SVM, and decision trees were trained in order to select the best model for classifying and predicting the risk of coral bleaching, based on the aforementioned three causal factors. Naïve Bayes was used to classify coral data groups by using probability principles. Next, SVM was used to categorize coral condition in the areas of this study. Finally, a decision tree analyzed the data in the form of a tree diagram demonstrating the role of each causal factor or condition in causing the bleaching of corals. For this part, we divided the data into a training set (70%) and a test set (30%). The training set was used to fit each model, while the test set was used to measure the classification accuracy of each trained model on new data not shown to it earlier. Afterwards, the model showing the highest performance on the test data was chosen for subsequent studies.
In the unsupervised learning part, clustering was first used to group similar cases. This step applied k-means clustering. Next, association rule learning was applied to determine the associations or correlations among a large set of data items. To determine the factors contributing to coral bleaching in the southern sea regions of Thailand, each cluster was examined for the causes of coral bleaching.
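The 70/30 partition and the test-set accuracy can be sketched as below. The paper does not say whether the split was random or stratified, so the seeded random shuffle here is an assumption, as are the function names and the placeholder record indices.

```python
import random

def train_test_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(y_true, y_pred):
    """Share of test cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# 100 placeholder record indices stand in for the prepared .csv rows.
train_set, test_set = train_test_split(range(100))
print(len(train_set), len(test_set))
```

Each candidate model would be fitted on `train_set` only and scored with `accuracy` on `test_set`, so that the reported figure reflects performance on records the model has not seen.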

Evaluation and Deployment
Estimating classifier accuracy is crucial because it indicates how reliably the classifier will label future data. Thus, this study used 10-fold cross-validation to assess the classifier accuracies. Moreover, we used another indicator of model performance, namely the kappa coefficient. Its maximum value is 1, which indicates perfect agreement between the classifier calls and the true labels, while 0 indicates agreement no better than expected by chance; negative values, which are rare in practice, indicate agreement worse than chance. Thus, a larger kappa coefficient is better. The performance of the model was evaluated based on both classification accuracy and kappa coefficient. To examine the performance of clustering and of the association rules, within sum of squares (WSS) was used to find the best clustering and supply the grouped data. Afterwards, each group with similar factors was examined using association rules with a minimum confidence threshold of 95%.
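For reference, the kappa coefficient compares the observed agreement p_o with the agreement p_e expected by chance, kappa = (p_o - p_e) / (1 - p_e). The Python sketch below computes it alongside a simple 10-fold index generator; the interleaved fold assignment and the function names are assumptions, since the paper does not describe how its folds were formed.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """(p_o - p_e) / (1 - p_e); undefined when p_e equals 1 (a single class throughout)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

def k_fold_indices(n, k=10):
    """Yield (training indices, held-out indices) for each of the k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved assignment
    for held_out in folds:
        held = set(held_out)
        yield [i for i in range(n) if i not in held], held_out

# Hypothetical labels: three of four calls agree with the reference.
print(cohen_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"]))
```

In a cross-validation run, the model is refitted on each training part, scored on the matching held-out part, and the ten accuracy and kappa values are averaged.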

Results and Discussion
In this study, we used collected data to analyze coral reef bleaching in relation to climate change by considering three candidate causal factors: seawater pH, water temperature, and wind speed. The results of the preliminary analysis are shown in Figure 3, where (a), (b), and (c) depict findings for seawater pH, temperature, and wind speed, respectively. The time interval was divided into four quarters (Q). The left side of the figure shows, for each quarter, the time interval affecting the coral reef. We collected four quarters of data because the changes in the weather during each quarter result in changes in the parameters measured in the sea, which in turn can result in coral bleaching each quarter. The average across all quarters, on the right side of the figure, shows that each of these factors damages coral reefs when its level is elevated.
This study modeled and predicted coral reef bleaching by using machine learning techniques, including both supervised and unsupervised learning. First, supervised learning of a classification task was used to build models for predicting the severity level of coral reef bleaching in a future situation. Then, unsupervised learning based on clustering and association rules was applied to group similar situations and to assess correlations within each group of coral reef bleaching data. This study used the R program as a management and analytical tool, with R-3.5.1 for Windows and RStudio Version 1.1.456. The results of supervised learning are listed in Table 2. The results show that the SVM model was the most accurate in classifying corals in the southern marine area of Thailand, with an accuracy of 88.85%, followed by the decision tree model with an accuracy of 80.25% and the Naïve Bayes model with an accuracy of 71.34%. This indicates that all three model types were effective in coral bleaching classification in the sea area of southern Thailand.
This is in accordance with some previous studies [26,36] that used the Naïve Bayes model to predict the risk posed to the health and resilience of the coral reef system by adverse effects of climate change and harmful human activities, and the possible success of adaptation strategies. Thus, on the basis of these results, it is possible to examine the relationships and conditions that cause the bleaching phenomenon in corals. These models can be used to model and predict coral reef bleaching during climate change.
Later on, the data set was clustered into groups of records with similar properties. WSS was used to find a suitable number of clusters for k-means clustering, as shown in Figure 4. The k-means algorithm was run for k = 2, 3, . . . , 15, and the WSS was computed for each k to determine a suitable number of clusters. WSS was smaller with more clusters, allowing better splitting of the data for higher similarity within clusters. WSS declined significantly as the k value increased from 1 to 2. Another substantial reduction in WSS occurred at a k value of 6; thus, from this analysis we selected k = 6. This process of determining a suitable value of k is known as finding the elbow in the WSS curve.
When we compared the classification of coral reef condition with that of a previous study [19] based on zoning features, we found that the proposed model provides more detail about the condition, as shown in Figure 5. We found that the presented model could provide details of the areas that should be monitored and maintained because of their relatively high coral fertility. The association rules for each group were discovered with the Apriori algorithm. We found that pH depends on sea surface temperature: if the sea surface temperature increases, then the pH decreases [37–40]. Due to current climate change, it is likely that the acidity and temperature of the sea may continue to increase in the future [41].
In summary, this work applied supervised machine learning to find a fitted predictive model of the level of coral reef health. The SVM model is the best model for classifying bleaching status when the values of pH, sea surface temperature, and wind speed are known. In addition, unsupervised machine learning using k-means clustering led to an explicit zoning of coral reef bleaching. The association rule approach also discovered relationships among the studied factors of coral reef bleaching. The knowledge discovered in this way can be used as a guideline for the protection of coral reefs.

Conclusions
Coral reef bleaching is an important sign of marine ecosystem destruction, which affects marine livelihoods and businesses. Although climate change is the main cause and cannot be managed in a short time, coral reefs can still be protected to prevent further damage. Past information can be used to guide coral safeguarding in the future. Thus, machine learning is suitable for use in decision support for coral reef protection. In this study, we demonstrated building a model for predicting the bleaching of coral reefs by using machine learning. We applied three supervised learning algorithms, namely, Naïve Bayes, SVM, and decision trees, and selected from these the model with the best classification performance. The SVM presented the best performance for classifying the level of coral reef bleaching, with 88.85% accuracy. The classifier calls indicated the severity of bleaching. The developed model could be used to predict the level of coral reef bleaching under climate change. Unsupervised learning was used to obtain knowledge from previous coral reef bleaching data. The level of coral reef bleaching should be classified into six levels based on retrospective data. In addition, we found that the pH level is associated with the sea surface temperature. Thus, this study provides additional evidence that machine learning models are a viable and useful approach to monitoring and analyzing coral reef bleaching under climate change.