Spatio-Temporal Analysis of Oil Spill Impact and Recovery Pattern of Coastal Vegetation and Wetland Using Multispectral Satellite Landsat 8-OLI Imagery and Machine Learning Models

Abstract: Oil spills are a global phenomenon with impacts that cut across socio-economic, health, and environmental dimensions of the coastal ecosystem. However, comprehensive assessment of oil spill impacts and selection of appropriate remediation approaches have been restricted due to reliance on laboratory experiments which offer limited area coverage and classification accuracy. Thus, this study utilizes multispectral Landsat 8-OLI remote sensing imagery and machine learning models to assess the impacts of oil spills on coastal vegetation and wetland and monitor the recovery pattern of polluted vegetation and wetland in a coastal city. The spatial extent of polluted areas was also precisely quantified for effective management of the coastal ecosystem. Using Johor, a coastal city in Malaysia as a case study, a total of 49 oil spill (ground truth) locations, 54 non-oil-spill locations and Landsat 8-OLI data were utilized for the study. The ground truth points were divided into 70% training and 30% validation parts for the classification of polluted vegetation and wetland. Sixteen different indices that have been used to monitor vegetation and wetland stress in literature were adopted for impact and recovery analysis. To eliminate similarities in spectral appearance of oil-spill-affected vegetation, wetland and other elements like burnt and dead vegetation, Support Vector Machine (SVM) and Random Forest (RF) machine learning models were used for the classification of polluted and nonpolluted vegetation and wetlands. Model optimization was performed using a random search method to improve the models' performance, and accuracy assessments confirmed the effectiveness of the two machine learning models to identify, classify and quantify the area extent of oil pollution on coastal vegetation and wetland. Considering the harmonic mean (F1), overall accuracy (OA), user's accuracy (UA), and producer's accuracy (PA), both models have high accuracies.
However, the RF outperformed the SVM with F1, OA, PA and UA values of 95.32%, 96.80%, 98.82% and 95.11%, respectively, while the SVM recorded accuracy values of F1 (80.83%), OA (92.87%), PA (95.18%) and UA (93.81%). The RF classification highlighted 2949.79 hectares of polluted vegetation and 1205.98 hectares of polluted wetland. Analysis of the vegetation indices revealed that spilled oil had a significant impact on the vegetation and wetland, although steady recovery was observed between 2015 and 2018. This study concludes that the Chlorophyll Vegetation Index, Modified Difference Water Index, Normalized Difference Vegetation Index and Green Chlorophyll Index are more sensitive for the assessment of oil spill impacts on both vegetation and wetland, in addition to the Modified Normalized Difference Vegetation Index for wetlands. Thus, remote sensing and machine learning models are essential tools capable of providing accurate information for coastal oil spill impact assessment and recovery analysis for appropriate remediation initiatives.

due to their high sensitivity to different elements [27,28] rather than terrestrial monitoring. Nonetheless, mapping of post-spill affected vegetation areas using remote sensing is affected by overestimation due to similarities in spectral appearance of oil spill affected vegetation, wetland and other elements like burnt and dead vegetation [29,30]. To date, remote sensing multispectral images have been used in monitoring impacts of disasters like hurricanes [14,31] and oil spills [32–34] on vegetation. Previous studies such as [35–38] considered mainly terrestrial vegetation health indices, which are only capable of assessing the impact of oil spills without giving the exact extent of the polluted and nonpolluted areas. More recent studies have incorporated machine learning models that classify affected and nonaffected terrestrial vegetation to give the exact area extent of polluted areas [39,40]. This process was hitherto affected by inaccuracies in classification, leading to spectral confusion. Further, lack of model comparison has limited the reliability of these approaches. Moreover, assessment of oil impacts on wetlands, which constitute part of coastal zones, has been scant, with most of the focus on vegetation despite the variations in hydrocarbon stress for different coastal zones and vegetation areas [41]. In addition, previous studies have neglected recovery assessment, while impact assessments were often undertaken using data covering two broad periods: pre- and post-oil spill.
To address the aforementioned research gaps, this paper integrated Support Vector Machine (SVM) and Random Forest (RF) machine learning models due to their impressive functionalities in analysis and achieving local minima and generalization with a small sample size [42,43] to classify polluted and nonpolluted vegetation and wetland. This will be followed by a comprehensive assessment of the impacts and recovery trend of the polluted vegetation and wetland over an extended period. The empirical recovery assessment of vegetation and wetlands proposed in this study will provide evidence-based information to better aid decision making for sustainable management of coastal oil spill disasters.

Study Area
Johor (Figure 1a) is bounded by the Straits of Malacca in the west, the Straits of Johor in the south and the South China Sea in the east. It has a total of 400 km of coastline, mainly in the east and west, which are predominantly habitats of mangrove, swampy wetland, grasses and Nipah forest [44]. A high percentage of oil palm production is carried out in Johor because of its fertile land [45], and it is renowned for its intensive port activities, comprising domestic and international marine transportation. The coastal city, especially Kota Tinggi (Figure 1b), is highly vulnerable to oil spills because of the frequent use and movement of petroleum products that are often discharged into the water body [46]. Similarly, its proximity to the South China Sea, which experiences intense cargo vessel movements, exacerbates its vulnerability to oil spill pollution [47]. This frequent transportation of crude oil has caused different oil spills like the Jeti PML plant vessel explosion and fire (2012)

Data Used
The first step in this study was the collection of oil spill data (2014) of the study area from Malaysia's Ministry of Environment, which includes the location, date, time, causes and the type of spill. These spills, comprising mostly crude oil, heavy fuel, tarballs, medium fuel and diesel, usually originated from ship accidents and pipeline leakages and are subsequently washed to the coastal areas over time, affecting the vegetation and wetland. Within the state of Johor, Kota Tinggi experienced a larger portion of the oil spill, which is attributable to the presence of the Liquefied Natural Gas (LNG) terminal at Pangarran and a major ship route at Tanjun Balau and the Strait of Johor. A total of fifteen sites (see Figure 1c) were identified to have been affected by the oil spill, with a land area of more than 3600 square meters (sqm). Forty-nine ground-truth points were then identified around these oil spill sites (Figure 2a,b), and a buffer area of 60 m was created around the points (Figure 2c) for the classification exercise. As a control, a total of 54 nonpolluted sites (ground-truth data) with the same 60 m buffer as the polluted sites were also identified (Table 1). To classify the polluted and nonpolluted wetland and vegetation areas, the ground truth data of the classes were divided into 70% training and 30% validation data.
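The 70%/30% division of ground-truth points can be illustrated with a short sketch. The paper does not state how the split was drawn, so the class-by-class (stratified) random split below is an assumption, and the point identifiers and the helper name `stratified_split` are hypothetical placeholders for the 49 polluted and 54 nonpolluted points.

```python
import random


def stratified_split(points, train_fraction=0.7, seed=0):
    """Split (point_id, label) pairs into training and validation sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for point_id, label in points:
        by_class.setdefault(label, []).append(point_id)

    train, validation = [], []
    for label, ids in by_class.items():
        ids = ids[:]          # copy so the caller's data is not reordered
        rng.shuffle(ids)
        n_train = round(train_fraction * len(ids))
        train += [(i, label) for i in ids[:n_train]]
        validation += [(i, label) for i in ids[n_train:]]
    return train, validation


# Hypothetical IDs standing in for the 49 oil spill and 54 non-oil-spill points.
points = [(f"P{i}", "polluted") for i in range(49)]
points += [(f"N{i}", "nonpolluted") for i in range(54)]

train, validation = stratified_split(points)
```

Splitting within each class keeps the polluted to nonpolluted ratio roughly the same in both subsets, which matters when only about a hundred points are available.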
To undertake the oil spill impact and recovery analysis, a reconnaissance survey was conducted for site selection using Google Earth aerial photographs, which identified Land Use Land Cover (LULC) changes in the study area between 2014 and 2018. Sites with significant infrastructural developments, land reclamation and deforestation were excluded from the analysis, and a total of nine sites were used for the impact and recovery analysis.

Table 1. Oil spill and non-oil-spill ground truth points.

Landsat 8-OLI
Landsat 8-OLI images of path 125 and row 59 between 2013 and 2018 were acquired from the NASA Landsat mission's global land cover, launched in 2013. Landsat 8 has improved technical features. The NIR band has a closer width to the Moderate Resolution Imaging Spectroradiometer (MODIS) near infrared (NIR) band, which is widely used in detection of vegetation health status [7,19]. In addition, two reflectance wavelength bands have been added: the shorter wavelength blue band (0.43-0.45 µm) and the shortwave infrared (SWIR) band (1.36-1.39 µm). The former improves the chlorophyll sensitivity while the latter enables cirrus cloud detection [49,50]. The acquired Landsat 8-OLI (level 2) images were from December 2013 to December 2018, during the monsoon period with lesser rainfall and minimal cloud cover. The images with the lowest cloud cover of 20% were acquired and subjected to sun angle atmospheric correction. The Landsat 8-OLI image was re-projected to Universal Transverse Mercator (UTM) in accordance with the study location (Johor, Malaysia). These procedures are pertinent to quality control and assurance since the study depends mainly on the spectral values from the imagery. Table 2 presents the image's specifications.

Machine Learning Algorithms
In machine learning, size, number of samples, target variable and training data determine the algorithm selection [39,40]. Specifically, there are mainly 2 types of modeling: supervised and unsupervised learning, which depend on the target availability. There are also 2 main types of results: regression or classification output, which depend on target type, factor or numeric. For this study, two supervised learning classification models (Support Vector Machine (SVM) and Random Forest (RF)) were used.

Support Vector Machine (SVM)
Support vector machine (SVM) is a supervised statistical learning technique developed by Vapnik in 1995 [51,52]. Its applications cut across areas like machine vision, handwritten digit and text identification and satellite imagery classification [53,54]. The model is based on a user-defined kernel function for mapping nonlinear decision boundaries in a dataset to linear boundaries in a high-dimensional space [55], with the goal of ascertaining the hyperplane that optimally separates different classes [56,57]. This hyperplane is determined using the training data, while the validation data set is used for making inference [55]. For this study, both the training and validation data sets are represented by a point vector with a 60 m buffer on the stacked 23 spectral variables (see Table 3). In addition to being a binary classifier, SVMs are also used for multiple class classification through the One Against All (OAA) and One Against One (OAO) strategies [58]. SVM has been used for the discrimination of oil-spill- and non-oil-spill-affected vegetation and wetland in various studies [56,58–60].

Random Forest (RF)
Random forest (RF) is a set of tree predictors wherein each tree relies on the value of a random vector sampled independently and with the same distribution for all trees in the forest [61]. Being an ensemble method, random forest is based on bootstrap aggregation. Individual trees are parameterized through random selection of samples from the observations as training data, enabling multicollinearity reduction [62]. RF has been used for the successful classification of oil-spill- and non-oil-spill-affected vegetation and wetland [49,59,63].

Machine Learning Models for Pollution Classification
The evaluation of the two machine learning models for the classification and extent quantification was conducted in two stages. The first stage involves the stacking of Landsat 8-OLI bands 1-7 and 16 spectral vegetation indices as presented in Table 3. The vegetation indices were all derived from the Landsat 8-OLI imagery of December 2014, which is a cloud-free image acquired from the US Geological Survey (USGS) website. The training of the two models was subsequently carried out by first conducting the parameterization of the ground truth on the stacked images. The output was then used for the classification of the area into polluted, nonpolluted and others (spectral reflectance for built-up areas and bare land that are not of interest). Upon the completion of the training task, the validation data set was used to assess the reliability of the models using a confusion matrix. The training and validation activities were performed using EnMAP-Box software.
Remote Sens. 2020, 12, 1225

Accuracy Assessment
Several methods have been developed and used in the assessment of machine learning models for thematic map classifications [64,65], but the error matrix (also referred to as confusion matrix, confusion table or contingency table) is mostly used [40,59,66–68]. The error matrix comprises a square array of values in rows and columns, depicting the number of sampling units assigned to a class relative to the class of the verified (validation) ground truth [65,69]. For this study, the evaluation of the accuracy of each machine learning model was based on the harmonic mean of precision and recall (F1 accuracy) and a number of metrics derived from the error matrix, based on the 30% validation datasets for the four different classes (polluted vegetation, polluted wetland, nonpolluted vegetation and nonpolluted wetland). The F1 accuracy is the harmonic mean of precision and recall (sensitivity), which ascertains the out-of-bag error of the model [39]. Equation (1) is used for the F1 accuracy calculation.
F1 = 2 × (precision × recall) / (precision + recall)    (1)
The metric values are based on overall accuracy (OA), user's accuracy (UA) and producer's accuracy (PA). The OA indicates the percentage or proportion of the overall map which is correctly classified based on the validation ground truth dataset; UA denotes the proportion of a map class (pixel) that is correctly classified with reference to that particular class (pixel) on the validation ground truth; and PA is the proportion of a particular class on the ground that is mapped as that particular class, i.e., how well the assigned pixel is classified [68,70]. These are more accurate reliability assessment indices than the Kappa coefficient, which is an overall measure of accuracy adjusted for agreement expected under random allocation. Although the Kappa statistic is popular, it is not appropriate for accuracy comparison between different models [67,71] because of its inability to distinguish between elements in the confusion matrix [69]. The four accuracy assessment metrics for the two models were computed from the polluted and nonpolluted classification classes, and the proportional areas of the four classes were indicated. Finally, the results were evaluated using McNemar's chi-square (χ²) test to compare them statistically at a confidence level of 95% [72], in order to test marginal homogeneity between the two classifications, as adopted by [40,59].
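As an illustration of how OA, UA, PA and F1 follow from an error matrix, the sketch below assumes the common convention that rows hold the mapped (classified) class and columns hold the reference class, so that UA corresponds to precision and PA to recall. The function name `accuracy_metrics` and the 2 × 2 counts are made up for the example and are not the study's results.

```python
def accuracy_metrics(matrix):
    """Return OA and per-class UA, PA and F1.

    matrix[i][j] = number of validation samples mapped as class i
    whose reference (ground truth) class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(n)) / total

    users, producers, f1 = [], [], []
    for i in range(n):
        row_total = sum(matrix[i])                       # everything mapped as class i
        col_total = sum(matrix[r][i] for r in range(n))  # everything truly class i
        ua = matrix[i][i] / row_total if row_total else 0.0
        pa = matrix[i][i] / col_total if col_total else 0.0
        users.append(ua)
        producers.append(pa)
        f1.append(2 * ua * pa / (ua + pa) if ua + pa else 0.0)
    return overall, users, producers, f1


# Made-up counts for two classes (not taken from the paper's tables).
example = [[40, 10],
           [5, 45]]
oa, ua, pa, f1 = accuracy_metrics(example)
```

For these counts, OA is 85/100, while class 0 has UA = 40/50 and PA = 40/45, so a class can be mapped reliably (high UA) yet still be under-detected on the ground (lower PA), or the reverse.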

Vegetation Indices
Vegetation indices are defined based on the arithmetic combination of two or more spectral bands from electromagnetic reflectance information acquired through satellites [73]. Variations in the reflectance of light spectra indicate the status of the target plant under study. The effect of oil spill hydrocarbon pollution on wetland and vegetation can be identified through changes in the rate of photosynthesis; changes in the relative and absolute concentration of chlorophyll a and b; and changes in leaf size, thickness and structure [41,74]. Previous studies have utilized several indices to assess vegetation health status, such as eight vegetation indices in [39], three indices in [40], one index in [75], and two indices in [32]. In this study, we utilized sixteen different indices (Table 4) derived from existing literature to examine the effect and recovery pattern of oil spill polluted vegetation and wetland. The impact was ascertained by comparing the vegetation index values from the polluted sites before and after oil spills. Results from 2013 imagery were used for the pre-oil-spill analysis and 2015 for the post-oil-spill assessment.
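To make the band arithmetic concrete, the sketch below computes four widely used indices from surface reflectance values. The formulas are the standard published forms of NDVI, GNDVI, SAVI and EVI and are assumed, not confirmed, to match the definitions in Table 4; the defaults L = 0.5, C1 = 6 and C2 = 7.5 follow the coefficient values quoted in this paper, the EVI gain of 2.5 is the conventional value, and the reflectance numbers are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """Green Normalized Difference Vegetation Index."""
    return (nir - green) / (nir + green)


def savi(nir, red, L=0.5):
    """Soil Adjusted Vegetation Index; L is the canopy background adjustment factor."""
    return (nir - red) * (1 + L) / (nir + red + L)


def evi(nir, red, blue, c1=6.0, c2=7.5, gain=2.5):
    """Enhanced Vegetation Index; c1 and c2 are the atmospheric resistance coefficients."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + 1)


# Hypothetical surface reflectance for one healthy-vegetation pixel.
pixel = {"blue": 0.05, "green": 0.10, "red": 0.08, "nir": 0.45}

indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "GNDVI": gndvi(pixel["nir"], pixel["green"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "EVI": evi(pixel["nir"], pixel["red"], pixel["blue"]),
}
```

Stressed vegetation typically reflects less in the NIR and more in the red, which lowers all four values; comparing the same index for the same site across dates is what the pre- and post-spill assessment relies on.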

Model Hyper-Parameter Optimization
The models were trained and validated using EnMAP-Box software [91]. Hyper-parameter optimization entails searching for a set of optimal parameter values that improves the models' performance, forming an integral part of the general model training. Similar to the approach adopted by [59], the two models were optimized using k-fold (where k = 10) cross-validation with randomized sampling over a fixed number of iterations on the training dataset (Table 5). For SVM, the Gaussian radial basis function (RBF) kernel, which is a multidimensional distribution describing the distance between the input vector and the predefined center vector, has a value of 10; the regularization parameter (C) has a value of 10, while sigma, which represents the weight of the RBF kernel, has a value of 0.001 on a variable/class number of four. The RF model's variable/class number was also four, with 500 trees and the Gini coefficient as the impurity function.
L represents the canopy background adjustment factor, which is usually 0.5. C1 and C2 represent coefficients of atmospheric resistance, which are always 6 and 7.5, respectively. RED, GREEN, BLUE, etc. are Landsat 8 bands as explained in Table 2.
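The random search with 10-fold cross-validation described above can be sketched generically. This is not the EnMAP-Box implementation: the function names, the log-uniform sampling ranges and the `toy_score` objective are all assumptions made for illustration, with `toy_score` standing in for "train on the training folds, return accuracy on the held-out fold".

```python
import math
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_val_score(fit_and_score, params, n_samples, k=10, seed=0):
    """Average the held-out score over k folds for one parameter setting."""
    scores = []
    for held_out in kfold_indices(n_samples, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(n_samples) if i not in held]
        scores.append(fit_and_score(params, train_idx, held_out))
    return sum(scores) / len(scores)


def random_search(fit_and_score, sample_params, n_samples, n_iter=20, k=10, seed=0):
    """Draw n_iter random parameter settings and keep the best cross-validated one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = sample_params(rng)
        score = cross_val_score(fit_and_score, params, n_samples, k, seed)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def sample_svm_params(rng):
    # Log-uniform draws; the ranges are illustrative, not those used in the study.
    return {"C": 10 ** rng.uniform(-2, 3), "sigma": 10 ** rng.uniform(-4, 0)}


def toy_score(params, train_idx, validation_idx):
    # Stand-in objective that peaks at C = 10 and sigma = 0.001.
    # A real version would fit a classifier on train_idx and score it on validation_idx.
    return -abs(math.log10(params["C"]) - 1) - abs(math.log10(params["sigma"]) + 3)


best_params, best_score = random_search(toy_score, sample_svm_params, n_samples=72)
```

Sampling C and sigma on a logarithmic scale is a common choice because both parameters act multiplicatively; a uniform draw over the same range would spend most iterations on large values.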

Land Use Land Cover (LULC) of the study area
The LULC analysis gives information on the spatial distribution of different land uses in a particular area at a point in time [92]. This is important to identify the land use types in the study area, especially the vegetation and wetland. In this study, the land use distribution of the study area for 2014 was analyzed using the Random Forest machine learning model and Landsat 8 OLI satellite multispectral imagery of December 2014 because of its low cloud cover. The area is surrounded by water bodies that include the South China Sea, the Strait of Johor and some parts of Malacca. Based on the Malaysia Land Area Boundary Administration Map shapefile from (diva-gis.org), the subject site is predominantly made up of four major land uses: vegetation, bare land, built-up area and waterbody. The vegetation is divided into terrestrial vegetation and wetland (swampy area). From Figure 3a,b, a higher percentage of the area (80.74%; 275,681.25 hectares) is made up of vegetation. Next to that is bare land, with 10.92% (7115.22 hectares), wetland 3.72% (12,694.50 hectares) and built-up area 2.54% (8656.29 hectares). The overall accuracy from the model classification and validation was 99.83%, with a standard error of 0.03%, confirming the model's high accuracy.

Accuracy Assessment
The F1, OA, PA and UA error matrix values for the four-category (polluted vegetation, polluted wetland, nonpolluted vegetation, nonpolluted wetland) classification from SVM and RF were used for the accuracy assessment, as shown in Tables 6 and 7. While the former represents the result for the study area alone, the latter shows the performance of similar training and validation data sets in classifying larger areas by including Pontian, Johor Baharu and part of Keluang. All the training and validation datasets for this study were the same for both models. The assessment results for the study area reveal that RF outperforms SVM, with F1 (95.32%), OA (96.80%), UA (98.82%) and PA (95.11%) as against the SVM's accuracy values of F1 (80.83%), OA (92.87%), PA (95.18%) and UA (93.81%), respectively.
It can also be seen from the PA that the classification of nonpolluted vegetation has a high accuracy for both SVM and RF. For the UA, nonpolluted vegetation has a higher accuracy in SVM, while the polluted wetland has the highest for RF. Assessing the models' performance on a larger area, the RF outperformed the SVM with F1 (85.56%), OA (86.31%), PA (88.29%), and UA (95.45%) compared to the SVM's 80.31%, 83.61%, 89.17% and 90.67%, respectively. This reveals a similar performance pattern irrespective of the size of the study area. However, the models' accuracies in the larger area were lower than those of the smaller study area. This is likely due to the smaller number of data sets used in this regard [39]. Analysis of McNemar's chi-squared (χ²) test (Table 8) indicates significant statistical differences across the four classification groups in the two models for the study area. A p-value below 0.05 implies a significant statistical difference in the area classification of each of the categories.
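The McNemar comparison can be illustrated with a small sketch. It uses the uncorrected statistic χ² = (b − c)² / (b + c) on the two discordant counts; whether the study applied a continuity correction is not stated, so this form is an assumption, and the counts below are invented for the example rather than taken from Table 8.

```python
import math


def mcnemar(b, c):
    """McNemar's chi-squared test on discordant pairs.

    b: validation samples one model classified correctly and the other wrongly
    c: the reverse case
    Returns (chi2, p_value) for one degree of freedom, without continuity correction.
    """
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented discordant counts (RF right / SVM wrong, and SVM right / RF wrong).
chi2, p_value = mcnemar(b=15, c=4)
differs_at_95 = p_value < 0.05
```

Only the samples on which the two classifiers disagree enter the statistic, which is why the test suits comparing two models evaluated on the same validation points.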

Classification and Mapping of Polluted Coastal Areas (Vegetation and Wetland)
Figure 4a-i shows the models' classification outcomes. The SVM classification for the four classes (polluted vegetation, polluted wetland, nonpolluted vegetation, nonpolluted wetland) is presented in Figure 4a,b, while Figure 4c,d shows the classification maps from the RF model for the main study area. Figure 4i indicates variations in the models' area classification. For instance, slight differences in the output area for similar classes in the two models exist across the four classes. The areas of polluted vegetation and polluted wetland in SVM are higher than those of RF, while nonpolluted vegetation and nonpolluted wetland areas are higher for RF than SVM. Based on the RF model, the polluted vegetation and the nonpolluted vegetation areas are 2949.79 hectares and 272,731.46 hectares, respectively, while the SVM classified the polluted vegetation and nonpolluted vegetation at 3004.93 hectares and 272,676.32 hectares, respectively, revealing a difference of 55.14 hectares each for the polluted and nonpolluted vegetation areas. It can be inferred that the lower accuracy of the SVM model affects its ability to adequately classify the entire area [39,59]. Similarly, in the classification of wetlands (polluted and nonpolluted), the polluted wetland areas are 1205.98 hectares and 1209.79 hectares from the RF and SVM models, respectively. For nonpolluted wetland, the RF classification is 11,488.52 hectares and the SVM classification is 11,484.71 hectares, revealing a difference of 3.81 hectares in both instances. Field observations at some purposively selected sites show that the RF model has a higher true positive accuracy for the four classification categories than SVM, which is reflected in the higher classification accuracy of the RF polluted and nonpolluted area extents in comparison to the SVM's.
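The reported areas can be cross-checked with simple arithmetic: within each model, polluted plus nonpolluted area should equal the land cover total, and the difference between models should be the same for the polluted and nonpolluted class. The sketch below uses only the hectare figures quoted in the text; the dictionary layout and variable names are illustrative.

```python
# Areas in hectares as reported for the two models.
rf = {
    "polluted_vegetation": 2949.79,
    "nonpolluted_vegetation": 272731.46,
    "polluted_wetland": 1205.98,
    "nonpolluted_wetland": 11488.52,
}
svm = {
    "polluted_vegetation": 3004.93,
    "nonpolluted_vegetation": 272676.32,
    "polluted_wetland": 1209.79,
    "nonpolluted_wetland": 11484.71,
}

# Totals per land cover type: both models should partition the same area.
vegetation_total = {
    name: round(m["polluted_vegetation"] + m["nonpolluted_vegetation"], 2)
    for name, m in (("RF", rf), ("SVM", svm))
}
wetland_total = {
    name: round(m["polluted_wetland"] + m["nonpolluted_wetland"], 2)
    for name, m in (("RF", rf), ("SVM", svm))
}

# Differences between the models for each class (SVM minus RF).
difference = {key: round(svm[key] - rf[key], 2) for key in rf}
```

Both models sum to 275,681.25 hectares of vegetation and 12,694.50 hectares of wetland, matching the LULC totals, and the differences come out at 55.14 hectares for vegetation and 3.81 hectares for wetland with opposite signs for the polluted and nonpolluted classes.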

Oil Spill Pollution Impact Assessment on Vegetation and Wetland
The impact of the oil spill on affected vegetation and wetland was examined using the sixteen vegetation indices presented in Table 4. These indices can provide information on the wetland and vegetation health stress due to the oil spill [32,33,39]. The vegetation indices of 2013 were used for pre-oil-spill assessment, while 2015 indices were used for post-oil-spill assessment of the polluted vegetation and wetlands. As depicted in Figure 5a, the comparison of the pre- and post-oil-spill status of the vegetation area showed a general decrease in vegetation health with respect to 15 of the indices. However, the NDWI after oil pollution shows an increase in value, which is likely due to the high absorption and presence of surface water that changed over time [63,93]. Further analysis with the use of a paired t-test (Table 9) indicated that nine of the fifteen indices that reflect deterioration in post-oil-spill vegetation health (RVI, CVI, GCI, GNDVI, NDVI, MSI, MDWI, SARVI2 and SAVI) were statistically significant with p-value < 0.05. Similarly, from Figure 5b, which represents differences between the pre- and post-oil-spill impact on wetland, a general reduction in the values of the wetland assessment indices was observed. However, MSI and RVI increased in 2015, two years after contact with oil hydrocarbons. Figure 6 shows the weight of the indices in classifying oil spill impacts in the study area. For the SVM model, a high percentage of the indices (variables) showed significant contributions, with the first five variables, SARVI2, AFRI, NIR, NDVI and MSI, being the most sensitive to oil spills in the study area. In contrast, CVI, Blue, Green, SWIR-1 and GCI showed more sensitivity in the RF model.
Analysis of these outcomes indicates that an overwhelming majority of the assessment indices respond negatively to exposure to hydrocarbon, with a t-test statistical significance level (p-value) < 0.05. Further, it is observable that wetlands are more impacted by oil spills than vegetation due to their closeness to the waterbody. A higher percentage of the polluted sites are located in the south, southeast and southwest regions of the study area. However, some polluted sites were equally identified towards the eastern and northern parts of the area. The concentration of the polluted sites in these regions is due to the higher number of oil spills recorded along this area. Moreover, the limited detection of polluted sites in the eastern and northern regions is likely due to the undocumented terrestrial oil spill incidents that have occurred along that axis, which has limited the scope of this study. To date, a significant number of oil spill incidents are not well-documented [94].
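The paired t-test used for the pre- and post-spill comparison can be sketched as follows. The NDVI values are invented for nine hypothetical sites (the number of sites retained for the impact analysis), and because the Python standard library has no t-distribution, the decision is made against the tabulated two-tailed critical value for 8 degrees of freedom at the 0.05 level (2.306) rather than by computing an exact p-value.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic for post minus pre; returns (t, degrees_of_freedom)."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Invented NDVI values for nine sites (not the study's measurements).
ndvi_2013 = [0.71, 0.68, 0.74, 0.66, 0.70, 0.73, 0.69, 0.72, 0.67]
ndvi_2015 = [0.52, 0.55, 0.49, 0.50, 0.58, 0.51, 0.47, 0.56, 0.53]

t_stat, dof = paired_t(ndvi_2013, ndvi_2015)

T_CRITICAL_DF8 = 2.306  # two-tailed, alpha = 0.05, 8 degrees of freedom
significant = abs(t_stat) > T_CRITICAL_DF8
```

Pairing by site removes between-site differences in baseline greenness, so the test responds to the change at each site rather than to how healthy a site was to begin with. A negative statistic here indicates a drop in the index after the spill.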

Based on the foregoing, we conclude that vegetation indices are suitable proxies for estimating the effects of hydrocarbon spill on vegetation as well as wetland. This is similar to the findings of [32,33,35,39,40], wherein vegetation indices were used to examine the effect of the oil spill on vegetation. Although the focus of those studies was terrestrial vegetation, this present study has shown that the approach can also be extended to wetland assessment. Moreover, aside from the common indices (CVI, GCI, GNDVI, NDVI, MDWI, SARVI2 and SAVI), which show a significant deterioration in both polluted vegetation and wetland, EVI, EVI2, MNDVI, NDMI, NDWI and RDVI can also be used for detecting the effect of the oil spill on wetland. From evaluating the p-value (Table 9), it is evident that CVI, MDWI, NDVI and GCI are more significant in the assessment of both vegetation and wetland oil spill impacts, in addition to MNDVI for wetland assessment.

Polluted Vegetation and Wetland Recovery Assessment
The effects of oil spills cover a broad range of vegetation damage, including reduced plant chlorophyll, loss of leaf water and reduced soil moisture, among others [95]. Over the years, this has been the scenario at the affected oil spill sites in the study area. The recovery assessment of the affected vegetation and wetland areas was based on a comparison with nonaffected areas from 2015 to 2018. Figure 7a shows the vegetation recovery pattern, which is based on all sixteen indices, with emphasis on nine of the vegetation indices (RVI, CVI, GSI, GNDVI, NDVI, MSI, MDWI, SARVI2 and SAVI). The choice of these indices is premised on their ability to depict the effect of the oil spill on vegetation, as discussed in Section 3.3. Figure 7a highlights the significant improvement in vegetation health, exemplified by enhanced greenness (higher chlorophyll content), improved leaf water retention and increased soil moisture. This is represented by the increase in the values of most of the indices after the oil spill, which is attributable to the various treatments that the vegetation was subjected to during this period. Similarly, the recovery of the wetland (Figure 7b) was assessed based on the fourteen vegetation indices discussed in Section 3.3. The observed recovery across all the vegetation indices aligns with the findings of [15,19], which depicted vegetation recovery some time after exposure to the oil spill.
For further insights, the status of the nonpolluted vegetation and nonpolluted wetland was also evaluated over a similar period (Figure 7c

Conclusions
This study has evaluated the potential of multispectral Landsat 8-OLI remote sensing satellite imagery and machine learning models in the quantification of pollution extent through the classification of oil-spill-polluted vegetation and wetland. Advancing previous studies that have focused on monitoring terrestrial vegetation, we evaluated oil spill impacts on wetlands in addition to vegetation. Further, we undertook a systematic assessment of the recovery of the affected zones, which has been sparsely addressed in earlier studies. Support Vector Machine (SVM) and Random Forest (RF) machine learning models were used to discriminate between polluted and nonpolluted vegetation and wetland. The accuracies of the two models were validated using four parameters (F1, OA, UA and PA), with the RF outperforming the SVM across the board. McNemar's chi-squared (χ²) analysis indicated a statistically significant difference (p-value < 0.05) between the two models in the proportion of land area assigned to the four classes (polluted wetland and vegetation, nonpolluted wetland and vegetation, as represented in Figure 4i).
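For readers who wish to reproduce this kind of validation, the following sketch shows one way the four accuracy measures and McNemar's statistic can be computed. It assumes a confusion matrix with reference classes in rows and predicted classes in columns, and it uses the uncorrected form of McNemar's test on the counts of validation samples on which the two classifiers disagree. The function names and all counts are hypothetical and are not the values obtained in this study.

```python
import math


def accuracy_metrics(matrix):
    """Return overall accuracy and per-class PA, UA and F1.

    Rows are reference (ground truth) classes, columns are predicted classes.
    Assumes every class has at least one reference and one predicted sample.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(n_classes)) / total
    per_class = []
    for i in range(n_classes):
        reference_total = sum(matrix[i])                 # row sum
        predicted_total = sum(row[i] for row in matrix)  # column sum
        pa = matrix[i][i] / reference_total              # producer's accuracy
        ua = matrix[i][i] / predicted_total              # user's accuracy
        f1 = 2 * ua * pa / (ua + pa)                     # harmonic mean of UA and PA
        per_class.append({"PA": pa, "UA": ua, "F1": f1})
    return overall, per_class


def mcnemar_test(b, c):
    """Uncorrected McNemar chi-squared statistic and p-value (1 degree of freedom).

    b: samples classified correctly by model A only
    c: samples classified correctly by model B only
    Assumes b + c > 0.
    """
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical two-class confusion matrix (polluted, nonpolluted).
confusion = [[8, 2],
             [1, 9]]
overall, per_class = accuracy_metrics(confusion)
print(f"OA = {overall:.2f}, polluted class: {per_class[0]}")

# Hypothetical disagreement counts between two classifiers.
chi2, p_value = mcnemar_test(b=15, c=3)
print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.4f}")
```

A continuity-corrected variant of McNemar's statistic is often preferred when the disagreement counts are small; the uncorrected form is used here only to keep the sketch short.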
Sixteen vegetation health indices were used to assess the impacts of oil spills on vegetation and wetland over a two-year period (2013-2015), covering the pre-oil-spill (2013) and post-oil-spill (2015) conditions. Analysis of the results indicates significant vegetation and wetland stress. For the vegetation, 93% of the indices reflected a reduction in value, but only 56% were statistically significant at p-value < 0.05. For the wetland, 87.5% of the indices showed a reduction in value between pre- and post-oil-spill sites, and all of these were statistically significant at p-value < 0.05. CVI, MDWI, NDVI, GCI, GNDVI, SARVI2 and SAVI are appropriate for both vegetation and wetland impact assessment, with the first four being the most suitable because of their higher significance level in indicating plant stress in comparison to other indices. In addition to these seven common indices, EVI, EVI 2, MNDVI, NDMI, NDWI and RDVI can also be used to examine the impact of hydrocarbon oil spills on wetland, since the greenness of vegetation and the sensitivity to high-biomass regions are best represented by the NIR, SWIR and RED bands, which are the basis of these indices.
In addition, the comparison of the nonpolluted and polluted areas over a similar period confirmed the healthier status of the former, although signs of recovery were observed in the latter, which is likely due to treatment interventions by the government. However, more initiatives are required to improve the recovery process. In conclusion, it can be inferred that remote sensing technology and machine learning models are powerful and reliable tools for the impact and recovery assessment of oil-spill-affected vegetation and wetland.