Broadscale Landscape Mapping Provides Insight into the Commonwealth of Dominica and Surrounding Islands Offshore Environment

Abstract: A lack of data hinders effective marine management strategies for developing island states. This is a particularly acute problem for the Commonwealth of Dominica. Here we use publicly available remote sensing and model data to map its relatively unstudied waters. Two study areas were selected: a smaller area focussing on the nearshore marine environment, and a larger area to capture broader spatial patterns and context. Three broadscale landscape maps were created, using geophysical and oceanographic data to classify the marine environment based on its abiotic characteristics. Principal component analysis (PCA) was performed on each area, followed by K-means clustering. The larger area PCA revealed three eigenvalues > 1, and one eigenvalue of 0.980. Therefore, two maps were created for this area, to assess the significance of including the fourth principal component (PC). We demonstrate that including too many PCs could lead to an increase in the confusion index of final output maps. Overall, the marine landscape maps were used to assess the spatial characteristics of the benthic environment and to identify priority areas for future high-resolution study. Through defining and analysing existing conditions and highlighting important natural areas in the Dominican waters, these study results can be incorporated into the Marine Spatial Planning process.


Introduction
The growing human population has caused increased exploitation of the marine environment, through activities such as commercial fishing, coastal development, tourism, and seabed mining [1], leading to increased habitat damage, pollution, and littering. Marine spatial planning (MSP) is a process that organises the human use of marine space to balance ecological, economic, and social goals. Managing and promoting sustainable human-marine interactions appropriately requires detailed knowledge of the marine environment. In many cases, this is restricted by data availability and the ability to process and interpret environmental data. In situ data collection techniques (e.g., multibeam and sidescan sonar and groundtruthing data collection) form the prime method to expand spatial and temporal data coverage [2]. However, in situ data acquisition, processing, and interpretation are not always possible or a priority for many developing countries due to the expense and expertise required [2,3], even if those nations are reliant on their waters for food and income [3][4][5]. Due to prolonged and increasing exploitation, easily accessible terrestrial and marine resources are dwindling [6]; moreover, advancements in technology allow greater scope for resource mining [7]. Therefore, understanding what currently exists in national ing use of standard computing resources. More advanced unsupervised clustering methods have been described and, increasingly, Artificial Intelligence approaches are being tested, such as autoencoders [20] or the use of a fully convolutional network to predict superpixels for segmentation [21]. However, their implementation is often less straightforward, which may limit their use in SIDS.
The study was conducted at both a local scale and at a wider scale to give a broader context for the Dominican waters. PCA is a common method used to reduce the number of input variables to a new set of linearly independent principal components (PCs) that account for the majority of the variance in the original variables [17,22]. An important consideration in the use of this unsupervised classification method is how many principal components to retain for K-means clustering [23,24]. Underestimating results in the loss of important information, while overestimating diffuses the variables into more components than needed to describe the main structure of the data, and could lead to an interpretation focussing on trivial components [24,25].
The Kaiser-Guttman criterion, a popular method for factor retention, dictates that only those PCs with eigenvalues greater than one should be retained [26,27]. The rationale behind this criterion is that a retained component should account for more variance than any single standardised input variable, each of which contributes a variance of one. Several studies have shown that the Kaiser-Guttman criterion overestimates the number of factors to retain [25,28]; however, Pituch and Stevens [29] suggest this rule might be used for studies in which there are fewer than 30 variables. Therefore, the results of the PCA will be assessed for both eigenvalues and the proportion of variance explained. Once the factor retention is decided, a Varimax rotation is performed on the retained PCs, resulting in the original input variation being maximally aligned to single rotated PCs (RPCs), so correlated variables are more likely to have maximal loadings on the same RPC and variation among RPCs is equalised [30].
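The retention decision described above can be illustrated with a minimal Python sketch. It assumes a PCA on standardised variables, so that the eigenvalues sum to the number of variables. The function name `retention_summary` and the eigenvalues are hypothetical: only the borderline value of 0.98 and the two thresholds (1 and 0.97) echo this study, and the remaining numbers are illustrative rather than taken from the results tables.

```python
def retention_summary(eigenvalues, threshold=1.0):
    """Return (number of PCs retained, proportion of variance they explain)."""
    total = sum(eigenvalues)  # equals the number of variables for standardised inputs
    retained = [ev for ev in eigenvalues if ev > threshold]
    return len(retained), sum(retained) / total


# Hypothetical eigenvalues for nine standardised variables (they sum to 9).
eigenvalues = [3.2, 2.1, 1.3, 0.98, 0.6, 0.4, 0.25, 0.12, 0.05]

n_strict, var_strict = retention_summary(eigenvalues, threshold=1.0)
n_relaxed, var_relaxed = retention_summary(eigenvalues, threshold=0.97)

print(n_strict, round(var_strict, 3))
print(n_relaxed, round(var_relaxed, 3))
```

Under these assumptions, lowering the threshold admits one further component, and the gain in explained variance is that component's eigenvalue divided by the number of variables.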
K-means clustering is then performed on the resulting RPCs, a data partitioning process commonly used for marine data [17]. The results of this clustering will then be mapped for the local and broadscale studies, and the correlation of the original input abiotic variables against the K-means cluster values assessed to determine the physical characteristics of each cluster and therefore, the region.
Within this study, we assess the Kaiser-Guttman criterion of retaining eigenvalues greater than one [26] during factor retention decisions, as this decision determines the variance explained in the study. We also evaluate the suitability of this landscape mapping method for Caribbean Islands, due to the availability of open source environmental data.

Study Area
The study area consists of the Commonwealth of Dominica and surrounding waters, first using a large bounding box extending from 57.5°W to 63°W (~586 km) and 14°N to 16.9°N (~320 km). This covers an area of ~188,000 km², including several Caribbean islands, and stretches into the abyssal plain of the NW Atlantic (Figure 1). Second, a smaller bounding box is used, extending from 60.3°W to 62.3°W (~215 km) and 14°N to 16.9°N (~320 km), covering an area of ~68,800 km². This smaller area study was conducted to focus on the shallower marine environment surrounding the Caribbean islands, without extending further into the NW Atlantic or Caribbean Sea. Removing the influence of the deeper, further offshore region may reveal patterns in the nearshore clusters that give us greater insight into these shallower waters, but it risks ignoring wider environmental patterns.

Bathymetric Data
Two sources of bathymetric data were used within this study. The first data source, at 800 m resolution, is the General Bathymetric Chart of the Oceans (GEBCO) [31]. The boundaries of this dataset (57.5° to 63.0°W and 14.0° to 17.0°N) were selected to include the entire Dominican Exclusive Economic Zone (EEZ), which extends from 57.875° to 62.8139°W and 14.4886° to 16.5008°N [32]. The second bathymetry dataset is a digital elevation model (DEM) of Guadeloupe and Martinique produced as part of the HOMONIM (History, Observation and Modelling of sea levels) project [33], piloted by the French Naval Hydrographic and Oceanographic Service (SHOM) and the French national meteorological service (Météo-France).
This DEM encompasses several of the Lesser Antilles Islands, from Montserrat in the north to Saint Lucia in the south, and extends offshore to a depth of 5800 m. It comprises SHOM bathymetry combined with existing DEM compilations (GEBCO, EMODnet, etc.) and bathymetric data from international databases. The boundaries of this smaller dataset defined the extent of the local scale study, extending from 60.3° to 62.3°W and 14.0° to 17.0°N. This dataset had a resolution of 100 m but, due to noise in the data, a Gaussian filter (standard deviation 20, radius 40) was applied in SAGA GIS (version 7.2) to soften noise that might affect the clustering process (see Landscape Mapping).
Both datasets were interpolated to 250 m resolution and then combined using the TOPOGRID interpolation function in ArcGIS (version 10.6). This function uses the ANUDEM algorithm, a computationally efficient multigrid interpolation process, which aims to account for errors in large datasets without over-smoothing well-defined features [18,34,35]. The SHOM dataset was prioritised due to its initial higher resolution, and a 1000 m surrounding buffer with no data was applied to smooth the join at the boundary.
An initial map showed clusters in unnatural straight lines, highlighting survey artefacts in the data. The mean Focal Statistics function in the Spatial Analyst toolbox of ArcMap was therefore used (with a neighbourhood setting of three cells) to further reduce the noise associated with the bathymetric data. This process created a smoother DEM with output cell values that are a function of the original cell values in a specified neighbourhood around the focal cell.
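The focal mean smoothing described above can be sketched in a few lines of Python. This is a minimal illustration and not the ArcMap implementation: it assumes a square 3 × 3 neighbourhood, and it assumes that edge cells are averaged over whichever neighbours exist. The function name `focal_mean` and the depth values are hypothetical.

```python
def focal_mean(grid, radius=1):
    """Replace each cell with the mean of its square neighbourhood.

    Cells on the grid edge use only the neighbours that exist (an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


# Hypothetical depths (m) with a single noisy spike in the centre cell.
depths = [
    [-100.0, -100.0, -100.0],
    [-100.0, -190.0, -100.0],
    [-100.0, -100.0, -100.0],
]
smoothed = focal_mean(depths)
print(smoothed[1][1])
```

In this toy grid the spike is pulled towards the values of its neighbours, which is the behaviour relied on to suppress straight-line survey artefacts before clustering.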

From this combined grid, the following bathymetric derivatives were calculated (summarised in Table 1). Within ArcMap, slope and plan curvature were derived using the Spatial Analyst Toolbox (using a 3 × 3 cell neighbourhood). Topographic position index (TPI) was calculated using the Land Facet Corridor Tools extension in ArcMap (radius of four cells, 1000 m). Terrain ruggedness index (TRI) was calculated using the SAGA GIS Terrain Analysis Morphometry tools (radius of four cells, 1000 m). Multiple scales of the same variables were highly correlated and so, rather than including them all, finer scale plan curvature and slope were used to capture finer patterns (gullies and canyons), while broad scale TPI and TRI were used to capture broad patterns. Plan curvature was selected, as it is the least correlated with TPI.
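Two of the derivatives named above can be sketched for a single cell as follows. This is an illustration under stated assumptions, not a reproduction of the GIS tools: slope is estimated with the third-order finite difference (Horn) scheme that is commonly used for 3 × 3 windows, and TPI is taken as the cell elevation minus the mean of the other cells in a square window, whereas the tools used in the study may use a different neighbourhood shape. The function names and elevation values are hypothetical.

```python
import math


def horn_slope_degrees(window, cell_size):
    """Slope in degrees at the centre of a 3 x 3 window (rows ordered north to south)."""
    (a, b, c), (d, _centre, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def tpi(grid, row, col, radius):
    """Cell elevation minus the mean of the other cells in a square window."""
    rows, cols = len(grid), len(grid[0])
    neighbours = [
        grid[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
        if (r, c) != (row, col)
    ]
    return grid[row][col] - sum(neighbours) / len(neighbours)


# Hypothetical 250 m grid rising 250 m per cell towards the east.
ramp = [[0.0, 250.0, 500.0]] * 3
slope = horn_slope_degrees(ramp, cell_size=250.0)

# Hypothetical peak standing 100 m above a flat seabed at -500 m.
peak = [
    [-500.0, -500.0, -500.0],
    [-500.0, -400.0, -500.0],
    [-500.0, -500.0, -500.0],
]
peak_tpi = tpi(peak, 1, 1, radius=1)
print(round(slope, 1), peak_tpi)
```

Positive TPI values mark cells that stand above their surroundings (peaks and ridges) and negative values mark cells that sit below them (gullies or canyons), which matches how TPI is interpreted in the cluster descriptions.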

Satellite and Oceanographic Model Derived Variables
Seabed salinity, temperature and current velocity data were derived from the Copernicus Global Sea Physical Analysis and Forecasting numerical model [36] at a spatial resolution of 0.083 degrees × 0.083 degrees (~8900 m by longitude and ~9100 m by latitude at the latitude of study). In contrast to Hogg et al. [18], who split oceanographic variables into summer and winter averages, we used annual mean values, given the limited annual variation in water column characteristics at this latitude. Satellite derived net primary productivity (NPP) data were sourced from Oregon State University [37] (http://sites.science.oregonstate.edu/ocean.productivity, accessed 8 March 2019), as monthly means at 1/12 degree spatial resolution. NPP is defined as a function of the Vertically Generalised Production Model, MODIS surface chlorophyll concentrations, MODIS 4-micron sea surface temperature and MODIS cloud-corrected incident daily photosynthetically active radiation (PAR). Table 2 provides a summary of satellite and oceanographic model derived variables included in the landscape mapping analysis. These data were imported into RStudio (version 2021.9.1.372), where they were filtered to the region of interest and an average of the monthly means from January 2016 to December 2018 was calculated. These data were then imported into ArcMap, transformed to WGS 84 UTM 20N and interpolated to a spatial resolution of 250 m using the Spline with Barriers function from the Spatial Analyst Toolbox. The island's coastline was used as the barrier to prevent interpolation across the terrestrial landscape.

Landscape Mapping
This study follows the method adapted by Hogg et al. [18] to create a landscape map of sub-Antarctic South Georgia, following Verfaillie et al. [17] and Ismail et al. [19]. The following analysis was conducted in RStudio [38] (package 'psych' [39]), with the data normalised to have zero mean and unit variance. The stages were as follows:
1. Principal component analysis (PCA). PCA was conducted on the nine input variables to reduce the data to linearly independent Principal Components (PCs) and remove collinearity. These PCs account for the greatest variance in the data without needing to predetermine which variables should be used for the analysis. PCs with eigenvalues less than 1 are traditionally discarded following the Kaiser-Guttman criterion; however, the initial eigenvalues calculated from the larger dataset had one borderline value and so the study was performed using both values >1 and >0.97. A Varimax rotation of the retained PCs was performed, to clarify the loadings matrix structure, and the subsequent analysis was performed on the resulting rotated PCs (RPCs).
2. Determine optimum number of clusters. A predefined number of clusters must be input into the K-means clustering algorithm. Here we used both the Calinski-Harabasz index (C-H) [40] and the elbow method [22] to determine the number of clusters. The C-H index is the ratio of the sum of inter-cluster variance to the intra-cluster variance; the highest value indicates the optimum number of clusters. The elbow method assesses the variance within clusters against a range of cluster values (here 1-15). The point at which increasing the number of clusters does not significantly lower the intra-cluster variance is the optimum number of clusters.
3. K-means clustering. K-means clustering is a common algorithm used to partition marine environmental data [17,41]. The number of clusters was specified using the value determined in step 2. The K-means clustering algorithm uses an iterative method, whereby cluster centres are randomly allocated, and each data point is temporarily assigned to the cluster that minimises the distance between the focal point and the centre of the cluster in the multidimensional PC space. The centre points are then repeatedly shifted, the distances recalculated, and the data points re-allocated to the closest cluster centres, until the positions of the centroids are optimal or until the specified number of iterations has been reached.
4. Landscape map. The final cluster value for each data point was plotted against the point location to create a landscape map of the study regions. Boxplots summarising the distribution of the original input abiotic variables against the K-means cluster values were created to assess the influence of the abiotic inputs on the cluster solution and determine the physical characteristics of each cluster.
5. Confusion Index Map. To assess how well each data point fitted within its assigned cluster, a confusion index map was created using the inverse distance squared in attribute space between the data observation and each K-means cluster centre to give a cluster membership value. A quantitative uncertainty measurement can be calculated using, for each data point, the ratio of the second highest membership value to the highest, known as a confusion index (CI) [19]. If the data point is well characterised by the assigned cluster, the CI value will approach zero. Conversely, if the data point is not dominated by the assigned cluster, and the membership values are spread across several clusters, the value will be closer to one. These values were plotted against the data observation location to create a confusion map. For more detail, see Hogg et al. [18].
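The confusion index in the final stage can be sketched in Python as below. This is a minimal illustration of the definition given above: memberships are taken as inverse squared distances in attribute space, and the CI is the ratio of the second highest membership to the highest. Normalising the memberships to sum to one is an assumption made here for readability and does not change the ratio. The function name `confusion_index` and the cluster centres are hypothetical.

```python
def confusion_index(point, centres):
    """Ratio of the second-highest to the highest inverse-distance-squared membership."""
    weights = []
    for centre in centres:
        d2 = sum((p - c) ** 2 for p, c in zip(point, centre))
        if d2 == 0.0:
            return 0.0  # the point sits exactly on a centre: no confusion
        weights.append(1.0 / d2)
    total = sum(weights)
    memberships = sorted((w / total for w in weights), reverse=True)
    return memberships[1] / memberships[0]


# Hypothetical cluster centres in a two-dimensional rotated-PC space.
centres = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

ci_clear = confusion_index((1.0, 0.0), centres)      # close to the first centre
ci_boundary = confusion_index((2.0, 0.0), centres)   # midway between two centres
print(round(ci_clear, 3), round(ci_boundary, 3))
```

A point midway between two centres has two equal memberships and a CI of one, which is consistent with the highest confusion being expected along cluster boundaries.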

Larger Study Area
Principal component analysis (PCA) performed on the larger study area variables resulted in three eigenvalues greater than one, and one eigenvalue of 0.980 (Table 3). To assess the significance of including this fourth principal component (PC), two landscape maps were created; henceforth these two maps shall be referred to as E3 (three eigenvalues used) and E4 (four eigenvalues used). Taking only the initial three PCs with eigenvalues of one or greater, 73.265% of the total variance was explained, while the addition of the fourth PC with eigenvalue 0.980 explains a further 10.894% of the variance. Varimax rotated principal component analysis (RPCA) performed on the E3 data (Table 4) resulted in RPC 1 having high loads for depth, salinity, and temperature and lower loads for NPP and current; RPC 2 had high loads for slope and TRI and a lower load for current; RPC 3 had high loads for plan curvature and TPI. Joint high loadings on one RPC illustrate correlation between those variables. In the E4 RPCA (Table 5), all abiotic variables had a connection with only one RPC, except for depth and current. RPC 1 had high loads for depth, salinity, and temperature as well as lower loads for NPP and current; RPC 2 had high loads for slope and TRI and a lower load for current; RPC 3 had high loads for plan curvature and TPI; RPC 4 had a high load for NPP and a lower load for depth.

Smaller Study Area
PCA performed on the smaller study area resulted in three PCs with an eigenvalue of one or greater (Table 6), which explained 73.490% of the total variance, and RPCA was performed on these PCs. The rotated components matrix (Table 7) shows the factor loads, which explain the correlation between the RPCs and the original abiotic variables, excluding any factor loads in the range −0.3 < r < 0.3. All variables had a high factor load (r > 0.7) with only one RPC, except current and NPP. RPC 1 had a high load for the variables depth, salinity, and temperature; RPC 2 for slope and TRI; RPC 3 for plan curvature and TPI. NPP had a lower factor load with RPC 2 and current had a lower factor load with RPCs 1 and 2.
For the larger study area, the 'elbow method' was applied to the within group sum of squares clustering solution. The gradient change in the graph, which would identify the optimal clustering solution, was unclear for both the E3 and E4 studies (Figure 2a,c, respectively). The Calinski-Harabasz method showed clear first local maxima for both the E3 (six clusters; Figure 2b) and E4 (five clusters; Figure 2d) studies, and so these values were used for the following K-means analyses.
Remote Sens. 2022, 14

Smaller Study Area
For the smaller study area, the elbow in the within group sum of squares clustering solution was evident at four clusters (Figure 2e). The Calinski-Harabasz method agreed, showing a clear first local maximum at four clusters (Figure 2f), and so this value was used for the following K-means clustering.
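The two diagnostics used above, the within group sum of squares and the Calinski-Harabasz index, can be sketched together with a basic K-means loop. This is an illustrative Python version and not the R workflow used in the study. To keep the output reproducible it swaps the random allocation of initial centres for a deterministic farthest-point rule, and on this toy dataset it simply takes the largest C-H value rather than the first local maximum. All function names and the twelve data points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initial centres."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = mean_point(members)
    return labels, centres


def within_ss(points, labels, centres):
    """Within group sum of squares, the quantity plotted in the elbow method."""
    return sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))


def calinski_harabasz(points, labels, centres):
    """Between-cluster variance over within-cluster variance, each per degree of freedom."""
    n, k = len(points), len(centres)
    overall = mean_point(points)
    between = sum(labels.count(j) * sq_dist(centres[j], overall) for j in range(k))
    return (between / (k - 1)) / (within_ss(points, labels, centres) / (n - k))


# Hypothetical data: three compact groups in a two-dimensional space.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (0.0, 10.0), (0.0, 11.0), (1.0, 10.0), (1.0, 11.0),
]
wss, ch = {}, {}
for k in range(2, 6):
    labels, centres = kmeans(points, k)
    wss[k] = within_ss(points, labels, centres)
    ch[k] = calinski_harabasz(points, labels, centres)
best_k = max(ch, key=ch.get)
print(best_k, round(wss[best_k], 3))
```

On well-separated data such as this, the within group sum of squares drops sharply up to the true number of groups and only slowly afterwards, while the C-H index peaks at that number, so the two diagnostics agree in the way reported for the smaller study area.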

Broadscale Landscape Map
The results of the K-means analysis clustering solution were interpreted alongside the boxplots of the distribution of the original input variables for each cluster.

Larger Study Area
The E3 area resulted in six clusters (Figure 3), each of which had distinct physical conditions (Figure 4). Clusters 1 and 6 occur in deeper waters and were influenced by the eastward transition from high to low NPP. Cluster 2 occurs in areas of steep slopes and high TRI, as well as higher currents and a broad range of plan curvature and TPI, indicating an area of depressions and peaks. This cluster occurs on the west side of each island, on the shoulder to the northeast of Guadeloupe and along a ridge to the northeast of the study area. Clusters 3 and 4 were characterised by plan curvature and TPI, occurring on the sides of slopes and around the islands where the seafloor drops into deep water. Cluster 3 occurs in high TPI and curvature indicating peaks and ridges and cluster 4 in low TPI and curvature indicating gullies or canyons. Cluster 5 occurred predominantly to the east of each island in shallow depths, high temperatures and high salinity waters with a broad range of currents. Small features occur throughout the study area to the east of the islands.
The E4 study area demonstrates similar cluster characteristics to the E3 map; however, the four eigenvalues resulted in only five clusters (Figures 5 and 6). Clusters 1 and 5 demonstrate the declining NPP gradient from east to west. Cluster 2 is defined by shallow, high temperature and high salinity waters. However, clusters 3 and 4 represent the areas covered by the E3 clusters 2, 3 and 4 and are characterised by slope, TRI, plan curvature and TPI.

Smaller Study Area
The smaller study area was clustered into four regions (Figure 7) and was characterised in a similar manner to both the larger studies (Figure 8). Cluster 2 was characterised by shallow, high temperature, high salinity water to the east of the islands while clusters 3 and 4 represented the areas of high slope and ruggedness (TRI), and range in plan curvature and TPI. Cluster 1 represented the remaining deeper water but did not demonstrate the gradient in NPP shown in the larger area. In this local map, the shoulder to the east of Guadeloupe is divided into two clusters, similar to the E4 map, demonstrating that the difference in plan curvature and TPI is more significant than depth and slope.
Figure 6. Boxplots of the nine original abiotic variables for each of the K-means derived clusters for the larger study area using four eigenvalues. The x-axis denotes the five K-means clusters, the y-axis denotes the respective units of each abiotic variable. The colours represent the corresponding cluster in the landscape map (Figure 5). Format of boxplots as described in Figure 4.
Boxplot figure for the smaller study area (caption partially recovered): the y-axis denotes the respective units of each abiotic variable. The colours represent the corresponding cluster in the landscape map (Figure 7). Format of boxplots as described in Figure 4.

Larger Study Area
Figure 9 shows the confusion index maps for both E3 and E4, with the confusion index for each study scaled between 0 and 1. Around the islands, the confusion index map is dark, showing low confusion in the assigned cluster for each study and indicating the regions dictated by depth, salinity and temperature are distinct. The area to the southwest of the islands also demonstrates low confusion, although the E4 study is darker in this region than the E3 study, indicating the E3 study has higher confusion in the assigned cluster for that region. The same occurs in the northeast of the map extent.
The shoulder to the northeast of Guadeloupe is more distinctly dark in the E3 study, where the shoulder was classed as one cluster, whereas in the E4 study, where the shoulder was divided into two clusters, the region has higher confusion. The highest confusion occurs at the boundaries of the clusters, as the characteristics of the region transition between defining variables. High confusion is indicative that the habitats have characteristics of more than one cluster. The E3 patchiness of the smaller features to the east of the islands is more apparent than in the E4 map but the western extent of cluster 5 in the E4 study has higher confusion than the E3 assigned clusters, particularly to the northeast of the Commonwealth of Dominica as cluster 5 transitions to cluster 1 along the NPP gradient.

Smaller Study Area
The confusion index map for the smaller study area (Figure 10) has extensive areas of low confusion to the southwest of the map extent, similar to the larger study areas. Most of the higher confusion zones appear at the cluster boundaries. Areas of lighter grey occur to the northeast where cluster 1 reaches around the islands, in the region where the broadscale maps transition to another cluster due to the NPP gradient.

The Marine Landscape around Dominica
Each of the studies above maps environmental zones based on abiotic factors derived from publicly available data. This demonstrates the potential of open access data, and, while the method does not determine where ecologically important habitats are located, it highlights areas of potential interest without the input of biological data. The results can be used to identify areas of survey priority and in need of MSP, be a starting point for stakeholder discussion, and be incorporated into MSP development and MPA designation through defining and analysing existing conditions and highlighting potential areas of interest. If conservation efforts need to be focused on just one or a few areas, selecting the areas of highest habitat heterogeneity (the areas with the highest number of clusters, and therefore habitats, within close proximity) can allow each habitat to be included within the MSP. Additionally, calculating the area covered by each cluster can contribute to proportional conservation target achievement. The marine landscape maps can be used as input to quantify the structure (composition and configuration) of the offshore environment, using metrics such as those used in landscape ecology (e.g., Swanborn et al. [42]). At the same time, it may be possible to apply some local knowledge of the nearshore environment to offshore areas that have been allocated to the same cluster.
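The two quantities mentioned above, the area covered by each cluster and the local number of clusters as a proxy for habitat heterogeneity, can be derived directly from a classified raster. The following is a minimal sketch on a hypothetical 3 x 3 grid of cluster labels with 250 m cells; the function names, the toy grid, and the 3 x 3 window are assumptions for illustration, not part of the study workflow.

```python
import numpy as np

def cluster_area_km2(label_grid, cell_size_m):
    """Area (km^2) covered by each cluster label in a 2-D grid of integer labels."""
    labels, counts = np.unique(label_grid, return_counts=True)
    cell_km2 = (cell_size_m / 1000.0) ** 2
    return {int(l): float(c * cell_km2) for l, c in zip(labels, counts)}

def local_cluster_richness(label_grid, radius=1):
    """Number of distinct clusters in the (2*radius+1)^2 window around each cell."""
    rows, cols = label_grid.shape
    richness = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            # Windows are truncated at the grid edge rather than padded
            window = label_grid[max(r - radius, 0):r + radius + 1,
                                max(c - radius, 0):c + radius + 1]
            richness[r, c] = np.unique(window).size
    return richness

# Hypothetical cluster labels, for illustration only
grid = np.array([[1, 1, 2],
                 [1, 3, 2],
                 [4, 4, 2]])
areas = cluster_area_km2(grid, cell_size_m=250)
richness = local_cluster_richness(grid)
```

Cells with the highest richness values mark places where several clusters meet within a short distance, which is one simple way to shortlist candidate areas of high heterogeneity.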
Previous studies of the nearshore distribution of benthic habitats in Dominican waters revealed seagrass habitats dominate the west and north coast of the island, with coral reefs also occurring in these regions [13,43,44]. In the E3 study, this western nearshore region is classified predominately as cluster 2, with areas of clusters 3 and 4, the same as the shoulder to the northeast of Guadeloupe, the southern tail of which lies within Dominican waters, and is characterised by higher slope, TRI, and currents. In the E4 and smaller area studies, both areas are more evenly divided between clusters 3 and 4, characterised by slope, TRI, TPI and plan curvature, as these clusters in E4 account for clusters 2, 3 and 4 in E3. Hydrodynamics and terrain complexity strongly influence habitat suitability [45,46]. While this dataset lacked substrate type, which is also influential in determining habitat assemblages, the inclusion of bathymetric derivatives such as rugosity and slope can be an indicator of substrate type [47]. Areas of high slope and current are often indicative of hard substrate, due to low sedimentation or erosion. Higher current velocity and terrain complexity have also been linked with increased biodiversity [46]. A relationship between filter-feeding communities and increased (up to limiting values) slope, near-bottom currents and resultant turbidity has been demonstrated, as these organisms feed on particles suspended in the water [45,48]. Habitats identified from previous nearshore studies of areas that lie in the same cluster as the southern tail of the shoulder can be used to infer whether further investigation of this area is warranted.
Rocky and sandy areas comprise 81% of the shallow marine benthos of Dominica and dominate the south and east coast [13] where the topography is not as steep as the west. The shelf area, characterised by shallow, higher temperature and salinity waters is comparatively much smaller for Dominica than the surrounding islands. Each of the broadscale landscape maps identifies a bank to the southeast of Dominica that demonstrates the same characteristics, defined by shallow depth, higher temperature, and salinity, as the nearshore waters on the northeast of the islands. This is Macouba Bank, partially within Dominican waters but, as it is ~24 km from the coast, it is difficult for the smaller Dominican fishing boats to access and has allegedly been exploited by fishers coming from other nations with more established fishing gear [49,50]. If this is an area of ecological and environmental interest, as it has waters of similar characteristics to the more productive shallow Dominican waters, it might be a priority for future management and investigation. It is also of note that the more productive shallow waters on the Dominican coast lie to the northeast of this island and fishers from this area would have to travel over 40 km to reach the exposed waters of Macouba Bank. This distance could also present difficulties with management and policing of this bank.

Unsupervised Classification: Methodological Considerations
The purpose of PCA is to reduce the dimensionality of the input data and to ensure the input variables are independent. A significant consideration for this study was how many PCs to include within the analysis. A PC with an eigenvalue >1 explains more variance in the data than any individual input variable. Therefore, the initial study follows this rule. However, as computers have developed, constraining the dimensionality of the data is less important and the priority is reducing the correlation between variables. Therefore, the E4 study was also conducted to prevent loss of variability in the dataset. By including the fourth PC, a further 10.894% of the variance was included in the map. This additional data may be a deciding factor in the cluster assignment of a data point where the confusion index is high.
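The eigenvalue >1 rule can be made concrete with a short sketch. It assumes PCA is run on standardised variables (equivalently, on the correlation matrix), so that each input variable contributes a variance of 1 and the eigenvalues sum to the number of variables. The synthetic data and the function name `pca_kaiser` are hypothetical and are not the study inputs.

```python
import numpy as np

def pca_kaiser(data):
    """PCA on standardised variables; keep components with eigenvalue > 1."""
    # Standardise each column so that PCA operates on the correlation matrix
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # fraction of variance per PC
    keep = eigvals > 1.0
    scores = z @ eigvecs[:, keep]              # PC scores that would feed the clustering
    return eigvals, explained, scores

# Synthetic data: four variables forming two strongly correlated pairs
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))
data = np.column_stack([base[:, 0], base[:, 0] + 0.1 * rng.normal(size=500),
                        base[:, 1], base[:, 1] + 0.1 * rng.normal(size=500)])
eigvals, explained, scores = pca_kaiser(data)
```

Under this assumption the variance share of a PC is its eigenvalue divided by the number of variables. An eigenvalue of 0.980 contributing 10.894% would therefore be consistent with roughly nine standardised inputs (0.980 / 9 ≈ 10.89%); that count is inferred here from the two quoted figures and is not stated in this section.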
The difference between the E3 and E4 study is the amalgamation of E3 clusters 2, 3 and 4 into E4 clusters 3 and 4, removing the separation between the TRI/slope and curvature/TPI characterised clusters. This is most apparent in the shoulder to the northeast of Guadeloupe, and while the confusion index for the E3 study is low in this area, the boundaries of the shoulder clusters in the E4 study have higher confusion. This indicates that depth, salinity, temperature and NPP have a stronger influence on the segmentation of this dataset than the other variables.
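One simple way to inspect how clusters from one map are redistributed in another is to cross-tabulate the two label maps cell by cell. The sketch below uses short hypothetical label lists, not values from the E3 or E4 maps, and the function name `cross_tabulate` is an assumption for illustration.

```python
from collections import Counter

def cross_tabulate(labels_a, labels_b):
    """Count cells for every (label in map A, label in map B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both maps must cover the same cells")
    return Counter(zip(labels_a, labels_b))

# Hypothetical per-cell cluster labels for two classifications of the same cells
map_a = [2, 2, 3, 3, 4, 4, 1]
map_b = [3, 3, 3, 4, 4, 4, 1]
table = cross_tabulate(map_a, map_b)
```

Each key is a pair of labels and each value is the number of cells sharing that pair, so a cluster in the first map that is split or absorbed in the second map shows up as several keys with the same first label.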
By including the fourth eigenvalue in the E4 study and losing the sixth cluster, it could be said that too much variance is included and this has diluted the contribution of the other eigenvalues and increased the map confusion. By including the sixth cluster and separating the TRI/slope and curvature/TPI contributions, the E3 study shows more detail; it also has lower confusion in the region to the east of the islands. It is of note that in the smaller study area the shoulder to the northeast of Guadeloupe is divided into two clusters, similar to the E4 study. The influence of the NPP gradient is not seen in the smaller study, indicating that the TRI/slope and curvature/TPI variables are more significant in this smaller area. The smaller extent study was conducted to focus on the abiotic characteristics impacting the nearshore waters; however, as the clustering in this region seems to repeat in the larger extent studies, removing the deeper, offshore water influence has been shown to not greatly influence the clustering result, and no further patterns were revealed in the shallow waters. By conducting the larger extent studies, the NPP gradient influence has been highlighted, providing context information for the Dominican marine environment.
While this approach is unsupervised and objective in its clustering of the environmental variables, careful consideration of the input variables had to be conducted. Equal weighting is given to each variable; therefore, if there is an imbalance of topographic versus oceanographic variables, the results may be skewed towards the dominant data input. To improve upon this, a review of initial results was used to infer the importance of the input variables and consider their inclusion throughout the study. Furthermore, normalising the data to make the range of values in all variables comparable also means that even if there is very little change in a variable, these values will be stretched to make them as significant as the other data, which may have significant impacts on the final map. The assumption that organisms are equally influenced by each variable is inherent in this equal weighting approach. It is a valid default starting position if very little is known about the area or its fauna. Subsequent investigations should clarify if this assumption holds for the area under investigation.
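The stretching effect of normalisation can be seen in a small example. The sketch assumes min-max rescaling to the range 0-1, which is one common choice and may not be the exact normalisation applied in this study; the depth and salinity values are hypothetical.

```python
def min_max(values):
    """Rescale a list of numbers to the range 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical values for illustration only, not taken from the study data
depth_m = [10.0, 3000.0, 6200.0]      # varies over thousands of metres
salinity = [35.00, 35.05, 35.10]      # varies by only a tenth of a unit

depth_scaled = min_max(depth_m)
salinity_scaled = min_max(salinity)
```

After rescaling, both variables span the full 0-1 range, so a difference of 0.1 in salinity carries the same weight in a distance-based clustering as a difference of more than 6000 m in depth.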
Using bathymetric data sourced from multiple surveys, collected over a large area using different equipment and with varying resolution, and creating secondary derivatives from these merged surfaces, means any artefacts or errors included in the initial data will be propagated throughout the analysis and may bias results. As there was no access to raw data, which would have taken considerable time to process, it was necessary to use already processed bathymetry layers. The noise remaining from survey artefacts would have presented in the data as higher topographic complexity, e.g., higher rugosity or gradient. This was evident in the initial broadscale map and demonstrates how the results of the analysis are determined by the quality of the input data. The mean Focal Statistics function in the Spatial Analyst toolbox of ArcMap was used to reduce the noise in the bathymetric data, by creating a new value for each cell that is a function of the specified surrounding cells; however, by dampening the noise there is also loss of detail, and so a balance between detail and noise must be reached. Additionally, rather than grid at the coarsest resolution and lose the finer detail available, it is preferable to resample both the higher (100 m) and lower (800 m) resolution bathymetric surfaces to a middle ground (250 m). The low resolution of the open access bathymetric, satellite and oceanographic data means fine scale analysis is not possible. As data coverage and resolution increase in the future, and models advance, the resolution of the final products will also improve.
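The trade-off between noise reduction and loss of detail can be illustrated with a moving-window mean. This is a minimal NumPy sketch of a 3 x 3 focal mean, not the ArcMap Focal Statistics implementation; the function name `focal_mean`, the edge handling (windows truncated at the grid boundary), and the depth values are assumptions for illustration.

```python
import numpy as np

def focal_mean(grid, radius=1):
    """Mean of the (2*radius+1)^2 neighbourhood around each cell, ignoring NaN."""
    rows, cols = grid.shape
    out = np.full((rows, cols), np.nan)
    for r in range(rows):
        for c in range(cols):
            window = grid[max(r - radius, 0):r + radius + 1,
                          max(c - radius, 0):c + radius + 1]
            # Leave the output as NaN where the whole window is missing data
            if not np.all(np.isnan(window)):
                out[r, c] = np.nanmean(window)
    return out

# Hypothetical depths (m) with one spike standing in for a survey artefact
depth = np.array([[-100.0, -100.0, -100.0],
                  [-100.0,  -10.0, -100.0],
                  [-100.0, -100.0, -100.0]])
smoothed = focal_mean(depth)
```

The spike at the centre is pulled from -10 m to -90 m, but every neighbouring cell is also shifted away from its original -100 m, which is the loss of detail that has to be balanced against the noise reduction.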
As part of our study, it was decided to include surface NPP values as a predictor of benthic characteristics. An implicit assumption in this approach is that benthic habitats are equally connected to surface productivity at all locations. The range of water depth in our study was 0-6200 m. However, the impact of primary production upon the benthic community will be influenced by the vertical attenuation due to remineralisation of the organic carbon as it sinks through the water column [51]. Therefore, at greater depths, the connection between surface and benthic environments may be reduced. Additionally, by assuming a one-dimensional water column, any lateral particulate organic carbon (POC) flux is neglected. Better POC estimates in deep waters at the benthic boundary layer would allow more accurate characterisation and the ability to distinguish between the range of benthic environments included in this study.
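To give a sense of scale for the vertical attenuation, the sketch below evaluates a widely used empirical power law for POC flux, the Martin curve, F(z) = F(100) (z / 100)^(-b) with the open-ocean composite exponent b = 0.858. This is an assumption for illustration only: it is not necessarily the formulation used in [51], it was not applied in this study, and it ignores the lateral flux mentioned above.

```python
def martin_flux_fraction(depth_m, ref_depth_m=100.0, b=0.858):
    """Fraction of the POC flux at ref_depth_m that remains at depth_m."""
    if depth_m < ref_depth_m:
        raise ValueError("the power law is only applied at or below the reference depth")
    return (depth_m / ref_depth_m) ** (-b)

# Depths (m): the reference depth, a mid-slope depth, and the deepest water in the study
fractions = {z: martin_flux_fraction(z) for z in (100, 1000, 6200)}
```

Under these assumptions, roughly 14% of the flux leaving 100 m would reach 1000 m and about 3% would reach 6200 m, which illustrates why a given surface NPP value may be a weaker predictor of benthic conditions in the deepest parts of the study area.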

Conclusions
Based on publicly available datasets, we created broadscale landscape maps for the Commonwealth of Dominica and surrounding islands that allow us to assess the abiotic variables influencing the offshore benthic environment and to infer areas of interest for future study. Of particular note is Macouba Bank, an offshore area with similar characteristics to the productive nearshore environment of Dominica. It is proposed as a focus for future research and a priority for management. By defining and analysing existing conditions of the Dominican environment, the results of this study can be a starting point for MSP development.
The creation of the different extent maps highlighted the strong influence of NPP across the region and demonstrates that to capture large spatial patterns and context, sufficient spatial data coverage is needed.
The low resolution and high noise in the input data mean fine scale analysis was not possible in this study; however, in areas of little available data, this method highlights the capabilities of publicly available datasets in providing context for a region. The equal weighting of the variables is both advantageous, as there is no prior knowledge of the most influential variables and so this method is objective, and disadvantageous, as with the equal weighting comes the assumption that all variables are equally influential on organism assemblages. However, careful consideration of the input variables and review of the initial results meant only variables considered important to the study were included in the final product. Our multiple clustering exercises also illustrated that inclusion of too many PCs in the K-means clustering could lead to an increase in the confusion index of the final output maps, and hence may not always be an advantage.