Classiﬁcation of Australian Waterbodies across a Wide Range of Optical Water Types

: Baseline determination and operational continental scale monitoring of water quality are required for reporting on marine and inland water progress to Sustainable Development Goals (SDG). This study aims to improve our knowledge of the optical complexity of Australian waters. A workﬂow was developed to cluster the modelled spectral response of a range of in situ bio-optical observations collected in Australian coastal and continental waters into distinct optical water types (OWTs). Following clustering and merging, most of the modelled spectra and modelled speciﬁc inherent optical properties (SIOP) sets were clustered in 11 OWTs, ranging from clear blue coastal waters to very turbid inland lakes. The resulting OWTs were used to classify Sentinel-2 MSI surface reﬂectance observations extracted over relatively permanent water bodies in three drainage regions in Eastern Australia. The satellite data classiﬁcation demonstrated clear limnological and seasonal di ﬀ erences in water types within and between the drainage divisions congruent with general limnological, topographical, and climatological factors. Locations of unclassiﬁed observations can be used to inform where in situ bio-optical data acquisition may be targeted to capture a more comprehensive characterization of all Australian waters. This can contribute to global initiatives like the SDGs and increases the diversity of natural water in global databases.


Introduction
Operational continental scale monitoring and baseline determination of water quality are fundamental for reporting on marine and inland water Sustainable Development Goals (SDG). Surface water is an important global resource and plays an essential role in biochemical cycling, maintenance of biodiversity, human wellbeing, and prosperity [1,2]. In the Australian context, existing water quality information is often limited, and inland water, in particular, is not well represented in global datasets or even nationally. Incompatibilities in the content and scale of state and territory environment reports confound attempts to integrate water resource data into a consistent national reporting framework [3][4][5]. In addition, existing monitoring practices do not encompass the breadth of temporal and spatial scales required for standardized baseline establishment (SDG indicator 14.1.1) or reporting on continental scale indicators (SDG indicator 6.3.2) germane to human health, environmental sustainability, and economic prosperity.

Spectral Data Classification Dataset Reference
SeaWiFS data ED a Eigenvector Northwest Atlantic [42] Normalized SeaWiFS data ISODATA b Global [43] In situ R rs c hierarchical English Channel [37] In situ R rs FCM d Global [44] In situ R rs FCM d Chinese lakes [45] In situ R rs FCM d Global lakes [46] In situ R rs thresholding Yellow Sea [47] Normalized In situ R rs hierarchical Eastern English Channel North Sea French Guiana [33] Normalized In situ R rs k-means Global [1] Simulated r rs e WFCM f Estonia Finland [48] a ED: Euclidian Distance, b ISODATA: Iterative Self-Organizing Data Analysis Technique, c R rs : Remote sensing reflectance, d FCM: Fuzzy-c Means, e r rs : subsurface reflectance, f WFCM: Weighted Fuzzy-c Means.
Several studies have successfully implemented optical classification techniques to formulate OWTs at a global scale (e.g., [1,43,44,46]). However, apart from incorporating data from three Southern African reservoirs [35], most of these global studies used data collected predominantly in the Northern hemisphere. Most of the regional studies also report on areas in the Northern Hemisphere (Table 1). This study aims to improve our knowledge of the optical complexity of Australian coastal and continental waters. Although several regional datasets have been reported on, e.g., [49][50][51][52][53], this will be the first attempt at analysing the data at a continental scale in Australia. To this end, a database of observations, capturing a range of optical data based on seasonality and geography, was used to:

1.
Develop a method to define distinct OWTs.

2.
Create a set of synthetic generalized inherent optical properties (GIOPs), based on the key features of each unique OWT. 3.
Present a case study as an example of a potential application of implementing the GIOPs water quality monitoring at a drainage basin scale.

Datasets
The observed reflectance spectra from coastal and inland waters share common features of Case 2 waters. Therefore, [44] proposed that coastal and inland waters may benefit from a common classification scheme, using aggregate data that provide continuity from fresh waters to marine environments. Thus, a database of 396 in situ measurements of specific inherent optical properties (SIOPs) and water quality concentrations of optically active water column, acquired in waterbodies around the Australian coast (Table S1) and along a latitudinal gradient in Eastern Australia (Table S2), was collated for analysis in this study. Figure 1 Shows the spatial distribution of sampling locations. Each record in the database represents quality assured observations collected at a single location with a complete description of optically active water column constituents and IOPs. A complete dataset Remote Sens. 2020, 12, 3018 4 of 24 comprises of in situ absorption and backscattering measurements as well as laboratory analysis of total suspended sediments, pigment analysis, and CDOM absorption. The protocols for sampling and sample analyses are comprehensively described in [54]. Figure 2a,b and Table 2 summarize the ranges of the IOPs used in this study while Figure 2c shows the absorption budget, demonstrating the range of the optically active water column constituents within the dataset.
Remote Sens. 2020, 01, x FOR PEER REVIEW 4 of 26 Table 2 summarize the ranges of the IOPs used in this study while Figure 2c shows the absorption budget, demonstrating the range of the optically active water column constituents within the dataset.   Table 2 summarize the ranges of the IOPs used in this study while Figure 2c shows the absorption budget, demonstrating the range of the optically active water column constituents within the dataset.   Most studies related to generalized OWTs are based on optical classification of in situ hyperspectral surface reflectance data (R rs ) (Table 1). However, surface reflectance, as an apparent optical property (AOP), is affected by water column optical properties and, to a lesser degree, on other environmental factors, such as sun and observations angles, atmospheric conditions, and sea-surface state [55,56]. There are also a diversity of radiometers and measurement protocols that can be implemented to measure AOPs, some of which are reviewed by [57]. This can potentially introduce spectral variations into an in situ dataset collated from numerous field campaigns. To ensure that the surface reflectance data used for this analysis represents the effects of only the water column, a modelling approach was implemented. This approach is like that of [48,58] where the authors classified coastal and open ocean waters, respectively, using simulated remote sensing reflectance (r rs ). A summary of the data analysis approach used in this work is presented in Figure 3.
A four-component optical model (IOPs of the waterbody, water surface state, sky spectral radiance distribution, and the nature of the bottom boundary) was parameterized in Ecolight 5.0 (Sequoia Scientific, Inc., Bellevue, WA, USA) [59,60]. For all simulations, the sun zenith angle was set at 30 • from nadir, wind speed was fixed at 0 m s −1 , and the water column was assumed to be optically deep and homogeneous. Inelastic scattering was excluded. The IOP inputs, obtained from the bio-optical database, were thus the only factors contributing to spectral variability within the simulated spectra. The simulated surface reflectance spectra (R rs ) were produced at 5 nm spectral intervals between 350 nm and 800 nm.
Spectral clustering is usually based on one of two approaches: hierarchical or non-hierarchical. Hierarchical methods initially cluster the two most similar spectra together and then iterates, forming higher clusters until all the spectra are combined into a single cluster [37,61]. Non-hierarchical methods consider the overall distribution of spectral pairs and classifies them into a prescribed number of clusters. The most popular non-hierarchical clustering techniques is k-means and fuzzy-c means (e.g., [1,[44][45][46]).
The simulated R rs spectra in this study were subjected to hierarchical agglomerative clustering [61,62] where each spectrum was initially considered as a single-element cluster. At each Remote Sens. 2020, 12, 3018 6 of 24 step of the algorithm, a Euclidian distance (ED) metric was used to determine the two clusters that are the most similar [62,63]. These clusters were then combined into a new bigger cluster. This procedure was iterated until all points were members of just one single big cluster. A centroid linkage function was used to group pairs of objects into clusters based on their similarity [61]. This process was iterated until all the objects in the original dataset were linked together in a hierarchical tree.
The simulated Rrs spectra in this study were subjected to hierarchical agglomerative clustering [61,62] where each spectrum was initially considered as a single-element cluster. At each step of the algorithm, a Euclidian distance (ED) metric was used to determine the two clusters that are the most similar [62,63]. These clusters were then combined into a new bigger cluster. This procedure was iterated until all points were members of just one single big cluster. A centroid linkage function was used to group pairs of objects into clusters based on their similarity [61]. This process was iterated until all the objects in the original dataset were linked together in a hierarchical tree.
To ensure the shape of the spectrum is adequately taken into consideration, the spectra were scaled prior to clustering. To preserve the shape of the spectra across the entire spectral range by not artificially giving a greater weight to a defined part of the Rrs spectra, normalization was based on the mean (μ) and standard deviation (σ) of the entire database [1,33]:
To ensure the shape of the spectrum is adequately taken into consideration, the spectra were scaled prior to clustering. To preserve the shape of the spectra across the entire spectral range by not artificially giving a greater weight to a defined part of the R rs spectra, normalization was based on the mean (µ) and standard deviation (σ) of the entire database [1,33]: After spectral clustering, the SIOP data, used as input for each spectrum was analysed to determine whether the spectral clusters could be translated into unique SIOP groupings. A one-way ANOVA Remote Sens. 2020, 12, 3018 7 of 24 was applied to all the SIOP parameters in each of the spectral clusters to determine which parameters had the most significant effect on the spectral variability between classes. The parameters contributing most to spectral variability within the simulated spectral dataset were used to determine the SIOP separability between the different spectral clusters.
The SIOPs of clusters were compared with each other using a Tukey HSD test with a 95% family-wise confidence level [64]. Where the SIOPs closely resembled each other, clusters were merged into larger groupings. The resulting clusters were used to classify satellite-derived surface reflectance observations extracted over selected waterbodies in Eastern Australia.

Satellite Data
To capture enough variability in continental water quality to demonstrate the applicability of implementing the derived OWTs for temporal water quality monitoring at a drainage basin scale, a large temporal and spatial dataset is required [17,20,65]. Digital Earth Australia (DEA) provides analysis-ready (ARD) satellite data in the form of stacks of consistent, time-stamped surface reflectance observations. Using high-performance computing power provided by the National Computational Infrastructure (NCI) and commercial cloud computing platforms, DEA organizes and prepares the satellite data using Open Data Cube (ODC) technology [66]. Preparation includes geometric correction [67,68], atmospheric correction [69], and pixel quality flags [70][71][72].
For the purpose of this study, Sentinel-2 MSI satellite data were used. The Copernicus Sentinel-2 mission comprises a constellation of two polar-orbiting satellites placed in the same sun-synchronous orbit, phased at 180 • to each other. Sentinel-2A was launched in June 2015 and Sentinel-2B in March 2017, creating a revisit time of 5 days. Sentinel-2 MSI has six spectral bands that are potentially useful for this study: Coastal aerosol (442 nm), blue (B, 490 nm), green (G, 560 nm), red (R, 665 nm), red-edge 1 (RE-1, 705 nm), and red-edge 2 (RE-2, 740 nm). However, the atmospheric correction protocol implemented on DEA is based on standard terrestrial aerosol climatology model parameters and processing. Consequently, it was found to have a large positive bias across the surface reflectance of all bands due to uncorrected aquatic specific influences such as sky and sun-glint. In addition, a shape distortion of the spectra between the coastal aerosol and blue bands of the Sentinel-2 MSI images is observed with the application of a terrestrially parameterized atmospheric correction process [73]. To limit the effect of this distortion in the blue region of the spectrum, only five of the Sentinel-2 MSI bands were used for further analysis (B, G, R, RE-1, RE-2).
A workflow was developed to select suitable sampling locations at which to extract Sentinel-2 MSI data from DEA. The aim was to extract data from all the relatively permanent inland waterbodies of sufficient size. For this purpose, the Water Observations from Space (WOfS) dataset [74] was used to identify pixel locations with a high percentage (above 80%) of observed water. Selected locations had to be in the centre of a window of at least 9 × 9 WOfS pixels with similar high percentage of observed water and was a minimum distance (two or more pixels) away from the edges of the waterbody, to minimize any adjacency effects from nearby vegetation. These criteria led to the definition of several rules that were applied to the WOfS dataset (25m pixel resolution) to select the sampling locations. The resulting algorithm can be summarized as described below and was applied to a series of overlapping (spatial) windows covering the region of interest, in a parallelized implementation executed on the NCI:

1.
Create a mask of permanent waterbodies over the current (spatial) window by thresholding the WOfS dataset at the specified percentage limit (80%).

2.
Erode the water mask (by 2 pixels) and then re-expand it (by 2 pixels) in order to remove thin features connecting several main waterbodies: this allows for the selection of more than one sampling location where several waterbodies are connected by, e.g., thin river channels (would otherwise be counted as a single waterbody in the next step).

3.
Identify and count all the spatially distinct waterbodies in the current window. Discard any waterbody whose boundary extends beyond the edge of the window. The window size (1.0 × 1.0 degree) and overlap (0.7 degree) are selected such that these split waterbodies are ultimately processed (as a whole) in a different (overlapping) window, resulting in an unbiased selection of sampling location for these waterbodies. The window overlap is selected such that the largest waterbodies over the region of interest are properly captured.

4.
For each identified waterbody, erode the water mask (by 2 pixels) to remove the potential influence of nearby vegetation on the edges of the waterbody.

5.
Further erode the water mask (by 4 pixels) to ensure that the selected pixel is at the centre of a window of at least 9 × 9 pixels. 6.
For the remaining pixels, gradually erode the mask further until it cannot be eroded any more without removing all pixels. The resulting pixel(s) represent the most central location(s) for the considered waterbody. If more than one pixel remains, select the location closest to centre of gravity of the remaining pixels.
An area of interest (AOI) for data extraction was defined that covered three major drainage divisions in eastern Australia: The Murray-Darling drainage basin (MDB), the North East Coast (NEC), and the South East Coast (SEC). These three drainage regions cover a latitudinal gradient along the east coast of Australia and straddles the Great Dividing Range, which serves as a watershed, dividing the two coastal drainage divisions from the large inland drainage region of the MDB. This AOI represents a wide range of limnological, climatological, and land-use factors. Table A3 presents a summary of the main characteristics of each of the three drainage regions.
ARD Sentinel-2A and Sentinel-2B surface reflectance observations were extracted from DEA at 885 locations, over a period spanning 2016-2019. The data represent water quality variability over a wide variety of seasonal and limnological conditions.
The spectral observations were filtered using the pixel quality flags provided by the DEA dataset to ensure that only pure water pixels were analysed. An additional SWIR threshold was applied to limit the number of glint-affected pixels included in the final dataset [75]. Each location was tagged as coastal (estuaries and coastal lakes) or inland (fresh water) using the Australian coastal waterways map layer [76]. Inland locations were also tagged as rivers (bodies of mainly flowing water) and lakes (bodies of mainly static water/reservoirs), using GEODATA TOPO 2.5M 2003 map layers [77]. All locations were also tagged with the associated drainage basin and with an elevation class using GEODATA TOPO 2.5M 2003 map layers.

Data Analysis
The modelled R rs spectra, corresponding to the spectral shape of each of the observations that describe the OWT clusters, were convolved to the spectral response of the Sentinel-2 MSI bands. The surface reflectance of the Sentinel-2 MSI pixel observations was compared to each convolved modelled spectrum in each of the clusters using the normalized Spectral Similarity Metric (nSSM) [78] and assigned to the cluster that yielded the lowest nSSM metric.
The nSSM combines the spectral shape and a normalized Euclidean distance (nED) between spectra, accounting for differences in albedo, to determine the level at which spectra can be separated from each other. This metric enables objective determination of the similarity between spectra, where a smaller nSSM value represents a better match.
Observations where the smallest nSSM value exceeded a maximum threshold were considered unclassified. To define the nSSM threshold, the modelled R rs spectral response of each in situ observation was compared with each other and the threshold was determined by determining the lowest nSSM value where most of the modelled OWT spectra were assigned to their original cluster.

Results
Concurrent in situ AOP measurements for 73 of the 110 inland data points shows a relatively good comparison between model derived and measured R rs (Figure 4). Linear percentage error [28] indicates wavelength-dependent differences, introduced by the confounding effects of environmental factors, Remote Sens. 2020, 12, 3018 9 of 24 such as atmospheric variability, the water surface state (with swell-, wave-, and wavelet-induced reflections), and refraction of diffuse and direct sky and sunlight [55]. Errors are larger where the signal is smallest, e.g., below 450 nm and above 700 nm. The agreement between modelled and measured R rs gave us confidence to use the modelled spectra to analyse spectral variability on Australian waters. indicates wavelength-dependent differences, introduced by the confounding effects of environmental factors, such as atmospheric variability, the water surface state (with swell-, wave-, and wavelet-induced reflections), and refraction of diffuse and direct sky and sunlight [55]. Errors are larger where the signal is smallest, e.g., below 450 nm and above 700 nm. The agreement between modelled and measured Rrs gave us confidence to use the modelled spectra to analyse spectral variability on Australian waters. Spectral clustering resulted in 32 and 17 unique clusters for the coastal and inland waters datasets, respectively. The results of the analysis of variance test applied to the SIOP data for each cluster are listed in Table 3. The test showed that the spectral slope constant of NAP absorption coefficient (ϒaNAP) contributed significantly to spectral variability within both the coastal and inland waters dataset while chlorophyll concentration contributed significantly to spectral variability within the coastal dataset and NAP concentration contributed significantly to spectral variability within the inland dataset. The absorption of CDOM at 440nm (aCDOM(440 nm)) had a strongly significant effect on spectral variability within the coastal dataset and a lesser effect on spectral variability within the inland dataset. Table 3. Analysis of variance of the input specific inherent optical properties (SIOP) parameters between the spectral classes returned by the hierarchical clustering of the simulated Rrs data. Black text indicates the coastal/marine waters dataset (n = 32), and grey text indicates the inland waters dataset (n = 17). Refer to Table A2   Spectral clustering resulted in 32 and 17 unique clusters for the coastal and inland waters datasets, respectively. The results of the analysis of variance test applied to the SIOP data for each cluster are listed in Table 3. The test showed that the spectral slope constant of NAP absorption coefficient (γ aNAP) contributed significantly to spectral variability within both the coastal and inland waters dataset while chlorophyll concentration contributed significantly to spectral variability within the coastal dataset and NAP concentration contributed significantly to spectral variability within the inland dataset. The absorption of CDOM at 440 nm (a CDOM (440 nm)) had a strongly significant effect on spectral variability within the coastal dataset and a lesser effect on spectral variability within the inland dataset. Table 3. Analysis of variance of the input specific inherent optical properties (SIOP) parameters between the spectral classes returned by the hierarchical clustering of the simulated R rs data. Black text indicates the coastal/marine waters dataset (n = 32), and grey text indicates the inland waters dataset (n = 17). Refer to Table A2  Following clustering and merging, most of the modelled spectra and modelled SIOP sets were clustered in five groups within the coastal dataset (240 of the 286 observations) and six groups within the inland dataset (102 of the 110 observations). The remaining 54 spectra represents small clusters (n < 4) or individual in situ observations that were too distinct, both spectrally and bio-optically, to be grouped with any of the bigger clusters or each other. Figure 4 shows the variability in optically active water column constituents (a-c) and SIOPs (d-h) of the 11 clusters that emerged from the data analysis. Figures 5 and 6 shows the variability in the concentration, absorption and scattering characteristics of the optically active water column constituents of the 11 clusters. A summary of the main features of the 11 clusters is presented in Table 4.  Figure 7 Figure 7 captures the variability in spectral shape of each of the observations that describe the 11 OWTs in the dataset. There is a clear distinction between the clusters that made up the coastal dataset (01-05) and those of the inland dataset (06-11). The coastal spectra tend to be dominated by response curves that peaks in the blue region of the spectrum while the inland spectral responses peak predominantly within the green and red region of the spectrum.       There is a clear distinction between the clusters that made up the coastal dataset (01-05) and those of the inland dataset (06-11). The coastal spectra tend to be dominated by response curves that peaks in the blue region of the spectrum while the inland spectral responses peak predominantly within the green and red region of the spectrum. Figure 8 shows the spatial distribution of 885 locations with relatively permanent water bodies of sufficient size to not be affected by adjacency effects. The colour scale indicates the number of Sentinel-2 MSI observations retrieved from each point, while the three major drainage regions in the AOI are delineated in shades of grey. There were generally fewer observations on the western and northern sides of the MDB and in the far north of the NEC. To determine to what extent the nSSM could match Sentinel-2 MSI surface reflectance data to the 11 OWTs, each convolved Rrs spectrum was compared to the cluster medians and ranges and assigned to the cluster that yielded the lowest nSSM metric. Ideally, the input spectra would be matched back to the original groupings that they were clustered to. However, with the reduced number of spectral bands implemented for the Sentinel-2 MSI spectra (compared to the 90 spectral bands used for the cluster analysis), some of the finer spectral differences between the clusters will be degraded.  To determine to what extent the nSSM could match Sentinel-2 MSI surface reflectance data to the 11 OWTs, each convolved R rs spectrum was compared to the cluster medians and ranges and assigned to the cluster that yielded the lowest nSSM metric. Ideally, the input spectra would be matched back to the original groupings that they were clustered to. However, with the reduced number of spectral bands implemented for the Sentinel-2 MSI spectra (compared to the 90 spectral bands used for the cluster analysis), some of the finer spectral differences between the clusters will be degraded. Table 5 shows an overall classification accuracy of 69% with a Kappa of 62%. Some clusters are more uniquely separable than others (e.g., 11). A few of the clusters with smaller sample sizes are more often mis-classified (e.g., 05, 07, 08, and 10). It should be noted that 08, 07, and 10 are comprised of smaller clusters that were bio-optically similar, thus compromising the spectral uniqueness of the groupings. Within the clusters with a higher number of observations, generally data belonging to the coastal dataset is most often misclassified into another coastal OWT. OWT 06, representing deep, clear inland lakes is often confused with coastal clusters, representing clear coastal waters. Cluster 09 appears to not be spectrally unique with misclassifications into several other classes. Table 5. Confusion matrix of simulated spectral response of the SIOP observation in the Australian water database, convolved to five Seninel-2 MSI bands, matched to the convolved spectral response of the OWT cluster medians and ranges. Of the 69% of the convolved R rs spectra that were correctly matched to the original OWT clusters, 20% were matched with an nSSM value of 1.0 or larger. 40% of the convolved R rs spectra that were incorrectly matched to the original OWT clusters had an nSSM value exceeding 1.0. This value was selected as the maximum threshold where matches with a larger nSSM value were considered unclassified.
The surface reflectance of the Sentinel-2 MSI pixel observations was compared to each convolved modelled spectrum in each of the 11 clusters and assigned to the cluster that yielded the lowest nSSM metric. An nSSM threshold was defined by determining the lowest nSSM where most of the modelled OWT spectra were assigned to their original cluster. Any match greater than 1.00 was considered spectrally distinct from all the existing OWTs and labelled "unclassified". Figures 9 and 10 show the limnological, landscape, and temporal distribution of OWTs matched to Sentinel-2 MSI observations over each waterbody within the three drainage regions. The observations, which were not matched with the existing OWTs, are labelled a red colour. There appear to be more unclassified observations in the inland lakes and rivers class than in the coastal waters ( Figure 9).
The inland lake and river classes show a clear limnological split east and west of the Great Dividing Range (Figure 9). Waterbodies in the west (MDB) have a larger proportion of observations with dominant spectral responses in the green and red part of the spectrum and waterbodies in the east (NEC and SEC) have a higher proportion of observations with dominant spectral responses in the blue part of the spectrum. north and drier savannas towards the south and west. The coastal waterways in this region is characterized by estuarine waters (Figure 9). The lakes and rivers in this region are predominantly clear during the observation period while some rivers in the lower elevations have water that is dominated by suspended sediments (Figure 9). This region has relatively similar proportions of unclassified observations within the lake and river classes, while the coastal waterways have a relatively low proportion of unclassified observations. The MDB is a predominantly inland drainage basin with a wide range of climatic and geohydrological conditions (Table A3 in Appendix A). There is a strong bimodal clustering of OWTs within the lakes and rivers class in this region ( Figure 9) The observations in the lakes in the lower elevations (<200m) falls predominantly in OWT classes where absorption is CDOM and NAP dominated with high amounts of suspended sediment (11) or chlorophyll (08). Lakes in higher elevations generally have clearer waters with a spectral response dominated by blue reflectance (02, 03, 06). Similarly, rivers in the lower elevations are more dominated by OWT classes with higher NAP and chlorophyll concentrations, while the higher elevations are predominantly clearer waters that are dominated by CDOM absorption (Figure 9). There appears to be a seasonal shift in water quality across the MDB drainage region with the frequency of NAP dominated OWTs decreasing during the winter months (June, July, August), and with an increase in the observation frequency of clearer OWTs during this period (Figure 10). The lake class in the higher elevations has the most unclassified observations in this drainage region.
The SEC has a temperate climate with predominantly moderate summer rainfall (Table A3), resulting in an increase in the observation frequency of more sediment dominated water types in the summer months (December, January, February, Figure 10). It has many coastal waterways, characterized by estuarine waters (Figure 9). The lakes in this region are predominantly clear while many of the rivers in the lower elevations have water that is dominated by suspended sediments (Figure 8). This region has relatively similar proportions of unclassified observations within the coastal, lake, and river classes. The rivers class in the lower elevations has the most unclassified observations within this class in this drainage region.
The monsoonal climate of the NEC (Table A3) results in a lower observation frequency in the summer months when conditions are cloudy (Figure 9). This region comprises of numerous drainage basins with predominantly natural waterways in the north. The southern drainage basins have more man-made water storages. There is also a climatological gradient with wet tropical conditions in the north and drier savannas towards the south and west. The coastal waterways in this region is characterized by estuarine waters (Figure 9). The lakes and rivers in this region are predominantly clear during the observation period while some rivers in the lower elevations have water that is dominated by suspended sediments (Figure 9). This region has relatively similar proportions of unclassified observations within the lake and river classes, while the coastal waterways have a relatively low proportion of unclassified observations.

Discussion
The objective of this study was to improve our knowledge of the optical complexity of Australian waters. A workflow was developed to cluster the modelled spectral response of a range of in situ biooptical observations collected in Australian coastal and continental waters.
Coastal and inland Case 2 waters share common features and represents a continuum from fresh waters in catchments to coastal marine environments [1,2]. Despite this continuum, there is a clear distinction between the clusters that made up the coastal dataset and those of the inland dataset in this study. Similar to a global study by [1], the coastal spectra tend to be dominated by response curves that peaks in the blue region of the spectrum, while the inland spectral responses peaks predominantly within the green and red region of the spectrum.
The coastal clusters represent Australian coastal waters ranging from clear blue to more turbid estuarine waters. The first three coastal OWTs (01, 02, 03) have a strong blue reflectance signal, relative to the rest of the spectrum with absorption budgets dominated by CDOM absorption. The 01 OWT most closely resembles open ocean waters, with NAP absorption affecting the absorption budget to a small degree and very low inputs from phytoplankton. The 02 OWT represents coastal

Discussion
The objective of this study was to improve our knowledge of the optical complexity of Australian waters. A workflow was developed to cluster the modelled spectral response of a range of in situ bio-optical observations collected in Australian coastal and continental waters.
Coastal and inland Case 2 waters share common features and represents a continuum from fresh waters in catchments to coastal marine environments [1,2]. Despite this continuum, there is a clear distinction between the clusters that made up the coastal dataset and those of the inland dataset in this study. Similar to a global study by [1], the coastal spectra tend to be dominated by response curves that peaks in the blue region of the spectrum, while the inland spectral responses peaks predominantly within the green and red region of the spectrum.
The coastal clusters represent Australian coastal waters ranging from clear blue to more turbid estuarine waters. The first three coastal OWTs (01, 02, 03) have a strong blue reflectance signal, relative to the rest of the spectrum with absorption budgets dominated by CDOM absorption. The 01 OWT most closely resembles open ocean waters, with NAP absorption affecting the absorption budget to a small degree and very low inputs from phytoplankton. The 02 OWT represents coastal waters with a strong estuarine influence. It has the lowest γaNAP, suggesting a higher proportion of large organic particles [79,80]. The 02 OWT closely resembles the 01 OWT but has a higher phytoplankton component. The remaining two coastal OWTs (04, 05) represents moderately reflective tropical waters [49]. The 04 OWT is dominated by phytoplankton absorption and has relatively high chlorophyll content, suggesting that this represents eutrophic lagoonal waters. The higher a CDOM (440 nm) and smaller γaCDOM of the OWT indicates a coral reef matrix influence [49,81]. The higher NAP concentration and large γaNAP of the 05 OWT implies that this OWT represents tropical waters with a stronger coastal influence than 04.
The inland clusters represent eastern Australian continental waters ranging from relatively clear alpine reservoirs to turbid shallow lakes. All these OWTs have a maximum reflectance peak around 570-585 nm. The first inland OWT (06) has relatively low reflectivity and low concentrations of water column constituents, denoting clear waters with high transparency. The two green inland OWTs (07, 08) are both highly reflective water types with high chlorophyll concentrations. They are spectrally very similar with spectral differences mainly limited to a difference in the 705 nm reflectance peak and a lower reflectance in the blue region by 08, characteristic of the stronger CDOM absorption in this OWT. The 07 OWT has a smaller median γ b b NAP and a larger median γ aNAP than 08, suggesting a predominance of larger organic particles in the water column [79,80]. The last three OWTs represents waters that are dominated by NAP and CDOM absorption with a relatively small phytoplankton component. The 09 OWT has a moderate reflectivity and represents fairly clear inland waters. The comparatively high NAP absorption and low b* b NAP suggests a water column that is characterized by small suspended particles. Both the 10 and 11 OWTs are highly reflective sediment-laden water types with high NAP concentrations and high CDOM absorption. The large γaNAP of 10 suggests larger amounts of organic particulate material [79] while the relatively low b* b NAP of the 11 OWT suggests waters with smaller suspended particles [82].
To test the applicability of the 11 derived OWTs for water quality monitoring at a drainage basin scale, temporal analysis was undertaken on Sentinel-2 MSI surface reflectance observations. A total of 885 sample locations across three drainage regions were used in the case study. The satellite-derived surface reflectance observations, sourced from DEA, captured a wide range of climatological and limnological conditions. The analysis demonstrated clear limnological and seasonal differences in water types within and between the drainage divisions, which corresponds with general limnological, topographical, and climatological factors. Due to the operational period of the Sentinel-2 satellite platforms coinciding with severe drought conditions [83], there were generally fewer observations on the western and northern sides of the MDB, as surface water extents were significantly reduced. There were also fewer observations collected over this region before the Sentinel-2 platform was fully operationalized, which further confounded the analysis. Less observations in the far north of the NEC are predominantly due to cloud cover. As much of this region is subjected to monsoonal conditions in summer [84], there are larger proportions of observations made in winter, which will affect any analysis of seasonal water quality trends from satellite data in this region.
The DEA Landsat archive, representing more than 30 years of ARD observations, is ideal to capture natural water quality variations caused by seasonal, limnological, and climatological factors as well as anthropogenic factors, such as land-use change [85]. However, the atmospheric correction protocol implemented on DEA satellite data is based on standard terrestrial aerosol climatology model parameters. Atmospheric correction over water bodies is challenging because waterbodies are dark, and about 90% of the signal received by the sensor is not caused by the water itself [86,87]. The large positive bias in the shape of the reflectance in the blue bands over water targets, which is introduced by this model [73], renders the Landsat data unsuitable for the purpose of this study. With future reprocessing, incorporating a maritime aerosol type, in combination with the Landsat-8 top of atmosphere (TOA) reflectance-calibrated product and a glint-correction algorithm [73], the spectral shape of the blue bands will be more suitable for aquatic applications on DEA datasets.
Although the Sentinel-2 MSI data on DEA are corrected with the same atmospheric correction parameterization as the Landsat data, the increased number of spectral bands and finer spatial resolution somewhat alleviates the existing issues with the spectral shape of the surface reflectance spectra. To further limit the confounding effects of the existing atmospheric correction protocol on the spectral shape of aquatic targets, the coastal blue band of the Sentinel-2 MSI sensor was excluded from the analysis.
The optical cluster analysis was carried out on hyperspectral modelled R rs spectra whilst the spectral matching of the case study was applied to five multispectral bands. The reduced spectral coverage of multispectral sensors limits the separability of water types because some water column constituents have characteristic absorption features in narrow sections of the spectrum [88]. The reduced number of spectral bands implemented for the Sentinel-2 MSI spectral analysis degraded some of the finer spectral differences between the clusters. Due to the limited spectral resolution of the Sentinel-2 satellite data, the spectral signatures of some of the OWTs in this study became less uniquely separable than others. To mitigate the confounding effect of the reduced spectral resolution of the Sentinel-2 bands, a threshold metric was defined. An optimal threshold is a compromise between maximizing the probability of a correct match and minimizing the rate of false positive matches [89]. The threshold defined for the nSSM metric balances the requirement to capture the maximum number of correct spectral matches with the need to identify areas where there are genuine gaps in the existing water quality database. To limit the confounding effect of environmental factors on the accuracy of results, care was taken to select only observations that were not affected by atmospheric variability, sun glint, and the effects of the brighter response of adjacent terrestrial targets [56].
The OWT classification of waterbodies demonstrates clear limnological and seasonal differences in water types within and between the drainage divisions, which corresponds with general limnological, topographical, and climatological factors in those regions [84]. Unclassified observations were clustered predominantly in lakes within the alpine regions and east of the Great Dividing Range. Observations within rivers (flowing water) are less frequently labelled unclassified. Targets with a higher albedo were more often matched to an OWT, suggesting that some of the unclassified observations in the deep, clear lakes in the alpine regions [84] may be due to low target albedo. The water leaving signal at the satellite sensor is a small part of the total measured signal [86,88]; this potentially constrains the ability of the Sentinel-2 sensor to record a sufficient amount of water leaving light through a set of atmospheric and air water interface conditions to allow a clearly distinguishable spectral signature that can be matched to an existing OWT.

Conclusions
The OWTs defined in this study were based on in situ observations capturing the bio-optical response of a wide range seasonal and geographical variability. The extent of the Australian coastline and diversity of continental landscape and climatic features implies that there may still be gaps in our knowledge. For example, the current coastal dataset, except for the Great Barrier Reef, lacks the temporal range to fully understand the variability associated with seasons [90]. Similarly, the inland dataset has been sourced exclusively from eastern Australia and lacks data associated with unique limnological features further west [91].
The deficiency in capturing the full variability of water quality across the Australian landscape can be observed in the clustering of waterbodies that regularly do not match to any of the existing OWTs in the study area of the case study. The locations of the unclassified observations can be used to inform where investment for in situ bio-optical data acquisition may be better targeted to achieve a more comprehensive objective characterization of all Australian waters, which can contribute to global initiatives like the SDG.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/18/3018/s1, Table S1: Description of the coastal and marine datasets used in this work, Table S2: Description of the lakes and reservoirs that contributed towards the inland water quality dataset, Figure S1: Hierarchical clustering dendrogram of the simulated R rs spectra produced from the coastal waters dataset, Figure S2: Hierarchical clustering dendrogram of the simulated R rs spectra produced from the inland waters dataset. Funding: IWQual-a collaborative project between CSIRO and Geoscience Australia.

Acknowledgments:
The authors wish to acknowledge and thank the many CSIRO staff, past and present, involved in the acquisition of the data that underpins our science. For each of these many field campaigns, Lesley Clementson (CSIRO Oceans and Atmosphere, Hobart) has undertaken the laboratory analyses of constituent concentrations, absorptions, pigment data and is gratefully acknowledged.

Conflicts of Interest:
The authors declare no conflict of interest.  Table A2. Glossary of water column constituents presented in this study.

Parameter Description Unit
C CHL Chlorophyll concentration (a proxy for phytoplankton) µg L −1 CDOM Coloured dissolved organic matter C NAP Non algal particulates concentration mg L −1 PHY Phytoplankton a* PHY (440 nm) Chlorophyll-a specific absorption at 440 nm m 2 mg −1 a* PHY (676 nm) Chlorophyll-a specific absorption at 676 nm m 2 mg −1 γ aCDOM Spectral slope constant of CDOM absorption coefficient nm −1 a CDOM (440 nm) Absorption of CDOM at 440 nm m −1 a* NAP (440 nm) Specific absorption of NAP at 440 nm m 2 g −1 γ aNAP Spectral slope constant of NAP absorption coefficient nm −1 b* bNAP (555 nm) Specific backscattering due to NAP at 555 nm m 2 g −1 γ b b NAP Spectral slope constant of NAP backscattering coefficient nm −1