Automated estimates of benthic coverage agreed with those generated manually by expert observers across all five global regions. Similarly, Williams et al. [30] found high consistency between machine and observer cover estimates for images from Hawaii and American Samoa analysed with CoralNet (www.coralnet.ucsd.edu), an online platform for automated image analysis based on deep learning. Deep learning also outperformed shallow learning approaches (i.e., Support Vector Machines, SVM). With low errors, power to detect temporal trends comparable to that of expert observations, productivity at least 200× higher than manual labour, and a fractional cost of data extraction (1%), the results of the present study make a strong case for implementing deep learning in coral reef monitoring. Two caveats are important. Firstly, taxonomic resolution is limited, which may or may not matter depending on the question being asked; the ability of machines to discriminate taxa is, however, likely to improve as machine learning innovations continue to accelerate. Secondly, while coral reef monitoring can benefit from the fast processing and data standardisation afforded by automated image analysis, integrating human expert observations with machine learning may still be recommended in some circumstances.
According to our results, the errors introduced by deep learning in automated abundance estimates (2%–6%) are within the range of inter- and intra-observer variability reported for established monitoring programs (e.g., 2%–5%, Long Term Monitoring Program, AIMS, Australia; [8]), and the variability introduced by automated estimates is well below the range of spatial and temporal change observed in nature [8,9,31]. Concomitantly, the error introduced by automated annotations had little influence on the detection of change in coral cover, as the statistical power for detecting significant change was virtually identical between expert and automated estimates. Furthermore, data continuity was not compromised by implementing automated image processing, given that these estimates are highly compatible with established monitoring programs, ensuring consistency in long-term monitoring [2]. Therefore, we conclude that errors introduced by networks are unlikely to limit the effectiveness of machine-based systems in monitoring change.
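As a rough illustration of this power comparison, the following Monte Carlo sketch contrasts detection power under observer-only error with power under an added machine error. All cover values, sample sizes and error magnitudes here are hypothetical, not taken from the study:

```python
import random
import statistics

random.seed(42)

def detect_change(n_transects, before, after, noise_sd, n_sims=2000, crit=2.0):
    """Monte Carlo power: fraction of simulations in which a two-sample
    t-like statistic exceeds an approximate critical value (~2.0 for
    moderate degrees of freedom)."""
    hits = 0
    for _ in range(n_sims):
        a = [random.gauss(before, noise_sd) for _ in range(n_transects)]
        b = [random.gauss(after, noise_sd) for _ in range(n_transects)]
        se = ((statistics.variance(a) + statistics.variance(b)) / n_transects) ** 0.5
        if abs(statistics.mean(a) - statistics.mean(b)) / se > crit:
            hits += 1
    return hits / n_sims

# Hypothetical scenario: coral cover declines from 30% to 25%.
power_expert  = detect_change(20, 30.0, 25.0, noise_sd=2.5)  # observer error only
power_machine = detect_change(20, 30.0, 25.0, noise_sd=4.0)  # plus machine error
```

Under these invented settings, the added annotation error widens the noise around each estimate but leaves the power to detect a change of this magnitude largely intact, mirroring the pattern reported above.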
4.1. Challenges and Further Considerations in Automated Benthic Assessment
Challenges posed by the visual identification of species from imagery, the taxonomic/functional definition of labels, inter-observer variability in abundance estimates and innate aspects of automated image annotation can partially explain the observed errors.
As one label typically contains several morphologically diverse taxa, the taxonomic complexity, or potential number of species within a label, can introduce variability in the accuracy of automated classifications. The definition of labels influences the error of automated classifiers, both through the innate capacity of the machine to identify each label and through the error introduced by inter-observer variability in the training of network models. A key example is the algae group, which comprises a large number of species, with an estimated 630 for the Great Barrier Reef alone [32], but is represented here by only a handful of functional labels (Epilithic Algal Matrix, Macroalgae, Crustose Coralline Algae and Cyanobacteria). Variation in the visual attributes that define each species within a label therefore adds confusion to the identification [17,30], as also observed here. Arguably, inter- and intra-observer variability drives larger errors per class because the training and test datasets derive from variable observer estimates [8,17,18]. While separating the two effects (machine-introduced and inter-observer error) can be difficult, more narrowly defined labels will yield lower errors in automated analyses [17]. This can be observed when comparing the overall error and community-wide agreement among regions: the regions with more narrowly defined label-sets and lower taxonomic complexity (e.g., Eastern Atlantic and Central Pacific, Table 1) showed the lowest error and the highest community similarity between manual and machine observations. Arguably, a relationship between mean observed values and the variance of automated annotations could mean that aggregated labels (e.g., Hard Corals), being more abundant than taxonomically defined labels (e.g., Montastraea cavernosa), present larger errors. However, an analysis of the error of automated estimates across the mean of observed values shows no bias or overdispersion of the error (Figure A1). Nevertheless, the aggregation of labels into larger benthic categories may carry forward sources of error in machine annotations that lead to slightly larger discrepancies between machine and observer estimates.
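The interplay between label definition, aggregation and apparent error can be sketched as follows. The point annotations and label names below are invented for illustration; the sketch shows how confusion among related classes (here, algal groups) can be absorbed when those classes are merged into a broader category:

```python
from collections import Counter

# Hypothetical point annotations over the same six points
observer = ["HardCoral", "Macroalgae", "EAM", "HardCoral", "CCA", "Macroalgae"]
machine  = ["HardCoral", "EAM",        "EAM", "HardCoral", "CCA", "EAM"]

def cover_error(ref, pred):
    """Absolute difference in percent cover for each label."""
    n = len(ref)
    ref_c, pred_c = Counter(ref), Counter(pred)
    return {label: abs(ref_c[label] - pred_c[label]) / n * 100
            for label in set(ref_c) | set(pred_c)}

ALGAE = {"Macroalgae", "EAM", "CCA"}

def aggregate(seq):
    """Merge all algal labels into one broad category."""
    return ["Algae" if label in ALGAE else label for label in seq]

fine_error   = cover_error(observer, machine)                       # per algal group
coarse_error = cover_error(aggregate(observer), aggregate(machine)) # merged group
```

In this toy example, the machine confuses Macroalgae with EAM, producing a nonzero error for each algal label, yet the aggregated "Algae" cover matches the observer exactly because the confusions cancel within the group.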
Visual traits of coral reef benthos can also pose several challenges for abundance estimation. These include: (1) plasticity of forms, (2) visual similarity among phylogenetically distinct organisms (e.g., algae and corals, soft corals and hard corals), (3) undefined shapes and edges (e.g., algae), (4) intricate growth among organisms (e.g., algal matrices), and (5) the different magnification scales required to accurately identify organisms.
Environmental regimes (e.g., wave action and light) can heavily influence the morphology of sessile organisms [33]. A single taxon can be visually distinct under different environmental regimes, thus introducing variability into the training data. For example, hard coral species from the genera Orbicella and Siderastrea typically exhibit massive mound or branching morphologies, but in light-limited environments (e.g., at depth) the same species adopt plating morphologies, maximising their capacity to capture light [34]. Considering morphological traits in the classification scheme may help account for phenotypic plasticity across environmental regimes, maximising the consistency of the training dataset and hence machine performance.
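A minimal sketch of such a scheme, with hypothetical class codes, pairs taxon and growth form so that each visually consistent phenotype becomes its own training class:

```python
# Hypothetical label scheme: the same genus photographed under different
# light regimes maps to distinct, visually consistent training classes.
SCHEME = {
    ("Orbicella", "massive"):   "ORB_MAS",
    ("Orbicella", "plating"):   "ORB_PLA",
    ("Siderastrea", "massive"): "SID_MAS",
    ("Siderastrea", "plating"): "SID_PLA",
}

def training_class(taxon, growth_form):
    """Resolve a (taxon, growth form) pair to a training class code."""
    return SCHEME.get((taxon, growth_form), "UNKNOWN")
```

Predicted morphotype classes can then be re-aggregated to the taxon level for reporting, so the extra resolution helps the classifier without changing the monitored categories.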
Ill-defined edges, patchiness, intricate growth and the different resolution of taxonomic attributes in marine benthos can also complicate identification in point-based abundance estimates. A clear example is the classification of algae, which consistently showed the highest error (3%–6%) across regions, concurring with other studies [17,18,30]. If an alga is too patchy (e.g., turf algae) or grows intermixed with other algal species (e.g., macroalgae), higher magnification may be needed to resolve its taxonomy. The approach used here for automated image identification defines a fixed window size, within which the machine extracts and weighs the importance of visual attributes (e.g., texture, colour) to assign a label. Well-defined and large organisms, such as corals and soft corals, showed the lowest error (1%–2%) when observed at our window size. This window size, however, penalises the estimation of smaller, patchy or less defined organisms, such as the Epilithic Algal Matrix (EAM, error ~3%–6%). Using multi-scale or regional networks to account for taxon-specific model sensitivity (e.g., [35]) may help by increasing versatility in the definition of regions of interest and maximising the accuracy of abundance estimates. Furthermore, spectral signatures of light (e.g., fluorescence, reflectance) may offer an alternative for expanding the parameters that define phototrophic or pigmented organisms [36,37].
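The fixed-window extraction step can be sketched as follows. The 224 px default reflects common VGG-style network inputs and is an assumption, not necessarily the window size used in this study:

```python
def extract_patch(image, y, x, window=224):
    """Crop a fixed-size window centred on an annotation point (y, x).
    `image` is a list of pixel rows; crops near the border are shifted
    inward so the window stays fully inside the image."""
    h, w = len(image), len(image[0])
    half = window // 2
    top  = min(max(y - half, 0), max(h - window, 0))
    left = min(max(x - half, 0), max(w - window, 0))
    return [row[left:left + window] for row in image[top:top + window]]
```

Because every point receives the same crop size regardless of the organism beneath it, a large coral colony fills the window with consistent texture, while a thin algal turf occupies only a fraction of it, one intuition for why patchy, fine-grained labels incur higher error at a single scale.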
Machine learning is a constantly evolving field, and a diversity of alternative classifiers and software frameworks is available. While the VGG architecture used in this study remains in the top tier of image classifiers in terms of classification error [38], newer and deeper convolutional neural network architectures are less computationally intensive (faster) and slightly improve on classification error benchmarks [39]. Similarly, newer software frameworks (e.g., TensorFlow, Keras) are now faster and easier to implement than Caffe, offering greater versatility in their applications [40]. Further work comparing network architectures and software frameworks will provide a better overview for choosing the configuration that best fits the purpose of individual applications of this technology.
Looking forward, this work comprises a specific network model for each country/region (Table 1); a more advanced system that accounts for the geographic origin of the data would facilitate its application in global monitoring. Furthermore, propagating classification uncertainty from the machine into statistical analyses will ensure a more robust integration and interpretation of monitoring data (e.g., [41]).
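One simple way to propagate classifier uncertainty, sketched here with invented softmax outputs and label names, is to average per-point class probabilities into an expected cover, and to resample point labels from those probabilities to obtain a distribution of plausible cover values rather than a single hard-assigned estimate:

```python
import random

random.seed(1)

LABELS = ["HardCoral", "Algae", "Sand"]

# Hypothetical per-point softmax outputs from the classifier
probs = [
    [0.90, 0.08, 0.02],
    [0.20, 0.70, 0.10],
    [0.55, 0.40, 0.05],
    [0.05, 0.15, 0.80],
]

# Expected cover: average class probabilities instead of hard-assigning
# each point to its most likely class.
expected = [sum(p[i] for p in probs) / len(probs) for i in range(len(LABELS))]

def cover_draws(n_draws=2000):
    """Resample each point's label from its softmax distribution,
    yielding a distribution of plausible cover values per label."""
    draws = []
    for _ in range(n_draws):
        picks = [random.choices(range(len(LABELS)), weights=p)[0] for p in probs]
        draws.append([picks.count(i) / len(probs) for i in range(len(LABELS))])
    return draws
```

The spread of the resampled cover values can then be carried into downstream statistical analyses (e.g., as an extra variance component), so that points the classifier is unsure about widen, rather than silently tighten, the reported confidence in a trend.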
4.2. Implications of Automated Benthic Assessments for Coral Reef Monitoring
It is important to understand the cost-effectiveness of machine-based monitoring relative to more conventional methodologies. Here, we demonstrated several breakthrough steps in machine-based monitoring, including scale, repeatability, rigour, speed of reporting and cost-effectiveness. While we argue that automated image analysis is not meant to replace expert observations in monitoring, its higher efficiency (200×) and lower costs (1.3%, plus reduced ongoing costs) provide compelling reasons for adopting it over conventional human-based techniques. This advantage becomes clearer when considering the large volume of images [18], the increasing demand for reporting (e.g., [42]) and the large extents under management [1]. Automated image annotation offers the possibility of alleviating backlogs and accelerating the availability of detailed and accurate monitoring data, allowing limited resources (e.g., expert staff) to be redirected towards detailed and needed observer-derived data (e.g., biodiversity assessments). In this way, automated assessments provide an avenue to expand the detail of monitoring through the reallocation of resources, while significantly reducing the reporting time for large-scale metrics (e.g., changes in composition).
Machine-based monitoring has the advantage of data integration: once an image is collected, it can be revisited and automatically reprocessed over time. Data integration often generates invaluable insights into the status and trends of ecosystems at global and regional scales [43,44]. Nonetheless, much more could be done to harness the vast amounts of data collected by government agencies, NGOs, citizens and scientists [6]. As a common monitoring tool, images generate valuable and standardised information which, paired with automated image analysis, can ease data integration without impairing long-term data continuity. The number of open-access online tools for automated image annotation, and of robust demonstrations of their applications across conservation disciplines [16,30,45,46], is rapidly increasing. Critical steps are now needed to capitalise on these efforts [28,47] and maximise the advantages of global monitoring and conservation science empowered by machine learning.