The world’s oceans are under tremendous stress from global warming, ocean acidification, and other human activities [1], and the UN has declared 2021–2030 the Ocean Decade (https://en.unesco.org/ocean-decade). Monitoring the marine environment is part of the ecosystem-based Marine Spatial Planning initiative by the IOC [2], and Life Below Water is number 14 of the UN’s Sustainable Development Goals.
The aim here is to study how the use of machine-learning techniques, combined with physical modeling, can assist in designing and operating a marine environmental monitoring program. The purpose of monitoring is to detect tracer discharges from an unknown location. Examples are accidental release of radioactive, biological, or chemical substances from industrial complexes, e.g., organic waste from fish farms in Norwegian fjords [3] and other contaminants that might have adverse effects on marine ecosystems [4].
As a case study, we use the monitoring of areas in which large amounts of CO2 are stored in geological formations deep underneath the seafloor. Such storage is a part of the Carbon Capture and Storage (CCS) technology and, according to the International Energy Agency and the Intergovernmental Panel on Climate Change, will be a key factor in reaching the 1.5 °C goal, accounting for a substantial share of the total CO2 emission reductions […]. Due to the large amount of CO2 to be stored, and as a precaution, the marine environment will have to be monitored for indications of a leak through the seafloor to comply with regulations (https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32009L0031&from=EN).
A challenge for detecting CO2 seeps into marine waters is that CO2 is naturally present in marine environments and its concentration is highly variable due to local transport by water masses, uptake of atmospheric CO2, temperature, biological activity, geochemistry of the sediments, and other factors. Therefore, a CO2 seep signal can be hidden within the natural variability. The purpose here is to classify noisy time series into two classes: leak vs. no-leak.
Time series classification (TSC), i.e., the classification of ordered sequences, is a classical problem within the field of data mining [10]. The problem has been tackled with numerous different approaches; see for instance Bagnall et al. [12].
Distance-based methods use a measure of distance, or similarity, between time series, combined with a distance-based classifier. Dynamic Time Warping (DTW) and Euclidean distance are typical examples of metrics between time series. DTW seems to be the most successful distance-based method, as it allows for perturbations, shifts, and variations in the temporal domain [13].
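For intuition, a minimal sketch of the DTW distance between two univariate series is given below. This is an illustrative plain-Python/NumPy implementation only; practical applications would typically use an optimized library and add a warping-window constraint.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(n*m) dynamic program over the cumulative cost matrix."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between points
            # extend the cheapest of the three allowed warping moves
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# A 1-NN classifier would then assign a test series the label of the
# training series with the smallest dtw_distance to it.
```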
Feature-based methods extract features from the time series and use traditional classification methods on the extracted features [12]. Traditional signal-processing techniques, using various transforms, e.g., the Fast Fourier Transform or the Discrete Wavelet Transform, are often used as preprocessing steps to generate features for traditional classifiers.
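As an example, a minimal sketch of such a feature-based pipeline is shown below, assuming scikit-learn; the choice of classifier, the number of retained Fourier coefficients, and the variable names (X_train, y_train, X_test) are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fft_features(series: np.ndarray, n_coeffs: int = 10) -> np.ndarray:
    """Magnitudes of the lowest-frequency rFFT coefficients of each series."""
    return np.abs(np.fft.rfft(series, axis=-1))[..., :n_coeffs]

# X_train: (n_series, n_steps) array of time series; y_train: class labels
# (hypothetical names for illustration):
# clf = RandomForestClassifier().fit(fft_features(X_train), y_train)
# y_pred = clf.predict(fft_features(X_test))
```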
Model-based TSC assumes that time series are generated by some underlying process model, and a new time series is assigned to the model class that fits it best. Typical models used for model-based TSC are auto-regressive models [15] and hidden Markov models [16]. There are also many other techniques, such as Gaussian processes [17] and functional data analysis [18], that can be successfully applied to the TSC problem.
According to Bagnall et al. [12], the state-of-the-art TSC algorithms are the Collective of Transformation-Based Ensembles (COTE) [20] and Dynamic Time Warping (DTW) [13] in combination with some classifier, e.g., k-nearest-neighbor or decision tree. More recently, COTE has been extended with a hierarchical vote system, resulting in the HIVE-COTE algorithm, a significant improvement over the COTE algorithm [21]. HIVE-COTE is a hierarchical method combining different classifiers into ensembles to increase classification performance. It combines 37 different classifiers that use all the above-mentioned techniques, including the frequency and shapelet transformation domains. HIVE-COTE is currently the classifier that performs best on the UCR datasets [12], and it is considered state of the art in TSC. The major drawback of both the COTE and DTW methods is their high computational cost.
Lately, the hegemony of COTE and DTW has been challenged by several efforts to use artificial neural networks for TSC [22]. For example, Zheng et al. [23] applied Multi-Channel Deep Convolutional Neural Networks (MC-CNN) to human-activity data as well as to congestive heart failure data. They found that MC-CNN is more efficient, and competitive in accuracy, compared with the state-of-the-art traditional TSC algorithm (1-NN DTW). A fully Convolutional Neural Network (CNN), a deep multilayer perceptron network (Dense Neural Network, DNN), and a deep Residual Neural Network architecture were tested in a univariate TSC setting in [24]. The authors found that both their fully convolutional and their deep residual architectures achieve better performance than other state-of-the-art approaches.
Both Recurrent Neural Networks (RNNs) [25] and Convolutional Neural Networks (CNNs) show state-of-the-art performance for TSC, outperforming other techniques on some datasets, but not on others [22]. Due to their nature, RNNs are a natural choice when dealing with time series; however, one of their drawbacks is that they require more time for optimization. In a review of TSC methods, Fawaz et al. [22] found that CNNs outperform RNNs not only in training time, but also in accuracy.
An important capability that many TSC techniques lack is the quantification of prediction uncertainty. One way to overcome this limitation, while retaining state-of-the-art predictive power, is to use Bayesian Neural Networks [26], e.g., Bayesian Convolutional Neural Networks (BCNNs) or Bayesian Recurrent Neural Networks (BRNNs). These have the same advantages as standard neural networks, but with the additional benefit of providing posterior predictive distributions.
When dealing with a binary classification problem, such as the leak vs. no-leak classification used here, the output from a Bayesian Neural Network is a probability estimate, or level of uncertainty, of the class that a given time series belongs to. This uncertainty is important information when making decisions based on the classification, such as whether to mobilize a costly confirmation and localization survey [27].
The major drawback of classical Bayesian Neural Networks is their failure to scale to large data sets. Gal and Ghahramani [28] have recently proposed using Bernoulli dropout during both the training and the testing stages, which can be viewed as an approximate variational inference technique [28]. Blundell et al. [29] presented an algorithm called Bayes by backprop and showed that it can efficiently estimate the distributions over the model parameters. Shridhar et al. [30] applied the Bayes by backprop algorithm to different CNNs and data sets and compared it with the Gal and Ghahramani approach; they found the two approaches comparable. The simple and applicable nature of the method by Gal and Ghahramani has made it popular in a wide range of applications where quantification of uncertainty is important, e.g., [31].
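To make the idea concrete, the sketch below shows MC dropout for binary TSC, assuming TensorFlow/Keras; the architecture, dropout rate, and number of stochastic forward passes are illustrative placeholders, not the configuration used in this study.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_model(n_steps: int) -> tf.keras.Model:
    """Small 1D CNN with Bernoulli dropout for leak vs. no-leak classification."""
    inputs = tf.keras.Input(shape=(n_steps, 1))
    x = layers.Conv1D(16, kernel_size=5, activation="relu")(inputs)
    x = layers.Dropout(0.5)(x)                    # kept active at test time
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # P(leak)
    return tf.keras.Model(inputs, outputs)

def mc_dropout_predict(model: tf.keras.Model, x, n_samples: int = 100):
    """Stochastic forward passes (training=True keeps dropout on) approximate
    the posterior predictive distribution of P(leak)."""
    samples = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)  # predictive mean and spread
```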
In their review of the status and future of deep-learning approaches in oceanography, Malde et al. [34] argue that deep learning is still an infant method within marine science. Neural network models have been applied to various environmental data, e.g., [35], and there have been some efforts regarding classification of environmental time series, e.g., [37]. To our knowledge, a probabilistic deep-learning approach, such as a BCNN, has yet to be explored on environmental time series. This is the motivation for the present work.
We use Gal and Ghahramani’s BCNN on the classical statistical problem of TSC and show that it can be a valuable tool for environmental time series analysis. Our aim is to contribute to the TSC community, here focusing on CCS, geoscience, and oceanography applications.
A recent technique for gas seep detection is based on relatively simple thresholding of the differences between time lags in a time series [39], and we are confident that our study can contribute to increased detectability and thus to the optimization of monitoring programs in future CCS projects.
We present a solution to the binary classification problem of detecting CO2 seeps in the marine environment, using the Scottish Goldeneye area as the case study.
Classifying time series requires data for all the classes, i.e., time series of the variables in question under both no-leak and leak conditions. The no-leak situation represents natural environmental statistics, i.e., the environmental baseline, and should be based on in situ measurements, preferably supplemented with model simulations [40]. Time series for the seep situations must rely on process modeling, simulating the different processes involved during a seep [41], preferably supported by in situ and laboratory experiments [45]. The use of model data is in this case a necessity, since data corresponding to the leak scenarios are difficult, expensive, and, in some cases, impossible to obtain.
Moreover, the trained deep neural network can be used in a transfer-learning setting [46], i.e., we can pre-train a neural network on the model data, fix the parameter weights in the first few layers, and then train the network on the limited in situ measurement data. The conjecture is that the first few layers adequately represent the core feature characteristics of the time series, thus reducing the need for in situ data.
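A minimal sketch of this idea is given below, assuming TensorFlow/Keras and a model already pre-trained on the simulated data; the number of frozen layers and the training settings are illustrative assumptions.

```python
import tensorflow as tf

def fine_tune(pretrained: tf.keras.Model, x_insitu, y_insitu,
              n_frozen: int = 2) -> tf.keras.Model:
    """Freeze the first n_frozen layers and re-train the rest on in situ data."""
    for layer in pretrained.layers[:n_frozen]:
        layer.trainable = False          # keep early feature extractors fixed
    pretrained.compile(optimizer="adam", loss="binary_crossentropy",
                       metrics=["accuracy"])
    pretrained.fit(x_insitu, y_insitu, epochs=10, batch_size=32)
    return pretrained
```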
Here we use model data from the Goldeneye area, in the Scottish sector of the North Sea, obtained through the STEMM-CCS project (http://www.stemm-ccs.eu). The simulations are performed with the Finite-Volume Coastal Ocean Model (FVCOM) [47] coupled with the European Regional Seas Ecosystem Model (ERSEM) [48]. Different scenarios have been created in a statistically sound manner to train the BCNN. Deep learning can be used to build surrogate models that extend the results of PDE simulators; see, e.g., Hähnel et al. [49] and Ruthotto and Haber [50]. Here, instead, we simulate data with a computational fluid dynamics model and use neural networks to estimate model parameters.
This manuscript is outlined in the following manner: In Section 2, we present the underlying framework for Monte Carlo dropout (MC dropout) and BCNNs in a binary TSC context: a general deep-learning framework, Bayesian Neural Networks, how the stochastic regularization technique MC dropout can produce predictive distributions, Bayesian decision theory, and an algorithm for decision support in environmental monitoring under uncertainty. In Section 3, we apply MC dropout to the case study at the Goldeneye area; here we describe the data and how they are pre-processed, and present the model, architecture, and hyper-parameter settings. We use the output of the classifier in a Bayesian decision rule setting with varying cost functions and demonstrate our proposed algorithm. Section 4 summarizes our findings, compares our approach with the relevant literature, discusses strengths and weaknesses, and proposes potential extensions and further work.
We have presented a three-step algorithm that combines BCNNs with Bayesian decision theory to support environmental monitoring. In the first step, a BCNN is optimized on labeled simulated data. In the next step, the BCNN is used to generate a posterior predictive distribution of class labels. In the final step, the optimal monitoring strategy is calculated based on the posterior predictive distribution and the given operational costs.
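A minimal sketch of the final decision step is given below, assuming MC-dropout samples of P(leak) are available from the second step; the two actions and their cost values are illustrative placeholders, not the cost functions used in Section 3.7.

```python
import numpy as np

def optimal_action(p_leak_samples: np.ndarray,
                   cost_false_alarm: float = 1.0,
                   cost_missed_leak: float = 10.0) -> str:
    """Pick the action that minimizes expected cost under the posterior
    predictive distribution."""
    p = float(p_leak_samples.mean())                # posterior predictive P(leak)
    exp_cost_survey = (1.0 - p) * cost_false_alarm  # mobilize, but no leak
    exp_cost_wait = p * cost_missed_leak            # keep monitoring, but leak
    return "mobilize survey" if exp_cost_survey < exp_cost_wait else "continue monitoring"
```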
Our case study indicates that MC dropout and BCNNs are effective tools for detecting CO2 seeps in marine waters. The overall predictive power is large, as seen from the ROC curve in Figure 4. While the majority of the time series are classified as either leak or no-leak with little uncertainty, a small proportion will be inconclusive; see Figure 5. As expected, the classification uncertainties increase with the distance to the leak source location. This distance depends on the seep flux rates and can be modeled from the outcome of the BCNN; see Figure 6.
Demanding a high detection probability reduces the detectable area of a monitoring location, i.e., one needs to measure closer to a leak in order to achieve the desired accuracy; see Figure 10. This motivates the use of optimal decision strategies, that is, finding the appropriate action to initiate based on the prediction, its uncertainty, and the costs associated with taking right and wrong decisions. We gave an example of an optimal strategy choice in a binary classification setting in Section 3.7 for different costs, showcasing the algorithm.
There is extensive literature, including several deep-learning approaches, on the monitoring and forecasting of air-quality time series. For example, Biancofiore et al. [35] used recursive and feed-forward neural networks, and Freeman et al. [36] used an RNN, to forecast and monitor air quality from time series data. In the context of oceanographic monitoring, Bezdek et al. [67] used hyper-ellipsoidal models for detecting anomalies in a wireless sensor network and validated their approach on data from the Great Barrier Reef. None of these approaches take uncertainty into account. Ahmad [68] recently published a literature review of machine-learning applications in oceanography.
Since CO2 is naturally present in the ocean and is highly variable, it is difficult to attribute measured CO2 to an unplanned release. Blackford et al. [39] developed a pH anomaly criterion for the determination of marine CO2 leaks. This criterion is, however, location dependent and cannot be generalized. In contrast to Blackford et al., our method automatically extracts features and characteristics that could be hidden within the natural variability, potentially increasing the detection probability. Since confirming a leakage triggers a costly survey, information about the model’s uncertainty is important, and our approach provides exactly that.
The most important strength of our approach is that it is principled and consistently uses a probabilistic Bayesian framework for effective decision-making in the context of environmental monitoring. A potential weakness of our case study is that we did not systematically optimize the hyper-parameters, which could potentially improve the results. Examples are increasing or decreasing the number of CNN layers and changing the activation functions, loss functions, optimization procedures, and the filter sizes, strides, and kernel sizes of the convolutional layers. With respect to the loss function and activation function, we have chosen the standard techniques used in the machine-learning community today and have not challenged these norms. With respect to the filter sizes, strides, and kernel sizes of the convolutional layers, we have only performed a small trial-and-error search over the possible hyper-parameters. Inspection of the model’s weights shows that quite a large part of the network is active, suggesting that the model structure and size are relatively well balanced. We have not found it worthwhile to tune the model further, as we have achieved good results with the current hyper-parameter configuration.
Another, more serious, limitation is the fact that the BCNN is optimized with data based on a limited number of simulations over a limited time period. This biases the BCNN model towards the simulated conditions. The model can still be used in a general scenario, but its predictive power will decrease. This is why more simulations with different forcings, leak locations, and fluxes are necessary to generate a more robust predictive model.
To test how well our BCNN model generalizes, we carried out sensitivity analyses in Section 3.6. The general observation was that adding noise to the test data increased the uncertainty and decreased the predictive accuracy. Even with a low level of noise, the predictions deteriorate, mainly because no-leak time series are mis-classified as leaks (false positives). However, the model is still able to distinguish the two classes with relatively good success. Corruption of deep-learning models, even by small alterations in the input data, is a well-known problem. This problem might be alleviated by training the model on noisy data.
A second sensitivity analysis showed that removing the 300T data set from training reduced the overall predictive power in all cases; however, the method was still able to predict the 300T test data set with high confidence and accuracy. This indicates at least good generalization to different leakage fluxes, and potentially to different leakage locations.
In this study, univariate time series were used; however, if more information were available, in terms of different sensors measuring different geochemical markers, this framework could be extended to a multivariate TSC setting. Another extension could be to increase the number of classes and treat the problem as a multi-class classification task, with the purpose of assigning time series to one of the following classes: 0T, 30T, 300T, or 3000T.
Another question we would like to study is how well our model generalizes to different locations. One of the key strengths of CNNs is their ability to extract important features from the data, here the capability to capture the key characteristics of time series that do or do not contain a leak signal. In this sense, the model should generalize well to other locations. Testing this hypothesis would be an important step towards widespread use of our method.
However, in our view, the most interesting extension would be to place the framework in a transfer-learning setting. The concept of transfer learning is to take a pre-trained model, fix the weights of its first few layers, and re-train it on an entirely new data set. Transfer learning has been used with success in computer vision and other tasks, and its key benefit is that the need for data is reduced drastically. The translation of this concept to binary TSC, where the pre-trained model would be trained on simulation data, should be investigated more closely.