Image-Based River Water Level Estimation for Redundancy Information Using Deep Neural Network

Abstract: Monitoring and management of water levels has become an essential task in hydroelectric power generation. Activities such as water resources planning, supply basin management and flood forecasting depend on this monitoring. Measurements performed by sensors installed at river facilities are used to obtain precise water level information. Since weather conditions influence the results obtained by these sensors, redundant approaches are necessary to maintain the high accuracy of the measured values. A staff gauge monitored by conventional cameras is a common redundancy method for keeping track of the measurements. However, this method has low accuracy and is not reliable, since it is monitored by human eyes. This work proposes to automate this process by using image processing of the staff gauge to measure the water level and deep neural networks to estimate it. To that end, three neural network models were compared: a residual network (ResNet50), MobileNetV2 and a proposed convolutional neural network (CNN) model. The results showed that ResNet50 and MobileNetV2 perform worse than the proposed CNN.


Introduction
Image-based water level estimation corresponds to a visual-sensing technique that uses an imaging process to automatically inspect readings of the water-line instead of the human eye [1]. Monitoring the water level has become an essential task for regulatory control of rivers in order to manage disaster risk assessment, flood warnings, water resources planning, public and industrial supply [2]. In hydro-power energy production, it is essential to monitor the rainfall, inflows and water level in order to maximize energy revenue, while taking into account dam safety risks [3].
Different redundant methods are used to guarantee the availability and accuracy of measurements. Automatic water-level gauges monitor the water level with sensors (i.e., float-type, pressure-type, ultrasonic-type and radar-type gauges) [4][5][6]. Moreover, video surveillance has become widely used as a redundancy system for monitoring and measurement at hydro-power stations [7]. The problem with this method is that human eyes are unreliable and subject to errors, which compromises the security of the system. Therefore, defining an accurate and reliable method to monitor the water level is a challenge for hydro-power system control. Estimation models using deep neural networks provide an ideal solution for monitoring the water level and stream flow in hydroelectric power plants [8].
A deep neural network is a set of algorithms inspired by the functioning of the human brain. Each unit of the network acts as a neuron, and the network is designed to recognize patterns and to group and analyze real-world data such as images, sounds, texts, and time series. Deep neural networks have applications in areas such as image classification [9], segmentation and object detection [10]. Recent advances in deep learning techniques show good performance in fine-grained image classification, which distinguishes subordinate-level categories [11].
A convolutional neural network (CNN) is a class of feed-forward deep neural network that uses a variation of the multilayer perceptron designed to require as little pre-processing as possible. There are many variants of CNN architectures, but their basic components are very similar [12]. LeNet-5, for example, consists of three types of layers: convolutional, pooling, and fully-connected layers [13]. Improvements in CNN learning methodology and architecture continue to make CNNs scalable to large, heterogeneous, complex, and multiclass problems. New features in CNNs include modification of processing units, new approaches to parameter and hyper-parameter optimization, and connectivity of layers [14].
CNNs have grown in number of layers, from AlexNet and the Visual Geometry Group network (VGG) [15] through Inception to residual networks, showing that deep networks achieve better results in image segmentation and classification than other techniques [16]. However, training a deep CNN is difficult due to the phenomena of exploding/vanishing gradients and degradation. Nowadays, various techniques are being proposed for training deeper networks, such as initialization strategies, better optimizers, skip connections, knowledge transfer and layer-wise training [17][18][19].
Residual neural networks (ResNet) are a continuation of deep networks [20]. ResNet substantially changed CNN architecture by introducing the concept of residual learning and defining an efficient methodology for training deep networks [21]. ResNet achieved state-of-the-art results in the classification task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015 [22]. ResNet uses identity shortcut connections that enable the flow of information across layers without the attenuation that would be caused by multiple stacked non-linear transformations, resulting in improved optimization [23]. These residual connections skip layers and are implemented as double- or triple-layer jumps containing non-linearities, with batch normalization between them [24]. Such residual connections, followed by normalization layers, enable ResNet to mitigate the exploding/vanishing gradient problem. ResNet has achieved great success in the image processing field in the last five years [25].
MobileNet is a simplification of the standard CNN for real-time applications such as image classification, captioning, object detection and semantic segmentation [26]. The MobileNet architecture is based on depthwise convolutions, which require only about one-eighth of the computation cost of a standard CNN [27]. MobileNetV2 is very similar to MobileNet, except that it uses inverted residual blocks with bottlenecking features. MobileNetV2 has fewer parameters than the original MobileNet. It is a general architecture and can be used for multiple use cases. Depending on the use case, it can use different input layer sizes and different width factors [28]. This allows models of different widths to reduce the number of multiply-adds and thereby reduce inference cost on mobile devices. The MobileNetV2 architecture is based on an inverted residual structure in which shortcut connections are placed between thin bottleneck layers [29]. It is important to remove non-linearities in narrow layers in order to maintain representational power and improve performance. This results in a very memory-efficient inference model. MobileNetV2 improves the performance of mobile models on multiple tasks. It uses a factorized version of the convolutional operator that splits convolution into two separate layers [30].
In the literature, some works address the identification of water levels through neural network models. Wang et al. [31] proposed a deep CNN, based on the multidimensional densely connected CNN, for identifying water in the Poyang Lake area. Han et al. [32] combined the max-RGB method and the shades-of-gray method to enhance underwater vision. A CNN is used to solve the weak illumination problem in underwater images by training a mapping relationship to obtain an illumination map. After image processing, a deep CNN method is proposed to perform underwater detection and classification according to the characteristics of underwater vision. Song et al. [33] exploit the self-learning ability of deep learning in a modified Mask R-CNN structure (an extension of Faster R-CNN), which integrates bottom-up and top-down processes for water recognition. Gan and Zailah [34] propose integrating a water level classification model, based on artificial intelligence technology and CNN, into a flood monitoring system.
This study proposes an automatic detection approach that can be used as redundancy for sensor techniques, in order to ensure a secure water level monitoring system. Image analysis techniques and deep neural networks are used to automatically measure and estimate the water level from conventional camera images of the staff gauge. In this context, this article proposes a CNN model and compares it with two established models (ResNet50 and MobileNetV2) in order to corroborate that it is suitable for water level estimation.

Case Study
The Madeira River is the largest tributary of the Amazon River in South America. The river has a power generation capacity of 3350 MW and its flows can reach 60,000 m³/s, enough to supply approximately ten million homes. The Jirau Hydroelectric Plant is one of two power stations installed on the Madeira River, located in the state of Rondônia, Brazil (Figure 1). It is composed of 50 generating units and is managed by the Consortium Energia Sustentável do Brasil (ESBR), which provided all data for this study. The continuous monitoring of inflows and water level is an essential tool for hydropower dam operators, providing real-time data for decision making in power generation and planning. The water level, measured in meters relative to mean sea level, requires maximum accuracy and must be efficiently available to hydropower control systems and operators. The water level at the Jirau plant reservoir ranges from 82.5 m to 90 m, and its monitoring system consists of sensors that measure the water level at the dam and send data to the control room. Moreover, the water level is monitored through real-time videos of the staff gauge displayed in the control room, where expert operating staff compare the sensor data with the level read on the staff gauge.
For this study, real-time videos of the staff gauges were recorded between 31 May and 6 June 2020. The dataset consists of thirty-five real-time videos of staff gauges under different conditions, i.e., days, nights, and weather conditions. The lowest level registered in the dataset is 86.35 m and the highest is 88.89 m, which covers an amplitude of 31.87% of the water level, with a uniformity distribution of 0.2995. The measurement errors/uncertainties when using the conventional staff gauge are at most 0.6%. The aim of this paper is to estimate the water level of a river at a hydro-power plant. The videos were recorded and provided by the Jirau Hydroelectric Plant specifically for this case study.

Methodology
The images from the real-time videos are preprocessed to determine measurements of the river level, and a deep neural network model is then used to automatically estimate the water level. The videos are split into frames, yielding 3364 images in total, and the water level is measured for each image. Initially, images are preprocessed to remove the date and time overlay and prepared for extraction of the region of interest (ROI) [35]. Once the water level is detected by image analysis, each image is labeled with its level and the images are divided into two sets: 2706 images for the training set and 658 for the testing set. Thereafter, deep neural networks are trained to estimate the water level.

Image Processing
Digital image processing is applied to remove noise for better visualization and to extract the ROI of the staff gauge. For the provided dataset, the camera angle is not suitable for ROI extraction.
The captured images show that the staff gauge is not positioned upright, which affects the correct detection of the water level in the final analysis. The original staff gauge image, which is binarized to extract the ROI, is shown in Figure 2. To improve image quality, a vertical shearing filter was applied to straighten the image [36]. Next, non-uniform illumination was corrected by applying image segmentation. A morphological opening filter was applied for noise reduction and a gamma filter to enhance contrast (Figure 3). To extract the ROI, the enhanced image was binarized, and the borders of the ROI were detected and cropped from the image. The resulting region of interest is depicted in Figure 4. After extracting the ROI, water-level management was performed in order to identify the water level measured in each image.
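The preprocessing chain described above can be illustrated with a minimal, pure-Python sketch on a tiny grayscale image stored as nested lists. It is only an illustration under stated assumptions: the source does not name the software, shear factor, gamma value or threshold, so the function names and the values `-1`, `0.5` and `128` below are hypothetical, and the illumination correction and morphological opening steps are omitted for brevity.

```python
def vertical_shear(img, shear, fill=0):
    """Shift column x vertically by round(shear * x) pixels."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for x in range(w):
        dy = round(shear * x)
        for y in range(h):
            src = y - dy
            if 0 <= src < h:
                out[y][x] = img[src][x]
    return out


def gamma_correct(img, gamma):
    """Apply out = 255 * (in / 255) ** gamma to every 8-bit pixel."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in img]


def binarize(img, threshold):
    """Return a 0/1 mask: 1 where the pixel is at or above the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in img]


def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(img, box):
    """Cut the inclusive bounding box out of the image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]


# Toy 4 x 4 frame with a bright, tilted "gauge" (hypothetical pixel values).
tilted = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
]
straight = vertical_shear(tilted, shear=-1)       # straighten the gauge
enhanced = gamma_correct(straight, gamma=0.5)     # enhance contrast
mask = binarize(enhanced, threshold=128)          # binarize
roi = crop(straight, bounding_box(mask))          # extract the ROI
```

In this toy case the shear brings the two bright pixels onto the same row, so the cropped ROI contains only the gauge pixels. A production pipeline would more likely rely on an image processing library and would need the omitted filtering steps.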

Water Level Management
The staff gauge contains count marks, each indicating the water level reached at that mark. To measure the water level by image analysis of the ROI, it is necessary to define a window size for these count marks, in pixel coordinates, and to obtain the number of counts present on the staff gauge. In addition, the size difference between the beginning of the staff gauge and the river's surface line is also defined. Considering a fixed mark, in meters, the number of counts and the defined surface line, it is possible to obtain a relationship between them according to Equation (1), where l is the water level of the river, r is the fixed mark, c is the number of counts on the staff gauge, d is the difference between the water surface line and the beginning of the staff gauge, and s is the size of each count mark in pixel coordinates. To ensure results are in centimeters, l is multiplied by 0.1. The final result shows the preprocessed image and the detected water level measurement (Figure 5). Once the images were processed, deep neural networks were trained and tested on the dataset in order to estimate the level of the river for hydro-power control.

Residual Neural Network Model
ResNets are extensively used in computer vision tasks [21]. ResNet consists of a residual learning framework that eases the training of deep networks, with an architecture composed of residual blocks. A deep network is composed of many nonlinear functions in which dependencies between layers can be highly complex, making gradient computations unstable [21]. To circumvent this problem, ResNet introduces identity skip connections that bypass residual layers, allowing information to pass directly to any subsequent layer. Therefore, instead of fitting the desired underlying mapping, the layers fit a residual mapping.
Considering an input x and an underlying mapping H(x) to be fit by stacked layers, the residual mapping is defined as
$$F(x) = H(x) - x. \tag{2}$$
The unit residual structure is represented by
$$\hat{x} = \sigma\left(F(x, W) + H(x)\right), \tag{3}$$
where $\hat{x}$ is the output of the residual unit, H(x) is an identity mapping, W is the set of weights, σ is the activation function of the nodes (ReLU) and F(x, W) is the residual mapping to be learned [21,37]. The ResNet used in this study is ResNet50, which consists of 50 layers, composed of a convolutional layer and residual blocks. The preprocessed images of the dataset are used as input. Each image is resized in the model to 224 rows and 224 columns. Thereafter, a convolutional layer with a 7 × 7 filter is applied, followed by residual blocks containing a series of residual layers with 3 × 3 filters. The last layer is connected to a fully connected layer followed by regression layers [21].
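The idea of a residual unit can be made concrete with a small sketch. The code below is not the ResNet50 implementation used in the study. It assumes, for illustration only, that the residual mapping F is two small fully connected layers with a ReLU between them and that the shortcut is the identity, whereas a real ResNet block uses convolutions and batch normalization. The names `linear`, `relu` and `residual_unit` are hypothetical.

```python
def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]


def linear(v, weights):
    """Multiply a weight matrix (given as a list of rows) by the vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_unit(x, w1, w2):
    """Return relu(F(x, W) + x), where F(x, W) = w2 * relu(w1 * x)."""
    f = linear(relu(linear(x, w1)), w2)               # residual mapping F(x, W)
    return relu([fi + xi for fi, xi in zip(f, x)])    # add the identity shortcut


# With all-zero weights the residual mapping vanishes, so the unit
# simply passes relu(x) through the shortcut connection.
zeros = [[0.0] * 3 for _ in range(3)]
passthrough = residual_unit([1.0, -2.0, 3.0], zeros, zeros)
```

The zero-weight case shows why such units are easy to optimize: when the stacked layers contribute nothing, the input still reaches the next unit unchanged (up to the activation), so the layers only need to learn a correction to the identity.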

MobileNetV2 Model
MobileNetV2 is a neural network designed to optimize memory consumption and to run at a low computational cost [38]. The MobileNetV2 architecture is built on the concepts of depthwise separable convolutions and an inverted residual structure, similar to residual blocks [39,40]. As described in [41], depthwise separable convolutions aim to replace a full convolutional operator with two separate layers. The first layer is a depthwise convolution, which performs a separate convolution on each feature map. The second layer is a 1 × 1 convolution, named pointwise, which computes linear combinations of the feature maps. In a regular convolutional operator, an image is processed through the height, width and channel dimensions at the same time. Separable convolutions, however, process an image by height and width in the first layer and by channel dimension in the second layer [42].
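According to [30], the computational cost of a regular convolutional operator is calculated by
$$C_{\mathrm{regular}} = h_i \cdot w_i \cdot d_i \cdot d_j \cdot k \cdot k, \tag{4}$$
and for a separable convolution the formula is
$$C_{\mathrm{separable}} = h_i \cdot w_i \cdot d_i \cdot (k^2 + d_j), \tag{5}$$
where $C_{\mathrm{regular}}$ is the cost of a regular convolutional operator, $C_{\mathrm{separable}}$ is the cost of a separable convolution, i is the input layer index, j is the output layer index, and h and w represent the height and width of the feature maps, respectively. The terms $d_i$ and $d_j$ are, respectively, the numbers of input and output feature maps, and k is the filter size. Therefore, the computational advantage of using separable convolutions is quantified by Equation (6):
$$\frac{C_{\mathrm{regular}}}{C_{\mathrm{separable}}} = \frac{k^2 \cdot d_j}{k^2 + d_j}. \tag{6}$$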
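As a worked example of this cost comparison, the following sketch evaluates the multiply-add counts given in [30] for a regular convolution, h·w·d_i·d_j·k·k, and for a depthwise separable convolution, h·w·d_i·(k² + d_j). The layer dimensions used here (a 112 × 112 feature map, 32 input and 64 output feature maps, 3 × 3 filter) are hypothetical values chosen only for illustration and are not taken from the study.

```python
def regular_conv_cost(h, w, d_in, d_out, k):
    """Multiply-adds of a regular k x k convolution on an h x w feature map."""
    return h * w * d_in * d_out * k * k


def separable_conv_cost(h, w, d_in, d_out, k):
    """Multiply-adds of a depthwise (k x k) plus pointwise (1 x 1) convolution."""
    return h * w * d_in * (k * k + d_out)


def cost_ratio(d_out, k):
    """Ratio regular / separable; h, w and d_in cancel out."""
    return (k * k * d_out) / (k * k + d_out)


# Hypothetical layer: 112 x 112 map, 32 -> 64 feature maps, 3 x 3 filter.
regular = regular_conv_cost(112, 112, 32, 64, 3)      # 231,211,008
separable = separable_conv_cost(112, 112, 32, 64, 3)  # 29,302,784
speedup = cost_ratio(64, 3)                           # 576 / 73, about 7.9
```

For 3 × 3 filters the ratio approaches k² = 9 as the number of output feature maps grows, which is in line with the roughly one-eighth computation cost quoted for depthwise convolutions in the Introduction.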

Evaluation Criteria
The deep neural network methods were evaluated by three different criteria: root mean square error (RMSE), expressed by Equation (7), mean absolute error (MAE) in Equation (8) and determination coefficient ($R^2$) in Equation (9):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \tag{7}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|, \tag{8}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}, \tag{9}$$
where N is the number of samples, $\hat{y}_i$ is the estimated value, $y_i$ is the measured value and $\bar{y}$ is the average of the measured values of the water level. These criteria are used to measure the accuracy of the estimation model for training and testing. The RMSE and MAE measure the magnitude of errors in meters. A lower value for RMSE is better than a higher one. The MAE is the average absolute difference over the dataset between the estimates and the water level measurements from image analysis. $R^2$ relates the variance explained by the estimation to the total variance of the data. All evaluation results, accuracy and errors of each model tested are individually presented in Section 5.
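The three criteria can be computed directly from paired lists of measured and estimated levels. The sketch below uses the standard definitions of RMSE, MAE and the coefficient of determination. The function names and the three sample levels are hypothetical and serve only to show the arithmetic.

```python
import math


def rmse(measured, estimated):
    """Root mean square error between measured and estimated values."""
    n = len(measured)
    return math.sqrt(sum((y - e) ** 2 for y, e in zip(measured, estimated)) / n)


def mae(measured, estimated):
    """Mean absolute error between measured and estimated values."""
    n = len(measured)
    return sum(abs(y - e) for y, e in zip(measured, estimated)) / n


def r_squared(measured, estimated):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(measured) / len(measured)
    ss_res = sum((y - e) ** 2 for y, e in zip(measured, estimated))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical water levels in meters (not data from the study).
measured = [86.0, 87.0, 88.0]
estimated = [86.0, 87.5, 88.5]
errors = (rmse(measured, estimated),       # sqrt(0.5 / 3), about 0.408 m
          mae(measured, estimated),        # 1 / 3, about 0.333 m
          r_squared(measured, estimated))  # 1 - 0.5 / 2 = 0.75
```

Because RMSE squares the errors before averaging, it penalizes isolated large deviations more strongly than MAE does, which matters when a few noisy frames produce outlying estimates.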

Convolutional Neural Networks: Proposed Method
The proposed CNN model is composed of 19 layers, in which the first layer is responsible for normalizing the image dimensions in order to guarantee an output of 224 rows and columns and three channels (RGB). The next five layers contain sets of filters that perform convolutional operations, each feeding its subsequent layer. Each filter in a set produces an activation map, and these maps are stacked along the depth dimension to produce the output volume. A kernel of size 3 × 3 was used for all layers. The CNN architecture is summarized in Figure 6. The model is empirically calibrated with an initial learning rate of 0.001 for 350 epochs, chosen for faster training and better accuracy compared with a renowned model [43]. To avoid a fast reduction of the matrix dimensions, zero padding is applied at each convolutional layer. Each convolutional layer is followed by a non-linearity layer, which applies a nonlinear activation function that takes a single number and performs a fixed mathematical operation on it, complementing the convolutional operator [44].
The pooling layer is responsible for reducing the spatial size, the number of parameters and the computation in the network. Average pooling was used to reduce the activation maps. In addition, a dropout layer with a rate of 20% was applied to further reduce the data passed forward. The fully connected layer combines the features extracted by all convolutional layers [43]. To perform estimations, a regression layer is added at the end of the CNN architecture. The architecture in Figure 6 comprises six convolutional layers, with their respective filters between brackets, each followed by batch normalization, ReLU and an average pooling layer, then one dropout layer, a fully connected layer and a regression layer that makes the estimations.
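The effect of padding and pooling on the spatial size of the activation maps can be traced with the standard output-size formula, floor((n + 2p − k) / s) + 1. The sketch below assumes six convolution blocks with 3 × 3 kernels, each followed by pooling, as described above. The pooling window (2 × 2), the pooling stride (2) and the one-pixel zero padding are assumptions, since the source does not state them, and the function names are hypothetical.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def pool_output_size(n, window, stride):
    """Spatial size after pooling without padding."""
    return (n - window) // stride + 1


def trace_sizes(n, num_blocks, kernel=3, padding=1, pool=2):
    """Return the spatial size at the input and after each conv + pool block."""
    sizes = [n]
    for _ in range(num_blocks):
        n = conv_output_size(n, kernel, padding)
        n = pool_output_size(n, pool, pool)
        sizes.append(n)
    return sizes


# Assumed 2 x 2 pooling with stride 2 after each of six 3 x 3 convolutions.
with_padding = trace_sizes(224, 6, padding=1)     # [224, 112, 56, 28, 14, 7, 3]
without_padding = trace_sizes(224, 6, padding=0)  # [224, 111, 54, 26, 12, 5, 1]
```

Under these assumptions, padding leaves each convolution's output the same size as its input, so only the pooling layers shrink the activation maps. Without padding, the maps collapse to a single value by the sixth block.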

Results
Three different deep neural networks were tested to identify the model that best fits the problem: a proposed CNN, ResNet50 [21] and MobileNetV2 [30]. They were evaluated by training and testing on images under the same parameter calibration (i.e., initial learning rate, epochs, bias, validation frequency, etc.). Table 1 shows the results of RMSE, MAE and R² for the training and test datasets. Comparing the coefficient of determination, ResNet50 presented 0.7808 for the training dataset and 0.7692 for the testing dataset. MobileNetV2 presented 0.7803 for the training dataset and 0.7612 for the testing dataset. Finally, the coefficient of determination of the proposed model was 0.9004 and 0.8868 for the training and test datasets, respectively. The difference in the other criteria (RMSE and MAE) is also evident in Table 1, which indicates that the proposed model obtained the best result in all criteria compared with the ResNet50 and MobileNetV2 networks.
Figure 7 compares the estimations obtained with ResNet50, MobileNetV2 and the proposed CNN model, plotting the values from image analysis (Expected) against the model outputs (Predicted) under the same configuration. Due to noise in some images caused by acquisition and/or weather conditions (i.e., rain), the predicted level was not precise, since image analysis could not compensate for this error. These situations appear as spikes in the plots. Figure 7a,b compares results for the training and test sets of ResNet50, respectively. For this model, both results are similar and remain within a narrow range without much variation. Since the ResNet50 architecture is less susceptible to error, the predicted level shows fewer spikes than the other models. However, the model underestimates the level relative to the expected level more than the other models under the same configuration. This result corroborates the evaluation presented in Table 1.
Figure 7c,d shows the values for the MobileNetV2 training and test sets, respectively. In this case, there is considerable variation between predicted and expected levels compared to the ResNet50 model. Moreover, because its architecture is more susceptible to error, the predicted level presents more spikes in the plots. However, the predicted level for MobileNetV2 stayed within the range of the expected level, which also corroborates the results of the evaluation criteria.
Figure 7e,f shows the values of the proposed CNN model for the training and test sets. Despite some spikes in the predicted level, the susceptibility of this model to error is lower than that of MobileNetV2 but still higher than that of ResNet50. However, this model presented better agreement between the predicted and expected levels for both training and test sets compared to the previous models under the same configuration, which also confirms the results in Table 1.

Conclusions and Discussion
This work proposes water level detection using images and a deep neural network method to automatically estimate the level of a river at a hydro-power plant. The image dataset was preprocessed from videos of staff gauge measurements provided by the Jirau Hydroelectric Power Plant. Digital image processing consisted of applying filters (i.e., vertical shearing filter, image segmentation, morphological opening filter and gamma filter) to enhance the images in order to identify a region of interest. Moreover, to identify the water measurement from the images, we defined a window size for the count marks, in pixel coordinates, and obtained the number of counts present on the staff gauge. This strategy of identifying the water measurement from images by correlating the level of the river, the fixed marks and the number of counts on the staff gauge provided a qualified dataset for training and testing deep neural networks.
As the deep neural network, a CNN model was proposed to detect the water level from preprocessed images, using training and test datasets. The estimation capacity of the proposed model was tested in terms of RMSE, MAE and R², resulting in low errors for the training and test sets. The determination coefficient of the proposed model reached 0.9004 for the training images and 0.8868 for the testing images. Moreover, the MobileNetV2 neural network and the ResNet50 residual network were implemented, using their standard parameters, for comparison with the proposed model. To that end, the same dataset, configuration and test criteria were used for each compared model. However, this methodology has some limitations. The dataset is limited to a short recording period and to a specific staff gauge, since it was a new procedure at the Jirau Power Plant. Therefore, extending the data with measurements over a wider range of water levels and across different seasons would increase the accuracy of the models. Moreover, noise in the images caused by weather conditions and/or acquisition also affects the efficiency of the models that are more susceptible to errors. Therefore, fine-tuning the parameters of the deep neural networks (i.e., bias, learning rate, validation frequency, etc.) can enhance not only the proposed CNN model but also the other tested models, in order to provide a more precise estimation of the water level.
Therefore, it can be concluded that the CNN-based strategy is a promising approach for water level detection on the Madeira River, and can contribute to data integrity and security by providing redundant information through another acquisition paradigm. This safety is essential in efficiency studies of the Jirau Hydroelectric Power Plant and in energy production, as well as in hydropower control and management.
As future work, these limitations can be addressed by extending the dataset with new images from different dates, water levels and seasons, providing better training and test sets for the model. Furthermore, different calibrations and configurations of the deep neural networks can be explored in order to adapt the model to the problem.

Conflicts of Interest:
The authors declare no conflict of interest.