# Research on Water-Level Recognition Method Based on Image Processing and Convolutional Neural Networks

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study Region and Data Acquisition

^{2} and an elevation range between 2960 and 4820 m. The Hulu watershed has a continental climate: the average annual temperature is −0.3 °C and the average annual precipitation is 599.8 mm [22]. The average annual runoff was 10,035,203.71 m$^{3}$ from 2010 to 2015. Precipitation in this region is frequent, and 89% of it falls during the wet season from May to September [23,24].

#### 2.2. Processing Flow of the Water Level Image Recognition System

#### 2.3. Gray Level Transformation

#### 2.3.1. Graying

#### 2.3.2. Binarization

1. Traverse all the pixels of the image and compute the histogram of the gray-level distribution.
2. Normalize the histogram, letting $p\left(i\right)$ be the ratio of the number of pixels with gray value $i$ to the total number of pixels.
3. Assuming that the current threshold is $t$, use the normalized histogram to calculate the target pixel ratio ${\omega}_{0}$ and the background pixel ratio ${\omega}_{1}$ under the current division, as well as the average gray level of the target area ${\mu}_{0}$ and of the background area ${\mu}_{1}$.
4. Making the intra-class variance the smallest and the inter-class variance the largest is equivalent to maximizing $g\left(t\right)={\omega}_{0}\left(t\right){\omega}_{1}\left(t\right){({\mu}_{0}\left(t\right)-{\mu}_{1}\left(t\right))}^{2}$. The OTSU method used in this paper maximizes the between-class variance:

   $${\sigma}_{B}^{2}={\omega}_{0}{\left({\mu}_{0}-{\mu}_{T}\right)}^{2}+{\omega}_{1}{\left({\mu}_{1}-{\mu}_{T}\right)}^{2}={\omega}_{0}{\omega}_{1}{\left({\mu}_{1}-{\mu}_{0}\right)}^{2}$$

   where ${\mu}_{T}={\omega}_{0}{\mu}_{0}+{\omega}_{1}{\mu}_{1}$ is the average gray level of the whole image.
5. Traverse all the values of $t$ from 0 to 255 to find the value that maximizes $g\left(t\right)$; this value is the global threshold of the image.
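The threshold search described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the study: the function name `otsu_threshold`, the flat list input, and the convention that pixels with gray value less than or equal to $t$ form class 0 are all assumptions made for the example.

```python
def otsu_threshold(pixels):
    """Return the t in 0..255 that maximizes g(t) = w0 * w1 * (mu0 - mu1)^2.

    `pixels` is a flat sequence of integer gray values in 0..255.
    Assumed convention: class 0 holds values <= t, class 1 holds values > t.
    """
    # Step 1: histogram of the gray-level distribution.
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)

    best_t, best_g = 0, -1.0
    for t in range(256):
        # Steps 2-3: class ratios and mean gray levels for this split.
        n0 = sum(hist[: t + 1])
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, g(t) is undefined here
        w0, w1 = n0 / total, n1 / total
        mu0 = sum(i * hist[i] for i in range(t + 1)) / n0
        mu1 = sum(i * hist[i] for i in range(t + 1, 256)) / n1
        # Step 4: between-class variance.
        g = w0 * w1 * (mu0 - mu1) ** 2
        # Step 5: keep the first t that gives the largest g.
        if g > best_g:
            best_t, best_g = t, g
    return best_t


if __name__ == "__main__":
    sample = [10] * 50 + [200] * 50  # two well-separated gray levels
    t = otsu_threshold(sample)
    print("threshold:", t)
    print("binary:", [1 if v > t else 0 for v in sample][:3], "...")
```

For a strongly bimodal image every threshold between the two modes gives the same $g\left(t\right)$; this sketch returns the lowest such value, which is one of several reasonable tie-breaking choices.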

#### 2.4. Morphological Processing

#### 2.4.1. Dilation and Erosion

#### 2.5. Extraction of Regions of Interest

#### 2.5.1. Edge Detection

1. We convolve the image with a Gaussian filter to remove noise and smooth the image, which prevents false detections caused by noise. A convolution kernel of size 3 × 3 or 5 × 5 is commonly used. The following formula is the generating equation of the Gaussian filter kernel with a size of (2k + 1) × (2k + 1):

   $${H}_{ij}=\frac{1}{2\pi {\sigma}^{2}}\exp\left(-\frac{{\left(i-\left(k+1\right)\right)}^{2}+{\left(j-\left(k+1\right)\right)}^{2}}{2{\sigma}^{2}}\right);\quad 1\le i,j\le \left(2k+1\right)$$

   If a 3 × 3 window in the image is A and the pixel to be filtered is $e$, then after Gaussian filtering the brightness value of pixel $e$ is:

   $$e=\mathrm{H}\ast \mathrm{A}=\left[\begin{array}{ccc}{h}_{11}& {h}_{12}& {h}_{13}\\ {h}_{21}& {h}_{22}& {h}_{23}\\ {h}_{31}& {h}_{32}& {h}_{33}\end{array}\right]\ast \left[\begin{array}{ccc}a& b& c\\ d& e& f\\ g& h& i\end{array}\right]=\mathrm{sum}\left(\left[\begin{array}{ccc}a\times {h}_{11}& b\times {h}_{12}& c\times {h}_{13}\\ d\times {h}_{21}& e\times {h}_{22}& f\times {h}_{23}\\ g\times {h}_{31}& h\times {h}_{32}& i\times {h}_{33}\end{array}\right]\right)$$

2. The magnitude and direction of the gradient are calculated to estimate the edge strength and direction at each point. The magnitude is often approximated by the sum of absolute values:

   $$G\left(x,y\right)=\sqrt{{G}_{x}^{2}\left(x,y\right)+{G}_{y}^{2}\left(x,y\right)}\approx \left|{G}_{x}\right|+\left|{G}_{y}\right|$$

   $$\theta =\arctan\left({G}_{y}/{G}_{x}\right)$$

3. Non-maximum suppression. Non-maximum suppression is an edge-thinning technique that sets all gradient values other than the local maximum to 0. The gradient amplitude is suppressed along the gradient direction to eliminate the stray responses caused by edge detection. In essence, this operation further refines the results of the Sobel and Prewitt operators in order to meet the third criterion. For each pixel in the gradient image, the algorithm is:

   (1) Compare the gradient intensity of the current pixel with that of the two pixels along the positive and negative gradient direction (not the edge direction).

   (2) If the gradient intensity of the current pixel is the largest of the three, the pixel is kept as an edge point; otherwise, it is suppressed.

   Generally, for a more accurate calculation, linear interpolation between the two adjacent pixels across the gradient direction is used to obtain the gradient values to be compared.

4. Apply double-threshold detection to determine true and potential edges.

5. Finally, edge detection is completed by suppressing isolated weak edges (low-threshold points).
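Steps 1, 2 and 4 above can be illustrated on a single 3 × 3 window with standard-library Python. This is a sketch under stated assumptions, not the pipeline used in the study: the kernel is normalized so that its entries sum to 1 (the generating equation alone does not guarantee this for small kernels), Sobel masks are assumed for ${G}_{x}$ and ${G}_{y}$, `math.atan2` replaces $\arctan\left({G}_{y}/{G}_{x}\right)$ to avoid division by zero, and all function names and threshold values are hypothetical. Non-maximum suppression and the final suppression of isolated weak edges are omitted.

```python
import math


def gaussian_kernel(k, sigma):
    """Build a (2k+1) x (2k+1) Gaussian kernel, normalized to sum to 1."""
    size = 2 * k + 1
    # i and j run from 1 to 2k+1 as in the text; the center is (k+1, k+1).
    raw = [
        [
            math.exp(-((i - (k + 1)) ** 2 + (j - (k + 1)) ** 2) / (2 * sigma ** 2))
            / (2 * math.pi * sigma ** 2)
            for j in range(1, size + 1)
        ]
        for i in range(1, size + 1)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


def apply_kernel(window, kernel):
    """Sum of element-wise products of a window and a same-sized kernel."""
    return sum(
        a * h
        for window_row, kernel_row in zip(window, kernel)
        for a, h in zip(window_row, kernel_row)
    )


# Sobel masks (an assumed choice of gradient operator).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient(window):
    """Return (magnitude, direction in radians) at the window center."""
    gx = apply_kernel(window, SOBEL_X)
    gy = apply_kernel(window, SOBEL_Y)
    return math.hypot(gx, gy), math.atan2(gy, gx)


def classify_edge(magnitude, low, high):
    """Double-threshold labeling of a single gradient magnitude."""
    if magnitude >= high:
        return "strong"
    if magnitude >= low:
        return "weak"
    return "suppressed"


if __name__ == "__main__":
    kernel = gaussian_kernel(k=1, sigma=1.0)
    window = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]  # vertical step edge
    print("smoothed center:", apply_kernel(window, kernel))
    magnitude, theta = gradient(window)
    print("gradient:", magnitude, theta)
    print("label:", classify_edge(magnitude, low=100, high=200))
```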

#### 2.5.2. Contour Detection

#### 2.5.3. Tilt Correction

#### 2.6. Character Positioning and Segmentation

#### 2.7. Identification and Calculation of the Water Level Value

1. **Recognize characters and return coordinates:** The CNN classifies and recognizes the segmented digital characters. The largest character among all recognized characters is taken, and its position coordinates are returned.
2. **Count the number of scale lines:** A counter is set up that counts the scale lines downward from the coordinate position of the largest recognized numeric character. After a series of preprocessing operations, the pixels are traversed and the scale lines are counted using the pixel variation of the binary image.
3. **Calculate the value of water level:** The value of the largest numeric character identified in step 1 and the value of the counter in step 2 (the number of scale lines traversed) are combined to give the final water level value.
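The counting idea in step 2 can be sketched as a scan over one column of the binarized ruler image, counting background-to-foreground transitions below a starting row. The procedure actually used in the study is given as pseudo-code in Figure 21 and is not reproduced here, so the function name, the single-column input, and the encoding of scale-line pixels as 1 are assumptions made for illustration.

```python
def count_scale_lines(binary_column, start_row):
    """Count scale lines at or below `start_row` in one image column.

    `binary_column` is a top-to-bottom list of 0/1 pixel values, where 1 is
    assumed to mark a scale-line pixel. Each 0 -> 1 transition is counted as
    the top edge of one scale line.
    """
    count = 0
    previous = 0
    for value in binary_column[start_row:]:
        if value == 1 and previous == 0:
            count += 1  # entered a new run of foreground pixels
        previous = value
    return count


if __name__ == "__main__":
    column = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
    print(count_scale_lines(column, start_row=0))
    print(count_scale_lines(column, start_row=3))
```

In practice `start_row` would come from the position coordinates returned in step 1; scanning several columns and taking a majority count is one way to make the result less sensitive to noise.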

#### 2.7.1. Design of CNN

**Input Layer:** The number of nodes in the input layer of a CNN is determined by the dimension of the input vector. The binarized image dimension of the digital characters to be recognized in this research is $28\times 28$, so the number of nodes in the input layer is $784$.

**Convolution Layer**:

**Pooling Layer:**

**Dropout Layer:**

**Flatten Layer:** This layer flattens the input data, that is, it turns the multidimensional input into a one-dimensional vector. It is commonly used in the transition from the convolutional layers to the fully connected layers, and it does not affect the batch size.

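**Fully Connected Layer:** There are two fully connected layers. The Param value of a fully connected layer is the number of trainable weights in that layer, including one bias per neuron. Its calculation formula is as follows:

$$\mathrm{Param}=\left({n}_{in}+1\right)\times {n}_{out}$$

where ${n}_{in}$ is the number of inputs to the layer and ${n}_{out}$ is the number of neurons in the layer.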
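The Param column of the layer table can be checked with the usual parameter-count formulas for dense and 2D convolutional layers. The sketch below is an illustration only: the 3 × 3 kernel size and the 128 units of the first dense layer are not stated in the text and are inferred here from the reported Param values, and the function names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per neuron in a fully connected layer."""
    return (n_in + 1) * n_out


def conv2d_params(kernel_h, kernel_w, channels_in, filters):
    """Weights plus one bias per filter in a 2D convolutional layer."""
    return (kernel_h * kernel_w * channels_in + 1) * filters


if __name__ == "__main__":
    # Assumed 3 x 3 kernels; channel counts follow the Output Shape column.
    print(conv2d_params(3, 3, 3, 16))    # conv2d
    print(conv2d_params(3, 3, 16, 32))   # conv2d_1
    print(conv2d_params(3, 3, 32, 64))   # conv2d_2
    # Flatten gives 3 * 3 * 64 = 576 inputs; 128 hidden units are assumed.
    print(dense_params(576, 128))        # dense
    print(dense_params(128, 10))         # dense_1
```

Under these assumptions the five values come out as 448, 4640, 18,496, 73,856 and 1290, matching the table.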

**Output Layer:**

#### 2.7.2. Train the CNN

- Ten sets of image samples are selected, one for each printed numeric character 0–9. Each set contains 1016 binary images with a size of $128\times 128$, for a total of 10,160 numeric character images. We randomly assign 80% of this data set to the training set and 20% to the validation set.
- After 50 epochs of iterative training, the loss function converges and the recognition accuracy of the neural network on the validation set reaches 97–98%, as shown in Figure 20.
- The best training result is saved as an h5 model, which is evaluated and then called in the test phase.
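The 80/20 random split described above can be sketched as follows. The text does not say how the split was implemented, so the shuffle-then-cut approach, the fixed seed, and the function name `split_dataset` are assumptions made for the example.

```python
import random


def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of `samples` and split it into (train, validation)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Stand-in for the 10,160 character images: integer sample identifiers.
    sample_ids = list(range(10160))
    train, validation = split_dataset(sample_ids)
    print(len(train), len(validation))
```

With 10,160 images this gives 8128 training and 2032 validation samples. A per-class (stratified) split would keep exactly the same proportion of each digit in both sets, which a plain shuffle does not guarantee.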

#### 2.8. Extraction of Scale Line and Calculation of Water Level

## 3. Results

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
CNN | Convolutional Neural Networks |
ANN | Artificial Neural Networks |
CV | Computer Vision |
SVM | Support Vector Machine |
ILSVRC | ImageNet Large Scale Visual Recognition Challenge |
CNY | China Yuan |
DTU | Data Transfer Unit |

## References

**Figure 1.** (**a**) The geographical location of Hulu watershed. (**b**) The distribution of instruments and observatories of the Hulu watershed [22].

**Figure 2.** The hydrographic cross-section located in the Hulu watershed of the Qilian Mountains. (**a**) Cross-sectional view. (**b**) Vertical view. The camera in this figure is an infrared sensor camera called LTL5120A, which can automatically shoot continuously according to the set time interval, and the resolution of the photos taken is 2560 × 1920.

**Figure 3.** Two wells constructed of the hydrographic cross-section. (**a**) The silted-up wells. (**b**) Manual dredging from the water wells.

**Figure 7.** (**a**) Sobel-operated image from Figure 6b. (**b**) The OTSU method for binarization operation from (**a**).

**Figure 8.** The effect after the opening and closing operation. (**a**) The image from Figure 7b by closing operation. (**b**) The image from (**a**) by horizontal opening operation. (**c**) The image from (**b**) by vertical opening operation. (**d**) The image from (**c**) by horizontal opening operation.

**Figure 9.** The effect after expansion operation. (**a**) The image in Figure 8d by horizontal expansion. (**b**) The image from (**a**) in this figure by vertical expansion.

**Figure 10.** The gradient vector, azimuth angle and edge direction of the center point. The edge of any point is orthogonal to the gradient vector.

**Figure 12.** The effect of contour detection from the image in Figure 9b.

**Figure 13.** The effect of contour detection and edge detection. (**a**) is obtained from Figure 12 by the size of the area of the detected contour. (**b**) The effect from (**a**) by the edge detection of Canny operator.

**Figure 14.** The effect of linear detection and tilt correction. (**a**) The straight line detected by Hough transform from image Figure 13b. (**b**) Rotation correction according to the detected tilt angle (8.3°) of the lines in (**a**). (**c**) The deviation between the displayed value and the real value after tilt correction.

**Figure 15.** The water ruler image containing only the part above the water surface is extracted. (**a**) is the water ruler image truncated from Figure 14b. (**b**) is the effect of closing operation from (**a**). (**c**) is the effect of contour detection from (**b**) by using the method named “findContours”. (**d**) is the effective area extracted from the detected contour range.

**Figure 16.** The ruler images (**a**–**c**) are, in turn, Gaussian filter, Grayscale processing, and binarization from Figure 15d.

**Figure 17.** Further segmentation of characters and scale lines. (**a**) The image of the water ruler containing only one side of the array character area segmented from the image in Figure 16c. (**b**) The image of the water ruler containing only one side of the scale lines area segmented from the image in Figure 16c. (**c**–**f**) The final four images, obtained by morphological manipulation and projection from (**b**).

**Figure 18.** The images above are the final character images, which are positioned and divided from Figure 17f according to the projection method.

**Figure 19.** The Fully Connected Layer part of a neural network (**left**) and the Fully Connected Layer part after adding the Dropout layer (**right**).

**Figure 21.** Flowchart and pseudo-code for calculating the number of scale lines. (**b**) is the portion of the scale extracted from (**a**). (**c**) is the binarized image of (**b**). (**d**) is the pseudo-code of the calculation procedure to obtain the number of scale lines.

**Figure 22.** Comparison of water level values. The green line shows the values by visual reading. The yellow line represents the values of intelligent recognition. The blue line represents the values by the method of template matching. The abscissa represents the sequence number of the image and the ordinate represents the value of the reading (the unit is m).

**Figure 23.** Comparison of time loss. The orange columns show the time loss of intelligent recognition. The blue columns represent the template matching. The ordinate represents the sequence number of the image and the abscissa represents the value of the time loss (the unit is s).

Layer (Type) | Output Shape | Param |
---|---|---|
sequential (Sequential) | (None, 28, 28, 3) | 0 |
rescaling_1 (Rescaling) | (None, 28, 28, 3) | 0 |
conv2d (Conv2D) | (None, 28, 28, 16) | 448 |
max_pooling2d (MaxPooling2D) | (None, 14, 14, 16) | 0 |
conv2d_1 (Conv2D) | (None, 14, 14, 32) | 4640 |
max_pooling2d_1 (MaxPooling2D) | (None, 7, 7, 32) | 0 |
conv2d_2 (Conv2D) | (None, 7, 7, 64) | 18,496 |
max_pooling2d_2 (MaxPooling2D) | (None, 3, 3, 64) | 0 |
dropout (Dropout) | (None, 3, 3, 64) | 0 |
flatten (Flatten) | (None, 576) | 0 |
dense (Dense) | (None, 128) | 73,856 |
dense_1 (Dense) | (None, 10) | 1290 |

**Table 2.** The results of the comparison between the template matching algorithm and the intelligent recognition algorithm designed for this experiment. The accuracy and time loss of these two algorithms were compared using manual readings as the standard. Each of the manual readings is the average of three manual readings.

Ruler No | Visual (m) | Template Matching Value (m) | Template Matching Error | Template Matching Time (s) | Intelligent Recognition Value (m) | Intelligent Recognition Error | Intelligent Recognition Time (s) |
---|---|---|---|---|---|---|---|
1 | 0.23 | 0.22 | 4.35% | 8.32 | 0.23 | 0.00% | 6.94 |
2 | 0.27 | 0.10 | 62.96% | 8.44 | 0.28 | 3.70% | 6.82 |
3 | 0.24 | 0.24 | 0.00% | 8.73 | 0.24 | 0.00% | 4.59 |
4 | 0.24 | 0.06 | 75.00% | 8.96 | 0.23 | 4.17% | 4.58 |
5 | 0.24 | 0.16 | 33.33% | 9.29 | 0.24 | 0.00% | 4.58 |
6 | 0.26 | 0.17 | 34.62% | 9.74 | 0.25 | 3.85% | 4.58 |
7 | 0.23 | 0.17 | 26.09% | 9.35 | 0.17 | 26.09% | 4.56 |
8 | 0.24 | 0.10 | 58.33% | 8.93 | 0.19 | 20.83% | 4.6 |
9 | 0.23 | 0.19 | 17.34% | 8.78 | 0.19 | 17.39% | 4.61 |
10 | 0.22 | 0.20 | 9.01% | 9.48 | 0.21 | 4.55% | 4.61 |
11 | 0.23 | 0.17 | 26.09% | 8.89 | 0.25 | 8.70% | 4.6 |
12 | 0.24 | 0.07 | 70.83% | 8.26 | 0.24 | 0.00% | 6.86 |
13 | 0.24 | 0.14 | 41.67% | 9.27 | 0.23 | 4.17% | 4.53 |
14 | 0.24 | 0.16 | 33.33% | 8.92 | 0.24 | 0.00% | 4.63 |
15 | 0.23 | 0.22 | 4.35% | 9.32 | 0.23 | 0.00% | 4.66 |
16 | 0.22 | 0.19 | 13.64% | 9.63 | 0.19 | 13.64% | 4.59 |
17 | 0.21 | 0.13 | 38.10% | 8.64 | 0.22 | 4.76% | 4.44 |
18 | 0.23 | 0.14 | 39.13% | 9.33 | 0.23 | 0.00% | 4.58 |
19 | 0.27 | 0.22 | 18.52% | 8.28 | 0.27 | 0.00% | 4.37 |
20 | 0.28 | 0.30 | 7.14% | 8.72 | 0.30 | 7.14% | 4.61 |
21 | 0.26 | 0.16 | 38.46% | 8.37 | 0.24 | 7.69% | 4.56 |
22 | 0.28 | 0.17 | 39.29% | 8.88 | 0.26 | 7.14% | 4.58 |
23 | 0.27 | 0.26 | 3.70% | 8.96 | 0.26 | 3.70% | 4.6 |
24 | 0.21 | 0.22 | 4.76% | 8.92 | 0.21 | 0.00% | 4.61 |
25 | 0.23 | 0.25 | 8.70% | 8.94 | 0.23 | 0.00% | 4.63 |
26 | 0.21 | 0.19 | 9.52% | 8.95 | 0.19 | 9.52% | 4.6 |
27 | 0.21 | 0.11 | 47.62% | 9.04 | 0.20 | 4.76% | 4.6 |
28 | 0.20 | 0.10 | 50.00% | 9.1 | 0.18 | 10.00% | 4.58 |
29 | 0.22 | 0.12 | 45.45% | 9.12 | 0.23 | 4.55% | 4.58 |
30 | 0.21 | 0.11 | 47.62% | 8.09 | 0.20 | 4.76% | 4.68 |
31 | 0.23 | 0.30 | 30.43% | 8.8 | 0.24 | 4.35% | 4.64 |
32 | 0.20 | 0.11 | 45.00% | 8.21 | 0.20 | 0.00% | 4.36 |
33 | 0.23 | 0.22 | 4.35% | 9.01 | 0.21 | 8.70% | 4.63 |
34 | 0.22 | 0.21 | 4.55% | 8.17 | 0.22 | 0.00% | 4.47 |
35 | 0.21 | 0.21 | 0.00% | 8.38 | 0.21 | 0.00% | 4.19 |
36 | 0.20 | 0.18 | 10.00% | 9.09 | 0.19 | 5.00% | 4.63 |
37 | 0.21 | 0.14 | 33.33% | 9.21 | 0.23 | 9.52% | 4.62 |
38 | 0.21 | 0.10 | 52.38% | 9 | 0.19 | 9.52% | 4.57 |
39 | 0.21 | 0.12 | 42.86% | 9.81 | 0.21 | 0.00% | 4.59 |
40 | 0.22 | 0.16 | 27.27% | 8.94 | 0.24 | 9.09% | 4.59 |
Average | — | — | 28.98% | 8.91 | — | 5.43% | 4.74 |

