Developing a Discharge Estimation Model for Ungauged Watershed Using CNN and Hydrological Image

: This study aimed to estimate the discharge in ungauged watersheds. To this end, we herein deviated from the model development methodology of previous studies and used convolution neural network (CNN), a deep training algorithm, and hydrological images. As the CNN model was developed for solving classiﬁcation issues in general, it is unsuitable for simulating the discharge, which is a continuous variable. Therefore, the fully connected layer of the CNN model was improved. Moreover, images reﬂecting the hydrological conditions rather than a general photograph were used as input data for the CNN model. Three study areas that have discharge gauged data were set for the model’s training and testing. The data from two of the three study areas were used for CNN model training, and the data of the other were used to evaluate model prediction performance. The results of this study demonstrate a moderate predictive success of the discharge of an ungauged watershed using the CNN model and hydrological images. Therefore, it can be suitable as a methodology for the discharge estimation of ungauged watersheds. Simultaneously, it is expected that our methodology can be applied to the ﬁeld of remote sensing or to the ﬁeld of real-time discharge simulation using satellite imagery on a global scale or across a wide area. normalization layer are arranged, and a density layer is re-connected. Here, the batch normalization layer is characterized by improving the gradient vanishing and local minima, enabling stable model training [42].


Introduction
Due to the population growth and industrialization brought about by the Industrial Revolution, as well as flood and drought due to climate change, there has been an increased emphasis on the importance of water resources; in particular, the demand for water resources is rapidly increasing. To this end, each country establishes a national water resource management plan at the watershed level to manage water resources. Water resource management plans such as integrated water resource management [1] require recording of changes in discharge depending on conditions such as weather and hydrology. This is because these represent the basic data for establishing future plans such as water resource management and usage. Discharge data from countries that operate the total maximum daily loads [2] for watershed management, such as South Korea, are an absolutely necessary factor in establishing a water resource management plan. In particular, Paldang Lake in South Korea is an extremely rare case because it is used by more than 50% of the population. The water resource management plan for Paldang Lake and its inflowing rivers is recognized as extremely important at the government level, and the South Korean government intends to establish a long-term comprehensive water resource plan. There are several methods of collecting discharge data for watershed management; of collecting discharge data in ungauged watersheds and to promote convenience in model development.
To this end, the convolution neural network (CNN) was adopted herein and was determined as a methodology suitable for the development of a new type of runoff model as it has the characteristic of using spatial attributes extracted from training images. In addition, instead of the input data in the one-dimensional (1D) vector format used in the conventional ANN model, its performance was presented using the hydrological image as input data for the model. The hydrological image is a 2D matrix structure with a grid unit, with each grid having an attribute value reflecting the conditions of precipitation, land use, and soil.

Study Area
To develop a discharge simulation model for ungauged watersheds, three watersheds within the Paldang watershed-Jo Jong (JJ), Heuk Cheon (HC), and Bok Ha (BH)-were set as the target study areas herein, as shown in Figure 1. Paldang watershed refers to the seven local governments surrounding Paldang Lake, an artificial lake built as a result of the Paldang Dam. Paldang Lake is an artificial lake approximately 27.3 km away from Seoul, the capital city of South Korea. The watershed area of Paldang Lake is 23.800 km 2 , the storage capacity is 244 million m 3 , the effective storage capacity is 18 million m 3 , and the residence time is 5.4 days. Its inflow rivers comprise the Namhan River, Bukhan River, and Gyeongan River, which are national rivers.
Water 2020, 12, x FOR PEER REVIEW 3 of 22 As mentioned above, the objective of our study was to overcome the complexity and difficulty of constitutive model development by using empirical models to improve the difficulty and inconvenience of collecting discharge data in ungauged watersheds and to promote convenience in model development. To this end, the convolution neural network (CNN) was adopted herein and was determined as a methodology suitable for the development of a new type of runoff model as it has the characteristic of using spatial attributes extracted from training images. In addition, instead of the input data in the one-dimensional (1D) vector format used in the conventional ANN model, its performance was presented using the hydrological image as input data for the model. The hydrological image is a 2D matrix structure with a grid unit, with each grid having an attribute value reflecting the conditions of precipitation, land use, and soil.

Study Area
To develop a discharge simulation model for ungauged watersheds, three watersheds within the Paldang watershed-Jo Jong (JJ), Heuk Cheon (HC), and Bok Ha (BH)-were set as the target study areas herein, as shown in Figure 1. Paldang watershed refers to the seven local governments surrounding Paldang Lake, an artificial lake built as a result of the Paldang Dam. Paldang Lake is an artificial lake approximately 27.3 km away from Seoul, the capital city of South Korea. The watershed area of Paldang Lake is 23.800 km 2 , the storage capacity is 244 million m 3 , the effective storage capacity is 18 million m 3 , and the residence time is 5.4 days. Its inflow rivers comprise the Namhan River, Bukhan River, and Gyeongan River, which are national rivers.   The three study areas are mostly composed of forest and agricultural areas, where the change in land use over time is rare due to the abovementioned regulations ( Figure 2). Yanghwacheon is the main stream of BH, which is a watershed with little interference from external streams. The main stream length, the average slope, and the area of BH are 32 km, 4.7 • , and 181.1 km 2 , respectively, which is a typical agricultural area mainly comprising paddies, uplands, and forests. HC is a watershed with no inflow of external streams, and Heukcheon is the main stream. The main stream length, average slope, and the area of HC are 42.9 km, 18.4 • , and 314.1 km 2 , respectively, making it a forest-dominant area. JJ is a watershed with Jojongcheon as the main stream, where the main stream length, average slope, and area are 39 km, 20.2 • , and 260.6 km 2 , respectively. Similar to HC, JJ is a forest-dominant region ( Table 1).
Water 2020, 12, x FOR PEER REVIEW 4 of 22 The three study areas are mostly composed of forest and agricultural areas, where the change in land use over time is rare due to the abovementioned regulations ( Figure 2). Yanghwacheon is the main stream of BH, which is a watershed with little interference from external streams. The main stream length, the average slope, and the area of BH are 32 km, 4.7°, and 181.1 km 2 , respectively, which is a typical agricultural area mainly comprising paddies, uplands, and forests. HC is a watershed with no inflow of external streams, and Heukcheon is the main stream. The main stream length, average slope, and the area of HC are 42.9 km, 18.4°, and 314.1 km 2 , respectively, making it a forest-dominant area. JJ is a watershed with Jojongcheon as the main stream, where the main stream length, average slope, and area are 39 km, 20.2°, and 260.6 km 2 , respectively. Similar to HC, JJ is a forest-dominant region (Table 1).

Data Collection
The data required in this study were four items: precipitation, discharge, land-use map, and soil map. The data collection period was 10 years from 1 January 2010 to 31 December 2019. The precipitation data, discharge and soil map, and land-use map were collected from the Korea Meteorological Administration [25], Water Resources Management Information System [26], and Environmental Geographic Information Service [27], respectively. Discharge data were daily average data (m 3 /s), wherein gauge points are as shown in Figure 1, and precipitation data involved the daily data of 34 precipitation gauge stations inside and outside the study area (mm/day) as shown in Figure  1. A grid-format land-use map and soil map were used, with the resolution of 30 m × 30 m and a data scale of 1:5.000. The land-use map comprised eight items-water, urban, barren, pasture, forest, paddy, upland, and wetlands-and the soil map included data with 59 physical properties (these 59

Data Collection
The data required in this study were four items: precipitation, discharge, land-use map, and soil map. The data collection period was 10 years from 1 January 2010 to 31 December 2019. The precipitation data, discharge and soil map, and land-use map were collected from the Korea Meteorological Administration [25], Water Resources Management Information System [26], and Environmental Geographic Information Service [27], respectively. Discharge data were daily average data (m 3 /s), wherein gauge points are as shown in Figure 1, and precipitation data involved the daily data of 34 precipitation gauge stations inside and outside the study area (mm/day) as shown in Figure 1. A grid-format land-use map and soil map were used, with the resolution of 30 m × 30 m and a data scale of 1:5000. The land-use map comprised eight items-water, urban, barren, pasture, forest, paddy, upland, and wetlands-and the soil map included data with 59 physical properties (these 59 physical properties are afa, fba, mab, etc., as classified by the National Institute of Environmental Research of South Korea).

Research Method
This study can be divided into three stages: (1) construction of input data for CNN model training, (2) training of CNN model, and (3) review of prediction results. The training of a CNN requires a labeled dataset training as it uses supervised training. The dataset required for the prediction and training of the CNN model was constructed by setting the hydrological image as an input feature and the corresponding discharge value as a target. The fully connected layer of the CNN model was improved to simulate the discharge, and the predicted results derived through this were compared with the historical records of discharge of the study areas to evaluate the model performance.

Building the Dataset for the CNN Model
(1) Hydrological Image as a Feature Most CNN models use RGB (red, green, blue) photographs for the purpose of predicting what is represented in the picture; however, predicting runoff using general RGB photographs can deny the premise of the empirical model (the relationship between independent and dependent variables) used herein. In other words, as an RGB photograph is similar to representing an arbitrary shape (the object and background in photograph) in a color value, the relationship between the feature and the prediction target becomes meaningless even if an extremely accurate prediction is derived for an RGB photograph. Therefore, the features for the CNN model require data having the property of a causal relationship with the data type and runoff (m 3 /s) required by CNN.
The hydrological image proposed by Song [28] is defined as a set of grids with dimensionless hydrological properties in a two-dimensional grid space for an arbitrary watershed. The hydrological image has merit in that it can reflect the hydrological phenomenon in the watershed. However, the characteristic of hydrological images allows image generation only for the event, and there is a limitation that they are not suitable for continuous discharge simulations. Moreover, because of its noncontinuous data, the lag time cannot be considered. Nevertheless, this study judged that the hydrological image is suitable for using the data of the development model of discharge because it can reflect the hydrologic condition.
The hydrological properties of each grid point in the image was based on the hydrological curve number (CN) [29], as shown in Equations (1)-(3), published by the SCS, formerly known as the National Resource Conservation Service. The CN is a value derived using conditions such as precipitation, soil map, and land use, and it is used to simulate direct runoff (mm), as well as for evaluating hydrological effects such as direct runoff caused by land-use change [30,31].
where Q denotes the amount of direct runoff (mm), P is the amount of precipitation (mm), Ia means the initial loss (mm), and S is the amount of residual storage (mm). The CN in Equation (3) was allocated according to the physical properties and land use of the four soils divided into hydrologic soil groups (HSGs) A, B, C, and D. In South Korea, this method can Water 2020, 12, 3534 6 of 22 be used as it is. However, the estimated discharge had an enormous error because the proposed CN in the SCS was built using the result of an experiment in United States (US) terrain. Consequently, South Korea re-manufactured the CN for the topographic conditions of South Korea. For this reason, the CN proposed in the Design Flood Estimation Techniques [32] by the Ministry of Land, Transport, and Maritime Affairs of South Korea was used in this study. The hydrological properties of each grid to generate hydrological images were calculated using Equations (1)-(3) proposed by the SCS, where Q in Equations (1) and (2) is the proxy of hydrological property [29].
The Thiessen polygon method [33] was proposed as a method of based on measurements of precipitation. However, this method has a disadvantage of generating distortions or differences in precipitation due to the discontinuity of numerical change at the polygonal boundary. Moreover, if the same precipitation is applied to calculate the hydrological properties for each grid as done herein, the hydrological image becomes very monotonous, making it unsuitable as training data for the CNN model. In this study, two-dimensional data of precipitation in grid units were constructed using inverse distance weighting (IDW) without using the Thiessen polygon method. As IDW is an interpolation method based on Tobler's law [34], it can estimate continuous precipitation change in grid units, thereby compensating for the disadvantages of the Thiessen polygons. In addition, the SCS proposed a modification of CN [29] considering the effect of precipitation in order to take into account the change in runoff due to antecedent precipitation. SCS introduced the concept of antecedent moisture condition (AMC), which is the cumulative precipitation for 5 days, and proposed dividing it into three stages-AMC I , II , and III -according to the dry and wet seasons. Moreover, the CN in Equation (3) represents the AMC II condition. Through this, the SCS proposed that CN can be adjusted according to conditions such as AMC changes, as shown in Equations (4) and (5). This adjustment was done and reflected in the calculation of the hydrological properties of each grid. However, in terms of the dry and wet seasons, the dry season was classified as from October to May and the wet season was classified as from June to September according to the climatic characteristics of South Korea, where precipitation is concentrated from June to September (Table 2).
where CN I is the CN for the AMC I condition and CN III is the CN for the AMC III condition. As the image of the non-precipitation condition was fixed, the data were excluded from the dataset composition, and only the data in the event of precipitation were collected. The collected data were converted into a TIF format file, an image file format that has the same two-dimensional structure as the land-use and soil map to be constructed as a feature. The TIF format file was used as the runoff data, which is the property value of the grid, and it was converted into meaningful color values in the integer range of 0-255, resulting in information loss if formats such as JPG, BMG, and PNG were used.
Consequently, the building process was as shown in Figure 3. Hydrological attributes were calculated using the CN value and precipitation value of each grid. The bottom grid plot of the Figure 3 corresponds to the final hydrological image. The grid color becomes whiter as the grid value increases and becomes darker as the grid value decreases. Here, the hydrological images were built using Python 3.7 which is an open-source language.
Water 2020, 12, x FOR PEER REVIEW 7 of 22 (2) Target Data In this study, three discharge data corresponding to the entire study area-JJ, HC, and BHwere used as target data that were subsequently built into a CSV format file to be recognized by the CNN model. The target data were for supervised training of the CNN model, and the CNN model herein was trained using hydrological images. The error was calculated through the difference between the estimated value derived on this basis and the target data, and the process of reducing the error through training was iterated. In this study, the target data were the discharge data corresponding to the date of the hydrological image.
(3) Dataset Setting The previously built hydrological image and target data were used as a dataset for the prediction and training of the CNN model. Basically, the dataset was divided into an input dataset and a test dataset. The input dataset was divided into a training dataset for model training and a validation dataset for verification of model training. As a method of setting the dataset, the datasets corresponding to two watersheds were used to train the model and the datasets corresponding to (2) Target Data In this study, three discharge data corresponding to the entire study area-JJ, HC, and BH-were used as target data that were subsequently built into a CSV format file to be recognized by the CNN model. The target data were for supervised training of the CNN model, and the CNN model herein was trained using hydrological images. The error was calculated through the difference between the estimated value derived on this basis and the target data, and the process of reducing the error through training was iterated. In this study, the target data were the discharge data corresponding to the date of the hydrological image. The previously built hydrological image and target data were used as a dataset for the prediction and training of the CNN model. Basically, the dataset was divided into an input dataset and a test dataset. The input dataset was divided into a training dataset for model training and a validation dataset for verification of model training. As a method of setting the dataset, the datasets corresponding to two watersheds were used to train the model and the datasets corresponding to one watershed were used to examine the prediction performance of the model. The datasets of the three cases were set as shown in Table 3.

CNN Structure Configuration
The development of the CNN algorithm was accelerated by LeCun et al. [35], mimicking the visual processing of object recognition in organisms. CNN has been applied to image classification, speech recognition, and image semantic segmentation [36]; CNN has also been very quickly evaluated as a core technology in the image classification field owing to its advantages such as reliable results and outstanding efficiency [37][38][39]. CNN can process two-dimensional or three-dimensional input data instead of the one-dimensional input data used in the conventional ANN. In particular, it learns the spatial information of the input data and can efficiently understand spatial attributes. The CNN model can be mainly divided into two parts. Part 1 comprises repetition of the convolution layer and the pulling layer, and Part 2 comprises a flatten layer, a dense layer, and an output layer in a fully connected layer connected after Part 1.
Part 1 is the core function of CNN that maintains the shape of the input/output data of each layer and spatial data of the image while effectively recognizing the attribute of the adjacent images. The convolution layer of Part 1 performs extraction of the image's characteristics by using the image searching window called the kernel while maintaining the shape of the image. Figure 4a shows the scheme of the convolution layer. The kernel moves on the input image and makes a new image of a different size from the input image. This image is called the feature, and the size of the feature depends on the kernel's moving distance, which is called the stride. The kernel acts like a weighted value in the ANN and is optimized by the CNN model training. The pooling layer reduces the size of the feature by down-sampling and has the function of inhibiting overfitting. Since the pooling layer also has a kernel and a stride, the size of the feature changes when the feature passes through the pooling layer (Figure 4b).
Water 2020, 12, x FOR PEER REVIEW 9 of 22 Part 2 refers to a DNN composed of several hidden layers or dense layers between the flatten layer and the output layer that derives the simulation results [40,41]. Figure 4c shows the flatten layer and the dense layer of Part 2. The flatten layer has the function of converting features into onedimensional data as shown in Figure 4c. Furthermore, the dense layer in Figure 4c is the same as the hidden layer in ANN, where each dense layer is optimized by updating the weight of each node in dense layer.
As mentioned above, as the CNN model was operated for the purpose of solving the classification issue, the current CNN model is unsuitable for simulating unspecified continuous variables such as discharge rate. Therefore, the CNN model was improved herein as shown in Figure  5. Part 1 was designed in a structure wherein the convolution layer and the pulling layer are repeated five times, and Part 2 was designed in a structure wherein a flatten layer, two density layers, and a batch normalization layer are arranged, and a density layer is re-connected. Here, the batch Part 2 refers to a DNN composed of several hidden layers or dense layers between the flatten layer and the output layer that derives the simulation results [40,41]. Figure 4c shows the flatten layer and the dense layer of Part 2. The flatten layer has the function of converting features into one-dimensional data as shown in Figure 4c. Furthermore, the dense layer in Figure 4c is the same as the hidden layer in ANN, where each dense layer is optimized by updating the weight of each node in dense layer.
As mentioned above, as the CNN model was operated for the purpose of solving the classification issue, the current CNN model is unsuitable for simulating unspecified continuous variables such as discharge rate. Therefore, the CNN model was improved herein as shown in Figure 5. Part 1 was designed in a structure wherein the convolution layer and the pulling layer are repeated five times, and Part 2 was designed in a structure wherein a flatten layer, two density layers, and a batch normalization layer are arranged, and a density layer is re-connected. Here, the batch normalization layer is characterized by improving the gradient vanishing and local minima, enabling stable model training [42].
which uses the function of Tensorflow [44] (a machine training library released by Google), was used as the environment for CNN model design, implementation, and operation, which was subsequently implemented and experimented in Python. Keras is a high-level DNN application programming interface written in Python that supports all special functions of Tensorflow and has very high flexibility in implementing DNN models. In addition, fast experiments can be conducted as it can efficiently use the central and graphics processing units (CPU and GPU) [43].

Detailed Modified Configuration
CNN uses a separate 2D plane function called a kernel, a type of image filter, which is in the convolution layer and plays the role of a parameter to obtain the image features. The size of the kernel is defined in a square shape, wherein the size and number can be arbitrarily set. In this study, all kernels were set to a size of 3 × 3, and the number of kernels was set to 32, 64, 128, 256, and 512 in the order of five convolution layers. In addition, this kernel extracted image features while searching the input image. The interval of this movement is called the stride, which was set to move by one space. The image was reduced while extracting the image features via the kernel, and the method to prevent this is called padding. Padding was also set herein to prevent information loss of output data. A rectified linear unit (ReLu) function was set as the convolution layer had an activation function as The CNN model uses the Softmax or Sigmoid function as the activation function to solve classification problems. However, a linear function has frequently been used in the regression model instead of the Softmax or Sigmoid function. Because this paper aimed to simulate the discharge, the activation function in the last dense layer of this CNN model was set as a linear function. Keras [43], which uses the function of Tensorflow [44] (a machine training library released by Google), was used as the environment for CNN model design, implementation, and operation, which was subsequently implemented and experimented in Python. Keras is a high-level DNN application programming interface written in Python that supports all special functions of Tensorflow and has very high flexibility in implementing DNN models. In addition, fast experiments can be conducted as it can efficiently use the central and graphics processing units (CPU and GPU) [43].

Detailed Modified Configuration
CNN uses a separate 2D plane function called a kernel, a type of image filter, which is in the convolution layer and plays the role of a parameter to obtain the image features. The size of the kernel is defined in a square shape, wherein the size and number can be arbitrarily set. In this study, all kernels were set to a size of 3 × 3, and the number of kernels was set to 32, 64, 128, 256, and 512 in the order of five convolution layers. In addition, this kernel extracted image features while searching the input image. The interval of this movement is called the stride, which was set to move by one space. The image was reduced while extracting the image features via the kernel, and the method to prevent this is called padding. Padding was also set herein to prevent information loss of output data. A rectified linear unit (ReLu) function was set as the convolution layer had an activation function as shown in Equation (6). The ReLu function is commonly used in CNN models. It can avoid gradient vanishing, and the optimization efficiency of the model is high [45,46].
The pooling layer is a sub-sampling method wherein only important information is left while reducing the size of the data received from the convolution layer as input data. It has an effect of preventing overfitting as the computer memory can be efficiently used and the calculated data are reduced. There are max pooling and average pooling methods for pooling. Max pooling was used in this model as average pooling can cause loss of information. The size of the pooling can also be arbitrarily set. In this study, the size of the convolution layer was set to 3 × 3 as the kernel size. The neural network using supervised training optimizes the model by continuously updating the parameters to minimize errors between the predicted values and target values of the model derived during training. The error is called loss in neural networks, and the purpose of supervised training can be described as minimizing loss. The change in loss can be used as an index to determine the result of model training, and the change in loss can be monitored using the loss function. Mainly, the CNN model uses loss functions such as binary cross-entropy, categorical cross-entropy, and sparse categorical cross-entropy. These are loss functions for the purpose of classification and are unsuitable for a model predicting continuous variables as done herein. Therefore, mean square error (MSE), which is widely used in regression problems, was used as the loss function in this study, as shown in Equation (7), and the training degree (optimization) of the model was determined on this basis. The item to be examined together with model training was the predictive performance of the trained model. As an examination metric, the mean absolute error (MAE) was applied herein, as shown in Equation (8).
where y i and y i denote a measured value and a predicted value, respectively. In addition, as MSE and MAE converge to 0, the performance of the model can be considered to be higher. Even if the result of the loss function and the predictive performance of the model are moderate or good, it is the result of training between the input data and the target data for model training. Therefore, it should be verified that smooth prediction is derived even for new data (or unseen data) not used in model training. This is also called model generalization. In this study, some of the collected data were selected as a validation dataset to evaluate the generalization of the model. The results of model training were determined using loss, MAE, validation loss (Val loss), and validation MAE (Val MAE), where loss and MAE refer to training indices for input data, and Val loss and Val MAE refer to validation indices in the validation dataset. In terms of the algorithm for optimization, the stochastic gradient descent (SGD) [47] was considered to be the most basic methodology. Whenever the weight is updated, the SGD measures the slope through differentiation and subsequently updates toward a lower slope to reduce loss. To improve the shortcomings of SGD in recent years, optimization algorithms such as Momentum [48], Nesterov Accelerated Gradient [49], Adagrad [50], Nadam [51], AdaDelta [52], RMSProp [53], and Adam [54] were used to improve training speed and accuracy based on SGD. Adam, referring to adaptive momentum estimation, is a combination of the concepts of momentum optimization and RMSProp, which follows the exponential decaying average of the gradient similar to momentum optimization and follows the exponential decaying average of the square of the last gradient similar to RMSProp [55]. Adam with these characteristics has a function enabling stable and fast optimization of the model, leading to better results than other optimization methods. Adam was herein set as the weight optimization algorithm of the model, whereby the learning rate was set to 0.0001 and the training epoch was set to 500. The CNN model training herein was performed on a desktop using an Intel ® Core™ i9-9900K CPU @ 3.60 GHz, 32 GB random-access memory (RAM), NDIVIA ® GEFORCE RTX™ 2080Ti GPU, and Windows operating system (OS).

Evaluation of Model
For the evaluation of the model, its performance was determined by comparing the measured value and the predicted value. Three methods were used to evaluate the model: Pearson correlation coefficient (r), Nash-Sutcliffe efficiency (NSE), and root-mean-square error (RMSE); r ranged from −1 to 1 and was a method to examine the linearity relationship between the measured and predicted values. The value of r was measured as shown in Equation (9), with values closer to 1 denoting stronger linearity between the predicted value and the measured value, according to which the performance of the model was determined to be good.
The NSE is presented in Equation (10). NSE ranges from 0 to 1, with the model results being in perfect agreement when the measured values of NSE are 1. On the other hand, the model performance degrades as it converges to zero.
RMSE shows how close the model predicts to the measured value, as shown in Equation (11), and the value ranges from 0 to infinity within the range of the data. In general, RMSE follows the unit of the target to be simulated; thus, the RMSE of this study was equal to the discharge rate unit of m 3 /s. Although the evaluation criteria for the RMSE itself are not clear, a closer value to 0 denotes that the model results are closer to the measured values.
where y i and y i denote the measured values and predicted values, respectively, while y and y i present the average measured values and predicted values.

Precipitation and Discharge by Watershed
The number of precipitation events in the period of 10 years from 2010 to 2019 during the study period was 566, 571, and 554 for JJ, HC and BH, respectively. For JJ, the precipitation range was 0.5-358.0 mm, and the range of discharge was 0.8-774.0 m 3 /s (Figure 6a). For HC, the precipitation range was 0.3-218.6 mm, and the range of discharge was 0.2-603.2 m 3 /s (Figure 6b). For BH, the precipitation was 0.5-178.8 mm, and the range of discharge was 0.9-299.5 m 3 /s (Figure 6c). All three watersheds showed the highest discharge rate in 2011, possibly due to the highest precipitation in 2011 (1.581.6 mm) compared to other years. In addition, the average value of discharge was 18.5 m 3 /s for BH, 25.5 m 3 /s for HC, and 23.1 m 3 /s for JJ, which is considered to be the effect of the watershed area as shown in Table 1.

Result of Building the Hydrological Image
As explained in Section 3.1, the number of hydrological images was 554 for JJ, 571 for HC, and 566 for BH, like the number of days of precipitation in the three study areas. Figure 7 shows one of the three hydrological images of the study areas. All hydrological images of all study areas were built according to equations 1-3 using a 30 m × 30 m resolution Precipitation Image and CN Image. JJ, HC and BH were images with grids of 670 × 819, 1014 × 632 and 513 × 872, respectively. The CNN model is required to have the same size for all input data images. Hence, they cannot be used for CNN model training as each study area had a different size as shown in Figure 7b. Therefore, the hydrological images of all study areas were rebuilt to 1014 × 1014 herein based on 1014, which is the largest size of the study area as shown in Figure 7c. In terms of the method, Pillow's linear interpolation method [56], an image module of Python, was used in this study.

Result of Building the Hydrological Image
As explained in Section 3.1, the number of hydrological images was 554 for JJ, 571 for HC, and 566 for BH, like the number of days of precipitation in the three study areas. Figure 7 shows one of the three hydrological images of the study areas. All hydrological images of all study areas were built according to equations 1-3 using a 30 m × 30 m resolution Precipitation Image and CN Image. JJ, HC and BH were images with grids of 670 × 819, 1014 × 632 and 513 × 872, respectively. The CNN model is required to have the same size for all input data images. Hence, they cannot be used for CNN model training as each study area had a different size as shown in Figure 7b. Therefore, the hydrological images of all study areas were rebuilt to 1014 × 1014 herein based on 1014, which is the largest size of the study area as shown in Figure 7c. In terms of the method, Pillow's linear interpolation method [56], an image module of Python, was used in this study.
Although the readjusted hydrological images were used for the training and testing of the CNN model and the data were used for training, verification, and testing of the model, there is no standard set for the data ratio. However, training and verification data were used in the ratio of 7:3 or 8:2 in several previous studies [57][58][59][60]. Therefore, the data were divided as shown in Table 4, and the model was divided into Cases 1, 2, and 3, used herein for training, verification, and testing, respectively. Although the readjusted hydrological images were used for the training and testing of the CNN model and the data were used for training, verification, and testing of the model, there is no standard set for the data ratio. However, training and verification data were used in the ratio of 7:3 or 8:2 in several previous studies [57][58][59][60]. Therefore, the data were divided as shown in Table 4, and the model was divided into Cases 1, 2, and 3, used herein for training, verification, and testing, respectively.

Model Structure and Training Results
The CNN model used herein is as shown in Table 5. As designed, it comprised a structure wherein the convolution layer and the pooling layer were repeated five times with a structure followed by a fully connected layer. The total number of parameters, which is the number of weights in the CNN model, was 27,850,241, the number of trainable parameters was 27,849,985, and the number of nontrainable parameters was 256, as shown in Table 5. The nontrainable parameters are

Model Structure and Training Results
The CNN model used herein is as shown in Table 5. As designed, it comprised a structure wherein the convolution layer and the pooling layer were repeated five times with a structure followed by a fully connected layer. The total number of parameters, which is the number of weights in the CNN model, was 27,850,241, the number of trainable parameters was 27,849,985, and the number of nontrainable parameters was 256, as shown in Table 5. The nontrainable parameters are shown due to the batch normalization layer having 285 nodes. For the purpose of this study, only one value was allowed to be the output from the last layer, i.e., dense layer 4 of Table 5, and a linear function was set as the activation function to simulate unspecified continuous variables. Figure 8 shows

Model Prediction
Using the classified data in Table 5 and Cases 1-3 of the model for estimating the discharge in the ungauged watershed, the prediction results of the precipitation for the entire study period are shown in Figure 9. The result of simulating the discharge of JJ using the Case 1 model, the discharge of HC using the Case 2 model, and the discharge of BH using the Case 3 model are shown in Figure  9a-c, respectively. The measured and predicted values of the discharge are as shown on the left graph in Figure 9a-c and the distribution of the measured versus predicted values is as shown on the right graph. The predicted value in Cases 1-3 tended to show a lower value than the actual value, and this can also be seen on the graphs to the right where the slope is much less than 1. However, it followed the measured value overall. In the case of the dispersion, a linear relationship between the measured value and the predicted value was shown in all models.

Model Prediction
Using the classified data in Table 5 and Cases 1-3 of the model for estimating the discharge in the ungauged watershed, the prediction results of the precipitation for the entire study period are shown in Figure 9. The result of simulating the discharge of JJ using the Case 1 model, the discharge of HC using the Case 2 model, and the discharge of BH using the Case 3 model are shown in Figure 9a-c, respectively. The measured and predicted values of the discharge are as shown on the left graph in Figure 9a-c and the distribution of the measured versus predicted values is as shown on the right graph. The predicted value in Cases 1-3 tended to show a lower value than the actual value, and this can also be seen on the graphs to the right where the slope is much less than 1. However, it followed the measured value overall. In the case of the dispersion, a linear relationship between the measured value and the predicted value was shown in all models.

Model Evaluation
To evaluate this model, Equations (9)-(11) were used to determine r, NSE, and RMSE, as shown in Table 6. In general, when evaluating the model using r, it was seen as weak when the value of r was in the range of 0.1-0.3, as moderate in the range of 0.4-0.6, as strong in the range of 0.7-0.9, and as perfect when it was 1 [61]. When evaluating the model using NSE, it was seen as moderate if the value of NSE was in the range of 0.6-0.8 and as good if it was above 0.8 [62]. In this model, r of Cases 1-3 was 0.9, showing a strong correlation, and NSE of Cases 1-3 was 0.7, indicating that all cases were moderate. In terms of the RMSE, Case 3 was 16.1 m 3 /s, Case 1 was 27 m 3 /s, and Case 2 was 28.5 m 3 /s, indicating that the RMSE in Case 1 was the lowest. These results determined that the discharge simulation methodology of ungauged watershed using hydrological images and that the CNN model has moderate predictive performance overall. Case 3 had a singular point unlike other cases. The BH simulated by Case 3 showed moderate discharge prediction results similar to Cases 1 and 2, which are forest-dominant areas, even though it had more agricultural areas than forest areas compared to other study areas. This model reflected the land use of the study area through hydrological images.

Model Evaluation
To evaluate this model, Equations (9)-(11) were used to determine r, NSE, and RMSE, as shown in Table 6. In general, when evaluating the model using r, it was seen as weak when the value of r was in the range of 0.1-0.3, as moderate in the range of 0.4-0.6, as strong in the range of 0.7-0.9, and as perfect when it was 1 [61]. When evaluating the model using NSE, it was seen as moderate if the value of NSE was in the range of 0.6-0.8 and as good if it was above 0.8 [62]. In this model, r of Cases 1-3 was 0.9, showing a strong correlation, and NSE of Cases 1-3 was 0.7, indicating that all cases were moderate. In terms of the RMSE, Case 3 was 16.1 m 3 /s, Case 1 was 27 m 3 /s, and Case 2 was 28.5 m 3 /s, indicating that the RMSE in Case 1 was the lowest. These results determined that the discharge simulation methodology of ungauged watershed using hydrological images and that the CNN model has moderate predictive performance overall. Case 3 had a singular point unlike other cases. The BH simulated by Case 3 showed moderate discharge prediction results similar to Cases 1 and 2, which are forest-dominant areas, even though it had more agricultural areas than forest areas compared to other study areas. This model reflected the land use of the study area through hydrological images. This study tried to compare the results of this paper and similar papers. This was difficult because few studies were conducted on the discharge simulation of an ungauged watershed using the CNN model. Instead, this paper compared results with a previous paper using the ANN model. The CNN model using a hydrological image with spatial attribute could derive moderate results for the discharge simulation of the ungauged watershed, and the hydrological image had sufficient utility as input data for the CNN model. In addition, considering the aspect of not using the watershed data for CNN model training to simulate the discharge of ungauged watersheds, our CNN model had sufficient function as a discharge prediction model for ungauged watersheds. However, the prediction results such as the RMSE of this model were observed to be similar or lower than the results of the preceding studies mentioned in Section 1. Therefore, even if r or NSE showed moderate results, this model has limitations in terms of quantitative numerical prediction. This was attributable to the structure of the CNN model used herein designed in a relatively simple structure, and the data used depend only on hydrological images, whereby the number of images was extremely small as mentioned earlier. However, our study is meaningful in that a new methodology for estimating the discharge rate in the ungauged watershed was proposed by developing a CNN model using hydrological images, and moderate results were derived. performance can be derived but also its application range can be greatly expanded considering the speed of technological development of the current CNN algorithm.