Comparative Study of AI-Based Methods: Application of Analyzing Inflow and Infiltration in Sanitary Sewer Subcatchments

Abstract: Inflow and infiltration (I/I) is a common problem in sanitary sewer systems. The I/I rate is also considered an important indicator of the operational and structural condition of a sewer system. Situation awareness in sanitary sewer systems requires accurate wastewater-flow information at a fine spatiotemporal scale. This study aims to develop artificial intelligence (AI)-based models (an adaptive neurofuzzy inference system (ANFIS) and a multilayer perceptron neural network (MLPNN)) and to compare their performance in identifying potential inflow and infiltration in the sanitary sewer subcatchments of two pumping stations. We tested the performance of these AI models using data gathered from the two pumping stations through a supervisory control and data acquisition (SCADA) system. The two AI models produced similar inflow and infiltration patterns: both subcatchments experienced inflow and infiltration. However, the ANFIS had an overall higher performance than the MLPNN model for modelling the I/I situation of the catchments. The results of the research can be used to support spatial decision making in sewer system maintenance.


Introduction
Sewer systems collect sewage from water consumers and convey it to wastewater-treatment plants, forming part of society's critical infrastructure. In theory, sanitary sewers should carry only sewage originating from water consumption, and in a physically undamaged, watertight network, dry-weather flow should follow the same pattern as water consumption. Typically, the MLPNN has a three-layer structure (an input layer, one hidden layer, and one output layer), and it has been reported in the literature as a universal approximator [25][26][27]. There are several advantages of using MLPNN artificial neural networks [25][26][27]. The MLPNN is data-driven and does not require any restrictive assumptions about the form of the model. In addition, the model can generalize; that is, the network can respond to new data that were not used in the training phase. The MLPNN is also able to detect complex nonlinear relationships between variables. The ANFIS model has often been compared with the MLPNN algorithm in applications such as modelling river temperature and water-quality management [28,29]. Compared to artificial neural networks (ANNs), the ANFIS model is more transparent to the user and causes fewer memorization errors [28,29].
The rest of the article is organized as follows: Section 2 gives an overview of the study area, datasets, and theoretical background that is relevant to the development of the AI-based models. Section 3 illustrates the results and validation of both models. The discussion and conclusion are presented in Sections 4 and 5, respectively.

Study Area
The study area was in the city of Espoo, Finland (Figure 1). The Espoo sanitary sewer network is more than 900 km long and contains approximately 200 pumping stations. A subcatchment is defined as a surface area that potentially contributes rainfall-induced runoff to the flow at a pumping station. Subcatchments were delineated using a watershed algorithm. The flow-direction raster was generated using the eight-direction (D8) flow model and the 2 × 2 m digital elevation model (DEM) provided by the National Land Survey of Finland. The sewer network was burned into the elevation model so that the direction of flow in a sewer was correct and rainfall could flow into the sewer system. After that, subcatchment areas that contribute surface runoff to another pumping station were excluded. All network pipes located upstream of a pumping station were rasterized and used as pour-point cells in the watershed algorithm. The first subcatchment has an area of 1.1 km², and the second an area of 0.7 km².
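As an illustration of the delineation steps above, the following minimal pure-Python sketch applies the D8 flow-direction model to a small elevation grid, burns sewer pipes into it, and labels every cell whose flow path reaches a pour-point cell. It is not the GIS workflow used in the study: the burn depth, the list-of-lists grid, and the function names are hypothetical, and the step that excludes areas draining to another pumping station is omitted.

```python
# Illustrative sketch only: function names, burn depth, and grid layout are hypothetical.
from math import sqrt
from typing import Dict, List, Optional, Sequence, Set, Tuple

Cell = Tuple[int, int]
Grid = List[List[float]]

# Offsets of the eight neighbours used by the D8 flow-direction model.
D8_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def burn_pipes(dem: Grid, pipe_cells: Sequence[Cell], depth: float = 10.0) -> Grid:
    """Lower the DEM under sewer pipes so runoff is routed into them (hypothetical depth)."""
    burned = [row[:] for row in dem]
    for r, c in pipe_cells:
        burned[r][c] -= depth
    return burned


def d8_directions(dem: Grid) -> Dict[Cell, Optional[Cell]]:
    """Map each cell to its steepest-descent neighbour; None marks a pit or flat cell."""
    rows, cols = len(dem), len(dem[0])
    flow: Dict[Cell, Optional[Cell]] = {}
    for r in range(rows):
        for c in range(cols):
            best: Optional[Cell] = None
            best_slope = 0.0
            for dr, dc in D8_OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dist = sqrt(2.0) if dr and dc else 1.0  # diagonal vs. cardinal step
                    slope = (dem[r][c] - dem[nr][nc]) / dist
                    if slope > best_slope:  # ties keep the first neighbour found
                        best, best_slope = (nr, nc), slope
            flow[(r, c)] = best
    return flow


def watershed(flow: Dict[Cell, Optional[Cell]], pour_cells: Sequence[Cell]) -> Set[Cell]:
    """Return the cells whose D8 flow path reaches any pour-point cell."""
    pour = set(pour_cells)
    drains: Dict[Cell, bool] = {}
    for start in flow:
        path: List[Cell] = []
        cell: Optional[Cell] = start
        # Follow the flow path until it hits a pour point, an already labelled cell, or a pit.
        while cell is not None and cell not in pour and cell not in drains:
            path.append(cell)
            cell = flow[cell]
        if cell is None:
            reached = False
        elif cell in pour:
            reached = True
        else:
            reached = drains[cell]
        for visited in path:
            drains[visited] = reached
    return pour | {cell for cell, ok in drains.items() if ok}
```

For a one-row profile `[[3.0, 4.0, 6.0, 5.0, 2.0]]` with a pour point at `(0, 0)`, cells `(0, 0)`, `(0, 1)`, and `(0, 2)` drain to the pour point, while `(0, 3)` flows to the pit at `(0, 4)`. Because every D8 step requires a strictly positive slope, flow paths cannot loop, so the labelling loop always terminates.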
Sustainability 2020, 12, x FOR PEER REVIEW 3 of 14

Data Collection and Processing
We used radar-based rainfall measurements in this study, since the Espoo area has only two rain-gauge sites, whereas radar measurements cover all study catchments. Rainfall data were obtained from the Vantaa C-band dual-polarization radar operated by the Finnish Meteorological Institute (FMI) [30,31]. The weather-radar data had high spatial (100 × 100 m) and temporal (five minutes between scans) resolution. Two preprocessing steps were carried out to avoid contamination of the radar measurements by ground clutter, which was a considerable issue at close range (<30 km from the radar). First, for each elevation angle, the radar data were resampled to a 100 × 100 m horizontal grid using inverse distance-weighted interpolation. Second, the resulting grids were vertically interpolated to a constant altitude level of 500 m by linear interpolation between the two radar scans nearest to the 500 m level.
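The two preprocessing stages can be sketched as follows: inverse distance-weighted (IDW) interpolation of scattered radar values onto one grid node, then linear interpolation between the two elevation scans that bracket 500 m. The IDW power of 2, the function names, and the sample values are assumptions for illustration; the text does not state the FMI's interpolation settings, nor whether interpolation was done in dBZ or in linear reflectivity.

```python
# Illustrative sketch only: function names and the IDW power are hypothetical choices.
from math import hypot
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # (x [m], y [m], reflectivity value)


def idw(samples: Sequence[Sample], x: float, y: float, power: float = 2.0) -> float:
    """Inverse distance-weighted estimate at grid node (x, y); power=2 is an assumed default."""
    num = den = 0.0
    for sx, sy, value in samples:
        d = hypot(sx - x, sy - y)
        if d == 0.0:
            return value  # the node coincides with a sample
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def to_constant_altitude(h_low: float, v_low: float, h_high: float, v_high: float,
                         target: float = 500.0) -> float:
    """Linearly interpolate between the two scans bracketing the target altitude (m)."""
    if h_high <= h_low or not h_low <= target <= h_high:
        raise ValueError("scans must bracket the target altitude")
    t = (target - h_low) / (h_high - h_low)
    return v_low + t * (v_high - v_low)
```

For two samples 100 m apart, a node at 25 m from the first receives weights in the ratio 9:1, so values of 10 and 20 give 11.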
Non-meteorological echoes were removed, and the resulting gaps in the radar data were filled, using the AnDRe software package, which is in operational use at the FMI [32]. Reflectivity measurements from six elevation angles were combined by interpolating them to a constant altitude level of 500 m. Radar reflectivity (Z) was converted into rainfall intensity (R) with a Z-R relationship of the form $Z = aR^b$ adapted to the Finnish climate [33], where Z is in units of millimeters to the sixth power per cubic meter and R is in millimeters per hour.
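The reflectivity-to-rain-rate step inverts the power law $Z = aR^b$ after converting reflectivity from dBZ to linear units. The coefficients of the Finnish relation in [33] are not given in this text, so the sketch below uses the classical Marshall-Palmer values (a = 200, b = 1.6) purely as placeholders; the function names are hypothetical.

```python
# Illustrative sketch only: the default coefficients are Marshall-Palmer placeholders,
# NOT the Finnish Z-R coefficients of [33].


def dbz_to_z(dbz: float) -> float:
    """Convert reflectivity factor from dBZ to linear units (mm^6 m^-3)."""
    return 10.0 ** (dbz / 10.0)


def rain_rate(dbz: float, a: float = 200.0, b: float = 1.6) -> float:
    """Invert Z = a * R**b to obtain the rain rate R in mm/h."""
    return (dbz_to_z(dbz) / a) ** (1.0 / b)
```

Replacing `a` and `b` with the coefficients from [33] is all that would change for the Finnish relation.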
Rainfall intensities were interpolated onto a 100 × 100 m grid enclosing each catchment. The final rainfall data were available at an hourly scale: hourly rainfall accumulations were obtained by averaging the intensities measured every five minutes. Finally, the rainfall accumulations were bias-corrected using the two rain gauges located in the Espoo area, with correction factors estimated separately for each year in the dataset.
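The temporal aggregation and gauge correction can be sketched as below: twelve five-minute intensities (mm/h) are averaged into an hourly accumulation (mm), and each year's values are scaled by a yearly factor. The text does not give the form of the correction factor, so the ratio of total gauge rainfall to total radar rainfall at the gauge pixels is an assumption, as are the function names.

```python
# Illustrative sketch only: function names and the form of the bias factor are assumptions.
from statistics import fmean
from typing import Dict, List, Sequence


def hourly_accumulations(five_min_rates: Sequence[float]) -> List[float]:
    """Average each block of twelve 5-min intensities (mm/h) into an hourly total (mm)."""
    if len(five_min_rates) % 12:
        raise ValueError("expected complete hours of 5-minute scans")
    return [fmean(five_min_rates[i:i + 12]) for i in range(0, len(five_min_rates), 12)]


def yearly_bias_factor(gauge_mm: Sequence[float], radar_mm: Sequence[float]) -> float:
    """Assumed factor: total gauge rainfall divided by total radar rainfall at gauge pixels."""
    return sum(gauge_mm) / sum(radar_mm)


def correct(hourly_mm: Dict[int, List[float]], factors: Dict[int, float]) -> Dict[int, List[float]]:
    """Scale each year's hourly accumulations by that year's factor."""
    return {year: [v * factors[year] for v in values] for year, values in hourly_mm.items()}
```

Averaging works because a mean intensity in mm/h over one hour equals the accumulation in mm for that hour.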
Wastewater-flow data were collected over four to seven snowless months in each year from 2012 to 2014, altogether covering 17 months of time-series wastewater-flow data. The data-collection period was selected to ensure that frost and snowmelt did not falsify the results. Hourly flow data received from the two pumping stations were used for the analysis, and the quality of the SCADA-based flow data was found to be adequate. At the pumping stations, the hourly flow was estimated by multiplying the registered number of times a pump well was emptied in an hour by the well volume; the SCADA system then sent this information to a control room. In the data-preprocessing phase, obvious errors were removed from the dataset. The two subcatchments did not experience sewer overflows within the data-collection period, which ensured that the flow dataset captures a typical inflow and infiltration response to a rainfall event.
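The pumping-station flow estimate described above is a simple product of a count and a volume. A minimal sketch, in which the well volume, the plausibility limit `max_emptyings`, and the function name are hypothetical stand-ins for the unspecified removal of obvious errors:

```python
# Illustrative sketch only: the plausibility limit and the function name are hypothetical.
from typing import List, Optional, Sequence


def hourly_flow(emptyings_per_hour: Sequence[Optional[int]], well_volume_m3: float,
                max_emptyings: int = 60) -> List[Optional[float]]:
    """Estimate hourly flow (m^3/h) as the emptying count times the pump-well volume.

    Counts that are missing, negative, or above the hypothetical plausibility
    limit are treated as obvious errors and returned as None.
    """
    flows: List[Optional[float]] = []
    for count in emptyings_per_hour:
        if count is None or count < 0 or count > max_emptyings:
            flows.append(None)
        else:
            flows.append(count * well_volume_m3)
    return flows
```

For example, four emptyings of a 2.5 m³ well in one hour correspond to 10 m³/h.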
According to the definition provided by the FMI, a dry day is a day in which the amount of rainfall is less than 0.3 mm/day, and a rainy (wet) day is a day in which the total amount of rainfall varies from 1 to 4.4 mm/day [30]. Based on this definition, three rainfall-threshold values were defined in this study: 0.3, 1, and 2 mm/day. Each threshold was then divided by 24 h to obtain an equivalent hourly rainfall threshold. For each pumping station, the flow dataset was divided into dry- and wet-weather datasets according to each of the three threshold values: hours with rainfall below the threshold formed the dry-weather dataset, and the remaining hours formed the wet-weather dataset. Each dataset was then normalized and used for both AI-based models. Table 1 lists the input and output variables for the ANFIS and MLPNN models.

Table 1. Input and output variables for the adaptive neurofuzzy inference system (ANFIS) and multilayer perceptron neural network (MLPNN) models.

Input variables: time (hours); wastewater flow rate (m³/h)
Output variable: predicted wastewater flow at the corresponding time (hours)
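The dry/wet split and normalization can be sketched as follows, assuming each record pairs an hour of the day with that hour's rainfall (mm) and flow (m³/h). The daily threshold is divided by 24 and compared with hourly rainfall, and each variable of a resulting dataset can then be scaled to [0, 1] with min-max scaling. The record layout and function names are illustrative assumptions.

```python
# Illustrative sketch only: record layout and function names are hypothetical.
from typing import List, Sequence, Tuple

Record = Tuple[int, float, float]  # (hour of day, rainfall [mm in that hour], flow [m^3/h])


def split_dry_wet(records: Sequence[Record], threshold_mm_per_day: float
                  ) -> Tuple[List[Record], List[Record]]:
    """Hours with rainfall below threshold/24 are dry; all other hours are wet."""
    hourly_threshold = threshold_mm_per_day / 24.0
    dry = [r for r in records if r[1] < hourly_threshold]
    wet = [r for r in records if r[1] >= hourly_threshold]
    return dry, wet


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max feature scaling of one variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Running `split_dry_wet` once per threshold (0.3, 1, and 2 mm/day) yields the three dry/wet dataset pairs per station; an hour with 0.05 mm of rain is wet at the 1 mm/day threshold (0.0417 mm/h) but dry at 2 mm/day (0.0833 mm/h).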

Water Consumption in Study Areas
Water-consumption variations need to be considered when studying I/I, since water consumption is the main component of sewer base flow. In this research, water-consumption data were available for both subcatchment areas for each quarter of the years 2012 to 2014 and were estimated from water-consumption billing information. The water consumption of Subcatchments 1 and 2 was 36,000 and 62,000 m³/year, respectively. The analysis showed that water consumption from April to June and from July to September was approximately 1.2% and 0.4%, respectively, lower than the whole-year average, which indicated that seasonal variation in water consumption has little effect on the sewer base flow.

Adaptive Neurofuzzy Inference System (ANFIS)
Fuzzy-set theory was first introduced by Zadeh as a mathematical theory of vagueness [34]. If X is the universe of discourse and its elements are denoted by x, then a fuzzy set A in X is defined as a set of ordered pairs

$$A = \{(x, \mu_A(x)) \mid x \in X\},$$

where $\mu_A(x)$ is called the membership function (MF) of x in A. The MF maps each element of X to a membership value between 0 and 1, and this membership degree measures the extent to which an input value belongs to a given fuzzy set. For instance, input value x more likely belongs to a "low" MF than to a "medium" MF if $\mu_x(\text{low})$ is greater than $\mu_x(\text{medium})$. Fuzzy rules, such as "if input X is low and input Y is medium, then output Z is low," are then used to express the relationship between input and output. The fuzzy operators OR, AND, and NOT in a fuzzy rule describe the union, intersection, and complement, respectively, of the input MFs. MFs can take various shapes; for instance, a Gaussian MF depends on two parameters, the center c and the width σ:

$$\mu(x) = \exp\left(-\frac{(x - c)^2}{2\sigma^2}\right).$$

The author of [35] introduced the ANFIS principles. Figure 2 illustrates a general ANFIS architecture in five steps (layers) [35]. In the first step, input variables X and Y are specified, and each input variable is described by two MFs. In this case, two fuzzy rules are created for the two input variables, where $\{a_i, b_i, r_i\}$ is the set of consequent parameters and x and y are the values of inputs X and Y:

Rule 1: If input X is $\mu_{x1}$ and input Y is $\mu_{y1}$, then $f_1 = a_1 x + b_1 y + r_1$.
Rule 2: If input X is $\mu_{x2}$ and input Y is $\mu_{y2}$, then $f_2 = a_2 x + b_2 y + r_2$.
In the second step, the firing strength of each rule is computed by multiplying the incoming MF values, $w_i = \mu_{xi}(x)\,\mu_{yi}(y)$. In the third step, the firing strengths are normalized by dividing the strength of the ith rule by the sum of the strengths of all rules, $\bar{w}_i = w_i / (w_1 + w_2)$. In the fourth step, each normalized firing strength $\bar{w}_i$ is multiplied by the consequent part of its rule (function $f_i$). In the last step, the overall output Z is computed as the sum of all incoming signals, $Z = \sum_i \bar{w}_i f_i$.
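The five steps described above can be traced in a minimal forward pass for two inputs and two rules. The sketch assumes Gaussian MFs, the product for the rule firing strength, and the standard first-order Sugeno consequent in which $f_i$ is linear in the input values x and y; the function names and all numbers are illustrative, not the trained model from the study.

```python
# Illustrative sketch only: function names and parameter values are hypothetical.
from math import exp
from typing import Sequence, Tuple

Gauss = Tuple[float, float]               # (centre c, width sigma) of a Gaussian MF
Consequent = Tuple[float, float, float]   # (a_i, b_i, r_i)


def gaussmf(x: float, c: float, sigma: float) -> float:
    """Gaussian membership degree exp(-(x - c)^2 / (2 sigma^2))."""
    return exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(x: float, y: float, mf_x: Sequence[Gauss], mf_y: Sequence[Gauss],
                  consequents: Sequence[Consequent]) -> float:
    """First-order Sugeno output; rule i pairs the i-th MF of X with the i-th MF of Y."""
    # Steps 1-2: membership degrees, then firing strengths w_i via the product.
    w = [gaussmf(x, *mx) * gaussmf(y, *my) for mx, my in zip(mf_x, mf_y)]
    total = sum(w)
    # Step 3: normalised firing strengths.
    w_bar = [wi / total for wi in w]
    # Steps 4-5: weight each rule output f_i = a_i x + b_i y + r_i, then sum.
    return sum(wb * (a * x + b * y + r) for wb, (a, b, r) in zip(w_bar, consequents))
```

With MFs centred at 0 and 1 and an input halfway between them, both rules fire equally, so the output is the plain average of the two rule outputs. In a grid-partitioned system, every combination of MFs across inputs would form a rule instead of the two paired rules used here.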
ANFIS is an adaptive network that consists of nodes and the directional links that connect them. The outputs depend on the parameters of these nodes, and a learning rule specifies how these parameters should be changed to minimize a prescribed error measure [35]. The ANFIS uses a hybrid learning algorithm [35]. Assume that the adaptive network under consideration has only one output.
Its output can be written as

$$\text{output} = F(I, S),$$

where I is the set of input variables and S is the set of parameters. If there exists a function H such that the composite function $H \circ F$ is linear in some of the elements of S, these elements can be identified by the least squares method [29]. For instance, if S can be decomposed into the direct sum of two sets, $S = S_1 \oplus S_2$, such that $H \circ F$ is linear in the elements of $S_2$, then applying H to the output gives

$$H(\text{output}) = H \circ F(I, S),$$

which is linear in the elements of $S_2$.
Plugging the P training data pairs into Equation (7) yields the matrix equation AX = B, where X is an unknown vector whose elements are the parameters in $S_2$. A least squares estimate (LSE) of X, $X^*$, is sought to minimize the squared error $\|AX - B\|^2$; it can be obtained with the pseudoinverse of A, $X^* = (A^T A)^{-1} A^T B$. Because computing the pseudoinverse is expensive and becomes ill-defined when $A^T A$ is singular, sequential formulas are employed to compute the LSE of X. Let the ith row vector of matrix A be $a_i^T$ and the ith element of B be $b_i^T$; then X can be calculated iteratively:

$$X_{i+1} = X_i + S_{i+1}\, a_{i+1} \left(b_{i+1}^T - a_{i+1}^T X_i\right),$$
$$S_{i+1} = S_i - \frac{S_i\, a_{i+1} a_{i+1}^T S_i}{1 + a_{i+1}^T S_i\, a_{i+1}}, \qquad i = 0, 1, \ldots, P - 1,$$

with initial conditions $X_0 = 0$ and $S_0 = \gamma I$, where γ is a large positive number,
where $S_i$ is often called the covariance matrix, and the least squares estimate $X^*$ is equal to $X_P$. For a multi-output adaptive network, $b_i^T$ is the ith row of matrix B. In the ANFIS, each epoch of this hybrid learning procedure consists of a forward pass and a backward pass. In the forward pass, input data and functional signals go forward to calculate each node output until the matrices A and B in AX = B are obtained, and the parameters in $S_2$ are identified by the sequential least squares formulas (Equations (7) and (8)). The functional signals then keep going forward until the error measure is calculated. In the backward pass, error rates propagate from the output end towards the input end, and the parameters in $S_1$ are updated by gradient descent. An error tolerance related to the error size is used as the training stopping criterion: training stops once the training-data error falls within this tolerance.
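The sequential formulas described above can be sketched in plain Python for the single-output case. Initializing $X_0 = 0$ and $S_0 = \gamma I$ with a large γ is the usual convention for this recursion; the function name, the list-of-lists matrix representation, and γ = 10⁶ are illustrative choices. In an ANFIS with first-order rules, each row $a_i$ would hold the normalized firing strengths multiplied by the input values (and by 1 for the constants $r_i$), so that the unknowns are the consequent parameters in $S_2$.

```python
# Illustrative sketch only: function name and gamma are hypothetical choices.
from typing import List, Sequence


def sequential_lse(rows: Sequence[Sequence[float]], targets: Sequence[float],
                   gamma: float = 1e6) -> List[float]:
    """Sequential (recursive) least squares for AX = B with a single output.

    S starts as gamma * I and X as zeros; after all P rows, X approximates
    the least squares estimate X*.
    """
    n = len(rows[0])
    S = [[gamma if i == j else 0.0 for j in range(n)] for i in range(n)]
    X = [0.0] * n
    for a, b in zip(rows, targets):
        Sa = [sum(S[i][k] * a[k] for k in range(n)) for i in range(n)]
        denom = 1.0 + sum(a[i] * Sa[i] for i in range(n))
        # S_{i+1} = S_i - S_i a a^T S_i / (1 + a^T S_i a); S is symmetric, so S_i a a^T S_i = Sa Sa^T.
        S = [[S[i][j] - Sa[i] * Sa[j] / denom for j in range(n)] for i in range(n)]
        # X_{i+1} = X_i + S_{i+1} a (b - a^T X_i)
        residual = b - sum(a[i] * X[i] for i in range(n))
        Sa_new = [sum(S[i][k] * a[k] for k in range(n)) for i in range(n)]
        X = [X[i] + Sa_new[i] * residual for i in range(n)]
    return X
```

Fitting rows `[x, 1]` to targets `2x + 1` for x = 0, 1, 2, 3 returns an estimate very close to `[2, 1]`; the small remaining deviation comes from the finite γ, which acts like a weak regularization term.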

Multilayer Perceptron Neural Network
Artificial neural networks (ANNs) are a first-order mathematical approximation of the human nervous system and have been widely used for modelling nonlinear systems [24]. ANN models are organized in three successive layers: input, hidden, and output layers. The input layer contains the input variables, and the hidden layer contains several neurons, whose number is determined by trial and error. First, each neuron in the hidden layer sums the input variables multiplied by the corresponding weights. Second, the result is passed through a nonlinear activation function, generally the sigmoid, and forwarded to the output layer. The output layer has only one neuron, which corresponds to the dependent variable. Equation (9) gives the mathematical formula of the MLPNN with one hidden layer containing n neurons and one output layer with only one neuron:

$$\hat{y} = f_2\left(\sum_{j=1}^{n} w_{jk}\, f_1\left(\sum_{i} w_{ij}\, x_i + \delta_j\right) + \delta_0\right) \qquad (9)$$

where $x_i$ is the input variable, $w_{ij}$ is the weight between input i and hidden neuron j, $\delta_j$ is the bias of hidden neuron j, $f_1$ is the sigmoid activation function, $w_{jk}$ is the weight of the connection from neuron j in the hidden layer to the single neuron k in the output layer, $\delta_0$ is the bias of output neuron k, and $f_2$ is a linear activation function for the output neuron. We chose the MLPNN for comparison because it is common in supervised learning and has been compared with ANFIS in many other applications [28,29]. In this study, the proposed MLPNN has three layers: one input layer, one hidden layer, and one output layer. The general structure of the MLPNN is illustrated in Figure 3.
The input layer has two variables: the flow value and its corresponding time in hours, and the output layer value is the predicted wastewater-flow value at its corresponding time in hours.
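Equation (9) can be read as the following forward pass, with a logistic sigmoid $f_1$ in the hidden layer and an identity $f_2$ at the single output neuron. For this study the input vector would hold the two normalized inputs (time and flow). The function names and weights are illustrative, not the trained network, and training itself (e.g., by backpropagation) is not shown.

```python
# Illustrative sketch only: function names and weights are hypothetical.
from math import exp
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic activation f1."""
    return 1.0 / (1.0 + exp(-z))


def mlp_forward(x: Sequence[float], w_ih: Sequence[Sequence[float]], b_h: Sequence[float],
                w_ho: Sequence[float], b_o: float) -> float:
    """y = f2(sum_j w_jk * f1(sum_i w_ij * x_i + delta_j) + delta_0) with f2 the identity.

    w_ih[j][i] holds w_ij (input i -> hidden neuron j), b_h[j] holds delta_j,
    w_ho[j] holds w_jk, and b_o holds delta_0.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_j, x)) + d_j) for w_j, d_j in zip(w_ih, b_h)]
    return sum(w_jk * h for w_jk, h in zip(w_ho, hidden)) + b_o
```

With all input-to-hidden weights and biases at zero, each hidden neuron outputs 0.5, so two output weights of 2 give an output of 2.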
Model Evaluation
In this study, ANFIS and MLPNN model performance was evaluated using the root mean square error (RMSE) [36] and the coefficient of determination (R²). RMSE describes the average difference between experimental (observed) values and estimated values, as expressed by Equation (10):

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(y_j - \hat{y}_j\right)^2} \qquad (10)$$

where N is the total number of data pairs, $y_j$ is the experimental value, and $\hat{y}_j$ is the estimated value.
In the ANFIS, RMSE is used to estimate the training and checking errors. The training (or checking) error is the difference between a training (or checking) data output value and the output of the ANFIS for the same training (or checking) input value [18]; the RMSE of these errors is recorded for the training (or checking) data at each epoch. R² is the coefficient of determination, i.e., the proportion of the variance in the dependent variable that is predictable from the independent variable (see Equation (11)).
In the MLPNN model, RMSE is often used to define the network error. The weights ($w_{ij}$) and biases ($\delta_0$) (see Equation (9)) are free parameters that are adjusted, once the structure of the neural network is defined, so as to minimize the RMSE. Data normalization is an important step in modelling the I/I rate with the MLPNN model: it removes dimensional differences in the data and improves the prediction ability of the MLPNN model. In this study, all variables were normalized using min-max feature scaling, $x' = (x - x_{\min})/(x_{\max} - x_{\min})$, which brings all values into the range from zero to one [37].
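The two evaluation measures can be computed as below. RMSE follows the definition in Equation (10); Equation (11) is not reproduced in this text, so R² is computed here with the common form 1 − SS_res/SS_tot, which is an assumption about the exact variant used. The function names are illustrative.

```python
# Illustrative sketch only: function names are hypothetical; the R^2 form is assumed.
from math import sqrt
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error over N data pairs."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    return sqrt(sum((y - yh) ** 2 for y, yh in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form of Eq. (11))."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the observed mean scores R² = 0 under this definition, and a perfect fit scores 1.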

Results
In the ANFIS model, a parameterized model structure of membership functions and rules was generated, and eight Gaussian MFs were created for each training process. The number of MFs was chosen so that the training and checking errors were reduced to an adequate level. The grid-partition method was used to generate the fuzzy-inference system (FIS). In the grid-partition method, the input space is divided into rectangular subspaces by axis-parallel partitions based on a predefined number and type of MFs in each dimension. Because the number of fuzzy rules grows exponentially with the number of input variables, the grid-partition method is best suited to cases with few input variables. A hybrid learning algorithm was used to train the FIS model, and zero error tolerance was used as the criterion for stopping the training; training stops when either the maximal number of epochs is reached or the training-error goal is achieved. In the next step, we trained the MLPNN regressor and fitted it to the existing datasets. Finally, the ANFIS and MLPNN models were validated by computing RMSE and R². The model-evaluation results are given in Tables 2 and 3. ANFIS had a much better RMSE than the MLPNN model for almost all input datasets, the exception being the wet-weather scenario of Station 2. For the ANFIS model, the RMSE values ranged from 0.07 to 0.1199, which is reasonably good. Table 3 indicates similar results: the ANFIS has better R² performance than the MLPNN model except for the Station 2 dry-weather scenario with a threshold value of 2 mm/day. The results of the trained ANFIS and MLPNN models for Subcatchments 1 and 2 are illustrated in Figures 4 and 5, with the time in hours on the x-axis and the corresponding model output on the y-axis.
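With grid partitioning, every combination of one MF per input forms a rule, so the rule count is (number of MFs)^(number of inputs). If the eight MFs are counted per input, the two inputs of Table 1 give 8² = 64 rules, and a third input would already give 512. The sketch below also spreads Gaussian MF centres evenly over the normalized range [0, 1]; the even spacing, the width choice, and the function name are assumptions, not settings reported in the study.

```python
# Illustrative sketch only: even spacing, sigma choice, and function name are hypothetical.
from itertools import product
from typing import List, Tuple


def grid_partition(n_inputs: int, n_mfs: int) -> Tuple[List[Tuple[float, float]], List[Tuple[int, ...]]]:
    """Evenly spaced Gaussian MFs on [0, 1] and the full grid of rule antecedents.

    Each rule is a tuple holding one MF index per input.
    """
    if n_mfs < 2:
        raise ValueError("need at least two MFs per input")
    step = 1.0 / (n_mfs - 1)
    mfs = [(i * step, step / 2.0) for i in range(n_mfs)]  # (centre c, hypothetical sigma)
    rules = list(product(range(n_mfs), repeat=n_inputs))
    return mfs, rules
```

The exponential growth of `rules` with `n_inputs` is the reason the text recommends grid partitioning only for problems with few inputs.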
Models for dry-weather flow for pumping Stations 1 and 2 are presented with blue and yellow curves, and those for wet-weather flow with red and purple curves. The area between the red and blue curves represents the amount of inflow within Subcatchment 1, and the area between the purple and yellow curves represents the amount of inflow within Subcatchment 2. The curves for both subcatchments were plotted together in Figures 4 and 5, which allowed the inflow levels of the two stations to be compared. The curves of each pumping station for the three rainfall-threshold values were then plotted to examine the sensitivity of the models to a change of rainfall-threshold value.

According to Figure 4, both subcatchments experienced inflow, since the wet-weather curve was above the dry-weather curve most of the time. Subcatchment 1 experienced higher levels of inflow than Subcatchment 2, with some exceptions between 15:00 and 20:00 at rainfall-threshold values of 0.3 and 1 mm/day for the ANFIS model. In addition, the inflow increased for both subcatchments as the rainfall-threshold value rose. For instance, Subcatchment 1 suffered from inflow even under mild rainfall (threshold of 0.3 mm/day), and its two curves were already clearly apart. With a threshold value of 2 mm/day, the difference was even more evident, which indicated that the amount of flow inside the sewer network increased significantly under heavy rainfall.
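The inflow indicator read from the figures, the area between the wet- and dry-weather curves, can be approximated from two hourly model outputs by summing the positive differences (a rectangle rule with one-hour steps). Counting only the hours in which the wet curve lies above the dry curve is our simplification, and the function name is hypothetical; the text does not specify how the area was quantified.

```python
# Illustrative sketch only: function name and area definition are assumptions.
from typing import Sequence, Tuple


def inflow_area(dry: Sequence[float], wet: Sequence[float]) -> Tuple[float, int]:
    """Return (area where wet > dry, number of hours with wet > dry) for hourly curves."""
    if len(dry) != len(wet):
        raise ValueError("curves must cover the same hours")
    diffs = [w - d for d, w in zip(dry, wet)]
    positive = [d for d in diffs if d > 0]
    return sum(positive), len(positive)
```

Applied to the normalized ANFIS curves of the two stations, a larger area would correspond to the higher inflow level reported for Subcatchment 1.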
In addition to inflow, the effect of infiltration can be identified by studying the minimal night-time flow levels of the subcatchments. For each ANFIS model, the input flow values were normalized; therefore, in an ideal case without infiltration, the minimal night-time flow should be the same (a normalized value of zero) for both the dry- and wet-weather scenarios. Night-time minimal flow represents the period of minimal sanitary flow, and a high percentage of night-time minimal flow may be attributed to groundwater infiltration [31]. However, for both subcatchments, the flow was always above the zero level, which indicated that the flow was typically above the minimal flow value. If the hourly flow data covered only a short period, elevated minimal night-time flows could be caused by atypical water consumption occurring by chance during the studied period. However, the data period used in the study was relatively long, 17 months in total. Therefore, frequent night-time flow values above the minimum were attributed not to unusual water consumption but to infiltration, since these two subcatchments are mainly residential areas without large industrial consumers. The ANFIS output values for minimal night-time flow were around 0.07 and 0.06 for Subcatchments 1 and 2, respectively. The infiltration ratio during maximal flow conditions could be calculated by dividing the minimal dry-weather night-time flow by the maximal dry-weather flow. Using the 1 mm/day threshold value, Subcatchments 1 and 2 had infiltration ratios of approximately 40% and 17%, respectively. Figure 5 illustrates the results of the MLPNN model. The MLPNN model produced a pattern similar to that of the ANFIS model, with peak values in the morning and in the evening. The wet-weather curves were mostly above the dry-weather curves for both stations, which indicated the possibility of inflow and infiltration.
However, the curves produced by the MLPNN model fluctuated, because the initial values of the perceptron model were random and the model needed a larger dataset to converge. Where there were no data points to calibrate the random initial values, the results could be far from the expected values, which generated excessive fluctuation.
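The night-time-minimum check and the infiltration ratio discussed earlier in this section can be sketched on a 24-value hourly dry-weather profile of un-normalized flows (m³/h). The night window (hours 0 to 5), the function names, and computing the ratio as minimal night-time flow divided by maximal flow (the orientation that yields a percentage below 100%) are assumptions made for illustration.

```python
# Illustrative sketch only: night window, function names, and ratio orientation are assumptions.
from typing import Sequence


def night_minimum(profile: Sequence[float], night_hours: range = range(0, 6)) -> float:
    """Minimal flow within an assumed night window of a 24-hour profile."""
    return min(profile[h] for h in night_hours)


def infiltration_ratio(profile: Sequence[float], night_hours: range = range(0, 6)) -> float:
    """Minimal night-time flow as a fraction of the maximal daily flow."""
    return night_minimum(profile, night_hours) / max(profile)
```

A profile with a 03:00 minimum of 2 m³/h and a 07:00 peak of 10 m³/h gives a ratio of 0.2, i.e., 20%.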
In this research work, flow data for typical weekdays were used to ensure that water-consumption behavior in the study area was always similar. According to Figure 4 (rainfall-threshold value 0.3 mm/day), minimal flow occurred around 03:00; flow then increased rapidly and reached its peak at around 07:00, and another peak appeared at around 20:00. This indicates that most people living in the catchment area use more water around 07:00, before leaving for work or school, and around 20:00, before going to sleep. The subcatchments of both pumping stations were relatively small, so the time needed for flow to reach the pumping station from the farthest reaches of the network was on the order of minutes; flow delay would therefore not affect the results. Figures 6 and 7 illustrate the sensitivity of the ANFIS and MLPNN models to a change of rainfall-threshold value. The results showed that neither model was sensitive to a change of the threshold value in the dry-weather scenario, but both were sensitive in the wet-weather scenario. The reason is that the wet-weather dataset contained far fewer data than the dry-weather dataset: when the threshold value increased, the amount of wet-weather data decreased rapidly, which made the model results sensitive to the threshold. Compared to the ANFIS model, the MLPNN model was more sensitive to a change of threshold value for the wet-weather scenario in Station 2; that is, the same change of threshold value made the MLPNN curve vary more than the ANFIS curve. This is because the perceptron model used a random initial mapping that could exaggerate the difference.

Discussion
This article introduced an AI-driven approach to estimate the I/I of two subcatchments. The proposed AI-based models have several advantages in estimating I/I values. Because a subcatchment is defined as a surface area that potentially contributes rainfall-induced runoff to the flow at a pumping station, the approach enables spatial analysis of the I/I problem and helps to identify locations where sewer maintenance is needed. Therefore, the results of the study can be used to support spatial decision making for sewer system maintenance. The ANFIS performed better than the MLPNN model in predicting the inflow and infiltration of the subcatchments. One advantage is that hourly flow data that include outliers and uncertainties can be used, since the ANFIS captures the typical pattern of the majority of data values. On the other hand, the ANFIS cannot be used to estimate inflow and infiltration under extreme conditions, since extremely high- and low-flow data points were relatively few and were not captured by the ANFIS model.
We used radar-based rainfall measurements in this study to estimate rainfall over the sewer subcatchments. There are two types of uncertainty in radar-based rainfall measurements. Systematic biases can arise from radar calibration issues, wet-radome attenuation, or a reflectivity-rain rate conversion that is not appropriate for the prevailing weather situation. These can lead to over- or underestimation of rainfall over the whole catchment area. If the radar data are gauge-corrected, systematic biases are not expected to persist over long periods. Random errors can occur at individual locations inside the catchment grid, for instance, due to ground clutter, beam blockage, or attenuation of the radar beam. Note also that bias correction of radar data by rain gauges is not guaranteed to remove all errors: a yearly correction factor based on two rain gauges might not be sufficient for correcting transient errors, errors that occur on a very small scale, or errors that occur far from the gauge sites.
In this research, the studied subcatchments were relatively small, thus the time that the flow took from the farthest reaches of the network to reach the pumping station was short (within an hour). In the future, maximal flow delay (e.g., more than an hour) should be incorporated into the model process to make this data-driven approach suitable for different subcatchment sizes. Furthermore, groundwater infiltration was not considered in this research due to the lack of a groundwater-level dataset. If a subcatchment experiences constant groundwater infiltration, e.g., pipes being continuously below the groundwater table, this data-driven approach cannot differentiate between actual sewage flow and groundwater infiltration. In the future, this problem can be solved, e.g., by comparing the groundwater level to pipe locations and finding locations that have a high probability of groundwater infiltration.
We also applied a perceptron neural network model. However, we found that this model had lower performance than the ANFIS and that it has some limitations. The perceptron model relies on randomly generated initial values, which can considerably increase fluctuations: the results were not stable across repeated runs, and the fluctuations would be significantly larger with a small dataset. In the future, other types of machine learning models, such as long short-term memory (LSTM) networks, could also be used to conduct time-series analysis of the wastewater-flow data.
One of the objectives of this research work was to use flow data that were automatically collected from pumping stations to analyze the I/I. This entire analysis could be automated in the future since installations that are needed for collecting flow data at pumping stations are already in place. Flow estimation was originally meant to serve automation at pumping stations; therefore, the quality is not yet sufficient for the quantification of I/I in all situations. In the future, after the quality of the flow data has improved, there will be better chances of also estimating the response of the network to individual rainfall events.

Conclusions
In this article, two AI-based methods (ANFIS and MLPNN) were developed that use an hourly flow dataset derived from sanitary sewer pumping stations to aid in I/I estimation and sanitary-sewer-system maintenance. The results were validated by computing the RMSE and R² value for each model. The fuzzy model had an overall higher performance than the MLPNN model. Three rainfall-threshold values were used to analyze the sensitivity of the models. Using a different threshold value only slightly affected the dry-weather curves but significantly affected the wet-weather curves. The results indicated that both subcatchments suffered from both inflow and infiltration, and that Subcatchment 1 had a higher inflow level than Subcatchment 2.
The effect of infiltration could be identified by studying the minimal dry-weather flow levels of each subcatchment. The normalized minimal dry-weather flows were above zero, which indicated that the minimal dry-weather flow is typically higher than the flow caused by water consumption alone. Therefore, both subcatchments experienced infiltration, Subcatchment 1 to a higher degree. Based on these results, we recommend that additional studies be carried out for Subcatchment 1 to further identify the causes of its high levels of both inflow and infiltration.
Author Contributions: Z.Z.: conceptualization, data curation, investigation, methodology, project administration, software development, supervision, visualization, writing (original draft), and reviewing and editing; T.L.: data curation, writing (original draft), and reviewing and editing; Z.W.: software development, writing (original draft), and reviewing and editing; S.P.: data curation, writing (original draft), and reviewing and editing; S.A.: data curation, reviewing and editing; K.V.: supervision, reviewing and editing; Y.L.: supervision, reviewing and editing; C.Z.: supervision, reviewing and editing; R.V.: supervision, reviewing and editing; Z.S.: supervision, reviewing and editing; X.C.: reviewing and editing. All authors have read and agreed to the published version of the manuscript.