Development and Comparison of Prediction Models for Sanitary Sewer Pipes Condition Assessment Using Multinomial Logistic Regression and Artificial Neural Network

Abstract: A sanitary sewer pipe system in good condition is essential for the safe conveyance of wastewater from homes, businesses, and industries to wastewater treatment plants. Most water utilities have aged sanitary sewer pipes, and they inspect those pipes to decide which segments need rehabilitation or replacement. This inspection process, known as condition assessment, is costly, which motivates developing a model that predicts the condition rating of sanitary sewer pipes. The objective of this study is to develop Multinomial Logistic Regression (MLR) and Artificial Neural Network (ANN) models to predict sanitary sewer pipe condition ratings using inspection and condition assessment data. The MLR and ANN models are developed from the City of Dallas's data. The MLR model is built using a randomly selected 80% of the data and validated using the remaining 20%. The ANN model is trained, validated, and tested. The significant physical factors influencing sanitary sewer pipe condition rating include diameter, age, pipe material, and length. Soil type is the environmental factor that influences condition rating. The prediction accuracy of the MLR and ANN models is found to be 75% and 85%, respectively. This study contributes to the body of knowledge by developing models that predict sanitary sewer pipe condition ratings, enabling policymakers and sanitary sewer utility managers to prioritize the pipes to be rehabilitated and/or replaced.


Introduction
The provision of wastewater services to communities and municipalities is essential for public health, safety, and socioeconomic development. This requires wastewater infrastructure systems that collect wastewater from homes, businesses, and industry and convey it to the treatment plants. The components of wastewater infrastructure systems include service laterals, sewer pipelines, manholes, force mains, siphons, combined sewer overflow regulations, pumping stations, and wet wells [1][2][3].
According to [4], wastewater infrastructure was given a D+ score. It is difficult to predict where or when an accidental pipeline failure will occur. To mitigate this risk, regulating agencies require that wastewater collection systems conduct periodic sewer inspections to comply with legal requirements. Due to limited budgets, however, not all segments of sewer pipes in wastewater collection systems can be inspected and assessed. To address this shortcoming in assessing the criticality of sewer pipes, utilities need pipe condition estimation models [5].
With limited budgets, policy makers and utility managers must make rational decisions in replacing and/or rehabilitating the pipelines. Asset managers need to make decisions regarding the selection of optimal rehabilitation action for each sewer condition [6]. Managing these assets rationally is, therefore, fundamental for the sustainability of the services and to the economy of societies [7]. To be able to make effective decisions, most water utilities are implementing asset management. This involves mapping and condition assessment of the wastewater collection systems. In reviewing the sustainability of urban water systems [8], it was concluded that identifying and implementing sustainable rehabilitation interventions in the long-term is essential for the survival of a high-service level urban water system.

Sewer Pipe Classification
Sewer pipes are classified based on several attributes. These include physical and environmental factors.

Physical Factors
The physical factors include pipe material, diameter, age, length, depth, and slope. Pipe materials fall into four categories: cement-based pipes, vitrified clay pipe (VCP), plastic pipes, and metallic pipes. Ref. [9] described the types within these categories as follows. Cement-based pipes include concrete pipes and asbestos-cement (AC) pipe. Concrete pipes are nonreinforced concrete pipe (CP), reinforced concrete pipe (RCP), prestressed concrete cylinder pipe (PCCP), reinforced concrete cylinder pipe, bar-wrapped steel-cylinder concrete pipe, and polymer concrete pipe (PCP). The plastic pipes are polyvinyl chloride (PVC) pipe, polyethylene (PE) pipe, and glass reinforced pipe (GRP or fiberglass pipe). Metallic pipes are ductile iron (DI) pipe and steel pipe. Ref. [10] found that material characteristics have the greatest impact on pipe condition.
Sewer pipe diameter is a significant variable in pipe condition rating. Most water collection systems use a minimum of 6 in. sewer pipes while a few others use a minimum of 8 in. These two common sizes notwithstanding, pipes can be greater than 96 in. Pipe diameter is one of the factors affecting deterioration of sewer pipes. Ref. [11] stated that some condition prediction models identified that sewer deterioration rate decreases with increasing diameter. "With occurrence of obstacles in the conduit, segments with small diameters are more likely to experience a hydraulic performance drop than large diameter ones" [12].
Pipe age is a significant variable in sewer pipe condition rating. Refs. [12,13] found that pipe age was a significant parameter in the sewer systems deterioration model. Ref. [10] stated that age has a negative impact on pipe condition. Ref. [14] established that pipe age influenced sewer pipe condition and found that poor sewer pipe condition is more common in pipes older than 50 years. "Most of the condition prediction models developed in previous studies show that pipe age has a significant relationship with deterioration of sewer pipes" [15].
Sewer pipe length is the length of a pipe segment measured from manhole to manhole. Ref. [16] noted that length is relevant in describing sewer deterioration even though it is secondary to pipe material. "Typically, longer manhole-to-manhole sewer pipe segments have higher deterioration rates because the probability of defects is greater in longer pipes" [9]. Ref. [10] established that pipe length is an important factor that influences pipe condition. Pipe depth is the distance from the ground surface to the top of the installed pipe. Ref. [17] determined that pipes buried at depths between 2 m (6 ft) and 3 m (9 ft) were least associated with poor sewer pipe condition. Ref. [9] stated that shallowly buried pipes are subject to more defects and a higher deterioration rate due to surface load, illegal connections, and tree root intrusion.
Slope is the gradient of a pipe installed from one manhole to the next. Ref. [12] described segment slope as a percentage per length of a segment. Slope determines the velocity of flow in the sewer. A flat slope encourages deposition of debris inside the pipe. Ref. [14] stated that negative slopes and extremely low slopes lead to debris accumulation and blockages. Ref. [18] found that negative and very low slopes were the most harmful conditions for sewer pipes, whereas the high velocities on steep slopes cause erosion of the pipe walls.

Environmental Factors
Environmental factors associated with sewer pipe condition rating include, but are not limited to, surface condition, soil type, soil pH, and corrosivity. Surface condition refers to the ground surface beneath which a sewer pipe is located. Ref. [9] stated that the location of a pipe affects the magnitude of surface loading to which it is subject. The most common soil types, classified by texture, are sand, loam, clay, and rock. Ref. [19] stated that the type of soil is a factor that affects ground loss and stability of the sewer pipeline. The interaction of the soil with the sewer pipes determines the deterioration of pipes.

Sewer Pipe Condition Prediction Models
Sewer pipes condition prediction models are utilized to determine the condition of non-inspected pipes. These assist operators of water utilities to develop renewal strategies of the pipes and to forecast the evolution of the condition of the sewer network under different investment strategies.

Statistical Methods
Statistical models establish relations between known pipe variables and the sewer pipe condition based on the condition assessment inspection data. Statistical models include discriminant analysis, logistic regression, binary regression, exponential regression, and Markov, Gompertz, and Bayesian models. "The model is calibrated using maximum likelihood fitting methods to provide the best match between model predictions and recorded failure data. Goodness of fit between model forecasts and actual observations is then demonstrated by comparison with a blind data set that was not part of the calibration process" [20].
Multiple linear regression analysis allows many observed factors to affect $y$. The general multiple linear regression model can be written as (Equation (1)):
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u \tag{1}$$
where $\beta_0$ is the intercept, $\beta_1$ is the parameter associated with $x_1$, $\beta_2$ is the parameter associated with $x_2$, and so on, and $u$ collectively contains the factors that cannot be included.
The equation below is a multiple regression where $Y_i$ is the predicted outcome for individual $i$ based on (a) the $Y$ intercept, $a$, the value of $Y$ when all predictor values are 0; (b) the products of the independent variables, $X_k$, and the regression coefficients, $b_k$; and (c) the residual, $\varepsilon_i$. It is followed by the odds and the logit (log odds) of an outcome with probability $\pi$ (Equations (2)-(5)):
$$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + \varepsilon_i$$
$$\text{Odds} = \frac{\pi}{1 - \pi}$$
$$\text{Logit}(\pi) = \ln\left(\frac{\pi}{1 - \pi}\right)$$
Ref. [2] used age, diameter, length, slope, and material, and built a logistic regression model as follows (Equation (5)):
$$Y^{*} = \alpha_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$$
where $Y^{*}$ is the unobservable conduit condition, $\alpha_0$ is the threshold, $\beta_1, \ldots, \beta_5$ are the regressor coefficients, and $x_1, \ldots, x_5$ are the five predictors listed above.
A logistic model describes the relationship between an outcome (i.e., dependent or response) and a set of prediction (i.e., independent or explanatory) variables, often referred to as covariates [21]. Equation (6) represents the logistic regression model according to [3]:
$$Y = a + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \tag{6}$$
where $Y$ is the dependent variable, $a$ is the intercept parameter, and $\beta_1, \ldots, \beta_p$ are the regression coefficients associated with the $p$ independent variables. The probability of $y = 1$, $\pi = p(y = 1 \mid x_1, \ldots, x_n)$, is determined using an exponential transformation. In this model, new values of $Y$ can be forecasted with new observed values of $X$. Equation (7) shows the general function of binary logistic regression:
$$\ln\left(\frac{\pi}{1 - \pi}\right) = a + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{7}$$
Here $\pi$ represents Pr(Y = 1), the probability associated with the outcome of condition 1. Consequently, $1 - \pi$ represents Pr(Y = 0), the probability of the outcome of condition 0, and $\pi/(1 - \pi)$ is the odds of having Y = 1.
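The relationship between the linear predictor, the probability π, and the odds can be illustrated with a short Python sketch. It is only a minimal example: the intercept, the two coefficients, and the predictor values are hypothetical and are not taken from this study.

```python
import math

def logistic_probability(intercept, coefficients, x):
    """Return pi = Pr(Y = 1 | x) for a binary logistic model."""
    # Linear predictor a + b1*x1 + ... + bn*xn, i.e. the logit (log odds)
    logit = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    # Exponential transformation of the logit into a probability
    return 1.0 / (1.0 + math.exp(-logit))

def odds(pi):
    """Odds of Y = 1, i.e. pi / (1 - pi)."""
    return pi / (1.0 - pi)

# Hypothetical values, for illustration only
a = -1.0
betas = [0.03, -0.02]   # one coefficient per predictor
x = [40.0, 8.0]         # e.g. age in years and diameter in inches

pi = logistic_probability(a, betas, x)
print(pi, odds(pi), math.log(odds(pi)))
```

Taking the natural logarithm of the odds recovers the linear predictor, which is why the coefficients of a logistic model are interpreted on the log-odds scale.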
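Multinomial logistic regression is an extension of binary logistic regression and can be used when the dependent variable is categorical and has more than two levels [3]:
$$\ln\left(\frac{P(Y = i)}{P(Y = k)}\right) = \beta_{0i} + \beta_{1i} x_1 + \beta_{2i} x_2 + \cdots + \beta_{ni} x_n \tag{8}$$
where $i = 1, 2, \ldots, k - 1$ correspond to the categories of the dependent variable, with category $k$ as the reference; the $x$s are the independent variables; $n$ is the number of independent variables; $\beta_{0i}$ is the intercept for category $i$; and the $\beta_{ji}$ are the regression coefficients of the independent variables defined for each category $i$.
Assuming three sewer pipe conditions and using Equations (8) and (9), Equations (10)-(12) represent the multinomial logistic regression for a pipe system with three condition levels 0, 1, and 2. Category zero (0) is used as the reference value. The model is developed with logit functions. To develop the model, the $p$ covariates and a constant term were denoted by the vector $x$ [22]. With $g_1(x)$ and $g_2(x)$ denoting the logit functions of conditions 1 and 2 relative to condition 0, the three probabilities can be written as:
$$P(Y = 0 \mid x) = \frac{1}{1 + e^{g_1(x)} + e^{g_2(x)}} \tag{10}$$
$$P(Y = 1 \mid x) = \frac{e^{g_1(x)}}{1 + e^{g_1(x)} + e^{g_2(x)}} \tag{11}$$
$$P(Y = 2 \mid x) = \frac{e^{g_2(x)}}{1 + e^{g_1(x)} + e^{g_2(x)}} \tag{12}$$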

Artificial Intelligence System
ANN is one of the modeling techniques of artificial intelligence. According to [23], some emerging techniques for artificial intelligence systems seek to make better use of human reasoning to solve problems involving incomplete knowledge and the use of descriptive terms. ANN predicts output from input information in a manner that simulates the operation of the human central nervous system [17]. Ref. [20] stated that ANN is being increasingly used to solve complex problems, and that such models are often treated as 'black box' solutions. Ref. [20] also stated that ANN has layers of nodes which provide a functional relationship between input information and predicted output. The layers are trained on historical data sets. These data sets demonstrate the actual relationship between input and output information.
According to Ref. [24], ANN can learn the patterns of the underlying process from past data, capturing the relations between the inputs and the outputs. According to [25], ANN is a set of independent neurons linked together in the same way as the synapses, neurons, and dendrites of the human brain [26]. The neural network learns and executes tasks. During training, the network modifies the weights of the links among the neurons so that each input produces the expected output. The output is the dependent variable, and the inputs are independent variables.
In this study, three-layer feed-forward neural networks with back propagation (BP) learning were constructed for computation of eleven physical and environmental input variables, as shown in Figure 1. Figure 1 shows the algorithm as nodes and network lines. The lines carry the weights of the connections between nodes. Equation (13) shows the relationship between the input and output variables:
$$y = w_0 + \sum_{j=1}^{q} w_j \, g\left(w_{0j} + \sum_{i=1}^{p} w_{ij} x_i\right) \tag{13}$$
where $w_{ij}$ ($i = 0, 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$) and $w_j$ ($j = 0, 1, 2, \ldots, q$) are model parameters often called connection weights; $p$ is the number of input nodes; $q$ is the number of hidden nodes; and $g$ is the transfer function of the hidden nodes.
Figure 2 shows a diagram of a single neuron in which the weighted inputs are summed and fed into a neuron that produces a binary output, $y$, which is either +1 or −1. The bias weight, θ, is introduced with a fixed input at +1. The bias weight allows greater flexibility of the learning process.
Figure 2. Schematic Diagram of a Single Artificial Neuron [28,29].
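The forward computation of such a network, in which each hidden node applies a transfer function to a weighted sum of the inputs and the output node combines the hidden activations, can be sketched in plain Python as follows. This is an illustrative sketch rather than the BrainMaker implementation: the weights are hypothetical, and the bipolar sigmoid is written as tanh(s/2), which is an equivalent form.

```python
import math

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a network with a single hidden layer.

    hidden_weights[j][i] links input i to hidden node j,
    hidden_biases[j] is the bias weight of hidden node j,
    output_weights[j] links hidden node j to the output node.
    """
    hidden = []
    for w_row, bias in zip(hidden_weights, hidden_biases):
        s = bias + sum(w * xi for w, xi in zip(w_row, x))
        hidden.append(math.tanh(s / 2.0))  # bipolar sigmoid, output in (-1, 1)
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))

# Hypothetical network with p = 3 inputs and q = 2 hidden nodes
x = [0.5, -1.0, 0.25]
hidden_weights = [[0.2, -0.4, 0.1], [-0.3, 0.5, 0.8]]
hidden_biases = [0.1, -0.2]
output_weights = [0.7, -0.6]
output_bias = 0.05

print(forward(x, hidden_weights, hidden_biases, output_weights, output_bias))
```

Training with backpropagation would adjust the hypothetical weights above so that the output approaches the observed condition rating; that step is omitted here.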
In Equation (14), $y$ is the neuron's output and $f$ is a threshold function known as the neuron's transfer function, which gives an output of +1 whenever $\sum w_i x_i$ is greater than zero (the threshold value) or −1 whenever $\sum w_i x_i$ is less than or equal to zero [28]:
$$y = f\left(\sum_{i} w_i x_i\right) \tag{14}$$
In this study, the activation function applied to the input to obtain the output is the bipolar sigmoid function. The bipolar sigmoid function is given in Equation (15) and shown in Figure 3:
$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \tag{15}$$
The output of the bipolar sigmoid function is between −1 and 1.
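For comparison, the hard threshold function and the bipolar sigmoid function can be evaluated side by side. The sketch below assumes a steepness parameter of 1 for the sigmoid, which is an assumption rather than a value stated in this study.

```python
import math

def threshold(s):
    """Hard-limit transfer function: +1 if s > 0, otherwise -1."""
    return 1 if s > 0 else -1

def bipolar_sigmoid(s):
    """Smooth transfer function with outputs strictly between -1 and 1."""
    return 2.0 / (1.0 + math.exp(-s)) - 1.0

for s in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(s, threshold(s), round(bipolar_sigmoid(s), 4))
```

Unlike the threshold function, the bipolar sigmoid is differentiable everywhere, which is what allows backpropagation to compute weight corrections from the output error.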

Problem Statement and Objectives
Ref. [1] stated that more investigation is required to identify the influence of physical and environmental factors that affect deterioration of sewer pipes. Additionally, they recommended that future research investigate more pipe materials, such as steel and concrete pipes in sewer networks, and compare the results, and that prediction models be developed for different cities. A review of sewer pipe condition prediction models [31] observed that more research is needed to predict the condition of sewer pipes with higher accuracy and confidence level.
Following his research on advanced sewer asset management using dynamic deterioration models, Ref. [32] found there was still room for improvement. Accordingly, he recommended that future research develop a more comprehensive model by incorporating additional location-related attributes such as soil type and water table, among others. In a later study, Ref. [17] advised municipalities to develop and implement risk assessment models for their utilities to make the best use of the limited budgets available for replacing deteriorating assets. Ref. [33], in their research on infrastructure management and deterioration risk assessment of wastewater collection systems, advised that deterioration models can be improved by considering other independent variables such as soil type, groundwater level, and initial quality of construction. The environmental factors recommended by Ref. [33], including surface condition, soil type, corrosivity concrete, corrosivity steel, and pH, were considered in building the models in this study. Ref. [13] recommended that factors such as type of soil backfill, H2S, and groundwater level be investigated to understand the sewer deterioration mechanism and develop an effective model. Surface condition and corrosivity variables have received little attention in other studies.
According to [16], improving technical asset management and using digital solutions to increase the efficiency of inspection and rehabilitation strategies is a promising lever for utilities. Ref. [16] further stated that most metrics are based on statistics and do not give sewer operators an understanding of deterioration. Accordingly, there is a need to utilize artificial intelligence methods and compare the results to those of statistical methods.
The objective of this research is to develop MLR and ANN models to predict sanitary sewer pipe condition ratings using inspection and condition assessment data. The secondary objectives of this research are to identify, evaluate, categorize, and develop relationships among the different factors affecting sewer pipe condition ratings and to compare the performance of MLR and ANN in predicting sewer pipe condition.

Methodology
Figure 4 illustrates the methodology adopted to carry out this research. First, utilizing engineering journals, databases, and Google Scholar, a thorough literature review was conducted, mainly to study current sewer pipe predictive models, modes of sewer pipe failure, and variables for sewer pipe failure. In addition to the physical, environmental, and operational factors influencing sewer pipe failure, literature on sewer pipe failure and the risk evaluation process was reviewed. Second, data were collected from Geographical Information System (GIS) shapefiles in the City of Dallas Water Utilities (DWU) GIS web/database. The GIS data originated from condition assessment and CCTV inspection records.
The data comprised pipe segments/locations, length (manhole to manhole), pipe material, pipe diameter, pipe age (current year minus year of installation), depth (depth of backfill over the crown of the pipe in ft), soil conditions, corrosivity, slope, surface condition (highway/street), and PACP condition rating. Third, the data were prepared, processed, and analyzed. Condition rating was designated as the dependent variable, while physical and environmental factors were independent variables.

Model Development
In this section, the development of the multinomial logistic regression and neural network models is illustrated. The multinomial logistic regression model was developed using IBM SPSS Statistics (version 27). The ANN model was developed using BrainMaker (California Scientific Software). Before the development of the models, eighty percent (80%) of the data were randomly selected, and the remaining 20% were set aside to validate the models and/or to be used as a case study to check the applicability of the models. Eighty-five percent (85%) and fifteen percent (15%) of the randomly selected 80% of the data were used in developing and testing the ANN model, respectively. Table 1 shows a sample of the 80% sewer pipes dataset.
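The two-stage random split described above (80%/20% for development and validation, then 85%/15% of the development set for ANN training and testing) can be sketched as follows. The segment identifiers, the dataset size, and the random seeds are hypothetical; the study itself performed the split in SPSS and BrainMaker.

```python
import random

def split(records, fraction, seed):
    """Randomly split records into a `fraction` share and the remainder."""
    shuffled = records[:]                    # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)    # seeded for reproducibility
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

segments = list(range(1000))                 # hypothetical pipe segment IDs

development, holdout = split(segments, 0.80, seed=1)      # 80% / 20%
ann_train, ann_test = split(development, 0.85, seed=2)    # 85% / 15% of the 80%

print(len(development), len(holdout), len(ann_train), len(ann_test))
```

Keeping the 20% holdout untouched until the models are final is what allows it to serve as an independent check of applicability.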

Multinomial Logistic Regression Model
Multinomial logistic regression analysis evaluated the relationship between eleven (11) independent or predictor variables and one (1) dependent variable. Pipe material, diameter, age, length, slope, depth, surface condition, soil type, corrosivity concrete, corrosivity steel, and pH were the independent variables used to generate prediction models. The condition rating score was the dependent variable.

Model Parameters Estimation
The data were randomly divided into 80% and 20% for multinomial logistic regression model development and validation, respectively. MLR analysis was conducted, and four models were built based on sewer pipe condition ratings 1, 2, 3, and 4. Condition 5 was used as the reference category. Table 1 shows a sample of the 80% sewer pipes dataset.

Validation of Multinomial Logistic Model
Model parameters estimation tables were used to derive one set of model equations, broken down into four multinomial logistic regression equations, one for each condition category relative to the reference category, sewer pipe condition 5 (C = 5). The variable coefficients (β) were used to develop the four equations, which predict sewer pipe conditions 1, 2, 3, and 4 relative to condition 5. The equations are presented in Equations (16)-(19) and take the form
$$\ln\left(\frac{\Pr(C = j)}{\Pr(C = 5)}\right) = \beta_{0j} + \beta_{1j} x_1 + \cdots + \beta_{nj} x_n, \quad j = 1, 2, 3, 4$$
where Pr(C = j) is the probability of the sanitary sewer pipe condition dependent variable being condition j relative to condition 5 (for example, Pr(C = 1) for condition 1 and Pr(C = 4) for condition 4), and Pr(C = 5) is the probability of the reference category, condition 5. Diameter, Age, Slope, Depth, Length, Material, Surface, Soil Type, and Corrosivity are the independent variables that influence sanitary sewer pipe condition.
Equations (20)-(24) show the probabilities of sewer pipe conditions 1, 2, 3, 4, and 5 occurring. With $g_j$ denoting the right-hand side of the equation for condition j,
$$\Pr(C = j) = \frac{e^{g_j}}{1 + \sum_{m=1}^{4} e^{g_m}}, \quad j = 1, \ldots, 4; \qquad \Pr(C = 5) = \frac{1}{1 + \sum_{m=1}^{4} e^{g_m}}$$
Probabilities for each sewer pipe segment are calculated using Equations (20)-(24). The condition with the highest probability is taken as the predicted condition of the segment.
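The prediction step, in which the four logits relative to condition 5 are converted into five probabilities and the condition with the highest probability is selected, can be sketched as follows. The logit values are hypothetical placeholders; in practice each would be computed from the fitted coefficients and the attributes of one pipe segment.

```python
import math

def condition_probabilities(logits):
    """Convert logits g_1..g_4 (each relative to condition 5) into Pr(C = 1..5)."""
    exp_terms = [math.exp(g) for g in logits]
    denom = 1.0 + sum(exp_terms)       # the reference category contributes e^0 = 1
    probs = [e / denom for e in exp_terms]
    probs.append(1.0 / denom)          # Pr(C = 5), the reference category
    return probs

def predict_condition(logits):
    """Return the condition (1 to 5) with the highest probability."""
    probs = condition_probabilities(logits)
    return probs.index(max(probs)) + 1

g = [2.0, -0.5, 0.3, -1.2]             # hypothetical logits for one segment
print(condition_probabilities(g), predict_condition(g))
```

When all four logits are negative, the reference category has the largest probability, so condition 5 is predicted.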

Artificial Neural Networks Model
Development of the ANN model included preparing data as inputs and outputs, training, and testing the model. The data sets were randomly divided into training (85%) and testing (15%) sets. During training, the network was fed with inputs and generated an output. The network then compared its results with the correct answers and made corrections to its internal connections to minimize the errors. During testing, the inputs were likewise paired with the provided outputs, in the same way as in training. Validation of the model was conducted after the model was developed. This process is referred to as running the model. Different data were used to run and check the application of the developed model. In model validation, only inputs were used to predict the sewer pipe condition.

Neural Networks Data Processing Software Selection
The data were stored in Microsoft Excel. They were processed and grouped into input and output variables. The input data comprised the independent variables. These variables included pipe material, diameter, age, slope, depth, surface condition, soil type, corrosivity concrete, corrosivity steel, and pH. The output data comprised the dependent variable, which was the sewer pipe condition rating.
BrainMaker, a commercially available simulator distributed by California Scientific Software, was used to develop the neural network model. BrainMaker used data stored in neural network files: Definition (.def), Fact (.fct), and Testing (.tst). Figure 5 shows the process of developing the neural network model. The steps used to develop the model were as follows: (1) Acquire inspection and condition assessment data. (2) Prepare and process the data: categorical data were split into categories, and the data were labeled as inputs and outputs. (3) Train the network model: twelve different neural network architectures were trained to find the optimal architecture with the lowest errors. (4) Test the network model: the architectures were tested using 15% of the data. (5) Run or validate the model: new data were used as a case study to validate the use of the model. Datasets were randomly divided into training (70%) and testing (30%) sets in the IBM SPSS neural network software, and into training (85%) and testing (15%) sets in the BrainMaker neural network software (California Scientific Software). The backpropagation algorithm was used in training the neural network model. Training involved presenting inputs to the network. The network uses the input variables to establish a relation between the inputs and outputs that are placed in NetMaker. The NetMaker tool is incorporated in BrainMaker. The data are processed in NetMaker and saved in the BrainMaker file.

Neural Networks Architecture
The neural network architecture comprises three (3) layers, namely, the input, hidden, and output layers. The nodes in the hidden layer are known as hidden neurons. The number of hidden neurons should be sufficient to provide optimal performance in the modeling prediction process. Too few or too many neurons will not enable the network to acquire knowledge that can be generalized for future predictions. There are three ways of determining the ideal number of hidden neurons. Equations (25) and (26) show two ways of calculating the number of hidden neurons.

$$\text{Number of hidden neurons} = \frac{\#\,\text{Data sets} - \#\,\text{Outputs}}{C\,(\#\,\text{Inputs} + \#\,\text{Outputs} + 1)} \tag{25}$$
where C = 2-5.
$$\text{Number of neurons} = \frac{\#\,\text{Inputs} + \#\,\text{Outputs}}{2} \tag{26}$$
Equation (26) was suggested by the BrainMaker manual. In this study, starting at one neuron, the experiment was conducted from 1 neuron to 15 neurons. The neural network with the least testing error was selected (Table 2). Model #3 was found to be optimal, with the least training and testing errors, and was chosen for model development.
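The two rules of thumb for the number of hidden neurons can be evaluated with a few lines of Python. The record count of 1000 is hypothetical, and counting the condition rating as a single output is an assumption made for illustration; the input count of eleven follows the variables described earlier.

```python
def hidden_neurons_from_data(n_records, n_inputs, n_outputs, c):
    """Rule based on dataset size, with the constant c between 2 and 5."""
    return (n_records - n_outputs) / (c * (n_inputs + n_outputs + 1))

def hidden_neurons_from_layers(n_inputs, n_outputs):
    """Rule based on the mean of the input and output counts."""
    return (n_inputs + n_outputs) / 2

n_records, n_inputs, n_outputs = 1000, 11, 1   # record count is hypothetical

for c in (2, 3, 4, 5):
    print(c, round(hidden_neurons_from_data(n_records, n_inputs, n_outputs, c), 1))
print(hidden_neurons_from_layers(n_inputs, n_outputs))
```

Both rules only give a starting range; the final size is still chosen by comparing the testing errors of candidate architectures.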

Neural Networks Model Development
In BrainMaker, the number of hidden neurons was set to the optimal value of 6. Various training and testing tolerances were tested, starting at 0.1 and 0.1, respectively. Training and testing tolerances of 0.3 and 0.3, respectively, were selected as optimal, having the lowest training and testing errors. The training was achieved by the trial-and-error method, with the weights randomly taking values in training the model. The training was stopped when the neural network reached the lowest training and testing errors. The information moves forward in the network to predict the output. The backpropagation algorithm then redistributes the error and adjusts the weights, so that the error is minimized over several iterations. Figure 6 shows the ANN structure of the model. The structure comprised the input layer, hidden layer, and output layer. The input layer comprised the independent variables. The input variables used were diameter, age, slope, depth, length, pH, material, surface condition, soil type, corrosivity concrete, and corrosivity steel. The hidden layer comprised the six neurons.
The output layer consisted of sewer pipe conditions 1, 2, 3, 4, and 5. As shown for the diameter parameter, there is a network of relations between the input, hidden, and output layers. Similar relationships apply for all other parameters. This illustrates how the ANN works. This relationship in the network mirrors a similar neural network illustrated by [30]. Table 3 shows that the model learned 72% of the facts and predicted 85% of the testing facts.

Results and Discussion
In this section, the results of the multinomial logistic regression and neural network models are presented and discussed. The accuracy of the models is discussed using a classification table and sensitivity and specificity curves. The ROC curve and the variables that influence the model are also presented. The significance of the model coefficients is used to point out the variables that influence sewer pipe conditions. The confidence level used in the data analysis is 95%. The validation of multinomial logistic regression, including the justification of the results, is also discussed.

Multinomial Logistic Regression Model
The MLR model correctly predicted 75% of the sewer pipe conditions overall. Prediction of sewer pipe condition 1 was 97% correct, with 3% incorrectly predicted, which demonstrates that the model had a high accuracy in predicting condition 1. According to the classification table, conditions 2, 3, 4, and 5 were correctly predicted in 0%, 28%, 4%, and 14% of cases, respectively. Condition 1 accounted for 73% of the sewer pipe segments in the dataset, whereas conditions 2, 3, 4, and 5 accounted for 4%, 12%, 3%, and 7%, respectively. This imbalance explains why the percentage of correct predictions was low for conditions 2, 3, 4, and 5 compared to condition 1. The prediction was consistent with the available datasets that were analyzed.
Table 4 shows the performance of the ANN model in terms of the area under the ROC curve. According to Hosmer et al., 2013, when the area is close to one (1), it refers to the perfect model, and when the area is greater than 0.7, it implies an acceptable model. Table 4 shows that the model is acceptable for use in predicting sewer pipe conditions.
The accuracy of the MLR and ANN models was compared. The ANN model was found to be better at predicting sewer pipe condition than the MLR model. The prediction accuracy of the logistic regression model was 75% and that of the ANN model was 85%.
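The link between the per-condition results and the overall MLR accuracy can be checked with a short calculation: weighting each condition's correct-prediction rate by its share of the dataset should reproduce the overall figure. The sketch below uses only the rounded percentages reported above, so the shares sum to 99% rather than 100%.

```python
# Share of segments in each condition and percent correctly predicted (MLR)
shares = {1: 0.73, 2: 0.04, 3: 0.12, 4: 0.03, 5: 0.07}
correct = {1: 0.97, 2: 0.00, 3: 0.28, 4: 0.04, 5: 0.14}

# Overall accuracy is the share-weighted average of the per-condition rates
overall = sum(shares[c] * correct[c] for c in shares)

# Accuracy of a trivial rule that always predicts the most common condition
majority_baseline = max(shares.values())

print(round(overall * 100), round(majority_baseline * 100))
```

The weighted average is about 75%, in line with the reported overall accuracy, and it sits close to the 73% that would be obtained by always predicting condition 1. This shows how strongly the overall figure is driven by the dominant class.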

Neural Networks Model Performance
Ref. [34] defined sensitivity as True Positive Rate and specificity as True Negative Rate. These are metrics of measuring performance of a model.
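For a multi-class condition rating, sensitivity and specificity are usually computed one condition at a time by treating that condition as the positive class and all others as negative. The sketch below illustrates this with hypothetical actual and predicted ratings; it is not the study's data.

```python
def sensitivity_specificity(actual, predicted, positive):
    """Return (true positive rate, true negative rate) for one condition."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical condition ratings for eight pipe segments
actual = [1, 1, 1, 1, 2, 3, 3, 5]
predicted = [1, 1, 1, 3, 1, 3, 1, 5]

print(sensitivity_specificity(actual, predicted, positive=1))
```

In this hypothetical sample, three of the four condition 1 segments are identified (sensitivity 0.75), while two of the four segments in other conditions are wrongly labeled as condition 1 (specificity 0.5).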

Discussion
This study set out to develop logistic regression and neural network models for predicting sanitary sewer condition ratings. Observed datasets of independent and dependent variables were utilized to develop the models. The independent variables that influenced the condition ratings of the sewer pipes were presented in order of importance in Figure 6. Age, depth, slope, and diameter, which are physical factors, were found to be the most important predictors compared to environmental factors. The significant variables that influence sewer pipe condition rating were diameter, age, length, pipe material (CONC), soil type (Loam), soil type (Clay), corrosivity concrete (High), and corrosivity concrete (Low). Influencing and non-influencing variables were determined by significance value (p < 0.05) based on a 95% confidence level. Table 5 shows the factors that were found to be significant for sewer pipe condition.
Pipe diameter was significant (p < 0.05) in sewer pipe conditions 1, 2, and 4, with p values of 0.001, 0.000, and 0.008, respectively. Age was significant (p < 0.05) in sewer pipe conditions 1, 2, 3, and 4; the p values in conditions 1, 2, and 3 were 0.000, 0.001, and 0.000, respectively. Pipe length was a significant factor in sewer pipe condition 1, with a p value of 0.000. Pipe material was a significant variable in sewer pipe conditions 2 and 3, with p values of 0.025 and 0.001, respectively. The PVC material had most of the pipes that were in good condition.
Figure 8 shows the significant factors. The significance of the factors is presented as a ratio and as a percentage (normalized significance). Loam and clay soil types were found to be significant predictors in condition 4, with p values of 0.05 and 0.05, respectively. Corrosivity concrete (High) and corrosivity concrete (Low) were also significant in condition 4, with p values of 0.036 and 0.031, respectively. Depth, slope, corrosivity steel, and pH were not found to be significant in predicting sewer pipe condition.

Justification of Results
The model results were consistent with similar studies conducted by other authors. Ref. [3] found that age and material have a high impact on sewer pipe condition, and that pipe diameter, depth, and slope were less significant. The p-values and odds ratios (Exp(B)) generated in this study were compared with the results of other authors. In this research, diameter was significant in conditions 1, 2, and 4, with p-values of 0.001, 0.000, and 0.008, respectively. The Exp(B) values were 0.978, 0.923, and 0.951 for sewer pipe conditions 1, 2, and 4, respectively. The Wald statistics for conditions 1, 2, and 4 were 10.390, 12.975, and 7.020, respectively. The significance of pipe diameter was consistent with [12,18,35].
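As a rough consistency check, the Wald statistics above can be converted back to p-values. The sketch below assumes each coefficient is tested with a one-degree-of-freedom chi-square Wald test, which is a common default in statistical packages but is not stated explicitly in this study. The function name `wald_p_value` is illustrative only.

```python
import math


def wald_p_value(wald: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    Assumption: one coefficient per test, so df = 1. For df = 1 the
    chi-square survival function reduces to erfc(sqrt(wald / 2)).
    """
    if wald < 0:
        raise ValueError("Wald statistic must be non-negative")
    return math.erfc(math.sqrt(wald / 2.0))


# Wald statistics reported for pipe diameter in conditions 1, 2, and 4.
diameter_wald = {1: 10.390, 2: 12.975, 4: 7.020}

for condition, wald in diameter_wald.items():
    p = wald_p_value(wald)
    print(f"condition {condition}: Wald = {wald:.3f}, p = {p:.3f}")
```

Under this assumption, the recovered p-values round to the reported 0.001, 0.000, and 0.008, which suggests the tabulated Wald statistics and p-values for diameter are mutually consistent.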
Age was another important factor influencing sewer pipe condition. In this study, age was significant in predicting sewer pipe condition. The p-values for age in conditions 1, 2, and 3 were 0.000, 0.001, and 0.000, respectively. The Wald statistics were 81.139, 11.57, and 15.222, and the Exp(B) values were 0.845, 0.968, and 0.974 for conditions 1, 2, and 3, respectively. Refs. [12,13,18,35] also found age to be a significant factor. Length was a significant factor in sewer pipe condition 1, with a p-value of 0.000, a Wald statistic of 20.852, and an Exp(B) of 0.999. This was confirmed by [12,18,35]. Pipe material (CONC) was a significant factor influencing sewer pipe condition. In condition 2, the p-value, Wald, and Exp(B) were 0.025, 5.030, and 0.472, respectively. In condition 3, the p-value, Wald, and Exp(B) were 0.001, 10.828, and 2.089, respectively. Pipe material was insignificant in conditions 1 and 4. This is consistent with [12].
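Exp(B) is the multiplicative change in the odds of a pipe being in a given condition, relative to the model's reference condition, for a one-unit increase in the predictor. The following minimal sketch illustrates that reading for the age odds ratios. The reference category and the unit of age (assumed here to be years) are not restated in this section, so both are assumptions, and the helper names are hypothetical.

```python
import math


def coefficient_from_odds_ratio(exp_b: float) -> float:
    """Recover the logit coefficient B from the odds ratio Exp(B)."""
    if exp_b <= 0:
        raise ValueError("An odds ratio must be positive")
    return math.log(exp_b)


def percent_change_in_odds(exp_b: float, units: float = 1.0) -> float:
    """Percent change in the odds for a `units` increase in the predictor."""
    if exp_b <= 0:
        raise ValueError("An odds ratio must be positive")
    return (exp_b ** units - 1.0) * 100.0


# Exp(B) values reported for age, mapped here to conditions 1, 2, and 3
# to match the p-values and Wald statistics listed for age.
age_exp_b = {1: 0.845, 2: 0.968, 3: 0.974}

for condition, exp_b in age_exp_b.items():
    b = coefficient_from_odds_ratio(exp_b)
    per_unit = percent_change_in_odds(exp_b)
    per_decade = percent_change_in_odds(exp_b, units=10)  # assumes age in years
    print(f"condition {condition}: B = {b:.4f}, "
          f"{per_unit:.1f}% per unit, {per_decade:.1f}% per 10 units")
```

An Exp(B) below 1 corresponds to a negative coefficient, meaning the odds of that condition relative to the reference condition fall as the predictor increases. Because the effect is multiplicative, a small per-unit change compounds over larger differences in age.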
Soil type (Clay), soil type (Loam), and soil type (Rock) were significant in sewer conditions 2, 3, and 4. In condition 2, the p-value, Wald, and Exp(B) for clay soil were 0.05, 3.89, and 9.12, respectively. In condition 3, the p-value, Wald, and Exp(B) for clay soil were 0.021, 0.021, and 1.153, respectively. Similarly, for loam soil, the p-value, Wald, and Exp(B) were 0.05, 3.89, and 9.12, and for soil type (Rock) they were 0.018, 0.018, and 0.864. In condition 4, soil types clay and loam had Wald statistics of 3.89 and 3.93, p-values of 0.05 and 0.05, and Exp(B) values of 9.12 and 9.68, respectively. Ref. [18] likewise found that soil type is a significant variable in the prediction of sewer pipe condition.
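The precision of an odds ratio can be gauged by backing out an approximate confidence interval from Exp(B) and its Wald statistic. The sketch below assumes the Wald statistic is defined as (B / SE)^2, the usual convention, so that SE = |B| / sqrt(Wald). The standard errors themselves are not reported in this section, and the function name `odds_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(exp_b: float, wald: float, level: float = 0.95):
    """Approximate confidence interval for an odds ratio.

    Assumption: Wald = (B / SE)^2, hence SE = |B| / sqrt(Wald).
    Returns (lower, upper) bounds on the Exp(B) scale.
    """
    if exp_b <= 0 or wald <= 0:
        raise ValueError("exp_b and wald must both be positive")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    b = math.log(exp_b)
    se = abs(b) / math.sqrt(wald)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return math.exp(b - z * se), math.exp(b + z * se)


# Clay soil in condition 4: Exp(B) = 9.12, Wald = 3.89 (values from the text).
lower, upper = odds_ratio_ci(9.12, 3.89)
print(f"approximate 95% CI for Exp(B): ({lower:.2f}, {upper:.2f})")
```

An interval whose lower bound sits only slightly above 1 while its upper bound is very large would indicate that the soil-type effect is estimated imprecisely, which is in line with its borderline p-value.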
Corrosivity concrete was significant in influencing pipe condition, with a p-value, Wald, and Exp(B) of 0.036, 4.411, and 0.096, respectively. Ref. [18] found corrosivity to be of very high significance. The prediction accuracy of the models in this research was compared with that reported by other authors. Table 6 shows that different authors obtained varying prediction accuracies. Differences in the collected data could be one reason for this variation. Ref. [36] stated that the usefulness of a model increases with more data.

Comparison of MLR and ANN Models and Conclusions
In this study, MLR and ANN models were developed to predict sanitary sewer pipe condition. The models were developed, validated, and tested for predicting sewer pipe condition scores, in order to prioritize pipes for rehabilitation and/or replacement and for further condition assessment. The developed models add to the body of knowledge on tools for predicting sanitary sewer pipe condition. Knowing the predicted condition rating score of sanitary sewer pipes would help policymakers and sanitary sewer utility managers prioritize the rehabilitation and/or replacement of those pipes.
The logistic model was built with a randomly selected 80% of the dataset, and the remaining 20% of the data was used to validate the model. The ANN model was trained, validated, and tested using a feed-forward network with a backpropagation learning algorithm. Based on the logistic model results, the significant physical factors influencing sanitary sewer pipe failure were diameter, age, pipe material, and length. Soil type was the environmental factor that influenced sanitary sewer pipe failure. The prediction accuracy of the MLR and ANN models was 75% and 85%, respectively. With respect to the main objective of this research, testing the predicted condition rating values showed that the ANN model provided the more accurate prediction of sanitary sewer pipe condition.
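The validation procedure summarized above, a random 80/20 split followed by an accuracy calculation, can be sketched as follows. This is an illustration only: the pipe identifiers and condition ratings are made-up placeholder values rather than data from the study, and the function names, the seed, and the rounding rule for the split size are assumptions.

```python
import random


def train_validation_split(records, train_fraction=0.8, seed=0):
    """Randomly partition records into training and validation subsets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(observed, predicted):
    """Fraction of pipes whose predicted rating equals the observed rating."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    return correct / len(observed)


# Placeholder pipe identifiers (hypothetical, not the Dallas dataset).
pipe_ids = list(range(100))
train_ids, validation_ids = train_validation_split(pipe_ids)
print(len(train_ids), "training pipes,", len(validation_ids), "validation pipes")

# Placeholder observed and predicted condition ratings on a 1-5 scale.
observed = [1, 2, 3, 4, 5, 1, 2, 3]
predicted = [1, 2, 3, 4, 4, 1, 2, 1]
print(f"accuracy = {accuracy(observed, predicted):.2f}")
```

Overall accuracy treats every condition rating equally, so when some ratings have few observations it is worth reporting per-condition accuracy alongside the overall figure.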

Recommendations for Future Research
This study raised several important points to be considered in future research. The data for this study were collected from Dallas Water Utilities. Other cities should be included in future studies, and their results compared with the results of this research. More datasets are needed to increase the accuracy of the prediction models. Data collected for analysis should include a more uniformly distributed number of observations across the condition ratings. More training and testing of the ANN model are needed to fine-tune and validate its prediction strength for conditions 4 and 5. The pipe material types excluded from this study could be included in further model development. The models could also be improved by including wastewater type and volumetric flow rate as variables in prediction and comparison.

Data Availability Statement: All the data, models, and code generated or used during the study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.