A Random Forest Model for the Prediction of FOG Content in Inlet Wastewater from Urban WWTPs

Abstract: The content of fats, oils, and greases (FOG) in wastewater, as a result of food preparation, both in homes and in different commercial and industrial activities, is a growing problem. In addition to the blockages generated in the sanitary networks, it also represents a difficulty for the performance of wastewater treatment plants (WWTP), increasing energy and maintenance costs and worsening the performance of downstream treatment processes. The pretreatment stage of these facilities is responsible for removing most of the FOG to avoid these problems. However, so far, optimization has been limited to the correct design and initial installation dimensioning. Proper management of this initial stage is left to the experience of the operators to adjust the process when changes occur in the characteristics of the wastewater inlet. The main difficulty is the large number of factors influencing these changes. In this work, a prediction model of the FOG content in the inlet water is presented. The model is capable of correctly predicting 98.45% of the cases in training and 72.73% in testing, with a relative error of 10%. It was developed using random forest (RF) and the good results obtained (R² = 0.9348 and RMSE = 0.089 in test) will make it possible to improve operations in this initial stage. The good features of this machine learning algorithm had not been used, so far, in the modeling of pretreatment parameters. This novel approach will result in a global improvement in the performance of this type of facility allowing early adoption of adjustments to the pretreatment process to remove the maximum amount of FOG.


Introduction
Fats, oils, and greases (FOG) are some of the components of urban wastewater and the result of food preparation both in homes and in various commercial and industrial settings. FOG is a growing concern for municipalities and sewage plant operators, due to its tendency to cause severe blockages in pipes and sewers [1][2][3].
FOG characteristics can vary greatly depending on the types of fat, oil, and grease and their sources of collection [4]. FOGs can appear as liquids or solids and are characterized by a greasy texture and lower density than water, which is why they float on the surface. Furthermore, FOG can form emulsions in aqueous media in the presence of soap or other emulsifying agents. FOG is composed of fatty acids, triacylglycerol, and lipid-soluble hydrocarbons, with FFA (free fatty acids) being the most important components due to their chemical reactivity. The presence of a large amount of FFA results in a characteristically low pH [1,5].
Upstream of the treatment plants, FOG combines with other types of waste to form the so-called "fatbergs" [2], which cause various problems in the pipes leading to the treatment plants [6]. Because of all the problems generated by FOG, different prevention systems have been developed with different approaches, ranging from educational campaigns that promote good management practices to the installation of grease trapping systems (GTSs) and periodic inspections to avoid improper disposal [7,8]. Numerous initiatives and programs

• Weather changes, i.e., more or less intense rain, ambient temperature, and the number of previous days without rain with the consequent reduction of the inflow, among others, modify the quantity and characteristics of FOG reaching the WWTP. Predicting these weather events and their influence on different water management infrastructures has been studied in numerous works [19][20][21][22];
• The part of FOG from domestic activities is altered by holidays, vacation periods, the different seasons of the year, or the weather itself [3];
• The features of commercial sources of FOG (size, density, and geographical distribution) such as restaurants, and the use of grease trapping systems, for example [1,8];
• Another important source of FOG is industrial activities, such as food processing or slaughterhouse factories [13,23,24];
• The presence of other types of residues mixed with FOG present in the wastewater, such as gross solids (especially wet wipes), grit, and others [25].
Another important challenge of this work involved the selection and subsequent processing of the input variables to have an adequate number of training and testing patterns. Current WWTPs collect a large amount of data, often unused for facility management, so it is necessary to make an initial effort of exploration, visualization, and selection of relevant information [17,26].
This paper is divided into three main sections. Section 2 describes the characteristics of the WWTP being studied, the acquisition and processing of data, and the mathematical techniques used in the development of the model. Collecting data from different sources and at different frequencies to have enough training and test patterns, and then processing them to ensure quality and representativeness, has been one of the initial challenges of this work. Next, in Section 3, the results obtained are presented and discussed, both in the model training process and in its validation. These results indicate that the FOG prediction model developed has enough accuracy to provide valuable information that will improve the operation of the WWTP. Finally, the main contributions of the study are highlighted in Section 4.
Water 2021, 13, 1237

Case Study
The Villapérez Wastewater Treatment Plant is located in the northeast of the city of Oviedo (Spain) and occupies an area of nearly 21 hectares (Figure 1). It provides service to an approximate population of 723,000 equivalent inhabitants. Wastewater arrives at Villapérez through a unitary network of collectors that has an approximate length of 75 km. This network includes 44 spillways. Collector diameters range from 600 mm to 2000 mm with sections in gravity and impulsion.

The Villapérez WWTP collects both urban and industrial wastewater. One of the main industries that discharge to Villapérez is a dairy facility with a production capacity of 500 million liters of milk per year, which discharges an average flow of 200 m³/h into the sanitation network. Therefore, this WWTP is representative in that it is a medium-sized facility that receives urban wastewater from a relatively large area and must also treat industrial discharges with high FOG content, such as those from dairy industries.
As can be seen in Figure 1, the wastewater treatment in Villapérez WWTP begins with a pretreatment stage in which the larger solids, sands, and fats are removed. Subsequently, water is taken to primary settling by gravity. Then, water goes to biological treatment where organic matter, nitrogen, and phosphorus are removed. This treatment involves passing the water through several anoxic, anaerobic, and aerobic chambers. The next stage is secondary settling, which is carried out via gravity. Finally, the tertiary treatment stage consists of a physical-chemical treatment, lamellar settling, and filtration.
The pre-treatment has the capacity to treat an inflow of 8.5 m³/s and starts with two thick wells, equipped with a 500 L clamshell bucket. The plant then has four roughing channels, each of which includes an automatic cleaning screen with a 60 mm clearance and a self-cleaning fines screen with a 3 mm clearance and an inclination of 50°. After the roughing stage, the water reaches the facilities for separating FOG and sands from raw water, which consist of 5 rectangular grit traps with a unit useful volume of 449.8 m³. To properly separate the FOG, they are first emulsified, and for this, the grit traps are aerated: 2/3 of the length of the grit remover using coarse bubble aerators, and 1/3 of the grit remover using fine bubble diffusers. Once the fat has been emulsified, it is collected by a scraper that cyclically runs the entire length of the sand trap.
After this separation, the emulsified FOG is sent to a fat concentrator by means of chains and scrapers that separate water from fat (Figure 2). These concentrators have a flow rate of 30 m³/h and a power of 0.18 kW. The Villapérez WWTP removes an average of 5.25 tons of FOG per month, which is approximately 63 tons per year, or in other words, a container is filled every 9 days. The main design parameters of the treatment plant are included in Table 1.


Data
All data used in this work were collected in the period from 1 March 2017 to 24 June 2019 and come from different sources:
• Data related to wastewater were obtained through the Supervisory Control and Data Acquisition software (SCADA) of the WWTP. This system registers 226 parameters every 9 min from measuring equipment and sensors distributed all over the treatment plant. From this set of data, the data associated with the measurement of input parameters in the raw water during the pre-treatment stage were used. The parameters measured in the raw water are the input flow rate, pH, raw water temperature, conductivity, and ammonia. The data associated with these variables are identified by the time and date of the data measurement.
• FOG data were collected from the container removal delivery notes, which contained the actual data of the waste total weight inside each container. The number of containers in the study period was 89. Their filling time was used as time intervals to group the data of the SCADA system.
• Climate data come from the Spanish State Agency for Meteorology website (Agencia Estatal de Meteorología, Aemet), and the pluviometry data (instantaneous and accumulated rainfall) are obtained from those recorded by the plant's weather station. All of them are also grouped considering the intervals in which the containers are filled. From these data, a new calculated variable from the instantaneous precipitation is also created, corresponding to the number of previous days without rain.
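The grouping of sensor readings by container filling interval, and the derived "days without rain" variable, can be illustrated with a minimal Python sketch. This is not the authors' processing code (the paper reports using R); the function names, the data layout, and the exact counting rule for dry days (consecutive rain-free days preceding each day) are assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean

def summarize_by_interval(readings, intervals):
    """Summarize timestamped readings per container filling interval.

    readings:  list of (timestamp, value) pairs, e.g. from a SCADA export
    intervals: list of (start, end) pairs, one per container (end exclusive)
    Returns one {"min", "mean", "max"} dict per interval, or None if empty.
    """
    summaries = []
    for start, end in intervals:
        values = [v for t, v in readings if start <= t < end]
        if values:
            summaries.append({"min": min(values), "mean": mean(values), "max": max(values)})
        else:
            summaries.append(None)
    return summaries

def days_without_rain(daily_rain_mm):
    """For each day, count the consecutive preceding days with zero rainfall."""
    counts, dry_run = [], 0
    for rain in daily_rain_mm:
        counts.append(dry_run)                 # dry days seen before this day
        dry_run = dry_run + 1 if rain == 0 else 0
    return counts

# Hypothetical pH readings and two hypothetical container intervals.
example_readings = [
    (datetime(2017, 3, 1, 0, 0), 7.0),
    (datetime(2017, 3, 1, 0, 9), 7.5),
    (datetime(2017, 3, 10, 0, 0), 6.5),
]
example_intervals = [
    (datetime(2017, 3, 1), datetime(2017, 3, 10)),
    (datetime(2017, 3, 10), datetime(2017, 3, 19)),
]
example_summary = summarize_by_interval(example_readings, example_intervals)
```

Each interval then yields one training pattern per variable summary, paired with the container weight recorded on the delivery note.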
Statistical data for the variables initially considered in the study are presented in Table 2. As indicated above, the reference is the time interval from when an empty container is placed to when it is removed. When each container is removed, it is weighed, and the data is recorded on the corresponding delivery note. For the elaboration of the training patterns, some variables have been calculated. The data corresponding to each of these periods was summarized by calculating for each variable its minimum, mean and maximum value, as shown in Table 2. A preliminary analysis by principal component analysis (PCA) [27] was carried out in order to study the initial data set. The graph in Figure 3 shows the contribution of the different variables to the dimensions of the PCA projection.
Some aspects that can be highlighted from this graph are:
• As might be expected, the temperature variables (ambient temperature, wastewater temperature) appear grouped.
• Conductivity is related to the number of days without rain. This is because wastewater, both urban and industrial, is not diluted by rainwater.
• Obviously, the flow variables are related to the level of precipitation, that is, the more rain, the higher the inlet flow.
• Finally, it can be seen how the amount of FOG (fat variable) is related to ammonium, and therefore this is an important parameter to consider in the modeling. This relationship may be due to industrial discharges since they provide both fat and nitrogen.


Figure 4 shows the contribution of each of the variables in the complete dataset. It can be seen that the FOG variable is one of the variables that least contributes to variability, and this is because it has a fairly steady behavior. The dotted reference line in red corresponds to the expected value if the contributions were uniform.
Finally, a PCA plot (Figure 5) was produced in order to detect outliers and groups of cases with similar characteristics. After analyzing the within-cluster sum of squares (WCSS) and using the elbow method (a heuristic used in determining the number of clusters in a data set [28]), four was selected as the optimal number of groups. For group identification, hierarchical clustering [29] has been chosen, using complete linkage clustering [30] as the agglomeration method.
Four groups can be observed (Figure 5) with the following characteristics:
• Cluster 1 includes those cases with a maximum flow value greater than 10,000 m³/h;
• Cluster 2 includes part of the cases with a maximum flow greater than 10,000 m³/h and also with average ammonium above 33 mg/L;
• Cluster 3 consists of data with an average temperature greater than 20 °C;
• Cluster 4 is defined by an average flow greater than 7052 m³/h and includes 100% of the cases in this cluster.
Figure 6 shows the same projection of the data as the previous figure (Figure 5), but representing the variables average temperature (MedTemperature), average flow (MedFlow), and average ammonia (MedAmmonium) in the same way. Comparing both graphs, it is possible to observe that the cases with the highest average temperature are in the area of cluster 3. In the graph at the bottom left, it can be seen how the points with the lowest average flow (MedFlow) values correspond to the cases of clusters 2 and 3. Finally, the points with the highest average ammonium values (MedAmmonium) correspond to cluster 2.
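The clustering procedure named in this section (complete-linkage hierarchical clustering, with the number of groups chosen from the within-cluster sum of squares) can be sketched in a few lines of Python. This is an illustrative toy implementation on hypothetical 2-D points, not the authors' code; the function names are assumptions, and in practice the elbow is read off the WCSS curve by inspection.

```python
import math

def complete_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    farthest pair of members is closest, until k clusters remain.
    Returns a list of clusters, each a list of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two most distant members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters.pop(j)       # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

def wcss(points, clusters):
    """Within-cluster sum of squared distances to each cluster centroid."""
    total = 0.0
    dims = len(points[0])
    for members in clusters:
        centroid = [sum(points[i][d] for i in members) / len(members) for d in range(dims)]
        total += sum(math.dist(points[i], centroid) ** 2 for i in members)
    return total

def wcss_curve(points, max_k):
    """WCSS for k = 1..max_k; the 'elbow' is where the decrease flattens."""
    return [wcss(points, complete_linkage(points, k)) for k in range(1, max_k + 1)]

# Hypothetical 2-D projection with four well-separated groups.
example_points = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (1, 10), (10, 10), (10, 11)]
example_clusters = complete_linkage(example_points, 4)
```

On real data, the points would be the PCA scores of each container interval, and the curve returned by `wcss_curve` would be plotted against k to locate the elbow.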


Methods
In this study, random forest (RF) analysis, a machine-learning approach for feature selection from highly multivariate datasets, was used to develop a forecast model of FOG content in the inlet wastewater. The RF algorithm reaches the final prediction from the majority voting of the decisions made with multiple decision trees (or, for regression tasks such as this one, from the average of their outputs), each constructed with randomly permuted features and observations via recursive partitioning [31]. The RF method has been applied in a wide range of research areas due to its numerous advantages [32], and in recent years it has gained great importance in water resource-related research. Random forests have been used to address numerous research problems in WWTPs, such as:

• Estimating different parameters of water quality or processes, such as chemical oxygen demand (COD) [33], total suspended solids (TSS) [34], stream nitrogen (N) and phosphorus (P) concentrations [35], or influent flow of WWTPs [36];
• Monitoring different treatment processes, such as predicting the 'settleability' of activated sludge [37] or nitrogen removal systems [38];
• Generating models of energy cost [39] or pumping systems [40] in WWTPs;
• Obtaining other improvements in plant control [41] or reliability of small wastewater treatment plants [42].
The main advantage of the random forest algorithm over other techniques is its great generalizability [42,43], which is why it has been used in a growing number of works related to water management [32] such as those indicated above. In addition, RF is able to provide better information compared to other methods on the importance of each input variable [36]. Good accuracy achieved by the RF models and the ability to more easily interpret the results over other methods were the main reasons for their use in this case study.
The model presented in this paper was developed using R [44] and the packages caret [45] and randomForest [46].
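To make the mechanics of the algorithm concrete, the following is a minimal Python sketch of a random forest for regression: bootstrap sampling of observations, a random subset of candidate features at each split, recursive partitioning, and averaging of the tree outputs. It is a teaching sketch only and not the implementation used in this work, which relied on the R packages cited above. The names `ntree` and `mtry` follow the terminology of the text; `min_size` and all function names are hypothetical.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, mtry, min_size, rng):
    """Recursive partitioning: at each node, try `mtry` randomly chosen
    features and keep the split with the lowest total squared error."""
    if len(rows) <= min_size or len(set(targets)) == 1:
        return mean(targets)                       # leaf: average target
    best = None
    for f in rng.sample(range(len(rows[0])), mtry):
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, targets) if r[f] < threshold]
            right = [y for r, y in zip(rows, targets) if r[f] >= threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:                               # no feature could be split
        return mean(targets)
    _, f, threshold = best
    lo = [i for i, r in enumerate(rows) if r[f] < threshold]
    hi = [i for i, r in enumerate(rows) if r[f] >= threshold]
    return (f, threshold,
            build_tree([rows[i] for i in lo], [targets[i] for i in lo], mtry, min_size, rng),
            build_tree([rows[i] for i in hi], [targets[i] for i in hi], mtry, min_size, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node

def fit_forest(rows, targets, ntree=50, mtry=1, min_size=5, seed=0):
    """Grow `ntree` trees, each on a bootstrap sample of the observations."""
    rng = random.Random(seed)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], mtry, min_size, rng))
    return forest

def predict_forest(forest, row):
    """Regression forest output: the mean of the individual tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)
```

A production model would rely on a tested library rather than a sketch like this one; the point here is only to show where the two sources of randomness (observations and features) enter.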

Results and Discussion
The representativeness of training datasets is very important to the effectiveness and overall performance of an RF model [47]. In this study, 90% of the data in the original dataset were selected randomly to generate a training dataset, while the other 10% were used to form the corresponding testing dataset, in order to have a sample as representative as possible. In addition to configuring the data set, the training process requires adjusting several parameters. The number of trees (ntree) and the number of variables randomly sampled as candidates at each split (mtry) are the two most important parameters because they have a big effect on the final accuracy of an RF model [48,49]. To adjust these parameters, cross-validation was used, with the data divided into three folds and the training repeated ten times [50].
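The data partitioning described here (a random 90/10 split followed by three-fold cross-validation repeated ten times) can be sketched as index bookkeeping in Python. This is a hedged illustration rather than the authors' procedure: the function names, the seeds, and the rounding rule for the test-set size are assumptions.

```python
import random

def train_test_split(n, test_fraction=0.10, seed=0):
    """Randomly split indices 0..n-1 into (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]

def repeated_kfold(indices, k=3, repeats=10, seed=0):
    """Yield (train, validation) index lists for repeated k-fold
    cross-validation: k folds per repeat, reshuffled on every repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in range(k):
            validation = folds[held_out]
            training = [j for m, fold in enumerate(folds) if m != held_out for j in fold]
            yield training, validation
```

Under this scheme, each candidate pair of (ntree, mtry) values would be scored by its average validation error over the 30 resulting splits, and the best pair would then be refit on the full training set.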
After the training process, different parameters to evaluate model results have been taken into consideration:
• Root mean square error (RMSE) is a frequently used measure of the differences between values predicted by a model and the values observed. The smaller the value, the better the model's performance.
• Mean absolute error (MAE) is also a common measure of a model's forecast error.
• The determination coefficient (R²) is the proportion of the variance in the dependent variable that is predictable from the independent variables, and it is a statistical measure of how well a model approximates the real data points. A bigger value indicates a better fit between prediction and actual value.
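The three indicators listed above, together with the share of predictions falling within a relative error tolerance (reported in this paper at 10%), can be written as short Python functions. This is a sketch under stated assumptions: R² is computed here as one minus the ratio of residual to total sum of squares, whereas some toolkits report the squared correlation instead, and the tolerance rule (absolute error no larger than 10% of the observed value) is an assumed reading of "relative error".

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def within_tolerance(y_true, y_pred, rel_tol=0.10):
    """Fraction of predictions whose absolute error is at most rel_tol
    times the magnitude of the observed value."""
    hits = sum(abs(t - p) <= rel_tol * abs(t) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

The same `within_tolerance` function can be applied to a constant prediction equal to the mean observed value, which gives the kind of baseline comparison the paper makes against the mean FOG content.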
The model developed for FOG content prediction in the inlet waters to the wastewater treatment plant presents the following values (Table 3), and the three indicators show very good performance.
Figure 7 compares the performance of the model in training and test with an estimate using the mean value of FOG content. It can be seen that with a relative error of 10%, the model is capable of correctly predicting 98.45% of the cases in training and 72.73% in testing, while under these same conditions the mean FOG value would only be correct in 24.17% of the cases.
Initially, 22 variables were introduced for the generation of the model; 11 of them were discarded during the training process since they were not used in any of the splits. The relative importance (Table 4) of the model variables can be calculated with samples not selected in the cross-validation sub-samples used to construct a tree [51]. One of the most significant advantages of the RF method is its evaluation of the importance of the variables used in the training process [52]. The interpretation made of the importance of these variables in the development of the model is described below:

• In this case, the two most relevant variables are the average (MedAmmonium) and maximum (MaxAmmonium) ammonium values. This could be due to the large amount of ammonium and FOG contained in the discharges from the dairy facility served by the Villapérez WWTP as was mentioned in the case study description;
• The third most significant variable is maximum precipitation (PrecipMax). Greater precipitation implies a greater inflow into the WWTP, with more dissolved FOG, which makes it difficult to remove it in the pretreatment process;
• Urban wastewater has a steady conductivity, so it is possible to associate the variations and relevance of this variable with industrial discharges;
• The relevance of the following variables related to the number of previous days without rain (MxDwR, PDwR, and MedPDwR) can be explained in a similar way to precipitation, that is, as there is less inflow to be treated, the FOG is less dissolved and it is possible to remove it in a greater proportion;
• pH: urban wastewater has a relatively steady pH, so variations in this indicator can be associated with industrial discharges;
• The average temperature (TempExtMed) provides information on the seasonal situation at the time of analysis. A higher temperature makes it easier to emulsify the FOG and therefore its removal is more effective;
• The relevance of the average flow variable (MedFlow) can be explained in the same way as the precipitation or the number of previous days without rain mentioned above.
In Figure 8, the behavior of the training data is represented. It can be observed that the predicted data precisely fit the real ones and how the errors have a steady behavior, which reinforces the quality of the model.
Similarly, in Figure 9, it is possible to observe the performance of the RF model with the test data. The model is capable of adequately predicting the trend of the behavior of the arrival of FOG, which will provide relevant information when making decisions in plant operations.
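The idea behind the variable importance ranking interpreted above can be illustrated with a permutation-based sketch in Python: shuffle one input column, and measure how much the prediction error grows. This is an assumption-laden simplification and not the exact measure behind Table 4; in particular, it uses a single evaluation set, whereas random forest implementations typically compute the measure per tree on the samples left out of that tree's bootstrap. The function name and arguments are hypothetical.

```python
import random

def permutation_importance(predict, rows, targets, seed=0):
    """Increase in mean squared error when each feature column is shuffled.

    predict: callable mapping one row (a tuple of feature values) to a number
    rows:    list of tuples, one per observation
    Returns one score per feature; larger means the model relies on it more.
    """
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - y) ** 2 for r, y in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    scores = []
    for f in range(len(rows[0])):
        column = [r[f] for r in rows]
        rng.shuffle(column)                    # break the feature-target link
        permuted = [r[:f] + (v,) + r[f + 1:] for r, v in zip(rows, column)]
        scores.append(mse(permuted) - baseline)
    return scores
```

A feature the model never uses in any split receives a score of exactly zero under this scheme, which is consistent with discarding inputs that do not take part in the model.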
The sensitivity analysis of the FOG model developed assesses the change produced in the output in response to the variation of one (Figure 10) or two of the inputs (Figure 11). In this way, it is possible to identify from which value of a variable a trend change in the FOG content is expected.
Figure 11. Sensitivity analysis (two variables).
As can be seen in Figure 10, the behavior of the variables is consistent and fits what is expected. Despite the fact that the y-axis (FOG) has small variation ranges, the expected trends can be seen. For example, it is possible to observe that increasing the average and maximum ammonium (MedAmmonium and MaxAmmonium) increases the amount of FOG (Figure 10 (7, 8)). Some studies have modeled the amount of ammonium in wastewater, indicating greater uncertainty in the estimation during periods of rain, but without referring to the content of FOG [53]. Also, when it has been raining recently, that is, when the number of days without rain (PDwR) is close to zero, an initial washing effect is produced in the pipes and sewers that increases the arrival of FOG, whereas as this variable increases, the amount of FOG is little affected. This behavior, with an initial increase of all types of waste such as FOG at the beginning of the rain episodes and a subsequent dilution, has also been observed in other research works [54]. Along with this, the changes in pH are in agreement with the results of other studies, where the pH values on rainy days are numerically higher [55].
Figure 11 shows how the variation of two variables affects the FOG content in the inlet water. As already indicated, it is confirmed that the presence of ammonium is not influenced by the variation in precipitation, since it is mainly due to discharges derived from industrial activities (Figure 11 (2, 3)). On the contrary, it can be observed how the variation of ammonium affects the conductivity values (Figure 11 (4, 5)).
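A sensitivity analysis of this kind can be sketched in Python as follows: one or two inputs are swept over a grid while the remaining inputs are held fixed, and the model output is recorded at each point. The paper does not state how the other inputs were held, so fixing them at their column means is an assumption of this sketch, as are the function names and the generic `predict` callable standing in for the trained model.

```python
from statistics import mean

def sensitivity_1d(predict, rows, feature, grid):
    """Vary one input over `grid`, holding the others at their column means.
    Returns a list of (input value, predicted output) pairs."""
    base = [mean(column) for column in zip(*rows)]
    curve = []
    for value in grid:
        point = base[:]
        point[feature] = value
        curve.append((value, predict(point)))
    return curve

def sensitivity_2d(predict, rows, f1, grid1, f2, grid2):
    """Vary two inputs jointly over a grid, holding the others at their means.
    Returns a dict mapping (value1, value2) to the predicted output."""
    base = [mean(column) for column in zip(*rows)]
    surface = {}
    for v1 in grid1:
        for v2 in grid2:
            point = base[:]
            point[f1], point[f2] = v1, v2
            surface[(v1, v2)] = predict(point)
    return surface
```

Plotting the one-variable curves gives panels of the kind described for Figure 10, and the two-variable surfaces correspond to the kind of plot described for Figure 11; a change of slope along a curve marks the input value from which a trend change in the output is expected.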
Tests have been carried out with other machine learning regression methods, such as multivariate adaptive regression splines (MARS) [56] or support vector machine (SVM) [57]. However, when performing the corresponding sensitivity analyses, it has been seen that the model generated with RF presents greater stability since it better adjusts to the behavior expected of the target variable. In this case, the other techniques extrapolate the data worse, generating anomalous values in areas where the dataset has a low information density. Many of these advantages of RF, such as the ability to identify non-linear relationships between the predictor and the dependent variables [58], resistance to overfitting [59], the handling of highly correlated variables [60], or the possibility of ordering the relative importance of the variables [61], have been previously identified by several authors in other fields. In addition, as other researchers indicate, the potential of this algorithm in the field of water resources has been very little exploited [32]. It has been used even less in the pretreatment stage of a WWTP which, as previously mentioned, has not received much research attention so far; this constitutes one of the novelties of this work. No other scientific publication has been found in which a similar prediction model has been presented, so it has not been possible to compare the results.
The ability to anticipate trends in incoming wastewater provided by the model will allow the pretreatment process to be adjusted to optimize FOG removal. At present, this process does not detect an increase in the FOG content, so it is not adjusted until that increase is detected in the downstream stages. For example, when large production peaks occur, the air injection for FOG is varied to optimize emulsification. Reducing the time for the early adoption of this type of measure, thanks to the information provided by the model presented in this work, will certainly improve the removal of the FOG content and will positively affect all the treatment processes of the WWTP.

Conclusions
As with other fractions of urban wastewater removed in the pretreatment stage of wastewater treatment plants, the optimization of FOG removal has received relatively little attention from researchers beyond its subsequent use or its influence on subsequent wastewater treatment processes. However, its influence on these later stages of wastewater treatment can be important to improve both the overall performance of WWTPs and their operability. With this objective, in this work, a prediction model of the FOG content in the inlet waters of the treatment plant has been developed. The ability to provide operators with advance information about changes in the wastewater entering the WWTP, taking into account various factors (chemical composition, meteorological changes, seasonal changes, etc.), had not been addressed so far in any other research.
The model is based on data collected for more than two years at the Villapérez plant (Oviedo, Spain) and on the well-known random forest algorithm, which had not been used for this purpose so far. The results obtained, evaluated using several common indicators, reflect the good performance of the model both in the training (RMSE = 0.037, MAE = 0.025 and R² = 0.9888) and test (RMSE = 0.089, MAE = 0.066 and R² = 0.9348) stages. Thanks to the features of the RF technique, the most relevant variables used in the model have been interpreted, such as ammonia or changes in precipitation. As expected, the influence of industrial discharges on changes in the FOG content is highlighted in the case study.
Better information will enable operators to make better decisions, allowing optimization of the removal of FOG in pretreatment processes. It will result in a reduction of the content of FOG in subsequent processes and a reduction of energy consumption and maintenance costs of the plant.
Future research could apply similar RF models to other WWTPs with different characteristics to verify their good performance. On the other hand, WWTPs receive other important wastes, such as gross solids or grit, whose prediction could be integrated into a more complete model of the incoming wastewater features.