A Neural Network Based Superstructure Optimization Approach to Reverse Osmosis Desalination Plants

An ever-growing population together with globally depleting water resources pose immense stresses for water supply systems. Desalination technologies can reduce these stresses by generating fresh water from saline water sources. Reverse osmosis (RO), as the industry leading desalination technology, typically involves a complex network of membrane modules that separate unwanted particles from water. The optimal design and operation of these complex RO systems can be computationally expensive. In this work, we present a modeling and optimization strategy for addressing the optimal operation of an industrial-scale RO plant. We employ a feed-forward artificial neural network (ANN) surrogate modeling representation with rectified linear units as activation functions to capture the membrane behavior accurately. Several ANN set-ups and surrogate models are presented and evaluated, based on collected data from the H2Oaks RO desalination plant in South-Central Texas. The developed ANN is then transformed into a mixed-integer linear programming formulation for the purpose of minimizing energy consumption while maximizing water utilization. Trade-offs between the two competing objectives are visualized in a Pareto front, where indirect savings can be uncovered by comparing energy consumption for an array of water recoveries and feed flows.


Introduction
Due to a worldwide growing population the demand for water is ever-increasing, leading to global water scarcity that is not only driven by water quantity, but also by water quality issues [1]. The water supply systems are further stressed by climate change together with increased intensification of agriculture, industrialization, and water withdrawal [2][3][4][5], challenging the "clean accessible water for all" UN Sustainable Development Goal for 2030 [6].
Such stresses imply that conventional water sources are no longer sufficient to meet human water demands, especially in water-scarce regions [7]. Desalination is one of the technologies which can help overcome this challenge, since fresh or potable water can be obtained from available saline water sources [8,9]. Generally, desalination technologies can be classified based on either membrane processes or thermal separation [10]. The latter are highly energy-intensive and characterized by high capital and operational costs, due to their dependency on thermal energy, mainly produced from fossil fuels. On the other hand, membrane-based desalination processes are regarded as the most promising and practical desalination technologies, due to their high process energy efficiency [11]. Among membrane processes, reverse osmosis (RO) desalination is the industry leader constituting 69% of the worldwide installed desalination capacity, equalling 65.5 M m 3 day of produced fresh water [12]. Nevertheless, RO still remains an energy intensive process requiring process optimization, to minimize energy consumption [13,14]. The operation and design of brackish water reverse osmosis (BWRO) desalination plants has already been studied in the literature [11]. Ruiz-Garcia et al. for example evaluated operational windows of two BWRO systems based on hydrochemical fluctuations in well groundwater and derived optimal operational points in terms of minimum specific energy consumption and maximum water recovery [15]. Li studied the optimal operation of a BWRO desalination plant based on a constrained nonlinear optimization model resulting in a 10% energy consumption reduction while the same permeate flows are being maintained [16]. Further, Fellaou et al. analyzed the exergy of a full-scale BWRO desalination plant to analyze the performance of the main plant components to ultimately determine inefficiencies within the plant [17]. Additionally, Patel et al. used a rigorous system-scale RO model to determine the specific energy consumption and energy efficiency of the RO process over a wide range of brackish water conditions [18]. Kotb et al. optimized several multi-stage RO system arrangements identifying the cost minimizing operational parameters [19]. On the other hand, Sassi and Mujtaba combined the solution-diffusion model with film theory to derive a nonlinear optimization framework which minimizes the specific energy consumption of the RO system at fixed permeate output and quality based on operational parameters [20].
Generally, the optimization of RO systems requires modeling the mass transfer of the employed membrane modules [21], which may result in a complex mathematical model if first order principles are employed [22][23][24]; often a computationally expensive task [25]. Data-driven surrogate models can be utilized to capture the complex mass transfer behavior of membrane systems [26], utilizing machine learning (ML) methods [27]. ML can be classified into unsupervised and supervised learning, the difference between the two being that the former is used to analyze data with no apparent input-output connection, whereas the latter is used to obtain functions mapping an explicit input-output structure within the data [28]. Within supervised learning, the utilization of feed-forward artificial neural networks (ANNs) has been very popular [29], since ANNs have excellent approximation capabilities [30]. Given a sufficient number of neurons in only one hidden layer, ANNs can virtually approximate any function of interest to any degree of accuracy [31].
ANNs have already been applied to RO membranes and processes to analyze their respective performance [32][33][34][35]. Libotean et al. for example utilized an ANN surrogate model approach to estimate RO plant performance with applications in operational diagnostics [36]. Choi et al. employed surrogate models to analyze the long-term performance of full-scale RO desalination plants [37]. Sivanantham et al. modeled and optimized the rejection of chlorophenol in spiral wound RO modules using ANNs [38]. However, the data to train the ANN is either derived by model simulation [39,40], experimental single-stage RO membrane data points [41] or water sampling [42].
Furthermore, ANNs can be used for the operation optimization of membrane-based desalination systems, enabling effective decision making and better design [43,44]. Madaeni et al. modeled three RO systems with the aid of an ANN to forecast performance degradation optimizing the operating conditions [45]. Farsi and Rosen performed a multi-objective optimization case study based on an ANN for a geothermal desalination system to evaluate the trade-off between exergy efficiency and process cost [46]. Nazif et al. optimized the operation of a RO system to reduce fouling, increase membrane life span, and minimize system cost using an ANN [47]. Soleimani et al. employed an ANN to derive Pareto-optimal solutions for maximizing the permeate output and minimizing the fouling resistance for membrane separation of wastewater [48].
To generate sustainable solutions for RO systems it is vital to consider the system implications to resources other than water, including energy and food, referred to as food-energy-water nexus (FEWN) [49][50][51][52] when optimizing the system. Namany et al. optimized the FEWN for various food security scenarios realizing that the utilization of RO reduces the environmental impact of solutions [53]. Tsolas et al. investigated a network representation-based graphical approach to the energy-water nexus incorporating reverse osmosis desalination systems [54]. Apart from that, Elmaadawy et al. designed a renewable energy system to meet the electrical load demand of a large-scale reverse osmosis desalination plant [55].
In this work, we employ a feed-forward ANN to approximate the energy consumption of an industrial-scale RO plant. We use hourly measured industrial-scale RO plant data, representing one and a half years of plant operation, to train and analyze several ANN setups and other surrogate models to comprehensively describe the complete plant behavior. A multi-objective optimization study is employed, where sustainability considerations are systematically taken into account to evaluate the trade-off between permeate production and energy consumption.
The remainder of this paper is structured as follows: The next section introduces the RO desalination plant under investigation, describes the obtained data and states the problem definition. In Section 3, all surrogate models are presented, and various ANN setups evaluated. Then, in Section 4, the optimization model is derived using a mixed-integer linear reformulation of all developed neural networks. Subsequently, the optimization model is used for minimizing the RO plant's energy consumption, as well as for multiobjective optimization to analyze the trade-offs between energy consumption and purified water production. Lastly, Section 5 concludes this work.

RO Plant Description and Problem Definition
The San Antonio Water System (SAWS) H 2 Oaks Desalination plant is located in Elmendorf, Texas, just South of San Antonio, Texas. It draws water from the lower Carrizo-Wilkox Aquifer, which has a total dissolved solids (TDS) concentration of approximately 1300 mg L [56]. The inorganic composition of the feed water is provided in Table A1 of the Appendix A. The RO system consists of three stages, four primary RO trains and two secondary or concentrator RO trains. The first and second stage build the primary RO train, whereas the third stages can be summarized as the concentrator RO train. The maximum water recovery of the first stage is WR max 1 = 56.25%, of the second stage is WR max 2 = 54.32% and of the third stage is WR max 3 = 50%. Therefore the primary RO train has a maximum water recovery of WR max prim = 80%. Overall, this results in a RO plant with a maximum achievable water recovery of WR max sys = 90%. The detailed RO process flow structure is given in Figure 1. It can be seen that the permeate of two primary RO trains are blended and used as a feed for the third stage or concentrator RO train. Only pressurization of the initial feed flow and repressurization of the feed flow of the third stage are taking place. Further, the Advanced Turbo Turbocharger AT-1500 of Energy Recovery Inc. is used as an energy recovery device (ERD) in the desalination plant. It is located on the concentrate side of the third stage. The membrane system of each stage is specified in Figure 1. Throughout the plant spiral wound membranes BW30-400/34 produced by Dow Filmtec are used. All available measurement points and their respective type throughout the RO plant are given in the same figure.
Measurements of all marked points for one and a half years in an hourly frequency were available (beginning of 2017 to mid 2018), resulting in 14,542 data points for each parameter. In January 2017, the H 2 Oaks desalination plant has commenced operation, and therefore a variety of operating points are captured from 2017 to mid 2018, since the plant operation was still being tested.The various operational feed flows (e.g., zero feed flow for maintenance) being tested are presented in Figure 2. Information regarding the maximum ramp up and ramp down rates was also collected from this set of data, with the maximum rump up in one hour in 2017 being 97% or 1694.35 m 3 h , and the maximum ramp down in one hour being 49% or 858.53 m 3 h . The concentration of the permeate flows of each stage, the permeate flow of the primary RO train and the overall permeate concentration have been measured. The first and second stage consist of four parallel RO flows, whereas the third stage consists of two parallel flows. For each respective parallel RO flow separate pressure measurements are available. Taking into account the number of membranes, columns and pressure vessels incorporated in each stage, this results roughly in a "4-2-1" RO separation set-up. Only data regarding the feed flow of the system has been collected. It is important to note that the pressure measurement of the retentate in the third stage is taking place after the energy recovery device. Also, the retentate flow of the first stage is the feed flow of the second stage. Table 1 summarizes all available data. The feed and retentate pressure of the second stage, third parallel flow, are for example given as P f ,2,3 and P r,2,3 . The permeate concentration is specified as C p,stage and the feed flow as Q f . In addition, measurements regarding the concentration of the permeate after the primary RO train (C p,prim,1 , C p,prim,2 , C p,prim,3 , C p,prim,4 ), as well as measurements of the overall permeate concentration of the system (C p,sum ) are available. It is worth noting that conductivity ( µS cm ) is a measure of concentration ( mg L ) and can be expressed as such [57], thus the two terms are used interchangeably throughout this work.    Feed pressure (bar) P f ,1,1 , P f ,1,2 , P f ,1,3 , P f ,1,4 P f ,2,1 , P f ,2,2 , P f ,2,3 , P f ,2,4 P f ,3,1 , P f ,3,2 Retentate pressure (bar) P r,2,1 , P r,2,2 , P r,2,3 , P r,2,4 P r,3,1 , P r,3,2 Permeate Conductivity ( µS cm ) C p,1,1 , C p,1,2 , C p,1,3 , C p,1,4 C p,2,1 , C p,2,2 , C p,2,3 , C p,2,4 C p,3,1 , C p,3,2 Feed flow ( m 3 h ) Q f Thus, in this work, we present a framework for the optimal operation of industrialscale RO desalination plants, using plant data to derive optimized process parameter results, with the goal of minimizing the overall energy consumption of the RO system. To do so, the obtained data is analyzed to derive linear surrogate models for all possible process parameters. If the dependencies between parameters are non-linear, ANNs with rectified linear units (ReLU) as activation functions are introduced to capture the observed behavior adequately. Then, the derived surrogate models can be reformulated as one mixed-integer linear programming (MILP) problem to optimize the energy consumption of the plant. The optimization model can further be modified to obtain Pareto-optimal solutions regarding the energy consumption minimization and the output water maximization. This work further includes a comparison of various ANN set-ups with changing inputs and outputs for calculating the overall permeate concentration of the plant. Depending on the research task at hand, researchers can choose models most tailored to their application.

Surrogate Modeling
Using the phenomenological transport equations for RO membranes to derive operational desalination plant parameters, results in a complex system of equations, which is computationally expensive to solve, especially within the context of optimization [24,25]. To overcome these drawbacks, linear surrogate models are employed to capture the operational RO plant behavior. To be more precise, linear correlations for the approximation of the retentate pressures of each stage and parallel flows have been derived, together with estimations of the water recovery of each stage. Then, the data has been used for the training of a feed-forward ANN with ReLUs as activation functions to approximate the permeate concentrations of each stage with the aim of calculating the overall permeate concentration of the system. An ANN with ReLUs has been selected since it can be exactly reformulated as a MILP, facilitating the optimization of the system, as it can be embedded into optimization formulations and conserve their linearity [58]. For each ANN training the obtained data has been split at random into a training set (70%), a validation set (15%) and a testing set (15%) to reduce the possibility of over-and under-fitting [59,60]. To do so, initially, the ANN is fit to the training data set by adjusting the weights and biases of all neurons in each hidden layer using the Levenberg-Marquardt algorithm to minimize the sum of squared errors [61,62]. Then, the fitted model is used for an unbiased evaluation of a second data set called validation, while the model's hyperparameters are tuned. Lastly, the final model is used for an unbiased approximation of the independent test data set [63]. Further, for all following surrogate models each data set has been normalized between −1 and 1. All reported root mean square error (R) values are referring to the normalized data and the complete data set under investigation (training, validation and testing for ANNs). Apart from that, a linearized objective function based on the energy consumption of the RO system is derived.

Retentate Pressures
To calculate the retentate pressure of each stage i and parallel flow j a linear correlation based on the respective feed pressure has been assumed, as summarized in Equation (1).
The results of the approximation are summarized in Figures 3-5 for one of the parallel flows of the first, second and third stage. The remaining results can be found in the Appendix A Tables A2-A5 and Figures A1-   Regarding stage 3, it can be seen that even with a relatively good root mean square error of R = 0.97 the actual behavior is not captured adequately. A reason for this is most likely that in this case, the pressure measurement occurs after the energy recovery device and not immediately after the RO unit. Nevertheless, for the energy optimization it is especially important to calculate the pressure drop across the energy recovery device (∆P ERD,j , j = {1, 2}) since it can be used accordingly for energy recovery and therefore has the potential to improve the energy efficiency of the system [64]. Therefore, for the third stage, the pressure difference across the ERD is approximated with an ANN with ReLU activation functions. To calculate said pressure difference from the obtained data, it is assumed that the pressure drop across the RO unit is negligible (P f ,3,j = P r,3,j , j = {1, 2}) for sufficiently high pressures [65]. Further, the data set for each parallel flow has been united in one overall data set, since the plant only uses one ERD. The most advantageous ANN in terms of simplicity and accuracy was obtained for one hidden layer and three nodes, resulting in an overall root mean square error of R = 0.9983. Further, the derived weights and biases of the ANN are summarized in Table A6 of the Appendix A. In this case, not only the accuracy could be improved, but the surrogate model follows more closely the observed trend, as can be seen in Figure 6. However, using the obtained ANN for calculating the output pressures of the ERD based on the ANN (P ERD out,j = P f ,3,j − ∆P ERD,j ) did not yield satisfactory results, as can be seen in Figure A8. It is important to note that P r,3,j as previously mentioned in Section 2, is now renamed as P ERD out,j . Since it may be desired to not only approximate the pressure difference across the ERD, but also the outlet pressure of the ERD, another ANN with ReLUs as activation functions is utilized to to so. The most advantageous result was obtained when the ANN was only trained with the data set of parallel flow one, resulting in R = 0.9786, with one hidden layer and two nodes. The derived weights and biases of the ANN are summarized in Table A7 of the Appendix A. The results of the surrogate are summarized in Figures A9 and A10. The presented approach results overall in a mathematical description of the pressures of stage 3 according to Figure 7.

Water Recoveries
To ensure that the RO plant operates adequately in terms of water consumption, the water recoveries have been estimated for each stage based on the actual pressure measured P i,j and the maximum observed pressure max(P i,j ) in each stage and parallel flow. A linear correlation between the pressures of each parallel flow and the water recovery of the respective stage is assumed [66], as introduced in Equation (2). Then, based on an overall mass balance the RO plant water recovery can be calculated according to Equation (3).
This approach results in water recovery estimations based on the obtained data as specified in Table 2.

Permeate Concentration
Since surrogate models for the retentate pressures and energy recovery have already been defined, the only variable left to compute is the overall permeate concentration. To do so, again an ANN with ReLUs as activation functions is employed. Further, instead of simply using defined inputs as the sole inputs of the ANN (input = {in(t)}) to compute the outputs (output = {out(t)}), it is also possible to consider as an additional input the inputs of the previous time point (input = {in(t), in(t − 1)}) or the outputs of the previous time point (input = {in(t), out(t − 1)}). The latter two approaches are used to capture the dynamics of the system. For all here presented ANN results, the last approach (input = {in(t), out(t − 1)}) was selected, since the R value was the highest for all investigated cases, compared to the other two set-ups.
Generally, with the available data, it is possible to generate surrogate models based on ANNs for several plant set-ups. It is possible to generate an ANN for each respective stage, for each RO train or for the complete RO plant. Further, several different process parameters can be considered as inputs of the ANN, as summarized in Table 3. It is worthy to note that adding several layers has been investigated for all presented cases as well. However, the accuracy could not be improved. The presented number of nodes (#nodes) in the hidden layer represents the number of nodes after which no significant change in the R value (second decimal) could be detected by further increasing the number of nodes.
Moreover, Table 3 also summarizes the results of each ANN training in terms of R, to compare the accuracy of the derived surrogate models. In this case, we desire a compromise between accuracy and simplicity of the model. The most accurate networks are being generated, when each stage is modeled separately. However, in this case the results of each surrogate have to be used for further calculations, i.e., closing the overall permeate mass balance, which reduces the accuracy of the obtained results in terms of approximating the overall permeate concentration of the system. In addition, more nodes are required to achieve these high R values (with the expectation of stage 3). Apart from that, in only 56 out of 14,543 data points (0.39%) the observed permeate concentration is higher than the restriction for drinking water (500 mg L [67]), with a maximum observed overall permeate concentration of 626 mg L . Therefore, we decided to use the overall RO plant surrogate model one (R = 0.895), which is arguably one of the least accurate surrogates of the presented ones, but by far the simplest one and yet still sufficiently accurate for the purpose of energy optimization, while fulfilling the drinking water quality restriction. The results of the ANN training for the selected case, split into the training, testing, validation, as well as showing the overall result, is summarized in Figure A11 of the Appendix A. Additionally, the derived weights and biases of the ANN are presented in Table A8 of the Appendix A.
It is important to mention that also a multivariate linear regression was employed instead of using an ANN, for the same inputs and outputs as the selected surrogate, which resulted in less accurate approximations (R = 0.848), as can be seen in Figure A12.

Objective Function
To calculate the overall RO plant energy consumption, the specific energy consumption (SEC) of a single pump, according to Equation (4), is applied to the RO system, resulting in an expression as shown in Equation (5) [68].

Optimization Model
To incorporate the ANN to calculate the pressure difference across the ERD and the ANN to approximate the permeate concentration in the optimization model, both ANN models are reformulated as MILPs, as presented in [58,70]. Therefore, the weights and biases of the former ANN are referred to as W ERD k,l and b ERD k,l , where the hidden and output layer can be distinguished with k = {1, 2} and the number of nodes in the hidden layer are given as l = {1, 2, 3}. For the latter ANN, an additional index h = {1, 2, 3, 4, 5} has to be introduced to distinguish the weights of the input layer for the set of inputs, resulting in W RO h,k,l and b RO k,l . Further, auxiliary variables x 1,l,j , s 1,l,j and z 1,l,j are introduced for the reformulation of the ANN approximating ∆P ERD,j . Accordingly, for the ANN calculating C p,t , x 2,l , s 2,l and z 2,l are introduced. To differentiate between the permeate concentration of a previous time point and the permeate concentration in the time point under investigation the indices t and t − 1 are utilized. Together with the other surrogate models for the retentate pressures and the water recovery estimation, this results in the following optimization model (Equations (7)-(25)) to minimize the overall energy consumption of the RO system.
C p,t ≤ C res p (10) P f ,2,j = P r,1,j , ∀j = {1, 2, 3, 4} (14) x 1,l,j − z 1,l,j · U 1,l,j ≤ 0, ∀j = {1, 2}, l = {1, 2, 3} (17) x 1,l,j , s 1,l,j , x 2,l , The objective function denoting the energy consumption of the RO plant, as derived in the previous section, is presented in Equation (7). To calculate the overall energy consumption in kW, the objective function is multiplied with Q f . To derive the specific energy consumption in kWh m 3 , the objective is divided by the overall water recovery of the system as calculated with Equation (3). From Equations (8)-(10) process restrictions are introduced to guarantee that a minimum feed flow and a minimum water recovery per stage are fulfilled, to produce drinking water (C res p = 500 mg L ). Equation (11) is introduced to incorporate a steady-state process assumption and guarantee that in t − 1 the water quality is comparable to the one obtained in t. Then, the water recoveries of each stage based on the presented linear pressure correlations (see Equation (2)) of the pressure in the respective parallel flows of each stage (Equation (12)) and the retentate pressure of stages one and two according to the linear pressure correlations are calculated (Equation (13)). Further, the retentate pressure of stage one is used as the feed pressure of stage two (Equation (14)). Equations (15)- (18) summarize the MILP reformulation of the ANN to approximate the pressure difference across the ERD, whereas Equations (20)- (23) are used as the representation of the ANN to calculate the overall permeate concentration. To successfully implement these reformulations, the auxiliary variables x 1,l,j , s 1,l,j , x 2,l and s 2,l have to be positive, to split the ReLU output into a positive and negative component according to the binary variables z 1,l,j and z 2,l (Equations (24) and (25)). This ReLU output distinction is further implemented with the inequality constraints shown in Equations (17), (18), (22) and (23), which introduce lower and upper boundaries for each output of a node in the hidden layer (L 1,l,j , U 1,l,j and L 2,l , U 2,l ). Overall, the optimization model results in a MILP, which was solved in MATLAB using the CPLEX solver.

Results and Discussion
Next, the optimization model is used for two studies. First of all, the energy of the system is minimized for distinct water recovery and feed flow restrictions to enable a comparison of generated results with the actual energy consumption of the desalination plant. Then, the trade-off between energy minimization and water utilization in terms of the feed flow and the overall water recovery are visualized in a Pareto front. To do so, the -constrained method is used for multi-objective optimization [71].

Energy Minimization
To evaluate the optimization model, the overall energy consumption of the RO plant for the year 2017 is compared to the energy optimization results. Therefore, the data set for the year 2017 for the feed flow and the estimated water recovery of the system are split into 12 sets according to the month of the year. Then, the monthly average of the feed flow and the overall water recovery are calculated. The obtained values are summarized in Table 4. These values are used as a pairwise input of the optimization model to update the restrictions shown in Equations (8) and (9). More specifically, the optimization model is solved while the restrictions of Equation (9) are updated to fulfill the set overall water recovery as presented in Table 4, calculated with Equation (3). The results of this case study are presented in Figure 8. In June 2017, the fraction of the derived RO desalination energy consumption to the overall energy consumption of the plant is at a year high of 51%, to compare, in March 2017 said fraction is at a year low of 10%, whereas calculating the average results in a fraction of only 40%. Generally, the energy consumption of the RO system is the major energy consumer of a RO plant having a major impact on the process performance and sustainability [72]. Therefore, the minimized energy consumption of the RO system is comparably low, i.e., less than half the overall consumed energy of the plant on average, yielding satisfactory results which can lead to substantial energy savings.
Originally, the obtained monthly energy consumptions were planned to be compared to literature values. However, the energy consumption of RO plants is reported in the literature normalized with the permeate output in kWh m 3 permeate . The here presented results are derived in MWh. However, the SEC of the plant for various operating points is presented and compared to literature values in Section 5.2. For now, the results of the energy optimization are compared to the overall energy consumption of the high service pumps of the desalination plant, to classify the order of magnitude of results. In 2017, the high service pumps consumed 1.5169 × 10 3 MWh year . Summing up the obtained results to calculate the energy consumption for 2017 results in 1.3929 × 10 3 MWh year . Consequently, the difference in energy consumption is 124 MWh or around 8%. Since the overall energy consumption of the RO system is in the same order of magnitude as the energy consumption of the high service pumps, the initial statement of obtaining advantageous results, in terms of minimizing the systems energy consumption, is confirmed.

Multi-Objective Optimization
After confirming the capabilities of the optimization model, the model was expanded to consider multiple objectives using multi-objective optimization. To do so, the constraints summarized in Equations (8) and (9) are updated to calculate the minimized energy for a set of water recoveries and feed flows. Compared to the previous study, where a pair of water recovery and feed flow were the inputs, an array of water recoveries and feed flows are used as inputs, performing the minimization at each feed flow for all specified water recoveries. The multi-objective optimization has only been performed in the RO plant's relevant water recovery region of 40% to 85%. The obtained Pareto-optimal solutions are visualized in Figure 9. It can be seen that the energy consumption increases with increasing water recoveries, as well as increasing feed flows. Apart from that, it can be deducted that the energy consumption from the lowest water recovery to the highest water recovery for all feed flows increases approximately by a factor of 5. Additionally, comparing the calculated energy consumption for the highest feed flow with the lowest one, results in the deduction that the gap between the two feed flows increases with increasing water recoveries. For a water recovery of 50% the energy consumption difference between the two is approximately 125,883 kW, whereas the difference increases to 636,471 kW for a water recovery of 85%. Therefore, depending on the separation task at hand, this result can lead to indirect energy savings.
To further analyze the results, the specific energy consumption in kWh m 3 of the results is calculated and summarized in Table 5. The specific energy consumption can be approximated with the linear correlation presented in Equation (26), with an accuracy of R = 0.998.
As expected, the specific energy consumption increases with the water recovery of the system. Increasing the water recovery by a factor of 1.7 from 50% to 85% results in approximately doubling the specific energy consumption (factor of 2.07), from 0.2455 kWh m 3 to 0.5072 kWh m 3 . This underlines the sensitivity of the specific energy consumption to changes in water recovery since the relative change between the respective factors is more than 20%. Lastly, we compared the obtained minimum specific energy consumption with literature values. Stillwell and Webber report observed literature values for the SEC of BWRO desalination plants between 0.5 kWh m 3 and 3 kWh m 3 [73]. In addition, Sassi and Mujtaba report a minimized SEC between 0.578 kWh m 3 and 0.730 kWh m 3 [20]. The highest here observed SEC is 0.5072 kWh m 3 for WR sys = 85%. Overall, this comparison underlines the competitiveness of the presented optimization methodology, as well as the potential energy savings when this approach is employed.  Figure 9. Pareto front evaluating the trade-off between minimizing the energy consumption in kW, the feed flow, as well as the water recovery of the system.

Conclusions
We presented a comprehensive methodology to utilize industrial scale RO desalination plant data for surrogate modeling and subsequent optimization. Linear surrogate models were developed for the retentate pressures of the parallel flows of the first and second stage. For the pressure difference across the ERD, as well as the output pressure of the ERD, ANNs with ReLUs were trained and reformulated as MILPs with the feed pressure of stage three as the sole input. To calculate the permeate concentration of the system several possible ANN formulations have been presented and compared in terms of their respective inputs and accuracy (R). Depending on the task at hand different surrogate models can be selected. For the subsequent optimization case studies the simplest ANN system resulted in sufficiently accurate results and was therefore reformulated as a MILP.
The optimization model was first used for minimizing the energy consumption of the RO plant based on the monthly averaged feed flows and water recoveries of the year 2017. The derived energy consumption constituted at most 51% of the monthly overall RO desalination energy consumption and on average 40% throughout the year, underlining the major energy saving potential using the presented methodology. In fact, the derived energy consumption is on the same order of magnitude as the energy consumption of the high service pumps of the RO plant. Then, the model was used to derive a Pareto front to illustrate the trade-offs between minimizing the energy of the system as well as maximizing the feed flow and water recovery of the system. Here, the potential for indirect savings was uncovered by comparing the increase in energy consumption for the lowest and highest utilized feed flows. The results were further analyzed in terms of the specific energy consumption of the system, which ultimately showed the significant impact of changing water recoveries on the specific energy consumption.
Overall, the presented methodology should be understood as a recipe for other researchers on how to move from obtained RO data sets to an optimization model for single and multi-objective optimization using linear data driven surrogate models. In subsequent work, the objective function can be re-evaluated by introducing a nonlinear function and compare obtained results to the linear case. Moreover, if other data points across the plant can be obtained, i.e., the permeate flows or water recoveries for each stage, the accuracy of the surrogates can be further validated and if necessary modified.

Data Availability Statement:
The data of this study can be provided by the authors upon request.

Acknowledgments:
The authors would like to thank the Texas A&M Energy Institute and the Multiparametric optimization & control group for their help and support. Furthermore, the authors would like to thank the SAWS H 2 Oaks Desalination Facility.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The       Table A8. Weights and biases of the ANN to approximate C p,sum . Five inputs (Q f , WR 1 , WR 2 , WR 3 , C p,t−1 ) and one hidden layer with three nodes.

Layer Weight Bias
Hidden layer