Surrogate Modeling for Liquid–Liquid Equilibria Using a Parameterization of the Binodal Curve

Computational effort and convergence problems can pose serious challenges when employing advanced thermodynamic models in process simulation and optimization. Data-based surrogate modeling helps to overcome these problems at the cost of additional modeling effort. The present work extends the range of methods for efficient data-based surrogate modeling of liquid–liquid equilibria. A new model formulation is presented that enables smaller surrogates with box-constrained input domains and reduced input dimensions. Sample data are generated efficiently by using numerical continuation. The new methods are demonstrated for the surrogate modeling and optimization of a process for the hydroformylation of 1-decene in a thermomorphic multiphase system.


Introduction
In chemical process design, advanced thermodynamic models offer increasing predictive accuracy for physical properties of mixtures. However, this correlates with increasingly complex models, which introduce serious challenges when used for process simulation and optimization. Problems may arise from embedded iterative solution procedures for the thermodynamic models, which may not converge to a valid solution or require a large number of iteration steps. The benefits of advanced thermodynamic models can be used by replacing them with surrogates that are computationally easy to handle [1].
The concept of surrogate modeling comprises a number of different approaches [2] that depend on the type of the original model and the desired properties of the surrogate. In chemical process engineering, data-driven surrogate modeling is usually employed [3]. This class of methods generally treats the original model as a black box with input-output data. The surrogate that replaces the original model is constructed such that it approximates the data and is easy to evaluate. These requirements define major steps of data-driven surrogate modeling.
In a first step, inputs and outputs have to be selected from the original model. This determines the properties of the input-output data and thereby affects both the accuracy and size of the surrogate model. A larger input dimension increases the size of the surrogate and necessitates more sample data. A difficult input domain may require modeling with an additional surrogate [4]. In addition, output multiplicities have to be treated with special methods [5].
Several data-driven surrogate models are available, e.g., polynomials, neural networks, and Gaussian processes [2]. The choice for the surrogate model structure is usually not trivial and a number of different models should be tried. Tools for automatically selecting a surrogate model structure are available (e.g., [6]).
Parameter fitting for surrogate models requires sample data that fully characterize the input-output behavior. The additional cost for sample generation is reduced by employing efficient sampling methods. Adaptive sampling approaches that employ exploration and exploitation strategies are preferred over one-shot methods in recent literature (see [3,4] for references).
Although various strategies are available for each aspect of surrogate modeling, there is still significant potential for improvements, especially when problem-specific features are taken into account.
The present contribution extends the methods of surrogate modeling for liquid-liquid phase equilibria (LLE). This includes efficient sample generation by employing a numerical continuation method and an efficient surrogate formulation that has a box-constrained input domain and a reduced input dimension compared to established formulations. In Section 2, LLE modeling is introduced briefly and a description of surrogate modeling of LLE is given that summarizes recent work on the topic. Section 3 describes the methods of sample data generation and surrogate modeling proposed in this work. A case study based on a process for the hydroformylation of 1-decene in a thermomorphic multiphase system is used in Section 4 to demonstrate the proposed methods. A short conclusion is given in Section 5.

Background
This section summarizes the required background for LLE modeling and recent work on surrogate modeling of LLE.

Liquid-Liquid Equilibrium Modeling
The liquid-liquid equilibrium is described by a two-phase flash model comprising component mass balance conditions and thermodynamic equilibrium conditions Here, N j i is the mole amount, µ j i is the chemical potential, T j is the temperature and p j is the pressure in phase j ∈ {I, II} for component i. The mole amounts of the feed mixture are denoted by N in i and the total number of components in the mixture is N c . The conditions for equal chemical potential can be reformulated for liquid-liquid equilibria as the isoactivity conditions with mole fractions x j i , vectors of mole fractions x j , and activity coefficients γ j i .
Iterative procedures are usually utilized to find a solution of the implicit equilibrium model, evaluating the activity coefficients provided by the thermodynamic model in each iteration. Therefore, using advanced models such as the perturbed-chain statistical associating fluid theory (PC-SAFT) [7] or universal quasichemical functional-group activity coefficients (UNIFAC) [8] directly for flowsheet simulation and optimization is unfavorable in terms of convergence behavior and computational effort. Furthermore, excluding undesired trivial solutions of the equation system given above requires additional effort. The following section describes the technique of replacing such expensive calculations by data-driven surrogate models that are computationally easier to handle.

Surrogate Modeling of LLE
Surrogate modeling aims at replacing a function f , with a surrogate functionf that is similar to the original function f and exhibits favorable properties, e.g., much lower computational effort. More specifically, data-driven surrogate modeling generally uses samples (x k , f (x k )) to fit parameters of the surrogatef such that some performance criterion is satisfied. If an approximation of the original functionf ≈ f is desired, typical performance criteria include minimizing the mean squared error or mean absolute error with respect to the samples. Another performance criterion is the exact reproduction of the sample pointsf (x k ) = f (x k ), which can be used for instance in Kriging interpolation (e.g., [9]). In this sense, data-driven surrogate modeling is another name for nonlinear regression and interpolation methods. Applications for surrogate modeling include, but are not limited to, cases where the original function is expensive to evaluate, e.g., computational fluid dynamics simulations, or where a functional form may not be available, e.g., experimental results. This section gives a summary of recent efforts [4,10] on using data-driven surrogate models for phase equilibrium calculations, particularly for LLE, to improve computational performance while preserving high model accuracy. For the sake of readability, some details of each approach are omitted here and we refer to the original publications for comprehensive descriptions. A broader overview on general surrogate modeling is available in the literature [2,3,11,12].

Selection of Inputs and Outputs
Distribution coefficients K i := N I i /N in i are introduced, which allows one to rewrite the equilibrium model as In the model comprising Equations (7)-(10), f i is replaced by a surrogatef i ≈ f i . Kriging interpolation is used for the surrogatesf i in [4,10]. The inputs of the surrogate are N c − 1 mole fractions of the feed mixture and the temperature. The outputs comprise the N c distribution coefficients. The pressure does not appear as an input since its influence on the liquid-liquid equilibrium is assumed to be negligible.
It should be emphasized that, given a feasible feed mixture and temperature, all quantities in Equations (7)-(10) can be calculated in a sequential fashion.

Description of the Input Domain X
A natural choice for the input domain, given molar fractions and the temperature as inputs, would beX However, the distribution coefficient model in Equations (7)-(10) is only valid within the miscibility gap of the mixture. Therefore, box constraints are not suitable to accurately describe the complete input domain X of the surrogatef i . Instead, the input domain can be defined by a classifier g(x in 1 , . . . , x in N c −1 , T), which is less than zero only within the miscibility gap: Possible choices for the classifier model include a set of linear expressions g = A (x in 1 , . . . , x in N c −1 , T) − b as in [10] or a scalar function g(x in 1 , . . . , x in N c −1 , T) as in [4]. Since the scalar classifier g in [4] is as difficult to calculate as the equilibrium model, it is approximated by another surrogate modelĝ ≈ g, in this case a support vector machine.

Sample Data and Surrogate Generation
The selection of sample locations for data generation can be categorized in one-shot space-filling designs, which select locations according to pre-defined strategies, and adaptive sampling approaches, which exploit information from available sampling data to select new sampling locations. In [10], sample values are calculated by evaluating the equilibrium model at pre-defined sample locations generated by a space-filling design. A major contribution of Nentwich and Engell [4] is a method of adaptively choosing new sample locations based on available sample data to improve the quality of the approximation compared to one-shot designs with the same number of samples. In principle, new sample locations are chosen by maximizing a weighted sum of the minimum distance to existing sample locations and the predicted variance, i.e., by exploration and exploitation strategies. The surrogate model needs to be updated in each iteration of this method.
In both references, the sample generation takes a substantial amount of time. For a constant sample density, the number of required samples scales exponentially with the dimension of the input domain. In addition, the equilibrium model is solved at each sample location independently, i.e., the solution algorithm needs to converge starting from an independent initial guess.

Methods
The literature approaches to surrogate modeling of LLE rely on a regression model for the distribution of components between phases and an additional classification model to determine whether a given set of input values lies in the two-phase region. This section describes new approaches aiming at extending the pool of methods for surrogate modeling of LLE. For this, a model reformulation that enables smaller surrogate models and a method for efficiently calculating sample data are presented.

Parameterization of Binodal Curves
The strategy for reducing the computational effort for surrogates in LLE modeling presented here comprises two ideas. First, instead of modeling the entire miscibility gap, only the binodal curve is described by a surrogate, which reduces the surrogate input dimension by one. As shown in Figure 1 for a ternary mixture, the binodal curve has one dimension fewer than the miscibility gap, i.e., it is a line instead of an area. Note that for more than three components N c > 3, the set of equilibrium compositions is a (N c − 2)-dimensional surface. For a constant sample density, the number of required sample points for a surrogate model depends exponentially on its input dimension. Reducing the number of inputs therefore improves computational tractability of the surrogate.
Second, instead of using molar fractions as inputs of the surrogate, the position on the binodal curve is introduced as a new coordinate, marked by t 1 in Figure 1. Each value of this coordinate corresponds to the endpoints of a tie line, i.e., the compositions of two phases in equilibrium. The parameterization variable t 1 can be defined from the border of the phase diagram to the critical point, if the data are available, or just on a section of the binodal curve. This removes the need for a complicated (surrogate) model of the input domain, since the parameterization of the binodal curve can be chosen such that box constraints are sufficient, i.e., t 1 ∈ [0, 1]. Using these ideas, the equilibrium model is reformulated as follows.
Here, the parameterization variables for the binodal curve are denoted by t 1 , . . . , t N c −2 . In the model comprising Equations (13) and (14), f The outputs comprise 2(N c − 1) independent mole fractions. We decided to include all 2N c molar fractions here, because eliminating one molar fraction by using the summation condition would make it harder to have the same error scaling for all components during surrogate fitting and it would increase the maximum error for the eliminated component in the final surrogate model.
In summary, the parameterization of binodal curves enables the generation of smaller surrogate models due to the reduction of the input dimension and the definition of the input domain by box constraints. The overall model is rendered implicit by this reformulation. Implicit models are generally harder to solve than explicit models. However, the reduced model size, and thereby the reduced number of nonlinear expressions that have to be evaluated, is favorable for the computational performance.

Numerical Continuation for Sample Calculation
Calculations of phase equilibria, especially for LLE, often require considerable computational effort due to iterative solution procedures and in some cases good initial guesses to calculate valid equilibrium compositions. Therefore, fast surrogates are employed for simulation and optimization. The original equilibrium model is only used during sample generation. However, the samples need to cover the entire miscibility gap to enable accurate surrogate modeling and their acquisition can take a considerable amount of time.
The computational cost of sample generation can be reduced by exploiting existing sample data. For instance, in [4], new sample locations are chosen such that they are most likely to improve the surrogate model, thus reducing the overall number of required sample points.
The basic idea followed in the present work is using numerical continuation methods (e.g., [13]) to improve convergence behavior of equilibrium calculations and thereby to reduce the computation time. Continuation methods are used to find solutions of an equation system that depend on a parameter, e.g., solutions of the equilibrium model depending on the feed composition. Varying this parameter, a prediction for the new solution is calculated based on previous solutions and then the prediction is corrected. Since initial guesses are improved using systematic predictions, the correction step converges more quickly to an accurate solution.
In an earlier work [14], the phase stability of an arbitrary point in the phase diagram is tested by tracking valid solutions of the equilibrium model from a known starting composition to the desired location using homotopy continuation. This strategy is adapted here to track solutions of the equilibrium model along the binodal curve by varying the feed composition.
We apply a dynamic method for minimization problems (e.g., [15]) to solve the equilibrium model for a given feed composition. The following ordinary differential equation system In an effort to track solutions along the binodal curve, the feed composition is varied along a line with a constant phase ratio. This trajectory, depicted in Figure 2, starts from a point on the border of the phase diagram and covers the entire miscibility gap of a ternary mixture. For all feed compositions along the trajectory, corresponding points on the binodal curve are obtained by solving the equilibrium model using Equations (16) and (17). In comparison to the previous approaches in [4,10], this strategy only requires sampling on a line in the miscibility gap instead of sampling in the whole area. If the starting point does not lie on the border of the phase diagram, the line has to be tracked in both directions from the starting point.
An efficient approach to characterize the overall phase behavior of a mixture by evaluating the convex envelope of the Gibbs energy, and thus to identify viable starting points within a miscibility gap, is given in [16].
The continuation method is implemented here by augmenting Equations (16) and (17) with additional expressions that offer convergence to the selected phase ratio and movement of the feed composition perpendicular to the current tie line. Each additional expression is chosen such that the dynamic it introduces is orders of magnitude slower than that of the equilibrium conditions. The resulting stiff ordinary differential equation system is solved using the MATLAB function ode15s [17]. In the case of mixtures with more than three components, the equilibrium compositions form a (N c − 2)-dimensional surface instead of a one-dimensional curve. In the present work, the continuation method is adapted to this case by fixing the molar fractions of additional components to constant values and repeat the calculations for all sets of fixed values.
In an alternative approach, instead of varying the feed composition, the compositions of both equilibrium phases could be directly tracked along the binodal curve using continuation methods [18] with Equations (16) and (17) as a possible choice for a robust corrector.

Results
This section demonstrates the application of the methods presented in the previous section to the modeling and optimization of a liquid-liquid extraction process.

Description of the Case Study
A multistage liquid-liquid extraction process adapted from [10] is used here as a case study for the methods presented above. In [10], the optimization of a process for the homogeneously catalyzed hydroformylation of 1-dodecene in a thermomorphic multiphase system is considered. The mixture is homogeneous at reaction conditions and biphasic at lower temperatures. This is used to separate the catalyst from the product in an extraction cascade and to recycle the catalyst back to the reactor. The extraction cascade considered here is highlighted in Figure 3, which shows a modified flowsheet of the overall process from [10]. In the present work, the mixture consists of the polar solvent dimethylformamide, the nonpolar solvent dodecane, the product n-undecanal and the substrate 1-decene. As in [10], the mixture includes a catalyst, which is distributed between the two phases depending on their equilibrium composition. The catalyst concentration is higher in the more polar phase than in the less polar phase. Moreover, it is assumed that the catalyst has no influence on the phase equilibrium due to it having a very low concentration. Thus, the catalyst partition coefficient is modeled at infinite dilution and therefore it is added to the set of surrogate outputs, but not to the inputs. All extraction stages are operated at the same constant temperature T = 298.15 K, as suggested by the results in [10]. The resulting equation system for a single extraction stage readṡ withṅ j i denoting molar flows of component i in phase j and f cat calculating the logarithmic partition coefficient log 10 P 12 for the catalyst between the less polar phase I and the more polar phase II. Table 1 lists the names of the components corresponding to the index i.  With N c = 4, the inputs of the surrogate are two parameterization variables t 1 and t 2 . The outputs are eight mole fractions for the equilibrium compositions in both phases as well as the catalyst partition coefficient.
The extraction cascade consists of k = 1, .., N s stages, numbered from k = 1 for the output stage of the more polar phase II to k = N s for the output stage of the less polar phase I. The stage k = 1 is also the feed stage. Some of the polar solvent is separated using distillation column 1 and recycled to the extraction cascade at stage k = N s . The mass balances that describe the extraction cascade reaḋ n in cat,1 =ṅ II cat,2 +ṅ feed cat , n in cat,k =ṅ I cat,k−1 +ṅ II cat,k+1 , k = 2, . . . , N s −1, whereṅ col i is the distillate molar flow of distillation column 1 that recycles the extraction solvent anḋ n feed i is the feed flow of the extraction cascade for component i. The mass balances of the distillation column are replaced by the approximation that the distillate of column 1 contains only the polar solventṅ col =ṅ col 1 . The distillate flow is a degree of freedom for this process. A larger distillate flow is able to extract more catalyst, but requires, e.g., more heat for vaporization, a larger distillation column, and larger extraction stages. The costs considered for this process comprise the catalyst that is retained in the nonpolar product stream, the investment for the decanter vessels, and the operation and investment for distillation column 1. The catalyst costs are proportional to the amount of catalyst that is lost. The investment costs for the distillation column and the decanter vessels are calculated based on [19]. The sizing of the column utilizes the Fenske-Underwood-Gilliland correlations [20][21][22][23] with mean mixture and component properties. A simplified cost function is generated by fitting a polynomial function to data obtained by evaluating the cost calculation for a set of different compositions of the column feed and different values of the distillate flow rate at a distillate purity of 99 % (see [24] for more details). The overall cost J in $ year −1 for this process reads with parameters given in Table 2.

Data Generation
The method presented in Section 3.2 is used to generate sample data for f j i . The activity coefficients are calculated utilizing a modified UNIFAC (Dortmund) model [8] with some parameters fit to experimental data as in [10]. Recall that, for four components, one molar fraction is required to be fixed to a constant value for each continuation run. Therefore, a set of binodal curves is calculated for different fixed values of the molar fraction of 1-dodecene in the less polar phase x I 4 ∈ {0, 0.01, . . . , 0.25}. Each continuation run is initialized at the boundary of the phase diagram with zero n-undecanal, i.e., x I 3 = x II 3 = 0. The starting point is calculated by solving the dynamic equation system using ode15s in MATLAB with an arbitrary initial guess for the molar fractions of the polar solvent and the nonpolar solvent. Beginning at the calculated starting point, the feed composition is continued along the line of a constant phase ratio of 0.5. The continuation is stopped when the condition x I 1 = x I 2 is met, i.e., when the amount of polar solvent in the less polar phase is equal to that of the nonpolar solvent. This stopping criterion is chosen here because the extraction of catalyst is ineffective beyond that point due to low values of the partition coefficient. An example of the resulting binodal curves is shown in Figure 4.
The total number of 26 continuation runs produce more than 4 × 10 4 samples with an accuracy The calculations require fewer than 2 × 10 5 calls to the UNIFAC implementation of the activity coefficient model (γ j 1 , . . . , γ j N c ) = f UNIFAC (x j , T, p) and a computation time of less than 1 min on a standard desktop computer without utilizing parallelization. For each binodal curve with a fixed value of x I 4 , this corresponds to more than 1500 samples and less than 2 s of computation time. Calculating each starting point requires approximately 1000 function calls to the activity coefficient model and each binodal curve requires fewer than 5000 function calls. In other words, the continuation method required fewer than four function calls per sample point. Note that each evaluation of the right-hand-side of Equations (16) and (17) Table 3).
The results from f j i are used in COSMOtherm [25] to calculate the logarithmic partition coefficient of the catalyst f cat using the same parameterization and catalyst COSMO input file as in [10]. Continuation is not available for COSMOtherm. Therefore, the logarithmic partition coefficient needs to be calculated separately for each phase composition, which takes a few seconds per sample point. To limit the computational effort, only 51 equally spaced sample points are used for each binodal curve.
The overall dataset used for the surrogate generation comprises 51 × 26 = 1326 sample locations on a regular grid for the input variables (t 1 , t 2 ) ∈ [0, 1] 2 . The variables (t 1 , t 2 ) determine the equilibrium compositions of the two phases, i.e., the position on the two-dimensional binodal "curve". The value of the molar fraction of 1-dodecene in the less polar phase is defined as x I 4 := 0.25 t 2 . At each sample location, the output vector contains nine values, namely the phase compositions and the logarithmic catalyst partition coefficient: with f = ( f I 1 , . . . , f I N c , f II 1 , . . . , f II N c , f cat ) . Because the definition for t 2 makes interpolating x I 4 redundant, only eight outputs are used for the surrogate model generation, which corresponds to 1326 × 8 = 10,608 sample values.

Surrogate Fitting
In the present work, the surrogate model is constructed as a sum of a second order polynomial function and a shallow artificial neural network (ANN) [26], i.e., an ANN with one hidden layer: The polynomial function captures most of the information contained in the data using only simple expressions. The ANN is employed to further reduce the interpolation error of the surrogate model. The second-order polynomial function f poly (t 1 , t 2 ) = a 00 + a 10 t 1 + a 01 t 2 + a 11 t 1 t 2 + a 20 t 2 1 + a 02 t 2 2 is fitted to the original data in three steps. First, for each output f i , all parameters a ij of Equation (31) are determined by minimizing the sum of squared errors. Second, all terms with a low contribution to an output value are set to zero for that output value to reduce the number of nonlinear expressions, i.e., if a ij =00 ≤ 0.05 max kl =00 |a kl |, then a ij := 0.
Third, the remaining parameter values are recalculated by minimizing the sum of squared errors. The polynomial function has a maximum of 6 × 8 = 48 nonzero parameters.
Afterwards, the shallow ANNf ANN is fitted to the residual of the data f (t 1,k , t 2,k ) −f poly (t 1,k , t 2,k ). The model is fitted using the train command of MATLABs deep learning toolbox with the following options: the hyperbolic tangent as the activation function, the mean squared error as the performance function, data division using interleaved indices, Levenberg-Marquardt backpropagation, the regularization parameter equal to 1 × 10 −11 , and a minimum gradient equal to 1 × 10 −16 . The error weights w e,l for the outputs l are chosen proportionally to the inverse squared data range over all samples k ∈ {1, . . . , N samples } w e,l ∝ range( f l ) −2 , l = 1, ..., N outputs , (33) with The remaining options are at default values. The number of neurons in the hidden layer N neurons is varied from 5 to 15 to show the trade-off between the model accuracy and the number of parameters and nonlinear expressions.
The results of the parameter fitting are shown in Table 3 with the error metrics defined below. A part of the training algorithm relies on random number generation. The variations in model performance caused by this are not accounted for. The ANN training was executed once for each number of neurons N neurons in the hidden layer. The mean normalized error e mean norm is the algebraic mean, over all outputs l ∈ {1, ..., N outputs } and samples k ∈ {1, . . . , N samples }, of the absolute error of an output value divided by the range of values for that output, i.e., The maximum normalized error e max norm is the maximum error of any output l over all samples k divided by the range of values for that output, i.e., The maximum normalized error offers a data-independent estimate of the surrogate quality, e.g., a surrogate with e max norm ≥ 0.5 is not better than a constant function. The error metric e max norm,poly is the same as e max norm , but with the surrogate comprising only the polynomial functionf =f poly .
The surrogate fitting includes all molar fractions of a phase, although one of the molar fractions could be replaced by the summation condition. However, if the summation condition is used to replace one molar fraction, the worst case error for this quantity is the sum of absolute errors for all other molar fractions. This is increasingly relevant for a larger number of components. Including all molar fractions allows one to define suitable error weights for all quantities.
The results in Table 3 show that the surrogate can reproduce the data to a high degree of accuracy with a relatively low number of parameters. Based on the results for different numbers of neurons in the hidden layer, a surrogate model with N neurons = 10 is used for the following surrogate based optimization. The mean error over all sample data and surrogate outputs normalized by the data range for this surrogate model is 0.00039 and the maximum normalized error is 0.0038. For 10608 sample values, this is achieved using 158 nonzero parameter values. There is no meaningful distinction between training set and validation set for this ratio of data points to number of parameters. The difference between surrogate model and sample data is also illustrated in Figure 4.
If only a second-order polynomial function is used as a surrogate, the maximum normalized error is 0.1002.

Optimization
In this section, the surrogate model is tested for its suitability to be used in process optimization problems. The process model introduced in Section 4.1 is used as a case study. A mixture containing four components plus a catalyst is considered for the liquid-liquid extraction at a fixed temperature and pressure. The aim is to find the optimal extraction solvent recycle such that the costs, comprising energy, investment, and catalyst loss, are minimized. Surrogate models are employed for the liquid-liquid equilibrium in each extraction stage to keep the computational burden of the optimization problem low.
The original phase equilibrium model is replaced by the surrogate model and slack variables δ that account for regression errors with the values of the slack variables lying between lower bounds δ j i and upper boundsδ The values of the upper and lower bounds are estimated by the maximum error of the surrogate model at the sample points k ∈ {1, . . . , N samples }.
The optimization problem reads minimize J (46) with domains for all variables set to appropriate values and additional redundant constraints that enable tighter bounds for deterministic global optimization. The hyperbolic tangent is formulated as tanh(x) = 1 − 2 1+exp(2 x) as suggested in [26]. The feed streamṅ feed i is given in Table 4. The number of stages is fixed to values between N s = 1 and N s = 6. The nonlinear programming (NLP) problem is implemented in GAMS 26.1.0 [27] using the deterministic global solver BARON 18.11.12 [28] with CONOPT 4.09 [29,30] as the nonlinear programming subsolver and CPLEX 12.8.0 [31] as the mixed-integer programming subsolver, and it is solved on the same standard desktop computer as used in Section 4.2. All options are at default values except for the relative optimality criterion, which is set to 1 × 10 −3 . The optimization is repeated for different numbers of stages N s . The computation time for global optimization is the total CPU time used as reported by BARON. Performance variability [32] is not accounted for: Each optimization problem is solved once. The time and the iterations required to find a local solution are determined by solving each problem with CONOPT and using the total time elapsed as reported by GAMS. Default values chosen by GAMS are used as initial guesses for all variables. Although this initial point is always infeasible, the selected solvers found feasible solutions in all observed cases.
The results in Table 5 show that the cost of the process falls steeply until a minimum is reached at N s = 4. The cost increases slightly for N s > 4. This behavior is due to large solvent recycles being required for good catalyst retention at low numbers of stages and the relatively small costs of increasing the number of stages. The overall optimum at N s = 4 resembles a trade-off between the cost for catalyst loss and solvent recycle on one hand and the investment cost for additional stages on the other hand.
The solvent recycle for N s = 1 is an exception. The feed of the single stage already contains a large amount of the extraction solvent. Increasing the solvent amount further has only a marginal effect on catalyst retention, which does not compensate for the additional cost of solvent recycle at some point. The surrogate model implementation is significantly faster than the original model considering that we observed computation times of a few seconds for a single COSMOtherm evaluation. The computation times using BARON show that the surrogate model formulation is also suitable for small to medium sized deterministic global optimization problems. CONOPT was able to find a global solution for each of the considered cases. BARON requires the computation time to prove globality by converging the lower bound.

Conclusions
In this work, we introduce a new surrogate formulation that uses a parameterization of the binodal curve to reduce the input dimension and simplify the input domain. We also propose using numerical continuation methods for sampling of LLE data based on advanced thermodynamic models. The presented methods reduce the time required for sample data generation and allow for highly accurate reproduction of the data using smaller surrogate models.
Parameterization of complex input domains and sampling by numerical continuation are general strategies that may be extended to other applications in future work. Another possible improvement is replacing the activity coefficient model currently used for LLE calculations with more predictive equation of state models. The computation time of global optimization may be further reduced by employing reduced-space formulations of the optimization problem as in [26,33,34].
Funding: This research was funded by DFG grant number TRR 63.