A Hybrid Framework for Simultaneous Process and Solvent Optimization of Continuous Anti-Solvent Crystallization with Distillation for Solvent Recycling

Anti-solvent crystallization is frequently applied in pharmaceutical processes for the separation and purification of intermediate compounds and active ingredients. The selection of optimal solvent types is important to improve the economic performance and sustainability of the process, but is challenged by the discrete nature and large number of possible solvent combinations and the inherent relations between solvent selection and optimal process design. A computational framework is presented for the simultaneous solvent selection and optimization for a continuous process involving crystallization and distillation for recycling of the anti-solvent. The method is based on the perturbed-chain statistical associated fluid theory (PC-SAFT) equation of state to predict relevant thermodynamic properties of mixtures within the process. Alternative process configurations were represented by a superstructure. Due to the high nonlinearity of the thermodynamic models and rigorous models for distillation, the resulting mixed-integer nonlinear programming (MINLP) problem is difficult to solve by state-of-the-art solvers. Therefore, a continuous mapping method was adopted to relax the integer variables related to solvent selection, which makes the scale of the problem formulation independent of the number of solvents under consideration. Furthermore, a genetic algorithm was used to optimize the integer variables related to the superstructure. The hybrid stochastic and deterministic optimization framework converts the original MINLP problem into a nonlinear programming (NLP) problem, which is computationally more tractable. The successful application of the proposed method was demonstrated by a case study on the continuous anti-solvent crystallization of paracetamol.


Introduction
Solution crystallization involves the formation of a crystalline solid state from a solution; it is frequently used as a separation and purification technology in pharmaceutical industry. Anti-solvent crystallization is an established method of pharmaceutical crystallization [1]. A miscible anti-solvent is mixed with a feed stream containing a solute dissolved in a solvent. If the anti-solvent is selected appropriately, a solvent mixture is created with low solubility for the solute, which provokes crystallization despite the dilution effect from mixing. Solvents are a main source of waste in pharmaceutical processes. Anti-solvent crystallization generally produces more solvent waste compared to other crystallization methods such as cooling or evaporative crystallization unless the anti-solvent can be separated and recycled.
The current paradigm shift in the pharmaceutical industry towards continuous manufacturing provides new opportunities for solvent reduction via continuous anti-solvent recycling [2][3][4][5][6][7][8]. However, the integration of pharmaceutical crystallization with solvent separation and recycling raises new challenges related to the optimal solvent selection, for which effective computational tools are needed. First, the solvent selection for such a process is non-trivial due to trade-offs between optimal solvent properties for different unit operations. For example, a solvent pair that provides a strong solubility reduction and, consequently, a high yield for anti-solvent crystallization may be difficult to separate for recycling and reuse. Second, the optimal process design is dependent on the solvent selection. Therefore, the process and solvent types are preferably optimized simultaneously due to this general interdependence between optimal solvent properties and process operating conditions and design [9][10][11][12]. Third, optimal solvent selection requires the consideration of various performance criteria related to aspects such as safety, economics, health, and environment. Therefore, the successful application of a solvent design method for an integrated crystallization and solvent separation process requires accurate models for predicting key thermodynamic properties for different unit operations and efficient solution strategies enabling simultaneous solvent and process optimization considering various performance criteria.
Several computer-aided approaches have been developed for solvent selection and design of solution crystallization processes. Karunanithi et al. [13] proposed a computer-aided molecular design framework for the design of solvents or anti-solvents for solution crystallization. The optimal solvent was determined based on solubility, which was predicted by the UNIFAC model. Other solvent properties such as flash point, toxicity, viscosity, and boiling point were posed as constraints, which were predicted by the group contribution method. Different solvent design methods can also be developed using different solubility prediction models [14,15]. Most recently, Watson et al. [16] proposed a methodology enabling the simultaneous identification of the optimal process temperature, solvent and anti-solvent molecules, and solvent mixture composition by using the SAFT-γ Mie equation of state. Besides adopting solubility as the key performance criterion, the optimal solvent could also be selected based on its influence on the crystal morphology. Chen and Trout [17] developed a solvent selection procedure for improving the morphology of needle-like crystals, where the solvent effects on crystal morphology were captured by using a modified attached energy model.
Most of the previous solvent design methods focus on stand-alone crystallization processes. The shift to continuous operation opens up opportunities for integrating crystallization with other up-or down-stream processing steps, [18] which also brings new challenges on the optimal solvent selection. The solvent should be selected not only based on the optimal properties for crystallization, but the ability for regeneration and recycling [19], or the easiness for solvent-swap [20] should also be considered. We have previously developed a systematic approach for simultaneous solvent and process optimization for a continuous anti-solvent crystallization process with solvent separation by flash [21]. A perturbed-chain statistical associated fluid theory (PC-SAFT) model [22] with a unified parameterization scheme was adopted to calculate thermodynamic properties, including the solubility of pharmaceuticals in solvent mixtures for crystallization and component fugacity coefficients for the flash separation. The PC-SAFT equation of state enabled the reliable prediction of various pure component and mixture properties using only few molecule parameters. In addition, the low-dimension parameter space of the PC-SAFT model allowed for efficient optimization. The integrated solvent and process optimization problem inherently led to a mixed integer nonlinear programming (MINLP) problem due to the discrete nature of solvent selection. The continuous mapping method [23] was applied to convert this MINLP problem into a nonlinear programming (NLP) problem to identify an approximate solution to the original MINLP problem. This solution strategy relaxed the solvent PC-SAFT parameters and searched for optimal parameters within a predefined convex hull, which reduced the computational load. The successful application of this integrated optimization approach has been demonstrated for a case involving a process model based on material balances only [21], which was later extended with energy balances and distillation with a fixed configuration of two equilibrium stages [24]. Although such extension allowed for energy costs to be included in the objective function, the fixed configuration for distillation restricted the solution space. A process superstructure must be optimized when considering the number of equilibrium stages as degrees of freedom, which increases the computational load of the problem substantially and cannot be solved when using the existing framework. Therefore, novel computational methods are needed to support decision making for these types of processes.
The objective of this work was to develop a computational method for the simultaneous solvent selection and process optimization of continuous anti-solvent crystallization processes with distillation for solvent recycling. The process model includes the prediction of thermodynamic and heat properties using the PC-SAFT equation of state. Therefore, MESH equations (material balance/equilibrium/fraction summation/enthalpy balance) can be employed to provide a rigorous description of the processes. Moreover, the number of equilibrium stages in the distillation column was considered a decision variable, which was fixed in earlier work [24]. A novel hybrid stochastic-deterministic optimization framework was developed to solve this type of process optimization. In particular, the integer variables for solvent selection were relaxed by the continuous mapping method in order to make the size of the optimization problem independent from the number of solvents under consideration. The integer variables representing process configurations embedded within the process superstructure were dealt with using a genetic algorithm. As a consequence, the original MINLP problem was converted and solved as an NLP problem, which is computationally more reliable. The presented approach aims to enable faster process development with reduced experimental efforts and, ultimately, reduced solvent waste in industry.

Approach
The continuous process ( Figure 1) involves a crystallizer for crystal formation and growth followed by filtration for crystal separation of the crystals from the mother liquor and distillation for anti-solvent separation and recycling. The feed solution to the crystallizer is assumed saturated with an active pharmaceutical ingredient (API). Crystallization is operated at constant temperature, and the supersaturation is generated by the addition of recycled and fresh make-up anti-solvent. After filtration, the mother liquor is pre-heated to the bubble point and fed into the distillation column. The top product of the distillation column is rich in anti-solvent and is partially recycled back to the crystallizer after cooling. The remaining part is possibly sent back to the column as reflux. The bottom product of the column is considered a waste stream and contains solvent, dissolved API, and some anti-solvent. The distillation column can consist of multiple equilibrium stages with one total condenser operating at the bubble point and one partial reboiler. on material balances only [21], which was later extended with energy balances and distillation with a fixed configuration of two equilibrium stages [24]. Although such extension allowed for energy costs to be included in the objective function, the fixed configuration for distillation restricted the solution space. A process superstructure must be optimized when considering the number of equilibrium stages as degrees of freedom, which increases the computational load of the problem substantially and cannot be solved when using the existing framework. Therefore, novel computational methods are needed to support decision making for these types of processes. The objective of this work was to develop a computational method for the simultaneous solvent selection and process optimization of continuous anti-solvent crystallization processes with distillation for solvent recycling. The process model includes the prediction of thermodynamic and heat properties using the PC-SAFT equation of state. Therefore, MESH equations (material balance/equilibrium/fraction summation/enthalpy balance) can be employed to provide a rigorous description of the processes. Moreover, the number of equilibrium stages in the distillation column was considered a decision variable, which was fixed in earlier work [24]. A novel hybrid stochasticdeterministic optimization framework was developed to solve this type of process optimization. In particular, the integer variables for solvent selection were relaxed by the continuous mapping method in order to make the size of the optimization problem independent from the number of solvents under consideration. The integer variables representing process configurations embedded within the process superstructure were dealt with using a genetic algorithm. As a consequence, the original MINLP problem was converted and solved as an NLP problem, which is computationally more reliable. The presented approach aims to enable faster process development with reduced experimental efforts and, ultimately, reduced solvent waste in industry.

Approach
The continuous process ( Figure 1) involves a crystallizer for crystal formation and growth followed by filtration for crystal separation of the crystals from the mother liquor and distillation for anti-solvent separation and recycling. The feed solution to the crystallizer is assumed saturated with an active pharmaceutical ingredient (API). Crystallization is operated at constant temperature, and the supersaturation is generated by the addition of recycled and fresh make-up anti-solvent. After filtration, the mother liquor is pre-heated to the bubble point and fed into the distillation column. The top product of the distillation column is rich in anti-solvent and is partially recycled back to the crystallizer after cooling. The remaining part is possibly sent back to the column as reflux. The bottom product of the column is considered a waste stream and contains solvent, dissolved API, and some anti-solvent. The distillation column can consist of multiple equilibrium stages with one total condenser operating at the bubble point and one partial reboiler.

MESH Equations
The continuous process is assumed to operate at steady state with phase equilibrium in the crystallizer and distillation column, which can be described by the MESH equations (i.e., material balances, equilibrium conditions, summation equations, and heat (enthalpy) balances). The material balances over the crystallizer and filter, considered as a single system, are given by and over the total condenser, trays, and reboiler of the distillation column, respectively, by where F in , F anti , F crystal , F distil , V B , L C , D, and B denote the molar flowrates of the process streams ( Figure 1); x i represents the mole fraction of compound i, which can be solvent, anti-solvent or API. It is assumed that the make-up anti-solvent and product stream with API are both pure, and that all of the API is recovered at the bottom of the distillation column. In general, the role of the API in the distillation column is neglected due to the low concentration for optimal solutions and negligible vapor pressure of common APIs. The API feed stream is assumed to be saturated with API, and the outlet of the crystallizer is also assumed to be at phase equilibrium. Therefore, the concentration of the feed stream and crystallizer outlet depends on the solvent selection. The solid-liquid equilibrium (SLE) is described by where γ API is the activity coefficient of the API in solution; ∆H f m , T m , ∆C p , are the API pure component properties, denoting the enthalpy of fusion, melting point, and heat capacity difference between the hypothetical supercooled liquid form and the crystalline form, respectively. The vapor and liquid phases are assumed to be in equilibrium on each tray in the distillation column. The distillation is assumed to be operated at atmospheric pressure without pressure drop. The vapor-liquid equilibrium (VLE) criterion is described by the equality of the component fugacity in the vapor and liquid phase as follows: where ϕ is the fugacity coefficient. The material balances are closed by summation equations for the mole fractions in each stream, which are not listed explicitly here. Energy balances are considered for the distillation column only, since the temperature in the crystallizer is assumed to be constant, and the latent heat of crystallization is assumed to be negligible, as follows: V n+1 · H V n+1 + L n−1 · H L n−1 = L n · H L n + V n · H V n , n Feed tray (9) where H is the molar enthalpy of each stream; Q C , Q B , Q R , and Q F are the heat duties for the condenser, the reboiler, the recycle, and the distillation feed (Figure 1), respectively.

Thermodynamic Model
The PC-SAFT model was adopted as the thermodynamic model for each unit operation [22]. The parameterization schemes for APIs and different types of solvents including a solvent database with 48 candidate solvents that are typically used in pharmaceutical industry and that do not pose any major concerns for safety, health, and environment were adopted from earlier work [21]. The PC-SAFT model enables the calculation of the reduced Helmholtz free energy a res for a given system volume, temperature, and composition. Many other properties can then be derived from a res by applying various thermodynamic relationships. The derivation of the activity coefficient for SLE and the fugacity coefficient for VLE calculation was described by Gross and Sadowski [22], which was used in this work. The good predictability of the PC-SAFT model for the SLE of pharmaceuticals and VLE phase equilibria of organic compounds has been previously demonstrated [22,25,26].
The molar enthalpy H consists of contributions from the ideal part H ig and residual part H res : where H ig is written as the summation of the component ideal heat capacity C ig p , and H res is expressed as a function of a res . C ig p has been empirically correlated with PC-SAFT pure component parameters m, ε, and σ by the following relationship: The calculated ideal heat capacity C ig,cal p from Equation (15) lies within ± 10% relative error of tested reference values ( Figure 2). Furthermore, a comparison between the predicted and experimentally measured heat of vaporization for different types of solvents shows a good predictability for H res (Figure 3). Note that benzene, ethanol, and ethyl acetate were selected to include non-polar non-associating, associating, and non-associating solvents, respectively. A satisfied prediction of ∆H vap can be achieved within a wide temperature range for solvents with different characteristics.

Optimization Problem Formulation
The problem for simultaneous optimization of solvent types and the process design was formulated as follows: tested reference values ( Figure 2). Furthermore, a comparison between the predicted and experimentally measured heat of vaporization for different types of solvents shows a good predictability for (Figure 3). Note that benzene, ethanol, and ethyl acetate were selected to include non-polar non-associating, associating, and non-associating solvents, respectively. A satisfied prediction of ∆ can be achieved within a wide temperature range for solvents with different characteristics.

Optimization Problem Formulation
The problem for simultaneous optimization of solvent types and the process design was formulated as follows: The economic objective function mimics revenues (α 1 F crystal ) from product minus the operational costs C op and the capital costs related to the number of distillation trays (a 2 N), and is normalized with respect to the inlet flowrate F in . C op is a linear combination of various costs including the use of make-up anti-solvent (α 2 F anti ), incineration of solvent waste (α 3 B), and cooling and heating utilities (α 4 (Q C + Q R ), α 5 (Q B + Q F )). The free variables for optimization are the solvent and anti-solvent properties represented by the PC-SAFT parameters (p i ), the solvent composition of the crystallizer outlet (x i,F out ), the duty for the distillation reboiler (Q B ) and condenser (Q C ), the number of trays for the rectifying section (N R ), and the stripping section (N S ) of the distillation column. The optimization is constrained by the process model, i.e., the MESH equations (P1.2). The operating pressure for the integrated system is assumed constant at 1 bar (P1.3), and the crystallization temperature is fixed at 298 K (P1.4). Constraint P1.5 ensures that the selected solvent and anti-solvent are miscible [27]. Furthermore, a convex hull (P1.6) is used to restrict the search space of the solvent parameters [23]. Finally, the distillation column is restricted by a maximum number of trays (N max ) excluding one feed tray (P1.7).

Solution Strategy
P1 is an MINLP problem, which is difficult to be solved by state-of-the-art MINLP solvers, at least for our tested cases when using the GAMS/SBB [28] solver and the GAMS/DICOPT solver [29]. Convergence always fails due to the infeasibility of the sub-NLP problems. This infeasibility is likely due to the high non-linearity of the sub-NLP problem, even when the disjunctive programming formulation [30] is adopted to reduce the redundant equations in each sub problem. In this respect, a novel hybrid optimization framework is proposed, which takes advantage of both deterministic and stochastic optimization approaches. An overview of the framework is provided in Figure 4. of the solvent parameters [23]. Finally, the distillation column is restricted by a maximum number of trays ( ) excluding one feed tray (P1.7).

Solution Strategy
P1 is an MINLP problem, which is difficult to be solved by state-of-the-art MINLP solvers, at least for our tested cases when using the GAMS/SBB [28] solver and the GAMS/DICOPT solver [29]. Convergence always fails due to the infeasibility of the sub-NLP problems. This infeasibility is likely due to the high non-linearity of the sub-NLP problem, even when the disjunctive programming formulation [30] is adopted to reduce the redundant equations in each sub problem. In this respect, a novel hybrid optimization framework is proposed, which takes advantage of both deterministic and stochastic optimization approaches. An overview of the framework is provided in Figure 4. In the outer layer, the optimal process operating conditions and structures are determined by the genetic algorithm [31]. Each individual in the population represents one process design. The roulette wheel method is applied for the evolution based on the fitness values (i.e., objective function values of P1) of individuals. Crossover and mutation are considered as the evolution operators. In  In the outer layer, the optimal process operating conditions and structures are determined by the genetic algorithm [31]. Each individual in the population represents one process design. The roulette wheel method is applied for the evolution based on the fitness values (i.e., objective function values of P1) of individuals. Crossover and mutation are considered as the evolution operators. In particular, continuous variables (i.e., process operating conditions) are pre-coded in a binary form before performing evolution operations. Only mutations are considered for the binary integer variables indicating the existence of each distillation tray. In the inner layer, the solvent pair for each individual is optimized deterministically. The solvent searching space is firstly relaxed from discrete points to continuous space within a convex hull by relaxing the solvent PC-SAFT parameters. NLP algorithms with multiple initial guesses are then applied in an attempt to obtain the (near) global optimum. Finally, a Taylor expansion is used to map the optimal solvents, which may not exist in reality due to the relaxation, to the nearest points in the solvent parameter searching space corresponding to real solvents. More details on this mapping procedure can be found elsewhere [21].
From the above discussion, it is clear that that the solvent PC-SAFT parameters p i are optimized deterministically by the mapping procedure. As consequence, the increase of problem size due to the combination of different solvents can be avoided when purely employing stochastic algorithms. Furthermore, the remaining free variables in P1 are related to the process (x i,F out , Q B , Q C , N R , N S ) and are optimized stochastically, so that the convergence performance of the NLP problem by deterministic algorithms can be improved with reduced free variables that are related to solvents only. Note that despite these different approaches for optimizing solvent and process variables, the hybrid framework ultimately leads to a simultaneous optimal solvent selection and process design.

Case Study
The proposed method is illustrated with a case study. Paracetamol, a commonly used over-the-counter API for treating pain and fever, was selected as the model compound.
The parameter values for calculating the SLE from Equation (6) Table 1 [32,33]. The value of N max in P1 was arbitrarily set as 6, which reflects an implicit assumption that solvent pairs that are particularly difficult to separate with distillation will never provide the most economical design and operation even if the maximum number of trays would be much higher. An interface between GAMS (GAMS Development Corporation) and MATLAB (The Mathworks, version R2015a) was built utilizing GDXMRW procedures [34] The CONOPT solver implemented in GAMS was used to solve all NLP problems [35], for which multiple initial guesses were generated randomly from MATLAB. In addition, MATLAB was also used to execute the genetic algorithm. The population size was set as 100 with total generation of 40. The probability for crossover was 0.2, while the probability for mutation was 0.3.

Results
The genetic algorithm belongs to the family of stochastic algorithms, which cannot guarantee the convergence to the global optimum. Thus, we performed the calculation several times in order to achieve the near global optimal solution. The evolution of the fitness values from a typical run is illustrated by Figure 5. The red dotted curve and the blue curve show the best and the average fitness values for each generation, respectively. It can be seen that the convergence is very fast at the beginning with steadily increasing best and average values, which, however, begin to level off after about 10 generations. In order to avoid convergence to local optima, a disturbance was provided in the 18th and 19th generation by increasing the evolution operation probabilities to one. As a result, both the best and the average fitness values decreased substantially. The introduction of the disturbance increases the diversity of the population and may help to converge to a better fitness value. In the example provided in Figure 5, the best fitness value still remained the same after the second convergence. The maximum identified objective function value for P1 was 0.138, which was tested by several runs and corresponds to a distillation column design with one partial reboiler, one feed stage, and one total condenser ( Figure 6). This design enables a sharp separation between the optimal solvent and anti-solvent pair, while the cost for additional trays apparently exceeds the merits on improving the separation efficiency. The best solvent and anti-solvent were DMSO and t-butyl acetate, respectively, whose PC-SAFT parameters are reported elsewhere [21]. From the composition of the crystallizer inlet, it is clear that DMSO has a high solubility for paracetamol. The mole fraction of paracetamol was reduced by two orders of magnitude after mixing with the make-up and recycle anti-solvent streams, which enabled a high crystal product yield. Furthermore, the optimal solvent and anti-solvent pair did not only allow for a high crystallization yield, but also for a sharp separation in a compact distillation column such that the recycle stream contained 97% anti-solvent. Most of the anti-solvent can be recycled, and dilution from recycling feed solvent is limited. Consequently, only a small amount of make-up anti-solvent needs to be provided and crystallization yield is high. The maximum identified objective function value for P1 was 0.138, which was tested by several runs and corresponds to a distillation column design with one partial reboiler, one feed stage, and one total condenser ( Figure 6). This design enables a sharp separation between the optimal solvent and anti-solvent pair, while the cost for additional trays apparently exceeds the merits on improving the separation efficiency. The best solvent and anti-solvent were DMSO and t-butyl acetate, respectively, whose PC-SAFT parameters are reported elsewhere [21]. From the composition of the crystallizer inlet, it is clear that DMSO has a high solubility for paracetamol. The mole fraction of paracetamol was reduced by two orders of magnitude after mixing with the make-up and recycle anti-solvent streams, which enabled a high crystal product yield. Furthermore, the optimal solvent and anti-solvent pair did not only allow for a high crystallization yield, but also for a sharp separation in a compact distillation column such that the recycle stream contained 97% anti-solvent. Most of the anti-solvent can be recycled, and dilution from recycling feed solvent is limited. Consequently, only a small amount of make-up anti-solvent needs to be provided and crystallization yield is high. paracetamol was reduced by two orders of magnitude after mixing with the make-up and recycle anti-solvent streams, which enabled a high crystal product yield. Furthermore, the optimal solvent and anti-solvent pair did not only allow for a high crystallization yield, but also for a sharp separation in a compact distillation column such that the recycle stream contained 97% anti-solvent. Most of the anti-solvent can be recycled, and dilution from recycling feed solvent is limited. Consequently, only a small amount of make-up anti-solvent needs to be provided and crystallization yield is high.  The economics of the case study lead to a high throughput of the distillation column with high energy input at optimal conditions. In particular, the distillate flow rate is more than six times larger than the API feed flow rate, and none of the condensed vapor is sent back to the column as reflux. This large recycle flow rate enables a substantial drop in solubility, thus increasing the crystal yield, and the absence of reflux helps to decrease the operating load of the column, and the energy costs are thus reduced. This type of operation for the distillation column is possible because the identified optimal solvent and anti-solvent pair have a high relative volatility, which stresses the importance of optimizing the solvent types and process simultaneously. In conclusion, the simultaneous optimization of solvent types and process conditions demonstrates that for the adopted economic parameters, a solvent and anti-solvent pair can be found for which the recycling of anti-solvent is economically favorable, which is the result of low utility costs compared with raw material costs.

Conclusions
A hybrid stochastic-deterministic optimization framework was developed for the simultaneous solvent selection and process optimization of a continuous anti-solvent crystallization process with solvent recycling through multi-stage distillation. The PC-SAFT model was applied for the prediction of various thermodynamic and caloric properties for phase equilibria and energy balance calculations. The optimization framework had two nested layers. In the inner layer, the continuous mapping method was applied for solvent optimization within a given process structure and fixed operating conditions, so that the original MINLP problem can be converted into an NLP problem and solved deterministically. In the outer layer, a genetic algorithm is applied for the optimization of the process structure and operating conditions, which provides a promising alternative solution to the deterministic algorithms for such process optimization involving complex process models and highly non-linear and non-convex thermodynamic equations.
The proposed optimization method allows for the use of an economic objective function, which reflects revenues from a product stream and costs associated with the use of raw materials, waste, energy, and equipment. In view of the large number of solvents that are routinely used in the pharmaceutical industry and the combinatorial nature of solvent selection for mixtures, the proposed method may also support faster process development by identifying candidate solvent pairs and processes during early process development stages.