Modifier Adaptation with Quadratic Approximation with Distributed Estimations of the Modifiers Applied to the MDI-Production Process

Ehlhardt, Jens; Wolf, Inga; Engell, Sebastian

doi:10.3390/pr13103140


Modifier Adaptation with Quadratic Approximation with Distributed Estimations of the Modifiers Applied to the MDI-Production Process^†

by Jens Ehlhardt ¹,*, Inga Wolf ² and Sebastian Engell ¹

¹ Process Dynamics and Operations Group, TU Dortmund, Emil-Figge-Straße 70, 42277 Dortmund, Germany

² Covestro Deutschland AG, Kaiser-Wilhelm-Allee 60, 51373 Leverkusen, Germany

* Author to whom correspondence should be addressed.

† This paper is an extended version of the “Ehlhardt, J.; Wolf, I.; Engell, S. Real-time optimization with machine learning models and distributed modifier adaptation applied to the MDI-process.” In Proceedings of the 34th European Symposium on Computer Aided Process Engineering and the 15th International Symposium on Process Systems Engineering (ESCAPE34-PSE24), Florence, Italy, 2–6 June 2024.

Processes 2025, 13(10), 3140; https://doi.org/10.3390/pr13103140

Submission received: 31 August 2025 / Revised: 23 September 2025 / Accepted: 23 September 2025 / Published: 30 September 2025

(This article belongs to the Special Issue Innovative Approaches to Modeling, Optimization, Control, and Monitoring in Industrial Processes)


Abstract

The energy- and resource-efficient operation of continuously operated large-scale chemical plants is an important factor in the transition towards a sustainable and green process industry. In this work, the operation of the heat exchangers in the diphenylmethane diisocyanate (MDI) process is optimized to reduce fouling and thereby increase their energy efficiency. Real-time optimization (RTO) using Modifier Adaptation With Quadratic Approximation (MAWQA) is applied to cope with plant–model mismatch. It is combined with distributed estimation of the modifiers while retaining a centralized optimization to ensure rapid convergence. The distributed estimation reduces the number of data points needed to compute the modifiers and enables application to large-scale processes. The plant model that is used in the optimization is a surrogate of an available detailed flow-sheet simulator model. The algorithm is demonstrated first for a small problem and then applied to the operator training simulator (OTS) of the MDI process in several operation scenarios. Compared to previous work, the algorithm converges to the optimal operating conditions in fewer iterations.

Keywords:

real-time optimization; Modifier Adaptation; Modifier Adaptation with quadratic approximations; isocyanate production; surrogate models; distributed estimation of modifiers

1. Introduction

Today, the chemical industry faces a huge challenge: the transition to reduced CO₂ footprints and enhanced sustainability while staying competitive. Optimizing the operation of plants and processes that cannot be directly replaced by sustainable alternatives is an important factor in reducing the industry’s emissions. One example of a large-scale, energy-intensive process is the production of polyurethanes and their precursors, which are used, e.g., in the production of foams. These foams are used in applications ranging from shoes and mattresses to refrigerators, building insulation, and cars [1].

The most widely used precursor of polyurethanes is diphenylmethane diisocyanate (MDI). In the current production process, large heat inputs are needed, which are mainly provided in the form of steam and transferred to the process via heat exchangers. Fouling in the heat exchangers reduces their efficiency and eventually makes cleaning and plant shutdowns necessary. The rate of fouling depends on the operating conditions of the process. Consequently, there is a strong interest in optimizing the operation of the heat exchangers to reduce fouling while maintaining high productivity levels [2].

This task is within the scope of real-time optimization (RTO), where numerical optimization is typically employed at relatively long sampling times to optimize the stationary operating conditions of, e.g., a chemical plant. For RTO, a mathematical model of the process is needed that accurately describes the plant and can be coupled with modern optimization solvers [3]. For many large-scale production processes, detailed steady-state simulation models are available in dedicated software, so-called flow-sheet simulators. These are either commercial products or in-house tools of the producing companies. Such simulators are designed for interactive use by process designers, and the emphasis is on very detailed and accurate descriptions of the behaviour of the processes, at the expense of computational efficiency. Moreover, convergence problems may occur, and often there is no option to couple them directly to real-time data acquisition and process control systems. In [4], it was proposed to employ artificial neural networks (ANNs) as surrogate models that approximate the detailed flow-sheet simulation of complex processes and to use these surrogate models in an RTO scheme. Thus, the flow-sheet simulator acts as an offline data source or digital twin for the identification of the ANN models, which are then employed in the online application of RTO. The combination of surrogate modeling and digital twins enables the fast execution of the underlying models as well as the integration of measured data or only partly understood phenomena into optimization [5]. The idea is to use surrogate models of flow-sheet simulators for large-scale optimization [6].

Generally, RTO schemes need to handle the inevitable mismatch between the optimization model and the real behaviour of the plant. Due to this discrepancy, applying the optimal set-points that were computed for the plant model may not lead to an optimal operation of the real plant and can result in sub-optimal performance or even infeasible operating conditions [7]. Several approaches have been proposed to address the issue of plant–model mismatch. The best known and most widely used is the two-step approach (see, e.g., [8]), where, at each sampling time or at longer intervals, some parameters of the plant model are updated based on the available measurements and the optimization is performed using the updated model. The number of parameters that can be estimated depends on the available information. The two-step approach is tailored to the situation where the model is structurally correct so that it can represent the true plant behaviour for a suitable set of parameters. However, strictly speaking, this is never true, and in practice, there may be significant unmodelled effects and phenomena. In this case, parameter updates may even lead to worse results than using a nominal model [9] and, in general, the true optimum may not be achieved. Nonetheless, the two-step approach is the standard RTO methodology in industry, especially in the petroleum industry [10]. Often, it is applied in combination with (linear) model predictive control: the economics are optimized via RTO and constraint satisfaction is enforced by a model predictive controller [11].

In Modifier Adaptation (MA), instead of adapting model parameters, the optimization problem itself is updated. Zeroth-order (bias) and first-order (gradient) correction terms, also called modifiers, are added to the cost and constraint functions to match the Karush-Kuhn-Tucker (KKT) conditions and to reach the true plant optimum upon convergence. This was first proposed by Tatjewski [12] and extended to problems with constraints in [13]. While bias correction is well-known and easy to implement, the estimation of the gradients of the cost function and of the constraints of the real plant is the most challenging element of the MA approach. They can be approximated by finite difference (FD) methods, as discussed in [13,14], which make it possible to use available measurements rather than perturbing the plant at each iteration. However, using finite differences leads to a high sensitivity to measurement noise or disturbances for small step sizes and to inaccurate approximations for larger step sizes. To cope with this problem, a new technique called Modifier Adaptation With Quadratic Approximations (MAWQA) was proposed in [15], where the cost and constraint functions of the plant are approximated by quadratic functions. The required gradients are then calculated based on the quadratic approximations. Due to the smoothing effect of the least-squares minimization in the identification of the approximations, MAWQA is more robust to noise. MAWQA was inspired by “derivative-free optimization” as proposed in [16] and incorporates techniques for data-point selection [17], the use of trust regions, and a model quality check. MAWQA has been successfully applied to lab- and pilot-scale plants, as shown in [18,19,20,21]. A combination of MAWQA and Effective Model Adaptation (EMA) was successfully demonstrated at a lab-scale membrane plant. In EMA, the model parameters are updated while enforcing the adequacy of the model for the optimization [22].

A drawback of MAWQA is that the minimum number of data points that are needed to compute the quadratic approximations is a quadratic function of the number of inputs. This limits its application to large-scale plants, where the efficient quadratic approximations can only be used for the first time after a relatively large number of probing steps and iterations have been performed, e.g., 21 plant evaluations for five inputs that are optimized. This problem can be addressed by combining Modifier Adaptation with distributed optimization. The goal of the application of decomposition techniques is to split the optimization problem into a set of smaller problems, each with fewer inputs. In [23], Modifier Adaptation was investigated in the context of distributed optimization. Three different algorithms were presented, which exchange different amounts of information between the subsystems. In all three algorithms, the modifiers are identified globally with respect to all decision variables. The focus of the study was on assuring privacy between the different subsystem operators (e.g., different plants in a chemical site), which is paid for by reducing the speed of convergence. In [24], a similar approach was successfully applied to gas compressor stations both in simulations and lab-scale settings. The modifiers are estimated locally on a subsystem-level and finite differences are used for the gradient estimation. In [25], the same approach was applied to output Modifier Adaptation (MAy). In MAy, the model outputs rather than the objective and constraint functions are adapted. Consequently, the modifiers of the sub-models can be computed individually and the modified outputs are then fed into the global constraint and objective functions.

Another approach was presented in [26]. Here, distributed feedback optimizing control is used to apply RTO to a gas-lifted oil well rig in simulation. The RTO problem is stated as a cascaded control structure that resembles the dual descent method. In the inner loop, the gradients of the Lagrangian are controlled to be zero, while in the slower outer loop, the Lagrange multipliers are updated to satisfy the constraints. However, the poor scalability in combination with the complex control structure design for larger systems limits its application [27]. In [28], the approach was applied to a lab-scale gas-lifted oil rig.

Ref. [29] proposed using Gaussian processes in Modifier Adaptation. This was demonstrated in the optimization of wind farms [30] and for a lab-scale gas-lifted oil-rig [31].

Ref. [32] proposed another method to reduce the number of gradients that are estimated with the goal of enabling the application of MA to larger systems. The plant gradients are only computed with respect to a few privileged directions in the input space, which are identified based on the sensitivity of the Lagrangian with respect to some uncertain parameters. The scheme was successfully applied to optimize the trajectory of an airborne energy system. However, it has not yet been investigated how this could be generalized to structural plant–model mismatch. Despite all these advances, Modifier Adaptation still lacks acceptance and application in industry [10].

In this work, MAWQA is combined with distributed estimation of the required modifiers to enable its application to large-scale systems. The problem is decomposed by introducing coupling variables while retaining a centralized optimization to ensure rapid convergence. Moreover, the centralized optimization avoids the convergence problems of distributed optimization algorithms (see, e.g., [33]). In this way, decomposition is performed only for the identification of the local quadratic approximations for the estimation of local modifiers. In addition, the decision between solving the modified optimization problem and an optimization problem based on the quadratic approximation, the trust-region constraints, and potentially required additional perturbations are handled on the subsystem level. The distribution of the modifier estimation problem thereby exploits the structure of the plant at hand. Hence, the required number of perturbations can be reduced and the speed of convergence to the true optimum of the overall plant can be improved. If the decomposition is conducted with respect to subunits of a plant, there is usually no need to respect limitations on the sharing of data; hence, an integrated optimization is feasible. The method presented in this paper has been applied successfully to the operator training simulator (OTS) of the MDI-production process of the MDI producer Covestro, which provides an accurate digital twin of the real plant. The model of the OTS is structurally different from the model used in the flow-sheet simulator and shows a different behaviour. In comparison to previous work [4], faster convergence is achieved. The distributed MAWQA method was first proposed in [34]. In this paper, we extend this work and provide a detailed analysis for a small case study as well as for the MDI process.

The remainder of the paper is structured as follows: Section 2 describes the concepts of Modifier Adaptation and MAWQA, as well as the distributed estimation of modifiers. In addition, the distributed scheme is applied to a small case study. Section 3 provides an overview of the investigated MDI process. Section 4 and Section 5 present the application of the distributed MAWQA scheme to the MDI process and the results obtained. Finally, Section 6 provides a conclusion to the paper and directions for further research.

2. Real-Time Optimization Using Modifier Adaptation

2.1. Real-Time Optimization

The goal of real-time optimization is to compute the optimal steady-state operating conditions of a plant or process. Mathematically, the objective is to find the solution of the optimization problem (1) and (2).

\begin{matrix} \underset{u}{\min} & J_{P} (u) \end{matrix}

(1)

\begin{matrix} \text{s.t.} & G_{P} (u) \leq 0 \end{matrix}

(2)

Throughout this work, vectors and matrices are written in bold. J_{P} is the cost function of the plant, u is a vector of inputs that can be influenced, and G_{P} represents a set of constraint functions. The plant model is contained in G_{P} as an equality constraint. However, the true plant cost function and constraints, indicated by the subscript P, are generally not known due to plant–model mismatch and unknown influences. Therefore, (1) and (2) are replaced by the model-based optimization problem

\begin{matrix} \underset{u}{\min} & J_{M} (u) \end{matrix}

(3)

\begin{matrix} \text{s.t.} & G_{M} (u) \leq 0 . \end{matrix}

(4)

In (3) and (4), the assumed cost and constraint functions (including the nominal model) are indicated by the subscript M. Due to plant–model mismatch and unknown influences, the optimal solution of the optimization problem for the plant, u_{P}^{*}, and the optimal solution of the model, u_{M}^{*}, do not match,

\begin{matrix} u_{M}^{*} \neq u_{P}^{*}, \end{matrix}

(5)

and the constraints of the true plant may not be met at u_{M}^{*}.

2.2. Modifier Adaptation with Quadratic Approximation

In Modifier Adaptation (MA) or iterative gradient correction, the model-based optimization problem (3) and (4) is extended by affine correction terms in the objective and constraint functions to match the conditions of optimality of the real plant [13]:

\begin{matrix} \underset{u^{k + 1}}{\min} J_{ad}^{k} (u^{k + 1}) & := J_{M} (u^{k + 1}) + λ_{J}^{k} (u^{k + 1} - u^{k}) \end{matrix}

(6)

\begin{matrix} \text{s.t.} & G_{ad}^{k} (u^{k + 1}) := G_{M} (u^{k + 1}) + λ_{G}^{k} (u^{k + 1} - u^{k}) + ϵ_{G}^{k} \leq 0 \end{matrix}

(7)

\begin{matrix} u^{lb} \leq u^{k + 1} \leq u^{ub} . \end{matrix}

(8)

In every iteration k, the adapted optimization problem (6)–(8) is solved to find the next vector of operating conditions u^{k + 1}. u^{lb} and u^{ub} are the lower and upper bounds of the degrees of freedom. The affine correction terms, the so-called modifiers, are added to correct the gradient of the cost function as well as the values and the gradients of the constraint functions [15]. The bias modifier ϵ_{G} and the gradient modifiers λ_{G} and λ_{J} at iteration k are defined as

\begin{matrix} ϵ_{G}^{k} & = G_{P}^{k} - G_{M}^{k} \end{matrix}

(9)

\begin{matrix} λ_{J}^{k} & = \nabla_{u} J_{P}^{k} - \nabla_{u} J_{M}^{k} \end{matrix}

(10)

\begin{matrix} λ_{G}^{k} & = \nabla_{u} G_{P}^{k} - \nabla_{u} G_{M}^{k} . \end{matrix}

(11)

\nabla_{u} denotes the gradients of the cost or constraint functions with respect to the inputs. As the plant functions are not known, the gradients need to be determined based on measured data. Often, finite differences are used [14]. However, the computation via FD is sensitive to noise and the selection of an appropriate step size is challenging. Gao et al. [15] proposed identifying the process gradients based on quadratic approximations of the cost and constraint functions and called the resulting scheme Modifier Adaptation with Quadratic Approximation (MAWQA). The quadratic approximations are of the form

\begin{matrix} J_{Φ} (u, θ) & = \sum_{i = 1}^{N_{u}} \sum_{j = 1}^{i} a_{i, j} u_{i} u_{j} + \sum_{i = 1}^{N_{u}} b_{i} u_{i} + c \end{matrix}

(12)

\begin{matrix} θ & = {[a_{1, 1}, \dots, a_{N_{u}, N_{u}}, b_{1}, \dots, b_{N_{u}}, c]}^{T} . \end{matrix}

(13)

In (12) and (13), N_{u} is the number of inputs and the parameters of the quadratic functions a_{i, j}, b_{i}, and c are subsumed in the parameter vector θ. Φ indicates which function is approximated. Quadratic functions for the constraints are formed analogously. The gradients of the cost and constraint functions can then be robustly approximated according to

\begin{matrix} \nabla_{u} J_{P}^{k} & \approx \nabla_{u} J_{Φ}^{k} \end{matrix}

(14)

and

\begin{matrix} \nabla_{u} G_{P}^{k} & \approx \nabla_{u} G_{Φ}^{k} . \end{matrix}

(15)
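To make (12)–(15) concrete, the following minimal sketch fits the parameter vector θ of a quadratic function by ordinary least squares and evaluates its analytic gradient. It is only an illustration under stated assumptions: the function names, the test function `plant_cost`, and the randomly sampled data points are hypothetical, and neither the data-point selection of [17] nor the quality checks of MAWQA are included.

```python
import numpy as np

def quad_features(u):
    """Monomials of (12): u_i*u_j for j <= i, then u_i, then the constant 1."""
    u = np.asarray(u, dtype=float)
    n = u.size
    quad = [u[i] * u[j] for i in range(n) for j in range(i + 1)]
    return np.array(quad + list(u) + [1.0])

def fit_quadratic(U, y):
    """Least-squares estimate of theta in (13) from inputs U (one row per point)."""
    Phi = np.array([quad_features(u) for u in U])
    theta, *_ = np.linalg.lstsq(Phi, np.asarray(y, dtype=float), rcond=None)
    return theta

def quad_gradient(theta, u):
    """Analytic gradient of the fitted quadratic at u, used as in (14)."""
    u = np.asarray(u, dtype=float)
    n = u.size
    grad = np.zeros(n)
    idx = 0
    for i in range(n):
        for j in range(i + 1):
            a = theta[idx]
            idx += 1
            # derivative of a * u_i * u_j (gives 2*a*u_i when i == j)
            grad[i] += a * u[j]
            grad[j] += a * u[i]
    grad += theta[idx:idx + n]  # linear coefficients b_i
    return grad

def plant_cost(u):
    # Hypothetical stand-in for measured plant cost values (assumption).
    return (u[0] - 1.0) ** 2 + 2.0 * (u[1] + 0.5) ** 2 + u[0] * u[1]

rng = np.random.default_rng(0)
U = rng.uniform(-1.0, 1.0, size=(8, 2))  # 8 points >= (2+1)(2+2)/2 = 6
y = np.array([plant_cost(u) for u in U])
theta = fit_quadratic(U, y)
grad_est = quad_gradient(theta, [0.5, 0.2])
print(grad_est)
```

Because the hypothetical test function is itself quadratic and no noise is added, the estimate should agree with the exact gradient, approximately [-0.8, 3.3] at u = [0.5, 0.2]. With noisy data, using more points than the minimum lets the least-squares fit average out part of the noise.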

The MAWQA workflow is depicted in Figure 1.

In every iteration, it is checked whether enough and sufficiently rich data are available for the quadratic approximations or for gradient computation by finite differences. If neither criterion is fulfilled, the current set-points are perturbed and the plant is probed. If enough data are available for computing finite differences, the gradients are computed by a local linear approximation of the cost or constraint functions; the modifiers are calculated and the modified optimization problem (6)–(8) is solved to obtain the next set-points. If enough data points are available for computing quadratic approximations, a regression set is formed and the quadratic approximations are identified. To ensure a regression set that is well-posed for the identification of the quadratic functions, a subset of all available data points is chosen by an algorithm as presented in [17]. The gradients are then computed analytically based on the quadratic approximations. Depending on quality measures, see (A5) and (A6), either the modified problem (6)–(8) or an optimization problem based on the quadratic approximations is solved. In both cases, a trust-region constraint is added to the optimization problem. The new set-points are applied and the algorithm is repeated as soon as a new steady state has been reached. For more information, the interested reader is referred to Appendix A and [15].
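The core iteration behind this workflow, i.e., computing the gradient modifier (10) and solving the adapted problem (6) and (8), can be sketched for a scalar input as follows. This is a simplified illustration and not the MAWQA algorithm itself: the plant and model cost functions are hypothetical, the plant derivative is obtained by finite differences on a noise-free function, and the trust region, the quadratic approximations, and the quality checks are omitted.

```python
def plant_cost(u):
    # Hypothetical "true" plant cost, unknown to the optimizer (assumption).
    return (u - 3.0) ** 2

def model_cost(u):
    # Hypothetical mismatched model cost (assumption).
    return 2.0 * (u - 2.0) ** 2

def derivative(f, u, h=1e-5):
    # Central finite difference; stands in for the plant gradient estimate.
    return (f(u + h) - f(u - h)) / (2.0 * h)

def minimize_1d(f, lb, ub, tol=1e-10):
    # Golden-section search for a unimodal function on [lb, ub].
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lb, ub
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def modifier_adaptation(u0, lb=0.0, ub=10.0, iterations=30):
    u_k = u0
    history = [u_k]
    for _ in range(iterations):
        # Gradient modifier, Equation (10)
        lam = derivative(plant_cost, u_k) - derivative(model_cost, u_k)
        # Adapted cost, Equation (6), minimized within the bounds (8)
        adapted = lambda u, lam=lam, u_k=u_k: model_cost(u) + lam * (u - u_k)
        u_k = minimize_1d(adapted, lb, ub)
        history.append(u_k)
    return history

history = modifier_adaptation(u0=2.0)
print(history[0], history[1], history[-1])
```

Starting from the model optimum u = 2, the iterates of this example approach the plant optimum u = 3, whereas optimizing the unmodified model would remain at u = 2. In this particular example the model has a larger curvature than the plant, which makes the plain iteration contractive; in general, filtering or a trust region may be needed.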

2.3. Distributed Modifier Adaptation with Quadratic Approximation

A bottleneck of the application of Modifier Adaptation schemes, and especially of MAWQA, to plants with a significant number of inputs is the number of perturbations required for gradient estimation. This number grows linearly with the number of inputs for finite differences and quadratically for quadratic approximations. A comparison is shown in Figure 2. Even for a medium number of inputs (≥4), MAWQA is difficult to apply in practice due to the large number of perturbations required. Therefore, we propose exploiting the decomposable structure of the plants to extend the application of MAWQA to large-scale plants. Note that we do not necessarily decompose the solution of the optimization problem, only the estimation of the modifiers, as we assume the real-time optimization of a complex plant under single ownership and that the problem can be solved within the available computation time. This ensures fast convergence and avoids the convergence problems of fully distributed algorithms.

Many processes and plants consist of a network of different unit operations, which, however, need to be optimized simultaneously; examples include chemical production sites, heat exchanger networks, and wind farms. We assume that the problem at hand can be represented by the general formulation of a decomposable optimization problem with an additive cost function and local as well as plant-wide constraints as given in (16)–(18):

\begin{matrix} \underset{u_{1}, \dots, u_{N}}{\min} & J_{tot} = \sum_{i = 1}^{N} J_{i} (u_{i}) \end{matrix}

(16)

\begin{matrix} \text{s.t.} & \sum_{i = 1}^{N} G_{ov, i} (u_{i}) - G_{ov}^{tot} \leq 0 \end{matrix}

(17)

\begin{matrix} G_{loc, i} (u_{i}) \leq 0, \quad i = 1, \dots, N . \end{matrix}

(18)

The overall cost function J_{tot} is assumed to be the summation of the cost functions J_{i} of the N subsystems, which only depend on the individual inputs u_{i}. These may, e.g., be the operating cost or the energy consumption of the different units. In addition, the subsystems possess local constraints G_{loc, i} and are coupled via Equation (17), also called the complicating constraint. These constraints can, for example, express shared resources (steam, electricity, etc.) [33]. Auxiliary variables (19) can be introduced to decompose optimization problems that are initially not decomposable, as follows:

\begin{matrix} v_{i} = h (u_{1}, \dots, u_{j}, \dots, u_{N}), \quad i \neq j \end{matrix}

(19)

v_{i} are coupling or interconnection variables and h(•) is assumed to be a known, differentiable function. Such variables may, e.g., represent streams between process units. The connection variables are assumed to be fully defined based on the inputs of the other subsystems. Combining the standard MA equations (6)–(8), the coupling variables (19), and the distributed optimization problem (16)–(18), a Modifier Adaptation problem can be formulated. The decomposable structure can then be exploited to identify the required modifiers on the subsystem level.

\begin{matrix} \underset{u^{k + 1}, v^{k + 1}}{\min} & J_{ov, ad}^{k} = \sum_{i = 1}^{N} J_{ov, ad, i}^{k} (u_{i}^{k + 1}, v_{i}^{k + 1}) \end{matrix}

(20)

\begin{matrix} \text{s.t.} & G_{ov, ad}^{k} = \sum_{i = 1}^{N} G_{ov, ad, i}^{k} (u_{i}^{k + 1}, v_{i}^{k + 1}) - G_{ov}^{tot} \leq 0 \end{matrix}

(21)

\begin{matrix} v_{i}^{k + 1} = h (u_{1}^{k + 1}, \dots, u_{j}^{k + 1}, \dots, u_{N}^{k + 1}), \quad i = 1, \dots, N, \; j \neq i \end{matrix}

(22)

\begin{matrix} G_{loc, ad, i}^{k} (u_{i}^{k + 1}, v_{i}^{k + 1}) \leq 0, \quad i = 1, \dots, N \end{matrix}

(23)

\begin{matrix} u_{i}^{lb} \leq u_{i}^{k + 1} \leq u_{i}^{ub}, \quad i = 1, \dots, N \end{matrix}

(24)

The separable overall adapted cost function J_{ov, ad} is the summation of the adapted cost functions J_{ov, ad, i} of the N individual sub-models. Analogously, the adapted complicating constraint functions G_{ov, ad} are the sum of the adapted contributions G_{ov, ad, i} of the sub-models. u_{i} and v_{i} are the set-points and coupling variables of the i-th subsystem. G_{loc, ad, i} are local constraints and u_{i}^{lb} and u_{i}^{ub} represent the bounds of the inputs of the i-th subsystem.

\begin{matrix} J_{ov, ad, i}^{k} & := J_{M, ov, i} (u_{i}^{k + 1}, v_{i}^{k + 1}) + λ_{J, ov, i}^{k} (z_{i}^{k + 1} - z_{i}^{k}) \end{matrix}

(25)

\begin{matrix} G_{ov, ad, i}^{k} & := G_{M, ov, i} (u_{i}^{k + 1}, v_{i}^{k + 1}) + λ_{G, ov, i}^{k} (z_{i}^{k + 1} - z_{i}^{k}) + ϵ_{G, ov, i}^{k} \end{matrix}

(26)

\begin{matrix} G_{loc, ad, i}^{k} & := G_{M, loc, i} (u_{i}^{k + 1}, v_{i}^{k + 1}) + λ_{G, loc, i}^{k} (z_{i}^{k + 1} - z_{i}^{k}) + ϵ_{G, loc, i}^{k} \end{matrix}

(27)

Similar to (6) and (7), the adapted cost and constraint functions are summations of the model functions and affine correction terms. In the distributed case, the modifiers are defined in terms of the variables z_{i} instead of the original full input vector u, according to

\begin{matrix} z_{i} & = {[u_{i}^{T}, v_{i}^{T}]}^{T} \end{matrix}

(28)

\begin{matrix} λ_{J, ov, i}^{k} & = \nabla_{z_{i}} J_{P, ov, i}^{k} - \nabla_{z_{i}} J_{M, ov, i}^{k} \end{matrix}

(29)

\begin{matrix} λ_{G, ov, i}^{k} & = \nabla_{z_{i}} G_{P, ov, i}^{k} - \nabla_{z_{i}} G_{M, ov, i}^{k} \end{matrix}

(30)

\begin{matrix} λ_{G, loc, i}^{k} & = \nabla_{z_{i}} G_{P, loc, i}^{k} - \nabla_{z_{i}} G_{M, loc, i}^{k} \end{matrix}

(31)

\begin{matrix} ϵ_{G, ov, i}^{k} & = G_{P, ov, i}^{k} - G_{M, ov, i}^{k} \end{matrix}

(32)

\begin{matrix} ϵ_{G, loc, i}^{k} & = G_{P, loc, i}^{k} - G_{M, loc, i}^{k} . \end{matrix}

(33)

z_{i} represents the set-points and the coupling variables of the individual subsystems. The maximum dimension of z_{i} is often smaller than the dimension of the overall input vector u, see Equation (34). The gradients \nabla_{z_{i}} are calculated with respect to the inputs z_{i} and the quadratic functions (12) are based on the variables z_{i} instead of u. If

\begin{matrix} N_{z, max} = \max (\dim (z_{1}), \dots, \dim (z_{N})) < \dim (u), \end{matrix}

(34)

fewer data points are needed for the calculation of the gradients related to all subsystems and, thus, for the identification of the quadratic functions. A limitation here is that it must be possible to compute the coupling variables from the applied inputs, or they must be measurable and exhibit sufficient variation.
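As an illustration of (28) and (29), the sketch below evaluates a local gradient modifier for a single subsystem with one set-point and one coupling variable. The subsystem cost functions, the operating point, and the use of central finite differences on noise-free functions are assumptions made for this example only; in the scheme described here, the plant gradients would be estimated from measured data.

```python
import numpy as np

def gradient(f, z, h=1e-6):
    """Central finite-difference gradient of f at z."""
    z = np.asarray(z, dtype=float)
    g = np.zeros_like(z)
    for n in range(z.size):
        step = np.zeros_like(z)
        step[n] = h
        g[n] = (f(z + step) - f(z - step)) / (2.0 * h)
    return g

# Hypothetical subsystem i with z_i = [u_i, v_i] (assumption, not from the paper).
def plant_cost_i(z):
    u_i, v_i = z
    return 3.1 * (u_i - 5.0 - v_i) ** 2

def model_cost_i(z):
    u_i, v_i = z
    return 3.1 * (u_i - 4.0 - v_i) ** 2

z_k = np.array([7.0, 3.0])  # current local variables, Equation (28)

# Local gradient modifier with respect to z_i, Equation (29)
lambda_J_i = gradient(plant_cost_i, z_k) - gradient(model_cost_i, z_k)
print(lambda_J_i)
```

The modifier has only dim(z_i) = 2 entries regardless of how many inputs the overall plant has, which is why the number of data points needed per subsystem is governed by N_{z, max} rather than by dim(u).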

The flow-chart of the resulting algorithm is shown in Figure 3. First, N_{z, max} + 1 initial perturbations are applied to the plant. Enough data points are then available to estimate the gradients via the method presented in [13]. The next steps are carried out for each sub-model individually. Gradients and/or quadratic functions are calculated and the decision between the adapted optimization problem and the problem based on the quadratic function (A2)–(A4) is taken for each sub-model. Afterwards, all local modified cost and constraint functions are assembled to form the overall optimization problem. Problem (20)–(24) is solved and the new input is applied to the plant. Convergence is checked and the algorithm is either terminated or repeated. The optimization problem (20)–(24) can be solved in a centralized fashion or with distributed approaches, e.g., the alternating direction method of multipliers (ADMM) [35].

2.4. Demonstration for a Small Example

In this section, the methodology presented in Section 2.3 is applied to a small system for demonstration purposes. In the first scenario, the number of subunits N is fixed to 3 to demonstrate the convergence of the approach. In the second scenario, the number of subunits is increased to showcase the advantage of MAWQA with distributed estimation of the modifiers (disMAWQA) when applied to larger systems. The structure of the plant is displayed in Figure 4. In the first stage, units are operated in parallel with their associated inputs u_{1} to u_{N - 1}. The outgoing streams are mixed and fed into unit N. The mixing can be described by a mixing rule according to

\begin{matrix} v = \frac{1}{N - 1} \sum_{i = 1}^{N - 1} u_{i} . \end{matrix}

(35)

The optimization problem is given in (36)–(38).

\begin{matrix} \underset{u_{1}, \dots, u_{N}, v}{\min} & \sum_{i = 1}^{N} J_{i} \end{matrix}

(36)

\begin{matrix} \text{s.t.} & u_{i}^{lb} \leq u_{i} \leq u_{i}^{ub}, \quad i \in \{1, 2, \dots, N\} \end{matrix}

(37)

\begin{matrix} v = \frac{1}{N - 1} \sum_{i = 1}^{N - 1} u_{i} \end{matrix}

(38)

with

J_{i} = \begin{cases} a_{i} {(u_{i} - b_{i})}^{2}, & i \in \{1, 2, \dots, N - 1\} \\ a_{i} {(u_{i} - b_{i} - v)}^{2}, & i = N \end{cases}

(39)

a_{i} = \begin{cases} 2.4 + i \times \frac{0.7}{N - 1}, & i \in \{1, 2, \dots, N - 1\} \\ 3.1, & i = N \end{cases}

(40)

b_{i} = \begin{cases} 1.5 + i \times \frac{2}{N - 1}, & i \in \{1, 2, \dots, N - 1\} \\ 4, & i = N . \end{cases}

(41)

The overall cost function is the summation of the individual cost functions J_{1} to J_{N}. Equation (37) bounds the inputs and the interconnection variable v is added as a constraint (38). The parameters a_{i} and b_{i} are defined according to Equations (40) and (41). The plant model and the optimization model have the same structure but different values of b_{i}. In this example, no complicating or local constraints are present and no noise is added to the problem. The offset values of the parameters b_{i} are given by

b_{i} = \begin{cases} 1 + i \times \frac{2}{N - 1}, & i \in \{1, 2, \dots, N - 1\} \\ 5, & i = N . \end{cases}

(42)
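The effect of the parameter offset on the optimum can be checked with a short calculation. The sketch below evaluates the cost (36) with (35), (39), and (40) for both sets of b_{i} and computes the unconstrained minimizers. Several points are assumptions: the bounds (37) are taken to be inactive because their values are not stated here, the index i in (42) is taken to run from 1 to N − 1 as in (41), and the function names are hypothetical. No statement is made about which parameter set belongs to the plant and which to the model.

```python
def a_coeff(i, N):
    # Equation (40)
    return 2.4 + i * 0.7 / (N - 1) if i < N else 3.1

def b_set_41(i, N):
    # Equation (41)
    return 1.5 + i * 2.0 / (N - 1) if i < N else 4.0

def b_set_42(i, N):
    # Equation (42), index assumed to start at 1
    return 1.0 + i * 2.0 / (N - 1) if i < N else 5.0

def mixing(u_first_stage):
    # Mixing rule (35): mean of the first-stage inputs
    return sum(u_first_stage) / len(u_first_stage)

def total_cost(u, b, N):
    # Overall cost (36) with the unit costs (39); u[0] holds u_1, etc.
    v = mixing(u[:N - 1])
    cost = sum(a_coeff(i, N) * (u[i - 1] - b(i, N)) ** 2 for i in range(1, N))
    cost += a_coeff(N, N) * (u[N - 1] - b(N, N) - v) ** 2
    return cost

def unconstrained_optimum(b, N):
    # Every squared term can be driven to zero when the bounds are inactive.
    u = [b(i, N) for i in range(1, N)]
    u.append(b(N, N) + mixing(u))
    return u

N = 3
u_opt_41 = unconstrained_optimum(b_set_41, N)
u_opt_42 = unconstrained_optimum(b_set_42, N)
loss = total_cost(u_opt_41, b_set_42, N)
print(u_opt_41, u_opt_42, loss)
```

For N = 3, the two minimizers differ in every input, so operating at the optimum of one parameter set leaves a nonzero cost under the other. This is the kind of gap that the modifiers are meant to close.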

In the first scenario, the number of subunits is 3 and the total number of inputs is N_{u} = 3. Accordingly, four perturbations are needed for computing finite differences and (3 + 1) (3 + 2) / 2 = 10 perturbations for the quadratic approximations. With the distributed estimation of the modifiers, the number of perturbations can be reduced. The number of necessary perturbations and the necessary gradients are listed in Table 1. Quadratic approximations for all models are available after six perturbations.
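These counts can be reproduced with a few lines. The sketch below compares the minimum number of data points for a quadratic approximation over all inputs with the number needed when each subunit is fitted separately, assuming that units 1 to N − 1 depend only on their own input and that unit N depends on u_{N} and v, as in the example. The function names are hypothetical, and the count assumes that the same set of plant perturbations provides usable data for every subunit.

```python
def points_finite_differences(n_inputs):
    # Minimum number of data points for a finite-difference gradient
    return n_inputs + 1

def points_quadratic(n_inputs):
    # Number of parameters of a full quadratic function in n_inputs variables
    return (n_inputs + 1) * (n_inputs + 2) // 2

def example_subsystem_dims(N):
    # dim(z_i): units 1..N-1 see one input, unit N sees u_N and v
    return [1] * (N - 1) + [2]

def compare(N):
    centralized = points_quadratic(N)
    distributed = max(points_quadratic(d) for d in example_subsystem_dims(N))
    return centralized, distributed

for N in (3, 5, 11):
    print(N, points_finite_differences(N), compare(N))
```

For N = 3 this gives 10 points for the centralized and 6 for the distributed estimation, in line with the numbers above. The distributed count does not change as N grows, because the largest subunit always has two local variables.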

The resulting trajectories of the first scenario are shown in Figure 5. It can be seen that the distributed modifier estimation approach converges to the optimal values of the inputs in iteration 12 while the standard approach needs 15 iterations. The same settings were used for both approaches. All optimization problems were solved in Matlab (R2022a) [36] using the OPTI toolbox (v2.29) [37].

In the second scenario, the advantage of the distributed approach when applied to larger systems is investigated. The number of subunits is varied between 3 and 11 and the resulting trajectories are displayed in Figure 6.

Figure 6 displays, in the upper plot, the trajectories of the objective values of the standard MAWQA algorithm J_{S, N} and the disMAWQA algorithm J_{D, N} and, in the lower plot, the Euclidean distances to the optimal operating points for the standard MAWQA algorithm d_{S, N} and the disMAWQA algorithm d_{D, N}. N represents the number of subunits in the plant; each value of N is marked by a consistent line type. In both plots, the iteration number is displayed on the x-axis. As in the first case, the disMAWQA approach converges to the optimal operating point in iteration 12. For disMAWQA, this number stays constant even for larger systems. In contrast, the number of iterations needed for convergence increases drastically for the standard MAWQA algorithm. In the case of 11 subunits, more than 80 iterations are needed. Both approaches converge to the optimal operating points, as the Euclidean distance converges to zero in the lower subplot. With quadratic approximations, the number of perturbations increases quadratically with the number of inputs of the system. In the disMAWQA approach, the quadratic approximations are identified locally on a subunit level with only 1 or 2 inputs each. The standard MAWQA approach still relies on all inputs of the whole system. The advantage of disMAWQA results from the exploitable structure of the system and, accordingly, the lower number of perturbations needed for the gradient estimation.

3. Isocyanate Production Process

Here, we consider one process stage of the production of diphenylmethane diisocyanate (MDI), as described in [4]. The industrial production of MDI became important in the 1930s, after O. Bayer discovered the addition polymerization of difunctional isocyanates. Methylene diphenyl diisocyanates are important feedstocks for the production of polyurethanes, and MDI is the most widely used precursor of polyurethanes [1]. Polyurethanes are a main component in foam production, with end-user applications ranging from mattresses and footwear to refrigerators, insulation panels, and cars.

In the process stage that is considered here, methylenedianiline (MDA) reacts to MDI in several reactors. Thereafter, MDI is separated from other gaseous reaction products. During the reaction and the separation step, several heat exchangers transfer heat to the process that is provided in the form of steam [4]. The objective of this work is to improve the energy efficiency of the MDI process and reduce its downtime by optimally distributing the total heat required in this process stage among the available heat exchangers, so that fouling on the process side is minimized and cleaning operations are needed as infrequently as possible [2]. The speed of the fouling processes is directly correlated to the pressure and temperature of the steam. If the steam temperature in a heat exchanger is high, fouling processes are accelerated on the process side of that heat exchanger. In order to be able to transfer the same amount of heat for this state of increased fouling, the steam temperature has to be increased, which results in more accelerated fouling such that cleaning activities are soon unavoidable. Rather than applying the same amount of heat input to the heat exchangers subject to increased fouling, it is preferable to transfer part of the required heat via the heat exchangers subject to less fouling. This leads to the goal that the steam pressures of all heat exchangers remain close to their clean values. In this manner, not only is the overall steam demand reduced but the time until the next cleaning activities are necessary is also extended.

The process consists of two stages. In the reaction stage, MDA reacts to MDI in multiple reactors that are operated in parallel. The outlet streams of the reactors are mixed and further processed in the separation stage, where multiple units are operated in sequence. In each unit, heat is supplied via heat exchangers. Figure 7a shows the scheme of the MDI-plant. Each heat exchanger is equipped with a cascaded control structure, see Figure 7b, that consists of a temperature controller in the outer loop and a pressure controller in the inner loop.

3.1. Surrogate Modeling of the MDI Process

In order to implement a model-based RTO scheme, a mathematical model that represents the process under consideration is needed. A physics- and chemistry-based steady-state model of the complete MDI production process is available in Covestro’s in-house simulator. The model comprises about 160,000 equations and is an accurate representation of the process that is used for design studies. However, it cannot be directly coupled with an optimization algorithm and its execution time is too long for real-time applications. As proposed in [4], artificial neural networks (ANNs) can be used as a surrogate model of the flowsheet simulation. For the training of the ANN model, the simulator acts as a data generator.

In this case, several ANNs are combined to represent the full plant that is presented in Figure 7a. All quantities of interest, which are the heat duties or steam pressures in the heat exchangers, are modelled individually for each subsystem, as represented formally by Equations (43) and (44). The quantities $X$ in the reaction stage depend only on the temperature set-point $T_{I,i}$ and the load $\dot{m}_{I,i}$ of the corresponding $i$-th heat exchanger. For the separation stage, the independent variables are the corresponding temperature set-points $T_{II,i}$, the mixing temperature $T_{M}$, and the load $\dot{m}_{II,i}$. $T_{M}$ is the temperature of the mixed outlet streams that enter the separation stage. The load describes the mass flow entering the unit operation. In the remainder of the work, the load argument is dropped for better readability.

\begin{matrix} X_{I, i} & = f (T_{I, i}, {\dot{m}}_{I, i}) \end{matrix}

(43)

\begin{matrix} X_{II, i} & = f (T_{II, i}, T_{M}, {\dot{m}}_{II, i}) \end{matrix}

(44)

\begin{matrix} X_{j, i} & \in \{p_{VP, j, i}, {\dot{Q}}_{j, i}\} \end{matrix}

(45)

The manipulated variables (e.g., temperature set-points) and the plant load were sampled on an equidistant grid, and the outputs $X$ were calculated by the flowsheet simulator. All models were identified using the Levenberg–Marquardt algorithm in MATLAB's Deep Learning Toolbox [36]. The networks consist of one hidden layer with 8–32 neurons using the $\tanh$ activation function, followed by a linear output layer. Further information on the training of the surrogate models can be found in [4].
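To illustrate the model structure described above, the following sketch evaluates a network with one $\tanh$ hidden layer and a linear output. It is not the authors' implementation: the function name, the three hidden neurons (fewer than the 8–32 used in the paper, for brevity), the input scaling, and all parameter values are made up for illustration.

```python
import math


def surrogate_forward(x, W1, b1, w2, b2):
    """Evaluate a network with one tanh hidden layer and a linear output.

    x  : list of (scaled) inputs, e.g. [temperature set-point, load]
    W1 : hidden-layer weights, one row per hidden neuron
    b1 : hidden-layer biases
    w2 : output weights, one per hidden neuron
    b2 : output bias
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Made-up parameters for a network with two inputs and three hidden neurons.
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
b1 = [0.0, 0.1, -0.1]
w2 = [1.2, -0.7, 0.3]
b2 = 0.05

# Hypothetical scaled inputs; the result stands in for a predicted steam pressure.
p_vp = surrogate_forward([0.2, 0.8], W1, b1, w2, b2)
print(p_vp)
```

Because such a model is a closed-form expression, it can be evaluated and differentiated cheaply inside an optimizer, which is the property the surrogate is needed for.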

3.2. Operator Training Simulator

As a representation (or digital twin) of the real plant, an operator training simulator (OTS) is used. Operator training simulators are utilized to train new plant operators and to enhance the abilities of plant operators to handle abnormal operating conditions, which leads to less activation of interlocks and less unplanned shutdowns. The OTS is also utilized to test new process control schemes. This has the advantage that the tests can be performed independent of the production schedule and can be carried out under specified and repeatable conditions. Furthermore, the obtained simulation results can be used to increase the understanding and acceptance of advanced process control methods among plant personnel.

The OTS consists of an emulated version of the distributed control system of the real plant and a semi-rigorous dynamic process model. The dynamic process model is defined and implemented such that the OTS can simulate the process in real time and in a numerically stable way for a large set of operating conditions, including start-up and shutdown scenarios. The accuracy must be sufficient to make the operators feel as if they control the real plant. The OTS of the MDI plant is realized in the Workforce Competency framework by Honeywell Forge. On the one hand, the OTS model contains much more detail of the plant (e.g., controllers); on the other hand, its model of the physics and chemistry of the plant is simpler than the stationary model in the flowsheet simulator. There is a significant mismatch between the stationary behaviour of the OTS and the flowsheet simulator. In addition, noise is present in the simulation. This is shown in detail in [4].

4. Application to the Process

The MAWQA scheme presented in Section 2.3 is applied to the MDI production process (Section 3). The OTS is used as a substitute for the real plant, as experiments are possible with the OTS and the operators accept it as a faithful representation of the behaviour of the plant. The computed set-points were entered manually into the OTS after every iteration of the MAWQA scheme; the average time to reach steady state is 2–4 h. For the initialization of the algorithm, a case-specific probing sequence is used, which is described at the end of this section. Two different scenarios are investigated. In the first scenario, the RTO algorithm is started from a standard operating point of the plant to find optimal operating conditions that minimize the fouling in the heat exchangers. After the algorithm has converged, the total load of the plant is changed, which acts as a known disturbance. In the second scenario, the algorithm is started from a different operating point and the process is disturbed after convergence by a change in the heat transfer coefficients of a subset of the heat exchangers, which represents an unmeasured change in the behaviour of the plant. Between the two experiments, the OTS was updated by Covestro in the course of regular maintenance, which leads to small differences in the initial steps of the algorithms. In all experiments, the same formulation of the optimization problem, (46)–(51), is used, and the required models for $\dot{Q}_{j,i}$ and $p_{VP,j,i}$ are identified as described in Section 3.1.

\begin{matrix} \underset{T_{I}, T_{II}, T_{M}}{\min} & \sum_{i = 1}^{N_{I}} {(p_{VP, I, i} - p_{VP, I, i}^{\mathrm{clean}})}^{2} + \sum_{i = 1}^{N_{II}} {(p_{VP, II, i} - p_{VP, II, i}^{\mathrm{clean}})}^{2} \end{matrix}

(46)

\begin{matrix} \text{s.t.} & {\dot{Q}}_{j, i} \leq 0, \quad i = 1, \dots, N_{j}, \quad j \in \{I, II\} \end{matrix}

(47)

\begin{matrix} {\dot{Q}}_{\mathrm{tot}}^{\mathrm{lb}} ({\dot{m}}_{\mathrm{tot}}) \leq \left( \sum_{i = 1}^{N_{I}} {\dot{Q}}_{I, i} + \sum_{i = 1}^{N_{II}} {\dot{Q}}_{II, i} \right) \leq {\dot{Q}}_{\mathrm{tot}}^{\mathrm{ub}} ({\dot{m}}_{\mathrm{tot}}) \end{matrix}

(48)

\begin{matrix} p_{VP, j, i} \leq p_{VP, j, i}^{\max}, \quad i = 1, \dots, N_{j}, \quad j \in \{I, II\} \end{matrix}

(49)

\begin{matrix} T_{j}^{\mathrm{lb}} \leq T_{j} \leq T_{j}^{\mathrm{ub}}, \quad j \in \{I, II\} \end{matrix}

(50)

\begin{matrix} T_{M} = w_{I}^{\top} T_{I} \end{matrix}

(51)

$T_{I}$ and $T_{II}$ are the set-points of the heat exchangers of the reactors and of the separators. $T_{M}$ is the temperature of the stream that enters the separation section, which acts as a coupling variable. $T_{M}$ is based on a mixing rule and is determined by Equation (51). The vector of weighting factors $w_{I}$ depends on the heat capacities, mass flows, and temperatures of the outgoing streams of all reactors. The function $h(\cdot)$, cf. Equation (22), is thus linear. The subscript $I$ identifies the reactors and the subscript $II$ identifies the separators. The differences between the steam pressures in the heat exchangers $p_{VP,j,i}$ and the desired reference values $p_{VP,j,i}^{\mathrm{clean}}$ are penalized quadratically. The numbers of reactors and separators are $N_{I}$ and $N_{II}$. $\dot{Q}_{j,i}$ are the heat duties, and the total heat duty of all reactors and separators is bounded by the lower bound $\dot{Q}_{\mathrm{tot}}^{\mathrm{lb}}$ and the upper bound $\dot{Q}_{\mathrm{tot}}^{\mathrm{ub}}$. The bounds depend on the total load of the production plant $\dot{m}_{\mathrm{tot}}$. In addition, there is an upper bound $p_{VP}^{\max}$ on the available steam pressure. All temperature set-points must lie between $T_{j}^{\mathrm{lb}}$ and $T_{j}^{\mathrm{ub}}$. All optimization problems were solved in MATLAB [36] using IPOPT [38] within the OPTI toolbox [37].
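As a rough illustration of the problem formulation, the sketch below evaluates the quadratic fouling objective (46), the mixing rule (51), and the bounds (48) and (49) for given values. It is a plain evaluation, not the IPOPT-based solution used in the paper. All function names and numbers are hypothetical, and the steam pressures and heat duties, which in the paper come from the surrogate models, are simply passed in as lists.

```python
def mixing_temperature(w_I, T_I):
    """Coupling variable T_M = w_I^T T_I, cf. Equation (51)."""
    return sum(w * t for w, t in zip(w_I, T_I))


def fouling_objective(p_vp, p_clean):
    """Sum of squared deviations of the steam pressures from their clean values, cf. Equation (46)."""
    return sum((p - pc) ** 2 for p, pc in zip(p_vp, p_clean))


def satisfies_bounds(q_dot, p_vp, p_max, q_lb, q_ub):
    """Check the total heat-duty bounds (48) and the steam-pressure bounds (49)."""
    total_ok = q_lb <= sum(q_dot) <= q_ub
    pressure_ok = all(p <= pm for p, pm in zip(p_vp, p_max))
    return total_ok and pressure_ok


# Made-up values for two reactors (stage I) and one separator (stage II).
T_M = mixing_temperature([0.5, 0.5], [100.0, 120.0])
J = fouling_objective([2.0, 3.0, 1.5], [1.0, 1.0, 1.5])
ok = satisfies_bounds([4.0, 5.0, 3.0], [2.0, 3.0, 1.5], [4.0, 4.0, 4.0], 10.0, 14.0)
print(T_M, J, ok)  # 110.0 5.0 True
```

The objective is zero only when every steam pressure equals its clean value, which reflects the stated goal of keeping all heat exchangers close to their clean operating state.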

Initial Probing Sequence

Before bias and gradient correction are applied in MA or MAWQA, the system must be probed to collect enough data points to allow estimation of the plant gradients. Initially, finite differences are used; thereafter, quadratic approximations provide more robust gradient estimates. The structure of the MDI plant can be exploited to reduce the length of the initial probing sequence, as the probing actions for the reaction and separation stages can be decoupled. Ref. [13] proposed maximizing the inverse condition number $\kappa^{-1}(\cdot)$ to yield a good approximation by finite differences. An optimization problem is solved to compute probing actions when the identification matrix $S_{k}$ becomes nearly singular; in this work, the initial probing sequence was also determined by solving the optimization problem

\begin{matrix} \underset{\Delta_{T_{I}}^{k}, \Delta_{T_{II}}^{k}}{\max} & \kappa^{-1} (S_{k}) \end{matrix}

(52)

\begin{matrix} \text{s.t.} & \Delta_{T_{j}, \min}^{k} \leq \Delta_{T_{j}}^{k} \leq \Delta_{T_{j}, \max}^{k}, \quad k \in \{1, 2\}, \quad j \in \{I, II\} \end{matrix}

(53)

\begin{matrix} T_{M}^{k} = w_{I}^{\top} T_{I}^{k}, \quad k \in \{1, 2\} \end{matrix}

(54)

\begin{matrix} T_{j}^{k} = T_{j}^{0} + \Delta_{T_{j}}^{k}, \quad k \in \{1, 2\}, \quad j \in \{I, II\} \end{matrix}

(55)

with

\begin{matrix} S_{k} = \begin{bmatrix} \begin{pmatrix} T_{M}^{0} - T_{M}^{1} \\ T_{II}^{0} - T_{II}^{1} \end{pmatrix} & \begin{pmatrix} T_{M}^{0} - T_{M}^{2} \\ T_{II}^{0} - T_{II}^{2} \end{pmatrix} \end{bmatrix}^{\top} . \end{matrix}

(56)

As $N_{z,\max} = 2$, only two perturbations, designated as $\Delta_{T_{j}}^{k}$, are required to estimate all plant gradients. The superscript $k$ indicates the perturbation number. The perturbations are added to the initial inputs $T_{j}^{0}$ for the reaction stage ($I$) and the separation stage ($II$). The mixing temperature $T_{M}$ is calculated according to Equation (54). $\Delta_{T_{j},\min}$ and $\Delta_{T_{j},\max}$ are user-defined parameters that control the minimum and maximum step size. An exemplary result for the initial probing sequence can be seen in Figure 8. The initial set-points and the two perturbations are plotted for the models in the reaction stage (lower graph) and the separation stage (upper graph). Because the models of the separation stage have two inputs, their set-points are placed to ensure a good approximation using FD. After applying two vectors of set-points to the reaction stage, the quadratic approximations can already be computed, as these models have only one input. Due to the parallel structure in the reaction stage, only two trajectories are visible, as the individual trajectories of the reactors lie on top of each other.
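The role of the inverse condition number in (52) can be illustrated for a single separator with the two inputs $(T_{M}, T_{II})$. The sketch below builds the $2 \times 2$ identification matrix of Equation (56) and evaluates $\kappa^{-1}$ as the ratio of its smallest to its largest singular value, using the closed-form expressions for a $2 \times 2$ matrix. It only evaluates the criterion and does not solve the optimization problem; the function names and temperatures are hypothetical.

```python
import math


def identification_matrix(z0, z1, z2):
    """Rows are the differences between the base point z0 and the probes z1 and z2, cf. Equation (56)."""
    return [[z0[0] - z1[0], z0[1] - z1[1]],
            [z0[0] - z2[0], z0[1] - z2[1]]]


def inverse_condition_number_2x2(S):
    """Ratio sigma_min / sigma_max of a 2x2 matrix (0 if the matrix is singular)."""
    (a, b), (c, d) = S
    frob2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    if frob2 == 0.0:
        return 0.0
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    sigma_max_sq = 0.5 * (frob2 + disc)
    # sigma_min * sigma_max = |det|, hence sigma_min / sigma_max = |det| / sigma_max^2.
    return abs(det) / sigma_max_sq


z0 = (100.0, 150.0)  # hypothetical base point (T_M^0, T_II^0)

# Two probes along different axes: well-conditioned.
good = identification_matrix(z0, (102.0, 150.0), (100.0, 152.0))
# Two probes along the same direction: singular.
bad = identification_matrix(z0, (102.0, 152.0), (104.0, 154.0))

print(inverse_condition_number_2x2(good), inverse_condition_number_2x2(bad))  # 1.0 0.0
```

Probes along the same direction give no information about the second gradient component, which is why the criterion drives the two perturbations towards independent directions of comparable size.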

5. Results

Distributed MAWQA, as presented in Section 2.3, was applied in the two different scenarios described above to the OTS of the MDI process as described in Section 3.2.

5.1. Known Disturbance—Load Change

In the load change experiment, the algorithm was started from a known operating point in iteration 1. The resulting trajectories are shown in Figure 9. In iterations 2 and 3, the perturbation sequence according to Section 4 was applied. Due to the reduced number of inputs, quadratic approximations can be constructed for the reaction stage from iteration 4 onward. The gradients for the separation stage were calculated via FD. The set-points of the reaction stage decrease towards 0, which simultaneously decreases the objective function value. The set-points of the separation stage increase in iterations 5 and 6, and the total heat flow slightly violates its upper bound. In iteration 7, sufficient data points are available to identify quadratic approximations for all subsystems. Due to the improved gradient approximation, the algorithm converges to the optimal operating conditions in iteration 8, satisfying all constraints. In iteration 10, a known disturbance is applied: the total load of the plant is reduced by 10%. In Figure 9, this point in time is indicated by the vertical dashed lines. The algorithm is restarted with a new probing sequence similar to the one used before. The new perturbations are applied in the following two iterations. Similar to the first part of the experiment, quadratic approximations are built after iterations 13 to 15 for the reaction section, while finite differences are used in the separation section to estimate the plant gradients. From iteration 16 onwards, quadratic approximations are computed for the separation section as well. The algorithm converges to a new optimal set of temperature set-points in iteration 17. Minor oscillations caused by the noise in the OTS are still visible.

5.2. Unknown Disturbance—Change in Heat Transfer Coefficients

In this experiment, the algorithm again starts from a standard operating point employing a probing sequence. As in Section 5.1, the algorithm converges quickly once a sufficient number of data points has been collected to identify quadratic approximations for all sub-models. The OTS was updated in between the two experiments, which led to different noise levels and operating conditions. After reaching optimal operating conditions, the unknown disturbance occurs in iteration 10. The resulting trajectory is shown in Figure 10. No predefined perturbations are applied, and quadratic approximations are identified in all iterations for all models. In contrast to the previous experiments, data points older than 6 iterations are removed from the data set for the QAs. The size of the sliding window was chosen based on the minimum number of data points required to identify quadratic approximations for all submodels. Further investigations into the optimal window size may be required for other situations. The algorithm converges to a new optimal set of temperature set-points in 15 iterations. In iterations 11–16, data points both from before and after the occurrence of the disturbance are used to identify the quadratic approximations. Therefore, the algorithm needs longer to converge. Once all the data points in use had been collected after the disturbance, more heat duty was distributed to the heat exchangers that are not affected by the disturbance, while the heat duty of the affected heat exchangers was reduced. This lowers the objective value and reduces the fouling.
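The sliding window on the regression data can be sketched with a bounded queue. This is only an illustration of the idea of discarding old data points, assuming one steady-state data point per iteration so that a window of 6 points corresponds to 6 iterations; the class and method names are hypothetical.

```python
from collections import deque


class SlidingRegressionData:
    """Keeps only the most recent `window` steady-state data points."""

    def __init__(self, window: int = 6):
        self._points = deque(maxlen=window)

    def add(self, iteration: int, setpoints: list[float], measurement: float) -> None:
        # Once the deque is full, appending drops the oldest entry automatically.
        self._points.append((iteration, list(setpoints), measurement))

    def iterations(self) -> list[int]:
        """Iteration numbers of the data points currently in the window."""
        return [point[0] for point in self._points]


data = SlidingRegressionData(window=6)
for k in range(1, 13):
    # Made-up set-points and measurements.
    data.add(k, [100.0 + k, 150.0 - k], 0.1 * k)

print(data.iterations())  # [7, 8, 9, 10, 11, 12]
```

With such a window, data recorded before a disturbance leaves the regression set after a fixed number of iterations, at the cost of a transition phase in which old and new data are mixed.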

5.3. Comparison with the Standard MAWQA Algorithm

In [4], the MAWQA algorithm was applied to the MDI process without the distributed estimation of the modifiers. In both cases, artificial neural network models were used in the optimization. The disMAWQA algorithm converges significantly faster to the optimal operation conditions of the plant. Table 2 lists the number of iterations required for convergence. The standard MAWQA algorithm requires twice as many iterations to reach the optimal operating conditions when starting from common operating points. Note that in [4], no external disturbances are applied; therefore, only the iterations until the first convergence are counted for the two scenarios presented in this paper.

6. Conclusions and Outlook

In this paper, a modification of the MAWQA algorithm for large-scale plants with few couplings between the subunits is proposed and applied to the MDI production process. By exploiting the structure of the plant in a distributed gradient estimation scheme, the number of data points needed to compute quadratic approximations of the cost and constraint functions, which are the basis of the estimation of the plant gradients, can be drastically reduced. With this modification, the convergence of the MAWQA algorithm is significantly faster compared to previous work, see [4]. The structure of the plant was also exploited to design an initial probing sequence that reduces the number of initial perturbations. The resulting scheme was applied to the operator training simulator (OTS) of the MDI plant in two realistic scenarios. In the first scenario, the load of the plant changed, which represents a known disturbance, while in the second scenario, an unknown disturbance was introduced. In both cases, the algorithm converges to optimal operating conditions quickly and smoothly and thereafter reacts appropriately to the disturbances.

The distributed estimation makes it possible to apply MAWQA also to larger plants, where the number of perturbations required to estimate the gradients would otherwise become so large that it would be difficult to convince the plant managers to apply RTO with Modifier Adaptation or MAWQA. In the case considered here, the plant structure is parallel–sequential with only one coupling variable. In the future, we will investigate which other plant structures are promising for MAWQA with distributed estimation of the modifiers. Also, we assume full knowledge of the relationship between the coupling variables and the manipulated inputs, which is justified in this case, but in general, uncertainties will also be present. How to deal with this issue will be the subject of further work. It may also be beneficial to couple the algorithm with disturbance detection or with a restart algorithm.

7. Patents

The application of iterative RTO to the MDI process has been submitted as a patent [2].

Author Contributions

Conceptualization, J.E., I.W. and S.E.; methodology, J.E. and S.E.; software, J.E.; validation, I.W.; formal analysis, J.E. and S.E.; investigation, J.E.; resources, I.W.; data curation, J.E. and I.W.; writing—original draft preparation, J.E.; writing—review and editing, S.E.; visualization, J.E.; supervision, S.E.; project administration, S.E.; funding acquisition, S.E. All authors have read and agreed to the published version of the manuscript.

Funding

The research that led to the presented results was part of the KEEN project funded by the German Ministry of Economic Affairs and Climate Action (BMWK) under the grant number 01MK20014T.

Data Availability Statement

The datasets presented in this article are not readily available because of confidentiality reasons. Requests to access the datasets should be directed to Covestro Germany AG.

Conflicts of Interest

The authors declare the following potential conflict of interest: All authors are co-inventors of a patent (Patent No. [EP22209645.5]) that is related to the subject matter of this publication. Jens Ehlhardt and Sebastian Engell are employed by TU Dortmund University, and Inga Wolf is employed by Covestro Deutschland AG. Apart from the mentioned patent, the authors declare no additional financial or personal relationships that could have influenced the work reported in this manuscript.

Abbreviations

The following abbreviations are used in this manuscript:

disMAWQA	MAWQA with Distributed Estimation of Modifiers
FD	Finite Differences
KEEN	Künstliche-Intelligenz Inkubator-Labore in der Prozessindustrie
MA	Modifier Adaptation
MAWQA	Modifier Adaptation with Quadratic Approximation
MDA	Methylenedianiline
MDI	Diphenylmethane Diisocyanate
OTS	Operator Training Simulator
QA	Quadratic Approximations
RTO	Real-time Optimization

Appendix A

The least-squares problem stated in Equation (A1) is solved to identify the vector of parameters of the quadratic approximation, $\theta$,

\begin{matrix} \underset{\theta}{\min} \sum_{i = 1}^{N_{U}} {(J_{P} (u_{i}) - J_{\Phi} (u_{i}, \theta))}^{2}, \end{matrix}

(A1)

where $J_{P}(u_{i})$ are the measured values of the objective function and $N_{U}$ is the number of data points in the regression set $U$. The regression set is a subset of the available data points and is selected to ensure a well-poised identification [17]. Analogously, quadratic approximations of the constraint functions $G_{\Phi}$ are identified. MAWQA uses further concepts from derivative-free optimization; for more information, see [16]. Depending on the quality of the model and of the quadratic approximation, the surrogate optimization problem (A2)–(A4) may be solved instead of (6)–(8).
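For a model with a single input, as in the reaction stage, the least-squares identification in Equation (A1) reduces to fitting $J_{\Phi}(u, \theta) = \theta_{0} + \theta_{1} u + \theta_{2} u^{2}$. The sketch below is a minimal illustration under that assumption: it solves the normal equations with Cramer's rule, which is adequate for three parameters but is not how a production implementation would typically solve the problem. The function names and data are hypothetical, and the selection of the regression set is not shown.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic_1d(u, J):
    """Least-squares fit of J ~ t0 + t1*u + t2*u**2 via the normal equations."""
    s = [sum(ui ** k for ui in u) for k in range(5)]                    # sums of u^k
    r = [sum(Ji * ui ** k for ui, Ji in zip(u, J)) for k in range(3)]   # sums of J*u^k
    A = [[s[i + j] for j in range(3)] for i in range(3)]
    det = _det3(A)
    if det == 0:
        raise ValueError("at least three distinct inputs are needed")
    theta = []
    for col in range(3):
        # Cramer's rule: replace one column of A by the right-hand side.
        Ac = [[r[i] if j == col else A[i][j] for j in range(3)] for i in range(3)]
        theta.append(_det3(Ac) / det)
    return theta


def gradient_1d(theta, u):
    """Derivative of the fitted quadratic, used as the plant gradient estimate."""
    return theta[1] + 2.0 * theta[2] * u


# Made-up noise-free data generated from J = 1 + 2u + 3u^2.
u_data = [0.0, 1.0, 2.0, 3.0]
J_data = [1.0 + 2.0 * ui + 3.0 * ui ** 2 for ui in u_data]
theta = fit_quadratic_1d(u_data, J_data)
print(theta, gradient_1d(theta, 1.0))
```

With noisy measurements, using more than the minimum of three points averages out part of the noise, which is one reason quadratic approximations tend to give more robust gradient estimates than finite differences.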

\begin{matrix} \underset{u^{k + 1}}{\min} & J_{\Phi} (u^{k + 1}) \end{matrix}

(A2)

\begin{matrix} \text{s.t.} & G_{\Phi}^{k} (u^{k + 1}) \leq 0 \end{matrix}

(A3)

\begin{matrix} u^{\mathrm{lb}} \leq u^{k + 1} \leq u^{\mathrm{ub}}, \end{matrix}

(A4)

where $J_{\Phi}$ and $G_{\Phi}$ are the quadratic approximations of the cost and constraint functions. The decision is made based on the following performance metrics:

\begin{matrix} \rho_{\mathrm{ad}}^{k} & = \max \left\{ \left| 1 - \frac{J_{\mathrm{ad}}^{k} - J_{\mathrm{ad}}^{k - 1}}{J_{P}^{k} - J_{P}^{k - 1}} \right|, \left| 1 - \frac{G_{\mathrm{ad}, 1}^{k} - G_{\mathrm{ad}, 1}^{k - 1}}{G_{P, 1}^{k} - G_{P, 1}^{k - 1}} \right|, \dots, \left| 1 - \frac{G_{\mathrm{ad}, N_{G}}^{k} - G_{\mathrm{ad}, N_{G}}^{k - 1}}{G_{P, N_{G}}^{k} - G_{P, N_{G}}^{k - 1}} \right| \right\} \end{matrix}

(A5)

\begin{matrix} \rho_{\Phi}^{k} & = \max \left\{ \left| 1 - \frac{J_{\Phi}^{k} - J_{\Phi}^{k - 1}}{J_{P}^{k} - J_{P}^{k - 1}} \right|, \left| 1 - \frac{G_{\Phi, 1}^{k} - G_{\Phi, 1}^{k - 1}}{G_{P, 1}^{k} - G_{P, 1}^{k - 1}} \right|, \dots, \left| 1 - \frac{G_{\Phi, N_{G}}^{k} - G_{\Phi, N_{G}}^{k - 1}}{G_{P, N_{G}}^{k} - G_{P, N_{G}}^{k - 1}} \right| \right\} . \end{matrix}

(A6)

In (A5) and (A6), the subscripts $\mathrm{ad}$, $P$, and $\Phi$ denote the adapted functions, the plant measurements, and the quadratic approximations, respectively. $N_{G}$ is the number of constraints. If $\rho_{\mathrm{ad}}^{k} \leq \rho_{\Phi}^{k}$, problem (6)–(8) is solved; otherwise, problem (A2)–(A4) is solved. Additionally, a trust-region constraint (A7) is added to (6)–(8) and (A2)–(A4) to ensure the validity of the local quadratic approximations.

\begin{matrix} {(u^{k + 1} - u^{k})}^{\top} \operatorname{cov} (U^{k}) (u^{k + 1} - u^{k}) \leq \gamma^{2} \end{matrix}

(A7)

The trust region is determined by the regression set $U$ and a tuning factor $\gamma$ that controls the size of the trust region. The performance metrics $\rho_{\mathrm{ad}}$ and $\rho_{\Phi}$ are used to decide between the adapted optimization problem and the optimization based on the quadratic approximation. In [15], a criticality check is proposed, which is commonly used in derivative-free optimization [16]. The criticality step ensures that all points lie close to the next iterate $u^{k + 1}$. In this work, however, the criticality step is not used because oscillating behavior was observed during tests prior to the experiments.
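The selection rule based on the metrics (A5) and (A6) can be sketched as follows. Each metric compares the change predicted between two consecutive iterations with the change measured at the plant, for the cost and for every constraint, and takes the worst case. The function names and all numbers are hypothetical. The handling of an unchanged plant measurement (zero denominator) is an assumption of this sketch and is not specified in the text.

```python
import math


def mismatch_metric(pred_now, pred_prev, plant_now, plant_prev):
    """Largest |1 - predicted change / measured change| over the cost and all constraints."""
    worst = 0.0
    for pn, pp, mn, mp in zip(pred_now, pred_prev, plant_now, plant_prev):
        measured_change = mn - mp
        if measured_change == 0.0:
            # Assumption of this sketch: an unchanged measurement carries no
            # information, so the prediction is treated as maximally unreliable.
            return math.inf
        worst = max(worst, abs(1.0 - (pn - pp) / measured_change))
    return worst


def choose_problem(rho_ad, rho_phi):
    """Solve the adapted problem if it predicts at least as well as the quadratic one."""
    return "adapted" if rho_ad <= rho_phi else "quadratic"


# Made-up values, ordered as [J, G_1], for iterations k-1 and k.
plant_prev, plant_now = [10.0, -1.0], [8.0, -0.5]
adapted_prev, adapted_now = [9.0, -1.0], [8.0, -0.75]
quad_prev, quad_now = [10.0, -1.0], [8.0, -0.5]

rho_ad = mismatch_metric(adapted_now, adapted_prev, plant_now, plant_prev)
rho_phi = mismatch_metric(quad_now, quad_prev, plant_now, plant_prev)
print(rho_ad, rho_phi, choose_problem(rho_ad, rho_phi))  # 0.5 0.0 quadratic
```

In this made-up case the adapted model predicts only half of the measured changes, while the quadratic approximation reproduces them, so the problem based on the quadratic approximation would be solved in the next iteration.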

References

Sonnenschein, M.F. Polyurethanes: Science, Technology, Markets, and Trends; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2021. [Google Scholar] [CrossRef]
Engell, S.; Ahmad, A.; Ehlhardt, J.; Wolf, I.; Arras, J.; Hielscher, A. Method for Controlling a Distributed Control System. EU Patent EP22209645.5, 2022. applied. [Google Scholar]
Müller, D.; Dercks, B.; Nabati, E.; Blazek, M.; Eifert, T.; Schallenberg, J.; Piechottka, U.; Dadhe, K. Real-Time Optimization in the Chemical Processing Industry. Chem. Ing. Tech. 2017, 89, 1464–1470. [Google Scholar] [CrossRef]
Ehlhardt, J.; Ahmad, A.; Wolf, I.; Engell, S. Real-Time Optimization Using Machine Learning Models Applied to the 4,4’-Diphenylmethane Diisocyanate Production Process. Chem. Ing. Tech. 2023, 95, 1096–1103. [Google Scholar] [CrossRef]
Bárkányi, Á.; Chován, T.; Németh, S.; Abonyi, J. Modelling for digital twins—Potential role of surrogate models. Processes 2021, 9, 476. [Google Scholar] [CrossRef]
Caballero, J.A.; Grossmann, I.E. An algorithm for the use of surrogate models in modular flowsheet optimization. AIChE J. 2008, 54, 2633–2650. [Google Scholar] [CrossRef]
Engell, S. Feedback control for optimal process operation. J. Process Control 2007, 17, 203–219. [Google Scholar] [CrossRef]
Chen, C.Y.; Joseph, B. On-line optimization using a two-phase approach: An application study. Ind. Eng. Chem. Res. 1987, 26, 1924–1930. [Google Scholar] [CrossRef]
Chachuat, B.; Srinivasan, B.; Bonvin, D. Adaptation strategies for real-time optimization. Comput. Chem. Eng. 2009, 33, 1557–1567. [Google Scholar] [CrossRef]
Câmara, M.M.; Quelhas, A.D.; Pinto, J.C. Performance evaluation of real industrial RTO systems. Processes 2016, 4, 44. [Google Scholar] [CrossRef]
Darby, M.L.; Nikolaou, M.; Jones, J.; Nicholson, D. RTO: An overview and assessment of current practice. J. Process Control 2011, 21, 874–884. [Google Scholar] [CrossRef]
Tatjewski, P. ITERATIVE OPTIMIZING SET-POINT CONTROL—THE BASIC PRINCIPLE REDESIGNED. IFAC Proc. Vol. 2002, 35, 49–54. [Google Scholar] [CrossRef]
Gao, W.; Engell, S. Iterative set-point optimization of batch chromatography. Comput. Chem. Eng. 2005, 29, 1401–1409. [Google Scholar] [CrossRef]
Marchetti, A.G.; François, G.; Faulwasser, T.; Bonvin, D. Modifier adaptation for real-time optimization—Methods and applications. Processes 2016, 4, 55. [Google Scholar] [CrossRef]
Gao, W.; Wenzel, S.; Engell, S. A reliable modifier-adaptation strategy for real-time optimization. Comput. Chem. Eng. 2016, 91, 318–328. [Google Scholar] [CrossRef]
Conn, A.R.; Scheinberg, K.; Vicente, L.N. Introduction to Derivative-Free Optimization; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2009. [Google Scholar] [CrossRef]
Wenzel, S.; Yfantis, V.; Gao, W. Comparison of Regression Data Selection Strategies for Quadratic Approximation in RTO. In Computer Aided Chemical Engineering, Proceedings of the 27th European Symposium on Computer Aided Process Engineering, ESCAPE27, Barcelona, Spain, 1–5 October 2017; Elsevier: Oxford, UK, 2017; Volume 40, pp. 1711–1716. [Google Scholar] [CrossRef]
Hernández, R.; Dreimann, J.; Engell, S. Reliable Iterative RTO of a Continuously Operated Hydroformylation Process. IFAC-PapersOnLine 2018, 51, 61–66. [Google Scholar] [CrossRef]
Gottu Mukkula, A.R.; Kern, S.; Salge, M.; Holtkamp, M.; Guhl, S.; Fleicher, C.; Meyer, K.; Remelhe, M.P.; Maiwald, M.; Engell, S. An application of modifier adaptation with quadratic approximation on a pilot scale plant in industrial environment. IFAC-PapersOnLine 2020, 53, 11773–11779. [Google Scholar] [CrossRef]
Gottu Mukkula, A.R.; Valiauga, P.; Fikar, M.; Paulen, R.; Engell, S. Experimental real time optimization of a continuous membrane separation plant. IFAC-PapersOnLine 2020, 53, 11786–11793. [Google Scholar] [CrossRef]
Cegla, M.; Fage, A.; Kemmerling, S.; Engell, S. Experimental Application of Real-Time Optimization with Modifier Adaptation and Quadratic Approximation to a Reactive Extrusion Process. IFAC-PapersOnLine 2023, 56, 6150–6155. [Google Scholar] [CrossRef]
Ahmad, A.; Paulen, R.; Valo, R.; Fikar, M.; Engell, S. Iterative real-time optimization of a membrane pilot plant. Control Eng. Pract. 2024, 147, 105907. [Google Scholar] [CrossRef]
Schneider, R.; Milosavljevic, P.; Bonvin, D. Distributed modifier-adaptation schemes for the real-time optimisation of uncertain interconnected systems. Int. J. Control 2019, 92, 1123–1136. [Google Scholar] [CrossRef]
Milosavljevic, P.; Marchetti, A.G.; Cortinovis, A.; Faulwasser, T.; Mercangöz, M.; Bonvin, D. Real-time optimization of load sharing for gas compressors in the presence of uncertainty. Appl. Energy 2020, 272, 114883. [Google Scholar] [CrossRef]
Papasavvas, A.; Francois, G. Internal Modifier Adaptation for the Optimization of Large-Scale Plants with Inaccurate Models. Ind. Eng. Chem. Res. 2019, 58, 13568–13582. [Google Scholar] [CrossRef]
Krishnamoorthy, D. A distributed feedback-based online process optimization framework for optimal resource sharing. J. Process Control 2021, 97, 72–83. [Google Scholar] [CrossRef]
Krishnamoorthy, D.; Skogestad, S. Real-Time optimization as a feedback control problem—A review. Comput. Chem. Eng. 2022, 161, 107723. [Google Scholar] [CrossRef]
Dirza, R.; Matias, J.; Skogestad, S.; Krishnamoorthy, D. Experimental validation of distributed feedback-based real-time optimization in a gas-lifted oil well rig. Control Eng. Pract. 2022, 126, 105253. [Google Scholar] [CrossRef]
Ferreira, T.D.A.; Shukla, H.A.; Faulwasser, T.; Jones, C.N.; Bonvin, D. Real-Time optimization of Uncertain Process Systems via Modifier Adaptation and Gaussian Processes. In Proceedings of the 2018 European Control Conference, ECC 2018, Limassol, Cyprus, 12–15 June 2018; pp. 465–470. [Google Scholar] [CrossRef]
Andersson, L.E.; Imsland, L. Real-time optimization of wind farms using modifier adaptation and machine learning. Wind Energy Sci. 2020, 5, 885–896. [Google Scholar] [CrossRef]
Turan, E.M.; Lia, S.; Matias, J.; Jäschke, J. Experimental validation of modifier adaptation and Gaussian processes for real time optimisation. IFAC-PapersOnLine 2023, 56, 1394–1399. [Google Scholar] [CrossRef]
Costello, S.; François, G.; Bonvin, D. A Directional Modifier-Adaptation Algorithm for Real-Time Optimization. J. Process Control 2016, 39, 64–76. [Google Scholar] [CrossRef]
Wenzel, S.; Riedl, F.; Engell, S. An efficient hierarchical market-like coordination algorithm for coupled production systems based on quadratic approximation. Comput. Chem. Eng. 2020, 134, 106704. [Google Scholar] [CrossRef]
Ehlhardt, J.; Wolf, I.; Engell, S. Real-time optimization with machine learning models and distributed modifier adaptation applied to the MDI-process. In Computer Aided Chemical Engineering, Proceedings of the 34th European Symposium on Computer Aided Process Engineering/15th Symposium on Process System Engineering, Florence, ESCAPE34/PSE24, Italy, 2–6 June 2024; Elsevier: Oxford, UK, 2024; Volume 53, pp. 1981–1986. [Google Scholar] [CrossRef]
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2010, 3, 1–122. [Google Scholar] [CrossRef]
MATLAB, version 9.12.0 (R2022a); The MathWorks Inc.: Natick, MA, USA, 2022; Available online: https://www.mathworks.com (accessed on 9 September 2025).
Currie, J.; Wilson, D.I. OPTI: Lowering the Barrier Between Open Source Optimizers and the Industrial MATLAB User. In Proceedings of the Foundations of Computer-Aided Process Operations, Savannah, GA, USA, 8–11 January 2012.
Wächter, A.; Biegler, L.T. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 2006, 106, 25–57.

Figure 1. Schematic flow diagram of the Modifier Adaptation with Quadratic Approximation algorithm used for RTO. When a steady state has been reached, it is evaluated whether finite-difference or quadratic approximations can be used for gradient estimation. If too few data points are available, new setpoints are generated by perturbation. In the case of QA, a quality check is performed to decide whether the modified optimization problem or the optimization based on the quadratic approximation should be used. In all cases, the new setpoints are applied to the plant, followed by waiting for the next steady state.

Figure 2. Minimum number of perturbations needed for gradient estimation using finite differences (FD) and quadratic approximations (QA). The number of perturbations needed for QA grows quadratically with the number of inputs, so QA already becomes difficult to apply to systems with four or more inputs because of the number of initial data points required.

Figure 3. Schematic flow diagram of the distributed MAWQA scheme. If no data points are available, initial perturbations are applied to the system. In each iteration of the disMAWQA algorithm, convergence is checked. If the user-defined convergence criterion is not met, the choice between finite differences and quadratic approximations, as well as the quality check, is performed for each subsystem individually. The overall centralized optimization problem is then assembled from all individual objective and constraint functions. This problem is solved, and the new setpoints are applied to the system. Note that the overall problem may include modified objective and constraint functions in addition to quadratic approximations.

Figure 4. General structure of the minimal example.

Figure 5. Trajectories of the inputs (upper plot) and objective function values (lower plot) of MAWQA and MAWQA with the distributed estimation of modifiers (disMAWQA) for the demonstration example with three subsystems. On the x-axes, the iteration number is displayed.

Figure 6. In the upper plot, the trajectories of the objective values of the standard MAWQA $J_{S,N}$ and the disMAWQA approach $J_{D,N}$ are shown. In the lower plot, the Euclidean distance to the optimal input vector $u^{*}$ of the standard MAWQA approach $d_{S,N}$ and the disMAWQA approach $d_{D,N}$ are presented. On the x-axes, the iteration number is displayed.

Figure 7. In (a), a schematic representation of the plant is shown. In the reaction stage, multiple units are operating in parallel with the corresponding temperature set-points of the heat exchangers $T_{I}^{i}$. The outlet streams are mixed, and the resulting mixing temperature is denoted as $T_{M}$. After mixing, the streams are further processed in the separation stage with the corresponding temperature set-points $T_{II}^{j}$. The employed control structure is shown in (b).

Figure 8. Results of the optimization of the initial probing sequence. In the upper subplot, the scaled temperature set-points of the separation stage $T_{II}$ are plotted on the y-axis and the scaled mixing temperature is plotted on the x-axis. The labels indicate the iteration number. $T_{M}$ is computed from the set-points of the reaction stage depicted in the lower subplot. The iteration number is displayed on the x-axis and the scaled reaction stage set-points $T_{I}$ are displayed on the y-axis.

Figure 9. Results of the load change experiment. The scheme is started from a common operating point. After convergence, the total load of the plant is reduced by 10% in iteration 10. In all plots, the iteration number is displayed on the x-axis. In the upper subplot, the temperature set-points of the reactors $T_{I}$, the temperature set-points of the separators $T_{II}$, and the mixing temperature $T_{M}$ are shown. The second and third subplots show the evolution of the total objective and the global heat duty constraint. All values on the y-axes are scaled. Before the disturbance, initial perturbations are applied during iterations 1–3. Afterwards, finite differences or quadratic approximations are used for each subsystem, depending on its number of inputs. The algorithm converges within 8 iterations, once sufficient data points are available to estimate quadratic approximations for all subsystems. When the disturbance occurs, the algorithm restarts and new perturbations are applied to the system. Convergence is again achieved within a similar number of iterations, once quadratic approximations are available for all subsystems.

Figure 10. Results of the experiment with an unknown disturbance as described in the text. The scheme is started from a standard operating point. After an optimal steady state has been reached, a disturbance occurs after iteration 10. In all plots, the iteration number is displayed on the x-axis. In the upper subplot, the temperature set-points of the reactors $T_{I}$, the temperature set-points of the separators $T_{II}$, and the mixing temperature $T_{M}$ are shown. The second and third subplots show the evolution of the total objective and the global heat duty constraint. All values on the y-axes are scaled. The algorithm needs approximately 9 iterations to reach the optimal operating point. Note that higher noise levels are present compared to Section 5.1 due to the updated OTS. In this scenario, only the last six data points are used to approximate the quadratic functions. After the disturbance has occurred in iteration 10, the algorithm converges quickly to the optimal operating point as soon as all data points are collected from the disturbed space. The temperature set-points are increased in the clean heat exchangers while they are decreased in the fouled heat exchangers.

Table 1. Properties of the small system with $N = 3$.

	Unit 1	Unit 2	Unit 3
$z_{i}$	$[u_{1}]$	$[u_{2}]$	$[u_{3}, v]^{T}$
$N_{min,FD}$	2	2	3
$N_{min,QA}$	3	3	6
$\nabla_{z_{i}} J_{i}$	$[\frac{\partial J_{1}}{\partial u_{1}}]$	$[\frac{\partial J_{2}}{\partial u_{2}}]$	$[\frac{\partial J_{3}}{\partial u_{3}}, \frac{\partial J_{3}}{\partial v}]^{T}$
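
The minimum point counts in Table 1 can be reproduced with a short sketch. It assumes the standard counts for $n$ inputs: $n + 1$ points for a finite-difference gradient and $(n + 1)(n + 2)/2$ points for a full quadratic fit (one point per coefficient). The function names, the `choose_estimator` helper, and the centralized four-input case are illustrative assumptions, not taken from the authors' implementation, and the quality check of the quadratic approximation is omitted.

```python
def n_min_fd(n_inputs: int) -> int:
    """Points needed for a finite-difference gradient: base point plus one per input."""
    return n_inputs + 1


def n_min_qa(n_inputs: int) -> int:
    """Points needed to fit a full quadratic model: one per coefficient."""
    return (n_inputs + 1) * (n_inputs + 2) // 2


def choose_estimator(n_points: int, n_inputs: int) -> str:
    """Hypothetical per-subsystem choice based only on the available data points."""
    if n_points >= n_min_qa(n_inputs):
        return "QA"
    if n_points >= n_min_fd(n_inputs):
        return "FD"
    return "perturb"


# Subsystem sizes as in Table 1: z_1 = [u_1], z_2 = [u_2], z_3 = [u_3, v]^T.
subsystems = {"Unit 1": 1, "Unit 2": 1, "Unit 3": 2}

for name, n in subsystems.items():
    print(name, "FD:", n_min_fd(n), "QA:", n_min_qa(n),
          "after 3 points:", choose_estimator(3, n))

# Assumption: a centralized estimator would treat all four inputs
# (u_1, u_2, u_3, v) jointly.
n_total = 4
print("Centralized", "FD:", n_min_fd(n_total), "QA:", n_min_qa(n_total))
```

With three data points, the single-input units can already use a quadratic approximation while the two-input unit falls back to finite differences, whereas a joint quadratic fit over four inputs would need 15 points under these assumptions.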

Table 2. Comparison of the number of iterations needed for convergence, $N_{con}$, of the MAWQA and disMAWQA algorithms applied to the MDI process. The standard algorithm [4] needs approximately twice as many iterations as the proposed algorithm.

	MAWQA [4]	disMAWQA (Section 5.1)	disMAWQA (Section 5.2)
$N_{con}$	16	8	9

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ehlhardt, J.; Wolf, I.; Engell, S. Modifier Adaptation with Quadratic Approximation with Distributed Estimations of the Modifiers Applied to the MDI-Production Process. Processes 2025, 13, 3140. https://doi.org/10.3390/pr13103140
