Article

Model-Based Sequential Design of Experiments with Machine Learning for Aerospace Systems

1 Institute of Computer Science, University of Würzburg, 97082 Würzburg, Germany
2 Institute of Space Propulsion, DLR, 74239 Hardthausen, Germany
* Author to whom correspondence should be addressed.
Aerospace 2024, 11(11), 934; https://doi.org/10.3390/aerospace11110934
Submission received: 15 October 2024 / Revised: 30 October 2024 / Accepted: 1 November 2024 / Published: 11 November 2024
(This article belongs to the Section Aeronautics)

Abstract

Traditional experimental design methods often face challenges in handling complex aerospace systems due to the high dimensionality and nonlinear behavior of such systems, resulting in nonoptimal experimental designs. To address these challenges, machine learning techniques can be used to further increase the application areas of modern Bayesian Optimal Experimental Design (BOED) approaches, enhancing their efficiency and accuracy. The proposed method leverages neural networks as surrogate models to approximate the underlying physical processes, thereby reducing computational costs and allowing for full differentiability. Additionally, the use of reinforcement learning enables the optimization of sequential designs and essential real-time capability. Our framework is validated by optimizing experimental designs that are used for the efficient characterization of turbopumps for liquid propellant rocket engines. The reinforcement learning approach yields superior results in terms of the expected information gain related to a sequence of 15 experiments, exhibiting mean performance increases of 9.07% compared to random designs and 6.47% compared to state-of-the-art approaches. Therefore, the results demonstrate significant improvements in experimental efficiency and accuracy compared to conventional methods. This work provides a robust framework for the application of advanced BOED methods in aerospace testing, with implications for broader engineering applications.

1. Introduction

The design of experiments presents a significant challenge for researchers due to the multitude of variables involved and constraints on the number of feasible experiments. These constraints often arise from the high costs, substantial time requirements, and potential risks associated with conducting experiments, particularly in fields like aerospace engineering. A prime example is the performance testing and characterization of turbopumps, which are critical components in liquid propellant rocket engines. The performance of a turbopump cannot be directly measured and often deviates from theoretical or manufacturer estimates. This necessitates a systematic and rigorous approach to experimental design to ensure optimal information gain.
BOED is a widely used framework in scientific research for determining the optimal design of experiments. The primary goal of BOED is to maximize the information gain about uncertain parameters of interest. Over time, BOED has evolved to incorporate sequential experimental planning and model-based approaches, enhancing its applicability and effectiveness. Recent advancements in machine learning (ML), particularly the use of neural networks (NNs) as design policies for a sequence of experiments, have further expanded the capabilities of the BOED, enabling more scalable, accurate, and efficient experimental designs.
This article builds upon these advancements by extending the BOED with surrogate models, specifically neural networks, and demonstrating their application in the testing of turbopumps. Unlike traditional approaches that depend on rapid and individual simulations to generate a sufficient number of samples for the BOED, our approach is highly versatile and can be applied to any dataset that meets the criteria for training a neural network as a surrogate model (SM). Consequently, the prerequisites for utilizing the BOED are relaxed within our framework, thus negating the necessity to create supplementary simulations that are optimized for BOED applications. This makes it suitable for a broad range of experimental conditions, even in instances where simulations are time-consuming or impractical, as illustrated by our turbopump example test case.
The focus is on the integration of modern machine learning techniques with traditional BOED frameworks to enhance the experimental design process. By leveraging neural networks as flexible and powerful surrogate models, we create a robust methodology that improves the efficiency and accuracy of experimental designs. This study not only advances the methodology for turbopump testing but also provides a comprehensive approach that can be applied to various experimental setups in aerospace and other engineering disciplines. This research aims to enhance the efficacy of experimental designs in aerospace applications. It is hoped that this will facilitate the extension of these methods to other complex systems in future research endeavors. This will be achieved by capitalizing on the strengths of the BOED in general and reinforcement learning (RL) for the optimization of design policies. One of the key advantages of RL is its capacity to effectively explore the design space, thereby identifying optimal designs.
Our main contributions are as follows:
  • The integration of neural networks as surrogate models for complex physical systems into the BOED framework;
  • The application of BOED to a turbopump performance experiment, which is a common example from the field of aerospace engineering;
  • Verification of the performance of RL-based BOED methods over conventional approaches and for experimental setups with complex physical behavior.
The following Section 2 presents an overview of related work, identifying the principal contributions to this field of research and demonstrating how our work is distinct from that of others. Subsequently, we present a comprehensive overview of the theoretical background that is essential for an understanding of our approach in Section 3. We then delineate our methodology for integrating neural networks into BOED in Section 4. In the following Section 5, we introduce a test case in which this extended framework is applied to turbopump testing. The results of this application are discussed in detail in Section 6, after which a conclusion is presented in Section 8 that highlights the contributions of this work and potential future research directions.

2. Related Work

The field of BOED has evolved significantly since its inception in the mid-20th century. Early contributions by Lindley [1] laid the groundwork for BOED by formalizing the concept of expected information gain (EIG). Recent advancements have been driven by the integration of machine learning and reinforcement learning techniques. For instance, Foster et al. [2] and Rainforth et al. [3] have developed various EIG estimators and employed them in gradient-based methods to train policy neural networks. Their work has led to the development of deep adaptive design (DAD) [4] and its extension, implicit deep adaptive design (iDAD) by Ivanova et al. [5], which does not rely on closed-form likelihoods.
Our study is closely aligned with the research conducted by Blau et al. [6] on RL approaches to BOED. The authors defined an RL framework by formulating the BOED problem as a Markov decision process (MDP), assuming closed-form likelihoods. We utilize this RL-BOED framework but extend it with a neural network-based surrogate model in order to apply it to computationally expensive simulations that cannot be parallelized easily. Additionally, our study focuses on applying such methods to experiments relevant to aerospace engineering and on evaluating the performance of the RL-based approach for such applications in particular. Beyond that, Lim et al. [7] introduced an RL approach with looser assumptions, relying on neither a closed-form likelihood nor a differentiable simulator.
Other notable contributions include the application of BOED to physical simulations and chemical kinetics, demonstrating the versatility of this approach. Huan et al. [8] applied BOED to shock tube ignition experiments, while Walker et al. [9] utilized it in catalytic membrane reactor studies.

3. Preliminaries

The efficient design of experiments is a cornerstone in scientific research and engineering, where the primary goal is to obtain the maximum amount of information with the least amount of resources. Traditional experimental design methods often fall short in terms of adaptability and efficiency, particularly in complex systems with numerous variables. This has led to the increasing interest in Bayesian Optimal Experimental Design, a framework that leverages Bayesian statistics to optimize the design process.

3.1. Bayesian Optimal Experimental Design

BOED aims to maximize the EIG from experiments by selecting designs that are expected to be most informative about the parameters of interest. This is achieved by integrating Bayesian methods, which use prior distributions to incorporate existing knowledge and update this knowledge as new data become available [10]. The goal is to reduce uncertainty about the system being studied in the most efficient way possible.
BOED uses the Bayesian statistical framework to solve the optimization problem of experimental design. This problem involves selecting an experiment design d that maximizes the understanding of an uncertain parameter θ . In the fashion of Bayesian statistics, the initial beliefs about θ are expressed as a design-independent prior P ( θ ) . A simulator P ( y | θ , d ) of possible experiment outcomes y, given θ and d, can be constructed using the underlying model. The information gain (IG) of a hypothetical experiment can be defined by using the reduction in Shannon entropy H [11] from prior to posterior P ( θ | y , d ) :
$$IG_\theta(d, y) \triangleq H(P(\theta)) - H(P(\theta \mid y, d)) = \mathbb{E}_{\theta \sim P(\cdot \mid y, d)}\{\log P(\theta \mid y, d)\} - \mathbb{E}_{\theta \sim P}\{\log P(\theta)\},$$
where E denotes the expected value of a random variable [3]. Since the outcome y is unknown, the optimization needs to use a slightly different target, the marginal predictive distribution, known as EIG:
$$EIG_\theta(d) \triangleq \mathbb{E}_{y \sim P(\cdot \mid d)}\{IG_\theta\} = \mathbb{E}_{y \sim P(\cdot \mid \theta, d),\, \theta \sim P}\{\log P(\theta \mid y, d) - \log P(\theta)\}.$$
Replacing the posterior $P(\theta \mid y, d)$ via Bayes' theorem, $P(\theta \mid y, d) = P(\theta)\, P(y \mid \theta, d) / P(y \mid d)$, gives
$$EIG_\theta(d) = \mathbb{E}_{y \sim P(\cdot \mid \theta, d),\, \theta \sim P}\{\log P(y \mid \theta, d) - \log P(y \mid d)\} = \mathbb{E}_{y \sim P(\cdot \mid \theta, d),\, \theta \sim P}\left\{\log \frac{P(y \mid \theta, d)}{P(y \mid d)}\right\}.$$
Therefore, the goal of Bayesian optimal experimental design, to find the optimal design d * , can be written as
$$d^* = \arg\max_{d \in \mathcal{D}} EIG_\theta(d),$$
where D is the space of all possible designs [3]. Since this information objective is doubly intractable, an estimator is required to evaluate it. One possible lower bound approximation is the prior contrastive estimation (PCE), proposed by Foster et al. [2],
$$\mathcal{I}_{PCE}(d, L) \triangleq \mathbb{E}_{y \sim P(\cdot \mid \theta_0, d),\, \theta_{0:L} \sim P}\left\{\log \frac{P(y \mid \theta_0, d)}{\frac{1}{L+1} \sum_{\ell=0}^{L} P(y \mid \theta_\ell, d)}\right\}.$$
In this approximation, $\theta$ is sampled from the prior $P(\theta)$. The original sample used to generate $y \sim P(y \mid \theta, d)$, denoted as $\theta_0$, is included in the denominator to ensure that it overestimates $P(y \mid d)$, which results in a lower bound on $EIG_\theta(d)$. The samples $\theta_{1:L}$ are drawn independently from $P(\theta)$ and can be seen as contrasts to the original sample, which is why they are often referred to as contrastive samples; the bound tightens as $L \to \infty$. The design can then be optimized by maximizing the PCE bound through stochastic gradient ascent.
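As a concrete illustration (not code from the paper), the PCE bound can be estimated by plain Monte Carlo for a toy linear-Gaussian model in which $\theta \sim \mathcal{N}(0,1)$ and $y = \theta d + \epsilon$ with unit noise; all names and constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def logsumexp(a, axis):
    # numerically stable log(sum(exp(a))) along an axis
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_lik(y, theta, d, sigma=1.0):
    # log N(y; theta * d, sigma^2) -- toy stand-in for P(y | theta, d)
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - theta * d) / sigma) ** 2

def pce_bound(d, L=128, N=4000):
    """Monte Carlo estimate of the PCE lower bound I_PCE(d, L)."""
    theta0 = rng.standard_normal(N)              # theta_0 ~ P(theta)
    y = theta0 * d + rng.standard_normal(N)      # y ~ P(y | theta_0, d)
    contrastive = rng.standard_normal((N, L))    # theta_{1:L} ~ P(theta)
    thetas = np.concatenate([theta0[:, None], contrastive], axis=1)  # theta_{0:L}
    ll = log_lik(y[:, None], thetas, d)          # log P(y | theta_l, d), shape (N, L+1)
    log_denom = logsumexp(ll, axis=1) - np.log(L + 1)
    return float(np.mean(ll[:, 0] - log_denom))
```

In this toy model the true EIG is $\frac{1}{2}\log(1 + d^2)$, so larger $|d|$ is more informative, and the estimate approaches that value from below as $L$ grows.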

3.2. Sequential BOED

Although the BOED framework is effective for single experiments, in reality, one experiment is usually insufficient. Therefore, a sequence of experiments is performed, with the possibility of designing later experiments based on the results of predecessors, making them adaptive. In order to address these circumstances, some additional considerations must be taken into account, laying the foundation for sequential BOED.
The naive approach, called variational PCE (VPCE), would be to iteratively optimize the PCE bound before an experiment and update the posterior after execution using the results. This results in short-sighted outcomes that fail to consider future experiments and solely maximize the immediate EIG. Another disadvantage of this method is that it requires heavy computations between every step to optimize the design with PCE and update the posterior. This means that it cannot be used in applications where the next experiment should be designed in real-time.
In order to overcome both shortcomings simultaneously, a design policy is optimized rather than a design directly. This policy maps a history of experiments h at time step t to the next design d:
$$\pi(h_{t-1}) = d_t.$$
Following this definition, the expected information gain for a sequence of experiments with a fixed, given time horizon T becomes
$$EIG_T(\pi) \triangleq \mathbb{E}_{h_T \sim P(\cdot \mid \theta, \pi),\, \theta \sim P}\left\{\log \frac{P(h_T \mid \theta, \pi)}{P(h_T \mid \pi)}\right\}, \quad P(h_T \mid \theta, \pi) = \prod_{t=1}^{T} P(y_t \mid \theta, d_t).$$
As this objective is again doubly intractable, approximations are necessary for evaluation. In the fashion of Equation (8), two bounds can be obtained by sampling $\theta_0, h_T \sim P(\theta, h_T \mid \pi)$ and introducing $L$ independent contrastive samples $\theta_{1:L} \sim P(\theta)$. The first bound, referred to as sequential PCE (SPCE), includes $\theta_0$ in its denominator and is a lower bound, since it cannot exceed $\log(L+1)$. The second bound, called sequential nested Monte Carlo (SNMC), excludes $\theta_0$ and is potentially unbounded, which leads to a numerically less stable upper bound [4].
$$\mathcal{I}_{SPCE}(\pi, L, T) \triangleq \mathbb{E}_{h_T \sim P(\cdot \mid \theta_0, \pi),\, \theta_{0:L} \sim P}\{g(\theta, h_T)\}, \quad \text{with}$$
$$g(\theta, h_T) = \log \frac{P(h_T \mid \theta_0, \pi)}{\frac{1}{L+1} \sum_{\ell=0}^{L} P(h_T \mid \theta_\ell, \pi)},$$
$$\mathcal{I}_{SNMC}(\pi, L, T) \triangleq \mathbb{E}_{h_T \sim P(\cdot \mid \theta_0, \pi),\, \theta_{0:L} \sim P}\left\{\log \frac{P(h_T \mid \theta_0, \pi)}{\frac{1}{L} \sum_{\ell=1}^{L} P(h_T \mid \theta_\ell, \pi)}\right\}.$$
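Continuing the toy linear-Gaussian example from above (illustrative names and constants, not the paper's code), both sequence-level bounds can be estimated by accumulating per-step log-likelihoods, since the history likelihood factorizes over experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

def sequence_bounds(designs, L=128, N=4000, sigma=1.0):
    """SPCE (lower) and SNMC (upper) estimates for a fixed design sequence."""
    theta0 = rng.standard_normal(N)                                   # theta_0 ~ P(theta)
    thetas = np.concatenate([theta0[:, None],
                             rng.standard_normal((N, L))], axis=1)    # theta_{0:L}
    log_h = np.zeros((N, L + 1))          # accumulates log P(h_t | theta_l, pi)
    for d in designs:                     # P(h_T | theta, pi) = prod_t P(y_t | theta, d_t)
        y = theta0 * d + sigma * rng.standard_normal(N)
        log_h += (-0.5 * np.log(2 * np.pi * sigma**2)
                  - 0.5 * ((y[:, None] - thetas * d) / sigma) ** 2)
    # stable log-sum-exp over all L+1 samples (SPCE) and the L contrastive ones (SNMC)
    m = log_h.max(axis=1, keepdims=True)
    lse_all = (m + np.log(np.exp(log_h - m).sum(axis=1, keepdims=True))).squeeze(1)
    m1 = log_h[:, 1:].max(axis=1, keepdims=True)
    lse_con = (m1 + np.log(np.exp(log_h[:, 1:] - m1).sum(axis=1, keepdims=True))).squeeze(1)
    spce = np.mean(log_h[:, 0] - (lse_all - np.log(L + 1)))   # denominator includes theta_0
    snmc = np.mean(log_h[:, 0] - (lse_con - np.log(L)))       # denominator excludes theta_0
    return float(spce), float(snmc)
```

The SPCE estimate can never exceed $\log(L+1)$, while the SNMC estimate sandwiches the true sequential EIG from above.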

3.3. Reinforcement Learning BOED

The integration of machine learning with BOED has been a recent advancement that enhances the adaptability and efficiency of experimental design. ML-based approaches to BOED facilitate the dynamic adjustment of the design process based on the outcomes of previous experiments, enabling the selection of upcoming designs in near real-time through the utilization of pre-trained policies. This feature has the potential to be crucial in a multitude of practical applications. With regard to the testing of rocket propulsion systems, it is conceivable that in a scenario in which a series of operating points is to be approached in an experiment, the sequence is not fixed in advance. Instead, the next optimal operating point is selected during operation based on previous results.
Among these ML techniques, reinforcement learning has shown great promise. RL, a subset of ML, focuses on training agents to make a sequence of decisions by rewarding them for favorable outcomes, thus progressively improving their performance. A general formulation of this problem provides the framework of the Markov Decision Process [12].
An MDP comprises the tuple $(\mathcal{S}, \mathcal{A}, R, T, \rho_0)$. The state space of the problem is described by $\mathcal{S}$, which can be either discrete or continuous, just like the action space $\mathcal{A}$. The reward function $R$ assigns a numerical reward to each transition from the last to the current state, defining the goal for the agent by determining the desirability of the transition. $T$ is the state transition function that assigns a probability to each transition between two states, which is triggered by an action $a$. Finally, the starting state distribution is given by $\rho_0$. In classical MDPs, the Markov assumption underlies every state transition. This assumption states that the probability of reaching a subsequent state $s'$ from a state $s$ depends only on $s$ and the taken action $a$, and not on its predecessor states [12].
A policy π is a function that outputs the probability of an action a for all possible states of the MDP at every time step t. Therefore,
$$\pi(a \mid s) = P(a_t = a \mid s_t = s)$$
indicates the probability P of selecting a certain action a in state s at time step t. For a deterministic policy, the probability of being chosen is zero for all actions except one. An MDP is considered solved when an optimal policy π * is known that assigns an action to each state, optimizing the cumulative reward over time. The goal of reinforcement learning is to find such an optimal policy [12]. In the context of experimental design, the agent is tasked with selecting the experimental conditions that are expected to yield the most informative data.
The cumulative discounted reward—also known as the return—for an infinite sequence of states s t and actions a t is given by
$$R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_{t+1},$$
where $\tau = (s_0, a_0, s_1, a_1, \ldots)$ denotes a trajectory, and $\gamma \in [0, 1)$ is a discount factor, which guarantees that the infinite sum converges. If the expected return $J$, when acting according to policy $\pi$, is represented by
$$J(\pi) = \mathbb{E}_{\tau \sim P(\cdot \mid \pi)}\{R(\tau)\},$$
then the goal of finding an optimal policy can also be expressed as
$$\pi^* = \arg\max_{\pi} J(\pi).$$
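For a finite reward sequence, the discounted return is a one-line computation (a generic RL utility, not code from the paper):

```python
def discounted_return(rewards, gamma=0.99):
    """R(tau) = sum_t gamma^t * r_{t+1} for a finite reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```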
The problem presented in sequential BOED shares many similarities with the RL problem. For instance, both require finding an optimal policy, balancing myopic and long-term solutions, and addressing the exploration vs. exploitation dilemma. Therefore, RL methods are well suited, as the associated algorithms are highly optimized for solving such tasks.
In order to apply such algorithms, it is first necessary to formulate the sequential BOED objective as an MDP, thereby satisfying
$$J(\pi) = \mathcal{I}_{SPCE}(\pi, L, T).$$
One possible approach is to assign the posterior as the state, $s_t = P(\theta \mid h_t)$, use the design $d$ as the action, and use $g(\theta, h_T)$ from Equation (13) as the reward. This has the advantage that the reward can be computed without inferring a posterior at every time step. However, it has the significant shortcoming that computing the reward requires the true value of $\theta$, which is unknown at run time and can be neither observed nor influenced by any action. Consequently, the classic MDP is expanded to the formulation of a hidden parameter MDP (HiP-MDP) [13].
To obtain an HiP-MDP, the tuple of a classic MDP $(\mathcal{S}, \mathcal{A}, R, T, \rho_0)$ is augmented by $\Theta$ and $P_\Theta$ in order to obtain $(\mathcal{S}, \mathcal{A}, \Theta, R, T, \rho_0, P_\Theta)$. $\Theta$ represents a parameter space, and $P_\Theta$ is a prior over the parameter. Furthermore, the reward $R$ and the transition function $T$ are now dependent on the parameter $\theta$, which is sampled once at the start of the episode from $P_\Theta$ and remains fixed, meaning that $r_t = R(s_{t-1}, a_{t-1}, s_t, \theta)$ and $s_t \sim T(s_t \mid s_{t-1}, a_{t-1}, \theta)$, respectively. The return of an HiP-MDP is
$$J(\pi) = \mathbb{E}_{\tau, \theta \sim P(\cdot \mid \pi)}\left\{\sum_{t=1}^{T} \gamma^{t-1} R(s_{t-1}, a_{t-1}, s_t, \theta)\right\}.$$
The resulting HiP-MDP is still deficient in two aspects. The states include the entire history in order to compute $g(\theta, h_T)$, which violates the Markov property. Furthermore, with $g(\theta, h_T)$ as the reward function, the reward can only be assigned to the terminal state, since otherwise the contribution of the $t$-th experiment would be counted $T - t$ times; this leads to a sparse reward.
Both issues are addressed by the reward function proposed by Blau et al. [6], which estimates the marginal contribution of the $t$-th experiment to the cumulative EIG:
$$R(s_{t-1}, a_{t-1}, s_t, \theta) = \log P(y_t \mid \theta_0, d_t) - \log(C_t \cdot \mathbf{1}) + \log(C_{t-1} \cdot \mathbf{1}),$$
where
$$C_t = \left[\prod_{k=1}^{t} P(y_k \mid \theta_\ell, d_k)\right]_{\ell=0}^{L}$$
is a vector representation of the history likelihoods. Each element $c_{t,\ell}$ is the likelihood of observing the history $h_t$ if $\theta_\ell$ were the true model parameters, and $\mathbf{1}$ is a vector of ones of the appropriate length [6].
Learning a comprehensive representation of the history from data is a common approach of partially observable MDPs [14]. To do so, a permutation-invariant representation proposed by Foster et al. [4] is used:
$$B_{\psi,t} = \sum_{k=1}^{t} ENC_\psi(d_k, y_k),$$
where $ENC_\psi$ is an encoder network with parameters $\psi$.
The state is formed as a concatenation of two distinct history representations, one used by the policy $\pi$ and the other by the reward $R$. The transition dynamics are as follows [6]:
$$y_t \sim P(y_t \mid d_t, \theta_0), \quad B_{\psi,t} = B_{\psi,t-1} + ENC_\psi(d_t, y_t), \quad C_t = C_{t-1} \odot \left[P(y_t \mid \theta_\ell, d_t)\right]_{\ell=0}^{L},$$
where ⊙ denotes the Hadamard product. The final HiP-MDP is as follows [6]:
  • $\mathcal{S}$: The current experiment outcome $y_t$ and the history summaries and likelihoods used by $\pi$ and $R$, respectively: $s_t = (B_{\psi,t}, C_t, y_t)\ \forall t \in [0, T]$.
  • $\mathcal{A}$: The design space, with $a_{t-1} = d_t\ \forall t \in [1, T]$.
  • Θ : The space of model parameters.
  • R : The reward function of Equation (21).
  • T : The transition dynamic of Equation (24).
  • $\rho_0$: The initial history is always the empty set; thus, $\rho_0 = (B_{\psi,0}, C_0, y_0) = (\mathbf{0}, \mathbf{1}, \emptyset)$.
  • P Θ : The prior P ( θ ) .
  • $\pi$: At time $t$, the policy $\pi(a_t \mid s_t)$ maps the history summary $B_{\psi,t}$ to a distribution over designs $d_{t+1}$. Therefore, the policy is stochastic.
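The state updates and the reward of Equation (21) can be sketched for a toy scalar model. The encoder below is an untrained random projection standing in for a trained $ENC_\psi$, and the design sequence and noise level are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

ENC_DIM, L, SIGMA = 8, 4, 1.0
W = rng.standard_normal((2, ENC_DIM))        # stand-in for a trained encoder ENC_psi

def enc(d, y):
    return np.tanh(np.array([d, y]) @ W)     # ENC_psi(d_k, y_k)

theta0 = rng.standard_normal()                                # true hidden parameter
thetas = np.concatenate([[theta0], rng.standard_normal(L)])   # theta_{0:L}

B, C = np.zeros(ENC_DIM), np.ones(L + 1)     # rho_0 = (B_0, C_0, y_0) = (0, 1, empty)
for d in [0.5, 1.0, 2.0]:                    # a fixed toy design sequence
    y = theta0 * d + SIGMA * rng.standard_normal()            # y_t ~ P(y_t | d_t, theta_0)
    lik = np.exp(-0.5 * ((y - thetas * d) / SIGMA) ** 2) / np.sqrt(2 * np.pi * SIGMA**2)
    C_new = C * lik                          # Hadamard update of the history likelihoods
    # reward of Eq. (21): marginal contribution of this experiment to the EIG
    r = np.log(lik[0]) - np.log(C_new.sum()) + np.log(C.sum())
    B, C = B + enc(d, y), C_new              # permutation-invariant summary update
```

Here `C.sum()` plays the role of the dot product $C_t \cdot \mathbf{1}$.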

4. Methodology

The following section presents a detailed account of the methodologies and solutions devised to address the challenges encountered when implementing the BOED framework in tangible real-world applications and experiments conducted at the German Aerospace Center (DLR) facility in Lampoldshausen.

Surrogate Model-Based BOED

In many real-world applications, the relationship between the unknown parameter $\theta$, the design $d$, and the observation $y$ is complex, making the likelihood difficult to compute. Access to the likelihood nevertheless requires a model from which samples can be drawn easily. Where a simulation is too slow to fulfill this requirement, it has been shown that an SM can be the solution if it reflects the relationship with sufficient accuracy [8]. Several approaches for obtaining a surrogate model are possible, such as polynomial chaos expansion or artificial neural networks. In this study, fully connected feedforward neural networks were selected, as they offer advantages in terms of computational efficiency and differentiability.
Neural networks are capable of approximating any continuous function to arbitrary accuracy and offer advantages during training, as they only require an appropriate dataset [15]. Inspired by biology, artificial neural networks consist of multiple layers of interconnected units called neurons. During the training process, the neuron parameters are optimized to approximate the mapping from inputs to outputs [16]. This is usually accomplished by minimizing a cost function $C$ using gradient-based methods and error backpropagation [17]. In principle, any type of simulation can be used to generate a dataset for the training process. This dataset maps the designs $d$ and uncertain parameters $\theta$ to outputs based on the system dynamics. Furthermore, it is feasible to employ real experimental data to rectify model inaccuracies and to fine-tune the dataset. Once the network has been trained, it can make precise predictions of the outcomes. In the case of a deterministic simulation, a known noise characteristic is added to the output in order to obtain an observation, as shown in Figure 1. This enables the analysis of analogous experimental configurations with different noise characteristics without retraining the NN, which would be necessary if the network were to make direct likelihood predictions.
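The observation pipeline described above can be sketched as follows. The tiny untrained network is a placeholder for the trained surrogate $G(\theta, d)$, and all sizes and noise values are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Untrained stand-in for a trained fully connected surrogate G(theta, d):
# 3 model parameters + 2 design variables in, 2 observables out.
W1, b1 = 0.1 * rng.standard_normal((5, 32)), np.zeros(32)
W2, b2 = 0.1 * rng.standard_normal((32, 2)), np.zeros(2)

def surrogate(theta, d):
    h = np.tanh(np.concatenate([theta, d]) @ W1 + b1)
    return h @ W2 + b2                      # deterministic prediction G(theta, d)

SIGMA = np.array([0.21, 0.5])               # known per-channel observation noise (illustrative)

def observe(theta, d):
    # y = G(theta, d) + eps, eps ~ N(0, diag(SIGMA^2))
    return surrogate(theta, d) + SIGMA * rng.standard_normal(2)

def log_likelihood(y, theta, d):
    # closed-form Gaussian likelihood -- the property BOED needs from the model
    mu = surrogate(theta, d)
    return float(np.sum(-0.5 * np.log(2 * np.pi * SIGMA**2) - 0.5 * ((y - mu) / SIGMA) ** 2))
```

Because the noise is added outside the network, changing the noise characteristic only changes `SIGMA`, not the trained weights.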

5. Test Case

This section presents the details of and the theoretical background behind our test case, which is an experiment that aims to determine the performance of a turbopump. As the majority of algorithms and methodologies employed in this study utilize a set of hyperparameters that permit the applicant to conduct fine-tuning, we also provide an overview of the hyperparameters used to produce the presented results, with the aim of enabling the reproducibility of the results.

5.1. Turbopump Performance

The main model evaluated in this study is inspired by the LUMEN oxidizer turbopump (OTP) [18]. For this purpose, a simulation was used to generate data for training the neural network that was used as a surrogate model. The simulation was implemented with the commercial software EcosimPro from Empresarios Agrupados, which is the official tool of the European Space Agency (ESA) for such applications. The majority of components utilized in this simulation model were taken from the ESPSS libraries, which are also supported by ESA [19,20]. The schematic for this turbopump simulation consists of two parts, the turbine and the pump, both independently connected to their respective supplies by pipes, and two valves for controlling the turbine inlet pressure (TOV) and the pressure loss after the pump (OCV), as shown in Figure 2.
The pump and turbine characteristics can be defined by means of user-defined tables for specific coefficients. In the EcosimPro simulation environment, this is typically achieved through the use of lookup tables [20,21]. The turbine characteristics describe the torque produced by the turbine as a function of the rotational speed and the pressure loss. The pump characteristics, in contrast, describe the performance of the pump. The pump performance is a crucial parameter for the overall behavior of the system, which is related to temperature and pressure; additionally, it influences the torque $T$ consumed by the pump. The resulting curves from the user-defined tables for the pump are referred to as pump curves, with each curve being associated with a pump coefficient [22,23]. One such coefficient is the reduced torque $C^+$, which is the object of the present analysis. In equilibrium, the torque produced by the turbine and the torque consumed by the pump must be equal. The goal of a turbopump performance experiment is to experimentally determine the pump performance, for example $C^+$ vs. $\Psi^+$, without directly measuring the torque.
For a liquid pump, the mass flow coefficient is
$$\varphi^+ = \frac{\dot{m}_{pump}}{\rho_{in}\, \omega} \quad [\mathrm{m}^3],$$
where $\dot{m}_{pump}$ is the mass flow, $\rho_{in}$ is the inlet density, and $\omega$ is the rotational speed. The reduced torque $C^+$ is a function of $\varphi^+$ and is defined as
$$C^+ = f(\varphi^+) = \frac{T}{\rho_{in}\, \omega^2} \quad [\mathrm{m}^5],$$
where $T$ is the consumed torque. As this value differs for each pump (even for geometrically similar pumps), $C^+$ is calculated with respect to $\varphi^+$ using a polynomial approximation of the unique function $f$ [20].
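With the definitions above, both coefficients are straightforward to evaluate. The numbers below are purely illustrative (a roughly LOX-like density and an arbitrary operating point), not LUMEN data:

```python
def phi_plus(mdot, rho_in, omega):
    """Mass flow coefficient: phi+ = mdot / (rho_in * omega)  [m^3]."""
    return mdot / (rho_in * omega)

def c_plus(torque, rho_in, omega):
    """Reduced torque: C+ = T / (rho_in * omega^2)  [m^5]."""
    return torque / (rho_in * omega**2)

# illustrative operating point: 3 kg/s, 1140 kg/m^3, 2000 rad/s, 50 N*m
phi = phi_plus(mdot=3.0, rho_in=1140.0, omega=2000.0)   # ~1.32e-6 m^3
c = c_plus(torque=50.0, rho_in=1140.0, omega=2000.0)    # ~1.10e-8 m^5
```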

5.2. Turbopump Performance Experiment

As already stated, the aim of the turbopump performance experiment is to experimentally determine one of the characteristic performance curves of the pump component of a turbopump. More precisely, the intention is to estimate the coefficients of a polynomial interpolation of the $C^+$ vs. $\Psi^+$ curve. This curve is approximated by a second-degree polynomial: $y = b_0 x^2 + b_1 x + b_2$.
The general model for BOED consists of the scalar observable y, which depends on the uncertain parameter θ and the design variables denoted as d:
$$y(\theta, d) = G(\theta, d) + \epsilon,$$
with a Gaussian measurement error $\epsilon \sim \mathcal{N}(0, \sigma^2)$ added to the model output $G(\theta, d)$. As a result, the observable $y$ itself follows a normal distribution with mean $\mu = G(\theta, d)$ and standard deviation $\sigma$.
In the case of the turbopump performance experiment, the three unknown model parameters $\theta_1$, $\theta_2$, and $\theta_3$ correspond to the three polynomial coefficients $b_0$, $b_1$, and $b_2$. The design variables $d$ are the pump inlet mass flow $\dot{m}_{pump}$ and the position of the turbine oxidizer control valve (TOV). While the mass flow $\dot{m}_{pump}$ is given in kg/s, the valve position is a ratio from 0 (completely closed) to 1 (completely open). A greater degree of opening of the turbine inlet valve (value closer to 1) results in a corresponding rise in inlet pressure $p_{GN2}$, which in turn leads to an increased power output, accompanied by an acceleration of the rotational speed $n_{pump}$. For the experimental setup considered in this study, the design space is bounded to a mass flow $\dot{m}_{pump}$ from 0.5 kg/s to 6.0 kg/s and a valve ratio from 0.1 to 1.0, which is close to the typical operating range of the LUMEN turbopump [18]. The model output results from a simulation of the physical dynamics of the system. Ultimately, the observable $y$ is a tuple consisting of the inlet fluid temperature $T_{LOX}$ at the pump in Kelvin and the outlet pressure $p_{pump}$ in megapascals. Both values are subject to observation noise according to Equation (27).
As a starting point for the prior belief, one can consider the theoretical specifications of the pump, in this case the LUMEN turbopump. The coefficients that specify the pump performance are calculated by a tool used at DLR to analyze pumps, with the following values: $b_0 = 2.216 \times 10$, $b_1 = 9.697 \times 10^{4}$, and $b_2 = 3.428 \times 10^{9}$. The values are adjusted to achieve a smooth distribution with some uncertainty, yielding $b_0 \sim \mathcal{N}(2.5, 0.2) \times 10$, $b_1 \sim \mathcal{N}(9.5, 0.2) \times 10^{4}$, and $b_2 \sim \mathcal{N}(3.4, 0.2) \times 10^{9}$. Based on real test campaigns with the turbopump, the number of experiments per sequence was set to 15. The observation noise for the pressure and temperature was 0.21 MPa and 0.5 K, respectively, both significantly higher than the prediction error of the SM. Given the observation intervals of 0.65 MPa to 9.50 MPa for the pressure and 90 K to 155 K for the temperature, the observation noise was below 3% of the maximum observation value. While these values may differ from the actual specifications of test runs, they are sufficiently close for the purpose of demonstrating the BOED approaches. The observation range, for instance, is a direct consequence of the simulation outputs for the given input values. Design combinations that violate certain restrictions, for example those imposed by the physical system, must be excluded afterwards, as they are difficult to identify in advance, given that they are typically unknown. Moreover, it is advisable to investigate a wide range of designs in order to identify trends and promising regions within the design space, thus facilitating the identification of optimal designs.

5.3. EcosimPro

The EcosimPro framework was used to define an experiment that simulates the physical behavior of the turbopump. Each experiment was executed over a total time span of 10 s with integration steps of 1 s, with the goal of achieving a steady state. This is sufficiently accurate for the application purpose, particularly given that EcosimPro automatically decreases the step size if necessary. The error tolerance for the solvers is $2 \times 10^{-4}$ for REL_ERROR, ABS_ERROR, and TOLERANCE [24].

5.4. Surrogate Model

As previously stated, it is essential to have a model from which one can obtain samples with high efficiency. To this end, a fully connected feedforward neural network was trained over 20,000 epochs, with the hyperparameters specified in Table 1, serving as an SM for the turbopump. Despite the apparently high number of epochs, the validation demonstrated that the performance of the SM continued to improve during extended training sessions, with no evidence of overfitting. This may be attributed to the dense distribution of the data and the consequently minimal degree of generalization required. The training was executed on a set of 25,000 data points generated by the aforementioned EcosimPro simulation, with the training and validation sets split at a ratio of 0.85. The design space and parameter space are specified in Table 2. The output training data were normalized, given that the observations of pressure and temperature are of different magnitudes. In addition, the three coefficients $b_0$, $b_1$, and $b_2$ that form $\theta$ were linearly scaled down by dividing by their magnitudes. This resulted in a notable improvement in training performance and ensured numerical stability during training. The minimum and maximum values for the normalization were obtained directly from the training data, with a small tolerance incorporated to ensure that the SM can also handle slightly larger and smaller values. Subsequently, the outputs of the network were scaled back. For the turbopump SM, the validation yielded an absolute error of 0.0094 MPa for the pressure and 0.052 K for the temperature. The specifics of the neural network architecture are outlined in Table 3. The architecture was selected after evaluating multiple alternatives.
The objective was to identify an architecture that balances simplicity and concision with the capacity to accurately map the specified relationships, as evidenced by a validation error that was considerably smaller than the measurement error associated with the experimental setup.
It is important to note that the quality of the design optimization is directly dependent on that of the SM. The SM introduces another potential error source, which can be mitigated through the use of precise simulation data and optimized NN training.
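To make the setup concrete, the following sketch shows a surrogate with the layer dimensions from Table 3 together with the min-max normalization with tolerance described above. The class and function names are illustrative only and do not correspond to the actual implementation:

```python
import torch
import torch.nn as nn

class TurbopumpSurrogate(nn.Module):
    """Sketch of the fully connected surrogate from Table 3: inputs (d, theta)
    with 5 features, hidden layers of 256/128/64 neurons with ReLU activation,
    and a 2-dimensional output (normalized pressure and temperature)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(5, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 2),  # no activation on the output layer
        )

    def forward(self, x):
        return self.net(x)

def minmax_scale(x, lo, hi, tol=0.05):
    # Min-max normalization with a small tolerance so the surrogate can
    # handle inputs slightly outside the training range, as described above.
    lo, hi = lo - tol, hi + tol
    return (x - lo) / (hi - lo)
```

After inference, the network outputs are scaled back to physical units by inverting the same transformation.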

5.5. Reinforcement Learning

The reinforcement learning methodology utilized the REDQ algorithm [25,26,27]. The agent was trained over 20,000 episodes, with the hyperparameters specified in Table 4 [28,29]. A plot of the training progress is shown in Figure 3. The networks that represent the two distinct histories, for the policy and the reward, consist of two fully connected layers with 128 neurons and ReLU activation. For the B_Φ network, the output layer has 64 neurons and no activation function, while the output layer of the policy network π_Φ depends on the action space of the designated experiment.
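The central ingredient of REDQ, the randomized ensemble target, can be sketched as follows. This is an illustrative sketch rather than the actual training code, and it omits the entropy term of the SAC-style actor; the function name and tensor shapes are assumptions:

```python
import random
import torch

def redq_target(critics, next_obs, next_action, reward, gamma=0.99, m=2):
    """Sketch of the REDQ target computation: draw a random subset of M
    critics from the ensemble and take the elementwise minimum of their
    Q-estimates (cf. Table 4: M (subset) = 2, gamma = 0.99). Each critic
    maps a concatenated (observation, action) tensor to Q-values."""
    subset = random.sample(critics, m)
    q_in = torch.cat([next_obs, next_action], dim=-1)
    qs = torch.stack([q(q_in) for q in subset])  # shape (m, batch, 1)
    min_q = qs.min(dim=0).values                 # pessimistic ensemble estimate
    return reward + gamma * min_q
```

Taking the minimum over a random subset controls the overestimation bias of the Q-targets, which is what allows REDQ to use high update-to-data ratios.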

6. Results

The results focus on the performance of applying BOED methods to the turbopump test case. All results were generated using the general random seed 3869 and the distinct random seed 23,457 for the random design generation [30,31,32]. A sequence of experiments was designed by each approach and evaluated to compare them. The different approaches were the following: a random policy, which serves as a naive baseline and chooses the next design randomly from the design space; the VPCE method [2], which selects the next design based on a PCE optimization between every experiment; and the RL approach, which determines the next experiment design by passing the current state to a pre-trained policy.
It is important to note that while the RL and the random approach operate in near real-time, the VPCE approach requires a significantly longer run time due to the need for a complex design optimization step between each experiment, which is computationally intensive. In contrast, the only time-consuming step of RL and random is the inference step, which is also included for VPCE. Furthermore, while the VPCE method is highly dependent on this inference step, since the posterior is used as the prior for the next design phase and therefore directly influences its performance, the other two methods work even without this step. They execute the inference step only for the purpose of evaluation. However, while the RL approach is fast at deployment time, it involves a time-consuming training period beforehand.
In order to compare the two different BOED approaches with the random baseline, it is necessary to define performance metrics, for which the EIG and SPCE are used. To calculate the EIG, a stochastic variational inference (SVI) step with the settings from Table 5 was executed after every experiment [33,34]. SVI is a scalable method for approximating posterior distributions by leveraging variational inference combined with stochastic gradient updates, allowing for efficient posterior calculation even with large datasets [35,36]. The purpose of this is to fit the prior with the data from the experiment and to obtain a posterior. This posterior is then set as prior for the subsequent experiment. The EIG results from the difference in entropy between the prior and the posterior. As this computation is dependent on the performance of the SVI, which in turn is dependent on a number of factors—including observation noise, the number of gradient steps, and the number of observations available—it is a less general metric.
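The entropy-difference definition of the information gain can be illustrated with a conjugate Gaussian toy model, where the posterior is available in closed form. This stands in for the SVI-fitted posterior and is not the turbopump model itself; all parameter values are illustrative:

```python
import math

def gaussian_entropy(sigma):
    # Differential entropy of a one-dimensional Gaussian.
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

def information_gain(prior_sigma, noise_sigma, n_obs):
    """Toy sketch: for a conjugate Gaussian model, the posterior after
    n_obs observations has variance 1 / (1/prior^2 + n/noise^2); the
    realized information gain is the entropy of the prior minus the
    entropy of the posterior."""
    post_var = 1.0 / (1.0 / prior_sigma**2 + n_obs / noise_sigma**2)
    return gaussian_entropy(prior_sigma) - gaussian_entropy(math.sqrt(post_var))
```

As expected, the gain is positive and grows with the number of observations, while each additional observation contributes less than the previous one.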
A more general metric is the SPCE, which is calculated by evaluating Equation (13) with a number of contrastive samples L. For the turbopump performance experiment, the number of samples was 10⁵, with 100 parallel rollouts executed to obtain the mean and the standard error. The results for a sequence of 15 experiments are shown in Figure 4 and Table 6.
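The structure of this contrastive bound can be sketched as follows, assuming the log-likelihoods of the observed experiment history have already been evaluated under the primary parameter sample and under L contrastive samples from the prior; the function name and tensor shapes are illustrative:

```python
import math
import torch

def spce(log_lik_primary, log_lik_contrastive):
    """Sketch of the sequential prior contrastive estimation lower bound
    (cf. Equation (13)): the log-likelihood of the history under the
    primary parameter sample is contrasted against L additional prior
    samples. log_lik_primary: (batch,); log_lik_contrastive: (L, batch)."""
    L = log_lik_contrastive.shape[0]
    all_ll = torch.cat([log_lik_primary.unsqueeze(0), log_lik_contrastive], dim=0)
    log_denom = torch.logsumexp(all_ll, dim=0) - math.log(L + 1)
    return (log_lik_primary - log_denom).mean()  # bounded above by log(L + 1)
```

Because the bound saturates at log(L + 1), a large number of contrastive samples, here 10⁵, is needed to resolve differences between well-performing design policies.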

7. Discussion

The RL approach had the most favorable results. It outperformed the random baseline by a multiple of the standard error when considering both the EIG and the SPCE, with a 9.07% improvement in the EIG and an 18.04% improvement in the SPCE. In contrast, the situation is less clear with the VPCE method. A comparison to the random baseline showed a 2.44% improvement in the EIG and an 8.19% improvement in the SPCE. With regard to the SPCE metric, this approach performed well at the beginning and was close to the RL method. However, as the number of experiments increased, it drew closer and closer to the random baseline. This is due to the fact that the distinct components of θ, namely, the coefficients of the polynomial, vary in their difficulty of determination. While the determination of the quadratic coefficient, b₀, was relatively challenging, the difficulty decreases toward the constant offset, b₂, which is the easiest to determine. After initial success in determining the simple coefficients, it was challenging to surpass the random baseline without careful planning at a later stage. Consequently, the RL approach was able to achieve EIG values that are 6.47% higher and SPCE values that are 9.11% higher than those of the VPCE method.
In addition to the considerable complexity of determining the coefficients from experimental data, the results indicate that the choice of experimental design is only loosely related to the EIG in this setting. The performance outcomes of all three approaches are relatively close to each other [2,6]. For the EIG metric, which utilizes SVI, the performance of the VPCE method is nearly identical to a random choice of experimental design. This indicates that specific conditions inherent to experimental setups may influence the overall utility of BOED in the design process. However, these characteristics are challenging to ascertain in advance without a comprehensive analysis.
An alternative way of visualizing the results is to present the evolution of the posterior for a single sequence of experiments, as illustrated in Figure 5. The illustration is limited to one component of the three-part parameter space, in this case, θ₃. Furthermore, the figure comprises three subplots corresponding to the three different experimental design approaches: random, RL, and VPCE. It is important to note that the plots shown are for illustrative purposes only, as they represent a single rollout out of 100. Additionally, the performance of a single rollout can vary and is not statistically stable. Therefore, the results presented in Table 6 were subjected to statistical analysis, with the mean and standard error of all rollouts being evaluated. Nevertheless, the plots visually demonstrate that the VPCE and RL-based BOED approaches performed better than the random baseline, with RL performing even better than VPCE.
In addition to the exemplary nature of these plots, it is important to note that the posteriors shown are obtained using the SVI method and are therefore dependent on the performance of this method. For instance, the utilization of the posterior as a prior for the subsequent experiment may result in significant discrepancies from the true value, which are introduced at the beginning and are challenging to rectify in subsequent stages. Such discrepancies are particularly likely to arise when there is a considerable degree of noise present, coupled with a limited number of observations. It is therefore advisable, if feasible, to include the SVI step only after some experiments have already been carried out. With regard to the RL method, it is recommended that the entire sequence of experiments is completed before the SVI step is performed in order to obtain the best results. For the VPCE method, it was found that the deviations could be reduced if the prior was updated only after the first few experiments were conducted. Accordingly, the standard deviation of the displayed probability density functions is a more meaningful indicator than the proximity of the mean to the true value. This metric is more resilient to deviations and indicates the degree of certainty that can be placed on the estimate.
Another interesting insight is given by the EIG distribution over the design space. Figure 6 shows the EIG estimated by PCE for the initial prior. At this stage, the majority of the design space is promising, even if valve positions below 0.2, particularly when paired with high mass flows exceeding 4 kg/s, are not favorable. This makes it highly unlikely that even a random approach will result in a suboptimal design. However, the situation changes as the number of experiments increases and the prior becomes more certain. An illustrative example is presented in Figure 7. It can be observed that, neglecting the order of magnitude of the coefficients, which serves merely to scale the performance curve so that it fits the physical unit, the estimates for b₁ and b₂ become quite precise, while the prior for b₀ remains relatively uncertain. In this instance, the absolute EIG values are considerably smaller than in the previous case, as it is more challenging to increase the certainty further. Furthermore, the promising region is smaller and is primarily situated in the upper right corner of the design space. This corresponds to a valve that is more than 60% open combined with high mass flows, which leads to the conclusion that even in the initial distribution, individual regions can be assigned to the different coefficients. For instance, while the upper right corner is favorable for determining b₂, the dark red region towards the left side is likely optimal for increasing the certainty about b₀.
In light of this insight and the distribution of the EIG over the design space, an analysis of the design choices is possible. Figure 8 presents a comprehensive overview of the design choices made by the VPCE and RL approaches across a series of 15 experiments for all 100 rollouts. Particularly dense regions can be identified for both approaches. The VPCE method exhibited a greater degree of dispersion, which suggests that identifying the optimal design may be challenging. In particular, the boundaries of the design space appear to have been selected with high frequency, and there is even a highly dense region in the lower right-hand corner. This is unexpected, as the previous analysis suggested that this region is unfavorable at all stages. The other dense regions illustrate well how the VPCE approach concentrates on the coefficients that are relatively straightforward to determine. In contrast, the RL approach was able to identify even the regions that are particularly promising as the number of experiments increases.

8. Conclusions

In this study, the performance of Bayesian optimal experimental design was evaluated using the example of a turbopump performance experiment. The findings demonstrate that the RL approach is a convincing methodology in many respects. First and foremost, it demonstrates superior performance: in the experiment presented, this approach achieved the best results. In the majority of cases, the RL approach exhibited superior performance from the beginning, and in some cases achieved results that exceeded those of the other approaches by many times the standard error. This was achieved mainly by unerringly identifying the optimal designs and by acting with foresight to optimize the EIG across the entire sequence of experiments. However, these advantages are also closely linked to the advantages of RL algorithms in general. The fact that the RL training was carried out for all three experimental setups with the same hyperparameters and achieved good results clearly illustrates the stability and reliability of these algorithms. In contrast, the VPCE method frequently failed to identify the optimal design, despite individual and complex fine-tuning. This property, coupled with the ability to identify optimal designs in near real time after a single, time-consuming training session, is the key advantage of the RL method.
With regard to the turbopump performance experiment, it is important to emphasize the inherent weaknesses of the BOED framework, regardless of the method used. It was found that specific conditions must be met in order for the BOED approach to be successful. It is an obvious requirement that different designs must influence the outcome of the experiment or that the unknown parameters must somehow be related to the observations. However, the strength of these characteristics also plays an important role. For instance, for certain components of θ, the turbopump performance experiment has demonstrated that the choice of an experimental design has a minimal impact on the EIG. In addition, the ratio of the number of designs and observations to the number of unknown parameters must be considered. In the majority of cases, this should be approximately balanced.
The developed approach can be easily applied to other experiments due to the flexibility of its interface. In all cases where a sufficiently large dataset is available, consisting of designs and parameter values θ as input and corresponding observations as output, optimal designs can be determined with minimal effort using the preferred method.
The BOED research area is a dynamic one, with further developments expected in the near future. In particular, policy-based approaches that do not utilize RL algorithms, such as DAD, are being continuously developed and are exhibiting considerable potential. It remains to be seen which of the two will ultimately prevail. In addition, considerable effort is being put into the generalizability of the BOED framework, with the aim of relaxing the conditions to the greatest extent possible and enabling its application to a wide range of models.
As demonstrated by the work of Lim et al. [7], it is possible to train neural networks that directly approximate the likelihood. It would therefore be possible in the future to bypass the need for a surrogate model and, instead of training a NN that maps inputs to observations, to train a NN that maps inputs directly to likelihoods. This approach would not only speed up the calculations, but it could also improve the accuracy of BOED methods. In this manner, observations could be entirely eliminated, with the exception of the evaluation step. For this step, real experimental data or slow but accurate simulations could be used, since evaluation requires a much smaller number of samples than design optimization.
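Such a likelihood-approximating network could, for instance, output the parameters of a Gaussian observation model. The following is a hypothetical sketch of that idea, not an implementation from Lim et al.; the class name, dimensions, and Gaussian form are all assumptions:

```python
import torch
import torch.nn as nn

class LikelihoodNet(nn.Module):
    """Hypothetical sketch: instead of a surrogate mapping (d, theta) to an
    observation, the network outputs the mean and log-standard-deviation of
    a Gaussian likelihood, so that log p(y | theta, d) can be evaluated
    directly without sampling observations."""
    def __init__(self, in_dim=5, obs_dim=2, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, obs_dim)
        self.log_std = nn.Linear(hidden, obs_dim)

    def log_prob(self, x, y):
        # x: concatenated (d, theta); y: candidate observation.
        h = self.body(x)
        dist = torch.distributions.Normal(self.mean(h), self.log_std(h).exp())
        return dist.log_prob(y).sum(-1)  # joint log-likelihood over outputs
```

With this interface, EIG estimators such as PCE could consume the network's log-likelihoods directly, eliminating the observation-sampling step except during evaluation.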
This study not only identifies the current capabilities in experimental design, enhanced by advanced machine learning techniques, but it also provides a foundation for future innovations that promise to further refine the precision and efficiency of scientific research.

Author Contributions

Conceptualization, T.G. and G.W.-W.; methodology, T.G. and G.W.-W.; software, T.G.; investigation, T.G.; resources, J.D.; writing—original draft preparation, T.G. and K.D.; writing—review and editing, T.G., K.D. and G.W.-W.; visualization, T.G.; supervision, G.W.-W. and J.D.; project administration, G.W.-W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ESA      European Space Agency
DLR      German Aerospace Center
ML       Machine learning
RL       Reinforcement learning
MDP      Markov decision process
HiP-MDP  Hidden parameter Markov decision process
DAD      Deep adaptive design
iDAD     Implicit deep adaptive design
SM       Surrogate model
NN       Neural network
BOED     Bayesian optimal experimental design
IG       Information gain
EIG      Expected information gain
PCE      Prior contrastive estimation
SPCE     Sequential prior contrastive estimation
VPCE     Variational prior contrastive estimation
SNMC     Sequential nested Monte Carlo
SVI      Stochastic variational inference
OTP      Oxidizer turbopump
GN2      Gaseous nitrogen
LOX      Liquid oxygen
TOV      Turbine oxidizer control valve
OCV      Oxidizer control valve

References

  1. Lindley, D.V. On a Measure of the Information Provided by an Experiment. Ann. Math. Stat. 1956, 27, 986–1005. [Google Scholar] [CrossRef]
  2. Foster, A.; Jankowiak, M.; O’Meara, M.; Teh, Y.W.; Rainforth, T. A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments. arXiv 2020, arXiv:1911.00294. [Google Scholar]
  3. Rainforth, T.; Foster, A.; Ivanova, D.R.; Smith, F.B. Modern Bayesian Experimental Design. arXiv 2023, arXiv:2302.14545. [Google Scholar] [CrossRef]
  4. Foster, A.; Ivanova, D.R.; Malik, I.; Rainforth, T. Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design. arXiv 2021, arXiv:2103.02438. [Google Scholar]
  5. Ivanova, D.R.; Foster, A.; Kleinegesse, S.; Gutmann, M.U.; Rainforth, T. Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods. arXiv 2021, arXiv:2111.02329. [Google Scholar]
  6. Blau, T.; Bonilla, E.V.; Chades, I.; Dezfouli, A. Optimizing Sequential Experimental Design with Deep Reinforcement Learning. arXiv 2022, arXiv:2202.00821. [Google Scholar]
  7. Lim, V.; Novoseller, E.; Ichnowski, J.; Huang, H.; Goldberg, K. Policy-Based Bayesian Experimental Design for Non-Differentiable Implicit Models. arXiv 2022, arXiv:2203.04272. [Google Scholar]
  8. Huan, X.; Marzouk, Y.M. Simulation-based optimal Bayesian experimental design for nonlinear systems. J. Comput. Phys. 2013, 232, 288–317. [Google Scholar] [CrossRef]
  9. Walker, E.A.; Ravisankar, K. Bayesian Design of Experiments: Implementation, Validation and Application to Chemical Kinetics. arXiv 2019, arXiv:1909.03861. [Google Scholar]
  10. Lee, P.M. Bayesian Statistics: An Introduction, 4th ed.; Wiley Publishing: Hoboken, NJ, USA, 2012. [Google Scholar]
  11. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  12. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  13. Doshi-Velez, F.; Konidaris, G. Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations. arXiv 2013, arXiv:1308.3513. [Google Scholar]
  14. Gregor, K.; Rezende, D.J.; Besse, F.; Wu, Y.; Merzic, H.; van den Oord, A. Shaping Belief States with Generative Environment Models for RL. arXiv 2019, arXiv:1906.09237. [Google Scholar]
  15. Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991, 4, 251–257. [Google Scholar] [CrossRef]
  16. Zhou, Z.H. Machine Learning; Springer: Singapore, 2022. [Google Scholar]
  17. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  18. Traudt, T.; Armbruster, W.; Groll, C.; Hahn, R.d.S.; Dresia, K.; Börner, M.; Klein, S.; Suslov, D.I.; Haemisch, J.; Müller, M.A.; et al. LUMEN, the Test Bed for Rocket Engine Components: Results of the Acceptance Tests and Overview on the Engine Test Preparation. In Proceedings of the Space Propulsion Conference, Glasgow, Scotland, 20–23 May 2024. [Google Scholar]
  19. Vilá, J.; Moral, J.; Fernandez Villace, V.; Steelant, J. An Overview of the ESPSS Libraries: Latest Developments and Future. In Proceedings of the Space Propulsion Conference, Seville, Spain, 14–18 May 2018. [Google Scholar]
  20. Aranda, M.; Gutiérrez, D.; Villagarcía, V. ESPSS 3.7.0 User Manual; Empresarios Agrupados Internacional S.A.: Madrid, Spain, 2022. [Google Scholar]
  21. EL Hefni, B.; Bouskela, D. Modeling and Simulation of Thermal Power Plants with ThermoSysPro: A Theoretical Introduction and a Practical Guide; Springer: Cham, Switzerland, 2019. [Google Scholar] [CrossRef]
  22. Japikse, D.; Marscher, W.; Furst, R. Centrifugal Pump Design and Performance; Concepts ETI: Wilder, VT, USA, 1997. [Google Scholar]
  23. Huzel, D.; Huang, D. Design of Liquid Propellant Rocket Engines; NASA SP, Scientific and Technical Information Office, National Aeronautics and Space Administration: Washington, DC, USA, 1971.
  24. Empresarios Agrupados Internacional S.A. EcosimPro2022 Version 6.4.0 Modelling and Simulation Software Complete Reference Manual; Empresarios Agrupados Internacional S.A.: Madrid, Spain, 2022. [Google Scholar]
  25. Chen, X.; Wang, C.; Zhou, Z.; Ross, K. Randomized Ensembled Double Q-Learning: Learning Fast Without a Model. arXiv 2021, arXiv:2101.05982. [Google Scholar]
  26. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv 2018, arXiv:1801.01290. [Google Scholar]
  27. Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft Actor-Critic Algorithms and Applications. arXiv 2019, arXiv:1812.05905. [Google Scholar]
  28. Achiam, J. Spinning Up in Deep Reinforcement Learning. 2018. Available online: https://spinningup.openai.com (accessed on 31 October 2024).
  29. The Garage Contributors. Garage: A Toolkit for Reproducible Reinforcement Learning Research. 2019. Available online: https://github.com/rlworkgroup/garage (accessed on 31 October 2024).
  30. Bingham, E.; Chen, J.P.; Jankowiak, M.; Obermeyer, F.; Pradhan, N.; Karaletsos, T.; Singh, R.; Szerlip, P.; Horsfall, P.; Goodman, N.D. Pyro: Deep Universal Probabilistic Programming. arXiv 2018, arXiv:1810.09538. [Google Scholar]
  31. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv 2019, arXiv:1912.01703. [Google Scholar]
  32. Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
  33. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational Inference: A Review for Statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877. [Google Scholar] [CrossRef]
  34. Wingate, D.; Weber, T. Automated Variational Inference in Probabilistic Programming. arXiv 2013, arXiv:1301.1299. [Google Scholar]
  35. Ranganath, R.; Gerrish, S.; Blei, D. Black Box Variational Inference. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, Reykjavik, Iceland, 22–25 April 2014; Kaski, S., Corander, J., Eds.; Volume 33, pp. 814–822. [Google Scholar]
  36. Hoffman, M.D.; Blei, D.M.; Wang, C.; Paisley, J. Stochastic Variational Inference. J. Mach. Learn. Res. 2013, 14, 1303–1347. [Google Scholar]
Figure 1. The flowchart illustrates the manner in which the surrogate model is integrated into the experimental design process. The surrogate model is derived from training data that are generated in a simulation and contain the uncertain parameter θ and the design d as inputs. In the depicted case of a deterministic simulation, the noise is subsequently added to the surrogate model output G in order to obtain an observation y. This observation, along with the experimental design that led to it, is subsequently employed in the design optimization step. This step may utilize reinforcement learning methods.
Figure 2. Schematic of the OTP system, including the pump component and the turbine component with their respective liquid oxygen (LOX) and gaseous nitrogen (GN2) suppliers. The illustration also includes the turbine oxidizer control valve (TOV), the oxidizer control valve (OCV), the rotational speed n_pump, and the pump mass flow ṁ_pump.
Figure 3. Learning curve for the reinforcement learning agent for the turbopump performance experiment. Training over 20,000 episodes with hyperparameters from Table 4.
Figure 4. Performance of the random, VPCE, and RL methods over a sequence of turbopump performance experiments with length t = 15. (On the left): the EIG with shaded regions indicating the standard error. (On the right): the SPCE evaluated on 10⁵ contrastive samples with shaded regions indicating the standard error. Mean and standard error from 100 rollouts.
Figure 5. The distributions that represent an estimate of θ₃ for the turbopump performance experiment after a number of experiments were conducted. From dark blue after the first experiment to light green for the last experiment. Organized by the three different BOED approaches from top to bottom: random, VPCE, and RL. The dashed line represents the true value of θ₃.
Figure 6. The distribution of the EIG estimated by PCE, with N = 100 and M = 10, over the design space for the first turbopump performance experiment. For the discretization of the design space, a step size of 0.05 on the x axis and 0.01 on the y axis was used. The EIG is the mean from 100 rollouts.
Figure 7. The distribution of the EIG estimated by PCE, with N = 100 and M = 10, over the design space after a few turbopump performance experiments are conducted. The assumed values for the prior were b₀ = N(2.5, 0.1) × 10, b₁ = N(9.5, 0.05) × 10⁴, and b₂ = N(3.4, 0.01) × 10⁹. For the discretization of the design space, a step size of 0.05 on the x axis and 0.01 on the y axis was used. The EIG is the mean from 100 rollouts.
Figure 8. The experimental design choices by the VPCE approach on the left and RL approach on the right side. Plotted are all the designs of 100 rollouts of a sequence of 15 turbopump performance experiments, giving a total of 1500 designs. The color indicates the density of the region in design space.
Table 1. Hyperparameters for turbopump surrogate model training.

Parameter | Value
learning rate | 0.01
optimizer | Adam
scheduler | linear
end factor | 0.1
total iterations | 17,500
batch size | 1024
split ratio | 0.85
Table 2. Design and parameter space for the generation of training data.

Parameter | Value
mass flow | U(0.5, 6.0)
valve position | U(0.1, 1.0)
distribution | uniform
b₀ | N(2.5, 0.25) × 10
b₁ | N(9.5, 0.25) × 10⁴
b₂ | N(3.4, 0.25) × 10⁹
distribution | normal
Table 3. Architecture of the turbopump surrogate model neural network.

Layer | Description | Dimension | Activation
input | d, θ | 5 | -
H1 | Fully connected | 256 | ReLU
H2 | Fully connected | 128 | ReLU
H3 | Fully connected | 64 | ReLU
output | y | 2 | -
Table 4. Hyperparameters for reinforcement learning training.

Parameter | Value
parallel | 10
contrastive samples | 10⁶
γ | 0.99
τ | 0.001
policy learning rate | 0.0005
critic learning rate | 0.0015
buffer size | 10⁶
M (subset) | 2
N (critics) | 2
batch size | 1024
experiments | 15
Table 5. Hyperparameters for VPCE baseline for different BOED experiments.

Parameter | Value
design gradient steps | 4500
design learning rate | 0.25
design optimizer | Adam
design scheduler | linear
design end factor | 0.1
total iterations | 3150
contrastive samples M | 100
expectation samples N | 1000
SVI gradient steps | 2500
SVI learning rate | 0.015
SVI optimizer | Adam
SVI β | (0.9, 0.999)
SVI scheduler | exponential
SVI γ | 0.9
Table 6. EIG and SPCE for the turbopump performance experiment at t = 15. SPCE evaluated on 10⁵ contrastive samples. Mean and standard error from 100 rollouts.

Method | SPCE | EIG
random | 4.899 ± 0.159 | 6.218 ± 0.034
VPCE | 5.300 ± 0.159 | 6.370 ± 0.039
RL | 5.783 ± 0.174 | 6.782 ± 0.059
