A Robust Bayesian Optimization Framework for Microwave Circuit Design under Uncertainty

: In modern electronics, there are many inevitable uncertainties and variations of design parameters that have a profound effect on the performance of a device. These are, among others, induced by manufacturing tolerances, assembling inaccuracies, material diversities, machining errors, etc. This prompts wide interests in enhanced optimization algorithms that take the effect of these uncertainty sources into account and that are able to ﬁnd robust designs, i.e., designs that are insensitive to the uncertainties early in the design cycle. In this work, a novel machine learning-based optimization framework that accounts for uncertainty of the design parameters is presented. This is achieved by using a modiﬁed version of the expected improvement criterion. Moreover, a data-efﬁcient Bayesian Optimization framework is leveraged to limit the number of simulations required to ﬁnd a robust design solution. Two suitable application examples validate that the robustness is signiﬁcantly improved compared to standard design methods.


Introduction
The emergence of new technologies and the constant miniaturization of integrated circuits challenge engineers to obtain designs that satisfy stringent functional specifications and signal integrity (SI) requirements, as well as electromagnetic compatibility (EMC) constraints. Various optimization techniques are used to find optimal designs and improve the device performance at the early design stage [1][2][3]. While in other engineering domains it is sometimes possible to solve complex numerical equations with efficient-stable numerical methods [4][5][6]; given the high computational cost of simulating modern high-frequency circuits, surrogate modeling techniques are typically utilized to efficiently perform design optimization [7][8][9][10][11]. In particular, since the complexity of design optimization problems is constantly increasing, machine learning-based algorithms have become a popular choice to cope with the multiscale issues in radio frequency (RF) and microwave designs [12][13][14][15].
While, theoretically speaking, the performance of an electromagnetic device can be improved by applying adequate optimization techniques, in reality, there are many unavoidable geometrical and material uncertainties that degrade the device's performance. In particular, the real-life performance of a device is significantly affected by the manufacturing technology employed, the assembling inaccuracy, uncertainties due to material diversities, operation environment, etc. From the perspective of industrial design and manufacturing, there are two types of optimization models: deterministic and robust optimization models [16]. In deterministic optimization methods, all design parameters and system variables assume known values, which are freely specified by designers. Since no random variability and data uncertainty are investigated, deterministic optimization models fall short to find robust designs. In contrast, the latter are considered (to a certain extent) insensitive to variations of design parameters.
As a relatively simple case of an electromagnetic system, consider, for example, a pair of coupled microstrip lines for high-speed digital circuits. There are several manufacturing uncertainties for these types of structures. These include an asymmetric ground pin configuration, signal trace length mismatch, and asymmetric discontinuities in the circuit layout, such as at a bend of coupled microstrip lines [17]. These manufacturing variations lead to severe signal degradation, thus decreasing performance in terms of its SI. To increase the reliability of electromagnetic circuits, a robust design optimization is thus a necessity.
Several formulations for robust design have been proposed in the literature [16,[18][19][20][21][22][23]. The main idea of these approaches is to capture and account for the process variations and manufacturing tolerances early in the design cycle. In [24], Kriging surrogate models assist a worst-case robustness scenario for the optimization of electromagnetic devices. Simulation-based robust design strategies were developed by employing the Monte Carlo (MC) [25] sampling method to investigate the impact of parameter variations on product performance [26,27]. Most of these approaches, however, rely on methods that are not data-efficient. This motivates us to introduce a data-efficient optimization methodology while accounting for the uncertainties described earlier in this section.
The term data-efficient is used here to indicate techniques that minimize the amount of data necessary to reach the desired objective. Data-efficiency is particularly relevant for modern microwave design problems since (full-wave) simulations are computationally very expensive. Therefore, in this paper, we present a new data-efficient methodology for robust optimization of microwave circuits and systems based on Bayesian inference, which allows designers to take manufacturing imperfections during the optimization process into account, while minimizing the related computational cost. More precisely, our methodology adopts Gaussian Processes (GPs) [28] and propagates the input uncertainty on the design parameters by moment matching the predictive distribution of the underlying GP. Furthermore, we propose a modified version of the Expected Improvement (EI) criterion that accounts for the input uncertainty, called stochastic EI (sEI), which is used to guide the optimization process [29]. To the best of the authors' knowledge, in this paper, sEI is used for the first time for the data-efficient, robust optimization of microwave circuits and systems.
The paper is organized as follows. Section 2 first discusses the concept of robust optimum by means of an illustrative example. Next, it continues with a brief discussion on robust design optimization in engineering and introduces some of the most recent developments. Since our methodology is based on Bayesian inference, the relevant features of BO, GPs and EI are introduced in Sections 2.1-2.3, respectively. In Section 3, the complete novel methodology for solving robust BO problems in electrical engineering is proposed. Two representative numerical examples and their results are presented in Section 4. Finally, conclusions are drawn and the future research avenues are described in Section 5.

Problem Formulation
In electronics, engineers are interested in a design solution that satisfies specifications and, at the same time, is stable under uncertainty. Such a robust solution may represent a local rather than a global optimum, as long as it is significantly less sensitive to small production perturbations compared to the global optimum. A schematic example of a minimization problem is shown in Figure 1. In this figure, the horizontal axis shows a design parameter x, while f (x) indicates the function that we want to minimize (often referred to as the objective function in an optimization framework). For microwave circuits, f (x) typically corresponds to a figure of merit describing the performance of the circuit, such as bandwidth, gain, attenuation, etc. Deterministic methods tend to find the global optimum of a function f (x), indicated as point A in Figure 1, without taking any parameter variations into account [16]. This may result in big fluctuations (∆ f 1 ) of the objective with respect to small variations ∆x of a design parameter (the length of a microstrip, for example). Looking at Figure 1, the solution B is considered as robust, as the objective function f (x) does not change much with respect to the same perturbation ∆x. Therefore, despite being a local optimum, in practice design B is more favorable for engineers. Recently, several procedures have been presented to find such robust solutions in different engineering domains. While in [30] a sensitivity analysis was exploited to build a framework for robust design optimization of Mechatronics Systems, a global two-layer meta-model approximation was presented in [31] to cope with the computational challenges of robust design optimization. Moreover, in [32] a multidisciplinary robust design optimization framework, which takes both parameter and metamodeling uncertainties into account, was introduced.
In general, evaluating a relatively simple robustness measure (e.g., expectancy or worst value) of the cost function is a common practice in literature. However, worst-case analysis methods typically result in an overly pessimistic estimation of the tolerance effects [33,34]. Nowadays, Uncertainty Quantification (UQ) of electronic circuits is a popular alternative where surrogate models are adopted for efficient statistical analysis [35][36][37][38]. UQ methods can assist engineers to achieve robust designs. In [39], an UQ-based optimization strategy, leveraging on knowledge-based feature surrogates, is introduced for the robust design of microwave components. For a broad overview of several robust optimization approaches, the reader is referred to [2,16,19,40].
However, the current state-of-the-art techniques are either not probabilistic or not data-efficient. To overcome these shortcomings, we present a novel data-efficient approach that relies on a robustness measure and stochastic EI, detailed in Section 3. Still, first, the preliminary concepts BO, GP, and EI are succinctly introduced in Sections 2.1-2.3, respectively.

Bayesian Optimization
Consider an objective function f : X → R, also called cost function in the BO framework, defined on a compact subset X ⊆ R D . BO is a model-based approach to solve a global minimization (or maximization) problem defined as follows: where x is the vector of length D containing the D design parameters. In the context of an electromagnetics problem, one might want to minimize the return loss of a filter at its center frequency or maximize the gain of an antenna over a certain frequency range.
Here, the objective function f is often expensive to evaluate, as it requires full-wave modelling implemented in simulation tools such as Advanced Design System (ADS) [41].
BO is particularly effective to solve expensive design optimization problems, as it aims at minimizing the number of expensive function evaluations. BO relies on two main elements: a surrogate model that mimics the objective function, and a sequential sampling strategy that selects the next sample intelligently in order to reach the global optimum as fast as possible. Let us consider the general BO methodology depicted in Figure 2. First, the objective function f (x) is evaluated for a limited set of design parameters [x k ] K k=1 ∈ X. Initial samples are typically generated according to suitable space filling techniques, such as a Latin Hypercube Design [42]. Based on this initial data, a surrogate model of f (x) is computed. This model is very cheap to evaluate compared to performing (full-wave) simulations, and it is used to calculate the location of the candidate optimum. It is important to remark that surrogate models used in BO are stochastic: the model not only predicts the value of the objective function with regards to the parameters x, but also the degree of confidence in its prediction. The sampling strategy in BO relies on a so-called acquisition function. More specifically, the acquisition function is used to determine the location of the new sample to be evaluated, based on the stochastic model's predictions and its confidence bounds. The sample selected is then evaluated by a new (expensive) simulation of the objective function and the surrogate model is updated. This procedure continues until suitable stopping criteria are met, and each simulation refines the surrogate, increasing the probability of finding the global optima of the problem (1). A complete description of the BO properties is given in [43,44].

Gaussian Processes
In this work, we adopt a BO framework that leverages on GP as a surrogate model. Different surrogate models can be used in BO, such as Bayesian neural networks [45] and GPs [28,46]. The latter choice (GPs) is common in a BO context and also used in this paper. This is because it is analytically tractable and provides a predictive distribution given new input data. More specifically, a GP f ∼ GP (m, k) represents a distribution over functions f : X → R, which is completely characterized by its mean function m: X → R and a positive-definite kernel, or covariance function, k: X × X → R. As such, a finite set of function values [y 1 , y 2 , . . . , The mean is typically chosen to be zero. Among the different covariance functions presented in literature, the popular squared exponential (SE) kernel is used in this work, which has the form: where the hyperparameters θ are the collection of the kernel variance σ and the lengthscales l d , d = 1,. . . , D. The hyperparameters θ are tuned for a given training data set using the Maximum Likelihood Estimation (MLE): Let D n = {x n , y n }, n = 1,. . . , N, denote the set of observations of the design under study. The predictive distribution of GPs for a new input x based on D n (also called posterior distribution) is denoted p( f |x , D n ) and it can be analytically calculated, resulting in a Gaussian distribution with the following moments [28]: . In our optimization problem, the posterior mean µ(x ) represents the GP prediction, while its variance σ 2 (x ) indicates the model's confidence in its predictions.

Expected Improvement
Among the most commonly used acquisition functions are the Expected Improvement (EI) [47,48] and Probability of Improvement (PoI) [49]. In particular, a stochastic version of EI is adopted in this manuscript (see Section 3). Traditional EI is defined as: Here, y min is the minimum value observed thus far, y min = min x∈D n E( f |x, D n ) and |y min − f | represents the improvement. Hence, under the GP model, EI can be rewritten in closed form as: when σ > 0 and vanishes otherwise. Note that φ(·) and Φ(·) denote the probability density function and cumulative distribution function of the standard normal distribution, respectively. The goal is to find the point that maximizes the EI and add it to the data set D n . Using this data, the predictive distribution p( f |x , D n ) is updated and EI is recalculated to determine the next point to evaluate. This corresponds to one iteration of the BO loop and continues until the global optimum is found.
Note that finding the value of the parameters x that maximizes the acquisition function is an optimization problem per se. However, solving this problem is a relatively easy task: the EI is fast to compute since it is calculated using the current posterior distribution. It is also differentiable and can therefore be maximized with a standard gradient-based optimizer [50].

Robust Bayesian Optimization in Engineering Design
In circuits, design parameters are usually affected by uncertainties which degrade the overall performance. Among different types of uncertainty, manufacturing tolerances are one of the primary uncertainty sources, and therefore the focus of this work.
The optimization strategy described in Section 2 is deterministic. Thus, it does not take the uncertainty on the design parameters into account when searching for the optimal design configuration. Hence, the goal of robust design optimization is finding the minimum of f (x + δ), where x is the set of design parameters and δ represents the perturbation on the design parameters. Note that δ is a vector of stochastic (random) variables and not deterministic ones. In this paper, the elements of δ are assumed to be normally distributed, that is δ ∼ N (0, Σ), where the correlation matrix Σ is chosen as a diagonal matrix. Thus, the elements of δ are considered uncorrelated (and since they are Gaussian random variables, they are also independent). Additionally, we assume that the elements of Σ are constant, so their values are independent on x.
Finding a robust optimum instead of a global optimum is a challenging mathematical problem. Moreover, there is no unified mathematical definition of robustness [16,18]. One main idea portrayed in literature is to evaluate an expectation measure of the objective function. This was first introduced in literature as Type 1 robustness by Deb and Gupta [51].
In this manuscript, inspired by [51,52], the used expectation measure E r is: where p(z) = N (z|x, Σ) is the probability density function of a multivariate normal distribution with mean x and covariance Σ. This expectation measure E r is, in essence, the expectation value of the cost function under perturbation [29]. Figure 3 illustrates the optimal solution resulting from minimizing an objective function (point B) versus its corresponding expectation measure E r (point A). Solution A is considered robust, as for a small amount of perturbation δ of the design parameters, the objective function value of the solution does not change significantly. Solution B offers a better minimum, but it is sensitive to perturbations: it may not be suited for the application at hand. The main idea of the proposed robust optimization strategy is to adopt the BO algorithm described in Section 2.1 and summarized in Figure 2, where the uncertainty on the design parameters is propagated through the GP model and a new acquisition function is defined to estimate a robust optimum following the measure in Equation (3). Hence, the EI presented in Section 2.3 is replaced by a stochastic EI as follows: first, the expectation of EI in Equation (2) is taken with regards to p(z), which is the equivalent of the regular EI, but using the metric defined by Equation (3). Next, using Fubini's theorem [53], the following acquisition function is obtained: The last integral corresponds to the marginalization of z and has the predictive distribution p( f |x , D n ) as a result.
The chosen acquisition function α sEI,n requires the marginalization of the input space of the GP. However, this is analytically intractable. Relying on the literature, the most accurate way to calculated the predictive distribution is to apply Markov Chain Monte Carlo (MCMC) methods. Since the moments of the predictive distribution are analytically tractable if the kernel expectations are tractable [54], a moment matching method is used to approximate the predictive distribution p( f |x , D n ). The kernel expectations are noted as follows: In Section 2.2, it is described that the SE is used as a kernel of the GP model: the expectations in Equation (5) can be calculated analytically for the SE kernel. To obtain a sufficiently good approximation of the real distribution, in practice, we suggest to adopt only the mean and variance and calculate them through the law of total cumulance [55]: These stochastic moments are matched with a Gaussian distribution to approximate p( f |x , D n ). In this case, the expression of the sEI is similar to the deterministic case: Finally, the global minimum y min (which is unknown) is replaced by the lowest expectation under uncertainty: y min ≈ min x∈X n E p(z|x,Σ) (E( f |z, D n )). The derived formula for sEI is used in the BO framework defined in Figure 2 to find the robust optimum.
A 1D example that illustrates the sEI versus deterministic EI is shown in Figure 4. In this example, the sEI is much smoother than the regular EI because of the averaging over the input distribution. The robust optimum is a local minimum. Clearly, the EI samples at the global optimum (i.e., it has higher values around the global optimum), while the sEI samples near the robust optimum.
In the traditional BO setting, the current best optimum is easily obtained by comparing the function values evaluated so far. However, when using sEI, it is impossible to know what the current best optimum is. Indeed, to calculate the current best optimum, the GP model is used to approximate the expectation measure E r . Consequently, unlike in the regular BO setting, the convergence graph of the current best optimum may not exhibit a monotonously decreasing trend.

Application Examples
In this section, the proposed method is applied to optimize two microwave filter designs, and their performances are compared to standard BO. The proposed method was implemented using the GPFlowOpt Python package [56].

Microstrip Lowpass Filter
The two-port lowpass stepped impedance microstrip filter presented in [57] is studied in this section. The filter is formed by six microstrip line sections with different lengths l i , i = 1,. . . , 6, while the following relation holds for their widths [57]: w 1 = w 3 = w 5 and w 2 = w 4 = w 6 . The corresponding filter layout is presented in Figure 5. This filter operates in the [1,4] GHz band, and resides on a substrate with a relative permittivity r = 4.2 and thickness h = 1.58 mm. The nominal values of its geometrical parameters are shown in Table 1. The simulator used to estimate the filter's performance is the MATLAB RF Toolbox (Mathworks Inc., Natick, MA, USA). Table 1. Microstrip low-pass filter geometrical parameters.

Parameter Value
Microstrip lengths l 1 = 2.05 mm, l 2 = 6.63 mm, l 3 = 7.69 mm, l 4 = 9.04 mm, l 5 = 5.63 mm, l 6 = 2.41 mm Microstrip widths w 1 = w 3 = w 5 = 11.3 mm w 2 = w 4 = w 6 = 0.428 mm For this example, we will consider the length of each microstrip section as a relevant design parameter, defined in the supports l 1 , l 6 ∈ [1, 5] mm, l 2 , l 5 ∈ [4,8] mm and, l 2 , l 3 ∈ [6.5, 10.5] mm. The design goal is a filter with a 3 dB cut-off frequency at 2.4 GHz. This is achieved by minimizing the objective function formulated as: f c (x) = | f c − 2.4|, where f c (x) is the cut-off frequency expressed in GHz for a specific design configuration. In this example, we assume that the perturbation δ on the six design parameters has the following covariance matrix Σ = diag(0.6 2 , 1.2 2 , 1.7 2 , 1.7 2 , 1.2 2 , 0.6 2 ) mm 2 . Comparing the elements of the covariance matrix with the nominal values defined in Table 1, it is clear that a relatively large perturbation of the design parameters is assumed. This choice is made to clearly illustrate the performance of the proposed method compared to standard BO. The optimization is performed by assuming a total computational budget of 100 (l 1 , l 2 , l 3 , l 4 , l 5 , l 6 ) samples for both standard BO (with EI) and our novel robust BO methodology (with sEI). Ten initial samples are chosen via a Latin Hypercube Design, while the remaining samples are sequentially chosen by the optimization algorithm, comparing EI with sEI.
In order to assess the robustness to the value of the initial samples, 12 sets of initial samples are chosen via a Latin Hypercube Design and the optimization process is repeated for each sample set. This allows one to estimate a confidence interval for both standard BO and the advocated methodology.
Results are shown in Figure 6, which illustrates the progression of the robust measure E r with respect to the number of simulations performed for both BO and the proposed robust algorithm. The proposed optimization method clearly outperforms BO to find a robust optimum, and the difference already becomes clear after a few iterations. To evaluate how different E r values describe the robustness of the filter performance with respect to the perturbation δ on the design parameters, additional results concerning the filter's transmission characteristic (element S 21 of the filter's scattering matrix) are shown in Figure 7. First, consider the optimization results for one set of the initial samples: standard BO individuates a global optimum x BO = (5.00, 5.25, 9.89, 8.38, 8.00, 1.33) corresponding to f (x BO ) = 3.4e − 16, while the proposed robust optimization strategy selects a robust local optimum at x RBO = (1.00, 7.81, 10.50, 6.50, 8.00, 5.00), corresponding to f (x RBO ) = 0.025. Figure 7 shows the results of the MC analysis performed around x BO and x RBO : the design solution individuated by the novel algorithm is significantly more robust to the perturbation δ on the design parameters. Similar results hold for all 12 initial sample sets.

Zigzag Narrow Bandpass Filter
The second microwave example is a zigzag narrow bandpass filter (see Figure 8) based on the design proposed in [58]. The filter is defined over the frequency range [2,3] GHz on a substrate with height h = 0.5 mm, relative permittivity r = 2.2 and loss tangent tanδ = 0.003. The width of both the horizontal and vertical conductors is 0.4 mm. This filter has a very narrow pass-bandwidth around [2.4, 2.6] GHz and a center frequency at 2.5 GHz. Since the bandwidth is very sensitive to the filter's geometrical parameters, three design parameters are considered: the distance D, the length L of the two coupling parts and the gap S between the horizontal conductors (see Figure 8). These parameters are defined in the supports [0.2, 2] mm, [16,20] mm and [0.1, 1] mm, respectively. The goal is to design a filter with passband in the range [2.4, 2.6] GHz. For simplicity, we only consider performance specifications imposed on the reflection coefficient (element S 11 of the filter scattering matrix). In order to achieve our optimization goal, the objective is formulated as follows: where f L and f H are the frequencies determining the intended operating bandwidth expressed in GHz. The perturbation on the design parameters has the following covariance matrix Σ = diag(0.03 2 , 0.06 2 , 0.9 2 ), which corresponds to around 5% of their nominal values. The optimization starts with 15 initial points, and continues for 100 iterations adding up to 115 (D,L,S) samples in total. As for the previous example, 16 sets of different initial samples are chosen via Latin Hypercube Design and the optimization process is repeated for each sample set. Differently from the example in Section 4.1, where the scattering parameters of the filter are computed using a quasi-analytical model in MATLAB, here, full-wave simulations, using Advanced Design System (Momentum EEsof EDA, Keysight Technologies) [41], are adopted to calculate the reflection coefficient. Due to the high associated computational cost, it was not feasible to calculate the E r measure over all 100 iterations. Therefore, we present the results only for the last iteration of the optimization loop. The results are shown in Table 2. As observed, the proposed methodology indeed achieves a lower E r value compared to the BO method with respect to the small perturbations imposed. The global minimum found by BO is x BO = (0.3, 0.7, 18), leading to f (x BO ) = 4.4e − 16. The robust minimum defined by metric E r lies at x RBO = (0.5, 1.1, 17), corresponding to f (x RBO ) = 0.089. The results of the MC analysis performed around x BO and x RBO are shown in Figure 9: despite being a local optimum, the design solution individuated by the novel algorithm is more robust to the perturbation δ on the design parameters.

Conclusions
We presented a framework for robust optimization within the standard Bayesian optimization framework, which considers input uncertainty. The standard BO framework is extended to the novel robust methodology by using the sEI, a modified version of the EI acquisition function, as the sampling strategy. sEI takes a measure of robustness (E r ) into account to converge to a robust optimum rather than a global one. Such a robust solution may be a local optimum, however, it is more stable under perturbations compared to the global optimum obtained by standard BO. GPs are employed as the surrogate model and the uncertainty on the input parameters is propagated by moment matching the predictive distribution. Moreover, our advocated robust optimization is performed in a data-efficient manner: the number of the expensive function evaluations is limited owing to the BO scheme. The effectiveness of the optimization framework is demonstrated on two filter design examples. Clearly in both cases, a lower median of the measure E r , as well as a lower variance across replications of the optimization method, is observed.
Several extensions and improvements to the current approach are possible. In particular, we aim at extending the current approach to high dimensional inputs. Another improvement would be considering an input uncertainty distribution other than Gaussian. Moreover, for engineering design, a multi-objective version of a robust optimization method is very relevant.