Mechanical Characterization of Polysilicon MEMS: A Hybrid TMCMC/POD-Kriging Approach

Microscale uncertainties related to the geometry and morphology of polycrystalline silicon films, constituting the movable structures of micro electro-mechanical systems (MEMS), were investigated through a joint numerical/experimental approach. An on-chip testing device was designed and fabricated to deform a compliant polysilicon beam. In previous studies, we showed that the scattering in the input–output characteristics of the device can be properly described only if statistical features related to the morphology of the columnar polysilicon film and to the etching process adopted to release the movable structure are taken into account. In this work, a high fidelity finite element model of the device was used to feed a transitional Markov chain Monte Carlo (TMCMC) algorithm for the estimation of the unknown parameters governing the aforementioned statistical features. To reduce the computational cost of the stochastic analysis, a synergy of proper orthogonal decomposition (POD) and kriging interpolation was adopted. Results are reported for a batch of nominally identical tested devices, in terms of measurement error-affected probability distributions of the overall Young’s modulus of the polysilicon film and of the overetch depth.


Introduction
By combining electronic and structural components, micro electro-mechanical systems (MEMS) have been a successful example of micro-technology surpassing conventional ones for a variety of devices like e.g., accelerometers, magnetometers, scanners, pressure sensors and gyroscopes [1]. However, the development of these devices involves relatively complicated steps, due to their small size scale. Testing and characterization are obviously necessary to predict the operational performance of a new device, optimize the micro-fabrication processes and develop applications [2,3]. Such tasks may become complex due to the presence of unknown mechanical or geometrical parameters, featuring large uncertainty levels at the microscale. The current miniaturization of the mechanical components, as needed for ever smaller devices, is enhancing the importance of such uncertainty issues in the fabrication process [4]. In fact, the small dimensions of miniaturized devices can go in the direction of micro-fabrication tolerances and of characteristic lengths linked to material heterogeneities. Therefore, improvements of our understanding of the involved sources of uncertainty can largely affect the success of newly designed devices to meet application constraints [5,6]. Even if the discussion to follow is focused on the operational conditions of the devices, when, from the mechanical perspective, no dissipation mechanisms take place in their movable structures, some reliability issues might be triggered by uncertainties, as linked to the propagation of microcracks in the polysilicon film [7][8][9].
Geometrical uncertainties, namely variations of the fabricated device geometry away from the designed layout, strongly depend at this length-scale on the fabrication process tolerances (see e.g., [10]). For instance, in [11,12], flexure width variations up to 0.4 µm and 0.7 µm were respectively reported. Fabrication inaccuracies can then result in a 10% variation of the geometry of conventional MEMS [10], and can consequently spoil or compromise the target performance. Variations in the material properties (like e.g., the Young's modulus) can instead have intrinsic reasons, in addition to the dependency on the micro-fabrication process. This is especially the case for micro-sized components made of polycrystalline materials like polysilicon. As the dimensions of the components shrink, stochasticity of crystal orientation and grain morphology can lead to a scattered mechanical behavior of the whole component. This issue was reported numerically in [13,14] and experimentally in [15], for a decreasing material sample size.
A successful approach to assess the two aforementioned types of uncertainty has to be based on an experimental method and a relevant modelling/data reduction strategy. Due to the diversity of fabrication processes and MEMS applications, a unique approach to the mentioned assessment does not exist: a plethora of methods can be found in the literature to characterize the material at the microscale. The most common ones are based on tension and bending deformations of a specimen; a classification can be made depending on the test setup, on the sample preparation and on the actuation and read out methods. Reviews of these methods can be found, e.g., in [5,[16][17][18]. Common approaches to characterize process variations at the microscale are instead based on surface inspections through optical or scanning electron microscopy [19,20], or laser interferometry [21]. These methods, even if characterized by a rather easy data reduction, are generally expensive and need the devices to be on open wafers, so that they are limited to pre-packaged testing. In [22], a testing platform using laser Doppler vibrometry was proposed to reduce the testing time, but still on unpackaged devices. An alternative approach was adopted in [11] to measure the capacitance change induced by an applied voltage, using standard electrical probing. This metrology is inexpensive and also applicable to packaged devices; nevertheless, due to the adopted data reduction method, it can be used only for measuring geometrical features.
In this paper, following our former works [23][24][25], a methodology is proposed to deal with the response of an on-chip testing device. The method is based upon standard electrical probing, and aims to stochastically characterize geometrical and material properties of polysilicon films through a flexible and powerful data reduction technique. The testing device here exploited, as introduced in [23], features a micro-cantilever film sample subjected to a bending deformation. Within a stochastic framework, the experimental results are interpreted by means of a transitional Markov chain Monte Carlo (TMCMC) analysis, to also account for the effects of measurement errors on the estimation of the unknown parameters ruling geometrical and material uncertainties. A high fidelity, finite element (FE) model of the testing device was adopted to properly represent the complexity of the system, potentially in terms of the polysilicon morphology too [25]. Since the handling of FE simulations is prohibitive in a TMCMC analysis, a parametric model order reduction method was proposed to reduce the computational effort, preserving the required accuracy of the solution. Such order reduction method is based on a hybrid use of proper orthogonal decomposition and kriging metamodeling [26,27].
For ten devices randomly chosen from the same wafer, the proposed approach has been used to quantitatively assess the effective stiffness of the micro-cantilever and the overetch depth, the latter being defined as a defect in the in-plane geometry of the movable structure caused by fluctuations of temperature and etchant concentration [28] that lead to a real geometry of the device different from the target one. Both polysilicon elastic properties and overetch depth were assumed isotropic in the plane of motion, not only in terms of their most probable values but also through their scattering around the mean.
The remainder of this paper is organized as follows. In Section 2, some details of the on-chip testing device are provided; the measurements collected for a batch of nominally identical specimens are also discussed. In Section 3, the numerical procedure adopted to model the coupled electro-mechanical response of the device is detailed, focusing on the methodology for reduced-order modeling based on proper orthogonal decomposition and kriging interpolation in the domain of variation for the parameters to be identified. Moving to the identification procedure through a Bayesian approach, Section 4 gathers the major details of the adopted TMCMC methodology (while additional technical details are given in the Appendix A). Results of parameter estimation are then collected and discussed in Section 5. Finally, some concluding remarks and comments on future activities are provided in Section 6.

On-Chip Testing Device: Experimental Data and Relevant Scattering
The on-chip testing device proposed in [23] for the estimation of micromechanics-induced uncertainties in polysilicon MEMS, was fabricated through the ThELMA process by STMicroelectronics, Geneva, Switzerland (see e.g., [29]). An SEM picture of the mechanical part of the device is shown in Figure 1: the massive perforated plate (denoted as rotor) and the stators surrounding it were designed to deform, through an electrostatic actuation, the slender cantilever beam shown in the close-up. This beam or spring is connected to the substrate on its top side and to the rotor on its bottom side. Such a statistically determinate mechanical scheme of the movable structure was devised to reduce to a minimum the effects of residual stresses on the measured response to the actuation. Although a slight in-plane offset of the rotor can be induced by gradients of the residual stress field inside the polysilicon film, such effect can be considered negligible (see [24,25]). The target geometrical characteristics of the tested devices are summarized in Table 1. The structure has been packaged at a pressure of 0.35 mbar, so that the working conditions are close to those of similar commercial devices. Under quasi-static loading conditions, with the rotor electrically grounded and a bias voltage applied to the stators, attractive forces easily reduce the gap at the capacitors due to the slenderness of the beam. Because of the symmetric position of the top and bottom stators about the longitudinal axis of the beam, a pure bending is induced in the cantilever and the rotor gets displaced (mainly rotated) as a rigid body. For capped devices, the voltage-induced change of the mechanical configuration could not be assessed through visual inspection; the change of the capacitance at the same capacitors where the voltage is applied was then measured to sense the deformation of the beam. Table 1. Target geometric dimensions of the device (see Figure 1). Due to the presence of the lateral stator shown in Figure 1, different deformation modes can be induced in the beam. In this work, only the aforementioned purely bending one is investigated, to focus on the statistics and dispersion of the results induced by micromechanical features and by the measurement noise. Additional details on this device and on its response to the actuation can be found in [23][24][25][30][31][32].

Parameter
Since the cantilever was purposely designed as thin as possible, in compliance with the micro-fabrication constraints, a remarkable spreading in the mechanical response of nominally identical devices was already reported and is studied here. Ten specimens were taken from a single wafer, and their response measured in terms of capacitance change C against the applied voltage V: results are all collected in Figure 2. The different colors used in this figure to distinguish each single response from all the others will be exploited in Section 5 to correlate the stiff/complaint response in this C-V plane with the final estimates of the unknown parameters characterizing the devices (see below).  The goal of the experimental campaign was not an assessment of the pull-in instability; in fact, in [23], we showed that a one-to-one relationship does not exist between the pull-in voltage and the micro-fabrication uncertainties. Accordingly, the devices that displayed a more compliant response were actuated up to lower voltage levels, so as to avoid any damage and allow repeated testing. The scattering shown in the graph, which was reported in [25] to depend on the properties of the devices and not on measurement errors, is therefore directly related to uncertainties at the microscale. Among them, one major source is induced by the polysilicon film morphology at the beam level, which is different for each device. Such effect was shown in [23,30] to give rise to variations of the measured (effective) Young's modulus of the microcantilever, on the order of ±5% with respect to the reference isotropic value [14]. Another uncertainty source is related to the patterning/etching stages of the production process, which do not provide a uniform geometry of the moving structure over the whole wafer (see, e.g., [12]). Accordingly, the overetch defect [32], which modifies the in-plane dimensions of the device components with respect to the target ones, needs to be considered too.
In former works [23,32], it was already pointed out that estimating the effects of these two main sources of uncertainty from the overall device response is not a trivial task. As already stated, data concerning the pull-in voltage alone are not enough to micromechanically characterize each single device, due to its nonlinear response in terms of the C-V plot and to the intricate, combined effects of film morphology and overetch depth. Within a fully stochastic frame, the evolution of the system response under an increasing applied voltage is accounted for here, along with the measurement noise.
To slightly simplify the problem, in compliance with previous studies, the microstructural uncertainties are assumed to depend on two unknown parameters only: the overetch O, assumed to be homogeneous and direction-independent for each single device; and the effective Young's modulus E of the polysilicon beam. Variations of O affect the thickness and the length of the beam, and the gap at capacitors in the initial configuration; variations of E affect instead the stiffness of the beam only.

Reduced-Order Modelling
To identify the characteristic values and the dispersion of parameters O and E, as featured by the batch of devices that furnished the responses gathered in Figure 2, an inverse problem has to be solved. By comparing the experimental data with the results of micromechanics-informed numerical simulations, the mentioned stochastic properties of the parameters can be identified, also handling the measurement error. Such characterization of the polysilicon film will be therefore richer in the content of information if compared to those provided in [23,24], for which the ill-conditioning of the inverse problem [33] would have potentially affected the solution.
As will be detailed in Section 4, a TMCMC method for Bayesian parameter estimation was adopted. This type of investigation requires a large number of simulations to properly catch all the effects of the parameters to be identified on the nonlinear transition between the input parameter (the applied voltage V) and the output one (the measured capacitance change C) (see [34]).
Each high-fidelity, coupled electro-mechanical simulation is based on a finite element (FE) discretization of the mechanical and electrical domains. Although the type of actuation described in Section 2 gives rise to a deformation of the movable structure that can be handled through simplified, beam bending-like analytical approaches like those explored in [23,31], the FE analyses can provide a more general framework and account for effects disregarded in the analytical models (like e.g., those linked to anchor compliance, re-entrant corners close to the final cross-sections of the cantilever, and fringe fields in the electrical domain) (see also [25]). Handling such FE analyses within an inverse analysis turns out to be computationally prohibitive, and so the overall computational burden of the procedure must be reduced.
To preserve as much as possible the accuracy of the solutions, we discuss next a methodology for model order reduction, which is based on a set of system responses (also called snapshots) at varying values of the unknown parameters to be estimated. As the investigation aims to get insights into the (nonlinear) effects of those parameters on the C-V response of the devices, this methodology is referred to as parametric model order reduction [35]. Algorithmically, the method is based on a coupling of proper orthogonal decomposition (POD) and kriging interpolation, to build a surrogate of the initial full-order FE model. POD is used to quantify the correlation among the snapshots, and then set the subspace onto which the full-order model response can be projected; kriging is used to predict (and not compute) the system response for any values of the unknown parameters different from those adopted to obtain the snapshots.
Parametric model order reduction strategies similar to the one adopted here were already proposed for nonlinear problems, even within a multi-scale framework, in e.g., [36]. As long as the response of the system evolves smoothly in the inspected domain of variation of the parameters, wherein the interpolation is introduced, the aforementioned strategies can work quite satisfactorily (see also [37,38]); whenever the response varies instead non-smoothly, for instance as due to crack evolution, the applicability of the methodology has to be proven.

Proper Orthogonal Decomposition
We briefly review first the key features of POD for building a model-specific optimal subspace from the full-order FE model, based on an ensemble of responses. A general and synthetic introduction to POD for parametric reduced-order modelling can be found in [39] (see also [40,41]).
We start by considering the case of a single FE analysis, at assigned values of O and E collected in the vector x of unknowns. Later, we will extend this solution to the case of multiple x values, in order to allow for parametric dependence in the solution.
Let c ∈ R m be a vector representing the device response, measured at a progressively increasing value of the input actuation. Here, R is the set of real numbers, and m denotes the dimension of c. For the present case, c collects the values of the capacitance change C linked to a set of values of the applied voltage V. Even though the response of each device smoothly evolves with V (see Figure 2), vector c is designed to gather a set of values of C to exploit redundancy and to also reduce the effect of measurement noise on the final estimates of O and E. The vector c can be represented as a superposition of orthonormal bases φ i , i = 1, 2..., m, according to: where a i are the expansion coefficients or, in other words, the amplitudes of the bases φ i in the solution. In compliance with the notation of Equation (1), the aforementioned coefficients are gathered in the column vector a, while matrix Φ = [φ 1 , φ 2 , ..., φ m ] collects all the bases, or proper orthogonal modes (POMs) as columns. By truncating the expansion in Equation (1) according to: with q < m, a reduced-order representation of the original model is obtained. The matrixΦ thus consists of the first q bases only, whereas the vectorâ collects the corresponding weighting coefficients, which are different from those in a to compensate for the truncation errors. The computational gain when moving from Equation (1) to Equation (2) is related to the ratio q/m between the degrees of freedom of the reduced-order and of the full-order models. The aim of the POD-based reduced order modeling procedure is to provide, at assigned q, an optimal set of bases able to minimize the truncation error in Equation (2); alternatively, POD can provide the value of q to guarantee a requested accuracy of the order reduction [42]. The snapshot version of POD, see [39], can address both of the aforementioned problems at the same time, setting q to attain the requested accuracy and providing theΦ matrix.
To move now to the solution of the parametric reduced-order model problem, we consider n different systems, each one featuring its own values of the unknown parameters O and E. All the responses are collected in the snapshot matrix C, so that the representation can be written as: Matrix C can be factorized through a singular value decomposition (SVD), to provide [40,43]: where L is an orthonormal matrix, whose columns are the left singular vectors of C; Λ is a pseudo-diagonal and semi-positive definite matrix, whose pivotal entries λ ii are the singular values of C; R is an orthonormal matrix, whose columns are the right singular vectors of C. Granted that the singular values of C are sorted in a decreasing order, Equation (4) provides the sought basis set via L.
The order q of the reduced-order model is finally set by fulfilling: where is the assigned error threshold (typically small, i.e., ∼ = 1 − 5%). Being related to the snapshots in C, the accuracy provided by Equation (5) holds true only for the handled values of the unknowns in x 1 , x 2 , ..., x n . To ensure the capability to approximate the system response also for values of x different from those adopted to obtain the snapshots, so for any arbitrary combination of the ruling parameters in their domain of variation, an interpolation needs to be introduced. By assuming thatΦ is not affected by x, we state: where the approximate responseĉ(x) and the weighting functionsâ(x) depend on x through an interpolation to build around the coefficients stored in matrixÂ.
In Equation (6), the reduced-order basis matrixΦ is not supposed to vary significantly with x, as the snapshots collected in C are assumed to provide physically-sound and accurate information on the actual system behavior, independently of x. The combination coefficientsâ instead depend on x, as different parameter values may selectively increase the importance of each mode inΦ. Since the system response is supposed to smoothly depend on O and E up to pull-in, a smooth interpolationâ(x) is looked for within their ranges of variation. Accordingly, system nonlinearities linked to the effect of x on the C-V curve are covered by the interpolation functionsâ(x); nonlinearities related instead to the C-V plot itself at assigned x, are (approximately) accounted for when building the modes retained inΦ.

Kriging Interpolation
To build the functionsâ(x) or an approximation for them, kriging is discussed next. This methodology already showed a high performance, in terms of accuracy and robustness, when adopted to obtain surrogates for nonlinear systems [44,45]. In what follows, we denote withα(x) the obtained approximating functions, to distinguish them from the correct onesâ(x); since the sought smoothing is aimed at minimizing the interpolation errors within the whole domain of variation of the parameters, the computed solutions may not even match some of those collected by the snapshots. Kriging looks particularly attractive for the present investigation as it provides a probabilistic framework to buildα(x), which obviously depends on the training data obtained with the full-order simulator.
As the standard goal of kriging is to furnish an estimate of a scalar function, we build an individual emulator for each entry of the interpolation vector. Hence,α i (x), i = 1, 2, ..., q, is assumed to be a sample path from a Gaussian process, which needs to be characterized in terms of its mean and autocovariance functions. To get these features, the available coefficients of matrixÂ (see Equation (3)) are exploited. For ease of notation, the index i is dropped in what follows and reference is made to the scalar functionα(x); according to the preceding discussion, the procedure reported in what follows has to be repeated q times, handling the interpolation functions one-by-one.
With universal kriging, it is assumed that the Gaussian process under investigation is centered around a regression model, known as a trend [46], in compliance with: where .., f s (x)} are the regression functions, and β = {β 1 , ..., β s } T are the corresponding regression coefficients. The number of regression functions is assumed s ≤ n (where n is the number of observations), so as to avoid managing an under-determined problem. Function Z(x) has zero mean, and a variance σ 2 α R(|x − x |, ξ) depending, besides σ 2 α and ξ, on the distance between the two points x and x in the parameters space. Since the correct representation of the data critically depends on the type of correlation function R, parameters ξ, β and σ 2 α have to be determined on the basis of the chosen correlation and regression functions. Various types of functions are available in the literature (see [47]): the anisotropic generalized exponential model is the common choice for the correlation function [48], and low order polynomials of x are usually adopted for the regression model [46].
The vectorâ * , representing the responses computed through the simulator (and here meant as the row of matrixÂ in Equation (3) corresponding to the function α(x) currently handled), and the relevant emulator predictionα(x), are all assumed to be normally distributed and then written as [49]: where entries of F are computed as F jk = f k (x j ), j = 1, ..., n, k = 1, ..., s; entries of the correlation matrix R are instead computed as R ij = R(x i , x j ; ξ), i, j = 1, ..., n; vector r(x) of cross-correlations between the observation point x j and the prediction point x, is provided by r j (x) = R(x, x j ; ξ), j = 1, ..., n.
The mean and variance of the conditional distribution ofα(x) =α(x) |â * , also known as mean and variance of the kriging predictor, are Gaussian as well and are respectively given by, see [46]: where: In the equations above here,β is the generalized least squares solution (with respect to R) of the regression problem: Additional technical details of kriging were gathered in [30]. In practical terms, the regression and correlation models need to be selected first, and then kriging parameters can be determined via the provided statistical inference technique. Open software programs are available to obtain µα and σ 2 α , like e.g., the Matlab toolbox DACE [50] adopted in this work; in it, the built-in anisotropic generalized exponential correlation model and constant regression functions were used.
As far as the computational costs of kriging interpolation are concerned, it is worth noting that the procedure is used to derive metamodels for the weighting coefficients of the POMs, rather than directly for the response of the FE model (i.e., for data points in the C-V curves). Computational resources are therefore saved as kriging models that have to be built as many times as the number q of POMs retained in the reduced-order model, which are typically a few (on the order of q =2-4 for many applications, see also [43,51]), rather than as the number of the data points to define the C-V curves, which are instead many more to properly catch the nonlinear response of the device. This hybrid use of POD and kriging for model order reduction can be accordingly termed POD-kriging (see [26]).
To build such POD-kriging model, a set of simulations for given values of the parameter vector x is necessary; this step can be regarded as a sampling within the parameters space. Several sampling methods are available in the literature, such as Monte Carlo [52], Latin Hypercube [53], Halton [54], Sobol [55] and greedy [56] ones. In this work, sampling was based on the sparse grid Smolyak algorithm [57], which had been formerly adopted for kriging sampling in [58,59]. The most important feature of this technique is rooted in its hierarchical structure [60]: new sampling points, required if the surrogate does not attain a required level of accuracy, are plugged into the current set without any rearrangement of the already handled ones. This enriching stage of the surrogate becomes necessary whenever the initial set of the sampling points does not prove adequate to account for all the effects of model parameters on the POM mixture. Among the different sparse grid sampling procedures, the Clenshaw-Curtis one [59,61,62] was used to construct the POD-kriging model. The relevant hierarchical sequence of the sampling points within an exemplary (dimensionless) two-dimensional domain is sketched in Figure 3, at changing degree of interpolation: it is clearly shown that, by increasing the order d, the whole domain of variation of model parameters gets more densely inspected. The relevant number of sampling points to deploy at varying degree d is collected in Table 2, in order to also show that the adopted sampling method allows not to exponentially increase with d, so, with the accuracy of the reduced-order model, the computational burned related to the construction of the POD-kriging model.

Transitional Markov Chain Monte Carlo Method for Bayesian Parameter Estimation
The reduced-order modelling technique described in Section 3 was adopted to reduce a priori the computational burden of each analysis to carry out within an (offline) Bayesian parameter identification procedure, based on a TMCMC simulation [63]. The TMCMC method is one of the most advanced ones to deal with the optimization of relatively high-dimensional stochastic systems, and to sample complex probability density functions (PDFs) [64]. Within such framework, the unknown model parameters are viewed as random variables described by the relevant PDFs; this standpoint varies from deterministic approaches, dealing instead with the unknown parameters as deterministic variables, and generally failing to address the ill-posedness of the problem when noisy data and modeling inaccuracies are involved.
Within the proposed probabilistic inverse analysis, for each device, a population of possible solutions (in terms of unknown parameters x) is generated and then evolved according to the information collected through the C-V plot; the values providing the best fit to the experimental points are finally selected as the solution of the identification procedure. If C exp represents the system observations, namely the values of the capacitance change measured at varying actuation voltage, by applying the Bayes' rule [65], one can sequentially update the probability of x conditioned on C exp through: where p(x, M) represents the prior PDF of x, which accounts for the knowledge of the parameters before accounting for the observations; p(C exp |x, M) is the likelihood, which describes the probability of witnessing the data C exp , given the parameters x in the model M. The proportionality factor, implicitly linking terms on the left-and right-hand sides of Equation (14), stands for the evidence of the model M to represent the measurements [34]. Within each successive iteration of the (reduced-order) model updating technique, Equation (14) converts the prior knowledge p(x, M) into the posterior density p(x|C exp , M), having observed the data in C exp . In relation to this Bayesian framework, two important issues to address are the type of the likelihood, and how to properly reconstruct the posterior PDF. By relying on the central limit theorem, the likelihood can be assumed Gaussian; interested readers can find a detailed discussion on the choice of the likelihood function in [66]. The posterior distribution can be instead reconstructed using stochastic simulation methods, able to generate samples distributed according to such PDF. MCMC algorithms are typically recommended for this type of Bayesian simulations (see e.g., [34]): they involve a random walk (i.e., a succession of random steps) through the PDF, favoring values with higher probability. By repeating these random walks, every point in the parameters space is hit with a frequency proportional to its probability, so that the stationary distribution of the Markov chains is equal to (or at least it is proportional to) the target PDF.
Within this class of algorithms, the TMCMC relies on sampling from a sequence of Γ non-normalized intermediate PDFs p γ (x) according to, see [63]: where the tempering parameters η γ monotonically increase with γ (so that 0 = η 0 < η 1 < ... < η Γ = 1), to conceptually resemble simulated annealing [67]. By managing the experimental measurements collected in C exp , the TMCMC adopts these tempering parameters to dilute the effect of data being used in the Γ-1 sampling sequences.
We assume now that N s samples x (γ) k , k = 1, ..., N s are taken from each intermediate PDF p γ (x); accordingly, this is done also at the initiation stage handling p(x, M). A technique similar to importance sampling is then used to generate the distribution p γ+1 (x), where the importance weights are given by: and next normalized through:w By adopting a standard importance sampling procedure, a resampling stage is necessary to move the samples with the highest normalized weights onto the next intermediate distribution. This way, the method would soon face the well-known sample degeneracy problem [68]. To avoid such issue, the TMCMC procedure handles each resampled value x (γ) k as the starting point of a Markov chain, which evolves in accordance with the Metropolis-Hastings algorithm [69], and along which each normalized importance weight determines the relevant chain length. A major feature of the present method is that alternate Markov chains, growing towards the high probability regions of the posterior, allow for also reconstructing multimodal PDFs. The algorithmic details of the TMCMC method used in this work is summarized in the Appendix A (see also [63]).

Results: Parameter Identification via POD-Kriging and TMCMC Analysis
Referring to the micromechanically governed problem introduced in Section 3, the high-fidelity FE model driving the parameter identification procedure was developed according to [30] (see also [25]). To study the in-plane motion of the rotor induced by the top and bottom stators, a two-dimensional discretization was adopted. The movable structure was modeled with quadratic triangular elements with displacement degrees of freedom only, whereas the gaps at capacitors were discretized with linear elements featuring both displacement and voltage degrees of freedom. To describe the configuration changes induced by the applied voltage, which leads to a consistent narrowing of the aforementioned gaps, the number of elements across the thickness of the electrical domains was selected to provide accurate and mesh-independent results, up to situations close to pull-in. Concerning the mechanical domain, to account for the possible stress intensification close to the re-entrant corners at the final cross-sections of the beam, the mesh was locally refined. This overall resulted into a large number of degrees of freedom, amounting to around 250,000, somehow depending on the overetch value that affects the structural geometry.
By adopting the proposed POD-kriging method, the computing time for the whole analysis up to pull-in, has been decreased from 3600 s to a few milliseconds on a personal computer featuring an Intel ® Core TM i7 3.00 GHz as processor and Windows 10 as operating system. The ensemble of snapshots used for POD-kriging surrogate modeling was generated using the full-order  Table 2), adopting an interpolation of degree d = 4 in the domain of variation of the parameters (see Figure 3), and leading to an error < 1% in the POD truncation by keeping the first two POMs only in the reduced-order model (i.e., q = 2). The shapes of these POMs are illustrated in Figure 4, where the C-V curve relevant to a device featuring O = 0.15 µm and E = 169 GPa and its POD representation are compared. To assess the contribution to the overall response provided by each mode, the two POMs are also reported separately. While the amplification of each POM represented in Figure 4 is specific for the aforementioned values of the unknown parameters, a similar trend was shown by all the other simulations.
The kriging interpolation functionsα(x) (see Section 3.2) are reported in Figure 5 over the whole domain of variation introduced before. It is shown thatα 1 nonlinearly varies with O and E;α 2 instead varies almost linearly with O, being almost independent of E. This latter feature, together with the difference in the amplitudes of the two interpolation functions in the studied range, allow stating that, in case of variation of the Young's modulus E only, the C-V curve can be fairly well presented using one POM only, whose amplitude nonlinearly depends on the value of E itself.
To estimate the specimen-dependent unknown values of O and E, the TMCMC algorithm was run using 10,000 samples for each stage. Thanks to the low computational burden associated with the POD-kriging model, the whole parameter identification procedure took up to 9 min for the analysis of each device on the same personal computer mentioned above. In the analysis, the measurement noise was assumed to be an additive white Gaussian process, with zero mean and standard deviation σ m = 10 −3 fF, in accordance with the experimental details provided in [25]. Since no prior knowledge on the unknown parameters was available, their initial distribution in the domain of variation was considered uniform.  The results of the TMCMC analyses are plotted in Figure 6, in terms of the spreading of the samples at the end of the identification procedure. Results are relevant to nine specimens, out of the ten experimentally tested; those relevant to the last specimen will be discussed later in this section, and they are not included here as the procedure did not provide convergence towards feasible values of the two unknown parameters, at least within the considered domain of variation. The same color coding adopted in Figure 2 is managed here, as already commented on, to provide a visual correlation between the C-V response and the obtained estimates of O and E. Outcomes show that the values of overetch and Young's modulus vary from one specimen to another: for all the cases, O turns out to be bounded between −0.1 µm and 0, while E significantly varies between 130 GPa and 160 GPa. Sample distributions also testify to a negative correlation between O and E: larger values of O correspond to lower values of E, and vice versa. Such correlation between the two unknown parameters and the dispersion of the samples corresponding to each device reveal a potential ill-posedness of the problem due to the existence of locally flat areas of the objective function implicitly handled in the identification procedure. Dealing with such objective function by means of a deterministic approach (see e.g., [24]) often fails to provide a reliable estimation of O and E for all the specimens.
The final specimen-dependent estimates can be better assessed by looking at the distributions of the samples, shown together with the histograms of parameter distribution in Figure 7. Such histograms provide a complete insight into the statistical properties of sample scattering. If estimates of O and E are given in terms of mean value and variance only, confidence intervals can be established, but they are affected by the presence of outliers (see e.g., Figure 7c) improperly estimated in the case of multimodal distributions (see e.g., Figure 7b), or in the case of heavily skewed distributions (see e.g., Figure 7e). One may instead adopt the mode of each posterior distribution as a more appropriate estimate, as it represents the value that appears most often in it. Accordingly, the estimated values O m and E m , along with the relevant standard deviations are collected in Table 3 for the nine devices. The reported results show that overetch varies in the interval (−0.122 µm −0.020 µm), while the Young's modulus varies in the interval (135.06 GPa 158.65 GPa).  Figure 8 finally collects the results corresponding to the mentioned device for which the parameter estimation process did not provide successful outcomes. The TMCMC samples are concentrated at the border of the considered domain of variation, which had been previously set in compliance with the features of the micro-fabrication process. Therefore, this particular sample distribution is assumed to be unrealistic due to some peculiar defects probably featured by the device. Even if the full-order model is supposed to properly simulate the electro-mechanical response of the device, the considered unknown parameters may not completely describe the micromechanically-governed imperfections for the present case; for instance, an initial offset of the rotor may have been induced by gradients of the residual stresses in the polysilicon film.
Although improvements of the FE model, to allow for additional sources of uncertainty at the microscale, can lead to better parameter estimation results, the framework here proposed, adopting a hybrid model order reduction technique and TMCMC simulations, has clearly shown its capabilities at accommodating detailed numerical models and Bayesian inverse analysis for parameter estimation at the microscale.

Conclusions
In this paper, we have proposed a fully stochastic approach to estimate microscale uncertainties linked to the polycrystalline morphology of the columnar film constituting the movable structures of inertial MEMS, and to the etching phase of the relevant micro-fabrication process. Estimates of the overetch depth O (assumed to be isotropic and homogeneous in the plane parallel to the substrate surface) and of the overall film stiffness (ruled by its Young's modulus E) were obtained through a transitional Markov chain Monte Carlo approach, so as to get full insights into the probability distributions of the said parameters. By this approach, the effects of the (rather limited) measurement noise were also accounted for.
Ten devices, all supposed to be nominally identical, were experimentally tested to collect information on the signature of the aforementioned uncertain parameters on the input-output characteristics of the devices. By means of a standard electrostatic actuation, the responses, in terms of the capacitance change C induced by an applied voltage V, were obtained to feed the TMCMC analysis. As a term of comparison for the experimental data, a high fidelity finite element model, parametrized by the values of O and E, was adopted; within such a model, the nonlinear effects induced by electrostatic loading and by the mechanical deformation were fully accounted for.
To speed up the procedure, the mentioned high fidelity solutions were adopted to build a reduced-order model guaranteeing the required accuracy to obtain robust estimations of the unknowns. First, the C-V curves were processed via proper orthogonal decomposition to identify the fundamental modes governing the nonlinear system response, independently of the values of O and E. Next, this set of data was interpolated in the domain of variation of the parameters through kriging; specifically, the Clenshaw-Curtis sparse grid sampling method was used to address the deployment of the high fidelity analyses in the parameters' space.
Results of this hybrid TMCMC/POD-kriging approach have shown that the ten samples, even if taken from the same wafer and even if nominally identical, were characterized by a response that is clearly governed by different values of O and E, not only in terms of mean (or mode) values, but also in terms of potential scattering around them.
Out of the ten devices here analyzed, the proposed procedure led to unsatisfactory outcomes only in one case, for which the estimates were considered too close to the boundary of their physically-sound domain of variation. This issue will be further considered in future investigations, wherein the current numerical model is going to be enhanced by allowing for additional uncertainty parameters governing the mechanical response of the polysilicon film (like e.g., anchor compliance) and the actual geometry of the movable parts (like e.g., either anisotropic or non-homogeneous overetch depths).