Optimal Sensor Placement for Reliable Virtual Sensing Using Modal Expansion and Information Theory

A framework for optimal sensor placement (OSP) for virtual sensing using the modal expansion technique and taking into account uncertainties is presented based on information and utility theory. The framework is developed to handle virtual sensing under output-only vibration measurements. The OSP maximizes a utility function that quantifies the expected information gained from the data for reducing the uncertainty of quantities of interest (QoI) predicted at the virtual sensing locations. The utility function is extended to make the OSP design robust to uncertainties in structural model and modeling error parameters, resulting in a multidimensional integral of the expected information gain over all possible values of the uncertain parameters and weighted by their assigned probability distributions. Approximate methods are used to compute the multidimensional integral and solve the optimization problem that arises. The Gaussian nature of the response QoI is exploited to derive useful and informative analytical expressions for the utility function. A thorough study of the effect of model, prediction and measurement errors and their uncertainties, as well as the prior uncertainties in the modal coordinates on the selection of the optimal sensor configuration is presented, highlighting the importance of accounting for robustness to errors and other uncertainties.


Introduction
Virtual sensing complements physical sensing when direct observations are not available in field and laboratory experiments. It is accomplished by combining the information in output-only vibration measurements with the information contained in a model (usually a finite element model) of the system to predict response time histories of various quantities of interest (QoI). The subject of virtual sensing, also known as response reconstruction, has received considerable attention recently due to its importance in monitoring critical performance and safety-related quantities such as accelerations, displacements, structural shapes, interstory drifts, strains/stresses and fatigue damage accumulation in structures that operate in a dynamic environment.
Filtering techniques for input, state and parameter estimation and modal expansion techniques for response reconstruction are the two types of methods used extensively in the literature for virtual sensing and response reconstruction. The optimal sensor placement techniques developed in this work are based on modal expansion techniques for virtual sensing. The modal expansion technique represents the system response QoI (acceleration, displacement, strain, etc.) as a modal superposition involving the mode shapes of the structure (e.g., displacement or strain mode shapes) and a fixed number of modal coordinates. This allows the prediction (virtual sensing) of any response QoI by estimating the temporal variability of the modal coordinates from measured response time histories and borrowing the information from a finite element model for representing the rest of the quantities involved in the modal expansion.
The modal expansion method has been used in structural dynamics for reconstructing stress/strain fields using a limited number of acceleration measurements [1] or displacement/strain measurements [2][3][4]. Estimating the strains/stresses is important for fatigue damage identification. The potential of providing fatigue damage accumulation predictions in the entire body of metallic structures based on virtual strain/stress sensing was demonstrated for the first time in [5,6] by combining output-only vibration measurements, finite element models and filtering techniques with stochastic and deterministic fatigue theories. Such predictions are based on the actual operating conditions of structures and thus provide realistic fatigue estimates consistent with existing fatigue theories.
Bayesian methods have been applied for modal identification and virtual sensing using modal expansion techniques to better account for the model and measurement errors in the modal coordinates estimation [36] and response predictions [37][38][39], and have also been used for damage detection [40][41][42]. Such methods have the advantage of also predicting the uncertainty in the modal coordinates and/or predictions. In particular, the uncertainty formulation is useful for optimizing the location and number of sensors by minimizing the uncertainty in the estimates of the modal coordinates [36] and response predictions or virtual sensing [38,43,44].
OSP techniques have been developed in the past for the purpose of extracting the most informative data from a given number of sensors. A recent article [45] reviews methods and optimization algorithms for optimizing the locations of sensors in a structure. Selecting the most informative sensor configuration is often performed using information theory based approaches. Measures of information that have been used in the past for structural dynamics problems include the Fisher information matrix (FIM) [46][47][48][49][50][51], the information entropy [52][53][54][55][56][57][58][59][60][61][62][63][64], the joint entropy [65,66], the expected Kullback-Leibler divergence (KL-div) index [67][68][69], mutual information [70,71], and the value of information [72][73][74]. FIM-based OSP techniques address problems of parameter estimation of physics-based models for linear structural systems [46][47][48], modal estimation [51], as well as identification of distributed parameter systems [49,50]. The OSP techniques based on information and joint entropy address problems of parameter estimation of linear systems subjected to known [54][55][56][63,64] and unknown [53] excitation, model updating [52,61], modal identification [59], model selection [58], structural health monitoring and damage detection [60,62]. In particular, techniques to avoid sensor clustering by taking into account the redundant information contained in the measurements have been discussed in [57,59]. OSP approaches based on expected KL-div and utility theory also address problems of parameter estimation of linear [70][75][76][77] and nonlinear [78] structural systems, modal estimation [71,79] and structural health monitoring [80,81]. Finally, OSP issues related to multi-type sensor placement have also been addressed [56,65]. The aforementioned methods, however, have focused on OSP problems for parameter estimation of physics-based or mathematical models.
Using information theory formulations, the OSP problem for response reconstruction has been studied recently considering that the input forces are known [38,44]. OSP for response reconstruction under unknown input forces modeled by stochastic processes has been attempted in [82] using the kriging technique. An OSP formulation to handle the case of output-only vibration measurements has also been presented recently by combining filtering techniques with information theory [43].
In this work, considering output-only vibration measurements, a novel OSP framework is presented for accurate response reconstruction and virtual sensing in linear systems based on modal expansion techniques and information theory. The information gained by a sensor configuration is measured by the KL-div [83] between the posterior and prior probability distribution of the response QoI to be virtually sensed or reconstructed, by combining available modal expansion techniques and data. The KL-div is averaged over all possible QoI to be sensed. This is obtained using Lindley's utility function [68,84], which quantifies the average information in the data over all possible measurements generated by the prediction error model. The measure is extended to include uncertainties in the model parameters, as well as in the model/prediction and measurement errors which are assigned in the modeling process for the Bayesian estimation of the posterior distribution of the modal coordinates. The optimal sensor configuration is obtained by maximizing the utility function. For the case of uncertainties in model parameters, the utility function involves a multidimensional integral over the uncertain parameter space which can be computed using sparse grid or Monte Carlo techniques. Due to the linear relation between the response QoI and the modal coordinates, exact analytical expressions are developed for the utility function in terms of the variance of the responses of the QoI to be sensed. The structure of the analytical expressions is used to derive useful formulas that show the effect of measurement and model/prediction errors on the expected information gained from the data, as well as the dependence of the information gain on the number of sensors.
This study is organized as follows: in Section 2, the modal expansion is outlined for formulating the uncertainty in the predictions of response QoI. In Section 3, the optimal sensor placement methodology for predicting response time histories of desirable QoI (virtual sensing) with the least uncertainty is presented based on utility and information theory. In Section 4, models for the model/prediction and measurement errors required in the formulation are introduced. Section 5 discusses implementation issues and the importance of taking into account the uncertainties in the input characteristics for optimizing the sensor placement. An application on a square plate structure is used in Section 6 to demonstrate the capabilities and effectiveness of the OSP methodology for reliable virtual sensing. Conclusions are drawn in Section 7.

Bayesian Virtual Sensing Using the Modal Expansion Method
Consider a structural model used to predict the temporal variability of the response vector $z(t; \phi) \in \mathbb{R}^{n_z}$ (e.g., accelerations, displacements, strains or stresses) at $n_z$ locations given the values of a structural model parameter set $\phi$ (e.g., stiffness, mass and damping related parameters) and the excitation vector $u(t) \in \mathbb{R}^{n_u}$. Let $D = \{y(t) \in \mathbb{R}^{N_0}\}$ be the vector of response time history data collected by placing $N_0$ sensors in the structure. These data depend on the sensor configuration vector $\delta \in \mathbb{R}^{N_0}$ indicating the location and measurement direction of the sensors placed in the structure. The data may consist of acceleration, displacement or strain measurements. In what follows, a linear model of the structure is assumed. Additionally, it is assumed that the excitation time histories $u(t)$ are not available.

Modal Expansion for Virtual Sensing
Given output-only data $y(t)$, the modal expansion technique is used to predict responses at measured and unmeasured output QoI $z(t; \phi)$ using modal coordinates and mode shape vectors. Based on the modal expansion technique, the measured displacement or strain response time histories at $N_0$ degrees of freedom (DOFs) are given with respect to the modal coordinates by the mode superposition equation
$$y(t) = L(\delta)\, \Phi(\phi)\, \xi(t) + e(t) \tag{1}$$
where $\xi(t) \in \mathbb{R}^{m}$ is the vector of $m$ modal coordinates satisfying the modal equations, $L(\delta) \in \mathbb{R}^{N_0 \times n}$ is the observation matrix that maps the displacements at all $n$ model DOF to the measured displacement or strain quantities indicated by the sensor location vector $\delta$, $\Phi(\phi) \in \mathbb{R}^{n \times m}$ is the displacement mode shape matrix corresponding to the $n$ model DOF and $m$ contributing modes, and $e(t)$ is a multivariate zero-mean Gaussian noise term with covariance matrix $Q_e$ that accounts for measurement and model errors. The modal equation can be written as
$$\ddot{\xi}(t) + Z\, \dot{\xi}(t) + \Lambda\, \xi(t) = \Phi^T(\phi)\, M\, u(t) \tag{2}$$
where $\Lambda$ is the diagonal matrix of the squares $\omega_r^2$ of the modal frequencies $\omega_r$, $Z$ is the diagonal matrix with the $r$-th diagonal element equal to $2 \omega_r \zeta_r$, $\zeta_r$ is the modal damping ratio, and $M \in \mathbb{R}^{n \times n_u}$ is a matrix of zeros and ones associating the independent excitations in the vector $u(t)$ to the DOF of the structural model. Displacement, strain and/or stress predictions at output locations or DOF are given by the prediction equation
$$z(t) = \Psi(\phi)\, \xi(t) + \varepsilon(t) \tag{3}$$
where $\Psi(\phi) \in \mathbb{R}^{n_z \times m}$ contains the corresponding displacement, strain and/or stress mode shapes that relate the modal coordinates to the displacement, strain and/or stress QoI, and $\varepsilon(t)$ is a zero-mean prediction error with covariance matrix $Q_\varepsilon$ accounting for model error.
The mode shape matrices $\Phi(\phi)$ and $\Psi(\phi)$ are available by analyzing the model (e.g., finite element model) of the structure. Note that the formulation in Equations (1) and (3) can also be used when the available measurements and predictions consist of accelerations. In this case the modal vector $\xi(t)$ in Equations (1) and (3) refers to the second derivatives of the modal coordinates with respect to time.

Bayesian Virtual Sensing
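Bayesian inference is next used to estimate the modal coordinate vector $\xi(t) \equiv \xi(t; \delta, \phi)$ and its uncertainties based on the data collected from a sensor configuration $\delta$, and then to propagate these uncertainties to predictions of the output QoI $z(t) \equiv z(t; \delta, \phi)$. The posterior probability distribution function (PDF) $p(\xi(t) \mid y(t), \delta, \phi)$ quantifying the uncertainty in the modal coordinates $\xi(t)$ at time $t$, given the data $y(t)$, the sensor configuration $\delta$ and the model parameters $\phi$, takes the form
$$p(\xi(t) \mid y(t), \delta, \phi) \propto p(y(t) \mid \xi(t), \delta, \phi)\, p(\xi(t) \mid \phi) \tag{4}$$
where, using the assumption that $e(t)$ in Equation (1) follows a Gaussian distribution $N(e(t) \mid 0, Q_e)$ with mean $0$ and covariance $Q_e$, the likelihood takes the form
$$p(y(t) \mid \xi(t), \delta, \phi) = N\big(y(t) \mid L(\delta) \Phi(\phi) \xi(t),\, Q_e\big) \tag{5}$$
The prior PDF $p(\xi(t) \mid \phi)$ is postulated to be a zero-mean Gaussian $p(\xi(t) \mid \phi) = N(\xi(t) \mid 0, S)$ with covariance matrix $S$. Substituting Equation (5) into Equation (4) it is straightforward to show that the modal coordinates follow a multivariate normal distribution [37,43]
$$p(\xi(t) \mid y(t), \delta, \phi) = N\big(\xi(t) \mid \hat{\xi}(t; \delta, \phi),\, \Sigma_{\xi|D}(\delta, \phi)\big) \tag{6}$$
with mean
$$\hat{\xi}(t; \delta, \phi) = \Sigma_{\xi|D}(\delta, \phi)\, \Phi^T(\phi) L^T(\delta)\, Q_e^{-1}\, y(t) \tag{7}$$
and covariance matrix
$$\Sigma_{\xi|D}(\delta, \phi) = \big[ \Phi^T(\phi) L^T(\delta)\, Q_e^{-1}\, L(\delta) \Phi(\phi) + S^{-1} \big]^{-1} \tag{8}$$
Using Equations (3) and (6) to propagate uncertainty to the output QoI $z(t)$, it can be readily obtained that $z(t)$ also follows a multivariate normal distribution
$$p(z(t) \mid y(t), \delta, \phi) = N\big(z(t) \mid \hat{z}(t; \delta, \phi),\, \Sigma_{z|D}(\delta, \phi)\big) \tag{9}$$
with mean that depends on the data,
$$\hat{z}(t; \delta, \phi) = \Psi(\phi)\, \hat{\xi}(t; \delta, \phi) \tag{10}$$
and covariance matrix $\Sigma_{z|D}(\delta, \phi)$ given by
$$\Sigma_{z|D}(\delta, \phi) = \Psi(\phi) \big[ \Phi^T(\phi) L^T(\delta)\, Q_e^{-1}\, L(\delta) \Phi(\phi) + S^{-1} \big]^{-1} \Psi^T(\phi) + Q_\varepsilon \tag{11}$$
In particular, the variance of the $i$-th element $z_i(t; \delta, \phi)$ of the response vector $z(t; \delta, \phi)$ is given by the $i$-th diagonal element $\Sigma_{z_i|D}(\delta, \phi)$ of the covariance matrix $\Sigma_{z|D}(\delta, \phi)$ as follows
$$\Sigma_{z_i|D}(\delta, \phi) = \psi_i^T(\phi) \big[ \Phi^T(\phi) L^T(\delta)\, Q_e^{-1}\, L(\delta) \Phi(\phi) + S^{-1} \big]^{-1} \psi_i(\phi) + Q_{\varepsilon_i} \tag{12}$$
where $\psi_i(\phi)$ denotes the $i$-th column of the matrix $\Psi^T(\phi)$, and $Q_{\varepsilon_i}$ is the $i$-th diagonal element of the matrix $Q_\varepsilon$. For a Gaussian prior PDF $N(\xi(t) \mid 0, \Sigma_\xi)$ of the modal coordinate vector $\xi(t)$, the prediction of the QoI $z$ prior to the data can be readily obtained from Equation (3) to be Gaussian with mean zero and covariance matrix $\Sigma_z(\phi) = \Psi(\phi) \Sigma_\xi(\phi) \Psi^T(\phi) + Q_\varepsilon$, where $\Sigma_\xi(\phi) = S$ is the covariance matrix corresponding to the assigned prior distribution.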
In particular, the variance of the $i$-th element $z_i(t; \phi)$ prior to the data is given by
$$\Sigma_{z_i}(\phi) = \psi_i^T(\phi)\, S\, \psi_i(\phi) + Q_{\varepsilon_i} \tag{13}$$
The posterior and prior variances $\Sigma_{z_i|D}(\delta, \phi)$ in Equation (12) and $\Sigma_{z_i}(\phi)$ in Equation (13), described in terms of the parameters $\phi$ and the sensor locations $\delta$, are the main quantities used in the next section to solve the optimal sensor location problem for virtual sensing and response reconstruction. It is clear that these variances are independent of the measurements/data $y(t)$ and depend only on the structural model parameters $\phi$, the model and measurement error covariances $Q_e$ and $Q_\varepsilon$, as well as the covariance matrix $S$ of the prior probability distribution of the modal coordinates $\xi(t)$. For practical convenience and without loss of generality, stationarity conditions are assumed, where the covariance matrices $Q_e$, $Q_\varepsilon$ and $S$ are independent of time $t$. As a result, the posterior and prior variances defined in Equations (12) and (13) do not depend on time $t$. The parameters that define the model/prediction and measurement error covariances can be included in the parameter set $\phi$. A probability distribution $p(\phi)$ can be postulated to quantify the uncertainties in the values of model and input characteristics involved in $\phi$.
Note that the matrix $\Phi^T(\phi) L^T(\delta)\, Q_e^{-1}(\delta, \phi)\, L(\delta) \Phi(\phi)$ in Equation (12) is nonsingular only if the number of sensors is greater than or equal to the number of modes. Thus, in the absence of prior information ($S^{-1} = 0$), the condition $N_0 \geq m$ should be met in order for the system to be identifiable [36]. The prior covariance matrix $S$ is particularly important when the condition $N_0 \geq m$ is not met. This matrix contributes subjective information from the prior that allows the inversion of the matrix appearing in the first term of Equation (11) for values of $N_0 < m$.
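As an illustration of Equations (12) and (13), the following minimal Python sketch evaluates the posterior and prior variances of the QoI for a toy problem. It is not the implementation used in this work: the mode shapes, sensor locations and error levels are hypothetical values chosen only for demonstration, displacement sensors are assumed to sit directly at model DOFs so that $L(\delta)\Phi$ reduces to a row selection of $\Phi$, and the function name `qoi_variances` is introduced here for convenience.

```python
import numpy as np

def qoi_variances(Phi, Psi, sensor_dofs, Q_e, S, q_eps):
    """Posterior and prior variances of each QoI z_i (Eqs. (12) and (13)).

    Phi: (n, m) displacement mode shapes; Psi: (n_z, m) QoI mode shapes;
    sensor_dofs: measured DOF indices (hypothetical stand-in for L(delta));
    Q_e: (N0, N0) measurement/model error covariance; S: (m, m) prior covariance;
    q_eps: (n_z,) prediction error variances.
    """
    A = Phi[sensor_dofs, :]                       # L(delta) @ Phi
    info = A.T @ np.linalg.solve(Q_e, A)          # Phi^T L^T Q_e^{-1} L Phi
    Sigma_xi_post = np.linalg.inv(info + np.linalg.inv(S))
    # psi_i^T Sigma psi_i for every row psi_i of Psi
    post = np.einsum("im,mk,ik->i", Psi, Sigma_xi_post, Psi) + q_eps
    prior = np.einsum("im,mk,ik->i", Psi, S, Psi) + q_eps
    return post, prior

# Toy data (all values are assumptions for illustration only)
rng = np.random.default_rng(0)
n, m = 6, 3
Phi = rng.standard_normal((n, m))   # stand-in for FE mode shapes
Psi = Phi                           # predict displacements at all DOFs
sensor_dofs = [0, 3]                # N0 = 2 sensors, fewer than m = 3 modes
Q_e = 0.01 * np.eye(len(sensor_dofs))
S = np.eye(m)
q_eps = np.full(n, 1e-4)

post_var, prior_var = qoi_variances(Phi, Psi, sensor_dofs, Q_e, S, q_eps)
print(np.round(post_var / prior_var, 3))  # variance ratio per QoI
```

With $N_0 = 2$ sensors and $m = 3$ modes the data term alone is singular in this example, and it is the prior covariance $S$ that makes the inversion possible, in line with the remark above.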

Expected Utility Using Information Gain
Information theory is next combined with utility theory to measure the usefulness of a sensor configuration for reliable virtual sensing that is robust to modeling and measurement uncertainties. The objective is to select the sensor locations that maximize the information contained in the data for predicting with the least uncertainty the output response QoI at desirable locations. A measure of the information gain for estimating a response QoI $z_i$, given a set of data $y$ and the model parameters $\phi$, is the KL-div [83] between the posterior and prior probability distributions of the output QoI $z_i$, defined for an experimental design $\delta$ as
$$D_{KL}(z_i \mid y, \delta, \phi) = \int p(z_i \mid y, \delta, \phi)\, \ln \frac{p(z_i \mid y, \delta, \phi)}{p(z_i \mid \phi)}\, dz_i \tag{14}$$
For several output QoI included in the vector $z$, the measure can be extended to the weighted average of the information gain over all output QoI, given as
$$D_{KL}(z \mid y, \delta, \phi) = \sum_{i=1}^{n_z} w_i\, D_{KL}(z_i \mid y, \delta, \phi) \tag{15}$$
where the values of the weights $w_i$ are selected to quantify the importance of the $i$-th QoI $z_i$ in the design of the sensor configuration.
In the initial design phase the data are not available. Instead they can be generated by the prediction error model in Equation (1) for given values of the model parameters $\phi$ and the probability distribution of the prediction error term $e(t)$. Following Lindley's work [68], utility theory is used to measure the usefulness of the experiment, with the utility function selected to be the expected value of the information gain in Equation (15) over all possible values of the experimental data. Extending the utility function to include the uncertainty in the model parameters $\phi$ as well, one introduces the expected utility function
$$U(\delta) = \sum_{i=1}^{n_z} w_i\, U_i(\delta) \tag{16}$$
that quantifies the usefulness of learning from the data for predicting the output QoI included in the vector $z$, in the presence of model and measurement uncertainties, where
$$U_i(\delta) = \int \!\! \int D_{KL}(z_i \mid y, \delta, \phi)\, p(y, \phi \mid \delta)\, dy\, d\phi \tag{17}$$
is the expected utility function that accounts for a component $z_i$ of the response QoI $z$, $p(y, \phi \mid \delta) = p(y \mid \phi, \delta)\, p(\phi)$, $p(y \mid \phi, \delta)$ is the uncertainty in the outcome $y$ given the model parameters $\phi$, and $p(\phi)$ is the uncertainty in the model parameters. The utility function defined in Equations (16) and (17) is an average of the information gain over all the possible data outcomes. It is shown in Appendix A that the expected utility function $U_i(\delta)$ can be formulated in terms of the change in the expected information entropy before and after the data are collected, given by
$$U_i(\delta) = \int \!\! \int \big[ H_{z_i}(\phi) - H_{z_i|D}(\delta, y, \phi) \big]\, p(y, \phi \mid \delta)\, dy\, d\phi \tag{18}$$
where $H_{z_i}(\phi)$ is the prior information entropy in $z_i(t)$ given the model parameter set $\phi$, and $H_{z_i|D}(\delta, y, \phi)$ is the posterior information entropy in $z_i(t)$ given the data $y$ and the model parameter set $\phi$. For a Gaussian probability distribution of the response $z_i(t)$, the posterior information entropy given the values of the data set and the model parameter set $\phi$ is given in terms of the $i$-th diagonal component of the covariance matrix $\Sigma_{z|D}(\delta, \phi)$ of the error in the estimate of $z$ as follows
$$H_{z_i|D}(\delta, y, \phi) = \frac{1}{2} \ln(2 \pi e) + \frac{1}{2} \ln \Sigma_{z_i|D}(\delta, \phi) \tag{19}$$
Thus it depends on the sensor locations and the values of the parameter set $\phi$, while it is independent of the data.
For the prior information entropy $H_{z_i}(\phi)$ an expression similar to Equation (19) holds with the posterior $\Sigma_{z_i|D}(\delta, \phi)$ replaced by the prior $\Sigma_{z_i}(\phi)$.
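Taking into account that the prior information entropy $H_{z_i}(\phi)$ in Equation (18) is constant, independent of the sensor configuration $\delta$, and that the posterior information entropy $H_{z_i|D}(\delta, y, \phi)$ does not depend on the data, the expected utility function $U_i$ finally takes the form
$$U_i(\delta) = \bar{H}_{z_i} - \bar{H}_{z_i|D}(\delta) \tag{20}$$
where
$$\bar{H}_{z_i|D}(\delta) = \int H_{z_i|D}(\delta, \phi)\, p(\phi)\, d\phi \tag{21}$$
and
$$\bar{H}_{z_i} = \int H_{z_i}(\phi)\, p(\phi)\, d\phi \tag{22}$$
are respectively the expected posterior and prior information entropies over all possible values of the model parameters $\phi$, weighted by the PDF $p(\phi)$ of the model parameters.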
Substituting Equation (20) into Equation (16), the expected utility function that accounts for all response entries in the vector $z$ takes the form
$$U(\delta) = -\Delta H_{z|D}(\delta) = -\frac{1}{2} \int \ln r(\delta, \phi)\, p(\phi)\, d\phi \tag{23}$$
where $r(\delta, \phi)$ is defined as the ratio
$$r(\delta, \phi) = \prod_{i=1}^{n_z} \left[ \frac{\Sigma_{z_i|D}(\delta, \phi)}{\Sigma_{z_i}(\phi)} \right]^{w_i} \tag{24}$$
Using equal weight values $w_i = 1/n_z$, the utility function takes the form
$$U(\delta) = -\frac{1}{2 n_z} \int \ln \left[ \prod_{i=1}^{n_z} \frac{\Sigma_{z_i|D}(\delta, \phi)}{\Sigma_{z_i}(\phi)} \right] p(\phi)\, d\phi \tag{25}$$
The integral in Equation (23) or Equation (25) is a probability integral over the space of uncertain parameters $\phi$. This integral represents the robust information entropy change before and after the data are available, weighted over all possible values of the model parameters quantified by the PDF $p(\phi)$. The multidimensional integral can be evaluated using Monte Carlo techniques or sparse grid methods [85,86]. It is verified in Appendix B that the ratio $r(\delta, \phi)$ in Equation (24) cannot exceed the value of 1, so the information entropy change is always non-positive or, equivalently, the utility function is non-negative as expected, meaning that there can be only information gain when placing a given number of sensors in the structure.
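The Monte Carlo evaluation of the robust utility can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the uncertain parameter set is reduced to two hypothetical quantities (a measurement noise level `s` and a relative prediction error level `sigma_eps`), the errors are taken as uncorrelated, sensors measure displacements directly at model DOFs, and all numerical values and function names are invented for the example. The utility is computed as minus one half of the sample average of the weighted log variance ratios, which follows from the Gaussian entropy expressions.

```python
import numpy as np

def log_variance_ratios(Phi, Psi, sensor_dofs, s, sigma_eps, S):
    """ln(posterior variance / prior variance) for each QoI, for one parameter sample."""
    A = Phi[sensor_dofs, :]                               # L(delta) @ Phi
    Q_e = s**2 * np.eye(len(sensor_dofs))                 # uncorrelated errors (assumption)
    prior_modal = np.einsum("im,mk,ik->i", Psi, S, Psi)   # psi_i^T S psi_i
    q_eps = sigma_eps**2 * prior_modal                    # error tied to QoI intensity (assumption)
    Sigma_post = np.linalg.inv(A.T @ np.linalg.solve(Q_e, A) + np.linalg.inv(S))
    post = np.einsum("im,mk,ik->i", Psi, Sigma_post, Psi) + q_eps
    return np.log(post / (prior_modal + q_eps))

def robust_utility(Phi, Psi, sensor_dofs, S, samples, weights=None):
    """Monte Carlo estimate of the expected utility over samples of (s, sigma_eps)."""
    n_z = Psi.shape[0]
    w = np.full(n_z, 1.0 / n_z) if weights is None else np.asarray(weights, float)
    gains = [-0.5 * w @ log_variance_ratios(Phi, Psi, sensor_dofs, s, se, S)
             for s, se in samples]
    return float(np.mean(gains))

# Toy setup: every number below is a hypothetical choice
rng = np.random.default_rng(0)
Phi = rng.standard_normal((6, 3))
Psi, S = Phi, np.eye(3)
s_samples = rng.lognormal(mean=np.log(0.1), sigma=0.3, size=200)
se_samples = rng.lognormal(mean=np.log(0.05), sigma=0.3, size=200)
samples = list(zip(s_samples, se_samples))

U_two = robust_utility(Phi, Psi, [0, 3], S, samples)
U_three = robust_utility(Phi, Psi, [0, 3, 5], S, samples)
print(U_two, U_three)
```

Because the same parameter samples are reused for both configurations, the comparison between `U_two` and `U_three` is not blurred by sampling noise, which is convenient when ranking candidate sensor configurations.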

Optimal Sensor Placement
The optimal sensor configuration $\delta_{opt}$ is obtained by maximizing the utility $U(\delta)$ or, equivalently, minimizing the change in information entropy $\Delta H_{z|D}(\delta)$, with respect to the design variables $\delta$, that is
$$\delta_{opt} = \arg\max_{\delta} U(\delta) = \arg\min_{\delta} \Delta H_{z|D}(\delta) \tag{26}$$
The optimal number of sensors in the sensor configuration can be estimated by monitoring the gain in information as additional sensors are placed in the structure. Usually, after a sufficient number of sensors are placed in the structure, the information gain from additional sensors is relatively small and the process of adding sensors is terminated.
The optimization in Equation (26) may result in multiple local/global solutions [57]. The optimization problem can be solved using continuous design variables $\delta$ accounting for the location of the sensors over the physical domain of the structure or discrete design variables $\delta$ accounting for the discrete locations (e.g., DOF at nodes for placing displacement/acceleration sensors or Gauss integration points for placing strain sensors in a finite element mesh). Global optimization algorithms [87,88] as well as stochastic optimization algorithms, such as CMA-ES [89] and genetic algorithms [45][90][91][92][93], can be employed in order to avoid premature convergence to a local optimum. Alternatively, heuristic forward and backward sequential sensor placement (FSSP/BSSP) algorithms [54,57] are effective in solving the optimization problem. The heuristic algorithms bypass the problem of multiple local/global optima manifested in optimal experimental designs, providing near-optimal solutions in a fraction of the computational effort required by stochastic optimization algorithms or exhaustive search methods [52]. For a total of $N_{all}$ possible sensor positions and $N_0$ sensors to be placed in the structure, with $N_0$ relatively small compared to $N_{all}$, the FSSP algorithm requires approximately $N_F = N_0 N_{all}$ function evaluations, while the BSSP algorithm requires approximately $N_B = N_{all}(N_{all} + 1)/2$ function evaluations [54]. The computational effort for the BSSP is approximately $N_B / N_F \approx 0.5 N_{all} / N_0$ times larger than that of the FSSP, so FSSP should be preferred for $N_0 \ll N_{all}$. However, the estimate from BSSP may in some cases be better than the FSSP estimate, so both algorithms should be run and the better of the two estimates retained. More details are presented in the applications in Section 6.
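The two heuristic algorithms can be sketched in a few lines of Python. This is a generic greedy illustration, assuming a user-supplied `utility` callable that maps a list of candidate indices to a scalar; the toy utility below (random mode shapes, uncorrelated errors, unit prior covariance) and all names are hypothetical and serve only to exercise the two routines.

```python
import numpy as np

def fssp(utility, n_all, n_sensors):
    """Forward sequential placement: add, one at a time, the sensor that raises the utility most."""
    selected, evals = [], 0
    for _ in range(n_sensors):
        candidates = [c for c in range(n_all) if c not in selected]
        scores = [utility(selected + [c]) for c in candidates]
        evals += len(candidates)
        selected.append(candidates[int(np.argmax(scores))])
    return selected, evals

def bssp(utility, n_all, n_sensors):
    """Backward sequential placement: start from all sensors and drop the least useful one each step."""
    selected, evals = list(range(n_all)), 0
    while len(selected) > n_sensors:
        scores = [utility([d for d in selected if d != c]) for c in selected]
        evals += len(selected)
        selected.pop(int(np.argmax(scores)))   # removal that keeps the utility highest
    return selected, evals

# Toy utility (all values hypothetical): Gaussian information gain with equal weights
rng = np.random.default_rng(1)
Phi = rng.standard_normal((8, 3))              # 8 candidate DOFs, 3 modes
S, s2, q_eps = np.eye(3), 0.05**2, 1e-4

def utility(dofs):
    A = Phi[dofs, :]
    Sigma_post = np.linalg.inv(A.T @ A / s2 + np.linalg.inv(S))
    post = np.einsum("im,mk,ik->i", Phi, Sigma_post, Phi) + q_eps
    prior = np.einsum("im,mk,ik->i", Phi, S, Phi) + q_eps
    return -0.5 * np.mean(np.log(post / prior))

sel_f, evals_f = fssp(utility, n_all=8, n_sensors=2)
sel_b, evals_b = bssp(utility, n_all=8, n_sensors=2)
best = max((sel_f, sel_b), key=utility)        # keep the better of the two estimates
print(sel_f, evals_f, sel_b, evals_b, best)
```

In this toy case FSSP uses 8 + 7 = 15 utility evaluations, while BSSP uses 8 + 7 + 6 + 5 + 4 + 3 = 33 because the backward pass in this sketch stops as soon as $N_0 = 2$ sensors remain.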

Model Prediction Error Formulation
Following the analysis in Section 2.1, the prediction error $e = e_{meas} + e_{model}$ in Equation (1) is partly due to a term, $e_{meas}$, accounting for the measurement error and partly due to a term, $e_{model}$, accounting for the model error. Assuming that measurement and model errors are independent and zero-mean Gaussian vectors with covariance matrices $Q_{e,meas}$ and $Q_{e,model}$, the covariance of the total prediction error is
$$Q_e = Q_{e,meas} + Q_{e,model} \tag{27}$$
To proceed with the optimal sensor placement design one has to select the values of the covariance matrices $Q_{e,meas}$ and $Q_{e,model}$. The selection depends on the nature of the problem analyzed. The following selections follow the suggestions presented in [94]. For the measurement error term it is reasonable to assume independence of the values of the errors from the intensity of the response so that $Q_{e,meas} = s^2 I$, where $I$ is the identity matrix, and the level $s$ depends on the sensor accuracy and characteristics. A reasonable choice of the model error variance $Q^{(ii)}_{e,model}$ at the $i$-th DOF is to have it proportional to the square of the intensity of the QoI at DOF $i$, given by $Q^{(ii)}_{e,model} = \sigma_e^2 Q^{(ii)}_y$, where $Q^{(ii)}_y$ is the square of the intensity of the QoI at DOF $i$ and $\sigma_e$ denotes the level of model error in relation to the intensity of the QoI. In this way, the relative level $\sigma_e$ of the model error is independent of the intensity of the response. In addition, a certain degree of correlation is expected for the model errors between two neighboring locations, arising from the underlying model dynamics [57]. This correlation can be taken into account by selecting a non-diagonal covariance matrix $Q_{e,model}$. The correlation $Q^{(ij)}_{e,model}$ between the prediction errors $e_{i,model}$ and $e_{j,model}$ at DOFs $i$ and $j$, respectively, can be assumed to be
$$Q^{(ij)}_{e,model} = \sqrt{Q^{(ii)}_{e,model}\, Q^{(jj)}_{e,model}}\; \rho(d_{ij}) \tag{28}$$
so that it accounts for the spatial distance $d_{ij}$ between the DOFs $i$ and $j$, where $\rho(d_{ij})$ is a correlation function satisfying $\rho(0) = 1$.
However, in the experimental design phase where data are not available, the actual errors and correlations should be postulated in order to proceed with the design of the optimal sensor locations. Several correlation functions can be explored. For demonstration purposes in this study, the following exponentially decaying correlation function is assumed:
$$\rho(d_{ij}) = \exp\left( -\frac{d_{ij}}{\lambda} \right) \tag{29}$$
where $\lambda$ is a measure of the spatial correlation length. However, the formulation in this work is general and does not depend on the choice of the correlation model. Using the aforementioned selections, the covariance matrix $Q_e$ in Equation (27), required in Equation (12), simplifies to
$$Q_e = s^2 I + \sigma_e^2\, \bar{Q}^{1/2} R\, \bar{Q}^{1/2} \tag{30}$$
where $\bar{Q}$ denotes a diagonal matrix whose $i$-th diagonal entry is the quantity $Q^{(ii)}_y$, $\bar{Q}^{1/2}$ is the diagonal matrix whose elements are the square roots of the elements of $\bar{Q}$, and $R$ is the correlation matrix with the $(i, j)$ element $R_{ij}$ equal to $\rho(d_{ij})$.
A similar formulation can be used for the variance $Q_{\varepsilon_i}$, involved in Equation (12), of the error $\varepsilon_i$ for the predicted QoI $z_i$. In this case only the model error exists and the $i$-th diagonal element of the covariance matrix can be selected to be
$$Q_{\varepsilon_i} = \sigma_\varepsilon^2\, Q_{z_i} \tag{31}$$
where $Q_{z_i}$ is the square of the intensity of the predicted QoI $z_i(t)$, and $\sigma_\varepsilon$ is the level of model error in relation to the intensity of the predicted QoI. The formulation for the errors and the mathematical structure of the ratio $r(\delta, \phi)$ in Equation (24) can be used to show a number of useful properties for the utility function and thus for the information gain. Specifically, in Appendix B.1 it is shown that as the measurement and model/prediction errors increase for a given sensor configuration, the ratio $r(\delta, \phi)$ increases and so the utility function decreases. This indicates that the higher the errors, the smaller the information gain from the sensor configuration. Another important property, shown in Appendix B.2, is that adding a sensor to an existing sensor configuration cannot decrease the information gain, which is similar to the results presented in [54] for parameter estimation. This should be expected since a sensor added to an existing configuration provides either no information or extra information. As a result, the maximum and minimum values of the utility function are increasing functions of the number of sensors. Lastly, the spatially correlated structure of the model error, introduced in Equation (29), has the important effect of avoiding clustering of sensors, as was theoretically shown in [57] for OSP for parameter estimation.
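The assembly of the prediction error covariance described in this section can be illustrated with a short Python sketch. It assumes the exponentially decaying correlation function with correlation length `lam`, Euclidean distances between sensor coordinates, and hypothetical values for the response intensities `q_y`, the measurement noise level `s` and the model error level `sigma_e`; the function name is introduced here for convenience and is not part of the original formulation.

```python
import numpy as np

def prediction_error_covariance(coords, q_y, s, sigma_e, lam):
    """Q_e = s^2 I + sigma_e^2 * Qbar^{1/2} R Qbar^{1/2}, with R_ij = exp(-d_ij / lam)."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances d_ij between sensor locations
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    R = np.exp(-d / lam)                         # correlation matrix, R_ii = 1
    root = np.sqrt(np.asarray(q_y, dtype=float)) # diagonal of Qbar^{1/2}
    Q_model = sigma_e**2 * (root[:, None] * R * root[None, :])
    return s**2 * np.eye(len(root)) + Q_model

# Hypothetical sensor coordinates (m) and squared response intensities
coords = [[0.0, 0.0], [0.1, 0.0], [0.5, 0.0], [1.0, 1.0]]
q_y = np.array([1.0, 4.0, 9.0, 16.0])
s, sigma_e, lam = 0.01, 0.1, 0.3

Q_e = prediction_error_covariance(coords, q_y, s, sigma_e, lam)
print(np.round(Q_e, 4))
```

Sensors that are close relative to `lam` (the first two in this example) receive strongly correlated model errors, so a second sensor placed next to an existing one adds little independent information; this is the mechanism that discourages sensor clustering.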

Implementation
It should be noted that the estimate in Equation (10) of the response QoI $z$ and the error in the estimate, quantified by the covariance matrix $\Sigma_{z|D}(\delta, \phi)$ with diagonal elements given in Equation (12), depend on the measurement/model and prediction error covariance matrices $Q_e$ and $Q_\varepsilon$, as well as the covariance matrix $S$ of the prior normal distribution assumed for the modal coordinates $\xi(t)$. The effect of the values of these covariance matrices on the optimal sensor placement for response predictions is investigated in this work.
The selection of the covariance matrix $S$ of the prior normal distribution should take into account the relative contribution of the different modal coordinates to the response of the system. This contribution depends strongly on the excitation characteristics. Equations (30) and (31) also suggest that the values of the covariance matrices $Q_e$ and $Q_\varepsilon$ of the model prediction errors should be carefully selected based on the intensity of the measured and predicted responses, which are not known at the OSP design phase. To proceed with rational selections, the intensities of the measured and predicted QoI have to be considered, and these depend on the characteristics of the excitation. Thus, the characteristics of the excitation have to be considered in the analysis in order to decide on the values of the covariance matrices $S$, $Q_e$ and $Q_\varepsilon$ upon which the OSP design will be based. Failing to consider the intensity of the modal coordinates and the responses in the selection of the prior and error covariance matrices may lead to OSP designs that are based on non-rational choices of these covariance matrices.
Due to the uncertainty in the excitation characteristics the values to be assigned for the model/prediction and measurement errors involve large uncertainty. We proceed with a thorough investigation of the effect of the model/prediction and measurement errors as well as the effect of the uncertainty in the prior distribution on the information gain and the optimal sensor location. Finally, the robust design proposed in this work will take into account these uncertainties in the design of the optimal sensor configuration.
To demonstrate concepts, we assume a zero-mean stationary white noise excitation. We also assume, without loss of generality, that the location of the excitation is known. Unknown locations or multiple excitation components can also be treated in the formulation. However, such an analysis is beyond the scope of the present work. Using a linear model (e.g., a finite element model) of the structure, one can readily obtain the covariance $Q_\xi$ of the modal quantities, as well as the covariances $Q_y$ and $Q_z$ of the response QoI (displacements, velocities, strains and stresses). These matrices can be used in Equations (30) and (31) to make the proper assignment for $Q_e$ and $Q_\varepsilon$ through the proper selection of the prediction error parameters $\sigma_e$, $\sigma_\varepsilon$ and $s$. Furthermore, accepting that the excitation is white noise, it is also reasonable to select the covariance of the prior distribution for $\xi(t)$ to be proportional to the covariance $Q_\xi$ of the modal quantities $\xi(t)$, i.e.,
$$S = \alpha^2 Q_\xi \tag{32}$$
where $\alpha$ quantifies the extent of the uncertainty in the prior distribution. This assignment takes into account the participation of each mode in the vibration of the structure. The analysis for estimating the covariance matrix $Q_\xi$ of the modal coordinates and the covariance matrices $Q_y$ and $Q_z$ of various response QoI is presented next based on the finite element model of the structure and a discrete state space representation of the modal equations in Equation (2). Introducing the state vector $x_k = [\xi_k^T \;\; \dot{\xi}_k^T]^T$ at time $t = k \Delta t$, where $\Delta t$ is the sampling time of the signal, the modal equation in Equation (2) can be written in the discrete state space form
$$x_{k+1} = A x_k + B u_k \tag{33}$$
where, using zero-order hold, the state space matrices are given as
$$A = \exp(A_c \Delta t), \qquad B = (A - I) A_c^{-1} B_c, \qquad A_c = \begin{bmatrix} 0 & I \\ -\Lambda & -Z \end{bmatrix}, \qquad B_c = \begin{bmatrix} 0 \\ \Phi^T M \end{bmatrix} \tag{34}$$
An output vector QoI $h_k$ at time $t = k \Delta t$, whether it corresponds to measured quantities $y$ or predicted quantities $z$, can be written in the form
$$h_k = C x_k + D u_k \tag{35}$$
where the matrices $C$ and $D$ relate the response QoI to the state vector and input load vector, respectively.
For displacement, strain or stress responses at all DOF of the structure, the matrices are $C = G\Phi\,[\,I \;\; 0\,]$ and $D = 0$, where the matrix $G$ relates the displacement DOF to the output QoI (displacements, strains and stresses). For acceleration responses, the matrices $C$ and $D$ follow from the modal equations in Equation (2); here it is assumed that $\Phi$ is mass normalized. In particular, to estimate $\xi_k$ one uses Equation (35).

Assuming a scalar stationary zero-mean Gaussian white noise excitation with variance $\sigma_{wn}^2$, the covariance $Q_x$ of the state vector under stationary conditions is given by $Q_x = \sigma_{wn}^2 \bar{Q}_x$, where $\bar{Q}_x$ can be obtained by solving the discrete Liapunov equation

$$\bar{Q}_x = A \bar{Q}_x A^T + B B^T \qquad (36)$$

Using Equation (35), the covariance of the output response QoI in the vector $h(t)$ is given by

$$Q_h = \sigma_{wn}^2 \left( C \bar{Q}_x C^T + D D^T \right) \qquad (37)$$

and is proportional to the variance $\sigma_{wn}^2$ of the discrete white noise excitation. Setting $h = y$, $h = z$ or $h = \xi$, one obtains the covariance matrices $Q_y$, $Q_z$ or $Q_\xi$ required in the error covariance matrices $Q_e$ and $Q_\varepsilon$ in Equations (30) and (31) and in the prior covariance $S$ in Equation (32).
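The covariance computation described above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the continuous-time matrices assume the standard mass-normalized modal form with modal damping (the exact form of Equation (2) is assumed here), the two modal frequencies, damping ratios and modal load vector `phi_load` are hypothetical values, and the matrix exponential and Liapunov solve are written with small NumPy helpers so that the block is self-contained.

```python
import numpy as np

def expm(M, terms=20, squarings=10):
    """Matrix exponential by scaling-and-squaring of a truncated Taylor series."""
    X = M / (2.0 ** squarings)
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ X / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def zoh_discretize(Ac, Bc, dt):
    """Zero-order-hold discretization via the augmented-matrix exponential."""
    n, m = Bc.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = Ac
    M[:n, n:] = Bc
    E = expm(M * dt)
    return E[:n, :n], E[:n, n:]          # A, B

def normalized_state_cov(A, B):
    """Solve Qbar = A Qbar A^T + B B^T (unit-variance white noise input)."""
    n = A.shape[0]
    rhs = (B @ B.T).reshape(-1)
    q = np.linalg.solve(np.eye(n * n) - np.kron(A, A), rhs)
    Q = q.reshape(n, n)
    return 0.5 * (Q + Q.T)               # symmetrize against round-off

def output_cov(C, D, Qbar, sigma_wn=1.0):
    """Covariance of h_k = C x_k + D u_k for white noise u_k."""
    return sigma_wn ** 2 * (C @ Qbar @ C.T + D @ D.T)

# Hypothetical two-mode example (all numbers are illustrative assumptions).
omega = 2.0 * np.pi * np.array([5.0, 12.0])   # modal frequencies, rad/s
zeta = np.array([0.02, 0.02])                 # modal damping ratios
phi_load = np.array([[1.0], [0.5]])           # modal load participation
m = omega.size
Ac = np.block([[np.zeros((m, m)), np.eye(m)],
               [-np.diag(omega ** 2), -np.diag(2.0 * zeta * omega)]])
Bc = np.vstack([np.zeros((m, 1)), phi_load])

A, B = zoh_discretize(Ac, Bc, dt=0.01)
Qbar = normalized_state_cov(A, B)

# Modal coordinates: C = [I 0], D = 0, giving Q_xi and the prior S = alpha^2 Q_xi.
C_xi = np.hstack([np.eye(m), np.zeros((m, m))])
D_xi = np.zeros((m, 1))
Q_xi = output_cov(C_xi, D_xi, Qbar)
alpha = 1e2
S = alpha ** 2 * Q_xi
```

In practice a library routine such as a dedicated discrete Lyapunov solver would replace the Kronecker-product solve, which scales poorly with the number of modes.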

Applications
The methodology is demonstrated for a square plate structure modeled by thin-shell finite elements (FEs). The plate is fixed at the left edge. The model is meshed with eight-node shell elements with six DOFs per node (Figure 1). To investigate the effect of mesh size on the optimal sensor placement, two models are considered, corresponding to a coarse and a fine mesh. The coarse mesh model consists of 420 elements and 441 nodes, while the fine mesh model consists of 3660 elements and 3721 nodes. Linear elastic behavior is assumed. The lowest eight natural frequencies of the coarse and fine mesh models are presented in Table 1.

Strain Predictions Using Strain Observations
Normal strain measurements and predictions are considered along the x direction at the midpoints of all finite elements of the plate surface. OSP of strain sensors is performed for predicting the strains at all finite elements of the mesh (Figure 1). The lowest eight modes are assumed to contribute to the dynamic behavior of the plate when designing the optimal sensor locations.

Model/Prediction Errors, Measurement Error and Prior Distribution
Reasonable choices of the error parameters $s$, $\sigma_e$ and $\sigma_\varepsilon$ involved in the covariance matrices $Q_e$ and $Q_\varepsilon$ in Equations (30) and (31) of the model/prediction and measurement error models are considered next. For this, it is assumed that the plate is subjected to a concentrated load applied at the bottom right corner A, as shown in Figure 1. A broadband excitation is considered, modeled by a discrete Gaussian white noise sequence with standard deviation $\sigma_{wn}$. To select the standard deviation $s$ of the measurement error, the intensities of the normal strain responses along the x direction predicted for white noise input are computed and shown in Figure 2a,b for the coarse and fine mesh, respectively. The intensity of a response QoI $z_i$ is quantified by the standard deviation $Q_{z_i}^{1/2}$, computed by solving the Liapunov Equation (36) and using Equation (37). The results in Figure 2 are normalized with respect to the intensity $\sigma_{wn}$ of the white noise input. Approximately 98% of the computed strain intensities over all plate elements are greater than $\varepsilon_{\min} \equiv Q_{z,\min}^{1/2} = 10^{-6}$, while the maximum strain intensity is approximately $\varepsilon_{\max} \equiv Q_{z,\max}^{1/2} = 2 \times 10^{-5} = 20\,\varepsilon_{\min}$. To investigate the effect of measurement error, the parameter $s$ of the error covariance matrix in Equation (30) is selected as shown in Table 2 to take four different values, corresponding to very small, small, moderate and large measurement error. The parameters $\sigma_e$ of the model error and $\sigma_\varepsilon$ of the prediction error in Equations (30) and (31) are selected as $\sigma_e = \sigma_\varepsilon = 0.01$ and $0.001$, corresponding to small and very small model/prediction errors, respectively. The case of uncorrelated prediction error is considered ($\lambda = 0$ in Equation (29)).
The intensities of the modal coordinates $\xi(t)$ predicted for white noise input are shown in Figure 2c. The intensities clearly vary considerably from mode to mode. This reinforces the point that the covariance of the prior distribution of the modal coordinates should be chosen carefully using Equation (32), so as to take into account the different intensities of each mode. These modal intensities depend strongly on the spatial and temporal excitation characteristics (number and location of excitations, frequency content, etc.). The value of $\alpha$ in Equation (32) is selected to be $\alpha = 10^2$ and $\alpha = 1$, corresponding respectively to large and small uncertainties in the prior distribution of the modal coordinates $\xi$.

Table 2. Different measurement errors $s$ assumed. $\varepsilon_{\min} = Q_{z,\min}^{1/2} \approx 10^{-6}$ is the minimum value of the element strain intensity that covers 98% of the plate surface. $\varepsilon_{\max} = Q_{z,\max}^{1/2} \approx 2 \times 10^{-5}$ is the maximum value of the strain intensity over the plate surface.

FSSP and BSSP Algorithms
The utility values obtained using the FSSP and BSSP algorithms for $\sigma_e = \sigma_\varepsilon = 0.01$ (small model/prediction errors), $s = 10^{-7}$ (moderate measurement error) and $\alpha = 10^2$ (large prior uncertainty in the modal coordinates) are compared in Figure 3 for the coarse (Figure 3a) and fine (Figure 3b) meshes. The estimates from the two algorithms differ because both algorithms are heuristic and provide approximate values. In this specific case and for the coarse mesh, the BSSP algorithm provides better solutions for the maximum and minimum utility for more than eight sensors, while the FSSP algorithm provides better solutions than the BSSP algorithm for one to seven sensors. This observation does not carry over to the fine mesh, where the FSSP algorithm provides better estimates of the minimum utility for any number of sensors, while the BSSP algorithm provides a better estimate of the maximum utility for seven sensors. Similar behavior of the FSSP and BSSP algorithms is observed for the other error cases as well.

To increase the reliability of the estimates arising from the two heuristic algorithms, the final solution is taken from the combination of the FSSP and BSSP solutions. Specifically, for each sensor configuration containing a fixed number of sensors, the final maximum utility value is taken to be $U_{\max} = \max(U_{F,\max}, U_{B,\max})$, where $U_{F,\max}$ and $U_{B,\max}$ are the maximum values estimated from the FSSP and BSSP algorithms, respectively. The optimal sensor placement is then the FSSP or BSSP placement that attains $U_{\max}$. A similar procedure is used for the minimum utility value, i.e., $U_{\min} = \min(U_{F,\min}, U_{B,\min})$. The combined FSSP/BSSP result is referred to from here on as the sequential sensor placement (SSP) estimate.
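The combination of the forward and backward heuristics can be sketched as below. This is only an illustration of the greedy logic, under assumptions: the utility used here is a hypothetical log-determinant stand-in for a Gaussian information gain (it is not Equation (12) or the utility derived in this work), and the mode shape matrix `Phi`, prior covariance `S` and `noise_var` are made-up values. Only the maximum-utility search is shown; the minimum-utility search would replace each `max` by `min`.

```python
import numpy as np

def make_utility(Phi, S, noise_var):
    """Hypothetical stand-in utility: 0.5 * log det(I + S Phi_d^T Phi_d / noise_var)."""
    def U(idx):
        P = Phi[list(idx), :]
        M = np.eye(S.shape[0]) + S @ (P.T @ P) / noise_var
        return 0.5 * np.linalg.slogdet(M)[1]
    return U

def fssp(n_all, n0, U):
    """Forward SSP: start empty, add the location that maximizes U at each step."""
    chosen, best = [], {}
    for _ in range(n0):
        candidates = [j for j in range(n_all) if j not in chosen]
        j_star = max(candidates, key=lambda j: U(chosen + [j]))
        chosen.append(j_star)
        best[len(chosen)] = (U(chosen), sorted(chosen))
    return best

def bssp(n_all, n0, U):
    """Backward SSP: start with all locations, drop the one whose removal keeps U highest."""
    chosen, best = list(range(n_all)), {}
    while len(chosen) > 1:
        j_star = max(chosen, key=lambda j: U([i for i in chosen if i != j]))
        chosen.remove(j_star)
        if len(chosen) <= n0:
            best[len(chosen)] = (U(chosen), sorted(chosen))
    return best

def ssp(n_all, n0, U):
    """Combined estimate: for each sensor count keep the better of FSSP and BSSP."""
    F, B = fssp(n_all, n0, U), bssp(n_all, n0, U)
    return {k: max(F[k], B[k], key=lambda r: r[0]) for k in range(1, n0 + 1)}

rng = np.random.default_rng(0)
Phi = rng.standard_normal((25, 4))      # 25 candidate locations, 4 modes (made up)
U = make_utility(Phi, S=1e2 * np.eye(4), noise_var=1e-2)
result = ssp(25, 6, U)                  # {k: (utility, sensor indices)}
```

Keeping the full history of both searches makes the combination step a simple per-count comparison, at the price of running the more expensive backward search down to one sensor.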
Using BSSP raises the computational cost substantially in relation to FSSP. Specifically, comparing the numbers of function evaluations $N_F$ and $N_B$ for the FSSP and BSSP algorithms, one has for $N_0 = 30$ sensors that $N_B/N_F \approx 0.5\,N_{all}/N_0 \approx 7$ for the coarse mesh and $N_B/N_F \approx 60$ for the fine mesh. The number of function evaluations for BSSP for the fine mesh is thus almost two orders of magnitude larger than the one required for FSSP. Additionally, for the spatially correlated prediction error case, the FSSP and BSSP require the repeated solution of linear algebraic systems of equations (see Equation (12)) of size as high as $N_0$ and $N_{all}$, respectively. This raises the computational effort for BSSP substantially in relation to FSSP in the common case where the number of possible sensor locations $N_{all}$ is much higher than the number of sensors $N_0$ in a sensor configuration ($N_{all} \gg N_0$).
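The ratio of function evaluations quoted above can be checked with a short count. The counting convention is an assumption made here for illustration: FSSP is taken to evaluate every remaining candidate location at each forward step, and BSSP every possible removal at each backward step, down to $N_0$ sensors. Actual counts depend on implementation details.

```python
def fssp_evaluations(n_all, n0):
    # With k sensors already placed, each of the n_all - k remaining locations is tried.
    return sum(n_all - k for k in range(n0))

def bssp_evaluations(n_all, n0):
    # Going from k sensors to k - 1, each of the k possible removals is tried.
    return sum(k for k in range(n0 + 1, n_all + 1))

for name, n_all in [("coarse", 420), ("fine", 3660)]:
    n_f = fssp_evaluations(n_all, 30)
    n_b = bssp_evaluations(n_all, 30)
    # Compare the exact ratio with the approximation 0.5 * N_all / N_0.
    print(name, n_f, n_b, round(n_b / n_f, 1), round(0.5 * n_all / 30, 1))
```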

Information Gain versus Number of Sensors
The SSP results for the maximum and minimum utility values as a function of the number of sensors, for the optimal and worst sensor configurations with up to 30 sensors, are shown in Figure 4 for different measurement and model/prediction errors and for both coarse (Figure 4a,c) and fine (Figure 4b,d) meshes. Large prior uncertainty in the modal coordinates is assumed ($\alpha = 10^2$). Comparing the maximum utility values in Figure 4a,c for the coarse mesh with the corresponding values in Figure 4b,d for the fine mesh, the results are almost indistinguishable. Thus, the mesh size does not affect the maximum value of the expected information gain. This is to be expected, since the dynamic characteristics of the two meshes do not differ significantly, as shown in Table 1. However, the mesh size affects the minimum value of the information gain, giving substantially lower utility values for the fine mesh. This is because a fine mesh contains significantly more finite elements, and thus more strain sensor locations with non-informative strains, than the coarse mesh does. It should be noted that the difference between the maximum and minimum expected information entropy values for a fixed number of sensors gives the maximum information gain that can be achieved by employing the optimal sensor placement methodology.

To interpret the results in Figure 4, it should be kept in mind that for eight contributing modes one needs at least eight sensors for the information matrix in Equation (12) to be invertible and the problem to be identifiable without the use of the subjective information from the prior PDF of the modal coordinates. For fewer than eight sensors, the information matrix in Equation (12) is not invertible without prior information. The prior covariance matrix $S$ of the modal coordinates provides the missing information required to make the problem identifiable.
From the results in Figure 4a, it is observed that the expected information gain steadily increases as one adds from one to seven sensors due to the increase of the information from the data, and rises sharply from seven to eight sensors due to the fact that eight sensors placed at their optimal positions provide the necessary information without the need of the small complementary information from the prior.
Normalized utility values obtained by dividing the maximum and minimum utility values in Figure 4 by the utility values obtained by placing the maximum number of strain sensors at all finite elements of the mesh (420 for coarse mesh and 3660 for fine mesh) are presented in Figure 5 for each measurement and model/prediction error case. By tracking the maximum normalized information gain values as a function of the number of sensors, it is possible to decide on the number of sensors to be kept in an optimal sensor configuration. One should stop adding sensors in the structure when the information gained by additional sensors is not significant compared to the information gained by the existing sensors, or when the information gained by a number of sensors is a sufficiently large percentage of the maximum information that can be achieved by placing sensors at all possible locations (e.g., all finite elements of the mesh in the plate problem).
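The stopping rule described above can be written as a small helper. This is a sketch under assumptions: the thresholds `target` and `min_gain` and the utility values in the example are hypothetical and are not taken from Figure 5.

```python
def choose_number_of_sensors(u_max, u_all, target=0.9, min_gain=0.01):
    """Return (k, normalized utility) for the first sensor count k that qualifies.

    u_max[k - 1] is the maximum utility with k optimally placed sensors and
    u_all is the utility with sensors at all candidate locations.
    Stop when the normalized utility reaches `target`, or when adding the k-th
    sensor raised the normalized utility by less than `min_gain`.
    """
    normalized = [u / u_all for u in u_max]
    for k, value in enumerate(normalized, start=1):
        marginal = value - (normalized[k - 2] if k > 1 else 0.0)
        if value >= target or (k > 1 and marginal < min_gain):
            return k, value
    return len(normalized), normalized[-1]

# Hypothetical utility curve with a jump at eight sensors.
u_curve = [2.0, 3.5, 4.8, 5.9, 6.8, 7.5, 8.0, 8.9, 9.0, 9.05]
k_star, frac = choose_number_of_sensors(u_curve, u_all=9.5, target=0.93)
```

A cost-aware design would weigh the marginal gain against the instrumentation cost of each added sensor, which this sketch does not attempt.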

Information Gain versus Measurement Error
As seen in Figure 5a,b, for very small model error ($\sigma_e = \sigma_\varepsilon = 0.001$) and for very small ($s = 10^{-9}$) to small ($s = 10^{-8}$) measurement error, eight sensors placed at their optimal positions account for approximately 97% to 93% of the maximum information that can be gained by placing strain sensors at all possible locations. For moderate ($s = 10^{-7}$) and large ($s = 10^{-6}$) measurement error, eight optimally located sensors provide an information gain of the order of 84% and 79% for the coarse mesh and 79% and 70% for the fine mesh, relative to the maximum information gain that can be achieved for the coarse and fine mesh, respectively. These lower values arise because the information extracted from the sensors is affected by the model and measurement errors: the higher the error, the less information is extracted from the sensors. Additionally, smaller normalized information gain values are reported for the fine mesh than for the coarse mesh. The reason is that, in these large error cases, the 3660 strain sensors placed at all finite elements of the fine mesh provide more information than the 420 strain sensors placed at all finite elements of the coarse mesh. The normalizing quantity is therefore higher for the fine mesh, and as a result the normalized information gain values for the fine mesh appear smaller than the corresponding ones for the coarse mesh.
For the small measurement error cases ($s = 10^{-9}$ and $s = 10^{-8}$), eight sensors placed at their optimal positions provide most of the information for accurate response prediction (Figure 5). Once eight sensors have been placed on the structure, there is very little further gain in information (less than 7%) if the plate is fully populated with sensors. This is mainly because the quality of the measurements is very good and/or the model error is small, so the number of sensors needed is at most the number required for making the problem identifiable. For large measurement errors in Figure 5, the quality of information deteriorates significantly due to measurement and/or model error, and so the minimum number of eight sensors required for identifiability is less informative than in the case of small measurement error. Using more than eight sensors increases the utility values further, providing significant additional information to counterbalance the deteriorated quality of the measurements.
Given the cost of instrumentation, placing more sensors optimally in the structure in order to gain a higher percentage of the total information should be considered with care, and in some cases might not be justifiable (as in the case of small measurement and model error for the plate problem). The final choice of the number of sensors to be placed in the structure depends on the cost of instrumentation, which may also affect the location of the sensors, especially when the instrumentation cost depends on the sensor location. For example, areas of a structure that are not easily accessible, such as underwater locations in offshore platforms or wind turbines, might substantially increase the cost of adding sensors in relation to the cost of instrumenting easily accessible areas. However, considering cost issues in designing the sensor configuration falls outside the objectives of this work, and the reader is referred to value of information formulations (e.g., [72,73]).

Figures 6 and 7 plot the information gain as a function of the measurement error $s$ for 8, 30 and $N_{all}$ sensors, for both the coarse and the fine mesh, where $N_{all}$ is the number of finite elements in the coarse or fine mesh. For a fixed number of sensors, the information gain decreases as the measurement error increases. This is because the quality of the information contained in the measurements decreases with higher noise to signal ratio. The decrease is more pronounced for very small modeling error ($\sigma_e = \sigma_\varepsilon = 0.001$), since most of the error in this case, modeled in the covariance matrix $Q_e$ in Equation (27), arises from the measurement error.
For higher model error ($\sigma_e = \sigma_\varepsilon = 0.01$), shown in Figure 6c,d, the information gain values are less sensitive to the measurement error values of $s = 10^{-9}$ (very small), $s = 10^{-8}$ (small) and $s = 10^{-7}$ (moderate), while there is a more pronounced drop in information gain for large measurement error ($s = 10^{-6}$). This insensitivity of the information gain to the smaller values of the measurement error arises because the larger model error dominates the very small to moderate measurement errors, as seen from the mathematical model for $Q_e$ in Equation (30). The quality of information in the data deteriorates further only for sufficiently large values of the measurement error (here $s = 10^{-6}$). Comparing the results in Figure 6 for $\alpha = 1$ and $\alpha = 10^2$, corresponding to small and large prior uncertainty in the modal coordinates, the information gain for small prior uncertainty is clearly less than the information gain for large prior uncertainty. A significant part of the information is then provided by the more informative (narrower) prior distribution of the modal coordinates $\xi(t)$, making the data effectively less informative. Comparing the results in Figure 7, the decrease in the percentage information gain, normalized with respect to the maximum information that can be achieved by fully populating the plate with strain sensors, is more pronounced as the measurement error increases. For example, 30 sensors placed at their optimal positions using $\alpha = 1$ (narrower prior bounds) account for approximately 70% of the information that can be gained from strain sensor placement, as opposed to approximately 90% with $\alpha = 10^2$ (large prior uncertainty bounds). This is expected, since for $\alpha = 1$ the prior contains significant information in relation to the information provided by the data.

Optimal Locations of Strain Sensors
Optimal strain sensor positions for 8 and 30 sensors are shown in Figures 8 and 9 for model/prediction errors $\sigma_e = \sigma_\varepsilon = 0.001$ and $0.01$, respectively. The optimal sensor locations are compared for different values of the measurement error. Comparing Figure 8a,b for eight sensors and Figure 8c,d for 30 sensors, the results for the coarse and the fine mesh are very similar for a given measurement error. For very small measurement error, sensors are placed towards the right edge of the plate, where strains are small compared to the strains at the left side and middle area of the plate. The reason is that the OSP methodology for predicting strain responses in all finite elements of the plate tends to spread the sensors over the whole surface of the plate, as long as the quality of information is very good over the plate surface. In the very small measurement error case, the errors are much smaller than the intensity of the strains, so the signal to noise ratio is high and most strain locations in the plate are informative. For large measurement errors, placing sensors at the right edge is avoided, since the signal to noise ratio decreases and the quality of information from sensors placed towards the right edge deteriorates substantially. For higher model/prediction error values of $\sigma_e = \sigma_\varepsilon = 0.01$ (see Figure 9), the sensors tend to move from left to right, towards strains with smaller intensities. This is because the higher model/prediction error dominates the overall error, with the size of the measurement error playing a lesser role in the optimal sensor placement design.
In this case, since the model error is assigned at each position as a fraction of the intensity of the strain measured at that position, all positions on the plate surface provide similar information, with the noise (here model error) to signal ratio being the same. Thus the sensors towards the left edge are equally counted in the optimal sensor placement methodology.

Effect of Spatial Correlation of Model Error
It is observed in Figures 8c,d and 9c,d that the 30 sensors placed optimally in the structure are clustered in specific regions of the plate surface. The size of each clustering region is proportional to the size of the finite elements used for the coarse and fine mesh, so clustering is similar for both mesh sizes. To avoid sensor clustering, one has to use the spatial correlation function in Equation (29) [57] for the prediction error.
The effect of correlation in the model error is investigated next as a function of the size of the measurement and model errors. Figure 10 compares the optimal sensor locations for 30 sensors for spatially uncorrelated ($\rho = 0$) and correlated ($\rho = 0.1$) prediction error models, for different values of the measurement error and for model/prediction error $\sigma_e = \sigma_\varepsilon = 0.01$. The results clearly indicate that, for the correlated case and relatively small to moderate measurement error, clustering is avoided and the 30 sensors are more uniformly distributed over the surface of the plate. This is because the model error dominates the prediction error for relatively small measurement error, and thus the measurement error has a small effect on the design of the optimal sensor locations. However, for large measurement error ($s = 10^{-6}$), clustering persists (Figure 10d), since the measurement error is then the dominant source of error. The model error, and with it the spatial correlation structure of the model error, becomes insignificant, so the clustering problem reappears and the correlation structure has no effect on the optimal sensor placement.

Effectiveness of Optimal Sensor Configuration for Response Predictions
The effectiveness of the best sensor configuration is investigated next using simulated measurements. For this, simulated strain response time histories are generated from the model of the plate subjected to white noise input at location A (Figure 1), using up to eight contributing modes. The simulations are generated using a sampling period $\Delta t = 0.01$ s and a Gaussian white noise sequence with standard deviation $\sigma_{wn} = 1$. To simulate measurement error (noise from sensors), zero-mean Gaussian white noise with standard deviation equal to 1% of the simulated response at each time instant is added to generate the noise-contaminated measurements. Alternatively, to simulate model error, the measured data are simulated using a model whose mass values for all finite elements are randomly perturbed by adding to the nominal mass values zero-mean Gaussian distributed values with standard deviation equal to 5% of the nominal mass values.
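The two ways of contaminating the simulated data can be sketched as follows. This is an illustration only: the noise standard deviation is read here as 1% of the absolute response value at each time instant, which is one interpretation of the description above, and the function names and array layouts are hypothetical.

```python
import numpy as np

def add_measurement_noise(z, level=0.01, rng=None):
    """Add zero-mean Gaussian noise whose std is `level` times |z| at each instant."""
    rng = np.random.default_rng() if rng is None else rng
    return z + level * np.abs(z) * rng.standard_normal(z.shape)

def perturb_masses(m_nominal, cov=0.05, rng=None):
    """Perturb element masses by zero-mean Gaussian values with std cov * nominal."""
    rng = np.random.default_rng() if rng is None else rng
    return m_nominal * (1.0 + cov * rng.standard_normal(m_nominal.shape))

rng = np.random.default_rng(0)
z_clean = np.sin(np.linspace(0.0, 10.0, 1000))   # placeholder response history
z_noisy = add_measurement_noise(z_clean, 0.01, rng)
m_perturbed = perturb_masses(np.ones(420), 0.05, rng)
```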
The relative errors between the strain responses predicted by the modal expansion technique, given a fixed number of sensors, and the simulated measurements are used to demonstrate the effectiveness of the optimal sensor configuration in the presence of measurement or model error. The relative strain error at each location is defined as the ratio of the root mean square error between the predicted and measured responses over the root mean square value (intensity) of the measured strain response time history, given by

$$\text{relative error at location } i = \frac{\sqrt{\frac{1}{N}\sum_{k=1}^{N}\left[\hat{z}_i(k) - z_i(k)\right]^2}}{\sqrt{\frac{1}{N}\sum_{k=1}^{N} z_i^2(k)}}$$

where $N$ is the number of data points in the time histories, $\hat{z}_i(k)$ are the values predicted from the nominal model based on the modal expansion in Equation (10), $z_i(k)$ are the simulated "measurements" at the $i$-th DOF, and $k$ is the time index corresponding to the time instant $t_k = k\Delta t$.

Figure 11a,b presents the relative errors for the optimal sensor configuration with eight sensors, corresponding to an information gain value of U = 8.38 (92% of the maximum that could be achieved by fully populating the plate surface with strain sensors). Results are also shown for two alternative sub-optimal sensor configurations: one with eight sensors (Figure 11c,d), corresponding to a lower information gain value of U = 7.4 (81%), and one with a higher number of 10 sensors (Figure 11e,f), also corresponding to a lower information gain value of U = 7 (80%). For the optimal sensor configuration, the predictions are quite reliable, with relative errors less than 2% and 0.8% over 90% of the surface of the plate for measurements simulated with noise and model error, respectively. The errors are higher over the 10% of the surface close to the right edge, where strain levels are very low and the noise to signal ratio in the measured time histories is high.
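The relative error measure defined above is straightforward to compute. The sketch below assumes, as a layout convention chosen here, that time histories are stored with time samples along rows and locations along columns.

```python
import numpy as np

def relative_error(z_pred, z_meas):
    """RMS of (prediction - measurement) over RMS of measurement, per location.

    Both arrays have shape (N, n_locations): rows are time samples k = 1..N,
    columns are locations i.
    """
    rmse = np.sqrt(np.mean((z_pred - z_meas) ** 2, axis=0))
    rms = np.sqrt(np.mean(z_meas ** 2, axis=0))
    return rmse / rms

# Example: a prediction that overshoots every sample by 10%.
z_meas = np.array([[1.0, -2.0], [0.5, 4.0], [-1.5, 1.0]])
errors = relative_error(1.1 * z_meas, z_meas)
```

Locations whose measured response is identically zero would give a division by zero, so such locations would need separate handling in practice.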
For both measurement error (Figure 11a,c,e) and model/prediction error (Figure 11b,d,f), the predictions from the optimal sensor configuration (Figure 11a,b) are consistently better than the predictions obtained from the sub-optimal sensor configurations, since the relative errors based on the optimal configuration are lower overall across the surface of the plate. In particular, for the case of measurement error, the predictions based on the configuration with the higher number of 10 sub-optimal sensors (Figure 11e) are significantly worse than the predictions from the optimal configuration containing only eight sensors (Figure 11a), emphasizing the need for a cost-effective design of the sensor network in a structure. It is also clear that the errors in the response predictions at the measured locations are lower than the errors in the predictions at non-measured locations. Finally, it should be noted that there exist sensor configurations with information gain values close to the minimum information gain (not shown in the figures) that give relative errors higher than 100%, which means that predictions from such non-optimal sensor configurations can be completely unreliable.

Figure 12 presents relative error results for two optimal sensor configurations for seven sensors, designed using large prior uncertainty ($\alpha = 10^2$) in the modal coordinates and two choices of the covariance matrix $S$ of the prior distribution. The first choice corresponds to a prior covariance matrix proportional to the non-diagonal covariance matrix of the modal coordinates obtained for white noise input (see Equation (32)), while the second choice corresponds to a diagonal isotropic covariance matrix with strength proportional to the variance of the first modal coordinate ($S = \alpha^2 Q_\xi(1,1)\,I$).
In the first case, the relative intensity of the modal coordinates is taken into account in the definition of the prior covariance matrix, while in the second case this relative intensity is ignored and all modal coordinates are treated equally in the definition of the covariance matrix $S$ of the prior distribution. The error distribution over the plate surface differs for the two optimal designs. Moreover, the relative errors in the predictions obtained from the first optimal sensor configuration are lower than the errors obtained from the second, which shows the importance of considering the intensity of each mode, affected by the excitation frequency content, in the choice of the prior. The parameter values used are $\sigma_e = \sigma_\varepsilon = 0.01$, $s = 10^{-7}$ and $\alpha = 10^2$.

Robustness to Model/Prediction and Measurement Error Uncertainties
Robust optimal sensor placement results are obtained next by taking into account the uncertainties in the model/prediction error parameters $\sigma_e$ and $\sigma_\varepsilon$ and the measurement error parameter $s$. Specifically, the re-parameterization $\sigma_e = \sigma_\varepsilon = 10^{-\beta_\sigma}$ and $s = 10^{-\beta_s}$ is used, and uniform uncertainty is assigned to the values of $\beta_\sigma$ and $\beta_s$ with bounds that cover the lower and upper values of these parameters considered previously. The distributions are selected to be $\beta_\sigma \sim U(2, 3)$ and $\beta_s \sim U(6, 9)$, where $U(a, b)$ denotes a uniform distribution with lower bound $a$ and upper bound $b$. This case accounts for the realistic scenario of uncertain model/prediction and measurement errors, arising mostly from the uncertain excitation intensities and frequency content, that have to be taken into account in the design of the optimal sensor configuration. The uncertain parameters $\beta_\sigma$ and $\beta_s$ are included in the nuisance parameter set $\varphi$, and their uncertainty is taken into account in the generalized utility function introduced in Equation (25). The sparse grid algorithm [86] of order 4 is used to evaluate the integrals in Equation (25).
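The averaging of the utility over the uncertain error parameters can be illustrated with a simple quadrature. Note the substitution: a tensor-product Gauss-Legendre rule is used below instead of the sparse grid algorithm of [86], and `toy_utility` is a hypothetical scalar stand-in (one Gaussian quantity with model error proportional to the signal intensity `q`), not the utility of Equation (25). All numerical values are assumptions for illustration.

```python
import numpy as np

def robust_utility(utility, sigma_bounds=(2.0, 3.0), s_bounds=(6.0, 9.0), order=4):
    """Expected utility for beta_sigma ~ U(sigma_bounds), beta_s ~ U(s_bounds)."""
    nodes, weights = np.polynomial.legendre.leggauss(order)

    def scaled(bounds):
        a, b = bounds
        # Map nodes from [-1, 1] to [a, b]; halving the weights makes them
        # sum to 1, which absorbs the uniform density 1 / (b - a).
        return 0.5 * (b - a) * nodes + 0.5 * (a + b), 0.5 * weights

    sig_nodes, sig_w = scaled(sigma_bounds)
    s_nodes, s_w = scaled(s_bounds)
    total = 0.0
    for b_sig, w1 in zip(sig_nodes, sig_w):
        for b_s, w2 in zip(s_nodes, s_w):
            total += w1 * w2 * utility(b_sig, b_s)
    return total

def toy_utility(beta_sigma, beta_s, q=1e-5, alpha=1e2):
    """Hypothetical scalar information gain: 0.5 * ln(1 + prior_var / noise_var)."""
    noise_var = (10.0 ** (-beta_sigma) * q) ** 2 + (10.0 ** (-beta_s)) ** 2
    prior_var = (alpha * q) ** 2
    return 0.5 * np.log1p(prior_var / noise_var)

u_robust = robust_utility(toy_utility)
u_large_error = toy_utility(2.0, 6.0)    # largest errors in the assumed ranges
u_small_error = toy_utility(3.0, 9.0)    # smallest errors in the assumed ranges
```

Because the quadrature weights are positive and sum to one, the robust value is a weighted average of utilities evaluated inside the parameter ranges, so it necessarily lies between the values at the two extreme error settings for a utility that is monotonic in both parameters.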
Results for the maximum robust information gain values are compared in Figure 13a,b with the corresponding maximum information gain values obtained by fixing the uncertain error parameters ($\sigma_e$, $\sigma_\varepsilon$ and $s$) to their minimum values, by selecting $\beta_\sigma = 3$ and $\beta_s = 9$, as well as to their maximum values, by selecting $\beta_\sigma = 2$ and $\beta_s = 6$. The corresponding optimal sensor locations are compared in Figure 13c,d. The robust information gain estimates differ from the estimates obtained with the fixed minimum and maximum values of the model/prediction and measurement error parameters. As expected, the robust information gain values lie between the information gain values obtained using the minimum and maximum values of the error parameters. The optimal sensor configuration proposed by the robust OSP methodology differs from the one obtained for the minimum values of the error parameters. Specifically, for the very small model/prediction error case the sensors tend to be placed towards the right edge of the plate since, despite the smaller strain intensity in this area, the noise to signal ratio is very small and thus the measurements from the right edge are also informative. The robust OSP design is closer to the OSP design corresponding to the maximum values of the error parameters. This is due to the high measurement and model/prediction errors that are taken into account in the assigned probability distribution of the error parameters. As a result, sensors designed according to the robust information gain are placed farther away from the right edge of the plate, because of the larger noise to signal ratio at locations close to the right edge.

Strain Predictions Using Displacement Observations
The methodology is applied next to the case where displacement sensors are used for predicting strains at the midpoints of all finite elements of the coarse mesh. Displacement sensors measure out-of-plane displacements at the nodes of the mesh, perpendicular to the plate surface. As before, the number of contributing modes is kept to eight. For choosing the measurement and model/prediction error parameters, the intensities of the displacement responses at all nodes of the finite element mesh to a white noise excitation at the bottom right corner of the plate (point A in Figure 1) are shown in Figure 14. Based on the results in this figure, the measurement errors can now be selected as in Table 3.

Table 3. Different measurement errors assumed. $d_{\min} = Q_{z,\min}^{1/2} \approx 10^{-4}$ is the minimum value of the node displacement intensity, for discrete white noise input with $\sigma_{wn}$, that covers 92% of the plate surface. $d_{\max} = Q_{z,\max}^{1/2} \approx 5 \times 10^{-3}$ is the maximum value of the displacement intensity for the same white noise input.

Uncertainties in the model/prediction error parameters $\sigma_e$ and $\sigma_\varepsilon$ and the measurement error parameter $s$ are accounted for in the sensor placement design. As before, the re-parameterization $\sigma_e = \sigma_\varepsilon = 10^{-\beta_\sigma}$ and $s = 10^{-\beta_s}$ is used, and uniform uncertainty is assigned to the values of $\beta_\sigma$ and $\beta_s$ with bounds that cover the lower and upper values of these parameters shown in Table 3. Thus, the distributions are selected to be $\beta_\sigma \sim U(2, 3)$ and $\beta_s \sim U(4, 7)$.

Measurement Error
Results for the maximum robust information gain values are compared in Figure 15a,b with the corresponding maximum information gain values obtained for very small errors ($\beta_\sigma = 3$ and $\beta_s = 7$) as well as large errors ($\beta_\sigma = 2$ and $\beta_s = 4$). The corresponding optimal sensor locations are compared in Figure 15c for eight sensors and Figure 15d for 30 sensors. The results for the robust error case, compared with those for the small and large error cases, are qualitatively similar to the ones obtained for strain sensor measurements in Figure 13. Specifically, the robust sensor placement design differs significantly from the design based on the small error case, while it is closer to the design based on the large error case. The optimal locations of displacement sensors for the small error case tend to also cover the left fixed edge of the plate, where displacements are relatively small compared to the middle and right locations of the plate. This is because the noise to signal ratio at these displacement locations is small, and thus the measured displacements, despite their relatively small values, are informative. The optimal sensor placement for large error tends to move towards the right edge of the plate, where the displacements are usually large and the noise to signal ratio is small. The locations close to the left edge (fixed support) of the plate are avoided in this case due to the high noise to signal ratio for large measurement errors.

Figure 16a,b presents the relative errors for the optimal displacement sensor configuration with 8 sensors, corresponding to an information gain value of U = 7.3 (84%). Results are also shown for two alternative sub-optimal sensor configurations: one with 8 sensors (Figure 16c,d), corresponding to a lower information gain value of U = 6.8 (78%), and one with a higher number of 10 sensors (Figure 16e,f), also corresponding to a lower information gain value of U = 6.6 (75%).
Comparing the effectiveness of the optimal and the two sub-optimal sensor configurations, the results are qualitatively similar to the ones presented in Figure 11a,b for the strain sensor case. The relative errors for the optimal sensor configuration are lower than the relative errors for the two sub-optimal sensor configurations with 8 and 10 sensors, for both measurement (Figure 16a,c,e) and model (Figure 16b,d,f) errors, which demonstrates the superiority of the optimal sensor configuration for reliable response predictions. Comparing Figure 15a,b with Figure 13a,b, it is clear that the information gain values for strain sensors in Figure 13a,b are higher than the information gain values for displacement sensors in Figure 15a,b. Thus, of the two sensor types, strain sensors are the preferred choice for this structure. This is also confirmed by the relative error values for the optimal sensor configuration obtained for the displacement sensors in Figure 16a,b. These relative errors reach values as high as 13% and 1% for the simulated measurement error and the model/prediction error cases, respectively, which are higher than the corresponding values of 2% and 0.8% for strain sensors presented in Figure 11a,b. However, this result holds for the specific plate structure analysed and cannot be generalized to other applications. The best combination of displacement and strain sensors has to be investigated for each application. The methodology proposed in this work can be extended to fuse sensor types by optimally placing displacement and strain sensors simultaneously, so as to gain the maximum information for response predictions with the smallest uncertainty.

Conclusions
Using information and utility theory, the optimal sensor placement problem for reliable virtual sensing and response reconstruction is formulated, based on the modal expansion technique, as the maximization of a multi-dimensional integral quantifying the information gain from the data. The framework addresses the challenging case of output-only vibration measurements and provides optimal sensor configurations that are robust to uncertainties in model parameters as well as in model/prediction and measurement errors. Such uncertainties are usually not known in the initial optimal experimental design phase and thus need to be postulated using prior distributions. Sparse grid or Monte Carlo techniques can be used to estimate the multidimensional integral that arises in the robust formulation. Computationally efficient heuristic forward and backward sequential sensor placement strategies are combined to estimate near-optimal sensor locations. Useful expressions are derived for the effect of measurement and model/prediction errors on the information gained by a sensor configuration. It was shown analytically that the information gain decreases as these errors increase. In addition, it was shown analytically that the information gain does not decrease as sensors are added to the structure, as would be intuitively expected.
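The forward part of the sequential placement strategy can be sketched as a greedy loop that adds, one at a time, the candidate sensor giving the largest information gain. This is a minimal illustration under stated assumptions, not the implementation used for the plate results: a single scalar Gaussian QoI with utility $-\tfrac{1}{2}\ln r$, uncorrelated errors $Q_e = q_e I$, every row of the mode shape matrix treated as a candidate location, and hypothetical function names `info_gain` and `forward_ssp`. The backward strategy and the robust averaging over error parameters are not shown.

```python
import numpy as np


def info_gain(rows, Phi, S, psi, q_e, q_eps):
    """Information gain -0.5*ln(r) for sensors at candidate rows `rows`.
    ASSUMPTION: uncorrelated errors, Q_e = q_e * I (illustrative only)."""
    A = Phi[rows, :]                               # plays the role of L(delta) Phi
    E = A.T @ A / q_e + np.linalg.inv(S)           # posterior information of modal coords
    post = psi @ np.linalg.solve(E, psi) + q_eps   # posterior variance of the QoI
    prior = psi @ S @ psi + q_eps                  # prior variance of the QoI
    return -0.5 * np.log(post / prior)


def forward_ssp(Phi, S, psi, q_e, q_eps, n_sensors):
    """Greedy forward sequential sensor placement (heuristic, near-optimal)."""
    chosen, history = [], []
    for _ in range(n_sensors):
        remaining = [j for j in range(Phi.shape[0]) if j not in chosen]
        best = max(
            remaining,
            key=lambda j: info_gain(chosen + [j], Phi, S, psi, q_e, q_eps),
        )
        chosen.append(best)
        history.append(info_gain(chosen, Phi, S, psi, q_e, q_eps))
    return chosen, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Phi = rng.standard_normal((12, 4))   # 12 candidate DOFs, 4 modes (synthetic)
    psi = rng.standard_normal(4)         # maps modal coordinates to the QoI
    sensors, gains = forward_ssp(Phi, np.eye(4), psi, 0.01, 0.001, 5)
    print(sensors)
    print(gains)
```

Each step costs one utility evaluation per remaining candidate, which is what makes the sequential heuristic far cheaper than an exhaustive search over all sensor combinations.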
The methodology was demonstrated by designing the optimal placement of strain or displacement sensors over a plate model of a structure. A thorough investigation was conducted on how the optimal sensor placement, and the variation of the highest and lowest utility (information gain) values with the number of sensors, are affected by the measurement and model/prediction errors, the size of the prior uncertainty in the modal coordinates, the spatial correlation structure of the model error, and the uncertainties in the model/prediction and measurement errors. The effect of the excitation characteristics on the design of the optimal sensor configuration was also pointed out. Excitation characteristics (locations, intensity and frequency content of excitations) affect the response intensities and thus the selection of the measurement error levels, which depend on sensor accuracy, and of the model/prediction error levels, which depend on the mechanisms activated or re-activated at different excitation levels and/or excitation frequencies. It was demonstrated that sensor accuracy (measurement error), related to the noise-to-signal ratio, affects the optimal placement of sensors. The model/prediction error also affects the optimal design. In particular, when the model/prediction error dominates the measurement error, the accuracy of the sensors plays an insignificant role in the design of the optimal sensor configuration. The noise-to-signal ratio is not known a priori, since the intensity and frequency content of the excitation, and thus the level of the measured response, are not known. The size of the model/prediction errors due to model inadequacy is also not known a priori. Thus, a robust design is the more rational choice, since it accounts for the uncertainty in the measurement and model/prediction errors.
Such a robust design over wide uncertainty bounds of the errors leads to optimal sensor placement designs that are closer to the ones obtained for high measurement and model/prediction errors, provided that the measurement error dominates the model/prediction error. The effectiveness of the optimal designs was validated against sub-optimal ones by comparing the predictions of the modal expansion method with simulated measurements contaminated by noise and/or model error. It was also found that strain measurements are slightly more informative than displacement measurements for virtual strain sensing for the specific application. The proposed information-based method can be extended to select the optimal sensor configuration that contains both displacement and strain sensors.
The proposed OSP methodology is appropriate to use for reliably reconstructing responses that are important for providing data-driven safety and performance estimates of systems, as well as reconstructing stress response time histories that are needed for predicting fatigue damage accumulation.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
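The following abbreviations are used in this manuscript:

Appendix A. Derivation of Equation (18)
Substituting Equation (14) into the inner integral of Equation (17) and using the properties of the logarithm ($\ln(A/B) = \ln A - \ln B$), one has
$$\int_Y D_i(\delta, \phi, y)\, p(y|\phi, \delta)\, dy = \int_Y \left[ \int_Z p(z_i|y, \delta, \phi) \ln p(z_i|y, \delta, \phi)\, dz_i \right] p(y|\phi, \delta)\, dy - \int_Y \left[ \int_Z p(z_i|y, \delta, \phi) \ln p(z_i|\phi)\, dz_i \right] p(y|\phi, \delta)\, dy \tag{A1}$$
$$= \int_Y \left[ \int_Z p(z_i|y, \delta, \phi) \ln p(z_i|y, \delta, \phi)\, dz_i \right] p(y|\phi, \delta)\, dy - \int_Z p(z_i|\phi) \ln p(z_i|\phi)\, dz_i \tag{A2}$$
where the last equality is obtained by interchanging the order of integration of the double integral in the second term of Equation (A1) and noting that
$$\int_Y p(z_i|y, \delta, \phi)\, p(y|\phi, \delta)\, dy = p(z_i|\delta, \phi) \tag{A3}$$
which coincides with the prior $p(z_i|\phi)$, since the prior of $z_i$ does not depend on the sensor configuration $\delta$. Introducing the prior information entropy
$$H_{z_i|\phi} = -\int_Z p(z_i|\phi) \ln p(z_i|\phi)\, dz_i \tag{A4}$$
of $z_i(t)$ given the model parameter set $\phi$, and the posterior information entropy
$$H_{z_i|D}(y, \delta, \phi) = -\int_Z p(z_i|y, \delta, \phi) \ln p(z_i|y, \delta, \phi)\, dz_i \tag{A5}$$
of $z_i(t)$ given the data $y$ and the model parameter set $\phi$, the integral in Equation (A2) simplifies to
$$\int_Y D_i(\delta, \phi, y)\, p(y|\phi, \delta)\, dy = -\int_Y H_{z_i|D}(y, \delta, \phi)\, p(y|\phi, \delta)\, dy + H_{z_i|\phi} \tag{A6}$$
Equation (18) is readily obtained by substituting Equation (A6) into Equation (17).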

Appendix B. Properties of Information Gain (Utility Function)
For notational convenience, in the following analysis the dependence of the quantities on the uncertain parameter set $\phi$ is dropped. From the mathematical structure of the covariance matrix $\Sigma_{z_i|D}(\delta)$ appearing in Equation (12), one can readily observe that the quantity $\Sigma_{z_i|D}(\delta)$ is non-negative. Additionally, for four matrices $A_1 \in \mathbb{R}^{n \times n}$, $B_1 \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{m \times n}$ and $U \in \mathbb{R}^{m \times n}$, the following useful property for the inverse of the sum of two matrices holds
$$(A_1 + U^T B_1 V)^{-1} = A_1^{-1} - A_1^{-1} U^T \left( B_1^{-1} + V A_1^{-1} U^T \right)^{-1} V A_1^{-1} \tag{A7}$$
Setting $A_1 = S^{-1}$, $B_1 = Q_e^{-1}$, $U = V = L(\delta)\Phi$ and applying Equation (A7), the covariance matrix $\Sigma_{z_i|D}(\delta)$ in Equation (12) can be written in the alternative form
$$\Sigma_{z_i|D}(\delta) = \psi_i^T S \psi_i + Q_{\varepsilon_i} - (L(\delta)\Phi S \psi_i)^T \left[ L(\delta)\Phi S \Phi^T L^T(\delta) + Q_e \right]^{-1} (L(\delta)\Phi S \psi_i) \tag{A8}$$
The ratio between the posterior and prior covariance matrices in Equation (24) can thus be simplified in the form
$$r_{z_i|D}(\delta) = 1 - \frac{(L(\delta)\Phi S \psi_i)^T \left[ L(\delta)\Phi S \Phi^T L^T(\delta) + Q_e \right]^{-1} (L(\delta)\Phi S \psi_i)}{\psi_i^T S \psi_i + Q_{\varepsilon_i}} \tag{A9}$$
Taking into account that the matrices $S$ and $L(\delta)\Phi S \Phi^T L^T(\delta) + Q_e$ are positive semi-definite and that $Q_{\varepsilon_i} > 0$, it can be concluded that the second term in Equation (A9) is nonnegative and thus the value of the ratio is always less than or equal to one, i.e., $r_{z_i|D}(\delta) \leq 1$. As a result, the utility function, quantifying the information gain from the data for least uncertainty in the response prediction, is always greater than or equal to zero.
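The equivalence of the two forms of the posterior variance, and the bound on the ratio, can be checked numerically. The sketch below uses synthetic random matrices rather than the plate model, hypothetical function names, and the assumption that the posterior variance of the scalar QoI is $\psi_i^T E^{-1} \psi_i + Q_{\varepsilon_i}$ with $E = \Phi^T L^T(\delta) Q_e^{-1} L(\delta)\Phi + S^{-1}$.

```python
import numpy as np


def posterior_variance_information_form(A, S, Q_e, psi, q_eps):
    """psi^T E^{-1} psi + q_eps with E = A^T Q_e^{-1} A + S^{-1}; A stands for L(delta) Phi."""
    E = A.T @ np.linalg.solve(Q_e, A) + np.linalg.inv(S)
    return psi @ np.linalg.solve(E, psi) + q_eps


def posterior_variance_covariance_form(A, S, Q_e, psi, q_eps):
    """Alternative form obtained with the matrix inversion identity (A7)."""
    b = A @ S @ psi                      # L(delta) Phi S psi_i
    M = A @ S @ A.T + Q_e                # L(delta) Phi S Phi^T L^T(delta) + Q_e
    return psi @ S @ psi + q_eps - b @ np.linalg.solve(M, b)


def random_spd(rng, n, jitter=0.1):
    """Synthetic symmetric positive definite matrix (illustrative data only)."""
    C = rng.standard_normal((n, n))
    return C @ C.T + jitter * np.eye(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))      # 6 sensors, 4 modes
    S, Q_e = random_spd(rng, 4), random_spd(rng, 6)
    psi, q_eps = rng.standard_normal(4), 0.05
    post = posterior_variance_covariance_form(A, S, Q_e, psi, q_eps)
    r = post / (psi @ S @ psi + q_eps)
    print(post, posterior_variance_information_form(A, S, Q_e, psi, q_eps))
    print("ratio:", r, "information gain:", -0.5 * np.log(r))
```

The covariance form inverts a matrix whose size is the number of sensors, while the information form inverts one whose size is the number of modes, so whichever dimension is smaller suggests the cheaper evaluation.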

Appendix B.1. Effect of Modelling and Measurement Errors
Next we examine the effects of the measurement and model errors quantified by the matrices $Q_e$ and $Q_\varepsilon$. The higher the value of the prediction error variance $Q_{\varepsilon_i}$, the higher the value of the denominator $\psi_i^T S \psi_i + Q_{\varepsilon_i}$ in Equation (A9) and the smaller the information gained by the sensor configuration. Let now $\tilde{Q}_e$ be an error covariance matrix similar to Equation (30), corresponding to higher values of the measurement and model error, so that it admits the representation $\tilde{Q}_e = Q_e + \Delta Q_e$, where $\Delta Q_e$ is a positive definite matrix. This representation holds when the values of $\sigma_e$ and/or $\sigma_{em}$ in Equation (30) are increased to represent higher measurement and/or modeling error. Substituting $\tilde{Q}_e$ in place of $Q_e$ in Equation (A9) and using the expansion in Equation (A7) for the inverse of $[L(\delta)\Phi S \Phi^T L^T(\delta) + \tilde{Q}_e] = [L(\delta)\Phi S \Phi^T L^T(\delta) + Q_e + \Delta Q_e]$ by setting $A_1 = [L(\delta)\Phi S \Phi^T L^T(\delta) + Q_e]$, $B_1 = \Delta Q_e$ and $U = V = I$, the following expression arises for the new ratio $\tilde{r}_{z_i|D}(\delta)$ corresponding to the error covariance matrix $\tilde{Q}_e$:
$$\tilde{r}_{z_i|D}(\delta) = r_{z_i|D}(\delta) + \frac{g^T \left[ \Delta Q_e^{-1} + \left( L(\delta)\Phi S \Phi^T L^T(\delta) + Q_e \right)^{-1} \right]^{-1} g}{\psi_i^T S \psi_i + Q_{\varepsilon_i}} \tag{A10}$$
where $g = (L(\delta)\Phi S \Phi^T L^T(\delta) + Q_e)^{-1} (L(\delta)\Phi S \psi_i)$. The numerator of the second term on the right-hand side of Equation (A10) is positive because both matrices $\Delta Q_e$ and $(L(\delta)\Phi S \Phi^T L^T(\delta) + Q_e)$ are positive definite. Thus $\tilde{r}_{z_i|D}(\delta) > r_{z_i|D}(\delta)$, which implies that the information gain decreases as the measurement and model/prediction errors increase. As the model or measurement errors $\sigma_e$, $\sigma_\varepsilon$ and $\sigma_{em}$ become very large (in the limit as the errors approach infinity), the second term in Equation (A9) vanishes, so the ratio $r_{z_i|D}(\delta)$ between the posterior and prior covariance matrices approaches one and the information gain approaches zero, independent of the number of sensors placed in the structure.
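The monotonic effect of the errors can be illustrated numerically. The following sketch, with hypothetical function names and synthetic data, evaluates the ratio of Equation (A9) before and after adding a positive definite increment to the error covariance, compares the difference with the increment predicted by the matrix inversion identity, and checks the limit of very large errors.

```python
import numpy as np


def variance_ratio(A, S, Q_e, psi, q_eps):
    """Posterior/prior variance ratio of Equation (A9); A stands for L(delta) Phi."""
    b = A @ S @ psi
    M = A @ S @ A.T + Q_e
    return 1.0 - (b @ np.linalg.solve(M, b)) / (psi @ S @ psi + q_eps)


def ratio_increment(A, S, Q_e, dQ_e, psi, q_eps):
    """Increase of the ratio when Q_e is replaced by Q_e + dQ_e
    (dQ_e positive definite), from the identity with U = V = I."""
    M = A @ S @ A.T + Q_e
    g = np.linalg.solve(M, A @ S @ psi)
    middle = np.linalg.inv(np.linalg.inv(dQ_e) + np.linalg.inv(M))
    return (g @ middle @ g) / (psi @ S @ psi + q_eps)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 3))          # 5 sensors, 3 modes (synthetic)
    S, psi, q_eps = np.eye(3), rng.standard_normal(3), 0.01
    Q_e, dQ_e = 0.1 * np.eye(5), 0.4 * np.eye(5)
    r0 = variance_ratio(A, S, Q_e, psi, q_eps)
    r1 = variance_ratio(A, S, Q_e + dQ_e, psi, q_eps)
    print(r0, r1, r0 + ratio_increment(A, S, Q_e, dQ_e, psi, q_eps))
    print("very large errors:", variance_ratio(A, S, 1e12 * Q_e, psi, q_eps))
```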

Appendix B.2. Utility Versus Number of Sensors
Consider a sensor configuration $\delta$ and a new sensor configuration $\delta_1$ which consists of the sensors in the configuration $\delta$ and one additional sensor placed in the structure. We will show that the utility value for the sensor configuration $\delta_1$, containing an extra sensor, cannot be lower than the utility value of the sensor configuration $\delta$, that is, $U_i(\delta_1) \geq U_i(\delta)$. Let $L(\delta_1)$ and $\tilde{Q}_e$ be the sensor locator matrix and the error covariance matrix in Equation (24) for $r_{z_i|D}(\delta_1)$ corresponding to the augmented sensor configuration $\delta_1$. Partitioning $L^T(\delta_1)$ into the block $L^T(\delta)$ of the existing sensors and the part associated with the added sensor, together with the corresponding partition of the error covariance matrix, substituting in the numerator in Equation (24) for the sensor configuration $\delta_1$, setting $E = \Phi^T L^T(\delta) Q_e^{-1} L(\delta)\Phi + S^{-1}$ and using Equation (A7) with $A_1 = E$, $B_1 = \Delta Q_{1,e}$ and $U = V = L(\delta_1)\Phi$, the ratio $r_{z_i|D}(\delta_1)$ for the sensor configuration $\delta_1$ takes a form from which it follows that $r_{z_i|D}(\delta_1) \leq r_{z_i|D}(\delta)$, since the matrices $\Delta Q_{1,e}$, $Q_e$, $E$ and thus $Z(\delta_1)$ are symmetric and positive semi-definite. As a result, $U_i(\delta_1) \geq U_i(\delta)$: adding a sensor to an existing sensor configuration cannot reduce the information gained, which is consistent with intuition. Moreover, it is straightforward to conclude that the maximum information gain is a non-decreasing function of the number of sensors.
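This property can also be illustrated numerically. The sketch below is an assumption-laden check on synthetic data, not the analytical proof: the hypothetical function `variance_ratio_for` evaluates the ratio of Equation (A9) for any subset of candidate locations, taking the error covariance of a subset as the corresponding block of a full, spatially correlated covariance matrix `Q_full` defined over all candidates.

```python
import numpy as np


def variance_ratio_for(sensors, Phi, S, Q_full, psi, q_eps):
    """Ratio of Equation (A9) for the sensors listed in `sensors`.
    ASSUMPTION: Q_e for a configuration is the sub-block of Q_full
    (covariance of the errors at all candidate locations)."""
    idx = np.asarray(sensors)
    A = Phi[idx, :]                       # L(delta) Phi
    Q_e = Q_full[np.ix_(idx, idx)]        # error covariance of the selected sensors
    b = A @ S @ psi
    M = A @ S @ A.T + Q_e
    return 1.0 - (b @ np.linalg.solve(M, b)) / (psi @ S @ psi + q_eps)


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    Phi = rng.standard_normal((8, 3))     # 8 candidate locations, 3 modes (synthetic)
    C = rng.standard_normal((8, 8))
    Q_full = 0.05 * (C @ C.T) + 0.05 * np.eye(8)   # correlated, positive definite
    S, psi, q_eps = np.eye(3), rng.standard_normal(3), 0.01
    base = [0, 3, 5]
    r_base = variance_ratio_for(base, Phi, S, Q_full, psi, q_eps)
    for extra in (1, 2, 4, 6, 7):
        r_aug = variance_ratio_for(base + [extra], Phi, S, Q_full, psi, q_eps)
        print(extra, r_base, r_aug)       # r_aug should not exceed r_base
```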