The key idea of the POD method is to expand a state variable $u(x,t)$ at a certain time $t$ in a spatial domain $X$ as an orthogonal series

$$u(x,t)=\sum_{k=1}^{\infty}a_{k}(t)\,\mathsf{\Phi}_{k}(x), \tag{1}$$

where the $\mathsf{\Phi}_{k}(x)$ are orthonormal spatial modes in $L^{2}(X)$. Expansion (1) is normally truncated after $K$ terms, which represents the dimensionality of the system. The number $K$ is usually small, so that we have a low-dimensional representation of the process $u(x,t)$ while $\|u(\cdot,t)-\sum_{k=1}^{K}a_{k}(t)\mathsf{\Phi}_{k}\|\le \epsilon\|u(\cdot,t)\|$ and $\epsilon$ is small, say $\epsilon=0.1$.

In this paper, we will use POD modes obtained from the exact complete field and hence they are “perfect” POD modes. We first review the Gappy-POD before we discuss our sensor placement strategy and the effect of noisy measurements.

#### 2.1. A Review of Gappy-POD

In practice, $u(x,t)$ may be available only at a limited number of locations. To deal with such data (called gappy data), the so-called Gappy-POD was proposed in [7] in order to reconstruct gappy flow fields from a limited number of measurements. As a brief review, we follow the introduction in [19,27]. Consider a scalar gappy field $\tilde{u}(x,t)$ as a point-wise product of an indicator function $m(x,t)$ and a complete field $u(x,t)$, i.e.,

$$\tilde{u}(x,t)=m(x,t)\,u(x,t). \tag{2}$$

The indicator function $m(x,t)$ takes the value 1 where data are available at the corresponding space-time location and 0 otherwise. The goal is to construct a reliable estimator $\widehat{u}(x,t)$ of $u(x,t)$ in the space-time regions where $m(x,t)=0$. In the Gappy-POD framework, this is done by applying the method of least squares: we look for a representation of $\widehat{u}$ in the form

$$\widehat{u}(x,t)=\sum_{k=1}^{K}b_{k}(t)\,\mathsf{\Phi}_{k}(x), \tag{3}$$

where $\mathsf{\Phi}_{k}$ are normalized POD spatial modes of $\widehat{u}$ and $b_{k}$ are unknown coefficients. Note that in practice, the $\mathsf{\Phi}_{k}$ are extracted from simulation or data assimilation results, and in this paper they are from the perfect model (1). Then, for each snapshot (e.g., each time $t$), we minimize the approximation error between $u$ and $\widehat{u}$ in the gappy norm

$$\|u-\widehat{u}\|_{m}^{2}=(u-\widehat{u},\,u-\widehat{u})_{m}, \tag{4}$$

where the (gappy) inner product $(\cdot,\cdot)_{m}$ is defined as

$$(v,w)_{m}=\int_{X}m(x,t)\,v(x)\,w(x)\,dx. \tag{5}$$

Minimizing Equation (4) with respect to $b_{k}$ yields the linear system

$$\mathbf{M}\,b=f, \tag{6}$$

where $\mathbf{M}_{kj}\stackrel{\mathrm{def}}{=}(\mathsf{\Phi}_{k},\mathsf{\Phi}_{j})_{m}$ and $f_{j}\stackrel{\mathrm{def}}{=}(u,\mathsf{\Phi}_{j})_{m}$. Then, we can obtain the time-coefficient vector $b=[b_{1}(t),\dots,b_{K}(t)]$ from Equation (6).

**Remark 1.** Care is needed to solve Equation (6) efficiently, as the condition number of $\mathbf{M}$ varies with $m(x,t)$. The matrix $\mathbf{M}$ reduces to the $K\times K$ identity matrix in the case of complete data, i.e., $m(x,t)=1$. For gappy data, the condition number of $\mathbf{M}$ can be small or large depending on $m(x,t)$. When the condition number of $\mathbf{M}$ is large, we still formally write Equation (6), but we solve the corresponding overdetermined system rather than Equation (6) itself. If $\mathbf{M}$ is singular, we use the pseudoinverse. From now on, we keep this notation and will not state this explicitly.
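The per-snapshot solve described above can be sketched as follows. This is a minimal illustration, assuming the field is sampled on a uniform grid so that the gappy inner product reduces to a plain sum over the observed points; the function name `gappy_pod_coefficients` and the array layout are hypothetical choices made for this sketch and are not taken from [7,19,27].

```python
import numpy as np

def gappy_pod_coefficients(modes, u, mask):
    """Estimate the coefficients b of one snapshot from gappy data.

    modes: (n_points, K) array whose columns are POD modes on the grid
    u:     (n_points,) snapshot values; only entries with mask == 1 are used
    mask:  (n_points,) array of 0/1 values, the indicator m at this time
    Returns the coefficient vector b and the condition number of M.
    """
    observed = np.asarray(mask).astype(bool)
    phi_obs = modes[observed, :]              # modes restricted to observed points
    # Discrete counterpart of M_kj = (Phi_k, Phi_j)_m (uniform-grid assumption).
    M = phi_obs.T @ phi_obs
    # Solve the overdetermined system phi_obs @ b ~ u[observed] in the
    # least-squares sense instead of forming and inverting M; lstsq returns
    # the minimum-norm (pseudoinverse) solution when the system is rank deficient.
    b, *_ = np.linalg.lstsq(phi_obs, u[observed], rcond=None)
    return b, np.linalg.cond(M)

# Illustrative usage with synthetic orthonormal modes (not the wind data).
rng = np.random.default_rng(0)
modes, _ = np.linalg.qr(rng.standard_normal((200, 6)))
a = rng.standard_normal(6)
u = modes @ a
mask = np.zeros(200)
mask[rng.choice(200, size=30, replace=False)] = 1
b, cond_M = gappy_pod_coefficients(modes, u, mask)
```

Working with the restricted mode matrix rather than with $\mathbf{M}$ itself avoids squaring the condition number, which is one reasonable way to realize the overdetermined solve mentioned in the remark.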

Based on the aforementioned POD modes and the reconstruction procedure, the authors in [27,29] demonstrated that effective sensor placement strategies can be designed. In this case, the indicator function $m(x,t)$ is defined through the sensor locations, i.e., $m(l_{j},t)=1$ if there is a (working) sensor at position $l_{j}$ and zero otherwise. The only difference is that the complete field $u$ is now replaced by the sensor measurements $d(l_{j},t)$ ($j=1,\dots,N_{s}$), where $l_{j}$ denotes the position of the $j$-th sensor and $N_{s}$ is the total number of sensors. The resulting linear system will also be denoted by Equation (6).

In this work, we use 24 hourly wind-field data sets for Maine Bay simulated with the Weather Research and Forecasting (WRF) model. The data contain 24 snapshots of the wind field within an area of about 35 km × 25 km from 13 July 2004 00:00 a.m. to 14 July 2004 00:00 a.m. The computation domain and grid are illustrated in Figure 1. The time difference between two successive snapshots is 1 h, namely, snapshot 1 corresponds to the data on 13 July 2004 at 0:00 a.m., snapshot 2 corresponds to the data on 13 July 2004 at 1:00 a.m., etc. The temporal average of the wind field is subtracted from the field; hence we concentrate on the fluctuation. By cross-correlating different snapshots obtained from the wind simulations, we construct the covariance matrix and its eigen-decomposition. From the eigen-decomposition, we obtain the POD eigenvalues and the corresponding hierarchical modes. Here, we use Equations (2)–(6) to obtain Gappy-POD modes. We note that in [27], the procedure in Equations (2)–(6) is iterated when the measurement data are incomplete. Here, we focus on complete data but with noise.

In Figure 2, we show the normalized POD eigenvalue spectra of total velocity. For illustration, in Figure 3, we show the contour plots of the first and second POD spatial modes for total velocity.

In the case of complete data, i.e., $m(x,t)=\mathbf{1}$, the $L_{2}$ error of the truncated approximation for each snapshot in Equation (4) is:

due to the orthogonality of $\mathsf{\Phi}_{k}$, where $N$ is the total number of snapshots, i.e., $N=24$. We note that $\epsilon_{c}$ is the lower bound of the error of the Gappy-POD method. Since we use gappy data to reconstruct the entire field, the coefficients for the POD modes may not be optimal, i.e., the coefficients $b_{k}$ in Equation (3) are in general not equal to the $a_{k}$ in Equation (1). Therefore, the reconstruction error for a fixed snapshot (fixed $t$) is:

In [18,19], the authors demonstrate numerically the convergence of the reconstruction error of the Gappy-POD method: $\epsilon_{g}$ approaches $\epsilon_{c}$ as the number of sensors increases. More specifically, $\sum_{k=1}^{K}(a_{k}-b_{k})^{2}\to 0$ as the number of sensors increases. In our data, the first six POD modes ($K=6$) capture more than $90\%$ of the total energy. Throughout the paper, we set $K=6$ and we select sensor locations based on these six modes.
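The role of the term $\sum_{k=1}^{K}(a_{k}-b_{k})^{2}$ can be seen from a short calculation. The following is a sketch under two assumptions made here for illustration: the error is measured in the squared $L^{2}(X)$ norm, and the $a_{k}$ with $k>K$ denote the remaining coefficients of the expansion (1). Using only the orthonormality of the modes,

$$\Big\|u-\sum_{k=1}^{K}b_{k}\mathsf{\Phi}_{k}\Big\|^{2}=\Big\|\sum_{k=1}^{K}(a_{k}-b_{k})\mathsf{\Phi}_{k}+\sum_{k>K}a_{k}\mathsf{\Phi}_{k}\Big\|^{2}=\sum_{k=1}^{K}(a_{k}-b_{k})^{2}+\sum_{k>K}a_{k}^{2}.$$

The second sum is the truncation error obtained with complete data, so under these assumptions the gappy reconstruction error exceeds it by exactly $\sum_{k=1}^{K}(a_{k}-b_{k})^{2}$, which is the quantity that tends to zero as sensors are added.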

#### 2.2. Constrained Sensor Placement

We first consider the problem of finding the best possible locations at which to deploy a limited number of sensors in order to reconstruct the entire field with existing exact POD modes. The authors of [18,30] demonstrate that the extrema of the POD modes are very good sensor locations for regional ocean forecasting when the reconstruction error is measured in the $l_{2}$ norm; see Equations (10) and (11). Yang et al. [19] introduced the exclusion cylinder to further increase the accuracy of the reconstruction by reducing redundancy, since the extrema of different POD modes can be very close to each other. As shown in [19], the exclusion cylinder also helps to reduce the variance of the reconstructed field if the measurements are polluted by noise. In this paper, we employ an exclusion disk instead of the exclusion cylinder for a 2D model. Specifically, we impose the constraint that the distance between any two sensors is larger than $R$. That is, if we place two sensors at the points $(x_{1},y_{1})$ and $(x_{2},y_{2})$, then the following constraint is imposed:

$$\sqrt{(x_{1}-x_{2})^{2}+(y_{1}-y_{2})^{2}}>R. \tag{9}$$

In our wind model, the radius $R$ is expressed in kilometers. The constraint is enforced by a greedy algorithm: we start with an empty set $F=\varnothing$; then, for the next candidate sensor location $(x^{*},y^{*})$ (an extremum of a specific mode), we check whether $\sqrt{(x^{*}-x_{i})^{2}+(y^{*}-y_{i})^{2}}>R$ holds for every point $(x_{i},y_{i})$ in $F$. If not, we discard this location and examine the next candidate. When a qualified new sensor location $(\tilde{x},\tilde{y})$ is found, we expand $F$ as $F\leftarrow F\cup\{(\tilde{x},\tilde{y})\}$.
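The greedy selection with an exclusion disk can be sketched as follows. This is a minimal illustration, not the implementation used for the results: as a simplifying assumption, the candidates for each mode are all grid points ranked by the absolute value of the mode, which stands in for a proper search for local extrema, and the function name `place_sensors` and its arguments are hypothetical.

```python
import numpy as np

def place_sensors(coords, modes, per_mode, radius):
    """Greedy sensor placement with an exclusion disk of radius R.

    coords:   (n_points, 2) grid coordinates, in the same length unit as radius
    modes:    (n_points, K) POD modes sampled on the grid
    per_mode: number of sensors requested for each mode
    radius:   exclusion radius R; a candidate is accepted only if its distance
              to every accepted sensor is strictly greater than R
    Returns the list of accepted grid-point indices (the set F).
    """
    chosen = []
    for k in range(modes.shape[1]):
        # Assumption: rank candidates by |mode value|, largest first.
        candidates = np.argsort(-np.abs(modes[:, k]), kind="stable")
        placed = 0
        for idx in candidates:
            if placed == per_mode:
                break
            if chosen:
                dist = np.linalg.norm(coords[chosen] - coords[idx], axis=1)
                if np.any(dist <= radius):
                    continue                    # violates the exclusion disk
            chosen.append(int(idx))
            placed += 1
    return chosen
```

With `radius=0` the check only rejects a location that has already been chosen, so the sketch then places sensors purely by mode magnitude.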

We distribute the sensors evenly among the extrema of the POD modes used to reconstruct the field. In Table 1, we provide examples of sensor distributions for a fixed number of sensors with different numbers of POD modes used to reconstruct the field. With a configuration denoted as “2$s$-2$s$-2$s$-2$s$” with $s=1$, we distribute 2, 2, 2 and 2 sensors associated with POD modes 1, 2, 3 and 4, respectively. Similarly, a configuration denoted as “4$s$-4$s$-4$s$-4$s$-4$s$-4$s$” with $s=2$ means that we use eight sensors in connection with each POD mode from 1 to 6. Here, we employ this even distribution to demonstrate our idea. The numerical results in [18,19] demonstrate that different configurations of sensor placement do not induce significant differences in the reconstruction.

In our numerical tests, we fix the number of sensors to 24 and use six modes (hence the configuration is “4-4-4-4-4-4”) to reconstruct the entire field and to examine the effect of the exclusion disk size on the reconstruction of the total velocity. Figure 4 presents the sensor locations for different sizes of the exclusion disk. The contours show the first POD mode. With a larger disk radius $R$, the sensors are distributed farther away from each other.

**Remark 2.** To avoid redundant measurements, the authors of [25] suggested an approach that does not use the exclusion disk strategy. However, that approach allows only one sensor per mode.

To determine the effect of the exclusion disk size, we define the reconstruction error as (for each flow snapshot)

where $U_{i}$ is the measurement of total velocity at the $i$-th snapshot and $\widehat{U}_{i}$ denotes the estimator of the $i$-th snapshot obtained by solving Equation (6). Here, the integration is performed on the computational domain in physical space. To measure the reconstruction error within the whole period of interest, we also define the time-averaged error as

where $S$ denotes the total number of available snapshots within the considered time interval. The time-averaged errors for $R=0,0.5,1,2$ are presented in Table 2. Similar to the cases studied in [19], a proper selection of the size of the exclusion disk helps to improve the accuracy of the reconstruction. The exclusion disk need not be very large, as the errors for $R=0.5,1,2$ are quite close. In addition, the condition number of the matrix $\mathbf{M}$ in Equation (6) decreases as $R$ increases, which is consistent with the results reported in [18,19]. As an illustration, we compare the exact velocity field of the first snapshot (13 July 2004 00:00 a.m.) with those reconstructed with $R=0$ and $R=2$, respectively, in Figure 5. Figure 5 shows that the reconstruction of the wind field at the first snapshot is more accurate with the exclusion disk than without it.
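A per-snapshot error and its time average can be evaluated along the following lines. This sketch assumes a relative $l_{2}$ error normalized by the norm of the exact field on a uniform grid; that normalization, and the names `snapshot_error` and `time_averaged_error`, are assumptions made for illustration and may differ from the definitions used for Table 2.

```python
import numpy as np

def snapshot_error(u_true, u_rec, cell_area=1.0):
    """Relative l2 error of one reconstructed snapshot (assumed normalization)."""
    # Integrals over the domain are approximated by sums times the cell area.
    num = np.sum((u_true - u_rec) ** 2) * cell_area
    den = np.sum(u_true ** 2) * cell_area
    return float(np.sqrt(num / den))

def time_averaged_error(U_true, U_rec):
    """Mean of the per-snapshot errors; columns of U_true and U_rec are snapshots."""
    n_snap = U_true.shape[1]
    errors = [snapshot_error(U_true[:, i], U_rec[:, i]) for i in range(n_snap)]
    return float(np.mean(errors))
```

Computing this quantity for sensor sets obtained with several values of $R$ gives a comparison of the kind summarized in Table 2.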