Efficient Sensor Node Selection for Observability Gramian Optimization

Optimization approaches that determine sensitive sensor nodes in a large-scale, linear time-invariant, discrete-time dynamical system are examined under the assumption of independent and identically distributed measurement noise. This study proposes two novel selection algorithms, namely an approximate convex relaxation method solved with the Newton method and a gradient greedy method, and compares their performance with that of methods proposed in previous studies: a convex relaxation method with semidefinite programming (SDP) and a pure greedy optimization method. The determinant of the observability Gramian was employed for the evaluation of the sensor subsets, and its gradient and Hessian were derived for the proposed methods. In demonstrations using numerical and real-world examples, the proposed gradient greedy method showed superior run time when the number of sensors was roughly the same as the dimension of the latent system. The relaxation method with SDP was confirmed to be the most reasonable approach for systems with randomly generated matrices of higher dimensions. However, a degradation of the optimization results was also confirmed in the case of real-world datasets, whereas the pure greedy selection yielded the most stable optimization results.

The physical aspects of the phenomena are interpreted as state space models, while the assumed system representations are often nominal. Our study implements optimization frameworks for sensor node selection based on criteria of expected estimation errors defined for these models. Sensor measurements in many nondynamical configurations are statistically described by the Fisher information matrix (FIM), which originated in parameter sensitivity analysis [47,48]. This measure addresses the expected error ellipsoid of the estimation using instantaneous measurements, which corresponds to the inverse of the Cramér-Rao bound; see Chapter 8 of [49,50]. The combinatorial problem structure is transformed into small subproblems in a greedy formulation [40,41,51], whose solution quality is known to be ensured by the submodularity property in such optimization problems [52-56]. This property of the optimization problem helps to exploit other criteria from information theory, including entropy and various types of information distances [3,57,58]. Moreover, greedy selections are empirically confirmed to work effectively even in the absence of submodularity [51,59,60]. There are also other approaches that seek the global optimum by relaxing the integer constraints on the selection variable [50,61], or by employing proximal algorithms that form proximity operators [62-64].
The subset selection of sensor nodes for dynamical systems is similarly conducted based on the estimation error covariance, while the measurement quality evaluation tends to be more cumbersome. The observability Gramian is the counterpart of the FIM for linear time-invariant (LTI) representations with deterministic dynamics, as shown in [65]. The actuator node selection, which is a dual problem of the sensor node selection, is similarly obtained using the controllability Gramian [66,67]. It is worth emphasizing that a previous analysis used examples of random and regular networks with only tens or hundreds of nodes due to the computational cost. For example, a previous study reported a tremendous increase in the computation time of the greedy method and the SDP relaxation method for a power grid system of such a size [66]. There are likely to be computational issues when the optimization objective is further extended to the observability of nonlinear state space models [44,68,69], the error covariance of the Kalman filter [70-74], and the H2 norm of an LTI system, which can be reduced through balanced truncation [75-77]. Accordingly, the main interest in this study is the applicability of the Gramian-based methods to larger-scale systems constructed by data-driven modeling methods.
The main claims of this paper are listed below:
- Novel selection methods proposed: Two novel selection methods are proposed and implemented. These two methods utilize the gradients of the objective functions while relaxing them to continuous functions. Computations in both methods are expected to be remarkably accelerated compared to those of existing greedy or relaxation-based algorithms, since the gradients of the objective functions are mere quadratic forms in the dimensions of the sensor candidates.
- Comparison using high-dimensional and low-rank systems: A comparison of selection strategies is provided in Section 4 to illustrate the characteristics of each selection method in terms of the execution time and the acquired optimization measure. This comparison also elaborates the effective computational complexity of each selection method, which is distinct from the results of the previous study.
Figure 1 depicts the main frame of this manuscript. In the following sections, the basic formulation of the optimization problem is presented in Section 2 after introducing the dynamical system of interest. Section 3 subsequently addresses our novel algorithms for the optimization, while the previously presented approaches are briefly included. The systems are constructed in two different ways for comparison: one is constructed via random numbers as a general case in Section 4.1, and in the other case, a data-driven method using proper orthogonal decomposition (POD) [78] is applied to real-world datasets, as seen in Sections 4.2 and 4.3.

Figure 1. Data-driven reduced-order modeling and the selection of a small measurement subset based on the observability measure. The representative data points are revealed from "rich" measurement data by use of a data-driven model and an optimization procedure. (Right) A data-driven method constructs linear reduced-order models before the optimization is conducted by approximate algorithms, including our novel methods, denoted by bold type.

LTI State Space and the Observability Gramian
This section provides rudimentary formulations that represent the dynamics and the optimization task based on the observability of the system. The problem settings are first presented with basic equations, and the notations for our formulation are defined. The term "observability" is omitted hereafter unless it causes confusion with the controllability Gramian or other Gram matrices. One may be reminded that the methodologies presented here are similarly adopted for controllability optimization, since these properties are dual for linear systems. A discrete-time LTI state space model is considered with r (∈ N) states and n (∈ N) measurements. Let (C, A) be an observable pair, and let an autonomous system generate a trajectory of state variables x_k ∈ R^r and observations y_k ∈ R^n:

x_{k+1} = A x_k, (1)
y_k = C x_k + v_k. (2)

The subscript k ∈ N refers to a snapshot at the instance k. Assume the observation Equation (2) is corrupted by a Gaussian noise v_k ∼ N(0, R), which is independent of the state x_k and has components of the same variance, E[v_k v_k^⊤] = R = σ²I_n, where I_n is the identity matrix of size n. Here, E[·] denotes the expectation over the random variable. Some data-driven techniques systematically provide this linear representation from the measured data, such as proper orthogonal decomposition and dynamic mode decomposition [37,38]. In the state space representation obtained by the data-driven techniques, the state vector represents coordinates of a low-dimensional subspace inherent in the high-dimensional space of the measurement vector and, therefore, n ≫ r is assumed hereafter. The observability of the system is quantified by the Gramian:

W_O = Σ_{k=0}^{∞} (A^⊤)^k C^⊤ C A^k. (3)

In our formulation, this matrix is characterized by the linear least-squares estimation of the state vector that utilizes time-series measurements. The measurement equation is extended first, using Equations (1) and (2) from k = 0 to k = l − 1:

y_{0:l−1} := [y_0^⊤, y_1^⊤, …, y_{l−1}^⊤]^⊤ = C_{0:l−1} x_0 + v_{0:l−1}, (4)

where C_{0:l−1} := [C^⊤, (CA)^⊤, …, (CA^{l−1})^⊤]^⊤ and v_{0:l−1} := [v_0^⊤, …, v_{l−1}^⊤]^⊤ are the stacked components in Equation (4).
An estimate of x_0 is obtained by the linear inversion as follows:

x̂_0 = C_{0:l−1}^† y_{0:l−1}, (5)

where x̂_0 ∈ R^r and † stand for the estimate of x_0 and the Moore-Penrose pseudo-inverse, respectively. The matrix C_{0:l−1} is assumed to have full column rank due to n ≫ r. The error covariance matrix of the estimate x̂_0 is proportional to the inverse of the Gramian, as shown in the following equation:

E[(x̂_0 − x_0)(x̂_0 − x_0)^⊤] = σ² (C_{0:l−1}^⊤ C_{0:l−1})^{−1} → σ² W_O^{−1} (l → ∞). (6)

Here, let a small number of sensors be deployed while maintaining the quality of the state estimation. The optimization problem is formulated as the selection of a subset I_p = {i_1, …, i_p} from all n available measurement nodes, labeled as I_n = {1, 2, …, n}, such that a measure of the Gramian is optimized under some constraints. A permutation matrix P_{I_p} that extracts the part of the measurement corresponding to the sensor indices I_p is defined by

P_{I_p} = [e_{i_1}, e_{i_2}, …, e_{i_p}]^⊤, (7)

where a unit vector e_i ∈ R^n has unity in the i-th entry and zeros elsewhere. The Gramian given by the selected sensors is rewritten as a subset function,

W_O(I_p) = Σ_{k=0}^{∞} (A^⊤)^k C^⊤ P_{I_p}^⊤ P_{I_p} C A^k, (8)

and, therefore, W_O(I_n) = W_O. Here, W_O(I_p) can be calculated by solving the following Lyapunov equation:

W_O(I_p) = A^⊤ W_O(I_p) A + C^⊤ P_{I_p}^⊤ P_{I_p} C. (9)

Some measures for the optimization of W_O(I_p) were presented in [66]. The maximization of the determinant of the Gramian is configured in the current study. The maximization problem is as follows:

maximize_{I_p ⊂ I_n} log det W_O(I_p) subject to |I_p| = p. (10)

The logarithmic form, which is monotone, is considered for ease of the calculations in the algorithms. This determinant maximization strategy is commonly used in sensor placement and the optimal design of interpolation methods [40,79].
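As a concrete illustration of Equations (9) and (10), the following sketch evaluates log det W_O(I_p) for a candidate subset via a discrete Lyapunov solver. The paper's own implementation is in MATLAB (using dlyap); this Python version with SciPy, including all function and variable names, is our own illustrative assumption.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def subset_gramian_logdet(A, C, idx):
    """Solve W = A^T W A + C_p^T C_p for the rows in idx and return log det W."""
    Cp = C[list(idx), :]                       # rows of C for the chosen sensors
    # scipy solves X = M X M^H + Q; pass M = A^T to obtain the observability form
    W = solve_discrete_lyapunov(A.T, Cp.T @ Cp)
    sign, logdet = np.linalg.slogdet(W)
    return logdet if sign > 0 else -np.inf     # -inf flags an unobservable subset

# toy example: a stable random system (eigenvalues of magnitude 0.9)
rng = np.random.default_rng(0)
A = 0.9 * np.linalg.qr(rng.standard_normal((4, 4)))[0]
C = rng.standard_normal((8, 4))
val = subset_gramian_logdet(A, C, [0, 3, 5, 7])
```

Because W_O(I_p) can be singular for small subsets, slogdet is used here so that an unobservable selection is reported as −∞ rather than raising an error.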

Sensor Selection Strategies
The optimization problem (10) is a combinatorial problem and, therefore, finding the true optimum is computationally prohibitive. This section introduces the four methods shown in Table 1 for obtaining a suboptimal yet reasonable solution to (10). Section 3.1 deals with convex relaxation approaches to (10): "SDP" denotes the selection based on the semidefinite programming (SDP) of [67], and "approximate convex relaxation" denotes an approximate smooth convex relaxation solved with the Newton method, which is extended from [50]. Subsequently, Section 3.2 provides formulations of the pure greedy method, a simplified greedy implementation of [66] for the matrix determinant maximization, and the gradient greedy method, which is its linear approximation. The computational complexities are discussed in Section 3.3.
Table 1. Selection algorithms for each relaxed problem and the expected arithmetic complexity order based on basic matrix operations. Several of the approaches are proposed in the present paper.

Convex Relaxation Methods
This strategy replaces the Gramian W_O(I_p) by the weighted sum of the Gramians of all sensor candidates, namely the relaxed Gramian. Let the sum of the selection weight s ∈ [0, 1]^n be p; the relaxed optimization problem of interest is:

maximize_s log det Q(s) subject to Q(s) = A^⊤ Q(s) A + C^⊤ diag(s) C, 1^⊤ s = p, 0 ≤ s_i ≤ 1. (11)

This "weight" formulation is widely used in various types of sensor selection, including linear inverse and Kalman filter estimations [50,61]. The "gain" formulation, on the other hand, is employed to optimize the gains for yielding state variables from measurements. Group regularization is therein adopted to distinguish the representative measurement locations, which can also be found in the literature on sensor selection for linear inverse and Kalman filter estimations [62-64,80]. To the best of our knowledge, there is no existing study in which the gain formulation is applied to sensor selection based on the observability Gramian. This may be because the Gramian is related to an infinite series of temporal measurements, and so are the gains for such measurements. Truncating Equation (6) to a finite time horizon is obviously a possible option, but dealing with a large solution vector and defining an appropriate finite time horizon as a hyperparameter are practically infeasible. Accordingly, the implementation of the gain formulation remains an interesting challenge.
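The relaxed Gramian Q(s) in Equation (11) can be computed by a single Lyapunov solve once the weights are fixed. A minimal Python/SciPy sketch (the paper uses MATLAB; the function name and the toy data are our own assumptions):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def relaxed_gramian(A, C, s):
    """Solve Q(s) = A^T Q(s) A + C^T diag(s) C for a weight vector s in [0,1]^n."""
    return solve_discrete_lyapunov(A.T, C.T @ np.diag(s) @ C)

# toy example: uniform initial weights with sum(s) = p, as in the Newton algorithm
rng = np.random.default_rng(1)
A = 0.8 * np.linalg.qr(rng.standard_normal((3, 3)))[0]   # stable system matrix
C = rng.standard_normal((6, 3))
p = 4
s = np.full(6, p / 6)                                    # s = (p/n) * ones
Q = relaxed_gramian(A, C, s)
```

Setting s to the indicator vector of a subset I_p recovers W_O(I_p), which is why (11) is a relaxation of (10).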

Semidefinite Programming-Based Selection (SDP)
Convex relaxation for Gramian optimization was previously introduced in Section II-B of [67]; the readers should refer to the original work for more details. It should be emphasized that an optimization problem in the discrete-time form is briefly revisited in our study and included in Section 4 for the comparison of selection methods. The optimization problem (11) is transformed into the following SDP representation:

maximize_{s,Q} log det Q subject to Q = A^⊤ Q A + Σ_{i=1}^{n} s_i c_i^⊤ c_i, Q ⪰ 0, 1^⊤ s = p, 0 ≤ s_i ≤ 1, (12)

where c_i is the i-th row of the measurement matrix C, and 0 is a zero matrix with appropriate dimensions. It should be noted that the Lyapunov equation, imposed as a constraint in the problem, can be satisfied by the SDP solvers introduced later. The complexity in Table 1 is based on a path-following method for a general primal-dual interior-point method [81].

Newton Method for Approximate Convex Relaxation, and Its Customized Algorithm with Randomized Subspace Sampling (Approximate Convex Relaxation)
This novel method solves the convex relaxation problem by applying the Newton method and a customized randomization technique to Equation (11), with an added penalty term that bounds the weight variables.
The description of the proposed method starts with the extension of the formulation of sensor selection for static systems first introduced in [50].
In their approach, the Newton method solved a weight formulation of the determinant optimization of the FIM for a linear inverse problem, and returned the p-largest indices of s as the result of a heuristic sensor node selection. In this study, the idea above is straightforwardly extended to the Gramian. A smooth convex objective function approximates Equation (10) as follows:

f(s) = − log det Q(s) − κ Σ_{i=1}^{n} ( log s_i + log(1 − s_i) ), (13)

with κ > 0, which adjusts the smoothness of the objective function. A Newton step δs ∈ R^n is determined by minimizing the second-order approximation of the objective function under the constraint 1^⊤ δs = 0, as discussed in [50] and Section 10.2 of [81], with f referring to the objective function in Equation (13) (Equation (14)). The first and second derivatives of f with respect to the selection weight s are given by Equations (15) and (16), where δ_{i,j} is the Kronecker delta and the auxiliary matrix therein is the solution of a companion Lyapunov equation. The procedure is carried out according to Algorithm 1, where the iteration is completed when the ℓ2 norm of the solution update falls below a given threshold.

Algorithm 1 Newton algorithm for Equation (13)
Input: C ∈ R^{n×r}, A ∈ R^{r×r}, p ∈ N
Output: Indices of the chosen p sensor positions, I_p
  Set an initial weight s ← (p/n)1
  while convergence condition not satisfied do
    Calculate ∇f by Equation (15) and ∇²f by Equation (16)
    Calculate δs by Equation (14)
    Obtain step size t by backtracking line search
    Set s ← s + tδs
  end while
  Return the indices of the p-largest components of s as I_p

One of the most computationally demanding steps of the Newton method in Section 3.1.2 is the inversion of the Hessian in Equation (14), which approximately reaches O(n^3). The rest of this section describes an efficiency improvement for the Newton method of Algorithm 1, in which the computational cost is relaxed using small sketches, as in [82,83]. The dimension of the sketched Newton step is ñ < n; the results for ñ/n = 0.1 in [83] showed a drastic reduction in the total computation time for convergence in spite of the increase in the number of iterations.
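The equality-constrained Newton step of Equation (14) reduces to a standard KKT linear system once the gradient and Hessian are available. The following Python sketch (names are our own; the gradient/Hessian inputs are assumed to be precomputed from Equations (15) and (16)) shows the step computation:

```python
import numpy as np

def constrained_newton_step(grad, hess):
    """Newton step minimizing the quadratic model of f subject to 1^T ds = 0,
    via the KKT system [[H, 1], [1^T, 0]] [ds; nu] = [-g; 0]."""
    n = grad.size
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = hess
    K[:n, n] = 1.0          # equality-constraint column
    K[n, :n] = 1.0          # equality-constraint row
    rhs = np.concatenate([-grad, [0.0]])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]          # the Lagrange multiplier sol[n] is discarded

# toy check on a strictly convex quadratic model
H = np.diag([2.0, 4.0, 8.0])
g = np.array([1.0, -2.0, 0.5])
ds = constrained_newton_step(g, H)
```

The constraint 1^⊤ δs = 0 keeps the weight sum fixed at p across iterations; optimality of the step means H δs + g is a constant vector (a multiple of the all-ones vector).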
In this study, a sketching matrix S_ñ ∈ R^{ñ×n} is constructed using a permutation matrix for ease of computation, as summarized in the following and in Algorithm 2. The subspace referring to the permutation is randomly assigned from the n components and is indexed by I_ñ.

This random selection is biased according to the selection weight s from the previous iteration. This heuristic leads to a reasonable acceleration of the convergence because the weights of the higher-weighted sensors in the first few iterations are more likely to be updated frequently toward convergence. In this study, half of the permutation coordinates correspond to the indices of the highest ρ = ñ/2 values of s. The other half of the permutation coordinates are randomly selected from the remaining n − ñ/2 dimensions, which further accelerates the exploration of the sensor selection.
Accordingly, the calculations of the gradient (the first term in Equation (15)) and the Hessian (also the first term in Equation (16)) are simplified to the subspace indexed by I_ñ. These subsampled derivatives and the derived Newton step are denoted by ∇̃f, ∇̃²f, and δ̃s. The criterion of convergence is modified from that in Algorithm 1 due to the randomness: the algorithm stops if the update size is less than a given threshold for n/ñ consecutive iterations.
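The biased coordinate sampling described above can be sketched as follows; this Python fragment (function name is our own) picks the top-weighted half deterministically and fills the rest uniformly at random, which is the heuristic of Algorithm 2:

```python
import numpy as np

def biased_subspace_indices(s, n_tilde, rng):
    """Pick n_tilde coordinates: half are the highest-weight entries of s,
    the other half are drawn uniformly from the remaining coordinates."""
    n = s.size
    rho = n_tilde // 2
    top = np.argsort(s)[::-1][:rho]                 # indices of the rho largest weights
    rest = np.setdiff1d(np.arange(n), top)          # candidates for the random half
    rand = rng.choice(rest, size=n_tilde - rho, replace=False)
    return np.concatenate([top, rand])

rng = np.random.default_rng(2)
s = rng.random(20)
idx = biased_subspace_indices(s, 6, rng)            # a sketch of dimension n_tilde = 6
```

The returned index set I_ñ then restricts the gradient and Hessian evaluations, so each Newton step costs O(ñ^3) instead of O(n^3) for the inversion.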

Greedy Algorithms
This type of algorithm sequentially adds the most effective sensor to the subset determined in the previous greedy iterations. The greedy algorithms, therefore, approximate the combinatorial aspect of the original optimization problem, yet preserve the discrete optimization structure. However, even the greedy selection is empirically known to be prohibitive for systems with high-dimensional state and observation vectors; therefore, accelerating methods are often applied [55,56]. Section 3.2.2 provides an approximated form of the greedy selection.

Greedy Selection with Simple Evaluation (Pure Greedy)
In the selection of the q-th sensor, where 1 ≤ q ≤ p, the Gramian W_O(I_q) is obtained from the algebraic Lyapunov Equation (9). The submodularity of the objective function guarantees the quality of greedy solutions [52,53,84], and the derivation of the submodularity of several metrics related to the Gramian is discussed in detail in [66]. This method starts by finding the most effective single sensor; therefore, the determinant can be zero if observability is not attained by any single sensor. As mentioned in [66], one may evaluate the objective function on the observable subspace on such occasions. Algorithm 3 starts by decomposing the Gramian for each sensor candidate and identifying the subset I* that yields the highest dimension of the observable subspace, and then evaluates the determinant of the decomposed Gramian Ŵ_O(I_{q−1} ∪ {i}) restricted to the observable subspace by multiplying its nonzero eigenvalues. Once a full-rank Gramian is achieved by an obtained subset, the greedy algorithm drops the computation of the rank of the candidate Gramian, since the rank is monotonically increasing [66].
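The core loop of the pure greedy selection can be sketched as below. This Python version (names are our own) omits the rank-tracking refinement for unobservable subsets described above and simply reports −∞ for singular Gramians, which suffices to illustrate the p·n Lyapunov-solve structure:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def logdet_obs(A, C_rows):
    """log det of the observability Gramian for the given measurement rows."""
    W = solve_discrete_lyapunov(A.T, C_rows.T @ C_rows)
    sign, val = np.linalg.slogdet(W)
    return val if sign > 0 else -np.inf

def pure_greedy(A, C, p):
    """Add, one at a time, the sensor that most increases log det W_O."""
    chosen = []
    for _ in range(p):
        candidates = [i for i in range(C.shape[0]) if i not in chosen]
        scores = [logdet_obs(A, C[chosen + [i], :]) for i in candidates]
        chosen.append(candidates[int(np.argmax(scores))])
    return chosen

rng = np.random.default_rng(3)
A = 0.7 * np.linalg.qr(rng.standard_normal((3, 3)))[0]   # stable toy system
C = rng.standard_normal((10, 3))
sel = pure_greedy(A, C, 4)
```

Each greedy increment evaluates one Lyapunov equation per remaining candidate, matching the O(pnr^3) complexity discussed in Section 3.3.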

Greedy Selection with Gradient Approximation (Gradient Greedy Proposed in This Study)
Algorithm 4, proposed in this paper, is expected to accelerate the greedy evaluations by a linear approximation of the objective function, which is schematically explained in Figure 2.
The gradient greedy algorithm selects, in the current step q, the sensor corresponding to the element with the highest gradient of the objective function f(s) evaluated at the prior step. Recall the gradient with respect to the i-th sensor candidate in Equation (15), where c_i is the same notation as used in (13). In the selection of the q-th sensor, the i-th component of ∇ log det(Q(s)) is approximated by replacing Q(s) → W_O(I_{q−1}), as given in Equation (20). This approximation removes the evaluations of the Lyapunov equation and the matrix determinant from the greedy selection and instead computes inner products of vectors with a weighting matrix involving the Gramian determined in the previous search. Note that the inverse in Equation (20) is singular when the system is unobservable. A small regularization term δI = diag(δ, …, δ) is therefore added to W_O(I_{q−1}) to ensure its regularity.
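Under our reading of Equation (20), the score of candidate i takes the quadratic form c_i Φ c_i^⊤, where Φ solves a companion Lyapunov equation driven by (W_O(I_{q−1}) + δI)^{−1}; this follows from differentiating log det Q(s) through the Lyapunov constraint. The exact form is our assumption, as is this Python sketch (the paper's code is MATLAB):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def gradient_scores(A, C, W_prev, delta=1e-8):
    """Score every candidate i by c_i Phi c_i^T, where Phi solves
    Phi = A Phi A^T + (W_prev + delta I)^{-1}; one Lyapunov solve covers all i."""
    r = A.shape[0]
    Winv = np.linalg.inv(W_prev + delta * np.eye(r))   # regularized inverse Gramian
    Phi = solve_discrete_lyapunov(A, Winv)
    return np.einsum('ij,jk,ik->i', C, Phi, C)         # c_i Phi c_i^T for every row

rng = np.random.default_rng(4)
A = 0.6 * np.linalg.qr(rng.standard_normal((3, 3)))[0]
C = rng.standard_normal((8, 3))
W0 = solve_discrete_lyapunov(A.T, C[[0], :].T @ C[[0], :])  # Gramian of sensor 0
scores = gradient_scores(A, C, W0)
next_sensor = int(np.argmax(scores))
```

Only one r × r Lyapunov solve is needed per greedy step; the per-candidate cost is an O(r^2) quadratic form, which is the source of the O(pr^3) + O(pnr^2) complexity claimed in Section 3.3.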

Expected Computational Complexity
This section describes the expected computational complexity of the selection methods based on matrix operations found in textbooks on basic linear algebra. The leading terms with respect to the system dimension parameters n, p, and r are summarized in Table 1. This also gives the expected overhead term of the run time for each algorithm, while in practice, more compact expressions can be obtained depending on the computational libraries employed and the problem structures considered. Readers should also note that the complexity notation O(·) conventionally omits constant factors for simplicity and, therefore, lower-order terms might overwhelm the others in actual computations.
Although the optimal subset I_p for Equation (10) can be found by calculating the objective function for all subsets of size p, the computational complexity of this brute-force search is O((n choose p) r^3), which is hardly tractable. As for the existing approaches, a naive implementation of the linear convex relaxation method assuming the semidefinite programming (SDP) structure is a simplified form of [67]. The interior-point method and path-following iterations require O(n^4) + O(n^2 r^2) + O(n r^3) per iteration to construct the Newton direction. This is due to the linear matrix inequality (LMI) constraints, such as the Gramian being semidefinite or the selection variable being bounded, while it is known that certain problem structures ameliorate the overall costs ([81], Section 11.8). A simple greedy algorithm (pure greedy) requires solving the Lyapunov equation and calculating the determinant of the observability Gramian, at a cost of O(r^3), for all n sensor candidates and p sensor-increment iterations. The overall computational complexity is, therefore, O(pnr^3).
The proposed methods are expected to accelerate both of the existing approaches, judging from the theoretical complexities. The proposed convex relaxation method (denoted by approximate convex relaxation) utilizes Newton method iterations for the sensor selection. The algorithm requires O(n^3) computations for the inversion of the Hessian. Furthermore, Equation (18) is solved for every j ∈ {1, 2, …, n} for Equation (16), which requires a computational complexity of O(nr^3) for solving the algebraic Lyapunov Equation (18) and obtaining the matrix multiplications inside. The first term in Equation (16) requires O(n^2 r^2); therefore, the leading terms sum to O(n^3) + O(n^2 r^2) + O(nr^3) per iteration. The sketching matrix in Algorithm 2 compresses the dimension of the Newton system from n to ñ and reduces the costs of the above-mentioned computations by that ratio; however, the complexity notation does not change. The proposed gradient greedy algorithm (gradient greedy) replaces the evaluations of the Lyapunov equation over the n sensor candidates with inner products of vectors weighted by an r × r matrix. The overall computational complexity is O(pr^3) + O(pnr^2), where the algebraic Lyapunov equation is evaluated only p times.

Comparison and Discussion
Numerical examples illustrate the efficiency of the considered algorithms in Section 4. The tested systems, which are represented by (C, A), are generated from synthetic datasets of random numbers and real-world counterparts of large-scale measurements. As used in Section 2, the systems of interest are in discrete-time LTI form. The entire computation was conducted on MATLAB R2022a and CVX 2.2 [85,86]; the MOSEK solver was used for the SDP-based selection, as noted in Appendix A. The dlyap function was used for the solutions of the discrete-time Lyapunov equations, such as Equations (9) and (18), adopting the subroutine libraries from the Subroutine Library in Control Theory (SLICOT) [87-90]. It should also be noted that there are other approaches to solving the equation, such as [91]. The MATLAB programs are available through the GitHub repository of the present authors [92].

Results of the Randomly Generated System
The characteristics of the sensor selection methods are investigated for different sets of the system dimension parameters (n, r, p) by varying one parameter while fixing the others. The abscissa of Figure 3a,b is the rank r ∈ {10, 20, …, 60}, which is the dimension of the reduced state variable vector and, thus, of the Gramian. In Figure 3c,d, n ∈ {2^10, 2^11, …, 2^16} denotes the size of the node set comprising the original measurements and, accordingly, the number of sensor candidates. In Figure 3e,f, p ∈ {1, 2, …, 10, 20, …, 100} represents the varying number of sensors selected. The problem setting for the system of random numbers is found in [39]. First, conjugate complex numbers with negative real parts are assigned to the eigenvalues of a damped continuous-time system matrix Â. A discrete-time system matrix is obtained by A = exp(Â∆t), which is stable and full-rank, where ∆t is the time step of the discrete system. The observation matrix C is a column-orthogonal matrix generated by the singular value decomposition of a matrix of Gaussian random numbers of appropriate dimensions. Sensors up to p are selected using the algorithms presented in the previous section, and the objective function, log det W_O(I_p), is calculated for each selected subset. Figure 3 shows the performance of the selection methods applied to the system of random numbers. Figure 3a,c,e illustrate the total computation time of each algorithm in the tests, where the gradient greedy method is the least time-consuming, followed by the pure greedy and the SDP-based methods, which produce similar results. Unfortunately, the proposed approximate convex relaxation method took the longest time to solve the optimization in almost all of the conditions tested, despite its acceleration owing to the customized randomization.
However, it is also expected that the computation time of the SDP-based method and the pure greedy selection will exceed the others for even larger r and p, respectively.
The empirical orders of the computation time with regard to each parameter are analyzed herein. The growth rates of the computation time of each algorithm are evaluated by solely changing the system dimension parameters r, n, and p, as summarized in Table 2, based on the results shown in Figure 3a,c,e. First, the gradient greedy method ran in time proportional to n, but the trend with respect to r is not clear. Since the number of sensor candidates, n, is much larger than r in the first experiment, the term with r^3 is not significant for the gradient greedy method. The increase against p is on the order of unity, which is clearly reasonable. Second, the empirical orders of the pure greedy method when solely changing r, n, and p are r^3, n, and p, respectively, which agrees with the expected leading order. One can see that a term with r^3 becomes dominant as r grows in Figure 3a. Meanwhile, the estimated orders of the SDP-based method when solely changing r and n are r^[4] and n, while those of the approximate convex relaxation method are r^[3] and n^[2], respectively. Here, let the notation [j] stand for (despite an obvious abuse) a real exponent bounded by j − 1 and j for an arbitrary positive integer j. These noninteger orders may be due to the optimized arithmetic employed in the software, such as a Strassen-like algorithm [93]. The dependency on p was not clear in the SDP-based method, while a slight increase was observed for the approximate convex relaxation method. An interesting aspect is that the dominating order of the SDP-based method with respect to n was not n^4, as initially expected, but approximately proportional to n. This is perhaps because the constraint posed as 0 ≤ s_i ≤ 1 in Appendix A was simplified in the tested implementation to a mere diagonal block of the semidefinite linear matrix inequality (LMI), and the MOSEK solver should have taken advantage of this structure during the Newton step calculation, despite the large n assumed.
It should also be noted that this efficacy was not observed for other solvers, such as SDPT3, although the CVX parser does not seem to change its output. Nonetheless, the complexity of solving the SDP would be enormous, as expected, if the LMI included a semidefinite relaxation [61,94] of the selection variable vector s, such as the constraint that ss^⊤ be a semidefinite matrix. This agrees well with the observations from the experiment for the SDP-based method with regard to r, where an LMI of dimension r × r is included.
In addition to the computation time per step, the iteration numbers before convergence, shown in Table 3, also illustrate the computationally friendly features for high-dimensional problems. The iteration numbers of the SDP-based and approximate convex relaxation methods do not increase significantly as r, n, and p individually increase. Interestingly, the iteration numbers of these convex relaxation methods change in different fashions. That of the SDP-based method slightly increases with increases in n and p, and slightly decreases with an increase in r, while that of the approximate convex relaxation method shows the opposite trends. This leads to similar growth rates against r between these two methods, as shown in Figure 3a, where the empirical computational complexity is less than O(r^4) for the SDP-based method and more than O(r^3) for the approximate convex relaxation method. Moreover, Figure 3c shows that this difference leads to growth rates against n that are slightly larger than O(n) for the SDP-based method and smaller than O(n^3) for the approximate convex relaxation method. This unexpected efficiency of the SDP-based method leads to better scalability with respect to n over the approximate convex relaxation, the other convex relaxation method. With regard to the increase in p, the iteration numbers of the SDP-based method increased only slightly, while those of the approximate convex relaxation method exhibited an increase around p = 20, corresponding to the increase in the total run time in Figure 3e. As shown in Figure 3b,d,f, almost the same objective values were obtained by the SDP-based and approximate convex relaxation methods, which implies a good agreement between the solutions of these relaxation methods. They yielded objective function values better than or comparable to those of the greedy methods, up to twice as high, except for the p = 1, 2 cases.
The values obtained by the gradient greedy method, on the other hand, give an inferior impression as a selection method compared with the other methods for the p ≤ r cases. This is possibly due to the difficulty in ensuring observability with a limited number of sensors, especially when r > p, which leads to an unstable calculation of the gradient in Equation (20). The gradient greedy algorithm is concluded to perform poorly in achieving observability, especially when the convex objective function is hardly approximated by its linear tangent due to large r.
The discussion above illustrates that the SDP-based method seems to be the most favorable among the compared methods in this experiment. The pure greedy method is also a reasonable choice due to its shorter computation time, straightforward implementation, and reasonable performance, returning about half the objective function value of the convex relaxation methods in the experiments.

Results for Data-Driven System Derived from Real-World Experiment
An example of a practical application was conducted using an experimental dataset of the flow velocity distribution around an airfoil. The data used herein are found in [95,96]; they were previously acquired by particle image velocimetry (PIV) in a wind tunnel. A brief description of the experiment is shown in Table 4; refer to the original articles by the authors for more specific information. The snapshots taken in the experiment quantify the velocity vectors that span the visualized plane, i.e., two components on a two-dimensional grid of n = 9353 points, as depicted in Figure 4. Only the streamwise components (the direction shown by the arrow in Figure 4) are used, and the ensemble averages over m snapshots are subtracted at each measurement location; that is, the averages for each pixel of the calculated velocity image.
Table 4. Brief descriptions of the PIV data [95,96].
As attempted in Section 4.1, a linear representation (C, A) is derived first. The data-driven system identification procedure is based on the modeling method of [30,78]. Here, the snapshots of the velocity field are reshaped into m vectors of dimension (n × 1), y_1, . . . , y_m, and the data matrix is defined by Y := [y_1, . . . , y_m] ∈ R^(n×m). The proper orthogonal decomposition (POD) is then adopted and the data are projected onto the subspace of the leading r POD modes [37], resulting in the truncated decomposition Y ≈ UΣV^T (U ∈ R^(n×r), Σ ∈ R^(r×r), V ∈ R^(m×r)). The measurement matrix C is defined by U, which consists of the r left singular vectors associated with the largest r singular values. These left singular vectors represent the dominant spatial coherent structures, the right singular vectors represent the temporal coherence, and the singular values are the amplitudes of these modes. The use of POD in our study is intended for a more fundamental discussion based on a linear system representation in which arbitrary-order low-rank systems are derived from high-dimensional measurement data.
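The POD-based construction of the measurement matrix described above can be sketched in Python as follows; the snapshot sizes used here are small placeholders rather than the experimental values (the paper uses n = 9353 grid points):

```python
import numpy as np

# Truncated POD of a snapshot matrix Y (n x m): keep the leading r modes.
rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 50))          # placeholder snapshot data
Y = Y - Y.mean(axis=1, keepdims=True)       # subtract the ensemble average

r = 10
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
U_r, S_r, V_r = U[:, :r], np.diag(s[:r]), Vt[:r].T   # Y ~ U_r S_r V_r^T

C = U_r            # measurement matrix: leading left singular vectors
X = U_r.T @ Y      # r-dimensional state trajectory, x_k = U_r^T y_k
print(X.shape)     # (10, 50)
```

Since the columns of U_r are orthonormal, projecting the snapshots with U_r^T is the least-squares optimal rank-r representation of the data.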
The r-dimensional state variable vector x is given by x_k = U^T y_k. After introducing X_1 = [x_1, . . . , x_(m−1)] and X_2 = [x_2, . . . , x_m], the system matrix is then expressed using linear least squares via the pseudo-inverse operation, A = X_2 X_1^†.
The system matrix A is manipulated to ensure that the magnitudes of its eigenvalues are less than unity and, therefore, that the considered systems are stable.
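The least-squares identification and a possible stabilization step can be sketched as follows. The paper does not specify how A is manipulated to be stable, so scaling unstable eigenvalues back inside the unit circle is assumed here purely for illustration:

```python
import numpy as np

def identify_stable(X, margin=0.999):
    """Least-squares system matrix A = X2 X1^+ from a state trajectory
    X (r x m). Eigenvalues with magnitude >= 1 are scaled back inside
    the unit circle; this scaling rule is an assumption of this sketch,
    as the paper only states that A is manipulated to be stable."""
    X1, X2 = X[:, :-1], X[:, 1:]
    A = X2 @ np.linalg.pinv(X1)
    lam, V = np.linalg.eig(A)
    mask = np.abs(lam) >= 1.0
    if mask.any():
        lam[mask] = margin * lam[mask] / np.abs(lam[mask])
        # Conjugate pairs keep equal moduli, so the result is real
        # up to roundoff.
        A = np.real(V @ np.diag(lam) @ np.linalg.inv(V))
    return A
```

The returned matrix then has a spectral radius below unity, so the discrete-time system x_(k+1) = A x_k is stable.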
As discussed in Section 4.1, the comparisons are illustrated using the obtained objective function values. The sensor candidate set corresponds to the pixels obtained by the experimental visualization and, thus, its size is fixed at n = 9353. The state variable vector is set to have r = 10, . . . , 60 components as a result of the order reduction at different truncation thresholds, while p is fixed at 20. Moreover, results with respect to various p values are also provided, where p = 1, 2, . . . , 10, 20, . . . , 100 with r = 10 fixed. The original dataset, consisting of 10,000 snapshots, is divided into five subsets of m = 2000 consecutive snapshots each and, therefore, the following results are averages over these five subsets.
The values of the objective function show trends similar to Figure 3b,f for the gradient greedy algorithm, while a large degradation is observed in the convex relaxation methods for r > p. Surprisingly, the objective values of the convex relaxation methods are 2–100 times lower than those of the pure greedy method. This trend is also partially observed in the p = 1 case in Figure 5b, where the pure greedy selection corresponds, by definition, to the true optimum for this configuration. This clear degradation of the convex relaxations has not yet been explained in this study and may be due to the ill-conditioning of the low-order approximation or to numerical error. A similar issue was discussed in the previous study [66]. The cause should be investigated in detail for practical applications with real-world data in a future study. The results above, together with those in the previous subsection, illustrate that the pure greedy method shows a more stable performance, while the convex relaxation methods show wide variations in their performance. Therefore, the pure greedy method can be considered the most appropriate method, also taking the computation time into account. It should be noted that the present study also shows the prominent performance of the SDP-based method in terms of the objective function and the computation time in the randomly generated, general problem settings. As long as the additional computation time is acceptable, the present authors recommend trying both the pure greedy and the SDP-based methods and using the better sensor set when selecting sensors based on the observability Gramian.
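A quick diagnostic for the ill-conditioning suspected above is to inspect the extreme eigenvalues of the finite-horizon observability Gramian of a selected sensor set. The following sketch is hypothetical (the K-step horizon is our assumption); a near-zero smallest eigenvalue, e.g., for p < r, indicates that the log-determinant objective and its gradient are evaluated near a singular matrix:

```python
import numpy as np

def gramian_conditioning(A, C_S, K=50):
    """Finite-horizon observability Gramian of the selected sensor rows
    C_S and its extreme eigenvalues. A tiny smallest eigenvalue signals
    weak observability and a nearly singular log det objective."""
    r = A.shape[0]
    W, Ak = np.zeros((r, r)), np.eye(r)
    for _ in range(K):
        M = C_S @ Ak
        W += M.T @ M
        Ak = A @ Ak
    ev = np.linalg.eigvalsh(W)
    return ev[0], ev[-1]   # smallest and largest eigenvalues
```

For example, a single sensor (p = 1) in an r = 2 system leaves one direction entirely unobserved, giving a zero smallest eigenvalue, which is consistent with the unstable behavior reported for the r > p cases.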

Results for Data-Driven System Derived from Weather Observation
A data-driven linear model is obtained from time-series data of the global sea surface temperature (SST), as conducted in Section 4.2. Table 5 and Figure 6 briefly describe the processing conditions for the SST dataset. The SST distribution was interpolated from satellite observations composited with in situ corrections, and the weekly mean distribution was stacked into the data matrix as a time-series measurement spanning from 1989 to 2023. The data matrix consists of 1727 snapshots of 44,219 spatial components, which correspond to the numbers of columns and rows, respectively. The application of sensor selection to the SST dataset is a common benchmark, as found in [2,40,41,63,97,98].
The linear model was constructed in the same data-driven manner as explained in Section 4.2. With the dimensions of the system given by r = 10, n = 44,219, and p = 10, the evaluation was conducted as in the previous section by varying only r and p in turn, as r = 10, 20, . . . , 60 and p = 1, 2, . . . , 10, 20, . . . , 100, respectively. The whole duration was used for the construction of the reduced-order model in this example. The sensor selection results in Figure 7a,b support the observations of the real-world examples of the PIV measurements in Section 4.2, as the optimization measures of the SDP-based method and the approximate relaxation method similarly returned deteriorated results as compared with those of the pure greedy method at p ≤ r.

Conclusions
This research investigates efficient algorithms that give beneficial sensor positions based on the observability Gramian of a discrete-time linear dynamical system, which are essential to realizing effective state estimation and control. The selection methods were characterized using synthetic numerical examples and data-driven system representations of real-world datasets. An analysis of the increase in the computation time with respect to the system dimension parameters showed the effectiveness of each selection method, along with comparisons of their optimization performance.
This study offers two novel approaches for sensor selection: an approximate convex relaxation solved by a customized Newton method and a linear approximation of the pure greedy evaluation. A comparison was conducted against conventional methods, including a convex relaxation solved by semidefinite programming and the pure greedy method. The proposed gradient greedy method achieved moderate solutions with orders-of-magnitude speedups compared to the other methods when the dimension of the system's state variables did not exceed the number of deployed sensors. Meanwhile, this method was found to be incapable of ensuring observability for larger dimensions of the state variable. The convex relaxation methods, i.e., the SDP-based and the approximate convex relaxation methods, achieved better solutions than the greedy methods on the synthetic data, while the real-world examples revealed their unstable optimization results, especially for large-scale systems. In less observable situations, the pure greedy selection is supposed to be the most reliable choice in terms of the obtained optimization metric.
Thanks to the parser CVX [85,86], the SDP formulation defined by Equation (12) is easily implemented in MATLAB and solved using the MOSEK solver, both in [67] and in our case. The actual implementation is shown below, and the MATLAB codes are provided in [92].