Operator Smith Algorithm for Coupled Stein Equations from Jump Control Systems

Abstract: Consider a class of coupled Stein equations arising from jump control systems. An operator Smith algorithm is proposed for calculating the solution of the system. Convergence of the algorithm is established under certain conditions. For large-scale systems, the operator Smith algorithm is extended to a low-rank structured format, and the error of the algorithm is analyzed. Numerical experiments demonstrate that the operator Smith iteration outperforms existing linearly convergent iterative methods in terms of computation time and accuracy. The low-rank structured iterative format is highly effective in approximating the solutions of large-scale structured problems.


Introduction
Consider the discrete-time jump control systems given by

x_{j+1} = A(t_i) x_j + B(t_i) u_j,  y_j = C(t_i) x_j,  i, j = 1, …, m,

where A(t_i) ∈ R^{N×N}, B(t_i) ∈ R^{N×l_b}, and C(t_i) ∈ R^{l_c×N} with l_b, l_c ≪ N. Here, N represents the scale of the jump control system. Efficient control in the analysis and design of jump systems involves the associated observability Gramians W_{oi} = ∑_{k=0}^{∞} (A(t_i)^⊤)^k C(t_i)^⊤ C(t_i) A(t_i)^k and controllability Gramians W_{ci} = ∑_{k=0}^{∞} A(t_i)^k B(t_i) B(t_i)^⊤ (A(t_i)^⊤)^k [1], which are solutions of the corresponding coupled discrete-time Stein equations (CDSEs):

X_i = A_i^⊤ E_i(X) A_i + Q_i,  i = 1, …, m.  (1)

Here, for i = 1, …, m, A_i ∈ R^{N×N} is the input matrix, Q_i ∈ R^{N×N} is symmetric and positive semi-definite, and E_i(X) = ∑_{j=1}^{m} p_{ij} X_j ∈ R^{N×N} with probability values p_{ij} satisfying ∑_{j=1}^{m} p_{ij} = 1. Numerous methods, ranging from classical to state-of-the-art, have been developed over the past decades to address the single Stein equation (i.e., m = 1 in (1)), particularly for special matrix structures. For example, Betser et al. investigated solutions tailored to cases where the coefficient matrices are in companion form [2]. Hueso et al. devised a systolic algorithm for the triangular Stein equation [3]. Li et al. introduced an iterative method for handling large-scale Stein and Lyapunov equations with low-rank structures (the term "large-scale" refers to the dimension N of the corresponding equations being large) [4]. Fan et al. discussed generalized Lyapunov and Stein equations, deriving connections to rational Riccati equations [5]. Yu et al. scrutinized large-scale Stein equations featuring high-ranked structures [6].
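The Gramian series above can be checked numerically in the single-mode case (m = 1), where the coupled equations reduce to the single Stein equation W_o = A^⊤ W_o A + C^⊤ C. A minimal numpy sketch (all data random and illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small stable system: m = 1, so the coupled equations reduce to the
# single Stein equation W = A^T W A + C^T C.
N = 50
A = rng.standard_normal((N, N))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))  # rescale so rho(A) ~ 0.9 < 1
C = rng.standard_normal((1, N))

# Observability Gramian as the (truncated) series sum_k (A^T)^k C^T C A^k.
W = np.zeros((N, N))
term = C.T @ C
for _ in range(2000):
    W += term
    term = A.T @ term @ A

residual = np.linalg.norm(A.T @ W @ A + C.T @ C - W, np.inf)
print(residual)  # tiny: the summed Gramian solves the Stein equation
```

With ρ(A) < 1 the terms decay geometrically, so the truncated sum already satisfies the equation to roundoff accuracy.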
For CDSEs (1), the parallel iteration [7], essentially a stationary iteration for linear systems, is a commonly used method to compute the desired solution. This approach has been extended to an implicit sequential format [8], leveraging the latest information from the obtained solutions to accelerate the iteration of the left part. The gradient-based iterative algorithm, introduced for solving CDSEs, explicitly determines the optimal step size to achieve the maximum convergence rate [9]. By utilizing positive operator theory, two iterative algorithms were established for solving CDSEs [10], later extended to Itô stochastic systems. The continuous-time Lyapunov equations can be transformed into CDSEs via the Cayley transformation [11], although determining the optimal Cayley parameter remains a challenge. For other related iterative methods for continuous-time Lyapunov equations, consult [12][13][14][15][16] and the corresponding references.
CDSEs also arise from solving sub-problems of coupled discrete-time Riccati equations from optimized control systems. The difference method [17] and the CM method [18] were proposed to tackle the sub-problems of CDSEs. Ivanov [19] developed two Stein iterations that also exploit the latest information of previously derived solutions for acceleration. Notably, the iteration schemes provided in [8] are almost identical to these two Stein iterations [19], essentially corresponding to the Gauss-Jacobi and the Gauss-Seidel iterations applied to coupled matrix equations. Successive over-relaxation (SOR) iterations and their variants for CDSEs were explored in [20,21], though determining the optimal SOR parameter remains challenging. One limitation of the aforementioned methods is that they are linearly convergent, and the potential structures (such as low rank and sparsity) of the matrices are not fully exploited. To enhance the convergence rate of iterative methods, the Smith method was employed to solve the single Stein equation [22] and extended to structured large-scale problems [4,11]. This method offers the advantage of being parameter-free and converging quadratically to the desired symmetric solution. A similar idea was extended to some stochastic matrix equations in [5]. In this paper, we adapt the Smith method to an operator version to compute the solution of CDSEs and subsequently construct the corresponding low-rank format for large-scale problems. The main contributions include the following aspects:
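For context, the Smith method for the single Stein equation X = A^⊤ X A + Q doubles the partial sum of the series ∑_j (A^⊤)^j Q A^j at every step by squaring the coefficient matrix. A minimal sketch under the assumption ρ(A) < 1 (test data and names are ours, not from the paper):

```python
import numpy as np

def smith_single(A, Q, tol=1e-14, max_iter=30):
    """Smith iteration for X = A^T X A + Q with rho(A) < 1.

    X_{k+1} = X_k + A_k^T X_k A_k,  A_{k+1} = A_k @ A_k,
    which doubles the partial sum of sum_j (A^T)^j Q A^j at every
    step; hence the quadratic (doubling) convergence.
    """
    X, Ak = Q.copy(), A.copy()
    for _ in range(max_iter):
        step = Ak.T @ X @ Ak
        X += step
        Ak = Ak @ Ak
        if np.linalg.norm(step, np.inf) <= tol * np.linalg.norm(X, np.inf):
            break
    return X

rng = np.random.default_rng(1)
N = 60
A = rng.standard_normal((N, N))
A *= 0.8 / max(abs(np.linalg.eigvals(A)))  # make A d-stable
Qf = rng.standard_normal((N, 3))
Q = Qf @ Qf.T                              # symmetric positive semi-definite
X = smith_single(A, Q)
res = np.linalg.norm(A.T @ X @ A + Q - X, np.inf)
```

The squaring of A is what the OSA later lifts to the operator level for the coupled case.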

•
We introduce the operator defined as

F_i(X) = A_i^⊤ E_i(X) A_i,  i = 1, …, m.  (2)

This operator formulation enables us to adapt the Smith iteration [4,11,22] to an operator version, denoted the operator Smith algorithm (OSA). By doing so, the iteration maintains quadratic convergence for computing the symmetric solution of CDSEs (1). Our numerical experiments demonstrate that the OSA outperforms existing linearly convergent iterations in terms of both CPU time and accuracy.

•
To address large-scale problems, we organize the OSA in a low-rank format with two truncation and compression (TC) steps per iteration. One TC applies to the factor in the constructed operator (2), while the other applies to the factor in the approximated solution. This approach effectively reduces the column dimensions of the low-rank factors of the symmetric solutions.

•
We redesign the residual computation to suit large-scale computations. We incorporate practical examples from industry [23] to validate the feasibility and effectiveness of the presented low-rank OSA. This not only demonstrates its practical utility but also lays the groundwork for exploring various large-scale structured problems.
This paper is structured as follows. Section 2 outlines the iterative scheme of the OSA for CDSEs (1), along with a convergence analysis; comparative results on small-scale problems highlight the superior performance of the OSA compared to other linearly convergent iterations. Section 3 delves into the development of the low-rank OSA, providing details on the truncation and compression techniques, the residual computation, as well as complexity and error analyses. In Section 4, we present numerical experiments from industrial applications to illustrate the effectiveness of the introduced low-rank OSA in real-world scenarios.
Throughout this paper, I_m (or simply I) is the m × m identity matrix. For a matrix A ∈ R^{N×N}, ρ(A) is the spectral radius of A. Unless stated otherwise, the norm ∥·∥ is the ∞-norm of a matrix. For matrices A, B ∈ R^{N×N}, the direct sum A ⊕ B means the block diagonal matrix diag(A, B). For symmetric matrices A, B ∈ R^{N×N}, we say A > B if A − B is positive definite.

Iteration Scheme
The operator Smith algorithm (OSA) represents a generalization of the Smith iteration applied to a single matrix equation [4].
For each i = 1, …, m, the operator F in (2) at the k-th iteration is defined as in (3). With the initial X_{i,0} = Q_i, the OSA for CDSEs (1) is then given by (4).

Remark 1. By the definition of the operator F_{i,k}, it is not difficult to see that F_{i,k+1} doubles the former operator F_{i,k} for k = 1, 2, …. Specifically, applying the operator F_{i,k+1} to a matrix is equivalent to applying the former operator F_{i,k} twice to that matrix. To illustrate, consider m = 2 as an example.
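The doubling described in Remark 1 can be illustrated directly: starting from X_{i,0} = Q_i, each step adds F^{2^k} applied to the current iterate, where F is the base operator F_i(X) = A_i^⊤ E_i(X) A_i. The sketch below realizes F^{2^k} naively by composing the base operator 2^k times (the paper's recursion (3) doubles the operator itself and avoids this cost); all data are random and illustrative:

```python
import numpy as np

def base_op(Xs, As, P):
    """F_i(X) = A_i^T E_i(X) A_i with E_i(X) = sum_j p_ij X_j."""
    m = len(As)
    return [As[i].T @ sum(P[i, j] * Xs[j] for j in range(m)) @ As[i]
            for i in range(m)]

def osa(As, Qs, P, iters=6):
    """Operator Smith iteration X_{i,k+1} = X_{i,k} + F_{i,k}(X_{.,k}).

    F_{i,k} = F^{2^k} is applied here by composing the base operator
    2^k times, purely to illustrate the doubling behaviour; each step
    doubles the partial sum of the series sum_j F^j(Q).
    """
    Xs = [Q.copy() for Q in Qs]
    for k in range(iters):
        Fs = Xs
        for _ in range(2 ** k):          # apply F^{2^k} to X_k
            Fs = base_op(Fs, As, P)
        Xs = [X + F for X, F in zip(Xs, Fs)]
    return Xs

rng = np.random.default_rng(2)
m, N = 2, 40
As = []
for _ in range(m):
    A = rng.standard_normal((N, N))
    As.append(A * 0.7 / max(abs(np.linalg.eigvals(A))))  # d-stable
Qs = []
for _ in range(m):
    L = rng.standard_normal((N, 2))
    Qs.append(L @ L.T)
P = np.array([[0.3, 0.7], [0.6, 0.4]])   # rows sum to 1

Xs = osa(As, Qs, P)
res = max(np.linalg.norm(base_op(Xs, As, P)[i] + Qs[i] - Xs[i], np.inf)
          for i in range(m))
```

After k steps the iterate collects the first 2^k terms of the series, so the residual of X_i = F_i(X) + Q_i collapses to roundoff within a handful of iterations.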
The following proposition indicates the concrete form of X_{i,k} (i = 1, …, m) in the OSA (4).

Proposition 1. The k-th iteration X_{i,k} generated by (4) admits the representation (5) for i = 1, …, m.
Proof. We prove (5) by induction. It is obvious from the OSA that X_{i,1} has the claimed form; then, (5) holds for k = 1. Assume that (5) holds for k = l. Then, one obtains the case k = l + 1.

To obtain the convergence of the OSA, we further assume that all A_i, i = 1, …, m, are d-stable, i.e., ρ(A_i) < 1.
The following theorem establishes the convergence of the OSA.

Theorem 1. Let ρ := max_{1≤i≤m} ρ(A_i) and p := max_{1≤i,j≤m} p_{ij} be such that 2pρ² < 1. Then, the sequence {X_{i,k}} generated by (4) converges to the solution of the CDSEs when each A_i is d-stable. Moreover, one has the bound (6), where ∥Q∥ := max_{1≤i≤m} ∥Q_i∥.
Proof. It follows from Proposition 1 that the solution of the CDSEs has the form of (6) when the assumption holds. Subtracting (5) from (6) and taking norms yields the stated bound.

Remark 2. It is evident from Theorem 1 that the OSA admits a quadratic convergence rate when 2pρ² < 1. This highlights its superiority over the prevailing linearly convergent iterations [8,19,21,24] in both accuracy and CPU time, as elaborated in the next subsection.

Examples
In this subsection, we present several examples that highlight the superior performance of the OSA compared to linearly convergent iterations [8,19,21,24]. Notably, the iteration method outlined in [8] is identical to the one in [19], and other linearly convergent iterations exhibit similar numerical behaviors. Therefore, for accuracy and CPU time comparisons, we select the iteration method from [8,19], referred to as "FIX" in this subsection. It is important to note that the discrete-time Lyapunov equation in FIX was solved using the built-in function "dlyap" in Matlab 2019a [25].

Example 1. This example is a slight modification of the all-pass SISO system [26], in which the controllability and observability Gramians are quasi-inverse to each other, i.e., W_ci W_oi = σ_i I for some σ_i > 0. This property indicates that the system has a single Hankel singular value of multiplicity equal to the system's order. The derived system matrices are as follows: G_1 and G_2 are matrices with zero elements except for the last rows 0.1 g_1 and 0.3 g_2, respectively (both g_1 and g_2 are random row vectors with elements from the interval (0, 1)); Ā_1 and Ā_2 are both the tri-diagonal matrix tridiag(−1, 0, 1), but with Ā_1(1, 1) = −0.5 and Ā_2(1, 1) = −0.8. We consider m = 2 and select the probability matrix Π = [0.26, 0.74; 0.53, 0.47].
We evaluate the OSA and FIX for dimensions N = 400 and N = 800 and present their numerical behaviors in Table 1. Here, δt_k and t_k record the CPU time of the current iteration and of the accumulated iterations, respectively, and the Rel_Res column exhibits the relative residual of the CDSEs in each iteration. From Table 1, it is evident that the OSA achieves equation residuals of O(10⁻¹⁶) within five iterations for both dimensions, and the CPU time required for the OSA is significantly less than that required for FIX. Conversely, FIX maintains equation residuals at the level of O(10⁻¹³) even after 11 iterations. The symbol "*" in the table indicates that, despite continuing the iteration, FIX cannot further reduce the equation residuals so as to terminate. To visualize the residuals of each equation (here, m = 2) for the different iteration methods, we plot the history of the equation residuals in Figure 1, where "S-Res_i" and "F-Res_i" (i = 1, 2) represent the residuals of the equations obtained by the OSA and FIX, respectively. The figure illustrates that the OSA has quadratic convergence. Interestingly, although FIX converges rapidly for the second equation, it maintains linear convergence for the first, resulting in an overall linear convergence for FIX. We further consider modified system matrices in which G_1 and G_2 are matrices with zero elements except for the last rows 0.6 g_1 and 0.8 g_2, respectively. In this case, the spectral radii of the matrices A_1 and A_2 are 0.96 and 0.95, respectively. We rerun the OSA and FIX, and the obtained results are recorded in Figure 2. From the plot, it can be observed that for N = 400, the OSA requires nine iterations to achieve equation residuals at the O(10⁻¹⁵) level, with a total time of 10.17 s. Conversely, FIX, after consuming 51.31 s over 90 iterations, only achieves equation residuals at the scale of O(10⁻¹³).
Further numerical experiments demonstrate that even by increasing the number of iterations, FIX fails to further reduce the residual level. Similar numerical results are obtained for the scale N = 800.

Example 2. Consider a slight modification of a chemical reaction modeled by a convection reaction partial differential equation on the unit square [27], where the state x is a function of time (t), vertical position (v), and horizontal position (z). The boundaries of interest lie on a square with opposite corners at (0, 0) and (1, 1), and x(t, v, z) is zero on these boundaries. The PDE is discretized using centered difference approximations on a grid of n_v × n_z points, and the dimension N of A is the product n_v n_z, resulting in a sparse pattern for A. We take m = 2 and the probability matrix Π = [0.244, 0.756; 0.342, 0.658].
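The discretization step can be sketched as follows; since the excerpt does not specify the PDE coefficients, the diffusion, convection, and reaction terms below are illustrative placeholders, and only the centered-difference structure on an n_v × n_z grid with zero Dirichlet boundaries follows the text:

```python
import numpy as np
import scipy.sparse as sp

def convection_reaction_matrix(nv, nz, diff=1.0, conv=0.5, react=-0.1):
    """Centered-difference discretization on an nv x nz interior grid of
    the unit square with zero Dirichlet boundary conditions.

    The coefficients (diffusion, convection, reaction) are illustrative
    placeholders, not the paper's values. Returns the sparse N x N
    matrix with N = nv * nz.
    """
    h = 1.0 / (max(nv, nz) + 1)
    def d2(n):   # 1D second-difference operator
        return sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2
    def d1(n):   # 1D centered first-difference operator
        return sp.diags([-1.0, 1.0], [-1, 1], shape=(n, n)) / (2 * h)
    Iv, Iz = sp.identity(nv), sp.identity(nz)
    A = (diff * (sp.kron(Iz, d2(nv)) + sp.kron(d2(nz), Iv))  # Laplacian
         + conv * sp.kron(Iz, d1(nv))   # convection in v only (placeholder)
         + react * sp.identity(nv * nz))
    return A.tocsr()

A = convection_reaction_matrix(20, 25)
print(A.shape)  # (500, 500): N = n_v * n_z
```

The Kronecker-product assembly yields the banded sparsity pattern that the low-rank OSA of Section 3 exploits.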
Similarly, we applied the OSA and FIX iterations to CDSEs with dimensions N = 350 and N = 700, and the results are presented in Table 2. It is evident that the OSA achieves equation residuals of O(10⁻¹⁴) within five iterations, with the required CPU time approximately one-ninth of that needed for the FIX iterations. However, FIX only attains equation residuals of O(10⁻¹³) after 10 iterations. We also depict the residuals of the two iteration methods for their respective m equations (here m = 2) in Figure 3. The figure illustrates that the OSA exhibits quadratic convergence, while FIX demonstrates only linear convergence.

Structured Algorithm for Large-Scale Problems
In numerous practical scenarios [23], the matrix A_i is often sparse, and Q_i typically has a low-rank structure. Therefore, in this section, we adapt the OSA to a low-rank format well suited for large-scale computations.

Structured Iteration Scheme
Given the initial matrices X_{i,0}, i = 1, …, m, we show that the iteration (4) can be organized in the factored format (7), where the low-rank factors and the kernels K_{Q_{i,k}} take the forms given in the following proposition.
Proposition 2. Let L_{Q_i} ∈ R^{N×l_i} in X_{i,0} be the initial factor with l_i ≪ N. The sequences F_{i,k}(X_{•,k}) and X_{i,k} generated by (4) can be factorized as in (7), with the factors and kernels given in (8) and (9).

Proof. Given the initial matrix X_{i,0}, (7) holds for k = 0. Assume (7) is true for k = l. It then follows from the iteration scheme (4) that (7) holds for k = l + 1. The proof is complete.
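The factored form behind Proposition 2 rests on one observation: if each X_j = L_j K_j L_j^⊤, then F_i(X) is again of this form, with a stacked factor and a block-diagonal kernel. A small numpy check (function and variable names are ours, not the paper's):

```python
import numpy as np
from scipy.linalg import block_diag

def op_factor(i, As, P, Ls, Ks):
    """One factored application of F_i(X) = A_i^T E_i(X) A_i.

    With X_j = L_j K_j L_j^T, the result is again in factored form:
      F_i(X) = [A_i^T L_1, ..., A_i^T L_m]
               blkdiag(p_i1 K_1, ..., p_im K_m) [...]^T,
    so A_i^T only ever multiplies tall-skinny factors.
    """
    m = len(As)
    L_new = np.hstack([As[i].T @ Ls[j] for j in range(m)])
    K_new = block_diag(*[P[i, j] * Ks[j] for j in range(m)])
    return L_new, K_new

# Check against dense evaluation on small random data.
rng = np.random.default_rng(5)
m, N = 2, 30
As = [rng.standard_normal((N, N)) for _ in range(m)]
Ls = [rng.standard_normal((N, 2)) for _ in range(m)]
Ks = [np.eye(2) for _ in range(m)]
P = np.array([[0.5, 0.5], [0.2, 0.8]])
Lf, Kf = op_factor(0, As, P, Ls, Ks)
dense = As[0].T @ sum(P[0, j] * Ls[j] @ Ks[j] @ Ls[j].T
                      for j in range(m)) @ As[0]
err = np.linalg.norm(dense - Lf @ Kf @ Lf.T)
```

Note that the stacked factor gains columns at every application, which is exactly why the truncation and compression of the next subsection are needed.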

Truncation and Compression
It is evident that the number of columns of the operator factor L^F_{i,k} at the k-th iteration grows approximately as O(m 2^{k−1} l_i), where l_i is the initial column number of the factor L_{Q_i}. Consequently, we implement truncation and compression (TC) to reduce the column numbers of the low-rank factors [11,28]. Notably, our algorithm employs the TC technique twice within one iteration: once for the operator factor L^F_k and once for the solution factor L^Q_k (for simplicity of notation, we omit the subscript i). Recalling (8) and (9), we perform TC on L^F_k and L^Q_k using QR decompositions with column pivoting, as in (10), where P^F_k and P^Q_k are permutation matrices ensuring that the diagonal elements of the decomposed triangular matrices decrease in absolute value, u_{f0} and u_{q0} are constants, and τ is a small tolerance controlling the TC. Let m^f_k and m^q_k denote the respective column numbers of L^F_k and L^Q_k after truncation, bounded above by a given m_max ≪ N; the ranks of the factors are bounded accordingly. The truncated factors are still denoted by L^F_k and L^Q_k, and the compressed kernels K^F_k and K^Q_k are the abbreviated kernels in (8) and (9) (without the subscript i), updated accordingly.
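A TC step of this kind can be sketched with SciPy's pivoted QR; the function below truncates columns whose diagonal entries of R fall below a tolerance relative to |R(1,1)| and folds the discarded directions into the kernel, so that L K L^⊤ is preserved up to the tolerance (names and the reconstruction check are ours):

```python
import numpy as np
from scipy.linalg import qr

def truncate_compress(L, K, tau=1e-10, m_max=None):
    """Truncate/compress X = L K L^T using QR with column pivoting.

    L (N x c) is replaced by an orthonormal N x r factor with r <= c,
    and K by the correspondingly compressed r x r kernel, preserving
    L K L^T up to the tolerance tau.
    """
    Q, R, piv = qr(L, mode='economic', pivoting=True)  # L[:, piv] = Q @ R
    diag = np.abs(np.diag(R))
    r = int(np.sum(diag > tau * diag[0]))              # numerical rank
    if m_max is not None:
        r = min(r, m_max)
    inv = np.empty_like(piv)
    inv[piv] = np.arange(len(piv))                     # undo the pivoting
    Rt = R[:r][:, inv]                                 # L ~ Q[:, :r] @ Rt
    return Q[:, :r], Rt @ K @ Rt.T

# Rank-5 matrix hidden in a 40-column factor.
rng = np.random.default_rng(3)
L = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 40))
K = np.eye(40)
L2, K2 = truncate_compress(L, K)
err = np.linalg.norm(L @ K @ L.T - L2 @ K2 @ L2.T)
```

Here the 40-column factor collapses to 5 columns while the represented matrix is unchanged to roundoff, which is precisely the effect the TC steps have on L^F_k and L^Q_k.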

Computation of Residuals
Given the initial matrix X_{i,0}, the residual of the i-th equation can be written in the factored form R_{i,k} = L_{R_{i,k}} K_{R_{i,k}} L_{R_{i,k}}^⊤. Similarly, we also impose truncation and compression on L_{R_{i,k}}, k ≥ 0, i.e., we implement the QR decomposition with pivoting, where P_{R_{i,k}} is a pivoting matrix and u_{r0} is some constant.
After the TC, the corresponding kernel of the residual is denoted by K_{R_{i,k}}, and the terminating condition of the whole algorithm is chosen as max_i ∥K_{R_{i,k}}∥/∥K_{R_{i,0}}∥ ≤ ϵ, with ϵ being the tolerance.
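The benefit of keeping the residual in the factored form L_R K_R L_R^⊤ is that norms can be evaluated from small matrices only: with the Gram matrix G = L^⊤L ∈ R^{r×r}, one has ∥L K L^⊤∥²_F = tr(KGKG), so no N × N matrix is ever formed. A sketch (we use the Frobenius norm for illustration; the algorithm itself monitors the kernel norm ∥K_R∥):

```python
import numpy as np

def lowrank_fro_norm(L, K):
    """Frobenius norm of L @ K @ L.T without forming the N x N product.

    For symmetric K: ||L K L^T||_F^2 = tr(L K L^T L K L^T)
                                     = tr((K G)(K G)),  G = L^T L,
    so the cost is O(N r^2 + r^3) instead of O(N^2 r).
    """
    G = L.T @ L                      # r x r Gram matrix
    M = K @ G
    return np.sqrt(abs(np.trace(M @ M)))

rng = np.random.default_rng(4)
N, r = 500, 8
L = rng.standard_normal((N, r))
K = rng.standard_normal((r, r))
K = (K + K.T) / 2                    # symmetric kernel
dense = np.linalg.norm(L @ K @ L.T, 'fro')
fast = lowrank_fro_norm(L, K)
```

For N in the tens of thousands this is the difference between a cheap termination test and the dominant cost of an iteration.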

Large-Scale Algorithm and Complexity
The OSA with a low-rank structure equipped with TC is summarized in the following Algorithm 1 (OSA_lr).
To show the computational complexity of the algorithm OSA_lr, we assume that all matrices A_i for i = 1, …, m are sufficiently sparse. This allows us to treat the cost of both the product A_i B and the solution of the sparse linear system A_i X = B as O(cN) floating-point operations each. The flops and memory of the k-th iteration are summarized in Table 3 below.

Algorithm 1 (OSA_lr): solve large-scale CDSEs with low-rank Q_i.
Inputs: sparse matrices A_i and low-rank factors L_{Q_i} for i = 1, …, m; probability matrix Π ∈ R^{m×m}; truncation tolerance τ; upper bound m_max; iteration tolerance ϵ.
Outputs: low-rank factor L_{X_i} and kernel matrix K_{X_i} of the solution.
…
5. Construct the compressed factor and kernel as in (11) and (13).
6. Compute K_{Q_{i,k}} and L_{Q_{i,k}} as in (9).
7. Truncate and compress L_{Q_{i,k}} as in (10) with accuracy u_{q0} τ.
8. Construct the compressed low-rank factor L_{Q_{i,k}} and kernel K_{Q_{i,k}} as in (12) and (14).
9. Compute the residual kernel K_{R_{i,k}} and test the termination condition.
Table 3. Complexity and memory at the k-th iteration in the algorithm OSA_lr.

Items Flops Memory
Householder QR decomposition is used [29].

Error Analysis
In this subsection, we conduct the error analysis of OSA_lr. For i = 1, …, m, let δA_i and δX_{i,k} be the errors, as in (20), yielded by roundoff or iteration. Here, A_i and X_{i,k} denote the exact matrices, while the practical iteration matrices are their perturbed counterparts A_i + δA_i and X_{i,k} + δX_{i,k}. The following lemma indicates the error propagation of the operator.
Lemma 1. Given the errors δX_{i,k} and δA_i as in (20), the error of the operator admits the bound (21), where p = max_{i,j} p_{ij}.
Proof. Retaining only first-order terms in δA_i and δX_{i,k}, it follows from the definition of F_{i,l}(X_k) in (3) that the error terms in (21) and (22) have the corresponding upper bounds (mp)²(α²δX_k + 2q_kαδA) and (mp)³(α⁴δX_k + 4q_kα³δA), respectively. After multiplying on the left by (A_i + δA_i)^⊤ and on the right by A_i + δA_i in the outermost layer, the upper bounds of the errors become (mp)²(α⁴δX_k + 4q_kα³δA) and (mp)³(α⁶δX_k + 6q_kα⁵δA), respectively. Then, by induction, it is not difficult to obtain the final error bound.

We have the following error bound at the (k + 1)-th iteration.
Theorem 2. Given the errors δX_k and δA_i as in (20), the error at the (k + 1)-th iteration admits the bound (23), where m, p, α, and q_k are defined in Lemma 1 and τ is the TC error described in Section 3.2.
Proof. It follows from the iteration format (4) that the error at the (k + 1)-th iteration has the stated upper bound, where O(τ) represents the truncation and compression error in F_{i,k}(X_k) and X_{i,k}. Taking l = k in Lemma 1 shows that (23) holds.

Numerical Examples
In this section, we illustrate the effectiveness of OSA_lr in computing symmetric solutions of large-scale CDSEs (1) through practical examples [23,26,27,[30][31][32][33]. The algorithm OSA_lr was coded in MATLAB 2019a on a 64-bit PC running Windows 10, equipped with a 3.0 GHz Intel Core i5 processor (six cores, six threads), 32 GB RAM, and a machine unit roundoff of eps = 2.22 × 10⁻¹⁶. The maximum allowed column number of the low-rank factors in OSA_lr is bounded by m_max = 1000, and the tolerance for the TC of columns is set to τ = 10⁻¹⁶. In our experiments, we also tried N·eps as the TC tolerance τ but found no impact on the computation accuracy. The residuals of the equations are calculated as in (19) with a termination tolerance of ϵ = 10⁻¹³. It is noteworthy that we no longer compare with the linearly convergent iterative methods of Section 2, as the computational complexity of those algorithms per iteration is O(N³).
Example 3. We again employ the modification of the all-pass SISO system [26] from Section 2, but here with N = 12,000. The calculated results of OSA_lr are listed in Table 4, where the columns δt_k and t_k record the CPU time for each iteration and for the cumulative iterations, respectively. The Res_i and Rel_Res_i (i = 1, 2) columns provide the absolute residual ∥K_{R_{i,k}}∥ and the relative residual ∥K_{R_{i,k}}∥/∥K_{R_{i,0}}∥ computed by OSA_lr at each iteration, and the m^{Q_i}_k (i = 1, 2) columns indicate the column number of the low-rank factor L_{Q_{i,k}}.
From the table, it is evident that OSA_lr achieves the prescribed equation residual level within five iterations, and the residual history demonstrates the quadratic convergence of the algorithm. The column count m^{Q_i}_k of the low-rank factor L_{Q_{i,k}} more than doubles with each iteration, resulting in an exponential increase in the CPU time; particularly significant growth occurs during the third and fourth iterations. In the numerical experiments, we observed that this substantial increase in the CPU time lies primarily in the residual computation step, specifically Step 9 of OSA_lr. Hence, the efficient evaluation of the equation residual is a crucial consideration for large-scale computations. We also plot the residual history of OSA_lr in Figure 7, where R-Res_i (i = 1, 2) denotes the relative residual of the i-th equation.

Example 4. We continue to examine the CDSEs from Example 2 [27,30], but with larger scales N = 21,000, 28,000, and 35,000. The derived results of OSA_lr are presented in Table 5.
The symbols δt_k, t_k, Res_i, Rel_Res_i, and m^{Q_i}_k (i = 1, 2) are defined as in Example 3. In all experiments, the equation residuals (in log 10) reached the predetermined residual level by the sixth iteration. For equations of different dimensions, the Res_i (i = 1, 2) columns indicate that the algorithm OSA_lr converges nearly quadratically, except in the final two iterations. The m^{Q_i}_k column reveals that, in the fifth and sixth iterations, the column number of the factor L_{Q_{i,k}} increased by nearly five and six times, respectively, resulting in a substantial increase in the CPU time during the last two iterations. A further detailed analysis indicated that this increased time came primarily from the computation of the equation residuals in the final two steps. The performance of OSA_lr on the residual history with N = 35,000 is plotted in Figure 7, where R-Res_i (i = 1, 2) denotes the relative residual of the i-th equation.

Example 5. Consider the thermal convective flow control systems in [23,30,31]. These problems involve a flow region with a prescribed velocity profile, incorporating convective transport. Achieving solution accuracy with upwind finite element schemes typically requires a considerable number of elements for a physically meaningful simulation. In the illustrated scenario (see the left side of Figure 4), a 3D model of a chip is subjected to forced convection, utilizing the tetrahedral element type SOLID70 as described in [34]. Both the Dirichlet boundary conditions and the initial conditions are set to 0.
We consider the case in which the fluid speed is zero and the discretization matrices are symmetric. The system matrices are as follows: r_1 and r_2 are random numbers from (0, 1), and the matrices A ∈ R^{N×N}, B ∈ R^{N×1}, and C ∈ R^{1×N} (N = 20,082) can be found at [23]. Since A_1 and A_2 have almost the same sparse structure, we only plot A_1 on the right of Figure 4; the number of non-zero elements reaches 381,276. In the numerical experiments, we set m = 2 and use the probability matrix Π = [0.631, 0.369; 0.143, 0.857]. We ran OSA_lr for 10 different r_1 and r_2 and recorded the averaged CPU time, residuals of the CDSEs, and column dimensions of the low-rank factors in Table 6. The computational results in Table 6 reveal that OSA_lr requires only four iterations to achieve the predetermined equation residual accuracy. Moreover, the Res_i and Rel_Res_i columns indicate that OSA_lr exhibits quadratic convergence. The m^{Q_i}_k column demonstrates that the column count of the low-rank factor L_{Q_{i,k}} approximately doubles in the first three iterations but experiences a close to three-fold increase in the final iteration. In terms of the CPU time for each iteration, the time required for the final iteration is significantly greater than the sum of the preceding three. A detailed analysis indicates that the primary reason is similar to the previous examples: the computation of the equation residuals accounts for the majority of the time. The performance of OSA_lr on the residual history is plotted in Figure 7, where R-Res_i (i = 1, 2) denotes the relative residual of the i-th equation.

Example 6. Consider the structurally vertical stand model from machinery control systems, depicted on the left side of Figure 5, representing a segment of a machine tool [30]. In this structural component, a set of guide rails is situated on one of its surfaces. Throughout the machining process, a tool slide traverses various positions along these rails [32]. The model was created
and meshed using ANSYS.For spatial discretization, the finite element method with linear Lagrange elements was employed and implemented in FEniCS.

The derived system matrices are
where r_1 and r_2 are random numbers from (0, 1), and L_{Q_1}, L_{Q_2} ∈ R^{N×1} are vectors with all elements equal to zero except for five ones located in rows (3341, 6743, 8932, 11,324, 16,563) and rows (1046, 2436, 6467, 8423, 12,574), respectively. As the sparse structures of A_1 and A_2 are analogous, the right of Figure 5 only exhibits the structure of A_1, which contains 602,653 non-zero elements. The matrices A ∈ R^{16,626×16,626} and B ∈ R^{16,626×1} can be found at [23]. In this example, we set m = 2 and use the probability matrix Π = [0.564, 0.436; 0.785, 0.215]. We utilized OSA_lr to solve the CDSEs, and the computed results are presented in Table 7. It can be observed from the table that OSA_lr terminates after four iterations, achieving a high-precision solution with equation residuals at the level of O(10⁻¹⁵) to O(10⁻¹⁶). The iteration history in the Res_i and Rel_Res_i columns illustrates the quadratic convergence of OSA_lr. The m^{Q_i}_k column indicates that in the second, third, and fourth iterations, the column count of the low-rank factor L_{Q_{i,k}} approximately doubles, triples, and quadruples, respectively. This demonstrates that the truncation and compression techniques have a limited impact on reducing the column count of L_{Q_{i,k}} in this scenario. Similarly, δt_k reveals that the CPU time for the final iteration is significantly greater than the sum of the preceding three, primarily because the algorithm spends substantial time computing the equation residuals in the last iteration.

Example 7.
Consider a semi-discretized heat transfer problem aimed at optimizing the cooling of steel profiles in control systems, as discussed in [23,30]. The order of the models varies due to the application of different refinements to the computational mesh. For the discretization, the ALBERTA-1.2 fem-toolbox [33] and linear Lagrange elements are utilized; the initial mesh (depicted on the left in Figure 6) is generated using MATLAB's pdetool. We slightly modify the model matrices, taking r_1 = 0.009 and r_2 = 0.008 for N = 20,209, and r_1 = 0.0015 and r_2 = 0.0012 for N = 79,841. In this experiment, we take L_{Q_1} = r_3 C^⊤ and L_{Q_2} = r_4 C^⊤, with r_3 and r_4 being random numbers in (0, 1). The matrices A ∈ R^{N×N}, B ∈ R^{N×1}, and C ∈ R^{1×N} can be found at [23]. The sparse structures of the matrices A_1 and A_2 are nearly identical; for illustration, we only display the structure of A_1 with N = 79,841 on the right side of Figure 6. The probability matrix is Π = [0.713, 0.287; 0.584, 0.416].
We employed OSA_lr to solve the CDSEs under the two different dimensions, and the computed results are presented in Table 8. It is evident from the table that OSA_lr terminates after achieving the predetermined equation residual levels for the various dimensions. The Rel_Res_i column indicates a significant decrease in the relative equation residuals to the level of O(10⁻⁸) by the second iteration, enabling the algorithm to obtain a high-precision solution in only four iterations. The two columns of m^{Q_i}_k demonstrate that, for the different dimensions, the column count of the low-rank factor L_{Q_{i,k}} increases by a factor of two, indicating that the truncation and compression techniques effectively constrain its growth in this scenario. Similarly, the δt_k column reveals that in the final iteration, due to the computation of the equation residuals, OSA_lr consumes considerably more CPU time than the sum of the preceding three iterations. The performances of OSA_lr on the residual history with N = 20,209 and 79,841 are plotted in Figure 7, where R-Res_i (i = 1, 2) denotes the relative residual of the i-th equation.

Conclusions
This paper introduces the OSA for coupled Stein equations in a class of jump systems, and the convergence of the algorithm is established. For large-scale structured problems, the OSA is extended to a low-rank structured iterative format, and an error propagation analysis of the algorithm is conducted. Numerical experiments drawn from practical problems [23] indicate that in small-scale computations, the OSA outperforms existing linearly convergent iterative methods in terms of both CPU time and accuracy, while in large-scale computations, OSA_lr efficiently computes high-precision solutions of the CDSEs. Nevertheless, the experiments reveal that the time spent on the residual computation in the final iteration is relatively high. Therefore, improving the efficiency of the algorithm's termination criterion is a direction for future research and is currently under consideration.

Figure 1. Residual history in each equation for OSA and FIX.

Figure 3. Residual history in each equation for OSA and FIX in Example 2.

A_i X = B, which are both within the range of cN floating-point operations (flops), where B is an N × m_b matrix with m_b ≪ N and c is a constant. Additionally, the numbers of truncated columns of L^F_{i,k} and L_{Q_{i,k}} for all i = 1, …, m are denoted by m^f_k and m^q_k.

Figure 4. The 3D model of the chip and the discretized matrix A_1.

Figure 5. The vertical stand model and the discretized matrix A_1.

Figure 6. The initial mesh for the cooling of steel and the discretized matrices A_1 and A_2.

Table 1. History of CPU time and residual for OSA and FIX in Example 1.

Table 2. History of CPU time and residual for OSA and FIX in Example 2.


Table 7. CPU time and residual in Example 6.