A Novel, Direct Matrix Solver for Supersonic Boundary Element Method Systems

Goates, Cory; Hunsaker, Douglas

doi:10.3390/aerospace11121018


by Cory Goates * and Douglas Hunsaker

Mechanical and Aerospace Engineering, Utah State University, Logan, UT 84322, USA

* Author to whom correspondence should be addressed.

Aerospace 2024, 11(12), 1018; https://doi.org/10.3390/aerospace11121018

Submission received: 27 September 2024 / Revised: 27 November 2024 / Accepted: 5 December 2024 / Published: 11 December 2024

(This article belongs to the Special Issue Research and Development of Supersonic Aircraft)


Abstract

For problems with very fine surface meshes, typically the most time-consuming step of a boundary element method (BEM, also called a panel method) is solving the final linear system of equations. Many have already studied how to efficiently solve the dense, asymmetric systems which arise in elliptic BEMs. However, this has not been studied for a supersonic aerodynamic BEM, for which the governing PDE is hyperbolic. Due to this hyperbolic character, the matrix equation which arises from a supersonic BEM has a large number of identically zero elements. But the resulting linear system of equations is also not sparse in the standard sense. Hence, the efficient solution of the linear system of equations arising in a supersonic BEM is here considered. A novel sorting algorithm is developed whereby the non-zero elements may be arranged into a useful structure with minimal cost. A novel direct solution method is developed here based on fast Givens rotations and the QR decomposition. This novel solver leverages the unique matrix structure to solve the supersonic system of equations more quickly than traditional direct methods. This novel method is then compared to other direct and iterative matrix solvers and is shown to be more robust than iterative solvers and more efficient than other direct solvers, with a computational time complexity of approximately $O(N^{2.5})$.

Keywords:

boundary element methods; panel methods; supersonic; aerodynamics; matrix solvers; QR decomposition; Givens rotations

1. Introduction

Boundary element methods (BEM, also referred to as panel methods) arise from applying the method of Green’s functions to a given governing partial differential equation (PDE) and a set of boundary conditions [1]. BEMs are advantageous because the computational domain consists of only the boundary of the fluid or structural domain of interest. On this boundary, the strengths of certain singularities (sources, doublets, and/or vortices in fluid dynamics) are solved for that uniquely determine the solution to the governing PDE everywhere in the domain. Rather than treating the boundary as one continuous whole, boundary element methods solve a given problem by discretizing the domain boundary into discrete elements or panels. The singularity distributions across each panel are defined by certain parameters, which, when combined with appropriate boundary conditions, may be solved for using a linear system of equations. Solving this system is the focus of this work. For greater detail on other aspects of a BEM, particularly as applied to the problem of supersonic aerodynamics, see [2,3,4,5,6,7,8,9]. The paper by Erickson [2] is a particularly good introduction to panel methods. References [3,4] deal with the theory and implementation of the panel code PAN AIR and, while extremely dense, give good information. A more digestible treatment of compressible panel methods is given in [5]. The papers by Maruyama et al. [6], Youngren et al. [7], and Davis [8] describe other implementations of a compressible panel method.

When the governing PDE for a BEM is elliptic (such as with methods for electrostatics, linearized elastostatics, and subsonic aerodynamics), each panel exerts an influence everywhere in the computational domain. Because of this, the A matrix in the linear system of equations within an elliptic BEM has all non-zero coefficients. This is different from finite-volume or finite-element methods, where the linear system of equations is very sparse.

The efficient solution of the dense system of equations in an elliptic BEM has been investigated previously [10,11,12,13,14,15,16,17,18]. In 1987, Mullen and Rencis evaluated different iterative matrix solvers for a 2D Laplacian BEM [10]. For the computing power available to them at that time, Mullen and Rencis found that the iterative schemes they considered could not compete with direct Gauss elimination. However, shortly thereafter, in 1992, Mansur et al. showed that iterative methods including the bi-conjugate-gradient method performed faster than Gauss elimination [11]. That same year, Barra et al. demonstrated the usefulness of GMRES for solving dense BEM systems [12], and in 1996, Boschitsch et al. showed how the fast-multipole method coupled with a GMRES solver could be used to significantly reduce panel method computation times [13]. Willis et al. demonstrated the same in 2005 [14]. The usefulness of Krylov subspace methods, such as GMRES, for dense BEM systems was confirmed in 2007 by Xiao and Chen [15]. Further investigations into improved convergence for GMRES were presented in 2019 by Yao et al. [16] and in 2021 by Sun et al. [17]. Davey and Bounds in 1998 showed how a generalized SOR method could be effectively used for the dense system of equations in an elliptic BEM [18]. Thus, the efficient solution of the dense, asymmetric system of equations arising from an elliptic BEM has been and continues to be a topic of significant interest.

On the other hand, when the governing PDE is hyperbolic, each panel only exerts an influence within a limited region. The Prandtl–Glauert equation, which is the governing equation for linearized supersonic flow, is hyperbolic, meaning the influence of a given panel is limited to a downstream region known as the domain of influence [19]. Because of this limited influence, the resulting system matrix contains many elements that are identically zero. However, there may be many downstream control points within the domain of influence, and so the system of equations is still not sparse. To the knowledge of the authors, the efficient solution of such a neither dense nor sparse system of equations has not yet been considered in the available literature. One of the most well-known supersonic panel methods, PAN AIR, simply used LU decomposition to solve this system [4]. The matrix solvers used by other supersonic panel codes (such as MARCAP [6] or CPanel [8]) are not reported.

The purpose of this work is to assess various methods for solving the linear system of equations arising from a supersonic BEM. In particular, the structure of the linear system of equations is examined to facilitate the development of a solver specifically tailored to solving the supersonic problem. The supersonic BEM considered here is fully described in [5,20], and the source code is available at [21,22]. This BEM is implemented on an unstructured mesh of triangles with linear–doublet–constant–source panels. The traditional Morino formulation is used to determine the unknown source and doublet strengths [3,6,20].

Today, it is common to improve computational efficiency by parallelizing as many operations as possible. However, parallelization adds another dimension to assessing solver performance, and this extra dimension is outside the scope of the current work. Parallelization can improve the run time of a given algorithm by at most a factor equal to the number of parallel threads used. The total CPU time (number of threads multiplied by the time on each thread) is unchanged or even increased (due to communication overhead) by parallelization. Thus, parallelization has no effect on the order of complexity of the method. As the number of unknowns increases, a solution method with a low order of complexity will eventually run faster than a solution method with a high order of complexity, even if the latter method can be efficiently parallelized. For these reasons, all methods described here are implemented in serial, and only serial performance is compared. It is common practice in the technical literature to consider only serial implementation at first for a newly developed method (e.g., see [15,23,24]). Some discussion on the parallelization of each method will be given, but such is not the focus of this work.
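As a rough illustration of this argument, suppose (hypothetically) that an $O(N^{3})$ solver parallelizes perfectly over $p$ threads while an $O(N^{2.5})$ solver runs in serial. The cost constants $c_{1}$ and $c_{2}$ below are placeholders for illustration, not measured values:

```latex
T_{\text{par}}(N) = \frac{c_{1} N^{3}}{p}, \qquad
T_{\text{ser}}(N) = c_{2} N^{2.5}, \qquad
T_{\text{ser}} < T_{\text{par}} \iff N > \left( \frac{c_{2}\, p}{c_{1}} \right)^{2}
```

Under these assumptions, with $c_{1} = c_{2}$ and $p = 16$ threads, the serial lower-order method would already be faster once $N$ exceeds 256.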

This paper proceeds as follows. First, the structure of the linear system matrix is examined, and a method is presented for arranging the system to obtain better solver performance. Various methods for solving the linear system are then presented, including a novel method based on Givens rotations. The relative performance of each method is then assessed and discussed.

2. Linear System of Equations from the Supersonic BEM

While the focus of this work is on matrix solvers, a brief overview of supersonic panel methods is necessary in order to understand the structure of the system of equations being solved. The structure of the resultant linear system of equations will then be discussed. An algorithm for ensuring a favorable matrix structure will then be presented.

2.1. Supersonic Panel Methods

The supersonic panel method considered here finds solutions to the supersonic ($M_{\infty} > 1$) Prandtl–Glauert equation [19,20,25]

(1 - M_{\infty}^{2}) ϕ_{x x} + ϕ_{y y} + ϕ_{z z} = 0

(1)

where $M_{\infty}$ is the freestream Mach number, $ϕ$ is the perturbation velocity potential, and subscripts denote second partial derivatives with respect to the subscripted variables. The Prandtl–Glauert equation is the simplest equation governing three-dimensional flow that takes compressibility into account.

The Prandtl–Glauert equation is also linear, and so may be solved using the method of Green’s functions [26]. Applying Green’s third identity to Equation (1) results in the boundary integral equation for a point $r = (x, y, z)$ in the flow [25,27]

2 π ϕ(r) = \iint_{S} \left( G σ - μ \frac{\partial G}{\partial \tilde{n}} \right) d S

(2)

where the supersonic Green’s function, G, is given by

G = \begin{cases} \dfrac{- 1}{\sqrt{(x - ξ)^{2} + (1 - M_{\infty}^{2}) [(y - η)^{2} + (z - ζ)^{2}]}}, & ξ - x < 0 \text{ and } (x - ξ)^{2} + (1 - M_{\infty}^{2}) [(y - η)^{2} + (z - ζ)^{2}] > 0 \\ 0, & \text{otherwise} \end{cases}

(3)

and $σ$ and $μ$ represent surface distributions of supersonic sources and doublets, respectively, S is the surface of the configuration being analyzed, $(ξ, η, ζ)$ is the point of integration on S, and $\tilde{n}$ is the conormal vector to S [19]. Equation (2) states that, for any flow satisfying Equation (1), the velocity potential, $ϕ$, may be found as a function of the source and doublet strengths on the boundary of the flow, S. This ability to simulate a full three-dimensional flow field using only unknowns on the boundary is what makes panel methods very fast compared to other methods such as finite-volume CFD.
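The piecewise definition in Equation (3) can be sketched in a few lines of Python. This is only an illustration, assuming the freestream is aligned with the positive x-axis as in Equation (1); the function name `supersonic_green` and its argument layout are hypothetical and not taken from the authors' code.

```python
import math


def supersonic_green(field_point, source_point, mach):
    """Evaluate the supersonic Green's function of Equation (3).

    Returns 0.0 when the field point lies outside the domain of influence
    of the source point (assumes the freestream points along +x).
    """
    x, y, z = field_point
    xi, eta, zeta = source_point
    one_minus_m2 = 1.0 - mach * mach  # negative for supersonic flow
    radicand = (x - xi) ** 2 + one_minus_m2 * ((y - eta) ** 2 + (z - zeta) ** 2)
    if xi - x < 0.0 and radicand > 0.0:
        return -1.0 / math.sqrt(radicand)
    return 0.0


mach = math.sqrt(2.0)
origin = (0.0, 0.0, 0.0)
g_inside = supersonic_green((2.0, 1.0, 0.0), origin, mach)     # downstream, inside the Mach cone
g_upstream = supersonic_green((-2.0, 0.0, 0.0), origin, mach)  # upstream of the source
g_outside = supersonic_green((1.0, 2.0, 0.0), origin, mach)    # downstream, outside the Mach cone
```

Only the first evaluation is non-zero, which is the mechanism that later produces identically zero entries in the system matrix.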

However, for most flows, the correct source and doublet strengths are not known a priori. These may be determined by first discretizing Equation (2) and then applying appropriate boundary conditions. The surface, S, is divided into panels, each with a distribution of $σ$ and $μ$ governed by a finite set of unknown parameters. Then, for this work, the Morino boundary condition formulation is applied. Within the Morino formulation [3,20], the boundary conditions are

σ = - \hat{c} \cdot n

(4)

ϕ_{i} \equiv 0

(5)

where $\hat{c}$ is the direction of the freestream, $n$ is the local surface normal, and $ϕ_{i}$ is the perturbation potential inside the body being analyzed. Using Equation (4), the source strengths are first calculated based on the freestream and local panel geometry. Control points are then placed inside S, at which the total perturbation potential calculated using Equation (2) needs to satisfy Equation (5). For this work, the control points are placed just inside the surface from each vertex. The doublet strengths are also defined at each mesh vertex, resulting in a square system of equations. Writing Equation (2) for each control point in terms of the unknown doublet strengths results in a linear system of equations of the form

A x = b

(6)

where A is a square, asymmetric, indefinite matrix called the aerodynamic influence coefficient (AIC) matrix, x is a vector of doublet strengths, and b contains the negative of the potential induced by the surface sources at each control point [25]. The structure of the AIC matrix is such that the rows correspond to which control point is being influenced, and the columns correspond to which vertex is exerting an influence. This is shown diagrammatically in Figure 1. Solving this matrix equation will then give the correct surface doublet strengths, from which the entire flow field is specified. This may be performed using any convenient matrix solver; however, it will be shown here that the supersonic AIC matrix has a particular structure that can be exploited to quickly solve the system.

2.2. Structure of the Supersonic Matrix Equation

As may be seen from Equation (3), in supersonic flow, disturbances only propagate downstream (i.e., downstream panels do not influence upstream control points) [19]. Because of this, many elements in the AIC matrix will be zero. This is unlike a panel method for subsonic flow, where disturbances propagate throughout the entire flow field and the resulting AIC matrix is entirely filled-in. To illustrate this difference, the AIC matrices for an isolated wing in subsonic and supersonic flow are shown in Figure 2. While these two matrices share qualitative similarities, such as the clustering of high-magnitude coefficients near the diagonal, the difference between the two flow regimes is clear.

The ordering of the AIC matrices represented in Figure 2 came from the order of the vertices in the mesh file. Again, the unknown doublet strengths are defined at each vertex, and each control point is associated with a vertex. Because of this, there is no useful pattern to the non-zero elements in the supersonic AIC matrix (Figure 2b).

However, since it is known that the panels in the supersonic case exert no upstream influence, if the mesh vertices were arranged in order relative to the freestream direction, then most of the non-zero elements in the AIC matrix would appear in a single region of the matrix. This is evident in Figure 3, which shows the AIC matrix for a 10° cone in supersonic flow. The mesh file for this particular cone was generated such that the vertices were listed starting at the base (downstream) and ending at the tip of the cone (upstream). Recalling how the AIC matrix is formed (Figure 1), the first row of the AIC matrix (corresponding to the most-downstream control point) is mostly filled-in, since it is influenced by most of the vertices in the mesh. Moving down the rows of the AIC matrix (corresponding to moving upstream through control points), more and more elements on the left end of the row become zero. The zero elements are clustered at the left end of the row because those elements represent vertices that are further downstream (i.e., those vertices that influence fewer control points).

The AIC matrix shown in Figure 3 is almost upper-triangular. However, not all elements left of the diagonal are zero. This arises because a vertex may still have an influence on a control point upstream of itself if that vertex belongs to a panel which does influence that control point, because the doublet strength at that vertex helps determine the doublet distribution over the entire panel. However, despite not being upper-triangular, the AIC matrix has a definite structure where the non-zero elements are clustered towards the upper-right. One can imagine drawing a pentagon around these non-zero elements, such as shown in Figure 4. Because of this shape, such a matrix will be called upper-pentagonal. Just as an upper-triangular matrix has non-zero elements bounded by a triangle, an upper-pentagonal matrix has non-zero elements bounded by a pentagon.

In mathematical terms, an upper-pentagonal matrix is here defined as follows.

Definition 1.

A given matrix, A, having elements $A_{i j}$ is upper-pentagonal if

\exists B_{l} \geq 0 \ \text{s.t.} \ i > j + B_{l} \Rightarrow A_{i j} = 0 \quad \forall i, j

(7)

where the quantity $B_{l}$ is called the lower bandwidth.

This lower bandwidth denotes the lowest subdiagonal that contains non-zero elements. As defined here, an upper-triangular matrix is a special case of an upper-pentagonal matrix having $B_{l} = 0$, and an upper-Hessenberg matrix is one with $B_{l} = 1$. A matrix simply being upper-pentagonal is nothing special. Considering Definition 1, even a matrix with all non-zero elements may be considered upper-pentagonal with bandwidth $B_{l} = N - 1$, where N is the size of the matrix, assuming the matrix is square. As will be shown later, to efficiently solve the linear system of equations, it is best to have $B_{l} \ll N$. Thus, the goal is to obtain an upper-pentagonal matrix with small lower bandwidth.

2.3. Sorting Algorithm for Obtaining Upper-Pentagonal Form

An algorithm has been developed for rearranging the mesh vertices such that the AIC matrix is upper-pentagonal with small lower bandwidth. This algorithm may be applied to any geometry and is only a function of the vertex locations and freestream direction. The first step of this algorithm is to sort all vertices based on the negative of their downstream distance. If $r_{i}$ is the location of the i-th vertex, then the negative of the downstream distance, $x_{i}$, is given by

x_{i} = - U_{\infty} \cdot r_{i}

(8)

where $U_{\infty}$ is the freestream velocity vector.

The second step of the algorithm is to sort all vertices based on the negative of the downstream distance (same as Equation (8)) of their most-downstream neighboring vertex. This is because each neighbor of a given vertex will help determine the influence of that vertex through setting the distribution parameters over shared panels, even if the neighbor vertex itself falls outside of the domain of dependence. Since many vertices share downstream neighbors, this second sort is non-unique. Hence, a stable sorting algorithm (such as insertion sort) is required for the second sort so that the ordering from the first sorting pass is preserved whenever the second pass encounters a tie.

This method is described in Algorithm 1. For efficiency, instead of rearranging the mesh vertices in memory, the sorting algorithm is implemented to produce a permutation vector that gives the ordering of the rearranged vertices.

Algorithm 1 Method for sorting mesh vertices such that the AIC matrix will be upper-pentagonal

Input
    N  Number of mesh vertices
    $r_{i}$  Vertex locations
    $U_{\infty}$  Freestream velocity
Output
    P  Vector of vertex indices which will sort the AIC matrix (permutation vector)
$i \leftarrow 1$
while $i \leq N$ do ▹ Get the downstream distances of each vertex and store in x
    $x_{i} \leftarrow - \langle U_{\infty}, r_{i} \rangle$
    $i \leftarrow i + 1$
end while
$P_{1} \leftarrow$ Indices which will sort x
$i \leftarrow 1$
while $i \leq N$ do ▹ Get the downstream distances of the most-downstream neighbors
    $j \leftarrow P_{1_{i}}$
    $k \leftarrow$ Index of most-downstream neighbor of vertex j
    $x_{i} \leftarrow - \langle U_{\infty}, r_{k} \rangle$
    $i \leftarrow i + 1$
end while
$P_{2} \leftarrow$ Indices which will sort x using a stable sort
$i \leftarrow 1$
while $i \leq N$ do ▹ Combine the two permutations
    $j \leftarrow P_{2_{i}}$
    $k \leftarrow P_{1_{j}}$
    $P_{k} \leftarrow i$
    $i \leftarrow i + 1$
end while
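The two-pass sort of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `sort_vertices` and the adjacency-list format for `neighbors` are assumptions, every vertex is assumed to have at least one neighbor, and the final step composes the two permutations as `p1[p2[i]]`. Indices are zero-based here, whereas Algorithm 1 is one-based.

```python
def sort_vertices(vertices, freestream, neighbors):
    """Return perm, where perm[k] is the new position of original vertex k.

    vertices   : list of (x, y, z) tuples
    freestream : (ux, uy, uz) freestream velocity vector
    neighbors  : neighbors[k] lists the vertices sharing a panel with vertex k
                 (hypothetical adjacency format, not taken from the article)
    """
    n = len(vertices)
    # Negative of the downstream distance of every vertex, Equation (8).
    x = [-sum(u * r for u, r in zip(freestream, v)) for v in vertices]

    # First pass: order vertices from most-downstream to most-upstream.
    p1 = sorted(range(n), key=lambda k: x[k])

    # Second pass: key each vertex by its most-downstream neighbor.
    # Python's sorted() is stable, so ties keep the first-pass ordering.
    x2 = [min(x[k] for k in neighbors[j]) for j in p1]
    p2 = sorted(range(n), key=lambda i: x2[i])

    # Combine the two permutations.
    perm = [0] * n
    for new_pos in range(n):
        perm[p1[p2[new_pos]]] = new_pos
    return perm


# Four vertices on a line along the freestream direction, linked in a chain.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
neighbors = [[1], [0, 2], [1, 3], [2]]
perm = sort_vertices(vertices, (1.0, 0.0, 0.0), neighbors)
```

Using a stable built-in sort avoids hand-writing an insertion sort; any stable sorting routine would serve the same purpose in the second pass.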

The effect of this sorting algorithm is evident in Figure 5. Figure 5 represents the AIC matrix for the same mesh as Figure 2b, but with Algorithm 1 applied. After sorting is applied, the system matrix is upper-pentagonal and has a small bandwidth. Once a matrix is in upper-pentagonal form, its lower bandwidth may be calculated using Algorithm 2.

Algorithm 2 Method for determining the lower bandwidth of a given matrix A

Input
    A  System matrix (of dimension N × N)
Output
    $B_{l}$  Lower bandwidth
$B_{l} \leftarrow 0$
$i \leftarrow N$
while $i > 0$ do ▹ Loop through rows of the matrix starting at the bottom
    found_nonzero ← False
    $j \leftarrow 1$
    while $j < i$ and not found_nonzero do ▹ Loop through columns from the left to the diagonal
        if $|A_{i j}| > 10^{-12}$ then ▹ Check for a nonzero element at the current location
            found_nonzero ← True
        else
            $j \leftarrow j + 1$
        end if
    end while
    if found_nonzero then
        $B_{l} \leftarrow \max (B_{l}, i - j)$ ▹ Keep the largest lower bandwidth
    end if
    $i \leftarrow i - 1$
end while
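A compact Python sketch of Algorithm 2 is given below for a matrix stored as a list of rows. The function name `lower_bandwidth` and the `tol` parameter name are illustrative choices; the default tolerance mirrors the threshold used in Algorithm 2.

```python
def lower_bandwidth(a, tol=1e-12):
    """Return the lower bandwidth B_l of a square matrix a (list of rows)."""
    n = len(a)
    b_l = 0
    for i in range(n - 1, -1, -1):      # rows, starting at the bottom
        for j in range(i):              # columns left of the diagonal
            if abs(a[i][j]) > tol:      # leftmost non-zero entry in this row
                b_l = max(b_l, i - j)
                break
    return b_l


hessenberg = [[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [0.0, 9.0, 1.0, 2.0],
              [0.0, 0.0, 3.0, 4.0]]
triangular = [[1.0, 2.0], [0.0, 3.0]]
corner = [[1.0, 1.0, 1.0], [0.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
bandwidths = [lower_bandwidth(m) for m in (hessenberg, triangular, corner)]
```

The three test matrices correspond to the special cases named in the text: an upper-Hessenberg matrix, an upper-triangular matrix, and a matrix whose bottom-left corner entry forces the maximum bandwidth.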

3. Solution Methods

Having developed the upper-pentagonal matrix structure, the most efficient way to solve such a system of equations will now be investigated. First, a novel algorithm is developed that exploits the upper-pentagonal structure just discussed. Then, several other existing matrix solvers are described that may prove useful for solving the supersonic BEM system of equations. For this work, both direct and iterative solvers were considered.

3.1. Novel Method Using QR Decomposition

A well-known direct method for solving linear systems of equations is the QR decomposition, which factors a matrix A into the form

A = Q R

(9)

where Q is an orthogonal matrix (i.e., $Q^{T} = Q^{-1}$) and R is upper triangular [28]. Once A has been decomposed, the equation $A x = b$ may be solved quickly, as it may be written as

R x = Q^{T} b

(10)

Since R is upper-triangular, this system may be solved using back-substitution in $O(N^{2})$ time. In practice, the product $Q^{T} b$ is usually calculated directly instead of forming Q explicitly.
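For completeness, the step from Equation (9) to Equation (10), together with the standard back-substitution recurrence, can be written out as follows (the index $k$ is introduced here only for the summation):

```latex
A x = b
\;\Rightarrow\; Q R x = b
\;\Rightarrow\; Q^{T} Q R x = Q^{T} b
\;\Rightarrow\; R x = Q^{T} b,
\qquad
x_{i} = \frac{1}{R_{i i}} \left( (Q^{T} b)_{i} - \sum_{k = i + 1}^{N} R_{i k} x_{k} \right),
\quad i = N, N - 1, \dots, 1
```

Each $x_{i}$ costs $O(N)$ operations, which is where the $O(N^{2})$ total for back-substitution comes from.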

The QR factorization can be obtained using Givens rotations [28,29]. In this method, successive rotations are applied to A that zero out each of the elements of A below the diagonal, beginning at the bottom left and proceeding up each column successively. Thus, A is gradually replaced with R. At the same time, these rotations are applied to b, building up $Q^{T} b$ in place of b.

A single Givens rotation for zeroing out an element of A is given in Algorithm 3. For efficiently operating on an upper-pentagonal A matrix, each nonzero subdiagonal element is zeroed out with the diagonal element in the same column. This saves significant time over performing rotations between adjacent matrix elements, as performing rotations only between adjacent matrix elements may result in a large number of trivial swaps with elements that are already zero. As each element is zeroed out, the rotation is also applied to the rest of those two rows and the corresponding elements of b using Algorithm 4.

Algorithm 3 Method for generating a standard Givens rotation to zero out element $A_{i j}$ using element $A_{j j}$

Input
    $A_{j j}$  Matrix element used to zero out $A_{i j}$
    $A_{i j}$  Matrix element to be zeroed out
Output
    c  Rotation cosine
    s  Rotation sine
$d \leftarrow \sqrt{A_{j j}^{2} + A_{i j}^{2}}$
$c \leftarrow A_{j j} / d$
$s \leftarrow A_{i j} / d$
$A_{j j} \leftarrow d$
$A_{i j} \leftarrow 0$

Algorithm 4 Method for applying a Givens rotation to a pair of row vectors

Input
    x  Upper row vector
    y  Lower row vector
    c  Rotation cosine
    s  Rotation sine
$t \leftarrow c x + s y$
$y \leftarrow c y - s x$
$x \leftarrow t$

Using Givens rotations to reduce a dense matrix to upper-triangular form takes $O(N^{3})$ time, since there are $O(N^{2})$ elements below the diagonal to be eliminated and applying each rotation takes $O(N)$ time. However, R may be obtained faster if it is known which elements are already zero. For example, Givens rotations are often used to reduce a Hessenberg matrix (which has zero elements everywhere below the first subdiagonal) to an upper-triangular matrix [28,30,31]. Since there are only $O(N)$ nonzero elements below the diagonal of a Hessenberg matrix, the reduction to upper-triangular form using Givens rotations takes only $O(N^{2})$ time. For an upper-pentagonal matrix with lower bandwidth $B_{l} = 2$, reduction to upper-triangular form will take only $O(2 N^{2})$ time. Extrapolating this, any given upper-pentagonal matrix with lower bandwidth $B_{l}$ may be reduced to upper-triangular form in $O(B_{l} N^{2})$ time. If $B_{l} \ll N$, then such a method should be much faster than $O(N^{3})$ solvers such as LU decomposition.
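The $O(B_{l} N^{2})$ estimate can be made slightly more explicit with a rough operation count. In the sketch below, $n_{\text{rot}}$ is the number of rotations and $\kappa$ is a hypothetical constant for the floating-point work per pair of rotated entries; neither symbol appears in the original derivation.

```latex
n_{\text{rot}} \leq \sum_{j = 1}^{N} \min(B_{l}, N - j) \leq B_{l} N,
\qquad
\text{cost} \leq \sum_{j = 1}^{N} \min(B_{l}, N - j) \, \kappa \, (N - j + 1) \leq \kappa B_{l} N^{2}
```

The count relies on the observation that rotating row $j$ with a row $i \leq j + B_{l}$ only modifies columns $j$ through $N$, so no new non-zero elements are created below the band as the elimination proceeds.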

A method for solving an upper-pentagonal system (such as would arise from a supersonic panel method using Algorithm 1) using Givens rotations is described in Algorithm 5. This method is called the QR upper-pentagonal (QRUP) solver.

Algorithm 5 QRUP solver

Input
    N  System dimension (system is assumed square)
    A  System matrix
    b  RHS vector
Output
    x  Solution vector
$B_{l} \leftarrow$ output from lower bandwidth calculation
$j \leftarrow 1$
while $j \leq N$ do ▹ Loop through columns
    $i \leftarrow \min (j + B_{l}, N)$
    while $i > j$ do ▹ Loop up through the rows
        if $A_{i j} \neq 0$ then ▹ We make sure $A_{i j} \neq 0$ to save unnecessary computations
            $A_{j j}, A_{i j}, c, s \leftarrow$ Generate Givens rotation to zero out $A_{i j}$ using $A_{j j}$
            Apply Givens rotation to the rest of rows j and i of A using c and s
            Apply Givens rotation to rows j and i of b using c and s
        end if
        $i \leftarrow i - 1$
    end while
    $j \leftarrow j + 1$
end while
$x \leftarrow$ solution of $R x = Q^{T} b$ from back substitution
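Algorithms 3 through 5 can be combined into a short pure-Python sketch. This is an illustration under simplifying assumptions, not the authors' implementation: the matrix is a dense list of rows, the bandwidth is computed inline rather than by calling Algorithm 2, `math.hypot` stands in for the explicit square root of Algorithm 3, and the name `qrup_solve` is hypothetical.

```python
import math


def qrup_solve(a, b, tol=1e-12):
    """Solve a x = b for an upper-pentagonal matrix using Givens rotations."""
    n = len(a)
    r = [list(row) for row in a]   # working copy, reduced to R in place
    qtb = list(b)                  # working copy, becomes Q^T b

    # Lower bandwidth: largest i - j over the non-zero subdiagonal elements.
    b_l = max((i - j for i in range(n) for j in range(i)
               if abs(r[i][j]) > tol), default=0)

    for j in range(n):                               # loop through columns
        for i in range(min(j + b_l, n - 1), j, -1):  # loop up through the rows
            if r[i][j] != 0.0:
                # Generate the rotation (Algorithm 3).
                d = math.hypot(r[j][j], r[i][j])
                c, s = r[j][j] / d, r[i][j] / d
                r[j][j], r[i][j] = d, 0.0
                # Apply it to the rest of rows j and i (Algorithm 4).
                for k in range(j + 1, n):
                    upper, lower = r[j][k], r[i][k]
                    r[j][k] = c * upper + s * lower
                    r[i][k] = c * lower - s * upper
                upper, lower = qtb[j], qtb[i]
                qtb[j] = c * upper + s * lower
                qtb[i] = c * lower - s * upper

    # Back substitution on R x = Q^T b.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(r[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (qtb[i] - tail) / r[i][i]
    return x


a = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 1.0],
     [0.0, 2.0, 5.0]]
b = [12.0, 10.0, 19.0]   # chosen so that the exact solution is [1, 2, 3]
x = qrup_solve(a, b)
```

The example matrix has lower bandwidth 1, so only two rotations are generated. The sketch performs no check for a zero diagonal in R, so a rank-deficient matrix would need the explicit handling discussed in the text.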

One disadvantage to standard Givens rotations is that each rotation requires computing a square root, which can be computationally expensive [28]. However, fast Givens rotations exist that do not compute square roots [30,32,33,34]. These fast rotations avoid square roots by factoring the matrix A into the form

A = D Y

(11)

where D is a diagonal matrix. The squares of the denominators in the standard Givens rotations are stored in D. The fast Givens rotations as originally formulated in [30] are impractical, as D and Y must be monitored and periodically normalized to prevent overflow, negating the intended speed benefit [32]. However, Anda and Park presented an alternative formulation [33,34] that dynamically scales D to prevent overflow. The specific form of the rotation is dependent upon the relative magnitudes of the elements being rotated ($A_{j j}$ and $A_{i j}$) and the diagonal elements in the corresponding rows ($D_{j}$ and $D_{i}$). The formulas for these fast rotations are given in Table 1. For all four rotation types, the final step of applying the rotation is setting $A_{i j}$ to zero.

In order to use Anda and Park’s [34] algorithm for the solution of a standard system of equations, the matrix equation must be written as

A x = b \to D Y x = D c

(12)

In the fast Givens algorithm, D is initialized to the identity matrix, which means, initially, $c = b$ and $Y = A$. Since D contains only a non-zero diagonal, it is most efficiently stored as a vector.

At each step of the algorithm, a fast Givens rotation is generated that zeros out an element of Y and updates two elements of D. This rotation is then applied to the other columns of Y and the corresponding rows of c (see Algorithm 6). Once the triangularization is complete, D may be factored out of both sides of Equation (12), resulting in

Y x = c

(13)

which may be solved in $O(N^{2})$ time using back-substitution, since Y is upper-triangular.

A method for solving an upper-pentagonal system using fast Givens rotations is described in Algorithm 7. This is called the fast QRUP (FQRUP) solver. Similar to the QRUP solver, the FQRUP solver reduces an upper-pentagonal matrix to upper-triangular form in $O(B_{l} N^{2})$ time, but without calculating square roots.

It may be noted here that the QRUP and FQRUP solvers possess another desirable trait beyond reduced order of complexity. Since these solvers are based on QR decomposition, they are rank-revealing [28]. This means that if the AIC matrix is rank-deficient, then the bottom-right element of the R matrix will be zero (to numerical precision). This zero element can be checked for and explicitly handled based on the desired behavior. Such explicit handling of a rank-deficient matrix system would be useful for a purely Neumann-based panel method using doublets, for which the AIC matrix is rank-deficient. This would require minimal modification to the algorithms described above.

Algorithm 6 Method for applying a fast Givens rotation

Input
    x  Upper row vector
    y  Lower row vector
    $α$  Rotation factor
    $β$  Rotation factor
    $rt$  Rotation type
if $rt = 1$ then
    $x \leftarrow x + β y$
    $y \leftarrow - α x + y$
end if
if $rt = 2$ then
    $y \leftarrow - α x + y$
    $x \leftarrow x + β y$
end if
if $rt = 3$ then
    $t \leftarrow y$
    $y \leftarrow - x + α y$
    $x \leftarrow t - β y$
end if
if $rt = 4$ then
    $t \leftarrow x$
    $x \leftarrow β x + y$
    $y \leftarrow α x - t$
end if

Algorithm 7 FQRUP solver

Input
    N  System dimension (system is assumed square)
    A  System matrix
    b  RHS vector
Output
    x  Solution vector
$B_{l} \leftarrow$ output from lower bandwidth calculation
$j \leftarrow 1$
$D \leftarrow 1$ ▹ D is a length-N vector of diagonal elements
while $j \leq N$ do ▹ Loop through columns
    $i \leftarrow \min (j + B_{l}, N)$
    while $i > j$ do ▹ Loop up through the rows
        if $A_{i j} \neq 0$ then ▹ We make sure $A_{i j} \neq 0$ to save unnecessary computations
            $A_{j j}, A_{i j}, α, β, D_{j}, D_{i}, rt \leftarrow$ Generate fast Givens rotation to zero out $A_{i j}$ using $A_{j j}$
            Apply fast Givens rotation of type $rt$ to the rest of rows j and i of A using $α$ and $β$
            Apply fast Givens rotation of type $rt$ to rows j and i of b using $α$ and $β$
        end if
        $i \leftarrow i - 1$
    end while
    $j \leftarrow j + 1$
end while
$x \leftarrow$ Solution of $Y x = c$ from back substitution

3.2. Existing Methods for Comparison

To assess the performance of the QRUP and FQRUP solvers just developed, they will be compared against four existing matrix solvers, briefly described below.

3.2.1. LU Decomposition

One of the most basic matrix solvers is LU decomposition with partial pivoting. The complexity of this method is $O(N^{3})$, making it computationally expensive for large meshes. However, it is robust and theoretically exact for well-conditioned systems. It is also the solver that has typically been used for supersonic panel methods (e.g., see [4,35]). The implementation of LU decomposition used here was taken from [36].

3.2.2. BSSOR and BJAC Solvers

Two basic iterative solvers considered here are the block symmetric successive overrelaxation (BSSOR) [37] and block Jacobi (BJAC) methods [38]. These methods are attractive mainly for their simplicity. They have also been shown previously to work well for the dense, asymmetric systems arising from elliptic BEMs (e.g., see [10,11,18]).

In this study, the AIC matrix is asymmetric and indefinite, for which point successive overrelaxation and Jacobi iterations are not guaranteed to converge. To alleviate this, both are relaxed, and the block-iterative versions are used rather than the traditional, point-iterative versions. To save time, the blocks along the diagonal are decomposed (using LU decomposition) before the iterations begin.
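A relaxed block Jacobi iteration of the kind described above can be sketched as follows. This is a simplified illustration, not the implementation used in the study: the diagonal blocks are re-solved in every iteration with a small Gaussian-elimination helper instead of being LU-decomposed once up front, and the function names, stopping test, and default tolerance are assumptions. The default relaxation factor of 0.8 matches the value reported for the comparison cases.

```python
def solve_dense(m, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def block_jacobi(a, b, block_size, omega=0.8, tol=1e-12, max_iter=500):
    """Relaxed block Jacobi: x <- x + omega * D_block^{-1} (b - a x)."""
    n = len(a)
    x = [0.0] * n
    for iteration in range(max_iter):
        # The residual is evaluated once per sweep, so every block sees the old x.
        residual = [b[i] - sum(a[i][k] * x[k] for k in range(n)) for i in range(n)]
        if max(abs(v) for v in residual) < tol:
            return x, iteration
        for start in range(0, n, block_size):
            stop = min(start + block_size, n)
            block = [row[start:stop] for row in a[start:stop]]
            dx = solve_dense(block, residual[start:stop])
            for offset, value in enumerate(dx):
                x[start + offset] += omega * value
    return x, max_iter


a = [[10.0, 1.0, 0.0, 1.0],
     [1.0, 10.0, 1.0, 0.0],
     [0.0, 1.0, 10.0, 1.0],
     [1.0, 0.0, 1.0, 10.0]]
b = [16.0, 24.0, 36.0, 44.0]   # chosen so that the exact solution is [1, 2, 3, 4]
x, iterations = block_jacobi(a, b, block_size=2)
```

The example matrix is strongly diagonally dominant, so the iteration converges quickly; for an asymmetric, indefinite AIC matrix, convergence is not guaranteed, which is the concern raised in the text.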

3.2.3. GMRES

An alternative iterative method is the generalized minimum residual (GMRES) algorithm [31]. This method has become widely used for modern panel methods (and boundary element methods in general) due to its speed and ability to handle non-symmetric and non-definite matrices (e.g., see [12,13,14,15,16,17,39]). GMRES is a Krylov subspace method (a family of methods based on the Cayley–Hamilton theorem [28,40]), meaning it iteratively builds an approximation to the solution $x$ in the space

$K = \mathrm{span}\{r_{0}, A r_{0}, A^{2} r_{0}, \dots\}$ (14)

where $r_{0} = A x_{0} - b$ for some initial guess $x_{0}$. Each iteration of the GMRES algorithm involves adding another dimension to $K$, updating the orthogonal basis for $K$ (called the Arnoldi update), and then determining the optimal solution in the expanded basis. One of the disadvantages of GMRES is that its memory and computation requirements steadily grow with each iteration [12]. However, in exact arithmetic it is guaranteed to converge in at most $N$ iterations for an $N \times N$ system, though convergence is typically much faster, making GMRES close to $O(N^{2})$ [15]. The GMRES algorithm used here comes from [31].
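The steps of a GMRES iteration outlined above (expand the Krylov space, perform the Arnoldi update, then solve a small least-squares problem) can be sketched in plain Python as follows. This is an unrestarted, unpreconditioned illustration and not the algorithm from [31] as implemented for this study. The names `matvec` and `gmres` are hypothetical, and the sketch uses the residual $b - A x_{0}$, which differs from $r_{0}$ above only in sign and therefore spans the same space.

```python
from math import hypot, sqrt


def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]


def gmres(A, b, x0=None, tol=1e-12, max_iter=None):
    """Unrestarted GMRES sketch; returns the approximate solution."""
    n = len(b)
    max_iter = max_iter or n
    x0 = x0 or [0.0] * n
    r0 = [bi - ai for bi, ai in zip(b, matvec(A, x0))]
    beta = sqrt(sum(r * r for r in r0))
    if beta < tol:
        return x0
    V = [[r / beta for r in r0]]   # orthonormal basis of the Krylov space
    R = []                         # columns of the triangularized Hessenberg matrix
    cs, sn = [], []                # stored Givens rotations
    g = [beta]                     # rotated right-hand side; |g[-1]| is the residual norm
    for k in range(max_iter):
        # Arnoldi update: orthogonalize A v_k against the existing basis.
        w = matvec(A, V[k])
        h = []
        for v in V:
            hij = sum(wi * vi for wi, vi in zip(w, v))
            h.append(hij)
            w = [wi - hij * vi for wi, vi in zip(w, v)]
        hk1 = sqrt(sum(wi * wi for wi in w))
        h.append(hk1)
        # Apply the earlier rotations to the new Hessenberg column.
        for i in range(k):
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        # New rotation that zeros the subdiagonal entry.
        d = hypot(h[k], h[k + 1])
        c, s = h[k] / d, h[k + 1] / d
        cs.append(c)
        sn.append(s)
        h[k], h[k + 1] = d, 0.0
        g.append(-s * g[k])
        g[k] = c * g[k]
        R.append(h)
        if abs(g[k + 1]) < tol:
            break
        V.append([wi / hk1 for wi in w])
    # Back substitution for the coefficients in the Krylov basis.
    m = len(R)
    y = [0.0] * m
    for i in range(m - 1, -1, -1):
        y[i] = (g[i] - sum(R[j][i] * y[j] for j in range(i + 1, m))) / R[i][i]
    return [x0[i] + sum(y[j] * V[j][i] for j in range(m)) for i in range(n)]
```

The lists `V` and `R` gain one entry per iteration, which is the growth in memory and work per iteration noted above.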

3.3. Comparison Cases

The relative performance of each of the above solvers was assessed by using them to solve the system of equations arising from the supersonic BEM described previously. Three configurations were considered: a circular cone with a 10° half angle, a straight, double-wedge wing with a 5° half angle, and a wing–body configuration with engine nacelles. These configurations are shown in Figure 6 and Figure 7. These different configurations allow for exploring the effect of variations in the matrix structure on solver performance. For each case, three different mesh densities were analyzed to estimate the computational complexity of each solver. The resultant system dimension for each mesh refinement level, along with the freestream Mach number, is given in Table 2 for each configuration.

Each solver was tested with and without sorting the system first using Algorithm 1. While required for the QRUP and FQRUP solvers, initial tests revealed that the sorting algorithm developed here also improved the performance of some iterative methods. Each solver was also tested with and without diagonal preconditioning, as described in [12].
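For readers unfamiliar with diagonal preconditioning, one common form is sketched below: each equation is divided by its own diagonal coefficient so that the scaled matrix has a unit diagonal. Whether this matches the exact scheme of [12] is an assumption, and the function name `diagonal_precondition` is hypothetical.

```python
def diagonal_precondition(A, b):
    """Scale each row of A x = b by the reciprocal of its diagonal entry.

    Assumes every diagonal entry is non-zero. The solution x is unchanged.
    """
    A_scaled = [[a / row[i] for a in row] for i, row in enumerate(A)]
    b_scaled = [bi / A[i][i] for i, bi in enumerate(b)]
    return A_scaled, b_scaled
```

Because only rows are scaled, the unknown vector is the same before and after, so no back-transformation of the solution is needed in this form.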

For both the BJAC and BSSOR solvers, a relaxation factor of 0.8 was used. The block size for these was $N / 5$, meaning the block size changed with each configuration and mesh refinement. Initial tests showed that this block size provided a good balance between the time it took to calculate the block decompositions and the time it took to run the iterations. All iterative solvers had a termination tolerance of $10^{-12}$, and a maximum of 1000 iterations was allowed.

For each solver, the total time required to sort the linear system (or not), apply the diagonal preconditioner (or not), and solve the system was recorded and then averaged over five repeated tests. The norm of the final residual vector was recorded (again averaged) to assess the accuracy of each solver. All tests were run on an OnLogic (South Burlington, VT, USA) K801 workstation with an Intel i9 24-core processor and 64 GB of RAM.

4. Results

In this section, the results from the six solvers for each of the three test cases are presented. The computational orders of complexity of the solvers are then discussed.

4.1. Cone

The first test case was a circular cone with a 10° half angle. The slender shape of the cone makes it such that the upper-pentagonal AIC matrix is very filled-in (see Figure 3) and has a small lower bandwidth. This is because the domain of influence for any given panel will encompass much of the rest of the cone downstream of it, particularly for panels closer to the tip.

Figure 8 shows the matrix solver run times for the 10° half-angle cone. In this case, the QRUP and FQRUP solvers ran the fastest out of all solvers, with GMRES having comparable run times for the fine mesh. Sorting the system significantly affected the run times only for the QRUP and FQRUP solvers, for which the run times were significantly reduced by sorting, as would be expected. Diagonal preconditioning had no noticeable effect on the run time of any solver.

Figure 9 shows the solution residual norms for the different matrix solvers. With the sorting algorithm applied and using diagonal preconditioning, all solvers produced a residual norm of less than $10^{-12}$. Without sorting, the QRUP and FQRUP solvers produced unacceptably high residuals. This may be because the many operations required to reduce a non-upper-pentagonal matrix to upper-triangular form resulted in a non-negligible buildup of numerical error, something not seen when relatively few operations were used on the upper-pentagonal system. Preconditioning improved the final accuracy of QRUP, but not FQRUP. This may be due to how the FQRUP solver stores the matrix diagonal separately.

It is useful to observe how the residual error decreases with each iteration of the iterative solvers. This is shown in Figure 10 for the cases where diagonal preconditioning was applied and the system was sorted. Convergence was very smooth for all solvers, indicating a well-conditioned system.

4.2. Double-Wedge Wing

The second test case was a straight, double-wedge wing with a 5° wedge half angle. The relatively high aspect ratio of the double-wedge wing makes the upper-pentagonal AIC matrix rather sparse (see Figure 5). This is because any given panel will have relatively few panels within its domain of influence, even if that panel is near the leading edge.

Figure 11 shows the solver run times for the double-wedge wing. For this case, the GMRES solver was the fastest of all the solvers considered. The QRUP and FQRUP solvers still outperformed the LU, BJAC, and BSSOR solvers. As before, sorting the system of equations significantly improved the speed of the QRUP and FQRUP solvers. In addition, sorting the system also significantly improved the run times for the BJAC and BSSOR solvers. It is evident in Figure 2b that there are large blocks of nonzero elements far from the diagonal of the unsorted AIC matrix. Thus, sorting the system likely improved diagonal dominance, which helps convergence of these methods [38].

Figure 12 shows the final residuals for the various solvers. In this case, the final residuals produced by the QRUP and FQRUP solvers were relatively high, even with the system sorted. Without sorting the system, the QRUP and FQRUP residuals are unacceptably high. The QRUP solver was again helped by diagonal preconditioning, but all other solvers appeared unaffected. It can be noted that, without sorting, the BJAC and BSSOR solvers failed to converge, meaning the run times for these solvers reported in Figure 11 are artificially low. This indicates the improvement to these algorithms provided by the upper-pentagonal sorting algorithm is even greater than seen in Figure 11.

Figure 13 shows the iterative residual histories for the cases where diagonal preconditioning was applied and the system was sorted. As with the cone case, convergence was very smooth. Interestingly, the BJAC and BSSOR solvers converged in fewer iterations for the medium and fine meshes than for the coarse mesh.

4.3. Wing–Body–Nacelle Combination

The final test case considered was a wing–body–nacelle combination. The resultant upper-pentagonal AIC matrix for the wing–body–nacelle combination is shown in Figure 14. Its structure is somewhere in between that of the AIC matrices for the wing and cone. The upper-pentagonal portion is denser than that of the wing but less dense than that of the cone. The AIC matrix for this case also has a large lower bandwidth, which serves to test the efficiency of the QRUP and FQRUP solvers for non-optimal cases. In addition, the nacelle on the configuration has an extremely sharp trailing edge. Because of this, the control points placed on either side of the trailing edge are very close together, making the AIC matrix more poorly conditioned than in the other cases (for how these control points are placed, see [20]). For this case, the matrix condition number is typically on the order of 56,000 (as calculated using the singular value decomposition), whereas it was much lower for the two other cases. This gives an opportunity to test the robustness of each solver.

Figure 15 shows the solver run times for the wing–body–nacelle configuration. For this case, the QRUP and FQRUP solvers were the fastest out of all solvers for the sorted system. The BJAC and BSSOR solvers failed to fully converge for the coarse- and medium-density meshes with the system sorted, and so the timing results for these cannot be considered.

The final residuals for the wing–body–nacelle configuration are shown in Figure 16. It is interesting to note that, even for the sorted system, GMRES failed to converge fully for the fine mesh. The BJAC and BSSOR solvers also struggled to converge, consistent with the relatively high condition number of the AIC matrix. On the other hand, the FQRUP solver consistently produced residuals less than $10^{-12}$. The QRUP solver did not do as well, though it did produce residuals consistently less than $10^{-10}$.

Figure 17 shows the residual history for the iterative solvers with diagonal preconditioning and the system sorted. Both BJAC and BSSOR converged relatively quickly on the fine mesh. For the other meshes, both of these solvers failed to converge. For the medium mesh, both solvers diverged. The GMRES algorithm converged more slowly as the mesh became more refined, reflecting what is shown in Figure 16.

4.4. Solver Time Complexities

The time complexity of a method is often used to predict its performance for general cases. Time complexity is estimated by fitting a power law to the execution time as a function of the size of the linear system. This was performed here for each solver with the system sorted, as this typically produced the best results. The time complexities averaged across the three test cases considered here are shown in Figure 18. Note that the results from the BJAC and BSSOR solvers on the wing–body–nacelle configuration are not included in this analysis, as those solvers did not converge at every mesh refinement level in that case. The GMRES solver was considered sufficiently converged. Not surprisingly, the LU solver showed the least variation in complexity between the different configurations, as well as the highest complexity in general. Diagonal preconditioning evidently had little to no effect on the order of complexity of any solver. In all cases, the iterative solvers had lower time complexity than the direct solvers, though the QRUP and FQRUP solvers still showed a significant improvement over the LU solver. The time complexities of the QRUP and FQRUP solvers had the largest variation between cases out of all the solvers, showing the sensitivity of these solvers to the lower bandwidth (for the cone case, the time complexity exponent was estimated to be 2.3, whereas it was 2.7 for the wing–body–nacelle combination). Out of all the solvers considered, GMRES had the lowest time complexity by a small margin. However, this may be artificially low due to GMRES not converging for the fine mesh in the wing–body–nacelle case.
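The power-law fit described above amounts to a straight-line fit in log-log space: if $t = c N^{p}$, then $\log t = \log c + p \log N$, and the slope $p$ is the estimated order of complexity. A minimal sketch is given below. The function name `fit_power_law` is hypothetical, and the sizes and timings in the usage lines are synthetic values generated from an exact power law purely for illustration; they are not measurements from this study.

```python
from math import exp, log


def fit_power_law(sizes, times):
    """Least-squares fit of t = c * N**p in log-log space; returns (c, p)."""
    xs = [log(n) for n in sizes]
    ys = [log(t) for t in times]
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    # Slope of the regression line is the complexity exponent p.
    p = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    c = exp(y_bar - p * x_bar)
    return c, p


# Synthetic example: three mesh sizes with timings that follow N**2.5 exactly.
sizes = [1000, 4000, 16000]
times = [2.0e-9 * n ** 2.5 for n in sizes]
c, p = fit_power_law(sizes, times)
```

With only three mesh densities per configuration, the fitted exponent is sensitive to any single timing, which is consistent with the spread between configurations reported in Figure 18.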

5. Discussion

In two of the three cases considered here (the cone and wing–body–nacelle cases), the QRUP and FQRUP solvers were the fastest out of all solvers considered. For the double-wedge wing, GMRES was fastest. However, despite being faster than the other solvers for the cases considered here, the QRUP and FQRUP solvers had higher time complexities than the BJAC, BSSOR, and GMRES solvers. Thus, in terms of application, the QRUP and FQRUP solvers may be best-suited for smaller systems of equations.

There are some differences in run time to note between the QRUP and FQRUP solvers. The FQRUP solver was developed to reduce computation times compared to the QRUP solver. For the cases considered here, the FQRUP solver did run slightly faster than the QRUP solver. However, as shown in Figure 18, the FQRUP solver had a slightly higher average time complexity than the QRUP solver. Their time complexities should be the same, since the two methods require similar numbers of operations, and so the difference in measured time complexities would likely be eliminated by considering more cases. This can be seen in the variation of these time complexities.

In terms of robustness, the QRUP and FQRUP solvers resulted in final residual norms of $10^{-10}$ or less, as long as the system was sorted into upper-pentagonal form. Only the LU decomposition solver consistently produced lower residuals, and the iterative solvers failed to converge for some cases. Thus, the QRUP and FQRUP solvers are very robust. Across all cases considered, the QRUP solver had higher residuals if diagonal preconditioning was not used. However, the robustness of the FQRUP solver was unaffected by diagonal preconditioning. Thus, the FQRUP solver is the more robust of the two.

The GMRES solver showed the best overall performance in terms of time complexity. It seemed to also be unaffected by diagonal preconditioning and whether the system was upper-pentagonal. This makes GMRES an attractive option for implementation in a general panel method, as separate solvers would not need to be used to maximize performance for both subsonic and supersonic cases. However, the GMRES solver performed poorly for the wing–body–nacelle case as the mesh refinement increased. For the fine mesh in this case, the QRUP and FQRUP solvers were both faster and more accurate than GMRES. Additionally, the QRUP and FQRUP solvers outperformed GMRES for the cone case at all mesh resolutions. The BJAC and BSSOR solvers also had lower time complexity than the QRUP and FQRUP solvers. However, they were less robust than GMRES at producing low residuals.

In light of this evidence, the QRUP and FQRUP solvers are viable alternatives to GMRES and other iterative solvers. Paired with the novel sorting algorithm developed here, these solvers are faster for lower system sizes and more robust than the iterative solvers considered. In addition, as discussed at the end of Section 3.1, the QRUP and FQRUP solvers are rank-revealing and can be easily modified to solve rank-deficient systems of equations. This is a significant advantage of the QRUP and FQRUP solvers over the other solvers considered here. LU decomposition is not rank-revealing, and the iterative solvers considered struggled to converge when the matrix condition number was high. For implementation, the FQRUP solver is recommended over QRUP, as FQRUP is not sensitive to whether or not diagonal preconditioning is also used.

Though the robustness of the QRUP and FQRUP solvers has been mentioned, they failed to produce residuals as low as LU decomposition. As mentioned previously, this could likely be improved by pivoting, as is typically performed with LU decomposition. This is because some diagonal elements of the resulting upper-triangular matrix were often small (on the order of $10^{-7}$). Swapping rows or columns to put larger elements on the diagonal could result in less numerical error, as the back-substitution step requires dividing by the diagonal elements. However, for efficiency, such a pivoting scheme would need to be designed to produce the smallest possible increase in the lower bandwidth of the system matrix. This is an area of potential future research.
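The sensitivity to small diagonal elements can be seen directly in a sketch of back substitution for the upper-triangular system $Y x = c$. The function name `back_substitute` and the 2×2 example are hypothetical and chosen only to illustrate the effect: any error in the right-hand side of a row is divided by that row's diagonal element before it propagates upward.

```python
def back_substitute(Y, c):
    """Solve the upper-triangular system Y x = c by back substitution."""
    n = len(c)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(Y[i][j] * x[j] for j in range(i + 1, n))
        # Division by the diagonal: a small Y[i][i] magnifies any error in c[i] - s.
        x[i] = (c[i] - s) / Y[i][i]
    return x


# Hypothetical system with one small diagonal element of order 1e-7.
Y = [[1.0, 1.0],
     [0.0, 1.0e-7]]
x_clean = back_substitute(Y, [1.0, 1.0e-7])
# Perturbing the last right-hand-side entry by 1e-12 shifts the solution by about 1e-5.
x_perturbed = back_substitute(Y, [1.0, 1.0e-7 + 1.0e-12])
```

In this example the perturbation is amplified by a factor of $10^{7}$, the reciprocal of the small diagonal element.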

Also, there was significant variation in the time complexity of the QRUP and FQRUP solvers between the three test cases. This is due to the large variation in lower bandwidth between these test cases. If the sorting algorithm could be improved to reduce lower bandwidth even further, then the performance of these solvers would improve.

It has also been shown that the sorting algorithm developed here (Algorithm 1) significantly improved the performance of the BJAC and BSSOR solvers. Thus, some developers may want to implement this sorting algorithm within a supersonic panel method, even if the QRUP or FQRUP solver is not being used.

Parallelization

As stated in the Introduction, parallelization is not a focus of this work. However, it is valuable to consider how each method could be parallelized and the effect this would have on performance. Some brief thoughts are given here. A more in-depth study into the effect of parallelization on these methods may prove valuable in the future.

LU decomposition, being essentially the same as Gaussian elimination, is inherently serial, as each step in the algorithm is dependent upon results obtained from a previous step. However, each step of both decomposition and back substitution requires at least one vector operation that could be parallelized. Hence, some gains could be made by parallelizing LU decomposition. However, overhead costs would likely be significant unless some means were found to avoid starting up and shutting down a team of threads at each step of the algorithm.

The block-Jacobi method is very easily parallelized. The system of equations can be divided into as many blocks as there are available threads (or some multiple thereof). In this case, each thread holds onto a given block or blocks. The only inter-process communication then necessary is the passing of the latest update to the unknown vector. This communication is minimal if shared-memory parallelization is used. Additionally, this can be implemented in such a way that the team of threads does not need to be recreated each iteration. From experience, the authors are of the opinion that parallelization of the BJAC method leads to significant gains.

On the other hand, block-symmetric-successive overrelaxation is not easily parallelized. This is because, at each iteration, each block requires the solution from previous blocks for that same iteration. Parallelization could be implemented within each block solve. However, as with LU decomposition, the overhead costs associated with this are likely significant. In addition, the complexity of BSSOR is kept low by using small blocks along the diagonal. But the smaller these blocks are, the less effective parallelization becomes.

Parallelization of the QRUP and FQRUP solvers is essentially the same as for LU decomposition. The algorithms are inherently serial, but parallelization may be used within each step. In particular, each step requires applying the calculated Givens rotation to the rest of the two current rows, and this application could be parallelized. Again, it is likely that the overhead cost of parallelizing these rotations would be significant.
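The row update referred to above can be sketched as follows for a standard (square-root-based) Givens rotation; the fast, self-scaling variant used in FQRUP is not shown. The function names `givens` and `rotate_rows` are hypothetical. The loop over `j` is the part that could be distributed across threads, since each column of the two rows is updated independently once `c` and `s` are known.

```python
from math import hypot


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    r = hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0   # nothing to eliminate
    return a / r, b / r


def rotate_rows(upper, lower, col):
    """Zero lower[col] against upper[col], updating both rows in place."""
    c, s = givens(upper[col], lower[col])
    # Each column update is independent of the others (parallelizable).
    for j in range(col, len(upper)):
        u, l = upper[j], lower[j]
        upper[j] = c * u + s * l
        lower[j] = -s * u + c * l
```

Because each rotation touches only two rows, the work available to share within one step is proportional to the row length, which is why thread start-up overhead may dominate for small systems.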

Within GMRES, most parts of the algorithm need to be performed serially. However, it would be fairly simple to parallelize the Arnoldi update step of each iteration, as this requires a matrix multiplication and then vector subtraction, both of which are easily parallelized [31]. However, as with LU, QRUP, and FQRUP, it is anticipated that this would have high overhead costs associated with restarting the parallelization at each iteration.

Due to the high overhead costs of recreating the necessary team of processes at each step of the algorithm, it is anticipated that parallelizing LU decomposition, GMRES, QRUP, or FQRUP would be most advantageous for cases with large numbers of unknowns.

Out of all solvers considered, BJAC is very easily parallelized and has a low order of complexity. It also benefits from the sorting algorithm (Algorithm 1) developed here. However, it should be remembered that BJAC was not robust compared to FQRUP or even GMRES for the cases considered here.

6. Conclusions

In this work, the efficient solution of the linear system of equations arising in a supersonic BEM was considered. The BEM considered here was implemented on an unstructured mesh of triangles with linear-doublet-constant-source panels. While efficient solution of similar equations for subsonic flow has been considered previously, the supersonic case has not been examined. It was shown here how, in the supersonic case, the linear system of equations may be manipulated into a special form, termed upper-pentagonal. A simple sorting algorithm was presented that, for an unstructured, supersonic BEM, can result in an upper-pentagonal system matrix with relatively low bandwidth. This algorithm is based only on the mesh geometry and the freestream direction, and so may be executed without any knowledge of the actual system matrix.

A new matrix solver was then developed, based on Givens rotations, for efficiently solving a system of equations with an upper-pentagonal structure. This method is called the QRUP solver. A variation of this (called the FQRUP solver), based on Anda and Park’s self-scaling fast Givens rotations, was also developed and tested, which removed the need to calculate square roots. For comparison, other methods for solving the linear system of equations were then discussed, including LU decomposition, BJAC, BSSOR, and GMRES.

The performance of these different solvers was compared for three representative supersonic cases: a circular cone, a double-wedge wing, and a wing–body–nacelle combination. For two out of the three test cases, the QRUP and FQRUP solvers outperformed the other solvers in terms of both speed and accuracy. Only for the wing case did GMRES run faster and have lower residuals than QRUP and FQRUP. However, for the wing–body–nacelle case, GMRES failed to converge on the highest mesh resolution. In analyzing time complexity, it was found that GMRES had the lowest time complexity of all the solvers. The time complexity of the QRUP and FQRUP solvers varied significantly with the lower bandwidth of the system but was significantly lower than that of LU decomposition and only slightly higher than that of the iterative solvers.

While the question of which matrix solver to implement depends upon many factors, the FQRUP solver can be recommended as a fast and robust alternative to existing solvers. It also has the advantage of being easily modifiable to solve rank-deficient matrix equations, making it applicable to Neumann-based panel methods. And it is anticipated that FQRUP would benefit from parallelization to the same degree as GMRES would.

It was also shown that the novel sorting algorithm developed here can be used to significantly improve the performance of the BJAC and BSSOR solvers. Thus, the sorting algorithm has use beyond the QRUP and FQRUP solvers.

In the future, it would be valuable to see if pivoting could be implemented as part of the QRUP and FQRUP solvers. This would potentially improve the robustness of these solvers even further. In addition, improvements to the novel sorting algorithm could be made to reduce the resulting lower bandwidth. This would in turn improve the performance of the QRUP and FQRUP solvers and reduce their time complexity. This would be significant, as time complexity was found here to be the primary disadvantage to the QRUP and FQRUP solvers.

Author Contributions

Conceptualization, C.G. and D.H.; Methodology, C.G.; Software, C.G.; Validation, C.G.; Writing—original draft, C.G.; Writing—review & editing, C.G. and D.H.; Supervision, D.H.; Project administration, D.H.; Funding acquisition, C.G. and D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the NASA University Leadership Initiative (ULI) program under federal award number NNX17AJ96A, titled “Adaptive Aerostructures for Revolutionary Civil Supersonic Transportation”. This work was also funded in part by Air Force STTR Contract number FA864922P0081.

Data Availability Statement

The data presented in this work may be obtained by contacting the authors directly.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Brebbia, C.A. The Boundary Element Method for Engineers; John Wiley & Sons, Inc.: New York, NY, USA, 1978.
2. Erickson, L.L. Panel Methods—An Introduction; Technical Paper 2995; NASA Ames Research Center: Moffett Field, CA, USA, 1990.
3. Ehlers, F.E.; Epton, M.A.; Johnson, F.T.; Magnus, A.E.; Rubbert, P.E. A Higher Order Panel Method for Linearized Supersonic Flow; NASA Contractor Report 3062; NASA: Washington, DC, USA, 1979.
4. Epton, M.A.; Magnus, A.E. PAN AIR: A Computer Program for Predicting Subsonic or Supersonic Linear Potential Flows About Arbitrary Configurations Using a Higher Order Panel Method. Volume 1: Theory Document; NASA-CR-3251; NASA: Washington, DC, USA, 1981.
5. Goates, C.D. Development, Implementation, and Optimization of a Modern, Subsonic/Supersonic Panel Method. Ph.D. Thesis, Utah State University, Logan, UT, USA, 2023.
6. Maruyama, Y.; Akishita, S.; Nakamura, A. Numerical Simulations of Supersonic Flows about Arbitrary Configurations Using a New Panel Method Program. In Proceedings of the Astrodynamics Conference, Williamsburg, VA, USA, 18–20 August 1986; pp. 303–312.
7. Youngren, H.H.; Bouchard, E.E.; Coopersmith, R.M.; Miranda, L.R. Comparison of Panel Method Formulations and its Influence on the Development of QUADPAN, an Advanced Low Order Panel Method. In Proceedings of the Applied Aerodynamics Conference, Danvers, MA, USA, 13–15 July 1983; p. 14.
8. Davis, J.D.; Marshall, D.D. A Higher-Order Method Implemented in an Unstructured Panel Code to Model Linearized Supersonic Flows. In Proceedings of the SciTech Forum, Orlando, FL, USA, 6–10 January 2020.
9. Tinoco, E.N.; Rubbert, P.E. Panel Methods: PAN AIR. In Computational Methods in Potential Aerodynamics; Morino, L., Ed.; Springer: New York, NY, USA, 1985; pp. 39–93.
10. Mullen, R.L.; Rencis, J.J. Iterative Methods for Solving Boundary Element Equations. Comput. Struct. 1987, 25, 713–723.
11. Mansur, W.J.; Araujo, F.C.; Malaghini, J.E.B. Solution of BEM Systems of Equations via Iterative Techniques. Int. J. Numer. Methods Eng. 1992, 33, 1823–1841.
12. Barra, L.P.S.; Coutinho, A.L.G.A.; Mansur, W.J.; Telles, J.C.F. Iterative Solution of BEM Equations by GMRES Algorithm. Comput. Struct. 1992, 44, 1249–1253.
13. Boschitsch, A.; Curbishley, T.; Quackenbush, T.; Teske, M. A fast panel method for potential flows about complex geometries. In Proceedings of the 34th Aerospace Sciences Meeting and Exhibit, Reno, NV, USA, 15–18 January 1996.
14. Willis, D.J.; Peraire, J.; White, J.K. A combined pFFT-multipole tree code, unsteady panel method with vortex particle wakes. Int. J. Numer. Methods Fluids 2007, 53, 1399–1422.
15. Xiao, H.; Chen, Z. Numerical Experiments of Preconditioned Krylov Subspace Methods Solving the Dense Non-Symmetric Systems Arising from BEM. Eng. Anal. Bound. Elem. 2007, 31, 1013–1023.
16. Yao, Z.; Zheng, X.; Yuan, H.; Feng, J. Research progress of high-performance BEM and investigation on convergence of GMRES in local stress analysis of slender real thin-plate beams. Eng. Comput. 2019, 36, 2530–2556.
17. Sun, J.; Zheng, X.; Liu, Y.; Yao, Z. Some investigations on convergence of GMRES in solving BEM equations for slender beam structures. Eng. Anal. Bound. Elem. 2021, 126, 128–135.
18. Davey, K.; Bounds, S. A Generalized SOR Method for Dense Linear Systems of Boundary Element Equations. SIAM J. Sci. Comput. 1998, 19, 935–967.
19. Ward, G.N. Linearized Theory of Steady High-Speed Flow; Cambridge University Press: Cambridge, UK, 1955.
20. Goates, C.D.; Houser, A.M.; Hunsaker, D.F. MachLine: Development of a Dirichlet-Based, Subsonic/Supersonic Panel Method for Unstructured Grids. J. Aircr. 2024, 61.
21. AeroLab. MachLine. 2022. Available online: https://github.com/usuaero/MachLine (accessed on 1 August 2024).
22. Goates, C.D. MachLine: Dissertation Version. 2023. Available online: https://doi.org/10.5281/zenodo.8253561 (accessed on 1 August 2024).
23. Coulier, P.; Pouransari, H.; Darve, E. The Inverse Fast Multipole Method: Using a Fast Approximate Direct Solver as a Preconditioner for Dense Linear Systems. SIAM J. Sci. Comput. 2017, 39, A761–A796.
24. Xia, J. Randomized Sparse Direct Solvers. SIAM J. Matrix Anal. Appl. 2013, 34, 197–227.
25. Goates, C.D.; Houser, A.M.; Hunsaker, D.F. Implementation of MachLine: A Subsonic/Supersonic, Unstructured Panel Code. In Proceedings of the SciTech Forum, National Harbor, MD, USA, 23–27 January 2023.
26. Kellogg, O.D. Foundations of Potential Theory; Dover Publications, Inc.: Mineola, NY, USA, 1953.
27. Goates, C.D.; Hunsaker, D.F. Development of a Subsonic-Supersonic, Unstructured Panel Method. In Proceedings of the SciTech Forum, San Diego, CA, USA, 3–7 January 2022.
28. Moon, T.K.; Stirling, W.C. Mathematical Methods and Algorithms for Signal Processing; Prentice Hall: Upper Saddle River, NJ, USA, 2000.
29. Stewart, G.W. Matrix Algorithms Volume 1: Basic Decompositions; SIAM: Philadelphia, PA, USA, 1998.
30. Gentleman, W.M. Least Squares Computations by Givens Transformations Without Square Roots. J. Inst. Math. Its Appl. 1973, 12, 329–336.
31. Hogben, L. (Ed.) Handbook of Linear Algebra; Chapman & Hall/CRC: Boca Raton, FL, USA, 2007.
32. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 1996.
33. Anda, A.A.; Park, H. Fast Plane Rotations with Dynamic Scaling. SIAM J. Matrix Anal. Appl. 1994, 15, 162–174.
34. Anda, A.A.; Park, H. Self-Scaling Fast Rotations for Stiff and Equality-Constrained Linear Least Squares Problems. Linear Algebra Its Appl. 1996, 234, 137–161.
35. Davis, J.D. A Higher-Order Method Implemented in an Unstructured Panel Code to Model Linearized Supersonic Flows; California Polytechnic State University: San Luis Obispo, CA, USA, 2019.
36. Press, W.H.; Flannery, B.P.; Teukolsky, S.A.; Vetterling, W.T. Numerical Recipes—The Art of Scientific Computing; Cambridge University Press: Cambridge, UK, 1986.
37. Ehrlich, L.W. The Block Symmetric Successive Overrelaxation Method. J. Soc. Ind. Appl. Math. 1964, 12, 807–826.
38. Hageman, L.A.; Young, D.M. Applied Iterative Methods; Academic Press: New York, NY, USA, 1981.
39. Shakib, F.; Hughes, T.J.; Johan, Z. A multi-element group preconditioned GMRES algorithm for nonsymmetric systems arising in finite element analysis. Comput. Methods Appl. Mech. Eng. 1989, 75, 415–456.
40. Layton, S. Fast Multipole Boundary Element Solutions with Inexact Krylov Iterations and Relaxation Strategies. Ph.D. Thesis, Boston University, Boston, MA, USA, 2013.

Figure 1. Diagram of the general structure of an AIC matrix.

Figure 2. Heatmaps showing the magnitudes of the AIC matrix coefficients for a high-aspect-ratio, 5° half-angle, double-wedge wing in (a) subsonic and (b) supersonic flow. The scale is logarithmic. Black regions signify AIC elements which are identically zero.

Figure 3. Heatmap showing the magnitudes of the AIC matrix coefficients for a 10° half-angle cone in supersonic flow ($M_{\infty} = 1.5$). The scale is logarithmic. Black regions signify AIC elements which are identically zero.

Figure 4. Schematic of the pentagonal shape of the non-zero elements in the supersonic AIC matrix when properly ordered. The width of the band of non-zero elements below the diagonal is exaggerated slightly to make the pentagonal shape clear.

Figure 5. Heatmap representing the AIC matrix coefficients for the wing represented in Figure 2b after sorting the mesh vertices using Algorithm 1.

Figure 6. Medium meshes for the (a) cone and (b) double-wedge wing.

Figure 7. Medium mesh for the wing–body–nacelle combination.

Figure 8. Average solver run times for the cone case with the system of equations (a) sorted and (b) unsorted to achieve upper-pentagonal structure.

Figure 9. Average residual norms for the cone case with the system of equations (a) sorted and (b) unsorted to achieve upper-pentagonal structure. Dashed line indicates the iterative solver termination tolerance.

Figure 10. Iteration history for the BJAC, BSSOR, and GMRES solvers on the cone case with the system sorted and diagonal preconditioning. The various shades of gray correspond to the mesh densities (i.e., light is fine, dark is coarse).

Figure 11. Average solver run times for the wing case with the system of equations (a) sorted and (b) unsorted to achieve upper-pentagonal structure.

Figure 12. Average residual norms for the wing case with the system of equations (a) sorted and (b) unsorted to achieve upper-pentagonal structure. Dashed line indicates the iterative solver termination tolerance.

Figure 13. Iteration history for the BJAC, BSSOR, and GMRES solvers on the double-wedge wing case with the system sorted and diagonal preconditioning. The various shades of gray correspond to the mesh densities (i.e., light is fine, dark is coarse).

Figure 14. Heatmap representing the AIC matrix for the medium wing–body–nacelle configuration mesh at $M_{\infty} = 2$.

Figure 15. Average solver run times for the wing–body–nacelle case with the system of equations (a) sorted and (b) unsorted to achieve upper-pentagonal structure.

Figure 16. Average residual norms for the wing–body–nacelle case with the system of equations (a) sorted and (b) unsorted to achieve upper-pentagonal structure. Dashed line indicates the iterative solver termination tolerance.

Figure 17. Iteration history for the BJAC, BSSOR, and GMRES solvers on the wing–body–nacelle case with the system sorted and diagonal preconditioning. The various shades of gray correspond to the mesh densities (i.e., light is fine, dark is coarse).

Figure 18. Solver time complexities averaged across the three configurations with and without diagonal preconditioning. The uncertainty bands shown represent the minimum and maximum complexities across the three configurations.

Table 1. Formulas for generating a fast Givens rotation, taken from [34]. Note that the final step of each is setting $A_{i j} \leftarrow 0$.

	$\| D_{j} \| \geq \| D_{i} \|$	$\| D_{j} \| < \| D_{i} \|$
$D_{j} / D_{i} \geq A_{i j}^{2} / A_{j j}^{2}$	$r t \leftarrow 1$	$r t \leftarrow 2$
	$τ \leftarrow A_{i j} / A_{j j}$	$α \leftarrow A_{i j} / A_{j j}$
	$β \leftarrow τ / γ$	$τ \leftarrow α / γ$
	$δ \leftarrow 1 + β τ$	$δ \leftarrow 1 + α τ$
	$α \leftarrow τ / δ$	$β \leftarrow τ / δ$
	$D_{j} \leftarrow D_{j} / δ$	$D_{j} \leftarrow D_{j} δ$
	$D_{i} \leftarrow D_{i} δ$	$D_{i} \leftarrow D_{i} / δ$
	$A_{j j} \leftarrow A_{j j} δ$	$A_{j j} \leftarrow A_{j j}$
$D_{j} / D_{i} < A_{i j}^{2} / A_{j j}^{2}$	$r t \leftarrow 3$	$r t \leftarrow 4$
	$α \leftarrow A_{j j} / A_{i j}$	$τ \leftarrow A_{j j} / A_{i j}$
	$τ \leftarrow α γ$	$β \leftarrow τ γ$
	$δ \leftarrow 1 + α τ$	$δ \leftarrow 1 + β τ$
	$β \leftarrow τ / δ$	$α \leftarrow τ / δ$
	$t \leftarrow D_{i} δ$	$t \leftarrow D_{i} / δ$
	$D_{i} \leftarrow D_{j} / δ$	$D_{i} \leftarrow D_{j} δ$
	$D_{j} \leftarrow t$	$D_{j} \leftarrow t$
	$A_{j j} \leftarrow A_{i j}$	$A_{j j} \leftarrow A_{i j} δ$
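
The four branches of Table 1 can be transcribed as a short routine. The following is a minimal sketch rather than the authors' implementation: the function and class names are hypothetical, and it assumes that $γ = D_{j} / D_{i}$, which is not defined in this table but is the choice that keeps $δ$ between 1 and 2 under the branch conditions shown. It also assumes $D_{i} \neq 0$ and tests the branch condition in the multiplied-out form $γ A_{j j}^{2} \geq A_{i j}^{2}$ so that no division by $A_{j j}$ occurs before it is known to be non-zero.

```python
from dataclasses import dataclass


@dataclass
class FastGivens:
    """Parameters of one fast Givens rotation (hypothetical container)."""
    rt: int        # rotation type, 1 through 4, as labelled in Table 1
    alpha: float
    beta: float
    delta: float
    D_i: float     # updated scale factor for row i
    D_j: float     # updated scale factor for row j
    A_jj: float    # updated diagonal element


def fast_givens(A_ij: float, A_jj: float, D_i: float, D_j: float) -> FastGivens:
    """Generate the rotation that eliminates A_ij, following Table 1.

    The caller is still responsible for the final step, A_ij <- 0.
    """
    if A_ij == 0.0:
        raise ValueError("A_ij is already zero; no rotation is needed")
    gamma = D_j / D_i  # ASSUMPTION: gamma is not defined in Table 1 itself
    j_dominant = abs(D_j) >= abs(D_i)
    if gamma * A_jj**2 >= A_ij**2:       # D_j/D_i >= A_ij^2/A_jj^2
        if j_dominant:                   # rt = 1
            rt = 1
            tau = A_ij / A_jj
            beta = tau / gamma
            delta = 1.0 + beta * tau
            alpha = tau / delta
            D_j, D_i = D_j / delta, D_i * delta
            A_jj = A_jj * delta
        else:                            # rt = 2 (A_jj is left unchanged)
            rt = 2
            alpha = A_ij / A_jj
            tau = alpha / gamma
            delta = 1.0 + alpha * tau
            beta = tau / delta
            D_j, D_i = D_j * delta, D_i / delta
    else:                                # D_j/D_i < A_ij^2/A_jj^2
        if j_dominant:                   # rt = 3
            rt = 3
            alpha = A_jj / A_ij
            tau = alpha * gamma
            delta = 1.0 + alpha * tau
            beta = tau / delta
            t = D_i * delta
            D_i = D_j / delta
            D_j = t
            A_jj = A_ij
        else:                            # rt = 4
            rt = 4
            tau = A_jj / A_ij
            beta = tau * gamma
            delta = 1.0 + beta * tau
            alpha = tau / delta
            t = D_i / delta
            D_i = D_j * delta
            D_j = t
            A_jj = A_ij * delta
    return FastGivens(rt, alpha, beta, delta, D_i, D_j, A_jj)
```

Under the stated assumption on $γ$, every branch leaves the product $D_{i} D_{j}$ unchanged and yields $1 \leq δ \leq 2$, so the scale factors cannot grow or shrink by more than a factor of two per rotation. How $α$ and $β$ are then applied to the remaining row elements is not given in Table 1 and is therefore not sketched here.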

Table 2. Freestream Mach numbers and system dimension (N) for the three configurations considered.

Configuration	$M_{\infty}$	N (Coarse)	N (Medium)	N (Fine)
Cone	1.5	321	1241	4881
Wing	2.0	419	1127	3847
Wing–Body–Nacelle	2.0	332	1285	5541

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
