To solve large sparse systems of linear equations quickly, the AMGPCG algorithm was selected as the iterative method. This requires studying the theory of the AMG method and its implementation difficulties, such as when to stop coarsening, how to compute the coarse grid matrix quickly and how to combine the AMG and CG methods.
2.1. AMG Method
The multigrid (MG) algorithm is one of the most efficient numerical methods for solving large-scale systems arising from (elliptic) PDEs. For large-scale FEM systems, local relaxation methods (e.g., Gauss–Seidel) are typically effective at eliminating the high-frequency error components, while the low-frequency components cannot be eliminated effectively [11]. The idea of the MG method is to project the error obtained after a few iterations of a local relaxation method onto a coarser grid. The low-frequency part of the error on the fine grid becomes a relatively high-frequency part on the coarser grid, where it can be further reduced by a local relaxation method. The MG method repeats this process on ever coarser grids [12]. In this process, the local relaxation method is called the smoother.
The MG algorithm divides into the geometric multigrid (GMG) and algebraic multigrid (AMG) methods. The GMG method depends on a hierarchy of geometric grids. Since the geometric multigrid is constructed from geometric information, it is very difficult to generate a coarse grid for complex geometric structures. The AMG methods were put forward to overcome these challenges of the geometric multigrid algorithms [13]. They only need the coefficient matrix A of the linear system and do not require different grid levels. A. Brandt, K. Stüben et al. proposed and developed the AMG method over the past three decades [14,15,16,17]. Two families of AMG methods have been developed extensively. The first constructs coarse matrices and interpolation operators from the algebraic and analytical information in the linear system, such as the smoothed aggregation (SA-AMG) [18], energy-minimization AMG [19] and aggregation-based AMG [17] methods. The other family is the adaptive AMG methods, whose basic idea is to adjust and optimize the AMG components during the solution process; it includes methods such as compatible relaxation AMG [20], Bootstrap AMG [21] and root-node AMG [16]. In this paper, the aggregation-based AMG method proposed by Y. Notay [17] is used as the preconditioner of the AMGPCG method.
The AMG method consists of two phases: the set-up phase and the solution phase [22]. The set-up phase first needs to design a multigrid hierarchy $\Omega_1 \supset \Omega_2 \supset \cdots \supset \Omega_L$. Each $\Omega_{l+1}$ is set by the coarsening algorithm, and $\Omega_1$ is the top-level grid. Writing $\Omega_l = C_l \cup F_l$, $C_l$ is the set of coarse grid points and $F_l$ is the set of fine grid points; $C_l$ and $F_l$ do not intersect. Then, the set-up phase needs to construct a grid operator $A_l$ and an interpolator $P_l$ for every level $l$. The solution phase performs multigrid cycles, such as the V-cycle, W-cycle and FMG.
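To make the two phases concrete, the following is a minimal V-cycle sketch. It anticipates the aggregation-based setting described below, in which each interpolator $P_l$ has exactly one unit entry per row and can therefore be stored as an array mapping each fine node to its aggregate. All types and names (Csr, Level, smooth, v_cycle) are illustrative rather than the EOS implementation, and the coarsest-grid solve is replaced by repeated smoothing for brevity.

```cpp
#include <cstddef>
#include <vector>

struct Csr {                       // compressed sparse row storage
    int n = 0;                     // matrix order
    std::vector<int> ptr, col;     // row pointers, column indices
    std::vector<double> val;       // nonzero values
};

struct Level {
    Csr A;                         // grid operator A_l
    std::vector<int> agg;          // agg[i] = aggregate of fine node i (encodes P_l)
};

// One forward Gauss-Seidel sweep: the smoother that damps high-frequency error.
void smooth(const Csr& A, std::vector<double>& x, const std::vector<double>& b) {
    for (int i = 0; i < A.n; ++i) {
        double s = b[i], diag = 1.0;
        for (int k = A.ptr[i]; k < A.ptr[i + 1]; ++k) {
            if (A.col[k] == i) diag = A.val[k];
            else s -= A.val[k] * x[A.col[k]];
        }
        x[i] = s / diag;
    }
}

std::vector<double> residual(const Csr& A, const std::vector<double>& x,
                             const std::vector<double>& b) {
    std::vector<double> r(b);      // r = b - A x
    for (int i = 0; i < A.n; ++i)
        for (int k = A.ptr[i]; k < A.ptr[i + 1]; ++k)
            r[i] -= A.val[k] * x[A.col[k]];
    return r;
}

// Recursive V-cycle on the hierarchy produced by the set-up phase.
void v_cycle(const std::vector<Level>& lv, std::size_t l,
             std::vector<double>& x, const std::vector<double>& b) {
    if (l + 1 == lv.size()) {      // coarsest grid: stand-in for a direct solve
        for (int it = 0; it < 50; ++it) smooth(lv[l].A, x, b);
        return;
    }
    smooth(lv[l].A, x, b);         // pre-smoothing
    std::vector<double> r = residual(lv[l].A, x, b);
    std::vector<double> rc(lv[l + 1].A.n, 0.0), ec(lv[l + 1].A.n, 0.0);
    for (int i = 0; i < lv[l].A.n; ++i) rc[lv[l].agg[i]] += r[i]; // rc = P^T r
    v_cycle(lv, l + 1, ec, rc);    // coarse-grid correction
    for (int i = 0; i < lv[l].A.n; ++i) x[i] += ec[lv[l].agg[i]]; // x += P ec
    smooth(lv[l].A, x, b);         // post-smoothing
}
```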
The coarsening algorithm can construct $\Omega_{l+1}$ using only the information of $A_l$. The coarse grid matrix $A_{l+1}$ is computed by the Galerkin formula:

$$A_{l+1} = P_l^{T} A_l P_l. \quad (1)$$
Algebraic coarsening algorithms include classical coarsening and aggregation coarsening, among others. Aggregation coarsening is used in this paper. The coarse grid point set defines the aggregates $G_i$, and the interpolation matrix $P$ is constructed from the $G_i$ as follows:

$$P_{ij} = \begin{cases} 1, & i \in G_j, \\ 0, & \text{otherwise.} \end{cases} \quad (2)$$

The algorithm requires the set $S_i$ of nodes to which $i$ is strongly negatively coupled, defined with the strong/weak coupling threshold $\beta$:

$$S_i = \left\{\, j \neq i \;\middle|\; a_{ij} < -\beta \max_{a_{ik} < 0} |a_{ik}| \,\right\},$$
where $0 < \beta < 1$ (the value $\beta = 0.25$ is used in [17]). EOS constructs a finite element matrix in which Dirichlet boundary conditions have been imposed. $m_i$ is the number of unmarked nodes that are strongly negatively coupled to $i$; that is, $m_i$ is the number of sets $S_j$ to which $i$ belongs and that correspond to an unmarked node $j$:

$$m_i = \left| \{\, j \in U \mid i \in S_j \,\} \right|,$$

where $U$ denotes the set of unmarked nodes.
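As an illustration, the following sketch (reusing the Csr type from the listing above) computes the sets $S_i$ and the initial counters $m_i$ directly from these definitions; the function name and signature are assumptions, not EOS code.

```cpp
#include <algorithm>
#include <vector>

// Builds S_i (strong negative couplings of node i) and m_i (number of nodes j
// with i in S_j). beta is the coupling threshold; Notay [17] uses beta = 0.25.
void strong_couplings(const Csr& A, double beta,
                      std::vector<std::vector<int>>& S, std::vector<int>& m) {
    S.assign(A.n, {});
    m.assign(A.n, 0);
    for (int i = 0; i < A.n; ++i) {
        double strongest = 0.0;    // max |a_ik| over negative off-diagonals
        for (int k = A.ptr[i]; k < A.ptr[i + 1]; ++k)
            if (A.col[k] != i && A.val[k] < 0.0)
                strongest = std::max(strongest, -A.val[k]);
        for (int k = A.ptr[i]; k < A.ptr[i + 1]; ++k)
            if (A.col[k] != i && A.val[k] < -beta * strongest)
                S[i].push_back(A.col[k]);
    }
    for (int j = 0; j < A.n; ++j)  // initially all nodes are unmarked, so
        for (int i : S[j]) ++m[i]; // m_i = |{ j : i in S_j }|
}
```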
Algorithm 1 is part of the coarsening [17], and it finds the coarse grid aggregates $G_i$, $i = 1, \dots, n_c$, where $n_c$ is the number of coarse variables.
Algorithm 1 Pairwise aggregation.
Input:
- Matrix $A = (a_{ij})$;
- Set of unmarked nodes $U$;
- Sets $S_i$;
- Array $m = (m_i)$;
- The bool top (whether the grid is the top-level grid);
- The number of coarse variables $n_c$ (initialized to 0).
Output:
- Aggregates $G_1, \dots, G_{n_c}$.
1: if top == true then
2:   $U \leftarrow U \setminus \{\, i \mid i \text{ is a Dirichlet boundary node} \,\}$;
3: end if
4: while $U \neq \emptyset$ do
5:   Select $i \in U$ with minimal $m_i$; $n_c \leftarrow n_c + 1$;
6:   Select $j \in U \cap S_i$ such that $a_{ij} = \min_{k \in U \cap S_i} a_{ik}$;
7:   if such a $j$ exists then
8:     $G_{n_c} = \{ i, j \}$;
9:   else
10:    $G_{n_c} = \{ i \}$;
11:  end if
12:  $U \leftarrow U \setminus G_{n_c}$;
13:  For all $k \in G_{n_c}$, update: $m_l \leftarrow m_l - 1$ for $l \in S_k$;
14: end while
The technical difficulty of Algorithm 1 is selecting the node $i \in U$ with minimal $m_i$. The time complexity of bubble sort is $O(n^2)$, so the minimal $m_i$ cannot be found quickly by sorting. This paper uses a minimum heap to select the node with minimal $m_i$ quickly. The minimum heap is a complete binary tree in which every node's value is less than the values of its children, as shown in Figure 1. Because storing the minimum heap as an explicit binary tree is unnecessarily complex, the minimum heap of Figure 1 is stored in an array, as shown in Figure 2.
Step 5 of Algorithm 1 takes the root node out of the min-heap. Step 13 of Algorithm 1 updates $m_l$, and the order of the heap is restored through the sift-up (upward filtering) operation of the min-heap. This paper uses an array to store the aggregates $G_i$: if $G_i$ contains two points $j$ and $k$, the array stores $j$ and $k$; if it contains only one point $j$, the array stores $j$ and $-1$. $G_i$ can then be accessed directly by the array index.
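The following is a sketch of such an array-backed min-heap with position tracking, so that step 5's extraction and step 13's decrement both cost $O(\log n)$; the structure and method names are illustrative, not the EOS data structure.

```cpp
#include <utility>
#include <vector>

// Array-backed min-heap keyed on m_i. pos[] tracks each node's slot so that
// step 13's decrement (decreased) and step 12's removal (remove) stay cheap.
struct MinHeap {
    std::vector<int> heap;         // node ids; heap[0] has minimal key
    std::vector<int> pos;          // pos[node] = slot in heap, -1 if removed
    const std::vector<int>* key;   // key = the array m

    bool less(int a, int b) const { return (*key)[heap[a]] < (*key)[heap[b]]; }
    void swap_at(int a, int b) {
        std::swap(heap[a], heap[b]);
        pos[heap[a]] = a; pos[heap[b]] = b;
    }
    void sift_up(int i) {
        while (i > 0 && less(i, (i - 1) / 2)) { swap_at(i, (i - 1) / 2); i = (i - 1) / 2; }
    }
    void sift_down(int i) {
        for (;;) {
            int s = i, l = 2 * i + 1, r = 2 * i + 2, n = (int)heap.size();
            if (l < n && less(l, s)) s = l;
            if (r < n && less(r, s)) s = r;
            if (s == i) return;
            swap_at(i, s); i = s;
        }
    }
    static MinHeap build(const std::vector<int>& m) {   // heapify all nodes
        MinHeap h;
        h.key = &m;
        h.heap.resize(m.size()); h.pos.resize(m.size());
        for (int i = 0; i < (int)m.size(); ++i) { h.heap[i] = i; h.pos[i] = i; }
        for (int i = (int)m.size() / 2 - 1; i >= 0; --i) h.sift_down(i);
        return h;
    }
    int pop_min() {                // step 5: the unmarked node with minimal m_i
        int top = heap[0];
        swap_at(0, (int)heap.size() - 1);
        heap.pop_back(); pos[top] = -1;
        if (!heap.empty()) sift_down(0);
        return top;
    }
    void remove(int node) {        // step 12: the partner j also leaves U
        int i = pos[node];
        if (i < 0) return;
        swap_at(i, (int)heap.size() - 1);
        heap.pop_back(); pos[node] = -1;
        if (i < (int)heap.size()) { int moved = heap[i]; sift_down(i); sift_up(pos[moved]); }
    }
    void decreased(int node) {     // step 13: m_node was just decremented
        if (pos[node] >= 0) sift_up(pos[node]);
    }
};
```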
Algorithm 1 produces aggregates that are almost always pairs. However, coarsening by a single pairwise aggregation pass is slow; repeated (double) pairwise aggregation yields faster coarsening. First, an aggregation $G_i^{(1)}$ is constructed from the matrix $A$ by Algorithm 1. We then construct an auxiliary matrix $A^{(1)}$ as follows:

$$a_{ij}^{(1)} = \sum_{k \in G_i^{(1)}} \sum_{l \in G_j^{(1)}} a_{kl}.$$

The aggregation $G_i^{(2)}$ is constructed from $A^{(1)}$ by Algorithm 1. Then, we construct the final aggregation $G_i$, which is given by

$$G_i = \bigcup_{j \in G_i^{(2)}} G_j^{(1)}.$$

These aggregates $G_i$ are mostly quadruplets, with some triplets, pairs and singletons left. From the aggregates $G_i$, the coarse matrix $A_{l+1}$ and the interpolator $P_l$ are constructed with Equations (1) and (2).
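The aggregate composition $G_i = \bigcup_{j \in G_i^{(2)}} G_j^{(1)}$ can be written directly; the sketch below assumes the first- and second-pass aggregates have already been produced by Algorithm 1, and the function name is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Composition step of double pairwise aggregation: G1 holds the first-pass
// pairs built from A, G2 the second-pass pairs built from the auxiliary
// matrix A^(1); the final aggregate G_i is the union of the first-pass
// pairs selected by G2_i.
std::vector<std::vector<int>>
compose_aggregates(const std::vector<std::vector<int>>& G1,
                   const std::vector<std::vector<int>>& G2) {
    std::vector<std::vector<int>> G(G2.size());
    for (std::size_t i = 0; i < G2.size(); ++i)
        for (int j : G2[i])        // j indexes a first-pass pair G1[j]
            G[i].insert(G[i].end(), G1[j].begin(), G1[j].end());
    return G;                      // mostly quadruplets, some smaller sets
}
```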
Because the nonzero entries of the interpolator $P$ all equal one and each row contains exactly one nonzero entry, both the coefficient matrix $A$ and the interpolator $P$ are sparse matrices. In the accumulation $(A_{l+1})_{ij} = \sum_{k \in G_i} \sum_{l \in G_j} a_{kl}$, many of the accumulated terms are zero. Therefore, the accumulation only computes terms for which $a_{kl} \neq 0$.
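A sketch of this sparsity-aware accumulation, reusing the Csr type from above: each stored nonzero $a_{kl}$ is added once to the coarse entry indexed by the aggregates of $k$ and $l$, so zero entries are never visited. The map-based accumulator and the $-1$ marker are illustrative simplifications.

```cpp
#include <map>
#include <utility>
#include <vector>

// Galerkin product A_{l+1} = P^T A_l P for an interpolator with one unit
// entry per row: each nonzero a_kl contributes once to the coarse entry
// (agg[k], agg[l]), so only the nonzeros of A are traversed. agg[i] = -1
// marks nodes excluded from aggregation (e.g., Dirichlet rows). A production
// code would assemble CSR directly instead of an ordered map.
std::map<std::pair<int, int>, double>
galerkin_product(const Csr& A, const std::vector<int>& agg) {
    std::map<std::pair<int, int>, double> Ac;
    for (int k = 0; k < A.n; ++k) {
        if (agg[k] < 0) continue;
        for (int p = A.ptr[k]; p < A.ptr[k + 1]; ++p) {
            int l = A.col[p];
            if (agg[l] < 0) continue;
            Ac[{agg[k], agg[l]}] += A.val[p];  // only stored nonzeros contribute
        }
    }
    return Ac;
}
```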
For the multigrid hierarchy, the common coarsening stopping condition is that the order of the coarse matrix $A_l$ falls below 100. However, coarsening can be very slow on individual grid levels. An example is listed in Table 1, in which $l$ is the grid level and $n$ is the matrix order on that level. When the ratio of matrix orders between successive levels is large, the coarsening is very fast; when the ratio is close to one, the coarsening effect is feeble and the overall coarsening efficiency is poor. Based on this phenomenon, the coarsening ratio is used as an additional stopping index, and a threshold value was found that meets our requirement. Therefore, coarsening is stopped as soon as either the order condition or the coarsening-ratio condition is met.
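The combined test can be summarized as follows; the order bound of 100 comes from the text above, while the ratio threshold tau is left as a parameter because the tuned value is not reproduced here.

```cpp
// Combined stopping test sketched from the discussion above: stop when the
// coarse matrix order drops below 100, or when coarsening between successive
// levels becomes too slow. tau is a placeholder for the tuned threshold.
bool stop_coarsening(int n_fine, int n_coarse, double tau) {
    if (n_coarse < 100) return true;                    // small enough to solve
    double ratio = double(n_fine) / double(n_coarse);   // coarsening speed
    return ratio < tau;                                 // coarsening is feeble
}
```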
After constructing the coarse grid matrices and the interpolation operators $P_l$, the set-up phase of the AMG algorithm is complete. The solution phase can perform different cycling schemes; in EOS, both the V-cycle and K-cycle schemes are tested.
In theory, the AMG program in EOS can be implemented in parallel. The CPU transfers the fine-grid matrix $A$ from the CPU's DRAM to the GPU's device memory. Then, the AMG finds the coarse grid aggregates $G_i$ on the GPU using a mutex lock. Equation (1) shows that the entries of the coarse matrix $A_{l+1}$ do not depend on one another. Therefore, the computation of $A_{l+1}$ can be evaluated in parallel on the GPU, and the GPU can be used for the compute-intensive smoothing operations at the same time [7].
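A sketch of this parallel evaluation as a CUDA kernel: one thread per fine row, with atomicAdd standing in for the mutex-style synchronization mentioned above. The kernel name, parameters and dense coarse storage are assumptions for illustration, not the EOS code.

```cuda
// One-thread-per-fine-row kernel for assembling A_{l+1} on the GPU: by
// Equation (1) every coarse entry is an independent sum, so threads only
// need atomic accumulation. Dense coarse storage keeps the sketch short;
// double-precision atomicAdd requires compute capability 6.0 or newer.
__global__ void assemble_coarse(int n_fine, const int* ptr, const int* col,
                                const double* val, const int* agg,
                                int n_coarse, double* Ac /* zeroed, n_coarse^2 */) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;  // fine row owned by this thread
    if (k >= n_fine || agg[k] < 0) return;          // -1: node not aggregated
    for (int p = ptr[k]; p < ptr[k + 1]; ++p) {
        int l = col[p];
        if (agg[l] >= 0)
            atomicAdd(&Ac[agg[k] * n_coarse + agg[l]], val[p]);
    }
}
```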