Let us consider the implications of the boundary conditions (2a) for the flow. Solenoidality of the flow (1d) and the no-slip conditions for the horizontal components of the flow imply that, in addition to (2a), a further condition holds, which is a boundary condition for the pressure in an implicit form. The flow can be decomposed into the sum of the toroidal, poloidal, and mean-field components, the mean field being understood here in the sense of averaging over the horizontal variables (see, e.g., [29]). In view of the boundary conditions (2a) for the flow, the toroidal and poloidal potentials and the mean-field component satisfy boundary conditions of their own ((6a) for the toroidal and (6b) for the poloidal potential). Adding an arbitrary function of the vertical coordinate (and time) to the toroidal or poloidal potential does not alter the respective components of a vector field; thus, we assume a normalisation of the potentials that fixes this gauge freedom.
Following the Galerkin method, we expand the potentials in series of functions satisfying the respective boundary conditions and consider ordinary differential equations in time for the coefficients of the expansions; these equations express the orthogonality of the resultant discrepancies in the partial differential equations to certain test functions. The full set of functions used for expanding the approximate solutions must constitute a basis of the suitable functional space, and so must the full set of the test functions. The number of test functions employed for the orthogonal projection when constructing an approximate solution must be equal to the dimension of the functional subspace in which the approximate solution is sought. The traditional Galerkin method, in which the same functional subspaces are employed for approximating the solution and for the orthogonal projection, is usually advantageous (e.g., one can derive energy estimates useful for controlling the convergence of the approximate solutions to the exact ones as the dimension of the subspace is increased). (In the Petrov–Galerkin method, distinct functional subspaces are used for expanding solutions and for the projection [10].) By contrast, it was proposed in [7] to use the collocation method (which can be regarded as “orthogonalisation” to δ-functions, see [28]) to expand solutions in Chebyshev polynomials and to employ two higher-degree Chebyshev polynomials to satisfy the boundary conditions. This procedure is reminiscent of the tau method due to Lanczos [30] (see also [31]). However, the coefficients of the correcting Chebyshev polynomials then depend on the coefficients of the low-degree Chebyshev polynomials in such a way that, in principle, the former can remain large instead of tending to zero as the number of polynomials used for the approximation is increased.
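To make the projection step concrete, the following minimal sketch assembles a Galerkin system for a one-dimensional expansion in a weighted inner product; the basis, the quadrature size, and the target function are illustrative stand-ins rather than the operators of the present problem.

```python
# A minimal sketch of the traditional Galerkin step: the residual is projected
# onto the same functions used to expand the solution.  The basis `phi` and the
# target function are illustrative stand-ins, not the paper's actual operators.
import numpy as np

def gauss_chebyshev_nodes(q):
    """Nodes and weights of Gauss-Chebyshev quadrature on [-1, 1]
    (exact for polynomials of degree <= 2q - 1 with weight 1/sqrt(1-x^2))."""
    k = np.arange(1, q + 1)
    return np.cos((2 * k - 1) * np.pi / (2 * q)), np.full(q, np.pi / q)

def galerkin_coefficients(f, phi, q=256):
    """Solve G c = b, where G[i,j] = <phi_i, phi_j>_w and b[i] = <phi_i, f>_w."""
    x, w = gauss_chebyshev_nodes(q)
    P = np.array([p(x) for p in phi])   # basis evaluated at the quadrature nodes
    G = (P * w) @ P.T                   # Gram matrix in the weighted inner product
    b = (P * w) @ f(x)                  # projections of the target function
    return np.linalg.solve(G, b)

# Example: expand sin(pi*x) in the first few Chebyshev polynomials.
phi = [np.polynomial.chebyshev.Chebyshev.basis(n) for n in range(8)]
c = galerkin_coefficients(lambda x: np.sin(np.pi * x), phi)
```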
We follow the traditional Galerkin method and use the polynomials (10) both for expanding and for projecting. Expansion in the vertical coordinate is not straightforward. In view of (6a), it may seem natural to expand the toroidal potential in a sine Fourier series. However, this basis is not particularly suitable for resolving the boundary layers emerging at small Ekman numbers. Moreover, every zero of the sine is also a zero of its second derivative; if solutions to the system of equations (Section 2) do not feature this property at the horizontal boundaries (and it is unlikely that they do), then sine series are guaranteed to converge poorly. We therefore resort to the next simplest basis in the Lebesgue space, consisting of the Chebyshev polynomials T_n(x) = cos(n arccos x) for -1 <= x <= 1 and n = 0, 1, ... [32]. They are orthogonal on this interval with the weight (1 - x^2)^(-1/2), and we always carry out orthogonalisation with this weight function. The relations T_n(1) = 1 and T_n(-1) = (-1)^n (see, e.g., [32]) are useful in treating the boundary conditions. They imply that a linear combination of Chebyshev polynomials vanishes at the end points as long as the sum of the coefficients of the even-degree polynomials vanishes, as well as the sum of the coefficients of the odd-degree polynomials. We expand functions satisfying the conditions (6a) in series of the polynomials (10), as recommended in [33], because of the relative simplicity of the matrices resulting from discretisation of the second- and fourth-order differential operators.
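A quick numerical check of the endpoint property just quoted is given below; the combination T_n - T_{n+2} used in the second loop is one simple example of a polynomial vanishing at both endpoints and is shown only for illustration (the actual polynomials (10) are defined in the text).

```python
# Since T_n(1) = 1 and T_n(-1) = (-1)^n, a combination sum_n a_n T_n vanishes
# at both endpoints iff the even-degree and the odd-degree coefficients
# separately sum to zero.
import numpy as np
from numpy.polynomial.chebyshev import chebval

rng = np.random.default_rng(0)
a = rng.standard_normal(12)
a[0] -= a[0::2].sum()           # enforce zero sum of even-degree coefficients
a[1] -= a[1::2].sum()           # enforce zero sum of odd-degree coefficients
print(chebval([-1.0, 1.0], a))  # ~[0, 0] up to round-off

for n in range(4):              # T_n - T_{n+2} vanishes at x = +-1
    coeffs = np.zeros(n + 3); coeffs[n] = 1.0; coeffs[n + 2] = -1.0
    print(n, chebval([-1.0, 1.0], coeffs))
```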
3.1. Algorithms for Determining the Coefficients of a Linear Combination of the Polynomials (10)
Consequently, we approximate the toroidal potential by finite series in the polynomials (10). The equation for the mean-field component of the flow is derived by averaging the horizontal component of (1a) over the horizontal variables. This component is also expanded in the polynomials (10), and equations for the coefficients of the expansion are likewise obtained by orthogonal projection on these polynomials. In view of the boundary conditions (2d), we follow the same approach for the discretisation of temperature.

The vector of the time derivatives of the expansion coefficients is a solution to a linear system of equations whose r.h.s. is the vector (of length M) of the dot products of the r.h.s. of (8) with the polynomials (10), and whose matrix G is the Gram matrix of these polynomials, both G and the r.h.s. being divided by a common normalising factor. The matrix has non-zero entries on the diagonal and on two subdiagonals. The linear system of equations can be solved by seven algorithms.
Algorithms 1 and 1′ are the standard shuttle (also known as Thomas) algorithms for a tridiagonal matrix [34], applied in the direct and in the reverse direction.
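For reference, a sketch of the shuttle (Thomas) elimination for a generic tridiagonal system follows; the reverse variant 1′ amounts to running the same elimination on the renumbered system. The coefficient arrays are generic placeholders, not the actual entries of the Gram matrix.

```python
# Shuttle (Thomas) algorithm for a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = f[i].
import numpy as np

def thomas(a, b, c, f):
    """Direct run: forward elimination, then back substitution."""
    n = len(b)
    p, q = np.empty(n), np.empty(n)
    p[0], q[0] = -c[0] / b[0], f[0] / b[0]
    for i in range(1, n):
        denom = b[i] + a[i] * p[i - 1]
        p[i] = -c[i] / denom if i < n - 1 else 0.0
        q[i] = (f[i] - a[i] * q[i - 1]) / denom
    x = np.empty(n)
    x[-1] = q[-1]
    for i in range(n - 2, -1, -1):
        x[i] = p[i] * x[i + 1] + q[i]   # ansatz x[i] = p[i] x[i+1] + q[i]
    return x

def thomas_reversed(a, b, c, f):
    """Reverse run (algorithm 1'): the same elimination on the renumbered system."""
    return thomas(c[::-1], b[::-1], a[::-1], f[::-1])[::-1]
```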
Algorithms 2 and 2′. We observe that all the equations except for the first two and the last two involve the unknown coefficients in the combinations that are also encountered in the simplest finite-difference approximation of the second derivative. Consequently, the l.h.s. of these equations does not change when the solution is altered by any linear function of the index n, i.e., the “intermediate” equations are invariant under adding a + bn to the solution, where a and b are arbitrary constants. Following this observation, in the spirit of the shuttle algorithm, we take the first two coefficients as the “basic” variables and express all the others in terms of them (N is assumed to be even); this is algorithm 2. In algorithm 2′, the last two coefficients play the role of such “basic” variables instead. Both versions of this algorithm have the numerical complexity O(N); by contrast, a naive use of the orthogonal polynomials results in a complexity of computations that is prohibitively large. Given that the Chebyshev series coefficients tend to zero for large n, the recurrence relation (12b), in which the computations proceed from the values smaller in magnitude to the larger ones and all terms involve products of large numbers with small factors, is less affected by round-off errors. Thus, algorithm 2′ based on (12) apparently suits our purposes better than algorithm 2.
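The invariance exploited by algorithms 2 and 2′ can be checked directly: rows with the (1, -2, 1) second-difference pattern annihilate any linear sequence a + bn. The sketch below uses the plain finite-difference model with a single special equation at each end; in the system at hand, the first two and the last two equations play this role.

```python
# Rows with the second-difference stencil cannot determine the two "basic"
# variables: they annihilate any linear sequence a + b*n.  The stencil below is
# the plain finite-difference model, used purely for illustration.
import numpy as np

N = 10
n = np.arange(N)
G = np.zeros((N, N))
for i in range(1, N - 1):
    G[i, i - 1:i + 2] = (1.0, -2.0, 1.0)   # interior second-difference rows
G[0, :2] = (3.0, 1.0)                      # arbitrary boundary-type rows
G[-1, -2:] = (1.0, 3.0)

a, b = 2.5, -0.7
print(G @ (a + b * n))   # ~zero everywhere except the first and last entries
```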
The analogy between the l.h.s. of the system at hand and the simplest finite-difference approximation of the second derivative of a fictitious function is exploited more closely in the derivation of algorithms 3 and 3′. This gives an opportunity to find explicit expressions for the solution.
In terms of new variables (“the first-order derivatives” of the coefficients, approximated by the Euler scheme for the unit mesh size), the system of equations takes the form (13). We will now “twice integrate” the fictitious function. The system (13) splits into separate systems for the even-index and the odd-index variables. In the odd case, we find the difference variables sequentially from (13a), (13c) and (13d) (“the first integration”). Summing up all these relations (“the second integration”) fixes the free constant of the first integration, and summing the relations from the first up to the kth one then yields the kth unknown (15). We find the solution of the system in the even-index unknowns the same way (16). In algorithm 3′, the variables at the opposite end of the expansion play the role of the basic variables of algorithm 3, resulting in analogous relations. Rearranging the sums in (15) and (16), we obtain alternative equivalent expressions; algorithm 4 exploits them instead of (15) and (16).
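The “double integration” underlying algorithms 3, 3′ and 4 can be illustrated on the model system x[n-1] - 2 x[n] + x[n+1] = f[n] with the values of the solution prescribed at the two ends; the relations (13)–(16) perform the analogous double summation on the odd- and even-index subsystems of the actual matrix.

```python
# Writing d[n] = x[n+1] - x[n] ("first derivative"), the interior rows read
# d[n] - d[n-1] = f[n], so one cumulative sum recovers d up to a constant and
# a second one recovers x: an O(N) procedure.
import numpy as np

def solve_by_double_summation(f, x_first, x_last):
    """Solve the model Dirichlet problem by two prefix sums ("integrations")."""
    N = len(f) + 2                    # unknowns x[0..N-1]; f is the interior rhs
    s = np.concatenate(([0.0], np.cumsum(f)))  # first integration: d[n] = d0 + s[n]
    t = np.cumsum(s)                           # second integration
    # x[n] = x_first + n*d0 + t[n-1]; choose d0 so that x[N-1] matches x_last
    d0 = (x_last - x_first - t[-1]) / (N - 1)
    x = np.empty(N)
    x[0] = x_first
    x[1:] = x_first + np.arange(1, N) * d0 + t
    return x

rng = np.random.default_rng(1)
f = rng.standard_normal(6)
x = solve_by_double_summation(f, x_first=0.3, x_last=-1.1)
print(x[:-2] - 2 * x[1:-1] + x[2:] - f)   # residual of the interior rows: ~0
```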
In order to assess the performance of the seven algorithms, we have conducted preliminary numerical experiments, tracking their efficiency (execution times) and the quality of the output (numerical errors originating from rounding off in the course of computations). Regarding a set of pseudorandom coefficients, whose amplitudes and decay rates are uniformly distributed in fixed intervals (the latter in [5, 8]), as a sample test solution, we compute the respective r.h.s. and analyse the discrepancies of the approximate solutions obtained by each algorithm. This procedure has been performed for three values of M: low (mimicking a poor resolution of the convective dynamo problem discretisation), intermediate, and high, for a large number of pseudorandomly generated coefficient sets. The results are summarised in Table 1 and Table 2, and Figure 2 and Figure 3.
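In outline, the accuracy test described above can be sketched as follows; the amplitude interval, the algebraic form of the decay, and the two smaller values of M are assumptions for illustration (only the decay-rate interval [5, 8] is quoted in the text).

```python
# A pseudorandom "exact" coefficient vector with decaying magnitudes, the
# right-hand side f = G @ x_exact computed from it, and the error of each
# algorithm's output measured in the maximum and "energy" norms.
import numpy as np

def make_test_solution(M, rng):
    """Pseudorandom sample solution with algebraically decaying coefficients."""
    amplitude = rng.uniform(-1.0, 1.0, M)   # assumed interval for the amplitudes
    decay = rng.uniform(5.0, 8.0)           # decay exponent, as in the text
    return amplitude / (np.arange(1, M + 1) ** decay)

def error_norms(x_exact, x_approx):
    """Maximum norm and the "energy" norm (taken here as the Euclidean norm;
    the paper's exact definition of the latter may differ)."""
    err = x_approx - x_exact
    return np.max(np.abs(err)), np.sqrt(np.sum(err ** 2))

rng = np.random.default_rng(2)
for M in (256, 512, 2046):    # illustrative sizes; only 2046 is quoted in the text
    x_exact = make_test_solution(M, rng)
    # f = G @ x_exact, then x_approx = algorithm(G, f) for each of the seven
    # algorithms, and error_norms(x_exact, x_approx) is tabulated.
```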
For each resolution parameter (the number M of the sought coefficients in the expansion at hand) and each algorithm, Table 1 presents the ranges (i.e., the maximum and minimum values) of the error norms of the approximate solutions to the test problems. Two norms are considered: the maximum norm and the so-called “energy” norm.
Table 2 illustrates the execution times for each algorithm. Since each individual computation of a sample solution is very fast, the individual durations of computations by the same algorithm for the same M vary significantly from run to run. To reduce the influence of this noise, we have employed the following procedure: for a sample problem, we measure the CPU time required for 1000 (identical) applications of the algorithm under consideration; we make 1000 such measurements (for test problems with the same resolution parameter M solved by the same algorithm); the average of the obtained numbers is reported in Table 2. These mean CPU execution times are reproduced to at least 3 significant digits for all the 21 test problems used for measuring the times; thus, the influence of noise on these values is reduced as desired. In principle, the execution times of algorithm 1 should not differ from those of algorithm 1′, and similarly for the algorithm pairs 2 and 2′, and 3 and 3′. The observed variation of these values is related to the intimate details of the compiler optimisation techniques applied (in this experiment, we have used the maximum optimisation). For algorithms 3 and 3′, the variation also reflects a difference in the programming of the computation of the sums involved, aimed at enhancing the accuracy.
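The timing protocol admits a compact sketch: the CPU time of 1000 identical applications of a solver is measured, the measurement is repeated 1000 times, and the mean time of a single application is reported. The names solver, G and f below are placeholders.

```python
# A sketch of the timing protocol described above.
import time

def mean_cpu_time(solver, G, f, inner=1000, outer=1000):
    samples = []
    for _ in range(outer):
        t0 = time.process_time()          # CPU time, not wall-clock time
        for _ in range(inner):
            solver(G, f)
        samples.append(time.process_time() - t0)
    return sum(samples) / (outer * inner)  # mean time of a single application
```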
Figure 2 illustrates the error distributions among the approximate solutions to the sample test problems. Plots for two test problems for each of the three considered values of M are shown. The problem splits into two independent subproblems for the odd- and even-index variables; the errors of the odd-index coefficients are shown by continuous lines and those of the even-index coefficients by dashed lines. Graph colours indicate by which algorithms the curves have been obtained. We observe that algorithms 2 and 3 yield blatantly high errors; algorithms 2 and 2′ are also imperfect in that they produce higher errors for the higher-index unknowns. The errors exhibit a wild spiky behaviour for the low-index coefficients but, with one exception among the algorithms, the behaviour becomes more ordered for moderate and high indices.
To assess, in statistical terms, how poor or good the considered algorithms are, we present in Figure 3 the distributions of error norms measured in the units of the smallest error norm for the given test problem. More precisely, the plots have been constructed by the following procedure. The set of test problems with solutions pseudorandomly generated as discussed above has been solved by each of the seven algorithms. For the pth problem and the kth algorithm, the error norms have been determined, and the smallest of the seven norms for each problem serves as the reference value. For each algorithm k and each bin m, we count the number of cases (among the considered test problems) in which the error norm obtained by algorithm k, divided by the smallest norm, falls into the mth bin, i.e., the ratio satisfies the inequality defining the bin. In particular, the count in the first bin is the number of test problems for which algorithm k delivers the smallest norm over all the seven algorithms. These computations have been performed for both norms, the maximum norm and the energy norm.
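A sketch of the binning procedure behind Figure 3 is given below. The bin edges are an assumption (the text defines the mth bin by an inequality on the ratio that is not reproduced here); powers of two are one plausible choice.

```python
# For every test problem, the error norm of each algorithm is divided by the
# smallest of the norms obtained for that problem, and the ratios are binned.
import numpy as np

def bin_counts(norms, n_bins=40):
    """norms: array of shape (n_problems, n_algorithms) of error norms.
    Returns R[k, m]: how often algorithm k falls into bin m."""
    ratios = norms / norms.min(axis=1, keepdims=True)   # >= 1 by construction
    edges = 2.0 ** np.arange(n_bins + 1)                # assumed bin edges
    R = np.stack([np.histogram(ratios[:, k], bins=edges)[0]
                  for k in range(norms.shape[1])])
    # the first bin collects the cases where algorithm k is at (or close to)
    # the smallest norm; in the text, the first bin corresponds to attaining
    # the smallest norm exactly
    return R
```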
For each algorithm and both error norms, Figure 3 shows these counts versus the bin number m. The data points for six of the algorithms are joined into plots up to the smallest m at which the count vanishes (the use of the logarithmic scale forces us to break the plots at this point). The outlier values for larger m are shown as disjoint points; all these values are small. The data distribution for algorithm 2 has a different nature, more resembling a cloud; we therefore render these data as two plots (for the odd- and even-index subproblems) only for the small-dimensional problem. Some outliers at large bin numbers are ignored in Figure 3: there are 211 and 1529 such outliers for the odd- and even-index test problems, respectively, for algorithm 3 at the lowest resolution, 18 for the even-index problems at the intermediate resolution, and 2265 and 8831 for the odd- and even-index test problems at M = 2046. These statistics demonstrate that the larger the number M of unknown coefficients, the more the output of algorithm 2 is affected by the numerical noise.
The numerical results lead us to the following conclusions:
- i.
Except for algorithm 2, in the applications to solving the test problems, for any M and for both error norms, each algorithm delivers approximate solutions of any possible accuracy from the best to the worst, i.e., for each algorithm, solutions to some test problems obtained by this algorithm have the smallest, the largest, or any intermediate error norm among the seven error norms obtained for the problem.
- ii.
Algorithm 2 yields solutions of blatantly poor accuracy. For the smallest M, its output still includes solutions occupying any possible place between the best-accuracy and the worst-accuracy solutions; however, it yields just three most accurate solutions in terms of the maximum error norm and just one if the energy error norm is used, and these numbers gradually increase, via 46 and 15 penultimately worst-accuracy solutions, to 999,934 and 999,975 least accurate solutions. By contrast, for the two larger values of M (the largest being 2046), all its outputs are the least accurate solutions for both error norms, except for one case of a penultimately worst-accuracy solution for the maximum norm.
- iii.
The error norms (see Table 1) are compatible with the standard “double” (in the Fortran speak; 64-bit words) computer precision (the machine epsilon of about 2.2 × 10^-16). The poor performance of algorithm 2 and of the second-worst (accuracy-wise) algorithm 3 stems from the expressions used in their formulation involving products of the basic variables with numbers proportional to the indices of the unknown coefficients, which increase up to the large value M.
- iv.
We can regard the accuracy results from a different angle. In the worst-case scenario, the norm of a discrepancy in the r.h.s. of the equation is multiplied by the condition number of the matrix G, which is defined as the product of the norms of G and its inverse. The former norm is obviously of order 1; the latter can be estimated using (14) and (15) for a r.h.s. with a single non-zero entry, and it grows with M. Hence, the condition number of G is (at least) of the corresponding order in M, which is compatible with the accuracy results for algorithm 2 (see the maximum errors in Table 1). We observe that numerical errors are amplified by an algorithm-specific effective condition number [35], which can be much smaller than the worst-case theoretical condition number, provided that a specialised algorithm takes into account particular properties of the problem.
- v.
For all the three M values used, the smallest-error approximate solutions are most frequently provided by one and the same algorithm for either error norm: 315,174 and 236,707 times out of the total number of test problems for the lowest resolution, 325,841 and 236,544 for the intermediate one, and 333,062 and 239,009 for the highest one (the first number in each pair refers to the maximum error norm and the second to the energy norm). Accuracy-wise, the shuttle algorithms 1 and 1′ are mutually close to this algorithm and not significantly inferior to it.
- vi.
The per-bin counts of the error norms have their maxima at one and the same low bin number (see Figure 3), i.e., for each algorithm and each M, the error norms fall into this bin with the highest probability.
- vii.
Solutions obtained by all the algorithms except for 2 and 3 have significantly larger errors for small n than for larger n (see Figure 2). By contrast, algorithms 2 and 3 yield maximum errors for intermediate and high n. In addition, the errors in the solutions computed by algorithm 3 oscillate wildly, which is not typical of the behaviour of the errors generated by the other six algorithms. Consequently, algorithms 2 and 3 yield approximate solutions whose coefficients for intermediate and high n, being small in magnitude, are polluted by exceptionally high relative errors.
- viii.
Algorithms 2 and 2′ are significantly faster than the other five algorithms (see Table 2). Their execution CPU times are mutually close, and they differ much more substantially from the execution times registered for the other algorithms.
Properties iii and vii render algorithms 2 and 3 inapplicable. None of the seven algorithms is “perfect”, but the compromise between the highest efficiency and the best accuracy reveals the optimal one among the remaining algorithms.
3.2. Algorithms for Determining the Coefficients of a Linear Combination of the Polynomials Satisfying (6b)
The equation for the poloidal potential is obtained by applying the curl twice to (1a) and taking the vertical component of the result. It is equivalent to the parabolic Equation (17). It is now tempting to apply the inverse Laplacian, but this is unfeasible, since the conditions (6b) do not imply suitable boundary conditions for the Laplacian of the potential. By (9), the polynomials used for the expansion satisfy (6b), and the poloidal potential can be approximated by finite series in these polynomials.
Substituting the series into (17), we obtain an equation whose l.h.s. involves polynomials of higher degrees. As usual, we apply the Fourier transform in the horizontal directions to deduce the equation for each pair of horizontal wavenumbers, and orthogonally project it on the basis polynomials with the Chebyshev weight. Computing the time derivatives of the coefficients requires inverting the matrices acting on the columnar vectors of these derivatives for fixed wavenumbers. As in the case of the toroidal potential of the flow, these are band matrices. To see this, we integrate twice by parts the integral encountered in the projection of the l.h.s. of (17) (noting that the test polynomial satisfies the boundary conditions (6b)) and apply the relation (19), which can be proven by using identities for the Chebyshev polynomials. Relation (19) is also useful for computing the remaining matrix entries, which are expressed via the Kronecker symbol. Differentiation can now be performed ([9], see also [36]) using a recurrence relation for the coefficients of the Chebyshev expansion of the derivative.
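The recurrence itself is not reproduced above; the standard backward recurrence for the Chebyshev coefficients of the derivative, b[j-1] = b[j+1] + 2j a[j] with the zeroth coefficient halved at the end, is presumably of this kind. The sketch below implements it and verifies the result against the reference implementation in numpy.

```python
# Backward recurrence for the coefficients b of f', where f = sum_j a[j] T_j.
import numpy as np

def cheb_derivative_coeffs(a):
    N = len(a) - 1
    b = np.zeros(N + 2)                  # b[N] = b[N+1] = 0 start the recurrence
    for j in range(N, 0, -1):
        b[j - 1] = b[j + 1] + 2 * j * a[j]
    b[0] *= 0.5                          # the T_0 term carries the factor 1/2
    return b[:N + 1]

a = np.array([0.3, -1.2, 0.8, 0.05, -0.4])
print(np.allclose(cheb_derivative_coeffs(a)[:-1],
                  np.polynomial.chebyshev.chebder(a)))   # True
```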
Thus, the vector of the time derivatives of the expansion coefficients is a solution to a linear system of equations whose r.h.s. is the vector (of length M) of the dot products of the r.h.s. of (17) with the basis polynomials; both the matrix G and the r.h.s. are divided by a common normalising factor. The matrix has non-zero entries on the diagonal and on four subdiagonals.
Algorithm 1 is the standard shuttle algorithm for a pentadiagonal matrix [37]. We note that the odd-index subproblem is separated from the even-index one, and we use an ansatz expressing each unknown through the two unknowns of the next indices. The first four equations fix the starting coefficients of the ansatz, and the remaining equations yield the recurrence relations (21), in which the out-of-range coefficients are assumed to vanish. This yields the values of the two highest-index unknowns, and we then use the recurrence relation (21) to compute sequentially all the unknowns for decreasing n.
Algorithm 1′ is the same shuttle procedure applied to the system of equations with the numbering of the equations and variables reversed.
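A sketch of the pentadiagonal shuttle in its generic form follows; after renumbering, each of the odd- and even-index subproblems mentioned above is of exactly this banded form. The ansatz expressing x[i] through x[i+1] and x[i+2] mirrors the direct and reverse runs described in the text, though the coefficient arrays here are generic placeholders.

```python
# Shuttle elimination for a pentadiagonal system
# e[i] x[i-2] + a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] + d[i] x[i+2] = f[i],
# via the ansatz x[i] = p[i] x[i+1] + q[i] x[i+2] + r[i].
import numpy as np

def penta_shuttle(e, a, b, c, d, f):
    """Direct run + back substitution; the band must vanish outside the matrix."""
    n = len(b)
    p, q, r = np.zeros(n + 2), np.zeros(n + 2), np.zeros(n + 2)  # zero padding
    for i in range(n):
        pm1, qm1, rm1 = p[i - 1], q[i - 1], r[i - 1]   # rows i-1 and i-2;
        pm2, rm2 = p[i - 2], r[i - 2]                  # padding covers i < 2
        B = b[i] + a[i] * pm1 + e[i] * (pm2 * pm1 + q[i - 2])
        C = c[i] + (a[i] + e[i] * pm2) * qm1
        F = f[i] - (a[i] + e[i] * pm2) * rm1 - e[i] * rm2
        p[i], q[i], r[i] = -C / B, -d[i] / B, F / B    # direct run
    x = np.zeros(n + 2)                                # padded: x[n] = x[n+1] = 0
    for i in range(n - 1, -1, -1):                     # back substitution
        x[i] = p[i] * x[i + 1] + q[i] * x[i + 2] + r[i]
    return x[:n]

# Quick self-check on a diagonally dominant random band matrix.
rng = np.random.default_rng(3)
n = 8
e, a, c, d = (rng.standard_normal(n) * 0.1 for _ in range(4))
b = np.full(n, 4.0)
e[:2] = 0.0; d[-2:] = 0.0; a[0] = 0.0; c[-1] = 0.0     # band structure at the ends
G = (np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
     + np.diag(e[2:], -2) + np.diag(d[:-2], 2))
x_exact = rng.standard_normal(n)
print(np.allclose(penta_shuttle(e, a, b, c, d, G @ x_exact), x_exact))  # True
```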
Algorithms 2 and 2′ are modifications of the shuttle algorithm based on the observation that the sum of the coefficients in every equation, except for a few near the ends of the system, is zero. Hence, in new variables, certain differences of the unknown coefficients, the system reduces to a problem with a matrix involving three subdiagonals. The problem splits into two independent subproblems for the odd- and even-index variables. We formulate algorithm 2 for solving the subproblem for the even-index variables; it is straightforward to reformulate it for the subproblem for the odd-index ones. We introduce the difference variables, which satisfy the original equations together with the gauge equation (22) closing the system. The first equation yields the starting relation, and the equations for the subsequent indices take the form (23), which defines the first, “direct” run of the shuttle. For the last index, the relation (23) reduces to the simple form (24). We can now use (23) to obtain the coefficients in (24) recursively (the second, “reverse” run of the shuttle). Simultaneously, we compute the coefficients of the partial sums of the difference variables. By (22), we then find the last of the unknowns, and the third run of the shuttle yields all the remaining ones. This algorithm is apparently more involved than the standard shuttle algorithm. However, since it works with the differences of the unknown coefficients, it may be advantageous in yielding more accurate values of the coefficients.
Algorithm 2′ amounts to the same computational procedure applied to the system of equations in which the numbering of the equations and variables is reversed. Thus, we obtain relations similar to (25), in which all the unknowns are expressed in terms of the coefficients at the opposite end of the expansion. Normally, the coefficients of the expansion tend to zero as their index grows, whereby the leading coefficients are a priori much larger in magnitude than the trailing ones; for a given precision of computations (in our case, the standard real*8 “double” precision of the floating-point arithmetic), this implies a tighter constraint on the number of correct digits after the decimal point. Consequently, we may expect algorithm 2′ to yield less accurate values of the coefficients than algorithm 2.
We have investigated the performance of the four algorithms following the same approach as in Section 3.1: we have synthesised pseudorandom sample test solutions by the same procedure as for the toroidal potential, outlined in Section 3.1, and analysed the errors of the approximate solutions obtained by each of the four algorithms for three numbers of terms in the linear combinations, the larger two being 508 and 2044.
For each resolution parameter M and each algorithm, Table 3 shows the ranges (i.e., the maximum and minimum values) of the maximum and energy norms of the approximate-solution errors.
Table 4 illustrates the execution times for each algorithm, measured by the same procedure (see Section 3.1) as before. The obtained mean execution times are again accurate to at least three significant digits. We expect the algorithm 1 execution times to coincide with those of algorithm 1′, and similarly for algorithms 2 and 2′. However, the times presented in Table 4 for algorithms 1 and 2 are slightly smaller than those for algorithms 1′ and 2′, respectively. This difference is insignificant; it merely reflects the way we have programmed the latter algorithms: in our implementation, the orders of the equations and of the unknown variables are reversed at run time, which could be avoided by more careful programming using index reversal.
Figure 4 illustrates the error distributions among the unknowns detected for solutions to six sample test problems (plots for two problems for each of the three considered numbers of terms are shown). Figure 5 presents the distributions of error norms measured by the same procedure as in Section 3.1. For the pth problem and the kth algorithm, we find the error norms and the smallest of the four norms for this problem. We again count the number of cases, among the test problems, in which the error norm produced by algorithm k, measured in the units of the smallest norm, falls into the mth bin. For each algorithm and both error norms (the maximum norm and the energy norm), Figure 5 shows this histogram versus the bin number m. The data points are joined up to the smallest m at which the count vanishes, and the outlier values (always small) for larger m are shown as disjoint points; unlike in Figure 3, all the outliers are shown.
The results reveal the following properties of the algorithms:
- i.
None of the four algorithms is “perfect”: for all the considered M and for both error norms, when solving the test problems, each algorithm delivers solutions of any possible relative accuracy from the best to the worst, in the sense that solutions to some test problems obtained by any algorithm have the smallest, the largest, or any intermediate error norm among the four error norms produced by the four algorithms for the problem.
- ii.
The per-bin counts of the error norms have their maxima at m = 2 (see Figure 5), i.e., for each algorithm the error norms fall into the second bin with the highest probability.
- iii.
For any M and for both error norms, one and the same algorithm delivers the best-accuracy solutions (i.e., attains the maximum first-bin count over k) more often than the other three. For the two lower resolutions, the least accurate solutions are obtained most frequently by one particular algorithm, and for the highest resolution by algorithm 2.
- iv.
The errors for intermediate and high n in the approximate solutions computed by one of the algorithms are significantly larger than in the solutions obtained by any of the other three.
- v.
The execution CPU times of algorithms 1 and 1′ are slightly larger than those of algorithms 2 and 2′.
Property iv implies that the algorithm in question yields approximate solutions whose coefficients for intermediate and high n are polluted by high relative errors, rendering it inapplicable. The smallest execution time and the highest frequency of yielding the smallest errors distinguish one algorithm as the optimal choice, with algorithm 2 and one more algorithm following closely.