A Preconditioned Variant of the Refined Arnoldi Method for Computing PageRank Eigenvectors

Abstract: The PageRank model computes the stationary distribution of a Markov random walk on the linking structure of a network and uses its values to represent the importance or centrality of each node. The model was first proposed by Google for ranking web pages and has since been widely applied as a centrality measure for networks arising in various fields such as chemistry, bioinformatics, neuroscience and social network analysis. For example, it can measure the node centralities of a gene-gene annotation network to evaluate the relevance of each gene to a certain disease. The networks in some fields, including bioinformatics, are undirected, so the corresponding adjacency matrices are symmetric. Mathematically, the PageRank model can be stated as finding the unit positive eigenvector corresponding to the largest eigenvalue of a transition matrix built upon the linking structure. With the rapid development of science and technology, the networks in real applications grow ever larger, so the PageRank model constantly demands numerical algorithms with reduced algorithmic or memory complexity. In this paper, we propose a novel preconditioning approach for solving the PageRank model. The approach transforms the original PageRank eigen-problem into a new one that is more amenable to solution. We then present a preconditioned version of the refined Arnoldi method for solving this model. We demonstrate theoretically that the preconditioned Arnoldi method has higher execution efficiency and parallelism than the refined Arnoldi method. In extensive numerical experiments, the preconditioned method exhibits noticeably faster convergence than its standard counterpart, especially on difficult cases with large damping factors, and this superiority is maintained when the technique is applied to other variants of the refined Arnoldi method.
Overall, the proposed technique speeds up the solution of the PageRank model, which can improve the efficiency of the research, engineering projects and services in which the model is applied.


Introduction
With the rapid development of the Internet, web search engines have become very popular tools for information retrieval. Because a web search engine can usually find an immense set of Web pages matching a search query, the most important pages must be ranked highest to make the tool practical. For this purpose, the PageRank model was developed by the Google team to rank the importance of Web pages according to the frequency of visits by a random user who keeps browsing the World Wide Web, choosing among the hyperlinks on each page with equal probability. Mathematically speaking, it requires the computation of the stationary distribution of this Markov random walk on the linking structure of pages; the values within the distribution represent the frequency of visits to each Web page. The linking structure of the Web is represented by a large directed graph (called the Web link graph) and by its adjacency matrix G ∈ N^{n×n} (here n is the number of Web pages) such that G(i, j) = 1 only when page j has a hyperlink pointing to page i, and G(i, j) = 0 otherwise. The transition probability matrix P ∈ R^{n×n} of this random walk process is defined as

P(i, j) = 1 / (∑_{k=1}^{n} G(k, j)), if G(i, j) = 1; P(i, j) = 0, otherwise. (1)

However, to avoid the process stagnating when a dangling page without hyperlinks is visited, in the model P is usually modified as

P̄ = P + v d^T,

where v ∈ R^{n×1} is a probability distribution vector, d ∈ N^{n×1} is a binary vector, and d(i) = 1 (1 ≤ i ≤ n) if page i has no hyperlinks. According to the Perron-Frobenius theorem, the unique existence of the stationary distribution vector is guaranteed when the transition matrix is an irreducible stochastic matrix. To enforce this assumption, P̄ is further modified into the matrix

A = α P̄ + (1 − α) v e^T, (2)

where α ∈ (0, 1) is called the damping factor and e = [1, 1, · · · , 1]^T. The matrix A is often called the Google matrix [1].
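As a concrete illustration of Equations (1) and (2), the sketch below builds the dense Google matrix for a tiny 4-page graph. The function name `google_matrix` and the example graph are our own illustration; real problems keep G sparse and A implicit.

```python
import numpy as np

def google_matrix(G, alpha=0.85):
    """Dense construction of A = alpha*(P + v d^T) + (1 - alpha)*v e^T from a
    0/1 adjacency matrix with G[i, j] = 1 iff page j links to page i.
    Illustration only: large problems never assemble A explicitly."""
    n = G.shape[0]
    out_deg = G.sum(axis=0)                      # column sums = out-degrees
    d = (out_deg == 0).astype(float)             # dangling-page indicator
    v = np.full(n, 1.0 / n)                      # uniform personalization vector
    # Column-stochastic P (Equation (1)); dangling columns stay zero here.
    P = np.divide(G, out_deg, where=out_deg > 0, out=np.zeros((n, n)))
    P_bar = P + np.outer(v, d)                   # dangling-page fix
    return alpha * P_bar + (1 - alpha) * np.outer(v, np.ones(n))

# Tiny 4-page example; page 4 (index 3) has no outgoing links (dangling).
G = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
A = google_matrix(G)
print(np.allclose(A.sum(axis=0), 1.0))           # column-stochastic -> True
print((A > 0).all())                             # strictly positive -> True
```

Column-stochasticity and strict positivity are exactly the properties the Perron-Frobenius argument above relies on.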
Finally, the PageRank model can be stated mathematically as finding the unit positive eigenvector corresponding to the eigenvalue 1 of A, that is, the solution of the eigen-problem

Ax = x, x > 0, ‖x‖_1 = 1. (3)

Because of the assumption that A is an irreducible stochastic matrix, as guaranteed by Equation (2), Equation (3) is uniquely solvable. The unique stationary distribution vector x is called the PageRank vector. Note that the PageRank model is now often used as a network centrality measure to identify the most important nodes within large networks arising in several applications, such as chemistry, bioinformatics, neuroscience, bibliometrics, web search engines and social networks [2]. In some fields, such as bioinformatics, the related networks are usually undirected, so the corresponding adjacency matrices are symmetric. This symmetry can be exploited to build efficient storage formats and efficient implementations of matrix-vector multiplication, reducing the difficulty of solving the PageRank model (also called the PageRank problem).
With the rapid development of science and technology, the dimension of PageRank problems arising in various application fields has grown enormously over the last decades and keeps growing. Accordingly, iterative methods have become the only viable option for solving Equation (3) numerically. Stationary iterative methods such as the Power, Jacobi and Gauss-Seidel methods are effective when the damping parameter α is not too close to 1; e.g., for search engine applications Google initially used α = 0.85. However, larger values of α such as 0.99 may sometimes give a better ranking result [3], and stationary methods tend to converge significantly more slowly when α is large. These computational issues mean that the PageRank problem continually requires new algorithmic solutions that are more time- and memory-efficient.
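The α-dependence of the Power method's convergence can be observed on a small synthetic example. The sketch below is our own illustration (a toy two-community transition matrix that mixes slowly), not a web matrix; it counts the iterations needed for α = 0.85 and α = 0.99.

```python
import numpy as np

def pagerank_power(A, tol=1e-8, max_iter=200000):
    """Power method for Ax = x on a column-stochastic A: the iteration error
    contracts roughly by |lambda_2| per step, and |lambda_2| grows with alpha,
    so convergence slows down as alpha approaches 1. Illustrative sketch."""
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for it in range(1, max_iter + 1):
        x_new = A @ x                     # A column-stochastic: 1-norm preserved
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new, it
        x = x_new
    return x, max_iter

# Two weakly coupled communities make P mix slowly (|lambda_2(P)| near 1),
# which exposes how the convergence speed degrades as alpha grows.
rng = np.random.default_rng(0)
n = 20
M = np.zeros((n, n))
M[:10, :10] = rng.random((10, 10))
M[10:, 10:] = rng.random((10, 10))
M += 0.01 * rng.random((n, n))            # weak coupling between the blocks
P = M / M.sum(axis=0)
v = np.full(n, 1.0 / n)

iters = {}
for alpha in (0.85, 0.99):
    A = alpha * P + (1 - alpha) * np.outer(v, np.ones(n))
    x, iters[alpha] = pagerank_power(A)
print(iters)                              # alpha = 0.99 needs many more iterations
```

The iteration counts grow roughly like log(tol)/log(α·|λ₂(P)|), which is the slowdown described above.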
The development of more efficient algorithms for solving PageRank problems has been ongoing for the past decade or so. One research direction accelerates the convergence of stationary iterative methods; related works include adaptive [4] and extrapolation methods [5][6][7], multigrid solvers [8,9], and inner-outer iterations [10][11][12][13][14]. Meanwhile, a significant amount of work has been devoted to the analysis of Krylov subspace methods for computing PageRank. Golub and Greif proposed in [3] a new variant of the Arnoldi algorithm that is particularly suited for problems with a large range of damping values, and can outperform many conventional stationary iterative solvers when α is close to 1. Yin et al. used weighted inner products instead of the standard inner products in the Arnoldi process, reporting faster convergence due to a more favourable distribution of the harmonic Ritz values [15]. The Power-Arnoldi method [16], its weighted version [17] and the Arnoldi-Inout method [18] are hybrid variants of the Arnoldi process that periodically combine stationary and Krylov iterations, and they improve on other established PageRank solvers in many examples. All of these techniques can accelerate the Arnoldi method from [3], but there remains considerable room for further improvement. In particular, preconditioning, which is widely used to accelerate Krylov subspace methods for solving linear systems, has not yet been considered for accelerating Arnoldi-type methods applied to the PageRank eigen-problem. This direction is worth investigating because successful preconditioning usually yields significant acceleration of the solution process and combines well with other acceleration techniques.
In this work we propose a new theoretical formulation of the PageRank eigenvector problem (3) that is characterized by a better separation between the dominant and second dominant eigenvalues of the coefficient matrix, so that the Arnoldi process may require significantly fewer iterations to converge, and consequently fewer BLAS-1 (inner-product and SAXPY) operations in the Gram-Schmidt orthogonalization. Our strategy of transforming the original problem into a new one that is more amenable to iterative solution can be seen as a form of preconditioning. The optimal parameter setting for this preconditioning technique is also discussed. Our experiments confirm the theoretical findings and demonstrate that Arnoldi-type methods applied to the preconditioned eigen-problem can converge much faster than their standard counterparts. The approach therefore has potential to solve large-scale PageRank computations more effectively on both sequential and parallel machines. Accordingly, we can expect the proposed technique to accelerate projects in various fields that use the PageRank model. For example, for the GeneRank problem [19] and the ProteinRank problem [20], which apply the PageRank model to gene-gene and protein-protein networks, users can more quickly find the genes and proteins that are pathogenic with high probability.
The paper is organized as follows. In Section 2, we outline the refined Arnoldi method proposed in [3], which is the basis of our development, along with its main convergence results. In Section 3, we present a preconditioned variant of the refined Arnoldi method for computing PageRank eigenvectors. Numerical experiments are reported in Section 4 to support our theoretical findings. Finally, some conclusions from this study are presented in Section 5. Note that MATLAB notation is used throughout this article.

The Refined Arnoldi Method for PageRank
The Arnoldi process, proposed in 1951, is an algorithm based on the modified Gram-Schmidt orthogonalization that, after m steps, computes an orthonormal basis {v_1, v_2, · · · , v_{m+1}} of the Krylov subspace K_{m+1}(A, v_0) = span{v_0, Av_0, · · · , A^m v_0}, where A ∈ R^{n×n} and v_0 ∈ R^{n×1} are a given matrix and an initial vector, respectively. We sketch it in Algorithm 1.
In matrix form, Algorithm 1 yields after m steps the Arnoldi decompositions

A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T

and

A V_m = V_{m+1} H̄_m,

where H_m = {h_{i,j}} ∈ R^{m×m} is an upper Hessenberg matrix, and we denote H̄_m = [H_m; h_{m+1,m} e_m^T] ∈ R^{(m+1)×m}, e_m = [0, 0, . . . , 0, 1]^T and V_{m+1} = [V_m, v_{m+1}]. In Table 1 we summarize the computational cost of Algorithm 1 for the case of a general matrix A, and in Table 2 for the special case of the Google matrix (Equation (2)). Note that in the latter case, from the expression A = αP + v((1 − α)e + αd)^T, it follows that the product Au requires 1 sparse matrix-vector operation αPu (hereafter u denotes an arbitrary vector of appropriate dimension), 1 inner product f^T u where f = (1 − α)e + αd is stored, 1 vector scaling operation and 1 vector addition.
Algorithm 1 The Arnoldi process.
Input: a matrix A, an initial vector v_0 and the number of steps m.
1: Compute v_1 = v_0/‖v_0‖_2.
2: for j = 1 : m do
3: Compute w = Av_j.
4: for i = 1 : j do
5: Compute h_{i,j} = v_i^T w.
6: Compute w = w − h_{i,j} v_i.
7: end for
8: Compute h_{j+1,j} = ‖w‖_2.
9: Compute v_{j+1} = w/h_{j+1,j}.
10: end for
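For readers who prefer code, the modified Gram-Schmidt Arnoldi process can be sketched in a few lines of generic NumPy (the function name `arnoldi` is ours); the check at the end verifies the Arnoldi relation A V_m = V_{m+1} H̄_m on a random matrix.

```python
import numpy as np

def arnoldi(matvec, v0, m):
    """Modified Gram-Schmidt Arnoldi process: returns V (n x (m+1)) with
    orthonormal columns and the (m+1) x m upper Hessenberg matrix H_bar
    satisfying A V[:, :m] = V @ H_bar."""
    n = v0.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = matvec(V[:, j])
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0:              # happy breakdown: invariant subspace
            return V[:, : j + 1], H[: j + 1, : j]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

# Verify the Arnoldi decomposition A V_m = V_{m+1} H_bar on a random matrix.
rng = np.random.default_rng(1)
A = rng.random((30, 30))
V, H = arnoldi(lambda u: A @ u, rng.random(30), m=8)
print(np.allclose(A @ V[:, :8], V @ H))   # True
```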

Tables 1 and 2 list, for each type of operation (sparse matrix-vector multiplication, inner product, vector scaling, vector addition), the number of times it is performed by Algorithm 1; in both cases m sparse matrix-vector multiplications are required.

The Arnoldi process is the main ingredient of many efficient numerical techniques for solving linear systems and eigen-problems. After m steps, the eigenpairs (λ_i^{(m)}, y_i^{(m)}) of H_m yield the scalars λ_i^{(m)} and the vectors u_i^{(m)} = V_m y_i^{(m)}, called respectively the Ritz values and the Ritz vectors of A onto the Krylov subspace K_m(A, v_0). The Ritz values with largest real parts and their corresponding Ritz vectors are often used to approximate the eigenvalues with largest real parts of A and their associated eigenvectors [21]. However, in-depth convergence analysis of the Arnoldi method shows that the Ritz vectors are not guaranteed to converge to the actual eigenvectors, even when the Ritz values do converge.
To overcome this difficulty, Jia proposed in [22] to compute the refined Ritz vector u_i^{(m)}, defined as the solution of the minimization problem

min ‖(A − λ_i^{(m)} I)u‖_2 subject to u ∈ K_m(A, v_0), ‖u‖_2 = 1. (7)

The solution of Equation (7) is given by

u_i^{(m)} = V_m w_min, (8)

where w_min is the right singular vector associated with the smallest singular value of the matrix H̄_m − λ_i^{(m)} [I; 0]. Although the refined Ritz vectors converge to the desired eigenvectors as λ_i^{(m)} → λ_i, some potential numerical difficulties may still arise in practice. Firstly, the largest Ritz value may be complex, and the use of complex arithmetic may be a memory burden for large-scale computations. Secondly, when α is close to 1, slow or irregular convergence can still happen due to a weak separation of the eigenvalues of A. Golub and Greif proposed to use the largest eigenvalue of the Google matrix, which is known and equal to 1, as the shift in (8) instead of the largest Ritz value λ_1^{(m)}, as an attempt to overcome the slow solution process [3]. We present the Golub and Greif variant of the refined Arnoldi method for PageRank in Algorithm 2, and refer to it shortly as the GG-Arnoldi method hereafter.

Algorithm 2 The Golub and Greif variant of the refined Arnoldi method (GG-Arnoldi) for PageRank.
Input: the Google matrix A, an initial guess x_0, parameters m and tol.
1: Run Algorithm 1 with A and v_0 = x_0 to generate V_{m+1} and H̄_m.
2: Set H̃ = H̄_m − [I; 0], where I is the m × m identity matrix.
3: Compute the singular value decomposition UΣW^T = H̃.
4: Compute x = V_m W(:, m).
5: if the smallest singular value σ_m of H̃ satisfies σ_m < tol then
6: Output x = x/‖x‖_1 and end.
7: else
8: Set x_0 = x and go to step 1.
9: end if
10: return x.
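A compact sketch of the restarted GG-Arnoldi cycle, under our reading of the method (shift fixed at the known eigenvalue 1, refined vector taken from the SVD of H̄_m shifted by [I; 0]); the function name and the random test matrix are our own illustration, not the authors' implementation.

```python
import numpy as np

def gg_arnoldi(matvec, x0, m, tol=1e-8, max_cycles=100):
    """Refined Arnoldi with shift 1 (GG-Arnoldi sketch): per cycle, build an
    m-step Arnoldi decomposition, then set x = V_m w with w the right singular
    vector of H_bar - [I; 0] for the smallest singular value sigma_m, which
    equals the 2-norm residual of the unit-norm approximation."""
    n = x0.shape[0]
    x = x0.copy()
    for cycle in range(1, max_cycles + 1):
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = x / np.linalg.norm(x)
        for j in range(m):                       # Algorithm 1 (no breakdown handling)
            w = matvec(V[:, j])
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
        shifted = H - np.vstack([np.eye(m), np.zeros((1, m))])
        _, s, Wt = np.linalg.svd(shifted)
        x = V[:, :m] @ Wt[m - 1]                 # refined approximate eigenvector
        x *= np.sign(x.sum())                    # fix the arbitrary SVD sign
        if s[-1] < tol:
            return x / np.abs(x).sum(), cycle
    return x / np.abs(x).sum(), max_cycles       # restart from current iterate

# Small dense test problem (illustrative): random Google-like matrix, alpha = 0.99.
rng = np.random.default_rng(1)
n, alpha = 50, 0.99
P = rng.random((n, n)); P /= P.sum(axis=0)
v = np.full(n, 1.0 / n)
A = alpha * P + (1 - alpha) * np.outer(v, np.ones(n))
x, cycles = gg_arnoldi(lambda u: A @ u, np.full(n, 1.0 / n), m=10)
print(np.linalg.norm(A @ x - x) < 1e-8)          # converged -> True
```

The stopping test exploits the identity ‖(A − I)V_m w‖₂ = ‖V_{m+1}(H̄_m − [I; 0])w‖₂ = σ_m, so no extra matrix-vector product is needed for the residual.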
The overall computational cost of Algorithm 2 is summarized in Table 3. Note that the singular value decomposition computed at line 3 is omitted because its cost is negligible when m is very small. Analogously, the vector scaling x = x/‖x‖_1 at line 6 is not reported in the table, as it is computed only in the last cycle. Finally, the operation x = V_m W(:, m) is equivalent to m vector scaling operations and m − 1 vector additions.
The convergence analysis of the GG-Arnoldi method is presented in [20]. Here we recall the main result. We denote by X_⊥ an orthonormal basis of the orthogonal complement of span{x}, so that [x, X_⊥] is orthogonal. Then we have

[x, X_⊥]^T A [x, X_⊥] = [1, x^T A X_⊥; 0, A_2],

where A_2 = X_⊥^T A X_⊥. According to the Perron-Frobenius theorem, the spectral radius of A equals 1, A has only one eigenvalue on the unit circle, and the eigenspace of this eigenvalue is one-dimensional. As 1 is a simple eigenvalue of A, we introduce the separation function "sep", defined as

sep(1, A_2) = ‖(A_2 − I)^{−1}‖_2^{−1} = σ_min(A_2 − I).

Table 3 lists the per-cycle operation counts of Algorithm 2: m sparse matrix-vector multiplications, together with the inner products and vector scalings of the Arnoldi process, plus 1 vector 1-norm computation.

Under these assumptions, the theorem below suggests that the larger the modulus of the second dominant eigenvalue λ_2, the slower the convergence rate of the GG-Arnoldi method applied to the eigen-problem (3).
Theorem 1 (Theorem 2.2 in [20]). Let P_m be the orthogonal projector onto the subspace K_m(A, v_0), let ξ_m = ‖(I − P_m)x‖_2 be the distance between the PageRank vector x and the subspace K_m(A, v_0), and let x̃ be the approximation generated by the GG-Arnoldi method. Then the error of x̃ is bounded by a multiple of ξ_m whose constant depends on ‖x^T A X_⊥‖_2 and sep(1, A_2).

We recall below another property of the eigenvalues of the Google matrix A that is very relevant to our analysis.

Theorem 2 ([23]). If a column-stochastic matrix P has at least two irreducible closed subsets (which is the case for the web hyperlink matrix), then the second eigenvalue λ_2 of A = αP + (1 − α)ve^T, where 0 < α < 1 and v is a vector with non-negative elements satisfying ‖v‖_1 = 1, is given by λ_2 = α.
Therefore, slow convergence of the GG-Arnoldi method for PageRank problems can be expected in particular when α approaches 1. As this situation arises in several applications, some acceleration techniques have been developed in the past years to enhance the robustness of the GG-Arnoldi method. They can be classified into two types: (1) using weighted inner products instead of the standard inner products in Algorithm 2 to ensure a more favourable distribution of Ritz values [15,17]; (2) combining the GG-Arnoldi method with a few cycles of stationary iterative solvers, as in the Power-Arnoldi method [16], the Inout-Arnoldi method [18], the Arnoldi-PET method [17] and others. Both approaches attempt to provide better initial guesses for the Arnoldi process. To the best of our knowledge, no research has investigated efficient ways to precondition the PageRank eigen-problem (3) so as to make it easier to solve by Arnoldi-type methods. This is the main objective of our study.

Preconditioning the Refined Arnoldi Method
Preconditioning is an established technique for accelerating Krylov subspace methods for solving linear systems. A nonsingular linear system Ay = c can be transformed into an equivalent one of either the form MAy = Mc (left preconditioned system) or the form AMz = c with y = Mz (right preconditioned system), where M ≈ A^{−1} is called the preconditioner matrix, or simply the preconditioner. The goal of preconditioning is to improve the spectral distribution of the coefficient matrix A so that a Krylov subspace algorithm can solve the transformed preconditioned system much faster than the original one. The development of efficient preconditioners is a very important topic in numerical linear algebra because it can enable the rapid solution of problems that may initially appear numerically intractable. However, preconditioning is seldom used for solving eigen-problems Ay = λy, because the left preconditioned (MAy = λy) and right preconditioned (AMz = λz, y = Mz) eigen-problems are no longer equivalent to the original one.
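As a minimal illustration of left preconditioning for linear systems (not eigen-problems), consider crude Jacobi scaling M = diag(A)^{−1} applied to a badly scaled system; the preconditioned system keeps the same solution while its coefficient matrix is far better conditioned. Everything below is a toy sketch: practical preconditioners (ILU, multigrid, ...) are far stronger and are applied inside the Krylov iteration rather than by forming M A.

```python
import numpy as np

# Left preconditioning of Ay = c with the Jacobi preconditioner M = diag(A)^{-1}.
rng = np.random.default_rng(2)
n = 50
A = np.diag(rng.uniform(1.0, 1000.0, n)) + 0.01 * rng.random((n, n))  # badly scaled
c = rng.random(n)

M = np.diag(1.0 / np.diag(A))                    # M ~ A^{-1} (very crudely)
y = np.linalg.solve(M @ A, M @ c)                # preconditioned system M A y = M c

print(np.allclose(A @ y, c))                     # same solution as Ay = c -> True
print(np.linalg.cond(M @ A), np.linalg.cond(A))  # the first is far smaller here
```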
Here we propose a new approach to precondition the PageRank eigen-problem (3) for the computation of the dominant eigenvector x corresponding to the largest eigenvalue λ_1 = 1 of the Google matrix A. According to the analysis presented in Section 2, the convergence speed of the GG-Arnoldi method is mainly determined by the separation between the largest and second largest eigenvalues of A, namely by the quantity |λ_1 − λ_2| = 1 − α. Therefore, our strategy for preconditioning Equation (3) is to transform it into an equivalent eigen-problem that has a better separation between its two dominant eigenvalues. The main theoretical result underlying our method is presented in the theorem below.

Theorem 3. For the Google matrix A in Equation (2) with eigenvalues 1 > |λ_2| > |λ_3| > · · · > |λ_s|, if a polynomial P satisfies P(λ_i) ≠ P(1) (2 ≤ i ≤ s), then the PageRank eigen-problem (3) is equivalent to the eigen-problem

P(A)x = P(1)x, x > 0, ‖x‖_1 = 1. (11)

Proof. It is clear that any solution of Ax = x is also a solution of P(A)x = P(1)x. Suppose

A = S diag(1, J_2, · · · , J_s) S^{−1},

where each J_i (2 ≤ i ≤ s) is a Jordan matrix whose diagonal elements equal λ_i. Then

P(A) = S diag(P(1), P(J_2), · · · , P(J_s)) S^{−1}.

Clearly, each P(J_i) (2 ≤ i ≤ s) is still a triangular matrix whose diagonal elements equal P(λ_i). Because P(λ_i) ≠ P(1) (2 ≤ i ≤ s), the algebraic multiplicity of the eigenvalue P(1) of P(A) equals 1, the same as that of the eigenvalue 1 of A. Therefore P(A)x = P(1)x has the same solution space as Ax = x, i.e., problem (3) is equivalent to problem (11).
The obvious question to address is whether such a polynomial P satisfying the condition P(λ_i) ≠ P(1) (2 ≤ i ≤ s) on the eigenvalues 1 > |λ_2| > |λ_3| > · · · > |λ_s| of A exists, and then how to make P(A)x = P(1)x easier to solve than Ax = x. The simple polynomial P(x) = x^k (k ∈ N_+) clearly satisfies P(λ_i) = λ_i^k ≠ 1 = P(1) (2 ≤ i ≤ s), since |λ_i| < 1. According to Theorem 3, the two eigen-problems

A^k x = x, x > 0, ‖x‖_1 = 1 and Ax = x, x > 0, ‖x‖_1 = 1

are equivalent. Besides, in A^k the distance between the two largest eigenvalues equals |P(1) − P(α)| = 1 − α^k ≫ 1 − α. As a result, we can expect that the GG-Arnoldi method will require fewer cycles and operations to converge when applied to A^k x = x than to Ax = x. We sketch the complete preconditioned version of the Golub and Greif variant of the refined Arnoldi method with the polynomial choice P(x) = x^k, hereafter shortly referred to as the PGG-Arnoldi method, in Algorithm 3.
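The widening of the spectral gap under P(x) = x^k can be checked numerically on a small random Google-like matrix (our own illustration, not a web matrix): the eigenvalues of A^k are exactly the powers λ_i^k, so the separation 1 − |λ_2|^k grows with k.

```python
import numpy as np

# The eigenvalues of A^k are lambda_i^k, so the separation between the dominant
# eigenvalue 1 and the rest grows from 1 - |lambda_2| to 1 - |lambda_2|^k.
rng = np.random.default_rng(3)
n, alpha = 30, 0.99
P = rng.random((n, n)); P /= P.sum(axis=0)       # column-stochastic
v = np.full(n, 1.0 / n)
A = alpha * P + (1 - alpha) * np.outer(v, np.ones(n))

mods = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
print(np.isclose(mods[0], 1.0))                  # dominant eigenvalue is 1 -> True
gaps = [1.0 - mods[1] ** k for k in (1, 2, 5, 10)]
print(all(g1 < g2 for g1, g2 in zip(gaps, gaps[1:])))  # gap grows with k -> True

# Cross-check against the explicitly powered matrix for k = 5.
mods5 = np.sort(np.abs(np.linalg.eigvals(np.linalg.matrix_power(A, 5))))[::-1]
print(np.isclose(mods5[1], mods[1] ** 5))        # True
```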

Algorithm 3 The preconditioned version of the Golub and Greif variant of the refined Arnoldi method, with the polynomial choice P(x) = x^k (shortly, PGG-Arnoldi), for PageRank.
Input: the PageRank coefficient matrix A, an initial guess x_0, parameters m, k and tol.
1: Run Algorithm 1 with the matrix A^k (each product A^k u computed by Algorithm 4) and v_0 = x_0 to generate V_{m+1} and H̄_m.
2: Set H̃ = H̄_m − [I; 0], where I is the m × m identity matrix.
3: Compute the singular value decomposition UΣW^T = H̃.
4: Compute x = V_m W(:, m).
5: if the convergence criterion is satisfied then
6: Output x = x/‖x‖_1 and end.
7: else
8: Set x_0 = x and go to step 1.
9: end if
10: return x

Note that the matrix A^k in Algorithm 3 is never formed explicitly. All operations involving A^k in this algorithm are matrix-vector multiplications, and each product A^k u is computed by carrying out k sparse matrix-vector products, without assembling the Google matrix A, as shown in Algorithm 4.

Algorithm 4 Implementation of the matrix-vector product y ← A^k u.
1: for i = 1 : k do
2: Compute u ← αPu + (f^T u)v.
3: end for
4: Set y ← u.
5: return y.
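Algorithm 4 translates directly into code. In the sketch below, the helper name `make_Ak_matvec` and the SciPy sparse storage are our own choices; the point is that A is never assembled, and each of the k steps costs one product with the sparse matrix αP, one inner product f^T u and one axpy.

```python
import numpy as np
import scipy.sparse as sp

def make_Ak_matvec(G, alpha, k, v=None):
    """Return a function u -> A^k u (Algorithm 4) without forming A,
    using A u = alpha*P u + (f^T u) v with f = (1 - alpha)e + alpha*d.
    G is the sparse adjacency matrix with G[i, j] = 1 iff j links to i."""
    n = G.shape[0]
    out_deg = np.asarray(G.sum(axis=0)).ravel()
    d = (out_deg == 0).astype(float)             # dangling-page indicator
    v = np.full(n, 1.0 / n) if v is None else v
    inv = np.divide(1.0, out_deg, where=out_deg > 0, out=np.zeros(n))
    aP = alpha * (G @ sp.diags(inv))             # alpha*P, kept sparse
    f = (1 - alpha) * np.ones(n) + alpha * d     # the stored vector f

    def matvec(u):
        for _ in range(k):
            u = aP @ u + (f @ u) * v             # one application of A
        return u
    return matvec

# Tiny graph with a dangling page (index 3). Because A is column-stochastic,
# A^k u preserves the 1-norm of a probability vector u.
G = sp.csc_matrix(np.array([[0, 1, 1, 0],
                            [1, 0, 0, 0],
                            [1, 1, 0, 0],
                            [0, 0, 1, 0]], dtype=float))
mv = make_Ak_matvec(G, alpha=0.85, k=3)
u = np.full(4, 0.25)
y = mv(u)
print(np.isclose(y.sum(), 1.0))                  # True
```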
The overall algorithmic complexity of a cycle of the PGG-Arnoldi method (Algorithm 3) is summarized in Table 4. Clearly, one cycle of Algorithm 3 is computationally more expensive than one cycle of the GG-Arnoldi method (Algorithm 2) for the same value of m, because the former requires computing the matrix-vector product A^k u. In detail, the PGG-Arnoldi algorithm needs an additional (k − 1)m sparse matrix-vector products, (k − 1)m inner products, (k − 1)m vector scalings and (k − 1)m vector additions per cycle compared with the GG-Arnoldi algorithm. The computation of an inner product requires n floating-point multiplications and n − 1 floating-point additions, while scaling a vector needs only n floating-point multiplications. Thus, we will assume that a vector scaling operation has half the cost of an inner product between vectors of the same dimension. For the same reason, the costs of a vector addition, a vector 2-norm and a vector 1-norm can be counted as 0.5, 1 and 0.5 inner products, respectively. Finally, the algorithmic complexity estimates of one cycle of GG-Arnoldi and PGG-Arnoldi, in terms of sparse matrix-vector multiplications and vector inner products, are presented in Table 5. Table 5. Estimated algorithmic complexity of the GG-Arnoldi (Algorithm 2) and of its preconditioned version PGG-Arnoldi (Algorithm 3).

Table 5 reports, per cycle, m sparse matrix-vector multiplications for GG-Arnoldi versus km for PGG-Arnoldi, together with the corresponding inner-product counts obtained from the accounting above.

We observe from Table 5 that one cycle of PGG-Arnoldi costs less than k times the cost of one cycle of GG-Arnoldi. This means that the operations other than the sparse matrix-vector products in PGG-Arnoldi are fewer than those required in GG-Arnoldi. Therefore, if the number of cycles required by PGG-Arnoldi is less than 1/k times that of GG-Arnoldi, the former must be faster. Indeed, it may be faster even when the number of cycles required by PGG-Arnoldi is larger than 1/k times that of GG-Arnoldi, as will be shown by the numerical experiments. Besides, in real applications the computation of a matrix-vector product is more efficient than that of vector operations in both serial and parallel environments, as the former belongs to the BLAS-2 classification while the latter belongs to BLAS-1. Note that improving the efficiency of parallel computing is highly desirable for solving large problems such as PageRank. Given a good estimate of the relative convergence rates of the two methods and of the density of the matrix G, the results presented in Table 5 can guide the choice between the PGG-Arnoldi and GG-Arnoldi methods for the problem at hand. However, little is known quantitatively about the convergence rate of the GG-Arnoldi method beyond the fact that it increases with the gap |1 − λ_2|, so we rely on numerical experiments to analyze the performance of the PGG-Arnoldi method.
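The per-cycle overhead accounting above can be tallied mechanically; the trivial helper below is our own illustration, using the cost weights stated in the text (a vector scaling or addition counted as half an inner product).

```python
def pgg_extra_cost_per_cycle(m, k):
    """Extra work of one PGG-Arnoldi cycle over one GG-Arnoldi cycle, per the
    accounting in the text: (k-1)*m additional sparse matvecs, inner products,
    vector scalings and vector additions, with scaling/addition each counted
    as half an inner product. Returns (extra sparse matvecs,
    extra work in inner-product equivalents)."""
    extra_spmv = (k - 1) * m
    extra_ip_equiv = (k - 1) * m * (1.0 + 0.5 + 0.5)   # ip + scaling + addition
    return extra_spmv, extra_ip_equiv

print(pgg_extra_cost_per_cycle(m=10, k=5))  # (40, 80.0)
```

For m = 10 and k = 5, each PGG-Arnoldi cycle therefore performs 40 extra sparse matrix-vector products plus the equivalent of 80 extra inner products, which PGG-Arnoldi must amortize by needing correspondingly fewer cycles.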
Finally, we remark that:
• acceleration techniques designed to work with the GG-Arnoldi method, such as extrapolation methods [6,7], formulations based on weighted inner products [3] and hybrid solution schemes [16][17][18] that combine stationary iterations with the Arnoldi method, can still be applied to PGG-Arnoldi;
• the performance of PGG-Arnoldi can be further improved by pre-processing algorithms such as the elimination strategy in [24] and the low-rank factorization in [25], which reduce the time and memory cost of computing the sparse matrix-vector product αPu, because sparse matrix-vector products account for a larger proportion of the computation in PGG-Arnoldi than in GG-Arnoldi.

Numerical Results
In this section, we present the results of numerical experiments on a suite of Web matrix problems obtained from the University of Florida matrix repository [26] and from the Laboratory for Web Algorithmics [27][28][29]. The characteristics of our test problems are presented in Table 6. For each Web adjacency matrix G, we build the Google matrix using Equation (2) with personalization vector v = [1, 1, . . . , 1]^T/n. We use the value α = 0.99 for the damping parameter, so that the resulting PageRank problems are rather difficult for iterative methods. All runs with iterative solvers start from the initial guess x_0 = [1/n, 1/n, · · · , 1/n]^T and stop when either the approximate solution x_i satisfies ‖(I − A)x_i‖_2/‖x_i‖_1 < 10^{−8}, or the number of matrix-vector products computed exceeds 20,000. All runs are carried out in MATLAB R2016b on a 64-bit Windows 10 computer equipped with an Intel Core i7-8750H processor and 16 GB of RAM.

Performance Analysis of the PGG-Arnoldi Method
We first assess the performance behaviour of the PGG-Arnoldi method with the polynomial choice P(x) = x^k, using different values of the degree k and increasing dimensions m of the Krylov search space. Because of the very large size of PageRank problems in real applications, the dimension of the Krylov subspace should be kept as small as possible. In our experiments on the web_NotreDame and in-2004 matrices, we use m = 3, 4, · · · , 10 and k = 1, 2, · · · , 10. In Tables 7 and 8 we report the results of our experiments in terms of number of iterations and CPU time (in seconds) for each run.

Table 7. Performances of the PGG-Arnoldi method on the web_NotreDame problem: number of iterations for m = 3, . . . , 10 (rows) and k = 1, . . . , 10 (columns).

m\k    1    2    3    4    5    6    7    8    9   10
 3   351  226  117  110   73   72   53   55   39   44
 4   233  147   84   71   50   47   35   34   25   27
 5   173  109   61   53   37   35   27   25   22   19
 6   139   87   49   42   29   27   22   20   17   15
 7   112   70   35   34   23   22   17   16   13   13
 8    90   58   34   27   19   18   15   13   10   11
 9    78   47   28   23   16   16   12   12   10    8
10    62   39   24   18   14   12   10    8    8    7

CPU time (in seconds)

We observe that, in general, the number of iterations decreases when higher-degree polynomials and Krylov subspaces of larger dimension are used. A simple multiple linear regression of the number of iterations (Iter) against 1/m and 1/k for Table 7 gives Iter = −66.4 + 394.2/m + 157.1/k, with an R² statistic of 0.8, a probability of the F-statistic less than 0.05, and none of the confidence intervals containing the origin. Therefore, the linear relationship can be considered significant. Similar results are obtained for Table 8. The CPU time generally decreases as m increases, while it oscillates as k increases. To save computing time, it is suggested to set the dimension of the Krylov subspace to the maximum value allowed by the available memory. For a given m, the number of iterations tends to decrease as k grows, approximately in proportion to 1/k.
Then, if the ratio between the time cost of computing the matrix-vector product αPu and that of an inner product between vectors of the same dimension is known, the complexity estimates given in Table 5 can guide the user in choosing an almost optimal k in the range 1–10. Here we set m = 10 and k = 5 based on our experiments. Table 8. Performances of the PGG-Arnoldi method on the in-2004 problem.

Preconditioning Combined with Weighted Inner-Product
We study the effect of using weighted inner products instead of standard inner products on the performance of the PGG-Arnoldi method. The experiments are carried out on the same problems as in our previous experiments. The results are presented in Tables 9 and 10.
The general trend is still the same: the number of iterations generally decreases as the polynomial degree k and the Krylov subspace dimension m increase, though some exceptions to this trend can be observed. Here we again carry out a simple multiple linear regression of the number of iterations (Iter) against 1/m and 1/k. For Table 9 the result is Iter = −72.4 + 426.3/m + 146.0/k, with an R² statistic of 0.75, smaller than the 0.8 obtained for Table 7. This may be explained by the fact that the adaptive weighting technique [15] makes the Arnoldi method behave more irregularly. Besides, we can see that the preconditioning strategy proposed in this paper also accelerates the weighted Arnoldi method remarkably on these difficult PageRank problems. For nearly all values of m, the CPU time corresponding to the optimal value of k is less than 50% of the time cost for k = 1, which is the case with no preconditioning. Table 9. Performances of the weighted PGG-Arnoldi method on the web_NotreDame problem.

Comparisons with Other Methods
In this section, we compare the performance of the PGG-Arnoldi algorithm against other algorithms, including the FOM method in [30] and its weighted version (referred to as W-FOM), the Golub and Greif variant of the refined Arnoldi method for PageRank (referred to as GG-Arnoldi) and its weighted version in [15] (referred to as W-Arnoldi), the Power method (referred to as Power), the extrapolation-accelerated Power-Arnoldi method in [7,17] (referred to as EXT-Arnoldi) and the multi-step Power-inner-outer method in [12] (referred to as MPIO). We also test the performance of the preconditioned weighted Arnoldi method (referred to as PW-Arnoldi) and the preconditioned extrapolation-accelerated Power-Arnoldi method (referred to as EXT-PArnoldi), in which the GG-Arnoldi method is replaced by our preconditioned Arnoldi method. All the matrices listed in Table 6 are tested. For all the tested methods, the Krylov subspace dimension is set as m = 10, and the polynomial degree of the preconditioner is set as k = 5. For the MPIO method, the number of steps of the Power method is set as 7, and the parameters β and η controlling the inner iterations are set as 0.5 and 0.1, respectively. For the EXT-Arnoldi and EXT-PArnoldi methods, the extrapolation technique is applied every 40 Power iterations, and the residual tolerance of the Power method is set as 10^{−6}. Note that the parameter settings of the MPIO and EXT-Arnoldi methods are almost the best settings given in the literature [7,12]. The numerical results are presented in Table 11, where the smallest CPU time is typeset in bold font for clarity. Table 11 shows the clear potential of the proposed preconditioning strategy for accelerating the GG-Arnoldi method remarkably: the time costs of PGG-Arnoldi and PW-Arnoldi are significantly smaller than those of their unpreconditioned versions.
Moreover, the time cost can often be further reduced when PGG-Arnoldi is combined with Power iterations and extrapolation techniques. The resulting EXT-PArnoldi outperforms all the other methods on four out of six problems; PGG-Arnoldi is the fastest method on one problem and is close to the best on the remaining problem. Note that the GG-Arnoldi method is always slower than W-FOM or MPIO, while PGG-Arnoldi outperforms them in most cases. We conclude that the proposed preconditioning strategy improves the efficiency of Arnoldi-type methods and makes them faster than some other state-of-the-art methods when solving difficult PageRank problems.
Because Arnoldi-type methods are very competitive for computing PageRank, the developed preconditioning technique can be expected to accelerate PageRank computations arising in various fields. Besides, the technique can improve the efficiency of parallel PageRank computations. Accordingly, any project using the PageRank model can enjoy a faster solution process, which may improve the efficiency of scientific research, engineering and services. As described above, for the GeneRank problem [19] and the ProteinRank problem [20], users can more quickly find the genes and proteins that are pathogenic with high probability.

Conclusions
In this paper, we show that if a polynomial P satisfies P(1) ≠ P(λ_i) for every eigenvalue with |λ_i| < 1, then the PageRank problem Ax = x is equivalent to the new eigen-problem P(A)x = P(1)x. Moreover, with a suitable choice of the polynomial, such as P(x) = x^k chosen in this paper, the new eigen-problem can exhibit a much better separation between the two largest eigenvalues, so that Arnoldi-type methods can solve it in fewer iterations. Accordingly, the number of vector-vector operations with low parallelism in the solution process is reduced. Based on this result, we introduce a preconditioned version of the Golub and Greif variant of the refined Arnoldi method for computing PageRank. Numerical experiments demonstrate that this method can solve the PageRank problem much faster than the refined Arnoldi method over a wide range of parameter settings; meanwhile, the weighted Arnoldi and extrapolated Arnoldi methods can also be accelerated by this preconditioning strategy. Finally, the preconditioned Arnoldi-type methods show superiority over some other state-of-the-art methods for solving this problem class.