Fibrosis Protein-Protein Interactions from Google Matrix Analysis of MetaCore Network

Protein–protein interactions is a longstanding challenge in cardiac remodeling processes and heart failure. Here, we use the MetaCore network and the Google matrix algorithms for prediction of protein–protein interactions dictating cardiac fibrosis, a primary cause of end-stage heart failure. The developed algorithms allow identification of interactions between key proteins and predict new actors orchestrating fibroblast activation linked to fibrosis in mouse and human tissues. These data hold great promise for uncovering new therapeutic targets to limit myocardial fibrosis.


Introduction
Cardiovascular disease, a class of diseases that impact the cardiovascular system, is responsible for 31% of all deaths and remains the leading cause of mortality worldwide [1]. Myocardial fibrosis is central to the pathology of cardiovascular complications that leads to human failure and death [2]. Cardiac fibrosis results from uncontrolled fibroblast activity and excessive extracellular matrix deposition [2]. Although a number of factors have been implicated in orchestrating the fibrotic response, tissue fibrosis is dominated by a central mediator: transforming growth factor-β (TGF-β) [3]. Sustained TGF-β production leads to a continuous cycle of growth factor signaling and deregulated matrix turnover [3]. However, despite intensive research, the factors that orchestrate fibrosis are still poorly understood and, as a result, effective strategies for reversing fibrosis are lacking [2,4]. Considering the complex heterogeneity of fibrosis, research strategy on a system-level understanding of the disease using mathematical modeling approaches is a driving force to dissect the complex processes involved in fibrotic disorders. Recently, we have reproduced the classic hallmarks of aberrant cardiac fibroblast activation leading to fibrosis, and provided a powerful toolbox for fully characterizing cardiac fibroblast transcriptome [5]. Although the pathogenesis of fibrotic remodeling has not been well identified, accumulated evidence suggests that multiple genes/proteins and their interactions play important roles in disease scenarios [6].
Traditional research has been performed to reveal the involvement of a particular gene or protein in fibrosis physiopathology [5,7]. Although these studies generated invaluable data, they still provide a small amount of evidence that is insufficient to clarify the complex nature of interactions between multiple genes or proteins simultaneously. Consequently, it is essential to develop new, multitiered approaches for global analysis of molecular interactions defining cell functional status in pathological conditions. In this context, protein-protein interactions (PPI) represent a highly promising, although challenging, class of potential targets for therapeutic development. The PPI control key functions and physio(patho)logical states of the cells. In fibrotic tissue remodeling, PPI form signaling nodes and hubs that transmit pathophysiological cues along molecular networks to achieve an integrated biological output, thereby promoting fibrosis [6]. Thus, pathway perturbation, through disruption of PPI critical for fibrosis, offers a novel and effective strategy for curtailing the transmission of profibrotic signals. Deciphering of fibrosis-specific PPI would uncover new mechanisms of fibrotic signaling for therapeutic interrogation.
In this study, we propose a Google matrix-based approach for the prediction of PPI linked to myocardial fibrosis using MetaCore network database. The present work is based on the recent results presented in [5] which allowed determination of the protein profibrotic responses as a feedback on TGF protein stimulation, which is known to play an important role in tissue fibrosis [3]. These experiments identify proteins with most positive and most negative response in cardiac fibroblasts.
To sum up, from the experimental results reported in [5], we select 40 proteins, including the top 20 positive and top 20 negative responses. The protein profile is given in Table 1 marked by indexes K u = 1, 2, . . . , 20; K d = 1, 2, . . . , 20. These proteins are ordered monotonically from the strongest K u = 1 to to weakest K u = 20 positive responses; the same monotonic ordering is performed by modulus of negative response with strongest K d = 1 to weakest K d = 20 responses. An additional group of 4 TGF-β-associated proteins with indexes K t = 1, 2, 3, 4 was integrated in the primary list of factors used in experiments [5]. These 44 proteins form the internal selected fibrosis group. For the analysis of PPI characterizing fibrosis, we added a group of 10 external proteins with indexes K x = 1, 2, . . . , 10. The choice of these 10 proteins is explained below in detail, but in short, these external proteins are those which affect, according to our network analysis, the internal proteins in the strongest manner. Thus, in total we have the PPI fibrosis network with 54 proteins (nodes). They are ordered by their global index K g = 1, 2, . . . , 54 in Table 1 (first 4 K t , then 20 K u , 20 K d and 10 K x ). Table 1. Table of the subset of N r = 54 selected fibrosis proteins (nodes). Here, K g represents the global index of this group, K t,u,d,x represent the index of the four subgroups of 4 TFG-β proteins, 20 up-proteins, 20 down-proteins and 10 additional X-proteins; K (K * ) represents the local PageRank (CheiRank) index obtained from the reduced Google matrix G R (G R * ) for this group of 54 proteins; K M (K * M ) indexes represent the PageRank (CheiRank) index for the global MetaCore network of N = 40,079 nodes; the last column gives the associated protein names. To analyze the properties of this PPI fibrosis network, we use the developed commercial MetaCore network database of Clarivate [8]. This network database has been shown to be useful for analysis of various specific biological problems (see, e.g., [9,10]). At present, the MetaCore network has N = 40,079 nodes with N = 292,191 links (without self-connections) with on average n = N /N ≈ 7.3 links per node [11]. The nodes are given mainly by proteins but there are also certain molecules and molecular clusters catalyzing the interactions with proteins. This MetaCore PPI network is directed and nonweighted. In addition, its network links mark the bifunctional nature of interactions leading to the activation or the inhibition of one protein by another one. For some nodes, link action is neutral or unknown. Thus, overall, the MetaCore network is a network with activation or inhibition directed links showing that a protein A acts on protein B. We note that this network is based on a detailed analysis of world literature describing experimental results of how one protein acts on another one. The construction of this network has been performed during several years and is now continued at Clarivate [8]. Scientific biological results obtained with this MetaCore network can, for example, be found at [9,10]. This MetaCore network represents a commercial product actively used by the world's leading pharmaceutic companies [8].
We note that at present, new types of computational methods are actively being developed, e.g., using DeepMind methods [12], with new possibilities of predicting new structures and interactions between proteins. Such methods appear to be very promising. Indeed, they can add new interaction links between proteins in the MetaCore network.
However, the creation of such a global PPI network as MetaCore with almost all proteins requires long work of gathering all available interactions between proteins and representing these interactions in a format of directed network which is very useful for scientific analysis of multiple PPI. We note that there are also other types of PPI networks developed by other companies and research groups (e.g., TRANSPATH [13], REACTOME [14]). Here, we present a universal mathematical analysis based on Google matrix methods which can be also applied to other PPI networks, such as [13,14]. However, here, we present the analysis only for the MetaCore network available to us.
For the investigation of fibrosis PPI network, we use the Google matrix algorithms developed for the analysis of the World Wide Web [15,16] and other directed networks, such as Wikipedia networks, world trade networks, and others (see review [17]). Such an approach to network characterization is based on the concept of Markov chains invented by Markov in an article published in 1906 in the proceeding of the Kazan University [18].
The important method for analysis of directed networks is the reduced Google matrix (REGOMAX) algorithm developed and described in detail in [19,20]. The REGOMAX algorithm has been applied to PPI networks of SIGNOR database as reported in [21,22]. However, the number of nodes in the SIGNOR database is approximately ten times smaller than in the MetaCore network. Thus, the SIGNOR network can only be considered as a test bed for the numerical algorithms and its conceptional base. A first description of the statistical properties of the global MetaCore network, including PageRank, CheiRank, and REGOMAX characteristics, was presented in [11]. However, this work only represents a statistical study of the MetaCore network without any applications to a concrete biological problem. In this work, we apply the REGOMAX analysis to the specific biological problem of fibrosis.
The important feature of the REGOMAX algorithm is that it constructs the Google matrix of a selected subset of nodes N r N (here, we have N r = 54) taking into account not only direct links between these N r nodes but also all indirect pathways connecting them via the global MetaCore network of much larger size N. The efficiency of the REGOMAX approach was demonstrated for various applications concerning the Wikipedia and world trade networks [23][24][25][26], and we also expect that this method will provide useful and new insights in the context of fibrosis protein-protein interactions using the MetaCore network.
The paper is constructed as follows: Section 2 describes the datasets and Google matrix algorithms, Section 3 presents the obtained results of the reduced Google matrix and sensitivity analysis for the particular group of 54 proteins (of Table 1) we consider here, and Section 4 provides the discussion of the results and the conclusion. In Appendix A, we provide additional figures and a simple analytical estimate for the sensitivity matrix to which we refer in the main part of the work; more detailed and additional numerical data obtained from the Google matrix computations are available at [27].

Network Datasets
The global MetaCore PPI network contains N = 40,079 nodes with N = 292,191 links (without self connections). The number of activation/inhibition links is N + /N − = 65,157/49,321 1.3 and the number of neutral links is N n = N − N + − N − = 177, 713. Here, we mainly present the results without taking into account the bifunctional nature of links. However, a part of the results takes into account this bifunctionality of links using the Ising Google matrix approach described in [11,22]. The subset of selected N r = 54 fibrosis proteins (nodes) is given in Table 1; these nodes are represented by 4 TGF-β proteins/nodes (K t = 1, 2, 3, 4), 20 "up-proteins" (K u = 1, . . . , 20), 20 "down-proteins" (K d = 1, . . . , 20), both obtained from experiments [5] (as described above), and 10 new "X-proteins" (or "Xnodes"; K x = 1, . . . , 10) whose selection is explained later. The TGF-β 4 nodes correspond to different isoforms of this protein. In Table 1, we show four groups of proteins and we consider that it is useful to use a specific index for each group: TGF-β proteins with index K t = 1, 2, 3, 4; up-proteins with a strongest positive response noted by index K u = 1, · · · , 20 (ordered by the positive response with the strongest response for K u = 1); down-proteins with a strongest negative response noted by index K d = 1, · · · , 20 (ordered by the modulus of negative response with the strongest response modulus for K d = 1); external proteins noted by index K x ordered by their local PageRank index (strongest PageRank probability of these 10 proteins is at K x = 1; see more details below). All these 54 proteins have their global index K g = 1, · · · , 54 as is shown in Table 1.
The Google matrix approach used in this work is explained in detail in [15][16][17], and the related REGOMAX algorithm is described in [11,19,20,22]. Below, we present a short description of these methods following mainly the presentation given in [11], keeping the same notations.

Without Formulas: Methods, Characteristics, and Expected Network Results
Here, we present qualitative explanations without formulas of the mathematical methods and characteristics described in the next subsections. Our aim here is to give a global view of our approach for a common reader.
We use the MetaCore directed network [8] which represents an action of a protein A on protein B in a form of a directed link (edge) for N = 40,079 proteins forming the network nodes (proteins). Such links are obtained on the basis of careful and detailed analysis of scientific literature about thousands of experiments of various research groups that allowed collection of information about PPI and thus generated a network database with N = 40,079 nodes and N = 292,191 links.
The universal mathematical methods to analyze such networks are generic and based on the concept of Markov chains [18] and Google matrix [15][16][17]. The validity of these methods has been confirmed for various directed networks from various fields of science. Therefore, since the Google matrix analysis is based on a generic mathematical foundation, we expect that this analysis will also work efficiently for PPI networks.
The Google matrix of the global MetaCore PPI network G is constructed with specific rules described in [15][16][17], and the mathematical aspects of this construction are given in Section 2.3. The important property of G is that its application (multiplication) to an initial vector v preserves the probability and the normalization of this vector (sum of all vector elements) remains constant (taken to be unity). As a result of multiple multiplications of v by G, any initial vector converges in the long time limit to the steady-state distribution given by the PageRank vector P. The components of this vector represent the probabilities of each node (protein) in this limit. The nodes with the highest probabilities are the most influential nodes of the network (all nodes are monotonically ordered by decreasing values of the PageRank components which provides the "PageRank index" K such K(j) = 1, 2, . . . for nodes j with largest values P(j)). These nodes have typically many ingoing links and it is likely that some of these ingoing links come from other nodes that also have large PageRank values.
It is also useful to consider the same network but with the inversed direction of links. For this inverse network, the corresponding PageRank is called CheiRank vector P * [17] with the highest probabilities P * (j) for nodes j with the CheiRank index K * (j) = 1, 2, . . . being the most communicative nodes with typically many outgoing links.
If we are interested in a specific selected, typically rather small, group of N r nodes (N r N), then the reduced Google matrix (REGOMAX) algorithm (described in Section 2.4 and Equations (2)-(5)) allows us to obtain a "reduced Google matrix" G R which describes effective interactions between these N r nodes, taking into account both direct links but also all indirect links due to pathways through the complementary network of the other N − N r N r nodes. In our study, the group of 44 nodes, given in Table 1, is selected on the basis of the experimental results for fibrosis responses obtained in [5]. In addition to these 44 fibrosis internal proteins (1 ≤ K g ≤ 44 in Table 1), we determine a special group of 10 external proteins (45 ≤ K g ≤ 54 in Table 1). These external proteins are found numerically with the following procedure: outside of the 44 proteins, we take those proteins which have at least one ingoing link to the top five positive response proteins (5 ≤ K g ≤ 9, 1 ≤ K u ≤ 5) and the top five negative response proteins (25 ≤ K g ≤ 29, 1 ≤ K d ≤ 5). There are 122 such external proteins, so that in total we have a group of 44 + 122 = 166 proteins (44 internal and 122 external ones). With the REGOMAX algorithm we obtain the reduced Google matrix for these 166 proteins. Then, we apply small variations of the transition matrix elements from the external 122 proteins to the 5 + 5 = 10 (top response) internal proteins with the above K g index values. We select the 10 external proteins which have the strongest PageRank probability changes induced by such variations (this provides a quantity called "sensitivity" which is formally defined in Section 2.6; see also the detailed procedure described in Section 2.7). In this way, we obtain the group of N r = 54 proteins of Table 1 (with 1 ≤ K g ≤ 44 being internal and 45 ≤ K g ≤ 54 being external proteins).
For this group of 54 proteins, we again compute the reduced Google matrix G R and the associated sensitivity matrix from which we numerically determine which of the 10 external proteins affect in the strongest way (highest sensitivity values) the PageRank probabilities of internal proteins participating in the fibrosis process, as found in [5].
Our REGOMAX-conjecture is that these newly discovered external proteins (which mostly affect the PageRank probabilities of internal nodes) will actually produce significant effects on the fibrosis process. We point out that such a conjecture has been well confirmed in different contexts for Wikipedia networks, world trade networks, and other networks [23][24][25][26]. However, this REGOMAX-conjecture for PPI networks is still to be verified experimentally.
The possibility to take into account the bifunctional nature (activation or inhibition) of links in the MetaCore PPI network is described in Section 2.5.
Finally, we note that the validity of the REGOMAX algorithms has been confirmed for various directed networks: the world trade network from the United Nations COMTRADE and World Trade Organization databases [25,26], world influence and impact of infectious diseases and cancers from Wikipedia networks [23,24], and PPI SIGNOR networks [21,22]. Since the REGOMAX method is based on the generic and universal mathematical features of the concept of Markov chains and Google matrix, it can be applied to various fields of science involving directed networks. Here, we apply the REGOMAX analysis to the very rich and advanced MetaCore network, taking into account the protein response results reported in [5], and we predict new potential proteins which may affect significantly the fibrosis process.
Below, we present the more formal and mathematical aspects of the REGONAX analysis qualitatively outlined above.

Google Matrix Construction, PageRank and CheiRank
First, we construct the Google matrix G of the MetaCore network for the simple case where the bifunctional nature of links is neglected. Furthermore, the directed links are nonweighted. First, one defines an adjacency matrix with elements A ij being equal to 1 if node j points to node i, and equal to 0 otherwise. In the next step, the stochastic matrix S describing the node-to-node Markov transitions is obtained by normalizing each column sum of the matrix A elements to unity. For dangling nodes j corresponding to zero columns of A, i.e., A ij = 0 for all nodes i, the corresponding elements of S are defined by S ij = 1/N. The stochastic matrix S describes a Markov process on the network: a random surfer jumps from node j to node i with the probability S ij , therefore following the directed links. The column sum normalization ∑ i S ij = 1 ensures the conservation of probability. The elements of the Google matrix G are then defined by the standard form where α = 0.85 is the usual damping factor [15,16]. The Google matrix is also column sum normalized and now the random surfer jumps on the network in accordance with the stochastic matrix S with a probability α and with a complementary probability (1 − α), to an arbitrary random node of the network. The damping factor allows escape from possible isolated communities and ensures that the Markov process converges for long times rather quickly to a uniform stationary probability distribution. The latter is given by the PageRank vector P, which is the right eigenvector of the Google matrix G corresponding to the leading eigenvalue, here, λ = 1. The corresponding eigenvalue equation is then GP = P. According to the Perron-Frobenius theorem, the PageRank vector P has positive elements and their sum is normalized to unity. The PageRank vector element P(j) gives the probability to find the random surfer on the node j at the stationary state of the Markov process. Thus, all nodes can be ranked by a monotonically decreasing PageRank probability. The PageRank index K(j) gives the rank of the node j with the highest (lowest) PageRank probability P(j) corresponding to K(j) = 1 (K(j) = N). The PageRank probability P(j) is proportional, on average, to the number of ingoing links pointing to node j. However, it also takes into account the "importance" (i.e., PageRank probability) of the nodes having a direct link to j. We note that multiple checks, described in [16,17,23] and carried out for a variety of directed networks, including PPI networks [21,22], showed that the PageRank probabilities are stable with respect to variation of α in the range (0.5, 0.95). Here, we use the traditional value α = 0.85 used in [15,16,21,22].
It is also useful to consider a network obtained by the inversion of all link directions. For this inverted network, the corresponding Google matrix is denoted G * and the corresponding PageRank vector, called the CheiRank vector P * , is defined such as G * P * = P * . A detailed statistical analysis of the CheiRank vector can be found in [28,29] (see also [17]). Similarly to the PageRank vector, the CheiRank probability P * (j) is proportional, on average, to the number of outgoing links going out from node j. The CheiRank index K * (j) is also defined as the rank of the node j according to decreasing values of the CheiRank probability P * (j).

Reduced Google Matrix (REGOMAX)
The concept of the REGOMAX algorithm was introduced in [19] and a detailed description of the first applications to groups of political leaders having articles in Wikipedia networks (different language editions) can be found in [20]. This algorithm determines effective interactions between a selected subset of N r nodes enclosed in a global network of size N N r . These interactions are determined taking into account direct and all indirect transitions between N r nodes via all the other N s = N − N r nodes of the global network. We note that, quite often in certain network analyses, only direct links of a subset of elected N r nodes are taken into account, and their indirect interactions via the global network are omitted, thus clearly missing the important interactions.
On a mathematical level, the REGOMAX approach uses ideas similar to those of the Schur complement in linear algebra (see, e.g., [30]) and quantum chaotic scattering in the field of quantum chaos and mesoscopic physics (see, e.g., [31,32]). The Schur complement was introduced by Issai Schur in 1917 (see history in [30]) and found a variety of applications. In the context of Markov chains, this approach was discussed in [33]. However, there are new elements, developed in [19,20], related to a specific matrix decomposition of the Schur complement which allows one to understand its new features and to compute efficiently (numerically) the three related matrix components in the framework of the reduced Google matrix approach for very large networks (e.g., N ∼ 5 × 10 6 as for English Wikipedia).
We write the full Google matrix G of the global network in the block form where the label "r" refers to the nodes of the reduced network, i.e., the subset of N r nodes, and "s" to the other N s = N − N r nodes which form the complementary network, acting as an effective "scattering network". The reduced Google matrix G R acts on the subset of N r nodes and has the size N r × N r . It is defined by Here, P r is a vector of size N r , its components are the normalized PageRank probabilities of the N r nodes, P r (j) The REGOMAX approach allows one to find an effective Google matrix for the subset of N r nodes, keeping fixed the relative ranking probabilities between these nodes. The reduced Google matrix G R has the form [19,20] Furthermore, it satisfies the relation of Equation (3), and it is also column sum normalized. The reduced Google matrix G R can be represented as the sum of three components [19,20]: Here, the first component, G rr , corresponds to the direct transitions between the N r nodes; the second component, G pr , is a matrix of rank 1 with all the columns being proportional (actually approximately equal to the reduced PageRank vector P r ); the third component, G qr , describes all the "interesting indirect pathways" passing through the global network of G matrix. Without going into the details, we mention here that mathematically (and also numerically), G pr is obtained from Equation (4) by extracting the contribution of the leading eigenvector of G ss (which is very close to the PageRank of the complementary scattering network of N s nodes) whose eigenvalue is close to unity but it is not exactly unity, as G ss is not column normalized and there is a small escape probability from the N s scattering nodes to the selected subset with N r nodes. This eigenvector therefore dominates the matrix inverse in Equation (4) and its contribution produces the rank 1 matrix G pr , and the remaining contributions of the other eigenvectors of G ss to the matrix inverse provide the matrix G qr which can be efficiently computed by a rapid convergent matrix series (see [19,20] for details). This point is crucial since it allows for a highly efficient numerical evaluation of all three components of G R also for the case where a direct numerical computation of the matrix inverse of (1 − G ss ) is not possible due to very large values of N (note G ss has the size N s × N s with N s ≈ N N r ). While G pr , being typically numerically dominant, has a very simple rank 1 structure, the matrix G qr contains the most nontrivial information related to indirect hidden transitions. Actually, mathematically, both components G pr and G qr arise from indirect pathways through the scattering nodes (represented by the matrix inverse term in Equation (4)) but G pr can be viewed as a uniform background generated by the long time limit (i.e., the leading eigenvector of G ss ) of the effective process in the complementary scattering network. The component G qr gives the deviations from this background and in the following when we speak of "contributions from indirect pathways", we refer essentially to the contributions of G qr . It is possible that certain matrix elements of G qr are negative, and if this happens, this is also important information as it indicates a reduction from the uniform background for certain links (matrix elements of G R , G rr , and G pr are always positive due to mathematical reasons).
Furthermore, we also define the matrix G (nd) qr which is obtained from the matrix G qr by setting its diagonal elements to zero (these elements correspond to indirect self-interactions of nodes). We consider that this matrix contains the most interesting link information, direct links, and "relevant" indirect links describing the deviations from the uniform background due to G pr . The contribution of each component is characterized by their weights W R , W pr , W rr , W qr (W (nd) qr ), respectively, for G R , G pr , G rr , G qr (G (nd) qr ). The weight of a matrix is given by the sum of all the matrix elements divided by its size N r (W R = 1 due to the column sum normalization of G R ). Examples of interesting applications and studies of reduced Google matrices associated with various directed networks are described in [21][22][23][24].

Bifunctional Ising MetaCore Network
To take into account the bifunctional nature (activation and inhibition) of MetaCore links, we use the approach proposed in [22] with the construction of a larger network, where each node is split into two new nodes with labels (+) and (−). These two nodes can be viewed as two Ising-spin components associated with the activation and the inhibition of the corresponding protein. In the construction of the doubled "Ising" network of proteins, each element of the initial adjacency matrix is replaced by one of the following 2 × 2 matrices: where σ + applies to "activation" links, σ − to "inhibition" links, and σ 0 when the nature of the interaction is "unknown" or "neutral". For the rare cases of multiple interactions between two proteins, we use the sum of the corresponding σ-matrices which increases the weight of the adjacency matrix elements. Once the "Ising" adjacency matrix is obtained, the corresponding Google matrix is constructed in the usual way, as described above. The doubled Ising MetaCore network corresponds to N I = 80,158 nodes and N I, = 939,808 links given by the nonzero entries of the used σ-matrices. Now, the PageRank vector associated with this doubled Ising network has two components P + (j) and P − (j) for every node j of the simple network. Due to the particular structure of the σ-matrices (Equation (6)), one can show analytically the exact identity, P(j) = P + (j) + P − (j), where P(j) is the PageRank of the initial single PPI network [22]. The numerical verification shows that the identity P(j) = P + (j) + P − (j) holds up to the numerical precision ∼ 10 −13 .
As in [22], we characterize each node by its PageRank "magnetization", given by By definition, we have −1 ≤ M(j) ≤ 1. Nodes with positive M are mainly activated nodes and those with negative M are mainly inhibited nodes.
In this work, the results are mainly presented for the simple network without taking into account the bifunctional nature of links. However, for an illustration, we also present some results for the bifunctional network, keeping for further studies a more detailed analysis of this case.

Sensitivity Derivative
The reduced Google matrix G R of the fibrosis network describes effective interactions between N r nodes, taking into account all direct and indirect pathways via the global MetaCore network.
As in [11], we determine the sensitivity of PageRank probabilities with respect to a small variation of the matrix elements of G R . The PageRank sensitivity of the node j with respect to a small variation of the link b → a is defined as Here, for fixed values of a and b, P rε (j) is the PageRank vector computed from a perturbed matrix G Rε where the elements are defined by G Rε (a, and for arbitrary c (including c = a). In other words, the element G R (a, b), corresponding to the transition b → a, is enhanced/multiplied with (1 + ε) and then the column b is resum-normalized by multiplying it with the factor 1/[1 + εG R (a, b)], and all other columns d = b are not modified. We use here an efficient algorithm described in [34] to evaluate the derivative in Equation (8) exactly without usage of finite differences (see also the Appendix A for some details on this and other related points). In the following, we consider the case where j = a and we define the "sensitivity matrix" as D ab = D (b→a) (a). It turns out from the numerical computations that for the cases considered here, all values of D ab are positive: D ab > 0 which can also be analytically understood as explained in Appendix A.

Determination of External X-Proteins
From the experimental results of [5], we have 44 nodes of our selected subset (see the first 44 rows of Table 1). Of course, the interactions between these nodes are very important but it is also important to determine how these 44 fibrosis proteins are influenced by external nodes. To find the most important and influential external nodes, we take five top up-and five down-proteins with K u = 1, . . . , 5 and K d = 1, . . . , 5 from Table 1. Then, we determine all external nodes having direct ingoing 134 links to one of these 5 + 5 fibrosis proteins. There are 122 such proteins (some of them have several links to these 5 + 5 proteins providing 134 links in total). The first 44 proteins of Table 1 together with these 122 external proteins (ordered by their PageRank index) constitute an intermediary group of size 166 for which we first compute the reduced Google matrix by Equation (4) and which we note as G  (8)) (with j = a; see also Figure A3). Then, we compute the sum of In the following, we call this new subgroup the subgroup of X-proteins (or X-nodes). They are given in the last 10 rows of Table 1 (for K g = 45, . . . , 54 and K x = 1, . . . , 10). We mention that these 10 X-proteins have index values of (1, 2, 3, 4, 6, 8, 10, 15, 27) with respect to the initial list of 122 external proteins (which were already PageRank ordered). It turns out that this procedure automatically selects 10 external nodes which have approximately the strongest PageRank values. This can be understood by the fact that the matrix D (166) ab is roughly proportional to P(b) except for a small number of cells with strong peak values (see also Figure A3 and Appendix A for a theoretical explanation). In this way, we obtain the full subset of 54 fibrosis proteins given in Table 1. The REGOMAX analysis is performed for these 54 fibrosis proteins and, unless stated otherwise, all results for G R , D ab , etc., refer to this group of 54 proteins.

Results
In this section, we present the results of Google matrix analysis of fibrosis proteinprotein interactions.

Fibrosis Proteins on PageRank-CheiRank Plane
As in [11], we determine the density distribution of all proteins of the MetaCore network on the PageRank-CheiRank plane of logarithms (ln K, ln K * ) of indexes (K, K * ), which is shown in Figure 1. The whole plane is divided on 100 × 100 logarithmically equidistant cells and the density is defined as the number of proteins in a given cell divided by a total possible nodes in a given cell (this approach is discussed in more detail, e.g., in [29]). The highest density is located at top indexes K, K * , but in this region there is a relatively small number of proteins. The positions of fibrosis proteins of Table 1 are marked by crosses of three colors: red for 10 external X-proteins (K x = 1, . . . , 10), pink for 4 TGF-β proteins (K t = 1, 2, 3, 4), and white for the 40 up-and down-proteins (K u , K d = 1, . . . , 20). We see that X-proteins have highest rank positions; two of the TGF-β proteins approximately follow after K x values of PageRank and two others have significantly lower K-rank positions (positions in K * -rank are rather low); proteins K u and K d have, on average, rather low rank positions (very large K, K * values). Therefore the X-proteins have the highest network influence and communicativity (small K, K * values).
The presentation of Figure 1 uses the global MetaCore rank index values (in the following, these values are noted as K M , K * M ; see also Table 1). For the selected subset of 54 fibrosis proteins, we note their local rank indexes in this group as K, K * , which are also given in Table 1. The distribution of these 54 local rank indexes on the PageRank-CheiRank plane of size 54 × 54 is given in Appendix A Figure A1.  Table 1 with colors: red for the X-proteins, pink for the TGF-β subgroup, and white for the up-and down-protein subgroups.

Reduced Google Matrix of Fibrosis
The reduced Google matrix G R of 54 fibrosis proteins and its 3 matrix components G pr , G rr , G qr are shown in Figure 2. The weights of these matrices are: W pr = 0.9522, W rr = 0.0228, W qr = 0.0250, (W (nd) qr = 0.0211), and W R = 1 (due to the column sum normalization of G R ). Thus, the weight of G pr is significantly higher compared to the two other components. This behavior is quite typical and was also observed for Wikipedia networks (see, e.g., [20,23,24]). The physical reason for this is that G pr is obtained from the contribution of the leading eigenvector of the matrix G ss whose eigenvalue is close to unity and dominates, numerically, the matrix inverse in Equation (4) (see also the discussion in the last section and [19,20] for details). Furthermore, G pr has a very simple structure since it is of rank one, i.e., all columns are exact multiples of the first column. Furthermore, these columns are approximately equal to the local PageRank vector. Therefore, the component G pr does not provide any new interesting information about possible interactions other than that it trivially reproduces the PageRank vector.  Table 1; the x-axis corresponds to the first (row) index (increasing values of K g ) from top to down) and the y-axis corresponds to the second (column) index of the matrix (increasing values of K g from left to right). The outside tics indicate multiples of 10 of K g . The numbers in the color bar correspond to |g|/g max , with g being the value of the matrix element and g max being the maximum value. In order to increase the visibility for the cases of G R , G rr , G qr , the maximum value has been reduced (saturated) to the value of the third largest value of g for each case, and the cells corresponding to the first and second largest values are reduced to the saturation value. In particular, G R (45, 15) (G R (46, 13)) has been reduced from 0.876387 (0.297512) to G R (49, 3) = 0.208777; G rr (45, 16) (G rr (29,24)) has been reduced from 0.850004 (0.121432) to G rr (29, 54) = 0.019322 (same third value also for the other three cells in column 54); G qr (49, 3) (G qr (40, 41)) has been reduced from 0.240629 (0.062024) to G qr (46, 32) = 0.041108. For the matrix G qr , there are some negative values, and here, we show their absolute values (see text).
Numerically, G R is dominated by G pr (with its high weight W pr = 0.9521). However, the other two components give us important additional information about direct interactions between the 54 fibrosis proteins (G rr ) , and, even more importantly, about all indirect interactions (G qr ) between these proteins via the global MetaCore network performing an effective summation over all indirect pathways (see [19,20] for details). The weights of the components of G rr and G qr are comparable. We also see that nearly all direct transitions visible in G rr are from X-proteins to other proteins (all subgroups), which is not astonishing due to the selection rule that any X-node must have at least one direct link to the first five top-or first five up-proteins and also due to the fact that they have rather high PageRank but also CheiRank positions (according to Table 1, Figure 1 and Appendix A Figure A1). Since the PageRank probabilities are higher for X-proteins (see Figure 1), there are rather strong transitions to these X-proteins well visible for G R , G pr , and, to a lesser extent, also in G qr . We note that the component G qr has a small number of nonvanishing diagonal matrix elements which appear due to the possibility that a pathway over the global MetaCore network can return to an initial protein.
It should be noted that a few matrix elements of G qr have negative values. Such a situation has been already found for other directed networks, e.g., Wikipedia networks studied in [20]. To be more precise for G qr and G rr + G  Figure 2, only the modulus of matrix elements is shown in order to have a uniform style for all components (the 10 strongest negative values of G qr correspond to green color with color bar values of 0.3 to 0.4 and after taking the modulus). Of course, the matrix elements of G R , G rr , and G pr are always positive due to strict mathematical properties. Figure 3 shows the effective matrix of transitions for direct links and relevant indirect pathways (without self-interactions) which is obtained as the sum of the two components qr . There are also some cells with cyan color for negative matrix elements (corresponding to −0.3 to −0.2 in units of the color bar for the strongest 10 negative values). Most links are due to the interactions from K x to K t , K u , K d proteins, but there are also some other significant transitions between the other members of the group of 54 proteins. have negative values visible as cyan color (see text). The numbers in the color bar correspond to sgn(g) |g|/g max , with g being the value of the matrix element and g max being the maximum value.

Network Diagrams of Fibrosis Interactions
In this section, we discuss two types of effective networks (of most important PPI links) obtained from the two matrices G R and G rr + G (nd) qr , the latter containing the "interesting" links without the uniform background generated by the component G pr (and without self-interactions). We remind the reader that the value of a matrix element g(a, b) (with g being either G R or G rr + G (nd) qr ) corresponds to the strength of the link b → a. If this value is sufficiently high, we say that a is a "friend" of b and b is a "follower" of a. This distinction allows one to construct for each matrix two types of effective networks by choosing a few number of "top nodes" and adding a certain number of the strongest friends (or followers) according to the values of |g(a, b)| and repeating this procedure for a modest number of depth levels.
In Figure 4, we show four graphical representations of such effective networks for the two cases of friend or follower networks and the two matrices G R and G rr + G (nd) qr visible in Figures 2 and 3. In these figures and the remainder of this subsection, we use the short notations Tj, Uj, Dj or Xj for a protein/node where j = 1, 2, . . . is the integer value of the subgroup index K t , K u , K d or K x , respectively, with real protein names given in Table 1.
To construct the effective network for a matrix component g (with g being either G R or G rr + G (nd) qr ), we first choose five initial top nodes/proteins corresponding to U1, U2 (ADAMTS16, FGF21), D1, D2 (CLEC3B, SCARA5), and X9 (MMP-14). U1, U2 (D1, D2) have the strongest positive (negative) TGF-β response observed experimentally in [5]. The node corresponding to X9 (MMP-14) produces the strongest sensitivity D ab (among those elements D ab where a is an up-or down protein and b is a TGF-β or X-protein; see next subsection for details on this). These five proteins form the set of level-0 nodes which are placed on a large circle.
We attribute the color red to the combined subgroups of 10 external X-proteins (K x = 1, . . . , 10) and 4 TGF-β proteins (K t = 1, 2, 3, 4). The transitions inside this red group are not taken into account since we are mainly interested in the influence of this group on the other up-and down-proteins. We attribute two colors to the up-proteins (olive green to U1, green to U2) and two colors to the down-proteins (cyan to D1, blue to D2). Inside the group of up-proteins, we attribute the color olive green to a protein Uj if Uj is a stronger follower of U1 than of U2 with respect to g = G rr + G (nd) qr , i.e., if g(K u = 1, K u = j) > g(K u = 2, K u = j), and green otherwise. In other words, we compare the strength of the links Uj → U1 and Uj → U2 to determine if Uj has the color olive green of U1 or green of U2. In a similar way, by comparing the strength of the two links from a Dj protein to either D1 or D2, we attribute the two colors cyan and blue to downproteins. This attribution rule, using the strongest followers with respect to G rr + G (nd) qr of the two top nodes inside a subgroup, ensures that for all colors there is a considerable number of proteins and it is the same for all four network diagrams (both matrices and both friend/follower cases). For each of the five level-0 proteins, noted a, we first search the four strongest friends (followers), noted b, with largest value of |g(b, a)| (or |g(a, b)|) corresponding the strongest link a → b (or b → a), where the matrix g is either G R or G rr + G (nd) qr . The new nodes b (if not yet present in the set of level-0 nodes) form the set of new level-1 nodes and they are placed on medium-sized circles of level 1 around the corresponding "parent" node a of level-0. The links between the nodes a and b are drawn as thick black arrows with direction a → b (b → a) for the friend (follower) case. If a node b already belongs to the set of level-0 nodes, we also draw a thick black arrow but using its already existing position on the initial large circle. If a node b has several parent nodes a, we place it only on one medium circle, preferably around a parent node of the same color if possible.
This procedure is repeated once: for each level-1 protein we determine the four strongest level-2 friends (or followers) which are placed on smaller circles of level 2 around the corresponding level-1 protein, provided that they are not yet present in the former sets of level-0 or level-1 proteins. The links corresponding to this stage are drawn as thin red arrows with the same directions as in the first stage (we also draw thin arrows for selected nodes who were already previously selected and using their former positions). As already mentioned above, links where both proteins (a and b) belong to the combined set of Xand TGF-β proteins are not taken into account (otherwise they would strongly dominate these diagrams). We limit ourselves to two stages of the procedure (i.e., three levels of nodes) because otherwise the diagrams would require still smaller circles and many nodes would be hidden by former nodes. We note that for the friend-G R diagram, a further third stage would not add any new nodes since the strongest friends of level-2 are already in the network. For the other cases, additional further stages would only add a few number of new nodes with a quite rapid saturation of the network at some limit level where no new nodes are selected. Figure 4. Effective friend and follower networks generated from G R and G rr + G (nd) qr . Starting from five top nodes, the four strongest friends/followers for each initial node are selected and links are shown by thick black arrows. For each selected new node, further four strongest friends/followers are selected and corresponding new links are shown by thin red arrows. In this procedure, the direct links between two nodes belonging both to one of the two subgroups of X-proteins or TGF-β proteins are not taken into account. The node labels Tj, Uj, Dj, Xj (with j being an integer value) correspond to the local subgroup index K t = j, K u = j, K d = j or K x = j, respectively, which are given in Table 1. Color attributions: 10 external proteins K x and 4 TGF-β proteins are in red; protein K u = 1 and its friends are in olive green; protein K u = 2 and its friends are in green; protein K d = 1 and its friends in cyan; protein K d = 2 and its friends are in blue. Further details about precise selection rules of links, top nodes, and colors are given in the text. Figure 4 shows diagrams of level-2 networks for the cases of friend (top row) and follower (bottom row) diagrams and the two matrices g = G R (left column) or g = G rr + G For the friend network of G R , there is a dominance of links (black arrows) U1, U2, D1, D2 → Xj for certain X-proteins Xj which can be understood by the fact that most Xj proteins have significantly higher PageRank probabilities than the other proteins. Furthermore, the total number of nodes in this diagram is quite small because the strongest friends of level-1 nodes (X1, X2, X3, U4, U5, D14) are mostly other level-1 nodes and there is only one new level-2 node (D20). This diagram is obviously dominated by the uniform background (of the component G pr contributing to G R ) which tends to select mostly the "same new friends" at each level.
For the friend case of G rr + G (nd) qr , the network structure is significantly richer, since here, the global PageRank transitions (due to the uniform background of G pr ) do not play a role. The group around U1 includes T2, T3, T4. Thus, we see a formation of groups of friends around U1, and especially U2, with many friends, and smaller groups of friends appear around D1, D2 and X9.
For the follower network of G rr + G (nd) qr , the largest groups of followers are again formed around U1, U2. In the group around U1, we have only other up-proteins while in the group around U2 we have up-, down-, and X-proteins. The third group around X9 is composed of several up-and down-proteins as well as one TGF-β protein (T1) on level 2. The fourth group around D1 includes D3, D20 and X5 but there are also two other followers U7, U9 which are placed on the U1-circle. The fifth group around D2 includes only X8 (on its own circle) and U7, U9, U10 from the U1-circle.
The follower network of G R matrix has a similar structure, since for followers the contribution of G pr is not so significant that several links of followers of G R and G rr + G (nd) qr are similar.
It should be noted that the few negative matrix elements of G qr have a modest impact on the network diagrams of G rr + G (nd) qr (∼15% of links and only one stage-1 link for the friend case).
These network diagrams allow us to obtain a qualitative graphical view on the most significant fibrosis PPI interactions from a friend or a follower point of view.
We note that in principle it is possible to choose another initial set of five proteins at level 0. In Appendix A Figure A2, we show the network diagrams for the modified level-0 set: D1, D2, U9, U18 and X9. Here, the four up-and down-proteins have the highest sensitivity with respect to X-proteins (see next section). Some features are quite similar to the first case: the friend diagram of G R has only a modest number of nodes with a domination of X-proteins, and generally, the groups associated with the two up-top nodes appear somewhat larger than the groups for the two down-top nodes.

Sensitivity of Fibrosis Proteins
In addition to the matrix components G R , G pr , G rr , G qr and the network diagrams (of G R and G rr + G (nd) qr ), it is also important to analyze the sensitivity matrix D ab defined previously in Equation (7). This matrix D ab gives the sensitivity of a protein a with respect to a small variation of the transition matrix element of G R from protein b to a on the basis of logarithmic derivative of the PageRank probability (see Section 2.5 and also Appendix A for more technical details on this).
As described previously (see Section 2.6), we first compute the sensitivity matrix D (see Section 2.6) which form the group of 10 X-proteins. The 44 TGF-β, up-and down-proteins, together with these 10 X-proteins, form our main group of 54 proteins given Table 1 and for which we present results of the reduced Google matrix in the last subsections.
The sensitivity matrix D ab of size 54 × 54 for this main group is shown in Figure 5 with zoomed parts visible in Figure 6.  Table 1; the axes and colors are defined as in Figure 2 (without saturation); the strongest top 40 sensitivity values are given in Table 2.   (a, b) with strongest sensitivity matrix element D ab , with a belonging to the subgroups of up-or down-proteins and b belonging to the subgroups of TGF-β and X-proteins. The first column gives the ranking index K s of D ab matrix elements ordered by a decreasing value, the second to fourth columns provide the K g , K u,d indexes and the name of the protein (a), the fifth to seventh columns provide the K g , K t,x indexes and the name of the protein (b), and the eighth column shows the value of D ab . See also Figure 5, which shows a color density plot for all matrix elements D ab , and Table 1 for the list of considered proteins. An ordered list of all 560 values of sensitivity influence values D ab of TGF-β or X-proteins (for "b") on up-/down proteins (for "a") is available at [27].
The list of all 560 sensitivity matrix values D ab with a belonging to the subgroups of up-or down-proteins and b belonging to the subgroups of TGF-β and X-proteins is available at [27]. The strongest 40 D ab values of this list are shown in Table 2. Among the top three pairs, we find that the protein MMP-14 gives the top sensitivity (influence) on the protein CLEC3B (D ab = 0.263109), next is the protein p53 giving the sensitivity (D ab = 0.259298) on the protein GALNT3, and the third place is for the sensitivity of C1QTNF3 from PPAR-γ (D ab = 0.225877).
We mention that the appearance of MMP-14 (K x = 9) at the top position of Table 2 is the reason why we selected this protein as one of the five top nodes in the net diagrams discussed in the last subsection. For the net diagrams shown in Figure 4, the other four top nodes were simply chosen as the first two up-(K u = 1, 2) and down-proteins (K d = 1, 2). However, for the net diagrams shown in Appendix A Figure A2, the two top up-and downnodes were also chosen by the criterion of top positions in Table 2 resulting in K u = 9, 18 and K d = 1, 2.
We also computed the effective TGF-β sensitivity on up-or down-proteins (noted a) defined by the sum D Ordering these values in decreasing order, we obtain the ranking index K (TGF−β) s = 1, . . . , 40 whose dependence on K u and K d is visible in Appendix A Figure A4. We see that for the up-proteins we have 14 ranking values  18,19,20). This shows that the overall influence of TGF-β proteins is somewhat stronger on the up-proteins, compared to the down-proteins.
However, we mention that the different values of D (TGF−β) s (a) used to determine this ranking have only modest size variations in the interval 0.0250 to 0.0465 with most values between 0.040 and 0.043. Furthermore, overall, the external X-proteins have a much higher influence (on up-and down-proteins) than the TGF-β proteins. For instance, in Table 2, the TFG-β proteins do not appear at all (in the three "b" columns), and in the full list of 560 entries, the first appearance of a TFG-β protein is at the ranking position K s = 319.
Both of these points can be explained by the approximate expression D ab ≈ [1 − P r (a)]P r (b) ≈ P r (b) which is derived in the appendix for a simplified model of a rank 1 G R matrix but which also holds approximately for arbitrary G R matrices due to the strong numerical weight of the rank 1 component G pr . This behavior is also confirmed, for a "uniform background", by Figures 5 and 6 for D ab and Appendix A Figure A3 for D  Table 2, containing the largest D ab values (with b being either an X or a TGF-β protein and a being an up-or down protein), is dominated by X-proteins which have mostly larger P r (b) values than the TGF-β proteins.
We also determine the global influence on the whole group of fibrosis up-and downproteins by computing the sum D (u/d) s (b) = ∑ 44 a=5 D ab (i.e., the a-sum is over up-and downproteins) for each X or TGF-β protein b. The resulting values of this quantity are provided in Table 3. According to the simple expression for D ab , we have a linear dependence of D (u/d) s (b) on P r (b), and due to the a-su,m the effect of exceptional peaks is strongly reduced. This linear dependence is clearly visible in Table 3 and Appendix A Figure A5.
a=5 D ab (i.e., the a-sum is over up-and down-proteins) for b belonging to the TGF-β or the X-proteins subgroups. The list is ordered with respect to decreasing D (u/d) s (b) values with the first column giving the corresponding ranking index; the second and third columns giving the K g , K t,x indexes; the fourth and fifth columns containing the local PageRank index K and the name of the protein b; and the sixth and seventh columns giving the values of D (u/d) s (b) and the local PageRank probability P r (b). Both K and P r (b) correspond to the group of 54 fibrosis proteins of Table 1. However, Table 3 also shows that at the ranking positions 9 (K x = 9 for MMP-14) and 10 (K t = 2 for TGF-β 1), there is one ranking inversion between D , while the PageRank value of the former is very slightly (0.15%) smaller than the value of the latter (both PageRank values are nearly identical). In Appendix A Figure A5, both of these proteins correspond to two data points with a certain visible (vertical) difference for D (u/d) s (b) but with no visible (horizontal) difference for P r (b).
We argue that the obtained high sensitivity values shown in Figures 5 and 6 and Table 2 can be tested in experiments similar to those reported in [5]. The global influence Table 3 also gives us a prediction of the globally stronger influence of the X-proteins than the TGF-β proteins. These results open new perspectives for external proteins influence on fibrosis.

Bifunctionality of Fibrosis Network
Here, we present in short certain results for the bifunctional MetaCore network. The doubled Ising MetaCore network has N I = 80,158 nodes and N I, = 939,808 links. We compute the reduced Google matrix G R for the doubled number of nodes 2 × 54 = 108 (by attributing (+) and (−) labels to each node) for the fibrosis proteins of Table 1. Here, we present only some selected characteristics; all data for the Ising Google matrix are available at [27].
In Figure 7, we show the magnetization M(j) = (P + (j) − P − (j))/(P + (j) + P − (j)) of proteins of Table 1 with their location on the PageRank-CheiRank plane (K, K * ). Remember that P ± (j) is the PageRank value of the node j with label (±) and that the sum satisfies P(j) = P + (j) + P − (j) where P(j) is the PageRank value of the node j of the simple network. The magnetization is positive for nodes which are more likely to be activated, or in other words, which have on average more incoming activation links (and/or coming from other nodes with larger PageRank values) than inhibition links, while negative values correspond to nodes being more likely to be inhibited by other nodes.
According to Figure  . We note that CELC3B is also selected in both network diagrams of Figure 4 and Appendix A Figure A2 as one of the two down-top-nodes, either because it is the first protein in the list of down-proteins or because it appears at the top position of Table 2 for the strongest sensitivity value D ab (with a being CELC3B and b being the X-protein MMP-14). One may also note that Appendix A Figure A1 shows the same (K, K * ) positions as Figure 7 and allows us to identify which of the boxes belong to the subgroups of TGF-β proteins, up-or down-proteins, or X-proteins.   Table 1 shown on the PageRank-CheiRank plane (K, K * ) of local indices; here, j represents a protein node in the initial single protein network and P ± (j) are the PageRank components of the bifunctional Ising MetaCore network (see text). The values of the color bar correspond to M/ max |M| with max |M| = 0.690937 being the maximal value of |M(j)| for the shown group of proteins. Note that the positions in the PageRank-CheiRank plane are identical to the positions of Appendix A Figure A1, and the corresponding K, K * values are given in the third and fourth column of Table 1.
In Figure 8, we show the matrices components G R and G rr + G (nd) qr for the group of selected 108 nodes corresponding to the Ising MetaCore network. Their structure is quite similar to the corresponding components for the group of 54 nodes for the simple network shown in Figures 2 and 3, i.e., G R is dominated by the uniform background due to the component G pr with some exceptional peak values and large values if the first (vertical) matrix index corresponds to an X-protein with large PageRank probability. For G rr + G (nd) qr , the structure is more sparse, showing the most significant direct and relevant indirect transitions. We note that for the Ising case, the matrix values are identical for the two labels of a given node in the horizontal position (except for the diagonal elements of G rr + G (nd) qr , which have been artificially set to zero), which is a mathematical property of these matrices. However, in the vertical direction, there are significant differences between the two Ising labels, especially for G rr + G  Table 1. The matrix plot style is similar to in Figure 2, with outside tics indicating multiples of 20 of the index values. The color bar is as in Figure 2 with the same translation of colors to matrix values. The saturation value is, for both panels, the sixth largest value for each matrix, and larger values are reduced to this value. The strongest cell values are reduced from 0.437575 (0.424939) to 0.101874 (0.060717) for G R (G rr + G (nd) qr ).

Summarizing Results Without Formulas
We present here a short summary of results without formulas to make them more clear for a common reader. With the REGOMAX analysis, we find the external proteins (K g = 45, . . . 54, K x = 1, . . . 10 in Table 1) which produce the strongest influence on the PageRank probabilities of the internal protein group (K g = 5, . . . 44 in Table 1) characterizing the fibrosis process. Since the PageRank probabilities determine the global influence of proteins on the MetaCore PPI network, we push forward the REGOMAX-conjecture that these external proteins, found in this work, will produce a significant influence on the fibrosis process. The lists of these external proteins with their effective influence on internal proteins (sensitivity) are given in Tables 2 and 3. We also determined the most significant interactions between the 54 fibrosis proteins; these interactions are given by their G R matrix elements.
We point out that such a prediction of the REGOMAX analysis has never been tested in real protein fibrosis processes. However, our previous studies of other directed networks (Wikipedia networks, world trade networks, etc. [23][24][25][26]) allowed us to compare the predictions of the REGOMAX analysis with other studies performed by other scientific methods, confirming the obtained REGOMAX results and therefore showing the efficiency of this approach. On these grounds, we expect that our predictions for fibrosis will find their experimental confirmations.
We also show that the bifunctional nature of fibrosis PPI can be also analyzed by the REGOMAX algorithm. Thus, the detailed analysis of these bifunctional effects opens unexplored perspectives left for further studies.

Conclusions
Identifying fibrosis-associated proteins is a critical issue in treating heart failure. However, deciphering fibrosis proteins experimentally is extremely time-consuming and laborintensive. Thus, alternative methods should be developed to discover fibrosis proteins. In the current study, we explored fibroblast transcriptome profiling data [5] to develop a model for predicting cardiac fibrosis protein-protein interactions using the Google matrix analysis. Thus, we implemented the REGOMAX algorithm to the MetaCore PPI network to dissect the key proteins driving cardiac fibroblast activation leading to fibrosis.
In this work, we presented the Google matrix analysis of PPI of cardiac fibrosis. The group of 54 proteins actively participating in the fibrosis process is determined on the basis of INSERM experimental results presented in [5], which identify 44 proteins. In addition, we discover 10 external proteins with strongest sensitivity action on the fibrosis related 44-group. The sensitivity action is computed in the context of the REGOMAX approach applied to the MetaCore PPI network [8]. Our results allow us to identify the most important interactions between 54 proteins related to fibrotic cascade. The strongest integrated sensitivity actions of fibrosis proteins are summarized in Table 3, predicting the strongest influence of the myocardial fibrosis process. The strongest interactions between fibrosis proteins are also identified from the REGOMAX analysis and are summarized in Table 2.
The current research not only significantly improves the prediction performance of fibrosis proteins, but also discovers several potential fibrosis-associated proteins for future experimental investigations. It is anticipated that the current research could provide new insights into fibrosis-related disease mechanisms and diagnosis. Confirmatory testing of these predictions is planned with the experimental investigations of fibrosis to be performed at INSERM.
We argue that the developed Google matrix analysis for PPI has a generic and universal nature, being based on the strict mathematical features of Markov chains and directed networks [15][16][17][18]. Thus, this approach can be applied not only to the MetaCore network but also to other PPI network databases, such as TRANSPATH [13] and REACTOM [14]. The mathematical foundations of the Google matrix analysis have proved to be useful and efficient for different types of directed networks, including the World Wide Web [15,16], Wikipedia networks, and the world trade networks [17,20,23,25,26]. Thus, we expect that the analysis of the existing PPI network databases [8,13,14] with the Google matrix algorithms described here will find broad applications for analysis of various complex biosystems and diseases.   Table 1 in the local PageRank-CheiRank. Note that these positions are identical to the positions of Figure 7 and the corresponding K, K * values are given in the 3rd and 4th column of Table 1. Pink full circles correspond to the subgroup of TGF-β nodes, full black boxes correspond to the subgroups of up-and down-proteins and red squares correspond to the subgroup of X-proteins. Appendix A Figure A1 provides complementary information to Figure 1. Appendix A Figure A2 provides the network diagrams similar as in Figure 4 but with a different choice of 5 top nodes based on the criterion of top positions in Table 2. Figure A2. Effective network diagram for the same cases as in Figure 4 but using different 5 top nodes being the first X-node, the first two up-nodes and the first two down-nodes according to Table 2.
Appendix A Figure A3 shows the sensitivity matrix D (166) ab for the intermediary group of 166 proteins which was used to determine the additional 10 X-proteins as explained in Section 2.6. Appendix A Figure A4 provides an additional analysis of the overall influence of the TGF-β proteins on the up-and down-proteins which is discussed in Section 3.5.
Appendix Figure A5 provides the graphical and fit verification of the linear behavior between the two quantities D (u/d) s (b) and P r (b) appearing the last two columns of Table 3.

Appendix A.2. Simple Estimate for the Sensitivity Matrix
In the second part of this appendix we remind some details (see [34]) about the numerical computation of the sensitivity (8) and provide an analytic approximation based on a simplified model. Let (a, b) be an arbitrary index pair and G ε be the perturbed Google matrix obtained from a general unperturbed Google matrix G 0 by multiplying its element G 0 (a, b) at position (a, b) by (1 + ε) and then sum-renormalizing the column b to unity. The elements in the other columns are not modified. In a more explicit formula we have: where δ ca = 1 (or 0) if c = a (or c = a). Note that the denominator is either 1 if d = b or the modified column sum 1 + ε G 0 (a, b) of column b if d = b. Expanding (A1) up to first order in ε we obtain G ε = G 0 + ε∆G + . . . with ∆G having the elements: Let P ε be the sum-normalized PageRank vector of G ε determined by the conditions G ε P ε = P ε and the normalization E T P ε = 1 where E T = (1, . . . , 1) is a (row) vector with unit entries. Note that the column sum condition of G ε can be written as E T G ε = E T and of course for ε = 0 we also have G 0 P 0 = P 0 , E T P 0 = 1 and E T G 0 = E T . Furthermore we write the perturbed PageRank vector in the form P ε = P 0 + ε∆P + . . . where the ∆P must satisfy the condition E T ∆P = 0. Then the sensitivity (8) is directly related to ∆P by: Expanding the PageRank equation G ε P ε = (G 0 + ε∆G + . . .)(P 0 + ε∆P + . . .) = P ε = P 0 + ε∆P + . . . to order one we first obtain the unperturbed PageRank equation G 0 P 0 = P 0 and a further inhomogeneous equation : which can be efficiently numerically solved by iteration (choosing initially ∆P = 0 on the right hand side) once P 0 has been computed (see [34] for details on this point). This provides a numerical precise scheme to compute the sensitivity in the limit ε → 0 without the need to take finite ε-differences. Now, we consider a particular very simple model where G 0 has identical columns being the PageRank P 0 , i.e., G 0 = P 0 E T or more explicitely G 0 (c, d) = P 0 (c) for all values of c, d. Then we obtain from (A2) ∀ c,d ∆G(c, d) = δ ca δ db P 0 (c) − δ db P 0 (a) P 0 (c) (A5) and from (A4) ∆P = (P 0 E T ) ∆P + ∆G P 0 = ∆G P 0 (A6) since E T ∆P = 0. Inserting (A5) in (A6) we obtain (replacing c = j and performing the d-sum for the matrix vector product) ∀ j ∆P(j) = [δ ja − P 0 (a)] P 0 (j) P 0 (b) (A7) and from (A3) D (b→a) (j) = [δ ja − P 0 (a)] P 0 (b) .
Choosing j = a this gives the sensitivity matrix where the last approximation holds if typically P 0 (a) 1. This result is of course only valid for the simplified model of identical columns (being the PageRank vector) in G 0 . However, when G 0 represents a typical reduced Google matrix, with N r N, the component G pr which has the strongest numerical weight (typically ∼95%) is of the form G pr =P 0Ẽ T whereP 0 ≈ P 0 andẼ T ≈ E T except for a few number of components j where strong deviations betweenP 0 (j) and P 0 (j) (and similarly betweenẼ(j) and E(j)) are possible.
Our examples of D ab visible in Figure 5 and of D (166) ab of Appendix A Figure A3 confirm the typical behavior D ab ∼ P 0 (b) for a "uniform background" but there are some exceptional peak values which arise from the deviations from G pr to the simplified model and also from the contributions of G rr and G qr . This also explains our numerical finding that all matrix elements of D ab are positive. Actually, according to (A8) we expect that D (b→a) (j) is typically positive if j = a and negative if j = a.
Furthermore, when taking the partial a-sum over up-and down-nodes of D ab the effect of exceptional peaks is strongly reduced thus explaining the linear behavior D Table 3 and Figure A5.