Identifying the Salient Genes in Microarray Data: A Novel Game Theoretic Model for the Co-Expression Network

Microarray techniques are used to generate a large amount of information on gene expression. This information can be statistically processed and analyzed to identify genes useful for the diagnosis and prognosis of genetic diseases. Game theoretic tools have been applied to analyze such gene expression data. Gene co-expression networks are increasingly used to explore the system-level functionality of genes, taking into account the roles of genes in building networks in addition to their independent activities. In this paper, we develop a novel microarray network game by constructing a gene co-expression network and defining a game on this network. The notion of the Link Relevance Index (LRI) for this network game is introduced and characterized. The LRI successfully identifies relevant cancer biomarkers and enables the identification of salient genes in a colon cancer dataset. Network games can describe the interactions among genes more accurately, as their basic premise is to consider the interactions among players prescribed by a network structure. The LRI provides a tool to identify the underlying salient genes involved in cancer or other metabolic syndromes.


Introduction
The occurrence or activity of a gene product from its coding gene can be investigated through gene expression analyses. The study of gene expression profiles of cells and tissues has become a major tool for discovery in medicine [1]. Gene expression is a profound indicator of biological activity, since a change in a biological process results from a change in the gene expression pattern. Gene expression data analysis requires suitable tools for storing and managing the relevant data. Microarrays have been identified as a promising technology for generating huge amounts of information on gene expression [2,3]. and therefore, the Shapley value should be substituted by its network counterpart. The standard values for network games are the Myerson value, which is a player based value (allocation rule), and the position value, which is a link based value (allocation rule) [24]. The choice between a player based and a link based value depends on the physical problem: if the players are more important, we adopt a player based rule, and if the links are more important, we adopt a link based rule. In the present work, we focus on gene co-expression networks and on network games over such networks. Our emphasis is therefore on the linking abilities of the genes. This leads us to introduce the Link Relevance Index (LRI), rather than a player based value, as a suitable candidate for explaining the relevance of the genes. We argue that network games can describe the interactions among genes more accurately, as they consider not only the cooperation among agents (genes) but also how the agents (genes) are connected in a network. We evaluated the LRI, a link based analogue of the Shapley value, for gene co-expression networks. Our study therefore provides a more detailed description of genetic markers and their combined effects.
Throughout this paper, we work on a matrix of gene expression values that has already been pre-processed according to previously described methods. First, we build the theoretical background of gene co-expression network games. Next, we propose the LRI of a network game as a solution representing the significance of each gene. Finally, we compare our results with those obtained by existing methods. As we will see, the LRI places more emphasis on the links formed by the genes and on their respective contributions to the network.

Materials and Methods
In Sections 2.1–2.3, we recall some basic concepts related to the development of our model from [9,16–18,21,23,25–28]. In Section 2.4, we introduce microarray network games and the corresponding LRI, and we obtain a characterization of the LRI in the context of gene expression networks.

Cooperative Games with Transferable Utilities
Let $N = \{1, 2, \dots, n\}$ be a finite set of players and $2^N$ the power set of $N$, i.e., the set of all subsets of $N$. A cooperative game with Transferable Utilities (TU) is a pair $(N, v)$, where $v: 2^N \to \mathbb{R}$ is the characteristic function with $v(\emptyset) = 0$. Every subset $S$ of $N$ is called a coalition, and its worth is given by the real number $v(S)$. The set $N$ of all players is called the grand coalition. The class of all TU-games on the player set $N$ is denoted by $G(N)$. The main assumption in TU-games is that the grand coalition $N$ will eventually form. A solution is a function $\Phi: G(N) \to \mathbb{R}^n$ that assigns a vector $\Phi(v) \in \mathbb{R}^n$ to each game $v \in G(N)$. The Shapley value assigns to each player his/her average marginal contribution over all coalitions and builds on some standard rationality axioms [29]; it is perhaps the most popular solution concept. Formally, given a TU-game $(N, v)$, for each player $i \in N$ the Shapley value $\Phi_i(v)$ is defined by

$$\Phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{s!\,(n - s - 1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr],$$

where $s = |S|$ and $n = |N|$ are the cardinalities of the coalitions $S$ and $N$, respectively. An alternative representation of the Shapley value can be given as

$$\Phi_i(v) = \sum_{S \in 2^N:\, i \in S} \frac{\lambda_S(v)}{s},$$

where the coefficients $(\lambda_S(v))_{S \in 2^N}$ are called the Harsanyi dividends [30] and are given by

$$\lambda_S(v) = \sum_{T \subseteq S} (-1)^{s - t}\, v(T), \qquad t = |T|.$$
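The following Python sketch computes the Shapley value of a small TU-game in two ways: from the marginal-contribution formula and from the Harsanyi dividends. It makes the equivalence of the two representations concrete. The toy game and the helper names (`subsets`, `shapley`, `harsanyi_dividends`, `shapley_from_dividends`) are illustrative assumptions. Enumerating all coalitions is only practical for a handful of players.

```python
from itertools import combinations
from math import factorial


def subsets(players):
    """Yield every subset of `players` (including the empty set) as a frozenset."""
    players = list(players)
    for r in range(len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)


def shapley(players, v):
    """Shapley value: sum of s!(n-s-1)!/n! * [v(S + i) - v(S)] over S not containing i."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        phi[i] = sum(
            factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            * (v(S | {i}) - v(S))
            for S in subsets(others)
        )
    return phi


def harsanyi_dividends(players, v):
    """lambda_S(v) = sum over T subset of S of (-1)^(|S|-|T|) v(T), for nonempty S."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(players)
        if S
    }


def shapley_from_dividends(players, v):
    """Alternative form: each dividend lambda_S(v) is split equally among the members of S."""
    lam = harsanyi_dividends(players, v)
    return {i: sum(d / len(S) for S, d in lam.items() if i in S) for i in players}


# Hypothetical 3-player game: a coalition is worth 1 iff it contains
# player 1 and at least one other player.
def v(S):
    return 1.0 if 1 in S and len(S) >= 2 else 0.0


players = [1, 2, 3]
print(shapley(players, v))
print(shapley_from_dividends(players, v))
```

In this toy game, both routes give player 1 a share of 2/3 and players 2 and 3 a share of 1/6 each, up to floating-point rounding. The shares add up to v(N) = 1, as efficiency requires.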

Microarray Games
Microarray games were defined as TU-games in [16] that account for the relevance of groups of genes in relation to a specific condition. A Microarray Experimental Situation (MES), which is the basis of the microarray games, is defined as follows (see [16] for more details).
Let $N = \{1, 2, \dots, n\}$ be a set of $n$ genes. Let $S_R = \{s^R_1, \dots, s^R_r\}$ be a set of $r$ reference samples, i.e., cells from normal tissues, and let $S_D = \{s^D_1, \dots, s^D_d\}$ be the set of cells from tissues with a genetic disease. In a microarray experiment, each sample $j \in S_R \cup S_D$ is associated with an expression profile $A(j) = (A_{ij})_{i \in N}$, where $A_{ij} \in \mathbb{R}$ represents the expression value of gene $i$ in sample $j$. These expression values are called the dataset of the microarray experiment, and they allow the expression intensities of genes to be compared across samples. The datasets are presented as two real-valued expression matrices, $A^{S_R} = (A^{S_R}_{ij})_{i \in N, j \in S_R}$ and $A^{S_D} = (A^{S_D}_{ij})_{i \in N, j \in S_D}$. An MES is the tuple $E = \langle N; S_R; S_D; A^{S_R}; A^{S_D} \rangle$.

In practice, the genes from the samples in $S_D$ that are abnormally expressed with respect to the set $S_R$ are distinguished according to some discriminant function $m$. The genes that are overexpressed according to $m$ are assigned one and the normal ones zero. Thus, each MES can be represented by a Boolean matrix $B \in \{0, 1\}^{n \times k}$, where $k \geq 1$ is the number of arrays. A coalition $S \subseteq N$ that realizes the association between the expression property and the condition on a single array is called a winning coalition for that array. Let $B_{.j}$ be the $j$th column of $B$. The support of $B_{.j}$, denoted by $sp(B_{.j})$, is the set $\{i \in N : B_{ij} = 1\}$.

The microarray game corresponding to $B$ is the TU-game $(N, v)$, where $v: 2^N \to \mathbb{R}$ is such that $v(T)$ is the rate of occurrences of coalition $T$ as a winning coalition, i.e., as a superset of the supports in the Boolean matrix $B$. Formally, for each $T \in 2^N \setminus \{\emptyset\}$,

$$v(T) = \frac{|\Theta(T)|}{k},$$

where $|\Theta(T)|$ is the cardinality of the set $\Theta(T) = \{j \in K : sp(B_{.j}) \subseteq T,\ sp(B_{.j}) \neq \emptyset\}$ with $K = \{1, \dots, k\}$, and $v(\emptyset) = 0$. The class of microarray games is denoted by $\mathcal{M}$. In [16], the Shapley value is shown to be a suitable solution for microarray games by interpreting its properties in genetic terms.
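A microarray game can be evaluated directly from its Boolean matrix, as in the minimal Python sketch below. Genes are the rows of `B`, indexed from 0 for convenience, and arrays are its columns. The matrix is a made-up toy, not data from [16], and the function names are illustrative. The game v is the sum, over the nonempty columns, of 1/k times the unanimity game on that column's support. Linearity of the Shapley value therefore gives each gene 1/(k·|sp(B_.j)|) for every column whose support contains it, which is what `microarray_shapley` adds up.

```python
from fractions import Fraction


def supports(B):
    """sp(B_.j) = {i : B[i][j] == 1} for every column j of the Boolean matrix B."""
    n, k = len(B), len(B[0])
    return [frozenset(i for i in range(n) if B[i][j] == 1) for j in range(k)]


def microarray_value(B, T):
    """v(T) = |Theta(T)| / k: share of arrays whose nonempty support lies in T."""
    k = len(B[0])
    T = frozenset(T)
    theta = [sp for sp in supports(B) if sp and sp <= T]
    return Fraction(len(theta), k)


def microarray_shapley(B):
    """Shapley value of the microarray game: every nonempty column gives
    1/(k * |sp|) to each gene in its support (linearity over unanimity games)."""
    n, k = len(B), len(B[0])
    phi = [Fraction(0)] * n
    for sp in supports(B):
        for i in sp:
            phi[i] += Fraction(1, k * len(sp))
    return phi


# Hypothetical Boolean matrix: 4 genes (rows) x 3 arrays (columns).
B = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 0],
     [0, 0, 0]]
print(microarray_value(B, {0, 1}), microarray_shapley(B))
```

Here genes 0 and 1 together contain the supports of two of the three arrays, so v({0, 1}) = 2/3. Gene 3 is never abnormal and receives a Shapley value of zero.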

Network Game
Let $N = \{1, 2, \dots, n\}$ be a nonempty set of players that are connected in some network relationship. A link is an unordered pair of players $\{i, j\}$, where $i, j \in N$. For simplicity, we write $ij$ for the link $\{i, j\}$. The set $g^N = \{ij : i, j \in N, i \neq j\}$ of all subsets of $N$ of size two is called the complete network. Let $G = \{g : g \subseteq g^N\}$ denote the set of all possible networks on $N$. With an abuse of notation, by $ij \in g$ we mean that $i$ and $j$ are linked in the network $g$. For instance, if $N = \{1, 2, 3\}$, then $g = \{12, 23\}$ is the network in which Players 1 and 2 are linked and Players 2 and 3 are linked, but Players 1 and 3 are not. With the above notation, $12 \in \{12, 23\}$ and, similarly, $23 \in \{12, 23\}$.

Let $N(g)$ be the set of players who have at least one link in $g$, that is, $N(g) = \{i : \exists\, j \in N \text{ such that } ij \in g\}$. Let $n(g) = |N(g)|$ denote the number of players involved in $g$, and let $|g|$ be the number of links in $g$. By $g_i$ we denote the set of links of $g$ in which player $i$ is involved, so that $g_i = \{ij : \exists\, j \in N,\ ij \in g\}$. The number of links $|g_i|$, which equals $n(g_i) - 1$ whenever $g_i \neq \emptyset$, is called the degree of node $i \in N$ in the network $g$ and is denoted by $\deg(i)$.

For any $g_1, g_2 \in G$, we denote by $g_1 + g_2$ the network obtained by adding the networks $g_1$ and $g_2$. We denote by $g_1 \setminus g_2$ the network obtained from $g_1$ by removing its subnetwork $g_2$. With an abuse of notation, we write $g \setminus ij$ for $g \setminus \{ij\}$ for every link $ij \in g$. A path in a network $g \in G$ between players $i$ and $j$ is a sequence of players $i_1, \dots, i_m$ such that $i_l i_{l+1} \in g$ for each $l \in \{1, \dots, m - 1\}$, with $i_1 = i$ and $i_m = j$. The path relationships naturally partition a network into maximally connected subnetworks, which are commonly referred to as components.
A component of a network $g$ is a nonempty subnetwork $g' \subseteq g$ such that:
• if $i \in N(g')$ and $j \in N(g')$ with $j \neq i$, then there exists a path in $g'$ between $i$ and $j$; and
• if $i \in N(g')$ and $ij \in g$, then $ij \in g'$.
Thus, the components of a network are its maximally connected subnetworks. The set of components of $g$ is denoted by $C(g)$. Note that $g' \subseteq g$ for all $g' \in C(g)$, and that $g$ is the union of its components. In our framework, we do not consider isolated players, i.e., nodes without any link, as components.
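The notation above maps naturally onto sets of links. The minimal Python sketch below stores each link $ij$ as a `frozenset` and computes $N(g)$, $g_i$, $\deg(i)$ and the components $C(g)$ by repeatedly absorbing links that share a player with the component built so far. The helper names are illustrative, and the example is the network $g = \{12, 23, 45\}$.

```python
def link(i, j):
    """The unordered link ij = {i, j}."""
    return frozenset((i, j))


def players_of(g):
    """N(g): the players that have at least one link in g."""
    return set().union(*g)


def links_of(g, i):
    """g_i: the links of g that involve player i."""
    return {l for l in g if i in l}


def degree(g, i):
    """deg(i) = |g_i|, i.e., n(g_i) - 1 whenever g_i is nonempty."""
    return len(links_of(g, i))


def components(g):
    """C(g): maximally connected subnetworks of g (isolated players are ignored)."""
    remaining = set(g)
    comps = []
    while remaining:
        comp = {remaining.pop()}
        while True:
            nodes = players_of(comp)
            # Links sharing a player with the current component belong to it.
            new = {l for l in remaining if l & nodes}
            if not new:
                break
            remaining -= new
            comp |= new
        comps.append(frozenset(comp))
    return comps


g = {link(1, 2), link(2, 3), link(4, 5)}
print(players_of(g), degree(g, 2), components(g))
```

The network has two components, $\{12, 23\}$ and $\{45\}$. Player 2 has degree 2, and $n(g_2) = |\{1, 2, 3\}| = 3$.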
Definition 1. A function $v: G \to \mathbb{R}$ with $v(\emptyset) = 0$, where $\emptyset$ denotes the empty network, is called a value function. The set of all value functions on $G$ is denoted by $V$. Under the standard addition and scalar multiplication of functions, namely $(v + w)(g) = v(g) + w(g)$ and $(\alpha v)(g) = \alpha v(g)$ for each $v, w \in V$ and $\alpha \in \mathbb{R}$, $V$ is a linear space. Definition 2. Given $g \in G$, each of the following families of special value functions forms a basis for $V$.
Note that the notion of a basis of $V$ is critical for axiomatizing solution concepts. Every value function is a linear combination of the basis value functions. Hence, for an additive solution, a characterization on the basis value functions carries over to all value functions. Definition 4. A network game is a pair $(N, v)$, where $N$ is a set of players and $v \in V$ is a value function. If the player set $N$ is fixed, we denote the network game $(N, v)$ simply by the value function $v$.

Definition 5.
An allocation rule is a function $Y: G \times V \to \mathbb{R}^n$ that assigns a value $Y_i(g, v)$ to each player $i \in N$.
Thus, an allocation rule in a network game describes how the value generated by the network is allocated among the players. For a survey of alternative allocation rules for network games, we refer to [25,31]. An allocation rule $Y$ is link based if there exists a function $\Psi: G \times V \to \mathbb{R}^{n(n-1)/2}$ such that

$$Y_i(g, v) = \sum_{j \in N:\, ij \in g} \frac{\Psi_{ij}(g, v)}{2} \quad \text{for each } i \in N.$$

Thus, a link based allocation rule allocates the total worth of a network to the players in two steps. First, the value is allocated among the links, treating them as players. Then, each link's share is divided equally between the two nodes (players) forming that link. The position value [25,27,28,32] is a popular link based allocation rule that is based on the Shapley value [29] of the links in a network. It is denoted by $Y^{PV}$ and given by (see [28])

$$Y^{PV}_i(g, v) = \sum_{ij \in g_i} \frac{1}{2} \sum_{h \subseteq g \setminus ij} \frac{|h|!\,(|g| - |h| - 1)!}{|g|!}\,\bigl[v(h + ij) - v(h)\bigr].$$

An equivalent form of the position value, due to [28], uses the unanimity coefficients $\lambda_h(v) = \sum_{h' \subseteq h} (-1)^{|h| - |h'|}\, v(h')$:

$$Y^{PV}_i(g, v) = \sum_{h \subseteq g:\, h \neq \emptyset} \lambda_h(v)\,\frac{|h_i|}{2\,|h|}.$$
Observe that, under the position value, each player in a network game $(N, v)$ receives half of the Shapley value of each of the links in which he/she is involved. In what follows, we present a recent characterization of the position value due to [28]. As a prerequisite, we state the following definitions.
for each network $g' \subseteq g$.
Definition 8. An allocation rule $Y$ defined on $G \times V$ satisfies the superfluous link property if $Y(g, v) = Y(g \setminus ij, v)$ for each network game $(N, v)$ with a component additive value function $v$ and for all links $ij$ that are superfluous in $(N, v)$.
The superfluous link property states that if a link in the network is insignificant in terms of the value the network accrues, then removing that link does not change the allocation of any player. This idea is similar to the null-player property of TU-games [25].
Link anonymity states that when all the links in a network are interchangeable for the purpose of determining the values of the subnetworks, the relative allocations of the players in the network are determined by the respective number of links in which each player is involved. This idea is similar to that of the symmetry axiom of the Shapley value for TU-games [25].

Definition 10.
An allocation rule $Y$ on $G \times V$ is link anonymous if, for every network $g \in G$ and every value function $v \in V$ that is link anonymous on $g$, there exists an $\alpha \in \mathbb{R}$ such that $Y_i(g, v) = \alpha\,|g_i|$ for each $i \in N$. Definition 11. An allocation rule $Y$ satisfies efficiency if $\sum_{i \in N} Y_i(g, v) = v(g)$ for all network games $(N, v)$.
In [28], the following characterization of the position value is proven; this result is used later in this paper. Theorem 1 ([28], p. 16). The position value $Y^{PV}$ is the unique allocation rule on the domain of all value functions that satisfies efficiency, additivity, the superfluous link property, and link anonymity.
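For small networks, the position value can be computed by brute force. The idea is to take the Shapley value of the links, treated as players, and give each endpoint half of each of its links' values. The Python sketch below does this and checks two of the properties in Theorem 1 numerically on a unanimity value function: efficiency and the superfluous link property. Links are `frozenset`s, and the helper names (`link_shapley`, `position_value`, `unanimity`) are illustrative. The enumeration over subnetworks grows as $2^{|g|}$, so this is a sanity-check tool rather than a scalable implementation.

```python
from itertools import combinations
from math import factorial


def link(i, j):
    """The unordered link ij."""
    return frozenset((i, j))


def link_shapley(g, v):
    """Shapley value of every link of g, treating the links as players."""
    links = list(g)
    m = len(links)
    result = {}
    for l in links:
        others = [h for h in links if h != l]
        total = 0.0
        for r in range(m):
            for sub in combinations(others, r):
                h = frozenset(sub)
                weight = factorial(r) * factorial(m - r - 1) / factorial(m)
                total += weight * (v(h | {l}) - v(h))
        result[l] = total
    return result


def position_value(players, g, v):
    """Y^PV_i(g, v): half of the Shapley value of every link of g that contains i."""
    sh = link_shapley(g, v)
    return {i: sum(val / 2 for l, val in sh.items() if i in l) for i in players}


def unanimity(h0):
    """Unanimity value function on a nonempty network h0: 1 if h contains all of h0."""
    h0 = frozenset(h0)
    return lambda h: 1.0 if h0 <= h else 0.0


players = [1, 2, 3, 4, 5]
g = {link(1, 2), link(2, 3), link(4, 5)}
v = unanimity({link(1, 2), link(2, 3)})
pv = position_value(players, g, v)  # approx. 1/4, 1/2, 1/4, 0, 0
pv_without_45 = position_value(players, g - {link(4, 5)}, v)
print(pv, pv_without_45)
```

Link 45 is superfluous for this value function, so removing it leaves every player's allocation unchanged. The allocations also add up to $v(g) = 1$.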

Microarray Network Games and the Link Relevance Index
To obtain a microarray network game, we construct a gene co-expression network and then define a value function on this network. Recall from Section 1 that co-expression networks are built from the extent of correlation between pairs of genes across a gene expression dataset. Nodes are genes, and connections are defined by the co-expression of two genes. The Pearson correlation coefficient is often used as the initial measure of gene co-expression [8]. This measure is then transformed into an adjacency matrix according to one of several alternative statistical procedures. Once the network game is fully described, we obtain the LRI of this network game. The LRI values of the nodes indicate the salient genes responsible for the onset of a disease. In the following, we first describe how the gene co-expression network is obtained.

Construction of Gene Co-Expression Networks
We follow a general framework for the construction of gene co-expression networks (for details, see [33]). In such networks, each gene corresponds to a node, and two nodes are connected if the corresponding genes are significantly co-expressed across appropriately chosen tissue samples. In practice, defining the connections between the nodes of such networks is not straightforward. To measure the co-expression of two genes $i$ and $j$, we use the Pearson Correlation Coefficient (PCC). The PCC (or r-value) between two genes is the covariance of their expression profiles divided by the product of their standard deviations. Let $q$ be the number of samples, and let $x_t$ and $y_t$ ($t = 1, \dots, q$) be the expression values of genes $i$ and $j$ in sample $t$. The PCC is then calculated as

$$r_{ij} = \frac{\sum_{t=1}^{q} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{q} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{q} (y_t - \bar{y})^2}},$$

where $\bar{x}$ and $\bar{y}$ are the mean expression values of the two genes over the samples.
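The PCC formula can be checked numerically with the short Python sketch below, which implements it directly with NumPy and compares the result against `numpy.corrcoef`. The function name `pearson` and the two toy expression profiles are illustrative assumptions. The formula is undefined (division by zero) when either profile is constant across samples.

```python
import numpy as np


def pearson(x, y):
    """PCC: sum of centered cross-products over the root of the product of sums of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    # Undefined (division by zero) if either profile is constant across samples.
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


# Hypothetical expression profiles of two genes over four samples.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
print(pearson(x, y), np.corrcoef(x, y)[0, 1])
```

For these profiles, the centered cross-products sum to 4 and each sum of squares is 5, so r = 4/5 = 0.8.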
Consider the MES $E = \langle N; S_D; S_R; A^{S_D}; A^{S_R} \rangle$. Construct a real matrix $R^{(E,m)}$ by applying a discriminant function $m$ to the entries of $A^{S_D}$ and $A^{S_R}$. In $R^{(E,m)}$, zeroes represent normal genes, and nonzero entries represent diseased genes with the expression levels of the respective samples. Unlike the Boolean matrix $B$ of a microarray game, $R^{(E,m)}$ therefore retains the expression values. From $R^{(E,m)}$, we obtain the adjacency matrix of the gene co-expression network based on a biologically motivated criterion, referred to as the scale-free topology criterion. This is done by first defining a similarity measure $s_{ij}$ between each pair of genes $i$ and $j$. We take $s_{ij} = |cor(i, j)|$, the absolute value of the Pearson correlation coefficient, so that $s_{ij} \in [0, 1]$. Genes with no correlation are assigned a value near 0.0, while strongly correlated genes are assigned a value near 1.0. We denote the similarity matrix by $S = [s_{ij}]$; $S$ can be regarded as a weighted network.
To transform the similarity matrix into an adjacency matrix, an adjacency function needs to be defined. Here, the adjacency function is a monotonically nondecreasing function that maps the interval $[0, 1]$ into $\{0, 1\}$. The most widely used such function is the signum function, which involves a threshold parameter $\tau$ (see [33]). The signum function is defined as follows:

$$a_{ij} = \operatorname{signum}(s_{ij}, \tau) = \begin{cases} 1 & \text{if } s_{ij} \geq \tau, \\ 0 & \text{if } s_{ij} < \tau. \end{cases}$$

There are several approaches for choosing the threshold parameter $\tau$. Hard thresholding can lose information. For example, two genes correlated with coefficient 0.79 are considered disconnected under a hard threshold $\tau = 0.8$. The signum adjacency function yields an unweighted network. Thus, the gene co-expression network is represented by the adjacency matrix $A = [a_{ij}]$, where $a_{ij}$ is one if nodes $i$ and $j$ are connected and zero otherwise; the diagonal elements are set to zero. Let us denote by $g^E$ the gene co-expression network with respect to the MES $E = \langle N; S_D; S_R; A^{S_D}; A^{S_R} \rangle$.
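The chain from expression values to the network $g^E$ is sketched in Python below. The absolute Pearson correlations give $S$, the signum function with threshold $\tau$ gives $A$, and the nonzero upper-triangular entries of $A$ give the links of $g^E$. The function names and the toy matrix are illustrative assumptions. `numpy.corrcoef` treats each row as a variable, so genes are rows and samples are columns.

```python
import numpy as np


def similarity_matrix(R):
    """S = [s_ij] with s_ij = |cor(i, j)|; rows of R are genes, columns are samples."""
    return np.abs(np.corrcoef(np.asarray(R, dtype=float)))


def signum_adjacency(S, tau):
    """Hard threshold: a_ij = 1 if s_ij >= tau and 0 otherwise; no self-loops."""
    A = (S >= tau).astype(int)
    np.fill_diagonal(A, 0)
    return A


def network_from_adjacency(A):
    """g^E as a set of links (i, j) with i < j, genes numbered from 1."""
    n = A.shape[0]
    return {(i + 1, j + 1) for i in range(n) for j in range(i + 1, n) if A[i, j]}


# Hypothetical expression profiles: 4 genes x 4 samples.
R = np.array([[1, 2, 3, 4],
              [2, 4, 6, 8],
              [4, 3, 2, 1],
              [1, 3, 2, 4]], dtype=float)
S = similarity_matrix(R)
A = signum_adjacency(S, tau=0.9)
gE = network_from_adjacency(A)
print(gE)
```

Genes 1, 2 and 3 are perfectly (anti-)correlated and form a triangle. Gene 4 correlates with every other gene at $|r| = 0.8$, so it stays isolated at $\tau = 0.9$ and does not appear in $g^E$.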
The following example is a slight modification of Example 1 in [16] (p. 259) and illustrates how a gene co-expression network is obtained from an MES. Example 1. Consider the MES $E = \langle N; S_D; S_R; A^{S_D}; A^{S_R} \rangle$ such that the normal sample $A^{S_R}$ and the diseased sample $A^{S_D}$ are reported in the following tables, respectively. The dataset of a microarray experiment is presented in terms of the logarithms of the relative gene expression ratios of the target sample with respect to the reference sample. A positive number indicates a higher gene expression in the target sample than in the reference one, whereas a negative number indicates a lower expression in the target sample. Now, construct a real matrix from the expression matrices by using a discriminant method $m$ such that for each $i \in N$ and each $j \in S_D$: The corresponding real matrix is: In this matrix, zero represents the normal genes, and the real numbers represent the diseased genes with the expression levels of the respective samples. The similarity matrix $S$ with respect to $R^{(E,m)}$ is given by: With the soft threshold $\beta = 1$, $S$ itself represents a weighted network in which all genes are connected to each other with some weights. By choosing the power $\beta$ appropriately, the resulting network can be made to display an approximately scale-free topology. However, one potential drawback of the soft threshold is that the network remains fully connected, which makes the relationships among the nodes hard to track. Therefore, selecting a suitable threshold that retains only connections above a certain weight is a critical step. After applying a threshold, we obtain an unweighted network. Let us take $\tau = 0.8$ for the sake of illustration. Then, the adjacency matrix corresponding to $S$ becomes:

Microarray Network Games
Once the co-expression network $g^E$ has been constructed, i.e., the adjacency matrix has been formed, we define a value function $v$ on $G$, the set of all possible networks on $N$. Let $N(g^E)$ and $n(g^E)$ denote, respectively, the set of genes and the number of genes that form the network $g^E$. For instance, in Example 1, $N(g^E) = \{1, 2, 3, 4, 5\}$ and $n(g^E) = 5$.

Definition 12.
Given the co-expression network $g^E \in G$, let the support $sp(i)$ of gene $i \in N$ in $g^E$ be the set of links in $g^E$ in which gene $i$ is involved, i.e., $sp(i) = \{ij : ij \in g^E,\ j \in N(g^E)\}$. Therefore, in the standard notation, $sp(i) = g^E_i$.
Consider the network $g^E = \{12, 23, 45\}$ in Example 1. The supports of the respective genes are $sp(1) = \{12\}$, $sp(2) = \{12, 23\}$, $sp(3) = \{23\}$, $sp(4) = \{45\}$, and $sp(5) = \{45\}$.

Definition 13. Let $N = \{1, 2, \dots, n\}$ be the set of genes. Consider an MES $E = \langle N; S_D; S_R; A^{S_D}; A^{S_R} \rangle$ and the corresponding gene co-expression network $g^E$. A microarray network game with respect to $E$ and $g^E$ is the triple $(N, v, g^E)$, where $(N, v)$ is a network game with the value function $v$. This value function assigns to each $g \in G$ the average number of genes of $g^E$ whose connections (supports) are contained in $g$. Formally, we define the value function $v: G \to \mathbb{R}$ as

$$v(g) = \frac{|\{i \in N(g^E) : sp(i) \subseteq g\}|}{n(g^E)} \quad \text{for each } g \in G.$$

Thus, the value function $v$ determines the collective influence of a set of genes that are connected through a co-expression network. In practice, $v(g)$ counts, over all components of $g$, the genes whose links all lie in $g$, and divides this count by $n(g^E)$. These are the genes whose co-expression partners are involved together with them in the onset of the disease, as determined by the network $g$. It follows that, in a microarray network game $(N, v, g^E)$, the value function $v$ can be written equivalently as a sum of the basis games $v_g$ defined in Equation (4):

$$v = \sum_{g \in G \setminus \{\emptyset\}} \alpha_g(v)\, v_g, \tag{14}$$

where we choose the coefficients $\alpha_g(v) = \bar{\alpha}_g(v)/n(g^E)$ with $\bar{\alpha}_g(v) = |\{i \in N(g^E) : g^E_i = g\}|$. If no ambiguity about $N$ arises, we denote a microarray network game by the pair $(v, g^E)$. The class of microarray network games with player set $N$ is denoted by $M_N$.

Example 2.
In Example 1, recall that $g^E = \{12, 23, 45\}$ is the gene co-expression network and $N = \{1, 2, 3, 4, 5\}$ is the set of genes. The value function $v$ of the microarray network game $(v, g^E)$ is given by $v(g) = \frac{1}{5}\,|\{i \in N : sp(i) \subseteq g\}|$ for each $g \in G$. For example, $v(\{12\}) = 1/5$, $v(\{12, 23\}) = 3/5$, $v(\{45\}) = 2/5$, and $v(g^E) = 1$. The value function $v$ captures information that can be used to determine the role of each link in the co-expression of genes by applying suitable solution concepts for network games. It specifies the total value generated by a given network structure. The calculation of the value may involve both costs and benefits in networks. A value function is therefore a richer object than the characteristic function of a microarray game, because the value depends on the network structure in addition to the coalition of players involved [26].
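The following Python sketch evaluates the value function of a microarray network game for the network of Example 2. It assumes that the basis games $v_g$ of Equation (4) are the unanimity value functions, i.e., $v_g(h) = 1$ if $g \subseteq h$ and 0 otherwise. Under that assumption, $v(g)$ reduces to the fraction of genes of $g^E$ whose support lies in $g$. The function names are illustrative, and exact fractions are used so the values can be compared directly.

```python
from fractions import Fraction


def link(i, j):
    """The unordered link ij."""
    return frozenset((i, j))


def supports(gE):
    """sp(i) = g^E_i: the links of g^E in which gene i is involved."""
    genes = set().union(*gE)
    return {i: frozenset(l for l in gE if i in l) for i in genes}


def value(g, gE):
    """Assumed form: v(g) = |{i in N(g^E) : sp(i) contained in g}| / n(g^E)."""
    sp = supports(gE)
    g = frozenset(g)
    return Fraction(sum(1 for s in sp.values() if s <= g), len(sp))


gE = {link(1, 2), link(2, 3), link(4, 5)}
for g in ([link(1, 2)], [link(1, 2), link(2, 3)], [link(4, 5)], gE):
    print(sorted(sorted(l) for l in g), value(g, gE))
```

The network $\{12\}$ contains only the support of gene 1, which gives 1/5. Adding link 23 also covers genes 2 and 3, which gives 3/5.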

LRI for Microarray Network Games and Its Characterization
In Section 2.3, we discussed allocation rules for network games. An allocation rule for microarray network games describes how the value generated by a network is allocated among the genes; we call such a rule the LRI. Define the function $F: G \times M_N \to \mathbb{R}^n$ on the class of microarray network games as follows:

$$F_i(g, v, g^E) = \sum_{h \subseteq g:\, h \neq \emptyset} \alpha_h(v)\,\frac{|h_i|}{2\,|h|} \quad \text{for each } i \in N, \tag{16}$$

where $\alpha_h(v)$ and, hence, $\bar{\alpha}_h(v)$ are defined as in Equation (14). The following example illustrates $F$ for the microarray network game of Example 2.
The numerical values are indicative of the individual contributions of the genes in the network g, given the microarray network game (v, g E ).
In what follows, we define the LRI through properties similar to those used to characterize the position value. Recall that, by the superfluous link property, a link between players that has no influence on the value of any network also has no influence on the allocations of the players in a network. The interpretation of the superfluous link property in the genetic context is simple and intuitive. If a link is deleted from the gene co-expression Thus, we see that $F$ satisfies all the axioms of an LRI. For the converse part, let the function $Y: G \times M_N \to \mathbb{R}^n$ satisfy these properties. Then, $Y$ can be extended to a function $\tilde{Y}: G \times V \to \mathbb{R}^n$ that also satisfies these properties. It is straightforward to show that $\tilde{Y}$ is the position value on $G \times V$ and that $\tilde{Y}|_{G \times M_N} = F$. Thus, by the uniqueness of the position value, $Y = F$. This completes the proof.

Remark 1. In particular, when $g = g^E$ in Equation (16), an equivalent form of the LRI $F_i(g^E, v, g^E)$ can be obtained as follows. Let $N_i(g^E) = N(g^E_i) \setminus \{i\}$ and $n_j(g^E) = n(g^E_j) - 1$. Thus, $N_i(g^E)$ denotes the set of neighbors of $i$ in $g^E$, i.e., all nodes $j \neq i$ that are directly connected to $i$. Likewise, $n_j(g^E)$ is the number of neighbors of node $j$, that is, the degree of $j$ in the graph. Next, consider the game $v_{g^E_i}$ (refer to Equation (5)) with $g^E_i \neq \emptyset$. By Theorem 2, $F(g^E, v_{g^E_i}, g^E)$ satisfies gene link anonymity. Therefore, we have

$$F_i(g^E, v_{g^E_i}, g^E) = \frac{1}{2}, \qquad F_j(g^E, v_{g^E_i}, g^E) = \frac{1}{2\,n_i(g^E)} \ \text{ for } j \in N_i(g^E),$$

and $F_j(g^E, v_{g^E_i}, g^E) = 0$ otherwise. Moreover, by Equation (14) and the additivity of $F$, we have

$$F(g^E, v, g^E) = \frac{1}{n(g^E)} \sum_{j \in N(g^E)} F(g^E, v_{g^E_j}, g^E).$$

It follows that

$$F_i(g^E, v, g^E) = \frac{1}{2\,n(g^E)} \Bigl(1 + \sum_{j \in N_i(g^E)} \frac{1}{n_j(g^E)}\Bigr). \tag{17}$$

Equation (17) suggests that, according to the LRI, a node is more relevant if it is connected to many nodes that are themselves poorly connected. This formula is very close, at least in its interpretation, to the Shapley values given in [19,20] for TU-games defined on a gene network. However, the two approaches are completely different, both in the game formulation and in the definition of the index.
Another important difference between them is that in Equation (17) each node contributes a fixed amount of one to its own relevance, whereas in the formula of the Shapley value in [19,20], it contributes $\frac{1}{n(g^E_i) + 1}$.
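The closed form in Equation (17) can be cross-checked against the additive decomposition used in Remark 1, as the minimal Python sketch below does. The decomposition applies the position value to the unanimity game on each gene's star $g^E_j$: each link of the star receives $1/n_j(g^E)$, and each endpoint gets half of it. This assumes, as in Remark 1, that the basis games are unanimity value functions. The helper names are illustrative, and the example is the network $g^E = \{12, 23, 45\}$ of Example 2.

```python
from fractions import Fraction


def link(i, j):
    """The unordered link ij."""
    return frozenset((i, j))


def neighbours(gE):
    """N_i(g^E): the genes directly linked to each gene i."""
    genes = set().union(*gE)
    return {i: {j for l in gE if i in l for j in l if j != i} for i in genes}


def lri_closed_form(gE):
    """Equation (17): F_i = (1 + sum over j in N_i of 1/n_j) / (2 n(g^E))."""
    nbrs = neighbours(gE)
    n = len(nbrs)
    return {i: (1 + sum(Fraction(1, len(nbrs[j])) for j in nbrs[i])) / (2 * n)
            for i in nbrs}


def lri_from_decomposition(gE):
    """Sum over genes j of (1/n(g^E)) times the position value of the
    unanimity game on the star sp(j): each of its links gets 1/|sp(j)|,
    split equally between the link's two endpoints."""
    genes = set().union(*gE)
    n = len(genes)
    F = {i: Fraction(0) for i in genes}
    for j in genes:
        star = [l for l in gE if j in l]
        for l in star:
            for i in l:
                F[i] += Fraction(1, 2 * len(star) * n)
    return F


gE = {link(1, 2), link(2, 3), link(4, 5)}
print(lri_closed_form(gE))
print(lri_from_decomposition(gE))
```

Both functions give gene 2 the highest index, 3/10. Genes 4 and 5 get 1/5 each, and genes 1 and 3 get 3/20 each. Gene 2's neighbors each have a single link, which illustrates the interpretation of Equation (17). The indices sum to $v(g^E) = 1$.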

Results and Discussions
We tested our model on a previously reported colon cancer dataset [4,16,35,36] (http://genomicspubs.princeton.edu/oncology/affydata/index.html). The dataset contains the expression levels of the 2000 genes with the highest minimal intensity across 62 tissues. The expression data, measured using Affymetrix oligonucleotide microarrays, comprise 40 tumor samples and 22 normal samples. An adjacency matrix, which encodes the edge information for each pair of nodes in the network, is obtained using the signum function based hard thresholding approach. A pair of genes is connected by an edge if their similarity value, calculated using the Pearson correlation, is greater than the threshold. We set the threshold to 0.9 in our experiment.
A network (Figure 1) was constructed from the colon cancer dataset, and its nodes were scored with the LRI (see Section 3). The network was drawn with the igraph [37] package in R [38] from the adjacency matrix obtained after removing isolated points. Node colors encode the link relevance index, from lowest (green) to highest (blue). The Affy IDs of the top 15 genes are used to label the nodes with the highest LRI. The top fifteen genes selected by their LRI, together with their corresponding Shapley values, reflect various cellular mechanisms (Table 1). Most of them were previously observed to be associated with colon cancer. We further analyzed whether the genes were similarly ranked by the two methodologies, viz., the LRI and the Shapley value. The top 100 genes according to the LRI and according to the Shapley value do not overlap (Figure 2A). The top 200, 300, 400, and 500 genes (Figure 2B–E) have 3, 11, 30, and 134 genes in common, respectively, between the two indices. This suggests that the two methodologies score the genes differently and therefore select dissimilar top gene sets.
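The overlap counts reported for Figure 2 come from a simple comparison of two rankings, which can be sketched in Python as follows. For each cutoff k, take the k highest-scoring genes under each index and count the genes the two lists share. The function names and the four-gene scores are hypothetical toy values, not the colon cancer results. Ties are broken by gene identifier, which is an assumption.

```python
def top_k(scores, k):
    """The k genes with the highest scores (ties broken by gene identifier)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene for gene, _ in ranked[:k]}


def overlap_profile(scores_a, scores_b, ks):
    """Number of genes shared by the two top-k lists, for each cutoff k in ks."""
    return {k: len(top_k(scores_a, k) & top_k(scores_b, k)) for k in ks}


# Hypothetical toy scores, not the values obtained on the colon cancer dataset.
lri_scores = {"g1": 0.9, "g2": 0.7, "g3": 0.5, "g4": 0.1}
shapley_scores = {"g1": 0.05, "g2": 0.3, "g3": 0.6, "g4": 0.8}
print(overlap_profile(lri_scores, shapley_scores, [1, 2, 3, 4]))
```

Because the two toy rankings are reversed, the short lists share no genes. The overlap only appears at larger cutoffs, mirroring the qualitative pattern described above.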
The LRI and the corresponding Shapley values of the top 50 genes are plotted to analyze any relationship between them (Figure 3). The distribution of the LRI scores of the top genes differed from that of the Shapley values. The two distributions may follow different trends owing to the different underlying ranking methods. Furthermore, Pearson's correlation also suggests no significant correlation ($R^2 = 0.0833$) between the LRI and the Shapley value. The two methods were found to differ in their overall findings; therefore, the LRI was considered a distinct approach rather than a derived one.

We retrieved the list of all marker genes from the CellMarker database [16,39] that were well characterized and validated experimentally, not just through theoretical estimation. Thereafter, we mapped these marker genes to the corresponding gene names and matched them against the probes of the microarray platform. Three IDs, viz., "Hsa.1240", "Hsa.654", and "Hsa.663", were selected for further analysis (Figure 4). They correspond to the genes ALDH1A1 (M31994), CD24 (L33930), and CD44 (M59040), respectively. Figure 4 shows the LRI values of the 2000 genes, ranked from highest to lowest. We also mark the positions of the three biomarkers, namely CD44 (M59040), ALDH1A1 (M31994), and CD24 (L33930), to show their relative positions in this distribution. The Shapley values of the corresponding microarray game, arranged from highest to lowest, are also presented to compare the distribution patterns and the relative positions of the three biomarkers. The LRI correctly estimated the expected relative positions of these colon cancer biomarkers. The Shapley values followed an exponential pattern across the ranked genes. By contrast, the LRI, which is based on the contribution of each gene to the co-expression network, followed a different nonlinear curve across the 2000 genes.
Colon Cancer Stem Cells (CCSCs) not only have the potential for self-renewal and differentiation, but also exhibit "tumorigenicity" when transplanted into an animal host. CD44 (M59040), expressed on the surface of CCSCs, is reported to have a major role in the progression, survivability, and "tumorigenicity" of such cells. This makes it a potent biomarker and target for diagnosis, biosensing, prognosis, and therapeutics in colon cancer [40–43]. Du et al. (2008) [41] reported the relevance of CD44 as a superior marker and its functional significance in contributing to CCSCs for cancer initiation and progression.
We found that the LRI estimated a higher relevance for CD44 (M59040) by capturing its contribution to the co-expression network and assigning it a higher index of relevance. In contrast, the same gene scored poorly under the Shapley value, which understates its relevance. This indicates that the LRI estimates the relevance of this gene better than the Shapley value does (Table 2, Figure 4).

The gene M31994 encodes aldehyde dehydrogenase 1A1 (ALDH1A1), which catalyzes the oxidation of aldehydes to their corresponding carboxylic acids [44]. It has also been reported that ALDH1A1 is considerably enriched in colon cancer [45,46]. ALDH1A1 has been successfully used as a CCSC marker, as well as a marker in many other cancers, including breast cancer [47,48]. However, studies evaluating the association between ALDH1A1 expression and colon cancer initiation and progression, for prognosis and therapeutics, remain inconclusive [49–53]. Scientists have debated the significance of the role of ALDH1A1 in colorectal cancer. Furthermore, clinical evidence only equivocally supports the application of ALDH1A1 as a prognostic or predictive biomarker in colon cancer [50]. Moreover, most of the aforementioned research articles mentioned the role of CD44 along with ALDH1A1 in cancer initiation, progression, and metastasis.
Using the LRI, the relevance of M31994 (ALDH1A1) in the control case dataset of colon cancer was found to be moderate. Under the Shapley value, however, the same gene scored very high, along with L33930 (CD24). The LRI estimated its position relative to M59040 (CD44) better than the Shapley value did.
CD24 is the product of the L33930 gene and is anchored on the exterior side of the cell membrane. The positive expression and overabundant distribution of CD24 in colorectal cancer are under dispute [52]. A few previous studies reported that CD24 was more highly expressed in a fraction of the colorectal cancer population [54,55]. Furthermore, researchers asserted that CD24 expression is limited to only a small fraction of colon cancer cell lines [56]. However, none of these previous reports refuted the significant role of CD44 in colon cancer cell lines. Instead, experimental evidence indicated that CD44 expression was highly significant in the colon cancer cell lines considered, highlighting its importance in colon cancer development and progression. The same evidence maintained that only a fraction of these cells expressed CD24 [52,54–56]; in the authors' own words, at "a fair level of 5-10%" [56]. They reported that HCT116 and SW480 colon cancer cells were CD44+ cells and that only a subpopulation of these CD44+ cells expressed CD24 [56]. Evidence from clinical studies not only highlighted the marginal contribution of CD24 [52,56] but also stressed the role of CD44 expression in CCSCs in initiating cancer, making CD44 a better biomarker for colon cancer [41,52].
When comparing the three biomarkers, the LRI correctly estimated the marginal contribution of L33930 (CD24) to colon cancer development and progression. The Shapley value, by contrast, scored it very high compared to M59040 (CD44). In fact, the Shapley value scored L33930 (CD24) highest among the three genes, despite previous experimental evidence suggesting its relatively lower relevance. The LRI, however, captured the relative relevance of this gene and positioned it after M59040 (and M31994). Indeed, the LRI ranking indicates that the role of L33930 (CD24) is only incidental and that its expression makes no or only a marginal contribution to colon cancer.
Compared to the Shapley value, the LRI was better able to identify the relative contributions/positions of the three colon cancer biomarkers. The relevance of the same three biomarkers is also evident from experimental studies, including high-throughput single-cell RNA-seq, as reported in PanglaoDB [57].

Pseudocodes for the Gene Co-Expressions Networks' Formation
The symbols given in Table 3 are used to describe our method. The pseudocode of the proposed method is presented in Algorithm 1; its Step 11 obtains $g^E$ from the adjacency matrix $A$.
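Algorithm 1 is not reproduced here. The Python sketch below strings together the steps described in Section 2.4 as one possible end-to-end pipeline: real matrix, similarity matrix, signum adjacency, network $g^E$, and the LRI via Equation (17). It is not the authors' implementation. In particular, the discriminant `real_matrix` is a hypothetical choice: it keeps a diseased value only if it falls outside the gene's range in the reference samples. The toy matrices are also made up. Genes whose rows are all zero have undefined correlations; these are set to 0 so that such genes end up isolated and drop out of $g^E$.

```python
import numpy as np


def real_matrix(A_ref, A_dis):
    """Hypothetical discriminant m: keep a diseased value when it lies outside the
    gene's range in the reference samples, otherwise set it to 0 (normal)."""
    lo = A_ref.min(axis=1, keepdims=True)
    hi = A_ref.max(axis=1, keepdims=True)
    abnormal = (A_dis < lo) | (A_dis > hi)
    return np.where(abnormal, A_dis, 0.0)


def coexpression_network(R, tau):
    """S -> A -> g^E: |Pearson| similarity, signum threshold, then the link set."""
    with np.errstate(divide="ignore", invalid="ignore"):
        S = np.abs(np.corrcoef(R))  # all-zero (normal) rows give NaN
    S = np.nan_to_num(S)            # treat undefined correlations as 0
    A = (S >= tau).astype(int)
    np.fill_diagonal(A, 0)
    n = A.shape[0]
    # Isolated genes never appear in a link, so they drop out of g^E.
    return {(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j]}


def lri(gE):
    """Equation (17): F_i = (1 + sum over neighbours j of 1/n_j) / (2 n(g^E))."""
    genes = sorted({i for l in gE for i in l})
    nbrs = {i: [j for l in gE if i in l for j in l if j != i] for i in genes}
    n = len(genes)
    return {i: (1 + sum(1 / len(nbrs[j]) for j in nbrs[i])) / (2 * n) for i in genes}


A_ref = np.array([[0.0, 0.1, -0.1]] * 4)   # toy reference samples
A_dis = np.array([[1, 2, 3, 4],
                  [2, 4, 6, 8],
                  [0, 0, 0, 0.05],          # within the reference range
                  [4, 3, 2, 1]], dtype=float)
R = real_matrix(A_ref, A_dis)
gE = coexpression_network(R, tau=0.9)
scores = lri(gE)
print(gE, scores)
```

In this toy run, gene 2 is normal in every sample and drops out. The remaining three genes form a triangle in which each receives an LRI of approximately 1/3.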

Conclusions
The identification of salient genes that mediate cancer etiology, progression, or therapy response is a challenging task due to the complexity and heterogeneity of cancer data. In a network game, the challenge is to determine how players form a network, how they accrue a value from its formation, and how that value is finally allocated among the participating players. In this paper, we introduced the notion of a microarray network game to highlight the application of network games in gene expression analysis related to disease onset. We obtained the Link Relevance Index (LRI) to quantify the significance of the genes in a Microarray Experimental Situation (MES). By analyzing a real-world dataset, we compared our model with an existing game theoretic model in identifying the salient genes responsible for colon cancer. Ranking genes by their Shapley values rarely identified genes in line with expectation. The LRI model was validated by its ability to identify the relative relevance of three biomarkers of colon cancer. The analysis of these biomarkers established not only the validity of the LRI method but also its advantage over the Shapley value in finding the salient genes. In all three biomarker cases, the LRI scored the genes according to their relative relevance and was thus able to identify salient genes in comparative expression studies. Moreover, compared with the Shapley value, the results of the LRI method are closer to previously reported immunohistochemical assays and cancer genetic experiments. These results suggest that our proposed model is superior and that the top genes in the network contribute to the development of colon cancer. The proposed model can be extended to study similar problems related to other genetic or metabolic syndromes.