The Expression of Embryonic Liver Development Genes in Hepatitis C Induced Cirrhosis and Hepatocellular Carcinoma

Hepatocellular carcinoma (HCC) remains a difficult disease to study even after a decade of genomic analysis. Patient and disease heterogeneity, differences in statistical methods and multiple testing issues have resulted in a fragmented understanding of the molecular basis of tumor biology. Some researchers have suggested that HCC appears to share pathways with embryonic development. Therefore we generated targeted hypotheses regarding changes in developmental genes specific to the liver in HCV-cirrhosis and HCV-HCC. We obtained microarray studies from 30 patients with HCV-cirrhosis and 49 patients with HCV-HCC and compared to 12 normal livers. Genes specific to non-liver development have known associations with other cancer types but none were expressed in either adult liver or tumor tissue, while 98 of 179 (55%) genes specific to liver development had differential expression between normal and cirrhotic or HCC samples. We found genes from each developmental stage dysregulated in tumors compared to normal and cirrhotic samples. Although there was no single tumor marker, we identified a set of genes (Bone Morphogenetic Protein inhibitors GPC3, GREM1, FSTL3, and FST) in which at least one gene was over-expressed in 100% of the tumor samples. Only five genes were differentially expressed exclusively in late-stage tumors, indicating that while developmental genes appear to play a profound role in cirrhosis and malignant transformation, they play a limited role in late-stage HCC.


Background and Motivation
Hepatocellular Carcinoma (HCC) ranks fifth among all cancers and third in mortality, accounting for hundreds of thousands of deaths per year. One-year survival rates remain less than 50% in the United States, despite advances in therapy [1]. HCC is the primary cancer morbidity evolving over decades in underlying hepatitis C (HCV) liver pathology in North America and Japan. HCC development is generally thought to be a multistep process resulting from hepatocyte turnover, chronic inflammation, regeneration, oxidative stress, DNA damage, and cirrhosis, as well as direct viral injuries. Unfortunately, the specific molecular mechanisms underlying hepatocarcinogenesis remain unclear.
In the last ten years, microarray technology has been a powerful tool to study the molecular basis of disease. By measuring whole-genome transcript levels, expression patterns associated with liver dysfunction have been examined. However, HCC remains a difficult disease to study, with widely variable findings between studies and several proposed non-overlapping gene signatures [2][3][4][5][6][7][8][9][10][11][12]. This is likely due not only to the biological heterogeneity of HCC pathogenesis, but also reflects the varied clinical background of patients and variation in statistical technique. There are significant statistical challenges which plague the analysis and interpretation of microarray experiments. Differences in technique in every stage of data pre-processing have been demonstrated to dramatically affect the end results, including background correction [13], normalization [14,15], and probe set summarization [16].
In addition, traditional statistical approaches are not particularly well-suited to cancer genomic data. When simultaneously comparing many thousands of genes, multiple testing problems are considerable. Worse, because the assumption of independent tests is often violated, actual false positive rates can be much higher than estimated by standard methods [13][14][15]. This implies that, even using conservative multiple testing correction methods, cancer studies which generate thousands of significantly differentially expressed genes could have several hundred false positive results. To reduce this problem, we used a knowledge-driven approach, using what is already known about normal and disease processes to generate hypotheses that can be tested with a relatively small number of statistical tests [17][18][19].
Another difficulty stems from the heterogeneity of cancer processes, in which changes in the expression of important genes occur only in subsets of tumors. This results in skewed density curves (sometimes even bi-modal) that may not be easily detected by means-based tests. Most statistical tests in common use are based on comparing the magnitude of mean change relative to the variation. These tests also place focus on the largest magnitude changes which are often products of tumor behavior, such as increased metabolism and cell proliferation/turnover, rather than drivers that often have smaller fold-changes [20]. We suspect that there are modest changes in the expression of critical genes that may be difficult to distinguish from "noise" in the data, but may have a significant impact on tumor development [21,22]. To address this we applied techniques that find patterns in the data (such as dimension reduction techniques), focused on such genes, to identify those that drive tumor behavior even if they have modest expression changes or skewed distribution patterns.

Questioning Biological Randomness in HCV-HCC
Tumors are widely believed to arise through an accumulation of random mutations. Mutations and chromosomal instability have been demonstrated in several carcinomas, including non-Hodgkins lymphoma, ovarian, colorectal, and oral tissue types [23][24][25][26][27]. HCC arising from chronic HBV infection has also been associated with mutations via HBx gene interference with p53 binding, leading to faulty DNA repair mechanisms [28,29]. It would be natural to think that the widespread dysregulation of gene expression in HCV-HCC is also largely random. However, HCV-HCC may be unusual because hepatitis C is an RNA virus that codes proteins that have direct interaction with over thirty host proteins. Tumors emerge from an environment of decades of host response to infection and liver damage. Therefore we hypothesize that induction of HCC in chronic HCV liver pathology may depend more on host response to chronic infection and HCV-host interactions than on direct DNA damage. If this is true, the effects of the HCV virus will be seen in the perturbation of the "tools at hand": gene expression changes that might be expected include modified expression of genes already in use in the liver (including genes expressed by activated hepatic stellate cells), target genes of host proteins that HCV proteins interact with, and genes used in the liver's own life history. Such genes contain the specific transcription factor binding sites (TFBS) that are responsive to the transcription factors expressed in the liver, while genes that are not normally expressed in the liver are responsive to different promoters. For instance, the promoter region for FGF7 (expressed in the embryonic liver) contains binding sequences for ATF2, FOXD1, HNF3B, STAT3, and JUN which are all expressed in the liver and dysregulated in liver disease. This reasoning also implies that genes never expressed by a healthy liver would not be expected to be activated by HCV-induced tumors to the same extent as in HBV-HCC or other cancers.
To further target our hypotheses, we reviewed the current knowledge of processes involved in HCC. For instance, it has recently been noted that there appear to be pathways common to both cancer and embryonic development in HCC and other cancers [30,31]. In the context of the hypothesis of non-random response to HCV as described above, this led us to question whether any developmental genes involved in HCC are specific to liver development, and if paralog genes (similar in structure and function in other tissues) remain dormant. In this paper we demonstrate that HCV-induced liver cirrhosis and HCC do indeed show a general pattern of differential expression of liver development genes compared to paralog genes that have similar roles in the development of other tissues. Many of these developmental genes are up-or down-regulated in cirrhotic livers in a coherent way (clustering closely together), then degenerating into widely variable expression patterns in tumors. Some of the genes identified in this manner are already associated with HCC, while others appear to be novel. We also observed that some of these important embryonic signals are secreted from mesodermal tissues during development. These same signaling molecules may be secreted from mesodermally-derived stellate cells in adults. However, these cells comprise less than five percent of adult liver volume, which may result in an observed low signal that may have been difficult to distinguish from noise in previous studies.

Overview of Liver Development
Liver development is a multi-stage process orchestrated by nearly 200 master regulators, growth factors, and their receptors. Growth factors secreted externally and from within the developing liver bind receptors on the surface of liver cells, which transduce signals to transcription factors (TFs) within the nucleus. These transcription factors, either individually or as co-factors, regulate a complex program of inducing or repressing access to gene transcription by a number of activities including chromatin decompaction, recruitment of chromatin remodeling complexes, and histone marker methylation, demethylation, or acetylation, as well as by physically blocking or recruiting RNA polymerase. For example, some of the earliest TFs that induce hepatic fate from the endoderm (FOXA1-3, GATA4/6) open the chromatin structure around early hepatic markers such as albumin and alpha fetoprotein (AFP), while β-catenin (CTNNB1), LEF1, and TCF3 recruit chromatin remodeling complexes, and ARID5B and ATF2 modify histone markers of target genes. See Si-Tayeb et al. for a recent review [32] of embryonic liver development.
In the neonatal period, many of these regulators are repressed in most parts of the liver, while others maintain some level of activity throughout adult life either playing roles in metabolism and homeostasis or in maintaining niches of hepatic stem cells.

Description of Patient Population
Microarray studies were obtained over a 10 year period from 180 samples of cirrhotic tissue and tumors collected from 140 patients with chronic HCV infection. As described in the Experimental Section, stringent quality control criteria were applied minimize technical artifacts, leaving us with 30 HCV-cirrhosis (CIR) and 49 HCV-HCC (HCC) tumors. These were compared to a control group of 12 samples taken from non-diseased, deceased donor livers (NOR). Tissue was obtained from 31 early HCC (stage T1 and T2) and 18 late HCC (T3 and T4). Twenty-nine HCC patients were transplanted, six died on the waiting list and 14 were never listed for transplant due to age, stage of cancer, or other co-morbidities.

Differentially Expressed Genes Are Specific to Liver Development
We hypothesized that genes not normally expressed in adult livers are less likely to be transcriptionally activated in HCV-HCC. To test this, we identified a set of genes with zero counts in an RNA sequencing study on a normal liver sample. 1,399 were represented on the Affymetrix U133Av2 genechip. We tested for expression in HCC compared to normal samples in our dataset using a one-sided Kolmogorov-Smirnov (K-S) test of identical distribution. Of the 1,399 genes, none were expressed in the HCC samples (α < 0.001).
More specifically, we wished to determine whether the general developmental pathways altered in HCC were using genes specific to liver development, or whether any member of the gene family might be engaged. To examine this, we identified 26 paralog genes that have highly related developmental functions in other tissues that are not expressed in normal healthy livers (based on the RNA sequencing data and verified with a literature search). In our data, no paralogs were expressed in HCC compared to normal samples (α < 0.001) (Supplementary Table S1, column A). We validated this to an independently collected HCV-HCC dataset from Wurmbach et al. (see Experimental Section), which also had no expression of these paralog genes (Supplementary Table S1, column B). Next, we compared the expression patterns of liver genes to their non-liver paralogs in tumors. Most (31/33, 94%) of the liver genes had significantly different expression distributions compared to their paralog (Supplementary Table S1, column C). Figure 1 shows examples of the typically observed patterns of expression and poor regulation. Even when a liver gene is not differentially expressed in HCC, it often seems to be more poorly regulated with a wider distribution pattern than is seen in normal tissue. were not expressed in HCC or normal samples, while liver development genes RXRA and SOX9 were differentially expressed in HCC. These patterns were also observed in the Wurmbach dataset.

Normalized signal intensity
Normalized signal intensity

Patterns of Expression in HCV-Cirrhosis and HCV-HCC
We then identified patterns of expression for liver development genes in HCV-CIR and HCV-HCC. To capture changes in either mean or variation, we assessed significance with a combined p-value from both t and F tests. We were particularly interested in whether the genes associated with progression from cirrhosis to cancer were those particular to a certain developmental stage. However, differentially expressed genes were found from all stages of development, and PCA plots of stage-specific genes all showed discrimination between normal, cirrhotic, and tumor samples (data not shown).
Twenty-nine developmental genes had a pattern of higher magnitude over-expression in cirrhosis, then declining values in HCC (Table 1). These include several extra-cellular matrix (ECM) genes and members of the TGFβ/BMP signaling pathway. Transcription factors following this expression pattern include SOX9, GATA6, HAND2, and IRS2. The only genes differentially expressed between early (T1 and T2) and late-stage (T3 and T4) tumors were EPCAM and tumor suppressor KLF6. Nine genes were down-regulated in cirrhosis and remained low in tumors: transcription factors FOXA1, FOXA2, GATA4, HNF1A and STAT3; growth factor receptors ACVR2B, RXRA; signaling molecule NRTN, and MMP15, which degrades intracellular matrix proteins. Sixteen genes were differentially expressed uniquely in the tumors ( Table 2). Magnitude of change was modest, however six of the 16 genes were transcription factors or activators (TBX3, HHEX, ATF2, FOXM1, PROX1 and STAT3) which might be expected to induce large effects with small expression changes. Thirty-five more genes were differentially expressed in cirrhosis and either had similar expression in tumors (25 genes) or had larger magnitude changes in HCC (10 genes). For example, GPC3 was up-regulated slightly in cirrhosis (×2), and greatly up-regulated in both early (×7.2) and late (×10.4) tumors. Overall, 98 of the 179 (55%) genes critical to liver development had altered expression patterns in cirrhosis and early stage tumors. Only five of the genes were uniquely changed in late-stage tumors: ACVR1, HMGA2, IGF2, CP and YAP1. A complete list of all 179 liver development genes tested, and relative expression in each disease group can be found in Supplementary Table S2.

Functional Gene Sets That Discriminate between Normal, Cirrhosis, and Tumor Samples
From these significant liver development genes we have been able to identify some specific functional groups. Genes related to extra-cellular matrix (ECM) maintenance or remodeling demonstrated major changes in both cirrhosis and tumors. PCA of the significantly changed genes demonstrate that these genes also independently discriminate between normal, cirrhosis, and tumor samples (Figure 2A). The BMP signaling pathway is also highly dysregulated in HCV-cirrhosis and HCC. BMP2, its receptors, and BMP inhibitors are all differentially expressed in cirrhosis and HCC. Embryonically, BMP signaling is antagonistic to FGF signaling and this balance is controlled by the DAN family of BMP antagonists from mesenchymal cells and GPC3 expressed by hepatocytes. BMP2, BMPR1A, FGF7, FGFR2 and ID3 were more highly expressed in cirrhosis than HCC, while the BMP inhibitors were more highly expressed in tumors. At least one of the inhibitors GPC3, GREM1, FSTL3, and/or FST were over-expressed (FC > 1.5) in 100% of tumor samples (Supplementary Table S3). A PCA plot demonstrates the gene set's ability to discriminate most tumors from normal and cirrhosis samples ( Figure 2B). Leave-one-out cross-validation (LOOCV) of ROC curves assessed predication sensitivity of 95.9% and specificity of 83.3%.  These findings were validated against the Wurmbach dataset by applying the PCA loadings generated from our data to their dataset, and we observed similar patterns of separation between normal, cirrhosis, and HCC tissues (Figure 2A,B). The ECM and BMP genesets were well described with the first two principal components.

Discussion
Hepatocellular carcinoma is widely recognized as a highly heterogenous disease, which has made it difficult to characterize. However, because HCV is RNA virus that exerts its effects by direct protein interactions and host response to chronic infection and inflammation, we questioned the common perception that the gene expression changes observed in tumors are likely due to random activation of genes arising from mutations and DNA damage. We screened a total of 1,425 genes that were demonstrated by RNA seq analysis to have no expression in healthy liver. None of them were expressed in our HCV-HCC samples. This data suggests that the progression from cirrhosis to HCC in patients with chronic HCV infection is not accompanied by random gene activation, as has often been supposed. In particular, 26 of these genes have developmental roles highly paralogous to liver development genes and known association with carcinoma of other tissues, but remain under tight transcriptional control in both cirrhotic and HCC tissues. For example, BMP3 inactivation is associated with colorectal cancer [33], gastric cancer [34], and pancreatic cancer [35]; CDH3 is over-expressed in several types of cancer including breast [36] and pancreatic [37]; FGF3 is amplified in ovarian [38], breast [39], and bladder cancer [40]; FGF12 is amplified in esophageal squamous cell carcinoma [41] and squamous cell carcinoma of the lung [42], and GATA1 mutations have recently been associated with acute megakaryoblastic leukaemia [43] and breast cancer [44].
Using a knowledge-driven approach, we have shown that genes critical to every stage of liver development are differentially expressed in HCV-cirrhosis and further deregulated in tumors. In general we observed a pattern of modest changes in genes differentially expressed in cirrhosis and HCC. Mean fold-change in tumors was <2× up or down in many cases. Some of these genes are expressed by mesodermally-derived tissues in the embryonic liver and thus may also be expressed exclusively by the activated hepatic stellate cells (HSCs) in the adult liver. Since these cell types make up less than five percent of the liver volume in HCC, even large changes in gene expression from the HSC would result in modest overall signal change in a tissue sample . BMP2, BMP4, BMPR1A, HGF,  LHX2, FST, SOX9, GREM1, FGF7, TGFB1, MMP2, TIMP2, SMAD7, KLF6 and JUN have experimentally validated expression in activated hepatic stellate cells [45][46][47][48][49]. All of these genes except BMP4, HGF, and LHX2 were differentially expressed in our study. As predicted, most of these have modest but significant mean fold-change.
The major pattern that emerged from our detailed analysis of these important genes was the BMP pathway. Embryonically, BMP2 and BMP4 are secreted from the mesenchymal STM and act antagonistically against FGF2 and FGF4 to regulate rate of proliferation. Later in development, FGF7 and FGFR2 promote differentiation into biliary epithelial cells. In healthy adults, BMP2 is secreted from hepatic stellate cells to suppress hepatocyte proliferation. During normal liver regeneration (such as after partial hepatectomy), BMP2 is down-regulated and hepatocyte proliferation is activated by at least four distinct and redundant mechanisms: FGF7 and FGFR2 [50]; KDR (aka VEGFR2), ID1, HGF, and WNT2 [51]; TNF, NFKB1, STAT3, and IL6 [52]; and β-catenin and Cyclin D1 [53]. During active hepatitis C infection, the HCV core protein induces over-expression of BMP2, which then participates in the activation of hepatic stellate cells and also functions to suppress hepatocyte proliferation in response to liver damage. In our dataset, late-stage cirrhotic livers had elevated levels of BMP2, its receptor, and some of its downstream effectors (SMADs and ID transcription factors). The proliferation promoters noted above were generally unchanged in cirrhotic tissue, while TNF and β-catenin were under-expressed. FGF7 and FGFR2, however, were over-expressed in both cirrhosis and HCC. This suggests a complex and somewhat chaotic response to active HCV infection and liver damage in the cirrhotic liver. However, many cirrhotic tissues and every tumor in our samples over-expressed at least one of the BMP inhibitors (FC > 1.5, see Supplementary Table S3), with higher expression in tumors. This data suggests there is a chronic struggle in the cirrhotic liver to maintain homeostasis of the BMP pathway, and that tumors might have found a way to overcome the proliferative inhibition imposed by BMP2 by sufficiently up-regulating any of the BMP inhibitors. In the tumors, most of the proliferation promoters had expression values similar to that of normal tissue (which is sufficient to induce proliferation in the absence of active BMP2), including those that were down-regulated in cirrhosis.
Several of the developmental genes expressed in adult livers have been shown to have roles in maintaining tissue differentiation or regulating cell proliferation. BMPs and their DAN family antagonists have opposing effects on Wnt signaling and the BMP-GREM-Wnt circuit has been proposed as a mechanism to maintain stem cell niches in the colon [54]. FOXA1, FOXA2, HLX and SFRP5 are expressed in progenitor cells, while SOX9, SOX17, FOXA2 and KIT have confirmed expression in hepatic stem cells [55]. FOXA1-3 drive differentiation of hepatocyte stem cells. Merlin (NF2) has recently been proposed as a regulator of liver stem cells, with deletion leading to HCC in rat models [56]. In our data, FOXA1, FOXA2, MST1 and NF2 were down-regulated, suggesting that in this population there is a general pattern of de-differentiation of hepatocytes within tumors and acquisition of stem-cell-like properties.

Study Population
Since 1997, HCV patients diagnosed with cirrhosis and HCC have been evaluated and treated at the Hume-Lee Transplant Center at VCUHS according to an Institutional Review Board approved study protocol [57]. Informed consent was obtained from all patients. After staging, HCC patients had their tumors ablated and were evaluated for liver transplant according to the United Network for Organ Sharing criteria. Tissue samples were collected from biopsies and explanted livers according to protocols established by the Liver Tissue Cell Distribution System (Richmond, Virginia, funded by NIH Contract #N01-DK-7-0004/HHSN267200700004C). Control liver samples were obtained from explanted donor livers. Donor livers were shown to have normal function and were negative for hepatitis C virus antibodies.
An independently published dataset of HCV-cirrhosis and HCC was also obtained for verification of results [58] (Wurbach et al., NCBI GEO database accession GEO6764). In addition, the absolute expression levels of target genes in a normal adult human were obtained from the BodyMap gene expression database [59] and mapped with Burrows-Wheeler Alignment tool (BWA) [60] with default parameters. In cases where BodyMap results were inconclusive (counts of 0-40), a literature search was performed to confirm adult expression of target genes.

Sample Preparation
Pre-transplant biopsies and explanted livers were sectioned and grossly examined. Samples from tumors and cirrhotic liver tissue (according to diagnosis and pathological examination) were freshly snap-frozen and processed in the Hume-Lee Transplant Center Molecular Diagnostic Laboratory. Liver tissue samples were collected in liquid nitrogen or RNAlater solution (Ambion, Austin, TX, USA) and stored at −80 °C until use. Explanted livers were sliced at intervals of 4-5 mm, and all nodules suspicious for HCC processed for light microscopy. Only tumor samples with more than 85% tumor cell content were used for the microarray studies. Normal and necrotic tissues were macro-dissected from the sample.
With minor modifications, the sample preparation protocol follows the Affymetrix GeneChip Expression Analysis Manual (Affymetrix, Santa Clara, CA, USA). After hybridization and scanning, the microarray images were checked for major chip defects or abnormalities in the hybridization signal. Total RNA quality and integrity of each sample were analyzed using the Agilent 2100 Bioanalyzer (Agilent Technologies), and products of cDNA synthesis and in vitro transcription (IVT) were tested before being considered for microarray analysis using the Agilent 2100 Bioanalyzer (cDNA synthesis 1.5 kb < cDNA < 5.0kb; IVT 1.0kb < cRNA < 4.5 kb).
All chips in the study were examined with several quality control tests [61,62]. Any chip that fell well outside the recommendations for any of the quality assessment tests was excluded from further analysis.

Statistical Methods
Data files were read into the R (version 2.13) programming environment using updated probe annotations from the BrainArray project (version 14.1.0, HGU133A2_Hs_REFSEQ), which have been shown to improve accuracy of probe-gene mapping over the standard Affymetrix annotation [63].
Robust Multichip Average (RMA) pre-processing is broadly accepted as robust, easy to implement, and widely applicable. However, we were concerned that the assumptions of RMA may not be met (that only a small proportion (1-5%) of genes are differentially expressed, and that about the same number of genes are over-vs. under-expressed). Therefore, we processed a test dataset using RMA and carefully examined differential expression results. Comparison of group contrasts identified 25-45% genes that were called as differentially expressed at an FDR < 0.05, and 2/3 of those genes were over-expressed in tumors. This suggested to us that quantile normalization may not be an appropriate method for this dataset. Instead, we first performed background correction, then normalized the data using non-parametric, distribution-free regression on technical covariates of probes (GC content, melting temperature, and probe location) to estimate and correct for systematic bias [64].
We used scaled Principal Components Analysis (PCA) on specific gene sets to explore the relationship between those genes and disease behavior. We used the function "prcomp" with scale = TRUE in the R environment [65].
Normally regulated genes generally have a narrow distribution with small variation, while poorly regulated genes may have broad, flat distribution of expression values or long thick tails and high variability. Genes that are differentially expressed in a subset of tumor samples will have a skewed distribution. Either of these situations violate the assumption of normal distribution and can be difficult to identify with a t-test. Instead, we used F-test of variance to identify high-variance genes and the non-parametric two-sample Kolmogorov-Smirnov Test to test differences in both location and shape of the distributions.

Identification of Test Genesets
In this study, we investigated the hypothesis that genes critical to the development of the liver are poorly regulated or preferentially activated in HCV-cirrhosis and HCV-HCC. There are five main stages of liver development: hepatic fate specification, hepatoblast migration, liver bud growth, hepatocyte/biliary differentiation, and maturation. We identified 179 genes (via literature review) that are experimentally determined to be critical regulators necessary for normal liver development (See Supplementary Table S2 for a complete list). We also wished to determine whether liver development genes are more likely than their non-liver paralogs (which have highly similar functions in the development of other tissue types) to be dysregulated in liver tumors. We identified 26 such genes that are not normally expressed in either embryonic or adult liver tissue (Supplementary Table S1).

Conclusions
In summary, there is growing awareness that hepatocarcinogenesis shares many features with liver organogenesis. Recent reviews have compared the pathways that appear to be involved in both development and cancer [66,67]. We show here, for the first time to our knowledge, that tumors dysregulate the genes specific to their own development rather than closely related genes in the same pathways. We further demonstrate that these closely related paralogs, which have established roles in cancers of other tissues, remain under good transcriptional control in HCV-induced cirrhosis and HCC even in late-stage tumors. We have found that the earliest signals responsible for guiding the initiation and development of the embryonic liver are correlated with the development of cirrhosis and HCC due to chronic Hepatitis C infection, but do not appear to be involved in the progression from early to late stage cancer. These observed patterns of expression were validated in an independently collected HCV-HCC microarray experiment. By applying what is known about the control of these genes in normal liver development and their aberrant control in HCC, we can begin to model knock out and deletion experiments to further define how these pathways can be interrupted or stimulated to impact HCC occurrence or recurrence. Table S1. Liver development genes compared to their non-liver paralogs. A: one-sided K-S test of identical distribution comparing HCC to normal samples; B: on-sided K-S test in the Wurmbach dataset; C: K-S test comparing the liver development gene to it's non-liver paralog.