Meta-Analysis of Grainyhead-Like Dependent Transcriptional Networks: A Roadmap for Identifying Novel Conserved Genetic Pathways

The Drosophila grainyhead (grh) and vertebrate Grainyhead-like (Grhl) transcription factors are among the most critical genes for epithelial development, maintenance and homeostasis, and are remarkably well conserved from fungi to humans. Mutations affecting grh/Grhl function lead to a myriad of developmental and adult onset epithelial disease, such as aberrant skin barrier formation, facial/palatal clefting, impaired neural tube closure, age-related hearing loss, ectodermal dysplasia, and importantly, cancers of epithelial origin. Recently, mutations in the family member GRHL3 have been shown to lead to both syndromic and non-syndromic facial and palatal clefting in humans, particularly the genetic disorder Van Der Woude Syndrome (VWS), as well as spina bifida, whereas mutations in mammalian Grhl2 lead to exencephaly and facial clefting. As transcription factors, Grhl proteins bind to and activate (or repress) a substantial number of target genes that regulate and drive a cascade of transcriptional networks. A multitude of large-scale datasets have been generated to explore the grh/Grhl-dependent transcriptome, following ablation or mis-regulation of grh/Grhl-function. Here, we have performed a meta-analysis of all 41 currently published grh and Grhl RNA-SEQ, and microarray datasets, in order to identify and characterise the transcriptional networks controlled by grh/Grhl genes across disparate biological contexts. Moreover, we have also cross-referenced our results with published ChIP and ChIP-SEQ datasets, in order to determine which of the critical effector genes are likely to be direct grh/Grhl targets, based on genomic occupancy by grh/Grhl genes. Lastly, to interrogate the predictive strength of our approach, we experimentally validated the expression of the top 10 candidate grhl target genes in epithelial development, in a zebrafish model lacking grhl3, and found that orthologues of seven of these (cldn23, ppl, prom2, ocln, slc6a19, aldh1a3, and sod3) were significantly down-regulated at 48 hours post-fertilisation. Therefore, our study provides a strong predictive resource for the identification of putative grh/grhl effector target genes.


Introduction
Among the earliest tissues formed during embryonic development are epithelia, the thin, cellular layers that line the surfaces of all organs and body cavities. More than just semi-permeable "barriers", epithelia are, in fact, highly dynamic tissues, actively regulating embryonic growth and expansion, as well as providing essential biochemical and mechanical signal-transduction cues. Understanding the nature of the genetic networks that regulate normal epithelial development is therefore a cornerstone of understanding birth defects due to epithelial aberrations. These not only include significant congenital abnormalities, (e.g., neural tube defects, cleft lip/palate, intestinal dysfunction), but, importantly, also underpin numerous epithelial disorders in later life (e.g., psoriasis, eczema, ectodermal dysplasia and, critically, cancers). Therefore, identifying the genes that drive epithelial formation and maintenance is a necessary first step to developing therapies that may ultimately ameliorate both the severity and incidence of congenital and adult onset epithelial pathologies.
Among the most critical genes for epithelial development and homeostasis are the highly conserved Grainyhead-like (Grhl) transcription factors. First described in Drosophila [1] as grainyhead (grh), this gene family is found in all organisms within the animal kingdom, and is remarkable both for its extremely high evolutionary conservation across eukaryotes [2,3], and for the extensive and disparate functions that are controlled by its various orthologues, in the context of both embryonic development and adult onset disease. In Drosophila, grh regulates key processes of epithelial development, such as Planar Cell Polarity (PCP), formation of the chitinous head "skeleton", dorsal hole closure, and cuticle integrity and maintenance [1], whereas fungal Grhl is essential for cell wall remodelling and asexual spore (conidial) separation [4]. In vertebrates, Grainyhead-like orthologues display a conserved functional homology of regulating processes, required for epithelial development. Studies in mice [2,5], Xenopus [6] and zebrafish [7] have determined remarkable conserved functional homology to Drosophila grh, apparent in the morphogenic development of the epidermal barrier [8][9][10][11], closure of the neural tube [12][13][14][15], maintenance of neural cell survival [16], and the growth and development of the craniofacial skeleton [17,18]. Together, these findings point to a critical role for Grhl function in the development of all organisms within the animal kingdom.
Recent work has also shown that Grhl function is highly conserved in humans, and that these genes regulate both development and adult disease. Critical recent work has demonstrated that loss of GRHL3 leads to the craniofacial defect Van der Woude Syndrome (VWS), characterised by clefting of the secondary palate. This anomaly is due to aberrant maintenance of the oral periderm, a thin, extra-epithelial tissue that lines the developing palatal shelves [19]. Additionally, GRHL3 mutations are also associated with non-syndromic palatal clefting [20] and spina bifida [21]. As substantial, disparate craniofacial defects are also observed in Drosophila, zebrafish and murine models lacking grh/Grhl2/Grhl3 function, and, as these factors are expressed almost exclusively in epithelia, it becomes clear that both clefting and other facial, jaw and skull deformities observed in these models are secondary to a primary epithelial defect, following abrogation of grh/Grhl function.
After development, Grainyhead-like (Grhl) plays an important a role in regulating genes required for the maintenance of the epithelial barrier during growth and injury. Studies of human Grainyhead-like (Grhl) orthologues have begun to uncover the role that loss of function or gene variants have in adult-onset diseases, such as hearing loss/deafness [22,23], defects of epidermal integrity and, particularly, numerous forms of cancer [9,[24][25][26]. The Grhl-genes are also implicated in the aetiology of a number of human diseases, including cleft palate [27,28], spina bifida [5,14,29], cancer [30], ectodermal dysplasia [28] and asthma [31]. This incredible functional diversity is mediated, at least in part, through a grh/Grhl transcription factor binding to a promoter or enhancer element and regulating the subsequent transactivation or repression of target effector genes through recruitment of transcriptional co-factors.
The grh/Grhl binding site is exceptionally well-conserved through evolution, broadly taking the form AACCGGTT (with the first "C" and second "G" largely invariant in experimentally validated assays). Although several upstream regulatory factors, such as branchless/FGF8 and Erk [32,33], and interacting co-factors, such as LMO4 [10,34], have been described, the delineation of grh/Grhl-dependent transcriptional networks has largely focussed on the identification and characterisation of these direct downstream target genes. Many candidate genes have been experimentally validated in the literature, both through targeted approaches (e.g., Tgm1/5 [35], ARHGEF19 [36], Dsg1 [37], eng2a [16], spec1 [16], edn1 [17] and PTEN [30], and, more recently, through large-scale screening approaches, such as RNA-SEQ and ChIP-SEQ. However, the precise mechanisms by which the grh/Grhl-dependent transcriptome regulates numerous disparate cellular and developmental functional processes are still unclear. Nevertheless, it is clear that the functional interaction between grainyhead-signalling and target gene regulation is dependent on tissue, and developmental/disease stage and organism, and the incredible heterogeneity of these regulations suggest that delineation of these grh/Grhl transcriptional networks will shed significant light on both embryonic development and disease. Currently, 41 Microarray/RNA-SEQ datasets probing Grhl function have been reported in the literature, from both Drosophila and mouse tissues, as well as several human cancer/short-hairpin RNA (shRNA) cell lines (Table 1). In order to delineate the grh/Grhl transcriptional network, we have interrogated these datasets to bring greater clarity to the Grainyhead-dependent transcriptome, and to begin to understand the critical effector genes that operate broadly across disparate contexts, as well as identifying those that have more restricted roles in specific processes, particularly epithelial development.
Lastly, using our recently described grhl3 -/zebrafish that present with epithelial defects of the enveloping layer, EVL [7], we performed Q-RT-PCR to determine the expression of zebrafish orthologues of the ten most differentially regulated epithelial genes, thereby functionally testing the predictive strength of our meta-analysis approach in a novel animal model. Together, this study collates multiple "big-data" approaches and provides an experimental pipeline for determining and testing novel grh/Grhl-dependent functional interactions. Table 1. Published datasets used for meta-analysis. Microarray and RNA-SEQ datasets including tissue of origin were classified as being derived from either "primary epithelia", "cell line-cancer", "cell line-non-cancer", "other mammalian" or "other non-mammalian", for subsequent analyses.

RNA-SEQ and Microarray Meta-Analysis and Database Construction
The datasets from individual studies were converted to a common file format (.csv) and imported into MySQL (www.mysql.com), a relational database management system that enables the identification of relationships between datasets. Inside the database, the data were combined into species-specific tables. Using information from Ensembl v.91, gene symbols from the original human data files were mapped to the human Ensembl gene_stable_ids; gene symbols from other species were mapped to human gene symbols and gene_stable_ids of their orthologous human genes. After mapping, all the human-mapped data were joined together into a single table, searchable by the original gene symbol in every species, as well as the gene symbol and gene_stable_id of the human orthologue, publication name, dataset name, type of Grhl regulator, and direction of Grhl regulation (positive or negative), and were designated as being derived from either "cell-line" or "tissue". For interactive visualization and exploring, the data were exported from the MySQL database, re-structured and uploaded into the Ordino tool (https://ordino.caleydoapp.org). To achieve a common gene ranking, the fold change rank-based scores from individual datasets were aggregated into the sum of scores for each gene, separately for positively and negatively regulated genes. For datasets where p-values/false discovery rates (FDRs) were available, another set of scores, based on p-value/FDR ranks, and computed independently for positively and negatively regulated genes, are also available. For singular genes, and the direction of regulation, p-values/FDRs from multiple datasets were combined into a single p-value, using the standard Fisher's method. The ranks were then transformed into normalised scores that permitted aggregation of rank-based information from datasets comprising different numbers of genes. In order to do this, we used a scoring function f(r) =1/rˆk, where k was an arbitrary parameter, allowing us to determine the significance of a genes' rank across datasets. The constant of k = 3 was chosen, as it balanced the number of datasets that a given gene appeared in with instances in which this gene was highly ranked within an individual dataset. Therefore, the top ranked gene (Rank #1) within a given dataset would achieve a normalised score of 1 (1/1ˆ3), whereas, if a gene did not appear within a particular dataset, it would be given a score of 0.

Zebrafish Experimentation
All zebrafish experiments were conducted in accordance with La Trobe University guidelines for the care and housing of zebrafish, under Animal Ethics Committee approval number AEC16-91. The grhl3 -/fish line was reported previously [7].

RNA Extraction and cDNA Synthesis
RNA was isolated from grhl3 -/and wild type embryos (at 48 hours post-fertilisation; hpf) as described previously [38]. cDNA was synthesized (from three biological replicates, using WT and phenotypic grhl3 -/embryos at 48hpf, utilising 50 embryos per group per replicate) using the iScript™ cDNA (Life Science Research, Bio-Rad, Hercules, CA, USA) synthesis kit, according to the manufacturer's instructions. Quantitative Reverse Transcription Polymerase Chain Reaction (Q-RT-PCR) using SsoFast™ EvaGreen®Supermix (Bio-Rad) and a BioRad CFX96 Real-Time System/C1000 Thermal Cycler was employed to quantify expression levels of 10 genes (comprising 17 fish orthologues) that were predicted to be differentially regulated in epithelial establishment following loss of grhl3, using methodology described previously [38]; expression levels were normalised to the EF1α housekeeping gene, as described previously [7]. Oligonucleotide sequences used for Q-RT-PCR are shown in Table S1.

Meta-Analysis Reveals the Most Consistently Differentially-Regulated Genes Following grh/Grhl Misexpression
In order to analyse the grh/Grhl transcriptome across multiple disparate biological contexts, we standardised all published grh/Grhl Microarray/RNA-SEQ datasets into a common format that allowed for direct comparison of differing methodologies and results, together with the corresponding published fold-changes and p-values ( Figure 1). Differences in methodologies and data reporting in published studies created the need to integrate lists of regulated genes that were of different lengths, selected using different (study-specific) criteria. As studies varyingly examined the effects of either loss or gain of function of grh/Grhl, we divided the data based on whether a gene was positively or negatively regulated under normal homeostatic conditions. That is, if a gene was found to be downregulated in an experimental paradigm following loss of grh/Grhl, we deemed it to be positively regulated. Next, the positively and the negatively regulated genes were separately ranked, on the basis of the reported fold-change within each dataset ( Figure 1).
Our analysis allowed us to determine the genes that are both most frequently and most substantially differentially regulated, following gain/loss of grh/Grhl function. From these comparisons, we determined which genes exhibited the greatest degree of regulation following disruption of grh/Grhl signalling. We determined this by: (1) frequency of altered expression (the number of datasets a given gene appeared in as "regulated"); and (2) degree of regulation, measured by the score, based on the rank given to a gene in a particular dataset, with respect to other differentially regulated genes. These analyses allowed us to reveal the genes that appeared within at least one dataset and determine the top differentially regulated genes following grh/Grhl de-regulation or misexpression, across all biological contexts thus far tested and described in the literature ( Figure 2).
From these analyses, we compiled a list of the top 50 most-highly differentially regulated genes (following grh/Grhl modulation) across various tissues and cell lines, and at different developmental time points (Table 2). As grh/Grhl transcription factors bind to the promoter of target genes in order to drive transcription, we determined whether any of these genes had previously been identified as being bound by grh/Grhl factors in ChIP or ChIP-SEQ datasets. In total, 15 of the top 50 genes in our list (30%: tmem54, prom2, cldn4, ppl, cdh1, rab15, lad1, rab25, Epcam, Tacstd2, st14, Esrp1, Prss22, spint1 and prss8) are known to be bound to grh/Grhl factors, supporting the robustness of our analysis algorithm. Additionally, seven genes in our list-cldn4, rab15, rab25, epcam, cdh1, tslp, and spint1-had been previously characterised as direct Grhl-target genes through biological validation experiments, further highlighting the predictive strength of our approach.

Figure 1. Meta-analysis pipeline methodology
All published grh/Grhl Microarray/RNA-SEQ datasets were converted to a common file format (.csv, A) and imported into MySQL (B). Next, we determined whether genes within each dataset were positively or negatively regulated following functional modulation of grh/Grhl (C), and genes across the datasets were ranked according to their regulation within each dataset. To achieve a common gene ranking, the fold change rank-based scores from individual datasets were aggregated into the sum of scores for each gene, separately for positively and negatively regulated genes. (D). Each gene symbol within all datasets was mapped to the appropriate gene using Ensembl. After mapping, a table of all the human-mapped data was generated, searchable by the original gene symbol in every species (E). For interactive visualization and exploring, the data were exported from the MySQL database, re-structured and uploaded into Ordino (F).

Figure 1. Meta-analysis pipeline methodology
All published grh/Grhl Microarray/RNA-SEQ datasets were converted to a common file format (.csv, A) and imported into MySQL (B). Next, we determined whether genes within each dataset were positively or negatively regulated following functional modulation of grh/Grhl (C), and genes across the datasets were ranked according to their regulation within each dataset. To achieve a common gene ranking, the fold change rank-based scores from individual datasets were aggregated into the sum of scores for each gene, separately for positively and negatively regulated genes. (D). Each gene symbol within all datasets was mapped to the appropriate gene using Ensembl. After mapping, a table of all the human-mapped data was generated, searchable by the original gene symbol in every species (E). For interactive visualization and exploring, the data were exported from the MySQL database, re-structured and uploaded into Ordino (F). WT and grhl3 -/-embryos. Gene expression was normalised to expression of housekeeping gene EF1a. Of the 18 fish orthologues selected, 10 showed differential expression in WT (black bars) vs grhl3 -/-(white bars) embryos. * p < 0.05, ** p < 0.01, n.s., non-significant.
Our analysis allowed us to determine the genes that are both most frequently and most substantially differentially regulated, following gain/loss of grh/Grhl function. From these comparisons, we determined which genes exhibited the greatest degree of regulation following disruption of grh/Grhl signalling. We determined this by: (1) frequency of altered expression (the number of datasets a given gene appeared in as "regulated"); and (2) degree of regulation, measured by the score, based on the rank given to a gene in a particular dataset, with respect to other differentially regulated genes. These analyses allowed us to reveal the genes that appeared within at least one dataset and determine the top differentially regulated genes following grh/Grhl deregulation or misexpression, across all biological contexts thus far tested and described in the literature ( Figure 2). WT and grhl3 -/embryos. Gene expression was normalised to expression of housekeeping gene EF1a. Of the 18 fish orthologues selected, 10 showed differential expression in WT (black bars) vs grhl3 -/-(white bars) embryos. * p < 0.05, ** p < 0.01, n.s., non-significant.

Biological Pathways Regulated by grhl in Epithelia as Identified by GO Analysis
In order to determine the overall biological processes regulated by the grh/Grhl pathway, we performed GO (Gene Ontology) pathway analysis, using gProfiler's GOStat (Gene Group Functional Profiling), with the default background set of all human genes. Mindful of obvious limitations, such as tissue origin, disease state and temporal/developmental time point, we nonetheless analysed all the disparate datasets to determine an "overall" snapshot of grhl-target gene biology (Table 3). We found that this analysis largely confirmed previous known roles of the Grhl family. The major biological functions ascribed to genes regulated by the Grhl family were "cell-cell adhesion" "tissue (epithelia/epidermis) development", "animal organ/skin development", "water homeostasis" and "epithelial cell morphogenesis", consistent with known grh/Grhl-dependent functions in epithelial morphogenesis and maintenance, as well as preventing trans-epidermal water loss and maintaining epithelial integrity through suppressing the epithelial-mesenchymal transition (EMT). Somewhat surprisingly, the GO term "protein cross-linking" was also highlighted in our grh/Grhl transcriptome analysis. grh/Grhl factors are thought to be constitutively bound to the promoters/enhancers of target genes, achieving transcriptional specificity through the recruitment of partner protein co-activators as development proceeds [39]. These co-factors may include members of the Polycomb and Trithorax groups in Drosophila [39,40], and the vertebrate transcription factor LMO4 [34] in mice, although, as few such co-factors are known, enrichment of this GO term may suggest a novel, hitherto unsuspected grh/Grhl role in the assembly, formation or maintenance of the core transcriptional apparatus.

Grhl-Dependent Target Genes are Well-Conserved between Mouse and Human; Less so between Mouse and Drosophila
In order to determine how well conserved the grh/Grhl-dependent transcriptional pathways were between mouse, Drosophila and human, we generated separate lists for each organism, using only species-specific, large-scale microarray/RNA-SEQ datasets. This analysis comprised 20 datasets from mouse, six datasets from Drosophila, and 12 datasets from human (Table 1). Using the top 50-ranked mouse genes as our root list for comparison, we determined which of these genes were ranked in the top 10% differentially-regulated genes in both Drosophila (in the top 1300 of~13,000 genes in the Drosophila genome) and human (top 2500 out of~25,000 total human genes), reasoning that genes which fell below this ranking threshold were less likely to be biologically-relevant in the context of grh/Grhl transcription (Table 4). Our analyses showed that 32 of the top 50 mouse genes had an identifiable Drosophila orthologue, with 4/32 orthologues of these mouse genes-Grhl2 (grh), Car6 (CAH7), Car2 (CAH1) and unc93a (GC4928)-also being differentially-regulated in Drosophila. Unsurprisingly, the degree of conservation was greater in human, with 47/50 human orthologues identified, and 17/47 of these mouse targets appearing in the top 10% differentially regulated human genes. Moreover, 13 of these orthologues-PROM2, CLDN4, PPL, GRHL2, CDH1, LAD1, RAB25, TMPRSS13, AP1M2, KLK6, SFN, CLDN1 and RPTN-appeared within the top 500 (top~2%) most differentially regulated genes in humans, suggesting that these genes may substantially underpin Grhl-dependent transcriptional networks in mammalian species. Analysis of these 13 genes shows that eight of these-Prom2, Cldn4, Ppl, Cdh1, Lad1, Tmprss13, Cldn1 and RPTN-are involved in maintaining structural integrity within epithelia, particularly the epidermis. Of the others, there is a member of the Ras superfamily of small GTPases, involved in vesicular trafficking to the cell membrane (Rab25), a transcription factor (Grhl2 itself), a serine protease (KLK6), a clathrin-associated adaptor in the Golgi/endosome network (Ap1M2) and a cell-cycle checkpoint protein (SFN). This suggests that no other biological processes are as well conserved among Grhl-differentially regulated transcripts across multiple species as epithelial and epidermal integrity. These data are particularly striking, given the large differences in the source material used to generate each of the datasets, comprising different organisms, disease states, developmental timepoints and manipulating different grh/grhl orthologues. Table 3. Gene Ontology (GO) analysis of the most differentially regulated cellular processes in all biological contexts, following modulation of grh/Grhl function. Functional Gene Ontology (GO) annotations are over-represented among the 200 genes positively regulated by grh/grhl, with the highest evidence of regulation across all species (highest fold change score aggregated over all the datasets).

Source
Term

Refining Meta-Analyses to Encompass Only Large-Scale Datasets Generated from Epithelial Tissue or Cancer Cell Lines Reveals grh/Grhl-dependent Transcriptome Specificity
As the grh/Grhl family are key regulators of epithelial establishment, development, maintenance and homeostasis, we refined our meta-analyses to only include datasets generated from primary epithelial tissues (Table 1). These data encompassed 18 datasets from seven separate publications, and restricted our data to Grhl-dependent transcriptomes, derived from epithelia from mouse embryonic skin [10], bladder [41], non-neural ectoderm [42] and lung epithelium [43], human neonatal epidermal keratinocytes [44], and undifferentiated primary human bronchial epithelial cells [31], as well as adult mouse and human psoriatic skin [45], and allowed us to identify the most-differentially regulated genes in epithelial development, the top 50 of which are shown in Table 5. This list includes numerous well characterized genes, known to be expressed in epithelia, which are involved in the maintenance of epithelial integrity and structural stability, such as Claudins 1/4/23, Periplakin, Envoplakin, Occludin and Prominin 2 (Table 5). Interestingly, the top 50 genes include five "predicted" genes-Gm19601, Gm3579, 231000H18Rik, Gm20305 (possible pseudogene) and Gm10639-that are yet to be ascribed a function. Next, we determined which of these top 50 "epithelial" grh/Grhl-regulated genes may also be de-regulated in Grhl-influenced cancerous cells (Table 6). We generated a dataset of differentially regulated genes in those RNA-SEQ/Microarray datasets that were designated as being derived from "cancer cell-lines" ( Table 6). These experiments typically included control cancer cells, compared to those in which Grhl activity was modulated through short-hairpin (shRNA) knockdown, and included 4T1 breast tumor cells [46], human mammary epithelial (HMLE-Twist-ER) cells [47], MSP (mesenchymal sub-population) cells obtained from HMLE cells [48], OVCA429 (ovarian cystadenocarcinoma; intermediate epithelial (IE) phenotype) cells [49] and LNCaP (human prostate carcinoma) cells [50]. Surprisingly, only two genes appeared in both the "epithelial targets" and "cancer targets" tables-Tmem54 and Claudin-4, a previously experimentally validated Grhl-target. We did note several instances of different family members expressed across both sets (e.g., Sod3, Krt6b/78, Tmprss4/11bnl and Cldn1/23/b8 in epithelia, and Sod1/2, Krt7/19, Tmprss13 and Cldn7 in cancer), suggesting that loss of Grhl-mediated transcription may lead to similar biological consequences, albeit mediated via different (possibly secondary, or indirect) target pathways. However, these results indicate that the Grhl-dependent transcriptome is largely context and tissue-type dependent, with little evidence of conserved mechanisms operating in epithelial homeostasis and epithelial cancer. Table 5. The top 50 most differentially regulated genes in primary epithelia following modulation of grh/Grhl function. Orthologues of the genes highlighted in red were screened and validated in subsequent Q-RT-PCR experiments in grhl3 -/zebrafish.

Rank
Gene Regulation Normalised Score

Expression of Differentially-Regulated Epithelial Genes in a Novel Vertebrate Model
In order to determine the predictive power of our meta-analysis approach, to identify novel grhl-dependent putative target genes (both direct and indirect), we analysed the expression of our top-ranked genes that were differentially regulated in epithelia, by performing Q-RT-PCR on cDNA from grhl3 -/mutant zebrafish. This model was utilised to test our meta-analyses, as none of the large-scale datasets were generated from zebrafish tissue, hence, this model would speak to the genetic conservation of grhl-dependent pathways across vertebrates.
From the genes that were differentially regulated in epithelia (Table 5), we identified the top 10 genes with an identifiable zebrafish orthologue, or orthologues. The roles of these ten mammalian genes-Cldn23, Ppl, Prom2, Ocln, Slc6a19, Aldh1a3, Sod3, Car6, Cldn4 and Evpl-were sub-functionalised into a total of 17 zebrafish orthologues, based on sequence conservation homology (Table S2). Next, we performed Q-RT-PCR on fifty, 48 hours post-fertilisation (hpf) WT and grhl3 -/embryos (experiment performed in triplicate, see methods) to determine the relative expression of each of these fish orthologues. Of the 17 zebrafish orthologues analysed, we found that 10-cldn23a, cldn23b, ppl, prom2, oclna, slc6a19a.1, slc6a19b, aldh1a3, sod3a and evplb-showed statistically significant differences in expression between WT and grhl3 -/fish (primarily down-regulation). We utilised whole embryos for our analyses, not just epithelial tissue, and we only performed analyses at a single developmental timepoint. Nonetheless, even within these experimental constraints, the fact that 10/17 zebrafish orthologues were significantly differentially regulated highlights the remarkable conservation of grhl-dependent pathways across the evolutionary spectrum, and the strong predictive power of our approach for identifying novel, biologically relevant transcriptional networks downstream of grh/Grhl function.

Discussion
In this study, we interrogated large-scale transcriptomic data to uncover novel genetic mechanisms that operate downstream of activation or repression by grh/Grhl genes. We collected all currently published large-scale RNA-SEQ and microarray datasets from multiple disparate animal models, developmental timepoints, tissues and cancer cell lines, to delineate conserved grh/Grhl-dependent genetic networks. Using a novel, ranking-based algorithm approach to draw synergies between disparate methodologies, we identified the top-differentially regulated genes following gain or loss of grh/Grhl function in all datasets, as well as refining our analyses to the top-differentially regulated genes in primary epithelia and in cancer. We also examined the conservation of mechanisms in mouse, Drosophila and human genes, and, lastly, we tested the predictive strength of our meta-analysis approach to determine which of the top-ranked genes were differentially regulated in a novel vertebrate grhl-loss-of-function model, the zebrafish. Taken together, our study brings cohesion and clarity to the existing "big-data" approaches to tackle grh/Grhl-dependent transcription, and opens up new avenues (and identifies novel putative targets) to further experimentally delineate grh/Grhl-dependent transcriptional genetic regulatory networks.
Across the 41 disparate Microarray/RNA-SEQ datasets, of the top 50 genes we identified, 15 had previous connections to Grhl transcription factors, either through ChIP/ChIP-SEQ experiments to identify regions of genome occupancy by Grhl proteins, or through targeted biological experiments to empirically determine novel functional relationships. The genes to which Grhl-factors bind, and drive/repress transcription can be considered true "direct" target genes. A number of our top 50 genes fall into this category, namely, Cldn4, Cdh1 (E-Cadherin), Rab25, Epcam, Esrp1 and Spint1, with a further nine genes-Tmem54, Prom2, Ppl, Lad1, Tacstd2, St14, Prss22, and Prss8-showing instances of promoter-binding by Grhl-factors in ChIP-analyses, but currently lacking empirical experimental validation to determine true biological interactions. As such, these nine genes would appear to be the prime candidates for future studies into identifying novel Grhl-target genes through established analyses, such as in vitro luciferase activation assays, genetic complementarity in animal models (e.g., through generating Grhl/gene-of-interest doubly heterozygous mice) or models to rescue Grhl loss-of function phenotypes, as we have shown previously by rescuing aberrant developmental phenotypes in grhl2b or grhl3-deficient zebrafish embryos, with eng2a [16] and edn1 [17] mRNA micro-injection, respectively.
The Grhl-family are well-known in both embryogenesis and cancer. Multiple studies have highlighted substantial developmental abnormalities in animal models lacking Grhl factors. These include numerous skin formation, maintenance and healing defects [2,11,28,36,51], impaired neural tube closure [5], facial and palatal clefting [19,52], regulation of skull bone apposition [18] and impaired formation of the lower jaw [17]. These latter (craniofacial) defects are interesting, owing to the fact that they are secondary consequences of disrupted epithelial formation-either the overlying cranial skin not allowing cranial bones to grow and expand [18], or through decreased endodermal edn1-signalling from the endodermal epithelium to cranial neural crest cells [17]. These studies parallel recent findings in human craniofacial disorders, namely Van der Woude Syndrome (VWS), a congenital syndrome characterised by palatal clefting. Disruption of the thin epithelial layer covering the developing palatal shelves-the periderm-leads to clefting in both mice and humans due to adhesions forming between the palatal shelves and tongue in utero [19]. Critically, VWS is also known to be caused by mutations in a critical Grhl-co-factor, IRF6 [53] in both mice and humans.
These studies show that understanding the Grhl-dependent transcriptome is likely to uncover further novel biological pathways in the aetiology of craniofacial (and other epithelial) disorders caused by a primary epithelial defect.
In the context of cancer, Grhl-factors perform tumor-suppression roles in the epithelia of both the skin [30] and oesophagus [54], as well as maintaining epithelial identity through suppression of the epithelial-mesenchymal transition (EMT) in cancers of epithelial origin [47]. Strikingly, and somewhat surprisingly, our study found virtually no correlation between the top 50 differentially regulated genes in normal epithelia as compared to cancerous cells, with only two genes-Tmem54 and Claudin4-appearing in both lists. This could suggest that, in general terms, the normal grh/Grhl-dependent transcriptional pathways, that operate in epithelial development, homeostasis and maintenance, are not the most highly dysregulated in epithelial cancers. Caveats exist in this interpretation, chief among them being the lack of a direct comparison between normal and cancerous epithelia derived from the same source, as well as the substantial differences that exist between cells and tissues of origin utilised for the individual RNA-SEQ/Microarray datasets. Additionally, the published microarray/RNA-SEQ datasets defining the role of Grhl in cancers follow the experimental paradigm of inhibiting or over-expressing Grhl-factors in already tumorigenic cell lines, which is a separate biological question to analysis of the transcriptome of cancers that arise directly due to aberrant Grhl-function. Transcriptomic analyses of tumors arising in Grhl3-deficient mice, relative to non-cancerous adjoining tissue, such as the skin and esophagus [30,54], would add important data points to our current meta-analyses.
Overall, this study demonstrates that meta-analysis of previously published RNA-Seq datasets is able to identify novel targets of grhl transcription. The depth and robustness of this approach is clearly demonstrated by the fact that many of the genes identified via this method have been previously validated in the literature. Moreover, the experimental validation of altered expression of our identified target genes, using a vertebrate model that was not included in any of the published data sets, highlights the power of this type of analysis to predict conserved gene targets across different model organisms. Our experimental paradigm naturally has certain caveats and limitations, which any analysis must keep in mind, such as the degree and direction of target regulation in fish compared to mammals, the analysis of gene expression over separate developmental and, perhaps, adult timepoints, and specific analyses through ISH/IHC of mRNA/protein distribution, specifically in epithelial tissues, e.g., developing EVL and the skin. Moreover, grh (and presumably also Grhl) proteins are known to be post-translationally modified through processes such as phosphorylation [32,55], which further confounds the precise nature of grh/Grhl-dependent regulation (or co-regulation) of transcriptional targets. Nonetheless our experimental approach in fish successfully identified 10 gene orthologues that are significantly differentially regulated, as predicted by the meta-analysis database construction algorithms we have developed and utilised here. Lastly, our analyses did not determine which of these genes may be true direct targets, based on promoter occupancy of the target gene promoter by grh/Grhl, although such direct binding and activation/repression approaches would be a logical extension of our meta-analyses. Thus, our database is an excellent resource for researchers in the field of grhl function, helping to identify and categorise novel molecular pathways underlying various developmental and disease processes.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/11/876/s1, Table S1: Oligonucleotide sequences used in the present study; Table S2: Zebrafish orthologues of differentially regulated mammalian genes. The top 10 differentially regulated genes following Grhl-dependent modulation in primary epithelia that had identifiable zebrafish orthologues, showing the function of each gene in mammals. Zebrafish orthologues of these ten mammalian genes comprise a total of 18 separate genes.