FLAME: A Web Tool for Functional and Literature Enrichment Analysis of Multiple Gene Lists

Simple Summary Manipulation of multiple gene lists when going through a functional enrichment analysis is often a hefty task. In this article, we present FLAME, a web application that facilitates such multiple-list manipulations, enabling the construction of intersections and unions among multiple gene lists for further enrichment analysis. Reported results are presented via multiple interactive viewers such as heatmaps, barcharts, Manhattan plots, and networks. Abstract Functional enrichment is a widely used method for interpreting experimental results by identifying classes of proteins/genes associated with certain biological functions, pathways, diseases, or phenotypes. Despite the variety of existing tools, most of them can process a single list per time, thus making a more combinatorial analysis more complicated and prone to errors. In this article, we present FLAME, a web tool for combining multiple lists prior to enrichment analysis. Users can upload several lists and use interactive UpSet plots, as an alternative to Venn diagrams, to handle unions or intersections among the given input files. Functional and literature enrichment, along with gene conversions, are offered by g:Profiler and aGOtool applications for 197 organisms. FLAME can analyze genes/proteins for related articles, Gene Ontologies, pathways, annotations, regulatory motifs, domains, diseases, and phenotypes, and can also generate protein–protein interactions derived from STRING. We have validated FLAME by interrogating gene expression data associated with the sensitivity of the distal part of the large intestine to experimental colitis-propelled colon cancer. FLAME comes with an interactive user-friendly interface for easy list manipulation and exploration, while results can be visualized as interactive and parameterizable heatmaps, barcharts, Manhattan plots, networks, and tables.


Introduction
Functional enrichment analysis is a method to identify classes of bioentities in which genes or proteins have been found to be over-represented. This type of analysis can aid researchers to reveal biological insights from various omics experiments and interpret gene lists of interest in a biologically meaningful way. For this purpose, several applications have been proposed [1,2]. Established tools include: g:Profiler [3], Panther [4], DAVID [5], WebGestalt [6], EnrichR [7], AmiGO [8], GeneSCF [9], AllEnricher [10], aGOtool [11], FLAME allows for the uploading of multiple gene lists as separate files, or as pasted text. In the online version, FLAME can accommodate up to 10 active gene lists with a file size smaller than 1 MB each, a limitation that can be bypassed by downloading the application from GitHub and running it locally after editing the value of the corresponding variable (FILE_LIMIT configuration variable in global.R and shiny.maxRequestSize option in ui.R). The input text can be imported in comma-, tab-, or line-separated format. In this version, FLAME supports 197 organisms and several identifiers, such as gene IDs (proteins, transcripts, microarray IDs, etc.), SNP IDs, chromosomal intervals, and term IDs, in accordance with the gconvert function of the gprofiler2 library. Different lists are not allowed to have the same name, with options for renaming and deleting them being suitably provided. Once uploaded, list contents can be seen in interactive searchable tables.

List Manipulation with the Use of UpSet Plots
After uploading the gene lists of interest and prior to analysis, UpSet plots can be generated to show possible intersections, unions, and distinct elements between the selected lists. The UpSet plot is a sophisticated alternative of a Venn diagram and is preferred for many sets (>5) where a Venn diagram becomes incomprehensive. In Figure 1, we demonstrate a simple example with three lists, each containing 100 random genes, and visualize the various UpSet plot options in relation to a Venn diagram ( Figure 1A-D). Similarly, in Figure 1E, we describe the distinct intersections among seven gene lists, a task that cannot be drawn as a Venn diagram.
An UpSet plot consists of two axes and a connected-dot matrix. The vertical rectangles represent the number of elements participating in each list combination. The connecteddots matrix indicates which combination of lists corresponds to which vertical rectangle. Finally, the horizontal bars (Set Size) denote the participation of hovered objects (from the vertical rectangles) in the respective lists. In its current version, FLAME supports four UpSet plot modes: (I) intersections, (II) distinct intersections, (III) distinct elements per file, and (IV) unions. The intersections mode creates all file combinations as long as they share at least one element, allowing an element to participate in more than one combination. The distinct intersections mode creates file combinations only for distinct elements that do not participate in other lists. The distinct elements per file mode shows the distinct elements of each input list. The unions mode constructs all available file combinations, showing their total unique elements. Upon mouse-hover over each UpSet plot element combination (vertical rectangles), FLAME presents the respective genes in a table, whereas on a mouse click, the user can append the selected list of files, with a new file containing the selected genes and processing it separately. An UpSet plot consists of two axes and a connected-dot matrix. The vertical rectangles represent the number of elements participating in each list combination. The connected-dots matrix indicates which combination of lists corresponds to which vertical rectangle. Finally, the horizontal bars (Set Size) denote the participation of hovered objects (from the vertical rectangles) in the respective lists. In its current version, FLAME supports four UpSet plot modes: (I) intersections, (II) distinct intersections, (III) distinct elements per file, and (IV) unions. The intersections mode creates all file combinations as long as they share at least one element, allowing an element to participate in more than one combination. The distinct intersections mode creates file combinations only for distinct elements that do not participate in other lists. The distinct elements per file mode shows the distinct elements of each input list. The unions mode constructs all available file combinations, showing their total unique elements. Upon mouse-hover over each Up-Set plot element combination (vertical rectangles), FLAME presents the respective genes in a table, whereas on a mouse click, the user can append the selected list of files, with a new file containing the selected genes and processing it separately.

Functional Enrichment
Once an input file or an UpSet plot column (intersection/union of sets) has been selected, FLAME takes advantage of the g:Profiler library [3] and aGOtool API [11] to offer functional enrichment analysis for a list of 197 organisms.
In detail, g:Profiler is used for the identification of enriched functional terms from Gene Ontology [19]; pathways from KEGG [20], Reactome [22], and WikiPathways [23]; protein complexes from CORUM [25]; expression data from the Human Protein Atlas [26]; regulatory motifs from TRANSFAC [27] and miRTarBase [28]; and phenotypes from the Human Phenotype Ontology [29]. Similarly, aGOtool is used for the identification of enriched terms from the UniProt keyword classification system [30], protein families and domains from Pfam [31] and InterPro [32], and human diseases from the DISEASES database [33].

Functional Enrichment
Once an input file or an UpSet plot column (intersection/union of sets) has been selected, FLAME takes advantage of the g:Profiler library [3] and aGOtool API [11] to offer functional enrichment analysis for a list of 197 organisms.
In detail, g:Profiler is used for the identification of enriched functional terms from Gene Ontology [19]; pathways from KEGG [20], Reactome [22], and WikiPathways [23]; protein complexes from CORUM [25]; expression data from the Human Protein Atlas [26]; regulatory motifs from TRANSFAC [27] and miRTarBase [28]; and phenotypes from the Human Phenotype Ontology [29]. Similarly, aGOtool is used for the identification of enriched terms from the UniProt keyword classification system [30], protein families and domains from Pfam [31] and InterPro [32], and human diseases from the DISEASES database [33].
g:Profiler and aGOtool test for statistically significant enrichment to compare the user's input gene list to a background set from organism-specific genes annotated in the Ensembl database [34] and UniProt Reference Proteomes [30], respectively. In the case of g:Profiler, the resulting p-values are corrected for multiple testing using g:SCS, Bonferroni correction, or Benjamini-Hochberg false discovery rate (FDR), whereas in the case of aGOtool, p-values are corrected using Bonferroni correction or FDR. Notably, the reported lists with the enrichment results can be shrunk or expanded using the aforementioned parameters as thresholds.
Enrichment analysis is essentially performed using ENSEMBL identifiers, while, on the basis of the user's choice, results can be reported in the input format or as Entrez, UniProt [30], EMBL [35], ENSEMBL [34], ChEMBL [36], WikiGene [37], and RefSeq [38] identifiers. Conversion to ENSEMBL identifiers is done internally at the backend, regardless of the identifier type imported by the user. Input elements that are not identified in the enriched terms are presented to the user, along with elements that could not be translated into the requested output format.

Literature Enrichment
In addition to functional enrichment, FLAME enables literature enrichment analysis for a selected gene list, via the aGOtool API. To this end, FLAME allows users to retrieve scientific articles that are tightly connected to the genes/proteins provided in the uploaded input files. The literature enrichment analysis concept is very similar to the one of the enrichment analyses and aims to aid users to identify scientific publications of relevance to a given gene/protein list.
The publication enrichment analysis is based on the aGOtool, which uses a text corpus of all PubMed abstracts and full-text open access articles from PubMed Central. These documents are processed by OnTheFly's [39] or EXTRACT's [40] underlying Named Entity Recognition (NER) tagger [41] on a weekly basis for the identification of biomedical entities (genes/proteins, chemical compounds, organisms, tissues, environments, diseases, phenotypes, and Gene Ontology terms). As a result, all documents are being automatically annotated with the genes mentioned within them, thus, as in functional enrichment, turning every document into a 'gene set'.

Protein-Protein Interaction Analysis
FLAME offers the capability to construct and visualize interactive PPIs for a set of 197 organisms using the STRING API [24]. Users may submit their gene list and visualize the results as networks, with the interacting entities being presented as nodes and their interactions as edges. In the online version, FLAME allows for a maximum of 500 proteins per request, a limitation that can be bypassed by downloading the application from the GitHub repository and running it locally after editing the value of the corresponding variable (STRING_LIMIT configuration variable in global.R). We note that for each input gene, we only keep one converted Ensembl protein identifier.
STRING supports both physical and functional interactions. Physical interactions refer to proteins that are part of the same biomolecular complex, whereas functional interactions refer to proteins that are involved in the same pathway or biological process. Through FLAME, users can select whether to visualize the full set of interactions (both physical and functional), or just the physical sub-network. Users can also adjust the interaction score and apply a cutoff on the edges. In addition, users can choose between the evidence or the confidence mode. In the first case, a multi-edged graph is drawn, in which each edge shows the evidence channel (e.g., fusion event, co-expression, and text mining), and the information comes from [42], whereas in the second case, the thickness of the edge reflects the interaction score. While the resulting network is presented in a separate Network Viewer panel, one can export a network as an image or as a tab-delimited file to be visualized by external viewers [43][44][45] or get redirected in STRING's original source for further analysis.

Visual Analytics and Interactive Visualization
FLAME offers various interactive plots and visualization options for reporting results. Functional enrichment results produced by g:Profiler and aGOtool, as well as literature enrichment results produced by aGOtool, can be shown as tables, scatter plots, barcharts, heatmaps, and networks.
Resulting lists of functional terms are initially reported as interactive searchable tables displaying details about each functional term. One can expand each row of the table to see which of the identified genes/proteins were found to be associated with the functional term. For example, in the case of a KEGG pathway, one can see how many proteins or genes were found to be related to it and get redirected to the KEGG repository to see the actual schema of the pathway in a static form with all of the detected genes/proteins highlighted.
In the case of g:Profiler only, an 'adjusted to the selected data sources' interactive Manhattan plot is offered for a clearer overview. In this plot, functional terms are organized along the x-axis and colored by their data source, whereas the y-axis shows the significance (p-value) of each term. Hovering over a data point generates a popup window with key information about the functional term. When a set of points are selected using a lasso or a rectangle, the Manhattan plot will be redrawn, showing information about the selected items only. Upon selection, the corresponding tables will be automatically updated.
In all of the enrichment-type analyses offered by FLAME, for every source, the most significant functional terms will be shown in a bar chart or a scatter plot, in which the user can further customize to adjust the desired number of terms analyzed. Bar charts are sorted according to the enrichment score or the p-value. Finally, on mouse hovering, a tooltip with key information about the functional term will be shown.
Heatmap and network visualization options complement each other and come with two different execution modes. In the first case, genes are plotted against functional terms, while in the second case, functional terms are plotted against themselves (square matrix). In the case of heatmaps, a color-scheme is used to capture the enrichment or the statistical score for a particular functional term. In the second mode (functional terms vs. functional terms), a cell value depicts a similarity capturing the number of common genes. This is calculated as the summation of the unique common genes between a pair of functional terms divided by the number of total unique genes found to be associated with the functional terms. All of the heatmaps are fully interactive, and one can zoom in and isolate an area of interest, swap the xand y-axes, and adjust the number of elements that will be shown. Notably, all heatmaps are clustered after applying a hierarchical clustering method, and export options are also supported.
In addition to heatmaps, a three-mode network visualization is offered. In the first case, nodes represent genes and/or functional terms (colored differently), whereas edges represent the similarity scores between them, as explained previously (weighted networks). In the second case, nodes represent the functional terms, while edges reflect the number of common genes between them. Finally, in the third mode, nodes represent genes that are connected on the basis of the number of common functions or processes they are involved in.
While a heatmap is a preferable option for observing all possible pairwise similarities, in the second and third network mode users can apply an edge-cutoff, based on the similarity score or the common number of functions, respectively. This can help reduce the networks' density and make visualization more appealing and more informative. Networks are fully interactive, and one can zoom in/out, adjust the layout accordingly, or export the network in various file formats. Examples of all of the aforementioned visualizations are shown in Figure 2.

Gene ID Conversion and Orthology Search
Frequently, different databases and tools accept varying gene/protein identifiers as their input. Thereby, the ID mapping creates a bottleneck for the non-expert end-users. To aid researchers in overcoming this problem, FLAME utilizes g:Profiler's converters to allow (i) cross-database ID conversions and (ii) cross-species ID conversions (orthologs). In detail, FLAME allows ID conversions between well-known name spaces, such as Entrez Gene, Uniprot, ChEMBL, ENSEMBL, and RefSeq, as well as among the 197 different organisms supported by FLAME.

Implementation
FLAME is mainly written in R and JavaScript. The R/Shiny package was used for the GUI implementation and the interoperability between R and Javascript. UpSet plots have been implemented with the use of R/upsetjs library. Functional enrichment analyses are offered by the R/gprofiler2 library [46] and aGOtool API. The latter is also used for literature enrichment analysis. Networks are stored as igraph objects [47] and visualized with the visNetwork library. Scatter and Manhattan plots are generated with the help of the R/plotly library [48], bar plots with R/ggplot2 [49], and heatmaps via the R/heatmaply library [50]. Interactive tables are generated through the R/DT library. Finally, network analysis is performed with the employment of the STRING API.

Gene ID Conversion and Orthology Search
Frequently, different databases and tools accept varying gene/protein identifiers as their input. Thereby, the ID mapping creates a bottleneck for the non-expert end-users. To aid researchers in overcoming this problem, FLAME utilizes g:Profiler's converters to allow (i) cross-database ID conversions and (ii) cross-species ID conversions (orthologs). In detail, FLAME allows ID conversions between well-known name spaces, such as Entrez Gene, Uniprot, ChEMBL, ENSEMBL, and RefSeq, as well as among the 197 different organisms supported by FLAME.

Results and Discussion
We tested the capacity of FLAME for knowledge discovery by re-analyzing gene expression data associated with colitis-propelled carcinogenesis (CAC) in mice [51]. In this model, tumorigenesis develops following a single application of the carcinogen azoxymethane (AOM) combined with four cycles of dextran sodium sulfate (DSS) administration that causes chronic colitis. However, inflammation, tissue damage, dysplasia, and cancer are manifested in the distal but not the proximal part of the large intestine. To gain insight into the biological basis of this intriguing phenomenon, we previously interrogated the transcriptome of proximal and distal colon and reported intrinsic differences in gene expression, which are augmented during CAC. These analyses also provided evidence that lipid metabolic pathways operating in the proximal part of the large intestine mediate resistance to experimental colitis and CAC [51].
Herein, we focused on exploring gene ontologies and pathways that may mediate sensitivity to CAC. We reasoned that, transcripts that are upregulated at both the 'early' (i.e., 2 DSS cycles) and 'late' (i.e., 4 DSS cycles) stages of AOM/DSS-induced carcinogenesis in the distal part of the colon and are not detected in the proximal region must play a prominent role in CAC. We identified 165 transcripts belonging to this intersection by using the 'UpSet Plot' option of FLAME ( Figure 3A and Table S1), and termed this gene set 'the susceptibility-associated gene signature' (SAS).
To interpret SAS transcripts as biological functions and pathways, we combined KEGG and REACTOME pathway enrichment analysis using FLAME. Reassuringly, we found 'inflammatory bowel disease' to be among the most significantly enriched pathways ( Figure 3B and Table S2). Other pathways found to be significantly enriched in SAS were related to immune function, including T helper 1 (Th1)/Th2 differentiation, Th17 cell differentiation, natural killer (NK) cell-mediated cytotoxicity, and cytokine/chemokine signaling, which align with the pivotal role of unresolved inflammation in promoting intestinal cancer [52]. Indeed, Th17 lymphocytes give rise to pathogenic Th1 cells implicated in colitis [53], and when activated by TGFβ1 (a SAS gene set cytokine; Table S1), Th17 cells promote CAC [54]. Moreover, chronic experimental colitis is associated with dual Th1 and Th2 cytokine profile [55], and there is evidence to suggest that Th2-driven colonic inflammation enhances the formation of colorectal tumors [56]. We also note the significant enrichment of SAS for the pro-inflammatory TNF pathway, which represents a major therapeutic target for inflammatory bowel diseases.
Gene ontology (GO) enrichment was also performed on the FLAME platform. SAS was found to be enriched for biological processes that were mostly related to T cell-mediated immunity and cytokine synthesis ( Figure S1A and Table S3). The GO enrichment for molecular functions ( Figure S1B and Table S3) identified several terms related to cytokine and chemokine activity, including TNF, and to death receptor signaling, which has been implicated in CAC [57]. A gene-gene network was also constructed using FLAME to uncover clusters of SAS genes with common molecular functions ( Figure S1C). Analysis of putative protein-protein interactions (physical and functional associations) through the STRING option of FLAME uncovered a cluster of interacting components of T cellassociated immunity and a cluster of interferon beta (IFNβ) response proteins implicated in epithelial regeneration upon DSS-induced tissue damage [58] (Figure 3C). To interpret SAS transcripts as biological functions and pathways, we combined KEGG and REACTOME pathway enrichment analysis using FLAME. Reassuringly, we found 'inflammatory bowel disease' to be among the most significantly enriched pathways ( Figure 3B and Table S2). Other pathways found to be significantly enriched in SAS were related to immune function, including T helper 1 (Th1)/Th2 differentiation, Th17 cell differentiation, natural killer (NK) cell-mediated cytotoxicity, and cytokine/chemokine signaling, which align with the pivotal role of unresolved inflammation in promoting intestinal cancer [52]. Indeed, Th17 lymphocytes give rise to pathogenic Th1 cells implicated in colitis [53], and when activated by TGFβ1 (a SAS gene set cytokine; Table S1), Th17 cells To gain putative mechanistic insights into the aforementioned biological functions and pathway data, we used the FLAME platform to perform transcription factor motif enrichment and analyze transcription factor networks versus SAS transcripts. The results showed a cluster of interferon regulatory factor (IRF) family members ( Figure S1D and Table S4), which may be linked to both the enrichment of SAS in IFNβ components and T helper cell differentiation. Another transcription factor cluster was associated with SMADs, which have been implicated in CAC, in part downstream of TGFβ1 [59,60]. Overall, the aforementioned analyses underscore the practical utility of FLAME to rapidly process, analyze, and visualize gene expression data, and hence assist knowledge discovery.

Conclusions
FLAME is a web application aimed to effectively complement existing enrichment tools and make gene list analysis, information extraction, knowledge integration, and visualization much easier and more comprehensive, accurate, and reliable. It combines functionalities from g:Profiler, aGOtool, and STRING to cover a wide spectrum of analyses, such as functional, literature enrichment, and network analysis. FLAME is designed for non-experts and via the engagement of UpSet plots; it allows users to manipulate, compare, and handle multiple gene lists prior to any type of analysis. Moreover, it follows a visual analytics approach, thus offering users various options for visualizing their results and adjusting the corresponding views upon parameterization to generate publication-accepted figures. In a future version, we expect to expand FLAME's capabilities by supporting more organisms and integrating a greater plethora of sources. We expect FLAME to reach out to many users varying from non-experts to highly specialized bioinformaticians.  Table S1: Gene names comprising the 'susceptibility-associated gene signature' (SAS). Table S2: List of KEGG and REACTOME pathways enriched in SAS. Table S3: List of the most enriched GO biological processes (BP) and molecular functions (MF) in SAS. Table S4: List of transcription factors predicted to regulate SAS genes. Funding: This study was financially supported by the Operational Program Competitiveness, Entrepreneurship and Innovation, NSRF 2014-2020, action code: MIS 5002562, co-financed by Greece and the European Union (European Regional Development Fund). The study was also supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the 'First Call for H.F.R.I. Research Projects to support faculty members and researchers and the procurement of high-cost research equipment grant', grant ID: 1855-BOLOGNA. We also acknowledge support of this work by the project 'The Greek Research Infrastructure for Personalised Medicine (pMedGR)' (MIS 5002802), which is implemented under the Action 'Reinforcement of the Research and Innovation Infrastructure', funded by the Operational Program 'Competitiveness, Entrepreneurship and Innovation' (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).

Conflicts of Interest:
The authors declare no conflict of interest.