Dissecting Transcription Factor-Target Interaction in Bovine Coronavirus Infection

Coronaviruses are RNA viruses that cause significant disease within many species, including cattle. Bovine coronavirus (BCoV) infects cattle and wild ruminants, both as a respiratory and enteric pathogen, and possesses a significant economic threat to the cattle industry. Transcription factors are proteins that activate or inhibit transcription through DNA binding and have become new targets for disease therapies. This study utilized in silico tools to identify potential transcription factors that can serve as biomarkers for regulation of BCoV pathogenesis in cattle, both for testing and treatment. A total of 11 genes were identified as significantly expressed during BCoV infection through literature searches and functional analyses. Eleven transcription factors were predicted to target those genes (AREB6, YY1, LMO2, C-Rel, NKX2-5, E47, RORAlpha1, HLF, E4BP4, ARNT, CREB). Function, network, and phylogenetic analyses established the significance of many transcription factors within the immune response. This study establishes new information on the transcription factors and genes related to host-pathogen interactome in BCoV infection, particularly transcription factors YY1, AREB6, LMO2, and NKX2, which appear to have strong potential as diagnostic markers, and YY1 as a potential target for drug therapies.


Introduction
Coronaviruses are enveloped RNA virus, spherical to pleomorphic in shape, with 20 nm spikes on the envelope. A member of the coronavirus family, within the beta A subtype [1,2], Bovine coronavirus (BCoV), is a respiratory and enteric pathogen that infects cattle and wild ruminant species. Like several other variations of coronaviruses, BCoV has additional 5 nm secondary spikes [2,3]. The larger spike interacts with sialic acid within receptors on the target cell, while the smaller spike contains hemagglutinin-esterase (HE), another protein critical in attachment to target cells [1,4]. The virus results in three distinct symptoms: calf diarrhea, winter dysentery with hemorrhagic diarrhea in adults, and respiratory infection in cattle of all ages [4,5]. Neonatal calf diarrhea (NCD) is the major cause of death in unweaned heifers [6,7]. These symptoms are devastating for the agricultural industry, with impacts including the cost of treatment and prophylaxis, increased vulnerability to additional diseases, increased mortality rates, long-term lowered growth rates, and lowered milk production [8,9]. Widespread BCoV infections can have disastrous effects, especially in countries like Uruguay or Vietnam, where cattle production is a primary source of income and the overall BCoV detection rate is 7.8% [7] and 33.3% [1], respectively.

Prediction of Disease Associated Bovine Transcription Factors
The sequences and locations of the 11 genes were identified using NCBI nucleotide databases. The complete bovine nucleotide sequences along with accession numbers were retrieved and saved for analysis: TLR7 (NM_001033761), TLR9 (NM_183081.1), IRF1 (NM_001191261.2), IL-6 (NM_173923.2), CEBPD (NM_174267), TRAF2 (XM_005213525.4), RHOA (NM_176645.3), TP53 (NM_174201.2), CEBPB (NM_176788.1), SRC (NM_001110804.2), and NANS (NM_001046482.1). These genes were then used as seeds to identify transcription factors through different transcription factors prediction software. To increase accuracy, three computational tools were utilized; AnimalTFDB [22] and GeneXplain [26] that uses Match and P-Match algorithms for gene regulation. AnimalTFDB was run using the default settings, but both Match and P-Match algorithms on GeneXplain were adjusted to use the cutoff selection "0.95 and 1.0 as mat. sim. and core sim." With Venny [27], TFs that appeared in the results of at least two software were considered significant for further analysis. We then searched for transcription factors that target multiple genes. This step allowed us to identify significant transcription factors that were substantially targeting multiple genes across the entire list of genes, to limit false positives and increase predicted relation to BCoV infection. After these steps were performed, only 11 TFs remained, which were then included for further analyses.

Functional Analysis of Transcription Factors and Gene Targets
Using Gene Ontology [24,25], we performed functional analysis of the 11 transcription factors and 11 genes previously identified as significant in BCoV infections. Pathways with p < 0.001; False Discovery Rate (FDR) < 0.05 and a fold enrichment value over 70 were recorded, along with the genes and transcription factors involved with them were considered important. Immune specific pathways were highlighted for later analysis.

Network Analysis of Transcription Factors and Genes
Using STRING database [28], we performed a network analyses of TFs, genes, and the TFs and gene. The software produces a network of the genes and TFs through colored connections representing established protein-protein associations. To develop a network through the tool Cytoscape (v3.8.0, [29], the identified TFs and their target genes, along with the pathways from the GO functional annotation, were compiled and used as input into the software.

Phylogenetic Analysis of Conserved Transcription Factors
To determine conservation of the identified transcription factors, sequences were retrieved from the GenBank database and run through BLAST. The topmost 25 hits were selected for different species based on the E-values from the BLAST search, then run through the MEGA (v7) software for multiple sequence alignment. Phylogenetic trees were then developed using the Neighbor-Joining Tree method [30].

Identification of Genes and TFs Associated with Bovine Coronavirus
Literature searches and pathway analyses identified a total of 10 genes that appeared important in BCoV infections, as illustrated in Table 1. As explained in the Methods section, genes were recorded if they appeared in at least one significant pathway, with significance relying on the fold enrichment and the p-value. The gene NANS was added due to its significance in BCoV infection in the host, bringing the total of genes to 11. NANS is the gene responsible for producing N-acetyl 9-O-acetylneuraminic acid, the sialic acid within the host receptor BCoV utilizes to bind to host cell.
From the binding site predictions, 11 transcription factors were identified. All factors were identified by at least two unique software in at least two different BCoV identified genes, with some identified in up to 10 different genes, as shown in Tables 2 and 3. The only TF to appear in all three prediction software was NKX2-5. For three genes, NKX2-5 appeared in the results of two software, and for two genes, it appeared in all three. AREB6, YY1, and LMO2 all also appeared frequently in the predictions. As illustrated in Figure 1 and Table 3, most TFs were identified for two or three genes, which emphasizes a relationship between the genes, however, not necessarily a relationship with the factors used to establish the greater list of genes. However, with transcription factors that appear in 5, 7, or 10 of the 11 identified genes (NKX2-5, LMO2, YY1, and AREB6), there appears to be a potential relationship with the defining factors of the original gene list.    Table 3. List of predicted transcription factors targeting 11 genes significantly expressed during infection with bovine coronavirus.
Microorganisms 2020, 8, x FOR PEER REVIEW 4 of 21 the binding site predictions, 11 transcription factors were identified. All factors were identified by at least two unique software in at least two different BCoV identified genes, with some identified in up to 10 different genes, as shown in Tables 2 and 3. The only TF to appear in all three prediction software was NKX2-5. For three genes, NKX2-5 appeared in the results of two software, and for two genes, it appeared in all three. AREB6, YY1, and LMO2 all also appeared frequently in the predictions. As illustrated in Figure 1 and Table 3, most TFs were identified for two or three genes, which emphasizes a relationship between the genes, however, not necessarily a relationship with the factors used to establish the greater list of genes. However, with transcription factors that appear in 5, 7, or 10 of the 11 identified genes (NKX2-5, LMO2, YY1, and AREB6), there appears to be a potential relationship with the defining factors of the original gene list.

Functional Analysis of Transcription Factors and Genes
After a list of genes and transcription factors were identified, functional analysis was performed to identify significant pathways. A pathway was considered significant if the fold enrichment was over 70 and p-values less than 0.00041. Figure 2a represents the number of genes and transcription factors identified within each pathway. The majority of the pathways only reported a total of two genes or transcription factors. However, the pathways "regulation of interferon-β production" and "regulation of type II interferon production" identified the same four genes and transcription factors: TLR7, YY1, TLR9, and IRF1. Figure 2b also represents results from the function analysis, however, with specific focus on immune specific pathways. These pathways reported between one and four genes and transcription factors. They are shown in the Cytoscape network illustrated in the section below (Section 3.3), where they are integrated into a network of genes and transcription factors.

Network Analysis of Transcription Factors, Genes, and Genes and Transcription Factors
Network analysis of the relationships between the TFs, the genes, and then the genes and TFs were performed initially with STRING (Figures 3 and 4). Since these figures are second or third interaction networks, there are genes included that are not included in original literature searches and pathway analyses and are not a part of the analysis. As shown by Figure 3, YY1 and NKX2-5 appear to be the largest hubs of predicted TFs. The TF ZEB1 is an alternative name of AREB6 and is only incorporated into the network through NKX2-5, which appears surprising. AREB6 is predicted in all genes except one, and therefore, is predicted for the same genes as the majority of the other transcription factors; however, the network only shows interaction with NKX2-5. Another surprising result from the network was that even in a second interaction network, ARNT does not have any interactions, which might represent an unrelated function to the other TFs. In Figure 4, some of the larger hubs from the original list of genes appear to be TLR9, TLR7, IL-6, TP53, and SRC, with the largest hub being RHOA. NANS is unconnected in this network, which is most likely due to its specific function as a gene. In Figure 5, a third interaction relationship, NANS, is still unconnected. ARNT becomes attached through an additional gene to YY1 and TP53, along with many other additional genes. NKX2-5 appears to be less of a hub than other transcription factors like ZEB1 (AREB6) and LMO2. The genes have much larger hubs than the transcription factors, with SRC, RHOA, and TP53 illustrating some of the largest original hubs. Figure 6 depicts the Cytoscape network, which incorporates transcription factors, genes, and immune significant pathways. Similar to earlier STRING networks, TLR7 appears to be a major hub of the network, creating the link between many transcription factors and pathways. YY1 appears to serve as a major transcription factor hub, even directly interacting with two major pathways "regulation of type I interferon production" and "regulation of interferon-β production". Many of the transcription factors such as HLF, E4BP4, NRF-2, and ELK-1, only connect to a singular gene. Although HLF's singular connection is to TLR7, the most prominent hub within the network, the transcription factors YY1 or AREB6, which interact with many genes including TLR7, appear to be a stronger target.  factor hub, even directly interacting with two major pathways "regulation of type I interferon production" and "regulation of interferon- production". Many of the transcription factors such as HLF, E4BP4, NRF-2, and ELK-1, only connect to a singular gene. Although HLF's singular connection is to TLR7, the most prominent hub within the network, the transcription factors YY1 or AREB6, which interact with many genes including TLR7, appear to be a stronger target.

Evolutionary Conservation of Transcription Factors
Based on the results of TF prediction, homolog searches were performed through NCBI BLAST to retrieve multiple sequences from unique species. Phylogenetic trees were produced for eight of the predicted transcription factors. The factors E47, RORAlpha1, and E4BP4 were not present on NCBI, Ensembl, or UniProt, and therefore were excluded from analysis. These trees are illustrated in Figure  7a-h. AREB6, which is depicted in Figure 7a, is highly conserved in comparison to Bos taurus within Bison and Bubalus bubalis clade. Bos indicus and Odocoileus virginianus texanus are also within the clade.

Evolutionary Conservation of Transcription Factors
Based on the results of TF prediction, homolog searches were performed through NCBI BLAST to retrieve multiple sequences from unique species. Phylogenetic trees were produced for eight of the predicted transcription factors. The factors E47, RORAlpha1, and E4BP4 were not present on NCBI, Ensembl, or UniProt, and therefore were excluded from analysis. These trees are illustrated in Figure 7a-h. AREB6, which is depicted in Figure 7a, is highly conserved in comparison to Bos taurus within Bison and Bubalus bubalis clade. Bos indicus and Odocoileus virginianus texanus are also within the clade. ARNT, illustrated in Figure 7b, has less evolutionary conservation in relation to Bos taurus. The closest relation appears to be with Ovis aries. In Figure 7c, which depicts the transcription factor C-Rel, there appears to be a lack of evolutionary conservation. C-Rel appears to be the closest to Bos taurus for Capra hircus and Ovis aries. The transcription factor HLF, present in Figure 7d, appears to be the most conserved in relation to Bos taurus in the species Bubalus bubalis and Bos indicus. Odocoileus viriginianus texanus, Capra hircus and Bison are also within the clade. Figure 7e illustrates the phylogenetic tree of CREB, which appears to have the most conservation from Bos taurus in Bos mutus. LMO2, the transcription factor for Figure 7f, is highly conserved between Bos taurus and Bos indicus. Bison and Bos mutus also appeared to be conserved in relation to Bos taurus. NKX2-5, one of the most predicted transcription factors, is illustrated in Figure 7g. Bos indicus appears to be very highly conserved in relation to Bos taurus, with Bos mutus and Bison also appearing closely conserved. Figure 7h depicts YY1, where Bos indicus and Bison appear to be the most conserved in relation to Bos taurus. Species within these phylogenies were mostly consistent between the trees, resulting in very similar phylogenetic trees and therefore very similar relationships of conservation. From these eight transcription factors, the species that appear to be highly conserved in relation to Bos taurus include Bos mutus, Bos indicus, Bison, and Bubalus bubalis.

Discussion
As one of the major risks to the cattle industry, bovine coronavirus outbreaks have the potential to cause significant economic effects and negative downturn in agricultural output. Causing both respiratory infections and weight-loss inducing diarrhea, along with long lasting lowered growth rates and diminished milk production, widespread BCoV infection would be devastating to countries with cattle as a primary source of GDP [8,9,31]. Despite the risk associated with uncontrolled outbreaks, BCoV is under researched and lacking in strong testing and treatment options. With recent progress in data mining, transcription factor prediction, and transcription targeted treatments, new testing and treatment methods are within reach. In this study, an extensive pipeline was developed to identify significant gene and transcription factor targets for future BCoV testing and treatment resources.
Through data mining and functional analyses, a list of 11 significant genes in BCoV infection were identified. Many of these genes were involved in immune response to infection, particularly inflammation. TLR7 and TLR9 are both implicated in pro-inflammatory signaling in response to recognized conserved viral molecular patterns and are key in the innate antiviral response [32,33]. TLR7 is activated by microbial derived nucleic acids while TLR9 is activated by dsDNA-derived CpG motifs [34]. TLRs are responsible for linking the innate and adaptive immune response, clearly a desired response to preserve or improve with a targeted drug. If a transcription factor was to repress TLR7 or TLR9, it would damage the immune response to BCoV. Other genes that would harm the desired immune response if repressed are RHOA, which is responsible for antigen presentation; T-cell activation; and chemokine, cytokine, and growth factor binding [35,36]; and IRF1, which is responsible for defense of host cells through the regulation of autoimmunity, inflammation, viral infections, and innate and adaptive immune responses [10]. The disruption of any of these genes will most likely affect the immune response. As seen from our results, the 10 transcription factors that target these four genes (HLF, YY1, AREB6, C-Rel, NKX2-5, E47, E4BP4, RORAlpha1, LMO2, ARNT) will either activate or repress the immune response through the activation or repression of these four genes.
IL-6 is also involved in immune reactions, however, as a pro-inflammatory cytokine, responsible for acute phase responses [37][38][39]. As pro-inflammatory cytokines result in inflammation, tissue damage, fever, and potential shock and even death [40], which are found to be increasingly expressed within respiratory tracts [41], a transcription factor enabled activation of the IL-6 gene is inferred to increase the potential of these negative effects. Similar to IL-6, NANS is inferred to be an undesirable target for transcription factor activation. The repression of the gene NANS would logically improve the desired immune response. NANS is the gene responsible for the production of N-acetyl 9-O-acetylneuraminic acid [42,43]; the sialic acid BCoV binds to on host cell receptors. The use of transcription factors to repress NANS would limit the production of N-acetyl 9-O-acetylneuraminic acid and reduce the opportunities for BCoV binding, thereby hampering the virus entry in to the host. Such limitations will reduce the capacity for infection, advance innate immunity and ameliorate further pathological outcome.
The other five genes lack a clear direct effect on the immune pathways and therefore have less inferable effects of potential regulation or activation through transcription factor targeting. CEBPB, involved in monocyte development [44], is key in the response to many types of infection but occasionally cause immunopathology [45]. TRAF2 is a key element of TNF [46,47], which has significant roles in inflammation. However, as also for CEBPD, SRC, and TP53, TNF has established links to cancer and its polymorphisms serving as susceptibility factor for many infectious diseases [48]. CEBPD is a tumor suppressor [49,50], which identifies it as a bad candidate for repression. However, it has been reported to enhance oncogenic pathways in different contexts [50]. TP53 has also been identified as a tumor suppressor gene [51] and SRC implicated in some cancers such as sarcoma [52]. As many of the transcription factors target both genes that should not be repressed and genes that should be repressed, transcription factors must be carefully chosen as target, with gene function in mind. Additionally, transcription factors such as NKX2-5, which have been proven to cause heart defects or disease when mutated or deleted [53,54], have external effects that must be carefully considered in the decision whether to target them for treatments. Transcription factors selected as biomarkers for targeting should be predicted for genes such as TLR7, TLR9, RHOA, IRF1, and IL6 since they are activated early in infection. These genes have functions directly related to immune response pathways and have the highest probability of expression during BCoV. The possibility to activate them for a robust innate immune response through TF-targeted therapy would be important in the treatment of BCoV infection [55,56]. Given the diverse functions of the 11 genes identified as significantly expressed during BCoV infection, and the high overlap between the genes for predicted transcription factors, the transcription factor selected to serve as the target for therapy must be selected with those functions in mind.

Conclusions
This study identifies specific transcription factors that could affect host response to bovine coronavirus infection, particularly within the immune response system. Eleven genes were identified as significantly expressed during infection, and an extensive pipeline was developed to predict and analyze potential transcription factors targeting those genes. Transcription factor involved with many genes and pathways were inferred to have a stronger potential effect on the host if targeted, due to their role in activating or repressing many genes. For example, the targeting of AREB6 would regulate 10 of the 11 identified genes, including TLR7, TLR9, IRF1, and many more. We established the predicted effect of these transcription factors based on the function of their gene interactions and associated pathways. In total, this study establishes relationships between bovine genes, transcription factors, and immune pathways. Given that the genes NANS and IL-6 were identified as having potentially negative effect on immune response, the regulation/activation of genes to improve host immune response to BCoV must be done through transcription factors capable of repressing either or both of them. The best transcription factors we identified as potential markers for host-pathogen interactome in BCoV infection include NKX2-5, AREB6, YY1, and LMO2. Given the criteria above and the fact that YY1 could target TLR7 and TLR9 signaling, type I and II interferon production/pathways, reported as effective for virus clearance during infection, if activated, establishes YY1 as the best TF with the best potential as a drug target for BCoV therapy. Funding: This work was supported by the Ralph E. Hansmann Science Students Support Fund and the Sergei S. Zlinkoff Student Medical Research Fund, Hamilton College SSRF. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The APC was discounted by MDPI.

Conflicts of Interest:
The authors declare we have no conflicts of interest.