Evolutionary Patterns of Non-Coding RNA in Cardiovascular Biology

Cardiovascular diseases (CVDs) affect the heart and the vascular system with high prevalence and place a huge burden on society as well as on the healthcare system. These complex diseases often result from multiple genetic and environmental risk factors, which makes it challenging to understand their etiology and consequences. With the advent of next-generation sequencing, many non-coding RNA transcripts, especially long non-coding RNAs (lncRNAs), have been linked to the pathogenesis of CVD. Despite increasing evidence, proper functional characterization of most of these molecules is still lacking. Sequence conservation across related species has been used to functionally annotate protein-coding genes. In contrast, the rapid evolutionary turnover and weak sequence conservation of lncRNAs make it difficult to characterize functional homologs for these sequences. Recent studies have tried to explore other dimensions of interspecies conservation to elucidate the functional role of these novel transcripts. In this review, we summarize the methodologies adopted to explore the evolutionary conservation of cardiovascular non-coding RNAs at the sequence, secondary structure, synteny, and expression levels.


Introduction
Myocardial and vascular ailments are complex systemic diseases that progressively lead to chronic cardiac complications. These cardiovascular diseases (CVDs) encompass a broad range of disorders including atherosclerosis, inflammatory heart disease, arrhythmias, and congenital heart disease, among others. Cardiovascular disease remains the leading cause of death worldwide, exceeding deaths due to communicable diseases such as malaria, HIV/AIDS, and tuberculosis [1,2]. In 2016, an estimated 17.9 million people died from CVDs worldwide [1,3]. Approximately 85% of these deaths were due to myocardial infarction and stroke. Currently, 80% of CVD mortality occurs in developing nations, and CVD is expected to be the major cause of mortality in most developing nations by 2020. In 2011, three in every 10 deaths were caused by CVD, and it is estimated that by 2030, 23.3 million people will die annually due to CVD [1,2].
In addition to sex, age, and other environmental factors, genetic factors are major drivers for complex cardiovascular diseases [2,3]. Over the past years, several genetic studies have tried to correlate genotype with phenotype, i.e., to identify gene-gene and gene-environment interactions.

MicroRNAs
Another major class of non-coding RNAs is the small non-coding RNAs, of which miRNAs are the most notable. MicroRNAs are ~22-nucleotide single-stranded molecules which primarily function as post-transcriptional regulators of gene expression. They are the functional unit of the RNA-induced silencing complex (RISC), which binds to its target mRNA in a sequence-dependent manner, resulting in the degradation or deadenylation of the mRNA [49]. Additionally, miRNAs can be sequestered by lncRNAs or pseudogenes, introducing a new layer of regulatory complexity [50].
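The sequence-dependent targeting described above is often approximated computationally by searching an mRNA for sites complementary to the miRNA "seed" (nucleotides 2-8). The following is a minimal sketch of that idea only; the seed definition is an assumption not stated in this review, the function names and the 3' UTR are hypothetical, and real target prediction weighs many additional features.

```python
from typing import List

# Watson-Crick complements for RNA
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions in `utr` that pair with miRNA nucleotides 2-8."""
    seed = mirna[1:8]                 # nucleotides 2-8 (assumed seed definition)
    site = reverse_complement(seed)   # sequence expected on the target mRNA
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Illustrative inputs: an example 22-nt miRNA and a made-up 3' UTR fragment
mirna = "UAGCUUAUCAGACUGAUGUUGA"
utr = "GGCAUAAGCUACCGUAUAAGCUUU"
print(seed_sites(mirna, utr))
```

A perfect seed match is neither necessary nor sufficient for repression, so hits from a search like this are candidates for experimental validation rather than confirmed targets.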
MicroRNAs play an integral part in all facets of cardiovascular biology, including smooth muscle maturation and proliferation, endothelial function, and regulation of genes involved in cardiogenesis. Several pathological conditions, such as atherosclerosis, heart failure, cardiomyopathy, and myocardial fibrosis, have been shown to result from the dysregulation of miRNAs (Table 2).
Considerable effort has gone into developing miRNA mimics and anti-miRNA inhibitor molecules as therapeutic interventions to regulate disease physiology. One of the first miRNA-dependent therapies was developed in 2008, when a highly specific antagomir of miR-21 was used to attenuate cardiac dysfunction in a rodent model of cardiac fibrosis [51]. The high stability of circulating miRNAs in plasma and their differential expression in disease phenotypes also make them excellent candidates as biomarkers in CVD.

Circular RNAs
Recent developments in RNA sequencing technology have facilitated the characterization of several novel RNA transcripts. Circular RNAs (circRNAs) represent one such emerging class, which has been identified across multiple species including archaea, fungi, plants, fish, insects, and mammals [65][66][67]. These transcripts have been shown to perform a myriad of regulatory roles in multiple biological processes. Circular RNAs are known to function as miRNA sponges [66,68], splicing competitors [69], protein-binding or sequestering molecules [70], and transcription [71] and translation [70] regulators of the host gene. Some circRNAs have also been shown to produce proteins using the translational machinery in a cap-independent manner [72,73]. The expression of circRNAs is spatio-temporally regulated and plays a critical role in the development and pathogenesis of several diseases, including cancer, neurological disorders, and CVD [74][75][76].
Many transcriptomic studies have focused on the identification of circRNAs [77,78] during cardiac development and under pathological conditions. Interestingly, most of these studies detect differential expression of multiple circRNA isoforms specifically from the TTN and RYR2 genes. These genes are known to play an important role in cardiovascular biology, yet the functional characterization of their circular isoforms remains to be established. Recent studies have tried to interpret the role of many candidate circRNAs in cardiovascular development and disease; these are summarized in Table 3.
The effort to identify circRNAs is also driven in part by the promise these novel transcripts hold as potential biomarkers. One reason is that they are expressed in a cell-specific manner. Another is the lack of free ends, which renders them resistant to exonuclease-mediated degradation. CircRNAs have been shown to have a median half-life at least 2.5 times longer than that of their linear counterparts [79]. Apart from being highly stable, they have been detected circulating in the blood and are present in plasma as well as in extracellular vesicles [80][81][82]. One study also showed their presence in cell-free saliva, which makes them excellent candidates for non-invasive detection [83]. Wesselhoeft et al. engineered circular RNAs for protein production and showed that they are robust and stable protein producers, which also suggests their potential as therapeutic vehicles [84].

RNA-Sequencing for Identification of Non-Coding RNA
RNA sequencing (RNA-seq) has emerged as one of the major facilitators for the identification and characterization of ncRNAs. RNA sequencing characterizes CVD by studying transcriptome-wide expression profiles, alternative splicing patterns, and regulatory networks that provide deeper insight into the biochemical pathways altered in the diseased condition and into possible modifiable genome-level interactions. RNA sequencing has enabled us to compare gene expression in diseased and non-diseased tissues or blood components to yield a set of genes that might explain the pathological condition. Several variants of the RNA-seq protocol have been developed to study the spatio-temporal complexity of individual components of the transcriptome. These protocols have been accompanied by computational methodologies to assist in the proper quantification of transcripts [91].
Most lncRNA detection studies involve either poly-A enrichment or rRNA depletion before library preparation. Because mRNAs and many lncRNAs contain a poly-A tail, these molecules can be detected using poly-A enrichment. However, the many non-polyadenylated lncRNAs will not be captured by this approach. Sequencing protocols involving rRNA depletion cover the whole diversity of transcripts. Thus, the choice of sequencing methodology depends largely on the desired targets and on economic viability. Multiple algorithms have been developed to help distinguish between protein-coding and lncRNA transcripts [92]. Small RNA-seq enables the identification of miRNAs and other small RNA species using size selection techniques. Although total RNA-seq can capture circRNA transcripts, specialized protocols have been developed to enrich for circular transcripts by selecting against poly-A transcripts. Several algorithms have also been developed to facilitate the identification of back-splicing junctions, which are a hallmark of circRNA transcripts [93,94]. Recently, the focus has also shifted towards quantifying these molecules and their abundance relative to the linear counterparts of the host gene [95,96].
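The back-splicing junction mentioned above can be recognized in split-read alignments because the two pieces of a read map in the "wrong" genomic order: the downstream part of the read aligns upstream of its beginning. The sketch below illustrates only this ordering test together with a simple circular fraction; it is not the algorithm of any tool cited here, and the `Segment` type, function names, and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a split read (hypothetical, simplified record)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based genomic start
    end: int     # exclusive genomic end


def is_back_splice(first: Segment, second: Segment) -> bool:
    """`first` and `second` are consecutive pieces of one read in 5'->3' read order."""
    if first.chrom != second.chrom or first.strand != second.strand:
        return False
    if first.strand == "+":
        # Linear splicing places `second` downstream; back-splicing places it upstream.
        return second.end <= first.start
    # On the minus strand the transcript runs towards lower coordinates.
    return second.start >= first.end


def circular_fraction(back_spliced_reads: int, linear_reads: int) -> float:
    """Share of junction-spanning reads that support the circular isoform."""
    total = back_spliced_reads + linear_reads
    return back_spliced_reads / total if total else 0.0


read_start = Segment("chr2", "+", 5000, 5050)
read_end = Segment("chr2", "+", 1000, 1050)
print(is_back_splice(read_start, read_end))   # candidate back-splice junction
print(circular_fraction(30, 90))
```

Published detection tools additionally check splice-site signals, mate-pair consistency, and mapping quality, so a bare ordering test like this would overcall junctions on real data.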

Experimental Methodologies to Explore ncRNA Functionality
Despite the growth in the number of lncRNAs and database resources, most of the lncRNAs remain uncharacterized [97]. While miRNA function and their binding targets are better understood, these resources remain far from completion [98]. Many resources provide information about experimentally validated functions of lncRNAs, yet just looking at the number of represented lncRNAs makes the void quite evident [99,100].
One way of addressing this gap is to search for homologous transcripts in related organisms. The main assumption behind most ortholog identification studies is that orthologs also share biological function. However, due to limited consensus on the methods for identifying ncRNA homologs, this may not always be true. Therefore, apart from verifying the presence of individual lncRNAs, there is also an urgent need to experimentally validate their biological role.
Despite recent progress, functional genomics is yet to be fully exploited to understand the functions of lncRNAs, their interactions, and their mechanisms of regulation and physiological relevance. In recent years, novel methodologies have been developed to probe the function of individual transcripts, mostly involving overexpression, knockout, or knockdown studies (Table 4) [101,102]. Several high-throughput techniques also enable the investigation of interactions of lncRNAs with DNA, RNA, and proteins (Table 4) [103]. As lncRNAs are pivotal to cardiovascular biology, functional validation can help deepen our understanding of their biological implications in development and disease.

Conserved Nature of Non-Coding RNAs
The evolutionary conservation of ncRNAs has been a topic of intense research in the last few years. While some classes of ncRNA, such as miRNAs, are considered highly conserved, establishing the conservation of lncRNAs remains challenging. Most of the earlier efforts focused on establishing orthologous relations based on sequence conservation. Some studies tried to identify segments of the genome that are ultra-conserved across species and found that the majority of them are located in introns and intergenic regions [104]. Further studies confirmed that most of these regions are indeed transcribed into lncRNA sequences [105].
On the other end of this spectrum, Pollard et al. [106] investigated regions that show high sequence divergence in humans but are conserved in other species, and found them also to lie mostly within non-coding regions. They argued that a lack of sequence conservation does not mean a lack of function. Although sequence conservation remains the primary method for identifying orthologs, many researchers have tried to complement it with structure, synteny, and expression level conservation (Figure 1).

Sequence Level
The precise detection of homologous transcripts has mainly relied on sequence level conservation between species. Over the years, several resources have been developed to infer these orthologous relations, which can be divided into tree-based and graph-based algorithms [107]. The main principle behind these methodologies is to differentiate between orthologs, which result from speciation events and are assumed to have the same function, and paralogs, which result from gene duplication and can differ functionally. Several attempts have also been made to compare and standardize the methodologies in order to achieve more accurate ortholog detection [108][109][110]. However, the fact that lncRNAs are not well conserved at the sequence level has limited the application of these methods beyond coding genes. In fact, due to the degree to which the sequences have diverged, it is sometimes impossible to call any ortholog. There are only a handful of known lncRNAs which show sequence conservation similar to that of coding genes [111].
With decreasing sequencing costs, it has become feasible to investigate genome-wide lncRNA sequences across organisms. Studies that have examined genome-wide sequence homology in lncRNAs mainly employ a reciprocal best hit method involving two-way sequence alignment (Table 5). Most of these studies looked at transcriptome patterns across different organs to capture the complete transcriptomic heterogeneity of each species [112,113]. Recent attempts have tried to improve this approach by utilizing synteny and structure-based methods to aid in the identification of orthologs [114]. Nonetheless, these studies mostly agree that lncRNAs undergo rapid evolutionary changes and that the sequences are rarely conserved beyond a particular evolutionary point.
However, the transcriptome profiles evolve more dynamically, and in many cases the comparison of transcribed sequences may not provide the complete perspective. Changes in splicing patterns and exonic boundaries result in lncRNAs of one species aligning to non-transcribed regions in the other. Moreover, in some cases there is no sequence similarity between organisms except near the 5′ end and promoter sequence of the lncRNA. In fact, some studies have also pointed out that the promoter regions of lncRNAs are often as conserved as the promoters of protein-coding genes [113,117]. These complexities make it crucial to correctly identify and characterize lncRNA orthologs. Numerous lncRNAs, such as MALAT1, HOTAIR, GAS5, CARMEN, and CHAST, have been identified in humans with some degree of sequence conservation across other organisms. Still, there are many other lncRNAs in CVD whose orthologs are yet to be identified.
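The reciprocal best hit method mentioned above can be sketched in a few lines: a pair of transcripts is called putatively orthologous only when each is the other's top-scoring match in the two-way alignment. This is a generic illustration rather than the pipeline of any study in Table 5; the transcript identifiers and alignment scores are hypothetical, and the scores are assumed to come from some pairwise alignment tool.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> alignment score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores: human transcripts searched against mouse, and vice versa
human_vs_mouse = {
    ("hs_lnc1", "mm_lncA"): 250.0,
    ("hs_lnc1", "mm_lncB"): 90.0,
    ("hs_lnc2", "mm_lncB"): 120.0,
}
mouse_vs_human = {
    ("mm_lncA", "hs_lnc1"): 240.0,
    ("mm_lncB", "hs_lnc1"): 95.0,
    ("mm_lncB", "hs_lnc2"): 80.0,
}
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))
```

In this toy input, hs_lnc2 finds mm_lncB as its best hit, but mm_lncB prefers hs_lnc1, so the pair is not reported. The reciprocity requirement is what keeps one-sided matches, for example to a paralog, out of the ortholog set.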

Structure Level
Non-coding RNAs, especially miRNAs, are known to form secondary structures, which are important for their interactions with other biomolecules and thus for their function. Just like mRNA transcripts, lncRNAs are also known to form stable secondary structures [118]. The absence of significant sequence conservation does not mean a lack of selection at the structural level [119]. This is evident in the case of the telomerase RNA and the stem region of miRNAs, which maintain structural integrity even in the absence of sequence similarity. However, this hypothesis has been tested with limited success in the case of lncRNAs.
The fact that even random RNA sequences can form stable structures suggests that the ability to form a stable structure does not, by itself, indicate function. Indeed, there are contradictory views about the conservation of lncRNA secondary structures. While some studies suggest a lack of any statistically significant conserved RNA structure for some lncRNAs, others have shown conserved structural domains in several lncRNAs [120][121][122][123]. In fact, several important cardiovascular lncRNAs, such as GAS5 and HOTAIR, have been shown to have some degree of structural conservation [124,125].
Over the last two decades, several studies have tried to use computational methods to look at genome-wide RNA secondary structure conservation (Table 6). Most of these tools are based on sequence alignment, and thus require a certain degree of sequence conservation. Others have tried to overcome this obstacle by using conserved synteny as the basis for identifying stretches for structural survey. However, most of these computational methods, irrespective of their underlying principle, suffer from low detection accuracy, and their predictions rarely agree [126].
Recent advancements in high-throughput technologies have made it possible to probe RNA structures across the genome [139]. Methodologies such as PARS (parallel analysis of RNA structure), Frag-Seq (fragmentation-sequencing), SHAPE-Seq, and DMS-Seq, among others, have been developed to determine RNA structures on a genome-wide scale. These experimental methodologies, coupled with computational algorithms, can greatly improve the accuracy of RNA structure prediction, thus improving our understanding of lncRNA structure conservation.
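One signal that alignment-based structure tools look for is covariation: a substitution on one side of a stem is matched by a substitution on the other side, so that base pairing survives even though the sequence changes. The sketch below counts such events for two aligned sequences against a shared dot-bracket structure. It is a simplified illustration, not the scoring scheme of any tool in Table 6; the sequences, the structure, and the category names are hypothetical.

```python
from typing import Dict, List, Tuple

# Watson-Crick pairs plus the G-U wobble pair
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(dot_bracket: str) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs for matching brackets in a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def pair_support(structure: str, seq_a: str, seq_b: str) -> Dict[str, int]:
    """Classify each base pair of `structure` by how it changes between two aligned sequences."""
    counts = {"identical": 0, "consistent": 0, "compensatory": 0, "broken": 0}
    for i, j in parse_pairs(structure):
        pair_a, pair_b = (seq_a[i], seq_a[j]), (seq_b[i], seq_b[j])
        if pair_a not in CANONICAL or pair_b not in CANONICAL:
            counts["broken"] += 1          # pairing lost in at least one sequence
        elif pair_a == pair_b:
            counts["identical"] += 1       # no substitution
        elif pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]:
            counts["compensatory"] += 1    # both sides changed, pairing kept
        else:
            counts["consistent"] += 1      # one side changed, pairing kept
    return counts


structure = "((((...))))"
seq_a = "GGCAGAAUGCC"
seq_b = "GACGGAAUGUC"
print(pair_support(structure, seq_a, seq_b))
```

With only two sequences, a handful of compensatory pairs is weak evidence; the tools discussed here draw on multiple alignments and statistical background models, which this sketch omits.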

Synteny Level
Non-coding RNAs are known to regulate the expression of protein-coding genes via both cis- and trans-acting mechanisms. Although most lncRNAs undergo rapid evolutionary turnover in terms of sequence and transcription, their syntenic relationship with neighboring genes appears to be preserved [114,140,141]. Often, such lncRNAs display only local sequence conservation, mostly near the promoter region, which suggests that the transcriptional event at that locus is essential and that the lncRNA itself might be of less importance. FENDRR and PVT1 are two lncRNAs which are essential to cardiovascular biology and do not show high levels of sequence conservation, yet their relative location is conserved [114].
The fact that lncRNAs maintain their positional integrity across species, provides insight into the origins of these transcripts. Hezroni and co-workers [142] suggest some lncRNAs might be relics of ancestral genes which lost their coding potential. Ning et al. [143] suggested that many of the lncRNA-coding gene overlap pairs were a result of overprinting and not due to genomic rearrangements. Other recent findings suggest lncRNAs to be intermediaries leading to the origin of novel protein coding genes [144,145]. Chen et al. [141] also investigated the positional conservation of lncRNAs with respect to miRNAs, snoRNAs, and protein coding transcripts and suggested their classification based on evolutionary history.
This evidence suggests that the position of lncRNAs is important for their cis-regulatory function. Long non-coding RNAs that are antisense to protein-coding genes have been shown to influence nearly every aspect of gene expression regulation by interacting with DNA, RNA, and proteins of the respective coding gene [146]. In particular, lncRNAs overlapping protein-coding genes display a high level of co-expression and tissue specificity, resulting in their evolutionary retention [143]. Amaral et al. [147] described lncRNAs which bear positionally conserved promoters in humans and mice and are enriched at topologically associating domain (TAD) boundaries. Their findings indicate a role for these lncRNAs in regulating the expression of neighboring genes and in modulating chromatin looping. These studies emphasize the importance of syntenic conservation for the functional properties of lncRNAs.
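Positional conservation of the kind discussed in this section can be tested without any lncRNA sequence alignment: a lncRNA in one species is paired with a lncRNA in another when both sit between the same orthologous protein-coding neighbors. The following is a minimal sketch under strong simplifying assumptions (one chromosome per species, genes given in genomic order, a one-to-one ortholog table); all gene identifiers and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, str]  # (gene_id, biotype), biotype is "coding" or "lncRNA"


def flanking_coding(genes: List[Gene], index: int) -> Tuple[Optional[str], Optional[str]]:
    """Nearest coding genes on either side of genes[index] along one chromosome."""
    left = next((gid for gid, kind in reversed(genes[:index]) if kind == "coding"), None)
    right = next((gid for gid, kind in genes[index + 1:] if kind == "coding"), None)
    return left, right


def syntenic_lncrna_pairs(
    genes_a: List[Gene], genes_b: List[Gene], orthologs: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair lncRNAs whose flanking coding genes are orthologous, in either orientation."""
    pairs = []
    for i, (lnc_a, kind_a) in enumerate(genes_a):
        if kind_a != "lncRNA":
            continue
        left_a, right_a = flanking_coding(genes_a, i)
        if left_a is None or right_a is None:
            continue
        expected = {orthologs.get(left_a), orthologs.get(right_a)}
        if None in expected:
            continue  # a neighbor has no known ortholog
        for j, (lnc_b, kind_b) in enumerate(genes_b):
            # Comparing sets ignores order, so an inverted block still matches.
            if kind_b == "lncRNA" and set(flanking_coding(genes_b, j)) == expected:
                pairs.append((lnc_a, lnc_b))
    return pairs


genes_human = [("HS_G1", "coding"), ("HS_LNC1", "lncRNA"), ("HS_G2", "coding"),
               ("HS_LNC2", "lncRNA"), ("HS_G3", "coding")]
genes_mouse = [("MM_G3", "coding"), ("MM_LNC9", "lncRNA"), ("MM_G2", "coding"),
               ("MM_G1", "coding")]
orthologs = {"HS_G1": "MM_G1", "HS_G2": "MM_G2", "HS_G3": "MM_G3"}
print(syntenic_lncrna_pairs(genes_human, genes_mouse, orthologs))
```

In this example HS_LNC2 and MM_LNC9 are paired because both lie between the G2 and G3 orthologs, even though the block is inverted in the second species. HS_LNC1 has no counterpart, consistent with a transcript that was gained or lost at that position.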

Expression Level
Transcriptome profiles of individual organs have been demonstrated to be more conserved across species than they are across organs within the same species [148]. Long non-coding RNAs are expressed at lower levels than mRNAs and are less conserved at the sequence level, but they are known to be highly tissue specific [113,114]. It therefore becomes imperative to carefully match homologous tissues across species in order to capture the complete expression profiles. This specificity has also been observed for the human heart. Not only is the transcriptome profile of the heart different from that of other organs, but recent studies have also demonstrated that it differs across the heart chambers [149,150]. These differences shed some light on the function and pathophysiology of heart-related ailments. Indeed, there are known examples of non-coding transcripts which are expressed exclusively in a particular heart chamber (Figure 2), yet not much is known about this specificity. Future studies will provide deeper insight into the conservation of expression profiles across heart chambers and other tissue subtypes.
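Tissue specificity of the kind described above is often summarized with a single number per transcript. One commonly used choice, which is an assumption here rather than a measure named in this review, is the tau index: 0 for uniform expression and 1 for expression confined to a single tissue. The sketch below computes tau and checks whether a candidate ortholog pair peaks in the same matched tissue; the expression values and function names are hypothetical.

```python
from typing import Dict


def tau(expression: Dict[str, float]) -> float:
    """Tissue-specificity index: 0 = uniform expression, 1 = single-tissue expression."""
    values = list(expression.values())
    peak = max(values)
    if peak == 0 or len(values) < 2:
        return 0.0
    return sum(1 - value / peak for value in values) / (len(values) - 1)


def same_top_tissue(expr_a: Dict[str, float], expr_b: Dict[str, float]) -> bool:
    """Compare the peak tissue of two transcripts over the tissues sampled in both species."""
    shared = sorted(set(expr_a) & set(expr_b))
    if not shared:
        return False
    top_a = max(shared, key=lambda tissue: expr_a[tissue])
    top_b = max(shared, key=lambda tissue: expr_b[tissue])
    return top_a == top_b


# Hypothetical expression levels for a candidate lncRNA ortholog pair
human = {"heart": 12.0, "liver": 0.5, "brain": 1.0}
mouse = {"heart": 8.0, "liver": 0.2, "brain": 0.4}
print(round(tau(human), 4), round(tau(mouse), 4), same_top_tissue(human, mouse))
```

Restricting the comparison to tissues sampled in both species reflects the point made above: an expression profile is only comparable across species when homologous tissues are matched.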

Conclusions
Advancements in NGS technologies have accelerated the identification of various novel ncRNA transcripts in CVD. However, only a handful of these transcripts have been functionally characterized. The lack of high-throughput experimental approaches to elucidate the role of these transcripts makes their functional investigation very challenging. The availability of well-characterized genomes has led to the emergence of comparative genomics methodologies to functionally annotate them. These methods are highly dependent on sequence conservation across species and are thus limited mainly to protein-coding genes.
Although several lncRNAs show sequence conservation, rapid evolutionary turnover has often resulted in sequence divergence beyond recognition. Despite this, most lncRNAs appear to have conserved expression patterns and functions. Over the past years, several experimental methodologies have been developed to explore the structural elements within lncRNAs. These protocols have enabled us to investigate the genome-wide structural conservation of lncRNAs. Apart from this, many studies have tried to exploit the syntenic conservation of lncRNAs to improve the characterization of their homologs. These studies highlight the fact that there are several dimensions to interspecies conservation, and that a lack of sequence conservation does not imply a lack of function.
Novel and innovative approaches, accompanied by improved experimental methodologies, should aid in understanding the functional implications of non-coding transcripts. In summary, future studies encompassing these dimensions of non-coding RNA conservation offer an exciting opportunity to investigate the role of non-coding RNAs in the cardiovascular system.