Genome-Wide Analysis Reveals Changes in Long Noncoding RNAs in the Differentiation of Canine BMSCs into Insulin-Producing Cells

Long noncoding RNAs (lncRNAs) have been extensively explored over the past decade, including mice and humans. However, their impact on the transdifferentiation of canine bone marrow mesenchymal stem cells (cBMSCs) into insulin-producing cells (IPCs) is largely unknown. In this study, we used a three-step induction procedure to induce cBMSCs into IPCs, and samples (two biological replicates each) were obtained after each step; the samples consisted of “BMSCs” (B), “stage 1” (S1), “stage 2” (S2), “stage 3” (S3), and “islets” (I). After sequencing, 15,091 lncRNAs were identified, and we screened 110, 41, 23, and 686 differentially expressed lncRNAs (padjusted < 0.05) in B vs. S1, S1 vs. S2, S2 vs. S3, and I vs. S3 pairwise comparisons, respectively. In lncRNA target prediction, there were 166,623 colocalized targets and 2,976,362 correlated targets. Gene Ontology (GO) analysis showed that binding represented the main molecular functions of both the cis- and trans-modes. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis suggested that the insulin signaling pathway, Rap1 signaling pathway, tight junctions, MAPK signaling pathway, and cell cycle were enriched for these relative genes. The expression of lncRNAs was verified using qRT-PCR. This study provides a lncRNA catalog for future research concerning the mechanism of the transdifferentiation of cBMSCs into IPCs.


Introduction
Long noncoding RNAs (LncRNAs) are the rest of the protein-coding transcripts >200 nt in length that lack coding potential [1]. LncRNAs were discovered many years ago; for example, XIST was found in the 1980s [2]. To date, lncRNAs can be divided into five kinds based on their relative positions with respect to protein-coding genes: (1) intergenic lncRNAs, which are also known as lincRNAs, are transcribed from the DNA sequence between two protein-coding genes [3,4]; (2) intronic lncRNAs are transcribed from introns of protein-coding genes [5]; (3) antisense lncRNAs, as their name implies, are generated from the opposite direction of protein-coding genes [6]; (4) sense lncRNAs overlap protein-coding genes; and (5) bidirectional lncRNAs [7]. LncRNA catalogs of human [8], mouse [9], zebrafish [10], and fly [11] lncRNAs have already been collected; however, canine lncRNAs data are lacking. LncRNAs can be classified into two categories based on whether they function in cis-(these lncRNAs are located near the transcription site) or in trans-(this role is more complicated than the cis-role). These potential differences provide lncRNAs the ability to participate in many biological processes. It has been reported that lncRNAs can be therapeutic targets for managing many diseases, such as breast cancer [12] and hepatocellular carcinoma [13]. LncRNAs can also influence the

Overview of lncRNA Sequencing
The library was constructed from B, I, S1, S2, and S3, with two biological replicates. The sequencing data showed that there were approximately 90 million clean reads of approximately 100 million raw reads per sample (Table S3). Among these reads, 92-95% of the clean reads mapped to the reference genome (Canis lupus familiaris CanFam3.1, NCBI annotation release number 105), and the percentage of uniquely mapped reads accounted for 78.95-88.65% (Table S4). The mapped reads were classified into eight types, and the percentages of lncRNAs in B, I, S1, S2, and S3 were 0.4%, 0.9%, 0.5%, 0.7%, and 0.7%, respectively ( Figure S1). The data were uploaded to the SRA (Sequence Read Archive) database (https://www.ncbi.nlm.nih.gov/sra) under submission number SUB7457848.

Identification of lncRNAs
We used a 5-step procedure to filter the correct lncRNAs without gene-coding transcripts ( Figure 1) and screened 15,091 lncRNAs including lincRNAs (30.1%), antisense lncRNAs (8.5%), and intronic lncRNAs (61.4%) (Figure 2A). Most lncRNAs contained approximately 4 exons and no more than 20 exons; moreover, the peak number in the protein-coding transcripts was 8 exons, with a maximum of 40 exons ( Figure 2C). In addition, a length of 1000 bp represented the dominant portion of lncRNAs, while the length of known transcripts is nearly 2500 bp ( Figure 2D). Overall, compared with protein-coding transcripts, lncRNAs had fewer exons but a similar length.

Differentially Expressed lncRNAs
Pairwise comparisons between different samples based on the fragments per kilobase of exon per million fragments mapped (FPKM) indicated that all samples could be clustered into different groups (two biological replicate each) and showed that I and S3 were similar to some extent ( Figure 2B). There were 21 upregulated and 89 downregulated lncRNAs in the B vs. S1 comparison ( Figure 3A). Similarly, there were 20 upregulated and 21 downregulated lncRNAs in S1 vs. S2 ( Figure 3B Figure 3E). These findings suggest that the differentially expressed lncRNAs slightly declined during induction ( Figure 3F). lncRNAs (61.4%) (Figure 2A). Most lncRNAs contained approximately 4 exons and no more than 20 exons; moreover, the peak number in the protein-coding transcripts was 8 exons, with a maximum of 40 exons ( Figure 2C). In addition, a length of 1000 bp represented the dominant portion of lncRNAs, while the length of known transcripts is nearly 2500 bp ( Figure 2D). Overall, compared with proteincoding transcripts, lncRNAs had fewer exons but a similar length.

Differentially Expressed lncRNAs
Pairwise comparisons between different samples based on the fragments per kilobase of exon per million fragments mapped (FPKM) indicated that all samples could be clustered into different groups (two biological replicate each) and showed that I and S3 were similar to some extent ( Figure 2B). There were 21 upregulated and 89 downregulated lncRNAs in the B vs. S1 comparison ( Figure 3A). Similarly, there were 20 upregulated and 21 downregulated lncRNAs in S1 vs. S2 ( Figure 3B), 12 upregulated and 11 downregulated lncRNAs in S2 vs. S3 ( Figure 3C), 357 upregulated and 329 downregulated lncRNAs

Verification of Target Genes and lncRNAs
FOS (Fos proto-oncogene), VIP (vasoactive intestinal peptide), SSTR2 (somatostatin receptor 2), and RPS6KA6 (ribosomal protein S6 kinase A6) were selected from the PPI (protein-protein interaction) network based on our previous work of protein-coding transcripts, and the chosen lncRNAs (LNC_000123, LNC_004661, LNC_014532, LNC_004320, and XR_292983.1) were colocalized with these genes. The qRT-PCR results were consistent with the RNA-Seq data ( Figure 8). However, whether internal interactions exist requires further detection.

Verification of Target Genes and lncRNAs
FOS (Fos proto-oncogene), VIP (vasoactive intestinal peptide), SSTR2 (somatostatin receptor 2), and RPS6KA6 (ribosomal protein S6 kinase A6) were selected from the PPI (protein-protein interaction) network based on our previous work of protein-coding transcripts, and the chosen lncRNAs (LNC_000123, LNC_004661, LNC_014532, LNC_004320, and XR_292983.1) were colocalized with these genes. The qRT-PCR results were consistent with the RNA-Seq data ( Figure 8). However, whether internal interactions exist requires further detection.

Verification of Target Genes and lncRNAs
FOS (Fos proto-oncogene), VIP (vasoactive intestinal peptide), SSTR2 (somatostatin receptor 2), and RPS6KA6 (ribosomal protein S6 kinase A6) were selected from the PPI (protein-protein interaction) network based on our previous work of protein-coding transcripts, and the chosen lncRNAs (LNC_000123, LNC_004661, LNC_014532, LNC_004320, and XR_292983.1) were colocalized with these genes. The qRT-PCR results were consistent with the RNA-Seq data ( Figure 8). However, whether internal interactions exist requires further detection.

Discussion
Nearly 90% of the mammalian genome is transcribed as noncoding RNAs, and many of these noncoding RNAs are treated as junk or transcript noise [21]. However, this concept has been brought to the forefront in recent years. It has been confirmed that lncRNAs can function in transcription and posttranscription in addition to interacting with proteins [22] and can even participate in epigenetic modification [23]. According to many studies in other species and based on existing high-throughput technology, many lncRNAs have been reported in either disease research or basic research areas. The two reasons we chose cBMSCs, derived from the long bones of Chinese rural dogs, as our cell resource were as follows: (1) cBMSCs, which have the ability to evade the surveillance of the immune system [24] and are used as materials in diabetes research, are not involved in ethical problems as embryonic stem cells or other somatic cells derived from humans. (2) Canines, which are our friends and family members in normal life, represent an ideal model for disease research because they live in the same environment as humans; furthermore, there are increasing numbers of clinical cases of canine diabetes. In general, our work can serve as foundation not only for canine clinical treatment but also for research investigating human diabetes.
In this study, based on a five-step filter (Figure 1), we obtained 15,091 lncRNAs and 110, 41, 23, and 686 differentially expressed lncRNAs in the pairwise comparisons of different states of induction. LncRNAs accounted for only less than one percent of the RNA-Seq data, whereas protein-coding transcripts represented nearly 70%. However, we observed a relatively large increase in the lncRNAs ratio during the induction ( Figure S1). Other research has shown that a low number of lncRNAs is sensible [25,26]. Regarding the characteristics of lncRNAs, their exons and lengths are smaller and shorter than those of protein-coding transcripts [27]; as a result, lncRNAs cannot be translated into proteins, although some lncRNAs can encode functional peptides to regulate biological processes [28]. The expression levels were verified by qRT-PCR, and the trends were consistent with the RNA-Seq data; however, the expression level of the lncRNAs was much lower than that of the protein-coding transcripts (Figure 8). The genes we chose in this study were derived from differentially expressed genes that also had an interaction network with several key genes, including INS (insulin), SST (somatostatin), and GCG (glucagon). FOS and RPS6KA6 are members of the MAPK signaling pathway, which is a vital pathway for cell differentiation and proliferation [29,30]. Tuning these genes may contribute to the transdifferentiation of cBMSCs into IPCs. Therefore, we chose five lncRNAs that were colocalized with these genes to verify their expression levels, and the results confirmed the profile data, although whether these lncRNAs have internal relationships needs to be explored in depth. GO analysis showed that. during induction binding, enzyme regulatory activity and signal transduction were the main activities. Because islets are constructed by endocrine cells, the Golgi complex remains highly dynamic and the enzymes involved vary [31,32]. KEGG analysis identified the MAPK signaling pathway, PI3K-Akt signaling pathway, insulin signaling pathway, cell cycle, and Rap1 signaling pathway, all of which are associated with pancreas development [33][34][35].
Our work demonstrates the characteristics, changes, and potential function of lncRNAs during the transdifferentiation of cBMSCs into IPCs. These findings expand the canine lncRNA database and could also serve as a reliable foundation for further research.

RNA Isolation and Qualification
According to the manufacturer's protocol, total RNA was extracted using TRIzol ® reagent (Invitrogen, USA), and genomic DNA was removed using DNase I (Takara, Japan). RNA purity was checked using a NanoPhotometer ® spectrophotometer (IMPLEN, CA, USA), and the RNA concentration was measured using a Qubit ® RNA Assay Kit in conjunction with a Qubit ® 2.0 Fluorometer (Life Technologies, CA, USA). RNA integrity was assessed using an RNA Nano 6000 Assay Kit in conjunction with a Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA).

Library Preparation
For RNA sample preparations, the input material was 3 µg of RNA per sample. First, the ribosomal RNA was removed by an Epicentre Ribo-zero™ rRNA Removal Kit (Epicentre, Brooklyn, NY, USA), and ethanol precipitation was performed to obtain rRNA-free RNA. Subsequently, the libraries were constructed by using an NEBNext ® Ultra™ Directional RNA Library Prep Kit for Illumina ® (NEB, Ipswich, MA, USA) according to the manufacturer's instruction. Then, we acquired RNA fragmentation by using NEBNext First Strand Synthesis Reaction Buffer (5×) under an increased temperature. The first-strand cDNA was subsequently synthesized using random hexamer primers; then, DNA polymerase I and RNase H were used for second-strand cDNA synthesis with dNTPs and dTTP substituted by dUTP in the reaction buffer. Simultaneously, exonuclease/polymerase activities turned all remaining overhangs into blunt ends. Then, the NEBNext adaptors with hairpin loop structures were linked to the 3 ends of cDNA fragments that were previously acetylated. In addition, cDNA fragments 150-200 bp in length were screened by an AMPure XP system (Beckman Coulter, Beverly, MA, USA). Before PCR, 3 µL USER Enzyme (NEB, USA) was added to the cDNA at 37 • C for 15 min and at 95 • C for 5 min. PCR was then performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. The PCR products were finally purified with an AMPure XP system and assessed by an Agilent Bioanalyzer 2100 system.

Sequencing
The TruSeq PE Cluster Kit v3-cBot-HS (Illumina) was used to cluster the index-encoded samples using a cBot cluster generation system. Following the cluster generation, sequencing was performed on an Illumina HiSeq 4000 platform, and 150 bp paired-end reads were generated.

Transcriptome Assembly
The mapped reads of each sample were assembled by StringTie (v1.3.1) [38] in accordance with the method in the reference. By using the high-quality assembler StringTie, we obtained potential transcripts with full-length and splice variants for each gene locus.

Identification of lncRNAs
We used CNCI (Coding-Non-Coding-Index) (v2) [39] with the default parameters. We also used the CPC (Coding Potential Calculator) (0.9-r2) [40] and the NCBI eukaryote protein database, with the e-value set to "1e-10" in our analysis. We also used Pfam Scan (v1.3) [41,42], with the following default parameters: -E 0.001 -domE 0.001. We performed multispecies genome sequence alignments and used PhyloCSF [43] with the default parameters. PHAST (v1.3) is a software package that contains many statistical programs and is mostly used for phylogenetic analyses [44], and phastCons is a conservation scoring and identification program for conserved elements. We used phyloFit to construct phylogenetic models of the conserved and nonconserved regions of species and then exported to phyloP with the model and hidden Markov model (HMM) transition parameters to perform the conservation analysis of the lncRNAs and protein coding genes.

Differential Expression Analysis
FPKM is calculated based on the length of the fragments and the read count mapped to these fragments. Cuffdiff (v2.1.1) was used to calculate the FPKMs of both the lncRNAs and coding genes in each sample. The FPKMs of the genes were computed by summing the FPKMs of the transcripts in each gene group. Transcripts with p adjust < 0.05 were considered differentially expressed.

GO and KEGG Enrichment Analyses
GO enrichment analysis of the differentially expressed genes or lncRNA target genes was implemented by the GOseq R package, and gene length bias was corrected [45]. GO terms with corrected p values less than 0.05 were considered significantly enriched differentially expressed genes. We used KOBAS software [46] to test the statistical enrichment of the differentially expressed genes or lncRNA target genes in the KEGG pathways.

Target Gene Prediction
In the cis mode, lncRNAs act on neighboring target genes. We searched the coding genes that were 10 k/100 k upstream and downstream of the lncRNAs and then analyzed their function. Regarding their trans role, lncRNAs must be identified by their expression level compared with that of other relative genes. Although there were no more than 25 samples, we used custom scripts to calculate the expression correlations between the lncRNAs and coding genes.

Gene Expression Validation by qRT-PCR
The first strand cDNA was obtained by using a PrimeScript TM First Strand cDNA Synthesis Kit (TaKaRa) with RNA (1 µg/mL). We selected five differentially expressed lncRNAs and their predicted target genes that may contribute to the transdifferentiation of cBMSCs (Table S1). All primers used are shown in Table S2. The GAPDH gene was used as an internal control. qRT-PCR was performed using a CFX Connect TM Real Time PCR Detection System. The reaction mixture was prepared with Maxima SYBR Green/ROX qPCR Master Mix (2×) (Thermo Scientific, USA). The relative expression levels were calculated by the 2 −∆∆Ct method.

Conclusions
To the best of our knowledge, this study is the first to focus on lncRNAs during the transdifferentiation of canine BMSCs, and lncRNAs expression changes were acquired and verified. In addition, an enrichment analysis of the targets of differentially expressed lncRNAs was performed, which could serve as a foundation for further mechanistic studies.
Supplementary Materials: Supplementary materials can be found at http://www.mdpi.com/1422-0067/21/15/ 5549/s1. Figure S1. The ratio of different types of transcripts. Table S1. A list of chosen genes and their relative lncRNAs. Table S2. The primers used for the qRT-PCR. Table S3. The clean and raw reads of sequencing data. Table S4. The match quality to genome of sequencing data. Table S5. The top 20 signaling pathways enriched by the colocalized genes of differentially expressed lncRNAs from pairwise comparison "B vs. S1": "Sample number" was the number of colocalized genes of differentially expressed lncRNAs relative with the signaling pathway. "Background number" was the number of genes in the signaling pathway. "UniGenes" were the predicted target genes of lncRNAs. Table S6. The top 20 signaling pathways enriched by the colocalized genes of differentially expressed lncRNAs from pairwise comparison "S1 vs. S2". Table S7. The top 20 signaling pathways enriched by the colocalized genes of differentially expressed lncRNAs from pairwise comparison "S2 vs. S3". Table S8. The top 20 signaling pathways enriched by the colocalized genes of differentially expressed lncRNAs from pairwise comparison "I vs. S3".
Funding: This work was supported by the National Natural Science Foundation of China (project number: 31872529) in the design of study, data collection, analysis, and writing of the manuscript.