mRNA and miRNA Expression Analysis Reveal the Regulation for Flower Spot Patterning in Phalaenopsis ‘Panda’

Phalaenopsis cultivar ‘Panda’ is a beautiful and valuable ornamental for its big flower and unique big spots on the petals and sepals. Although anthocyanins are known as the main pigments responsible for flower colors in Phalaenopsis, and the anthocyanins biosynthetic pathway in Phalaenopsis is generally well known, the detailed knowledge of anthocynins regulation within the spot and non-spot parts in ‘Panda’ flower is limited. In this study, transcriptome and small RNA libraries analysis from spot and non-spot sepal tissues of ‘Panda’ were performed, and we found PeMYB7, PeMYB11, and miR156g, miR858 is associated with the purple spot patterning in its sepals. Transcriptome analyses showed a total 674 differentially expressed genes (DEGs), with 424 downregulated and 250 upregulated (Non-spot-VS-Spot), and 10 candidate DEGs involved in anthocyanin biosynthetic pathway. The qPCR analysis confirmed that seven candidate structure genes (PeANS, PeF3′H, PeC4H, PeF3H, PeF3H1, Pe4CL2, and PeCHI) have significantly higher expressing levels in spot tissues than non-spot tissues. A total 1552 differentially expressed miRNAs (DEMs) were detected with 676 downregulated and 876 upregulated. However, microRNA data showed no DEMs targeting on anthocyanin biosynthesis structure gene, while a total 40 DEMs target transcription factor (TF) genes, which expressed significantly different level in spot via non-spot sepal, including 2 key MYB regulator genes. These results indicated that the lack of anthocyanidins in non-spot sepal may not directly be caused by microRNA suppressing anthocyanidin synthesis genes rather than the MYB genes. Our findings will help in understanding the role of miRNA molecular mechanisms in the spot formation pattern of Phalaenopsis, and would be useful to provide a reference to similar research in other species.


Introduction
Flower spots are heterochromatic dots or streaks with a specific texture and pattern appearing on the corollas of plants, which can affect the behavior of pollinators and the ornamental value of flowers [1][2][3][4]. Previous studies have confirmed the flower spot is caused by the accumulation of anthocyanins in a specific area of corollas. For example, peonidin-3-O-glucoside, malvidin-3-O-glucoside,

Anthocyanin Accumulation Patterns in Phalaenopsis 'Panda'
The examination by light microscope showed that anthocyanin accumulated in the upper epidermal cells of spot area, while no anthocyanin accumulated in cells in non-spot area ( Figure 1A-C). This result indicated that the visible spots resulted from accumulation of anthocyanin in cells of the upper epidermis. Furthermore, the anthocyanin content of the spot in Phalaenopsis 'panda' sepals was significantly different from that of the sepal according to the results by the UV-visible spectrophotometer scanning detection ( Figure 1D,E). Anthocyanin content measured by the pH differential method revealed that high content of anthocyanin (1.0798 mg/g) was accumulated in spot area, while was barely detectable in the acyanic parts extraction. Meanwhile, the model of flower spot formation pattern was predicted by the interactions of the structural genes, regulatory genes together with small RNA. This study created a joint research of transcriptome sequencing and small RNA sequencing to explain the spot pigmentation in Phalaenopsis. spp., and the results will be helpful for the breeding of new colorful cultivars of Phalaenopsis spp..

Anthocyanin Accumulation Patterns in Phalaenopsis 'Panda'
The examination by light microscope showed that anthocyanin accumulated in the upper epidermal cells of spot area, while no anthocyanin accumulated in cells in non-spot area ( Figure 1A-C). This result indicated that the visible spots resulted from accumulation of anthocyanin in cells of the upper epidermis. Furthermore, the anthocyanin content of the spot in Phalaenopsis 'panda' sepals was significantly different from that of the sepal according to the results by the UV-visible spectrophotometer scanning detection ( Figure 1D,E). Anthocyanin content measured by the pH differential method revealed that high content of anthocyanin (1.0798 mg/g) was accumulated in spot area, while was barely detectable in the acyanic parts extraction.

Construction of cDNA Library and Gene Mapping to the Reference Genomes
Four cDNA libraries were constructed using Illumina Hiseq 2000 platform (Illumina, San Diego, CA, USA) and 44,632,028,45,110,536,45,380,086,44,434,116 high-quality reads were obtained, respectively (Table S1 in supplememntary materials). The sequencing raw data has been deposited into the Short Reads Archive (SRA) database under the accession number SRP166213. These clean reads were mapped to reference genome of Phalaenopsis equestris and the average gene mapping ratio of each sample was 55.47%. We considered that the low homology between Phalaenopsis 'Panda' and Phalaenopsis equestris result to the low mapping ratio, but it has no effect on RNAseq quantitative analysis owing to the high clean reads quantity and enough sequencing data (Table S2).

Construction of cDNA Library and Gene Mapping to the Reference Genomes
Four cDNA libraries were constructed using Illumina Hiseq 2000 platform (Illumina, San Diego, CA, USA) and 44,632,028,45,110,536,45,380,086,44,434,116 high-quality reads were obtained, respectively (Table S1 in supplememntary materials). The sequencing raw data has been deposited into the Short Reads Archive (SRA) database under the accession number SRP166213. These clean reads were mapped to reference genome of Phalaenopsis equestris and the average gene mapping ratio of each sample was 55.47%. We considered that the low homology between Phalaenopsis 'Panda' and Phalaenopsis equestris result to the low mapping ratio, but it has no effect on RNAseq quantitative analysis owing to the high clean reads quantity and enough sequencing data (Table S2).

Functional Annotation and Classification
To annotate the gene with putative functions, the assembled genes were searched against the public databases of NR. Among them, 17,871 genes were annotated to the NR database. To further illustrate the main biological functions of the transcripts, GO (15,588 genes) and KEGG pathway (19,864 genes) analyses were performed (Table S3).

Identification of Differentially Expressed Genes and KEGG Enrichment Analysis of DEGs
A total of 674 DEGs were detected in pairwise comparison with 424 downregulated and 250 upregulated genes (Non-spot-VS-Spot) ( Figure 2). A total of 543 DEGs were mapped to all 106 pathways. Notably, the Flavonoid biosynthesis (ko00941, 12 DEGs of 105 gene with Q-value = 0.0005279107) and Phenylalanine metabolism (ko00360, 8 DEGs of 132 gene with Q-value = 1.900591 × 10 −1 ) were most significantly enriched in top20 pathways ( Figure 3, Table S4).

Functional Annotation and Classification
To annotate the gene with putative functions, the assembled genes were searched against the public databases of NR. Among them, 17,871 genes were annotated to the NR database. To further illustrate the main biological functions of the transcripts, GO (15,588 genes) and KEGG pathway (19,864 genes) analyses were performed (Table S3).

Identification of Differentially Expressed Genes and KEGG Enrichment Analysis of DEGs
A total of 674 DEGs were detected in pairwise comparison with 424 downregulated and 250 upregulated genes (Non-spot-VS-Spot) ( Figure 2). A total of 543 DEGs were mapped to all 106 pathways. Notably, the Flavonoid biosynthesis (ko00941, 12 DEGs of 105 gene with Q-value = 0.0005279107) and Phenylalanine metabolism (ko00360, 8 DEGs of 132 gene with Q-value = 1.900591 × 10 −1 ) were most significantly enriched in top20 pathways ( Figure 3, Table S4).   . The most enrichment pathway of DEGs (TOP20). X axis: enrichment factor; Y axis: pathway name; The color: the q-value (high: white, low: blue), the lower q-value indicates the more significant enrichment; Point size: DEG number (The bigger dots refer to larger amount); Rich Factor: the value of enrichment factor, which is the quotient of foreground value (the number of DEGs) and background value (total Gene amount), the larger the value, the more significant enrichment. Red arrows represent the pathways related directly to the anthocyanin biosynthesis.

DEGs in Anthocyanin Biosynthesis and MBW Genes
There were 10 DEGs in flavonoid biosynthesis which are directly related to spot pattern, , and most of them presented higher expression level in spot areas than non-spot areas (Table 1, Figure 4). The most enrichment pathway Figure 3. The most enrichment pathway of DEGs (TOP20). X axis: enrichment factor; Y axis: pathway name; The color: the q-value (high: white, low: blue), the lower q-value indicates the more significant enrichment; Point size: DEG number (The bigger dots refer to larger amount); Rich Factor: the value of enrichment factor, which is the quotient of foreground value (the number of DEGs) and background value (total Gene amount), the larger the value, the more significant enrichment. Red arrows represent the pathways related directly to the anthocyanin biosynthesis.
A total of 1552 DEMs were detected in pairwise comparison with 676 downregulated and 876 upregulated (Non-spot-VS-Spot) ( Figure 5). miRNA target gene prediction resulted in 1396 common unigenes between TargetFinder (http://http://targetfinder.org/) and psRobot (http://http://omicslab.genetics.ac.cn/psRobot/), 1648 by TargetFinder and 7652 by psRobot, and totally 1307 target unigenes were aligned to KEGG database. However, there were no DEMs were found targeting on anthocyanin biosynthesis structure gene. As for the regulator genes, we found totally 12 microRNAs targeting on 10 significant different expressed MYB, bHLH or WRKY unigenes (Table 3). However, only PeMYB7 (PEQU_03393), PeMYB11 (PEQU_10361, PEQU_10362) were significantly upregulated in these TF families according to the transcriptomes data, and these microRNAs belonged to miR156 or miR858 families.  (Table 3). However, only PeMYB7 (PEQU_03393), PeMYB11 (PEQU_10361, PEQU_10362) were significantly upregulated in these TF families according to the transcriptomes data, and these microRNAs belonged to miR156 or miR858 families.    The expression levels of 10 anthocyanin genes and 4 transcriptor genes between the non-spot and spot areas were verified by qPCR. The results showed the genes Pe4CL2, PeANS, PeF3H, PeF3H1, PeF3 H and PeMYB7, PeMYB11 presented significantly higher expression levels in spot areas than the non-spot areas (Figure 6), which was generally consistent with the results of the transcriptome data (Table 1).  The expression of microRNA of RNA miR858, miR156g were also analyzed by qPCR between the non-spot and spot areas (Figure 7). The results showed that they presented higher expression levels in non-spot areas, which means their targeting genes, PeMYB7 and PeMYB11, having less expression levels in the non-spot areas. The expression of microRNA of RNA miR858, miR156g were also analyzed by qPCR between the non-spot and spot areas (Figure 7). The results showed that they presented higher expression levels in non-spot areas, which means their targeting genes, PeMYB7 and PeMYB11, having less expression levels in the non-spot areas.

Low Expression of Anthocyanin Genes Causing the Lack of Pigments in Non-Spot Areas
In this study, we found anthocyanin accumulated in the upper epidermal cells in spot area, while no anthocyanin accumulated in cells in non-spot areas of 'Panda' (Figure 1). Transcriptome data and qPCR indicated 7 anthocyanin genes were low expression in the non-spot areas, especially for the very different expression levels of 4CL2, F3'H, F3H and F3H1 (Table 2, Figure 6). In Phalaenopsis 'Everspring Fairy', which also has purple spot on the white petal and sepal, DFR is the main gene which differently expressed between the purple and white part of the petals and sepals [12]. However, in Phalaenopsis 'Panda' expression of DFR in spot tissue was not significantly changed in comparison to non-spot area. These results showed that the same phenotype may be caused by different genes in the pigmentation in Phalaenopsis.

PeMYB7 and PeMYB11 Are Important Genes in Spot Formation
MYBs have been found to be very important in the floral pigmentation patterning in Phalaenopsis spp. In the sepals/petals, silencing of PeMYB2, PeMYB11, and PeMYB12 resulted in the loss of the full-red pigmentation, red spots, and venation patterns, respectively; PeMYB11 was responsive to the red spots in the callus of the lip, and PeMYB12 participated in the full pigmentation in the central lobe of the lip [21]. In P. amabilis and P. schilleriana, anthocyanin-specific Myc and Wd were expressed, however, Myb specific for anthocyanin biosynthesis were undetectable in P. amabilis; in Phalaenopsis 'Everspring Fairy' petals and sepals, high levels of anthocyanin-specific Myb transcripts were present in the purple, but not in the white sectors [12]. Hsu, et al. [29] verified PeMYB11 as the major regulatory R2R3-MYB transcription factor for regulating the production of the black color, and a retrotransposon, named Harlequin Orchid RetroTransposon 1 (HORT1), resulted in strong expression of PeMYB11 and thus extremely high accumulation of anthocyanins in the harlequin flowers of the P. Yushan Little Pearl variety.
In our study, PeMYB7 and PeMYB11 expressed significantly different between the spot and non-spot areas, while PeMYB2 and PeMYB12 had not different expression levels (Table 2 and Figure 6). This result indicated that the gene of PeMYB11 may also play an important role in spot pigmentation in Phalaenopsis 'Panda', which was similar to these previous researches [12,21,29,30]. PeMYB7 had different expression levels between the spot and non-spot areas of sepals in this research, which indicated this gene may also associate with purple spot patterning in Phalaenopsis 'Panda'. The function of PeMYB7 is not very clear presently, however, in the phylogenetic tree inferred from MYB genes of Phalaenopsis equestris and Oryza sativa ( Figure 8A), PeMYB7 was in the same clad of PeMYB2 and PeMYB8, suggesting PeMYB7 may have similar function as PeMYB2 which can activate anthocyanin synthesis [30].

miR156g, miR858 Silence PeMYB7, and PeMYB11
MiRNAs can influence tissue anthocyanin formation in previous studies. In petunia, the sequence-specific degradation of CHS RNA is the primary cause of the formation of white sectors in 'Red Star' flowers [23]. In potato (Solanum tuberosum L.), small RNAs of miR828 and TAS4 D4(−) can decrease the expression levels of MYB12 and R2R3-MYB genes in purple skin and flesh, which caused to anthocyanin accumulation [31]. In our research, miR156 and miR858 were predicted as the main interference RNAs for PeMYB7 and PeMYB11 (Table 3). In order to confirm PeMYB7 and PeMYB11 are the direct targets of the small RNAs, we download the complete cDNAs of PeMYB7 (GenBank: KF769472.1) and PeMYB11 (GenBank: MH675649.1), and analyzed the target sites of miR156 and miR858 in these two genes ( Figure 8B). This finding confirms that miR156g and miR858 can silence PeMYB7 and PeMYB11. In Arabidopsis thaliana, miR156 negatively regulates the anthocyanin accumulation by regulating the expression level of target gene MYB113 indirectly [32]. High miR156 level promotes anthocyanin biosynthesis through the negative regulation of SPL9 gene because SPL9 can destabilize the MYB-bHLH-WD40 transcriptional activation complex. However, in this study, SPL9 gene in Phalaenopsis doesn't express differently, which indicated that mtr-miR156g-3p may be able to directly silence PeMYB7 and caused the anthocyanin lacking on the white sector.
The biological functions of miR858 has not been fully explored, however, small RNA sequencing in Arabidopsis thaliana [24] and Vitis vinifera L. [33] verified that its target gene is MYB12, which is a positive transcript factor for the anthocyanin synthesis. Chitwood et al. transferred the constructed binary vector into wild-type tomato plants to reduced MIR858 expression level, and found the expression levels of SLY-Myb12 in transgenic plants were lower than wild type, with upregulations of the related structure gene PAL, DFR, ANS and CHS involved in anthocyanin biosynthesis, and increasing of anthocyanin content [34]. Ballester et al. [35] used VIGS technology silenced MYB12 and then found the tomato fruits turned pink. In our study the targeted genes of miR858 were predicted to be both MYB12 and MYB11 (Table 3), however, the unigene of PeMYB12 didn't show significantly different expression levels between the spot and non-spot areas, suggesting that miR858 explored higher effect on the unigene of PeMYB11 in 'Panda'.

A Proposed Modelsummarizing of Spot Formation Pattern in Phalaenopsis 'Panda'
Our research confirmed Pe4CL2, PeF3H, PeF3′H, and PeANS expression were low or repressed in non-spot areas in 'Panda' sepals, and MIR156g and MIR858 may silence the expression of PeMYB11 and PeMYB7 in non-spots parts. In conclusion, we illustrated a proposed molecular mechanism in In Arabidopsis thaliana, miR156 negatively regulates the anthocyanin accumulation by regulating the expression level of target gene MYB113 indirectly [32]. High miR156 level promotes anthocyanin biosynthesis through the negative regulation of SPL9 gene because SPL9 can destabilize the MYB-bHLH-WD40 transcriptional activation complex. However, in this study, SPL9 gene in Phalaenopsis doesn't express differently, which indicated that mtr-miR156g-3p may be able to directly silence PeMYB7 and caused the anthocyanin lacking on the white sector.
The biological functions of miR858 has not been fully explored, however, small RNA sequencing in Arabidopsis thaliana [24] and Vitis vinifera L. [33] verified that its target gene is MYB12, which is a positive transcript factor for the anthocyanin synthesis. Chitwood et al. transferred the constructed binary vector into wild-type tomato plants to reduced MIR858 expression level, and found the expression levels of SLY-Myb12 in transgenic plants were lower than wild type, with upregulations of the related structure gene PAL, DFR, ANS and CHS involved in anthocyanin biosynthesis, and increasing of anthocyanin content [34]. Ballester et al. [35] used VIGS technology silenced MYB12 and then found the tomato fruits turned pink. In our study the targeted genes of miR858 were predicted to be both MYB12 and MYB11 (Table 3), however, the unigene of PeMYB12 didn't show significantly different expression levels between the spot and non-spot areas, suggesting that miR858 explored higher effect on the unigene of PeMYB11 in 'Panda'.

A Proposed Modelsummarizing of Spot Formation Pattern in Phalaenopsis 'Panda'
Our research confirmed Pe4CL2, PeF3H, PeF3 H, and PeANS expression were low or repressed in non-spot areas in 'Panda' sepals, and MIR156g and MIR858 may silence the expression of PeMYB11 and PeMYB7 in non-spots parts. In conclusion, we illustrated a proposed molecular mechanism in spots formation in 'Panda' (Figure 9). The diagram revealed the unique spot formation in 'panda' may result from the higher expression of MIR156g and MIR858 clusters in non-spot tissue, which targeted and suppressed the expression of key regulate genes PeMYB7, PeMYB11 and then caused very low expression of Pe4CL2, PeF3H, PeF3 H, and PeANS in non-spot tissue. These results finally led to the lack of synthesis and accumulation of anthocyanin in non-spot area.

Plant Materials
Phalaenopsis'Panda' (Figure 10) were grown at the horticultural farm of Hainan University (Latitude: 20.03N, longitude: 110.33E), Haikou, Hainan Province, China. The sepal tissues were divided into two parts, spots area and non-spots area for total anthocyanin analyses and totally RNA extraction. Three independent biological replicates were collected for each sample. All samples were immediately frozen in liquid nitrogen and stored at −80 °C until use.

Plant Materials
Phalaenopsis 'Panda' (Figure 10) were grown at the horticultural farm of Hainan University (Latitude: 20.03N, longitude: 110.33E), Haikou, Hainan Province, China. The sepal tissues were divided into two parts, spots area and non-spots area for total anthocyanin analyses and totally RNA extraction. Three independent biological replicates were collected for each sample. All samples were immediately frozen in liquid nitrogen and stored at −80 • C until use.

Plant Materials
Phalaenopsis'Panda' (Figure 10) were grown at the horticultural farm of Hainan University (Latitude: 20.03N, longitude: 110.33E), Haikou, Hainan Province, China. The sepal tissues were divided into two parts, spots area and non-spots area for total anthocyanin analyses and totally RNA extraction. Three independent biological replicates were collected for each sample. All samples were immediately frozen in liquid nitrogen and stored at −80 °C until use.

Observations of Sepal Anatomy and Determination of Total Anthocyanin Content
Spot and non-spot sepal tissue of full bud were cut into segments longitudinally (approx.10 mm × 5 mm), and the upper epidermis and dorsal epidermis were separated with forceps. The sections were observed and photographed with a light microscope (Primo Vert, Zeiss, Germany).
The total anthocyanin content of Phalaenopsis sepal in spot and non-spot areas were determined by the pH differential method [36].

RNA Extraction, cDNA Library Construction, and mRNA Sequencing
RNAs were extracted from spot and non-spot parts of sepals of full bud using the modified Trizol method [6]. Two biological replicates were used for spot sepal and non-spot sepal. A total of 4 RNA-seq libraries were constructed from these sepals. RNA-Seq libraries were constructed using the RNA Library Prep Kit for Illumina according to the manufacturer's instructions (NEB, Ipswich, MA, USA). Library quality was assessed on the Agilent Bioanalyzer 2100 system. The mRNA libraries were sequenced on the Illumina HiSeq 2000 platform (Illumina, San Diego, CA, USA) based on sequencing by synthesis with 150 bp paired-end reads (Biomarker Technologies, Wuhan, China).

mRNA Transcriptome Data Analysis
The reads with adaptors were removed firstly. The reads in which unknown bases comprised more than 5% of the total and low quality reads (the percentage of the low quality bases of quality value ≤ 15 is more than 20% in a read) were also removed. The clean reads were aligned to Phalaenopsis equestris genome accessed from (https://www.ncbi.nlm.nih.gov/bioproject/192198) by HISAT2 (http://www.ccb.jhu.edu/software/hisat/index.shtml) [37].

qRT-PCR Analyses of mRNA
10 DEGs in anthocyanin biosynthesis and 4 DEGs of transcription factor were chosen for qRT-PCR analyses. qRT-PCR analyses were performed using SYBR Premix Ex TaqTM II (Tli RNaseH Plus) (Takara, Dalian, China) according to the manufacturer's instructions with the following reaction conditions: denaturation at 95 • C for 30 s and 40 cycles of amplification (95 • C for 5 s, 60 • C for 30 s). The β-Actin gene was used as an internal control for normalization. Relative expression levels of target genes were calculated using the 2 −∆∆Ct [40] against the internal control. The gene-specific primers are shown in Table S7. Experiments were performed with three independent biological replicates and three technical replicates.

Small RNA Library Construction and Sequencing
RNA was extracted from spot and non-spot sepal using the modified Trizol method [6], and 4 small RNA libraries from sepal spot and non-spot samples with 2 biological replicates were constructed and sequenced by SBQ500 sequencing method (BGI, Wuhan, China). Then the low-quality, 5 primer contaminants, without 3 primers and insert tags, and sequences fewer than 18 nucleotides (nt) were filtered out to obtain clean reads from raw data reads. The final clean reads of the 4 libraries were mapped to Phalaenopsis genome and other smallRNA databases using Bowtie2 [41]. All the remaining clean sequences were annotated into different classes to remove rRNA, scRNA, snoRNA, snRNA, and tRNA using miRBase, siRNA, piRNA, snoRNA database with Bowtie2 [42] and Rfam database with cmsearch [43]. The novel miRNAs were predicted by Mireap software (https://sourceforge.net/ projects/mireap/).

Data Analysis of Small RNA Sequencing
The expression level of miRNAs was compared between spot and non-spot tissues to identify differentially expressed miRNAs (DEMs). The differentially expressed miRNA between spot and non-spot tissues were identified and filtered with the R package DESeq2 and Poisson Distribution [38,44]. The DEMs were determined based on FDR ≤ 0.001 and the absolute value of |Log2FC| ≥ 1. The heatmap displays of the Trimmed Mean of M-values (TMM) normalized against the Fragment Per Kilobase of transcripts per Million mapped reads (FPKM) were performed created using the R package pheatmap (https://cran.r-project.org/src/contrib/Archive/pheatmap/). The potential target genes by differentially expressed miRNAs were predicted and analyzed with 2 different software, including psRobot [45] and Targetfinder [46]. In order to increase the level of confidence and get more reliable results, we selected only those binding sites that were predicted by both of two softwares.
All protein-coding target genes regulated by DEMs were annotated against the KEGG databases. The KEGG enrichment analysis for the target genes of the DEMs was performed by Fisher's exact test (P-values) in Blast2GO pipeline, and P-values were used to conduct multiple test correction by FDR. KEGG terms with FDR< 0.05 were considered to be significantly enriched.

Verification of miRNA Expression Levels by qPCR
Two miRNA targets on key regulate gene transcription factor PeMYB11 and PeMYB7 were chosen for qRT-PCR analyses. The qRT-PCR analyses were performed using SYBR Premix Ex TaqTM II (Tli RNaseH Plus) (Takara, Dalian, China) according to the manufacturer's instructions with the following reaction conditions: Denaturation at 95 • C for 30 s and 40 cycles of amplification (95 • C for 5 s, 60 • C for 30 s). The U6 gene was used as an internal control for normalization. Relative expression levels of target genes were calculated using the 2 −∆∆Ct [40] against the internal control. The specific stem-ring primers were designed according to the miRNA qPCR primer design method [47], showed in Table S8. Experiments were performed with three independent biological replicates and three technical replicates.