miRNA Profiling and Its Role in Multi-Omics Regulatory Networks Connected with Somaclonal Variation in Cucumber (Cucumis sativus L.)

The role of miRNAs in connection with the phenomenon of somaclonal variation, which occurs during plant in vitro culture, remains uncertain. This study aims to investigate the possible role of miRNAs in multi-omics regulatory pathways in cucumber somaclonal lines. For this purpose, we performed sRNA sequencing (sRNA-seq) from cucumber fruit samples identified 8, 10 and 44 miRNAs that are differentially expressed between somaclones (S1, S2, S3 lines) and the reference B10 line of Cucumis sativus. For miRNA identification, we use ShortStack software designed to filter miRNAs from sRNAs according to specific program criteria. The identification of predicted in-silico targets revealed 2,886 mRNAs encoded by 644 genes. The functional annotation of miRNA’s target genes and gene ontology classification revealed their association with metabolic processes, response to stress, multicellular organism development, biosynthetic process and catalytic activity. We checked with bioinformatic analyses for possible interactions at the level of target proteins, differentially expressed genes (DEGs) and genes affected by genomic polymorphisms. We assume that miRNAs can indirectly influence molecular networks and play a role in many different regulatory pathways, leading to somaclonal variation. This regulation is supposed to occur through the process of the target gene cleavage or translation inhibition, which in turn affects the proteome, as we have shown in the example of molecular networks. This is a new approach combining levels from DNA-seq through mRNA-seq, sRNA-seq and in silico PPI in the area of plants’ somaclonal variation.


Introduction
For over forty years, it has been reported that the genetic variation in regenerated plants can be induced de novo as a result of the tissue culture environment [1]. However, earlier work indicates that in vitro tissue cultures are susceptible to mutations [2], and the occurrence of many phenotypic changes were later described [3,4]. This variability was defined as somaclonal variation and is defined as the genetic variation observed among the progeny of plants regenerated from somatic cells cultured in vitro [1]. Acquiring plants through tissue cultures is associated with the appearance of changes of a different nature. One of them is genetic variation, manifested by the presence of an increased frequency of point mutations [5], chromosome instability, structural changes and aneuploidy [6,7] and the activation of mobile elements [8]. Moreover, the coexistence of genetic and epigenetic changes resulting from in vitro culture was detected in regenerated crops, e.g., in rye [9], barley [10] and wheat [11]. Somaclonal variation is thought to be a negative phenomenon, for example introducing deleterious traits to the resulting plants. It is still not possible to predict the results of in vitro regeneration. It is random, and in many cases, there is no repeatability, and a large number of genetic changes are based on point mutations or chromosome rearrangements that occur in the segregation of the R1 generation [12,13]. On the other hand, somaclonal variation in in vitro regeneration can be a positive force in inducing a new variety and a source of useful new traits, while being cheaper than conventional breeding. Many agronomic traits that resulted from somaclonal variation have been exploited in crop breeding [9,14]. The phenomenon of somaclonal variation is being investigated, and new aspects of analysis are emerging, such as the role of miRNA in this process. Until now, miRNAs have been studied more in the context of in vitro cultures rather than the phenomenon of variability. In vitro culture conditions such as culture medium, photoperiod or phytohormone ratio can be adjusted to promote somatic embryogenesis or organogenesis from induced callus [15]. The molecular pathway of shoot regeneration primarily includes phytohormones and genes related to shoot apical meristem [16]. The expression of these genes appears to be regulated at transcriptional and post-transcriptional levels. Many relevant transcripts are overseen by microRNAs (miRNAs) that have been shown to act as signals limiting gene expression in specific plant tissues [17][18][19][20]. Another example would be the study concerned with regulatory miRNA analysis in Taxus cells. These miRNAs regulate a series of genes involved in various pathways, including transcription factors, methyltransferases and functional enzyme genes. Genes targeted by miRNAs were correlated with pathways such as oxidative phosphorylation, RNA polymerase, purine and pyrimidine metabolism, plant hormone and signal transduction. These results indicate that miRNAs are one of the essential regulators of primary metabolism during long-term subculture [21]. Further results suggest that increasing the biosynthesis of taxol and other secondary metabolites is an active regulatory measure of calli to adapt to the heterotrophic culture, and this alteration mainly involved direct and indirect miRNA-induced transcriptional reprogramming. These results expand the understanding of the relationships among the metabolism of biological substances, the biosynthesis of secondary metabolites and defense systems [22].
Moreover, miRNA genes, as well as their biogenesis, are susceptible to aberrant regulation in vitro [23]. An example of such regulation was described for micropropagated strawberries; during in vitro culture, the miR156 was up-regulated, whereas other miRNAs (miR164, miR172 and miR390) were down-regulated [24]. Differentially regulated miRNAs are involved in many fundamental biological processes. Thus, under in vitro conditions, various miRNAs even from the same family may differ in their expression.
With the availability of next generation high-throughput sequencing (NGS) technology, thousands of miRNAs have been identified across the plant kingdom [25,26]. Many of them are conserved among plant species, whereas others are specific to certain species or even plant lines [27,28]. Conserved miRNAs do not necessarily show the same expression or pattern in different species or even at different stages of development within a species [27,29].
Plant miRNAs are typically 20-24 nt and act on targets through post-transcriptional gene silencing (PTGS), either by transcript cleavage or by translational repression [30]. The miRNAs are by far the best characterized class of sRNAs. They are involved in a series of biological processes, such as growth and development [31,32], hormone signal transduction [22,33] and response to abiotic and biotic stresses [34,35].
For cucumbers, miRNAs from leaf, root, stem and phloem exudates were analyzed in three independent high-throughput sequencing studies [36][37][38]. They identified 31, 29 and 25 known and 49, 2 and 7 novel families of miRNAs in cucumber plants. Recently, the regulatory roles of miRNAs and their targets, which are modules in cucumber fruit expansion, described the identification of 1253 known and 1269 new miRNAs from nine cucumber small RNA (sRNA) libraries by high-throughput sequencing [39].
In this study, we used three somaclonal lines (S1, S2 and S3) that differed phenotypically from the cultivar 'Borszczagowski B10' (the progenitor line) and were obtained by different regeneration methods. The S1 line showed a mosaic phenotype (the combination of small yellow and irregular large silvery spots on the leaves). The S2 line showed an altered fruits phenotype, which is light green, glossy, without a waxy coat and lacking typical wards and netting. The S3 line produced shoot apices yellow-green in color. The phenotypic differences in the somaclonal lines are predominantly associated with the constituent factors and explants from which the culture was initiated: direct leaf regeneration, leaf callus regeneration and embryogenic suspension culture (see details in the Materials and Methods section).
In an earlier work, we attempted to explore the variability between the genomes [40] and transcriptomes profile [41]. We suggested that differential gene expression may be caused by a polymorphism in the genic region and may also be a result of interaction among molecular networks [41]. We hypothesize that, between the somaclonal lines and the reference B10 line, there are miRNAs with differential expression, and our aim was to find the specific differentially regulated miRNAs and their targets and to indicate significant biological processes in somaclonal variations. We performed sequencing of sRNA profiles and pointed out differentially regulated miRNAs and their targets. After functional annotation and gene ontology classification of targets, the most abundant terms were described. Further, we checked the molecular interaction of target proteins with proteins encoded by genes with polymorphism, and proteins encoded by genes with differential expression with genes. In summary, we identified differential miRNAs and targets specific to somaclonal lines and also common to all lines. This is interesting, given that the somaclonal lines came/originated from different experiments. Additionally, we created molecular networks combining omic levels from DNA-seq through mRNAseq, sRNA-seq and in silico protein-protein interaction (PPI). Such molecular network imaging is a new approach, especially for plants, and we are the first to show this for somaclonal variation.

Overall Scheme of the Analysis
The analysis performed in the experiment included several consecutive steps, starting with the analysis of the raw sequencing data. A general scheme showing the successive steps is shown in Figure 1.

Overview of sRNA Sequencing
sRNAs can influence complex molecular networks and certainly play a key role in a variety of regulatory pathways involved in many plant developmental processes. We performed sRNA sequencing from small fruits (two weeks after pollination) from three cucumber somaclonal lines (S1, S2 and S3) and a control B10 line. A total of 12 sRNA libraries were constructed (three repetitions per sample). A total number of reads after high-throughput sequencing ranged between 52 and 65 million, with an average Phred quality of 37 (Table 1). The 18-25 nt reads were annotated and the length distribution was analyzed ( Figure 1). More than 80% of the total redundant sRNA were between 21 and 24 nt in length, which is the most typical length of sRNA in plants. Fragments of 24 nt were most abundant in each cucumber fruit library in all sRNA profiles analyzed along with redundant (S1-47.48%; S2-48.97%; S3-39.48% and B10-45.53%) and non-redundant (S1-44.38%; S2-47.49%; S3-43.07% and B10-47.82%) pools. In all analyzed lines, the distribution of the number and length of the reads were similar (Figure 2). Principal component analysis divided sRNA samples based on their similarity into separate clusters ( Figure 3).

miRNA Identification and Differential Expression
The first step in the analysis was to identify miRNA sequences among the sRNA sequencing results. For this purpose, ShortStack software [17] was used. This tool allowed us to create a list of potential sequences that could be classified as miRNAs to Y and N15 status (Table S1).
The comparison of the miRNA expression profiles of the S1, S2 and S3 lines with the control B10 line revealed differences in miRNA expression ( Table 2) with parameters as an adjusted p-value cutoff 0.05 and a Log 2 FC (Fold Change) ≥ 1.0 ( Figure 4). There were eight miRNAs in S1, one of which was down-regulated and seven of which were up-regulated. In S2, there were 10 miRNAs, five of which were down-regulated and five of which were up-regulated. The highest number of variable expressions of the miRNAs was observed in the S3 line, and it was 44 miRNAs, out of which 25 were down-regulated and 19 were up-regulated ( Figure 5, Table S1). The miRNAs were aligned to the miR-Base with Cucumis sativus, Cucumis melo, Arabidopsis thaliana, Arabidopsis lyrata, Brassica napus, Brassica rapa, Medicago truncatula, Lotus japonicus and other plant species. The annotation of differential miRNAs showed that, of all miRNAs, only 14 had database equivalents, and the rest of the miRNAs were identified as new (cst-novel-miR) (Table S1). In the S1 line, only one up-regulated miRNA (ath-miR393a-3p) was similar to those in the databases, and in the S2 line, one was down-regulated (cme-miR169l), and four were up-regulated (cme-miR167c, cme-miR394a, cme-miR171e, ath-miR393a-3p). In the S3 line, the two known miRNAs were down-regulated (cme-miR169l, cme-miR156b), and six were up-regulated (cme-miR164b, cme-miR166b, cme-miR396e, cme-miR390d, cme-miR169f, ath-miR393a-3p). Primary transcripts (pri-novel-miRNAs) form complicated, stable secondary structures, from which pre-miRNA and finally mature miRNAs are cleaved ( Figure 6 and Supplementary Figure S1). Experimental confirmation of the presence of the five miR-NAs was carried out by a two-step RT-qPCR method: the first step was two-tailed RT with target-specific primers, and the second step was a standard qPCR reaction ( Figure 7, Table S2). The cst-novel-miR5.2 and ath-miR393a-3p were significantly down-regulated in the S1 line, but no changes were observed in the S2 and S3 lines. A significant decrease in cst-novel-miR274 expression was observed in all somaclonal lines. The cst-novel-miR414 presented a lower expression in the S1 and S2 lines and a higher expression in the S3 line, but it was not statistically significant. The cst-novel-miR444.3 expression level was significantly higher in the S2 and S3 lines. cst-novel-miR153 −5.474

Identification of Target Genes
A total of 2886 mRNAs (including isoforms) encoded by 644 genes were identified as predicted in-silico miRNA targets, including 689 targets coded by 113 genes targeted by known miRNAs, and 2284 targets coded by 564 genes targeted by novel miRNAs. In total, there were 560 mRNAs (244 genes), 283 mRNAs (40 genes) and 2400 mRNAs (408 genes) for the S1, S2 and S3 lines, respectively. The identified genes were targets to one known and six novel miRNAs in the S1 line, five known and three novel miRNAs in the S2 line, and eight known and 35 novel miRNAs in the S3 line. The annotation of the targets is listed in Table S1. Ten genes detected as targets for miRNAs were common to all three lines, and all of them were targets to ath-miR393a-3p ( Figure 8, Table 3). The degradome PAREsnip analysis confirmed 173 targets in S1, 95 targets in S2 and 227 targets in S3. The same analysis confirmed one miRNA in S1, four miRNAs in S2 and 13 miRNAs in S3 (Table S3).

Figure 8.
Venn diagram of targeted genes (with maximum expectation value ≤ 3) in three cucumber somaclonal lines (S1, S2 and S3). Table 3. Description of 10 target genes that were common to all three somaclonal lines and their miRNAs. C -regulation of gene expression by cleavage.

Target
Biotype Description miRNA S1 S2 S3 protein_coding peroxisome biogenesis protein ath-miR393a-3p C G15867 protein_coding peroxisome biogenesis protein ath-miR393a-3p C G16626 protein_coding chaperone protein ath-miR393a-3p C G2235 protein_coding OTU domain-containing protein ath-miR393a-3p C G2374 protein_coding Putative methyltransferase ath-miR393a-3p C G2520 protein_coding CAS1 domain-containing protein ath-miR393a-3p C G3656 protein_coding protein_coding GYF domain-containing protein ath-miR393a-3p C All detected differentially regulated miRNAs had targets. The miRNA target genes were classified by gene ontology into three categories: biological process (BP), metabolic function (MF) and cellular component (CC) (Figure 9, Table S4). There were 44 genes, 38 genes and 381 genes in the S1 line, the S2 line and the S3 line, respectively. Of the genes, 79-80% were annotated with at least one GO term. Figure 9 shows the five most abundant terms in the biological process, molecular function and cellular component categories. A detailed analysis of the GO assignment is presented in Table S4. In the biological process category, the most abundant terms were common to all of the somaclonal lines, and those were: cellular process (29-35% of genes), metabolic process (21-30%) and regulation of the biological process (15-29%). Many genes were also classified to response to stimulus (5-10%) and localization (5-8%) terms. In the S1 and the S3 line, many genes were assigned with multicellular and developmental process terms. In the cellular component category, there were also similar terms in all three lines. The membrane and its internal component (26-31% of genes), intracellular structure (16-20% of genes), organelles (16-18%) and cytoplasm (9-13%) were the most abundant terms. The most common terms in the molecular function category were binding (42-46% of genes) and catalytic activity (35-40%). Other highly prevalent terms were ATP-dependent activity (4-6%) and transcription regulator activity (2-5%). In the S3 line, a large set of genes was classified with the molecular function regulator term (7%). . Percentage of GO terms assigned to target genes in three cucumber somaclonal lines (S1, S2 and S3). BP-biological process, MF-molecular function, CC-cellular compartment.
The 10 target genes common to the three lines are responsible for cellular component organization, response to stress, and catabolic and carbohydrate metabolic processes. They show catalytic and hydrolase activity and binding of the nucleic acid and proteins. According to enrichment, the common target genes are related to membrane, cell wall, chloroplast and peroxisome ( Figure 10). Figure 10. Percentage of GO terms assigned to 10 target genes that are common to all three cucumber somaclonal lines (S1, S2 and S3). BP-biological process, MF-molecular function, CC-cellular compartment.
We created a genomic map and marked the chromosome localization of differentially expressed miRNA genes and their targets for each somaclonal line separately. Not all of the miRNAs are shown on the map, as not all of the reference genome contigs are assigned to the chromosomes. On the presented map, 50% of the miRNAs from the S1 line (Figure 11b), 92% of the miRNAs from the S2 line ( Figure 11d) and 81% of the miRNAs from the S3 line (Figure 11f) are shown. An analysis of the target gene distribution on cucumber chromosomes (Figure 11a,c,d) showed that they are distributed evenly on seven chromosomes. The position of target genes and the position of the miRNA genes that regulate their expression seem to not be related.

Target Expression Profiles Based on Transcriptome Data
To profile the expression of targets, transcriptome sequencing was performed using the same samples that were used for miRNA sequencing. The data expression level of targets was extracted from the transcriptome data [41].
Transcripts were considered differentially expressed in somaclonal lines relative to the control line B10 when the log 2 Fold Change was >0.5 and the adjusted p-value was >0.05. A total of 40 transcripts, which are targets for miRNAs, are differentially expressed in the three somaclonal lines. Among them, four targets of two novel miRNAs and one known miRNA were in the S1 line. There were three up-regulated targets and one down-regulated target (Table 4). In the S2 line, there was one up-regulated transcript targeted by one known miRNA (Table 4). In the S3 line, there were 35 differentially expressed transcripts targeted by 12 miRNAs, among which seven were novel and five were known miRNAs (Table 4). Sixteen targets were down-regulated, and 10 targets were up-regulated. In silico predicted inhibition: C Target regulation by cleavage, T Target regulation by translation inhibition.
We also performed relative expression analysis for seven genes that are targeted by ath-miR393a-3p (G1085 and G3656) and cst-novel-miR5.2 (G8545 and G9069), cst-novel-miR242 (G6795), cst-novel-miR9 (G18264), cst-novel-miR19 (G3861) and cme-MIR394a (G6278). The expression of the G1085 gene was significantly lower in the S1 and the S2 lines, whereas G3656 gene expression was 2.7 and 3.2 times higher in lines S2 and S3, respectively, than in the control line. However, the other target gene of cst-novel-miR5.2, G9069, was up-regulated in the S1 line and down-regulated in the S2 and S3 lines, and all changes are significantly important ( Figure 12, Table S2). The expression of the G6795 gene was significantly lower in the S1 line. The expression of the G18264 gene was also significantly lower in the S1 line, but 4.3 times higher than the control in the S3 line. G3861 gene expression was 11.3 times higher in the S3 line than in the control B10 line.
The relative expression level of the G6278 gene was also significantly higher in the S3 line ( Figure 12, Table S2). The expression results obtained by qPCR were in accordance with results obtained previously by RNA-seq analysis (Table S2).

Modeling of Interaction Regulatory Networks
We constructed a molecular network consisting of three types of protein data. The first group included in the network of protein-protein interaction (PPI) were proteins coded by target genes for miRNA, the second type were proteins coded by differentially regulated genes described in detail in Pawełkowicz et al. [41], and the third set were proteins coded by genes (pointed by genome comparisons of the somaclonal lines with the B10 genome) in which polymorphisms of high importance in the context of the structure of the emerging protein have occurred [40]. The protein interaction networks were constructed with Cytoscape with the STRING application. As input, we used a total of 581, 491 and 640 identifiers for the S1, S2 and S3 lines, respectively. These included 44 miRNA targets, 418 differentially expressed genes (DEGs) and 119 genes affected by single nucleotide variants (G_SNVs) for the the S1 line; 36 targets, 365 DEGs and 90 G_SNVs for the S2 line; and 331 targets, 198 DEGs and 111 G_SNVs for the S3 line (Table S5).
As a result, for the S1 line, all 535 identifiers were matched in the STRING database, and 398 proteins were included in the main network, among which there were proteins encoded by 27 target genes, 288 DEGs and 83 G_SNVs. Fourteen proteins were involved in six smaller tri-and di-component networks ( Figure 13, Table S5). The rest of the proteins were singletons. In the S2 line, 426 proteins were found. The main network consisted of 249 proteins, among which there were 14 targets, 181 DEGs and 54 G_SNVs. There were also 12 smaller networks containing 30 proteins. The remaining proteins were singletons ( Figure 13, Table S5). In the S3 analysis, 604 proteins were matched in the STRING databases. The main network consisted of 387 proteins, among which there were 204 targets, 118 DEGs and 65 G_SNVs. The 42 proteins were grouped into 13 smaller networks. The rest of the proteins were assigned as singletons ( Figure 13, Table S5). The trend of the node degree distribution (NDD) in each network shows that most nodes have a relatively small degree, but a few nodes will have very large degrees, being connected to many other nodes ( Figure S2a). The neighborhood connectivity (NC) spans over a range of values wider than the standard nodal degree, resulting in a statistically more reliable classification of infrastructure networks ( Figure S2b). Plots for both NDD and NC values in the three analyzed lines indicate that the infrastructure of PPI networks are real.
We performed networks enrichment with regard to gene ontology and UniProt Keywords (Table S6). Enrichment of the molecular network in the S1 line showed that most of the terms that were linked in the cellular components group were related to plastids and their compartments. Proteins assigned to molecular function were connected with transmembrane transporter activity, oxidoreductase activity and ion binding. In the biological processes category, the most enriched terms were connected to biosynthetic processes of heme, chlorophyll and porphyrin-containing compounds, and they were also connected to a response to the absence of light. The most numerous Uniprot Keywords were porphyrin, chlorophyll and heme biosynthesis.
The molecular network enrichments in the S2 line revealed that most of the terms that were connected in the cellular component group were connected with cells and membranes. Proteins assigned to molecular function were connected with DNA binding, transcription factor activity, catalytic activity and molecular function regulator. In the category of biological processes, the most enriched terms were connected with different processes related to development of shoot systems and anatomical structure morphogenesis. The most numerous Uniprot keywords were as follows: amino-acid binding, calmodulinbinding, membrane and transcription regulation.
Enrichment of the molecular network in the S3 line showed that most of the terms that were connected in the cellular components group were related to chloroplast, cytosol and cytoplasm. Proteins assigned to molecular function were connected with compound binding and catalytic activity. In the category of biological processes, the most enriched terms were connected with microsporogenesis, regulation of reproduction process and shoot morphogenesis. The most common UniProt keyword was oxidoreductase.

Discussion
Somaclonal variability is a phenomenon that occurs during plant-regeneration in in vitro cultures. It is usually undesirable because it affects plant variability, but sometimes the new traits obtained are desirable and can provide a new source for plant breeding. The phenomenon of somaclonal variation is very complex and still poorly understood. In an earlier work, we attempted to explore the variability between the genomes of somaclonal lines (S1, S2 and S3) compared to the reference B10 line [40]. The total number of obtained polymorphisms differed from 7591 to 44510. Detected polymorphisms were most frequent in non-coding regions and were mainly SNPs. High-impact changes accounted for 1-3% of all polymorphisms [40]. Comparison of the transcriptome profiles of small fruits revealed 418, 36 and 273 genes that were differentially regulated in the S1, the S2 and the S3 line, respectively [41]. We suggested that the differential gene expression may be caused by polymorphism in the genic region and may also be a result of interaction among molecular networks, which triggers specific pathways. We wondered whether miRNAs play a role in somaclonal variation, whether there is any difference in miRNA profiles in the somaclonal lines compared to the B10 reference line, and whether these changes are transmitted generationally. For this purpose, we performed sequencing of sRNA profiles and identified differentially expressed miRNAs for which we found target molecules. Then, we checked bioinformatic analyses for possible interactions between proteins encoded by targets, G_SNV and DEGs. Checking such interactions seems to be important in the context of molecular network analysis and in the link to the phenomenon of somaclonal variation. microRNA molecules can be negative regulators of gene expression, reducing the stability of target RNAs or limiting their translation. However, in fact, the direction of these changes is unknown. According to the competitive endogenous RNA (ceRNA) hypothesis [42], the relationship between mRNAs and microRNAs could be reciprocal. In addition to the conventional microRNA → RNA function, a reversed RNA → microRNA logic exists, in which coding and noncoding RNA targets can crosstalk through their ability to compete for microRNA binding. RNA molecules could communicate with each other through microRNA and microRNA response sequences (MREs-miRNA response elements). The greater the number of shared MREs, the greater the level of "communication" and thus coregulation. Most miRNAs have multiple targets, and many mRNAs are targets for multiple miRNAs. The reason for a negative correlation in the expression level is frequently not clear. Additionally, many miRNAs target long non-coding RNAs, which also influence the RNA concentration. Therefore, it is difficult to predict the direction of changes in target expression and its impact on molecular networks.
Undoubtedly, miRNAs form a complex network and have key roles in many diverse regulatory pathways involved in plant development, plant health, environmental and disease responses [30,43,44]. Therefore, could miRNA be responsible for the development of somaclonal variation? Is there one pattern triggering miRNA complexes that leads to changes in regenerants? We demonstrated this through an in silico display of a PPI network in which one of the components was targets regulated by cleavage or translation inhibition by miRNAs. This has an impact on the proteome content, as seen in the molecular networks that we have performed.
The complex network theory (CNT) is becoming one of the most powerful and versatile tools to investigate, describe and understand biological systems [45]. In the last decades, CNT has had an unrestrained development, and researchers have proposed novel approaches, metrics and theories to explore and disentangle network features [46][47][48]. Protein-protein interaction analysis with the STRING tool allows for proper configuration of the network based on a database containing all available protein association evidence and prediction algorithms. Herein, we performed complex analysis of multi-omics research to elucidate the role of miRNA with target genes and its influence on protein occurrence or absence. The regulatory role of miRNA in each somaclonal line on the protein-protein networks is proven. miRNA interacts with the targets mainly in two ways: by cleavage or translation inhibition, thus causing disturbances in the abundance or structure of the resulting proteins encoded by target genes for small miRNA molecules. In our in silico research, we created networks including proteins encoded by target genes, and as shown in Figure 13, they interact with clusters encoded by genes with high impact polymorphisms [40,41] and the genes that are differentially expressed [41]. Therefore, we can conclude that miRNA plays a role in complex processes in cucumber fruit derived from the somaclonal lines. From the analyses performed, a functional picture emerges, in which the components are involved mostly in processes connected with metabolic processes, as well as the regulation of biological processes with regard to molecular function in binding and catalytic activity. We observed similar enrichment in gene ontology across all three somaclonal lines. This may indicate co-processes that occur when new pathways are created or when alternative pathways in plants lead to somaclonal changes. Further, in the cellular component category, the enrichment network analysis showed terms common to the three somaclonal lines, such as membranes, cytoplasm and organelle (especially chloroplast in the S1 and S3 lines), which indicates that the proteins of the interactome under study act mainly in these areas. Most cellular activities take place within the cytoplasm, e.g., many metabolic pathways. Movement of ions in and out of the cytoplasm is a signaling activity for metabolic processes [49]. Membranes act as major checkpoints for signal sensing and transport control. An important role in communication is played by membrane-associated proteins, which interact with intracellular processes through protein interaction networks [50]. Deciphering these signaling networks certainly provides important information for elucidating in vivo cellular regulation in somaclones, particularly membrane-protein interactions, as well as how these proteins may be related to downstream changes in gene expression, metabolism and plant physiology. It seems that the molecular networks we present illustrate how plants responded to in vitro cultures. Membranes, and more specifically receptor proteins, transmit a signal inside the cell, and there are other proteins whose role is to participate in metabolic pathways that make the cell adapt to the surrounding environment of in vitro cultures.
Returning to the question of miRNA's role in this complex process, it seems certain that by acting on target molecules through digestion or translation inhibition, miRNAs influence signaling networks in response to the conditions in in vitro cultures. Target genes are involved in processes such as cellular and metabolic processes and their regulation, and more particularly in response to stress, multicellular organism development, biosynthetic process and transport. The most abundant terms in the category of molecular function were: catalytic activity, protein binding, binding, hydrolase and transferase activity, nucleic acid binding and transcription factor activity. This is in line with the trend described above for the terms that were the most numerous groups in the network enrichment analysis. The same applies to cellular compartments.
This connection with ontological groups also shows that proteins are involved in numerous processes. In the cells of the cucumber somaclonal lines, numerous metabolic processes take place, taking into account various biochemical changes of proteins, which may be a response to the conditions in in vitro culture. Surely, these numerous reactions are intended to adapt to the environment of the in vitro cultures and to develop reaction pathways in such a way that the plants survive.
In the molecular function category are also proteins connected with nucleic acid binding and transcription factors, which are essential for the regulation of gene expression and usually belong to members of multigene families [51]. Generally, TFs exist as modular proteins containing a DNA-binding domain that interacts with cis-elements of their target genes [52]. Moreover, it also consists of a protein-protein interaction domain that assists oligomerization between TFs or with other regulators [53]. We hypothesized that miRNAs could act with targets that are TF, and thus influence other genes, resulting in the changing of transcriptome profiles in somaclonal lines. The regulation of genes at transcriptional and post-transcriptional levels, which includes miRNAs and TFs as key regulatory entities, is an interesting aspect. Understanding the regulation of miRNAs and their interactions with TF can help scientists deepen their understanding of plants and the mechanism of survival in adverse environmental conditions [54] or, in this case, in an in vitro culture environment.
Noteworthy is the fact that the common miRNA ath-miR393a-3p has changed in all three somaclonal lines. This sequence belongs to the miR393 family, which is predicted to target mRNAs coding for F-box proteins and bHLH transcription factors [55]. F-box proteins are substrate-recognition components of the Skp1-Rbx1-Cul1-F-box protein (SCF) ubiquitin ligases. In plants, F-box genes form one of the largest multigene superfamilies and control many important biological functions. However, it is unclear how and why plants have acquired a large number of F-box genes [56]. The bHLH transcription factors comprise a large family in higher plants, and numerous studies have shown that bHLHtype transcription factors are involved in diverse biological processes in plant growth, development and stress responses [57]. There are a multitude of processes in which these TFs are involved: F-box and bHLH confirm that miRNAs, by reacting with these targets as TFs, can have a large impact on other genes.
Other miRNAs pointed out in our study as differentially expressed with counterparts in the miRBase data were similar to those from Cucumis melo [58]. Among the downregulated miRNAs were cme-miR169l in the S2 and S3 lines and cme-miR156b in the S3 line. Among up-regulated miRNAs were: cme-miR167c, cme-miR394a and cme-miR171e in the S2 line; and cme-miR164b, cme-miR166b, cme-miR396e, cme-miR390d and cme-miR169f in the S3 line.
Many of the down-regulated miRNAs were also associated with transcription factors. The cme-MIR394a belongs to the miR394 family, which is predicted to target mRNAs coding for F-box proteins [55]. The miR156 is expressed at high levels in organs produced early in shoot development, where they repress the expression of their targets, SQUAMOSA PROMOTER BINDING PROTEIN (SBP) transcription factors [59]. In addition, miR156 was recently identified as an enhancer of the callus embryogenic potential in Citrus sinensis [60]. miR166b is thought to target mRNAs coding for HD-Zip transcription factors, including Phabulosa (PHB) and Phavoluta (PHV), that regulate axillary meristem initiation and leaf development [61]. The cme-MIR171e is also associated with transcription factors [55].
Several miRNAs appear to be temporally regulated during development, such as the miR167, miR169 and miR390 families during ovarian development [62], or miR164b, which targets mRNAs encoding NAC domain-containing proteins such as the cup-shaped cotyledon 2 (CUC2) required for shoot apical meristem formation [61]. Furthermore, transcription factors CUC1 and CUC2 regulated by miR164 participate in the establishment and maintenance of axillary meristem and organ boundary during embryogenesis [63,64]. Further, miR396 regulates growth-regulating factor genes [65]. Some miRNAs, including miR156 and miR169, are associated with the coordination of the relationship between development and stress responses [66].
miRNAs have been reported to be involved during in vitro culture of plants [23,67]. Differential microRNA expression has been recorded in micropropagated strawberry plantlets regenerated by tissue culture, and these have been correlated with existing differences with the phenotype [24]. During in vitro culture, the miR156 was upregulated, whereas other miRNAs (miR164, miR172 and miR390) were downregulated. Authors have suggested that the regulation of miRNAs may differ in their expression, and the regulation of miRNAs is controlled by different factors in the culture media, such as hormones allowing differentiated tissue to reach growth potential [24].
In our study with cucumber, the set of miRNAs belonging to the same families were differentially regulated similarly to those in the strawberry. However, to date, no direct relationship has been found between somaclonal variation and miRNA [68]. Further studies are required to characterize the miRNA expressed during somaclonal variation from different plants so that diagnostic miRNA associated with this phenomenon can be compiled and its possible use as a biomarker facilitated [66]. Typically, this work concerned changes in miRNAs during various periods of in vitro time cultures, with the addition of various substances to the medium, but not directly connected with somaclonal variation.
In our work, we compared miRNA profiles and connected these results with other omics studies performed on the same plant [40,41]. In a sequence similarity analysis, we identified novel and known miRNAs, that were also associated with somaclonal variation according to the literature [24]. In order to study the effects of miRNAs, we identified targets and annotated their functions and gene ontology. Further, we performed multi-omics interaction networks, using predicted in silico targets from this study and additional polymorphism data from the study of comparative genomics [40] and data from transcriptome profiling [41]. We have shown that miRNAs can have an influence on the molecular networks in the cell, by regulating various metabolic processes and influencing transcription factors that are responsible for direct gene expression. Further in-depth studies of miRNAs may shed light on essential hotspots of the regulatory pathways leading to somaclonal variation in plants.

Plant Material
Three cucumber somaclonal lines and one control B10 line were used in this study. The somaclones have the same genetic background, as the B10 line was a donor of explants for in vitro cultures, from which the analyzed lines were created in the later stages of development. The S1 line was obtained by direct leaf regeneration [69,70] and showed a mosaic phenotype, with irregular small yellow and silvery spots [69][70][71]. The S2 line was obtained from leaf callus regeneration. It possesses an altered fruit phenotype, which is light green, glossy, without a waxy coat, and lacking typical wards and netting [72]. The S3 line was obtained from cytokinin-dependent embryogenic suspension culture, and it possesses shoot apices that are yellow-green in color [73]. After the selection of each type of somaclone, the somaclones were self-pollinated for at least 10 generations with the maintenance of a specific phenotype that arose during in vitro cultures. The field experiment was carried out using the random blocks method. Ten plants per each line were seeded and phenotypically assessed. After self-pollination, young fruits (7 days old) were harvested, frozen in liquid nitrogen and stored at −80 • C.

Isolation of RNA, Library Construction and High-Throughput Sequencing of the Small RNA Fraction
Total RNA was extracted from 100 mg of tissue using the RNeasy Mini Kit (Qiagen, Valencia, CA, USA), with an additional step of DNase I treatment, in accordance with the manufacturer's protocol. The nucleic acid concentration and quality were assessed with a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and via standard electrophoresis on a 1.0% (w/v) ethidium bromide-stained agarose gel to allow for RNA visualization. Of the RNA, 10 µg from each of the three biological replicates (per sample) were used for the preparation of the sRNA library TrueSeq Small RNA library Kit (Illumina, Inc., San Diego, CA, USA). Parallel sequencing was performed on an Illumina HiSeq 2000 SR50 platform (McGill University Genome Quebec Innovation Centre, Montreal, QC, Canada). We obtained 50-bp single-end sequenced reads for sRNA-seq. FastQC [74] was used to assess the quality of the short reads. The sequences generated in this study have been deposited in the Sequence Read Archive (SRA) at the National Center for Biotechnology Information under accession numbers BioProjects: PRJNA723857 and PRJNA610495.

Bioinformatic Assessment of the miRNA and Targets: Degradome Verification
The identification of microRNAs and other sRNAs from small RNA-Seq data was performed with ShortStack ver. 3.3 [17]. The sequence was considered to be miRNA when it was tagged Y or N15 based on Axtell's criteria [17,18]. The clean reads were used in the BLAST [75] search against known mature miRNAs and pre-miRNAs of the miRbase (version 22.1) [76]. Differential expression analysis was performed with DESeq2 [77] using default parameters. Differentially expressed miRNAs were detected using DESeq with an adjusted p-value cutoff 0.05 and a Log 2 FC (Fold Change) ≥ 1.0. The in-silico predicted targets of the mature miRNA sequences were identified using psRNAtarget 2017 with the scale range 0-5 [78]. For further analysis, we used only those targets that ranged on a scale of 0-3. Functional annotation and Gene Ontology (GO) classification of the miRNA targets were carried out using Blast2GO software [79]. The obtained sequences were folded using Folder version 1.12 (RNAfold algorithm) [80]. The longest sequences that could form the stem-loop structures were used for pre-microRNA construction. The pre-microRNA structures with the lowest ∆G energy value were chosen, and the corresponding miRNA was marked in red. miRNA target interactions were confirmed using our miRNA FASTA and transcriptome files with publicly available cucumber degradome data (SRR7620953 from NCBI SRA archive) [81,82]. The small RNA Workbench ver. 3.2 (http://srna-workbench.cmp.uea.ac.uk/; 28 February 2022) [83] software package was used for this analysis with low default stringency parameters (p-value cut off 0.05 and 4.5 number of mismatches allowed, in category scale 0-4).

Integration of Multi Omics Data
Identified miRNAs and their targets were compared with previously obtained data related to creating a multi-omics molecular network. For this purpose, we used miRNA targets obtained in this study and previously obtained genomic [40] and transcriptomic data [41]. The genomic data (BioProject PRJNA563814) come from genome sequencing of the somaclonal lines and consist of selected genes affected by structural polymorphisms SNVs (Single Nucleotide Variants) [40]. The transcriptome data (BioProjects 578634 and 578623) contain the differentially expressed genes (DEGs) from cucumber fruits [40]. As a reference cucumber, genome B10v3 was used (GenBank LKUO00000000) [84]. The STRING algorithm (version 10.5) [85], using Arabidopsis thaliana as a model, was applied for analysis of the possible interactions between protein-protein interaction (PPI) encoded by DEGs, G_SNVs and miRNA targets (Table S4). We used Cytoscape string APP (version 3.7.2) [86] to edit the layout of our map. Each node was color-coded based on the data type. The networks enrichment was performed with regard to gene ontology and UniProt Keywords (Table S5).

qPCR Analysis of Expression of miRNA and Their Target Genes
Total RNA was isolated with the miRNeasy Micro Kit (Qiagen), which is effective for total RNA and miRNA, according to standard protocol. Reverse transcription was performed with the High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's protocol. For cDNA synthesis, 1 µg of the total RNA was used. cDNA synthesis of miRNA was performed using the two-tailed RT target-specific primers. All primers for miRNA qPCR were designed according to the two-tailed RT-qPCR method [87]. miRNAs for the analysis were chosen randomly from miRNAs differentially expressed in all lines. For target validation, target genes of the chosen miRNAs that were differentially expressed at significant levels were chosen. Quantitative PCR analysis of miRNA and target transcripts was performed with three biological and three technical replicates using 3 µL of diluted (1:5) cDNA, Power SYBR Green Master Mix (Thermo Fisher Scientific, Waltham, MA, USA) and the Applied Biosystems 7500 Real-Time PCR System (Thermo Fisher Scientific, Waltham, MA, USA). A melting curve analysis was completed immediately after the qPCR. Relative expression levels were determined according to the 2 −∆∆Ct method, and statistical significance analysis was performed using REST2009 software. For the relative expression analysis of miRNA, the U6 snRNA was used as a reference, and for the analysis of target genes, the CACS gene was used as a reference. The full list of primers used in the RT-qPCR analysis is shown in Table S4.

Conclusions
In the analysis, we confirmed our hypothesis of the existence of differentially expressed miRNAs between the somaclonal lines and the reference B10 line of Cucumis sativus. We conclude that miRNAs have a role in somaclonal variation, influencing targets by cleavage or translation inhibition. Predicted in-silico targets are mostly connected with metabolic processes, response to stress, multicellular organism development and biosynthetic process. The analysis of the targets' functions revealed that most of them were classified to catalytic activity, binding, transferase activity and transcription regulation activity. In the cellular component category, the most abundant terms were: membrane, intracellular, cytoplasm and organelle. The connection with ontological groups shows that target proteins are involved in numerous processes. In the cells of the cucumber somaclonal lines, numerous metabolic processes take place to react and adapt to the environment of the in vitro cultures. We identified several novel and known miRNAs, and the target molecules of the latter are mostly transcription factors influencing many other genes. We also checked bioinformatic analyses for possible interactions at the level of target proteins, genes affected by polymorphism and genes that are differentially regulated. Herein, for the first time, we showed the complexity of processes by molecular imaging of interaction from different omic's levels. We hypothesize that miRNAs could indirectly affect molecular networks and could play a role in many diverse regulatory pathways leading to somaclonal variation.