Genome-wide identification and characterization of lectin receptor-like kinase gene family in cucumber and expression profiling analysis under different treatments

Abstract


Background
Plants are sessile organisms, so they rely on sensitive signal sensors to cope with changes in their environment. Over the course of evolution, plants have developed a complete set of signal receptor proteins. After receiving an external stimulus, these receptors promptly transmit signals to downstream pathways so that the plant can respond. Cell surface receptors, one kind of signal receptor protein, play important roles in receiving and transmitting environmental signals. The receptor-like kinase (RLK) family, an important family of cell surface receptors, contains three kinds of domains: an extracellular domain, a transmembrane (TM) domain and an intracellular kinase domain. RLK proteins can be classified into different families based on the structure of their extracellular domains and intracellular kinase domains.
Lectin receptor-like kinases (LecRLKs), a class of RLKs that contain a lectin domain within the extracellular domain, are a gene family that is specialized for sensing external environmental stimuli and transmitting signals.
They are localized on the cell membrane and rely on diverse N-terminal extracellular ligand-recognition domains (the lectin domains) to recognize various environmental stimuli; they then phosphorylate downstream proteins through their C-terminal intracellular kinase domain to relay the received signals (Bouwmeester and Govers, 2009).
Based on the identity of their lectin domains, LecRLKs have been divided into three subfamilies (Fig. 1): L-type, G-type, and C-type LecRLKs (Vaid et al., 2012). These subclasses are very distinct from each other in the sugar-binding ability of their lectin domains. The G-type LecRLKs possess α-D-mannose-specific plant lectin domains, which in most (but not all) of the proteins are accompanied by an EGF motif, a PAN motif, or both.
The EGF motif is cysteine rich (Shiu and Bleecker, 2001) and probably takes part in the formation of disulfide bonds (Vaid et al., 2012). The PAN motif is believed to be involved in protein-protein and protein-carbohydrate interactions (Naithani et al., 2007). As the name suggests, the lectin domain of L-type LecRLKs resembles the soluble lectin proteins found in leguminous plants (Hervé et al., 1999). The third class of LecRLK is the C-type lectin kinase. The C-type lectin kinases in plants are thought to be homologues of calcium-dependent lectin motifs, a large group of mammalian proteins known to be involved in innate immune responses and pathogen recognition (Cambi et al., 2005). Although C-type lectin proteins are present in large numbers in mammalian systems, only one gene encoding a C-type LecRLK exists in each of rice and Arabidopsis (Bouwmeester and Govers, 2009).
LecRLKs are believed to be primarily involved in plant development, innate immunity and abiotic stress responses. Previous reports have confirmed that the LecRLK family is involved in plant root development (Cheng et al., 2013), pollen development (Wan et al., 2008), cotton fiber development (Zuo et al., 2004), and hormone signal recognition (Deng et al., 2009). LecRLKs also play an important role in plant resistance to diseases, insect pests and stresses, such as salt stress (Li et al., 2014; He et al., 2004), wounding (Riou et al., 2002), and fungal pathogens (Ohtake, 2000; Desclos et al., 2012). Compared with L-type and G-type LecRLKs, the C-type LecRLK is the least understood member of the family. Although there is only one C-type member in each of Arabidopsis and rice, no C-type LecRLK had been linked to a specific biological trait until Guo et al. (2017) showed that a C-type LecRLK may affect trichome morphogenesis in cucumber (Cucumis sativus). That study suggested that the LecRLK family may be involved in more biological pathways than previously recognized.
Genome-wide analysis of the LecRLK gene family has been done in Arabidopsis thaliana (Vaid et al., 2012), rice (Vaid et al., 2012), bread wheat (Shumayla et al., 2016), soybean (Liu et al., 2018), and Populus trichocarpa (Yang et al., 2016), but no such study has been reported for a Cucurbitaceae plant. Cucumber is an important economic crop of the Cucurbitaceae, but only a few early studies have reported on the presence of LecRLKs in it (Wu et al., 2014). A comprehensive understanding of LecRLKs in cucumber is still lacking. Here, we carried out a complete identification and analysis of the entire LecRLK gene family in cucumber and identified 46 LecRLK genes (CsLecRLKs). Furthermore, we analyzed their phylogenetic relationships, gene structure, conserved domains, gene duplications, chromosome distribution and the cis-acting elements in their promoters. Finally, we profiled the expression of the predicted genes in different tissues and in response to gibberellin (GA), abscisic acid (ABA), 1-naphthaleneacetic acid (NAA), indole-3-acetic acid (IAA) and cold treatments in cucumber. Our study provides valuable information for further functional research on the LecRLK gene family in cucumber.

Identification of the LecRLK genes in cucumber
The whole genome and protein sequence data of cucumber (Chinese Long, genome v2) were downloaded from a public database (http://cucurbitgenomics.org/). Hidden Markov Models (HMMs) were used to identify Cucumis sativus LecRLK candidates. The HMM profiles of the lectin domains were downloaded from the Pfam protein database (http://pfam.xfam.org/): L-type (Lectin_legB, PF00139), G-type (B_lectin, PF01453) and C-type (Lectin_C, PF00059). We used HMMER 3.0 (Johnson et al., 2010) to search the cucumber protein sequence data for the three types of genes with an e-value cutoff of 0.001. We then examined the selected genes with the PFAM (Finn et al., 2016) and SMART (Letunic) programs to ensure that each protein contains a conserved lectin domain, a transmembrane domain and a kinase domain. The LecRLK family data of rice and Arabidopsis were taken from a previous report (Vaid et al., 2012).
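The domain-based filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the input format (a list of `(protein_id, pfam_accession, e_value)` tuples plus a dictionary of predicted transmembrane helix counts) is hypothetical, and the two kinase accessions (PF00069, PF07714) are our assumption, since the text does not say which kinase models were checked. The lectin accessions and the 0.001 cutoff come from the text.

```python
from collections import defaultdict

# Lectin-domain Pfam accessions named in the text.
LECTIN_TYPES = {"PF00139": "L-type", "PF01453": "G-type", "PF00059": "C-type"}
# Assumption: kinase domains are recognised by these two Pfam models.
KINASE_PFAMS = {"PF00069", "PF07714"}
E_CUTOFF = 1e-3  # e-value cutoff stated in the text


def classify_candidates(hits, tm_counts):
    """Return {protein_id: lectin type} for proteins that carry exactly one
    kind of lectin domain, a kinase domain and at least one TM helix.

    hits      -- iterable of (protein_id, pfam_accession, e_value) (hypothetical format)
    tm_counts -- dict mapping protein_id to the number of predicted TM helices
    """
    domains = defaultdict(set)
    for protein_id, accession, e_value in hits:
        if e_value <= E_CUTOFF:
            domains[protein_id].add(accession)

    classified = {}
    for protein_id, accessions in domains.items():
        lectins = sorted(a for a in accessions if a in LECTIN_TYPES)
        has_kinase = bool(accessions & KINASE_PFAMS)
        has_tm = tm_counts.get(protein_id, 0) >= 1
        if len(lectins) == 1 and has_kinase and has_tm:
            classified[protein_id] = LECTIN_TYPES[lectins[0]]
    return classified
```

Proteins with hits to more than one lectin model are skipped here as ambiguous; a real analysis would more likely inspect them by hand.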

Phylogenetic analysis
The full-length protein sequences of CsLecRLKs were aligned by the MUSCLE program with the default parameters (Edgar, 2004). The phylogenetic tree was constructed by the Neighbor-Joining (NJ) method in MEGA 7.0.21, with the following parameters: Poisson model, pairwise deletion, and 1000 bootstrap replications.

Conserved domain, motif identification and gene structure analysis
The conserved motifs of the CsLecRLKs were predicted by the MEME program (Bailey, 1994). The parameters were set to any number of repetitions, an optimum motif width of 6-210 residues, and a search for 10 motifs, with other parameters at default. The Gene Structure Display Server (Hu et al., 2014) was used to show the exon-intron structures of CsLecRLK genes.

Gene location and duplication analysis of CsLecRLKs
The location information of CsLecRLKs was extracted from the genome annotation files of the Cucurbit Genomics Database (http://cucurbitgenomics.org/) with a series of in-house Perl scripts. The gene location map was constructed with the MapChart software described previously (Voorrips, 2002). We used two methods to find duplication events among the CsLecRLKs. In the first method, gene duplication was confirmed by an in-house Perl script using two criteria: (a) the shorter aligned sequence covered >70% of the longer sequence in length; (b) the similarity of the aligned sequences was >70% (Gu, 2002; Yang, 2008). Two genes located in the same chromosomal fragment of less than 100 kb and separated by five or fewer genes were identified as tandem duplicated genes (Mehan et al., 2004). In the second method, we used the Multiple Collinearity Scan toolkit (MCScanX) with the default parameters to analyze gene duplication events (Wang et al., 2012). Ks (synonymous substitution rate) and Ka (nonsynonymous substitution rate) values of the duplicated genes were calculated from the coding sequence alignments by the method of Nei and Gojobori as implemented in KaKs_calculator (Zhang et al., 2006). The divergence time was calculated with the formula T = Ks/2r, where Ks is the number of synonymous substitutions per site and r is the rate of divergence for plant nuclear genes. The value of r was taken to be 1.5 × 10⁻⁸ synonymous substitutions per site per year for dicotyledonous plants (Olds et al., 2000).
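The duplication criteria and the divergence-time formula above can be written out as a short Python sketch. The function names and argument layout are hypothetical, and criterion (a) is read here as "aligned length divided by the length of the longer sequence exceeds 0.70", which is one plausible interpretation of the text rather than a confirmed detail of the in-house scripts.

```python
def is_duplicate_pair(aligned_len, len_a, len_b, similarity_pct):
    """Criteria (a) and (b): coverage of the longer sequence > 70 %
    and similarity of the aligned region > 70 %."""
    coverage = aligned_len / max(len_a, len_b)
    return coverage > 0.70 and similarity_pct > 70.0


def is_tandem_pair(chrom_a, pos_a, order_a, chrom_b, pos_b, order_b):
    """Tandem duplicates: same chromosome, less than 100 kb apart,
    and separated by five or fewer genes.

    pos_*   -- gene start coordinate in bp
    order_* -- rank of the gene along its chromosome (0, 1, 2, ...)
    """
    if chrom_a != chrom_b:
        return False
    intervening_genes = abs(order_a - order_b) - 1
    return abs(pos_a - pos_b) < 100_000 and intervening_genes <= 5


def divergence_time_mya(ks, r=1.5e-8):
    """T = Ks / (2r), converted from years to million years (MYA).
    r defaults to the dicot rate quoted in the text."""
    return ks / (2.0 * r) / 1e6
```

As a worked example of the formula, a hypothetical pair with Ks = 0.3 gives T = 0.3 / (2 × 1.5 × 10⁻⁸) = 1 × 10⁷ years, that is, about 10 MYA.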

Analysis of cis-acting elements
The 1500 bp upstream of each CsLecRLK was obtained from the genome annotation files of the Cucurbit Genomics Database (http://cucurbitgenomics.org/) with a series of in-house Perl scripts, and the cis-acting elements in these sequences were scanned using the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/). During the analysis, we filtered out cis-acting elements that are ubiquitous in most genes, for example the CAAT-box, TATA-box and TATC-box, and show only the elements that may be typical and functional in the CsLecRLK gene family. The structures and annotations of the promoters were then drawn with GSDS (http://gsds.cbi.pku.edu.cn/) (Hu et al., 2015).
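Extracting an upstream window has one common pitfall: for genes on the minus strand, "upstream" lies to the right of the annotated end coordinate and the sequence must be reverse-complemented. The sketch below illustrates this in Python; it is not a port of the in-house Perl scripts, the function names are hypothetical, and it assumes 1-based inclusive coordinates as used in GFF annotation files.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=1500):
    """Return up to `length` bp upstream of a gene, read 5'->3' on the
    gene's own strand.

    chrom_seq  -- chromosome sequence as a string
    start, end -- 1-based inclusive gene coordinates (start <= end)
    strand     -- "+" or "-"
    The window is truncated at chromosome ends rather than padded.
    """
    if strand == "+":
        window_start = max(0, start - 1 - length)
        return chrom_seq[window_start:start - 1]
    if strand == "-":
        window_end = min(len(chrom_seq), end + length)
        return reverse_complement(chrom_seq[end:window_end])
    raise ValueError(f"unknown strand: {strand!r}")
```

Whether the window should be anchored at the gene start or at the translation initiation site depends on which feature coordinates are passed in; the function itself is agnostic.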

Expression pattern analysis
For expression profiling of the CsLecRLK genes in 10 tissues, we used Illumina RNA-seq data previously generated by our lab. The cucumber tissues were 10 mm floral bud (sepals removed), 10 mm ovary (sepals removed), pericarp (one day before flowering, trichomes removed), pistil (one day before flowering), stamen (one day before flowering), root (from seeds placed on moist sprouting paper for 72 hours), stem, tendril (not yet stretched), fruit spines (one day before flowering) and cotyledon (extended to 2 cm). The heatmap of the FPKM values of CsLecRLKs in the 10 tissues was drawn with R. We also selected 5 typical tissues (floral bud, ovary, cotyledon, fruit spines and root) to draw Venn diagrams with R.

Hormones and cold treatments
The typical cucumber line '9930' was used as the experimental material to investigate expression patterns in response to various phytohormones and cold treatment. Cucumber seeds were soaked in 55°C water for 2 h and germinated on Petri dishes in a growth chamber at 28°C in the dark for 2 d. The germinated seeds were grown in pots containing a peat:vermiculite mixture (3:1) in the greenhouse of Shanghai Jiao Tong University, and the controlled-environment growth chamber was programmed for 16 h light at 25°C and 8 h dark at 20°C. Four weeks after germination, the seedlings were placed in hydroponic boxes with 1/2 MS liquid solution (pH 5.8, without sugar) for 1 week to adapt to the environment (roots were kept in the dark), and were then treated with 100 mM indole-3-acetic acid (IAA), 100 mM 1-naphthaleneacetic acid (NAA), 100 mM abscisic acid (ABA), or 100 mM gibberellin (GA) for 3 h under the same growth conditions. The 1/2 MS liquid solution without any hormone was used as a control. Another group of seedlings was treated at 4°C for 1 h, with 25°C as the control. Each treatment consisted of three replicates.

RNA extraction and gene expression analysis
Roots were collected from plants treated with the different hormones, and young leaves were collected from plants subjected to 4°C. Total RNA was extracted using the RNeasy Plant Mini Kit (Cwbio, Beijing, China). First-strand cDNA was prepared according to the protocol of the PrimeScript RT reagent Kit with gDNA Eraser (Cwbio, Beijing, China). To determine the relative expression levels of the LecRLK genes under the different treatments, qRT-PCR was conducted using FastStart Essential DNA Green Master (Roche, Mannheim, Germany). CsActin3 (Csa6G484600.1) was used as an internal control. qRT-PCR was performed in a total volume of 20 μL, containing 2 μL of cDNA, 10 μL SYBR mix, 2 μL gene-specific primers (10 μM) and 6 μL ddH2O, using the CFX Connection Real-Time System (Bio-Rad, California, USA) with 40 cycles of 5 s at 95°C and 30 s at 60°C. Each experiment was repeated three times, and each experiment included three biological replicates. The real-time PCR amplification data were analyzed using the 2^−ΔΔCT method. The primers used for qRT-PCR are listed in Table S3.
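For readers unfamiliar with the 2^−ΔΔCT calculation, a minimal Python sketch follows. The function name and argument order are hypothetical; the reference gene would be CsActin3 in this study, and the calculation assumes roughly 100% amplification efficiency for both primer pairs, which is the standard assumption of the method.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs. control) by 2^-ddCt.

    dCt  = Ct(target) - Ct(reference gene), computed per sample
    ddCt = dCt(treated) - dCt(control)
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)
```

For example, with hypothetical Ct values of 22 (target) and 18 (reference) in the treated sample and 24 and 18 in the control, ΔΔCT = 4 − 6 = −2, so the gene is estimated to be 4-fold upregulated.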

Results
Genome-wide identification of the LecRLKs in cucumber
We identified a total of 46 LecRLK genes, named CsLecRLKs (Table 1), in the cucumber genome by the Pfam and SMART searches. The total number of LecRLKs in cucumber is lower than that in Arabidopsis (75 LecRLK genes) or rice (173 LecRLK genes) (Vaid et al., 2012). The 46 CsLecRLKs were classified into 23 G-type, 22 L-type, and one C-type based on their extracellular lectin domain. The molecular weight (MW) of the proteins ranged from 62.5 kDa (Csa1G056960) to 94.5 kDa (Csa3G733860), the isoelectric point (pI) ranged from 4.98 (Csa4G289630) to 9.51 (Csa1G056960), and the CDS length ranged from 1,803 to 2,502 bp. Based on the predicted protein structures, most of the CsLecRLKs are likely localized on the plasma membrane; only Csa7G045520 was predicted to be extracellular. More information on the CsLecRLKs, including the length of the gene, the length of the CDS, the length of the protein sequence, and the protein MW and pI, is listed in Table 1.
By analyzing the molecular weights of all 46 CsLecRLKs, we found that G-type CsLecRLKs (83.2 kDa) are generally larger than L-type (62.5 kDa) and C-type (74.6 kDa) CsLecRLKs. This may be mainly because, in addition to the lectin domain, G-type CsLecRLKs often contain EGF and PAN domains (Fig. 1). Signal peptides and transmembrane (TM) domains are critical for protein localization. The software prediction indicated that not every CsLecRLK had a signal peptide and a single TM domain. The loss of the signal peptide or the TM domain would directly affect the localization of the protein in the cell (Table 1). The plasma membrane localization of most of the CsLecRLKs indicates that they are signal receptors that can sense extracellular signals and transmit them to the interior of the cell.

Phylogenetic analysis of the CsLecRLKs
We constructed an unrooted phylogenetic tree with MEGA 7.0.21 (Fig. 2). As expected, the phylogenetic tree showed that the CsLecRLK family could be classified into three subgroups: L-type, G-type, and C-type. This result is consistent with the domain-based classification of the CsLecRLK family. The phylogenetic tree indicated that the L-type and C-type had a closer relationship. This differs from previous reports in Arabidopsis and rice, which revealed a closer genetic relationship between G-type and L-type (Vaid et al., 2012). As shown in Fig. 3, the phylograms of G-type and L-type CsLecRLKs could be divided into four and three sub-groups, respectively. The division of individual clades was supported by high bootstrap values.

Exon-Intron Structural Analysis of CsLecRLKs
The genomic sequences and corresponding cDNA sequences of the CsLecRLKs were submitted together to GSDS (Gene Structure Display Server) to analyze their gene structure (Fig. 3). The genomic sequence lengths of CsLecRLKs ranged from 1803 bp to 6481 bp, and the CDS lengths ranged from 1674 bp to 2502 bp. The number of exons in these genes varied from one to nine; 80% of CsLecRLKs had fewer than three exons, whereas Csa4G296230 contains three exons. All L-type CsLecRLKs contained only one or two exons, and the C-type CsLecRLK (Csa1G056960) contains four exons. The G-type CsLecRLKs contain one to nine exons. Among them, Csa7G446780 has the most exons, with nine.

Chromosomal Location and Gene Duplication of CsLecRLKs
We extracted the location data of the CsLecRLKs and the length of each chromosome from the cucumber genome annotation files with a series of Perl scripts, and constructed the gene location map using the MapChart software.
As shown in Figure 4, the CsLecRLKs were unevenly distributed across the 7 cucumber chromosomes, and genes from the same subfamily on the same chromosome tended to cluster. The number of CsLecRLKs per chromosome varied from 1 to 12: chromosome 3 contained the largest number (12 CsLecRLKs) and chromosome 2 had only one.
During evolution, gene families can arise through tandem duplication and segmental duplication (Kent et al., 2003; Mehan et al., 2004). To explore whether the CsLecRLK gene family also expanded through these two kinds of duplication, we analyzed the duplication events of the CsLecRLK genes. The results indicated that although many genes were clustered on the chromosomes, only Csa1G071170 and Csa1G071160 formed a pair of tandem duplicated genes, with a divergence time of about 38.606 million years ago (MYA). The other two duplicated pairs, Csa1G073890 and Csa7G048050, and Csa3G734030 and Csa4G296230, may have been caused by duplication or ectopic movement of chromosome fragments during evolution, as the genes in each pair are not on the same chromosome. Their divergence times were 30.96 and 32.35 MYA, respectively. Based on these results, it could be inferred that tandem duplication contributed to the expansion of the CsLecRLK gene family.

Cis-acting Elements Analysis on CsLecRLKs promoter
Different genes have their own specific or consensus cis-acting elements in their promoters. Trans-acting factors bind to the cis-acting elements to regulate gene expression. Different cis-acting elements may correspond to different biotic or abiotic stress signals that induce or inhibit gene expression. Therefore, analyzing the cis-acting elements in the CsLecRLK promoters will help us to further understand the functions of these genes. We used the PlantCARE website to analyze the 1500 bp sequences upstream of the translation initiation site of the CsLecRLKs, and found 54 typical and functional cis-acting elements (Fig. 5), which could be divided into four types: light response, stress resistance, plant hormone and others. Among them, 24 cis-acting elements were related to light response, 11 were related to hormones including salicylic acid (SA), jasmonic acid (JA), ethylene, gibberellin and auxin, and 9 were abiotic stress elements. These results suggested that the CsLecRLK gene family may be mainly involved in stress resistance pathways in cucumber. There were six development-related cis-acting elements, five of which were related to seed development, suggesting that this gene family may play a role in seed development. More details are shown in supplementary table 1.

Expression Pattern Analysis of CsLecRLK genes
Little is known about the functions of LecRLKs in cucumber. As a first attempt to provide insights into their potential functions, we used RNA-seq data from 10 cucumber tissues to investigate the expression of each CsLecRLK gene. Most CsLecRLKs were expressed at a low level, and some (Csa6G338050, Csa1G071160, and Csa3G115090) were barely expressed in any tissue. The 10 tissues could be divided into two groups based on the expression levels of the CsLecRLKs (Fig. 6A). Group 1 comprised stamen, in which most CsLecRLKs were barely expressed and just 10 genes met the threshold used for constitutive expression (FPKM >= 1 in all tissues; Tao et al., 2018). Group 2 comprised the other 9 tissues, in each of which at least 19 genes met this threshold. The CsLecRLKs themselves could be divided into 3 groups based on their expression level in each tissue (Fig. 6A). From group 1 to group 3, the range and level of gene expression decreased successively.
Group 1 contained 4 genes, which had a high expression level in each tissue, with an average FPKM of 31.00. Group 2 contained 14 genes with an intermediate expression level in each tissue and an average FPKM of 7.86. Group 3 included 28 genes expressed at a low level in each tissue, with an average FPKM of 1.81. The C-type CsLecRLK (Csa1G056960) belonged to group 1, and G-type CsLecRLKs generally had higher expression levels than L-type CsLecRLKs.
Thirty-six CsLecRLKs were expressed in all tissues (FPKM > 0 in all tissues) (Tao et al., 2018) and 8 genes were constitutively expressed (FPKM >= 1 in all tissues). We then focused on genes with relatively high expression (FPKM > 2 in all tissues) (Tao et al., 2018) and selected 5 cucumber tissues for cluster analysis (Fig. 6B): 10 mm floral bud, 10 mm ovary, cotyledon, fruit spines and root. We found that a total of 15 genes were expressed in all of these tissues. Notably, two genes were expressed only in roots (Csa1G605730 and Csa3G115060), one gene (Csa3G048440) was expressed only in ovary, and six genes were expressed only in fruit spines (Csa4G289620, Csa4G289630, Csa4G289640, Csa4G296230, Csa7G067410 and Csa7G446780).
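The FPKM thresholds and the Venn-style comparison described above can be expressed as a small Python sketch. The study used R; this is only an illustration, the function names and the nested-dictionary input (`{gene: {tissue: FPKM}}`) are hypothetical, and the FPKM > 2 cutoff is applied per tissue here, which is one reading of the text.

```python
def is_constitutive(fpkm_by_tissue, min_fpkm=1.0):
    """True if the gene reaches FPKM >= min_fpkm in every tissue."""
    return all(value >= min_fpkm for value in fpkm_by_tissue.values())


def genes_above(fpkm, tissue, threshold=2.0):
    """Set of genes whose FPKM in `tissue` exceeds `threshold`."""
    return {gene for gene, by_tissue in fpkm.items()
            if by_tissue[tissue] > threshold}


def venn_summary(fpkm, tissues, threshold=2.0):
    """Return (genes shared by all tissues, {tissue: genes found only there})."""
    per_tissue = {t: genes_above(fpkm, t, threshold) for t in tissues}
    shared = set.intersection(*per_tissue.values())
    specific = {}
    for tissue in tissues:
        others = set().union(*(per_tissue[o] for o in tissues if o != tissue))
        specific[tissue] = per_tissue[tissue] - others
    return shared, specific
```

Calling `venn_summary(fpkm, ["root", "ovary"])` on a table with those two tissues returns the genes above the cutoff in both, plus the root-only and ovary-only sets, which correspond to the central and outer regions of a two-set Venn diagram.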
Expression analysis of CsLecRLK genes in response to different treatments
Gene expression is not only spatiotemporally specific but can also be induced or repressed by hormones and stress. Because most LecRLKs are receptor proteins on the membrane, they can usually sense such stimuli promptly and send signals to intracellular receptors. To uncover how the CsLecRLKs diverge in their short-term response to different environments, their expression patterns under hormone treatments (IAA, GA, ABA, and NAA) and cold stress were analyzed by qRT-PCR. The results showed that most CsLecRLKs (31/46) responded to at least one treatment (fold change > 1 relative to the control group; significance p = 0.05). Overall, there were 20 upregulation events and 38 downregulation events (significance p = 0.05). To present the experimental results more conveniently and intuitively, the fold changes under the different treatments were displayed as a heatmap (Fig. 7) based on the qRT-PCR data. Firstly, some CsLecRLKs (7/46) could be induced or repressed by multiple treatments (treatment number > 3); for instance, Csa1G071170 could be induced by GA, IAA, NAA and ABA treatments, and Csa4G005510 was repressed by all treatments except ABA. Secondly, different CsLecRLKs could be induced or repressed by different treatments. Cold stress affected the fewest CsLecRLKs: the expression of 4 genes changed, and all of them were downregulated. In contrast, NAA affected the most CsLecRLKs: 20 genes responded to NAA treatment, of which 6 were upregulated and 14 were downregulated. Sixteen CsLecRLKs changed their expression level under ABA treatment, with 8 genes upregulated and 8 genes downregulated. IAA and GA changed the expression level of 9 and 8 genes, respectively. The IAA treatment upregulated 1 gene and downregulated 8 genes. The GA treatment upregulated 5 genes and downregulated 3. Thirdly, 14 CsLecRLKs had different expression patterns under the various treatments; for example, Csa3G734030 could be induced by NAA and repressed by ABA, and Csa1G071150 was upregulated under ABA treatment and downregulated under NAA and IAA treatments.
The results indicated that the members of the CsLecRLK family have their own response characteristics to hormones and stresses and may play an important role in sensing external stimuli. For example, although Csa1G071160 and Csa3G115090 were not expressed in the root, our experiment showed that they can be induced by NAA and ABA, respectively (Fig. 7B). Fifteen genes showed no significant expression change under the different treatments: Csa1G056960, Csa7G029930, Csa5G550210, Csa4G296250, Csa7G048050, Csa1G073890, Csa4G289620, Csa3G115060, Csa3G099580, Csa1G071270, Csa6G516770, Csa2G439150, Csa1G605730, Csa6G338050 and Csa5G648630.

Discussion
Compared with other plants, cucumber contains fewer members of the LecRLK gene family: there are only 46 CsLecRLKs in the cucumber genome, comprising 23 G-type, 22 L-type and only one C-type. The number of LecRLK family members varies among plants, which may have three causes. Firstly, genome sizes differ among plants. For example, the soybean genome is above 1 Gb and contains 52,051 protein-coding genes (Shen et al., 2018), so soybean has more LecRLK members than other plants (189 LecRLK genes) (Liu et al., 2018). Secondly, the number of LecRLKs is also related to their function in the life activities of different plants. For instance, G-type LecRLKs are generally considered a class of proteins that may be involved in self-incompatibility, because some G-type LecRLKs contain an S-domain (Bouwmeester and Govers, 2009), which is an essential domain of proteins involved in the sporophytic self-incompatibility response. Cucumber has unisexual flowers and does not face the problem of identifying the source of pollen during reproduction, unlike Populus and Eucalyptus, which are obligate outcrossing plants. This may partially explain why Populus and Eucalyptus contain more G-type LecRLKs, leading to a large increase in the number of members of the entire gene family (Yang et al., 2016). Thirdly, there are fewer duplication events in the CsLecRLK gene family. For instance, we identified only three pairs of duplicated genes in cucumber, whereas 36 paralogous gene pairs were generated by duplication events in soybean (Liu et al., 2018), which led to a wider expansion of the gene family in other plants.
Another interesting finding in our study was that most CsLecRLK genes have few introns, similar to previous findings in other plants. For example, most members of the LecRLK gene family in soybean have only one intron or none (Liu et al., 2018). Previous studies in Arabidopsis and rice also indicated that few genes in this family contain introns: only five in Arabidopsis (75 LecRLK genes) and eight in rice (173 LecRLK genes) (Vaid et al., 2012). A probable reason for this structure is that, because LecRLKs act as signal receptors in plants, fewer introns mean less splicing, which could shorten transcription time so that the signal can be transmitted promptly (Jeffares et al., 2008).
The LecRLK family is specific to plant genomes; to date, no homologs of LecRLKs have been reported in fungal or human genomes (Yang et al., 2016). Because LecRLKs serve as signal stations for sessile organisms, the unique characteristics of the LecRLK family may be closely related to their function in sensing the external environment.
Some LecRLKs have been reported to be involved in sensing the invasion of microorganisms. SD1-29, a G-type LecRLK in Arabidopsis, can recognize lipopolysaccharide from Gram-negative Pseudomonas and Xanthomonas (Ranf et al., 2015). Some LecRLKs change their expression pattern with changes in hormone and nutritional conditions. For example, SIT1, an L-type LecRLK in rice, mediates salt sensitivity. With increasing NaCl concentration, SIT1 was activated rapidly, which reduced the survival of rice (Li et al., 2014). In addition, some LecRLKs affect plant development and growth; for instance, two L-type LecRLKs, LecRK-IX.1 and LecRK-IX.2, induce cell death upon infection with Phytophthora, thereby increasing plant survival (Wang et al., 2015). A C-type LecRLK mutant showed a changed cell stacking pattern in cucumber trichomes, so that fruit spines fell off the pericarp easily (Guo et al., 2017). Similar to a previous study (Yang et al., 2016), a large number of CsLecRLKs were expressed in root (Fig. 6). This may be because the root is one of the most important plant organs, and its roles in fixation, temperature sensing, and nutrient absorption are irreplaceable. We also found a number of CsLecRLKs whose expression levels were induced or repressed by hormone and stress treatments, implying that the expression of these CsLecRLKs may be affected by biotic or abiotic stimuli not represented in our tested tissues. There are also some CsLecRLKs whose expression levels were low in the tested tissues and were neither induced nor repressed by our hormone and stress treatments. This may be because the stimuli relevant to these genes, such as salt stress, insect stress, salicylic acid or ethylene, were not included in our experiment; these conditions represent an area for further investigation.

Declaration
The experiments comply with the current laws of the country in which they were performed.
Statistical differences were determined by t-test (*P < 0.05, n = 3) using Microsoft Excel 2010. The heatmap of the fold-change values of CsLecRLKs under the five treatments was drawn with R.
Motif analysis of CsLecRLKs
Using the SMART program, we investigated the conserved domains present in the CsLecRLKs. C-type and L-type CsLecRLKs both contain only three basic categories of domain: the lectin domain, the transmembrane domain and the kinase domain. Some G-type CsLecRLKs also contain two other categories of domain, the PAN domain and the EGF domain. Among the G-type CsLecRLKs, ten proteins contain both PAN and EGF domains, five proteins contain only the PAN domain, eight proteins contain only the EGF domain, and only one contains neither. Our results indicated that a signal peptide is not necessary for CsLecRLKs. There are 25 CsLecRLKs without a signal peptide and 8 CsLecRLKs with more than two transmembrane domains. Ten conserved motifs were identified in the CsLecRLKs using the MEME program. These motifs were labelled Motif 1 to Motif 10 from the N- to the C-terminus. The details of the conserved motifs are shown in Figure 3. The lengths of these motifs ranged from 15 to 60 residues. Generally, the CsLecRLKs contain 4 to 10 motifs. None of the motifs appeared in all gene family members. Except for Motif 8 and Motif 9, which were present only in the G-type CsLecRLKs, the motifs were present in all three types of CsLecRLKs. Using the CDD program, we found that six of these motifs represent different kinase domains (Supplementary table), indicating that there may be multiple phosphorylation catalytic sites in each of the CsLecRLKs.

Table 1. Basic characteristics of the CsLecRLK family