Genome-Wide Re-Identification and Analysis of CrRLK1Ls in Tomato

The Catharanthus roseus receptor-like kinase 1-like (CrRLK1L) family, a vital subgroup of the plant receptor-like kinase family, plays versatile roles in plant growth, development, and stress responses. Although a preliminary screen of tomato CrRLK1Ls has been reported, our knowledge of these proteins remains limited. Using the latest genome annotations, we conducted a genome-wide re-identification and analysis of the CrRLK1Ls in tomato. In this study, 24 CrRLK1L members were identified in tomato and characterized further. Analyses of gene structures and protein domains, together with Western blot and subcellular localization assays, confirmed the accuracy of the newly identified SlCrRLK1L members. Phylogenetic analyses showed that the identified SlCrRLK1L proteins had homologs in Arabidopsis. Evolutionary analysis indicated that two pairs of SlCrRLK1L genes had predicted segmental duplication events. Expression profiling showed that the SlCrRLK1L genes were expressed in various tissues and that most of them were up- or down-regulated by bacterial and PAMP treatments. Together, these results lay a foundation for elucidating the biological roles of SlCrRLK1Ls in tomato growth, development, and stress responses.


Introduction
As crucial components of signal transduction, receptor-like kinases (RLKs) constitute the largest receptor family in plants and play significant roles in plant growth, development, and responses to stress and pathogens [1,2]. According to their diverse extracellular domains, plant RLKs can be divided mainly into the following families: the S-domain, wall-associated kinase domain, legume lectin domain, CRINKLY4 domain, malectin-like (CrRLK1L) domain, malectin-like leucine-rich repeat domain, leucine-rich repeat malectin domain, cysteine-rich repeat domain, leucine-rich repeat (LRR) domain, lysin motif domain, proline-rich/extensin domain, and calcium-dependent lectin domain RLK families [3]. The plant-specific CrRLK1L protein kinases, which have attracted the interest of many researchers, were first identified in Madagascar periwinkle (Catharanthus roseus) and have since been found in a variety of plant species [2,4,5]. Typically, CrRLK1Ls possess three conserved domains: a malectin-like domain, a transmembrane helix domain, and a kinase domain [5]. Several CrRLK1L members have been functionally characterized, including FERONIA (FER), ANXUR1/2 (ANX1/2), THESEUS1 (THE1), BUDDHA'S PAPER SEAL1/2 (BUPS1/2), and HERCULES1 (HERK1).

Identification of Tomato CrRLK1L Protein Kinases
All CrRLK1L protein kinases contain a malectin-like domain and a kinase domain. Arabidopsis CrRLK1L protein kinase sequences were submitted to the Pfam database, and two conserved domains (Pfam: PF12819 and PF07714) were obtained. These two conserved domains then served as queries to screen the tomato protein databases in the National Center for Biotechnology Information (NCBI) and the Sol Genomics Network (SGN). As shown in Figure 1, 32 and 24 CrRLK1L protein kinase candidates were identified in NCBI and SGN, respectively. Of these, 24 tomato CrRLK1Ls were found in both databases; they were named SlCrRLK1L1 to SlCrRLK1L24 according to their chromosomal locations (Table 1). However, the annotations of ten of these proteins differed between the NCBI and SGN databases (Figure S1; Table 1; Table S1), and one of them, SlCrRLK1L20, was chosen to verify annotation accuracy. We analyzed the gene structure and protein domains of SlCrRLK1L20 across the SGN annotation versions (ITAG2 to ITAG4.1) and the NCBI RefSeq assembly accession versions (GCF_000188115.2 to GCF_000188115.5) and found that the NCBI RefSeq annotation provided complete descriptions of the UTRs, CDS, introns, signal peptide, transmembrane helix, malectin-like domain, and protein kinase domain (Figure 2a). Moreover, all of the SlCrRLK1L20 gene sequences from SGN were contained in the NCBI annotation. A unique polypeptide was then used as an antigen to produce an anti-SlCrRLK1L20 antibody that could detect both SlCrRLK1L20 NCBI (NCBI GCF_000188115.5, the newest annotated protein) and SlCrRLK1L20 SGN (SGN ITAG4.1, the newest annotated protein) (Figure 2b). The SlCrRLK1L20 NCBI and SlCrRLK1L20 SGN CDSs were amplified by PCR (Figure 2c) and used to construct plant expression vectors. A Western blot assay showed that the anti-SlCrRLK1L20 antibody detected SlCrRLK1L20 NCBI, but not SlCrRLK1L20 SGN, in tomato fruit (Figure 2d).
At the same time, the subcellular localization of SlCrRLK1L20 NCBI and SlCrRLK1L20 SGN was examined by confocal microscopy. Solanum lycopersicum REMORIN1 (SlREM1) was identified in previous research as a plasma membrane marker protein [38]. As shown in Figure 2e, SlCrRLK1L20 NCBI-GFP co-localized with SlREM1 at the plasma membrane, whereas SlCrRLK1L20 SGN did not. These results indicate that the NCBI annotation of SlCrRLK1L20 is more accurate than the SGN annotation. Therefore, subsequent analyses were mainly based on the NCBI database.
Figure 2. The accuracy of the tomato CrRLK1L annotations in NCBI was higher than that in SGN. (a) The different versions of the gene structure and protein domain of SlCrRLK1L20 in the NCBI and SGN databases. The data were extracted from NCBI and SGN and then analyzed for visualization; detailed information can be found in Table S2. (b) Polypeptide antigen location in SlCrRLK1L20 NCBI and SlCrRLK1L20 SGN. (c) Amplification of the SlCrRLK1L20 NCBI and SlCrRLK1L20 SGN CDSs. (d) Western blot analysis of SlCrRLK1L20 NCBI and SlCrRLK1L20 SGN. 1: N. benthamiana leaves transiently expressing CaMV35S::SlCrRLK1L20 NCBI-HA; 2: N. benthamiana leaves transiently expressing CaMV35S::SlCrRLK1L20 SGN-HA; 3: S. lycopersicum fruit. The uncropped Western blot gel image can be found in Figure S2. (e) Confocal analysis of SlCrRLK1L20 NCBI and SlCrRLK1L20 SGN subcellular localization. Bars = 50 μm.
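The two filters used to identify candidates in this section (carrying both Pfam domains, and being present in both databases), followed by naming in chromosomal order, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each protein has already been annotated with Pfam accessions and that NCBI and SGN records can be linked through a shared locus identifier. The `Candidate` record, its fields, and the toy loci are hypothetical; only the accessions PF12819 and PF07714 come from the text.

```python
from dataclasses import dataclass

# The two Pfam accessions used as queries in the text.
REQUIRED_PFAM = frozenset({"PF12819", "PF07714"})


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one predicted protein in one database."""
    locus: str        # hypothetical shared locus ID linking NCBI and SGN records
    chromosome: int   # chromosome number
    start: int        # start coordinate on the chromosome (bp)
    pfam: frozenset   # Pfam accessions detected in the protein


def screen(records):
    """Keep records whose protein carries both required Pfam domains, keyed by locus."""
    return {r.locus: r for r in records if REQUIRED_PFAM <= r.pfam}


def match_and_name(ncbi_records, sgn_records, prefix="SlCrRLK1L"):
    """Keep loci found in both databases and number them by chromosomal position."""
    ncbi = screen(ncbi_records)
    sgn = screen(sgn_records)
    shared = [ncbi[locus] for locus in ncbi.keys() & sgn.keys()]
    # Sort by chromosome, then by start coordinate, so names follow genome order.
    shared.sort(key=lambda r: (r.chromosome, r.start))
    return {f"{prefix}{i}": r.locus for i, r in enumerate(shared, start=1)}


if __name__ == "__main__":
    both = frozenset({"PF12819", "PF07714"})
    a = Candidate("locusA", 2, 500, both)
    b = Candidate("locusB", 1, 900, both)
    c = Candidate("locusC", 1, 100, frozenset({"PF07714"}))  # kinase domain only
    d = Candidate("locusD", 3, 10, both)                     # present in NCBI only
    print(match_and_name([a, b, c, d], [a, b, c]))
    # {'SlCrRLK1L1': 'locusB', 'SlCrRLK1L2': 'locusA'}
```

How NCBI and SGN records are actually linked is not stated in the text, so the shared-locus key here is only a placeholder for whatever mapping is used in practice.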

Tomato CrRLK1L Gene Locations and Duplication on Tomato Chromosomes
To better understand the relationships among tomato CrRLK1L genes, their chromosomal distribution and collinearity were analyzed using TBtools. The tomato CrRLK1L genes were distributed on chromosomes 1 to 3, 5 to 7, and 9 to 11, with none on chromosomes 4, 8, and 12 (Figure 3b). Chromosome 2 carried the largest number of SlCrRLK1L genes, whereas chromosomes 7 and 11 each carried only one SlCrRLK1L gene (Figure 3b). Because segmental duplication plays an important role in gene family expansion, the one-step MCScanX tool was used to reveal the collinearity of the SlCrRLK1L genes. As shown in Figure 3b, collinear relationships were detected among SlCrRLK1L2, SlCrRLK1L23, SlCrRLK1L3, and SlCrRLK1L20, indicating duplication events involving these genes.

Tomato CrRLK1L Protein Domain and Gene Structure
To further confirm the SlCrRLK1L proteins, conserved domain detection was carried out. All of the sequences were submitted to the NCBI Batch CD-Search to search for conserved domains. As a result, the malectin domain, the malectin-like domain, and the PKc-like domain were detected (Figure 4). In addition, DeepTMHMM (https://dtu.biolib.com/DeepTMHMM) (accessed on 3 September 2022) was used to predict the signal peptides and transmembrane helices of the SlCrRLK1Ls. As shown in Figure 4 and Tables S3 and S4, all of the SlCrRLK1L proteins contained one signal peptide and one transmembrane helix, except for SlCrRLK1L11, which had two transmembrane helices and no signal peptide.
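Counting signal peptides and transmembrane helices from a per-residue topology prediction comes down to counting runs of identical labels, as in the one-signal-peptide, one-helix pattern reported above. The sketch below assumes a topology string in which `S` marks signal-peptide residues and `M` marks transmembrane residues; this label alphabet is an assumption to check against the actual DeepTMHMM output, and the function names are hypothetical.

```python
from itertools import groupby


def summarize_topology(topology: str) -> dict:
    """Count signal-peptide and transmembrane segments in a per-residue label string.

    Assumes 'S' = signal-peptide residue and 'M' = transmembrane residue
    (hypothetical label alphabet; adapt it to the real predictor output).
    """
    # Collapse consecutive identical labels into one entry per segment.
    segments = [label for label, _run in groupby(topology)]
    return {
        "signal_peptides": segments.count("S"),
        "tm_helices": segments.count("M"),
    }


def is_single_pass_with_signal(topology: str) -> bool:
    """True for the typical pattern: one signal peptide and one transmembrane helix."""
    summary = summarize_topology(topology)
    return summary["signal_peptides"] == 1 and summary["tm_helices"] == 1


if __name__ == "__main__":
    # Toy topologies (hypothetical), not predictions for real SlCrRLK1L proteins.
    print(summarize_topology("SSSSOOOOMMMMIIII"))
    print(summarize_topology("OOOMMMMIIIMMMMOOO"))
```

Counting segments rather than residues is what makes a protein like SlCrRLK1L11, with two separate membrane-spanning stretches, register as having two helices.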

Prediction of SlCrRLK1L Conserved Protein Motifs
The SlCrRLK1L conserved protein motifs were analyzed using MEME (https://meme-suite.org/meme/tools/meme) (accessed on 1 September 2022). In total, ten conserved motifs were identified (Figure 5, motifs 1 to 10), ranging from 21 to 50 amino acids in length. Motifs 1 to 5 were found in all 24 members, whereas motif 7 was found in only 14 members (Figure 5). The similarities in characteristic motifs among the SlCrRLK1L proteins may reflect functional similarities.
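Statements such as "motifs 1 to 5 occur in all members, motif 7 in only 14" come from tallying, for each motif, the number of proteins containing it. A minimal sketch, assuming the motif hits have already been parsed into a mapping from protein name to the motif numbers found in it; the function names and toy data are hypothetical and do not reproduce the MEME results of this study.

```python
from collections import Counter


def motif_occurrence(hits):
    """Count, for each motif, how many proteins contain it at least once.

    `hits` maps a protein name to the motif numbers found in that protein.
    """
    counts = Counter()
    for motifs in hits.values():
        counts.update(set(motifs))  # set(): count each motif once per protein
    return counts


def core_motifs(hits):
    """Return the motifs shared by every protein (empty set if there are none)."""
    motif_sets = [set(m) for m in hits.values()]
    return set.intersection(*motif_sets) if motif_sets else set()


if __name__ == "__main__":
    # Toy data (hypothetical); motif 7 appears twice in protA but counts once.
    toy = {
        "protA": [1, 2, 3, 7, 7],
        "protB": [1, 2, 3],
        "protC": [1, 2, 3, 7, 9],
    }
    print(motif_occurrence(toy)[7])   # proteins containing motif 7
    print(sorted(core_motifs(toy)))   # motifs present in every protein
```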

Subcellular Localization
Previous studies have found that most CrRLK1L proteins are localized to the plasma membrane. In this study, three prediction tools (CELLO, MultiLoc2, and Plant-mPLoc) were used to predict the subcellular localization of the SlCrRLK1L proteins. As shown in Table S3, almost all of the SlCrRLK1L proteins were predicted to localize to the plasma membrane, which is consistent with our experimental localization of SlCrRLK1L20. However, the predictions differed among methods: CELLO and MultiLoc2 agreed for most proteins, whereas Plant-mPLoc did not, probably because of differences in their prediction algorithms. In addition, the predicted signal peptides and transmembrane helices of the SlCrRLK1L proteins further support their membrane localization.

SlCrRLK1L Gene Promoter Analysis
To explore the putative functions of SlCrRLK1Ls in tomato, their promoters were analyzed using PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) (accessed on 3 September 2022) and PlantTFDB (http://planttfdb.gao-lab.org/index.php) (accessed on 5 September 2022). PlantCARE was used to predict cis-acting elements. In total, 709 cis-acting elements were predicted in the SlCrRLK1L promoters, falling into 20 functional categories (Figure 6; Table S5). The predicted cis-acting elements were mainly related to light, low temperature, ethylene, gibberellin, abscisic acid (ABA), methyl jasmonate (MeJA), salicylic acid (SA), auxin, and wound responsiveness, suggesting that SlCrRLK1Ls may participate in hormone, stress, and defense responses. In addition, PlantTFDB was used to predict transcription factor binding sites. As shown in Figure 6 and Table S6, 712 binding sites were identified in the SlCrRLK1L promoters; the representative sites shown belong to various types of transcription factors, among which NAC, AP2, MIKC-MADS, Dof, and MYB were the most abundant. No promoter data were available for SlCrRLK1L18 because of incomplete sequencing or annotation of the genome.
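Summaries such as "709 elements in 20 categories", together with noting genes that lack promoter data (as for SlCrRLK1L18), amount to grouping and counting prediction rows. A minimal sketch, assuming the predictions have been parsed into (gene, element, category) rows; that row layout, the function name, and the toy rows are hypothetical and are not the PlantCARE output format or the results of this study.

```python
from collections import Counter, defaultdict


def tally_elements(records, genes):
    """Summarize predicted cis-acting elements.

    `records` is an iterable of (gene, element, category) rows (assumed layout).
    Returns per-category totals, per-gene category counts, and the genes with
    no rows at all (for example, because no promoter sequence was available).
    """
    per_category = Counter()
    per_gene = defaultdict(Counter)
    for gene, _element, category in records:
        per_category[category] += 1
        per_gene[gene][category] += 1
    # Membership tests on a defaultdict do not create new keys.
    missing = [g for g in genes if g not in per_gene]
    return per_category, per_gene, missing


if __name__ == "__main__":
    # Toy rows (hypothetical).
    rows = [
        ("geneA", "ABRE", "abscisic acid"),
        ("geneA", "G-box", "light"),
        ("geneB", "ERE", "ethylene"),
        ("geneB", "ABRE", "abscisic acid"),
    ]
    totals, by_gene, no_data = tally_elements(rows, ["geneA", "geneB", "geneC"])
    print(totals.most_common(1), no_data)
```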

SlCrRLK1L Gene Expression Pattern Analysis
It has been experimentally demonstrated that CrRLK1L genes have tissue-specific expression patterns in Arabidopsis, tobacco, and apple [34,39,40]. To study the tissue-specific expression of tomato CrRLK1L genes, the expression profiles of all 24 SlCrRLK1L genes were examined in ten samples (root, leaf, bud, flower, 1 cm, 2 cm, and 3 cm fruit, mature green fruit, breaker fruit, and fruit at 10 days after breaker). The original RNA-seq data were extracted from the SGN tomato functional genomic database (SGN-TFGD, http://ted.bti.cornell.edu/cgi-bin/TFGD/digital/home.cgi) (accessed on 4 September 2022) [36,41]. As shown in Figure 7a, SlCrRLK1L20 was highly expressed in all ten samples, especially in the roots and fruits. SlCrRLK1L2, SlCrRLK1L5, SlCrRLK1L16, SlCrRLK1L17, and SlCrRLK1L23 were mainly expressed in the flowers. SlCrRLK1L7 and SlCrRLK1L15 had relatively high expression levels in the fruits, and SlCrRLK1L7 showed the highest expression level in the leaves among all of the SlCrRLK1L genes. SlCrRLK1L8 and SlCrRLK1L12 had relatively high expression levels in the roots compared with other tissues. The other SlCrRLK1L genes showed relatively low expression levels in all of the examined samples.
The promoter analysis suggested that SlCrRLK1Ls might be involved not only in plant growth but also in defense responses. To explore this possibility, we calculated and compared the expression ratios of SlCrRLK1Ls after treatment with different bacteria and a pathogen-associated molecular pattern (PAMP), using RNA-seq data from SGN-TFGD. The expression levels of the SlCrRLK1L genes changed with the different treatments, although no data were available for some genes (Figure 7b). When treated with flgII-28, a PAMP derived from Pseudomonas syringae pv. tomato T1, SlCrRLK1L3, 7, 8, 9, and 15 were up-regulated, whereas SlCrRLK1L2, 21, and 22 were down-regulated. Only SlCrRLK1L22 and SlCrRLK1L11 were significantly down-regulated by Pseudomonas syringae pv. tomato DC3000 or Agrobacterium tumefaciens infection. Following the Pseudomonas fluorescens and Pseudomonas putida treatments, SlCrRLK1L2 was significantly down-regulated by both bacteria, whereas SlCrRLK1L3 was significantly up-regulated by both.

Discussion
As important members of the plant RLK family, CrRLK1Ls have been found in many species, including angiosperms (for example, Arabidopsis, rice, and apple), gymnosperms (Picea abies), and early diverging lineages (for example, the Closterium peracerosum-strigosum-littorale complex, Marchantia polymorpha, and Physcomitrella patens) (Table 2). However, the structural characteristics and functions of the tomato CrRLK1L gene family remain unclear. We therefore comprehensively analyzed the physicochemical properties, structural characteristics, and expression patterns of tomato CrRLK1Ls. A previous study found 23 CrRLK1L subfamily members in the tomato genome [37]. In this study, after sequence analysis, we used a new method to search the latest, well-annotated tomato protein databases and re-identified 24 SlCrRLK1Ls. A comparison between the tomato CrRLK1L proteins identified in this study and those in the previous study [37] was also carried out. As shown in Figure S1, eight proteins were identified here for the first time; they may have been missed previously because of differences in the analytical methods and genome annotations used. In addition, the annotations in SGN and NCBI differed: some of the SlCrRLK1L gene structures in the SGN database lacked well-annotated UTRs and CDSs (Figure S1). To determine which annotation was more accurate, SlCrRLK1L20 was selected for further analysis, and the results showed that, for this gene, NCBI had better annotations than SGN (Figure 2). Similar situations have occurred in other species: in Arabidopsis and rice, re-analyses after the initial identification reported numbers of CrRLK1Ls that differed from those in earlier studies [27,57].
Homologous proteins often have similar functions. The phylogenetic analysis revealed that tomato CrRLK1Ls were closely related to their Arabidopsis counterparts. Our homology search showed that 11 of the 24 SlCrRLK1Ls had Arabidopsis homologs with known functions. We speculate that these homologous genes evolved from a common ancestor, implying that they may have similar functions in some signaling pathways.
In Arabidopsis, rice, apple, strawberry, and soybean, CrRLK1Ls have been shown to be involved in development, fertility, environmental responses, and immunity [2]. Our cis-acting element and transcription factor binding site analyses indicated that SlCrRLK1Ls may be involved in plant development and in hormone and environmental responses, such as those to auxin, ethylene, abscisic acid, wounding, light, and temperature (Figure 6), some of which have been confirmed for the Arabidopsis homologs, as illustrated above. Gene expression patterns are often closely linked to gene function. During fruit ripening, SlCrRLK1L20 was very highly expressed (Figure 7), which is consistent with its function in regulating fruit ripening [30]. Moreover, the expression pattern analysis suggested that SlCrRLK1Ls may participate in responses to bacterial infection. Upon treatment, SlCrRLK1L2, 3, 8, 11, 15, 19, and 22 displayed relatively strong responses (Figure 7), indicating that these genes may be involved in plant-pathogen interactions. The functions of these SlCrRLK1L members require further exploration.
In conclusion, we identified and analyzed the CrRLK1L family in tomato using bioinformatic, biochemical, and cell biological assays, providing a theoretical basis and guidance for further functional studies of these proteins.

Antibody Preparation
The specific polypeptide (KDLNESPGYDASMTDSRS) was synthesized and used as an antigen to immunize rabbits, and the anti-SlCrRLK1L20 polyclonal antibody was prepared by Abmart (Shanghai, China).

Western Blot Assay
Total proteins were extracted as described previously [30] and separated by 10% SDS-PAGE. After electrophoresis, the proteins were transferred to a PVDF membrane. The membrane was blocked in 5% skim milk for 1 h and then incubated for 1 h with anti-HA (Abmart; 1:5000) or anti-SlCrRLK1L20 (this study; 1:1000) antibodies. Images were captured using a chemiluminescence imaging system (Tanon). SlCrRLK1L20-HA and SlCrRLK1L20 were detected with the anti-HA and anti-SlCrRLK1L20 antibodies, respectively.

Gene Location and Collinearity Analysis
The locations of the SlCrRLK1L genes on the tomato chromosomes were obtained from the NCBI database and visualized using Advanced Circos (TBtools v1.108) [62]. The collinearity analysis of the SlCrRLK1L genes was conducted using the one-step MCScanX tool in TBtools with the default parameters [62].

Protein Domain and Gene Structure Analyses
For the protein domain analyses, the SlCrRLK1L protein sequences were submitted to the NCBI Batch CD-Search (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi?) (Bethesda, MD, USA, accessed on 3 September 2022) [69] and processed using the default parameters, and hits with E-value < 1 × 10⁻¹⁰ were retained. The signal peptide and transmembrane helix regions were predicted by DeepTMHMM (https://dtu.biolib.com/DeepTMHMM) (Copenhagen, Denmark, accessed on 3 September 2022) [70] using the default parameters. The detailed protein domain data are listed in Table S4. For the gene structure analyses, the SlCrRLK1L gene annotation files were obtained from NCBI and visualized using the Visualize Gene Structure tool in TBtools.
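Applying the E-value cutoff above means keeping, for each query protein, only the domain hits whose E-value falls strictly below 1 × 10⁻¹⁰. A minimal sketch, assuming the hits have already been parsed into (query, domain, E-value) tuples; the tuple layout and the function name are hypothetical and do not describe the actual Batch CD-Search export format.

```python
def filter_domain_hits(hits, max_evalue=1e-10):
    """Group domain names by query, keeping only hits with E-value < max_evalue.

    `hits` is an iterable of (query, domain, evalue) tuples (assumed layout).
    """
    kept = {}
    for query, domain, evalue in hits:
        if evalue < max_evalue:  # strict inequality, matching "E-value < 1e-10"
            kept.setdefault(query, []).append(domain)
    return kept


if __name__ == "__main__":
    # Toy hits (hypothetical values).
    toy_hits = [
        ("protA", "malectin-like", 3e-80),
        ("protA", "PKc-like", 1e-40),
        ("protA", "weak_hit", 2e-5),   # dropped: above the cutoff
    ]
    print(filter_domain_hits(toy_hits))
```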

Conserved Protein Motif Analysis
The SlCrRLK1L protein sequences were submitted to the MEME suite 5.5.0 webtool (https://meme-suite.org/meme/tools/meme) (San Diego, CA, USA, accessed on 1 September 2022) [71] and processed using the default parameters, and the result file was visualized using the Visualize MEME/MAST Motif Pattern (TBtools v1.108).

Gene Expression Pattern Analysis
The SlCrRLK1L gene expression pattern analysis used RNA-seq data from the SGN tomato functional genomic database (SGN-TFGD, http://ted.bti.cornell.edu/cgi-bin/TFGD/digital/home.cgi) (Ithaca, NY, USA, accessed on 4 September 2022) [36,41]. For the bacterial and PAMP treatments, expression ratios were calculated and log2-transformed for normalization. The data are listed in Tables S7 and S8. Morpheus (https://software.broadinstitute.org/morpheus/) (Cambridge, MA, USA, accessed on 4 September 2022) was used to draw the heatmaps.
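The log2 transformation makes up- and down-regulation symmetric around zero: a two-fold increase becomes +1 and a two-fold decrease becomes -1, which suits a diverging heatmap. A minimal sketch, assuming the ratio is treated expression divided by control expression (the text does not state the denominator) and that missing or non-positive values are left empty rather than imputed; the function names, gene names, and values are hypothetical.

```python
import math


def log2_ratio(treated, control):
    """Return log2(treated / control), or None if either value is missing or not positive."""
    if treated is None or control is None or treated <= 0 or control <= 0:
        return None
    return math.log2(treated / control)


def log2_table(treated_by_gene, control_by_gene):
    """Apply log2_ratio gene by gene; genes missing from either table map to None."""
    genes = sorted(set(treated_by_gene) | set(control_by_gene))
    return {g: log2_ratio(treated_by_gene.get(g), control_by_gene.get(g)) for g in genes}


if __name__ == "__main__":
    # Toy expression values (hypothetical): geneC has no treated measurement.
    treated = {"geneA": 2.0, "geneB": 12.0}
    control = {"geneA": 8.0, "geneB": 3.0, "geneC": 5.0}
    print(log2_table(treated, control))
```

Returning None instead of a number keeps genes without data visibly blank, in line with the note above that some SlCrRLK1L genes had no available data.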

Accession Number
The detailed accession number can be found in Table S9.

Conflicts of Interest:
The authors declare no conflict of interest.