Leucine-Rich Repeat (LRR) Domains Containing Intervening Motifs in Plants

Leucine-rich repeats (LRRs) are present in over 14,000 proteins. Non-LRR island regions (IRs) interrupting LRRs are widely distributed. The present article reviews 19 families of LRR proteins having non-LRR IRs (LRR@IR proteins) from various plant species. The LRR@IR proteins are LRR-containing receptor-like kinases (LRR-RLKs), LRR-containing receptor-like proteins (LRR-RLPs), TONSOKU/BRUSHY1, and MJK13.7; the LRR-RLKs are homologs of TMK1/Rhg4, BRI1, PSKR, PSYR1, Arabidopsis At1g74360, and RPK2, while the LRR-RLPs are those of Cf-9/Cf-4, Cf-2/Cf-5, Ve, HcrVf, RPP27, EIX1, clavata 2, fasciated ear2, RLP2, rice Os10g0479700, and a putative soybean disease resistance protein. The LRRs are intersected by a single non-LRR IR; only the RPK2 homologs have two IRs. In most of the LRR-RLKs and LRR-RLPs, the number of repeat units in the preceding LRR block (N1) is greater than the number in the following block (N2); that is, N1 » N2, where N1 is variable among the homologs of individual families while N2 is highly conserved. The five families of the LRR-RLKs other than the RPK2 family show N1 = 8-18 and N2 = 3-5. Nine families of the LRR-RLPs show N1 = 12-33 and N2 = 4, while N1 = 6 and N2 = 4 for the rice Os10g0479700 family and N1 = 4-28 and N2 = 4 for the soybean protein family. The rule of N1 » N2 might play a common, significant role in ligand interaction, dimerization, and/or signal transduction of the LRR-RLKs and the LRR-RLPs. The structure and evolution of the LRR domains with non-LRR IRs and their proteins are also discussed.

be completely, or almost completely, consistent with those revealed by structure analyses [72]. Furthermore, to identify non-LRR IRs, a method (called LRR@IRpred) utilizing LRRpred was developed and used to find LRR@IR proteins from organisms other than plants [47]. The present article reviews 19 families of plant LRR@IR proteins identified by LRR@IRpred and describes some features of their LRR domains. The structure, function, and evolution of the LRR domains and of the LRR@IR proteins are discussed.

Structures of Plant LRR Proteins
All of the LRR domains in one protein form a single continuous structure and adopt an arc or horseshoe shape [73]. Three residues at positions 3 to 5 in the highly conserved segment (HCS), LxxLxLxxNxL or LxxLxLxxCxxL, form a short β-strand. On the inner, concave face there is a stack of parallel β-strands, and on the outer, convex face there are a variety of secondary structures such as α-helix, 3₁₀-helix, polyproline II helix, or a tandem arrangement of β-turns, which are connected by two loops. Most of the known LRR structures have caps, which shield the hydrophobic core of the first LRR unit at the N-terminus and/or the last unit at the C-terminus. In extracellular proteins or extracellular regions, the N-terminal and C-terminal caps frequently consist of Cys clusters including two or four Cys residues; the Cys clusters on the N- and C-terminal sides of the LRR arcs are called LRRNT and LRRCT, respectively [8][9][10].
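The two HCS consensus patterns quoted above can be written as a simple sequence scan. The following is a minimal sketch, not the LRRpred or LRR@IRpred method: it assumes that the "L" positions accept any of L, I, V, F, or M (an assumption of this sketch), and the function name `find_hcs` is hypothetical.

```python
import re

# Residues accepted at the conserved "L" positions (an assumption of this sketch).
HYDROPHOBIC = "LIVFM"

# LxxLxLxxNxL (11 residues) or LxxLxLxxCxxL (12 residues).
# The lookahead lets overlapping candidates be reported as well.
HCS_PATTERN = re.compile(
    rf"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}].[{HYDROPHOBIC}]..(?:N.|C..)[{HYDROPHOBIC}]))"
)


def find_hcs(sequence):
    """Return (1-based start, HCS segment, residues at HCS positions 3 to 5)."""
    hits = []
    for match in HCS_PATTERN.finditer(sequence.upper()):
        segment = match.group(1)
        # Positions 3 to 5 of the HCS are the ones described as forming the short beta-strand.
        hits.append((match.start() + 1, segment, segment[2:5]))
    return hits


if __name__ == "__main__":
    # One of the TLR1 repeats quoted later in this review, written without the space.
    for hit in find_hcs("IKVLDLHSNKIKSIPKQVVKLEA"):
        print(hit)
```

A pattern match of this kind only flags candidate HCS segments; it does not replace the profile-based identification used in the review.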
The crystal structures of the PS-LRR domains of Phaseolus vulgaris PGIP and A. thaliana BRI1 (an LRR@IR protein) have been determined [15][16][17]. The BRI1 LRR domain forms a right-handed superhelix composed of 25 PS-LRRs (Figure 1A) [16,17]; most of these 25 PS-LRRs are 24 residues long. The helix completes one full turn, with a rise of 70 Å. The concave surface is formed by α- and 3₁₀-helices that produce inner and outer diameters of 30 and 60 Å, respectively. The consensus sequence LxGx(I/L)P at positions 11 to 16 likely forms a second β-strand, which characterizes the fold of the PS-LRRs. Thus, the structural LRR units may be represented by β-β-3₁₀. BRI1 has both an LRRNT with Cx6C and an LRRCT with Cx6C; both the LRRNT and LRRCT form two disulfide bonds. The disulfide bonds contribute to the stability of the N-terminal cap structure (N-Cap), consisting of one β-strand and two α-helices, and the C-terminal cap structure (C-Cap), consisting of two short helices.
The crystal structures of the LRR domains of A. thaliana transport inhibitor response 1 (TIR1) and coronatine-insensitive protein 1 (COI1), which are F-box proteins, are also available [74][75][76]. TIR1 has 18 LRRs of various lengths (from 22 to 35 residues), of which 13 are noncanonical, imperfect LRRs and have long β-strands of 4-6 residues. Most variable segment (VS) parts adopt an α-helix. Thus, the structural LRR units may be represented by β-α. The TIR1 LRR domain forms a right-handed superhelix of one full turn, which is represented by one closed ring, as does the BRI1 LRR domain [74,75]. The top surface of the TIR1 superhelix has three long intra-repeat loops (loop-2 in LRR2, loop-12 in LRR12, and loop-14 in LRR14). Loop-2 plays a pivotal role in constructing the auxin- and substrate-binding surface pocket by interacting with the nearby concave surface of the TIR1 LRR structure. The COI1 LRR domain adopts a structure very similar to that of TIR1 [76]. Similarly, three long intra-repeat loops are involved in the binding of the hormone (jasmonate) and polypeptide substrates [76].
Figure 1 (coloring scheme). The LRRs are colored blue, the cap structures at the N-terminal and C-terminal sides orange, the non-LRR IR in BRI1 pink, and the disulfide bonds yellow. All figures were prepared with PyMOL.

Plant LRR@IR Proteins
Plant LRR@IR proteins found in previous research by Matsushima et al. [47] and by keyword searches of the references are described. Homologs of each protein family were collected from various plant species by the following procedure. First, LRRs in a representative LRR@IR protein of each family were identified by LRR@IRpred; the number of repeat units in the preceding LRR block (N1), the number in the following block (N2), and the non-LRR IR sequence of the LRR region were determined. Second, database searches using the amino acid sequences of the non-LRR IR together with one LRR unit on each of its N-terminal and C-terminal sides were performed by FASTA at the Bioinformatics Center, Institute for Chemical Research, Kyoto University on February 15, 2012.
Third, PS-LRR proteins with highly significant similarity (E-value < 10^−10) were identified and regarded as putative homologs, taking into account the amino acid sequence alignments of the full-length proteins and of the non-LRR IRs, as well as their domain architecture. Finally, LRRs in the homologs of each family were identified by LRR@IRpred. When a candidate region was not an LRR unit and was longer than the average length of the repeating unit of the LRRs, it was defined as a non-LRR IR.
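The two numerical criteria in this procedure (the E-value cutoff and the length rule that defines a non-LRR IR) can be sketched as follows. This is only an illustration under stated assumptions: the `Region` type, the function names, and the idea that regions arrive already labeled as LRR or non-LRR are hypothetical and are not part of LRR@IRpred.

```python
from dataclasses import dataclass
from statistics import mean

E_VALUE_CUTOFF = 1e-10  # "highly significant similarity" threshold stated in the text


@dataclass
class Region:
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    is_lrr: bool  # True if the region was identified as an LRR unit

    @property
    def length(self):
        return self.end - self.start + 1


def is_putative_homolog(e_value):
    """Apply only the E-value criterion; alignments and domain architecture are not modeled."""
    return e_value < E_VALUE_CUTOFF


def find_island_regions(regions):
    """Return non-LRR regions longer than the average LRR unit length."""
    lrr_lengths = [r.length for r in regions if r.is_lrr]
    if not lrr_lengths:
        return []
    average_unit_length = mean(lrr_lengths)
    return [r for r in regions if not r.is_lrr and r.length > average_unit_length]


if __name__ == "__main__":
    # Hypothetical layout: 24-residue LRR units, one 70-residue insert, one short linker.
    layout = [
        Region(1, 24, True),
        Region(25, 48, True),
        Region(49, 118, False),
        Region(119, 142, True),
        Region(143, 150, False),
    ]
    print(find_island_regions(layout))
```

In this hypothetical layout only the 70-residue insert qualifies as an IR; the 8-residue linker is shorter than the average repeat unit and is ignored.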
In total, 19 families comprising 344 LRR@IR proteins are described (Supplementary Table S1). The 19 families are grouped into LRR-RLKs, LRR-RLPs, and intracellular proteins. At least one protein in each family has clear experimental evidence for its existence or for the existence of a transcript (such as cDNA(s), RT-PCR, or Northern hybridization). TMHMM predicts that A. thaliana PSYR1 and RPP27 contain a transmembrane region at the N-terminal side (Supplementary Table S1). However, taking orthology or domain structure into account, these two proteins were regarded as an LRR-RLK and an LRR-RLP, respectively. SignalP predicts no signal peptide in A. thaliana At1g74360 or in the soybean putative disease resistance protein. Similarly, these proteins were regarded as an LRR-RLK and an LRR-RLP, respectively.
The present review cannot describe all families of LRR@IR proteins in plants, because the survey of LRRs having non-LRR IRs is limited to those detected by LRR@IRpred.

Six Families of LRR-RLKs
LRR-RLKs have an extracellular LRR region with an N-terminal signal peptide, a single transmembrane-spanning region, and an intracellular serine-threonine kinase region [18,19]. Transmembrane kinase 1 (TMK1), brassinosteroid insensitive 1 (BRI1), A. thaliana At1g74360 protein, phytosulfokine receptor (PSKR), tyrosine-sulfated glycopeptide receptor 1 (PSYR1), and LRR receptor-like serine/threonine-protein kinase RPK2 are members of the LRR-RLK family. The LRR-RLKs are LRR@IR proteins in which the LRRs are intersected by a single non-LRR IR; only RPK2 has two IRs (Figure 2 and Table 1, and Supplementary Table S1 and Figure S1).
Table 1. Nineteen families of plant LRR proteins having LRR domains intersected by non-LRR island regions.
a "N1" is the repeat number of LRRs of the first LRR block in the homologs of each family.
b "N2" is the repeat number of LRRs of the second LRR block in the homologs of each family.
c "N1/N2" is the average value.
d The LRR domain in Arabidopsis RPK2 contains two non-LRR IRs. The number "13" is the sum of the repeat numbers of LRRs of the first and second LRR blocks. The number "8" is the repeat number of the third LRR block.
The transcript concentration of O. sativa TMK1 increases in the rice internode in response to gibberellins [83]. Nicotiana tabacum TMK1 mRNA accumulation in leaves was stimulated by CaCl2, methyl jasmonate, wounding, fungal elicitors, chitins, and chitosan [84]. TMK1 orthologs were identified from 14 plant species and its paralogs are present in 10 species, including A. thaliana, Glycine max, and O. sativa (Figure 2 and Table 1, and Supplementary Table S1 and Figure S1). G. max Rhg4, a soybean cyst nematode resistance gene [85], was also identified as a TMK1 homolog, whereas G. max Rhg1 [C9VZY3] contains 13 PS-LRRs of 24 residues, of which only LRR6 is 29 residues long. The TMK1 homologs contain 13 LRRs interrupted by a 56- to 76-residue non-LRR IR.
The number of repeat units in the preceding LRR block (N1) is greater than the number in the following block (N2); that is, N1 » N2 with N1 = 10 and N2 = 3. The non-LRR IRs have a cluster of four Cys residues with the pattern of Cx6-7Cx29-30Cx6-11C and a conserved motif of Lx8Yx7-8WxG, where "Y" is Tyr or Phe, "W" is Trp, and "G" is Gly; this motif is similar to Yx8KG found in many LRR-RLPs [46]. An LRRNT (with Cx6C) is observed, but not an LRRCT. Putative C-Cap regions are rich in Gly, Ser, and Pro residues.
BRI1/SR160 is a receptor complex for brassinosteroids, which are necessary for plant development, including expression of light- and stress-regulated genes, promotion of cell elongation, normal leaf and chloroplast senescence, and flowering [86][87][88][89][90][91][92]. BRI1 orthologs were identified from 24 species and its paralogs are also present in 10 species. The BRI1/SR160 homologs contain 21-26 LRRs with a single non-LRR IR. The N1 value is relatively variable among species and is 10-22, while N2 = 4; N1 » N2 (Figure 2 and Table 1, and Supplementary Table S1 and Figure S1). A. thaliana BRI1 contains 25 LRRs interrupted by a 70-residue IR between LRR21 and LRR22. The non-LRR IR, together with LRR22, binds brassinosteroids [62]. The non-LRR IRs of the BRI1 homologs are 68-70 residues long and have a cysteine cluster of Cx25C.
PSKR is a PSK receptor that regulates, in response to PSK binding, a signaling cascade involved in plant cell differentiation, organogenesis, and somatic embryogenesis [55,63,93,94]. PSKR orthologs and paralogs were identified (Figure 2 and Table 1, and Supplementary Table S1 and Figure S1). The PSKR homologs contain LRRs with a 36- to 38-residue non-LRR IR; N1 = 17-18 and N2 = 4. The non-LRR IRs have a conserved motif of (Y/F)x5-12Yx5F. Most LRRCT regions are basic.
Daucus carota PSKR contains 22 LRRs intersected by a 36-residue IR between LRR17 and LRR18. An LRRNT (with Cx33CCx6C) that is similar to that in PGIP [15] and an LRRCT (with Cx6C) are observed. A 15-residue region within the non-LRR IR is a binding site of PSK [63]. The corresponding regions in the homologs are relatively variable.
A. thaliana At1g74360 is a BRI1-related protein (Figure 2 and Table 1, and Supplementary Table S1 and Figure S1). Putative orthologs and paralogs were identified from 10 species. The At1g74360 family contains 21-22 LRRs with a single IR. The N1 value is relatively conserved among species; N1 = 16-17, while N2 = 5 rather than 4. The 76-residue non-LRR IRs are longer than those in BRI1 and have a cysteine cluster with the pattern of Cx25Cx16C. The IRs are highly conserved among the homologs.
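The block counts N1 and N2 follow directly from where the IR sits among the repeat units. As a minimal sketch, the domain can be encoded as a string in which "L" stands for one LRR unit and "I" for one non-LRR IR; this encoding and the function name `lrr_block_sizes` are assumptions made for illustration, not a format used by LRR@IRpred.

```python
from itertools import groupby


def lrr_block_sizes(layout):
    """Return the number of LRR units in each block separated by IRs.

    layout: string of "L" (one LRR unit) and "I" (one non-LRR IR).
    """
    if set(layout) - {"L", "I"}:
        raise ValueError("layout may contain only 'L' and 'I'")
    return [len(list(group)) for key, group in groupby(layout) if key == "L"]


if __name__ == "__main__":
    # A. thaliana BRI1 as described above: 25 LRRs, IR between LRR21 and LRR22.
    bri1 = "L" * 21 + "I" + "L" * 4
    n1, n2 = lrr_block_sizes(bri1)
    print(n1, n2)
```

For the BRI1 layout this yields N1 = 21 and N2 = 4, in line with the stated range N1 = 10-22 and N2 = 4 for the BRI1 homologs. A layout with two IRs, as in the RPK2 family, would return three block sizes.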

Eleven Families of LRR-RLPs
LRR-RLPs have a short cytoplasmic tail instead of the kinase region in LRR-RLKs (Figure 3) [20]. LRR-RLPs are involved both in resistance in plant-pathogen interactions and in development [34,102]. Tomato Cf genes confer resistance to the fungal pathogen Cladosporium fulvum [43,56,103,104]. Tomato Verticillium wilt disease resistance genes Ve1 and Ve2, apple HcrVf2, and Arabidopsis RPP27 are involved in resistance to Verticillium, Venturia, and Peronospora, respectively [105][106][107]. Furthermore, tomato LeEIX initiates defense responses upon elicitation with a fungal ethylene-inducing xylanase (EIX) of non-pathogenic Trichoderma [108,109]. clavata2 (CLV2) functions in both shoot and root meristems of Arabidopsis [58,[110][111][112] and also affects autoregulation of nodulation in pea and Lotus japonicus [113,114]. Zea mays fasciated ear2 is involved in meristem development [59]. A. thaliana RLP2 is involved in the perception of CLV3 and CLV3-like peptides, which act as extracellular signals regulating meristem maintenance [64]. The LRR-RLPs are all LRR@IR proteins in which the LRRs are intersected by a single non-LRR IR (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1).
Tomato Cf-9/Cf-4 homologs were identified from six species. Elicitor-inducible LRR receptor-like protein (EILP) from N. tabacum [115] was identified as an ortholog of tomato Cf-9/Cf-4. N1 is 18 to 22, while N2 remains 4, and the non-LRR IRs are 40-44 residues long (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1) and have a conserved motif of MKx3Ex6Yx5-8Yx7TKG in which hydrophilic residues are conserved. The EILP protein also contains 27 LRRs with N1 = 23 and N2 = 4. Most of the homologs have an LRRNT consisting of six Cys residues with the pattern of Cx24-29Cx13-23CCx6Cx12-13C.
However, peru 1 and peru 2 have an LRRNT of four Cys residues with the pattern of Cx47CCx6C [116]. The C-terminal side of the LRRCT is rich in Glu and Asp residues and thus is acidic.
Tomato Cf-2/Cf-5 homologs were identified from two species (Lycopersicon esculentum and L. pimpinellifolium). N1 is highly variable (N1 = 20-33), while N2 remains 4, and the non-LRR IRs are 37-41 residues long. The IRs are hydrophilic. The variability of N1 among both paralogs and orthologs has been reported by other researchers [43,46,103,104] (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). Interestingly, the N-terminal LRRs include tandem repeats of a super-motif of two highly conserved LRRs; for example, LxxLxLxxNxLSGxIPxxIGYLRS and LxxLxLSxNxLNGxIPxxFGxLxN in currant tomato Cf-2.1 [103].
Apple HcrVfs (Homologs of Cladosporium fulvum resistance genes of Vf region) are scab resistance genes [119,120]. Mentha longifolia HcrVfs are orthologs of tomato Ve genes [105,117,118]. The HcrVfs paralogs contain 32-34 LRRs interrupted by a 41- to 46-residue non-LRR IR with N1 = 22-28 and N2 = 4 (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). The non-LRR IRs have a conserved motif of VTKGxExEYx(K/E)ILxFxKxxDLSCNF in which hydrophilic residues are conserved. The C-terminal side of the LRRCT is rich in Gly and Pro residues.
A. thaliana RPP27 homologs were also identified from A. lyrata. The LRR@IR proteins contain 16-30 LRRs interrupted by a 61- to 71-residue non-LRR IR with N1 = 11-26 and N2 = 4 (Figure 2 and Table 1, and Supplementary Table S1 and Figure S1). The IRs have a conserved motif of FxxKxRYD. The C-terminal side of most LRRCT regions is acidic.
Tomato LeEIX1 and LeEIX2 contain 31 LRRs interrupted by a 47- to 49-residue non-LRR IR with N1 = 27 and N2 = 4 (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). The C-terminal side of the LRRCT is acidic.
A. thaliana CLV2 homologs were identified from 11 species. The CLV2 homologous proteins contain 22 LRRs interrupted by a 41- to 43-residue non-LRR IR with N1 = 18 and N2 = 4 (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). The IRs have a conserved motif of LxFxYxL. The C-terminal side of most LRRCT regions is acidic. A. thaliana CLV1 is an LRR-RLK but not an LRR@IR protein.
Z. mays fasciated ear2 is an ortholog of Arabidopsis CLV2. Homologs were also identified from O. sativa subsp. japonica and indica, and from S. bicolor. The fasciated ear2 homologous proteins contain 17-18 LRRs interrupted by a 41- to 42-residue IR with N1 = 10-14 and N2 = 4 (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). The IRs and the LRRCT regions are rich in Gly. Both regions may be flexible.
A. thaliana RLP2 contains 23 LRRs that are interrupted by a 44-residue IR with N1 = 18 and N2 = 4 (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). There are an LRRNT and an LRRCT. The extracellular region including the 23 LRRs is homologous to that in A. thaliana PSYR1 [121].
O. sativa Os10g0469700 is an LRR@IR protein; its function is unknown (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). The homologs from four species contain 10 LRRs with a single IR with N1 = 6 and N2 = 4. The non-LRR IRs of 39-40 residues are represented by MKxP(K/E)IxSSx2-3LDGSxYQDRIDIxWKGx3FQx4L.
A putative disease resistance protein from soybean [C6ZS07] is an LRR@IR protein (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). The homologs were identified from four species and contain 8-32 LRRs with a single IR with N1 = 4-28 and N2 = 4. N1 is highly variable in both the paralogs and orthologs. The IRs have a conserved motif of Yx2Sx5Kx7(R/K)I.

Two Families of Plant Intracellular Proteins
A. thaliana TONSOKU (TSK)/MGOUN3 (MGO3)/BRUSHY1 (BRU1), which is localized in the nucleus and is expressed more strongly in the shoot apex than in the leaves and stems, is required for cell arrangement in root and shoot apical meristems and is involved in structural and functional stabilization of chromatin [122][123][124]. The TONSOKU protein may represent a link between response to DNA damage and epigenetic gene silencing [125].
Potential homologs of A. thaliana TONSOKU have been identified in eight species. The UniProtKB database describes A. thaliana TONSOKU as containing three LRRs and eight TPRs, while the databases InterPro, Gene3D, SMART, and PROSITE identify only TPRs. LRR@IRpred identifies 14 LRRs with a single IR; N1 = 13, N2 = 1 (Figure 4 and Table 1, and Supplementary Table S1 and Figure S1) [47]. The LRRs are not "plant-specific" motifs but presumably "RI-like" motifs. Thus, the structural LRR units may be represented by β-α instead of β-β-3₁₀. The LRR domain is predicted to adopt a typical horseshoe shape seen in ribonuclease inhibitor [126]. The non-LRR IRs are 70-131 residues long and are rich in Ser and Gly. The IRs may be unstructured or flexible.
A. thaliana MJK13.7 is considered to be an intracellular protein; its function is unknown. A. thaliana MJK13.7 homologs were identified from 11 species. The homologs contain 20 LRRs intersected by a single IR; N1 = 12, N2 = 8 (Figure 4 and Table 1, and Supplementary Table S1 and Figure S1). All of the non-LRR IRs are 60-62 residues long and have conserved Lys residues at five positions. The consensus of the LRRs is LxxLxLxxNxLxxLPxxLxxLxx of 23 residues, a motif that is present in many proteins from bacteria to humans (data not shown). The LRR motif does not belong to the PS-LRR class, and the structure of the LRR domain is not available. However, such LRR motifs are contained in part of the LRR domains of toll-like receptor 1 (TLR1) and glycoprotein Ibα (GpIbα), for which crystal structures are available [127][128][129][130]. Four such LRRs are IKVLDLHSNKI KSIPKQVVKLEA and LQELNVASNQL KSVPDGIFDRLTS in TLR1, and LGTLDLSHNQL QSLPLLGQTLPA and LDTLLLQENSL YTIPKGFFGSHL in GpIbα. The structures revealed that the LRRs may be characterized by extended conformations at the bold sequences [127][128][129][130]. Moreover, A. thaliana MJK13.7 forms a family with its homologs from insect species, Strongylocentrotus purpuratus, Nematostella vectensis, and Paramecium tetraurelia, and with LRRC40 from vertebrate species [47]. The S. purpuratus protein has 163 residues containing two repeats of 64 residues each [47].

An NBS-LRR Protein
Rice blast resistance gene Pi-ta encodes an NBS-LRR protein of 928 residues [44,45]. The Pi-ta protein [Q9AY26] lacks a canonical LRR [44]. The C-terminal region contains highly imperfect LRRs with 10 repeats of various lengths (from 16 to 75 residues) based on the consensus LxxLxxL. The Pi-ta protein appears to be an LRR@IR protein. LRR@IRpred predicts 13 LRRs of 20-54 residues with one non-LRR IR between LRR6 and LRR7 (Supplementary Figure S1). Secondary structure prediction favors α-helix in the VS parts. The Pi-ta LRR domain might adopt a structure similar to those of TIR1 and COI1 [74][75][76].
The non-LRR IRs in plant LRR@IR proteins may be classified into two groups: those having cysteine clusters and those without. The IR cysteine clusters are characterized by Cx6-7Cx29-30Cx7-11C in A. thaliana TMK1 homologs, Cx25C in BRI1 homologs, and Cx25Cx16C in At1g74360 homologs. The other non-LRR IRs frequently have a conserved motif of Yx8KG, which is observed in the homologs of A. thaliana TMK1, tomato Cf-9/Cf-4, tomato Cf-2/Cf-5, tomato Ve, M. longifolia HcrVf, A. thaliana CLV2, Z. mays fasciated ear2, and O. sativa Os10g0469700. Non-LRR IRs in many LRR-RLPs from Arabidopsis and rice also contain the Yx8KG motif [46].
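The shorthand used for these motifs (for example, "x25" for 25 arbitrary residues and "x6-7" for 6 to 7) maps naturally onto regular expressions. The sketch below converts the shorthand and then checks an IR sequence against the three cysteine-cluster patterns exactly as written in the preceding paragraph. The function names and the test sequences are hypothetical, and a pattern match alone is not evidence of homology.

```python
import re

# Shorthand tokens: "x" followed by a count or a range, a bare "x", or a residue letter.
_TOKEN = re.compile(r"x(\d+)(?:-(\d+))?|x|[A-Z]")


def motif_to_regex(motif):
    """Convert shorthand such as 'Cx6-7Cx29-30Cx7-11C' into a compiled regex."""
    parts = []
    pos = 0
    while pos < len(motif):
        match = _TOKEN.match(motif, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {motif[pos]!r}")
        token = match.group(0)
        if token.startswith("x"):
            if match.group(1):
                low = match.group(1)
                high = match.group(2) or low
                parts.append(f".{{{low},{high}}}")
            else:
                parts.append(".")
        else:
            parts.append(token)
        pos = match.end()
    return re.compile("".join(parts))


# Longer patterns are listed first because Cx25Cx16C also contains Cx25C.
CYS_CLUSTERS = [
    ("TMK1-type", "Cx6-7Cx29-30Cx7-11C"),
    ("At1g74360-type", "Cx25Cx16C"),
    ("BRI1-type", "Cx25C"),
]


def classify_ir(ir_sequence):
    """Return the first matching cysteine-cluster label, or None if no cluster matches."""
    for label, motif in CYS_CLUSTERS:
        if motif_to_regex(motif).search(ir_sequence.upper()):
            return label
    return None


if __name__ == "__main__":
    synthetic_ir = "GGG" + "C" + "A" * 25 + "C" + "GGGGG"  # made-up sequence
    print(classify_ir(synthetic_ir))
```

The same converter can be applied to non-cysteine motifs such as Yx8KG by calling `motif_to_regex("Yx8KG")`.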
Most of the LRRNTs consist of two, four, or six Cys residues, with the patterns Cx6-7C, Cx23-34CCx6C, and Cx24-29Cx13-23CCx6Cx12-13C. They probably form one, two, and three disulfide bonds, respectively. The LRRCTs consist of two Cys residues with the pattern of Cx4-29C, which probably form one disulfide bond (Supplementary Table S1 and Figure S1). The disulfide bonds should contribute to the structural stabilization of the N-terminal and C-terminal caps.

Possible Structures
The structure of a non-LRR IR is available for A. thaliana BRI1 (Figure 1A). The BRI1 LRR domain forms a superhelix with 25 LRRs. The 70-residue non-LRR IR in BRI1 between LRR21 and LRR22 forms a small domain that folds back into the interior of the superhelix, where it makes extensive polar and hydrophobic interactions with LRRs 13-25 [16,17]. The fold of this island domain is characterized by an anti-parallel β-sheet, which is sandwiched between the LRR core and a 3₁₀-helix and stabilized by a disulfide bridge of the Cys cluster with Cx25C in the non-LRR IR. Cys clusters are also present in non-LRR IRs in the homologs of TMK1, At1g74360, and TONSOKU. Thus, these non-LRR IRs may adopt similar structures with disulfide bridges. All of the non-LRR IRs would fold back into the interior or exterior of a superhelix of the LRR domains.

Possible Function(s)
The non-LRR IRs of BRI1 and PSKR participate in ligand/protein-protein interactions. The BRI1 non-LRR IR binds brassinosteroids [62]. The insertion of a folded domain into the LRR repeat is probably an adaptation to the challenge of sensing a small steroid ligand [16]. The PSKR non-LRR IR also binds PSK [63]. The non-LRR IRs in TLRs 7, 8, and 9 were also predicted to contribute to nucleic acid-protein interaction [66,132].
The non-LRR IRs in plant LRR@IR proteins frequently have conserved motifs that are characterized by hydrophilic residues such as Lys, Arg, Glu, and Asp, as noted. Some non-LRR IRs are presumably flexible. The conservation of hydrophilic residues in the IRs is also observed in the respective families of LRRC40, LRRC9, and C. elegans LRK-1, which are LRR@IR proteins from organisms other than plants, including vertebrates [47]. The IRs might contribute to ligand/protein-protein interactions [47]. Moreover, Afzal et al. [133] suggested, based on circular dichroism data, that non-LRR IRs are intrinsically unstructured, providing binding diversity to the domains.
Drosophila Toll and vertebrate TLRs 7, 8, and 9 are LRR@IR proteins [65][66][67] that contain a single transmembrane-spanning region, as do LRR-RLKs and LRR-RLPs from plants. Homo- or heterodimerization is involved in the ligand interactions of vertebrate TLRs [68][69][70][71]. A model for Drosophila Toll activation by the ligand Spatzle has been proposed; the first LRR block interacts with Spatzle and the second LRR block forms strong dimer contacts that are prevented by the first block, which in the absence of ligand provides a steric constraint [67,131]. BRI1 receptor activation involves homodimerization [139], although Hothorn et al. [16] suggested that the superhelical BRI1 LRR domain alone has no tendency to oligomerize, indicating that BRI1 receptor activation may not be mediated by ligand-induced homodimerization of the ectodomain.
Taken together, non-LRR IRs in plant LRR@IR proteins might participate in ligand/protein interactions, dimerization, or both, although an LRR-RLP, A. thaliana CLV2, remains functional without its non-LRR IR, while the first and the second LRR blocks are essential for functionality [64]. N1 » N2 brings the non-LRR IRs, which may interact with ligand/protein, into close proximity to the transmembrane region. N1 » N2 might thereby facilitate signaling in the cytoplasm through the ligand/protein interactions.
There is a possibility that Cys residues in LRRs are involved in dimerization of LRR@IR proteins. The conserved hydrophobic residues of the PS-LRR consensus sequence of LxxLxLxxNxLSGxIPxxLxxLxx at positions 1, 4, 6, 11, 15, 19, and 22 contribute to the hydrophobic cores in the LRR arcs [8,9]. The conserved hydrophobic residues at positions 1, 19 and 22, and "N" at position 9, are frequently occupied by Cys in the PS-LRRs. Moreover, Cys residues are frequently observed in noncanonical PS-LRRs which, as examples, are longer LRR motifs of 25-30 residues with the consensus of LxxLxLxxNxLSGxIPxxLCxxxxx(x/-)(x/-)(x/-)(x/-)(x/-), in which "-" indicates a possible deletion site. At the present stage it remains unknown whether the Cys residues contribute to the hydrophobic core of the LRR arcs or are exposed to solvent. However, some LRR@IR proteins contain PS-LRRs having Cys at positions 2, 3, or 5 in the HCS part (Supplementary Table S1). The Cys residues are likely to be exposed to solvent in the LRR arc and thus might induce dimerization.
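The position-based reasoning above can be illustrated with a small helper that locates Cys residues within a single 24-residue PS-LRR unit and labels them by position. This is a sketch under stated assumptions: the unit is taken to be already aligned to the consensus LxxLxLxxNxLSGxIPxxLxxLxx without insertions or deletions, the labels are informal, and the example unit is a made-up sequence rather than one from Supplementary Table S1.

```python
PS_LRR_CONSENSUS = "LxxLxLxxNxLSGxIPxxLxxLxx"

# Conserved hydrophobic positions that contribute to the hydrophobic core (from the text).
CORE_POSITIONS = {1, 4, 6, 11, 15, 19, 22}
# HCS positions at which Cys is suggested to be solvent-exposed (from the text).
HCS_EXPOSED_POSITIONS = {2, 3, 5}
# Position of the conserved "N" in the consensus.
ASN_POSITION = 9


def classify_cys(unit):
    """Map each Cys position (1-based) in an aligned 24-residue unit to a label."""
    if len(unit) != len(PS_LRR_CONSENSUS):
        raise ValueError("unit must be aligned to the 24-residue consensus")
    labels = {}
    for position, residue in enumerate(unit.upper(), start=1):
        if residue != "C":
            continue
        if position in CORE_POSITIONS:
            labels[position] = "core position"
        elif position in HCS_EXPOSED_POSITIONS:
            labels[position] = "HCS position, possibly exposed"
        elif position == ASN_POSITION:
            labels[position] = "conserved N position"
        else:
            labels[position] = "other"
    return labels


if __name__ == "__main__":
    hypothetical_unit = "LSCLDLSNNKLSGEIPSSLGNCTS"  # made-up PS-LRR-like unit
    print(classify_cys(hypothetical_unit))
```

Such labels only restate the positional argument; whether a given Cys is buried or exposed would need to be confirmed by structural data.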

Implications for Evolution
What is the evolutionary origin of non-LRR IRs interrupting LRRs? Previous research provided evidence that a direct duplication of super-motifs containing non-LRR regions naturally leads to the occurrence of non-LRR IRs in LRR@IR proteins from eukaryotes other than plants, including LRR-containing protein 17 (LRRC17), LRRC32, LRRC33, chondroadherin-like protein, trophoblast glycoprotein precursor, and Leishmania proteophosphoglycans [47]. The non-LRR IRs in plant LRR@IR proteins might originate from similar events.

Evolution of Plant LRR@IR Proteins
A large number of LRR-RLPs resembling the extracellular domains of LRR-RLKs are found in the Arabidopsis genome, although not all RLK subfamilies have corresponding RLPs [121]. Indeed, the present analysis indicates that the extracellular domain in PSYR1 is highly similar to that in RLP2. The same distributions also occur in LRR@IR proteins from other plants, such as S. bicolor and O. sativa (Supplementary Figure S2). Four example pairs (LRR-RLK/LRR-RLP) are described here. The first two are Sb10g028170/Sb10g028210 and Os06g0691800/Os06g0692700; all four proteins contain 22 LRRs intersected by a single non-LRR IR of 33 residues with N1 = 18 and N2 = 4. The other two are Os07g0597200/Os03g0400850 and OsI_26735/OsI_11946; the LRR-RLKs Os07g0597200 and OsI_26735 are homologs of Arabidopsis At1g74360. Pairwise comparisons show amino acid sequence identities exceeding 50% within the respective pairs. The above observations indicate that the LRR-RLKs and LRR-RLPs evolved from gene duplications and recombination [39].
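The pairwise identity figure quoted above depends on how identity is counted. One common convention, assumed here, divides the number of identical residues by the number of aligned columns in which neither sequence has a gap; the review does not state which convention was used. The function name and the example alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over gap-free columns of two pre-aligned sequences.

    Both inputs must come from the same pairwise alignment, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a.upper(), aligned_b.upper()):
        if residue_a == "-" or residue_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Made-up aligned fragments, not taken from the proteins named in the text.
    identity = percent_identity("LKSLDL-SNN", "LRSLDLSSNK")
    print(f"{identity:.1f}% identity; above 50%: {identity > 50.0}")
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence length) give lower values for the same alignment, so the denominator should be reported alongside any threshold.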

Conclusions
Most plant LRR@IR proteins have LRRs intersected by a single IR with N1 » N2, in which N1 is variable among their individual homologs while N2 is highly conserved. For all known LRR-RLPs, N2 = 4. The rule of N1 » N2 might play a common, significant role in ligand interaction, dimerization, and/or signal transduction of the LRR-RLKs and the LRR-RLPs. All of the LRR domains consisting of PS-LRRs are predicted to form a superhelix, and the non-LRR IRs in plant LRR@IR proteins are predicted to fold back into the interior or exterior of the superhelix. The present analyses suggest that some LRR-RLKs and LRR-RLPs evolved from gene duplications and recombination. The present review may stimulate various experimental studies to understand the structure and evolution of the LRR domains with non-LRR IRs and their proteins.