Exploration of Autophagy Families in Legumes and Dissection of the ATG18 Family with a Special Focus on Phaseolus vulgaris

Macroautophagy/autophagy is a fundamental catabolic pathway that maintains cellular homeostasis in eukaryotic cells by forming double-membrane-bound vesicles named autophagosomes. The autophagy family genes remain largely unexplored except in some model organisms. Legumes are a large family of economically important crops, and knowledge of their important cellular processes is essential. Here, to first address the knowledge gaps, we identified 17 ATG families in Phaseolus vulgaris, Medicago truncatula and Glycine max based on Arabidopsis sequences and elucidated their phylogenetic relationships. Second, we dissected ATG18 in subfamilies from early plant lineages, chlorophytes to higher plants, legumes, which included a total of 27 photosynthetic organisms. Third, we focused on the ATG18 family in P. vulgaris to understand the protein structure and developed a 3D model for PvATG18b. Our results identified ATG homologs in the chosen legumes and differential expression data revealed the nitrate-responsive nature of ATG genes. A multidimensional scaling analysis of 280 protein sequences from 27 photosynthetic organisms classified ATG18 homologs into three subfamilies that were not based on the BCAS3 domain alone. The domain structure, protein motifs (FRRG) and the stable folding conformation structure of PvATG18b revealing the possible lipid-binding sites and transmembrane helices led us to propose PvATG18b as the functional homolog of AtATG18b. The findings of this study contribute to an in-depth understanding of the autophagy process in legumes and improve our knowledge of ATG18 subfamilies.


Introduction
Autophagy is a degradation process essential in the maintenance of homeostasis in eukaryotic cells and is related to a wide variety of physiological and pathophysiological roles, such as host defense, development, infection, and tumorigenesis [1,2]. Autophagy/macroautophagy is a process in which cytosolic components are sequestered within double-membrane vesicles called autophagosomes, which fuse with lysosomes or vacuoles for degradation/recycling [3]. This process is mediated by evolutionarily conserved genes known as autophagy genes (ATGs) [4], which were originally discovered in and isolated from Saccharomyces cerevisiae [5][6][7][8]. Three major intracellular autophagy pathways, namely, macroautophagy, microautophagy and chaperone-mediated autophagy (CMA), have been elucidated, and these differ in the mode of cargo delivery to the lysosome or vacuole [9,10]. Macroautophagy can be nonselective or selective: Nonselective autophagy is a cellular response to nutrient deprivation that involves the random uptake of Recent studies on ATG genes conducted by Norizuki and colleagues (2019) [51] have shown the diversification of ATGs from early plant lineages to higher plants. However, legumes are a large and economically important family of flowering plants, and few studies have investigated autophagy-related aspects. The aim of the present study was to expand the previous studies to higher clades, specifically to fabaceous plants, and thus understand the current diversity and complexity of ATGs. Furthermore, we focused on the ATG18 family to understand its evolutionary relationships, diversification, expression patterns and cis-regulatory elements in many plants ranging from early plant lineages to fabaceous members. We also performed a comprehensive study of various functional and structural aspects of ATG18b in P. vulgaris.

Identification of ATG families in P. vulgaris, M. truncatula and G. max
In A. thaliana, a total of 39 ATG sequences divided into 17 families have been reported. In the present study, we identified a total of 32 genes in P. vulgaris (2n), 39 genes in M. truncatula (2n) and 61 genes in G. max (4x) ( Table 1). A BLAST analysis of Arabidopsis sequences returned 19 (59.37%) homologs in P. vulgaris, 28 (77.77%) homologs in M. truncatula and 30 (48.38%) homologs in G. max with a query coverage of 93-94% and 66-77% identity (Supplementary Information SI1). For this reason, other ortholog analysis databases were used to identify any missing ATG members. The KEGG orthology table for the autophagy pathway was the second main tool because it contains a wide variety of species, and we used this table to obtain more than 70% of genes in P. vulgaris and M. truncatula and 58% in G. max. An analysis of legumes using Ensembl Plants provided more than 70% of ATGs in the legumes under study. Other studies were performed through a HMMER analysis using Ensembl databases and the InParanoid tools in Phytozome. The obtained sequences were verified using Pfam to acquire the positions of the families, domains and repeats, and the protein motifs were determined with MEME. Additional studies were performed using EggNOG, which provided a list of orthologs, particularly in P. vulgaris (Supplementary Figure S1). We also identified 21, 17 and 15 orthologs and 10, 17 and 21 paralogs in P. vulgaris, M. truncatula and G. max, respectively. The genes identified in P. vulgaris, M. truncatula and G. max are listed in Table S1. To understand the evolutionary relationships among ATGs, we generated 17 phylogenetic trees, one for each ATG family in A. thaliana, P. vulgaris, M. truncatula and G. max as per the classification in A. thaliana. The primary protein sequences of A. thaliana, P. vulgaris, M. truncatula and G. max were aligned using Clustal Omega with the default parameters, and phylogenetic trees were obtained with the neighbor-joining method. Each of the ATG sequences was also subjected to a motif analysis, which revealed that the sequences and motifs in all the studied legumes showed high identity to their homologs in Arabidopsis. The phylogenetic tree also revealed that the majority of the ATG family distributions was predominantly composed of Medicago sequences that were more closely related to those in Arabidopsis. Among all the phylogenetic trees of ATGs developed, 11 contained only one clade (ATG2, ATG3, ATG4, ATG5, ATG6, ATG7, ATG10, ATG11, ATG12, ATG14 and ATG101), even if there was more than one isoform, and most of the motif P-values were greater than 1e-100. ATG8 and ATG18 were the families with the highest number of members: ATG18, eight each in Arabidopsis, Medicago and Phaseolus and 19 in G. max; ATG8, nine in Arabidopsis, eight in Medicago, six in P. vulgaris and 10 in G. max. The phylogenetic analysis of ATG8 and ATG18 was divided into three clades with motif P-values between 1 × 10 −13 and 1 × 10 −90 ( Figure 1). The close association of the homologs in all the species studied depicts the conservation of sequences and hence implies biological function conservation.
The chromosome localization of ATGs in the A. thaliana and legume genomes was mapped using Circos ( Figure 2). The distribution of ATG homologs among the chromosomes was uneven in all the species compared. Among all 17 families, the maximal number of homologs was located on chromosome 3 in A. thaliana (8) and P. vulgaris (6), chromosome 4 in M. truncatula (6) and chromosomes 4 and 17 in G. max (6).
The Ka/Ks ratio among most of the ATG sequences was lower than 1 (average 0.17), which indicates purifying selection; in contrast, the sequences of ATG8 (1.24) and two sequences (GmATG18e and GmATG18b. I) of ATG18 (1.09 and 1.04) in G. max had values higher than 1, which indicated accelerated evolution and positive selection ( Figure 3). The Ka/Ks ratios suggest the conservation of ATG homologs in terms of both sequence and biological function.

Promoter Analysis and Expression Profiling of ATG Families
Promoter analysis is an important method for understanding the regulatory mechanisms governing ATGs in response to growth and developmental issues and to environmental cues. The analysis of cis-acting elements in the promoters of all 17 ATG families resulted in 44 different transcription factors. The most abundant transcription factors identified were B-Proto-Oncogene-MYB involved in the ABA response and C-Proto-Oncogene-MYC related to jasmonate signaling, and the transcription factors with the motifs ethylene response elements (ERE), TATA box, CAATT-box and G-box were found for all ATGs in A. thaliana, P. vulgaris, M. truncatula and G. max (Supplementary Figure S5). Our results also showed that the ATG8 and ATG18 families contained the highest numbers of MYB, MYC, ERE and Box 4 (ATTAAT) transcription factor-binding sites. Most of the promoters contained MeJA-, SA-, GA-and ABA-responsive elements. Furthermore, light-responsive transcription factors such as BOX-4, G-box, GT1 motif, MRE and ACE were also detected abundantly in most of the families (Figure 4). The phylogenetic tree was constructed with the neighbor-joining method with 1000 repeated bootstrap tests, p-distance and pairwise deletion in MEGA X software and visualized using EvolView. MEME was used to identify motifs of the ATG homologs in A. thaliana, P. vulgaris, M. truncatula and G. max.
The chromosome localization of ATGs in the A. thaliana and legume genomes was mapped using Circos ( Figure 2). The distribution of ATG homologs among the chromo- The phylogenetic tree was constructed with the neighbor-joining method with 1000 repeated bootstrap tests, p-distance and pairwise deletion in MEGA X software and visualized using EvolView. MEME was used to identify motifs of the ATG homologs in A. thaliana, P. vulgaris, M. truncatula and G. max. somes was uneven in all the species compared. Among all 17 families, the maximal number of homologs was located on chromosome 3 in A. thaliana (8) and P. vulgaris (6), chromosome 4 in M. truncatula (6) and chromosomes 4 and 17 in G. max (6).
The chromosomal localization, synteny relationship and gene expression of autophagy genes were integrated into the Circos plot designed using OmicCircos. The outermost circle shows the A. thaliana (blue), P. vulgaris (green), M. truncatula (pink) and G. max (brown) chromosomes. The inner circle is a heatmap that shows the log2 RPKM values of gene expression in leaves and roots under ammonia, nitrate and urea treatments. The innermost line is the synteny of autophagy genes, but the yellow, purple and red lines represent ATG18b subfamilies I, II and III, respectively.
The Ka/Ks ratio among most of the ATG sequences was lower than 1 (average 0.17), which indicates purifying selection; in contrast, the sequences of ATG8 (1.24) and two sequences (GmATG18e and GmATG18b. I) of ATG18 (1.09 and 1.04) in G. max had values higher than 1, which indicated accelerated evolution and positive selection ( Figure 3). The Ka/Ks ratios suggest the conservation of ATG homologs in terms of both sequence and biological function. The chromosomal localization, synteny relationship and gene expression of autophagy genes were integrated into the Circos plot designed using OmicCircos. The outermost circle shows the A. thaliana (blue), P. vulgaris (green), M. truncatula (pink) and G. max (brown) chromosomes. The inner circle is a heatmap that shows the log 2 RPKM values of gene expression in leaves and roots under ammonia, nitrate and urea treatments. The innermost line is the synteny of autophagy genes, but the yellow, purple and red lines represent ATG18b subfamilies I, II and III, respectively.
Interestingly, we elucidated the influence of nitrogen sources on ATG expression in the legume members P. vulgaris, M. truncatula and G. max due to their ability to establish symbiotic associations with nitrogen-fixing Rhizobia. Gene expression data from the Phytozome database were retrieved for leaf and root tissues under urea as the organic source and nitrate and ammonia as inorganic sources, as depicted in Figure 2. The highest expression of ATGs was recorded in roots treated with ammonia and leaves treated with urea. ATG8i and ATG3 showed the highest abundance in all the treatments, and the lowest expression levels were recorded for ATG18b, e, c and h, ATG2 and ATG2.II in G. max and ATG3 and ATG8c in M. truncatula. The ATG18 family homologs ATG18a.II, ATG18g and ATG18h showed induced expression in all tissues under all treatments (Figures 2 and 5a).

Promoter Analysis and Expression Profiling of ATG Families
Promoter analysis is an important method for understanding the regulatory mechanisms governing ATGs in response to growth and developmental issues and to environmental cues. The analysis of cis-acting elements in the promoters of all 17 ATG families resulted in 44 different transcription factors. The most abundant transcription factors identified were B-Proto-Oncogene-MYB involved in the ABA response and C-Proto-Oncogene-MYC related to jasmonate signaling, and the transcription factors with the motifs ethylene response elements (ERE), TATA box, CAATT-box and G-box were found for all ATGs in A. thaliana, P. vulgaris, M. truncatula and G. max (Supplementary Figure S5). Our results also showed that the ATG8 and ATG18 families contained the highest numbers of MYB, MYC, ERE and Box 4 (ATTAAT) transcription factor-binding sites. Most of the promoters contained MeJA-, SA-, GA-and ABA-responsive elements. Furthermore, light-responsive transcription factors such as BOX-4, G-box, GT1 motif, MRE and ACE were also detected abundantly in most of the families (Figure 4).  Interestingly, we elucidated the influence of nitrogen sources on ATG expression in the legume members P. vulgaris, M. truncatula and G. max due to their ability to establish symbiotic associations with nitrogen-fixing Rhizobia. Gene expression data from the Phy-

Identification of ATG18 Families in Plants
Through an extensive study aiming to identify and analyze the ATG18 family, w a b Figure 5. Expression profiles of ATGs in P. vulgaris tissues. (a) The transcription abundances of P. vulgaris ATGs in different tissues and organs during different stages of development and during rhizobial infections obtained from the PvGEA database. (b) Expression data from nodulated roots (R. tropici) and mycorrhized roots (R. irregularis) obtained from RNA-seq analysis. A violin plot shows total number of up/dowregulated ATGs under nodulated/mycorrhized conditions. The highlighted box represents higher number of downregulated genes in mycorrhized condition. Furthermore, the differential expression analysis of ATGs in P. vulgaris tissues showed very low expression in young pods collected 1 to 4 days post floral senescence, whereas the fix-(inefficient) nodules collected at 21 days showed the most abundant expression of all ATGs. Interestingly, inefficient fixation increased the expression levels compared with those found with efficient fixation. Among all PvATGs, the ATG1, ATG10, ATG13b, ATG18c and ATG18g.I genes showed the lowest expression in all the analyzed tissues, and a total of 16 ATGs were found to be expressed in most of the tissues (Figure 5a; Supplementary Information SI2). Following the interesting observation of ATG expression in nodules, we analyzed the expression of ATGs using our previously generated RNA-seq data of Rhizobium/mycorrhiza-inoculated P. vulgaris roots. The results were interesting: Six ATGs were upregulated and 16 ATGs were downregulated in mycorrhized roots, and nine ATGs were upregulated and 12 ATGs were downregulated in nodulated roots (Figure 5b; Supplementary Information SI2). The expression of ATG10 was found to be specifically induced in mycorrhized roots, ATG12 was highly induced and ATG18g.l was highly suppressed under both symbiotic conditions. The RNA-seq data was validated using RT-qPCR for PvATG2, PvATG8i, PvATG9 and PvATG10.

Identification of ATG18 Families in Plants
Through an extensive study aiming to identify and analyze the ATG18 family, we selected 27 plant species starting from the early plant lineage Chlorophyta, Charophyta, liverworts, mosses and higher plants such as monocots and dicots. As with other ATGs, the ATG18 family is also well conserved in all the studied plant species; herein, a total of 280 genes and amino acid sequences were identified and retrieved from various databases. Initially, we identified the ATG18 homologs through a BLAST search of NCBI, and we then used the Pfam database to ensure the presence of WD40 repeats in the characteristic ATG18 members. The identified members were named using the aliases registered in the legume information system, NCBI, Phytozome, InParanoid, EGGNOG and Ensembl (Supplementary Information SI3). The genes with the same names were distinguished by adding a Roman numeral: The number I indicated the closest sequence to that in NCBI. For the primitive plants Physcomitrella patens, Chara braunii, Chlamydomonas reinhardtii, Dunaliella salina, Volvox carteri, Klebsormidium nitens, Micromonas pusilla, Ostreococcus lucimarinus, Ostreococcus tauri and Coccomyxa subellipsoidea, we retained the same names that were reported by Norizuki and colleagues [51].
Starting from the most primitive photosynthetic organisms of Chlorophyta, all the members studied had two ATG18 homologs except C. subellipsoidea, which had three ATG18 genes. Charophyta (C. braunii), liverworts (Marchantia polymorpha) and mosses (P. patens) had two, four and eight genes, respectively. Among monocots, we found that Oryza sativa had the lower number of genes (8), and Z. mays had the highest number of genes (31). Arabidopsis had a total of eight ATG18 members, and the 12 legumes considered here together had a total of 180 genes belonging to the ATG18 family. P. sativum had a minimum of six, and a maximum of 27 genes were found in L. angustifolius. The details of the ATG18 homologs in every species are listed in Tables 2 and 3.

Principal Component Analysis of the ATG18 Family
Multidimensional scaling analysis using Bios2mds demonstrates the similarity between 280 ATG18 protein sequences from 27 different species. The plot clearly shows that orthologs (genes with closely related sequences and having the same function in different species) are more similar than paralogs (genes that have similar sequences but have different functions in the same species). The plots show that all ATG18 sequences were grouped into three clusters ( Figure 6 and Supplementary Figure S3A). The principal components (PCs) allowed us to construct graphs with PC1, PC2 and PC3, and we then applied the K-means method. Cluster I formed a subfamily with ATG18a, c, d and e members from all the higher plant species studied. Cluster II contained only ATG18b homologs, and cluster III contained ATG18f, g and h members. Cluster III consisted of 3 groups: Lower plants formed a distant group, the second group contained the monocot-derived proteins, and the third group harbored all dicots except Arabidopsis, which was more similar to monocots than dicots. Lower plant species were found to be distributed mostly in clusters I and II with the exception of K. nitens, C. subellipsoidea, M. polymorpha and P. patens, which were also grouped in cluster III but exhibited more similarities among themselves than with higher plants. These clusters were named subfamilies I, II and III for convenience.

Principal Component Analysis of the ATG18 Family
Multidimensional scaling analysis using Bios2mds demonstrates the similarity between 280 ATG18 protein sequences from 27 different species. The plot clearly shows that orthologs (genes with closely related sequences and having the same function in different species) are more similar than paralogs (genes that have similar sequences but have different functions in the same species). The plots show that all ATG18 sequences were grouped into three clusters ( Figure 6 and Supplementary Figure S3A). The principal components (PCs) allowed us to construct graphs with PC1, PC2 and PC3, and we then applied the K-means method. Cluster I formed a subfamily with ATG18a, c, d and e members from all the higher plant species studied. Cluster II contained only ATG18b homologs, and cluster III contained ATG18f, g and h members. Cluster III consisted of 3 groups: Lower plants formed a distant group, the second group contained the monocot-derived proteins, and the third group harbored all dicots except Arabidopsis, which was more similar to monocots than dicots. Lower plant species were found to be distributed mostly in clusters I and II with the exception of K. nitens, C. subellipsoidea, M. polymorpha and P. patens, which were also grouped in cluster III but exhibited more similarities among themselves than with higher plants. These clusters were named subfamilies I, II and III for convenience.

Phylogenetic Relationships of the ATG18 Family in Plants
To understand the evolutionary relationship among primitive and advanced dicot plant species, a multiple sequence alignment of 280 ATG18 amino acid sequences was performed. The aligned sequences were used to generate phylogenetic trees based on the maximum likelihood and Bayes methods using MEGA and Phangorn software (Figure 7 and Supplementary Figure S3B). The largest clade was subfamily III followed by subfamily I, which was mainly composed of ATG18 a, c, d and e. Subfamily II harbored ATG18b. Subfamilies II and III consisted of the Bryopsida, Charophyceae, Klebsormidiophyceae, Mamiellophyceae and Trebouxiophyceae plants, which is important for understanding the divergence of ATG18 homologs.

Phylogenetic Relationships of the ATG18 Family in Plants
To understand the evolutionary relationship among primitive and advanced dicot plant species, a multiple sequence alignment of 280 ATG18 amino acid sequences was performed. The aligned sequences were used to generate phylogenetic trees based on the maximum likelihood and Bayes methods using MEGA and Phangorn software (Figure 7 and Supplementary Figure S3B). The largest clade was subfamily III followed by subfamily I, which was mainly composed of ATG18 a, c, d and e. Subfamily II harbored ATG18b. Subfamilies II and III consisted of the Bryopsida, Charophyceae, Klebsormidiophyceae, Mamiellophyceae and Trebouxiophyceae plants, which is important for understanding the divergence of ATG18 homologs.

Analysis of the Primary Structure and the Secondary Structure Predictions of the ATG18 Family in Plants
For the detection of motifs in 280 aa sequences, we identified four main motifs using MEME software. Motif 1 (SGVHLYKLRRGATNAVIQDIAFSHDSQWJAISSSKGTVHIF) contained 41 aa, and the motif sequence matched that of the WD40 family (PF00400) and β propeller clan 186 (CL0186) in the Pfam database. The InterProScan results also showed that motif 1 belongs to the superfamily WD40 (IPR036322), WD40 repeat-like (SSF50978) and breast carcinoma amplified sequence 3 (PTHR13268). Motif 2 (VIAQFRAHTSPISALCFDPS-GTLLVTASVHGHNINVFRIMP) contained 41 aa and was similar to motif 1 but contained an additional domain (WD40/YVTN repeat-like domain, IPR015943). Moreover, motifs 3 (VRCSRDRVAVVLATQIYCYBA) and 4 (GYGPMAVGPRWLAYASNPPLLSNTGRLSPQN) did not belong to any protein family (Figure 8).

Analysis of the Primary Structure and the Secondary Structure Predictions of the ATG18 Family in Plants
For the detection of motifs in 280 aa sequences, we identified four main motifs using MEME software. Motif 1 (SGVHLYKLRRGATNAVIQDIAFSHDSQWJAISSSKGTVHIF) contained 41 aa, and the motif sequence matched that of the WD40 family (PF00400) and β propeller clan 186 (CL0186) in the Pfam database. The InterProScan results also showed that motif 1 belongs to the superfamily WD40 (IPR036322), WD40 repeat-like (SSF50978) and breast carcinoma amplified sequence 3 (PTHR13268). Motif 2 (VIA-QFRAHTSPISALCFDPSGTLLVTASVHGHNINVFRIMP) contained 41 aa and was similar to motif 1 but contained an additional domain (WD40/YVTN repeat-like domain, IPR015943). Moreover, motifs 3 (VRCSRDRVAVVLATQIYCYBA) and 4 (GYGP-MAVGPRWLAYASNPPLLSNTGRLSPQN) did not belong to any protein family ( Figure  8). The motif sequences were further analyzed with PfamScan to identify the repeats, domains and families. Subfamily I was characterized by motifs 1 and 4, which consisted of WD40 and ANAPC4_WD40 repeats. These motifs also had two domains and eight families, although these Pfam family results are not representative of the subfamily. Subfamily II had motifs 1, 2 and 4, and we detected WD40 and ANAPC4_WD40 repeats in all the members. Only the green alga O. tauri contained leucine-rich repeats (LRR9 and LRR4). A total of four domains were identified: Gel_WD40, which was the largest, a defensin domain and PQQ and SecA preprotein crosslinking domains. Subfamily II also consisted of three families in six plants (Figure 9; Supplementary Information SI4). Figure 8. Protein motifs of the ATG18b family from different plant species. The conserved motifs were identified with MEME. The amino acid sequence of the ATG18 family is represented by lines, and the motifs identified using TBtools are shown with boxes: Motif 1 (green), motif 2 (yellow), motif 3 (dark green) and motif 4 (pink).

Subfamily III
The motif sequences were further analyzed with PfamScan to identify the repeats, domains and families. Subfamily I was characterized by motifs 1 and 4, which consisted of WD40 and ANAPC4_WD40 repeats. These motifs also had two domains and eight families, although these Pfam family results are not representative of the subfamily. Subfamily II had motifs 1, 2 and 4, and we detected WD40 and ANAPC4_WD40 repeats in all the members. Only the green alga O. tauri contained leucine-rich repeats (LRR9 and LRR4). A total of four domains were identified: Gel_WD40, which was the largest, a defensin domain and PQQ and SecA preprotein crosslinking domains. Subfamily II also consisted of three families in six plants ( Subfamily III had all four motifs, and we found PD40 repeats along with WD40 and ANAPC4_WD40 repeats. Among the 27 plant species analyzed, nine of them had 12 domains and ATP synthase was specific Z. mays. Breast carcinoma amplified sequence 3 (BCAS3) is a characteristic domain found in most members (Figure 9; Supplementary Information SI4).
The secondary structure of ATG18 was determined by protein alignment using JPred software. Here, we found that the sequence of ATG18h in A. thaliana was the largest sequence in the alignment with 927 aa. The protein contains seven blades with four beta blades commonly found in the WD40 family (Supplementary Figure S4). This sequence composition was 1% alpha-helix (H), 29% beta-sheet and 68% coil. ATG18 sequences have four antiparallel β-strands, which are named blades [52]. The beta-sheets in ATG18 proteins contain flexible loops that facilitate molecule binding.
AtATG18h has an LHRG sequence in the same place where the alignments have the FRRG sequence, and we found the BCAS3 domain with Phe17 ( Figure 10). The sequence alignment performed to identify the FRRG motif revealed that FRRGs appeared in subfamily II, which consists of ATG18b. In addition, subfamily I contained the LRRG or VRRG sequences, whereas subfamily III contained LQRG, LHRG or LYRG sequences. The sequences that appear in ATG18 contain the same pattern of two polar and neutral amino acids in the center of the sequence between two neutral and nonpolar amino acids. ATG18b in subfamily II has the conserved sequence for PtdInsP binding, and other subfamilies likely also show PtdInsP binding ( Figure 10, Supplementary Figure S4). Subfamily III had all four motifs, and we found PD40 repeats along with WD40 and ANAPC4_WD40 repeats. Among the 27 plant species analyzed, nine of them had 12 domains and ATP synthase was specific Z. mays. Breast carcinoma amplified sequence 3 (BCAS3) is a characteristic domain found in most members (Figure 9; Supplementary Information SI4).
The secondary structure of ATG18 was determined by protein alignment using JPred software. Here, we found that the sequence of ATG18h in A. thaliana was the largest sequence in the alignment with 927 aa. The protein contains seven blades with four beta blades commonly found in the WD40 family (Supplementary Figure S4). This sequence composition was 1% alpha-helix (H), 29% beta-sheet and 68% coil. ATG18 sequences have four antiparallel β-strands, which are named blades [52]. The beta-sheets in ATG18 proteins contain flexible loops that facilitate molecule binding.
AtATG18h has an LHRG sequence in the same place where the alignments have the FRRG sequence, and we found the BCAS3 domain with Phe17 ( Figure 10). The sequence alignment performed to identify the FRRG motif revealed that FRRGs appeared in subfamily II, which consists of ATG18b. In addition, subfamily I contained the LRRG or VRRG sequences, whereas subfamily III contained LQRG, LHRG or LYRG sequences. The sequences that appear in ATG18 contain the same pattern of two polar and neutral amino acids in the center of the sequence between two neutral and nonpolar amino acids. ATG18b in subfamily II has the conserved sequence for PtdInsP binding, and other subfamilies likely also show PtdInsP binding ( Figure 10, Supplementary Figure S4).

Microsynteny of ATG18 in P. vulgaris
To explore the origins and evolutionary processes of the P. vulgaris ATG18 family genes, a comparative synteny map between the eight PvATG18 homologs and 15 other genomes was constructed. The species compared in this study were based on their availability in the GCV database. The classification of the ATG18 family was based on the subfamilies obtained by multidimensional scaling (Figure 6).

Microsynteny of ATG18 in P. vulgaris
To explore the origins and evolutionary processes of the P. vulgaris ATG18 family genes, a comparative synteny map between the eight PvATG18 homologs and 15 other genomes was constructed. The species compared in this study were based on their availability in the GCV database. The classification of the ATG18 family was based on the subfamilies obtained by multidimensional scaling (Figure 6).

Subfamily I
ATG18a was highly conserved in all species with the exception of A. ipaensis. SPATA 20 (legfed_v1_0.L_1H5ZXB) is tandemly duplicated in P. vulgaris. In contrast, the lyase dihydroneopterin aldolase (legfed_v1_0.L_2MWVJ4) was only found in P. vulgaris in the syntenic block. Other genes conserved in the syntenic block were related to cell cycle regulation, transcriptional regulation, transcription factors, zinc finger proteins and other structural motifs involved in peroxisomal and mitochondrial import (Supplementary Figure S5A).
ATG18c was not located in the syntenic block in L. albus, M. truncatula, P. sativum or V. angularis. Genes related to ABC transport, vacuolar iron transport, proteins with WD40 repeats involved in protein-protein interactions, cytochrome P450, oxidoreductases and zinc-binding dehydrogenase were highly conserved in the syntenic block. T. pratense and P. lunatus show duplication of oxidoreductases and zinc-binding dehydrogenase family proteins (Supplementary Figure S5B).
ATG18c II was not located in the syntenic block in L. japonicus. Transcriptional regulator SUPERMAN-like (legfed_v1_0.L_Tx802x and legfed_v1_0.L_NLQvfk) were specific to P. vulgaris. Furthermore, an uncharacterized protein (legfed_v1_0.L_2ffJFT) was found to have undergone duplications in G. max, indicating a putative functional role. Pre-mRNAsplicing factor (legfed_v1_0.L_1Bt8v9) was specifically found in milletioid members of legumes, such as P. vulgaris, G. max, G. soja and V. unguiculata (Supplementary Figure S5C).

Subfamily II
ATG18b was not located in L. japonicus or V. angularis. L. japonicus exhibited inversions in the syntenic block involving the synthesis of pectic cell wall components, ATPases and DUF788 proteins, which have been proven to be involved in autophagy regulation. ATG11 was also found in the same syntenic block (Supplementary Figure S5D; Supplementary Information SI5).

Subfamily III
ATG18f.I was identified in most of the species compared, and most of the flanking genes were conserved. An important observation from this syntenic block is the tandem duplication of Histone H2A (legfed_v1_0.L_0mwghf) in all species except Arachis and Lotus. Fe(II)-dependent dioxygenase-like (legfed_v1_0.L_81S90D) was missing in L. albus and L. angustifolia (Supplementary Figure S6A; Supplementary Information SI5).
ATG18g.I was only found in P. vulgaris, C. cajan, G. max, L. japonicus and V. angularis, and in the other species, the circadian clock-regulated growth regulator Zinc knuckle family protein (legfed_v1_0.L_001qtq) was found in the same syntenic block. The most significant feature of this block was the repeated duplication of disease resistance-responsive dirigentlike protein family protein (legfed_v1_0.L_08frmp) in all the species except V. angularis. In Arachis species, the clustering of vacuolar protein-sorting protein (legfed_v1_0.L_0c0sd2) and breast carcinoma amplified sequence 3 protein (legfed_v1_0.L_cdgcy6) with other genes was an important observation (Supplementary Figure S6B; Supplementary Information SI5).
ATG18g.II was missing in L. albus and was well conserved in other species. In Arachis, gene clusters involving FANTASTIC FOUR 3-like (legfed_v1_0.L_xmq5fm) protein were found associated with shoot meristem growth (Supplementary Figure S6C; Supplementary Information SI5).

ATG18 Protein Characterization
As mentioned previously, ATG18 homologs in P. vulgaris were also divided into three subfamilies with the characteristic motifs FRRG in PvATG18b, VRRG in PvATG18a and PvATG18c, LQRG in PvATG18f and LHRG in PvATG18g. Characterization of the PvATG18 homologs revealed that PvATG18b had the lowest molecular weight, was stable with an isoelectric point of 8.86 and had a high aliphatic index. High-molecular-weight proteins were specifically found in subfamily III (Supplementary Table S2).
Prediction of the subcellular localization of ATG18 homologs showed that ATG18a, c.I, c.II, g.I and g.II were localized in the cytoplasm, and ATG18f.I and f.II were located in the ER membrane and plasma membrane. Only ATG18c homologs were localized to the lumen of lysosomes. ATG18b was unique because it was found in the mitochondrial inner membrane, inner membrane space and ER membrane (Supplementary Table S2). Furthermore, only three of the PvATG18 proteins had a transmembrane helix spanning the aa 44-67 in PvATG18b and located between the aa 12 and 34 in PvATG18f.I and the aa 7 and 26 in PvATG18f.II (Supplementary Figure S7). Furthermore, we predicted the putative phosphorylation sites in PvATG18 homologs and found that these were located on the amino acids threonine and serine in all sequence alignments (Supplementary Figure S8).

Protein Structure Prediction and Molecular Dynamics Simulation of ATG18b in P. vulgaris
The above-described analysis implies that PvATG18b is the functional ortholog of AtATG18b; hence, we attempted to understand the structure of this protein using the Robetta Server. This model was submitted to 2.1-µs-long unbiased MD to evaluate the predicted protein model (Figure 11a). In the simulation, we monitored the root mean square deviation (RMSD) of the model protein. The graph clearly indicates a change in the RMSD during the first 1.8 µs of simulation, but the RMSD then reached a plateau. This finding indicates that after 1.8 µs of simulation, the 3D structural model of PvATG18b represents a stable folding conformation (Figure 11b). The model shows the seven-bladed β-propeller architecture conserved among the ATG18 family of proteins [52]. The PvATG18 protein structure consists of seven blades formed by antiparallel β-stands connected by short loop regions. The blades are listed with the numbers 1 to 7 beginning at the C-terminus, whereas the β-stands are named with letters from an inner to outer location as A to D. These structures were similar to those observed with the biophysical characterization of PROPPIN ATG18 in Pichia angusta [52]. We also found a CD loop (S269 to T288) located between the two phosphoinositide-binding sites and the FRRG motif at positions F218, R219, R220 and G221 between blades 5 and 6 ( Figure 10d). PROPPINs are WD-40 family propeller proteins that act as scaffolds for protein-protein interactions. The binding of PvATG18b to PtdIns(3,5)P 2 and PtdIns3P might be mediated by additional protein-protein interactions, as observed in Kluyveromyces lactis [37]. Earlier models of PROPPINS predicted the insertion of two loops into the membrane in a perpendicular orientation in the phagophore membrane through nonspecific electrostatic interactions [53,54]. Our results for PvATG18 reveal the previously reported nonspecific electrostatic interaction in the protein structure and the presence of one transmembrane motif (Figure 10b,c.) PvATG18 homologs revealed that PvATG18b had the lowest molecular weight, was stable with an isoelectric point of 8.86 and had a high aliphatic index. High-molecular-weight proteins were specifically found in subfamily III (Supplementary Table S2).
Prediction of the subcellular localization of ATG18 homologs showed that ATG18a, c.I, c.II, g.I and g.II were localized in the cytoplasm, and ATG18f.I and f.II were located in the ER membrane and plasma membrane. Only ATG18c homologs were localized to the lumen of lysosomes. ATG18b was unique because it was found in the mitochondrial inner membrane, inner membrane space and ER membrane (Supplementary Table S2). Furthermore, only three of the PvATG18 proteins had a transmembrane helix spanning the aa 44-67 in PvATG18b and located between the aa 12 and 34 in PvATG18f.I and the aa 7 and 26 in PvATG18f.II (Supplementary Figure S7). Furthermore, we predicted the putative phosphorylation sites in PvATG18 homologs and found that these were located on the amino acids threonine and serine in all sequence alignments (Supplementary Figure S8).

Protein Structure Prediction and Molecular Dynamics Simulation of ATG18b in P. vulgaris
The above-described analysis implies that PvATG18b is the functional ortholog of AtATG18b; hence, we attempted to understand the structure of this protein using the Robetta Server. This model was submitted to 2.1-μs-long unbiased MD to evaluate the predicted protein model (Figure 11a). In the simulation, we monitored the root mean square deviation (RMSD) of the model protein. The graph clearly indicates a change in the RMSD during the first 1.8 μs of simulation, but the RMSD then reached a plateau. This finding indicates that after 1.8 μs of simulation, the 3D structural model of PvATG18b represents a stable folding conformation (Figure 11b). The model shows the seven-bladed β-propeller architecture conserved among the ATG18 family of proteins [52]. The PvATG18 protein structure consists of seven blades formed by antiparallel β-stands connected by short loop regions. The blades are listed with the numbers 1 to 7 beginning at the C-terminus, whereas the β-stands are named with letters from an inner to outer location as A to D. These structures were similar to those observed with the biophysical characterization of PROPPIN ATG18 in Pichia angusta [52]. We also found a CD loop (S269 to T288) located between the two phosphoinositide-binding sites and the FRRG motif at positions F218, R219, R220 and G221 between blades 5 and 6 ( Figure 10d). PROPPINs are WD-40 family propeller proteins that act as scaffolds for protein-protein interactions. The binding of PvATG18b to PtdIns(3,5)P2 and PtdIns3P might be mediated by additional protein-protein interactions, as observed in Kluyveromyces lactis [37]. Earlier models of PROP-PINS predicted the insertion of two loops into the membrane in a perpendicular orientation in the phagophore membrane through nonspecific electrostatic interactions [53,54]. Our results for PvATG18 reveal the previously reported nonspecific electrostatic interaction in the protein structure and the presence of one transmembrane motif (Figure 10b

Discussion
Autophagy is recognized as a highly selective cellular clearance pathway that helps maintain homeostasis in eukaryotic cells. The genes involved in autophagy are highly conserved from yeast to humans, and the process is the result of the interaction of these ATGs and other associated genes. The number of identified ATGs shows a marked variation among different species. In yeast, a total of 41 genes have been identified to date, and several studies on plant ATGs have also identified a varied number of genes. In the present investigation, we attempted to perform a comprehensive study for identifying ATG families in three important legume species, namely, P. vulgaris, M. truncatula and G. max. Furthermore, we focused on the ATG18 gene family, the largest of all the families, to identify and phylogenetically compare 27 plant species starting from early plant lineages, chlorophytes to higher plants including legumes.

Autophagy Genes in Legumes Are Highly Conserved
Using Arabidopsis ATGs as a reference, we retrieved ATG homologs in all the species listed in various databases, including Phytozome, and the sequences were confirmed to be affiliated with ATG-like homologs by analyzing their Pfam matches in the Pfam database. We identified a total of 32, 28 and 61 ATG homologs in P. vulgaris, M. truncatula and G. max, respectively. The identified homologs could be classified into 17 families based on their phylogenetic relationships and motifs. The phylogenetic analysis revealed that homologs in Medicago were located closer to Arabidopsis than those in other species. Unlike in yeast, which contains a single copy of each family, many of the gene families have multiple copies. ATG1 has 4, 3, 2 and 6 homologs in Arabidopsis, Medicago, Phaseolus and Glycine, respectively, ATG13 has 2 homologs in Arabidopsis, Medicago and Phaseolus (2 in each) and 4 homologs in G. max, ATG9 has 2 or 4 homologs in Medicago, Phaseolus and G. max and ATG14 and ATG4 have 2 homologs in Arabidopsis and 2 homologs in G. max. The analysis of larger families revealed that ATG8 has 9, 6, 7 and 10 homologs in Arabidopsis, Medicago, Phaseolus and G. max, respectively, and that ATG18 has 8 homologs in Arabidopsis, Medicago and Phaseolus (8 in each) and a maximum of 19 homologs in G. max. Similar results were also obtained with O. sativa [55], Nicotiana tabacum [56], Vitis vinifera [57], Musa acuminate [58] and Setaria italic [59]. However, in most of the families, the homologs were placed in one clade, which clearly showed sequence similarity and the derivation of statistically reliable pairs of possible orthologous proteins sharing similar functions from a common ancestor, consistent with the results from a previous study conducted by Kellogg (2001) [60]. Furthermore, the ATG families identified constituted a relatively complete autophagic machinery in forming the complexes, namely, the ATG1 kinase complex, class III PI3K complex, ATG9 recycling complex, Atg8-lipidation system and Atg12-conjugation system.
ATG17 is an important accessory protein along with ATG31-ATG29, which acts as a scaffold/modulator in linking the ATG1-ATG13 complex to the phagophore assembly site in yeast. Homologs of the ATG17-ATG31-ATG29 subcomplex were not detected in Arabidopsis. However, single orthologs of ATG11 and ATG101 were identified, and ATG11 reportedly contains a short cryptic ATG17-like domain with weak identity to yeast ATG17 [61]. The identification of ATG homologs in the present study revealed one homolog of ATG11 and one homolog of ATG101 in all the legumes analyzed.
For further exploration of the origin and evolutionary process of ATGs, a comparative synteny map that depicted the presence of 160 genes in Arabidopsis and three legumes compared was constructed. The results suggested that the majority of ATGs had a common ancestor. The Ka/Ks ratio is an important genetic parameter for determining whether positive Darwinian selection is related to gene differentiation [62]. Positive Darwinian selection will retain the advantages of nonsynonymous mutations, and purification selection will gradually remove deleterious nonsynonymous mutations. Herein, the Ka/Ks ratio among most of the ATG sequences was lower than 1 (average of 0.17), indicating purifying selection; in contrast, the sequences of ATG8 (1.24) and two ATG18s (1.09 and 1.04) in G. max had higher values, indicating accelerated evolution and positive selection.
Plant macroautophagy is a process in which macromolecules and cellular components are recycled in lytic vacuoles to be reused. Recycling is crucial for the maintenance of cellular homeostasis by acting as a quality control mechanism under nonstressful conditions and is stimulated under stress conditions [63]. Stress-induced autophagy is well documented in some plant species. Our study of the transcription factors binding to the ATGs revealed that several light-responsive transcription factors, such as BOX-4, G-box, GT1-motif, MRE and ACE, were abundant in most of the ATGs. Furthermore, cis-acting elements related to circadian control were also identified. Phytohormones play key roles in different plant processes, including stress responses. The ATGs analyzed exhibited TF-binding sites for EREs, ABA-responsive ABREs, MeJA-responsive CGTCA motifs, auxin-responsive TGA elements and gibberellin-responsive GARE motifs. Ethylene is considered a key regulator of autophagy in petal senescence in petunia, and ERF5 is also shown to induce autophagy by binding to ATG8 and ATG18h under drought stress in tomato. Upregulation of autophagy by low concentrations of salicylic acid is found to delay methyl jasmonate-induced leaf senescence in Arabidopsis [64][65][66]. In addition, several wound-responsive, pathogenresponsive, flavonoid biosynthetic gene regulation-related and meristem-specific elements were also detected. Based on all the results, the involvement of autophagy in the regulation of plant responses to biotic and abiotic stresses is undeniable.

Autophagy Genes Are Responsive to Nitrate
To assess the differential expression pattern and responsive nature of ATGs to the presence of different nitrate sources, we developed heatmaps using the data retrieved from databases and from a previous RNA-seq analysis performed by our research group. The differential expression pattern in Phaseolus tissues showed that most of the ATGs were expressed in all tested tissues. Nitrogen is an essential component of life that is needed for building proteins and DNA, and despite its abundance in the atmosphere, only limited reserves of soil inorganic nitrogen are accessible to plants, and this nitrogen is primarily in the forms of nitrate and ammonium. Legumes have a unique ability to establish a symbiotic association with nitrogen-fixing rhizobia. Due to our understanding of the evolution of ATGs in legumes, we opted to understand the response of both arial and root tissues of these legumes to different nitrate sources. The expression patterns showed that the highest expression was found in roots treated with ammonia and leaves treated with urea. ATG18 homologs a, g and h were specifically induced in all tissues and by all treatments, indicating the nitrate-responsive nature of these genes.
Furthermore, an analysis of the differential expression patterns of ATGs in Phaseolus tissues revealed that the highest expression level was noted in 21-day fix (-) nodules, which could be due to the involvement of the autophagic process in providing the necessary amino acids for the synthesis of nitrogen in the absence of the symbiont. In yeast and other eukaryotes, it has been proven that nitrogen deficiency induces autophagy. A recent study using yeast cells also suggested that autophagy sustains glutamate and aspartate synthesis during nitrogen starvation [67]. RNA-seq data from early symbiosis with rhizobia and mycorrhizae showed differential ATG expression, and more ATGs were upregulated in rhizobia-inoculated roots than in mycorrhizae-inoculated roots. This analysis provided candidate genes that could play pivotal roles in symbiosis. The involvement of ATG6/beclin has previously been reported in P. vulgaris during rhizobial infection progression and arbuscule maturation [68].

The ATG18 Family Is Highly Conserved and Has a Broader Sequence-Based Classification
Atg18 is one of the autophagy-related molecules responsible for autophagic processes and is conserved from yeast to higher organisms [34]. ATG18 proteins belong to the PROP-PINs (β-propellers that bind polyphosphoinositides) family and work as PI3P effectors. Earlier studies that focused on the identification of ATG genes in primitive and higher plants showed that each family is represented by only one gene for each component of the core autophagy machinery. ATG8 and ATG18 are exceptions and have multiple homologs with lower redundancy in Arabidopsis and P. patens [51].
ATG18 was the family with the highest number of homologs; hence, we chose this family for a comprehensive analysis of the family from the early plant lineage to legumes. The multiple sequence alignment and phylogeny of ATG18 homologs resulted in separation of the homologs into three clades. Each of the clades had subfamily members, as determined by the multidimensional scaling projection of 280 ATG18 homologs in 27 photosynthetic organisms. Unlike previous studies by Norizuki and colleagues [51], the classification of the ATG18 family was not based on the BCAS3 domain alone. Knockout of the BCAS3 gene in Dictyostelium resulted in a reduction in early autophagosomes compared with that found in wild-type cells [69]. In the present study, due to the multidimensional scaling projection of the retrieved sequences, we classified the ATG18 sequences into three subfamilies. Subfamily I contained ATG18a, ATG18c, ATG18d and ATG18e homologs, subfamily II had only ATG18b and subfamily III had ATG18f, ATG18g and ATG18h members. All homologs with BCAS3 were found to be clustered within subfamily III.
Subfamily II, which contained only ATG18b homologs, had few members but was detected in all the plant species investigated in this study, which suggested the sequence and functional conservation of these proteins. Among the early photosynthetic organisms, we identified at least one homolog in subfamilies I and II, but significant divergence was detected, particularly within subfamily III. Among monocots, O. sativa had 8 homologs, whereas 32 and 21 homologs were found in Z. mays and T. aestivum, respectively. The analysis of dicots revealed 8 homologs in each of Arabidopsis, L. japonicus, M. truncatula and P. vulgaris, whereas Arachis sp. had 9 and 10. The maximum number of homologs was recorded in C. cajan (18), G. max (18), C. arietinum (20), Vigna sp. and L. angustifolius (27).
The legume family includes one of the most agroeconomically important plant crops after Poaceae [70]. Of the three subfamilies within Fabaceae, Papilionoideae is the largest, the most recently evolved and monophyletic. Because Papilionoideae includes the most important cultivated legumes, we sought to determine the members of this subfamily in different clades. In the present study, the maximum number of homologs (27) was identified in L. angustifolius, which belongs to the genistoid clade and exhibited an early divergence at approximately 56.4 ± 2 mya. Furthermore, in Arachis species, we found less than half of the ATG18 homologs, indicating possible deletions. Among the members of the next recent (45 mya) clade, which consisted of milletoids, an increase in the number of homologs (18) was detected, which might be due to whole-genome duplication in G. max. However, P. vulgaris had only eight members of ATG18, indicating possible divergence prior to wholegenome duplications, whereas Vigna sp. was found to have high numbers of homologs. Furthermore, more recent robinioid (48.3 ± 1.0 mya) and IRLC (39.0 ± 2.4 mya) clade members had fewer members with the exception of the tribe Vicieae, whose gene numbers were due to genome expansion and related genomic events. In contrast, syntenic relations were not disrupted due to differences in genome sizes [71,72]. A phylogenetic analysis revealed that the ATG18 homologs of Chlorophyta, Charophyta, Marchantiophyta and Bryophyta were always grouped together, and similar results were obtained for monocots and dicots. However, in a comparison of a broad class of species, it is often not simple to precisely define orthologous genes or genomic loci in a straightforward manner, and this analysis is complicated due to gene duplication, recurring polyploidy and extensive genome rearrangement [73].

The ATG18 Protein Structure Predicts Possible Functional Diversification
In addition, the prediction of the primary and secondary structures of the proteins strengthens the classification of ATG18 proteins into subfamilies. The protein size, motif structure and changes in FRRG motifs among the ATG18 homologs were identified as the fundamental features that contribute to the classification. The changes in the FRRG motifs found in members of subfamily II comprising ATG18b to LRRG, VRRG in subfamily I, LQRG, LHRG or LYRG in subfamily III indicate functional diversification. The WD40 domain is among the top ten most abundant domains in eukaryotic genomes and is also ranked as the top interacting domain in S. cerevisae [74] (Stirnimann et al., 2010). Based on the SMART database, the human genome contains approximately 349 WD40 domaincontaining proteins [75]. The presence of the WD40 domain in ATG18 homologs could indicate their involvement in cellular functions. Proteins containing WD40 domains are known to be involved in signal transduction, vesicular trafficking, cytoskeletal assembly, cell cycle control, apoptosis, chromatin dynamics and transcription regulation due to their ability to bind and thus function as interchangeable substrate receptors to target different substrates and recruit different substrates in distinct modes [76]. In C. elegans, ATG18 and WIPI 1/2 (WD-repeat protein interacting with phosphoinositides) in mammals have FRRGs and EPG-6 and WIPI 3/4 have LRRGs. The substitution of the FRRG motif by FTTG and FKKG does not allow PtdInsP binding; however, the changes in LKKG and LTTG still allow PtdInsP binding [77], implying a possible functional diversification of ATG18 homologs. The studies conducted thus far also demonstrate the involvement of ATG18 homologs in abiotic stress responses in plants [42][43][44][45][46][47][48][49][50].

ATG18 Family in P. vulgaris
In P. vulgaris, a total of eight ATG18 homologs were identified in the current study and were also classified into three subfamilies. While the functional roles of these subfamilies were not determined in this study, the involvement of these proteins in diversified cellular functions cannot be ruled out. All the subfamilies showed conserved phosphorylation sites but different subcellular localizations.
The conserved nature of serine/threonine sites could indicate the functional roles corresponding to several cellular responses in P. vulgaris. In yeast, Pichia pastoris, Atg18 phosphorylation in the loops in the propeller structure of blades 6 and 7 decreases its binding affinity to phosphatidylinositol 3,5-bisphosphate. The association of ATG18 with the vacuolar membrane is inhibited until dephosphorylation [78]. A recent study in Arabidopsis showed that the phosphorylation of ATG18a by brassinosteroid insensitive 1associated receptor kinase 1 (BAK1) suppresses autophagy and attenuates plant resistance against necrotrophic pathogens [79].
The microsynteny of P. vulgaris ATG18 homologs showed that subfamily I members were highly conserved across the compared species and were flanked by genes involved in cell cycle regulation, transcriptional regulation, cellular transport and metal ion binding. Furthermore, subfamily II was flanked by the ATPase and DUF788 proteins, which have been proven to be involved in autophagy regulation. ATG11, which is a part of the ATG13-ATG1 complex in autophagy initiation, was also found in the same syntenic block. The subfamily III syntenic block contained conserved genes related to histones, circadian clock, growth and vacuolar transport.

PvATG18b Could Be the Homolog of AtATG18b
In accordance with a well-established fact, the most important feature of ATG18 proteins is the presence of the FRRG motif and its ability to bind to phosphoinositide. Among P. vulgaris ATG18 homologs, the FRRG motif was found only in ATG18b belonging to subfamily II. Hence, we propose PvATG18b as the functional homolog of A. thaliana ATG18b. We also hypothesize that other ATG18 homologs might be involved in other molecular recognition events through binding to surface molecules that play a distinctive role in autophagy, and similar findings have been observed with human ATG18 homologs, e.g., WIPI 1/WIPI 2 with FRRG repeats and WIPI 3/WIPI 4 with LRRG repeats bind to various PtdIns and thus play distinct roles in autophagy [76,80].
We then performed a molecular dynamic simulation of PvATG18b that is unique to ATG models in legumes. Our model shows the stable folding conformation of the seven-bladed β-propeller architecture. PvATG18b is composed of 359 amino acids, and we found the CD loop (S269 to T288) in blade 6. While this loop sequence differs among species, it forms an amphipathic alpha-helix and might insert into a membrane to allow two lipid-binding sites (PtdIns3P and PtdIns(3,5)P 2 ) [81]. Additionally, PvATG18b contains the FRRG repeat and helps form the site for binding to lipids. The FRRG repeat is in F218 to G221 and is conserved in ATG18b to form the PROPPIN family. The FRRG motif (Phe-Arg-Arg-Gly) in ATG18 proteins has been studied in mammals, yeast and C. elegance [79,82]. In Kluyveromyces lactis, the mutation of the blade 6 β3-β4 loop affects the loss of liposome binding, and the flexible loop coordinates two distinct lipid-binding sites [83]. Previous studies with S. cerevisiae have demonstrated that loops A and B of blade 7 are the locations where ATG2 interacts with ATG18. Further research should be performed to understand the interaction of ATG18 with ATG2 and thus ensure the binding site and vacuole scission function of PvATG18b.

Alignment and Phylogenetic Tree Analyses
The protein sequences of ATG families were aligned using Clustal Omega (1.2.4) [90] (www.clustal.org and www.ebi.ac.uk; accessed on 5 July 2020) with the default parameters. The phylogenetic tree was a neighbor-joining tree without distance corrections, and we extracted the outputs from the tree and generated circular phylogram and cladogram tree images using EvolView. The different phylogenetic trees were combined with the MEME results for all sequences, and the final details were obtained using Inkscape software [91] (https://www.evolgenius.info/evolview/; accessed on 6 July 2020).
Multiple sequence alignment of 280 intraspecies protein sequences of ATG18 family members was performed using Clustal Omega. The phylogenetic analysis was performed using MEGA X with the maximum likelihood method and Bayes analyses with 1000 bootstrap replicates and the default parameters [92]. Phangorn and APE packages in R were used to build the phylogenetic trees [93,94]. In Phangorn, we used the Akaike information criterion and the Whelan and Goldman matrix (WAG) as the substitution model.

Chromosome Localization, Synteny and Ka/Ks Calculation
The chromosomal localization of ATG family homologs in A. thaliana, P. vulgaris, M. truncatula and G. max was verified using NCBI. Furthermore, Ensembl Plants was used to compare and explore the gene alignments and generate a segment to link the genomes. The synteny relation of ATG genes was drawn using OmicCircos in R36 [95]. The macro-and microsynteny of the ATG18 family was developed using the Genome Context Viewer (GCV) in the Legume information system [96] (https://legumeinfo.org/ lis_context_viewer/instructions; accessed on 16 July 2020).
The CDSs and protein sequences were obtained from Phytozome and used to calculate the synonymous substitutions (Ks) and nonsynonymous substitutions (Ka) with TBtools software (https://github.com/CJ-Chen/TBtools; accessed on 26 July 2020). Using the data table, we developed a graph of the Ka and Ks values for all ATG families in P. vulgaris, M. truncatula and G. max using the ggplot2 R packages (https://ggplot2.tidyverse.org/; accessed on 28 July 2020).

Promoter Analysis, Expression Profiling and Transcriptome of ATG Families
The 2000-bp upstream sequences of ATG genes were retrieved from Phytozome, and these sequences were used as query sequences in PlantCARE software (http:// bioinformatics.psb.ugent.be/webtools/plantcare/html/; accessed on 2 August 2020). The results were analyzed, and the most abundant transcription factors were identified using ggplot2 in R.
ATG gene expression data for A. thaliana, M. truncatula and G. max were extracted from Phytozome to determine the differential expression of the genes under different nitrogen treatments [97]. Data on the differential expression of genes in P. vulgaris under nitrogen treatments and after fixation and inoculation with Rhizobium tropici (CIAT899) were obtained from the PvGEA website (https://plantgrn.noble.org/PvGEA/; accessed on 2 July 2020).
We calculated the log2 values of the RPKM values for the comparison. To show the data for A. thaliana, M. truncatula and G. max, we used the OmicCircos package and constructed subfamilies using the synteny graph. However, for P. vulgaris, we constructed an independent heatmap of ggplot2 because the amounts of treatments and tissues were higher. The expression data for ATGs under rhizobial and mycorrhizal symbiotic conditions were obtained from our previous global transcriptomic analysis [98]. A heatmap of the fold change values was constructed using the ggplot2 package.

Quantitative Real-Time PCR Analysis
Four genes were selected for RT-qPCR analysis, which was performed to validate the RNA-seq data. High-quality total RNA was isolated from frozen root tissues using TRIzol reagent (Sigma) according to the manufacturer's instructions. RNA integrity was verified by gel electrophoresis and RNA concentration was assessed using a NanoDrop spectrophotometer (Thermo Scientific). RNA was treated with DNase to eliminate DNA contamination (1 U/µL; Roche, USA) according to the manufacturer's instructions. Reversetranscription quantitative PCR (RT-qPCR) analysis was performed using a DNA-free RNA and iScript TM One-Step RT-PCR Kit with SYBR ® Green (Bio-Rad) according to the manufacturer's instructions. To confirm the absence of DNA contamination, a sample lacking reverse transcriptase was included. Relative expression values were calculated using the 2 -∆Ct method, where the quantification cycle (Cq) value equals the Cq value of the gene of interest minus the Cq value of the reference gene [99]. Gene-specific primers were used for RT-qPCR analysis (Table S3). PvEF1α and PvIDE were used as reference as described previously by Arthikala et al. [100]. The relative expression values were normalized with respect to two reference genes EF1α and IDE as described previously by Vandesompele et al. [101]. The values presented are averages of three biological replicates, and each data set was recorded using triplicate samples.

Principal Components Analysis of the ATG18 Family
Based on multiple alignments of ATG18 protein sequences, we converted the information into a distance matrix calculated using the bios2mds packages (https://CRAN.Rproject.org/package=bios2mds; accessed on 3 July 2020) in R. The matrix used was BLO-SUM62 (BLOcks of Amino Acid SUbstitution Matrix), and sequences with 62% identity were obtained. Using the same packages, we obtain the K-means and principal components to generate the multidimensional scaling projection and thus define the subfamilies within the protein family.

Detection of Motifs, Domains, Repeats, Families and Secondary Protein Structure of the ATG18 Family
ATG sequences were analyzed for a repeated sequence motif pattern using Multiple Expectation Maximization for Motif Elicitation [102] (http://meme-suite.org/tools/meme; accessed on 18 July 2020) in the classical motif discovery mode and using a limit of three motifs. The secondary structures of the proteins were developed after alignment with Clustal Omega using the online tool JPred in FASTA format. To obtain the repeats, domains and families, a Pfam scan of EMBL-EBI was performed (https://www.ebi.ac.uk/Tools/ pfa/pfamscan/; accessed on 26 August 2020).

ATG18b Protein in P. vulgaris
The 3D structure of the PvATG18b protein was determined using the Robetta server [102]. Comparative models were built from structures detected and aligned using HHSEARCH, SPARKS and Raptor [104][105][106][107]. The loop regions were assembled from fragments and optimized to fit the aligned template structures. The final structure prediction was selected using the lowest-energy model as determined by a low-resolution Rosetta energy function. The final 3D image was colored with Quimera [108].

Conclusions
The present study was carried out to understand the diversification of ATG genes during plant evolution with special emphasis on legumes and P. vulgaris. In the present study, we identified 32, 39 and 61 core ATG genes in P. vulgaris, M. truncatula and G. max, respectively. The ATG genes were conserved across the species, but the higher plants revealed great redundancy. Most of the ATGs in Phaseolus were found to be nitrate responsive and were differentially expressed under rhizobial and mycorrhizal symbiosis, implying their possible role during symbiosis. Further, analysis ATG18 of the family in 27 photosynthetic organisms showed their classification into three subfamilies based on the sequence. In Phaseolus, ATG18 members belonging to all the three subfamilies were conserved. Comparison of Phaseolus ATG18b structure to the crystal structure in Arabidopsis showed conserved FRRG sequence.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/plants10122619/s1, Figure S1: Percentage of legume ATG homologs in different software programs. Figure S2: Validation of expression patterns of ATGs of symbiont-colonized P. vulgaris roots by RT-qPCR analysis. Figure S3A: Representation of 280 ATG18 proteins from different plant species analyzed by multidimensional scaling using Bios2mds. Figure S3B: Phylogenetic tree of ATG18 in plants. Figure S4: Secondary structure prediction using JPred. Figure S5: Microsynteny analysis of ATG18 (Subfamily I & II) in P. vulgaris. Figure S6: Microsynteny analysis of ATG18 (Subfamily III) in P. vulgaris. Figure S7: Phosphorylation sites of ATG18 in P. vulgaris identified using NetPhos. Figure  S8: Prediction of transmembrane helices in PvATG18 proteins using TMHMM. Table S1: List of identifiers of the genes, transcripts, and proteins of each ATG in P. vulgaris, Table S2: ATG18 protein characterization in P. vulgaris. Table S3