Genome-Wide Analyses of Aspartic Proteases on Potato Genome (Solanum tuberosum): Generating New Tools to Improve the Resistance of Plants to Abiotic Stress

Aspartic proteases are proteolytic enzymes widely distributed in living organisms and viruses. Although they have been extensively studied in many plant species, they are poorly described in potatoes. The present study aimed to identify and characterize S. tuberosum aspartic proteases. Gene structure, chromosome and protein domain organization, phylogeny, and subcellular predicted localization were analyzed and integrated with RNAseq data from different tissues, organs, and conditions focused on abiotic stress. Sixty-two aspartic protease genes were retrieved from the potato genome, distributed in 12 chromosomes. A high number of intronless genes and segmental and tandem duplications were detected. Phylogenetic analysis revealed eight StAP groups, named from StAPI to StAPVIII, that were differentiated into typical (StAPI), nucellin-like (StAPIIIa), and atypical aspartic proteases (StAPII, StAPIIIb to StAPVIII). RNAseq data analyses showed that gene expression was consistent with the presence of cis-acting regulatory elements on StAP promoter regions related to water deficit. The study presents the first identification and characterization of 62 aspartic protease genes and proteins on the potato genome and provides the baseline material for functional gene determinations and potato breeding programs, including gene editing mediated by CRISPR.


Introduction
Aspartic proteases (APs) (EC 3.4.23) are widely distributed among living organisms and viruses [1][2][3][4]. These enzymes have been extensively studied and constitute one of the four superfamilies of proteolytic enzymes. APs are characterized by the presence of two aspartic acid residues located within the conserved Asp-Thr/Ser-Gly motif, responsible for catalytic activity [2,5,6].
The MEROPs database classified APs into 15 families based on their amino acid sequence similarity and grouped them into 5 different clans based on their evolutionary relationship and tertiary structure [7]. Plant APs belong to families A1, A3, A11, and A12 of clan AA and family 22 of clan AD [8]. Most of the plant APs belong to the A1 family, are active at acidic pH, are specifically inhibited by pepstatin A, and present a great structural diversity [8,9]. In silico analysis of the Arabidopsis thaliana genome revealed the presence of 51 genes that encode possible APs (AtAP), 46 of which presented different primary structure characteristics compared to the canonical forms. Based on the sequence of the active sites and the organization of domains, AtAPs are currently sorted into three groups or categories: typical, atypical, and nucellin-like [10].
Typical plant AP precursors present a protein domain of 100 amino acids long known as the plant-specific insert (PSI), which is highly similar to saposin-like proteins [8]. In monomeric typical APs, this domain still remains in the mature enzyme [11], while PSI is removed from the precursors upon activation in most of the heterodimeric typical APs enzymes [8]. Atypical and nucellin-like APs share several common features such as the absence of the internal segment plant-specific insert in their sequence, variable Nterminal regions, differences in active site environment, unusually high cysteine content and localization, and have different primary structure organizations from typical APs [10,12]. Coincidentally with A. thaliana, most potential APs sequences from grape and rice genomes correspond to atypical and nucellin-like APs [9]. Regarding typical APs, their expression has been determined simultaneously in several different plant organs, including flowers, leaves, roots, stems, and seeds, as well as in different stages of development [9,[13][14][15]. Typical AP functions have been associated with development, growth, lipid metabolism, and protein degradation [16,17].
In addition, we have demonstrated that leaves from A. thaliana plants overexpressing a typical AP (AtAPA1) showed changes in morphological and physiological features related to water loss deficit [45]. We concluded that AtAPA1 participates as intermediate in the ABA-induced stomatal closure, as well as in the stomatal density regulation conferring tolerance to mild water deficit.
For the last two decades, we focused our research on the biochemical and molecular characterization of Solanum tuberosum L. aspartic proteases (StAPs) [11,[46][47][48][49]. Additionally, we reported the role/s of two typical StAPs (StAP1 and StAP3) in the plant's mechanisms in response to water deficit and pathogen attack [45][46][47][48][49]. Potato is an ancestral crop domesticated at least 7000 years ago [50,51], and today it represents the third most important crop for human consumption and is the fourth most important crop in terms of production after wheat, rice, and maize (http://faostat.fao.org/ Accessed date: 25 October 2021). Our laboratory is in the Southeast of the Province of Buenos Aires, Argentina, one of the most important cultivation areas of S. tuberosum in this country. Therefore, increasing the knowledge about how potato plants respond to abiotic and biotic stress is crucial for farmers and breeding programs.
The aim of this work was to identify and characterize the AP genes present on the S. tuberosum genome, focused on abiotic stress conditions. Data obtained provide new knowledge about gene structure and protein organization, tissue, and subcellular localization of these enzymes in potato plants. In addition, this study provides the baseline material for functional gene analysis and for potato breeding programs, including gene-editing mediated by CRISPR.

StAPs Potato Genome-Wide Identification
The Hidden Markov Model (HMMER3.1) profile, built from 59 aspartic protease amino acid sequences from A. thaliana (AtAPs), was tested against the TAIR 10 2010-12-2014 peptide database rendering 78 AtAPs amino acid sequences up to an e-value of 1.5 × 10 −21 and a score of 77 on full sequence (Table S3). The HMMER 3.1 search allowed the detection of 149 amino acid sequences on the Potato Genome Sequence Consortium Doubled Monoploide 3.4 version (PGSC DM3.4) peptide database with a maximum e-value of 0.0024 and a score of 18.5 on full sequence (Table S4). These corresponded to 97 tentative StAP genes.
A total of 98 AP amino acid sequences codified by 62 AP genes were retrieved from the potato genome. The longest amino acid sequence was selected from each gene that was confirmed to have the Asp domain (PF00026), including two conserved aspartic acid catalytic residues (Figure 1). From these 149 amino acid sequences, there were 54 sequences with partial or absent Asp domain PF00026 (PFam database) and were thus not taken into consideration for further analyses. The corresponding genes, transcripts and protein names, lengths, and chromosome positions are presented in Table S5. This AP genome representation is close to what has been previously reported: 51 and 67 AtAPs reported in A. thaliana [10,52], 50 APs in V. vinifera (VvAPs) [9], 67 APs in P. trichocarp (PtAPs) [53], and 96 APs from O. sativa (OsAPs) [16].

Phylogenetic Analyses
In order to know the phylogenetic relationship between the 62 identified StAPs and previously reported APs, we constructed a phylogenetic tree with 51 known AtAP amino acid sequences from the plant model species A. thaliana [10] and 6 reference AP amino acid sequences: Nucellin (U87148), Procardosin A (CAB40134), StAP3, OsAP2 (AAK81699), Nepenthesin I (AB114914), and Phytepsin (CAA39602) (cited in Materials and Methods section).
The remaining clusters correspond to the rest of atypical APs: StAPIV includes five StAPs and the reference NP_565219.1 aspartic protease; StAPV includes seven StAPs and the reference NP_563851.1_AED3 aspartic protease; StAPVI includes six StAPs; StAPVII includes nine StAPs. Finally, StAPVIII includes 10 StAPs and the Nepenthensin I reference AP.

StAPs Domains and Conserved Motifs
In order to characterize StAPs at the primary structure level, the presence of domains and conserved motifs were analyzed on the 62 amino acid sequences (Figure 2A, B).
The main StAPs domains scanned with the Pfam database are shown in Figure 2A, while Figure 2B provides complementary primary sequence-structure information with motifs assessed with Multiple EM for Motif Elicitation (MEME) online software. The majority of StAPs presented a signal peptide and a pro sequence segment, in accordance with previous reports for typical, atypical, and nucellin-like aspartic proteases, [9,10,12,13,53,60]. Transmembrane signals at the C-terminal end were presented exclusively in Aps from StAPII and StAPIII groups. Figure 2A also showed aspartic domains (Asp) (PF00026) that contained the catalytic aspartate at the active site in all StAPs. Xylanase inhibitor domains (PF14543), called TAXI_N and TAXI_C [61][62][63], were found overlapping with the Asp domain, and they were thus not included in Figure 2A. Both TAXI domains were present in all StAPs except for the TAXI_C domain that was absent in typical StAPs. These domains were described as jointly necessary for creating a catalytic pocket for xylanase cleavage, a product generated by pathogens that destroys plant cells [62][63][64]. The role of these domains in aspartic proteases require further studies [64]. Another characteristic feature is the presence of the PSI domain or Saposin-like SapB 1 and SapB 2 domains [65] in all aspartic protease sequences from group StAPI, coincident with the reported for the close related typical AP reference sequences.
Regarding StAP motifs assessed with MEME online software ( Figure 2B), the number of motifs per protein ranged from 4 to 12, with a length between 15 and 50 amino acids long.
A total of six StAP sequences clustered in Group StAPI along with the typical aspartic proteases, presented at least six group exclusive motifs, such as 7, 8, 11, 15, 16, and 19. Motifs 8 and 16 were detected as part of PSI domains SapB_1 and SapB_2 respectively. While motifs 1, 8, 11, and 16 were characterized by the presence of conserved cysteine residues.
Subgroup StAPIIIb exhibited motif 20 at the N-terminal region in most sequences, and a remarkable characteristic is the substitution of hydrophobic by polar amino acid (glutamine) at the surroundings of the first catalytic site (Table S1).
Group StAVIII related to NepenthensinI presented 2 distinctive conserved motifs, motif 14 at the N-terminal region and motif 18, both of which were present in 7 out of 10 StAPVIII sequences ( Figure 2B).
Additionally, some motifs were shared by atypical and nucellin-like AP groups (Groups from StAPII to StAPVIII) but were absent in Typical StAPI group, such as motifs 6, 10, 12, and 13. Interestingly, motif 10 presented a segment that has been previously reported to be an important conserved sequence GCGYDQ in nucellin-like APs [10], but although the motif showed highly conserved Gly and Cys residues in the majority of StAPs, this six amino acid sequence was present in only two sequences of the nucellin-like, StAP6.1 and StAP6.2 (in StAPIIIb).
Motif 5 was present in StAPs from the majority of groups. Sequence inspection allowed the identification of a conserved Tyr residue of the flap region (Tyr75 of pepsin numbering [10]) in all 62 StAP sequences, although the motif was not present in groups StAPI and StAPII. It has been previously reported that nucellin-like APs had a conserved sequence QCDYE in the vicinity of Y75 [10]. Coincidentally, this sequence was found exclusively in StAPs that clustered with the reference nucellin-like APs in group StAPIIIa.
Finally, some motifs such as 1, 2, 3, and 4 were present in all eight StAPs groups, highlighting the importance of these conserved overall regions in aspartic proteases, while a deeper analysis showed group or subgroup distinctive amino acid sequences. Motif 2 was present at the C-terminal region of most 62 StAPs. A closer inspection of motif 2 confirmed the presence of HTVFD amino acid sequence exclusively in five out of six StAPs from Group StAPI. This short amino acid sequence was reported previously to be exclusive of typical StAPs [10]. Motif 4 comprises the GLIGL sequence reported to be in the loop that forms the active site [3,10].
Motif 1 and 3 contained the DT/SG from the first and second catalytic sites, respectively, and were both present in all 62 StAPs, except StAP5.3, which, although lacking motif 1, presented the DTG triad of the catalytic site.
The surroundings of the active sites of StAPs were distinctive and displayed the three characteristic variations reported previously for typical, nucellin-like, and atypical APs.
The main sequence features surrounding the characteristic first and second catalytic aspartate sequences and the C-terminal end that characterize typical, nucellin-like, and atypical aspartic proteases in potato were detected (as indicated in Table S1). The conserved amino acid signature assigned to typical aspartic proteases was found in StAPI at the first catalytic aspartate (hydrophobic-hydrophobic-hydrophobic (F)-DTG-serine-serine) and in the C-terminal region, which includes the characteristic saposin-like sequence HTVFD (except for StAP9.3). Most typical aspartic proteases presented a DSG sequence in the second catalytic aspartate, which was not present in StAP9.3 and StAP3.1, which had DTG instead. Moreover, the C-terminal end of amino acid sequences in this group confirmed the absence of a Cys residue, which was observed in most sequences of the rest of the groups.

Cysteine Distribution and Glycosylation Sites
Regarding the presence of cysteines in StAPs, the cysteine content was similar within AP groups, as previously reported [10,12]. It was found that 13 cysteines were the most frequent number, with an upper and lower limit of 9 to 17.
Mature typical APs present six conserved Cys residues involved in three disulfide bonds that are crucial to stabilizing the three-dimensional structure of the enzymes [66]. The plant-specific insert (PSI) also has six cys residues that form three disulfide bridges with the main role of conferring rigidity for antimicrobial activity [65]. In agreement, conserved Cys residues were present in all sequences from group StAPI. Most of the group StAPI conserved cysteine residues were present within motifs 1, 8, 11, and 16. Cys residues from motifs 8 and 16 belong to the PSI sequence.
The Cys-rich region located between the first catalytic site and the conserved Tyr75 (of pepsin numbering, [10]) described in atypical and nucellin-like APs was coincident with motifs 5 and 6. Motif 5 was present in StAPs from all groups except StAPI and StAPII and exhibits a conserved Cys residue, while motif 6 is present in all groups except in StAPI and has two conserved Cys of the Cys rich region.
Conserved Cys residues described for atypical and nucellin-like AP were also found in motifs 10, 12, and 13. Motif 17 in groups StAPIV to StAPVIII included a conserved Cys that was present in all 62 StAPs. Although the structures of both typical and nucellin-like proteins have not yet been determined experimentally, molecular modeling predictions for a Nepethesin I protein indicate that the number of disulfide bonds might be similar to those of typical APs [35].
N-linked glycosylation plays an important role in AP biological functions [67]. Typical three-dimensional AP structures have been solved and present two or even three conserved glycosylation sites [66]. StAPI N-linked glycosylation sites were present within the conserved motifs 7, 11, and 15. It was reported that the PSI of populus APs presents a glycosylation site, which was also predicted on StAP in motif 8 [53].
Although the structures of atypical and nucellin-like APs are yet to be available, the presence of a glycosylation site has been determined [53]. Interestingly, opposite to what is observed for the presence of glycosylation sites in typical APs within conserved motifs, an atypical AP glycosylation site was found to be in a non-motif region right before motif 10.

StAPs Cellular and Subcellular Predicted Location
Primary protein structure was employed to predict cellular and subcellular location of StAPs. Most StAPs were predicted to be localized on vacuole (29 StAPs), plasma membrane (27 StAPs), extracellular space (32 StAPs), Golgi apparatus (21 StAPs), and cytoplasm (14 StAPs), while a few were located in the nucleus (9 StAPs), chloroplast (3 StAPs), peroxisome (2 StAPs), and mitochondria (1 StAPs) (Table S7). About 85% of the 62 StAPs were predicted to be multi-located. StAPs' multilocation could be reflecting their possibility to act in different tissues and/or stress conditions. One of the functions of aspartic proteases is to process enzymes and includes turnover and enzyme activation [68].
Aspartic proteases from StAPI were preferentially predicted to be located on vacuole as other typical APs reported in barley [69], castor bean (Ricinus communis) [70], and A. thaliana [71]. To a lesser extent, typical APs were also predicted to be in the extracellular space coincident with those reported for tomato (Solanum lycopersicum) [72] and tobacco (Nicotiana tabacum) [73]. Finally, nucellin-like (Cluster StAPIIIa) and atypical aspartic proteases (Clusters StAPIIIb and StAPIV to StAPVIII) had been predicted to be in several cellular and subcellular locations, especially the plasma membrane, extracellular space, and the Golgi apparatus. Several examples of atypical and nucellin-like have widespread distribution in other plant species [17].

Chromosome Location
StAPs were distributed along all 12 potato chromosomes, the majority of them localized in chromosomes I, II, V, VI, VII, and VIII ( Figure 3). Some AP genes were grouped closely, and others were located in tandem arrays as it could be observed with StAP1.1 and StAP1.2; StAP2.5 and StAP2.6; StAP6.5 and StAP 6.6; StAP 7.1 to StAP 7.3; StAP 8.1 to StAP 8.6; StAP 8.8, and StAP 8.9. However, the StAP0.1 chromosome location could not be determined on S. tuberosum group Phureja DM1-3 516 R44 genome (version 4.3 [74]). In order to find its probable location, a blast search on the NCBI database was performed. The results obtained showed that this sequence matched a sequence on chromosome 4 of S. tuberosum cultivar Solyntus (sequence ID: CP055237.1, Id: 4000/4130(97%); gaps: 32/4130) and on the same chromosome in S. pinnelli, and in S. lycopersicum.
Segmental and tandem duplications have played a fundamental role in gene family expansions and functional diversification [75]. Wang et al. (2018) investigated the variation in gene family sizes across species and the likely factors contributing to the variation, using the Solanaceae family as a model and Pfam domain families as a proxy for gene families [76]. They found that genes in high-and low-variability gene families tend to be duplicated by tandem and whole genome duplication, respectively. This could be the case of StAP genes, where gene duplications and tandem arrays were found ( Figure 3, Table S2). Tandem array distribution has also been recently described in APs from A. thaliana [77], V. Vinifera [9], and P. trichocarpa [53].
As observed in the phylogenetic tree (Figure 2), clusters grouped highly similar proteins codified by genes located in different chromosomes. In some cases, it could be attributed to segmental duplications (blue lines in Figure 3), as it was confirmed on the potato genome duplication database (Table S2). In all these cases, a negative selection or purifying selection (Ka/Ks<1) was found with a Ka/Ks 0.211 to 0.24. We also tested other potential gene segmental duplications by using the criteria of a) the length of the aligned sequence covers (at least 75% of the longer gene) and b) a similarity of aligned regions equal or superior to 75% [78]. We found potential cases of segmental duplications for some pairs of genes: StAP5.4 and StAP9.4; StAP12.1 and StAP4.4; StAP1.3 and StAP10.1 (green lines in Figure 3), which presented a Ka/Ks<1 from 0.14 to 0.18 and approximately a duplication date of 24.9 MYA [79].  [50]. Blue lines connect genes in segmental duplications blocks previously reported [80], and green fine lines represent possible cases of segmental duplication detected.

Segmental and tandem duplications observed in StAPs genes (approximately 19 of 62)
were lower than what was reported in AP genes from P. trichocarpa (52 out of 62), however, both segmental duplication events were under purifying selection force. It had been proposed that this could mean the retention of their ancestral functions and redundancy, which will also be related to the sequence environment where they were located [16,53]. Thus, the sequence environment plays an important role and could lead to changes in gene expression patterns of the duplicated copies, as could be the cis-acting regulatory elements that would be analyzed in the next section.

StAP Gene Structure
Structural gene analyses were performed comparing exon-intron organization based on the predicted longest primary transcript structure of the 62 StAP nucleotide sequences. Twenty-seven intronless StAP genes were found, representing almost 44% of the total StAP genes ( Figure 4). These were mostly grouped in clusters related to atypical APs, such as StAPV, StAPVI, StAPVII, and StAPVIII. The other four clusters contained StAP genes with numerous introns (up to 13), included mainly in StAPI (related with typical Aps), StAPII (atypical), and StAPIIIa and StAPIIIb (related with nucellin-like and atypical Aps, respectively). A similar intron-exon distribution among categories has been reported in rice [16], populus [53], and A. thaliana [77].
The expression analyses of 62 StAPs were retrieved from the PGSC DM1.3 RNASeq database to characterize StAPs and to identify candidate genes that could be involved in different abiotic stresses.
There were five genes without expression in any of the reported conditions or tissues (StAP7.1, StAP8.2, StAP8.3, StAP8.4, StAP8.6), which could mean that they might be induced in other conditions such as other plant physiological stages or stress sources. Alternatively, the lack of expression might suggest that they could in fact be pseudogenes. Interestingly, four of these genes were grouped in StAPVIII with the other five genes that were expressed in mesocarp, endocarp, and/or fruit, which points out a tight regulation.
An exception to the above-mentioned behavior of group StAPVIII was StAP5.6 (homolog to Nepenthesin I), which showed expression in several tissues including root, stem, callus, immature fruit, and upregulation after abiotic stress (including salt, mannitol, and temperature treatments) and under hormonal treatment mainly by Gibberellic acid (GA3) and 6-Benzylaminopurine (BAP). This broad expression profile could be related to a role as processing and/or degradative enzyme [8].
Approximately 21% of StAPs modified their gene expression profile under salt, osmotic, or temperature treatments. It was observed that those StAP genes with increased expression after NaCl and mannitol treatments also presented increased transcription profiles after GA3, BAP, and Abscisic Acid (ABA) treatments. The induced genes included   [50]. The phylogram was built with the iTol program [81,82]. Names of genes are colored based on the group membership in Figure 1.
Regarding heat treatments, 13 genes presented changes in their expression profile. Most of these genes were repressed except for StAP7.5, StAP2.6, StAP2.1, and StAP8.7, which exhibited different degrees of induction. Interestingly, some AP genes, orthologous to the ones described in this work, were expressed in similar environmental stress conditions. In a transcriptomic study, it was reported that an AtAP gene At1g62290 (a possible ortholog to StAp7.5) was induced in plants overexpressing the heat shock transcription factor HSfA2 [89]. Interestingly, HSfA2 overexpressing plants were tolerant to high temperatures, as well as to both salt and osmotic stress conditions. The authors did not analyze the role of these AP genes, but they concluded that genes that resulted regulated by HSfA2 could be implicated in multiple stress tolerance responses. Expression analysis of the AP gene superfamily in grape (V. vinifera.) identified a total of 19 VvAPs that responded to at least one abiotic stress condition. Specifically, the VvAP21 gene was upregulated in plants treated with NaCl 250 mM during 24 h, in accordance with its possible ortholog, StAP1.6 [9]. In another work, the same group overexpressed VvAP17, obtaining enhanced salt and drought stress-tolerant plants [85]. In detail, overexpressed VvAP17 lines exhibited an incremented level of the ABA hormone, a reduced stomatal aperture size, longer primary roots, as well as higher activities of several antioxidant enzymes such as catalase, superoxide dismutase, and peroxidase. A possible ortholog to VvAP17 is StAP4.3, which also showed upregulation after salt and mannitol treatment. These results suggest that StAPs hormonal expression profiles are related to abiotic stress.

Cis-Acting Regulatory Elements
Cis-acting elements were identified in silico in promoters of S. tuberosum aspartic proteases ( Figure 5, Table S8), several of which correspond to abiotic and biotic stress responsiveness: anaerobic response element (ARE), enhancer element involved in specific anoxic inducibility (GC-motif), low-temperature response element (LTR), "cis-acting element involved in dehydration, low-temp, salt stresses" (DRE), water response elements (Myb, AT-rich element), drought response elements (DRE core, Dre1, as-1, MyC), MYB binding site involved in drought inducibility (MBS) and its drought stress-related MYB recognition site, "heat, osmotic stress, low pH, nutrient starvation stresses response element" (STRE), repetitions rich in TC involved in defense and response to stress (TC-rich repeats), and wound and pathogen response elements (WUN-motif, WRE3, W-box, Box S). In addition to stress-related elements, aspartic proteases presented motifs associated with phytohormone response: elements sensitive to abscisic acid (ABRE, ABRE2, ABRE3a, ABRE4), ABA and GA response element (CARE), cis-acting elements involved gibberellin response (GARE motif, P-box, and TATC box), auxin-responsive elements (AuxRR-core, TGA-box, TGA element), cis-acting regulatory element involved in the MeJA response (CGTCA-motif), ethylene-responsive element (ERE), and cis-acting elements involved in the response to salicylic acid (SARE, TCA, TCA element). The majority of StAP coding genes presented more than one stress-related and phytohormonal-responsive cis-element in their promoter regions, some of them in very high copy numbers. In general, their presence correlates with the StAPs expression profiles observed in Figure 4.
StAPs that expressed differentially under NaCl and mannitol treatments (such as StAP1.6, StAP11.1, StAP5.3, and StAP8.11, among others) presented several cis-acting elements related to water, drought, salt, and osmotic stresses (AT-rich element, AT-rich sequence, DRE, DRE1, as-1, MBS, Myb, MYB recognition site, MyC, STRE). In addition, StAP1.6, StAP8.11, and StAP11.1 were also induced by ABA and presented cis-acting elements sensitive to abscisic acid, some of them in multiple copies (StAP1.6 has two copies of ABRE in the positive strand and three copies of ABRE, two of ABRE2, and one ABRE3a element in the negative strand; StAP8.11 has one ABRE4 in the positive strand and ABRE and ABRE3a in the negative strand, and StAP11.1 has one ABRE element in the negative strand). StAP1.6, which presented the highest copy number of abscisic acid-responsive elements, exhibited major induction. It has been described that more than one copy of ABRE is necessary for ABA-dependent gene expression [94].
Regarding heat stress response, promoters of StAPs induced by heat (StAP7.5, StAP2.6, StAP2.1, and StAP8.7) presented one or more copies of the heat, osmotic stress, low pH, and nutrient starvation stresses response element (STRE). P. infestans-responsive StAPs presented one, or in some cases two, cis-acting elements involved in defense, wound and pathogen responses (Box S, W box, WRE3, WUN-motif, TC-rich repeats). Salicylic acid-responsive elements (SARE, TCA, TCA-element) were present in three out of six upregulated and in five out of six downregulated genes under BTH treatments.
The presence of cis-acting regulatory elements found in StAP promoter regions is in agreement with previously reported roles of APs in water deficit, osmotic stress, and ABA signaling [45,85].

AP Identification and Characterization
A profile of Hidden Markov Model (HMMER 3.1b2, [95]) was created based on 59 aspartic protease amino acid sequences from A. thaliana [10]. These sequences were aligned with Multiple Alignment using Fast Fourier Transform (MAFFT) [96], and five of them were eliminated in order to improve the profile. The resulting alignment was trimmed on both ends, removing regions where the alignment was poor due to the variable nature. The HM-MER profile was validated on the Arabidopsis TAIR database [97] (TAIR10_pep_20101214/) in accordance with sequences described [10,77].
StAPs were named according to their chromosome localization with a first number that indicates the chromosome and a second number that accounts for its relative position within the chromosome (i.e., the aspartic protease localized on the most upstream position of chromosome 2 is named StAP2.1).
Alignments and phylogenetic trees were performed with MAFFT [96]. A phylogenetic tree was generated using Simple Phylogeny with the neighbor-joining clustering method with 100 bootstrap replications and default parameters [100].

Domain Structure Analysis, MEME Motifs, and Subcellular Localization
AP protein domains and active sites were analyzed in Pfam database [101] and Scanprosite tool [102]. The Multiple EM for Motif Elicitation (MEME) online tool v5.2.0 [103] was employed to identify conserved motifs of StAP proteins. Parameter settings were as follows: optimum motif width was set to 6-50, the maximum number of motifs was set to 20, with expectation motif site distribution set to any number of repetitions and all other parameters set by default. SignalP V4.1 was used for identifying putative signal peptides [104]. PredoTar V1.04 [105], TargetP4.0 [106], ProtComp 9.0 [107] and YLOC [108,109] with default settings were used to predict signal sequences to organelles and other subcellular localizations [110].

Gene Structure Analysis
AP genes were checked for intron and exon structure based on primary transcripts from the PGSC [56].

Chromosome Localization and Duplication Analyses
A physical map was constructed with potato StAPs gene localization obtained from the potato sequenced genome PGSC database (http://potato.plantbiology.msu.edu/data/ PGSC_DM_V403_genes.gff.zip accessed date: 15 October 2020 [74]. Plots were generated with a custom gene viewer based on HTML5 Canvas, JQuery, and Kinetics. Js developed by the Agrobiotechnology Laboratory from INTA EEA-Balcarce. For duplication analyses, genome tandem duplications were defined for those StAP genes falling within 50 Kb of one another [53]. While duplicated blocks were downloaded from the Plant Genome Duplication Database (PGDD) [80], the number of Ka and Ks values of non-synonymous and synonymous nucleotide substitutions, respectively, were extracted from PGDD, and Ka/Ks ratio was calculated. The duplication time was estimated according to the formula: T = Ks/2λ, assuming a clock-like rate (λ) was 6.9 × 10 −9 substitutions/site/year for S. tuberosum [111].

Analysis of Cis-Acting Elements
Cis-acting elements were detected in the promoters of StAP by in silico analysis using the web software PlantCARE [112,113]. Upstream sequences of AP genes, −3000 bp to +103 bp from the start codon, were retrieved from the PGSC v4.03 database [74].

Conclusions
In the present study, we presented the first identification and description of 62 aspartic protease genes on the potato genome. The data obtained provide new knowledge about gene structure, chromosome localization, expression, and the evolution of the aspartic protease family in S. tuberosum, as well as S. tuberosum aspartic protease (StAP) protein organization, phylogenetic relations, and tissue and subcellular predicted localization.
Phylogenetic analysis revealed that StAPs are distributed in eight groups, named from StAPI to StAPVIII, that were differentiated into typical (StAPI), nucellin-like (StAPIIIa), and atypical aspartic proteases (StAPII, StAPIIIb to StAPVIII). Specific and common domains and Multiple EM for Motif Elicitation (MEME) motifs were defined among these groups.
StAPs were distributed along all 12 potato chromosomes. Segmental and tandem duplications evidenced in the present work, might have contributed to the expansion of the potato AP gene family.
A total of 27 intronless StAP genes were found, representing almost 44% of the total StAP mainly on atypical AP.
The RNAseq analysis revealed that 26 out of 62 genes presented a constitutive expression in different tissues or organs. Some StAPs revealed preferential expression and tissue-specific patterns. Approximately 21% of StAPs modified their gene expression profile under salt, osmotic, or temperature treatments. Gene expression was consistent with the presence of cis-acting regulatory elements on StAP promoter regions.
This study leads to a better understanding of the StAP family and provides the baseline material for future studies tending to determine the functional assignment of the identified genes, as well as the redundancy or diversification of functions in duplicate and tandem genes. Additionally, the knowledge of the diversity of StAP genes and their potential functions can be a source of information for future researchers whose objective is potato breeding through new biotechnology techniques such as gene editing mediated by CRISPR.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/plants11040544/s1, Table S1: Partial amino acid sequence alignment of Solanum tuberosum APs and Arabidopsis thaliana APs (with known function) and six references APs,   Data Availability Statement: All data generated or analyzed during this study are included in this published article [and its supplementary information files] or are available from the corresponding author on reasonable request.