2.1. Broadening the Sequence Data Set of the Aptamer Pool Selected for Protein A
Previously, we described the process of DNA aptamer selection for Protein A, a structural component of the cell wall of the bacterial pathogen S. aureus ]. The aptamer pool obtained after 11 rounds of the FluMag-SELEX procedure was further processed in the classical way by cloning and Sanger sequencing of individual aptamer clones. A data set of 88 sequences (Sanger data pool) was obtained (Figure 1). Identical and homologous sequences were grouped together, resulting in 12 sequence groups. Each group consists of 2–8 sequences (47 sequences in total, 53.4%). In addition, 41 orphans (46.6%) were found that show no homology to any of the identified groups (Figure 1). Sequences from all groups were tested for target binding. Only one group, represented by aptamer PA#2/8 (8 group members, 9.1%), could be confirmed to bind Protein A. These are rather unusual outcomes of a SELEX process, indicating the selection of a very heterogeneous oligonucleotide pool that lacks a dominant sequence family.
Therefore, we additionally employed next generation sequencing (NGS) technology for direct sequencing of the same aptamer pool to obtain a larger sequence data set. Bioinformatics tools were used to obtain information on the number and frequency of individual sequences, their grouping based on sequence similarity, and group size and complexity. After pre-processing of the raw data set, 2597 sequences (NGS pool) remained to be analyzed by a two-step clustering and alignment method (Figure 2A,B and Figure S1). All identical sequences were clustered first, resulting in a reduced non-redundant pool of 1420 sequences. This pool included 286 clusters with a size of ≥2 sequences (1463 sequences in total, 56.3%) and 1134 orphans (43.7%). All clusters with a size of ≥15 sequences (15 clusters in total, C0–C14) are highlighted in Figure 2A. Seven representatives of sequence groups from the Sanger data pool (Figure 1) could be identified among these 15 clusters of the non-redundant NGS pool; they are given to the right of the cluster boxes in Figure 2A. Moreover, the four largest groups and clusters of both data pools are represented by the same aptameric sequences: PA#2/8 (C1), PA#4/22 (C3), PA#4/34 (C0) and PA#2/11 (C2). The remaining 8 of the 15 clusters were recognized as new aptameric sequences, because they were not present in the Sanger data pool. In addition to the 7 Sanger aptamer groups, all sequences that occurred twice and 22 orphans of the Sanger data pool could also be identified in the non-redundant NGS pool (Figure S2). Only 19 orphans were not present in the NGS pool. These results are consistent with those of the Sanger data pool and confirm the selection of a highly complex oligonucleotide pool. The proportion of orphans and the number of small clusters with a size of 2–5 sequences are very high. The relative frequency distribution of individual sequences in the NGS pool is comparable to that found for the Sanger data pool; likewise, no single sequence stands out by its frequency.
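The first clustering step, which collapses identical reads into a non-redundant pool and separates clusters (≥2 copies) from orphans, can be sketched in Python as follows. This is a minimal illustration that assumes the reads have already been pre-processed into plain sequence strings; the function names `dereplicate`, `summarize` and `shared_with` are hypothetical and are not part of the pipeline shown in Figure S1.

```python
from collections import Counter

def dereplicate(reads):
    """Collapse identical reads (hypothetical helper).

    Returns the copy number of every unique sequence, the clusters
    (sequences seen >= 2 times) and the orphans (seen exactly once).
    """
    counts = Counter(reads)
    clusters = {seq: n for seq, n in counts.items() if n >= 2}
    orphans = [seq for seq, n in counts.items() if n == 1]
    return counts, clusters, orphans

def summarize(reads):
    """Pool statistics in the form used in the text (counts and percentages)."""
    counts, clusters, orphans = dereplicate(reads)
    total = len(reads)
    in_clusters = sum(clusters.values())
    return {
        "non_redundant": len(counts),
        "clusters": len(clusters),
        "seqs_in_clusters": in_clusters,
        "pct_in_clusters": 100 * in_clusters / total,
        "orphans": len(orphans),
        "pct_orphans": 100 * len(orphans) / total,
    }

def shared_with(reference, reads):
    """Reference sequences (e.g., Sanger clones) that also occur in the reads."""
    pool = set(reads)
    return [seq for seq in reference if seq in pool]

# Toy example with made-up reads (not data from this study):
reads = ["AAGG"] * 3 + ["CCTT"] * 2 + ["GGGA", "TTAC"]
print(summarize(reads))
print(shared_with(["CCTT", "ACGT"], reads))  # ['CCTT']
```

Any such summary must satisfy two identities, which the numbers reported above obey: clusters plus orphans equals the non-redundant pool (286 + 1134 = 1420), and sequences in clusters plus orphans equals the total number of reads (1463 + 1134 = 2597).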
A second clustering of the non-redundant NGS pool based on 85% sequence identity was performed, followed by a final alignment. Combining all results from clustering and alignment aimed at forming groups of homologous sequences from the whole NGS pool. The focus was on groups with at least 15 sequences, which are highlighted in Figure 2B. Fifteen groups (see also Figure 3) meet this condition and contain 790 sequences in total. Another 1025 sequences belong to clusters but have not yet been aligned for grouping, and 782 sequences (30.1%) remain orphans at this stage of data analysis.
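The second clustering step groups non-redundant sequences that share at least 85% identity. The section does not state which clustering tool or identity definition was used, so the sketch below rests on assumptions: a greedy centroid scheme in the style of CD-HIT, visiting sequences by decreasing copy number, with identity approximated as 1 − (Levenshtein distance / length of the longer sequence). All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions + insertions + deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion from a
                            curr[j - 1] + 1,           # insertion into a
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]

def identity(a, b):
    """Assumed identity measure: 1 - distance / length of the longer sequence."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))

def greedy_cluster(counts, threshold=0.85):
    """Greedy centroid clustering of a non-redundant pool.

    counts maps each unique sequence to its copy number. Sequences are
    visited from most to least abundant; each joins the first existing
    centroid it matches with identity >= threshold, otherwise it becomes
    the centroid of a new cluster. Returns {centroid: [members]}.
    """
    clusters = {}
    for seq, _ in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        for centroid, members in clusters.items():
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters

# Toy pool (made-up sequences): the one-substitution variant (95% identity)
# joins the abundant centroid, the poly-T sequence founds its own cluster.
pool = {
    "ACGTACGTACGTACGTACGT": 5,
    "ACGTACGTACGTACGTACGA": 2,
    "TTTTTTTTTTTTTTTTTTTT": 1,
}
for centroid, members in greedy_cluster(pool).items():
    print(centroid, len(members))
```

Comparing every sequence with every centroid by full dynamic programming is slow for large pools; dedicated tools such as CD-HIT use short-word filters to skip most of these comparisons.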
The frequency distribution of sequences from the first clustering is reflected in the ranking of the identified sequence groups, although some significant changes could be observed. The four most abundant sequences in the NGS pool also represent the largest groups: group 1 (PA#2/8) and groups 3–5 (PA#4/22, PA#4/34, PA#2/11). Group 2 is an exception; with 85 sequences (3.3%), it is represented by a new aptameric sequence, PA-C10. All new sequences that are not present in the Sanger data pool were named according to their cluster number (C…) from the first clustering. PA-C10, for example, comes from cluster C10 with 20 identical sequences. Group 2 also contains a related sequence that is already known as orphan PA#14/89 from the Sanger data pool and now occurs six times in the NGS pool (Figure 3). By far the largest group, with 247 sequences (9.6%), is group 1, represented by PA#2/8. This sequence is the only aptamer from the Sanger data pool able to bind Protein A. Representatives (PA#14/82, PA#2/3 and PA#2/6) of 3 further groups are known as multiple sequences from the Sanger data pool. Seven groups consist entirely of new aptameric sequences. Therefore, in contrast to the Sanger data pool, the NGS pool is clearly dominated by one sequence group.
2.2. Group Complexities and Consensus Sequences
Group 1 shows the highest sequence variability. Its 247 sequences are represented by 20 differently sized clusters and 50 orphans (Figure 3). The most frequent sequence in this group, PA#2/8 (cluster C1), was already analyzed in our previous work concerning its structural features and binding ability to Protein A [30]. It is characterized by four G-rich regions (GGGGG-D5-GG) and is thus able to form a G-quadruplex structure, which we have demonstrated by circular dichroism spectroscopy [31]. Differences between the sequences of group 1 often concern these G-rich regions, especially the number of guanine residues (Figure S3). The size of each region varies, but is dominated by two variants (region 1 with G5–6) or by only one variant (regions 2–3 with G5 and region 4 with G2) (Figure S4). Group 1 contains two further clusters, C4 and C7, that are also large clusters with ≥15 sequences. Cluster C4 (41 sequences) has the same G-profile as PA#2/8 (cluster C1), but differs at one nucleotide position in the sequence between G-rich regions 1 and 2. In contrast, cluster C7 (25 sequences) differs from PA#2/8 only by containing 6 guanine residues in region 1. The high variability of this group may be caused by sequencing artefacts. The Roche 454 GS FLX system is known for a relatively high error rate in homopolymer regions, with insertions as the most common type of error [33]. Long stretches of more than three identical bases are particularly affected, which is the case for at least two of the G-rich regions found, with 5–6 guanines each.
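The G-profiles compared above (number and length of G-stretches) and the homopolymer stretches that are prone to 454 insertion errors can be extracted with regular expressions. The following is a minimal sketch: the toy sequence is made up to show a G5/G5/G5/G2 pattern and is not PA#2/8, the threshold of more than three identical bases follows the text, and the per-region tally assumes, as a simplification, that G-stretches can be matched between sequences by their order.

```python
import re
from collections import Counter

def g_profile(seq, min_len=2):
    """Lengths of all G-stretches of >= min_len bases, in 5'->3' order."""
    return tuple(len(m.group()) for m in re.finditer("G{%d,}" % min_len, seq))

def long_homopolymers(seq, max_len=3):
    """(start, run) for every single-base run longer than max_len."""
    return [(m.start(), m.group()) for m in re.finditer(r"([ACGT])\1*", seq)
            if len(m.group()) > max_len]

def region_size_tally(group, region=0):
    """How often each G-stretch length occurs at one region across a group."""
    return Counter(p[region] for p in map(g_profile, group) if len(p) > region)

toy = "TT" + "GGGGG" + "ATC" + "GGGGG" + "TA" + "GGGGG" + "CA" + "GG" + "TT"
print(g_profile(toy))          # (5, 5, 5, 2)
print(long_homopolymers(toy))  # [(2, 'GGGGG'), (10, 'GGGGG'), (17, 'GGGGG')]

# A G6 variant of region 1, analogous to the C7-type difference described above
toy_g6 = toy.replace("TTGGGGGA", "TTGGGGGGA", 1)
print(region_size_tally([toy, toy_g6]))
```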
Besides group 1, groups 2 and 6 also contain several stretches of guanines, but in a completely different pattern (Figure 3 and Figure S5). The sequence variability in these groups is much lower than in group 1, but higher than in the other groups. Sequence alignments identified two consensus regions between PA-C8 (cluster C8) from group 6 and PA#2/8 from group 1 that overlap three G-stretches in both sequences (Figure S6). In contrast, no consensus regions were found between group 2 (PA-C10, cluster C10; PA#14/89, cluster C60) and PA#2/8 or PA-C8.
Figure 4 shows the predicted 2D-structures of the full-length sequences representing the three G-rich sequence groups. The close relationship between PA#2/8 and PA-C8 is supported by a common structural element at their 5′-ends forming a stem-loop. This element includes the 5′-primer binding region and was found to be essential for the target binding of PA#2/8 [30].
The other 12 groups are distinguished by their diversity, without significant consensus regions (Table S1).
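Consensus regions between two aptamers, such as those found between PA-C8 and PA#2/8, are typically located by local sequence alignment. The section does not name the alignment tool, so the following is only an illustrative Smith-Waterman sketch with a linear gap penalty; the scoring values (match +2, mismatch −1, gap −2) and the toy sequences are assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best-scoring local alignment of a and b (linear gap penalty).

    Returns (aligned_a, aligned_b, score); '-' marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), best

# Two made-up sequences sharing a G-rich core flanked by unrelated bases
print(smith_waterman("TTTTGGGGGACGTTTT", "CCCCGGGGGACGCCCC"))
# ('GGGGGACG', 'GGGGGACG', 16)
```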
2.3. Functional Screening of Identified Aptamer Groups
The representatives of the 15 largest groups of homologous sequences identified in the NGS pool were screened for their individual ability to bind Protein A. The SPR-based Biacore X100 instrument was used for these comparative interaction analyses, in which the 5′- or 3′-biotinylated aptamers were immobilized on a streptavidin-modified sensor surface. The results of the interactions with Protein A are shown in Figure 5, revealing new functional aptamers in addition to the previously published ones. Besides group 1 with the already known aptamer PA#2/8, two further groups are able to bind Protein A: group 2 with PA-C10/PA#14/89 and group 6 with PA-C8. PA#14/89 is also known from the Sanger data pool but was not identified as a binding sequence at that time. The other 12 groups do not show any interaction with Protein A. This indicates a high proportion of sequences in the selected aptamer pool that are unable to bind the target, thus confirming the screening results of the Sanger data pool as described in Stoltenburg et al. [30]. Other proteins such as streptavidin, bovine serum albumin (BSA) and immunoglobulins (human IgG, rabbit IgG) were applied to investigate alternative specificities of the selected sequences. However, none of the 15 groups was able to bind these proteins. Thus, no cross-reactivity was found for the Protein A-targeting aptamers from groups 1, 2 and 6. On the other hand, it remains unclear which kind of background binders were co-enriched during the aptamer selection for Protein A. Additional specificity analyses with the aptamers from groups 2 and 6 show that they are also able to effectively distinguish Protein A from the functionally related proteins Protein G and Protein L (Figure S7). This is in accordance with previously obtained results for PA#2/8.
Figure 5 also reveals effects of the immobilization site on the functionality of the Protein A-binding aptamers. As known for PA#2/8, PA-C8 also shows strongly reduced binding when immobilized via its 5′-end. Both aptamers exhibit similar structural elements and may therefore have comparable binding behavior. In previous studies, we comprehensively analyzed the binding behavior of PA#2/8 under different assay designs [30]. We could verify that a freely accessible and intact 5′-end of this aptamer is essential for correct folding into the functional structure. The complex three-dimensional conformation is a general prerequisite for the functionality of each aptamer and is important for its specific interaction with the target. Stepwise truncation experiments with PA#2/8 have shown that the stem-loop structure at the 5′-end, involving the 5′-primer binding site, is essential for binding of the aptamer to Protein A. Removing it leads to a complete loss of function. Related to this, immobilization of PA#2/8 via the 5′-end also leads to reduced binding ability. It is known that immobilization may alter the functionality of aptamers in a specific assay; e.g., the molecular flexibility of an aptamer may be restricted, impeding its correct folding, or the accessibility of the aptameric binding structure to the target may be limited. Interactions between neighboring aptamers may also interfere with aptamer folding, resulting in inhibition of target recognition [36]. Varying the immobilization density and distance may counter such negative effects. Knowledge about the specific performance of an aptamer under different conditions is crucial for its successful application, e.g., for the optimal spatial arrangement of the aptamer in the assay or on the sensor surface to achieve the most effective target binding and signal formation.
In contrast, the interaction of group 2 aptamers with Protein A is only slightly affected by the immobilization site.
We could refine the results of our previous SELEX experiment with this medium-throughput sequencing approach by expanding the data set 30-fold and identifying new aptameric sequences in the enriched DNA pool. In recent years, NGS has increasingly been integrated into SELEX experiments in various ways [19]. Schütze et al. performed a SELEX experiment over 10 selection rounds and applied both Sanger sequencing and NGS [20]. They identified specific binders among several Sanger-sequenced clones of the tenth round and were interested in the dynamics of these clones during the whole selection process, using high-throughput sequencing of all selection rounds. They found that the identified binders started to enrich from round 3, whereas the complexity of the DNA pools dropped dramatically after round 4. Unique sequences that occurred after round 5 seem to be derivatives of strongly enriched clones generated by mutation or sequencing artifacts. The authors also found that the frequency of specific binders often reached a maximum by round 6 or 7 and then tended to decrease in the following rounds. Interestingly, the most abundant sequences in the final selection round did not correlate with the strongest binding behavior. Similar observations were described by Berezhnoy et al., who confirmed that the best binders for the given target, identified by high-throughput sequencing after five SELEX rounds, progressively disappeared in further rounds (16 rounds in total), while weak binders became more enriched [39].
Other researchers focused on an extremely reduced SELEX process, with three down to only one selection round, in combination with high-throughput sequencing and comprehensive sequence analysis for rapid aptamer development [40]. The application of capillary electrophoresis (CE) makes it possible to strongly reduce the number of SELEX rounds. CE is known for its high partitioning efficiency and has therefore been successfully introduced in SELEX experiments for separating unbound oligonucleotides from target-bound oligonucleotides during the aptamer selection process (as reviewed in [9]). For example, Riley et al. combined CE with NGS [43]. The authors demonstrated the identification of thrombin-binding aptamers from a spiked SELEX library during a single round of CE-based selection, directly followed by NGS and data analysis for aptamer identification and frequency distribution. The aptamer content could be increased from 0.4% in the original library before selection to >15% in the CE-selected fraction. A similar strategy was applied for the de novo selection of aptamers for vitronectin [44].
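The enrichment achieved in that single CE-based round can be expressed as a fold change in aptamer content. The helper name below is hypothetical, and the calculation only restates the two percentages given above; because >15% is a lower bound, the result is a lower bound as well.

```python
def fold_enrichment(fraction_before, fraction_after):
    """Fold change in the fraction of target-binding sequences (hypothetical helper)."""
    return fraction_after / fraction_before

# 0.4% aptamer content before selection, >15% after one CE-based round
print(f"at least {fold_enrichment(0.004, 0.15):.1f}-fold")  # at least 37.5-fold
```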
More recently, Valenzano et al. applied high-throughput sequencing and bioinformatics analysis at specific stages of a multiple-round SELEX process at which the selection conditions were changed [45]. They aimed to understand in depth the effects of increasing stringency on the enrichment of target-specific aptamers and their dynamics over the course of 21 SELEX rounds. In the end, they identified high-affinity aptamers for the small molecule tyramine from the largest sequence clusters of the last selection round. Soldevilla et al. combined high-throughput sequencing with a strategy of Conserved Motif Accumulation (CMA) [46]. They applied NGS after the last two rounds (6–7) of their SELEX experiment and identified the five most abundant aptamers. Based on a motif analysis, the authors postulate that aptamer species with a higher accumulation of potential binding motifs are more likely to be better binders.
These different approaches exemplify the multitude of possibilities for integrating next generation sequencing into SELEX experiments with the aim of improving the process of aptamer development. They also underline that aptamer development remains a complex process for which no simple universal method or strategy exists.
2.4. Comparative Affinity Studies of Protein A-Targeting Aptamers
The Biacore X100 was also used to analyze the affinities of the screened candidate aptamers from groups 1, 2 and 6 for Protein A. Concentration series of recombinant and native Protein A in the range of 10–8000 nM were applied for binding to the immobilized aptamers. The best-binding aptamer is PA#2/8 (group 1), with steady-state affinities in the low nanomolar range. KD values of 20 ± 1 nM for native Protein A and 92 ± 12 nM for recombinant Protein A were calculated from saturation curves of the binding data at the end of the binding phases (Figure 6A). These KD values are significantly lower than those described previously; they were achieved by optimizing the interaction conditions, such as thermal equilibration of the aptamers and lowering their immobilization level on the sensor surface. The aptamer is also characterized by very stable binding to Protein A, visualized by the slow dissociation shown in the sensorgrams in Figure 6.
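Steady-state KD values such as these are obtained by fitting the equilibrium (end-of-association) responses to a 1:1 saturation model, R_eq = Rmax·C/(KD + C). The sketch below is a generic least-squares fit in plain Python, not the Biacore evaluation software used in this work. The concentration series, the noise-free responses and the helper names are synthetic or hypothetical, and the golden-section search assumes a single minimum within the search range.

```python
import math

def langmuir(c, kd, rmax):
    """1:1 steady-state binding: R_eq = Rmax * C / (KD + C)."""
    return rmax * c / (kd + c)

def best_rmax(conc, resp, kd):
    """Least-squares Rmax for a fixed KD (the model is linear in Rmax)."""
    f = [c / (kd + c) for c in conc]
    return sum(y * fi for y, fi in zip(resp, f)) / sum(fi * fi for fi in f)

def sse(conc, resp, kd):
    """Residual sum of squares with Rmax profiled out."""
    rmax = best_rmax(conc, resp, kd)
    return sum((y - langmuir(c, kd, rmax)) ** 2 for c, y in zip(conc, resp))

def fit_steady_state(conc, resp, kd_min=1e-2, kd_max=1e6, iters=200):
    """Golden-section search for KD on a log10 scale; returns (KD, Rmax).

    KD is returned in the units of conc.
    """
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = sse(conc, resp, 10 ** x1), sse(conc, resp, 10 ** x2)
    for _ in range(iters):
        if f1 < f2:                      # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = sse(conc, resp, 10 ** x1)
        else:                            # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = sse(conc, resp, 10 ** x2)
    kd = 10 ** ((lo + hi) / 2)
    return kd, best_rmax(conc, resp, kd)

# Synthetic, noise-free responses (RU) for a hypothetical KD of 92 nM
conc = [10, 50, 100, 250, 500, 1000, 2000, 4000, 8000]   # nM
resp = [langmuir(c, 92.0, 100.0) for c in conc]
kd, rmax = fit_steady_state(conc, resp)
print(f"KD = {kd:.1f} nM, Rmax = {rmax:.1f} RU")
```

Profiling out Rmax reduces the fit to a one-dimensional search over KD, which keeps the sketch free of external dependencies; with real, noisy data a general nonlinear least-squares routine would normally be used instead.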
Besides PA#2/8, two other aptameric sequences, PA-C4 and PA-C7, are very frequent in the group and were therefore analyzed regarding their ability to bind Protein A. A very similar binding behavior was observed for PA-C4, with a slightly lower affinity of 222 ± 22 nM for recombinant Protein A (Figure 6B). Compared with the sequence of PA#2/8, the nucleotide exchange in PA-C4 at one position between G-rich regions 1 and 2 (G5) has only little effect on the binding ability. In contrast, an additional guanine in G-rich region 1 of PA-C7 (G5) has a strongly negative effect on binding to Protein A, resulting in a decreased affinity of 1614 ± 94 nM (Figure 6C). This was unexpected, because both size variants of G-rich region 1 are highly frequent among the sequences of group 1 (G5 with 141 sequences and G6 with 102 sequences). PA-C7 could therefore be the result of a particular type of sequencing error concerning homopolymers, as mentioned above.
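How much weaker PA-C4 and PA-C7 bind recombinant Protein A than PA#2/8 can be expressed as KD ratios. The sketch below assumes that the ± values are independent uncertainties (the section does not state whether they are standard deviations or standard errors) and applies first-order error propagation for a quotient; the helper name is hypothetical.

```python
import math

def kd_ratio(kd_a, err_a, kd_b, err_b):
    """KD_a / KD_b with first-order propagation of independent errors."""
    ratio = kd_a / kd_b
    rel_err = math.sqrt((err_a / kd_a) ** 2 + (err_b / kd_b) ** 2)
    return ratio, ratio * rel_err

# Recombinant Protein A; reference PA#2/8 with KD = 92 ± 12 nM (values from the text)
for name, kd, err in [("PA-C4", 222, 22), ("PA-C7", 1614, 94)]:
    ratio, ratio_err = kd_ratio(kd, err, 92, 12)
    print(f"{name}: KD {ratio:.1f} ± {ratio_err:.1f} times that of PA#2/8")
# PA-C4: KD 2.4 ± 0.4 times that of PA#2/8
# PA-C7: KD 17.5 ± 2.5 times that of PA#2/8
```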
The relationship between PA#2/8 and PA-C8 (group 6), indicated by the identified consensus regions and common structural features, is further confirmed by comparable association and dissociation behavior during interaction with Protein A (Figure 6F). Slightly higher KD values of 99 ± 4 nM for native Protein A and 443 ± 44 nM for recombinant Protein A were calculated.
A significantly different binding behavior toward Protein A was observed for PA-C10 and PA#14/89 (group 2). The binding curves of both 3′-immobilized aptamers are characterized by fast dissociation of the binding complexes, especially after interaction with recombinant Protein A (Figure 6D,E). This indicates unstable binding complexes and results in affinities only in the micromolar range, with KD = 2730 ± 125 nM for PA-C10 and KD = 2655 ± 168 nM for PA#14/89. The affinities increase strongly when binding to native Protein A is measured and shift into the nanomolar range, with KD = 588 ± 28 nM and KD = 467 ± 23 nM, respectively. The 5′-immobilized variants of both aptamers bind Protein A with affinities in a similar range (Figure S8, Table S2).
The group 2 aptamers identified in this work exemplify the need for combined experimental approaches, even at the stage of functional screening of selected sequences, to avoid losing potent aptamers early in aptamer development. In our first study, PA#14/89 was not recognized as a Protein A-binding sequence by a fluorescence assay using target-coated magnetic beads. The SPR-based assay used for functional screening in the current study provides a different assay design and also allows a more differentiated insight into the dynamics of the aptamer-target interactions. The fast dissociation of the binding complexes observed in the SPR-based assays also affected the bead-based binding assays: most of the aptamers were released from the binding complexes on the beads during the washing steps before the bound aptamer fraction was quantified. This led to the initial assessment of the selected aptamer as a non-binding sequence.
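Why fast dissociation can make a genuine binder disappear during the washing steps of a bead-based assay can be illustrated with a single-exponential (1:1) dissociation model, R(t) = R0·exp(−koff·t). The section reports no koff values, so the off-rates and the 300 s wash time below are hypothetical, the helper names are illustrative, and rebinding and mass-transport effects are ignored.

```python
import math

def fit_koff(times, responses):
    """koff (1/s) from a dissociation phase assuming R(t) = R0 * exp(-koff * t).

    Uses ordinary least squares on ln R versus t; responses must be > 0.
    """
    logs = [math.log(r) for r in responses]
    n = len(times)
    t_mean, l_mean = sum(times) / n, sum(logs) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
             / sum((t - t_mean) ** 2 for t in times))
    return -slope

def half_life(koff):
    """Half-life (s) of the complex: ln 2 / koff."""
    return math.log(2) / koff

def fraction_remaining(koff, wash_time):
    """Fraction of complexes left after wash_time seconds (no rebinding)."""
    return math.exp(-koff * wash_time)

# Hypothetical off-rates for a slowly and a rapidly dissociating complex
for koff in (5e-4, 5e-2):
    print(f"koff = {koff:g} 1/s, t1/2 = {half_life(koff):.0f} s, "
          f"left after a 300 s wash: {fraction_remaining(koff, 300):.2%}")
```

Under these assumptions, a complex with a half-life of a few seconds is almost completely lost during a wash of a few minutes, whereas a slowly dissociating complex largely survives it, which is consistent with the behavior described above.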
McKeague et al. performed a comprehensive analysis of aptamer binding assays with respect to the evaluation of small molecule-targeting aptamers [47]. They stated the need for multiple experimental strategies for aptamer candidate screening and characterization. However, there is no universal method or assay that can functionally verify every aptamer-target system or compare them. Moreover, the performance of a specific aptamer may vary under different assay designs, e.g., depending on whether the aptamer is used in solution or immobilized, or has to be modified for the specific application. Additionally, each assay has its own inherent sensitivity range and can therefore limit the aptamer affinity that can be reported as a KD value, which reflects the strength of attraction between aptamer and target. A flexible combination of different assays must be included for screening, characterization, and functional verification of aptamers, balancing efficiency, parallelization and cost-effectiveness [47]. This is one of the challenging issues in the post-SELEX phase, and it also matches our observations during several aptamer developments for different classes of targets.
The binding behavior of the candidate aptamers was also tested in a reverse experimental setup, in which the sensor surface was built up by immobilization of biotinylated native Protein A. The unmodified aptamers were applied as analytes. The reverse assembly strongly affects the binding abilities. Good functionality was observed for PA#2/8, as shown in Figure 7A, whereas only weak binding to immobilized Protein A was found for the related aptamer PA-C8. In the case of PA#2/8, a concentration series of the aptamer was applied for binding, and the dissociation constant was calculated to be in the micromolar range, with KD = 3730 ± 130 nM (Figure 7B). The significant difference between the affinity of the immobilized aptamer and that of the aptamer in solution to Protein A was already discussed previously [30]. Protein A is supposed to be a multivalent target for the aptamer and can therefore cause avidity effects in the case of immobilized aptamers. Avidity is much stronger than 1:1 affinity and allows more stable binding complexes to form.
In contrast, the group 2 aptamers PA-C10 and PA#14/89 are not able to form detectable binding complexes on the sensor surface when applied as analytes in the flow system. These results are in accordance with those shown in Figure 6D,E. As discussed above, the binding complexes of both aptamers with Protein A are relatively unstable, hampering the binding of the aptamers to the Protein A-coated sensor surface. Possible stabilizing avidity effects provided by Protein A have no impact under these assay conditions.
Steric hindrance cannot be excluded and could explain differences in the binding behavior of the aptamers depending on the assay format. The sensor surface was coated with Protein A at a high level, but not fully saturated. Differences between the two assay formats also concern the modification of Protein A: the unmodified protein was used when applied as analyte, whereas the biotinylated protein was used when immobilized as ligand on the sensor surface. Biotin was randomly coupled to Protein A through an aminocaproyl spacer with varying degrees of labeling (see the manufacturer’s description). Biotinylation of Protein A may affect the binding ability of the aptamers by altering original features of the protein or by masking potential binding sites. We will investigate this discrepancy in the binding behavior of the aptamers in our further work.