Polymorphic Molecular Signatures in Variable Regions of the Plasmodium falciparum var2csa DBL3x Domain Are Associated with Virulence in Placental Malaria

The Plasmodium falciparum protein VAR2CSA allows infected erythrocytes to accumulate within the placenta, inducing pathology and poor birth outcomes. Multiple exposures to placental malaria (PM) induce partial immunity against VAR2CSA, making it a promising vaccine candidate. However, the extent to which VAR2CSA genetic diversity contributes to immune evasion and virulence remains poorly understood. The deep sequencing of the var2csa DBL3X domain in placental blood from forty-nine primigravid and multigravid women living in malaria-endemic western Kenya revealed numerous unique sequences within individuals in association with chronic PM but not gravidity. Additional analysis unveiled four distinct sequence types that were variably present in mixed proportions amongst the study population. An analysis of the abundance of each of these sequence types revealed that one was inversely related to infant gestational age, another was inversely related to placental parasitemia, and a third was associated with chronic PM. The categorization of women according to the type to which their dominant sequence belonged resulted in the segregation of types as a function of gravidity: two types predominated in multigravidae whereas the other two predominated in primigravidae. The univariate logistic regression analysis of sequence type dominance further revealed that gravidity, maternal age, placental parasitemia, and hemozoin burden (within maternal leukocytes), reported a lack of antimalarial drug use, and infant gestational age and birth weight influenced the odds of membership in one or more of these sequence predominance groups. Cumulatively, these results show that unique var2csa sequences differentially appear in women with different PM exposure histories and segregate to types independently associated with maternal factors, infection parameters, and birth outcomes. The association of some var2csa sequence types with indicators of pathogenesis should motivate vaccine efforts to further identify and target VAR2CSA epitopes associated with maternal morbidity and poor birth outcomes.


Introduction
Nearly 12 million cases of malaria occur in pregnant women every year, resulting in thousands of deaths and adverse birth outcomes [1,2]. First and second pregnancies are at the highest risk, particularly with Plasmodium falciparum infection, which contributes

Number of Unique Sequences per Patient Is Elevated in Association with Chronic PM but Not Gravidity
Using a traditional sequencing approach, we previously identified an average of 10 unique DBL3X sequences in placental blood from Kenyan women [24]. To expand on those observations, a next-generation sequencing approach was used to assess DBL3X diversity in parasite populations isolated from the placental blood of 49 western Kenyan women recruited at parturition. Relative to the 26 multigravidae included in the study, the 23 primigravidae were younger, had higher placental parasite burdens, tended to be more likely to have histological evidence of chronic infection (as evidenced by the presence of malarial pigment (hemozoin [30])), and delivered lower birth weight infants (Table 1).
Next-generation sequencing provided enhanced sensitivity, yielding 127,721 highquality reads (Supplementary Tables S1 and S2 and Supplementary Figure S1) from the 49 tested placental blood samples, covering two highly variable (V1-V2) and one conserved (C2) region of DBL3X ( Figure 1A).  ). c Placenta sections for the histological categorization of infection were not available for three primigravid and three multigravid women; chronic infection is defined as in Methods. d White blood cells (WBC) bearing hemozoin relative to total WBCs counted on a thick placental blood smear. e Data were not available for six primigravid and nine multigravid women. f Based on self-reporting; one primigravid woman reported antimalarial drug use but could not name the type of drug taken; one multigravid woman reported using an unspecified antimalarial other than sulfadoxine/pyrimethamine, metakelphin, or quinine. None reported use of chloroquine. g Data were not available for two primigravid women.
A de novo contig assembly analysis approach (described in Section 4) identified 522 contigs. For all samples, a subset of contigs accounts for a majority of the reads, suggesting that in most infections certain parasite genotypes are overrepresented relative to others present. At the individual patient level, the most common contig is referred to as the "dominant sequence". In two separate 454 sequencing runs, DNA from the clonal line 3D7 and a patient previously found to be clonal (ID = 895) [24] were included as a control and in both cases yielded a single contig upon analysis. As shown in Supplementary Table S3, within the dataset of 522 contigs, 35 sequence pairs share 99% or greater identity, representing identical or near-identical sequences present in different study subjects. This includes 11 pairs sampled from geographically distinct study sites (Siaya vs. Kisumu, Kenya) at different times during the six-year sampling period. Therefore, accounting for identical sequences shared between different samples, 487 contigs were unique, yielding 127 unique sequences at the amino acid level. Seventy-six of the seventy-nine unique sequences observed in our previous work [24] were also found in this study. Rarefaction analysis revealed an accumulation curve that did not reach an asymptote (Supplementary Figure S2), suggesting that additional sampling may identify additional variants.
A de novo contig assembly analysis approach (described in Section 4) identified 522 contigs. For all samples, a subset of contigs accounts for a majority of the reads, suggesting that in most infections certain parasite genotypes are overrepresented relative to others present. At the individual patient level, the most common contig is referred to as the "dominant sequence". In two separate 454 sequencing runs, DNA from the clonal line 3D7 and a patient previously found to be clonal (ID = 895) [24] were included as a control and in both cases yielded a single contig upon analysis. As shown in Supplementary Table S3, within the dataset of 522 contigs, 35 sequence pairs share 99% or greater identity, representing identical or near-identical sequences present in different study subjects. This includes 11 pairs sampled from geographically distinct study sites (Siaya vs. Kisumu, Kenya) at different times during the six-year sampling period. Therefore, accounting for identical sequences shared between different samples, 487 contigs were unique, yielding 127 unique sequences at the amino acid level. Seventy-six of the seventy-nine unique sequences observed in our previous work [24] were also found in this study. Rarefaction Figure 1. Deep sequencing of a region spanning 420 base pairs within the var2csa DBL3X domain reveals extensive diversity that falls into four unique groups. (A) Two highly variable (V1-V2) and conserved regions (C2) of the DBL3X domain were sequenced. (B) Tajima's D was calculated over the first 250 nucleotides of the sequenced region using a sliding window of 10 base pairs. (C) The sequence logo represents a translation of the V1-C2 region, which yields a 74 amino acid peptide. regions of conserved amino acid sites and motifs of the protein sequence alignment which is comprised of conserved, variable, and indel regions (N = 522). Motifs previously reported to be associated with gravidity are indicated [24]. analysis revealed an accumulation curve that did not reach an asymptote (Supplementary Figure S2), suggesting that additional sampling may identify additional variants. The number of unique nucleotide sequences within a patient sample (Tables S1 and S2) was independent of gravidity (primigravid: median = 6; 95% confidence interval (CI), 4, 22; range 1-33; multigravid: median = 9.5; CI, 5, 13.3; range 1-30, p = 0.3997). The same was true for peptide sequences (primigravid: median = 4; CI, 2, 7; range 1-18; multigravid: median = 4; CI, 3, 8; range 1-11, p = 0.5246). However, unique DNA and peptide sequence numbers were higher with chronic PM as defined by histopathological parameters ( Figure  2). Univariate linear regression modeling did not reveal the associations of peptide sequence numbers with other laboratory or clinical parameters (see Methods; data not shown). Patients were stratified according to the histological categorization of infection chronicity (Section 4) and the number of unique sequences calculated. Following the translation of sequences to amino acids, unique amino acid sequences were also ascertained. Patients were stratified according to the histological categorization of infection chronicity (Section 4) and the number of unique sequences calculated. Following the translation of sequences to amino acids, unique amino acid sequences were also ascertained. Groups were compared by the Mann-Whitney U test. Red line represents the median.

Association of Short Peptide Motifs in DBL3X with Gravidity and PM Outcome Parameters
Within the first 150 base pairs of the sequenced region (encompassing V1; see Figure 1A) Tajima's D index approached 3, indicating that balancing selection, possibly immune-driven, is very high in this subdomain region ( Figure 1B). Given previous findings of associations between gravidity and specific amino acid motifs in this region of DBL3X [24,31], the V1-C2 region ( Figure 1A) was similarly examined. Five motifs were observed ( Figure 1C; Table 2). Seventy-eight percent of the sequences contained either an IISQNDKK motif, which tended to be slightly more prevalent in primigravidae, or a less frequently observed IISRNPMK motif, which was more highly represented in multigravidae (Table 2). EGGEDGKGKQKE was observed in less than a fifth of the sequences but was more prevalent in multigravidae. The EKANNN motif predominated in primigravidae ( Table 2). The highly represented NSNGLP motif was equivalently distributed independent of gravidity. Regression analysis was used to assess associations between the carriage of these motifs and clinical and laboratory parameters, treated as dependent variables (Table S4). This analysis approach facilitates the identification of associations between individual motifs and patient parameters while controlling for the presence of other motifs. Among continuous variables, the length of gestation (gestational age at birth) was significantly associated with the EGGEDGKGKQKE motif: for each single increase in sequence number bearing this motif, the gestational age increased by 0.289 weeks (SEM: 0.119; p = 0.0238). For the motif EKANNN, each increase in sequence number with this motif resulted in a 0.28% increase in leukocytes bearing hemozoin in the placenta (p = 0.0238; Table S4). Among categorical variables, no statistically significant associations were observed for this motif (Table S4). The IISQNDKK, IISRNPMK, and NSNGLP motifs showed no significant associations with any of the tested parameters.
We used the recently published data reporting the high-resolution structure of fulllength VAR2CSA [28] to determine the location of the region of the DBL3x domain containing the motifs. As shown in Figure S3, the motifs (green region) lie in a loop that falls between major helical structures of the DBL3x domain (in blue). Based on the structure, it appears that this loop is on the external face of the DBL3x domain and not buried within it. This may account for the ability of this region to accommodate the indel found in this area.

Sequence Analysis Identifies Four Types That Associate with Gravidity and Infection Parameters
Because the variability observed in DBL3X spans the bulk of the V1 region (Figure 1), an analytical approach that considered the full-length sequences (as opposed to short peptides) was next undertaken. Specifically, ordinal multidimensional scaling (OMDS) and k-step network analysis were used to visualize the genetic distance matrix obtained from the alignment of sequences unique at the amino acid level. We chose to use multidimensional scaling and k-step network analysis as complementary approaches to address the question of the current level of separation between the sequences. Both methods revealed distinct clustering into four sequence types ( Figures 1D and 3A,B). The detection of these types was not geographically or temporally restricted; they were consistently observed in patients recruited across multiple years and in two sites in western Kenya (Kisumu, 2002(Kisumu, -2004 and Siaya (2004-2008; Supplementary Figure S4) that are~70 km apart by road. While the number of sequence types within a patient sample did not vary as a function of gravidity (primigravid: median = 2; CI, 1, 3; range 1-3; multigravid: median = 2; CI, 2, 3; range 1-3, p = 0.9867), in keeping with elevated sequence and peptide sequence number with chronic PM, the same relationship was evident for sequence types (acute: median = 2; CI, 1, 2; range 1-3; chronic: median = 2; CI, 2, 3; range 1-3, p = 0.0476). To test associations between the four identified sequence types and disease outcomes, regression analysis with clinical and laboratory parameters treated as dependent variables was performed, with the 127 unique sequences considered at the amino acid level and stratified by type ( Table 3). As above, this analysis approach allows the identification of associations between individual sequence types and patient parameters while controlling for the presence of other types in the same patient. For each single increase in the carriage of a unique While the number of sequence types within a patient sample did not vary as a function of gravidity (primigravid: median = 2; CI, 1, 3; range 1-3; multigravid: median = 2; CI, 2, 3; range 1-3, p = 0.9867), in keeping with elevated sequence and peptide sequence number with chronic PM, the same relationship was evident for sequence types (acute: median = 2; CI, 1, 2; range 1-3; chronic: median = 2; CI, 2, 3; range 1-3, p = 0.0476). To test associations between the four identified sequence types and disease outcomes, regression analysis with clinical and laboratory parameters treated as dependent variables was performed, with the 127 unique sequences considered at the amino acid level and stratified by type ( Table 3). As above, this analysis approach allows the identification of associations between individual sequence types and patient parameters while controlling for the presence of other types in the same patient. For each single increase in the carriage of a unique type 1 sequence, gestational age decreased by 0.646 weeks (SEM: 0.209; p = 0.0036; Table 3). Type 3 sequence number was negatively associated with percent placental parasitemia (coefficient ± SEM: −0.363 ± 0.163, p = 0.0311; Table 3). Among categorical variables, each single increase in carriage of a unique type 4 sequence yielded 3.26-fold increased odds for a histological assessment of chronic infection (CI: 1.41, 7.52; p = 0.0056; Table 3). Type 3 sequence carriage was associated with significantly reduced odds of having a percentage of placental parasitemia in the upper quartile (OR (CI): 0.184 (0.042, 0.805), p = 0.0245; Table 3).

Association of Infection Outcomes with Sequence Type Dominance
The assessment of unique sequences per patient revealed that, typically, one unique sequence predominates in each patient (Supplementary Figure S4). To assess which clinical parameters influence sequence dominance, "groups" were defined according to the type to which the most prevalent unique sequence within each individual patient belongs (i.e., the predominance of a type 1 sequence triggered assignment to group 1). Interestingly, 7/7 women in group 1 and 8/10 in group 3 were multigravid, whereas 5/6 in group 2 and 16/26 in group 4 were primigravid. Univariate logistic regression revealed that gravidity, maternal age, placental parasitemia, placental hemozoin within leukocytes, reported lack of antimalarial drug use, infant gestational age, and infant birth weight influence the odds of membership in one or more of these sequence predominance groups ( Figure 4A). Other clinical and laboratory parameters were not associated with sequence dominance (Table S5). Logistic regression modeling revealed that for each one-week increase in gestational age, the odds of group 1 membership decreased by more than 50% (p = 0.0300), and indeed, were 15.6 times more likely to undergo preterm birth (p = 0.0365; Figure 4A, Table S5). Group 1 members also were 83% less likely than non-members to self-report the use of anti-malarial drugs (p = 0.0473; Figure 4A, Table S5). Multivariate modeling adjusting for gestational age and being a multigravida confirmed the risk for no antimalarial drug use in this group (p = 0.0491). Group 2 members had more than 22 times greater odds (p = 0.0319) of having a low-birth-weight baby relative to nonmembers ( Figure 4A, Table S5). Multivariate modeling controlling for all of these factors revealed a high persistent risk for low birth weight in group 2 members (p = 0.0319; Figure 4B, Table S6). For every log increase in placental parasite density (p = 0.0136), percentage of placental parasitemia (p = 0.0211), and percentage of placental leukocytes bearing hemozoin (p = 0.0490), there was >40% decreased risk for membership in group 3 ( Figure 4A, Table S5). The inclusion of placental parasite density in a multivariate model adjusting for age, being a multigravida, placental hemozoin burden, and chronic PM revealed that only it remained significant, with each log increase in placental parasite load yielding 86% reduced odds of being a group 3 member ( Figure 4B, Table S6). Finally, each one-year increase in age yielded a 13% decreased odds (p = 0.0245) for group 4 membership, a group also strongly associated with primigravidity (p = 0.0325; Figure 4A, Table S5). In a model adjusting for age, gravidity, and being in the upper quartile percentage of placental parasitemia, younger age emerged as the defining attribute for group 4 membership (p = 0.0486; Figure 4B, Table S6).    Tables 1 and 3 and Section 4.

Discussion
Var genes are highly polymorphic [32], facilitating immune evasion in P. falciparum [33]. Thus, var members that are associated with virulence are arguably optimal antigenic targets for vaccine development [34]. Despite relative conservation in comparison to other members of the var gene family [35], var2csa varies considerably among laboratory P. falciparum isolates [36] and globally disparate patient samples [16,25,31,37]. Nonetheless, VAR2CSA is increasingly subject to antibody recognition over successive malaria-exposed pregnancies [11,15,38] and has therefore emerged as a vaccine candidate for protecting women against PM [36]. Indeed, early work found that human antibody recognition of VAR2CSA was strain-transcendent, leading to predictions of broad cross-protection by these responses [39,40]. However, more recent work has challenged this notion [41,42], raising concerns about the impact of polymorphisms on the efficacy of a vaccine targeting VAR2CSA [16], which were indeed borne out in a recent VAR2CSA vaccine trial [18]. There is considerable interest in the N-terminal region (DBL1X, DBL2X, CIDR) of VAR2CSA, which is necessary and sufficient for CSA binding [19,42], and is the target for naturally acquired antibodies capable of interfering with cytoadhesion [40]. Our previous conventional DNA sequencing of DBL3X revealed a high degree of diversity [24]. Considerable polymorphism is also evident in the ID1-DBL2Xb segment [25,42]. Because VAR2CSA polymorphism may retard the acquisition of cross-protective immune recognition [43] by altering linear or conformational immune epitopes proximally and distally to mutated residues, it is critical to elucidate the genetic diversity of this protein and associated impacts on function [44], capacity to promote pathogenesis, and host immune recognition. Two VAR2CSA-based vaccines, PAMVAC and PRIMVAC, were recently tested in clinical trials [17,18]. Both vaccines are comprised of the N-terminal domains shown to be the minimal regions required for binding to CSA. They differ in that one is based on the VAR2CSA sequence from the 3D7 parasite strain, the other on the FCR3 strain. Neither vaccine elicits a broadly reactive immune response capable of recognizing the diverse VA2CSA sequences present in field populations of the parasite [17,18,25]. This may be related to the absence of conformational epitopes that are only present in the full-length protein [45,46]. Additional regions of the protein may contribute to binding to CSA, and are indispensable to protein structure, as demonstrated recently with the cryo-electron microscopic analysis of full-length VAR2CSA [28,29]. Another recent study utilizing small angle X-ray scattering and the single particle reconstruction of negative-stained electron micrographs suggested that a secondary site for CSA binding may utilize the domain's C-terminal to the minimal CSA binding domain, including DBL3X [26].
This study used targeted deep sequencing, which affords the sensitive detection of parasite diversity, including minor variants, to probe the diversity of var2csa in placental parasites obtained from women naturally exposed to endemic P. falciparum ( Table 1).
Most of the DBL3X sequences reported here are novel, and rarefaction analysis suggests that additional sequencing may identify further variants ( Figure S2). Here, we analyzed a DBL3X region spanning 420bp in length, identifying 487 unique sequences (based on the nucleotide level). This is comparable to another study that probed diversity in a 54bp region in DBL3X and identified 222 unique sequences [20]. Because the sequenced region represents <5% of the total var2csa coding region, the actual number of unique var2csa sequences is likely to be far greater in this population. Benaventa et al. have examined the diversity of full-length var2csa sequences from geographically diverse parasites and, consistent with our work, found a significant degree of diversity in DBL3x, though lower nucleotide diversity than domains within the N-terminal minimal binding region. While that study reported a lower percentage of indel positions in DBL3x relative to the DBL2x domain, our work demonstrates that motifs within this indel loop region are associated with clinically relevant parameters in PM, suggesting that the same may be true of indel regions in other domains. It will be interesting to see if these indels also map to putative surface-exposed loop regions, as we demonstrated for the DBL3x domain ( Figure  S3). Although independent of gravidity, the number of unique DBL3X sequences (Figure 2) and the number of sequence types on a per-patient basis were significantly higher in women with histologically-confirmed chronic PM. In this area of high endemicity [47], chronic PM likely represents more than one independent exposure to infection. Given the high level of diversity observed in DBL3X in this study, it would be expected that multiple exposures would yield greater within-host diversity relative to "new", shorter-lived acute infections.
Multigravid women avoid severe outcomes of PM through the development of inhibitory antibodies that can block CSA binding [11]; opsonizing antibodies are also a significant aspect of this immune response [21]. The extent to which antibody-mediated immune pressure leads to the selection of specific var2csa sequence types, however, remains unknown. Consistent with our previously published work [24] and that of others [31], the data reported here show clear evidence for gravidity-associated selection in DBL3X at the sequence level, both in terms of short peptide motifs (Table 2) and the region encompassing variable subdomain V1 that shows evidence of balancing selection (Figure 3). A striking finding of the present work is the association of these peptide motifs with clinically relevant parameters in PM, infant gestational age, and placental hemozoin burden. Such observations provide critical knowledge about protein regions (and therefore epitopes) that may be essential targets of protective immunity.
Ordinal multidimensional scaling (OMDS) and k-step network analysis of the region with the highest evidence of balancing selection revealed clustering into four sequence types (Figure 3), facilitating a broader assessment of associations between var2csa diversity and critical indicators of PM pathogenesis. Histologically confirmed chronic PM was associated with the carriage of a higher number of these unique sequence types relative to acute infection, suggesting that in this environment, superinfection is a contributor to chronic infection. At the sequence type level, a type 1 sequence number was related to reduced infant gestational age at birth ( Table 3). The multivariate analysis of sequence type dominance confirmed that type 1 sequence carriage is associated with preterm birth (Figure 4). While a number of multigravidae in this study did not fall into this sequence dominance group, it is interesting that all members of this group were multigravidae, suggesting a tendency for these parasites to establish infection only in this group. Interestingly, type 1 dominance also predicts self-reported failure to use anti-malarial drugs and is the only factor that maintained significance in a model controlling for the other factors. Thus, type 1 sequence-bearing parasites may not be particularly competitive relative to others but can establish a low-level infection in multigravidae who avoid the use of antimalarial drugs and are immune to other, potentially "primigravida-tropic", VAR2CSA types, such as type 2. The recent evidence of significant variability in antigenicity and CSA binding affinity amongst VAR2CSA variants may be related to the presence of these sequence types in DBL3X or similar sequence types in other domains of this protein [44]. While a type 2 sequence number did not associate significantly with any of the tested parameters, 83% of members in group 2 (type 2 sequence dominance) were primigravidae. Importantly, membership in this group is associated with having a low-birth-weight infant, an outcome of PM that is most prevalent in lower-order births [48]. Types 3 and 4 sequence number carriage also appeared to segregate to different PM outcomes: whereas type 3 sequence numbers were inversely related to parasitemia, increasing type 4 sequences were associated with chronic PM (Table 3). Furthermore, type 3 sequence dominance was associated with less intense placental infection (Figure 4), and this dominance group was 80% multigravidae. Group 4, in whom type 4 sequences were dominant, was 62% primigravid, ref. [20], and a younger age remained a significant factor in association with this group in multivariate analysis. Considered all together, these results suggest that the identified sequence types fall into two broad categories loosely defined by gravidity/age, infection intensity, PM chronicity, and birth outcomes. This evidence suggests that polymorphism in this region of DBL3X, or variant sequences with which these types are in linkage disequilibrium, is a key determinant for the establishment and persistence of infection in pregnant women and therefore represents a critical target for vaccine-induced immunity. Moreover, the results show clear links with critical birth outcomes (low birth weight and pre-term birth), highlighting the need to consider the importance of anti-disease and anti-pathogenesis parameters when choosing vaccine antigens. Given the striking association of type 2 sequences with a critically important outcome of PM in vulnerable primigravid women and type 4 sequences with age, targeting parasite variants of these putative virulent types in a pregnancy-specific malaria vaccine might be particularly beneficial. Our results are concordant with other work that looked at sequences of the VAR2CSA ID1-DBL2x region and found associations of specific VAR2CSA clades with low birth weight [37]. This clearly demonstrates the need for further studies that use full-length VAR2CSA sequences to investigate additional associations with pathology and birth outcome.
The association of type 1 sequences with a self-reported lack of use of anti-malarial drugs is intriguing. P. falciparum in this region of Kenya is known to carry multiple resistance mutations to sulfadoxine-pyrimethamine, the prophylactic drug of choice during the recruitment period for this study [49]. Similar to the situation in neighboring Tanzania [50], sulfadoxine-pyrimethamine failure is associated with a risk for elevated placental parasitemia in this population (Matthias et al. manuscript in preparation), but the determination of how drug use might influence var2csa diversity will require further study.
In summary, we report here exceptionally high levels of inter-and intra-patient var2csa DBL3X diversity. Further, we identify unique sequence types that, similar to previous reports, are influenced by gravidity [24,31] and associated with elevated placental parasitemia [20]. Our novel observations extend such associations to maternal age, anti-malarial drug use, PM chronicity, and birth outcomes, including infant birth weight and gestational age. Given the high degree of interest in developing a VAR2CSA-based vaccine to protect reproductive-age women in endemic areas [43,51], and the dependence of straintranscendent immunity to VAR2CSA on discontinuous or conformational epitopes that may only be found in the native, full-length protein [46], we propose that continued efforts to identify virulence types through intensive sequencing efforts and association studies, such as those reported here, is essential for the identification of optimal antigens for an efficacious vaccine.

Ethics Statement
The study was approved by the Kenya Medical Research Institute Ethical Review Committee and the Institutional Review Boards of the University of Georgia, the Centers for Disease Control and Prevention, and the National Institutes of Health. All participants provided informed, written consent under the auspices of these approved protocols. All methods were carried out in accordance with the approved guidelines. All data associated with this study are now anonymized.

Study Population
Samples and clinical data analyzed here are derived from a cross-sectional study of immune responses to malaria during pregnancy in western Kenya. Recruitment was conducted from November 2002 to September 2008 at New Nyanza Provincial General Hospital in Kisumu and Siaya District Hospital. A previous subgroup analysis of these populations revealed a higher parasite burden among the latter [52]. As a whole, this study population included a total of 1046 parturient women, ranging from gravida 1 to 11.
For the present work, a total of 60 maternal placental blood samples were selected from available placental blood thick smear malaria-positive samples. This initial set included 29 primigravid, 3 secundigravid, and 28 multigravid samples (ranging from gravida 3 to 9). An additional 26 submicroscopic but PCR-confirmed PM+ samples [30] (9 primigravid, 7 secundigravid, and 10 multigravid samples, ranging from gravida 3 to 8) were also selected. Other than PM status and gravidity, no other clinical or laboratory parameters were considered at this stage of sample selection. Aligning with a desire to compare primigravid samples to multigravidae, the secundigravid samples were restricted to technical optimization for pyrosequencing but were ultimately omitted from further consideration.
Among the remaining 57 smear-positive and 19 submicroscopic samples, 19 and 9, respectively, were omitted at various stages of the study due to the consumption of all available DNA during optimization, the failure of amplification in initial PCR screens with DBL3X primers, or the failure of the sample during 454 sequencing library preparation. Ultimately, 49 placental blood samples, 10 recruited from Kisumu and 39 from Siaya, representing 23 primigravidae and 26 multigravidae, and spanning the entirety of the parent study ( Figure S5), were fully analyzed. The multigravid samples represent 13 gravida 3, 8 gravida 4, 2 gravida 5, and one each of gravida 6, 8 and 9. Clinical and demographic data, collected post-partum by interview and the assessment of patient records, are summarized in Table 1.
Immediately post-expulsion, placentas were collected and maternal placental blood was obtained either by the prick method [53] or perfusion [54] under aseptic conditions. Placental parasite density was estimated by counting the number of infected erythrocytes on a Giemsa-stained thick smear of maternal placental prick blood per at least 300 white blood cells. The percentage of these white blood cells containing phagocytosed hemozoin was also recorded, and is reported as "percent hemozoin WBCs". Because complete blood count data (using a COULTER ® A c ·T™ 5diff CP; Cap Pierce) was not available for all women, the calculation of parasite density per microliter of blood used the median placental white blood cell count (12,000/uL for those HIV seronegative and 13,450/uL for those HIV seropositive) for all PM+ women in the parent study for whom complete blood count was ascertained. Percent parasitemia was assessed on Giemsa-stained thin smears of prick blood samples. A thin smear was not available for one multigravida. For two other multigravidae with a low level of infection by thick smear, the percent parasitemia was too low to enumerate accurately; these samples are therefore assigned a parasitemia of 0%. As noted in Table 1, 10 samples (3 from primigravidae and 7 from multigravidae) were blood smear-negative but PCR-positive in placental blood [30]. Gestational age was determined by the modified Dubowitz test [55], and birth weight was measured to an accuracy of 50 g within twelve hours after delivery. Peripheral hemoglobin was measured by Coulter counter, as above; data for sixteen women were not available. Relevant clinical information, including the usage of anti-malarial drugs during pregnancy, was collected from hospital records and by interview within 24 h after delivery.
Placental tissue sections were preserved and histologically assessed as described [30]. At least two discrete full-thickness regions of the placental disk were assessed. Chronic infection was defined as fibrin hemozoin score ≥2 and leukocyte hemozoin score ≥1. Acutely infected placentas scored 1 ± 1 (mean ± SD) for hemozoin-bearing inflammatory cells and 1 ± 1 for hemozoin embedded in fibrin. Chronically infected placentas scored 2 ± 1 for inflammatory cells and 3 ± 1 for fibrin. Histological sections were not available for six participants Analytical cut-offs for high placental percent parasitemia (≥3.4%), high placental parasite density (≥23,187 parasites/uL), and high percent placental white blood cells bearing hemozoin (10.7%) were based on the upper quartile for all microscopically infected women in the parent study population.

Amplification and Sequencing of var2csa DBL3X Domain
The variable region of the DBL3X domain of var2csa was amplified from extracted DNA using previously described primers [24]. The primers were modified for 454 sequencing by including a multiplex identifier (MID), linker, and tag sequence. Each patient sample was amplified in triplicate using the modified primers. Amplicons were pooled and sequenced using a 454 GS Junior at the Georgia Genomics Facility at The University of Georgia. Figure S6 presents a flow chart overview of the next-gen sequencing pipeline.

Data Quality Filtering
A total of 158,573 sequence reads were obtained, and low-quality reads were removed on the basis of a defined cut-off (i.e., length and quality scores). The length cut-off was determined based on all published DBL3X sequences available as of 31 September 2013, and sequences below 320 bp or above 420 bp were omitted. Additionally, sequences with a Q20 quality score of less than 80 were discarded, leaving 127,721 sequences. All quality filtering and data analysis were performed using Geneious v8.1 [56].
Next, all sequences were BLAST searched for P. falciparum and non-hits were discarded. The remaining sequences, numbering 120,625, were separated by unique MIDs (multiplex identifiers) from the pooled data into patient-specific sequences. For each patient data set, sequences were trimmed of MIDs, tags, and primers prior to subsequent analysis.

Unique Sequence Types Analysis
For each patient sample, a mean of 3254 (SD = 1648) reads was obtained. A de novo contig assembly approach was used to identify unique sequence types using Geneious v8.1 Geneious Assembler [56] and the Velvet Short Read Assembler plugin. Each method yielded the same number of unique contigs. For each patient, contigs (i.e., unique sequence types) were obtained through the computation of overlaps between reads, the removal of false overlaps, and the construction of multiple sequence alignments. A minimum/maximum overlap of 320 bp/420 bp and identity of 98% were selected to account for the expected read length and the intrinsic 454 sequencing error of 2%. A total of 522 unique contigs were produced. Between 5 and 5880 (median = 343) reads per patient per contig were obtained. Contigs with fewer than 5 reads were omitted, corresponding to 522 unique contigs. To validate this approach in determining unique sequence types, a 3D7 clonal lineage, as well as a patient found to be clonal (ID = 895) previously [24], was included as a control in each of the 454 runs. These samples consistently yielded a single contig (e.g., one unique sequence type). All sequences have been submitted to GenBank as a PopSet with accession numbers KP275019-KP275392 and KP275406-KP275551.
Ordinal multidimensional scaling (OMDS) and k-step network analysis were used to visualize the genetic distance matrix obtained from the amino acid alignment of unique sequences. Ordinal Multidimensional Scaling (OMDS) finds the optimal display of cases in a dataset for a chosen number of dimensions and was performed with IBM SPSS (New York, NY, USA) statistics for windows. The k-step network displays all possible minimum spanning trees [57] and was built using Python 2.7. Diversity and rarefaction curves were determined using EstimateS v9.1.0 [58]. Tajima D was calculated using DnaSP v5 [59]. Additional information about the data and unique sequence types is provided in Supplementary Figure S5

Motif Analysis
Sequence motif analysis was guided using two criteria: 1. The region that showed the highest evidence of balancing selection as indicated by Tajima D >1 ( Figure 1B). Using the sequence region and motifs previously identified to produce DBL3X antibodies [31].

VAR2CSA Structure Modeling
The protein structure of VAR2CSA (Protein Data Bank: 7JGH) [28] was used as a template for structure modeling. Structure visualization and figure generation were performed using UCSF Chimera molecular analysis program [60].

Statistical Analysis and Graph Generation
Descriptive analysis and graphing of clinical data were performed with GraphPad Prism, v9.2.0. Non-normal data were log-transformed toward normality for statistical analysis. Comparisons of two groups with normally distributed data were conducted with an unpaired two-tailed t-test. Proportions were assessed by Fisher's exact or x 2 tests with Pearson's correction as appropriate. Regression analyses were performed using SAS, v9.4. Continuous clinical and laboratory parameters considered were age, gravidity, placental parasite density, percent placental parasitemia, percent placental leukocytes bearing hemozoin, peripheral hemoglobin, infant birth weight, and gestational age at birth. Categorical variables were gravidity group (primigravid, multigravida), HIV serostatus, high (upper quartile)/low (lower three quartiles including submicroscopic) placental parasite density and percent placental parasitemia, placental histology group (acute/chronic), self-reported antimalarial drug use, birth weight (low (≤2500 g)/normal), and gestational age (preterm (<37 weeks)/term). To assess the extent to which these parameters influence the number of unique sequences per patient, univariate linear regression analysis was performed with continuous clinical and laboratory parameters as the dependent variable, and logistic regression was performed with categorical clinical and laboratory parameters as dependent variables. For the examination of motif and sequence type carriage, which must account for the contribution of various combinations of these features in each individual, linear regression was performed with continuous clinical and laboratory parameters as dependent variables and motif or sequence type counts as predictors; likewise, logistic regression used categorical clinical and laboratory parameters as dependent variables. For the analysis of sequence dominance, groups were defined and subjects assigned based on the most frequent sequence type found at the patient level. Analysis of this parameter assigned group membership as the dependent variable in univariate and multivariate regression analyses, with independent variables including the above described clinical and laboratory parameters. All multivariate linear and logistic regression models incorporated independent variables with p < 0.1 from univariate analysis.

Supplementary Materials:
The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/pathogens11050520/s1, Table S1: Summary of data obtained for primigravid women; Table S2: Summary of data obtained for multigravid women; Table S3: Sequence pairs sharing 99% or higher identity; Table S4: Univariate regression analysis of factors associated with amino acid motif carriage in DBL3X; Table S5: Univariate logistic regression analysis of sequence type dominance; Table S6: Multiple logistic regression analysis of sequence type dominance. Figure S1: Summary of total number of quality reads used. Shown on the x-axis is the total number of sequences and y-axis the sequence lengths in base pairs (bp); Figure S2: Rarefaction curve of var2csa unique sequence types for entire study population (N = 49). The computed refraction curve (in solid green) shows the expected average rate of unique sequence types that would be produced as a result of repeated deep sequencing. Upper and lower 95% confidence intervals (CI) for species richness are shown by dashed and dotted red lines, respectively; Figure S3: VAR2CSA and DBL3X motif region. A ribbon representation of VAR2CSA (grey), DBL3X domain (blue). Dotted circles and atom stick representation show the CSA binding site (orange) and specific motif region (green) associated with gravidity and placental malaria outcomes; Figure S4: DBL3X sequence types are consistently found independently of time and location of patient recruitment. The graph depicts proportional distribution of sequence types by year of patient recruitment. The tale at right summarizes number of patients recruited each year (stratified by gravidity group), number of unique sequences contributed by those patients, and the mean number of unique sequences per patient; Figure S5: Unique DBL3X sequences are not proportionally represented at the individual patient level. The graphs depict the number of unique contigs within each patient. Notably, most patients have a single dominant contig. Patient 895 is depicted twice, representing two separate deep sequencing runs. This patient was previously identified (REF) as having a single unique sequence. DNA from the clonal laboratory isolate, 3D7, was also included as a control, and shows, in two separate runs, a single contig. Similar colors across individual patients do not imply sequence similarity; Figure S6: Schematic overview of the overall sequencing process.

Funding:
We would like to acknowledge support from the Advanced Molecular Detection (AMD) program (https://www.cdc.gov/amd/index.html) at the CDC (accessed on 25 April 2022). This work was supported by NIH grant R01AI05240 to JMM and a CDC/UGA Research Seed Grant to DSP and VU. ET was supported on NIH training grant T32AI060546. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIAID, the National Institutes of Health, the Centers for Disease Control and Prevention, or KEMRI.

Institutional Review Board Statement:
The participation of human subjects and sample collection were conducted in accordance with the Declaration of Helsinki and approved by the University of Georgia (protocol #H2001-10630 and 2006-10855) and the Centers for Disease Control and Prevention (protocol #3260) Institutional Review Boards and the Kenya Medical Research Institute Ethical Review Committee (protocol #639). All data are now anonymized.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available in the main text, figures, and tables.