The Mobilome-Enriched Genome of the Competence-Deficient Streptococcus pneumoniae BM6001, the Original Host of Integrative Conjugative Element Tn5253, Is Phylogenetically Distinct from Historical Pneumococcal Genomes

Streptococcus pneumoniae is an important human pathogen causing both mild and severe diseases. In this work, we determined the complete genome sequence of the S. pneumoniae clinical isolate BM6001, which is the original host of the ICE Tn5253. The BM6001 genome is organized in one circular chromosome of 2,293,748 base pairs (bp) in length, with an average GC content of 39.54%; the genome harbors a type 19F capsule locus, two tandem copies of pspC, the comC1-comD1 alleles and the type I restriction modification system SpnIII. The BM6001 mobilome accounts for 15.54% (356,521 bp) of the whole genome and includes (i) the ICE Tn5253 composite; (ii) the novel IME Tn7089; (iii) the novel transposon Tn7090; (iv) 3 prophages and 2 satellite prophages; (v) 5 genomic islands (GIs); (vi) 72 insertion sequences (ISs); (vii) 69 RUPs; (viii) 153 BOX elements; and (ix) 31 SPRITEs. All MGEs, except for the GIs, produce excised circular forms and attB site restoration. Tn7089 is 9089 bp long and contains 11 ORFs, of which 6 were annotated and code for three functions: integration/excision, mobilization and adaptation. Tn7090 is 9053 bp in size, flanked by two copies of ISSpn7, and contains seven ORFs organized as a single transcriptional unit, with genes encoding for proteins likely involved in the uptake and binding of Mg2+ cations in the adhesion to host cells and intracellular survival. BM6001 GIs, except for GI-BM6001.4, are variants of the pneumococcal TIGR4 RD5 region of diversity, pathogenicity island PPI1, R6 Cluster 4 and PTS island. Overall, prophages and satellite prophages contain genes predicted to encode proteins involved in DNA replication and lysogeny, in addition to genes encoding phage structural proteins and lytic enzymes carried only by prophages. ΦBM6001.3 has a mosaic structure that shares sequences with prophages IPP69 and MM1 and disrupts the competent comGC/cglC gene after chromosomal integration. Treatment with mitomycin C results in a 10-fold increase in the frequency of ΦBM6001.3 excised forms and comGC/cglC coding sequence restoration but does not restore competence for genetic transformation. In addition, phylogenetic analysis showed that BM6001 clusters in a small lineage with five other historical strains, but it is distantly related to the lineage due to its unique mobilome, suggesting that BM6001 has progressively accumulated many MGEs while losing competence for genetic transformation.


Introduction
Comparative genomics of Streptococcus pneumoniae has shown that 558 out of the average 1954 protein coding sequences are conserved [1]. Intraspecies variability is, in part, associated with allelic variants of genes encoding surface-exposed structures such as capsule, pspC and pspA genes [2][3][4][5][6]. In addition to these variable chromosomal loci, the majority of S. pneumoniae intraspecies variability is associated with the presence of mobile genetic elements (MGEs). The entire set of MGEs, collectively referred as a mobilome, contributes to the spread of antimicrobial resistance and virulence genes [7]. The comparison of invasive and noninvasive pneumococcal isolates shows that the invasive potential of a strain is often correlated to the presence or absence of a few specific additional sequences containing important virulence factors [8]. The pneumococcal mobilome includes prophages; plasmids; integrative conjugative elements (ICEs); pathogenicity islands; insertion sequences (ISs); and three families of small interspersed repeats, namely BOX elements, RUPs (repeat units of pneumococcus) and SPRITEs (S. pneumoniae rho-independent terminator-like element) [6]. Pneumococcal prophages are largely present in the genomes of S. pneumoniae isolates of different serotypes and are clustered into three major groups [9,10]. Satellite prophages are also widespread in S. pneumoniae, and despite being defective, may contain genes contributing to phenotypic traits such as virulence [11]. Plasmids are rarely found in S. pneumoniae genomes [6]. Most of the pneumococcal plasmids are identical or very similar to pDP1, a 3161 bp cryptic plasmid first identified in the type 2 D39 Avery's strain and its derivatives [12,13]. ICE Tn5253 was one of the first conjugative transposons identified in S. pneumoniae [14,15]. Tn5253 is a 64.5 kb composite element containing Tn5251, another ICE carrying the tetracycline resistance tet(M) gene, and the Ωcat(pC194) element, carrying the chloramphenicol resistance cat gene [16]. Tn5253 is able to (i) excise from the pneumococcal chromosome reconstituting the attB target site; (ii) produce circular forms; and (iii) transfer horizontally via conjugation in different bacterial species [17,18]. The chromosomal integration of Tn5253 occurs downstream of a conserved 11 bp sequence of the essential rbgA gene in S. pneumoniae and in all other known hosts [19]. Pneumococcal pathogenicity island 1 (PPI1) is a highly variable accessory region widely present in S. pneumoniae isolates [20,21]. It is composed of (i) an operon coding for an ABC iron transporter, (ii) the pezAT toxin-antitoxin system, (iii) a putative lantibiotic production locus and (iv) the nplT neopullulanase gene. In particular, highly virulent S. pneumoniae isolates carrying a variant of PPI1 containing pezAT and nplT exhibit enhanced fitness in blood, lungs and nasopharinx [22]. In this work, we determined the complete genome sequence of the S. pneumoniae type 19F clinical isolate BM6001, which is the original host of ICE Tn5253. Sequence analysis allowed the definition of the BM6001 mobilome. Since prophage ΦBM6001.3 disrupts the competent comGC/cglC gene, we investigated whether prophage mitomycin C induction could restore competence for genetic transformation. Furthermore, a phylogenetic analysis of the BM6001 genome was performed to investigate whether the lack of competence for genetic transformation drives the accumulation of multiple MGEs.

Bacterial Strain and Growth Conditions
S. pneumoniae BM6001 is a serotype 19F clinical strain isolated from a patient with sinusitis [23]. BM6001 was grown in Tryptic Soy Broth (TSB) (BD) or in TSB supplemented with 1.5% agar (BD) and 3% defibrinated horse blood (Liofilchem, Roseto Degli Abruzzi, Italy) at 37 • C in the presence of a 5% CO 2 -enriched atmosphere.

Genomic DNA Purification
Bacterial cells were grown at 37 • C in 500 mL of TSB broth until reaching the late exponential phase (OD 590 = 1), and then harvested via centrifugation at 5000× g for 30 min at 4 • C. High-molecular-weight genomic DNA was purified using a cetyl trimethyl ammonium bromide (CTAB)-based method [24]. Briefly, a cell pellet was dry vortex-mixed and lysed for 15 min at 37 • C in a lysis solution containing sodium dodecyl sulphate (SDS) 0.008% and sodium deoxycholate (DOC) 0.1%, as described in [25]. Proteins and polysaccharides were precipitated in 0.5 M NaCl and CTAB/NaCl (10% CTAB, 0.7 M NaCl) at 65 • C for 10 min. High-molecular-weight DNA was purified three times with 1 volume of chloroform/isoamyl alcohol (24:1 (v:v)), precipitated in 0.6 volumes of ice-cold isopropanol and spooled on a glass rod. DNA was resuspended in 10-fold diluted saline-sodium citrate (SSC) 1× buffer, adjusted to 1× SSC, homogenized using a rotator mixer and stored at 4 • C. DNA was quantified with a Qubit 3.0 Fluorometer (Invitrogen, Waltham, MA, USA) using the Qubit dsDNA BR Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA) and with a spectrophotometer measurement (Implen, Munich, Germany). DNA integrity and size were assayed via horizontal gel electrophoresis using 0.6% Seakem LE (Lonza, Rockland, ME, USA) agarose in 0.5× Tris Borate EDTA running buffer.

Nanopore Sequencing
Nanopore sequencing was carried out essentially as already described [24,26]. Briefly, sequencing libraries were prepared in 1.5 mL LoBind tubes (Sarstedt, Nümbrecht, Germany) using wide bore (Φ 1.2 mm) tips for DNA manipulation to reduce physical shearing. The DNA size selection of the genomic DNA was obtained with 0.5 volumes of AMPure XP beads (Beckman Coulter, Milano, Italy) according to the manufacturer's instructions. Then, 2.5 µg of size-selected DNA was employed for library construction by using the SQK-LSK 108 kit (Oxford Nanopore Technologies, Oxford, UK). Library preparation was performed according to the manufacturer's protocol with the following modifications: (i) incubation on rotator mix for 15 min; (ii) the Library Loading Beads (LLB) were not added. Finally, 1 µg of DNA library was loaded onto a R9.4 flow cell (FLO-MIN106) (Oxford Nanopore Technologies). A 25 h sequencing run was performed on a GridION device (Oxford Nanopore Technologies). Real-time base-calling was performed with Guppy v3.2.6 (Oxford Nanopore Technologies), filtering out reads with a quality cut-off of <Q7. Base-called reads were analyzed with NanoPlot v1.18.2 (https://github.com/wdecoster/NanoPlot).

PCR, qPCR and Direct PCR Sequencing
PCR and direct PCR sequencing were carried out as previously described [27]. Quantitative real-time PCR was carried out with the KAPA SYBR FAST qPCR kit Master Mix Universal (2X) (Merck, Darmstadt, Germany) on a LightCycler 1.5 apparatus (Roche Diagnostics, Mannheim, Germany). The real-time PCR mixture contained, in a final volume of 20 µL, 1× KAPA SYBR FAST qPCR reaction mix, 5 pmol of each primer and 1 µL of bacterial lysate. The thermal profile was an initial 3 min denaturation step at 95 • C followed by 40 cycles of repeated denaturation (0 s at 95 • C), annealing (20 s at 60 • C) and polymerization (45 s at 72 • C). The temperature transition rate was 20 • C/s in the denaturation and annealing step and 5 • C/s in the polymerization step. Oligonucleotide primers and their properties are reported in Table S1. Excised forms were detected and quantified using divergent primers targeting the ends of the elements, while the reconstitution of the integration sites was detected using primers targeting their flanking regions. A standard curve for the gyrB gene of S. pneumoniae BM6001 was built and used to standardize results. Melting curve analysis was performed to differentiate the amplified products from primer dimers.

Competent Cells Preparation and Transformation
S. pneumoniae competent cells' preparation and transformation were carried out essentially as described in [28]. Briefly, precompetent pneumococcal cells were obtained by growing bacteria in TSB until the early exponential phase (OD 590 = 0.1), then diluting them 100-fold in the Competence Transformation Medium (CTM) composed of TSB with 20% glucose, 4% BSA and 1% CaCl 2 . Cells were collected at 15 min intervals during the exponential phase of growth in CTM after 1 h of incubation at 37 • C. Mitomycin C was added 30 min after the dilution in CTM at a final concentration of 2 ng/mL. Competence curve was obtained by adding 1 µg/mL of transforming chromosomal DNA, carrying the nov-1 point mutation, and 100 ng/mL of competence-stimulating peptide (CSP) to the precompetent cells. The transformation mixture was incubated at 37 • C for 45 min. The selection of transformants was obtained by a multilayer plating procedure in the presence of 10 µg/mL novobiocin [29]. The S. pneumoniae FP10 standard recipient was used as a control strain.

Nucleotide Sequences Accession Numbers
The complete genome sequence of Streptococcus pneumoniae BM6001 is available under GenBank accession no. CP107038, whereas Nanopore and Illumina sequencing reads are available under Sequence Read Archive (SRA) accession no. SRR21857739 and SRR21857738, respectively.

The Genome of S. pneumoniae BM6001
The sequence analysis showed that the BM6001 genome is organized in one circular chromosome of 2,293,748 base pairs (bp) in length, with an average GC content of 39.54% ( Figure 1). The genome contains 2403 open reading frames (ORFs), distributed on the sense strand (1172) and on the antisense strand (1231). An annotation with the prediction of a biological function was possible for 2125 ORFs, including 12 ribosomal RNA (rRNA) genes grouped in four rRNA operons, 59 tRNA genes, of which 15 are not adjacent to rRNA operons, and 3 structural RNAs, namely the (i) tRNA-like/mRNA-like RNA, (ii) the signal recognition particle RNA and (iii) the ribonuclease P RNA. BM6001 genome harbors a type 19F capsule locus located between dexB (cds spanning nts 386,479 to 388,086) and aliA (cds spanning nts 405,911 to 407,893) as in almost all pneumococcal strains [2,32,33]. A total of 15 capsule genes were predicted to be a single transcriptional unit with a promoterlike sequence located upstream of wzg/cps19A, as reported in [2,32]. Between the last capsule gene rmlD/cps19O and aliA, the locus harbors the UDP-galactopyranose mutase glf gene, as reported for other serotypes, including 19C [2]. The pspC locus, encoding the multifunctional pneumococcal surface protein C (PspC) in BM6001, contains two tandem copies of the pspC gene spaced by an IS1167 element, as already described in other S. pneumoniae clinical isolates [3]. The 1764 bp pspC copy (cds spanning nts 2,239,644 to 2,241,407) encodes the novel PspC9.5 variant containing an LPXTG anchoring domain, whereas the 2172 bp pspC copy (cds spanning nts 2,243,070 to 2,245,241) encodes the novel PspC3.14 variant containing the classical pneumococcal choline-binding domain. The comCDE competence locus spans nts 2,288,375 to 2,290,595 and contains the comC1 and comD1 alleles [34][35][36], while the phase-variable type I restriction modification system spnIII, predominantly arranged as variant C, spans nts 529,953 to 538,028 [37].

IME Tn7089
The novel IME Tn7089 is 9089 bp in length, spans nts 339,362 to 348,450 and is characterized by an overall GC content of 30.48%. The element contains 11 ORFs, which are all transcribed in the same direction ( Figure 2). Manual homology-based annotation attributed a putative function to only 6 out of 11 ORFs (Table 1). Tn7089 contains (i) an integration/excision module, constituted by the int and xis genes, coding for an integrase and an excisionase; (ii) orf3 and orf7 coding for a Rep protein and an FtsK homologous protein likely involved in the mobilization; and (iii) orf4 and orf5 coding for metal-dependent phosphoesterase proteins. The latter of these may be involved in the hydrolysis of the pyrophosphate released during nucleotide polymerization, and thus may influence the kinetics of DNA synthesis [41]. The NCBI database of 90,819 complete microbial genomes (accessed in May 2023) was interrogated using the Tn7089 DNA sequence as a query. Tn7089-like elements were found in 11 S. pneumoniae genomes and contained nt changes up to 2.56% of the sequence length ( Figure 2). In two additional genomes, a Tn7089-like element contains a 963 bp deletion and a 2682 bp insertion at the 3 end. Furthermore, a Tn7089-like element containing a 1639 bp DNA deletion and an 870 bp DNA insertion at the 3 end was found in the Streptococcus mitis B6 genome.
The novel IME Tn7089 is 9089 bp in length, spans nts 339,362 to 348,450 and is characterized by an overall GC content of 30.48%. The element contains 11 ORFs, which are all transcribed in the same direction ( Figure 2). Manual homology-based annotation attributed a putative function to only 6 out of 11 ORFs (Table 1). Tn7089 contains (i) an integration/excision module, constituted by the int and xis genes, coding for an integrase and an excisionase; (ii) orf3 and orf7 coding for a Rep protein and an FtsK homologous protein likely involved in the mobilization; and (iii) orf4 and orf5 coding for metal-dependent phosphoesterase proteins. The latter of these may be involved in the hydrolysis of the pyrophosphate released during nucleotide polymerization, and thus may influence the kinetics of DNA synthesis [41]. The NCBI database of 90,819 complete microbial genomes (accessed in May 2023) was interrogated using the Tn7089 DNA sequence as a query. Tn7089-like elements were found in 11 S. pneumoniae genomes and contained nt changes up to 2.56% of the sequence length ( Figure 2). In two additional genomes, a Tn7089-like element contains a 963 bp deletion and a 2682 bp insertion at the 3′ end. Furthermore, a Tn7089-like element containing a 1639 bp DNA deletion and an 870 bp DNA insertion at the 3′ end was found in the Streptococcus mitis B6 genome.

Transposon Tn7090
The novel 9053 bp long (nts 1,920,769 to 1,929,821) transposon Tn7090 has a GC content of 30.78% and is flanked by two 460 bp long direct repeats homologous to the ISSpn7 element of the IS5 family ( Figure 3). Tn7090 contains seven ORFs organized as a single transcriptional unit (Table 2). Orf1 is predicted to be a divalent cation-dependent glycerophosphodiester phosphodiesterase family protein which hydrolyzes glycerophosphodiesters into sn-glycerol 3-phosphate and alcohol [42]. In bacteria, this protein family is involved in virulence and adhesion to host cells, as demonstrated for Haemophilus influenzae protein D [43]. Orf2 and Orf7 are predicted to be extracellular solute-binding proteins, whereas Orf3-Orf4 to constitute an ATP-binding cassette transporter. Orf5 is homologous to proteins of the MgtC/SapB/SrpB/YhiD family, likely involved in intracellular survival and Mg 2+ binding [44]. Orf6 is homologous to a phosphatase of the haloacid dehalogenase-like hydrolase family [45]. Altogether, these genes are predicted to code for proteins likely involved in the uptake and binding of Mg 2+ cations in the adhesion to host cells and intracellular survival. The homology search analysis showed that Tn7090-like elements are present in 43 out of 153 complete genomes of S. pneumoniae. positional matrix adjustment. c Numbers in parentheses represent the part of the protein homologous to the Pfam domain.

Transposon Tn7090
The novel 9053 bp long (nts 1,920,769 to 1,929,821) transposon Tn7090 has a GC content of 30.78% and is flanked by two 460 bp long direct repeats homologous to the ISSpn7 element of the IS5 family ( Figure 3). Tn7090 contains seven ORFs organized as a single transcriptional unit (Table 2). Orf1 is predicted to be a divalent cation-dependent glycerophosphodiester phosphodiesterase family protein which hydrolyzes glycerophosphodiesters into sn-glycerol 3-phosphate and alcohol [42]. In bacteria, this protein family is involved in virulence and adhesion to host cells, as demonstrated for Haemophilus influenzae protein D [43]. Orf2 and Orf7 are predicted to be extracellular solute-binding proteins, whereas Orf3-Orf4 to constitute an ATP-binding cassette transporter. Orf5 is homologous to proteins of the MgtC/SapB/SrpB/YhiD family, likely involved in intracellular survival and Mg 2+ binding [44]. Orf6 is homologous to a phosphatase of the haloacid dehalogenase-like hydrolase family [45]. Altogether, these genes are predicted to code for proteins likely involved in the uptake and binding of Mg 2+ cations in the adhesion to host cells and intracellular survival. The homology search analysis showed that Tn7090-like elements are present in 43 out of 153 complete genomes of S. pneumoniae.

Prophages
The BM6001 genome search carried out with PHASTER software led to the identification of three prophages, ΦBM6001.1, ΦBM6001.2 and ΦBM6001.3, ranging in size from 36,108 bp to 42,445 bp, and two satellite prophages, ΦBM6001.4 and ΦBM6001.5 (Figure 4). Prophages have a GC content similar to the host, while satellite prophages have a lower GC content [11]. Virfam analysis and a BLAST homology search, conducted in public protein databases and Pfam, predicted the presence of genes encoding phage structural proteins and lytic enzymes such as holin and amidase for the three prophages. Genes encoding proteins involved in the DNA replication process and in lysogeny were found in both prophages and satellite prophages (Table S3). Only prophage ΦBM6001.2 contains virulence determinants, namely orf31, predicted to code for a YopX homolog [46], and orf38, predicted to encode the "virulence-associated protein E", which is widely spread among pneumococcal phages and associated with S. pneumoniae virulence in a mouse pneumonia model [11] and a tyrosine-tRNA gene, which might contribute to modulating the tRNA pools of bacteria, improving the translation efficiency of viral genes [47]. The homology search showed that ΦBM6001.1 is essentially identical to the S. pneumoniae siphovirus prophage IPP42 in the AP200 genome [10]. ΦBM6001.2 is homologous to S. pneumoniae siphovirus prophage phiARI0131-1 [48] except for the presence of a rearranged copy of the structural gene orf6 and the acquisition of a 1202 bp DNA segment (orf29 to orf33). ΦBM6001.3 has a mosaic structure where the 3 end lysogeny module (orf40 to orf49) is similar to that of S. pneumoniae siphovirus prophage IPP69, whereas the remaining sequence is homologous to S. pneumoniae siphovirus prophage MM1 [10]. Finally, satellite prophages ΦBM6001.4 and ΦBM6001.5 are essentially identical to satellite prophages Javan 759 and Javan 757, respectively [11].

Genomic Islands
Strain-specific genomic DNA fragments have been described in the genome of S. pneumoniae and were named (i) gene clusters [49,50], (ii) regions of diversity (RD) [51,52], or (iii) accessory regions [8,22]. These terms were used as synonyms. In addition, two DNA fragments were described as genomic islands, namely pneumococcal pathogenicity island 1 (PPI1) [20,21] and the PTS island [53]. Using the IslandViewer and BLAST software b, we identified five specific DNA fragments, ranging in size from 5188 bp to 22,489 bp, flanked by direct repeats (7 to 15 bp in length) and characterized by a lower GC content than the average of BM6001 chromosome. Since these DNA fragments are neither prophages nor ICEs/IMEs, we refer to them as genomic islands (GIs) ( Figure 5). GI-BM6001.1 (orf40 to orf49) is similar to that of S. pneumoniae siphovirus prophage IPP69, whereas the remaining sequence is similar to S. pneumoniae siphovirus phage MM1. ΦBM6001.4, nts 2,961 to 21,546, is homologous to prophage Javan 759, and ΦBM6001.5, nts 228,793 to 241,676, is essentially identical to Javan 757. Via manual homology-based annotation, a putative function was only attributed to a minority of prophage ORFs ranging from 38% to 57% (Table 2). ORFs and their direction of transcription are represented by arrows, and annotated ORFs are indicated by their number. Different colors highlight the different functional gene categories. Chromosomal genes flanking the prophage insertion sites are represented by thin gray arrows. The scale is in kilobases.

Genomic Islands
Strain-specific genomic DNA fragments have been described in the genome of S. pneumoniae and were named (i) gene clusters [49,50], (ii) regions of diversity (RD) [51,52], or (iii) accessory regions [8,22]. These terms were used as synonyms. In addition, two DNA fragments were described as genomic islands, namely pneumococcal pathogenicity island 1 (PPI1) [20,21] and the PTS island [53]. Using the IslandViewer and BLAST software b, we identified five specific DNA fragments, ranging in size from 5188 bp to 22,489 bp, flanked by direct repeats (7 to 15 bp in length) and characterized by a lower GC content than the average of BM6001 chromosome. Since these DNA fragments are neither prophages nor ICEs/IMEs, we refer to them as genomic islands (GIs) ( Figure 5). GI-BM6001.1 is located in the intergenic region between the cell division divIB gene and the orotidine-5 -phosphate decarboxylase pyrF gene. GI-BM6001.1 is a variant of TIGR4 RD5 [51] and contains four ORFs, including orf2, which is predicted to code for an HesA/MoeB/ThiF family enzyme that catalyzes the adenylation of ThiS, a sulfur carrier protein involved in the biosynthesis of thiamine [54], and orf3 and orf4, coding for an ATP-binding cassette transporter. Furthermore, a truncated IS of the IS30 family is present at the 3 end. GI-BM6001.2, flanked by a 14 bp inverted repeat, is a variant of the pathogenicity island PPI1 [21] located at the same intergenic region as that found in the TIGR4 genome. Compared to TIGR4 PPI1, a 5657 bp insertion produces a 181 bp deletion involving the PPI1 sp1038 gene, while at the 3 end, a 1715 bp insertion produces a 13,297 bp deletion, corresponding to PPI1 sp1048-sp1066. GI-BM6001.3 is a variant of the R6 Cluster 4 [50] and is part of a variable locus in S. pneumoniae genomes, located between the conserved NADP-specific glutamate dehydrogenase gdhA and 50S ribosomal protein L7/L12 rplL genes ( Figure 5) [55,56]. The island is flanked by a 7 bp DR and contains a truncated copy of ISSmi2 of the IS1182 family at the 5 end. The DR likely acts as a hotspot for integration, as a truncated copy of the IS5 element was found downstream of the 3 end of GI-BM6001.3, and a complete copy of IS5, flanked by the 7 bp DR, was found in other pneumococcal genomes. GI-BM6001.3 contains 11 ORFs, of which, orf4 to orf11 are predicted to be a single transcriptional unit likely involved in the metabolism of sialic acid [57]. Orf4 is a phosphatidylglycerophosphatase A, Orf9-Orf8-Orf7-Orf6 constitute an ATP-binding cassette transporter, Orf10 is a cyclically permuted mutarotase family protein and Orf11 is homologous to an N-acetylmannosamine-6-phosphate 2-epimerase (Table S4). GI-BM6001.4 is flanked by 15 bp imperfect direct repeats containing the 3 end of the guaA gene. The island carries nine ORFs including orf3, a cell division FtsK gene and orf1 and orf2, predicted to encode an integration/excision module. GI-BM6001.4-like sequences were only found in two pneumococcal complete genomes and in other streptococcal genomes such as Streptococcus suis, Streptococcus equi and Streptococcus agalactiae. GI-BM6001.5 is a variant of the PTS island described in S. pneumoniae Hungary 19A-6 and OXC141 genomes [53] located in an intergenic region. Compared to the PTS island, GI-BM6001.5 contains the insertion of ISSpn5 in an intergenic region and an insertion of orf7 within the putative cellobiose PTS system module.

Excision and Attachment Sites' Reconstitution of BM6001 MGEs
PCR and sequencing analysis were used to detect and quantify the restored insertion sites and the excised forms of the different genetic elements composing the BM6001 mobilome. As already reported, ICE Tn5253 excises from bacterial chromosomes, producing circular forms and the reconstitution of the attB target site [18]. IME Tn7089 is able to excise from the bacterial chromosome, producing a circular form where the left and right ends are joined by a 52 bp sequence (attTn). The excision of the element reconstitutes the chromosomal 52 bp Tn7089 attachment site (attB), whereas upon chromosomal integration, the element is flanked by attL and attR, identical to attTn and attB, respectively. attL-attTn contain four nt changes compared to attR-attB. attB corresponds to the last 52 nts of the rpsI coding sequence encoding the 30S ribosomal protein S9. In liquid culture of BM6001, circular forms of Tn7089 were present at a concentration of 3.92 × 10 −8 (±1.08 × 10 −8 ) copies per chromosome, whereas the reconstituted attB site was at 5.73 × 10 −8 (±5.78 × 10 −9 ) copies per chromosome (Table 3). Also, Tn7090 is able to excise from the chromosome and produce a circular form. Upon excision, a copy of ISSpn7 remains in the chromosome, while a copy of ISSpn7 joins the left and right ends of the transposon circular forms. Tn7090 circular forms were present at a concentration of 4.19 × 10 −4 (±1.60 × 10 −4 ) copies per chromosome, whereas the reconstituted attB site was at 3.21 × 10 −4 (±3.11 × 10 −5 ) copies per chromosome (Table 3). Prophages and satellite prophages also produce excised forms where the left and right ends are joined by an attP sequence, restoring the relative attB integration sites. Prophages and satellite prophage attB sites are insertional hotspots, since other genetic elements integrate at the same site [11,58]. In ΦBM6001.1, the 21 bp attB is identical to attP and is located in the intergenic region, spacing the adenylosuccinate synthase gene and tRNA-specific adenosine deaminase tadA gene (Figure 4). ΦBM6001.2 is flanked by 14 bp long attL and attR, which differ by one nt change. attB is located in the intergenic region between the NAD(P)/FAD-dependent oxidoreductase gene and the DNA-binding protein whiA gene. ΦBM6001.3 is flanked by 11 bp long attL and attR, identical to attB and attP, respectively, and differing by two nt changes. The ΦBM6001.3 attB corresponds to 11 nts (2,042,299 to 2,042,309) located at the 5 end of the comGC/cglC late competence gene [59][60][61]. Both ΦBM6001.4 and ΦBM6001.5 satellite prophages integrate in intergenic regions, upstream of ychF and uvrA genes, respectively. ΦBM6001.4 attP and attB sites are 26 bp long with attL-attB differing from attR-attP by four nt changes, whereas ΦBM6001.5 attP and attB sites are 24 bp long with attL-attP differing from attR-attB by one nt change. Excised forms of prophages and satellite prophages range from 1.12 × 10 −5 to 2.35 × 10 −1 , while the frequency of reconstituted integration sites varies from 3.23 × 10 −7 to 3.01 × 10 −2 (Table 3). Finally, neither circular forms nor reconstituted chromosomal target sites could be detected for genomic islands. variable locus in S. pneumoniae genomes, located between the conserved NADP-specific glutamate dehydrogenase gdhA and 50S ribosomal protein L7/L12 rplL genes ( Figure 5) [55,56]. The island is flanked by a 7 bp DR and contains a truncated copy of ISSmi2 of the IS1182 family at the 5′ end. The DR likely acts as a hotspot for integration, as a truncated copy of the IS5 element was found downstream of the 3′ end of GI-BM6001.3, and a complete copy of IS5, flanked by the 7 bp DR, was found in other pneumococcal genomes. GI-BM6001.3 contains 11 ORFs, of which, orf4 to orf11 are predicted to be a single transcriptional unit likely involved in the metabolism of sialic acid [57]. Orf4 is a phosphatidylglycerophosphatase A, Orf9-Orf8-Orf7-Orf6 constitute an ATP-binding cassette transporter, Orf10 is a cyclically permuted mutarotase family protein and Orf11 is homologous to an N-acetylmannosamine-6-phosphate 2-epimerase (Table S4). GI-BM6001.4 is flanked by 15 bp imperfect direct repeats containing the 3′ end of the guaA gene. The island carries nine ORFs including orf3, a cell division FtsK gene and orf1 and orf2, predicted to encode an integration/excision module. GI-BM6001.4-like sequences were only found in two pneumococcal complete genomes and in other streptococcal genomes such as Streptococcus suis, Streptococcus equi and Streptococcus agalactiae. GI-BM6001.5 is a variant of the PTS island described in S. pneumoniae Hungary 19A-6 and OXC141 genomes [53] located in an intergenic region. Compared to the PTS island, GI-BM6001.5 contains the insertion of ISSpn5 in an intergenic region and an insertion of orf7 within the putative cellobiose PTS system module.

Impact of ΦBM6001.3 on Genetic Transformation
ΦBM6001.3 chromosomal insertion disrupts the major pilin-like comGC/cglC gene, a member of the comG/cgl operon involved in the formation of the DNA uptake machinery during transformation [60,62]. To investigate if ΦBM6001.3 impairs transformation [48], BM6001 competent cells were collected over a 90 min interval and transformed with a donor DNA carrying a point mutation for a selectable marker [28]. Competent cells were also obtained in the presence of a mitomycin C stimulus. In our experimental conditions, we were not able to obtain transformants at any time point sampled, neither with nor without mitomycin C (<1.07 × 10 −7 transformants/total CFU). Then, we quantified the ΦBM6001.3 excised forms and attB restoration frequency in bacterial lysates obtained from competent cells. Over time, we did not observe a significant variation in the frequencies of ΦBM6001.3 excised forms and reconstituted attB in the untreated competent cells (average values of 8.93 × 10 −1 and 4.32 × 10 −2 , respectively; Figure 6), whereas mitomycin C induced a 10-fold frequency increase compared to the untreated control after 1 h of treatment.

Impact of ΦBM6001.3 on Genetic Transformation
ΦBM6001.3 chromosomal insertion disrupts the major pilin-like comGC/cglC gene, a member of the comG/cgl operon involved in the formation of the DNA uptake machinery during transformation [60,62]. To investigate if ΦBM6001.3 impairs transformation [48], BM6001 competent cells were collected over a 90 min interval and transformed with a donor DNA carrying a point mutation for a selectable marker [28]. Competent cells were also obtained in the presence of a mitomycin C stimulus. In our experimental conditions, we were not able to obtain transformants at any time point sampled, neither with nor without mitomycin C (<1.07 × 10 −7 transformants/total CFU). Then, we quantified the ΦBM6001.3 excised forms and attB restoration frequency in bacterial lysates obtained from competent cells. Over time, we did not observe a significant variation in the frequencies of ΦBM6001.3 excised forms and reconstituted attB in the untreated competent cells (average values of 8.93 × 10 −1 and 4.32 × 10 −2 , respectively; Figure 6), whereas mitomycin C induced a 10-fold frequency increase compared to the untreated control after 1 h of treatment.

Phylogenetic Analysis of BM6001
To investigate whether the lack of competence for genetic transformation drives the accumulation of multiple MGEs within the BM6001 genome, a collection of sequenced historical pneumococcal genomes was used to analyze the phylogenetic relationship of BM6001 [30,31]. A total of 195 genomes were used to generate a phylogenetic tree using

Phylogenetic Analysis of BM6001
To investigate whether the lack of competence for genetic transformation drives the accumulation of multiple MGEs within the BM6001 genome, a collection of sequenced historical pneumococcal genomes was used to analyze the phylogenetic relationship of BM6001 [30,31]. A total of 195 genomes were used to generate a phylogenetic tree using both the core and accessory genome information (Figure 7). BM6001 collocates in the same lineage with five other genomes, namely GA13455, CDC1873-00, GA04375, SA41 and SA42. The core genome distances vary between 0.005647123 and 0.024928689, while the accessory genome distances are between 0.14985132 and 0.4647357 (Table S3), indicating that most changes are clustered in the mobile portion of the genome. The 292,323 bp representing the BM6001 mobilome devoid of ISs were used as a probe for the homology search in the five genomes belonging to the same lineage. Regions of homology were identified in all the genomes, ranging from a minimum of 34,436 bp (11.8%) in GA04375 to a maximum of 129,463 bp (44.3%) in SA42.   Genetic relatedness was evaluated with the PopPUNK tool using the '--fit-model lineage' parameter for data fitting. The phylogenetic tree was generated on the sequence data available at GenBank with branch lengths, calculated based on k-mer analysis, indicating the number of nucleotide substitutions per site (scale bar), whereas 23 lineages (colored dots) were obtained including both core and accessory genome sequence information. Lineage 9 includes BM6001 and strains GA13455, CDC1873-00, GA04375, SA41 and SA42 (red dots). Names of the strains are indicated, and the date of isolation is reported in parenthesis.

Discussion
In this study, complete genome sequence analysis allowed the definition of the BM6001 mobilome, which includes, beyond the well-characterized ICE Tn5253, (i) 2 novel elements, IME Tn7089 and transposon Tn7090; (ii) 3 prophages and 2 satellite prophages; (iii) 5 GIs; (iv) several ISs, RUPs, 153 BOX elements and SPRITEs. MGEs represent 15.54% of the BM6001 genome and prophages and satellite prophages constitute 6.49% of the genome alone, while the average for pneumococci is 3.8% [11]. The annotation of the GI genes revealed that they contained five predicted transport systems, which are possibly involved in pneumococcal virulence [8,20,21]. Furthermore, ΦBM6001.2 contains virulence determinants coding for the YopX family protein and the virulence-associated E family protein [11], while Tn7090 codes for proteins likely involved in the uptake and binding of Mg 2+ cations, in the adhesion to host cells and in intracellular survival. It has been shown that ICE Tn5253 is capable of excision from the BM6001 chromosome, circularization and subsequent conjugal transfer to other pneumococcal strains [18]. Instead, BM6001 does not acquire exogenous DNA via genetic transformation even though S. pneumoniae are naturally transformable species. Given this, we first hypothesized that the insertion of ΦBM6001.3 within the major pilin-like comGC/cglC gene impairs the DNA entry, since the gene is part of the comG/cgl operon involved in the formation of the pneumococcal DNA uptake machinery during transformation. It is noteworthy that ΦBM6001.3 excises at a high frequency from the pneumococcal chromosome, restoring the insertion site in about 3% of the cells, which therefore have an intact copy of comGC/cglC. Mitomycin C induction further increases ΦBM6001.3's frequency of excision and restoration of the intact comGC/cglC gene of about one order of magnitude. Nevertheless, BM6001 could not be transformed in any of our experimental conditions. As observed for ΦBM6001.3, mitomycin C treatment induces the excision of the lysogenic phage Φ1207.3 [63][64][65][66], which inserts within the pneumococcal comEC/celB competence gene impairing genetic transformation and increases the free locus restoration frequency. Differently from ΦBM6001.3, this restores a low level of competence for genetic transformation (9.96 × 10 −4 transformants/total CFU). To investigate whether other factors could explain the lack of transformability of BM6001, we compared the BM6001 pneumococcal competence genes to those of TIGR4 and found that all of them were essentially conserved, except for comGC/cglC disruption. Furthermore, the presence of restriction modification (RM), toxin-antitoxin (TA) and abortive infection (Abi) systems, carried by BM6001 MGEs, could concur with transformation impairment; in fact, Tn5253 harbors two copies of the pezAT TA system, which is known to impair transformation [67], a methylase and an Abi system, while ΦBM6001.2 carries a restriction enzyme and an Abi protein. Transformation impairment is probably the result of multiple strategies adopted by the BM6001 mobilome to avoid its curing; in fact, genome-wide analysis has shown that transformation can lead to the removal of inserted MGEs via homologous recombination and that many MGEs tend to disrupt competence genes to block transformation and consequently avoid this phenomenon [48,68]. The genomic comparison showed that BM6001 is in the same lineage of five other historical strains, but it only shares less than 50% of its mobilome sequences with the closest relative. Altogether, our findings suggest that BM6001 has progressively accumulated many MGEs while losing competence for genetic transformation, to become a sort of genomic cul-de-sac.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/microorganisms11071646/s1, Table S1: List of oligonucleotide primers; Table S2: List of insertion sequences; Table S3: List of annotated ORFs of prophages and satellite prophages; Table S4: List of annotated ORFs of genomic islands; Table S5: Core and accessory genome distances.