Burkholderia cenocepacia Prophages—Prevalence, Chromosome Location and Major Genes Involved

Burkholderia cenocepacia, is a Gram-negative opportunistic pathogen that belongs to Burkholderia cepacia complex (BCC) group. BCC representatives carry various pathogenicity factors and can infect humans and plants. Phages as bacterial viruses play a significant role in biodiversity and ecological balance in the environment. Specifically, horizontal gene transfer (HGT) and lysogenic conversion (temperate phages) influence microbial diversification and fitness. In this study, we describe the prevalence and gene content of prophages in 16 fully sequenced B. cenocepacia genomes stored in NCBI database. The analysis was conducted in silico by manual and automatic approaches. Sixty-three potential prophage regions were found and classified as intact, incomplete, questionable, and artifacts. The regions were investigated for the presence of known virulence factors, resulting in the location of sixteen potential pathogenicity mechanisms, including toxin–antitoxin systems (TA), Major Facilitator Superfamily (MFS) transporters and responsible for drug resistance. Investigation of the region’s closest neighborhood highlighted three groups of genes with the highest occurrence—tRNA-Arg, dehydrogenase family proteins, and ABC transporter substrate-binding proteins. Searches for antiphage systems such as BacteRiophage EXclusion (BREX) and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) in the analyzed strains suggested 10 sequence sets of CRISPR elements. Our results suggest that intact B. cenocepacia prophages may provide an evolutionary advantage to the bacterium, while domesticated prophages may help to maintain important genes.


Introduction
The Burkholderia genus, named after its discoverer, Walter Burkholder [1,2] includes Gram-negative bacteria that cause plant disease (e.g., sour skin rot disease), but paradoxically they also act as natural pest antagonists by promoting plant growth [3,4]. There are also reports that Burkholderia lipopolysaccharide (LPS) is a factor which induces plant defensive capacity [5].
Burkholderia cenocepacia is a multidrug resistant, opportunistic pathogen. B. cenocepacia belongs to the Burkholderia cepacia complex (BCC), a group of 21 species of opportunistic pathogens that colonizes patients with cystic fibrosis (CF) and chronic granulomatous disease (CGD) [6]. Although, BCC representatives are responsible for only 3.5% of infections in CF patients, the fatality rate is higher than in the more commonly acquired Pseudomonas aeruginosa infection [7].
Following colonization of the host by BCC, the likelihood of complete eradication is very low. The course of infection differs depending on the CF status and may range from asymptomatic to severe inflammation of the lower respiratory tract, leading to lung function decline and ultimately to the In this study we analyzed 16 B. cenocepacia strains with fully sequenced genomes stored in the NCBI database. The main goals were to search for the presence and prevalence of prophages and their gene composition. Further, phage accumulation in particular chromosomes, location site (integration point), and closest neighborhood (flanking sequences) were also investigated. The occurrence of known virulence factors of B. cenocepacia was examined to determine if they were any of phage origin. All identified prophages were compared phylogenetically, to establish their taxonomic relationship. The analysis was done by in silico methods, based on two different approaches, one fully automatic and one manual approach to highlight the advantages and disadvantages of both techniques.

PHASTER Automated Annotation
Genomes of B. cenocepacia used for the analysis were obtained from the NCBI database (http://www. ncbi.nlm.nih.gov/) and were chosen based on completeness of sequence. Other incomplete sequences were omitted (as of 31.05.2017), leaving 16 bacterial genomes for further analysis (Table 1). Most B. cenocepacia strains have three chromosomes and each one was analyzed separately [30].  The prophages were located using PHASTER [31]. PHASTER scores potential phage regions using one of three methods, all of which are based on comparisons with the NCBI complete virus genome table (direct link to the table in citation) [32]. Based on the score, the region was assigned to one of three categories: intact (score higher than 90), questionable (score between 70 and 90) and incomplete (score below 70). The first method of scoring regions utilizes global comparison of the region with each of phages in the NCBI table as follows: if there is 100% or more conformity between total number of the CDS in both sources, the region score is set to 150. If the first method fails, the second or third one is used sequentially. The second evaluation is divided for three steps: (i) designation of the most probable potential phage (in the same manner as the first method, although with a minimum conformity level of 50%); (ii) the percentage comparison of the amount of proteins in the region and most probable potential phage, checked and multiplied by 100 (giving a partial score); (iii) the percentage comparison of the region and major potential phage length is checked and multiplied by 50 and added to the partial score from the step "ii", giving the final score. The last method relies on the keywords contained in the names of phage specific proteins (also called "cornerstones") present within particular region. PHASTER searches for words such as: "capsid", "coat", "fiber", "head", "integrase", "lysin", "plate", "tail", "portal", "terminase", "transposase", etc. Each of those hits grants 10 points for particular region. Additionally, up to 20 points can be awarded, if the region size is greater than 30 kb and there are at least 40 proteins in it.
PHASTER runs the second and third method in parallel, leaving the higher score as the final one for particular region [31,33].

Manual Annotation
The annotation of prophage genomes was performed twice-once via a PHASTER automated annotation during the prophage region search and secondly by a manual search. During the manual annotation the open reading frames were delimited with Artemis (open reading frame (ORF) size equal or higher than 75 nt) and confirmed with GenMarkS [34,35]. All sequences of identified open reading frames were compared with the NCBI database using BLASTN and BLASTP and annotated accordingly [36,37]. In certain cases where BLAST results were questionable or were difficult to assess (hypothetical proteins), HMMER software was used to verify the protein structure and domains [38].
Manual annotation was enriched by additional verification of 10 open reading frames lying upstream and downstream of regions found by PHASTER, to confirm or refute their involvement in a potential prophage genome. The regions were searched for the presence of regulatory sequences during the manual annotation. The transfer RNA's localization was conducted using Aragorn and confirmed by tRNAscan-SE [39,40]. Potential Rho-independent terminators were found using ARNold and curated manually [41]. The search for putative promoters was done by extraction of 100-nt sequences lying upstream of the predicted ORFs using STORM and their analysis by MEME [42,43]. MEME identified repetitive motifs which were investigated manually. Once identified, the sequence and localization of any regulatory sequences were marked in region characteristics cards (Supplementary Materials). Both annotation methods were compared in the tables (Tables S2-S17). Each table includes the manual and automatic annotation of one region, tabulating the localization of each ORF, the virus to which it holds homology as well as its accession numbers.

Phylogeny Analysis
Phylogeny reconstruction trees based on of the maximum likelihood method were prepared using Mega7. This software was set to use Jones-Taylor-Thornton (JTT) matrix-based model, while Neighbor-Join and BioNJ algorithms were applied to create an initial tree for heuristic search [44,45].
The graphical mapping of genomes was based on the PHASTER output data and confirmed by EasyFig 2.1 [46]. The location of regions in the host genomes was prepared with Circos (v. 0.67, Canada's Michael Smith Genome Sciences Centre) [47].

Results
In the 16 B. cenocepacia genomes analyzed, sixty-three potential prophage sequences were detected ( Table 2). Each region name was created based on its bacterial host with the addition of the chromosome number in which the prophage lay. The location of prophages and their initial completeness analysis were performed automatically with PHASTER to make three different categories: intact (marked in table as green), incomplete (marked red) and questionable (marked blue) ( Table 2). Only intact regions were further annotated manually and compared in terms of taxonomy to known phages (based on NCBI database). The visualization of their genome map, investigation of regional neighborhood genes and phylogenetic comparisons were also done. The latter two analyses were also used for incomplete regions. Table 2. Presence of prophages in Burkholderia cenocepacia genomes stored in the NCBI database. Regions types were marked with colors: intact (green), questionable (blue) incomplete (red) and artifact region (yellow).    After initial automatic analysis of the qualified regions, detailed annotation of intact regions was conducted via Artemis and manual search with BLASTP. In addition, the data which indicated the presence of phage-borne DNA such as: (i) the presence of regions with higher G + C pairs; (ii) the presence of phage recombinase gene-integrase; (iii) the presence of pathogenicity factors of viral origin; (iv) the presence of the cornerstones, were also included into the study [48,49]. The cornerstones are highly conserved phage specific proteins, which serve as indicators of the presence of a potential prophage: large terminase subunit, portal protein, head maturation protease, coat protein, tail shaft protein, tail fiber protein and tail tape measure protein. Especially distinctive are groups of large terminase subunits and portal proteins as conservative viral proteins [25].
Taking into consideration all aforementioned conditions, it has been showed that PHASTER annotation often differed substantially from manual result. Three main characteristics have been noted for PHASTER regions analysis: (i) it often classifies genes as viral, even if its homology to the database was minimal (less than 25%); (ii) the gene sets in some regions either consist of multiple repeats of one gene or lack sufficient genes crucial for phage functioning; (iii) in contrast to the manual analysis, viral proteins were classified as phages specific to various bacterial taxonomical groups other than Burkholderia. Nonviral gene occurrence, gene repetitions and incorrect taxonomical affiliation of the genes were observed in the case of J2315_chr2_1, H111_chr1_3, 895_chr1_2, 895_chr1_3 and MSMB384WGS_chr1_2 regions which PHASTER marked as intact phages. Consequently, an additional group named by us as an artifact region (yellow) has been created. The artifact region group assembled all regions with homology less than 25% to the database or with gene set suggesting phage remnants.
In the case of two B. cenocepacia strains, J2315 and H111, a total of three phage regions have been already described in the literature. In B. cenocepacia J2315, chromosome 1, two prophages-KS10 (Goudie et al.) and Bcep-Mu (Summer et al.) were reported [50,51]. The third presided in the B. cenocepacia H111, chromosome 1-φH111-1 (Lynch et al.) [29]. Even though the φH111-1 was characterized thoroughly, its analysis was done based on the incomplete sequence of B. cenocepacia H111 (contigs). Because of that, the location of the φH111-1 was suggested on the second chromosome. As our study was conducted on the complete sequence of B. cenocepacia H111, the location of the φH111-1 could be indicated more precisely and was located on the first chromosome. [29]. Due to extensive characterization of regions containing KS10, BcepMu and φH111-1 in the literature, they were treated as intact phages and included only in phylogenetic analysis.
The final count of prophages, based on both automated and manual methods, revealed fifteen intact, nine questionable and thirty-four incomplete phages (Table 2). Furthermore, five regions were classified as artifacts-J2315_chr2_1, H111_chr1_3, 895_chr1_2, 895_chr1_3 and MSMB384WGS_chr1_2. Eight intact regions showed homology to Myoviridae (region homology ranged between 10.4% and 79.9%), four were similar to Siphoviridae (homology between 1.7% and 54.3%). The remaining three phages, KS10 and Bcep-Mu are described as Myoviridae, while φH111-1 belongs to Siphoviridae (Table S18). No correlation between host origin (CF or environmental) and number of phages have been detected.
Due to the observed differences between manual and automated annotation, the data obtained by both processes were compared in Tables S1-S16. PHASTER annotation occasionally did not indicate the gene function-only the gene number, such as gp12. To make the comparison table more consistent, a manual annotation was done for genes lacking a name, thus the basic information of the coding sequence was provided. This was done by a manual search of the databases for proteins with 100% homology or investigation of information stored in the NCBI database.
The interpretation of BLASTp results and occurrence of ORFs differed locally between manual and automatic annotation in all of analyzed genomes. Variations in genome start positions were noted for regions: 895_chr1_9, FL-5-3-30-S1-D7_chr1_2 and J2315_chr1_1. Furthermore, the latter phage also had an altered genome end position.
In most of the potential prophage genomes (except artifact regions) the differences in annotation were not sufficiently significant to change the final classification of prophage completeness. However, in the case of J2315_chr2_1, H111_chr1_3, 895_chr1_2, 895_chr1_3 and MSMB384WGS_chr1_2 the result of the manual annotation led to a change of status from an intact to an artifact region.
Potential bacteriophages found in the bacterial genomes differed in the prevalence and distribution within host chromosomes. To investigate if there was any particular pattern in the placement of prophages in B. cenocepacia strains, the amount and location of prophage regions were examined. Table 3 presents the total prevalence of phage genomes in B. cenocepacia chromosomes. The occurrence of phages in hosts genomes varied from 0.3% to 3.67% of total DNA content, while the virus abundance in a single chromosome was between 0.3% up to 4.29%. Regardless of the number of chromosomes in a particular strain, the majority of potential phage regions were always found in chromosome 1. In 11 out of 12 strains possessing three chromosomes, most of prophage sequences were divided between chromosome 1 and 3, with a dominance of chromosome 1 (seven out of eleven). In the case of two-chromosome strains the ratio of phage to host sequence did not have any distinctive division. The intact regions were compared with the BCC phages available in the NCBI database and with each other (Table S18). The inquiry against the database returned following results: (i) phages J2315_chr1_1 (41.28%), 895_chr1_7 (33.54%) and CR 318_chr1_1 (74.62%) showed high similarity to phage KL3 (NC_015266.1); (ii) phages VC7848_chr1_2 (78.02%) and MC0-3_chr1_1 (63.70%) showed high similarity to phage AP3 (KP966108.1); (iii) phages DWS 37E-2_chr1(1.70%) and FL-5-3-30-S1-D7_chr1_2 (1.70%) showed minimal similarity to phage phi1026b (NC_005284.1); (iv) phage 895_chr1_9 showed 54.29% similarity to KS9 (NC_013055.1); (v) phage 895_chr2_1 showed 61.17% similarity to ST79 (NC_021343.1); (vi) phage DDS 22E-1_chr1_2 showed 21.50% similarity to phi644 (NC_009235.2); (vii) phage VC12308_chr1_1 showed 46.44% similarity to phi52237 (NC_007145.2); while (vii)-phage 895_chr1_1 showed 79.90% similarity to BcepMu (NC_005882.1). The comparison among studied phages showed a high similarity between phages J2315_chr1_1, VC12308_chr1_1 and 895_chr1_7. Another similar phages were MC0-3_chr1_1 and VC7848_chr1_1. The similarity between the other phages examined was not significant (Table S18).
Potential bacteriophages found in the bacterial genomes differed in the prevalence and distribution within host chromosomes. To investigate if there was any particular pattern in the placement of prophages in B. cenocepacia strains, the amount and location of prophage regions were examined. Table 3 presents the total prevalence of phage genomes in B. cenocepacia chromosomes. The occurrence of phages in hosts genomes varied from 0.3% to 3.67% of total DNA content, while the virus abundance in a single chromosome was between 0.3% up to 4.29%. Regardless of the number of chromosomes in a particular strain, the majority of potential phage regions were always found in chromosome 1. In 11 out of 12 strains possessing three chromosomes, most of prophage sequences were divided between chromosome 1 and 3, with a dominance of chromosome 1 (seven out of eleven). In the case of two-chromosome strains the ratio of phage to host sequence did not have any distinctive division.     To investigate prophage integration sites in the host chromosomes we manually examined their adjacent regions in the adjacent neighborhood. This verification was done regardless of the region status (intact, incomplete, questionable, artifact) (Table S18). Two approaches were used. First, we determined single genes located next to start and end positions. Second, we considered the 5-10 kb region located next to phage genome start and end. The first study aimed to search for a pattern in integration site (similar neighboring gene). The second method sought to verify the positioning of the start and end positions which were determined automatically by PHASTER. If the verification showed the presence of additional phage-like genes, the appropriate change was done in manual version of annotation. In most cases the phage genomes started and ended in the intergenic space. There were however several exceptions: (i) start position of prophage regions located inside the host gene were observed for J2315_chr1_2, 895_chr1_3, 895_chr1_4, HI2424_chr2_1, HI2424_chr3_1, DDS 22E-1_chr2_1; (ii) end position of prophage regions located inside the host gene were observed for DWS 37E-2_chr1_1, 895_chr1_3, 895_chr1_8 and VC12308_chr1_1; (iii) both, start and end positions of prophage regions located inside the host gene were observed for 895_chr1_3 (artifact region).
The closest neighbors of phage regions were often hypothetical proteins, which flanked phage genomes in 48 cases (37.2%). The most common neighboring, recognizable sequence (non-hypothetical protein), was tRNA-Arg, which was found either upstream or downstream of the prophage sequence (10.3% of all regions). The second group of sequences located next to identified regions was ABC transporter genes (4.7% of all regions), while the third one was LysR family transcriptional regulator genes (4%). Peptidase encoding genes were found in the neighborhood of three phage regions (2.4%). The remaining 52 flanking genes (41.4%) have determined function, however they did not repetitively appear as neighboring sequences (Table S18).
Bacteriophages can encode virulence factors such as toxins, enzymes or drug resistance. As bacteria may domesticate phages by leaving only useful genes, all identified prophage regions (complete, questionable, artifact, incomplete) were analyzed for the presence of potential virulence genes (Table 4). Thirteen genes were recognized as sequences potentially increasing bacterial drug resistance: (i) one was specified as class A β-lactamase; (ii) one was the Vicinal Oxygen Chelate (VOC) family protein; (iii) eleven belonged to the Major Facilitator Superfamily (MFS) transporter superfamily. VOC proteins possess Glo_EDI_BRP_like domains, which are found in a vast variety of related groups of metalloproteins. One of these groups comprised of antibiotic resistance proteins which can block drugs in different ways. For example, bleomycin resistance protein inhibits drug activity by binding to it, while fosfomycin resistance proteins inactive the drug by modifying its molecule [52,53].
The MFS transporters are the largest known superfamily of secondary carriers [54]. A considerable amount of drug and multidrug efflux pumps found in bacteria are comprised of the MFS transporters class. They are found in both Gram-negative (e.g., enterobacteria, Pseudomonas sp. or Moraxella sp.) and Gram-positive strains (Staphylococcus sp., Bacillus sp.), however it is unclear if they share substrate profiles [55]. The MFS found in strains examined have been phylogenetically compared to each other ( Figure 3). Although, MFS proteins occur six times in the analyzed regions of HI2424, they differ significantly. There is however high similarity between MFS transporters found in VC12802_chr2_2, CR318_chr3_2 and HI2424_chr3_2 (undelined in Table 4). compared to each other ( Figure 3). Although, MFS proteins occur six times in the analyzed regions of HI2424, they differ significantly. There is however high similarity between MFS transporters found in VC12802_chr2_2, CR318_chr3_2 and HI2424_chr3_2 (undelined in Table 4).  Three genes encoded for toxin-antitoxin (TA) systems ( Table 4). The hicA and hicB found in DDS 22E-1_chr1_2 encode a complete two protein system that targets mRNA. HicA is the predicted interferase, while HicB balances the system and neutralizes HicA [56]. Interestingly, the only complete TA system was found in intact phage DDS 22E-1_chr1_2. This may suggest that the host is not able to remove or damage the phage because the system is active. Three genes encoded for toxin-antitoxin (TA) systems (Table 4). The hicA and hicB found in DDS 22E-1_chr1_2 encode a complete two protein system that targets mRNA. HicA is the predicted interferase, while HicB balances the system and neutralizes HicA [56]. Interestingly, the only complete TA system was found in intact phage DDS 22E-1_chr1_2. This may suggest that the host is not able to remove or damage the phage because the system is active.
Bacteria can be armed with antiphage defense systems such as CRISPR-Cas and BREX, which effectively protect its host against viral infections [57]. Those systems may also take part in disposing of integrated phages (completely or partially), leading to the creation of cryptic phages [58]. Consequently, all bacterial genomes were screened for the presence of these bacterial antiphage defense systems. No BREX systems were found after manual analysis based on previously described methodology [59]. In contrast, using CRISPRfinder [60] ten potential CRISPR sequences were located in the chromosomes of B. cenocepacia strain 842 (two sequences in the same chromosome), J2315, H111, DWS 37E-2, 895, ST32, MSMB384WGS, CR318, DDS 22E-1, and VC7848 strains. Table S19 presents detailed data considering each of potential CRISPR sequence (Supplementary Data).
In an attempt to differentiate taxonomical groups of analyzed prophages, a phylogenetic analysis was conducted. The evolutionary resemblance of 64 nucleotide phage-like sequences (intact, artifact, questionable, incomplete) were proposed by using the maximum likelihood method based on the Tamura-Nei model [61]. The tree with the highest log likelihood (−108,132.8877) is presented in Figure 4. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site.  Table S19 presents detailed data considering each of potential CRISPR sequence (Supplementary Data).
In an attempt to differentiate taxonomical groups of analyzed prophages, a phylogenetic analysis was conducted. The evolutionary resemblance of 64 nucleotide phage-like sequences (intact, artifact, questionable, incomplete) were proposed by using the maximum likelihood method based on the Tamura-Nei model [61]. The tree with the highest log likelihood (−108,132.8877) is presented in Figure 4. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Four specific clades were distinguished. Phage genomes located in selected clades (marked on the tree) share the same neighboring sequence in the host genome: clade I and IV-tRNA-Arg; clade II-dehydrogenase family proteins; clade III-ABC transporter substrate-binding proteins. Other branches did not show any significant internal resemblance.
To summarize the results obtained by automatic and manual methods, region characteristics cards were created for all intact and artifact regions (Supplementary data). The cards collect the most Four specific clades were distinguished. Phage genomes located in selected clades (marked on the tree) share the same neighboring sequence in the host genome: clade I and IV-tRNA-Arg; clade II-dehydrogenase family proteins; clade III-ABC transporter substrate-binding proteins. Other branches did not show any significant internal resemblance.
To summarize the results obtained by automatic and manual methods, region characteristics cards were created for all intact and artifact regions (Supplementary data). The cards collect the most crucial data relating to each of the qualified phages. The intact phage card consists of basic data including size and location, taxonomical affiliation (based on the homology the NCBI database), the number of ORFs annotated, recognized regulatory sequences, region derivation and the complete annotation. Information regarding regulatory sequences and taxonomy are not included in artifact region cards however the additional data including homology of each protein is appended to the annotation tables.

Discussion
This article presents the results of an in silico analysis of potential prophage regions found in Burkholderia cenocepacia genomes, currently stored in the NCBI database (October 2017). Regions were divided into four groups basing on their completeness: intact (15 regions), questionable (nine regions), incomplete (34 regions) and artifact (five regions). Phage prevalence differed between BCC strains but did not exceed 3.67% of the total genome size, which is relatively low ratio in comparison to reports by Ohnishi [62,63].
All regions were subjected to both automatic and manual annotation processes to see if there were notable differences in these techniques. The automatic method was fast and allowed rough annotations, but it was quite imprecise because of incorrect interpretation of the homology level between genes in regions and in the database. Even if the genes' similarity to the virus database was at an insignificant level (in some cases as low as 5%) the software identified the gene as viral.
In the PHASTER scoring method some regions consisting of multiple repeats of the same virus gene were eventually marked as intact prophages. Bearing that in mind, the automatic annotation was verified manually, resulting in localization of potential mistakes made by the software and allowing the creation of an additional group of artifact regions. Those regions were either incomplete prophages with essential genes deprived or were aggregates of random viral-origin genes (often repetitive). The occurrence of such regions may suggest that BCC strains are able to domesticate prophages, as described previously by Bobay et al. [27].
Recently, Bodilis et al. 2018 [63] compared the Burkholderia ET12 lineage with a selection of environmental strains, including similar analysis based on the PHASTER software. These results considering fully sequences strains (J2315, MC0-3 and AU1054) presented by Bodilis et al. 2018 [63] were partially synonymous with the ones obtained in this study: (i) J2315_chr1_1 is the same region as the PI_J2315_1; (ii) J2315_chr2_1 is the same region as the PI_J2315_3; (iii) both prophages which were already described (KS10 and BcepMu) have been annotated in the same regions by all studies [50,51]; (iv) regions MC0-3_chr1_1 and MC0-3_chr1_2 are the same regions respectively to PI_MC0-3_1 and PI_MC0-3_2. Regions PI_J2315_6, PI_MC0-3_3, PI_MC0-3_4 and PI_AU1054_1 presented by Bodilis et al. 2018 [63] could not be found with the current version of PHASTER, while region PI_J2315_4 (ORFs BCAM0001AM_1973-BCAM0001AM_2038) is no longer available in the database, making it impossible to address those regions in the comparison. Further, all data relating to the H111 genome were based on the contigs sequence and therefore, the PI_H111_1, PI_H111_2 and PI_H111_3 regions could not be located in current database. Additionally, the size of the region PI_H111_3 which has been described as phage φH111-1 does not match the region size from our study and Lynch et al. description (10.8 kB vs. 43.0 kB) [29,64].
A search for virulence factors was conducted in all regions to examine eventual influence of prophage to host pathogenicity. The results obtained suggest that prophages of Burkholderia preferentially carry drug resistance mechanisms over other categories of virulence factors. This finding is consistent to those of Summer et al. and Ronning et al. [21,65].
The resistance of BCC to β-lactams is well-known, however there are no reports on these genes being located on the prophage region [62]. We now show that a class A β-lactamase encoding gene was identified in 842_chr2_1 region. MFS transporter genes were the most commonly observed in BCC genomes analyzed which is probably related to host drug resistance. MFS have a very broad substrate spectrum, thus their influence on the host virulence could not be precisely indicated. DeShazer examined the influence of MFC genes deletions in B. mallei prophage phi1026b extensively, however no correlation to the host virulence was observed [66].
Three of the identified genes-fic (J2315_chr2_1), hicA and hicB (DDS 22E-1_chr1_2), encode components of toxin-antitoxin systems. Fic is a representative of incomplete Fic/Doc system, whereas hicA and hicB form a complete hicAB cassette. The HicAB cassette has been extensively described and classified as a type II toxin-antitoxin system [67]. HicA serves as a ribosome-independent mRNA interferase, which can cleave specific mRNAs and tmRNAs. HicB function as antitoxin and is composed of two domains, a DNA-recognition domain (C-terminal) and an RNase H fold with no catalytic residues (N-terminal). The HicAB cassette was previously identified as a part of bacteria (including Burkholderia species) or prophage genomes [56,68]. What is more, the HicA protein, found in Burkholderia pseudomallei, was proven to play a role in formation of the persister cells tolerant to ciprofloxacin and ceftazidime [69]. There are no reports of locating a HicAB cassette directly in a Burkholderia prophage.
The virulence of B. cenocepacia strains HI2424, AU1054 and J2315 was tested in the nematode Caenorhabditis elegans model by Cooper et al. 2009 [69]. It was showed that two closely related B. cenocepacia isolates of the PHDC clonal lineage, HI2424 and AU1054, exhibited significantly different virulence, with HI2424 being more lethal and toxic to the nematodes. When considered in relation to our study, it could be inferred that strains possessing more prophage regions are more virulent/toxic [70].
The results obtained from the CRISPR system search did not show any direct correlation between the host and its prophages. The number of phages identified and their completeness differed between hosts possessing CRISPR system. The 65% of the CRISPR systems found were located on the second chromosome, which was the least favored prophage integration place (based on the prophage prevalence and distribution). If the strain carried CRISPR on the second chromosome there were no prophages present or only incomplete ones. The results indicating the presence of CRISPR differed from the study of Bodilis et al. 2018 [64]. While the absence of CRISPR in H111 may be explained by the incomplete sequence of H111, the lack of the anti-phage system in J2315 is more difficult to interpret. The most probable reason is the use of a more recent version of the CRISPRfinder which is more effective in searching for these systems.
The closest neighborhood analysis showed that the biggest fraction of repetitive sequences found next to the prophage regions was tRNA-Arg. The tRNAs are known to be the most common integration sites of prophages, including BCC hosts. Previously described tRNA that are known to be valid recombination sites are tRNA-Arg ( [29,65,71,72]. Even though the ABC transporter, LysR family transcriptional regulators and peptidase encoding genes were found in the phage neighborhood less frequently than tRNA-Arg, their continual appearance, may suggest these are also plausible integration sites for viruses (Table S18).
In summary, the prophages identified in the B. cenocepacia genomes analyzed differed in amount, composition and completeness. The biggest proportion of located viruses were marked as incomplete, which may suggest that this host is actively domesticating its prophages. The fact that BCC are environmental bacteria associated with the rhizosphere also suggests that it will favor collection of phage genes, as they may promote the fitness in their habitat. The analyses allowed antiphage systems to be distinguished as well as phage genes, which are potentially responsible for host virulence (drug resistance, pathogenicity). The results obtained offer profound insights into the composition of each phage genome, which are future targets for research, and will enable the selection of appropriate methods and build a solid foundation for further experimental work in relatively cost-efficient and quick manner.