Proteomic Characterization of the Oral Pathogen Filifactor alocis Reveals Key Inter-Protein Interactions of Its RTX Toxin: FtxA

Filifactor alocis is a Gram-positive, asaccharolytic, obligately anaerobic rod that has been isolated from a variety of oral infections, including periodontitis, peri-implantitis, and odontogenic abscesses. As F. alocis is a newly emerging pathogen, its type strain has been investigated for pathogenic properties, yet little is known about variations in virulence among strains. We previously screened the whole genomes of nine clinical oral isolates and a reference strain of F. alocis and found that some of them encode a novel RTX toxin, FtxA. In the present study, we used label-free quantitative proteomics to characterize the full proteomes of these ten F. alocis strains. A total of 872 proteins were quantified, 97 of which were differentially expressed in FtxA-positive compared with FtxA-negative strains. In addition, 44 of these differentially expressed proteins formed 66 pairs of associations based on their predicted functions, including clusters of proteins related to DNA repair/DNA-mediated transformation and to catalytic activity, indicating different biosynthetic activities among strains. FtxA was predicted to interact specifically with six other intracellular proteins, forming a functional cluster that could discriminate between FtxA-producing and non-producing strains. Among them were FtxB and FtxD, predicted to be encoded by the same operon as FtxA. While revealing the broader qualitative and quantitative proteomic landscape of F. alocis, this study also sheds light on the functional inter-relationships of FtxA, placing this RTX family member in context as a major virulence factor of this species.


Introduction
Filifactor alocis is a Gram-positive, asaccharolytic, obligate anaerobe of the phylum Firmicutes that has recently been identified as a member of the oral microbiome with potential involvement in oral disease [1][2][3][4][5][6][7]. Its purported virulence mechanisms include the ability to manipulate neutrophils [8][9][10][11][12] and macrophages [13]. By whole-genome sequencing, we discovered that 60% of the F. alocis strains examined encode a novel member of the repeats-in-toxins (RTX) protein family, which we designated FtxA [14]. Several of the virulence factors and/or putative virulence-related proteins of F. alocis, including FtxA, are contained within extracellular vesicles released by this organism [15,16]. Beyond the variation revealed by whole-genome sequencing, differences in the presence and expression levels of proteins among F. alocis strains can be expected. Hence, the early genomic characterization of F. alocis needs to be complemented by comprehensive quantitative proteomics, which will enable a deeper characterization of intraspecies protein differences and the identification of relevant virulence-associated motifs. Earlier proteomic work identified protein differences (e.g., in cell-wall anchoring proteins) between two F. alocis strains, which may reflect variations in virulence between them, as evidenced by their differential effects on host cells [17]. Using a quantitative shotgun proteomics platform and an in-house proteomics database, the present study aimed to define the full proteomes of ten F. alocis strains (nine clinical isolates and one reference strain) to identify differences in their core proteomes and protein-protein interaction patterns. Furthermore, specific focus was placed on the functional interactions of FtxA with other proteins as well as on differences in FtxA abundance between strains.

Proteome Profiles of Ten F. alocis Strains
To analyze differences in the full proteomes of F. alocis strains, a total of 872 proteins were identified and quantified from the reference strain ATCC 35896 and nine F. alocis clinical isolates (n = 3 for each strain) (Table S1) based on our Progenesis QI-Scaffold inclusion criteria. Of these, 744–802 proteins were identified in each strain (Table 1). Protein abundances and the correlations between strains are visualized as heatmaps in Figure 1. Unsupervised clustering of the heatmap data showed that the biological replicates of each strain (e.g., 10E-17U_1, 10E-17U_2, and 10E-17U_3 were triplicates of strain 10E-17U) clustered together, separately from the other strains, with the exception of 624B-08U_1, indicating that almost all strains were distinct at the proteomic level. The clustering also showed that, of the nine clinical isolates, 624B-08U had the protein profile closest to that of the reference strain ATCC 35896, while the profiles of strains 6B-17U and 413B3-17U were the most distinct. We previously identified FtxA, a putative member of the large RTX toxin family, whose distribution among strains is consistent with their phylogenetic relationships based on multilocus sequence typing (MLST) analysis [14]. In line with this finding, neither 6B-17U nor 413B3-17U expresses FtxA.

Differential Expression of Proteins between FtxA-Positive and -Negative Strains and Their Predicted Functional Protein Association Networks
Thirty-four proteins were differentially expressed (|log2FC| > 1 and p < 0.05) in F. alocis strains with ftxA-positive genotypes compared with ftxA-negative strains (Tables 2 and S2). In addition, 41 proteins were identified exclusively in ftxA-positive strains, and seven exclusively in ftxA-negative strains (Tables 3 and S2). Furthermore, 12 proteins were at least twice as abundant in one condition as in the other, but no p-value could be calculated for them because they were identified in only one sample of the lower-abundance condition (Tables 4 and S2). To further understand their functions, the inter-relationships of all differentially expressed proteins were investigated using STRING. In total, 66 pairs of functional associations with a combined confidence score > 0.4 were retrieved among 44 differentially expressed proteins, including FtxA (ADW16141.1) (Figure 2 and Table S3). Although most of these associations involved only two or three proteins, probably owing to the limited information available on F. alocis, three groups involved multiple proteins and formed protein network clusters (Figure 2). The largest cluster includes five proteins with at least part of their peptide sequence embedded in the hydrophobic region of the membrane (i.e., the Gene Ontology term integral component of membrane, GO:0016021). Nevertheless, their annotated functions are quite diverse, including DNA repair (EFE28003.1), DNA-mediated transformation (EFE28216.1), type II secretion system (EFE28506.2), and pilin domain protein (EFE28505.1). The second-largest cluster, also comprising five proteins, consisted mainly of proteins with catalytic activity, except for EFE28863.1, which is an ABC transporter. The smallest of the three clusters was a group of enzymes including dehydrogenases and aminotransferases.
[Figure 1 caption, partial: The ftxA genotypes (+ or −) [14] and different strains are color coded.]
[Figure 2 caption, partial: (see Table S3). The colors of the lines illustrate different types of interactions: blue and purple lines indicate interactions based on curated databases and experimental results, respectively, while green, red, dark blue, yellow, and black lines indicate predicted interactions determined from gene neighborhood, gene fusions, gene co-occurrence, text mining, and co-expression, respectively.]

Predicted Functional Protein Association Network for FtxA
Additional protein interaction analysis was centered on FtxA, the novel RTX family member of F. alocis. The STRING protein-protein interaction analysis predicted that six proteins interact with FtxA (ADW16141.1), together forming a putative "functional FtxA cluster" of seven proteins (Figure 3A and Table S4). This cluster included three proteins from the ftx gene cluster itself, FtxA (ADW16141.1), FtxB (EFE27661.1), and FtxD (EFE27662.1), as well as four other essentially uncharacterized proteins. Four of these seven proteins, including FtxA, were identified and quantified in this work (Figure 3B). The identification of FtxA (ADW16141.1) was consistent with the ftxA genotypes determined in our previous work [14]. Of the remaining three identified and quantified proteins, one (ADW16149.1) was annotated as a "repeat protein" and contained a copper amine oxidase N-terminal domain with a divergent InlB B-repeat domain. Apart from FtxA, this protein displayed interactions with only two of the seven cluster proteins (Figure 3A). The uncharacterized protein EFE27658.1 is encoded directly upstream of ftxA, whereas another uncharacterized protein, EFE27629.1, was predicted to interact with FtxA mainly on the basis of automated text mining and other STRING annotations (Table S4). However, there is no clear functional overlap between these proteins based on their predicted functions (Table 5). Of note, these predicted proteins are still at an early stage of annotation; as a result, none of them has an assigned function in the KEGG database, and no BRITE terms have been generated for them.
[Figure 3 caption, partial: (A) See Table S4. Line colors indicate interaction types as in Figure 2; the four identified proteins are circled. (B) Heatmap of the abundances of the identified proteins, displayed as arcsinh-transformed normalized abundance plus one; normalized abundance values of "NA" are shown in black. Rows are clustered on the four identified proteins, and columns on all identified proteins (as in Figure 1).]

Discussion
In this study, we analyzed the full proteomes of ten F. alocis strains, which yielded a total of 872 proteins, the majority of which were identified in all strains. For instance, 802 proteins were identified in strain ATCC 35896, 755 in strain 845G-16U, and 762 in strain 117A-17U. Compared with the reference strain ATCC 35896, which is so far the best characterized, strain 624B-08U showed the most similar proteomic profile, whereas strains 6B-17U and 413B3-17U were the most distant. This agrees with the phylogenetic relationships among the ten F. alocis strains revealed by multilocus sequence typing analysis of eight genes [14]. Of note, these two strains were isolated from different infection sources: the former was a constituent of dental biofilm at a periodontitis site, whereas the latter was isolated from an acute necrotizing ulcerative gingivitis (ANUG) site. Hence, the specialized ecological niche of the infection could account for qualitative and quantitative proteomic variations among clinical isolates. We observed that strains expressing FtxA, a putative member of the large RTX toxin family [14], shared more virulence characteristics regardless of their infectious origin, possibly associated with activities on host immunity [18]. Clustering of bacterial strains according to the expression levels of RTX toxins has been observed for other species, including Escherichia coli [19,20] and the periodontal pathogen Aggregatibacter actinomycetemcomitans [21,22]. Clustering has also been observed based on other toxins, such as for Salmonella enterica serovar Typhimurium strains expressing cytolethal distending toxin (CDT) and other serovar Typhi-related genes [23].
Ninety-seven proteins were differentially regulated in FtxA-positive compared with FtxA-negative strains. While differences in protein abundance may account for variations in functional and biosynthetic activities between strains, they only suggest, rather than demonstrate, differences in virulence characteristics. Indeed, two strains that were both isolated from infected root canals differed considerably in their protein abundances, and only one of them expressed FtxA. We also attempted to evaluate the functions of all differentially expressed proteins using enrichment analysis against known databases, but this yielded no significant results (data not shown). Although the genome of F. alocis has been annotated, most of its proteins have only been analyzed computationally (i.e., they are unreviewed entries), and their automatically annotated functions were sometimes insufficient for accurate enrichment analysis. Alternatively, in the protein association network, the largest cluster constitutes a superfamily of integral membrane proteins that mediate ATP-powered translocation of many substrates across membranes, either for import or export [24]. Clusters of proteins with catalytic activity and of various dehydrogenases and aminotransferases were also found by STRING. ABC transporters have been shown to contribute to antibiotic resistance in many bacterial species [25], in agreement with the few ABC transporters we identified in this work. Hence, our analyses suggest that a key variation among the different F. alocis strains might lie in their antibiotic resistance capabilities, which, however, needs to be validated in further studies.
Finally, we considered the associations between the expression of FtxA and that of other proteins associated with it. The abundance-based presence or absence of FtxA was consistent with the ftxA genotypes determined in our previous work [14], which is a good indicator that the protein-inclusion criteria applied in this work were reliable. The STRING protein-protein interaction analysis predicted that six other proteins interact with FtxA and, hence, tentatively constitute a cluster of proteins that may be functionally associated with cytoplasmic FtxA; three of these (in addition to FtxA) were identified and quantified in the present work. The ftxABD operon encodes four predicted products [14]: the hypothetical protein EFE27658.1, FtxA, FtxB, and FtxD; the last two were not identified in the current study. Whether the hypothetical protein EFE27658.1, encoded directly upstream of ftxA, has any role in FtxA post-translational modification, intracellular trafficking, and/or secretion is not known. In addition, it displays no apparent similarity to proteins such as HlyC or TolC, which are commonly encoded in and/or associated with RTX toxin-encoding gene clusters [14,19]. The other two proteins were predicted to interact with FtxA on the basis of the proximity of their genes on the chromosome (i.e., chromosome neighborhood) according to the STRING neighborhood prediction algorithm, as well as other annotations. However, these two proteins are not encoded in an operon with FtxA. The FtxA-associated "repeat protein" ADW16149.1 appeared to be present in all strains, including the strains lacking ftxA; it has no apparent sequence similarity to FtxA and is not encoded close to the ftx gene cluster. Interestingly, however, this FtxA-associated protein appears to be an InlB B-repeat-containing protein, which may link it to host cell invasion [26]. This remains to be tested experimentally.
Because their predicted functions currently show no clear overlap and these proteins are still at an early stage of discovery, they warrant deeper exploration.
In conclusion, the present study showed that F. alocis has a broad "core proteome", while there are also quantitative variations in the expression of select proteins between strains. The functional pathways associated with the most or least abundantly expressed proteins were related to ribosomal and mitochondrial activity as well as protein biosynthesis and transport. Owing to the early stage of bioinformatic annotation of the identified proteins, it is difficult to assign them deeper roles in the metabolic functions of this species, let alone in the virulence-specific characteristics of individual strains. Nevertheless, the global proteomic analysis of F. alocis performed in this study justifies a deeper characterization of its recently discovered RTX toxin, FtxA. Indeed, our analysis revealed that a functional cluster of specific predicted protein-protein interactions can discriminate between FtxA-producing and non-producing strains. The identities, functions, and interactions of these protein groups need to be further investigated to reveal whether they comprise a pathogenicity island within F. alocis that could regulate the virulence of this species.

Bacterial Protein Extraction
The F. alocis strains were cultivated for three days under the conditions described above before being suspended in PBS. The F. alocis suspensions were then adjusted to an OD600 of approximately 1.0. Then, 0.5 mL of each strain suspension was reduced, alkylated, digested with trypsin, and purified using the PreOmics iST kit (PreOmics GmbH) according to the manufacturer's protocol for protein extraction and digestion. The extracts were concentrated using a SpeedVac (Thermo Savant SPD121P, Thermo Scientific, Waltham, MA, USA) and stored at −20 °C until further use.

LC-MS/MS Analysis
The bacterial extracts were first reconstituted in 30 µL of 3% acetonitrile (ACN) in 0.1% formic acid and then normalized to 1 mg/mL based on the protein concentration estimated using a NanoDrop One system (Thermo Fisher Scientific, Madison, WI, USA). One microgram of each sample was then analyzed, in randomized order, on an Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA) interfaced with an Easy nano-flow HPLC system (Thermo Fisher Scientific). A pool of all samples was inserted around the middle of the sequence to serve as the reference for label-free quantification. The liquid chromatography buffers A and B consisted of 0.1% formic acid in water and 0.1% formic acid in acetonitrile, respectively. The samples were loaded onto an Acclaim PepMap 100 trap column (Thermo Scientific; 75 µm × 2 cm, packed with C18 material, 3 µm, 100 Å) and separated on an analytical EASY-Spray column (Thermo Scientific; 75 µm × 500 mm) packed with reverse-phase C18 material (PepMap RSLC, 2 µm, 100 Å). Peptides were eluted over 110 min at a flow rate of 300 nL/min using the following gradient: 0–2 min, 2% buffer B; 95 min, 25% B; 100 min, 35% B; 105–110 min, 95% B.
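The gradient above lists only setpoints. A minimal sketch, assuming the pump ramps linearly between consecutive setpoints (the text does not state the ramp shape), interpolates the % buffer B at any time in the 110 min run; the names `GRADIENT` and `percent_b` are hypothetical, not part of any instrument software.

```python
# Setpoints from the LC protocol: (time in min, % buffer B).
# Assumption: the pump ramps linearly between consecutive setpoints.
GRADIENT = [
    (0.0, 2.0),
    (2.0, 2.0),
    (95.0, 25.0),
    (100.0, 35.0),
    (105.0, 95.0),
    (110.0, 95.0),
]


def percent_b(t, gradient=GRADIENT):
    """Return % buffer B at time t (min) by linear interpolation."""
    start, end = gradient[0][0], gradient[-1][0]
    if not start <= t <= end:
        raise ValueError(f"t = {t} min lies outside the {start}-{end} min run")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t <= t1:
            # Move linearly from b0 to b1 across this segment.
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    raise RuntimeError("gradient setpoints must be sorted by time")


if __name__ == "__main__":
    for t in (1, 48.5, 97.5, 102.5, 108):
        print(f"{t:>6} min -> {percent_b(t):.1f}% B")
```

Under this assumption, halfway through the long 2–95 min ramp (48.5 min) the mobile phase would contain 13.5% B, which shows how shallow the main separation gradient is compared with the final wash step.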
Survey scans were acquired in the Orbitrap mass analyzer over the range m/z 300–2000 at a resolution of 120,000, with an automatic gain control (AGC) target value of 400,000 and a maximum injection time of 50 ms. Higher-energy collisional dissociation (HCD) spectra were acquired in the linear ion trap mass analyzer using a normalized collision energy of 30%. Precursor ions were isolated in the quadrupole with a 1.6 m/z isolation window. Charge state screening was enabled, and only precursor ions with charge states of 2–7 were included. The signal intensity threshold was 5000. Precursor ion masses already selected for MS/MS acquisition were dynamically excluded for 25 s. For HCD spectra, a maximum injection time of 300 ms, an AGC target value of 2000, and a first mass of 140 were applied.

Label-Free Quantification
Label-free quantification was performed using Progenesis QI for Proteomics (Nonlinear Dynamics) as described previously [29]. In brief, all raw files were aligned to the pooled sample for feature detection, alignment, and quantification. A Mascot generic format (mgf) file of all aligned samples was exported with the top five MS/MS spectra per feature, a minimum fragment ion count of 200, and deisotoping and charge deconvolution, and was searched using Mascot (version 2.4.1, Matrix Science, London, UK) against an in-house database containing 6679 protein sequences. This database included F. alocis proteins (taxon identifiers 143361 and 546269) downloaded from UniProt (https://www.uniprot.org/) (accessed on 19 March 2018) and 260 sequences of known MS contaminants; reversed sequences were used as decoys for estimating the false discovery rate (FDR) [30]. The following search parameters were used: precursor tolerance, ±10 ppm; fragment ion tolerance, ±0.6 Da; enzyme, trypsin; maximum missed cleavages, 2; fixed modification, carbamidomethyl (C); variable modifications, oxidation (M) and acetyl (protein N-term). Spectrum reports of the search results were then generated using Scaffold (version 4.2.1, Proteome Software) with a protein-level FDR (protFDR) threshold of 10%, a minimum of one peptide, and a peptide-level FDR (pepFDR) of 5%, and were imported into Progenesis QI for Proteomics to identify the quantified proteins.
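The decoy strategy rests on a simple idea: hits to reversed sequences should all be false, so their number estimates how many false target hits pass a given score threshold. The sketch below shows that generic target-decoy estimate (decoys divided by targets). The `Match` record and the toy scores are hypothetical, and the estimators actually implemented in Mascot and Scaffold may differ.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """A hypothetical peptide-spectrum match record."""
    score: float    # search-engine score; higher is better
    is_decoy: bool  # True if the hit is to a reversed (decoy) sequence


def decoy_fdr(matches, threshold):
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits)."""
    passing = [m for m in matches if m.score >= threshold]
    decoys = sum(1 for m in passing if m.is_decoy)
    targets = len(passing) - decoys
    if targets == 0:
        # No target hits: FDR is 0 if nothing passes, undefined (inf) otherwise.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def threshold_for_fdr(matches, max_fdr):
    """Lowest observed score threshold whose estimated FDR is <= max_fdr."""
    for threshold in sorted({m.score for m in matches}):
        if decoy_fdr(matches, threshold) <= max_fdr:
            return threshold
    return None


if __name__ == "__main__":
    toy = [Match(95, False), Match(90, False), Match(80, False),
           Match(70, True), Match(60, False), Match(50, True)]
    print(threshold_for_fdr(toy, 0.30))  # 60
    print(threshold_for_fdr(toy, 0.10))  # 80
```

The toy data illustrate why a stricter FDR (10% vs 30%) pushes the score threshold up and discards more identifications.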
To minimize potential errors introduced by aggressive feature matching between samples in Progenesis QI for Proteomics, the raw file of each sample was also searched individually using Mascot against the same database with the following search parameters: precursor tolerance, ±10 ppm; fragment ion tolerance, ±0.6 Da; enzyme, trypsin; maximum missed cleavages, 2; fixed modification, carbamidomethyl (C); variable modifications, oxidation (M) and acetyl (protein N-term). The results of these searches of individual Mascot generic format (mgf) files were combined using Scaffold (version 4.2.1, Proteome Software, Portland, OR, USA) and exported at a protFDR of 3.0%, a minimum of two peptides, and a pepFDR of 1.0%. The Progenesis results were then compared with the Mascot results: a protein quantified by Progenesis was accepted as a true quantification only if it was also identified in the individual mgf search of that sample with a minimum of two unique peptides. The normalized abundances of these accepted proteins were kept for quantification, whereas the abundances of proteins not identified in the individual mgf searches were replaced with "NA". These Mascot-filtered Progenesis results were used to calculate fold changes (FCs) and log2-transformed FCs between the FtxA-positive and FtxA-negative groups. Two-tailed Student's t-tests were performed on the hyperbolic arcsine-transformed abundances, as in Progenesis QI. Proteins with |log2FC| > 1 and p < 0.05 were considered regulated. Benjamini-Hochberg FDR-adjusted p-values were also calculated.
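The filtering and statistics described above can be sketched with a few standard-library functions. This is a hedged illustration under stated assumptions: abundances sit in nested dictionaries keyed by protein and sample (a hypothetical layout), the FC is taken as the ratio of group means (the text does not say which summary was used), and `student_t` returns only the pooled-variance t statistic. A p-value would additionally require the t distribution with n1 + n2 − 2 degrees of freedom, for example via `scipy.stats.ttest_ind`.

```python
import math
from statistics import fmean, variance


def mascot_filter(progenesis, unique_peptides, min_peptides=2):
    """Keep a Progenesis abundance only if the same sample's individual Mascot
    search identified the protein with >= min_peptides unique peptides;
    otherwise store None (the "NA" of the text).

    progenesis:      {protein: {sample: normalized abundance}}
    unique_peptides: {protein: {sample: unique peptides in that sample's mgf}}
    """
    return {
        protein: {
            sample: (abundance
                     if unique_peptides.get(protein, {}).get(sample, 0) >= min_peptides
                     else None)
            for sample, abundance in by_sample.items()
        }
        for protein, by_sample in progenesis.items()
    }


def log2_fold_change(positive, negative):
    """log2(mean positive / mean negative), ignoring NA (None) values.
    Returns None when a group has no usable values."""
    pos = [v for v in positive if v is not None]
    neg = [v for v in negative if v is not None]
    if not pos or not neg or fmean(pos) <= 0 or fmean(neg) <= 0:
        return None
    return math.log2(fmean(pos) / fmean(neg))


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance
    (needs >= 2 values per group and non-zero pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pos, neg = [8.0, 9.0, 7.0], [2.0, 2.5, 1.5]
    print(log2_fold_change(pos, neg))  # 2.0
    # t statistic on arcsinh-transformed values, as described in the text.
    print(student_t([math.asinh(x) for x in pos], [math.asinh(x) for x in neg]))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Keeping FC on the untransformed normalized abundances while testing on arcsinh values mirrors the split described in the text; the arcsinh step mainly compresses large intensities so that a few very abundant features do not dominate the variance.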
Some proteins were identified and quantified only in ftxA-positive or only in ftxA-negative strains (i.e., they were found in only one condition); therefore, no FC or p-value could be calculated for them. Similarly, other proteins displayed large abundance changes (|log2FC| > 1) between ftxA-positive and -negative strains but could not be assigned p-values because they were identified in only one sample of one of the conditions. Nevertheless, proteins with high intensity in one condition that are absent (or present at low abundance) in the other can be biologically relevant. Thus, proteins in both of these circumstances were also defined as regulated proteins.
All three types of regulated proteins, namely, (a) proteins differentially expressed (|log2FC| > 1 and p < 0.05), (b) proteins identified exclusively in one condition, and (c) proteins at least twice as abundant in one condition as in the other (|log2FC| > 1) but without a p-value (because only one sample was identified in one of the conditions), were treated equally in the subsequent functional analysis.
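The three categories can be expressed as a small decision function. This is a sketch under stated assumptions: abundances are strictly positive, NA values are represented as `None`, the FC is the ratio of group means, and the p-value is supplied from elsewhere; the function name `regulated_category` is hypothetical.

```python
import math
from statistics import fmean


def regulated_category(positive, negative, p_value=None, fc_cutoff=1.0, alpha=0.05):
    """Return "a", "b", "c", or None for one protein.

    positive, negative: normalized abundances per sample (assumed > 0);
        None marks "NA".
    p_value: two-tailed t-test p-value, or None if it could not be computed.
    (a) |log2FC| > 1 and p < 0.05
    (b) identified in one condition only
    (c) |log2FC| > 1 but no p-value, because only one sample was identified
        in one of the conditions
    """
    pos = [v for v in positive if v is not None]
    neg = [v for v in negative if v is not None]
    if not pos and not neg:
        return None                      # not identified at all
    if not pos or not neg:
        return "b"                       # exclusive to one condition
    log2fc = math.log2(fmean(pos) / fmean(neg))
    if abs(log2fc) <= fc_cutoff:
        return None                      # change too small
    if p_value is not None:
        return "a" if p_value < alpha else None
    if min(len(pos), len(neg)) == 1:
        return "c"                       # large change, p-value not computable
    return None


if __name__ == "__main__":
    print(regulated_category([8, 8, 8], [1, 1, 1], p_value=0.01))  # a
    print(regulated_category([5, 6, 7], [None, None, None]))       # b
    print(regulated_category([8, 8, 8], [None, 1, None]))          # c
```

Ordering the checks this way means an exclusive protein is labeled (b) before any FC is attempted, which avoids dividing by an empty group.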

Data Clustering and Heat Maps for Regulated Proteins
R (R: A Language and Environment for Statistical Computing, R Development Core Team), in particular the packages quantable (https://cran.r-project.org/web/packages/quantable/index.html) (accessed on 11 September 2019) and pheatmap (https://cran.r-project.org/web/packages/pheatmap/index.html) (accessed on 24 October 2019), was used for unsupervised clustering analysis, calculation of correlations between strains, and generation of heat maps. No apparent outliers were found, and no samples were excluded from this study.
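For readers less familiar with unsupervised clustering of samples, the sketch below shows agglomerative complete-linkage clustering in plain Python. It is not the authors' R code. pheatmap defaults to Euclidean distance with complete linkage; the distance used here, 1 − Pearson r between sample profiles, is an assumption motivated by the strain correlations mentioned above. The sample names are hypothetical, and NA values are assumed to have been removed beforehand.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def complete_linkage(profiles):
    """Agglomerative clustering of samples with distance 1 - Pearson r.

    profiles: {sample name: [abundance per protein]} with NA already removed.
    Returns the merge history as (members of cluster 1, members of cluster 2, height).
    """
    dist = {frozenset(pair): 1 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}

    def linkage(c1, c2):
        # Complete linkage: the largest pairwise distance between two clusters.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((sorted(c1), sorted(c2), linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


if __name__ == "__main__":
    toy = {"A_1": [1, 2, 3, 4], "A_2": [1.1, 2.0, 3.2, 3.9],
           "B_1": [4, 3, 2, 1], "B_2": [3.9, 3.1, 2.0, 1.2]}
    for left, right, height in complete_linkage(toy):
        print(left, right, round(height, 3))
```

With replicate-like toy profiles, the two A replicates and the two B replicates merge first at small heights and only join each other at the end, which is the pattern described for the biological triplicates in Figure 1.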

Functional Analysis for Regulated Proteins
The enrichment analyses were conducted with the STRINGdb package (version 3.1.3) on 20 July 2021, using all quantified proteins as the background. Interaction scores were calculated from experimental evidence as well as predictions based on knowledge gained from other organisms [29], using STRING (https://string-db.org/) (accessed on 1 February 2021). Only interactions with a medium confidence score (> 0.4) are shown in the illustrations. The functions of proteins that contributed to the enriched functions or pathways were searched manually in KEGG (https://www.genome.jp/kegg/) (accessed on 15 September 2021) to retrieve their BRITE hierarchical classifications and Pfam domain annotations.
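One simple way to read clusters off a thresholded association network is to take its connected components. The sketch below does that with an iterative depth-first search. The text does not say how the network clusters were delineated, and STRING also offers k-means and MCL clustering, so this is only one plausible interpretation. The protein identifiers and scores are hypothetical.

```python
from collections import defaultdict


def network_clusters(edges, min_score=0.4):
    """Connected components of a protein association network.

    edges: iterable of (protein_a, protein_b, combined_score) with scores in 0-1.
    Only edges scoring above min_score (medium confidence) are kept; proteins
    without any such edge do not appear in the output.
    """
    graph = defaultdict(set)
    for a, b, score in edges:
        if score > min_score:
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)


if __name__ == "__main__":
    toy_edges = [("P1", "P2", 0.90), ("P2", "P3", 0.55),
                 ("P4", "P5", 0.45), ("P5", "P6", 0.30)]
    for cluster in network_clusters(toy_edges):
        print(sorted(cluster))
```

In the toy input, P6 drops out because its only edge falls below the 0.4 cutoff, which mirrors how many pairwise associations in Figure 2 remain isolated two- or three-protein groups.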

Predicted Interaction for FtxA
The protein-protein interactions predicted for FtxA were determined using STRING (https://string-db.org/) (accessed on 11 August 2021). All seven independent STRING evidence channels, namely chromosome neighborhood, gene fusion, phylogenetic co-occurrence, co-expression, experimentally determined interactions, database annotations, and automated text mining, were used to identify interactions. Interactions with a combined score of medium confidence (> 0.4) were considered.
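To show how evidence from several channels can add up to a combined score above 0.4, the sketch below follows our understanding of STRING's published combination scheme: the prior is removed from each channel score, the channels are combined as independent evidence, and the prior is added back. The prior value of 0.041 is an assumption, the homology correction that STRING applies to some channels is omitted, and the function name `combined_score` is hypothetical; this is an illustration, not STRING's code.

```python
PRIOR = 0.041  # assumed prior probability that two random proteins are linked


def combined_score(channel_scores, prior=PRIOR):
    """Combine per-channel STRING-style scores (each in 0-1) into one score.

    Sketch of the published scheme: remove the prior from each channel,
    combine the channels as independent evidence, then add the prior back.
    Simplified: STRING's homology correction of some channels is omitted.
    """
    prob_none = 1.0  # probability that no channel reflects a true link
    for s in channel_scores:
        s_without_prior = max(s - prior, 0.0) / (1.0 - prior)
        prob_none *= 1.0 - s_without_prior
    combined_without_prior = 1.0 - prob_none
    return combined_without_prior * (1.0 - prior) + prior


if __name__ == "__main__":
    # Two individually weak channels can jointly pass the 0.4 cutoff.
    score = combined_score([0.3, 0.3])
    print(round(score, 3), score > 0.4)
```

Because the channels are treated as independent, a single channel at 0.3 stays below medium confidence, whereas two such channels together exceed 0.4; this is why an interaction supported mainly by text mining plus other annotations can still pass the threshold.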

Image Processing
Microsoft PowerPoint (version 16; Microsoft, Redmond, WA, USA) was used for assembling the figures.

Ethical Considerations
All procedures were conducted following the guidelines of the local ethics committee at the Medical Faculty of Umeå University, which are based on the Declaration of Helsinki (64th WMA General Assembly, Fortaleza, October 2013).

Data Availability
Mass spectrometry data were handled using the local laboratory information management system (LIMS) [31]. The in-house database and mass spectrometry proteomics data were deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD026971. The raw file names and their corresponding sample names are listed in Table S5. The authors declare that all data supporting the findings of this study are available within the article and the Supplementary Materials or upon request from the corresponding author.