Proteotyping of Campylobacter jejuni by MALDI-TOF MS and Strain Solution Version 2 Software

Identification of microorganisms by MALDI-TOF MS has become a popular method over the past 20 years. Strain Solution ver. 2 software, used together with MALDI-TOF MS, enables accurate discrimination of serotypes and strains beyond the genus and species level by using a database of theoretical masses. In this study, we constructed a theoretical mass database with validated biomarkers to proteotype Campylobacter jejuni. Using 10 Campylobacter spp. strains from culture collections and 41 C. jejuni strains isolated from humans and foods, we selected the ribosomal protein subunits L36, L32, S14, L24, L23, L7/L12, and S11 as effective biomarkers for proteotyping C. jejuni by MALDI-TOF MS. An accurate database of their theoretical masses was constructed by matching the corresponding gene sequences to the observed mass peaks. Using this database and Strain Solution ver. 2 software, we attempted to automatically classify the 41 strains isolated from nature, and 38 strains (93%) were correctly classified into the intended groups based on the theoretical mass values. Thus, the seven biomarkers identified in this study, combined with Strain Solution ver. 2, are promising for the proteotyping of C. jejuni by MALDI-TOF MS.


Introduction
Campylobacter spp. are microaerophilic, non-spore-forming, Gram-negative helical rods, of which 33 species have been reported [1]. Campylobacter jejuni and Campylobacter coli have been the most common causative agents of Campylobacter infections in humans for about 100 years. Campylobacter jejuni/coli infections, commonly associated with gastroenteritis and diarrhea, are still increasing in many industrialized countries [2]. In severe cases, C. jejuni infections are thought to cause rare complications, including neuropathies such as Guillain-Barré syndrome and Fisher syndrome. C. jejuni/coli is widely distributed in the intestinal tracts of livestock, poultry, pets, and wild animals, and bacteria thought to originate from these animals have been isolated from rivers, lakes, and even sewage [3]. Chickens are the most significant source of C. jejuni/coli infection in developed countries, and the contamination rate of chicken is much higher than that of other livestock [4]. To reduce food poisoning caused by chicken, it is important to monitor and control C. jejuni/coli during poultry processing [5].
A variety of diagnostic tools are therefore available for identifying Campylobacter at the genus and species levels and for further classification. These include biochemical analysis based on traditional culturing [6], genetic approaches such as multiplex PCR [7,8], pulsed-field gel electrophoresis (PFGE) [9], multilocus sequence typing (MLST) of seven housekeeping genes [10,11], and immunochemical approaches. For the past 30 years, two serotyping methods (Penner and Lior typing) have been used in epidemiological studies of C. jejuni and C. coli [12,13]. Campylobacter spp. lack lipopolysaccharide in the outer membrane and instead express lipooligosaccharide and capsular polysaccharide. Penner serotyping is thought to primarily reflect differences in the capsular polysaccharide [14,15]. Penner and Lior serotyping is useful for tracing the source of contamination, but agglutination tests using antisera are complicated and time-consuming and require skilled operators. More recently, whole-genome sequencing (WGS) has been applied clinically as a robust alternative to conventional typing methods such as PFGE and MLST [16,17]. Using genetic information from WGS analysis, Cody et al. identified a core genome of 1343 loci in C. jejuni and proposed core genome MLST (cgMLST), which enables high-resolution analysis of C. jejuni.
However, for protecting human health and ensuring food safety, a key requirement for the diagnosis and prevention of foodborne illness has been a rapid and accurate method for discriminating foodborne pathogens at the strain or serovar level. In recent years, microbial classification using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has rapidly gained popularity in clinical settings and the food industry [18]. This method yields results easily and can analyze multiple samples in a short time, enabling simple and rapid identification and typing of microorganisms. To overcome the limitations of conventional serotyping of Campylobacter spp., several research groups have developed MALDI-TOF MS-based identification of Campylobacter spp. [19][20][21][22]. Mandrell et al. [19] reported that six Campylobacter species (C. coli, C. jejuni, C. lari, C. sputorum, C. helveticus, and C. upsaliensis) could be distinguished using the biomarkers ribosomal protein L7/L12, DNA-binding protein HU/HCj, ribosomal protein S13, chaperonin GroES, and unknown proteins of 9651, 12,786, and 9796 Da. In addition, Zautner et al. reported that MALDI-TOF MS-based grouping of C. jejuni reflected phenotypic aspects of MLST-defined clusters [23]. More recently, MALDI-TOF MS combined with machine learning enabled sensitive and specific prediction of the C. jejuni sequence type (ST) [24].
The S10-spc-alpha operon gene-encoded ribosomal protein mass spectrum (S10-GERMS) method is a MALDI-TOF MS approach to microbial identification and discrimination that uses a database of theoretical masses of serotype- or species-specific biomarkers [25][26][27]. By combining this method with the proteotyping software Strain Solution ver. 2, developed in collaboration between Meijo University and Shimadzu Corporation, we demonstrated advanced classification of foodborne bacteria such as Escherichia coli, Listeria monocytogenes, and Salmonella enterica [27][28][29].
In this study, we constructed a theoretical mass-based database allowing the proteotyping of Campylobacter spp. and demonstrated proteotyping of 41 isolates of C. jejuni using the Strain Solution ver. 2 software.

Bacterial Strains
The Campylobacter spp. strains used in this study are summarized in Table 1. Strains obtained from public culture collections, the American Type Culture Collection (ATCC) and the Japan Collection of Microorganisms (JCM), were used to construct the theoretical MS database. Forty-one strains (Nos. 11 to 51 in Table 1) isolated from food and humans and preserved at the Osaka Institute of Public Health, Japan, were used to evaluate the constructed database. The MLST sequence type (ST) was determined by PCR and Sanger sequencing with the DNA primers listed in Table S1, which were designed according to the open-access PubMLST website (http://pubmlst.org/campylobacter/info/primers.shtml, accessed on 14 December 2016) [30]. The Penner serotype of C. jejuni subsp. jejuni was determined using antisera commercially available from Denka (Tokyo, Japan) [31]. Penner genetic typing was conducted according to the modified capsule PCR multiplex typing system developed by Konno et al. [32], whose original scheme was described by Poly et al. [33].
Table 1 notes: A superscript T indicates a type strain. ND, not determined. * Penner genetic type determined by the modified capsule PCR multiplex typing system. ** The ST of strain ATCC700819 may be a new type because it did not exactly match any database-registered type; 6 of its 7 genes matched those of ST 43 or ST 3210.
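The note on ATCC700819 (6 of 7 loci matching a registered ST) reflects how MLST assignment works: the allele numbers at seven housekeeping loci form a profile, and a strain receives an ST only when that profile matches a registered one exactly. The Python sketch below illustrates this comparison under stated assumptions. The locus names are those of the C. jejuni MLST scheme, but the allele numbers and the labels ST-A and ST-B are hypothetical placeholders, not real PubMLST profiles, and the functions are illustrative rather than part of any PubMLST tool.

```python
# Hypothetical sketch: MLST sequence-type lookup by allele-profile matching.
# Locus names follow the C. jejuni MLST scheme; all allele numbers below are
# placeholders, not real PubMLST data.
LOCI = ("aspA", "glnA", "gltA", "glyA", "pgm", "tkt", "uncA")


def matching_loci(query, profile):
    """Count loci at which the query and a reference profile share an allele."""
    return sum(query[locus] == profile[locus] for locus in LOCI)


def assign_st(query, st_profiles):
    """Return (exact ST or None, [(ST, n_matching_loci), ...] best first)."""
    ranked = sorted(
        ((st, matching_loci(query, prof)) for st, prof in st_profiles.items()),
        key=lambda item: item[1],
        reverse=True,  # sorted() stays stable, so ties keep insertion order
    )
    exact = next((st for st, n in ranked if n == len(LOCI)), None)
    return exact, ranked


# Hypothetical registered profiles and a query that matches neither exactly.
ST_PROFILES = {
    "ST-A": dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 7))),
    "ST-B": dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 8))),
}
query = dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 9)))
exact, ranked = assign_st(query, ST_PROFILES)
print(exact, ranked)  # None [('ST-A', 6), ('ST-B', 6)]
```

Returning the full ranking, rather than only an exact hit, makes a near-miss such as "6 of 7 loci match two registered STs" visible instead of silently reporting no type.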

Construction of the Mass Database
Theoretical mass values of the ribosomal proteins encoded in the S10-spc-alpha operon were calculated from gene sequences determined by DNA sequencing with the primers listed in Table S2. The DNA sequences determined in this study and selected as working biomarkers for MALDI-TOF MS were deposited in the DNA Data Bank of Japan (DDBJ) and are available in the DDBJ/EMBL/GenBank databases under accession numbers LC723726-LC723765.
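As a rough illustration of how a theoretical mass can be derived from a sequenced gene, the Python sketch below translates a coding sequence with the standard genetic code, applies the commonly used rule that the initiator methionine is removed when the second residue is small (G, A, S, P, V, T, or C), and sums average residue masses to give an average [M+H]+ value suited to linear-mode MALDI-TOF MS. The residue-mass table, the Met-cleavage rule, and the optional methylation shift (+14.027 Da, corresponding to the +14 methylated S11 peaks reported in this study) are assumptions of this sketch; it is not the calculation implemented in Strain Solution ver. 2, and the example sequence is a toy, not a Campylobacter gene.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Average residue masses in Da (assumed values from standard tables).
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528    # average mass of H2O for the free N- and C-termini
PROTON = 1.00728    # charge-carrying proton for [M+H]+
METHYL = 14.02658   # average mass change for one methylation (+CH2)
SMALL_PENULTIMATE = set("GASPVTC")


def translate(cds):
    """Translate a CDS up to the first stop codon; the start codon is read as Met."""
    cds = cds.upper()
    residues = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    if residues:
        residues[0] = "M"  # GTG/TTG start codons also initiate with Met
    return "".join(residues)


def mature(protein):
    """Remove the initiator Met if the next residue is small (assumed rule)."""
    if len(protein) > 1 and protein[0] == "M" and protein[1] in SMALL_PENULTIMATE:
        return protein[1:]
    return protein


def average_mh(protein, methylations=0):
    """Average [M+H]+ of a protein, optionally with N methyl groups added."""
    return (sum(AVG_RESIDUE[aa] for aa in protein)
            + WATER + PROTON + methylations * METHYL)


toy = mature(translate("ATGGCAAAATAA"))  # toy CDS: Met-Ala-Lys-stop
print(toy, round(average_mh(toy), 2))    # AK 218.28
```

Average rather than monoisotopic masses are used because ribosomal proteins of 4 to 17 kDa measured in linear mode are not isotopically resolved.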

MALDI-TOF MS Analysis
All strains were cultured on 14EG medium plates (https://www.jcm.riken.jp/cgibin/jcm/jcm_grmd?GRMD=14, accessed on 10 December 2016), as recommended by JCM, under microaerobic conditions (6-12% O2, 5-8% CO2) at 37 °C for two nights using Aneropack (Sugiyamagen, Tokyo, Japan). Bacterial colonies grown on the agar plates were picked with disposable loops and mixed with 10 µL of matrix solution (25 mg/mL sinapinic acid [Fujifilm Wako Pure Chemicals, Osaka, Japan], 0.6% (v/v) trifluoroacetic acid, and 50% (v/v) acetonitrile). Aliquots (1.2 µL) of the mixtures were spotted onto the wells of a metal sample plate that had been precoated with 0.5 µL of saturated sinapinic acid in ethanol and dried. After air-drying, the samples were analyzed in positive linear mode with the AXIMA MALDI-TOF MS system for microorganisms (Shimadzu Corporation, Kyoto, Japan) and the SARAMIS database v 3.5 (VITEK MS, bioMérieux, Marcy l'Etoile, France), which uses a fingerprinting approach. For more precise analysis with Strain Solution ver. 2, MS data classified as C. jejuni by the SARAMIS database were calibrated using m/z peaks common to C. jejuni, namely m/z 4365.42 (ribosomal protein L36), 6826.31 (S14), 7034.49 (L29), 10,323.07 (S19), 11,673.73 (S10), and 16,376.32 (L16). The resulting mass values and intensities were then scanned with Strain Solution ver. 2 software at a tolerance of 800 ppm in automatic peak selection mode.
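The internal calibration step can be sketched as follows, assuming a simple linear correction in m/z space. Observed peaks within the 800 ppm window of each listed calibrant are paired with the reference values; for example, 800 ppm of m/z 4365.42 is about ±3.49, and 800 ppm of m/z 16,376.32 is about ±13.10. A least-squares line then maps observed to reference m/z and is applied to all peaks. This is a hypothetical illustration in Python: instrument software typically calibrates in flight-time space with its own model, and the function names are not part of the AXIMA, SARAMIS, or Strain Solution software.

```python
# Reference calibrant m/z values common to C. jejuni (from the Methods text).
CALIBRANTS = {
    "L36": 4365.42, "S14": 6826.31, "L29": 7034.49,
    "S19": 10323.07, "S10": 11673.73, "L16": 16376.32,
}


def tolerance(mz, ppm):
    """Half-width of a ppm tolerance window, in m/z units."""
    return mz * ppm * 1e-6


def calibrant_pairs(observed, ppm=800.0):
    """Pair each calibrant with the closest observed peak inside its window."""
    pairs = []
    for ref in CALIBRANTS.values():
        hits = [mz for mz in observed if abs(mz - ref) <= tolerance(ref, ppm)]
        if hits:
            pairs.append((min(hits, key=lambda mz: abs(mz - ref)), ref))
    return pairs


def fit_line(pairs):
    """Ordinary least-squares fit of reference = slope * observed + intercept."""
    if len(pairs) < 2:
        raise ValueError("at least two calibrant peaks are required")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def recalibrate(observed, ppm=800.0):
    """Apply the fitted linear correction to every observed peak."""
    slope, intercept = fit_line(calibrant_pairs(observed, ppm))
    return [slope * mz + intercept for mz in observed]


# Simulated spectrum: every peak shifted by +300 ppm (hypothetical data).
shifted = [ref * (1 + 300e-6) for ref in CALIBRANTS.values()] + [9000.0 * (1 + 300e-6)]
corrected = recalibrate(shifted)
print(round(corrected[-1], 3))  # 9000.0
```

Spreading the calibrants from about m/z 4400 to 16,400 lets the fitted line correct both the offset and the mass-dependent drift across the range where the biomarkers lie.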

Construction of the Theoretical MS Database
Using 10 Campylobacter spp. strains, we found that all of the ribosomal proteins encoded in the S10-spc-alpha operon showed sequence diversity according to species or serotype (Table 2; see the registered DNA sequences described in Section 2.2). Among these, we observed mass peaks derived from 15 ribosomal proteins: subunits S10, L23, L22, L16, L29, S17, L14, L24, S14, L18, L15, L36, and S11 in the S10-spc-alpha operon, plus the additional biomarker candidates L32 and L7/L12. However, S10, L22, L16, L29, S17, L14, L18, L15, and S13 were not suitable as biomarkers because their peaks overlapped with mass peaks of unknown proteins (data not shown).
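The exclusion criterion described here, rejecting ribosomal proteins whose peaks coincide with peaks of unknown proteins, can be expressed as a tolerance check. The Python sketch below keeps a candidate only if no unassigned peak falls within the ppm window of any of its theoretical masses. It also assumes, as an extra criterion not stated in the text, that a useful biomarker should take at least two distinct masses across the reference strains. The protein names and all masses in the example are hypothetical, and the function is illustrative, not the procedure actually used in this study.

```python
def overlaps(theoretical, peak, ppm):
    """True if an observed peak lies within the ppm window of a theoretical mass."""
    return abs(peak - theoretical) <= theoretical * ppm * 1e-6


def select_biomarkers(candidates, unknown_peaks, ppm=800.0):
    """Return candidate names that vary across strains and avoid unknown peaks.

    candidates: {protein name: [theoretical m/z for each reference strain]}
    unknown_peaks: observed m/z values not assigned to any ribosomal protein
    """
    selected = []
    for name, masses in candidates.items():
        varies = len({round(m, 1) for m in masses}) >= 2  # assumed criterion
        clean = not any(overlaps(m, p, ppm) for m in masses for p in unknown_peaks)
        if varies and clean:
            selected.append(name)
    return selected


# Hypothetical example data (not measured values).
candidates = {
    "protein_A": [4365.4, 4379.4],    # variable, no interfering peak
    "protein_B": [12000.0, 12030.0],  # variable, but an unknown peak overlaps
    "protein_C": [7034.5, 7034.5],    # identical in all strains
}
unknown = [12005.0]
print(select_biomarkers(candidates, unknown))  # ['protein_A']
```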
Based on these results, we selected the seven biomarkers shown in Table 3. Except in C. fetus subsp. fetus, S11 peaks were observed at the methylated (+14) mass values. No S11 peak was detected for C. jejuni ATCC700819, although this strain carried a corresponding gene whose methylated theoretical mass was the same as that of other C. jejuni subsp. jejuni strains. When the isolates from foods or humans were analyzed by MALDI-TOF MS, some peaks showed characteristic mass values that differed from those of the culture collection strains. We therefore analyzed the gene sequences corresponding to these peaks and found that some isolates had unique theoretical mass values not present in the culture collection strains. Five strains (C15_93, C15_94, C15_97, C15_113, and C14_188) were selected as representatives of their respective groups and added to the database. The final theoretical mass-based database included these five strains.
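Choosing representatives for new groups amounts to grouping isolates by their theoretical mass profile across the seven biomarkers and keeping one isolate per profile that is not already in the database. A minimal Python sketch of that bookkeeping follows; the rounding to 0.1 Da used to compare profiles, the function names, and all mass values in the example are hypothetical.

```python
BIOMARKERS = ("L36", "L32", "S14", "L24", "L23", "L7/L12", "S11")


def profile_key(masses, ndigits=1):
    """Tuple of theoretical m/z values in a fixed biomarker order."""
    return tuple(round(masses[b], ndigits) for b in BIOMARKERS)


def new_representatives(isolates, database):
    """Return the first isolate of each profile absent from the database."""
    known = {profile_key(m) for m in database.values()}
    representatives = {}
    for name, masses in isolates.items():
        key = profile_key(masses)
        if key not in known and key not in representatives:
            representatives[key] = name
    return list(representatives.values())


# Hypothetical mass profiles (placeholder values, not Table 3 data).
reference = dict(zip(BIOMARKERS, (4365.4, 5300.0, 6826.3, 8000.0, 10900.0, 12600.0, 14100.0)))
variant = dict(reference, L24=8014.0)  # one biomarker shifted by a substitution
isolates = {"iso1": dict(reference), "iso2": variant, "iso3": dict(variant)}
print(new_representatives(isolates, {"ref_strain": reference}))  # ['iso2']
```

Keying on the whole seven-mass tuple means two isolates count as the same group only if every biomarker agrees, which matches how a single database entry is later scored.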

Strain Solution Ver. 2 Analysis
First, all Campylobacter spp. strains used in this study were identified at the species level by the SARAMIS fingerprinting method. The MALDI-TOF MS data (m/z and intensity) of each strain were then scanned against the database of seven biomarkers registered in Strain Solution ver. 2 software (Table 3). The biomarker hit scores and output results are summarized in Table 4. Of the 41 strains isolated from food or humans, 38 showed hits for six or seven biomarkers and were correctly proteotyped into the intended groups according to the theoretical mass values. Three strains (C15-92, C15-135, and C15-141) had a hit score of 7 (100% match) for the correct group but also matched another group with the same score. The correct classification rate was therefore 93% (38/41). To further evaluate the results, a phylogenetic tree was drawn in Strain Solution ver. 2 software based on all detected biomarker peaks (Figure 1). Comparing Penner serotype, ST, and clonal complex (CC) with this proteotyping tree showed that most strains of CC-22, CC-21, and CC-45 were separated from one another, and strains of the same CC were assigned to the same clusters. For Penner serotypes (Penner genetic typing), G (HS: 8/17) and B (HS: 2) were clearly distinguished from the others.
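The hit score can be approximated as the number of registered biomarkers for which an observed peak falls within the 800 ppm tolerance of a group's theoretical m/z. The Python sketch below scores every group this way and returns all groups that share the top score, which makes ties like those of C15-92, C15-135, and C15-141 explicit. It is a hypothetical illustration, not Strain Solution ver. 2's actual scoring algorithm. The group names and masses are placeholders, and the example shows only one way a tie can arise (two groups whose masses differ by less than the tolerance window), not the cause of the ties observed in this study.

```python
def hit_score(peaks, group_masses, ppm=800.0):
    """Number of biomarkers whose theoretical m/z has an observed peak in range."""
    return sum(
        any(abs(p - m) <= m * ppm * 1e-6 for p in peaks)
        for m in group_masses.values()
    )


def classify(peaks, database, ppm=800.0):
    """Return (groups sharing the best score, best score)."""
    scores = {group: hit_score(peaks, masses, ppm) for group, masses in database.items()}
    best = max(scores.values())
    return sorted(g for g, s in scores.items() if s == best), best


# Hypothetical three-biomarker groups.
group_a = {"L36": 4365.4, "S14": 6826.3, "L24": 8000.0}
group_b = {"L36": 4365.4, "S14": 6826.3, "L24": 8030.0}  # resolvable from A
group_c = {"L36": 4365.4, "S14": 6826.3, "L24": 8004.0}  # within 800 ppm of A
peaks = [4365.9, 6826.0, 8000.5]

print(classify(peaks, {"A": group_a, "B": group_b}))  # (['A'], 3)
print(classify(peaks, {"A": group_a, "C": group_c}))  # (['A', 'C'], 3)
```

Returning every top-scoring group, instead of picking one arbitrarily, is what allows a strain to be flagged as ambiguous rather than counted as correctly classified.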

Discussion
In this study, we constructed a theoretical mass-based database for the proteotyping of Campylobacter species by MALDI-TOF MS and Strain Solution ver. 2 software. The selected biomarker peaks were all derived from ribosomal protein subunits, and our previously reported proteotyping approach allowed us to construct a theoretical m/z value database that can be widely shared (Table 3). Although useful biomarkers for identifying Campylobacter by MALDI-TOF MS have been extensively investigated by Mandrell et al. [19], the biomarkers other than ribosomal protein L7/L12 are reported here for the first time. We were likely able to find new biomarkers because we selected peaks that were detected stably even when they had low intensity or masses greater than m/z 13,000. The 41 strains isolated from foods and humans were used to evaluate the detection of the seven selected biomarker peaks and the validity of this database; in general, six or more biomarkers matched the database, allowing classification into the intended groups based on the theoretical mass values (Table 4). For the three strains that matched both the correct group and another group with the same score of 7, adding a new biomarker to the database could enable accurate identification. However, because we have not yet found any stably detectable biomarkers other than those in Table 3, the sample preparation method may need to be improved, as examined for Salmonella enterica [34].
Several attempts to correlate MLST STs with MALDI-TOF MS results have been reported, but the large number and complexity of MLST types make it difficult to correlate all of them [24,35]. In this study, we determined Penner serotypes (including genetic types), STs, and CCs and compared them with the MALDI-TOF MS proteotyping results. Cluster analysis of the MALDI-TOF MS results using the phylogenetic tree obtained in Strain Solution ver. 2 showed that the tested C. jejuni strains were divided into two main clusters: one consisting mainly of CC-22 and CC-45 strains, and the other of CC-21 and other CCs such as CC-354 and CC-443. Each main cluster was further divided into two sub-clusters (Figure 1), giving four distinct clusters, labeled I to IV in Figure 1. Cluster I was further subdivided into five clusters and cluster II into three. Regarding Penner serotypes, O (HS: 19) and Z6 (HS: 55) tended to belong to cluster I; P (HS: 21), D (HS: 4A), R (HS: 23/36), and others to cluster II; and G (HS: 8/17), Y (HS: 37), and R (HS: 53) to cluster III. Cluster IV, which consisted mainly of CC-21, was mainly Penner type B (HS: 2). The major C. jejuni lineages isolated from humans are CC-21 and CC-45, but CC-21 is more resistant to environmental stresses such as high temperature and freezing and has acquired resistance to various drugs more frequently [36,37]. The clear separation of CC-21 and CC-45 in the present results therefore suggests that such differences may be reflected in MALDI-TOF MS peak patterns, as also proposed in a previous report [35]. Strain Solution ver. 2 analysis may thus enable lineage-level classification, as described previously for Listeria monocytogenes [28].
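The cluster structure discussed here comes from a tree built on biomarker peak patterns. One common way to build such a tree is UPGMA (average-linkage) clustering of pairwise distances. The Python sketch below applies it to each strain's set of detected biomarker peaks, using Jaccard distance. Both UPGMA and Jaccard distance are assumptions for illustration, since the tree-building method inside Strain Solution ver. 2 is not described here, and the strain names and peak labels are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of detected peak labels."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def upgma(profiles):
    """Average-linkage (UPGMA) clustering; returns a nested-tuple tree."""
    leaf_dist = {
        frozenset(pair): jaccard_distance(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(profiles, 2)
    }

    def cluster_distance(c1, c2):
        # Mean of all leaf-to-leaf distances between two clusters.
        total = sum(leaf_dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    # Each cluster is (subtree, list of leaf names).
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical peak profiles labelled "biomarker:m/z".
profiles = {
    "s1": {"L36:4365", "L24:8000"},
    "s2": {"L36:4365", "L24:8000"},
    "s3": {"L36:4365", "L24:8030"},
    "s4": {"L36:4365", "L24:8030", "S11:9000"},
}
print(upgma(profiles))  # (('s1', 's2'), ('s3', 's4'))
```

Labelling peaks by biomarker and mass, rather than comparing raw floating-point m/z values, assumes each peak has already been assigned within the matching tolerance.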
In addition, we have found CC-specific biomarker candidate peaks by MALDI-TOF MS analysis as follows:
With further validation and accumulation of biomarker analyses, more detailed classification may become possible. The theoretical mass-based database, which is useful for proteotyping C. jejuni as well as for identifying Campylobacter spp. by MALDI-TOF MS, is expected to be broadly used in clinical and food-related investigations. Rapid MALDI-TOF MS analysis allows decisions to be made before CC or Penner serotype results are available, which can contribute to Campylobacter risk management.
WGS has developed rapidly with the spread of next-generation sequencing technology. The growth of web-accessible WGS databases will also make it possible to construct databases of genetically derived theoretical mass-based biomarkers, enabling the proteotyping of bacterial isolates at the species, strain, and serovar levels. The principle of using genetically derived theoretical mass-based biomarkers is also applicable to MALDI-TOF MS analysis combined with machine learning and will pave the way for further advances.

Patents
Patent WO2017168741 resulted from the work reported in this manuscript.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/microorganisms11010202/s1, Table S1: DNA primers used for MLST; Table S2: DNA primers used for sequencing of biomarkers.
Funding: This work was financially supported by the Aichi Science and Technology Foundation (Japan) as a part of the "Technological Development Project for Food Safety and Security" under Knowledge Hub Aichi.

Data Availability Statement:
The datasets presented in this study can be found in this article, its Supplementary Materials, and the DDBJ/EMBL/GenBank databases. Further information and requests for resources and reagents should be directed to, and will be fulfilled by, the corresponding authors.