Colony Fingerprint-Based Discrimination of Staphylococcus species with Machine Learning Approaches

Detection and discrimination of bacteria are crucial in a wide range of industries, including clinical testing, and food and beverage production. Staphylococcus species cause various diseases, and are frequently detected in clinical specimens and food products. In particular, S. aureus is well known to be the most pathogenic species. Conventional phenotypic and genotypic methods for discrimination of Staphylococcus spp. are time-consuming and labor-intensive. To address this issue, in the present study, we applied a novel discrimination methodology called colony fingerprinting. Colony fingerprinting discriminates bacterial species based on the multivariate analysis of the images of microcolonies (referred to as colony fingerprints) with a size of up to 250 μm in diameter. The colony fingerprints were obtained via a lens-less imaging system. Profiling of the colony fingerprints of five Staphylococcus spp. (S. aureus, S. epidermidis, S. haemolyticus, S. saprophyticus, and S. simulans) revealed that the central regions of the colony fingerprints showed species-specific patterns. We developed 14 discriminative parameters, some of which highlight the features of the central regions, and analyzed them by several machine learning approaches. As a result, artificial neural network (ANN), support vector machine (SVM), and random forest (RF) showed high performance for discrimination of theses bacteria. Bacterial discrimination by colony fingerprinting can be performed within 11 h, on average, and therefore can cut discrimination time in half compared to conventional methods. Moreover, we also successfully demonstrated discrimination of S. aureus in a mixed culture with Pseudomonas aeruginosa. These results suggest that colony fingerprinting is useful for discrimination of Staphylococcus spp.


Introduction
Staphylococcus species are widely distributed in various animals including human and livestock, and are frequently isolated as common pathogens from clinical specimens. The skin, anterior nares, and mucous membranes of human and other animals including mammals and birds contain Staphylococcus spp., but hosts can vary depending on the bacterial species [1]. S. aureus is one of the most pathogenic staphylococci that causes a range of severe infections such as suppurative diseases and toxic-shock The equipment for lens-less imaging was composed of only blue light emitting diode (LED) and a complementary metal-oxide semiconductor (CMOS) image sensor without optical lens. This means that colony fingerprinting can be carried out using a simple, inexpensive, and compact devise. In our previous study [16], it was demonstrated that five microorganisms belonging to different genera (E. coli, S. aureus, Pseudomonas aeruginosa, S. enterica, and Candida albicans) could be discriminated based on the colony fingerprinting. On the contrary, discrimination of closely related bacteria in identical genera was not intensively studied. We only demonstrated that two Staphylococcus spp. (S. aureus and S. epidermidis) were distinguishable with linear discrimination analysis (LDA) of their colony fingerprints. In this study, we expanded the discrimination targets to five Staphylococcus spp., i.e., S. aureus, S. epidermidis, S. haemolyticus, S. saprophyticus, and S. simulans, all of which have been isolated from humans, and can be pathogenic. To precisely discriminate these closely related species, we developed novel discriminative parameters, which were not employed in our previous study [16], as image features of colony fingerprints. Furthermore, discrimination of the target bacteria was attempted using several machine learning approaches. We also discuss the relationship between actual colony morphology and colony fingerprints. The results of this study indicate that colony fingerprinting showed considerable promise as a tool for bacterial discrimination even within the closely related species.

Bacterial Strains
S. aureus ATCC 6538, S. epidermidis ATCC 14990, and P. aeruginosa ATCC 9027 were obtained from American type culture collection (ATCC, Manassas, VA, USA). S. haemolyticus NBRC 109768, S. saprophyticus subsp. saprophyticus NBRC 102446, and S. simulans NBRC 109714 were obtained from National Institute of Technology and Evaluation (NITE), Biotechnological Resource Center (NBRC, Kisarazu, Chiba, Japan). Staphylococcus spp. were cultivated in lysogeny broth (LB) medium (5 mL) with shacking at 37 • C overnight prior to colony formation testing. The cell concentration was determined by cell counting using a microscope and hemocytometer.

Lens-Less Imaging System
Setup of the lens-less imaging system and the observation methods employed in this study were described in our previous study [16]. The lens-less imaging system consists of a CMOS image sensor (2048 × 1536 pixels, pixel size: 3.2 µm, imaging area: 6.55 × 4.92 mm 2 , DFK61BUC02, The Imaging Source Europe GmbH, Bremen, Germany) and a blue light-emitting diode (LED, λ = 465 nm) located 10 cm above the sensor. Bacterial cells were spread on LB-agar embedded in a microchamber composed of a glass slide, cover grass, and spacer seals (9 × 9 mm 2 , thickness 300 µm each). The microchamber without a cover glass was fulfilled with 1.5% (w/v) agarose-containing LB medium in the liquid state. At this point, a sticky side of the spacer seal was covered by a release film. A cover glass was mounted to make the surface of LB-agar flat. After 20 min, LB-agar was solidified, and then the cover glass and the release film covering the spacer seal were carefully removed. Bacterial suspension (3.2 × 10 5 cells/mL, 1 µL) was dropped on the LB-agar. A cover glass was mounted on the LB-agar again. The cover glass and spacer seal were tightly agglutinant to prevent water evaporation. The microchamber containing bacterial cells was placed on the CMOS image sensor. The entire system (CMOS sensor, microchanber and LED) were kept in an incubator at 37 • C [16]. Images were automatically captured every 5 min (exposure time: 1/18 s) under the control of the IC Capture 2.2 software (The Imaging Source Europe GmbH, Bremen, Germany).

Imaging Processing
The image analysis described below was performed using ImageJ [24] and MATLAB (The MathWorks, Inc., Natick, Massachusetts, MA, USA). First, the contrast of original lens-less images was enhanced by remapping the data values to fill the entire intensity range (0, 255) using the auto-adjusting function for intensity values in MATLAB. Subsequently, each pixel value is subtracted from the maximal pixel value, and the difference is used as the pixel value in the output images; i.e., black-and-white balance is inverted. Then, the images were binarized based on the Otsu's thresholding method [25], by which the pixels were represented in 10 gray levels, and dichotomized into two classes (i.e., background and colony regions) with a threshold at the maximum level. After the binarization, fill-up processing was executed to determine colony regions. Finally, discriminative parameters, which are described in detail below, were calculated from the contrast-adjusted and inverted lens-less images.

Discrimination Analysis
In this study, 14 parameters (i.e., growth rate, µ max ; histogram deviation, G; average intensity, I; half central intensity, I 1/2 ; quarter central intensity, I 1/4 ; dounutness, D; central dounutness, D c ; entropy, H; energy, En; energy density, Ed; weighted center difference, W; roundness, R; Zernike moment, Z; and solidity, S) were extracted from the colony images captured with the lens-less imaging system, which are referred to as "colony fingerprints" with a size of approximately 250 µm. Individual parameters were explained in Table S1. The parameters were computed using MATLAB software.
Discrimination analysis was performed using R 3.1.2 (R Foundation for Statistical Computing, Vienna, Austria) [26]. LDA was operated using the function lda. k-nearest neighbor algorithm (k-NN), naive Bayes classifier (NB), artificial neural network (ANN), support vector machine (SVM), and random forest (RF) were operated using the caret package; methods knn, nb, nnet, svmRadial, and rf, respectively. For k-NN, the optimal value of k was 11. Leave-one-out cross-validation was employed to assess the generalizable discrimination accuracy of the models.

Colony Fingerprints of Staphylococcus spp.
Using the lens-less imaging system [16], time-lapse images of colony formation of five Staphylococcus spp. were obtained. (Figure 1, see also Supplementary Information Figure S1). It should be noted that, in our previous study [16], we already confirmed that colony fingerprints were reproducible even when different operators did the experiments using different culture lots. It was observed that all of the five species generated round-shaped colony fingerprints. In general, a region with high intensity (shown as a central white dot) were surrounded by a region with low intensity (shown in black) at the initial stages of the colony formations (e.g., typically Figure 1a,b,d, at 8 h). The regions with low intensity expanded outward during the growth of colonies, while a white ring was observed on the inner side of the outer edge (e.g., typically Figure 1a-e, at 16 h). Characteristic white dots at the central area were also observed (e.g., typically Figure 1a,b,d,e, at 16 h).
Although the general features described above were similar among the five species, distributions of intensity in the colony regions were likely to be species-specific. To quantitatively assess this difference, we selected the colony fingerprints of each species at the size of approximately 250 µm in diameter, and plotted the intensity profiles across the colony fingerprints ( Figure 2a). The intensities of the five species at the outer edge region (B and B' in Figure 2b) were relatively similar. A clear difference appeared at the region with the half diameter of the entire colony region, referred to as half colony region in this study (C-C' in Figure 2b). Within it, the most central regions with quarter diameter of the entire colony region, referred to as quarter colony region (D in Figure 2b), showed the most variability.

Discriminative Parameters for Colony Fingerprinting of Staphylococcus spp.
In the previous study [16], we discriminated S. aureus and S. epidermidis by LDA with seven discriminative parameters: Maximum specific growth rate, colony appearance time, relative intensity, histogram deviation, donutness, entropy, and energy density. Among these, colony appearance time and relative intensity could not generate output automatically using the MATLAB software, but required labor-intensive calculations using ImageJ due to the complicated definitions. Our goal was to develop a simple system to discriminate bacteria. Therefore, in the present study, we decided to eliminate these two parameters for discrimination.
We attempted to discriminate 25 colony fingerprints of five Staphylococcus spp. (125 colony fingerprints in total) by LDA with the remaining five parameters, i.e., maximal growth rate (µ max ), histogram deviation (G), donutness (D), entropy (H), and energy density (Ed). These parameters were extracted from the colony fingerprints with a size of approximately 250 µm as described above. It took 11 ± 2 h for the colony fingerprints to reach this size. As a result, discrimination accuracy was 74.4% (Table 1, see also Supplementary Information Table S2). We considered that developing additional discriminative parameters that represent the species-specific features of colony fingerprints was required in order to improve the discrimination accuracy.
Next, we computed the additional nine parameters, i.e., central dounutness (D c ), energy (En), average intensity (I), half central intensity (I 1/2 ), quarter central intensity (I 1/4 ), weighted center difference (W), roundness (R), Zernike moment (Z), and solidity (S), all of which can be output automatically using our MATLAB program. Distribution of 14 parameters extracted from the 25 colony fingerprints of five Staphylococcus spp. was shown in the Supplementary Information Figure S2. In particular, D, D c , I 1/2 , and I 1/4 strongly highlighted the species-specific features of colony fingerprints because they are determined based on the intensities at the half and quarter colony regions where the intensity much varied species by species (Figure 2b). Subsequently, we examined the 14 parameters extracted from the colony fingerprints of five Staphylococcus spp. by analysis of variance (ANOVA ,  Table S3). Ten parameters including D, D c , I 1/2 , and I 1/4 showed statistically significant differences (p < 0.05) among the five Staphylococcus spp., while Z, S, W, and R did not. This result would be reasonable since typical colony fingerprints of Staphylococcus spp. are radially symmetric without distinct convexity. Although these four parameters might contribute less to discrimination, compared to others, we employed all of the 14 parameters for the following analysis.
Principal component analysis (PCA) was first carried out with the 14 parameters extracted from the colony fingerprints of five Staphylococcus spp. PCA revealed that S. aureus were relatively separated from four other species (Figure 3). Then, discrimination of the five Staphylococcus spp. was performed by LDA with the 14 discriminative parameters. As a result, discrimination accuracy reached 79.2% (Table 1, see also Supplementary Information Table S4). This result indicates that discrimination accuracy was improved by employing additional parameters, but it did not reach 100%. Therefore, we decided to consider other machine learning approaches for better discrimination accuracy. In the previous study [16], we discriminated S. aureus and S. epidermidis by LDA with seven discriminative parameters: Maximum specific growth rate, colony appearance time, relative intensity, histogram deviation, donutness, entropy, and energy density. Among these, colony appearance time and relative intensity could not generate output automatically using the MATLAB software, but required labor-intensive calculations using ImageJ due to the complicated definitions. Our goal was to develop a simple system to discriminate bacteria. Therefore, in the present study, we decided to eliminate these two parameters for discrimination.
We attempted to discriminate 25 colony fingerprints of five Staphylococcus spp. (125 colony fingerprints in total) by LDA with the remaining five parameters, i.e., maximal growth rate (μmax), histogram deviation (G), donutness (D), entropy (H), and energy density (Ed). These parameters were extracted from the colony fingerprints with a size of approximately 250 μm as described above. It took 11 ± 2 h for the colony fingerprints to reach this size. As a result, discrimination accuracy was 74.4% (Table 1, see also Supplementary Information Table S2). We considered that developing additional discriminative parameters that represent the species-specific features of colony fingerprints was required in order to improve the discrimination accuracy.
Next, we computed the additional nine parameters, i.e., central dounutness (Dc), energy (En), average intensity (I), half central intensity (I1/2), quarter central intensity (I1/4), weighted center difference (W), roundness (R), Zernike moment (Z), and solidity (S), all of which can be output automatically using our MATLAB program. Distribution of 14 parameters extracted from the 25 colony fingerprints of five Staphylococcus spp. was shown in the Supplementary Information Figure  S2. In particular, D, Dc, I1/2, and I1/4 strongly highlighted the species-specific features of colony fingerprints because they are determined based on the intensities at the half and quarter colony regions where the intensity much varied species by species (Figure 2b). Subsequently, we examined the 14 parameters extracted from the colony fingerprints of five Staphylococcus spp. by analysis of variance (ANOVA , Table S3). Ten parameters including D, Dc, I1/2, and I1/4 showed statistically significant differences (p < 0.05) among the five Staphylococcus spp., while Z, S, W, and R did not. This result would be reasonable since typical colony fingerprints of Staphylococcus spp. are radially symmetric without distinct convexity. Although these four parameters might contribute less to discrimination, compared to others, we employed all of the 14 parameters for the following analysis.
Principal component analysis (PCA) was first carried out with the 14 parameters extracted from the colony fingerprints of five Staphylococcus spp. PCA revealed that S. aureus were relatively separated from four other species (Figure 3). Then, discrimination of the five Staphylococcus spp. was performed by LDA with the 14 discriminative parameters. As a result, discrimination accuracy reached 79.2% (Table 1, see also Supplementary Information Table S4). This result indicates that discrimination accuracy was improved by employing additional parameters, but it did not reach 100%. Therefore, we decided to consider other machine learning approaches for better discrimination accuracy.

Comparison of Machine Learning Approaches for Discrimination
As machine learning approaches for colony fingerprinting, we further tested five classifiers, k-NN, NB, ANN, SVM, and RF. With the 14 parameters, discrimination accuracy by k-NN, NB, ANN, SVM, and RF were 80.8%, 83.2%, 99.2%, 98.4%, and 100.0%, respectively (Table 1, see also Supplementary  Information Tables S5-S9). When we compared the positive predictive values (PPVs), which reflect the probability that an instance classified into a particular species was actually that species, those of S. aureus were equal to or even higher than those of other species with any classifiers, and were 100%, except for the analysis by LDA. This suggests that accurate discrimination of S. aureus was relatively easy compared to other species.

Colony Fingerprints of S. aureus in the Presence of Another Bacterium
In the experiments described above, each bacterial species was solely cultivated and visualized. However, actual bacterial contamination can be caused by multiple species. To date, it remained elusive whether co-existence of multiple bacterial species affects the discrimination accuracy of colony fingerprinting. To address this issue, we acquired another 23 colony fingerprints of S. aureus in the presence of P. aeruginosa (Figure 4). According to our previous study [16], S. aureus and P. aeruginosa were easily distinguishable. We used the 125 colony fingerprints of the solely cultivated five Staphylococcus spp., which were obtained above, as training data, and discriminated those of S. aureus, which co-existed with P. aeruginosa, as test data by ANN, SVM, and RF. Using SVM, 22 colony fingerprints were correctly classified as S. aureus, but one was recognized as S. epidermidis. ANN and RF classified all of the 23 colony fingerprints as S. aureus. These results suggest that discrimination of Staphylococcus spp. with high accuracy based on the colony fingerprinting was successfully demonstrated using ANN, SVM, and RF as classifiers even in the presence of P. aeruginosa.

Comparison of Machine Learning Approaches for Discrimination
As machine learning approaches for colony fingerprinting, we further tested five classifiers, k-NN, NB, ANN, SVM, and RF. With the 14 parameters, discrimination accuracy by k-NN, NB, ANN, SVM, and RF were 80.8%, 83.2%, 99.2%, 98.4%, and 100.0%, respectively (Table 1, see also  Supplementary Information Tables S5-S9). When we compared the positive predictive values (PPVs), which reflect the probability that an instance classified into a particular species was actually that species, those of S. aureus were equal to or even higher than those of other species with any classifiers, and were 100%, except for the analysis by LDA. This suggests that accurate discrimination of S. aureus was relatively easy compared to other species.

Colony Fingerprints of S. aureus in the Presence of Another Bacterium
In the experiments described above, each bacterial species was solely cultivated and visualized. However, actual bacterial contamination can be caused by multiple species. To date, it remained elusive whether co-existence of multiple bacterial species affects the discrimination accuracy of colony fingerprinting. To address this issue, we acquired another 23 colony fingerprints of S. aureus in the presence of P. aeruginosa (Figure 4). According to our previous study [16], S. aureus and P. aeruginosa were easily distinguishable. We used the 125 colony fingerprints of the solely cultivated five Staphylococcus spp., which were obtained above, as training data, and discriminated those of S. aureus, which co-existed with P. aeruginosa, as test data by ANN, SVM, and RF. Using SVM, 22 colony fingerprints were correctly classified as S. aureus, but one was recognized as S. epidermidis. ANN and RF classified all of the 23 colony fingerprints as S. aureus. These results suggest that discrimination of Staphylococcus spp. with high accuracy based on the colony fingerprinting was successfully demonstrated using ANN, SVM, and RF as classifiers even in the presence of P. aeruginosa

Discussion
In the present study, we examined the performance of a colony fingerprinting approach [16] for discrimination of five closely related Staphylococcus spp. by various machine learning approaches. According to Bergey's Manual of Systematic Bacteriology (2nd ed.) [1], all of the five species analyzed in this study form smooth, glistening, and raised colonies with entire margin on conventional agar plates. Therefore, we assumed that the morphologies of the colonies growing in the microchambers composed of glass slides [16] could also be similar-resulting in the generation of similar colony fingerprints. The lens-less imaging system allowed us to discriminate these similar colony fingerprints ( Figure S1) with a size of up to 250 µm, which could not be discriminated with the bare eye due to the small size. This resulted in rapid discrimination within 11 h on average. We have not evaluated how accurate the colony fingerprinting is if colony fingerprints smaller than 250 µm are used. If smaller colony fingerprints enable accurate discrimination, assay time can be shorter. In addition, the colony fingerprints can be affected by the density populations; colonies in the high colony density area tend to be small in size. This could be caused by the limitation of the nutrition supply from LB-agar medium. In the present study, we did not analyze the colonies located at a too dense area where the colonies merged with other ones. Classification of the merging colonies is the next issue to be solved in the future study.
The lens-less imaging system employed in this study was composed of only blue LED and a CMOS image sensor without any optical lens and actuators. The lens-less imaging systems composed of CMOS image sensor are recently utilized for various cell analyses, in particular for mammal cells. For instance, Bishara et al. analyzed red blood cells with or without malaria parasite infection using a CMOS image sensor and a fiber-optic array [27]. Vercruysse et al. demonstrated the label-free discrimination of leukonocytes (lymphocytes, monocytes and granulocytes) using a CMOS image sensor combined with a microfluidic channel [28]. In these studies, the lens-less imaging systems were developed with super-resolution technologies including image re-construction techniques, because the main observation objects were mammal cells with the size of several micrometers. As compared to these imaging systems, our system was much simpler, and did not require super-resolution for the purpose of bacterial discrimination.
For the discrimination, 14 parameters were extracted from the colony fingerprints (Table S1). Since a clear difference in intensity appeared at the half-and quarter-colony regions (Figure 2), we employed some parameters (e.g., D, D c , I 1/2 , and I 1/4 ), which reflected such differences. Indeed, ANOVA revealed that such parameters exhibited high F values (i.e., the high variations between groups and/or low variations in groups). It was expected that these parameters could strongly contribute to discrimination of the five Staphylococcus spp. The quarter-colony regions of the colony fingerprints almost correspond to the central white dot (Figure 2). The high intensity at the white dot suggests that the light passing through the colony was condensed on the photodiode of the CMOS sensor at this region. As mentioned, the Staphylococcus spp. form raised elevated colonies with smooth surfaces [1]. These facts suggest that the colonies might behave like a hemisphere lens. This notion is supported by the studies reporting that bacterial colonies can act as biological microlenses in spite of incomplete transparency [29,30]. Therefore, we assumed that differences of the parameter values between species could be attributable to the difference in colony morphology, which can directly correlate to the light focusing property. The 3D morphology of the staphylococcal microcolonies in the microchamber remained elusive, and it will be analyzed in the near future for example by meads of confocal microscopy [31].
PCA analysis with the 14 parameters of the five Staphylococcus spp. revealed another interesting point that S. aureus were separated from the other four species (Figure 3). Accordingly, PPVs of S. aureus tend to be higher than those of other 4 species (Table 1). S. aureus is the most clinically important species among Staphylococcus spp. due to the high pathogenicity. The relationship between the high pathogenicity of S. aureus and its cell surface property has been well studied [32]. Compared to other Staphylococcus spp., S. aureus displays various types of host matrix-binding proteins that enable adherence to the matrix-composing proteins including collagen, laminin, elastin, and fibronectin, as well as fibrinogen. This adhesive property leads, in part, to high pathogenicity, and thus distinguishes S. aureus from other Staphylococcus spp. In addition, it is also well known that S. aureus secretes proteins that other Staphylococcus spp. do not produce, e.g., coagulase and enterotoxin [1]. Such extracellular components could affect the intercellular adhesion and special arrangement of the cells in the colonies. These could result in variation of the light passing properties of the colonies, leading to generation of the S. aureus-specific colony fingerprints.
Nowadays, it is easy to utilize various types of machine learning approaches with open source platforms for statistical analysis like R [26]. We employed k-NN, NB, ANN, SVM, and RF in this study, and found that ANN, SVM, and RF showed high performance. Recently, various bacterial discrimination methods assisted by machine learning approaches were proposed, e.g., light scattering pattern (referred to as BARDOT) [11][12][13][14][15], Raman spectroscopy [33], and MALDI-TOF-MS [34]. In these studies, SVM was the most employed due to the high performance for discrimination as well as ease of implementation with the aid of sophisticated software packages. ANN is a machine learning approach inspired by biological neural networks in the brain. RF is an ensemble method in which multiple decision trees are developed, and final classification is the class with the most votes. The high performance of RF is attributable to high number of decision trees, although discrimination performance of each decision tree is not necessarily high. RF was also employed for bacterial discrimination based on MALDI-TOF-MS [34,35]. These machine learning approaches (ANN, SVM, and RF) can easily deal with nonlinearly separable data, which LDA cannot handle. It remains arguable which ANN, SVM, and RF are the best classifier for colony fingerprinting. The comparison with much larger data set may bring an end to this argument.
In the present study, we demonstrated that these three machine learning approaches were useful for colony fingerprinting. These results suggest that colony fingerprinting has great promise for feasible bacterial tests.

Conclusions
Discrimination of five species of Staphylococcus spp. with colony fingerprinting was successfully demonstrated. We developed novel discriminative parameters that reflect species-specific features of colony fingerprints. As machine learning approaches to analyze these parameters, ANN, SVM and RF showed high performance compared to k-NN, NB, and LDA. By analyzing 14 parameters with RF, discrimination accuracy reached 100%. Furthermore, it was possible to correctly discriminate S. aureus, even in the presence of P. aeruginosa. This suggests that colony fingerprinting could be useful to analyze contaminated samples with multiple species. It was possible to obtain the colony fingerprints within 11 h on average using the compact and inexpensive equipment. Therefore, colony fingerprinting is a rapid and cost-effective method for detection and discrimination of Staphylococcus spp.
Supplementary Materials: The following are available online at http://www.mdpi.com/1424-8220/18/9/2789/s1, Table S1: Discriminative parameters used in this study. Figure S1: The colony fingerprints of five Staphylococcus spp. acquired in this study. Table S2: Confusion matrix from LDA classification of 125 colony fingerprints based on five parameters. Table S3: ANOVA for the parameters. Figure S2: Distribution of the values of the discriminative parameters extracted from the colony fingerprints of 5 Staphylococcus spp.