Distinguishing Different Cancerous Human Cells by Raman Spectroscopy Based on Discriminant Analysis Methods

An approach for distinguishing eight different types of human cells by Raman spectroscopy is proposed and demonstrated in this paper. Original spectra of suspended cells in the range of 623 to 1783 cm⁻¹ were acquired and pre-processed by baseline calibration, and principal component analysis (PCA) was employed to extract the useful spectral information. To develop a robust discrimination model, linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) were compared. The results showed that the QDA model performed better than the LDA model. The optimal QDA model was generated with 12 principal components, and its classification rate was 100% in both the calibration set and the prediction set. From the experimental results, it is concluded that Raman spectroscopy combined with appropriate discriminant analysis methods has significant potential in human cell detection.


Introduction
Cancer has been one of the main causes of human death in recent years [1]. Early diagnosis of cancer is a prerequisite for patient recovery [2]. However, many organs of the human body may produce cancer cells, so there are many types of cancer cells. The classification of cancer cells is therefore critical for locating the site of cancer incidence. Another major feature of cancer is that it is prone to metastasis [3]. For example, a patient may bleed heavily during tumor resection in the early stage of cancer, so cancerous cells may enter the peripheral blood circulation system and move through the blood vessels as single cells or cell clusters, called circulating tumor cells. Cancer cells can thus easily migrate through the blood system. Therefore, the accurate identification of cancer cells is of great significance for diagnosing the metastasis, diffusion, and recurrence of cancer.
At present, the fluorescent labelling method is mainly used to identify the type of cancer cells because of its specificity. Fluorescent labelling is based on the specific binding of antigen and antibody [4]. The method has a substantial impact on, and can even damage, the original physiological activity of cells, which is not conducive to further analysis and research. It is also prone to false positives in antigen-antibody specific binding [5]. In addition, sample treatment is complex, costly, and inefficient, so the method has many drawbacks in clinical applications. A non-contact technique that can specifically identify cancer cells at the physical level would not only keep the cell activity intact but also reduce the complexity and improve the efficiency of biological sample pre-treatment. Raman spectroscopy is such a technique; it provides inelastic-scattering fingerprint spectra of molecules [6]. It reflects changes in the biochemical components of living cells in aqueous solutions with strong specificity and without any labelling or fixation [7]; as such, Raman spectroscopy has been employed in clinical diagnostics, toxicology tests, and tissue engineering [8,9].
Raman spectroscopy is a fast, accurate, label-free, and non-destructive analytical tool for the detection of human cells at the single-cell level [10,11]. It can reveal differences in the intranuclear genetic material between cancer cells and normal cells, as well as differences in the proteins of the cell membrane and cytoplasm [6,12]. It is known that cellular biochemical components vary between cancer cells from different organs and between cells of different malignancy degrees. These differences are critical for the development of Raman spectroscopy as a new clinical diagnostic approach [10,11,13–15]. The main objective of the present experimental study was to use Raman spectroscopy to investigate the biochemical differences among cancer cells from different organs (SH-SY5Y, HeLa, HO-8910, MDA-MB-231, U87, A549), between cells of distinct malignancy degree (MDA-MB-231 and MCF-7), and between a normal cell line and cancer cells (HEB and U87).
In recent years, Raman spectroscopy combined with discriminant analysis techniques has drawn considerable attention for distinguishing similar biological materials such as tissues, cells, and biological molecules [16–22]. In this work, a rapid approach for distinguishing eight different types of human cells by Raman spectroscopy was studied. To develop an accurate Raman spectroscopic discrimination model, principal component analysis (PCA) was employed to extract useful spectral information, and then two discriminant analysis algorithms, linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), were applied and compared for discriminating the eight different human cells.

Sample Preparation
All human cell samples belonged to eight different cell types, whose names and serial numbers are listed in Table 1. Each cell type was divided into two groups at random: 2/3 of the samples formed the calibration set and 1/3 formed the prediction set. The eight different human cells were cultured in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 1% penicillin-streptomycin and 10% fetal bovine serum (both from Invitrogen, Grand Island, NY, USA) at 37 °C with 5% CO₂ in a humidified atmosphere. Cells at a density of 1 × 10⁶ per 1 mL of medium were cultured in 25 cm² flasks for around 24 h prior to experiments. Figure 1 shows optical images of the morphology of the eight different adherent human cells. Before the Raman spectroscopy measurement, the cells were detached with 0.25% Trypsin-EDTA and then harvested in 3 mL PBS.
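The random 2/3 and 1/3 split within each cell type can be sketched as follows. This is only an illustration in Python (the original analysis was run in Matlab); the function name `split_by_cell_type`, the fixed seed, and the example sample counts are assumptions, not values from the paper.

```python
import random
from collections import defaultdict


def split_by_cell_type(labels, calibration_fraction=2 / 3, seed=0):
    """Split sample indices into calibration and prediction sets.

    The split is done separately inside each cell type, so every type
    contributes about `calibration_fraction` of its samples to calibration.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    by_type = defaultdict(list)
    for index, label in enumerate(labels):
        by_type[label].append(index)

    calibration, prediction = [], []
    for label in sorted(by_type):
        indices = by_type[label][:]
        rng.shuffle(indices)
        n_cal = round(len(indices) * calibration_fraction)
        calibration.extend(indices[:n_cal])
        prediction.extend(indices[n_cal:])
    return sorted(calibration), sorted(prediction)


# Hypothetical example: 9 spectra for each of two cell lines.
labels = ["A549"] * 9 + ["HeLa"] * 9
cal, pred = split_by_cell_type(labels)
print(len(cal), len(pred))
```

Splitting inside each type, rather than over the pooled samples, keeps every cell line represented in both sets.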
Appl. Sci. 2017, 7, 900

Raman Spectroscopy Measurement
A Renishaw inVia Raman spectrometer (controlled by WiRE 3.4 software, Renishaw plc, Wotton-under-Edge, UK) was used to collect the Raman spectra of the eight different human cells. It was connected to a Leica microscope (Leica DMLM, Leica Microsystems, Buffalo Grove, IL, USA) and equipped with a 532 nm laser that was focused through a 50×, NA = 0.75 objective (Leica Microsystems, Buffalo Grove, IL, USA). The system was calibrated in static mode against the standard silicon peak at 520.5 ± 0.1 cm⁻¹. A 20 µL cell suspension was dripped onto a MgF₂ wafer for Raman measurement. Raman spectra ranging from 623 to 1783 cm⁻¹ were collected in static mode with a 10 s laser exposure and 1 accumulation. The laser power was 0.5 mW, and the diffraction grating (BLZ) had 2400 lines/mm. Three replicate measurements at different times were performed for each cell to reduce the measurement error. The humidity and temperature in the laboratory were kept stable. Figure 2a presents the raw Raman spectra of the background and A549 cell samples.
For the Raman spectral pre-processing, Renishaw WiRE 3.4 software (Renishaw plc, Wotton-under-Edge, UK) was used to remove the cosmic rays in the raw spectra. Then, all of the Raman spectra were baseline corrected using the Vancouver Raman algorithm [23]. Smoothing was performed to reduce external noise and enhance the useful information on the biochemical composition. Figure 2b shows representative peaks of the A549 cells after pre-processing. The average spectra of the eight different human cells after pre-processing are presented in Figure 3.
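The text does not name the smoothing filter that was used, so the following is only a generic illustration of noise reduction with a centered moving average. The choice of filter, the function name `moving_average`, the window width, and the toy intensities are all assumptions.

```python
def moving_average(intensities, window=5):
    """Smooth a spectrum with a centered moving average.

    ASSUMPTION: the paper does not state its smoothing method; a moving
    average is used here purely as an example. Near the ends of the
    spectrum the window is truncated to the available points.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(intensities)):
        lo = max(0, i - half)
        hi = min(len(intensities), i + half + 1)
        segment = intensities[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical intensities with a single noise spike in the middle.
spectrum = [1.0, 1.0, 10.0, 1.0, 1.0]
print(moving_average(spectrum, window=3))
```

A wider window suppresses more noise but also broadens and lowers narrow Raman peaks, so the width is a trade-off that would need to be tuned on real spectra.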

Software
All the analysis algorithms were executed in Matlab R2009a (MathWorks, Inc., Natick, MA, USA) under Windows XP.

Principal Component Analysis (PCA)
Each Raman spectrum consisted of an array of 1005 variables that were cross-sensitive to the different biochemical components of the human cells, so the spectra contained overlapping information, which complicated the analysis. Multivariate data analysis can be applied to solve this problem. PCA can extract the main information from the Raman spectra and eliminate some of the overlapping information [7,24,25]. In this work, PCA was first applied to the pre-processed spectra to visualize and extract the useful information from the multivariate spectral data and to examine the qualitative differences among all types of cell samples.
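To illustrate how PCA compresses correlated spectral variables into a few scores, the sketch below extracts the first principal component by power iteration on the covariance matrix. It is an illustration only: the function names and the three-variable toy spectra are hypothetical, whereas the real spectra have 1005 variables and were analysed in Matlab.

```python
import math


def center(rows):
    """Subtract the mean of each variable (column) from every spectrum."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Dominant eigenvector and eigenvalue of cov by power iteration.

    Assumes cov is non-zero and the start vector is not orthogonal to the
    dominant eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return v, eigenvalue


# Hypothetical spectra: three samples, three perfectly correlated variables.
spectra = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0]]
centered = center(spectra)
cov = covariance_matrix(centered)
pc1, variance = first_component(cov)
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
total_variance = sum(cov[i][i] for i in range(len(cov)))
print(f"PC1 explains {100 * variance / total_variance:.2f}% of the variance")
```

In this toy case all variation lies along one direction, so a single score per spectrum carries the full information; real spectra need more components.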
Figure 4 shows the score cluster plot of the eight different human cells on PC1, PC2, and PC3, with samples labelled according to their types. The eight human cell groups show clustering trends along the axes of the top three PCs. PC1 explains 95.52% of the variance, PC2 3.17%, and PC3 0.66%, so the cumulative contribution rate of the top three PCs is 99.35%. The 3-dimensional space spanned by the scores of the top three PCs therefore retains 99.35% of the information in the original spectral data, covering most of its main information.
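The cumulative contribution rate quoted above is simply the running sum of the per-component percentages. A short check in Python, using the three values given in the text (the helper `contribution_rates` and its eigenvalue input are hypothetical additions for illustration):

```python
from itertools import accumulate

# Percentages of variance explained, as reported in the text.
explained = {"PC1": 95.52, "PC2": 3.17, "PC3": 0.66}

cumulative = list(accumulate(explained.values()))
for name, total in zip(explained, cumulative):
    print(f"{name}: cumulative {total:.2f}%")


def contribution_rates(eigenvalues):
    """Convert PCA eigenvalues into percentage contribution rates."""
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]
```

The last printed value is the 99.35% cumulative contribution of the top three PCs.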
It can be observed from Figure 4 that there are some inherent compositional and structural differences among human cell samples even when they belong to the same type. The cell samples could not be classified directly using PCA. The separation of the eight different types of samples was not distinct; in particular, some samples from the HeLa, HO-8910, and A549 groups overlapped. It can be assumed that the biochemical components of the samples, such as proteins, nucleic acids, and glycolipids, are similar among these three groups of human cells.

Geometrical exploration of the PCA score plots reveals clear clustering trends in the 3D space, which is instrumental in discriminating sample types, but PCA itself cannot be used as a classification tool. Therefore, discrimination models were used to classify the samples. Supervised pattern recognition refers to techniques in which a priori knowledge about the category membership of the calibration samples is used for classification. The classification model is calibrated on a training sample set containing the different categories [26,27], and the performance of the calibrated model is evaluated using the prediction (test) set. Two discrimination models were compared.

Comparison of Discrimination Models
Figure 2b shows representative peaks of the A549 cells. Proteins have strong Raman peaks at 1003 cm⁻¹ (phenylalanine), 1449 cm⁻¹ (CH₂ deformation), and 1658 cm⁻¹ (Amide I). The main Raman peaks associated with nucleic acids are at 1176 cm⁻¹ (cytosine, guanine), 1339 cm⁻¹ (G (DNA/RNA)), and 1580 cm⁻¹ (pyrimidine ring of nucleic acids). Lipids have strong Raman peaks at 1304 cm⁻¹ (CH₂ deformation) and 1449 cm⁻¹ (CH₂ bending mode in malignant tissue) [28–30]. The ability of Raman spectra to distinguish cancer cells from different organs and of different malignancy degrees rests on differences in the cellular biochemical components. The basic biochemical components of human cells are consistent, with only a few differences in chemical composition, so the Raman band assignments of the 8 cell lines are similar (Figure 3). Therefore, the identification of the 8 types of cells by Raman spectroscopy must be carried out by means of chemometrics.
Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), two classic discriminant methods, were employed to classify the eight human cells based on the Raman data. LDA and QDA are two of the best-known discriminant analysis approaches and have been used successfully in various fields [30,31]. Both methods calculate the boundaries that separate groups or classes of samples. LDA generates linear boundaries, where a straight line or hyperplane divides the variable space into regions, whereas QDA generates quadratic boundaries, where a quadratic curve or surface divides the variable space into regions. LDA fits a multivariate normal density to each group with a pooled estimate of covariance, so it does not take into account different variance structures among the classes. QDA fits multivariate normal densities with covariance estimates stratified by group. It can therefore discriminate classes that have significantly different class-specific covariance matrices, and it forms a separate variance model for each class. The QDA classifier focuses on finding a transformation of the input features that optimally distinguishes between the different classes in the dataset [32–38].
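The difference between the pooled and the class-specific covariance can be made explicit with the standard Gaussian discriminant functions. The notation is an assumption introduced here, not taken from the paper: $x$ is the vector of PC scores of a sample, $\mu_k$ and $\pi_k$ are the mean and prior probability of class $k$, $\Sigma$ is the pooled covariance matrix, and $\Sigma_k$ is the covariance matrix of class $k$. A sample is assigned to the class with the largest $\delta_k(x)$.

```latex
\begin{align}
\delta_k^{\mathrm{LDA}}(x) &= x^{\top}\Sigma^{-1}\mu_k
  - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k + \ln \pi_k, \\
\delta_k^{\mathrm{QDA}}(x) &= -\tfrac{1}{2}\ln\lvert\Sigma_k\rvert
  - \tfrac{1}{2}\,(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k) + \ln \pi_k .
\end{align}
```

The LDA function is linear in $x$ because the quadratic term $x^{\top}\Sigma^{-1}x$ is shared by all classes and cancels when classes are compared; with class-specific $\Sigma_k$ it no longer cancels, which is what produces the quadratic boundaries of QDA.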

Figure 5 shows the identification results of the LDA and QDA models with 12 PCs. The sample numbers of the eight cell lines for the LDA and QDA models with 12 PCs are shown in Table 2. For the LDA model, the classification rate by cross-validation was 91.21%, and a few samples from several groups were wrongly classified in the prediction set. For the QDA model, the classification rate was 100%, which is better than that of the LDA model. The influence of the number of PCs on the classification results of the LDA and QDA models is presented in Figure 6. As shown in Figure 6, the optimal LDA model was obtained when 15 PCs were used and the optimal QDA model when 12 PCs were used, and QDA consistently gave a relatively high identification rate.
To obtain a discrimination model for human cell types with good performance, LDA and QDA models were compared. The identification results of the two models in the calibration set and the prediction set are shown in Table 3. Compared with the LDA model, the QDA model performed better, which indicates that the quadratic information helped to improve the classification performance in the prediction set. The two models differ in their separators: LDA uses a hyperplane to classify the samples, while QDA uses a more complex hypersurface [39]. Because a quadratic boundary can adapt to classes with different covariance structures, QDA gave a better result than LDA in the prediction set. The quadratic discriminant method is also described as stronger in self-learning and self-adjustment than the linear discriminant method. Therefore, models based on quadratic discriminant analysis often show superior performance.
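A toy example can show why a quadratic separator may succeed where a hyperplane fails. The sketch below fits Gaussian discriminants to two hypothetical 2D classes, one tightly clustered and one widely spread around it, using either the pooled covariance (LDA) or class-specific covariances (QDA). The data, class names, and function names are assumptions for illustration and are unrelated to the measured spectra; rates are computed on the training points themselves.

```python
import math


def mean_vector(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter_matrix(points, mu):
    """Sum of outer products of deviations from mu, as a 2x2 nested tuple."""
    sxx = sum((p[0] - mu[0]) ** 2 for p in points)
    syy = sum((p[1] - mu[1]) ** 2 for p in points)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
    return ((sxx, sxy), (sxy, syy))


def scale(matrix, factor):
    return tuple(tuple(v * factor for v in row) for row in matrix)


def add(a, b):
    return tuple(tuple(x + y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))


def gaussian_score(x, mu, cov, prior):
    """Log of the Gaussian class density times the prior, up to a constant."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    mahalanobis = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * mahalanobis + math.log(prior)


def fit(training):
    """training maps a class name to a list of (x, y) points."""
    total = sum(len(points) for points in training.values())
    model = {"means": {}, "covs": {}, "priors": {}}
    pooled = ((0.0, 0.0), (0.0, 0.0))
    for name, points in training.items():
        mu = mean_vector(points)
        scatter = scatter_matrix(points, mu)
        model["means"][name] = mu
        model["covs"][name] = scale(scatter, 1.0 / (len(points) - 1))
        model["priors"][name] = len(points) / total
        pooled = add(pooled, scatter)
    model["pooled"] = scale(pooled, 1.0 / (total - len(training)))
    return model


def predict(model, x, quadratic):
    def score(name):
        # With the pooled covariance the log-determinant term is identical
        # for every class, so the comparison reduces to the linear LDA rule.
        cov = model["covs"][name] if quadratic else model["pooled"]
        return gaussian_score(x, model["means"][name], cov,
                              model["priors"][name])
    return max(model["means"], key=score)


def accuracy(model, data, quadratic):
    pairs = [(name, p) for name, points in data.items() for p in points]
    hits = sum(predict(model, p, quadratic) == name for name, p in pairs)
    return hits / len(pairs)


# Hypothetical toy data: a tight cluster surrounded by a wide one.
toy = {
    "tight": [(1.0, 0.0), (0.5, 0.5), (0.0, 0.0), (0.5, -0.5)],
    "wide": [(3.0, 0.0), (0.0, 3.0), (-3.0, 0.0), (0.0, -3.0)],
}
model = fit(toy)
lda_rate = accuracy(model, toy, quadratic=False)
qda_rate = accuracy(model, toy, quadratic=True)
print(f"LDA: {lda_rate:.2%}, QDA: {qda_rate:.2%}")
```

On this toy set the pooled-covariance rule classifies 75% of the points correctly and the class-specific rule 100%, because no straight line can separate a cluster from the points that surround it. The price is that QDA estimates one covariance matrix per class, so it needs enough samples per class relative to the number of PCs.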

Conclusions
The discrimination of eight different human cells based on Raman spectroscopy was attempted in this work. PCA was first used to visualize and extract the useful information from the multivariate spectral data and to examine the qualitative differences among all types of samples. Two discrimination models (LDA and QDA) were then developed and compared. The results indicated that human cell detection based on Raman spectroscopy is feasible and that the QDA method performed much better than the LDA method, resulting in the unambiguous identification of all eight cell types. It can be concluded that Raman spectroscopy combined with appropriate discrimination models is a promising method for distinguishing different cancerous human cells. Furthermore, the Raman detection of cancerous cells mixed with the corresponding normal cells (lung, breast, etc.) will be an interesting topic for our future research.

Figure 1. Optical images of the eight different human cells. Scale bar: 100 µm.

Figure 2. (a) Raman spectra of the background and A549 human cells; (b) representative peaks of the A549 cells after pre-processing.

Figure 3. Average background-subtracted Raman spectra of the eight different human cells, obtained after baseline calibration. Error bars are the standard deviation of the mean.

Figure 4. 3-dimensional (3D) space with the top three PCs for the eight different human cells.

Figure 5. Identification results of the Linear Discriminant Analysis (LDA) model (a) and the Quadratic Discriminant Analysis (QDA) model (b) with 12 PCs for the eight different human cells. The inset shows a magnified view that highlights the actual samples and the classified samples.

Figure 6. Classification rates of the LDA model and the QDA model with the number of PCs in the prediction set.

Table 1. Summary of the eight different human cells.

Table 2. Sample numbers of the eight human cells for Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) models with 12 PCs.

Table 3. Classification rates from the LDA model and the QDA model with the number of PCs.