Early Detection of Pre-Cancerous and Cancerous Cells Using Raman Spectroscopy-Based Machine Learning

Cancer is one of the most common and fatal diseases worldwide, with an estimated 19 million new diagnoses and approximately 10 million deaths annually. Patients with cancer struggle daily with difficult treatments, pain, and financial and social hardship. Detecting the disease in its early stages is critical for increasing the likelihood of recovery and reducing the financial burden on the patient and society. Current diagnostic methods for cancer are time-consuming, cause discomfort and anxiety for patients, and generate significant medical waste. The main goal of this study is to evaluate the potential of Raman spectroscopy-based machine learning for the identification and characterization of precancerous and cancerous cells. As a representative model, normal mouse primary fibroblast cells (NFC) were used as healthy cells, a mouse fibroblast cell line (NIH/3T3) as precancerous cells, and fully malignant mouse fibroblasts (MBM-T) as cancerous cells. Raman spectra were measured from three different sites of each of the 457 investigated cells and analyzed by principal component analysis (PCA) and linear discriminant analysis (LDA). Our results showed that normal and abnormal (precancerous and cancerous) cells could be distinguished with a success rate of 93.1%; the success rate was 93.7% for normal versus precancerous cells and 80.2% for precancerous versus cancerous cells. Moreover, the measurement site did not influence the differentiation between the examined biological systems.


Introduction
Cancer is a growing health, economic, and social issue [1], accounting for nearly 10 million deaths in 2020, or nearly one in six deaths. For many years, studies of cancer incidence have documented the outcomes of tumors detected at an advanced stage, prompting research into methods that identify the disease before symptoms appear [2]. Cancer arises from the transformation of normal cells into tumor cells in a multi-stage process that generally progresses from a precancerous lesion to a malignant tumor.
These changes result from the interaction between a person's genetic factors and three categories of external agents: physical, chemical, and biological carcinogens [3]. Malignant cells are characterized, among other factors, by the acceleration of the cell cycle,

Materials and Methods
The main phases of the current research were (a) treatment and cell growth; (b) preparation of the samples for Raman measurements; (c) acquisition of the Raman spectra; (d) preprocessing of the spectra and feature extraction; and (e) analysis of the data using LDA and the decision system.

Cells
Three different cell types, NFC, NIH/3T3, and MBM-T, were investigated to follow the development of cancer. These cells represented the normal, precancerous, and cancer categories, respectively.
The NFC cells were primary fibroblasts derived from mouse embryos and have a short life span [28], so these cultures had to be re-established frequently.
The NIH/3T3 cells (a murine fibroblast line) were precancerous cells that had accumulated many changes due to mutations over numerous passages; however, they are not considered cancerous because they do not have all the properties of a cancerous cell [10]. They are defined as abnormal cells because they may undergo further changes over time that convert them into cancerous cells [29].
The MBM-T cells were fully malignant mouse fibroblasts with all the characteristics of cancer cells [30].

Sample Preparation for Raman Measurements
A quartz slide wrapped with smooth aluminum foil and soaked in 99.5% acetone served as the substrate. All cells were detached from the culture flasks using trypsin (Biological Industries, Kibbutz Beit-Haemek, Israel) and transferred to an Eppendorf tube. The cells were then centrifuged at 240 × g for five minutes, and the cell pellet was washed three times with 500-1000 µL of 0.9% NaCl buffer. The number of cells was determined with a hemocytometer. The cells were then pelleted again and re-suspended to a concentration of 30-50 cells/µL. A drop of the cell suspension was placed on the substrate and dried before the Raman measurements.

Raman Measurements
Cells 2023, 12, 1909
Dried samples of representative cells were measured in single-spectrum mode on a Horiba LabRAM HR Evolution Raman microscope equipped with a 1024 × 256-pixel Syncerity CCD detector deep-cooled to −60 °C. The microscope was fitted with a 10 mW, 532 nm Nd:YAG green laser (laser spot size 2 µm) and a 10% transmittance filter at the probe laser station to avoid heating the sample. An integration time of 60 s was used for all Raman measurements. The laser was focused through a 50× objective lens (Olympus MPLAN, Tokyo, Japan) to a diffraction-limited spot of 1.54 µm on the sample. A 600 L/mm grating was used to obtain spectra with 1.2 cm⁻¹ dispersion, maximizing the signal strength while minimizing the background signal from autofluorescence. Wavenumber calibration was performed using a silicon sample.
The cells were prepared from different cultures and measured over more than one year using the Raman facility. Three measurements were recorded for each cell, as detailed in Table 1, at three different sites: the cell center, the cell cytoplasm, and the cell membrane, as shown in Figure 1.
A typical Raman shift spectrum of a normal fibroblast cell is shown in Figure 2. Table 2 lists all the prominent bands based on the published literature [31][32][33][34][35][36][37][38]. Each peak in the Raman shift spectrum is characteristic of a specific vibrational mode of functional groups in the main biomolecules, and together the peaks provide a detailed molecular fingerprint.

Spectral Preprocessing
To improve spectral quality and make the spectra comparable, all acquired spectra were preprocessed before the machine learning classification. All preprocessing steps were performed with in-house code written in Python.
First, the spectra were cropped to the 1800-600 cm⁻¹ range and smoothed with a 5-point Savitzky-Golay filter to reduce instrumental noise.
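The cropping and smoothing step can be sketched with SciPy's `savgol_filter`. This is a minimal sketch, not the authors' in-house code: the 5-point window follows the text, while the polynomial order of 2 and the function name `crop_and_smooth` are assumptions, because the study does not report them.

```python
import numpy as np
from scipy.signal import savgol_filter


def crop_and_smooth(wavenumbers, intensities, low=600.0, high=1800.0,
                    window_length=5, polyorder=2):
    """Crop a spectrum to [low, high] cm^-1, then apply Savitzky-Golay smoothing.

    window_length=5 follows the text; polyorder=2 is an assumption.
    """
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    smoothed = savgol_filter(intensities[mask], window_length=window_length,
                             polyorder=polyorder)
    return wavenumbers[mask], smoothed


# Usage on a synthetic spectrum: one band at 1003 cm^-1 plus random noise
rng = np.random.default_rng(0)
wn = np.linspace(400, 2000, 1601)
raw = np.exp(-((wn - 1003) / 5) ** 2) + 0.05 * rng.standard_normal(wn.size)
wn_cut, spec = crop_and_smooth(wn, raw)
```

A short window keeps narrow Raman bands intact while still averaging out point-to-point detector noise.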
Next, baseline correction was performed to remove differences between spectra caused by fluorescence and baseline shifts. The procedure was carried out in several stages [39]. The full 1800-600 cm⁻¹ region was divided into two sub-regions, 1800-1201 cm⁻¹ and 1200-600 cm⁻¹. In the first sub-region, minima were determined at the fixed wavenumbers 1800, 1750, 1720, 1560, 1530, and 1490 cm⁻¹. The second sub-region was divided into 50 equal ranges, and the minimum of each range was determined. The minimum points were then connected by straight lines to form the baseline, which was subtracted from the spectrum; this baseline correction was iterated five times.
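The iterative piecewise-linear baseline can be sketched in NumPy as follows. This is one interpretation, not the authors' code. Three details are assumptions: the ±5 cm⁻¹ search window around each fixed wavenumber, the use of `np.array_split` to form the 50 ranges (equal in width only on a uniform wavenumber grid), and the function names.

```python
import numpy as np

# Anchor wavenumbers (cm^-1) for the 1800-1201 cm^-1 sub-region, from the text
FIXED_ANCHORS = (1800.0, 1750.0, 1720.0, 1560.0, 1530.0, 1490.0)


def baseline_anchor_points(wn, y, n_ranges=50, half_window=5.0):
    """Collect (x, y) anchor points for one baseline pass (wn must be ascending)."""
    xs, ys = [], []
    # Sub-region 1: minimum near each fixed anchor (assumed: within +/- half_window)
    for w in FIXED_ANCHORS:
        idx = np.flatnonzero(np.abs(wn - w) <= half_window)
        if idx.size:
            j = idx[np.argmin(y[idx])]
            xs.append(wn[j])
            ys.append(y[j])
    # Sub-region 2 (1200-600 cm^-1): minimum of each of n_ranges consecutive ranges
    region = np.flatnonzero((wn >= 600.0) & (wn <= 1200.0))
    for chunk in np.array_split(region, n_ranges):
        if chunk.size:
            j = chunk[np.argmin(y[chunk])]
            xs.append(wn[j])
            ys.append(y[j])
    xs, ys = np.asarray(xs), np.asarray(ys)
    order = np.argsort(xs)
    return xs[order], ys[order]


def subtract_baseline(wn, y, iterations=5, n_ranges=50, half_window=5.0):
    """Iteratively subtract a baseline of straight lines through the anchor minima."""
    order = np.argsort(wn)
    wn = np.asarray(wn, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    for _ in range(iterations):
        bx, by = baseline_anchor_points(wn, y, n_ranges, half_window)
        # np.interp draws straight lines between anchors (constant beyond the ends)
        y = y - np.interp(wn, bx, by)
    return wn, y


# Usage: sloping background plus one band at 1003 cm^-1
wn = np.arange(600.0, 1801.0)
raw = 0.5 + 0.001 * wn + np.exp(-((wn - 1003.0) / 5.0) ** 2)
wn_out, corrected = subtract_baseline(wn, raw)
```

Repeating the pass lets points that fall below the first baseline (for example, on the shoulders of a band) become new anchors, which pulls the baseline down toward the true background.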
Finally, the spectra were vector-normalized. The mean Raman intensity over all measured wavenumbers was subtracted from the spectrum. The resulting spectrum was then treated as a vector and scaled to unit norm by dividing it by the square root of the sum of the squares of all its intensity values. Because of the mean subtraction, some intensities of the normalized spectrum were negative; therefore, each normalized spectrum was shifted so that its minimum intensity was zero.
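The normalization step is compact in NumPy. This is a minimal sketch assuming a 1-D intensity array; the function name `vector_normalize` is illustrative.

```python
import numpy as np


def vector_normalize(y):
    """Mean-centre, scale to unit Euclidean norm, then shift the minimum to zero."""
    y = np.asarray(y, dtype=float)
    centred = y - y.mean()
    unit = centred / np.sqrt(np.sum(centred ** 2))  # unit vector (L2 norm == 1)
    return unit - unit.min()  # removes the negative values created by centring


spectrum = np.array([1.0, 2.0, 3.0, 4.0])
normalized = vector_normalize(spectrum)  # non-negative, minimum exactly 0
```

The result no longer depends on the overall intensity scale, so multiplying a spectrum by a constant leaves its normalized form unchanged. A perfectly flat spectrum would divide by zero and needs to be excluded beforehand.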

Machine Learning Analysis
The NFC, NIH/3T3, and MBM-T cells were treated as separate categories, and the aim was to predict the correct category from the Raman spectrum. PCA and LDA were used to identify the characteristic features of each category [40][41][42]. In addition, a decision system was designed in which a classifier assigns each feature vector to a biological system type (normal, precancerous, or cancerous).

Principal Component Analysis
The PCA technique is most often used for dimensionality reduction. In the current work, we used PCA for both data visualization and feature extraction. The eigenvectors of the covariance matrix are the principal components (PCs). The PCs with the highest eigenvalues capture a major part of the data variance. The projection of each measurement (in our case, the Raman spectrum) onto the PC gives a weight that indicates the contribution of this PC to the measurement. The weights of the PCs with the highest eigenvalues are a low-dimensional representation of the measurement [40].
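The eigen-decomposition can be made concrete with the following NumPy sketch. It computes the PCs as eigenvectors of the covariance matrix and the per-spectrum weights (scores) as projections onto them. The function names and synthetic data are illustrative. A library routine such as scikit-learn's `PCA`, which uses a singular value decomposition, gives the same components up to sign.

```python
import numpy as np


def pca_fit(X, n_components):
    """Mean, top principal components (rows) and eigenvalues of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)     # feature-by-feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order].T, eigvals[order]


def pca_scores(X, mean, components):
    """Project spectra onto the PCs; column j holds the weights of PC j."""
    return (np.asarray(X, dtype=float) - mean) @ components.T


rng = np.random.default_rng(1)
X = rng.standard_normal((60, 10)) * np.linspace(3.0, 0.5, 10)  # decreasing variance per feature
mean, comps, variances = pca_fit(X, n_components=3)
scores = pca_scores(X, mean, comps)  # low-dimensional representation, shape (60, 3)
```

The variance of the scores along each PC equals the corresponding eigenvalue, which is why keeping the PCs with the largest eigenvalues retains most of the data variance.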

Linear Discriminant Analysis
All classification tasks in this study were binary. After features were extracted by PCA dimensionality reduction, the LDA classifier was applied. With uniform priors, LDA assigns a sample to the class whose mean vector is closest to the feature vector in Mahalanobis distance [43]. LDA thus separates the classes with linear decision boundaries.
In general, the classifier estimates the mean of each class under the assumption that the classes are Gaussian with a shared covariance matrix. Because the classes then differ only in their means, they are separated by a hyperplane of dimension k − 1, where k is the dimension of the feature space. When the feature vectors are two-dimensional (k = 2), the hyperplane is a straight line; when they are three-dimensional (k = 3), it is a plane. This separation boundary serves as a decision system that determines the class of a given feature vector [16,44].
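The minimum-Mahalanobis-distance rule with a shared covariance can be sketched in NumPy. This is an illustrative re-implementation under the stated assumptions (Gaussian classes, pooled covariance, uniform priors), not the classifier code used in the study. The class name `MahalanobisLDA` is hypothetical, and a pseudo-inverse guards against a singular pooled covariance.

```python
import numpy as np


class MahalanobisLDA:
    """LDA with a pooled covariance matrix and uniform class priors."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Pooled within-class covariance (shared by all classes)
        resid = X - self.means_[np.searchsorted(self.classes_, y)]
        cov = resid.T @ resid / (len(X) - len(self.classes_))
        self.cov_inv_ = np.linalg.pinv(cov)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        diffs = X[:, None, :] - self.means_[None, :, :]  # shape (n, k, d)
        # Squared Mahalanobis distance from every sample to every class mean
        d2 = np.einsum("nkd,de,nke->nk", diffs, self.cov_inv_, diffs)
        return self.classes_[np.argmin(d2, axis=1)]


rng = np.random.default_rng(2)
X = np.vstack([rng.standard_normal((100, 2)),
               rng.standard_normal((100, 2)) + np.array([5.0, 0.0])])
y = np.array([0] * 100 + [1] * 100)
clf = MahalanobisLDA().fit(X, y)
acc = np.mean(clf.predict(X) == y)
```

With two-dimensional features, as in this usage example, the set of points equidistant from the two class means is a straight line, matching the k = 2 case described above.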

Validation
Cross-validation using the K-fold approach was applied to evaluate classification performance. In K-fold cross-validation, the database is partitioned into K groups (folds). Because the database was relatively small, five-fold cross-validation (K = 5) was used: one fold is held out for testing and the remaining four folds are used to train the classifier, so the training and test subsets are disjoint. In this way, the predictive power of the LDA classifier was evaluated.
The validation process was repeated five times, each time with a different fold used for prediction, so that every fold served once as the test set. The performance of the binary LDA classifier was averaged over the folds and summarized in a confusion matrix, as illustrated in Table 3. The variability of the classifier's accuracy was reported as the standard deviation of the accuracy across the five folds.
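With scikit-learn, the five-fold protocol can be sketched as a pipeline. Fitting PCA inside the pipeline means it is refit on the four training folds only, consistent with the statement that the projection eigenvectors were calculated from the training set only. Several details are assumptions: stratified, shuffled folds with a fixed seed (the text specifies only K = 5), and the population standard deviation from `np.std`. The helper name `cv_accuracy` and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def cv_accuracy(X, y, n_pcs, n_splits=5, seed=0):
    """Mean and standard deviation of fold accuracies for PCA + LDA."""
    # PCA is fitted inside the pipeline on the training folds only,
    # so the held-out fold never influences the eigenvectors.
    model = make_pipeline(PCA(n_components=n_pcs), LinearDiscriminantAnalysis())
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_acc = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    return fold_acc.mean(), fold_acc.std()


# Synthetic stand-in for preprocessed spectra: two classes, 50 "wavenumbers"
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 50))
y = np.repeat([0, 1], 100)
X[y == 1] += 1.0
mean_acc, std_acc = cv_accuracy(X, y, n_pcs=5)
```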
When classification was performed between the normal category (NFC) and the abnormal category (NIH/3T3 and MBM-T combined), the abnormal category was defined as the positive class. For the pairwise classifications, the cancerous cells (MBM-T) were defined as the positive class in NFC versus MBM-T and NIH/3T3 versus MBM-T, and the precancerous cells (NIH/3T3) were defined as the positive class in NFC versus NIH/3T3.
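Four statistical indices were obtained, as shown in Table 3: true positives (TP), the number of cases correctly predicted by the classifier as the positive group; false positives (FP), the number of cases incorrectly predicted by the classifier as the positive group; true negatives (TN), the number of cases correctly predicted by the classifier as the negative group; and false negatives (FN), the number of cases incorrectly predicted by the classifier as the negative group. The performance of the classifier was calculated in terms of the accuracy (ACC), sensitivity (SE), specificity (SP), positive predictive value (PPV), and negative predictive value (NPV), according to Equations (1)-(5):
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\mathrm{SE} = \frac{TP}{TP + FN} \tag{2}$$
$$\mathrm{SP} = \frac{TN}{TN + FP} \tag{3}$$
$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{4}$$
$$\mathrm{NPV} = \frac{TN}{TN + FN} \tag{5}$$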
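The five indices listed above follow directly from the four confusion-matrix counts. The snippet below is a small illustration; the counts in the usage line are invented for demonstration and are not results from this study.

```python
def binary_metrics(tp, tn, fp, fn):
    """ACC, SE, SP, PPV and NPV from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SE": tp / (tp + fn),   # sensitivity: fraction of positives detected
        "SP": tn / (tn + fp),   # specificity: fraction of negatives detected
        "PPV": tp / (tp + fp),  # positive predictive value
        "NPV": tn / (tn + fn),  # negative predictive value
    }


# Illustrative counts only
m = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```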

Results and Discussion
The potential of Raman spectroscopy to distinguish between three biological systems (primary normal cells, precancerous cells, and cancerous cells) was investigated in the initial stage of this project. The analyses were based on 997 Raman spectra acquired from 457 different cells from cultures grown on different days, as detailed in Table 1.
As mentioned, the components of cells undergo many changes during the initiation of cancer, and it is of interest to identify the cell region in which the biochemical changes are dominant. Spectral differences reflect biochemical changes, so larger spectral differences should lead to higher classification rates. Accordingly, the spectra acquired from the different cell regions (cell center, cytoplasm, and cell membrane) were analyzed separately, and the performance of the resulting classifiers was compared. Figure 3 displays the average spectra in the 1800-600 cm⁻¹ region for each measurement site of the three biological systems investigated: NFC, NIH/3T3, and MBM-T. The spectral changes were minor, making it impossible to distinguish between these biological systems by simple visual comparison.
To highlight these spectral differences, the difference between the average spectra of each pair of systems was computed for each region (cell center, cytoplasm, and cell membrane), as shown in Figure 4. The most noticeable differences between the average spectra of the three systems occurred in the 700-800 cm⁻¹ range; at 1064 cm⁻¹; in the 1096-1088 cm⁻¹ range; at 1128, 1250, 1311, 1337, 1443, and 1578 cm⁻¹; and in the 1602-1618 cm⁻¹ range and at 1700 cm⁻¹. These differences arise from differences in the composition, and likely also the structure, of the main biomolecules in the samples (e.g., proteins, lipids, carbohydrates, and nucleic acids).
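The pairwise difference spectra can be computed from the class-average spectra of one cell region, for example as in this sketch. The input layout (a dictionary of arrays keyed by cell type) and the helper names are assumptions. Ranking wavenumbers by absolute difference is one simple way to list candidate bands, not necessarily how the bands above were identified.

```python
import numpy as np


def difference_spectra(spectra_by_class):
    """Pairwise differences between class-average spectra.

    spectra_by_class maps a cell type to an array (n_spectra x n_wavenumbers)
    of preprocessed spectra from a single cell region.
    """
    means = {name: np.asarray(s, dtype=float).mean(axis=0)
             for name, s in spectra_by_class.items()}
    names = list(means)
    return {(a, b): means[a] - means[b]
            for i, a in enumerate(names) for b in names[i + 1:]}


def top_wavenumbers(wn, delta, k=5):
    """Wavenumbers with the k largest absolute differences, largest first."""
    idx = np.argsort(np.abs(delta))[::-1][:k]
    return np.asarray(wn)[idx]


# Toy input: three wavenumbers, a few spectra per cell type
wn = np.array([700.0, 1064.0, 1443.0])
deltas = difference_spectra({
    "NFC": [[1.0, 2.0, 0.0], [3.0, 4.0, 0.0]],
    "NIH/3T3": [[0.0, 0.0, 0.0], [2.0, 2.0, 0.0]],
    "MBM-T": [[5.0, 5.0, 1.0]],
})
```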
As is clear from Figure 4, the spectral differences between the MBM-T cells, representing cancer cells, and the NFC cells, representing normal cells, were the highest, followed by the spectral differences between NFC and NIH/3T3 cells, which represented precancerous cells.
However, the differences between the NIH/3T3 and MBM-T cells were the smallest and were very small overall. Notably, these differences appeared in different parts of the cell rather than being concentrated in one part. These results align with the biological hypothesis that cells undergo significant changes when they transform from normal to cancerous, including changes in the cell membrane, cytoplasm, and nucleus [45].
As is known, the development of a tumor and the host's response to it may be affected by surface molecular alterations brought about by malignant transformation, including changes in proteins and carbohydrates that function as enzymes and cell surface receptors [46]. Changes also occur in the cytoplasm and are reflected by new proteins and other components [46]. Malignant cells have a small amount of cytoplasm, which frequently contains vacuoles [4].
Moreover, the cell organelles, which are usually distributed inside the cells in the cytoplasm and cell center, including the nucleus, undergo several changes.
Through its alterations, the nucleus of a cancerous cell contributes significantly to the evaluation of a tumor malignancy, as mentioned in the Introduction.
In addition, the granular endoplasmic reticulum appears to be a more streamlined structure, and its cisternae may be clogged with amorphous, granular, or filamentous debris. Tumor cells also exhibit a decrease in granular endoplasmic reticulum together with an increase in free ribosomes and polysomes, which indicates that more proteins are being produced during cell growth [4].
PCA projections of the data onto a two-dimensional subspace were applied as a first step in differentiating the three tested biological systems (Figure 5). These plots were generated for visualization and to estimate the complexity of the classification problem. They also clarified the differences between the MBM-T, NIH/3T3, and NFC cells and illustrated the ability to distinguish between the three biological systems as pairs, based on Raman spectra obtained from various cell regions. In Figure 5, two clear clusters can be seen for both the MBM-T/NFC and NIH/3T3/NFC pairs, regardless of the measurement region in the cell. Although there are two distinct clusters in each of these plots, some points still overlap, and for precancerous versus cancerous cells the overlap is almost complete. Thus, classifying precancerous versus cancerous cells was more complex than classifying normal versus cancerous or normal versus precancerous cells. Projections onto different PC subspaces were examined, and the best projections were found in the PC1-PC2 subspace, as presented in Figure 5.
Following the PCA analysis, the LDA classifier was applied to classify the three biological systems (MBM-T, NIH/3T3, and NFC) using four databases. In the first database (database I), the characteristic vector of each cell was the average of the three Raman spectra measured from the cell center, cytoplasm, and cell membrane. In the second (database II), third (database III), and fourth (database IV) databases, the characteristic vector of each cell was the Raman spectrum measured from the cell center, cytoplasm, and cell membrane, respectively.
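Assuming the site spectra are stored as an array of shape (cells, 3 sites, wavenumbers), the four databases can be assembled as in the sketch below. The array layout, the site order, and the function name are assumptions. The sketch also presumes that every cell has all three site spectra; cells with a missing site would need separate handling.

```python
import numpy as np

SITES = ("center", "cytoplasm", "membrane")  # assumed order along axis 1


def build_databases(cell_spectra):
    """Return databases I-IV from an array of shape (n_cells, 3, n_wavenumbers)."""
    cell_spectra = np.asarray(cell_spectra, dtype=float)
    if cell_spectra.ndim != 3 or cell_spectra.shape[1] != len(SITES):
        raise ValueError("expected shape (n_cells, 3, n_wavenumbers)")
    databases = {"I": cell_spectra.mean(axis=1)}  # per-cell average of the three sites
    for label, site_index in zip(("II", "III", "IV"), range(len(SITES))):
        databases[label] = cell_spectra[:, site_index, :]  # a single site per database
    return databases


demo = np.arange(24, dtype=float).reshape(2, 3, 4)  # 2 cells, 3 sites, 4 wavenumbers
dbs = build_databases(demo)
```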
At this stage, PCA was used for dimensionality reduction, so the feature vectors were the PC coefficients (scores). The eigenvectors for the projection were calculated from the training set only. The required number of PCs is task-dependent: when the differences between the classes are large, a small number of PCs suffices (easy classification), whereas when the differences are minor, a larger number of PCs is required (hard classification). Accordingly, 2-6 PCs were sufficient for the abnormal (cancerous and precancerous) versus normal, cancerous versus normal, and precancerous versus normal classifications. By contrast, 10-20 PCs were required for the LDA classifier to achieve the best classification between the cancerous and precancerous categories.
For the classification of normal versus abnormal cells, the classifier's performance was evaluated by its classification success rate (ACC) as a function of the number of PCs, from 1 to 25. Figure 6 shows the success rate as a function of the number of PCs for databases I, II, III, and IV.
Beyond a certain number of PCs, the accuracy reached a plateau; that is, adding more PCs did not change the LDA accuracy. The smallest number of PCs at which the classifier reached the plateau was chosen. Table 4 shows the classifier's performance for each binary classification. The results show that Raman spectroscopy has the potential to detect changes in cells that are likely to develop into cancer cells, with 93.1% ACC (Table 4a).
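The plateau rule can be sketched as a sweep over 1 to 25 PCs with cross-validated accuracy. The sweep returns the smallest number of PCs whose accuracy is within a tolerance of the best. The tolerance `tol` is a hypothetical plateau criterion, because the text does not say how the plateau was identified. The stratified, shuffled folds and the synthetic data are likewise assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def choose_n_pcs(X, y, max_pcs=25, tol=0.005, seed=0):
    """Smallest number of PCs whose CV accuracy is within tol of the best (plateau)."""
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    accs = np.array([
        cross_val_score(
            make_pipeline(PCA(n_components=n), LinearDiscriminantAnalysis()),
            X, y, cv=folds,
        ).mean()
        for n in range(1, max_pcs + 1)
    ])
    # First PC count at which the accuracy curve reaches the plateau level
    n_best = int(np.argmax(accs >= accs.max() - tol)) + 1
    return n_best, accs


rng = np.random.default_rng(4)
X = rng.standard_normal((200, 50))
y = np.repeat([0, 1], 100)
X[y == 1] += 1.5
n_best, curve = choose_n_pcs(X, y)
```

For an easy separation like this synthetic one, the curve is flat from the first PC onward. Harder pairs, such as precancerous versus cancerous cells, would be expected to need more PCs before the curve levels off.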

Table 4. Performance of the LDA classifier in distinguishing (a) healthy NFC cells from aberrant cells (NIH/3T3 and MBM-T), (b) cancerous MBM-T cells from precancerous NIH/3T3 cells, (c) normal NFC cells from precancerous NIH/3T3 cells, and (d) normal NFC cells from cancerous MBM-T cells. Four databases (I, II, III, and IV) were classified, where the feature vectors were the Raman spectra after dimensionality reduction using PCA.
Table 4b shows the classifier's performance in classifying precancerous NIH/3T3 cells versus cancerous MBM-T cells, with 80.2% success. Table 4c,d show the binary LDA classifier's performance for normal NFC versus precancerous NIH/3T3 cells and for normal NFC versus cancerous MBM-T cells, respectively.
Table 4 shows that two classifications were the most successful with the database of averaged cell spectra (database I): normal NFC versus precancerous NIH/3T3 cells (93.7% success) and normal NFC versus cancerous MBM-T cells (96.5% success). For both of these classifications (c and d), the measurements from the cytoplasm (database III) yielded slightly better results than the other single measurement sites (databases II and IV). Nevertheless, the success rate was very high for all three tested regions of the cell. These results are not surprising: when normal cells are transformed, all cell components, including the nucleus, cytoplasm, organelles, and membrane, undergo many changes, and changes in one region are accompanied by changes in the others. Therefore, the average over all regions of each cell already captures most of the changes in the cell biomolecules during transformation.
The cytoplasm region contains many proteins, organelles, and cell structures, so measurement from the cytoplasm can include changes in any of these structures, which might explain the slight superiority achieved when using this area for the classification of normal NFC cells and aberrant cells (NIH/3T3 and MBM-T).

The Raman spectrum is rich in features that arise from the chemical and morphological structures of the cells' biomolecules and from their vibrational modes. Dividing the spectrum into sub-ranges may help to indicate which ranges, and hence which biomolecules, were significantly altered during cancer development. It can also help to determine the most informative range of the Raman spectrum for discriminating between normal NFC and precancerous NIH/3T3 cells, which is particularly relevant for the early detection of cancer. The spectra were divided into four domains characterizing the main functional groups of the cells' biomolecules:
- 1195-600 cm⁻¹: contributed mainly by carbohydrates;
- 1380-1196 cm⁻¹: mainly proteins and lipids;
- 1520-1381 cm⁻¹: mainly nucleic acids;
- 1728-1521 cm⁻¹: mainly proteins (amide I and amide II).

Table 5 shows the classifier's performance in each of the four spectral ranges for four pairwise classifications: normal NFC versus aberrant cells (NIH/3T3 and MBM-T), precancerous NIH/3T3 versus cancerous MBM-T cells, NFC versus NIH/3T3, and NFC versus MBM-T. The feature vector of each cell was the average of the three Raman spectra measured from the cell center, cytoplasm, and cell membrane (database I), which gave the highest success rate.
Table 5. Performance results of the LDA classifier in discriminating between pairs of systems: (a) NFC versus (NIH/3T3 and MBM-T), (b) NFC versus NIH/3T3, (c) NFC versus MBM-T, and (d) NIH/3T3 versus MBM-T.
The analysis was based on four wavenumber ranges of the Raman spectra of database I: carbohydrates (1195-600 cm⁻¹), proteins (amide III) and lipids (1380-1196 cm⁻¹), nucleic acids (1520-1381 cm⁻¹), and proteins (amide I and II) (1728-1521 cm⁻¹).
Table 5 shows that the 1195-600 cm⁻¹ range yielded slightly better differentiation than the other spectral ranges for three classifications: NFC versus (NIH/3T3 and MBM-T), NFC versus NIH/3T3, and NFC versus MBM-T. In contrast, the 1380-1196 cm⁻¹ range, which represents mostly proteins and lipids, was more successful in discriminating between NIH/3T3 and MBM-T. Cancer cells typically have altered energy metabolism, including increased resting energy expenditure and increased sugar, lipid, and protein metabolism [47]. Aerobic glycolysis is dominant in cancer cells: even when oxygen is present, cancer cells obtain most of their energy through glycolysis (the Warburg effect) [48].
This appears counterintuitive, because cancer cells need a large amount of energy to thrive and glycolysis produces far less energy than oxidative phosphorylation. However, the Warburg effect may benefit cancer cells because it provides precursors for many biosynthetic pathways, including amino acid precursors, as well as NADPH and ribose sugars for DNA and RNA synthesis. Several components of the glycolytic pathway are upregulated in cancer cells, including the transporter GLUT1, lactate dehydrogenase, pyruvate kinase, and the lactate exporter. At the same time, pyruvate dehydrogenase is inhibited, which increases glycolytic flux and reduces the entry of pyruvate into oxidative phosphorylation [48].
These facts regarding sugar metabolism in cancer cells provide a logical explanation for the finding that the 1195-600 cm⁻¹ region, contributed mainly by carbohydrates, had a greater ability to distinguish between normal and cancerous cells and between normal and precancerous cells. The situation was different for distinguishing between cancerous and precancerous cells, because both have altered energy metabolism. Importantly, however, the entire spectrum (1800-600 cm⁻¹), which includes changes in all cell biomolecules (proteins, lipids, nucleic acids, and carbohydrates), differentiated between the cells more successfully than any single range. This held even though the changes in the carbohydrate region are consistent with the known metabolic alterations of cancer cells.
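Restricting the analysis to one wavenumber window amounts to selecting columns before the PCA-LDA pipeline. The sketch below compares cross-validated accuracy for each of the four windows and for the full 1800-600 cm⁻¹ range. Several choices are assumptions for illustration only: the fixed `n_pcs=5`, the fold settings, and the synthetic data, in which the classes differ only between 700 and 800 cm⁻¹.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Wavenumber windows in cm^-1 (low, high), as listed in the text
WINDOWS = {
    "carbohydrates": (600.0, 1195.0),
    "amide III and lipids": (1196.0, 1380.0),
    "nucleic acids": (1381.0, 1520.0),
    "amide I and II": (1521.0, 1728.0),
    "full range": (600.0, 1800.0),
}


def accuracy_per_window(wn, X, y, n_pcs=5, seed=0):
    """Five-fold PCA + LDA accuracy using only the columns inside each window."""
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, (low, high) in WINDOWS.items():
        cols = (wn >= low) & (wn <= high)
        model = make_pipeline(PCA(n_components=n_pcs), LinearDiscriminantAnalysis())
        results[name] = cross_val_score(model, X[:, cols], y, cv=folds).mean()
    return results


# Synthetic example: classes differ only between 700 and 800 cm^-1
rng = np.random.default_rng(5)
wn = np.arange(600.0, 1801.0)
X = rng.standard_normal((200, wn.size))
y = np.repeat([0, 1], 100)
X[np.ix_(y == 1, (wn >= 700) & (wn <= 800))] += 1.0
acc = accuracy_per_window(wn, X, y)
```

In this toy case, only the windows that contain the altered band (and the full range) separate the classes. In practice, the number of PCs would be re-tuned for each window rather than fixed.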
As mentioned, all cells of the three biological systems were measured at three different sites (cell center, cytoplasm, and cell membrane), and the measurement site did not influence the classification success rate. However, because the spatial resolution of Raman microscopy is high (~1.5 µm), some of the spectral differences were due to probing different organelles during the measurements.
It is therefore preferable to take several measurements from different sites in the cell and to use the average of the resulting spectra as the representative spectrum in the classification analyses. This reduces the spectral variation caused by probing different organelles.

Conclusions
Raman spectroscopy is a promising method for discriminating between normal and precancerous or cancerous cells, and it can help in the early detection of cancer. The ability to identify precancerous cells will have a clear impact on the lives and health of patients, as well as on the medical and economic fields more generally. Discrimination between normal and precancerous or cancerous cells is not tied to specific regions of the cell. It is recommended to take measurements from multiple locations in the cell and to use their average for differentiation. In addition, using the complete spectral range (1800-600 cm⁻¹), without segmenting it into separate ranges based on the primary biomolecular contributor, yields better results than using any single range.