Improving Diagnosis of Cervical Pre-Cancer: Combination of PCA and SVM Applied on Fluorescence Lifetime Images

We report a significant improvement in the diagnosis of cervical cancer through a combined application of principal component analysis (PCA) and support vector machine (SVM) on the average fluorescence decay profile of Fluorescence Lifetime Images (FLI) of epithelial hyperplasia (EH) and CIN-I cervical tissue samples, obtained ex-vivo. The fast and slow components of the double-exponential fitted fluorescence lifetimes were found to be higher for EH than for CIN-I samples. Application of PCA to the average time-resolved fluorescence decay profiles showed that the 2nd PC, in combination with the 1st PC, enhanced the discrimination between EH and CIN-I tissues. Fluorescence lifetime and PC scores were then classified separately using SVM to distinguish the two tissue types. On applying SVM to a combination of fluorescence lifetime and PC scores, diagnostic capability improved significantly.


Introduction
Since the early 1980s, steady-state fluorescence spectroscopy has been extensively studied by several groups for detection of cancer [1][2][3][4][5][6][7][8][9]. Fluorescence spectra have been used by several groups to detect cervical cancer [6,9,10]. Ramanujan et al. have efficiently combined reflectance with fluorescence spectroscopy for in-vivo cervical pre-cancer detection [9,11], which has resulted in a fiber-based probe for clinical studies. However, one limitation of fluorescence spectroscopy of biological tissue is the strong spectral overlap of the contributing fluorophores, which makes it difficult to understand disease development through changes in the fluorophores. In addition, it is difficult to interpret or quantify the difference in spectra due to quenching by other molecules, aggregation, or energy transfer, because fluorescence intensity depends on fluorophore concentration and illumination intensity. On the other hand, fluorescence lifetime (τ) is independent of fluorophore concentration and illumination intensity and depends only on the intrinsic characteristics of the fluorophore. It is also sensitive to the local environment, such as viscosity, pH and refractive index, as well as to interactions with other molecules [12][13][14]. During the progress of cancer, the environment of the contributing fluorophores changes, and this can be captured by the lifetime, as it is more sensitive to the environment than steady-state fluorescence [15,16]. Hence, the temporal information adds an extra dimension which can be used to distinguish different fluorophores with overlapping spectra but different lifetimes. The photo-physical properties of the intrinsic bio-molecules and bio-structures have been considered as a possible parameter that may be related to the morphofunctional state of a biological tissue. Due to this advantage, the time-resolved fluorescence technique has been used to study the structure and dynamics [17][18][19][20][21] of biological molecules and has been applied in tissue diagnosis [21,22], specifically for cervical cancer detection [22,23].
Tissue undergoes changes during the development of cancer. Morphological changes mostly contribute to changes in the scattering properties, while biochemical changes occur at a cellular level from bio-molecules, which are reflected in the absorption properties of the tissue [24,25]. In addition, tissue undergoes changes in the concentration of fluorophores, e.g., a decrease in the concentration of flavin adenine dinucleotide (FAD) and an increase in the concentration of nicotinamide adenine dinucleotide (NADH), which are well documented [26,27]. Optical techniques may be more sensitive to the changes occurring in the preliminary stages of cancer than the existing conventional techniques [25,28].
Further, application of PCA on fluorescence lifetime images improves the capabilities of the lifetime, since the principal components capture both absorption and scattering effects [29]. PCA is a well-known technique used for dimensionality reduction of data [30]. It has been used by several groups for cancer diagnosis by feature extraction [26,29,31,32,33]. Recently, different organs have been differentiated using this technique, in which changes in anatomical behavior are reflected in the time-correlated fluorescence signal [34][35][36]. Statistical algorithms play an important role in enhancing the discrimination. The physiological changes in each organ lead to different fluorescence decay times. These are enhanced through different principal components, which provide structural information about the organs. In another study, our group validated with phantom studies how the PCs capture the absorption and scattering information and thus clearly demarcate the precancerous regions from the cancerous sites [29].
For better classification of cervical data, comparative evaluation of several classifier techniques using the same set of data is necessary. Among the available classifiers, SVM is robust and performs better than other classification techniques. SVM is a machine learning algorithm used for classification of data, function approximation, etc. Owing to its generalization ability, it has been successfully applied in many cases [37][38][39][40][41][42][43]. SVM works by minimizing the upper bound of the error through maximizing the margin between the separating hyperplane and the data set. It has the advantage of choosing the model automatically, such that both the optimal number and location of basis functions are determined during training. SVM is also suitable for small samples and some inherently non-linear problems. There are several kernel functions available for SVM, both for linear and non-linear classification, and its performance largely depends on the selection of the kernel function [44,45]. Rarely is one index test sufficient for diagnosis of a particular condition, and so diagnostics involving multiple tests are often used. We have therefore applied SVM to both fluorescence lifetime and PC scores for classification.
Here, we report the discrimination of EH and CIN-I through their fluorescence lifetime. The average fluorescence decay profiles were fitted to a double exponential. A principal component analysis was applied to the time-resolved fluorescence decay and PC scores were calculated for each sample. Fluorescence lifetime and PC scores were used for classification of EH and CIN-I samples by the application of a machine learning algorithm (SVM). We explain how the SVM results of fluorescence lifetime and PCA were combined to improve the diagnostic efficacy.

Fluorescence Lifetime
Here, we have used a 375 nm pulsed laser for excitation, with the resulting fluorescence dominantly from NADH and FAD [28]. Fluorescence intensity images at a certain time delay are shown in Figure 1a,d for EH and CIN-II, respectively. The corresponding FLI are shown in Figure 1b,e for the fast component (τ1) and Figure 1c,f for the slow component (τ2). Both the fast and slow components of FLI for EH are higher than those of CIN-I tissues. However, the fluorescence lifetime at a few pixels is higher due to bad fitting at those pixels. Fluorescence decay was found to be faster for CIN-I compared to EH samples, as indicated in Figure 2a. A scatter plot of fast versus slow components of the fluorescence lifetimes obtained from the fitted average fluorescence decay of all the EH and CIN-I samples is shown in Figure 2b. The fast and slow components of the average fluorescence lifetime also show higher values for most of the EH tissues as compared to CIN-I tissues, as illustrated in Figure 2b.

Principal Component Analysis
PCA has been performed on average time-resolved fluorescence decay profiles following the method mentioned in Section 4.3.2. The first three eigenvalues are considered here, as they capture more than 99.5% of the total variance of the fluorescence decay. This can be seen in the eigenvalue and variance plots shown in Figure 3a. PC scores represent the projection of data on the principal components and hence carry useful information, which can be used for classification purposes. Scatter plots between 1st and 2nd PC scores, 2nd and 3rd PC scores and 1st and 3rd PC scores are shown in Figure 3b-d, respectively. It can be seen from Figure 3b that the EH and CIN-I tissue samples can be distinguished clearly from the scatter plot between 1st and 2nd PC scores, but the distinction becomes clearer from the scatter plot between 2nd and 3rd PC scores, as seen in Figure 3c. From Figure 3d, it can be seen that the 1st versus 3rd PC scores for EH and CIN-I samples are highly overlapping and hence, the 3rd PC does not carry useful information that can be used for classification.

Support Vector Machine
The SVM is used for classification of the processed data. The important features of SVM are the mapping of linearly inseparable data into a high-dimensional space by a non-linear kernel function and the linear separation of the data in that high-dimensional space. SVM is especially suitable for small samples and some inherently non-linear problems. The data points that lie close to the decision surface receive the maximum weight, and the points far away from the margin receive zero weight. The data points close to the decision boundary are called support vectors. The margin of the classifier is determined by the distance from the decision surface to the support vectors. The holdout method of cross-validation was employed to randomly divide the data set into two parts: a training set consisting of 16 EH and 16 CIN-I samples, and a validation set consisting of 9 EH, 4 CIN-I and 2 CIN-II samples. Here, the input data matrices for the SVM classifier were dimension-reduced by application of PCA. The first two PCs are considered, as they carry more than 99% of the total information. To obtain the optimal parameters of the classifier, a grid search technique with tenfold cross-validation was employed. Given a constant sample size, one approach to improve the classification accuracy may be to incorporate non-linear techniques, such as non-linear kernels, in the SVM analysis. Different types of linear and non-linear kernel functions were tried, out of which the radial basis function (RBF) kernel was found to classify better than the others (Figure 4a,c).
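The holdout division described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `holdout_split`, the site identifiers and the random seeds are hypothetical, and only the group sizes (25 EH sites, 20 CIN-I sites, 16 training sites per group) are taken from the text.

```python
import random

def holdout_split(site_ids, n_train, seed=0):
    """Randomly pick n_train sites for training; the remainder is the validation set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical site identifiers; only the counts come from the text.
eh_sites = [f"EH-{i:02d}" for i in range(1, 26)]        # 25 EH sites
cin1_sites = [f"CIN1-{i:02d}" for i in range(1, 21)]    # 20 CIN-I sites

eh_train, eh_val = holdout_split(eh_sites, n_train=16, seed=1)
cin1_train, cin1_val = holdout_split(cin1_sites, n_train=16, seed=2)

print(len(eh_train), len(eh_val))      # 16 9
print(len(cin1_train), len(cin1_val))  # 16 4
```

The two CIN-II sites mentioned in the text would be appended to the validation set only, since no CIN-II site is used for training.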

Fluorescence Lifetime
Both the fast and slow components of FLI, shown in Figure 1b,e and Figure 1c,f respectively, are found to be higher for EH than for CIN-I tissue. It may be noted that the low SNR at the edges of the images creates errors in fitting, giving rise to higher values of lifetime. The average fluorescence decay profile, on the other hand, shows a better SNR and hence its double-exponential fitting is more reliable. On comparing the lifetimes obtained from the images and the average fluorescence decay profiles, they are found to match well where the SNR is high. On comparing the fluorescence lifetime for all tissues from both categories, both the fast and slow components of lifetime are found to be higher for EH than for CIN-I, which can be confirmed from Figure 2b. In our earlier results we showed that the fluorescence lifetime for normal tissue is lower than that of CIN-I tissue [29], but it is pertinent to note that in this study we report results for EH rather than normal tissues. EH is declared as normal by the histopathologist, but its epithelial thickness is almost double that of a typical normal tissue. The fluorescence signal would then emerge from deeper regions, undergoing a higher number of scattering events before escaping from the tissue, which would result in an increase in fluorescence lifetime.

Principal Component Analysis
Fluorescence lifetime shows a good discrimination between the EH and CIN-I tissues, but the double-exponential fitting is prone to error in the case of noisy data and hence cannot be trusted at low SNR [46]. This can be confirmed from Figure 1b,c,e,f, where the lifetime at a few pixels shows high values due to bad fitting caused by low SNR. PCA helps to overcome this limitation. It has been shown that fluorescence lifetime is more sensitive to changes in scattering and is unaffected by absorption.
Application of PCA to time-resolved fluorescence images has the advantage of capturing the changes in both absorption and scattering of the fluorophore environment [16,29]. The first three eigenvalues are considered here, as they capture more than 99.5% of the total variance of the fluorescence decay, as seen in Figure 3a. The eigenvectors corresponding to these three eigenvalues are used to represent the complete data, and reconstruction using them yields profiles very close to the original data. From Figure 3d it can be seen that the 1st versus 3rd PC scores are highly overlapping, while the scatter plots of 1st versus 2nd (Figure 3b) and 2nd versus 3rd (Figure 3c) PC scores show a clear distinction between EH and CIN-I samples. From the above results, one can assume that the 2nd PC plays an important role in discriminating EH and CIN-I tissues, as it captures the subtle changes in the fluorophore environment, i.e., the changes in absorption and scattering, which agrees with our earlier results [29].

Support Vector Machine
For PC score classification, the first two PCs are considered, as they carry more than 99% of the original information (as seen in Figure 3a), while the 3rd PC does not carry any significant information. The overall model results for PC scores using different kernels for the training data are shown in Table 1. The performance of a model is generally evaluated in terms of accuracy, precision, sensitivity, and specificity. From Table 1 it can be seen that the polynomial and RBF kernel function performances are similar, with accuracy, precision, sensitivity and specificity of 84%, 100%, 100% and 100%, respectively. The higher specificity and sensitivity for non-linear SVM clearly indicate that the boundary separating CIN-I and EH samples is not linear. For the calculation of specificity and sensitivity, the samples lying on the boundary between the two regions have not been considered. If we only consider the samples lying on either side of the boundary, then the RBF kernel performs better than the others. The RBF kernel based SVM classification results of lifetime and PC scores for training and validation data between EH and CIN-I are shown in Figure 4. In the validation results in Figure 4b,d, two CIN-I samples lie outside the CIN-I region in both cases, while all the normal samples are clearly distinguished. The low specificity can be attributed to the small number of training samples used. As the number of samples increases, the separating hyperplane will become more robust and the classification accuracy will improve. Table 2 shows the sensitivity and specificity for lifetime, PCA and their combined results for both training and validation data. As we have not considered the data lying on the boundary for the sensitivity and specificity calculation, both become 100% for the training data set. For training data, both the lifetime and PC scores have similar specificity and sensitivity, but for validation data PCA performs better than fluorescence lifetime. The combined results obtained from the application of SVM show a better sensitivity, as intended.
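The four performance measures named above follow from the counts of a two-class confusion matrix. The sketch below is illustrative only: the function name `classification_metrics` and the example counts are hypothetical (they are not the values of Table 1 or Table 2), and treating CIN-I as the positive class is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, sensitivity and specificity from confusion counts.

    tp/fn: positive (assumed CIN-I) samples classified correctly/incorrectly.
    tn/fp: negative (assumed EH) samples classified correctly/incorrectly.
    """
    def ratio(num, den):
        # Return NaN instead of raising when a class is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }

# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Samples lying exactly on the decision boundary, which the text excludes from the sensitivity and specificity calculation, would simply be left out of the four counts.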

Sample Collection
Fresh human cervical tissue samples used in this ex-vivo experiment were obtained from GSVM medical college, Kanpur, Uttar Pradesh, India. The freshly resected tissue samples were stored in ice for transportation to the lab. Before performing the experiment, the samples were thawed to room temperature and then rinsed with saline water to remove superficial blood. The experiment was performed within 4 h of biopsy. Based on visual examination, samples were labeled as normal or abnormal. After completing the experiment, the samples were sent back to the medical college for histopathology, from which the samples were confirmed as EH and CIN-I. EH is declared as normal, but its epithelial thickness is almost double that of normal tissue. A total of 10 EH samples with 25 sites, 11 CIN-I samples with 20 sites and 1 CIN-II sample with 2 sites have been examined in this study. Of these, 16 sites from each of the EH and CIN-I groups have been used as training data, and 9 sites from EH, 4 sites from CIN-I and 2 sites from CIN-II have been used as validation data.

Data Collection
A LaVision ICCD and a PicoQuant picosecond pulsed diode laser (375 nm wavelength, 48 ps pulse width, 40 MHz repetition rate and 0.5 mW average power) driven by a PDL 800-B driver were used to capture fluorescence lifetime images. The pulsed signal from a single-mode fiber was collimated using an achromatic lens (Thorlab f = 75 mm). The fluorescence signal from the tissue was collected using a 450 nm long-pass filter and imaged onto the ICCD through a camera lens (Nikon AF Nikkor 50 mm f/1.8 D). The fluorescence lifetime imaging system comprises a high rate imager (HRI), an ICCD and a high rate delay generator (HDG). The HRI controls the functioning of the image intensifier and the HDG provides the desired delay of 100 ps with respect to the excitation pulse. The pulses were synchronized to capture FLI from the sample at delay steps of 100 ps with respect to the excitation pulse. Control of components and data collection were done through the "Davis" user interface software.
Figure 5 displays the block diagram of the experimental setup of the fluorescence lifetime imaging system. A 375 nm pulsed diode laser was used to excite the cervical tissue sample and the FLI were captured with a gate width of 200 ps at steps of 100 ps time delay from the excitation pulse. For good SNR at each pixel, the CCD acquisition time was set to 3000 ms. Fluorescence decays were recorded by shifting the time gate in steps of 100 ps over 12.7 nanoseconds.
Fluorescence lifetime imaging: Fluorescence intensity decays at every pixel were fitted to a double exponential (Equation (1)) using a non-linear least-squares minimization scheme. The fast and slow components of the lifetimes for all pixels were then displayed separately as FLI. The fitting code uses the "fit" function available in MATLAB.
Average fluorescence lifetime decays: The average fluorescence decay profile was fitted to a double exponential with $\chi^2$ (= 0.9999). The double-exponential function used to fit the data is shown in Equation (1). The fitting here has been performed using the curve fitting tool available in MATLAB.
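For reference, a double-exponential decay is commonly written in the form below. The amplitude symbols $a_1$ and $a_2$ are an assumed notation, and the expression may differ in detail from Equation (1); $\tau_1$ and $\tau_2$ denote the fast and slow lifetime components used throughout the text.

```latex
I(t) = a_1 \exp\left(-\frac{t}{\tau_1}\right) + a_2 \exp\left(-\frac{t}{\tau_2}\right),
\qquad \tau_1 < \tau_2
```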

Principal Component Analysis (PCA)
PCA is applied to the average fluorescence decay for dimensionality reduction without loss of any feature. PCA reduces the dimension of the data by finding the orthogonal linear combinations (the principal components) of the original variables having the largest variance. PCA captures the effect of absorption and scattering through the eigenvectors [29]. To extract the principal components of the time-resolved fluorescence signal, a correlation matrix C is constructed from the standardized intensities $\delta I_j(k) = A_{jk}$, i.e., the mean-subtracted intensity divided by the standard deviation, computed over the samples at each time. Index i varies from 1 to 75, representing the fluorescence decay profile, and k indexes the samples. The eigenvectors and the eigenvalues of the correlation matrix are extracted using singular value decomposition. The eigenvectors are arranged in descending order of their eigenvalues. The first principal component is the eigenvector corresponding to the highest eigenvalue of the matrix; the 2nd, 3rd and 4th PCs are named similarly, in descending order of eigenvalue. PC scores represent the projection of data on the principal components and hence carry useful information, which can be used for classification purposes. The code for this study is written in MATLAB.
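The PCA steps above (standardization over the samples at each time, correlation matrix, singular value decomposition, projection) can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' MATLAB code: the function name `pca_scores`, the 1/N normalization of the correlation matrix, and the synthetic single-exponential test data are all assumptions; only the 75 time points are taken from the text.

```python
import numpy as np

def pca_scores(decays, n_components=3):
    """decays: (n_samples, n_times) array of average fluorescence decay profiles."""
    X = np.asarray(decays, dtype=float)
    # Standardize each time point over the samples: zero mean, unit standard deviation.
    A = (X - X.mean(axis=0)) / X.std(axis=0)
    # Correlation matrix between time points (n_times x n_times); 1/N scaling assumed.
    C = A.T @ A / A.shape[0]
    # C is symmetric positive semi-definite, so its SVD returns the eigenvectors
    # (columns of U) and eigenvalues (s), already sorted in descending order.
    U, s, _ = np.linalg.svd(C)
    explained = s / s.sum()                  # fraction of total variance per PC
    scores = A @ U[:, :n_components]         # projection of each sample on the PCs
    return scores, explained[:n_components]

# Synthetic data for illustration only: 45 noisy decays sampled at 75 time points.
rng = np.random.default_rng(0)
t = np.arange(75) * 0.1                                   # time axis in ns (assumed)
tau = rng.uniform(1.0, 3.0, size=45)                      # hypothetical lifetimes
decays = np.exp(-t[None, :] / tau[:, None]) + 0.01 * rng.normal(size=(45, 75))

scores, explained = pca_scores(decays)
print(scores.shape)   # (45, 3)
print(explained)      # variance fractions of the first three PCs
```

Each row of `scores` holds the 1st, 2nd and 3rd PC scores of one sample, which are the quantities plotted against each other in scatter plots such as those of Figure 3b-d.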

Support Vector Machine (SVM)
SVM is a machine learning technique that uses the structural risk minimization (SRM) scheme of statistical machine learning and forms an optimal separating hyperplane (OSH) which maximizes the width of the margin between different classes. The OSH minimizes the risk of misclassifying not only the data points in the training set but also the yet-to-be-seen data points of the test set for a fixed but unknown probability distribution of the data, thereby following the SRM principle. The training data points lying far from the decision boundary receive zero weight, while the data points close to the decision boundary have non-zero weight. Support vectors are the training data samples along the hyperplanes near the class boundary, and the margin is the distance between the support vectors and the class boundary. A classification task involves training and testing data, each consisting of a number of data instances. Each instance in the training set contains one "target value" (class label) and several "attributes" (features). The two-class decision function defined by an SVM classifier is expressed in terms of the kernel function $K(x_i, x_j)$ of a new data point $x_j$ (to be classified) and a set of training data points $x_i$, where S is the set of support vectors (a subset of the training set), $\lambda_i = \pm 1$ is the label of training data point $x_i$, and $\alpha_i \geq 0$ are the Lagrange multipliers for the OSH. The three most commonly used kernel functions are the linear kernel $K(x_i, x_j) = x_i^T x_j$, the polynomial kernel $K(x_i, x_j) = (x_i^T x_j + 1)^d$ and the Gaussian or RBF kernel $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2)$.
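The kernels listed above, and the way they enter a two-class decision, can be sketched in plain Python. The decision rule used here is the standard textbook form $f(x_j) = \mathrm{sgn}\left(\sum_{i \in S} \alpha_i \lambda_i K(x_i, x_j) + b\right)$; the bias term b, the width parameter `gamma` of the RBF kernel (with `gamma = 1` matching the form given in the text), and the toy support vectors are assumptions for illustration, not values from this study.

```python
import math

def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(xi, xj):
    return dot(xi, xj)

def polynomial_kernel(xi, xj, d=3):
    return (dot(xi, xj) + 1) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    # gamma is an assumed width parameter; gamma = 1 matches the form in the text.
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def decide(x_new, support_vectors, labels, alphas, bias=0.0, kernel=rbf_kernel):
    """Return +1 or -1 from the weighted kernel sum over the support vectors."""
    total = sum(a * y * kernel(sv, x_new)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if total + bias >= 0 else -1

# Toy two-feature example with hypothetical support vectors and multipliers.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, +1]          # lambda_i in the text
alphas = [1.0, 1.0]        # Lagrange multipliers alpha_i

print(decide((1.8, 2.1), support_vectors, labels, alphas))   # 1
print(decide((0.1, -0.2), support_vectors, labels, alphas))  # -1
```

In practice the support vectors, multipliers and bias are obtained by training, and the kernel parameters are tuned, for example by the grid search with cross-validation mentioned in the text.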

Conclusions
Fluorescence lifetime distinguishes EH and CIN-I tissues and the distinction improves with the application of PCA. The difference is attributed to the variation in scattering between EH and CIN-I tissues. Application of PCA enhances the discrimination, as the principal components capture both absorption and scattering effects [29]. Further application of SVM on fluorescence lifetime and PC scores quantifies the distinction with even better accuracy. Finally, the combined results of fluorescence lifetime and PCA significantly improve the sensitivity and hence the diagnostic capability. This preliminary study suggests that fluorescence lifetime and PCA combined with the RBF kernel function in SVM have the potential to demarcate abnormal from EH samples, and the performance of this method will become more robust with a larger data set.
Author Contributions: G.R.S. did all the experiments and analysis and wrote the paper. P.S. helped in performing the experiments. A.P. provided guidance throughout the experimental and analytical work, giving useful advice to remove any bottleneck during the work, and critically examined the manuscript. K.P. provided the biopsied cervical tissue samples, while C.K. took care of all the histopathology work.

Funding:
The equipment was funded by a project from Department of Biotechnology, Ministry of Science and Technology, India, grant number BT/PR4497/MED/30/706/2012 and Indian Institute of Technology Kanpur, India.

Figure 1. (a,d) The fluorescence intensity image at a certain time delay; (b,c,e,f) corresponding fluorescence lifetime images of fast and slow components for EH and CIN-II, respectively.
Figure 2. (a) Average fluorescence decay profiles and (b) Scatter plot of fast versus slow components of fluorescence lifetimes for EH and CIN-I sample.

Figure 3. (a) Eigenvalue and variance plots of first six eigenvalues, (b) Scatter plot of 1st PC versus 2nd PC, (c) 2nd PC versus 3rd PC and (d) 1st PC versus 3rd PC scores for EH and CIN-I cervical tissue.
Figure 4a,c show the training data results for fast versus slow components of lifetime and for 1st versus 2nd PC scores, respectively, which are separated into well-defined EH and CIN-I regions. Corresponding validation data results are shown in Figure 4b,d. In both cases, two CIN-I samples lie outside the CIN-I region, while all the normal samples are clearly distinguished. Here we have used two analysis techniques for detection of cervical cancer. To improve the diagnosis, we combined the results obtained from both techniques. To improve the sensitivity, the combined test is considered positive if either result is positive. The final sensitivity (Sn) and specificity (Sp) of the combined tests (A and B) used in this work are $Sn_{AB} = Sn_A + Sn_B - Sn_A \times Sn_B$ and $Sp_{AB} = Sp_A \times Sp_B$. The calculated combined specificity and sensitivity are 87% and 100%, respectively.
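The combination rule stated above (the combined test is positive if either individual test is positive) can be written as a short function. The function name `combine_either_positive` and the input values are hypothetical and serve only to show the arithmetic; they are not the values of Table 2. The formulas implicitly treat the two tests as independent.

```python
def combine_either_positive(sn_a, sp_a, sn_b, sp_b):
    """Sensitivity and specificity when a case is called positive if test A OR test B is positive."""
    sn_ab = sn_a + sn_b - sn_a * sn_b   # a true case is missed only if both tests miss it
    sp_ab = sp_a * sp_b                 # a healthy case is cleared only if both tests clear it
    return sn_ab, sp_ab

# Hypothetical inputs (fractions, not percentages).
sn_ab, sp_ab = combine_either_positive(sn_a=0.80, sp_a=0.95, sn_b=0.90, sp_b=0.90)
print(f"combined sensitivity: {sn_ab:.3f}")   # 0.980
print(f"combined specificity: {sp_ab:.3f}")   # 0.855
```

The example shows the trade-off built into this rule: the combined sensitivity is never lower than that of either test, while the combined specificity is never higher.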

Figure 4. The RBF-SVM based classification of EH and CIN-I cervical tissues for fast component versus slow component (a) training and (b) validation; 1st PC versus 2nd PC (c) training and (d) validation data sets.

Figure 5. Block diagram of fluorescence lifetime imaging system for capturing time-resolved fluorescence images.

Figure 6. Flow chart of the algorithm used for analysis. PCA: Principal Component Analysis; SVM: Support Vector Machine.

Table 1. SVM model results using linear, polynomial and Gaussian radial basis kernel functions for training data set.

Table 2. Comparison of sensitivity and specificity for the results obtained from lifetime, PCA and their combined results.