Multivariate Statistical Analysis of Surface Enhanced Raman Spectra of Human Serum for Alzheimer’s Disease Diagnosis

Featured Application: In this research we propose a novel method for detecting Alzheimer’s disease. This method involves the use of surface enhanced Raman spectroscopy in combination with multivariate statistical analysis. Based on the results of the proof-of-concept study, we have successfully demonstrated the potential of the method to identify Alzheimer’s disease through analysis of blood serum. With further work, this method could be developed into a novel clinical assay for the effective and accurate diagnosis of Alzheimer’s disease.

Abstract: Alzheimer’s disease (AD) is the most common form of dementia worldwide and is characterized by progressive cognitive decline. Along with being incurable and lethal, AD is difficult to diagnose with high levels of accuracy. Blood serum from Alzheimer’s disease (AD) patients was analyzed by surface-enhanced Raman spectroscopy (SERS) coupled with multivariate statistical analysis. The obtained spectra were compared with spectra from healthy controls (HC) to develop a simple test for AD detection. Serum spectra from AD patients were further compared to spectra from patients with other neurodegenerative dementias (OD). Colloidal silver nanoparticles (AgNPs) were used as the SERS-active substrates. Classification experiments involving serum SERS spectra using artificial neural networks (ANNs) achieved a diagnostic sensitivity around 96% for differentiating AD samples from HC samples in a binary model and 98% for differentiating AD, HC, and OD samples in a tertiary model. The results from this proof-of-concept study demonstrate the great potential of SERS blood serum analysis to be developed further into a novel clinical assay for the effective and accurate diagnosis of AD.


Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disease which results from neuronal cellular pathology associated with the dysregulation of protein and lipid metabolism, oxidative stress, and inflammation [1]. The discovery of relatively robust AD-specific biomarker signatures in blood has recently been reported by several research groups [2][3][4]. It was indicated that the diagnostic strategy could significantly benefit from combining several serum biomarkers because of the multiplicity of pathophysiological processes comprised in AD [5][6][7]. Efforts to find reliable biomarkers for AD in

Materials and Methods
The Institutional Review Boards at the University at Albany and at Albany Medical College (AMC) reviewed and approved the research protocol for the human studies reported herein. All study subjects provided written informed consent to the authorized study personnel before participating.

Human Blood Serum Samples
Serum samples were collected from 48 individuals recruited from the subspecialty neurological clinics at the Alzheimer's Center and the Parkinson's disease and Movement Disorders Center, both at AMC. Three groups of subjects were recruited: AD, OD, and HC. The AD group included patients diagnosed with Alzheimer's disease at either mild (n = 10) or moderate (n = 10) stages of the disease. The OD group consisted of patients diagnosed with other neurodegenerative dementias including Lewy body dementia (n = 5), Parkinson's disease dementia (n = 10), and frontotemporal dementia (FTD, n = 3, where two of these donors have a specific variant of FTD called progressive supranuclear palsy). Medical history of all recruited patients was carefully reviewed by a trained neurologist who established the clinical diagnoses. Clinical assessments to determine the level of dementia were made using the following guidelines. Dementia was defined by a Clinical Dementia Rating Scale (CDR) of 0.5 or more for all dementia subjects [41]. AD was diagnosed by the NINDS-ADRDA criteria [42] and CDR was used to determine the stage of AD (CDR of 0.5-1 indicates mild AD, and CDR of 2.0 indicates moderate AD). Criteria for the diagnosis of Parkinson's disease utilized the Unified Parkinson's Disease Rating Scale [43,44] and diagnosis of Lewy body dementia was made following the guidelines outlined by the DLB Consortium [45]. The criteria for the diagnosis of FTD were applied as reported by Neary et al. [46].
The third group of blood serum donors included ten age- and gender-matched healthy control volunteers. Selected control subjects were typically the patients' spouses, in general good health with no active major disease, and free of any neurological or psychiatric disorders. The healthy controls thus had similar ethnic and socioeconomic backgrounds as well as environmental factors including age, education, race, religion, diet, and everyday lifestyle. A general summary of the demographic information for the study subjects is presented in Table 1.
Table 1. Summary of demographic information on the blood serum study subjects (age reported in years ± STD).

Colloidal Silver Nanoparticles (AgNPs)
SERS measurements of blood serum were acquired using aqueous colloidal suspensions of hydroxylamine-reduced AgNPs prepared as described by Leopold and Lendl [47]. Briefly, colloidal AgNPs were obtained through the addition of 9 mL of 0.1 M sodium hydroxide solution mixed with 10 mL of 6 × 10⁻² M hydroxylamine hydrochloride solution to 180 mL of 1.11 × 10⁻³ M silver nitrate aqueous solution. The mixture was shaken until it appeared homogeneous; the solution was a milky gray color. The concentrations of silver nitrate and of hydroxylamine hydrochloride/sodium hydroxide in the final reaction mixture were calculated to be 10⁻³ M and 3 × 10⁻³/4.5 × 10⁻³ M, respectively. The silver colloidal solution was then centrifuged at 9300 rcf for 10 min, and the supernatant was discarded. After making a 1:100 dilution, the AgNPs were characterized by a UV-Vis absorption maximum at 427 nm, as seen in Figure 1, which is in good agreement with the literature [47].
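The stated final concentrations can be checked with a short dilution calculation. The sketch below assumes that the three volumes are simply additive (199 mL in total); the dictionary keys and the function name are illustrative and not taken from the original protocol.

```python
# Final concentration of each reagent: C_final = C_stock * V_stock / V_total.
# Assumption: volumes are additive, so V_total = 9 + 10 + 180 = 199 mL.
volumes_ml = {"NaOH": 9.0, "NH2OH.HCl": 10.0, "AgNO3": 180.0}
stock_molar = {"NaOH": 0.1, "NH2OH.HCl": 6e-2, "AgNO3": 1.11e-3}

def final_concentrations(volumes_ml, stock_molar):
    """Return the molar concentration of each reagent in the combined mixture."""
    total_ml = sum(volumes_ml.values())
    return {name: stock_molar[name] * volumes_ml[name] / total_ml
            for name in volumes_ml}

final = final_concentrations(volumes_ml, stock_molar)
for name, conc in final.items():
    print(f"{name}: {conc:.2e} M")
```

Under this assumption the values come out close to 1 × 10⁻³ M (silver nitrate), 3 × 10⁻³ M (hydroxylamine hydrochloride), and 4.5 × 10⁻³ M (sodium hydroxide), consistent with the text.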

Sample Preparation
Peripheral blood samples of 5 mL were collected from AMC study subjects and immediately processed into aliquots of anticoagulated (1 mL) whole blood, plasma, and serum which were stored at −80 °C until analysis. EDTA was used as an anticoagulant at a concentration of 1.5 mg/mL. All blood samples were drawn and handled identically. For SERS measurements, the blood serum sample was defrosted on ice. The sample was then mixed with colloidal AgNPs in a 1:1 ratio by volume, bringing the total volume for each measurement to 40 μL. The solution was gently mixed with a pipette tip to ensure a homogeneous mixture. The solution was placed on a microscope glass slide covered with aluminum foil and allowed to dry completely under gentle air flow for 5 min. The aluminum foil was used as a substrate because of the low fluorescence signal it produces.

Spectroscopic Measurements
The SERS spectra were collected using a Renishaw inVia confocal Raman spectrometer equipped with a research-grade Leica microscope and a 20× long-range objective (numerical aperture of 0.35). The spectra were recorded using WiRE 3.2 software in the range of 400–1800 cm⁻¹ under 785 nm excitation. To avoid photodegradation of the samples, the laser power was reduced to 11 mW. Spectra were recorded from ten sequential spots on the sample, with one 10 s accumulation collected at each spot. The data were collected using automatic mapping, in a grid-like manner, through the use of a Renishaw PRIOR automatic stage. No coffee-ring effect was observed; however, a cast film was observed. The area of the sample that was mapped was chosen at random.

Data Treatment
The SERS spectra were imported to MATLAB R2012a (7.14) software for preprocessing and analysis. The adaptive iteratively reweighted penalized least squares (airPLS) algorithm was used for fluorescent background removal [48]. The SERS spectra were normalized by total area and mean centered [49]. GA data analysis was performed using PLS_Toolbox 6.2 (Eigenvector Research, Inc.) available within the MATLAB environment [49], and the ANN analysis was performed using R-project software, ver. 3.4.4.
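The normalization and mean-centering steps can be illustrated with a few lines of NumPy. This is a minimal sketch and not the MATLAB/PLS_Toolbox code used in the study: it assumes that the background has already been removed, that spectra are stored one per row, and that "total area" can be approximated by the sum of absolute intensities on an evenly spaced wavenumber axis. The function names are hypothetical.

```python
import numpy as np

def normalize_by_area(spectra):
    """Scale each spectrum (row) so that its summed absolute intensity equals 1."""
    spectra = np.asarray(spectra, dtype=float)
    area = np.abs(spectra).sum(axis=1, keepdims=True)
    return spectra / area

def mean_center(spectra):
    """Subtract the mean spectrum (column-wise mean) from every spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra - spectra.mean(axis=0, keepdims=True)

# Toy data: 5 synthetic "spectra" with 8 intensity values each.
rng = np.random.default_rng(0)
raw = rng.uniform(1.0, 10.0, size=(5, 8))
normalized = normalize_by_area(raw)
processed = mean_center(normalized)
```

After these two steps every spectrum contributes the same total intensity, and each wavenumber variable has zero mean across the dataset.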

Genetic Algorithm
Genetic Algorithm (GA) [50] was employed using PLS_Toolbox within the MATLAB environment to analyze the spectroscopic data used in the modeling events described below. Using GA, all variables within the Raman spectral dataset collected from serum are considered in order to determine their significance for sample classification. When the spectral data consists of many features, as is seen herein, the data can become too noisy; this problem can be addressed by reducing the dimensionality of the dataset, that is, by selecting only a subset of informative spectral features and removing the uninformative ones. The process involves selecting variables with the lowest prediction error (RMSE-CE) through simulated natural selection, which involves genetic mutation and recombination of chromosomes, i.e., subsets of variables.
Appl. Sci. 2019, 9, 3256
The Raman bands which are determined by GA as the most useful for differentiating between classes of blood serum data can be further studied to determine their possible clinical relevance, including indicating biochemical changes which occur within serum during progression of AD.
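The selection loop described in this section can be sketched as follows. This is a generic illustration and not the PLS_Toolbox implementation: the fitness function is a cross-validated RMSE of an ordinary least-squares fit (a stand-in for the PLS-based error used by the toolbox), and the population size, mutation rate, and toy data are assumptions chosen only to keep the example fast.

```python
import numpy as np

def cv_rmse(X, y, mask, n_folds=5):
    """Cross-validated RMSE of a least-squares fit using only the masked variables."""
    if not mask.any():
        return np.inf
    Xs = np.column_stack([np.ones(len(y)), X[:, mask]])
    folds = np.arange(len(y)) % n_folds
    sq_err = 0.0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef, *_ = np.linalg.lstsq(Xs[train], y[train], rcond=None)
        sq_err += np.sum((Xs[test] @ coef - y[test]) ** 2)
    return np.sqrt(sq_err / len(y))

def ga_select(X, y, pop_size=20, n_generations=30, mutation_rate=0.02, seed=0):
    """Return the boolean variable mask with the lowest cross-validated RMSE found."""
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]
    population = rng.random((pop_size, n_vars)) < 0.3   # random initial chromosomes
    for _ in range(n_generations):
        errors = np.array([cv_rmse(X, y, chrom) for chrom in population])
        parents = population[np.argsort(errors)[: pop_size // 2]]  # keep fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), size=2, replace=False)]
            cut = rng.integers(1, n_vars)                 # single-point recombination
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_vars) < mutation_rate   # random bit-flip mutation
            children.append(child)
        population = np.vstack([parents, np.array(children)])
    errors = np.array([cv_rmse(X, y, chrom) for chrom in population])
    best = int(np.argmin(errors))
    return population[best], errors[best]

# Toy data: 80 samples, 30 variables, only variables 3 and 17 carry class information.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=80).astype(float)
X = rng.normal(size=(80, 30))
X[:, [3, 17]] += 2.0 * y[:, None]
mask, err = ga_select(X, y)
print("selected variables:", np.flatnonzero(mask), "CV-RMSE:", round(float(err), 3))
```

Because the fitter half of each population is carried over unchanged, the best error never increases from one generation to the next. In practice, several independent runs are usually combined to see which variables are selected repeatedly.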

Artificial Neural Networks
Machine learning methods are known as data-driven approaches; several different techniques were applied to the dataset to determine which algorithm could best solve the problem at hand. As a result, artificial neural networks (ANNs) were selected as the optimal choice for multivariate data analysis [51]. ANNs are simplified mathematical models which resemble the human brain and can be described as several layers of interconnected "neurons." Each neuron can perform mathematical operations on the input values and transfer the result to the next layer of neurons. The parameters of the mathematical operations are tuned by a learning algorithm. The power of ANNs is based on the collective behavior of many interconnected neurons and their ability to exchange and retain meaningful information. Advantageously, ANNs do not rely on data distribution and thus can derive patterns from large, complex, and noisy datasets which are typically difficult to analyze by conventional linear data analysis methods. ANNs are fault-tolerant and capable of generalization, which enables them to restore missing data and represent "real-world" solutions for the generalization of a problem.
However, depending on the complexity of the model, ANNs can be time consuming to train, and the number of hidden layers has to be determined so that all features of the dataset are represented. Furthermore, it is not always clear how ANNs approach the solution, and as such, they are sometimes called "black boxes" [52][53][54]. ANNs tend to converge on local minima, yielding reduced generalization abilities [55,56]. Nevertheless, ANNs consistently outperform other classification methods, indicating the ability of non-linear data generalization to achieve high predictive capabilities [27,54,57-59]. The process of searching for a global minimum can be complicated; to alleviate this problem herein, each training event during the testing phase was repeated twenty times, each time with new random starting values for the network weights, and the network with the lowest error was then selected. Overtraining was assessed by the performance of the model on the validation data. One established statistical technique to evaluate whether a model is over-trained is to use a separate dataset for validation of the trained model. The cross-validation technique is an extension of this principle. The measurements of model accuracy on the validation dataset are accepted as an estimate of model accuracy on novel, unseen data. A well-generalized network should give good prediction results on the validation data, and these should be comparable to the prediction results on the test set [60].
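The restart strategy can be illustrated on a toy problem. The sketch below does not train a neural network; it uses a one-dimensional non-convex function as a stand-in for the network error surface, plain gradient descent as a stand-in for the training algorithm, and hypothetical function names. Only the idea of repeating training twenty times from random starting values and keeping the run with the lowest error is taken from the text.

```python
import numpy as np

def loss(w):
    """Toy non-convex error surface with several local minima."""
    return np.sin(3.0 * w) + 0.1 * w ** 2

def grad(w):
    """Analytical derivative of the toy error surface."""
    return 3.0 * np.cos(3.0 * w) + 0.2 * w

def train_once(w0, lr=0.01, n_steps=2000):
    """Gradient descent from one starting value; returns (final weight, final error)."""
    w = w0
    for _ in range(n_steps):
        w = w - lr * grad(w)
    return w, loss(w)

def best_of_restarts(n_restarts=20, seed=0):
    """Repeat training from random starting values and keep the lowest-error run."""
    rng = np.random.default_rng(seed)
    runs = [train_once(w0) for w0 in rng.uniform(-5.0, 5.0, size=n_restarts)]
    return min(runs, key=lambda run: run[1]), runs

(best_w, best_err), runs = best_of_restarts()
print(f"best error over {len(runs)} restarts: {best_err:.3f} at w = {best_w:.3f}")
```

Individual runs settle in whichever local minimum is closest to their starting value, so the spread of final errors across restarts gives a rough picture of how rugged the error surface is.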
To design, test, and validate ANN models, the "neuralnet" package in the R environment was employed [61]. The specific form of ANN used for AD diagnostics was a multilayer perceptron (MLP). The MLP architecture was designed and optimized by varying the number of hidden layers and the number of neurons in each layer; each architecture was tested using transfer and resilient back-propagation training methods. For each classification event, a random split was used on the original dataset to divide it into training (90%) and test (10%) sets during the testing phase; then, during cross-validation, 10% of the data was separated from the training set and used as the validation set.
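The layered structure of an MLP can be made concrete with a short forward-pass sketch. It is written in NumPy rather than with the "neuralnet" package, covers only prediction and not training, and assumes a logistic activation in every layer. The layer sizes (240 inputs, hidden layers of 60 and 20 neurons, 3 outputs) follow the values reported in the Results for one of the tertiary models; the weights here are random placeholders.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) activation, assumed here for every layer."""
    return 1.0 / (1.0 + np.exp(-z))

def init_weights(layer_sizes, rng):
    """One (weights, bias) pair per layer transition, with random starting values."""
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, weights):
    """Propagate inputs layer by layer: weighted sum, then activation."""
    activation = x
    for W, b in weights:
        activation = logistic(activation @ W + b)
    return activation

rng = np.random.default_rng(0)
layer_sizes = [240, 60, 20, 3]          # inputs, two hidden layers, three classes
weights = init_weights(layer_sizes, rng)
spectra = rng.normal(size=(4, 240))     # four synthetic input spectra
scores = forward(spectra, weights)      # one score per class and spectrum
predicted_class = scores.argmax(axis=1)
```

Training would adjust the entries of `weights` to reduce the difference between `scores` and the known class labels; that step is omitted here.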
The training dataset was used to determine the relationship between dependent and independent variables. The test set, in turn, assesses the performance of the model. The assignment of data to either the training set or the test set is done by random sampling, where multiple spectra from donors were used as individual inputs. The random split was applied to the original dataset of 480 spectra to divide it into training (90%) and test (10%) datasets; this corresponds to 48 randomly selected individual spectra used for testing and the remaining 432 spectra used for training. This approach has some limitations, including overlooking the high probability that spectra from the same donor will appear in both the training and the test sets, therefore potentially creating a class imbalance problem in some of the training and test datasets. For this reason, after the optimal model network structure was selected and evaluated using the test dataset, an estimate of how the ANN is expected to perform when making general predictions was obtained using the Bootstrap Latin Partition (BLP) cross-validation scheme. Considering the small dataset, the BLP cross-validation scheme was chosen to evaluate the statistical fit of the selected ANN model. It has been shown that the BLP method reduces performance dependence by isolation and separation of the training and validation datasets and reduces prediction accuracy variance. Further, the BLP scheme preserves class distribution in both the training and validation sets and is performed in a donor-wise manner. This means that all spectra collected for an individual donor were excluded from the training dataset and then used to test the built model. Sensitivity (true-positive rate), specificity (true-negative rate), accuracy, and area under the receiver operating characteristic curve (AUC) of classification for the validation data subsets were used to assess the performance of the calculated models. Results of BLP are reported as averages across all folds and bootstraps.
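The difference between a spectrum-level random split and a donor-wise split can be demonstrated with a short sketch. The donor count (48) and the number of spectra per donor (10) follow the text; the function names, the seed, and the use of integer donor identifiers are assumptions made for illustration.

```python
import numpy as np

n_donors, spectra_per_donor = 48, 10
donor_ids = np.repeat(np.arange(n_donors), spectra_per_donor)  # 480 spectra

def random_spectrum_split(donor_ids, test_fraction=0.1, seed=0):
    """Spectrum-level random split: one donor's spectra can land on both sides."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(donor_ids))
    n_test = int(round(test_fraction * len(donor_ids)))
    return order[n_test:], order[:n_test]

def donor_wise_split(donor_ids, test_fraction=0.1, seed=0):
    """Donor-level split: all spectra of a held-out donor go to the test side."""
    rng = np.random.default_rng(seed)
    donors = rng.permutation(np.unique(donor_ids))
    n_test = max(1, int(round(test_fraction * len(donors))))
    test_mask = np.isin(donor_ids, donors[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def shared_donors(donor_ids, train_idx, test_idx):
    """Donors that contribute spectra to both the training and the test side."""
    return np.intersect1d(donor_ids[train_idx], donor_ids[test_idx])

rand_train, rand_test = random_spectrum_split(donor_ids)
rand_shared = shared_donors(donor_ids, rand_train, rand_test)
donor_train, donor_test = donor_wise_split(donor_ids)
donor_shared = shared_donors(donor_ids, donor_train, donor_test)
print(len(rand_train), len(rand_test), len(rand_shared))
print(len(donor_train), len(donor_test), len(donor_shared))
```

The random split yields the 432/48 partition described above, and at least one donor necessarily appears on both sides, since 48 test spectra cannot consist only of complete sets of 10. The donor-wise split holds out 5 whole donors (50 spectra) and shares none.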

Results and Discussion
Both the Raman and SERS spectra of serum samples were measured. Figure 2 shows the comparison of the SERS spectrum of serum mixed with AgNPs to the regular Raman spectrum of blood serum. The observed dramatic increase in intensity of many vibrational bands within the SERS spectrum indicates that significant enhancement effects from the silver colloids were achieved.
Serum samples were collected from three groups of subjects: AD (mild AD, n = 10 and moderate AD, n = 10), other neurodegenerative dementias (OD, n = 18), and healthy controls (HC, n = 10). Supplementary Figure S1a shows the mean SERS serum spectrum for each group of subjects. Supplementary Figure S1b-f presents the difference spectrum and ±2 standard deviations (STD) for each of the compared groups. Although there are visible differences in the serum spectral profiles of different groups, the differentiation of the groups solely based on visual inspection can be unreliable, as confirmed by the difference spectra lying within 2 STDs for each compared group.
Because the differences between average spectra fall within 2 STDs of the mean, it is necessary to use multivariate statistical methods to fully understand the differentiation capabilities and significance of the observed spectral differences to be used for diagnostic purposes. Multivariate statistical methods allow for the capture and analysis of spectral variability by transforming hidden characteristic spectral features, which may be lost upon averaging of spectral datasets, into a discriminative algorithm. Further analysis demonstrated that high discrimination accuracy could be achieved by modeling nonlinear relationships between variables using ANNs.
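The comparison of a difference spectrum against a ±2 STD band can be sketched as below. How the band was computed for the supplementary figures is not stated in the text, so this example assumes the standard deviation of the combined within-group residuals at each wavenumber; the function name and the tiny two-variable dataset are purely illustrative.

```python
import numpy as np

def difference_within_2std(group_a, group_b):
    """Return the mean difference spectrum and a mask of points inside ±2 STD."""
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    # Assumed band: spread of the residuals of both groups around their own means.
    residuals = np.vstack([group_a - group_a.mean(axis=0),
                           group_b - group_b.mean(axis=0)])
    band = 2.0 * residuals.std(axis=0, ddof=1)
    return diff, np.abs(diff) <= band

# Two spectra per group, two wavenumber variables.
group_a = np.array([[1.0, 5.0], [3.0, 5.2]])
group_b = np.array([[1.5, 9.0], [2.9, 9.2]])
diff, within = difference_within_2std(group_a, group_b)
print(diff, within)
```

In this toy case the first variable differs by less than the within-group spread, so a univariate comparison cannot separate the groups there, while the second variable differs by far more than its spread.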
It is important to note that a large number of features can lead to ANNs becoming overfit. Additionally, Raman spectra often contain spectral regions which may be non-informative. Therefore, GA was applied to optimize the generalization performance of the ANN models and to reduce the risk of overfitting. GA was used to select the subset of spectral variables which provide the most useful information for discriminating between classes of donors in the training dataset, which consisted of 48 donors and 480 spectra. In general, more data leads to better accuracy, because a larger dataset typically represents the problem domain more closely and is less affected by noise. For GA, a single run was set to have 71 individuals per population and 100 generations. A total of 300 runs were performed. After GA selected which spectral regions are the most informative, the Raman bands were identified and assigned to the biomolecules contributing to that vibrational mode (proteins, sugars, lipids, etc.). GA identified informative regions and bands of SERS serum spectra through comparison of the AD group to the HC group (Figure 3a) and the AD group to the OD group (Figure 3b). Spectral regions identified by GA as most strongly influencing the differentiation capabilities are marked in red.
Although GA revealed a greater number of significant regions between AD and OD SERS serum spectra than it did between AD and HC SERS serum spectra, the numbers of identified bands were very close. GA selected 9 regions (14 bands) and 5 regions (11 bands) for the AD to OD and AD to HC comparisons, respectively. As can be seen in Figure 3, the regions selected by GA in both comparisons are complementary, and as such they selectively target complementary spectral biomarkers expressed for the differentiation of these three classes, which can be very advantageous for statistical modeling. The region borders, vibrational band positions, and tentative band assignments for presumptive molecular contributions are given in Tables 2 and 3 for each comparison. Interestingly, the regions selected by GA for SERS comparison almost completely overlap with regions which were selected in comparisons of conventional Raman spectroscopic datasets identified for the same purpose [27]. This shows the consistency of the applied method for the classification of blood serum spectral datasets obtained using different Raman spectroscopic techniques. Further, the tentatively identified set of molecules listed in Tables 2 and 3 can be used for future evaluation of the spectroscopic signature for AD. For this study, the selected regions that are the most significant for differentiation were used for the rest of the analysis, including the building of the ANN models.
Table 2. Tentative assignments of the most important regions in the SERS spectrum of blood serum for the discrimination between AD and HC, as determined by GA.
adenine, cytosine, thymine, tryptophan, tyrosine, fatty acids, galactosamine, pyruvate, coenzyme A, acetoacetate, ascorbic acid, amide I, α helix, phospholipid
υ: stretching mode; δ: bending mode.
Table 3. Tentative assignments of the most important regions in the SERS spectrum of blood serum for the discrimination between AD and OD, as determined by GA.
After GA, ANNs were used to build three different models. The first model was a binary model to distinguish between all AD and HC samples. The second and third models were tertiary models built to distinguish between mild AD, moderate AD, and HC samples and between HC, all AD, and OD samples. The spectral features selected by GA for the best discrimination between AD and HC samples were used as the input dataset for all ANN models. Since ANN is a supervised method, it is important to characterize the modeled system sufficiently with input and output data. The common approach is to partition the dataset into three parts: the training, test, and validation sets. For the algorithm, the inputs were provided as data points of the Raman spectra (240 values of wavenumbers, selected by GA) and the assigned classes (AD, OD, and HC) were the identity labels. The output layer contained two and three neurons for the binary and the tertiary models, respectively. The network weights are trained for multiple different configurations of ANNs based on the resilient backpropagation algorithm with weight backtracking of the adjustable parameters in order to minimize differences between the network output and the known labels of the samples. The threshold for the partial derivatives of the error function, used as the stopping criterion, was set to 0.01. All hyper-parameters, i.e., parameters whose values are set before the learning process begins, influence how the weights (parameters) between the neuron connections will be trained by the algorithm. The hyper-parameters are identified by a trial-and-error process, and performance of the algorithm on the test dataset is used to determine the optimal ANN architecture. The network that provided the lowest prediction errors on the test set as compared to the others was selected as the final optimal network.
The confusion table seen in Table 4 summarizes the results that were achieved when using the optimal ANN network structures during the testing phase. The confusion table shows the class to which every spectrum from the test set was predicted to belong and compares the predictions to the actual class. After the training process was finished, all network weights were fixed to their single optimized value. The neural network architectures that yielded the best results during the testing phase consisted of one hidden layer with 10 neurons for the binary model, two hidden layers with 60 and 10 neurons for the AD tertiary model, and two hidden layers with 60 and 20 neurons for the OD tertiary model.
The performance of the optimal ANN model is assessed using the BLP cross-validation procedure in order to assess the trained ANN's ability to differentiate between classes of Raman spectra. Ten Latin partitions bootstrapped five times were used to measure the generalized prediction accuracy of the models [64]. BLP preserves the class distribution in both the training and validation sets and ensures that all samples are used for prediction only once. Here, 10 training-prediction subsets of complete samples (donors) were generated, ensuring that each sample is used only once for prediction. For each bootstrap, the data was split into training and prediction sets so that nine Latin partitions were combined into a training set for model building, and the tenth partition was used for prediction. All results from the ten prediction sets and across the five bootstraps were pooled and averaged, and the standard deviation was calculated. This approach was used to measure the generalized prediction ability of the three optimal ANN models built.
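The partitioning logic of the BLP scheme can be sketched as follows. This is an illustrative reconstruction and not the code used in the study: it operates on donor-level class labels (20 AD, 18 OD, 10 HC, as in the text), deals the donors of each class round-robin into ten partitions, and repeats the procedure five times with fresh random orderings. The function names and the seed are assumptions.

```python
import numpy as np

def latin_partitions(donor_labels, n_partitions, rng):
    """Assign each donor to one partition, spreading every class evenly."""
    donor_labels = np.asarray(donor_labels)
    assignment = np.empty(len(donor_labels), dtype=int)
    offset = 0
    for cls in np.unique(donor_labels):
        members = rng.permutation(np.flatnonzero(donor_labels == cls))
        # Deal donors of this class round-robin so class proportions are preserved.
        assignment[members] = (np.arange(len(members)) + offset) % n_partitions
        offset += len(members)
    return assignment

def bootstrap_latin_partitions(donor_labels, n_partitions=10, n_bootstraps=5, seed=0):
    """Yield (bootstrap, fold, training donors, validation donors) for every split."""
    rng = np.random.default_rng(seed)
    for b in range(n_bootstraps):
        assignment = latin_partitions(donor_labels, n_partitions, rng)
        for fold in range(n_partitions):
            yield (b, fold,
                   np.flatnonzero(assignment != fold),
                   np.flatnonzero(assignment == fold))

donor_labels = ["AD"] * 20 + ["OD"] * 18 + ["HC"] * 10
splits = list(bootstrap_latin_partitions(donor_labels))
print(len(splits), "training/validation splits")
```

Each of the 50 splits holds out whole donors, so all spectra of a validation donor are excluded from training, and within one bootstrap every donor is used for prediction exactly once. Performance figures would then be averaged over all folds and bootstraps.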

The final network results for each model are shown in Table 5. The performance parameters for all classes are listed as they were determined using BLP. The sensitivity (true-positive rate, i.e., percent of spectra belonging to a class that were correctly predicted as belonging to that class), specificity (true-negative rate, i.e., percent of spectra not belonging to a class that were correctly predicted as not belonging to that class), and accuracy (overall closeness of the predicted results to the known results) for each class prediction are reported. Table 5 presents the results of the ANN networks for HC vs AD differentiation in a binary model, for HC vs mild AD vs moderate AD in an AD tertiary model, and for HC vs AD vs OD in an OD tertiary model. The AD tertiary model demonstrates slightly worse performance than the AD binary and OD tertiary classification models, with an accuracy of 94.84% as compared to average accuracies of 96.47% and 98.31%, respectively. Such results can be expected, as only relatively minor changes in the biochemical composition of blood serum are likely to be associated with AD progression from the mild stage to the moderate stage of the disease. With smaller sample sizes, one can expect larger variance between bootstrap results; however, the observed variance of the performance parameters in this study, as determined by the calculated standard deviation, was very small, thus indicating stability of the models. The results shown in Table 5 correspond to the performance parameters for samples at the individual spectrum level. Each spectrum receives a probability estimate for belonging to each class; the spectrum is then assigned to the class corresponding to the highest probability estimate.
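The class assignment and the per-class performance parameters can be computed as in the sketch below. The probability values and labels are invented for illustration, and accuracy is taken here as the one-vs-rest fraction of correct decisions for each class, which is an assumption about how the per-class accuracy was defined.

```python
import numpy as np

def assign_classes(probabilities, classes):
    """Give each spectrum the class with the highest probability estimate."""
    return [classes[i] for i in np.argmax(probabilities, axis=1)]

def per_class_metrics(true_labels, predicted_labels, classes):
    """One-vs-rest sensitivity, specificity, and accuracy for every class."""
    true = np.asarray(true_labels)
    pred = np.asarray(predicted_labels)
    metrics = {}
    for cls in classes:
        tp = np.sum((true == cls) & (pred == cls))
        fn = np.sum((true == cls) & (pred != cls))
        fp = np.sum((true != cls) & (pred == cls))
        tn = np.sum((true != cls) & (pred != cls))
        metrics[cls] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / len(true),
        }
    return metrics

classes = ["HC", "AD", "OD"]
probabilities = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1],
                 [0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]]
true_labels = ["HC", "HC", "AD", "AD", "OD", "OD"]
predicted = assign_classes(probabilities, classes)
metrics = per_class_metrics(true_labels, predicted, classes)
```

In this toy example one AD spectrum is assigned to HC, which lowers the sensitivity for AD to 0.5 and the specificity for HC to 0.75 while leaving the OD class unaffected.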
To determine the overall classification of each donor, the donor is assigned to the class to which the majority of its spectra were assigned. For the binary model and tertiary OD model, all donors were correctly classified. In the tertiary AD model, however, four donors were misclassified, each time occurring in a different bootstrap.
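The donor-level majority vote can be sketched in a few lines of Python. The function name `donor_majority_vote` and the input format are assumptions; the text does not state how ties are handled, so this sketch simply keeps the class that was encountered first among tied classes.

```python
from collections import Counter, defaultdict

def donor_majority_vote(spectrum_predictions):
    """Assign each donor to the class predicted for most of its spectra.

    spectrum_predictions: iterable of (donor_id, predicted_class) pairs,
    one pair per spectrum (hypothetical input format).
    """
    votes = defaultdict(Counter)
    for donor, predicted_class in spectrum_predictions:
        votes[donor][predicted_class] += 1
    # most_common(1) returns the class with the highest count; ties resolve
    # to the class seen first (an assumption of this sketch).
    return {donor: counts.most_common(1)[0][0] for donor, counts in votes.items()}

# Toy example: donor d1 has one spectrum assigned to a different class.
predictions = [("d1", "AD"), ("d1", "AD"), ("d1", "HC"), ("d2", "HC"), ("d2", "HC")]
print(donor_majority_vote(predictions))  # {'d1': 'AD', 'd2': 'HC'}
```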
The BLP cross-validation results shown in Table 5 exhibit good prediction performance and indicate that the models are not substantially overfit. Rigorous validation using the BLP scheme helps to guard against overfitting. Accuracy is computed as the percentage of correctly classified inputs, and the accuracy on the validation set is a good indicator of how well the classification will generalize in the future. To monitor overfitting, the test and validation accuracies were compared, since an overfit model would be expected to perform much worse on the validation data. Ideally, the test accuracy should be only slightly higher or lower than the validation accuracy. This was confirmed by the accuracy values achieved on the test set for the binary AD, tertiary AD and tertiary OD models, which were 98% for each model. These values are only slightly higher than the average accuracies of the BLP cross-validation results for the binary AD, tertiary AD and tertiary OD models, which were 96%, 95% and 98%, respectively (Table 5). Additionally, a number of reported studies have used a very similar approach, combining Raman spectroscopy with machine learning methods (specifically ANNs), to analyze datasets of the same size as or smaller than the one used in the current study. For example, the same approach was used to differentiate eleven patients with type 2 diabetes from nine healthy controls using spectra collected at different anatomical locations, achieving 88.9-90.9% accuracy with a donor-wise 10-fold cross-validation [65]. It should be noted that the study reported herein is designed as a proof-of-concept, and more testing is necessary to obtain a strong understanding of the classification accuracy and generalizability of this framework.
To further assess the classification ability of the SERS spectra, receiver operating characteristic (ROC) curves were generated with the "pROC" package [66] from the probability estimates for classifying spectra to a particular class. A ROC curve plots the sensitivity (true-positive rate) versus the specificity (true-negative rate). The points on the curve represent different thresholds for determining the class assignment of a spectrum and allow the user to understand the tradeoff between sensitivity and specificity for all possible thresholds, rather than just the one that was used in the ANN models. The results of the ROC analysis can be seen in Figure 4. The area under the curve (AUC) can be interpreted as the probability that a randomly chosen spectrum from the class of interest receives a higher probability estimate than a randomly chosen spectrum from outside that class. These results further support the strong classification ability of the SERS spectra to correctly determine to which class a donor belongs for diagnostic purposes.
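The threshold sweep behind a ROC curve, and the ranking interpretation of the AUC, can be illustrated with a short pure-Python sketch. It is not the pROC interface used for the analysis in this study; the function names `roc_points` and `auc` and the toy scores are assumptions for illustration only.

```python
def roc_points(scores, labels):
    """Return (specificity, sensitivity) pairs, one per decision threshold.

    scores: probability estimates for the class of interest.
    labels: 1 if the spectrum truly belongs to that class, else 0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    # Sweep every observed score as a threshold, plus one above all scores.
    for threshold in sorted(set(scores)) + [float("inf")]:
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        points.append((tn / n_neg, tp / n_pos))
    return points

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy probability estimates (illustrative only, not study data).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(roc_points(scores, labels))
print(auc(scores, labels))  # 8 of 9 positive-negative pairs are ranked correctly
```

The lowest threshold assigns every spectrum to the class (sensitivity 1, specificity 0), and the highest assigns none (sensitivity 0, specificity 1); the curve traces the tradeoff between these extremes.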

Figure 4. The calculated ROC curves using BLP prediction values of the ANN models for differentiating AD vs HC (a) in the binary model, and predictions from both tertiary models, in which each class is differentiated against all others. The AD tertiary model differentiates mild AD (b1) vs moderate AD (b2) vs HC (b3). The OD tertiary model differentiates AD (c1) vs HC (c2) vs OD (c3). AUC refers to the area under the ROC curve, calculated from the model predictions against the known outcome, and indicates the efficacy of the ANN models.

Conclusions
A combination of multivariate analysis and surface-enhanced Raman spectroscopy using silver colloidal nanoparticles was applied for the analysis of blood serum from AD patients, healthy controls, and individuals with other forms of dementia. Through the use of ANNs for multivariate analysis, it was possible to differentiate AD serum samples from normal and other dementia serum samples with high diagnostic sensitivity and specificity. The ANNs achieved a diagnostic sensitivity around 96% for differentiating between AD and HC samples, 95% for differentiating mild AD, moderate AD, and HC samples, and 98% for differentiating HC, AD, and OD samples. Tentative assignments of the Raman bands selected by GA as the most useful for correct classification of the measured SERS spectra pointed to spectroscopic markers that may reflect AD-specific changes in particular nucleic acids, saccharides, and protein content in the blood serum of AD patients. These vibrational mode assignments will require further evaluation and experimentation, to be conducted in future studies, in order to determine more specifically and reliably the biomolecules responsible for the changes in the biochemical composition of the blood serum samples. The results reported herein demonstrate the great potential for development of a successful method that combines silver nanoparticle-based SERS serum analysis with ANN-based diagnostic algorithms and can be used within clinical laboratory settings for the non-invasive detection and screening of Alzheimer's disease.

Conflicts of Interest:
The authors declare no conflict of interest.