
This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (

Detection and identification of blood, semen and saliva stains, the most common body fluids encountered at a crime scene, are very important aspects of forensic science today. This study targets the development of a nondestructive, confirmatory method for body fluid identification based on Raman spectroscopy coupled with advanced statistical analysis. Dry traces of blood, semen and saliva obtained from multiple donors were probed using a confocal Raman microscope with a 785-nm excitation wavelength under controlled laboratory conditions. The results demonstrated the capability of Raman spectroscopy to identify an unknown substance as semen, blood or saliva with high confidence.

The body fluid traces recovered at crime scenes are among the most important types of evidence to forensic investigators [

Although several methods have been developed over the years for body fluid identification, nondestructive tests that can be performed at a crime scene are still in the developmental stage. Fluorescence and Raman spectroscopies are among the most promising nondestructive methods for confirmatory identification of body fluids [

For the last four years, our laboratory has been working on the development of a novel approach for body fluid identification based on near Infrared (IR) Raman spectroscopy and advanced statistical analysis. Our approach is based on the hypothesis that the biochemical composition of each body fluid is unique and Raman spectroscopy can easily recognize the difference [

A multi-dimensional Raman spectroscopic signature was built for each body fluid to uncover the sources of spectral variation and to assign spectroscopic features to the chemical species. However, utilizing these signatures for identification purposes was not optimal. As a more efficient alternative method, we utilized Discriminant Analysis (DA) based on Soft Independent Modeling of Class Analogy (SIMCA), Linear Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis (PLS-DA) techniques for body fluid identification purposes.

Sets of 50 semen, 14 blood and 15 saliva samples were obtained from anonymous donors and volunteers (see references 7–9 for details). A 10-μL drop of each body fluid sample was placed on a circular glass slide designed for use with an automatic mapping stage and allowed to dry completely. Prepared samples were analyzed on a Renishaw inVia confocal Raman spectrometer equipped with a research-grade Leica microscope, a 20× long-range objective, and WiRE 2.0 software. Automatic mapping (lower plate of a Nanonics AFM MultiView 1000 system) scanned a sample area of 75 × 75 μm and measured Raman spectra from 16–36 spots within the area, with six 10-s accumulations for each spot. Measurements were taken using Quartz II and QuartzSpec software. The obtained spectra were treated with GRAMS/AI 7.01 software to remove cosmic ray interference and imported into MATLAB 7.4.0 for statistical analysis. The number and possible identities of principal spectral components were determined for semen, blood and saliva using significant factor analysis (SFA) and the alternating least squares (ALS) function. A multi-dimensional spectroscopic signature of a specific body fluid was built from these principal components.

Alternating least squares (ALS) analysis of the 36 Raman spectra of a single dry semen sample was utilized to build a multi-dimensional spectroscopic signature. It was found that three major principal components satisfactorily represented semen Raman spectra (

Each individual spectral PC was built by contributions from several biochemical components of semen. As is common for ALS, cross-mixing of peaks took place. Peaks which were intense in one PC also appeared in other components, but with lower intensity.
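
The ALS decomposition described above can be illustrated with a minimal sketch. It assumes a simple alternating scheme in which each least-squares update is followed by clipping to enforce non-negativity. The function name `als_nonneg`, the random initialization and the fixed iteration count are illustrative assumptions, not the MATLAB routine used in this study. Because such factorizations are not unique, peaks belonging to one component can partly appear in another, which is the cross-mixing noted above.

```python
import numpy as np

def als_nonneg(D, n_components, n_iter=200, seed=0):
    """Factor D (spectra x wavenumbers) as C @ S with C >= 0 and S >= 0.

    C holds the contribution of each component to each spectrum,
    S holds the component spectra. Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(seed)
    # Random non-negative starting guess for the component spectra
    S = rng.random((n_components, D.shape[1]))
    for _ in range(n_iter):
        # Least-squares update of contributions, then clip negatives to zero
        C = np.clip(D @ np.linalg.pinv(S), 0.0, None)
        # Least-squares update of component spectra, then clip negatives
        S = np.clip(np.linalg.pinv(C) @ D, 0.0, None)
    return C, S

# Synthetic example: 36 spectra built from three non-negative bands
rng = np.random.default_rng(0)
x = np.arange(200)
S_true = np.array([np.exp(-0.5 * ((x - c) / 6.0) ** 2) for c in (40, 100, 160)])
C_true = rng.random((36, 3)) + 0.1
D = C_true @ S_true

C, S = als_nonneg(D, n_components=3)
relative_error = np.linalg.norm(D - C @ S) / np.linalg.norm(D)
```

In practice the number of components would come from significant factor analysis rather than being fixed in advance, and further constraints (for example unimodality) may be applied.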

Specifically, the first PC dominated by the contribution from tyrosine [^{−1} showed a noticeable contribution from other biochemical species (^{−1} and 1240 cm^{−1} known as Amide I [^{−1} closely match the Raman bands of serum albumin [^{−1} is consistent with the C–N symmetric stretching vibration reported for choline [^{−1} Raman bands of PC 3 are consistent with previously reported spectra of spermine phosphate hexahydrate (SPH) [

The three described principal components, combined with a horizontal line and a tilted line representing the fluorescence contribution, were used to create a multi-dimensional spectroscopic signature of semen. A linear combination of these components fitted the experimental Raman spectra of all 50 semen samples with high quality [

A multi-dimensional spectroscopic signature of dry human blood was developed using the same approach as that for semen. The blood spectroscopic signature which contained three major principal components (^{−1}(^{−1} are similar to major peaks of pure fibrin, one of the coagulated blood components [

Hemoglobin- and fibrin-dominated principal components (developed from a single blood sample), together with a horizontal line and a tilted line representing the fluorescence background, were used for fitting the experimental Raman spectra of dry blood samples obtained from 14 donors. A quantitative statistical analysis using the sum of squares due to error (SSE), R-square, and root mean squared error (RMSE) was performed to confirm a satisfactory fit of all experimental spectra [
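
The fitting step can be sketched as an ordinary least-squares problem in which the design matrix contains the component spectra plus a constant column (horizontal line) and a linear ramp (tilted line). This is a minimal sketch under stated assumptions: the function name `fit_signature` is hypothetical, the fit is unconstrained, and RMSE is computed as the square root of SSE divided by the number of points (other conventions divide by the residual degrees of freedom).

```python
import numpy as np

def fit_signature(spectrum, components):
    """Fit one spectrum with component spectra plus a linear background.

    spectrum:   1-D array of intensities (n points)
    components: 2-D array, one component spectrum per row (k x n)
    Returns the coefficients (k components, offset, slope) and fit statistics.
    """
    n = spectrum.size
    ramp = np.linspace(0.0, 1.0, n)  # tilted line for sloping fluorescence
    design = np.column_stack([components.T, np.ones(n), ramp])
    coeffs, *_ = np.linalg.lstsq(design, spectrum, rcond=None)
    residuals = spectrum - design @ coeffs
    sse = float(residuals @ residuals)
    sst = float(((spectrum - spectrum.mean()) ** 2).sum())
    stats = {
        "SSE": sse,
        "R2": 1.0 - sse / sst,
        "RMSE": float(np.sqrt(sse / n)),  # assumption: divide by n
    }
    return coeffs, stats

# Synthetic check: two bands on a sloping background
x = np.arange(300)
comps = np.array([np.exp(-0.5 * ((x - 80) / 8.0) ** 2),
                  np.exp(-0.5 * ((x - 200) / 12.0) ** 2)])
measured = 2.0 * comps[0] + 0.5 * comps[1] + 3.0 + 1.5 * np.linspace(0.0, 1.0, 300)
coeffs, stats = fit_signature(measured, comps)
```

For noise-free synthetic data the fitted coefficients reproduce the values used to build the spectrum; for experimental spectra, SSE, R-square and RMSE indicate how well the signature describes a given sample.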

The process of multi-dimensional spectroscopic signature building was repeated with 15 saliva samples. According to significant factor analysis combined with principal component analysis, near IR Raman spectra of dry saliva samples demonstrated higher variability relative to semen and blood samples. The presence of 11 principal components was detected. Three major components were chosen (^{−1}, respectively (^{−1}) [^{−1} and 521 cm^{−1}) [^{−1}, which are consistent with the amino acid arginine, but this is a preliminary assignment and more investigation is needed. Although the Raman spectrum of dry saliva, in contrast to blood and semen, varies considerably from donor to donor, the multi-dimensional signature (a linear combination of three principal components, a horizontal line and a tilted line representing the fluorescence contribution) fits all Raman spectra of the dry saliva samples with satisfactory goodness-of-fit statistics.

Identification of an unknown species based on spectroscopic data is a common statistical problem. Most statistical methods for this task can be separated into two main groups: unsupervised (also called exploratory) and supervised methods [

Supervised methods utilize a priori knowledge about the system by developing classification models based on known spectra [

As a first step, a variety of statistical approaches were tested to explore the possibility of body fluid identification. It was found that using the multi-dimensional Raman signatures built by the alternating least squares algorithm for identification decreased the quality of identification relative to the direct use of conventional DA methods (data not shown). This is a consequence of the different objectives of these methods. Multi-dimensional Raman signatures were built to uncover the sources of data variation and to assign collected spectral data to real chemical species. In contrast, DA is based on orthogonal matrix decomposition, and the obtained components or factors explain the maximum data variance. The first approach is important for understanding the chemical composition of the system and can be used, for example, to map the distribution of real species across the tested area. The second approach gives abstract, unique and orthogonal (independent) solutions, which can be used to determine the number of different sources of variation present in the data and, eventually, allow discrimination. The second approach does not use special constraints, such as non-negativity, unimodality or local rank, which are necessary for a physically meaningful result.

We report here on the application of SIMCA, PLS-DA and LDA algorithms for the identification of traces of body fluids based on near IR Raman spectroscopic data. The efficiency of each method was tested by various validation methods, such as leave-one-out cross-validation or splitting the data into training and test sets.

Soft Independent Modeling of Class Analogy (SIMCA) is typically used to identify local models for defined groups and to predict a probable class membership for new observations. SIMCA focuses on modeling the classes rather than finding the optimal classifier. We used the SIMCA classification method to compare Raman spectra of the three body fluids. A total of 170 spectra recorded from 17 blood samples, 252 spectra from 17 saliva samples and 693 spectra from 50 semen samples were used to develop the SIMCA model, which consisted of three PCA models, one per body fluid type. The number of latent variables in each PCA model was determined using contiguous block, leave-one-out, Venetian blind, and random subset cross-validation methods. Hotelling’s T^{2} and Q statistics were used for group membership decisions and to test the normality of principal components obtained from PCA. The results showed that 83% of blood, 88% of saliva and 89% of semen spectra were attributed to the correct models (
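
The SIMCA logic of one PCA submodel per class, with Q residuals and Hotelling's T^{2} used for membership decisions, can be sketched as follows. This is a simplified illustration, not the software used in this study: the names `ClassModel`, `simca_fit` and `simca_predict` are hypothetical, and both statistics are scaled by their mean training values rather than by confidence limits, which is an assumption made to keep the example short.

```python
import numpy as np

class ClassModel:
    """PCA submodel for one class with Q and Hotelling T^2 statistics."""

    def __init__(self, X, n_pc):
        self.mean = X.mean(axis=0)
        Xc = X - self.mean
        _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.loadings = Vt[:n_pc].T                    # (n_vars, n_pc)
        self.score_var = s[:n_pc] ** 2 / (len(X) - 1)  # variance of each score
        # Reference levels from the training data (simplifying assumption)
        self.q_ref = self.q(X).mean()
        self.t2_ref = self.t2(X).mean()

    def q(self, X):
        """Squared residual left after projection onto the class PCs."""
        Xc = X - self.mean
        resid = Xc - Xc @ self.loadings @ self.loadings.T
        return (resid ** 2).sum(axis=1)

    def t2(self, X):
        """Hotelling T^2: variance-scaled distance within the PC space."""
        scores = (X - self.mean) @ self.loadings
        return (scores ** 2 / self.score_var).sum(axis=1)

    def distance(self, X):
        """Combined reduced distance to the class model."""
        return np.hypot(self.q(X) / self.q_ref, self.t2(X) / self.t2_ref)

def simca_fit(X, y, n_pc=2):
    """Build one PCA submodel per class label."""
    return {label: ClassModel(X[y == label], n_pc) for label in np.unique(y)}

def simca_predict(models, X):
    """Assign each row of X to the nearest class model."""
    labels = list(models)
    dists = np.column_stack([models[label].distance(X) for label in labels])
    return np.array(labels)[dists.argmin(axis=1)]
```

Each submodel captures only the variation within its own class, so an unknown spectrum is assigned to whichever class model describes it with the smallest combined distance.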

While SIMCA is a very useful classification tool, the PCA submodels in SIMCA are computed with the goal of capturing the variation within each class. Directions in the data space that discriminate classes are not identified in SIMCA. In LDA, linear combinations of variables are computed to determine directions in the spectral space; discriminant functions maximize the variance between groups and minimize the variance within groups according to Fisher’s criterion. The LDA model was validated using leave-one-out cross-validation. In this method, all spectra except one were used to build an LDA model, which was then used to classify the left-out spectrum. This procedure is repeated so that each spectrum is predicted once.
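
The leave-one-out procedure can be sketched with a small LDA classifier. The sketch assumes equal class priors and a pooled within-class covariance, so that classification reduces to the smallest Mahalanobis distance to a class mean. The function names are hypothetical, and the inputs are assumed to be low-dimensional scores (for example after PCA reduction), since the pooled covariance of raw spectra would be ill-conditioned.

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class means and the inverse pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    scatter = sum((X[y == c] - m).T @ (X[y == c] - m)
                  for c, m in zip(classes, means))
    pooled_cov = scatter / (len(y) - len(classes))
    return classes, means, np.linalg.pinv(pooled_cov)

def lda_predict(model, x):
    """Assign x to the class whose mean is nearest in Mahalanobis distance."""
    classes, means, cov_inv = model
    diff = means - x
    dist = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return classes[np.argmin(dist)]

def leave_one_out_accuracy(X, y):
    """Rebuild the model once per spectrum, each time leaving that one out."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = lda_fit(X[keep], y[keep])
        hits += int(lda_predict(model, X[i]) == y[i])
    return hits / len(y)
```

Calling `leave_one_out_accuracy(scores, labels)` returns the fraction of spectra that were classified correctly when excluded from model building.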

The difference spectra (

Discriminant analysis using naive Bayes classifiers, fitting of multivariate normal densities with covariance estimates stratified by group, and Mahalanobis distances were also performed. All these methods can be considered modifications of LDA. The results obtained were consistent with the previous LDA results. Applying several different methods of discriminant analysis can be important for very noisy data or when samples are contaminated.

It has been shown that PLS-DA is basically the inverse least-squares approach to LDA, which produces essentially the same result but with the noise reduction and variable selection advantages of PLS [

Finally, in order to test the stability of the SIMCA, LDA and PLS-DA methods, spectral data with introduced noise and background contributions were analyzed. Such modified data can model real “field” tests in which contaminated or low-intensity Raman spectra are recorded. Although the tested methods differed in their robustness to spectral data quality, the results (not shown) allowed us to conclude that a combination of the discriminant methods can be used successfully even for “bad” sets of Raman spectra.

Detection and identification of traces of body fluids encountered at a crime scene are important aspects of forensic science today. Our previous studies have demonstrated the possibility of characterizing body fluids with unique Raman multi-dimensional signatures and assigning spectroscopic features to the chemical species. A nondestructive, confirmatory method of body fluid identification using discriminant statistical analysis of Raman data was developed in the reported study. Discriminant Analysis (DA) using SIMCA, LDA and PLS-DA techniques allowed semen, blood and saliva traces to be discriminated with 100% probability under laboratory conditions. Several different spectral preprocessing approaches were tested. Averaging Raman spectra acquired from multiple spots on the sample significantly enhanced discrimination by the SIMCA algorithm. Data reduction by Principal Component Analysis (PCA) and Partial Least Squares (PLS) decomposition was beneficial for DA utilizing SIMCA and LDA family methods. Necessary and sufficient numbers of principal components or latent variables were determined by significant factor analysis. Three-dimensional score plots built for the PLS-DA model (

Overall, Raman spectroscopy coupled with discriminant statistical analysis showed great potential for nondestructive, confirmatory identification of body fluids at a crime scene. The ability to make these determinations and identifications, especially on-site at a crime scene, would be a major advance in the forensic analysis of body fluids. The present study deals with pure body fluid traces only. Mixtures of body fluids, contamination and substrate contributions are important factors in real forensic cases, and our laboratory is currently working on incorporating these additional aspects into the body fluid identification analysis.

We are grateful to John Hicks, Director of North East Regional Forensic Institute (NERFI) and Barry Duceman, Director of Biological Science in the New York State Police Forensic Investigation Center for continued support. We also would like to acknowledge Victor Shashilov for his advice and valuable discussions, and Aliaksandra Sikirzhytskaya for assistance with spectra acquisition.

This project was supported by Award No. 2009-DN-BX-K196 awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect those of the Department of Justice.

Multi-dimensional spectroscopic signatures of semen (

Nearest classes to SIMCA submodels for body fluid samples. Classification of single Raman spectra: correctly classified (

Difference between the average Raman spectrum of all human body fluids (

Linear discriminant analysis of the body fluids. Classes defined by training sets: human blood (red)–class 1, human saliva (green)–class 2, human semen (blue)–class 3. Black color indicates unknown spectra.

A three-dimensional latent variable plot for human blood (

Sensitivity and specificity of PLS-DA. Values calculated for calibration (self-prediction, Cal) and cross-validation results (CV).

Modeled class | Blood | Semen | Saliva |
---|---|---|---|
Sensitivity (Cal) | 1.000 | 1.000 | 0.999 |
Specificity (Cal) | 1.000 | 0.999 | 1.000 |
Sensitivity (CV) | 1.000 | 1.000 | 0.999 |
Specificity (CV) | 1.000 | 0.999 | 1.000 |
Class. Err (Cal) | 0 | 0.000579 | 0.000721 |
Class. Err (CV) | 0 | 0.000579 | 0.000721 |
RMSEC | 0.0344 | 0.107 | 0.103 |
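
For reference, the per-class quantities in the table above can be computed from true and predicted labels as in the following sketch. The function name `class_metrics` is hypothetical. The class error is taken here as the mean of the false-negative and false-positive rates, a convention that appears consistent with the tabulated values but is an assumption rather than a definition stated in this study.

```python
def class_metrics(true_labels, predicted_labels, target):
    """One-vs-rest sensitivity, specificity and class error for one class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == target:
            if pred == target:
                tp += 1   # target spectrum recognized as target
            else:
                fn += 1   # target spectrum missed
        else:
            if pred == target:
                fp += 1   # other fluid wrongly assigned to target
            else:
                tn += 1   # other fluid correctly rejected
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Assumed convention: average of false-negative and false-positive rates
    class_error = ((1.0 - sensitivity) + (1.0 - specificity)) / 2.0
    return sensitivity, specificity, class_error

# Toy example with one saliva spectrum misassigned to semen
truth = ["blood", "blood", "semen", "semen", "saliva", "saliva", "saliva", "semen"]
pred = ["blood", "blood", "semen", "semen", "saliva", "saliva", "semen", "semen"]
semen_metrics = class_metrics(truth, pred, "semen")
saliva_metrics = class_metrics(truth, pred, "saliva")
```

In this toy example the misassigned saliva spectrum lowers the sensitivity for saliva and the specificity for semen, which mirrors the pattern of the 0.999 entries in the table.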