Raman Spectroscopy as Noninvasive Method of Diagnosis of Pediatric Onset Inflammatory Bowel Disease

We propose here a spectroscopic method to diagnose and differentiate inflammatory bowel diseases (IBD) with pediatric onset, such as ulcerative colitis (UC) and Crohn’s disease (CD), in a completely noninvasive way, without performing any duodenal biopsy. In particular, the Raman technique was applied to the protein extract from fecal samples in order to obtain information about molecular vibrations that can potentially furnish spectral signatures of cellular modifications occurring as a consequence of specific pathologic conditions. Attention was focused on the amide I region, and the spectral changes in the secondary structures were accounted for quantitatively by applying deconvolution and curve-fitting. Inflammation is found to give rise to a significant increase in the nonreducible (trivalent)/reducible (divalent) cross-linking ratio R of the protein network. This parameter proved to be an excellent marker for distinguishing IBD subjects from non-IBD ones and, among IBD patients, for differentiating between UC and CD. The proposed methodology was validated by statistical analysis using the receiver operating characteristic (ROC) curve.


Introduction
Early and accurate detection of chronic diseases is crucial for proper intervention [1][2][3]. It can also facilitate the monitoring of disease progression and therapeutic response. In many chronic diseases, especially in pediatric patients, the diagnostic process can be time consuming and invasive, so the search for noninvasive and accurate diagnostic tools is underway.
In particular, the diagnosis of inflammatory bowel disease (IBD) in childhood can be challenging [4][5][6]. The diagnosis is based on the detection of chronic inflammation in the gastrointestinal (GI) tract and the exclusion of other causes of inflammation. The differentiation of Crohn's disease (CD) from ulcerative colitis (UC), and of both from infectious diseases, allergic diseases, or primary immunodeficiency disorders (PIDs) with similar presentations, is based on a combination of clinical suspicion, endoscopic and histological evaluation of the mucosa, and additional tests in case of uncertainty [7]. An international group of European experts in pediatric inflammatory bowel disease (PIBD), mainly from the "Porto" IBD Working Group of the European Society of Pediatric Gastroenterology, Hepatology, and Nutrition (ESPGHAN), stated that an "accurate diagnosis of inflammatory bowel disease should be based on a combination of history, physical and laboratory examination, esophagogastroduodenoscopy (EGD) and ileocolonoscopy with histology, and imaging of the small bowel. It is critical to exclude enteric infections" [8]. Even when adhering to the revised Porto IBD Working Group criteria for diagnosis, in some cases distinguishing CD from UC, and both from other diseases, is troublesome, especially in the subset of patients with early onset of disease. A noninvasive tool that helps, on the one hand, to reduce the burden of diagnosis and, on the other, to differentiate between CD and UC is a target for current and future research [9][10][11]. Nevertheless, the structural analysis of tissues has not proved successful in differentiating among inflammation subtypes.
Calprotectin is a small calcium- and zinc-binding protein of the S100 family and is a major component of the cytosolic protein content of neutrophils and other cells involved in the inflammatory process. In the presence of active intestinal inflammation, polymorphonuclear neutrophils migrate from the circulation to the intestinal mucosa. Any disturbance to the mucosal architecture due to the inflammatory process results in leakage of neutrophils, and hence of calprotectin, into the lumen and in its subsequent excretion in feces. The concentration of calprotectin in feces has been shown to correlate well with disease activity in IBD and to help differentiate IBD from functional intestinal disorders, such as irritable bowel syndrome [12][13][14].
Raman spectroscopy is based on an inelastic light-scattering phenomenon in which the illumination of a molecule by monochromatic laser light gives rise to an exchange of a quantum of vibrational energy between the two, resulting in a difference in frequency between incident and scattered light [15][16][17]. Consequently, this experimental method provides a vibrational spectrum that contains information on the chemical bonds and symmetry of a specific molecule [18,19]. As widely reported [20][21][22], it represents an essential methodology in chemistry, physics, biology and materials science. For example, Raman spectroscopy is broadly employed in the investigation of changes in secondary structure at all stages of protein aggregation and amyloid fibril formation [23,24]. It is commonly used to unravel molecule-specific spectral signatures of different biomolecules, including nucleic acids, lipids and carbohydrates, and of complex biological systems, such as tissues and cells, that are made up of such biomolecules [25].
Most interestingly, over the last decade Raman spectroscopy has also proved to be a versatile tool in clinical diagnostics, applied to tissues in order to detect a variety of diseases ranging from cancer [26] to infectious [27] and neurodegenerative diseases [28], and in the identification of cells [29]. This is because any altered biochemistry in the body that is specific to a particular disease, and that precedes macroscopic tissue changes [30,31], will be reflected in the Raman spectrum. In particular, many inflammatory conditions have been investigated by Raman spectroscopy. For example, an endoscope-coupled Raman probe was used in vivo to investigate the colon mucosal composition in patients suffering from ulcerative colitis, for which a decrease in phosphatidylcholine (720 cm⁻¹) and total lipids (1303 cm⁻¹) was detected [32]. In addition, colonoscopy-coupled fiber-optic probe-based Raman spectroscopy was applied in vivo for the real-time, nondestructive diagnosis of inflammatory bowel disease, i.e., ulcerative colitis and Crohn's disease [33].
With the aim of identifying a diagnostic pathway for inflammatory bowel diseases, in particular CD and UC, that relies only on a physical investigation technique, we developed a noninvasive methodology based on Raman spectroscopy analysis of the stool protein extract of patients.
As is well known, the evaluation of diagnostic tests is a matter of concern in modern medicine, not only for confirming the presence of disease but also for ruling out the disease in healthy subjects [34]. For diagnostic tests with a dichotomous outcome (positive/negative test results), the conventional approach to test evaluation uses sensitivity and specificity as measures of the accuracy of the test in comparison with the gold standard status [35]. Here, the reliability of the proposed methodology was evaluated by statistical analysis using, in particular, the receiver operating characteristic (ROC) curve and the Youden index (J), with its associated cut-point.

Sample Preparation
Raman analysis was performed on fecal samples of patients with IBD (UC and CD). The same analysis was performed on fecal samples of patients for whom IBD was ruled out. All the patients were evaluated in the Pediatric Gastroenterology Unit of University Hospital "G Martino", in Messina, Italy. In particular, we analyzed the protein extract from fecal samples of 15 subjects with CD, nine subjects with UC (these two groups together constituting the so-called IBD group) and 19 subjects for whom IBD was ruled out (non-IBD group, i.e., control group). For all subjects, a fecal sample was collected in order to evaluate fecal calprotectin levels (Calprest® NG by Eurospital SpA, Trieste, Italy). In the IBD group the evaluation was necessary in order to assess disease activity (i.e., intensity of intestinal inflammation), whereas in the non-IBD group it was performed as part of the assessments to rule out an IBD diagnosis. In the IBD group, we selected patients with both CD and UC with different degrees of disease activity based on calprotectin levels. In the control group, on the other hand, we selected patients for whom intestinal inflammation was excluded on the basis of normal calprotectin levels (<100 mg/kg according to the manufacturer's indications) (see the final diagnosis for these patients in Table 1). From the same sample prepared for the enzyme-linked immunosorbent assay (ELISA) test for fecal calprotectin, which had been incubated with an extraction buffer to retrieve the protein content, 100 µL of liquid extract was collected so as to have a homogenized sample for Raman analysis. Demographic and clinical descriptions of the subjects are summarized in Table 1.

Raman Measurements
Raman measurements were performed by means of a DXR-SmartRaman Spectrometer (Thermo Fisher Scientific, Waltham, MA, USA) using a diode laser with an excitation wavelength of 785 nm. All the Raman spectra were acquired over the wavenumber range of 400-3300 cm⁻¹, with a resolution of 1.9 cm⁻¹ and a laser power of 24 mW from a 50 µm spot.
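As a small worked illustration of what the quoted Raman shift range means in absolute wavelength for 785 nm excitation, the following sketch converts a Stokes shift in cm⁻¹ into the wavelength of the scattered light. The function name is hypothetical and the snippet is not part of the instrument software; it only applies the standard wavenumber relation 1/λ_s = 1/λ_0 − Δν.

```python
def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Wavelength (nm) of Stokes-scattered light for a given Raman shift (cm^-1)."""
    excitation_cm1 = 1e7 / excitation_nm        # absolute wavenumber of the laser line
    scattered_cm1 = excitation_cm1 - shift_cm1  # Stokes side: the photon loses energy
    return 1e7 / scattered_cm1


if __name__ == "__main__":
    # Limits of the acquired range and the approximate amide I position.
    for shift in (400.0, 1650.0, 3300.0):
        wavelength = scattered_wavelength_nm(785.0, shift)
        print(f"{shift:6.0f} cm^-1 -> {wavelength:.1f} nm")
```

With these numbers, the amide I band near 1650 cm⁻¹ is detected at roughly 902 nm, i.e., in the near infrared.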
Vials containing the fecal extract were placed in their sample holder, and the 180-degree sampling accessory for the DXR-SmartRaman Spectrometer was used for the measurements. This sampling accessory, which has the typical 180-degree backscattering geometry for collecting Raman scattered radiation, is designed as a simple device to accommodate vials, tubes, powders and other samples in a variety of formats. It allows the use of specialty cells, including cryogenic, high-temperature, electrochemical and controlled-humidity chambers, and is particularly useful in environments that value diverse sample formats over highly automated data collection. In order to obtain spectra with a high signal-to-noise ratio (S/N), each Raman spectrum was obtained by collecting 32 sample frames, with an exposure of 60.0 s for each frame. The total acquisition time was 32 min for each spectrum.
Experimental data in the 1300-1800 cm⁻¹ range were fitted by a multiple curve-fitting routine provided in the PeakFit 4.0 software package (Systat Software, Inc., San Jose, CA, USA). The fitting procedure we adopted was a nonlinear regression procedure (nonlinear least squares fitting) that adjusts the parameters in small steps in order to improve the goodness of the fit, by minimizing the sum of the squares of the vertical deviations of the data points from a function f. The analysis of the second derivative of the spectra was preliminarily applied in order to gain, by evaluating the minima in the obtained profiles, a first indication of the minimum number of band components and their peak positions, according to a procedure already successfully applied in the analysis of Raman spectra of supramolecular systems [36].
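The use of second-derivative minima as a first estimate of band positions can be sketched as follows. This is only an illustration on synthetic data, not the PeakFit routine used in the study: the Savitzky-Golay smoothing, the window settings, the depth threshold and the function name are all assumptions.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter


def second_derivative_minima(shift, intensity, window=11, polyorder=3, min_rel_depth=0.05):
    """Return the Raman shifts at which the smoothed second derivative has local minima.

    Minima shallower than min_rel_depth times the deepest one are discarded
    (an assumed noise filter, not taken from the paper).
    """
    shift = np.asarray(shift, dtype=float)
    step = float(np.mean(np.diff(shift)))
    # Savitzky-Golay filter returning the second derivative directly.
    d2 = savgol_filter(intensity, window_length=window, polyorder=polyorder,
                       deriv=2, delta=step)
    # Local minima of d2 are local maxima of -d2.
    idx, _ = find_peaks(-d2, height=min_rel_depth * np.max(-d2))
    return shift[idx], d2


if __name__ == "__main__":
    # Synthetic profile: two Gaussian bands at made-up positions.
    x = np.arange(1550.0, 1751.0, 1.0)
    y = np.exp(-(x - 1640.0) ** 2 / 128.0) + 0.8 * np.exp(-(x - 1670.0) ** 2 / 128.0)
    centers, _ = second_derivative_minima(x, y)
    print("estimated band positions (cm^-1):", centers)
```

The positions returned this way would serve only as starting values for the subsequent curve-fitting step.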
In associated systems, a vibrational band undergoes both homogeneous and inhomogeneous broadening; therefore the Voigt profile, which is the convolution of a Lorentzian with a Gaussian, is suitable for band shape analysis. In light of this, the experimental data were deconvoluted into the minimum number of Voigt fitting functions, each defined as the following convolution of a Lorentzian with a Gaussian curve:

$$V(\tilde{\nu}) = \int_{-\infty}^{+\infty} G(\tilde{\nu}')\, L(\tilde{\nu} - \tilde{\nu}')\, d\tilde{\nu}'$$

where G and L denote the Gaussian and Lorentzian profiles, respectively. The fitting parameters were left free to vary upon iteration. For each fitting session, multiple iterations were performed until a converging solution ("best-fit") was reached, by minimization of the value of $\chi^2_r$. The well-known problem of finding a local ("false") minimum was overcome by running each nonlinear regression several (20 on average) times, providing each time a different set of initial parameter values. All fits generated the same parameter values, with the same sum of squares, regardless of the initial values, and this made us confident that we did not encounter a false minimum.
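The strategy of repeating the nonlinear regression from different initial values and checking that all runs reach the same sum of squares can be sketched as below. This is a minimal illustration using SciPy rather than PeakFit; the parameterization (area, center, Gaussian sigma, Lorentzian gamma), the bounds, the random ranges for the starting values and the function names are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import voigt_profile


def voigt_sum(x, params):
    """Sum of Voigt bands; params is a flat array of (area, center, sigma, gamma) per band."""
    y = np.zeros_like(x, dtype=float)
    for area, center, sigma, gamma in np.reshape(params, (-1, 4)):
        y += area * voigt_profile(x - center, sigma, gamma)
    return y


def multistart_fit(x, y, centers_guess, n_starts=20, seed=0):
    """Run the least-squares fit from several random starting points.

    Returns the best result and the final cost (half the sum of squares) of every run.
    """
    rng = np.random.default_rng(seed)
    n = len(centers_guess)
    total_area = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))
    lower = np.tile([0.0, x.min(), 0.5, 0.5], n)
    upper = np.tile([np.inf, x.max(), 40.0, 40.0], n)
    best, costs = None, []
    for _ in range(n_starts):
        start = np.concatenate([
            [rng.uniform(0.5, 2.0) * total_area / n,  # area
             c + rng.uniform(-5.0, 5.0),              # center
             rng.uniform(2.0, 15.0),                  # Gaussian sigma
             rng.uniform(2.0, 15.0)]                  # Lorentzian gamma
            for c in centers_guess])
        res = least_squares(lambda p: voigt_sum(x, p) - y, start, bounds=(lower, upper))
        costs.append(res.cost)
        if best is None or res.cost < best.cost:
            best = res
    return best, np.array(costs)


if __name__ == "__main__":
    # Synthetic two-band profile with made-up parameters.
    x = np.arange(1580.0, 1731.0, 1.0)
    y = voigt_sum(x, [100.0, 1640.0, 6.0, 4.0, 60.0, 1670.0, 6.0, 4.0])
    best, costs = multistart_fit(x, y, [1642.0, 1668.0], n_starts=10)
    print("fitted centers:", np.reshape(best.x, (-1, 4))[:, 1])
    print("spread of final costs:", costs.max() - costs.min())
```

If the spread of the final costs across runs is negligible, as the paragraph above reports for the actual fits, a false minimum is unlikely.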

Receiver Operating Characteristic (ROC) Curve and Youden Index (J)
The ROC curve is the graph of sensitivity vs. 1 − specificity, where the sensitivity represents the true positive rate, while (1 − specificity) is the false positive rate [37]. In this study, the sensitivity and specificity of the diagnostic test were determined by considering positive and negative test results in the two different groups (IBD vs. non-IBD subjects). Sensitivity refers to the ability of Raman spectroscopy to correctly identify patients with the inflammatory process, whereas specificity refers to its ability to correctly identify patients without inflammation. The area under the ROC curve (AUC) represents the probability Pr (Y ≥ X), i.e., the probability that the marker value Y of a randomly selected subject from the diseased population is higher than the marker value X of a randomly selected subject from the healthy population [38]. The AUC is an effective way to summarize the overall diagnostic accuracy of the test. It takes values from 0 to 1, where a value of 0 indicates a perfectly inaccurate test and a value of 1 reflects a perfectly accurate test. In general, an AUC value equal to 0.5 suggests no discrimination (i.e., no ability to distinguish patients with and without the disease or condition based on the test), a value from 0.7 to 0.8 is considered acceptable, from 0.8 to 0.9 excellent, and more than 0.9 outstanding [39]. On the other hand, since the AUC serves as a measure of a biomarker's global diagnostic accuracy over all possible cut-points, it cannot directly provide the 'optimal' threshold or cut-point needed by clinicians for making a diagnosis, i.e., for classifying a subject as either diseased or healthy. For this reason, the cut-point was determined using the Youden index J, defined as J = sensitivity + specificity − 1.
This index represents the maximum vertical distance between the ROC curve and the diagonal, and is taken at the cut-point that optimizes the biomarker's differentiating ability when equal weight is given to sensitivity and specificity [40]. It provides the optimal cut-point and, at the same time, is a direct measure of the diagnostic accuracy at that cut-point, that is, the maximum overall correct classification rate a marker can achieve [41,42].
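The quantities defined above can be computed directly from marker values and group labels. The following sketch is an illustration only: the function names are hypothetical, the marker values are made up and are not the study data, a subject is called positive when the marker is at or above the threshold, and ties in the AUC are counted as one half (both assumptions).

```python
import numpy as np


def roc_points(scores, labels):
    """Sensitivity and 1 - specificity for every candidate threshold.

    A subject is called positive when score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.unique(scores)[::-1]  # descending order
    tpr = np.array([(scores[labels] >= t).mean() for t in thresholds])
    fpr = np.array([(scores[~labels] >= t).mean() for t in thresholds])
    return thresholds, tpr, fpr


def auc_rank(scores, labels):
    """AUC as the probability that a diseased marker exceeds a healthy one (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def youden(scores, labels):
    """Maximum of J = sensitivity + specificity - 1 and the threshold where it occurs."""
    thresholds, tpr, fpr = roc_points(scores, labels)
    j = tpr - fpr  # identical to sensitivity + specificity - 1
    k = int(np.argmax(j))
    return float(j[k]), float(thresholds[k])


if __name__ == "__main__":
    # Made-up marker values: three healthy and three diseased subjects.
    marker = [0.90, 0.95, 1.00, 1.25, 1.30, 1.45]
    diseased = [False, False, False, True, True, True]
    print("AUC:", auc_rank(marker, diseased))
    print("J and cut-point:", youden(marker, diseased))
```

For perfectly separated groups such as this toy example, the AUC and J both equal 1, and the cut-point falls at the lowest marker value of the diseased group.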

Results
In Figure 1, we report the average Raman spectrum of the fecal extract from non-IBD patients, obtained by averaging the spectra collected for all the analyzed subjects after a baseline correction of each spectrum, performed to compensate for possible technical and/or sample variations, and normalization to the total integrated area.
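The preprocessing chain described above (baseline correction, normalization to the total integrated area, averaging) can be sketched as follows. The text does not specify which baseline model was used, so the straight-line baseline through the spectrum edges is purely an assumption for illustration, as are the function names.

```python
import numpy as np


def subtract_linear_baseline(shift, intensity, n_edge=5):
    """Subtract a straight line drawn through the mean of the first and last n_edge points."""
    x0, x1 = shift[:n_edge].mean(), shift[-n_edge:].mean()
    y0, y1 = intensity[:n_edge].mean(), intensity[-n_edge:].mean()
    baseline = y0 + (y1 - y0) * (shift - x0) / (x1 - x0)
    return intensity - baseline


def integrated_area(shift, intensity):
    """Trapezoidal integral of the spectrum over the Raman shift axis."""
    return float(np.sum(0.5 * (intensity[1:] + intensity[:-1]) * np.diff(shift)))


def normalize_area(shift, intensity):
    """Scale the spectrum so that its total integrated area equals 1."""
    return intensity / integrated_area(shift, intensity)


def average_spectrum(shift, spectra):
    """Baseline-correct and area-normalize each spectrum, then average them."""
    processed = [normalize_area(shift, subtract_linear_baseline(shift, s)) for s in spectra]
    return np.mean(processed, axis=0)


if __name__ == "__main__":
    # Two synthetic spectra: same band, different scale and sloping background.
    x = np.arange(1300.0, 1701.0, 1.0)
    band = np.exp(-(x - 1500.0) ** 2 / 200.0)
    spectra = [3.0 * band + 0.01 * x + 2.0, 7.0 * band - 0.02 * x + 50.0]
    mean_spec = average_spectrum(x, spectra)
    print("area of averaged spectrum:", integrated_area(x, mean_spec))
```

After this treatment, spectra that differ only in overall intensity or in a sloping background become directly comparable.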
The spectrum highlights some of the main typical protein vibrational modes, associated with the polypeptide backbone (amide bands) and with the aromatic and nonaromatic amino acid side chains. Based on comparison with the literature [43,44], the band detected at ~523 cm⁻¹ is assigned to the ν(S-S) stretching vibration. The feature at ~759 cm⁻¹ is assigned to the ring breathing vibration of tryptophan. The tyrosine doublet can be recognized in the ~830-850 cm⁻¹ range, the intense peak corresponding to the phenylalanine (Phe) ring stretching mode is evident at ~1000 cm⁻¹, and the ν(C-N) stretching mode is responsible for the band at ~1153 cm⁻¹. At higher Raman shifts, a broad composite band is detected, extending from ~1250 cm⁻¹ up to ~1750 cm⁻¹. It results from the overlap of several contributions, among which we can distinguish the symmetric νs(C-O) stretching vibration of COO⁻ groups, at ~1416 cm⁻¹, and the δ(C-H2) bending vibration, at ~1468 cm⁻¹. The amide II band is recognized at ~1548 cm⁻¹, consisting of an out-of-phase combination of ν(C-N) stretching and δ(N-H) bending motions. Finally, the amide I vibration is evident as a broad band centered at ~1650 cm⁻¹. It is mainly due to the ν(C=O) stretching mode, with a small contribution from out-of-phase ν(C-N) stretching. The position of the amide I band depends on the conformation of the polypeptide backbone, i.e., on the type of secondary structure, and on the intra- and intermolecular hydrogen bonding of the protein specimen [45]. Each type of secondary structure has a typical ν(C=O) stretching frequency, and this explains the broadening of the band.
Figure 2a,b depicts the average Raman spectra, in the 900-1200 cm⁻¹ and 1300-1800 cm⁻¹ wavenumber ranges respectively, of the fecal extract from non-IBD subjects (black line), compared with those obtained from subjects within each disease class, i.e., CD (red line) and UC (green line) (the last two spectra were obtained following the same procedure described at the beginning of this section for the average Raman spectrum of non-IBD subjects). The spectra reported in Figure 2 were preliminarily normalized to the total integrated area. From an inspection of the figure it is clear that, when disease occurs, an evident increase is observed in the spectral intensity of the band centered at ~1000 cm⁻¹ and of the broad band detected in the 1300-1800 cm⁻¹ range. In this respect, it is worth remarking that significant variations in integrated Raman intensity, as active inflammation occurs and progresses in severity, were observed in previous studies by colonoscopy-coupled fiber-optic probe-based Raman spectroscopy for inflammatory bowel disease of the colon, including ulcerative colitis and Crohn's disease [33].
The high quality of our spectra allowed us to account quantitatively for the spectral modifications possibly induced by the inflammatory state as a consequence of changes in the secondary structures, by focusing our attention on the amide I region. For the reasons explained above, and as already reported in the literature [46,47], this band constitutes an excellent marker to quantify the secondary structure and conformational changes of proteins, as a consequence of the role played by the amide moiety in crosslinking [48].
In order to do so, the several secondary structures of proteins contributing to the unresolved amide I band need to be distinguished, by separating the corresponding components through second derivative computations, deconvolution and curve-fitting. There are, of course, other methods to process Raman spectra without fitting. For example, quantum mechanics/molecular mechanics calculations were recently used for computing the Raman spectra of the phycocyanobilin chromophore in α-C-phycocyanin [49]. Likewise, the quantification of protein secondary structure content has been achieved by multivariate analysis of Raman spectra [50]; a multivariate approach, however, considers the entire spectral profile, not just a single Raman band or frequency. In the present study, our attention was mainly focused on the amide I region, for which the interpretation of protein peaks relies upon curve-fitting to extract information about the protein structure.
In this sense, the main problem we encountered in the analysis of our spectra is the overlap of the several components that contribute to the Raman vibrational profile observed in the 1300-1800 cm⁻¹ range. According to a well-established procedure [51], we first fitted the whole spectrum by means of Voigt lineshapes. After that, all the sub-bands falling outside the amide I range were subtracted from the total fit. A second, more detailed curve-fitting was then carried out for the amide I region only. The analysis of the second derivative of the spectra was applied in order to gain, by evaluating the minima in the obtained profiles, a first indication of the peak positions of the band components [14,52]. In this regard, it is worth remarking that second derivative computation has already proved successful in quantifying secondary structure changes in proteins [53,54]. Starting from it, four main sub-bands were recognized, centered at […]. It is worth underlining that, in principle, the presence of α-helix structures cannot be excluded from the analysis of the amide I modes. In this respect, we remark that the choice of the number of components for the spectral decomposition of the band was based on the following criteria: (1) minimum number of components; (2) acknowledged secondary structures present in the system; (3) spectral criteria (second derivative computation); and (4) generation of a reasonable fit. Hence, even though the evaluation of the second derivative profiles of the experimental Raman spectra indicated four main sub-bands, and the curve-fitting procedure was applied to the experimental profiles based on these values, a contribution of α-helix structures is also reasonably present in the 1640-1654 cm⁻¹ region.
The results of a typical curve-fitting procedure obtained for a non-IBD subject, and those for an example subject with CD and with UC, are reported in Figure 3a-c. In the inset, the calculated second derivative profile of the experimental Raman spectrum of an example non-IBD subject is reported.
Figure 3. Curve-fitting results in the amide I region for a non-IBD subject (a), for a subject with CD (b) and for a subject with UC (c). Experimental points: black circles; best-fit: red line; individual components: magenta, green, brown, and blue lines. In the inset, example of second derivative profile calculated for the experimental Raman spectrum of the non-IBD subject.
Even though we are aware of the arbitrariness intrinsically related to the best-fit procedure, which can lead to an overinterpretation of the data, we want to underline that the minimum number of parameters was used in the protocol adopted here, and these furnished, at the same time, a very good […] components, respectively. It is worth remarking, in this regard, that quantifying ratios of Raman bands is very advantageous, since they are the least affected by background fluctuations and preprocessing methods [55]. In particular, as reported in [56], the parameter R is related to the nonreducible (trivalent)/reducible (divalent) crosslinking ratio. The values obtained for all the analyzed subjects are reported in Figure 4. Active inflammation is found to give rise to a significant increase in the crosslinking ratio, which passes from an average value of 0.93 ± 0.05 in the case of non-IBD subjects up to 1.26 ± 0.06 and 1.45 ± 0.07 in the case of subjects with CD and with UC, respectively.
Figure 5a illustrates the ROC curve for IBD and non-IBD subjects (AUC = 1, 95% confidence interval). Figure 5b shows the best cut-off point for maximizing sensitivity and specificity (threshold value = 1.2). These curves show that Raman spectroscopy is able to discriminate IBD from non-IBD subjects. In order to highlight these results, the distribution graph for IBD and non-IBD subjects is depicted in Figure 5c.
Using the values obtained for the ratio R, we also performed the differentiation between CD and UC in IBD patients. Figure 6a illustrates the ROC curve for IBD subjects (AUC = 1, 95% confidence interval). Figure 6b shows the best cut-off point for maximizing sensitivity and specificity (threshold value = 1.4). These curves show that Raman spectroscopy is able to discriminate UC from CD in IBD subjects. In order to highlight these results, the distribution graph for UC and CD subjects is depicted in Figure 6c.
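Taken together, the two reported cut-off values suggest a simple two-step decision rule on the crosslinking ratio R. The sketch below only illustrates that rule: the function name is hypothetical, and the assignment of a value lying exactly on a threshold to the higher class is an assumption, since the text does not state it.

```python
def classify_by_ratio(r: float, ibd_cutoff: float = 1.2, uc_cutoff: float = 1.4) -> str:
    """Assign a class from the crosslinking ratio R using the two reported thresholds.

    Values exactly at a threshold go to the higher class (assumed convention).
    """
    if r < ibd_cutoff:
        return "non-IBD"
    if r < uc_cutoff:
        return "CD"
    return "UC"


if __name__ == "__main__":
    # Reported group averages of R for non-IBD, CD and UC subjects.
    for r in (0.93, 1.26, 1.45):
        print(f"R = {r:.2f} -> {classify_by_ratio(r)}")
```

Applied to the reported group averages (0.93, 1.26 and 1.45), the rule returns non-IBD, CD and UC, respectively.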

Discussion
The analysis of the amide I Raman band of calprotectin, deconvoluted into different sub-bands directly correlated with various protein secondary structures, proved to be an excellent tool to look at their spectral variations as a consequence of two inflammatory bowel diseases, Crohn's disease and ulcerative colitis, in childhood.
Starting from this, the purpose of this study was to provide an evaluation method whose distinctive feature lies in its ability to detect an inflammatory disease without any invasive procedure, such as in vivo analysis and/or biopsy, in an automated and highly efficient way. This method, described in detail in the previous section, at the same time furnished some information regarding the quality of the protein network induced by the inflammatory state. As a matter of fact, the observed enhancement of the nonreducible (trivalent)/reducible (divalent) crosslinking ratio, as accounted for by the parameter R, is indicative of an increase, upon inflammation, in the quantity of trivalent, nonreducible cross-links at the expense of the divalent, reducible ones, and/or as a consequence of their reduced formation. As a result, a more strongly interconnected protein network is conceivable, which will clearly result in altered mechanical and functional properties [57].
One could argue that an increase in the amide I band could also be ascribed to protein conformational changes induced by factors other than the inflammatory state, including aging, dehydration and radiological conditions [58][59][60]. Nevertheless, these factors do not affect the reported data, as the groups of subjects were very similar to each other, differing only with regard to intestinal inflammation.
The obtained results demonstrated the feasibility of Raman spectroscopy as a diagnostic modality for the rapid detection of IBD and, among IBD subjects, for a clear and unequivocal distinction between CD and UC. Beyond noninvasiveness, the proposed method offers significant advantages, including ease of use, experimental repeatability and speed of execution. In addition, it is extremely low-cost compared with the currently used diagnostic techniques, which require special diagnostic kits, and it is nondestructive: the sample can be stored and analyzed several times at a later stage, the only limit being the degradation of the sample itself. Encouraged by the obtained results, the next step of the research will concern the possibility of applying the proposed Raman-based method to the early categorization of disease in patients with a diagnosis of indeterminate colitis.

Conclusions
In the present paper, a rapid, nondestructive, cost-effective and noninvasive diagnostic method for the detection and discrimination of pediatric onset inflammatory bowel diseases (IBD), namely Crohn's disease (CD) and ulcerative colitis (UC), is presented. The proposed methodology makes use of Raman spectroscopy in order to probe the changes in secondary structures occurring in the protein extract from fecal samples of pediatric patients affected by IBD, i.e., CD or UC, with respect to subjects for whom IBD was excluded. The detected spectral modifications in the protein network were quantified, through a deconvolution procedure into Voigt profiles applied to the broad amide I vibrational band, by evaluating the cross-linking ratio R between nonreducible (trivalent) and reducible (divalent) structures. This parameter is found to increase significantly upon inflammation, suggesting a more interconnected protein network, and to a different extent in CD and in UC. Based on its value, the distinction between non-IBD and IBD subjects and, among the latter, between CD and UC, was achieved.
The accuracy, sensitivity, and specificity of the proposed diagnostic test were determined using the receiver operating characteristic (ROC) curve and the Youden index (J), which provided excellent results. The overall results suggest a possible use of the Raman technique as an early diagnostic tool for such classes of inflammatory bowel diseases of very high incidence, with enormous implications in facilitating, for example, early specific treatments, and for which, at the same time, biopsy and/or any other invasive procedure would no longer be necessary.

Patents
A patent application from the work reported in this manuscript is pending.