Protein Conformational Changes in Breast Cancer Sera Using Infrared Spectroscopic Analysis

Protein structural alterations, including misfolding and aggregation, are a hallmark of several diseases, including cancer. However, the clinical applicability of infrared spectroscopic analysis of protein conformation for detecting cancer-associated structural changes in proteins has not yet been established. The present study investigates the applicability of Fourier transform infrared (FTIR) spectroscopy to distinguish the sera of healthy individuals from those of breast cancer patients. Cancer-associated alterations in protein structure were analyzed by curve fitting the amide I band (1600–1700 cm−1) of the experimental spectra and by comparing the ratio of the absorbance values at the amide II and amide III bands; these were taken as the infrared spectral signatures. A snapshot of breast cancer-associated alterations in circulating DNA and RNA was also obtained by extending the spectral fitting protocol to the complex region of carbohydrates and nucleic acids, 1140–1000 cm−1. The signature representing the ratio of α-helix to β-pleated sheet structures in proteins had a sensitivity and specificity of 90% each. The ratio of amide II to amide III absorbance (I1556/I1295) had a sensitivity of 100% and a specificity of 80%. Thus, infrared spectroscopy can serve as a powerful tool both for understanding protein structural alterations and for distinguishing breast cancer from healthy serum samples.


Introduction
Breast cancer (BC) is the most common invasive cancer among women worldwide [1]. The International Agency for Research on Cancer (IARC) reports that BC accounts for 22.9% of invasive cancers in women [1,2]. At present, physical inspection and imaging remain the preferred methods for screening asymptomatic women for BC. Nonetheless, mammography, the gold standard, is costly, is not available in all medical centers, and has low sensitivity in young women and in dense breasts. Furthermore, BC typically produces few or no symptoms while the tumor is small and easily treatable [3]. Established mammography screening may miss up to 20% of underlying breast cancers [4]. It may also lead to a 30% rate of overdiagnosis and may increase unnecessary surgical […] hindered by the need for sampling protocols [33] and sophisticated data analysis tools. X-ray diffraction requires a well-ordered crystal [33], while NMR spectroscopy is limited to small proteins [34]. The data analysis protocols for these techniques are also complex, which complicates the interpretation of results. These limitations have motivated the development of alternative methods for determining protein structure.
FTIR spectroscopy is one such alternative method for protein secondary structure analysis [35][36][37]. In previous reports, FTIR spectroscopic investigations of protein secondary structure [38] in BC patient serum samples were validated by several other analytical techniques [39], such as X-ray diffraction, NMR [40], and circular dichroism (CD) spectroscopy [41]. The FTIR technique has also been tested with various sample types and conditions, including living cells [42], aqueous media [43], hydrogen deuteration [44] in serum [45], dehydration [46], and heat-induced denaturation of serum [47,48]. Additionally, spectral deconvolution [41] has been employed to diagnose or monitor various ailments, including prostate cancer [49], lymphoma [50], melanoma [50], Alzheimer's disease [51], Parkinson's disease [52], colitis [37], and scrapie [53]. The method has also been used to study protein-protein interactions [54] and the structure of calcium-binding proteins [55], and the literature covers its uses and misuses [56], its optimization [57], and instrumental improvements [58]. Protein structure and conformational changes [59], structural dynamics, and stability have also been determined using second derivative curves [60]. Overall, FTIR spectroscopy has emerged as a powerful tool for studying protein secondary structure and may prove clinically useful in the early diagnosis of disease.
In the present proof-of-concept pilot study, we used curve fitting of FTIR spectra to obtain the best fit reflecting protein conformational changes in the serum samples of BC patients. The curve fitting technique is also extended to the complex spectral region of carbohydrates and nucleic acids, 1000-1140 cm−1 [12,61,62]. By deconvoluting these regions of the experimental spectra with the corresponding Gaussian function energy bands (GFEB) of various biological components, we determined signatures that differentiate control and cancer spectra. Other infrared spectral markers, such as the peak positions of the absorbance curves and the ratio of absorbance values in the amide II and amide III bands, are also considered for discrimination. Statistical analysis is then performed on these spectral signatures to assess their discriminating potential. The underlying premise is that BC-associated genetic alterations in serum are reflected in the complex region of nucleic acids, including deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) [61,62]. Our discussion therefore also covers the possible application of genetic and proteomic molecular mapping of serum samples via FTIR spectroscopy for the earlier detection of BC. We incorporated statistical measures and used infrared spectral deconvolution to holistically evaluate the biochemical mapping of protein structures and circulating nucleic acid components. A unified fitting protocol for all samples and a potential prototype applicable in the clinical domain are also presented. These findings go beyond the earlier study [29], providing spectral signatures with higher sensitivities and specificities.
Similarly, the implementation of optimized experimental and data analysis protocols, and the quantification of spectral signatures by scrutinizing specific molecular entities rather than relying entirely on wide spectral ranges, are improvements over the earlier study [22].

Results
Using the absorbance spectra of serum samples (n = 10 each for BC and control), we investigated the applicability of FTIR spectroscopy to discriminate between control and cancer sera. Spectra were recorded in the attenuated total reflectance (ATR) sampling mode, and the control and test groups were discriminated using several data analysis techniques: multivariate analysis, p-value calculation, and quantification by spectral deconvolution followed by statistical analysis. Principal component analysis (PCA) [63] was performed first to give a holistic view of the variations in protein structural content reflected in the amide bands (amide I and II region, 1480-1600 cm−1). Each of the 10 samples was measured twice (measurement replicates), giving 20 spectra for BC and 20 for the control. Using the software "PAST (PAleontological STatistics) 4 - the Past of the Future" with the vector-normalized second derivative of the absorbance spectra within 1480-1600 cm−1 as input variables, we analyzed the variance-covariance matrix, with pairwise exclusion of missing values, to obtain the component plots. The component plot with 95% ellipses (Figure 1A) shows a clear separation between the two groups. In the scatter plot of PC1 (88% of variance) and PC2 (6% of variance), the control and BC spectra form separate clusters with different magnitudes and directions. The scree plot (Figure 1B) shows that PC1 and PC2 together account for 94% of the total variance. These PCA findings for the amide bands led us to investigate spectral signatures useful in the clinical domain.
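A minimal sketch of the PCA input pipeline described above, in Python with NumPy. It assumes a plain finite-difference second derivative (two passes of `np.gradient`) in place of whatever derivative filter was actually applied before PAST 4, and computes PCA from the singular value decomposition of mean-centred data, which yields the same axes as diagonalizing the variance-covariance matrix. The synthetic band positions, widths, and noise level are hypothetical and only illustrate the workflow; pairwise exclusion of missing values is not handled.

```python
import numpy as np


def second_derivative(spectra, wavenumbers):
    """Second derivative of each spectrum (row) along the wavenumber axis.

    Assumption: repeated central differences stand in for the (unspecified)
    smoothing derivative used in the original analysis.
    """
    first = np.gradient(spectra, wavenumbers, axis=1)
    return np.gradient(first, wavenumbers, axis=1)


def vector_normalize(spectra):
    """Scale each spectrum (row) to unit Euclidean length."""
    return spectra / np.linalg.norm(spectra, axis=1, keepdims=True)


def pca(X, n_components=2):
    """PCA via SVD of mean-centred data (same axes as the covariance matrix).

    Returns the scores of the first n_components and the fraction of the
    total variance explained by every component.
    """
    centred = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained


# Hypothetical example: 20 control and 20 BC spectra (10 samples x 2 replicates)
# whose amide II band is shifted by a few cm-1 between the groups.
rng = np.random.default_rng(0)
wn = np.linspace(1480, 1600, 121)


def band(center, width=15.0):
    return np.exp(-0.5 * ((wn - center) / width) ** 2)


control = np.array([band(1545) + 0.001 * rng.standard_normal(wn.size) for _ in range(20)])
bc = np.array([band(1552) + 0.001 * rng.standard_normal(wn.size) for _ in range(20)])

X = vector_normalize(second_derivative(np.vstack([control, bc]), wn))
scores, explained = pca(X)
print(f"PC1 explains {explained[0]:.0%}, PC2 explains {explained[1]:.0%}")
```

Plotting `scores[:20]` against `scores[20:]` would give a component plot analogous to Figure 1A for this toy data.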

Discrimination of Average Absorbance
The average normalized absorbance spectra of the control and BC sera, covering the fingerprint region of the biological functional groups (lipids, proteins, nucleic acids, and carbohydrates), are shown in Figure 2A. Visual inspection of the FTIR spectra alone does not discriminate the absorbance of the functional components of the two groups. However, comparing the absorbance spectra of the two groups with a two-tailed Student's t-test assuming unequal variances (Welch's t-test) revealed discriminating potential in the amide region (1541-1656 cm−1) and in the mixed region of carbohydrates and nucleic acids (1018-1076 cm−1), highlighted by the red ellipses in Figure 2B (p < 0.05). The prominent discriminatory regions include C=O/C-N stretching and N-H bending in amides, RNA/DNA nucleotides, and C-O vibrations of carbohydrates [64], as reported in previous studies [22]. Previous studies using principal component analysis-linear discriminant analysis (PCA-LDA) of FTIR spectra have also shown that healthy and cancerous serum samples have different characteristics [22].
The molecular assignments of the major spectral bands that discriminate between the control and BC groups (p < 0.05) are presented in Table 1. These bands originate from protein amides, carbohydrates, and nucleic acids. The amide I vibration arises mainly from C=O stretching, with minor contributions from out-of-phase C-N stretching, C-C-N deformation, and N-H in-plane bending [14]. Similarly, the mixed region of carbohydrates and nucleic acids results from C-O/C-C stretching, C-H bending, and symmetric phosphate stretching, νs(PO2−) [65]. The second derivative spectra show that the absorbance at the minima at 1629 and 1652 cm−1 differs between healthy individuals and BC patients (Figure 2C). The elevated absorbance in the 1018-1076 cm−1 band (Figure 2D) suggests differences in the glycomic profile [66] and circulating DNA [67] in the blood. Circulating DNA and glycomic profiles have proven to be critical molecular markers in several tumor entities [66,67].

Table 1. Discriminatory IR bands for BC serum samples versus controls, and the primary biomolecular assignments giving rise to the main contributions to the absorbance (taken from [68][69][70][71][72][73][74]).
Amide regions and the complex region of carbohydrates and nucleic acids show discriminating potential.

Wavenumber (cm−1) | Assignment
1700-1600 | Amide I: sensitive to protein secondary structure; arises mainly from C=O stretching vibrations, with contributions from C-N groups.
1580-1480 | Amide II: sensitive to protein conformation; originates mainly from the in-plane N-H bending mode coupled with C-N and C-C stretching vibrations.
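The per-wavenumber group comparison behind Figure 2B can be sketched as follows. This is a minimal, dependency-light illustration, assuming spectra are stored as NumPy arrays with one row per spectrum: it computes Welch's unequal-variance t statistic and the Welch-Satterthwaite degrees of freedom column by column, and obtains the two-sided p-value by numerically integrating the Student t density with Simpson's rule rather than calling a statistics library. The helper names and the toy data are hypothetical. Note that testing many wavenumbers at p < 0.05 without correction will flag some columns by chance.

```python
import math

import numpy as np


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom per column.

    a, b: arrays of shape (n_spectra, n_wavenumbers) for the two groups.
    Columns with zero variance in both groups would divide by zero.
    """
    na, nb = a.shape[0], b.shape[0]
    se2a = a.var(axis=0, ddof=1) / na   # squared standard error, group a
    se2b = b.var(axis=0, ddof=1) / nb   # squared standard error, group b
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(se2a + se2b)
    df = (se2a + se2b) ** 2 / (se2a**2 / (na - 1) + se2b**2 / (nb - 1))
    return t, df


def two_sided_p(t, df, n_steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the t density from 0 to |t|.

    The integral uses Simpson's rule, so n_steps must be even.
    """
    x = np.linspace(0.0, abs(t), n_steps + 1)
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    pdf = math.exp(log_norm) * (1.0 + x**2 / df) ** (-(df + 1) / 2)
    h = x[1] - x[0]
    integral = h / 3 * (pdf[0] + pdf[-1] + 4 * pdf[1:-1:2].sum() + 2 * pdf[2:-1:2].sum())
    return max(0.0, 1.0 - 2.0 * integral)


def discriminating_columns(control, cancer, alpha=0.05):
    """p-value per wavenumber and a boolean mask of columns with p < alpha."""
    t, df = welch_t(control, cancer)
    p = np.array([two_sided_p(float(ti), float(di)) for ti, di in zip(t, df)])
    return p, p < alpha


# Hypothetical toy data: 10 spectra per group, 5 "wavenumbers",
# with a real group difference only in column 2.
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=(10, 5))
cancer = rng.normal(0.0, 1.0, size=(10, 5))
cancer[:, 2] += 5.0
p, mask = discriminating_columns(control, cancer)
print(np.round(p, 4), mask)
```

If SciPy is available, `scipy.stats.ttest_ind(..., equal_var=False)` performs the same Welch test and is the more usual choice; the hand-rolled version is shown only to make the computation explicit.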

Cancers 2020, 12, 1708 6 of 17
Discrimination of Protein Secondary Structures
Figure 3A shows the average second derivative spectra in the amide I absorbance region. The minima of the second derivative spectra provide the number and approximate positions of the Gaussian bands required to fit the experimental curve. The amide I band of each baseline-corrected spectrum was therefore deconvoluted with six GFEB profiles whose number and positions were estimated from these minima; the simulated curve (dotted line) fits the experimental curve (solid line), as shown in Figure 3B. The six Gaussian bands are assigned to (a) side chain (~1610 cm−1), (b) β-sheet (~1630 cm−1), (c) random coil (~1645 cm−1), (d) α-helix (~1652 cm−1), (e) β-turn (~1682 cm−1), and (f) antiparallel β-sheet (~1690 cm−1) structures [75].
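The band-fitting step above can be sketched under simplifying assumptions: candidate centres are taken from the negative local minima of a finite-difference second derivative, and the Gaussians share one fixed width so that only their amplitudes need fitting, which reduces to linear least squares. A full fit, with widths and positions refined by nonlinear least squares, is closer to typical practice; the function names, the shared width, and the two-band toy spectrum are hypothetical.

```python
import numpy as np


def second_derivative_minima(wn, absorbance):
    """Wavenumbers of negative local minima of the second derivative.

    These serve as candidate band centres; wn must be monotonic.
    """
    d2 = np.gradient(np.gradient(absorbance, wn), wn)
    inner = d2[1:-1]
    is_min = (inner < d2[:-2]) & (inner < d2[2:]) & (inner < 0)
    return wn[1:-1][is_min]


def gaussian(wn, center, sigma):
    """Unit-height Gaussian band profile."""
    return np.exp(-0.5 * ((wn - center) / sigma) ** 2)


def fit_amplitudes(wn, absorbance, centers, sigma):
    """Least-squares amplitudes for Gaussians with fixed centres and shared width.

    Returns the amplitudes and the simulated (summed) curve. Amplitudes are
    not constrained to be non-negative in this sketch.
    """
    design = np.column_stack([gaussian(wn, c, sigma) for c in centers])
    amplitudes, *_ = np.linalg.lstsq(design, absorbance, rcond=None)
    return amplitudes, design @ amplitudes


# Hypothetical, noise-free example with two well-separated bands.
wn = np.linspace(1600, 1700, 101)
spectrum = 0.8 * gaussian(wn, 1630, 5.0) + 0.5 * gaussian(wn, 1655, 5.0)
centers = second_derivative_minima(wn, spectrum)
amplitudes, simulated = fit_amplitudes(wn, spectrum, centers, sigma=5.0)
print(centers, np.round(amplitudes, 3))
```

With real amide I data the six bands overlap strongly, so second derivative minima on noisy spectra usually need smoothing first, and a shared fixed width is only a starting point for refinement.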
To assess alterations in structural components associated with malignancy, the integral values of the α-helix and β-sheet structures and their ratios were analyzed. According to the Beer-Lambert law, the intensity of a GFEB is linearly related to concentration [76], whereas its width, expressed as the full width at half maximum (FWHM), is inversely related to the lifetime of the vibrational mode, which depends on the "rigidity" of the vibrating bond [35]. The interaction of a molecule with its immediate environment also affects the width of the GFEB [77]: if a molecule transfers energy to its surroundings, its spectral peak broadens and its intensity decreases even though its concentration is unchanged. In such cases, the integral area under the curve is a better indicator of concentration than intensity alone. We found that, although the levels of most structures did not differ between samples from breast cancer patients and healthy individuals, the breast cancer samples showed an increase in β-sheet structures and a decrease in α-helix structures (Figure 3C,D). In addition, the amide II region is used to report on protein unfolding through the extent of hydrogen exchange. Because it suffers little interference from water, the amide III region is also considered promising for analyzing protein structure. We therefore also used the ratio of the IR absorbance at amide II (I1556) to that at amide III (I1295) to analyze BC-associated protein alterations. Dot plots of this ratio are shown in Figure 3E.
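The quantities compared in Figure 3C-E can be computed as below once the bands are fitted. The sketch assumes Gaussian bands parameterized by amplitude A and standard deviation σ, whose integral is A·σ·√(2π) and whose FWHM is 2√(2 ln 2)·σ. It also illustrates the point made above: a band twice as wide but half as tall carries the same area, and hence the same concentration information, as the narrower one. The dictionary keys and function names are hypothetical, and reading I1556 and I1295 off the spectrum by linear interpolation is an assumption about how those intensities were extracted.

```python
import math

import numpy as np


def gaussian_area(amplitude, sigma):
    """Integral over all wavenumbers of amplitude * exp(-(x - c)^2 / (2 sigma^2))."""
    return amplitude * sigma * math.sqrt(2.0 * math.pi)


def fwhm(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


def alpha_beta_ratio(bands):
    """Ratio of alpha-helix to beta-sheet band areas.

    bands maps a label to (amplitude, sigma); the labels are hypothetical.
    """
    return gaussian_area(*bands["alpha_helix"]) / gaussian_area(*bands["beta_sheet"])


def amide_ratio(wn, absorbance, numerator=1556.0, denominator=1295.0):
    """Absorbance ratio I1556/I1295, interpolated linearly from the spectrum."""
    order = np.argsort(wn)  # np.interp requires increasing x values
    w = np.asarray(wn, dtype=float)[order]
    a = np.asarray(absorbance, dtype=float)[order]
    return float(np.interp(numerator, w, a) / np.interp(denominator, w, a))


# A band twice as wide but half as tall has the same area.
print(gaussian_area(1.0, 4.0), gaussian_area(0.5, 8.0))
print(alpha_beta_ratio({"alpha_helix": (0.6, 6.0), "beta_sheet": (0.4, 6.0)}))
```

Sorting inside `amide_ratio` lets it accept spectra stored in the common descending-wavenumber order.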

Figure 2. Identification of discriminatory bands. (A) Ensemble averages of normalized serum spectra derived from control (n = 10) and BC (n = 10) samples. This wider spectral range is presented to show the quality of the spectra, measured at a resolution of 4 cm−1 with minimal noise and atmospheric contamination. (B) Corresponding Student's t-test p-values for the control and BC groups. (C) Second derivative absorbance spectra confined to the amide I region, 1600-1700 cm−1. (D) Difference between the absorbance spectra of the control and BC groups (shown in Figure 2A), indicating up- and down-regulation of proteins, carbohydrates, and nucleic acids in the serum of breast cancer patients.

Figure 3. […] (C) Integral area of the GFEB representing α-helix and β-sheet structures. (D) Ratio of the α-helix and β-sheet energy bands, showing an elevation of β-sheet and a decrease of α-helix structures associated with malignancy. (E) Ratio of the IR absorbance at amide II (I1556) to that at amide III (I1295). (F) Receiver operating characteristic (ROC) curves for the ratio of the integral areas of the energy bands representing α-helix and β-sheet secondary structures and for the absorbance ratio at amide II and amide III. The maximum sensitivity and specificity are 90% and 90% for signature α/β, and 100% and 80% for signature I1556/I1295, respectively.

Receiver Operating Characteristic (ROC) Curves and Area Under the Curve (AUC) Values
Sensitivity and specificity are commonly used in biomedical research to describe the diagnostic accuracy of a test. The discriminating potential of a diagnostic regimen can be quantified by the area under the ROC curve (AUC) [78]; the ROC curves used to obtain the AUC values are shown in Figure 3F. The optimal cutoff value calculated for each spectral signature is used to classify samples as disease-positive or disease-negative and to estimate sensitivity and specificity. Diseased and control sera are strongly discriminated, with 90% sensitivity and 90% specificity for signature α/β and 100% sensitivity and 80% specificity for I1556/I1295. These results suggest that the spectral signatures in the specified bands have high diagnostic accuracy in this pilot cohort.
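A minimal sketch of the ROC analysis, assuming that higher signature values indicate disease (for α/β, which decreases in BC, the sign would be flipped first). The AUC is computed with the trapezoidal rule over the empirical ROC curve, and the "optimal cutoff" is taken here as the threshold maximizing Youden's index J = sensitivity + specificity - 1; the text does not state which criterion was used, so this choice is an assumption, as are the function names and the toy values.

```python
import numpy as np


def roc_curve(scores, labels):
    """Empirical ROC curve; a sample is called positive if score >= threshold.

    labels: 1 for BC, 0 for control. Returns false-positive rates,
    true-positive rates (sensitivities), and the thresholds used.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    positives, negatives = scores[labels == 1], scores[labels == 0]
    tpr = np.array([(positives >= t).mean() for t in thresholds])
    fpr = np.array([(negatives >= t).mean() for t in thresholds])
    return fpr, tpr, thresholds


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def youden_cutoff(fpr, tpr, thresholds):
    """Threshold maximizing Youden's J, with its sensitivity and specificity."""
    k = int(np.argmax(tpr - fpr))
    return float(thresholds[k]), float(tpr[k]), float(1.0 - fpr[k])


# Hypothetical signature values for 5 controls and 5 BC sera.
values = [0.8, 0.9, 1.0, 1.1, 1.3, 1.2, 1.4, 1.5, 1.6, 1.7]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
fpr, tpr, thr = roc_curve(values, labels)
print(auc(fpr, tpr), youden_cutoff(fpr, tpr, thr))
```

With only 10 samples per group, each misclassified sample moves sensitivity or specificity by 10 percentage points, which is why the reported values fall on multiples of 10%.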
As shown in Figure 4, in the α-helix the backbone N-H group of each residue donates a hydrogen bond to the backbone C=O group of the residue four positions earlier, forming the helical structure (Figure 4B). In β-sheets, by contrast, the backbone N-H groups of one strand form hydrogen bonds with the backbone C=O groups of adjacent strands (Figure 4A). The cancer-associated alterations in the integral ratio of α-helix to β-sheet secondary structures therefore suggest that protein conformational alterations, accompanied by changes in biological function, might be a key event during cancer development. Several studies have shown that serum proteins change during breast cancer [21][22][23][79]. The alterations in conformational composition are presumably due to altered concentrations of carcinoembryonic antigen (CEA) proteins [7].

Discussion
Protein analysis is considered a promising approach for understanding cancer progression, and FTIR spectral analysis is an accepted paradigm for the holistic evaluation of protein structural content at the molecular level in biological samples. Several studies have demonstrated the applicability of FTIR spectroscopy, combined with spectral analysis techniques, to serum samples for BC discrimination [22,23,29,80,81]. Reports [22,29] show the potential of FTIR spectroscopy for protein analysis in serum samples from BC patients. However, cancer initiation, progression, and response to therapy depend on an array of complex interactions among constituent biomolecules (proteins, lipids, nucleic acids, and carbohydrates), not only on a single biomarker or target molecule. The feasibility of using FTIR spectroscopy to capture a snapshot of cumulative molecular interactions in serum therefore warrants thorough investigation, enabled by interdisciplinary collaboration among spectroscopists, biologists, and clinicians. Notably, the serological biomarkers CA15-3, HSP90A, and PAI-1 do not show differences between BC cases and controls that are consistent enough to support diagnosis [7]. Our data show alterations in both the biochemical composition and the structure of the constituent components of serum. The holistic evaluation of biochemical details by infrared spectroscopy may therefore have considerable potential for BC discrimination in the clinical domain.

Deconvolution of Spectral Range 1140–1000 cm−1
To analyze the alterations reflected in our FTIR spectral data, the complex region of carbohydrates and nucleic acids [62], 1140–1000 cm−1, was deconvoluted. BC-associated alterations in DNA and RNA are reflected in this region, and circulating DNA and protein markers are commonly evaluated to track biomolecular events in cancer patients [82]. This spectral range was deconvoluted into six Gaussian function energy bands (GFEBs; Figure 5A), whose number and positions were estimated from the minima of the second-derivative spectrum. The sum of the integral areas of the six bands ranged from 11.4 to 13.2 in control samples and from 13.0 to 14.8 in BC samples. Statistical analysis of these values (Figure 5B) showed a significant difference between the control and BC groups. Similarly, Figure 5C shows a bar graph of the average absorbance at 1020 cm−1, which is attributed mainly to circulating DNA [61,62].

Figure 5. (A) Deconvolution of the 1140–1000 cm−1 region with six GFEBs; the number and position of the six bands used to fit the experimental curve were determined from the minima of the second-derivative curves, as in the amide I case. (B) Bar graph of the average integral sum, showing a significant difference between the control and BC cases. (C) Bar graph of the average absorbance at 1020 cm−1, which is mainly due to the presence of DNA; it also shows a significant difference between the control and BC cases.
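The two quantities compared in Figure 5B and 5C can be computed as in the following minimal Python sketch. It assumes each fitted band is described by an amplitude a, a centre, and a Gaussian width σ, so that its integral area is a·σ·√(2π); the function names and the band parameters in the usage line are hypothetical illustrations, not values from this study. Reading the absorbance at 1020 cm−1 by linear interpolation is also an assumption about how a value between grid points would be obtained.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)


def gaussian_band_area(amplitude, sigma):
    """Analytic area of a*exp(-(x - c)**2 / (2*sigma**2)), i.e. a*|sigma|*sqrt(2*pi)."""
    return amplitude * abs(sigma) * SQRT_2PI


def integral_sum(bands):
    """Sum of band areas; `bands` is a list of (amplitude, center, sigma) tuples."""
    return float(sum(gaussian_band_area(a, s) for a, _center, s in bands))


def absorbance_at(wavenumbers, absorbance, target=1020.0):
    """Linearly interpolated absorbance at `target` (cm^-1).

    np.interp requires increasing x values, so the spectrum is sorted first
    (spectra are often stored in decreasing-wavenumber order).
    """
    wn = np.asarray(wavenumbers, dtype=float)
    ab = np.asarray(absorbance, dtype=float)
    order = np.argsort(wn)
    return float(np.interp(target, wn[order], ab[order]))


# Usage with made-up band parameters (not data from this study):
example_bands = [(0.10, 1120.0, 6.0), (0.25, 1080.0, 8.0), (0.30, 1040.0, 7.0)]
print(integral_sum(example_bands))
```

Using the analytic Gaussian area avoids any dependence on the grid spacing of the measured spectrum.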

Cancers 2020, 12, 1708 9 of 17

Potential Prototype for Clinical Application
A prototype of the presented diagnostic regimen could also be developed for clinical use. Spectral measurement and data analysis would be automated into a single step, so that a technician deposits the sample on the sample holder and starts the measurement with a single click to obtain the result and, if needed, the underlying biochemical information (Figure 6). In this design, an attenuated total reflection Fourier transform infrared (ATR-FTIR) spectrometer, which is reliable for body-fluid analysis, is integrated with two microcontrollers: microcontroller A controls all functions of the FTIR instrument and acquires the signal arising from the sample interaction, while microcontroller B runs the data-analysis software. The software would include several subroutines: reading spectral data from the FTIR, extracting the spectral region that contains the relevant signatures, and normalizing and baseline-correcting the spectra. These subroutines require only simple loops, conditional checks, and basic arithmetic. The second derivative would be calculated with divided-difference formulas for discrete data. After locating the minima and their positions, the program would assign initial parameters to the Gaussian energy bands and refine them with the Levenberg-Marquardt algorithm to minimize the RMS error between the experimental data and the fitted curves. Standard numerical integration would then give the area under each Gaussian band and the corresponding area ratios. Finally, by combining all identified spectral signatures into a single diagnostic index, a portable device with a user-friendly desktop unit, which performs the full data analysis automatically and displays a laboratory test report, could be built.
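The second-derivative and minimum-finding subroutines described above could look like the following Python sketch. It is an illustration under stated assumptions, not the prototype's software: second_derivative and band_positions are hypothetical names, the second derivative uses the three-point second divided difference f''(x1) ≈ 2·f[x0, x1, x2] (valid for uneven or decreasing wavenumber grids), and only negative local minima are treated as candidate band positions.

```python
import numpy as np


def second_derivative(x, y):
    """Second derivative from second divided differences.

    f''(x1) ~= 2 * f[x0, x1, x2]; the formula is symmetric in its nodes, so it
    works for uneven or decreasing x. Endpoints are left as NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0, x1, x2 = x[:-2], x[1:-1], x[2:]
    y0, y1, y2 = y[:-2], y[1:-1], y[2:]
    divided = (y0 / ((x0 - x1) * (x0 - x2))
               + y1 / ((x1 - x0) * (x1 - x2))
               + y2 / ((x2 - x0) * (x2 - x1)))
    d2 = np.full(y.shape, np.nan)
    d2[1:-1] = 2.0 * divided
    return d2


def band_positions(x, y, max_bands=None):
    """Wavenumbers of negative local minima of the second derivative.

    If max_bands is given (six in the text), only the deepest minima are kept.
    """
    x = np.asarray(x, dtype=float)
    d2 = second_derivative(x, y)
    # Comparisons with the NaN endpoints are False, so they are never selected.
    is_min = (d2[1:-1] < d2[:-2]) & (d2[1:-1] < d2[2:]) & (d2[1:-1] < 0)
    idx = np.nonzero(is_min)[0] + 1
    if max_bands is not None:
        idx = idx[np.argsort(d2[idx])][:max_bands]  # most negative first
    return np.sort(x[idx])
```

On real spectra, noise produces many shallow minima, so some smoothing or a depth threshold would likely be needed before keeping the six deepest.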
All in all, FTIR spectroscopy of serum samples could be a promising basis for an Affordable, Sensitive, Specific, User-friendly, Robust and rapid, Equipment-free, and Deliverable (ASSURED) regimen for evaluating BC-associated molecular alterations in the structure of constituent proteins. Our study holds value because available techniques such as mammography, MRI, and ultrasonography have limitations and are not 100% accurate [4,83–85]. Among them, MRI achieves a high sensitivity of 70–100% in initial (prevalence) screening of patients at high risk of developing BC, compared with 40% or less for mammography [84,85], but its specificity is limited by the difficulty of distinguishing the overlapping features of benign and malignant lesions, which leads to higher false-positive rates [83]. Ultrasonography fails to detect microcalcifications and has poor specificity. Therefore, the present diagnostic regimen for BC, which has the potential to promote timely onward referral of patients for further testing and detection of recurrent disease, "enabling serial sample and testing with less cost, resource and radiation exposure", could benefit many patients.

Human Sera
Human sera from breast cancer patients were obtained from the Breast Satellite Tissue Bank, Winship Cancer Institute, Emory University, Atlanta, GA, USA. Sample collection followed the guidelines of the Declaration of Helsinki, and informed consent was obtained from all patients (females aged 30–65 years; see Table S1 for details). Blood was collected without additives and centrifuged at ~3200 g for 10 min; the serum was then pipetted off and stored at −80 °C until analysis. Control sera came from the baseline collection of healthy women (aged 41–58 years) participating in an independent intervention study (approval number 13317, Edith Cowan University, Perth, Australia). All participants provided informed consent. The sera were thawed, aliquoted in small volumes, and stored at −80 °C until analysis.

FTIR Spectrometer
Spectral data were obtained using a Bruker Vertex-70 FTIR spectrometer fitted with a potassium bromide (KBr) beam splitter and a deuterated triglycine sulfate (DTGS) detector, together with an MVP-Pro ATR accessory with a single-reflection diamond crystal. Spectra were recorded at a resolution of 4 cm−1 with a zero-filling factor of four and a medium apodization function (Blackman-Harris three-term). This choice balances resolution against noise: weak apodization yields higher resolution but increases noise, and medium apodization such as the Blackman-Harris three-term function is typically recommended for liquids, gels, and semi-solids [86]. The aperture size was set to 2.5 mm to optimize the detector response without saturation.
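To illustrate what apodization and zero-filling do to an interferogram, here is a simplified Python sketch. It assumes the commonly tabulated minimum three-term Blackman-Harris coefficients (0.42323, 0.49755, 0.07922); whether OPUS uses exactly these coefficients is an assumption, and the sketch omits the phase correction a real FTIR instrument applies. blackman_harris_3term and apodize_and_transform are hypothetical helper names.

```python
import numpy as np

# Assumed minimum three-term Blackman-Harris coefficients (a0, a1, a2);
# they sum to 1, so the window equals 1 at its centre.
BH3 = (0.42323, 0.49755, 0.07922)


def blackman_harris_3term(n):
    """Symmetric three-term Blackman-Harris window of length n."""
    a0, a1, a2 = BH3
    phase = 2.0 * np.pi * np.arange(n) / (n - 1)
    return a0 - a1 * np.cos(phase) + a2 * np.cos(2.0 * phase)


def apodize_and_transform(interferogram, zero_fill=4):
    """Window the interferogram, zero-fill to zero_fill * N points, return |FFT|.

    No phase correction is applied; this only shows the effect of the window
    and of zero-filling on the magnitude spectrum.
    """
    ifg = np.asarray(interferogram, dtype=float)
    windowed = ifg * blackman_harris_3term(ifg.size)
    # rfft pads the windowed signal with zeros up to n points.
    return np.abs(np.fft.rfft(windowed, n=zero_fill * ifg.size))
```

Zero-filling by a factor of four only interpolates the spectrum onto a finer point spacing; the true resolution is still set by the interferogram length, which corresponds to the nominal 4 cm−1 here.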

ATR-FTIR Spectral Measurements
Before use, the ATR crystal surface (in the FTIR light path) was thoroughly cleaned with sterile phosphate-buffered saline and ethanol to remove residual deposits. Cleanliness was confirmed by checking that the absorbance spectrum recorded without a sample contained no peaks above the noise level. Before each spectral scan, a background spectrum of the cleaned crystal surface was collected and removed from the sample signal spectrum. One microliter of each sample was deposited, allowed to equilibrate to room temperature (~4 min), and then scanned multiple times to yield high-quality, reproducible spectra. Variations due to moisture were minimized by drying the serum samples, as described previously [46].
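In absorbance terms, the background step and the cleanliness check can be sketched in Python as follows. The absorbance is computed as A = −log10(I_sample/I_background), the usual way a single-beam sample spectrum is ratioed against a background (equivalently, a subtraction in log space). The acceptance rule in crystal_is_clean, which compares the largest deviation of a blank spectrum with the peak-to-peak noise of a band-free region scaled by a factor k, is a hypothetical criterion: the text does not state how "no peaks above the noise level" was quantified, and both function names are illustrative.

```python
import numpy as np


def absorbance(sample_single_beam, background_single_beam):
    """A = -log10(I_sample / I_background) for single-beam intensity spectra."""
    s = np.asarray(sample_single_beam, dtype=float)
    b = np.asarray(background_single_beam, dtype=float)
    return -np.log10(s / b)


def crystal_is_clean(blank_absorbance, noise_region, k=1.5):
    """Hypothetical clean-crystal test on a spectrum recorded without sample.

    noise_region: boolean mask or index array selecting a band-free region.
    k: hypothetical multiplier on the peak-to-peak noise level.
    """
    blank = np.asarray(blank_absorbance, dtype=float)
    noise = blank[noise_region]
    noise_level = np.ptp(noise)                    # peak-to-peak noise
    deviation = np.abs(blank - np.median(noise))   # distance from noise centre
    return bool(deviation.max() <= k * noise_level)
```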

Spectral Analysis
Each sample was scanned over the range 400–4000 cm−1 until at least eight high-quality spectral curves were obtained, and statistical analysis was carried out on spectra obtained from 100 co-added scans for each sample. A total of 20 serum spectra were obtained, representing 20 individuals (10 controls and 10 BC patients). For the multivariate analysis, each sample was measured a second time, giving 20 control spectra and 20 BC spectra. For the multivariate analysis, the second-derivative curves of the absorbance spectra were vector-normalized, whereas for all other analyses the absorbance spectra were min-max normalized [13] within the fingerprint region of 1800–900 cm−1 using OPUS 6.5 software. The absorbance at the amide I band position (~1642 cm−1) was 2 AU, corresponding to ~99% absorption according to the Beer-Lambert law [76]. The 1800–900 cm−1 region was selected to avoid the strong moisture absorption region. We then studied the settling and uniformity of the serum samples while drying on the crystal surface to determine the optimum settling time for reproducible curves. Because the 1800–900 cm−1 region showed excellent reproducibility, only the proteomic signatures in this range were analyzed in this study.
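The normalization steps named above can be sketched in Python as follows. Min-max normalization maps each spectrum onto [0, 1]; vector normalization is implemented as mean-centring followed by division by the Euclidean norm, a common definition that is assumed here rather than confirmed for OPUS 6.5. fraction_absorbed reproduces the Beer-Lambert arithmetic behind the ~99% figure: A = 2 gives a transmittance of 10^−2 = 0.01. All function names are hypothetical.

```python
import numpy as np


def crop(wavenumbers, spectrum, lo=900.0, hi=1800.0):
    """Keep only points inside the fingerprint region lo..hi cm^-1."""
    wn = np.asarray(wavenumbers, dtype=float)
    sp = np.asarray(spectrum, dtype=float)
    keep = (wn >= lo) & (wn <= hi)
    return wn[keep], sp[keep]


def min_max_normalize(spectrum):
    """Scale a spectrum linearly so its minimum is 0 and its maximum is 1."""
    sp = np.asarray(spectrum, dtype=float)
    return (sp - sp.min()) / (sp.max() - sp.min())


def vector_normalize(spectrum):
    """Mean-centre, then divide by the Euclidean norm (assumed definition)."""
    sp = np.asarray(spectrum, dtype=float)
    centred = sp - sp.mean()
    return centred / np.linalg.norm(centred)


def fraction_absorbed(absorbance):
    """Beer-Lambert: transmittance T = 10**(-A), absorbed fraction = 1 - T."""
    return 1.0 - 10.0 ** (-absorbance)
```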
The complex amide I band was deconvoluted into six GFEBs after normalizing the spectra, extracting the relevant band region, and applying a rubber-band baseline correction so that the absorbance at the endpoints became zero. The complex region of carbohydrates and nucleic acids, 1140–1000 cm−1, was likewise deconvoluted into six GFEBs. The number and positions of the GFEBs used to fit the experimental curves were determined from the minima of the second derivatives of the original spectra. The experimental curves were fitted with the Levenberg-Marquardt algorithm by minimizing the RMS error between the experimental and fitted curves; a low RMS error indicates a successful GFEB fit [56].
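A minimal Python sketch of the baseline correction and Gaussian fitting steps follows, assuming NumPy and SciPy are available. The rubber-band baseline is taken as the lower convex hull of the spectrum, so both endpoints become zero after subtraction; the fit uses scipy.optimize.curve_fit with method="lm", a Levenberg-Marquardt least-squares solver that minimizes the sum of squared residuals and therefore the RMS error. Band areas use the analytic Gaussian integral instead of numerical integration. The helper names and the initial width sigma0 are hypothetical; initial centres would come from second-derivative minima, and the assignment of fitted bands to α-helix or β-sheet is not encoded here.

```python
import numpy as np
from scipy.optimize import curve_fit


def rubber_band_baseline(x, y):
    """Rubber-band baseline: lower convex hull of (x, y), interpolated onto x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    hull = []  # indices (into xs) of lower-hull vertices, built left to right
    for i in range(xs.size):
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            cross = (xs[a] - xs[o]) * (ys[i] - ys[o]) - (ys[a] - ys[o]) * (xs[i] - xs[o])
            if cross <= 0:  # a lies on or above the chord o -> i: not on lower hull
                hull.pop()
            else:
                break
        hull.append(i)
    baseline = np.empty_like(ys)
    baseline[order] = np.interp(xs, xs[hull], ys[hull])
    return baseline


def gaussian_sum(x, *params):
    """Sum of Gaussians; params is a flat sequence (amplitude, center, sigma, ...)."""
    total = np.zeros_like(x, dtype=float)
    for a, c, s in zip(params[0::3], params[1::3], params[2::3]):
        total += a * np.exp(-((x - c) ** 2) / (2.0 * s ** 2))
    return total


def fit_bands(x, y, centers, sigma0=8.0):
    """Levenberg-Marquardt fit of len(centers) Gaussian bands.

    Returns (bands, areas, rms) with bands as (amplitude, center, sigma) tuples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p0 = []
    for c in centers:
        i = int(np.argmin(np.abs(x - c)))  # start amplitude from the data
        p0 += [max(float(y[i]), 1e-6), float(c), sigma0]
    params, _cov = curve_fit(gaussian_sum, x, y, p0=p0, method="lm", maxfev=20000)
    rms = float(np.sqrt(np.mean((y - gaussian_sum(x, *params)) ** 2)))
    bands = [tuple(params[k:k + 3]) for k in range(0, len(params), 3)]
    # Analytic Gaussian area a*|sigma|*sqrt(2*pi) replaces numerical integration.
    areas = [a * abs(s) * np.sqrt(2.0 * np.pi) for a, _c, s in bands]
    return bands, areas, rms
```

An α/β signature would then be the summed area of the bands assigned to α-helix divided by the summed area of those assigned to β-sheet.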

Conclusions
The ratio of the integral areas of the GFEBs representing α-helix and β-sheet protein secondary structures (α/β) and the absorbance ratio I1556/I1295 were identified as distinctive proteomic spectral signatures for BC discrimination. The discriminating potential of each signature, including its sensitivity and specificity, was assessed using the area under the curve (AUC) of receiver operating characteristic (ROC) curves (Table 2). The maximum sensitivity and specificity achieved by each feature quantify how well it separates BC from control sera. For the α-helix/β-sheet ratio, the AUC, sensitivity, and specificity were 0.955, 80%, and 100%, respectively (p = 1.4 × 10−4), while for the I1556/I1295 ratio they were 0.98, 100%, and 80%, respectively (p = 2.7 × 10−5).

Table 2. Identifying BC-associated discriminatory protein bands in serum samples. These include the integral ratio of Gaussian Function Energy Bands (GFEB), representing α-helix and β-sheet protein secondary structures, as well as the absorbance ratio of amide II (1556 cm−1) to amide III (1295 cm−1). Quantified values (in A.U.) of the average and range of spectral signatures taken from the control and BC samples. The optimal cutoff and the corresponding sensitivity, specificity, and p-values are also shown.
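The ROC quantities reported above can be sketched in Python as follows. roc_auc uses the rank (Mann-Whitney) form of the AUC: the fraction of BC/control pairs in which the BC value is larger, with ties counted as one half. The text does not say how the optimal cutoff was chosen, so best_cutoff maximizes Youden's J (sensitivity + specificity − 1) as an assumed criterion, and it assumes higher values indicate BC (negate the values if a signature decreases in BC). The p-values in Table 2 are not reproduced because the statistical test used is not stated in this section; both function names are hypothetical.

```python
import numpy as np


def roc_auc(control, cancer):
    """AUC = P(BC value > control value), ties counted as 1/2 (Mann-Whitney form)."""
    c = np.asarray(control, dtype=float)[:, None]
    b = np.asarray(cancer, dtype=float)[None, :]
    return float(np.mean((b > c) + 0.5 * (b == c)))


def best_cutoff(control, cancer):
    """Cutoff maximizing Youden's J; a sample is called BC when value >= cutoff.

    Returns (cutoff, sensitivity, specificity).
    """
    control = np.asarray(control, dtype=float)
    cancer = np.asarray(cancer, dtype=float)
    best = (-1.0, None, None, None)
    for t in np.unique(np.concatenate([control, cancer])):
        sens = np.mean(cancer >= t)   # BC samples correctly called BC
        spec = np.mean(control < t)   # controls correctly called control
        j = sens + spec - 1.0
        if j > best[0]:
            best = (j, float(t), float(sens), float(spec))
    return best[1:]
```

With 10 samples per group, as here, sensitivity and specificity can only change in steps of 10%, which is consistent with the 80%, 90%, and 100% values reported.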

In conclusion, this study provides evidence that BC-associated conformational changes of proteins in the serum of BC patients can be analyzed by curve fitting of infrared spectral data. The study also offers detailed insight into the protein structural changes that occur in BC patients, paving the way for further large-scale studies. It presents a new regimen for BC discrimination that would allow the assessment of disease status and therapeutic efficacy, and it also demonstrates discrimination based on curve fitting of the nucleic acid and carbohydrate region. Simultaneous fitting of these absorption bands provides a more robust basis for structural studies of proteins and complex band contours. Future research directions are also outlined. The potential of spectrometric assessment of serum protein conformation for breast cancer diagnosis and monitoring, and the relevance of serum protein conformational changes to cancer development, merit further investigation toward establishing a successful clinical technique.