Non-Invasive Disease Specific Biomarker Detection Using Infrared Spectroscopy: A Review

Many life-threatening diseases remain hidden in their early stages. Symptoms appear only at an advanced stage, when the survival rate is poor. A non-invasive diagnostic tool may be able to identify disease even at the asymptomatic stage and save lives. Diagnostics based on volatile metabolites hold a lot of promise to fulfil this demand. Many experimental techniques are being developed to establish a reliable non-invasive diagnostic tool; however, none of them is yet able to fulfil clinicians’ demands. Infrared spectroscopy-based analysis of gaseous biofluids has demonstrated promising results towards meeting clinicians’ expectations. Recent developments in the standard operating procedure (SOP), sample measurement, and data analysis techniques for infrared spectroscopy are summarized in this review article. It also outlines the applicability of infrared spectroscopy to identifying specific biomarkers for diseases such as diabetes, acute gastritis caused by bacterial infection, cerebral palsy, and prostate cancer.


Role of Metabolites in Diagnosis
Many life-threatening diseases develop silently in the human body, and symptoms appear only when the disease has already reached an advanced stage. In this regard, cancer [1], heart diseases [2], stroke [3], chronic obstructive pulmonary disease (COPD) [4], diabetes [5], Alzheimer's [6], etc., are among the deadliest diseases and cause millions of deaths worldwide every year. The primary cause of the high death rate of these diseases is the lack of diagnostic techniques for their early stages. In addition, most diagnostics are expensive and demand invasive sample collection (especially for cancer), which may create a physical risk as well as psychological stress [7,8]. For example, until now, tissue biopsy has been the only reliable diagnostic method for cancer. However, it is an invasive process; therefore, as a rule of thumb, tissue biopsy is used only in definite symptomatic cases. Unfortunately, by the time of detection, the disease is already at an advanced stage, and in many cases it is too late. Several attempts have been made to develop non-invasive detection of cancer. For example, great efforts are being made to develop imaging methods for tumour detection [9,10]. Unfortunately, for technical reasons, small tumours cannot be detected at their initial stage by imaging methods. This means that only advanced-stage diagnosis is possible, which gives cancer cells a chance to metastasize. Alternatively, metabolic analysis of biofluids has already demonstrated its power to detect many diseases [11]. For example, many life-threatening diseases such as diabetes, Alzheimer's, cardiovascular disease, stroke, schizophrenia, etc., are characterized by changes in metabolites in the biofluids [12]. The advantage lies in the sample collection process, which makes these two biofluids attractive bio-probes for the development of non-invasive diagnostics.
In this review, the development of fully non-invasive diagnostics based on infrared spectroscopy is discussed.

Mass Spectrometry
Diagnosis of disease by smelling the body has been practised since ancient times; however, it had never been studied scientifically until recent years [26]. Recently it was reported that sniffer dogs were able to detect malignant tumours [27,28]. This was first noticed by a 44-year-old woman whose dog constantly sniffed at a mole on her left thigh while showing no interest in her other moles. This particular mole was excised, and histological examination confirmed malignant melanoma. Williams et al. concluded in their report in The Lancet "Perhaps malignant tumours such as melanoma, with their aberrant protein synthesis, emit unique odours which though undetectable to man, are easily detected by dogs with their well-developed rhinencephalon". This observation and report are very promising; however, in the twenty-first century, diagnosis by dogs cannot be a reliable technique for cancer detection. To my knowledge, Linus Pauling, the American chemist, first reported a scientifically systematic study of VOCs by gas-liquid partition chromatography [29]. Along with his colleagues, he identified and reported around two hundred VOCs. Up to now, more than three thousand VOCs have been identified [26,30]. In practice, most of them were identified by different mass spectrometry methods, e.g., gas chromatography-mass spectrometry (GC-MS) [31], ion mobility spectrometry (IMS) [32,33], proton transfer reaction mass spectrometry (PTR-MS) [34], selected ion flow tube mass spectrometry (SIFT-MS) [35], etc. Mass spectrometry has a fairly high sensitivity, at the level of 100 ppt (parts-per-trillion) [36]; however, due to the complex and not completely controlled process of sample preparation, it suffers from poor accuracy and, as a result, poor reproducibility in VOC analysis. This makes statistical data sets unreliable [37].
An indirect confirmation of this statement is the insufficient accuracy of the corresponding diagnostics (the benchmark is an accuracy above 90%), revealed by analysing the literature data for a very wide range of diseases. The accuracy becomes higher when one GC-MS instrument and one operator are used throughout the study [38]. This dependence on instrument and operator puts a question mark over the reliable use of the technique for medical diagnosis. Moreover, mass spectrometers are very expensive and bulky.

e-Nose and QEPAS
Another developing instrument, the so-called electronic nose (e-nose) [39,40], is an attractive experimental tool, especially with regard to the cost and size of the instrument. It uses an array of chemical sensors to mimic the human olfactory system. Pattern recognition using machine learning makes the e-nose a user-friendly technique for the recognition of VOCs in real time. A nanostructure-based e-nose device is extremely small [41]. Additionally, e-nose sensors are more cost-effective than any other existing VOC detection technique. However, because the e-nose is a "black box", different research groups have obtained different results with it; moreover, it cannot reveal the individual metabolites [42]. Therefore, more investigation is necessary before its introduction into clinical application as a supplementary diagnostic technique. The recently developed quartz-enhanced photoacoustic spectroscopy (QEPAS) [43,44] for multi-gas detection holds a lot of promise as a diagnostic tool; however, its applicability in clinical diagnosis still has to be investigated carefully.

Infrared Spectroscopy
Compared to MS and e-nose, infrared spectroscopy represents the most fundamental technique for the detection and identification of molecules. Infrared spectroscopy is well established as a powerful and widely used tool for the analysis of molecular structure and dynamics in the fields of chemistry and physics [45][46][47][48][49]. It uses molecular vibrations as a probe to identify the molecule by structural analysis [50][51][52][53][54][55][56][57]. In practice, infrared light is used to excite molecular bonds, and the absorption of light by their vibrations is measured. Each chemical bond has a unique vibrational energy [58,59]; therefore, it absorbs a specific wavelength of infrared light. As a result, each molecule produces a set of unique spectral features in the acquired infrared spectra. The set of unique spectral features from a particular molecule is called the fingerprint of the molecule [60]. The position, strength, and shape of the molecular fingerprint are used to identify and quantify a molecule in a mixture of molecules in biofluids [61][62][63].

Application of Infrared Spectroscopy to Biofluids and Tissue
As mentioned in the previous subsection, infrared light excites the molecular bonds only; therefore, molecules remain unperturbed. Additionally, the unique behaviour of each molecular bond allows for label-free extraction of biochemical information from biofluids and tissues. These two properties in particular make infrared spectroscopy an attractive tool for biomarker-based early disease diagnostics. For example, blood-based cancer diagnosis by infrared spectroscopy is a rapidly expanding research area [64]. A multi-institutional study of infrared fingerprints of plasma and serum samples demonstrated promising results for distinguishing different cancer types [65]. The infrared spectroscopic method has also been employed for the diagnosis of diseases such as type 2 diabetes [66], HIV [67], etc., by blood sample analysis. In a recent study, salivary vibrational modes were analysed by attenuated total reflection-Fourier transform infrared (ATR-FTIR) spectroscopy to distinguish between healthy persons and COVID-19 patients [68]. As a biofluid, urine is also analysed by FTIR spectroscopy for the diagnosis of different renal diseases [63] and diabetes [69]. FTIR imaging of tissue is the fastest diagnostic method for the histopathology of breast cancer [70,71]. The success of infrared spectroscopy in the analysis of liquid biofluids and tissue naturally motivated researchers to apply the technique to gas-phase biofluid analysis. However, this powerful tool cannot be used in a straightforward way, especially for the identification of VOCs in gaseous biofluids. The working principle, advantages, and limitations of infrared spectroscopy for VOC analysis are briefly discussed in the following sections.

Infrared Spectroscopy of Gaseous Biofluids
The major obstacle to infrared spectroscopy of biological samples is the large amount of water vapour contained in gaseous biofluids. For example, a breath sample of a healthy person contains 5-7% water vapour under normal conditions [72,73]. Such a high water content not only absorbs a large amount of the infrared intensity; the absorption spectrum of water also covers most of the important spectral region where many VOCs yield their fingerprints. As a result, spectral features of VOCs at lower (trace) concentrations are buried under the water spectrum. In these circumstances, infrared spectroscopy is practically unable to reveal most of the VOCs. Therefore, the primary task is to remove water vapour from the biofluids in order to exploit the capabilities of infrared spectroscopy. Recent progress in water vapour suppression opens a new window for investigating volatile metabolites in gaseous biofluids [74]. A brief description of the water vapour suppression technique is presented in the following section.
A schematic of the experimental setup is presented in Figure 1. A detailed description of the experimental setup and its working principle was presented in a previous publication [74]. There are three major units in the experimental setup, namely, (1) a sample collector, (2) a sample preparation unit, and (3) an FTIR spectrometer. The sample collector is designed in such a way that it is able to accept gaseous samples as well as the headspace of liquid biofluids [75]. Gaseous biofluid, e.g., breath, is collected in a Tedlar bag or canister, and liquid biofluids, e.g., urine, blood, etc., are collected in a specially designed, well-sealed glass flask. Before the sample is injected into the sample collector, the complete system is evacuated by two vacuum pumps to remove any trace of contamination from previous measurements. Breath samples or the headspace of liquid biofluids are transferred to the empty sample collector by releasing the valve. Note that in the case of liquid biofluids, the flask needs to be partially filled with the biofluid, which is kept in the flask for a sufficient time to allow volatile compounds to escape from the liquid by evaporation. The VOCs then accumulate above the liquid as headspace. Unlike breath, headspace sample collection aimed at further spectroscopic analysis still has no established standard operating procedure (SOP).
Figure 1 (caption fragment). (2) Sample preparation: a water-suppressed sample is prepared for infrared spectroscopy when gaseous biofluids pass through the "Water Condenser"; (3) spectral analysis: the water-suppressed gaseous sample is collected in a multipass gas cell and measured with an FTIR spectrometer. For details see Ref. [74].
Biofluids collected from the test person often need to be stored temporarily. There are established protocols for storing liquid biofluids; however, gaseous biofluid storage is still under development [76,77]. One of the common storage devices for VOCs is the Tenax sorbent tube. The stability of VOCs trapped from breath samples in a Tenax sorbent tube and stored at −80 °C was studied over a year [78]. A significant loss was observed, and after 6 months only 27% of the sample was recovered. The study recommended storing VOCs for no more than 1.5 months. For infrared spectroscopy, sample storage in Tenax tubes has not yet been tested. Breath samples were stored in Tedlar bags at 4 °C and tested carefully for several weeks. An approximately 30% drop in carbon monoxide was observed; however, other molecules such as methane, acetone, isoprene, etc., remained constant [74,79]. This observation was made for only a few molecules. To establish an SOP, further investigation with a large number of VOCs is necessary. On the other hand, storage of breath samples in Tedlar bags requires significant storage space. Therefore, to optimize the storage capacity and the reliability of the biomarker investigation, it is recommended to perform the measurement within a week.
The water condenser is the part of the experimental setup where water vapour is suppressed by condensation. It is a closed metal chamber containing a 12 m long, spiral copper tube, through which a gas-phase biofluid is transferred from the sample collector to a measurement cell attached to an FTIR spectrometer. The metal chamber in the water condenser is filled with a special liquid that operates over a wide temperature range, between −95 °C and +45 °C. Before the gas-phase biofluid is allowed to pass through the water condenser, the liquid is cooled down to −60 °C by a refrigerated circulator. At −60 °C, the sample is transferred through the spiral copper tube with a precisely controlled flow rate of 3 mL/s. A significant amount of water vapour is removed from the sample as it passes through the cold copper tube. A reduction factor of above 2500 is achieved at −60 °C when a breath sample is transferred through the water condenser.
Finally, the water-suppressed gas-phase biofluid is transferred to the multipass sample cell. After each experiment, the copper tube is cleaned by heating the special liquid to 45 °C with a heat circulator and evacuating the tube with the vacuum pumps. A detailed description of the setup and its working principle was reported in a separate article [74].
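The effect of the reduction factor quoted above can be made concrete with a short calculation. The following is a minimal sketch: it takes the 5-7% water content of breath and the reduction factor of 2500 from the text, while the function name `residual_water_ppm` is hypothetical and introduced only for illustration.

```python
def residual_water_ppm(initial_percent: float, reduction_factor: float) -> float:
    """Return the water vapour left after suppression, in ppm by volume."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    initial_ppm = initial_percent * 10_000  # 1 % corresponds to 10,000 ppm
    return initial_ppm / reduction_factor


# 5-7 % water in breath and a reduction factor of 2500 are taken from the text.
for percent in (5.0, 6.0, 7.0):
    remaining = residual_water_ppm(percent, 2500.0)
    print(f"{percent:.0f} % water vapour -> about {remaining:.0f} ppm after the condenser")
```

Since the reported factor is "above 2500", the 20-28 ppm obtained here should be read as an upper estimate of the residual water vapour.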
After the water suppression, the gaseous biofluid is transferred to a multipass cell placed inside the FTIR spectrometer. The infrared light travels approximately 4 m inside the cell. Infrared absorption spectra of the biofluid are collected by a liquid nitrogen-cooled MCT detector in the range from 500 cm⁻¹ to 4000 cm⁻¹. Usually, two different types of experiments need to be performed for biofluid analysis. One is a single-case study that aims to find out which metabolites are present in the sample [75]; the other aims to find the statistical difference between healthy volunteers and disease groups. In the first case, spectral data are analysed by component analysis, and in the second case a statistical analysis is performed. In component analysis, known molecular fingerprints are used to fit the observed spectral features [80]. Statistical analysis is performed using unsupervised as well as supervised statistical methods.
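The component analysis described above, in which known fingerprints are fitted to the observed spectrum, can be sketched as an ordinary linear least-squares problem. This is only an illustration under stated assumptions: the two reference spectra are synthetic Gaussian bands with made-up positions and widths, the helper names (`gaussian_band`, `solve_linear`, `fit_components`) are hypothetical, and the actual fitting procedure of Ref. [80] may differ.

```python
import math


def gaussian_band(wavenumbers, centre, width, height=1.0):
    """Hypothetical absorption band used only to build synthetic test spectra."""
    return [height * math.exp(-0.5 * ((w - centre) / width) ** 2) for w in wavenumbers]


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("reference spectra are (nearly) linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_components(observed, references):
    """Least-squares weights w_k minimising |observed - sum_k w_k * references[k]|^2."""
    k = len(references)
    gram = [[sum(a * b for a, b in zip(references[i], references[j])) for j in range(k)]
            for i in range(k)]
    proj = [sum(a * o for a, o in zip(references[i], observed)) for i in range(k)]
    return solve_linear(gram, proj)


grid = [1150 + i for i in range(301)]  # 1150..1450 cm^-1 in 1 cm^-1 steps
# Two made-up reference spectra; band positions and shapes are illustrative only.
ref_a = [p + q for p, q in zip(gaussian_band(grid, 1217, 15),
                               gaussian_band(grid, 1365, 12, 0.8))]
ref_b = gaussian_band(grid, 1300, 10)
observed = [1.1 * a + 0.4 * b for a, b in zip(ref_a, ref_b)]

weights = fit_components(observed, [ref_a, ref_b])
print([round(w, 3) for w in weights])
```

On this noise-free synthetic mixture the fit recovers weights close to 1.1 and 0.4. With real spectra, the weights would still have to be converted to concentrations using the calibration of the reference library and the optical path length.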

Islands of Stability (IOS)
Identification and quantification of the metabolites of an individual are important for understanding the internal chemistry of the body; however, they may be even more important for disease diagnosis if a reliable correlation between metabolites and disease can be found. Along with the reliable identification of metabolites, it is essential to monitor the dynamics of metabolite concentrations over time for the diagnosis of diseases. The existence of an individual metabolic phenotype (IMP) has been under discussion for several years [81][82][83]. Long-term stability has been reported for several biofluids [84,85]; however, until recent years, the stability of volatile metabolites in breath samples had been reported only over a few days [77,86]. The first long-term study, spanning several months, has been reported recently. A few volatile metabolites in human breath were detected and quantified by infrared spectroscopy over a period of eighteen months [17]. The observations are summarized in the following subsections.

Effect of Physical Exercise
It is well known that metabolite concentrations change due to diet, lifestyle, sports, health conditions, etc. However, the question is: what is the range of change in the concentration of specific metabolites? Many studies have been reported for each specific case. For example, breath components have been measured during physical exercise [87][88][89][90][91]; however, there was no information about the dynamics of metabolite concentrations after exercise. Knowledge about the "after-effect" of any circumstance is crucial for a reliable diagnosis. A follow-up study of breath samples was reported recently. In this study, a healthy volunteer jogged for 30 min and donated breath samples "just before" and "just after" jogging as well as follow-up samples for two hours. The infrared spectroscopic method was used to monitor the concentration of VOCs. Many VOCs showed a change in concentration during jogging and gradually returned to their normal state within a short time. As an illustration, the dynamics of carbon dioxide are presented in Figure 2a,b. A sharp rise in CO₂ concentration was observed with increasing jogging time. After jogging, the CO₂ concentration dropped; however, it did so at a slower rate than it had increased. Two hours after jogging, the CO₂ concentration had returned to its normal state (Figure 2b). Similar behaviour is expected for other kinds of physical activity. This information is important for developing the standard operating procedure (SOP) for breath sample collection.

Effect of Coffee Drinking
Foods and drinks may have the strongest and most immediate influence on metabolism [88,92,93,94,95,96]. Therefore, a strong effect of foods and drinks is expected in the breath VOCs. Of course, the change in metabolite concentration is mild in the case of regular food and drinks; however, a strong effect is observed in the case of a new type of food or drink. A classic example, the coffee effect, was reported recently [17]. Two healthy volunteers of similar age, one a moderate coffee drinker (>5 cups/day) and the other an occasional coffee drinker (1 cup/two weeks), took part in the study. Both of them drank the same kind of coffee and provided breath samples at regular intervals, starting "just before" drinking the coffee and continuing until three hours after the coffee intake. The experiment was performed and the results were compared without any attention to the individual's physical state or previous meal intake. Infrared spectral data were analysed by component analysis as well as statistical analysis, e.g., principal component analysis (PCA). Results are presented in Figure 3a,b. The maximum variation in metabolite concentration was observed about an hour after coffee intake. The concentration of metabolites then gradually decreased and returned to its steady state. The maximum shift from the steady-state point and the return to the steady state reflect the characteristics of the two volunteers. In the case of the rare coffee drinker, the shift was significantly larger, and it took much longer to return to a steady state than for the moderate coffee drinker. This observation is expected, since coffee is an unfamiliar substance for the rare coffee drinker's body, which reacts strongly to it. This information is crucial for establishing an SOP for the metabolic analysis of biofluids.
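To illustrate how PCA condenses a time series of spectra into a trajectory such as the one described above, the sketch below extracts only the first principal component by power iteration. It is an assumption-laden toy example: the data are synthetic (a fixed hypothetical spectral pattern scaled by a response that rises and returns to baseline), the function name is hypothetical, and the cited study may have used a different PCA implementation with more components.

```python
import math


def first_principal_component(rows, iterations=100):
    """Leading principal axis and scores of mean-centred data via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]

    def project(axis):
        return [sum(x * w for x, w in zip(row, axis)) for row in centred]

    axis = [float(j + 1) for j in range(d)]  # arbitrary non-zero starting direction
    for _ in range(iterations):
        scores = project(axis)
        # Multiply by X^T X without forming the covariance matrix explicitly.
        axis = [sum(scores[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(w * w for w in axis))
        if norm == 0.0:
            raise ValueError("no variance along the starting direction")
        axis = [w / norm for w in axis]
    return axis, project(axis)


# Synthetic "spectra": baseline plus a fixed pattern scaled by a time-dependent response.
pattern = [0.0, 1.0, 2.0, 0.5]                # hypothetical spectral change
baseline = [1.0, 1.0, 1.0, 1.0]
response = [0.0, 1.0, 3.0, 2.0, 1.0, 0.0]     # rises, peaks, returns to steady state
samples = [[b + t * p for b, p in zip(baseline, pattern)] for t in response]

axis, scores = first_principal_component(samples)
print([round(s, 2) for s in scores])
```

In this toy case the scores peak at the third sample and the first and last samples coincide, which mirrors the qualitative picture of a shift away from, and return to, the steady state.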

Effect of Fasting
Fasting is not a regular human activity; however, it is a part of our life. Especially in the modern lifestyle, people often skip one or two meals a day. Fasting has a strong influence on the metabolic profile of a person [95][96][97]. As a consequence of fasting, the concentration of acetone changes significantly. In a recent publication, a follow-up investigation of breath metabolites after fasting was reported [17]. The "after-effect" of 27 h of fasting was monitored for four hours. Many metabolites showed a significant shift in their concentration level due to fasting. For example, the variations in the concentrations of acetone and carbon monoxide are plotted in Figure 4a. After 24 h of fasting, the breath acetone level of a healthy person had increased approximately threefold. At the end of fasting, the volunteer had a normal meal. The acetone level then started to drop but was still two times higher than the normal level 4 h after the meal, whereas a variation of only 20% in acetone was observed over a circadian cycle. Therefore, it is extremely important to know the time of the last food intake before collecting the biofluids, especially for diagnostic purposes. Otherwise, an abnormal change in acetone concentration may mislead the diagnosis.
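The comparison made above between the fasting effect and the ordinary daily variation can be written as a simple check. The sketch below assumes that the 20% circadian variation can be treated as a symmetric band around a personal baseline; that interpretation, the normalised units, and the function names are assumptions for illustration only.

```python
def relative_change(measured: float, baseline: float) -> float:
    """Fractional deviation of a measured level from the personal baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (measured - baseline) / baseline


def outside_circadian_band(measured: float, baseline: float, band: float = 0.20) -> bool:
    """True if the deviation exceeds the assumed +/-20 % circadian variation."""
    return abs(relative_change(measured, baseline)) > band


baseline = 1.0  # acetone level normalised to the usual value (hypothetical units)
cases = [("end of fasting", 3.0), ("4 h after the meal", 2.0), ("ordinary day", 1.15)]
for label, level in cases:
    print(label, outside_circadian_band(level, baseline))
```

The threefold and twofold levels fall well outside the band, while a 15% fluctuation does not, which is why the time of the last meal has to be recorded with the sample.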

Circadian Variation of Metabolites in Breath
Over the circadian cycle, human activity ranges widely, from sleep to high levels of mental and physical activity. These changes may have strong influences on the concentration of metabolites in biofluids [98]. In general, the dynamics of metabolites are expected to follow the circadian rhythm. Knowledge of how the metabolite composition changes over the circadian cycle [98] is essential for the development of an SOP for metabolic analysis. The variations in acetone and isoprene over 27 h for a healthy person are plotted in Figure 4b. A strong increase in isoprene concentration was observed during sleep. After awakening, the isoprene concentration dropped very fast until lunchtime and remained low until the evening. Much supporting evidence suggests that isoprene is related to cholesterol biosynthesis [99,100]. Therefore, measurement of isoprene could potentially be used to diagnose lipid disorders. Isoprene has also been reported as a biomarker for cancer screening [101]. These reports call for specific knowledge about the concentrations of isoprene and acetone for the reliable diagnosis of these diseases. Since isoprene and acetone have a high dynamic range over the circadian cycle, it is important to know their concentration levels at the time of sample collection. This information is essential for upgrading the SOP for the diagnosis of disease.

Longitudinal Study of Metabolite Stability
The dynamics of metabolites discussed so far cover short time periods, from hours to a single day. What would be the behaviour of metabolites over a longer time period, such as months or even years? In a recent investigation, several healthy volunteers took part in a longitudinal breath study by infrared spectroscopy. Several VOCs were monitored over a period of eighteen months, and the observations are plotted in three-dimensional VOC space in Figure 5a,b. For each individual, the measured points in three-dimensional component space are enclosed by a circle, oval, or triangle. For eleven out of fourteen individuals, the measured points were enclosed in a compact space and clearly separated from each other (see Figure 5a). This means that each individual's breath content remained stable over the period of eighteen months and the overall recognition score of the individuals, i.e., the probability of unambiguously linking a non-assigned experimental point to one or another individual, was 100%. For three individuals, the enclosures were relatively large in methane-acetone-isoprene space (Figure 5a) and lay well apart from those of the other eleven individuals. These three individuals were shifted far away from the others only along the methane axis, and their points were also elongated along the methane axis. These three individuals were high methane emitters, and their methane concentration varied over time [102]. The measured points for these three individuals in CO-acetone-isoprene space (Figure 5b) were compact; however, three smokers, whose points had been compact in the previous representation, were separated out. In the long-term monitoring of metabolites for individual healthy persons, it was observed that, for each individual, a set of metabolites in breath samples remains reproducible for at least 18 months.
This unique behaviour of an individual's breath components was conceptualized as an "island of stability" (IOS) in a multidimensional metabolite concentration space. The concept is depicted in Figure 6. The IOS approach allows us to represent any physiological data of an individual, as well as the effects that cause their variations, in a quantifiable way. In the IOS representation, the physiological parameters of an individual are reduced to a point in a multidimensional metabolite space, (n₁ ± δn₁, …, nᵢ ± δnᵢ), with δnᵢ the concentration variation of the ith metabolite during the longitudinal study. This set is called an IOS. In the mathematical representation, the set of averaged values (n₁, …, nᵢ) represents the IOS core, i.e., the individual's data at the highest precision. In normal physiological conditions, the IOS of an individual represents a dressed state (or the noise-affected state containing all contributions to δnᵢ), marked in grey in Figure 6. Different sizes of grey areas represent different effects that influence the concentration of metabolites. When these effects cease, the concentration of metabolites returns to the IOS. For disease cases, however, the return to the IOS takes a long path, and in the case of chronic disease the shifted state does not return to the core.
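The IOS representation described above can be sketched numerically. In the example below, the IOS core is taken as the mean concentration of each metabolite and δn as its sample standard deviation over the longitudinal series; the choice of the standard deviation, the two-sigma acceptance limit, the concentration values, and the function names are all assumptions made for illustration and are not taken from Ref. [17].

```python
from statistics import mean, stdev


def island_of_stability(measurements):
    """Map each metabolite to (core, spread) from a list of longitudinal samples."""
    island = {}
    for name in measurements[0]:
        values = [sample[name] for sample in measurements]
        island[name] = (mean(values), stdev(values))
    return island


def inside_island(sample, island, n_sigma=2.0):
    """True if every metabolite lies within n_sigma spreads of its IOS core."""
    return all(abs(sample[name] - core) <= n_sigma * spread
               for name, (core, spread) in island.items())


# Hypothetical longitudinal breath data for one person (concentrations in ppm).
history = [
    {"acetone": 0.50, "isoprene": 0.10},
    {"acetone": 0.55, "isoprene": 0.12},
    {"acetone": 0.45, "isoprene": 0.11},
    {"acetone": 0.52, "isoprene": 0.09},
    {"acetone": 0.48, "isoprene": 0.08},
]
island = island_of_stability(history)

print(inside_island({"acetone": 0.53, "isoprene": 0.11}, island))  # ordinary day
print(inside_island({"acetone": 1.50, "isoprene": 0.10}, island))  # e.g. after fasting
```

A sample that falls outside the island is not a diagnosis in itself; as the text notes, short-lived effects return to the IOS, so repeated sampling is needed to tell a transient excursion from a persistent shift.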
A disease is a change in the equilibrium state of the body, which may affect the metabolism of the body [103]. Due to the change in metabolism, the characteristics of the contents of the biofluids (metabolites) are also expected to change. With sensitive instruments for analysing the constituents and/or concentrations of as many metabolites as possible, it is possible to follow the state of the body and, more importantly, to try to recognize a disease at its early (asymptomatic) stage [17]. During the 18 months of the study, some of the volunteers suffered from the common cold, pollen allergy, and viral and bacterial infections. The concentration of breath VOCs during the recovery time was monitored. The evolution path of the metabolites is drawn in Figure 7a,b. Although the monitoring was not systematic, the many cases of intermediate states and of return to the steady-state concentration of the metabolites after recovery indicate that it is possible to monitor the progression of a disease from its asymptomatic state. A discussion of diseases and potential biomarkers is presented in the following sections.
Figure 6. An illustration of the IOS concept for an individual. Any physiological parameters of the body can be presented on this graph. The space of representation can be blind (PC, canonical analysis) or show measurable variables such as VOC concentrations (VOCC) as its axes. Shown: the light grey area represents several daily factors affecting the IOS core and thus increasing the measurable IOS size; the medium grey area, factors affecting the IOS on a weekly or monthly scale, such as fasting or coffee intake for rare coffee drinkers; the dark grey area, extraordinary effects such as strong stress or disease. There are two main scales making the concept quantifiable: the core size a and the strength of the effect b, c, etc. In the case of VOCC representation, scale parameters a and b are reduced to n and δn. The concept can be extended to many individuals. In this case, two other scales should be used: a and l, where l is the distance between the IOS cores. The higher the space dimensionality, the more cross sections can be found where any two persons will have l > a. (Ref. [17]).
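The l > a comparison mentioned in the Figure 6 caption can be illustrated with a small calculation. The sketch below is hedged in several ways: the core size a is taken as the largest distance of a person's points from their own core and the larger of the two values is used, the concentrations are invented, and the function names are hypothetical. It also shows why additional dimensions help: two people who are indistinguishable along one axis may be separated in another cross section.

```python
import math


def core(points, axes):
    """Mean position of the points, restricted to the chosen axes."""
    return [sum(p[a] for p in points) / len(points) for a in axes]


def core_size(points, axes):
    """Assumed definition of a: largest distance of any point from the core."""
    centre = core(points, axes)
    return max(math.dist([p[a] for a in axes], centre) for p in points)


def separated(points_1, points_2, axes):
    """True if the distance between cores exceeds the larger core size (l > a)."""
    distance = math.dist(core(points_1, axes), core(points_2, axes))
    size = max(core_size(points_1, axes), core_size(points_2, axes))
    return distance > size


# Hypothetical (methane, acetone, isoprene) concentrations in ppm for two people.
person_1 = [(2.0, 0.50, 0.10), (2.1, 0.55, 0.11), (1.9, 0.45, 0.09)]
person_2 = [(2.0, 1.00, 0.20), (2.1, 1.05, 0.21), (1.9, 0.95, 0.19)]

print(separated(person_1, person_2, axes=(0,)))       # methane axis only
print(separated(person_1, person_2, axes=(0, 1, 2)))  # full three-dimensional space
```

A stricter non-overlap test would compare the distance with the sum of the two core sizes; the simpler criterion is used here because it follows the wording of the caption.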

Diagnosis and Potential Biomarkers
It is an established fact that metabolites carry unique chemical information specific to the cellular processes of human beings. By characterizing metabolites, chemical processes in the cellular constitution can be revealed. In a steady health condition, the constituent metabolites in biofluids are in equilibrium. In the case of any abnormalities in cells, the reaction rates of the biochemical processes may change. As a result, the constituent metabolites in biofluids deviate from their equilibrium state. By revealing the characteristic change in metabolites, it may be possible to identify the cells that are in an abnormal state. This is the key point of diagnosing disease by analysing metabolites, and thus a new branch of science has evolved, called metabolomics [104,105]. Metabolites related to a specific disease are called biomarkers. In pathology, chemical analyses of urine and blood are routine procedures for diagnosing and monitoring many diseases such as diabetes, Alzheimer's, cardiovascular disease, prostate cancer, etc. Still, such analysis is not considered an independent diagnostic tool but rather a supportive one. Significant work is going on to find reliable disease-specific metabolites, i.e., biomarkers [16,106,107,108,109,110,111]. A few disease-specific biomarkers are discussed below.

Disease Specific Biomarkers
For the diagnosis of a disease, it is necessary to follow the IOS of the person under investigation. Any deviation in the concentration of metabolites from the IOS indicates an abnormality of the body which may relate to a disease [17]. To find specific biomarkers for a particular disease, a common practice is to compare the biofluid of a diseased person with that of a healthy person. In the case of infrared spectroscopy, spectral features in the infrared spectra of biofluids may differ in one or many places between the healthy and disease cases. Those spectral regions are chosen for further analysis. To identify the biomarker, known molecular spectra are fitted to the identified spectral features in breath by least-squares fitting. The best-fitting molecule is the possible biomarker for the disease. Finally, a statistical analysis is performed to identify as many discriminating variables as possible between the healthy and diseased groups.
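The first step of the procedure above, choosing spectral regions in which healthy and disease spectra differ, can be sketched as follows. The review does not state which criterion is used, so the score below (difference of group means in units of the pooled standard deviation, with a threshold of 3) is purely an assumption, as are the tiny synthetic spectra and the function name.

```python
from statistics import mean, stdev


def differing_channels(healthy, diseased, threshold=3.0):
    """Indices of channels whose group means differ by more than `threshold`
    pooled standard deviations (a simple, assumed selection criterion)."""
    flagged = []
    for j in range(len(healthy[0])):
        h = [spectrum[j] for spectrum in healthy]
        d = [spectrum[j] for spectrum in diseased]
        difference = abs(mean(d) - mean(h))
        pooled = ((stdev(h) ** 2 + stdev(d) ** 2) / 2) ** 0.5
        if pooled == 0.0:
            # No spread in either group: any difference at all is flagged.
            if difference > 0.0:
                flagged.append(j)
            continue
        if difference / pooled > threshold:
            flagged.append(j)
    return flagged


# Synthetic absorbances on four spectral channels (three spectra per group).
healthy = [[0.10, 0.20, 0.30, 0.10], [0.11, 0.21, 0.29, 0.11], [0.09, 0.19, 0.31, 0.09]]
diseased = [[0.10, 0.20, 0.50, 0.11], [0.12, 0.19, 0.52, 0.10], [0.09, 0.21, 0.48, 0.09]]

print(differing_channels(healthy, diseased))
```

Only the channels flagged in this way would then be passed on to the least-squares fit against known molecular spectra. With realistic group sizes, a proper statistical test with correction for multiple comparisons would be preferable to a fixed threshold.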

Diabetes
A metabolic disorder is a common heath issue among a large population in a modern society [5,112]. The metabolic disorder is commonly called diabetes. In general, a high blood sugar level is observed over a prolonged period of time in the case of diabetes [20]. It develops silently in the body and turns into a chronic disease. Over time, diabetes can cause serious health problems, such as heart disease, vision loss, kidney disease, etc. [21]. Biochemical analysis of blood is a routine procedure for clinical diagnosis and monitoring diabetes [113]. However, the invasive collection of the blood samples makes the diagnosis unpleasant for regular monitoring of blood sugar. Chemical analysis of saliva is a promising method for diagnosis and monitoring of sugar levels in the body [114]. The technique needs to be developed further before being accepted as a reliable clinical application. It is an established fact from ancient history that the body odour of diabetic patients smells sweet due to the presence of excess acetone in their biofluids, such as sweat, breath, etc. Recently many studies manifested that exhaled breath of diabetic patients contains considerably higher acetone than a healthy person [109,[115][116][117][118]. Most of those breath measurements were performed by mass spectroscopic methods. As already mentioned, due to high cost, bulky size, and complicated sample preparation procedure, mass-spectroscopy is not yet a reliable technique for the diagnosis of diabetes. On the other hand, infrared spectroscopy has many advantages for detecting acetone in the biofluids [74,119,120]. In infrared spectra, acetone yields prominent distinguishable spectral features, which makes them easier to identify in gaseous biofluids. Quantum cascade laser has been used to analyse exhaled breath of patients with type 1 diabetes [120]. Acetone was identified, but due to the strong absorption line of water, the spectral feature of acetone looked quite noisy (see Figure 2 in Ref. 
[120]). In this experiment, water was suppressed in the breath samples, but the suppression was not sufficient. In a recent experiment, a thermal source-based FTIR spectrometer was used to analyse exhaled breath samples from healthy volunteers. The sample was sent through a water condenser, and a strong suppression of water vapour was achieved. The absorption spectra of breath in the fingerprint region of acetone are presented in Figure 8. Two broad peaks centred at 1217 cm⁻¹ and 1365 cm⁻¹ indicate the presence of acetone in the breath sample. The peak at 1217 cm⁻¹ (black line) is practically noise-free and fits quite well with the reference spectrum of acetone (PNNL [121], red line). However, the right-hand spectral feature of the breath spectrum, centred at 1365 cm⁻¹, is slightly higher in amplitude than the corresponding acetone peak. Using a previously developed data analysis technique, it was confirmed that this peak results from the combined absorption of acetone, aldehyde, tetramethylurea, and some other unidentified molecules [122]. The acetone concentration was calculated by least-squares fitting. The measured concentration of acetone for this particular volunteer is ∼1.1 ppm. The reported concentration of acetone for patients suffering from type 1 diabetes is between 1.5 and 2.2 ppm [117,118]. The acetone fingerprints presented here are for a single volunteer. Acetone concentration was also measured and monitored for a large number of healthy volunteers by infrared spectroscopy [17]. In all cases, the acetone concentrations are well below the range reported for type 1 diabetes. The measured acetone level of healthy volunteers is also supported by acetone concentrations measured by mass spectroscopy [115,123]. Therefore, it can be concluded that infrared spectroscopy can be an alternative diagnostic method for diabetes. 
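The least-squares step used to estimate the acetone concentration can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the procedure of Ref. [122]: the measured absorbance is modelled as a reference spectrum (normalized to 1 ppm) scaled by the concentration, plus a constant baseline offset. The Gaussian band shape, its height and width, the constant-baseline model, and the names `gaussian_band` and `fit_concentration` are hypothetical; a real analysis would use the PNNL reference spectrum of acetone and account for overlapping absorbers.

```python
import math


def gaussian_band(wavenumbers, centre, width, height):
    """Synthetic absorption band (a stand-in for a real reference spectrum)."""
    return [height * math.exp(-0.5 * ((w - centre) / width) ** 2)
            for w in wavenumbers]


def fit_concentration(measured, reference_1ppm):
    """Fit measured ~ c * reference_1ppm + b by ordinary least squares.

    Returns (c, b): c is the concentration in ppm (the reference is assumed
    to be normalized to 1 ppm at the same path length), b is a constant
    baseline offset. Solves the 2x2 normal equations in closed form.
    """
    n = len(measured)
    if n != len(reference_1ppm) or n < 2:
        raise ValueError("spectra must share the same grid (>= 2 points)")
    sum_r = sum(reference_1ppm)
    sum_m = sum(measured)
    sum_rr = sum(r * r for r in reference_1ppm)
    sum_rm = sum(r * m for r, m in zip(reference_1ppm, measured))
    det = n * sum_rr - sum_r * sum_r
    if det == 0:
        raise ValueError("reference spectrum is constant; fit is undefined")
    c = (n * sum_rm - sum_r * sum_m) / det
    b = (sum_m - c * sum_r) / n
    return c, b


# Synthetic example: a band near 1217 cm^-1 (hypothetical shape and strength).
wavenumbers = list(range(1180, 1261))
reference_1ppm = gaussian_band(wavenumbers, 1217.0, 12.0, 2.0e-4)
# "Measured" spectrum built from 1.1 ppm plus a small baseline offset.
measured = [1.1 * r + 1.0e-5 for r in reference_1ppm]

concentration_ppm, baseline = fit_concentration(measured, reference_1ppm)
print(round(concentration_ppm, 3), baseline)
```

Including the offset term keeps a residual baseline from being absorbed into the concentration estimate. With several overlapping molecules, as at 1365 cm⁻¹, the same idea extends to a multi-column linear fit.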
The precise measurements of the time evolution of acetone concentration over the circadian cycle and after fasting by infrared spectroscopy further strengthen the argument (see

Antibiotic Treatment
Bacterial infection is a common cause of many diseases, e.g., cholera, tuberculosis, and pneumonia. Many recent studies have even found evidence that some cancers are initiated by bacteria [124–127]. Therefore, early detection of bacterial infection and monitoring of its evolution are necessary for the diagnosis and treatment of many diseases. Current diagnostic methods for detecting bacterial infection mostly rely on the culture of microorganisms from different biofluids. This approach is laborious and relatively time-consuming. In addition, culture-based diagnosis suffers from many pre-analytical limitations that may affect the performance of bacterial detection. For example, inadequate volume of the collected biofluid, prior antibiotic exposure, contamination, and delays in laboratory processing are some of the main pre-analytical factors. In many cases, reliable identification of the infection and susceptibility testing may take a few days. Contamination is a frequent problem that may drive inappropriate antibiotic use, misdirect clinical diagnosis, and expose patients to unnecessary toxicities [128]. Many microbiological methods have been developed recently that improve the accuracy and speed of detection of bacterial infection; however, the inherent problems of the culture-based approach remain [129,130].
Apart from the detection of bacterial infection, understanding the population dynamics of the microbiota helps to develop an efficient antibiotic treatment. Current knowledge of the deviations of the human microbiota caused by antibiotic treatment is limited [131]. To improve it, the deviation of breath VOCs of a volunteer undergoing a quadruple antibiotic course (QAC) against Helicobacter pylori has been studied by infrared spectroscopy [132]. Two spectral regions were identified in which the corresponding spectral structures deviate strongly during the antibiotic treatment [132]. Both spectral features, along with the time trace of one of them, are shown in Figure 9a-c. The spectral feature of methane at around 3000 cm⁻¹ is strongly modified by some unknown spectral features. Using the digital subtraction method, a prominent spectral feature is revealed (see Figure 9b). This feature fits well with the methyl butyrate absorption peak at around 2970 cm⁻¹. A detailed description of the digital subtraction procedure is presented in a separate article [122]. Two other prominent spectral features are observed in Figure 9b. The spectral feature at around 1130 cm⁻¹ fits well with the absorption peak of ethyl pyruvate, and the peak at around 1170 cm⁻¹ fits well with methyl butyrate. Both identified molecules are generated by bacteria in the gut and are involved in fundamental metabolic processes [107,133]. Therefore, both metabolites could be used for monitoring acute gastritis and anti-Helicobacter pylori treatment. The time trace of the absorption peak at 2970 cm⁻¹ is plotted in Figure 9c. During infection by Helicobacter pylori, the concentration of methyl butyrate is quite high. After the antibiotic treatment, the concentration drops relatively slowly and takes more than 10 days to return to its normal level. 
To the best of my knowledge, this is the first demonstration by infrared spectroscopy of the dynamics of breath VOCs in acute gastritis under a quadruple antibiotic course. Therefore, it can be concluded that infrared spectroscopy is capable of identifying possible biomarkers for bacterial infection. Of course, this needs to be verified with other bacteria as well. In an ongoing project, many bacteria are being identified by analysing the bacterial headspace by infrared spectroscopy.
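The idea behind digital subtraction can be sketched in a few lines of Python. The actual procedure is described in Ref. [122]; the sketch below only assumes the simplest possible variant, a point-wise difference between a spectrum recorded during infection and a baseline spectrum of the same person on the same wavenumber grid. All spectra are synthetic Gaussian bands, and the band positions, widths, heights, and function names are hypothetical values chosen for illustration.

```python
import math


def band(wavenumbers, centre, width, height):
    """Synthetic Gaussian absorption band (illustrative only)."""
    return [height * math.exp(-0.5 * ((w - centre) / width) ** 2)
            for w in wavenumbers]


def difference_spectrum(sample, baseline):
    """Point-wise difference of two spectra sampled on the same grid."""
    if len(sample) != len(baseline):
        raise ValueError("spectra must be sampled on the same grid")
    return [s - b for s, b in zip(sample, baseline)]


def peak_position(wavenumbers, spectrum):
    """Wavenumber at which the spectrum reaches its maximum."""
    index = max(range(len(spectrum)), key=lambda k: spectrum[k])
    return wavenumbers[index]


wavenumbers = list(range(2900, 3101))
# Strong background band (methane-like, hypothetical parameters).
healthy = band(wavenumbers, 3017, 25, 0.020)
# Weak additional band that is hidden under the background band.
extra = band(wavenumbers, 2970, 8, 0.004)
infected = [h + e for h, e in zip(healthy, extra)]

residual = difference_spectrum(infected, healthy)
raw_peak = peak_position(wavenumbers, infected)
hidden_peak = peak_position(wavenumbers, residual)
print(raw_peak, hidden_peak)
```

In the raw spectrum the maximum is set by the strong background band, whereas the weak band becomes the dominant feature of the difference spectrum. Real breath spectra would also require baseline correction and scaling before subtraction, which this sketch leaves out.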

Cerebral Palsy
Cerebral palsy (CP) is a permanent disorder of the postural and musculoskeletal systems, caused by non-progressive damage to the brain in early childhood, shortly before, during, or after birth [134–136]. It causes learning disabilities, behavioural problems, speech disorders, perception deficits, and seizure disorders [137,138]. Unfortunately, little is known about the brain damage. Recently, a mathematical model has been developed to understand the cerebral blood flow and the occurrence of intracerebral haemorrhage in preterm infants. Based on this model, a machine learning model has been developed for identifying preterm infants who are at risk of cerebral haemorrhage [139]. This is a very important step toward understanding one of the causes of brain damage; however, its verification remains difficult, as the brain damage is recognized much later than it actually occurs. Recently, in a pilot study, a postmortem brain biopsy was analysed using mass spectrometry and nuclear magnetic resonance spectroscopy [110]. Several potential biomarkers were identified by both experimental techniques. The article reported the metabolomic profiling and biochemical pathways associated with CP. These findings will help further investigation of the complex etiopathophysiology of CP.
Figure 9. (a) [caption truncated in source] (Ref. [132]). (b) Absorption spectra of the breath of a healthy volunteer (grey) and in bacterial infection (cyan). The red curve is the spectral difference between the above two spectra. The blue plot represents the IR spectrum of methyl butyrate (Ref. [122]). (c) The recovery dynamics of acute gastritis via QAC (Ref. [132]).
Recently, another investigation was carried out to reveal possible biomarkers in the breath of persons with CP [140]. Infrared spectroscopy was used to identify the VOCs associated with CP in exhaled breath. The infrared spectra of breath from 13 volunteers with CP were compared with those of 14 healthy volunteers of comparable ages. The average infrared spectra of the CP and healthy cohorts are presented in Figure 10a. Infrared spectroscopy allowed us to identify two spectral features that distinguish the CP and healthy groups. These two spectral features are observed around 1189 cm⁻¹ and 1205 cm⁻¹. A least-squares fitting procedure was performed to find the possible molecules associated with these two spectral features. Ethyl propionate, propyl propionate, and 3-buten-2-one seem to fit quite well with the spectral feature around 1189 cm⁻¹. The other peak is not yet resolved. Statistical analysis was performed on the spectral data in the range 1185-1215 cm⁻¹ using unsupervised principal component analysis (PCA) and the supervised Support Vector Machine (SVM) and Random Forest (RF) methods. The statistical results are presented in Figure 10b,c. More than 90% accuracy has been achieved in the identification of the two groups by using supervised analysis [141]. This is a significant result toward the development of a diagnosis of CP at an early stage. However, these promising results demand additional studies, mostly focused on new specific biomarkers, larger cohorts, and an extension of this investigation to newborn babies. All three aspects are under investigation in an ongoing project.
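With cohorts as small as 13 and 14 volunteers, a quoted classification accuracy depends strongly on how it is validated. The following Python sketch shows one common scheme, leave-one-out cross-validation, in which each spectrum is classified by a model trained on all the others. It is only an illustration: the source used SVM and Random Forest classifiers, whereas the sketch swaps in a much simpler nearest-centroid classifier so that it runs with the standard library alone, and it does not claim to reproduce the validation scheme of Ref. [141]. The spectra, labels, and function names are hypothetical.

```python
def centroid(rows):
    """Mean spectrum of a list of equally long spectra."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_out_accuracy(spectra, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Each spectrum is held out in turn, class centroids are computed from
    the remaining spectra, and the held-out spectrum is assigned to the
    class with the closest centroid.
    """
    correct = 0
    for i, (held_out, true_label) in enumerate(zip(spectra, labels)):
        centroids = {}
        for label in set(labels):
            members = [s for j, (s, l) in enumerate(zip(spectra, labels))
                       if j != i and l == label]
            if members:
                centroids[label] = centroid(members)
        predicted = min(centroids,
                        key=lambda lab: squared_distance(held_out, centroids[lab]))
        if predicted == true_label:
            correct += 1
    return correct / len(spectra)


# Synthetic absorbances at three wavenumbers (hypothetical values); the two
# groups differ only in the second channel.
healthy = [[0.10 + 0.01 * k, 0.20, 0.05] for k in range(4)]
cp = [[0.10 + 0.01 * k, 0.35, 0.05] for k in range(4)]
spectra = healthy + cp
labels = ["healthy"] * 4 + ["CP"] * 4

accuracy = leave_one_out_accuracy(spectra, labels)
print(accuracy)
```

Because the held-out spectrum never contributes to the centroids, the estimate is less optimistic than an accuracy computed on the training data itself.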

Prostate Cancer
Cancer is one of the deadliest diseases and causes millions of deaths worldwide [1]. Uncontrolled cell growth can start in any part of the body and spread to other parts. It might be prevented if the affected cells are identified at an early stage of the disease. Unfortunately, at an early stage, cancer remains asymptomatic and is difficult to recognize. Symptoms only appear when cancer reaches its advanced stage and puts the patient at great risk of death. However, if the affected cells are identified early, the disease can be completely cured, or at least the life span of the patient can be prolonged. Until now, invasive tissue collection and biopsy has been the only reliable diagnostic method for detecting cancer cells. This invasive process creates not only surgical complications but also psychological stress; therefore, it is used only in definite symptomatic cases. Unfortunately, it is already too late in many cases.
Many attempts have been made to develop a non-invasive technique for the detection of cancer. Imaging-based tumour detection is one of the pioneering approaches to cancer diagnosis [9,10]. However, due to technical reasons, small tumours cannot be detected at their initial stage by imaging techniques. Therefore, this technique is likewise limited to the detection of advanced-stage cancer. Alternatively, metabolic analysis of biofluids has already demonstrated its power to detect many diseases, such as diabetes, cardiovascular disease, and Alzheimer's [11,12]. Therefore, it is logical to expect that analysis of metabolites in human biofluids may contribute to the diagnosis of cancer. Indeed, since the affected cells undergo an uncontrolled, high rate of cell division, their metabolism changes significantly, which is expected to be reflected in the biofluids.
Many studies have been performed to find reliable biomarkers for different cancer types. Lung cancer is probably the most investigated cancer type for biomarker discovery. The availability of different biofluids that are directly related to the lungs is probably the reason for this. Additionally, most of the biofluids for lung cancer investigation are collected through non-invasive, semi-invasive, or minimally invasive processes. For example, exhaled breath can be collected fully non-invasively, and sputum can be collected non-invasively or semi-invasively [142,143]. Blood is also considered as a biofluid for biomarker-based lung cancer detection. There has been significant work on the search for biomarkers in exhaled breath, much of it carried out by mass-spectroscopic methods. Many biomarkers have been reported [26,144–147]; however, the agreement among different researchers is not convincing. In spite of the poor agreement, biomarker-based detection can be a potential non-invasive diagnostic for lung cancer. However, further improvement of the techniques and tests on a larger number of data sets are necessary to find reliable VOCs related to lung cancer. Very few studies have been performed to reveal biomarkers for colon [148], bladder [149], breast [111], prostate [106], and other cancers. Recent work on prostate cancer detection by infrared spectroscopy is presented in the following section.
Prostate cancer may be the second most investigated cancer type in the search for biomarkers. In this regard, urine has been the most investigated biofluid, as it is closely related to the prostate. Large metabolites in the liquid phase, as well as small metabolites (VOCs) in the gas phase from urine, are under investigation [106,150–152]. The prostate-specific antigen (PSA) test on blood serum is one of the routine screening procedures for the detection of prostate cancer; however, its sensitivity and specificity are significantly lower than the accuracy desired for cancer diagnosis [153,154]. Breath has also been considered a potential source of biomarkers for prostate cancer [155,156], but little work has been conducted to find breath VOCs related to prostate cancer. Recently, an infrared spectroscopy-based VOC analysis for prostate cancer demonstrated promising results. In this study, breath samples from a small cohort of 28 prostate cancer patients were analysed and compared with those of 19 healthy volunteers [157]. In addition, breath samples of eight kidney cancer and eight bladder cancer cases were investigated to find out whether the infrared signatures of these three cancers differ. Eight spectral regions were found in which the spectral signatures of the different cohorts are distinguishable. One of the spectral signatures, centred around 1005 cm⁻¹, is shown in Figure 11a for the healthy and prostate cancer cohorts. The average spectrum of the healthy cohort is shown as a solid red line, and a shaded red region depicts the spectral variation among individuals. Similarly, the prostate cancer cohort is shown in blue. The two cohorts are fully separated, with no overlap among their individuals. This allows the prostate cancer cohort to be distinguished from the healthy cohort with over 95% accuracy [158]. 
The three types of cancer under investigation are also distinguishable from each other; however, their separation is less pronounced than in the previous case (see Figure 11b). The molecule associated with this spectral feature was identified by least-squares fitting with reference molecular spectra from a commercial database. In this case, acetic anhydride is identified as one of the potential biomarkers for prostate cancer diagnosis. The identification was also verified by quantum chemistry calculations (see the supplementary material of reference [158]). In this study, seven more spectral features were identified for which the healthy and prostate cancer cohorts are distinguished with an accuracy of over 90%. Details of the spectral positions, associated molecules, and calculated accuracies are presented in Table 1. Wherever visible spectral differences were found among the different sets of infrared breath spectra, statistical analysis was performed to determine the accuracy of the diagnosis. Selected statistical results are presented in Figure 12a-c. Supervised as well as unsupervised statistical analyses were performed. The PCA of the spectral feature around 1005 cm⁻¹ is presented in Figure 12a. The principal component values for the different cohorts are clustered in PC1 vs. PC3 space. For a better understanding, the clusters are enclosed by ellipses. It was revealed that the healthy and prostate cancer groups are well separated. The cloud for the bladder and kidney cases is more compact; however, it overlaps with the prostate cancer cloud. It is not yet clear whether urogenital cancers are indistinguishable with respect to this particular volatile metabolite or whether the insufficient sample size, especially for bladder and kidney cancers, limited the analysis. A larger sample size is necessary to resolve this question, which is the subject of an ongoing project. The statistical data are also presented as box plots (see Figure 12b,c). 
The box plots demonstrate a good separation between the healthy and prostate cancer groups. The p-value is used as a figure of merit. For p > 0.05, the difference between the corresponding groups is not statistically significant. The very low p-value for the spectral feature at 1005 cm⁻¹ indicates that the healthy and prostate cancer groups differ significantly. Similar results have been achieved for the other seven spectral regions where visible spectral differences are observed. The results are summarized in Table 1.
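How a p-value quantifies the separation of two groups can be illustrated with an exact two-sided permutation test on the difference of group means. This is a sketch under stated assumptions: the source does not specify which statistical test was used, so the permutation test is chosen here only because it needs no distributional assumptions and runs with the Python standard library. The absorbance values and the function name `permutation_p_value` are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns the fraction of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_difference(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_difference(sum(group_a))
    tolerance = 1e-12  # guards against floating-point round-off in ties
    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        if abs_mean_difference(sum(pooled[i] for i in indices)) >= observed - tolerance:
            extreme += 1
    return extreme / splits


# Hypothetical peak absorbances near 1005 cm^-1 for two small cohorts.
healthy = [0.10, 0.11, 0.12, 0.13, 0.14]
cancer = [0.20, 0.21, 0.22, 0.23, 0.24]

p_separated = permutation_p_value(healthy, cancer)
p_identical = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(p_separated, p_identical)
```

For the fully separated groups, only the original split and its mirror image out of 252 possible splits are as extreme as the observation, so the p-value is 2/252 (about 0.008), well below 0.05. For the two identical groups, every split is at least as extreme and the p-value is 1.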

Applicability of Infrared Technique for VOC Detection
It has already been demonstrated that infrared spectroscopy has the potential to identify disease-specific volatile biomarkers with high accuracy. Since it uses fundamental molecular vibrations as a probe to identify the molecules, the identification is straightforward and accurate. The minimal sample preparation effort makes infrared detection techniques much faster than mass-spectroscopic methods for VOC detection [158]. At present, a typical measurement takes about twenty minutes, including sample preparation, measurement, and cleaning of the system for the next measurement. However, regarding the number of identified molecules and the detection sensitivity, infrared spectroscopy is far behind mass-spectroscopic techniques, which have already identified hundreds of metabolites with a sensitivity of 100 ppt [26,30]. Of course, it should be noted that mass spectroscopy for VOC detection has been under development for decades, whereas the infrared technique for VOC identification in biofluids is still in its infancy. Extensive research needs to be undertaken on the development of spectroscopic technology as well as on VOC identification processes. For example, laser-based spectroscopy increases the detection sensitivity down to the ppt level [159,160], which is comparable to the detection limit of mass spectrometry. Regarding device cost and size, infrared spectrometers are more favourable than mass spectrometers. In fact, extremely small single-frequency lasers can be made to identify specific molecules. Such laser systems are used in the environmental, forensic, and food industries for the detection of volatile molecules. For clinical application, extensive research needs to be completed in both technology development and diagnostic development [158,161–163]. 
At present, the analysis of the data requires an expert in molecular spectroscopy; however, once a disease-specific biomarker is confirmed, this diagnostic can be carried out by any clinician. In the same vein, e-nose technology is also promising, especially for its size and cost; however, a careful investigation is necessary before it can be used as a clinical diagnostic method [164,165].

Conclusions
This review article presented the state of the art in the application of infrared spectroscopy to the analysis of volatile metabolites in gaseous biofluids. It is an established fact that volatile metabolites present in biofluids carry information about the body's state. As a consequence, an abnormality or disease initiated in any part of the body is reflected in the composition of biofluids. With instruments sensitive enough to analyse changes in the equilibrium of metabolic composition, one can try to recognize a disease in its early (asymptomatic) state. Among the many experimental techniques under development, infrared spectroscopy has demonstrated very promising results. Infrared spectroscopy is a well-established experimental tool for molecular identification. However, due to the large amount of water contained in biofluids, its applicability to biofluid analysis was limited until recent years. The development of the high-duty cycle water suppression technique for gaseous biological samples opens a new window for the application of infrared spectroscopy to biofluids. Details of the experimental technique and data analysis are presented in this review. It has been shown how the VOC composition of breath changes with the circadian rhythm and with external factors such as food, drink, and physical exercise. In spite of short-term changes, VOC compositions in general remain stable over months and years unless the person is suffering from a disease. This substantial evidence allowed us to establish the concept of the "Island of stability (IOS)". Any deviation from the IOS is considered an abnormality of the body. Infrared spectroscopy is capable of monitoring the IOS very precisely. Using this concept, the infection by Helicobacter pylori and its antibiotic treatment were monitored by infrared spectroscopy. This study identified a possible biomarker for bacterial infection caused by Helicobacter pylori. The same experimental technique was also used to identify biomarkers for cerebral palsy and prostate cancer. 
Several possible biomarkers were identified for both diseases. The technique also shows very high sensitivity and specificity (>90%) in distinguishing between disease and healthy groups. All this evidence shows that infrared spectroscopy holds a lot of promise as a future non-invasive diagnostic tool. However, significant developments are needed in terms of technology as well as data analysis before it can be accepted as a diagnostic tool. Many studies are being conducted in both directions to develop non-invasive diagnostic tools.
Funding: There is no funding involved with this review article.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.