1. Introduction
The human body emits hundreds of volatile organic compounds (VOCs) through biological fluids such as breath, saliva, feces, urine, sebum, and sweat [1,2]. Together, these VOCs are referred to as the volatilome and offer a unique chemical fingerprint that reflects the metabolic activity and condition of the body and contributes to a person’s specific odor [3]. The composition of the volatilome is dynamic and closely related to health conditions, diet and lifestyle, metabolism, microbiome activity, and overall physiological state. Because it reflects complex metabolic and microbial interactions, the volatilome can provide valuable insights into an individual’s physiological state. Changes in volatile composition can indicate shifts in metabolic processes and may be associated with various health conditions. For example, changes in the VOC profile have been reported in infections [4,5,6,7] and in complex pathological conditions such as Parkinson’s disease [8,9] and neoplastic disorders [10]. The presence or absence of specific VOCs can thus be used to detect alterations in metabolic state, making them useful indicators for disease detection and monitoring.
High-resolution techniques such as headspace solid-phase microextraction coupled with gas chromatography and mass spectrometry (HS-SPME-GC-MS) have proven particularly effective in VOC analysis owing to their high analytical sensitivity, reproducibility, and compatibility with multivariate statistics [11]. HS-SPME-GC-MS has been widely adopted across diverse fields, from environmental monitoring to the food and beverage industry, where it is used to detect trace compounds with remarkable accuracy [12], and it is increasingly applied in biomedical contexts. Its success in these areas has supported its extension to clinical diagnostics, enabling comprehensive characterization of a subject’s or patient’s volatilome [2,8,9] through an almost non-invasive procedure. Even trained dogs have demonstrated the ability to detect diseases by recognizing specific VOC patterns in body fluids, further highlighting the diagnostic relevance of volatilome characterization [5]. The non-invasive nature of VOC sampling, combined with its potential for high specificity and sensitivity, is a notable advantage, particularly for populations in which traditional diagnostic methods may be impractical. For example, bacterial lung infections have been detected in vivo by sampling breath and analyzing the emitted VOCs, allowing infections caused by closely related but distinct bacteria to be distinguished [7].
The global impact of SARS-CoV-2 has been profound, causing widespread illness and significant mortality worldwide. Although the acute pandemic phase has subsided, SARS-CoV-2 infection continues to affect a substantial portion of the population and can lead to health consequences and complications in some patients [13]. For this reason, understanding the effects of the virus, particularly its impact on metabolism and the resulting biochemical markers, remains a critical area of research. It is plausible that VOCs associated with SARS-CoV-2 infection differ from those produced under normal metabolic conditions, providing potential markers for early detection and monitoring of the disease. To interpret such complex molecular data, chemometric techniques such as Principal Component Analysis (PCA) are often applied to VOC data and can enhance diagnostic accuracy by revealing infection-specific patterns. Previous studies have applied GC-MS to investigate the VOC profiles of COVID-19 patients, including analyses of breath and skin emissions [14,15].
While these studies typically relied on larger cohorts and pattern-recognition models, the present work adopts a high-resolution, untargeted GC-MS strategy that enables detailed molecular annotation and the identification of candidate compounds with potential mechanistic and diagnostic relevance. Biomarkers associated with SARS-CoV-2 infection, if confirmed, may pave the way for rapid diagnostic tools for early screening with high sensitivity and specificity. Our goal was not to replace current diagnostic methods but to explore, in an untargeted manner, a complementary, rapid, and non-invasive screening strategy. In particular, the approach was conceived to provide a binary diagnostic indication (presence or absence of infection) rather than to characterize specific viral variants, which would require different analytical strategies and larger epidemiological cohorts. We applied fully automated headspace solid-phase microextraction (HS-SPME) coupled with gas chromatography (GC) and time-of-flight mass spectrometry (ToF-MS) [16] to analyze the volatilome of sebum and sweat samples collected from individuals with and without a diagnosis of SARS-CoV-2 infection (established using standard diagnostic tests). Previous studies have shown that sweat and sebum are significant routes for the elimination of VOCs, which makes them well suited to VOC analysis [8,9]; samples were therefore collected non-invasively by gentle axillary swabbing with cotton gauze. Such an approach could be particularly beneficial where rapid diagnostics are urgently required, such as in remote locations or resource-limited healthcare systems. The ability to identify specific VOCs associated with infection would support the future development of portable or wearable devices, such as electronic noses (e-noses) or smart patches, capable of on-site detection of disease biomarkers. Previous studies have demonstrated the feasibility of using e-noses to detect VOC patterns associated with diseases such as lung cancer [17] and diabetes [18], reinforcing the potential of this method. Precision medicine could thus increasingly rely on individual VOC profiles, offering the possibility of more personalized diagnostic and therapeutic strategies.
2. Materials and Methods
The overall experimental workflow consisted of four main steps: (1) non-invasive collection of sweat and sebum samples; (2) storage and transport under controlled conditions; (3) HS-SPME-GC/ToF-MS analysis with automated sample handling; and (4) multivariate chemometric analysis for feature selection and classification.
2.3. Sampling and Analytical Procedure
Samples of sweat and sebum were collected by gently swabbing the skin in the axillary area using sterile cotton gauze. Participants were instructed to avoid soaps and deodorants for 12 h before sampling. Care was taken during the collection of the samples to avoid contamination by the environment, clothes, or other body areas.
Immediately after swabbing, the gauze was placed in a sterile 10 mL polypropylene vial (Agilent Technologies Italia SpA, Cernusco sul Naviglio, Milano, Italy) with a pierceable cap, stored in a portable refrigerator at 4 °C, and transported within a few hours to the analytical laboratory (Faculty of Agricultural, Environmental, and Food Sciences, Free University of Bolzano). Each vial was identified by an anonymous code. In the laboratory, the vials were stored at −80 °C until analysis (within one month). Sample stability was validated during preliminary method setup. Analysts were blinded to sample origin (Patient or Control).
Analysis of the volatile organic compounds (VOCs, i.e., the volatilome) was performed by headspace solid-phase microextraction (HS-SPME) and gas chromatography (GC) coupled with time-of-flight mass spectrometry (ToF-MS) (HS-SPME-GC/ToF-MS, LECO, St. Joseph, MI, USA) [16]. The SPME fiber was a triphasic 1 cm 50/30 µm CARBOXEN/DVB/PDMS StableFlex™ fiber, needle size 23 ga, purchased from Merck Life Science SrL (Milan, Italy). Standard compounds (C4-C22 series of fatty acid ethyl esters in dichloromethane) used to calculate the retention index were purchased from Merck Life Science SpA (Milan, Italy). The vials have a pierceable silicone septum, so they were processed by the analytical instrumentation using a pre-set method without being opened or otherwise handled after sealing, which minimized sample degradation and contamination.
The volatilome profile was characterized by HS-SPME-GC/ToF-MS on a LECO Flux BT 4D instrument (LECO, Mönchengladbach, Germany). The instrument was equipped with a PAL-II autosampler, and all operations described here were performed automatically. For each analysis, the sample was pre-conditioned at 40 °C with stirring at 300 rpm for 15 min. The pre-conditioned SPME fiber was then inserted into the vial, at a distance of 1 cm from the sample, and exposed to the headspace at 40 °C with stirring at 300 rpm for 45 min. The fiber was then transferred to the GC inlet, held at 240 °C, where the analytes were desorbed for 6 min and carried onto the column by the He flow. The injection was performed in splitless mode. The column was a polar MEGA-WAX spirit column (PEG phase), 40 m × 0.18 mm × 0.30 μm (MEGA, Milan, Italy), and the carrier gas (He) flow rate was 1 mL min−1. The GC oven was held at 40 °C for 6 min (injection), ramped from 40 °C to 180 °C at 3 °C/min and then from 180 °C to 240 °C at 10 °C/min, and held at 240 °C for 1 min. Because the heating rate changes at 180 °C, linear retention indices (LRI) assigned to compounds eluting above this temperature might deviate considerably from their reference values. Detection was performed with a pre-tuned time-of-flight (ToF) detector using the following parameters: acquisition rate = 5 spectra s−1, acquisition mass range = m/z 35–650, extraction frequency = 32 kHz. The GC-MS data were processed automatically with ChromaToF® software (ver. 2021, LECO Corporation, Berlin, Germany), which provided tentative compound assignments by spectral comparison with the NIST 2017 library (NIST MS Search 2.3). Linear retention indices were calculated against the even-carbon ethyl esters of saturated linear fatty acids from C4 to C22. Samples were disposed of after analysis.
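The retention-index calculation mentioned above can be illustrated with a minimal sketch of the van den Dool and Kratz linear interpolation between bracketing reference compounds. The sketch assumes that each ethyl ester reference is assigned an index of 100 times the carbon number of its fatty acid chain; this scaling, the function name, and the retention times in the example are illustrative assumptions and are not taken from the acquisition software or from the data of this study.

```python
from bisect import bisect_right


def linear_retention_index(t_x, ref_times):
    """Linear retention index by interpolation between bracketing references.

    t_x       : retention time of the analyte (min)
    ref_times : dict mapping the carbon number of each reference compound
                to its retention time (min); references need not be
                consecutive homologues (here: even carbon numbers only).
    Assumption: a reference with carbon number n is assigned index 100 * n.
    """
    carbons = sorted(ref_times)
    times = [ref_times[c] for c in carbons]
    if any(later <= earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("reference retention times must increase with carbon number")
    if not times[0] <= t_x <= times[-1]:
        raise ValueError("analyte elutes outside the reference window")
    # Position of the reference eluting just before (or together with) the analyte.
    i = min(bisect_right(times, t_x) - 1, len(times) - 2)
    n, big_n = carbons[i], carbons[i + 1]
    t_n, t_big_n = times[i], times[i + 1]
    return 100.0 * (n + (big_n - n) * (t_x - t_n) / (t_big_n - t_n))


# Hypothetical retention times (min) for three even-carbon references.
example_refs = {4: 5.0, 6: 10.0, 8: 20.0}
print(linear_retention_index(7.5, example_refs))  # 500.0
```

Because the interpolation is linear in retention time, it is exact only where the oven heating rate is constant, which is why indices assigned beyond the change of ramp at 180 °C are expected to be less reliable.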
Daily performance checks ensured analytical reliability: internal standards, blanks, retention time stability, and detector sensitivity were routinely verified. All materials were VOC-free certified. Procedural blanks and ambient air monitoring minimized background interference. These precautions align with recent recommendations on minimizing exogenous influences in volatilome analysis [
15].
4. Discussion
In clinical diagnostics, the identification of rapid, accurate, and non-invasive biomarkers is essential to tackle the challenges posed by emerging infections, such as that caused by SARS-CoV-2. While current methods such as RT-PCR testing are reference standards because they are highly specific, their invasiveness, reagent dependency, and limited diagnostic sensitivity highlight the need for alternative approaches. Analysis of the volatilome naturally emitted by the body is a promising way to detect infections at an early stage and to manage them.
In this context, the present study was conceived to detect a general infection-related signature, providing a binary output (presence/absence of infection), rather than to distinguish between viral variants, an endeavour that would require different analytical strategies and larger cohorts. RT-PCR analysis of nasopharyngeal swabs was used as the reference method to confirm SARS-CoV-2 status in participants, ensuring alignment with the clinical gold standard. RT-PCR-confirmed infection status therefore served as the reference label for VOC-based discrimination. The aim was to assess whether untargeted GC-MS can identify infection-associated molecular features, not to compare the diagnostic performance of the VOC-based approach with that of RT-PCR.
Chemometrics is the branch of analytical chemistry focused on extracting relevant information from complex datasets and can play a pivotal role in processing the huge datasets derived from VOC analysis, enabling robust discrimination between pathological and healthy states. Although the number of Patients included in this study was relatively small, it was appropriate for the analytical aim, which focused on identifying discriminant molecular markers using a high-precision, non-targeted GC-MS approach. The methodology enabled detailed spectral analysis and statistically sound multivariate modelling, supporting future validation studies on larger, more diverse populations.
Sweat and sebum samples were collected from subjects with (Patients) and without (Controls) a confirmed SARS-CoV-2 infection, as verified by RT-PCR analysis of nasopharyngeal swabs. Coupled with highly automated HS-SPME-GC/ToF-MS analysis, this non-invasive sampling caused minimal inconvenience and preserved sample integrity, providing a reliable and participant-friendly platform for volatilome studies. The protocol for gas chromatography (GC) and mass spectrometry (MS) analysis followed that outlined by Sinclair et al. [8,9] and Darnal et al. [16]. The data were first explored with Principal Component Analysis (PCA) to select the most significant features (Figure 2). The PCA loadings were then used to filter the dataset by removing the variables least relevant for discrimination. Subsequently, Partial Least Squares Discriminant Analysis (PLS-DA) was applied to the refined set of chemical variables (the VOCs).
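As an illustration of loading-based variable filtering, the following minimal sketch ranks variables by the magnitude of their PCA loadings and retains the highest-ranked ones. It is not the procedure used in this study, whose selection criteria are not detailed here: the autoscaling step, the weighting of loadings by explained variance, the `keep_fraction` rule, and the function name are all assumptions made for the example, and PCA is computed with a plain NumPy singular value decomposition rather than with the chemometric software.

```python
import numpy as np


def pca_loading_filter(X, n_components=2, keep_fraction=0.2):
    """Rank variables by PCA loading magnitude and keep the top fraction.

    X : (samples x variables) matrix, e.g. peak areas per chromatographic feature.
    Returns the sorted column indices of the retained variables and the
    importance score of every variable.
    """
    X = np.asarray(X, dtype=float)
    # Autoscaling (assumption): mean-centre and scale each variable to unit variance.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant variables unscaled
    Z = (X - X.mean(axis=0)) / std
    # PCA via SVD: the rows of Vt are the loading vectors.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T  # (variables x components)
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    # Importance (assumption): loading magnitude weighted by explained variance.
    importance = np.sqrt((loadings ** 2 * explained).sum(axis=1))
    n_keep = max(1, int(round(keep_fraction * X.shape[1])))
    keep = np.sort(np.argsort(importance)[::-1][:n_keep])
    return keep, importance
```

The retained columns, `X[:, keep]`, would then be passed to the supervised step (PLS-DA in the present work). Because the loadings are computed without reference to class labels, this filter favors variables that dominate the overall variance, which need not coincide with those that best separate the groups.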
With this approach, it was possible to discriminate between Patients and Controls based on their VOC patterns. Initially, 20–30% of the Control samples were misclassified as Patients, indicating a notable false-positive rate. To improve accuracy, further variable selection was applied to remove interfering signals, which improved classification performance. Cross-validation and application to randomized validation sets demonstrated >95% accuracy overall. Specifically, Controls were correctly identified in 88.24% of cases in cross-validation and in 71.43% of cases in the validation datasets (Table 2). Patients were correctly assigned 100% of the time in both analyses. These figures should be interpreted as internal cross-validation results rather than as diagnostic accuracy estimates, because the present study was designed as a proof-of-concept for candidate biomarker discovery, not for diagnostic validation. In this framework, the cohort size (n = 51) was appropriate for untargeted high-resolution GC/ToF-MS, where each sample yields thousands of variables and the analytical focus is on detecting discriminant signals rather than estimating clinical performance. Accordingly, the reported >95% value reflects internal model consistency and does not imply generalizability or real-world diagnostic performance, which will require external validation on larger, independent cohorts.
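The distinction between per-class and overall classification rates can be made concrete with a short sketch. The function name and the labels in the example are hypothetical and deliberately small; they do not reproduce the counts of Table 2.

```python
def class_rates(true_labels, predicted_labels, positive="Patient"):
    """Per-class and overall fractions of correctly assigned samples."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1  # Patient assigned as Patient
            else:
                fn += 1  # Patient assigned as Control
        else:
            if predicted == positive:
                fp += 1  # Control assigned as Patient
            else:
                tn += 1  # Control assigned as Control
    total = tp + fn + tn + fp
    return {
        "patients_correct": tp / (tp + fn) if (tp + fn) else None,
        "controls_correct": tn / (tn + fp) if (tn + fp) else None,
        "overall": (tp + tn) / total if total else None,
    }


# Hypothetical toy example: 4 Patients, 5 Controls, one Control misassigned.
truth = ["Patient"] * 4 + ["Control"] * 5
predicted = ["Patient"] * 4 + ["Patient"] + ["Control"] * 4
print(class_rates(truth, predicted))
```

In this toy case all Patients are correctly assigned while one in five Controls is not, so the overall rate lies between the two class rates and depends on the relative group sizes. Reporting the class rates separately, as done above for Controls and Patients, avoids that dependence.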
The chemometric model achieved robust classification within the dataset, correctly identifying all positive samples, with a limited number of false positives among Controls. High classification accuracy in cross-validation does not, however, amount to full diagnostic validation. Formal performance metrics such as sensitivity, specificity, ROC curves, or confusion matrices were not computed here, as the limited cohort size and the absence of an external test set preclude reliable generalization. Accordingly, this study focused on the identification of discriminant VOC features rather than on the estimation of diagnostic accuracy, consistent with a biomarker-discovery scope. The cohort size (n = 51) was adequate for untargeted GC/ToF-MS analysis, which prioritizes analytical resolution per sample over statistical generalization. Within this proof-of-concept framework, the PLS-DA model achieved robust internal performance (Q²cum = 0.727, R²Y = 0.820) with no false negatives, supporting the reproducibility of the discriminative signal. Broader external validation will nonetheless be required to confirm biomarker stability across populations differing in age, comorbidities, and environmental exposures. This is the focus of ongoing work aimed at establishing the generalizability and clinical reliability of the proposed biomarkers.
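For readers less familiar with these two statistics, the sketch below shows how R²Y and Q² are conventionally defined: R²Y compares the residuals of the fitted model with the total variation in the class variable, whereas Q² uses the residuals of cross-validated predictions instead. The function name and the numbers in the example are hypothetical. The cumulative Q² reported by chemometric software is typically accumulated component by component and may therefore differ from this single-step form.

```python
def r2y_q2(y, y_fitted, y_cv):
    """Goodness of fit (R2Y) and cross-validated predictivity (Q2).

    y        : observed class values (e.g. 1 = Patient, 0 = Control)
    y_fitted : values predicted by the model fitted on all samples
    y_cv     : values predicted for each sample when it was left out
    """
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)                # total sum of squares
    rss = sum((v - f) ** 2 for v, f in zip(y, y_fitted))   # residual sum of squares
    press = sum((v - c) ** 2 for v, c in zip(y, y_cv))     # predictive residual sum of squares
    return 1.0 - rss / tss, 1.0 - press / tss


# Hypothetical values for four samples.
r2y, q2 = r2y_q2([1, 1, 0, 0], [0.9, 0.8, 0.1, 0.2], [0.8, 0.7, 0.3, 0.2])
print(round(r2y, 2), round(q2, 2))  # 0.9 0.74
```

A Q² that is close to R²Y, as in the values reported above, indicates that the model describes left-out samples almost as well as the samples used for fitting, although both quantities remain internal to the dataset.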
Given the use of high-resolution, untargeted GC-MS, the chosen sample size was appropriate for identifying molecular markers with mechanistic significance, offering greater analytical depth than large-scale sensor-based studies [14]. Although classical multivariate techniques such as PCA and PLS-DA were employed here, other approaches, including univariate tests (e.g., Mann–Whitney U), ROC analysis, and machine learning algorithms (e.g., neural networks), could provide further insights in future validation studies. These methods, however, require larger datasets and were beyond the scope of this initial profiling study; integrating them will be essential to advance from exploratory chemometric modeling toward clinically validated diagnostic tools. PCA and PLS-DA were employed to explore data structure and identify discriminant signals, not as clinical diagnostic models per se. Nevertheless, such models can be translated into automated diagnostic tools through algorithms embedded in sensor-based platforms, ultimately delivering simplified, clinically actionable outputs. This integration would enable VOC-based analysis to be applied in point-of-care or hospital workflows without requiring specialized knowledge of multivariate statistics.
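As a pointer for such future validation work, the sketch below illustrates the link between two of the approaches mentioned: for a single VOC feature, the Mann–Whitney U statistic divided by the number of Patient–Control pairs equals the area under the ROC curve (AUC). The function name and the intensities in the example are hypothetical, and no p-value is computed.

```python
def mann_whitney_auc(patients, controls):
    """U statistic for 'Patient value exceeds Control value' and the equivalent ROC AUC."""
    u = 0.0
    for p in patients:
        for c in controls:
            if p > c:
                u += 1.0   # Patient value ranks above Control value
            elif p == c:
                u += 0.5   # ties count as half
    auc = u / (len(patients) * len(controls))
    return u, auc


# Hypothetical peak areas of one feature in three Patients and three Controls.
u, auc = mann_whitney_auc([5.0, 7.0, 9.0], [1.0, 2.0, 6.0])
print(u, round(auc, 3))  # 8.0 0.889
```

An AUC of 0.5 corresponds to no separation between the groups and an AUC of 1 to complete separation, so the same pairwise comparison supports both a univariate test and a first ROC summary for each candidate compound.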
The main objective of this work was to explore potential biomarkers associated with SARS-CoV-2 infection by comparing healthy individuals and confirmed COVID-19 Patients. Rather than replacing current diagnostic methods, this approach aims to offer complementary, non-invasive testing tools. In the context of precision medicine, the characterization of the volatilome offers a diagnostic approach tailored to the specific metabolic and microbial profiles of the subjects. It addresses the inter-individual variability and the environmental influences and could improve diagnostic precision and reduce the risk of misclassification.
The promising cross-validation accuracy supports the possible translational relevance of the refined model. Given the 70–80% sensitivity of current antigen and molecular tests [19,20], non-invasive tools such as skin swabs could provide meaningful advantages. The impact of soaps, perfumes, deodorants, and other cosmetic products on the volatilome must nonetheless be carefully considered, as these may alter VOC profiles. Further studies should aim to identify markers unaffected by such confounders or to isolate the interfering signal from cosmetics. Potential confounding factors such as diet, medication, and personal hygiene products were not controlled beyond the 12-h restriction on soaps and deodorants; this choice reflects the exploratory and deliberately realistic design of the study. The inclusion of participants without lifestyle stratification inherently increased within-group variability, thereby testing the robustness of infection-related VOC signatures under real-world conditions. The persistence of significant discriminatory features despite this variability supports their potential biological relevance.
The specific chromatographic features identified in this study were obtained through a fully untargeted approach. No pre-treatment of samples was applied, and no assumptions were made regarding the molecular classes involved. Further research is required to confirm and characterize the compounds responsible for distinguishing between positive and negative cases in a larger and more diverse population. This includes understanding how the volatilome behaves in the presence of other pathological or environmental factors, in order to establish the robustness of potential diagnostic markers. Preliminary data analysis tentatively identified a key compound of interest, 2-methylbenzenemethanol acetate, as a promising marker of infection. Its identification was based on GC-MS data and comparison with the NIST spectral library. Compounds similar to 2-methylbenzenemethanol acetate can be metabolized through various biochemical pathways in humans, particularly those involving aromatic derivatives and microbial interactions [21]. While direct evidence for the metabolism of 2-methylbenzenemethanol acetate is lacking, studies of related compounds indicate how the body may process similar substances. VOCs can originate not only from host metabolism but also from microbial activity on the host’s skin. Recent studies have demonstrated that certain VOCs, including esters and alcohols, can be produced by bacteria such as Xenorhabdus indica strain AB [22] and fungi such as Aspergillus flavus [23]. Although our untargeted GC-MS approach identified 2-methylbenzenemethanol acetate as a discriminant compound in COVID-19 Patients, we cannot exclude that its presence is influenced by microbial populations inhabiting the axillary region. This compound, the most discriminant feature identified in this study, is an aromatic ester that may arise from the acetylation of 2-methylbenzyl alcohol, a compound linked to amino acid and/or microbial metabolism. Aromatic esters are known to form through acetyl-CoA–dependent acetyltransferase activity acting on aromatic alcohols, a mechanism widely reported in microbial and yeast systems [24,25]. To the best of our knowledge, a direct biosynthetic route for 2-methylbenzenemethanol acetate in humans or microbiota has not yet been described. The mechanism outlined below therefore remains a plausible hypothesis.
Viral infections such as that caused by SARS-CoV-2 are known to induce oxidative stress, inflammatory responses, and alterations in the skin and mucosal microbiome [26,27], all of which can influence ester-forming enzymatic activity. It is therefore plausible that the observed increase in this compound in infected subjects reflects host–microbiome metabolic interactions modulated by the infection rather than a direct viral product. The absence of this compound in analytical blanks and its significant enrichment in the patient group (p = 0.003, VIP > 1) support a biological, rather than environmental, origin. Further investigations combining targeted metabolomic and metagenomic approaches will be required to elucidate its metabolic pathway and confirm its diagnostic specificity in independent cohorts. The consistent increase in this compound was observed only in SARS-CoV-2-positive individuals, suggesting a potential link between viral infection and the modulation of host–microbiome interactions. The infection might directly or indirectly alter the skin microenvironment or immune regulation, favoring microbial proliferation or metabolic shifts that lead to the production of this compound. The absence of this compound in the control group reduces the likelihood of a widespread fungal or microbial source, although further studies integrating mycobiome profiling would be needed to clarify this possibility fully. VOCs in exhaled breath have also emerged as a promising non-invasive means of diagnosing SARS-CoV-2 infection. A recent study [28] identified VOC biomarkers that could distinguish COVID-19 from non-COVID-19 illnesses during the circulation of the Delta variant. However, the emergence of the Omicron variant significantly altered the volatilome profile, so that a different set of VOCs had to be identified to maintain diagnostic accuracy.
In the current study, 2-methylbenzenemethanol acetate was found in association with SARS-CoV-2 infection, which highlights its possible role as a diagnostic indicator reflecting shifts in metabolic processes induced by the virus. While this evidence is promising, further studies are necessary to validate its specificity and sensitivity relative to other candidate VOCs. These findings nonetheless provide a foundation for follow-up investigations of its diagnostic utility. Moreover, investigating the response of the volatilome to confounding factors such as cosmetics or environmental exposure will help refine the applicability of volatile biomarkers in diagnostic settings. A deeper understanding of the biochemical and environmental influences on these VOCs could unlock their potential as reliable biomarkers for infectious and inflammatory diseases.
Further research should also investigate the biological mechanisms underlying infection-associated volatilome changes, as detected here with headspace GC-MS. Clarifying these metabolic pathways could support the development of precision medicine strategies and improve the design of effective diagnostic and therapeutic tools. This information is valuable for the development of specific rapid sensors. Refining sensor technology, potentially using nanotechnology or microfluidics, could further improve sensitivity and analytical efficiency in detecting infection-related VOCs, including those associated with SARS-CoV-2.