Untargeted Metabolomics and Multivariate Data Processing to Reveal SARS-CoV-2 Specific VOCs for Canine Biodetection

Aizpitarte, Diego Pardina; Larrañaga, Eider; Mayor, Ugo; Isla, Ainhoa; Amigo, Jose Manuel; Bartolomé, Luis

doi:10.3390/chemosensors14020035

by Diego Pardina Aizpitarte ¹,², Eider Larrañaga ²,³, Ugo Mayor ⁴,⁵, Ainhoa Isla ², Jose Manuel Amigo ¹,⁴ and Luis Bartolomé ²,³,*

¹ Analytical Chemistry Department, University of the Basque Country (EHU), Barrio Sarriena s/n, 48940 Leioa, Bizkaia, Spain

² AUZIKER Training Aids, Ed. Rectorado, Planta Baja, Campus de Bizkaia, Sarriena s/n, 48940 Leioa, Bizkaia, Spain

³ Servicios Generales de Investigación, Faculty of Science and Technology, University of the Basque Country (EHU), Barrio Sarriena s/n, 48940 Leioa, Bizkaia, Spain

⁴ Ikerbasque, Basque Foundation for Science, 48009 Bilbao, Bizkaia, Spain

⁵ Department of Biochemistry and Molecular Biology, Faculty of Science and Technology, University of the Basque Country (EHU), 48940 Leioa, Bizkaia, Spain

* Author to whom correspondence should be addressed.

Chemosensors 2026, 14(2), 35; https://doi.org/10.3390/chemosensors14020035

Submission received: 30 December 2025 / Revised: 27 January 2026 / Accepted: 29 January 2026 / Published: 2 February 2026

(This article belongs to the Special Issue Artificial Intelligence (AI)/Machine Learning (ML)-Assisted Chemical Sensors)

Abstract

The exceptional olfactory capabilities of trained detection dogs show high potential for identifying infectious diseases. However, safe and standardized canine training requires specific chemical targets rather than infectious biological samples. This study presents an analytical proof-of-concept combining untargeted metabolomics and machine learning (ML) to decode the specific odor profile of SARS-CoV-2 infection. Axillary sweat samples from 76 individuals (SARS-CoV-2 positive and negative) were analyzed using headspace solid-phase microextraction gas chromatography coupled with time-of-flight mass spectrometry (HS-SPME-GC/MS-ToF). Data preprocessing and dimensionality reduction were performed to feed a Partial Least Squares-Discriminant Analysis (PLS-DA) model. The optimized model achieved an overall accuracy of 79%, with a specificity of 89% and a sensitivity of 70% in external validation, distinguishing infected individuals from healthy controls based solely on their volatilome and identifying a specific panel of volatile organic compounds (VOCs) as discriminant biomarkers. Six VOCs were found to be consistently present in COVID-19-positive individuals. These compounds are proposed as candidate odor signatures for constructing artificial training aids to standardize and accelerate the training of detection dogs. This study establishes a framework in which machine learning-driven metabolomic profiling directly informs biological sensor training, offering a novel synergy between ML and biological intelligence in disease detection. This scalable computational framework translates biological samples into chemical data, providing the scientific basis for designing safe, synthetic K9 training aids for future infectious disease outbreaks without the biosafety risks associated with handling live pathogens.

Keywords:

machine learning (ML); chemical sensing; healthcare diagnostics; data processing; COVID-19; VOCs; detection dogs; mass spectrometry

1. Introduction

Dogs (Canis familiaris), owing to their highly developed olfactory system, can smell concentrations down to parts per trillion (ppt), which is why they have been widely used by law enforcement, military, and private agencies for the real-time detection of target volatile organic compounds (VOCs). Traditional applications of dogs include drug, currency, and explosives detection, as well as missing-person searches and the detection of illegal food products, plants, and pests. Additionally, less traditional uses for canines have been demonstrated in the detection of chronic human diseases such as cancer and diabetes [1,2,3]. Although dogs have a highly developed sense of smell, they are not innately capable of detecting a given scent without prior training. Training these K9 units is typically conducted using the very materials to be detected as aids [4]. Disease detection presents significant challenges and requires the development of training aids that contain the characteristic components of the disease’s scent. Other research groups have also begun to consider dogs as an effective tool for COVID-19 detection, working with saliva [5], sweat [6], and expired air [7], as a well-trained dog can detect VOCs emitted by an individual with COVID-19. The aim of this work was to use the VOCs of sweat, a promising substrate for detection dogs, given that it is the key odor for search and rescue dogs and that the armpit is an easily accessible region for sampling [8]. Owing to its noninvasive nature, ease of sampling, and the fact that its composition varies with the presence of pathogens [8], sweat is a highly interesting matrix for metabolomic studies [9,10,11,12]. The absence of viral particles in the sweat samples of COVID-19 patients has been demonstrated [6]. Therefore, sweat samples from SARS-CoV-2-positive individuals pose no safety concerns for personnel or, in particular, for the dogs employed in disease-odor detection tests.

It is well known that biological elements within an organism do not act independently but interact. Systems biology is a discipline that focuses on the study of all components and their interactions. The main idea behind this field is that “The Whole is more than the sum of the parts” [13]. Within this multifaceted field, metabolomics encompasses the comprehensive quantitative analysis of metabolites in biological samples, providing insight into the dynamic metabolic processes within living systems.

Metabolomics can be categorized into two approaches: targeted and untargeted. Both targeted and untargeted metabolomics use high-throughput technologies that enable the acquisition of large amounts of data in a short period. However, whereas targeted metabolomics focuses on a defined set of metabolites, untargeted metabolomics examines the full profile of measurable metabolites in a biological sample, including unknown metabolites [14]. Although the extensive data produced by untargeted analysis requires rigorous processing and statistical validation to generate clinically relevant findings, it can detect any metabolites in the samples.

The urgency of the COVID-19 pandemic greatly accelerated research into the use of metabolomics methods for disease diagnosis and prognosis [15,16,17,18]. In particular, many studies have identified specific target VOCs as potential disease biomarkers in various biological samples (e.g., exhaled air, urine, serum, or plasma) [15,19,20,21]. VOCs are of great interest because they are emitted by the human body and reflect an individual’s metabolic status. The human body emits hundreds of different substances, which can be classified by functional group into families such as carboxylic acids, alcohols, fatty acids, and amines [22,23]. The collection of VOCs released is known as the “volatilome” [24]. VOCs can be odorous or non-odorous, and they can vary with body area, sex, age, and genetic and physiological status. Therefore, any change in human physiology caused by a pathogen alters VOCs, resulting in disease-specific VOC profiles [22]. The change may involve the production of new VOCs or a modification in VOC ratios. Evidence indicates that the Omicron and Delta variants of SARS-CoV-2 contribute to changes in exhaled VOC profiles during COVID-19 infection. Notable studies have successfully demonstrated that axillary sweat is a robust matrix for canine detection of SARS-CoV-2 and have provided key insights into its chemical volatilome [25,26,27]. These works established that skin secretions [28] (from eccrine, apocrine, and sebaceous glands) carry the specific metabolic signature of the infection. Building upon this foundation, the present study addresses a specific translational gap: the need to move from biological samples to universal VOCs in synthetic training aids. Unlike studies focused primarily on diagnostic classification or biological training aids, the primary objective of this work is to use chemometric modeling to pinpoint the specific ‘key odorants’ within the sweat profile. The ultimate goal is to translate these identified biomarkers into a chemical formulation for a synthetic canine training aid, ensuring standardization and safety for K9 units.

However, many compounds exist naturally in sweat, making it necessary to identify the target molecules characteristic of the aroma to be determined for training the dogs. To carry out this identification, a highly sensitive analytical method is required, such as solid-phase microextraction (SPME) coupled to gas chromatography (GC) and a time-of-flight mass spectrometry (MS-ToF) detector. The chromatographic signals generated by the analysis are entered into databases, where a search is conducted to identify the corresponding compound.

SPME is among the most widely used microextraction techniques. It is a simple, generally automated, and rapid extraction method that integrates conditioning, sampling, and extraction into a single step. For analyte detection, mass spectrometry is widely used in VOC analysis because of the high sensitivity required. Time-of-flight (ToF) mass analyzers have been used in such studies because scanning the full mass range of ions is very fast, which increases the instrument’s inherent sensitivity [29]. Therefore, ToF is well-suited to rapid non-targeted metabolomic analyses [30]. The use of GC/MS in non-targeted metabolomics yields large, complex datasets, so integrated computational approaches for data processing are needed to identify and quantify metabolites in biological samples. This analysis enables the extraction of relevant information while filtering out compounds without analytical importance.

Integrating biodetection dogs with multivariate data processing offers a robust approach for mass screening of infectious diseases like COVID-19 by leveraging the canine olfactory system’s ability to detect VOC concentrations as low as ppt [3]. The “volatilome” of an infected individual reflects a unique metabolic state, allowing dogs to discriminate between positive samples and even identify cases days earlier than conventional tests [31]. However, because the volatilome changes with factors such as gender, age, or diet [3], the effective use of this biological tool as a chemical sensor justifies the implementation of computational models that analyze the olfactory fingerprint generated during the host–pathogen interaction. By combining exceptional biological sensitivity with data processing to standardize the chemical signatures of VOCs present in sweat or breath, diagnostic accuracy can be optimized, transforming canine biodetection into a highly scalable and technologically supported preliminary screening method for use in high-traffic environments such as airports, borders, and transport hubs, or in countries with low vaccination rates [32,33].

While biological samples are the gold standard for canine training, they pose biosafety risks and stability challenges. Synthetic training aids have been proposed as alternatives; however, replicating the complex olfactory profile of a disease is challenging [4]. Creating effective training aids requires pinpointing the specific biomarkers that differentiate the disease state from the healthy volatilome. This study addresses that need with a computational framework that applies a multivariate machine learning workflow to isolate the specific ‘odor signature’ of SARS-CoV-2 from complex sweat matrices.

Based on all the above, the main objective of the COVID-K9 project is to test the capability of a detection dog as a chemical sensor. This involves the development of a technological device that contains one or more substances characteristic of the COVID-19-positive patient aroma, for subsequent use as a training aid to detect sick individuals by screening sweat with medical canine units. Achieving this objective would enable the development of an effective, versatile, fast, and inexpensive alternative diagnostic method, enabling large-scale virus detection. This would be highly useful in high-traffic areas, such as airports, borders, transport hubs, or as a screening method in countries with low vaccination rates.

2. Materials and Methods

2.1. Chemicals and Reagents

For the optimization of the analytical method and for the quality controls (QCs) in the injection sequences, a mixture of the following standards was used: acetic acid, toluene, hexanal, heptanal, nonanal, phenol, benzaldehyde, and ethyl octanoate, all purchased from Sigma Aldrich (Merck KGaA, Darmstadt, Germany) (purity: >99%). Compounds were selected to represent the primary chemical families found in sweat: alcohols, aldehydes, ketones, and aromatic compounds. Appropriate dilutions of each standard were prepared, and a mixture was then prepared at a concentration of 30 ppm. The dilutions and the multi-component mixture were prepared in methanol (Scharlau, Barcelona, Spain). Each sample used for this study consisted of gauze pads to which 20 μL of the 30 ppm mix was added. All solutions and the mix were stored at −20 °C and protected from light.

2.2. HS-SPME-GC/MS-ToF Method

The chromatographic separation was carried out using a model 7890B gas chromatograph (Agilent Technologies, Santa Clara, CA, USA) equipped with an HP-5MS column (30 m × 250 µm × 0.25 µm; Agilent Technologies, Waldbronn, Germany) with a stationary phase consisting of 5% phenyl methylpolysiloxane. Helium (Air Liquide, Madrid, Spain) was used as the carrier gas (purity ≥ 99.9995%). A time-of-flight mass spectrometer (model 7250; Agilent Technologies, Santa Clara, CA, USA), equipped with an electron-impact ionization source (70 eV) and centroid data acquisition, was used. The samples were injected in splitless mode using an autosampler (PAL System CTC Analytics, Zwingen, Switzerland). The column temperature was held at 40 °C for 2 min, then increased to 200 °C at 8 °C/min and to 260 °C at 5 °C/min, and held for 1 min. The temperature of the MS transfer line was maintained at 260 °C.
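As a quick check on the oven program described above, the total chromatographic run time can be worked out from the stated holds and ramps. The sketch below is only illustrative arithmetic; the function and variable names are hypothetical, and it assumes no hold at 200 °C because none is stated.

```python
# Oven program taken from the text: initial hold, then two linear ramps.
INITIAL_TEMP_C = 40.0
INITIAL_HOLD_MIN = 2.0
# Each ramp: (rate in °C/min, target temperature in °C, hold at target in min).
RAMPS = [
    (8.0, 200.0, 0.0),  # 40 -> 200 °C at 8 °C/min (no hold stated, assumed 0)
    (5.0, 260.0, 1.0),  # 200 -> 260 °C at 5 °C/min, then held for 1 min
]


def total_run_time(initial_temp, initial_hold, ramps):
    """Return the total duration of the oven program in minutes."""
    minutes = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        minutes += (target - temp) / rate + hold
        temp = target
    return minutes


run_time_min = total_run_time(INITIAL_TEMP_C, INITIAL_HOLD_MIN, RAMPS)
print(run_time_min)
```

Under these assumptions the program lasts 2 + 20 + 12 + 1 = 35 min per injection, excluding extraction, desorption, and cool-down time.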

Samples were equilibrated for 1 min at 30 °C without agitation. Fiber exposure depth was standardized according to manufacturer recommendations; desorption time was 360 s; injector temperature was 270 °C; and fiber conditioning time after injection was 60 s.

To avoid drift in chromatographic signals, the sample batches and injection sequences were prepared according to a protocol specifying a defined distribution of positive samples, quality controls (two gauze pads with 20 µL of the standard solution added), and blanks (empty vials with two sterile 5 × 5 cm gauze pads). Short sequences of positive and negative samples (tempered for one hour before injection) were injected, alternating with each other and with quality controls (QCs) throughout the sequence. In addition, checks (TuneChecks) were carried out with each injection to verify that the chromatograph and detector were working correctly.

2.3. Fiber and Extraction Time Optimization

In this study, some of the variables were not optimized. Thus, the extraction temperature was set at the maximum body temperature (40 °C). Other extraction parameters (desorption temperature, desorption time, agitation) were previously set according to recommendations from prior work in the same field [30,34,35,36,37]. The optimized and evaluated parameters were fiber type and extraction time.

Fiber selection was conducted to identify the most suitable adsorbent for extracting the analytes of interest based on their polarity characteristics. Five SPME fibers (Supelco, Bellefonte, PA, USA) were evaluated. Before use, they were conditioned according to the manufacturer’s recommendations: 85 μm Carboxen/Polydimethylsiloxane (CAR/PDMS) fiber and 75 μm Carboxen/Polydimethylsiloxane (CAR/PDMS) fiber at 300 °C; 60 μm Polyethylene glycol (PEG) fiber at 240 °C; 50/30 μm Divinylbenzene/Carboxen/Polydimethylsiloxane (DVB/CAR/PDMS) fiber at 270 °C; 65 μm Polydimethylsiloxane/divinylbenzene (PDMS/DVB) fiber at 250 °C. Samples were collected from a single healthy volunteer during 15 min of outdoor exercise. All samples were collected simultaneously so that the optimization used the same set of samples and variability was minimized.

Each fiber was assayed in quadruplicate, and reproducibility was studied by analyzing each sample three times. A blank containing only the gauze was prepared to determine which signals it produced with each fiber. To select the fiber coating for the HS-SPME process, the abundance of all standards and the total volatile compounds extracted in both the sample and the blank were evaluated. The selected fiber therefore had to meet the following criteria: maximum abundance for all or almost all standards, the highest number of signals in the sample, and the lowest number of compounds in the blank.

The extraction time was investigated at 5, 10, 15, 20, 30, and 50 min. Signals corresponding to four compounds (toluene, 2-ethyl-1-hexanol, 2,6-dimethyl-7-octen-2-ol, and nonanal) were selected because they were the main compounds in the chromatogram and each belonged to a different family of compounds, all of which are present in human sweat.

2.4. Sampling Design and Collection of Real Samples

Although studies to date have shown that sweat is not a vector for SARS-CoV-2 [38], the sampling design with volunteers incorporated, based on the precautionary principle, a disinfection and quarantine process for the vials. To determine whether disinfecting the vials with bleach solution interfered with the subsequent chemical analysis of the sweat samples, an empty vial not cleaned with bleach and another vial externally disinfected with bleach were analyzed by gas chromatography, and the chromatographic signals from the two vials were compared. The test was performed four times, and reproducibility was studied by analyzing the same sample three times.

Two types of sterile gauze pads of different sizes were used: 10 × 10 cm gauze pads of the commercial brand Eroski (Luxembourg) and 5 × 5 cm gauze pads of the commercial brand Medicomp (HARTMANN, Heidenheim, Germany). No pretreatment of the gauze was performed before chemical analysis for VOCs, although gauze is known to contain VOCs [36]. To minimize background chromatographic signals from the vials, three types of vials were analyzed: new vial, old vial (reused), and vial heated in an oven at 100 °C overnight. Each condition was tested twice. In all cases, 20 mL transparent glass SPME vials (Agilent Technologies, Berlin, Germany) with a PTFE/silicone screw cap (Agilent Technologies, Santa Clara, CA, USA) were used.

SARS-CoV-2-negative and SARS-CoV-2-positive individuals were recruited from the Basque Autonomous Community (CAV-Euskadi). Participation was completely voluntary, and all subjects were informed about the objectives of this study. The protocol was approved by the Ethics Committee for Research involving Human Subjects, their Samples and Data (CEISH-UPV/EHU) and the Ethics Committee for Research with Biological Agents and/or GMOs (CEIAB-UPV/EHU). Samples were collected from 76 individuals aged 18–73, 42 women and 34 men.

The recruitment process was conducted through an appeal on social networks. Three groups were sampled in this study: SARS-CoV-2-positive individuals with symptoms (Group A), SARS-CoV-2-positive individuals without symptoms (Group B), and SARS-CoV-2-negative individuals (Group C). Group A comprised individuals showing clinical symptoms of COVID-19 with a positive diagnostic test for active infection (AIDP) for SARS-CoV-2; Group B comprised individuals showing no clinical symptoms of COVID-19 with a positive AIDP for SARS-CoV-2. Group C included only samples from individuals with a negative AIDP. In all cases, diagnostic tests, whether positive or negative, were required to have been performed within the 48 h before sweat sampling. Potential confounding from medical treatments in SARS-CoV-2-positive individuals, or from diet and routine, was not controlled for. Due to isolation and quarantine protocols, sample collection was carried out by the participants in their respective homes. In addition, when an AIDP was required for more than one direct contact in the same household, an attempt was made to recruit both SARS-CoV-2-negative and SARS-CoV-2-positive individuals from that household.

Before initiating the study protocol, all participants signed the informed consent form and sent it electronically (as a mobile phone photograph or scan) to the principal investigator. They also provided their postal address so that the sampling kit and instructions could be sent to them and its collection arranged. Once participation was confirmed, a link was shared to an online form in which participants provided, in pseudonymized form, personal data such as age, sex, absence/presence of symptoms, current symptoms, type of AIDP, date of sample collection for the AIDP test, and the result of the diagnostic test performed. To ensure anonymization, each participant was assigned a sample code.

Sampling was performed by each volunteer at home. To minimize potential alterations to the volatilome, scented products such as shampoos or colognes were not to be applied for at least 48 h before sample acquisition. A gauze pad was held under each axilla for 30 min. Subsequently, the gauzes were introduced into the previously conditioned extraction vials, corresponding to the numbered plastic Falcon tubes. Individual anonymized data were recorded on a form for each coded sample.

2.5. Stability Study

Given the extended duration of the sample collection phase, a stability study was conducted to identify optimal storage temperatures and evaluate the degradation kinetics of the volatile compounds. Using sweat samples from healthy controls, the maximum storage time and temperature that maintained the integrity of the original VOC pattern (time 0) were determined.

Two stability studies were carried out independently: one short-term (up to 15 days) and one long-term (15 days to two months). The same storage temperatures were used in both cases: 4 °C and −20 °C. In both the short-term and long-term studies, four sweat samples were used (A–D). Two of them (A and B) were stored at −20 °C, and the other two (C and D) at 4 °C. In the short-term study, the four samples were injected at defined time points (t = 1, 4, 6, 8, 11, 13, and 18 days). In the long-term study, the four samples were injected at t = 23, 28, 33, and 41 days, and only the samples stored at −20 °C (A and B) were analyzed at t = 77 days.

Both the optimal storage temperature for sweat samples and their stability were determined by summing the areas of chromatographic peaks corresponding to the previously identified compounds over time.
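The summed-area criterion described above can be sketched as follows. This is a minimal illustration, not the authors' processing script: the function names are hypothetical, the peak areas are invented placeholder values in arbitrary units, and expressing each time point as a percentage of time 0 is an assumed way of tracking the sum over time.

```python
def summed_area(peak_areas):
    """Sum the peak areas of the tracked compounds for one injection."""
    return sum(peak_areas.values())


def relative_to_time_zero(series):
    """Express each time point's summed area as a percentage of time 0.

    `series` maps storage day -> {compound name: peak area}; day 0 is required.
    """
    baseline = summed_area(series[0])
    return {day: 100.0 * summed_area(areas) / baseline
            for day, areas in sorted(series.items())}


# Hypothetical peak areas (arbitrary units), NOT values from the study.
example = {
    0: {"toluene": 400.0, "nonanal": 600.0},
    4: {"toluene": 380.0, "nonanal": 520.0},
    8: {"toluene": 300.0, "nonanal": 450.0},
}
recovery = relative_to_time_zero(example)
print(recovery)
```

Comparing such curves for the two storage temperatures is one way to read off the longest storage time that still preserves the original pattern.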

2.6. Data Pre-Treatment and Multivariate Analysis

To compare the chromatograms obtained from the sweat samples by GC/ToF, the data set was processed with MS-DIAL 4.9 software (RIKEN, Kanagawa, Japan) to detect and align features. A feature is defined as an ion signal with a unique m/z and a specific retention time (RT). MS-DIAL performs deconvolution of the data from each acquired sample and aligns them based on the RT of the reference sample. Finally, the software provides a data matrix that lists the abundance of each feature for each analyzed sample. The parameters used for feature detection were a minimum signal amplitude of 10,000, a mass accuracy of 0.025 Da, and a width of 0.1 Da. For alignment, an RT tolerance of 0.075 min and an identification similarity tolerance of 70% were used. Finally, a manual review of the integration areas was conducted before the multivariate analysis.
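To make the role of the amplitude, mass, and RT tolerances concrete, the sketch below aligns one sample's features to a reference list. It is a deliberately simplified nearest-match scheme written for illustration and does not reproduce MS-DIAL's deconvolution or alignment algorithm; the function name and the example features are hypothetical, and only the three threshold values come from the text.

```python
MIN_AMPLITUDE = 10_000  # minimum signal amplitude quoted in the text
MZ_TOL_DA = 0.025       # mass accuracy quoted in the text
RT_TOL_MIN = 0.075      # alignment RT tolerance quoted in the text


def align_to_reference(reference, sample):
    """Return one data-matrix row: the abundance matched to each reference feature.

    Features are (rt_min, mz, abundance) tuples. Sample features below the
    amplitude threshold are discarded; unmatched reference features get 0.0.
    """
    kept = [f for f in sample if f[2] >= MIN_AMPLITUDE]
    row = []
    for ref_rt, ref_mz, _ in reference:
        candidates = [
            f for f in kept
            if abs(f[0] - ref_rt) <= RT_TOL_MIN and abs(f[1] - ref_mz) <= MZ_TOL_DA
        ]
        if candidates:
            # Keep the candidate closest in retention time.
            best = min(candidates, key=lambda f: abs(f[0] - ref_rt))
            row.append(best[2])
        else:
            row.append(0.0)
    return row


# Hypothetical features, not data from the study.
reference = [(5.00, 91.054, 50_000), (12.30, 57.070, 80_000)]
sample = [(5.04, 91.060, 42_000), (12.50, 57.070, 90_000), (5.01, 91.055, 9_000)]
row = align_to_reference(reference, sample)
print(row)
```

In this toy case the second reference feature stays unmatched because the only candidate lies 0.2 min away, outside the RT tolerance.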

Data analysis was carried out in MATLAB (R2024a, The MathWorks, Natick, MA, USA) using the data matrix obtained from MS-DIAL. An unsupervised analysis by Principal Component Analysis (PCA) was first performed with the PLS_Toolbox (Eigenvector Research, WA, USA) to identify differences between the two groups: positive patients (A) and negative controls (C). This overview can reveal groups of observations and trends. PCA also uncovers relationships among observations and variables, as well as among variables themselves. The PCA results were inspected visually through score plots.
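The idea behind a PCA score plot can be illustrated with a small standard-library sketch that extracts only the first principal component by power iteration on mean-centered data. It is not the PLS_Toolbox implementation used in the study; the function name and the toy data matrix are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) of PC1 for mean-centered data via power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        t = [sum(xi[j] * v[j] for j in range(p)) for xi in x]           # t = X v
        v = [sum(x[i][j] * t[i] for i in range(n)) for j in range(p)]   # v = X^T t
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    scores = [sum(xi[j] * v[j] for j in range(p)) for xi in x]
    return v, scores


# Hypothetical abundances: three "control-like" rows, then three "positive-like" rows.
rows = [
    [1.0, 2.0, 0.5], [1.2, 1.9, 0.4], [0.9, 2.1, 0.6],
    [3.0, 4.1, 0.5], [3.2, 3.9, 0.6], [2.9, 4.0, 0.4],
]
loadings, scores = first_principal_component(rows)
print(scores)
```

Plotting such scores for the first two components, colored by group, gives the kind of overview in which clusters and trends become visible.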

The classification method was developed with the Classification toolbox [39]. Partial Least Squares-Discriminant Analysis (PLS-DA) was performed to classify the groups. Partial Least-Squares (PLS) regression is a technique used with data that contain correlated predictor variables. This technique constructs new predictor variables, known as components, as linear combinations of the original predictor variables. PLS constructs these components by considering the observed response values, yielding a model with reliable predictive power. PLS therefore combines information about the variances of both the predictors and the responses while also accounting for correlations among them.
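The way PLS builds a component from both the predictors and the response can be sketched with a single-component PLS-DA in plain Python. This is a minimal illustration under stated assumptions, not the Classification toolbox routine used in the study: classes are coded 0/1, only one latent variable is extracted, the 0.5 decision threshold is an assumption, and all names and data are hypothetical.

```python
import math


def fit_pls_da_one_component(x_rows, labels):
    """Fit a one-component PLS model on 0/1 class labels."""
    n, p = len(x_rows), len(x_rows[0])
    x_mean = [sum(r[j] for r in x_rows) / n for j in range(p)]
    y_mean = sum(labels) / n
    xc = [[r[j] - x_mean[j] for j in range(p)] for r in x_rows]
    yc = [y - y_mean for y in labels]
    # Weight vector: direction in X with maximal covariance with y.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    # Scores (the new component) and the regression of y on them.
    t = [sum(xi[j] * w[j] for j in range(p)) for xi in xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict(model, row, threshold=0.5):
    """Project a new sample on the component and threshold the predicted response."""
    t = sum((row[j] - model["x_mean"][j]) * model["w"][j] for j in range(len(row)))
    y_hat = model["y_mean"] + model["q"] * t
    return 1 if y_hat > threshold else 0


# Hypothetical training data: label 0 = negative control, 1 = positive patient.
x_train = [
    [1.0, 2.0, 0.5], [1.2, 1.9, 0.4], [0.9, 2.1, 0.6],
    [3.0, 4.1, 0.5], [3.2, 3.9, 0.6], [2.9, 4.0, 0.4],
]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_pls_da_one_component(x_train, y_train)
print(predict(model, [1.1, 2.0, 0.5]), predict(model, [3.1, 4.0, 0.5]))
```

A full PLS-DA model would extract further components after deflating X, with the number of components chosen by validation.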

Variable selection was also performed by excluding all signals placed in the middle of the loading scatter plot, since they are the ones that have the least weight in the model when classifying the two groups. Moreover, variables with VIP < 1 were excluded. Variable Importance in Projection (VIP) scores estimate the importance of each variable in the projection used in a PLS model and are often used for variable selection. A variable with a VIP Score close to or greater than 1 (one) can be considered important in the given model. Variables with VIP scores significantly less than 1 (one) are less important and might be good candidates for exclusion from the model. After this treatment, the data matrix was reduced from 440 to 285 features.
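For reference, a commonly used definition of the VIP score for variable j is the square root of p·Σ_a SSY_a (w_ja/‖w_a‖)² / Σ_a SSY_a, where p is the number of variables, w_a the weight vector of component a, and SSY_a the response variance explained by that component. The sketch below applies this definition and the VIP < 1 cut-off; it assumes this is the variant in use, and the function name, weights, and explained-variance values are hypothetical.

```python
import math


def vip_scores(weights, ssy):
    """Compute VIP scores.

    `weights` is a list with one weight vector per PLS component;
    `ssy` holds the response sum of squares explained by each component.
    """
    n_features = len(weights[0])
    total_ssy = sum(ssy)
    scores = []
    for j in range(n_features):
        acc = 0.0
        for w, s in zip(weights, ssy):
            norm_sq = sum(c * c for c in w)
            acc += s * (w[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_features * acc / total_ssy))
    return scores


# Hypothetical one-component model with three variables.
weights = [[3.0, 4.0, 0.0]]
ssy = [1.0]
vip = vip_scores(weights, ssy)
selected = [j for j, v in enumerate(vip) if v >= 1.0]  # drop variables with VIP < 1
print(vip, selected)
```

With this definition the squared VIP scores average to 1, which is why 1 is the usual reference value for the cut-off.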

PLS reduces collinearity in X (predictor matrix) and identifies the optimal correlations between the latent variables and Y (response vector). The effects of gender, symptoms, age, day of injection, and viral variant on sample separation were also examined. Cross-validation (CV) and external validation were used as practical and reliable methods to assess the significance of the PLS-DA model.

Cross-validation is a model assessment technique used to evaluate a machine learning algorithm’s performance on new datasets on which it has not been trained. This is performed by partitioning the known dataset, using a subset to train the algorithm and the remaining data for testing. Each round of cross-validation involves randomly partitioning the original dataset into a training set and a testing set. The training set is then used to train a supervised learning algorithm, and the testing set is used to evaluate its performance. This process is repeated several times, and the average cross-validation error is used as a performance indicator.
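The partition, train, test, and average loop described above can be sketched as k-fold cross-validation. This is one common variant and is not necessarily the scheme used in the study; to keep the example short, a nearest-centroid classifier stands in for PLS-DA, and all names and data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Randomly partition sample indices into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(rows, labels):
    """Stand-in classifier: store the mean vector of each class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def nearest_centroid_predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def cross_validated_error(rows, labels, k=3, seed=0):
    """Average misclassification rate over the k held-out folds."""
    errors = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        train_idx = [i for i in range(len(rows)) if i not in test_idx]
        model = nearest_centroid_fit([rows[i] for i in train_idx],
                                     [labels[i] for i in train_idx])
        wrong = sum(nearest_centroid_predict(model, rows[i]) != labels[i]
                    for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / k


# Hypothetical, well-separated toy data (0 = negative, 1 = positive).
rows = [
    [1.0, 2.0], [1.2, 1.9], [0.9, 2.1],
    [3.0, 4.1], [3.2, 3.9], [2.9, 4.0],
]
labels = [0, 0, 0, 1, 1, 1]
cv_error = cross_validated_error(rows, labels, k=3)
print(cv_error)
```

Every sample is held out exactly once, so the averaged error estimates performance on data the model was not trained on.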

Validation values can be calculated, such as:

Sensitivity: the classification and/or prediction capacity obtained for the category defined as positive (in our case, category A, Positive Patient). It is calculated using Equation (1).

$$\%\,\text{Sensitivity} = \frac{TP}{TP + FN} \times 100, \tag{1}$$

where TP = True Positives and FN = False Negatives.

Specificity: the classification and/or prediction capacity obtained for the category defined as negative (in our case, category C, Negative Control). It is calculated using Equation (2).

$$\%\,\text{Specificity} = \frac{TN}{TN + FP} \times 100, \tag{2}$$

where TN = True Negatives and FP = False Positives.
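Equations (1) and (2) can be evaluated directly from the four cells of a confusion matrix. The sketch below also returns overall accuracy using its standard definition, which is an addition not written out in the text; the function name is hypothetical and the counts are invented for illustration, not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (TP/(TP+FN), TN/(TN+FP), overall accuracy), all in percent."""
    positive_rate = 100.0 * tp / (tp + fn)            # Equation (1)
    specificity = 100.0 * tn / (tn + fp)              # Equation (2)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return positive_rate, specificity, accuracy


# Hypothetical counts: 10 positive patients and 10 negative controls.
eq1, eq2, acc = classification_metrics(tp=8, fn=2, tn=9, fp=1)
print(eq1, eq2, acc)
```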

Sensitivity and specificity are inversely related across diagnostic thresholds. If the threshold for a positive test is set higher (for example, when the test provides a continuously valued result), sensitivity will decrease while specificity will increase. Conversely, if the threshold is lower, sensitivity increases and specificity decreases. This non-independence of sensitivity and specificity across explicit or implicit diagnostic thresholds poses challenges for quantitative synthesis. Receiver operating characteristic (ROC) curves are an important tool for evaluating the performance of a machine learning model. They are typically used in binary classification problems, that is, problems with two distinct output classes. The ROC curve shows the relationship between the true positive rate (TPR) and the false positive rate (FPR) of the model. Perfect classifiers have a TPR of 1 and an FPR of 0.
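The threshold trade-off and the ROC curve can be made concrete by sweeping the decision threshold over a set of continuous classifier outputs. The sketch below is a generic illustration with invented scores; the function names are hypothetical, and the trapezoidal area under the curve is a standard summary that the text does not explicitly report.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical model outputs (higher = more likely positive) and true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
print(points, area)
```

Each step down in threshold can only raise TPR (sensitivity) or FPR (1 − specificity), which is the trade-off described above.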

The model was created and validated using a small number of samples due to the difficulty of obtaining them. The effectiveness of the PLS-DA method was compared to SIMCA and Random Forest.

The classification method Soft Independent Modelling of Class Analogies (SIMCA) describes each class of samples using its own principal component model.

As the name implies, Random Forest is a supervised classification method that builds multiple decision-tree classifiers to improve predictive accuracy. The trees are constructed on the training data, and their predictions for the test samples are combined to yield a class label.

3. Results

3.1. Fiber and Extraction Time Optimization

As shown in Figure 1, the CAR/PDMS (75 μm), DVB/CAR/PDMS, and PDMS/DVB fibers demonstrated high extraction capacities for acidic compounds, BTEX, aldehydes, aromatics, and phenol. While all three mixed-phase fibers effectively extracted analytes of varying polarities, the DVB/CAR/PDMS fiber distinguished itself by recovering the greatest number of features from the biological sample while maintaining the lowest background signal in the blank (Table 1). Based on these results, DVB/CAR/PDMS was determined to be the most suitable fiber for this study.

Similarly, the effect of extraction time was investigated, in this case using real axillary sweat samples. The chromatographic signals of four main compounds (toluene, 2-ethyl-1-hexanol, 2,6-dimethyl-7-octen-2-ol, and nonanal) were studied at extraction times from 5 to 50 min. Figure 2 illustrates that extraction equilibrium was reached at 30 min for most of the compounds studied. For nonanal, a maximum is observed at 15 min; however, this finding was not considered significant for the study, as it may be attributable to the analytical method. Although no maximum is observed for toluene and 2,6-dimethyl-7-octen-2-ol, the curve decreases substantially after 30 min. This suggests that the equilibrium of these components within that time range does not necessarily reach a maximum.

Based on these experimental results, the optimal extraction conditions were a DVB/CAR/PDMS fiber, a temperature of 40 °C, and a time of 30 min.

3.2. Sampling Design and Collection of Real Samples

First, the impact of the cleaning protocol using 10% bleach was assessed. The VOC profile was found to remain unaltered, with the total count of detected compounds showing no significant variation (Table 2), thus confirming that the sterilization procedure did not introduce analytical artifacts.

Secondly, headspace evaluation of the 5 × 5 cm and 10 × 10 cm sterile gauze pads revealed an identical background of 98 compounds (Table 2). However, regarding the extraction efficiency of the eight target standards (acetic acid, toluene, hexanal, heptanal, nonanal, phenol, benzaldehyde, and ethyl octanoate), superior reproducibility was observed with the 5 × 5 cm gauze. Moreover, this dimension permitted the simultaneous insertion of two pads into the sampling vial, thereby facilitating the concurrent sampling of both axillae.

Finally, a comparative analysis of vial preparation methods (new, reused, and thermally conditioned) demonstrated a marked reduction in background volatile compounds in the heated vials (Table 2). Consequently, a thermal conditioning step at 100 °C was implemented prior to use to minimize background interference.

3.3. Stability Results

The first objective was to determine the temperature at which the samples could be preserved without modifying their volatile profile.

As some of the compounds selected for the short-term study were lost in the long-term study, they were replaced by others to complete the long-term research. In the short-term study, the compounds studied were toluene, 2,6-dimethyl-7-octen-2-ol, nonanal, and 2-ethyl-1-hexanol. In the long-term study, the compounds were toluene, 2,6-dimethyl-7-octen-2-ol, nonanal, lilial, and isopropyl myristate.

In the stability study conducted both in the short and long term, the absence of certain compounds at 4 °C was observed, including nonanal, lilial, and isopropyl myristate. However, the absence or presence of these compounds may reflect the complexity of sweat as a biological matrix, which depends on diet, exercise, environment, and other factors that influence the composition of VOCs. Although the sweat samples were obtained from the same person, they were collected on different days and at different times to conduct each stability study independently.

The second study aimed to determine the short- and long-term stability of aids stored in the freezer. In this study, decreases in most compounds were observed over time, although it was unclear whether this decline was attributable to the passage of time or to repeated injections. In the short-term study, toluene, 2,6-dimethyl-7-octen-2-ol, and nonanal (present only at −20 °C) showed similar behavior, whereas 2-ethyl-1-hexanol reached its maximum on day 6 and then decreased.

In the long-term study, toluene was more stable than the other compounds. 2,6-Dimethyl-7-octen-2-ol behaved very similarly in the short and long term. Nonanal and lilial behaved similarly to each other, and, finally, isopropyl myristate decreased over time but increased considerably in the final analysis (day 77).

In general, the VOC profile of the sweat samples was unstable at both temperatures studied (4 °C and −20 °C), so the samples were stored at −80 °C.

3.4. Multivariate Analysis Results

3.4.1. Data Pre-Treatment and Datasheet Creation

A total of 79 samples were analyzed: 25 positive samples, 34 negative controls, 15 QCs and 5 blanks. The batch and injection order for each sample were assigned within the MS-DIAL program.

After deconvolution and alignment, MS-DIAL identified 810 features. However, these results had to be checked to verify, on the one hand, that the integrated signals were well aligned and, on the other, that the program had performed the integration correctly.

After cleaning the poorly defined signals, 441 features were selected to construct the data matrix.

The data were preprocessed using logarithmic transformations to improve the predictive performance and interpretability of the multivariate calibration model. Autoscaling was also applied: each variable is mean-centered and then divided by its standard deviation. This transformation is particularly useful when some variables exhibit very high peak areas. If such variables are not scaled before data analysis, they exert a strong influence on the model and dominate the other variables.
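The two preprocessing steps can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure applied here: the base-10 logarithm, the offset of 1 added to avoid taking the logarithm of zero, the use of the sample standard deviation, and the function name `log_autoscale` are all choices made for the example, and the peak areas are invented.

```python
import math
from statistics import mean, stdev

def log_autoscale(matrix, offset=1.0):
    """Log-transform peak areas, then mean-center each column and divide by its standard deviation."""
    # Assumption: log10(area + offset); the offset only guards against zero areas.
    logged = [[math.log10(value + offset) for value in row] for row in matrix]
    columns = list(zip(*logged))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # sample standard deviation (n - 1)
    return [
        # A constant column has zero spread and is mapped to 0 rather than dividing by zero.
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, sds)]
        for row in logged
    ]

# Hypothetical peak areas: rows are samples, columns are features of very different magnitude.
areas = [
    [1.2e6, 350.0],
    [9.8e5, 410.0],
    [1.5e6, 290.0],
]
scaled = log_autoscale(areas)
for row in scaled:
    print([round(value, 3) for value in row])
```

After the transformation every feature has zero mean and unit standard deviation, so a feature with areas in the millions no longer outweighs one with areas in the hundreds.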

3.4.2. Multivariate Model Optimization and Validation

The first step was to perform a preliminary PCA to assess whether any trends or clustering were present.

The PCA model scores and loadings are shown in Figure 3. The scores and loadings were plotted for PC1 and PC2 because these components explained the largest share of the variance.

Initial unsupervised analysis using Principal Component Analysis (PCA) revealed no distinct clustering between SARS-CoV-2 positive individuals (red) and healthy controls (blue). This lack of separation was attributed to the high biological variability inherent to the sweat matrix and the dominance of confounding background features over subtle disease-specific signals.

Consequently, Partial Least Squares-Discriminant Analysis (PLS-DA) was employed to maximize class separation and identify discriminatory biomarkers. Prior to modeling, non-informative variables were removed, and data underwent logarithmic transformation and autoscaling to standardize signal intensity. Model robustness was assessed using the MATLAB Classification Toolbox by Davide Ballabio. To generate the classification model, the dataset was randomly partitioned into a training set (approx. 70%) and an external validation test set (approx. 30%) to ensure independent evaluation. The model was optimized using cross-validation on the training set, while predictive accuracy was evaluated using the external samples, which were strictly excluded from the model construction phase. Variable selection was performed based on Variable Importance in Projection (VIP) scores (VIP > 1) derived from the PLS-DA model. The number of latent variables (LVs) was optimized by minimizing the root mean square error of cross-validation.
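For reference, a commonly used definition of the VIP score of variable j is given below. The notation is introduced here for illustration and is not taken from the original text: p is the number of variables, A the number of latent variables, w_a the PLS weight vector of latent variable a (with element w_ja for variable j), and SSY_a the sum of squares of the response explained by latent variable a.

```latex
\mathrm{VIP}_j = \sqrt{\, p \,
  \frac{\sum_{a=1}^{A} \mathrm{SSY}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}
       {\sum_{a=1}^{A} \mathrm{SSY}_a} }
```

Under this definition the mean of the squared VIP scores over all p variables equals 1, which is why a threshold of VIP > 1 is often read as "more important than average".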

Figure 4 illustrates the predictions for class A samples from the training and test datasets.

As shown, 88% of positive patients and 73% of negative controls in the external test set were correctly classified (Table 3), resulting in an overall accuracy of 79%, which is regarded as acceptable for this type of analysis. The model displayed a sensitivity of 70% and a specificity of 89%, indicating effective predictive performance for both groups.
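The external-validation figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that 7 of the 8 positive samples and 8 of the 11 controls were correctly classified; these counts are inferred from the rounded percentages in Table 3 and are not stated explicitly in the text. The function name is hypothetical, and the definitions are the standard ones based on TP, FN, TN and FP.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard two-class performance figures from the four confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Assumed counts for the PLS-DA external test set (inferred from Table 3, not reported directly).
metrics = classification_metrics(tp=7, fn=1, tn=8, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts the overall accuracy is 15/19, or about 79%. The true positive and true negative rates come out at 87.5% and 72.7%, while the positive and negative predictive values are 70.0% and 88.9%, so it is worth checking which per-class definition a given toolbox reports when comparing figures.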

ROC curves and the sensitivity and specificity thresholds are shown in Figure 5.
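As a reminder of how such a curve is built, the sketch below sweeps a decision threshold over a set of classifier scores and records the false positive rate and true positive rate at each step; the area under the curve follows from the trapezoidal rule. The scores and labels are invented for illustration and are not data from this study, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering the threshold over every distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A sample is called positive when its score reaches the threshold.
        tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores (higher means "more likely positive") and true labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
```

Each point on the curve corresponds to one candidate threshold, so the sensitivity and specificity obtained at any chosen cut-off can be read from the same sweep.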

The effectiveness of the PLS-DA method was compared with that of SIMCA and Random Forest (RF). Figure 6 illustrates the SIMCA predictions for class A samples from the training and test datasets.

A sensitivity of 0% and a specificity of 46% were obtained, indicating that the SIMCA classification model performed poorly. This outcome is also evidenced by the ROC curves and the sensitivity/specificity thresholds shown in Figure 7.

An RF model was also built. Figure 8 illustrates the RF predictions for class A samples from the training and test datasets.

External validation yielded a sensitivity and specificity of 75% and 82%, respectively. These findings corroborate the performance observed in the PLS-DA model.

Finally, a confusion matrix (Table 3) was generated to visualize the classification of all samples into the two study groups.

Although lower performance metrics were obtained using SIMCA, satisfactory and comparable results were achieved with both the RF and PLS-DA models.

3.5. VOC Candidates for Constructing Artificial Training Aids for COVID-19 K9 Detection

The MS-DIAL program assigns retention times to each feature, and the mass measurement, together with the use of a ToF detector with high mass accuracy, facilitates metabolite identification.

To identify the compounds, the chromatographic signal of each feature was extracted from its retention time and exact mass using MassHunter Qualitative Analysis 10.0 software (Agilent, Santa Clara, CA, USA). The signal was then identified with the NIST MS Search program (NIST, Gaithersburg, MD, USA) by comparing its mass spectrum with the NIST14 library.

Finally, among the possible compounds of interest, those with the highest identification precision in the library were selected; compounds that did not correspond to possible metabolites of the human metabolome were eliminated (silanes from the column, fragrances, body creams, etc.), and, in case of doubt among compounds, those specific to the metabolome were selected through a search in the HMDB (human metabolome database [32]).

3.5.1. VOCs Specific to Positive Patients

Following the identification process, a series of compounds specific to COVID-19-positive patients were selected as VOCs of interest. Although the features are presented in Table 4, information on their formulation and/or molecular structure is not provided here, as it forms part of a future device from Auziker (Ikerkude S.L., Leioa, Spain).

After removing undesirable compounds, such as silanes and compounds present in colognes, features 6, 23, 50, 65, 128, and 653 were retained as the VOCs selected to develop an aid for training medical canine units to detect positive COVID-19 cases.

3.5.2. VOCs with Higher Presence in Negative Controls

The features specific to the negative controls are presented in Table 5. In addition, their identification score for the compound is provided, along with the compound name and its presence or absence as a metabolite in the human organism.

Features 183 and 58 were discarded as they were solvents or silanes. In cases such as 406 and 398, where several possible compounds were identified by mass spectrometry, a search was conducted for the compound with the highest NIST score in the HMDB. Among the multiple options, VOCs without a potential human origin were excluded.

4. Discussion

4.1. Discussion of Results Obtained in the Multivariate Study

As mentioned, a scores plot shows the relationships among the observations. If information such as patient characteristics, sampling date, or injection order is added to the samples, it can be determined whether these external factors influence the sweat metabolite profile. The factors considered were the age of the sampled individuals, gender (man or woman), the day on which each sample was injected (batch), and the presumed strain responsible for the infection (the most probable strain according to the epidemiological studies available for each of the waves sampled: Alpha, Delta, or Omicron).

Although sampling was open to all adults, the participants tended to be older, probably because most cases during the pandemic occurred in adults or older adults (ages were divided into three groups: under 30 years, between 30 and 50 years, and over 50 years).

As shown in Figure 9, no groupings were observed by factors such as age, gender, or injection batch, and these factors did not influence the results.

Furthermore, the random arrangement of the samples when analyzing the injection batches, which indicates the order of the days on which the samples were injected, confirms the robustness of the analytical method used, demonstrating that injections on different days have not influenced the results.

Regarding the symptoms, the samples from COVID-19-positive patients who were asymptomatic are all located in the middle region of the graph, or even in the area where the negative controls are grouped (Figure 9d). This indicates that the multivariate model has not been able to distinguish these samples; that is, the concentration or presence of VOCs in these samples is similar to that in the negative samples. This suggests that the compounds responsible for the aroma of sweat excreted by asymptomatic individuals may be identical to those found in healthy individuals. These results align with findings from other studies comparing the metabolomes of COVID-19 patients with varying disease severity, in which the metabolomes of negative individuals are similar to asymptomatic patients [40,41].

Finally, regarding the viral strain with which the patients were infected, some differentiation can be seen between certain strains: samples from the Omicron variant (blue) are located in the upper left margin of the plot, whereas samples from the Alpha variant (red) are located in the lower margin (Figure 9e). However, identifying differences among the VOCs generated by infection with different strains would require a multivariate analysis using only patient samples, and therefore a larger sample size. In the present work, some of the sweat samples from both Alpha and Delta patients are located in the area of the negative controls. Because there is no clear differentiation along the classification axis of the model, that is, the axis that separates positive patients from controls, the strain does not appear to influence the results (Figure 9e).

It is crucial to note that the multivariate analysis identified no significant groupings in sweat metabolites associated with external factors such as age, gender, the sample’s injection batch (confirming the method’s robustness), viral strains (Alpha, Delta, Omicron), or the patient’s asymptomatic status. The lack of influence of variables such as age, gender, and injection batch on the results is a key indicator that the distinctive Volatile Organic Compounds (VOCs) of COVID-19 infection are generated consistently and independently of these demographic or analytical characteristics. This consistency is of paramount importance for the objective of creating an effective chemical aid based on these VOCs. An aid that is independent of patient gender or age maximizes its universality and applicability in training dogs as chemical sensors, ensuring that the animals focus solely on the “scent of the disease” without interference from other biological variations.

4.2. VOCs in Negative Controls

After the identification process, five compounds were selected as VOCs of interest that decrease in concentration in patients positive for COVID-19: features 406, 398, 193, 130, and 68, corresponding to 2,7,10-trimethyldodecane, 2,6,10,15-tetramethylheptadecane, cyclohexanone, hexanal, and pentanal, respectively. An independent study of these metabolites may help elucidate the behavior of the virus and the reasons these compounds are present at lower concentrations in the sweat of patients with COVID-19.

2,7,10-trimethyldodecane is a member of the class of organic compounds known as branched alkanes. These are acyclic, branched hydrocarbons having the general formula CₙH₂ₙ₊₂. The human metabolome database reports breath as its biological location [35].

2,6,10,15-tetramethylheptadecane is a sesquiterpenoid. These are terpenes with three consecutive isoprene units [35].

Cyclohexanone is a member of the class of organic compounds known as cyclic ketones. These are organic compounds containing a ketone that is conjugated to a cyclic moiety. The human metabolome database reports feces as its biological location [35]. In one study, several cyclohexanone compounds were investigated as potential ligands for the SARS-CoV-2 main protease [36].

Hexanal is a member of the class of organic compounds known as medium-chain aldehydes. These are aldehydes with chain lengths of 6–12 carbon atoms. The human metabolome database reports feces and urine as its biological locations [35]. A recent study demonstrated downregulation of hexanal in vaccinated individuals. Hexanal is the principal oxidation product of hepatic arachidonic acid and other polyunsaturated fatty acids. The literature suggests that the arachidonic acid pathway is involved in modulating various inflammatory responses and their resolution [37]. Thus, substrate depletion due to antiviral activity might be a reason for the downregulation.

Pentanal is a member of the class of organic compounds known as alpha-hydrogen aldehydes. These are aldehydes with the general formula HC(H)(R)C(=O)H, where R is an organyl group. The human metabolome database describes its biological location from feces excreta [35]. A recent study demonstrates that pentanal increases ACE2 expression and SARS-CoV-2 protease activity on the S protein. Infection of human cells by viral particles occurs through binding of viral spike proteins to host–cell receptors and subsequent proteolytic priming. Angiotensin-converting enzyme 2 (ACE2) is considered to be the classic receptor for SARS-CoV-2 [38,39].

5. Conclusions

Using the analytical method based on HS-SPME-GC/MS-ToF, the VOCs in human sweat samples were separated and analyzed. The alignment and subsequent cleaning of the resulting spectra, together with the application of machine learning and a validated PLS-DA multivariate model, made it possible to identify the volatile compounds that distinguish healthy people from patients infected by the SARS-CoV-2 virus, regardless of the patient's gender, age, or the strain with which they were infected. Six VOCs have been identified as significantly more prevalent in patients who test positive for the disease. These compounds can be used to enable a chemical sensor, such as a detection dog, to identify the target molecules, thus indicating the presence of the disease in a patient.

Additionally, five VOCs of interest have been identified as significantly less abundant in the sick individuals sampled: 2,7,10-trimethyldodecane, 2,6,10,15-tetramethylheptadecane, cyclohexanone, hexanal, and pentanal. In-depth bibliographic and technical studies on the reduced levels of this type of compound can provide important information on the metabolic pathways and biochemical behavior of the disease in humans.

The methodology employed in this study, which used sweat samples from infected patients, can be extended to the detection of other diseases and infections.

Author Contributions

Conceptualization, L.B.; Methodology, U.M., A.I., J.M.A. and L.B.; Software, D.P.A., E.L. and J.M.A.; Validation, D.P.A., E.L., J.M.A. and L.B.; Formal analysis, D.P.A., E.L. and J.M.A.; Investigation, D.P.A., E.L., U.M., A.I. and L.B.; Resources, U.M. and L.B.; Data curation, D.P.A., E.L. and J.M.A.; Writing—original draft, D.P.A. and E.L.; Writing—review & editing, D.P.A., U.M., A.I., J.M.A. and L.B.; Visualization, U.M., A.I. and L.B.; Supervision, U.M., A.I., J.M.A. and L.B.; Project administration, L.B.; Funding acquisition, U.M., A.I. and L.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Basque Government through the Bikaintek program which funded the industrial doctorate fellowship associated with this research, 027-B2/2024. This study was funded with a EHU 2020 Special Action, code AE20/15, from the University of the Basque Country (EHU).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee for Research involving Human Subjects, their Samples and Data (CEISH-UPV/EHU) and the Ethics Committee for Research with Biological Agents and/or GMOs (CEIAB-UPV/EHU) (date of approval 19/02/2021).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank all participants who collaborated in the sampling and the Beloaran K9 unit. The authors also gratefully acknowledge the invaluable assistance of Oihane Albóniga in the processing of chromatographic data. The authors are grateful for the technical and human support provided by the Central Analysis Service of the UPV/EHU, Bizkaia Unit, SGIker (EHU/ERDF, EU).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

VOCs	Volatile Organic Compounds
HS	Head Space
SPME	Solid Phase Micro Extraction
GC	Gas Chromatography
MS	Mass Spectrometry
ToF	Time of Flight
PLS-DA	Partial Least Squares-Discriminant Analysis
ML	Machine learning
qMS	Mass quadrupole detector
QC	Quality Control
CAR	Carboxen
PDMS	Polydimethylsiloxane
PEG	Polyethylene glycol
RT	Retention Time
PCA	Principal Component Analysis
BTEX	Benzene, Toluene, Ethylbenzene, Xylene
TP	True Positive
FN	False Negative
TN	True Negative
FP	False Positive
ROC	Receiver Operating Characteristic
TPR	True Positive Ratio
FPR	False Positive Ratio
SIMCA	Soft Independent Modelling of Class Analogies
CV	Cross-Validation
HMDB	Human metabolome database

References

1. McCulloch, M.; Jezierski, T.; Broffman, M.; Hubbard, A.; Turner, K.; Janecki, T. Diagnostic Accuracy of Canine Scent Detection in Early- and Late-Stage Lung and Breast Cancers. Integr. Cancer Ther. 2006, 5, 30–39.
2. Moser, E.; McCulloch, M. Canine Scent Detection of Human Cancers: A Review of Methods and Accuracy. J. Vet. Behav. 2010, 5, 145–152.
3. Angle, C.; Waggoner, L.P.; Ferrando, A.; Haney, P.; Passler, T. Canine Detection of the Volatilome: A Review of Implications for Pathogen and Disease Detection. Front. Vet. Sci. 2016, 3, 47.
4. Simon, A.; Lazarowski, L.; Singletary, M.; Barrow, J.; Van Arsdale, K.; Angle, T.; Waggoner, P.; Giles, K. A Review of the Types of Training Aids Used for Canine Detection Training. Front. Vet. Sci. 2020, 7, 313.
5. Jendrny, P.; Schulz, C.; Twele, F.; Meller, S.; von Köckritz-Blickwede, M.; Osterhaus, A.D.M.E.; Ebbers, J.; Pilchová, V.; Pink, I.; Welte, T.; et al. Scent Dog Identification of Samples from COVID-19 Patients—A Pilot Study. BMC Infect. Dis. 2020, 20, 536.
6. Grandjean, D.; Sarkis, R.; Lecoq-Julien, C.; Benard, A.; Roger, V.; Levesque, E.; Bernes-Luciani, E.; Maestracci, B.; Morvan, P.; Gully, E.; et al. Can the Detection Dog Alert on COVID-19 Positive Persons by Sniffing Axillary Sweat Samples? A Proof-of-Concept Study. PLoS ONE 2020, 15, e0243122.
7. Mendel, J.; Frank, K.; Edlin, L.; Hall, K.; Webb, D.; Mills, J.; Holness, H.K.; Furton, K.G.; Mills, D. Preliminary Accuracy of COVID-19 Odor Detection by Canines and HS-SPME-GC-MS Using Exhaled Breath Samples. Forensic Sci. Int. Synerg. 2021, 3, 100155.
8. Doogweb El Olfato Canino en el Perro de Rescate. 2011. Available online: https://www.doogweb.es (accessed on 28 January 2026).
9. Raiszadeh, M.M.; Ross, M.M.; Russo, P.S.; Schaepper, M.A.; Zhou, W.; Deng, J.; Ng, D.; Dickson, A.; Dickson, C.; Strom, M.; et al. Proteomic Analysis of Eccrine Sweat: Implications for the Discovery of Schizophrenia Biomarker Proteins. J. Proteome Res. 2012, 11, 2127–2139.
10. Cui, X.; Zhang, L.; Su, G.; Kijlstra, A.; Yang, P. Specific Sweat Metabolite Profile in Ocular Behcet’s Disease. Int. Immunopharmacol. 2021, 97, 107812.
11. Woodley, F.W.; Gecili, E.; Szczesniak, R.D.; Shrestha, C.L.; Nemastil, C.J.; Kopp, B.T.; Hayes, D. Sweat Metabolomics before and after Intravenous Antibiotics for Pulmonary Exacerbation in People with Cystic Fibrosis. Respir. Med. 2022, 191, 106687.
12. Delgado-Povedano, M.d.M.; Calderón-Santiago, M.; Priego-Capote, F.; Jurado-Gámez, B.; Luque de Castro, M.D. Recent Advances in Human Sweat Metabolomics for Lung Cancer Screening. Metabolomics 2016, 12, 166.
13. Taleb, N.N. Skin in the Game: Hidden Asymmetries in Daily Life; Random House Publishing Group: New York, NY, USA, 2018; ISBN 978-0-425-28463-6.
14. Wishart, D.S. Metabolomics for Investigating Physiological and Pathophysiological Processes. Physiol. Rev. 2019, 99, 1819–1875.
15. Wilson, A.D.; Forse, L.B. Potential for Early Noninvasive COVID-19 Detection Using Electronic-Nose Technologies and Disease-Specific VOC Metabolic Biomarkers. Sensors 2023, 23, 2887.
16. Hasan, M.R.; Suleiman, M.; Pérez-López, A. Metabolomics in the Diagnosis and Prognosis of COVID-19. Front. Genet. 2021, 12, 721556.
17. Ruszkiewicz, D.M.; Sanders, D.; O’Brien, R.; Hempel, F.; Reed, M.J.; Riepe, A.C.; Bailie, K.; Brodrick, E.; Darnley, K.; Ellerkmann, R.; et al. Diagnosis of COVID-19 by Analysis of Breath with Gas Chromatography-Ion Mobility Spectrometry—A Feasibility Study. EclinicalMedicine 2020, 29, 100609.
18. Grassin-Delyle, S.; Roquencourt, C.; Moine, P.; Saffroy, G.; Carn, S.; Heming, N.; Fleuriet, J.; Salvator, H.; Naline, E.; Couderc, L.-J.; et al. Metabolomics of Exhaled Breath in Critically Ill COVID-19 Patients: A Pilot Study. EBioMedicine 2021, 63, 103154.
19. Boeselt, T.; Terhorst, P.; Kroenig, J.; Nell, C.; Spielmanns, M.; Heers, H.; Boas, U.; Veith, M.; Vogelmeier, C.; Greulich, T.; et al. Pilot Study on Non-Invasive Diagnostics of Volatile Organic Compounds over Urine from COVID-19 Patients. Arch. Clin. Biomed. Res. 2022, 6, 65–73.
20. Ketchanji, Y.C.M.; Di Zazzo, L.; Canuano, R.; Minieri, M.; Di Natale, C. Urinary Volatile Recognition for COVID-19 Diagnosis. In Proceedings of the 2022 IEEE International Symposium on Olfaction and Electronic Nose (ISOEN), Aveiro, Portugal, 29 May–1 June 2022; pp. 1–4.
21. Lamote, K.; Janssens, E.; Schillebeeckx, E.; Lapperre, T.S.; De Winter, B.Y.; van Meerbeeck, J.P. The Scent of COVID-19: Viral (Semi-)Volatiles as Fast Diagnostic Biomarkers? J. Breath. Res. 2020, 14, 042001.
22. Shirasu, M.; Touhara, K. The Scent of Disease: Volatile Organic Compounds of the Human Body Related to Disease and Disorder. J. Biochem. 2011, 150, 257–266.
23. Schivo, M.; Aksenov, A.A.; Linderholm, A.L.; McCartney, M.M.; Simmons, J.; Harper, R.W.; Davis, C.E. Volatile Emanations from in Vitro Airway Cells Infected with Human Rhinovirus. J. Breath Res. 2014, 8, 037110.
24. Amann, A.; Costello, B.d.L.; Miekisch, W.; Schubert, J.; Buszewski, B.; Pleil, J.; Ratcliffe, N.; Risby, T. The Human Volatilome: Volatile Organic Compounds (VOCs) in Exhaled Breath, Skin Emanations, Urine, Feces and Saliva. J. Breath Res. 2014, 8, 034001.
25. Frazier, C.J.G.; Gokool, V.A.; Holness, H.K.; Mills, D.K.; Furton, K.G. Multivariate Regression Modelling for Gender Prediction Using Volatile Organic Compounds from Hand Odor Profiles via HS-SPME-GC-MS. PLoS ONE 2023, 18, e0286452.
26. Crespo-Cajigas, J.; Gokool, V.A.; Ramírez Torres, A.; Forsythe, L.; Abella, B.S.; Holness, H.K.; Johnson, A.T.C.; Postrel, R.; Furton, K.G. Investigating the Use of SARS-CoV-2 (COVID-19) Odor Expression as a Non-Invasive Diagnostic Tool—Pilot Study. Diagnostics 2023, 13, 707.
27. Gokool, V.A.; Crespo-Cajigas, J.; Ramírez Torres, A.; Forsythe, L.; Abella, B.S.; Holness, H.K.; Johnson, A.T.C.; Postrel, R.; Furton, K.G. Predicting SARS-CoV-2 Variant Using Non-Invasive Hand Odor Analysis: A Pilot Study. Analytica 2023, 4, 206–216.
28. Wysocki, C.J.; Preti, G. Facts, Fallacies, Fears, and Frustrations with Human Pheromones. Anat. Rec. A Discov. Mol. Cell Evol. Biol. 2004, 281, 1201–1211.
29. GC-MS Introduction. Available online: https://www.chromacademy.com/gc-ms/principles/gc-ms-introduction/ (accessed on 17 November 2025).
30. Wang, Y.; Zhou, L.; Zhou, Y.; Zhao, C.; Lu, X.; Xu, G. A Rapid GC Method Coupled with Quadrupole or Time of Flight Mass Spectrometry for Metabolomics Analysis. J. Chromatogr. B Analyt Technol. Biomed. Life Sci. 2020, 1160, 122355.
31. Dogs Can Detect Parkinson’s Years Before Symptoms—With 98% Accuracy. Available online: https://www.sciencedaily.com/releases/2025/07/250716000846.htm (accessed on 16 January 2026).
32. Sharun, K.; Jose, B.; Tiwari, R.; Natesan, S.; Dhama, K. Biodetection Dogs for COVID-19: An Alternative Diagnostic Screening Strategy. Public Health 2021, 197, e10–e12.
33. Salgirli Demirbas, Y.; Sareyyupoglu, B.; Öztürk, H.; Alpay, M.; Seçilmiş, H.; Emen, F.; BAŞ, B.; Ozkul, A. The Role of Bio-Detection Dogs in Prevention and Diagnosis of Infectious Disease: A Systematic Review. Ank. Üniversitesi Vet. Fakültesi Derg. 2021, 68, 185–192.
34. Monedeiro, F.; dos Reis, R.B.; Peria, F.M.; Sares, C.T.G.; De Martinis, B.S. Investigation of Sweat VOC Profiles in Assessment of Cancer Biomarkers Using HS-GC-MS. J. Breath Res. 2020, 14, 026009.
35. Kim, H.J.; Lee, M.J.; Hur, S.H.; Jeong, M.H. Development of a Gas Chromatography-Time-of-Flight Method for Detecting Glucosinolate Metabolites and Volatile Organic Compounds in Kimchi. Int. J. Anal. Chem. 2021, 2021, 9978251.
36. Curran, A.M.; Rabin, S.I.; Prada, P.A.; Furton, K.G. Comparison of the Volatile Organic Compounds Present in Human Odor Using SPME-GC/MS. J. Chem. Ecol. 2005, 31, 1607–1619.
37. Penn, D.J.; Oberzaucher, E.; Grammer, K.; Fischer, G.A.; Soini, H.; Wiesler, D.; Novotny, M.V.; Dixon, S.J.; Xu, Y.; Brereton, R.G. Individual and Gender Fingerprints in Human Body Odour. J. R. Soc. Interface 2006, 4, 331–340.
38. Arslan, B.; Bercin, S.; Aydogan, S.; Islamoglu, Y.; Dinc, B. SARS-CoV-2 Is Not Found in the Sweat of COVID-19 Positive Patients. Ir. J. Med. Sci. 2022, 191, 27–29.
39. Ballabio, D.; Consonni, V. Classification Tools in Chemistry. Part 1: Linear Models. PLS-DA. Anal. Methods 2013, 5, 3790–3798.
40. Valdés, A.; Moreno, L.O.; Rello, S.R.; Orduña, A.; Bernardo, D.; Cifuentes, A. Metabolomics Study of COVID-19 Patients in Four Different Clinical Stages. Sci. Rep. 2022, 12, 1650.
41. Saheb Sharif-Askari, N.; Soares, N.C.; Mohamed, H.A.; Saheb Sharif-Askari, F.; Alsayed, H.A.H.; Al-Hroub, H.; Salameh, L.; Osman, R.S.; Mahboub, B.; Hamid, Q.; et al. Saliva Metabolomic Profile of COVID-19 Patients Associates with Disease Severity. Metabolomics 2022, 18, 81.

Figure 1. Average area of chromatographic signals (n = 3) for each standard (acetic acid, toluene, hexanal, heptanal, nonanal, phenol, benzaldehyde, and ethyl octanoate) using five different fibers. In total, 20 μL of the multicomponent mixture was used; the extraction temperature was 40 °C, and the extraction time was 20 min. The bars represent the standard deviation (n = 3). From left to right: orange, 75 µm CAR/PDMS; dark green, 85 µm CAR/PDMS; green, DVB/CAR/PDMS; blue, PDMS/DVB; pink, PEG.

Figure 2. Effect of extraction time on extraction efficiency for four compounds (toluene, 2-ethyl-1-hexanol, 2,6-dimethyl-7-octen-2-ol, and nonanal) in axillary sweat (n = 1). The extraction temperature was 40 °C, and desorption was carried out at 250 °C for 6 min.

Figure 3. (a) PCA scores graph differentiating the positive groups (A, red) and negative controls (C, blue). (b) PCA loadings graph.

Figure 4. Sample prediction for groups A and B by the PLS-DA model for the training dataset (●) and the test dataset (+). Prediction value marked with red line.

Figure 5. ROC curves and sensitivity and specificity threshold for PLS-DA model.

Figure 6. SIMCA predictions for class A samples from the training dataset (●) and the test dataset (+).

Figure 7. ROC curves and sensitivity and specificity threshold for the SIMCA classification model.

Figure 8. Random Forest predictions for class A samples from the training dataset (●) and the test dataset (+).

Figure 9. Colored Scores graph differentiating (a) age, (b) sex, (c) injection batch, (d) symptoms and (e) variant.

Table 1. Average number of extracted compounds in the sample and in the blank for each type of fiber (n = 3).

Fiber	Number of Compounds in the Sample	Number of Compounds in the Blank
85 μm CAR/PDMS	120	40
75 μm CAR/PDMS	94	32
PEG	103	64
DVB/CAR/PDMS	155	25
PDMS/DVB	140	35
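
As a reading aid for Table 1, the short sketch below subtracts the blank count from the sample count for each fiber. This difference is an assumed, illustrative metric and not necessarily the criterion the authors used to select a fiber; because the values are averages and the blank compounds need not all recur in the sample, it is only a rough estimate. The names FIBERS, net and ranked are hypothetical.

```python
# (sample, blank) average compound counts copied from Table 1.
FIBERS = {
    "85 um CAR/PDMS": (120, 40),
    "75 um CAR/PDMS": (94, 32),
    "PEG": (103, 64),
    "DVB/CAR/PDMS": (155, 25),
    "PDMS/DVB": (140, 35),
}

# Assumed metric: compounds in the sample minus compounds in the blank.
net = {fiber: sample - blank for fiber, (sample, blank) in FIBERS.items()}

# Rank fibers from the largest to the smallest blank-corrected count.
ranked = sorted(net, key=net.get, reverse=True)

for fiber in ranked:
    print(f"{fiber}: {net[fiber]}")
```

Under this assumption, DVB/CAR/PDMS gives the largest blank-corrected count (130) and PEG the smallest (39).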

Table 2. Total number of compounds extracted in the study of bleach, vial and gauze.

	Bleach		Vial			Gauze
	Disinfected Vial	Non-Disinfected Vial	New	Reused	Heated	5 × 5	10 × 10
Nº of compounds	96	97	23	27	17	98	98

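Table 3. Classification results (percentage of correctly classified samples) for the training, cross-validation (CV) and external prediction datasets.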

		PLS-DA	SIMCA	RF
Real/Training	Samples	Correct	Correct	Correct
A	18	84%	73%	89%
C	23	96%	70%	100%
total	41	91%	71%	95%
Real/CV	Samples	Correct	Correct	Correct
A	18	62%	22%	56%
C	23	74%	78%	78%
total	41	68%	54%	68%
Real/Predicted	Samples	Correct	Correct	Correct
A	8	88%	0%	76%
C	11	73%	64%	82%
total	19	79%	37%	79%
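
The "total" rows in Table 3 can be read as sample-weighted averages of the per-class percentages. The sketch below checks this for the external prediction block, using only the class sizes and rounded percentages printed in the table; the dictionary PREDICTION and the function overall_percent_correct are illustrative names, and the weighting itself is an assumption about how the totals were obtained.

```python
# Class sizes and per-class percent-correct values copied from Table 3
# (Real/Predicted block). Each entry is (number of samples, % correct).
PREDICTION = {
    "PLS-DA": {"A": (8, 88), "C": (11, 73)},
    "SIMCA": {"A": (8, 0), "C": (11, 64)},
    "RF": {"A": (8, 76), "C": (11, 82)},
}


def overall_percent_correct(per_class):
    """Weight each class's percent correct by its number of samples."""
    total = sum(n for n, _ in per_class.values())
    weighted = sum(n * pct for n, pct in per_class.values())
    return weighted / total


for model, per_class in PREDICTION.items():
    print(f"{model}: {overall_percent_correct(per_class):.1f}%")
```

With these inputs the weighted values round to 79%, 37% and 79% for PLS-DA, SIMCA and RF, matching the totals reported in the table.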

Table 4. Features of interest that are more abundant in patients with COVID-19. Features with a tick correspond to the VOCs selected to develop an aid for training medical canine units to detect positive COVID-19 cases.

Feature	Score	Identification	Result
777	80	Silane	Column Silane
50	84.1	Aldehyde	Expected metabolite (HMDB)
128	65	Aromatic hydrocarbon	Expected metabolite (HMDB)
65	80	Ketone	Expected metabolite (HMDB)
493	83	Fluoren-9-ol	Compound present in colonies
6	81.7	Hydrocarbon	Expected metabolite (HMDB)
23	80.8	Hydrocarbon	Expected metabolite (HMDB)
653	85.6	Ester	Expected metabolite (HMDB)
517	80.4	Borneol	Compound present in colonies

Table 5. Features of interest that are more abundant in negative controls (healthy individuals). VOCs with a potential human origin are marked with a tick.

Feature	Score	Identification	Result
183	86.1	Silane	Silane from the column
406	76.9	4-methylundecane	Not present in HMDB
406	71.4	2,7,10-trimethyldodecane	Metabolite detected in breath (HMDB)
398	71	2,6,10,15-tetramethylheptadecane	Metabolite (HMDB)
398	71	4-Methyl-5-propylnonane	Not present in HMDB
58	76.2	1-Methoxy-2-propanol	Solvent
193	73.7	Cyclohexanone	Metabolite detected in feces (HMDB)
130	86	Hexanal	Metabolite present in HMDB
68	73.3	Pentanal	Metabolite present in HMDB

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
