Sniff species: SURMOF-based sensor array discriminates aromatic plants beyond the genus level

The Lamiaceae are among the most species-rich families of flowering plants and harbor many species used as herbs or for medicinal applications, such as Basils or Mints. Evolution of this group has been driven by chemical speciation, mainly of Volatile Organic Compounds (VOCs). The commercial use of these plants is characterized by a large extent of adulteration and surrogation. Authenticating and discerning the species is thus relevant for consumer safety, but usually requires cumbersome analytics, such as Gas Chromatography, often coupled with Mass Spectrometry. We demonstrate here that quartz-crystal microbalance (QCM)-based electronic noses provide a very cost-efficient alternative, allowing for a fast, automated discrimination of scents emitted from leaves of different plants. To explore the range of this strategy, we used leaf material from four genera of Lamiaceae along with Lemongrass as a similarly scented, but unrelated outgroup. In order to unambiguously differentiate the scents from the different plants, the output of the 6 different SURMOF/QCM sensors was analyzed using machine learning (ML) methods, together with a thorough statistical analysis. The exposure and purging datasets (4 cycles) obtained from a QCM-based, low-cost homemade portable e-Nose were analyzed with a Linear Discriminant Analysis (LDA) classification model. Prediction accuracies with repeating test measurements reached values of up to 90%. We show that it is not only possible to discern and identify plants on the genus level, but even to discriminate closely related sister clades within a genus (Basil), demonstrating that e-Noses are a powerful technology to safeguard consumer safety against the challenges of globalized trade.


Introduction
Plants have developed subtle mechanisms to defend and adapt themselves against biotic and abiotic stress factors. One of the ways plants have evolved to protect themselves is by producing volatile organic compounds (VOCs) as part of their essential oils [1]. These plant essential oils, often with monoterpenes as primary components, accumulate in different organs such as leaves, bark, wood, roots, in flowers or fruits, sometimes in specialized glands, but also in lysogenic or schizogenic oil ducts [2]. These VOCs are the base for human use of aromatic plants, both as spices and for medicinal applications. The multitude of VOC profiles has been shaping entire cultures, cuisines, and medical traditions.
One of the most prodigious plant families in this context is the Lamiaceae. With more than 7,000 species belonging to more than 200 genera, they belong to the taxonomically most challenging and diverse groups in the flowering plants [3]. They secrete complex bouquets of VOCs from their glandular hairs and scales. These bouquets are often specific for a given species and, due to their interaction with pollinator insects, might even have been one of the drivers for the immense complexity of this family. Sometimes different chemotypes exist even within a species. Commercially relevant plants, such as the Mints or the Basils, belong to this group, and are often part of novel food trends that are fueled by their reputation as so-called "superfoods" [4].
With growing interest in holistic approaches to health, there has been a trend towards the use of supplements and plant-based products rooted in traditional medical systems such as Ayurveda or Traditional Chinese Medicine. With the growing popularity of Ayurveda in Europe, products containing Ocimum tenuiflorum L. are readily available in supermarkets. O. tenuiflorum (Holy Basil or Tulsi) has been used for treating ailments like pains in the joints, headache, cold, fever, and also insect bites [5][6][7][8][9]. In addition, Holy Basil has also been recommended to relieve stress [10], and to reduce the effects of diabetes mellitus [11]. Due to the benefits attributed to Holy Basil, the volume of its market in the West is progressively increasing [12]. This accentuates the problem of authentication and identification of commercial products declared to contain Tulsi [13]. The genus Ocimum is composed of many species, several of which are commonly traded. However, each species is endowed with a unique chemical profile that is mostly genetically determined [12]. Authenticating O. tenuiflorum by microscopic diagnostics is possible [5], although limited, especially in commercial products that are often processed [14]. In addition, it is possible to discriminate true O. tenuiflorum from other Basils on the basis of DNA barcoding [12], a rather expensive and time-consuming process. An alternative would be the detection of the different odorous contents, since the spectrum of volatile organic compounds (VOCs) emitted by Ocimum species is unique [15]. In fact, a trained human nose is able to distinguish O. tenuiflorum from other Basils, due to the emission of specific patterns of volatile phenylpropanoids [16].
The profile of VOCs emitted by a plant is unique to a particular species and provides a way to identify plant species contained in commercial products. However, the chemical analysis necessary for such an identification of gaseous compounds is a time-consuming and costly undertaking, requiring gas chromatography (GC). Since a human nose can discriminate those species, sensor arrays, also referred to as electronic noses, might offer a cost-efficient, convenient, and fast alternative [17,18]. Such e-Noses have been successfully used for discriminating different medicinal plants on the basis of the different emitted VOCs [19,20]. Quartz crystal microbalance (QCM)-based sensor arrays have also been previously used to differentiate between plants from the Lamiaceae family [21,22]. For instance, a multichannel QCM (MQCM) with molecularly imprinted polystyrene membranes has been used for discrimination of terpenes emanated from freshly dried species belonging to the Lamiaceae, such as Rosemary (Rosmarinus officinalis L.), Sweet Basil (Ocimum basilicum), and Common Sage (Salvia officinalis) [21].
A crucial point when fabricating QCM-based sensors is the detector material used for coating the QCM substrate. In this context, reticular compounds like metal-organic frameworks, or MOFs, carry a huge potential. These porous materials can be easily modified to yield different responses to VOCs, thus allowing the fabrication of sensor arrays in which each component shows a different sensitivity.
In our previous work we have used an e-Nose to differentiate between different species of Mints or VOCs isolated from them [23]. In the present study we focus on QCM sensors coated with MOF thin films. Six different MOFs were investigated, including HKUST-1, Cu(BDC), Cu(BPDC), Cu2(DCam)2(dabco), Cu2(DCam)2(BiPy), and Cu2(DCam)2(BiPyB) [24]. MOF thin films were deposited using layer-by-layer methods, yielding so-called SURMOFs (surface-anchored MOFs) [25]. To validate the performance of these SURMOF-based QCM arrays, we challenged them by testing the ability of these sensors to discriminate different chemotypes of closely related species (the two mints tenuiflorum, Holy Basil, Tulsi). In contrast to Sweet Basil, which had been addressed by e-Noses previously [22], we wanted to test to what extent it would be possible to discern true O. tenuiflorum from closely related sister species, since O. campechianum is a member of the closest haplotype known for the genus Ocimum [12]. All accessions were selected from the authenticated reference plant collection of the Botanic Garden of the Karlsruhe Institute of Technology. The response times of the QCM sensors upon exposure to and removal of a particular scent were determined using Nonlinear Least-Square (NLS) fits to an exponential rise (or fall) function and were found to amount to less than 1 min [26,27]. The exposure and purging datasets (4 cycles) obtained from a low-cost homemade portable e-Nose were analyzed using machine learning approaches, employing three different methods, i.e. PCA, LDA, and k-NN techniques [22].
The first and second cycles of the data sets were used for training and the following repeated cycles were used as unknown data for prediction. A statistical analysis revealed that more than 90% classification accuracy could be achieved within 8 different scent classes from 3 different plant leaves in a very short time (less than 6 min). Prediction accuracies with repeating test measurements reached up to 90% for LDA and k-NN on unknown data sets.

Plant material
The present study included freshly collected leaves from 3 different species of Basil, 4 different species of Mints, and a control sample of Lemon Grass, all grown in the Botanical Institute of Karlsruhe Institute of Technology (KIT), Germany (Table 1). The scents were collected from 3 g of fresh leaves from Ocimum campechianum Mill., Cymbopogon citratus, Ocimum tenuiflorum L., Melissa officinalis L., Mentha aquatica, Agastache rugosa, and Mentha suaveolens. The abbreviations used throughout the text are defined in Table 1.
Table 1: Accessions used in this study. The voucher number gives the code under which the plants are available in the Botanical Garden of the KIT. The abbreviations used in the text are also given. The number of leaves harvested to reach 3 g is indicated as well.
Prior to SURMOF deposition, the QCM substrates were functionalized by an O2 plasma treatment for 30 min. All films were prepared using 30 synthesis cycles. The SURMOF synthesis details are provided in the Supporting Information of our previous work [24]. X-ray diffraction (XRD) was used to characterize the SURMOF thin films grown on the QCM sensors; the diffractograms are shown in Figure SI. Fig. 2 shows a schematic view of the working principle of the 6-channel low-cost homemade portable e-Nose system used for discrimination of scents of Basil/Mint leaves. The sensor array and a humidity/temperature sensor were placed inside a 3D-printed cylindrical head space. For the QCM data acquisition, 5 V/16 MHz ATmega32U4 microcontrollers and open-source Pierce oscillator circuits designed by openQCM [28] were used to read the frequency change. Temperature and humidity were measured with an Adafruit HTU21D-F temperature & humidity sensor breakout board. The temperature of the chamber was kept constant at 25±0.5 °C. The software package MATLAB was used to record and analyze the data.

Data acquisition with the e-Nose
3 g of freshly collected leaves from each species of Basil and Mints were inserted separately into a 100 mL glass vial. The VOCs emitted from the fresh plant leaves inside the vial were circulated through the sensor array by a small 3 W diaphragm pump at a low flow rate of 0.1 L/min while the N2 gas inlet and outlet were closed. The surface of the sensing thin films inside the head space was activated by purging with N2. This process led to a removal of residual compounds within the SURMOF pores. For each Basil/Mint scent accession, the change in resonance frequency was recorded for 6 min per cycle, with 2 min of exposure for adsorption followed by 4 min of purging with dry N2 gas for cleaning. The exposure and purging cycles were repeated 4 times.

Figure 2.
A schematic view of the working principle of the 6-channel low-cost homemade portable E-nose system used for discrimination of scents of Basil/Mint leaves.
MOFs are highly porous materials with very large specific surface areas [29]. SURMOFs coated on a QCM will adsorb the VOCs on the outer surface, as well as inside the pores (see Fig. 3) [24]. Of course, for the latter it is required that the pores and channels inside the MOF are sufficiently large to accommodate diffusion of the VOC into the pore system. A quantitative determination of the total amount of a particular VOC loaded into a MOF thin film can be carried out using a QCM. In the present case, the scent emitted from a plant consists of a large variety of different compounds, their number typically exceeding 20 [30][31][32].

Data Analysis and Classification
The QCM response after exposure to the plant scent and after purging with dry nitrogen is shown in Fig. 4. It was found that single-component rise and fall functions described the QCM data well for times up to 60 s after the start of exposure/purging. At later times there is a linear behavior, indicating diffusion into and out of the pores [34][35][36]. The frequency shift of the QCM sensors is directly proportional to the change of the adsorbed mass according to the Sauerbrey relation [37], i.e. $\Delta f(t) = -C\,\Delta m(t)$, where C is the QCM mass sensitivity constant, which is related to structural and physical properties of the piezoelectric quartz sensor material. The frequency response times were calculated from Nonlinear Least-Square (NLS) fits of the QCM response to an exponential rise function [26,27] in the time interval between 5 s and 60 s.
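As a worked illustration of the Sauerbrey relation, the sketch below converts a frequency shift into an areal mass change. The text does not give the value of C or the fundamental frequency of the crystals, so `f0_hz` is a free parameter. The constant is computed from the textbook expression for an AT-cut quartz crystal, C = 2 f0^2 / sqrt(rho_q * mu_q), which is an assumption about how C would be obtained, not a statement of the authors' procedure.

```python
import math

RHO_Q = 2.648    # density of quartz, g/cm^3
MU_Q = 2.947e11  # shear modulus of AT-cut quartz, g/(cm s^2)


def sauerbrey_constant(f0_hz: float) -> float:
    """Areal mass sensitivity C in Hz cm^2 / ug for a fundamental frequency f0."""
    c_per_gram = 2.0 * f0_hz ** 2 / math.sqrt(RHO_Q * MU_Q)  # Hz cm^2 / g
    return c_per_gram * 1e-6                                  # Hz cm^2 / ug


def areal_mass_change(delta_f_hz: float, f0_hz: float) -> float:
    """Invert delta_f = -C * delta_m; the result is in ug/cm^2."""
    return -delta_f_hz / sauerbrey_constant(f0_hz)
```

For a hypothetical 5 MHz crystal this gives C of about 56.6 Hz cm^2/ug, so a frequency drop of 56.6 Hz corresponds to roughly 1 ug/cm^2 of adsorbed mass. The relation strictly holds only for thin, rigid films.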
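The QCM signal change observed after removing a particular scent was determined by an NLS fit to an exponential decay function in the time interval between 125 s and 180 s. The rise and decay were described by the following expressions: $\Delta f(t) = A_a\left(1 - e^{-t/\tau_a}\right)$ for exposure and $\Delta f(t) = A_d\, e^{-t/\tau_d}$ for purging, where $A_a$ and $A_d$ are the fitted amplitudes and $\tau_a$ and $\tau_d$ are the relaxation times related to the association constants of the adsorption process and the desorption process, respectively.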
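A nonlinear least-squares fit of this kind can be sketched without external libraries. The example below fits a single-exponential rise with a small Levenberg-Marquardt loop. The model parametrization (amplitude and relaxation time), the function names, and the starting values are assumptions made for illustration. The authors' analysis was done in MATLAB and its exact fit function is not given in the text.

```python
import math


def rise(t, amp, tau):
    """Single-exponential rise: amp * (1 - exp(-t / tau))."""
    return amp * (1.0 - math.exp(-t / tau))


def fit_rise(times, values, amp0, tau0, n_iter=200):
    """Least-squares estimate of (amp, tau) by a damped Gauss-Newton iteration."""
    def sse(a, ta):
        return sum((v - rise(t, a, ta)) ** 2 for t, v in zip(times, values))

    amp, tau, lam = amp0, tau0, 1e-3
    best = sse(amp, tau)
    for _ in range(n_iter):
        saa = sat = stt = ga = gt = 0.0
        for t, v in zip(times, values):
            e = math.exp(-t / tau)
            ja = 1.0 - e                   # d(model)/d(amp)
            jt = -amp * e * t / tau ** 2   # d(model)/d(tau)
            r = v - amp * ja               # residual
            saa += ja * ja; sat += ja * jt; stt += jt * jt
            ga += ja * r; gt += jt * r
        a11, a22 = saa * (1.0 + lam), stt * (1.0 + lam)  # Marquardt damping
        det = a11 * a22 - sat * sat
        if det == 0.0:
            break
        d_amp = (a22 * ga - sat * gt) / det
        d_tau = (a11 * gt - sat * ga) / det
        new_amp, new_tau = amp + d_amp, tau + d_tau
        if new_tau > 0 and sse(new_amp, new_tau) < best:
            amp, tau = new_amp, new_tau    # accept the step, relax damping
            best = sse(amp, tau)
            lam = max(lam / 10.0, 1e-12)
        else:
            lam *= 10.0                    # reject the step, increase damping
            if lam > 1e12:
                break
    return amp, tau
```

Applied to the samples of one exposure window, with time measured from the start of exposure and restricted to 5 s to 60 s, the returned `tau` plays the role of the response time. A decay can be fitted the same way by swapping in the model and its derivatives.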
During the discrimination analysis of the scents, the first cycle of the loading/purging [...] PCA is an unsupervised machine learning method used for dimension reduction and data visualization [38,39]. The resonance frequency shifts of the sensor array consisting of 7 QCM sensors coated with all the 7 different sensing materials (see Table 1 [...]
Table 2. The maximum frequency shift response values of the sensor arrays shown in the radar plot in Figure 5.
The three principal components shown in the 3D plot in Fig. 6 together explain 96.3% of the total variance. By introducing the 4th and 5th PCA components, the visual PCA discrimination accuracy reaches 99.8%.
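How the share of variance explained by each principal component is obtained can be sketched as follows: the eigenvalues of the covariance matrix of the sensor responses are computed and each is divided by their sum. This is a minimal standard-library illustration with hypothetical function names. It uses a cyclic Jacobi eigenvalue iteration, chosen here for self-containment and not taken from the text.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of observations (rows) over features (columns)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, descending."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = (math.pi / 4 if diff == 0
                         else 0.5 * math.atan(2 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def explained_variance_ratio(rows):
    """Fraction of total variance carried by each principal component."""
    eig = symmetric_eigenvalues(covariance_matrix(rows))
    total = sum(eig)
    return [v / total for v in eig]
```

Summing the first three entries of `explained_variance_ratio` for a matrix of per-sensor frequency shifts gives the kind of cumulative figure quoted for the 3D plot.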

Linear Discriminant Analysis (LDA)
The 2D plot of the Linear Discriminant Analysis for the 8 different species with 95% confidence ellipses is presented in Fig. 7(a). The so-called confusion matrix was calculated [...] with the average of 79.2%. The prediction accuracies in Table 3 show a similar overlap between Bas8257 (Krishna Tulsi) and Bas5751 (Tulsi), confirming that both samples originate from the same plant species.
Table 3. LDA prediction results for unknown data sets obtained from different cycles of measurement after training with data sets from the 1st, 2nd, and 3rd cycles.

Nearest Neighbour Analysis (k-NN)
As a third, non-parametric classification scheme, we applied the k-NN analysis with a 10-fold (k=10) calculation of the unknown data sets from the second cycle as compared to the true assignment from the training data set collected during the first cycle. The data sets from the second cycle of the e-Nose measurement were used for the k-NN calculation to determine the prediction accuracy for the unknown observations. The k-NN discrimination accuracy was obtained as 94.2% with 5.8% misclassification (see Fig. 9a). The overall prediction accuracy for the unknown data sets was smaller than in the case of LDA, with 82.3% corresponding to 17.7% misclassification.
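The train-on-one-cycle, predict-on-the-next scheme can be illustrated with a minimal k-NN classifier. The sketch uses Euclidean distance and majority voting, with hypothetical function names and feature vectors. Here `k` is read as the number of nearest neighbours, and whether the k=10 quoted in the text refers to neighbours or to folds is not clear from the text, so the default is only a placeholder.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=10):
    """Label of `query` by majority vote among its k nearest training points."""
    ranked = sorted((math.dist(x, query), label)
                    for x, label in zip(train_x, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def prediction_accuracy(train_x, train_y, test_x, test_y, k=10):
    """Fraction of held-out observations whose predicted label is correct."""
    hits = sum(knn_predict(train_x, train_y, q, k) == y
               for q, y in zip(test_x, test_y))
    return hits / len(test_y)
```

In the setting described above, `train_x` would hold the sensor-array response vectors from the first cycle with their known accession labels, and `test_x` the vectors from the second cycle. The misclassification rate is one minus the returned accuracy.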
The change in the k-NN discrimination and prediction accuracies with an increasing number of nearest neighbors between 2 and 50 is given in Figure SI-4.
Figure 9. The discrimination (a) and prediction (b) confusion matrices obtained from the k-NN analysis with 10-fold (k=10) calculation, using the unknown data sets from the second cycle for comparison with the training data set (true labels) obtained from the first cycle.

Conclusions
In this work, sensor arrays based on six different SURMOFs have been used successfully for the discrimination of eight aromatic plants, seven of which belong to the taxonomically challenging family of the Lamiaceae. The exposure and purging datasets (4 cycles) obtained from a low-cost custom-made portable e-Nose were analyzed with a Linear Discriminant Analysis (LDA) classification model. The first and second cycles of the data sets were used for training and the following repeated cycles were used as unknown data for prediction. More than 90% classification accuracy has been obtained within 8 different scent classes. Prediction accuracies with repeating test measurements reached up to 90% for LDA on unknown data sets. We show that it is not only possible to discern and identify plants on the genus level (Mentha, Agastache, Melissa, all belonging to the Mentheae tribe within the Lamiaceae), but even to discriminate closely related sister clades within a genus (Basil). In addition, we were able to unequivocally separate Lemon Grass (Cymbopogon citratus) from Common Balm (Melissa officinalis L.), although these species share an intense lemon-like scent and are often used for mutual surrogation and adulteration, demonstrating that the e-Nose outperforms most human noses, which can easily be tricked by replacements of these two species. This study paves the way for the potential use of the sensors in the detection of food adulterants. The portability and quick response of the sensor arrays demonstrate a huge potential for the future fabrication of cheap monitoring devices for use in the food industry and food surveillance.