Gas Chromatography-Mass Spectrometry Analysis of Compounds Emitted by Pepper Yellow Leaf Curl Virus-Infected Chili Plants: A Preliminary Study

Pepper yellow leaf curl virus (PYLCV) is a threat to chili plants and can significantly reduce yields. This pilot study aimed to detect PYLCV by analyzing the compounds emitted by chili plants using gas chromatography-mass spectrometry (GC-MS). The samples investigated were PYLCV-infected and PYLCV-undetected chili plants taken from commercial chili fields, and their infection status was validated with a polymerase chain reaction (PCR) test. A headspace technique was used to extract the volatile organic compounds (VOCs) emitted by the plants. The analysis of the GC-MS results comprised pre-processing, a boxplot analysis of the variability of the sample compounds, and sample classification using a multivariate technique. The unsupervised multivariate technique principal component analysis (PCA) was used to determine whether GC-MS could distinguish PYLCV-infected from PYLCV-undetected plants. The results showed a clear separation between PYLCV-infected and PYLCV-undetected chili plants, with the first three principal components (PC1, PC2, and PC3) together explaining 91.32% of the total variance. The loading plot analysis showed that (2-aziridinylethyl)amine and acetic acid contributed strongly to PC2, the axis along which the samples were separated. However, more comprehensive studies are needed to find potential biomarkers of PYLCV infection.


Introduction
Chili (Capsicum annuum L.) contains high levels of vitamins and minerals, which are very beneficial for human health. Besides being consumed as a seasoning, chilies are used as a food dye and are essential ingredients in the pharmaceutical and cosmetic industries [1,2]. Owing to these benefits, chili consumption in Indonesia increased by almost 29% over the period 2013-2018. Chili also has a significant influence on the national economy, as it is one of the largest contributors to inflation. Indonesia has the second-largest chili harvest area in the world and the fourth-largest chili production, contributing 5.89% of total world chili production [3].
In 2018, chili productivity in Indonesia declined considerably (by about 11.06%) [3]. One cause of this decrease was the presence of various viral diseases. Pepper yellow leaf curl virus (PYLCV), which belongs to the genus Begomovirus [4,5], is the main virus that attacks chili plants, and in 2019 the virus attacks […] Thus, testing was carried out on plants taken from commercial chili fields. PCR tests were performed on symptomatic and asymptomatic plants to confirm the presence or absence of PYLCV, and the samples were then analyzed by GC-MS. In our method, the samples tested with the GC-MS headspace technique were not extracted first; the leaves were picked directly from the plant and placed into a glass vial for GC-MS testing, which made the sampling process faster, easier, and practical to apply. In addition, a shorter sampling process was expected to give a more faithful reading of the volatile organic compound (VOC) emissions from the plants. The main objective of this research was to apply a multivariate analysis technique to the GC-MS results to distinguish infected from uninfected samples. A boxplot analysis was carried out to analyze the variability of VOCs across the plant samples, while the multivariate technique PCA was used to analyze the total VOC blend of each sample, in order to determine whether the system could distinguish PYLCV-infected from PYLCV-undetected samples.

Samples
The plant samples used in this research were PYLCV-symptomatic and asymptomatic chili plants. Five plant samples of the same age (two months old) were investigated. The first sample was a chili plant showing symptoms of PYLCV infection (yellowing and curling leaves), taken from a commercial chili plantation in Purworejo, Central Java, Indonesia, in early July 2020 (referred to as IPJ). The second sample was the same age as IPJ and was taken from the same place with the same plant treatment but showed no symptoms of PYLCV infection (designated UPJ). GC-MS and PCR testing of the first and second samples were carried out in July 2020. Samples three and four were taken from Purworejo in September 2020; the third showed symptoms of PYLCV attack (designated IPS), whereas the fourth did not (designated UPS). The fifth sample (ICS) was a chili plant that showed PYLCV disease symptoms but came from a different area, Cangkringan, D.I. Yogyakarta. The plants were transported from the plantation to the laboratory in polybags, and their leaves were picked and immediately placed into vials for PCR and GC-MS testing.

Polymerase Chain Reaction
The samples were tested by polymerase chain reaction (PCR) to confirm PYLCV infection. DNA was isolated from the chili plants using a Tiangen genomic DNA extraction kit, with 0.3 g of leaf tissue per sample. The DNA was then amplified by PCR using Begomovirus species-specific primers for pepper yellow leaf curl Indonesia virus (PYLCIV): Pep Uni F (5′-GTG YWG TAY CTT CTG YGG AAY TKG A-3′) and PI Uni R (5′-ACG CCG TAA ACG ATG TTT AYG CG-3′) [40]. The PCR mix contained MyTaq HS Red Mix, nuclease-free water (NFW), forward primer, reverse primer, and sample DNA. The mix was placed in a tube, and amplification was performed in a thermocycler. Pre-denaturation and denaturation were carried out at 95 °C for 1 min and 15 s, respectively; annealing at 54 °C for 15 s; and extension and final extension at 72 °C for 10 s and 5 min, respectively. The PCR comprised 35 cycles.
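The cycling conditions above can be written down as a small data structure, which makes the program easy to check. This is a minimal sketch: the names (`Hold`, `expand_program`, `hold_time_seconds`) are hypothetical, and ramp times between temperatures, which the text does not give, are ignored, so the computed value is only the time spent at temperature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    """One thermocycler step held at a fixed temperature."""
    name: str
    temp_c: float
    seconds: int


# Values from the text; ramp times between temperatures are not modeled.
PRE_DENATURATION = Hold("pre-denaturation", 95.0, 60)
CYCLE = (
    Hold("denaturation", 95.0, 15),
    Hold("annealing", 54.0, 15),
    Hold("extension", 72.0, 10),
)
N_CYCLES = 35
FINAL_EXTENSION = Hold("final extension", 72.0, 300)


def expand_program(pre, cycle, n_cycles, final):
    """Flatten the program into the ordered sequence of holds."""
    return [pre, *(cycle * n_cycles), final]


def hold_time_seconds(holds):
    """Total time spent at temperature (ramping excluded)."""
    return sum(h.seconds for h in holds)


if __name__ == "__main__":
    program = expand_program(PRE_DENATURATION, CYCLE, N_CYCLES, FINAL_EXTENSION)
    total = hold_time_seconds(program)
    print(f"{len(program)} holds, {total} s (about {total / 60:.1f} min) at temperature")
```

With the stated conditions the program has 107 holds totaling 1760 s (about 29 min) at temperature; the real run is longer because the block also has to heat and cool between steps.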
The amplified DNA was visualized by electrophoresis on a 1% agarose gel, with FloroSafe DNA stain added to visualize the DNA. The gel was placed in a Bio-Rad Mini-Sub Cell GT electrophoresis system, and a 1 kb DNA ladder was used as the marker. Electrophoresis was run at 75 V for 45 min. The results were documented using a UV transilluminator and a gel documentation system, which showed the presence or absence of the PYLCV-specific band in each sample.

Gas Chromatography-Mass Spectrometry
The leaves of PYLCV-infected and PYLCV-undetected chili plants were analyzed by gas chromatography-mass spectrometry (GC-MS) using a Thermo Scientific Trace 1310 GC coupled to a Thermo Scientific ISQ LT single quadrupole mass spectrometer (Thermo Fisher Scientific Inc., San José, CA, USA). The VOCs of the samples were extracted by static headspace sampling, and the Chromeleon Chromatography Data System (CDS) version 7.2 software (Dionex, Sunnyvale, CA, USA) was used. The front column was an HP-5MS UI with a length of 30 m and a film thickness of 0.25 µm. The front inlet split flow was 20.0 mL/min, and the purge flow was 3 mL/min. The gas saver time was 5 min, with a gas saver flow of 5 mL/min. The carrier gas was ultra-high-purity (UHP) helium at a column flow rate of 1 mL/min.
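The text gives the split flow and the column flow but not the split ratio. Under the usual definition (split-vent flow divided by column flow, with the septum purge flow not counted), the settings above correspond to a 20:1 split. The sketch below computes this and, assuming an ideal split with no discrimination between compounds, the share of the 1000 µL headspace injection described in the next paragraph that would reach the column. The function names are hypothetical.

```python
def split_ratio(split_flow_ml_min: float, column_flow_ml_min: float) -> float:
    """Split ratio = split-vent flow / column flow (septum purge not included)."""
    if column_flow_ml_min <= 0:
        raise ValueError("column flow must be positive")
    return split_flow_ml_min / column_flow_ml_min


def on_column_volume_ul(injected_ul: float, split_flow_ml_min: float,
                        column_flow_ml_min: float) -> float:
    """Idealized volume of injected vapor that enters the column.

    Assumes the inlet divides the vapor strictly in proportion to the two flows,
    with no discrimination between compounds.
    """
    fraction = column_flow_ml_min / (column_flow_ml_min + split_flow_ml_min)
    return injected_ul * fraction


if __name__ == "__main__":
    split_flow, column_flow = 20.0, 1.0   # mL/min, from the settings above
    injected = 1000.0                     # µL, headspace injection volume
    print(f"split ratio {split_ratio(split_flow, column_flow):.0f}:1")
    print(f"~{on_column_volume_ul(injected, split_flow, column_flow):.1f} µL on column")
```

Under these assumptions about 1000/21 ≈ 47.6 µL of the injected headspace vapor would enter the column.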
Three leaves from each sample were placed into a glass vial and left for an equilibration time of 30 s, during which the analytes vaporized into the headspace above the sample. The system was equilibrated at 60 °C before the sample was injected into the GC-MS, with an injection volume of 1000 µL. The oven temperature program started at 60 °C, increased at 5 °C/min to 150 °C, and then at 10 °C/min to 230 °C. The GC-MS sampling process took about 60 min. Compounds were identified by mass spectral library matching against the National Institute of Standards and Technology (NIST) NIST 14 library and by reference to standard compounds. The GC-MS results were then analyzed using multivariate analysis.
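The oven program can be expressed as a piecewise-linear function of time. In this sketch (hypothetical names `ramp_minutes` and `oven_temp`), no initial or final hold times are assumed because the text does not state them; under that assumption the two ramps take 18 min (60 to 150 °C at 5 °C/min) and 8 min (150 to 230 °C at 10 °C/min), 26 min in total, so the remainder of the roughly 60 min process is not modeled here.

```python
INITIAL_TEMP_C = 60.0
# (rate in °C/min, target in °C) for each linear ramp, from the text.
RAMPS = ((5.0, 150.0), (10.0, 230.0))


def ramp_minutes(initial_c=INITIAL_TEMP_C, ramps=RAMPS):
    """Duration of each ramp in minutes; no hold times are assumed."""
    durations, current = [], initial_c
    for rate, target in ramps:
        durations.append((target - current) / rate)
        current = target
    return durations


def oven_temp(t_min, initial_c=INITIAL_TEMP_C, ramps=RAMPS):
    """Programmed oven temperature t_min minutes after the run starts.

    At or before 0 min the initial temperature is returned; after the last ramp
    the final temperature is returned (how long it was held is not stated).
    """
    if t_min <= 0:
        return initial_c
    current, elapsed = initial_c, 0.0
    for rate, target in ramps:
        duration = (target - current) / rate
        if t_min <= elapsed + duration:
            return current + rate * (t_min - elapsed)
        current, elapsed = target, elapsed + duration
    return current


if __name__ == "__main__":
    print(ramp_minutes())  # minutes spent in each ramp
    for t in (0, 10, 18, 22, 26, 40):
        print(t, oven_temp(t))
```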

Data Analysis
The first step in the data analysis was to pre-process the GC-MS results by selecting the compounds for further analysis. For each peak, the spectral library search returned the three compounds with the highest similarity indices, and one of these was selected on the basis of the literature and the highest similarity index. The acceptance criterion was a similarity index above 700; compounds with a similarity index lower than 700 were eliminated. Some compounds were identified several times because of their high concentration and peak tailing; only one entry for each such compound was kept in the final list [41]. The GC-MS data were normalized by taking the sum of all peak areas in the total ion chromatogram (TIC) of each sample as 100%, so that each peak percentage was obtained by dividing the peak's area by the total area of all compounds.
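The pre-processing rules can be sketched as a single function. The sketch assumes a hypothetical input format, a list of (compound, similarity index, peak area) tuples per sample, with the compound already chosen from the top three library hits. Two choices are not stated in the text and are assumptions here: a compound reported at several peaks keeps its largest peak area, and the normalization denominator defaults to the sum of the retained peaks (passing the full TIC area as the hypothetical `tic_total` argument divides by the whole chromatogram instead). Carbon dioxide is excluded because the Results section reports removing it.

```python
MIN_SIMILARITY = 700  # peaks with a similarity index below 700 are rejected


def preprocess(peaks, exclude=("carbon dioxide",), tic_total=None):
    """Filter, de-duplicate and normalize one sample's peak list.

    peaks: iterable of (compound, similarity_index, peak_area), one tuple per TIC
        peak, with the compound already chosen from the top-three library hits.
    tic_total: optional total TIC area to normalize against; if None, the sum of
        the retained peaks is used (an assumption of this sketch).
    Returns {compound: percent area}.
    """
    kept = {}
    for name, similarity, area in peaks:
        if similarity < MIN_SIMILARITY or name in exclude:
            continue  # low-confidence match or known artifact (e.g. vial air)
        # A compound reported at several peaks (high concentration, tailing) is
        # listed once; keeping its largest peak is an assumption of this sketch.
        kept[name] = max(area, kept.get(name, 0.0))
    total = tic_total if tic_total is not None else sum(kept.values())
    return {name: 100.0 * area / total for name, area in kept.items()}


if __name__ == "__main__":
    # Hypothetical peak list for one sample.
    demo = [("acetic acid", 850, 40.0), ("acetic acid", 760, 10.0),
            ("glycidol", 720, 60.0), ("carbon dioxide", 900, 500.0),
            ("unidentified", 650, 30.0)]
    print(preprocess(demo))
```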
After pre-processing, the variability of the compounds emitted by the samples was analyzed using a boxplot, a statistical method that visualizes the spread of the data, showing differences between samples and the skewness of the dataset. The boxplot displays the first, second (median), and third quartiles; the distance between the first and third quartiles describes the spread of the data and is known as the interquartile range (IQR) [42,43]. The boxplot analysis identified which compounds contributed most to the variability of the dataset.
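The boxplot statistics can be sketched with Python's standard library. This minimal sketch assumes the pre-processed data are stored as a dictionary mapping each compound to its percent area in each of the five samples, with 0 for a sample in which the compound was not detected (an assumption). The helper names are hypothetical, and because quartile definitions differ between software packages, values computed with the "inclusive" (linear-interpolation) method here may differ slightly from those drawn by other tools, including MATLAB.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range Q3 - Q1 using linear interpolation ('inclusive' method)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def rank_by_iqr(table):
    """Return compound names sorted from largest to smallest interquartile range.

    table: {compound: [percent area in each sample]}; absent compounds are 0.0.
    """
    return sorted(table, key=lambda name: iqr(table[name]), reverse=True)


if __name__ == "__main__":
    # Hypothetical percent areas for three compounds across five samples.
    demo = {
        "compound A": [0.0, 0.0, 10.0, 20.0, 40.0],
        "compound B": [5.0, 5.0, 5.0, 5.0, 5.0],
        "compound C": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
    for name in rank_by_iqr(demo):
        print(name, iqr(demo[name]))
```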
The total compound blend of the five samples was then classified using PCA to determine whether GC-MS could distinguish PYLCV-infected from PYLCV-undetected samples taken from the fields. PCA is a multivariate technique that extracts meaningful information from a dataset by reducing data redundancy, making the data easier to interpret. It transforms the data onto a new basis in which the directions of largest variance are emphasized [44]. An orthogonal transformation converts possibly correlated variables into uncorrelated principal components (PCs), each of which carries different information, and most of the variance in the original variables is usually described by the first few PCs [45]. The PCA scores describe the relationships between the samples in the original data, while the loadings describe the relationships between the variables [46]. The boxplot and PCA analyses were performed in MATLAB R2020a.
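The PCA steps described above can be summarized in the standard covariance formulation. Here X is the n × p matrix of percent areas (n = 5 samples, p = 39 compounds), 1 is a column of ones, the unit eigenvectors w_k serve as loadings (some software scales them by the square root of λ_k instead), T holds the scores, and EV_k is the explained variance ratio of PC k. The cumulative percentage reported for PC1 to PC3 corresponds to EV_1 + EV_2 + EV_3. Because centering removes one degree of freedom, C has at most n − 1 nonzero eigenvalues.

```latex
\begin{aligned}
\tilde{X} &= X - \mathbf{1}\,\bar{x}^{\top},
  && \bar{x}_j = \tfrac{1}{n}\textstyle\sum_{i=1}^{n} x_{ij} \\
C &= \tfrac{1}{n-1}\,\tilde{X}^{\top}\tilde{X},
  && C\,w_k = \lambda_k w_k,\quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 \\
T &= \tilde{X}\,W,
  && W = [\,w_1 \;\; w_2 \;\; w_3\,] \\
\mathrm{EV}_k &= \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j},
  && \operatorname{rank}(C) \le \min(n-1,\ p)
\end{aligned}
```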

Results
The plant samples used in this research were five PYLCV-infected and PYLCV-undetected plants. Figure 1A shows the chili plantation where the IPJ and UPJ samples were taken, Figure 1B shows the chili plantation area where ICS was taken, and Figure 1C shows the IPJ and UPJ plants. PCR with the PYLCIV primer pair Pep Uni F and PI Uni R amplified DNA fragments from all symptomatic plants, indicating PYLCV infection. In contrast, no DNA bands appeared for the asymptomatic plants, so PYLCV was not detected in them. The PCR-tested plants were then analyzed by GC-MS, yielding five chromatograms: three from PYLCV-infected plants and two from PYLCV-undetected plants.
The GC-MS data showed 80 compounds from the five samples, including carbon dioxide. The carbon dioxide was probably detected because of air that had not been evacuated from the sample vial, so it was eliminated from further processing. After the pre-processing stage, 39 compounds remained across all five samples (3 PYLCV-infected and 2 PYLCV-undetected). These VOCs belonged to a variety of functional groups: amines (1 compound), other nitrogenous compounds (2 compounds), acids (4 compounds), alcohols (6 compounds), aldehydes (5 compounds), esters (8 compounds), hydrocarbons (3 compounds), ketones (9 compounds), and sulfur compounds (1 compound). The compounds and their references are listed in Table 1.
The variability of the compounds emitted by the five samples was then investigated using a boxplot analysis, and the results are shown in Figure 2. The vertical axis of the boxplot lists the 39 compounds, while the horizontal axis shows each compound's percentage area from the GC-MS results.
In Figure 2, the thick vertical black line shows the median, the left edge of the grey box indicates the 25th percentile (Q1), and the right edge of the box indicates the 75th percentile (Q3). The interquartile range (Q3 − Q1) measures the variability of the data [41]. Figure 2 shows that the two compounds with the highest variability are (2-aziridinylethyl)amine and acetic acid.
Next, PCA was carried out to determine whether the VOC blends emitted by PYLCV-infected and PYLCV-undetected plants could be distinguished. The normalized and pre-processed GC-MS results of the five samples yielded 39 compounds, which were assembled into a 5 × 39 matrix for PCA. A covariance matrix was computed from this input matrix and decomposed into its eigenvectors, which removes redundancy between correlated variables and highlights the most important characteristics of the data. The eigenvectors were sorted in descending order of their eigenvalues to obtain the PCs. Figure 3 shows the three-dimensional score plot of the PCA. Together, PC1, PC2, and PC3 accounted for 91.32% of the explained variance, and the score plot showed clear boundaries separating PYLCV-undetected from PYLCV-infected plants. The loading plot used to determine which compounds contributed to each PC is shown in Figure 4.
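The procedure described in this paragraph (assembling the 5 × 39 matrix, computing the covariance matrix, and sorting its eigenvectors by eigenvalue) can be sketched in NumPy. This is not the authors' MATLAB code: the helper names are hypothetical, a compound absent from a sample is entered as 0, and the usage lines use random numbers as a stand-in for the real percent areas. Two properties are worth keeping in mind. The signs of eigenvectors are arbitrary, so another implementation may flip an axis, which changes whether a group appears on the positive or negative side of PC2. And with only five samples, the 39 × 39 covariance matrix has at most four nonzero eigenvalues, so the first three PCs necessarily explain at least 75% of the variance.

```python
import numpy as np


def assemble_matrix(samples):
    """Stack per-sample {compound: percent area} dicts into a samples x compounds array.

    Columns are the union of compounds in first-seen order; a compound missing
    from a sample is entered as 0.0 (an assumption of this sketch).
    """
    compounds = []
    for profile in samples.values():
        for name in profile:
            if name not in compounds:
                compounds.append(name)
    X = np.array([[profile.get(c, 0.0) for c in compounds] for profile in samples.values()])
    return list(samples), compounds, X


def pca_eig(X, n_components=3):
    """PCA via eigen-decomposition of the covariance matrix of the centered data.

    Returns (scores, loadings, explained), where `explained` holds the explained
    variance ratio of every PC in decreasing order.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each compound (column)
    cov = np.cov(Xc, rowvar=False)                # p x p covariance, ddof = 1
    eigvals, eigvecs = np.linalg.eigh(cov)        # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]             # sort PCs by decreasing eigenvalue
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    loadings = eigvecs[:, :n_components]          # unit eigenvectors
    scores = Xc @ loadings                        # sample coordinates on the PCs
    return scores, loadings, explained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((5, 39))  # hypothetical stand-in for the 5 x 39 percent-area matrix
    scores, loadings, explained = pca_eig(X)
    print("cumulative PC1-PC3:", explained[:3].sum())
```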

Discussion
The PCR test confirmed the presence of the virus in the three symptomatic plants and the absence of PYLCV in the two asymptomatic samples. After GC-MS analysis and pre-processing, 39 VOCs were obtained from the five samples.
The boxplot analysis in Figure 2 shows the variability of the compounds: two main compounds, (2-aziridinylethyl)amine and acetic acid, had large interquartile ranges. The compound (2-aziridinylethyl)amine belongs to the amine functional group and is a type of plant secondary metabolite. Apart from taking part in cell processes such as cell division and the formation of nucleic acids and proteins, amines also function as components of chemical and physical defenses against herbivores and pathogens [85]. In the literature, (2-aziridinylethyl)amine has been reported in the seed of Persea americana (avocado) [86] and in an extract of Colocasia gigantea, a member of the Araceae family [47]. However, those reports do not explain whether the compound appears as part of cell processes or because of stress experienced by the plants. In this research, (2-aziridinylethyl)amine appeared only in infected plants; hence, we argue that it is part of the defense against pathogenic attack. Further investigation is nevertheless needed to establish whether (2-aziridinylethyl)amine can be used as a biomarker for PYLCV detection. Acetic acid, on the other hand, was the compound with the second-highest variability and appeared in all samples except IPJ. Plants release acetic acid as part of their adaptation to biotic and abiotic stress [30], but, as IPJ demonstrates, it is not always emitted. This inconsistency is probably due to differences in environmental conditions, which cause differences in the plants' biotic stress.
Several compounds were found in more than one sample. Glycidol and 2-propanone, 1-hydroxy- were found in the infected plants IPS and ICS. Butanal, 2-methyl- and propanal, 2-methyl- were emitted by IPJ and the virus-undetected UPS, that is, by plants from the same area, Purworejo.
The boxplot analysis only investigated the variability of individual compounds, so PCA was performed to determine whether the total blends of compounds from PYLCV-infected and PYLCV-undetected plants were distinguishable. As shown in Figure 3, there is a clear separation between PYLCV-infected leaf samples and those in which PYLCV was not detected. The separator is a diagonal plane at a 45° angle to the PC1 and PC2 axes. The PYLCV-undetected plants have positive PC2 scores, while the three infected samples have negative PC2 scores. This suggests that PCA can differentiate between PYLCV-infected and PYLCV-undetected plants.
A loading plot analysis (Figure 4) was also carried out to identify which compounds contributed to the PC values. The compound (2-aziridinylethyl)amine contributed strongly to PC3 and to negative PC2, and, as Table 1 shows, it was found only in the infected samples (IPJ, IPS, and ICS) and was emitted in large amounts by IPJ and IPS; accordingly, it lies in quadrant III, the same region as IPJ and IPS in Figure 3. Acetic acid contributed to negative PC2 and lies in the same quadrant (VII) as the PYLCV-infected plants. Other compounds detected in more than one sample also appear to have contributed to the PC values. Glycidol contributed to negative PC1 and lies in the same quadrant as the PYLCV-infected plants in the score plot in Figure 3; Table 1 shows that glycidol was found only in IPS and ICS.
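The "quadrants" of the three-dimensional PC1-PC2-PC3 space referred to above are octants. The text does not state its numbering, so the minimal sketch below assumes a common convention (I = (+,+,+), II = (−,+,+), III = (−,−,+), IV = (+,−,+), V = (+,+,−), VI = (−,+,−), VII = (−,−,−), VIII = (+,−,−)), which is consistent with the signs mentioned for (2-aziridinylethyl)amine and acetic acid. It labels a score or loading point by the signs of its coordinates and checks whether a compound's loading shares an octant with a sample's score. The helper names and the coordinates in the usage lines are hypothetical, and since PC signs are arbitrary, the labels depend on the sign choices of the PCA software.

```python
# Assumed octant numbering; keys are (PC1 > 0, PC2 > 0, PC3 > 0).
_OCTANTS = {
    (True, True, True): "I",
    (False, True, True): "II",
    (False, False, True): "III",
    (True, False, True): "IV",
    (True, True, False): "V",
    (False, True, False): "VI",
    (False, False, False): "VII",
    (True, False, False): "VIII",
}


def octant(pc1, pc2, pc3):
    """Octant of a score or loading point from the signs of its PC1-PC3 coordinates."""
    if 0.0 in (pc1, pc2, pc3):
        raise ValueError("point lies on a coordinate plane")
    return _OCTANTS[(pc1 > 0, pc2 > 0, pc3 > 0)]


def same_octant(score, loading):
    """True when a sample score and a compound loading share an octant."""
    return octant(*score) == octant(*loading)


if __name__ == "__main__":
    sample_score = (-0.5, -2.0, 1.0)      # hypothetical infected-sample score
    compound_loading = (-0.1, -0.3, 0.9)  # hypothetical compound loading
    print(octant(*sample_score), same_octant(sample_score, compound_loading))
```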

Conclusions
A new method to distinguish PYLCV-infected from PYLCV-undetected plants has been developed, based on GC-MS analysis interpreted with multivariate analysis techniques. The boxplot analysis showed that two compounds, (2-aziridinylethyl)amine and acetic acid, account for the largest variability of the VOCs emitted by the two kinds of samples. Acetic acid appeared inconsistently between samples, probably because of differences in weather and environmental conditions, whereas (2-aziridinylethyl)amine was found only in infected plants. However, further research is needed to confirm whether it is a potential biomarker of PYLCV-infected plants.
PCA was then performed to determine whether GC-MS can distinguish the total compound blends of PYLCV-infected and PYLCV-undetected chili plants. PCA delivered a clear separation between the PYLCV-infected and -undetected chili plants, with PC1, PC2, and PC3 together explaining 91.32% of the total variance. The loading plot analysis showed that (2-aziridinylethyl)amine and acetic acid contributed strongly to PC2, so the samples were separable along the PC2 axis. These preliminary results suggest that GC-MS is a promising method for detecting plant disease.