Combination of GC-MS Molecular Networking and Larvicidal Effect against Aedes aegypti for the Discovery of Bioactive Substances in Commercial Essential Oils

Dengue is a neglected disease, present mainly in tropical countries, with more than 5.2 million cases reported in 2019. Vector control remains the most effective protective measure against dengue and other arboviruses. Synthetic insecticides based on organophosphates, pyrethroids, carbamates, neonicotinoids and oxadiazines are unattractive due to their high degree of toxicity to humans, animals and the environment. Conversely, natural-product-based larvicides/insecticides, such as essential oils, present high efficiency, low environmental toxicity and can be easily scaled up for industrial processes. However, essential oils are highly complex and require modern analytical and computational approaches to streamline the identification of bioactive substances. This study combined the GC-MS spectral similarity network approach with larvicidal assays as a new strategy for the discovery of potential bioactive substances in complex biological samples, enabling the systematic and simultaneous annotation of substances in 20 essential oils through LC50 larvicidal assays. This strategy allowed rapid intuitive discovery of distribution patterns between families and metabolic classes in clusters, and the prediction of larvicidal properties of acyclic monoterpene derivatives, including citral, neral, citronellal and citronellol, and their acetate forms (LC50 < 50 µg/mL).


Introduction
Dengue is a viral infection transmitted mainly by the female Aedes aegypti mosquito. This disease mainly affects tropical regions, depending on the rain precipitation rate, temperature, humidity and urbanization process [1][2][3][4][5][6][7]. The number of cases has increased almost 8-fold over the last 2 decades (from 505,430 in 2000 to 2.4 million in 2010; 4.2 million in 2019), leading to the death of more than 4000 people in 2015. Regions of Latin America, East Asia and the Western Pacific account for over 70% of the cases. In Brazil, more than 1.5 million cases (more than 1000 deaths) were recorded in 2016 alone [1,7,8].
The WHO indicates that combatting the transmitting mosquito is the most efficient strategy to control and prevent dengue [9]. However, there are still no specific insecticides or repellents (natural or synthetic) to exclusively combat the Ae. Aegypti mosquito. Consequently, insecticides and repellents can cause disturbances and/or the mortality of important insects, such as bees and ants, resulting in the degradation of important ecosystem services, including the pollination of crops [9]. Thus, nontoxic and specific repellent agents or larvicides against Ae. aegypti are urgently required to both reduce the number of direct spectral correspondence between MS/MS spectra and compound libraries (MS/MS data) and through the relationship of molecular network masses (differences) between closely related structures (degree of spectral similarity). A mass difference of 15 Da between nodes with a high degree of similarity may suggest a CH3 group for the same class of compound, while differences of 162, 146 or 132 Da may correspond to homologues glycosylated with hexose, deoxyhexose or pentose [36,[39][40][41][42].
In this study, we propose a strategy combining the spectral similarity networking (molecular networking) approach with larvicidal activity tests against Ae. aegypti to analyze commercial essential oils with the aim of discovering potential bioactive metabolic classes. The GC-MS retention time and fragmentation, chemotaxonomy and larvicidal activity against Ae. aegypti (LC50 values) of essential oils were organized, grouped and evaluated by molecular networking. Figure 1 shows a graphical representation of the strategy.

Results
Essential oils from 20 plant species (from 9 families) with insecticidal properties were analyzed by an untargeted profiling method using GC-EI/MS (Supplementary Materials) and tested against Ae. aegypti larvae (third instar, L3), evaluating their mortality rate at 24 h and 48 h, Table 1. According to Dermaque et al. [43], an initial screening strategy to preselect an extract active against Ae. aegypti larvae involves testing at a single concentration, standardized at 250 ppm.
In order to annotate and streamline the process of discovering bioactive classes in essential oils, we applied the molecular network approach with retention time and fragmentation data from the GC-MS/MS experiments and metadata, such as taxonomy and LC50 values calculated in the Ae. aegypti larvicidal tests.

Results
Essential oils from 20 plant species (from 9 families) with insecticidal properties were analyzed by an untargeted profiling method using GC-EI/MS (Supplementary Materials) and tested against Ae. aegypti larvae (third instar, L3), evaluating their mortality rate at 24 h and 48 h, Table 1. According to Dermaque et al. [43], an initial screening strategy to preselect an extract active against Ae. aegypti larvae involves testing at a single concentration, standardized at 250 ppm.
In order to annotate and streamline the process of discovering bioactive classes in essential oils, we applied the molecular network approach with retention time and fragmentation data from the GC-MS/MS experiments and metadata, such as taxonomy and LC 50 values calculated in the Ae. aegypti larvicidal tests.

Molecular Networking
The molecular network of 20 essential oils resulted in 82 nodes (with qualitative, quantitative and metadata for each compound), connected by 258 edges and grouped into a large cluster with 76 nodes and 6 lone pairs, as shown in Figure 2. One of the advantages of MN is the ability to create filters for nodes and edges in order to recognize patterns in the dataset. In this sense, different colors were attributed to nodes considering the retention time of the compounds present in the GC-MS profiles (red-longer RT; yellow-shorter RT) and to the relative abundance of ions represented by the node sizes ( Figure 2). The distribution of colors in the cluster regions indicated that different classes of metabolites must be grouped differently. In addition, most of the compounds (nodes) were eluted between 7 and 20 min, which may denote certain metabolic classes and, consequently, assist in the annotation process. Another aspect is to evaluate the correlation of these compounds between families and their chemotaxonomic characteristics. It is also possible to make an indirect association between the chemotypes and the larvicidal potential of the cluster regions. Figure 3 shows the MN using the compound distribution filter (relative abundance of ions) among the families using a color gradient in the nodes. Yellow represents little or no abundance, while blue represents a high abundance of ions. The Burseraceae and Cupressaceae families presented a strong correlation with most of their compounds (nodes) located in the region with the lowest retention times. Geraniaceae and Lamiaceae displayed a wide distribution of their compounds in the MN, while metabolites in Lauraceae, Myrtaceae, Poaceae and Rutaceae were observed between 7 and 20 min. Pinaceae concentrated its compounds in longer retention times. Another aspect is to evaluate the correlation of these compounds between families and their chemotaxonomic characteristics. It is also possible to make an indirect association between the chemotypes and the larvicidal potential of the cluster regions. Figure 3 shows the MN using the compound distribution filter (relative abundance of ions) among the families using a color gradient in the nodes. Yellow represents little or no abundance, while blue represents a high abundance of ions. The Burseraceae and Cupressaceae families presented a strong correlation with most of their compounds (nodes) located in the region with the lowest retention times. Geraniaceae and Lamiaceae displayed a wide distribution of their compounds in the MN, while metabolites in Lauraceae, Myrtaceae, Poaceae and Rutaceae were observed between 7 and 20 min. Pinaceae concentrated its compounds in longer retention times.
Larvicidal activity against Ae. aegypti was used as a filter for the third step of evaluating the molecular network of commercial essential oils. The LC 50 values (Table 1) from essential oils were used in two different ways to calculate the individual larvicidal activity for each node (substance). First, we calculated the average of the LC 50 values for each node (one substance may be present in different EO), Figure 4A. Alternatively, we calculated the relative average considering ion abundance (present differently in each EO) for each node, Figure 4B. In both molecular networks ( Figure 4A,B), we colored the maximum and minimum calculated LC 50 values in four equidistant categories. The pink color represents the most active nodes, followed by blue, green and yellow. Although the ranges were different between the different averages, the color pattern was similar.
Molecules 2022, 27, x FOR PEER REVIEW 7 of 19 Figure 3. Molecular networks filtered for the relative abundance of ions present in the studied families. The color gradient represents ion (compound) abundance in the essential oils (family). Yellow-low or no abundance; blue-high abundance. The node size also represents the total relative abundance of ions.
Larvicidal activity against Ae. aegypti was used as a filter for the third step of evaluating the molecular network of commercial essential oils. The LC50 values (Table 1) from essential oils were used in two different ways to calculate the individual larvicidal activity for each node (substance). First, we calculated the average of the LC50 values for each node (one substance may be present in different EO), Figure 4A. Alternatively, we calculated the relative average considering ion abundance (present differently in each EO) for each node, Figure 4B. In both molecular networks ( Figure 4A,B), we colored the maximum and minimum calculated LC50 values in four equidistant categories. The pink color represents . Molecular networks filtered for the relative abundance of ions present in the studied families. The color gradient represents ion (compound) abundance in the essential oils (family). Yellow-low or no abundance; blue-high abundance. The node size also represents the total relative abundance of ions.
As a result, it was possible to observe three regions of molecular network with different larvicidal potentials. It is important to emphasize that these regions are projections of bioactivity, but may be used as a guide for regions/compounds to be explored and studied. In this case, for both averages (molecular networks) the lower right side of the cluster, colored in pink and blue, suggests a region with higher larvicidal potential. In Figure 4B, there is also a pink diagonal projecting this potential. the most active nodes, followed by blue, green and yellow. Although the ranges were different between the different averages, the color pattern was similar.  The fourth filter of this molecular networking was to annotate the nodes. We used the GNPS library combined with our in-house NIST database and gas-phase fragmentation data. The annotated compounds were then classified hierarchically according to the NP-Classifier ontology [44], as shown in Figure 5. Figure 5 represents the filtered molecular network for compound annotation. The ellipse represents the level of the superclass (terpenes), while the colors represent the distribution of classes into monoterpenes (orange) and sesquiterpenes (purple). In the lower part of Figure 5, it is possible to observe the group of expanded monoterpenes and sesquiterpenes. Eight subclasses were found for monoterpenes: acyclic (orange), camphane (green), fatty alcohols (light blue), menthane (blue), monocyclic (purple), pinane (pink) and thujan (red), while six subclasses were classified as sesquiterpenes: cadinan (yellow), caryophyllane (green), elemane (light blue), germacrane (blue), himachalane (dark blue) and longifolene (pink) alcohols. As a result, it was possible to observe three regions of molecular network with different larvicidal potentials. It is important to emphasize that these regions are projections of bioactivity, but may be used as a guide for regions/compounds to be explored and studied. In this case, for both averages (molecular networks) the lower right side of the cluster, colored in pink and blue, suggests a region with higher larvicidal potential. In Figure 4B, there is also a pink diagonal projecting this potential.
Similarly, we projected the LC 50 values of the tested compounds onto the molecular network to confirm the pharmacological patterns pointed out by MN, Figure 6. As a result, the diagonal indicated by MN including acyclic, monocyclic monoterpenes and menthanetype monoterpenes were active against Ae. aegypti larvae. In addition, some menthane-type monoterpene derivatives, such as D-limonene, were highly active.

Discussion
Essential oils and their structural analogues have historically made an important contribution as repellents or insecticides against Ae. aegypti in different communities [8,14,45,46]. However, larvicide/insecticide products based on natural products (NP) are scarce in the industry, revealing some difficulties associated with applying traditional approaches in NP [43,47,48].
Typically, a traditional approach uses organic solvents to produce crude extracts (polar or non-polar) that are screened for pharmacological activities and then fractionated into dozens of portions for further iterative bioguided analyses. In each cycle, the active fractions are reassembled after verification with spectral experiments, which can range from simple UV light to more expensive NMR analyses. Thus, the cost of pinpointing the NP-based product by traditional methods is high when considering that hundreds of ex-

Discussion
Essential oils and their structural analogues have historically made an important contribution as repellents or insecticides against Ae. aegypti in different communities [8,14,45,46]. However, larvicide/insecticide products based on natural products (NP) are scarce in the industry, revealing some difficulties associated with applying traditional approaches in NP [43,47,48].
Typically, a traditional approach uses organic solvents to produce crude extracts (polar or non-polar) that are screened for pharmacological activities and then fractionated into dozens of portions for further iterative bioguided analyses. In each cycle, the active fractions are reassembled after verification with spectral experiments, which can range from simple UV light to more expensive NMR analyses. Thus, the cost of pinpointing the NP-based product by traditional methods is high when considering that hundreds of extracts need to be analyzed to find a hit molecule [47,49].
This challenge has been partially addressed through the development of dereplication methods, which aim to prioritize the discovery of bioactive substances, still in the crude extract, avoiding the re-isolation of interferents or known compounds. Dereplication uses previously established data (spectroscopic data, pharmacological and physicochemical properties) present in databases, scientific literature and computational tools to compare samples and reference material and, therefore, annotate and to attribute some properties to substances still in extracts or fractions. Despite offering advantages over traditional methods, this strategy faces some obstacles, as follows: (1) the databases and literature are not comprehensive and standardized, rendering it difficult to discover non-ubiquitous compounds; (2) the comparison process is still univariate between the sample(s) and the reference material, making the whole process time-consuming; (3) the global relationships between samples and metabolites are little explored [18,25,50,51].
On the other hand, chemometrics and bioinformatics have faced these obstacles introducing a holistic view of how to manage data and especially how to extract relevant information from vast datasets. Some of the strategies employed in genomics and proteomics are gradually being introduced into NP science. One such strategy is the use of spectral similarity information (or any other molecular data) that can provide clues as to how the constituents of a sample are organized (classes, substituents and properties) [38,40,[52][53][54].
This organization concept is already used in the GNPS platform generating spectral networks of MS/MS data. Therefore, not only can single samples be screened for similarity, but hundreds or even thousands of samples can be organized into a single network, the so-called molecular network. Thus, using a spectral similarity score (cosine score) we can organize families, classes, substituents from one or several samples at once and still discover distribution patterns of unknown metabolites [25,[55][56][57][58][59].
However, several parameters need to be adjusted to extract relevant information from the MS/MS data. In LC-MS, MS/MS experiments are performed in data dependent analysis (DDA), which means that each ion is initially isolated from the others, fragmented and then analyzed (collected). In contrast, most GC-MS systems do not possess an ion isolation chamber and the separation of substances depends on chromatographic resolution, which is often insufficient to separate all of the metabolites. This results in the fragmentation of more than one substance at the same time, making it difficult to apply spectral comparison (similarity) tools between samples and the reference [53,60].
In this sense, our strategy largely overcomes these limitations, since the MS/MS spec-tral data from GC-MS are initially deconvoluted and aligned by the Mzmine 2 tool and then compared and organized by spectral similarity networking (molecular network GNPS). As shown in Figures 2-6, it is possible to clearly visualize how substances are distributed throughout the chromatographic analysis, showing potential metabolic classes with different fragmentation profiles. It was possible to group information at the family level and discover patterns among them. We noticed that some families share similar MS/MS profiles, some of which have similar pharmacological properties. This allows us to extract chemotaxonomic information and prioritize bioactive families [61].
The incorporation of metadata into the molecular network opens up new opportunities to discover unknown patterns in samples. The relative mean of LC 50 values from larvicidal experiments in Ae. aegypti, in several samples, allowed us to estimate the pharmacological effect of individual compounds that were repeatedly present in the samples and also indicate the bioactive ones. This is an innovative strategy in terms of discovering bioactive compounds in crude extracts, particularly in GC-MS experiments.
We annotated compounds quickly and easily using our in-house library combined with the molecular network. Using the unknown to known annotation principle, the compounds determined by the library were used in the network to evaluate the neighbors (similar) and their spectra, facilitating and speeding up the annotation. During this process we also noticed that pharmacological patterns were associated with some metabolic classes such as acyclic, monocyclic and methane monoterpenes. Furthermore, they rationally shared some characteristics such as the presence of oxygen in the form of an alcohol or carbonyl group (acetate, aldehyde or ketone), an unsaturation index between 1 and 3, and masses between 152 and 156 Da. This suggests that this strategy can be employed to indicate potentially promising chemical classes.
Finally, we tested the larvicidal activity of some of the isolated monoterpenes, revealing the bioactive potential of some of them. We therefore confirmed the potential of the strategy to not only predict the pharmacological activity of compounds in crude extracts and fractions, but also facilitate pattern recognition in samples and metabolites.

Materials and Methods
The essential oils were purchased from BioEssência ® , Jaú, Brazil, and analyzed by dissolving in ethyl acetate at a concentration of 5 µg/mL. Table 1 details the percentages of the major compounds in each essential oil.

Larvicidal Activity against Ae. aegypti
Larvicidal tests were performed with the Ae. aegypti Rockefeller strain. Third instar larvae (L3) were obtained from infection-free colonies maintained in the insectary of the Laboratory of Pharmacognosy of the University of Brasília. Colony maintenance is in accordance with World Health Organization guidelines [62] Monthly monitoring of this strain, which is susceptible to insecticides, using dose-response curves performed in 12-well plates with 10 L3 larvae, with temephos as the positive control (concentrations ranging from 0.05 to 0.003125 µg/mL).
We optimized the WHO larvicidal trial to perform rapid screening and subsequent scale-up without harm. Assays were performed as described by Silva et al., 2020 [63], using 12-well plates, with 3 mL of tap water, 10 L3 larvae and 50 µL of sample diluted in DMSO. This test is rapid, uses a small sample and allows the screening of many essential oil samples for major compounds.
The samples were tested in quadruplicate using a negative control of 0.025% dimethyl sulfoxide in tap water at pH 7.75, conductivity at 34.5 µS/cm and temephos as the positive control. This organophosphate is used as a positive control due to its efficiency against Ae. aegypti Rockefeller strain (100% mortality at 0.35 µg/mL after 24 and 48 h with LC 50 of 0.019 µg/mL), being used in private companies in Brazil as a pest control agent.
For initial screening, only the mortalities of essential oils at the final concentration of 250 µg/mL were determined, after 24 h and 48 h. The 50% lethal concentration values (LC 50 µg/mL) were estimated using the test concentrations 250, 125, 62.5 and 31.25 µg/mL for the essential oils and 100, 50, 25, 12.5 and 6.25 µg/mL for pure compounds (GraphPad Prism 7.0 software, GraphPad, La Jolla, CA, USA). Larvae mortality was determined after a 24 h exposure treatment. For each bioassay, the temperature was maintained at 28 ± 2 • C and 70 ± 10% RH, with a 12 h photoperiod.

GC-MS Analysis
For the essential oil analysis, we used the analytical methodology described by Adams (2007) with adaptations. Analysis involved gas chromatography coupled with mass spectrometer detection (GCMS-QP2010) according to the following parameters: injector temperature, 250 • C; column temperature, 60 • C; heating ramp from 60 to 210 • C, at 3 • C/min, with a total time of 50 min; chromatographic column, DB-5, 30 m × 0.25 mm in diameter, 0.25 µm in thickness; helium was used as the carrier gas, under 79.7 kPa at 1.30 mL/min, with a linear velocity of 41.6 cm/s and 1 µL injection volume and a 1:60 split.
The mass detector parameters were as follows: ion source temperature, 250 • C; interface temperature, 260 • C; solvent cutoff at 3.0 min; Scan mode, from 35 to 400 m/z; detector voltage, 0 kV.

Molecular Networking
GC-EI/MS data were initially processed using the GCsolution (Shimadzu-Tokyo, Japan). Mass spectrometry molecular networks were created using the GNPS platform (http://gnps.ucsd.edu, accessed on 30 November 2021) [35,64]. As the mass data from the EI experiments did not present pre-selection of precursor ions (called acquisition format of DIA), a spectral deconvolution was necessary. To achieve this, GC-MS data were analyzed and processed using the MzMine 2 package according to the parameters shown in Table 3. The files were submitted for processing by the spectral networks algorithm (GNPS) in three files: mgf file of the EI spectra deconvoluted by Mzmine 2, a quantification table of the peaks generated by Mzmine 2 and a metadata table, with information on the samples, such as LC 50 , taxonomy, coding, etc. In GNPS, the data were adjusted as follows: fragment ion mass tolerance of 0.5 Da; min matched peaks of 5; score threshold of 0.5. The advanced search options were: library class bronze; top history per spectrum of 1 and NIST and GNPS spectral libraries. In advanced network options: min pair cos 0.6 and network topK 10. For more details of the network on the GNPS, access: https://gnps.ucsd.edu/ProteoSAFe/ status.jsp?task=e980401aaf22484f83adead45f6012dc, accessed on 30 November 2021.