Article

Learning the Structural Diversity of Olfactory Receptors: A Genomic Case Study in Two Longhorn Beetles (Cerambycidae: Lamiinae)

1
Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR 72701, USA
2
Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR 72701, USA
3
Department of Biological Sciences, University of Memphis, Memphis, TN 38152, USA
4
Center for Biological Diversity, University of Memphis, Memphis, TN 38152, USA
5
Department of Entomology, The Pennsylvania State University, University Park, PA 16802, USA
6
Center for Chemical Ecology, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
*
Author to whom correspondence should be addressed.
Insects 2026, 17(6), 587; https://doi.org/10.3390/insects17060587
Submission received: 8 May 2026 / Revised: 29 May 2026 / Accepted: 1 June 2026 / Published: 4 June 2026

Simple Summary

Understanding the breadth of proteins that exist in nature remains a grand challenge in the life sciences. New approaches for predicting the structure and function of proteins hold great promise in this endeavor, particularly for non-model organisms with limited resources and understudied protein repertoires. Here we conducted a genomic case study to explore the application of machine learning for studying the structure and diversity of olfactory proteins in two longhorn beetles with markedly different life histories. Our goal was to investigate how structure prediction works for these hard-to-study receptors and to see how variation exists within and between species. We find evidence of structure variation across different gene families, and differences among algorithms depending on annotation quality. Modeling structure and sequence divergence confirms relationships between the two distances, while highlighting potential outliers. These results showcase the promise of AI-based structure prediction, which can help reveal hidden biological diversity in organisms that lack extensive laboratory data, with potential value for understanding insect biology and biodiversity broadly.

Abstract

Recent advances in machine learning are transforming biological research by offering powerful tools to address complex challenges across the life sciences. In particular, deep learning approaches now enable accurate predictions of protein structure and function, opening new avenues for investigating proteomic diversity in non-model organisms. In this study, we conducted a genomic case study that examines the predicted structure and diversity of odorant receptor (OR) proteins in two species of longhorn beetles (Cerambycidae) with divergent life histories: the highly specialized red milkweed beetle (Tetraopes tetrophthalmus) and the broadly polyphagous Asian longhorned beetle (Anoplophora glabripennis). Using leading predictive algorithms, we inferred the structure of beetle-encoded OR genes, compared confidence scores, and assessed protein diversity across OR families and between the two genomes. Unsupervised clustering applied to pairwise protein comparisons revealed an expected strong correlation between structure and sequence, while supporting the evolutionary classification of previously predicted OR groups and revealing new evidence of previously unrecognized OR subclusters. Notably, we identify specific proteins exhibiting substantial structural divergence despite relatively low sequence divergence with other paralogs, suggesting potential outliers subject to unusual evolutionary processes. These results highlight the utility of statistical learning for uncovering patterns of protein evolution and structural diversity in understudied insect genomes.

1. Introduction

Determining the structure and function of proteins has long stood as one of biology’s grand challenges [1,2]. Historically, this required labor-intensive methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy [3,4,5]. These traditional methods are notoriously time-consuming and technically demanding [6], with success depending on factors like crystal lattice properties, protein stability, and conformational flexibility [7,8].
Recent progress in artificial intelligence (AI) has paved new paths in structural biology. Deep learning algorithms now predict the 3D structure of proteins from amino acid sequences alone [9,10,11], often within minutes. These innovations are reshaping our understanding of protein biology and evolution [12] and have been applied across a wide array of biological subdisciplines—including biochemistry [13], molecular biology [14], and human health [15]—as well as in specific investigations such as chromatin architecture [16] and the evolution of SARS-CoV-2 variants [17,18]. Notably, AI-based prediction models have been successfully applied to the entire human proteome [19], and, more recently, to the proteomes of major model organisms such as Caenorhabditis elegans, Mus musculus, and Arabidopsis thaliana [20]. These breakthroughs now rank among the most widely cited scientific achievements of the decade—for instance, the AlphaFold paper has garnered over 30,000 citations (Google Scholar, accessed February 2025; [19]).
AI models hold special promise for non-model organisms, where traditional biochemical approaches are often underutilized due to limited experimental infrastructure and resources [21,22,23]. In the absence of efficient computational approaches, our understanding of protein diversity across nature is likely to remain incomplete, especially for poorly characterized genomes in highly diverse lineages and non-model organisms [24,25]. Beetles (Coleoptera) are among the most megadiverse insect orders, playing a key evolutionary role in shaping terrestrial biodiversity over the past 300 million years [26]. In addition to their remarkable taxonomic diversity, beetles are notable for their wide array of life history traits [27,28], their agricultural relevance as both pollinators and pests [29,30], and, more recently, for their striking genomic diversity [31]. Despite an estimated ~1.5 million species worldwide [32], only a very few, such as Tribolium castaneum, are considered model organisms [33]. Importantly, this means that a vast diversity of proteins encoded within beetle genomes remains entirely unexplored. To date, only a handful of beetle protein structures have been resolved using traditional biochemical techniques, including proteins involved in bioluminescence [34] and antifreeze [35]. However, recent efforts to assemble and annotate a growing number of beetle genomes [36,37,38,39,40,41] have begun to reveal the molecular underpinnings of this ecologically, environmentally, and economically important insect group [31,42].
Chemosensation plays a profound role in shaping many dimensions of beetle biology [43]. Essential to the beetle sensory system are odorant receptors (ORs), membrane-bound proteins that detect volatile chemical compounds in the environment [44]. Broadly, insect ORs mediate a wide array of biological functions, including mate finding [43] and recognition [45,46], host seeking [47,48,49], chemical communication [50], and spatial navigation [51]. ORs are some of the most diverse and expansive gene families documented in beetle genomes sequenced to date [31,50,52]. In most insects, ORs function alongside a highly conserved co-receptor, Orco, forming a tetrameric receptor channel [53]. Orco, present in nearly all extant insects, is typically highly expressed in olfactory neurons [53,54]. To date, no complete OR structure has been experimentally solved for any beetle species. Moreover, given the tremendous diversity of beetle genes and genomes in nature, it is likely that protein structures will never be experimentally solved for the vast majority of beetle ORs that exist. AI-based protein algorithms therefore hold great promise for studying these biologically critical yet poorly characterized proteins.
We recently sequenced and assembled the genome of the red milkweed beetle (Tetraopes tetrophthalmus; Cerambycidae; [55]), uncovering a diverse array of OR genes in this highly host-specific insect [56,57,58]. As in many longhorn beetles, the species exhibits exaggerated antennae—a morphological hallmark of Cerambycidae, reflecting their well-developed chemosensory biology [28,59]. T. tetrophthalmus is a charismatic member of the herbivore community associated with North American milkweeds (Asclepias spp.) [60] and has been the focus of extensive research on coevolution [56,57], host plant specialization [61,62], population genetics [63], and chemical ecology [64].
Here we use the T. tetrophthalmus genome as a genomic case study to explore the methodological behavior of AI-based algorithms for learning OR protein structure and diversity in longhorn beetle genomes. To provide a comparative perspective, we also examine the predicted OR repertoire of the Asian longhorned beetle (Anoplophora glabripennis)—a globally invasive, highly polyphagous pest and the first cerambycid species with a sequenced genome [42,65]. These two species (T. tetrophthalmus and A. glabripennis) share a common ancestor ~70 MYA [66]. The contrasting life histories of these two species raise interesting questions relevant to our investigations. For example, recent hypotheses have suggested that host-specialized species may show expansions in OR groups involved in detecting host-specific cues, whereas relative generalists may maintain a more heterogeneous OR landscape, consistent with studies linking OR evolution to ecological adaptation and receptor tuning [67,68]. Likewise, species with relatively broad host ranges may exhibit greater OR diversity and lower levels of pseudogenization than specialists, reflecting selection for detecting chemically diverse environments [69,70]. Differences in evolutionary history may be associated with lineage-specific expansions or contractions of particular OR clades, potentially reflecting ecological shifts or divergence in host utilization over time [69,71,72].
Our study is primarily focused on technical insights for learning protein structure and diversity based on predicted pairwise similarity metrics [73]. We applied unsupervised clustering with dimensionality reduction to characterize patterns of protein structural diversity within and between these two genomes and to examine the relationship between protein structure and sequence-based evolutionary divergence. Specifically, we asked: (1) How similar are OR protein structures predicted by two different algorithms, and how do the algorithms react to incomplete protein sequences? (2) What is the extent of OR structural diversity in the two cerambycid genomes studied? (3) Do cross-species structural comparisons cluster more strongly by sequence similarity, species identity, or both? (4) How well do structural distances correlate with sequence-based evolutionary distances?

2. Materials and Methods

2.1. Protein Sequences

We obtained amino acid sequences for 122 OR genes from the curated annotation of the T. tetrophthalmus genome [55]. To enable cross-species comparisons, we also included 133 OR genes from the Asian longhorned beetle (A. glabripennis) genome [42], a species that is in the same cerambycid subfamily (Lamiinae) and which has been the subject of intensive chemosensory annotation efforts [44,50,65]. For clarity, ORs annotated in the T. tetrophthalmus genome are hereafter labeled as “TETRA,” while those identified in A. glabripennis are labeled as “AGLAB”.
Our primary analyses were focused on a set of high-quality OR gene models with complete annotations available for the entire, full-length protein-coding sequence. We refer to this set as the “complete” ORs, which provide the focal point for exploring comparative insights into the diversity of putatively functional ORs in both cerambycid species. In total we identified 73 complete ORs in the TETRA genome and 95 complete ORs in AGLAB (Table S1). These sequences span eight previously characterized OR subfamilies found throughout Coleoptera (Groups 1, 2A, 2B, 3, 4, 5A, 5B, and 7; [50,74,75]).
Separately, we recovered a set of “incomplete” ORs in both TETRA (49) and AGLAB (38) with low-quality annotations that have missing exons, premature stop codons, or other irregularities (Tables S1 and S2). We emphasize that gene “completeness” is a continuum (Table S2), ranging from the absence of a single small exon to more extensive deletions of entire domains, which may result from biology (e.g., pseudogenization) or methodology (i.e., poor annotations). Annotation quality (or lack thereof) can be a persistent problem in undercharacterized, non-model insect genomes. We therefore included incomplete genes in certain analyses as a simple technical comparison, using them as an opportunity to gain insight into the behavior of predictive algorithms when applied to recalcitrant gene models, whether due to annotation quality or real biological phenomena (e.g., pseudogenes). For the subset of analyses that include incomplete ORs, our rationale was thus limited to comparing the methodological behavior of algorithms on incomplete protein models. For example, is predicted confidence level lower or higher on incomplete ORs from AlphaFold versus RoseTTAFold? Detailed information regarding protein sequences, genomic context, and gene completeness is provided in the Supplement (Tables S1 and S2).

2.2. Protein Structure Predictions

We predicted structures for all 122 TETRA ORs using two deep learning models: AlphaFold version 2.3.2 [11] and RoseTTAFold version All-Atom (RFAA) [9]. Likewise, we also applied AlphaFold to predict the structures of all 133 AGLAB ORs for comparative analyses. We first compared these two algorithms to provide technical insight to predictions of beetle OR structures, while downstream analyses of OR structural and functional diversity were based on AlphaFold predictions [11,76,77,78]. AlphaFold models were generated using Google Colab Notebooks [79] according to the recommended default settings, including Amber energy minimization to obtain relaxed structures, as suggested in prior studies [10,80]. RoseTTAFold predictions were generated via the Robetta web server (https://robetta.bakerlab.org/ accessed on 5 March 2024), which implements the RoseTTAFold algorithm using default parameters. Each tool produced five structural models per protein; we retained the relaxed model with the highest confidence score for each OR, following established best practices [77]. Final protein models were saved in PDB format for downstream analysis.

2.3. Comparing AlphaFold and RoseTTAFold

We first evaluated consistency between AlphaFold and RoseTTAFold to provide technical insights and better understand their predictive properties on both complete and incomplete OR sets. Specifically, we obtained and compared the top-scoring relaxed models for each TETRA OR from both methods. Pairwise structural distances among predicted OR structures were quantified using root mean squared deviation (RMSD), both with and without a 15 Å cutoff, an approach commonly used to mitigate potential alignment outliers [81,82]. Alignments and RMSD calculations were performed using ChimeraX’s ‘Matchmaker’ function [18,83]. Confidence scores were extracted from the prediction output files for both AlphaFold and RoseTTAFold, and a delta score ($\Delta_{\mathrm{score}}$) was computed to contrast confidence between algorithms:
$$\Delta_{\mathrm{score}} = \mathrm{AlphaFold}_{\mathrm{score}} - \mathrm{RoseTTAFold}_{\mathrm{score}}$$
Positive $\Delta_{\mathrm{score}}$ values indicate higher confidence from AlphaFold, and negative values indicate higher confidence from RoseTTAFold. To compare the two scores on a similar scale, we used the normalized $\mathrm{AlphaFold}_{\mathrm{score}}$ and $\mathrm{RoseTTAFold}_{\mathrm{score}}$ that are both bounded by zero and one, representing little (near zero) to high confidence (near one), with moderate levels of confidence in-between. These two scores provided only relative predictive confidence in the global structure, and thus do not quantify accuracy, nor do they measure the true biological correctness of a predicted structure. Therefore, we included $\Delta_{\mathrm{score}}$ only as a technical comparison of relative algorithm confidence because the true structures of these beetle ORs have not been experimentally solved. Specifically, we compared algorithm confidence between predictions derived from complete versus incomplete ORs, with the a priori expectation that both methods should return lower confidence on incomplete ORs due to their fragmented nature.
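As an illustration of the delta score defined above, the following minimal Python sketch computes the difference for a few ORs and summarizes how often AlphaFold is the more confident method. The function names, OR labels, and score values are hypothetical and chosen only for illustration; they are not taken from the study's output files.

```python
from statistics import mean


def delta_score(alphafold_score, rosettafold_score):
    """Return AlphaFold confidence minus RoseTTAFold confidence (both on a 0-1 scale)."""
    for value in (alphafold_score, rosettafold_score):
        if not 0.0 <= value <= 1.0:
            raise ValueError("confidence scores must be normalized to [0, 1]")
    return alphafold_score - rosettafold_score


def summarize(scores):
    """scores maps an OR name to (AlphaFold score, RoseTTAFold score)."""
    deltas = {name: delta_score(af, rf) for name, (af, rf) in scores.items()}
    # Share of ORs for which AlphaFold reports the higher confidence.
    share_positive = sum(d > 0 for d in deltas.values()) / len(deltas)
    return deltas, share_positive, mean(deltas.values())


# Hypothetical values for illustration only; not results from the study.
toy_scores = {
    "OR_complete_a": (0.88, 0.80),
    "OR_complete_b": (0.84, 0.79),
    "OR_incomplete_c": (0.41, 0.55),
}
deltas, share_positive, mean_delta = summarize(toy_scores)
```

A positive entry in `deltas` means AlphaFold was the more confident method for that OR. As in the text, the value says nothing about which prediction is closer to the true structure.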

2.4. Phylogenetic Trees

We obtained a phylogeny of all 255 OR genes (122 TETRA and 133 AGLAB) from the original TETRA genome study [55]. Briefly, this tree was inferred using FastTree 2.1 [84] under the JTT substitution model with gamma-distributed rate variation, with branch support assessed via Shimodaira–Hasegawa tests. Prior to phylogenetic construction, the OR protein sequences were aligned using Clustal Omega 1.2.4 [85] and processed with trimAl [86] (similarity threshold = 0, gap threshold = 0.8, conservation cutoff = 25). This produced a phylogeny with branch lengths representing evolutionary distance in units of expected amino acid substitutions per site. From this full phylogeny, four pruned subtrees were generated for focused analyses: (1) all ORs (complete + incomplete) from both species (255 total), (2) only complete genes from both species (168 total), (3) all TETRA ORs (122 total), and (4) only complete TETRA ORs (73 total). Pairwise sequence-based evolutionary distances were derived from phylogenetic branch lengths using the cophenetic.phylo function in the R package ape 5.8 [87,88].
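To make the notion of a branch-length (patristic) distance concrete, the sketch below sums branch lengths along the path connecting two tips of a small rooted tree. It is a plain Python illustration of the quantity that cophenetic.phylo returns, not the ape implementation; the tree encoding (a child-to-parent dictionary), the node names, and the branch lengths are all assumptions made for this example.

```python
from itertools import combinations


def path_to_root(tree, node):
    """Map each ancestor of `node` (and node itself) to its distance from `node`."""
    distances = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def patristic(tree, a, b):
    """Sum of branch lengths on the path between tips a and b."""
    up_from_a = path_to_root(tree, a)
    node, total = b, 0.0
    # Climb from b until reaching the first ancestor shared with a.
    while node not in up_from_a:
        parent, length = tree[node]
        total += length
        node = parent
    return up_from_a[node] + total


def tip_distances(tree):
    """Pairwise distances among all tips (nodes that are never a parent)."""
    parents = {parent for parent, _ in tree.values()}
    tips = sorted(n for n in tree if n not in parents)
    return {(a, b): patristic(tree, a, b) for a, b in combinations(tips, 2)}


# Hypothetical toy tree: child -> (parent, branch length in substitutions per site).
toy_tree = {
    "n1": ("root", 0.1),
    "OR_C": ("root", 0.5),
    "OR_A": ("n1", 0.2),
    "OR_B": ("n1", 0.3),
}
toy_distances = tip_distances(toy_tree)
```

In this toy tree, OR_A and OR_B are separated only by their two terminal branches, whereas either of them reaches OR_C through the internal branch and the root.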

2.5. Unsupervised Learning of OR Protein Structural Diversity Within TETRA

We assessed OR structural diversity in TETRA by calculating RMSD for all pairwise combinations for AlphaFold-predicted structures (7381 comparisons), using the recommended 15 Å cutoff in ChimeraX 1.7. This yielded a structural distance matrix, which we paired with a corresponding matrix of sequence-based evolutionary distances extracted from the TETRA phylogeny described above. Separately, we pruned these datasets to include only the subset of 73 TETRA ORs with complete protein sequences to provide a comparison of analyses with and without incomplete genes; this provided a structural distance matrix comprising 2628 comparisons, which we paired with a matched sequence-based distance matrix from the pruned TETRA phylogeny.
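The comparison counts above follow from the number of unordered pairs among n proteins, n(n - 1)/2. The sketch below checks these counts and shows, under simplifying assumptions, how a symmetric RMSD matrix could be assembled. The `rmsd` function assumes matched, already superimposed coordinates and applies the cutoff by simply ignoring atom pairs farther apart than the threshold. This is one plain reading of a distance cutoff and does not reproduce ChimeraX Matchmaker's alignment or pruning procedure. All names and coordinates are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt


def n_pairs(n):
    """Number of unordered pairwise comparisons among n proteins."""
    return n * (n - 1) // 2


def rmsd(coords_a, coords_b, cutoff=None):
    """RMSD over matched, pre-superimposed atom pairs; pairs beyond `cutoff` (in Å) are skipped."""
    squared = []
    for p, q in zip(coords_a, coords_b):
        d = dist(p, q)
        if cutoff is None or d <= cutoff:
            squared.append(d * d)
    if not squared:
        raise ValueError("no atom pairs within cutoff")
    return sqrt(sum(squared) / len(squared))


def pairwise_matrix(structures, cutoff=15.0):
    """Build a symmetric distance matrix from a dict of name -> coordinate list."""
    names = sorted(structures)
    matrix = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        d = rmsd(structures[names[i]], structures[names[j]], cutoff)
        matrix[i][j] = matrix[j][i] = d
    return names, matrix


# Hypothetical two-atom "structures" for illustration only.
toy_structures = {
    "OR1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "OR2": [(0.0, 0.0, 0.0), (1.0, 0.0, 20.0)],
    "OR3": [(0.0, 3.0, 0.0), (1.0, 4.0, 0.0)],
}
names, matrix = pairwise_matrix(toy_structures, cutoff=15.0)
```

With the toy data, the OR1 and OR2 pair shows how a single badly matched atom inflates the RMSD without a cutoff, while applying the cutoff removes its influence.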
We applied t-distributed Stochastic Neighbor Embedding (t-SNE; [89]) to each matrix to investigate patterns in structural and evolutionary divergence. t-SNE is a non-linear dimensionality reduction technique that seeks a lower-dimensional representation while preserving local relationships and structure in the data. This approach has been applied successfully in a number of applications for exploring and visualizing large-scale pairwise distance matrices [90,91,92,93]. Here we utilize t-SNE as an exploratory tool for compressing high-dimensional protein distances to discover visual patterns and to generate hypotheses. That is, we leveraged this approach to explore the high-dimensional space and uncover interesting and potentially overlooked patterns hidden within the protein comparisons. We applied the Rtsne function provided in Rtsne package 0.17 [94] according to the recommended default settings with Barnes-Hut implementation.

2.6. Cross-Species Comparisons of OR Structural and Sequence Diversity

To compare OR protein diversity across species, we first analyzed all 255 AlphaFold-predicted ORs from TETRA and AGLAB, including all complete and incomplete genes. Pairwise RMSD comparisons (32,385 total) were used to construct a cross-species structural distance matrix. In parallel, we obtained sequence-based distances matched to the same pairwise comparisons from the corresponding phylogeny. To provide a comparison of analyses with versus without incomplete genes, we pruned these datasets to only include the 168 complete TETRA and AGLAB ORs (14,028 total). As before, we applied t-SNE to each of the pairwise distance matrices to explore and visualize the high-dimensional space in a lower-dimensional representation. Lastly, we plotted protein structure-based distances as a function of sequence-based (evolutionary) distances to assess the relationship between the two measures of protein divergence. For these analyses, we computed Pearson correlation coefficients (r) only as a descriptive measure between structural and sequence distance across all comparisons within each of the two species, as well as a combined analysis containing all ORs across both species together. Additionally, we conducted a Mantel test to evaluate whether structural and sequence-based distances are correlated. Specifically, we used the mantel function provided in the R package vegan 2.6-6 [95] with default settings and 1000 bootstrap replicates. We focused our analyses and visualization on the comparisons using complete OR proteins.
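The two statistics described above can be sketched in a few lines of Python. The code below computes Pearson's r between the upper triangles of two distance matrices and a simple permutation-based Mantel test, in which the rows and columns of one matrix are shuffled together to build a null distribution. This is an illustrative stand-in and not the vegan implementation. The permutation count, random seed, function names, and toy matrices are assumptions made for this example.

```python
import random
from math import sqrt


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after reordering rows and columns together."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(m1, m2, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value) for two distance matrices."""
    rng = random.Random(seed)
    x = upper_triangle(m1)
    observed = pearson(x, upper_triangle(m2))
    hits = 0
    for _ in range(permutations):
        order = list(range(len(m2)))
        rng.shuffle(order)
        if pearson(x, upper_triangle(m2, order)) >= observed:
            hits += 1
    # The observed arrangement is counted as one member of the null distribution.
    return observed, (hits + 1) / (permutations + 1)


def distance_matrix(positions):
    """Toy helper: absolute differences between points on a line."""
    return [[abs(a - b) for b in positions] for a in positions]


# Hypothetical toy data: five "proteins" placed on a line in each distance space.
sequence_dist = distance_matrix([0.0, 1.0, 3.0, 7.0, 12.0])
structure_dist = distance_matrix([0.0, 1.5, 2.5, 8.0, 11.0])
r_obs, p_value = mantel(structure_dist, sequence_dist, permutations=999, seed=1)
```

Because the entries of a distance matrix are not independent, the p-value comes from shuffling whole rows and columns rather than from the usual test for a correlation coefficient.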

3. Results

3.1. Predicting the Structure of Beetle OR Proteins

We predicted structures for all 122 OR proteins from the TETRA genome using both AlphaFold and RoseTTAFold. As an initial overview, the highest-confidence AlphaFold model from each OR group was visualized alongside the full TETRA OR phylogeny (Figure 1). These representative structures had AlphaFold confidence scores ranging from 0.73 to 0.92 (mean: 0.86; scale: 0–1). Comparative analysis of AlphaFold and RoseTTAFold predictions revealed both similarities and differences in structural outputs (Figure 2 and Figure 3). Pairwise RMSD values computed with and without a 15 Å cutoff showed broadly consistent patterns (Figure 2a vs. Figure 2b). Across all 122 ORs, the mean RMSD between AlphaFold and RoseTTAFold structures was 5.2 Å without the cutoff and 3.2 Å with it. Restricting the analysis to only complete ORs reduced the mean RMSD to 2.79 Å (no cutoff) and 2.54 Å (with cutoff).
Variation in RMSD differed by OR group and gene completeness. For instance, Group 5A, which consisted entirely of incomplete genes in TETRA, showed high structural variability and the highest mean RMSD between AlphaFold and RoseTTAFold (average: 14.0 Å; Figure 2a). In contrast, groups with a higher proportion of complete genes—such as Group 1 (2.73 Å), Group 2A (2.22 Å), Group 2B (3.61 Å), Group 3 (2.79 Å), and Group 7 (3.20 Å)—exhibited much lower structural divergence between the two algorithms. A few notable outliers with large RMSD values were observed, particularly among incomplete Group 7 proteins (Figure 2). Indeed, pairwise distances for the same incomplete OR but different algorithm (AlphaFold versus RoseTTAFold for the same OR) spanned the entire range of RMSD observed (discussed below).
AlphaFold yielded higher relative confidence scores than RoseTTAFold for 79% of TETRA ORs (96 out of 122; Figure 3a). This percentage rose to 98% (72 of 73) when analysis was limited to only complete ORs. Delta confidence scores ($\Delta_{\mathrm{score}}$) were generally positive across most OR groups (Figure 3b), with the exception of TETRA Group 5A (composed exclusively of incomplete genes), where RoseTTAFold provided consistently higher relative confidence, reflected by negative $\Delta_{\mathrm{score}}$ values. On average, AlphaFold confidence scores exceeded those of RoseTTAFold by 2% across all ORs, and by 7% when restricted to only complete sequences. Notably, AlphaFold confidence scores consistently approached or exceeded 0.80 for complete proteins, in line with previous studies on other protein families [96].

3.2. OR Structural Diversity Within the TETRA Genome

Dimensionality reduction with t-SNE revealed distinct patterns of OR structural and evolutionary diversity within the TETRA genome (Figure 4). Observed patterns based on pairwise protein structural distances were largely congruent with previously defined OR groups. Broadly comparable clustering was observed when t-SNE was applied to both structural distances (top panels, Figure 4) and sequence-based distances inferred from the phylogeny (bottom panels, Figure 4). Resolution improved substantially when analyses were restricted to only complete genes (Figure 4b vs. Figure 4a; Figure 4d vs. Figure 4c). For example, Groups 1, 2A, 3, and 7 formed well-separated clusters in both structural and sequence space. Within Group 7, evidence of two distinct subclusters emerged based on protein distances, suggesting potential subclade diversification within this group. Group 2B also became more clearly distinguishable when incomplete genes were excluded. Group 1 and Group 7 strongly diverged from one another, particularly in the structure-based analysis (Figure 4b).

3.3. OR Diversity Across the TETRA and AGLAB Genomes

Cross-genome comparisons of ORs encoded within TETRA and AGLAB further revealed patterns of structural diversity (Figure 5). As with the TETRA-only analyses (Figure 4), t-SNE revealed evidence of clear similarities to OR groups based on previous designations. Well-defined clusters were evident for Groups 1, 3, and 7 (Figure 5a), each containing representatives from both species (Figure 5). Similar clustering patterns emerged whether t-SNE was applied to sequence-based distances or protein structural distances (bottom vs. top rows; Figure 5). Exclusion of incomplete gene models sharpened cluster boundaries (right vs. left columns; Figure 5). For example, Groups 1, 2A, 2B, and 7 were far better resolved in structural space when only complete genes were analyzed (Figure 5c). Additionally, both structural and sequence-based analyses consistently supported the subdivision of Groups 3 and 7 into two distinct subclusters, each containing ORs from both beetle species. These patterns indicate parallel diversification within the same OR groups across the two species. Overall, t-SNE recovered patterns that were strongly organized by OR group, rather than species. That is, homologous ORs of both species tended to cluster together based on their previous designations.

3.4. Structural vs. Phylogenetic Distance

Comparisons of structural and sequence-based distances revealed strong positive correlations both within each species and between the two species (Figure 6). Within TETRA, the Pearson correlation coefficient (Pearson’s r) between structure and sequence distance was r = 0.76 when restricted to complete genes and slightly weaker at r = 0.62 when all incomplete genes were also included. A similar trend was observed for the full dataset of 255 ORs across both TETRA and AGLAB together (r = 0.72) when only complete genes were considered. Likewise, the Mantel statistic was estimated to be 0.74 and was significant (p-value = 0.001). Most pairwise comparisons clustered in the center of the distance ranges, reflecting a moderate to high level of divergence. Several pairwise comparisons were also identified as possible outliers from the bulk of these distributions, representing proteins with high structural divergence despite relatively low sequence divergence (left side of Figure 6). These outliers were primarily found in Group 5A of AGLAB, along with a handful of examples in Group 3 of TETRA, suggesting potentially unusual evolutionary trajectories specific to these ORs.

4. Discussion

Advances in statistical modeling are transforming our ability to study evolution and biodiversity broadly across the Tree of Life [12,13,14,15,16,97,98,99,100,101]. In this study, we applied supervised learning to predict protein structures, followed by unsupervised learning with t-SNE to investigate the structure and diversity of ORs, an evolutionarily dynamic and functionally diverse, yet under-characterized, lineage of insect chemoreceptors. Our results contribute to a deeper understanding of OR structural diversity in beetles and support broader efforts to understand proteins in non-model organisms [50,102,103].
As a result of rapid evolution and other processes, OR genes are notoriously challenging to annotate, and many remain functionally uncharacterized, particularly in non-model insects [50,104,105,106]. Efforts to better recover these important genes will be essential to provide a complete picture of olfactory evolution in beetles and other organisms. Thus, we primarily focused our analyses on the comparison of complete ORs with high-quality annotations spanning the entire length of the predicted protein. Yet, as in many undercharacterized genomes, we nonetheless found a large set of poorly annotated, incomplete ORs, which we included only as a case study for several technical comparisons of predictive algorithms and t-SNE clustering. Both AlphaFold and RoseTTAFold generated viable structural predictions for beetle ORs, but the two algorithms reacted differently to gene completeness. Consistent with previous work [77,107], we generally found that structural predictions of identical OR sequences differed notably between the two methods, sometimes outpacing comparisons of different genes on the same algorithm. For example, the distance between AlphaFold and RoseTTAFold predictions for the same OR was often greater than the distance between two different ORs using AlphaFold alone. These data dovetail with previous work that has found AlphaFold to produce more accurate structural models [77], and these conclusions ultimately led us to base our downstream analyses on the AlphaFold structures.
Our study highlights the growing potential of machine learning to investigate protein structure and diversity in non-model organisms, underscoring its broad potential across diverse gene families and taxa. Our application of unsupervised learning revealed a high degree of OR structural diversity in both beetle species. The resulting t-SNE clusters were well-aligned with prior phylogenetic classifications based on sequence data alone, underscoring the potential of structural comparisons for investigating and characterizing gene family classifications [108,109]. Notably, our structure-based comparisons also identified two prominent subclusters within OR Groups 3 and 7, suggesting possible recent divergence or perhaps cryptic subfunctionalization within these lineages. Future studies will be necessary to uncover the evolutionary processes driving these apparent patterns.
We also observed strong correlations between structural and sequence-based distances, reinforcing the close predicted relationship between these two dimensions of protein evolution (Figure 6). However, a subset of ORs, primarily within AGLAB Group 5A, displayed high structural divergence despite comparatively low sequence divergence. Such patterns may result from a diversity of factors, such as adaptive shifts, relaxed constraints, or other potential evolutionary processes. Group 5A is known to have a high degree of lineage specificity among the ORs, yet it is also considered to be one of the youngest of all the beetle OR subfamilies, only appearing with the Bostrichoidea [42]. At the same time, we cannot yet rule out technical limitations in current AI-based prediction models for membrane-bound receptors. Future studies incorporating broader taxonomic sampling of beetle genomes will be essential to determine whether these patterns are conserved across taxa and evolutionary timescales. Additionally, investigating predicted ligand-binding sites and affinities [110,111,112], especially within the subdivided groups, may shed light on the biological and evolutionary significance of the observed structural divergence and whether it reflects functional differences among beetle OR genes.
Despite encouraging results, several limitations remain, suggesting new paths for future work at the interface of OR structure, function, and evolution. ORs are membrane-bound proteins with complex architectures and often limited available structural data, making them particularly challenging to model [113,114]. Both AlphaFold and RoseTTAFold are known to underperform on membrane protein structures [19,115], and both struggle to account for conformational flexibility—an important factor for ORs that undergo dynamic changes upon binding to volatile ligands [52,53,116,117]. Nonetheless, these tools provide practical, scalable solutions for studying protein structure in data-poor systems, and future improvements in AI modeling are likely to address many of their current limitations [78].
Recent studies have increasingly leveraged AI to predict and identify ligand-binding sites [118,119,120], offering valuable insights into receptor function and molecular interactions. Such knowledge of OR structure and function can help guide informed and targeted strategies for biotechnological applications [121,122,123]. For example, synthetic compounds that mimic or block natural odorants can be designed to lure pests into traps or prevent them from locating crops [121,122,123]. Indeed, informed practices based on OR biology have been proposed as a key component for directing efforts to control the Asian longhorned beetle itself [65]. Moreover, OR studies have implications for biosensor development, where insect-derived receptors are used to detect volatile organic compounds in environmental monitoring [121,122,123]. Collectively, our study demonstrates the power of combining deep learning and unsupervised clustering to explore protein diversity and evolution, even in organisms with largely uncharacterized proteomes. By integrating structural and sequence-based analyses, we identify both conserved patterns and surprising outliers in OR evolution, laying the groundwork for future genomic, functional, and ecological investigations into insect chemosensory diversification.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/insects17060587/s1, Table S1: OR gene data; Table S2: Gene completeness data.

Author Contributions

Conceptualization, R.F.M., D.D.M. and R.A.; Methodology, M.D., T.S., E.W., E.T., S.D., R.F.M. and R.A.; Software, M.D. and E.W.; Validation, J.R.L. and R.A.; Formal analysis, M.D.; Investigation, M.D., T.S., E.W., E.T., S.D., R.F.M., D.D.M. and R.A.; Resources, J.R.L., R.F.M., D.D.M. and R.A.; Data curation, M.D., T.S., S.D., R.F.M. and D.D.M.; Writing—original draft, M.D., R.F.M., D.D.M. and R.A.; Writing—review & editing, M.D., T.S., J.R.L., R.F.M., D.D.M. and R.A.; Visualization, M.D., J.R.L., E.T., S.D. and R.A.; Supervision, D.D.M. and R.A.; Project administration, R.A.; Funding acquisition, R.F.M., D.D.M. and R.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant from the Arkansas Bioscience Institutes, as well as the Arkansas High Performance Computing Center, which is funded through multiple National Science Foundation grants and the Arkansas Economic Development Commission. RA was also supported by start-up funds provided by the University of Arkansas, and NSF DEB-2529693. This study was funded by a grant from the United States National Science Foundation DEB: 2110053 to DDM and RFM, an NSF ROA supplement to DDM and RA, and a grant from the United States National Science Foundation MRI: 2318210 to DDM.

Data Availability Statement

Protein sequences and associated data are available in the Supplementary Materials of this study and with the original published genomes [42,55].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dill, K.A.; Ozkan, S.B.; Shell, M.S.; Weikl, T.R. The Protein Folding Problem. Annu. Rev. Biophys. 2008, 37, 289–316. [Google Scholar] [CrossRef] [PubMed]
  2. Fisher, A.G. Cell and Developmental Biology: Grand Challenges. Front. Cell Dev. Biol. 2024, 12, 1377073. [Google Scholar] [CrossRef]
  3. Acharya, K.R.; Lloyd, M.D. The Advantages and Limitations of Protein Crystal Structures. Trends Pharmacol. Sci. 2005, 26, 10–14. [Google Scholar] [CrossRef]
  4. Anfinsen, C.B. Principles That Govern the Folding of Protein Chains. Science 1973, 181, 223–230. [Google Scholar] [CrossRef] [PubMed]
  5. Kay, L.E. NMR Studies of Protein Structure and Dynamics. J. Magn. Reson. 2011, 213, 477–491. [Google Scholar] [CrossRef]
  6. Chruszcz, M.; Wlodawer, A.; Minor, W. Determination of Protein Structures—A Series of Fortunate Events. Biophys. J. 2008, 95, 1–9. [Google Scholar] [CrossRef]
  7. Huang, F.; Nau, W.M. Photochemical Techniques for Studying the Flexibility of Polypeptides. Res. Chem. Intermed. 2005, 31, 717–726. [Google Scholar] [CrossRef]
  8. Whitford, D. Proteins: Structure and Function; Wiley: Hoboken, NJ, USA, 2013; ISBN 978-1-118-68572-3. [Google Scholar]
  9. Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G.R.; Wang, J.; Cong, Q.; Kinch, L.N.; Schaeffer, R.D.; et al. Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network. Science 2021, 373, 871–876. [Google Scholar] [CrossRef]
  10. Gordon, C.H.; Hendrix, E.; He, Y.; Walker, M.C. AlphaFold Accurately Predicts the Structure of Ribosomally Synthesized and Post-Translationally Modified Peptide Biosynthetic Enzymes. Biomolecules 2023, 13, 1243. [Google Scholar] [CrossRef]
  11. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
  12. Varadi, M.; Velankar, S. The Impact of AlphaFold Protein Structure Database on the Fields of Life Sciences. Proteomics 2023, 23, 2200128. [Google Scholar] [CrossRef]
  13. Perrakis, A.; Sixma, T.K. AI Revolutions in Biology: The Joys and Perils of AlphaFold. EMBO Rep. 2021, 22, e54046. [Google Scholar] [CrossRef]
  14. Ruff, K.M.; Pappu, R.V. AlphaFold and Implications for Intrinsically Disordered Proteins. J. Mol. Biol. 2021, 433, 167208. [Google Scholar] [CrossRef] [PubMed]
  15. Porta-Pardo, E.; Ruiz-Serra, V.; Valentini, S.; Valencia, A. The Structural Coverage of the Human Proteome before and after AlphaFold. PLoS Comput. Biol. 2022, 18, e1009818. [Google Scholar] [CrossRef] [PubMed]
  16. Tang, Q.-Y.; Ren, W.; Wang, J.; Kaneko, K. The Statistical Trends of Protein Evolution: A Lesson from AlphaFold Database. Mol. Biol. Evol. 2022, 39, msac197. [Google Scholar] [CrossRef]
  17. Dejnirattisai, W.; Huo, J.; Zhou, D.; Zahradník, J.; Supasa, P.; Liu, C.; Duyvesteyn, H.M.E.; Ginn, H.M.; Mentzer, A.J.; Tuekprakhon, A.; et al. SARS-CoV-2 Omicron-B.1.1.529 Leads to Widespread Escape from Neutralizing Antibody Responses. Cell 2022, 185, 467–484.e15. [Google Scholar] [CrossRef]
  18. Meng, E.C.; Goddard, T.D.; Pettersen, E.F.; Couch, G.S.; Pearson, Z.J.; Morris, J.H.; Ferrin, T.E. UCSF ChimeraX: Tools for Structure Building and Analysis. Protein Sci. 2023, 32, e4792. [Google Scholar] [CrossRef] [PubMed]
  19. Tunyasuvunakool, K.; Adler, J.; Wu, Z.; Green, T.; Zielinski, M.; Žídek, A.; Bridgland, A.; Cowie, A.; Meyer, C.; Laydon, A.; et al. Highly Accurate Protein Structure Prediction for the Human Proteome. Nature 2021, 596, 590–596. [Google Scholar] [CrossRef]
  20. Badaczewska-Dawid, A.E.; Kuriata, A.; Pintado-Grima, C.; Garcia-Pardo, J.; Burdukiewicz, M.; Iglesias, V.; Kmiecik, S.; Ventura, S. A3D Model Organism Database (A3D-MODB): A Database for Proteome Aggregation Predictions in Model Organisms. Nucleic Acids Res. 2024, 52, D360–D367. [Google Scholar] [CrossRef]
  21. Askari Rad, A.; Fayazi, J.; Beigi Nassiri, M.T.; Hasani Baferani, A. Functional Annotation of Some Hypothetical Genes in the Schistosoma Parasite Based on Reciprocal Best Structural-Hit Relationship. Res. Mol. Med. 2022, 10, 225–234. [Google Scholar] [CrossRef]
  22. Moreland, R.T.; Zhang, S.; Barreira, S.N.; Ryan, J.F.; Baxevanis, A.D. An AI-generated Proteome-scale Dataset of Predicted Protein Structures for the Ctenophore Mnemiopsis leidyi. Proteomics 2024, 24, 2300397. [Google Scholar] [CrossRef]
  23. Stephan, G.; Dugdale, B.; Deo, P.; Harding, R.; Dale, J.; Visendi, P. Bridging Functional Annotation Gaps in Non-Model Plant Genes with AlphaFold, DeepFRI and Small Molecule Docking. bioRxiv 2021. [Google Scholar] [CrossRef]
  24. McKenna, D.D. Beetle Genomes in the 21st Century: Prospects, Progress and Priorities. Curr. Opin. Insect Sci. 2018, 25, 76–82. [Google Scholar] [CrossRef] [PubMed]
  25. Li, F.; Zhao, X.; Li, M.; He, K.; Huang, C.; Zhou, Y.; Li, Z.; Walters, J.R. Insect Genomes: Progress and Challenges. Insect Mol. Biol. 2019, 28, 739–758. [Google Scholar] [CrossRef]
  26. Petit, S.; Usher, M.B. Biodiversity in Agricultural Landscapes: The Ground Beetle Communities of Woody Uncultivated Habitats. Biodivers. Conserv. 1998, 7, 1549–1561. [Google Scholar] [CrossRef]
  27. García-López, A.; Micó, E.; Múrria, C.; Galante, E.; Vogler, A.P. Beta Diversity at Multiple Hierarchical Levels: Explaining the High Diversity of Scarab Beetles in Tropical Montane Forests. J. Biogeogr. 2013, 40, 2134–2145. [Google Scholar] [CrossRef]
  28. Rossa, R.; Goczał, J. Global Diversity and Distribution of Longhorn Beetles (Coleoptera: Cerambycidae). Eur. Zool. J. 2021, 88, 289–302. [Google Scholar] [CrossRef]
  29. Kromp, B. Carabid Beetles in Sustainable Agriculture: A Review on Pest Control Efficacy, Cultivation Impacts and Enhancement. Agric. Ecosyst. Environ. 1999, 74, 187–228. [Google Scholar] [CrossRef]
  30. Paz, F.S.; Pinto, C.E.; De Brito, R.M.; Imperatriz-Fonseca, V.L.; Giannini, T.C. Edible Fruit Plant Species in the Amazon Forest Rely Mostly on Bees and Beetles as Pollinators. J. Econ. Entomol. 2021, 114, 710–722. [Google Scholar] [CrossRef]
  31. McKenna, D.D.; Shin, S.; Ahrens, D.; Balke, M.; Beza-Beza, C.; Clarke, D.J.; Donath, A.; Escalona, H.E.; Friedrich, F.; Letsch, H.; et al. The Evolution and Genomic Basis of Beetle Diversity. Proc. Natl. Acad. Sci. USA 2019, 116, 24729–24737. [Google Scholar] [CrossRef]
  32. Stork, N.E.; McBroom, J.; Gely, C.; Hamilton, A.J. New Approaches Narrow Global Species Estimates for Beetles, Insects, and Terrestrial Arthropods. Proc. Natl. Acad. Sci. USA 2015, 112, 7519–7523. [Google Scholar] [CrossRef]
  33. Pointer, M.D.; Gage, M.J.G.; Spurgin, L.G. Tribolium Beetles as a Model System in Evolution and Ecology. Heredity 2021, 126, 869–883. [Google Scholar] [CrossRef] [PubMed]
  34. Day, J.C.; Tisi, L.C.; Bailey, M.J. Evolution of Beetle Bioluminescence: The Origin of Beetle Luciferin. Luminescence 2004, 19, 8–20. [Google Scholar] [CrossRef]
  35. Liou, Y.-C.; Daley, M.E.; Graham, L.A.; Kay, C.M.; Walker, V.K.; Sykes, B.D.; Davies, P.L. Folding and Structural Characterization of Highly Disulfide-Bonded Beetle Antifreeze Protein Produced in Bacteria. Protein Expr. Purif. 2000, 19, 148–157. [Google Scholar] [CrossRef]
  36. Evans, J.D.; McKenna, D.; Scully, E.; Cook, S.C.; Dainat, B.; Egekwu, N.; Grubbs, N.; Lopez, D.; Lorenzen, M.D.; Reyna, S.M. Genome of the Small Hive Beetle (Aethina Tumida, Coleoptera: Nitidulidae), a Worldwide Parasite of Social Bee Colonies, Provides Insights into Detoxification and Herbivory. GigaScience 2018, 7, giy138. [Google Scholar] [CrossRef]
  37. Nagy, N.A.; Rácz, R.; Rimington, O.; Póliska, S.; Orozco-terWengel, P.; Bruford, M.W.; Barta, Z. Draft Genome of a Biparental Beetle Species, Lethrus Apterus. BMC Genom. 2021, 22, 301. [Google Scholar] [CrossRef]
  38. Xue, H.-J.; Niu, Y.-W.; Segraves, K.A.; Nie, R.-E.; Hao, Y.-J.; Zhang, L.-L.; Cheng, X.-C.; Zhang, X.-W.; Li, W.-Z.; Chen, R.-S. The Draft Genome of the Specialist Flea Beetle Altica Viridicyanea (Coleoptera: Chrysomelidae). BMC Genom. 2021, 22, 243. [Google Scholar] [CrossRef]
  39. Cunningham, C.B.; Benowitz, K.M.; Moore, A.J. The Updated Genome of the Burying Beetle Nicrophorus Vespilloides, a Model Species for Evolutionary and Genetic Studies of Parental Care. Ecol. Evol. 2024, 14, e70601. [Google Scholar] [CrossRef] [PubMed]
  40. Sylvester, T.; Adams, R.; Mitchell, R.F.; Ray, A.M.; Shen, R.; Shin, N.R.; Daundasekara, K.C.; McKenna, D.D. Insights into Longhorn Beetle (Cerambycidae) Evolution from Comparative Analyses of the Red-Headed Ash Borer (Neoclytus Acuminatus Acuminatus) Genome. J. Hered. 2025, 116, 558–567. [Google Scholar] [CrossRef] [PubMed]
  41. Sylvester, T.; Adams, R.; Hunter, W.B.; Li, X.; Rivera-Marchand, B.; Shen, R.; Shin, N.R.; McKenna, D.D. The Genome of the Invasive and Broadly Polyphagous Diaprepes Root Weevil, Diaprepes Abbreviatus (Coleoptera), Reveals an Arsenal of Putative Polysaccharide-Degrading Enzymes. J. Hered. 2024, 115, 94–102. [Google Scholar] [CrossRef]
  42. McKenna, D.D.; Scully, E.D.; Pauchet, Y.; Hoover, K.; Kirsch, R.; Geib, S.M.; Mitchell, R.F.; Waterhouse, R.M.; Ahn, S.-J.; Arsala, D.; et al. Genome of the Asian Longhorned Beetle (Anoplophora glabripennis), a Globally Significant Invasive Species, Reveals Key Functional and Evolutionary Innovations at the Beetle–Plant Interface. Genome Biol. 2016, 17, 227. [Google Scholar] [CrossRef] [PubMed]
  43. Thornhill, R.; Alcock, J. The Evolution of Insect Mating Systems; Harvard University Press: Cambridge, MA, USA, 1983; ISBN 978-0-674-43395-3. [Google Scholar]
  44. Andersson, M.N.; Keeling, C.I.; Mitchell, R.F. Genomic Content of Chemosensory Genes Correlates with Host Range in Wood-Boring Beetles (Dendroctonus Ponderosae, Agrilus Planipennis, and Anoplophora glabripennis). BMC Genom. 2019, 20, 690. [Google Scholar] [CrossRef] [PubMed]
  45. Fleischer, J.; Pregitzer, P.; Breer, H.; Krieger, J. Access to the Odor World: Olfactory Receptors and Their Role for Signal Transduction in Insects. Cell. Mol. Life Sci. 2018, 75, 485–508. [Google Scholar] [CrossRef]
  46. Hansson, B.S.; Stensmyr, M.C. Evolution of Insect Olfaction. Neuron 2011, 72, 698–711. [Google Scholar] [CrossRef] [PubMed]
  47. Mustaparta, H.; Angst, M.E.; Lanier, G.N. Specialization of Olfactory Cells to Insect- and Host-Produced Volatiles in the Bark Beetle Ips pini (Say). J. Chem. Ecol. 1979, 5, 109–123. [Google Scholar] [CrossRef]
  48. Tanaka, K.; Uda, Y.; Ono, Y.; Nakagawa, T.; Suwa, M.; Yamaoka, R.; Touhara, K. Highly Selective Tuning of a Silkworm Olfactory Receptor to a Key Mulberry Leaf Volatile. Curr. Biol. 2009, 19, 881–890. [Google Scholar] [CrossRef]
  49. Todd, I.L.; Baker, T.C. Response of Single Antennal Neurons of Female Cabbage Loopers to Behaviorally Active Attractants. Naturwissenschaften 1993, 80, 183–186. [Google Scholar] [CrossRef]
  50. Mitchell, R.F.; Schneider, T.M.; Schwartz, A.M.; Andersson, M.N.; McKenna, D.D. The Diversity and Evolution of Odorant Receptors in Beetles (Coleoptera). Insect Mol. Biol. 2020, 29, 77–91. [Google Scholar] [CrossRef]
  51. Pequeno-Zurro, A.; Rano, I.; Shaikh, D. A Chemosensory Navigation Model Inspired by the On/Off Neural Processing Mechanism in Cockroaches. IEEE Trans. Med. Robot. Bionics 2020, 2, 338–346. [Google Scholar] [CrossRef]
  52. Del Mármol, J.; Yedlin, M.A.; Ruta, V. The Structural Basis of Odorant Recognition in Insect Olfactory Receptors. Nature 2021, 597, 126–131. [Google Scholar] [CrossRef]
  53. Butterwick, J.A.; Del Mármol, J.; Kim, K.H.; Kahlson, M.A.; Rogow, J.A.; Walz, T.; Ruta, V. Cryo-EM Structure of the Insect Olfactory Receptor Orco. Nature 2018, 560, 447–452. [Google Scholar] [CrossRef]
  54. Robertson, H.M.; Warr, C.G.; Carlson, J.R. Molecular Evolution of the Insect Chemoreceptor Gene Superfamily in Drosophila Melanogaster. Proc. Natl. Acad. Sci. USA 2003, 100, 14537–14542. [Google Scholar] [CrossRef] [PubMed]
  55. Adams, R.; Sylvester, T.; Mitchell, R.F.; Price, M.A.; Shen, R.; McKenna, D.D. Functional and Evolutionary Insights into Chemosensation and Specialized Herbivory from the Genome of the Red Milkweed Beetle, Tetraopes Tetrophthalmus (Cerambycidae: Lamiinae). J. Hered. 2024, 16, esae049. [Google Scholar] [CrossRef]
  56. Farrell, B.D.; Mitter, C. The Timing of Insect/Plant Diversification: Might Tetraopes (Coleoptera: Cerambycidae) and Asclepias (Asclepiadaceae) Have Co-Evolved? Biol. J. Linn. Soc. 1998, 63, 553–577. [Google Scholar] [CrossRef]
  57. Farrell, B.D. Evolutionary Assembly of the Milkweed Fauna: Cytochrome Oxidase I and the Age of Tetraopes Beetles. Mol. Phylogenetics Evol. 2001, 18, 467–478. [Google Scholar] [CrossRef]
  58. Ali, J.G.; Agrawal, A.A. Trade-offs and Tritrophic Consequences of Host Shifts in Specialized Root Herbivores. Funct. Ecol. 2017, 31, 153–160. [Google Scholar] [CrossRef]
  59. Kariyanna, B.; Mohan, M.; Gupta, R. Biology, Ecology and Significance of Longhorn Beetles (Coleoptera: Cerambycidae). J. Entomol. Zool. Stud. 2017, 5, 1207–1212. [Google Scholar]
  60. Batalden, R.V.; Oberhauser, K.; Peterson, A.T. Ecological Niches in Sequential Generations of Eastern North American Monarch Butterflies (Lepidoptera: Danaidae): The Ecology of Migration and Likely Climate Change Implications. Environ. Entomol. 2007, 36, 1365–1373. [Google Scholar] [CrossRef]
  61. Rasmann, S.; Agrawal, A.A. Evolution of Specialization: A Phylogenetic Study of Host Range in the Red Milkweed Beetle (Tetraopes Tetraophthalmus). Am. Nat. 2011, 177, 728–737. [Google Scholar] [CrossRef] [PubMed]
  62. Züst, T.; Agrawal, A.A. Population Growth and Sequestration of Plant Toxins along a Gradient of Specialization in Four Aphid Species on the Common Milkweed Asclepias Syriaca. Funct. Ecol. 2016, 30, 547–556. [Google Scholar] [CrossRef]
  63. McCauley, D.E.; Eanes, W.F. Hierarchical Population Structure Analysis of the Milkweed Beetle, Tetraopes Tetraophthalmus (Forster). Heredity 1987, 58, 193–201. [Google Scholar] [CrossRef]
  64. Agrawal, A.A.; Hastings, A.P. Tissue-Specific Plant Toxins and Adaptation in a Specialist Root Herbivore. Proc. Natl. Acad. Sci. USA 2023, 120, e2302251120. [Google Scholar] [CrossRef]
  65. Mitchell, R.F.; Hall, L.P.; Reagel, P.F.; McKenna, D.D.; Baker, T.C.; Hildebrand, J.G. Odorant Receptors and Antennal Lobe Morphology Offer a New Approach to Understanding Olfaction in the Asian Longhorned Beetle. J. Comp. Physiol. A 2017, 203, 99–109. [Google Scholar] [CrossRef]
  66. Gutiérrez-Trejo, N.; Van Dam, M.H.; Lam, A.W.; Martínez-Herrera, G.; Noguera, F.A.; Weissling, T.; Ware, J.L.; Toledo-Hernández, V.H.; Skillman, F.W., Jr.; Farrell, B.D.; et al. Phylogenomics of Tetraopes Longhorn Beetles Unravels Their Evolutionary History and Biogeographic Origins. Sci. Rep. 2024, 14, 7285. [Google Scholar] [CrossRef]
  67. Andersson, M.N.; Löfstedt, C.; Newcomb, R.D. Insect Olfaction and the Evolution of Receptor Tuning. Front. Ecol. Evol. 2015, 3, 53. [Google Scholar] [CrossRef]
  68. Auer, T.O.; Khallaf, M.A.; Silbering, A.F.; Zappia, G.; Ellis, K.; Álvarez-Ocaña, R.; Arguello, J.R.; Hansson, B.S.; Jefferis, G.S.; Caron, S.J. Olfactory Receptor and Circuit Evolution Promote Host Specialization. Nature 2020, 579, 402–408. [Google Scholar] [CrossRef]
  69. McBride, C.S. Rapid Evolution of Smell and Taste Receptor Genes during Host Specialization in Drosophila Sechellia. Proc. Natl. Acad. Sci. USA 2007, 104, 4996–5001. [Google Scholar] [CrossRef]
  70. Engsontia, P.; Sangket, U.; Robertson, H.M.; Satasook, C. Diversification of the Ant Odorant Receptor Gene Family and Positive Selection on Candidate Cuticular Hydrocarbon Receptors. BMC Res. Notes 2015, 8, 380. [Google Scholar] [CrossRef] [PubMed]
  71. Yan, H. Insect Olfactory Neurons: Receptors, Development, and Function. Curr. Opin. Insect Sci. 2025, 67, 101288. [Google Scholar] [PubMed]
  72. Nei, M.; Niimura, Y.; Nozawa, M. The Evolution of Animal Chemosensory Receptor Gene Repertoires: Roles of Chance and Necessity. Nat. Rev. Genet. 2008, 9, 951–963. [Google Scholar] [CrossRef] [PubMed]
  73. Maiorov, V.N.; Crippen, G.M. Significance of Root-Mean-Square Deviation in Comparing Three-Dimensional Structures of Globular Proteins. J. Mol. Biol. 1994, 235, 625–634. [Google Scholar] [CrossRef] [PubMed]
  74. Engsontia, P.; Sanderson, A.P.; Cobb, M.; Walden, K.K.; Robertson, H.M.; Brown, S. The Red Flour Beetle’s Large Nose: An Expanded Odorant Receptor Gene Family in Tribolium Castaneum. Insect Biochem. Mol. Biol. 2008, 38, 387–397. [Google Scholar] [CrossRef] [PubMed]
  75. Mitchell, R.F.; Hughes, D.T.; Luetje, C.W.; Millar, J.G.; Soriano-Agatón, F.; Hanks, L.M.; Robertson, H.M. Sequencing and Characterizing Odorant Receptors of the Cerambycid Beetle Megacyllene Caryae. Insect Biochem. Mol. Biol. 2012, 42, 499–505. [Google Scholar] [CrossRef]
  76. Pereira, J.; Simpkin, A.J.; Hartmann, M.D.; Rigden, D.J.; Keegan, R.M.; Lupas, A.N. High-accuracy Protein Structure Prediction in CASP14. Proteins 2021, 89, 1687–1699. [Google Scholar] [CrossRef]
  77. Lee, C.; Su, B.-H.; Tseng, Y.J. Comparative Studies of AlphaFold, RoseTTAFold and Modeller: A Case Study Involving the Use of G-Protein-Coupled Receptors. Brief. Bioinform. 2022, 23, bbac308. [Google Scholar] [CrossRef]
  78. Kovalevskiy, O.; Mateos-Garcia, J.; Tunyasuvunakool, K. AlphaFold Two Years on: Validation and Impact. Proc. Natl. Acad. Sci. USA 2024, 121, e2315002121. [Google Scholar] [CrossRef]
  79. Mirdita, M.; Schütze, K.; Moriwaki, Y.; Heo, L.; Ovchinnikov, S.; Steinegger, M. ColabFold: Making Protein Folding Accessible to All. Nat. Methods 2022, 19, 679–682. [Google Scholar] [CrossRef]
  80. Saldaño, T.; Escobedo, N.; Marchetti, J.; Zea, D.J.; Mac Donagh, J.; Velez Rueda, A.J.; Gonik, E.; García Melani, A.; Novomisky Nechcoff, J.; Salas, M.N.; et al. Impact of Protein Conformational Diversity on AlphaFold Predictions. Bioinformatics 2022, 38, 2742–2748. [Google Scholar] [CrossRef]
  81. Cole, J.C.; Murray, C.W.; Nissink, J.W.M.; Taylor, R.D.; Taylor, R. Comparing Protein–Ligand Docking Programs Is Difficult. Proteins 2005, 60, 325–332. [Google Scholar] [CrossRef] [PubMed]
  82. Kirchmair, J.; Markt, P.; Distinto, S.; Wolber, G.; Langer, T. Evaluation of the Performance of 3D Virtual Screening Protocols: RMSD Comparisons, Enrichment Assessments, and Decoy Selection—What Can We Learn from Earlier Mistakes? J. Comput. Aided Mol. Des. 2008, 22, 213–228. [Google Scholar] [CrossRef]
  83. Ille, A.M.; Markosian, C.; Burley, S.K.; Mathews, M.B.; Pasqualini, R.; Arap, W. Generative Artificial Intelligence Performs Rudimentary Structural Biology Modeling. Sci. Rep. 2024, 14, 19372. [Google Scholar] [CrossRef] [PubMed]
  84. Price, M.N.; Dehal, P.S.; Arkin, A.P. FastTree: Computing Large Minimum Evolution Trees with Profiles Instead of a Distance Matrix. Mol. Biol. Evol. 2009, 26, 1641–1650. [Google Scholar] [CrossRef]
  85. Sievers, F.; Higgins, D.G. Clustal Omega. Curr. Protoc. Bioinform. 2014, 48, 3–13. [Google Scholar] [CrossRef] [PubMed]
  86. Capella-Gutiérrez, S.; Silla-Martínez, J.M.; Gabaldón, T. trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses. Bioinformatics 2009, 25, 1972–1973. [Google Scholar] [CrossRef]
  87. Paradis, E.; Claude, J.; Strimmer, K. APE: Analyses of Phylogenetics and Evolution in R Language. Bioinformatics 2004, 20, 289–290. [Google Scholar] [CrossRef]
  88. Paradis, E.; Schliep, K. Ape 5.0: An Environment for Modern Phylogenetics and Evolutionary Analyses in R. Bioinformatics 2019, 35, 526–528. [Google Scholar] [CrossRef]
  89. van der Maaten, L.; Hinton, G. Visualizing Data Using T-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  90. Yokota, R.; Kaminaga, Y.; Kobayashi, T.J. Quantification of Inter-Sample Differences in T-Cell Receptor Repertoires Using Sequence-Based Information. Front. Immunol. 2017, 8, 1500. [Google Scholar] [CrossRef]
  91. Xu, X.; Xie, Z.; Yang, Z.; Li, D.; Xu, X. A T-SNE Based Classification Approach to Compositional Microbiome Data. Front. Genet. 2020, 11, 620143. [Google Scholar] [CrossRef] [PubMed]
  92. Cieslak, M.C.; Castelfranco, A.M.; Roncalli, V.; Lenz, P.H.; Hartline, D.K. T-Distributed Stochastic Neighbor Embedding (t-SNE): A Tool for Eco-Physiological Transcriptomic Analysis. Mar. Genom. 2020, 51, 100723. [Google Scholar] [CrossRef]
  93. Bachmann, F.; Hennig, P.; Kobak, D. Wasserstein T-SNE; Springer: Berlin/Heidelberg, Germany, 2022; pp. 104–120. [Google Scholar]
  94. Krijthe, J.; van der Maaten, L.; Krijthe, M.J. Package ‘Rtsne’, R Package Version. 2018; Volume 13, p. 577. Available online: https://cran.r-project.org/web/packages/Rtsne (accessed on 8 March 2024).
  95. Dixon, P. VEGAN, a Package of R Functions for Community Ecology. J. Veg. Sci. 2003, 14, 927–930. [Google Scholar]
  96. Binder, J.L.; Berendzen, J.; Stevens, A.O.; He, Y.; Wang, J.; Dokholyan, N.V.; Oprea, T.I. AlphaFold Illuminates Half of the Dark Human Proteins. Curr. Opin. Struct. Biol. 2022, 74, 102372. [Google Scholar] [CrossRef] [PubMed]
  97. Lozano, J.R.; DeGiorgio, M.; Assis, R.; Adams, R. Discriminating Models of Trait Evolution. Evolution 2025, 80, 697–710. [Google Scholar]
  98. Duncan, M.; DeGiorgio, M.; Assis, R.; Adams, R. Robust Regression Rescues Poor Phylogenetic Decisions. BMC Ecol. Evo. 2025, 25, 108. [Google Scholar] [CrossRef]
  99. Brahmantio, B.; Bartoszek, K.; Yapar, E. Bayesian Inference of Mixed Gaussian Phylogenetic Models. BMC Bioinform. 2024, 27, 77. [Google Scholar]
  100. Adams, R.; DeGiorgio, M. Likelihood-Based Tests of Species Tree Hypotheses. Mol. Biol. Evol. 2023, 40, msad159. [Google Scholar] [CrossRef]
  101. Brown, J.M.; Thomson, R.C. Evaluating Model Performance in Evolutionary Biology. Annu. Rev. Ecol. Evol. Syst. 2018, 49, 95–114. [Google Scholar] [CrossRef]
  102. Kulmuni, J.; Wurm, Y.; Pamilo, P. Comparative Genomics of Chemosensory Protein Genes Reveals Rapid Evolution and Positive Selection in Ant-Specific Duplicates. Heredity 2013, 110, 538–547. [Google Scholar] [CrossRef]
  103. Brand, P.; Robertson, H.M.; Lin, W.; Pothula, R.; Klingeman, W.E.; Jurat-Fuentes, J.L.; Johnson, B.R. The Origin of the Odorant Receptor Gene Family in Insects. eLife 2018, 7, e38340. [Google Scholar] [CrossRef]
  104. Karpe, S.D.; Tiwari, V.; Ramanathan, S. InsectOR—Webserver for Sensitive Identification of Insect Olfactory Receptor Genes from Non-Model Genomes. PLoS ONE 2021, 16, e0245324. [Google Scholar] [CrossRef] [PubMed]
  105. Bachler, A.; Walsh, T.K.; Rane, R.V.; Pandey, G. Chimeric Mis-Annotations of Genes Remain Pervasive in Eukaryotic Non-Model Organisms. BMC Genom. 2025, 26, 630. [Google Scholar] [CrossRef]
  106. Olvera-Vazquez, S.G.; Chen, X.; Mesnil, A.; Meslin, C.; Almeida-Silva, F.; Confais, J.; Bourgeois, Y.; Lombardi, G.; Lougmani, C.; Alix, K. Comprehensive Annotation of Olfactory and Gustatory Receptor Genes and Transposable Elements Revealed Their Evolutionary Dynamics in Aphids. Mol. Biol. Evol. 2025, 42, msaf238. [Google Scholar] [CrossRef]
  107. Barbarin-Bocahu, I.; Graille, M. The X-Ray Crystallography Phase Problem Solved Thanks to AlphaFold and RoseTTAFold Models: A Case-Study Report. Corrigendum. Acta Crystallogr. D Struct. Biol. 2023, 79, 353. [Google Scholar] [CrossRef] [PubMed]
  108. Xu, Q.; Dunbrack, R.L. Assignment of Protein Sequences to Existing Domain and Family Classification Systems: Pfam and the PDB. Bioinformatics 2012, 28, 2763–2772. [Google Scholar] [CrossRef]
  109. Shi, Y.; Zhang, W.; Yang, Y.; Murzin, A.G.; Falcon, B.; Kotecha, A.; Van Beers, M.; Tarutani, A.; Kametani, F.; Garringer, H.J.; et al. Structure-Based Classification of Tauopathies. Nature 2021, 598, 359–363. [Google Scholar] [CrossRef]
  110. Aggarwal, R.; Gupta, A.; Chelur, V.; Jawahar, C.; Priyakumar, U.D. DeepPocket: Ligand Binding Site Detection and Segmentation Using 3D Convolutional Neural Networks. J. Chem. Inf. Model. 2021, 62, 5069–5079. [Google Scholar] [CrossRef]
  111. Mylonas, S.K.; Axenopoulos, A.; Daras, P. DeepSurf: A Surface-Based Deep Learning Approach for the Prediction of Ligand Binding Sites on Proteins. Bioinformatics 2021, 37, 1681–1690. [Google Scholar] [CrossRef]
  112. Kandel, J.; Tayara, H.; Chong, K.T. PUResNet: Prediction of Protein-Ligand Binding Sites Using Deep Residual Neural Network. J. Cheminformatics 2021, 13, 65. [Google Scholar] [CrossRef] [PubMed]
  113. Jacquin-Joly, E.; Merlin, C. Insect olfactory receptors: Contributions of molecular biology to chemical ecology. J. Chem. Ecol. 2004, 30, 2359–2397. [Google Scholar] [CrossRef] [PubMed]
  114. Schmidt, H.R.; Benton, R. Molecular Mechanisms of Olfactory Detection in Insects: Beyond Receptors. Open Biol. 2020, 10, 200252. [Google Scholar] [CrossRef]
  115. Del Alamo, D.; Sala, D.; Mchaourab, H.S.; Meiler, J. Sampling Alternative Conformational States of Transporters and Receptors with AlphaFold2. eLife 2022, 11, e75751. [Google Scholar] [CrossRef] [PubMed]
  116. Zufall, F.; Domingos, A.I. The Structure of Orco and Its Impact on Our Understanding of Olfaction. J. Gen. Physiol. 2018, 150, 1602–1605. [Google Scholar] [CrossRef] [PubMed]
  117. Zhao, J.; Chen, A.Q.; Ryu, J.; Del Mármol, J. Structural Basis of Odor Sensing by Insect Heteromeric Odorant Receptors. Science 2024, 384, 1460–1467. [Google Scholar] [CrossRef]
  118. Raschka, S.; Kaufman, B. Machine Learning and AI-Based Approaches for Bioactive Ligand Discovery and GPCR-Ligand Recognition. Methods 2020, 180, 89–110. [Google Scholar] [CrossRef]
  119. Chatterjee, A.; Walters, R.; Shafi, Z.; Ahmed, O.S.; Sebek, M.; Gysi, D.; Yu, R.; Eliassi-Rad, T.; Barabási, A.-L.; Menichetti, G. Improving the Generalizability of Protein-Ligand Binding Predictions with AI-Bind. Nat. Commun. 2023, 14, 1989. [Google Scholar] [CrossRef]
  120. Chen, W.; Song, C.; Leng, L.; Zhang, S.; Chen, S. The Application of Artificial Intelligence Accelerates G Protein-Coupled Receptor Ligand Discovery. Engineering 2024, 32, 18–28. [Google Scholar] [CrossRef]
  121. Bohbot, J.D.; Vernick, S. The Emergence of Insect Odorant Receptor-Based Biosensors. Biosensors 2020, 10, 26. [Google Scholar] [CrossRef]
  122. Lu, Y.; Liu, Q. Insect Olfactory System Inspired Biosensors for Odorant Detection. Sens. Diagn. 2022, 1, 1126–1142. [Google Scholar] [CrossRef]
  123. Venthur, H.; Zhou, J.-J. Odorant Receptors and Odorant-Binding Proteins as Insect Pest Control Targets: A Comparative Analysis. Front. Physiol. 2018, 9, 1163. [Google Scholar] [CrossRef]
Figure 1. Exploring OR proteins in the Tetraopes tetrophthalmus (TETRA) genome. Shown are predicted structures for specific ORs with the highest normalized AlphaFold confidence score for each of the nine OR groups. Labels for each structure indicate the specific OR and its associated confidence score.
Figure 2. Comparing AlphaFold versus RoseTTAFold structural predictions for individual proteins. RMSD (no cutoff) distributions between predicted models are shown as boxplots for each OR group in (a). Likewise, distributions of RMSD with a 15 Å cutoff are shown as boxplots in (b). Filled shapes represent complete ORs, while hollow shapes indicate incomplete ORs.
Figure 3. Comparing AlphaFold and RoseTTAFold confidence scores across TETRA ORs. Panel (a) shows individual confidence scores returned from both algorithms for each OR, while panel (b) provides boxplots of Δscore from AlphaFold versus RoseTTAFold. Colors correspond to OR groups, while circles indicate AlphaFold predictions and triangles represent RoseTTAFold predictions. Filled shapes represent complete ORs, and hollow shapes denote incomplete ORs.
Figure 4. Unsupervised learning with t-SNE for OR proteins encoded in the TETRA genome. Colors are based on previously assigned OR group designation. Results are shown for t-SNE based on structural distances for analyses containing all genes (a) and only complete genes (b), and likewise for analyses based on phylogenetic distances for all ORs (c) and only complete ORs (d).
Figure 5. Exploring t-SNE analyses for OR proteins in both Tetraopes tetrophthalmus (TETRA, circle) and Anoplophora glabripennis (AGLAB, star) genomes. Colors indicate OR group assignments based on previous studies. Results are shown for t-SNE based on structural distances for analyses containing all genes (a) and only complete genes (b), and likewise for analyses based on phylogenetic distances for all ORs (c) and only complete ORs (d).
Figure 6. Relating structure-based and sequence-based distances for pairwise comparisons of only complete OR proteins within Anoplophora glabripennis (AGLAB, blue), within Tetraopes tetrophthalmus (TETRA, red), and between AGLAB and TETRA (purple). Curves and associated confidence intervals depict recovered relationships from non-parametric LOESS regression.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Duncan, M.; Sylvester, T.; Walden, E.; Roa Lozano, J.; Turner, E.; Duncan, S.; Mitchell, R.F.; McKenna, D.D.; Adams, R. Learning the Structural Diversity of Olfactory Receptors: A Genomic Case Study in Two Longhorn Beetles (Cerambycidae: Lamiinae). Insects 2026, 17, 587. https://doi.org/10.3390/insects17060587

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.