Inconsistencies in the Classification of the Family Cydnidae (Hemiptera: Heteroptera: Pentatomoidea) Revealed by Molecular Apomorphies in the Secondary and Tertiary Structures of 18S rRNA Length-Variable Region L (LVR L)

The SSU nuclear rDNA (encoding 18S ribosomal RNA) is one of the most frequently sequenced genes in the molecular analysis of insects. Molecular apomorphies in the secondary and tertiary structures of several 18S rRNA length-variable regions (LVRs) located within the V2, V4, and V7 hypervariable regions can be good indicators for recovering monophyletic groups within some heteropteran families. Among the LVRs that have been analysed, the LVR L in the V4 hypervariable region is the longest and most crucial for such assessments. We analysed the 18S rRNA V4 hypervariable region sequences of 45 species from the family Cydnidae, including all 6 subfamilies (Amaurocorinae, Amnestinae, Cephalocteinae, Cydninae, Garsauriinae, and Sehirinae) and three pentatomoid families (Parastrachiidae, Thaumastellidae, and Thyreocoridae), which have often been included in the broadly defined Cydnidae family. This is the first time that representatives of all Cydnidae subfamilies have been included in a molecular analysis. Only taxa from two subfamilies, Sehirinae and Cydninae, have been used in previous molecular studies. The secondary and tertiary structures of the LVR L were predicted for each species using the two-step procedure already accepted for such analyses to recover any molecular apomorphy essential for determining monophyly. The results of our comparative studies contradict the current understanding of the relationships among burrowing bugs and the current family classification.

There are two different subfamily classifications of Cydnidae currently used.In the broadest sense [18], the family Cydnidae includes all taxa that are morphologically defined by the presence of a series of flattened setae forming coxal combs not found elsewhere among Heteroptera [20].One argument is that the family contains nine subfamilies: Amaurocorinae, Amnestinae, Cephalocteinae, Cydninae, Garsauriinae, Parastrachiinae, Sehirinae, Taumastellinae, and Thyreocorinae.However, the characteristic morphological feature described has independently evolved several times within the family [4]; therefore, this group cannot be clearly defined, and the family classification has not been accepted.
The present study aimed to verify whether the secondary and tertiary structures of the 18S rRNA LVR Ls of representatives of the six Cydnidae subfamilies demonstrate morpho-molecular apomorphies that could serve as indicators of monophyly for certain taxa groups.

Hypervariable Region V4 and Length-Variable Region L (LVR L) Sequence Analyses
The 18S rDNA sequences in the V4 hypervariable regions of 45 Pentatomoidea species were phylogenetically analysed to identify monophyletic groups (Figure A1) and their 'consensus species', which are crucial for predicting the secondary and tertiary structures of the LVR L during further analyses [1,2].Thirty sequences were newly acquired and deposited in GenBank.Their accession numbers are listed in Tables S3 and S4.
For the first time, the family Cydnidae was represented by species belonging to all subfamilies and almost all tribes.One tribe, the Cephalocteini, was not represented in the present analysis.
To verify the sequence variability within the tribes identified as polyphyletic by Pluot-Sigwalt and Lis [19], such as Geotomini sensu lato (subfamily Cydninae) and Sehirini sensu lato (subfamily Sehirinae), all analysed species were assigned to the corresponding spermathecal types, and facies recovered within these tribes [19].
The final 18S rDNA V4 hypervariable region alignment contained 331 sites.There were 219 and 112 conserved and variable sites, respectively, while 78 were parsimonyinformative and 34 were singletons.The alignment file used to search for monophyletic groups is available in the Supplementary Material (File S1).
ModelFinder in the IQ-TREE [34] tested 88 DNA models for this set of sequences.The K3P + G4 substitution model was selected as the best fit based on the Bayesian Information Criterion.The IQ-TREE generated 98 initial trees; the ML consensus tree is shown in Figure A1.
The number of nucleotides in the hypervariable region V4, the LVR L, and the L2 subregion of LVR L in the 18S rRNA of the 'consensus species' for families, subfamilies, facies, and clades within tribes (Figure A1) are given in Table 1.The same data for every species analysed are provided in Table S1.
Table 1.The number of nucleotides of the hypervariable region V4, the length-variable region L (LVR L), and the L2 subregion of the LVR L in the 18S rRNA of the 'consensus species' for specific groups of taxa.The 'consensus species' for family, subfamily or facies are marked with *; the 'consensus species' for the clades within the specific tribe are marked with **.Each analysed species of the Cydnidae were assigned to the particular group of spermathecal types and facies recovered within the family [19].The number of nucleotides for the higher taxa (families, subfamilies, and tribes) for each region or subregion is shown in Table 2.The same data for all the species are provided in Table S2.

Family Subfamily
The length of the analysed hypervariable region V4 varies significantly within the family Cydnidae (316-325 nucleotides), while it shows less variation in other 'cydnoid' families-specifically 316 nucleotides in both Parastrachiidae and Thyreocoridae, and 315 and 318 in Thaumastellidae (Tables 1, 2 and S1).
Table 2.The number of nucleotides of the hypervariable region V4, the length-variable region L (LVR L), and the L2 subregion of the LVR L in the 18S rRNA of the analysed higher taxa.The length of the hypervariable region V4 varied significantly within the family Cydnidae (316-325 nucleotides), while less variation was observed in other 'cydnoid' families.The length was 316 nucleotides in both Parastrachiidae and Thyreocoridae and 315 and 318 in Thaumastellidae (Tables 1, 2 and S1).

Region or Subregion
For the family Cydnidae, the nucleotide number in this region is most consistent within the geotoman facies of the tribe Geotomini s. lato, in which all species have 317 nucleotides (Table S1).Only one species within the subfamily Cephalocteinae, which also has the geotoman facies of the spermatheca, and one representative of the subfamily Garsauriinae, which has the garsauriinae type of spermatheca, shared the same nucleotide number (i.e., 317) (Table S1).
The shortest length of the hypervariable region V4 within the family Cydnidae, namely 316, characterises all species representing the subfamily Sehirinae (Table S1).These 316 nucleotides were found in several Cydnidae taxa and in representatives of the families Thyreocoridae and Parastrachiidae and the subfamily Sehirinae.These include the only member of the subfamily Amaurocorinae (characterised by the amaurocorinae type of spermatheca) and five members of Geotomini s. lato tribe (subfamily Cydninae).The latter includes two adrisan facies species (Adrisa magna and A. romani), two scoparipan facies species (Pseudoscoparipes fraterculus and P. kinabalensis), and one cydnan facies species (Cydnus aterrimus) (Table S1).
The highest number of nucleotides in the hypervariable region V4 was found in the subfamily Amnestinae (324 nucleotides) and in most species representing the tribe Cydnini (three species of Chilocoris and one of Parachilocoris) (325 nucleotides).These are the highest nucleotide numbers recovered in the hypervariable region V4 among all taxa of the superfamily Pentatomoidea [1].Such high nucleotide numbers are unique to Cydnidae and the entire superfamily Pentatomoidea [1].
The high number of nucleotides in the Chilocoris and Parachilocoris species (representing the tribe Cydnini) correlated with a significantly higher number of nucleotides in the LVR L region (81 nucleotides).However, the high number of nucleotides found in the hypervariable region V4 in species of the subfamily Amnestinae did not correlate with a higher number of nucleotides in their LVR L region.All Amnestinae species had 73 nucleotides in the LVR L region.This is the same as in many other species of the subfamily Cydninae (adrisan and scoparipan facies), all of the Sehirinae, and all of the family Parastrachiidae (Tables 1, 2 and S1).Seventy-four nucleotides within the LVR L region (Tables 1, 2 and S1) were observed among species of the subfamily Garsauriinae, the Cephalocteinae, all geotoman facies of the subfamily Cydninae, and species of the family Thyreocoridae.
The number of nucleotides within the L2 subregion of the LVR L varied from 3-7 (Tables 1 and 2, Tables S1 and S2).This region is discussed in detail within the context of the secondary structure of LVR L (see Section 2.3).

18S rRNA Secondary and Tertiary Structure Models
The secondary and tertiary structure models of the predicted 18S rRNA gene for Fromundus pygmaeus (subfamily Cydninae) and Adomerus biguttatus (subfamily Sehirinae) are shown in Figures 1 and S1, respectively.For both predictions, existing 18S rDNA sequences deposited in GenBank (F.pygmaeus, GenBank accession number KJ535871; A. biguttatus, GenBank accession number KY886253) were used.
Both models were generally consistent with those for all pentatomoid species analysed to date [1,2,[15][16][17], particularly when considering the location of three hypervariable regions (V2, V4, V7) and the position of the LVR L within the gene sequences (Figures 1 and S1).

Length-Variable Region L Secondary Structure
The LVR L length among the observed species ranged from 72-81 nucleotides and exhibited the most notable variation in the subfamily Cydninae of the Cydnidae (73-81 nucleotides), as shown in Tables 2 and S1.
Subregions were shaped to identify homologous fragments in the LVR L sequences of the investigated species (Table 3, Figures 2, 3 and S2) following recent analyses of the 18S rRNA secondary structures in the superfamily Pentatomoidea [1].The subdivision of LVR L into subregions was based on the alignment (File S1, Figure S2) and results of the secondary structure predictions (Figures 2 and 3).The number of nucleotides for each subregion based on this comparative analysis is provided in Table 3.     LA and LB were the most variable subregions in terms of nucleotide numbers, ranging from 11-17 and 11-19, respectively (Table 3).All other subregions showed significantly less variation, with nucleotide numbers ranging from 3-7 in L2, 12-16 in LC, 8-11 in LD, and 16-21 in LE (Table 3).LD was the most consistent subregion across species, with ten species containing the same nucleotide formula (6 + 3) (Table 3).
Only two species had the same number of nucleotides in each of the subregions analysed: Adrisa romani and Pseudoscoparipes fraterculus, representing the tribe Geotomini of the subfamily Cydninae.Most importantly, the consensus species of the subfamily Amnestinae demonstrated autapomorphic nucleotide numbers in all the subregions except LD and L2.The other ingroup taxa demonstrated single autapomorphies (Thyreocoridae in the subregion LB, Cydnus aterrimus in the subregion LE) or two autapomorphies (Chilocoris piceus in the subregions L2 and LE).The outgroup consensus species representing Thaumastellidae showed two autapomorphies in the L2 and LE subregions (Table 3).
Most species (denoted by the number 4 in File S2) had specific secondary structure of the LA subregion.This group of eight species come from the Cydnidae (three from the Cydninae and one from the Cephalocteinae), while one species is from the Thyreocoridae (see File S2).Both species of the subfamily Sehirinae (O.secunda, A. biguttatus) had the same secondary structure in the LA subregion as P. japonensis (Parastrachiidae) (marked as number 5 in File S2).
The LA subregion showed the most asymmetric secondary structures, with 10 nucleotides in L(A1) and a single nucleotide in L(A2), in the consensus species of the subfamily Amnestinae (A. zacki) (Figure S2, Table 3, File S2).All other species showed symmetric structures (Figure S2, Table 3, File S2).
The LB subregion was the most diverse of all the LVR L subregions, with only one structure shared by more than two species.This structure, designated number 6 within the LB subregion in File S2, was recovered in four species: three species of the tribe Geotomini sensu lato and one of the tribe Cydnini (all of the subfamily Cydninae of the fa-mily Cydnidae).One or two species represented all other secondary structure types within the LB subregion (File S2).
Although the LC subregion was characterised by low variability in nucleotide number (maximum range of five), 10 different types of secondary structures were noted.The most common was numbered 6 within the LC subregion (File S2).Three species that possessed this structure belonged to the Cydnidae (two of the subfamily Cydninae, one of the Sehirinae), one belonged to the family Parastrachiidae, and one belonged to the family Thyreocoridae.
The LD subregion was the most constant fragment of the entire LVR L. Ten species had the same nucleotide numbers and arrangement (6 + 3) (Table 3) and the same secondary structure (numbered 5 in File S2).This group included eight species of the family Cydnidae (a single species of the subfamily Cephalocteinae, five species of the subfamily Cydninae, and two of the subfamily Sehirinae) and one species each of the families Parastrachiidae and Thyreocoridae (File S2).
In the LE subregion, seven different types of secondary structures were distinguished (File S2); however, most species demonstrated only one of two (Table 3, File S2).The first, numbered 5 in File S2, was found in seven species of the family Cydnidae (one of the Cephalocteinae, five of the Cydninae, and one of the Sehirinae) and a single species of the family Parastrachiidae (File S2).The second group (designated 7 in File S2) consisted of one species each from the families Amaurocorinae, Sehirinae, Parastrachiidae, and Thyreocoridae.
The number of nucleotides in the L2 subregion enabled all analysed species to be placed in four groups (Table 3, File S2).Three nucleotides were only found in Thaumastellidae, distinguishing it as the outgroup species of the superfamily.Seven nucleotides were only observed in one species representing the tribe Cydnini (subfamily Cydninae of the Cydnidae).The two remaining L2 groups, characterised by four and six nucleotides, contained many more species (nine and six, respectively).The first group had seven species of Cydninae, one of Cephalocteinae, and one of Sehirinae (Table 3, File S2).The second group contained a single species each from the Parastrachiidae, Thyreocoridae, Amnestinae, Garsauriinae, Amaurocorinae, and Sehirinae (the last four of the family Cydnidae) families (File S2).

Length-Variable Region L Tertiary Structure
The position of the LVR L in the gene tertiary structure for the outgroup (T.elizabethae) and two ingroup species (F.pygmaeus, A. biguttatus) is shown in Figure 4.

Length-Variable Region L Tertiary Structure
The position of the LVR L in the gene tertiary structure for the outgroup (T.elizabethae) and two ingroup species (F.pygmaeus, A. biguttatus) is shown in Figure 4.When all 16 analysed LVR L tertiary structures were combined and aligned to the outgroup sequence (Figure 5), their general shapes appeared very similar.Only one species, Amnestus zacki, had an LVR L tertiary structure fragment (LVR LA) distinct from those found in other species.This could undoubtedly serve as one of its potential morpho-molecular autapomorphies.When all 16 analysed LVR L tertiary structures were combined and aligned to the outgroup sequence (Figure 5), their general shapes appeared very similar.Only one species, Amnestus zacki, had an LVR L tertiary structure fragment (LVR LA) distinct from those found in other species.This could undoubtedly serve as one of its potential morpho-molecular autapomorphies.
Root-mean-square deviation (RMSD) values (calculated using the RNAssess web server [35] and PyMol software v. 2.5.2 [36]) for each LVR L sequence aligned with the outgroup model are shown in Table 4.The results of the analysis of all LVR L tertiary structures for the sequences of the ingroup species, including the calculation of the Interaction Network Fidelity (INF), Deformation Profile (DP), and p-value coefficients, are provided in File S3.The results of the RMSD calculations for the specific tertiary structures of the LVR L subregions, which could serve as potential morpho-molecular synapomorphies for two species, are shown in Table 5.The RMSD calculation results for the LVR L subregions that could serve as morpho-molecular synapomorphies for more than two species are shown in Table 6.The RMSD values for the LVR L subregions that can be considered reliable morpho-molecular autapomorphies are shown in Table 7.
and green (V7).All sequences were aligned to the outgroup sequence.
When all 16 analysed LVR L tertiary structures were combined and aligned to the outgroup sequence (Figure 5), their general shapes appeared very similar.Only one species, Amnestus zacki, had an LVR L tertiary structure fragment (LVR LA) distinct from those found in other species.This could undoubtedly serve as one of its potential morpho-molecular autapomorphies.The predicted tertiary structures of the LVR L subregions for all 17 analysed species are presented in Figures 6 and 7.All subregions that could serve as potential morpho-molecular autapomorphies or synapomorphies are indicated by coloured arrows corresponding to particular LVR L subregions (Figures 6 and 7).All other subregions are plesiomorphic (as defined by Lis [1]) in terms of their number of nucleotides and their secondary and tertiary structure.
corresponding to particular LVR L subregions (Figures 6 and 7).All other subregions are plesiomorphic (as defined by Lis [1]) in terms of their number of nucleotides and their secondary and tertiary structure.5 and 6].Sequences are aligned with the outgroup (T.elizabethae) sequence.a11) autapomorphies [for symbols explanation see Tables 5 and 6].Sequences are aligned with the outgroup (T.elizabethae) sequence.(Q) Chilocoris piceus (Cydnidae: Cydninae).The arrows corresponding in colour to the particular LVR L subregion indicate the fragments that can serve as potential morpho-molecularly derived characters: (s1-s14) synapomorphies, (a1-a11) autapomorphies (for an explanation of the symbols, see Tables 5 and 6).Sequences are aligned with the outgroup (T.elizabethae) sequence.

Potential Plesiomorphies and Apomorphies in LVR L Secondary Structures
The present analysis of LVR L secondary structures confirmed the occurrence of predefined plesiomorphic conditions for nucleotide numbers in specific subregions within Pentatomoidea [1].These plesiomorphies were found in all examined subregions except for the LE subregion, in which they were absent in species from the family Cydnidae and its closest relatives (Parastrachiidae, Thaumastellidae, and Thyreocoridae).Therefore, we identified the LVR LE subregion as the most diverse subregion within the superfamily Pentatomoidea, which was tentatively suggested by previous studies [1].
As previous studies focused on the superfamily Pentatomoidea [1], our analyses validated several synapomorphies and autapomorphies in nucleotide numbers within specific subregions.

Potential Synapomorphies and Autapomorphies in LVR L Tertiary Structure
The RMSD values computed for the species with identical secondary structures for each of the LVR L subregions indicated that certain structures could be classified as morphomolecular derived apomorphies (autapo-or synapomorphies), while others could not.
The LA subregion exhibited considerable variation in terms of nucleotide numbers (11-17 nucleotides in total) and had six distinct nucleotide schemes.The results derived from the secondary and tertiary structure predictions indicate the presence of three possible morpho-molecular synapomorphies within this subregion.The first synapomorphy pertains to structures in T. elizabethae (Thaumastellidae) and G. aradoides (Cydnidae: Garsauriinae).The second was observed in A. biguttatus (Cydnidae: Sehirinae) and P. japonensis (Parastrachiidae).The third synapomorphy was observed in the secondary and tertiary structures.It was identified in eight species, one from Thyreocoridae and the remaining seven from Cydnidae.These Cydnidae species belong to two different subfamilies: Cephalocteinae (S. indonesicus) and Cydninae (six species).The RMSD values derived by analysing the tertiary structure of the LVR LA subregion indicated two distinct morphomolecular autapomorphies: one for the subfamily Amnestinae (A. zacki) and the other for Amaurocorinae (A. curtus).
Like the LA subregion, the LB subregion had a notable variation in nucleotide numbers (11-19 nucleotides) with 6 nucleotide arrangements.Despite having the largest number of secondary structure types compared to other subregions (up to 11), only one morphomolecular synapomorphy was identified.This was identified in two species representing the tribe Geotomini (of the subfamily Cydninae): C. emarginatus and R. indentatus.The RMSD values calculated for the tertiary structures of the LB subregion indicated the presence of two morpho-molecular autapomorphies.One was found in A. zacki (a consensus species of the subfamily Amnestinae) and the other in T. scarabaeoides (a consensus species of the family Thyreocoridae).
In the LC subregion, six nucleotide arrangements and ten types of secondary structures were identified.However, this subregion only demonstrated three morpho-molecular synapomorphies in the tertiary structures and one morpho-molecular autapomorphy.Synapomorphies within the tertiary structures of this subregion were observed in A. biguttatus (Cydnidae: Sehirinae) and P. japonensis (Parastrachiidae); M. badius and C. emarginatus (both representing the tribe Geotomini in the subfamily Cydninae of Cydnidae); T. scarabaeoides (Thyreocoridae), A. romani and P. fraterculus (from the tribe Geotomini).Only one morphomolecular autapomorphy was identified in A. zacki, a consensus species for the subfamily Amnestinae of the Cydnidae.
The LD subregion exhibited significant diversity in nucleotide numbers, with six possible arrangements.Six secondary structures were also identified.However, only one morphomolecular synapomorphy was present in the tertiary structures.This synapomorphy was seen in three species of the tribe Geotomini: F. pygmaeus, C. emarginatus, and M. badius.No morpho-molecular autapomorphies were detected within this subregion.
In the LE region, seven nucleotide arrangements and seven secondary structure types were identified.The results derived from the secondary and tertiary structure predictions indicated the presence of two morpho-molecular synapomorphies and four autapomorphies.The first morpho-molecular synapomorphy was detected for a group of seven Cydnidae species from three different subfamilies: Sehirinae (O.secunda), Cephalocteinae (S. indonesicus), and Cydninae (five species representing the tribe Geotomini sensu lato).The second synapomorphy was identified in four species.However, only two belonged to the family Cydnidae: A. biguttatus from the subfamily Sehirinae and A. curtus from the subfamily Amaurocorinae.The remaining two species were not members of the family Cydnidae and were from two closely related families: P. japonensis (a consensus species for the Parastrachiidae) and T. scarabaeoides (a consensus species for the Thy-reocoridae).
The LE subregion contained the highest number of autapomorphies within the LVR L, with four detected.These were identified in T. elizabethae (Thaumastellidae), A. zacki (Cydnidae: Amnestinae), and C. piceus and C. aterrimus (both representing the tribe Cydnini of the subfamily Cydninae).
The L2 subregion was the most constant.Lis [1] identified four nucleotides that were plesiomorphic and six that were symplesiomorphic.In the current study, T. elizabethae (Thaumastellidae) and C. piceus (Cydnidae: Cydninae: Cydnini) had three and seven nucleotides as autapomorphies, respectively.

Systematic Position of the Family Thaumastellidae
The current analyses of the LVR L of 18S rRNA secondary and tertiary structures support earlier findings [1,4,9,10,33], indicating that Thaumastellidae is not a member of the family Cydnidae, irrespective of its internal classification, and should be recognised as a distinct Pentatomoidea family.
The distinctiveness of this family in the 'cydnoid' complex, as specified by Lis et al. [4], was verified by the results of the RSMD calculations for the tertiary structures of all the examined species.T. elizabethae (a consensus species of Thaumastellidae) exhibited distinctive morpho-molecular autapomorphies in two subregions: LE and L2.
Identifying a synapomorphy between T. elizabethae and G. aradoides was a significant discovery during the structure analyses.This aspect has not been previously studied in the subfamily Garsauriinae of Cydnidae, nor have these species' genetic sequences been analysed.Due to significant differences in the RMSD values obtained from RNAsses and PyMol calculations, future analyses should concentrate on the molecular relationships between Thaumastellidae representatives and species within the subfamily Garsauriinae.

Classification of the Family Cydnidae versus Morpho-Molecular Apomorphies in the LVR L
The RMSD values calculated for the Cydnidae species with equivalent secondary structures in each of the LVR L subregions show that certain subregions can be identified as morpho-molecular apomorphies in the tertiary structures.
The analyses revealed no synapomorphies in the family Cydnidae, including all currently recognised subfamilies.This lack of synapomorphies was observed in the primary, secondary, and tertiary structures of the studied 18S rRNA region.In addition, no autapomorphy was identified within the region that could distinguish Cydnidae as a well-defined monophyletic group within the 'cydnoid' complex.These results support previous hypotheses regarding the non-monophyletic origin of this family [4,11,12,20].
The analysis of the morpho-molecular synapomorphies for the subfamily Sehirinae as a monophyletic group revealed two significant discrepancies that challenge the current classification of this subfamily and the family Cydnidae.Firstly, the data indicate a distinction between a group of two species, Ochetostethus opacus and Ochetostethomorpha secunda, from other species of the subfamily Sehirinae.This is because they have four nucleotides in the L2 subregion, equal to that of species representing the two other subfamilies: Cydninae and Cephalocteinae.In contrast, the remaining Sehirinae species have six nucleotides in this subregion.The uniqueness of this group within the subfamily Sehirinae is further indicated by the presence of a synapomorphy in subregion LE in O. secunda (the consensus species for the ochetostethan facies of spermatheca) and several non-sehirine species representing the subfamilies Cydninae and Cephalocteinae.
Consistent with previous molecular studies on the superfamily Pentatomoidea [1,4], our analyses revealed synapomorphies in the LA and LC subregions of Sehirinae and Parastrachidae.However, the synapomorphy identified in the tertiary structure of the LE subregion observed in A. biguttatus (Cydnidae: Sehirinae), P. japonensis (Parastrachiidae), T. scarabaeoides (Thyreocoridae), and A. curtus (Cydnidae: Amaurocorinae), was not supported by the results of the phylogenetic analysis.
As in the case of the subfamily Sehirinae, we could not identify morpho-molecular synapomorphies for the entire subfamily Cydninae in any of the analysed subregions.Of the nine analysed Cydninae species, morpho-molecular synapomorphies were only identified in the LA subregions of six species, the LD subregions of three species, and the LE subregions of five species.In addition to the members of Cydninae, several species from other subfamilies or related families are also included in the groups defined by a particular synapomorphy.The LA subregion synapomorphy includes two additional species: one from the subfamily Cephalocteinae (S. indonesicus) and T. scarabaeoides of the family Thyreocoridae.For the LE subregion, in addition to five species from Cydninae, one species of the subfamily Cephalocteinae (S. indonesicus) and one species of the subfamily Sehirinae (O.secunda) are linked by this synapomorphy.
The positioning of the Cephalocteinae species (S. indonesicus) on the phylogenetic tree indicates a distinct correlation with the species of the tribe Geotomini sensu lato (subfamily Cydninae).These findings supplement the current results.This is further supported by the number of nucleotides present in the subregions of the LVR L of S. indonesicus, which matches with F. pygmaeus (Geotomini) across all six subregions and shares similarities with other Geotomini tribe species in five sub-regions.Moreover, it has the same type and facies of spermatheca as representatives of this tribe.The Cephalocteinae subfamily does not demonstrate any autapomorphy, unlike the subfamilies of Amaurocorinae and Amnestinae.Therefore, it might be suitable to categorise the subfamily species as part of the tribe Geotomini or a separate tribe within the subfamily Cydninae, as previously proposed by Wagner [37].
The findings related to the subfamily Cydninae indicate that its two tribes, Geoto-mini sensu lato and Cydnini, are heterogeneous.This is particularly evident in the tribe Cydnini, which comprises two separate, distinguishable groups.The first includes all Chilocoris species and Parachilocoris minutus, while the second consists of the tribe's type genus and species, Cydnus aterrimus.
The consensus species (C.piceus) of the first group is distinct from C. aterrimus in several molecular characteristics.The length of the LVR L region in C. piceus (81 nucleotides) differs from C. aterrimus (73 nucleotides) and is consistent with the length seen in other species of the Cydninae, Sehirinae, Amaurocorinae, Amnestinae, and Parastrachiidae subfamilies.Additionally, both groups exhibited different nucleotide numbers in the LC, LD, LE, and L2 subregions, with the discrepancy in L2 being particularly noteworthy.C. piceus had seven nucleotides in this subregion, a characteristic unique to it (autapomorphy), while C. aterrimus only exhibited four nucleotides, a feature common to the entire Pentatomoidea (a plesiomorphic state, [1]).The LVR L subregions, specifically LB, LC, LD, LE, and L2 in species from both Cydnini groups, exhibited distinct secondary structures.A shared characteristic (synapomorphy) for both groups was only detected in the secondary and tertiary structures of the LA subregion.
Furthermore, different morpho-molecular autapomorphies were detected for each group.Two autapomorphies were identified in C. piceus tertiary structures in the LE and L2 subregions, while one was identified in C. aterrimus in the LE subregion.The differences between C. aterrimus and C. piceus were significant due to differences in nucleotide numbers and secondary structures.The recovered phylogenetic relationship between these two groups suggests that they are distantly related.C. aterrimus unexpectedly belongs to the tribe Geotomini sensu lato, while C. piceus (a consensus species of the remaining Cydnini) is the closest relative to the subfamily Amnestinae.An inferred potential close relationship between species of the subfamily Amnestinae and species of the Chilocoris and Parachilocoris genera (Cydninae: Cydnini) was previously suggested following an analysis of wing stridulitrum across the family Cydnidae [38].Together, these findings suggest that the two groups that comprise Cydnini (Cydnus versus Chilocoris + Parachilocoris) are not phylogenetically related and probably should not be grouped in the same tribe.
The subfamily Amaurocorinae shares certain characteristics with species of the subfamilies Cydninae and Sehirinae, particularly regarding the number of nucleotides within various LVR L subregions.These similarities are also evident in the secondary structures of multiple subregions.Comparable secondary structure patterns were also found in species belonging to two other families: Thyreocoridae and Parastrachiidae.The presence of morpho-molecular synapomorphies in tertiary structures suggests a possible relationship between this subfamily and species of the subfamily Sehirinae, as well as two other families related to the Cydnidae: Thyreocoridae and Parastrachiidae.However, the status of Amaurocorinae as a subfamily could be justified by the existence of its morpho-molecular autapomorphy in the LA subregion.
The subfamily Amnestinae was most distinct within the Cydnidae.It exhibited the greatest number of unique morpho-molecular autapomorphies, identified in four out of the six LVR L subregions: LA, LB, LC, and LD.In addition, the species in this subfamily did not exhibit any morpho-molecular synapomorphies with other subfamilies within the Cydnidae or with closely related families, such as Thyreocoridae and Parastrachiidae.
One morpho-molecular characteristic was identified in the subfamily Garsauriinae, a synapomorphy in the LA subregion shared with T. elizabethae (Thaumastellidae).This subfamily shared no synapomorphies with any of the analysed taxa of the Cydnidae, Thyreocoridae, or Parastrachiidae families.Therefore, the relationship between this subfamily and others within the family Cydnidae remains unclear, as has been previously suggested [19].
To verify the sequence variability of the species representing the two tribes recognised as non-monophyletic by Pluot-Sigwalt and Lis [19], namely the Geotomini (subfamily Cydninae) and Sehirini (subfamily Sehirinae), the species were assigned to the groups of spermathecal types and facies recovered within these two tribes (Tables 1 and S1).To identify the monophyletic groups and their 'consensus species', which are essential for predicting secondary and tertiary structures (see Section 4.5), representatives of the Thaumastellidae family were considered outgroup species [1] during the relationship analysis (Figure A1).
The taxa names, specimens' geographic origins, collectors' names (if available), the University of Opole (Poland) sample numbers (if relevant), and the accession numbers for sequences we deposited into GenBank and those obtained directly from GenBank are provided in Tables S3 and S4.

DNA Extraction
Genomic DNA was extracted from the thorax muscle tissues of ethanol-preserved specimens using the DNeasy Tissue Kit (QIAGEN Inc., Santa Clara, CA, USA) per the manufacturer's guidelines.Sample residues were placed in tubes containing 96% ethanol and cryopreserved in a freezer at the Institute of Biology, University of Opole (University of Opole sample numbers are listed in Table S3).
The PCR reactions were performed according to the protocol described by Lis et al. [2].The amplification consisted of 36 cycles of denaturation at 93 • C for 1 min, annealing at 59 • C for 1 min, and extension at 72 • C for 40 s.The amplification was initialised via incubation at 93 • C for 2 min and a final extension at 72 • C for 5 min.The quality of the final PCR products was assessed using 1% agarose gel electrophoresis.Successful samples were purified using the Clean-Up purification kit (A&A Biotechnology, Gda ńsk, Poland).
All experimental PCR runs were performed simultaneously with negative controls (i.e., reactions without template DNA).Sequencing was performed at GENOMED S.A. (Warszawa, Poland) and A&A Biotechnology (Gda ńsk, Poland).The sequences obtained were verified by BLAST searches to confirm that the results were not due to contaminants.All newly obtained DNA sequences were deposited in GenBank (OR691631-OR691660).Their accession numbers are given in Tables S3 and S4.

Reconstruction of 18S rRNA Secondary Structure Models
The 18S rRNA secondary structure of two species in the family Cydnidae, Fromundus pygmaeus (Dallas, 1851) and Adomerus biguttatus (Linnaeus, 1758), was designed according to the universal model of the gene provided for insects [3].Some necessary modifications and reinterpretations already proposed for Heteroptera have been described in detail by Lis [1].These species represented the two largest subfamilies of the Cydnidae (Cydninae and Sehirinae, respectively), and both had their 18S rDNA sequenced [1,16] and deposited in GenBank (F.pygmaeus, GenBank accession no.: KJ535871; A. biguttatus, GenBank accession no.: KY886253).
The hypervariable region numbering and numbering scheme used for the LVRs was adopted from the works of Lis [1], Yu et al. [15], Wu et al. [16], and Neefs et al. [51].

Prediction of LVR L Secondary Structure
Secondary LVR L structures of all analysed species were predicted using RNAstructure ver.6.3 [52].A three-step procedure was used for comparative sequence analysis based on the study by Lis [1].First, structure predictions were made for each species separately.Then, if only two species were identified as strictly monophyletic, a secondary structure common to both sequences was predicted.Secondary structures common to three or more sequences were calculated if the identified monophyla were represented by three or more species [52].The species selected by the RNAstructure ver.6.3 [52] were those that had secondary structures common to two or more sequences; these species were considered 'consensus species' for these sequences (Tables 1 and S1).
The results of the phylogenetic analysis of the 45 species (Figure A1) were used to determine the number of steps to be followed to predict the secondary structures, but not to suggest any changes in the classification of the family Cydnidae.To mitigate the influence of missing data from incomplete sequences, the sequences were aligned using ClustalW with default parameters in the MEGAX software v. 10.2.6 [53] and then truncated at both ends.A Maximum Likelihood tree was generated via IQ-TREE [54] on the web server [33] using 10,000 replicates and the ultrafast bootstrap method [55].The resulting tree was visualised and edited using the online tool iTOL v5 [56].It was prepared for publication using CorelDRAW 21.
The subdivision of the secondary structures into subregions was performed according to the method used by Lis [1,2].All secondary structures were visualised using the Structure Editor in RNAstructure ver.6.3 [52].

Prediction of LVR L Tertiary Structure
RNAComposer (http://rnacomposer.ibch.poznan.pl,accessed on 18 August 2023), a fully automated RNA structure modelling server, was used to predict the tertiary structure of LVR L [58,59].The RNAComposer method allows RNA structures up to 500 nucleotides long to be constructed with high accuracy and accounts for the possible occurrence of pseudoknots [58,59].Twenty 3D models were generated for each LVR L sequence.The best model with the lowest free energy was selected for further investigation.All tertiary structure images were visualised using PyMol software ver.2.4.0 [35].

Concept of the Morpho-Molecular Structures Potentially Serving as Derived Characters
The concept of morpho-molecular apomorphies (autapomorphies and synapomorphies) of nucleotide sequences in the predicted secondary structures was adapted from Ouvrard et al. [14], Yu et al. [15], and Xie et al. [60].The definition of plesiomorphic states within Pentatomoidea in terms of the number of nucleotides in each LVR L region and the secondary and tertiary structures was adapted from Lis [1].
The strategy for identifying morpho-molecular derived characters in the predicted tertiary structures was based on the methods proposed by Lis [1] and Lis et al. [2].Based on this [1], all morpho-molecular tertiary structures were identified based on their degree of uniqueness [61].Only the tertiary structures whose uniqueness was confirmed at the secondary structure level were considered derived characters (apomorphies) and were used for further analysis.Furthermore, only structures that had a subregion-specific nucleotide arrangement (Figure S2) and different secondary (File S2) and tertiary (Figures 6 and 7) structures were considered potential morpho-molecular autapomorphies.
The uniqueness of the tertiary structures was confirmed using Root-Mean-Square Deviation (RMSD) values, a widely accepted method for comparing 3D structures [34,35,61,62].The RMSD computation aligns atoms in the predicted and reference structures, indicating tertiary structure likeness [34,35,61,62].RMSD is equal to 0 for theoretically identical structures.As the dissimilarity between two structures increases, so does the RMSD value [34,35,61,62].RMSD values were calculated for the LVR L tertiary structures of all species using PyMol [35] and the RNAssess web server [34].Additionally, the Interaction Network Fidelity (INF), Deformation Profile (DP), and p-value coefficients were calcu-lated to determine the similarity between structures with comparable RMSD values more accurately [34].
For the specific tertiary structures of the LVR L subregions, any RMSD value less than 1Å between the compared model sequences was considered sufficient to identify them as potential morpho-molecular synapomorphies for the two or more species analysed [63,64].If there was a difference between the RMSD value calculated in PyMol [35] and that computed by the RNAssess web server [34], the value from the calculation containing more atoms was deemed conclusive.Potential morpho-molecular autapomorphies in a particular subregion were deemed reliable only if a calculated RMSD greater than 1Å existed between all analysed structures and the compared model.

1.
Comparisons of the predicted tertiary structures of the LVR L of the 18S rRNA in species representing all presently recognised and accepted subfamilies and tribes within the family Cydnidae revealed inconsistencies in their classifications.

2.
The present comparative analyses of the LVR L of the 18S rRNA secondary and tertiary structures support earlier findings that irrespective of its internal classification, Thaumastellidae is not a member of the family Cydnidae and should be recognised as a distinct Pentatomoidea family.

3.
The analysis did not identify one synapomorphy that was present across all presently acknowledged subfamilies of Cydnidae.This absence was observable in the primary, secondary, and tertiary structures of the studied 18S ribosomal RNA region.Furthermore, no autapomorphy was detected in the examined region to differentiate Cydnidae as a monophyletic group within the 'cydnoid' complex.These findings are consistent with previous hypotheses suggesting that the origin of this family is non-monophyletic.4.
The predicted secondary and tertiary structures of the LVR L of the 18S rRNA of the family Parastrachiidae and the subfamily Sehirinae (Cydnidae) confirm their close relationship, highlighted by the several morpho-molecular synapomorphies shared between their LVR L subregions. 5.
Two notable groups of species in the subfamily Sehirinae were found to be unrelated.These groups challenged the classification currently in use for this subfamily.One group displayed ochetostethan facies of spermatheca, which significantly differed in regard to their morpho-molecular data from species representing the sehiran facies of spermatheca within the subfamily.These findings indicate that the subfamily may need to be divided into at least two tribes.However, further supportive analyses incorporating alternative mitochondrial and nuclear genes must be conducted to address this further.6.
The subfamily Cephalocteinae displayed a clear correlation with the species of the tribe Geotomini across several morpho-molecular data.It does not possess any distinctive morpho-molecular autapomorphy and exhibits the same type and facies of spermatheca as representatives of the tribe.Therefore, the subfamily may be classified as part of the tribe Geotomini or as a distinct tribe within the subfamily Cydninae.7.
The relationship between two groups from the tribe Cydnini (C.aterrimus and C. piceus) suggests they are distantly related.C. aterrimus was found to be closely related to the tribe Geotomini, while C. piceus (the consensus species of the remaining Cydnini) appeared to be the closest relative to the subfamily Amnestinae.These findings imply that these two groups are not phylogenetically related.Therefore, it is likely inappropriate to categorise them as belonging to the same tribe.8.
The subfamily status of Amaurocorinae was confirmed based on the morpho-molecular autapomorphy in the LA subregion of the LVR L, despite its relation to species of the subfamilies Cydninae and Sehirinae, as well as those of the families Thyreocoridae and Parastrachiidae in terms of morpho-molecular LVR L characters.

Figure 1 .
Figure 1.18S rRNA of Fromundus pygmaeus (A) Secondary structure model.The bases marked in colour represent the hypervariable regions (V2-red, V4-dark blue, V7-green).Thirteen lengthvariable regions (LVRs) are labelled as capital letters B to W in colours analogous to the base colours representing the hypervariable regions or other sequences' regions.The capital letter in the filled circle indicates the LVR L. Base pairing is shown as follows: standard canonical pairs are lines (G-C, A-U), wobble G:U pairs are dots (G•U), A:G or A:C pairs are open circles (A; G, A; C) and other non-canonical pairs are filled circles (e.g., U and U, A and A).(B) Tertiary structure model.The fragments marked in colour represent the hypervariable regions (V2-red, V4-dark blue, V7-green).The LVR L region within the V4 hypervariable region is marked in dark grey.

Figure 1 .
Figure 1.18S rRNA of Fromundus pygmaeus (A) Secondary structure model.The bases marked in colour represent the hypervariable regions (V2-red, V4-dark blue, V7-green).Thirteen lengthvariable regions (LVRs) are labelled as capital letters B to W in colours analogous to the base colours representing the hypervariable regions or other sequences' regions.The capital letter in the filled circle indicates the LVR L. Base pairing is shown as follows: standard canonical pairs are lines (G-C, A-U), wobble G:U pairs are dots (G•U), A:G or A:C pairs are open circles (A; G, A; C) and other noncanonical pairs are filled circles (e.g., U and U, A and A).(B) Tertiary structure model.The fragments marked in colour represent the hypervariable regions (V2-red, V4-dark blue, V7-green).The LVR L region within the V4 hypervariable region is marked in dark grey.

Figure 5 .
Figure 5.All analysed LVR L tertiary structures divided into subregions and aligned with the outgroup (Thaumastella elizabethae) structure.The arrow indicates the LVR L(A) of Amnestus zacki.The subregions are marked in colours according to Figures 2 and 3.

Figure 5 .
Figure 5.All analysed LVR L tertiary structures divided into subregions and aligned with the outgroup (Thaumastella elizabethae) structure.The arrow indicates the LVR L(A) of Amnestus zacki.The subregions are marked in colours according to Figures 2 and 3.

Figure A1 .
Figure A1.Phylogenetic consensus tree of the family Cydnidae based on the Maximum Likelihood analysis of the sequences of the 18S rDNA hypervariable region V4.Node labels-Ultrafast Bootstrap values (see Section 4.5), and the Bootstrap values over 50% are shown next to the branches.

Table 3 .
The nucleotide numbers of the subregions of the LVR L. The autapomorphies in nucleotide numbers for Thaumastellidae are indicated in red, Thyreocoridae in purple, Chilocoris piceus in green, Cydnus aterrimus in grey, and Amnestinae in blue.The plesiomorphic nucleotide number in each subregion is shown in yellow.The 'consensus species' for family, subfamily or facies are marked with *; the 'consensus species' for the clades within the specific tribe are marked with **.

Table 4 .
RMSD values (calculated in RNAssess web server and PyMol software) v. 2.5.2 for each LVR L sequence model aligned to the outgroup (Thaumastella elizabethae, Thaumastellidae) model.

Table 5 .
RMSD values (calculated in RNAssess web server and PyMol) for the specific tertiary structure of the LVR L subregions, which can serve as potential morpho-molecular synapomorphies for two species analysed (target and compared models).The value from the calculation containing more atoms was deemed conclusive.Figures6 and 7were used as the basis for the numbering of synapomorphies.

Table 6 .
RMSD values (calculated in RNAssess web server and PyMol) for the specific tertiary structure of the LVR L subregions, which can serve as potential morpho-molecular synapomorphies for more than two analysed species (one target and several compared models).The value from the calculation containing more atoms was deemed conclusive.

Table 7 .
RMSD values for the specific tertiary structure of the LVR L subregions, which can serve as potential morpho-molecular autapomorphies for a target model against which all other taxa were compared.The higher value from the calculation containing more atoms was deemed conclusive (marked * for results in RNAssess web server, and ** for results in PyMol).