Validating Morphometrics with DNA Barcoding to Reliably Separate Three Cryptic Species of Bombus Cresson (Hymenoptera: Apidae)

Simple Summary Evidence of bumble bee population declines has led to an increase in conservation efforts to protect these important pollinators. However, effective conservation requires accurate species identification. We provide quantitative methods to accurately identify three cryptic species of bumble bees using morphometric measurements of the cheek length and width, and antennal segments. We validated the accuracy of our methods with DNA analysis. We predicted that these methods would reliably identify both the queens and worker bees of Bombus vagans and B. sandersoni. We expanded these methods to include an uncommon form of Bombus perplexus with all light hair on its thorax, rather than the more common light on top and dark below, that can mistakenly be identified as B. vagans or B. sandersoni. Although the species we consider here, Bombus vagans, B. sandersoni and B. perplexus, are not currently listed as species of concern in North America, there is uncertainty of their population status, some of which is due to difficulty in species identification, which we have resolved. Recent history informs us that some bumble bee species experience rapid declines within a few decades. Our methods to correctly identify these cryptic species is key to monitoring their status and population trends. Abstract Despite their large size and striking markings, the identification of bumble bees (Bombus spp.) is surprisingly difficult. This is particularly true for three North American sympatric species in the subgenus Pyrobombus that are often misidentified: B. sandersoni Franklin, B. vagans Smith B. perplexus Cresson. Traditionally, the identification of these cryptic species was based on observations of differences in hair coloration and pattern and qualitative comparisons of morphological characters including malar length. Unfortunately, these characteristics do not reliably separate these species. We present quantitative morphometric methods to separate these species based on the malar length to width ratio (MRL) and the ratios of the malar length to flagellar segments 1 (MR1) and 3 (MR3) for queens and workers, and validated our determinations based on DNA barcoding. All three measurements discriminated queens of B. sandersoni and B. vagans with 100% accuracy. For workers, we achieved 99% accuracy by combining both MR1 and MR3 measurements, and 100% accuracy differentiating workers using MRL. Moreover, measurements were highly repeatable within and among both experienced and inexperienced observers. Our results, validated by genetic evidence, demonstrate that malar measurements provide accurate identifications of B. vagans and B. sandersoni. There was considerable overlap in the measurements between B. perplexus and B. sandersoni. However, these species can usually be reliably separated by combining malar ratio measurements with other morphological features like hair color. The ability to identify bumble bees is key to monitoring the status and trends of their populations, and the methods we present here advance these efforts.

Unfortunately, despite being large-bodied with conspicuous hair coloration and patterns, many bumble bee species are deceptively challenging to identify [20][21][22][23], particularly because there are few helpful characters for separating species [24]. In North America, several identification guides exist for bumble bees [23,[25][26][27], however, there are still challenges to the identification of some similarly colored cryptic species. Hair coloration is an important field characteristic for identification, yet it has been widely acknowledged that accurate identifications based on hair coloration are complicated by chromatic variability within species and convergent evolution between species [23,[28][29][30]. For example, Franklin [25] confessed that he had "much difficulty in describing the colors exhibited by the pile of the various species." Therefore, errors in identification may result without careful examination of morphological characters [28,31,32].
Three commonly misidentified sympatric species of bumble bee with similar hair coloration and patterns in the subgenus Pyrobombus Dalla Torre, are B. vagans (Half-Black Bumble Bee), B. sandersoni (Sanderson Bumble Bee) and B. perplexus (Confusing Bumble Bee). While hair color has frequently been used in bumble bee identification, there exists considerable variation in hair color patterns both within and among castes of B. sandersonii, B. vagans and B. perplexus [23,25,27], making reliance on this character alone insufficient. Although B. perplexus typically has extensive black hair on the lower half of the pleura [25,27], it has a less common light colored form that has the sides of the thorax entirely yellow to the base of the legs that is often confused with the prior two species [23]. Franklin [25] stated that light form females of B. perplexus were regularly confused as B. vagans.
In addition to their morphologic similarities, these species share a confusing history that is complicated by the fact that important early taxonomic work on this group was conducted solely on spring-caught queens and males [25,28]. Franklin initially described B. sandersoni as a subspecies of B. vagans noting slight differences in their malar space. Frison [33] did not agree with Franklin's determinations that B. sandersoni was a subspecies of B. vagans and instead suggested that B. sandersoni was actually a color variety of B. frigidus Smith based on his observations of the male genitalia [28]. Later, Mitchell [27] and Soroye and Bucks [34] classified B. sandersoni as a separate species from B. vagans, also based on differences in the malar length in spring-caught queens [28].
To reliably separate these species, previous authors sought to find morphologic characters that were consistent across a range of body sizes and castes to reliably differentiate species in the subgenus Pyrobombus. For example, Plowright and Pallett [28] proposed that spring-caught queen B. sandersoni, B. vagans, and B. frigidus specimens from a wide geographic range across Canada could be identified using several metrics including wing venation, and measurements of malar and antennal lengths. Following the methods in Richards [31], they verified that the length of malar spaces in spring-caught queen B. vagans did not overlap with either queen B. sandersoni or B. frigidus, but that the malar space did not significantly separate the queens of B. frigidus from B. sandersoni, concluding that the relationship between B. sandersoni and B. frigidus remained unclear and required further investigation [28]. They acknowledged that the longer malar length of B. vagans was the primary Insects 2020, 11, 669 3 of 20 structural character used to separate the queens of B. vagans from B. sandersoni. Mitchell [27] compared the median length of the malar space in relation to the basal width of the mandible to differentiate the spring-caught queens of B. sandersoni from B. vagans. Although these morphometric characters are effective for distinguishing among spring-caught queens of B. sandersoni and B. vagans, none of these authors used these characters to separate worker specimens of these species. Size polymorphism in worker bumble bees can exhibit up to 10-fold variation in body mass within a single colony [35,36], and visual assessment of the malar length of smaller bodied bees without the aid of a microscope presents a greater challenge. This is important because workers are more abundant and active during the flight season than queens, and thus more likely to be encountered by field investigators and collected as specimens during community and population monitoring [37].
Earlier authors did not provide quantitative comparisons for morphologic features in worker castes, only in spring-caught queens. For example, Mitchell [27] described qualitative differences between B. sandersoni and B. vagans with respect to the malar character as "slightly greater", "slightly shorter than", or "nearly equal", and Williams et al. [23] provide a similar qualitative assessment of the cheek length to width to distinguish between these species. Richards [31], in his work on the subgeneric divisions of Bombus Laterille, provided measurements of several morphometric characters using a micrometer scale (reticle) to provide measurements as units including those for absolute malar length and the proportions of antennal segments 3:4:5 in females, but he did not apply these findings to species within the subgenera. Although the descriptive morphological features used by previous authors provide a means for discriminating among these species, variability within species may make accurate determinations based on these characteristics difficult and have not been tested in workers [23,25].
To assess conservation assessments for species of bumble bees, and to monitor population status and trends, accurate identification is required. We undertook this study to establish reliable quantitative morphometric characters to distinguish among B. sandersoni, B. vagans and B. perplexus, using specimens of both queens and workers that were independently identified using DNA sequence data. We predicted that if these characters were useful for discriminating among species, then classifying species based on morphometrics would correspond to classifications based on DNA [37]. In addition, we assessed the accuracy of characters currently employed to separate these species, compare measurement repeatability within and between observers, and provide guidelines for accurate identification of this species complex for future studies.

Specimens
Specimens of worker and spring and fall-caught queen B. vagans, B. sandersoni, and B. perplexus were collected from 2010-2019 from the northern edge of their range in three states in the Midwest (Minnesota, Wisconsin, and Michigan) and three states in the northeast (New York, Massachusetts, and Maine) to represent regional diversity (see Supplemental Appendix A for complete specimen collection information). Initial species identifications were made on 216 specimens using taxonomic keys [23,27,38] and accompanying visual assessments of the ocular-malar area and hair color and patterns on the thorax and tergal segments. We did not include males in our analysis because they can be reliably identified using standard keys [23,27]. To confirm species identifications, we obtained DNA sequence data on 115 specimens haphazardly chosen from the 216 specimens to include individuals tentatively identified as one of the focal species based on visual observations, hair color pattern, including the presence of yellow hair on tergal segment 5 (T%), and individuals whose species assignment was not initially apparent (species information in Appendix A).

Determining Malar and Flagellar Segment Ratios
Using a reticle in the eyepiece of a stereomicroscope, we measured the malar length, width, and length of flagellar segments 1 and 3 on 216 specimens, including the 115 (44 B. vagans, 49 B. sandersoni, and 22 B. perplexus) that were subsequently chosen for DNA analysis. Following the anatomy in Michener [39] and Williams et al. [23], we measured the malar length (i.e., the shortest distance from the base of the eye to the edge of the cheek (Figure 1), and the malar width (i.e., the outside of the mandible condyle to the outside of the cheek condyle, assumed to be synonymous with the "basal width of the mandible" following Mitchell [27] (Figure 1a) on queens and workers to establish a range of MRL length to width ratios. To ensure consistency of measurements, we measured the malar length on both the vertical and horizontal axis to confirm that our measurements were constant.

Determining Malar and Flagellar Segment Ratios
Using a reticle in the eyepiece of a stereomicroscope, we measured the malar length, width, and length of flagellar segments 1 and 3 on 216 specimens, including the 115 (44 B. vagans, 49 B. sandersoni, and 22 B. perplexus) that were subsequently chosen for DNA analysis. Following the anatomy in Michener [39] and Williams et al. [23], we measured the malar length (i.e., the shortest distance from the base of the eye to the edge of the cheek (Figure 1), and the malar width (i.e., the outside of the mandible condyle to the outside of the cheek condyle, assumed to be synonymous with the "basal width of the mandible" following Mitchell [27] (Figure 1a) on queens and workers to establish a range of MRL length to width ratios. To ensure consistency of measurements, we measured the malar length on both the vertical and horizontal axis to confirm that our measurements were constant. To take malar width measurements, we used the cheek condyle to demark the medial edge of the mandible, because when the mandible of the specimen is closed, the small corner of the acetabulum is hidden by the lateral edge of the clypeus (Figure 1b,c). The cheek condyle aligns with the hidden edge of the mandible and therefore marks the edge of the mandible (Figure 1b). The malar length to width ratio is calculated as MRL = malar length/malar width. Next, we measured the horizontal lengths of flagellar segments 1 and 3. (Figure 1d) to establish a range of malar length to flagellar length ratios for flagellar segments 1 (MR1) and 3 (MR3) using the following calculations MR1 = malar length/flagellomer − 1, and MR3 = malar length/flagellomer − 3. Detailed instructions on how to measure the MRL, MR1, and MR3 ratios are provided in Appendix B.
To examine any possible effects from observer biases and experiences, each specimen was measured by two independent observers (Hereafter: Obs1 and Obs2). To determine the extent to which this methodology could be employed by a novice, specimens were measured by a student volunteer with < two months of experience identifying bees under a microscope (Hereafter: Obs3). To take malar width measurements, we used the cheek condyle to demark the medial edge of the mandible, because when the mandible of the specimen is closed, the small corner of the acetabulum is hidden by the lateral edge of the clypeus (Figure 1b,c). The cheek condyle aligns with the hidden edge of the mandible and therefore marks the edge of the mandible (Figure 1b). The malar length to width ratio is calculated as MRL = malar length/malar width. Next, we measured the horizontal lengths of flagellar segments 1 and 3. (Figure 1d) to establish a range of malar length to flagellar length ratios for flagellar segments 1 (MR1) and 3 (MR3) using the following calculations MR1 = malar length/flagellomer − 1, and MR3 = malar length/flagellomer − 3. Detailed instructions on how to measure the MRL, MR1, and MR3 ratios are provided in Appendix B.
To examine any possible effects from observer biases and experiences, each specimen was measured by two independent observers (Hereafter: Obs1 and Obs2). To determine the extent to which this Insects 2020, 11, 669 5 of 20 methodology could be employed by a novice, specimens were measured by a student volunteer with < two months of experience identifying bees under a microscope (Hereafter: Obs3).

DNA Barcoding
To provide species-level identifications to determine the accuracy of existing keys and our MRL, MR1, and MR3 measurements, DNA was extracted from 115 specimens for which we had used existing keys and made visual identifications. From each specimen, the central-right leg was removed using sterilized forceps, legs were ground in a 1.5 mL microcentrifuge tube using a polypropylene pestle (USA Scientific, Inc., Ocala, FL, USA) and whole genomic DNA was isolated using the Omega Bio-tek E.Z.N.A. ® Tissue DNA Kit (Omega Bio-tek, Inc., Norcross, GA, USA). After isolation, a fragment of the mitochondrial locus cytochrome oxidase I (COI) was amplified using the protocol described in Hebert et al. [40]. Initial attempts to amplify the complete "barcode" fragment using the primers LepF1 and LepR1 as described in Hebert et al. [40], were unsuccessful, likely due to extractions that were performed from pinned specimens of varying ages. Therefore, we designed novel primers BombusF (5 -AGWCAYCCTGGAATATGAA-3 ) and BombusR (5 -GTGGRAAAGCTATATCAGG-3 ) to amplify~150 base-pairs of the barcode fragment that was diagnostic among species. PCRs were conducted following conditions described in Smith-Freedman et al. [41], and DNA sequencing of both forward and reverse fragments was performed at the DNA Analysis Facility on Science Hill at Yale University. Raw sequence reads were edited in Geneious 11.1.2 (Biomaters Ltd., Auckland, New Zealand), and the forward and reverse reads assembled into a consensus sequence for each sample, and an alignment of all sequenced samples was then constructed in Geneious using the "Geneious Alignment Tool" with default parameters. Sequences were then assigned to haplotypes, and the relationships of haplotypes to each other were reconstructed using TCS v. 1.21 [42] with a 95% connection limit. Inter-and intraspecific percent differences between and among each species were then calculated in Geneious based on the species identifications from the TCS network analysis. DNA barcodes associated with all specimens are accessible on GenBank under the Accession numbers MT951454-MT951575, MT991562-MT991569 (Appendix A).

Statistical Analyses
To determine the accuracy of visual identifications (i.e., traditional identifications which are based on hair coloration and just visual observations of malar length) in comparison to DNA-confirmed identifications we performed Pearson's correlation tests. To compare mean malar ratio measurements among the three species, we used a generalized linear model (GLM) that included the malar ratio measurement as our response (either MR1, MR3 or MRL), a fixed effect of species (identified by DNA barcoding), bee caste (queen or worker) and region (Midwest or Northeast). We also included an interaction term between species and caste and between species and region. We ran this model assuming a Gaussian error distribution and tested variable significance using analysis of variance (ANOVA). Non-significant interaction terms (p > 0.05) were removed, and the analysis rerun with the remaining fixed effects and interaction terms [43]. Pairwise differences among levels within our fixed effects with least-squares means were compared using the package 'emmeans' to acquire estimated means and Tukey-adjusted p-values [44].
To test the accuracy of the three malar ratio measurements for identifying species, we used a linear discriminant analysis (LDA) on queens and workers separately for B. sandersoni and B. vagans. B. perplexus was excluded from this analysis because we found that hair color, even on the less common light-colored specimens, contained a few scattered dark hairs on the thorax, and in most cases, light form specimens had light hair on the third tergal segment. Thus, malar ratio measurements were unnecessary for this statistical comparison. However, we present B. perplexus MRL, MR1, and MR3 ratio ranges to alert taxonomists to the possibility of rare, light-colored B. perplexus that may require additional characters to confirm identification. We performed an LDA on 60% of our 115 DNA verified specimens to use as a training dataset and then predicted species identification on the remaining 40% of samples using MR1, MR3, and MRL measurements and validated the accuracy of our classifications using the species identifications from the DNA barcodes. We ran the LDA using the function 'lda' in package 'MASS' [45] and split the data using function 'createDataPartition' from package 'caret' [46]. To quantify the uncertainty in accuracy using three measurements, we performed a bootstrapping method to resample the dataset 999 times and determined the median ± SD of accuracy for the three measurements for both queens and workers separately.
We then predicted a range of values for each measurement beyond our measured individuals using the sample mean and SE. For each species/caste combo (e.g., B. sandersoni queens), we first bootstrapped the population mean and standard error by 500 sampled replicates using the function 'boot' from package 'boot' [47]. We then simulated a population of 10,000 bees from a normal distribution using the bootstrapped mean and SD and report the 99% quantile of the simulated distribution. All analyses were performed using the statistical environment R v. 3.5.1 [48].
Finally, we assessed the repeatability of malar ratio measurements by different observers. We compared measurements within and among observers using a Pearson's correlation using the function 'cor,test' in R [48]. We first compared measurements by experienced Obs1 to measurements by both an experienced Ob2 and inexperienced Ob3. We also examined the correlation between standard deviation (SD) and the mean of repeated measurements within and between observers to determine if repeatability was associated with measurement size (e.g., observations become less reliable as individuals decrease in size).

Malar Ratio Measurements & DNA Barcoding
We collected measurements for MRL, MR1, and MR3 from a total of 216 specimens and obtained DNA results from 115 specimens to confirm species identifications (49 B. sandersoni, 44 B. vagans, 22 B. perplexus). Without using quantified malar length to width or flagellar length ratios, our identifications based on existing keys and visual characteristics (e.g., hair color) were correct 70.4% of the time. For all worker and queens, the malar length was smaller than the malar width. Correlation between the three malar ratio measurements were strong (MR1 to MR3:R = 0.66, MR1 to MRL R = 0.75, MR3 to MRL:R = 0.77). For all species comparisons, we use measurements from only one observer (Obs1) to avoid pseudoreplication.
Based on statistical parsimony reconstruction, individuals could be assigned to one of three groups representing individuals of (1) B. sandersoni (haplotype MR001), (2) B. vagans (haplotype MR003), and (3) B. perplexus (haplotypes MR191 and MR196 [one base-pair different between the two B. perplexus haplotypes]). The relationship of each group is presented in Figure 2.
Mirroring the network analyses, estimates of intra-and interspecific percent differences, found that individuals of B. sandersoni and B. vagans were 95.7% similar to each other (both species had 100% within species similarity), and that on average individuals of B. perplexus were 94.3% similar to individuals of both B. sandersoni and B. vagans (this species had 99.3% within species similarity due to the one-basepair difference between haplotype MR191 and MR196).

Comparing the Three Measurements among Species
For all three measurements, means were significantly different among the three species and between queens and workers (Tables 1 and 2, Figure 3). However, differences between queens and workers varied among measurements. Queens had larger MR3 and MRL measurements compared to workers, but there was no difference in MR1 between queens and workers (Tables 1 and 2, Figure 3). There were no significant differences in any measurement between the two geographic regions (Table 1) Insects 2020, 11, 669 7 of 20 nor were there any significant interactions among species, castes, or region for any measurement (all p > 0.1).
between the three malar ratio measurements were strong (MR1 to MR3:R = 0.66, MR1 to MRL R = 0.75, MR3 to MRL:R = 0.77). For all species comparisons, we use measurements from only one observer (Obs1) to avoid pseudoreplication.
Based on statistical parsimony reconstruction, individuals could be assigned to one of three groups representing individuals of (1) B. sandersoni (haplotype MR001), (2) B. vagans (haplotype MR003), and (3) B. perplexus (haplotypes MR191 and MR196 [one base-pair different between the two B. perplexus haplotypes]). The relationship of each group is presented in Figure 2.

Predictive Accuracy of the Three Measurements for Separating B. vagans and B. sandersoni
Using the measurements of queens, the LDA model predicted species identity with 100% accuracy for all three malar ratio measurements (Figure 4). For workers, MRL had 100% ± 0% SD and MR1 had 96% ± 5% SD accuracy, and MR3 had a 97% ± 4% SD accuracy ( Figure 5). Only MRL had complete separation between B. vagans and B. sandersoni for queens and workers. However, using both MR1 and MR3 measurements to differentiate species resulted in 99% ± 0.01% SD accuracy. We report characteristics of the three species and the min and max of all measurements as well as the 99% quantile of our simulated distributions (Table 3).

Predictive Accuracy of the Three Measurements for Separating B. Vagans and B. Sandersoni
Using the measurements of queens, the LDA model predicted species identity with 100% accuracy for all three malar ratio measurements (Figure 4). For workers, MRL had 100% ± 0% SD and MR1 had 96% ± 5% SD accuracy, and MR3 had a 97% ± 4% SD accuracy ( Figure 5). Only MRL had complete separation between B. vagans and B. sandersoni for queens and workers. However, using both MR1 and MR3 measurements to differentiate species resulted in 99% ± 0.01% SD accuracy. We report characteristics of the three species and the min and max of all measurements as well as the 99% quantile of our simulated distributions (Table 3).

Correlation within and between All Observers
Repeat measurements of MR1, MR3, and MRL were highly correlated within observers, as were measurements between both experienced observers (R > 0.9, Table 4). Correlations between experienced and inexperienced observers were also strong but slightly lower than ratios measured by experienced observers (R = 0.81compared to 0.85, Table 4). The standard deviation of measurements was small (range: 0 to 0.11), and for MR1 and MR3 there was no evidence of a relationship between the standard deviation of measurements and the mean in either measurement for both experienced and inexperienced observers (Table 4). There was a weak tendency for SD to decrease as mean increased for MRL (R-0.18, Table 4), suggesting this measurement may be more difficult to measure accurately when specimens are small.

Discussion
Accurate methods for bumble bee identifications are a critical component of any monitoring and conservation effort. Here, we present methods that allow researchers to discriminate among queens and workers of three cryptic eastern bumble bee species (B. sandersoni, B. vagans, and B. perplexus) with 100% accuracy in MRL ratios. We document that there was little variation among observers, even novices, in measurements, suggesting these methods, if properly employed, are robust with respect to measurement error. The quantification of defined ranges of these measurements presented for each species provides a replicable and timely tool for species identification given the current conservation interest in pollinator species and the prevalence of regional Bombus surveys to inform conservation best management practices.
When we used hair color or observations of the malar length as characters to separate B. sandersoni, B. vagans, and B. perplexus based on descriptions found in taxonomic sources e.g., [23,27,38] our rate of correct identification was far lower than the identification rate using our morphometric measurements (70% using visual descriptions vs 100% using MRL and 99%+ using both MR1 and MR3 together). Validated by DNA identification, we found visual characteristics, particularly hair color, could vary widely among individuals. For example, T5 in queens and worker B. vagans and B. sandersoni could be either all black or have widely varying amounts of yellow hair from few to many. Although Williams et al. [23] describes this variation, other sources such as Discoverlife.org suggest that B. vagans workers always have T5 black except in specimens collected in Newfoundland. In our specimens, the presence of yellow hairs on T5 was found in specimens collected in four different states in both the Midwest and Northeast regions of the United States. Further complicating hair color as a useful identification tool is preparation quality, such that hair color is often difficult to distinguish in Bombus specimens with matted hair, a not uncommon state of some Bombus specimens in collections. We found that B. perplexus can usually be identified by some of the following hair color patterns (1) the scutum has all light hair, or rarely with a few black hairs whereas B. vagans and B. sandersoni usually have many black hairs in this area, (2) the lower half of the pleura usually has dark hairs (common dark form), which the other two species do not, (3) the uncommon "light form" appears to have no dark hairs on the pleura, although often a few can be found low down around the legs, and (4) light form B. perplexus often have some yellow hairs on T3 which would not be the case for the other two species. While hair color can be useful to identify most B. perplexus, careful examination of malar ratio measurements is essential to confirm identification of light-form B. perplexus, and in some challenging cases, DNA confirmation may still be necessary. Additionally, while the malar ratios we present were calculated based on the examination of physical specimens of workers and queens, it is possible that they could be integrated into future community science projects (also called 'citizen science') if detailed photographs of the malar region and flagellar segments were taken providing a clear image that could be digitally measured with our methods.
The morphological similarity among these three species highlights the inherent challenges to accuracy in community science projects that do not collect specimens. Goulson et al. [49] point out that community science surveys are often limited by the taxonomic skills of the observers, particularly for bee species that are difficult or impossible to identify in the field because of cryptic coloration. MacPhail et al. [50] reported that 46.3% of B. vagans, 38.6% of B. sandersoni, and 86.4% of B. perplexus were correctly identified from photos of Bombus by project designated expert taxonomists, or they were placed into a "two-striped species group" when the photos were ambiguous. Richardson et al. [13] found that they could reliably make a species determination from 68% of photographs submitted by participants in the Vermont Bumble Bee Atlas. Suzuki et al. [51] in their survey of bumble bees in Japan found that they had high consistency of identification from photos that ranged from 95-97.7%. The higher success rate in Japan could be contributed to fewer species that exhibited similar hair patterns. However, none of these identifications were validated with DNA sequence data. For our three focal species with similar hair coloration, the use of photographs alone as a means for identification represents a cautionary tale.
Prior to this study, there were no established quantitative morphometric measurements ranges for workers of these three species. In his comprehensive paper "The Bombidae of the New World," Franklin [25], included a chart of the range of malar spaces for spring-caught queen B. vagans, as lengths vs. the widths of the eye expressed in Filar micrometer spaces (divisions) and expressed them as ratios. Using spring-caught queen B. sandersoni and B. vagans, Plowright and Pallett [28] presented their results for the means and range lengths of malar spaces and the length of the malar space-to-length of the 3rd antennal segment. If we assume that their antennal segment three is equivalent to our flagellar segment 1, then their measurements were similar to our results for queen B. vagans but showed greater variation for queen B. sandersoni. There is some ambiguity in their results because the authors did not provide detailed instruction on where they made their malar and flagellar measurements, nor did they validate their specimen identifications using DNA sequence data. Based on our measurements for MRL, MR1, and MR3, and our DNA confirmed species identification, we have been able to provide a reliable range of values to differentiate B. vagans and B. sandersoni. However, and perhaps not unexpected given its name, our values for B. perplexus overlapped the other two species, though generally accurate identifications can be made based on observations of hair coloration combined with malar ratios.
Although the three species we discuss in this paper are considered species of least conservation concern by ICUN criteria at this time [52], we recognize that bumble bees that were once considered common can experience rapid declines in population size and range restrictions as witnessed in Bombus affinis Cresson, B. terricola Kirby, and B. pensylvanica Degeer; species now considered Critically Endangered or vulnerable by the IUCN SSC Bumblebee Specialist Group (https://bumblebeespecialistgroup.org/north-america/). Moreover, population status for B. vagans and B. sandersoni varies considerably over different regions. For example, Jacobson et al. [11] found a significant decline in B. vagans in New Hampshire over the past 150 years. Similarly, B. vagans was considered to be in decline in Canada in 2008, but then deemed stable in 2012 because of an increase in records despite a reduction in historical range size [7,53] and was considered a candidate for additional monitoring [54]. In Illinois, Grixti et al. [55] report that B. vagans was locally extirpated from its historic southern and northern ranges. In a review of historical changes in US bees with shared ecological traits, Bartomous et al. [5] found that B. vagans exhibited a decline but that B. sandersoni was stable. Franklin [56] considered B. sandersoni to be one of the most abundant species in its range, but it was rare in New Hampshire [11], and in Canada it was considered a candidate for immediate conservation concern [52] despite its status of Least Concern on the ICUN Red List [53]. Similarly, Richardson et al. [13] noted a 53% decline in B. sandersoni although a 266% increase in B. vagans abundance in in Vermont. B. perplexus appears to be increasing overall in abundance in the US and Canada [5,53]. Goldstein and Ascher [57] suspected that B. sandersoni may be under-reported on Martha's Vineyard, MA because of its similar coloration with B. vagans, and to lesser degree, B. perplexus. Thus, conflicting reports of population trends in these cryptic species could be due in part to misclassification of specimens due to limitations in the diagnostic characteristics that were used to identify these species. This highlights our argument that accurate identifications are essential to accurately track population status and trends consistently across regions.

Conclusions
Around the globe, efforts are currently underway to examine changes in the distributions and abundance of pollinator species [17,[58][59][60], with particular focus on the status of the native species [7,52,[61][62][63][64]. However, for country and state-wide monitoring programs, or large-scale community science programs such as Bumble Bee Watch (https://www.bumblebeewatch.org/) and their associated conservation projects, providing accurate information on bumble bee population size and distribution depends on reliable identifications. The methods we present allow researchers to accurately discriminate among queens and workers of three cryptic species of bumble bee that had formerly posed a challenge to identify, particularly in worker specimens. The standardization of the MRL, MR1, and MR3 methods as we have defined them, and the range of values for each of the three species, provides an important tool to reliably identify species beyond visual characters such as hair coloration for B. perplexus and assessments of malar length in B. vagans and B. sandersoni. DNA analysis is an excellent tool to provide species identifications [65], but not everyone has access to DNA facilities or funding. In addition, care needs to be taken when interpreting DNA analyses (particularly for DNA barcoding projects) as results can often be misleading due to factors such as (but not limited to) the unintended amplification of non-target organisms (e.g., parasites, parasitoids, endosymbionts, etc.), the amplification of non-target loci (e.g., nuclear copies of mitochondrial DNA), and misidentifications in public databases (e.g., BOLD or GenBank). We are encouraged that the combination of DNA sequencing and morphological measurements was able to illuminate characters that can be used to accurately identify members of this confusing group of bumble bees, and we hope that as new technologies (such as Next-Generation sequencing) allow for high-throughput analyses of communities of organisms, that similar synergistic studies will be performed in other groups as well. The measurements we provide here to determine MRL, MR1, and MR3 values for B. sandersoni, B. vagans, and B. perplexus can be done with a minimum investment in equipment and time and provides highly accurate identification outcomes of these cryptic species required to inform effective monitoring and management decisions.

4.
Specimens must be oriented so that the body part being measured is oriented perpendicular to the axis of observation (e.g, place the specimen in a flat, horizontal position). A specimen manipulator allows for a pinned specimen to be viewed from multiple angles while remaining in a central viewing area under the microscope while allowing the viewer to maintain relative focus while twisting or turning a specimen to show all sides. A large cork can also be used, but it requires more patience to properly orient the specimen for measurement on a flat plane.

5.
To make the measurements, either rotate eyepiece to orient the reticle in a horizontal or vertical position or move the specimen. Try both methods to see which works best. Make one or more measurements to confirm that they are the same value. 6.
It is extremely important not to change the zoom between measurements. The specimen can be moved, and the focus can be adjusted between the length and width measurements, but do not change the zoom power. Each time you zoom up or down in power, the actual magnification level may be slightly different. Changing the zoom level between measurements means that you risk making your measurements at different magnification levels and the resulting ratios will be incorrect. The power of the zoom used for making measurements is not important as long as both measurements are made at the same magnification. 7.
Take care to ensure that both ends of the of the body part to be measured are in focus.

8.
Good lighting is important in making accurate determinations. A cool white LED ring lamp works well. 9.
Be careful to make sure that you are not 5 or 10 divisions off when reading the reticle measurement. This is a common error.