Simple Summary
The reliable identification of honey bee subspecies is important for their breeding and conservation, but common approaches can be slow or expensive. We measured a compact set of routine body traits—mainly forewing angles and abdominal plate sizes—in worker bees collected under a standard protocol. Using these measurements, we built a small, easy-to-use classification tool that assigns subspecies with very high accuracy. The tool also shows which traits drive each decision so that users can understand why a specimen was assigned to a group. It runs quickly on a regular computer, accepts local data, and produces clear plots and a short list of key traits. The same steps can be retrained on new regional datasets. Our results show that routine measurements, combined with an accessible computer-based approach, can support fast screening in the lab or field and help prioritize samples for follow-up genetic testing.
Abstract
The conservation and breeding of the western honey bee (Apis mellifera) depend on accurate subspecies assignment, but the most commonly used methods are labor-intensive classical morphometrics and costly molecular assays. We developed an XGBoost-based classification framework using a compact set of routinely measurable characters. A curated dataset of labeled workers was measured under harmonized protocols; features were screened according to embedded importance, and model performance was assessed using five-fold cross-validation, in which the framework outperformed standard machine learning baselines. The resulting model, using only the top 10 characters (primarily forewing venation angles and abdominal plate metrics), achieved high performance (accuracy = 0.98; F1 = 0.99) and an area under the receiver operating characteristic curve (AUC) of 0.99 (95% CI = 0.995–0.999). SHapley Additive exPlanations (SHAP) analyses confirmed the discriminatory contributions of these features, while error inspection suggested that misclassifications were concentrated in morphologically overlapping lineages. The model's performance supports its use as a rapid triage tool alongside genetic testing, and the framework gives researchers a scalable and interpretable way to create and deploy custom morphometric models, demonstrated here for A. mellifera but portable to other insect taxa.
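The evaluation protocol named in the abstract (five-fold cross-validation reporting accuracy and F1) can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' pipeline: the nearest-centroid classifier is a deliberately simple stand-in for XGBoost, the two-feature dataset is synthetic, stratified folds and macro-averaged F1 are assumptions, and every function name is hypothetical. Importance-based feature screening and SHAP analysis are not shown.

```python
import random
from collections import defaultdict


def make_synthetic_data(n_per_class=50, seed=0):
    """Synthetic stand-in data: one wing angle (degrees), one plate width (mm)."""
    rng = random.Random(seed)
    X, y = [], []
    # Hypothetical group means; these are NOT measured values from any study.
    for label, (angle, width) in {"group_1": (90.0, 2.2), "group_2": (110.0, 2.4)}.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(angle, 2.0), rng.gauss(width, 0.05)])
            y.append(label)
    return X, y


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal indices round-robin across folds
    return folds


def fit_centroids(X, y):
    """Stand-in model: the per-class mean of each feature."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def cross_validate(X, y, k=5, seed=0):
    """Return one (accuracy, macro F1) pair per held-out fold."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [predict(model, X[i]) for i in test_idx]
        accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
        scores.append((accuracy, macro_f1(y_true, y_pred)))
    return scores


X, y = make_synthetic_data()
scores = cross_validate(X, y)
for fold, (accuracy, f1) in enumerate(scores, start=1):
    print(f"fold {fold}: accuracy={accuracy:.3f}, macro F1={f1:.3f}")
```

To adapt the sketch, one would replace `fit_centroids` and `predict` with a gradient-boosted model and load measured characters in place of `make_synthetic_data`; the fold construction and scoring would stay the same.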