Insects
  • Article
  • Open Access

24 December 2025

An XGBoost-Based Morphometric Classification System for Automatic Subspecies Identification of Apis mellifera

1 Key Laboratory of Pathobiology, Ministry of Education, Jilin University, Changchun 132108, China
2 Jilin Provincial Key Laboratory of Bee Genetics and Breeding, Jilin 132108, China
3 Apiculture Science Institute of Jilin Province, Jilin 132108, China
* Authors to whom correspondence should be addressed.
This article belongs to the Special Issue Biology and Conservation of Honey Bees

Simple Summary

The reliable identification of honey bee subspecies is important for their breeding and conservation, but common approaches can be slow or expensive. We measured a compact set of routine body traits—mainly forewing angles and abdominal plate sizes—in worker bees collected under a standard protocol. Using these measurements, we built a small, easy-to-use classification tool that assigns subspecies with very high accuracy. The tool also shows which traits drive each decision so that users can understand why a specimen was assigned to a group. It runs quickly on a regular computer, accepts local data, and produces clear plots and a short list of key traits. The same steps can be retrained on new regional datasets. Our results show that routine measurements, combined with an accessible computer-based approach, can support fast screening in the lab or field and help prioritize samples for follow-up genetic testing.

Abstract

The conservation and breeding of the western honey bee (Apis mellifera) depend on accurate subspecies assignment, but the most commonly used methods are either labor-intensive (classical morphometrics) or costly (molecular assays). We developed an XGBoost-based classification framework using a compact set of routinely measurable characters. A curated dataset of labeled workers was measured under harmonized protocols; features were screened according to embedded importance, and model performance was assessed using five-fold cross-validation, in which the framework outperformed standard machine learning baselines. The resulting model, using only the top 10 characters (primarily forewing venation angles and abdominal plate metrics), achieved high performance (accuracy = 0.98; F1 = 0.99) and an area under the receiver operating characteristic curve (AUC) of 0.99 (95% CI = 0.995–0.999). SHapley Additive exPlanations (SHAP) analyses confirmed the discriminatory contributions of these features, while error inspection suggested that misclassifications were concentrated in morphologically overlapping lineages. This performance supports the model's use as a rapid triage step alongside genetic testing and gives researchers a scalable, interpretable way to create and deploy custom morphometric models, demonstrated here for A. mellifera but portable to other insect taxa.
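The evaluation protocol named in the abstract (five-fold cross-validation scored by accuracy and F1) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: a nearest-centroid classifier stands in for XGBoost so that the example runs with the standard library alone, the data are synthetic rather than bee measurements, and every function name and group label (`stratified_folds`, `cross_validate`, `group_a`, and so on) is hypothetical. AUC and SHAP are omitted.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal indices round-robin across folds
    return folds


def fit_centroids(X, y):
    """Stand-in classifier (NOT XGBoost): per-class mean of each feature."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, v in enumerate(row):
            acc[d] += v
        counts[label] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


def cross_validate(X, y, k=5, seed=0):
    """Return per-fold (accuracy, macro-F1) for the stand-in classifier."""
    results = []
    for held_out in stratified_folds(y, k, seed):
        test = set(held_out)
        train = [i for i in range(len(y)) if i not in test]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in held_out]
        y_pred = [predict(model, X[i]) for i in held_out]
        results.append((accuracy(y_true, y_pred), macro_f1(y_true, y_pred)))
    return results


def make_synthetic(n_per_class=50, seed=1):
    """Synthetic two-feature data for three hypothetical groups (not bee data)."""
    rng = random.Random(seed)
    centers = {"group_a": (0.0, 0.0), "group_b": (6.0, 0.0), "group_c": (0.0, 6.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    for fold, (acc, f1) in enumerate(cross_validate(X, y), start=1):
        print(f"fold {fold}: accuracy={acc:.3f} macro-F1={f1:.3f}")
```

Stratifying the folds keeps every group represented in each held-out split, which matters when some subspecies have few specimens. In practice the stand-in classifier would be replaced by a gradient-boosted model trained on the measured characters.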
