Abstract
Accurate, high-throughput estimation of Above-Ground Biomass (AGB), a key predictor of yield, is a critical goal in rapeseed breeding. However, it is constrained by two key challenges: (1) traditional measurement is destructive and laborious, and (2) modern deep learning approaches require vast, costly labeled datasets. To address these issues, we present a data-efficient deep learning framework that estimates AGB (Fresh Weight, FW, and Dry Weight, DW) from smartphone-captured top-down RGB images. Our approach uses a two-stage strategy: a Vision Transformer (ViT) backbone is first pre-trained with the DINOv2 self-supervised learning (SSL) method on a large aggregated collection of diverse public plant datasets, none containing rapeseed. This pre-trained model is then fine-tuned on a small, custom-labeled rapeseed dataset (N = 833) within a Multi-Task Learning (MTL) framework that regresses FW and DW simultaneously. The MTL objective acts as a powerful regularizer, forcing the model to learn robust features related to 3D plant structure and density. Under rigorous 5-fold cross-validation, the proposed model achieved strong predictive performance for both Fresh Weight (Coefficient of Determination, R² = 0.842) and Dry Weight (R² = 0.829), significantly outperforming a range of baselines, including models trained from scratch and models pre-trained on the generic ImageNet dataset. Ablation studies confirmed the critical, synergistic contributions of both domain-specific SSL (vs. ImageNet) and the MTL framework (vs. single-task training). This study demonstrates that an SSL+MTL framework can effectively infer complex 3D plant attributes from 2D images, providing a robust, scalable tool for non-destructive phenotyping to accelerate the rapeseed breeding cycle.
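The multi-task fine-tuning objective described above can be sketched as a weighted sum of the two regression losses, so gradients from both targets flow through one shared backbone. The sketch below is illustrative only: the abstract does not specify the loss functions or task weights, so mean-squared-error heads and equal weights (`w_fw`, `w_dw`) are assumed here.

```python
def mse(pred, target):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mtl_loss(fw_pred, fw_true, dw_pred, dw_true, w_fw=0.5, w_dw=0.5):
    """Combined multi-task loss (assumed form): both regression heads
    share one backbone, so the fresh-weight and dry-weight tasks jointly
    regularize the same learned features."""
    return w_fw * mse(fw_pred, fw_true) + w_dw * mse(dw_pred, dw_true)

# Two hypothetical samples: predicted vs. ground-truth biomass (grams)
loss = mtl_loss([10.0, 20.0], [12.0, 18.0], [2.0, 4.0], [2.5, 3.5])
print(loss)  # → 2.125 (0.5 * 4.0 FW term + 0.5 * 0.25 DW term)
```

In practice the relative weighting of the two tasks is a tunable hyperparameter; equal weights are only a reasonable starting point when FW and DW are on comparable scales (e.g. after normalization).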