Semantic segmentation of UAV-acquired RGB orthomosaics is a key component for quantifying vegetation cover and monitoring phenology in precision agriculture. This study evaluates a representative set of CNN-based architectures (U-Net, U-Net Xception-Style, SegNet, DeepLabV3+) and Transformer-based models (Swin-UNet/Swin-Transformer, SegFormer, and Mask2Former) under a unified and reproducible protocol. We propose a transfer-and-consolidation workflow whose performance is assessed not only through region-overlap and pixel-wise discrepancy metrics, but also via boundary-sensitive criteria that are explicitly linked to orthomosaic-scale vegetation-cover estimation by pixel counting under GSD (Ground Sample Distance) control. The experimental design considers a transfer scenario between morphologically related crops: initial training on Opuntia spp. (prickly pear), direct ("zero-shot") inference on Agave salmiana, fine-tuning using only 6.84% of the agave tessellated set as limited target-domain supervision, and a subsequent consolidation stage to obtain a multi-species model. The evaluation integrates IoU, Dice, RMSE, pixel accuracy, and computational cost (time per image), and additionally reports the BF score and HD95 to characterize contour fidelity, which is critical when area is derived from orthomosaic-scale masks. Results show that Transformer-based approaches tend to provide higher stability and improved boundary delineation on Opuntia spp., whereas transfer to Agave salmiana exhibits selective degradation that is mitigated through low-annotation-cost fine-tuning. On Opuntia spp., Mask2Former achieves the best test performance (IoU 0.897 ± 0.094; RMSE 0.146 ± 0.002) and, after consolidation, sustains the highest overlap on both crops (IoU 0.894 ± 0.004 on Opuntia and IoU 0.760 ± 0.046 on Agave), while preserving high contour fidelity (BF score 0.962 ± 0.102 / 0.877 ± 0.153; HD95 2.189 ± 3.447 px / 8.458 ± 16.667 px for Opuntia / Agave), supporting its use for final vegetation-cover quantification. Overall, the study provides practical guidelines for architecture selection under hardware constraints, a reproducible transfer protocol, and an orthomosaic-oriented implementation that facilitates integration into agronomic and remote-sensing workflows.
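
As an illustration of how the region-overlap metrics and the pixel-counting area estimate relate, the following minimal Python sketch computes IoU, Dice, pixel accuracy, and RMSE for a pair of binary masks, and converts a vegetation mask to ground area as pixel count times GSD squared. It is not the study's implementation: the function names `overlap_metrics` and `vegetation_area_m2`, the choice to compute RMSE on binarized masks, and the 2 cm GSD in the example are all assumptions made here for illustration.

```python
import numpy as np


def overlap_metrics(pred, truth):
    """IoU, Dice, pixel accuracy and RMSE for two binary masks of equal shape.

    Assumption: masks are already thresholded; RMSE is taken on the 0/1 values.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")

    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()

    # Two empty masks agree perfectly, so both scores default to 1.0.
    iou = float(inter / union) if union else 1.0
    dice = float(2 * inter / total) if total else 1.0
    accuracy = float((pred == truth).mean())
    diff = pred.astype(float) - truth.astype(float)
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return {"iou": iou, "dice": dice, "accuracy": accuracy, "rmse": rmse}


def vegetation_area_m2(mask, gsd_m):
    """Ground area covered by the mask: vegetation pixel count times GSD^2."""
    return int(np.count_nonzero(mask)) * gsd_m ** 2


# Toy 2x4 masks (hypothetical data, not from the study).
pred = [[1, 1, 0, 0],
        [1, 1, 0, 0]]
truth = [[1, 0, 0, 0],
         [1, 1, 1, 0]]

scores = overlap_metrics(pred, truth)
area = vegetation_area_m2(pred, gsd_m=0.02)  # assumed GSD of 2 cm/pixel
print(scores, area)
```

For these toy masks the intersection is 3 pixels and the union is 5, so IoU is 0.6 and Dice is 0.75. Because area scales with the number of pixels labeled as vegetation, errors concentrated along contours translate directly into area error, which is why boundary-sensitive measures such as the BF score and HD95 are reported alongside the overlap metrics.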