1. Introduction
Strawberry is a high-value fruit widely cultivated worldwide [1,2]. Strawberry planting is typically established by transplanting seedlings into the field [3,4]. During ridge-based transplanting, seedlings should be placed with the arch-back (curved) side of the new stem facing the ridge exterior, which facilitates subsequent operations such as canopy management and harvesting and improves within-canopy light interception for more uniform fruit ripening [5,6]. However, although strawberry transplanting machines have matured considerably and multiple machine types are now available, the agronomic requirement of reliable arch-back orientation remains largely unresolved. In practice, the arch-back direction is still judged by workers or adjusted through nursery-stage interventions, so the process is labor-saving but not labor-free, and the machines cannot fully satisfy strawberry planting requirements [7]. This bottleneck is particularly critical for bare-root seedlings, which are widely used because they typically cost only about one third as much as plug seedlings. In field operations, correct orientation judgment depends heavily on experienced farmers, and untrained operators need dedicated training before they can judge consistently. Manual transplanting errors may cause fruit-bearing branches to grow toward the inter-row space, reducing light exposure and increasing soil contact, which raises the risk of soil-borne disease and can lead to yield loss. Meanwhile, bare-root seedlings exhibit thin stems, large morphological variability, and frequent leaf occlusion, making reliable, real-time orientation perception difficult under practical conditions. There is therefore an urgent need for an automatic, field-deployable solution that can directly perceive stem curvature and recognize arch-back direction in real time [8,9]. In this study, we address this bottleneck with a machine-vision-based, lightweight keypoint learning approach for arch-back direction recognition of bare-root strawberry seedlings, aiming to enable orientation-aware mechanized transplanting. We formulate the problem as three-keypoint pose estimation: the model outputs three stem keypoints, from which the arch-back direction and the bending angle are computed geometrically. Thus, the final direction and angle outputs are derived from keypoint predictions rather than from a separate direction classifier or angle regressor.
Many crops have agronomic requirements for oriented planting during transplanting [10,11]. Existing oriented planting solutions can be broadly categorized into two groups. The first group relies on mechanical regulation (e.g., guiding structures, hoppers, or torque-imbalance mechanisms) to passively align seeds or seedlings during feeding and dropping [12,13,14,15,16]. These designs can be effective when the target morphology is relatively rigid and consistent, but their performance often depends on stable shape, predictable motion, and limited occlusion, which makes them less suitable for seedlings with high variability and flexible stems. The second group uses machine vision to recognize pose cues and then actively adjusts orientation using a control mechanism [14,17]. Vision-based strategies can partially address orientation perception by providing direct visual cues for pose recognition. However, many existing practical solutions still rely on nursery-stage manual interventions (e.g., pre-orientation or standardized seedling preparation) to make seedlings more uniform for mechanized transplanting and to reduce perception difficulty. Such workflows do not fully consider economic constraints and leave a gap for a robust, fast orientation perception approach that works reliably on low-cost, widely used bare-root seedlings under real operating conditions.
For strawberries, existing orientation-related studies focus mainly on plug seedlings. Sun et al. [17] developed an automatic transplanting device for plug seedlings by exploiting the relationship between stolon direction and arch-back orientation. However, the stolon–arch-back relationship is not consistent across practical propagation scenarios, especially when multiple daughter plants are produced along the stolon, so inferring arch-back direction solely from stolon orientation is not robust in real transplanting operations. In contrast, bare-root seedlings are widely used in practice because their cost is about one third that of plug seedlings [18]. Overall, existing methods still lack a robust, low-cost, real-time solution for bare-root seedling orientation under practical transplanting conditions. These limitations highlight the need for a new vision-based approach that can directly interpret the morphological arch-back characteristic of strawberry seedlings and determine its orientation. Such a method should also be fast and lightweight to support deployment on resource-constrained platforms in real field transplanting.
Accordingly, this paper proposes an ultra-lightweight deep learning network, termed Stem-YOLO, for arch-back direction recognition of bare-root strawberry seedlings. The main contributions are summarized as follows.
- (1) We constructed a new dataset of bare-root strawberry seedlings covering diverse environments and seedling types. The images were collected from multiple locations in Jiangsu Province, including Taicang, Jurong, and Yixing. The dataset also contains several cultivars such as ‘Hongyan’ and ‘Ningyu’. In total, it consists of 8076 images of bare-root strawberry seedlings captured under diverse field-like conditions to reflect practical variability.
- (2) We formulate arch-back perception as three-keypoint pose estimation and propose a geometric inference strategy to obtain both the arch-back direction and the bending angle from keypoint geometry.
- (3) We develop Stem-YOLO to achieve a favorable accuracy–efficiency trade-off for edge deployment. The model is further optimized via pruning, reducing GFLOPs by 71.2% and parameters by 26.7%, and then accelerated with TensorRT to enable real-time inference on Jetson Orin NX (154.3 FPS, 6.48 ms per frame, INT8).
- (4) We establish a vision-based physical validation platform to evaluate both bending-angle estimation and agronomically compliant orientation adjustment. On 100 bare-root strawberry seedlings, the proposed method achieves a 93% agronomically compliant orientation rate, and the bending-angle estimation yields an MAE of 5.74° and an RMSE of 7.44°.
2. Materials and Methods
2.1. Construction of Datasets
To meet the agronomic requirement of oriented transplanting in ridge-based strawberry production, bare-root seedlings should be placed with the arch-back (curved) side facing the ridge exterior (Figure 1a). In current practice, this orientation is often judged and adjusted manually during planting (Figure 1b), which is labor-intensive and difficult to standardize for mechanized transplanting. To support automatic arch-back direction recognition under practical transplanting conditions, we constructed a dedicated image dataset for training and evaluating the proposed keypoint-based model, covering seedling detection, keypoint localization, and subsequent curvature-based direction inference. The data were collected across multiple locations and cultivars to capture realistic variations in seedling morphology, bending angles, leaf occlusion, and complex backgrounds that commonly occur in field-like operations. The dataset was collected at several locations in Jiangsu Province, including Jurong, Yixing, and Taicang (Figure 1c). The main study area is Jurong, where the dominant cultivar is ‘Hongyan’ strawberry. The data were acquired in a single strawberry greenhouse at the Da Xiaohua family farm in Jurong. This greenhouse contains eight planting beds; each bed is 100 m long, with a bed-top width of 40 cm, a bed-bottom width of 65 cm, and a bed height of 30 cm. Images were captured using a Canon 7D camera (Canon Inc., Tokyo, Japan) equipped with an EF-S 18–135 mm lens in the visible RGB spectrum. During acquisition, the camera was used under fixed imaging conditions in the greenhouse, with typical settings of approximately 35 mm focal length, f/8 aperture, 1/125 s shutter speed, and ISO 400, so as to maintain consistent image quality. The acquisition took place on 9 September 2024 (14:00–17:00) under natural greenhouse illumination (no additional artificial lighting). The camera was positioned approximately 40 cm above the bare-root strawberry seedlings with a vertical viewing angle to maintain consistent imaging conditions. The dataset initially contained 2979 raw images of bare-root seedlings from five cultivars prepared for planting, including ‘Hongyan’ and ‘Ningyu’. After manual screening and cleaning, 2019 valid images were retained.
2.2. Dataset Labeling and Enhancement
The experimental images were annotated using the Labelme tool (https://github.com/tzutalin/labelImg, accessed on 5 March 2025). The labels include the class ID of the bare-root strawberry seedling, the center coordinates of the bounding box, and the coordinates and visibility of the three keypoints. An example of the annotation format is shown in Figure 2a, and the spatial distribution of the three keypoints is illustrated in Figure 2b. All annotations were performed by the author following a unified labeling guideline, and a random subset of the labeled images was rechecked to ensure annotation consistency and correct possible errors. Point P1 is located at the midpoint of the axillary bud differentiation region at the upper end of the new stem. Point P2 is located at the midpoint of the junction between the new stem and the crown (rhizome). Point P3 is located at the midpoint of the distal end of the crown. These keypoints were defined at anatomically stable landmarks, and the same labeling rule was applied consistently across all images to reduce ambiguity. In mechanized transplanting of bare-root strawberry seedlings, each seedling is recognized and transplanted individually, so keypoint occlusion does not arise. The visibility of all three keypoints is thus set to 2, indicating that every keypoint is visible [19].
To better match actual planting conditions and to alleviate sample imbalance while increasing the diversity of training data, the experimental images were augmented using the ImgAug 3.2 software (https://github.com/Fafa-DL/Image-Augmentation, accessed on 9 March 2025). Brightness adjustment and motion blur were applied to enrich the dataset and reduce the risk of overfitting. After augmentation, the dataset was expanded to 8076 images. The dataset was then randomly divided into three subsets at a ratio of 7:2:1, resulting in 5653 images for training, 1615 images for validation, and 808 images for testing. This partition strategy helps avoid bias and overfitting and provides representative data distributions for training, validation, and testing [20].
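The 7:2:1 split described above can be reproduced with a short script. This is a minimal sketch; the file names and random seed below are hypothetical placeholders, not the authors' actual pipeline.

```python
import random

def split_dataset(paths, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle once with a fixed seed, then cut at the 7:2:1 boundaries."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n_train = round(len(paths) * ratios[0])
    n_val = round(len(paths) * ratios[1])
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

# 8076 augmented images -> 5653 / 1615 / 808, matching the counts above.
train, val, test = split_dataset([f"img_{i:05d}.jpg" for i in range(8076)])
```

Fixing the seed makes the partition reproducible across runs, which matters when ablation experiments must be compared on identical validation and test subsets.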
2.3. Test Environments
The experiments in this paper were conducted on an Ubuntu 20.04 operating system with 32 GB of RAM, an NVIDIA RTX 4070 Ti GPU, and a 13th Gen Intel® Core™ i7-13700F CPU running at 2.1 GHz. The deep learning framework used was PyTorch 2.2.2 with torchvision 0.17.2, compiled against CUDA 12.1 and cuDNN 8.8.1. The detailed experimental environment is summarized in Table 1. For a fair and comparable evaluation, all ablation and comparative experiments were trained from scratch without pretrained weights. The main training hyperparameters for the models are summarized in Table 2.
2.4. Stem-YOLO Network Model
YOLOv11 introduces the C3k2 module and the C2PSA module. These modules reduce computational complexity and improve the efficiency and accuracy of feature extraction, so YOLOv11 performs well in real-time detection tasks. These improvements also provide a solid foundation for extending YOLOv11 to pose estimation, especially for accurate keypoint detection in complex environments [21]. Based on these advantages, this study modifies YOLOv11-Pose within this framework and constructs the Stem-YOLO network for arch-back direction detection of strawberry seedlings, aiming at higher efficiency and accuracy in keypoint detection [22,23]. The overall network architecture is shown in Figure 3.
2.4.1. KWConv and C3K2_KW Feature Enhancement Module
Dynamic convolution has stronger feature modeling capability than conventional static convolution. However, it requires n groups of convolution kernels for linear combination, which greatly increases the number of parameters and the model size, and this overhead limits its application in practical networks. To address this problem, we introduced Kernel Warehouse Convolution (KWConv) into the YOLOv11 model. As shown in Figure 4, a standard convolution kernel is divided into several smaller kernel cells along the spatial and channel dimensions. A shared kernel warehouse is then constructed within the same stage. An attention mechanism assigns weights to local kernels in the warehouse and generates an equivalent convolution kernel that adapts to the input. In this way, representational ability and parameter utilization efficiency are significantly improved under a fixed parameter budget [24]. Based on this mechanism, KWConv is adopted in the backbone to replace conventional convolution. This allows the backbone network to describe fine-grained structural features more flexibly while reducing redundant computation. As a result, the overall GFLOPs are reduced from 6.6 to 3.9, which is more favorable for subsequent edge deployment.
We further improve the original C3K2 module and construct a C3K2_KW feature enhancement module based on KWConv. As shown in Figure 5, C3K2_KW preserves the residual topology of the C3 structure; only the key 3 × 3 convolutions in the main branch are replaced with KWConv. Through kernel partitioning and cross-layer sharing, the module forms richer dynamic kernel combinations inside the block and strengthens multi-scale and hierarchical feature representations. The ablation study further confirms that introducing KWConv and C3K2_KW substantially reduces GFLOPs while keeping mAP50:95 at a level comparable to the YOLO11n baseline. These results verify the effectiveness and efficiency of the proposed feature enhancement module.
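The attention-weighted mixing at the heart of KWConv can be sketched as follows. This is a minimal NumPy illustration under assumed shapes: the warehouse size, cell dimensions, and attention logits are placeholders, and the full KWConv additionally partitions kernels along spatial and channel dimensions and shares the warehouse across a stage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical warehouse: M kernel cells shared within one stage.
M = 6
warehouse = rng.standard_normal((M, 4, 4, 3, 3))  # (M, C_out, C_in, k, k)

def assemble_kernel(warehouse, logits):
    """Softmax-normalize the attention logits over the warehouse cells,
    then form the input-adaptive equivalent kernel as their convex mix."""
    a = np.exp(logits - logits.max())
    a = a / a.sum()
    return np.tensordot(a, warehouse, axes=1)  # weighted sum over cells

kernel = assemble_kernel(warehouse, rng.standard_normal(M))
```

Because the attention logits are produced from the input in the real module, a different equivalent kernel is assembled per input while the stored parameters remain the fixed warehouse cells.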
2.4.2. ELA-HSFPN Multi-Scale Feature Fusion Module
In the original YOLO11 neck, the FPN/PAN structure mainly relies on stepwise upsampling, downsampling, and convolutional concatenation to achieve multi-scale feature fusion. It lacks explicit modeling of salient regions at different spatial positions and scales. As a result, it may introduce irrelevant responses in complex backgrounds and fine-grained structures, such as the arch-back region of bare-root strawberry seedlings. To enhance the multi-scale representation ability of the neck, this study introduces an Efficient Local Attention (ELA) module on top of the hierarchical feature pyramid network (HSFPN). The resulting ELA-HSFPN structure adaptively recalibrates and enhances multi-scale features.
As shown in Figure 6, the ELA module first performs strip pooling on the input feature map along the horizontal and vertical directions. This operation generates two sets of one-dimensional feature vectors that contain long-range positional information. These vectors are then processed by one-dimensional convolutions and group normalization to achieve local interaction and enhancement. After nonlinear activation, horizontal and vertical positional attention maps are obtained. The two attention maps are multiplied to form a complete spatial attention map, which is then applied to the original features through element-wise multiplication for reweighting [25]. Compared with Coordinate Attention, ELA does not reduce the channel dimension during attention generation, and it uses 1D convolution and group normalization instead of 2D convolution and batch normalization. In this way, ELA maintains a lightweight design while balancing long-range dependency modeling and good generalization performance for small models.
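The strip-pooling attention pattern described above can be sketched in NumPy. For brevity, this illustration replaces the learnable 1-D convolution and group normalization with an identity mapping, so it shows only the pooling, gating, and reweighting structure, not the trained module.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ela_sketch(x):
    """ELA-style positional attention on a (C, H, W) feature map."""
    ph = x.mean(axis=2)            # strip pooling along width  -> (C, H)
    pw = x.mean(axis=1)            # strip pooling along height -> (C, W)
    # A learnable 1-D conv + GroupNorm would act on ph/pw here (omitted).
    ah = sigmoid(ph)[:, :, None]   # horizontal attention map, (C, H, 1)
    aw = sigmoid(pw)[:, None, :]   # vertical attention map,   (C, 1, W)
    return x * ah * aw             # element-wise reweighting

x = np.random.default_rng(1).standard_normal((8, 16, 16))
y = ela_sketch(x)
```

Because the two 1-D attention maps are broadcast and multiplied, the full 2-D spatial attention is formed without ever materializing a dense H × W attention tensor per channel, which is the source of ELA's low overhead.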
Based on the above idea, the ELA module is serially inserted at each key fusion node of HSFPN to form the improved ELA-HSFPN structure. HSFPN is responsible for top-down and bottom-up hierarchical multi-scale feature propagation and fusion. Before the fused features at each scale are fed into the next convolutional layer or the detection head, they are first recalibrated by the positional attention of ELA. This process highlights responses related to the strawberry seedling targets and their arch-back structures while suppressing responses from the soil background and noisy textures. In this study, we adopt a lightweight ELA configuration recommended for object detection tasks and apply it only at a moderate depth in the neck. In this way, the multi-scale feature representation is effectively strengthened, and the robustness and accuracy of arch-back recognition are improved.
2.4.3. LSCD-LQE Quality-Aware Ultra-Lightweight Detection Head
As shown in Figure 7, we redesign a lightweight detection head named LSCD-LQE. It consists of a Lightweight Shared Convolutional Detection (LSCD) structure and a Localization Quality Estimation (LQE) branch. The aim is to reduce the parameters and computation of the detection head while making the predicted scores better reflect the true localization quality of the candidate boxes. In LSCD, the convolutional structure of the multi-scale decoupled head is unified: feature maps at different scales share a shallow convolutional backbone [26]. Only a small number of 3 × 3 convolutions and 1 × 1 prediction convolutions are kept to output classifications, bounding boxes, and keypoints. Reusing the same kernels across scales greatly reduces the parameters and convolutional computation, which suits embedded platforms with limited resources. To maintain stable optimization and good classification/localization performance under small-batch training, Group Normalization (GN) is used instead of Batch Normalization; previous work on FCOS has shown that GN improves the convergence stability and accuracy of the regression and classification branches [26]. Shared convolutions may cause different response magnitudes at different scales, so an independent Scale layer is added to each scale branch to apply a learnable scaling factor to the shared features. This adaptive scaling reduces distribution differences between targets of different sizes and improves multi-scale detection robustness. By combining GN, shared convolutions, and Scale layers, LSCD significantly reduces the complexity of the detection head while limiting the negative impact on detection accuracy.
Following the “distributional boundary modeling and distribution-guided quality estimation” idea in Generalized Focal Loss V2, an LQE branch was incorporated. This branch explicitly models the spatial localization quality of candidate boxes. For each prediction location, the network no longer regresses a single boundary offset. Instead, it predicts, for each of the four sides, a probability distribution over a set of discrete sampling points {c_0, …, c_n}:

P^k = (p_0^k, p_1^k, …, p_n^k), (1)

where k ∈ {l, r, t, b} indexes the four box sides (left, right, top, and bottom), {c_0, …, c_n} denotes the set of discrete sampling points (bins) used to represent the generalized distribution, c_j is the j-th bin value, and p_j^k is the predicted probability mass assigned to c_j for side k, satisfying p_j^k ≥ 0 and Σ_{j=0}^{n} p_j^k = 1. The predicted boundary value for side k is then obtained as the discrete expectation:

d^k = Σ_{j=0}^{n} p_j^k c_j. (2)
In this way, both the numerical estimate and its uncertainty are encoded. In LSCD-LQE, the quality estimation branch concatenates the four directional discrete distributions along the channel dimension. It then extracts several simple statistics to form a low-dimensional feature vector. This vector is passed through a lightweight mapping composed of a 1 × 1 convolution and a nonlinear activation to produce a scalar localization quality score q. The score q is multiplied element-wise with the class confidence vector c from the classification branch to obtain a joint score J = c · q, which is used for loss supervision and candidate box ranking during inference [27]. Through the joint design of lightweight shared convolution (LSCD) and distribution-guided quality estimation (LQE), the LSCD-LQE head achieves a significant reduction in parameters and GFLOPs. At the same time, it improves the consistency between the predicted scores and the true IoU, thus minimizing accuracy loss on resource-constrained platforms.
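The distribution-to-boundary step of Equation (2) and the quality-weighted scoring can be illustrated with a small sketch; the bin values, logits, and score values below are arbitrary examples, not the network's actual outputs.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def expected_boundary(logits, bins):
    """Discrete expectation d = sum_j p_j * c_j over the bin values."""
    return sum(p * c for p, c in zip(softmax(logits), bins))

bins = list(range(8))                    # example bin values c_0..c_7
d = expected_boundary([0.0] * 8, bins)   # uniform distribution -> 3.5

# Joint score: the localization quality q reweights the class confidence.
q, cls_conf = 0.8, 0.95
joint = cls_conf * q
```

A flat distribution yields the midpoint of the bin range, while a sharply peaked distribution collapses toward a single bin; the spread of the distribution is what the LQE statistics summarize into the quality score q.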
2.5. Model Pruning
Model pruning can effectively reduce the number of parameters, computational cost, and memory footprint of a network. It also lowers the hardware requirements during deployment. In this study, the improved Stem-YOLO model is compressed using a Layer-wise Adaptive Magnitude-based Pruning (LAMP) strategy to obtain a more lightweight architecture.
LAMP is a magnitude-based pruning criterion that works in a layer-adaptive manner [28]. It does not require extra hyperparameter tuning and can be applied to a wide range of network architectures and datasets. The idea is to rescale the magnitude of each weight using a model-level distortion metric, so that pruning focuses on reducing output distortion after weights are removed.
For the weight vector W of one layer (the layer's weights flattened into a vector), let W[u] denote the u-th weight after sorting all weights in ascending order of magnitude. The LAMP score of W[u] is defined as

score(u; W) = W[u]² / Σ_{v ≥ u} W[v]².

Here, the denominator is the sum of squared weights from index u to the end of the sorted list in that layer. Because the denominator shrinks as u increases while the numerator grows, larger weights always receive higher LAMP scores, while relatively small weights obtain lower scores. The pruning decision is therefore to remove weights in ascending order of their LAMP scores until the target sparsity is reached.
As shown in Figure 8, LAMP scores are computed for all weights in the network and then sorted globally. A global threshold is applied to these scores to reach a target overall sparsity, so that different layers automatically obtain different pruning ratios. This global LAMP-based pruning scheme avoids manual per-layer sparsity design and improves the retraining performance of the pruned model.
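The layer-wise scoring rule can be expressed compactly. The sketch below computes LAMP scores for one layer's flattened weights (the values are illustrative); in the full scheme these scores are pooled across all layers and thresholded globally.

```python
def lamp_scores(weights):
    """LAMP score per weight: sort by magnitude (ascending) and divide each
    squared weight by the suffix sum of squared weights in that layer."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    sq = [weights[i] ** 2 for i in order]
    suffix, ranked = 0.0, [0.0] * len(sq)
    for u in range(len(sq) - 1, -1, -1):   # suffix sums from the largest down
        suffix += sq[u]
        ranked[u] = sq[u] / suffix
    scores = [0.0] * len(weights)
    for rank, i in enumerate(order):       # map back to original positions
        scores[i] = ranked[rank]
    return scores

scores = lamp_scores([0.1, -0.5, 0.3, 2.0])
```

The largest-magnitude weight in a layer always scores exactly 1.0 (its suffix sum is its own square), which is why at least one weight per layer survives any global threshold below 1.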
2.6. Arch-Back Identification of Bare-Root Strawberry Seedlings Based on a Three-Keypoint Pose Model
Stem-YOLO performs three-keypoint detection on each bare-root strawberry seedling. For an input image, it outputs the pixel coordinates of the three keypoints P1 = (x1, y1), P2 = (x2, y2), and P3 = (x3, y3), which are then used for geometric inference of the arch-back direction and bending angle. The downstream task outputs are obtained by geometric computation: the stem bending angle is computed from the three keypoints, and the arch-back direction is determined by the sign of the lateral projection score (out-of-ridge vs. in-ridge; Equation (12)) computed from the fitted Bézier curve. Point P1 is located at the midpoint of the axillary bud differentiation region at the upper end of the new stem. Point P2 is located at the transition between the new stem and the crown. Point P3 is located at the midpoint of the distal end of the crown. These three points are distributed along the skeleton line of the arch-back contour. Therefore, arch-back recognition can be formulated as a geometric inference problem based on the three detected keypoints.
Based on the field investigation of 500 seedlings (Section 2.8), arch-back direction inference becomes unstable when the stem is nearly straight or excessively bent. Therefore, an angle-based screening rule was introduced to filter out postures that are geometrically unreliable for direction judgment. Let P1, P2, and P3 denote the three detected keypoints. The vectors v1 = P2 − P1 and v2 = P3 − P2 represent two consecutive stem segments. The included angle ϕ between these two vectors is computed as

ϕ = arccos( (v1 · v2) / (‖v1‖ ‖v2‖ + ε) ),

where ε is a small constant for numerical stability. We use the supplementary angle θ = 180° − ϕ as the stem bending angle. A sample is accepted only when θ falls within the screening interval determined from the field investigation (equivalently, when ϕ falls within the complementary interval). Otherwise, the posture is excluded from arch-back direction evaluation.
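The angle computation above can be written directly from the keypoint coordinates. This is a plain-Python sketch with illustrative sample points; under this definition θ = 180° corresponds to a perfectly straight stem, and smaller θ means a sharper bend.

```python
import math

def bending_angle(p1, p2, p3, eps=1e-8):
    """theta = 180 - phi, where phi is the angle between the consecutive
    segment vectors v1 = P2 - P1 and v2 = P3 - P2."""
    v1 = (p2[0] - p1[0], p2[1] - p1[1])
    v2 = (p3[0] - p2[0], p3[1] - p2[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    denom = math.hypot(*v1) * math.hypot(*v2) + eps
    phi = math.degrees(math.acos(max(-1.0, min(1.0, dot / denom))))
    return 180.0 - phi

straight = bending_angle((0, 0), (0, 1), (0, 2))   # collinear keypoints
bent = bending_angle((0, 0), (0, 1), (1, 1))       # right-angle bend
```

Clamping the cosine into [−1, 1] before `acos` guards against floating-point round-off when the segments are exactly collinear.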
For valid samples, we estimate the arch-back direction from the bending tendency of the arch-back skeleton. The three keypoints (P1, P2, P3) are used to fit a quadratic Bézier curve according to (7),

B(t) = (1 − t)² P1 + 2t(1 − t) P2 + t² P3, t ∈ [0, 1]. (7)

The second derivative of a quadratic Bézier curve is constant and can be written as (8)

B″(t) = 2(P1 − 2P2 + P3). (8)

We therefore define

a = P1 − 2P2 + P3,

which is proportional to B″(t) and encodes the curve’s bending tendency implied by the three keypoints. The arch-back direction is then decided in a stem-centric coordinate system. The stem-axis unit vector is

u = (P3 − P1) / ‖P3 − P1‖,

and its in-plane unit normal (lateral axis) is

n = (−u_y, u_x).

Finally, the signed lateral projection score is computed as

s = a · n, (12)

where s ∈ ℝ is a scalar used for arch-back direction classification. If s > 0, the arch-back is classified as bending toward the positive lateral direction defined by n; if s < 0, it is classified as bending toward the negative lateral direction −n. The sign of n is kept consistent for all samples. Specifically, under the fixed camera pose and known ridge orientation, the positive lateral direction is defined to coincide with the out-of-ridge direction. Therefore, s > 0 indicates that the arch-back faces outward, whereas s < 0 indicates that it faces inward relative to the ridge, as illustrated in Figure 9. In practical operation, the outward-facing direction is taken as the default target orientation. If s < 0, the pose-adjustment mechanism is triggered to perform a 180° reorientation so that the seedling is rotated to the outward-facing configuration before transplanting.
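The direction decision of Equation (12) reduces to a few lines of arithmetic. The sketch below uses illustrative keypoints, and mapping the sign of s to "outward" vs. "inward" assumes the calibrated camera-and-ridge convention described above.

```python
import math

def lateral_score(p1, p2, p3):
    """s = a . n, where a = P1 - 2*P2 + P3 (proportional to the quadratic
    Bezier second derivative) and n is the 90-degree rotation of the
    stem-axis unit vector u = (P3 - P1) / ||P3 - P1||."""
    ax = p1[0] - 2 * p2[0] + p3[0]
    ay = p1[1] - 2 * p2[1] + p3[1]
    ux, uy = p3[0] - p1[0], p3[1] - p1[1]
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm
    nx, ny = -uy, ux               # in-plane unit normal (lateral axis)
    return ax * nx + ay * ny

s_right = lateral_score((0, 0), (0.5, 1), (0, 2))   # stem bulging one way
s_left = lateral_score((0, 0), (-0.5, 1), (0, 2))   # mirrored bulge
```

Mirroring the middle keypoint across the stem axis flips only the sign of s, which is exactly the property the binary out-of-ridge/in-ridge decision relies on.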
2.7. Assessment of Indicators
The choice of evaluation metrics takes into account the characteristics of the arch-back recognition task for bare-root strawberry seedlings. In this task, there is only a single target class, and the target shape and spatial distribution are relatively fixed. As a result, the differences between samples in the dataset are limited. Under these conditions, commonly used metrics such as precision (P), recall (R), and mAP50 tend to become saturated across different comparison models. The numerical differences are very small and cannot fully reflect the performance changes brought by modifications to the network structure. To avoid this saturation effect, this study adopts COCO-style mAP50:95 as the main accuracy metric. This indicator provides a more comprehensive description of the overall performance of the model in terms of localization accuracy and arch-back posture discrimination. At the same time, the main focus of this work is lightweight design for edge deployment. Therefore, in addition to accuracy metrics, we also introduce complexity metrics such as GFLOPs, the number of parameters (Params), and model size. These metrics are used to quantify the improvements in computational cost and storage overhead. The subsequent analysis and discussion mainly focus on mAP50:95 together with GFLOPs, parameter count, and model size. In this way, we highlight the advantage of achieving substantial model compression and deployability while maintaining high accuracy.
Precision measures the proportion of predicted detections that are correct, and its calculation is given in Equation (13):

P = TP / (TP + FP). (13)

In this context, true positives (TP) represent the number of bare-root strawberry seedling instances that are correctly detected with valid keypoint predictions. False positives (FP) and false negatives (FN) correspond to the number of non-targets incorrectly detected as targets and the number of true targets missed by the model, respectively. Recall quantifies the proportion of real targets that are successfully detected by the model, and its value is computed according to (14):

R = TP / (TP + FN). (14)

AP corresponds to the area under the precision–recall curve. A value closer to 1 indicates better detection performance, and its computation is given in (15):

AP = ∫₀¹ P(R) dR. (15)

mAP50:95 is obtained by averaging the AP values over IoU thresholds from 0.50 to 0.95 with a step of 0.05. A value closer to 1 indicates better overall detection performance, and its computation is given in (16):

mAP50:95 = (1/10) Σ_{IoU=0.50}^{0.95} AP_IoU. (16)

FLOPs denotes the number of floating-point operations required for a single convolutional layer and is computed according to (17):

FLOPs = 2 × H_out × W_out × (C_in × K² + 1) × C_out, (17)

where C_in and C_out are the input and output channel numbers, K is the kernel size, and H_out × W_out is the resolution of the output feature map. The overall computational complexity of the network is expressed in GFLOPs, obtained by summing the FLOPs of all layers and dividing by 10⁹, as shown in (18):

GFLOPs = (Σ_l FLOPs_l) / 10⁹. (18)

Model lightweighting is assessed by the number of parameters (Params), whose value for a single convolutional layer is computed as shown in (19):

Params = (C_in × K² + 1) × C_out. (19)
Additionally, the model size serves as an indicator of how easily the model can be deployed and how suitable it is for deployment on lightweight devices [29].
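The complexity formulas can be checked with a small helper. Note that FLOP-counting conventions vary across papers (the factor of 2 for multiply–add pairs and the bias term are one common choice), so treat this as an illustrative sketch rather than the exact counter used by any particular framework.

```python
def conv_params(c_in, c_out, k):
    """Parameters of one k x k convolution with bias: (C_in*K^2 + 1) * C_out."""
    return (c_in * k * k + 1) * c_out

def conv_flops(c_in, c_out, k, h_out, w_out):
    """FLOPs of one conv layer: 2 * H_out * W_out * (C_in*K^2 + 1) * C_out."""
    return 2 * h_out * w_out * (c_in * k * k + 1) * c_out

def total_gflops(flops_per_layer):
    """Network GFLOPs: sum over all layers divided by 1e9."""
    return sum(flops_per_layer) / 1e9

p = conv_params(3, 16, 3)               # first-layer example: RGB -> 16 ch
f = conv_flops(3, 16, 3, 320, 320)      # assuming a 320 x 320 output map
```

Summing such per-layer counts over a network and dividing by 10⁹ reproduces the GFLOPs figures reported in the ablation tables.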
2.8. Physical Validation Setup and Agronomic Acceptance Criterion
To assess the practical performance of the proposed method, both a physical validation platform and a field-based agronomic survey were employed. The physical validation platform is shown in Figure 10, while the agronomic field investigation used to determine the acceptable orientation tolerance is illustrated in Figure 11.
As shown in Figure 10a, each bare-root strawberry seedling was fixed in a transparent acrylic cylinder mounted on a rotating platform. The platform was driven by a servo motor controlled by an NVIDIA Jetson Orin NX 16 GB (SUPER) module (NVIDIA Corporation, Santa Clara, CA, USA), which also served as the embedded inference platform. The module is equipped with 1024 CUDA cores, 16 GB of LPDDR5 memory, and up to 157 TOPS of AI computing power. An Intel RealSense D435i camera (Intel Corporation, Santa Clara, CA, USA) was positioned in front of the cylinder to acquire frontal RGB images of the stem region. During validation, the Jetson Orin NX controlled the servo motor to rotate the platform through 360°, while the RealSense D435i continuously captured images for real-time inference. Based on the detected keypoints, Stem-YOLO identified the arch-back direction at different viewing angles and recorded the viewing angle yielding the most reliable recognition result. The system then calculated the required rotation angle and adjusted the seedling to the desired orientation.
To determine whether the adjusted orientation satisfied practical agronomic requirements, a field survey was conducted at 15:00 on 12 January 2025 at the Da Xiaohua Family Farm in Jurong, Zhenjiang, during the strawberry fruit-maturity stage. As shown in Figure 11a, errors in manual transplanting can result in an improper fruit-bearing orientation, meaning that the fruit may no longer remain on the outer side of the ridge as required by agronomic practice. Therefore, a total of 500 strawberry plants were investigated, and any plant whose fruit-bearing side remained outside the ridge was regarded as satisfying the agronomic planting requirement. As shown in Figure 11b, the orientation deviation angle γ was measured as the angle between the actual arch-back planting direction and the ideal planting direction, which is perpendicular to the ridge line. Statistical analysis showed that the agronomic requirement could still be satisfied when γ remained within ±11.05°. Accordingly, in the present validation experiment, the final seedling orientation was considered agronomically acceptable when the adjusted arch-back direction fell within this tolerance range.
After the platform stopped, the final arch-back direction was checked by manual visual inspection to determine whether the orientation result was correct. Meanwhile, the stem bending angle was calculated from the three detected keypoints. This estimated angle was then compared with the angle measured manually using a protractor. Based on these comparisons, the arch-back orientation accuracy, mean absolute error (MAE), root mean square error (RMSE), mean relative error (MRE), and standard deviation were used to evaluate the practical performance of the proposed method. The corresponding deployment and physical validation results are presented in Section 3.3.
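One plausible way to obtain the stem bending angle from the three detected keypoints is the included angle at the middle keypoint between the two stem segments (180° for a perfectly straight stem); the exact definition used by the system may differ, so the sketch below is illustrative only.

```python
import math

def bending_angle(p_top, p_mid, p_bot):
    """Included angle (degrees) at the middle keypoint between the
    segments mid->top and mid->bottom; 180 means a straight stem."""
    v1 = (p_top[0] - p_mid[0], p_top[1] - p_mid[1])
    v2 = (p_bot[0] - p_mid[0], p_bot[1] - p_mid[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_a = max(-1.0, min(1.0, dot / norm))  # clamp for numerical safety
    return math.degrees(math.acos(cos_a))
```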
3. Results
3.1. Ablation Experiment
As shown in Table 3, each module affects accuracy and computational cost differently. KWConv improves accuracy from 0.845 to 0.856 while substantially reducing computational cost (GFLOPs: 6.6 → 3.9), with only a small increase in the number of parameters. LSCD-LQE also improves accuracy (0.845 → 0.858) while reducing the number of parameters and model size. When KWConv and LSCD-LQE are combined, the model reaches the highest mAP50:95 (0.866) with a relatively low computational cost (3.1 GFLOPs).
This trade-off is more clearly illustrated in Figure 12, where the combined KWConv + LSCD-LQE configuration lies in the favorable upper-left region of the accuracy-efficiency plot, indicating better accuracy with lower computational cost. To further show the relative effect of each module, Figure 13 presents the change in mAP50:95 compared with the baseline. It can be seen that KWConv, LSCD-LQE, and their combination all produce positive accuracy gains, whereas ELA-HSFPN alone leads to a slight decrease in accuracy. In contrast, Figure 14 shows the corresponding change in GFLOPs relative to the baseline, demonstrating that most variants reduce computational cost, with the largest reduction observed for KWConv + LSCD-LQE and the final Stem-YOLO configuration.
Taken together, Figure 12, Figure 13 and Figure 14 show that different modules contribute differently to the balance between accuracy and efficiency, and they visually support the overall comparison among the tested configurations. Among these modules, ELA-HSFPN shows a somewhat different behavior and therefore requires further discussion.
When ELA-HSFPN is used alone, mAP50:95 decreases slightly (0.845 → 0.827), while GFLOPs, parameters, and model size are reduced at the same time. This suggests that ELA-HSFPN mainly acts as a lightweight neck for multi-scale feature fusion. Its primary value is efficiency-oriented feature aggregation, rather than an isolated gain in accuracy under the current setting.
Its contribution is more visible on hard samples. Figure 15 visualizes Grad-CAM heatmaps for three visually ambiguous bare-root seedlings. Compared with the baseline and partial variants, the full Stem-YOLO produces more concentrated responses around the crown–stem transition and root region, which are the key areas for keypoint localization and arch-back inference. At the same time, spurious activations on background and leaf texture are suppressed. Based on Table 3 and Figure 12, Figure 13, Figure 14 and Figure 15, KWConv + ELA-HSFPN + LSCD-LQE is selected as the final configuration. It keeps accuracy close to the baseline (0.841 vs. 0.845) while reducing GFLOPs by more than half (6.6 → 3.1), and it also decreases parameters and model size.
After determining the final architecture, we further applied deployment-oriented compression to reduce the practical compute and memory footprint. Specifically, LAMP was performed on the selected Stem-YOLO model. As shown in Table 4, a speed-up factor of 1.6 was chosen as a balanced pruning level. At this setting, mAP50:95 remained 0.870, which was still higher than the unpruned Stem-YOLO baseline (0.841). Meanwhile, GFLOPs were reduced from 3.1 to 1.9 (−38.7%), the number of parameters decreased from 2.44 M to 1.95 M (−20.3%), and the model size shrank from 5.0 MB to 4.3 MB. These results indicate that the pruned model offers a better accuracy–efficiency trade-off for edge deployment.
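For reference, the LAMP score used here (Lee et al. [28]) normalizes each squared weight by the sum of squared weights of equal or larger magnitude within the same layer, which makes scores comparable across layers for global magnitude pruning. A minimal, framework-free sketch:

```python
def lamp_scores(layer_weights):
    """LAMP score for each weight in one layer:
    score_i = w_i^2 / sum of w_j^2 over all j with |w_j| >= |w_i|.
    The largest-magnitude weight in every layer scores 1.0."""
    sq = [w * w for w in layer_weights]
    order = sorted(range(len(sq)), key=lambda i: sq[i], reverse=True)
    scores = [0.0] * len(sq)
    running = 0.0
    for i in order:  # accumulate from largest magnitude downward
        running += sq[i]
        scores[i] = sq[i] / running
    return scores

# Global pruning then removes the lowest-scored weights across all layers
# until the target sparsity (speed-up factor) is reached.
```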
Finally, GFLOPs reflect theoretical cost rather than hardware latency, and the end-to-end runtime on embedded devices is further affected by deployment overhead (e.g., memory access and operator scheduling). The structured pruning described above and the TensorRT acceleration reported in Section 3.3 are therefore applied to improve practical inference speed and reduce latency.
3.2. Comparison Experiments
To ensure reproducibility, the YOLO-based baselines (YOLOv8/YOLO11/YOLO12-pose) were trained using the official Ultralytics v8.3.9 implementation. The OpenMMLab-based baselines (HRNet, RTMPose, and LiteHRNet) were trained using MMpose v1.3.2 with MMEngine v0.10.4 and MMCV v2.1.0, following the official training pipelines, where only dataset-related fields were adapted. All models used the same strawberry seedling dataset with an identical train/val/test split and the same three-keypoint annotation protocol.
Due to differences in the standard settings of the compared frameworks, HRNet/RTMPose/LiteHRNet used the MMpose default input size (256 × 192), while YOLO-based models used 640 × 640. For a fair and interpretable complexity comparison, GFLOPs were measured under a unified input size of 640 × 640 (forward pass only) for all models, whereas parameter count and model size are input-independent. All compared models were trained from scratch on the target dataset to keep a consistent training setting across methods and to avoid potential bias introduced by different pretraining sources.
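The reason a unified input size matters is that convolutional FLOPs scale with the output feature-map area, so the same network measured at 256 × 192 and at 640 × 640 yields very different totals. A simplified single-layer estimate under the common 2-FLOPs-per-multiply-accumulate convention (the layer shapes are illustrative, not taken from the compared models):

```python
def conv2d_flops(c_in, c_out, k, h_out, w_out, groups=1):
    """Forward-pass FLOPs of one Conv2d layer (bias ignored):
    2 FLOPs per multiply-accumulate."""
    macs = (c_in // groups) * k * k * c_out * h_out * w_out
    return 2 * macs

# A 3x3 conv from 64 to 128 channels on a 160 x 160 map costs ~3.77 GFLOPs;
# halving the spatial resolution cuts this by 4x.
```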
The performance of Stem-YOLO and several representative keypoint detection networks is shown in Table 5. YOLOv8-pose achieves the highest mAP50:95 of 0.904, and RTMPose achieves 0.886. HRNet and Stem-YOLO both reach 0.870, while LiteHRNet, YOLO11n-pose, and YOLO12n-pose show lower accuracy. These results indicate that Stem-YOLO achieves competitive accuracy with substantially lower computational cost. In addition, it outperforms the lightweight baselines listed in Table 5 while remaining more efficient (GFLOPs/Params/Size). Stem-YOLO uses only 1.95 M parameters, 1.9 GFLOPs, and 4.3 MB of storage. Compared with HRNet, the parameters, FLOPs, and model size are reduced by about 93.2%, 97.0%, and 96.1%, respectively. Relative to YOLOv8-pose, the reductions are 36.7%, 77.1%, and 29.5%, and they are 26.4%, 71.2%, and 20.4% when compared with YOLO11n-pose. Overall, Stem-YOLO provides a better balance between detection accuracy and computational complexity and is well suited for real-time deployment on resource-constrained agricultural transplanters.
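The percentage reductions quoted above follow directly from the Table 5 values:

```python
def reduction_pct(baseline, ours):
    """Relative reduction versus a baseline, in percent."""
    return 100.0 * (baseline - ours) / baseline

# Stem-YOLO (1.95 M params, 1.9 GFLOPs, 4.3 MB) versus HRNet
# (28.54 M, 64.0 GFLOPs, 109.8 MB) gives roughly 93.2%, 97.0%,
# and 96.1%, matching the figures reported in the text.
```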
3.3. Edge Deployment and Physical Validation Results
Based on the physical validation setup described in Section 2.8, the deployment performance and practical validation results of the proposed Stem-YOLO model were further evaluated. For edge deployment, the original PyTorch model (PyTorch 2.2.2+cu121) was first exported to ONNX (1.17.0) and then converted into a TensorRT engine to utilize the hardware acceleration capability of the embedded platform [30,31,32].
On the developed platform, the TensorRT FP16 engine achieved an average inference speed of 152.51 FPS, with a latency of 6.74 ms per frame. After INT8 quantization and calibration, the inference speed increased to 154.26 FPS, while the latency decreased to 6.48 ms per frame. Compared with the FP16 engine, this corresponds to an improvement of 1.75 FPS and a reduction of 0.26 ms per frame.
Using the agronomic acceptance criterion defined in Section 2.8, the physical validation experiment was conducted on 100 bare-root strawberry seedlings. The proposed system achieved an agronomically acceptable orientation rate of 93%. For the stem bending angle (evaluated on valid samples), the mean absolute error (MAE) was 5.74°, the root mean square error (RMSE) was 7.44°, the mean relative error (MRE) was 4.32%, and the standard deviation was 7.14°.
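For clarity, the four statistics reported here can be computed as follows (signed errors in degrees; population standard deviation is assumed, since the convention is not stated):

```python
import math

def angle_error_stats(pred_deg, true_deg):
    """Return (MAE, RMSE, MRE in percent, standard deviation of error)."""
    errs = [p - t for p, t in zip(pred_deg, true_deg)]
    n = len(errs)
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mre = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errs, true_deg)) / n
    mean_e = sum(errs) / n
    std = math.sqrt(sum((e - mean_e) ** 2 for e in errs) / n)
    return mae, rmse, mre, std
```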
These results indicate that Stem-YOLO can satisfy the real-time requirements of embedded deployment while providing arch-back orientation adjustment consistent with practical agronomic tolerance. This demonstrates its potential for precision orientation control in mechanized transplanting of bare-root strawberry seedlings. The detailed software configuration of the embedded platform, including the JetPack and TensorRT versions, is summarized in Table 6.
4. Discussion
The results indicate that Stem-YOLO provides a practical balance among detection performance, computational efficiency, and embedded deployability for arch-back-oriented transplanting of bare-root strawberry seedlings. Rather than pursuing the highest benchmark accuracy alone, this study emphasizes an accuracy–efficiency trade-off that is more suitable for real-time agricultural robotic applications. In particular, compared with representative pose-estimation baselines, the proposed method has a lighter model footprint. It also supports real-time inference on embedded platforms, which is essential for perception-driven orientation adjustment under limited computing resources.
Compared with lightweight baselines such as LiteHRNet and YOLO11n-pose, Stem-YOLO shows better task suitability for bare-root strawberry seedlings because the target problem is not merely a generic keypoint detection task. The practical objective is to support orientation-aware transplanting under agronomic constraints. By detecting three stem keypoints, the model enables geometric inference for both arch-back direction and stem bending angle, providing outputs that are directly useful for downstream orientation control rather than only reporting detection scores.
From an application standpoint, arch-back-oriented planting in strawberry production is still largely achieved through manual recognition and adjustment. Although vision-based studies on strawberry seedlings have increased in recent years, many of them focus on plug seedlings. In addition, these studies are often carried out under standardized nursery conditions or rely on nursery-stage adjustments to make seedlings more uniform before transplanting. In contrast, bare-root seedlings are widely used by farmers for economic reasons, yet their irregular morphology makes orientation recognition more challenging under practical transplanting conditions. Therefore, this work complements existing seedling-vision studies by directly targeting bare-root seedlings and by avoiding reliance on nursery-stage preprocessing.
To interpret performance in an agronomically meaningful way, this study further introduced the orientation deviation angle γ and established an agronomic tolerance range (±11.05°) through a field investigation during the fruit-maturity stage. Under this acceptance criterion, the reported 93% reflects the proportion of seedlings whose adjusted arch-back orientation falls within the practical tolerance range, rather than being a purely visual ‘direction accuracy’ metric. This provides a clearer linkage between model output and real planting requirements and helps assess whether the system output is practically acceptable in the field.
5. Conclusions
This study proposed Stem-YOLO, a lightweight three-keypoint-based model for arch-back orientation recognition of bare-root strawberry seedlings under practical transplanting conditions. The method provides both arch-back direction recognition and stem bending-angle estimation to support orientation-aware mechanized transplanting.
Using the agronomic acceptance criterion defined by the orientation deviation angle γ (±11.05°), the proposed algorithm achieved an agronomically acceptable orientation rate of 93% on 100 bare-root strawberry seedlings. For stem bending-angle estimation, it achieved an MAE of 5.74° and an RMSE of 7.44°. After TensorRT optimization, the model ran in real time on an NVIDIA Jetson Orin NX, achieving 154.26 FPS with a latency of 6.48 ms per frame (INT8), supporting embedded deployment.
Future work will extend the current approach to perception–manipulation integration with a dedicated end-effector and closed-loop orientation control during real transplanting operations and further validate robustness under complex field conditions.
Author Contributions
Conceptualization, J.Z. and J.H.; methodology, J.Z., J.H. and P.Z.; software, J.Z. and M.W.; validation, J.Z. and Y.T.; formal analysis, J.Z., P.Z. and W.L.; investigation, J.Z. and W.L.; resources, J.H.; data curation, W.L. and W.L.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z., J.H. and J.S. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by Jiangsu Province Pilot Project for the Integration of Agricultural Machinery R&D, Manufacturing, Promotion and Application, Research and Application of Precision and Efficient Strawberry Transplanting Machines (JSYTH09).
Institutional Review Board Statement
Not applicable.
Data Availability Statement
Data are contained within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Liu, J.; Zhao, S.; Li, N.; Faheem, M.; Zhou, T.; Cai, W.; Zhao, M.; Zhu, X.; Li, P. Development and Field Test of an Autonomous Strawberry Plug Seeding Transplanter for Use in Elevated Cultivation. Appl. Eng. Agric. 2019, 35, 1067–1078. [Google Scholar] [CrossRef]
- Tituaña, L.; Gholami, A.; He, Z.; Xu, Y.; Karkee, M.; Ehsani, R. A small autonomous field robot for strawberry harvesting. Smart Agric. Technol. 2024, 8, 100454. [Google Scholar] [CrossRef]
- Zhang, Y.; Xie, H.; Zhang, D.; Yang, L.; Lu, Z.; Cui, T.; He, X.; Zhang, K. Design and experimental study of ridge-grown strawberry automatic harvesting robot. Comput. Electron. Agric. 2026, 240, 111112. [Google Scholar] [CrossRef]
- He, Z.; Karkee, M.; Zhang, Q. Enhanced machine vision system for field-based detection of pickable strawberries: Integrating an advanced two-step deep learning model merging improved YOLOv8 and YOLOv5-cls. Comput. Electron. Agric. 2025, 234, 110173. [Google Scholar] [CrossRef]
- Guo, J.; Yang, Z.; Karkee, M.; Jiang, Q.; Feng, X.; He, Y. Technology progress in mechanical harvest of fresh market strawberries. Comput. Electron. Agric. 2024, 226, 109468. [Google Scholar] [CrossRef]
- Wang, D.; Wang, X.; Chen, Y.; Wu, Y.; Zhang, X. Strawberry ripeness classification method in facility environment based on red color ratio of fruit rind. Comput. Electron. Agric. 2023, 214, 108313. [Google Scholar] [CrossRef]
- Sun, M.; Hu, S.; Zhao, C.; Xiong, Y. Light-resilient visual regression of strawberry ripeness for robotic harvesting. Comput. Electron. Agric. 2026, 241, 111169. [Google Scholar] [CrossRef]
- Yang, H.; Yang, L.; Wu, T.; Yuan, Y.; Li, J.; Li, P. MFD-YOLO: A fast and lightweight model for strawberry growth state detection. Comput. Electron. Agric. 2025, 234, 110177. [Google Scholar] [CrossRef]
- Huang, Z.; Lee, W.S.; Yang, P.; Ampatzidis, Y.; Shinsuke, A.; Peres, N.A. Advanced canopy size estimation in strawberry production: A machine learning approach using YOLOv11 and SAM. Comput. Electron. Agric. 2025, 236, 110501. [Google Scholar] [CrossRef]
- Lai, Q.; Wang, Y.; Tan, Y.; Sun, W. Design and experiment of Panax notoginseng root orientation transplanting device based on YOLOv5s. Front. Plant Sci. 2024, 15, 1325420. [Google Scholar] [CrossRef]
- Zhang, W.; Zhu, Q.; Zhang, T.; Liu, H.; Mu, G. Design and control of a side dense transplanting machine for sweet potato seedlings on mulch film. Comput. Electron. Agric. 2024, 224, 109193. [Google Scholar] [CrossRef]
- Yu, Q.; Li, R.; Gong, Y.; Ma, Y.; Chen, X.; Hu, J. Research status and development trends of mechanized transplanting of Codonopsis pilosula. J. Chin. Agric. Mech. 2025, 46, 34–42. [Google Scholar] [CrossRef]
- Su, W.; Ma, Y.; Lai, Q.; Zhang, X.; Wang, F. Design and Experiment of Directional Transplanting Device for Panax notoginseng Seedlings Based on Torque Imbalance Effect. Trans. Chin. Soc. Agric. Mach. 2024, 55, 237–245. [Google Scholar] [CrossRef]
- Li, G.; Gu, K.; Zhao, M. Method and experiment on automatic orientation of slice sampling for corn seed. Trans. Chin. Soc. Agric. Eng. 2016, 32, 40–47. [Google Scholar] [CrossRef]
- Geng, A.; Li, X.; Hou, J.; Zhang, Z.; Zhang, J.; Chong, J. Design and experiment of automatic directing garlic planter. Trans. Chin. Soc. Agric. Eng. 2018, 34, 17–25. [Google Scholar] [CrossRef]
- Yu, Y.; Wang, Y.; Zhou, J.; Pan, Y. Design and Test of Combined Pumpkin Seed Attitude-constrained Directional Seed Metering Device. Trans. Chin. Soc. Agric. Mach. 2024, 55, 121–132. [Google Scholar] [CrossRef]
- Sun, L.; Zhang, Y.; Cui, R.; Ye, P.; Chen, D. Design and Experiment of Automatic Transplanting Device for Strawberry Plug Seedlings with Arch-back Orientation. Trans. Chin. Soc. Agric. Mach. 2025, 56, 200–209. [Google Scholar] [CrossRef]
- Yu, J.; Bai, Y.; Yang, S.; Ning, J. Stolon-YOLO: A detecting method for stolon of strawberry seedling in glass greenhouse. Comput. Electron. Agric. 2023, 215, 108447. [Google Scholar] [CrossRef]
- Ma, Z.; Dong, N.; Gu, J.; Cheng, H.; Meng, Z.; Du, X. STRAW-YOLO: A detection method for strawberry fruits targets and key points. Comput. Electron. Agric. 2025, 230, 109853. [Google Scholar] [CrossRef]
- Bai, Y.; Yu, J.; Yang, S.; Ning, J. An improved YOLO algorithm for detecting flowers and fruits on strawberry seedlings. Biosyst. Eng. 2024, 237, 1–12. [Google Scholar] [CrossRef]
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
- Jin, S.; Zhou, L.; Zhou, H. CO-YOLO: A lightweight and efficient model for Camellia oleifera fruit object detection and posture determination. Comput. Electron. Agric. 2025, 235, 110394. [Google Scholar] [CrossRef]
- Amir, A.; Giri, S.; Giri, S.; Butt, M. An advanced approach to tomato apex head thickness measurement using lightweight YOLO variants, faster RCNN, and RGB-depth sensor. Smart Agric. Technol. 2025, 12, 101214. [Google Scholar] [CrossRef]
- Li, C.; Yao, A. KernelWarehouse: Rethinking the Design of Dynamic Convolution. arXiv 2024, arXiv:2406.07879. [Google Scholar] [CrossRef]
- Xu, W.; Wan, Y. ELA: Efficient Local Attention for Deep Convolutional Neural Networks. arXiv 2024, arXiv:2403.01123. [Google Scholar] [CrossRef]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: A Simple and Strong Anchor-Free Object Detector. arXiv 2020, arXiv:2006.09214. [Google Scholar] [CrossRef]
- Li, X.; Wang, W.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection. arXiv 2020, arXiv:2011.12885. [Google Scholar] [CrossRef]
- Lee, J.; Park, S.; Mo, S.; Ahn, S.; Shin, J. Layer-adaptive sparsity for the Magnitude-based Pruning. arXiv 2020, arXiv:2010.07611. [Google Scholar] [CrossRef]
- Liu, C.; Wei, S.; Zhong, S.; Yu, F. YOLO-PowerLite: A Lightweight YOLO Model for Transmission Line Abnormal Target Detection. IEEE Access 2024, 12, 105004–105015. [Google Scholar] [CrossRef]
- Zhong, C.; Wu, H.; Jiang, J.; Zheng, C.; Song, H. YOLO-DLHS-P: A Lightweight Behavior Recognition Algorithm for Captive Pigs. IEEE Access 2024, 12, 104445–104462. [Google Scholar] [CrossRef]
- Deng, X.; Huang, T.; Wang, W.; Feng, W. SE-YOLO: A sobel-enhanced framework for high-accuracy, lightweight real-time tomato detection with edge deployment capability. Comput. Electron. Agric. 2025, 239, 110973. [Google Scholar] [CrossRef]
- Song, Z.; Li, W.; Tan, W.; Qin, T.; Chen, C.; Yang, J. LBSR-YOLO: Blueberry health monitoring algorithm for WSN scenario application. Comput. Electron. Agric. 2025, 238, 110803. [Google Scholar] [CrossRef]
Figure 1.
Strawberry Dataset Collection Site: (a) Strawberry dataset collection location; (b) Scene after planting bare-root strawberry seedlings; (c) Strawberry artificial cultivation site.
Figure 2.
Strawberry dataset label format: (a) Dataset annotation format; (b) Strawberry bare-root seedling labeling.
Figure 3.
Stem-YOLO network structure model diagram.
Figure 4.
A schematic overview of KernelWarehouse applied to a ConvNet. cdd denotes common kernel dimension divisors, and b is the desired convolutional parameter budget.
Figure 5.
C3K2_KW Module.
Figure 6.
The schematic diagrams of Efficient Local Attention (ELA).
Figure 7.
The schematic diagrams of LSCD-LQE head.
Figure 8.
Schematic illustration of the LAMP score computation and its use in global model pruning.
Figure 9.
Detection principle for arch-back recognition in bare-root strawberry seedlings.
Figure 10.
Physical validation of arch-back orientation and angle measurement. (a) Experimental platform for seedling orientation adjustment and detection; (b) manual measurement of the stem bending angle and orientation deviation.
Figure 11.
Evaluation of Arch-Back Planting Orientation Error under Manual Transplanting at the Fruit Maturity Stage: (a) Improper fruit-bearing position caused by manual planting orientation error; (b) Measurement of the angular deviation between actual and ideal arch-back planting orientations.
Figure 12.
Accuracy-Efficiency Trade-off (Ablation Variants).
Figure 13.
Accuracy Change Relative to Baseline.
Figure 14.
Computational Cost Change Relative to Baseline.
Figure 15.
Comparison of heatmaps across different model structures.
Table 1.
Experimental environment configuration.
| Accessory | Version |
|---|---|
| CPU | Intel® Core™ i7-13700F |
| RAM | 32 GB |
| GPU | NVIDIA RTX 4070 Ti |
| Operating system | Ubuntu 20.04 |
| Development environment | Python 3.10, CUDA 12.1, cuDNN 8.8.1, Torch 2.2.2, Torchvision 0.17.2 |
Table 2.
Model hyperparameter settings.
| Parameters | Setup |
|---|---|
| Epoch | 300 |
| Batch size | 16 |
| Image Size | 640 × 640 |
| Initial Learning Rate | 1 × 10−2 |
| Final Learning Rate | 1 × 10−4 |
| Momentum | 0.937 |
| Weight Decay | 5 × 10−4 |
| Optimizer | SGD |
Table 3.
Detection results after the introduction of different improvement strategies.
| YOLO11n-Pose | KWConv | ELA-HSFPN | LSCD-LQE | mAP50:95 | GFLOPs | Params (M) | Size (MB) |
|---|---|---|---|---|---|---|---|
| ✓ | | | | 0.845 | 6.6 | 2.65 | 5.4 |
| ✓ | ✓ | | | 0.856 | 3.9 | 2.74 | 5.7 |
| ✓ | | ✓ | | 0.827 | 5.9 | 2.55 | 5.2 |
| ✓ | | | ✓ | 0.858 | 5.8 | 2.45 | 5.2 |
| ✓ | ✓ | ✓ | | 0.838 | 3.9 | 2.60 | 5.4 |
| ✓ | ✓ | | ✓ | 0.866 | 3.1 | 2.53 | 5.5 |
| ✓ | | ✓ | ✓ | 0.751 | 5.3 | 2.39 | 5.1 |
| ✓ | ✓ | ✓ | ✓ | 0.841 | 3.1 | 2.44 | 5.0 |
Table 4.
Comparison of compression rates for different compute volumes.
| LAMP Speed-Up | mAP50:95 | GFLOPs | Params | Size (MB) |
|---|---|---|---|---|
| 1.0 | 0.841 | 3.1 | 2,440,916 | 5.0 |
| 1.1 | 0.881 | 2.8 | 2,184,701 | 4.8 |
| 1.2 | 0.879 | 2.6 | 2,106,139 | 4.6 |
| 1.3 | 0.876 | 2.4 | 2,050,008 | 4.5 |
| 1.4 | 0.874 | 2.2 | 2,009,695 | 4.4 |
| 1.5 | 0.873 | 2.1 | 1,974,394 | 4.3 |
| 1.6 | 0.870 | 1.9 | 1,945,586 | 4.3 |
| 1.7 | 0.845 | 1.8 | 1,919,239 | 4.2 |
| 1.8 | 0.825 | 1.8 | 1,910,307 | 4.2 |
Table 5.
Comparison of each indicator for different models (GFLOPs are computed for a forward pass at 640 × 640 input resolution, excluding post-processing).
| Model | mAP50:95 | GFLOPs | Params (M) | Size (MB) |
|---|---|---|---|---|
| HRNet | 0.870 | 64.0 | 28.54 | 109.8 |
| LiteHRNet | 0.580 | 1.5 | 1.52 | 4.5 |
| RTMPose | 0.886 | 5.5 | 5.12 | 13.4 |
| YOLOv8-pose | 0.904 | 8.3 | 3.08 | 6.1 |
| YOLO11n-pose | 0.845 | 6.6 | 2.65 | 5.4 |
| YOLO12n-pose | 0.828 | 6.1 | 2.58 | 5.37 |
| Stem-YOLO | 0.870 | 1.9 | 1.95 | 4.3 |
Table 6.
Detailed configuration information for NVIDIA Jetson Orin NX.
| Configuration Item | Details |
|---|---|
| Jetpack Version | 6.2.1 |
| CUDA Version | 12.6.85 |
| TensorRT Version | 10.7.0.23 |
| OpenCV Version | 4.10 |
| Operating System | Ubuntu 22.04 |