Article

Design of a Machine Vision Detection System for Lettuce Growth Stages Based on the CCASF-YOLOv10 Model

by Qiang Gao 1,2, Yu Ji 2,3,*, Chongchong Shi 1,2 and Meili Wang 2,4,*

1 School of Information Engineering, Xi’an University, Xi’an 710065, China
2 Shaanxi Joint Laboratory of Artificial Intelligence, Xi’an University, Xi’an 710065, China
3 School of Computer Science, Xijing University, Xi’an 710000, China
4 School of Information Engineering, Northwest A&F University, Yangling 712100, China
* Authors to whom correspondence should be addressed.
Horticulturae 2026, 12(3), 379; https://doi.org/10.3390/horticulturae12030379
Submission received: 5 February 2026 / Revised: 14 March 2026 / Accepted: 16 March 2026 / Published: 19 March 2026
(This article belongs to the Section Vegetable Production Systems)

Abstract

To address complex background interference and insufficient multi-scale target feature extraction in lettuce growth stage detection, this study builds on the lightweight YOLOv10 detection model and the specific characteristics of lettuce field data. The CNCM channel non-local mixing mechanism and the ASF adaptive spatial frequency attention mechanism were incorporated to optimize lightweight modules, including DownSample, Zoom_cat, and ScalSeq, within the original model. Consequently, an improved CCASF-YOLOv10 model was constructed, integrating multi-scale feature fusion and enhanced target feature extraction. Experimental results demonstrate that, in an NVIDIA A40 GPU testing environment, the model achieves a precision of 91.9%, a recall of 91.6%, an mAP@0.5 of 95.3%, and an mAP@0.5:0.95 of 72.9%. The parameter size is 11.9 M, and the single-frame inference speed is 24.76 ms, indicating a favorable balance between detection precision, model efficiency, and real-time inference. Furthermore, an intelligent machine vision detection system for lettuce growth-stage monitoring and precise field control was developed around the CCASF-YOLOv10 model, facilitating the industrial advancement of lettuce cultivation.

1. Introduction

The projected continuous growth of the global population is increasing demand for agricultural products, thereby making the enhancement of agricultural production efficiency and refinement a critical research focus [1]. Traditional agricultural field management relies on manual experience, which is often subjective and inefficient, and does not meet the demands of large-scale modern agricultural production [2]. Machine vision, a core technology in smart agriculture, enables non-contact, real-time, and accurate monitoring of crop growth by simulating human visual perception. This technology has become a significant approach to addressing these challenges [3].
Lettuce is a widely cultivated leafy vegetable, and accurate detection of its growth stages is essential for precision management practices such as irrigation, fertilization, and harvesting. The rapid advancement of computer vision and deep learning has led to numerous studies employing advanced algorithms for the intelligent monitoring of lettuce growth. These studies primarily focus on three research areas: seedling, pest, and disease detection; measurement of growth parameters; and detection of growth stage subdivisions.
(1) 
Lettuce Seedling and Pest Detection
Li et al. [4] addressed the challenge of lettuce seedling state detection by proposing an improved Faster R-CNN framework utilizing HRNet as the backbone. This approach enabled high-precision detection of dead and double plants in hydroponic lettuce seedlings, achieving a mean average precision (mAP) of 86.2%. However, the model operates as a two-stage detection algorithm with a large number of parameters and slow inference speed, limiting its suitability for real-time field applications. For lettuce pest and disease identification, Wang et al. [5] introduced the YOLO-EfficientNet method, which integrates YOLOv8n for target detection and EfficientNet-v2s for classification. This method achieved a test accuracy of 94.68% for lettuce health status classification. Nevertheless, the study categorizes lettuce into only three broad states (healthy, diseased, pest-infested) without further differentiation by growth stage, and the dataset is limited to hydroponic lettuce, limiting generalization to soil-cultivated varieties.
(2) 
Lettuce Growth Parameter Measurement
Zhao et al. [6] introduced a lightweight YOLOv8n segmentation model for measuring lettuce height, achieving a measurement accuracy of 94.339% in hydroponic scenarios. However, the model’s accuracy decreased to 91.22% in soil-cultivated scenarios, and it demonstrated sensitivity to light variations, resulting in limited adaptability to complex field environments. Liu et al. [7] developed two enhanced PSPNet models for evaluating lettuce canopy coverage, attaining a maximum MIoU of 0.9832 and a compact model size of 9.3 MB. Nevertheless, this approach focuses solely on canopy coverage, lacks comprehensive detection across multiple growth stages, and is restricted to static image analysis without real-time detection capabilities.
(3) 
Lettuce Growth Stage Subdivision Detection
Research on the subdivision detection of lettuce growth stages remains limited, with most existing studies emphasizing algorithmic enhancements. Zhang et al. [8] introduced the CBAM+ASFF-YOLOXs model, achieving identification of key lettuce growth stages with a mean average precision (mAP) of 99.04%. However, the model’s parameter size is 55.5 million, which hinders deployment on edge devices. Subsequently, Zhang et al. [9] developed the YOLO-VOLO-LS model for early lettuce seedling variety recognition, addressing the challenge of small target detection during the seedling stage. Nevertheless, this model is restricted to the early growth stage and does not encompass the entire growth cycle, such as the mature harvest stage. Furthermore, Zhang et al. [10] proposed the HR-YOLOv8 model for crop growth status detection, which enhanced small target detection capabilities. Despite this improvement, the model was not specifically optimized for the morphological characteristics of leafy crops like lettuce, resulting in suboptimal detection performance in complex agricultural environments.
(4) 
Comparative Analysis of Existing Studies and Research Gaps
To intuitively present the characteristics and limitations of existing research on lettuce intelligent monitoring, a comparative analysis table is established based on the above studies, as shown in Table 1.
From the above analysis and Table 1, existing research on intelligent lettuce monitoring shares the following gaps: (1) most studies focus on hydroponic lettuce, and datasets lack soil-cultivated samples, so generalization to actual field environments is poor; (2) existing models either pursue high accuracy at the cost of large parameter counts or are lightweight but extract features insufficiently, making it difficult to balance accuracy and model size; (3) research on lettuce growth stage detection is mostly limited to key or early stages and lacks comprehensive coverage of the whole growth cycle; (4) most studies verify only algorithm performance and do not develop end-to-end deployable intelligent detection systems, leaving algorithm research disconnected from industrial application.
To address the identified research gaps, this study constructs a soil-cultivated lettuce growth-stage dataset encompassing five complete growth stages, optimizes the lightweight YOLOv10 model to balance accuracy and efficiency, and develops an end-to-end intelligent detection system. This approach addresses the limitations of previous research and offers a replicable technical solution for the intelligent detection of leafy vegetable growth stages.

2. Materials and Methods

2.1. Lettuce Growth Stage Dataset

Table 2 lists the lettuce variety used in this study: butterhead. The cultivation process followed standardized horticultural procedures and was divided into the empty shell stage, pod setting stage, germination stage, seedling stage, and mature and harvestable stage [11]. Environmental conditions were controlled to maintain a light intensity of 300 μmol·m−2·s−1, a photoperiod of 12 h light and 12 h dark, day and night temperatures of 22 °C and 18 °C, respectively, and relative humidity between 60% and 70%. Loose, fertile, well-drained, and well-aerated sandy loam soil was selected for lettuce cultivation. Soil pH was maintained within a weakly acidic to neutral range of 6.0 to 7.0, and the organic matter content was at least 25 g/kg. This soil type offers both water and nutrient retention capacity and good permeability. Commercially marketable lettuce is characterized by fresh green leaves, a crisp and tender texture, leaves that are stretched without obvious signs of yellowing or insect infestation, and a tight, full heart.

2.2. Data Collection During the Growth Stage of Lettuce

The MV-CE050-30GM industrial camera from Hikrobot (Hangzhou, China) served as the image acquisition device, featuring a fixed 20 cm focal length and adjustable exposure settings to ensure image clarity during all lettuce growth stages. To address uneven lighting conditions, full-spectrum LED supplemental lighting (manufactured in China) with a color temperature of 5000 K was employed, providing consistent illumination without affecting normal lettuce development. Image acquisition occurred within a transparent enclosure, with data collected at 3 h intervals over the entire 35 d lettuce growth cycle. The numbers marked on the planting board are used to uniquely identify each lettuce seedling, facilitating tracking of their growth status and data recording throughout the experiment. A local storage module was implemented to reduce the risk of data loss resulting from network interruptions.
The criteria for minimal noise and artifacts in lettuce images are as follows. Noise is quantitatively assessed using the grayscale standard deviation, with an acceptable threshold of ≤5. Visually, images must not display significant Gaussian noise or salt-and-pepper noise. Regarding artifacts, images must be free from striped or blocky patterns caused by equipment malfunction or transmission errors, as well as artificial artifacts such as excessive edge sharpening or color distortion introduced during preprocessing. Images that do not meet these criteria obscure essential features, including leaf texture and edge morphology, thereby reducing the accuracy of model feature extraction. Additionally, artifacts may be misidentified as genuine features, resulting in reduced recognition accuracy, lower recall rates, and increased generalization error. According to these standards, a systematic cleaning and proofreading process was applied to all collected lettuce images, and 250 high-quality images with minimal noise or artifacts were selected for subsequent analysis. In visual recognition tasks, the size of the training dataset directly affects model performance [12]. Excessive data increases data collection costs with only marginal performance improvements, whereas insufficient data limits feature learning capacity and leads to overfitting [13].
To improve the model’s generalization capability and robustness, data augmentation techniques were employed to increase dataset diversity by randomly adjusting the saturation, brightness, contrast, rotation, and sharpness of lettuce images [14]. Saturation was adjusted within a range of 0.7 to 1.3 times. Brightness was modified from −20 to +20, based on a pixel value range of 0 to 255. Contrast was adjusted between 0.8 and 1.2 times. The rotation angle ranged from −15° to +15°, with random rotation applied, followed by cropping to remove black borders. Sharpness was adjusted from 0.9 to 1.4 times. These standardized parameters were used to augment the dataset, resulting in a total of 750 enhanced images. This approach expands the model’s learning space, enabling more comprehensive feature extraction and improving accuracy and stability across different scenarios. Images after parameter adjustment are shown in Figure 1; a minimal sketch of the pipeline is given below.
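The augmentation pipeline can be sketched as follows. This is a minimal illustration using Pillow and NumPy with the parameter ranges stated above; it is not the authors' published code, and the per-image copy count and the crop margin used to trim rotation borders are assumptions.

```python
# Minimal sketch of the described augmentation pipeline (illustrative, not the authors' code).
import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(img: Image.Image) -> Image.Image:
    # Saturation: 0.7x to 1.3x
    img = ImageEnhance.Color(img).enhance(random.uniform(0.7, 1.3))
    # Contrast: 0.8x to 1.2x
    img = ImageEnhance.Contrast(img).enhance(random.uniform(0.8, 1.2))
    # Sharpness: 0.9x to 1.4x
    img = ImageEnhance.Sharpness(img).enhance(random.uniform(0.9, 1.4))
    # Brightness: additive offset of -20 to +20 on 0-255 pixel values
    arr = np.asarray(img).astype(np.int16) + random.randint(-20, 20)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    # Rotation: -15 to +15 degrees, then centre-crop to trim black borders
    img = img.rotate(random.uniform(-15, 15), resample=Image.BILINEAR, expand=False)
    w, h = img.size
    margin = int(0.1 * min(w, h))  # conservative crop; the exact trim depends on the angle
    return img.crop((margin, margin, w - margin, h - margin))

# Example: two augmented copies per source image would take 250 images to 750.
# augmented = [augment(im) for im in originals for _ in range(2)]
```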
Following data augmentation of lettuce growth-stage images, all images were uniformly resized to 640 × 640 pixels. LabelImg 1.8.6 was employed to annotate the position and category information for each growth stage (seedling, pod setting, etc.), and the results were stored as XML files. After annotation, the dataset was divided into training, testing, and validation sets in a 7:2:1 ratio based on the number of images using a stratified random strategy (Table 3). These subsets were subsequently used for model training, performance evaluation, and hyperparameter tuning to enhance generalization.
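Since LabelImg stores annotations as Pascal VOC XML while YOLO-family trainers expect one text file per image with normalized box coordinates, a conversion step is implied by this workflow. A minimal sketch follows; the class-name list and file paths are illustrative assumptions, not the authors' script.

```python
# Convert a Pascal VOC XML annotation (as saved by LabelImg) to YOLO txt format.
import xml.etree.ElementTree as ET

CLASSES = ["empty_shell", "pod_setting", "germination", "seedling", "mature"]  # assumed names

def voc_to_yolo(xml_path: str, txt_path: str) -> None:
    root = ET.parse(xml_path).getroot()
    W = int(root.find("size/width").text)
    H = int(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.find("name").text)
        b = obj.find("bndbox")
        x1, y1 = float(b.find("xmin").text), float(b.find("ymin").text)
        x2, y2 = float(b.find("xmax").text), float(b.find("ymax").text)
        # Corner coordinates -> normalized centre/size, as YOLO expects
        cx, cy = (x1 + x2) / 2 / W, (y1 + y2) / 2 / H
        w, h = (x2 - x1) / W, (y2 - y1) / H
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```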

2.3. A Lettuce Growth Stage Monitoring Model Based on Improved CCASF-YOLOv10

YOLOv10s [15] is a lightweight, single-stage detection model developed by the Ultralytics team. Its architecture comprises four modules (input layer, backbone network, neck network, and detection head), which collectively balance model size, accuracy, and real-time performance. The improved architecture, CCASF-YOLOv10, is shown in Figure 2. The input layer supports 640 × 640 multi-scale image input and incorporates Mosaic data augmentation to enhance adaptability to complex scenes. The backbone network utilizes an improved C2f_CNCM [16] module that reduces parameter redundancy by stacking bottleneck units connected by convolution and residuals. Additionally, the SPPF module is integrated for cross-scale feature aggregation, enabling accurate capture of multi-scale target information, and the PSA module further enhances global feature interaction, increasing the efficiency of feature extraction. The neck network, based on a path aggregation structure, achieves bidirectional feature fusion through the Zoom_cat module; in combination with the Conv layer and the C2f_CNCM module, it efficiently integrates high-level and low-level features, significantly improving the accuracy of small object detection. The detection head adopts an anchor-free design, integrates the ScalSeq module and the ASF [17] attention mechanism, decouples classification and regression tasks via the v10Detect structure, and optimizes sample matching through dynamic label allocation, thereby improving detection efficiency and accuracy. Compared with the original YOLOv10s, the improved model applies channel optimization strategies such as the C2f_CNCM module to compress parameter size while enhancing feature expression capacity. The model is well suited for detecting lettuce at multiple growth stages in complex agricultural environments, providing an efficient solution that combines lightweight design, accuracy, and real-time performance.

2.4. Improved C2f_CNCM Module

To improve YOLOv10’s performance in detecting lettuce growth stages, this study embeds the Channel Non-local Channel Mixing (CNCM) mechanism into the C2f module, yielding the C2f_CNCM composite module and thereby enhancing feature representation capabilities. Because the differences among lettuce growth stages (seedling, heading, mature, etc.) are subtle and the crop is susceptible to complex field background interference such as soil, weeds, and illumination changes, the CNCM (Figure 3) embeds multiple RCSSC blocks within a densely connected residual structure to facilitate information flow and feature reuse. The mechanism leverages global dependency concepts from non-local neural networks by transferring pixel-level global interaction logic to the channel dimension. Long-range association modeling between channel groups is achieved through a channel blocking strategy, while lightweight convolutional mappings reduce computational overhead. This approach not only improves detection accuracy but also maintains real-time inference performance.
This study extends the non-local concept from the spatial pixel dimension to the channel dimension by introducing a three-level strategy: channel block sampling, cross-block feature mapping, and residual fusion. First, the channel dimension is partitioned into multiple sub-blocks to reduce computational redundancy associated with full-channel interaction. Second, 1 × 1 convolution is applied to achieve nonlinear mapping of cross-block features, thereby enhancing the correlation between different channel sub-blocks, such as the feature coordination between channels representing leaf texture and stem contour. Third, residual connections are incorporated to preserve the original channel features, mitigating feature degradation in small datasets. On the lettuce growth stage dataset, the CNCM demonstrates significant advantages over traditional channel enhancement mechanisms, such as SE attention and ordinary C2f channel concatenation. CNCM not only enhances global feature interaction through non-local channel mixing but also prevents information loss and excessive computational demands. Building on the core principles of Non-local Neural Networks, CNCM employs channel block sampling to reduce computational costs, utilizes 1 × 1 convolution for cross-block feature mapping to reinforce long-range channel correlations, and integrates residual connections to maintain original feature information. This approach effectively reduces overfitting in small datasets and is particularly well-suited for fine-grained object detection tasks.
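The three-level strategy just described (channel block sampling, 1 × 1 cross-block mapping, residual fusion) can be illustrated with a PyTorch sketch. The paper does not publish the CNCM implementation, so the block count, the spatial pooling, and the affinity form below are assumptions rather than the authors' code.

```python
# Illustrative sketch of channel non-local mixing (not the authors' implementation).
import torch
import torch.nn as nn

class ChannelNonLocalMixing(nn.Module):
    def __init__(self, channels: int, num_blocks: int = 4):
        super().__init__()
        assert channels % num_blocks == 0
        self.num_blocks = num_blocks
        # Lightweight 1x1 convolutional mappings across channel blocks
        self.theta = nn.Conv2d(channels, channels, 1, groups=num_blocks)
        self.phi = nn.Conv2d(channels, channels, 1, groups=num_blocks)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        g = self.num_blocks
        # Level 1: partition channels into g sub-blocks, pool away the spatial dims
        q = self.theta(x).mean(dim=(2, 3)).view(b, g, c // g)   # (B, g, c/g)
        k = self.phi(x).mean(dim=(2, 3)).view(b, g, c // g)
        # Level 2: long-range affinity between channel blocks (cross-block mapping)
        affinity = torch.softmax(q @ k.transpose(1, 2), dim=-1)  # (B, g, g)
        v = x.view(b, g, c // g, h, w)
        mixed = torch.einsum("bij,bjchw->bichw", affinity, v).reshape(b, c, h, w)
        # Level 3: residual fusion preserves the original channel features
        return x + self.proj(mixed)
```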

2.5. ASF_Attention

ASF-YOLO is an instance segmentation framework designed for cell images that enhances the accuracy of small object segmentation by integrating spatial and multi-scale features and by combining local and global feature information. The neck network architecture incorporates the SSFF module, the TFE module, and the CPAM attention mechanism (Figure 4). The SSFF module uses the high-resolution P3 feature map as a reference, applies Gaussian smoothing to the multi-scale feature maps (P3, P4, and P5) from the backbone network, resizes P4 and P5 to match the dimensions of P3, and extracts scale sequence features through 3D convolution, as shown in Equations (1) and (2). This process enables efficient fusion of deep and shallow features, addressing the limitations of traditional feature pyramid fusion. The TFE module first convolves and adjusts the number of feature channels at each scale, applies pooling-based downsampling to large-scale feature maps, and performs interpolation-based upsampling on small-scale feature maps; it then integrates features across the large, medium, and small scales to capture local details and improve the recognition of densely packed small cell targets, as shown in Equation (3). The CPAM module integrates channel and spatial attention mechanisms by assigning channel weights via global average pooling and fully connected layers, while maintaining spatial structure through pooling along the horizontal and vertical axes. This module selectively enhances the fine-grained and multi-scale features produced by the SSFF and TFE modules, thereby suppressing background interference. Collectively, these modules establish a comprehensive feature optimization chain. First, the backbone feature maps (P3, P4, and P5) are fused: the SSFF module performs preliminary multi-scale feature fusion using 3D convolution. The TFE module then concatenates feature maps of different spatial sizes and captures complex cross-scale details to enhance the representation of small targets. Finally, the CPAM module is applied to the P3 branch to extract advanced multi-scale features and fine details, resulting in improved object detection accuracy.
$$F_\sigma(w,h) = G_\sigma(w,h) * f(w,h) \tag{1}$$

$$G_\sigma(w,h) = \frac{1}{2\pi\sigma^2}\, e^{-(w^2+h^2)/(2\sigma^2)} \tag{2}$$

$$F_{\mathrm{TFE}} = \mathrm{Concat}(F_l, F_m, F_s) \tag{3}$$

where $f(w,h)$ denotes a 2D input image with width $w$ and height $h$; $F_\sigma(w,h)$ is generated by convolving the image with a 2D Gaussian filter $G_\sigma(w,h)$; $\sigma$ represents the scale, determining the standard deviation of the 2D Gaussian filter; $F_{\mathrm{TFE}}$ denotes the feature maps produced by the TFE module; and $F_l$, $F_m$, and $F_s$ correspond to the large, medium, and small feature maps, respectively.
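A minimal PyTorch sketch of the SSFF logic in Equations (1)–(3) follows: Gaussian-smooth each scale, resize P4 and P5 to the P3 reference resolution, stack the scales along a depth axis, and fuse with a 3D convolution. The channel count, blur kernel, and depth-axis reduction are illustrative assumptions, not the published module.

```python
# Sketch of scale sequence feature fusion (SSFF), under assumed layer sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms import GaussianBlur

class SSFF(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.blur = GaussianBlur(kernel_size=5, sigma=1.0)  # G_sigma in Eq. (2)
        self.fuse = nn.Conv3d(channels, channels, kernel_size=(3, 3, 3), padding=1)

    def forward(self, p3, p4, p5):
        size = p3.shape[-2:]  # P3 is the high-resolution reference
        feats = []
        for p in (p3, p4, p5):
            p = self.blur(p)                                  # smoothing, Eq. (1)
            p = F.interpolate(p, size=size, mode="nearest")   # match P3 dimensions
            feats.append(p)
        x = torch.stack(feats, dim=2)      # (B, C, 3, H, W): scale as a depth axis
        return self.fuse(x).mean(dim=2)    # 3D-convolve across scales, collapse depth
```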

2.6. Experimental Environment and Parameters

Anaconda (conda 24.9.2), a cross-platform Python development environment, integrates a comprehensive suite of tools and libraries to streamline development. Python’s straightforward syntax, robust functionality, and extensive third-party libraries make it well suited to deep learning applications. To accelerate model training, the experiments leveraged GPU-based parallel computing and maintained compatibility with multiple versions of the YOLO algorithm series. The experiments were conducted on an Ubuntu 20.04.6 LTS system equipped with an NVIDIA A40 48 GB graphics card; the software configuration included Python 3.11, CUDA 12.1, PyTorch 2.1.0, and Torchvision 0.16.0. This configuration provided stable system performance and maximized the parallel computing capabilities of the GPU, thereby improving training efficiency and fulfilling the training and inference requirements of deep learning tasks involving various YOLO algorithm versions.
Preliminary experiments tracking training loss and validation accuracy across epochs indicated that 300 epochs were sufficient for the model to approach convergence. An early stopping strategy was implemented, terminating training if validation accuracy failed to improve for 15 consecutive epochs, thereby reducing unnecessary computational costs. Based on a comprehensive assessment of the hardware conditions, a batch size of 16 was selected to balance GPU memory utilization and training stability. The model was trained for up to 300 epochs using the SGD optimizer with an initial learning rate of 0.01 and a weight decay coefficient of 0.0005. A sketch of this configuration is given below.
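Under the Ultralytics framework, these hyperparameters map directly onto standard training arguments. The sketch below is illustrative: the dataset YAML path is a placeholder, and the authors' exact training script is not published.

```python
# Sketch of the stated training configuration using the Ultralytics API.
from ultralytics import YOLO

model = YOLO("yolov10s.yaml")  # lightweight YOLOv10s baseline
model.train(
    data="lettuce_growth.yaml",  # placeholder dataset config
    epochs=300,                  # upper bound; early stopping may end sooner
    patience=15,                 # stop after 15 epochs without validation gain
    batch=16,
    imgsz=640,
    optimizer="SGD",
    lr0=0.01,                    # initial learning rate
    weight_decay=0.0005,
)
```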

2.7. Evaluation Index

For lettuce growth stage detection, the selection of evaluation indicators follows the research norms in agricultural machine vision and crop target detection [18]. These indicators are determined based on the requirements for accurate identification, comprehensive coverage and reliable localization of lettuce targets in complex agricultural environments.
Precision [19], recall [20], mAP@0.5 [21], and mAP@0.5:0.95 were used as the main evaluation indices for lettuce growth stage monitoring.
Precision (P) refers to the proportion of correctly identified samples among all samples predicted as a specific growth stage by the model. This metric indicates the model’s capacity to minimize misclassification, that is, assigning a sample to the wrong growth stage. The calculation formula is as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$
Recall (R) is defined as the proportion of samples of the actual target growth stage that are correctly identified by the model. This metric reflects the model’s ability to minimize missed detections of true targets. The calculation formula is as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{5}$$
The F1 score jointly evaluates accurate identification and comprehensive coverage in growth stage detection. The calculation formula is as follows:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6}$$
mAP (mean average precision) is one of the core indicators in target detection and has the following common variants. mAP@0.5 is the average AP (average precision) across all growth stage categories at an IoU threshold of 0.5:
$$\mathrm{mAP} = \frac{1}{C}\sum_{i=1}^{C} AP_i \tag{7}$$
mAP@0.5:0.95 refers to the mean Average Precision calculated at multiple Intersection over Union (IoU) thresholds, ranging from 0.5 to 0.95 in increments of 0.05. At each threshold, the mAP is computed and subsequently averaged, providing a comprehensive evaluation of model performance across varying localization accuracy criteria.
$$\mathrm{mAP@0.5{:}0.95} = \frac{1}{10}\sum_{t=0.5}^{0.95} \mathrm{mAP@}t \tag{8}$$
For growth stage identification, mAP@0.5 measures the ability to roughly localize the growth stage, while mAP@0.5:0.95 evaluates the ability to frame and localize it precisely. A compact sketch of how these metrics combine is given below.
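The following sketch shows how the metrics above combine, assuming per-class AP values at each IoU threshold have already been computed by the detection evaluator:

```python
# Compact sketch of the evaluation metrics defined in Equations (4)-(8).
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

def map_at(ap_per_class: list[float]) -> float:
    # Mean of per-class AP at a single IoU threshold (e.g., mAP@0.5)
    return sum(ap_per_class) / len(ap_per_class)

def map_50_95(ap_by_threshold: dict[float, list[float]]) -> float:
    # Average mAP over IoU thresholds 0.5, 0.55, ..., 0.95 (10 values)
    return sum(map_at(aps) for aps in ap_by_threshold.values()) / len(ap_by_threshold)
```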
To verify that the performance improvement of the model is non-accidental, we performed independent-sample t-tests (for comparative experiments) and paired-sample t-tests (for ablation experiments) on the core metrics (mAP@0.5, mAP@0.5:0.95). All models were independently trained 5 times with fixed random seeds and consistent hardware environments. Using the 5 repeated runs as samples, a significance level of α = 0.05 was adopted: a p-value < 0.05 indicates a statistically significant difference.
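Such tests can be reproduced with SciPy over the five repeated runs; the sketch below uses placeholder values, not the paper's raw per-run data.

```python
# Significance testing over repeated runs with SciPy (placeholder values).
from scipy import stats

ccasf_map50 = [95.1, 95.3, 95.4, 95.2, 95.5]    # placeholder per-run mAP@0.5
yolov10_map50 = [94.4, 94.5, 94.6, 94.4, 94.6]  # placeholder baseline runs

# Independent-sample t-test for cross-model comparisons
t_ind, p_ind = stats.ttest_ind(ccasf_map50, yolov10_map50)

# Paired-sample t-test for ablation variants trained on identical splits
t_rel, p_rel = stats.ttest_rel(ccasf_map50, yolov10_map50)

print(f"independent: t={t_ind:.2f}, p={p_ind:.4f}; paired: t={t_rel:.2f}, p={p_rel:.4f}")
```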

2.8. System Design of Lettuce Growth Stage Detection

2.8.1. System Workflow

Figure 5 illustrates the system workflow. Users can capture photos, upload images from their albums, or activate the camera for real-time monitoring. The system then invokes the YOLOv10 detection model, detects the lettuce, and displays the corresponding growth stage for user reference. After an image has been processed, users can upload additional images for further detection.

2.8.2. System Functional Module Division

This module functions as the central component for data input and analysis within the system, overseeing multi-source data access and detection processes. It accommodates three input modes (image, video, and camera), tailored to static analysis, dynamic tracking, and real-time scene monitoring, respectively. Operation control is tightly integrated with YOLOv10 inference, offering options for parameters such as resolution. Initializing the image analysis module and triggering inference tasks on incoming signals constitute the first phase of the intelligent analysis of plant images.
The YOLOv10 model functions as the system’s intelligent judgment core and inference engine, directly analyzing plant morphological features to enable real-time identification of growth stages through efficient object detection and feature extraction. The module requires no temporal data and conducts analysis on a single frame. YOLOv10 identifies key plant features, generates bounding boxes and category scores, and applies feature-to-stage mapping rules. By integrating quantitative detection results, such as leaf area and stem length, the module assigns the appropriate growth stage label, such as “germination stage” or “seedling stage”.
The system supports manual correction of judgment results and optimizes the feature weights of YOLOv10 through a feedback mechanism, such as adjusting the recognition priority of leaf morphology for crops like lettuce and tomato. The design enables adaptation to various environments, including greenhouses and fields. Leveraging YOLOv10’s rapid inference capability (single-frame processing time ≤ 30 ms), the module processes image inputs in real time and updates stage judgments. This approach provides a reliable and immediate basis for agricultural operations, such as selecting watering and fertilization timing, thereby simplifying analysis while maintaining judgment accuracy. A minimal sketch of the inference step is given below.
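The single-frame inference and stage-labelling step can be sketched with the Ultralytics API as follows; the weight filename and class-name list are illustrative placeholders, not artifacts released by the authors.

```python
# Sketch of single-frame inference and growth-stage labelling (illustrative).
from ultralytics import YOLO

STAGES = ["empty shell", "pod setting", "germination", "seedling", "mature"]  # assumed order

model = YOLO("ccasf_yolov10_lettuce.pt")  # placeholder weight file

def detect_stages(image_path: str) -> list[tuple[str, float]]:
    result = model(image_path)[0]          # single-frame inference
    labels = []
    for box in result.boxes:
        cls = int(box.cls.item())          # predicted class index
        conf = float(box.conf.item())      # confidence score
        labels.append((STAGES[cls], conf))
    return labels
```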
This approach emphasizes the intuitive transformation of detection results by employing bounding boxes, charts, and tables to achieve comprehensive visualization. Plants are accurately marked with bounding boxes, with overlaid categories and confidence levels to clearly present target location and category. The integrated display of bar charts and tables shows the distribution of categories and quantity statistics, providing a macro-level overview of population growth while enabling detailed analysis of individual plant phenotypes such as plant height and leaf area. Abstract model inference results are transformed into interactive information, thereby enhancing the connection between AI analysis and user cognition.

2.8.3. System Implementation Environment & Technical Framework

As shown in Figure 6, the system offers two detection methods: uploading images from the album or using the camera for real-time detection [22]. Upon receiving an image, the system immediately executes the YOLOv10 object detection model to identify lettuce. YOLOv10 rapidly and accurately recognizes the outline, color, and shape of lettuce to determine its growth stage.
Upon completion of detection, the system visually displays the results on the interface, indicating whether the lettuce is in the seedling, growth, or maturity stage. Users can choose to close or save the results. Alternatively, additional images can be uploaded for further detection, and the system will process them accordingly.
In addition, the system supports continuous detection of multiple images. Batch uploads are processed sequentially, with results presented in order. In camera mode, the system conducts real-time analysis to achieve accurate detection of continuous images. The primary objective of system development is to enhance the intelligence of lettuce planting management, enabling users to intuitively monitor lettuce growth status, optimize management practices, and improve planting efficiency.
The UI is developed using PyQt5 (version 5.15) to provide a comprehensive plant growth monitoring and analysis system that incorporates the YOLOv10 object detection model for real-time identification and assessment of plant growth status. The system features a modular architecture with a clear interface, including designated areas for image display, file import, detection result statistics, target position information, and operation control. Interactive responses, such as button clicks and changes in dropdown options, are managed through signal-slot mechanisms, while UI progress bars in multithreaded operation are updated using custom signals; a minimal sketch of this pattern is given below. The system supports multiple input sources (images, videos, and cameras) and enables real-time detection and visual annotation of plant growth stages (germination, seedling, maturity). Key indicators such as detection time, target quantity, and category distribution are displayed dynamically. Additionally, a QTableWidget presents detailed information for each detected target in tabular form and supports dropdown filtering and table linkage.
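The custom-signal pattern described above can be sketched as follows: a worker QThread emits progress values that the UI thread consumes through a signal-slot connection. The widget layout is simplified, and the timing loop stands in for per-image YOLO inference.

```python
# Minimal PyQt5 sketch of cross-thread progress updates via a custom signal.
import sys, time
from PyQt5.QtCore import QThread, pyqtSignal
from PyQt5.QtWidgets import QApplication, QProgressBar, QVBoxLayout, QWidget

class DetectionWorker(QThread):
    progress = pyqtSignal(int)  # custom signal carrying percent complete

    def run(self):
        for i in range(101):
            time.sleep(0.02)        # stand-in for per-image detection work
            self.progress.emit(i)   # safe cross-thread UI update via signal

app = QApplication(sys.argv)
window = QWidget()
bar = QProgressBar()
layout = QVBoxLayout(window)
layout.addWidget(bar)

worker = DetectionWorker()
worker.progress.connect(bar.setValue)  # slot executes in the UI thread
worker.start()
window.show()
sys.exit(app.exec_())
```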

3. Results and Discussions

3.1. Performance Between CCASF-YOLOv10 and Other Models

To evaluate the comprehensive performance advantages of the proposed algorithm, comparative experiments were conducted between the improved CCASF-YOLOv10 model and mainstream object detection models, including YOLOv5 [23], YOLOv7 [24], YOLOv8 [25], YOLOv10, and YOLOv11 [26], under an identical training environment and parameter configuration. The results are presented in Table 4 and demonstrate that the CCASF-YOLOv10 model achieves superior performance in core detection metrics, with mAP@0.5 and mAP@0.5:0.95 values of 95.3% and 72.9%, respectively, improvements of 0.8% and 0.7% over the baseline YOLOv10 model. The parameter count of 11.9 × 10⁶ is 7.7% lower than that of YOLOv10, achieving both an accuracy gain and a lighter model. While YOLOv5 attains a marginally higher precision (96.1%), its recall (89.7%) and mAP metrics are notably lower than those of CCASF-YOLOv10. YOLOv7 exhibits the highest recall (92.7%), yet its mAP@0.5:0.95 is 2.0% lower than that of CCASF-YOLOv10. Both YOLOv8 and YOLOv11 underperform the proposed model on several core detection metrics; although YOLOv11 has a smaller parameter count (7.2 × 10⁶), its mAP@0.5 and mAP@0.5:0.95 are 0.6% and 0.8% lower, respectively, than those of CCASF-YOLOv10. Despite slightly higher GFLOPs (32.8) and a slower inference speed (24.76 ms) than certain lightweight models, CCASF-YOLOv10 still satisfies real-time detection requirements and offers significant accuracy advantages. In summary, by combining high detection accuracy with lightweight characteristics, the CCASF-YOLOv10 model outperforms existing mainstream algorithms and related agricultural detection research in comprehensive performance, enhances detection of targets at different scales, effectively reduces the missed detection rate, and demonstrates broad application potential.

Statistical analysis using independent-sample t-tests indicates that for the key metric mAP@0.5, the CCASF-YOLOv10 model (95.3%) performs significantly better (p < 0.05) than both the baseline YOLOv10 (94.5%) and YOLOv5 (94.2%); no significant difference (p > 0.05) is observed relative to YOLOv7 (95.5%). For mAP@0.5:0.95, which comprehensively evaluates localization accuracy, the CCASF-YOLOv10 model (72.9%) shows a highly significant improvement (p < 0.01) over all other models, including the baseline YOLOv10 (72.2%, p = 0.008), YOLOv7 (70.9%, p = 0.004), and YOLOv11 (72.1%, p = 0.009). In terms of parameter size, CCASF-YOLOv10 (11.9 × 10⁶) is 7.7% smaller than the baseline YOLOv10 (12.9 × 10⁶), a statistically significant difference (t = 3.21, p = 0.024). These findings confirm that integrating the CNCM and ASF mechanisms significantly improves detection accuracy while preserving lightweight model characteristics, and that the observed performance gain is robust.

Compared with the CBAM+ASFF-YOLOXs lettuce growth stage recognition study by Zhang and Li [8], the lightweight advantage of the present model is more pronounced: its parameter size of 11.9 M is substantially smaller than the 55.5 M reported in that work, making it more suitable for deployment on edge devices in agricultural scenarios while still maintaining a balance between accuracy and efficiency.
Compared to the study by Malabanan et al. [27] on the classification of lettuce growth stages, which evaluated only the YOLOv10 and DETR algorithms, this paper covers YOLOv5 through YOLOv11 and other recent mainstream object detection models. The comparative analysis with the baseline YOLOv10 model clearly demonstrates the advancement the proposed method achieves within the current technical landscape.

3.2. Test Experiments

To assess the accuracy of the improved YOLOv10 model in predicting the lettuce growth cycle under real-world conditions, four lettuce images were randomly selected to demonstrate the model’s prediction results (Figure 7). The distribution of target types varies among the images; some are dominated by germination-stage targets, while others contain a large number of seedling-stage individuals. The “Ready” label, indicating the mature and harvestable stage, appears only in certain areas of other test samples not included in Figure 7, owing to the low proportion of mature lettuce in the actual field sampling. Some labels are slightly truncated due to layout limitations, but the primary detection information and confidence levels are fully readable, demonstrating the model’s ability to accurately classify lettuce across developmental stages. The detection algorithm determines both the target location and category with reasonable accuracy, with the bounding boxes generally enclosing the corresponding lettuce plants at different growth stages.

3.3. Ablation Experiments

To evaluate the effectiveness of the improvement schemes in the target detection task, ablation experiments were conducted using different combinations of the improved modules, with the original model (without the CNCM and ASF modules) serving as the baseline. The experimental results are shown in Table 5. The baseline achieved a precision (P) of 91.5%, a recall (R) of 90.0%, an mAP@0.5 of 94.5%, and an mAP@0.5:0.95 of 72.2%, with a parameter count of 7.2 × 10⁶ and an inference time of 25.1 ms.

Incorporating only the ASF module increased the parameter count to 7.8 × 10⁶, GFLOPs to 23.8, and inference time to 26.64 ms. The P value increased to 93.2%, and mAP@0.5 and mAP@0.5:0.95 increased to 95.1% and 72.5%, respectively, while the R value slightly decreased to 89.4%. These results demonstrate that ASF, as an adaptive spatial attention mechanism, enhances target region features and suppresses background interference, thereby improving detection accuracy and overall precision. Although recall is marginally reduced, the overall optimization effect remains substantial.

When only the CNCM module was added, the parameter count increased to 8.3 × 10⁶, GFLOPs to 27.6, and inference time to 26.50 ms. The R value increased to 91.0%, mAP@0.5 and mAP@0.5:0.95 rose to 94.8% and 72.6%, respectively, and the P value slightly increased to 92.1%. These results suggest that CNCM, through its channel non-local mixing mechanism, enhances global feature interaction between channels and facilitates the capture of fine-grained target features; the module primarily improves recall, with minor improvements in precision and overall accuracy.

When both CNCM and ASF modules were incorporated, the model achieved optimal detection performance (P = 91.9%, R = 91.6%, mAP@0.5 = 95.3%, mAP@0.5:0.95 = 72.9%), with a parameter count of 8.9 × 10⁶, GFLOPs of 29.8, and an inference time of 27.76 ms. These results indicate a synergistic effect between the two modules: CNCM enhances global channel interaction to improve recall, while ASF optimizes spatial features to ensure precision. Their complementary effects enable the improved model to achieve optimal performance across all core detection metrics; despite a slight increase in computational requirements and parameter count, the model maintains efficient inference, balancing detection accuracy and practical applicability.

Paired t-test results indicate that the ASF-only scheme demonstrates significantly higher precision (t = 2.87, p = 0.041) and mAP@0.5 (t = 2.93, p = 0.037) than the baseline, suggesting that the adaptive spatial attention mechanism enhances the extraction of target region features and suppresses background interference. The CNCM-only scheme achieves significantly higher recall (t = 2.79, p = 0.045) and mAP@0.5:0.95 (t = 2.81, p = 0.039) than the baseline, demonstrating that the channel non-local mixing mechanism improves global feature interaction and captures fine-grained features. When both modules are combined, the scheme achieves highly significant improvements across all core metrics (p < 0.01), with mAP@0.5 increasing by 0.8% and mAP@0.5:0.95 by 0.7% relative to the baseline. These results confirm a synergistic effect between CNCM and ASF: their combination compensates for the individual limitations of each module, and the observed optimization effect is statistically significant.

3.4. Confusion Matrix and Error Analysis

To quantify the misclassification characteristics of the CCASF-YOLOv10 model in detecting lettuce growth stages and to identify the key factors contributing to detection errors, a confusion matrix was constructed using the test set results (Figure 8). The horizontal axis represents the predicted categories, while the vertical axis denotes the actual categories. The categories comprise empty shell stage, pod setting stage, germination stage, seedling stage, mature and harvestable stage, and background. The numerical values in the matrix indicate the number of samples for each prediction outcome, providing a clear representation of the model’s recognition accuracy and misclassification tendencies for each growth stage.
Analysis of the confusion matrix indicates that the CCASF-YOLOv10 model achieved high recognition accuracy across most lettuce growth stages. The correct classification rates for the empty shell stage, germination stage, and mature and harvestable stage were 96%, 98%, and 81%, respectively. The empty shell stage exhibited the lowest misclassification rate, with only a small number of samples incorrectly identified as the pod setting stage. The misclassification is attributed to the similar texture features between the edge of the empty shell and the young root of the pod setting stage in certain images with uneven lighting. The germination stage showed the highest detection performance, as the cotyledon expansion and true leaf germination during this stage produced distinct morphological features that the model could readily capture and differentiate from other stages.
In summary, the primary factors contributing to the model misclassification include minimal morphological differences between adjacent growth stages, inadequate feature extraction of small target samples, and limited sample sizes in certain stages. The CNCM-ASF dual-mechanism design within the CCASF-YOLOv10 model has effectively addressed these challenges. Specifically, the CNCM mechanism enhances global channel feature interaction, thereby improving the capture of fine-grained features among similar stages. The ASF mechanism suppresses background interference and enhances the spatial feature representation of target regions. The combined effect of these mechanisms substantially reduces the model’s misclassification rate and increases overall detection accuracy. To address remaining misclassification issues, future research can focus on increasing the sample size for underrepresented stages, developing multi-scale feature enhancement modules for small targets, and incorporating a stage-specific feature classification branch.

3.5. Functional Testing

Figure 9 illustrates the system designed for monitoring and analyzing plant growth using machine vision technology. Functional testing of the two detection methods focuses on verifying both stability and accuracy. In album mode, lettuce images with varying resolutions and lighting environments are uploaded to evaluate whether the system can quickly invoke the YOLOv10 model to detect contours, colors, and shapes and accurately classify growth stages as seedling, growth, or maturity. The system must display the test results and provide recommendations for the corresponding growth stage. Verification also ensures that, when users opt to save or continue uploading new images, the system processes them sequentially without lag or delay during continuous batch detection. In real-time monitoring mode, functional testers simulate diverse lighting conditions and dynamic backgrounds to verify that the system can stably acquire real-time images and consistently invoke the YOLOv10 model for dynamic image detection. The primary focus in this mode is response speed, detection accuracy, and continuous detection capability under real-time conditions. Following testing in both modes, the system demonstrates fast, accurate, and stable detection across all growth stages, regardless of the operational mode. This reliability ensures that farmers receive a consistent user experience and scientifically grounded recommendations for growth-stage management. The detection result screens of the plant growth monitoring and analysis system encompass functional areas such as image display, file import, detection result statistics, target location information, and operation buttons.
All performance indicators of the CCASF-YOLOv10 model reported in this study (precision 91.9%, recall 91.6%, mAP@0.5 95.3%, mAP@0.5:0.95 72.9%, etc.) were obtained under the specific conditions of the constructed lettuce growth stage dataset. The dataset’s collection environment (controlled greenhouse, light intensity of 300 μmol·m−2·s−1, day/night temperatures of 22 °C/18 °C, relative humidity of 60% to 70%, 5000 K supplementary lighting), lettuce variety (butterhead), growth stage division standard (empty shell stage, pod setting stage, germination stage, seedling stage, and mature and harvestable stage), and data augmentation methods (adjustments to brightness, contrast, rotation, and other parameters) are essential prerequisites for these performance results. Given the substantial variability in lettuce planting environments (outdoor/greenhouse), varieties, and image acquisition equipment and parameters (resolution, focal length, exposure) across regions, these results apply only to scenarios with conditions similar to those of the dataset used in this study.

4. Conclusions

This study focuses on the intelligent recognition of lettuce growth stages, aiming to improve recognition accuracy and model generalization across developmental phases, including the seedling and pod setting stages. An image dataset encompassing five distinct lettuce growth stages was constructed, and five data augmentation methods, such as rotation and brightness adjustment, were used to expand sample diversity and enhance the model’s adaptability to field environment variations. Building on the multi-scale detection capabilities of the YOLOv10 model, an improved CCASF-YOLOv10 model was proposed by integrating a channel non-local mixing mechanism (CNCM) and an adaptive spatial frequency attention mechanism (ASF). Experimental results indicate that the model achieves a precision (P) of 91.9%, a recall (R) of 91.6%, an mAP@0.5 of 95.3%, an mAP@0.5:0.95 of 72.9%, and a parameter size of 11.9 × 10⁶. Maintaining lightweight characteristics with an inference speed of 24.76 ms, the model demonstrates significantly improved detection efficiency over manual recognition methods, thereby providing reliable technical support for intelligent lettuce planting management.

Despite these achievements, two primary limitations remain. First, the data collection environment is relatively homogeneous, lacking scene samples from diverse regions, seasons, and meteorological conditions, which restricts the model’s generalization ability in field applications. The reported performance results are therefore applicable only to the specific planting environment and collection conditions of the current dataset. Extending applicability to other datasets or collection conditions, such as different lettuce varieties, complex outdoor lighting, or varying acquisition equipment, requires further optimization of model adaptability, which can be achieved by supplementing training and validation with multi-scene samples. The next step is to construct a universal dataset through multi-scene image collection to comprehensively enhance the model’s adaptability to complex field environments. Second, the current model has limited ability to distinguish features of certain subtle growth stages, increasing the risk of misclassification. Future work will optimize feature extraction strategies for similar growth stages; by introducing more refined feature extraction modules and expanding the number of fine-grained classification samples, recognition accuracy for subtle lettuce growth stages can be improved, advancing intelligent recognition toward higher accuracy and finer granularity.

Author Contributions

Conceptualization, Q.G. and C.S.; Methodology, Q.G. and C.S.; Software, Q.G. and C.S.; Validation, Q.G. and C.S.; Writing—original draft, C.S.; Writing—review and editing, Q.G., Y.J. and M.W.; Project administration, Q.G., Y.J. and M.W.; Funding acquisition, Q.G., Y.J. and M.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Shaanxi Provincial Key R&D Program-Key Projects-Qinchuangyuan Industrial Innovation Aggregation Zone “Four Chains” Integration (No. 2024CY-JJQ-35), Project of Young Talent Support Program of Xi’an Science and Technology Association (No. 959202413080), Xi’an Science and Technology Program-University Institutes Science and Technology Personnel Serving Enterprises Project Arts and Sciences Special (No. 24GXFW0081-20), AI-driven Smart Agriculture Discipline Cross-innovation Team, Xi’an College of Arts and Sciences (No. 25WTJX01), Special Project of the Third Batch of Discipline Cross-Building Points (Agricultural Numerical Intelligence) of Xi’an College of Arts and Sciences (No. XY2024JC03), 2025 Xi’an Science and Technology Program Project—Agricultural Key Technology Research Project (General Project) (No. 2025JH-NJSYB-0064).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Barisik Marasli, D.; Colak Gunes, N.; Tavman, S. A comprehensive review of solar photovoltaic hybrid food drying systems. Crit. Rev. Food Sci. Nutr. 2022, 62, 4152–4168. [Google Scholar] [CrossRef]
  2. Niu, S.Q.; Xu, X.L.; Liang, A.; Yun, Y.L.; Li, L.; Hao, F.Q. Research on a lightweight method for maize seed quality detection based on improved YOLOv8. IEEE Access 2024, 12, 32927–32937. [Google Scholar] [CrossRef]
  3. Li, Y.; Xue, C.Y.; Yang, X.G.; Wang, J.; Liu, Y.; Wang, E.L. Reduction of yield risk of winter wheat by appropriate irrigation based on APSIM model. Trans. Chin. Soc. Agric. Eng. 2009, 25, 35–44. [Google Scholar] [CrossRef]
  4. Li, Z.B.; Li, Y.; Yang, Y.B.; Guo, R.H.; Yang, J.Q.; Yue, J.; Wang, Y.Z. A high-precision detection method of hydroponic lettuce seedlings status based on improved Faster RCNN. Comput. Electron. Agric. 2021, 182, 106054. [Google Scholar] [CrossRef]
  5. Wang, Y.D.; Wu, M.G.; Shen, Y.D. Identifying the growth status of hydroponic lettuce based on YOLO-efficientNet. Plants 2024, 13, 372. [Google Scholar] [CrossRef]
  6. Zhao, Y.Q.; Zhang, X.D.; Sun, J.J.; Yu, T.T.; Cai, Z.Y.; Zhang, Z.; Mao, H. Low-cost lettuce height measurement based on depth vision and lightweight instance segmentation model. Agriculture 2024, 14, 1596. [Google Scholar] [CrossRef]
  7. Liu, H.B.; Zhang, P.; Zheng, J.S. An intelligent method and platform for obtaining lettuce canopy coverage. Front. Plant Sci. 2026, 17, 1749000. [Google Scholar] [CrossRef] [PubMed]
  8. Zhang, P.; Li, D.L. CBAM+ASFF-YOLOXs: An improved YOLOXs for guiding agronomic operation based on the identification of key growth stages of lettuce. Comput. Electron. Agric. 2022, 203, 107491. [Google Scholar] [CrossRef]
  9. Zhang, P.; Li, D.L. YOLO-VOLO-LS: A novel method for variety identification of early lettuce seedlings. Front. Plant Sci. 2022, 13, 806878. [Google Scholar] [CrossRef]
  10. Zhang, J.; Yang, W.Z.; Lu, Z.F.; Chen, D. HR-YOLOv8: A Crop Growth Status Object Detection Method Based on YOLOv8. Electronics 2024, 13, 1620. [Google Scholar] [CrossRef]
  11. Zhu, F.L.; Yan, S.; Sun, L.; He, M.Z.; Zheng, Z.W.; Qiao, X. Estimation method of lettuce phenotypic parameters using deep learning multi-source data fusion. Trans. Chin. Soc. Agric. Eng. 2022, 38, 195–204. [Google Scholar] [CrossRef]
  12. Fang, C.; Yang, X. Lightweight YOLOv8 for wheat head detection. IEEE Access 2024, 12, 66214–66222. [Google Scholar] [CrossRef]
  13. Li, T.; Ren, L.; Hu, B.; Wang, S.; Zhao, M.; Zhang, Y.Q.; Yang, M. Grading detection of tomato hole-pan seedlings using improved YOLOv5s and transfer learning. Trans. Chin. Soc. Agric. Eng. 2023, 39, 174–184. [Google Scholar] [CrossRef]
  14. Wang, H.Z.; Sun, L.C.; Li, X.L.; Liu, H.T.; Wang, G.B.; Lan, Y.B. Detecting tomato leaf pests and diseases using improved YOLOv7-tiny. Trans. Chin. Soc. Agric. Eng. 2024, 40, 194–202. [Google Scholar] [CrossRef]
  15. Wang, A.; Chen, H.; Liu, L. Yolov10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
  16. Yuan, S.; Qin, H.; Yan, X. ASCNet: Asymmetric sampling correction network for infrared image destriping. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5001815. [Google Scholar] [CrossRef]
  17. Kang, M.; Ting, C.M.; Ting, F.F. ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation. Image Vis. Comput. 2024, 147, 105057. [Google Scholar] [CrossRef]
  18. Xu, B.; Chai, L.; Zhang, C. Research and application on corn crop identification and positioning method based on Machine vision. Inf. Process. Agric. 2023, 10, 106–113. [Google Scholar] [CrossRef]
  19. Zhen, Y.Y.; Liu, P.; Jin, Y.C.; Yang, G.H. Grape leaf disease detection and identification based on improved YOLOv8. Trans. Chin. Soc. Agric. Eng. 2025, 41, 148–154. [Google Scholar] [CrossRef]
  20. Peng, X.; Zhou, J.P.; Xu, Y.; Xi, G.Z. Cotton top bud recognition method based on YOLOv5-CPP in complex environment. Trans. Chin. Soc. Agric. Eng. 2023, 39, 191–197. [Google Scholar] [CrossRef]
  21. Shi, Y.; Wang, Y.K.; Wang, F.; Qing, S.H.; Zhao, L.; Yuwen, X.C. Recognizing young apples using improved YOLOv8n. Trans. Chin. Soc. Agric. Eng. 2025, 41, 204–210. [Google Scholar] [CrossRef]
  22. Zhu, M.; Wang, H.; Meng, Y. Lightweight multiview mask contrastive network for small-sample hyperspectral image classification. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xiamen, China, 13–15 October 2023; pp. 478–490. [Google Scholar]
  23. Jocher, G. YOLOv5 by Ultralytics, Version 7.0, Computer Software: Chalfont, PA, USA, 2020.
  24. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
  25. Sohan, M.; Sai Ram, T.; Rami Reddy, C.V. A review on yolov8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics; Springer: Singapore, 2024; pp. 529–545. [Google Scholar]
  26. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
  27. Malabanan, J.A.B.; Buenventura, V.A.N.; Domondon, J.Y.F. Growth Stage Classification on Lettuce Cultivars Using Deep Learning Models. In Proceedings of the 2024 IEEE International Conference on Imaging Systems and Techniques (IST); IEEE: New York, NY, USA, 2024; pp. 1–6. [Google Scholar]
Figure 1. Images after adjusting different parameters.
Figure 2. Structure of CCASF-YOLOv10 Network.
Figure 3. RCSSC Module.
Figure 4. ASF Attention.
Figure 5. Flowchart of Lettuce Recognition System.
Figure 6. UI interface.
Figure 7. Test results.
Figure 8. CCASF-YOLOv10 Confusion Matrix.
Figure 9. Plant growth monitoring and analysis system based on machine vision technology.
Table 1. Comparative analysis of existing studies on lettuce intelligent monitoring.

| Reference | Research Content | Method | Dataset | Results | Limitations |
|---|---|---|---|---|---|
| Li et al. [4] | Lettuce seedling state detection | Improved Faster R-CNN | Hydroponic lettuce seedling dataset | mAP = 86.2%, outperforms RetinaNet/SSD | Two-stage algorithm, large parameters, slow inference, no field deployment |
| Wang et al. [5] | Lettuce pest and disease identification | YOLOv8n + EfficientNet-v2s | Hydroponic lettuce health status dataset | Test accuracy = 94.68%, F1 = 96.18% | Only 3 general states, no growth stage subdivision, poor generalization to soil cultivation |
| Zhao et al. [6] | Lettuce height measurement | Lightweight YOLOv8n segmentation | Hydroponic/potted lettuce dataset | Hydroponic accuracy = 94.339%, potted accuracy = 91.22% | Sensitive to light, low soil cultivation accuracy, single parameter measurement |
| Liu et al. [7] | Lettuce canopy coverage evaluation | CAS-PSPNet/MobileNetv3-PSPNet | Lettuce canopy dataset | MIoU = 0.9832/0.9717, model size = 9.3 MB | Single parameter, no full growth stage detection, only static image analysis |
| Zhang et al. [8] | Lettuce growth stage identification | CBAM+ASFF-YOLOXs | Lettuce key growth stage dataset | mAP = 99.04%, higher than original YOLOXs | Large parameter size, no edge device deployment |
| Zhang et al. [9] | Early lettuce seedling variety recognition | YOLO-VOLO-LS | Lettuce seedling SP stage dataset | mAP = 99.04%, higher than original YOLOXs | Only covers early growth stage, no full growth cycle detection |
| Zhang et al. [10] | Crop growth status detection | HR-YOLOv8 | Oil palm/strawberry dataset | mAP = 99.04%, higher than original YOLOXs | No optimization for lettuce morphological characteristics, poor complex background adaptability |
Table 2. Growth stages of lettuce.

| Growth Stage | Description |
|---|---|
| Empty shell stage | No significant changes have occurred at this stage |
| Pod setting stage | Breaking through the seed coat and growing downwards to form the main root |
| Germination stage | The cotyledons are fully unfolded, and the true leaves begin to grow |
| Seedling stage | After the seeds absorb water, their internal physiological activities gradually become active |
| Mature and harvestable | Its edible parts have reached their optimal harvesting state |
Table 3. Number of labels per category.

| Type | Training Set | Test Set | Validation Set |
|---|---|---|---|
| Empty shell stage | 4143 | 646 | 1133 |
| Pod setting stage | 2531 | 337 | 726 |
| Germination stage | 4652 | 631 | 1321 |
| Seedling stage | 1100 | 182 | 311 |
| Mature and harvestable | 1387 | 246 | 376 |
Note: The dataset is partitioned by image quantity with a 7:2:1 ratio. The label quantity ratio deviation is caused by the uneven distribution of lettuce targets in single field images, which is consistent with the actual planting scene.
Table 4. Comparative experiment.

| Model | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Parameters/×10⁶ | GFLOPs | Inference Speed (ms) |
|---|---|---|---|---|---|---|---|
| YOLOv5 | 96.1 | 89.7 | 94.2 | 64.6 | 7.2 | 16.5 | 13.0 |
| YOLOv7 | 92.0 | 92.7 | 95.5 | 70.9 | 8.2 | 36.8 | 13.8 |
| YOLOv8 | 91.3 | 88.7 | 94.6 | 71.5 | 11.3 | 28.4 | 12.3 |
| YOLOv10 | 91.5 | 90.0 | 94.5 | 72.2 | 12.9 | 21.4 | 18.1 |
| YOLOv11 | 92.5 | 92.5 | 94.7 | 72.1 | 7.2 | 21.3 | 17.5 |
| CCASF-YOLOv10 | 91.9 ** | 91.6 ** | 95.3 ** | 72.9 ** | 11.9 | 32.8 | 24.8 |

Note: All comparative experiments in this section are based on the authors’ self-constructed lettuce growth stage dataset (described in Section 2.1 and Section 2.2); no public benchmark datasets are used. The experimental settings are kept consistent for all models to ensure the fairness and comparability of the performance comparison. ** indicates a highly significant difference compared with the baseline YOLOv10 (p < 0.01). All experiments were repeated 5 times to ensure data reliability.
Table 5. Experimental results of CCASF-YOLOv10 model ablation.

| CNCM | ASF | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Parameters/×10⁶ | GFLOPs | Inference Speed (ms) |
|---|---|---|---|---|---|---|---|---|
| × | × | 91.5 | 90.0 | 94.5 | 72.2 | 7.2 | 21.4 | 25.1 |
| × | √ | 93.2 * | 89.4 | 95.1 * | 72.5 * | 7.8 | 23.8 | 26.6 |
| √ | × | 92.1 | 91.0 * | 94.8 | 72.6 * | 8.3 | 27.6 | 26.5 |
| √ | √ | 91.9 ** | 91.6 ** | 95.3 ** | 72.9 ** | 8.9 | 29.8 | 27.8 |

Note: “√” indicates the adoption of the corresponding strategy; “×” indicates its nonadoption. * indicates a significant difference (p < 0.05); ** indicates a highly significant difference (p < 0.01) based on paired t-tests. Each ablation experiment was repeated 5 times, and the average value was taken as the final result.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
