1. Introduction
As a traditional and important economic crop, tea relies heavily on appearance quality for market grading and value assessment. As consumer expectations continue to rise and the tea industry undergoes intelligent transformation, rapid and precise assessment of tea leaf appearance has become a crucial factor driving digital development. Manual, experience-driven inspection is limited by subjectivity, low efficiency, and high labor costs, and it adapts poorly to multi-variety, multi-batch, and complex operating conditions. These constraints place higher demands on automated appearance inspection based on computer vision.
In recent years, artificial intelligence in agricultural settings has advanced steadily across tasks such as crop recognition, pest monitoring, and product grading, and the tea domain has seen effective explorations. For instance, Zhang et al. [1] achieved efficient identification of one-bud-two-leaf tea samples on edge devices through structural simplification and channel pruning. Cao et al. [2] incorporated the GhostNet module together with coordinate attention to achieve lightweight and efficient detection of tea buds. Shuai et al. [3] enhanced the identification of dense small targets by integrating image modality information and modifying the YOLOv5 framework. Wang et al. [4] incorporated attention mechanisms along with improved feature fusion methods, which enhanced the accuracy of small tea bud detection under complex background conditions. Meng et al. [5] refined the YOLOX-tiny and PSPNet architectures to achieve reliable tea bud identification and the precise determination of picking positions. Yang et al. [6] constructed a classification model for tea bud recognition based on YOLOX. Zhao et al. [7] proposed a multi-variety tea bud detection approach by integrating an improved YOLOv7 with an ECA attention mechanism. Meng et al. [8] further enhanced feature extraction by embedding DSConv, CBAM, and CA modules into an improved YOLOv7 network. Shi et al. [9] proposed a small-object detection approach tailored to complex background scenarios by integrating the Swin Transformer with YOLOv8, while Xie et al. [10] enhanced detection performance under challenging conditions by combining deformable convolutions, attention modules, and an improved spatial pyramid pooling structure.
In parallel, lightweight object detection oriented to edge and embedded use cases has progressed rapidly and shows promising transferability and deployability in heterogeneous conditions. Moosmann et al. [11] presented TinyissimoYOLO, which leverages quantization and low-memory optimization for efficient detection on low-power microcontrollers. Li et al. [12] proposed Edge-YOLO, which integrates pruning and feature fusion for lightweight infrared detection on edge devices. Betti et al. [13] proposed YOLO-S, a lightweight framework with a compact architecture and enhanced feature extraction, designed for small-object detection in aerial images. Reis et al. [14] used YOLOv8 with transfer learning to build a lightweight real-time detector, trained on multi-class flying-object data and fine-tuned in complex environments, achieving high Precision for small targets and occlusions. Alqahtani et al. [15] benchmarked different detectors on edge hardware to evaluate the performance of lightweight methods. Nghiem et al. [16] proposed LEAF-YOLO, which integrates lightweight feature extraction with multi-scale fusion to enable real-time detection of small objects in aerial imagery on edge devices.
Despite these advances, tea leaves, which exhibit subtle inter-class differences, complex textures, and significant batch variability, still pose unresolved challenges. Accuracy and speed remain difficult to balance, and parameter counts are often large, which hinders deployment on mobile and embedded platforms. In production scenarios with multiple coexisting varieties, large scale variation, and dense stacking, feature representation is easily disturbed, bounding-box regression becomes unstable, and small objects are frequently missed. It is therefore necessary to conduct targeted research on tea leaf appearance recognition and to adopt a dedicated lightweight design so that detection accuracy is maintained together with stable real-time inference. This direction improves the automation and consistency of quality evaluation and supports smart agriculture in resource-constrained settings.
To tackle these issues, we introduce a lightweight detection framework named TeaAppearanceLiteNet for evaluating tea leaf appearance, in which multiple innovative modules are embedded into the overall architecture. The proposed method effectively lowers computational overhead while sustaining, and in some cases enhancing, detection accuracy, thus demonstrating robust real-time capability. The main contributions of this study are outlined as follows:
- 1. The C3k2_PartialConv module incorporates PartialConv operations to effectively minimize redundant calculations and reduce memory access overhead.
- 2. To address the limitations of CBAM in channel attention, this work presents the CBMA_MSCA mechanism, which incorporates a multi-scale strategy.
- 3. The Detect_PinwheelShapedConv head is introduced, leveraging pinwheel-shaped convolutions to enhance feature perception and spatial representation capabilities.
- 4. The MPDIoU_ShapeIoU loss function is developed by combining MPDIoU and ShapeIoU, aiming to improve detection accuracy and regression stability.
Overall, TeaAppearanceLiteNet achieves a better balance between compactness and accuracy and is clearly distinct from lightweight variants of the YOLO family. C3k2_PartialConv suppresses redundancy and performs selective channel updates, improving feature utilization and computational efficiency while preserving the backbone topology and avoiding the representational loss associated with depthwise separability and layerwise pruning. CBMA_MSCA injects multi-scale context and introduces saliency competition to realize fine-grained discrimination across scales, which matches the subtle differences and wide scale range of tea leaves. Detect_PinwheelShapedConv, together with MPDIoU_ShapeIoU, forms a closed loop of perception enhancement and shape consistency, in which orientation-sensitive convolution strengthens spatial representation and multidimensional regression stabilizes localization, yielding better shape sensitivity and robustness. Benefiting from these designs, TeaAppearanceLiteNet is suitable for agricultural vision tasks in resource-limited deployments, provides a lightweight, efficient, and practical solution for smart agriculture, and offers a transferable design paradigm for fine-grained analysis in tea leaf appearance recognition.
2. Materials and Methods
2.1. Architecture of the TeaAppearanceLiteNet Network
In this work, we present TeaAppearanceLiteNet, whose overall framework is illustrated in Figure 1. The architecture incorporates the C3k2_PartialConv module, leveraging the advantages of PartialConv to improve computational efficiency and feature representation while maintaining structural effectiveness. The CBMA_MSCA attention mechanism is employed to enable the multi-scale modeling of channel attention, allowing for a more refined extraction of salient features across targets of varying sizes. The Detect_PinwheelShapedConv head introduces a pinwheel-shaped convolution in place of part of the conventional convolution operations, enhancing both feature perception and spatial representation. To strengthen detection robustness, this study introduces the MPDIoU_ShapeIoU regression loss, which simultaneously accounts for the spatial position, geometric shape, and scale consistency between predicted and ground-truth boxes, thus improving both accuracy and regression stability.
2.2. C3k2_PartialConv
This work presents the C3k2_PartialConv module, where the Bottleneck in the original C3k structure of the C3k2 module is substituted with PartialConv [17] to enable more efficient and accurate feature extraction. The overall design is shown in Figure 2.
The core idea of PartialConv is that convolution operations are performed on only a portion of the input feature map channels, while the other channels are directly forwarded without modification. In practice, either the initial or final continuous segment of channels is typically selected as the convolution subset to facilitate contiguous memory access and improve execution efficiency. The channel ratio for the subset (e.g., 1/4 or 1/2) is predefined and has been validated across multiple tasks to retain most of the essential information effectively.
To strengthen feature fusion, a 1 × 1 convolution is performed across all channels after the PartialConv operation. This step compensates for the untouched channels and improves the overall completeness and accuracy of feature representation.
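To make the partial-convolution idea concrete, the following PyTorch sketch applies a 3 × 3 convolution to only a fraction of the channels, forwards the remaining channels unchanged, and fuses all channels with a 1 × 1 convolution. The split ratio, kernel size, and the normalization and activation in the fusion step are illustrative assumptions rather than the exact TeaAppearanceLiteNet implementation.

```python
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """Minimal sketch of PartialConv: a 3x3 convolution is applied to only the
    first 1/div of the channels; the remaining channels are forwarded untouched.
    A 1x1 convolution then mixes all channels."""

    def __init__(self, channels: int, div: int = 4):
        super().__init__()
        self.conv_ch = channels // div                # channels that are convolved
        self.untouched_ch = channels - self.conv_ch   # channels passed through as-is
        self.partial_conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, 1, 1, bias=False)
        # Pointwise fusion so the untouched channels also contribute to the output.
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.conv_ch, self.untouched_ch], dim=1)
        x1 = self.partial_conv(x1)  # convolve only the selected channel slice
        return self.fuse(torch.cat([x1, x2], dim=1))


if __name__ == "__main__":
    y = PartialConv(64)(torch.randn(1, 64, 80, 80))
    print(y.shape)  # torch.Size([1, 64, 80, 80])
```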
By preserving the multi-scale processing strengths of the C3k module while optimizing convolutional efficiency, this modification maintains accuracy at a lower computational cost. The benefit is particularly evident in boundary detection and scenes with complex backgrounds, where higher feature extraction precision is required. By exploiting the redundancy among feature channels, the C3k2_PartialConv module markedly improves computational efficiency without a substantial increase in parameters or overall computational burden.
In the context of tea leaf appearance inspection, the selective convolution mechanism of PartialConv enhances the ability to capture fine-grained edge information, which is especially valuable for delineating irregular leaf boundaries. Moreover, by emphasizing critical spatial features while reducing redundant computation, the C3k2_PartialConv module improves robustness against variations in leaf shape and complex background interference, thereby supporting more accurate and reliable feature extraction in agricultural vision tasks.
2.3. CBMA_MSCA
This work introduces the CBMA_MSCA attention mechanism, which extends the original CBAM [18] by substituting its channel attention component with the MSCA module [19]. Through this replacement, multi-scale channel modeling is incorporated to improve the accuracy of feature representation. The overall architecture is shown in Figure 3.
For channel modeling, CBMA_MSCA leverages the multi-branch bar-shaped convolutional structure of MSCA, including kernels such as 1 × 7, 7 × 1, 1 × 11, 11 × 1, 1 × 21, and 21 × 1 to extract features at multiple scales in parallel. This enables the effective capture of both local details and long-range dependencies, enhances the ability to model significant inter-channel interactions, and improves the accuracy and discriminative power of channel attention.
For spatial modeling, CBMA_MSCA preserves the spatial attention component of CBAM, where spatial features are obtained via a parallel average and max pooling, followed by convolution to produce the spatial attention map. This enhances the sensitivity of the network to key spatial positions and improves the perception of object structure and spatial layout.
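A minimal PyTorch sketch of this joint attention is given below, assuming an MSCA-style channel branch (a depthwise 5 × 5 base kernel plus the 1 × 7/7 × 1, 1 × 11/11 × 1, and 1 × 21/21 × 1 strip branches) followed by CBAM's spatial branch; the base kernel size, branch ordering, and fusion details are assumptions for illustration rather than the exact module.

```python
import torch
import torch.nn as nn


class MSCAChannel(nn.Module):
    """MSCA-style multi-scale strip-convolution attention used in place of
    CBAM's channel attention (kernel sizes follow the text above)."""

    def __init__(self, c: int):
        super().__init__()
        self.base = nn.Conv2d(c, c, 5, padding=2, groups=c)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c, c, (1, k), padding=(0, k // 2), groups=c),
                nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0), groups=c),
            )
            for k in (7, 11, 21)
        )
        self.mix = nn.Conv2d(c, c, 1)

    def forward(self, x):
        attn = self.base(x)
        attn = attn + sum(branch(attn) for branch in self.branches)
        return x * self.mix(attn)


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: channel-wise avg/max maps -> 7x7 conv -> sigmoid."""

    def __init__(self, k: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.amax(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))


class CBMA_MSCA(nn.Module):
    """Multi-scale channel attention followed by spatial attention."""

    def __init__(self, c: int):
        super().__init__()
        self.channel = MSCAChannel(c)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))
```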
CBMA_MSCA leverages multi-scale channel modeling together with refined spatial saliency to establish joint channel–spatial attention, which substantially improves its capacity for effective feature selection. Additionally, CBMA_MSCA inherits the lightweight design of CBAM by employing depthwise separable convolutions, resulting in minimal computational and parameter overhead. With strong adaptability and generalization, this design proves effective for a broad spectrum of visual recognition tasks.
Compared with other attention mechanisms, CBMA_MSCA achieves a higher accuracy because the introduction of multi-scale strip convolutions effectively aggregates contextual information across different receptive fields. This design provides a superior capability in modeling objects of varying sizes, which is critical for dense prediction tasks. Therefore, CBMA_MSCA can capture fine-grained details while simultaneously maintaining global consistency, leading to more precise feature representations and an improved overall performance.
2.4. Detect_PinwheelShapedConv
This study introduces the detection head Detect_PinwheelShapedConv, which integrates PinwheelShapedConv [20] to enable more effective feature extraction and receptive field expansion. This design is particularly well-suited for detecting weak and small targets. The architecture is shown in Figure 4.
Unlike standard convolution, Detect_PinwheelShapedConv adopts the distinctive asymmetric padding strategy of a pinwheel-shaped convolution. Through the outward alternation of horizontal and vertical kernels, the receptive field is greatly extended. This innovative design facilitates efficient low-level feature capture, enhances object–background discrimination, and significantly strengthens the modeling of subtle target features.
Detect_PinwheelShapedConv greatly enlarges the receptive field while incurring only a slight growth in parameter count. This parameter efficiency is attributed to its grouped convolution structure, which enables the significant enlargement of the receptive field while maintaining low computational overhead. Owing to its structure, the design is particularly advantageous for small-object detection when dealing with faint targets and background complexity.
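The sketch below illustrates the asymmetric-padding idea with four strip-convolution branches whose one-sided padding extends the receptive field in different directions before a pointwise fusion; the branch widths, kernel length, and fusion layer are illustrative assumptions, and the surrounding detection-head wiring is omitted.

```python
import torch
import torch.nn as nn


class PinwheelShapedConv(nn.Module):
    """Sketch of a pinwheel-shaped convolution: four horizontal/vertical strip
    kernels, each with asymmetric (one-sided) zero padding so their receptive
    fields extend outward in different directions, concatenated and fused."""

    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        c_mid = c_out // 4
        # nn.ZeroPad2d padding order is (left, right, top, bottom).
        pads = [(k - 1, 0, 0, 0), (0, k - 1, 0, 0), (0, 0, k - 1, 0), (0, 0, 0, k - 1)]
        kernels = [(1, k), (1, k), (k, 1), (k, 1)]
        self.branches = nn.ModuleList(
            nn.Sequential(nn.ZeroPad2d(p), nn.Conv2d(c_in, c_mid, ks, bias=False))
            for p, ks in zip(pads, kernels)
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(4 * c_mid, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.SiLU(),
        )

    def forward(self, x):
        # Each branch preserves the spatial size, so the outputs can be concatenated.
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))
```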
Furthermore, Detect_PinwheelShapedConv improves tea leaf appearance inspection by effectively handling shape variation, fine-grained edges, and low-contrast defects. Its expanded anisotropic receptive field captures curved contours under deformation, while the alternating horizontal–vertical kernels enhance sensitivity to serrated margins and microcracks. The Gaussian-aligned emphasis increases contrast for weak textures and blemishes, reducing background interference. In addition, the decoupled-head formulation allows the classification and localization branches to specialize, jointly improving Precision and stability for subtle defects in complex production settings.
2.5. MPDIoU_ShapeIoU
This study proposes the MPDIoU_ShapeIoU loss function, which effectively combines the positional information of MPDIoU [21] with the shape and scale descriptors of ShapeIoU [22]. This integration allows for more sensitive capture of both location discrepancies and differences in geometric proportions, thereby improving generalization across targets of varying scales and shapes. As a result, the accuracy and robustness of the bounding box regression are significantly enhanced.
The MPDIoU_ShapeIoU loss function is defined in (1), combining an IoU overlap term with the center-distance, shape, and edge-distance terms described below.

The IoU term computes the ratio between the overlap and the union of the two boxes, as given in (2):

$$\mathrm{IoU} = \frac{\left| B_1 \cap B_2 \right|}{\left| B_1 \cup B_2 \right|} \tag{2}$$

where $\left| B_1 \cap B_2 \right|$ is the intersection area of boxes $B_1$ and $B_2$, and $\left| B_1 \cup B_2 \right|$ is the union area. $\rho$ denotes the center-to-center Euclidean distance of the two boxes, as in (3). The shape term measures the discrepancy of shapes between the two boxes, as in (4), where $w_1$, $h_1$ and $w_2$, $h_2$ are the width and height of $B_1$ and $B_2$, respectively. $d_1$ and $d_2$ denote the vertical edge distances between the two boxes: $d_1$ is the vertical distance along one axis, defined in (5), and $d_2$ is the vertical distance along the other axis, defined in (6). Finally, $\lambda$ is a weighting factor that can be adjusted according to box size or other considerations; it rescales the vertical distances so that their influence is appropriate for boxes of different sizes, as shown in (7).
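As an illustration of how such terms can be combined, the following PyTorch sketch mixes an IoU term, MPDIoU-style corner-distance penalties, and a simple width/height shape discrepancy. The exact shape formulation and weighting used by MPDIoU_ShapeIoU may differ, so this is a simplified stand-in rather than the paper's loss.

```python
import torch


def combined_iou_loss_sketch(pred, target, img_w, img_h, lam=1.0):
    """Illustrative combined box-regression loss (boxes given as [x1, y1, x2, y2]).
    Mixes an IoU term, MPDIoU-style corner-distance penalties, and a simple
    width/height shape-discrepancy term; the weighting `lam` is an assumption."""
    # Intersection and union (cf. Eq. (2)).
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # MPDIoU-style penalties: squared distances between matching corners,
    # normalized by the squared image diagonal so the scale stays comparable.
    diag2 = float(img_w) ** 2 + float(img_h) ** 2
    d1 = ((pred[:, :2] - target[:, :2]) ** 2).sum(dim=1) / diag2  # top-left corners
    d2 = ((pred[:, 2:] - target[:, 2:]) ** 2).sum(dim=1) / diag2  # bottom-right corners

    # Simple shape discrepancy between widths and heights (ShapeIoU-inspired).
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    shape = (torch.abs(w_p - w_t) / torch.max(w_p, w_t).clamp(min=1e-7)
             + torch.abs(h_p - h_t) / torch.max(h_p, h_t).clamp(min=1e-7))

    return (1.0 - iou + d1 + d2 + lam * shape).mean()


if __name__ == "__main__":
    p = torch.tensor([[10.0, 10.0, 50.0, 60.0]])
    t = torch.tensor([[12.0, 14.0, 48.0, 58.0]])
    print(combined_iou_loss_sketch(p, t, img_w=640, img_h=640))
```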
2.6. Dataset
The tea leaf appearance dataset constructed in this study encompasses a wide range of leaf conditions, with the goal of enhancing the model’s ability to identify the diverse visual traits of tea leaves. The data preparation procedure is presented in Figure 5. To ensure clean backgrounds and prominent subjects, tea leaves were uniformly arranged on white paper and photographed under controlled conditions. All images were captured using the main camera of an Apple iPhone 14 Pro Max, which has a resolution of 48 megapixels, with the original image resolution set to 4032 × 3024 pixels. During preprocessing, each image was cropped and resized to 640 × 640 pixels to better preserve fine leaf details. A strict screening procedure was applied to eliminate blurred, distorted, or otherwise unsuitable images, ensuring the dataset’s integrity and reliability.
The finalized dataset contains 3313 images, with 2320 allocated for training, 664 for validation, and 329 reserved for testing. Each tea leaf instance within the images is annotated into one of four categories based on appearance characteristics: fine indicates complete, tender leaves with clear edges and uniform color, representing a high-quality appearance; coarse refers to older, rough leaves with damaged or wrinkled edges, indicating a lower visual quality; touching describes leaves that are close to or overlapping each other, resulting in unclear or compressed boundaries; unsure is used when factors such as blur, occlusion, or abnormal lighting prevent an accurate judgment of the appearance. The total number of annotated instances for each category is 9209 (fine), 10,914 (coarse), 66 (touching), and 1530 (unsure).
To mitigate the impact of class imbalance during evaluation and the limitations arising from the relatively small dataset size, several strategies were applied during model training. The training pipeline employed data augmentation, including random flipping and scaling, to increase sample diversity while preserving the key visual characteristics of tea leaves. The hyperparameters were also carefully tuned: a smaller learning rate and early stopping were employed to mitigate overfitting, while an appropriate batch size combined with multi-scale training contributed to improved robustness and better generalization across categories. Performance was further assessed using class-wise metrics such as Precision and Recall, ensuring fair and representative measurement for all categories.
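For example, class-wise Precision and Recall can be computed from per-class detection counts so that the rare touching category is not masked by the majority classes; the counts in the snippet below are hypothetical and serve only to illustrate the computation.

```python
def classwise_precision_recall(stats):
    """Compute per-class Precision and Recall from detection counts.
    `stats` maps a class name to (true_positives, false_positives, false_negatives)."""
    metrics = {}
    for cls, (tp, fp, fn) in stats.items():
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        metrics[cls] = {"precision": round(precision, 3), "recall": round(recall, 3)}
    return metrics


# Hypothetical counts for the four appearance categories, for illustration only.
example_counts = {
    "fine": (900, 50, 40),
    "coarse": (1000, 80, 60),
    "touching": (5, 2, 3),
    "unsure": (120, 30, 25),
}
print(classwise_precision_recall(example_counts))
```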
2.7. Experimental Environment and Parameter Configuration
Table 1 presents the hardware and software specifications of the computer used for training, together with detailed information about the training environment of the model. The training was conducted with Python version 3.8.19, which was selected on the basis of its compatibility with essential libraries, even though more recent versions of Python were available.
Table 2 summarizes the training parameters adopted in the experimental process.
Figure 6 depicts the training and validation loss curves obtained over 300 epochs. A rapid reduction in all loss components is observed in the early stage, followed by stabilization, which demonstrates steady optimization and robust convergence. The training and validation curves align closely in both value and trend, indicating that no significant overfitting or underfitting occurs and confirming the effectiveness of the training strategy and the strong generalization capability of the model.
4. Discussion
The proposed TeaAppearanceLiteNet strikes an effective balance between lightweight architecture and detection accuracy, demonstrating its feasibility and potential for tea leaf appearance grading. A further point of discussion is its adaptability to more complex tasks and broader application scenarios.
Although four categories are adopted as the experimental basis in this study, grading standards in the tea industry are usually more refined and may even involve cross-varietal distinctions. With strong feature extraction and multi-scale representation capabilities, TeaAppearanceLiteNet shows the potential for application in multi-class grading tasks, but validation with larger-scale and more complex tasks is still needed to further establish its applicability.
The dataset employed in this study was collected under relatively standardized conditions, which ensures stability for both training and evaluation. However, the performance of existing methods in real field environments requires further verification, where variations in illumination, leaf posture, and background diversity may introduce new challenges. Future work may consider collecting more diverse samples under natural conditions to comprehensively evaluate robustness and adaptability in complex scenarios.
The data volume used in this study also leaves room for expansion compared with common deep learning tasks. To enhance generalization and performance, future research can apply data augmentation techniques to enrich diversity and progressively include samples with varying illuminations, backgrounds, and leaf conditions. Such efforts would not only improve generalization but also provide a stronger foundation for practical deployment.
From the perspective of industrial application, the lightweight nature of TeaAppearanceLiteNet enables deployment on embedded platforms and edge computing devices, satisfying the requirements of real-time processing under limited computational resources. It should be noted that applications of computer vision and machine learning in tea leaf grading are still at an exploratory stage, with most work concentrated in academic research. With the rapid progress of intelligent detection technologies in agricultural product grading, this approach can be expected to have considerable potential in tea grading and quality control. If performance stability in complex environments is further improved while maintaining accuracy and efficiency, and adaptation to industrial hardware conditions is achieved, its practical value will become more prominent.
In summary, TeaAppearanceLiteNet not only verifies the effectiveness of lightweight networks for tea leaf appearance inspection but also offers valuable insights for future research and industrial applications. Subsequent work may focus on expanding data diversity, enhancing adaptability in natural environments, and validating industrial applications, thereby advancing the adoption of intelligent detection technologies in the digital transformation of the tea industry.