1. Introduction
The tea tree serves as the cornerstone of the tea industry, playing a pivotal role in economic and cultural domains [1]. The quality of its seeds is critical to industrial development: moldy seeds exhibit low germination rates and produce vulnerable seedlings prone to disease, while improperly soaked seeds sprout with difficulty [2]. Therefore, the precise elimination of low-quality seeds is key to efficient tea garden cultivation and the high-quality development of the tea industry.
In recent years, spectral technologies have been widely applied to seed detection and screening. Yang et al. [3] proposed the MIVopt-SPA algorithm for feature wavelength extraction from near-infrared (NIR) spectra of seeds, achieving multi-level, accurate, non-destructive detection of seed viability. Kusumaningrum et al. [4] developed a non-destructive method based on Fourier transform near-infrared (FT-NIR) spectroscopy to assess soybean seed viability, accurately distinguishing viable from non-viable seeds through spectral analysis combined with chemometrics. Cui et al. [5] studied the maturity detection of single corn seeds based on hyperspectral imaging and transfer learning, establishing a model that enables rapid, non-destructive maturity assessment. However, in practical applications, NIR spectroscopy is affected by factors such as temperature, light, moisture, and sample physical morphology (e.g., curvature and particle size), which readily bias the detection results [6]. Hyperspectral image classification faces challenges including high data dimensionality, high sample-annotation costs, significant intra-class variation, and difficulties in spectral–spatial feature modeling, leading to insufficient model generalization [7].
The wavelength range of MIR spectroscopy typically spans 2500 to 25,000 nm. This region corresponds to the characteristic spectral zone of molecular functional groups, where different chemical bonds or functional groups exhibit distinctive absorption frequencies; spectral analysis can thus identify chemical species and functional groups [8]. Eevera et al. [9] used attenuated total reflection Fourier transform infrared (ATR–FTIR) spectroscopy to acquire spectral data of peanut seeds; by analyzing the correlation between specific wavelength bands and quality indicators such as germination rate and viability, they demonstrated its potential for rapid, non-destructive detection of peanut seed quality. Gabriela et al. [10] employed mid-infrared diffuse reflection spectroscopy combined with stochastic gradient descent (SGD) preprocessing and support vector machine (SVM) models to achieve high-precision prediction of 11 nutrient concentrations in Ilex paraguariensis leaves. Andrade et al. [11] used ATR–FTIR coupled with chemometric tools to model the viability of artificially accelerated-aged corn seeds, achieving classification and prediction of seed viability grades from the correlation between the spectral data and the viability indicators.
While traditional MIR spectroscopy has achieved fruitful results in seed quality detection and other fields, deep learning has brought new research perspectives and technological breakthroughs to spectral analysis in recent years, thanks to its powerful feature-learning capability. For example, Ma [12] employed near-infrared hyperspectral imaging (NIR–HSI) combined with deep learning methods such as convolutional neural networks (CNNs) to achieve rapid, non-destructive prediction of seed viability, demonstrating high accuracy in classifying naturally aged seeds of Brassica juncea (Japanese mustard). Li et al. [13] used hyperspectral imaging and deep learning to classify multi-year, multi-variety pumpkin seeds, achieving precise differentiation of seed types. Jin et al. [14] used NIR–HSI combined with deep learning models to identify five common rice seed varieties, with most models achieving classification accuracies exceeding 95%.
DenseNet demonstrates a higher connection density than ResNet, significant classification accuracy advantages over MobileNet, and better parameter efficiency than traditional CNNs, particularly for spectral data classification. To achieve accurate screening of tea tree seeds using MIR spectroscopy, this study proposes an improved model based on DenseNet121 (ECA-DenseNet) for classifying seeds of different quality levels. Specifically, the following tasks were carried out:
An FTIR spectrometer was employed to gather spectral data of tea tree seeds in different states. The data were preprocessed via Savitzky–Golay (SG) filtering and wavelet transform to ensure data quality (a minimal preprocessing sketch is given after this list).
The model was enhanced by addressing DenseNet121’s shortcomings, including simplifying the architecture, replacing the convolutional kernels, adopting a new normalization method, introducing a novel module, optimizing the transition layers, and adjusting the classification strategy to improve the model performance.
The improved model was compared with DenseNet121 and other relevant models. Its advantages were evaluated using metrics such as accuracy, Kappa coefficient, Matthews correlation coefficient (MCC), and confusion matrices, verifying the effectiveness of the improvement strategies.
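For illustration, the first task above (SG filtering followed by wavelet denoising) can be sketched as a short Python routine. The window length, polynomial order, wavelet family, and threshold below are placeholder choices rather than the settings used in this study.

```python
# Minimal preprocessing sketch: Savitzky-Golay smoothing followed by
# wavelet denoising of a 1-D MIR spectrum. All parameter values are
# illustrative placeholders, not the settings used in this study.
import numpy as np
import pywt
from scipy.signal import savgol_filter

def preprocess_spectrum(absorbance: np.ndarray,
                        window: int = 11, polyorder: int = 2,
                        wavelet: str = "db4", level: int = 3) -> np.ndarray:
    # 1) Savitzky-Golay filtering to suppress high-frequency noise
    smoothed = savgol_filter(absorbance, window_length=window, polyorder=polyorder)

    # 2) Wavelet decomposition, soft-thresholding of the detail coefficients,
    #    and reconstruction (simple universal-threshold denoising)
    coeffs = pywt.wavedec(smoothed, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745           # MAD noise estimate
    thresh = sigma * np.sqrt(2 * np.log(len(smoothed)))
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    denoised = pywt.waverec(coeffs, wavelet)

    return denoised[: len(absorbance)]                        # trim any padding
```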
3. Results and Discussion
3.1. Experimental Environment
The experiments were conducted on the Windows 11 operating system (Microsoft, Redmond, WA, USA). The hardware comprised 128 GB of RAM, an Intel(R) Xeon(R) Bronze 3204 CPU (Intel, Santa Clara, CA, USA), and an NVIDIA GeForce RTX 3090 graphics card with 24 GB of video memory (NVIDIA, Santa Clara, CA, USA). All Python code was run in PyCharm Version 2024.3.1.1 (JetBrains, Prague, Czech Republic), with the environment based on PyTorch-GPU 1.8.0 (Meta, Menlo Park, CA, USA) and the Python 3.8.8 programming language (Python Software Foundation, USA). The CUDA computing platform, version 11.1 (NVIDIA, Santa Clara, CA, USA), was used for GPU acceleration.
3.2. Hyperparameter Settings
The reasonable setting of hyperparameters plays a crucial role in the model’s performance and training effect [23]. We adjusted and optimized several key hyperparameters, with the specific settings shown in Table 2.
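For illustration only, the snippet below shows how such hyperparameters are typically wired into a PyTorch training setup. The numeric values and the stand-in network are placeholders; the settings actually used are those reported in Table 2.

```python
# Illustrative wiring of training hyperparameters in PyTorch.
# All numeric values are placeholders; the settings actually used
# in this study are those reported in Table 2.
import torch
import torch.nn as nn

hparams = {"batch_size": 32, "epochs": 100, "lr": 1e-3, "weight_decay": 1e-4}

model = nn.Sequential(nn.Linear(1800, 256), nn.ReLU(), nn.Linear(256, 4))  # stand-in network
optimizer = torch.optim.Adam(model.parameters(),
                             lr=hparams["lr"],
                             weight_decay=hparams["weight_decay"])
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
criterion = nn.CrossEntropyLoss()
```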
3.3. Evaluation Metrics
This study uses precision (P), recall (R), F1-score (F1), and accuracy [24] as evaluation metrics. In terms of the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), they are defined as
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
The remaining metrics are defined as follows:
Kappa [25] measures the consistency between the model predictions and the actual results, complementing accuracy on imbalanced datasets. It ranges over [−1, 1]: values closer to 1 indicate better consistency, 0 corresponds to random guessing, and −1 indicates complete disagreement. It is computed as
$$\kappa = \frac{p_o - p_e}{1 - p_e},$$
where $p_o$ is the proportion of correctly predicted samples (the accuracy) and $p_e$ is the proportion of correct predictions expected by random chance.
MCC [26] evaluates models using TP, TN, FP, and FN, providing a balanced assessment of performance on imbalanced datasets. It ranges over [−1, 1]: 1 indicates perfect prediction, 0 random guessing, and −1 complete misclassification. It is computed as
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},$$
with TP, TN, FP, and FN as defined above.
Macro Average [27] calculates a metric (e.g., precision, recall, or F1) independently for each class and averages the per-class values without weighting by sample size, reflecting the model’s average performance across classes. Taking precision as an example,
$$P_{\mathrm{macro}} = \frac{1}{n} \sum_{i=1}^{n} P_i,$$
where $n$ is the number of classes and $P_i$ is the precision of the $i$-th class.
Weighted Average [27] averages the per-class metrics weighted by the class sample proportions (larger classes have greater influence), which accounts for class imbalance. Taking precision as an example,
$$P_{\mathrm{weighted}} = \sum_{i=1}^{n} \frac{N_i}{N} P_i,$$
where $N_i$ is the number of samples in the $i$-th class and $N$ is the total number of samples.
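In practice, all of the above metrics can be computed directly from the predicted and true labels; a minimal scikit-learn sketch with placeholder labels for the four seed classes is shown below.

```python
# Computing the evaluation metrics described above with scikit-learn.
# y_true and y_pred are placeholder label arrays for the four seed classes.
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             matthews_corrcoef, precision_score, recall_score)

y_true = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1])
y_pred = np.array([0, 1, 2, 3, 0, 1, 2, 2, 0, 1])

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Kappa    :", cohen_kappa_score(y_true, y_pred))
print("MCC      :", matthews_corrcoef(y_true, y_pred))
print("P  (macro / weighted):", precision_score(y_true, y_pred, average="macro"),
      precision_score(y_true, y_pred, average="weighted"))
print("R  (macro / weighted):", recall_score(y_true, y_pred, average="macro"),
      recall_score(y_true, y_pred, average="weighted"))
print("F1 (macro / weighted):", f1_score(y_true, y_pred, average="macro"),
      f1_score(y_true, y_pred, average="weighted"))
```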
3.4. Chemical Analysis
In this study, mid-infrared spectroscopic chemical analysis was performed on the four types of samples. As shown in Figure 5, the chemical characteristics of the four types of tea tree seeds differ significantly and point to distinct physicochemical features:
Dry and healthy tea tree seeds (HDSC): the O-H stretching vibration peak at 3379.8 cm−1 indicates the presence of hydroxyl-related substances in the seeds. The C-H asymmetric/symmetric stretching vibration peaks at 2856.2 cm−1 and 2926.1 cm−1 suggest the presence of lipids and long-chain alkanes. Additionally, the C-H bending vibration peak at 1380.6 cm−1, the potential ester C-O or amide III band characteristic peak at 1246.2 cm−1, and the polysaccharide C-O-C skeleton vibration peak at 1042.8 cm−1 collectively demonstrate the stable chemical composition and good structural properties of the seed components.
Soaked healthy tea tree seeds (SHCS): Compared to HDSC, the O-H stretching vibration peak at 3452.3 cm−1 exhibits a red shift, implying that the soaking environment promotes the synthesis of hydrophilic substances. The enhanced C-O-C asymmetric stretching vibration peak at 1162.8 cm−1 (originating from hemicellulose synthesis) and the sharp peak of long-chain lipid ordered structures at 722.1 cm−1 indicate that the soaking treatment facilitates hemicellulose synthesis while maintaining lipid structural stability.
Dry and moldy tea tree seeds (DMCS): in addition to the C-H stretching vibration peaks shared with HDSC, an ester carbonyl C=O peak appears at 1744.8 cm−1, signaling lipid peroxidation. The CH2/CH3 bending vibration peak at 1461.1 cm−1, ester C-O stretching vibration peak at 1242.5 cm−1, residual C-O-C glycosidic bond breakage peak at 1160.8 cm−1, and -(CH2)n-out-of-plane rocking vibration peak at 722.1 cm−1 reveal degradation of lipid side chains and disruption of polysaccharide structures.
Soaked moldy tea tree seeds (SMCS): The O-H stretching vibration peak at 3385.8 cm−1 indicates the presence of H-bonded hydroxyl groups from polysaccharide hydrolysis products and fungal metabolites. The shift of the amide I band peak at 1655.2 cm−1 may indicate changes in the secondary structure of proteins. The characteristic peaks at 862.9 cm−1 and 719.5 cm−1, potentially associated with carbohydrate structures (e.g., cellulose) and lipid crystallization, respectively, reflect the damage to the structure and properties of multiple seed components caused by soaking and mildew.
Differences in the characteristic peak position, intensity, and shape (e.g., amide I band shift, lipid peak sharpness) serve as qualitative indicators, providing a basis for constructing deep learning-based seed quality classification models.
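The characteristic peak positions discussed above can also be extracted programmatically from a preprocessed spectrum; the sketch below uses scipy.signal.find_peaks with an illustrative prominence threshold and a synthetic stand-in spectrum.

```python
# Locating absorption peaks in a preprocessed MIR spectrum.
# The spectrum and the prominence threshold here are illustrative stand-ins.
import numpy as np
from scipy.signal import find_peaks

wavenumbers = np.linspace(4000, 400, 1800)           # cm^-1 axis (descending)
rng = np.random.default_rng(0)
absorbance = rng.random(1800) * 0.05                  # stand-in baseline noise
absorbance[300] += 1.0                                # synthetic peaks for the demo
absorbance[900] += 0.6

peaks, props = find_peaks(absorbance, prominence=0.1)
for idx, prom in zip(peaks, props["prominences"]):
    print(f"peak at {wavenumbers[idx]:.1f} cm^-1 (prominence {prom:.2f})")
```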
3.5. Comparative Experiments and Analysis
3.5.1. Comparison with the Original Model DenseNet121
A comparison was conducted between ECA-DenseNet and the DenseNet121 model, with the results shown in Table 3. The four metrics of ECA-DenseNet consistently exceed 99%, while those of DenseNet121 are all lower.
Table 4 demonstrates that both the macro-average and weighted-average metrics of ECA-DenseNet are around 99%, while the macro-average and weighted-average metrics of DenseNet121 are lower than those of ECA-DenseNet.
In summary, compared with DenseNet121, ECA-DenseNet achieves a relative improvement of about 3.6% in metrics such as P, R, and accuracy.
Confusion matrices serve as critical tools for evaluating the performance of classification models, intuitively displaying a model’s prediction outcomes for each class [28]. The confusion matrices of the test sets for DenseNet121 and the improved ECA-DenseNet are compared in Figure 6. The results show that ECA-DenseNet correctly predicts all the samples in each class, while DenseNet121 exhibits a misjudgment rate of up to 3.7% across the classes.
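A confusion matrix of this kind can be generated and plotted in a few lines with scikit-learn; the labels and predictions in the sketch below are placeholders rather than the test-set results.

```python
# Building and plotting a confusion matrix for the four seed classes.
# The labels and predictions below are placeholders, not the test-set results.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

classes = ["HDSC", "SHCS", "DMCS", "SMCS"]
rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, size=200)                 # stand-in ground truth
y_pred = y_true.copy()
y_pred[:7] = (y_pred[:7] + 1) % 4                     # inject a few misclassifications

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
ConfusionMatrixDisplay(cm, display_labels=classes).plot(cmap="Blues")
plt.title("Confusion matrix (illustrative)")
plt.show()
```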
3.5.2. Comparison of ECA-DenseNet with Machine Learning Models
This section compares the performance of the ECA-DenseNet model with machine learning models, including eXtreme Gradient Boosting (XGBoost) [29], Support Vector Machine (SVM) [30], Backpropagation (BP) neural network [31], Random Forest (RF) [32], Gradient Boosting Machine (GBM) [33], and PLS-DA [34], on the test set. Evaluation metrics including accuracy (the overall correctness rate of predictions), Kappa (agreement accounting for random chance), and MCC (a balanced measure robust to class imbalance) are employed, with the results shown in Table 5 and Figure 7. In the key classification metrics, ECA-DenseNet outperforms the other machine learning models, with its accuracy, Kappa value, and MCC all exceeding 99%. The best-performing machine learning model, SVM, achieves corresponding values of 91.42%, 90.89%, and 91.12%, respectively. ECA-DenseNet thus outperforms SVM by approximately 8% across all the metrics.
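For reference, several of these baselines can be trained and scored on the same features with only a few lines of scikit-learn code; the sketch below uses default hyperparameters and synthetic stand-in data rather than the tuned settings and spectra of this study.

```python
# Training a few machine-learning baselines and scoring them with the same
# metrics. Default hyperparameters and synthetic data are used here purely
# for illustration; they are not the settings or spectra of this study.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score, matthews_corrcoef

X = np.random.rand(400, 200)                          # stand-in spectral features
y = np.random.randint(0, 4, size=400)                 # stand-in class labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("RF", RandomForestClassifier(n_estimators=200)),
                  ("GBM", GradientBoostingClassifier())]:
    clf.fit(X_tr, y_tr)
    y_hat = clf.predict(X_te)
    print(name,
          f"Acc={accuracy_score(y_te, y_hat):.4f}",
          f"Kappa={cohen_kappa_score(y_te, y_hat):.4f}",
          f"MCC={matthews_corrcoef(y_te, y_hat):.4f}")
```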
3.5.3. Comparison with Deep Learning Models
Table 6 presents a comparison of ECA-DenseNet with deep learning models, including ResNet50 [35], MobileNetV3 [36], InceptionV3 [37], GhostNet [38], and GoogLeNet [39], on classification accuracy-related metrics. The accuracy of ECA-DenseNet is 99.42%, its Kappa value is 99.01%, and its MCC is 99.11%. It outperforms the other deep learning models, exceeding InceptionV3 by 3.32% and ResNet50 by 1.7%, the strongest performers among them.
Training time is a critical indicator of model training efficiency and increases with the number of model parameters [40]. A comparison of the training efficiency of the deep learning models in this study is shown in Table 7. ECA-DenseNet has a parameter scale of 28,774 KB, approximately 69.1% smaller than the largest parameter scale, that of InceptionV3 (92,968 KB). Its training time is 310 s, approximately 66.3% shorter than InceptionV3’s 921 s. Even when compared with GhostNet, the model with the smallest parameter scale, ECA-DenseNet only incurs a training-time increase of about 187.0% despite a parameter-scale increase of approximately 309.2%. This demonstrates that ECA-DenseNet not only achieves superior classification performance but also maintains high training efficiency through its lightweight design, striking a good balance between model complexity and computational cost.
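The parameter scale and training time reported here can be measured as sketched below; the network in the sketch is a small stand-in rather than any of the models in Table 7.

```python
# Measuring parameter count and wall-clock training time for a PyTorch model.
# The model below is a small stand-in; the figures in Table 7 refer to the
# actual networks trained in this study.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1800, 256), nn.ReLU(), nn.Linear(256, 4))

n_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {n_params:,}")

x = torch.randn(64, 1800)
y = torch.randint(0, 4, (64,))
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

start = time.time()
for _ in range(10):                                   # a few toy iterations
    optim.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optim.step()
print(f"Training time: {time.time() - start:.2f} s")
```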
To visualize the comparative data more intuitively, a radar chart was plotted across five dimensions. Accuracy, MCC, and Kappa were normalized in the positive direction (higher values indicate better performance), while training time and parameter count were reverse-normalized (lower values indicate stronger capability), as shown in Figure 8. The chart illustrates the performance differences among ECA-DenseNet and the five other deep learning models, highlighting ECA-DenseNet’s balanced superiority in accuracy, generalization ability (MCC/Kappa), and computational efficiency (training time/parameters). Specifically, ECA-DenseNet exhibits the largest radial coverage in the radar chart, indicating its dominance across all the evaluated metrics and confirming its effectiveness as a lightweight, high-performance model for spectral data classification.
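The normalization behind such a radar chart can be sketched as follows: accuracy, MCC, and Kappa are min–max scaled directly, while training time and parameter count are scaled and then inverted. The values in the sketch are illustrative placeholders, not the measured results.

```python
# Radar-chart normalization: positive scaling for Accuracy/MCC/Kappa,
# reverse scaling for Training Time and Parameters (smaller is better).
# The raw values below are illustrative placeholders.
import numpy as np
import matplotlib.pyplot as plt

metrics = ["Accuracy", "MCC", "Kappa", "Training Time", "Parameters"]
raw = np.array([[99.4, 99.1, 99.0, 310, 28774],       # model A (illustrative)
                [96.1, 95.0, 94.9, 921, 92968]])       # model B (illustrative)

norm = np.zeros_like(raw, dtype=float)
for j in range(raw.shape[1]):
    lo, hi = raw[:, j].min(), raw[:, j].max()
    scaled = (raw[:, j] - lo) / (hi - lo + 1e-9)
    norm[:, j] = 1 - scaled if metrics[j] in ("Training Time", "Parameters") else scaled

angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False)
angles = np.concatenate([angles, angles[:1]])
ax = plt.subplot(polar=True)
for row, label in zip(norm, ["Model A", "Model B"]):
    vals = np.concatenate([row, row[:1]])
    ax.plot(angles, vals, label=label)
    ax.fill(angles, vals, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(metrics)
ax.legend()
plt.show()
```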
3.6. Ablation Experiments
To comprehensively evaluate the effectiveness of the constructed ECA-DenseNet model and analyze the contribution of each module to the overall performance, this study conducted ablation experiments [41] focusing on three key modules: BCA, ACMix, and ECA. The model was assessed in terms of accuracy (Acc), Kappa, MCC, Parameters, Size [42], and FLOPs [42]. The experiments adopted both forward and reverse ablation to quantitatively analyze the contribution of each module, with the results shown in Table 8.
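The forward and reverse ablation procedure amounts to toggling the three module switches, retraining, and recording the metrics for every combination; a schematic Python loop is shown below, in which the two helpers are stubs standing in for the real model builder and training routine.

```python
# Schematic forward/reverse ablation loop: every combination of the three
# module switches (BCA, ACMix, ECA) is trained and scored. The two helpers
# below are stubs standing in for the real model builder and training routine.
from itertools import product

def build_model(bca: bool, acmix: bool, eca: bool):
    return {"bca": bca, "acmix": acmix, "eca": eca}           # stub model config

def train_and_evaluate(model) -> dict:
    return {"Acc": 0.0, "Kappa": 0.0, "MCC": 0.0}             # stub metrics

results = {}
for flags in product([False, True], repeat=3):
    model = build_model(*flags)
    results[flags] = train_and_evaluate(model)

for (bca, acmix, eca), m in sorted(results.items()):
    print(f"BCA={bca!s:5} ACMix={acmix!s:5} ECA={eca!s:5} -> {m}")
```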
In terms of accuracy, when all three modules (BCA, ACMix, and ECA) were enabled, the accuracy reached its highest level of 99.42%, surpassing all other combinations. When only single modules or partial combinations were enabled, the accuracy was relatively lower; for example, it was 94.85% when no modules were enabled. Similar trends were observed for Kappa and MCC: with all three modules enabled, Kappa reached 99.01% and MCC reached 99.11%, both optimal among all the combinations.
Regarding the resource consumption metrics, Parameters, Size, and FLOPs increased as modules were enabled. For instance, the parameter count was 26.1 M with no modules enabled, increasing to 28.1 M with all three modules enabled. The model size increased from 5.5 M (no modules) to 6.5 M (all modules), and the number of floating-point operations (FLOPs) rose from 5.5 M to 6.5 M. This indicates that while the introduction of the modules significantly improves the model performance, it inevitably increases model complexity and resource requirements.
The ablation experiments validate the rationality and effectiveness of the ECA-DenseNet model, which significantly enhances performance while only slightly increasing the model size and computational cost. By integrating lightweight modules and attention mechanisms into DenseNet, the model achieves efficient and accurate identification of tea tree seed quality classes.
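For reference, and assuming the ECA module follows the widely used efficient channel attention design (global average pooling, a lightweight 1-D convolution across channels, and a sigmoid gate), a minimal PyTorch sketch of such a block for 1-D feature maps is given below; it is not necessarily the exact variant integrated into ECA-DenseNet.

```python
# Minimal sketch of a standard efficient channel attention (ECA) block for
# 1-D feature maps: global average pooling, a 1-D convolution across channels,
# and a sigmoid gate. Assumed formulation; not necessarily the exact variant
# integrated into ECA-DenseNet.
import torch
import torch.nn as nn

class ECA1d(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool1d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (B, C, L)
        w = self.avg_pool(x)                                # (B, C, 1)
        w = self.conv(w.transpose(1, 2))                    # convolve across channels
        w = self.sigmoid(w.transpose(1, 2))                 # (B, C, 1) channel weights
        return x * w                                        # channel-wise reweighting

# usage example: out = ECA1d()(torch.randn(8, 64, 450))
```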
4. Conclusions
4.1. Comparative Analysis of the Models
Manual screening of tea tree seeds (e.g., visual inspection and dissection) suffers from low efficiency, reliance on experience, and difficulty in identifying early-stage mildew and internal damage. The combination of MIR spectroscopy and deep learning provides an efficient solution for tea tree seed identification.
The ECA-DenseNet model has shown significant performance improvements. Specifically, its accuracy has been enhanced from 95.20% to 99.42%, a relative increase of 4.43%, while the F1-score, precision (P), and recall (R) have increased by 3.09%, 3.60%, and 3.16%, respectively. Notably, the misjudgment rate has been reduced from a maximum of 3.7% to 0%. Compared with the machine learning algorithms, ECA-DenseNet outperforms the best-performing SVM (91.42%) by 8% in accuracy, and its Kappa value (99.01%) and Matthews correlation coefficient (MCC, 99.11%) are 8.12% and 7.99% higher than those of SVM, respectively. In the comparison with deep learning models, ECA-DenseNet achieves a 3.32% improvement in accuracy over InceptionV3 (96.10%) while requiring only 31% of its parameters and 66.3% less training time. Ablation experiments further indicate that the synergistic effect of the three modules increases the accuracy by 4.57% compared with the model without them. These results collectively demonstrate ECA-DenseNet’s advantages in classification accuracy, generalization ability, and computational efficiency, with each module playing a critical, complementary role in the model’s performance.
Therefore, the constructed ECA-DenseNet model advances tea tree seed identification technology. Its core advantage lies in combining the molecular-level detection capability of MIR spectroscopy with the automated feature-learning capability of deep learning, providing an accurate screening pathway for agricultural production practice.
4.2. Future Work
This study has certain limitations: the tea tree seeds were sourced from a single region, and the FTIR measurements required sample pretreatment such as grinding and tableting, which limits rapid on-site detection. In future research, we will validate and optimize the model with seeds from multiple geographical regions. In addition, integrating portable spectral devices will facilitate the practical application of this technology in field detection and the intelligent management of germplasm resources. These efforts will not only improve product quality and screening efficiency in the tea industry but also provide a reference for the quality inspection of other crops.