Abstract
Black tea is a widely consumed beverage whose high economic value has led some producers to illegally add artificial colorants such as Sunset Yellow, Tartrazine, and Ponceau 4R, posing health risks. Although near-infrared (NIR) spectroscopy offers a rapid, non-destructive detection method, its use in trace-level colorant detection is limited by low adulterant concentrations and interference from natural tea pigments. Hence, we developed a rapid, non-destructive method for detecting trace adulteration (from 0.1 to 0.5 g·kg−1) in black tea with artificial colorants using a handheld near-infrared spectrometer. To enhance sensitivity to low-level adulteration, we proposed a novel Spectral Multi-scale Attention Fusion Network (SMAFNet), designed to dynamically integrate multiscale features. SMAFNet consists of spectral preprocessing, multi-scale feature extraction, and cross-scale attention fusion modules. Comparative experiments with traditional machine-learning models demonstrated that SMAFNet achieved superior performance even at low adulteration levels. On validation sets of 36 samples each, adulterated with Sunset Yellow, Tartrazine, or Ponceau 4R, SMAFNet achieved accuracies of 97.22–100%, F1-scores of 0.9811–1.00, and 100% recall. These findings confirm the feasibility and robustness of combining NIR with SMAFNet for the rapid and discriminative detection of trace colorants in black tea, offering a practical framework for on-site food safety monitoring and quality control.
1. Introduction
Black tea is one of the most widely consumed tea types worldwide and is renowned for its distinct flavor and bright liquor color. It is produced through a complete fermentation process, during which tea leaves undergo enzymatic oxidation, leading to the formation of characteristic pigments such as theaflavins and thearubigins [1,2,3]. In 2023, global tea production totaled 6.604 million tons, a 1.9% increase compared with 2022, with black tea representing more than 50% of the total production [4]. High-quality black tea infusions are bright and clear, whereas inferior products often appear dull or turbid. Owing to the high economic value of black tea, some producers illegally add artificial colorants such as Sunset Yellow, Tartrazine, and Ponceau 4R to conceal inferior quality. These additives contravene food safety regulations, and their excessive consumption may lead to health risks, including genetic mutations, carcinogenesis, reduced hemoglobin levels, and allergic reactions [5]. Therefore, the detection of unauthorized colorant additives in black tea is essential to maintaining product integrity and safeguarding consumer safety.
Conventional analytical techniques for colorant detection include high-performance liquid chromatography [6], liquid chromatography–mass spectrometry [7], polarography [8], capillary electrophoresis [9], and thin-layer chromatography [10]. However, these methods are time-consuming and instrument-intensive, and they require skilled operators for data interpretation. Near-infrared (NIR) spectroscopy has emerged as a rapid, non-destructive analytical technique that requires minimal sample preparation and is environmentally friendly. When combined with chemometric algorithms, NIR has been successfully applied to adulteration detection [11,12], quality evaluation [13], geographical origin tracing [14], and process monitoring [15,16]. With technological advances and reduced equipment cost, handheld spectrometers have become suitable for fast, on-site analysis [17,18]. Their measurement range typically spans from 900 to 1700 nm [19]. However, only a few studies have investigated the detection of artificial colorants in teas using a handheld spectrometer, likely because of the low concentrations involved, as the quantitative detection limit of NIR is approximately 0.1% [20]. The amount of added colorant in black tea rarely exceeds 0.5 g·kg−1, as excessive addition darkens the infusion and degrades sensory quality. Furthermore, the presence of natural tea pigments, such as theaflavins and thearubigins, introduces spectral interference [21], complicating analysis. Consequently, a more efficient feature-extraction algorithm is required to obtain discriminative information from NIR spectra.
Recently, one-dimensional convolutional neural networks (1D-CNNs) have shown strong capability for feature extraction from NIR data [22] and have been employed for quality assessment [23], component analysis [24,25], and adulteration or freshness detection [26,27]. However, traditional 1D-CNNs present certain limitations: the use of a single or fixed convolution kernel size restricts extraction to a specific scale [22], and the lack of cross-scale feature interaction hinders integration of multi-level spectral features, limiting sensitivity to weak signals from low-level adulteration.
To overcome these challenges, this study proposes a Spectral Multi-Scale Attention Fusion Network (SMAFNet), a novel spectral feature-extraction network that employs a multi-branch parallel architecture to simultaneously capture local and global spectral representations across multiple scales, thereby improving its ability to model complex spectral patterns. Thus, the study aims to develop an effective and non-destructive method for the rapid detection of trace synthetic colorant adulteration in black tea, thereby supporting on-site food safety monitoring and quality control.
2. Materials and Methods
2.1. Sample Preparation and Spectra Acquisition
Black tea samples were obtained from Xida Tea Co., Ltd. (Chongqing, China). Food-grade colorants, including Tartrazine, Sunset Yellow, and Ponceau 4R, were purchased from Kuoyi Biotechnology Co., Ltd. (Jinan, China). To minimize light-scattering effects caused by particle-size variation and to ensure spectral uniformity, all black tea samples were ground and sieved through a 60-mesh screen, sealed in airtight polyethylene bags, and stored in a cold, dry environment until analysis. For the adulterated samples, black tea powder was homogeneously mixed with one of the three colorants: Tartrazine, Sunset Yellow, or Ponceau 4R. The added colorant concentration ranged from 0.1 to 0.5 g·kg−1, at 0.05 g·kg−1 intervals. Before spectral measurement, the moisture content of all samples was monitored and adjusted as necessary to maintain consistency. The final moisture content of black tea samples was maintained at approximately 6.9% to 7.2%, with no significant differences among them.
Spectral data were collected using a handheld spectrometer (model NIR-R210, Shenzhen Pynect Science and Technology Co., Ltd., Shenzhen, China) operating over the 900–1700 nm range with a spectral resolution of 3 nm. To ensure measurement stability, the spectrometer was preheated for 3 min before use, and the ambient temperature was maintained as constant as possible during acquisition. Each sample was scanned six times, and the mean spectrum was used for analysis. In total, 345 spectra were obtained, including 45 pure-tea and 300 adulterated samples (100 per colorant).
2.2. Dataset Partition
To develop and evaluate the predictive model, the tea samples were divided into calibration and validation subsets. The SPXY algorithm [28] was employed to split each dataset, with 75% of the samples allocated for training and the remaining 25% reserved for validation. SPXY selects samples that uniformly cover the sample space by considering distances in both the spectral variables (X) and the reference values (Y), rendering it particularly suitable for partitioning low-concentration datasets. Once a sample is assigned to the training set by SPXY, it is retained exclusively in that set, preventing any overlap between the two subsets. This guarantees that the model is evaluated on unseen data.
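The selection logic can be illustrated with a minimal NumPy sketch of an SPXY-style split. It assumes the commonly described form of the algorithm (Kennard-Stone selection on the sum of max-normalized X and Y distances); the function name `spxy_split` and the synthetic data are illustrative and are not taken from the implementation used in this study.

```python
import numpy as np

def spxy_split(X, y, n_train):
    """Return (train_idx, val_idx) from an SPXY-style selection (n_train >= 2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    # Pairwise Euclidean distances in spectral (X) and reference (Y) space.
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    # Normalize each distance matrix by its maximum so both spaces weigh equally.
    d = dx / dx.max()
    if dy.max() > 0:
        d = d + dy / dy.max()
    # Start from the two most distant samples.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)

# Synthetic stand-in for spectra (20 samples, 5 variables) with binary labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
y_demo = np.arange(20) % 2
train_idx, val_idx = spxy_split(X_demo, y_demo, n_train=15)
print(len(train_idx), len(val_idx))
```

Because each sample index ends up in exactly one of the two returned arrays, the calibration and validation subsets cannot overlap.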
2.3. Construction of SMAFNet
The overall architecture of SMAFNet (Figure 1) comprises three functional modules: (1) a Spectral Preprocessing Module (SPM), designed to enhance input consistency and suppress noise; (2) a Multi-Scale Feature Extraction Module (MSFEM), which captures hierarchical spectral characteristics across different receptive fields; and (3) a Cross-Scale Attention Fusion Module (CSAFM), which dynamically integrates multi-scale features to enhance the model’s sensitivity to trace adulterants.
Figure 1.
Architecture of the proposed spectral multi-scale attention fusion network (SMAFNet).
Following feature fusion, the resulting features were flattened, concatenated, and passed through fully connected layers for the final classification output.
2.3.1. Spectral Preprocessing Module
The SPM was developed to preprocess raw 1D spectral data with the dual objectives of suppressing instrumental noise and adjusting feature dimensions, thereby establishing a stable foundation for downstream feature extraction.
It comprises a 1D convolutional (Conv1d) layer and a max-pooling layer (Figure 2). The Conv1d layer performs initial spectral feature extraction and channel-dimension adjustment, whereas the max-pooling layer executes dimensionality reduction.
Figure 2.
Structure of the spectral preprocessing module (SPM).
Conv1d layer: This layer applies a set of learnable convolution kernels to the raw spectral input to implement preliminary feature extraction and channel-dimension adjustment. The raw spectral input is $x \in \mathbb{R}^{C_{\mathrm{in}} \times L}$, where $C_{\mathrm{in}}$ denotes the initial input channel count ($C_{\mathrm{in}} = 1$) and $L$ represents the number of spectral features. The Conv1d layer employs convolution kernels $W \in \mathbb{R}^{C_0 \times C_{\mathrm{in}} \times k}$, where $k$ is the kernel size and $C_0$ is the number of output channels. For the $j$-th output channel, the value at position $t$ in the feature map $y^{(j)}$ is computed as

$$y^{(j)}(t) = \sum_{c=1}^{C_{\mathrm{in}}} \sum_{i=0}^{k-1} W^{(j)}_{c,i}\, x_{c}(t+i)$$
Max-pooling layer: This layer performs dimensionality reduction on the Conv1d output, compressing redundant information while preserving key discriminative spectral features.
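The two operations of the SPM can be sketched in NumPy as follows. This is a simplified illustration only: it assumes unit stride, no padding, no bias term, and non-overlapping pooling with a window of 2, none of which are specified here; the input length of 228 points is likewise a hypothetical value.

```python
import numpy as np

def conv1d(x, w):
    """Stride-1, unpadded 1D convolution (cross-correlation form).

    x: (C_in, L) input; w: (C_out, C_in, k) kernels -> (C_out, L - k + 1).
    """
    c_out, c_in, k = w.shape
    l_out = x.shape[1] - k + 1
    y = np.zeros((c_out, l_out))
    for t in range(l_out):
        # Sum over input channels and kernel taps for every output channel.
        y[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1]))
    return y

def max_pool1d(y, size):
    """Non-overlapping max pooling along the spectral axis (remainder dropped)."""
    n = y.shape[1] // size
    return y[:, :n * size].reshape(y.shape[0], n, size).max(axis=2)

rng = np.random.default_rng(0)
spectrum = rng.normal(size=(1, 228))      # hypothetical single-channel spectrum
kernels = rng.normal(size=(64, 1, 16))    # 64 output channels, kernel size 16
features = max_pool1d(conv1d(spectrum, kernels), size=2)
print(features.shape)  # (64, 106)
```

The convolution widens the channel dimension from 1 to 64, and the pooling step roughly halves the spectral length.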
2.3.2. Multi-Scale Feature Extraction Module
Based on the Inception module in GoogLeNet [29], the MSFEM was designed to overcome the limitations of traditional 1D-CNNs. The MSFEM extracts spectral features across multiple receptive fields, thereby enhancing the ability of the model to identify weak signals from low-level adulterants.
It comprises a set of n parallel feature extraction blocks (FEBs), each employing Conv1d kernels of different sizes (Figure 3a). In this study, n was set between 1 and 3, and six kernel sizes were tested: [1 × 1], [3 × 1], [5 × 1], [7 × 1], [9 × 1], and [11 × 1]. The MSFEM also features a configurable network depth d (ranging from 1 to 3), defined as the number of sequentially stacked FEBs applied to the spectral representations.
Figure 3.
(a) Architecture of the multi-scale feature extraction module (MSFEM). (b) Structure of a single feature extraction block (FEB).
Subsequent experiments systematically investigated the effects of the kernel size, scale number (n), and depth (d) on the MSFEM feature extraction performance.
A single FEB (Figure 3b) performs four operations:
- Convolution: A Conv1d layer with a specific kernel size extracts spectral features corresponding to its receptive field.
- Activation: A rectified linear unit (ReLU) introduces nonlinearity into feature extraction.
- Channel recalibration: A squeeze-and-excitation (SE) module enhances the model’s representational capacity by emphasizing informative channels while suppressing less relevant channels [30]. The SE module operates in three stages: (i) Squeeze: Global average pooling is applied to the input feature map to compress the spectral information of each channel into a single scalar; (ii) Excitation: Two fully connected layers, followed by ReLU and Sigmoid activations, are used to learn the relative importance of each channel; and (iii) Reweighting: The learned attention weights are applied to the original feature map through channel-wise multiplication.
- Pooling: A max-pooling layer reduces feature dimensionality.
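The channel recalibration step of the FEB can be sketched as follows. The sketch covers only the three SE stages (squeeze, excitation, reweighting); the weight shapes, the bottleneck width, and the absence of bias terms are assumptions made for illustration.

```python
import numpy as np

def se_recalibrate(feat, w1, w2):
    """Squeeze-and-excitation on a (C, L) feature map.

    w1: (H, C) and w2: (C, H) are the two fully connected layers, where H is
    an assumed bottleneck width. Returns the reweighted map and the weights.
    """
    squeeze = feat.mean(axis=1)                      # (C,) global average pooling
    hidden = np.maximum(w1 @ squeeze, 0.0)           # first FC layer + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # second FC layer + Sigmoid
    return feat * weights[:, None], weights          # channel-wise multiplication

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 50))          # hypothetical 8-channel feature map
w1 = rng.normal(size=(2, 8))
w2 = rng.normal(size=(8, 2))
out, weights = se_recalibrate(feat, w1, w2)
print(out.shape, weights.round(2))
```

Each channel is scaled by a single weight in (0, 1), so informative channels are retained while less relevant ones are attenuated.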
2.3.3. Cross-Scale Attention Fusion Module
The CSAFM was designed by integrating the key mechanisms of the SE network and the bidirectional feature pyramid network (BiFPN) [31], with the objective of enhancing feature interaction and selective aggregation across scales. The CSAFM comprises two Conv1d layers and two activation functions (ReLU and Sigmoid), and its workflow involves three main stages (Figure 4):
Figure 4.
Structure of the cross-scale attention fusion module (CSAFM).
1. Cross-scale interaction
For each reference-scale feature $F_r$, where $r$ denotes the reference scale, the module sequentially fuses $F_r$ with features from all other scales ($F_j$, $j \neq r$). First, $F_r$ and $F_j$ are concatenated along the channel dimension to construct a cross-scale interaction representation $Z_{r,j}$:

$$Z_{r,j} = \mathrm{Concat}(F_r, F_j)$$

2. Attention-weight generation and feature recalibration
The concatenated feature $Z_{r,j}$ is fed into a Conv1d layer (Conv1d1) and activated by ReLU to extract the fused feature representations. A second Conv1d layer (Conv1d2) and Sigmoid activation are then applied to generate attention weights $A_{r,j}$:

$$A_{r,j} = \sigma\left(\mathrm{Conv1d}_2\left(\mathrm{ReLU}\left(\mathrm{Conv1d}_1(Z_{r,j})\right)\right)\right)$$

where $\sigma$ denotes the Sigmoid function, mapping the weights to the range (0, 1). The generated weights are multiplied element-wise with the corresponding scale feature $F_j$ along the channel axis to achieve adaptive recalibration:

$$\tilde{F}_{r,j} = A_{r,j} \odot F_j$$

where ⊙ represents element-wise multiplication.
3. Feature aggregation
The recalibrated features from all the scales ($\tilde{F}_{r,j}$, $j \neq r$) are added to the reference feature $F_r$ to obtain the final fused feature $F_r^{\mathrm{fused}}$:

$$F_r^{\mathrm{fused}} = F_r + \sum_{j \neq r} \tilde{F}_{r,j}$$
After repeating this process for all scales, the fused features were flattened, concatenated, and passed into fully connected layers for classification.
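The three stages of the CSAFM (concatenation, attention-weight generation with recalibration, and aggregation) can be sketched in NumPy for one reference scale. Several details are assumptions made for illustration: both Conv1d layers are taken to have kernel size 1 (so each reduces to a channel-mixing matrix), the same weights are shared across all scale pairs, and all scales are assumed to have identical shape.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pointwise_conv(x, w):
    """Kernel-size-1 Conv1d: x is (C_in, L), w is (C_out, C_in)."""
    return w @ x

def csaf_fuse(feats, r, w1, w2):
    """Fuse the reference-scale feature feats[r] with every other scale.

    feats: list of (C, L) arrays; w1: (H, 2C); w2: (C, H).
    """
    ref = feats[r]
    fused = ref.copy()
    for j, other in enumerate(feats):
        if j == r:
            continue
        z = np.concatenate([ref, other], axis=0)       # stage 1: (2C, L)
        h = np.maximum(pointwise_conv(z, w1), 0.0)     # Conv1d1 + ReLU
        a = sigmoid(pointwise_conv(h, w2))             # Conv1d2 + Sigmoid, in (0, 1)
        fused += a * other                             # stages 2-3: recalibrate, add
    return fused

rng = np.random.default_rng(0)
scales = [rng.normal(size=(4, 30)) for _ in range(3)]  # three hypothetical scales
w1 = rng.normal(size=(4, 8))
w2 = rng.normal(size=(4, 4))
fused_all = [csaf_fuse(scales, r, w1, w2) for r in range(3)]
print([f.shape for f in fused_all])
```

Running the function once per reference scale yields one fused feature per scale, which would then be flattened and concatenated before classification.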
2.4. Model Transfer
The structural parameters of SMAFNet were first optimized using a dataset containing pure samples and samples adulterated with Sunset Yellow. Sunset Yellow is a widely used artificial colorant with a low acceptable daily intake (ADI) value [5], indicating higher toxicity and thus requiring priority in detection. The optimized SMAFNet was then transferred to the datasets containing pure and Tartrazine- or Ponceau 4R-adulterated samples to evaluate its generalization performance in multi-type adulteration detection. Its cross-domain adaptability was further validated by transferring SMAFNet to an open-source tablet NIR dataset (310 samples, available at http://www.models.life.ku.dk/, accessed on 8 May 2025). The NIR spectra of tablet samples are plotted in Supplementary Figure S1, and the validation results are provided in Supplementary Table S2 and Figure S2.
2.5. Model Evaluation
The performance of SMAFNet was evaluated by conducting comparative experiments between SMAFNet and several traditional machine learning models: partial least squares-discriminant analysis (PLS-DA), radial basis function support vector machine (RBF-SVM), random forest (RF), 1D-CNN, and multilayer perceptron (MLP). The hyperparameters of PLS-DA, RBF-SVM, and RF were optimized using a grid search, while 1D-CNN, MLP, and SMAFNet were trained using the adaptive moment estimation (Adam) optimizer [32] with an initial learning rate of 0.001.
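Model performance was assessed using four metrics: accuracy (ACC), precision (PRE), recall (REC), and F1-score. Their definitions are expressed as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{PRE} = \frac{TP}{TP + FP}$$

$$\mathrm{REC} = \frac{TP}{TP + FN}$$

$$\mathrm{F1} = \frac{2 \times \mathrm{PRE} \times \mathrm{REC}}{\mathrm{PRE} + \mathrm{REC}}$$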
In this study, TP (true positive) is the number of adulterated samples correctly identified as “adulterated”; TN (true negative) is the number of pure samples correctly classified as “unadulterated”; FP (false positive) is the number of pure samples incorrectly labeled as “adulterated”; and FN (false negative) is the number of adulterated samples incorrectly classified as “unadulterated.”
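As a worked illustration, the four metrics can be computed from confusion-matrix counts with a few lines of Python. The counts in the example (TP = 28, TN = 7, FP = 1, FN = 0) are not read from a confusion matrix; they are inferred as one set of values consistent with a 36-sample validation set, a single false positive, and no false negatives, and should be treated as an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, PRE, REC, and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return {"ACC": acc, "PRE": pre, "REC": rec, "F1": f1}

# Inferred (assumed) counts for a 36-sample validation set.
metrics = classification_metrics(tp=28, tn=7, fp=1, fn=0)
print({name: round(value, 4) for name, value in metrics.items()})
# {'ACC': 0.9722, 'PRE': 0.9655, 'REC': 1.0, 'F1': 0.9825}
```

With these counts, a single misclassified pure sample lowers ACC to 97.22% and F1 to 0.9825 while leaving REC at 100%, which matches the pattern reported for SMAFNet on the Sunset Yellow dataset in Section 3.4.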
3. Results and Discussion
3.1. Spectra Analysis
The original NIR spectra of the experimental samples are presented in Figure 5a. The spectral curves of all samples were highly similar, making it challenging to distinguish adulterated from pure samples by visual inspection alone. All samples exhibited weak absorption bands in the range of 1170–1220 nm, along with distinct absorption peaks in the range of 1430–1500 nm and 1680–1700 nm. A consistent spectral trend was observed across all specimens tested. The absorption peak around 1195 nm may originate from the second overtone stretch vibration of C–H and O–H [33]. The spectral band at roughly 1465 nm may derive from the O–H first overtone vibrations [33]. The absorption bands near 1656 and 1680 nm were correlated with catechin content, primarily associated with the C–H and S–H first overtone vibrations [34]. In addition, the spectral patterns observed in this study are consistent with those previously reported for fermented tea [33]—a variety closely related to black tea. This similarity in spectral profiles may be attributed to analogous biochemical transformations occurring during the fermentation.
Figure 5.
(a) Near-infrared (NIR) spectra of all samples; (b) average spectra of unadulterated and colorant-adulterated samples.
The characteristics of the different samples were explored by computing and plotting the mean spectra of adulterated and unadulterated black tea samples (Figure 5b). The results revealed a slight decrease in absorbance following the addition of colorants, suggesting that these artificial additives exert a measurable influence on the NIR spectra. However, owing to spectral noise and other interfering factors, substantial spectral overlap remained among the samples.
3.2. Dataset Division
All samples were divided into calibration and validation subsets using the SPXY algorithm, and three binary classification datasets were constructed (Table 1). Each dataset comprised unadulterated samples and samples adulterated with one of the three colorants. The differences among the datasets are attributed to the inherent mechanism of the SPXY algorithm, which selects representative samples by maximizing diversity in both the spectral (X) and property (Y) spaces, where Y denotes the adulteration status. Because Tartrazine, Sunset Yellow, and Ponceau 4R produce different spectral responses, the SPXY selection assigned slightly different numbers of pure samples to each subset. Overall, each dataset contained 109 calibration samples and 36 validation samples.
Table 1.
Dataset division for calibration and validation subsets constructed using the SPXY algorithm.
3.3. Construction and Parameter Selection of SMAFNet
SMAFNet was optimized using the Sunset Yellow-adulterated dataset for both module configuration and parameter selection. The cross-entropy loss function was adopted as the objective function, and an Adam optimizer was employed with an initial learning rate of 0.001, beta1 = 0.9, beta2 = 0.999, and epsilon = 10−8. Model performance was evaluated using two metrics: ACC and F1-score.
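For reference, a single Adam update with the hyperparameters listed above can be sketched in NumPy as follows. This follows the standard bias-corrected form of the algorithm and is not the training code used in this study; the parameter and gradient values are arbitrary.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
grad = np.array([1.0, -2.0])                    # arbitrary example gradient
param, m, v = adam_step(param, grad, m, v, t=1)
print(param)
```

On the first step the bias-corrected ratio reduces to approximately the sign of the gradient, so each parameter moves by about one learning rate (0.001) regardless of gradient magnitude.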
3.3.1. SPM Construction
The SPM was introduced to suppress noise and enhance the quality of input features. Three kernel sizes, each coupled with different output-channel configurations, were evaluated (Table 2). The performance of the SPM was strongly influenced by both the kernel size and the number of output channels in the convolutional layer. The medium-sized kernel [16 × 1] achieved the highest accuracy (71.56%) and F1-score (0.7737) compared with the other configurations. This behavior can be explained by the fact that larger kernels tend to overlook fine spectral variations, whereas smaller kernels may fail to capture global characteristics. Regarding the number of output channels, the results indicated that configurations with 64 channels generally yielded comparable or slightly superior performance to those with 128 channels across all kernel sizes. A max-pooling layer with 64 channels in the SPM was sufficient to reduce the dimensionality of the input spectral sequence, compress redundant information, and preserve discriminative features. Therefore, the [16 × 1] kernel coupled with 64 output channels was selected as the optimal SPM configuration for subsequent experiments.
Table 2.
Performance of the SPM with different kernel sizes and output channel configurations.
3.3.2. MSFEM Construction
The selection of kernel sizes and the number of scales directly influence the sensitivity of MSFEM to features at different resolutions, whereas network depth affects the complexity of feature extraction. Accordingly, the MSFEM was constructed and optimized by sequentially evaluating the kernel size, scale number, and network depth.
Kernel Size and Scale Selection
The effects of convolutional kernel size and scale diversity on the feature-extraction capability of the MSFEM were investigated by testing convolutional kernels ranging from [1 × 1] to [11 × 1] (in odd-numbered increments) under single-, dual-, and tri-scale configurations (Table 3).
Table 3.
Performance of the MSFEM with different kernel sizes and scale combinations.
In the single-scale experiments, the [5 × 1] kernel achieved the best performance, with an ACC of 79.82% and an F1-score of 0.8472. This advantage stemmed from its balanced receptive field, which effectively captured both fine local spectral details and medium-range spectral trends. The [1 × 1] kernel was excluded from single-scale testing because of its ultranarrow receptive field, which was insufficient to capture effective spectral correlations on its own.
For dual-scale configurations, all kernel pairings outperformed the best single-scale kernel [5 × 1]. Pairings of small-to-medium or small-to-large kernels demonstrated superior performance: for example, the [3 × 1, 5 × 1] pairing reached an ACC of 85.32%; the [1 × 1, 9 × 1] pairing achieved the same ACC. This improvement stemmed from their complementary receptive-field coverage—the [1 × 1] kernel effectively extracted local spectral details, whereas the [3 × 1] and [5 × 1] kernels captured medium-range spectral correlations, together enabling the robust identification of weak adulteration signals. In contrast, pairings of medium-to-large or large-to-extra-large kernels (e.g., [9 × 1, 11 × 1]) exhibited lower efficiency because overlapping global receptive fields introduced information redundancy.
In the tri-scale experiments, dispersed kernel sets outperformed adjacent kernel sets. Among all configurations, the [1 × 1, 5 × 1, 9 × 1] combination achieved the highest ACC (92.66%) and F1-score (0.9452). This configuration encompassed three distinct receptive field scales: the [1 × 1] kernel captured fine local details, the [5 × 1] kernel extracted medium-range correlations, and the [9 × 1] kernel captured global spectral trends. This multiscale coverage minimized receptive field overlap and fully extracted hierarchical spectral information, which is crucial for detecting trace-level adulteration.
Based on these results, the MSFEM was optimized to a tri-scale configuration using [1 × 1], [5 × 1], and [9 × 1] kernels, providing a robust feature-extraction foundation for the subsequent CSAFM. Notably, the lower accuracies observed during the initial stages of hyperparameter optimization for individual modules do not reflect the performance of the final, fully optimized SMAFNet model. They reflect only the performance of sub-optimal configurations or isolated modules prior to full integration and fine-tuning of the complete architecture.
Network Depth Selection
To further optimize the MSFEM, the network depth (d) was set to 1, 2, and 3 for comparative experiments (Table 4).
Table 4.
Performance of the MSFEM with different depth values.
As shown in the table, increasing the depth (d) of the MSFEM initially improved model performance, but eventually caused a decline. When d was increased from 1 to 2, the ACC improved from 92.66% to 95.41%, and the F1-score increased from 0.9452 to 0.9660, demonstrating that moderate depth enhanced feature abstraction and representation. This improvement can be attributed to a moderately deepened network, which strengthens the model’s ability to extract hierarchical spectral features by integrating local, medium-range, and global information for better detection of weak adulteration signals. However, when the depth was further increased to d = 3, model performance declined, as the ACC and F1-score decreased to 94.50% and 0.9595, respectively. This suggests that excessive depth introduces overfitting and reduces the generalization capability of the model. Based on these results, the optimal network depth for the MSFEM was determined to be d = 2.
3.3.3. Architecture of SMAFNet
After the aforementioned processing steps, the CSAFM was used to integrate information from multiple scales. Accordingly, the processing workflow of the SMAFNet can be described as follows:
The raw spectra were first preprocessed and dimensionally reduced by the SPM, which adopted the optimized configuration of a [16 × 1] convolutional kernel with 64 output channels. Subsequently, the MSFEM—configured as a tri-scale structure with convolutional kernels of [1 × 1], [5 × 1], and [9 × 1], and a network depth of d = 2—extracted multiscale spectral features from the preprocessed data. The CSAFM then integrated features from different scales through cross-scale interaction and attention-based recalibration. After flattening and concatenation, the fused features were fed into the fully connected layer, where a Sigmoid activation function was applied to complete the classification task.
The final architecture of SMAFNet is illustrated in Figure 6; the detailed model parameters are provided in Supplementary Table S1.
Figure 6.
Final architecture of SMAFNet.
3.4. Validation and Performance Comparison
To evaluate the performance of the optimized SMAFNet in identifying trace levels of artificial colorant adulteration in black tea, several traditional machine-learning models were implemented and validated under consistent experimental conditions, including PLS-DA, RBF-SVM, RF, 1D-CNN, and MLP. The final performance metrics of each model on the Sunset Yellow-adulterated dataset are listed in Table 5.
Table 5.
Performance evaluation of different models on the Sunset Yellow-adulterated dataset.
As shown in Table 5, the RBF-SVM—which achieved the best performance among the traditional models—obtained an ACC of 86.11% and an F1-score of 0.9123. Previous studies have demonstrated the nonlinearity of tea spectra obtained using portable spectrometers [19], which explains why the RBF-SVM performed better than the other traditional models. However, the selection of kernel parameters in an SVM lacks a universally effective strategy and typically depends on grid or random search within a predefined parameter space. These approaches are computationally intensive and may yield suboptimal performance owing to limited exploration efficiency [35]. PLS-DA exhibited the lowest ACC (72.22%), as linear models cannot resolve the nonlinear spectral interference caused by artificial colorants [36].
In contrast, SMAFNet outperformed all other models across all evaluation metrics, achieving an ACC of 97.22%, REC of 100%, and F1-score of 0.9825. The confusion matrix of SMAFNet for Sunset Yellow adulteration prediction on the validation set is shown in Figure 7. This superior performance arises from the synergy among its modular components: the SPM suppresses spectral noise while preserving key feature information; the MSFEM effectively captures spectral characteristics ranging from local details to global trends and enhances salient channel information through the SE module; and the CSAFM mitigates information barriers between multi-level features, thereby maximizing the retention of weak spectral signals from trace colorants. Although one unadulterated sample was misclassified as adulterated, resulting in a slight decrease in accuracy, the model successfully identified all adulterated samples, achieving 100% recall. From a food safety perspective, such conservative misclassification is acceptable. These results demonstrate that SMAFNet possesses excellent discriminative capability for the detection of Sunset Yellow adulteration in black tea.
Figure 7.
Confusion matrix of SMAFNet for the discrimination of Sunset Yellow-adulterated black tea.
3.5. Transfer Performance of SMAFNet
To further evaluate the generalization and transferability of the optimized SMAFNet, the optimized network was transferred to Tartrazine- and Ponceau 4R-adulterated datasets to validate its detection performance. For comparison, PLS-DA, RBF-SVM, RF, 1D-CNN, and MLP models were also trained and tested on these two datasets under identical conditions.
For the Tartrazine-adulterated dataset (Table 6), SMAFNet maintained the highest performance, achieving an ACC of 97.22%, an REC of 100%, and an F1-score of 0.9811. The discrimination confusion matrix of SMAFNet for Tartrazine adulteration is shown in Figure 8. Among the traditional models, RBF-SVM performed best, with an ACC of 86.11% and an F1-score of 0.9123, but still showed a significant gap when compared with SMAFNet. In contrast, PLS-DA exhibited the lowest performance (ACC = 75.00%), underscoring the limitations of linear models in capturing the complex nonlinear relationships within spectral data.
Table 6.
Performance evaluation of different models on the Tartrazine-adulterated dataset.
Figure 8.
Confusion matrix of SMAFNet for the discrimination of Tartrazine-adulterated black tea.
For the Ponceau 4R-adulterated dataset (Table 7), SMAFNet achieved perfect results across all metrics (ACC = 100%, PRE = 100%, REC = 100%, F1-score = 1.0), demonstrating complete discriminative capability. The best-performing baseline, RF, attained only an ACC of 86.11% and an F1-score of 0.9123, whereas the RBF-SVM, which had performed well in the Sunset Yellow dataset, exhibited a marked decline (ACC = 83.33%, F1-score = 0.8929). These findings confirm that SMAFNet not only provides a strong discriminative power for single-colorant adulteration but also demonstrates excellent transferability across different adulterant types. The confusion matrix of SMAFNet for Ponceau 4R-adulterated samples is shown in Figure 9.
Table 7.
Performance evaluation of different models on the Ponceau 4R-adulterated dataset.
Figure 9.
Confusion matrix of SMAFNet for the discrimination of Ponceau 4R-adulterated black tea.
The transfer performances across different types of artificial colorants demonstrated that SMAFNet’s modular design provides strong adaptability. Although this study focused on artificial colorants in black tea, the underlying principles of spectral analysis and deep learning are broadly applicable. SMAFNet can be extended to detect other types of adulterants or contaminants in various food products, provided that appropriate spectral data for these substances are available for training. However, additional studies are required to validate its generalization across diverse food matrices and adulterant types.
4. Conclusions
To address the challenges of weak spectral signals and limited feature separability in detecting trace levels of artificial colorant adulteration in black tea, this study proposed and validated a novel SMAFNet comprising three synergistic modules. Compared with traditional machine-learning models, SMAFNet achieved superior performance, with ACC ranging from 97.22% to 100%, F1-scores between 0.9811 and 1.00, and perfect recall (100%) for all adulterated samples (Sunset Yellow, Tartrazine, and Ponceau 4R), ensuring zero missed detections. This high sensitivity is critical for reliable food safety monitoring. When combined with a handheld NIR device, SMAFNet allows for quick analysis and provides near real-time results, offering a practical framework for on-site food safety monitoring.
SMAFNet’s superior performance can be attributed to its cross-scale attention fusion mechanism. By dynamically weighting and integrating multiscale features, SMAFNet effectively mitigated interference from the intrinsic chromogenic compounds of black tea, such as theaflavins and thearubigins, and accurately identified spectral differences introduced by artificial colorants, even at low adulteration concentrations (0.1 g·kg−1). Notably, SMAFNet also exhibited strong transferability, maintaining high accuracy across multiple colorant-adulterated datasets.
Despite these advancements, some limitations remain. In real-world food safety monitoring, illegally adulterated samples are much rarer than authentic samples, resulting in small and highly imbalanced datasets. This data scarcity poses a significant challenge for SMAFNet, as deep learning models typically require abundant and balanced training data to generalize well. In addition, this study evaluated each adulterant in a separate experiment to characterize SMAFNet’s performance for each colorant; future research should explore its capability to identify multiple colorants simultaneously. Environmental variability, including fluctuations in humidity and temperature, may also adversely affect the predictive performance of the model, indicating a need for further optimization and rigorous validation to enhance its robustness.
In future work, self-supervised and weakly supervised learning techniques could be integrated to enhance SMAFNet’s ability to detect abnormal adulteration under limited data conditions. Additionally, further validation using broader datasets—including those containing different black tea varieties, samples from different countries or regional markets, and samples contaminated with other types of adulterants—should be conducted to strengthen the model’s generalization and reinforce its practical value in routine food quality control.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/foods14244261/s1, Figure S1: NIR spectra of tablet samples; Figure S2: Confusion matrix on the public dataset; Table S1: The structural parameters of the final SMAFNet; Table S2: The validation results of different models on the public dataset.
Author Contributions
Conceptualization, J.C. and Q.M.; methodology, J.C.; validation, Y.C.; resources, Q.M.; writing—original draft preparation, J.T. and Y.C.; writing—review and editing, D.Q.; visualization, B.Z.; supervision, B.Z.; project administration, G.Z.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Key Research and Development Program of China (2024YFE0213900), the Chongqing Key Project of Technological Innovation and Application Development (CSTB2022TIAD-KPX0091), and the Open Fund of Yunnan Key Laboratory of Tea Germplasms Conservation and Utilization in the Lancang River Basin (YNTGCU202509).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions of this study are presented in this article/Supplementary Materials. Further inquiries can be directed to the corresponding authors.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| NIR | Near-infrared |
| MS-NIR | Mid- and short-wave near-infrared |
| 1D-CNN | One-dimensional convolutional neural network |
| SMAFNet | Spectral Multi-Scale Attention Fusion Network |
| SPM | Spectral preprocessing module |
| MSFEM | Multi-scale feature extraction module |
| CSAFM | Cross-scale attention fusion module |
| FEB | Feature extraction block |
| ReLU | Rectified linear unit |
| SE | Squeeze-and-excitation |
| BiFPN | Bidirectional feature pyramid network |
| ACC | Accuracy |
| PRE | Precision |
| REC | Recall |
| Conv1d | One-dimensional convolutional |
| PLS-DA | Partial least squares-discriminant analysis |
| RF | Random forest |
| MLP | Multilayer perceptron |
| RBF-SVM | Radial basis function support vector machine |
References
- Yang, C.S.; Wang, H.; Sheridan, Z.P. Studies on prevention of obesity, metabolic syndrome, diabetes, cardiovascular diseases and cancer by tea. J. Food Drug Anal. 2018, 26, 1–13.
- Zheng, F.; Gan, S.; Zhao, X.; Chen, Y.; Zhang, Y.; Qiu, T.; Zheng, P.; Zhai, X.; Dai, Q. Unraveling the chemosensory attributes of Chinese black teas from different regions using GC-IMS combined with sensory analysis. LWT 2023, 184, 114988.
- Zhao, X.; Qiu, T.; Zhang, Z.; Huang, S.; Chen, Y.; Gan, S.; Jiang, Q.; Zhang, Y.; Zheng, F.; Li, L.; et al. Establishing a keemun black tea brewing control chart based on consumer acceptance via survival analysis and degree of satisfaction-difference method. LWT 2025, 224, 117829.
- Xu, Y.; Qiao, F.; Huang, J. Black tea markets worldwide: Are they integrated? J. Integr. Agric. 2022, 21, 552–565.
- Kaya, S.I.; Cetinkaya, A.; Ozkan, S.A. Latest advances on the nanomaterials-based electrochemical analysis of azo toxic dyes Sunset Yellow and Tartrazine in food samples. Food Chem. Toxicol. 2021, 156, 112524.
- Moreira, J.; Aryal, J.; Guidry, L.; Adhikari, A.; Chen, Y.; Sriwattana, S.; Prinyawiwatkul, W. Tea quality: An overview of the analytical methods and sensory analyses used in the most recent studies. Foods 2024, 13, 3580.
- Shi, J.; Huang, M.; Yang, Q.; Xu, Y.; Wu, J.; Liu, H.; Zhang, J.; Zheng, F.; Dong, W. Relatively reliable and rapid identification of colorant compounds in food matrices by HPLC-DAD-QTOF-MS combined with theoretical calculation. Food Chem. 2025, 463, 141133.
- Ghalkhani, M.; Zare, N.; Karimi, F.; Karaman, C.; Alizadeh, M.; Vasseghian, Y. Recent advances in Ponceau dyes monitoring as food colorant substances by electrochemical sensors and developed procedures for their removal from real samples. Food Chem. Toxicol. 2022, 161, 112830.
- Nowak, P.M. Simultaneous quantification of food colorants and preservatives in sports drinks by the high performance liquid chromatography and capillary electrophoresis methods evaluated using the red-green-blue model. J. Chromatogr. A 2020, 1620, 460976.
- Schwack, W.; Pellissier, E.; Morlock, G. Analysis of unauthorized Sudan dyes in food by high-performance thin-layer chromatography. Anal. Bioanal. Chem. 2018, 410, 5641–5651.
- Shi, X.; Gan, X.; Wang, X.; Peng, J.; Li, Z.; Wu, X.; Shao, Q.; Zhang, A. Rapid detection of Ganoderma lucidum spore powder adulterated with dyed starch by NIR spectroscopy and chemometrics. LWT 2022, 167, 113829.
- Zaukuu, J.Z.; Attipoe, N.Q.; Korneh, P.B.; Mensah, E.T.; Bimpong, D.; Amponsah, L.A. Detection of bissap calyces and bissap juices adulteration with sorghum leaves using NIR spectroscopy and VIS/NIR spectroscopy. J. Food Compos. Anal. 2025, 141, 107358.
- Jiang, Q.; Zhang, M.; Mujumdar, A.S.; Wang, D. Non-destructive quality determination of frozen food using NIR spectroscopy-based machine learning and predictive modelling. J. Food Eng. 2023, 343, 111374.
- Zhang, L.; Dai, H.; Zhang, J.; Zheng, Z.; Song, B.; Chen, J.; Lin, G.; Chen, L.; Sun, W.; Huang, Y. A study on origin traceability of white tea (white peony) based on near-infrared spectroscopy and machine learning algorithms. Foods 2023, 12, 499.
- Bec, K.B.; Grabska, J.; Huck, C.W. Miniaturized NIR spectroscopy in food analysis and quality control: Promises, challenges, and perspectives. Foods 2022, 11, 1465.
- Marinoni, L.; Cattaneo, T.M.P.; Vanoli, M.; Barzaghi, S. Real-time monitoring of solar drying of melon slices with a portable NIR spectrometer: A preliminary approach. Eur. Food Res. Technol. 2023, 249, 2151–2164.
- Kim, D.; Park, H.; Ha, N. Analyzing optical dual-wavelength-band cameras operating in short-wave and medium-wave infrared spectral regions. Heliyon 2024, 10, e35806.
- Zhou, L.; Tan, L.; Zhang, C.; Zhao, N.; He, Y.; Qiu, Z. A portable NIR-system for mixture powdery food analysis using deep learning. LWT 2022, 153, 112456.
- Luo, Z.; Chai, Y.; Zhao, G.; Qiao, D.; Ye, F.; Lei, L.; Chen, J. Rapid detection of colorants in black tea using mid- and short-wave near infrared spectroscopy. Anal. Methods 2025, 17, 5897–5905.
- Da Silva Pereira, E.; Cruz-Tirado, J.P.; Lourenço Crippa, B.; Martins Morasi, R.; Milagres De Almeida, J.; Fernandes Barbin, D.; Barbon Junior, S.; Cristina Cirone Silva, N. Portable near infrared (NIR) spectrometer coupled with machine learning to classify milk with subclinical mastitis. Food Control 2024, 163, 110527.
- Sun, R.; Yang, W.; Li, Y.; Sun, C. Multi-residue analytical methods for pesticides in teas: A review. Eur. Food Res. Technol. 2021, 247, 1839–1858.
- Guo, X.; Zhao, Q.; Zheng, D.; Ning, Y.; Gao, Y. A short-term load forecasting model of multi-scale CNN-LSTM hybrid neural network considering the real-time electricity price. Energy Rep. 2020, 6, 1046–1053.
- Kaushal, S.; Tammineni, D.K.; Rana, P.; Sharma, M.; Sridhar, K.; Chen, H. Computer vision and deep learning-based approaches for detection of food nutrients/nutrition: New insights and advances. Trends Food Sci. Technol. 2024, 146, 104408.
- Zhang, L.; Zhang, Q.; Wu, J.; Liu, Y.; Yu, L.; Chen, Y. Moisture detection of single corn seed based on hyperspectral imaging and deep learning. Infrared Phys. Technol. 2022, 125, 104279.
- Zhang, J.; Bai, X.; Wu, J.; Zhou, B. Nondestructive detection method for soluble solids content and titratable acidity content in pepino melons based on Vis/NIR spectroscopy and dual-attention enhanced 1D-CNN. J. Food Compos. Anal. 2025, 148, 108232.
- Lim, H.; Cho, H.; Kim, J.Y.; Shin, Y.J.; Chun, H.S.; Kim, B.H.; Ahn, S. Classification and quantification of sesame oil in edible oils and adulterated mixtures using 1H NMR spectroscopy combined with multivariate, machine learning, and deep learning models. Food Chem. 2025, 493, 146008.
- Ouyang, Q.; Fan, Z.; Chang, H.; Shoaib, M.; Chen, Q. Analyzing TVB-N in snakehead by Bayesian-optimized 1D-CNN using molecular vibrational spectroscopic techniques: Near-infrared and Raman spectroscopy. Food Chem. 2025, 464, 141701.
- Chen, J.; Ye, F.; Zhao, G. Rapid determination of farinograph parameters of wheat flour using data fusion and a forward interval variable selection algorithm. Anal. Methods 2017, 9, 6341–6348.
- Wang, X.; Zhu, Z. Context understanding in computer vision: A survey. Comput. Vis. Image Underst. 2023, 229, 103646.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 22 June 2018; pp. 7132–7141.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 19 June 2020; pp. 10781–10790.
- Coppola, C.; Papa, L.; Boresta, M.; Amerini, I.; Palagi, L. Tuning parameters of deep neural network training algorithms pays off: A computational study. TOP 2024, 32, 579–620.
- Wu, Z.; Li, C.; Liu, H.; Lin, T.; Yi, L.; Ren, D.; Gu, Y.; Wang, S. Quantification of caffeine and catechins and evaluation of bitterness and astringency of Pu-erh ripen tea based on portable near-infrared spectroscopy. J. Food Compos. Anal. 2024, 125, 105793.
- Sun, Y.; Wang, Y.; Huang, J.; Ren, G.; Ning, J.; Deng, W.; Li, L.; Zhang, Z. Quality assessment of instant green tea using portable NIR spectrometer. Spectrochim. Acta A Mol. Biomol. Spectrosc. 2020, 240, 118576.
- Xu, Z.; Abdul Aziz, M.A.; Abdul Razak, N. Research on the Application of Adaptive Parameter Adjustment Strategy in RBF Kernel SVM Image Classification. In Proceedings of the 2024 IEEE 6th Symposium on Computers & Informatics (ISCI), Kuala Lumpur, Malaysia, 10 August 2024; pp. 138–145.
- Khorrami, M.K.; Shahverdi, M.A.; Asadian, M.; Shirinnejad, M.; Mohammadi, M.; Shirian, A.Z.; Hajiseyedrazi, Z.S. Combining ACE, PLS-R, and SVM-R for rapid detection of adulteration in saffron samples by diffuse reflectance infrared Fourier transform spectroscopy. Food Control 2025, 168, 110853.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).