Geographical Origin Identification of Chinese Red Jujube Using Near-Infrared Spectroscopy and Adaboost-CLDA

Wu, Xiaohong; Yang, Ziteng; Yang, Yonglan; Wu, Bin; Sun, Jun

doi:10.3390/foods14050803

by Xiaohong Wu ¹,²,*, Ziteng Yang ³, Yonglan Yang ⁴, Bin Wu ⁵,⁶,* and Jun Sun ¹

¹ School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China

² High-Tech Key Laboratory of Agricultural Equipment and Intelligence of Jiangsu Province, Jiangsu University, Zhenjiang 212013, China

³ Mengxi Honors College, Jiangsu University, Zhenjiang 212013, China

⁴ School of Energy and Power and Engineering, Jiangsu University, Zhenjiang 212013, China

⁵ Department of Information Engineering, Chuzhou Polytechnic, Chuzhou 239000, China

⁶ School of Computer Science and Engineering, Southeast University, Nanjing 211102, China

* Authors to whom correspondence should be addressed.

Foods 2025, 14(5), 803; https://doi.org/10.3390/foods14050803

Submission received: 11 February 2025 / Accepted: 24 February 2025 / Published: 26 February 2025

(This article belongs to the Special Issue Spectroscopic Methods Applied in Food Quality Determination)

Abstract

Red jujube is a nutritious food, known as the “king of all fruits”. The quality of Chinese red jujube is closely associated with its place of origin. To classify Chinese red jujube more correctly, based on the combination of adaptive boosting (Adaboost) and common vectors linear discriminant analysis (CLDA), Adaboost-CLDA was proposed to classify the near-infrared (NIR) spectra of red jujube samples. In the study, the NIR-M-R2 spectrometer was employed to scan red jujube from four different origins to acquire their NIR spectra. Savitzky–Golay filtering was used to preprocess the spectra. CLDA can effectively address the “small sample size” problem, and Adaboost-CLDA can achieve an extremely high classification accuracy rate; thus, Adaboost-CLDA was performed for feature extraction from the NIR spectra. Finally, K-nearest neighbor (KNN) and Bayes served as the classifiers for the identification of red jujube samples. Experiments indicated that Adaboost-CLDA achieved the highest identification accuracy in this identification system for red jujube compared with other feature extraction algorithms. This demonstrates that the combination of Adaboost-CLDA and NIR spectroscopy significantly enhances the classification accuracy, providing an effective method for identifying the geographical origin of Chinese red jujube.

Keywords:

red jujube; near-infrared spectroscopy; feature extraction; geographical origin

1. Introduction

Chinese red jujube is a kind of dried fruit with edible, medicinal, and healthcare functions [1,2,3]. It contains protein, sugars, organic acids, vitamin A, vitamin C, and other rich nutrients, and it has some healthcare effects such as replenishing the spleen and stomach qi and nourishing blood for tranquillization. Different soil and climate conditions in various regions can affect the quality and chemical composition of jujubes [4,5], thereby influencing their taste. Investigating the origin of jujube can provide insights into how environmental factors affect jujube [6]. Variations in soil, climate, and cultivation practices among regions may result in unique characteristics, potentially appealing to consumers [7]. Above all, investigating the origin of jujube provides valuable information about its growth environment, quality characteristics, and potential medicinal value. This can be beneficial for producers, consumers, researchers, and government regulatory agencies [6,7,8,9].

Several conventional methods have been used for the classification of jujube, focusing on its geographical and varietal traceability. Physical characteristics such as size, shape, color, and weight, along with nutritional properties like sugar, moisture, protein, and phenolic content, are commonly employed. Wu et al. used a combination of physical measurements (e.g., calipers for size, colorimeters for color) and nutritional analyses (e.g., HPLC for sugar and phenolics) to differentiate Chinese jujubes [10]. While nutritional profiling achieved higher precision, physical traits were less reliable due to overlaps between varieties. These methods, however, are time-consuming, labor-intensive, and often destructive, making them impractical for large-scale applications. Similarly, Wang et al. performed PCA to classify 15 varieties of Chinese jujube based on volatile profiles, but this approach required expensive instrumentation (e.g., GC–MS) and was limited by the need to process high-dimensional data [11].

Near-infrared (NIR) spectroscopy has become a powerful tool in the food industry, providing non-invasive, real-time monitoring, quality control, classification, and safety assurance [12,13,14,15]. For jujube classification, it detects key nutritional components, such as sugar, moisture, protein, and phenolics, without damaging samples [16,17,18]. Wang et al. demonstrated its high accuracy in detecting insect infestations in jujubes [16], while Guo et al. highlighted its precision in chemical composition analysis when combined with chemometric techniques [18]. A major advancement is the development of portable NIR devices, enabling rapid on-site detection at cultivation sites, processing plants, or markets [19]. These devices enhance efficiency, reduce reliance on lab-based methods, and make NIR more accessible. Furthermore, NIR addresses the limitations of traditional methods by eliminating destructive sampling and minimizing chemical use. Integration with chemometric methods such as partial least squares regression (PLSR) and back-propagation neural networks (BPNN) further enhances accuracy, as shown by Luo et al., who developed reliable online detection models for southern Xinjiang jujubes [17]. These advancements establish NIR as a sustainable and practical solution for jujube classification and traceability.

Due to the high dimensions and redundant data in NIR spectra, there is a need for processing NIR data with feature extraction methods. For example, fuzzy improved linear discriminant analysis (FILDA), Adaboost-ULDA, and fuzzy uncorrelated discriminant transformation (FUDT) have been used to extract discriminant data from NIR spectra of foods [20,21,22]. On the other hand, linear discriminant analysis (LDA)-based combination algorithms, such as discriminant partial least squares (DPLS) with LDA, and successive projections algorithm (SPA) with LDA, were utilized to classify NIR spectra of corn and camellia, respectively [23,24]. Discriminant vectors in the null space of $S_{W}$ were computed to derive discriminative common vectors for classification, and then common vectors linear discriminant analysis (CLDA) was proposed for solving the small sample size problem [25]. Inspired by the notion of model fusion, Adaboost integrates multiple relatively weak classifiers to produce a robust ensemble classifier [26]. Leveraging its unique weight-adjustment mechanism and iterative training process, Adaboost has been successfully applied to classify pork storage time with Fourier transform NIR spectroscopy, as well as in other practical applications [21,27].

In this study, the classification system for identification of jujube origins has four parts: data acquisition, data preprocessing, feature extraction, and classification. The NIR-M-R2 spectrometer was used to scan red jujube for NIR spectra, which were pretreated by Savitzky–Golay (SG) filtering. Then, the data features were extracted via PCA + LDA, CLDA, and Adaboost-CLDA, respectively. In the end, K-nearest neighbor (KNN) and Bayes served as classifiers for the identification of red jujube. The schematic diagram of the traceability system for red jujube is presented in Figure 1.

2. Materials and Methods

2.1. Jujube Sample Preparation

The cultivation areas and varieties of red jujube samples were Dunhuang Junzao (Gansu), Xinzheng Huizao (Henan), Jishan Banzao (Shanxi), and Ruoqiang Huizao (Xinjiang). Experimenters wiped off any dust on the surface of the samples with a soft cloth so that each sample was clean. They also screened the red jujube samples, rejecting those with noticeable defects, insect infestation, or other contaminants. A total of 60 samples were collected for each province (i.e., each variety), resulting in a total of 240 samples. The red jujube samples were then partitioned into a training set and a test set in a 2:1 ratio, so that each variety contributed 40 training samples and 20 test samples.
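The 2:1 partition described above can be sketched as a per-class split. This is a minimal illustration only: the source does not say how samples were assigned to the two sets, so the random shuffling, the seed, and the helper name `stratified_split` are assumptions.

```python
import numpy as np

def stratified_split(labels, n_train_per_class, seed=0):
    """Return (train_idx, test_idx) with a fixed number of training samples per class."""
    rng = np.random.default_rng(seed)  # assumption: random assignment with a fixed seed
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train_per_class])
        test_idx.extend(idx[n_train_per_class:])
    return np.array(train_idx), np.array(test_idx)

# Four origins with 60 samples each, as in the text (labels 0..3 are placeholders)
labels = np.repeat(np.arange(4), 60)
train_idx, test_idx = stratified_split(labels, n_train_per_class=40)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps the 40/20 balance per origin, which a plain random split of all 240 samples would not guarantee.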

2.2. NIR Acquisition

The NIR-M-R2 spectrometer (Shenzhen Pynect Science and Technology Co. Ltd., Shenzhen, China) was used to collect the NIR spectra of red jujube samples. This spectrometer operates within a wavelength range of 900–1700 nm and has an InGaAs detector, a signal-to-noise ratio of 6000:1, and a slit size of 1.8 × 0.025 mm. During data collection, the environmental conditions were controlled, with an experimental temperature of approximately 25 °C and a relative humidity of 50–60%. Before collecting NIR spectra, the spectrometer was preheated for one hour.

The NIR spectrum of each red jujube sample was a 228-dimensional data vector. For consistency and reliability, each red jujube sample was scanned three times along the equator using the spectrometer. The final NIR spectrum of each sample was the average of its three scans; the averaged spectra are shown in Figure 2a.

2.3. Spectral Data Preprocessing

To minimize the impact of noise and partially redundant information on experimental accuracy, multiple preprocessing algorithms were introduced, including multiplicative scatter correction (MSC), standard normal variate (SNV), and Savitzky–Golay (SG) filtering. SG is effective in eliminating noise from NIR spectra, while MSC and SNV are capable of mitigating the influence of scattering in the spectral data [1].
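The three preprocessing steps can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' Matlab code: the source does not report the SG window length or polynomial order, so `window=7` and `polyorder=2` are placeholder values, the edge padding is an assumption, and the function names are hypothetical.

```python
import numpy as np

def savgol_smooth(spectra, window=7, polyorder=2):
    """Savitzky-Golay smoothing of each row (window and polyorder are assumed values)."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit over the window, evaluated at the window centre
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    coeffs = np.linalg.pinv(vander)[0]
    # Assumption: repeat the edge values so the output keeps the input length
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    # np.convolve flips its kernel, so the coefficients are passed reversed
    return np.array([np.convolve(row, coeffs[::-1], mode="valid") for row in padded])

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against the mean spectrum (or a given reference)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)  # row is modelled as slope * ref + intercept
        corrected[i] = (row - intercept) / slope
    return corrected

# Synthetic stand-in for 228-point spectra; not real jujube data
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 3, 228)) + 2
demo = np.array([a * base + b for a, b in [(1.0, 0.0), (1.3, 0.2), (0.8, -0.1)]])
demo_noisy = demo + 0.01 * rng.normal(size=demo.shape)
print(savgol_smooth(demo_noisy).shape, snv(demo).shape, msc(demo).shape)
```

SG acts along the wavelength axis of each spectrum, whereas SNV and MSC rescale whole spectra, which is why the text attributes noise removal to the former and scatter correction to the latter.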

2.4. CLDA

The steps for obtaining the discriminative common vectors using the range space of the within-class scatter matrix $S_{W}$ can be summarized as follows:

Consider a training set comprising $C$ classes, each containing $N$ samples. Let $x_{m}^{i}$ represent a $d$-dimensional column vector indicating the $m$th sample from the $i$th class. The total number of samples in the training set is denoted as $M = NC$. Assuming that $d > M - C$, the within-class scatter matrix $S_{W}$, the between-class scatter matrix $S_{B}$, and the total scatter matrix $S_{T}$ can be defined as follows [25]:

$S_{W} = \sum_{i = 1}^{C} \sum_{m = 1}^{N} (x_{m}^{i} - \mu_{i}) (x_{m}^{i} - \mu_{i})^{T}$

(1)

$S_{B} = \sum_{i = 1}^{C} N (\mu_{i} - \mu) (\mu_{i} - \mu)^{T}$

(2)

and

$S_{T} = \sum_{i = 1}^{C} \sum_{m = 1}^{N} (x_{m}^{i} - \mu) (x_{m}^{i} - \mu)^{T} = S_{W} + S_{B}$

(3)

where $\mu$ is the mean of all samples, and $\mu_{i}$ is the mean of the samples of the $i$th class.
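As a numerical check of Equations (1)–(3), the short sketch below builds the three scatter matrices for random data and verifies that $S_{T} = S_{W} + S_{B}$. The data are synthetic and the function name `scatter_matrices` is hypothetical; samples are stored as rows, so the outer products of the equations become matrix products of the centred data.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class, between-class and total scatter of X (n_samples, d) with labels y."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)               # Equation (1)
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # Equation (2)
    S_T = (X - mu).T @ (X - mu)                          # Equation (3)
    return S_W, S_B, S_T

# Synthetic example: C = 3 classes, N = 4 samples each, d = 20 > M - C = 9
rng = np.random.default_rng(0)
y_demo = np.repeat(np.arange(3), 4)
X_demo = rng.normal(size=(12, 20))
S_W, S_B, S_T = scatter_matrices(X_demo, y_demo)
print(np.allclose(S_W + S_B, S_T), np.linalg.matrix_rank(S_W))
```

Because the rank of $S_{W}$ is at most $M - C$, the condition $d > M - C$ guarantees that $S_{W}$ has a non-trivial null space, which is what the common-vector construction relies on.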

Step 1: Calculate the non-zero eigenvalues and their corresponding eigenvectors of $S_{W}$ using the smaller matrix $A^{T} A$, where $S_{W} = A A^{T}$ and $A$ is defined by Equation (4). Collect the eigenvectors as $Q = [\alpha_{1}, \dots, \alpha_{r}]$, where $r$ is the rank of $S_{W}$.

$A = [x_{1}^{1} - \mu_{1}, \dots, x_{N}^{1} - \mu_{1}, x_{1}^{2} - \mu_{2}, \dots, x_{N}^{C} - \mu_{C}]$

(4)

Step 2: Select a representative sample from each class and project it onto the null space of $S_{W}$ to derive the common vectors as follows [25]:

$x_{com}^{i} = x_{m}^{i} - Q Q^{T} x_{m}^{i}, \quad m = 1, \dots, N, \quad i = 1, \dots, C$

(5)

Step 3: Determine the eigenvectors $w_{k}$ of $S_{com}$ associated with the non-zero eigenvalues, utilizing the matrix $A_{com}^{T} A_{com}$, where $S_{com} = A_{com} A_{com}^{T}$ and $A_{com}$ is defined in Equation (6), with $\mu_{com}$ denoting the mean of the common vectors. There can be at most $C - 1$ eigenvectors corresponding to the non-zero eigenvalues. Construct the projection matrix $W = [w_{1}, \dots, w_{C - 1}]$, which is employed for extracting feature vectors in Equations (7) and (8) [25]:

$A_{com} = [x_{com}^{1} - \mu_{com}, \dots, x_{com}^{C} - \mu_{com}]$

(6)

$\Omega_{i} = W^{T} x_{m}^{i}, \quad m = 1, \dots, N, \quad i = 1, \dots, C$

(7)

$\Omega_{test} = W^{T} x_{test}$

(8)

The feature vectors $\Omega_{i}$ are referred to as discriminative common vectors, and they are utilized for the classification of red jujube. To recognize a test sample $x_{test}$, its feature vector $\Omega_{test}$ is obtained through Equation (8). Subsequently, it is compared with the discriminative common vector $\Omega_{i}$ of each class using the Euclidean distance. The test sample is assigned to the class whose discriminative common vector is closest to $\Omega_{test}$.
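The three steps can be sketched in NumPy as below. This is a hedged illustration rather than the authors' implementation: it replaces the eigendecomposition of $A^{T} A$ with a singular value decomposition of $A$ (the left singular vectors with non-zero singular values span the same range space of $S_{W}$), it takes the first sample of each class as the representative, and the names `clda_fit`, `clda_predict`, and `tol` are hypothetical.

```python
import numpy as np

def clda_fit(X, y, tol=1e-10):
    """Discriminative common vectors. X: (n_samples, d), y: integer class labels.
    Returns W (d, C-1), the projected common vector of each class (C, C-1), and the class labels."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Columns of A are the class-centred training samples, Equation (4)
    A = np.hstack([(X[y == c] - m).T for c, m in zip(classes, means)])
    # Step 1: orthonormal basis Q of the range space of S_W (SVD used instead of A^T A)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    Q = U[:, s > tol * s.max()]
    # Step 2: project one representative per class onto the null space, Equation (5)
    reps = np.array([X[y == c][0] for c in classes])
    x_com = reps - reps @ Q @ Q.T
    # Step 3: principal directions of the centred common vectors, Equation (6)
    A_com = (x_com - x_com.mean(axis=0)).T
    U_com = np.linalg.svd(A_com, full_matrices=False)[0]
    W = U_com[:, : len(classes) - 1]
    return W, x_com @ W, classes

def clda_predict(X_new, W, class_features, classes):
    """Assign each sample to the class with the nearest discriminative common vector."""
    feats = X_new @ W                                    # Equation (8)
    dists = np.linalg.norm(feats[:, None, :] - class_features[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic example with d = 50 > M - C = 12; not real spectra
rng = np.random.default_rng(0)
y_demo = np.repeat(np.arange(3), 5)
X_demo = rng.normal(size=(15, 50)) + 3 * rng.normal(size=(3, 50))[y_demo]
W, class_feats, classes = clda_fit(X_demo, y_demo)
print(W.shape, np.allclose(X_demo @ W, class_feats[y_demo], atol=1e-8))
```

The final check illustrates the defining property of the method: when $d > M - C$, every training sample of a class projects onto the same point, so the choice of representative in Step 2 does not change the result.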

2.5. Adaboost

In the ever-evolving landscape of machine learning, researchers continually explore various algorithms and techniques to enhance model performance and robustness. Ensemble learning, a powerful paradigm, stands out by combining the outputs of multiple weak learners to achieve significant improvements over individual models. This concept is rooted in the idea of “crowdsourcing”, where aggregating opinions from multiple experts often yields more accurate results than relying on a single expert.

Among ensemble learning methods, Adaboost (adaptive boosting) is particularly noteworthy. It integrates multiple relatively weak classifiers into a robust ensemble classifier through an iterative training process: after each round, the weights of misclassified samples are increased so that subsequent weak classifiers concentrate on them. This weight-adjustment mechanism has made Adaboost successful in many practical applications. In this study, Adaboost serves as the ensemble framework for the CLDA feature extractor.

2.6. Adaboost-CLDA

Adaboost-CLDA is an ensemble learning algorithm that combines the Adaboost algorithm with the CLDA algorithm. During each iteration of the Adaboost process, the generated training subset is mapped to the feature subspace of CLDA. The weak classifiers are obtained from the nearest-neighbor classifier in the CLDA feature subspace. In this setting, the Adaboost algorithm functions as an adaptive feature selection process. Based on the weighted classification errors generated by the weak classifiers in each round, corresponding weights are assigned to the feature projection vectors. Ultimately, a joint feature subspace is constructed through a voting mechanism, forming a strong classifier. The process of Adaboost-CLDA is described as follows:

Input the training set $S = \{(x_{1}, y_{1}), \dots, (x_{n}, y_{n})\}$, where $x_{i}$ is the $i$th sample and $y_{i} \in Y = \{1, 2, \dots, C\}$ is its class label.

Initialize the weights of all data: $W_{1}(i) = \frac{1}{n}, \; i = 1, 2, \dots, n$.

For $t = 1, \dots, T$:

(1): Normalize the weights:

$P_{t}(i) = \frac{W_{t}(i)}{\sum_{i = 1}^{n} W_{t}(i)}$

(9)

(2): CLDA is executed to derive $C - 1$ (where $C$ is the number of classes) optimal feature vectors. These feature vectors serve as the projection space for the training data, resulting in a new training set denoted as $S^{*}$ (a $(C - 1)$-dimensional data set): $S^{*} = \{(x_{1}^{*}, y_{1}), \dots, (x_{n}^{*}, y_{n})\}$, $x_{i}^{*} = W^{T} x_{i}$.

(3): Utilize the nearest-neighbor classifier (NNC) as the weak classifier, supplying NNC with the distribution $P_{t}(i)$. Obtain a hypothesis $h_{t} : x^{*} \to \{1, 2, \dots, C\}$ in return.

(4): Compute the error of $h_{t}$: $\varepsilon_{t} = \sum_{i = 1}^{n} P_{t}(i) I(y_{i} \neq h_{t}(x_{i}^{*}))$. If $\varepsilon_{t} = 0$ or $\varepsilon_{t} > \frac{1}{2}$, then set $T = t - 1$ and terminate the loop.

(5): Set

$\alpha_{t} = \frac{1}{2} \ln [(1 - \varepsilon_{t}) / \varepsilon_{t}]$

(10)

(6): Update the weights:

$P_{t + 1}(i) = \frac{P_{t}(i)}{Z_{t}} \times \begin{cases} e^{- \alpha_{t}}, & \text{if } y_{i} = h_{t}(x_{i}^{*}) \\ e^{\alpha_{t}}, & \text{if } y_{i} \neq h_{t}(x_{i}^{*}) \end{cases} \quad \text{where } Z_{t} = \sum_{i = 1}^{n} P_{t}(i)$

(11)

(7): After the loop ends, generate the final hypothesis:

$H(x) = \arg \max_{y \in Y} \left[ \sum_{t = 1}^{T} \alpha_{t} I(h_{t}(x^{*}) = y) \right]$

(12)
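The loop can be sketched in NumPy as follows. Several choices here are assumptions, because the source does not specify them: the distribution $P_{t}$ is passed to CLDA through weighted, class-stratified resampling without replacement; the weak classifier assigns a sample to the nearest class common vector in the CLDA subspace; the CLDA step uses an SVD in place of the eigendecomposition; and a single learner is kept if the first round already ends with $\varepsilon_{t} = 0$ or $\varepsilon_{t} > 1/2$. All function and parameter names are hypothetical.

```python
import numpy as np

def clda_fit(X, y, classes, tol=1e-10):
    """Return W (d, C-1) and the projected common vector of each class."""
    groups = [X[y == c] for c in classes]
    A = np.hstack([(g - g.mean(axis=0)).T for g in groups])
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    Q = U[:, s > tol * s.max()]                      # range space of S_W
    reps = np.array([g[0] for g in groups])
    x_com = reps - reps @ Q @ Q.T                    # Equation (5)
    A_com = (x_com - x_com.mean(axis=0)).T           # Equation (6)
    W = np.linalg.svd(A_com, full_matrices=False)[0][:, : len(classes) - 1]
    return W, x_com @ W

def nearest_class(X, W, feats, classes):
    d = np.linalg.norm((X @ W)[:, None, :] - feats[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

def adaboost_clda_fit(X, y, n_rounds=10, subset_per_class=None, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    if subset_per_class is None:                     # assumption: half of the smallest class
        subset_per_class = max(2, int(min(np.sum(y == c) for c in classes)) // 2)
    w = np.full(len(y), 1.0 / len(y))                # initial weights W_1(i) = 1/n
    learners = []
    for _ in range(n_rounds):
        p = w / w.sum()                              # Equation (9)
        subset = []
        for c in classes:                            # assumption: weighted stratified resampling
            idx = np.flatnonzero(y == c)
            subset.extend(rng.choice(idx, size=subset_per_class, replace=False,
                                     p=p[idx] / p[idx].sum()))
        subset = np.array(subset)
        W, feats = clda_fit(X[subset], y[subset], classes)
        miss = nearest_class(X, W, feats, classes) != y
        eps = p[miss].sum()                          # weighted error of the weak classifier
        if eps == 0 or eps > 0.5:
            if not learners:                         # fallback so the ensemble is never empty
                learners.append((1.0, W, feats))
            break
        alpha = 0.5 * np.log((1 - eps) / eps)        # Equation (10)
        learners.append((alpha, W, feats))
        w = p * np.exp(np.where(miss, alpha, -alpha))  # Equation (11); renormalised next round
    return learners, classes

def adaboost_clda_predict(X, learners, classes):
    votes = np.zeros((len(X), len(classes)))
    for alpha, W, feats in learners:
        pred = nearest_class(X, W, feats, classes)
        votes[np.arange(len(X)), np.searchsorted(classes, pred)] += alpha  # Equation (12)
    return classes[votes.argmax(axis=1)]
```

On well-separated data the first weak classifier may already reach $\varepsilon_{t} = 0$, in which case the loop stops after one round; the boosting rounds matter only when individual CLDA subspaces misclassify part of the training set.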

2.7. KNN Classifier

KNN is a kind of instance-based learning algorithm commonly utilized for classification and regression problems. Unlike traditional model-based learning algorithms, KNN does not explicitly learn a function for prediction. Instead, it determines the label of a new data point by examining the labels of its closest neighbors within the feature space. The fundamental idea behind KNN is that similar samples in the feature space probably belong to the same class. For a new data point, KNN identifies the K-nearest training samples and makes predictions based on a majority vote for classification problems. In this study, the KNN algorithm was used to construct a classifier for red jujube samples.
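A minimal NumPy sketch of the majority-vote rule is given below. The Euclidean distance, the tie-breaking rule, and the default `k=3` are assumptions, since the text does not state them, and `knn_predict` is a hypothetical name.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]       # indices of the k closest samples
    preds = []
    for row in y_train[nearest]:
        values, counts = np.unique(row, return_counts=True)
        preds.append(values[counts.argmax()])        # ties go to the smallest label
    return np.array(preds)

# Toy 2-D example with two clusters
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.2, 0.1], [5.5, 5.2]])
print(knn_predict(X_train, y_train, X_test, k=3))    # prints [0 1]
```

In the identification system described here, `X_train` and `X_test` would hold the low-dimensional features produced by the feature extraction step rather than raw spectra.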

2.8. Bayes Classifier

The Naive Bayes classifier first learns the prior probabilities of categories and the conditional probabilities of feature values within each category from known training data. These probabilities can be estimated through methods such as frequency statistics. Then, when new samples need to be classified, the classifier utilizes the learned probability model to compute the posterior probabilities for each category and selects the category with the highest posterior probability as the classification result [28,29].
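The prior-times-likelihood rule can be sketched as follows. Because the extracted features are continuous, this sketch assumes a Gaussian model for each feature within each class; that choice is an assumption made for illustration and is not stated in the source. The function names and the `var_floor` parameter are hypothetical.

```python
import numpy as np

def gnb_fit(X, y, var_floor=1e-9):
    """Estimate class priors and per-feature Gaussian parameters from training data."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + var_floor
    return classes, priors, means, variances

def gnb_predict(X, classes, priors, means, variances):
    """Pick the class with the highest log posterior (up to an additive constant)."""
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return classes[(np.log(priors)[None] + log_lik).argmax(axis=1)]

# Toy 2-D example with two clusters
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [-0.1, 0.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
model = gnb_fit(X_train, y_train)
print(gnb_predict(np.array([[0.05, 0.05], [5.0, 5.1]]), *model))   # prints [0 1]
```

Working with log probabilities avoids numerical underflow when many features are multiplied together.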

2.9. Software

All the algorithms in this article were executed using Matlab 2014b (The MathWorks, Natick, MA, USA).

3. Results

3.1. Spectral Analysis

The collected NIR spectra of red jujube spanned a wavenumber range of 5894.5–11,111 cm⁻¹, revealing distinct features, as illustrated in Figure 2a. Two prominent peaks at 8475 cm⁻¹ and 6993 cm⁻¹ appeared in the NIR spectra. Beyond 7407 cm⁻¹, a significant change in absorbance occurred across all red jujube samples, attributed to O-H and water absorption [30]. The absorbance peaked at 6993 cm⁻¹, which is associated with the first and second overtones of C-H group stretching vibrations, indicating protein-like substances. This peak might also be linked to the first and second overtones of the O-H group in water [12]. The varying functional group information among red jujube varieties is mostly contained in the NIR spectra.
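Since the spectrometer range is given in nanometres (900–1700 nm) while the spectral features are reported in cm⁻¹, a short conversion helps relate the two. The sketch below uses the standard relation between wavenumber in cm⁻¹ and wavelength in nm (wavenumber = 10⁷ / wavelength); the function names are hypothetical.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    return 1e7 / wavelength_nm

def wavenumber_to_nm(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a wavelength in nm."""
    return 1e7 / wavenumber_cm

# Band positions quoted in the text, expressed as wavelengths
for nu in (8475, 7407, 6993):
    print(nu, "cm^-1 is about", round(wavenumber_to_nm(nu)), "nm")
# 8475 cm^-1 is about 1180 nm
# 7407 cm^-1 is about 1350 nm
# 6993 cm^-1 is about 1430 nm
```

The same relation maps the lower instrument limit of 900 nm to about 11,111 cm⁻¹, matching the upper end of the reported range.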

3.2. Spectral Preprocessing

In this experiment, three preprocessing algorithms denoted as MSC, SNV, and SG filtering, as well as their combinations, were used for spectral preprocessing. Table 1 and Table 2 outline the influence of the three preprocessing algorithms, as well as their combinations, on the classification accuracy with the KNN classifier and Bayes classifier, respectively. Based on Table 1 and Table 2, the classification accuracies of Adaboost-CLDA were higher than those of PCA + LDA and CLDA with several preprocessing algorithms. After comprehensive consideration, SG was chosen as a preprocessing algorithm in this identification system. The NIR spectra processed by the SG algorithm are shown in Figure 2b. Figure 3 shows the classification results of PCA + LDA, CLDA, and Adaboost-CLDA combined with SG preprocessing and the KNN classifier. The confusion matrix in Figure 3 illustrates the test sample numbers correctly classified and wrongly classified. It was evident that Adaboost-CLDA had the highest classification accuracy compared with PCA + LDA and CLDA.

3.3. Classification with CLDA

PCA + LDA is a two-stage algorithm in which the data are first compressed by PCA and discriminant features are then extracted by LDA. In contrast, CLDA does not need to compress the data and can extract discriminative common vectors directly, without the help of PCA. All 240 red jujube samples were partitioned into the training set (40 training samples per kind of red jujube, 160 in total) and the test set (20 test samples per kind of red jujube, 80 in total). There are three (i.e., $C - 1 = 3$) eigenvectors corresponding to the non-zero eigenvalues, so the projection matrix has three columns. CLDA projected the test data with this matrix to obtain three-dimensional data. Figure 4a shows the test data projected by the three discriminative common vectors of CLDA. As illustrated in Figure 4a, certain red jujube samples from Shanxi and Xinjiang could not be differentiated, and a few samples from Gansu and Henan could not be distinguished. Hence, the classification accuracy was only 75%.

3.4. Classification with Adaboost-CLDA

The Adaboost-CLDA algorithm, incorporating ensemble learning, achieved its best experimental results within 10 rounds of iterations. The classification accuracies are shown in Figure 4b. With the KNN classifier, the highest classification accuracy, 97.5%, was reached in the eighth iteration; with the Bayes classifier, the highest classification accuracy, 100%, was reached in the sixth iteration. The Adaboost-CLDA algorithm significantly outperformed traditional linear feature extraction methods in terms of classification performance. The reason is that the feature extraction process based on Adaboost autonomously selects the optimal classification features according to the classification errors of the weak classifiers. Through the collaborative voting of multiple linear feature extractors, it achieves complex feature extraction from spectral data.

4. Discussion

The NIR spectral data were obtained using the NIR-M-R2 spectrometer, and they were preprocessed by SG filtering. Next, PCA + LDA, CLDA, and Adaboost-CLDA were performed for feature extraction from NIR spectra. At last, KNN and Bayes served as classifiers for sample classification. As evident from the experimental results in Table 1, the classification accuracy was influenced by the feature extraction algorithms. When PCA + LDA or CLDA was used, the accuracy dropped below 90%. Conversely, if Adaboost-CLDA served as the feature extraction algorithm, it resulted in an accuracy over 90%. Table 1 indicates that the highest classification accuracy (100%) was attained by combining Adaboost-CLDA and the MSC preprocessing method with the KNN classifier in the identification system for the geographical origin classification of red jujube samples. Table 2 indicates that the highest classification accuracy (100%) was attained by the combination of Adaboost-CLDA and the SG preprocessing method with the Bayes classifier in the classification system.

To assess the effectiveness of the classification system, the numbers of training and test samples were varied and the classification accuracies were recomputed while keeping the other experimental conditions unchanged. Table 3 lists the accuracy of classifying red jujube samples using the three feature extraction methods for different numbers of training and test samples. In Table 3, num_training and num_test denote the number of training samples and test samples, respectively. The classification accuracy changed as num_training and num_test varied. When num_training and num_test were set to 160 and 80, respectively, the classification accuracy of Adaboost-CLDA reached its highest value of 97.5%.

The results of this study not only have significance in the classification of the geographical origin of jujubes but also provide a valuable reference for practical market applications. By accurately identifying the geographical origin of jujubes, it is possible to effectively prevent counterfeit products and establish a traceability system for jujube products, safeguarding consumers’ legal rights and interests. Jujubes from different regions have different geographical indications (GI) and prices in the markets. By clarifying the origin information, the study provides a strong basis for geographical origin traceability and better protection of GI.

In this study, the role of the parameter K in the K-nearest neighbor (KNN) algorithm was evaluated to understand its impact on classification accuracy. The experimental results demonstrated that the selection of the K value significantly influenced the classification accuracy of KNN combined with feature extraction algorithms, including PCA + LDA, CLDA, and Adaboost-CLDA. As shown in Figure 5, Adaboost-CLDA consistently achieved superior accuracy across various K values, particularly at K = 7, where it outperformed other methods, reaching nearly 100% accuracy. This indicates that the ensemble learning approach of Adaboost-CLDA is robust to changes in K and can effectively optimize feature selection during classification. Conversely, CLDA and PCA + LDA displayed lower accuracy rates and greater variability in performance as the K value changed. These findings underscore the importance of selecting an optimal K value in KNN-based classification systems and further affirm the efficacy of Adaboost-CLDA in enhancing classification reliability and robustness.

Recent studies have explored machine learning and chemometric methods for red jujube classification. Qi et al. (2022) introduced fuzzy improved linear discriminant analysis (FiLDA) to improve the classification of red jujube varieties using NIR spectroscopy [20]. While FiLDA effectively handles noisy data and enhances class separability, its classification accuracy (57.5%) is significantly lower than the accuracy achieved in this study (97.5% with KNN, 100% with Bayes). Furthermore, improved linear discriminant analysis (iLDA) achieved a classification accuracy of only 76.25%. In contrast, our proposed Adaboost-CLDA approach builds on CLDA’s ability to handle the small sample size problem and further enhances classification accuracy by adaptively selecting optimal features. Unlike FiLDA, which relies on fuzzy membership values, Adaboost iteratively boosts weak classifiers to refine the decision boundary. This makes Adaboost-CLDA a more effective and robust solution for geographical origin classification, as demonstrated by its superior performance with different preprocessing algorithms.

5. Conclusions

This study proposes a method for the accurate classification of Chinese red jujube by combining CLDA with Adaboost. While CLDA effectively addresses the “small sample size” problem, the main contribution of our research is the introduction of Adaboost-CLDA, which significantly improves classification accuracy by adaptively selecting features and enhancing the robustness of the classification model. The highest classification accuracy of Adaboost-CLDA could reach 100%. The results indicated that Adaboost-CLDA coupled with near-infrared spectroscopy was an effective method for the geographical origin identification of Chinese red jujube.

The successful application of the Adaboost-CLDA combined with NIR spectroscopy highlights its broad potential in the field of food origin identification and quality control. Beyond red jujube, this method can be extended to other agricultural products such as tea, wine, and rice for geographical indication protection by rapidly analyzing their chemical composition and spectral features, enabling precise classification of origin and variety. For instance, the spectral characteristics of polyphenols and amino acids in tea, the composition of acidity and phenolic compounds in wine, and the starch and protein ratios in rice can all be effectively utilized for high-accuracy origin tracing using this approach. Moreover, this technique can also be applied to honey, coffee, olive oil, and other food products, addressing the growing demand for traceability technologies in the global food market. Its non-invasive, cost-effective, and efficient characteristics offer an environmentally friendly and sustainable solution for the food industry, helping to enhance consumers’ trust, combat counterfeit products, and promote market standardization.

Author Contributions

Conceptualization, X.W.; methodology, X.W. and B.W.; software, X.W. and B.W.; validation, X.W. and J.S.; formal analysis, Z.Y.; investigation, Z.Y. and Y.Y.; resources, B.W. and J.S.; data curation, Y.Y.; writing—original draft preparation, Z.Y.; writing—review and editing, X.W.; visualization, Y.Y.; supervision, J.S.; project administration, X.W.; funding acquisition, B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Major Natural Science Research Projects of Colleges and Universities in Anhui Province (2022AH040333), the Youth and Middle-aged Teachers Cultivation Action Project in Anhui Province (JNFX2023136), and the Undergraduate Innovation and Entrepreneurship Training Program of Jiangsu Province (202310299363X).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, S.; Sun, J.; Fu, L.; Xu, M.; Tang, N.; Cao, Y.; Yao, K.; Jing, J. Identification of red jujube varieties based on hyperspectral imaging technology combined with CARS-IRIV and SSA-SVM. J. Food Process Eng. 2022, 45, e14137.
2. Zhang, X.; Wang, Y.; Wu, F.; Gu, D.; Tao, H.; Zhang, R. Organic acid and aromatic compounds create distinctive flavor in the blackening process of jujube. Food Chem. 2024, 439, 138199.
3. Zhang, L.; Sun, J.; Zhou, X.; Nirere, A.; Wu, X.; Dai, R. Classification detection of saccharin jujube based on hyperspectral imaging technology. J. Food Process. Pres. 2020, 44, e14591.
4. Hu, C.; Xu, H.; Fu, Z.; Zhang, R.; Zhi, C. Non-destructive identification of the geographical origin of red jujube by near-infrared spectroscopy and fuzzy clustering methods. Int. J. Food Prop. 2023, 26, 3275–3290.
5. Arslan, M.; Zareef, M.; Tahir, H.E.; Ali, S.; Huang, X.W.; Rakha, A.; Shi, J.; Zou, X. Comparative analyses of phenolic compounds and antioxidant properties of Chinese jujube as affected by geographical region and drying methods (Puff-drying and convective hot air-drying systems). J. Food Meas. 2021, 15, 933–943.
6. Li, Z.; Li, W.; Wang, J.; Zhang, J.; Wang, Z. Drip irrigation shapes the soil bacterial communities and enhances jujube yield by regulating the soil moisture content and nutrient levels. Agr. Water Manag. 2023, 289, 108563.
7. Öztürk, M.; Yalçın, O.; Tekgündüz, C.; Tekgündüz, E. Origin of the effects of optical spectrum and flow behaviour in determining the quality of dry fig, jujube, pomegranate, date palm and concentrated grape vinegars. Spectrochim. Acta A 2022, 270, 120792.
8. Ruan, J.; Li, H.; Lu, M.; Hao, M.; Sun, F.; Yu, H.; Zhang, Y.; Wang, T. Bioactive triterpenes of jujube in the prevention of colorectal cancer and their molecular mechanism research. Phytomedicine 2023, 110, 154639.
9. Si, Q.; Su, L.; Wang, D.; De, B.J.; Na, R.; He, N.; Byambaa, T.; Dalkh, T.; Bao, X.; Yi, L. An evaluation of the qualitative superiority of the Mongolian medicinal herb hurdan-tsagaan (Platycodi Radix) from five different geographic origins based on anti-inflammatory effects. J. Ethnopharmacol. 2023, 310, 116331.
10. Wu, L.; Li, L.; Zhang, G.; Jiang, N.; Ouyang, X.; Wang, M. Geographical and varietal traceability of Chinese jujubes based on physical and nutritional characteristics. Foods 2021, 10, 2270.
11. Wang, L.; Wang, Y.; Wang, W.; Zheng, F.; Chen, F. Comparison of volatile compositions of 15 different varieties of Chinese jujube (Ziziphus jujuba Mill.). J. Food Sci. Technol. 2019, 56, 1631–1640.
12. Wang, W.; Paliwal, J. Near-infrared spectroscopy and imaging in food quality and safety. Sens. Instrum. Food Qual. Saf. 2007, 1, 193–207.
13. Cortés, V.; Blasco, J.; Aleixos, N.; Cubero, S.; Talens, P. Monitoring strategies for quality control of agricultural products using visible and near-infrared spectroscopy: A review. Trends Food Sci. Technol. 2019, 85, 138–148.
14. Kademi; Ibrahim, H.; Ulusoy, B.H.; Hecer, C. Applications of miniaturized and portable near-infrared spectroscopy (NIRS) for inspection and control of meat and meat products. Food Rev. Int. 2019, 35, 201–220.
15. Tahir, H.E.; Mariod, A.A.; Hashim, S.B.H.; Arslan, M.; Mahunu, G.K.; Huang, X.; Li, Z.; Abdalla, I.I.H.; Zou, X. Classification of Black Mahlab seeds (Monechma ciliatum) using GC–MS and FT-NIR and simultaneous prediction of their major volatile compounds using chemometrics. Food Chem. 2023, 408, 134948.
16. Wang, J.; Nakano, K.; Ohashi, S. Nondestructive detection of internal insect infestation in jujubes using visible and near-infrared spectroscopy. Postharvest Biol. Technol. 2011, 59, 272–279.
17. Luo, H.; Lu, Q.; Ding, H.; Gao, H.; Guo, L. Study on online detection modeling parameters of jujube internal quality of southern Xinjiang with near infrared spectrometric techniques. Spectrosc. Spectr. Anal. 2012, 32, 1225–1229.
18. Guo, Y.; Ni, Y.; Kokot, S. Evaluation of chemical components and properties of the jujube fruit using near infrared spectroscopy and chemometrics. Spectrochim. Acta A 2016, 153, 79–86.
19. McGrath, T.F.; Haughey, S.A.; Islam, M.; Elliott, C.T. The potential of handheld near infrared spectroscopy to detect food adulteration: Results of a global, multi-instrument inter-laboratory study. Food Chem. 2021, 353, 128718.
20. Qi, Z.; Wu, X.; Yang, Y.; Wu, B.; Fu, H. Discrimination of the red jujube varieties using a portable NIR spectrometer and fuzzy improved linear discriminant analysis. Foods 2022, 11, 763.
Wu, X.; Fu, H.; Tian, X.; Wu, B.; Sun, J. Prediction of pork storage time using Fourier transform near infrared spectroscopy and Adaboost-ULDA. J. Food Process Eng. 2017, 40, e12566. [Google Scholar] [CrossRef]
Zhang, T.; Wu, X.; Wu, B.; Dai, C.; Fu, H. Rapid authentication of the geographical origin of milk using portable near-infrared spectrometer and fuzzy uncorrelated discriminant transformation. J. Food Process Eng. 2022, 45, e14040. [Google Scholar] [CrossRef]
Qin, H.; Wang, H.; Li, W.; Jin, X. Application of DPLS-based LDA in corn qualitative near infrared spectroscopy analysis. Spectrosc. Spectr. Anal. 2011, 31, 1777–1781. [Google Scholar]
Diniz, P.H.G.D.; Gomes, A.A.; Pistonesi, M.F.; Band, B.S.F.; de Araújo, M.C.U. Simultaneous classification of teas according to their varieties and geographical origins by using NIR spectroscopy and SPA-LDA. Food Anal. Methods 2014, 7, 1712–1718. [Google Scholar] [CrossRef]
Cevikalp, H.; Neamtu, M.; Wilkes, M.; Barkana, A. Discriminative common vectors for face recognition. IEEE Trans. Pattern Anal. 2005, 27, 4–13. [Google Scholar] [CrossRef] [PubMed]
Schapire, R.E. The strength of weak learnability. Mach. Learn. 1990, 5, 197–227. [Google Scholar] [CrossRef]
Pan, W.; Liu, W.; Huang, X. Rapid identification of the geographical origin of Baimudan tea using a Multi-AdaBoost model integrated with Raman Spectroscopy. Curr. Res. Food Sci. 2024, 8, 100654. [Google Scholar] [CrossRef] [PubMed]
Ranganathan, S.; Gribskov, M.; Nakai, K.; Schönbach, C. Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics; Elsevier: Amsterdam, The Netherlands, 2018. [Google Scholar]
Murphy, K.P. Naive Bayes Classifiers. Univ. B. C. 2006, 18, 1–8. [Google Scholar]
Wu, L.; He, J.; Liu, G.; Wang, S.; He, X. Detection of common defects on jujube using Vis-NIR and NIR hyperspectral imaging. Postharvest Biol. Technol. 2016, 112, 134–142. [Google Scholar] [CrossRef]

Figure 1. The schematic diagram of the traceability system. PCA, principal component analysis; LDA, linear discriminant analysis; CLDA, common vectors linear discriminant analysis; NIR, near-infrared; S-G, Savitzky–Golay; KNN, K-nearest neighbor.

Figure 2. NIR spectra of red jujube. (a) The original spectra; (b) the spectra preprocessed by the SG algorithm; (c) the mean spectra; (d) the mean spectra preprocessed by the SG algorithm.

Figure 3. Classification results of PCA + LDA, CLDA, and Adaboost-CLDA. (a) The confusion matrix of PCA + LDA; (b) the confusion matrix of CLDA; (c) the confusion matrix of Adaboost-CLDA.

Figure 4. Data distribution and classification. (a) The test data projected by three discriminant common vectors of CLDA; (b) the classification accuracy of Adaboost-CLDA with KNN and Bayes.

Figure 5. Classification accuracy of the feature extraction methods for different K values of the KNN classifier.

Table 1. Classification accuracies of three feature extraction methods with preprocessing algorithms and KNN classifier (%).

Feature Extraction Method	MSC	SNV	SG	MSC + SG	SNV + SG	MSC + SNV
PCA + LDA	53.75	62.5	77.5	78.75	88.75	53.75
CLDA	52.5	47.5	75	67.5	68.75	53.75
Adaboost-CLDA	100	98.75	97.5	98.75	97.5	98.75

MSC, multiplicative scatter correction; SNV, standard normal variate; SG, Savitzky–Golay; PCA, principal component analysis; LDA, linear discriminant analysis; CLDA, common vectors linear discriminant analysis.
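As a rough illustration of one of the preprocessing steps compared in Table 1, the following minimal sketch applies SNV to a single spectrum by centring it on its own mean and scaling it by its own standard deviation. It is not the authors' implementation: the function name `snv`, the use of the sample standard deviation (n - 1), and the example absorbance values are assumptions made for illustration.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Centre one spectrum on its mean and scale it by its standard deviation."""
    centre = mean(spectrum)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    spread = stdev(spectrum)
    if spread == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(value - centre) / spread for value in spectrum]


# Hypothetical absorbance values, not taken from the jujube data set.
example = [0.42, 0.45, 0.51, 0.48, 0.44]
scaled = snv(example)
print([round(v, 3) for v in scaled])
```

Each spectrum is scaled independently, so after SNV every spectrum has zero mean and unit standard deviation regardless of its original baseline level.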

Table 2. Classification accuracies of three feature extraction methods with preprocessing algorithms and Bayes classifier (%).

Feature Extraction Method	MSC	SNV	SG	MSC + SG	SNV + SG	MSC + SNV
PCA + LDA	71.25	67.5	81.25	78.75	92.5	71.25
CLDA	40	42.5	77.5	50	47.5	40
Adaboost-CLDA	92.5	87.5	100	96.25	91.25	88.75

Table 3. Classification accuracies of three feature extraction methods for different numbers of training and test samples.

Number of training samples	Number of test samples	PCA + LDA (%)	CLDA (%)	Adaboost-CLDA (%)
120	120	75	54.17	93.33
140	100	82	70	94
144	96	82.29	71.88	96.88
160	80	77.5	75	97.5
180	60	76.67	68.33	90
192	48	72.92	68.75	91.67
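Because each accuracy in Table 3 is a ratio of correctly classified test samples to the test set size, the percentages can be converted back to whole-sample counts. The sketch below does this for two rows copied from Table 3. The helper names `correct_count` and `accuracy_pct` are hypothetical, and the recovered counts are inferred by rounding rather than reported by the authors.

```python
def correct_count(pct, num_test):
    """Nearest whole number of correctly classified test samples."""
    return round(pct * num_test / 100.0)


def accuracy_pct(num_correct, num_test):
    """Accuracy in percent, rounded to two decimals as in Table 3."""
    return round(100.0 * num_correct / num_test, 2)


# Two rows copied from Table 3: (number of test samples, {method: accuracy in %}).
rows = [
    (96, {"PCA + LDA": 82.29, "CLDA": 71.88, "Adaboost-CLDA": 96.88}),
    (80, {"PCA + LDA": 77.5, "CLDA": 75.0, "Adaboost-CLDA": 97.5}),
]

# Inferred counts of correct test samples per split and method.
counts = {
    num_test: {method: correct_count(pct, num_test) for method, pct in accs.items()}
    for num_test, accs in rows
}

for num_test, per_method in counts.items():
    print(num_test, per_method)
```

Under this reading, the 97.5% reported for Adaboost-CLDA with 80 test samples corresponds to 78 correct samples, that is, two misclassified samples.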

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wu, X.; Yang, Z.; Yang, Y.; Wu, B.; Sun, J. Geographical Origin Identification of Chinese Red Jujube Using Near-Infrared Spectroscopy and Adaboost-CLDA. Foods 2025, 14, 803. https://doi.org/10.3390/foods14050803

