Article

Design of a Multimodal Detection System Tested on Tea Impurity Detection

1 State Key Laboratory of Optoelectronic Materials and Technologies, School of Physics, Sun Yat-sen University, Guangzhou 510275, China
2 Nanchang Research Institute, Sun Yat-sen University, Nanchang 330096, China
3 Guangzhou Guangxin Technology Co., Ltd., Guangzhou 510300, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(9), 1590; https://doi.org/10.3390/rs16091590
Submission received: 5 March 2024 / Revised: 26 April 2024 / Accepted: 28 April 2024 / Published: 29 April 2024
(This article belongs to the Special Issue Machine Learning and Image Processing for Object Detection)

Abstract

A multimodal detection system with complementary capabilities for efficient detection was developed for impurity detection. The system consisted of a visible light camera, a multispectral camera, and image correction and registration algorithms. It can obtain spectral features and color features at the same time and has a higher spatial resolution than a single spectral camera. The system was applied to detect impurities in Pu’er tea to verify its efficiency. The spectral and color features of each pixel in the images of Pu’er tea were obtained by this system and used for pixel classification. The experimental results showed that the accuracy of a support vector machine (SVM) model based on the combined features was 93%, which was 7% higher than that based on spectral features only. By applying a median filtering algorithm and a contour detection algorithm to the label matrix extracted from the pixel-classified images, the eight types of impurities other than hair were detected successfully. Moreover, taking advantage of the high resolution of the visible light camera, small impurities could be clearly imaged. By comparing the segmented color image with the pixel-classified image, small impurities such as hair could also be detected successfully. Finally, it was shown that the system could obtain multiple images to allow a more detailed and comprehensive understanding of the detected items and had an excellent ability to detect small impurities.

1. Introduction

A multimodal imaging system is a sophisticated technology that combines different imaging modalities to provide a comprehensive view or analysis of a subject, such as the human body, geological formations, or materials. Such an integrated system enables the simultaneous or sequential acquisition of multiple types of images or data, allowing a more detailed and holistic understanding of the subject under investigation. Multimodal imaging systems find diverse applications across various fields thanks to their ability to provide comprehensive insights by combining different imaging techniques. Key applications include medical diagnostics [1,2,3,4], biomedical research [5], and environmental and earth sciences [6,7]. In addition, industrial inspection tasks are becoming increasingly complex, and the information obtained by a single sensor can no longer meet detection needs. Multimodal imaging systems have therefore gradually been adopted for industrial detection in recent years. Zhao et al. [8] utilized a multimodal information acquisition and test platform containing a camera and an IR thermal imager to achieve accurate recognition of coal and gangue. Xu et al. [9] developed a defect detection system based on unmanned airships that integrated panoramic CCD cameras, three-dimensional laser scanners, inertial measurement units, barometric altimeters, illumination sensors, and control modules, and successfully detected defects on a vertical shaft surface. Saran et al. [10] used a multimodal imaging (polarization camera)-based system to detect foreign objects on the surface of a coal-carrying conveyor. Jiang et al. [11] collected diameter, temperature, and pulling speed signals as well as two-dimensional images of the meniscus and proposed a multimodal fusion network (MMFN) for multimodal data fusion.
The experimental results showed that the proposed method effectively detected node-loss defects in the growth process of monocrystalline silicon with high accuracy, robustness, and real-time performance. There are also other studies on the application of multimodal imaging systems in industry [12,13,14]. However, although multimodal systems have been used in industry, there are few studies on the application of multimodal detection systems to impurity detection.
Pu’er is a major kind of postfermented tea made from a “large leaf” variety of Camellia sinensis (C. sinensis assamica), whose distribution is limited to the mountains of southern Yunnan, China. Pu’er tea, a unique postfermented tea produced in China, has antiobesity, hypolipidemic, and antioxidative properties [15]. The modern manufacturing process of Pu’er involves withering, fixation, rolling, sun-drying, steaming, compressing, drying, and postfermentation [16]. During the picking of fresh Pu’er leaves and subsequent processing, Pu’er tea is prone to adulteration with impurities, including tea stalks, tea fruits, branches, and grains, which significantly impact its taste and quality. If Pu’er tea products containing impurities reached the market, they would harm the commercial reputation of manufacturers and the health of consumers. To improve the quality of Pu’er tea, factories commonly remove impurities from Pu’er tea before it is compressed. The traditional impurity sorting method for Pu’er tea is manual selection, which is labor-intensive, time-consuming, and inefficient. In addition to manual selection, color sorters are also employed to identify and remove impurities in tea. Traditional color sorters utilize photoelectric detection technology to automatically sort out discolored objects according to the different optical properties of the samples. Color sorters are mainly used in agricultural equipment such as rice sorters, bean sorters, and peanut sorters, and they reduce human effort, labor, and cost [17]. However, impurities with a color similar to Pu’er tea are difficult to identify and eliminate with traditional color sorters. More efficient and reliable methods are needed to accurately identify and remove impurities in Pu’er tea.
In recent years, with the rapid development of electronic technology, computer technology, image processing, machine vision, and related disciplines, machine vision technology has gradually been applied to impurity detection. The traditional research process for automated impurity detection is to select an appropriate sensor to obtain relevant features according to the feature differences between the detected items and the impurities, and then use a machine learning method to classify them. Md et al. [18] utilized a digital camera combining back, front, and structured lighting to evaluate soybean quality. A series of image processing algorithms then successfully identified the dockage fractions with an accuracy of 96% for split beans, 75% for contaminated beans, and 98% for both defective beans and stems/pods. Mahirah et al. [19] developed a machine vision system with double lighting and an image processing algorithm to detect undesirable objects in paddy. Based on HSI color features and geometrical features, a series of image processing algorithms were utilized to detect undesirable objects and damaged grain in paddy. Vithu et al. [20] utilized a color machine vision system to identify dockage in paddy, including organic and inorganic impurities, varietal admixture, and grain admixture. Shubham et al. [21] presented an automatic, real-time, and cost-effective image processing-based system for classifying rice grains into various categories according to their inferred commercial value. They extracted geometrical features in the spatial domain and utilized an SVM (support vector machine) for multi-class classification. Senni et al. [22] used infrared thermography to detect impurities in biscuits.
Traditional image segmentation (e.g., Otsu’s method and the co-occurrence matrix), handcrafted feature extraction (e.g., texture features), and classification approaches (e.g., fuzzy clustering and support vector machines) have been used to detect impurities in cotton [23,24,25]. Shen et al. [26] proposed an improved convolutional neural network, WheNet, to identify five categories of impurities in wheat. These detection studies have shown good results. However, unlike soybean and paddy, tea is difficult to disperse completely with a vibration feeder, making it impossible to detect impurities using geometrical features. In addition, some studies have also shown that for impurities with shapes and colors similar to those of the detected items, the accuracy of detection using only visible light cameras is too low to meet industrial needs.
In addition to machine vision methods, spectral imaging techniques are commonly used in intelligent detection research [27,28,29,30,31]. Spectral imaging technology combines two-dimensional imaging and spectroscopy and can obtain the spatial and spectral information of objects at the same time. Spectral imaging techniques mainly include multispectral imaging (MSI), hyperspectral imaging (HSI), and terahertz spectral imaging (TSI). Sun et al. [32] utilized an electromagnetic vibration feeder in combination with terahertz time-domain spectroscopy (THz-TDS) to effectively detect tea stalks and insect foreign bodies in finished tea products; the overall accuracy of their KNN model was 95.6%. Shen et al. [33] proposed a method to rapidly and effectively detect impurities in wheat based on a combination of terahertz spectral imaging and a convolutional neural network, Wheat-V2. The results showed that the designed Wheat-V2 model could effectively recognize the impurities in wheat images. Sun et al. [34] utilized terahertz spectroscopy and imaging to detect tea stalks in finished tea products. The results showed that the AirPLS-KNN model with THz time-domain signals as the input vector presented the best performance, with the prediction accuracy and recall rate reaching 97.3% and 0.96, respectively. Yu et al. [35] used a visible and near-infrared camera to image green tea containing tea stalks and used a convolutional neural network (CNN) to identify tea stalks in the hyperspectral images. The experimental results showed that the recognition accuracy for tea stalks reached 98.53%. Tang et al. [36] proposed a novel deep learning-based tobacco impurity detection method using hyperspectral data. Experiments conducted on the collected dataset showed that the proposed segmentation model yielded superior performance compared to other detection methods.
Most studies detected only a few types of impurities and did not address the detection of small impurities. In spectral imaging technology, the resolution of spectral images is generally low, so small impurities cannot be clearly imaged. That is, the spectral data of small impurities cannot be extracted, which means that spectral imaging technology is generally unable to detect small impurities.
The information obtained by a single sensor is limited and often cannot meet the needs of classification. A machine vision system in which multiple types of sensors participate in information integration can fill this gap [37]. A multispectral camera can obtain the spectral information of tea, but its resolution is too low to meet the needs of detecting small impurities. A visible light camera makes up for this disadvantage and can contribute color features to improve classification accuracy. Therefore, it is necessary to use multiple sensors to obtain more information and achieve a more accurate identification of tea impurities. In this study, we developed a multimodal imaging-based impurity detection system with complementary capabilities for efficient detection, which included a multispectral camera for obtaining spectral features and a visible light camera for obtaining color features. It could obtain more information for a more accurate analysis of the sample, and the visible light camera effectively made up for the low resolution of the multispectral camera. The system was used to detect impurities in Pu’er tea to verify its effectiveness.
In this study, we built a multimodal detection system, covering the system design, image preprocessing, and image registration, and then explored its application to the detection of impurities in Pu’er tea. Through experiments, we verified that this multimodal detection system improves classification accuracy compared with a single multispectral camera and that it has the ability to detect small impurities. The proposed multimodal detection system for impurity detection provides a reference for solving increasingly complex detection tasks and is expected to promote the application of multimodal systems in industry. The aim of this study was to achieve rapid and accurate identification of impurities in Pu’er tea.

2. Design of the Multimodal Detection System

2.1. Multimodal Detection System

In this study, a compact and efficient spectral imaging module (GX-IRSV-1100, Guangzhou Guangxin Technology Co., Ltd., Guangzhou, China) was used to acquire near-infrared diffuse reflectance spectral images of the sample surface. The spectral imaging module could obtain 10 spectral images at red and near-infrared wavelengths, corresponding to 713, 736, 759, 782, 805, 828, 851, 874, 897, and 920 nm, with an image resolution of 1024 × 1280 pixels and a field of view of 40° × 31.5° × 25.5°. Compared with other multispectral cameras, this spectral imaging module is small, low-cost, and efficient, making it well suited for rapid detection: it adopts a Fabry-Perot (F-P) interferometer to select the wavelength of light passing through, and all spectral images are obtained by the same CMOS sensor. In addition, a visible light camera (MV-CS050-60GC, Hangzhou HIKROBOT Co., Ltd., Hangzhou, China) with a higher resolution (2448 × 2048 pixels) than the spectral camera was used to obtain color images. A lens (MVL-MF1224M-5MPE, Hangzhou HIKROBOT Co., Ltd., Hangzhou, China) suitable for the camera was selected, and its field of view was slightly larger than that of the spectral camera (49.6° × 39° × 33°). Halogen lamps (MR16, Royal Dutch Philips Electronics Ltd., Amsterdam, The Netherlands) with a power of 50 W were used as light sources, and a white PU conveyor belt was used as the background for the samples. Figure 1 shows the imaging system used in this experiment. The distance between the spectral imaging module and the visible light camera was about 10 cm, and the vertical distance between the cameras and the sample was about 78 cm. Four halogen lamps were placed at the four corners and installed on adjustable brackets so that the angle and height of the light could be adjusted. The vertical distance between the halogen lamps and the sample was about 63 cm.
In order to ensure the stability of the data acquisition and eliminate the impact of the sensor baseline signal and ambient light, a standard diffuse whiteboard with the same reflectivity in the working band was used to calibrate the original data. The equation is shown as follows:
I = (I₀ − B) / (W − B) × r
where I is the calibrated data, I₀ is the original data, W is the standard whiteboard data, B is the black calibration data captured with the camera sensor covered, and r is the whiteboard reflectance (50% here).
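As a minimal sketch of the calibration equation above (assuming NumPy arrays; the function name `calibrate` and the small epsilon guarding against division by zero are illustrative, not from the original):

```python
import numpy as np

def calibrate(raw, white, black, r=0.5):
    """Whiteboard calibration of a raw image.

    raw   -- original data I0 (H x W array)
    white -- standard whiteboard data W
    black -- black calibration data B (sensor covered)
    r     -- known whiteboard reflectance (50% in this study)
    """
    raw = raw.astype(np.float64)
    white = white.astype(np.float64)
    black = black.astype(np.float64)
    # Reflectance relative to the whiteboard, rescaled by the
    # whiteboard's absolute reflectance r.
    return (raw - black) / np.maximum(white - black, 1e-6) * r
```

Applied per band, this removes the sensor baseline (B) and the spatial nonuniformity of the illumination captured by the whiteboard image.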
After impurity data acquisition and calibration, threshold segmentation methods from the OpenCV library were used to extract the region of interest (ROI) of each impurity sample. For each impurity, the segmentation method with the better effect was adopted, either a fixed-threshold method or an automatic threshold method. Some impurity images with low contrast were segmented after image enhancement.

2.2. Image Correction and Registration

The images obtained by the visible light and multispectral cameras had a certain pixel offset and a slight spatial offset, so the images from the two cameras needed to be registered before pixel classification in order to construct the combined features. In the registration process, the positions of the two cameras were adjusted first, and then a self-made calibration plate was placed on the conveyor belt and imaged by the two cameras separately. Since the 828 nm spectral image had the best quality among all the multispectral images, it was used for image registration. The calibration whiteboard used in the experiment measured 20 cm × 20 cm. After whiteboard calibration of the 828 nm spectral image, the reflectance image obtained was only 512 × 512 pixels due to the size limitation of the whiteboard. There are three common approaches to image registration: the area-based pipeline, the feature-based pipeline, and the learning-based pipeline [38]. The feature-based pipeline can effectively resist geometric distortions (e.g., scale and rotation changes) and is much simpler than the learning-based pipeline, so it was used for image registration. The scale-invariant feature transform (SIFT) algorithm was used to obtain the feature points of the color and spectral images of the calibration plate, and a KNN feature point matching method was adopted to find the matching feature points of the two images. Using the coordinates of the matched feature points, the perspective transformation matrix between the two cameras was obtained. Finally, the color images were registered to the spectral images through the perspective transformation matrix. The specific registration process is shown in Figure 2. From the original color image, it can be observed that the color image was affected by uneven illumination, which affected the color features of the objects.
To address this, a whiteboard was also used to correct the color image and eliminate the influence of uneven illumination. Figure 2b shows the color image after registration and correction, and Figure 2c shows the 828 nm spectral image after correction; the clarity of the color image is much higher than that of the multispectral image.

3. Materials and Methods

3.1. Sample Preparation

All Pu’er tea and impurity samples used in this study were provided by a prominent Pu’er tea industry leader in Yunnan Province, China. All impurity samples were manually selected from the production line, then sorted and stored in specific containers. Impurity classes are extremely numerous, making it challenging to classify them exhaustively; the impurity categories considered here were tea stalk, bamboo, leaf, wood, tea fruit, stone, cotton, plastic, and hair. Each impurity was collected in sufficient quantity, sealed in polyethylene (PE) bags, and used to extract spectral and color features for establishing the classification model. In addition, a certain amount of tea was mixed with various impurities in two other bags, which were set aside as test samples for conducting experiments and verifying the feasibility of the impurity detection method in this study.

3.2. Feature Extraction

After the tea was dispersed by the vibration feeder, there was still overlap among tea leaves and between tea and impurities, which would cause misjudgments when identifying the category of each connected domain from its spectral, color, or geometric features. Pixels are the basic components of images, and each pixel in the spectral image and color image carries a spectral feature and a color feature (R, G, B), respectively. Since the surface color distribution of the tea and impurities is uniform, and the internal chemical composition of each kind of object is consistent, different pixels of the same object have similar color and spectral features. Therefore, pixel classification was adopted in this experiment. The experimental data comprised a total of 11 categories of pixels: 9 categories of impurities, the tea, and the white PU conveyor belt. Each sample was separately spread out on the white PU conveyor belt and imaged by the multimodal detection system, after which whiteboard correction and image registration were performed. For each sample, the color image had the best quality of the color image and the 10 spectral images; consequently, different threshold segmentation methods were applied to the color images to obtain the ROI of each sample image.
Each pixel in a spectral image corresponds to a spectral curve, and each pixel in a color image corresponds to three-channel (RGB) gray values. The reflectance at the 10 wavelengths was used as the spectral feature, and the RGB values were used as the color feature. Together, the reflectance at the 10 wavelengths and the three-channel gray values of the RGB image yielded 13 feature values, which were used as the combined features. Each pixel was classified based on the spectral features and the combined features, respectively, and the final classification results were compared to verify the effectiveness of the system.
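Assembling the 13-dimensional combined feature vector per pixel is a simple stacking operation; a sketch assuming a registered (H, W, 10) reflectance cube and an (H, W, 3) color image (the function name is illustrative):

```python
import numpy as np

def build_pixel_features(spectral_cube, color_img):
    """Stack 10-band reflectance and RGB values of every pixel into
    a 13-dimensional combined feature vector.

    spectral_cube -- (H, W, 10) calibrated reflectance
    color_img     -- (H, W, 3) registered RGB image
    """
    h, w, bands = spectral_cube.shape
    spectral = spectral_cube.reshape(-1, bands)          # (H*W, 10)
    color = color_img.reshape(-1, 3).astype(np.float64)  # (H*W, 3)
    return np.hstack([spectral, color])                  # (H*W, 13)
```

Dropping the last three columns recovers the spectral-only feature set used for comparison.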

3.3. Modeling and Evaluation

In this study, we only needed to identify impurities and not to classify each category of impurities. Therefore, the pixel samples were divided into three categories: white background, Pu’er tea, and impurity, in which the same number of pixels were extracted from the 9 categories of impurities to constitute the impurity pixel sample. For each type of pixel sample, the pixels were randomly divided into dataset A and dataset B in a ratio of 6:4. Dataset A was used to train the machine learning model, and dataset B was used to evaluate the trained model. In this study, the accuracy rate, that is, the proportion of correctly classified samples to the total number of samples, was used as the main index to evaluate the model. For the models with the highest accuracy, in order to evaluate their performance more comprehensively, precision, recall, and F1-score were used. The formulas for precision, recall, and F1-score are as follows:
precision = TP / (TP + FP)
recall = TP / (TP + FN)
F1 = 2 × precision × recall / (precision + recall)
where TP, TN, FP, and FN are the number of true positives, true negatives, false positives, and false negatives, respectively.
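Computed directly from the TP/FP/FN counts defined above, the three metrics for one class can be sketched as follows (the function name and the zero-division guards are illustrative):

```python
def classification_metrics(y_true, y_pred, positive):
    """Per-class precision, recall, and F1-score from TP/FP/FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```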
In this study, a machine learning model was established to analyze the processed data and accurately classify pixels. Commonly used machine learning algorithms include support vector machines (SVMs), K-nearest neighbors (KNN), random forests (RFs), decision trees (DTs), etc.
An SVM is a nonlinear classifier commonly used to solve classification problems. The kernel function solves the nonlinear problem of the classification hyperplane by mapping non-separable data into a higher-dimensional space, and the algorithm retains good classification performance when the number of training samples is limited. When classes are separated by nonlinear boundaries, the SVM uses kernel functions to achieve a linear separation of the classes; a radial basis function (RBF) kernel is mostly employed because of its simplicity and computational speed [28].
KNN is a commonly used data mining algorithm, which is an example of lazy learning. When using KNN for prediction, all the training data are involved in the calculation. After k-nearest neighbor points are found, the category of the points to be measured is determined by a voting method based on distance. The selection of k has a certain effect on the recognition performance during the establishment of the KNN model. The optimal k value can be obtained by a cross-validation method. The KNN model needs to calculate all the training data during classification, resulting in a large number of computations. It is more suitable for the classification of low-dimensional features [39].
An RF is an ensemble learning algorithm based on decision trees, which uses multiple decision trees to carry out parallel independent prediction classification and then obtains the classification result through voting statistics among multiple trees. In RF setting, to ensure enough difference in and quantity of data in each sub-dataset, a bootstrap method is used to randomly select datasets and features, and sub-datasets are constructed for the base decision tree training. Therefore, an RF does not easily overfit the data, has good anti-noise ability, and has a strong robustness [27].
The decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure [40].
All of these machine learning models have hyperparameters that must be set before training. Cross-validation and grid search were used to select the optimal hyperparameters. In the experiment, only the training set was used for cross-validation and the grid search, to avoid leaking information from the test set, which would result in an inaccurate model evaluation.
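A sketch of this training-set-only grid search with scikit-learn, using the RBF SVM as the example model. The synthetic 13-feature data, the 6:4 split, and the candidate parameter values are stand-ins for the real pixel samples and search ranges (which Table 2 reports):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data with 13 features (10 spectral + 3 color) and 3 classes
# (background, tea, impurity), mirroring the study's pixel samples.
X, y = make_classification(n_samples=600, n_features=13, n_informative=8,
                           n_classes=3, random_state=0)

# 6:4 train/test split as in the study; only the training set enters
# the grid search, so the test set never leaks into model selection.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6,
                                          random_state=0, stratify=y)

param_grid = {"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_tr, y_tr)      # cross-validation happens inside the training set

acc = search.score(X_te, y_te)  # final evaluation on the held-out test set
```

The same pattern applies unchanged to the KNN, RF, and DT models, each with its own parameter grid.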

4. Results

4.1. Feature Analysis

4.1.1. Spectral Analysis

A total of more than 400,000 pixels were extracted from the ROIs of the sample images; the specific numbers are shown in Table 1. From each pixel dataset, 2000 pixels were randomly selected for the experiment. Using the spectral curves of these pixels, the average spectral curve of each category was drawn to verify the differentiation of each type of spectrum. The spectral curves of 30 pixels were randomly selected from these 2000 pixels to draw raw spectral curves for the analysis of spectral features. Figure 3 shows that spectra belonging to the same category exhibited similar trends and comparable intensities, while spectra of different categories exhibited significant differences in intensity and trend. Similar trends indicated that the samples had similar chemical compositions, while differences in intensity were related to variations in the chemical content and surface morphology of the samples. However, some spectral curves belonging to the same category showed slight oscillations in intensity, causing the spectral curves of different categories to overlap at some wavelengths. This is because the reflection spectra of different parts of the same object also differed owing to the granularity and height of the sample surface. Nevertheless, the average spectral curves show obvious differences among the categories. Although the intensities of the spectral curves of different categories could not be distinguished at some wavelengths because of the physical properties of the samples and noise, the reflectance at the ten wavelengths could still be used to distinguish each category, which is why the reflectance values at the ten wavelengths were used as the spectral features.

4.1.2. Principal Component Analysis (PCA)

Principal component analysis (PCA) is the most widely used dimensionality reduction algorithm [41]. After dimensionality reduction, a scatter plot of the principal components (PCs) can be used to judge the degree of differentiation among different categories of data and the degree of aggregation within each category. The dimensionality of the spectral feature data and the combined feature data was reduced by PCA, retaining the first two principal components. The cumulative contribution rate of the two principal components exceeded 97%, which indicated that the PCA results could explain the characteristics of the original feature information. The two-dimensional scatter plot of the first two PCs of some training data is shown in Figure 4. It can be observed that, compared with the color features and spectral features alone, the combined features performed better for classification. In the scatter plot of the first two PCs of the combined feature vector, the features of each category were more aggregated, and the distinction among different categories was more obvious. From the scatter plots of the first two PCs of the combined features of each class of pixels, it can be seen that tea fruit, tea stalk, and leaves were easily misjudged as tea.
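The two-PC projection and its cumulative contribution rate can be computed with a plain SVD; a minimal NumPy sketch (the function name is illustrative, and the SVD route is equivalent to eigendecomposition of the covariance matrix):

```python
import numpy as np

def first_two_pcs(features):
    """Project feature vectors onto the first two principal components
    and report their cumulative explained-variance (contribution) rate."""
    X = features - features.mean(axis=0)   # center the data
    # SVD of the centered data gives the principal axes in Vt.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:2].T                  # (n_samples, 2) PC scores
    explained = (S ** 2) / (S ** 2).sum()  # variance ratio per component
    return scores, explained[:2].sum()
```

Plotting `scores` colored by class reproduces the kind of scatter plot shown in Figure 4, and the returned cumulative ratio corresponds to the >97% contribution rate reported above.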

4.2. Comparison of Different Models

Since impurity identification only needs to detect impurities without classifying them, the same number of pixels was randomly selected from each of the nine impurities to form an impurity sample containing 2000 impurity pixels. In addition to the impurity sample, there was also a Pu’er tea sample containing 2000 pixels of Pu’er tea and a background sample containing 2000 pixels of the white conveyor belt. These pixel samples were divided into a training set and a test set in a ratio of 6:4. In this study, SVM [42,43], KNN [44], RF [45], and DT [40] models were established, and the best parameters were set by a grid search algorithm. The optimal parameters of the different models obtained by the grid search are shown in Table 2. The pixel classification accuracy of all models based on the combined features was improved compared with that based on spectral features only, which indicated that adding color features could effectively improve the pixel classification accuracy. The results showed that the SVM model using the combined features had the highest classification accuracy, 93%. As shown in Table 3, compared with the SVM model using only spectral features, the overall accuracy increased by 7% after adding the color features.
A confusion matrix is a specific table layout used to visualize the performance of supervised learning algorithms, especially classification algorithms. In this matrix, each row represents the actual category, and each column represents the predicted category. Each cell of the matrix contains the number of samples in that actual and predicted category. With the confusion matrix, we can not only calculate evaluation metrics such as accuracy, precision, and recall but also obtain a more complete picture of the model’s performance across different categories. The confusion matrix obtained by using the SVM model to classify tea leaf pixels, impurity pixels, and background pixels is shown in Figure 5a. The confusion matrix only shows the prediction results in the test set. In order to know which impurities are easily misjudged as tea, a classification model of various impurities was established. The corresponding confusion matrix was obtained and is shown in Figure 5b. It can be observed that pixels of tea stalk, tea fruit, and leaves were easily misjudged as tea pixels.
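The row/column layout described above (rows = actual category, columns = predicted category) can be sketched in a few lines (the function name is illustrative; real projects would typically use an existing library routine):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual categories, columns are predicted categories;
    each cell counts the samples with that actual/predicted pair."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm
```

Off-diagonal cells in a row show where that actual class is misjudged, which is how Figure 5b reveals that tea stalk, tea fruit, and leaf pixels are confused with tea.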

4.3. Classification Results and Post-Processing

All types of impurities were included in the test images, which were used to verify the effectiveness of the detection method in this study. There were a total of eight test images, and Figure 6 shows the processing results for four of them. The best trained model was used to process the test images, and the resulting pixel-classified images are shown in the second column of Figure 6.
The classification images directly inferred by the model mainly had two types of misclassification: first, there were chaotic misjudged pixels (red dots) at the edge of the object in the pixel-classified images; second, there were misjudged pixels in the object region, especially in the cotton thread, tea fruit, and hair impurities.
For the first type of misjudgment, there were two main reasons. One was that a shadow was still generated around each object under the irradiation of the four-corner light source, changing the spectral and color features of the object's edge area. The other was that a certain pixel offset remained after the spectral image was registered with the color image, which depended on the matching degree of the feature points of the two images and on the accuracy of the image registration algorithm. This type of misjudgment could be handled algorithmically. Mode filtering, in which each pixel is set to the label value with the highest proportion within a window of a certain size, could be used to remove these chaotic misjudged pixels. However, to reduce processing time, a label matrix was extracted first and then processed by a median filtering algorithm. In the label matrix, the positions of pixels predicted as impurities were assigned a value of two, and the remaining positions, including those predicted as background and Pu’er tea, were assigned zero. Applying a median filter to this label matrix achieves the effect of a mode filter while greatly reducing the processing time. The results after processing are shown in the third column of Figure 6.
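The trick works because the label matrix holds only two values (0 and 2), so the median of any window is also its mode. A small sketch with a toy matrix (window size and values are illustrative, not the paper's settings):

```python
# On a two-valued label matrix, a median filter acts as a mode filter,
# removing isolated misjudged pixels while preserving solid regions.
import numpy as np
from scipy.ndimage import median_filter

labels = np.zeros((9, 9), dtype=np.uint8)
labels[3:7, 3:7] = 2        # a genuine impurity region
labels[0, 0] = 2            # an isolated misjudged pixel

cleaned = median_filter(labels, size=3)
# The isolated pixel is suppressed; the core of the solid region survives.
print(cleaned[0, 0], cleaned[4, 4])
```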
The second type of misjudgment was due to the low resolution of the spectral camera, the low feature differentiation, and the surface morphology of the object. For tea fruit impurities, it was mainly because the spherical surface causes the reflection spectra of different parts of the surface to differ. For cotton impurities, it was mainly due to low feature distinctiveness. However, the post-processing algorithm not only handled chaotic misjudged pixels but also ensured better connectivity of the impurity regions, so it could also effectively deal with the wrong pixels inside these categories of impurities. After post-processing, the impurity regions were extracted by a contour detection algorithm. As shown in the fourth column of Figure 6, although there were misjudgments within the regions of these two categories of impurities, they could still be accurately detected after post-processing. In addition, the impurity detection results of sample 1 show that the hair category was missed. The main reason is that the resolution of the spectral camera was too low to clearly image small impurities, so the spectral features of hair could not be obtained. Nevertheless, all impurities except hair were successfully detected after post-processing.

4.4. Small Impurity Detection and Final Detection Results

Multispectral cameras require a larger CMOS sensing area to receive sufficient light intensity, so their resolution is relatively low. Small impurities such as hair, which is only about 1 mm wide, cannot be clearly seen in spectral images, resulting in a serious loss of spectral features in the hair region. However, because the resolution of the visible light camera in the multimodal detection system was higher than that of the multispectral camera, it could clearly image small objects such as hair. The system could therefore still extract the color features of the hair region, so classification based on the combined features could detect hair impurities to a certain extent. In the second column of Figure 6, the classification image of sample 1 directly inferred by the model contains intermittent red dots in the hair region. However, in the process of removing the chaotic misjudged red dots, the pixels predicted as hair were also processed into background pixels. In the third column of Figure 6, the hair regions of sample 1 were all processed into background by the mode filtering, which is why hair was subsequently missed when the contour detection algorithm was applied to the impurity mask. The process of using this system to detect small impurities is shown in Figure 7. Taking advantage of the high resolution of the visible light camera, hair could be clearly imaged. In the color image, all regions except the background were segmented by a threshold segmentation method. In the processed pixel-classified image, the regions predicted as impurities and Pu’er tea were segmented. Comparing the two segmented images, the excess part of the segmented region of the color image was hair.
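The comparison step in Figure 7 amounts to a set difference between two binary masks. A minimal sketch with toy arrays (the masks are placeholders for the threshold-segmented color image and the processed pixel-classified image):

```python
# Recover small impurities by subtracting the classifier's foreground from the
# foreground segmented out of the high-resolution color image.
import numpy as np

# Foreground from threshold segmentation of the color image (1 = not background)
color_fg = np.zeros((8, 8), dtype=np.uint8)
color_fg[2:6, 2:6] = 1      # tea/impurity objects
color_fg[0, 0:5] = 1        # a thin hair, visible only to the color camera

# Foreground from the pixel-classified image (tea + impurity regions)
classified_fg = np.zeros((8, 8), dtype=np.uint8)
classified_fg[2:6, 2:6] = 1

# Pixels present in the color segmentation but absent from the classified
# foreground are candidate small impurities such as hair
hair = (color_fg == 1) & (classified_fg == 0)
print(np.argwhere(hair))
```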
Figure 8 shows the final detection results of the eight test images. To evaluate the accuracy of the prediction boxes, the intersection over union (IoU) between a prediction box B1 and its ground-truth box B2 was used in the experiment. Its calculation formula is as follows:
IoU = |B1 ∩ B2| / |B1 ∪ B2|
A higher IoU value indicates a higher degree of overlap between the predicted and ground-truth boxes. Figure 9 shows the IoU values of all impurities in the test images. In some cases, multiple IoU values correspond to a single impurity object, indicating that multiple prediction boxes fell within that impurity region. Most IoU values were greater than 0.5; only 8 of the 38 impurity objects had IoU values less than 0.3, and the corresponding impurity categories were cotton thread, bamboo, and plastic. The main reason for the small IoU values of these prediction boxes was that misjudged pixels inside the impurity regions disconnected them in the pixel-classified image, so that after post-processing, an impurity region was either divided into multiple regions or reduced in area. Nevertheless, taken together, the prediction boxes covered all the impurities. In practical applications, the impurities in prediction boxes with an IoU value greater than zero would be picked out by the automatic equipment; consequently, all 38 impurities in the test images could be successfully removed.
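For axis-aligned boxes stored as (x1, y1, x2, y2), the IoU formula reduces to a few lines; the boxes below are illustrative:

```python
# IoU of two axis-aligned boxes (x1, y1, x2, y2).
def iou(b1, b2):
    # intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (area1 + area2 - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))   # intersection 50, union 150
```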

4.5. Model Extrapolation Capability

In real-world scenarios, the types of impurities are highly diverse, and their categories have no fixed boundaries. In the tea production process, it is inevitable that different types of impurities will be accidentally mixed into the tea, and it is challenging to obtain datasets containing all types of impurities for model training. However, models built on specific impurity categories may have a certain degree of extrapolation ability. To verify this capability, several types of impurities that were not present in the training data were selected in this study, including cigarette butts, feathers, insects, and grains. As before, the spectral and color features of the samples were obtained by the multimodal detection system, and the images were classified at the pixel level by the SVM model based on the combined features. The pixel-classified results are shown in Figure 10. Although the combined features of these impurities were not captured in the training set, the impurity objects could still be accurately detected, which indicated that the established model had good extrapolation ability. The model included three categories (background, tea, and impurity), with all types of impurities grouped into one category. In addition, a rich variety of impurity types and a large number of pixel samples were used to build the model. All of this gave the model good generalization ability, which means it is highly likely to detect some unfamiliar types of impurities well.

5. Discussion and Prospect

Through the experimental results, it was verified that the multimodal detection system could effectively detect impurities in tea. The detection time for each (512 × 512)-pixel image was about 2.66 s, which is ideal for rapid non-destructive detection. The factors that affect the detection efficiency are not only the detection algorithm time but also the imaging time and the imaging field of view. In the future, we will improve the efficiency of the impurity detection method and the accuracy of the algorithm, and more comprehensively analyze the performance of the system as follows:
  • Increase the detection field of the multimodal detection system: The multispectral camera used in this study had a fast imaging speed, and the image registration algorithm was simple and fast; the total imaging and image registration time was less than 3 s. Due to the size limitation of the correction whiteboard in the experiment, the actual detection field corresponding to the registered image was about 20 cm × 20 cm. In practical applications, a larger correction whiteboard can be used, which improves the detection field to a certain extent. The visible light camera used in the experiment had a slightly larger field of view than the multispectral camera, and a visible light camera with a higher resolution and larger field of view can be selected for practical applications. Since the detection field captured by a multispectral camera is much smaller than that of a visible light camera, multiple multispectral cameras can be used to increase the detection field: a visible light camera with a large field of view and high resolution can be positioned at the center, while multiple multispectral cameras are installed in different directions at the same distance from it. With this installation, each multispectral camera can be paired with the same visible light camera, effectively increasing the detection field.
  • Improve the detection accuracy: In terms of algorithm accuracy, the detection results show problems such as incomplete detection of the impurity object region and deviation of the prediction box. This was mainly because the accuracy of pixel classification was not high enough, and some scattered impurity pixels were converted into other types of pixels during post-processing, which caused the object region to become incomplete. Fortunately, despite the incomplete detection of impurity object regions, all impurity regions could still be detected, so this incompleteness did not affect the final detection accuracy. To improve the accuracy of pixel classification, a deep learning-based instance segmentation method will be considered next, with all 13 channels of multispectral and visible light data input into the network for training; multiple image sources provide more information, which can effectively improve the accuracy of the trained network. In addition, several impurity objects were sometimes detected as a single object due to overlap or close proximity, which would cause some impurities to be missed during subsequent automatic removal. This can be solved by adding another round of impurity detection after the vibrator disperses the tea again, which addresses both the missed detections caused by overlapping impurity objects and those caused by impurities covered by tea. However, this will increase the detection time, so accuracy and efficiency need to be balanced in practical applications.
  • Comprehensively analyze the performance of the system: The multimodal detection system designed in this experiment can obtain richer information from multiple cameras and, more importantly, can compensate for the low resolution of multispectral cameras. The whole system has high imaging stability and fast imaging speed, which makes it very suitable for rapid detection. This study confirmed the advantages of the system in detecting tea impurities: it can improve both the accuracy of pixel classification and the ability to detect small objects. The system can also be used for impurity detection in other samples, such as soybeans, rice, and grain, as well as for crop growth detection and classification problems. In the future, we will consider using this system to detect impurities in rice, cocoa beans, tobacco, and wheat to verify its scalability. The system is applicable whenever the sample and all types of impurities can be distinguished by color or spectrum. The spectral bands of the multispectral camera used in the experiment can be selected within the range from 713 nm to 920 nm; an optical fiber spectrometer can be used to measure the spectral characteristics of the samples and impurities so that appropriate bands can be chosen. If the detection requirements cannot be met within this range, other types of multispectral cameras can be chosen, the system can be built by referring to the method in this paper, and the impurity detection algorithm adopted here can then be applied. The system is particularly suitable for projects that currently use only multispectral cameras, as the additional information and higher resolution will enhance their results to varying degrees.

6. Conclusions

In this paper, a multimodal image-based detection system comprising a fast-imaging multispectral camera, a visible light camera, and image correction and registration algorithms was proposed and applied to the rapid non-destructive detection of impurities in Pu’er tea to verify its feasibility and efficient detection ability. The spectral and color features of the pixels in the images were obtained by the system, and classification models were established using the combined features and the spectral features, respectively. The results showed that adding color features effectively improved the pixel classification accuracy: the SVM model using combined features had the highest accuracy of 93%, which was 7% higher than the SVM model using only spectral features. In addition, taking advantage of the high resolution of the visible light camera, small impurities such as hair were successfully detected. The system successfully detected nine kinds of impurities in Pu’er tea, such as leaf, tea stalk, and hair, which effectively verified its high efficiency. The main contribution of this study was to establish a multimodal detection system and successfully realize rapid impurity detection in Pu’er tea. This study provides a reference for the design of multimodal detection systems for industry and a method of impurity detection using such a system, which is expected to promote the application of multimodal systems in industry and contribute to the development of future intelligent production systems in Industry 4.0.

Author Contributions

Conceptualization, X.Y. and Z.K.; methodology, Z.K. and Y.G.; software, Z.K., Y.G. and W.H.; validation, Z.K. and Y.G.; formal analysis, X.Y. and Z.K.; investigation, Z.K. and Y.C.; resources, X.Y.; data curation, Z.K.; writing—original draft preparation, Z.K.; writing—review and editing, X.Y.; visualization, W.H. and Y.C.; supervision, X.Y.; project administration, X.Y.; funding acquisition, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the research project titled “Study on optical properties of spectral vision systems” and funded by Sun Yat-Sen University under grant 74130-71010023.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Authors Yefan Cai and Weibin Hong were employed by the Guangzhou Guangxin Technology Co. Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Mukhi, S.E.; Varshini, R.T.; Sherley, S.E.F. Diagnosis of COVID-19 from Multimodal Imaging Data Using Optimized Deep Learning Techniques. SN Comput. Sci. 2023, 4, 212.
  2. Nayak, M.; Tiyadi, J. Predicting the Onset of Diabetes Using Multimodal Data and a Novel Machine Learning Method; Technical Report; EasyChair, 2023. Available online: https://www.researchgate.net/profile/Jagannath-Tiyadi/publication/376595859_EasyChair_Preprint_Predicting_the_Onset_of_Diabetes_Using_Multimodal_Data_and_a_Novel_Machine_Learning_Method/links/657f14b78e2401526ddf2708/EasyChair-Preprint-Predicting-the-Onset-of-Diabetes-Using-Multimodal-Data-and-a-Novel-Machine-Learning-Method.pdf (accessed on 5 March 2024).
  3. Houria, L.; Belkhamsa, N.; Cherfa, A.; Cherfa, Y. Multimodal magnetic resonance imaging for Alzheimer’s disease diagnosis using hybrid features extraction and ensemble support vector machines. Int. J. Imaging Syst. Technol. 2023, 33, 610–621.
  4. Spaide, R.F.; Curcio, C.A. Drusen characterization with multimodal imaging. Retina 2010, 30, 1441–1454.
  5. Heintz, A.; Sold, S.; Wühler, F.; Dyckow, J.; Schirmer, L.; Beuermann, T.; Rädle, M. Design of a Multimodal Imaging System and Its First Application to Distinguish Grey and White Matter of Brain Tissue. A Proof-of-Concept-Study. Appl. Sci. 2021, 11, 4777.
  6. Li, X.; Zhang, G.; Cui, H.; Hou, S.; Chen, Y.; Li, Z.; Li, H.; Wang, H. Progressive fusion learning: A multimodal joint segmentation framework for building extraction from optical and SAR images. ISPRS J. Photogramm. Remote Sens. 2023, 195, 178–191.
  7. Quan, L.; Lou, Z.; Lv, X.; Sun, D.; Xia, F.; Li, H.; Sun, W. Multimodal remote sensing application for weed competition time series analysis in maize farmland ecosystems. J. Environ. Manag. 2023, 344, 118376.
  8. Zhao, L.; Han, L.; Zhang, H.; Liu, Z.; Gao, F.; Yang, S.; Wang, Y. Study on recognition of coal and gangue based on multimode feature and image fusion. PLoS ONE 2023, 18, e0281397.
  9. Chu, X.; Tang, L.; Sun, F.; Chen, X.; Niu, L.; Ren, C.; Li, Q. Defect detection for a vertical shaft surface based on multimodal sensors. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8109–8117.
  10. Saran, G.; Ganguly, A.; Tripathi, V.; Kumar, A.A.; Gigie, A.; Bhaumik, C.; Chakravarty, T. Multi-modal imaging-based foreign particle detection system on coal conveyor belt. Trans. Indian Inst. Met. 2022, 75, 2231–2240.
  11. Jiang, L.; Xue, R.; Liu, D. Node-Loss Detection Methods for CZ Silicon Single Crystal Based on Multimodal Data Fusion. Sensors 2023, 23, 5855.
  12. Maheshkar, V. Improved Detection of Recyclable Plastics Using Multi Modal Sensing and Machine Learning. Ph.D. Thesis, State University of New York at Buffalo, Buffalo, NY, USA, 2023.
  13. Villafana, T.; Edwards, G. Creation and reference characterization of Edo period Japanese woodblock printing ink colorant samples using multimodal imaging and reflectance spectroscopy. Herit. Sci. 2019, 7, 1–14.
  14. Lee, J.H.; Kim, B.H.; Kim, M.Y. Machine learning-based automatic optical inspection system with multimodal optical image fusion network. Int. J. Control. Autom. Syst. 2021, 19, 3503–3510.
  15. Tian, J.; Zhu, Z.; Wu, B.; Wang, L.; Liu, X. Bacterial and fungal communities in Pu’er tea samples of different ages. J. Food Sci. 2013, 78, M1249–M1256.
  16. Lv, H.P.; Zhang, Y.J.; Lin, Z.; Liang, Y.R. Processing and chemical constituents of Pu-erh tea: A review. Food Res. Int. 2013, 53, 608–618.
  17. Thike, A.; San, Z.M.; Oo, Z.M. Design and development of an automatic color sorting machine on belt conveyor. Int. J. Sci. Eng. Appl. 2019, 8, 176–179.
  18. Momin, M.A.; Yamamoto, K.; Miyamoto, M.; Kondo, N.; Grift, T. Machine vision based soybean quality evaluation. Comput. Electron. Agric. 2017, 140, 452–460.
  19. Mahirah, J.; Yamamoto, K.; Miyamoto, M.; Kondo, N.; Ogawa, Y.; Suzuki, T.; Habaragamuwa, H.; Ahmad, U. Monitoring harvested paddy during combine harvesting using a machine vision-Double lighting system. Eng. Agric. Environ. Food 2017, 10, 140–149.
  20. Vithu, P.; Anitha, J.; Raimond, K.; Moses, J. Identification of dockage in paddy using multiclass SVM. In Proceedings of the 2017 International Conference on Signal Processing and Communication (ICSPC), Coimbatore, India, 28–29 July 2017; pp. 389–393.
  21. Mittal, S.; Dutta, M.K.; Issac, A. Non-destructive image processing based system for assessment of rice quality and defects for classification according to inferred commercial value. Measurement 2019, 148, 106969.
  22. Senni, L.; Ricci, M.; Palazzi, A.; Burrascano, P.; Pennisi, P.; Ghirelli, F. On-line automatic detection of foreign bodies in biscuits by infrared thermography and image processing. J. Food Eng. 2014, 128, 146–156.
  23. Zhang, H.; Li, D. Applications of computer vision techniques to cotton foreign matter inspection: A review. Comput. Electron. Agric. 2014, 109, 59–70.
  24. Zhang, R.; Li, C.; Zhang, M.; Rodgers, J. Shortwave infrared hyperspectral reflectance imaging for cotton foreign matter classification. Comput. Electron. Agric. 2016, 127, 260–270.
  25. Zhang, M.; Li, C.; Yang, F. Classification of foreign matter embedded inside cotton lint using short wave infrared (SWIR) hyperspectral transmittance imaging. Comput. Electron. Agric. 2017, 139, 75–90.
  26. Shen, Y.; Yin, Y.; Zhao, C.; Li, B.; Wang, J.; Li, G.; Zhang, Z. Image recognition method based on an improved convolutional neural network to detect impurities in wheat. IEEE Access 2019, 7, 162206–162218.
  27. Pan, S.; Zhang, X.; Xu, W.; Yin, J.; Gu, H.; Yu, X. Rapid on-site identification of geographical origin and storage age of tangerine peel by near-infrared spectroscopy. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 271, 120936.
  28. Zhang, X.; Gao, Z.; Yang, Y.; Pan, S.; Yin, J.; Yu, X. Rapid identification of the storage age of dried tangerine peel using a hand-held near infrared spectrometer and machine learning. J. Near Infrared Spectrosc. 2022, 30, 31–39.
  29. Yu, Z.; Cui, W. LSCA-net: A lightweight spectral convolution attention network for hyperspectral image processing. Comput. Electron. Agric. 2023, 215, 108382.
  30. Yin, J.; Yang, Y.; Hong, W.; Cai, Y.; Yu, X. Portable smart spectrometer integrated with blockchain and big data technology. Appl. Sci. 2019, 9, 3279.
  31. Liang, D.; Zhou, Q.; Ling, C.; Gao, L.; Mu, X.; Liao, Z. Research progress on the application of hyperspectral imaging techniques in tea science. J. Chemom. 2023, 37, e3481.
  32. Sun, X.; Xu, C.; Luo, C.; Xie, D.; Fu, W.; Gong, Z.; Wang, X. Non-destructive detection of tea stalk and insect foreign bodies based on THz-TDS combination of electromagnetic vibration feeder. Food Qual. Saf. 2023, 7, fyad004.
  33. Shen, Y.; Yin, Y.; Li, B.; Zhao, C.; Li, G. Detection of impurities in wheat using terahertz spectral imaging and convolutional neural networks. Comput. Electron. Agric. 2021, 181, 105931.
  34. Sun, X.; Cui, D.; Shen, Y.; Li, W.; Wang, J. Non-destructive detection for foreign bodies of tea stalks in finished tea products using terahertz spectroscopy and imaging. Infrared Phys. Technol. 2022, 121, 104018.
  35. Yu, X.; Zhao, L.; Liu, Z.; Zhang, Y. Distinguishing tea stalks of Wuyuan green tea using hyperspectral imaging analysis and Convolutional Neural Network. J. Agric. Eng. 2024.
  36. Tang, L.; Zhao, M.; Shi, S.; Chen, J.; Li, J.; Li, Q.; Li, R. Tobacco Impurities Detection with Deep Image Segmentation Method on Hyperspectral Imaging. In Proceedings of the 2023 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Zhengzhou, China, 14–17 November 2023; pp. 1–5.
  37. Yang, Z.; Ma, W.; Lu, J.; Tian, Z.; Peng, K. The Application Status and Trends of Machine Vision in Tea Production. Appl. Sci. 2023, 13, 10744.
  38. Zhu, B.; Zhou, L.; Pu, S.; Fan, J.; Ye, Y. Advances and challenges in multimodal remote sensing image registration. IEEE J. Miniaturization Air Space Syst. 2023, 4, 165–174.
  39. Peterson, L.E. K-nearest neighbor. Scholarpedia 2009, 4, 1883.
  40. Song, Y.Y.; Ying, L. Decision tree methods: Applications for classification and prediction. Shanghai Arch. Psychiatry 2015, 27, 130.
  41. Rao, C.R. The use and interpretation of principal component analysis in applied research. Sankhyā Indian J. Stat. Ser. A 1964, 26, 329–358.
  42. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  43. Yang, Y.; Zhang, X.; Yin, J.; Yu, X. Rapid and nondestructive on-site classification method for consumer-grade plastics based on portable NIR spectrometer and machine learning. J. Spectrosc. 2020, 2020, 1–8.
  44. Jiang, L.; Cai, Z.; Wang, D.; Jiang, S. Survey of improving k-nearest-neighbor for classification. In Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), Haikou, China, 24–27 August 2007; Volume 1, pp. 679–683.
  45. Biau, G.; Scornet, E. A random forest guided tour. Test 2016, 25, 197–227.
Figure 1. Multimodal detection system.
Figure 2. The process of image registration: (a) original color image; (b) corrected color image; (c) 828 nm spectral image.
Figure 3. (a) Three types of raw spectral curves. (b) Three types of average spectral curves. (c) All types of raw spectral curves. (d) All types of average spectral curves.
Figure 4. Scatter plot of the first two principal components: (a) combined feature vector of 3 kinds of pixels; (b) spectral feature vector of 3 kinds of pixels; (c) color feature vector of 3 kinds of pixels; (d) combined feature vector of 11 kinds of pixels.
Figure 5. The results of the confusion matrix: (a) classification of tea pixels, impurity pixels, and background pixels, (b) classification of all types of pixels.
Figure 6. The results of 4 test images. (a) Corrected color image. (b) Pixel-classified images. (c) Pixel-classified images after processing. (d) Prediction box. (e) Ground-truth box.
Figure 7. The process of detecting small impurities.
Figure 8. The final detection result of the test images.
Figure 9. The IoU value between the predicted and ground-truth boxes of the impurities in the test images.
Figure 10. The inference results of other impurity samples.
Table 1. Specific number of pixel samples. (The original table also shows the color image and segmented image thumbnail for each category.)

Category     Number of pixels in ROI
Tea          92,070
Tea stalk    53,324
Bamboo       36,186
Leaf         67,934
Wood         29,387
Tea fruit    36,431
Stone        13,986
Hair         24,069
Plastic      26,122
Cotton       20,720
Table 2. The optimal parameters of different models obtained by the grid search algorithm.

Feature set         Model   Optimal parameters
Spectral features   SVM     C: 100, kernel: linear
                    RF      max_features: sqrt, n_estimators: 100
                    KNN     n_neighbors: 10, p: 4, weights: distance
                    DT      criterion: entropy, max_depth: 7, min_samples_leaf: 11
Combined features   SVM     C: 100, kernel: linear
                    RF      max_features: 0.8, n_estimators: 50
                    KNN     n_neighbors: 5, p: 3, weights: distance
                    DT      criterion: entropy, max_depth: 10, min_samples_leaf: 41
Table 3. Accuracy using different features and different models.

Features         SVM    RF     KNN    DT
Spectrum         0.86   0.86   0.86   0.84
Spectrum + RGB   0.93   0.91   0.91   0.88
