A Fast Online Classification Method of Solid Wood Floors Based on Stochastic Sampling and Machine Learning

Abstract: Solid wood floors are widely used as an interior decoration material, and the color of solid wood surfaces plays a decisive role in the final decoration effect. Therefore, the color classification of solid wood floors is the final and most important step before laying. However, research on floor classification usually focuses on recognizing complex and diverse features but ignores execution speed, so common methods do not meet the requirements of online classification in practical production. In this paper, a new online classification method for solid wood floors is proposed by combining probability theory and machine learning. Firstly, a probability-based feature extraction method (stochastic sampling feature extractor) was developed to rapidly obtain key features regardless of the disturbance of wood grain. The stochastic features were determined by a genetic algorithm. Then, an extreme learning machine, a fast classification neural network, was selected and trained with the selected stochastic features to classify solid wood floors. Several experiments were carried out to evaluate the performance of the proposed method, and the results showed that it achieved a classification accuracy of 97.78% with a classification time of less than 1 ms for each solid wood floor. The proposed method has the advantages of high execution speed, high accuracy, and flexible adaptability. Overall, it is suitable for online industrial production.


Introduction
Solid wood is an important natural resource and is widely used in various furniture manufacturing processes due to its unique color and natural wood grain [1]. The final step before distribution for sale is the color classification by experienced workers, since the color consistency of a batch of boards affects its commercial value [2]. However, wood color grading by trained workers has low efficiency and strong subjectivity in the wood processing industry [3].
Several algorithms have been studied for color classification, including the support vector machine (SVM) [4], K-nearest neighbors (K-NN) [5], decision trees [6], fuzzy rules [7], and neural networks [8][9][10][11]. The SVM is sensitive to missing data when constructing support vectors from training samples, and it is difficult to achieve high accuracy because of wood grain interference [12][13][14]. K-NN uses the entire dataset as the feature space and classifies an unknown sample by calculating the distance between it and each labeled sample. However, the distance calculation is time consuming and its cost grows with the amount of data, which makes it unacceptable for online wood floor classification in industrial production [15]. The decision tree method uses a flowchart-like structure based on attribute tests. However, because it relies on conditional control statements, it is difficult to design a robust structure for various kinds of wood floors with countless grain disturbances [6]. Fuzzy rules introduce human fuzzy characteristics into the classification rules and are more robust than pure decision trees. However, fuzzy logic depends on expert experience, which makes the rules difficult to build in the field [16]. A neural network is able to learn weights from the training dataset and, with an appropriate structure, generalizes to the testing dataset [17]. However, traditional neural networks with simple structures are sensitive to the input data and are not suitable for handling high-dimensional inputs such as images, which incur high computational costs in a fully connected input layer [18].
With the development of hardware, especially high-performance graphics processing units (GPUs), deep learning has become more widely used due to its parallelizable computing structure. The introduction of the convolutional layer greatly improved classification accuracy, and convolutional networks have recently been widely used in the classification field [19]. The convolutional feature extractor can extract features from images automatically [20], but it requires adequate training samples, a long training time, and high-performance execution equipment, which increases industrial production costs and limits its industrial application.
Additionally, wood grain has a great influence on the color classification of solid wood boards. Although increasing the number of convolutional layers to extract high-level abstract features could reduce, to a certain extent, the influence of wood grain on classification accuracy, the execution time of deep convolutional networks is too long for online production.
Therefore, a fast and automatic floor classification method that meets the speed and color consistency requirements of online wood production is urgently needed by the wood floor manufacturing industry. Moreover, in practical production, the solid wood floor industry requires that the classification method can be rapidly trained in the field to handle switching between different floors whenever necessary and that the algorithm is robust to various wood grain distributions. In addition, the processing time should be as short as possible and must keep pace with the continuous stream of incoming wood floors.
The aim of this research is to develop an online method to identify the color grade of solid wood floors for industrial production. Although wood grain varies, the color of the solid wood floor is uniform. Therefore, considering the color probability distribution of the solid wood floor image pixels [21], a stochastic sampling feature extractor was proposed to extract features based on probability theory and rapidly obtain color features regardless of the wood grain disturbance. The high-dimensional image data of solid wood were thereby transformed into low-dimensional color features in order to reduce the complexity of the classifier model and improve execution efficiency. Then, a fast classification neural network (extreme learning machine, ELM) [22] was constructed to classify solid wood floors based on color features that were optimized by the genetic algorithm.
The specific contributions of this work are as follows:
• A stochastic sampling feature extractor (SSFE) was developed and proved based on probability theory for quickly extracting statistical features from the wood floor images with flexible equipment adaptability.
• The genetic algorithm was used to optimize those statistical features considering the complexity and accuracy of the classification neural network.
• A flexible workflow was presented for classifying solid wood floors online in the industry.
The remainder of this paper is organized as follows: Section 2 describes the image acquisition system and explains the proposed online classification method in detail. Section 3 presents and discusses the experimental results, and Section 4 concludes this work.

Materials and Data Collection
The image acquisition system for solid wood floors was built in our laboratory to obtain experimental data, as shown in Figure 1. The acquisition system included a line scan camera, a conveyor belt, a light source, and photoelectric sensors. The camera was mounted above the gap in the belt, the photoelectric sensors were located on both sides of the conveyor belt, and the light source was a white strip light. The camera was a Linea LA-GC-02K05B color line scan camera produced by Teledyne DALSA with a resolution of 2048×2 and a maximum line frequency of 26 kHz. The photoelectric sensor was the ES12-D15NK (LanHon, Shanghai, China), a direct current (DC), normally open type. The trigger module generated an effective pulse that triggered the camera to scan continuously until the solid wood floor had passed the photoelectric sensor. By adjusting the parameters of the camera and the brightness of the light source, solid wood floor images were collected. A scanned image of a solid wood floor, including the background, had the dimensions 12,450 × 2048 × 3 (width × height × channels), as shown in Figure 2. Finally, 432 solid wood floor images were collected to construct an experimental dataset. These images were labeled with three color grades (light, medium, and dark) by experienced workers from the Dehua TuBao New Decoration Material Company. The detailed data split is listed in Table 1.

Online Classification Method of Solid Wood Floors
The proposed classification algorithm included three main parts: (1) statistical features extracted by a stochastic sampling feature extractor, (2) key features selected by the genetic algorithm, and (3) classification by the extreme learning machine. The whole workflow of the proposed online classification method is shown in Figure 3. Firstly, the purity configuration was set according to the performance of the classification system hardware: low-performance hardware was given a low-purity configuration, whereas high-performance hardware was given a higher-purity configuration. Secondly, the input wood floor image was randomly sampled with the optimized sampling configuration, and the sampled points were then converted in color space [23] and filtered by rule to obtain statistical features, including color and texture features. Finally, the classification network was trained with features selected by a genetic algorithm (GA) under different requirements. According to the device performance and the classification speed requirement, the appropriate purity configuration coupled with the trained neural network weights could be used to obtain the color grade in the field.

Stochastic Sampling Feature Extractor
In practical solid wood floor production, the wood grain has a significant influence on the accuracy of color classification. Conventional methods rely on the robustness of deep convolutional layers against the disturbance of various wood grains. However, a deep convolutional extractor lacks interpretability, is slow, and often requires a long training process and costly training devices, which makes it impractical for classifying various solid wood floor styles in the field. Therefore, a relatively simple stochastic sampling feature extractor (SSFE) was proposed to extract a reliable wood floor color based on probability theory with minimal computational cost. Figure 4 shows the difference between the traditional convolutional feature extractor and the proposed SSFE.

In the SSFE, the first step is random sampling in order to obtain a reliable wood color. The colors of wood floor image pixels were divided into two categories: (1) the original wood floor color based on human color perception and (2) the disturbing color, including wood grain and background. The original wood floor color is denoted as $C_o$, and the disturbing color is denoted as $C_d$. A subset $C_s$ of the original image color set $C$ is obtained when sampling from the wood floor image:

$$C_s \subseteq C = C_o \cup C_d \quad (1)$$

An extracted pixel can be denoted as a discrete random variable, $X_p$. According to the occurrence probability of the two colors $C_o$ and $C_d$, the probability of $X_p = C_o$ is denoted as $p_o$, and the probability of $X_p = C_d$ is denoted as $p_d$. The two probabilities satisfy:

$$p_o + p_d = 1 \quad (2)$$

Assume that $n$ samples are drawn from the input wood floor image, so that a color subset $C_{s,n}$ is obtained, which is a mixture of $C_o$ and $C_d$. The sampling process can be viewed as a binomial experiment, so the numbers of occurrences of the random events $X_p = C_o$ and $X_p = C_d$ are recorded as the random variables $k_o$ and $k_d$. The probability distributions of these two variables are:

$$P(k_o = k) = \binom{n}{k} p_o^{k} p_d^{n-k}, \qquad P(k_d = k) = \binom{n}{k} p_d^{k} p_o^{n-k}, \qquad k = 0, 1, \ldots, n \quad (3)$$

The second step of the SSFE is rule filtering to extract the color $C_o$. According to wood industry practice, the brightness of $C_o$ is higher than that of $C_d$. From this perspective, the filtering rule was designed as a sorting process according to the brightness in the extracted subset $C_s$, and only the $i$ samples with the highest brightness are passed on to the subsequent classification.

Then, the color purity of the filtered $i$ samples is denoted as pur and is defined in Equation (4). The pur is a hyperparameter and is set based on the device performance. The operation parameter $i$ is then optimized according to Equation (5). By solving Equation (5) to obtain an optimized $i$, the SSFE is able to extract the wood color under the purity requirement, pur. Meanwhile, the sampling parameter controls the robustness against the interference.
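The relationship between n, $p_d$, pur, and i can be explored numerically. The following is a minimal sketch, not the authors' formula: it assumes that the purity of the top i samples is the binomial probability that at least i of the n sampled pixels carry the original color $C_o$, and that the optimized i is the largest value that still meets the purity requirement. The function names `binom_pmf` and `max_filtered_samples` are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(n, k, p):
    """P(K = k) for K ~ Binomial(n, p), evaluated in log space so large n is safe."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def max_filtered_samples(n, p_d, pur):
    """Largest i such that P(k_o >= i) >= pur (assumed reading of the purity rule).

    Returns 0 when no positive i satisfies the requirement.
    """
    p_o = 1.0 - p_d
    tail = 0.0  # running value of P(k_o >= i), accumulated from i = n downwards
    for i in range(n, 0, -1):
        tail += binom_pmf(n, i, p_o)
        if tail >= pur:
            return i
    return 0


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(n, max_filtered_samples(n, p_d=0.3, pur=0.95))
```

Under this assumption, i grows with n when $p_d$ and pur are fixed and shrinks when either $p_d$ or pur is raised, which matches the qualitative behavior described for Equation (5).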
The third step of the SSFE is feature extraction. After n pixels are sampled from a floor image, a set of features for the subsequent neural network classification should be constructed by comparing the differences between solid wood floors of different color grades. Considering that the human eye perceives color based not only on differences in hue but also on brightness and vividness, the color information used to grade solid wood floors was mainly represented by the color moments in the HSV and Lab color spaces. Finally, 12 statistical features were designed: $l_{mean}$, $a_{mean}$, $b_{mean}$, $h_{mean}$, $s_{mean}$, $v_{mean}$, $l_{var}$, $a_{var}$, $b_{var}$, $h_{var}$, $s_{var}$, and $v_{var}$.
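The three steps (random sampling, brightness-based rule filtering, and color-moment extraction) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: pixels are assumed to be sRGB triples scaled to [0, 1], brightness is assumed to be the HSV value channel, the Lab conversion assumes a D65 white point, and the names `srgb_to_lab` and `ssfe_features` are hypothetical.

```python
import colorsys
import random
from statistics import fmean, pvariance


def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in [0, 1]) to CIE Lab, D65 white point."""
    def linear(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0

    rl, gl, bl = linear(r), linear(g), linear(b)
    x = (0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl) / 0.95047
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = (0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl) / 1.08883
    fx, fy, fz = f(x), f(y), f(z)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def ssfe_features(pixels, n, i, rng=random):
    """Sample n pixels, keep the i brightest, and return 12 color-moment features."""
    # Step 1: stochastic sampling (with replacement) from the image pixels.
    sampled = [pixels[rng.randrange(len(pixels))] for _ in range(n)]
    # Step 2: rule filtering. Brightness is taken as max(r, g, b), i.e. HSV value.
    sampled.sort(key=max, reverse=True)
    kept = sampled[:i]
    # Step 3: first-order (mean) and second-order (variance) moments per channel.
    lab = [srgb_to_lab(*p) for p in kept]
    hsv = [colorsys.rgb_to_hsv(*p) for p in kept]
    features = {}
    for names, rows in (("lab", lab), ("hsv", hsv)):
        for idx, name in enumerate(names):
            channel = [row[idx] for row in rows]
            features[f"{name}_mean"] = fmean(channel)
            features[f"{name}_var"] = pvariance(channel)
    return features
```

Because only n pixels are touched, the cost of this step depends on n and i rather than on the full image size.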

Feature Selection Based on Genetic Algorithm
Feature selection plays an important role in classification: it aims to maximize the classification accuracy while minimizing the number of features, since additional features slow down network inference. A genetic algorithm (GA) [24] was used to search for the optimal feature combination. The GA is a random search method that is capable of efficiently exploring large search spaces and is commonly used in feature selection. Furthermore, unlike other search algorithms, the GA conducts a global search rather than a local or greedy search. The basic concept is to evolve a population of individuals, each of which is a possible solution describing which features are selected [25,26]. The GA is made up of three main operators: reproduction, crossover, and mutation. The GA begins by randomly seeding a population of potential solutions. At the end of each generation, the population is evaluated and tested for algorithm termination. If the termination condition is not met, a new population is produced by applying the three GA operators and is evaluated again. This process is repeated until the stopping criterion is met [27], as shown in Figure 5.
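The loop described above can be sketched as a search over binary feature masks. The sketch below is a generic GA and makes several assumptions that the text does not specify: reproduction is implemented as binary tournament selection, crossover is single-point, mutation flips individual bits, the best individual is carried forward (elitism), and termination is a fixed number of generations. The function name and all parameter values are hypothetical, and `fitness` stands in for whatever score the classifier assigns to a feature subset.

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=20, generations=30,
                           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Return the best binary mask found (1 keeps a feature, 0 drops it)."""
    rng = random.Random(seed)
    # Randomly seed the initial population of candidate feature subsets.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]

        def select():
            # Reproduction: binary tournament between two random individuals.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        children = [best[:]]  # elitism keeps the best solution found so far
        while len(children) < pop_size:
            p1, p2 = select()[:], select()[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                p1 = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in p1]
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best


def toy_fitness(mask):
    """Hypothetical stand-in: rewards features 0, 2, 4 and penalizes subset size."""
    return sum(mask[j] for j in (0, 2, 4)) - 0.1 * sum(mask)


if __name__ == "__main__":
    print(genetic_feature_search(12, toy_fitness))
```

In practice, `fitness` would train and score the classifier on the features selected by the mask, which is why a fast-training classifier keeps the search affordable.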

Fast Classification Network Based on ELM
In real-world wood floor production, the floor colors may differ between production batches due to the external environment and materials, which means that the definition of the color grades changes frequently. Conventional neural networks adopt a backpropagation algorithm to train the weights. This is a time-consuming process and is unacceptable in the field if training has to be repeated constantly. In this study, we employed the ELM as the backbone classifier because of its strong generalization ability and fast learning speed [21,28]. An ELM usually uses a single-hidden-layer feedforward network containing three layers: an input layer, a hidden layer, and an output layer.
The hidden layer in the ELM can be expressed as follows:

$$\sum_{i=1}^{L} \beta_i \, g(W_i \cdot X_j + b_i) = t_j, \qquad j = 1, \ldots, N$$

where $g(\cdot)$ is the activation function, $L$ is the number of hidden nodes, $W_i$ and $b_i$ are the input weights and bias of the $i$-th hidden node, $\beta_i$ is its output weight, and $X_j$ and $t_j$ are the $j$-th of $N$ training samples and its expected output. In other words, training finds the parameters, including $\beta_i$, $W_i$, and $b_i$, that make the equation above hold.

Assuming $H$ is the output of the hidden nodes and $\beta$ is the weight of the outputs, the equation can be expressed as follows:

$$H\beta = T$$

where $T$ is the expected output and $H(W_1, \ldots, W_L, b_1, \ldots, b_L, X_1, \ldots, X_N)$ is the hidden-layer output matrix whose entry in row $j$ and column $i$ is $g(W_i \cdot X_j + b_i)$.

Once the input weights $W_i$ and biases $b_i$ of the ELM are determined, the output matrix of the hidden nodes $H$ is fixed. In this way, the ELM model with one hidden layer can be transformed into the linear system $H\beta = T$, and the output weights $\beta$ can be calculated as follows:

$$\hat{\beta} = H^{+} T$$

where $H^{+}$ is the generalized inverse of the matrix $H$, and $T$ is the expected output.

Finally, the flow chart of the proposed method for the online classification of solid wood floors is shown in Figure 6. The SSFE extracts statistical features as the inputs of the ELM classifier. By transforming the high-dimensional solid wood floor image data into low-dimensional color features, it reduces the complexity of the classifier and speeds up execution.
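The closed-form training step $H\beta = T$ can be illustrated with a short NumPy sketch. It is a generic ELM written under assumptions the text does not state: a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and one-hot encoding of the expected output. The class name `ELMClassifier` is hypothetical.

```python
import numpy as np


class ELMClassifier:
    """Single-hidden-layer ELM: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=14, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H: sigmoid of (X W + b) for every sample.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Expected output T as a one-hot matrix (samples x classes).
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights from the Moore-Penrose generalized inverse of H.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        return self.classes_[np.argmax(H @ self.beta, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.repeat(np.arange(3), 40)
    X = np.eye(3)[y] + 0.05 * rng.standard_normal((120, 3))
    model = ELMClassifier(n_hidden=14).fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

Training reduces to one pseudo-inverse of an N × L matrix, with no iterative backpropagation, which is why retraining for a new batch of floors is fast.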

Experiment and Metrics
The programs for the solid wood floor color classification experiments were written in Python, using the machine learning library scikit-learn and the deep learning framework PyTorch. The software, hardware, and compilation environment configuration is listed in Table 2. To evaluate the performance of the proposed method, accuracy was used as the evaluation criterion and defined as:

$$ACC = \frac{N_c}{N} \times 100\%$$

where $N_c$ is the number of wood floors classified correctly, $N$ is the total number of floors, and $ACC$ is the overall classification accuracy.
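As a small illustration of this metric, the sketch below computes the overall accuracy and a per-grade confusion matrix from lists of true and predicted labels. The function names and the example labels are hypothetical and are not taken from the experimental data.

```python
def accuracy(y_true, y_pred):
    """Overall accuracy in percent: correctly classified floors over all floors."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n_correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * n_correct / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true grades, columns are predicted grades, in the order of labels."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


if __name__ == "__main__":
    truth = ["light", "medium", "dark", "medium"]
    predicted = ["light", "light", "dark", "medium"]
    print(accuracy(truth, predicted))
    print(confusion_matrix(truth, predicted, ["light", "medium", "dark"]))
```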

Hyperparameters of Stochastic Sampling Feature Extractor
The hyperparameters n and pur strongly affect the performance of the SSFE. The hyperparameter n controls the number of samples picked by the sampling process, which affects the subsequent computing speed through the amount of data to be processed. The hyperparameter pur controls the number of samples that pass the rule filtering, which in turn determines the final classification accuracy. In order to evaluate the effect of these two hyperparameters, the structure of the subsequent ELM was fixed. In the experiment, the number of hidden layer neurons of the ELM was set to 14, and the number of outputs was set to three (determined by the target color grades).
Firstly, a comparative experiment was carried out to test the effect of n, where pur was set to 0.95. Figure 7 shows how the overall classification accuracy changes as n increases. As the total sampling number n increases, the accuracy shows a rising tendency. The total number of samples determines the number of filtered samples ultimately used for feature extraction. Equation (5) shows that when pur and $p_d$ are fixed, the number of samples passed by the rule increases as the total sampling number increases. When only a few samples are taken, the i samples passed by the rule are not suitable for feature extraction, since a few samples cannot reliably represent the original color of a solid wood floor. As n increases, these i samples become more likely to represent the original color of the solid wood floor. The extracted features then reflect the color grade more accurately, which yields higher classification accuracy at the cost of computing speed.

Moreover, the filtering rule also has an effect on classification accuracy. When the interference during sampling increases or the purity requirement is higher, the rule used to filter samples becomes stricter. This means that when $p_d$ and pur in Equation (5) increase, the number of filtered samples i decreases for fixed n. However, samples filtered under a strict rule have a higher probability of representing the color grade of the floor than samples filtered under a loose rule. Figure 8 shows the classification accuracy for increasing values of $p_d$, where n was fixed to 1000 and pur was set to 0.95. The classification accuracy is barely affected. Even under the extreme condition $p_d$ = 0.95, i.e., when the wood grain occupies most of the floor area, the overall classification accuracy still reaches 93.33%. This occurs mainly because the total number of samples is large, so enough samples remain after rule filtering to represent the color grade of the floor with high probability even under strict rules. Figure 9 shows the classification accuracy for different values of pur, and the behavior is similar to that for $p_d$. Changes to the hyperparameter pur have almost no effect when n is large. In summary, n has a significant influence on the classification accuracy because the number of filtered samples i is mainly determined by n. When n is large enough, $p_d$ and pur have only a small impact on i.
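The claim that enough clean samples remain when n is large can be checked with a small Monte Carlo experiment. The sketch below is illustrative only and rests on two simplifying assumptions: each sampled pixel is a disturbing-color pixel with probability $p_d$ independently of the others, and every original-color pixel is strictly brighter than every disturbing-color pixel, so the i brightest samples are all original color exactly when at least i original-color pixels were drawn. The function name and the chosen values of i are hypothetical.

```python
import random


def empirical_purity(n, p_d, i, trials=200, seed=0):
    """Fraction of trials in which the i brightest of n samples are all C_o."""
    rng = random.Random(seed)
    pure_trials = 0
    for _ in range(trials):
        # Count the sampled pixels that carry the original wood color C_o.
        k_o = sum(rng.random() >= p_d for _ in range(n))
        # With C_o brighter than C_d, the top i are pure iff k_o >= i.
        if k_o >= i:
            pure_trials += 1
    return pure_trials / trials


if __name__ == "__main__":
    for p_d, i in ((0.3, 650), (0.3, 750), (0.95, 30)):
        print(p_d, i, empirical_purity(1000, p_d, i))
```

Keeping i somewhat below the expected number of original-color samples, n(1 − $p_d$), gives an empirical purity close to 1, whereas pushing i above it makes pure subsets rare. This is the trade-off that the choice of i has to balance.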

Features Selected by GA
After confirming the color space for feature extraction, we used the GA to search for the feature subset. The optimization goal was to maximize the classification accuracy on the test set, with the hyperparameters fixed temporarily: n was set to 5000 and pur was set to 0.95. The numbers of neurons in the hidden layer and the output layer of the ELM were set to 14 and three, respectively. The number of neurons in the input layer depended on the number of features. A fitness value (F) was defined for each feature combination according to this optimization goal. The feature selection search process is shown in Figure 10. In the initial generation, the evaluation values, which reflect the accuracy obtained when the selected features are fed to the subsequent neural network, are widely spread. As the generation number grows, the accuracy points corresponding to the best feature combinations slowly gather together, although some points deviate from this trend due to mutation. In the end, all the feature combinations tended to have the same fitness, which means that the algorithm had converged. The search space of the feature combinations is quite large, and the search does not yield a single result. Table 3 lists some of the feature combinations found by the GA. By comparing all the selected features, we found that most feature combinations found by the GA included the first-order moment of the "l" channel in the Lab color space. This shows that the color depth of different solid wood floors is mainly reflected in the brightness of visual perception. Therefore, the brightness distribution of the entire solid wood floor image is an important feature for the classification result. It is also worth mentioning that a comparison of the feature combinations of groups 6 and 8 with the other groups shows that the contribution of the first-order moment of the "h" channel to the classification is only average. This also shows that different color grades of solid wood floors have similar hues; the difference in how the human eye perceives different color grades is caused by changes in color depth. Compared with the other groups, group 5 uses the fewest features and achieves the same model accuracy; moreover, the first-order moment of the "s" channel is also related to the color depth. The "b" channel in the Lab color space represents the blue-to-yellow component, and it is informative because the color tone of solid wood floors is mainly yellow from the standpoint of human color perception; this is especially evident on light-colored and medium-colored floors. Compared with the accuracy of 89% obtained with all the features, this confirms the effectiveness of feature selection.

Table 4 shows the confusion matrix of the test results, for which the model was trained with the hyperparameter $p_d$ set to 0.3, pur set to 0.98, and a feature set consisting of the first-order moments of the "l" and "b" channels in the Lab color space and of the "s" channel in the HSV color space. The dark and light color grade floors can be recognized correctly with high accuracy. The medium color grade floors may be recognized as light color grade. In fact, from the perspective of human visual perception, there are more differences among medium color grade floors than among light or dark color grade floors. The medium color grade appears to oscillate between the light color grade and the dark color grade, which may lead to a fuzzy boundary between the light and medium color grades. Even so, the classification accuracy of the medium color grade still reaches 93.33% on the testing set.

Classification Performance Evaluation for Different Methods
In order to evaluate the performance of the proposed method against other published methods, six repeated experiments were carried out. The results are listed in Table 5, where the first seven methods are deep learning methods and the last two are methods based on color features. Among the deep learning methods, MNASNet showed the lowest accuracy on the testing set, and the other six methods showed similar accuracy. The two methods based on color features, the proposed method and the XGBoost method, achieved similar classification accuracy and were comparable to the deep learning methods.

Table 5. Comparison of classification accuracy on the testing set for different methods. The data are in the format µ ± ∆x, where µ is the mean value of the results and ∆x gives the confidence interval at a confidence level of 0.95.

Method                  Classification Accuracy (%)
ResNet-18 [29]          97.78 ± 2.59
VGG16 [30]              96.44 ± 3.01
ResNet-34 [29]          95.56 ± 1.77
ResNet-50 [29]          96.00 ± 0.89
MobileNetV2 [31]        97.11 ± 0.89
MobileNetV3 [31]        97.78 ± 1.99
MNASNet [32]            47.77 ± 1.78
XGBoost [6]             97.00 ± 2.59
The proposed method     97.78 ± 1.56

For online production, the running time of a classification method is another important index, which determines whether it is feasible in practical engineering applications. Therefore, another set of experiments was carried out to evaluate the complexity of the different methods, and the results are listed in Table 6. In terms of training time, the deep learning methods took longer than the two methods based on color features. Compared with the proposed method, the XGBoost model required a shorter training time because it has no feature selection process.
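Entries of the form µ ± ∆x, as reported in Table 5, can be computed from repeated runs as sketched below. This is an assumption about the procedure, not a description of what the authors did: it uses a two-sided Student t interval, for which six runs give 5 degrees of freedom and a critical value of about 2.571. The function name and the example accuracies are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

# Two-sided 95% Student t critical value for 5 degrees of freedom (six runs).
T_975_DF5 = 2.571


def mean_and_half_width(values, t_crit=T_975_DF5):
    """Return (mean, half-width of the confidence interval) for repeated runs.

    t_crit must match len(values) - 1 degrees of freedom; the default is
    only valid for six values.
    """
    mean = fmean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return mean, half_width


if __name__ == "__main__":
    runs = [97.0, 98.0, 96.5, 99.0, 97.5, 98.5]  # hypothetical accuracies (%)
    mu, dx = mean_and_half_width(runs)
    print(f"{mu:.2f} ± {dx:.2f}")
```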
For the online classification time, the cooperating company (Dehua TuBao New Decoration Material Company) required that the processing time of each image in practical production be less than 10 ms. Compared with the other six methods in Table 6, the proposed method has a much shorter classification time, and it is the only one that meets the practical production requirement.

The number of floating-point operations (FLOPs) [33] is commonly used to measure the computational cost of a model in order to evaluate its complexity, and the FLOPs of these seven methods are also listed in Table 6. The proposed method has a great advantage in computational cost compared with the deep learning methods. All deep learning methods have high FLOPs, although some lightweight networks, such as MobileNetV2, have lower computational complexity due to the use of depthwise separable convolution. This demonstrates that a convolutional feature extractor increases the computational complexity compared with the SSFE, which further supports the effectiveness of the proposed method for classification.

Conclusions
In this study, a flexible method was proposed for the online classification of solid wood floors in the industry by developing a stochastic sampling feature extractor and combining it with an ELM. The stochastic sampling feature extractor was developed based on probability theory, a GA was used to search the feature space in order to rapidly extract optimal features with minimal computational resources, and the ELM was used for rapid classification. The proposed model was used to classify three color grades of solid wood floors with a classification accuracy of 97.78% and an image processing time of less than 1 ms, which met the industrial production speed requirement. According to the experimental results, the proposed method has the following advantages: (1) The proposed method removes the influence of wood grain and background and is robust to the wood grain distribution. (2) The proposed method supports flexible switching between the classification of different floor styles in production. (3) The image processing speed of the proposed method is much faster than that of deep learning methods, and its accuracy is comparable with that of deep learning methods.
On the other hand, it is worth noting that the proposed method for online classification is not completely automatic because it needs an expert to set up the classification system and provide training samples when changing from one batch of solid wood floors to another. Furthermore, the advantages of the proposed online method, such as accuracy, efficiency, and cost, should be verified in industrial production against manual grading in the future, which is important for its popularization and application.