A Transfer Residual Neural Network Based on ResNet-34 for Detection of Wood Knot Defects

Abstract: In recent years, due to the shortage of timber resources, it has become necessary to reduce the excessive consumption of forest resources. Non-destructive testing technology can quickly find wood defects and effectively improve wood utilization. Deep learning has achieved significant results and is one of the most commonly used methods in the detection of wood knots. However, compared with convolutional neural networks in other fields, deep learning models for the detection of wood knots are still very shallow. This is because the number of labeled samples in wood detection is too small, which limits the final prediction accuracy. In this paper, ResNet-34 is combined with transfer learning, and a new deep learning model named TL-ResNet34, with a depth of 35 layers, is proposed to detect wood knot defects; ResNet-34 is used as the feature extractor for wood knot defects. A wood knot defect dataset was then applied to TL-ResNet34 for testing. The results show that the detection accuracy of TL-ResNet34 on this dataset is significantly higher than that of other methods, indicating that the final prediction accuracy of wood knot defect detection can be improved by TL-ResNet34.


Introduction
In recent years, due to the shortage of wood resources, slowing the consumption of forest resources has become a research focus. If wood knot defects on the surface of wood are found quickly, the utilization rate of wood can be improved effectively and the excessive consumption of wood can be reduced [1][2][3][4]. The combination of digital image processing and artificial intelligence algorithms is a common method for wood knot defect detection and classification [5]. Among these, fixed feature extraction and classification recognition technology have been widely used [6,7]. This technology is mainly composed of computer vision technology, spectral analysis technology, and other digital image processing methods [8]. Effective feature parameters are extracted from the samples through fixed mappings, and multiple statistical or machine learning methods are compared to determine which feature parameters are effective. Deep learning (DL) has become a new method with great potential in the field of artificial intelligence [9]. The DL method enables the feature values of the original data to be learned automatically, thereby reducing the impact of manual operations on feature extraction [10]. However, due to the small sample sizes available for wood knot defect detection, overfitting or underfitting occurs easily, which limits the final prediction accuracy of the model [11]. To rectify this problem, researchers trained deep convolutional neural network (CNN) models on ImageNet [12] and, combined with transfer learning, used them as feature extractors on small datasets in different fields [13]. A large body of literature shows that good results have been achieved with these improvements. For example, Thenmozhi et al. (2019) used an AlexNet pre-trained model for the classification of insects [14].
The results showed that the accuracy of the AlexNet model based on transfer learning was 7.65% higher than that of AlexNet trained from scratch. In the same year, Gao et al. proposed a tree species identification method based on transfer learning [15]. The experimental results showed that the highest accuracies for trunks and leaves based on transfer learning improved by 51.38% and 51.69%, respectively, compared with ordinary deep learning. In 2020, Kentsch et al. studied the classification of a winter orthomosaic dataset with ResNet-50 based on transfer learning [16]. The results showed that moving from no transfer learning to transfer learning from a general-purpose dataset improved the accuracy by 9.78%. They also observed a further 2.7% improvement when transfer learning was performed from a dataset closer to their type of images.
Feature extraction of wood knot defects is a precondition for the methods above. For example, a Hu invariant moments feature extraction method combined with a BP (back propagation) neural network to classify wood knot defects was proposed by Qi et al. [17]. The accuracy of this method for wood knot defect recognition is over 86%. Another method, a sub-region variable-scale Gaussian fitting model combined with a BP neural network to classify wood defects, was proposed by Yuan et al. [18]. Its accuracy in identifying wood knots reaches 92.80%. In 2020, Beshaier and Yossra et al. conducted a comparative study of two different image retrieval methods, the gray level co-occurrence matrix (GLCM) and Hu invariant moments [19]. The results showed that the average accuracy of GLCM and Hu invariant moments is 92.8% and 84.4%, respectively. However, due to the unique shape of each wood knot defect, the process of feature extraction is difficult and complicated. Therefore, a method combining near infrared spectroscopy (NIR) with machine vision was proposed by Yu et al. [20]. Compounds with different spectral absorption rates are used to identify particular types of wood knot defects, and the recognition accuracy reaches 92.0%. It should be emphasized that luminescent equipment is required to collect the data before processing, so the recognition speed of this method is not fast enough. These network- and learning-based recognition methods depend on a priori wood feature extraction. However, artificial feature extraction is difficult and complex for wood products, whether the morphological features (Hu invariant moments, etc.) or the physical features (NIR, etc.) of wood are extracted. Since each knot defect has its own unique appearance, it is difficult to detect accurately, and extracting knot defect features with manually operated equipment also takes a lot of time.
Therefore, a convolutional neural network that learns wood knot defect features automatically, instead of relying on complex artificial extraction, is needed for defect detection. A fully convolutional neural network (Mix-FCN) was proposed by He et al. in 2019 to identify and locate wood defects [21], but the network is very deep, resulting in a heavy computational load. In 2020, an improved SSD algorithm was proposed by Ding et al. [22]. Although the amount of computation is reduced, the average precision of knot defect detection is not high enough. To solve these problems, a high-accuracy detection algorithm for wood knot defects based on a convolutional neural network with automatic feature extraction is required.
In this article, a new transfer residual neural network based on ResNet-34, named TL-ResNet34, is proposed as a classifier for the detection of wood knots. The depth of TL-ResNet34 is 35 layers. Because ResNet-34 performs well in image classification, high-quality image features can be extracted after pre-training on ImageNet. First, it is assumed that the feature extraction layers of ResNet-34 also perform well in the detection of wood knot defects. We then train and test on the knot defect dataset using the deeper network and the better feature extraction layers. Finally, we compare the test results with other DL and traditional models. The results show that the prediction accuracy of TL-ResNet34 is as high as 98.69%, indicating that the model has a good recognition effect. This provides more possibilities for non-destructive testing of wood knot defects.

Dataset
The experimental dataset was provided by the Computer Laboratory of the Department of Electrical Engineering, University of Oulu [23][24][25] and includes the image information for 448 spruce knots.
Appropriate datasets are required to train and evaluate the performance of the algorithm. A total of 448 images of wood knots were prepared as the experimental samples. The wood knot dataset includes seven classes across the 448 images, which were divided into training, validation, and test sets at a ratio of 6:2:2, i.e., 268 training, 90 validation, and 90 testing images (Table 1). Because of the particularity of wood, knots are usually darker than the surrounding wood. However, heartwood sometimes also has a darker color, which may cause the neural network to evaluate images of wood knot defects incorrectly and affect the recognition of knot defects in heartwood during training. To solve this problem, images of knot defects in darker heartwood were pre-processed and their contrast was enhanced to make the defects more obvious. To make the most of the collected data, the original images were processed by several augmentation methods to expand the dataset. Finally, we obtained 1885, 636, and 615 images for the training, validation, and test datasets, respectively.
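The 6:2:2 split described above can be sketched as follows. The paper does not specify how the split was performed; this illustrative function assumes a seeded random shuffle, and the function name is ours.

```python
import random

def split_622(items, seed=0):
    """Shuffle and split a sample list into training/validation/test sets
    at a 6:2:2 ratio; the remainder after the 60% training cut is halved
    between validation and test."""
    items = list(items)
    random.Random(seed).shuffle(items)          # reproducible shuffle
    n_train = int(len(items) * 0.6)             # 448 samples -> 268
    n_val = (len(items) - n_train) // 2         # remaining 180 -> 90 + 90
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Applied to the 448 knot images, this yields exactly the 268/90/90 partition reported in Table 1.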
Overfitting can usually be reduced, and the generalization ability of machine learning models improved, by training the network with additional data [26][27][28]. Deep learning requires massive amounts of data, so expanding the dataset through multiple methods was necessary given the insufficiency of the data collected. The problem of limited data can be alleviated by enlarging the dataset artificially and adding the new images to the training set. Six distinct forms of data augmentation were used for the color images (Figure 1): vertical mirroring, horizontal mirroring, rotation by 180°, adding Gaussian noise, adding salt-and-pepper noise, and increasing the hue by 10. After augmentation, the training, validation, and testing datasets were expanded to seven times their original size.
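The augmentation pipeline above is not given as code in the paper; the following numpy sketch approximates five of the six operations on an H × W × 3 uint8 image (the hue shift would additionally require an RGB-to-HSV round trip, e.g. via PIL, and is omitted here). Function name, noise levels, and the mirroring-axis convention are our assumptions.

```python
import numpy as np

def augment_five(img, rng):
    """Return five augmented variants of a uint8 RGB image as a dict."""
    out = {}
    out["vertical_mirror"] = img[::-1, :, :]     # flip top-to-bottom
    out["horizontal_mirror"] = img[:, ::-1, :]   # flip left-to-right
    out["rotate_180"] = img[::-1, ::-1, :]       # equals both flips combined
    # Gaussian noise: zero-mean additive noise, clipped to the pixel range.
    noisy = img.astype(np.float64) + rng.normal(0.0, 10.0, img.shape)
    out["gaussian_noise"] = np.clip(noisy, 0, 255).astype(np.uint8)
    # Salt-and-pepper noise: force ~2% of pixels to pure black or white.
    sp = img.copy()
    mask = rng.random(img.shape[:2])
    sp[mask < 0.01] = 0
    sp[mask > 0.99] = 255
    out["salt_pepper"] = sp
    return out
```

Applying these five transforms plus a hue shift to each original image, and keeping the original, gives the seven-fold expansion reported above.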

ResNet-34
With the rapid development of computer technology and the improvement of computer hardware performance, deep learning has made great progress [29]. Artificial neural networks have been widely used in different fields due to their excellent performance in image classification and recognition [30]. A CNN is a multilayer feedforward neural network with a convolutional structure, which has good fault tolerance and self-learning capabilities. It can deal with problems in complex environments and with unclear backgrounds, and its generalization ability is significantly better than that of other methods. A CNN generally consists of an input layer, several convolution layers, pooling layers, a fully connected layer, and an output layer. It can carry out supervised and unsupervised learning and is applied to computer vision, natural language processing, and other fields.
The residual building block is the basic unit of the ResNet-34 network, and the whole network is mainly composed of such blocks. A shortcut connection [9] is used by the residual building block to skip convolutional layers, which effectively alleviates the gradient vanishing or explosion caused by increasing depth in neural networks, helps construct CNN structures more flexibly, and improves the recognition rate of wood knot defects.
The structure of the basic block, the building unit used in ResNet-34, is shown in Figure 2. The residual building block is composed of several convolutional layers (Conv), batch normalizations (BN), a rectified linear unit (ReLU) activation function, and a shortcut. The output of the residual building block can be formulated as follows:

y = F(x) + x, (1)

where F is the residual function and x and y are the input and output of the residual building block, respectively. The entire residual network is composed of the first convolutional layer and several basic blocks. ResNet-34 contains 33 convolutional layers, a 3 × 3 max-pooling layer, and an average-pooling layer, followed by a fully connected layer. A classical ResNet-34 model involves 63.5 million parameters; ReLU activation and batch normalization (BN) are applied after the convolution layers in the basic blocks, and the softmax function is applied in the final layer. The architecture of ResNet-34 is shown in Table 2.
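The effect of the shortcut connection can be illustrated with a heavily simplified forward pass, a sketch only: two linear maps stand in for the 3 × 3 convolutions and batch normalization is omitted, but the structure y = ReLU(F(x) + x) is the same.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def basic_block_forward(x, w1, w2):
    """Simplified basic block: residual branch F(x) = relu(x @ w1) @ w2,
    then the identity shortcut adds x back before the final ReLU."""
    f = relu(x @ w1) @ w2       # residual branch F(x)
    return relu(f + x)          # shortcut: add the input, then activate
```

Note that if the residual branch contributes nothing (all-zero weights), the block simply passes its input through the final ReLU; the block only has to learn a *correction* to the identity, which is what makes very deep residual networks easier to optimize.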

Transfer Learning
A large number of annotated samples are needed for CNN training to achieve high prediction performance. However, it is difficult to obtain such a large quantity of data in general, and image labeling is costly [31]. Therefore, transfer learning is often applied when training a neural network on a relatively small dataset, and it has proven to be a very effective method.
Due to their fixed network structures, some mature CNNs, such as AlexNet and GoogLeNet [32], have certain feature extraction capabilities that can be obtained by pre-training on large-scale mature datasets (such as ImageNet) before training on one's own dataset.
Because the amount of data in this experiment is relatively small, overfitting occurs easily [26][27][28], and the model requires more epochs during the training phase, resulting in poor recognition capability. Therefore, transfer learning can be used: the model is pre-trained on ImageNet to optimize the classification of wood knot defects. ResNet-34 was fine-tuned to fit the data in this article, which saved a lot of training time.
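The core idea of this fine-tuning, reusing a frozen feature extractor and training only a new classification head, can be sketched on toy data. Here the frozen ResNet-34 features are stood in for by a fixed feature matrix, and only the head weights W are updated by gradient descent on the softmax cross-entropy loss; all names and hyperparameters are illustrative.

```python
import numpy as np

def train_head(features, labels, n_classes, epochs=200, lr=0.5):
    """Transfer-learning sketch: `features` plays the role of the frozen
    backbone's output; only the new linear head W is trained."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        grad = features.T @ (p - onehot) / n          # cross-entropy gradient
        W -= lr * grad                                # only the head moves
    return W
```

Because the backbone is frozen, the number of trainable parameters shrinks from millions to just the head, which is why fine-tuning works with small labeled datasets such as the 448 knot images here.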

ReLU Nonlinearity
The rectified linear unit (ReLU) is a commonly used activation function in artificial neural networks. The ReLU function is defined as follows:

f(x) = max(0, x), (2)

where f is the ReLU function and x is its input.
Equation (2) zeroes out the output of some neurons, resulting in sparsity of the network, which reduces the interdependence of parameters and alleviates overfitting [33]. Compared with the larger number of calculations needed by the sigmoid and similar functions, using ReLU saves a lot of time.
For deep networks, during backpropagation the gradients of the sigmoid and tanh functions are close to zero in their saturation regions, so the gradients easily vanish, resulting in slower convergence and information loss. In most cases, the gradient of ReLU is a constant, which helps to solve the convergence problem of deep networks. Meanwhile, as a unilateral function, ReLU is more in line with the characteristics of biological neurons. CNNs with ReLU can train several times faster than their equivalents using sigmoid units (Figure 3) or hyperbolic tangents [34].
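The saturation contrast above is easy to verify numerically, a minimal pure-Python sketch:

```python
import math

def relu(x):
    """Equation (2): f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """The derivative of ReLU is a constant 1 for x > 0 and 0 otherwise,
    so it never saturates for positive inputs."""
    return 1.0 if x > 0 else 0.0

def sigmoid_grad(x):
    """For comparison: sigmoid'(x) = s * (1 - s) vanishes once the
    sigmoid saturates at large |x|."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)
```

At x = 10 the sigmoid gradient is already below 1e-4, while the ReLU gradient is still exactly 1, which is why backpropagated signals survive many more layers with ReLU.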

Adaptive Moment Estimation
The Adam algorithm is a first-order optimization algorithm for stochastic objective functions. Based on adaptive estimates of low-order moments, the weights of a neural network can be updated iteratively from the training data. The method is easy to implement and has high computational efficiency and low memory requirements [35]. Adam is invariant to diagonal rescaling of the gradients, so it is suitable for problems with large-scale data or parameters. The hyperparameters of the Adam algorithm have intuitive interpretations and usually require little or no tuning. Different adaptive learning rates are designed for different parameters [29,36]. During backpropagation and parameter updates, the Adam algorithm adjusts the learning rate well. Adam is also suitable for problems with unstable objective functions and sparse gradients. Therefore, the Adam algorithm converges quickly and learns effectively.
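A single Adam update can be written out directly, a sketch with the standard default hyperparameters (the paper uses Adam via a framework, not this hand-rolled form):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. m and v are running estimates of the first and
    second moments of the gradient; dividing by (1 - b**t) corrects the
    bias from initializing them at zero."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note that the very first update has magnitude close to lr regardless of the gradient's scale, because m_hat / sqrt(v_hat) normalizes out the magnitude; this is the invariance to diagonal rescaling of the gradients mentioned above.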

Cross Entropy
In the training process, cross entropy was used as the loss function to update the weights w and biases b. The cross-entropy function is defined as follows:

H(p, q) = −Σx p(x) log q(x), (3)

where H is the cross entropy, x is the input, p is the true probability distribution, and q is the predicted probability distribution. Compared with the variance (mean squared error) loss function, this method overcomes the problem of weights and biases updating too slowly, because the update is driven by the error [37,38]: when the error is large, the weights update quickly; when the error is small, the weights and biases update slowly.

Overall Architecture
The structure of the proposed TL-ResNet34 is shown in Figure 4. There are 35 trainable layers in this architecture: 33 convolutional layers (including the Conv layers in the identity blocks) and two fully connected layers. TL-ResNet34 consists of five convolution groups; each convolution group is composed of one or more basic convolution sequences (Conv → BN → ReLU). The first convolution group contains only one convolution operation, with a 7 × 7 kernel and a stride of 2. The second to fifth convolution groups contain multiple identical residual units, named Conv2_x, Conv3_x, Conv4_x, and Conv5_x, respectively (Figure 4). The first 33 layers of TL-ResNet34 were transferred from the ImageNet dataset (there are 1 + 16 × 2 = 33 Conv layers in Figure 4). Then, a fully connected (FC) layer and the softmax classifier were added to TL-ResNet34 to fit the category labels of the wood knot defect dataset. At the same time, the final prediction accuracy of the neural network on the wood knot defect dataset was improved.

Training
The proposed TL-ResNet34 was trained on one GPU (GTX 960M, 2 GB). The experimental environment is presented in Table 3. The model, using the Adam optimization algorithm and the cross-entropy loss function, was trained for 300 epochs with a batch size of 128 and a learning rate of 1 × 10⁻⁴. The parameter configuration is shown in Table 4. Figure 5 shows the process of training the model on the training and validation datasets. The best accuracy was 99.22%, and the best loss was about 2.83%. At the same time, the overall accuracy in the test phase was about 98.69%.

Comparison of Model Performance
To evaluate the performance of TL-ResNet34, the precision (P), recall (R), F1-score (F1), and false alarm rate (FAR) were determined. With T_ij denoting the number of samples of class i predicted as class j, and k the number of knot classes, the evaluation indices are defined as follows:

P_i = T_ii / (T_ii + Σ_{j≠i} T_ji),
R_i = T_ii / (T_ii + Σ_{j≠i} T_ij),
F1_i = 2 P_i R_i / (P_i + R_i),
FAR_i = Σ_{j≠i} T_ji / (Σ_{j≠i} Σ_l T_jl),

where P_i is the class-i precision, R_i is the class-i recall, F1_i is the class-i harmonic mean of precision and recall, and FAR_i is the class-i false alarm rate. Table 5 shows the precision, recall, F1-score, false alarm rate, and accuracy of TL-ResNet34 and three other comparison methods for seven types of wood knot defect images. It can be seen that the five evaluation indicators of TL-ResNet34 for decayed knots, dry knots, edge knots, encased knots, and sound knots are the best among the four models. In the recognition of horn knots, the R, F1, and accuracy of TL-ResNet34 are slightly lower than those of GoogLeNet, but these indicators are still better than those of AlexNet and VGGNet-16, and the P and FAR of TL-ResNet34 are the best among the four models. In the recognition of leaf knots, the performance of TL-ResNet34 is slightly inferior to that of GoogLeNet and AlexNet, but all its indicators are better than those of VGGNet-16, and its P and FAR are better than those of AlexNet. For the seven types of knots, all networks have the highest precision for horn knots, the lowest precision for encased knots, the highest accuracy for horn knots, and the lowest accuracy for sound knots. Compared with the other three methods, TL-ResNet34 performs well on all five evaluation indicators.
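Assuming the predictions are summarized in a k × k confusion matrix C with C[i, j] = T_ij (samples of class i predicted as class j, matching the definitions above), the per-class indices can be computed as in this sketch; the function name is ours.

```python
import numpy as np

def per_class_metrics(C):
    """Per-class precision, recall, F1, and false alarm rate from a
    confusion matrix C with C[i, j] = true class i, predicted class j."""
    k = C.shape[0]
    total = C.sum()
    P, R, F1, FAR = [], [], [], []
    for i in range(k):
        tp = C[i, i]
        fp = C[:, i].sum() - tp          # other classes predicted as i
        fn = C[i, :].sum() - tp          # class i predicted as others
        tn = total - tp - fp - fn
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        P.append(p)
        R.append(r)
        F1.append(2 * p * r / (p + r) if p + r else 0.0)
        FAR.append(fp / (fp + tn) if fp + tn else 0.0)   # FP rate for class i
    return P, R, F1, FAR
```

For example, with C = [[8, 2], [1, 9]], class 0 has precision 8/9, recall 0.8, and a false alarm rate of 0.1 (one of the ten non-class-0 samples was flagged as class 0).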
The comprehensive indicators of the four models are shown in Figure 6. The results show that the accuracy rates of AlexNet, VGGNet-16, GoogLeNet, and TL-ResNet34 are 96.89%, 95.58%, 95.42%, and 98.69%, respectively. It can be seen that TL-ResNet34 achieved the best values in terms of P, R, F1, FAR, and accuracy, showing that the TL-ResNet34 model has the best recognition effect. Figure 7 shows the accuracy and loss values of TL-ResNet34, AlexNet, VGGNet-16, and GoogLeNet during the training phase. The experimental results show that all four models converge within 300 training epochs, but their convergence rates differ. The convergence rate of GoogLeNet is the slowest. The training process of AlexNet is similar to that of VGGNet-16: both converge and stabilize after 150 training epochs. TL-ResNet34 converges within 50 training epochs, faster than the other three CNN models. Meanwhile, within 300 training epochs, the loss value of TL-ResNet34 is always lower, and its accuracy always higher, than those of the other three models. The results show that, compared with the other three models, TL-ResNet34 has the highest accuracy and the fastest convergence rate.

Transfer Learning Analysis
In order to reduce overfitting and enhance the feature extraction ability of the network, transfer learning was applied in TL-ResNet34. Table 6 shows the test results of the wood knot classification of TL-ResNet34 and the comparison network. During the test phase of TL-ResNet34, a total of 19 decayed knots, 96 dry knots, 91 edge knots, and 49 horn knots were correctly recognized, with no misidentification. Among the 40 encased knots, 37 were correctly recognized, and 3 were incorrectly recognized as a dry knot, an edge knot, and a sound knot. Among the 66 leaf knots, 64 were correctly recognized, and 2 were incorrectly recognized as a horn knot and a sound knot. Among the 250 sound knots, 247 were correctly recognized, and 3 were incorrectly recognized as leaf knots. It can be seen that TL-ResNet34 is better than ResNet-34 at recognizing the seven kinds of wood knot defects. Figure 8 shows the impact of transfer learning on the accuracy and loss of the model. In the training phase, TL-ResNet34, which incorporates transfer learning, has a faster convergence rate. The combination of transfer learning and ResNet-34 increases the recognition accuracy to 98.69%, higher than that of the original ResNet-34 (97.05%). It can be seen that combining with transfer learning is effective at improving the performance of the model.

Comparison of Optimal Algorithms
An optimization algorithm is very important for a model's performance. In this paper, the SGD (stochastic gradient descent), Adagrad, and Adam optimization algorithms were used to train the TL-ResNet34 network, and their convergence rates were compared. Figure 9 shows the training process of these three optimization algorithms, each with a learning rate of 1 × 10⁻⁴. The results show that the model with the Adam algorithm has the fastest convergence rate. It can be seen from the loss curve in Figure 9 that the Adam algorithm converges quickly and is more stable than SGD and Adagrad.

Comparison of Detection Methods for Wood Surface Defects
The comparison between the method proposed in this paper and other methods is shown in Table 7. Compared with other detection methods, the classification accuracy of this method is relatively high, about 98.69%. Second, this method can detect more kinds of defects: decayed knots, dry knots, edge knots, encased knots, horn knots, leaf knots, and sound knots. In addition, although the method in [39] has the highest accuracy, about 99%, it needs NIR to extract features manually, which lowers the overall recognition efficiency. The authors of [39] aimed to reduce the artificial extraction of image features and the time required for recognition; although their method can identify a variety of wood defects with a high recognition rate, it was not trained specifically on these types of wood knot defects, so it was not suitable for this study. All feature extraction in our method is completed automatically by the convolutional neural network, which overcomes the limitations of manually extracted image features.

Conclusions
In summary, a transfer residual neural network, TL-ResNet34, was proposed to identify wood knot defects quickly and accurately. The accuracy of the network was improved by more than 0.78% after extracting structural defect features, training the parameters, and optimizing the datasets and images. At the same time, transfer learning was added to build a pre-trained model. The experimental results show that TL-ResNet34 achieved a high recognition rate of 99.22% on the training dataset and a low loss of 2.83% on the validation dataset while identifying seven kinds of wood knot defects. The overall accuracy reached 98.69%, and the fluctuation ranges of the loss and accuracy curves were small when TL-ResNet34 was applied to the test dataset. Moreover, this method does not require a large amount of image preprocessing or feature extraction when detecting various types of wood defects, and it has high efficiency and recognition accuracy in both the training and testing stages. This means that wood knot defects can be identified accurately and quickly by the proposed TL-ResNet34. Based on the above analysis, TL-ResNet34 has potential applications in wood non-destructive testing and wood defect identification.