North American Hardwoods Identification Using Machine-Learning

Abstract: This technical note determines the feasibility of using an InceptionV4_ResNetV2 convolutional neural network (CNN) to correctly identify hardwood species from macroscopic images. The method uses a commodity smartphone fitted with a 14× macro lens for photography. The end-grains of ten different North American hardwood species were photographed to create a dataset of 1869 images. Stratified 5-fold cross-validation was used, in which the number of testing samples varied from 341 to 342. Data augmentation was performed on-the-fly for each training set by rotating, zooming, and flipping images. It was found that the CNN could correctly identify hardwood species based on macroscopic images of their end-grain with an adjusted accuracy of 92.60%. Given the current growth of the machine-learning field, this model could be readily deployed in a mobile application for field wood identification.


Introduction
Currently, wood identification is performed by human lumber graders, enthusiasts, professionals, and experts in the field using hand lenses, keys, and wood species atlases or field guides and manuals. Thus, the accuracy of such wood identification critically depends on the observer's expertise and experience in recognizing esoteric species features, which depend on wood soundness (in the case of decayed specimens, for example), and in covering the full range of anatomical variability within a species [1]. Correct wood identification is an issue of major international importance because of illegal logging, misrepresentation, and mislabeling, because the public needs to know which species they are buying for their own applications, and because each species has its own performance characteristics [2,3].
Forests 2020, 11, 298
One application area for which fast and accurate wood identification is important is illegal logging. Wiedenhoeft et al. [3] indicated that more than 60% of the 73 types of wood products tested, including furniture, musical instruments, sporting equipment, kitchen implements, etc., had some evidence of fraud or mislabeling. In more than half of the samples, the wood was labeled with the wrong species name, whereas about 20% of the samples had the wrong product type label. They also surveyed laboratories regarding their capacity to identify wood and found that 15 of the 23 respondents reported having limited capacity to conduct wood identification, and 13 reported in detail about their identification capacity. These 13 laboratories identify over 830 specimens per year with an average identification fee of $65.00/sample (United States currency), for a total investment in wood identification of $54,000.00 per year. According to [4], considering only sawn wood, the United States exported $3.8 billion in 2018, which resulted in an identification capacity of approximately 0.001% of the exported wood. For domestic wood identification, the accuracy was near 90% for the laboratories surveyed. Such high accuracy, if available to law enforcement officers screening products at the border, should help significantly minimize illegal activity.
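The arithmetic behind these figures can be checked with a short sketch. It assumes that the roughly 0.001% figure is the ratio of annual identification spending to the value of sawn-wood exports; the text does not state how the percentage was derived, so this reading and the variable names are illustrative only.

```python
# Figures quoted in the text (survey of 13 laboratories and 2018 export value).
specimens_per_year = 830          # "over 830", so treated here as a lower bound
fee_per_sample_usd = 65.00        # average identification fee
sawn_wood_exports_usd = 3.8e9     # United States sawn-wood exports in 2018

# Annual spend on identification implied by the survey numbers.
annual_spend_usd = specimens_per_year * fee_per_sample_usd

# ASSUMPTION: the quoted share is spend divided by export value, as a percentage.
share_percent = 100 * annual_spend_usd / sawn_wood_exports_usd

print(f"annual spend: ${annual_spend_usd:,.0f}")
print(f"share of export value: {share_percent:.4f}%")
```

Under this reading, the spend comes to just under $54,000 and the share falls between 0.001% and 0.002%, which is consistent with the rounded values quoted above.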
In a broader example, wood maintains a 93% share of the market for crossties installed in North America. This corresponds to 3249 wood crossties per mile, for a total of 650,000,000 crossties over 200,000 miles. The softwood and hardwood species used for crossties include catalpas (Catalpa spp.) and many others [5]. Each species has its own heartwood treatability characteristics that critically influence performance [5]. This type of industry will benefit from accurate wood identification for economic and durability reasons, as it is important to extend the service life of crossties. In another application area, wood identification using fast and user-friendly approaches can help to develop a stronger and more competitive wood market. A prominent market is engineered wood mats. Mats protect environmentally sensitive soils and provide access and transport of heavy machinery in the field for construction of powerlines, bridges, roads, and drilling platforms. They are generally constructed with a wide variety of commercial hardwood timbers such as ash (Fraxinus spp. L.), beech (Fagus spp. L.), hickory (Carya spp. Nutt.), magnolia (Magnolia spp. L.), oak (Quercus spp. L.), pecan (Carya illinoinensis (Wangenh.) K. Koch), and sweetgum (Liquidambar styraciflua L.). Since the mechanical properties of wood change from species to species, correct wood identification is crucial for adequate decision-making.
Over the last few years, artificial intelligence has provided technological breakthroughs in several fields by classifying, detecting, and segmenting images and videos [6,7]. For instance, the authors of [8,9] developed a system composed of a commodity smartphone fitted with a macro lens to magnify the end-grain of Malaysian wood species. The images were used to train a convolutional neural network (CNN), a numerical model that assigns predictive parameters to various aspects (shapes and patterns) of an image. Their CNN classified 100 trade wood species with top-1 and top-2 accuracies (i.e., the species is correctly identified within the top one or two predictions) of 77.52% and 87.29%, respectively [8]. However, deep learning models are specific, i.e., classification can be done only on the species and attributes for which the model was trained. Consequently, their CNN has minimal impact for North American hardwood and softwood species. In addition, CNNs are not able to recognize non-visual characteristics such as species odor, unless such information can be digitized for training purposes.
The main goal of this work was to demonstrate how a smartphone and a macro lens can be used to develop powerful wood identification methods. This technical note, to the authors' knowledge, represents the first documented approach to macroscopically classify a wide variety of North American hardwood species using a CNN methodology. This computer vision system can be enhanced and developed as a mobile application to ameliorate illegal logging and/or misrepresentation, expand wood identification capacity, and advance the forest products industry in general.
(Fraxinus americana L.), and white oak (Quercus alba spp. L.). These species were chosen due to their availability and extensive number of pieces. Samples were identified by three experts, one of whom has more than 25 years of experience in the field.

Generation of Wood Sample Database
We generated an initial dataset of 1869 unique images from the 10 hardwood species. Figure 1 shows examples of the macroscopic end-grain samples used in the image dataset. As discussed later, this small dataset was augmented because a training set of approximately 150 unique images per class is too small to develop reliable CNN models.

Image Acquisition Setup and Dataset Processing
Preparation of the wood surface was needed to properly locate and identify the cell types useful for identification. To this end, several thin, clean cuts were made with a razor blade across the transverse end surface to expose sufficient visible area for photography. End-grain sanding was not applied in order to better simulate field applications. Image acquisition was facilitated by an inexpensive 14× macro lens attached to a commodity smartphone with a 12-megapixel rear-facing camera and an aperture of f/1.8. The semi-transparent protective cap of the macro lens positioned the camera approximately 3 cm from the samples, which improved photo-taking stability (Figure 2). As the available pieces varied in size, the lens was positioned carefully to avoid any overlap. Photos were taken under natural illumination without a camera flash. Images were 3024 × 3024 pixels with a resolution of 72 dpi and a 24-bit depth at ISO 40.

For effective training and validation of deep learning models, a large number of images is necessary [10]. The initial image dataset was augmented with additional synthesized images produced by zooming, rotation, and flipping. Only the training set was augmented. To improve the trustworthiness of our results, we applied stratified k-fold cross-validation. The dataset was randomly split into 5 (k = 5) mutually exclusive, shuffled folds of proportional size. The model was then trained and tested on each of the k folds. To avoid overfitting, data augmentation was performed on-the-fly after splitting the entire dataset so that redundancy was minimized.
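The split-then-augment order described above can be sketched in plain Python. This is a minimal illustration and not the authors' code: the helper names `stratified_kfold` and `augment` are hypothetical, the toy images are 2 × 2 grids, rotations are restricted to multiples of 90 degrees, and zooming is omitted for brevity.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Return k disjoint lists of sample indices with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin; the running offset keeps fold sizes even.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds

def augment(image, rng):
    """Randomly flip and rotate a square image stored as a list of rows."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]               # horizontal flip
    for _ in range(rng.randrange(4)):
        image = [list(row) for row in zip(*image[::-1])]   # rotate 90 degrees clockwise
    return image

# Toy data: three imbalanced classes, each "image" a 2 x 2 grid of numbers.
labels = ["white oak"] * 10 + ["white ash"] * 7 + ["hackberry"] * 5
images = [[[i, i + 1], [i + 2, i + 3]] for i in range(len(labels))]

folds = stratified_kfold(labels, k=5)
rng = random.Random(1)
for test_fold in folds:
    held_out = set(test_fold)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Augment on the fly, training images only; held-out images stay untouched.
    train_batch = [augment(images[i], rng) for i in train_idx]
    test_batch = [images[i] for i in test_fold]
```

Because the folds are formed before any augmentation, a synthesized variant of a held-out image can never appear in the corresponding training set.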

Convolutional Neural Network Architecture
The InceptionV4_ResNetV2 convolutional neural network (CNN) model was used in this study to classify 10 North American hardwood species and was based on the work of [11]. This CNN model was chosen for its image classification performance on the ImageNet dataset (0.953 top-5 accuracy on 1000 classes) and for its smaller number of trainable parameters (three times fewer) compared to prior research that used the VGG16 architecture [12]. High-definition images of the wood samples were resized to 299 × 299 × 3 (width × height × color channels), which is the default input size for this CNN. The CNN was implemented using TensorFlow 1.14 [13] and Keras [14]. The network was trained from scratch only, with a final 10-way softmax function corresponding to the 10 possible wood species of the dataset. The RMSprop optimizer was used with an initial learning rate of 0.001 and the decay rate developed by [11]. The CNN was trained using a categorical cross-entropy loss function. We balanced the dataset classes through the class_weight functionality of Keras. The CNN was run on an Nvidia RTX2070 graphics-processing unit with a batch size of 8. We evaluated CNN performance by several metrics: the mean of weighted F1-score, precision, recall, adjusted accuracy, averaged confusion matrices, and precision-recall by F1 isometric curves for the validation set.
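To make the loss and the class balancing concrete, the following plain-Python sketch computes a softmax, a categorical cross-entropy for one sample, and per-class weights. The weighting formula (n_samples / (n_classes × class_count)) is an assumption: it is a common "balanced" heuristic, and the text does not state which weights were actually passed to Keras. All function names are hypothetical.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """ASSUMED heuristic: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(logits, true_class, weights):
    """Categorical cross-entropy for one sample, scaled by its class weight."""
    probabilities = softmax(logits)
    return -weights[true_class] * math.log(probabilities[true_class])

# Toy labels: class 0 is twice as frequent as class 1.
labels = [0] * 40 + [1] * 20
weights = balanced_class_weights(labels)   # {0: 0.75, 1: 1.5}

# A misclassified sample of the rarer class contributes a larger loss.
loss = weighted_cross_entropy([2.0, 1.0], true_class=1, weights=weights)
```

A dictionary of this shape, mapping class index to weight, is the form that the `class_weight` argument of Keras `Model.fit` accepts.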

Results and Discussion
The modern InceptionV4_ResNetV2 CNN architecture was used to macroscopically classify ten hardwood species. The validation set varied from 341 to 342 images. The stratified cross-validation split the dataset such that the same percentage of images of each species was selected for the validation set. The longest fold took 89 epochs to train. The same decay rate was used for all folds; the learning rate decay is displayed in Figure 3.

When developing deep learning models, overfitting can be a crippling problem. Overfitting means that the validation loss tends to move in a U-shape as training progresses, such that the difference between training loss and validation loss increases. Figure 4 shows the averaged training and validation accuracies and the averaged training and validation losses for the five folds.
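The U-shape criterion can be turned into a simple check on the loss histories. The sketch below is a heuristic under stated assumptions, not the procedure used in this study: the function name `looks_overfit`, the `patience` threshold, and the toy loss values are all hypothetical.

```python
def looks_overfit(train_loss, val_loss, patience=3):
    """Flag a run whose validation loss has not improved for `patience` epochs
    and whose train/validation gap is wider at the end than at the best epoch."""
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    epochs_since_best = len(val_loss) - 1 - best_epoch
    gap_at_best = val_loss[best_epoch] - train_loss[best_epoch]
    gap_at_end = val_loss[-1] - train_loss[-1]
    return epochs_since_best >= patience and gap_at_end > gap_at_best

# Both losses keep falling and stay close together: no U-shape.
healthy = looks_overfit(
    train_loss=[1.0, 0.6, 0.4, 0.3, 0.25, 0.2],
    val_loss=[1.1, 0.7, 0.5, 0.38, 0.31, 0.25],
)

# Validation loss bottoms out at epoch 2 and then climbs: a U-shape.
overfit = looks_overfit(
    train_loss=[1.0, 0.6, 0.4, 0.3, 0.2, 0.1],
    val_loss=[1.1, 0.7, 0.6, 0.7, 0.8, 0.9],
)
```

Isolated spikes in the validation loss do not trip this check as long as a later epoch reaches a new minimum.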
Even though spikes in the validation loss were observed as training progressed, the variance decreased as the learning rate decreased. No overfitting was observed, for two reasons: (1) the validation loss did not follow the U-shape that characterizes overfitting, and (2) the difference between training and validation loss decreased over epochs. These behaviors demonstrated the stability of the model. The CNN training stage was considered converged when the accuracy approached 1.0. This finding supports that InceptionV4_ResNetV2 can correctly classify ten North American hardwood species from macroscopic images, so this model could be considered for use in a mobile application.
Table 1 shows the averaged model evaluation metrics per species. The overall adjusted accuracy for the entire model was 92.60% on the imbalanced unseen validation set. This result translates into 318 correct wood species identifications out of a possible 343, which validates the capability of this model to identify truly unseen data with high accuracy. Precision and recall metrics address different questions about model performance. These metrics are crucial to confirm that imbalanced datasets have been properly modeled. For our study, there were twice as many observations of some species as of others (e.g., Red Mulberry and Sassafras). Precision is the ratio of true positives to the sum of true positives and false positives. High precision indicates a low false positive rate.
Recall is the ratio of true positives to the sum of true positives and false negatives. It indicates the percentage of ground truths that were correctly predicted. In the case of imbalanced datasets, the F1-score is another useful metric; it is the harmonic mean of precision and recall. Figure 5 shows the precision-recall curve (PRC) with isometric F1 curves for this study.
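These definitions can be written out directly. The sketch below computes precision, recall, and F1 from raw counts, and also the precision that lies on an iso-F1 curve at a given recall, obtained by solving F1 = 2PR / (P + R) for P. The function names and the counts are illustrative and are not taken from Table 1.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1

def iso_f1_precision(f1, recall):
    """Precision on the iso-F1 curve at a given recall: P = F1 * R / (2R - F1)."""
    if recall <= f1 / 2:
        raise ValueError("no precision <= 1 reaches this F1 at such a low recall")
    return f1 * recall / (2 * recall - f1)

# Hypothetical counts for one species treated as the positive class.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)

# Points for drawing the F1 = 0.9 isometric curve on a precision-recall plot.
curve = [(r / 100, iso_f1_precision(0.9, r / 100)) for r in range(82, 101)]
```

With these counts, precision is 0.90 and recall is 0.75; the harmonic mean pulls F1 toward the lower of the two values.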
The PRC is important for characterizing model performance when moderate skewness of classes is present in the dataset. Following the recommendation of [15], we avoided the receiver operating characteristic (ROC) curve because it does not always translate into a realistic PRC when the classes are imbalanced. The PRC provides an accurate prediction of future classification performance because only the fraction of true positives among all positive predictions is evaluated, i.e., true negatives do not enter the PRC. In our case, all folds showed robust performance with a desirable clustering of curves toward the maximum precision and recall values. The overall area under the curve for all species was 0.98 (in comparison, pure random guessing would result in a quadrant of 0.50). The averaged confusion matrix is displayed in Figure 6.
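One common way to summarize a precision-recall curve as a single area is average precision, a step-wise sum over the ranked predictions. The sketch below is an illustration for a single species treated one-vs-rest; the text does not say which integration rule or averaging scheme produced the reported 0.98, so this is an assumption, and the function name and toy scores are hypothetical.

```python
def average_precision(scores, is_positive):
    """Area under the precision-recall curve as a step-wise sum (average precision).

    Ties between scores are broken arbitrarily by the sort.
    """
    total_positives = sum(is_positive)
    if total_positives == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    area = 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            true_positives += 1
            # Recall rises by 1 / total_positives here; weight it by current precision.
            area += (true_positives / rank) / total_positives
    return area

# Toy softmax scores for one species and whether each image truly is that species.
mixed = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
perfect = average_precision([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

True negatives never enter the sum, which is the property of the PRC highlighted above.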
The quality of our predictions can be verified by examining the confusion matrix, which shows model estimation quality. An ideal classifier would produce values of 1.0 on the diagonal of a confusion matrix, which would mean that all species have been correctly predicted by the model. Figure 6 shows a very low degree of confusion in our model. This is likely due to using stratified k-fold cross-validation on this limited dataset, which helps train an unbiased model by ensuring that every fold tested different randomized data. However, our model did struggle to classify Hackberry. It misclassified 6 images out of a total of 35 images. The images were confused with Sassafras. We believe that as more data are gathered and processed, this particular confusion can be ameliorated. In fact, the next step for this research is to collect more wood sample data, which is needed to increase the reliability of our model. As we increase the portfolio of species, we see future challenges related to image and model accuracy.
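A confusion matrix with 1.0 on the diagonal for a perfect classifier is a row-normalized matrix, and an averaged matrix is the element-wise mean over folds. The sketch below shows both steps under stated assumptions: the Hackberry row treats the quoted 6 of 35 as a single matrix row with all six errors assigned to Sassafras, the Sassafras row is invented, and the function names are hypothetical.

```python
def row_normalize(matrix):
    """Divide each row by its total so the diagonal holds per-class recall."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized

def average_matrices(matrices):
    """Element-wise mean of equally shaped matrices, for example one per fold."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / n for c in range(cols)]
            for r in range(rows)]

# Rows are true classes, columns are predicted classes: [Hackberry, Sassafras].
# Hackberry row: counts quoted in the text. Sassafras row: made up for illustration.
counts = [[29, 6],
          [1, 39]]
rates = row_normalize(counts)

# Averaging two hypothetical per-fold matrices.
mean_rates = average_matrices([rates, [[1.0, 0.0], [0.0, 1.0]]])
```

Here the Hackberry diagonal entry is 29/35, or about 0.83, which is the per-class recall for that species.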
To tackle these issues, we plan to deploy wood identification on mobile devices, without relying on remote data centers, and we plan to continuously update our increasingly comprehensive machine-learning models to sustain high accuracy.

Conclusions
In this study, we showed the feasibility of using the InceptionV4_ResNetV2 convolutional neural network to classify ten North American hardwood species with an adjusted accuracy of 92.60% and an area under the precision-recall curve of 0.98. We envision our highly accurate model being utilized to combat illegal logging and/or misrepresentation.
With a proven deep learning model, our future efforts will focus on developing a mobile application. Artificial intelligence applications are typically trained on workstations and powerful laptops that are not suited to harsh environmental or weather conditions. More recently, training and inference have been performed in cloud data centers. However, online processing of images via data centers requires internet availability, and data privacy is not guaranteed. Our vision of having wood identification applications on mobile devices avoids these issues. Mobile-first AI applications are a rapidly growing field that allows machine-learning apps to see, hear, sense, and think in real time. We plan to process images directly on mobile devices using TensorFlow Lite. End-users will need to follow only a few simple steps to become productive: loading the application on a smartphone, attaching an inexpensive macro lens, preparing the wood surface with clean cuts from a razor blade, and finally taking a snapshot for wood identification using our app. This wood identification workflow is fast and user-friendly, such that any person with minimal training could perform highly accurate wood identification. Mobile device-based machine-learning will help put wood identification capabilities directly into the hands of field technicians, sawmill operators, the matting industry, wood anatomists, border controllers, and the forest products industry as a whole.