A Deep Learning Method for Log Diameter Measurement Using Wood Images Based on Yolov3 and DeepLabv3+

Abstract: Wood volume is an important indicator in timber trading, and log diameter is one of the primary parameters used to calculate wood volume. Currently, the most common methods for measuring log diameters are manual measurement and visual estimation by log scalers, which are laborious, time-consuming, costly, and error-prone owing to the irregular placement of logs and the large number of logs. Additionally, these methods make it easy to misrepresent data for profit. This study proposes a deep-learning model for automatic log-diameter measurement from images to address these problems. The specific measures taken to improve the performance and accuracy of log-diameter detection are as follows: (1) A dual-network model combining the Yolov3 algorithm and the DeepLabv3+ architecture is constructed to adapt to different log-end color states and to account for the complexity of log-end faces. (2) The AprilTag vision library is added to estimate the camera position during image acquisition, enabling real-time adjustment of the shooting angle and reducing the effect of log-image deformation on the results. (3) The backbone network is replaced with a MobileNetv2 convolutional neural network so that the model can be migrated to mobile devices, which reduces the number of network parameters while maintaining detection accuracy. The training results show that the mean average precision (mAP) of log detection reaches 97.28% and the mean intersection over union (mIoU) of log segmentation reaches 92.22%. Comparisons with other measurement models demonstrate that the proposed model is accurate and stable in measuring log diameter under different environments and lighting conditions, with an average accuracy of 96.26%. In the forestry test, the measurement errors for the volume of an entire truckload of logs and for a single log diameter are 1.20% and 0.73%, respectively, which are less than the corresponding error limits specified in the industry standards. These results indicate that the proposed method provides a viable and cost-effective solution for measuring log diameters and offers the potential to improve the efficiency of log measurement and promote fair trade practices in the lumber industry.


Introduction
Measuring log diameter and length is an essential and recurring task in forestry work that provides vital data for calculating log volume [1]. By analyzing these data, forest plantation plans can be better managed. With the development of technology, harvesting machines are now commonly used to fell logs to equal lengths. However, manual methods are still employed for log-diameter measurement. Such measurement is labor-intensive and time-consuming, and the results are often subjective. This may lead measurers to exaggerate or falsely report data for personal gain, which negatively affects the fairness of market transactions [2,3]. At present, there are relatively few mature, commercially available methods for detecting log diameters. Therefore, the development of a new method for the rapid and objective measurement of log diameters is of practical importance. It would assist trading parties in promptly verifying timber quantities and in preventing log substitution or theft during transportation. Moreover, it would enable forest managers to better understand tree-related data, thereby enhancing the effectiveness of forest resource management.
With advancing technology, researchers are actively exploring efficient approaches for detecting and measuring logs. These methods are primarily categorized into laser-based [4][5][6] and vision-based approaches, depending on how the log data are acquired. While laser measurement can collect more precise data, vision-based methods offer advantages in equipment cost and portability. Computer vision technology has developed tremendously over the past few decades, offering new possibilities for log-diameter measurement. Image-processing techniques are commonly used for fast diameter estimation, and several studies have addressed log-region detection, log-end segmentation, and log-profile counting. Chen et al. [7] proposed a method to detect log-diameter classes using binocular vision. They achieved log-area detection by using a maximum threshold and connected-domain identification, and the log diameter was obtained by fitting a mathematical model to the segmented end face using the reconstructed 3D coordinates. Lin et al. [8] proposed a contour-recognition method for bundled logs. This method combined principal component analysis [9] with histogram statistics in the hue, saturation, and value (HSV) color space [10] to analyze the color features of the pictures and separate the log-end faces. Finally, the diameter of each identified log was obtained by applying reference-scale pixel calculations. Xinxiu et al. [11] obtained log pixels by transforming an image into the CIELAB color space [12] and applying K-means clustering [13] to the a* and b* color channels of the transformed image. The clustering results were then subjected to the Hough transform [14], and the logs were counted by Hough fitting while the connected regions were segmented. The counting accuracy was 95.78%. Although many log-measurement methods based on traditional image-processing techniques have been proposed, these approaches have strict requirements for lighting, log shape, degree of shading, and background. Images taken in an actual forestry field contain interference factors, such as different degrees of log shading, irregular cutting, and uneven lighting, that make it difficult for the above methods to be widely used for log-diameter measurement.
In recent years, convolutional neural networks, which can effectively learn features from training samples, particularly in image data analysis, have been widely used in agriculture and forestry. Kuznetsova et al. [15] used Yolov3 as the detection system for a fruit-picking robot and achieved an average apple detection time of 19 ms, with 7.8% misidentified apples and 9.2% unidentified apples. Cai et al. [16] proposed a method for segmenting spotted fragrant tree leaf images using a modified DeepLabv3+ network. The model exhibited excellent segmentation performance for different levels of scattered spots, allowing the disease condition to be assessed quickly and thus contributing to garden conservation. Zhu et al. [17] proposed a two-stage DeepLabv3+ algorithm with adaptive losses to segment apple leaf disease images in complex scenarios. The model achieved intersection-over-union (IoU) values of 98.70% for leaf segmentation and 86.56% for spot extraction, providing an effective solution for leaf and disease-spot extraction in complex environments. Convolutional neural networks have also been used to detect log ends. Samdangdech et al. [18] detected and counted log sections at the rear of lumber trucks by labeling pixels as log-end or non-log regions using the Single Shot MultiBox Detector (SSD) [19] and the VGG16 network [20]. Lin et al. [21] designed a wood-volume detection system with good robustness that combined Yolov3-tiny and Hough transforms.
Single-stage object-detection algorithms, such as the SSD and You Only Look Once (YOLO) algorithms [22], have been proposed to improve detection speed. Unlike the SSD series, which has redundant parameters and a large model structure, the YOLO series is characterized by its simple structure and fast recognition speed [23]. Within the YOLO family, Yolov3 performs better in detecting dense and small objects than other versions, and its stability and reliability have been confirmed in research. Detecting logs is essentially single-class dense object detection, so Yolov3 is well suited to detecting log ends. However, the ultimate goal is to obtain log diameters, and traditional image-processing methods cannot adapt well to tightly packed logs with complex end conditions. Therefore, a new end-face segmentation method is needed. Chen et al. [24] first proposed the DeepLab series of networks, based on the VGG16 network, as a representative algorithm for semantic segmentation. After continuous optimization and improvement, DeepLabv3+ was proposed; it adopts an encoder-decoder structure and strengthens the decoder section [25]. Consequently, the model achieves good results at the edges of segmented regions. Compared to segmentation based on image hue and grayscale, semantic segmentation offers better performance by dividing images into different objects at the pixel level. Using DeepLabv3+ for log-end segmentation can prevent adjacent log-end faces from merging during image processing; such merging enlarges the apparent log area and introduces an error in the fitting range, which in turn affects the log-diameter measurement results.
This study proposes a real-time, criteria-compliant method that combines two neural networks to measure log diameters at forestry sites. Compared to traditional methods, this approach offers greater adaptability, is capable of handling various lighting conditions, and facilitates a more convenient detection process. First, the applicability of the Yolov3 algorithm and the DeepLabv3+ architecture for log-diameter measurement was evaluated using a log-image dataset. Second, the AprilTag vision library was used to correct the shooting angle and reduce the influence of image deformation on the log-diameter measurement data. Finally, the log-measurement model was tested in a forestry farm to verify its feasibility and effectiveness.

Materials and Methods
The log-diameter measurement consisted of two steps: obtaining an image with the aid of AprilTag and obtaining the log diameter by processing the image using the trained model. The steps of the model used to obtain the log diameter are shown in Figure 1. First, single logs in the wood pile were separated using the MobileNetv2-Yolov3 network, and the pixel coordinates of each log relative to the image were obtained. Second, each single-log image was input into the MobileNetv2-DeepLabv3+ network to separate the log-end face and obtain its contour. Finally, the log diameter was obtained by fitting based on the log contour. In manual measurement, the shortest dimension of the log section is taken as its diameter. To accommodate this convention, an ellipse was fitted to the end face, and the length of its short axis was regarded as the diameter. The MobileNetv2-Yolov3 and MobileNetv2-DeepLabv3+ network structures are presented in Section 2.2. The adjustment to the optimal angle using AprilTag is described in Section 2.3.
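The final step, fitting an ellipse to the segmented end face and taking its short axis as the diameter, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the segmentation output is available as a binary mask (a list of rows of 0/1 values) and swaps in a moment-based ellipse estimate, which is exact only for a filled ellipse, in place of a contour-based fit. The names `minor_axis_px` and `ellipse_mask` are hypothetical.

```python
import math

def minor_axis_px(mask):
    """Estimate the minor-axis length (pixels) of the ellipse matching a
    binary mask, using second-order central moments of the foreground."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    if n == 0:
        raise ValueError("mask contains no foreground pixels")
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Smaller eigenvalue of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_min = half_trace - root
    # For a filled ellipse with semi-axis b, the variance along that axis
    # is b^2 / 4, so the full minor axis is 2b = 4 * sqrt(lam_min).
    return 4.0 * math.sqrt(lam_min)

def ellipse_mask(width, height, cx, cy, a, b):
    """Synthetic mask (assumption for the demo): a filled, axis-aligned
    ellipse with semi-axes a (horizontal) and b (vertical)."""
    return [[1 if ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0 else 0
             for x in range(width)] for y in range(height)]

mask = ellipse_mask(100, 80, 50, 40, 30, 20)
diameter_px = minor_axis_px(mask)  # close to 2 * 20 = 40 pixels
```

Because the estimate uses every foreground pixel, it is less sensitive to a ragged contour than a boundary fit, but a partially occluded end face would bias it toward a smaller diameter.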

Dataset
The images used in this study were obtained from eucalyptus trees that were felled in a forest in Nanning, Guangxi, China. The images depict stacked log ends at the side or back of a vehicle, as shown in Figure 2a,b, respectively. The imaging device was a Samsung Galaxy S10+ smartphone (model SM-G9750) with a 12-megapixel primary camera. The focal length and aperture of the camera remained constant throughout the entire shooting process. A total of 25 photographs were captured with the smartphone under natural light, with the logs positioned at the center of the photograph; the cut sections were free of visible obstructions such as foliage. The number of logs in each picture ranged from 500 to 700. A training dataset of 56 images was obtained for Yolov3 by non-overlapping cropping of the photographs. The DeepLabv3+ training dataset consisted of 750 images of individual logs recognized by Yolov3. Both datasets were labeled using the LabelImg annotation tool and randomly divided into training, testing, and validation datasets at the common ratios of 70%, 20%, and 10%, respectively. An original image and the corresponding Yolov3 labels, for which the annotation tool assigned rectangular ground-truth bounding boxes to the log-end faces, are shown in Figure 3a,b, respectively. Two rules were followed when annotating the Yolov3 dataset. First, each bounding box must enclose all pixels of the target; even when the target is partially occluded, the box should encompass the complete set of relevant pixels, drawing on the annotator's experience. Second, the starting or ending point of an annotation box must not coincide with the edges of the image, because this may cause errors during data processing by the network. The annotated data labels for DeepLabv3+ were converted into binary images, as shown in Figure 4, where red and black represent the log-section labeling and background, respectively.
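The random 70%/20%/10% division described above can be sketched as follows. This is an illustrative sketch only: the file names, the fixed seed, and the function name `split_dataset` are assumptions, and the source does not state how rounding was handled.

```python
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle items and split them into training, testing, and
    validation lists according to the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n_train = int(round(len(items) * ratios[0]))
    n_test = int(round(len(items) * ratios[1]))
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]      # remainder goes to validation
    return train, test, val

# Hypothetical file names standing in for the 750 single-log images.
names = [f"log_{i:04d}.png" for i in range(750)]
train, test, val = split_dataset(names)
```

For the 750 single-log images, this yields 525 training, 150 testing, and 75 validation images.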

Backbone Feature Extraction Network
MobileNetv1 is a lightweight model that was proposed by Google in 2017 for mobile devices [26]. MobileNetv2 is an upgraded version of MobileNetv1 that includes a bottleneck residual block (BRB) module consisting of three parts: a 1 × 1 convolution to increase the dimensionality of the input features, a 3 × 3 depthwise convolution to extract features, and a 1 × 1 convolution to reduce the dimensionality [27]. Thus, MobileNetv2 achieves higher accuracy while maintaining a small model size, which is beneficial for migrating the subsequent models to portable devices.
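To make the size benefit of the BRB concrete, the weight counts of its three stages can be tallied as in the sketch below. The expansion factor of 6 and the channel width of 32 are illustrative assumptions (the text above does not give them), biases and batch-normalization parameters are ignored, and the function names are hypothetical.

```python
def brb_param_counts(c_in, c_out, expansion=6, k=3):
    """Weight counts for the three stages of a bottleneck residual block."""
    hidden = c_in * expansion
    return {
        "expand_1x1": c_in * hidden,     # 1x1 conv raising dimensionality
        "depthwise_kxk": k * k * hidden, # kxk depthwise conv, one filter per channel
        "project_1x1": hidden * c_out,   # 1x1 conv reducing dimensionality
    }

def standard_conv_params(c_in, c_out, k=3):
    """Weight count of an ordinary kxk convolution, for comparison."""
    return k * k * c_in * c_out

counts = brb_param_counts(c_in=32, c_out=32)
total = sum(counts.values())
# An ordinary 3x3 convolution operating at the same expanded width.
dense_equivalent = standard_conv_params(32 * 6, 32 * 6)
```

With these assumed sizes the whole block needs 14,016 weights, whereas a single ordinary 3 × 3 convolution at the expanded width of 192 channels would need 331,776.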
The Darknet53 backbone in the Yolov3 network was replaced with MobileNetv2, which changed the feature-map fusion method. However, the other network structures remained unchanged, as shown in the network structure diagram in Figure 5, where the red dashed rectangle represents the MobileNetv2 network structure. After the replacement, for a 416 × 416 input image, the output of the 14th BRB was fused with the upsampled 13 × 13 feature map to obtain a 26 × 26 feature map. In addition, the output of the 7th BRB was fused with the upsampled 26 × 26 feature map to obtain a 52 × 52 feature map.
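The three feature-map sizes quoted above follow directly from the input size and the downsampling strides of the detection heads. The short sketch below assumes the usual Yolov3 strides of 32, 16, and 8, which are consistent with the 13, 26, and 52 grids stated in the text; the function name is hypothetical.

```python
def yolo_grid_sizes(input_size=416, strides=(32, 16, 8)):
    """Side length of each detection grid for a square input image."""
    return [input_size // s for s in strides]

grids = yolo_grid_sizes()
# Each 2x upsampling step doubles the grid so it can be fused with the
# backbone output at the next finer scale.
upsampled = [g * 2 for g in grids[:-1]]
```

Here `grids` is `[13, 26, 52]` and `upsampled` is `[26, 52]`, matching the two fusion steps described above.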

Experiment on the Best Shooting Angle
Different angles between the imaging equipment and the log pile may cause varying degrees of deformation of the log-end surfaces in the pictures, leading to deviations in the conversion of pixel diameters to physical diameters and thereby affecting the accuracy of diameter measurements. Therefore, AprilTag was used to obtain the camera position and adjust the camera placement angle in real time to reduce the impact of the shooting angle on the image [28].
AprilTag is a visual fiducial library, similar to QR codes or barcodes, that is widely used in robotics and camera calibration. Because each tag is unique, the algorithm can accurately locate an AprilTag even in a complex environment. Consequently, AprilTag can adapt to the changing environment of the forestry field. Before acquiring the angles, the camera was calibrated using Zhang's calibration method [29]. The intrinsic matrix of the camera was obtained, with x- and y-axis focal lengths of 3100.3 and 3101.8, respectively. The process of obtaining the angle was as follows. First, the pixel coordinates of the four tag vertices were returned by AprilTag. Second, these coordinates were combined with the intrinsic matrix of the camera and the corresponding points in the world coordinate system to obtain the rotation matrix R and translation vector T of the camera relative to the world coordinate system. Finally, the three-axis rotation angles of the camera coordinate system were computed from R using the method proposed by Slabaugh [30], which enabled real-time display of the angles and adjustment of the camera position.
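The last step, converting a rotation matrix into three rotation angles, can be sketched as below. This is a minimal sketch of the Euler-angle extraction described by Slabaugh, assuming the convention R = Rz(φ)·Ry(θ)·Rx(ψ) and returning only one of the two equivalent solutions; whether the authors used exactly this convention is an assumption. The pose-estimation step that produces R (for example, a perspective-n-point solver) is not shown, and `rotation_matrix` exists only to build a test input.

```python
import math

def rotation_matrix(psi, theta, phi):
    """Build R = Rz(phi) * Ry(theta) * Rx(psi); angles in radians."""
    cps, sps = math.cos(psi), math.sin(psi)
    cth, sth = math.cos(theta), math.sin(theta)
    cph, sph = math.cos(phi), math.sin(phi)
    return [
        [cth * cph, sps * sth * cph - cps * sph, cps * sth * cph + sps * sph],
        [cth * sph, sps * sth * sph + cps * cph, cps * sth * sph - sps * cph],
        [-sth, sps * cth, cps * cth],
    ]

def euler_angles(R, eps=1e-9):
    """Return (psi, theta, phi): rotations about the x, y, and z axes."""
    if abs(abs(R[2][0]) - 1.0) > eps:
        theta = -math.asin(R[2][0])
        c = math.cos(theta)
        psi = math.atan2(R[2][1] / c, R[2][2] / c)
        phi = math.atan2(R[1][0] / c, R[0][0] / c)
    else:
        # Gimbal lock: phi and psi are coupled, so phi is fixed at zero.
        phi = 0.0
        if R[2][0] < 0:  # R31 = -1
            theta = math.pi / 2
            psi = math.atan2(R[0][1], R[0][2])
        else:            # R31 = +1
            theta = -math.pi / 2
            psi = math.atan2(-R[0][1], -R[0][2])
    return psi, theta, phi

# Round trip on an arbitrary test pose (values chosen for illustration).
angles_in = (math.radians(5.0), math.radians(-12.0), math.radians(3.0))
angles_out = euler_angles(rotation_matrix(*angles_in))
angles_deg = [math.degrees(a) for a in angles_out]
```

In a live application, the angle about the axis of interest would be displayed to the operator, who turns the phone until it is close to zero.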
To determine the optimal shooting angle of the phone, variation curves of the AprilTag pixel edge lengths on the left, center, and right sides of the shooting screen were recorded against the camera's y-axis shooting angle. According to the calibration, the camera angle was zero when the camera was parallel to the shooting plane, negative when rotated counterclockwise, and positive when rotated clockwise. A schematic of the shooting process and the results of changing the shooting angle are shown in Figure 7. The camera was positioned 3400 mm from the wall at the same height as the AprilTags, and the side length of each AprilTag was 162 mm. The results showed that the side length of AprilTag A on the left side increased with increasing angle (Figure 7b), whereas that of AprilTag C on the right side decreased (Figure 7d). The edge length of AprilTag B at the center decreased and then increased as the angle increased (Figure 7c). A change in the shooting angle caused a change in the pixel length because the imaging principle of the camera can be approximated by a pinhole imaging model, in which the lengths of the object and its image satisfy:
$$\frac{h}{H} = \frac{f}{d} \quad (1)$$
where H denotes the length of the object, h denotes the imaging length of the object, f denotes the focal length of the camera, and d denotes the distance of the object from the camera. When d increases while f remains constant as the camera angle changes, the ratio f/d decreases. Because H is fixed, h decreases to satisfy the ratio, which explains the trend of the change in the edge length. For AprilTag A, for example, the distance between the camera and the tag gradually decreased as the camera rotated from left to right, leading to a gradual increase in pixel length. With a known focal length of 3101.8, an object distance of 3400 mm, and an object length of 162 mm, Equation (1) gives an imaging length of 147.8 pixels. Compared with the pixel-length curves, this value was closest to the measured pixel length when the angle was close to zero. Therefore, the image distortion is minimized when the shooting angle is close to zero.
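The imaging length quoted above can be checked numerically from the pinhole relation h = f·H/d. The sketch below assumes the focal length is expressed in pixels (as is usual for an intrinsic matrix, although the text does not state the unit) and that distances are in millimetres; the function name is hypothetical.

```python
def image_length_px(focal_px, object_len_mm, distance_mm):
    """Pinhole model: imaging length h = f * H / d."""
    return focal_px * object_len_mm / distance_mm

# Values from the text: f = 3101.8, H = 162 mm, d = 3400 mm.
h = image_length_px(3101.8, 162.0, 3400.0)
# Physical size represented by one pixel at the tag plane.
mm_per_px = 162.0 / h
```

The result rounds to 147.8 pixels, in agreement with the value reported in the text.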

Evaluation Indicators
mAP and mIoU were selected to evaluate the recognition and segmentation performances, respectively. mAP is the most commonly used evaluation index in object-detection research and represents the mean of the average precision (AP) over all categories. mAP is expressed as:
$$\mathrm{mAP} = \frac{1}{c}\sum_{i=1}^{c} \mathrm{AP}_i \quad (2)$$
where i denotes the i-th detection category and c denotes the number of detected categories. AP denotes the area enclosed by the precision (P)-recall (R) curve and the coordinate axes. AP, P, and R are respectively expressed as:
$$\mathrm{AP} = \int_0^1 P(R)\,\mathrm{d}R \quad (3)$$
$$P = \frac{TP}{TP + FP} \quad (4)$$
$$R = \frac{TP}{TP + FN} \quad (5)$$
where TP denotes the number of correctly determined positive samples, FN denotes the number of positive samples incorrectly determined as negative, and FP denotes the number of negative samples incorrectly determined as positive. mIoU is a standard measure for semantic segmentation, representing the average over all categories of the ratio of the intersection to the union of the predicted and ground-truth regions, which is expressed as:
$$\mathrm{mIoU} = \frac{1}{c}\sum_{i=1}^{c} \frac{TP_i}{TP_i + FP_i + FN_i} \quad (6)$$
where TP_i, FP_i, and FN_i are counted per pixel for category i. Because only one category (logs) was detected, mAP and AP were identical; the subsequent text therefore reports mAP.
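The precision, recall, and mIoU definitions can be illustrated with a small self-contained sketch. The tiny masks below are made-up toy data, treating the background as a second category when averaging is an assumption, and the function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive,
    and false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def iou(pred, truth, label):
    """Pixel-wise intersection over union for one category."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    denom = tp + fp + fn
    return tp / denom if denom else 0.0

def mean_iou(pred, truth, labels=(0, 1)):
    """Average the per-category IoU over the given labels."""
    return sum(iou(pred, truth, lab) for lab in labels) / len(labels)

# Toy 2 x 3 masks: 1 = log end face, 0 = background.
truth = [[0, 1, 1], [0, 1, 1]]
pred = [[0, 0, 1], [0, 1, 1]]
log_iou = iou(pred, truth, 1)   # 3 / 4
miou = mean_iou(pred, truth)    # (3/4 + 2/3) / 2
```

In this toy case one log pixel is missed, so the log IoU is 0.75, the background IoU is 2/3, and their mean is about 0.708.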

Training Environment
The network models were run on the PyTorch [31] platform under the Windows 11 operating system. The computer had a 13th Gen Intel(R) Core(TM) i7-13700K CPU with a clock rate of 3.40 GHz, 32 GB of RAM, and an NVIDIA GeForce GTX 1080 Ti graphics processor with 11 GB of video memory.
Appropriate selection of the learning rate is crucial for converging to a local minimum of the objective function within a reasonable time frame. Therefore, we compared the accuracy of the models at different initial learning rates before formal training. The results are presented in Figure 8, which shows that the MobileNetv2-Yolov3 model achieved the highest accuracy at a learning rate of 0.0001, whereas the MobileNetv2-DeepLabv3+ model performed best at a learning rate of 0.005.

Results of Model Training
Transfer learning can reduce the impact of insufficient data on the training results, allowing even small datasets to achieve good training performance [32]. Therefore, transfer learning was applied to the proposed model to improve log recognition and segmentation. The training parameters for MobileNetv2-Yolov3 were set as follows: batch size of 6, Adam optimizer, initial learning rate of 0.0001, and 320 epochs. The training parameters for MobileNetv2-DeepLabv3+ were set as follows: batch size of 10, Adam optimizer, initial learning rate of 0.005, and 100 epochs.
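For reference, the two sets of training parameters listed above can be collected in a small configuration structure. The sketch below only records the values stated in the text; the class name, field names, and the helper `steps_per_epoch` are hypothetical, and it does not reproduce the authors' training script.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    model: str
    batch_size: int
    optimizer: str
    initial_lr: float
    epochs: int

YOLO_CFG = TrainConfig("MobileNetv2-Yolov3", 6, "Adam", 1e-4, 320)
DEEPLAB_CFG = TrainConfig("MobileNetv2-DeepLabv3+", 10, "Adam", 5e-3, 100)

def steps_per_epoch(n_train_samples, cfg):
    """Number of optimizer steps per epoch (ceiling division)."""
    return -(-n_train_samples // cfg.batch_size)

# 70% of the 750 single-log images gives 525 training samples.
deeplab_steps = steps_per_epoch(525, DEEPLAB_CFG)
```

With 525 training images and a batch size of 10, each DeepLabv3+ epoch corresponds to 53 optimizer steps.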
The training results for the Yolov3 and DeepLabv3+ networks are shown in Figure 9, where black and red represent the changes in the loss value and accuracy rate, respectively. The loss value and accuracy rate tended to stabilize as the number of training iterations increased. Yolov3 exhibited a stable loss value after 100 iterations and a stable accuracy rate after 150 iterations. For DeepLabv3+, both the loss value and the accuracy rate stabilized after 80 iterations. The precision, recall, mAP, and mIoU values of the training dataset are listed in Table 1. Training yielded an mAP of 97.28% and an mIoU of 92.22%, indicating that the training achieved the expected effect.

Results of Log-Diameter Measurement
The model was tested in a forest, and the actual diameters of the measured logs were obtained by converting the pixel diameters using the ratio of the AprilTag's pixel edge length to its actual edge length. The measurement results are shown in Figure 10, with the fitting results in Figure 10d, where the yellow text denotes the fitted diameters of the corresponding logs. The Yolov3 and DeepLabv3+ models performed well in identification and segmentation despite cluttered environments and occluded log ends, demonstrating good stability, as shown in Figure 10b,c, respectively. Log measurements can take place in various settings depending on the needs of the forest farm. Figure 11 illustrates the measurement process in different environments, including on trucks and in log yards. While logs on trucks are typically stacked neatly, those in log yards are often randomly arranged, which can pose challenges for accurate measurement due to unstable positioning. However, the figure demonstrates that the method successfully handled log-diameter measurement in diverse scenarios, showcasing its adaptability. Although Figure 11a shows that a log end entirely covered by bark remained undetected, such instances are infrequent and can be recognized and avoided during image capture. Several vehicles loaded with logs were tested, and the comparative results for one vehicle are presented in Table 2. This table includes the number of logs in each diameter class and the total log volume of the vehicle. The log volume measured from the picture was 16.558 m³, while the volume provided by the forest site was 16.356 m³, a measurement error of 1.2%. Twenty-two logs were randomly selected, and their diameters measured by the model were compared with the manually measured values. The comparison results are listed in Table 3, revealing an average comprehensive error of 0.73%. These results demonstrate that the measurement errors of log volume and log diameter met the industry standards, which require errors of less than 3%. Owing to the natural shape of trees, the cross-sections of felled logs are typically close to circular, and they are similar across tree species. Pine logs were also successfully detected during testing, indicating that the method can be applied to other tree species as well. Therefore, the proposed method could serve as a new solution for log-diameter measurement.
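The pixel-to-physical conversion and the volume error reported above can be sketched as follows. The volumes and the 162 mm tag edge come from the text; the 147.8-pixel tag edge and the 200-pixel log diameter used in the example are illustrative values only, and the function names are hypothetical.

```python
def px_to_mm(length_px, tag_edge_mm, tag_edge_px):
    """Scale a pixel length using the AprilTag edge as the reference."""
    return length_px * tag_edge_mm / tag_edge_px

def relative_error_pct(measured, reference):
    """Relative error of a measurement, in percent of the reference."""
    return abs(measured - reference) / reference * 100.0

# Volume from the image (16.558 m^3) versus the forest-site value (16.356 m^3).
volume_error = relative_error_pct(16.558, 16.356)

# Hypothetical example: a fitted short axis of 200 pixels in an image
# where the 162 mm tag edge spans 147.8 pixels.
diameter_mm = px_to_mm(200.0, 162.0, 147.8)
```

The computed volume error is about 1.2%, matching the figure reported for the tested vehicle.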

Comparison of Training Performance of Different Backbone Networks
In the structural design, the original backbone networks were replaced with the MobileNetv2 network. To verify the effectiveness of the replacement, a comparison experiment between the original and replacement backbones was conducted, and the results are listed in Tables 4 and 5. The experimental results show that, for both the Yolov3 and DeepLabv3+ models, the number of parameters and the training time were significantly reduced after the replacement, while the accuracy was maintained at a similar level. These experiments demonstrate that replacing the backbone network was justified.

Performance Comparison of Different Segmentation Methods
Log-end faces show different states when they are affected by external factors (such as light conditions, shadows, humidity, and leaf shading). Segmentation stability across these different end-face states is a crucial challenge in measuring log diameter from images.
To validate the segmentation performance and robustness of the proposed dual-network model, the method was compared with K-means clustering and HSV thresholding. Thirty log photos from different scenes were tested, and the segmentation results of the dual-network model, K-means, and HSV were evaluated by counting the number of logs. The accuracy was measured as the ratio of successfully fitted logs to the total number of logs in each image, with average accuracies of 96.26%, 92.95%, and 69.09%, respectively. Some of the segmentation results are shown in Figure 12. The segmentation accuracies of the different methods for each test image are depicted in Figure 13. The dual-network model achieved more than 90% accuracy for every test image, demonstrating its superior segmentation performance. The proposed model exhibited good stability across various environments, lighting conditions, and log placements. Therefore, the dual-network model is appropriate for log-diameter measurement. In the test pictures (Figure 12, the second picture), we found that the performance of the model still needs to be improved when the log-end face is heavily occluded. Because deep learning depends on the training data, more log images from different forest scenes should be collected in the future to improve the robustness of the proposed method for log-diameter measurement.

Advantage Analysis of Dual-Network Detection System
The traditional measurement method is often time-consuming and labor-intensive, particularly when dealing with a large number of logs. To analyze the advantages of the proposed method, the detection system was compared with the traditional manual method in terms of measurement efficiency and cost.
Regarding measurement time, Figure 14 illustrates the log-diameter measurement process in a forest farm. Ordinarily, two people perform the manual measurement, with one taking measurements and the other recording data. A truck can contain 500-800 logs, and it takes about 30 min to manually measure the logs on each truck. Moreover, measuring logs at the top of the truck requires a ladder. With the proposed method, if the data are processed on a computer equipped with a 13th Gen Intel Core i7-13700K CPU, the diameters of all logs on a vehicle can be measured in about 5 min, a time saving of 25 min. Regarding measurement cost, the equipment required for the dual-network system comprises a computer and a mobile phone. The mobile phone is used to capture images, and the computer is used to analyze the images to obtain detection results. Taking pictures requires no additional training costs; the photographer only needs to be instructed to keep the mobile phone parallel to the log-end plane. While manual measurement workers require a monthly salary of about 5000 RMB, the measurement system only incurs the costs of purchasing a computer, a smartphone, and a printed AprilTag, all of which are reusable. Based on these comparative results, the proposed system effectively improves the efficiency of log-diameter measurement and reduces measurement costs. The implementation of this system will contribute to enhancing the efficiency of forestry surveys, alleviating the burden on staff, and offering new solutions for modern forestry management.

Conclusions
Log-diameter measurement is an important task in forestry. This study proposed a criteria-compliant log-diameter measurement model using a dual network combining Yolov3 and DeepLabv3+, with MobileNetv2 as the backbone network. The study can be summarized as follows.
1. The deformation of log images caused by shooting angles was reduced using AprilTags.
2. The proposed method was trained and evaluated using a log dataset and tested in a forest.
3. A comparative study was conducted to verify the segmentation advantages of the proposed method over other commonly used segmentation methods, namely K-means clustering and HSV threshold segmentation.
The proposed log-diameter measurement model worked quickly and accurately in forest farms and was adaptable and robust to different forest-farm measurement scenarios and log-end faces. Rapid and accurate measurement will help managers to manage and track the whole process of logs from harvesting to sale and to realize the digital management of forest resources. The results of the forestry tests showed that the measurement method met industry standards and could be promoted and applied, which is beneficial to forest resource management. Future research will focus on improving measurement accuracy and applicability by collecting more log image data to cover a wider range of log samples in terms of species, size, and condition, thereby enhancing the model's ability to generalize. Additionally, efforts will be made to refine the diameter conversion methods and to explore how to incorporate other types of data, such as infrared images, to enhance measurement accuracy.

Figure 2. Examples of stacked wood-end images captured in different scenes: (a) in the forest farm and (b) at the wood factory.

Figure 3. Examples of annotated images for Yolov3: (a) original image and (b) annotation.

Figure 4. Examples of annotated images for DeepLabv3+. Each group of images is separated by a dashed line; red and black represent the log-section labeling and background, respectively.

Figure 5. The structure of MobileNetv2-Yolov3.

Modified Aligned Xception is traditionally used as the backbone feature extraction network of DeepLabv3+. Here, it was replaced with the lightweight MobileNetv2 in the encoder section of DeepLabv3+. The network structure after the replacement is shown in Figure 6. The MobileNetv2 portion of the figure shows the specific structure of the replaced backbone network and the number of low-level feature output layers.

Figure 7. Schematic diagram and curves of the pixel length of AprilTag codes changing with shooting angle: (a) shooting schematic diagram, (b) AprilTag A, (c) AprilTag B, and (d) AprilTag C.

Figure 11. Log diameter measurements in different scenarios: (a) truck and (b) timber yard.

Figure 13. Segmentation accuracy of different methods for each test image.

Figure 14. The log diameter measurement process.

Table 2. Comparison of log volume measurements.

Table 3. Results of diameter measurements for randomly selected individual logs.

Table 4. Results of Yolov3 with different backbones.

Table 5. Results of DeepLabv3+ with different backbones.