Estimating Tree Position, Diameter at Breast Height, and Tree Height in Real-Time Using a Mobile Phone with RGB-D SLAM

: Accurate estimation of tree position, diameter at breast height (DBH), and tree height measurements is an important task in forest inventory. Mobile Laser Scanning (MLS) is an important solution. However, the poor global navigation satellite system (GNSS) coverage under the canopy makes the MLS system unable to provide globally-consistent point cloud data, and thus, it cannot accurately estimate the forest attributes. SLAM could be an alternative for solutions dependent on GNSS. In this paper


SLAM
SLAM is the process by which a mobile platform builds a map of the surrounding environment and finds its location using the map in real-time, as shown in Figure 1.

SLAM
SLAM is the process by which a mobile platform builds a map of the surrounding environment and finds its location using the map in real-time, as shown in Figure 1. is the motion vector describing the movement of the platform from time k − 1 to time k; is the vector describing the position of the jth landmark.
A mobile platform moves in an unknown area and, at the same time, separately observes the surrounding unknown landmarks and records their relative motion through the mounted visual sensors and motion sensors. Then, the relative positions of the landmarks and the pose of the mobile platform are estimated in real time. In probabilistic form, data from motion sensors are described as a motion model (a probability distribution): Here, , are the platform poses at times k and k − 1, respectively. Data from the vision sensors are described as an observation model (a probability distribution): Here, is the set of all landmarks. Generally, the SLAM problem is best described as a probabilistic Markov chain. That is, the joint posterior probability at time k, P( , | : , : , ), is solved using the posterior joint probability at time k − 1, P( , | : , : , ), and some other conditions. In the SLAM algorithm, this is implemented using recursive steps, namely, predictive update and observation update forms: Predictive update A mobile platform moves in an unknown area and, at the same time, separately observes the surrounding unknown landmarks and records their relative motion through the mounted visual sensors and motion sensors. Then, the relative positions of the landmarks and the pose of the mobile platform are estimated in real time. In probabilistic form, data from motion sensors u k are described as a motion model (a probability distribution): P(x k |x k−1 , u k ). (1) Here, x k , x k−1 are the platform poses at times k and k − 1, respectively. Data from the vision sensors z k are described as an observation model (a probability distribution): Here, m is the set of all landmarks. Generally, the SLAM problem is best described as a probabilistic Markov chain. That is, the joint posterior probability at time k, P(x k , m|Z 0:k , U 0:k , x 0 ), is solved using the posterior joint probability at time k − 1, P(x k−1 , m|Z 0:k−1 , U 0:k−1 , x 0 ), and some other conditions. In the SLAM algorithm, this is implemented using recursive steps, namely, predictive update and observation update forms: Predictive update P(x k , m|Z 0:k−1 , U 0:k , x 0 ) = P(x k |x k−1 , u k )P(x k−1 , m|Z 0:k−1 , U 0:k−1 , x 0 )dx k−1 Observation update P(x k , m|Z 0:k , U 0:k , x 0 ) = P(z k |x k , m)P(x k , m|Z 0:k−1 , U 0:k , x 0 ) P(z k |Z 0:k−1 , U 0:k ) Obviously, the key to solving the SLAM problem is to give a reasonable expression of the motion model and the observation model, and then the predictive update and observation update are used to solve the target probability distribution. Three commonly-used methods for solving SLAM problems are extended Kalman filtering (EKF), sparse nonlinear optimization, and particle filtering.

The Technology of a Portable Graph-SLAM Device
SLAM algorithms have been implemented to track device positions in real-time to build maps of the surrounding three-dimensional (3D) world. Though most SLAM software today works only on high powered computers, Project Tango [29] enables this technology to run on a portable mobile platform (Smartphone). Project Tango created a technology that enables mobile devices to acquire their poses in the three-dimensional world in real time using highly customized hardware and software. Three-dimensional maps can also be constructed by combining pose data with texture and depth information. Figure 2 shows the smartphone with Google Tango sensors (Lenovo Phab 2 Pro [31]) which was used in this paper. The mobile phone contains a combination of an RGB camera, a time of flight camera, and a motion-tracking camera called a vision sensor. The sensor is used to acquire texture and depth information of the 3D world so that the phone has a similar perspective of the real world. A 9-axis acceleration/gyroscope/compass sensor combined with the vision sensor can be used to implement a Visual-Inertial Odometer system using SLAM algorithms. Some special hardware, such as a computer-vision processor, is used to help speed up data processing so that the pose of the device can be acquired and adjusted in real-time. Obviously, the key to solving the SLAM problem is to give a reasonable expression of the motion model and the observation model, and then the predictive update and observation update are used to solve the target probability distribution. Three commonly-used methods for solving SLAM problems are extended Kalman filtering (EKF), sparse nonlinear optimization, and particle filtering.

The Technology of a Portable Graph-SLAM Device
SLAM algorithms have been implemented to track device positions in real-time to build maps of the surrounding three-dimensional (3D) world. Though most SLAM software today works only on high powered computers, Project Tango [29] enables this technology to run on a portable mobile platform (Smartphone). Project Tango created a technology that enables mobile devices to acquire their poses in the three-dimensional world in real time using highly customized hardware and software. Three-dimensional maps can also be constructed by combining pose data with texture and depth information. Figure 2 shows the smartphone with Google Tango sensors (Lenovo Phab 2 Pro [31]) which was used in this paper. The mobile phone contains a combination of an RGB camera, a time of flight camera, and a motion-tracking camera called a vision sensor. The sensor is used to acquire texture and depth information of the 3D world so that the phone has a similar perspective of the real world. A 9-axis acceleration/gyroscope/compass sensor combined with the vision sensor can be used to implement a Visual-Inertial Odometer system using SLAM algorithms. Some special hardware, such as a computer-vision processor, is used to help speed up data processing so that the pose of the device can be acquired and adjusted in real-time. The device uses a Visual-Inertial Odometer system for the front end and back end optimization algorithms implement adjustments to account for odometry drift. These adjustments are mainly done through pose graph non-linear optimization and loop closure optimization. These optimizations use visual features to identify previously-visited regions and then adjust the corresponding camera pose drift during this process.

Study Area
This study was conducted in a managed forest located in the suburbs of Beijing, China (N39°59′ E116°11′, Figure 3). We took nine (12 m × 12 m) square plots based on a wide range of DBH and tree height values. The selected plots had fewer shrubs and were accessible to locals. Table 1 presents the summarized basic description of the plots. The device uses a Visual-Inertial Odometer system for the front end and back end optimization algorithms implement adjustments to account for odometry drift. These adjustments are mainly done through pose graph non-linear optimization and loop closure optimization. These optimizations use visual features to identify previously-visited regions and then adjust the corresponding camera pose drift during this process.

Study Area
This study was conducted in a managed forest located in the suburbs of Beijing, China (N39 • 59 E116 • 11 , Figure 3). We took nine (12 m × 12 m) square plots based on a wide range of DBH and tree height values. The selected plots had fewer shrubs and were accessible to locals. Table 1 presents the summarized basic description of the plots.

The System Workflow
An application was developed to enable the SLAM mobile phone to be used for forestry inventory. Figure 4 shows the workflow of our hardware and software system. The SLAM system uses an RGB-D camera and an inertial measurement unit (IMU) as inputs and produces RGB images, point clouds, poses, and time stamps. Our forest inventory system uses these data, and then interacts with the Android system to show the results on a screen or accept instructions from users. The forest inventory system includes mapping the plot ground and tree-by-tree estimation, as shown in Figure  5. The mapping process should provide a globally consistent map and the plot coordinate system for tree-by-tree estimation.

The System Workflow
An application was developed to enable the SLAM mobile phone to be used for forestry inventory. Figure 4 shows the workflow of our hardware and software system. The SLAM system uses an RGB-D camera and an inertial measurement unit (IMU) as inputs and produces RGB images, point clouds, poses, and time stamps. Our forest inventory system uses these data, and then interacts with the Android system to show the results on a screen or accept instructions from users. The forest inventory system includes mapping the plot ground and tree-by-tree estimation, as shown in Figure 5. The mapping process should provide a globally consistent map and the plot coordinate system for tree-by-tree estimation.

The System Workflow
An application was developed to enable the SLAM mobile phone to be used for forestry inventory. Figure 4 shows the workflow of our hardware and software system. The SLAM system uses an RGB-D camera and an inertial measurement unit (IMU) as inputs and produces RGB images, point clouds, poses, and time stamps. Our forest inventory system uses these data, and then interacts with the Android system to show the results on a screen or accept instructions from users. The forest inventory system includes mapping the plot ground and tree-by-tree estimation, as shown in Figure  5. The mapping process should provide a globally consistent map and the plot coordinate system for tree-by-tree estimation.

Mapping of the Plot Ground
The plot ground was mapped before observing the trees. That mapping was used to correct the poses of the mobile phone when observing the trees through loop-closure detection and pose graph nonlinear optimization. To obtain a globally consistent plot ground map, the scan path was designed as shown in Figure 6a. The plot center was used as the starting point of the mapping process towards the north of the plot, while the end was near to the north point. Figure 6b shows a typical instance of mapping path.

Building the Plot Coordinate System
Before measuring the tree attributes, we built a new coordinate system during the mapping process to describe the position of each tree in the plot. This coordinate system, known as the Plot Coordinate System (PCS), has the center of the plot as its origin, and the horizontal eastward/horizontal northward/vertical upward directions were defined as the x/y/z-axis directions. However, the mobile phone defined a coordinate system known as the Initial Coordinate System (ICS), in which the origin is the position of the device when it is initiated, the x-axis is towards the right side of the phone screen, and the y-axis is directed vertically upward, while the z-axis is towards the mobile phone screen. Conversion of the coordinate system was needed after the center and the north corner of the plot had been tapped on the screen during the mapping process, as shown in Figure 7a-c. The center of the plot was tapped at the beginning of the mapping process, and the north corner was tapped at the end of the mapping process. After that, the transformation matrix between the PCS and the ICS was obtained. Then, the PCS was transformed into the OpenGL coordinate system and displayed as shown in Figure 7d.

Mapping of the Plot Ground
The plot ground was mapped before observing the trees. That mapping was used to correct the poses of the mobile phone when observing the trees through loop-closure detection and pose graph nonlinear optimization. To obtain a globally consistent plot ground map, the scan path was designed as shown in Figure 6a. The plot center was used as the starting point of the mapping process towards the north of the plot, while the end was near to the north point. Figure 6b shows a typical instance of mapping path.
Remote Sens. 2018, 9, x FOR PEER REVIEW 6 of 19 Figure 5. The workflow of the forest inventory system for data acquirement.

Mapping of the Plot Ground
The plot ground was mapped before observing the trees. That mapping was used to correct the poses of the mobile phone when observing the trees through loop-closure detection and pose graph nonlinear optimization. To obtain a globally consistent plot ground map, the scan path was designed as shown in Figure 6a. The plot center was used as the starting point of the mapping process towards the north of the plot, while the end was near to the north point. Figure 6b shows a typical instance of mapping path.

Building the Plot Coordinate System
Before measuring the tree attributes, we built a new coordinate system during the mapping process to describe the position of each tree in the plot. This coordinate system, known as the Plot Coordinate System (PCS), has the center of the plot as its origin, and the horizontal eastward/horizontal northward/vertical upward directions were defined as the x/y/z-axis directions. However, the mobile phone defined a coordinate system known as the Initial Coordinate System (ICS), in which the origin is the position of the device when it is initiated, the x-axis is towards the right side of the phone screen, and the y-axis is directed vertically upward, while the z-axis is towards the mobile phone screen. Conversion of the coordinate system was needed after the center and the north corner of the plot had been tapped on the screen during the mapping process, as shown in Figure 7a-c. The center of the plot was tapped at the beginning of the mapping process, and the north corner was tapped at the end of the mapping process. After that, the transformation matrix between the PCS and the ICS was obtained. Then, the PCS was transformed into the OpenGL coordinate system and displayed as shown in Figure 7d.

Building the Plot Coordinate System
Before measuring the tree attributes, we built a new coordinate system during the mapping process to describe the position of each tree in the plot. This coordinate system, known as the Plot Coordinate System (PCS), has the center of the plot as its origin, and the horizontal eastward/horizontal northward/vertical upward directions were defined as the x/y/z-axis directions. However, the mobile phone defined a coordinate system known as the Initial Coordinate System (ICS), in which the origin is the position of the device when it is initiated, the x-axis is towards the right side of the phone screen, and the y-axis is directed vertically upward, while the z-axis is towards the mobile phone screen. Conversion of the coordinate system was needed after the center and the north corner of the plot had been tapped on the screen during the mapping process, as shown in Figure 7a-c. The center of the plot was tapped at the beginning of the mapping process, and the north corner was tapped at the end of the mapping process. After that, the transformation matrix between the PCS and the ICS was obtained. Then, the PCS was transformed into the OpenGL coordinate system and displayed as shown in Figure 7d.

Estimation of the Stem Position, DBH, and Tree Height
Estimation of the stem position and DBH After mapping the plot ground, we were able to observe each tree individually. While observing a standing tree, a point near the bottom of the tree was acquired to determine the breast height and the approximate stem position (see Figure 7e,f). After the bottom point had been taken, the breast height was displayed on the screen as shown in Figure 7f. Then, the position of the stem center and DBH were calculated using the current camera pose and the point cloud ( Figure 8).

Estimation of the Stem Position, DBH, and Tree Height
Estimation of the stem position and DBH After mapping the plot ground, we were able to observe each tree individually. While observing a standing tree, a point near the bottom of the tree was acquired to determine the breast height and the approximate stem position (see Figure 7e,f). After the bottom point had been taken, the breast height was displayed on the screen as shown in Figure 7f. Then, the position of the stem center and DBH were calculated using the current camera pose and the point cloud ( Figure 8). For convenience of operation, an auxiliary coordinate system (Auxiliary Camera Coordinate System, ACCS) was established with a similar origin to that of the camera coordinate system (CCS), as the y-axis direction was same as the y-axis direction of the PCS (vertically upward), and the z-axis was in the plane formed by the new y-axis and the z-axis of the CCS. The point cloud data were constructed and transformed into the ACCS. The points belonging to the tree were filtered according to the bottom point ( _ ). Figure 9 shows the scatter plot of these points on the plane O − x − z , while the filter conditions were set to The marginal distribution of all filtered points in the x-axis direction was approximately uniformly distributed ( Figure 10). When the mean ( μ ) and variance ( σ ) of the points were calculated, the range of the stem was ( μ − √3σ , μ + √3σ ) in the x-axis direction. RANSAC (Random Sample Consensus) was used to improve the robustness of the interval in this process ( Figure 11). For convenience of operation, an auxiliary coordinate system (Auxiliary Camera Coordinate System, ACCS) was established with a similar origin to that of the camera coordinate system (CCS), as the y-axis direction was same as the y-axis direction of the PCS (vertically upward), and the z-axis was in the plane formed by the new y-axis and the z-axis of the CCS. The point cloud data were constructed and transformed into the ACCS. The points belonging to the tree were filtered according to the bottom point (x ac_bottom ). Figure 9 shows the scatter plot of these points on the plane O ac − x ac − z ac , while the filter conditions were set to Remote Sens. 2018, 9, x FOR PEER REVIEW 8 of 19

Estimation of the Stem Position, DBH, and Tree Height
Estimation of the stem position and DBH After mapping the plot ground, we were able to observe each tree individually. While observing a standing tree, a point near the bottom of the tree was acquired to determine the breast height and the approximate stem position (see Figure 7e,f). After the bottom point had been taken, the breast height was displayed on the screen as shown in Figure 7f. Then, the position of the stem center and DBH were calculated using the current camera pose and the point cloud ( Figure 8). For convenience of operation, an auxiliary coordinate system (Auxiliary Camera Coordinate System, ACCS) was established with a similar origin to that of the camera coordinate system (CCS), as the y-axis direction was same as the y-axis direction of the PCS (vertically upward), and the z-axis was in the plane formed by the new y-axis and the z-axis of the CCS. The point cloud data were constructed and transformed into the ACCS. The points belonging to the tree were filtered according to the bottom point ( _ ). Figure 9 shows the scatter plot of these points on the plane O − x − z , while the filter conditions were set to The marginal distribution of all filtered points in the x-axis direction was approximately uniformly distributed ( Figure 10). When the mean ( μ ) and variance ( σ ) of the points were calculated, the range of the stem was ( μ − √3σ , μ + √3σ ) in the x-axis direction. RANSAC (Random Sample Consensus) was used to improve the robustness of the interval in this process ( Figure 11). The marginal distribution of all filtered points in the x-axis direction was approximately uniformly distributed ( Figure 10). When the mean (µ x ) and variance (σ x ) of the points were calculated, the range of the stem was (µ in the x-axis direction. RANSAC (Random Sample Consensus) was used to improve the robustness of the interval in this process ( Figure 11). The two edges of the stem were underestimated due to the assumption of uniform distribution. The stem edges were adjusted by linear fitting the points near to the stem edges, and the point on each fitted line that was farthest from the previous estimation interval was used as the boundary point ( Figure 12); RANSAC was used in the process too. The TOF camera uses the principle of perspective projection to obtain images. So, when the stem edge points were defined as and , both were the tangent points of the stem circle and the lines and . The stem circle needed to meet the following conditions: ① Line and had to be the tangents of the circle and points , had to be the tangent points; and ② The points between point and point had to be in the circle. Condition ① could be expressed as the cost functions: The two edges of the stem were underestimated due to the assumption of uniform distribution. The stem edges were adjusted by linear fitting the points near to the stem edges, and the point on each fitted line that was farthest from the previous estimation interval was used as the boundary point ( Figure 12); RANSAC was used in the process too. The TOF camera uses the principle of perspective projection to obtain images. So, when the stem edge points were defined as and , both were the tangent points of the stem circle and the lines and . The stem circle needed to meet the following conditions: ① Line and had to be the tangents of the circle and points , had to be the tangent points; and ② The points between point and point had to be in the circle. Condition ① could be expressed as the cost functions: The two edges of the stem were underestimated due to the assumption of uniform distribution. The stem edges were adjusted by linear fitting the points near to the stem edges, and the point on each fitted line that was farthest from the previous estimation interval was used as the boundary point ( Figure 12); RANSAC was used in the process too. The two edges of the stem were underestimated due to the assumption of uniform distribution. The stem edges were adjusted by linear fitting the points near to the stem edges, and the point on each fitted line that was farthest from the previous estimation interval was used as the boundary point ( Figure 12); RANSAC was used in the process too. The TOF camera uses the principle of perspective projection to obtain images. So, when the stem edge points were defined as and , both were the tangent points of the stem circle and the lines and . The stem circle needed to meet the following conditions: ① Line and had to be the tangents of the circle and points , had to be the tangent points; and ② The points between point and point had to be in the circle. Condition ① could be expressed as the cost functions: The TOF camera uses the principle of perspective projection to obtain images. So, when the stem edge points were defined as T 1 and T 2 , both were the tangent points of the stem circle and the lines T 1 O ac and T 2 O ac . The stem circle needed to meet the following conditions: 1 Line T 1 O ac and T 2 O ac had to be the tangents of the circle and points T 1 , T 2 had to be the tangent points; and 2 The points between point T 1 and point T 1 had to be in the circle. Condition 1 could be expressed as the cost functions: Here, r is the residual to be optimized; w is the weight of the cost function; X T 1 = (x T 1 , z T 1 ) and X T 2 = (x T 2 , z T 2 ) are the coordinates of points T 1 and T 2 in the plane O ac − x ac − z ac ; X c = (x c , z c ) is the stem center coordinate; and d is the DBH of the tree. Each point in Condition 2 can be constructed as the cost function Here, X i = (x i , z i ) is one of the points between points T 1 and T 2 in the circle. Because condition 1 is more important for the circle fitting than condition 2 , the weight w should have a large value. In this paper, it was determined according to Condition 2 ; if the number of points in the optimization process was defined as N in condition 2 , the weight w was equal to N/4. After the cost functions had been constructed, the Levenberg Marquardt algorithm was used to fit the stem circle. To increase the robustness of the optimization result, RANSAC was used in the fitting process. Figure 13 shows the fitting circle. Then, the stem position coordinate was converted into the PCS and projected into the OpenGL coordinate system, so the result could be viewed from the display (Figure 7g).
Here, r is the residual to be optimized; is the weight of the cost function; X = x , z and X = x , z are the coordinates of points and in the plane O − x − z ; X = (x , z ) is the stem center coordinate; and d is the DBH of the tree. Each point in Condition ② can be constructed as the cost function Here, X = (x , z ) is one of the points between points and in the circle. Because condition ① is more important for the circle fitting than condition ②, the weight should have a large value. In this paper, it was determined according to Condition ② ; if the number of points in the optimization process was defined as N in condition ②, the weight was equal to N/4. After the cost functions had been constructed, the Levenberg Marquardt algorithm was used to fit the stem circle. To increase the robustness of the optimization result, RANSAC was used in the fitting process. Figure 13 shows the fitting circle. Then, the stem position coordinate was converted into the PCS and projected into the OpenGL coordinate system, so the result could be viewed from the display ( Figure  7g).

Estimation of the tree height
To estimate the height of the tree, the device was kept at a position where the treetop could be observed simultaneously (Figure 7h). After tapping the treetop position on the screen, the tree height was calculated in real time. As the treetop was on the connection line between the optical center of the camera and the tapped pixel (Figure 14), the coordinates of the treetop point were expressed as Here, _ = _ , y _ , z _ is the coordinate of the treetop point in the PCS; _ is the translation vector of the camera in the PCS; R _ is the rotation matrix of the camera in the PCS; K is the intrinsic matrix of the camera; = (u, v, 1) is the coordinate of the treetop pixel in the pixel coordinate system; is an unknown factor. However, additional conditions were needed to determine k. If the standing tree was assumed to be vertical, the treetop would be on the vertical line passing through the center of the stem, but since a natural stem is not exactly vertical and the tapped treetop pixel showed deviation, the two lines were difficult to intersect or even impossible to intersect. To solve this problem, we assumed that the treetop was on the plane of the stem center, whose normal vector is denoted by

Estimation of the tree height
To estimate the height of the tree, the device was kept at a position where the treetop could be observed simultaneously (Figure 7h). After tapping the treetop position on the screen, the tree height was calculated in real time. As the treetop was on the connection line between the optical center of the camera and the tapped pixel (Figure 14), the coordinates of the treetop point were expressed as Here, x p_top = (x p_top , y p_top , z p_top ) T is the coordinate of the treetop point in the PCS; t p_c is the translation vector of the camera in the PCS; R p_c is the rotation matrix of the camera in the PCS; K is the intrinsic matrix of the camera; u = (u, v, 1) T is the coordinate of the treetop pixel in the pixel coordinate system; k is an unknown factor. However, additional conditions were needed to determine k. If the standing tree was assumed to be vertical, the treetop would be on the vertical line passing through the center of the stem, but since a natural stem is not exactly vertical and the tapped treetop pixel showed deviation, the two lines were difficult to intersect or even impossible to intersect. To solve this problem, we assumed that the treetop was on the plane of the stem center, whose normal vector is denoted by Here, x p_center = (x p_center , y p_center , z p_center ) T is the central coordinate of the stem. In this way, the treetop coordinate satisfied (x p_top − x p_center )·n = 0.
Then, after being calculated from (11) and (13), the treetop coordinate x p_top was Naturally, the tree height h equaled As shown in Figure 7i,j, the treetop point x p_top was projected into the OpenGL coordinate system and displayed on the screen.
Here, _ = x _ , y _ , z _ is the central coordinate of the stem. In this way, the treetop coordinate satisfied Then, after being calculated from (11) and (13), the treetop coordinate _ was Naturally, the tree height h equaled h = y _ − y _ + 1.3.
As shown in Figures 7i,j, the treetop point _ was projected into the OpenGL coordinate system and displayed on the screen. is the center of the stem at breast height; is the optical center of the camera; is the treetop; is the treetop pixel on the image; is the horizontal component vector of the O vector; and is the plane whose normal vector is and on which is a point.

Evaluation of the Accuracy of the Stem Position, DBH and Tree Height Measurements
Stem position references were determined by total station (Leica Flexline TS06plus Total Station) measurement data, and DBH measurements. When measuring the position of a tree, the total station was installed in the center of the plot. The angle between the stem and north and the horizontal distance from the plot center to the trunk were measured by the total station and recorded. The true distance true position of the stem was calculated from the distance and angle, and the center point of the stem was needed for true position calculation. The stem position errors were described by a mean vector and a two-dimensional covariance matrix Σ, as follows: Here, (x , y ) is the ith position measurement. (x , y ) is the ith reference, and is the number of estimations. The direction of the eigenvector corresponding to the largest eigenvalue of the covariance matrix is the direction with the greatest variability. The standard deviation of this O is the center of the stem at breast height; C is the optical center of the camera; T is the treetop; U is the treetop pixel on the image; n is the horizontal component vector of the OC vector; and P is the plane whose normal vector is n and on which O is a point.

Evaluation of the Accuracy of the Stem Position, DBH and Tree Height Measurements
Stem position references were determined by total station (Leica Flexline TS06plus Total Station) measurement data, and DBH measurements. When measuring the position of a tree, the total station was installed in the center of the plot. The angle between the stem and north and the horizontal distance from the plot center to the trunk were measured by the total station and recorded. The true distance true position of the stem was calculated from the distance and angle, and the center point of the stem was needed for true position calculation. The stem position errors were described by a mean vector µ and a two-dimensional covariance matrix Σ, as follows: Here, (x i , y i ) is the ith position measurement. (x ir , y ir ) is the ith reference, and n is the number of estimations. The direction of the eigenvector corresponding to the largest eigenvalue of the covariance matrix is the direction with the greatest variability. The standard deviation of this direction is equal to the square root of the largest eigenvalue, which can describe the variability of the trunk position, and can be expressed as σ max = max(eigenvalues(Σ)).
The references of the DBH were measured using a taper and the references of the tree height were measured using a total station. The accuracy of the DBH and tree height measurements were evaluated using BIAS, root mean square error (RMSE), relative BIAS, and relative RMSE, as defined in the following equations: Here, x i is the ith measurement. x ir is the ith reference. n is the number of estimations. RMSE is also a way of describing the accuracy of the tree position by computing the values in the x-axis and y-axis directions, respectively.

Evaluation of Tree Position
The estimated stem positions are given in Figure 15. The accuracy of the stem position was checked with a total station instrument. Our results showed that the range of the average error was approximately −0.12 m to 0.13 m in both the x-axis and y-axis directions, as shown in Table 2. In addition, no significant correlation (approximately −0.08 to 0.35) was found between the errors in the x-axis and y-axis directions, so the standard deviations of the maximum variability direction (0.09~0.16 m) were close to the x-axis (0.05~0.16 m) and y-axis (0.05~0.13 m). The RMSEs in the x-axis (0.09~0.17 m) and y-axis (0.07~0.17 m) directions were also close. As shown in Figure 16, although the tree position estimated for each plot had systematic errors, there was a weak error overall. This shows that the systematic errors were random in different sample plots. The overall standard deviation in the maximum variability direction was small (0.10 m) in all plots. Table 2. Accuracy of the stem position estimations using the SLAM smartphone.

Evaluation of DBH
The DBH values measured by the smartphone with RGB-D sensor were compared with field data that were measured using diameter tape. Our results showed that our estimated values of DBH were similar to the reference data obtained in the field (Figure 17). Table 3 describes the statistical results of the DBHs in different plots. A 1.26 cm (6.39%) RMSE and a 0.33 cm (1.78%) BIAS is shown for all plots. Figure 18 shows that for smaller DBH values, our estimated results were larger than the reference value, while the case was reversed for larger DBH values. It can also be seen from the figure that the dispersion of observations was relatively stable.

Evaluation of DBH
The DBH values measured by the smartphone with RGB-D sensor were compared with field data that were measured using diameter tape. Our results showed that our estimated values of DBH were similar to the reference data obtained in the field (Figure 17). Table 3 describes the statistical results of the DBHs in different plots. A 1.26 cm (6.39%) RMSE and a 0.33 cm (1.78%) BIAS is shown for all plots. Figure 18 shows that for smaller DBH values, our estimated results were larger than the reference value, while the case was reversed for larger DBH values. It can also be seen from the figure that the dispersion of observations was relatively stable.

Evaluation of DBH
The DBH values measured by the smartphone with RGB-D sensor were compared with field data that were measured using diameter tape. Our results showed that our estimated values of DBH were similar to the reference data obtained in the field (Figure 17). Table 3 describes the statistical results of the DBHs in different plots. A 1.26 cm (6.39%) RMSE and a 0.33 cm (1.78%) BIAS is shown for all plots. Figure 18 shows that for smaller DBH values, our estimated results were larger than the reference value, while the case was reversed for larger DBH values. It can also be seen from the figure that the dispersion of observations was relatively stable.

Evaluation of Tree Height
Our estimated results of the tree heights were similar to the reference field data measured by total stations (Figure 19). The statistical results are summarized in Table 4. It is shown that the estimations generally had smaller BIAS (0.07 m, 0.54%) and RMSE (0.55 m, 3.71%) values, except plot 5 and plot 6. The dispersion of observed values increased as the tree height increased, as shown in Figure 20.

Evaluation of Tree Height
Our estimated results of the tree heights were similar to the reference field data measured by total stations (Figure 19). The statistical results are summarized in Table 4. It is shown that the estimations generally had smaller BIAS (0.07 m, 0.54%) and RMSE (0.55 m, 3.71%) values, except plot 5 and plot 6. The dispersion of observed values increased as the tree height increased, as shown in Figure 20.

Evaluation of Tree Height
Our estimated results of the tree heights were similar to the reference field data measured by total stations (Figure 19). The statistical results are summarized in Table 4. It is shown that the estimations generally had smaller BIAS (0.07 m, 0.54%) and RMSE (0.55 m, 3.71%) values, except plot 5 and plot 6. The dispersion of observed values increased as the tree height increased, as shown in Figure 20.

Discussion
SLAM technology provides a solution for positioning without GNSS signals in places like forests. In this study, we measured the positions, DBHs, and heights of trees in real-time using the poses and dense point cloud data provided by a phone with RGB-D SLAM.
The tree positions were obtained from the device pose and a circle was fitted to the points around the breast height of each tree trunk. The drift of the phone pose mainly affected the accuracy of the tree positions. In this paper, a global consistency map of the plot ground was pre-built; it can be used to reduce the drift by loop-closure detection and pose graph nonlinear optimization. Our result showed an error of approximately −0.12 to 0.13 m on both the x-axis and the y-axis and a 0.1~0.16 m standard deviation in the maximum variability direction; the RMSEs were 0.09~0.17 m and 0.07~0.17 m in the x-axis and the y-axis, respectively. Many studies have examined the extraction of tree positions, but few of them given results. Reference [22] used an unmanned ground vehicle with 3D LiDAR and graph-SLAM to scan the study area, and the error for the relative distance estimation between trees was 0.0476 m. Reference [25] used a Small-Footprint Mobile LiDAR to scan the study area, resulting in RMSEs of the motion trajectories in the two different areas of 0.32 m and 0.29 m, respectively, but no detail of the positional accuracy of the trees was provided. However, those studies were conducted over a relatively larger study areas than ours; therefore, our method of using the mobile phone with RGB-D SLAM needs to be tested for accuracy over large areas. Reference [33] used Tango-based point clouds to extract tree positions offline, showing an RMSE value of 0.105 m for distances from the central point in the better scanning pattern, which is similar to our results.
We attempted to attain the DBH values from point clouds. The method has been widely studied; Reference [34] extracted DBH values by fitting circles to the point cloud at breast height using three different algorithms (Lemen, Pratt and Taubin), and the results showed that the BIAS value was approximately −0.12 to 0.07 cm and the RMSE was 1.47~2.43 cm in single scan mode, while in merged scan mode, the BIAS was approximately −0.32 to 0 cm, and the RMSE was 0.66~1.21 cm. Similarly, reference [15] extracted DBH values by fitting the cylindrical and point cloud data from TLS that showed a BIAS value of approximately −0.18 to 0.76 cm and an RMSE value of 0.74~2.41 cm when using the single-scan method; however, the BIAS value was 0.11~0.77 cm and the RMSE was 0.90~1.90 cm when using the multi-single-scan method. Reference [35] extracted DBH values using the Hough transformation and the RANSAC algorithm; their results showed a BIAS value of approximately −0.6 to 1.3 cm and an RMSE value of 2.6~5.9 cm. Reference [33] obtained DBHs from Tango-based point clouds, and the RMSEs were in the range of 1.61~2.10 cm. This paper showed a 1.26 cm (6.39%) RMSE and a 0.33 cm (1.78%) BIAS. In our study, the point clouds for detecting DBH originated from a TOF camera instead of a LiDAR; therefore, our work increased the tangential and tangent point constraints by detecting the trunk boundaries and using the perspective projection principle of the TOF camera. This could be a possible reason why higher accuracies were obtained with a low-quality point cloud compared to LiDAR. In addition, the AR technology in this paper showed the fitted circle at breast height on the screen in real-time, making it possible for the observer to judge whether it needed to be re-fitted or not, thus artificially reducing the interference caused by noise points.
The method used to measure the height in this paper was similar to the traditional altimeter, which calculates the attribute by measuring the distance from the observation position to the tree and the inclinations of the tree bottom and treetop. The results showed a BIAS value of approximately -0.83 to 2.08 m and an RMSE value of 0.46~2.44 m. The result also showed that the measurement results had high precision when the tree was not higher than 20 m, as shown in Figure 20, although AR technology was used, which enabled the observer to determine whether the displayed treetop position was appropriate; otherwise, it was adjusted as needed. This may be influenced by occlusion and the subjective decisions of observers. Reference [36] evaluated a Laser-relascope, a classical traditional instrument; they reported a −0.016 m BIAS and a 0.190 m standard deviation. The limitation of the Laser-relascope is that it needs to be mounted on a fixed site, but our device did not need that. Reference [15] used the height difference between the ground level and highest point on the point cloud around the tree model, and obtained results with a BIAS value of approximately −1.30 to 1.50 m and an RMSE value of 1.36~4.29 m when using the single-scan method, while the BIAS value was approximately −0.34 to 2.11 m and the RMSE value was 2.04~6.53 m when using the multi-single-scan method. Reference [7] examined the consistency of the point cloud on the treetops when exclusively using TLS data compared to the previous method; the BIAS value was approximately −0.3 to 0.11 m and the RMSE value was 0.31~0.8 m. The point cloud at the treetop was difficult to obtain due to occlusion, so the estimates were not accurate. The device used in this article can acquire point clouds by using the TOF camera, but the measuring range is only 0.4~3.0 m, which makes it hard to get the point cloud at the treetop.
In the SLAM algorithm, corners or blobs are often used as visual feature points. A good feature should have localized accuracy (both in position and scale), repeatability, robustness, etc. However, it is difficult to find good corners or blobs, especially in forests with complex ground conditions, such as shrubs. The plot data used in this article were taken from human accessible areas with fewer shrubs on the ground. However, most forests will not meet these special conditions. In addition, the RGB camera needs to search for feature points in an environment with sufficient illumination, while the TOF camera is susceptible to strong illumination because it uses infrared light as its light source. Therefore, the device is only suitable for use under the canopies during the day.
Although the device is a possible option for forestry surveys, it still has some limitations. For example, (1) although the device can estimate forest inventory parameters in real-time, it is less efficient than TLS, MLS, or the previous uses of SLAM because of the need to access each tree individually; (2) the taller tree accuracy of our tree height estimates was compromised a little, perhaps due to drift; and (3) the tree position coordinates obtained by the device do not use the geodetic coordinate system, but rather, the plot coordinate system of the origin at the center of the plot. In addition, the device uses the manual selection method to locate the bottom of the tree.
Unfortunately, the Google Tango Project has been terminated, i.e., is longer being developed or supported. However, ARKit [37], a monocular SLAM system, was released by Apple. Google has also introduced a similar solution-ARCore [38]. These two technologies allow simultaneous localization and mapping for ordinary smartphones (without a TOF camera). Of course, due to the new features of these technologies, many aspects, such as how to get dense point clouds in real-time, still need to be investigated in the future.

Conclusions
This paper provided a solution to estimate tree position, DBH, and tree height under the canopy in real-time without GNSS signals using a mobile phone with RGB-D SLAM. The tree position and DBH were estimated by circle fitting the point cloud data. This paper added the tangent point constraints and the tangent constraints. The difference was that this article used the pose data provided by the SLAM system in real-time instead of using angles and distances to estimate tree height. The results showed that the tree height estimations were accurate, especially when the tree was not too tall. The experimental results also showed that this method can potentially be used to accurately estimate the tree position and DBH.
It is recommended that future research studies are carried out to test the device under complex forest conditions, such as in areas with more shrubs, better or poorer light, different tree species, and in different aged forests. Future studies should also focus on extracting other forest inventory attributes, such as stem curves and crown diameters.