Feasibility of Google Tango and Kinect for Crowdsourcing Forestry Information

In this paper, we demonstrate the feasibility of using the Microsoft Kinect and Google Tango frame-based depth sensors for individual tree stem measurements and reconstruction for the purpose of forest inventory. Conventionally, field reference data in forest inventory are collected at tree and sample plot level by means of manual measurements (e.g., with a caliper), which are both labor-intensive and time-consuming. In this study, color (i.e., red, green and blue channels, RGB) and range images acquired by Kinect and Tango systems were processed and used to extract diameter measurements for individual tree stems. For this, 121 reference stem diameter measurements were made with tape and caliper. Kinect-derived tree diameters agreed with tape measurements with a root-mean-square error (RMSE) of 1.90 cm. The stem curve from the ground to the diameter at breast height agreed with the reference trunk with a bias of 0.7 cm and a random error of 0.8 cm. For Tango measurements, the obtained stem diameters matched those from tape measurement with an RMSE of 0.73 cm and an average bias of 0.3 cm. As highly portable and inexpensive systems, both Kinect and Tango provide an easy way to collect the tree stem diameter and stem curve information vital to forest inventory. These inexpensive instruments may in future compete with both terrestrial and mobile laser scanning and with conventional fieldwork using calipers or tape. The accuracy is adequate for practical applications in forestry. Measurements made using Kinect and Tango type systems could also be applied in a crowdsourcing context.


Introduction
Mapping forest resources and their condition has great economic and ecological importance. Especially so, as trees contribute significantly to the carbon balance of the Earth. International interest in biomass detection is strongly linked to forest health, photosynthetic activity and other processes related to the carbon cycle and the variability of the climate [1]. There is a growing need and constant shortage of data for improved forest monitoring (e.g., [2]). Furthermore, forest inventory provides an input for forest operations, forest management, and related decision making. Retrieval of forest canopy and stand information for large areas has been mainly carried out using remote sensing, especially space-borne techniques [3,4], but has increasingly shifted towards airborne laser scanning (ALS) [5][6][7][8][9]. Forest inventories based on remote sensing data depend on the quality and quantity of field data collected on sample plots, which are used to calibrate remote sensing measurements, and to reduce the systematic (i.e., bias) and random errors of estimates. Reference data for sample plots is today mainly collected by manual measurements, although there is active research towards applying e.g., Terrestrial Laser Scanning (TLS), Mobile Laser Scanning (MLS) or Personal Laser Scanning (PLS) for plot level inventory. The attributes measured in operational forest field reference inventories are mainly the number of trees, tree species and Diameter at Breast Height (DBH). The latter is measured using simple tools, such as calliper and measuring tape. As these conventional field measurement techniques are expensive and labor-intensive [10,11], more cost-effective, automated techniques are needed. Even though the DBH is currently the most important attribute measured in ground surveys, it would be beneficial to measure the stem curve, or at least a part of it, as the stem curve actually defines the 3D geometry of the tree trunk. 
At present, the extraction of the stem curve is one of the main research foci in the use of TLS and MLS data; see [12] for an example.
Geographic crowdsourcing is the collection of geospatial data by voluntary citizens who are untrained in the disciplines of geography, cartography or related fields [13,14]. In forestry, crowdsourcing has been applied in assessing the condition of city trees [15,16], even though some questions concerning the reliability of such data have been raised [17]. By utilizing applications potentially usable in crowdsourcing scenarios, a basal area accuracy of 5 m²/ha [18], an RMSE for basal area of 19.7-29.3% [19] and a DBH root-mean-square error of less than 7 cm [20] have been achieved.
Large-area forest inventory estimates can be calculated from ALS data using either the area-based approach (ABA) or the individual tree detection (ITD) approach. In Finland, ALS data itself is available as open data, but there is a lack of open plot- or tree-level data for calibrating the ALS estimates. While ALS data can reliably produce, e.g., tree height estimates for dominant trees, it depicts suppressed trees less well and has difficulty separating adjacent trees that are located very close to each other. These errors can be compensated for with plot level reference data. Today, this lack of local reference data causes the largest misestimation under boreal conditions. ITD techniques operate by detecting individual trees, with tree-level variables, such as height and volume, measured or predicted from the ALS data. In these analyses, the basic unit is an individual tree. International comparisons of the use of ALS for ITD have been reported in [21][22][23]. An advantage of the ITD approach over the ABA in a crowdsourcing context is that it requires fewer reference trees for reasonable accuracy. It is also easier for land-owners to measure physically well-established parameters, such as the diameter of the tree (cm), instead of the basal area (m²/ha). In addition, the common disadvantage of ITD approaches, namely that not all suppressed trees can be detected because they are poorly observable through the canopy above, can be partially overcome by crowdsourcing, by adding the major trees missing from the ALS interpretation. Thus, there is synergy between ITD and crowdsourcing.
Originally commercialized as an accessory for the Xbox (Microsoft Corporation, Redmond, WA, USA) 360 game console, Kinect (Version 1, 2010, Microsoft Corporation, Redmond, WA, USA) is a structured light system capable of capturing several millions of data points per second, with an affordable and easily applicable consumer-grade device. It incorporates color and near-infrared cameras, as well as a near-infrared laser projector. It projects a pseudo-random pattern on the target and uses the parallax information to calculate distance from the target. It is capable of producing 640 by 480 pixels depth images at 30 frames per second, with a maximum measurement distance of about 4 m.
Kinect has been applied in various applications, such as indoor mapping (e.g., [24][25][26][27][28]). A benefit of Kinect is that it efficiently brings 3D sensing capability to any conventional computer, thus allowing application development. The use of Kinect for tree measurements has been proposed for forestry robots [29,30], and plot level data collection [31]. With Kinect, individual tree diameters with 4.4 cm and 9.2 cm accuracies for deciduous and coniferous trees have been attained.
Depth sensors fully integrated into mobile devices have also emerged, with the Google Tango (2014, Google Inc., Menlo Park, CA, USA) technology [32] being available on two consumer devices. Tango is a development platform aiming to bring depth sensing to mobile devices. It consists of standardized application programming interfaces (APIs) for utilizing the depth sensor in Android software development. The Tango framework allows depth sensing to be accomplished by stereo vision, structured light, or the time-of-flight principle [33]. As integrated systems, Tango-enabled devices allow 3D sensing on highly mobile platforms, utilizing sensor integration and Simultaneous Localization and Mapping (SLAM) for 3D measuring [34]. In research, devices with Tango sensors have been applied to indoor mapping [35], interior planning [36] and forestry [37].
While Tango and Kinect rely on similar operating principles, they offer different possibilities for application. Kinect is operated on a regular computer, and can thus be integrated with other tasks and software. Field tasks are possible when using a laptop or a tablet PC. Tango represents a fully integrated mobile system akin to a mobile phone camera, making it more suited for mobile tasks and crowdsourcing. Both of these consumer-level technologies could be applied for collecting objective, non-biased reference information, later applicable for remote sensing inventories. To give an example, there are more than 600,000 forest owners in Finland alone (12% of the population). Many of them visit their forests regularly or live in the nearby area. If 0.1% of these forest owners (600 people) measured 20 trees each in their own forests, the resulting data set would consist of 12,000 individually measured trees, which could support a non-biased countrywide biomass/volume map of unprecedented accuracy.
Our objective is, firstly, to show the value of utilizing a local field reference for ITD forest inventory from ALS data by evaluating its impact on forest inventory estimate errors and, secondly, to demonstrate the feasibility of Microsoft Kinect and Google Tango sensors for collecting stem geometry data at the individual tree level. Compared to working with a measuring tape, Microsoft Kinect and Google Tango represent efficient and potentially user-friendly ways to obtain diameter measurements and directly produce digital information. Finally, we test whether these sensors could also provide stem curve information, potentially allowing an even higher accuracy of field data.

Test Area and Field Reference
Three test sites are utilized in the study, one (Masala) being used for Kinect and Tango studies, and two (Evo & Kalkkinen) for evaluating the impact of local field reference data.
The test area for the Kinect and Tango studies is located in Masala, Kirkkonummi, in southern Finland near the Finnish Geospatial Research Institute. As ground truth, reference data were measured with tape and calliper. The accuracy of diameter measurements using the calliper has earlier been analyzed to be about 0.7 cm by repeating measurements for a large number of trees and assuming the measurements are independent and of equal accuracy [38]. Since tree stem cross-sections are often elliptical rather than perfectly circular, it is conventional to measure the diameters in both north-south and east-west directions with the calliper and then combine the two measurements by calculating the square root of the average of the squared values. We also used tape to record the circumference of each tree. Measurements were taken at different heights along the lower part of the trunk (heights between 10 and 200 cm) to be able to estimate the lowest part of the stem curve. A polymer/fibre measuring tape was used to obtain the circumference measurements, after which the resulting circumferences were divided by pi to calculate the diameter. In theory, the diameter for the rest of the trunk can be estimated using this stem curve information, which is a common practice among harvesters. The test data set comprised 41 and 80 diameter measurements from Kinect and Tango, respectively. For Kinect, both birches and Scots pines were scanned. For Tango, the dataset consisted of pines. Kinect and Tango measurements were performed by authors having prior experience in 3D scanning methods.
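The two reference diameter calculations described above can be illustrated with a short sketch. The function names and the use of centimetres are assumptions made for illustration only; they do not come from the study.

```python
import math


def diameter_from_calliper(d_ns_cm, d_ew_cm):
    """Combine north-south and east-west calliper readings.

    Uses the square root of the average of the squared values
    (the quadratic mean), as described in the text.
    """
    return math.sqrt((d_ns_cm ** 2 + d_ew_cm ** 2) / 2.0)


def diameter_from_circumference(circumference_cm):
    """Convert a tape-measured circumference to a diameter (C / pi)."""
    return circumference_cm / math.pi


if __name__ == "__main__":
    # Hypothetical readings, not data from the study.
    print(round(diameter_from_calliper(30.0, 40.0), 2))
    print(round(diameter_from_circumference(100.0), 2))
```

For a perfectly circular stem the two calliper readings are equal and the quadratic mean returns that same value; for an elliptical stem it weights the larger axis slightly more than an arithmetic mean would.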
In the local field reference test, two test sites (Evo and Kalkkinen) were used to show the value of utilizing a local field reference for ITD forest inventory from ALS data. Both of these areas lie in southern Finland, separated by a distance of approximately 34.5 km. The Kalkkinen site consisted of mature forests, where no silvicultural operations had been carried out for a considerable amount of time. In Kalkkinen, the dominant tree species are Norway spruce and Scots pine. The Evo site, on the other hand, is a commercially exploited forest area, with the dominant tree species being Scots pine and Norway spruce. Data from field measurements by traditional methods (of accuracy equivalent to Kinect and Tango measurements) were used for demonstrating the potential of new techniques in the improvement of forest attribute estimates. Table 1 shows the difference in forest attributes at plot level: the mean volume in Kalkkinen is almost double that in Evo.

Kinect Measurements
When measuring trees with Kinect, the heights were marked with pins (see Figure 1) for every selected tree in order to guarantee that measurements were taken from the same height. The pins are not needed in a real-life application, as they were used only to ensure the correspondence between Kinect data and manual measuring methods. Images were acquired from the marked heights. The stem curve was measured by moving the Kinect along the tree trunk and registering consecutive images together using tie points (Figure 1). The whole trunk can be reconstructed by merging the obtained diameter estimates at each height. In a real-life application, the Kinect scanning would be completed from a single position, holding the device manually and scanning along the tree trunk. For higher accuracy, the scanning could be performed from two positions.

Kinect Data Processing
The processing of Kinect RGB and range data was based on range data, by fitting a circle to the scan points around the buffer area. The marked positions were detected semi-automatically using standard image processing techniques. The steps in detail are given as follows:
1. Extraction of trunk skeleton from range image by thresholding along the direction of the camera's depth axis, i.e., removal of the background to produce a trunk mask;
2. Automatic detection of the markers on the trunk from RGB images (first converted to grey scale) by comparing intensities, as markers are brighter than surroundings;
3. Extraction of the point cloud from range images in a buffer area of detected marker positions;
4. Fitting an optimum circle to the extracted point cloud and computing the diameter of the reconstructed circle at the corresponding height.
If the tree stem was imaged from the same height by more than one scan, the mean value of the diameters obtained from the reconstructed circles was used. Figure 2 demonstrates one example of diameter estimation and the major steps in the determination of diameters using Kinect data.
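The circle-fitting step and the averaging over repeated scans can be sketched as follows. The text does not state which fitting method was used, so this example assumes an algebraic least-squares circle fit on 2D points (a horizontal cross-section of the trunk); the function names `fit_circle` and `mean_diameter` are hypothetical.

```python
import math


def fit_circle(points):
    """Algebraic least-squares circle fit to 2D points.

    Assumption: this is one common fitting method, not necessarily
    the one used in the study. Returns (center_x, center_y, radius).
    """
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")
    # Centre the coordinates to improve numerical stability.
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    suu = suv = svv = suuu = svvv = suvv = svuu = 0.0
    for x, y in points:
        u, v = x - mx, y - my
        suu += u * u
        suv += u * v
        svv += v * v
        suuu += u * u * u
        svvv += v * v * v
        suvv += u * v * v
        svuu += v * u * u
    det = suu * svv - suv * suv
    if abs(det) <= 1e-12 * (suu + svv) ** 2:
        raise ValueError("points are (nearly) collinear")
    bu = 0.5 * (suuu + suvv)
    bv = 0.5 * (svvv + svuu)
    # Solve the 2x2 normal equations with Cramer's rule.
    uc = (bu * svv - bv * suv) / det
    vc = (suu * bv - suv * bu) / det
    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return mx + uc, my + vc, radius


def mean_diameter(diameters):
    """Average the diameters from several scans of the same height."""
    return sum(diameters) / len(diameters)


if __name__ == "__main__":
    # Synthetic partial arc (one side of a trunk), radius 0.15 m.
    arc = [(1.0 + 0.15 * math.cos(math.radians(a)),
            2.5 + 0.15 * math.sin(math.radians(a)))
           for a in range(-60, 61, 10)]
    cx, cy, r = fit_circle(arc)
    print(round(2 * r, 3))
```

A frame-based depth sensor sees only the side of the trunk facing the camera, so the fit has to work on a partial arc, as in the synthetic example. With noisy data and short arcs, algebraic fits are known to underestimate the radius, which is one reason a geometric fit might be preferred in practice.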

Measurements with Tango Sensor
A mobile phone with a Google Tango sensor (Phab2ProAR (Lenovo, Morrisville, NC, USA) [39]) was used to 3D scan a set of selected large pines in Masala, Kirkkonummi, southern Finland. The device contains conventional smartphone hardware with the addition of a depth camera, allowing active 3D data acquisition. The primary purpose of integrating a 3D sensing system in a smartphone is augmented reality applications, but several 3D scanning applications have also been developed for Tango devices.
The Matterport Scenes application [40] was applied in scanning the tree stems. It utilizes the Tango depth sensor in conjunction with the device's orientation sensors and RGB camera, producing a 3D RGB point cloud. Scanning was performed in motion, keeping the depth camera approximately facing the tree stem while walking around the stem in a circular motion. Each tree was scanned three times, producing a total of 240 diameter estimates of 80 manually measured stem diameters. In addition, tree stem scanning was also tested by scanning two adjacent stems in one go, as shown in Figure 3. Red markers were placed on the tree stem at 10 cm height intervals, from the ground surface to 1.5 m. For ground truth, the circumference of the stem was measured at these heights with a measuring tape.

Tango Data Processing
Figure 4 shows a point cloud obtained with the mobile device. The terrain and other vegetation surrounding the tree stem were manually segmented using CloudCompare (Version 2.8.1, 2016). After this, 20 mm high 'slices' of the point cloud were separated at the marked heights, identifying the markers from the point cloud's RGB data (Figure 4). 2D circular fitting was then performed to extract diameter information for the stem at the marked heights.
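The slicing step can be sketched as below. In the study the marker heights were identified from the RGB data of the point cloud; this example instead assumes that the heights are already known, that the cloud is a list of (x, y, z) tuples in metres, and that z is vertical with the ground at z = 0. All names are hypothetical.

```python
def slice_at_height(cloud, height_m, thickness_m=0.02):
    """Return the (x, y) coordinates of points inside a horizontal slice.

    The slice is centred on height_m and is thickness_m high
    (20 mm by default, matching the slice height given in the text).
    """
    half = thickness_m / 2.0
    return [(x, y) for x, y, z in cloud if abs(z - height_m) <= half]


def slices_at_heights(cloud, heights_m, thickness_m=0.02):
    """Build one 2D slice per marked height, keyed by height."""
    return {h: slice_at_height(cloud, h, thickness_m) for h in heights_m}


if __name__ == "__main__":
    # Tiny synthetic cloud, not data from the study.
    cloud = [(0.0, 0.0, 0.295), (1.0, 1.0, 0.305),
             (2.0, 2.0, 0.32), (3.0, 3.0, 0.10)]
    print(slice_at_height(cloud, 0.30))
```

Each resulting 2D slice can then be passed to a circle-fitting routine of choice to obtain a diameter for that height.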

Accuracy Assessment
Determination of the accuracy of the Kinect and Tango estimates was performed by comparing them against calliper and tape measurements (d_tape,calliper). The coefficient of determination (R²) was calculated between the data sets to describe the goodness of fit between the obtained estimates and reference measurements. Bias (Equation (1)) and the root-mean-square error RMSE (Equation (2)) were calculated as the measures of the accuracy of estimates. RMSE-% was obtained from RMSE by dividing it by the sample mean value.
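Equations (1) and (2) are not reproduced in this text, so the following sketch uses the standard definitions of bias and RMSE. Two points are assumptions: the sign convention (estimate minus reference) and the use of the mean of the reference values as the sample mean in RMSE-%. The function name is hypothetical.

```python
import math


def accuracy_metrics(estimates, references):
    """Return (bias, rmse, rmse_percent) for paired measurements.

    Assumptions: error = estimate - reference, and RMSE-% is RMSE
    divided by the mean of the reference values, times 100.
    """
    if len(estimates) != len(references) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(estimates)
    errors = [e - r for e, r in zip(estimates, references)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    rmse_percent = 100.0 * rmse / (sum(references) / n)
    return bias, rmse, rmse_percent


if __name__ == "__main__":
    # Hypothetical diameters in cm, not data from the study.
    bias, rmse, rmse_pct = accuracy_metrics([10.5, 20.5, 29.0],
                                            [10.0, 20.0, 30.0])
    print(round(bias, 3), round(rmse, 3), round(rmse_pct, 2))
```

Under this sign convention, a positive bias means that the sensor overestimates the diameter relative to the reference.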

Local Field Reference Test
In the local field reference test, the influence of plot level information (diameter data with accuracy comparable to Kinect or Tango) on forest inventory estimates was explored by using traditionally obtained single tree data to calibrate estimations from ALS. For both sites, ALS data were used as the basis of inventory: the Evo test area was surveyed in 2009 with a Leica ALS50-II (Leica Geosystems, Sankt Gallen, Aarau, Switzerland) at 400 m AGL with a point density of 16 pts/m², whereas the Kalkkinen data were collected in 2003 with a Toposys Falcon at 400 m AGL with a point density of 10 pts/m². For calibration, field data (292 plots in Evo and 33 plots in Kalkkinen) were collected with conventional methods at the corresponding time of laser acquisition. More details of these test sites can be found in [41][42][43]. To simulate the impact of local field reference, Evo plots served as conventional plots (with every tree measured by conventional methods) while Kalkkinen plots were used as test plots. In order to test the effect of locally collected reference data on forest parameter estimation accuracy, we used three different strategies:
(1) All 292 plots of Evo data were applied as references to develop prediction models using an area-based method. The developed models were then used to estimate plot attributes for the 33 plots in Kalkkinen. This corresponds to a situation where the calibration data comes from a different area than the area being the subject of inventory.
(2) Kalkkinen data were used for both developing models and making predictions by the area-based method. In this case, two thirds of the data were used for training and one third for testing. The procedure was repeated 100 times. This corresponds to an ideal situation where calibration data can be collected from the area being studied.
(3) A hybrid approach, consisting of the currently applied solution supplemented with local reference data. All Evo data and 10% of Kalkkinen data (96 randomly selected trees) were used to develop models. Predictions were conducted firstly for individual trees, and the individual predictions were then aggregated to plot level, followed by area-based estimations. This corresponds to a situation where a certain amount of local reference data can be collected from the area being studied.
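The repeated two-thirds/one-third split of Strategy 2 can be sketched as follows. To keep the example self-contained, a one-variable least-squares line stands in for the prediction model; this is not the model used in the study. Averaging bias and RMSE over the repetitions is also an assumption, as are all function names.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares line; a stand-in for the real prediction model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def repeated_holdout(xs, ys, n_repeats=100, train_fraction=2 / 3, seed=0):
    """Average bias and RMSE over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_train = round(len(indices) * train_fraction)
    biases, rmses = [], []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [predict_line(model, xs[i]) - ys[i] for i in test]
        biases.append(sum(errors) / len(errors))
        rmses.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
    return sum(biases) / n_repeats, sum(rmses) / n_repeats


if __name__ == "__main__":
    # 33 synthetic "plots", mirroring the number of Kalkkinen plots.
    xs = [float(i) for i in range(33)]
    ys = [2.0 * x + 1.0 for x in xs]
    print(repeated_holdout(xs, ys))
```

With 33 plots, each repetition trains on 22 plots and tests on the remaining 11, so no plot is used for both fitting and evaluation within a repetition.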
We used Random Forests (RF) to construct the forest prediction models, since previous studies have reported that it performs as well as, if not better than, other parametric or non-parametric methods (e.g., [42]). In strategies 1 and 2, 23 plot features derived from ALS data were used as predictors. In strategy 3, 26 tree features derived from individual tree detection were used as predictors in individual tree prediction, and aggregated values as predictors in area-based predictions. More details of the methods can be found in [43].
Table 2 gives the bias (cm), RMSE (cm) and RMSE-% values when Kinect diameters are compared with calliper and tape, calliper with tape, and Tango diameters with tape. The use of Kinect or Tango resulted in smaller biases than using callipers, perhaps due to slack of the calliper jaws. The RMSE of 1.9 cm for Kinect (7.3%), compared to 1.2 cm for calliper (5.9%), is an impressive level of accuracy for a low-cost (100 €) remote sensing device. An even lower RMSE of 0.73 cm (1.89%) was attained with a consumer smartphone having a Tango sensor. Scatter plots of diameters measured by Kinect and Tango compared to those obtained with measuring tape are shown in Figure 5a,b.
A positive mean bias of 3.34 mm was observed between Tango data and tape measurements (ranging from a minimum error of 0.3 mm to a maximum error of 26.5 mm), with the RMSE being 0.73 cm. The performance is impressive for a handheld consumer device with on-device point cloud reconstruction. When observing the repeated Tango measurements, it was discovered that the performance of the system varied significantly from one scan to another. Figure 6 shows two different scans of the same tree stem. One of the scans seems to contain a misaligned segment, whereas the other is well composed. Looking at the results tree by tree, an RMSE of 0.32 cm was attained for the best individual tree scan, the worst being 1.47 cm, calculated from all profiles obtained from the same tree in one scan.
Figure 6. Misalignments of tree stem segments from the Tango device. The same tree scanned at two different points of time is shown in (a,b), respectively. The cross sections of marked areas are shown from a top-down view on the right. The misalignments can be seen in both scans, but the repeated scans at different points of time show different misalignments, which indicates that the performance of the Tango system may clearly vary from one scan to another.

In the best cases, performance remained high: the smallest RMSE for a single tree was 0.44 cm, calculated from 16 diameter estimates of the stem. In the worst case, however, an RMSE of 6.06 cm was observed, likewise calculated from 16 diameter estimates of the stem. For individual diameter estimates, the best fit was near perfect (error of 0.01%), whereas the worst ones had coarse errors (20.48%). In scans covering several trees, the best ones did not reach the same accuracy as single-tree scans, and the worst ones were significantly more erroneous.
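The per-tree figures above rest on simple error statistics. The following is a minimal sketch, assuming that bias is the mean of (estimate minus reference), that RMSE is the root of the mean squared difference, and that RMSE-% is the RMSE relative to the mean reference diameter. The function name and the sample values are illustrative and are not taken from the study.

```python
import math


def diameter_errors(estimates_cm, references_cm):
    """Return (bias, rmse, rmse_percent) of diameter estimates against references.

    Assumed definitions (not quoted from the paper):
      bias   = mean(estimate - reference)
      rmse   = sqrt(mean((estimate - reference)^2))
      rmse_% = 100 * rmse / mean(reference)
    """
    if len(estimates_cm) != len(references_cm) or not estimates_cm:
        raise ValueError("need two equally long, non-empty sequences")
    diffs = [e - r for e, r in zip(estimates_cm, references_cm)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_ref = sum(references_cm) / len(references_cm)
    return bias, rmse, 100.0 * rmse / mean_ref


# Hypothetical diameters (cm) along one stem: sensor estimates vs. tape values.
example_estimates = [20.5, 19.8, 21.1]
example_references = [20.0, 20.0, 21.0]
example_bias, example_rmse, example_rmse_pct = diameter_errors(
    example_estimates, example_references
)
```

Applied to the 16 diameter estimates of one stem, a function of this kind would yield one RMSE per tree, which is how the best and worst per-tree values can be compared.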

Local Field Reference Test
The local field reference test (Table 3) shows that without local field data, the results are strongly biased (5 cm for DBH and 57 m³/ha for volume). This is due to the different forest conditions of the two sites, and it is significant, especially as the sites are in the same geographic area, only 34.5 km apart. Use of local plot data (Strategy 2) led to almost zero bias and also improved the RMSE. Compared with Strategy 1, both bias and RMSE were reduced by using a small amount of local reference data (Strategy 3). Compared with Strategy 2, the RMSE of the volume estimate is at the same level, while the mean DBH estimate still needs to be improved.
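To illustrate why even a small amount of local reference data can remove a systematic bias, the sketch below applies a plain mean-offset correction. This is an assumption made for illustration only: the study itself used a Random Forest based area-based approach and an individual tree technique, not this correction, and the function name and numbers are hypothetical.

```python
def calibrate_with_local_reference(predictions, local_predicted, local_measured):
    """Shift predictions by the mean residual seen on local reference data.

    predictions     : model estimates for the target site (e.g., DBH in cm)
    local_predicted : model estimates for the few locally measured units
    local_measured  : field-measured values for those same units
    """
    if len(local_predicted) != len(local_measured) or not local_predicted:
        raise ValueError("local reference lists must be equally long and non-empty")
    # Mean residual (measured minus predicted) estimates the site-specific bias.
    residuals = [m - p for p, m in zip(local_predicted, local_measured)]
    offset = sum(residuals) / len(residuals)
    return [p + offset for p in predictions]


# Hypothetical case: a model trained elsewhere overestimates DBH by 5 cm locally.
corrected = calibrate_with_local_reference(
    predictions=[25.0, 30.0],
    local_predicted=[20.0, 22.0],
    local_measured=[15.0, 17.0],
)
```

A constant offset only addresses bias. Reducing the random error, as reported for Strategies 2 and 3, requires retraining or recalibrating the model with the local data.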

Discussion
The accuracy of both Kinect and Tango for stem diameter measurements is promising, considering that a professional, good-quality calliper yields a precision of 7 mm [38]. The bias of the diameter estimate using Kinect range data seems to be negligible and, in this test, smaller than with the calliper. Even better accuracy was obtained with the handheld, mobile integrated Tango system onboard a Phab2ProAR (Lenovo, Morrisville, NC, USA) [39]: a small positive mean bias (0.3 cm) was observed for individually scanned trees, most likely caused by noise in the point cloud. Previous studies using TLS for tree DBH measurements have reported accuracies between 1 and 2.5 cm [2]. Other electronic devices, such as an integrated laser beam and digital imaging, have reached an accuracy of 0.6 cm with semi-automatic processing and 1.3 cm with fully automatic processing [44]. The accuracies of Kinect and Tango are thus comparable with those obtained with other sensors. When individual trees were measured, regression against the measuring tape results gave coefficients of determination (R²) of 0.98 for Kinect and 0.96 for Tango. When several stems were scanned simultaneously with Tango, the results were clearly inferior (R² = 0.23). According to our results, the Tango system is more suitable for scanning individual trees.
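For a simple linear regression of sensor-derived diameters against tape measurements, the coefficient of determination equals the squared Pearson correlation. The sketch below computes it under that assumption. The exact regression setup used in the study is not specified here, and the function name and sample values are illustrative.

```python
def r_squared(sensor_cm, tape_cm):
    """Coefficient of determination of a simple linear fit of sensor vs. tape.

    For one predictor with an intercept, R^2 equals the squared Pearson
    correlation between the two series.
    """
    n = len(sensor_cm)
    if n != len(tape_cm) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_s = sum(sensor_cm) / n
    mean_t = sum(tape_cm) / n
    s_ss = sum((s - mean_s) ** 2 for s in sensor_cm)
    s_tt = sum((t - mean_t) ** 2 for t in tape_cm)
    s_st = sum((s - mean_s) * (t - mean_t) for s, t in zip(sensor_cm, tape_cm))
    if s_ss == 0 or s_tt == 0:
        raise ValueError("R^2 is undefined when either series is constant")
    return (s_st * s_st) / (s_ss * s_tt)


# Hypothetical values: a perfectly linear relation gives R^2 = 1.
perfect = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A high R² shows that the estimates track the reference closely, but it does not reveal a constant offset, which is why bias and RMSE are reported alongside it.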
The results presented were obtained with the first version of the Kinect; an improved version has since been released, potentially providing better performance. We see two practical ways of using Kinect, both demonstrated in this study: (1) measurement of DBH and processing all data related to one frame at the fixed height; (2) use of the Kinect to measure the first part of the tree stem in order to estimate the whole stem curve. Both should be studied further.
For Tango, our results are in line with [34], where an RMSE in the range of 1.61-2.10 cm was attained when scanning whole plots in one go. A significantly higher accuracy for diameter estimation (RMSE 0.73 cm) was attained in our tests when scanning individual tree stems. This suggests that, for best performance, Tango devices should be used to scan individual trees rather than plots/patterns. As a multi-sensor system, a Tango device's performance in 3D scanning is influenced by several factors: the scanning accuracy of the depth sensor, the accuracy of the orientation sensors, the correctness of the registration of consecutive depth images, and the quality of the motion the device experiences while scanning. This is clearly visible in the performance variations found in repeated scans.
A benefit of Tango is that it provides an integrated package offering processing capabilities (the smartphone's own computational resources), connections (wireless local area network (WLAN), global system for mobile communications (GSM)) and other sensors (such as global navigation satellite systems (GNSS)), whereas Kinect is a separate device that requires a computer to operate. From an application development point of view, the Tango sensor holds significant potential for developing mobile applications for crowdsourcing 3D data, provided that such sensors become a common feature in smartphones. Currently they are rare, which hinders application development. In addition, the data quality of all handheld devices for crowdsourcing applications still needs to improve. Similar misalignments can be clearly seen in data from current Tango devices, as shown in Figure 7b, and in image-based point clouds produced with the structure-from-motion technique, as shown in [20]; these cause under- or overestimates of tree DBHs.
For both of the tested systems, the biggest efficiency improvement over manual measuring methods, such as measuring tape, is the digitization of the entire lower part of the tree stem in a single scan. Ideally, they should therefore be applied to stem curve estimation rather than only to producing a DBH estimate. In the presented experiment, both systems were operated by experienced professionals. Determining whether inexperienced users would attain similar results would require further study.
In the local field reference test, we applied airborne laser scanning data [41,43], an area-based approach based on Random Forest, and an individual tree technique [42]. We aimed to get reliable inventory results for natural forest in Kalkkinen using Evo field plots and data collected in Kalkkinen corresponding to Kinect/Tango diameter measurements. Without any field data from Kalkkinen, the bias of stem volume estimation was 57 m³/ha. Using a small amount of local field data significantly reduced both this bias and the RMSE (Table 3). Thus, there is a need for local data. In this study, the data applied were still obtained with traditional forest inventory practices, whose accuracy is comparable to that of individual tree data provided by either Tango or Kinect systems. Clearly, such data for forest inventory calibration could well be provided by crowdsourcing with these emerging sensor technologies.
In the work presented, the accuracy of DBH extraction using mobile sensors was estimated. While this is an essential component of an application for crowdsourcing forestry data, it is not the only aspect of their applicability. An additional question is how land-owners would be able to measure the location of the trees that they would like to contribute as reference. Since GNSS accuracy in forest is not sufficient for plot or individual tree level inventory, an alternative solution would have to be applied. One possibility would be a GNSS-assisted approach based on a 3D game engine. In this approach, an approximate location from GNSS could be further refined by the user by referring to visualized ITD representations from ALS data. This would allow users to visually locate themselves more accurately than by GNSS alone.
A potential user interface can be drafted from the same outset: the output of the ITD interpretation is visualized in the 3D game engine running on a smartphone, and the user can select the tree groups or individual trees from the ITD interpretation for which he/she wishes to provide field reference. If the ITD process indicated one tree that corresponds to several trees in the field, the land-owner records all trees and their attributes, especially the diameters at breast height and the tree species. For suitable plots, he/she preferably records the corresponding information for all trees. Also, if there are omissions in the ITD process, i.e., trees missing from the original estimate, these can be added manually.
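One building block of such an interface would be to offer the user only those ITD trees that lie near the approximate GNSS position. The following is a minimal sketch, assuming that tree positions and the GNSS fix are given in the same projected metric coordinate system. The function name, the data layout and the 15 m search radius are hypothetical and are not part of the described concept.

```python
import math


def nearby_itd_trees(gnss_xy, itd_trees, radius_m=15.0):
    """Return IDs of ITD trees within radius_m of the GNSS fix, nearest first.

    gnss_xy   : (x, y) of the approximate user position, in metres
    itd_trees : dict mapping tree ID -> (x, y) position from ALS-based ITD
    radius_m  : search radius, assumed to cover the GNSS error under canopy
    """
    gx, gy = gnss_xy
    hits = []
    for tree_id, (x, y) in itd_trees.items():
        distance = math.hypot(x - gx, y - gy)
        if distance <= radius_m:
            hits.append((distance, tree_id))
    # Sorting the (distance, id) pairs orders the candidates by distance.
    return [tree_id for _, tree_id in sorted(hits)]


# Hypothetical ITD output with three detected trees.
example_trees = {"a": (0.0, 3.0), "b": (10.0, 0.0), "c": (100.0, 100.0)}
candidates = nearby_itd_trees((0.0, 0.0), example_trees)
```

The user would then pick the correct tree or tree group from the candidate list in the visualization, so the final match relies on visual confirmation rather than on GNSS accuracy.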
In addition to technical challenges, the success of crowdsourcing campaigns depends on user participation. In the presented case, the motivation for forest owners to use 3D sensors in a crowdsourcing context would be apparent: to get improved estimates of one's own forest resources. The motivation for the system provider is also obvious: improved accuracy of the forest inventory.

Conclusions
In this paper, we have demonstrated that the Kinect and Google Tango depth sensors are feasible for tree stem mapping (i.e., diameter estimation and estimation of part of the stem curve) with relatively good accuracy. Kinect-derived tree diameters agreed with tape measurements to 1.9 cm (RMSE) and 7.3% (RMSE-%). The stem curve from the ground to the height of 1.7 m agreed with similar statistics. For the Tango-enabled smartphone, an RMSE of 0.73 cm was achieved in diameter estimation compared with measuring tape. These accuracies are adequate for operational work. In the local field reference test, we showed that locally collected field reference data significantly improve forest inventory estimates in boreal forest conditions. Based on these results, we have introduced a crowdsourcing concept based on individual tree diameter and stem curve measurements, using Kinect and a mobile-phone embedded 3D sensor (Tango), for ALS (Airborne Laser Scanning)-based large-area forest inventory at individual tree level.