Mapping Urban Tree Cover Changes Using Object-Based Convolution Neural Network (OB-CNN)

Abstract: Urban trees provide social, economic, environmental and ecosystem service benefits that improve the liveability of cities and contribute to individual and community wellbeing. There is thus a need for effective mapping, monitoring and maintenance of urban trees. Remote sensing technologies can effectively map and monitor urban tree coverage and its change over time as an efficient and low-cost alternative to field-based measurements, which are time consuming and costly. Automatic extraction of urban land cover features with high accuracy is a challenging task, and it demands object-based artificial intelligence workflows for efficiency and thematic accuracy. The aim of this research is to effectively map urban tree cover changes and to model the relationship of such changes with socioeconomic variables. The object-based convolutional neural network (CNN) method is illustrated by mapping urban tree cover changes between 2005 and 2015/16 using satellite, Google Earth imagery and Light Detection and Ranging (LiDAR) datasets. The training sample for the CNN model was generated by Object-Based Image Analysis (OBIA) using thresholds in a Canopy Height Model (CHM) and the Normalised Difference Vegetation Index (NDVI). The tree heatmap produced by the CNN model was further refined using OBIA. Tree cover loss, gain and persistence were extracted, and multiple regression analysis was applied to model their relationships with socioeconomic variables. The overall accuracy and kappa coefficient of tree cover extraction were 96% and 0.77 for the 2005 images and 98% and 0.93 for the 2015/16 images, indicating that the object-based CNN technique can be effectively implemented for urban tree coverage mapping and monitoring. There was a decline in tree coverage in all suburbs. Mean parcel size and median household income were significantly related to tree cover loss (R² = 58.5%).
Tree cover gain and persistence had positive relationships with tertiary education, parcel size and ownership change (gain: R² = 67.8%; persistence: R² = 75.3%). The research findings demonstrate that remote sensing data with intelligent processing can contribute to the development of policy input for the management of tree coverage in cities.


Introduction
Trees are an important element of cities and suburbs, benefiting and inconveniencing urbanites in manifold ways [1][2][3][4][5]. It is therefore not surprising that a growing literature documents temporal change in urban tree density and cover [6][7][8][9][10][11] and tests hypotheses on the causes of change [12][13][14][15][16][17][18].
One GEOBIA approach applied multi-thresholds followed by multi-resolution segmentation to segment the surface model into finer objects, with a CHM threshold (4 m to 40 m) used as a final classification step to extract trees from other classes. The overall accuracy of that tree canopy cover extraction was 96.6%, with a Kappa Index of Agreement (KIA) of 0.9.
The GEOBIA method can be more accurate than methods using pixels, especially for very high-resolution images [28,32,37]. However, problems have been experienced in situations in which over segmentation and under-segmentation appear within the same image [38][39][40][41]. Additionally, feature extraction in urban environments is difficult because of the range of materials that make up the same classes [42] and the occlusion and shadows that break image objects into finer objects [20].
Extracting urban land cover features with high thematic accuracy in an automated way is still a challenging task with GEOBIA, and it demands machine-learning artificial intelligence workflows [43][44][45]. Among numerous alternative techniques, convolutional neural networks (CNNs) [46] are thought to be among the most promising for image classification [47][48][49]. The CNN technique became popular after the release of AlexNet in 2012 [50] and with the release of CNN support in Google TensorFlow. A CNN is a deep-learning, supervised neural network that uses labelled data. It combines an input layer, hidden layers with hidden units and an output layer; the hidden units are like neurons, fully connected with each individual neuron of the previous layer [49,51]. CNNs have proven successful in vegetation contexts [52][53][54][55][56][57]. Li et al. [52] used a CNN on very high-resolution QuickBird images for oil palm tree detection in Malaysia and achieved 87.95% overall accuracy. Chen et al. [53] proposed a novel CNN-based approach to count apples and oranges in an unstructured environment, achieving a 0.76 F1 score. Similarly, Wang et al. [54] used a faster region-based CNN (R-CNN) workflow to detect mango fruit flowers, and Sa et al. [55] used an R-CNN workflow for sweet pepper and melon detection, achieving an F1 score of 0.84. Csillik et al. [56] used a CNN workflow, with GEOBIA post-processing, to identify citrus trees in a complex agricultural area of California from unmanned aerial vehicle (UAV) imagery, achieving 96.24% overall accuracy. Timilsina et al. [57] demonstrated that image classification accuracy can be improved by combining OBIA and CNN methods to map urban tree cover. However, no study has been published that maps temporal and spatial changes of tree cover using GEOBIA and CNN.
Trees in domestic gardens have been shown to be associated with high levels of household income [14][15][16][17][18][58,59], high levels of education [15][16][17][58,60] and large block sizes [16]. Motives for planting and removing trees have proven to be highly varied, as have preferences for particular types of trees, suggesting that changes of garden ownership may be a major cause of tree changes in suburbia [61,62]. However, neither time since purchase at the parcel level nor mean time since purchase at the aggregate level has been included in any of the works that relate tree changes to other variables.
The main objective of this research is to identify urban tree cover changes in suburban Hobart, Tasmania, Australia between 2005 and 2015/16 using object-based CNN and to model the relationship between tree cover changes and socioeconomic variables. In order to meet the main objective, the following subobjectives are addressed:
• Perform stratified random sampling to select sample study areas.
The organisation of this paper is as follows: Section 2 presents the study area, datasets and the adopted methodology; Section 3 presents the results; Section 4 discusses the results; and Section 5 presents the conclusions and possible future work.

Study Area and Sample Selection
Fourteen suburbs in the inner and general residential zones of the western suburbs of Hobart, Tasmania, Australia were selected (Figure 1, Table 1) to represent a range of socioeconomic characteristics (median household income and tertiary education). The mean elevation of the selected suburbs ranges between 23 and 69 m (https://en-au.topographic-map.com/maps/jqqb/Hobart/). One sample point in each suburb was generated using the "random point creation" tool in ArcGIS Pro 2.4. For each sample point, a sample patch of four hectares was created by buffering the point (Table 1). A representative raster plot of sample patches is presented in Figure 2 (refer to Appendix A for all the sample patch images). Ten random private cadastral parcels from each sample patch were selected using the "create random points" tool (Figure 3). The selected parcels had to be completely inside the boundary of each sample patch; roads and parks were excluded from selection.

Datasets
Very high spatial resolution (VHSR) multispectral QuickBird satellite images (60 cm) with red, green, blue and near infra-red (NIR) spectral bands (Figure 4a), acquired in November 2005, were used; the datasets are summarised in Table 2. Socioeconomic data for the fourteen suburbs were obtained from the Australian Bureau of Statistics (www.abs.gov.au). Dates of sales of each parcel in the period 1983-2015 were obtained from the nationally leading property website (www.realestate.com.au); the years between the last sale and 2015 and the number of sales in the period 1983-2015 were extracted from these data. The 2015/16 images were georeferenced by associating recorded coordinates with corresponding ground marks. The transformation used a first-order polynomial (Affine), as it provides more accurate transformation results than other techniques [65,66]. The accuracy of the rectified images was cross-verified against the 2005 satellite images, and atmospheric and geometric corrections of the rectified images were determined prior to further image analysis.

Normalised Difference Vegetation Index
The Normalised Difference Vegetation Index (NDVI) was calculated from the 2005 satellite images by using the mean of the red and near infra-red bands (Equation (1)) [67]:
NDVI = (NIR - Red) / (NIR + Red) (1)
An NDVI value of 0.4 was used as the threshold to identify tree coverage in the 2005 images.
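The NDVI calculation and the 0.4 threshold can be sketched as follows; the band values here are hypothetical digital numbers, not the study's data.

```python
import numpy as np

def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    return (nir - red) / (nir + red + 1e-12)  # epsilon avoids division by zero

# Hypothetical 2x2 band values (reflectance-like digital numbers)
red = np.array([[50, 200], [60, 180]])
nir = np.array([[180, 210], [200, 190]])

tree_candidate = ndvi(red, nir) > 0.4  # the 0.4 threshold used for the 2005 images
```

Pixels pass only where near infra-red reflectance clearly dominates red, which is characteristic of healthy vegetation.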

Canopy Height Model (CHM)
The LiDAR point cloud datasets were merged and clipped to the study area for both 2008 and 2011 using LAStools (https://rapidlasso.com/lastools/). Ground and high-vegetation points in the classified LiDAR point cloud were represented by class 2 and class 5, respectively. The digital surface model (DSM) was prepared by filtering the class 5 point cloud using the "las2dem" tool.
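A canopy height model is conventionally the surface model minus a ground (terrain) model. A minimal sketch, assuming both models have already been rasterised (e.g. with las2dem) onto aligned grids:

```python
import numpy as np

# Hypothetical aligned rasters in metres above sea level: a terrain model
# from ground returns (class 2) and a surface model from high-vegetation
# returns (class 5).
dtm = np.array([[10.0, 10.5], [11.0, 11.2]])
dsm = np.array([[16.5, 10.6], [18.0, 13.4]])

chm = dsm - dtm           # canopy height above ground
tall_enough = chm >= 5.0  # the 5 m threshold used for the 2005 training data
```

The same difference raster, with a 2 m threshold, would serve the 2015/16 labelling step described later.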


Preparation of Training Samples
The training sample for the CNN model requires at least two land cover classes [69]. Hence, tree and other (nontree) classes were prepared. The tree class represented urban trees of different species within the sample patches, and the other class represented all other nontree features, including grassland, bare land, buildings, water bodies and roads.
Object-based image analysis (OBIA) in eCognition was used to segment the images using the multiresolution segmentation algorithm at the pixel level. The tree and nontree classes for the training dataset from the 2005 satellite images were prepared by calculating CHM and NDVI values (Figure 6). The shape and compactness parameters were set to 0.1 and 0.5, respectively. To find the optimum scale factor for segmentation, iterative segmentation was performed with scale factor values ranging from 50 down to 0.1 (Table 3).
A height threshold of five metres in the CHM was used to separate trees from other vegetation cover. This threshold was derived by assuming that a tree two metres tall in 2005 would grow at one metre per year.
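The 2005 labelling rule can be sketched as a joint threshold; applying it to per-segment mean values is an assumption for illustration, and the numbers are synthetic.

```python
import numpy as np

# Hypothetical per-segment mean NDVI and CHM values for five image objects.
seg_ndvi = np.array([0.55, 0.45, 0.10, 0.62, 0.41])
seg_chm = np.array([8.0, 3.0, 6.0, 12.0, 5.0])

# A 2005 segment is labelled "tree" only when it is both green (NDVI > 0.4)
# and tall (CHM >= 5 m); everything else falls into the "other" class.
is_tree = (seg_ndvi > 0.4) & (seg_chm >= 5.0)
```

The conjunction matters: NDVI alone would label grass as tree, and CHM alone would label tall buildings as tree.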
The 2015/16 images were segmented using the multiresolution segmentation algorithm with the scale, shape and compactness parameters set to 2, 0.1 and 0.5, respectively. The training dataset of the tree class for the 2015/16 images was prepared using CHM values only, without NDVI, because the 2016 Google Earth image lacks an NIR band. Segments with CHM values (from the 2011 LiDAR point cloud) greater than or equal to two metres were assigned to the tree class. Representative training samples for the tree and other classes were generated from the whole study area. Trees that were present in 2011 but not in 2015 were manually filtered out by visual examination.

Object-Based CNN for Tree Cover Identification
Some parts of this section are repeated from an earlier paper by the two senior authors [57]. The CNN workflow of Trimble's eCognition Developer 9.4 software was applied for tree extraction (Figure 7). This CNN workflow in eCognition is based on the Google TensorFlow API [69]. The analysis was run on a computer with a 64-bit operating system, 16 GB of RAM and an Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz.

Generate Labelled Sample Patches for CNN Model
In deep learning, finding the most suitable CNN architecture is still an open research question. When generating sample patches, three parameters must be considered: sample count, sample patch size and image layers. In the present research, 8000 sample patches were generated for each of the tree and other classes. The sample patch size was set to 22 × 22 pixels, chosen by a trial-and-error approach: values smaller than 22 × 22 increased tree canopy detection error, whereas larger values missed some of the small trees. Most small trees in the study area were found to fit within 22 × 22 pixels.
To apply max pooling while creating the CNN model, the size of the input training image should be an even number [69]. The samples were generated based on the thresholds for NDVI and CHM. The generated sample patches were saved in TIFF format (Figure 8). It took almost five minutes to generate the sample patches for each class; the processing time grows with the number of sample patches to be generated. All four spectral bands (blue, green, red and near infra-red) were used when generating samples from the 2005 images, whereas three spectral bands (blue, green and red) were used for the 2015/16 images.
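Cutting fixed-size, all-band patches around labelled locations can be sketched as below; the function name, image and anchor point are hypothetical.

```python
import numpy as np

def extract_patch(image: np.ndarray, row: int, col: int, size: int = 22) -> np.ndarray:
    """Cut a size x size window (all bands) centred near (row, col).

    `image` has shape (bands, height, width); with an even `size`, the
    window extends size//2 pixels up/left and size//2 down/right of the
    anchor, matching the even patch sizes required for max pooling.
    """
    half = size // 2
    patch = image[:, row - half:row + half, col - half:col + half]
    if patch.shape[1:] != (size, size):
        raise ValueError("patch would fall outside the image")
    return patch

# Hypothetical 4-band 100x100 image and one labelled tree location.
img = np.zeros((4, 100, 100), dtype=np.float32)
patch = extract_patch(img, row=50, col=50)
```

Rejecting edge-clipped windows keeps every training patch exactly 22 × 22, so all samples feed the network with a consistent shape.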

Create CNN Model
A simple CNN model was created with one hidden layer, defined by the kernel size, number of feature maps and max pooling. Because even-sized kernels generate hidden units located between pixels that must then be shifted to match the pixel borders, odd-sized kernels (13 × 13) with 40 feature maps were used. Max pooling with a 2 × 2 filter and a stride of 2 in both the horizontal and vertical directions was applied to reduce the resolution of the feature maps. The hidden-layer kernel thus has a weight tensor of 4 × 13 × 13 × 40: the first factor (4) is the number of image layers, the second and third factors (13 × 13) describe the number of units in the local neighbourhood from which connections are forwarded into the hidden layer, and the final factor (40) is the number of feature maps generated. The hidden layer of this network therefore contains 27,040 (4 × 13 × 13 × 40) trainable weights.

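The dimensions quoted above can be checked with a few lines of arithmetic (valid convolution output size, pooled size, and the trainable-weight count for the hidden layer):

```python
# Dimensions from the text: 22x22 input patches, 4 image layers (bands),
# one hidden layer with 13x13 kernels and 40 feature maps, followed by
# 2x2 max pooling with stride 2.
bands, patch, kernel, maps, pool = 4, 22, 13, 40, 2

conv_out = patch - kernel + 1             # valid convolution: 10x10 per feature map
pooled = conv_out // pool                 # after 2x2/stride-2 max pooling: 5x5
weights = bands * kernel * kernel * maps  # trainable kernel weights in the hidden layer

print(conv_out, pooled, weights)  # 10 5 27040
```

The weight count 4 × 13 × 13 × 40 = 27,040 matches the figure stated in the text.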

Train CNN Model
The model was then trained on the labelled sample patches, adjusting the model weights using backpropagation. The learning rate is an important parameter, as it defines the amount by which the weights are adjusted in each iteration of stochastic gradient descent optimisation [69]. A learning rate of 0.0015 was chosen by trial and error: higher values speed up training but may fail to reach the bottom of the optimal minimum, while smaller values slow down training and may become stuck in local minima, ending with weights far from the optimal settings [69]. Training steps and training samples were set to 5000 and 50, respectively. With the given labelled samples and weight parameters, the training process took almost 30 min.

Apply CNN Model
After applying the trained CNN model to the input image (four layers for the 2005 image and three layers for the 2015 image), heatmaps were produced for the tree class (Figure 9) using the "apply convolutional neural network" algorithm in eCognition. The heatmaps give the probability of a tree at each pixel, between 0 and 1 (values close to 1 indicate a high likelihood of trees; values close to 0, a low likelihood). To extract trees from the image, the heatmaps were smoothed using a 7 × 7 Gaussian filter with a 32-bit float output type. The local maxima of the smoothed tree heatmap were then generated using a 3 × 3 morphology (dilate) filter.
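The smoothing and local-maxima steps can be sketched with SciPy on a synthetic heatmap; the sigma value is an illustrative choice (the paper specifies a 7 × 7 filter, not a sigma), and comparing each pixel with its 3 × 3 dilation is a standard way to find local maxima.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, grey_dilation

# Hypothetical tree-probability heatmap in [0, 1].
rng = np.random.default_rng(0)
heat = rng.random((50, 50))
heat[20:24, 20:24] = 1.0  # a strong tree response

# Smooth, then mark pixels equal to their 3x3 neighbourhood maximum
# (the dilated image): those are local maxima of the smoothed heatmap.
smooth = gaussian_filter(heat.astype(np.float32), sigma=1)
local_max = smooth == grey_dilation(smooth, size=(3, 3))
peaks = np.argwhere(local_max & (smooth > 0.5))  # candidate tree positions
```

On real heatmaps the 0.5 probability cut-off suppresses the many weak background maxima, leaving peaks at likely tree crowns.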


Object-Based Classification Refinement
The heatmaps were segmented using multiresolution segmentation with a scale factor of 10, shape 0.1 and compactness 0.5. Segments with tree probability values greater than 0.5 were classified into a refined tree class. To reduce classification noise caused by the similar spectral properties of trees, grass and nontree objects, a CHM threshold of less than or equal to 2 m and an NDVI threshold of less than 0.1 were applied in the classification. The refined tree objects were further processed using the assign/merge function, pixel-based object resizing and the remove-object function. Tree segments with a relative border to neighbouring tree segments greater than or equal to 0 were merged. Growing and shrinking modes, with surface tension values greater than or equal to 0.5 and box sizes in X, Y and Z of 5, 5 and 1, respectively, were applied sequentially in the pixel-based object resizing algorithm to refine the shapes of the tree segments. To eliminate small segments that were not trees, a pixel-count threshold was used: tree segments with areas smaller than or equal to 200 pixels (equivalent to 4.5 square metres) were removed from the tree class. Finally, some manual editing was done to refine the tree class, and the result was exported as an ESRI (Environmental Systems Research Institute) shapefile.
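The pixel-count filter at the end of the refinement can be sketched with connected-component labelling; the mask and segment sizes here are synthetic.

```python
import numpy as np
from scipy.ndimage import label

# Hypothetical binary tree mask: one large segment and one tiny speck.
mask = np.zeros((40, 40), dtype=bool)
mask[5:25, 5:25] = True    # 400-pixel segment: kept
mask[32:34, 32:34] = True  # 4-pixel speck: removed

labels, n = label(mask)                # connected components of the mask
sizes = np.bincount(labels.ravel())    # pixel count per component
sizes[0] = 0                           # ignore the background component
min_pixels = 200                       # the threshold used in the refinement step
keep = sizes[labels] >= min_pixels     # mask with small segments dropped
```

Only segments of at least 200 pixels survive, mirroring the removal of sub-threshold "tree" objects before export.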

Accuracy Assessment
A manual digitisation of one randomly selected parcel from each of the 14 patches for the 2015/16 images was used as the ground truth in an accuracy assessment. Trees were easy to discriminate using shape, colour and shadow length. The accuracy of tree detection was assessed using true positive (TP), false positive (FP) and false negative (FN) counts at the pixel level [70], as presented in Equations (3) to (6). TP represents pixels correctly identified as trees that intersect exactly with the ground truth. FPs are pixels classified as tree objects by the CNN classification that were not trees in the ground truth. FNs are pixels not detected as trees by the applied CNN classification method. Four statistical measures based on TP, FP and FN were used:

Precision (P) = TP / (TP + FP) (3)

Recall (R) = TP / (TP + FN) (4)

F1 = 2 × P × R / (P + R) (5)

Intersection Over Union (IOU) = TP / (TP + FP + FN) (6)

Precision (P) answers the question, "How many of the classified pixels are trees?" Recall (R) determines the proportion of the actual (ground truth) tree pixels that were classified as trees in the image. The balance between P and R was measured with the F1 score. The intersection over union (IOU) validation metric measures the accuracy of the classification results against the ground truth [71]: an IOU of 100% means the detected object exactly overlaps the ground truth mapping, whereas 0% indicates no overlap.
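Equations (3) to (6) can be computed directly from the pixel counts; the TP/FP/FN values below are hypothetical, not the study's figures.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Pixel-level metrics from Equations (3)-(6): precision, recall, F1, IOU."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}

# Hypothetical counts for one validated parcel.
m = detection_metrics(tp=870, fp=130, fn=150)
```

Note that IOU is always the strictest of the four: its denominator includes both error types at once, which is why the mean IOU (70%) reported below sits under the mean precision and recall.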

Statistical Analysis
Statistical analysis was carried out in Minitab 18 software [72]. Regression analysis was performed at the patch level with five predictor variables (income, tertiary education, mean parcel size, mean years since last sale and mean number of times sold between 1983 and 2015) to model each of tree cover loss, gain and persistence. The mean parcel size in the sample-level analysis was the average area of the 10 random parcels; similarly, the mean years since sale and mean number of times sold were averages over the 10 random parcels. The model with the highest adjusted R² in which all predictor variables had significant (p < 0.05) slopes was selected.
A general linear model (GLM) was used to model each of tree loss, gain and persistence at the parcel level with four predictor variables: sample patch number, parcel size, years since sale and number of times sold. The sample patch number was used as a random variable; the others were covariates. Due to the low number of sample patches, adjusted R² was used to compare alternative models, and the model with the highest adjusted R² in which all predictor variables had significant (p < 0.05) slopes was selected.
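The adjusted R² used for model selection can be sketched outside Minitab with an ordinary least squares fit; the data below are synthetic stand-ins for the patch-level variables.

```python
import numpy as np

def adjusted_r2(y: np.ndarray, X: np.ndarray) -> float:
    """Ordinary least squares fit and adjusted R^2 for predictor matrix X."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - ss_res / ss_tot
    # Penalise R^2 for the number of predictors, as used in model selection.
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical patch-level data (14 patches): loss vs. income and parcel size.
rng = np.random.default_rng(1)
income = rng.random(14)
parcel = rng.random(14)
loss = 2.0 * income + 1.5 * parcel + rng.normal(0, 0.1, 14)
adj = adjusted_r2(loss, np.column_stack([income, parcel]))
```

The penalty term matters precisely because only 14 patches are available: adding weak predictors raises plain R² but lowers adjusted R².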

Accuracy Assessment
The IOU values ranged from 62% to 88%. The F1 measure values ranged from 77% to 94%. The mean IOU value was found to be 70%. The mean precision and recall values were 87% and 85%, respectively.
An overall accuracy of 96% and a kappa coefficient of 0.77 were found for tree extraction from the 2005 data, whereas accuracy was higher for the 2015/16 data, with 98% overall accuracy and a 0.93 kappa coefficient (Figure 10).


Tree Cover Change
There was a net tree cover loss in all the sample patches. The highest tree cover losses were in the Kingston (18.4%), Blackmans Bay (14.1%) and Kingston Beach (12.9%) sample patches. The lowest tree cover losses were in Chigwell (3.9%), North Hobart (4.2%) and Goodwood (4.6%) (Figure 11).

There was a strong positive relationship between the net tree cover losses of 2005-2015 and tree cover in 2005, with a strong positive residual for the net loss for Kingston and a strong negative residual for North Hobart (Figure 12). The best model for tree cover loss at the patch level had positive influences from income and mean parcel size (Table 4). At the parcel level, parcel size was the only predictor of tree cover loss: the larger the parcel, the greater the tree loss (Table 5). The best model for tree cover gain at the patch level had positive influences from tertiary education, mean parcel size and mean years since sale (Table 4). At the parcel level, a poorly explanatory model had positive influences from parcel size and years since sale (Table 5). Tree persistence was well explained at the patch level by tertiary education, mean parcel size and mean years since sale, all with positive influences (Table 4). At the parcel level, only the influence of parcel size remained (Table 5).
Discussion
Mapping urban tree cover changes with high thematic accuracy in an automated way is a challenging task, and various attempts have been made in the past. Ellis and Mathews [73] used OBIA to identify urban tree canopy changes between 2006 and 2013 in Oklahoma City using RGB aerial imagery of one-metre spatial resolution and LiDAR data. Guo et al. [7] used very high-resolution RGB aerial images from 2011 (0.1 m) and 2015/16 (0.075 m) and a 2011 LiDAR dataset to map city-wide canopy cover changes in Christchurch, New Zealand using OBIA and a random forest classifier. However, both studies [7,73] acknowledged that their tree extraction results could have been better with aerial imagery containing a near-infrared (NIR) band, which would fix misclassifications caused by spectral similarities between roof materials and trees. In the present study, we used CHM and NDVI values as thresholds to generate training samples of the tree class from the 2005 satellite imagery; these derived threshold values rely on the NIR band. However, due to the unavailability of an NIR band for the 2015/16 imagery, we generated tree training samples from the RGB bands using only the CHM threshold, with manual editing.
Branson et al. [74] used aerial and Google Street View images to extract urban trees, detect tree species and map tree species cover changes in a Californian city using a state-of-the-art CNN method. In contrast to the method of [74], we used LiDAR data to extract urban trees from Google Earth images using an object-based CNN. The LiDAR data provided an accurate extent and location for each tree by adding a third dimension on top of latitude and longitude.
In the present study, the CNN model was trained by using automatically generated samples. The object-based CNN method when trained with manually generated samples might produce better accuracy than the present research if applied to very high-resolution multispectral imagery [56]. However, the manual preparation of training samples might not be always feasible in terms of time and costs.
A comparison with previous relevant studies using the OBIA and CNN methods for urban tree cover mapping reveals a novelty in the combination of the use of LiDAR, very high-resolution satellite imagery, aerial imagery and the latest Google Earth imagery, with an overall accuracy of above 95% based on the confusion matrix and 70% based on IOU.

Urban Tree Cover Change
The influence of years since house sale on tree cover gain and tree cover persistence (Figures 13 and 14) is consistent with the hypothesis that tree change is associated with changes in garden/parcel ownership [61,62]. Gain would result from the growth of new trees planted soon after possession and those allowed to survive. Persistence would reflect the stability of trees in long-possessed gardens. The lack of a negative effect of time since sale on tree cover loss over the decade may relate to a putatively short period in which trees are removed to satisfy preferences for other trees or fewer trees. If this period were a year and there was a ten percent house turnover per annum, the same tree loss would be expected in each of the ten years between 2005 and 2015, making it unlikely that the time since sale would have a linear relationship with tree loss. In contrast, all gains would be incremental after the initial loss.
The significant relationship at the patch scale between tree cover loss and median household income, with parcel size (Figure 13) held constant, is superficially puzzling, given that household income was the best predictor of the percentage frequency of trees in front gardens in Hobart suburbs out of many socioeconomic, environmental and demographic variables [15]. The positive correlation between household income and tree cover loss might be taken to indicate that people with higher household incomes can better afford tree removal from their properties than poorer people, or that people with higher incomes are more likely to undertake building extensions, landscaping and other structural development activities that result in tree losses. However, the main reason is likely to be that income relates closely to absolute tree abundance, so equal proportionate losses will result in higher absolute losses in richer areas. Our loss figures are the absolute percentage of a block from which the tree cover has disappeared, not a percentage of the 2005 cover. The net tree cover loss contrasts with the widespread tree density gain recorded for Hobart in an earlier period [16] but is consistent with some other observations from Australia [75–78] and elsewhere [7,73,79–81]. Tree cover is likely to be predicted by tree density, except where very recent suburbs on previously treeless areas are contrasted with older suburbs, or where houses built amongst pre-existing trees are contrasted with suburbs of the same age built in treeless areas. The highest losses of tree cover between 2005 and 2015/16 were in those areas where new housing developments occurred amongst indigenous trees. The removal of older local indigenous trees tends to occur gradually, as they drop limbs. The older suburbs and those developed on farmland did not exhibit high levels of net tree loss.
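The proportionate-loss explanation above is simple arithmetic, illustrated here with assumed cover figures (the suburb names and percentages are hypothetical):

```python
# Two suburbs lose the same proportion of their 2005 tree cover, but the
# richer suburb, starting with more cover, loses more in absolute terms,
# which is what the study's loss metric (absolute % of a block) records.
cover_2005 = {"richer_suburb": 40.0, "poorer_suburb": 10.0}  # % of block covered
proportional_loss = 0.25  # both suburbs lose a quarter of their cover (assumed)

absolute_loss = {k: v * proportional_loss for k, v in cover_2005.items()}
print(absolute_loss)  # {'richer_suburb': 10.0, 'poorer_suburb': 2.5}
```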
Variations of tree cover loss, gain and persistence with parcel size (Figures 13-15) were expected, because the opportunity to lose trees is much greater with more trees in more space [16]. The positive effects of a high proportion of householders with tertiary education on tree gain and persistence (Figures 14 and 15) are consistent with the influence of a tertiary education on garden complexity [15].

Limitation
The main limitation of this research is the time difference between the remote-sensing images used (2005 and 2015/16) and the LiDAR datasets (2008 and 2011). This could have introduced error into the analysis, because the analysis uses the CHM generated from the LiDAR dataset to identify tree cover. This means trees cleared between the acquisition of the orthophoto (2005) and the LiDAR data (2008) may not have been classified as trees. Conversely, trees planted after the acquisition of the LiDAR data (2011) that were taller than two metres at the time of the orthophoto acquisition (2015/16) might not be classified as trees. Additionally, inconsistency in the spatial resolution of the input images due to their different sources (QuickBird satellite images, Google Earth images and aerial images) might have introduced some errors.

Conclusions
Urban trees have social, economic and environmental benefits, to the extent that their maintenance or increase is often an objective for governments. The development and implementation of policies requires accurate data on tree changes. The present research successfully mapped tree cover changes and modelled the relationship of those changes with socioeconomic factors. This research has made three major contributions: first, the use of automatically generated training samples to train the CNN model; second, the application of a combined CNN and OBIA method to map urban trees and urban tree cover changes per sample and per cadastral parcel spatial analysis unit; and third, the modelling of the relationship between tree cover change and socioeconomic variables. A net tree cover loss was measured in the study area of Greater Hobart between 2005 and 2015/16. This finding may motivate local councils to make plans and policies to reverse this tendency, such as increasing tree planting on public lands.
This research used a simple CNN model with a single hidden layer. In future research, multiple hidden layers with varied parameters could be applied and tested. Similarly, deeper CNN architectures, including region-based CNN (R-CNN) and fully connected CNN (F-CNN), could be further tested for urban tree coverage mapping and tree species identification.
Five socioeconomic predictor variables were used to model the tree cover changes using regression analysis. Topographic and climatic variables, such as slope, elevation, aspect, solar radiation, geology and precipitation, could be added as predictors in developing higher-order spatial-statistical methods that may further the understanding of spatial and temporal associations in tree cover change mapping.
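The form of multiple regression used here can be sketched with ordinary least squares on synthetic data. Everything in this example is hypothetical: the two predictors shown (of the study's five), their value ranges, and the generating coefficients are invented for illustration only.

```python
import numpy as np

# Synthetic parcel-level data (assumed names, units and ranges).
rng = np.random.default_rng(0)
n = 50
parcel_size = rng.uniform(300, 2000, n)    # m^2, assumed range
median_income = rng.uniform(40, 120, n)    # $k, assumed range
tree_loss = 0.01 * parcel_size + 0.05 * median_income + rng.normal(0, 2, n)

# Ordinary least squares: design matrix with an intercept column.
X = np.column_stack([np.ones(n), parcel_size, median_income])
coef, *_ = np.linalg.lstsq(X, tree_loss, rcond=None)

# Coefficient of determination (R^2), the statistic the study reports.
fitted = X @ coef
r_squared = 1 - np.sum((tree_loss - fitted) ** 2) / np.sum(
    (tree_loss - tree_loss.mean()) ** 2
)
print(round(r_squared, 2))
```

Adding the topographic and climatic variables suggested above would simply mean appending further columns to the design matrix, although spatial autocorrelation between parcels would then argue for the higher-order spatial-statistical methods mentioned.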