Health Assessment of Eucalyptus Trees Using Siamese Network from Google Street and Ground Truth Images

Abstract: Urban greenery is an essential characteristic of the urban ecosystem, offering various advantages such as improved air quality, benefits to human health, storm-water run-off control, carbon reduction, and an increase in property values. Therefore, identification and continuous monitoring of vegetation (trees) is of vital importance for our urban lifestyle. This paper proposes a deep learning-based network, the Siamese convolutional neural network (SCNN), combined with a modified brute-force-based line-of-bearing (LOB) algorithm that evaluates the health of Eucalyptus trees as healthy or unhealthy and identifies their geolocation in real time from Google Street View (GSV) and ground truth images. Our dataset represents Eucalyptus trees' various details from multiple viewpoints and scales, and with different shapes and textures. The experiments were carried out in the Wyndham city council area in the state of Victoria, Australia. Our approach obtained an average accuracy of 93.2% in identifying healthy and unhealthy trees after training on around 4500 images and testing on 500 images. This study helps to identify Eucalyptus trees with health issues, or dead trees, in an automated way that can facilitate urban green management and assist the local council in making decisions about plantation and improvements in looking after trees. Overall, this study shows that even against a complex background, most healthy and unhealthy Eucalyptus trees can be detected by our deep learning algorithm in real time.


Introduction
Street trees are an essential, though relatively neglected, feature of urban and metropolitan areas. Their benefits include air filtering, water interception, cooling, minimising energy consumption, erosion reduction, pollution management, and run-off control [1,2]. Various trees are planted in urban areas due to street trees' social, economic and environmental advantages. One such tree, Eucalyptus, is a valuable asset for communities in urban areas of Australia. Eucalyptus trees are icons of the Australian flora, often called gum trees. They dominate the Australian landscape with more than 800 species, forming forests, woodlands and shrub-lands in all environments except the most arid deserts. Evidence from DNA sequencing and fossil discovery shows that Eucalyptus had its evolutionary roots in Gondwana, when Australia was still linked to Antarctica [3]. Traditionally, indigenous Australians have used almost all parts of Eucalyptus trees: leaves and leaf oils have medicinal properties, saps may be used as adhesive resins, and bark and wood were used to make vessels, tools and weapons such as spears and clubs [4]. Eucalyptus native forests are significant for the conservation of Australia's rich biodiversity.
There are two factors that are detrimental to the health of street trees. First, urban trees are under persistent strain, e.g., excessive soil moisture and soil mounding on roots in nurseries, which have an adverse effect on their health [5]. Secondly, the urban ecosystem is distinguished by elevated peak temperatures relative to nearby rural areas [6], soil compaction, limited root growth, pollution of groundwater [7], and high air pollution concentrations caused by community activities. Urban soil usually contains a significant volume of inert building waste, contaminants and de-icing salts, and has low soil quality and high bulk density, thus sustaining low biological activity and poor organic matter content [8,9]. Both of these factors raise the likelihood of water and nutrient stress, which degrades the metabolism and development of a tree and reduces its capacity to provide ecosystem services. Urban tree conditions are adversely affected by soil compaction, low hydraulic conductivity, poor soil aeration and mostly insufficient available rooting space [9]. In addition, inadequate site conditions raise the threat of disease and insect infestation [6].
The evaluation of tree health conditions is highly critical for biodiversity, forest management, global environmental monitoring and carbon dynamics. Unhealthy tree features are identifiable, and a detection and classification model can be built using deep learning to intelligently diagnose Eucalyptus trees as healthy or unhealthy/dead. Given the importance of urban trees to the community, they should be adequately maintained, including obstacle prevention, regeneration, and substitution of dead or unhealthy trees. Ideally, skilled green managers need precise and consistent spatial data on tree health. About 60% of the riparian tree vegetation in extensive wetlands and floodplains was reported to be in poor health or dead [10]. Chronic declines are associated with extreme weather conditions, resource management practices, various pathogens, pests and parasites. Trees are stressed [11] in landscapes where the soil has a poor drainage mechanism, which also results in low tree growth. Common factors such as soil erosion, nutrient deficiency, allelopathy, biodiversity loss, pests, and diseases affect the health of Eucalyptus species.
Detection and recognition of Eucalyptus tree health is a challenging task, since many trees span only a few pixels in the input images, and some trees are overshadowed by other trees or cannot be found due to weather or lighting conditions. To address these challenges and achieve high accuracy and precise prediction, a large amount of labelled training data is required for extracting healthy- and unhealthy-class features. For this purpose, we used GSV imagery and ground truth images obtained from various viewpoints and at different times. This study uses the Siamese convolutional neural network (SCNN) [12] to develop an automated model for identification and classification, and a line-of-bearing measurement approach paired with a spatial aggregation approach is used to estimate the geolocation of each Eucalyptus tree. We concentrated on the identification of healthy and unhealthy Eucalyptus trees along the streets and roads in the Wyndham city council area [13]. This study aims to use self-created ground truth and GSV [14] imagery for the geolocation, identification and classification of healthy and unhealthy Eucalyptus trees to prevent damage, which can significantly reduce ecosystem harm and financial loss. GSV is an open image series of street-wise panoramic views with approximate geolocation details acquired on mobile platforms using GPS, wheel encoders, and inertial navigation sensors (using multiple sources such as cars, trekkers and boats) [15]. GSV has been widely used to augment geographical information in a variety of areas of interest, including urban greenery [16,17], land use classification [18,19] and tree shade provision [20].
Our key contributions are (a) classification of trees as healthy or unhealthy and (b) identification of the geolocation of the Eucalyptus trees. All these evaluations are based on GSV imagery and self-gathered ground truth image data from streets. Our experiments show that the proposed method can effectively detect and classify healthy and unhealthy Eucalyptus trees across varied datasets and complex backgrounds. Our proposed method for geolocation identification gives reliable results and could be applied to the geo-identification of other roadside objects. Figure 1 shows the overall visual representation of this study.

Related Work
Numerous works have been published on detection and recognition in various areas, such as fruit and vegetable plant leaf disease detection [21], vegetation detection [22], pedestrian detection [23], face detection [24] and object detection [25], using various deep learning algorithms [26]. Automatic data analysis in the remote sensing (RS) [27] and computer vision [28] fields is of vital significance. RS data have been used in urban areas to assess tree health. A large volume of studies shows various RS techniques being used to determine the current condition of trees; in contrast, a minimal amount of research addresses the identification and classification of dead trees. Milto Miltiadou et al. [29] presented a new way to detect dead Eucalyptus camaldulensis with the introduction of DASOS (feature vector extraction); they explored the possibility of dead tree detection without tree delineation from voxel-based full-waveform (FW) LiDAR. Shendryk et al. [30] suggested a bottom-up algorithm to detect Eucalyptus tree trunks and the delineation of individual trees with complex shapes. Agnieszka Kamińska et al. [31] used remote sensing techniques, including airborne laser scanning and colour infrared imagery, to classify living and dead trees, and concluded that only the airborne laser scanner detects dead trees at the single-tree level.
Martin Weinmann et al. [32] proposed a novel two-step approach to detect single trees in heavily sampled 3D point cloud data obtained from urban locations, tackling semantic classification by assigning semantic class labels to irregularly separated 3D points and semantic segmentation by separating individual items within the labelled 3D points. S. Briechle et al. [33] worked on the PointNet++ 3D deep neural network combined with imagery data (LiDAR and multispectral) to classify various species as well as standing dead tree crowns; laser echo pulse width and multispectral characteristics were also introduced into the classification process, and individual trees' 3D segments were created in a pre-processing stage of a 3D detection system. Yousef Taghi Mollaei et al. [34] developed an object-oriented model using high-resolution images to map pest-dried trees; the findings confirm that the object-oriented approach can classify the dried surfaces with precise detail and high accuracy. W. Yao et al. [35] proposed an approach to individual dead tree identification using LiDAR data in mountain forests: three-dimensional coordinates were derived from laser beam reflections, pulse intensity and width using waveform decomposition, and 3D single trees were then detected by an optimized method that describes both dominant trees and small understory trees within the canopy model.
According to Xiaoling Deng et al. [36,37], machine learning has been used to set several benchmarks in the field of agriculture. In the prior work of W. Yao et al. [35] and Shendryk et al. [38], the identification of dead trees was performed by individual tree crown segmentation prior to the health assessment. Meng R. et al. [39], Shendryk et al. [30], López-López M et al. [40], Barnes et al. [41] and Fassnacht et al. [42] noted that most current tree health studies centred either on evaluating the defoliation of the tree crown or the overall health status of the tree, with minimal attention to the discolouration of the tree crown. Dengkai et al. [43] used a set of field-assessed tree health indicators to define tree health, which was classified with a Random Forest classifier using airborne laser scanning (ALS) data and hyperspectral imagery (HSI); they compared the outcomes of ALS data, HSI and their combination, and then analysed the degree of classification accuracy. Nasi et al. [44,45] reported, in two different pieces of research, the potential of UAV-based photogrammetry and HSI for mapping bark beetle damage at tree level in an urban forest. Degerickx et al. [46] performed tree health classification based on chlorophyll and leaf area index derived from HSI, using ALS data for individual tree crown segmentation. Xiao et al. [47] used the normalised difference vegetation index (NDVI) to detect healthy and unhealthy trees; they found it challenging to map tree health across various species or in places where many tree species coexist. Goldbergs et al. [48] evaluated local maxima and watershed models for the detection of individual trees and found that these models perform efficiently for dominant and co-dominant trees. Fabian et al. [49] presented work on random forest regression to predict total trees using local maxima and a classification process to identify tree, soil and shadow. Li et al. [50] introduced a Field-Programmable Gate Array (FPGA) implementation for tree crown detection, achieving significantly faster calculation without loss of functionality.
The Siamese network [12] has been used in a variety of applications, including signature verification [51], object tracking [52], railway track switches [53], plant leaf disease detection [54], and coronavirus disease detection [55]. Bromley et al. [56] proposed a neural network model for signature matching, introducing the Siamese network for the very first time. Bin Wang et al. [57] presented a few-shot learning method for leaf classification with a small sample size based on the Siamese network. In contrast, we use a Siamese convolutional neural network (SCNN) combined with a modified brute-force-based line-of-bearing (LOB) algorithm to classify Eucalyptus trees as healthy or unhealthy and to find their geolocation.

Study Area and GIS Data
The Wyndham city council (VIC, Australia) area [13] was chosen as the study area, as shown in Figure 2. It is located on Melbourne's western outskirts, covers an area of 542 km² and has a coastline of 27.4 km. It has an estimated resident population of 270,478, according to the 2019 census.
Wyndham is currently the third fastest-growing local council in Victoria. Wyndham's population is growing and diverse, and the community forecasts indicate the population will be more than 330,000 by 2031 [13]. There are 19 suburbs (Cocoroc, Eynesbury, Hoppers Crossing, Laverton North, Laverton RAAF, Little River, Mambourin, Mount Cottrell, Point Cook, Quandong, Tarneit, Truganina, Werribee, Werribee South, Williams Landing, Wyndham Vale) in Wyndham [58]. Wyndham City Council is committed to enhancing the environment and liveability of residents. As part of this commitment, thousands of new trees are planted each year to increase Wyndham's tree canopy cover through the street tree planting program.

Google Street View (GSV) Imagery
The orientation of Eucalyptus trees (healthy and unhealthy) in a 360° GSV panorama can be identified from GSV images. Static street view images were downloaded via the GSV image application programming interface (API) [59] by supplying the corresponding parameter information in uniform resource locators (URLs) [60]. The GSV API snaps the requested coordinates automatically to the nearest GSV viewpoint. We took four GSV images with a field of view (fov) of 90° and headings of 0°, 90°, 180° and 270°, respectively, as shown in Figure 3. The "street-view" Python package [61] was used to acquire accurate latitude and longitude values for each GSV viewpoint, converting the requested coordinates to the nearest available Panorama IDs (i.e., a unique panorama ID with capture date [year, month], latitude and longitude). The latest Panorama ID was then used as the input location parameter, as shown in Figure 4. We built a Python script to create the URLs and automatically download 1000 GSV images covering the study area. To remove the Google logos, we cropped the downloaded images.
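As an illustration of how such URL requests can be composed, the sketch below builds one request per heading using the public Street View Static API parameter names; the API key, coordinates and helper name are placeholders, not the actual script used in this study:

```python
# Build Street View Static API request URLs for the four headings used in
# this study (fov 90°; headings 0°, 90°, 180°, 270°). The endpoint and
# parameters follow Google's public documentation; the key is a placeholder.
BASE_URL = "https://maps.googleapis.com/maps/api/streetview"

def gsv_urls(lat, lng, api_key, size="640x640", fov=90):
    """Return one GSV request URL per cardinal heading for a viewpoint."""
    urls = []
    for heading in (0, 90, 180, 270):
        urls.append(
            f"{BASE_URL}?size={size}&location={lat},{lng}"
            f"&fov={fov}&heading={heading}&key={api_key}"
        )
    return urls

# Hypothetical coordinates inside the Wyndham study area.
urls = gsv_urls(-37.8982, 144.6580, api_key="YOUR_API_KEY")
```

Downloading all 1000 images is then a loop over viewpoints, requesting each of the four URLs and cropping the Google logo from the saved image.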

Annotation Data
For deep supervised learning algorithms to be practical, large image datasets are essential. From GSV images acquired with screen captures on Google Maps, we created 1000 image data points by manually tagging Eucalyptus trees, as can be seen in Figure 5. To increase the methodology's transferability, around 3500 images of random Eucalyptus trees in the Wyndham city council area, Victoria, Australia, were also taken for training, validation and testing of the model. We used "labelling" [62] for the ground truth and panorama images; it is a graphical image annotation tool written in Python that uses Qt for its interface. Annotations are stored as XML files in PASCAL VOC, the format used by ImageNet. We used the PASCAL VOC format because the Siamese network supports it. In DL, training an algorithm requires ample training and validation data to minimise and prevent overfitting of the model, while a test dataset is required to assess the trained model's performance. In total, 4500 images from GSV and self-gathered images were annotated and used as the training dataset, 500 for validation, and another 500 for testing (accuracy) evaluation.

Training Siamese CNN
We trained the Siamese CNN on the central idea that if the two input images are from the same class, their feature vectors should be very similar, and if they are from different classes, their feature vectors should be clearly different. The similarity score therefore varies with the input image types, reflecting whether the two inputs belong to the same class.

Siamese CNN Architecture
The word Siamese refers to twins [12]. A Siamese neural network is a sub-class of neural network architecture that comprises two or more networks [63]. These networks are copies of the same network, i.e., they have the same configuration with the same parameters and weights.
We used a Siamese network consisting of two identical convolutional neural networks (CNNs) [64]. The network architecture is the same as in our previous work [21], where each individual CNN comprises six convolutional layers and three fully connected (dense) layers. Each convolution layer is characterised by its input and a number of filters. We used a 3 × 3 filter size for all convolution layers. Each layer's filter outputs are passed to the next connected layer, which extracts increasingly valuable features. One of the key benefits of the convolutional network is that the input image can be much bigger than the candidate image; furthermore, in one evaluation, it measures the similarity over all translated sub-windows on a dense grid. We search multiple scales in one forward pass by assembling a mini-batch of scaled images. The output of this network is a score map. To enhance convergence speed, batch normalization [65] is applied to all convolutional layers except the last. We used five max-pooling layers after the convolutional layers to reduce the computational cost. Each max-pooling layer applies a 2 × 2 filter that slides over its input and selects the maximum value in each window as the output. The first two fully connected layers have ReLU activation [66], while the last layer (also known as the output layer) has SoftMax activation [67]. The SoftMax activation finds the node with the maximum probability value and forwards it as the output. A dropout of 0.5 is added to the fully connected layers to prevent over-fitting in the model. Our model has 51,161,305 parameters in total. Figure 6 is a visual representation of our Siamese network.

Contrastive Loss Function
Features extracted by the subnetworks are fed into the decision-making component of the network, which determines the similarity. This decision-making component can be a loss function [68], i.e., the contrastive loss function [69].
We trained the Siamese CNN with the contrastive loss function [69]. Contrastive loss is a distance-based loss function used to learn embeddings in which the Euclidean distance is small between two related points and large between two unrelated points [69]. Therefore, if the input images are of the same class, the loss function drives the network to output features that are close in feature space, and if the input images are not similar, features that are far apart. The similarity function is

D_w(x_1, x_2) = ‖F_w(x_1) − F_w(x_2)‖,  (1)

where x1 and x2 are the input images that share the parameter vector w, Fw(x1) and Fw(x2) represent the mappings of the inputs into the feature space, and Dw is the Euclidean distance between them. By calculating the Euclidean distance Dw between the feature vectors, the convolutional Siamese network can be seen as a function that measures the similarity between x1 and x2.
We use the contrastive loss function defined by Chopra et al. [70,71] in Siamese network training:

L(w, y, x_1, x_2) = y · (1/2) D_w² + (1 − y) · (1/2) max(0, m − D_w)²,  (2)

where y is a binary label assigned to the input images x1 and x2 (y = 1 if both inputs are of the same class and y = 0 if they are of different classes), and m > 0 is a margin value that must be chosen experimentally depending on the application domain.
Minimizing L(w, y, x1, x2) with respect to w will then result in a small value of Dw(x1, x2) for images of the same class and a high value of Dw(x1, x2) for images of different classes. This is visually represented in Figure 7.
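The distance and loss described above can be sketched in a few lines of plain Python (a minimal illustration with y = 1 for a same-class pair; the margin value here is an arbitrary example, not the one tuned in this study):

```python
import math

def euclidean_distance(f1, f2):
    """D_w: Euclidean distance between two feature vectors F_w(x1), F_w(x2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def contrastive_loss(f1, f2, y, margin=1.0):
    """Contrastive loss with y = 1 for same-class pairs, y = 0 otherwise.

    Same-class pairs are pulled together (quadratic in the distance);
    different-class pairs are pushed at least `margin` apart.
    """
    d = euclidean_distance(f1, f2)
    return y * 0.5 * d ** 2 + (1 - y) * 0.5 * max(0.0, margin - d) ** 2

# Identical features of a same-class pair incur zero loss; a different-class
# pair already separated by more than the margin also incurs zero loss.
same_pair_loss = contrastive_loss([1.0, 2.0], [1.0, 2.0], y=1)
far_pair_loss = contrastive_loss([0.0, 0.0], [2.0, 0.0], y=0, margin=1.0)
```

In training, f1 and f2 would be the embeddings produced by the two twin CNNs for an image pair.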

Mapping to Binary Function
A Siamese network takes an input of a pair of images, and the output is a similarity score. The similarity score will be 1 if both images belong to the same class, and it will be 0 if both input images are from different classes.

Geolocation Identification
Our proposed DL-based automatic mapping method for Eucalyptus tree from GSV includes three main steps as shown in Figure 8. They are the following.

1. Detect Eucalyptus trees in the GSV images using a trained DL network.

2. Calculate the azimuth from each viewpoint to the detected Eucalyptus tree, based on the known azimuth angles of the GSV images relative to their viewpoint locations and the horizontal positions of the target in the images, as shown in Figure 8 (2), using the mean of the two X values of the bounding box. For instance, suppose a detected Eucalyptus tree has a bounding box centered on column 228 in a GSV image centered at 0° azimuth relative to the image viewpoint. Each GSV image contains 640 columns and spans a 90° horizontal field of view; thus, each pixel spans about 0.14°. The center of the Eucalyptus tree is 92 pixels to the left of the image center (at column 320) and so has an azimuth of about −12.9° relative to the image viewpoint. "Azimuth is an angle formed by a reference vector in a reference plane pointing towards (but not necessarily meeting) something of interest and a second vector in the same plane. For instance, with the sea as your reference plane, the Sun's azimuth may be defined as the angle between due North and the point on the horizon where the Sun is currently visible. A hypothetical line drawn parallel to the sea's surface could point in the Sun's direction but never meet it." [72]

3. The final step is to estimate the target locations based on the azimuths calculated in the second step, as presented in Figure 8 (3). The bounding boxes of detected Eucalyptus trees, which result from monocular vision on GSV images, are the outputs of Eucalyptus tree detection using the Siamese CNN, as shown in Figure 9. Estimating Eucalyptus tree positions from GSV images alone is therefore a multiple-source localization problem based on passive angle measurements, which has been extensively studied [73,74]. The LOB-based method is one of three major multiple-source localization approaches [75]. Since detected Eucalyptus trees are not signal sources whose signal intensity can be measured, a LOB calculation was used to estimate the position of a target Eucalyptus tree, as shown in Figure 9; other methods (such as synchronization and power transit) impose more stringent requirements. Azimuths from multiple image viewpoints to a given Eucalyptus tree allow its position to be triangulated in LOB localization, as presented in Figure 9. Since each LOB passes through the target, the intersection of several LOBs is ideally the exact location of the target, as can be seen in Figure 9.
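The azimuth calculation in step 2 above reduces to a pixel-to-angle conversion; a minimal sketch under the image geometry used in this study (640 columns, 90° field of view), with a hypothetical helper name:

```python
def pixel_to_azimuth(column, image_heading, image_width=640, fov=90.0):
    """Convert a bounding-box centre column to an absolute azimuth in degrees.

    Each pixel spans fov / image_width degrees (90/640 ≈ 0.14°); columns to
    the right of the image centre give positive offsets from the image
    heading, columns to the left give negative offsets.
    """
    degrees_per_pixel = fov / image_width
    offset = (column - image_width / 2) * degrees_per_pixel
    return (image_heading + offset) % 360

# A box centred on column 228 in a 0°-heading image lies about 12.9° to the
# left of the viewing direction, i.e., at an absolute azimuth near 347°.
azimuth = pixel_to_azimuth(228, image_heading=0)
```

Repeating this for each of the four headings (0°, 90°, 180°, 270°) gives the absolute bearing of every detection at a viewpoint.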
When the LOB calculation is used in a dense emitter setting, however, many ghost nodes (i.e., false targets) appear [76], as was the case in our study when estimating Eucalyptus tree locations in GSV images, shown in Figure 10.
As a result, a modified brute-force-based three-station cross-location algorithm was used to reduce the ghost-node problem of multiple-source localization using LOB measurements, as shown in Figure 11 (source localization from viewpoints A, B, and C), based on two assumptions:

1. Targets and sensors are in the xy plane, and

2. All LOB measurements are of equal precision [77].

The LOB measurement method shown in Figure 10 consists of the following steps:

1. Find the closest neighboring viewpoints for a given viewpoint; we tested the algorithm's performance using 2 to 8 of the closest neighboring viewpoints (i.e., 3 to 9 views).

2. Measure the angles between each pair of LOBs from all viewpoints [78].

3. Check whether there are positive associations among the LOBs (set at 50 m length) from the current viewpoint and its neighboring viewpoints.

4. Repeat steps 1 to 3 for every intersection point.

To be more precise, a positive association among LOBs is produced by three positive detections from any three views within an angle threshold (β) [77]. As a result, assuming constant detection rates, the number of predicted Eucalyptus trees increases as the number of views increases, based on the likelihood of combination. For example, suppose the total number of Eucalyptus tree estimation possibilities is t (t ∈ N); if the detection rate remains constant, the likelihood of a positive association with seven views (i.e., C(7, 3)/t) is greater than that with four views (i.e., C(4, 3)/t). To perform cross-validation in this analysis, the closest viewpoints were chosen. A list of the nearest neighbouring viewpoints (2, 3, 4, 5, 6, 7, and 8 viewpoints; that is, 3, 4, 5, 6, 7, 8, and 9 views) and angle thresholds (1°, 2° and 3°) was used in testing to determine whether there is a positive association and which threshold functions better. Because of the span of the LOB and the interval between GSV acquisitions (10 m), at most nine views were chosen. In the extreme case of nine views, eight viewpoints lie on a line on one side of the current viewpoint; for the intersection of two 50 m LOBs, 80 m is almost the maximum distance needed.
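The core geometric operation behind these steps, intersecting two LOBs in the xy plane, can be sketched as follows (a hypothetical helper; the full algorithm additionally enforces the 50 m LOB length, the angle threshold β, and agreement across three views):

```python
import math

def lob_intersection(p1, az1, p2, az2):
    """Intersect two lines of bearing in the xy plane.

    Each LOB starts at point p = (x, y) and points along azimuth `az`
    (degrees clockwise from north, i.e., the +y axis). Returns the
    intersection point, or None for (near-)parallel bearings.
    """
    # Direction vectors: azimuth 0° -> +y, azimuth 90° -> +x.
    d1 = (math.sin(math.radians(az1)), math.cos(math.radians(az1)))
    d2 = (math.sin(math.radians(az2)), math.cos(math.radians(az2)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None  # parallel LOBs never intersect
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom  # distance along the first LOB
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

# Two viewpoints 10 m apart sighting the same target: bearings of 45° and
# 315° intersect 5 m in front of and between the viewpoints.
point = lob_intersection((0, 0), 45, (10, 0), 315)
```

In a dense scene, every pairwise intersection like this is a candidate target, which is exactly why the three-view association check is needed to suppress ghost nodes.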

Multiple LOB Intersection Points Aggregation
Applying the modified brute-force-based three-station cross-location algorithm yields more than one LOB intersection point, all of which are possible targets for each Eucalyptus tree. To resolve this, we further apply a geospatial algorithm, spatial aggregation ("Spatial Aggregation computes statistics in areas where an input layer and a boundary layer overlap" [79]), to determine where a Eucalyptus tree is located. The primary purpose of this geospatial aggregation algorithm is to provide a central location (the expected correct target) within a 10 m range of LOB intersection points (this 10 m distance is the radius over which the algorithm aggregates). There are three main steps in this geospatial aggregation algorithm, as shown in Figure 12.

1. Compute the Euclidean distance matrix between all LOB intersection points.

2. Cluster the LOB intersection points based on the Euclidean distances between them.

3. Determine the centroid of each intersection point cluster.
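These three steps can be sketched with a simple single-linkage grouping under the 10 m aggregation radius (an illustrative stand-in for the geospatial aggregation tool, not its actual implementation):

```python
import math

def aggregate_intersections(points, radius=10.0):
    """Cluster LOB intersection points within `radius` metres and return
    one centroid per cluster (steps 1-3 above)."""
    clusters = []
    for p in points:
        # Steps 1-2: assign p to the first cluster containing a point
        # within `radius` (single-linkage grouping on Euclidean distance).
        for cluster in clusters:
            if any(math.dist(p, q) <= radius for q in cluster):
                cluster.append(p)
                break
        else:
            clusters.append([p])
    # Step 3: the centroid of each cluster is the estimated tree location.
    return [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
        for c in clusters
    ]

# Three nearby intersections collapse to a single estimated tree position,
# while a distant point forms its own cluster.
centroids = aggregate_intersections([(0, 0), (2, 0), (0, 2), (50, 50)])
```

A production version would run on projected coordinates in a GIS, but the grouping-then-centroid logic is the same.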

Spatial Aggregation and Calculation of Points
Aggregation is the process of combining several objects with similar characteristics into a single entity, resulting in a less detailed layer than the original data. Aggregation, like any other type of generalization, removes some information (both spatial and attribute) but simplifies things for the consumer, who is more interested in the unit as a whole rather than each individual component within it. Spatial aggregation [80] can be applied to line, point or area features; however, the calculation method is slightly different when calculating points. For line and area features, average statistics are determined using a weighted mean, calculated as [79]

x̄_w = (∑_{i=1}^{N} w_i x_i) / (∑_{i=1}^{N} w_i),

where N is the number of observations, x_i are the observations and w_i are the weights. Only the point features inside the input boundary are used to summarise point layers; as a result, no weighting is applied. It must be ensured that all data from the same database link are stored in the same spatial reference system when performing spatial aggregation or spatial filtering [79,80].
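The weighted mean above is straightforward to express in code (a minimal sketch):

```python
def weighted_mean(x, w):
    """Weighted mean of observations x_i with weights w_i:
    sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

# With equal weights this reduces to the ordinary arithmetic mean.
m = weighted_mean([10, 20], [3, 1])
```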

Experiments
We implemented our experiments in Keras [81] with the TensorFlow backend [82]. Typically, any state-of-the-art architecture may be used as a backbone to extract the features. We performed our experiments with VGG-16 [83], AlexNet [84] and ResNet-34 [85] to explore how effective the backbone network is in extracting features. The Siamese network consists of two sister/twin CNNs, i.e., two copies of the same network that share the same parameters and initialized weights. The initial learning rate was set at 0.001 with the Stochastic Gradient Descent (SGD) optimizer [86], dropout was set to 0.5 and momentum to 0.9. We used L2 regularization to avoid over-fitting in the network [21]. All input images were resized to 100 × 100 before being fed into the two identical networks of the Siamese network. The two input images of Eucalyptus trees (X1 and X2) are passed through the networks and then through a fully connected layer to generate a feature vector for each of X1 and X2. We added a dense layer with ReLU activation and finally an output layer with SoftMax activation.

System Configuration
All our experiments were performed on an Intel Core i7-9700K CPU @ 3.60 GHz (8 cores and 8 threads), 32 GB RAM, and an NVIDIA Titan RTX GPU with 24 GB VRAM. For development and implementation of the methodology, we used Python 3.8 and Keras 2.2 with the TensorFlow 2.2.0 backend as the deep learning framework.

Approach
The entire dataset was split into 70% training, 10% validation and 20% test sets. We applied various data augmentation techniques to the images and resized all images to 100 × 100 before feeding them into the Siamese network. The weights were initialized to prevent the layer activations from vanishing during the forward pass through the deep neural network [87]. We also used early stopping with a patience of 50 epochs.
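The split described above can be sketched as follows (a hypothetical helper; the actual pipeline also applies augmentation and resizing before training):

```python
import random

def split_dataset(items, train=0.7, val=0.1, seed=42):
    """Shuffle and split a dataset into 70% train / 10% validation / 20% test."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (
        items[:n_train],                  # training set
        items[n_train:n_train + n_val],   # validation set
        items[n_train + n_val:],          # test set (remaining 20%)
    )

# With 5000 items this yields 3500 / 500 / 1000 examples respectively.
train_set, val_set, test_set = split_dataset(range(5000))
```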

Results
We used various networks such as VGG-16 [83], ResNet-34 [85], and AlexNet [84] in our experiments. While performing experiments, we first froze several layers in the backbone network and trained only the remaining layers that we added. The results obtained from these experiments with the various networks were not satisfactory, i.e., 85.33%, 82.67% and 79.89%, respectively. Since the frozen-layer results were not satisfactory, we unfroze all the layers and repeated the experiments to extract features from the Eucalyptus tree input images. This time the results were 93.2%, 90.43% and 86.26%, respectively. In each experiment, a total of 50 epochs were conducted, where each epoch is one full pass over the training data. Finally, the Siamese network was trained with a batch size of 32 and training stopped at epoch 50, as shown in Figure 13. The initial experiments with VGG-16, ResNet-34, and AlexNet demonstrated that VGG-16 consistently produced the best results in our scenario, so we used it as the backbone for all of our experiments. The resulting features of the VGG-16 experiments are passed to the decision network to identify whether or not two input images are similar. A sample output is shown in Figure 14. Several performance metrics are commonly used to evaluate neural networks, including precision, recall, accuracy, and F1-score. Precision is the fraction of positive predictions that are actually correct, while recall is the fraction of actual positives that are correctly identified. Accuracy is the fraction of all predictions, over both positives and negatives, that are correct. We calculated all of our trained model's performance metrics using the formulas in Equations (4)-(7) from the confusion matrix.
where TP is true positives, TN is true negatives, FP is false positives and FN is false negatives. Here, TP and TN are the correct predictions, while FP and FN are the incorrect predictions made by our model. The resulting values, computed from the confusion matrix, are shown in Table 1.
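The four metrics in Equations (4)-(7) follow directly from the confusion matrix counts. A minimal sketch (the function name is illustrative):

```python
def metrics(tp, tn, fp, fn):
    """Compute precision, recall, accuracy and F1-score
    from confusion-matrix counts."""
    precision = tp / (tp + fp)            # correct positives among predicted positives
    recall = tp / (tp + fn)               # correct positives among actual positives
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, accuracy, f1
```

For example, a model with 90 true positives, 85 true negatives, 10 false positives and 15 false negatives has precision 0.90 and accuracy 0.875.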

Location Estimation Accuracy Evaluation
Table 2 reports the location estimation accuracy for the Eucalyptus trees as the percentage of predicted tree positions that fall within buffer zones around the reference trees. To assess the effects of the number of views, the angle threshold, and the distance threshold to the centre of a selected road, we tested seven numbers of views (3, 4, 5, 6, 7, 8, 9), three angle thresholds (1°, 2°, and 3°), and three distance thresholds (3 m, 4 m, and 5 m). Around half of the estimated Eucalyptus tree locations fell within the 6 m buffer zone of their reference locations, and up to 79% fell within the 10 m buffer zone; however, only about 12% of the estimated positions fell inside the 2 m buffer zone. Table 2 also reveals that using more views and larger angle thresholds yields more estimated Eucalyptus trees in the modified brute-force-based three-station cross-location algorithm, because these settings relax the algorithm's constraints. At the same time, this relaxation allows more ghost nodes to be accepted as estimated Eucalyptus trees, so a larger number of estimates can come at the cost of lower accuracy (see Table 2). Comparing across numbers of views, eight views produced the highest average percentage of predicted Eucalyptus tree positions inside all buffer zones of reference trees (47.80%). Conversely, using larger distance-to-road-centre thresholds resulted in fewer estimated Eucalyptus trees.
Since optical GSV imagery was the only data source used for localization, the accuracy of the Eucalyptus tree position estimates is fair, and the estimated data are useful.
It is worth noting that GSV image distortion, terrain relief, GSV position accuracy, or limitations in our process may have caused location mismatches in some cases, since the ground positions of Eucalyptus trees can differ from the orthographic positions estimated from GSV images. For areas where GSV imagery is available and a Eucalyptus tree distribution map with 10 m accuracy is adequate, our proposed approach is highly promising. When a given Eucalyptus tree was not identified in at least three GSV images out of the available views, our method failed to estimate the tree's location: three is the minimum number of images needed to triangulate a position and remove ghost nodes (as can be seen in Figure 10). This explains why the number of estimated Eucalyptus trees rises in tandem with the number of views (see Table 2).
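The core geometric step, intersecting two lines of bearing and confirming the candidate with a third station within an angle threshold, can be sketched as follows. This is an illustrative simplification of the three-station cross-location idea, not the authors' implementation; coordinates are planar (x, y) and bearings are compass degrees clockwise from north.

```python
import math

def bearing_intersection(p1, b1, p2, b2):
    """Intersect two bearing rays from stations p1 and p2.
    Returns the (x, y) crossing point, or None for parallel bearings."""
    d1 = (math.sin(math.radians(b1)), math.cos(math.radians(b1)))
    d2 = (math.sin(math.radians(b2)), math.cos(math.radians(b2)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None  # parallel lines of bearing never meet
    t = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

def confirmed_by_third(candidate, p3, b3, angle_threshold_deg=2.0):
    """Ghost-node removal: keep a candidate only if the bearing observed
    from a third station agrees with it within the angle threshold."""
    observed = math.degrees(
        math.atan2(candidate[0] - p3[0], candidate[1] - p3[1])) % 360
    diff = abs((observed - b3 + 180) % 360 - 180)  # wrapped angular difference
    return diff <= angle_threshold_deg
```

A larger `angle_threshold_deg` accepts more candidates, which mirrors the trade-off reported in Table 2: more estimated trees, but more surviving ghost nodes.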

Discussion
Eucalyptus trees are evergreen; an early sign that one is unhealthy is that it turns brown, either partially or completely. Various symptoms can be spotted in unhealthy Eucalyptus trees; one of the most apparent is the loss or decrease of leaf growth in all or parts of the tree. Other symptoms include the bark becoming brittle and peeling off, or the trunk becoming sponge-like or brittle. Bare branches, i.e., branches without leaves, in any season can be a sign of a dead tree, and branches that are loose and weak can indicate a dead or dying tree. Weak joints in a Eucalyptus tree can be dangerous, as branches can come loose during bad weather [88]. If the whole Eucalyptus is dead, it can be left untouched for a maximum of two years; after this, it becomes unsafe and needs to be removed.
Some of the common diseases in Eucalyptus trees [89] are shown in Figure 15a-c. It is critical to identify such unhealthy trees in order to improve the health and environment of urban Eucalyptus trees:
a. Canker disease infects the bark and then spreads into the interior of the tree;
b. Phytophthora disease attacks directly under the bark, indicated by discoloured leaves and dark brown wood; and
c. Heart rot disease damages the tree both internally and externally.
Numerous approaches to urban trees and their health have been studied in the literature. Shendryk et al. [30] worked on the trunks of Eucalyptus trees, as well as their complex shapes, using Euclidean distance clustering for individual tree trunk detection; up to 67% of trees with diameters of 13 cm or more were successfully identified by their technique. Miltiadou et al. [29] presented a new way to detect dead Eucalyptus camaldulensis with the introduction of DASOS (feature vector extraction), investigating whether dead trees can be detected from voxel-based full-waveform (FW) LiDAR without tree delineation. They found that it is possible to determine tree health without outlining the trees, but since this is a new area of research, there are still many improvements to be made. Xiao et al. [47] examined tree health using remote sensing data and GIS techniques, analysing tree condition at two scales: the whole tree and individual pixels. In the pixel-by-pixel analysis, each pixel within the tree crown was classified as either healthy or unhealthy based on vegetation index values. A quantitative raster-based analysis was then conducted on all of the trees using a tree health index, a quantitative value describing the ratio of healthy pixels to total tree pixels in the crown; a tree was classified as healthy if this index exceeded 70%. A random sample of 1186 trees was used to verify accuracy: at the whole-tree level, approximately 86% of campus trees were found to be healthy, with approximately 88% mapping accuracy.
In contrast to the above-discussed literature, we propose a deep learning-based network, a Siamese convolutional neural network (SCNN), combined with a modified brute-force-based line-of-bearing (LOB) algorithm to classify Eucalyptus trees as healthy or unhealthy and to find their geolocation from GSV and ground truth imagery. Our proposed method achieved an average accuracy of 93.2% in identifying healthy and unhealthy trees and their geolocation. For training and validation of the SCNN, a dataset of approximately 4500 images was used.
The main reason for using Google imagery is that it is publicly available online, so no manual labour is required to capture the images. Second, using Sentinel imagery would be an expensive and time-consuming alternative: obtaining Sentinel images of specific locations takes a longer time, and the service requires a paid subscription, i.e., it is not publicly available; Sentinel imagery is also protected by copyright. Therefore, in this work we used GSV and ground truth images to obtain better results and to overcome some of the challenges described in the introduction. It is worth mentioning that "the satellite data on Google Maps is typically between 1 to 3 years old". According to Google Earth and other sources, the data are usually updated about once a month, but they may not show real-time images. Google Earth gathers data from various satellite and aerial photography sources, and it can take months to process, compare and set up the data before they appear on a map. In some circumstances, however, Google Maps is updated in real time to mark major events and to provide assistance in emergency situations. For example, imagery for the 2012 London Olympic Games was updated just before the Opening Ceremony, and updated satellite crisis maps helped aid teams assess damage and target locations in need of help shortly after the Nepal earthquake in April 2015 [90,91].

Conclusions, Limitations, and Future Directions
Identifying healthy and unhealthy Eucalyptus trees using traditional, manual methods is time-consuming and labour-intensive. This study is primarily an exploratory one that employs a DL-based method for identification, classification, and geolocation estimation. We present a Siamese CNN (SCNN) architecture trained to identify and classify healthy and unhealthy Eucalyptus trees and their geographical location. The SCNN uses the contrastive loss function to calculate a similarity score from two input images (one for each CNN). With the large number of GSV images available online, the method could be a useful tool for automatically mapping healthy and unhealthy Eucalyptus trees, as well as their geolocation, along metropolitan streets and roads. Although the model correctly identifies the Eucalyptus tree's health status and position, it is certainly worth mentioning some limitations. First, it is still challenging to map up-to-date GSV images with geographical location information because the imagery changes rapidly. Second, achieving reasonable geolocation accuracy with DL requires a large amount of training data. Third, when Eucalyptus trees lean significantly, the LOB method requires more attention, owing to uncompensated terrain and GSV visual distortion. Finally, the results suggest that automatic tree geolocation recognition can be useful, and in future work the method might be applied to detect and classify other objects along the roadside.
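The contrastive loss mentioned above is, in its standard margin-based form, a function of the distance between the two CNN embeddings: similar pairs are pulled together, while dissimilar pairs are pushed apart until their distance exceeds a margin. A minimal sketch, assuming the common formulation (the margin value is illustrative, not taken from the paper):

```python
def contrastive_loss(dist, label, margin=1.0):
    """Margin-based contrastive loss for one image pair.

    dist   -- distance between the two embeddings (e.g. Euclidean)
    label  -- 1 for a similar pair, 0 for a dissimilar pair
    margin -- how far apart dissimilar pairs must be before incurring no loss
    """
    if label == 1:
        return 0.5 * dist ** 2                     # pull similar pairs together
    return 0.5 * max(margin - dist, 0.0) ** 2      # push dissimilar pairs apart
```

At inference time, the same distance can be thresholded to decide whether the two input images show the same class of tree.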

Data Availability Statement:
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest:
The authors declare no conflict of interest.