Machine Learning in Urban Tree Canopy Mapping: A Columbia, SC Case Study for Urban Heat Island Analysis

: As the world’s urban population increases to the predicted 70% of the total population, urban infrastructure and built-up land will continue to grow as well. This growth will continue to have an impact on the urban heat island effect in all of the world’s cities. The urban tree canopy has been found to be one of the few factors that can lessen the effects of the urban heat island effect. This study seeks to accomplish two objectives: ﬁrst, we examine the use of a commonly used machine learning classiﬁer (e.g., Support Vector Machine) for identifying the urban tree canopy using no-cost high resolution NAIP imagery. Second, we seek to use Land Surface Temperature (LST) maps derived from no-cost Landsat thermal imagery to identify correlations between canopy loss and temperature hot spot increases over a 14-year period in Columbia, SC, USA. We found the SVM imagery classiﬁer was highly accurate in classifying both the 2005 imagery (94.3% OA) and the 2019 imagery (94.25% OA) into canopy and other classes. We found the color infrared image available in the 2019 NAIP imagery better for identifying canopy than the true color images available in 2005 (97.8% vs. 90.2%). Visual analysis based on the canopy maps and LST maps showed temperatures rose near areas where tree canopy was lost, and urban development continued. Future studies will seek to improve classiﬁcation methods by including other classes, other ancillary data sets (e.g., LiDAR), new classiﬁcation methods (e.g., deep learning), and analytical methods for change detection analysis.


Introduction
Rapid urbanization is challenging our ability to create sustainable and safe cities for the 55% of the world's population that live there. It is predicted that by 2050 nearly 70% of the world's population will live in an urban area [1]. Urbanization has a widely recognized impact on the local weather and climate, and one of the most common ways is the Urban Heat Island (UHI) effect. An UHI can be defined as an area where the urban build up creates a microclimate that is warmer than the surrounding rural areas. UHIs are caused by several factors, including the large extent of impervious surface materials covering a large portion of the area in a given city, the layout of urban environments, anthropogenic activities that create heat, and the removal of natural landscapes [2][3][4]. The UHI consequently contributes to several environmental changes, such as vegetation growth and water and air quality. The increased heat can also severely impact the health of all those who live in the city [5,6]. In response to the historic growth and in hope for a more sustainable future for our cities, the UN established the sustainable development goal 11 in 2015 [7]. This requires a better understanding of how the heat landscape of a city, as well as the mitigating factors such as tree canopy, are changing over time so that we can adjust course and be more protective and sustainable in our approaches to development.
In the many studies that have been conducted thus far to better understand the UHI phenomenon, UHI has been separated into two classes: Atmospheric UHI and Surface UHI. Atmospheric UHI is monitored with in situ sensors that can be expensive to install and use. Surface UHI can be observed using remote sensing data, as described by [8]. Observing the surface UHI phenomenon using surface temperatures collected by remote sensing instruments (e.g., Landsat, MODIS) allows for spatial and temporal analysis of cities all around the globe from historic and modern imagery collections. Extensive work has been completed to examine the UHI phenomenon across the globe [9][10][11][12][13]. While there are many benefits to examining UHI via remote sensing technologies, such as large areal coverage and low cost, there are also well documented limitations. For example, the longwave emissions recorded by the typical sensor on a satellite or aircraft are only a small fraction of the complex radiation and processes that impact UHI [14,15]. Furthermore, when daytime remote sensing analysis is used rather than nighttime, spatial patterns can be unexpected due to the UHI being attributed to the heat storage in urban materials and impervious surfaces [14,15]. Despite these limitations and others, remote sensing technologies are able to provide a view of UHI that is unique and can be useful [15].
Vegetation has been identified as a mitigating factor of UHI and the heat impacts on humans living in the cities. For example, a tree canopy can affect building energy use [16], reduce daytime urban heat in the summer [17], and urban stormwater management [18,19]. Therefore, a tree canopy is an important attribute of the urban environment that requires regular monitoring. As such, government organizations in cities are often formed to regulate tree removal and planting throughout urban areas. Despite the efforts of these organizations to control canopy gain and loss, city growth has inevitably led to a decrease in a canopy over time for many cities. Obtaining and managing up-to-date tree canopy datasets can be costly and time consuming for entire cities. Many studies where mapping the canopy was the first objective relied on LiDAR data to map the tree canopy, but LiDAR data can be costly and a limited resource [20][21][22]. High resolution historical aerial or satellite imagery can be used to identify tree canopy changes over time [23][24][25].
While several studies have investigated the UHI effect in a city or worked to map the urban tree canopy, few studies have brought both elements together to investigate spatial patterns and correlations to show changes over time. Many studies have acknowledged the connection between UHI effect and vegetation cover [26][27][28]. More recent studies have made further progress, but often using expensive LiDAR data or high-resolution satellite data rather than aerial imagery. Furthermore, [29] found that even just a 10% increase in forest can decrease the UHI effect by 0.83 • C, and [30] examined the role of the canopy structure on both nighttime and daytime UHI effects by using daytime and nighttime MODIS heat map data and a 2-m satellite and LiDAR-derived tree canopy and found that the tree canopy showed a stronger influence in the day. One study also examined change detection of canopy loss and heat change over an 8-year period for a moderately sized city in Massachusetts, but used costly LiDAR and satellite imagery to create the canopy [20]. Many other studies regarding the effect of trees on elements of the UHI effect have been carried out on a very large scale, rather than city wide [17][18][19]. This study proposes the use of each of the aforementioned remote sensing resources to look at the effect canopy changes have had on the UHI of a moderately sized city, Columbia, SC, USA, over a 14-year period. The objectives of this study were two-fold: A. Examine the accuracy of using a machine learning (ML) object-based image classifier for creating a high resolution (1-meter) tree canopy estimation from free NAIP imagery for a moderately sized city. B. Identify spatial patterns and correlations between tree canopy change and surface heat change over 14 years in Columbia, SC.
To accomplish these objectives, we propose the use of a support vector machine (SVM) ML classification algorithm to create a tree canopy map for two dates of high-resolution aerial imagery. Furthermore, we will use 30-m spatial resolution Landsat satellite data for obtaining surface temperature information to inform our visual analysis. The results of the study will produce change maps that can be used for visibly identifying areas of change and concern.

Study Area
Columbia, South Carolina (SC), USA, is the capital city in the state of South Carolina and home to over 130,000 people [31]. The present study examines the city of Columbia according to its present limits rather than its metropolitan area (Figure 1). The city of Columbia rests at the fall line, or the boundary, of the piedmont region and the Atlantic coastal plane. The city was built along the Congaree river, which forms at the confluence of the Broad and Saluda Rivers. Columbia has a humid subtropical climate with hot and humid summers. Sitting at the center of the state in between the blue ridge mountains and the Atlantic coast, the city encounters some of the state of South Carolina's hottest temperatures.
data for obtaining surface temperature information to inform our visual analysis. The results of the study will produce change maps that can be used for visibly identifying areas of change and concern.

Study Area
Columbia, South Carolina (SC), USA, is the capital city in the state of South Carolina and home to over 130,000 people [31]. The present study examines the city of Columbia according to its present limits rather than its metropolitan area (Figure 1). The city of Columbia rests at the fall line, or the boundary, of the piedmont region and the Atlantic coastal plane. The city was built along the Congaree river, which forms at the confluence of the Broad and Saluda Rivers. Columbia has a humid subtropical climate with hot and humid summers. Sitting at the center of the state in between the blue ridge mountains and the Atlantic coast, the city encounters some of the state of South Carolina's hottest temperatures.

Data
Two sets of open-source remote sensing data were used for each objective, respectively. First, National Agriculture Imagery Program (NAIP) Images were used for classifying the tree canopy, among other land cover classes. This program began in 2002 and is administered by the U.S. Department of Agriculture (USDA) Farm Service Agency. Aerial imagery is collected during growing seasons, always with leaf-on conditions. The digital sensors used for NAIP imagery collection meet rigid calibration specifications [32]. NAIP imagery is generally collected at a 1-m spatial resolution (50-60 cm in some areas) across the conterminous United States. In order to provide a more accurate comparison between the 2005 (1 m) and 2019 (60 cm) data, the 2019 data were resampled to a 1-m spatial resolution using the Resample tool in ArcGIS Pro before processing and analysis. NAIP imagery has previously been shown to be an effective remote sensing resource for mapping urban tree canopy [33].

Data
Two sets of open-source remote sensing data were used for each objective, respectively. First, National Agriculture Imagery Program (NAIP) Images were used for classifying the tree canopy, among other land cover classes. This program began in 2002 and is administered by the U.S. Department of Agriculture (USDA) Farm Service Agency. Aerial imagery is collected during growing seasons, always with leaf-on conditions. The digital sensors used for NAIP imagery collection meet rigid calibration specifications [32]. NAIP imagery is generally collected at a 1-m spatial resolution (50-60 cm in some areas) across the conterminous United States. In order to provide a more accurate comparison between the 2005 (1 m) and 2019 (60 cm) data, the 2019 data were resampled to a 1-m spatial resolution using the Resample tool in ArcGIS Pro before processing and analysis. NAIP imagery has previously been shown to be an effective remote sensing resource for mapping urban tree canopy [33].
Temperature data for the Columbia, SC area were derived from band 6 of the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) sensor. Landsat 7 was launched in 1999 and notably experienced a failure of the Scan Line Correction (SLC) in 2003 that affects the use of Landsat 7 data. However, tools have been created to correct for the scan lines in ArcGIS software, making the imagery available for other purposes. Thermal imagery data for July of 2005 and 2019, the same years as the NAIP imagery being used for the tree canopy estimation, were chosen to examine spatial heat changes in Columbia. The imagery data were originally collected at a 120-m resolution but were resampled to 30 m using the Resample tool in ArcGIS Pro for comparative analysis. The data collection occurred on 26  Canopy classification was completed using National Agriculture Imagery Program (NAIP) images from 2005 and 2019. Each image was first clipped to the extent of the city of Columbia shapefile. Fort Jackson and much of the Congaree, Broad, and Saluda rivers were removed. Figure 2 depicts the general process for obtaining a canopy classification.
software, making the imagery available for other purposes. Thermal imagery data for July of 2005 and 2019, the same years as the NAIP imagery being used for the tree canopy estimation, were chosen to examine spatial heat changes in Columbia. The imagery data were originally collected at a 120-m resolution but were resampled to 30 m using the Resample tool in ArcGIS Pro for comparative analysis. The data collection occurred on 26 July 2005 at 11:49 AM local time with the sun azimuth at 117.34 degrees and sun elevation at 62.64 degrees for the first collection. The second collection occurred on 1 July 2019 at 11:49 AM local time with the sun azimuth at 111.18 degrees and sun elevation at 65.34 degrees. Other information regarding the Landsat and NAIP data is presented in Table 1.

Canopy Classification
Canopy classification was completed using National Agriculture Imagery Program (NAIP) images from 2005 and 2019. Each image was first clipped to the extent of the city of Columbia shapefile. Fort Jackson and much of the Congaree, Broad, and Saluda rivers were removed. Figure 2 depicts the general process for obtaining a canopy classification.  Canopy classification was performed using SVM object-based classification in ESRI's ArcGIS Pro software. A major advantage of object-based classifications is the ability to overcome the "salt and pepper" effect that can occur in pixel-based classifications. This phenomenon refers to rogue single pixels or small pixel clusters that are classified as a particular class, though the pixel is nowhere near other pixels of that class. Support Vector Machine classifiers use a supervised classification method based on statistical learning theory. It was developed in the computer science community in the 1990s and has since been adapted to remote sensing image classification [34], pg. 337. A SVM classifier computes a maximal margin hyperplane where the hyperplane separates the classes (two in the case of Figure 3), and is the furthest away possible from any of the training data. The boundary observations are transformed into a slab and termed 'support vectors'. SVM classifiers allow nonlinear decision boundaries in more complex datasets than the one shown in Figure 3.
theory. It was developed in the computer science community in the 1990s and has since been adapted to remote sensing image classification [34], pg. 337. A SVM classifier computes a maximal margin hyperplane where the hyperplane separates the classes (two in the case of Figure 3), and is the furthest away possible from any of the training data. The boundary observations are transformed into a slab and termed 'support vectors'. SVM classifiers allow nonlinear decision boundaries in more complex datasets than the one shown in Figure 3. Training data polygons were digitized throughout each image for the following classes: tree canopy, urban, water, grass/agriculture, shadow, and barren. Each of the polygons were digitized from visual interpretations of the NAIP imagery or Google Earth Imagery. Visits to and pictures from various sites throughout the study area were only used for the 2019 training data, as there have been several changes since 2005. For the 2005 image, 1202 total polygons were digitized for training across all 6 target classes. For the 2019 image, there were 1253 total training polygons for all 6 target classes. A map showing the makeup of the training data can be seen in Figure 4. The goal in digitizing training data was to maintain as accurate of a database as possible. This was partly accomplished by focusing on areas that the authors were most familiar with, and also by spreading the training data as evenly as possible. The authors believe an appropriate balance was struck. Training data polygons were digitized throughout each image for the following classes: tree canopy, urban, water, grass/agriculture, shadow, and barren. Each of the polygons were digitized from visual interpretations of the NAIP imagery or Google Earth Imagery. Visits to and pictures from various sites throughout the study area were only used for the 2019 training data, as there have been several changes since 2005. For the 2005 image, 1202 total polygons were digitized for training across all 6 target classes. For the 2019 image, there were 1253 total training polygons for all 6 target classes. A map showing the makeup of the training data can be seen in Figure 4. The goal in digitizing training data was to maintain as accurate of a database as possible. This was partly accomplished by focusing on areas that the authors were most familiar with, and also by spreading the training data as evenly as possible. The authors believe an appropriate balance was struck.
After preparing extensive training data, the image was segmented into objects for classification. In ArcGIS Pro, segmentation parameters include a 1-20 scale for spatial characteristics of the data and a 1-20 scale for spectral characteristics, as well as a minimum segment size. Spectral detail was attributed the highest importance, with spatial detail coming in second. In the range of 1.0 to 20.0, spectral detail was placed at 18, while spatial detail was placed at a 16 in an effort to smooth out the image while also maintaining canopies as best as possible. The minimum segment size was 5 pixels (5 × 5 m) to accommodate smaller buildings and single trees.
The SVM ML classifier was trained using all of the training polygons, focusing on the following segment or object properties: active chromaticity color, mean digital number, standard deviation, and compactness. Each image used the same characteristics. While only an RGB image was available for 2005, a 2019 color infrared (CIR) image was available and used for classification. With the CIR image, trees and tree canopies were visually more easily distinguishable from other classes. After classification was complete, classes that did not include a tree canopy were combined to become a "no canopy" class, while the canopy class was left alone. This was carried out to provide a focus on the spatial distribution of the tree canopy. Geographies 2023, 3, FOR PEER REVIEW 6 After preparing extensive training data, the image was segmented into objects for classification. In ArcGIS Pro, segmentation parameters include a 1-20 scale for spatial characteristics of the data and a 1-20 scale for spectral characteristics, as well as a minimum segment size. Spectral detail was attributed the highest importance, with spatial detail coming in second. In the range of 1.0 to 20.0, spectral detail was placed at 18, while spatial detail was placed at a 16 in an effort to smooth out the image while also maintaining canopies as best as possible. The minimum segment size was 5 pixels (5 × 5 m) to accommodate smaller buildings and single trees.
The SVM ML classifier was trained using all of the training polygons, focusing on the following segment or object properties: active chromaticity color, mean digital number, standard deviation, and compactness. Each image used the same characteristics. While only an RGB image was available for 2005, a 2019 color infrared (CIR) image was available and used for classification. With the CIR image, trees and tree canopies were visually more

Canopy Classification Accuracy Assessment
Accuracy assessment metrics were used to compare the accuracy of the 2005 and 2019 classifications. A confusion matrix was calculated using ArcGIS Pro's Compute Confusion Matrix tool. Four main statistics were used to assess accuracy: the producer's accuracy, the user's accuracy, overall accuracy, and kappa. The producer's accuracy is the total number of pixels correctly classified for a class divided by the total number of pixels in that class as known from the ground truthing data. The user's accuracy is the total number of pixels correctly classified into a class, divided by the total number of pixels classified into that class. The overall accuracy (OA) percentage was calculated as follows: where x ii represents a correctly classified pixel, and N is the total number of pixels being assessed. Kappa analysis is a multivariate technique for accuracy assessment first published in [35]. A Kappa statistic is similar to overall accuracy, but each considers slightly different elements of data. A kappa estimate ( K) was determined as described in [36]: where N is the total number of samples, k is the number of rows in the confusion matrix, x ii is the number of observations in row i and column i, and x i+ and x +j are the marginal totals for row i and column j.
The ground truth data used to validate the classification in the aforementioned statistics came from digitized polygons of visually determined obvious land cover truths. The authors took exceptional care to only use areas that they were confident only contained the class to be validated. The validation was completed with the two classes only-tree cover and the joined class from the several other classes including urban, agriculture/grass, shadows, bare ground, and water. Using the digitized polygons, 600 random validation points were created in the polygons per class. The validation points saved the ground truth value (tree canopy or now), as well as the classified value for comparison and calculation.

Surface Temperature Mapping
The surface temperature maps were generated from Landsat 7 Collection 2 Level 2 satellite data captured on 26 July 2005 and 1 July 2019. As mentioned above, the SLC failed in 2003, creating missing data errors in each recorded image until the mission's conclusion in 2022. In order to use the Landsat 7 datasets for visual analysis, Landsat Toolbox (https: //drive.google.com/file/d/1FKc-G1vMVtWXi66hh15VKoj-zk3UPWgX/view (accessed on 20 April 2023)) was incorporated into ArcGIS Pro. The tool Fix Landsat 7 Seamline Errors was used to interpolate between the known values and the no data pixels. Data were geometrically and radiometrically corrected by the data provider (USGS) before use [37][38][39]. Land surface temperature data were calculated and derived by the USGS before data download and are described below. A general depiction of the process can be found in Figure 5.
To complete the conversion of the raw values to surface temperature in degrees Celsius, several steps were required. First, the raw values were converted into radiance [37]. This was achieved using the equation: where L λ is the spectral radiance, QCAL is the quantized calibrated pixel value in digital number (DN), LMAX λ is the spectral radiance scaled to QCALMAX, LMIN λ is the spectral radiance scaled to QCALMIN, QCALMIN is the minimum quantized calibrated pixel value in DN, and QCALMAX is the maximum quantized calibrated pixel value. These values were obtained from the metadata file provided with the Landsat imagery [37].
To convert the radiance values to brightness temperature values in degrees Kelvin, the radiance values for every pixel were passed through the following equation: where T is the temperature in degrees Kelvin, K1 is calibration constant 1 provided in the metadata, K2 is calibration constant 2 also provided in the metadata, and L λ is the spectral radiance in (Watts/(m 2 * sr * µm)) [  To complete the conversion of the raw values to surface temperature in degrees Celsius, several steps were required. First, the raw values were converted into radiance [37]. This was achieved using the equation: where Lλ is the spectral radiance, QCAL is the quantized calibrated pixel value in digital number (DN), LMAXλ is the spectral radiance scaled to QCALMAX, LMINλ is the spectral radiance scaled to QCALMIN, QCALMIN is the minimum quantized calibrated pixel value in DN, and QCALMAX is the maximum quantized calibrated pixel value. These values were obtained from the metadata file provided with the Landsat imagery [37].
To convert the radiance values to brightness temperature values in degrees Kelvin, the radiance values for every pixel were passed through the following equation: where T is the temperature in degrees Kelvin, K1 is calibration constant 1 provided in the metadata, K2 is calibration constant 2 also provided in the metadata, and Lλ is the spectral radiance in (Watts/(m 2 * sr * μm)) [37]. From here, further analysis was needed to translate these values into LST. [15,37,[40][41][42][43] described the use of a single channel algorithm to convert the brightness temperatures at the sensor to surface temperatures on the earth. Landsat 7 data have only one thermal band, and therefore only a single channel algorithm would suffice. Advanced Spaceborne Thermal Emission and Reflection Radiometer (AS-TER) Global Emissivity Dataset (GED) data, ASTER Normalized Difference Vegetation Index (NDVI) data, and atmospheric profiles of geopotential height, specific humidity, where LST is the temperature, with correction by emissivity ( • K); T is the temperature of the brightness at the sensor ( • K); λ is the wavelength of the emitted radiance; E is the emissivity; p is deduced from Equation (6) [15,[40][41][42].
where σ is the Boltzmann constant (1.38 × 10 −23 J/K), h is Planck's constant (6.626 × 10 −34 Js), and c is the speed of light (2998 × 10 8 m/s). When the data were obtained from USGS EarthExplorer (https://earthexplorer.usgs. gov (accessed on 20 April 2023)), the steps above were already completed. To scale the values into their original Kelvin values, pixel values were multiplied by 0.00341802 and added to 149 [44]. The new Kelvin values were easily transformed into degrees Celsius by subtracting 273.15.
Quality assessment data were available for the Landsat Level 2 Surface Temperature Data in a separate band where each pixel value describes uncertainty in the temperature values. Pixels closer to clouds and with higher uncertainty had higher overall values in the quality assessment band. The band was examined to identify regions where temperature values were less likely to be correct [38,39].
Each map was visualized using the natural breaks statistical classification for the data. While each data class did not match between the two dates, the natural breaks method was employed rather than a manual method to preserve the areas of hot and cold across the region. The city and surrounding areas experienced a unique temperature range each day the data were captured; therefore, the authors believe that "hot areas" should be compared in relation to other "hot areas" between the dates.

Tree Canopy Mapping
Tree canopy mapping using the two NAIP images resulted in two land cover maps, one displaying the estimated tree canopy from 2005 and the other estimating the tree canopy from 2019. The results of the classifications can be seen in Figures 6 and 7. Training for each of the images took less than 4 h. After training, classification only required 2.5 h using a Dell Inspiron 5680 6-core intel i7 CPU with 16 gb Ram and Nvidia GTX 1060 3 gb GPU.  The accuracy assessment conducted for each classification showed that the canopy classifications performed quite well (Tables 2 and 3). The overall accuracies for both classifications were just over 94% and the kappa statistics were very similar as well. For the 2005 NAIP image classification, the 'other' category had a much better users accuracy than producers accuracy, and the tree canopy class had a better producers accuracy than users accuracy. For the 2019 NAIP image classification, we found the exact opposite to be true for each of the 'other' class and tree canopy class. Overall accuracy presented in the accuracy assessment suggests confidence in the overall classification.
Visibly, however, there were still some discrepancies in the classification. The canopy class, when it did get confused, was often mistaken for shadows and grass, or vice versa. This was especially apparent in the 2005 image where a near infrared (NIR) band was not included during data capture. This did not seem to be an issue with the 2019 map. The canopy cover map for 2019 showed a significant loss of canopy in Columbia. Some of the changes in canopy could be attributed to development outside of the center of the city, and other tree removal projects that have occurred, including a Bull Street development project in downtown Columbia. Others, including the thinning of canopy throughout the entirety of downtown Columbia, are more difficult to account for. Some of the changes may be attributed to the classification difficulties discussed above in 2005 and not in 2019. Others are more likely to be trees being cut down in yards as nuisances.  The accuracy assessment conducted for each classification showed that the canopy classifications performed quite well (Tables 2 and 3). The overall accuracies for both classifications were just over 94% and the kappa statistics were very similar as well. For the 2005 NAIP image classification, the 'other' category had a much better users accuracy than

Surface Temperature Maps
The land surface temperature (LST) maps made from the Landsat 7 tiles are shown in Figure 6. Each of the maps show interesting surface temperature patterns over the 14-year period. Many of the assumed patterns hold true: surface temperature is highest in both maps where concrete and other impervious surfaces are in place throughout the urban environments. Furthermore, known forests within city limits and the tree canopy dispersed throughout the city show relatively cool temperatures in comparison with the impervious surfaces. As described in Section 2.3.3, the ranges of values between the two map dates are different and the statistical makeup of temperatures in the area are also different. It is important to remember while interpreting the surface temperature maps that while the temperature breakdowns are different, we are able to visibly determine "hot spots" of urban heat from the current color scheme for each date.
Quality assessment maps were created from the Landsat collection quality assessment bands for surface temperature. These maps were overlaid on top of the heat maps to show where the largest areas of uncertainty were by using a highlighter effect (Figure 8). These maps indicate that the region where the Saluda and Broad Rivers join to make the congaree river provides consistent low-confidence temperature values. Other regions around the city and study area changed in quality from 2005 to 2019. Overall, most of the study area had low quality assessment values that represented trustworthy temperature data.
Geographies 2023, 3, FOR PEER REVIEW 12 study area had low quality assessment values that represented trustworthy temperature data. Between the 2005 and 2019 heat maps (Figure 9), it is possible to discern that an apparent cooling of downtown Columbia has occurred (in pink). This apparent cooling has occurred despite the apparent loss of tree canopy. Several areas that showed high surface heat levels in 2005 once again showed at least a portion of that heat again in 2019. Several areas within our study area and just beyond, where other urban growth has occurred, experienced an increase in surface heat levels (in blue circles).

Spatial Patterns and Correlations
Because the canopy data were only computed for Columbia city and not the surrounding towns, we can only speak to the Columbia heat patterns in relation to the tree canopy changes. Some larger canopy losses correlated well with tree canopy loss ( Figure 10). Two small areas in the Harbison State Forest (indicated in Purple) lost tree canopy and showed an increase of surface heat. Furthermore, an area of northeast Between the 2005 and 2019 heat maps (Figure 9), it is possible to discern that an apparent cooling of downtown Columbia has occurred (in pink). This apparent cooling has occurred despite the apparent loss of tree canopy. Several areas that showed high surface heat levels in 2005 once again showed at least a portion of that heat again in 2019. Several areas within our study area and just beyond, where other urban growth has occurred, experienced an increase in surface heat levels (in blue circles).
Geographies 2023, 3, FOR PEER REVIEW 12 study area had low quality assessment values that represented trustworthy temperature data. Between the 2005 and 2019 heat maps (Figure 9), it is possible to discern that an apparent cooling of downtown Columbia has occurred (in pink). This apparent cooling has occurred despite the apparent loss of tree canopy. Several areas that showed high surface heat levels in 2005 once again showed at least a portion of that heat again in 2019. Several areas within our study area and just beyond, where other urban growth has occurred, experienced an increase in surface heat levels (in blue circles).

Spatial Patterns and Correlations
Because the canopy data were only computed for Columbia city and not the surrounding towns, we can only speak to the Columbia heat patterns in relation to the tree canopy changes. Some larger canopy losses correlated well with tree canopy loss ( Figure 10). Two small areas in the Harbison State Forest (indicated in Purple) lost tree canopy and showed an increase of surface heat. Furthermore, an area of northeast

Spatial Patterns and Correlations
Because the canopy data were only computed for Columbia city and not the surrounding towns, we can only speak to the Columbia heat patterns in relation to the tree canopy changes. Some larger canopy losses correlated well with tree canopy loss ( Figure 10). Two small areas in the Harbison State Forest (indicated in Purple) lost tree canopy and showed an increase of surface heat. Furthermore, an area of northeast Columbia experienced tree canopy loss and a subsequent increase in surface temperature (indicated in black).

Discussion
The spatial analysis of the LST heat maps and success of the tree canopy map followed the successes of other studies. For example, the Harbison State Forest area ure 7 circled in purple) is a significant cooling area, similar to the areas [29] discov that had decreased in overall temperature for that area. Large urban forests are like prevent nearby hot spots; indeed, [20] found similar results. A high overall accuracy achieved in their study using Worldview 2 high resolution satellite imagery to deter urban tree canopy gains and losses, and they used LST maps derived from similar so to discover hot spots that were highly correlated with new developments and tree ca loss. Furthermore, [13] found that new built-up land in India was becoming more c lated with urban heat islands, or hot spots, around the city area from 1990 to 2020 visual findings linking the tree canopy losses to new hot spots in the Columbia, SC support the previous findings by [20] and [13].
The visible LST decrease in the center of Columbia, while the suburbs and surro ing areas became warmer, is an interesting phenomenon. While no definitive conclu has been drawn, this is possibly due to an increase in vegetation in the downtown of Columbia. As regions of the city have undergone growth, the Columbia Tree and pearance Commission has sought to ensure there is adequate vegetation placed thro out the city [45]. There are several regulations prohibiting the removal of certain tre well. These policies appear to have impacted the heat island at the center of the city. thermore, the warming outside of central Columbia is consistent with the growth th has encountered. As Columbia grows, developments continue within and outside o city limits to accommodate the increase in population. While shadows could also pro a less thrilling reason for the decreases in downtown temperature, we argue this i Negative correlations were also apparent upon visual investigation. For example, the thinning of tree canopy throughout downtown Columbia and many of the suburbs included in the city limits were spatially related to a decrease in surface temperature hot spots. These areas are shown in a dark blue circle.

Discussion
The spatial analysis of the LST heat maps and success of the tree canopy mapping followed the successes of other studies. For example, the Harbison State Forest area (Figure 7 circled in purple) is a significant cooling area, similar to the areas [29] discovered that had decreased in overall temperature for that area. Large urban forests are likely to prevent nearby hot spots; indeed, [20] found similar results. A high overall accuracy was achieved in their study using Worldview 2 high resolution satellite imagery to determine urban tree canopy gains and losses, and they used LST maps derived from similar sources to discover hot spots that were highly correlated with new developments and tree canopy loss. Furthermore, [13] found that new built-up land in India was becoming more correlated with urban heat islands, or hot spots, around the city area from 1990 to 2020. Our visual findings linking the tree canopy losses to new hot spots in the Columbia, SC area support the previous findings by [13,20].
The visible LST decrease in the center of Columbia, while the suburbs and surrounding areas became warmer, is an interesting phenomenon. While no definitive conclusion has been drawn, this is possibly due to an increase in vegetation in the downtown areas of Columbia. As regions of the city have undergone growth, the Columbia Tree and Appearance Commission has sought to ensure there is adequate vegetation placed throughout the city [45]. There are several regulations prohibiting the removal of certain trees as well. These policies appear to have impacted the heat island at the center of the city. Furthermore, the warming outside of central Columbia is consistent with the growth the city has encountered. As Columbia grows, developments continue within and outside of the city limits to accommodate the increase in population. While shadows could also provide a less thrilling reason for the decreases in downtown temperature, we argue this is less likely than our other proposition because of the similarities in the built structures in downtown Columbia between 2005 and 2019. There has been some growth to the skyline of downtown Columbia, but the differences are not significant enough to change the amount of ground covered by shadow. Furthermore, the data were collected near peak solar azimuth and elevation to limit the influence of shadows on pixel values.
Each objective of the study encountered challenges while also providing insight into the Columbia, SC, UHI and canopy loss in the area. Canopy mapping required extensive training datasets and significant processing hours, but produced high accuracy tree canopy maps for 2005 and 2019 in Columbia, SC. The accuracy of each classification reached 94%. This is reflective of a high-quality classification and fares well against other urban canopy mapping efforts. For example, for a similar environment in Madison, Wisconsin, ref. [33] achieved 85% accuracy with a SVM classifier. The SVM classifier outperformed other Random Forest (RF) and Boosted Regression Tree (GBM) classifiers. Other studies have found success as well, including ref. [23] achieving 80% OA with Neural Network classification, ref. [46] achieving 89% OA with a Neural Network Classification, and ref. [24] achieving 96% with LiDAR data, ancillary data, and aerial photography including an NIR band.
An urban environment is very complex and introduces several elements for a classifier to overcome, especially when there are only two classes. Visual investigation showed the estimated canopy maps also performed fairly well, though improvements can be made. While the accuracy was remarkably high, it was determined that several of the changes observed in the change detection referred to small canopy changes in residential areas. In these cases, a tree has grown, been cut down, or perhaps there was a slight misalignment to the images and a change was detected, though there was no real change. The 2019 tree canopy classification looks to have fared better than the 2005 tree canopy. The CIR visually and substantially improved the classification. However, the 2005 image unfortunately did not include an NIR band during image capture. In order to overcome these challenges, we believe a few improvements can be made.
Improving the classification for tree canopy can be achieved by including other ancillary data, such as LiDAR data. Furthermore, a different classifier may do a better job at separating the tree canopy class from shadows, grass, and other similar classes. Deep learning remote sensing classifiers are becoming a larger topic of interest and many are performing remarkably well for large scale image analysis [47][48][49][50]. Future work should investigate a myriad of classifiers to find one ideal for classifying urban tree canopy. Some of the possible classifiers include the one used in this study, SVM, as well as Random Forest (RF), Deep Lab V3, U-NET, and many others. [51] showed the utility of utilizing several datasets, including LiDAR and vegetation indices, in conjunction with deep learning neural networks to obtain a very high accuracy (96% and 98%) urban canopy map. ArcGIS Pro offers several deep learning and machine learning classifiers using a graphical interface for the use of all GIScientists, regardless of having a technical background or not.

Conclusions
This study accomplished two objectives: first, we examined the use of a commonly used machine learning classifier for identifying urban tree canopy using no-cost high resolution NAIP imagery. Second, we used LST maps derived from no-cost Landsat thermal imagery to identify correlations between canopy loss and temperature hot spot increases over a 14-year period in Columbia, SC, USA. The support vector machine imagery classifier was highly accurate in classifying both the 2005 imagery (94.3% OA) and the 2019 imagery (94.25% OA) into canopy and other classes. We found the color infrared image available in the 2019 NAIP imagery better for identifying canopy than the true color images available in 2005 (97.8% vs. 90.2%). Visual analysis based on the canopy maps and LST maps showed temperatures rose near areas where tree canopy was lost and urban development continued. Future studies will seek to improve classification methods by including other classes, other ancillary data sets (e.g., LiDAR), new classification methods (e.g., deep learning), and analytical methods for change detection analysis.