Sentinel-Based Adaptation of the Local Climate Zones Framework to a South African Context

Abstract: The local climate zones (LCZ) framework has become a widely applied approach to study urban climate. The standard LCZ typology is highly specific when applied to western urban areas but generic in some African cities. We tested the generic nature of the standard typology by taking a two-part approach. First, we applied a single-source WUDAPT-based training input across three urban areas that represent a gradient in South African urbanization (Cape Town, Thohoyandou and East London). Second, we applied a local customized training that accounts for the unique characteristics of the specific area. The LCZ classification was completed using a random forest classifier on a subset of single (SI) and multitemporal (MT) Sentinel-2 imagery. The results show an increase in overall classification accuracy of between 17 and 30% for the locally calibrated over the generic standard LCZ framework. The spring season is the best classified of the single-date imagery, with accuracies 7% higher than the least well classified season. The multi-date classification accuracy is 13% higher than spring but only 9% higher when a neighborhood function (NF) is applied. For acceptable performance of the LCZ classifier in an African context, the training must be local and customized to the uniqueness of that specific area.


Introduction
Since before the dawn of civilization, the global population has been increasing in both isolated and connected communities [1,2]. This continuous rise in population has resulted in civilization and has furthermore created the barrier between urban and rural regions [3,4]. While urbanization comes with prospects of technologically advanced livelihoods for the inhabitants and easier access to amenities, it also brings some undesirable side effects [5]. One of these side effects is the urban heat island (UHI) phenomenon. The urban heat island is a term coined by [6] to refer to a phenomenon where urban regions experience warmer surface and atmospheric climatic conditions than their surrounding rural areas. The earliest documentation of this phenomenon was in an 1820 study on the London climate [7]. While the UHI is created by urban infrastructural developments, planning and design as well as population growth, it is projected to be intensified by climate change through extreme heat waves [8,9]. With anthropogenically driven climate change becoming an even bigger threat to livelihoods, there is a need to establish living conditions that do not exacerbate but adapt to the changing climate. Urban planners propose that the increase in urban population demands innovation toward sustainable cities, while some propose low-energy buildings [10,11]. Sustainable cities are cities that have systems in place to curb and adapt to the effects of UHI while accommodating as much as 68% of the global population, as projected for 2050 [12,13]. Despite this awareness in urban planning and climate change circles, studies on UHI are still limited and concentrated in Asia, Europe and North America, with very little literature available on Africa, Central and South America and Oceania [14].
What these limited studies highlight is that the biggest challenge in the study of UHI has been, and still is, relating the observed surface and air temperature patterns to the features on the ground [14,15]. This has been addressed primarily by the development and application of local climate zones to study both surface and atmospheric UHI [16-18]. Local climate zones (LCZ) are regions of homogeneous surface features that experience uniform climatic conditions [19]. The local climate zones typology currently accepted as the standard framework for classification was designed by [15]. The framework was developed specifically to study UHI on the hypothesis that surface and atmospheric patterns in UHI can be attributed to the spatial and structural characteristics of surface features.
Even with the development of local climate zones to address the spatial distribution challenge, studies are still not evenly distributed geographically. Between 1970 and 2020, 57% of the publications were on Asia, 23% on North America, 14% on Europe and only about 3% on Africa [14]. The majority of the African studies on local climate zones were part of global studies and were not focused on local African cities. Local climate zone studies in Africa have been limited compared with the other continents, which creates a large gap in the literature in this area of study [20]. Urban climatology studies in South Africa have focused on a diversity of urban climate variables, ranging from temperature-focused studies to Köppen's climatic zones and rainfall and drainage, but not on local climate zones [21,22]. The current study was intended not only to help fill the gap in LCZ classifications in South Africa but also to contribute to the body of knowledge on African cities in general.
For the purposes of universal application and generalizability, the standard LCZ was designed to be culturally neutral [16]. Ref. [23] explored the relationship between culture and urban form and found them strongly correlated. Urban form is defined by [24] as the description of the city's physical characteristics. This covers everything from urban design, type of building material, arrangement of infrastructure and type of ground material among others. It was observed in Southwestern Saudi Arabia that cultural laws also influenced urban form [25]. Seventy-seven metropolitan cities in Asia, US, Europe, Latin America and Australia were sampled as part of a study to assess urban form across different continents. This study by [26] found that apart from differences of density and height, there are urban form features that are common across all the cities and yet there are some urban form features that are only unique to some cities. This suggests that a classification framework typology must be flexible enough to allow training based on the standard classes, combinations of the standard classes as well as the addition of classes that are unique to that local environment. The flexibility of the standard framework to manipulation is even more important in African urban regions due to how diverse they are and how different they are from each other and from western urban regions [27]. This current study explored the extent to which the standard LCZ framework can be manipulated and modified to accommodate the uniqueness of an urban area. All this was completed with the aim of developing a classification protocol that can be possibly applicable to similar urban regions that do not strictly resemble the cities the standard framework was based on.
While the standard typology remains the same, its implementation over the decade since its conceptualization has adopted different methodologies. These approaches have included in situ measurements, GIS-based mapping and remote sensing-based approaches [28]. Within each approach, there are different sub-methods, which can result in differences in accuracy even within the same general approach. A study on local climate zones in the Zimbabwean capital, Harare, adopted a remote sensing approach in an attempt to compare the machine learning technique of support vector machines (SVM) with the World Urban Database and Access Portal Tools (WUDAPT) generator approach. WUDAPT is a global initiative of online tools to create local climate zone maps for a given city using a standard methodology [29]. This study by [30] found that the WUDAPT approach yielded higher accuracies than the SVM approach. Ref. [28] proposes that the GIS-based and remote sensing-based classifications produce different accuracies and levels of detail as a function of scale [31]. The GIS-based method is more suitable for micro-scale classification, while remote sensing is suitable for larger-scale classification [32]. The WUDAPT approach in general yields higher accuracies than other remote sensing and GIS-based methods. Ref. [28] attributed the accuracy of the WUDAPT methodology to its generic nature. According to [29], the satellite imagery used by the online WUDAPT classifier has a pixel size of 100-120 m. This is the same as the minimum size of a local climate zone training input, which makes the accuracy higher.
The WUDAPT method was developed for global urban morphology data collection for global climate models [33]. This suggests that it was not designed for implementation at the local level. For lower mesoscale and local scale climate models, more detail is required in urban morphology. This has resulted in the use of higher resolution data sources such as Landsat-8 and Sentinel-2 among others. These products have been used in classifications using mostly machine learning and deep learning techniques [20,30]. The use of coarser imagery of 100-200 m resolution uses the surface reflectance values for each pixel at a size that is comparable to the minimum size of a local climate zone. However, when finer resolution data of less than 60 m are used, the pixel size becomes much smaller than the size of the LCZ training input. When the pixel size is smaller than the object, the variation in pixels belonging to the same class becomes larger [34]. This then necessitates a neighborhood function kernel to aggregate the pixels to the level of the local climate zone. At the end, the result is an aggregated value instead of the original classified value.
The first objective of this study was to characterize the spatial design and layout of three cities, namely Cape Town, East London and Thohoyandou, that cover a gradient in urbanization within the South African context. This looks at the historical influences that shape the morphology and urban form of each of these urban areas. The second objective was to determine the extent to which the standard LCZ framework covers the nature and morphology of South African cities by applying the standard LCZ framework as it is. To carry out this objective, a combination of a field survey and digital imagery was used to develop a training input of spectrally distinct classes found within these urban regions, which was the third objective of the study. For this objective, all these urban regions were assumed to be similar, and only one training set was developed using Cape Town and then applied across the rest of the urban regions. Lastly, a more specific training was developed for a more customized LCZ classification protocol, with each city having its own training sample.

Study Area
The study was conducted in the Cape Town, Thohoyandou and East London urban areas of the Western Cape, Limpopo and Eastern Cape provinces of South Africa, respectively (Figure 1). These are urban areas of different size, urban form and land-use systems, representing a gradient across South African urbanization. Their geographical location and dispersion through South Africa put them in different climatological systems. Cape Town and East London are coastal cities, while Thohoyandou is a remote inland small town (Figure 1). Cape Town, located at the southernmost tip of Africa, experiences a Mediterranean climate characterized by cold wet winters and hot dry summers [35]. East London experiences a maritime climate with cool winters and mild summers that is moist throughout the year because of its proximity to the ocean [36]. Thohoyandou experiences cold dry winters and hot wet summers [37]. Cape Town as a city is a monument to the historical occupations and influences of the diverse people groups who contributed to its development. This has greatly influenced its urban form, which resembles Portuguese, Indian, Dutch and British architecture [38]. The original development of Cape Town into somewhat of an urban area began with the Portuguese explorers in the 14th century in an area that belonged to the Khoikhoi people. This was followed by the Dutch period in the 17th century and then the British period in the 19th century. Finally, the South African period began in the early 20th century and extends to date. The 20th century ushered in rapid urban expansion in Cape Town from multiple epicenters, resulting in an overall design that is composed of multiple administration and suburban residential areas resembling Harris and Ullman's (1945) multiple nuclei model [39]. Cape Town currently covers 400 km² with a population density of 17,500 per km².
East London, similar to Cape Town, was also originally developed as a harbor town. However, East London does not have as long a history of cultural diversity from different developers as Cape Town. The city has remained a harbor city of mostly British influence in style, but the expansion outwards into the indigenous communities has brought indigenous cultural influences into the East London urban form [40]. It currently covers an area of 168.9 km² with a population density of 2745 per km².
Thohoyandou, which was developed as a capital for the Venda Bantustan in the latter half of the 20th century, does not benefit from the diversity of historical western influences in its urban form and design that East London and Cape Town encompass [41]. Thohoyandou was originally developed for a shopping center and government administration offices [42]. Among the three urban areas, Thohoyandou is the most integrated with features associated with the rural environment. The area of Thohoyandou is 42.62 km² with a population density of 2051 per km².

Sentinel-2 Multispectral Imagery
The Sentinel-2 Top-of-Atmosphere (ToA) (L1C) product was obtained from the Copernicus Open Access Hub (COAH) of the European Space Agency (ESA). The Sen2Cor288 algorithm was used in SNAP to correct for atmospheric interference and thus convert the L1C product to Sentinel-2 L2A, which is Bottom-of-Atmosphere (BoA) reflectance [43]. Sen2Cor creates BoA reflectance images, terrain- and cirrus-corrected reflectance, aerosol optical thickness, water vapor, scene classification maps and quality indicators for cloud and snow probabilities [44]. For each season, the image from the central month, with a cloud coverage filter of less than 0.5%, was selected as the most suitable to reflect peak season dynamics (Table 1).

Definition of LCZ Classification
The standard [15] LCZ classification framework was selected for this study to be applied across all three urban areas. A remote sensing-based approach was chosen over a GIS-based approach. A GIS approach to classification would require manual digitization of the entire image, which is laborious and time-consuming. The processing of remotely sensed data also requires the manipulation and interpretation of digital data [45]. This tends to be a mathematically complex process due to the heterogeneity of materials and the geometry of the features [46]. However, the advantage of the remote sensing-based route is that it can be automated. The irregular geometry of the local climate zones is a challenge to both pixel-based and object-based classifications [45,47,48]. A pixel-based classification, which was adopted for this study, overcomes this geometric non-uniformity challenge by assigning every pixel to a single class based on its reflectance value [49].
Stewart and Oke's LCZ classification framework was applied in two approaches that differ in the creation of training data. Approach 1 strictly followed the LCZ region of interest (RoI) creation as outlined by WUDAPT to create a standard training set based on Cape Town to be applied to all three cities. The application of this remote training to Thohoyandou and East London indirectly assesses whether the influence of origin and culture on urban form affects the LCZ classification. This informs whether a customized training for LCZ classification is needed once for all South African cities or locally for each urban region. Cape Town was chosen because it is the only urban area of the three that experiences all four seasons and contains all 17 standard LCZ classes [50]. Theoretically, this means that the impacts of phenology would be more apparent in Cape Town than in East London and Thohoyandou, which would also make it ideal for identifying the best season for single-image classification across all three urban regions. In defining these LCZ classes, a separability analysis was performed. Spectral separability analysis assesses how well the bands of the Sentinel-2A multispectral instrument differentiate between the classes of the typology. For the purposes of this study, histograms, scatter plots and box plots were used to assess separability according to guidelines from [51]. The second approach to the classification still started from the traditional [15] typology but explored combinations and subclasses of the standard typology based on the unique features of each urban region.
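The box-plot part of such a separability check can also be approximated numerically. The following is a minimal sketch, not the procedure of [51]: it assumes hypothetical reflectance samples for three classes in a single band, and the function names `iqr` and `iqr_overlap` are illustrative. It reports whether the interquartile ranges of two classes overlap, which is what side-by-side box plots show visually.

```python
from statistics import quantiles


def iqr(values):
    """Return (Q1, Q3) of a list of reflectance values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def iqr_overlap(a, b):
    """True if the interquartile ranges of two classes overlap in this band."""
    a_lo, a_hi = iqr(a)
    b_lo, b_hi = iqr(b)
    return a_lo <= b_hi and b_lo <= a_hi


# Hypothetical surface reflectance values for one band (illustrative only,
# not measurements from the study).
water = [0.02, 0.03, 0.03, 0.04, 0.05]
bare_soil = [0.25, 0.28, 0.30, 0.31, 0.35]
low_plants = [0.27, 0.29, 0.32, 0.34, 0.36]

print(iqr_overlap(water, bare_soil))       # False: boxes do not overlap
print(iqr_overlap(bare_soil, low_plants))  # True: boxes overlap
```

Overlapping boxes in every band would suggest that two classes are candidates for merging or for a customized subclass, which is the kind of decision explored in the second approach.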

i. Model Training
Two types of model training sets were developed for the classification, where the first was standardized and the second was customized to the context of the urban area. The ground reference data were collected initially using digital globe resources due to COVID restrictions and were validated and revised in situ in February and March 2022 in Cape Town, Thohoyandou and East London. Based on a digital globe basemap, circular regions of interest with a diameter of 100 m were made using QGIS, following the specifications of the standard typology as outlined by WUDAPT. The field campaign was then used as a manual method to verify the regions of interest (Figure 2). Most (80%) of the regions of interest were used for training and the remaining 20% were used for validation. The field campaign was also used to observe and document the unique elements of each urban region for the discrimination of subclasses to feed into the more specific training for Approach 2.

a. Approach 1
The WUDAPT LCZ typology guidelines for the development of training data were applied. These guidelines are divided into subcategories depending on the scale of the total area, classification methods and the intended use of the final product [47]. These guidelines have a strict protocol for training with the objective of using the WUDAPT online LCZ generator as well as a more flexible protocol for developing training data to be used outside the WUDAPT generator [16]. The development of these training polygons depends on a combination of general typology elements such as cover, material, geometry and function, taken to different levels of detail depending on the scale of the imagery and the purpose of the classification. Cities are then mapped using the scheme of [15], which classifies the urban landscape into 10 urban and seven natural classes.
Each class in the typology represents an LCZ described in terms of the specific landscape parameters of mean building height, canyon width, aspect ratio, building surface ratio and impervious area. These training areas are used to characterize the reflectance properties of each LCZ, which are then used to develop a model that assigns every other untrained pixel of the image to the LCZ classes within the framework. A three-step sampling method (Figure 3) was adopted from [52]. This block-based system was developed for LCZ classification at the city block scale, primarily as a GIS-based method. In this study, it was used to guide the development of training samples following the three steps. The natural city blocks are easier to assign to LCZ classes because they are homogeneous, but the urban classes are much harder to assign even with physical access to the area. Urban LCZ metadata variables were thus limited to building height in stories (Hs), which is the number of stories as collected in the field, mean building height (BH), canyon width (CW), aspect ratio, building surface ratio and impervious area (Figure 4). When the number of stories per building is less than 10, every story is assumed to be 3 m; otherwise, Equation (1) is used for buildings with more than 10 stories [53]. Buildings with one to three stories were considered low-rise, four to nine stories were considered mid-rise, and more than nine stories were classified as high-rise (Figure 5). Canyon width is estimated by the average distance between two buildings. Aspect ratio is estimated by the ratio of the building height (BH) to the canyon width (CW). Building surface fraction (BSF) and impervious surface fraction (ISF) were estimated using simple calculations in QGIS following the polygonization method.

b. Local Customized Training Input (Approach 2)
This approach is developed based on the layout and design of each city, taking into account features that are common across all three cities and features that are unique to the specific urban region. In Thohoyandou, the urbanized city center and the immediate blocks around the city center have rural features integrated into the urban landscape. The intra-block streets in Thohoyandou are not homogeneous in material; some are asphalted while some are gravel. While in a standard framework the building density and height stand out as the main discriminators for local climate zones, the street canyon material stands out just as significantly in Thohoyandou. This is also noted by [27], who stated that, as a unique general feature, remote African urban areas tend to have more bare soils than western urban areas. While this might not be statistically significant for a highly urbanized and highly westernized city such as Cape Town, its significance in a small town such as Thohoyandou cannot be neglected without investigation. However, spectrally separating bare soil from impervious surfaces in a remote sensing approach at the level of the pixel (10 m) has inherent confusion in spectral signatures [54]. Therefore, following Jin's blocking, a GIS approach was developed in order to create a criterion for separating blocks that are completely asphalted from blocks that are gravel within the urban area without the inherent confusion of a remote sensing approach. A digitization process was applied to digital globe imagery to create asphalted and bare soil inter-street blocks (Figure 6). The output of this digitization was used to separate training inputs that are in gravel blocks from those that are in asphalted blocks. LCZ 3 was the only class observed to be present in both block types. The buildings within LCZ 9 in Thohoyandou are also further apart than they are in Cape Town and East London.
The space between houses is thus confused with shrublands (LCZ 14) due to the dry nature of the shrublands in Thohoyandou. In order to reduce this confusion, the low plants were combined with shrublands to form a single class. Scattered trees were also combined with dense trees to form a single class. The number of natural classes was thus minimized so that the built classes can stand out more spectrally. Impervious surface and bare soil were also combined to form a single class. This is because they are the least represented classes in the area, and combining them makes a slightly larger class. The updated training set for Thohoyandou thus becomes LCZ 3a, 3b, 6, 8, 9, 11 & 12, 13 & 14, 15 & 16, 17. In East London, the ground truthing process in lightweight zones (LCZ 7) revealed a unique class that is a hybrid between lightweight and compact low-rise (Figure 7). A unique feature of South African lightweight squatter camps is that they have no designated stand numbers. This means they have no yards, and it is common for houses to share walls with their neighbors on all sides except the front. Because of this nature of South African squatter camps, the integration of LCZ 3 and LCZ 7 in these East London zones happens below the minimum size of the local climate zone (100 m). This is thus treated as a unique class and incorporated as LCZ 7a into the updated East London training, which then becomes LCZ 2, 3, 5, 6, 7, 7a, 8, 9, 10, 11, 13, 14, 16, 17.

ii. Remote Sensing Classification Protocol
The choice of the LCZ classification method is guided by the nature of the data, the available computational resources and the application purpose [45]. The classification protocol was performed via a coded script in R in RStudio, using mainly the caret package with a Random Forests (RF) classifier. The script was designed to extract the training pixels from the image stack, build predictive models that assign the rest of the image pixels to the most fitting class based on surface reflectance values, and validate the assigned pixels. This was performed on a single image stack as well as a multitemporal image stack.
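For the urban training samples, the story-based rules stated under Approach 1 (3 m per story below 10 stories, the low-, mid- and high-rise thresholds, and the aspect ratio as building height over canyon width) can be written down compactly. The following is a minimal sketch under those stated rules; the function names are illustrative, and Equation (1) for taller buildings is not reproduced in the text, so the sketch deliberately does not guess it.

```python
def building_height_m(stories, story_height_m=3.0):
    """Height of a building with fewer than 10 stories, at 3 m per story.

    Buildings with 10 or more stories are left to Equation (1) of the
    source, which is not reproduced here (treating exactly 10 stories this
    way is an assumption of this sketch).
    """
    if stories < 1:
        raise ValueError("stories must be at least 1")
    if stories >= 10:
        raise NotImplementedError("use Equation (1) for 10 or more stories")
    return stories * story_height_m


def rise_class(stories):
    """Low-rise: 1-3 stories, mid-rise: 4-9 stories, high-rise: more than 9."""
    if stories < 1:
        raise ValueError("stories must be at least 1")
    if stories <= 3:
        return "low-rise"
    if stories <= 9:
        return "mid-rise"
    return "high-rise"


def aspect_ratio(height_m, canyon_width_m):
    """Ratio of building height (BH) to canyon width (CW)."""
    if canyon_width_m <= 0:
        raise ValueError("canyon width must be positive")
    return height_m / canyon_width_m


# Hypothetical block: three-story buildings separated by an 18 m canyon.
bh = building_height_m(3)
print(bh, rise_class(3), aspect_ratio(bh, 18.0))
```

Keeping the three rules as separate functions makes it easy to swap in Equation (1) for tall buildings once it is available, without touching the rise classes or the aspect ratio.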

a. Single Image vs. Multitemporal Classification and Neighborhood Function
The first classification method is the most straightforward application of an LCZ classification and consists of applying the iterative process on a single-date image.
The seasonality was therefore analyzed in terms of meteorological seasons, namely winter, spring, summer and autumn, each represented in turn by one scene at the center of the season [55]. For the multitemporal approach, the accuracies of the single-image classifications of each season were compared with those of a classification combining images of all four seasons. This accounts for the spectral and spatial changes in the natural vegetation that are caused by seasonal changes, which has the potential to increase the accuracy of the classification. It reduces confusion between seasonal classes such as bare soil in the dry period, which is covered by low vegetation in the rainy periods.

b. Neighborhood Function
A neighborhood function or contextual classifier can contribute to increased accuracies in the classification of urban areas that are internally highly differentiated or heterogeneous, resulting from historical urbanization patterns that reflect the locality and the culture. However, most classification methods, including the original WUDAPT protocol, do not take this spatial variation into account. Moreover, the WUDAPT workflow causes a loss of spectral variability information before the actual classification by resampling the Landsat images during the pre-processing phase to a spatial resolution of 100 m [56-60]. At 10 m resolution, the sensitivity of the neighborhood function was tested by increasing the size of the kernel window to find an optimal cell number.
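To make the idea of a neighborhood function concrete, the following is a minimal sketch of one common choice, a majority (modal) filter applied to a grid of classified LCZ labels. The text does not state which aggregation rule was used, so the majority rule, the tie-breaking behavior and the function names are assumptions made for illustration only.

```python
from collections import Counter


def majority_filter(labels, kernel=3):
    """Replace each cell by the most frequent label in a kernel x kernel window.

    Windows are clipped at the grid edges. On a tie, the original label is
    kept if it is among the winners; otherwise the smallest winning label is
    used so that the result is deterministic.
    """
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd number")
    rows, cols = len(labels), len(labels[0])
    half = kernel // 2
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            winners = sorted(k for k, v in counts.items() if v == best)
            out[r][c] = labels[r][c] if labels[r][c] in winners else winners[0]
    return out


def kernel_footprint_m(kernel, pixel_m=10):
    """Side length on the ground of a square kernel, e.g. 13 cells at 10 m."""
    return kernel * pixel_m


# Hypothetical 4 x 4 patch of LCZ labels with one isolated LCZ 7 pixel.
grid = [
    [3, 3, 3, 3],
    [3, 7, 3, 3],
    [3, 3, 3, 6],
    [3, 3, 6, 6],
]
smoothed = majority_filter(grid, kernel=3)
print(smoothed)  # the isolated 7 is replaced by 3; the LCZ 6 corner survives
print(kernel_footprint_m(13))  # 130
```

Larger kernels remove more isolated pixels but also erode small genuine zones, which is why the kernel size is treated as a parameter to be tuned rather than fixed in advance.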
iii. Validation
The ground truth data were randomly split in R into a training (80%) and a validation (20%) set. The validation set is used to validate the model using accuracy metrics. The first accuracy check performed was a visual comparison of the output with satellite imagery. The User's Accuracy (UA) is the probability that the predicted value is correct; the Producer's Accuracy (PA) is the probability that a value in a certain class was classified correctly. The Overall Accuracy (OA) is the proportion of all validation samples that were classified correctly, and the Kappa coefficient is a measure of the agreement between classification and truth values that corrects for chance agreement. All were calculated according to the guidelines in [48]. The F1-score was calculated as the harmonic mean of the UA and PA, which is even more useful when the classes are not balanced.
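The metrics named above all derive from a single confusion matrix. The following is a minimal sketch using the standard textbook definitions, with rows as reference classes and columns as predicted classes; it is not the script used in the study, and the two-class matrix is a hypothetical example.

```python
def accuracy_metrics(matrix):
    """Compute OA, Kappa and per-class UA, PA and F1 from a confusion matrix.

    matrix[i][j] is the number of validation samples whose reference class
    is i and whose predicted class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    ref_tot = [sum(row) for row in matrix]
    pred_tot = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diag = [matrix[i][i] for i in range(n)]

    oa = sum(diag) / total
    # Producer's accuracy: correct samples over reference totals.
    pa = [diag[i] / ref_tot[i] if ref_tot[i] else 0.0 for i in range(n)]
    # User's accuracy: correct samples over predicted totals.
    ua = [diag[i] / pred_tot[i] if pred_tot[i] else 0.0 for i in range(n)]
    # F1 is the harmonic mean of UA and PA for each class.
    f1 = [2 * u * p / (u + p) if (u + p) else 0.0 for u, p in zip(ua, pa)]
    # Kappa compares OA with the agreement expected by chance.
    expected = sum(r * c for r, c in zip(ref_tot, pred_tot)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    return {"OA": oa, "Kappa": kappa, "UA": ua, "PA": pa, "F1": f1}


# Hypothetical two-class confusion matrix (rows: reference, columns: predicted).
example = [
    [40, 10],
    [20, 30],
]
result = accuracy_metrics(example)
print(result["OA"], round(result["Kappa"], 3))
```

In this example the OA is 0.7 while Kappa is about 0.4, which illustrates why Kappa values sit below the OA whenever part of the agreement could have arisen by chance.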

Visual Analysis of the Classification Outputs
A visual inspection of the Cape Town output reveals a clear separation of the built from the natural classes (Figure 8). This pattern also reveals that the compact classes are concentrated next to each other and the open classes are further away. The city center toward the harbor is composed of compact high-rises and compact mid-rises, as expected in a modern city such as Cape Town. The agricultural lands to the north are also visible, with minor patches of LCZ 9 through them. Also worth noting is the confusion that arises between the paved surfaces and built classes. Roads at the city center are wrongly classified as either compact high-rise (LCZ 1) or heavy industry (LCZ 10). When the remote Cape Town-based standardized training is applied to a Thohoyandou multitemporal image, LCZ 3 is completely absent (Figure 9A), but it reappears immediately when a local standardized training is applied (Figure 9B). Using a local standardized training shows the Nandoni area to be mostly LCZ 9 with some vegetation; this is in line with the pre-study survey and Google globe imagery, which show the area to be a rural village. However, the remote standardized training shows the same area as being mostly composed of LCZ 8 (large low-rise), which is mostly found in the city center. Single-image classification using remote training shows LCZ 7 (lightweight low-rise) at the city center, which according to Google globe imagery is a misclassification, as Thohoyandou does not have squatter camps; this remote training also shows a forested environment in the Nandoni region. This is different from the output of the local training, which is almost in complete agreement with the local MT classification, showing no LCZ 7 at the city center and low plants and sparse trees in the Nandoni dam area. The application of remote vs. local training in East London does not yield significant differences in the appearance of classification outputs.

Single Versus Multi-Seasonal Images and Neighborhood Functions Comparing Performance of Remote and Local Standard Trainings
While the multitemporal Cape Town image is the most accurately classified, with an overall accuracy of 44%, the spring image has the highest accuracy of the single-image classifications, with an accuracy of 42% (Table 2). The difference in Kappa and OA metrics between the spring image and the multitemporal image is relatively small: 1.6% for Kappa and 1.4% for OA. This suggests that the spring image could be an acceptable representation of the Cape Town area when there are no multitemporal data. The spring image OA is a 6% improvement on the summer classification, which is at 36%. However, even this 6% is mostly due to the natural vegetation (OA-nat), which is 15% higher in spring than in the summer. The difference between accuracies of the urban classes (OA-urb) is only 2%. This also suggests that the effects of phenology are greatest among natural classes, as expected. The summer image, as the least successfully classified, has the lowest Kappa at 31.2%, which is a 68.8% disagreement between the training and the output. This indicates that the least represented classes in the training are not very well classified. According to the Kappa statistic, the least well classified (summer) and the best classified (multitemporal) images both have Kappa values that fall within the same range. This means about 60% disagreement between training and output, suggesting that only 40% of the data can be relied upon to produce the observed results. The difference between the F1-score and the OA is also indicative of patterns in the classification of individual classes, since the F1 is a harmonic mean of the PA and UA. From Table 2, the F1-scores and OA of the summer and autumn images are almost identical, which indicates confusion even in the classes that are rather well classified in other seasons. The confusion matrix of the spring image shows that the built classes have more confusion than the natural classes (Table 3).
While LCZ 3 and LCZ 6 are visually well represented across all seasons, the confusion matrix reveals that they have the highest confusion in general. A lot of bare lands in the summer and autumn are classified as built classes. This trend is also seen in the spring season, with LCZ 16 confused with LCZ 9 and LCZ 10. However, the spring image is still the best classified single-date image and is the best season to test the Cape Town-based remote training in Thohoyandou and East London. The other tool that improves the classification is the contextual classifier, and the optimal kernel size must be determined for application across all urban areas. A fragmentation (salt-and-pepper effect) of classes is visible in the classification output (Figures 8 and 9). The neighborhood function aggregates the pixels within a certain threshold into a single value and results in a smoother output than the raw data classification (Figure 10). This was tested on Cape Town, and the optimal kernel size was applied to Thohoyandou and East London to compare their results with Cape Town. The accuracy remains constant at 42.7% and Kappa at 43.2% for all kernel sizes below 11 cells and reaches its peak at 13 cells, after which the accuracy starts to decrease (Table 4). With a 13 × 13 kernel of 130 m at a resolution of 10 m, the accuracy of the spring single-image classification is improved by 6.0%, and the multitemporal classification is improved by 5.7%. However, in general, the multitemporal classification combined with the neighborhood function still yields higher accuracies than the single image with the neighborhood function. The highest overall accuracy of the multitemporal classification is 49.8% at a 13 × 13 moving window size, bringing the total improvement over the general single image to 7.2%. Nevertheless, it is only a slight improvement of 1.1% compared to the single-image classification, which has an overall accuracy of 47.7%.
Table 5 also shows that, for both the single and multitemporal classifications, the combined overall accuracy of natural and built classes (the OA-nat/urb metric) is highest at a kernel size of 11 × 11 or 110 m, indicating that confusion between natural classes and built classes is lower in these classifications. However, this is only a 0.3% difference from the 13-cell kernel application, which has higher individual OA-nat and OA-urb values. For the purposes of this classification, a standard 13-cell NH kernel was adopted as the optimal kernel size for comparison across the three urban regions.
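The smoothing effect of a moving-window kernel can be illustrated with a simple majority (mode) filter. This is only a sketch under stated assumptions: the text does not specify the exact aggregation rule of the neighborhood function, so a majority vote is assumed here, the window is truncated at the image edges, and the function name `majority_filter` is hypothetical.

```python
from collections import Counter


def majority_filter(grid, kernel=13):
    """Replace each cell by the most frequent class in a kernel x kernel window.

    Assumptions: grid is a list of equal-length rows of class labels, kernel is
    odd, windows are truncated at the edges, and ties are broken arbitrarily.
    """
    half = kernel // 2
    rows, cols = len(grid), len(grid[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out


# Hypothetical 5 x 5 patch of LCZ 6 with one isolated LCZ 3 pixel.
patch = [[6] * 5 for _ in range(5)]
patch[2][2] = 3
smoothed = majority_filter(patch, kernel=3)  # the isolated pixel is absorbed
```

At a 10 m resolution, `kernel=13` would correspond to the 130 m window described above. A larger kernel removes more isolated pixels but also erases small, genuine patches, which is the detail and accuracy trade-off behind the kernel-size comparison.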
The application of remote training yields the highest results in a Thohoyandou multi-date stack at 53.2%; however, a local standardized training yields a 7% increase in overall accuracy and a 10% increase in Kappa. Urban classes are less successfully classified across the board, although they are better classified in East London than in Thohoyandou. The use of remote vs. local classification in East London does not seem to have as big an impact as it does in Thohoyandou. The multi-date classification using local standardized training is only 2.1% higher than its remote training counterpart. The single-date classification using remote training yields the lowest overall accuracy, Kappa and F1 metric across the board, while the multi-date local training yields the highest values of the same metrics. There seems to be a similarity in the LCZ layout between Cape Town and East London, as seen in the accuracy metrics, which are nearly the same for remote and local training.
Table 5. Accuracies of single-date and multitemporal classification using remote (local to Cape Town but remote to Thohoyandou and East London) and local (standard training collected in Thohoyandou and East London) training.
Using a randomized selection of training and test samples makes the model more robust, but it does not create an even number of pixels across classes. Even in the development of the original training protocol, the challenge was that some classes had better representation than others. While this can be avoided by creating an even number of training samples throughout, there simply is not always enough land area to create regions of interest in some classes. Other classes also call for a higher number of training samples because of their inter-class variability, which must be accounted for in the training sample. The output of the randomly selected training pixels (Figure 11A) shows that classes 14, 15 and 16 do not have enough representation in the sample.
The under-represented classes were therefore joined to other classes with similar spectral signatures: class 14 was combined with class 13, and class 15 was combined with class 16. As a result, the lowest number of pixels per class in the training sample rose from 250 to above 700 (Figure 11B).
The pattern observed in the Approach 1 classifications holds in the Approach 2 results: the spring image provides the highest accuracy of all the single-image classifications (Table 6). However, natural vegetation is more accurately classified in the winter NH than in any other season, while urban is highest in the spring. The multitemporal image that combines all the seasons is the most accurately classified, with an accuracy of 82.68% as compared to 53.2% in Approach 1. The confusion matrix of the multitemporal classification reveals that all classes are more often classified correctly than misclassified, except for LCZ 16, which is a compound class of LCZ 15 and the original LCZ 16. The highest confusion of LCZ 16 is with LCZ 8, which is due to the paved spaces between large open buildings, and with LCZ 13, which is due to the large bare regions in between shrubs (Table 7). The discrimination between the gravel LCZ 3a and the asphalted LCZ 3 seems to be very good according to a visual inspection of the classification output (Figure 13). The confusion matrix also confirms this, with 17% of LCZ 3 classified as LCZ 3a. However, the highest confusion of LCZ 3a is with LCZ 6. This is due to their similar open spaces and their proximity; 11.5% of LCZ 3a pixels are classified as LCZ 6.
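The merging of under-represented classes described above (class 14 into 13, class 15 into 16) amounts to relabeling the training pixels and recounting them per class. The sketch below assumes the training labels are available as a flat list of integer class codes; the names `MERGE_MAP`, `merge_classes` and `class_counts`, as well as the toy label list, are hypothetical.

```python
from collections import Counter

# Mapping taken from the text: class 14 joins class 13, class 15 joins class 16.
MERGE_MAP = {14: 13, 15: 16}


def merge_classes(labels, merge_map=MERGE_MAP):
    """Relabel training pixels so that sparse classes join their spectral neighbors."""
    return [merge_map.get(label, label) for label in labels]


def class_counts(labels):
    """Count training pixels per class to check how well each class is represented."""
    return Counter(labels)


# Toy label list (not data from the study).
labels = [13] * 5 + [14] * 2 + [15] * 1 + [16] * 3
merged = merge_classes(labels)
smallest_before = min(class_counts(labels).values())
smallest_after = min(class_counts(merged).values())
```

Comparing the smallest class count before and after merging is the same check that Figure 11A and Figure 11B summarize for the actual training sample.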

East London Classification
Using a local and customized training input yielded higher overall accuracy and Kappa values. The highest OA value for East London using Approach 1 was 41.3%, which increased to 58.5% using Approach 2, a 17.5% increase. However, contrary to observations from the Approach 1 results, the multitemporal image does not appear to be the best fit for both built and natural classes. The spring image with a 13-cell neighborhood function yields the highest overall Kappa, F1 score and overall accuracy (Table 8). This agrees with the observation made in Approach 1, where the spring image was the best of the seasonal single-image classifications. Similarly, the summer image has the lowest values for all the overall metrics. It is worth noting that the spring image still has the highest number of lowest individual class Kappa values, spread across the built classes; this suggests that the higher overall metrics are due to the perfect and near-perfect classification of the natural classes. The highest individual LCZ Kappa values are not localized to a single season but are spread across different seasons. A common trend, however, is that the highest individual Kappa values fall within the classifications that have been smoothed with the 13-cell kernel, while the lowest values are within the raw image classification. The spring neighborhood function image (Figure 14) reveals visually that LCZ 7 and 7a are in close proximity to LCZ 3. The transition runs from a strict LCZ 3 into LCZ 7a, which is an integration of 7 and 3, and ultimately into a strict LCZ 7.

Conclusions
Cape Town is a multi-nuclei urban region of multi-cultural origin, East London is a harbor town, and Thohoyandou is a small town that originated as the administrative capital of the Venda Bantustan. These urban regions represent a gradient within urbanization in South Africa. Their different historical backgrounds contribute to the uniqueness of the layout and feature types in each region, a phenomenon also noted in the Middle East [25]. These unique features are important because they could explain the poor performance of the standard framework when it is applied using multispectral imagery at the local scale in Africa. Cape Town as an urban area closely resembles the cities of the west; as such, the standard LCZ framework typology would best fit Cape Town with minimal to no adjustment in the guidelines for RoI creation. However, developing a localized and customized training for East London and Thohoyandou individually creates a classification protocol that considers the unique local features stemming from their distinct origins and cultural evolution as they head toward modernization.
The nature of the training input was the major difference between Approach 1 and Approach 2. Where Approach 1 used a single all-inclusive training input for all three cities, Approach 2 used a local customized training input for each urban region and yielded better results. The biggest challenge in this study was the lack of a height layer in the stack as a discriminator for the algorithm. Ref. [61] stated that the presence of a height layer is essential for cities with LCZs belonging to different height classes (low-, mid- and high-rise). What is immediately noticeable in the accuracy metrics is the big difference between the values obtained across all LCZs in homogeneous-height Thohoyandou and those obtained in heterogeneous-height Cape Town and East London using both Approaches 1 and 2 (Tables 6-8). Without a height discriminator in the classification stack, there is inter-class confusion within the compact classes as well as the open ones (Table 3). Ref. [62] addressed the height challenge by using an abridged version of the LCZ classification that considers surface feature density but eliminates height altogether. While this land cover-based framework by [62] was also shown to explain trends in urban heat islands, it suffers from a loss of detail at the local scale. This sacrifice of detail for accuracy renders the output less useful and to some degree even unsuitable for local climate models. The best way to address the height data gap while maintaining detail remains the use of locally calibrated training as opposed to the land cover approach of [62].
Both approaches revealed that a local customized training sample is a better fit for the random forest LCZ classifier than a standardized training input for all regions. This is seen in the better classifier performance in Approach 2 as opposed to Approach 1. The literature suggests that seasonality would mostly affect natural LCZ classes because of plant phenology [63-65]. However, as seen in the metric tables (Tables 3, 6 and 7), urban classes are also classified to varying degrees of accuracy in different seasons. While the higher short-wave infrared (SWIR) bands are always the most important in the automated LCZ discrimination protocol, the lower bands range from minimally important to completely negligible. Studies by [66,67] attribute the variations in band priority across seasons to the physical properties of surface features. This applies not only to biotic features but also to abiotic features such as buildings. The multitemporal classification was the most accurately classified of all classifications. Ref. [68] stated that the effects of seasonality are addressed by taking a multitemporal stack, which captures the classes at all stages of annual variability. While the individual seasons are classified to varying degrees of accuracy, the multitemporal local customized training would still be more representative of the LCZ classes than a single image from any season.
The size of the pixel also determines the accuracy. As such, a contextual classifier (NF) significantly improves the accuracy of the model [69]. While applying a neighborhood function does not change the pixel size of the image, it reduces the level of detail in the classification output. This can be seen by visually comparing the raw data output (Figure 9) with the NF output (Figure 12). The fragmentation is less apparent when the pixel size is higher than 100 m, as seen when the WUDAPT online generator is used [66]. The disadvantage is that classifying LCZ with a local-scale pixel size (100 m) reduces the level of detail that would otherwise be obtained from higher-resolution imagery, which in this study was 10 m. This is crucial when working with urban climate models. The challenge in using a contextual classifier is in finding a kernel size that balances detail, accuracy and aesthetics for the specific goal for which the classification is intended. While the aim of the NF in this study was to achieve the highest accuracy, the nested algorithm is flexible enough to allow the kernel size to be modified should the purpose of the classification be different.
An application of these methods in future studies should consider using more training samples for the less represented classes. In addition, whether the accuracy in Thohoyandou would improve if a height discriminator were part of the protocol is worth exploring further. Height is definitely recommended as an important addition to the classification stack for East London, Cape Town or any other city with mid- and high-rise classes. The findings of this study, as well as its methodological protocols, are recommended for adoption by any future study of UHIs in the African context, particularly one investigating the spatial correlation between the patterns observed in the UHIs and the underlying LCZ classification.