Local Climate Zone (LCZ) Map Accuracy Assessments Should Account for Land Cover Physical Characteristics that Affect the Local Thermal Environment

Local climate zone (LCZ) maps are increasingly being used to help understand and model the urban microclimate, but traditional land use/land cover (LULC) map accuracy assessment approaches do not convey the accuracy at which LCZ maps depict the local thermal environment. Seventeen types of LCZs exist, each having unique physical characteristics that affect the local microclimate. Many studies have focused on generating LCZ maps using remote sensing data, but nearly all have used traditional LULC map accuracy metrics, which penalize all map classification errors equally, to evaluate the accuracy of these maps. Here, we propose a new accuracy assessment approach that better explains the accuracy of the physical properties (i.e., surface structure, land cover, and anthropogenic heat emissions) depicted in an LCZ map, which allows for a better understanding of the accuracy at which the map portrays the local thermal environment.


Introduction
Mapping and analysis of local climate zones (LCZs), i.e., landscape units with distinct screen-height temperature regimes [1], have become a popular research topic in the fields of remote sensing and urban climatology in recent years. The LCZ classification scheme divides the landscape into 17 classes, each having different surface structure, land cover, and anthropogenic heat emission characteristics (hereafter "physical characteristics") that affect the local air temperature [2]. LCZ maps have become an important source of data for studies on urban climatology [2][3][4][5] and urban planning [6]. In a more general context, the LCZ classification scheme has also been suggested as a standardized framework for the mapping of urban form at the global scale, as it can provide information on the basic physical properties of any urban area [7].
LCZ maps are typically generated by analyzing satellite images using semi-automated image classification algorithms. The most common approach involves performing supervised classification of Landsat satellite imagery using the random forest classification algorithm [8], and this approach has been termed the "world urban database and access portals tools" (WUDAPT) protocol [7]. The WUDAPT protocol has been applied successfully in many LCZ mapping studies in different geographic regions [4][5][6][9,10]. To build on the WUDAPT approach, some studies have proposed the use of additional remote sensing datasets, e.g., nighttime optical imagery [11], nighttime thermal infrared imagery [12], synthetic aperture radar data [9], or LiDAR data [13]. Other studies have proposed incorporating ancillary GIS datasets in the image classification workflow, e.g., OpenStreetMap data [11,14], and/or the use of alternative classification algorithms (e.g., convolutional neural networks [11]).
No matter the input data or image classification method used, all LCZ maps contain some degree of classification error. As with all land use/land cover (LULC) map products, it is important to quantify these errors, e.g., in terms of producer's accuracy (errors of omission), user's accuracy (errors of commission), and overall accuracy [15]. Traditional LULC map accuracy assessment metrics are calculated by comparing the estimated (i.e., mapped) LULC type with the reference (i.e., "ground truth") LULC type at sample pixel locations, and making a binary decision as to whether each sample location is correctly or incorrectly classified [16]. This purely thematic approach, which penalizes all mapping errors equally, does not allow a user to adequately interpret the map's accuracy for modeling specific physical processes (e.g., rainfall runoff [17] or urban microclimate). In the case of LCZ mapping, some LCZ types are quite similar to one another in terms of their physical characteristics (e.g., "LCZ 2: compact mid-rise" and "LCZ 3: compact low-rise"), and a misclassification between two physically similar LCZ types will not pose as great a problem for urban climate studies as a misclassification between two physically dissimilar LCZ types (e.g., "LCZ 1: compact high-rise" and "LCZ A: dense trees"). Evidence of this can be seen in Bechtel et al. [3], where, based on an analysis of LCZ maps and land surface temperature data from 50 cities around the world, LCZ types with more similar physical characteristics were found to have more similar surface urban heat island intensities. Similar findings were also reported in individual case studies [2,10,12].
Based on the above, it is clear that LCZ map accuracy assessments should be conducted in a way that takes into account the physical characteristics of each LCZ type, e.g., by applying a greater penalty to classification errors between more dissimilar LCZ types. To our knowledge, however, only one prior study has attempted to incorporate such an approach. Bechtel et al. [18] proposed to weight the traditional LULC map error matrix using a multiplier that accounts for the similarity between LCZ types (based on the properties of openness, height, surface cover, and thermal inertia). Their idea of weighting the error matrix has merit, but their proposed methodology had two main shortcomings: (1) the determined weights were subjective and not fully transparent (e.g., no equations were provided to explain how they were derived); and (2) the weights incorrectly applied a higher penalty to the misclassification of more physically similar LCZ types. The first shortcoming limits the flexibility and transferability of their methodology, as it is not possible to develop locally-optimized weights for a specific study site (i.e., weights based on the typical physical properties of the LCZs in that specific study site) if no equations are provided. The second shortcoming results in the assignment of lower penalties to map classification errors that have larger impacts on the predicted microclimate (i.e., to misclassifications of physically dissimilar LCZ types), which is the opposite of what is logical. In the literature related to LCZ mapping, most past studies have used the traditional LULC map error matrix to evaluate LCZ mapping accuracy [4,7,12,14], while a few studies have used the method proposed by Bechtel et al. [3,18]. Thus, it is critical to develop more suitable LCZ map accuracy assessment metrics.

Proposed Approach
The aim of this study is to provide an objective and generic method for LCZ map accuracy assessment, taking into account the typical physical characteristics of each LCZ type that affect the local thermal environment. Our approach involves weighting the traditional LULC map error matrix using an "LCZ dissimilarity metric", calculated based on the degree of dissimilarity between the classification result (i.e., mapped LCZ type) and the reference data in terms of the estimated surface geometry, surface cover, and thermal inertia characteristics. The values of this LCZ dissimilarity metric can range from 0 (no penalty for misclassification) to 1 (full penalty for misclassification, which is equivalent to the value in the traditional error matrix). The weighted producer's/user's/overall accuracy measures that can be derived from our weighted error matrix indicate the level of similarity between the estimated and reference ("ground truth") landscape physical characteristics depicted in the map, which allows for a better understanding of the map's accuracy in portraying the local thermal environment.
Our proposed methodology is intended to be generic enough to apply in any study site, although site-specific information on the reference physical characteristics of each LCZ type could alternatively be used (in place of the values in Figure 1), if available.

Generating the LCZ Dissimilarity Metric
The proposed LCZ dissimilarity metric takes into account nine of the ten physical parameters presented in Stewart and Oke [1] as the basis for LCZ identification:
1. sky view factor (SV), i.e., fraction of sky hemisphere visible from ground level;
2. aspect ratio (AR);
3. mean building/tree height (H);
4. terrain roughness (TR);
5. building surface fraction (BF);
6. impervious surface fraction (IF);
7. surface admittance (SA), i.e., capacity of surface to accept or release heat;
8. surface albedo (A);
9. anthropogenic heat flux (AH).
The remaining parameter given in Stewart and Oke [1], pervious surface fraction, was not included in our calculations because it is inversely correlated with the impervious surface fraction parameter.
To calculate the LCZ dissimilarity metric, we first determined the average value of each parameter, for each LCZ type, based on the value ranges provided in Stewart and Oke [1] (see Figure 1 for all of these average values). To account for the different minimum-maximum ranges of the nine parameters, we normalized all of the derived average values to a 0-1 range using the following equation:

$$P_{norm} = \frac{P - P_{min}}{P_{max} - P_{min}} \tag{1}$$

where P is the average value of a parameter, P_max and P_min are the maximum and minimum P values of the parameter (among all LCZ types), and P_norm is the normalized P value. Table A1 shows the P_norm values calculated for each LCZ type.
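The normalization in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `min_max_normalize` and the example height values are hypothetical, and unknown parameter values are represented as `None` by assumption.

```python
def min_max_normalize(values):
    """Scale per-LCZ parameter averages to the 0-1 range (Equation 1).

    `values` holds one average per LCZ type; None marks an unknown value
    and is passed through unchanged (an assumption of this sketch).
    """
    known = [v for v in values if v is not None]
    p_min, p_max = min(known), max(known)
    if p_max == p_min:
        raise ValueError("parameter is constant across LCZ types")
    return [None if v is None else (v - p_min) / (p_max - p_min)
            for v in values]


# Hypothetical average heights (m) for four LCZ types, for illustration only.
heights = [37.5, 17.5, 6.5, 0.0]
normalized = min_max_normalize(heights)
print(normalized)
```

The same function would be applied separately to each of the nine parameters, so that every parameter contributes on a common 0-1 scale.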
As the second step, we calculated the average (absolute value) difference between the P_norm values of each pairwise combination of LCZ types (D_ij). D_ij is given by:

$$D_{ij} = \frac{|SV_i - SV_j| + |AR_i - AR_j| + |H_i - H_j| + |TR_i - TR_j| + |BF_i - BF_j| + |IF_i - IF_j| + |SA_i - SA_j| + |A_i - A_j| + |AH_i - AH_j|}{9} \tag{2}$$

where i and j indicate two LCZ types i and j; and SV, AR, H, TR, BF, IF, SA, A, and AH are the P_norm values of the parameters sky view factor, aspect ratio, mean building/tree height, terrain roughness, building surface fraction, impervious surface fraction, surface admittance, surface albedo, and anthropogenic heat flux, respectively. For "LCZ A: Dense trees" the surface admittance parameter is unknown [1], so this parameter was excluded from D_ij calculations involving "LCZ A" (and the numerator of Equation (2) was divided by 8 instead of 9). All of the resultant D_ij values are reported in Table 1, with higher values indicating a greater dissimilarity between two LCZ classes in terms of their surface structure, land cover, and anthropogenic heat flux properties. These D_ij values represent our proposed LCZ dissimilarity metric. From Table 1, it can be seen that the two most similar LCZ types are "LCZ 5: Open mid-rise" and "LCZ 6: Open low-rise" (D_ij = 0.08), while the two most dissimilar LCZ types are "LCZ 1: Compact high-rise" and "LCZ G: Water" (D_ij = 0.80).
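The pairwise calculation in Equation (2), including the handling of an unknown parameter, might be sketched as follows. The function name `dissimilarity`, the dictionary layout, and the numbers are hypothetical; only three parameters are shown for brevity, whereas the metric described above uses nine.

```python
def dissimilarity(p_i, p_j):
    """Mean absolute difference between two LCZ types' normalized parameters.

    p_i and p_j are dicts keyed by parameter abbreviation. A value of None
    marks an unknown parameter; it is skipped and the divisor shrinks
    accordingly (e.g., 8 instead of 9 when surface admittance is unknown).
    """
    shared = [k for k in p_i
              if p_i[k] is not None and p_j.get(k) is not None]
    if not shared:
        raise ValueError("no parameters in common")
    return sum(abs(p_i[k] - p_j[k]) for k in shared) / len(shared)


# Hypothetical normalized values, for illustration only.
lcz_x = {"SV": 0.2, "H": 1.0, "SA": 0.8}
lcz_y = {"SV": 0.6, "H": 0.5, "SA": None}  # SA unknown for this type

d_xy = dissimilarity(lcz_x, lcz_y)  # averages over SV and H only
print(round(d_xy, 3))
```

Evaluating this function for every pair of LCZ types would yield a symmetric matrix of the kind reported in Table 1.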

Using the LCZ Dissimilarity Metric to Weight the Traditional Error Matrix
A traditional LULC map error matrix is generated by systematically comparing the remote sensing-derived LULC map and the reference LULC dataset at sample pixel locations. The central part of the matrix is a k × k array of numbers, where k is equal to the number of LULC types in the map [15]. Rows in the error matrix represent the mapped LULC information at each sample location, columns represent the reference information, and the intersections of rows and columns summarize the number of samples mapped as a particular LULC type relative to the reference data [15]. From this error matrix, overall accuracy is calculated as the fraction of all samples that are correctly classified, while producer's accuracy is calculated for each LULC type as the number of correctly classified samples divided by the total number of reference samples belonging to that LULC type (i.e., column total), and user's accuracy is calculated for each LULC type as the number of correctly classified samples divided by the total number of samples mapped as that LULC type (i.e., row total).
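As a minimal sketch of these definitions, the following Python function computes overall, producer's, and user's accuracy from an error matrix whose rows are mapped classes and whose columns are reference classes. The function name `accuracy_measures` and the 3-class counts are hypothetical.

```python
def accuracy_measures(matrix):
    """Return (overall, producer's, user's) accuracy from a k x k matrix.

    Rows are mapped classes and columns are reference classes, so user's
    accuracy divides by row totals and producer's by column totals.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    users = [matrix[i][i] / row_totals[i] if row_totals[i] else None
             for i in range(k)]
    producers = [matrix[j][j] / col_totals[j] if col_totals[j] else None
                 for j in range(k)]
    return correct / total, producers, users


# Hypothetical sample counts for three classes, for illustration only.
counts = [[9, 1, 0],
          [3, 5, 2],
          [0, 2, 8]]
overall, producers, users = accuracy_measures(counts)
print(round(overall, 3), producers, users)
```

Classes with no mapped or no reference samples return `None` rather than raising a division error, which is a design choice of this sketch.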
A weighted error matrix, which accounts for the dissimilarity between each pair of LCZ types, can be derived by multiplying the values in the traditional error matrix by the corresponding LCZ dissimilarity metric (D_ij) values presented in Table 1. The result of this weighting is that misclassifications between two physically similar LCZ classes receive a lower penalty in the calculation of producer's accuracy/user's accuracy/overall accuracy, as these classification errors will have less impact on the thermal properties that can be estimated from the LCZ map. On the other hand, misclassifications between two dissimilar LCZ types receive a higher penalty (up to 1, i.e., complete misclassification, in the case of highly dissimilar LCZ types). For correctly classified sample pixels, no penalty is applied because no misclassification exists, and the values are equivalent to those of the traditional error matrix. This proposed approach to derive a weighted error matrix (and the corresponding weighted producer's accuracy/user's accuracy/overall accuracy values) is quite similar to the approach used to calculate "weighted accuracy" in Bechtel et al. [18], but the previous study incorrectly applied greater penalties for misclassification between physically similar LCZ types rather than physically dissimilar LCZ types.
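The weighting step can be illustrated with a short Python sketch. It assumes one plausible reading of the text: off-diagonal counts are multiplied by D_ij, diagonal counts are left unchanged, and the weighted overall accuracy is then computed from the weighted matrix in the same way as the traditional overall accuracy. The function names, counts, and D_ij values are hypothetical.

```python
def weight_error_matrix(matrix, d):
    """Multiply each off-diagonal count by its dissimilarity value D_ij.

    Diagonal cells (correct classifications) are kept as-is.
    """
    k = len(matrix)
    return [[matrix[i][j] if i == j else matrix[i][j] * d[i][j]
             for j in range(k)] for i in range(k)]


def overall_accuracy(matrix):
    """Diagonal sum divided by the grand total of the matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical counts (rows = mapped, columns = reference) and D_ij values.
counts = [[9, 1, 0],
          [3, 5, 2],
          [0, 2, 8]]
d = [[0.0, 0.1, 0.8],
     [0.1, 0.0, 0.7],
     [0.8, 0.7, 0.0]]

weighted = weight_error_matrix(counts, d)
oa = overall_accuracy(counts)
woa = overall_accuracy(weighted)
print(round(oa, 3), round(woa, 3))
```

Because every D_ij is at most 1, the weighted off-diagonal cells never exceed the original counts, so under this reading the weighted overall accuracy cannot fall below the unweighted value.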

Comparison of Our LCZ Dissimilarity Metric with a Reference and the Traditional Error Metric
Figure 2 shows several examples of how the misclassification penalties derived by our LCZ dissimilarity metric differ from those of Bechtel et al. [18] and the traditional error matrix. As can be seen in the figure, compared to the metric of Bechtel et al. [18], our approach resulted in a lower penalty being applied to the misclassification of physically similar LCZ classes, and a higher penalty being applied to the misclassification of physically dissimilar LCZ classes. Thus, our weighted error matrix should give a better indication of how accurately an LCZ map depicts the local thermal environment.
Remote Sens. 2019, 11, x FOR PEER REVIEW
Figure 2. Comparison of misclassification penalties derived using our proposed LCZ dissimilarity metric, the metric proposed by Bechtel et al. [18], and the traditional error metric (i.e., equal penalty applied to all types of misclassifications).

Demonstration of Proposed Approach Using a Synthetic Dataset
In this section, we use a synthetic dataset to demonstrate the process of generating a weighted error matrix and calculating weighted overall/producer's/user's accuracy from the weighted error matrix. We elected to use a synthetic dataset here rather than real data to illustrate the metric calculations in a simpler way, i.e., without delving into explanations of the image data, training data, classification algorithm, etc., which would be needed if a real remote sensing dataset were used.
Table 2 shows a traditional (unweighted) error matrix for this synthetic dataset. The matrix contains 12 LCZ types, with 5 LCZ types assumed to not be present (missing LCZ types are common in practice). Table 3 shows the weighted error matrix, after the values in Table 2 have been multiplied by the LCZ dissimilarity metric values in Table 1. From the traditional error matrix, the overall accuracy (OA) of the LCZ map is calculated to be 0.76, while producer's accuracy (PA) values range from 0.06-1.0 and user's accuracy (UA) values range from 0.03-1.0. For the weighted error matrix, the weighted overall accuracy (wOA) value is calculated as 0.92, while the weighted producer's accuracy (wPA) values range from 0.22-1.0 and the weighted user's accuracy (wUA) values range from 0.14-1.0. Comparing the cell values of the two error matrices, it is evident that the differences between the PA/wPA values and UA/wUA values were greater for LCZ types that had been mostly misclassified as other physically similar LCZ types (e.g., misclassifications among the urban LCZ types; LCZ 1-10). This is because the misclassification penalties (i.e., D_ij values) for these types of errors were relatively low under our proposed approach. From the comparison of the two matrices, it is also clear that the values in the weighted error matrix are always less than or equal to those of the traditional error matrix, because no pairwise combination of LCZs has a D_ij value > 1.

Potential, Limitations, and Alternative Implementations of Proposed Approach
The weighted accuracy measures calculated using our proposed approach, wOA in particular, can serve as the basis for comparing multiple classified LCZ maps of the same study site (e.g., maps produced using different input datasets, different classification algorithms, and/or different classification variables) to identify the most accurate classified map. In this way, an LCZ map that best portrays the local thermal environment of a study site can be selected, allowing for more accurate simulations of urban microclimate to be conducted using other spatially-explicit models (e.g., the Weather Research and Forecasting model [20]), and (ideally) for more climate-sensitive urban planning practices to be adopted.
One potential limitation of our approach, however, is that the values of the LCZ dissimilarity metric can only range from 0-1 (or more specifically, from 0.08-0.80 using our generic approach). The consequence is that the weighted accuracy values cannot be lower than their unweighted counterparts, because the multiplier for misclassification is always assumed to be 1 in the traditional error matrix. Thus, our error matrix weighting approach is somewhat akin to the process of merging LCZs into fewer, more general LULC categories (e.g., either "built-up" or "natural" [9,18]) for accuracy assessment, but our approach is more of a partial merging (unless D_ij = 0).
A second limitation of our approach is that some of the weighted accuracy metrics may be more difficult to interpret than the traditional metrics. wOA can be generally understood as the degree of accuracy at which the generated LCZ map estimates the physical characteristics of the landscape relevant to the local thermal environment. The wPA and wUA values, however, are more difficult to interpret and should be used with caution. Additionally, the accuracy metrics weighted by the LCZ dissimilarity metric are based on the typical physical characteristics (surface structure, land cover, and anthropogenic heat emissions) of each LCZ type, as identified in Stewart and Oke [1]. In reality, however, these physical characteristics can vary from region to region [11]. Our approach is intended to be generic enough to use anywhere, but the collection and usage of locally appropriate parameter values would be beneficial, when feasible.
Finally, we should point out that for some applications (e.g., LULC change assessment or urban planning), an LCZ map creator or map user may be concerned with both the map's thematic accuracy and the accuracy of the physical characteristics depicted in the map. Considering this, rather than simply using OA or wOA to evaluate an LCZ map's accuracy, an alternative would be to combine these two metrics, e.g., based on the mean of the two metrics [(OA + wOA)/2] or their harmonic mean (F1-measure) [(2 × OA × wOA)/(OA + wOA)] [21]. In this way, multiple classified LCZ maps could be compared to identify one that achieves the desired trade-off between thematic and physical accuracy. Indeed, it is relatively common to interpret and/or combine multiple measures of accuracy in remote sensing studies, e.g., for image classification [18,22,23], image segmentation [24][25][26], or image enhancement [27]. In the case of our synthetic dataset, the mean of the two metrics would be equal to 0.840, while the F1-measure would be equal to 0.837.
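The two combination formulas above can be written as a small Python helper. This is only a sketch: the function name `combined_accuracy` and the input values are hypothetical and are not taken from the synthetic dataset.

```python
def combined_accuracy(oa, woa):
    """Return the arithmetic mean and harmonic mean (F1-measure) of OA and wOA."""
    mean = (oa + woa) / 2
    harmonic = (2 * oa * woa) / (oa + woa)
    return mean, harmonic


# Hypothetical accuracy values, for illustration only.
mean_acc, f1_acc = combined_accuracy(0.70, 0.90)
print(round(mean_acc, 4), round(f1_acc, 4))
```

The harmonic mean is never larger than the arithmetic mean, so it penalizes maps for which thematic and physical accuracy diverge strongly.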

Conclusions
In this study, we presented a new approach for evaluating the accuracy of local climate zone (LCZ) maps. The proposed approach takes into account the dissimilarity between different LCZ types in terms of nine physical characteristics (sky view factor, aspect ratio, mean building/tree height, terrain roughness, building surface fraction, impervious surface fraction, surface admittance, surface albedo, and anthropogenic heat flux), and allows for a better understanding of the accuracy at which LCZ maps depict the local thermal environment of the landscape. We have presented a generic approach based on the typical physical parameters of each LCZ type (as identified in [1]), but it is also possible to optimize the approach for a specific location, considering the locally-appropriate physical characteristics of each LCZ type, by substituting locally-specific values for the values given in Figure 1. In contrast, the previous approach developed by Bechtel et al. does not allow for this kind of optimization, as equations for calculating the error matrix weights are not provided. Although our proposed approach has some limitations, it is more suitable for evaluating the accuracy of LCZ maps than the traditional land use/land cover accuracy assessment approach that considers only map thematic accuracy, and is also more suitable than the existing physically-based LCZ map accuracy assessment approach (i.e., that proposed by Bechtel et al.).
Finally, although in this study we focused only on LCZ map accuracy assessment, there is a need to develop other application-specific LULC map accuracy assessment approaches, as LULC maps are now being used as input data to model various physical processes. Our proposed approach could be slightly modified for use in other application domains. For example, to better evaluate the accuracy of LULC maps used as inputs for hydrological models, it would be necessary to account for the dissimilarity between LULC classes in terms of their hydrological properties (e.g., typical rainfall runoff and/or groundwater infiltration rates of different LULC types) rather than their thermal properties.

Figure 1. Graphical representation of each local climate zone (LCZ) type, and average values of their surface structure/land cover/anthropogenic heat flux parameters, as derived from the value ranges given in Stewart and Oke [1].

Table 1. D_ij value for each pairwise combination of local climate zone (LCZ) types. Higher D_ij values indicate that a higher penalty (i.e., multiplier) is applied to misclassifications between the corresponding LCZ types.

Table 2. Traditional error matrix generated using our synthetic dataset with 12 LCZ classes.

Table 3. Weighted error matrix generated using our synthetic dataset with 12 LCZ classes.