Spatial Association Rules and Thermal Environment Differentiation Evaluation of Local Climate Zone and Urban Functional Zone

Abstract: Urban heat islands (UHIs) caused by urbanization have become a major issue affecting the sustainable development of the ecological environment. The distribution of UHIs is mainly affected by the reflection and transmission of heat radiation caused by differences in urban spaces, and the anthropogenic heat emissions caused by social activities. At present, the research on the urban thermal environment involves two spatial classification systems: local climate zone (LCZ), based on urban morphology and spatial patterns; and urban functional zone (UFZ), based on socio-economic activities. It is not clear whether there are association rules between these two systems in different cities. Against this background, this study explores the association rules between the UFZ and LCZ classification systems using selected Chinese cities in different regions as typical examples. Our results confirm that there are common association rules from UFZ to LCZ, as the form of buildings is greatly influenced by the types of functional areas in urban construction. Specifically, the medical zone corresponds to the compact mid-rise zone (LCZ2); the business service area and the office area also correspond to the compact zone (LCZ1-LCZ3); and the industrial area corresponds to the compact low-rise zone (LCZ3). These functional zones have the same association rules in different cities. The cross-regional mining of the relationship between different urban functional systems will help to coordinate different planning departments and carry out the integration of multiple spatial plans. Furthermore, our comparison shows that LCZ differentiates the surface temperature better than UFZ, which makes it more suitable as a reference for research on the thermal environment and the heat island effect.


Introduction
According to the report "State of the Climate in 2021" released by the World Meteorological Organization [1], four key climate change indicators, including greenhouse gas concentration, rise in sea level, ocean heat, and ocean acidification, set new records in 2021. China's average surface temperature, coastal sea level, and other climate change indicators also broke the observation records in 2021. These factors may have harmful and long-lasting impacts on sustainable development and ecosystems. Under the trend of global climate change, urban warming has attracted more and more attention, as the high temperature will accelerate energy consumption [2], aggravate air pollution [3], and endanger human health [4]. Therefore, understanding the distribution and characteristics of the urban thermal environment is of great significance and has practical value for improving urban resilience. Urban heat islands (UHIs) are mainly caused by the combined However, according to the definition of LCZ, its division focuses on the description of surface morphology and texture, while the geographical location and human activity differences of LCZ are not fully considered. Therefore, there may be a certain temperature difference for the same LCZ type due to their various geographical locations in the city. Previous studies have shown that the average LST of a single LCZ distributed in different areas of the city can be significantly different, and the temperatures of different cities in the same LCZ may also show significant differences [28][29][30].
Based on the above research, the analysis of UHI relies on two spatial structures: the LCZ classification system, dominated by urban form, and the UFZ classification system, dominated by socio-economic activities. Many scholars have carried out thermal environment analysis based on these two systems, but the applicability of the two classification systems for UHI studies has not been systematically analyzed. It also remains unknown whether there are common association rules between these two classification systems in different cities. In this paper, the urban thermal environments of UFZ and LCZ are analyzed and evaluated, and a comparative study is carried out to identify the association rules between the two classification systems. By investigating these association rules, we aim to develop a comprehensive "code-book" with which to effectively translate inter-departmental planning schemes. This can be used as a reference for relevant thermal environment research and can assist urban planners in creating UHI mitigation policies tailored to local conditions.

Study Area
Beijing (the capital of China), Harbin, Wuhan, and Guangzhou (three provincial capitals) were selected as the study areas (Figure 1). According to China's urban scale grading standards, Beijing and Guangzhou are megacities, Wuhan is a supercity, and Harbin is a big city. These four cities cover the northern and southern regions of China (Northeast-Harbin, North-Beijing, Central-Wuhan, South-Guangzhou). In recent decades, they have all experienced large-scale and rapid urbanization. The climates and landscapes of these cities vary significantly due to natural conditions and human social factors. Therefore, these cities are broadly representative of different parts of China.

Data Sources
In this study, Landsat 8 images were used to retrieve LST. LCZ was mainly divided by using images from Google Earth, base data on the buildings, and Landsat 8 images. The UFZ data were obtained from the public data source of Tsinghua University (Table 1). They were based on Sentinel remote sensing data, OpenStreetMap data, Luojia No.1 night light data, Tencent mobile positioning, and Points of Interest (POI) in Gaode navigation data, as well as other social and transportation-related big data, to divide urban functional areas in 2018 [31]. In this study, we only focused on urban areas and used built-up area boundaries to clip the UFZ data of the selected cities (Figure 2). For the city of Beijing, the built-up area is mainly concentrated inside the Fifth Ring Road. Therefore, the Fifth Ring Road was used to clip the study area of Beijing. This study also used building base vector data as a reference for constructing LCZ. The building base vector dataset was obtained from the National Qinghai Tibet Plateau Science Data Center [32], and was extracted using high-resolution remote sensing images and deep learning semantic segmentation methods. The raster and vector data used in this paper are shown in Table 1.


LCZ and UFZ Association Rule Mining
The classification systems of UFZ and LCZ are different, but they are spatially and semantically related [33]. Mining the association rules between urban functional systems can help to coordinate multiple departments in order to complete territorial spatial planning [34]. Multi-system urban functions are interrelated: for example, shantytowns in the land use system may appear as slums in the socio-economic system and compact low-rise buildings in the LCZ system. In this research, the association rules between the established LCZ and the existing UFZ data were explored using the following steps: (1) Overlay analysis was conducted by calculating the spatial overlap of the two systems.

The urban function classification results of the different systems were spatially superimposed. As shown in Figure 4, a local climate zone (LCZ1) intersects several functional zones (UFZa and UFZb). Their spatial overlap areas can be expressed as LCZ1 ∩ UFZa and LCZ1 ∩ UFZb. The spatial correlation degree (SCD) of the functional zone UFZa to LCZ1 is the share of the area of UFZa that falls inside LCZ1: SCD(LCZ1|UFZa) = (LCZ1 ∩ UFZa)/UFZa (i.e., 100% in Figure 4). The SCD of LCZ1 to UFZa is the share of the area of LCZ1 that falls inside UFZa: SCD(UFZa|LCZ1) = (LCZ1 ∩ UFZa)/LCZ1 (i.e., 40% in Figure 4).
(2) The correlation degrees were ranked: a correlation degree of 70-100% is high; 30-70% is medium; and less than 30% is low.
(3) Based on the above two steps, we obtained the degree of correlation of each block with the other types of blocks. We retained all blocks with high correlation degrees and used the Apriori algorithm to mine the association rules of UFZ and LCZ.
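The overlay and ranking steps can be illustrated with a minimal sketch. The function names `scd` and `rank_scd` and the numeric areas are hypothetical; the areas are chosen only so that the two ratios reproduce the 100% and 40% values quoted for Figure 4. Assigning a value of exactly 70% to the "high" class and exactly 30% to the "medium" class is an assumption, since the text does not state how the boundaries are handled.

```python
def scd(overlap_area: float, zone_area: float) -> float:
    """Spatial correlation degree: share of zone_area covered by the overlap."""
    if zone_area <= 0:
        raise ValueError("zone_area must be positive")
    return overlap_area / zone_area


def rank_scd(value: float) -> str:
    """Bin an SCD value with the 30% / 70% thresholds (boundary handling assumed)."""
    if value >= 0.7:
        return "high"
    if value >= 0.3:
        return "medium"
    return "low"


# Hypothetical areas in arbitrary units, not taken from the study data.
area_lcz1 = 10.0
area_ufza = 4.0
area_overlap = 4.0  # UFZa lies entirely inside LCZ1

scd_lcz1_given_ufza = scd(area_overlap, area_ufza)  # SCD(LCZ1|UFZa)
scd_ufza_given_lcz1 = scd(area_overlap, area_lcz1)  # SCD(UFZa|LCZ1)

print(rank_scd(scd_lcz1_given_ufza), rank_scd(scd_ufza_given_lcz1))
```

In practice the overlap areas would come from a GIS intersection of the LCZ and UFZ polygon layers; the sketch only shows the ratio and the ranking applied afterwards.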
The Apriori algorithm process is divided into two steps [35]: (1) All frequent item sets in the database were retrieved through iteration to identify the item sets whose support was not lower than the threshold set by the user. (2) The association rules satisfying the minimum confidence defined by users were constructed using frequent item sets. The support of the association rules was calculated by dividing the number of item sets that contained both X and Y by the total number of item sets. The frequency at which the item sets contained both items, X and Y, together was measured. This measure was denoted as support (X => Y). The confidence level of the association rule was defined as the ratio of the number of item sets containing X and Y to the number of item sets containing X, which was recorded as confidence (X => Y). The confidence level reflects the probability of the occurrence of item Y in item sets containing X. The algorithm follows an iterative process of continuous cycling until no more frequent item sets can be found. By going through these steps, the spatial association rule mining is effectively converted into association rule mining. This approach provides a valuable framework for mining association rules in a multiple urban classification system.
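The two Apriori steps and the support and confidence measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the transactions, the label strings, and the thresholds (0.3 and 0.8) are hypothetical, and each transaction is assumed to hold the UFZ and LCZ labels of one highly correlated block. Lift is also computed, as confidence(X => Y) divided by support(Y).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Step 1: level-wise search returning {item set: support} for frequent item sets."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    current = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-sets into k-sets; prune candidates that have
        # an infrequent (k-1)-subset.
        keys = list(level)
        current = {
            a | b
            for a, b in combinations(keys, 2)
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Step 2: build rules X => Y from frequent item sets."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for combo in combinations(itemset, r):
                x = frozenset(combo)
                y = itemset - x
                confidence = sup / frequent[x]
                lift = confidence / frequent[y]
                if confidence >= min_confidence:
                    out.append((x, y, sup, confidence, lift))
    return out


# Hypothetical blocks: one UFZ label and one LCZ label per block.
transactions = [
    {"UFZ:medical", "LCZ2"},
    {"UFZ:medical", "LCZ2"},
    {"UFZ:industrial", "LCZ3"},
    {"UFZ:industrial", "LCZ8"},
    {"UFZ:business", "LCZ2"},
]

found = rules(apriori(transactions, min_support=0.3), min_confidence=0.8)
for x, y, sup, conf, lift in found:
    print(set(x), "=>", set(y), round(sup, 2), round(conf, 2), round(lift, 2))
```

With these toy blocks, only the rule from the medical label to LCZ2 passes both thresholds; the reverse rule is rejected because LCZ2 also occurs with other functional labels, which lowers its confidence.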

Land Surface Temperature Retrieval
We used Landsat 8 images to determine the surface temperature based on the radiation transfer equation method [36]. The main formulas of this method are as follows:
$$L_\lambda = \left[\varepsilon B(T_s) + (1 - \varepsilon) L_d\right] \tau + L_u$$
$$B(T_s) = \frac{L_\lambda - L_u - \tau (1 - \varepsilon) L_d}{\tau \varepsilon}$$
$$T_s = \frac{K_2}{\ln\left(K_1 / B(T_s) + 1\right)}$$
where $T_s$ is the land surface temperature; $B(T_s)$ is the radiance of a blackbody in the thermal infrared band; $L_\lambda$ is the spectral radiance received by the sensor; $L_u$, $L_d$, and $\tau$ represent the atmospheric upward radiation, atmospheric downward radiation, and atmospheric transmittance, which can be found on NASA's official website, https://atmcorr.gsfc.nasa.gov/ (accessed on 1 May 2022), according to the imaging time and the central latitude and longitude of the image; $K_1$ and $K_2$ can be obtained from the image header file; and $\varepsilon$ is the surface emissivity, which is an important parameter for temperature inversion. Due to the various types of surface coverage in the study area, we adopted the algorithm proposed by Qin et al. [37] to estimate $\varepsilon$.
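The inversion can be sketched numerically. The code below assumes the standard single-band form of the radiative transfer equation, L_sensor = [eps * B + (1 - eps) * L_down] * tau + L_up, followed by the inverse Planck relation T = K2 / ln(K1 / B + 1). The default K1 and K2 are the published Landsat 8 TIRS Band 10 constants; which thermal band was used in the study is an assumption here, and in practice the constants should be read from the image metadata. The atmospheric values and the emissivity are hypothetical.

```python
import math

# Landsat 8 TIRS Band 10 thermal constants (normally read from the metadata file).
K1 = 774.8853   # W / (m^2 sr um)
K2 = 1321.0789  # K


def blackbody_radiance(l_sensor, l_up, l_down, tau, emissivity):
    """Invert the radiative transfer equation for the blackbody radiance B(T_s)."""
    return (l_sensor - l_up - tau * (1.0 - emissivity) * l_down) / (tau * emissivity)


def surface_temperature(b, k1=K1, k2=K2):
    """Inverse Planck relation: blackbody radiance to temperature in kelvin."""
    return k2 / math.log(k1 / b + 1.0)


# Hypothetical atmosphere and surface, used for a round-trip check only.
tau, l_up, l_down, emissivity = 0.8, 1.5, 2.5, 0.97
true_t = 300.0  # K

b_true = K1 / (math.exp(K2 / true_t) - 1.0)  # forward Planck
l_sensor = (emissivity * b_true + (1.0 - emissivity) * l_down) * tau + l_up

b_est = blackbody_radiance(l_sensor, l_up, l_down, tau, emissivity)
t_est = surface_temperature(b_est)
print(round(t_est, 3))
```

The round trip simulates an at-sensor radiance from a known temperature and then recovers that temperature, which checks that the two functions are mutually consistent.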

Geographic Detector Analysis
The geographic detector is a spatial statistical method proposed by Wang et al. [38,39]. Its premise is that if the spatial stratified heterogeneity of two variables tends to be consistent, then the two variables are related. It can be used to explore the driving forces of response variables. A major advantage of the method is that it can be applied to both numerical and categorical (qualitative) data. It calculates a q value for each single factor, which indicates the extent to which that factor explains the spatial differentiation of the dependent variable. The minimum q value is 0 and the maximum is 1. The larger the q value is, the more of the spatial differentiation of the dependent variable the factor explains. Linear regression is a classic statistical method. It assumes that all variables in the regression model are independent and share similar spatial distributions. It ignores the spatial characteristics of the data and does not consider spatial stratified heterogeneity, so it may lead to biased and incomplete results. Therefore, in this study, we used the geographic detector model to compare how well LCZ and UFZ explain the thermal environment. The geographic detector model includes four modules, namely, the factor detector, interaction detector, risk detector, and ecological detector. This study mainly uses the factor detector, which measures how much of the spatial differentiation of the dependent variable is explained by a factor X. The explanatory power of each factor is measured by its q value:
$$q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2}$$
where $h = 1, \ldots, L$ indexes the sub-regions (strata) of factor X; $N$ represents the total number of spatial units in the whole study area; $N_h$ represents the number of samples in sub-region $h$; and $\sigma^2$ and $\sigma_h^2$ represent the total variance and the variance of samples in sub-region $h$, respectively.
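The factor detector can be illustrated with a short sketch that assumes the commonly cited form q = 1 - sum_h(N_h * var_h) / (N * var), with population variances. The function name `factor_detector_q` and the sample values are hypothetical; in the setting of this study, `values` would be the LST of each spatial unit and `strata` its LCZ or UFZ class.

```python
from collections import defaultdict


def factor_detector_q(values, strata):
    """q = 1 - (within-strata sum of squares) / (total sum of squares)."""
    if len(values) != len(strata):
        raise ValueError("values and strata must have the same length")
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((v - mean) ** 2 for v in values)  # equals N * sigma^2
    if total_ss == 0:
        raise ValueError("dependent variable has zero variance")

    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)

    within_ss = 0.0
    for members in groups.values():
        m = sum(members) / len(members)
        within_ss += sum((v - m) ** 2 for v in members)  # equals N_h * sigma_h^2
    return 1.0 - within_ss / total_ss


# Hypothetical LST samples (degrees Celsius) with two stratifications.
lst = [30.0, 30.0, 35.0, 35.0]
q_perfect = factor_detector_q(lst, ["A", "A", "B", "B"])  # strata separate the values
q_none = factor_detector_q(lst, ["A", "B", "A", "B"])     # strata explain nothing
print(q_perfect, q_none)
```

The two stratifications show the extremes: when each stratum is internally homogeneous the within-strata variance vanishes and q reaches 1, and when every stratum mirrors the overall distribution q falls to 0.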

LCZ Classification Results
As shown in Figure 5, the local climate zones in Guangzhou are dominated by LCZ4, with LCZ2, LCZ5, and LCZ6 each accounting for 15%; LCZ7 and LCZ8 account for 7.3% and 8.8%, respectively. In Wuhan, LCZ4 and LCZ5 show the highest proportions, accounting for 42% in total, followed by LCZ2 and LCZ8 at 10.8% and 9.5%, respectively. In Harbin, LCZ4 accounts for 32%, followed by LCZ2 (16.9%), LCZ3 (12.5%), and LCZ8 (10.5%). Within Beijing's Fifth Ring Road, compact high-rise buildings (LCZ1) show the highest proportion, accounting for 41.6% of the total area. The area inside the Second Ring Road is mainly composed of compact mid-rise buildings (LCZ2). Compact low-rise buildings and open high-rise buildings each account for 7% and are mostly scattered between the Fourth and Fifth Ring Roads. The proportions of heavy industry (LCZ10) and sparsely built (LCZ9) zones are very small.
According to the population data, the cities rank by population density in the order Beijing, Guangzhou, Wuhan, and Harbin. To accommodate more people, urban planners tend to increase the density or the height of buildings. Beijing and Guangzhou, with high population densities, are dominated by compact local climate zones: the ratio of compact to open zones is 7:3 in the Beijing study area and 6:4 in the Guangzhou study area. Wuhan has a large proportion of open local climate zones in its built-up area, but its proportion of high-rise zones is significantly higher than that of mid- and low-rise zones. In the built-up area of Harbin, open and compact zones are relatively balanced, at roughly 5:5, and the ratio of high-, mid-, and low-rise zones is 5:3:2.

Surface Temperature Distribution
According to Figures 6 and 7, the study areas share certain similarities. The areas covered by non-urban types (LCZF, LCZT, LCZW) are smaller than those covered by urban types, and their surface temperatures follow the general rule LCZF > LCZT > LCZW. This shows that in the urban environment, the heat capacity of the impervious surface is higher, and it absorbs more heat after receiving solar radiation, while vegetation and water have obvious cooling effects [37].
At equal building height, the surface urban heat island (SUHI) intensity of a compact building zone is significantly higher than that of an open building zone: the compact high-rise zone (LCZ1) > the open high-rise zone (LCZ4); the compact mid-rise zone (LCZ2) > the open mid-rise zone (LCZ5); and the compact low-rise zone (LCZ3) > the open low-rise zone (LCZ6). This is primarily because compact building zones have more hard surfaces, which increases heat absorption and limits ventilation. Open building zones, on the other hand, benefit from good air circulation and higher vegetation cover, which help to mitigate the heat island effect through transpiration and shading. Building height is also linked to the surface temperature. At equal density, mid-rise zones are generally warmer than both low-rise and high-rise zones in each study area. This may be due to the mutual shading of high-rise buildings, which results in relatively low surface temperatures. For the mid- and low-rise zones, temperature increases with building height, so mid-rise zones have higher surface temperatures than low-rise zones. The results indicate that high-rise buildings absorb less heat due to shading, while the LST in mid- and low-rise zones has relatively little correlation with building height, which is consistent with previous research [40].
Figure 8 shows that the distribution of surface temperature across functional zones was similar among the study areas. For example, the surface temperatures of commercial service areas (code 202), transportation stations (code 402), and medical areas (code 503) were relatively high, which may have been caused by more abundant artificial heat sources, a high solar radiation absorption rate of building materials, and less vegetation in these areas. The LSTs of park and green areas (code 505), sports and cultural areas (code 504), administrative areas (code 501), and education areas (code 502) were relatively low, which may be related to their relatively open layout and high coverage of green space and water [41].

Association Rule Mining Results
The association rules were mined with a support threshold of 0.1 and a lift (promotion degree) greater than 1. Rules were mined in two directions, UFZ to LCZ and LCZ to UFZ, and the results are shown in Tables 2 and 3. The rules generally differ considerably between cities, but there are also similarities. For example, in Beijing and Guangzhou, the two megacities in the study, residential areas are strongly associated with the compact mid-rise zone, which is related to the high population density of these megacities. The business service areas in all four cities form association rules with compact zones (LCZ1-LCZ3): those in Guangzhou, Wuhan, and Harbin mainly with the compact mid-rise zone (LCZ2), and those in Beijing with the compact high-rise zone (LCZ1). Industrial zones also show commonalities: in the four study areas, they mainly form association rules with the compact low-rise zone (LCZ3) and the large low-rise zone (LCZ8). In all four study areas, the medical zone forms association rules with the compact mid-rise zone.
These results show that more association rules are obtained when mining from UFZ to LCZ, which suggests that urban construction is function-oriented, with the function of an area shaping its building form. The association rules of residential areas are mainly affected by urban population density: in cities with high population densities, the rules mainly point to compact zones. Some functional areas form common rules across cities. For example, the medical areas in the four study areas correspond to compact mid-rise buildings; business service areas and office areas correspond to compact buildings; and industrial areas correspond to compact low-rise buildings. These functional areas have the same association rules in different cities, which suggests a certain consensus on the building form appropriate to the role of each functional area. In contrast, the rules mined from LCZ to UFZ differ greatly among cities, and no common rules were found. Relatively few rules were formed in this direction, which shows that the functional zones within each LCZ type are more diverse and scattered, making it difficult to find association rules that meet the support and lift requirements.

Analysis of Thermal Environment Differentiation between UFZ and LCZ
According to the factor detector, the q value of each study area was obtained (Figure 9). In each of the selected study areas, LCZ had a higher q value than UFZ, which indicates that LCZ better explains the differentiation of the thermal environment. Among the study areas, Wuhan had a higher q value, while Beijing had a lower q value, which may be related to the smaller extent selected for Beijing. The other study areas generally had larger extents, which may lead to more significant results in differentiation research. This suggests that a larger study area may improve the differentiation result.
Figure 9. Factor detector q value.

Conclusions
In this study, two classification systems commonly used in the study of the urban thermal environment, LCZ and UFZ, were mined for association rules, and the spatial differentiation of the thermal environment was evaluated based on LCZ and UFZ. The mining of association rules reveals that the selected cities share some similar associations. For instance, medical areas tend to correspond to compact mid-rise zones, while commercial service and office areas are also linked to compact zones. Similarly, industrial areas align with compact low-rise zones. Furthermore, the analysis suggests a higher degree of commonality in the UFZ-to-LCZ direction than in the LCZ-to-UFZ direction. This indicates that different cities have converged on similar building forms according to the role of each UFZ. Using the factor detector of the geographic detector model, a spatial differentiation analysis of the thermal environment was conducted for LCZ and UFZ. The factor detector results show that LCZ explained more of the LST differentiation in each study area. Additionally, there were more substantial differences in LST between the sub-regions of LCZ, suggesting that LCZ provides a more refined differentiation of the thermal environment.
In general, our study can provide references for relevant heat island research and help urban planners formulate mitigation policies according to local conditions. Nevertheless, there are still limitations, and future work may proceed from the following perspectives. First, the research period in this study was mainly summer, and the corresponding rules in other seasons have not been studied. As a next step, thermal environment differentiation in other seasons can be examined to explore whether the rules hold across seasons, which would make this study more comprehensive. Second, the association rule mining method can be further refined. For example, considering more factors (e.g., types of cities) in a geographic detector with the two classification systems may enable more direct application and transformation of the mined association rules.

Data Availability Statement:
The data that support the findings of this study are available upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.