Local Climate Zone Mapping Using Multi-Source Free Available Datasets on Google Earth Engine Platform

Abstract: As one of the most widely studied urban climate issues, the urban heat island (UHI) effect has been investigated using the local climate zone (LCZ) classification scheme in recent years. Increasing effort has been devoted to improving LCZ mapping accuracy, and exploiting multi-source images has become a prevalent trend in LCZ mapping. To this end, this paper utilized multi-source, freely available datasets, namely Sentinel-2 multispectral instrument (MSI), Sentinel-1 synthetic aperture radar (SAR), Luojia1-01 nighttime light (NTL), and Open Street Map (OSM) data, to produce a 10 m LCZ classification on the Google Earth Engine (GEE) platform. Additionally, datasets derived from the Sentinel-2 MSI data, such as spectral indexes (SI) and gray-level co-occurrence matrix (GLCM) textures, were also exploited in the LCZ classification. Different dataset combinations were designed to evaluate each dataset's contribution to LCZ classification. The results show that: (1) the synergistic use of Sentinel-2 MSI and Sentinel-1 SAR data can improve LCZ classification accuracy; (2) the multi-seasonal information of the Sentinel data also contributes positively to LCZ classification; (3) the OSM, GLCM, SI, and NTL datasets each make some positive contribution when added individually to the seasonal Sentinel-1 and Sentinel-2 datasets; and (4) combining as many datasets as possible does not necessarily improve LCZ classification accuracy. With the help of GEE, this study demonstrates the potential to generate more accurate LCZ maps on a large scale, which is significant for urban development.


Introduction
As an increasing share of the population lives in cities, the greatest public health challenges of the 21st century will come from cities, and human health is therefore likely to be especially vulnerable to the adverse effects of urban climate change. As one of the most widely studied urban climate issues, the urban heat island (UHI) is a universal phenomenon observed in cities worldwide [1][2][3]. It has contributed to urban climate problems such as heatwaves and air pollution [4][5][6][7] and has attracted attention around the world [8]. Over the past several decades, researchers have mostly measured the UHI effect by comparing the temperatures of "urban" and "rural" areas. However, this simple urban/rural division is usually based on the researchers' experience and lacks a unified standard. Stewart and Oke (2012) proposed the local climate zone (LCZ) classification scheme, which provides a research framework for UHI studies and standardizes the worldwide exchange of urban temperature observations [9]. The scheme contains 17 LCZ classes (Table 1): 10 built types (LCZ 1 to LCZ 10) and 7 land cover types (LCZ A to LCZ G). In this scheme, urban landscapes are classified into LCZ classes according to their urban structure, land cover, and construction materials [9]. Accurate LCZ classification information is therefore important for UHI research.

Table 1. Local climate zones scheme [9].

Built types (definition)
1. Compact high-rise: Dense mix of tall buildings to tens of stories. Few or no trees. Land cover mostly paved. Concrete, steel, stone, and glass construction materials.
2. Compact midrise: Dense mix of midrise buildings (3-9 stories). Few or no trees. Land cover mostly paved. Stone, brick, tile, and concrete construction materials.
3. Compact low-rise: Dense mix of low-rise buildings (1-3 stories). Few or no trees. Land cover mostly paved. Stone, brick, tile, and concrete construction materials.
4. Open high-rise: Open arrangement of tall buildings to tens of stories. Abundance of pervious land cover (low plants, scattered trees). Concrete, steel, stone, and glass construction materials.
5. Open midrise: Open arrangement of midrise buildings (3-9 stories). Abundance of pervious land cover (low plants, scattered trees). Concrete, steel, stone, and glass construction materials.
6. Open low-rise: Open arrangement of low-rise buildings (1-3 stories). Abundance of pervious land cover (low plants, scattered trees). Wood, brick, stone, tile, and concrete construction materials.
7. Lightweight low-rise: Dense mix of single-story buildings. Few or no trees. Land cover mostly hard-packed. Lightweight construction materials (e.g., wood, thatch, corrugated metal).
8. Large low-rise: Open arrangement of large low-rise buildings (1-3 stories). Few or no trees. Land cover mostly paved. Steel, concrete, metal, and stone construction materials.
9. Sparsely built: Sparse arrangement of small or medium-sized buildings in a natural setting. Abundance of pervious land cover (low plants, scattered trees).
10. Heavy industry: Low-rise and midrise industrial structures (towers, tanks, stacks). Few or no trees. Land cover mostly paved or hard-packed. Metal, steel, and concrete construction materials.

Land cover types (definition)
A. Dense trees: Heavily wooded landscape of deciduous and/or evergreen trees. Land cover mostly pervious (low plants). Zone function is natural forest, tree cultivation, or urban park.
B. Scattered trees: Lightly wooded landscape of deciduous and/or evergreen trees. Land cover mostly pervious (low plants). Zone function is natural forest, tree cultivation, or urban park.
C. Bush, scrub: Open arrangement of bushes, shrubs, and short, woody trees. Land cover mostly pervious (bare soil or sand). Zone function is natural scrubland or agriculture.
D. Low plants: Featureless landscape of grass or herbaceous plants/crops. Few or no trees. Zone function is natural grassland, agriculture, or urban park.
E. Bare rock or paved: Featureless landscape of rock or paved cover. Few or no trees or plants. Zone function is natural desert (rock) or urban transportation.
F. Bare soil or sand: Featureless landscape of soil or sand cover. Few or no trees or plants. Zone function is natural desert or agriculture.
G. Water: Large, open water bodies such as seas and lakes, or small bodies such as rivers, reservoirs, and lagoons.

The World Urban Database and Access Portal Tools (WUDAPT) project was launched in 2015 as a community-based initiative to generate a global database of LCZ information. WUDAPT generates LCZ maps of cities from freely available Landsat data [10,11]. The WUDAPT method resamples the Landsat image of each city to 100 m resolution to capture the spectral information of local-scale urban structures [12]. Local experts with deep knowledge of individual cities digitize LCZ reference polygons on high-resolution Google Earth images. These polygons are converted into 100 m pixels and used to train and test LCZ classification models on the Landsat images. The random forest (RF) classifier and a rule-based machine learning approach are used for LCZ classification. LCZ maps of Europe and the continental United States have been shared through the WUDAPT portal (http://www.wudapt.org accessed on 18 April 2021).
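The resampling step described above can be illustrated with a minimal Python sketch. It assumes a single image band held as a nested list and aggregates non-overlapping factor-by-factor blocks of pixels by their mean. The function name `block_mean_resample` is hypothetical, and because Landsat's 30 m pixels do not divide evenly into 100 m cells, this integer-factor averaging only illustrates the idea of aggregating fine pixels to a local-scale grid; it does not reproduce WUDAPT's own resampling procedure.

```python
def block_mean_resample(band, factor):
    """Aggregate a fine-resolution band to a coarser grid by averaging
    non-overlapping factor x factor blocks of pixels.

    Hypothetical helper: assumes the band's height and width are exact
    multiples of `factor` (e.g. factor=10 for 10 m -> 100 m)."""
    rows, cols = len(band), len(band[0])
    if rows % factor or cols % factor:
        raise ValueError("band dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            # Collect every fine pixel that falls inside this coarse cell.
            block = [band[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


fine = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(block_mean_resample(fine, 2))  # [[3.5, 5.5], [11.5, 13.5]]
```

Mean aggregation is used here because it preserves the average spectral response of each coarse cell; other aggregations (e.g., median) would be equally plausible choices.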

With the rapid development of remote sensors mounted on airborne and spaceborne platforms, remote sensing data have also been used for LCZ mapping. Following the standard WUDAPT workflow, Landsat data have become the most popular data source for LCZ mapping [13][14][15][16][17][18][19]. However, the LCZ scheme includes many classes, and a single data source has limited ability to distinguish all of them. It has therefore become a prevalent trend in LCZ classification to exploit multiple datasets to produce more accurate results. These datasets are complementary and are combined for LCZ mapping.
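Because complementary datasets can be stacked in many ways, the number of candidate combinations grows quickly. The short Python sketch below simply enumerates every non-empty subset of a list of dataset labels. The labels are illustrative shorthand for the datasets discussed in this paper, and exhaustive enumeration is an assumption made for illustration, not the experimental design used here. With n datasets there are 2^n - 1 non-empty combinations, each of which would require its own classifier training and accuracy assessment.

```python
from itertools import combinations

# Illustrative dataset labels (shorthand only, not identifiers from any library).
DATASETS = ["S2", "S1", "SI", "GLCM", "NTL", "OSM"]


def all_combinations(datasets):
    """Return every non-empty subset of `datasets`, ordered by subset size."""
    return [combo
            for size in range(1, len(datasets) + 1)
            for combo in combinations(datasets, size)]


combos = all_combinations(DATASETS)
print(len(combos))  # 63, i.e. 2**6 - 1
print(combos[0], combos[-1])
```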
Dataset combination methods can be grouped into two types. (1) Single-source combination, which combines one dataset with datasets derived from it. The most commonly used derived datasets are contextual information and spectral indexes. Some researchers have generated contextual features to further improve LCZ classification, such as the gray-level co-occurrence matrix (GLCM) and the mean, minimum, maximum, median, and 25th and 75th percentile values of the spectral information in neighboring pixels [13][20][21][22]. Spectral indexes (SI), such as the normalized difference vegetation index (NDVI) and the modified normalized difference water index (MNDWI), have also been used to improve LCZ classification accuracy [23]. (2) Multi-source combination, which combines different datasets, such as Landsat, Sentinel-1, Sentinel-2, ASTER, National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite nighttime light, Global Urban Footprint, and Open Street Map (OSM) data [22][23][24][25][26][27].
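The two kinds of derived features mentioned above can be sketched in plain Python. The block below computes NDVI and MNDWI as normalized differences and builds a normalized GLCM for a single pixel offset, from which two common texture statistics (contrast and homogeneity) are derived. All function names are hypothetical. The band pairing is an assumption (for Sentinel-2, typically B8/B4 for NDVI and B3/B11 for MNDWI), and texture definitions vary between tools: some use a symmetric GLCM, and some define homogeneity with |i - j| instead of (i - j) squared.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when a + b == 0."""
    total = a + b
    return (a - b) / total if total else 0.0


def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red)
    return normalized_difference(nir, red)


def mndwi(green, swir):
    # MNDWI = (Green - SWIR) / (Green + SWIR)
    return normalized_difference(green, swir)


def glcm(image, levels, dx=1, dy=0):
    """Normalized (non-symmetric) gray-level co-occurrence matrix.

    image: 2-D list of integer gray levels in range(levels).
    (dx, dy): offset from each reference pixel to its neighbor."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[n / total for n in row] for row in counts]


def glcm_contrast(p):
    # Weights each co-occurrence probability by the squared gray-level difference.
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def glcm_homogeneity(p):
    # Inverse difference moment: large when co-occurrences lie near the diagonal.
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


window = [[0, 0, 1],
          [0, 1, 1],
          [2, 2, 2]]
p = glcm(window, levels=3)  # neighbor is the pixel immediately to the right
print(round(glcm_contrast(p), 4), round(glcm_homogeneity(p), 4))  # 0.3333 0.8333
print(round(ndvi(0.4, 0.1), 4))  # 0.6
```

In a full workflow these statistics would be computed in a moving window around every pixel; on GEE, indices and textures of this kind are usually produced with built-in image methods (such as `normalizedDifference` and `glcmTexture`) rather than by hand.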
Among all open-access remote sensing datasets, Sentinel-2 multispectral instrument (MSI) and Sentinel-1 synthetic aperture radar (SAR) data stand out because of their relatively high spatial resolution (10 m). Because SAR and multispectral sensors rely on different imaging mechanisms, their data can complement each other, and researchers have demonstrated the benefits of the synergistic use of Sentinel-2 MSI and Sentinel-1 SAR data in remote sensing applications such as soil moisture mapping [28], crop classification [29], glacial lake mapping [30], and wetland mapping [31]. Several studies have also combined Sentinel-2 or Sentinel-1 data with other datasets to overcome the limitations of a single dataset in LCZ classification. However, few studies to date have examined the contribution of the synergistic use of Sentinel-2 MSI and Sentinel-1 SAR data to LCZ classification.
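Feature-level fusion of the two sensors can be sketched as follows: for each pixel, per-date observations are reduced to a seasonal median, and the Sentinel-2 reflectances and Sentinel-1 backscatter values are concatenated into one feature vector that a classifier such as RF could consume. This is a minimal illustration under stated assumptions: the dictionary layout, season names, band names, and helper functions are hypothetical, and the Sentinel-1 values are assumed to be in linear power units and converted to decibels with 10 log10(sigma0). If the backscatter is already supplied in dB, the conversion should be skipped.

```python
import math
from statistics import median


def to_db(sigma0_linear):
    """Convert linear SAR backscatter (sigma0, power units) to decibels."""
    return 10.0 * math.log10(sigma0_linear)


def seasonal_median(observations):
    """Per-band median of one pixel over all acquisitions in a season.

    observations: list of {band_name: value} dicts, one per acquisition date."""
    bands = observations[0].keys()
    return {band: median(obs[band] for obs in observations) for band in bands}


def stack_features(s2_by_season, s1_by_season):
    """Concatenate seasonal Sentinel-2 and Sentinel-1 features for one pixel.

    Both arguments map a season name to that season's observation list.
    Sentinel-1 medians are taken in linear units, then converted to dB."""
    features = {}
    for season, obs in s2_by_season.items():
        for band, value in seasonal_median(obs).items():
            features[f"S2_{season}_{band}"] = value
    for season, obs in s1_by_season.items():
        for band, value in seasonal_median(obs).items():
            features[f"S1_{season}_{band}"] = to_db(value)
    return features


s2 = {"summer": [{"B4": 0.10, "B8": 0.30},
                 {"B4": 0.20, "B8": 0.50},
                 {"B4": 0.30, "B8": 0.40}]}
s1 = {"summer": [{"VV": 0.10, "VH": 0.02},
                 {"VV": 0.01, "VH": 0.01},
                 {"VV": 1.00, "VH": 0.03}]}
print(stack_features(s2, s1))
```

Adding further seasons only adds keys to the two input dictionaries, which is one simple way to build a multi-seasonal feature vector.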
Evaluating many dataset combinations requires a large amount of computation, which places increasing demands on computer performance. Google Earth Engine (GEE) meets this demand by making it possible to process huge quantities of remote sensing data on any computer that can open the website https://code.earthengine.google.com/ (accessed on 31 March 2021). GEE is a cloud-based platform for planetary-scale geospatial analysis [32]. It contains a multi-petabyte catalog
Among all open-access remote sensing datasets, Sentinel-2 Multispectral instrument (MSI) and Sentinel-1 synthetic aperture radar (SAR) data are outstanding because of their relatively high spatial resolution (i.e., 10 m). Given that the difference in image mechanism between SAR and multispectral data, researchers have demonstrated that the synergetic use of Sentinel-2 MSI and Sentinel-1 SAR data can complement each other in the remote sensing image classification, such as soil moisture mapping [28], crop classification [29], glacial lake mapping [30] and wetlands mapping [31]. Several studies have demonstrated the potential to combine Sentinel-2 or Sentinel-1 data with other datasets on local climate zone classification to improve the limitation of a single dataset. However, few studies exist to date that the synergetic use of Sentinel-2 MSI and Sentinel-1 SAR data's contribution on LCZ classification.
A variety of dataset combinations will lead to a large number of calculations for computers. Computer performance has been given more and more advanced requirements. Accordingly, Google Earth Engine (GEE) came into being and made it possible to process the huge quantity of remote sensing data in a computer that can open the website https://code.earthengine.google.com/ accessed on 31 March 2021. GEE is a cloud-based The World Urban Database and Access Portal Tools (WUDAPT) were developed in 2015 as a community-based concept to generate a global database of LCZ information. WUDAPT generates LCZ maps of cities with the use of freely available Landsat data [10,11]. The WUDAPT method resamples the Landsat image of each city into a 100 m resolution to get the spectral information of local-scale urban structures [12]. Local experts with deep knowledge of individual cities build LCZ reference polygons using high-resolution google earth images. These polygons are converted into 100 m resolution pixels and used for training and testing LCZ classification models with Landsat images. The random forest (RF) and a rule-based machine learning approach are used for LCZ classification. The LCZ maps of Europe, Continental United States have been shared through the WUDAPT portal (http://www.wudapt.org accessed on 18 April 2021).
With the rapid development of sensors mounted on airborne and spaceborne platforms, remote sensing data have also been used for LCZ mapping. Following the standard WUDAPT workflow, Landsat data have become the most popular data source for LCZ mapping [13][14][15][16][17][18][19]. However, the LCZ classification scheme contains many classes, and a single data source is limited in its ability to distinguish all of them. It has therefore become a prevalent trend in LCZ classification to combine several complementary datasets to produce more accurate results. The dataset combination methods fall into two types: (1) Single-source combination. This method combines one dataset with its derived datasets, most commonly contextual information and spectral indexes. Some researchers have generated contextual information to further improve LCZ classification, such as gray-level co-occurrence matrix (GLCM) textures and the mean, minimum, maximum, median, and 25th and 75th percentile values derived from the spectral information of neighboring pixels [13][20][21][22]. Spectral indexes (SI), such as the normalized difference vegetation index (NDVI) and the modified normalized difference water index (MNDWI), have also been used to improve LCZ classification accuracy [23]. (2) Multi-source combination. This method combines different datasets, such as Landsat, Sentinel-1, Sentinel-2, ASTER, Suomi National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite (VIIRS) nighttime light, Global Urban Footprint, and Open Street Map (OSM) datasets [22][23][24][25][26][27]. Among all open-access remote sensing datasets, Sentinel-2 Multispectral Instrument (MSI) and Sentinel-1 synthetic aperture radar (SAR) data stand out because of their relatively high spatial resolution (i.e., 10 m).
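The neighborhood statistics listed above can be sketched as follows. This is a minimal, dependency-free Python illustration and not the procedure of the cited studies. It assumes:

- one band is stored as a list of lists;
- the moving window is square, with a hypothetical `radius` parameter;
- windows are simply truncated at the image border.

On GEE a comparable result would usually be obtained with a neighborhood reducer such as `ee.Image.reduceNeighborhood`.

```python
from statistics import mean, median, quantiles


def window_values(band, row, col, radius=1):
    """Pixel values in a (2*radius+1)^2 window, truncated at the image edges."""
    rows, cols = len(band), len(band[0])
    return [band[r][c]
            for r in range(max(0, row - radius), min(rows, row + radius + 1))
            for c in range(max(0, col - radius), min(cols, col + radius + 1))]


def contextual_features(band, row, col, radius=1):
    """Neighborhood statistics used as contextual features for one pixel."""
    values = window_values(band, row, col, radius)
    # Quartile cut points; "inclusive" treats the window as the full population.
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "p25": q25,
        "p75": q75,
    }


def contextual_layers(band, radius=1):
    """Apply contextual_features to every pixel: one output layer per statistic."""
    feats = [[contextual_features(band, r, c, radius) for c in range(len(band[0]))]
             for r in range(len(band))]
    return {name: [[f[name] for f in row] for row in feats]
            for name in ("mean", "min", "max", "median", "p25", "p75")}


# Hypothetical 3 x 3 band.
band = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(contextual_features(band, 1, 1))
```

Each statistic becomes an extra layer that can be stacked with the spectral bands before classification.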
Because SAR and multispectral sensors rely on different imaging mechanisms, researchers have shown that Sentinel-2 MSI and Sentinel-1 SAR data complement each other in remote sensing image classification tasks such as soil moisture mapping [28], crop classification [29], glacial lake mapping [30], and wetland mapping [31]. Several studies have also demonstrated the potential of combining Sentinel-2 or Sentinel-1 data with other datasets to overcome the limitations of a single dataset in local climate zone classification. However, few studies to date have examined the contribution of the synergistic use of Sentinel-2 MSI and Sentinel-1 SAR data to LCZ classification.
Evaluating many dataset combinations is computationally demanding and places high requirements on computer performance. Google Earth Engine (GEE) makes it possible to process huge quantities of remote sensing data on any computer that can open the website https://code.earthengine.google.com/ accessed on 31 March 2021. GEE is a cloud-based platform for planetary-scale geospatial analysis [32] and hosts a multi-petabyte catalog of satellite imagery and geospatial datasets. The GEE platform is therefore well suited to handling the many dataset combinations involved in LCZ mapping.
This paper aims to generate an LCZ classification of Wuhan from multi-source freely available datasets on the GEE platform. In particular, it analyzes the contribution of different datasets to LCZ classification and identifies the best dataset combination for LCZ mapping.

Study Area and Datasets
Wuhan, the capital of Hubei Province, is located in central China. It is known as one of China's "furnace cities", where summertime temperatures can soar to 40 °C. Over the past several decades, Wuhan has experienced rapid urbanization, which is likely to intensify the UHI effect. Moreover, the permanent resident population of Wuhan has reached 11.21 million, and the increasingly serious UHI effect threatens the health of its citizens. Wuhan was therefore selected as the study area (Figure 1).

Datasets Used for LCZ Mapping
Multi-source freely available datasets were used in this study: Sentinel-2 MSI, Sentinel-1 SAR, Luojia1-01 nighttime light (NTL), and Open Street Map (OSM) data. In addition, datasets derived from the Sentinel-2 MSI data, namely spectral indexes and GLCM textures, were exploited in the LCZ classification. All data processing was carried out on the GEE platform. Of these datasets, only Sentinel-2 MSI and Sentinel-1 SAR were available in the GEE catalog, so the Luojia1-01 NTL and OSM datasets were uploaded to the platform.
Attention focused on datasets that might be complementary and so, when combined, provide an enhanced capacity to map the LCZ classes accurately. The combination of optical and radar remote sensing imagery provides complementary information that can improve land cover classification accuracy [33,34]. Therefore, this study employed Sentinel-2 MSI and Sentinel-1 SAR data to improve the accuracy of LCZ mapping. Seasonal information can also increase the accuracy of LCZ classification [35], so seasonal Sentinel-2 MSI and Sentinel-1 SAR composites were generated as well.

Sentinel-2 Data
Sentinel-2 MSI data are provided by the European Space Agency. The MSI sensors provide multispectral imagery of relatively fine spatial resolution globally with a short revisit time (5 days at the Equator with two satellites in orbit). Data are acquired in 13 spectral bands spanning the visible, near-infrared (NIR), and shortwave infrared (SWIR) parts of the electromagnetic spectrum. The bands come at a range of spatial resolutions: four bands at 10 m, six bands at 20 m, and three bands at 60 m. Only the four 10 m bands (i.e., B2, B3, B4, and B8) were used. On the GEE platform, the dataset "Sentinel-2 MSI: MultiSpectral Instrument, Level-1C" was selected. Only images with less than 10% cloud coverage were retained, and 136 images met this condition. The remaining clouds were masked using the Fmask method [36]. Median composites can yield accuracy as high as that of full time series [37], so annual and seasonal median images were calculated. Figure 2 shows the seasonal Sentinel-2 median images. Vegetation distribution differs between seasons, and the winter median image has the poorest quality: it is covered with thin clouds because the winter weather in Wuhan is often hazy.
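The compositing step can be illustrated with a small Python sketch of per-pixel median compositing. It makes two assumptions:

- cloud masks have already been applied, so masked pixels are `None`;
- seasons are the meteorological ones (December to February is winter, and so on).

The paper does not state its season boundaries, so `SEASON_OF_MONTH` and the helper names are hypothetical. On GEE the equivalent per-pixel operation is usually `ee.ImageCollection.median()` applied to a date-filtered collection. The annual composite is the same reduction over all scenes.

```python
from statistics import median

# Assumption: meteorological seasons (DJF, MAM, JJA, SON); the paper does not
# state which months were assigned to each season.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def median_composite(images):
    """Per-pixel median of equally sized 2-D images of one band.

    Cloud-masked pixels are None (the mask itself, e.g. from Fmask, is assumed
    to be computed beforehand); a pixel masked in every image stays None."""
    rows, cols = len(images[0]), len(images[0][0])
    composite = []
    for r in range(rows):
        row = []
        for c in range(cols):
            clear = [img[r][c] for img in images if img[r][c] is not None]
            row.append(median(clear) if clear else None)
        composite.append(row)
    return composite


def seasonal_composites(scenes):
    """scenes: iterable of (month, image) pairs for one band -> {season: median image}."""
    by_season = {}
    for month, image in scenes:
        by_season.setdefault(SEASON_OF_MONTH[month], []).append(image)
    return {season: median_composite(imgs) for season, imgs in by_season.items()}


def seasonal_band_names(bands=("B2", "B3", "B4", "B8"),
                        seasons=("spring", "summer", "autumn", "winter")):
    """Layer names of the stacked seasonal dataset: 4 bands x 4 seasons = 16."""
    return [f"{band}_{season}" for season in seasons for band in bands]


# Hypothetical 1 x 2 pixel B4 values (Level-1C digital numbers) from four scenes.
scenes = [
    (1, [[1000, None]]),   # second pixel cloud-masked
    (2, [[1200, 3000]]),
    (12, [[5000, 3400]]),
    (7, [[2000, 2200]]),
]
composites = seasonal_composites(scenes)
print(composites["winter"])        # [[1200, 3200.0]]
print(len(seasonal_band_names()))  # 16
```

The median is robust to residual outliers such as thin haze in a single scene. This is one reason median composites are a common alternative to full time series.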

Sentinel-1 Data
Sentinel-1 SAR data acquired in the Interferometric Wide swath mode were used. These data record the dual-polarized (VV and VH) C-band response. Specifically, the GEE dataset "Sentinel-1 SAR GRD: C-band synthetic aperture radar ground range detected, log scaling" was selected, and the Level-1 GRD products with a 10 m spatial resolution were used. In total, 118 Sentinel-1 images were used to calculate the annual median. As with the Sentinel-2 data, both annual and seasonal median images were calculated. Figure 3 shows that the seasonal median images are all of good quality and differ only slightly from each other. This indicates that, unlike the multispectral images (Figure 2), the SAR images are largely unaffected by weather.
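Because the selected collection is log scaled (backscatter in decibels), the median composites are computed on dB values. The short sketch below, using hypothetical backscatter samples, shows why this is largely harmless:

- 10·log10 is monotonic, so for an odd number of scenes the median in dB equals the dB value of the linear median;
- for an even number of scenes the two middle values are averaged, so the two domains differ slightly.

```python
import math
from statistics import median


def to_db(linear):
    """Convert linear backscatter (sigma0) to decibels."""
    return 10.0 * math.log10(linear)


def to_linear(db):
    """Convert decibels back to linear backscatter."""
    return 10.0 ** (db / 10.0)


def median_db(values_db):
    """Per-pixel median composite computed directly on dB values."""
    return median(values_db)


# Hypothetical VV backscatter samples for one pixel, in linear units.
odd_linear = [0.01, 0.1, 0.05]
even_linear = [0.01, 0.1]

odd_db = [to_db(v) for v in odd_linear]
even_db = [to_db(v) for v in even_linear]

# Odd count: the median is an actual sample, so both domains agree.
print(math.isclose(median_db(odd_db), to_db(median(odd_linear))))  # True
# Even count: the two middle values are averaged, so the domains differ.
print(round(median_db(even_db), 2), round(to_db(median(even_linear)), 2))
```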

Spectral Indexes Data
A set of spectral indexes was derived from the Sentinel-2 data: the NDVI [38] and the NDWI [39]. As with the Sentinel-2 data, the median value for each pixel was used to form yearly and seasonal variables for the analyses. The NDVI and NDWI are defined as follows:
$$\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{Red}}{\rho_{NIR} + \rho_{Red}}$$
$$\mathrm{NDWI} = \frac{\rho_{Green} - \rho_{NIR}}{\rho_{Green} + \rho_{NIR}}$$
where $\rho_{Green}$, $\rho_{Red}$, and $\rho_{NIR}$ represent the B3, B4, and B8 bands of the Sentinel-2 data, respectively.
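The two indexes can be computed pixel by pixel as in the following sketch. The band values are hypothetical Level-1C digital numbers. The paper does not say whether the indexes were computed per scene before median compositing or from the median composites, so the code only shows the index arithmetic. On GEE the same arithmetic is commonly written with `ee.Image.normalizedDifference`.

```python
def normalized_difference(a, b):
    """Generic (a - b) / (a + b); returns None where a + b == 0."""
    total = a + b
    return None if total == 0 else (a - b) / total


def ndvi(b8_nir, b4_red):
    """NDVI = (NIR - Red) / (NIR + Red), from Sentinel-2 B8 and B4."""
    return normalized_difference(b8_nir, b4_red)


def ndwi(b3_green, b8_nir):
    """NDWI = (Green - NIR) / (Green + NIR), from Sentinel-2 B3 and B8."""
    return normalized_difference(b3_green, b8_nir)


def index_layer(func, band_a, band_b):
    """Apply an index pixel by pixel to two equally sized bands (lists of lists)."""
    return [[func(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(band_a, band_b)]


# Hypothetical 1 x 2 patch: a vegetated pixel and a water pixel.
B3 = [[800, 1000]]
B4 = [[600, 700]]
B8 = [[3000, 300]]
print(index_layer(ndvi, B8, B4))  # vegetation positive, water negative
print(index_layer(ndwi, B3, B8))  # vegetation negative, water positive
```

The scaling factor of the digital numbers cancels in the ratio, so the indexes can be computed on either reflectance or scaled values.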

Texture Data
Image textural information can be used to enhance image classification accuracy. Here, texture was quantified using the GLCM approach [40,41], which can estimate a range of textural descriptors. Eight of them were used: contrast (Con), correlation (Corr), entropy (Ent), angular second moment (Asm), dissimilarity (Diss), inverse difference moment (Idm), sum average (Savg), and variance (Var). The seasonal Sentinel-2 dataset has 16 bands (4 bands × 4 seasons), and each band yields all 8 GLCM variables, so 128 (16 × 8) GLCM variables were generated. To retain the maximum amount of information with minimal redundancy, a principal component analysis was applied separately to each kind of GLCM variable. The first principal component explains the largest proportion of the variance of that variable, and it was used as a single textural layer. The final textural dataset therefore comprised 8 layers, namely the first principal components of the 8 GLCM variables.
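To make the eight descriptors concrete, the sketch below builds a symmetric, normalized GLCM for one window and one pixel offset. It then computes the eight descriptors using common Haralick-style formulas. It is an independent illustration, not a reimplementation of GEE's `ee.Image.glcmTexture`, whose window size, directions, and index conventions may differ. The subsequent principal component step is not shown. Reporting a correlation of 1 for a constant window is an assumed convention.

```python
import math


def glcm(img, levels, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset.

    img holds integer gray levels in the range [0, levels)."""
    dr, dc = offset
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                counts[i][j] += 1  # the pair (i, j)
                counts[j][i] += 1  # the reverse pair, making the matrix symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def haralick_features(p):
    """Eight GLCM descriptors from a normalized GLCM p.

    Gray levels are indexed from 0 and entropy uses the natural logarithm;
    other tools may use 1-based indices or log base 2."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n) if p[i][j] > 0]
    mu = sum(i * v for i, _, v in cells)  # row mean; equals the column mean since p is symmetric
    var = sum((i - mu) ** 2 * v for i, _, v in cells)
    return {
        "asm": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "corr": (sum((i - mu) * (j - mu) * v for i, j, v in cells) / var
                 if var > 0 else 1.0),  # constant window: assumed convention
        "diss": sum(abs(i - j) * v for i, j, v in cells),
        "ent": -sum(v * math.log(v) for _, _, v in cells),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "savg": sum((i + j) * v for i, j, v in cells),
        "var": var,
    }


# Hypothetical 3 x 3 window already quantized to 3 gray levels.
patch = [[0, 0, 1],
         [0, 1, 1],
         [2, 2, 2]]
p = glcm(patch, levels=3)  # horizontal neighbour, offset (0, 1)
features = haralick_features(p)
print(round(features["contrast"], 4), round(features["asm"], 4))  # 0.3333 0.2222
```

Applied to every band of the 16-band seasonal stack, this yields the 128 layers mentioned above. A separate PCA per descriptor would then reduce them to eight.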

OSM Data
OSM is a free and editable map of the whole world [42] and can be downloaded from https://download.geofabrik.de/ accessed on 1 October 2019. Three OSM layers were used here: building, water, and road (Figure 4). The OSM data were uploaded to the GEE platform and rasterized to a 10 m spatial resolution using bilinear interpolation.
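The text does not describe how the OSM polygons were turned into cells beyond the resampling method. As an illustration only, the sketch below burns a 10 m cell when its centre falls inside a polygon, using an even-odd ray-casting test. This cell-centre rule is a swapped-in technique rather than the authors' procedure, and the footprint coordinates are hypothetical projected metres.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygon(polygon, x_min, y_max, cols, rows, cell=10.0):
    """Binary raster (1 = feature) on a grid whose top-left corner is
    (x_min, y_max); a cell is burned when its centre lies inside the polygon."""
    mask = []
    for r in range(rows):
        y_centre = y_max - (r + 0.5) * cell
        mask.append([1 if point_in_polygon(x_min + (c + 0.5) * cell, y_centre, polygon) else 0
                     for c in range(cols)])
    return mask


# Hypothetical 20 m x 20 m building footprint in projected metres.
building = [(10, 0), (30, 0), (30, 20), (10, 20)]
for row in rasterize_polygon(building, x_min=0, y_max=30, cols=4, rows=3):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 0]
# [0, 1, 1, 0]
```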

NTL Data
The NTL dataset represents the visible light emitted from the Earth's surface at night, which is strongly related to variables such as urban density. Here, NTL data acquired by the Luojia1-01 satellite were used (Figure 5). This satellite was launched in June 2018 and has a spatial resolution of 130 m [43]. Only one image was used: it was acquired on 13 June 2018 and selected after a visual check of the images at http://59.175.109.173:8888/index.html accessed on 1 October 2019. The Luojia1-01 data were then uploaded to the GEE platform. To aid integration with the Sentinel datasets, they were resampled to 10 m using bilinear interpolation.
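The 130 m to 10 m resampling can be sketched with plain bilinear interpolation between coarse pixel centres. This is a minimal illustration with a hypothetical 2 × 2 patch of NTL values. Clamping at the outer edge is an assumption, and GEE's own implementation (requested with `image.resample('bilinear')`) may treat edges and projections differently.

```python
import math


def bilinear_resample(coarse, coarse_res, fine_res):
    """Resample a coarse grid (list of lists) to a finer cell size by bilinear
    interpolation between coarse pixel centres. Positions beyond the outermost
    centres are clamped to the edge (an assumption about edge handling)."""
    rows, cols = len(coarse), len(coarse[0])
    out_rows = rows * coarse_res // fine_res
    out_cols = cols * coarse_res // fine_res

    def sample(y, x):
        # Continuous index in the coarse grid, measured from pixel centres.
        u = min(max(x / coarse_res - 0.5, 0.0), cols - 1)
        v = min(max(y / coarse_res - 0.5, 0.0), rows - 1)
        c0, r0 = int(math.floor(u)), int(math.floor(v))
        c1, r1 = min(c0 + 1, cols - 1), min(r0 + 1, rows - 1)
        fu, fv = u - c0, v - r0
        top = coarse[r0][c0] * (1 - fu) + coarse[r0][c1] * fu
        bottom = coarse[r1][c0] * (1 - fu) + coarse[r1][c1] * fu
        return top * (1 - fv) + bottom * fv

    return [[sample((r + 0.5) * fine_res, (c + 0.5) * fine_res)
             for c in range(out_cols)]
            for r in range(out_rows)]


# Hypothetical 2 x 2 patch of 130 m NTL values resampled to 10 m.
ntl = [[0, 10],
       [20, 30]]
fine = bilinear_resample(ntl, coarse_res=130, fine_res=10)
print(len(fine), len(fine[0]))   # 26 26
print(fine[6][6], fine[19][19])  # 0.0 30.0
```

Bilinear interpolation only smooths the 130 m signal; it does not add 10 m detail. This is consistent with the later observation that the NTL data contribute little.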
Land 2021, 10, x FOR PEER REVIEW 8
Figure 5. A sample of the Luojia1-01 data.

Datasets Combinations
A core thrust of the research was to explore the value of combining different datasets for LCZ mapping. Two broad categories of dataset were available (Table 2). First, there are the original Sentinel-based datasets (Sections 2.1.1 and 2.1.2), which are available on an annual and a seasonal basis. This gives 4 Sentinel datasets in total (annual and seasonal data from Sentinel-1 and Sentinel-2). Second, there are four secondary datasets: OSM, NTL, SI, and GLCM. These datasets may be combined to enhance LCZ class separability and hence the accuracy with which LCZ classes may be mapped.
The datasets formed (Table 2) could be combined in various ways. Attention focused on a range of incremental combinations (Table 3), beginning with the Sentinel-2 data, because optical imagery is the most widely used data source. Initial work focused on combining the Sentinel-2 and Sentinel-1 datasets, and further analyses added one or more of the secondary datasets. In total, 19 dataset combinations were generated using the layer stacking method.
Table 3. The datasets in this study.

+NTL
Adding any one secondary dataset to the basic combination
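The total of 19 combinations comes from four Sentinel-only stacks plus every non-empty subset of the four secondary datasets (4 + 6 + 4 + 1 = 15), each added to S2_S1_season. The sketch below enumerates them. The label order inside a combination (e.g. "+SI+OSM+GLCM(PC1)" rather than the paper's "+OSM+GLCM(PC1)+SI") is an artifact of the code, not of the study.

```python
from itertools import combinations

# Labels follow the paper; the "+..." entries are read as additions to the
# seasonal Sentinel-2 + Sentinel-1 stack (S2_S1_season).
SENTINEL_COMBINATIONS = ["S2_year", "S2_season", "S2_S1_year", "S2_S1_season"]
SECONDARY = ["NTL", "SI", "OSM", "GLCM(PC1)"]


def dataset_combinations():
    """Enumerate the incremental combinations: 4 Sentinel-only stacks, then
    every non-empty subset of the four secondary datasets."""
    combos = list(SENTINEL_COMBINATIONS)
    for size in range(1, len(SECONDARY) + 1):
        for subset in combinations(SECONDARY, size):
            combos.append("+" + "+".join(subset))
    return combos


combos = dataset_combinations()
print(len(combos))   # 19
print(combos[4:8])   # ['+NTL', '+SI', '+OSM', '+GLCM(PC1)']
```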

LCZ Classification System for Wuhan City and Training Polygons Selection
As shown in Table 4, 13 LCZ classes occur in Wuhan. There are seven built types (LCZ 1, LCZ 2, LCZ 3, LCZ 4, LCZ 5, LCZ 6, and LCZ 8) and six land cover types (LCZ A, LCZ B, LCZ D, LCZ E, LCZ F, and LCZ G). Based on this LCZ classification scheme for Wuhan, training polygons were collected from Google Earth images by visual interpretation. About 70% of the polygons were randomly selected for training, and the remaining polygons were used for testing.
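Splitting at the polygon level, rather than the pixel level, prevents pixels from one polygon from ending up in both the training and the test set. The sketch below shows one way to do a roughly 70/30 random split. Two details are assumptions, since the text only states that about 70% of polygons were selected at random:

- the split is done within each LCZ class, so that every class appears in both sets;
- a fixed seed is used.

```python
import random
from collections import defaultdict


def split_polygons(polygons, train_fraction=0.7, seed=42):
    """Split whole reference polygons into training and test sets.

    polygons: iterable of (polygon_id, lcz_class) pairs. The split is done
    per LCZ class (an assumption) so each class keeps roughly the same ratio.
    """
    by_class = defaultdict(list)
    for polygon_id, lcz_class in polygons:
        by_class[lcz_class].append(polygon_id)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for lcz_class in sorted(by_class):
        ids = sorted(by_class[lcz_class])
        rng.shuffle(ids)
        # At least one training polygon per class, even for rare classes.
        n_train = max(1, round(len(ids) * train_fraction))
        train.extend(ids[:n_train])
        test.extend(ids[n_train:])
    return train, test


# Hypothetical reference data: 10 polygons each for LCZ 1 and LCZ A.
polygons = [(i, "1") for i in range(10)] + [(i, "A") for i in range(10, 20)]
train_ids, test_ids = split_polygons(polygons)
print(len(train_ids), len(test_ids))  # 14 6
```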

RF Classification
The LCZ classes were mapped using an RF classifier on the GEE platform. The classifier has six input parameters:
(1) "numberOfTrees", the number of decision trees to create;
(2) "variablesPerSplit", the number of variables per split, which defaults to the square root of the number of variables;
(3) "minLeafPopulation", the minimum number of training points a node must contain for it to be created, default 1;
(4) "bagFraction", the fraction of input to bag per tree, default 0.5;
(5) "maxNodes", the maximum number of leaf nodes in each tree;
(6) "seed", the randomization seed.
The 19 dataset combinations were classified by 19 RF classifiers with identical parameters to allow a fair comparison.

Accuracy Assessment
Here, the accuracy with which the 13 LCZ classes were mapped was expressed in terms of overall accuracy (OA, %) and, on a per-class basis, user's accuracy (UA, %) and producer's accuracy (PA, %). Moreover, the overall accuracy of the built type and of the land cover type LCZ classes was estimated separately. These accuracies are referred to as OAb (%) and OAlc (%), respectively [44]. Based on the confusion matrix, the formulas of OA, UA, PA, OAb, and OAlc are as follows:
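The following sketch computes the listed measures from a confusion matrix. OA, PA, and UA follow their standard definitions. For OAb and OAlc it divides the correctly classified samples of the built (or land cover) classes by all reference samples of those classes. This is one plausible reading of [44], not a confirmed formula. The 3-class matrix is hypothetical.

```python
def accuracy_metrics(cm, classes, built_classes):
    """Accuracy measures (in %) from a confusion matrix.

    cm[i][j] is the number of test samples whose reference class is classes[i]
    and whose predicted class is classes[j] (rows = reference, columns = map)."""
    k = len(classes)
    diag = [cm[i][i] for i in range(k)]
    ref_totals = [sum(cm[i]) for i in range(k)]                       # row sums
    map_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    def pct(num, den):
        return 100.0 * num / den if den else None

    def subset_oa(members):
        # Assumption: correct samples of the subset / all reference samples of
        # the subset; [44] may define OAb and OAlc differently.
        idx = [i for i, name in enumerate(classes) if name in members]
        return pct(sum(diag[i] for i in idx), sum(ref_totals[i] for i in idx))

    built = set(built_classes)
    return {
        "OA": pct(sum(diag), sum(ref_totals)),
        "PA": {classes[i]: pct(diag[i], ref_totals[i]) for i in range(k)},
        "UA": {classes[j]: pct(diag[j], map_totals[j]) for j in range(k)},
        "OAb": subset_oa(built),
        "OAlc": subset_oa(set(classes) - built),
    }


# Hypothetical example: two built types (LCZ 1, LCZ 2) and one land cover type (LCZ A).
classes = ["1", "2", "A"]
cm = [[8, 2, 0],
      [1, 7, 2],
      [0, 1, 9]]
m = accuracy_metrics(cm, classes, built_classes={"1", "2"})
print(m["OA"], m["OAb"], m["OAlc"])  # 80.0 75.0 90.0
```

PA answers "how much of each reference class was found" (row-wise), while UA answers "how reliable is each mapped class" (column-wise).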

Accuracy Assessment and Comparison
The initial analysis used only the yearly Sentinel-2 dataset (S2_year). The accuracy of the individual LCZ classes was typically low, and relatively high accuracies were observed only for LCZ G and LCZ 8. The overall accuracy of the classification from the Sentinel-2 yearly dataset (i.e., S2_year) was 58.95% (Table 5). Combining the Sentinel-2 yearly dataset (i.e., S2_year) with the Sentinel-1 yearly dataset (i.e., S1_year) increased the classification accuracy for every class except LCZ 8. The overall accuracy of the combined analysis (i.e., S2_S1_year) was 66.91%. The OAb and OAlc also increased relative to the values obtained with the Sentinel-2 yearly dataset alone (i.e., S2_year).
The text in red represents the highest PA value for each class; the text in blue represents the highest UA value for each class; the text in purple represents the highest OA value; the text in yellow represents the highest OAb value; the text in green represents the highest OAlc value. Note: Tables 5-9 share the same color definition.
Table 6. The accuracy comparison of adding any one of the secondary datasets to the basic combination. The changes in OA (i.e., ∆OA%) are measured relative to the OA of the S2_S1_season combination in Table 5.
It was also evident that seasonal data offered the potential for higher classification accuracy than yearly data. For example, using just the Sentinel-2 seasonal data (i.e., S2_season) resulted in a classification that was more accurate than that from the Sentinel-2 yearly dataset (i.e., S2_year). It was also nearly as accurate as the classification from the combined Sentinel-2 and Sentinel-1 yearly datasets (i.e., S2_S1_year). The combination of the Sentinel-2 and Sentinel-1 seasonal datasets (i.e., S2_S1_season) yielded the most accurate classification in terms of OA, OAb, and OAlc.
Adding seasonal data also tended to increase classification accuracy over that obtained with the Sentinel-2 yearly dataset alone (i.e., S2_year). The seasonal classifications were not always the most accurate for individual LCZ classes, but they often were, and the S2_S1_season combination provided the most accurate classification overall.
Since the highest overall accuracy was obtained with the combination of the Sentinel-2 and Sentinel-1 seasonal datasets (i.e., S2_S1_season), ways to further enhance this classification by adding secondary datasets were explored. The per-class and overall accuracies obtained by adding a single secondary dataset are shown in Table 6. The overall accuracy of LCZ classification increased with the addition of each secondary dataset. Relative to the seasonal Sentinel-2 and Sentinel-1 datasets (i.e., S2_S1_season), the OA increased by 0.01%, 0.85%, 2.55%, and 2.53% after adding the NTL, SI, OSM, and GLCM(PC1) datasets, respectively. Adding the OSM dataset to the combined seasonal Sentinel-2 and Sentinel-1 datasets (i.e., +OSM) gave the highest OA (73.89%). On a per-class basis, the +OSM combination had the highest PA for four classes and the highest UA for five classes. Three of these highest PA values and three of the highest UA values belong to urban (built type) classes. This indicates that the OSM dataset performs well in improving the accuracy of the urban classes in the LCZ classification system. The +GLCM(PC1) combination has the highest OAlc, and the +OSM combination has the highest OAb. Thus, while the +OSM combination seemed to increase accuracy mainly for the LCZ built type classes, the +GLCM(PC1) combination seemed to enhance accuracy for the LCZ land cover type classes. This suggests that adding more than one secondary dataset could further increase the overall accuracy.
The addition of any two secondary datasets to the seasonal Sentinel-2 and Sentinel-1 datasets (i.e., S2_S1_season) was then explored (Table 7). Adding two secondary datasets increased the overall accuracy, and the increases were generally larger than those obtained by adding a single secondary dataset. The +GLCM(PC1) + OSM combination had the highest OA (75.19%) and the highest OAb (64.71%), which again highlights the value of the OSM and GLCM(PC1) datasets in enhancing LCZ class separability. Note, however, that adding the SI and NTL datasets together (i.e., +SI + NTL) gave a lower overall accuracy than adding the SI dataset alone (i.e., +SI). This may suggest that the NTL data are of limited value, perhaps because of their coarse (130 m) spatial resolution. The +GLCM(PC1) + NTL combination had the highest OAlc (86.59%).
Adding any three secondary datasets to the Sentinel-2 and Sentinel-1 seasonal datasets (i.e., S2_S1_season) was also found to increase the overall accuracy (Table 8). The +OSM + GLCM(PC1) + SI combination yielded the highest OA (76.64%). It also achieved high per-class accuracies, with the highest PA for four classes and the highest UA for eight classes.
Tables 6-8 report the addition of one, two, and three secondary datasets, respectively, to the seasonal Sentinel-2 and Sentinel-1 datasets (i.e., S2_S1_season). The combination with the highest overall accuracy in each of these tables is also listed in Table 9. In addition, Table 9 covers the addition of all secondary datasets to S2_S1_season. From Table 9, the incremental changes in overall accuracy over S2_S1_season were 2.55%, 3.85%, 5.30%, and 4.00% when adding the best one, two, three, and all four secondary datasets, respectively. The +OSM+GLCM(PC1)+SI combination has the highest OA, and adding the NTL dataset to it decreased the OA by 1.3%. The +OSM+GLCM(PC1)+SI combination has the highest OAlc, while the +NTL+OSM+GLCM(PC1)+SI combination has the highest OAb.
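The reported accuracy changes are mutually consistent, which the short calculation below checks. The OA of S2_S1_season is not stated in this section, so it is derived here from the +OSM result (73.89% minus its 2.55% gain). All other numbers are taken from the text.

```python
# Overall accuracies and gains (%) quoted in the text.
OA_PLUS_OSM = 73.89      # best single addition (+OSM)
DELTA_PLUS_OSM = 2.55    # its gain over S2_S1_season
OA_BEST_PAIR = 75.19     # +GLCM(PC1)+OSM
OA_BEST_TRIPLE = 76.64   # +OSM+GLCM(PC1)+SI
DELTA_ALL_FOUR = 4.00    # gain for +NTL+OSM+GLCM(PC1)+SI

# Derived, not stated explicitly in this section: OA of S2_S1_season.
baseline = round(OA_PLUS_OSM - DELTA_PLUS_OSM, 2)


def delta_oa(oa):
    """Change in OA (percentage points) relative to S2_S1_season."""
    return round(oa - baseline, 2)


oa_all_four = round(baseline + DELTA_ALL_FOUR, 2)

print(baseline)                                          # 71.34
print(delta_oa(OA_BEST_PAIR), delta_oa(OA_BEST_TRIPLE))  # 3.85 5.3
print(round(OA_BEST_TRIPLE - oa_all_four, 2))            # 1.3
```

The derived gains for two and three datasets reproduce the reported 3.85% and 5.30%. The drop from the best triple to all four datasets reproduces the reported 1.3%.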

LCZ Map of Wuhan
In total, 19 LCZ maps of Wuhan were generated from the 19 dataset combinations. The +OSM + GLCM(PC1) + SI and S2_year combinations have the highest and lowest OA, respectively. Figure 6 compares the two LCZ classifications generated by these combinations. It was found that:
(1) most built type LCZ classes are located in the urban center of Wuhan;
(2) LCZ A (i.e., dense trees) is mainly located in the northwest of Wuhan;
(3) LCZ B (i.e., scattered trees) is mainly distributed in the northwest and southeast of Wuhan;
(4) LCZ G (i.e., water) can be depicted accurately, as water is more separable from the other LCZ classes;
(5) to compare the two LCZ maps in detail, the Shahu Lake area was enlarged on both maps and compared with the corresponding Google Earth image.
LCZ E (i.e., paved) and LCZ G (i.e., water) were more accurate on the +OSM + GLCM(PC1) + SI map than on the S2_year map. Built type LCZ classes were overestimated to some degree on the S2_year map, where some LCZ G (i.e., water) areas were even classified as LCZ 4 (i.e., open high-rise) or LCZ 5 (i.e., open midrise). In contrast, the built type LCZ classes on the +OSM + GLCM(PC1) + SI map are more consistent with the Google Earth image. This further indicates that the +OSM + GLCM(PC1) + SI combination can generate a more accurate LCZ map.
Figure 6. The LCZ map of Wuhan City.

Conclusions
Open data from Sentinel-2, Sentinel-1, OSM, GLCM, SI, and NTL were used to map the LCZ classes in Wuhan. These datasets were purposefully selected as offering different but complementary information on the urban area, so that they could potentially be combined to produce an accurate LCZ map. The combination of seasonal optical and radar data from the Sentinel satellites provided a benchmark classification based on open data. Adding secondary datasets to this combination could increase the accuracy of LCZ classification. The OSM, GLCM, SI, and NTL datasets could each be constructively combined with the Sentinel data. The highest OA was obtained using the seasonal Sentinel-2 and Sentinel-1, OSM, image texture, and spectral index datasets. Adding datasets usually produced a more accurate classification, but combining as many datasets as possible is not always an effective way to improve LCZ classification accuracy: adding the NTL dataset to the best combination decreased the OA. These findings may guide data source selection for LCZ classification in other study areas.
It may be possible to further enhance LCZ mapping accuracy. For example, other ancillary datasets such as point of interest (POI) data and social media (e.g., Sina Weibo or WeChat) check-in data may provide useful information to refine the mapping, especially for the built type LCZ classes. The built type classes were classified with lower accuracy (OAb = 65.93%) than the land cover classes (OAlc = 88.06%). Importantly, POI and social media check-in data are also open to the public, and such open and free datasets offer more possibilities for LCZ mapping in other cities. Besides the data source, the classification method is another direction for improvement. The RF classifier performs well in most image classification tasks. However, because the built type LCZ classes are hard to differentiate, it did not achieve satisfactory accuracy for all of them. Hence, a more powerful classifier, such as a deep learning method, could be used in further study. Moreover, this study shows the potential of the GEE platform for generating more accurate LCZ maps on a large scale, which is significant for urban development.
Acknowledgments: The authors are grateful to the referees for their constructive comments.

Conflicts of Interest:
The authors declare no conflict of interest.