Article

Pix2Pix-Based Modelling of Urban Morphogenesis and Its Linkage to Local Climate Zones and Urban Heat Islands in Chinese Megacities

1 College of Architecture and Urban Planning, Guangzhou University, Guangzhou 510006, China
2 College of Design and Innovation, Tongji University, Shanghai 200093, China
3 Art School, Hunan University of Information Technology, Changsha 410151, China
* Authors to whom correspondence should be addressed.
Land 2025, 14(4), 755; https://doi.org/10.3390/land14040755
Submission received: 31 January 2025 / Revised: 27 February 2025 / Accepted: 28 February 2025 / Published: 1 April 2025

Abstract

Accelerated urbanization in China poses significant challenges for developing urban planning strategies that are responsive to diverse climatic conditions. This demands a sophisticated understanding of the complex interactions between 3D urban forms and local climate dynamics. This study employed the Conditional Generative Adversarial Network (cGAN) of the Pix2Pix algorithm as a predictive model to simulate 3D urban morphologies aligned with Local Climate Zone (LCZ) classifications. The research framework comprises four key components: (1) acquisition of LCZ maps and urban form samples from selected Chinese megacities for training, utilizing datasets such as the World Cover database, RiverMap’s building outlines, and integrated satellite data from Landsat 8, Sentinel-1, and Sentinel-2; (2) evaluation of the Pix2Pix algorithm’s performance in simulating urban environments; (3) generation of 3D urban models to demonstrate the model’s capability for automated urban morphology construction, with specific potential for examining urban heat island effects; (4) examination of the model’s adaptability in urban planning contexts in projecting urban morphological transformations. By integrating urban morphological inputs from eight representative Chinese metropolises, the model’s efficacy was assessed both qualitatively and quantitatively, achieving an RMSE of 0.187, an R2 of 0.78, and a PSNR of 14.592. In a generalized test of urban morphology prediction through LCZ classification, exemplified by the case of Zhuhai, results indicated the model’s effectiveness in categorizing LCZ types. In conclusion, the integration of urban morphological data from eight representative Chinese metropolises further confirmed the model’s potential in climate-adaptive urban planning. 
The findings of this study underscore the potential of generative algorithms based on LCZ types in accurately forecasting urban morphological development, thereby making significant contributions to sustainable and climate-responsive urban planning.

1. Introduction

In the wake of accelerated urbanization and subsequent land-use transformations, China has been experiencing the emergence and proliferation of mega-cities [1]. These mega-cities are characterized by extensive resource and energy consumption, which exacerbates their vulnerability to climatic extremes, such as heatwaves, intense rainfall, and urban flooding [2]. The rapid expansion of urban areas, coupled with the increasing frequency of extreme weather events, underscores the urgent need for innovative urban planning strategies to enhance resilience and sustainability. Managing the complexities of mega-cities requires a nuanced understanding of the interplay between urban development and environmental stressors, particularly in the context of climate change.
A critical aspect of addressing these challenges lies in understanding the intricate relationship between urban morphology and localized climatic conditions [3]. Urban morphology, which encompasses various configurations, such as open spaces, building complexes, and architectural characteristics, plays a pivotal role in shaping microclimates within cities [4]. Traditional urban landscape models, such as the Global Urban Footprint and Global Digital Elevation Model, have been widely used to represent urban forms. However, these models are limited, as they do not include localised climatic considerations and regional specificities, leading to inaccuracies in representing the diverse 3D urban landscape [1,5]. The Local Climate Zone (LCZ) framework, introduced by Stewart and Oke [6], emerges as a robust mechanism for elucidating the complex interplay between urban forms and climate, through a detailed classification of urban and natural land features alongside 3D spatial metrics. This framework facilitates a granular understanding of urban heterogeneity, offering insights into micro-environmental variations with precision [7,8]. Despite its advantages in enhancing urban morphology’s spatial configuration to better adapt to climatic adversities, LCZ application is hampered by challenges in assessing specific urban forms, maintaining up-to-date information, and incorporating urban dynamics, such as wind patterns, resource consumption, and energy efficiency. A comprehensive solution calls for a novel methodology that combines insights from LCZs with the automated generation of 3D urban morphologies, transcending existing limitations. Once developed, this methodology could foster resilient urban development in the face of urbanization pressures and climate change. Such a solution may be drawn from deep learning tools, as elaborated herein.
With the growing application of deep learning models in urban morphology analysis and prediction, cities are increasingly able to advance climate resilience [9]. Leveraging data on historical temperatures, land use changes, and building densities, these machine learning models can detect and forecast areas with pronounced urban heat island effects, thus providing urban planners with robust, data-driven insights for sustainable decision-making. References [10,11] demonstrated that convolutional neural networks (CNNs), exemplified by the MobileNet model, can accurately predict restaurant locations in Seoul using only 2D building outlines, achieving an AUC of 0.732. Zhou [12] introduced an LSTM–CA model that integrates cellular automata with long short-term memory networks to simulate urban expansion based on time-series urban extent maps. In a case study of Foshan, China, this model exhibited superior predictive accuracy compared to the FLUS and GeoSOS models, and Shapley analysis highlighted the importance of neighborhood variables and transition potential factors. However, while traditional CNNs and LSTM–CA models excel in urban morphology analysis and dynamic simulation, they lack conditional generation capabilities and cannot produce high-resolution, high-fidelity urban morphology images based on specific conditions (such as LCZ classification). Conventional CNNs primarily excel at feature extraction and classification, and LSTM–CA models are better suited to dynamic, time-dependent processes, like urban growth, so both approaches face limitations in generating high-fidelity image outputs. In contrast, Pix2Pix, through its conditional generation mechanism, high-resolution image generation capability, end-to-end learning, and adaptability to local climatic features, addresses these shortcomings, making it an ideal tool for coupling urban 3D morphology with LCZ classifications.
Generative Adversarial Networks (GANs), at the forefront of deep learning technologies, have garnered significant attention for their adaptability across various engineering domains, notably in image processing and computer vision. Their application has been extended to the simulation of urban environments and the generation of urban imagery, demonstrating substantial potential in urban planning and design [13]. GANs operate on an unsupervised learning model, engaging two neural networks—the discriminator and the generator—in an adversarial process. This interaction enables the generator to refine its output, enhancing the realism of produced environmental scenes [14]. Among its variants, the cGAN excels in generating high-fidelity images from limited datasets, making it particularly suitable for applications requiring detailed feature extraction and replication. This attribute has led to its deployment across several fields. Notably, Huang et al. [15] utilized a cGAN model to optimize urban morphology in real-time by predicting environmental conditions, such as pedestrian-level wind, solar radiation, and the Universal Thermal Climate Index. Similarly, González-Sabbagh et al. [16] introduced a framework leveraging cGANs for environmental signal reconstruction, significantly improving signal recovery accuracy for air quality data in Beijing compared to other methods. The Pix2Pix model, a specialized form of cGAN, is acclaimed for its ability to perform high-resolution simulations, making it invaluable for deciphering urban morphology complexities. Its effectiveness in simulating urban environments has been demonstrated through various applications, from predicting pedestrian wind flow conditions to integrating LCZ classification with urban morphology data for enhanced climate responsiveness in urban settings [17,18].
Currently, there is a noticeable lack of research focused on the intelligent prediction of urban spatial configurations within major cities. This study aims to bridge that gap by leveraging GANs, specifically the Pix2Pix model, to generate 3D urban morphologies based on LCZ standards and to study the urban heat island effect in the context of future urban landscape changes. The study has three key objectives: (i) developing a generative model tailored for urban planning, addressing the gap in predictive modeling tools capable of simulating complex 3D urban morphologies, which is crucial for climate-responsive urban planning; while existing methods focus on static or simplified representations of urban forms, our model incorporates real-time climatic interactions, offering a more dynamic approach; (ii) predicting large-scale urban morphology based on LCZ classifications, directly addressing the gap in understanding the relationship between urban form and climate resilience in large cities, particularly in terms of localized climatic effects, like urban heat islands, since current models often overlook fine-grained LCZ-specific simulations; and (iii) evaluating the adaptability of the Pix2Pix model in large-scale urban block morphology simulation, filling the gap in high-precision predictive technologies for optimizing large-scale urban forms in urban heat island mitigation and climate adaptation strategies. This third objective focuses on using generative models to accurately simulate urban morphology, providing efficient data support for large-scale urban planning and climate adaptation, particularly in addressing urban heat island effects and optimizing urban spatial layouts.

2. Materials and Methods

The research framework consists of four key components, as illustrated in Figure 1a: Collect and organize urban morphology and LCZ classification data, preprocess and augment the data, and split it into training and testing sets; Figure 1b: Train the Pix2pix model using the dataset, iteratively optimize to generate realistic images, and monitor performance using similarity verification metrics; Figure 1c: Perform robustness testing using various accuracy metrics to ensure model reliability, and evaluate and refine the generated output images; Figure 1d: Utilize the model to predict emerging 3D urban forms and conduct dynamic response analyses by altering LCZ settings or road layouts, generating detailed visual outputs. This framework utilizes LCZ classifications to account for anticipated changes in urban form. By employing various parametric modeling techniques, it enhances the efficiency and accuracy of automated spatial model generation, thus improving the urban simulation process.

2.1. Urban Form Data Acquisition

2.1.1. Preparation of Urban Morphology Data

To ensure the robustness and generalization of the Pix2pix model, eight Chinese cities with distinct urban morphological characteristics were selected: Shanghai, Guangzhou, Shenzhen, Fuzhou, Wuhan, Changsha, Qingdao, and Hangzhou. These cities were chosen based on criteria that reflect the breadth of urban forms in China, encompassing a spectrum of geographic, economic, and developmental patterns. Shanghai and Shenzhen, for instance, exemplify densely populated commercial hubs with high-rise architecture, while Fuzhou and Wuhan capture more traditional cityscapes characterized by compact, low-rise historic districts [19].
Furthermore, the selection includes cities with varied environmental elements, such as extensive green spaces and water bodies—examples being the parks and rivers in Hangzhou and the coastal landscape of Qingdao—each contributing different environmental typologies. These cities were also selected to provide comprehensive representations of urban structural diversity, capturing various patterns of urban expansion and morphological evolution, and infrastructure development that reflect China’s rapid urbanization. Their unique characteristics include complex transportation networks designed to meet high travel demands and heterogeneity in green and blue infrastructure. This diversity in urban forms serves as an extensive dataset for the Pix2pix model, enhancing its adaptability and generalization across multiple urban contexts. The inclusion of cities with such a wide range of urban configurations allows for a thorough evaluation of the model’s applicability in simulating urban morphologies and assessing its effectiveness in diverse urban scenarios.
The spatial data for the selected cities in this study was obtained from the World Cover database (World Cover Viewer, esa-worldcover.org) (accessed on 17 September 2024), an interactive online map tool provided by the European Space Agency (ESA) [20]. This tool allows users to view global land cover data with a resolution of 10 m. It is part of ESA’s WorldCover project, which provides land cover products for 2020 and 2021, based on Sentinel-1 and Sentinel-2 satellite data. The dataset includes detailed classifications of land cover types, such as forests, grasslands, croplands, urban areas (with varying degrees of development), and water bodies. For this study, the 2021 data were utilized.
For the purposes of the research, all water bodies were grouped into a single category with an RGB value of (0, 99, 196). Land cover types, including forests, shrubs, and wetlands, were aggregated into an “urban green space” category with an RGB value of (0, 106, 0). Urban areas were classified according to their development level, and impermeable surfaces were also assigned an RGB value of (0, 106, 0).
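The colour grouping described above can be sketched as a simple lookup. The RGB triplets are those stated in the text; the land-cover label strings used as keys are illustrative assumptions, not the WorldCover product's actual class codes.

```python
# Sketch of the colour encoding described above. The label names
# ("water", "forest", ...) are illustrative assumptions; the RGB values
# (0, 99, 196) for water and (0, 106, 0) for green space come from the text.
WATER_RGB = (0, 99, 196)
GREEN_RGB = (0, 106, 0)

# Hypothetical grouping of land-cover labels into the study's categories.
CATEGORY_RGB = {
    "water": WATER_RGB,
    "forest": GREEN_RGB,
    "shrubland": GREEN_RGB,
    "wetland": GREEN_RGB,
}

def encode_land_cover(label):
    """Map a land-cover label to its training-image RGB (white if unmapped)."""
    return CATEGORY_RGB.get(label, (255, 255, 255))
```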
Building outlines and height data were sourced from RiverMap, professional software used alongside ArcGIS 10.8 for downloading global high-resolution satellite images, historical imagery, elevation data (e.g., contour lines), and both vector and raster maps. This software supports conversion between various coordinate systems (e.g., WGS84, Beijing 54) and offers features such as map annotation, vector downloading, overlaying, and large-scale image stitching. The building data in the study were classified according to development levels, and the height data were evenly mapped from 0 to 600 m onto the RGB color range (139, 0, 0) to (255, 255, 255), as shown in Figure 2.
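The even mapping of heights onto the RGB ramp could look roughly as follows; the linear interpolation between the two stated endpoint colours is our assumption about how the mapping is realized.

```python
def height_to_rgb(h_m, h_max=600.0):
    """Linearly map a building height in [0, h_max] metres onto the RGB
    ramp (139, 0, 0) -> (255, 255, 255) used for the training rasters.
    The linear interpolation is an assumption; only the endpoints and the
    0-600 m range come from the text."""
    t = max(0.0, min(h_m, h_max)) / h_max          # clamp, then normalize
    lo, hi = (139, 0, 0), (255, 255, 255)
    return tuple(round(a + t * (b - a)) for a, b in zip(lo, hi))
```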

2.1.2. Preparation of LCZ Data

The LCZ scheme (see Table S3), an international standard protocol for studying urban morphology and its associated local climate conditions, is an important surface classification system used to differentiate urban climates across locations [21,22]. The World Urban Database and Access Portal Tools (WUDAPT) represent a pioneering initiative in this regard and have been widely employed to generate LCZ maps through the integration of 34 pre-processed satellite input features [23]. This is the approach adopted in this paper. In the WUDAPT LCZ classification workflow, satellite raster data underwent initial processing, followed by pre-processing of selected training areas. Classification algorithms were then applied and analyzed. The process may be iterated to meet high accuracy requirements while ensuring consistency between the classified outputs and the underlying urban landscapes [24].
In this study, the integrated satellite dataset comprised available features from Landsat 8, Sentinel-1, and Sentinel-2 [25]. LCZ maps for these eight cities were generated using summer-season remote sensing satellite imagery. The choice of summer data stems from the season’s unique environmental characteristics: peak vegetation growth, pronounced surface temperature variations, and distinct land cover patterns. These conditions enable satellite imagery and remote sensing data to capture diverse land use types and building structures with greater clarity. Consequently, LCZ classifications derived from summer imagery offer a more precise representation of urban morphological attributes, aligning well with the objectives of urban form analysis [26]. To facilitate the utilization of urban landscape differentiation for systematic mapping and data analysis, the images were re-sampled from a 30-m resolution to grid units of 100 m by 100 m. Given the prevalent characteristic of densely-built environments in the core areas across the selected eight cities, the classification of LCZs holds paramount significance. The Landsat imagery of the eight selected cities was processed using the Random Forest algorithm and integrated into the LCZ map generator [27]. Approximately 70% of randomly sampled instances from the dataset were allocated for model training, while the remaining 30% were reserved for verification [28].
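The Random Forest step with its 70/30 split can be sketched as below. The data here are synthetic stand-ins (the real workflow runs on stacked Landsat 8/Sentinel-1/Sentinel-2 features inside the WUDAPT toolchain), so this only illustrates the split and classifier stage.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: rows are pixels, columns are the 34 pre-processed
# satellite input features; labels are LCZ class IDs (16 classes used here,
# since LCZ 7 was excluded from the study).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 34))
y = rng.integers(0, 16, size=500)

# 70% training / 30% verification, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
oa = accuracy_score(y_te, clf.predict(X_te))  # overall accuracy (OA)
```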
The results of the verification phase were further employed to optimize parameters, such as sample size and feature quantities, facilitating the generation of the representative LCZ map (Figure 3). The accuracy of the generated LCZ maps is pivotal in assessing the efficacy of model training [29]. To validate the LCZ maps, four metrics were employed: overall accuracy (OA), overall accuracy for urban LCZ types (OAu), overall accuracy for building LCZ types (OAbu), and weighted accuracy (OAw) [20]. The required accuracy of the LCZ classification map for the experiment was substantiated based on the computed indicators in Supplementary Table S1.
The spatial and urban LCZ typologies of each city were systematically delineated and categorized into discrete grid cells, each measuring 600 m × 600 m. The LCZ proportion within each grid cell was calculated based on the accuracy of the 100 m × 100 m LCZ units. The results, shown in Figure 4, include a city form streetscape map for each LCZ type. Typical LCZ-type samples were filtered based on different partitions to arrive at a final dataset of 4489 instances. The obtained typical urban morphologies of LCZ1-G were classified roughly using street view images, as shown in Figure 4. LCZ 7 was excluded from this study due to the limited availability of the dataset, resulting in a total of 16 different classifications.
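Computing the LCZ proportion within one 600 m × 600 m grid cell from its 6 × 6 constituent 100 m cells might look like this (a minimal sketch, not the authors' code):

```python
import numpy as np

def lcz_proportions(lcz_grid):
    """Proportion of each LCZ class within one 6 x 6 block of 100 m cells,
    i.e. one 600 m x 600 m grid cell."""
    vals, counts = np.unique(lcz_grid, return_counts=True)
    return dict(zip(vals.tolist(), (counts / lcz_grid.size).tolist()))

# Hypothetical cell: mostly LCZ 1, with the top two rows belonging to LCZ 4.
block = np.full((6, 6), 1)
block[:2, :] = 4
props = lcz_proportions(block)   # e.g. {1: 24/36, 4: 12/36}
```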

2.1.3. Encoding and Enhancement of Data

A total of 4489 training samples were obtained, each representing a single Pix2Pix training unit consisting of a 6 × 6 grid of LCZ cells, together corresponding to a 600 m × 600 m area (each LCZ cell having been resampled to a 100 m × 100 m grid). Various data augmentation techniques were then applied to increase the diversity of the training dataset and improve the robustness and versatility of the model [30]. Each data sample underwent specific transformations, including rotations of 90° and 270° and horizontal mirroring, thereby generating three unique variants for each original sample [31], as shown in Figure S1. To alleviate the computational burden associated with processing a large volume of training images, transformations were applied in real-time to small batches of training samples prior to each iteration of model training. This approach yielded an augmented dataset containing a total of 17,956 samples. Of these, 70% were allocated to the training set and the remaining 30% were used for validation. Each sample was carefully paired to train the Pix2pix model, ensuring methodological consistency and data integrity [32].
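The augmentation step (rotations by 90° and 270° plus horizontal mirroring, yielding three variants per sample and 4489 × 4 = 17,956 samples overall, originals included) can be sketched as:

```python
import numpy as np

def augment(sample):
    """Return the three variants used in the study: rotations by 90 and
    270 degrees plus a horizontal mirror of the original sample."""
    return [np.rot90(sample, k=1), np.rot90(sample, k=3), np.fliplr(sample)]

x = np.arange(36).reshape(6, 6)   # one 6 x 6 LCZ training unit
variants = augment(x)
total = 4489 * (1 + len(variants))  # originals plus variants = 17,956
```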

2.1.4. Accuracy and Validation of Model

For the validation dataset containing 30% of the samples, six evaluation indicators were used to verify the accuracy of the model: (1) Overall Accuracy (OA); (2) Overall Accuracy for Urban LCZ Classes (OAurb); (3) Overall Accuracy for Natural LCZ Classes (OAnat); (4) weighted accuracy (OAw); (5) the Kappa coefficient, a customary measure for assessing accuracies across different types; and (6) the F1 score, an evaluation metric describing classification performance. Given the complexity of LCZ classification, Producer’s Accuracy (PA), User’s Accuracy (UA), and the F1 score were combined using weighted harmonic means as the final evaluation metrics for each city [33].
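Metrics (1), (5), and (6) are standard and available in scikit-learn; a toy illustration on hypothetical LCZ labels (the label values here are made up for demonstration):

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

# Hypothetical true vs. predicted LCZ classes for a handful of grid cells.
y_true = [1, 1, 2, 2, 3, 3, 3, 4]
y_pred = [1, 2, 2, 2, 3, 3, 4, 4]

oa = accuracy_score(y_true, y_pred)                 # overall accuracy (OA)
kappa = cohen_kappa_score(y_true, y_pred)           # Kappa coefficient
f1w = f1_score(y_true, y_pred, average="weighted")  # class-weighted F1
```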

2.1.5. Calculation of Urban Geometric Properties

Precise alignment between urban morphologies within the training dataset and their respective LCZ classifications is paramount for maintaining dataset accuracy. A comprehensive review of the literature suggested several critical parameters for LCZ classification [34]. These parameters include Sky View Factor (SVF), Aspect Ratio (AR), Green Space Index (GSI), Impervious Surface Fraction (ISF), Permeable Surface Fraction (PSF), and High-Resolution Elevation (HRE), as detailed in Supplementary Table S2. There is significant variability in the value-ranges of these parameters for LCZ types delineated using the WUDAPT method across diverse urban environments [35]. To mitigate this issue and provide precise parameter delineations, Supplementary Table S3 has also been included to produce an exhaustive breakdown of specific parameter ranges applicable to each LCZ type.
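As a rough illustration of how surface-fraction parameters such as GSI, ISF, and PSF can be derived from a categorical cover raster (the integer codes used here are assumptions for the sketch, not the study's encoding):

```python
import numpy as np

def surface_fractions(cover):
    """Toy computation of GSI/ISF/PSF-style fractions from a categorical
    cover raster. Codes are assumptions: 0 pervious, 1 impervious, 2 green."""
    return {
        "GSI": float(np.mean(cover == 2)),  # green space fraction
        "ISF": float(np.mean(cover == 1)),  # impervious surface fraction
        "PSF": float(np.mean(cover == 0)),  # pervious surface fraction
    }

grid = np.array([[2, 2, 1], [1, 1, 0], [0, 0, 0]])
fr = surface_fractions(grid)   # the three fractions sum to 1 by construction
```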

2.2. Pix2pix Model Training Framework

GANs, currently acknowledged as leading-edge deep learning models, comprise two fundamental components: (i) the generator model (G-model), which uses noise variables as input to generate novel data instances, and (ii) the discriminator model (D-model), which determines the authenticity of each data instance with respect to the actual training data. These two parallel networks undergo iterative and alternating training to mutually enhance their respective performances [36]. Through a min–max formulation, the networks are trained in opposition to each other, with the discriminator aiming to discern the authenticity of data [36]. The adversarial interplay and iterative optimization characteristic of GANs render generated samples progressively harder to distinguish from real ones.
The generator performs a sequence of operations, such as convolution, activation functions, and de-convolution, and creates samples resembling the input images through the ongoing competition between the G-model and D-model [37]. As the samples generated by the G-model become increasingly realistic, the D-model is tasked with distinguishing between real and generated samples.
The two neural networks, i.e., the U-Net generator G and the discriminator D, adopt a shared encoder–decoder network structure, as illustrated in Figure 5. G represents the mapping G: {x, z} → y, where x is the input cityscape background image, z is a noise vector, and y is the generated output image [38]. The architecture consists of a contracting path and an expanding path. Convolutional blocks are employed in the contracting path, performing multi-channel 2D convolutions with a stride of 2, while the expanding path is a symmetrical structure built from transposed convolutional blocks, tailored for both the generator G and the discriminator D.
Due to the replication and concatenation operations, the channel count is doubled at each layer of the U-Net generator [39]. Two further operations, “Dropout” and “Tanh”, were applied in the network. “Dropout” was used to prevent overfitting during model training by randomly excluding a portion of neurons, thereby enhancing model generalization [40], while “Tanh” was employed to address the issue of non-zero-centered outputs, which can result in slow convergence [41]. Supplementary Table S4 documents the final hyperparameter configurations for the Pix2pix model, established following extensive training iterations. Notably, the batch size was set to one, ensuring precise adjustment and calibration of the model, as shown in Figure 5.
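One encoder stage and one decoder stage of such a U-Net generator could be sketched in PyTorch as below. The channel counts, normalization choices, and activation slopes are illustrative assumptions, not the final hyperparameters of Table S4.

```python
import torch
import torch.nn as nn

def down_block(c_in, c_out):
    # contracting path: strided convolution halves spatial resolution
    return nn.Sequential(nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

def up_block(c_in, c_out, dropout=False):
    # expanding path: transposed convolution doubles spatial resolution
    layers = [nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
              nn.BatchNorm2d(c_out), nn.ReLU()]
    if dropout:
        layers.append(nn.Dropout(0.5))   # regularization against overfitting
    return nn.Sequential(*layers)

x = torch.randn(1, 3, 256, 256)          # one input image, batch size of one
h = down_block(3, 64)(x)                 # -> (1, 64, 128, 128)
final = nn.Sequential(nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
                      nn.Tanh())         # tanh keeps the output in [-1, 1]
y = final(h)                             # -> (1, 3, 256, 256)
```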

3. Results

The results showcase the key findings of the Pix2Pix model in urban morphology prediction. They highlight the model’s effectiveness in simulating 3D urban forms of selected Chinese cities in alignment with Local Climate Zone (LCZ) classifications. Key performance indicators, including RMSE, R2, and PSNR, demonstrate the model’s high accuracy in capturing urban features, such as building distribution, green spaces, and water bodies. Additionally, the model’s dynamic response to changes in urban structures and climate scenarios was also evaluated. The following subsections will provide a detailed analysis of the model’s iteration selection, training process, and the accuracy of the generated urban morphology.

3.1. Iteration Selection in pix2pix Model

During the training phase of the model, meticulous attention was devoted to the loss curves of both the discriminator and generator, with specific emphasis on the “Gen total loss” and “L1 loss”. According to Sutanto et al. [42], “L1 loss,” a metric evaluating pixel-level discrepancies, is employed to coax the generator into replicating images that not only challenge the discriminator but also preserve intricate details and structural fidelity. This metric ensures that the generator aspires to reconstruct the target image with heightened precision. In addition, “Gen_total_loss” amalgamates various loss components, including adversarial and content losses, providing a holistic evaluation of the performance of the “Generator”.
The loss curves for various components of the model, including the discriminator loss, generator loss, L1 loss, and total generator loss, track the changes in these loss metrics over the training epochs, providing a detailed overview of how the model’s performance improves over time. A review of these loss curves revealed that the “Generator loss” displayed a declining trajectory from 0 to 200 iterations and leveled off through to 1000 iterations (Figure S2). The nadir of this curve was identified at epoch 212, where the loss value registered at 1.49. Similarly, the “L1_loss” and “Gen_total loss” curves displayed this pattern, with both metrics decreasing through 1000 iterations. Optimal values were recorded at epoch 790, with loss figures of 0.15 and 16.43, respectively. Beyond this point, only minor fluctuations were observed. This stability in the training process underscores the robustness of the model, making it most suitable for subsequent applications in predicting urban morphology. The selection of the model at epoch 800, proximal to the minimum loss epoch, was strategic and produced optimum accuracy in urban morphological simulations.
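The composite generator objective described above (an adversarial term plus a weighted pixel-level L1 term) can be sketched as follows; the weight lam = 100 follows the original Pix2Pix formulation and is an assumption about this study's setting.

```python
import numpy as np

def generator_total_loss(disc_fake, gen_out, target, lam=100.0):
    """Pix2Pix-style generator objective: an adversarial term (the generator
    wants the discriminator to output 1 on fakes) plus a lambda-weighted L1
    term enforcing pixel-level fidelity. lam=100 is an assumed default."""
    eps = 1e-7
    gan_loss = float(-np.mean(np.log(disc_fake + eps)))   # BCE vs. all-ones
    l1_loss = float(np.mean(np.abs(target - gen_out)))    # "L1 loss"
    return gan_loss + lam * l1_loss, gan_loss, l1_loss    # "Gen_total_loss"

# Demo: a perfect reconstruction leaves only the adversarial term.
total, gan, l1 = generator_total_loss(np.array([0.9]),
                                      np.zeros((4, 4)), np.zeros((4, 4)))
```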

3.2. Monitoring and Summarizing the Training Process

To rigorously evaluate the authenticity of images generated by the Pix2pix model at various iteration counts, and to assess its capability in replicating original urban images, samples from eight distinct cities were randomly selected after a certain number of iterations: 25, 50, 100, 300, 500, and 800. Figure 6 showcases a set of urban form images generated by the Pix2Pix model at different epochs, starting from epoch 25 and going up to epoch 1000. The progressive improvement in image quality and fidelity is evident as the epochs increase. At early epochs (e.g., epoch 25), the generated images still have rough edges and lack certain details, like building shapes and green spaces. However, as the training progresses to epochs 500 and 800, the generated images display much more accurate urban features, mimicking the real-world architecture and natural landscapes in the input images.
Initial results at iteration 25 revealed that, while partial morphological features of the input images were captured, distinct features of building distribution and morphology were not yet evident. With more iterations, in the range of 800–1000, the graphical fidelity of the images improved substantially, reflecting more accurate architectural forms and green space contours. This qualitative comparison visually demonstrates the model’s ability to refine and improve its output over time, which is critical for evaluating its capability in simulating realistic urban environments. Optimal image quality was achieved at iteration 800, as illustrated in Figure 6, whereas a decline in quality at 1000 iterations suggested potential model overfitting [43,44].
Further qualitative and quantitative evaluations were conducted using metrics such as the R2 coefficient, Root Mean Square Error (RMSE), and Peak Signal-to-Noise Ratio (PSNR) to measure the RGB discrepancies between generated and actual images [45,46]. These analyses, depicted in Figure 6 and outlined in Table 1, focused on the statistical performance across selected test datasets of the eight cities. Notably, the case of Fuzhou demonstrated superior performance, with a structural similarity index measure (SSIM) of 0.723, R2 score of 0.862, and PSNR of 16.029. In contrast, the case of Guangzhou showed lower scores, primarily due to its diverse architectural landscape and variations in building sizes, which is likely to have affected the performance of the trained model. The case of Wuhan showed relatively lower scores, and this was attributable to insufficient data on urban road networks, which impacted effective training constraints within the Pix2pix framework. The average performance metrics for the eight cities—R2 score of 0.814, SSIM of 0.648, and PSNR of 14.592—highlighted the model’s effectiveness in capturing urban green spaces and water bodies, as well as its profound capability in replicating architectural morphology and producing transportation networks that could closely align with real-world traffic network configurations.
Fuzhou achieved the highest scores in SSIM, R2 score, and PSNR, with values of 0.723, 0.862, and 16.029, respectively, while Guangzhou’s scores were comparatively lower, at 0.617, 0.780, and 13.905. For Wuhan, the corresponding SSIM, R2 score, and PSNR were 0.580, 0.787, and 14.115. The differences between the Guangzhou and Fuzhou samples may be attributed to several factors. First, the architectural heterogeneity in Guangzhou’s central urban area is likely to have contributed to a slightly lower training accuracy compared to the other seven cities. Meanwhile, the lower training accuracy in Wuhan may be due to the limited amount of urban road network data in the original dataset, resulting in a lack of effective constraints during the Pix2pix training process. In terms of urban green spaces and water bodies, Pix2pix demonstrated strong edge texture extraction capabilities, effectively capturing the characteristics of common urban greenery, such as parks and roadside vegetation.
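RMSE and PSNR as used above can be computed directly from the image arrays; a minimal sketch for 8-bit images (the constant-error example is purely illustrative):

```python
import numpy as np

def rmse(a, b):
    """Root Mean Square Error between two equally-sized image arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def psnr(a, b, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB, assuming an 8-bit value range."""
    e = rmse(a, b)
    return float("inf") if e == 0 else 20.0 * np.log10(peak / e)

real = np.full((8, 8), 100.0)
fake = real + 10.0            # constant error of 10 grey levels
```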
Figure 7 highlights the effectiveness of the Pix2pix model in synthesizing realistic urban morphological features. Comparing the generated images with the ground truth shows how accurately the model simulates 3D urban forms, such as building layouts, roads, and green spaces, from the initial input data. The relative error quantifies discrepancies between predicted and real images, which is crucial for assessing model performance. The 3D model further demonstrates the model's potential for real-world urban planning and climate-resilient city design, particularly in predicting future urban developments under different environmental and infrastructural conditions.
In generating architectural morphologies, Pix2pix demonstrates a high degree of replication capability, particularly in its alignment with traffic networks. The model not only accurately captures the visual features of individual buildings but also maintains structural consistency with real urban layouts [47]. Given the critical role of roads and traffic networks in shaping building distributions, Pix2pix effectively learns these relationships, producing building morphologies that closely mirror reality in both layout and detail [48]. This allows the model to excel not only in replicating local architectural features but also in reflecting the broader coordination between buildings and traffic networks, providing a precise predictive tool for urban planning.

3.3. Urban Morphology Generation and Accuracy

3.3.1. 3D Urban Morphology Generation

The 3D urban form models were generated with the Grasshopper plug-in in the Rhino 7.9 environment, leveraging its automation capabilities to execute form-generating operations: distinct value ranges within the RGB spectrum were correlated with building outlines, from which models of specific heights were constructed. For water bodies and green spaces, the positions of natural elements were likewise identified through their RGB values in Grasshopper and automatically matched to the corresponding locations in the output image. Figure 7 illustrates the images produced by the Pix2pix model and their corresponding three-dimensional reconstructed urban morphologies [49].
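The Grasshopper definition itself is not reproduced here, but the colour-decoding step it automates can be sketched in NumPy. The channel-to-class mapping below (red intensity encoding building footprints and heights, green for vegetation, blue for water) is a hypothetical assumption for illustration only; the paper does not specify the exact RGB coding used.

```python
import numpy as np

def decode_urban_raster(rgb, max_height_m=150.0):
    """Map an RGB raster (H, W, 3) with values in [0, 255] to per-pixel
    class masks and a building-height field.

    Assumed (hypothetical) coding: dominant red = building, with intensity
    proportional to height; dominant green = vegetation; dominant blue = water.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    building = (r > 128) & (r >= g) & (r >= b)
    green = (g > 128) & ~building
    water = (b > 128) & ~building & ~green
    # Extrusion height per building pixel, scaled linearly from red intensity.
    heights = np.where(building, r / 255.0 * max_height_m, 0.0)
    return building, green, water, heights
```

In the actual workflow, the resulting footprint masks and height field would drive Grasshopper's extrusion components to produce the 3D massing shown in Figure 7.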
In the case of Wuhan, the distribution of green spaces closely resembled the actual situation, especially for a central green space within the built-up area, whose boundaries and size closely matched reality. The samples from Shenzhen exhibited a more regular urban morphology, which lent itself naturally to model learning. In Shanghai and Changsha, some discrepancies between predicted and real images were observed, mainly due to irregular urban morphology; nevertheless, the overall prediction accuracy remained relatively high, and similar results were observed for Fuzhou. In Hangzhou, the well-structured urban layout and minimal variation in building types led to a close resemblance between the generated and real images. Even in the Qingdao samples, which lacked constraints from the urban road networks, the image quality and details were satisfactory. With highly accurate prediction of architectural contours, coupled with a multifaceted built environment including roads, greenery, and waterways, the model exhibited a strong predictive capacity for reflecting the characteristics of most cities.

3.3.2. Evaluation of the Accuracy of the Model Based on LCZ Classification

Detailed examination of individual LCZ classifications was performed through confusion matrices, and the results for the test sets across the eight cities are presented in Table 2. These results provide insights into the model's proficiency in accurately replicating urban forms consistent with the designated LCZ types. The mean values recorded (OA of 81.5%, Kappa of 0.81, and F1 score of 0.83) underscored the model's high accuracy in predicting building configurations. The average values of OA_urb and OA_nat (overall accuracy over urban and natural LCZ classes, respectively) were 80.6% and 81.7%, highlighting the model's robust capability in rendering realistic representations of both urban architecture and natural environments.
Overall Accuracy (OA) measures the proportion of correctly predicted instances to the total number of instances, and its formula is
OA = (TP + TN) / (TP + TN + FP + FN)
where TP are True Positives, TN are True Negatives, FP are False Positives, and FN are False Negatives.
Kappa coefficient is used to evaluate the agreement between predicted and actual results, adjusted for chance. Its formula is
Kappa = (P_o − P_e) / (1 − P_e)
where P_o is the observed agreement and P_e is the expected agreement by chance.
F1 score combines both Precision and Recall into a single metric by taking their harmonic mean, and its formula is
F1 = 2 × (Precision × Recall) / (Precision + Recall)
where Precision is the ratio of correctly predicted positive observations to the total predicted positives, and Recall is the ratio of correctly predicted positive observations to all actual positives. The calculation results can be found in Table 2.
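The three scores above can be computed directly from a confusion matrix. The NumPy sketch below mirrors the formulas (with F1 macro-averaged over classes), under the assumption that rows hold actual and columns hold predicted LCZ labels; it is an illustrative restatement, not the evaluation code used in this study.

```python
import numpy as np

def classification_scores(cm):
    """Return (OA, Kappa, macro-F1) from a confusion matrix where
    rows are actual classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()

    # Overall Accuracy: correctly classified pixels over all pixels.
    oa = np.trace(cm) / total

    # Kappa: observed agreement corrected for chance agreement,
    # with P_e derived from the row and column marginals.
    pe = float((cm.sum(axis=1) * cm.sum(axis=0)).sum()) / total ** 2
    kappa = (oa - pe) / (1.0 - pe)

    # Per-class precision/recall, combined into per-class F1, then averaged.
    precision = np.diag(cm) / cm.sum(axis=0)
    recall = np.diag(cm) / cm.sum(axis=1)
    f1 = 2 * precision * recall / (precision + recall)
    return float(oa), float(kappa), float(np.mean(f1))
```

For a two-class matrix [[40, 10], [5, 45]], for example, this gives OA = 0.85 and Kappa = 0.70, matching the closed-form definitions above.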
Further analyses using confusion matrices, as illustrated in Figure S3, revealed variation in classification accuracy among the cities. Notably, the lowest F1 score was observed for Guangzhou, attributable primarily to multiple misclassifications within LCZ3 and errors in identifying LCZ2 and LCZ8. These discrepancies suggest potential overfitting and boundary adhesion problems during model learning, which particularly affected the categories of low-rise large buildings (e.g., LCZ8) and industrial structures (e.g., LCZ10). Additionally, challenges in differentiating compact high-rise clusters and in distinguishing between natural features across LCZB to LCZD contributed to further misclassifications of urban and natural LCZs [50].
In the case of cities such as Shenzhen, Changsha, and Wuhan, limited data on road networks adversely impacted the effectiveness of model training, leading to misidentifications among open and compact low-rise residential buildings. Conversely, cities such as Qingdao and Fuzhou achieved higher F1 scores, mainly due to the presence of fewer dense low-rise areas and a clearer delineation between natural and urban LCZ categories. The minimal errors in vegetation cover further enhanced the Pix2pix model’s capacity to closely mimic the true urban spatial environments in these scenarios.

3.3.3. Dynamic Response of 3D Models

The Pix2pix model demonstrates significant potential in detecting LCZ variations and generating corresponding 3D models, making it particularly valuable for urban design and sustainable development in rapidly growing urban areas. This capability is essential for predicting and adapting to future urban transformations. Figure 8 illustrates dynamic adjustments in 3D models resulting from changes in LCZ classifications and road network configurations. In the case of Guangzhou, data primarily classified under LCZ1 served as the initial input. Subsequent modifications to the road network generated updated urban morphology models, revealing the rapid emergence of new urban block forms and their adaptability to changes in road network structures. Further simulations employed LCZB as a variable to account for green space changes driven by urban development, impacting critical urban morphology indicators, such as building outlines, heights, and the extent of green spaces. These adjustments allowed the model to retain original architectural features while introducing new structural patterns responsive to urban changes.
Following these modifications, air temperature data for the various urban morphologies were analyzed using the Ladybug tool. This functionality provides urban designers and planners with comprehensive monitoring of temperature data across different configurations, equipping them with insights to effectively mitigate urban heat island (UHI) effects in future projects. Using IWEC climate data for Macau (station 450110), downloaded from epwmap and imported into the Ladybug platform, a quantitative analysis was conducted on the 3D models generated in Figure 8 to represent changes in road network and LCZ classifications. The results reveal that modifications in road network structure and land cover significantly influence UHI impacts. Specifically, images (a), (b), and (c) illustrate the different LCZ scenarios and their corresponding temperature profiles. In the original Model A, the urban morphology exhibited three occurrences of average midday temperatures exceeding 30 °C (12–15 h). In Model B, where the road network was modified, four instances of midday temperatures above 30 °C were observed. Conversely, in Model C, where LCZ1 was replaced by LCZB, no midday temperatures exceeded 30 °C. Additionally, nighttime heat dissipation capacity is a critical factor in assessing UHI. During winter nights, instances of temperatures exceeding 21.5 °C were less frequent in Model C than in Models A and B, highlighting the impact of LCZ adjustments on nighttime heat retention. These temperature distributions and quantitative results confirm that road network and LCZ modifications influence urban temperature profiles, providing valuable data to support the development of urban heat island mitigation strategies.
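The midday-exceedance counts reported above can, in principle, be reproduced by simple post-processing of the 8760 hourly dry-bulb temperatures that Ladybug reads from an EPW file. The sketch below is an illustrative stand-in for that step, not the Ladybug workflow itself; the 365 × 24 layout of the hourly series and the 12:00–15:00 midday window are assumptions made for the example.

```python
import numpy as np

def count_hot_midday_hours(temps, threshold=30.0, start_hour=12, end_hour=15):
    """Count hourly dry-bulb temperatures above `threshold` within the
    midday window [start_hour, end_hour) over a year of hourly data."""
    temps = np.asarray(temps, dtype=float).reshape(365, 24)  # day x hour-of-day
    midday = temps[:, start_hour:end_hour]                   # e.g. hours 12-14
    return int((midday > threshold).sum())
```

Applied to the hourly output of each scenario, the same counting logic yields the per-model exceedance tallies used to compare Models A, B, and C.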

4. Discussion

4.1. Prediction of Future Urban Morphology

Given its exceptional learning performance and morphological capabilities in LCZ classification, the Pix2pix model has demonstrated extraordinary potential for accurately representing the typical urban morphology of selected Chinese cities, while also reducing economic costs and providing a more efficient tool for urban planning and design. Moreover, by effectively simulating the interactions between urban structures and climate, the model plays a critical role in enhancing urban resilience to extreme weather events and climate change. The simulation outputs facilitate informed urban planning decisions, enabling optimized spatial layouts, improved building configurations, and enhanced ecosystem services. Collectively, these outcomes bolster a city’s adaptive capacity, supporting a proactive approach to mitigating climate impacts and promoting sustainable urban development. This capability serves as a vital tool for forecasting urban transformations amid rapid expansion phases. The Tangjiawan Zhuhai North Station Transit-Oriented Development (TOD) complex, strategically positioned within Zhuhai, Guangdong, was selected as a case study to generate predictive LCZ images in alignment with government planning initiatives [51]. Figure 9a shows the projected LCZ classifications, while Figure 9b displays the corresponding 3D urban morphology model, created by inputting the LCZ data into the trained Pix2pix model.
The detailed visualizations in Figure 9 include axonometric views of local building configurations within the Zhuhai North Station TOD and their corresponding LCZ predictions. Specifically, Figure 9 (1) portrays a large recreational space predominantly covered with lawns, designated as LCZD. Figure 9 (2) captures the operational dynamics of the Zhuhai North Station, classified under LCZ8, with its associated buildings and environment. Figure 9 (3) illustrates the central commercial office zone, reminiscent of a typical CBD business district, predicted under LCZ1 and LCZ2. Figure 9 (4) highlights a high-rise residential area anticipated as LCZ4. Together, these figures demonstrate the model's adeptness in cross-feature learning and texture capture, affirming the efficacy of the GAN in automatically generating realistic 3D urban models for building clusters.

4.2. Computational Efficiency and Future Perspectives

Compared to traditional machine learning models, Pix2pix demonstrates superior end-to-end training capabilities and efficiency, eliminating the need for manual feature extractors and significantly streamlining the model design process. By preserving the structural integrity of building clusters during the image translation process, Pix2pix enhances both the accuracy and coherence of generated images, showcasing exceptional computational performance among GAN variants applied to image translation and classification tasks [52].
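For reference, the generator objective that gives Pix2pix this structure-preserving behaviour combines an adversarial term with a lambda-weighted L1 term that penalises departures from the ground-truth image (following Isola et al.'s original formulation). The sketch below restates that objective in NumPy for clarity; it is not the training code used in this study, and `d_fake_prob` stands in for the discriminator's probability that the generated image is the real pair of the input.

```python
import numpy as np

def pix2pix_generator_loss(d_fake_prob, fake_img, real_img, lam=100.0):
    """Standard Pix2pix generator objective: non-saturating adversarial
    term plus a lambda-weighted L1 reconstruction term."""
    # Adversarial term: generator wants D to assign high probability to fakes.
    adv = -np.mean(np.log(np.asarray(d_fake_prob) + 1e-12))
    # L1 term: keeps generated images pixel-wise close to the ground truth,
    # which is what preserves building outlines and road structure.
    l1 = np.mean(np.abs(np.asarray(fake_img) - np.asarray(real_img)))
    return float(adv + lam * l1)
```

The large default weight on the L1 term (lambda = 100 in the original paper) is what biases the model toward coherent, layout-faithful outputs rather than merely plausible textures.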
This study uniquely positions Pix2pix as a powerful data-driven tool for feature learning in China's major urban centers. Traditional reliance on urban designers' knowledge and expertise can limit the creation of complex and novel architectural forms, constraining creativity and innovation in urban planning. Compared to traditional manual design methods, the automated features of the Pix2pix model significantly reduce labor and time costs. Because LCZ classification is automated, planners can quickly assess multiple planning options based on different climate types, avoiding repeated revisions and recalculations and thus accelerating the finalization of a plan. By incorporating direct experimental data recorded in Excel, this study further highlights Pix2pix's impact on urban development by demonstrating its capacity to optimize spatial configurations in ways that mitigate urban heat island effects. Through numerical simulation advancements and diverse urban environment training data, Pix2pix opens avenues for exploring complex urban morphologies, supporting multi-scenario planning that considers both urban growth and climate resilience. The innovation of this study lies in its ability to predict and optimize ideal urban configurations using a multi-scenario matrix of urban development and climate change, integrating LCZ classification to offer valuable insights. Pix2pix not only enables quantitative analysis of local climate phenomena, such as urban heat islands, but also provides a forward-looking predictive tool for urban planners and policymakers, underscoring its potential to support sustainable urban development [53].

5. Conclusions

In this study, standard LCZ typologies were employed, which were typically adequate for accurately delineating urban morphology in the majority of the selected cities. However, it is important to recognize that prevailing LCZ classification frameworks might not fully capture the architectural nuances of other metropolises, especially those characterized by a significant prevalence of skyscrapers exceeding 25 m in height and distinct surface textures. This limitation could lead to substantial inaccuracies in subsequent simulation efforts. Addressing this shortfall is essential for accurately modeling the complex and evolving built environments of major urban centers. An imperative step forward would be the refinement of the existing LCZ schema by incorporating additional sub-categories, such as a "skyscraper" LCZ, to more accurately reflect distinctive urban landscapes and enhance the fidelity and realism of 3D urban morphology simulations [54].
Furthermore, this study recognized methodological limitations due to the use of non-independent samples for model validation, which might have compromised the comprehensive evaluation of model effectiveness. Future research should aim to integrate independent validation samples to strengthen the robustness and external validity of the simulation results. Moreover, there is a growing inclination towards utilizing advanced deep learning models, including 3DGAN, StyleGAN, and Stable Diffusion, for the reconstruction and planning of 3D urban morphologies. Future studies could leverage and extend the current findings by incorporating a wider array of GAN algorithms, thereby enhancing the reliability and precision of these innovative technological approaches.
This study employed the Pix2pix deep learning framework to explore the complex interactions between 3D urban morphology and local climate, aiming to provide critical insights for sustainable development strategies in major Chinese cities. Despite challenges in obtaining high-resolution remote sensing imagery and limitations on generating dynamic models within 3D platforms, the paper demonstrated an effective integration of urban morphology with LCZ classifications. By using both qualitative and quantitative parameters, the study confirmed Pix2pix's outstanding predictive accuracy and robust modeling capabilities for LCZ analysis in the selected Chinese cities. Although LCZ classification faced certain challenges due to gaps in data on specific building footprints and vegetation coverage, the Pix2pix model demonstrated excellent overall effectiveness following rigorous validation protocols. With an overall accuracy of 85.2%, a Kappa coefficient of 0.83, and an F1 score of 0.86, the model underscored its strong capabilities and adaptability in advanced applications, offering potential support for mitigating urban heat island effects. Additionally, when applied to a transit-oriented development (TOD) project in Zhuhai, the model proved its potential as a sophisticated and practical tool for urban planners and policymakers, supporting efforts to optimize urban structures and reduce the impacts of urban heat islands. The potential of the model could be further expanded by incorporating a wider range of dynamic urban factors, such as transportation systems and socio-economic data, which could provide deeper insights into the prediction of urban morphology. Additionally, integrating real-time data from various sensors and satellite imagery would enhance the model's adaptability to rapidly evolving urban environments.
Future studies could also explore combining Pix2pix with other advanced machine learning techniques to refine its ability to predict and optimize urban forms in response to climate challenges. These advancements would further strengthen the model's role in supporting sustainable urban planning and enhancing climate resilience.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/land14040755/s1, Figure S1: Data enhancement for pix2pix; Figure S2: Learning trajectory of the Pix2pix Model through loss metrics; Figure S3: Confusion matrix of urban classification across eight cities; Table S1: Accuracy of the eight Chinese cities’ LCZs classifications by WUDAPT Lv.0 method; Table S2: Urban morphological parameters for LCZs classification; Table S3: Values of surface properties for LCZs simplified from Stewart and Oke (2012); Table S4: Hyperparameter settings for surrogate model.

Author Contributions

Conceptualization, M.W. and Z.X.; methodology, Z.X.; software, Z.X.; validation, M.W.; formal analysis, Z.X.; investigation, J.Z.; resources, S.Z.; data curation, M.W.; writing—original draft preparation, Q.W.; writing—review and editing, M.W.; visualization, Z.X.; supervision, M.W. and S.Z.; project administration, Q.W.; funding acquisition, M.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Guangdong Basic and Applied Basic Research Foundation, China, grant number 2023A1515030158, and Guangzhou City School (Institute) Enterprise Joint Funding Project, China, grant number 2024A03J0317. The APC was funded by Guangdong Basic and Applied Basic Research Foundation, China, and Guangzhou City School (Institute) Enterprise Joint Funding Project, China.

Data Availability Statement

The study did not report any publicly archived datasets.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Meng, Y.; Wong, M.; Kwan, M.-P.; Pearce, J.; Feng, Z. Assessing multi-spatial driving factors of urban land use transformation in megacities: A case study of Guangdong–Hong Kong–Macao Greater Bay Area from 2000 to 2018. Geo-Spat. Inf. Sci. 2023, 27, 1090–1106. [Google Scholar] [CrossRef]
  2. Wang, M.; Sun, C.; Zhang, D. Opportunities and challenges in green stormwater infrastructure (GSI): A comprehensive and bibliometric review of ecosystem services from 2000 to 2021. Environ. Res. 2023, 236, 116701. [Google Scholar] [CrossRef]
  3. Mohtat, N.; Khirfan, L. The climate justice pillars vis-à-vis urban form adaptation to climate change: A review. Urban Clim. 2021, 39, 100951. [Google Scholar] [CrossRef]
  4. Banzhaf, E.; Hofer, R. Monitoring Urban Structure Types as Spatial Indicators with CIR Aerial Photographs for a More Effective Urban Environmental Management. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2008, 1, 129–138. [Google Scholar] [CrossRef]
  5. Yu, Z.; Jing, Y.; Yang, G.; Sun, R. A New Urban Functional Zone-Based Climate Zoning System for Urban Temperature Study. Remote Sens. 2021, 13, 251. [Google Scholar] [CrossRef]
  6. Stewart, I.D.; Oke, T. Local Climate Zones for Urban Temperature Studies. Bull. Am. Meteorol. Soc. 2012, 93, 1879–1900. [Google Scholar] [CrossRef]
  7. Yang, J.; Jin, S.; Xiao, X.; Jin, C.; Xia, J.; Li, X.; Wang, S. Local climate zone ventilation and urban land surface temperatures: Towards a performance-based and wind-sensitive planning proposal in megacities. Sustain. Cities Soc. 2019, 47, 101487. [Google Scholar] [CrossRef]
  8. Zhang, C.; Feng, Y.; Qiang, B.; Shang, J. Wasserstein Generative Recurrent Adversarial Networks for Image Generating. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 242–247. [Google Scholar]
  9. Cai, C.; Guo, Z.; Zhang, B.; Wang, X.; Li, B.; Tang, P. Urban Morphological Feature Extraction and Multi-Dimensional Similarity Analysis Based on Deep Learning Approaches. Sustainability 2021, 13, 6859. [Google Scholar] [CrossRef]
  10. Hua, J.; Shi, Y.; Ren, C.; Lau, K.K.-L.; Ng, E.Y.Y. Impact of Urban Overheating and Heat-Related Mortality in Hong Kong. In Urban Overheating: Heat Mitigation and the Impact on Health; Aghamohammadi, N., Santamouris, M., Eds.; Springer Nature: Singapore, 2022; pp. 275–292. [Google Scholar]
  11. Yang, J.; Kwon, Y. Novel CNN-Based Approach for Reading Urban Form Data in 2D Images: An Application for Predicting Restaurant Location in Seoul, Korea. ISPRS Int. J. Geo-Inf. 2023, 12, 373. [Google Scholar] [CrossRef]
  12. Zhou, Z.; Chen, Y.; Wang, Z.; Lu, F. Integrating cellular automata with long short-term memory neural network to simulate urban expansion using time-series data. Int. J. Appl. Earth Obs. Geoinf. 2024, 127, 103676. [Google Scholar] [CrossRef]
  13. Mori, M.; Fujioka, T.; Katsuta, L.; Kikuchi, Y.; Oda, G.; Nakagawa, T.; Kitazume, Y.; Kubota, K.; Tateishi, U. Feasibility of new fat suppression for breast MRI using pix2pix. Jpn. J. Radiol. 2020, 38, 1075–1081. [Google Scholar] [CrossRef] [PubMed]
  14. Cira, C.-I.; Manso-Callejo, M.-Á.; Alcarria, R.; Fernández Pareja, T.; Bordel Sánchez, B.; Serradilla, F. Generative Learning for Postprocessing Semantic Segmentation Predictions: A Lightweight Conditional Generative Adversarial Network Based on Pix2pix to Improve the Extraction of Road Surface Areas. Land 2021, 10, 79. [Google Scholar] [CrossRef]
  15. Huang, C.; Zhang, G.; Yao, J.; Wang, X.; Calautit, J.K.; Zhao, C.; An, N.; Peng, X. Accelerated environmental performance-driven urban design with generative adversarial network. Build. Environ. 2022, 224, 109575. [Google Scholar] [CrossRef]
  16. González-Sabbagh, S.; Robles-Kelly, A.; Gao, S. Scene-cGAN: A GAN for underwater restoration and scene depth estimation. Comput. Vis. Image Underst. 2025, 250, 11. [Google Scholar] [CrossRef]
  17. Mokhtar, S.; Sojka, A.; Davila, C.C. Conditional generative adversarial networks for pedestrian wind flow approximation. In Proceedings of the 11th Annual Symposium on Simulation for Architecture and Urban Design, Vienna, Austria, 25–27 May 2020; p. 58. [Google Scholar]
  18. Zhou, S.; Wang, Y.; Jia, W.; Wang, M.; Wu, Y.; Qiao, R.; Wu, Z. Automatic responsive-generation of 3D urban morphology coupled with local climate zones using generative adversarial network. Build. Environ. 2023, 245, 110855. [Google Scholar] [CrossRef]
  19. Guo, R.; Leng, H.; Yuan, Q.; Song, S. Impact of urban form on carbon emissions of residents in counties: Evidence from Yangtze River Delta, China. Environ. Sci. Pollut. Res. 2024, 31, 56332–56349. [Google Scholar] [CrossRef] [PubMed]
  20. Zhang, Q.; Yuan, Q.; Li, J.; Shen, H.; Zhang, L. Cloud and Shadow Removal for Sentinel-2 by Progressively Spatiotemporal Patch Group Learning. In Proceedings of the IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 775–778. [Google Scholar]
  21. Wang, R.; Ren, C.; Xu, Y.; Lau, K.K.-L.; Shi, Y. Mapping the local climate zones of urban areas by GIS-based and WUDAPT methods: A case study of Hong Kong. Urban Clim. 2018, 24, 567–576. [Google Scholar] [CrossRef]
  22. Verdonck, M.-L.; Okujeni, A.; van der Linden, S.; Demuzere, M.; De Wulf, R.; Van Coillie, F. Influence of neighbourhood information on ‘Local Climate Zone’ mapping in heterogeneous cities. Int. J. Appl. Earth Obs. Geoinf. 2017, 62, 102–113. [Google Scholar] [CrossRef]
  23. Hammerberg, K.; Brousse, O.; Martilli, A.; Mahdavi, A. Implications of employing detailed urban canopy parameters for mesoscale climate modelling: A comparison between WUDAPT and GIS databases over Vienna, Austria. Int. J. Climatol. 2018, 38, e1241–e1257. [Google Scholar] [CrossRef]
  24. Bechtel, B.; Alexander, P.J.; Beck, C.; Böhner, J.; Brousse, O.; Ching, J.; Demuzere, M.; Fonte, C.; Gál, T.; Hidalgo, J.; et al. Generating WUDAPT Level 0 data—Current status of production and evaluation. Urban Clim. 2019, 27, 24–45. [Google Scholar] [CrossRef]
  25. Pandey, S.; van Nistelrooij, M.; Maasakkers, J.D.; Sutar, P.; Houweling, S.; Varon, D.J.; Tol, P.; Gains, D.; Worden, J.; Aben, I. Daily detection and quantification of methane leaks using Sentinel-3: A tiered satellite observation approach with Sentinel-2 and Sentinel-5p. Remote Sens. Environ. 2023, 296, 113716. [Google Scholar] [CrossRef]
  26. Zhao, N.; Ma, A.; Zhong, Y.; Zhao, J.; Cao, L. Self-Training Classification Framework with Spatial-Contextual Information for Local Climate Zones. Remote Sens. 2019, 11, 2828. [Google Scholar] [CrossRef]
  27. Vaidya, M.; Keskar, R.; Kotharkar, R. Classifying heterogeneous urban form into local climate zones using supervised learning and greedy clustering incorporating Landsat dataset. Urban Clim. 2024, 53, 101770. [Google Scholar] [CrossRef]
  28. Pan, T.; Chen, J.; Zhang, T.; Liu, S.; He, S.; Lv, H. Generative adversarial network in mechanical fault diagnosis under small sample: A systematic review on applications and future perspectives. ISA Trans. 2022, 128, 1–10. [Google Scholar] [CrossRef] [PubMed]
  29. Huang, F.; Jiang, S.; Zhan, W.; Bechtel, B.; Liu, Z.; Demuzere, M.; Huang, Y.; Xu, Y.; Ma, L.; Xia, W.; et al. Mapping local climate zones for cities: A large review. Remote Sens. Environ. 2023, 292, 113573. [Google Scholar] [CrossRef]
  30. Touya, G.; Lokhat, I. Deep Learning for Enrichment of Vector Spatial Databases: Application to Highway Interchange. ACM Trans. Spat. Algorithms Syst. 2020, 6, 21. [Google Scholar] [CrossRef]
  31. Laugros, A.; Caplier, A.; Ospici, M. Are Adversarial Robustness and Common Perturbation Robustness Independent Attributes? In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 1045–1054. [Google Scholar]
  32. Jin, D.; Qi, J.; Huang, H.; Li, L. Combining 3D Radiative Transfer Model and Convolutional Neural Network to Accurately Estimate Forest Canopy Cover from Very High-Resolution Satellite Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 10953–10963. [Google Scholar] [CrossRef]
  33. Cui, S.; Wang, X.; Yang, X.; Hu, L.; Jiang, Z.; Feng, Z. Mapping Local Climate Zones in the Urban Environment: The Optimal Combination of Data Source and Classifier. Sensors 2022, 22, 6407. [Google Scholar] [CrossRef]
  34. Fan, C.; Zou, B.; Li, J.; Wang, M.; Liao, Y.; Zhou, X. Exploring the relationship between air temperature and urban morphology factors using machine learning under local climate zones. Case Stud. Therm. Eng. 2024, 55, 104151. [Google Scholar] [CrossRef]
  35. Mouzourides, P.; Eleftheriou, A.; Kyprianou, A.; Ching, J.; Neophytou, M.K.A. Linking local-climate-zones mapping to multi-resolution-analysis to deduce associative relations at intra-urban scales through an example of Metropolitan London. Urban Clim. 2019, 30, 100505. [Google Scholar] [CrossRef]
  36. Kang, X.; Liu, L.; Ma, H. ESR-GAN: Environmental Signal Reconstruction Learning with Generative Adversarial Network. IEEE Internet Things J. 2020, 8, 636–646. [Google Scholar] [CrossRef]
  37. Zhang, Y.; Li, Y.; Zhou, X.; Kong, X.; Luo, J. Curb-GAN: Conditional Urban Traffic Estimation through Spatio-Temporal Generative Adversarial Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 842–852. [Google Scholar]
  38. Pazhani, A.A.J.; Periyanayagi, S. A novel haze removal computing architecture for remote sensing images using multi-scale Retinex technique. Earth Sci. Inform. 2022, 15, 1147–1154. [Google Scholar] [CrossRef]
  39. Zheng, J.; Liu, X.Y.; Wang, X. Single Image Cloud Removal Using U-Net and Generative Adversarial Networks. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6371–6385. [Google Scholar] [CrossRef]
  40. Liu, L.; Luo, Y.; Shen, X.; Sun, M.; Li, B. β-Dropout: A Unified Dropout. IEEE Access 2019, 7, 36140–36153. [Google Scholar] [CrossRef]
  41. Parkes, E.J. Observations on the tanh–coth expansion method for finding solutions to nonlinear evolution equations. Appl. Math. Comput. 2010, 217, 1749–1754. [Google Scholar] [CrossRef]
  42. Sutanto, A.R.; Kang, D.-K. A Novel Diminish Smooth L1 Loss Model with Generative Adversarial Network. In Proceedings of the Intelligent Human Computer Interaction, Daegu, Republic of Korea, 24–26 November 2020; pp. 361–368. [Google Scholar]
Figure 1. The framework and its four components: (a) Dataset Preparation; (b) Pix2pix Model Training and Performance Evaluation; (c) Robustness Testing and Output Evaluation; (d) 3D Urban Form Prediction and Dynamic Response.
Figure 2. Spatial distribution of selected study cities. (1) Guangzhou, (2) Shenzhen, (3) Qingdao, (4) Fuzhou, (5) Changsha, (6) Wuhan, (7) Shanghai, and (8) Hangzhou.
Figure 3. LCZs for eight cities.
Figure 4. Typical street classifications in the eight cities.
Figure 5. Structural blueprint of the Pix2pix Model.
Figure 6. Comparative efficacy analysis of Pix2pix Model implementations.
Figure 7. Synthesized urban plans and 3D Models for the eight cities.
Figure 8. Dynamic Adaptation of 3D Urban Models to Different LCZ Classifications and Urban Heat Island Effects. This figure demonstrates the responsiveness of 3D urban models to changes in LCZ classifications, road network configurations, and other variables, including (a) the baseline ground truth, (b) updated urban planning and 3D models resulting from LCZ changes, and (c) the impact of road network adjustments on planning outcomes. The temperature heatmap displays the temperature distribution across different urban forms, highlighting areas where summer midday temperatures exceed 30 °C and winter nighttime temperatures surpass 21.5 °C.
Figure 9. Predictive modeling of transit-oriented development in the vicinity of Zhuhai North Railway Station: (1) Large Lawn Resting Space; (2) Zhuhai North Station; (3) Central Business District; (4) High-rise Residential Area.
Table 1. Performance evaluation of Pix2pix models.

City Sample    SSIM     R2       PSNR
Guangzhou      0.617    0.780    13.905
Wuhan          0.580    0.787    14.115
Shenzhen       0.712    0.805    14.540
Shanghai       0.630    0.814    13.752
Fuzhou         0.723    0.862    16.029
Hangzhou       0.660    0.832    15.001
Changsha       0.653    0.839    15.229
Qingdao        0.611    0.795    14.165
Average        0.648    0.814    14.592
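As a consistency check on Table 1, PSNR and RMSE are directly related for images scaled to a known peak value: PSNR = 20 · log10(MAX / RMSE). The following pure-Python sketch (the function names `rmse` and `psnr` are illustrative, not taken from the study's code) shows the computation under the assumption of unit-scaled pixel values:

```python
import math

def rmse(pred, target):
    """Root-mean-square error between two equal-length pixel sequences."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; pixels assumed in [0, max_val]."""
    e = rmse(pred, target)
    return float("inf") if e == 0 else 20.0 * math.log10(max_val / e)
```

For unit-scaled images, an RMSE of 0.187 (the value reported in the abstract) corresponds to a PSNR of about 14.56 dB, broadly in line with the 14.592 average in Table 1.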
Table 2. Accuracy of the eight cities' LCZ classifications.

City Name    OA (%)    OAurb (%)    OAnat (%)    Kappa    F1 Score
Guangzhou    75.6      77.3         73.8         0.72     0.74
Wuhan        80.8      79.7         78.2         0.77     0.79
Shenzhen     79.2      80.2         79.4         0.81     0.83
Shanghai     86.5      83.4         87.1         0.83     0.86
Fuzhou       83.2      81.1         84.5         0.86     0.88
Hangzhou     83.1      82.8         85.3         0.84     0.85
Changsha     81.3      79.8         82.2         0.80     0.82
Qingdao      82.2      80.7         83.1         0.82     0.84
Average      81.5      80.6         81.7         0.81     0.83
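The agreement metrics in Table 2 can be reproduced from a per-city confusion matrix. The sketch below (pure Python; the matrix convention of rows = ground truth, columns = predicted class and the function names are assumptions for illustration) computes overall accuracy, Cohen's kappa, and macro-averaged F1:

```python
def overall_accuracy(cm):
    """Fraction of samples on the confusion-matrix diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohens_kappa(cm):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total
    # Expected chance agreement from row (truth) and column (prediction) marginals.
    p_e = sum(sum(cm[i]) * sum(cm[j][i] for j in range(n)) for i in range(n)) / (total * total)
    return (p_o - p_e) / (1 - p_e)

def macro_f1(cm):
    """Unweighted mean of per-class F1 scores."""
    n = len(cm)
    f1s = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, truly other
        fn = sum(cm[c]) - tp                        # truly c, predicted other
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / n
```

For example, a two-class matrix [[40, 10], [5, 45]] gives an overall accuracy of 0.85 and a kappa of 0.70, the same order of agreement reported for the eight cities.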

Share and Cite

MDPI and ACS Style

Wang, M.; Xiong, Z.; Zhao, J.; Zhou, S.; Wang, Q. Pix2Pix-Based Modelling of Urban Morphogenesis and Its Linkage to Local Climate Zones and Urban Heat Islands in Chinese Megacities. Land 2025, 14, 755. https://doi.org/10.3390/land14040755