Typological Transcoding Through LoRA and Diffusion Models: A Methodological Framework for Stylistic Emulation of Eclectic Facades in Krakow
Abstract
1. Introduction
2. Literature Review
2.1. Stylistic Emulation in Architectural Heritage
2.2. Development of Generative AI in Architecture
2.3. AI Applications in Architectural Generation
2.4. Research Gap and Proposed Approach
3. Materials and Methods
3.1. Case-Study Selection: Krakow’s Eclectic Facades
3.2. Image Data Acquisition and Preprocessing
3.2.1. Initial Collection and Screening Criteria for Image Samples
3.2.2. Typology-Based Label Generation and Keyword Optimization
- Architectural Attributes: For example, historical residences, public buildings;
- Architectural Style: For example, Krakow eclectic architecture (predominantly neo-Renaissance style), ornate balconies, Baroque-style window decoration;
- Material Attributes: For example, red brick, beige stone, white window frames;
- Facade Composition: For example, symmetrical composition, classical segmental composition, a row of multi-story buildings;
- Detailed Style Classification: For example, orders (column types), cornices, pediments, moldings, spandrels (pier/window infill), corbels;
- Image Perspective: For example, front elevation, low-angle view (looking up at the buildings);
- Impurity Labels: For example, the sky is blue with some clouds, the street is lined with parked cars and bicycles.
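To make the captioning step concrete, the sketch below assembles keywords from the label categories above into per-image caption files, the plain-text layout most LoRA training toolkits expect (one comma-separated .txt file per image). The keyword pools, trigger word, and file layout are illustrative assumptions, not the exact captions or pipeline used in this study.

```python
from pathlib import Path

# Hypothetical keyword pools, one per label category above; the concrete strings
# are illustrative examples, not the exact captions used in the study.
LABEL_POOLS = {
    "architectural_attributes": ["historical residence"],
    "architectural_style": ["Krakow eclectic architecture", "neo-Renaissance style",
                            "ornate balconies", "Baroque-style window decoration"],
    "material_attributes": ["red brick", "beige stone", "white window frames"],
    "facade_composition": ["symmetrical composition", "a row of multi-story buildings"],
    "detailed_style": ["cornices", "pediments", "moldings", "corbels"],
    "image_perspective": ["front elevation"],
    "impurity_labels": ["the sky is blue with some clouds",
                        "the street is lined with parked cars and bicycles"],
}


def build_caption(trigger_word: str = "krk_eclectic") -> str:
    """Join a trigger word and the typology-based keywords into one
    comma-separated caption string."""
    keywords = [kw for pool in LABEL_POOLS.values() for kw in pool]
    return ", ".join([trigger_word] + keywords)


def write_captions(image_dir: str) -> None:
    """Write one <image-stem>.txt caption file beside every .jpg training image,
    the convention most LoRA trainers read."""
    for img in Path(image_dir).glob("*.jpg"):
        img.with_suffix(".txt").write_text(build_caption(), encoding="utf-8")


if __name__ == "__main__":
    print(build_caption())
```

Keyword order and redundancy in such captions can then be refined during the keyword-optimization pass described in this subsection.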
3.3. Typological Transcoding Framework
3.3.1. Brief Introduction to Diffusion Models and LoRA Technology
- Diffusion Models
- Low-Rank Adaptation
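As a minimal illustration of the second item, the NumPy sketch below shows the low-rank update that LoRA adds to a frozen weight matrix, y = x(W + (α/r)·AB), where only the low-rank factors A and B are trained; dimensions and initial values are toy assumptions for demonstration.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, rank):
    """y = x @ (W + (alpha / rank) * (A @ B)); W stays frozen, while only the
    low-rank factors A and B receive gradient updates during fine-tuning."""
    delta_W = (alpha / rank) * (A @ B)
    return x @ (W + delta_W)

# Toy dimensions; in a diffusion backbone, W would be an attention or
# projection weight inside the denoising network.
d_in, d_out, rank, alpha = 64, 64, 8, 8.0
rng = np.random.default_rng(0)
W = rng.normal(size=(d_in, d_out))         # frozen pretrained weight
A = rng.normal(size=(d_in, rank)) * 0.01   # trainable down-projection (small init)
B = np.zeros((rank, d_out))                # trainable up-projection (zero init)
x = rng.normal(size=(1, d_in))
print(lora_forward(x, W, A, B, alpha, rank).shape)  # (1, 64)
```

Because A·B has rank at most r, each adapted layer adds only r·(d_in + d_out) trainable parameters instead of d_in·d_out, which is what makes LoRA fine-tuning lightweight.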
3.3.2. LoRA Model Training Workflow and Key Parameter Regulation
- Learning Rate: This hyperparameter dictates the step size of model weight updates during training. An excessively high learning rate can destabilize training or cause divergence, whereas an overly low rate significantly prolongs training and risks entrapment in local optima. Consistent with common LoRA fine-tuning practice and prioritizing model stability, this study explored learning rates within a relatively narrow range (e.g., 1 × 10−5 to 1 × 10−4), aiming to capture the nuanced characteristics of the target style while ensuring stable convergence [42].
- Total Training Steps: This parameter is determined by the image count, the number of repetitions per image, the total training epochs, and the batch size (a short calculation sketch follows this list), and it governs how deeply the model learns from the training data. When training on complex styles such as Krakow’s Eclecticism, balancing underfitting and overfitting is paramount [43,44,45]. Underfitting occurs when the model fails to adequately learn the stylistic elements, whereas overfitting involves excessive memorization of training-sample details, impairing generalization. Too few training steps can yield generated facades that lack typical stylistic details; conversely, too many steps may lead the model to merely reproduce specific buildings from the training set, limiting its flexible application in novel design contexts. A critical aspect of parameter tuning in this study therefore involved judicious planning of the total training steps, coupled with the subsequent selection of optimal model checkpoints based on rigorous evaluation.
- Loss Value Monitoring: The loss value is a metric quantifying the discrepancy between the model’s predictions and the ground truth data. It directly reflects the model’s training efficacy. During training, a diminishing loss value typically indicates that the model’s predictions are aligning more closely with actual observations. Consequently, the monitoring and optimization of the loss value are linked to the model’s learning efficiency and the quality of the generated images. For architectural style generation tasks, particularly in rendering the detailed nuances of Krakow’s Eclecticism, optimizing the loss value is crucial. This ensures that the model captures the fine-grained characteristics of the architectural style, facilitating the generation of more realistic and precise design imagery.
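As a rough guide to how the step budget is typically derived, the sketch below reproduces the step-count formula used by common LoRA trainers (e.g., the kohya-ss sd-scripts convention); the dataset size and repeat count shown are placeholder values rather than figures reported in this study, while the epochs and batch size follow the training configuration in Appendix A.

```python
import math

def total_training_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps as commonly computed by LoRA trainers:
    ceil(num_images * repeats / batch_size) steps per epoch, times epochs."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs

# Dataset size and repeat count are placeholder values (not reported figures);
# epochs = 20 and batch size = 4 follow the configuration in Appendix A.
print(total_training_steps(num_images=200, repeats=5, epochs=20, batch_size=4))  # 5000
```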
3.3.3. The Guiding Role of Typological Theory in Training and Inference Processes
3.4. Ethical Considerations
4. Analysis and Results
4.1. Influence of LoRA Parameters on Stylistic Generation
LoRA Loss and Weight Tuning for Style Transfer
4.2. Comparative Evaluation of Model Performance
4.2.1. Quantitative Metrics Analysis
- FID Improvement (Lower is Better): The FLUX-LORA model achieved an FID value of 90.48. This represents an approximate improvement of 28.4% over the SD3.5 model’s score of 126.42, and a 24.6% improvement over the base FLUX’s score of 119.96. These results indicate that the image set generated by FLUX-LORA exhibits an overall feature distribution more closely aligned with that of authentic Krakow Eclectic building facades.
- LPIPS Improvement (Lower is Better): The FLUX-LORA model attained an LPIPS value of 0.5904. This signifies an approximate improvement of 11.0% over the SD3.5 model’s score of 0.6636, and a 7.0% improvement compared to the base FLUX’s score of 0.6349. This suggests a higher fidelity in reproducing both fine details and overall stylistic characteristics.
- PSNR Improvement (Higher is Better): The FLUX-LORA model registered a PSNR value of 10.1488 dB, marking an approximate increase of 6.8% compared to the SD3.5 model’s score of 9.4979 dB and demonstrating superior pixel-level image fidelity.
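The relative improvements quoted above follow directly from the reported scores; the short check below recomputes them, using (baseline − candidate)/baseline for lower-is-better metrics and (candidate − baseline)/baseline for higher-is-better metrics.

```python
def relative_improvement(baseline: float, candidate: float, higher_is_better: bool = False) -> float:
    """Relative improvement of `candidate` over `baseline`, in percent."""
    if higher_is_better:
        return 100.0 * (candidate - baseline) / baseline
    return 100.0 * (baseline - candidate) / baseline

# FLUX-LoRA vs. SD3.5 and base FLUX, using the scores reported above.
print(round(relative_improvement(126.42, 90.48), 1))    # FID vs. SD3.5   -> 28.4
print(round(relative_improvement(119.96, 90.48), 1))    # FID vs. FLUX    -> 24.6
print(round(relative_improvement(0.6636, 0.5904), 1))   # LPIPS vs. SD3.5 -> 11.0
print(round(relative_improvement(0.6349, 0.5904), 1))   # LPIPS vs. FLUX  -> 7.0
print(round(relative_improvement(9.4979, 10.1488, higher_is_better=True), 1))  # PSNR vs. SD3.5 -> 6.9 (≈6.8% as reported)
```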
4.2.2. Qualitative Evaluation by Expert Panel
4.3. Comparison with Previous Studies
5. Discussion
5.1. Interpreting the Stylistic Learning Process: A Typological Perspective
5.2. Methodological Contributions and Practical Implications
5.3. Limitations and Future Research Directions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Full Term |
|---|---|
| AI | Artificial Intelligence |
| AIGC | AI-Generated Content |
| CNN | Convolutional Neural Network |
| CLIP | Contrastive Language–Image Pretraining |
| DDPM | Denoising Diffusion Probabilistic Model |
| FID | Fréchet Inception Distance |
| GANs | Generative Adversarial Networks |
| LDM | Latent Diffusion Model |
| LoRA | Low-Rank Adaptation |
| LPIPS | Learned Perceptual Image Patch Similarity |
| PSNR | Peak Signal-to-Noise Ratio |
| SD | Stable Diffusion |
Appendix A
References
- Petzet, M.; Ziesemer, J. (Eds.) International Charters for Conservation and Restoration/Chartes Internationales sur la Conservation et la Restauration/Cartas Internacionales Sobre la Conservacion y la Restauracion; Monuments & Sites 1; ICOMOS: Paris, France; Lipp GmbH: Munich, Germany, 2004; ISBN 3874906760. [Google Scholar]
- Jokilehto, J. A History of Architectural Conservation, 2nd ed.; Routledge: London, UK, 2017; ISBN 9781138639997. [Google Scholar]
- Boccardi, G. Authenticity in the heritage context: A reflection beyond the Nara Document. Hist. Environ. Policy Pract. 2019, 10, 4–18. [Google Scholar] [CrossRef]
- He, M.; Qi, J. Study on the theory of Rafael Moneo architectural typology. IOP Conf. Ser. Mater. Sci. Eng. 2019, 592, 012105. [Google Scholar] [CrossRef]
- Plevoets, B.; Van Cleempoel, K. Adaptive Reuse of the Built Heritage: Concepts and Cases of an Emerging Discipline; Routledge: London, UK, 2019; p. 256. ISBN 9781138062764. [Google Scholar]
- Plevoets, B.; Van Cleempoel, K. Adaptive reuse as a strategy towards conservation of cultural heritage: A literature review. WIT Trans. Built Environ. 2011, 118, 155–164. [Google Scholar] [CrossRef]
- Tang, Q.; Zheng, L.; Chen, Y.; Chen, J.; Yang, S. Innovative design method for Lingnan region veranda architectural heritage (Qi-Lou) façades based on computer vision. Buildings 2025, 15, 368. [Google Scholar] [CrossRef]
- Yuan, F.; Xu, X.; Wang, Y. Toward the era of generative-AI-augmented design. Archit. J. China 2023, 659, 14–20. (In Chinese) [Google Scholar] [CrossRef]
- Yang, J.; Tan, M.; Chen, X.; Lin, Z.; Jiang, X. Exploration of theories and technical mechanisms for smart city planning. J. Southeast Univ. Nat. Sci. Ed. 2024, 54, 1066–1079. (In Chinese) [Google Scholar]
- Csiszár, I. The method of types. IEEE Trans. Inf. Theory 1998, 44, 2505–2523. [Google Scholar] [CrossRef]
- Rossi, A. The Architecture of the City; MIT Press: Cambridge, MA, USA, 1984. [Google Scholar]
- Viollet-le-Duc, E.-E. Dictionnaire Raisonné De L’architecture Française Du XIe Au XVIe Siècle; Bance: Paris, France, 1854; Volume 1. [Google Scholar]
- Bressani, M. Architecture and the Historical Imagination: Eugène-Emmanuel Viollet-le-Duc, 1814–1879; Routledge: London, UK, 2016. [Google Scholar]
- Bressani, M. Notes on Viollet-le-Duc’s philosophy of history: Dialectics and technology. J. Soc. Archit. Hist. 1989, 48, 327–350. [Google Scholar] [CrossRef]
- Zhong, H.; Wang, L.; Zhang, H. The application of virtual reality technology in the digital preservation of cultural heritage. Comput. Sci. Inf. Syst. 2021, 18, 535–551. [Google Scholar] [CrossRef]
- Selmanović, E.; Rizvic, S.; Harvey, C.; Boskovic, D.; Hulusic, V.; Chahin, M.; Sljivo, S. Improving accessibility to intangible cultural heritage preservation using virtual reality. J. Comput. Cult. Herit. JOCCH 2020, 13, 13. [Google Scholar] [CrossRef]
- Poyck, G. Procedural City Generation with Combined Architectures for Real-Time Visualization. Master’s Thesis, Clemson University, Clemson, SC, USA, 2023. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
- Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2256–2265. [Google Scholar]
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef]
- Hinton, G.E. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 2017, 30, 5767–5777. [Google Scholar] [CrossRef]
- Brock, A.; Donahue, J.; Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. arXiv 2018, arXiv:1809.11096. [Google Scholar]
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Bachl, M.; Ferreira, D.C. City-GAN: Learning architectural styles using a custom conditional GAN architecture. arXiv 2021, arXiv:1907.05280. [Google Scholar]
- Dhariwal, P.; Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 10684–10695. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Zhang, L.; Rao, A.; Agrawala, M. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 3836–3847. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. In Proceedings of the 10th International Conference on Learning Representations (ICLR), Virtual Event, 25–29 April 2022. [Google Scholar]
- Huang, W.; Zheng, H. Architectural drawings recognition and generation through machine learning. In Proceedings of the 38th Annual Conference of the Association for Computer Aided Design in Architecture, Mexico City, Mexico, 17–20 October 2018; pp. 616–625. [Google Scholar]
- Nauata, N.; Chang, K.H.; Cheng, C.Y.; Mori, G.; Furukawa, Y. House-GAN: Relational generative adversarial networks for graph-constrained house layout generation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 162–177. [Google Scholar]
- Sun, C.; Zhou, Y.; Han, Y. Automatic generation of architecture façade for historical urban renovation using generative adversarial network. Build. Environ. 2022, 212, 108781. [Google Scholar] [CrossRef]
- Zhang, L.; Zheng, L.; Chen, Y.; Huang, L.; Zhou, S. CGAN-assisted renovation of the styles and features of street façades—A case study of the Wuyi area in Fujian, China. Sustainability 2022, 14, 16575. [Google Scholar] [CrossRef]
- Zhang, J.; Huang, Y.; Li, Z.; Li, Y.; Yu, Z.; Li, M. Development of a method for commercial style transfer of historical architectural façades based on stable diffusion models. J. Imaging 2024, 10, 165. [Google Scholar] [CrossRef] [PubMed]
- Xu, S.; Zhang, J.; Li, Y. Knowledge-driven and diffusion model-based methods for generating historical building façades: A case study of traditional Minnan residences in China. Information 2024, 15, 344. [Google Scholar] [CrossRef]
- Xu, Z.; Wang, L. Reshaping classicism—An Abnormal Landscape of Paestum and the rise of Neoclassicism. World Arch. 2023, 3, 110–115. (In Chinese) [Google Scholar]
- Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 2790–2799. [Google Scholar]
- Dietterich, T.G. Overfitting and undercomputing in machine learning. ACM Comput. Surv. 1995, 27, 326–327. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; ISBN 9780262035613. [Google Scholar]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30, 6626–6637. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
- Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition IEEE, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar] [CrossRef]
- Liu, J.; Huang, X.; Huang, T.; Chen, L.; Hou, Y.; Tang, S.; Liu, Z.; Ouyang, W.; Zuo, W.; Jiang, J.; et al. A comprehensive survey on 3D content generation. arXiv 2024, arXiv:2402.01166. [Google Scholar]
- Zhang, R.; Guo, Z.; Zhang, W.; Li, K.; Miao, X.; Cui, B.; Qiao, Y.; Gao, P.; Li, H. PointCLIP: Point cloud understanding by CLIP. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 5790–5800. [Google Scholar] [CrossRef]
- Wu, S.; Lin, Y.; Zhang, F.; Zeng, Y.; Xu, J.; Torr, P.; Cao, X.; Yao, Y. Direct3D: Scalable image-to-3D generation via 3D latent diffusion transformer. arXiv 2024, arXiv:2405.14832. [Google Scholar]
- Tochilkin, D.; Pankratz, D.; Liu, Z.; Huang, Z.; Letts, A.; Li, Y.; Liang, D.; Laforte, C.; Jampani, V.; Cao, Y.P. TriPoSR: Fast 3D object reconstruction from a single image. arXiv 2024, arXiv:2403.02151. [Google Scholar]
- Xiang, J.; Lv, Z.; Xu, S.; Deng, Y.; Wang, R.; Zhang, B.; Chen, D.; Tong, X.; Yang, J. Structured 3D latents for scalable and versatile 3D generation. arXiv 2025, arXiv:2412.01506. [Google Scholar]
| Parameter | Value |
|---|---|
| Model train type | flux-lora |
| Pretrained model | flux1-dev.safetensors |
| AE model | ae.sft |
| t5xxl model | t5xxl_fp16.safetensors |
| CLIP-L model | clip_l.safetensors |
| Timestep sampling | sigmoid |
| Model prediction type | raw |
| Loss type | l2 |
| Resolution | 1024 × 1024 |
| Save precision | bf16 |
| Epochs | 20 |
| Batch size | 4 |
| GPU equipped | NVIDIA RTX 4090 |
| Learning rate | 1 × 10−4 |
| UNet learning rate | 5 × 10−4 |
| Text-encoder learning rate | 1 × 10−5 |
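For context, a trained checkpoint matching this configuration could be applied at inference along the following lines. This is a hedged sketch using the Hugging Face diffusers FluxPipeline: the base model ID is the public FLUX.1-dev repository, the LoRA filename, prompt, and sampling settings are placeholders rather than the study’s reported configuration, and it assumes the exported LoRA is in a format diffusers can load.

```python
import torch
from diffusers import FluxPipeline

# Load the public FLUX.1-dev base model in bfloat16 (assumed checkpoint).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# Attach the fine-tuned LoRA weights; the filename is a placeholder.
pipe.load_lora_weights(".", weight_name="krakow_eclectic_lora.safetensors")
pipe.to("cuda")

# Prompt composed from the typology-based keywords described in Section 3.2.2.
prompt = (
    "Krakow eclectic architecture, neo-Renaissance facade, symmetrical composition, "
    "ornate balconies, cornices, pediments, beige stone, front elevation"
)
# Illustrative sampling settings, not the study's reported values.
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
image.save("krakow_facade_sample.png")
```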
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).