Formal Feature Identification of Vernacular Architecture Based on Deep Learning—A Case Study of Jiangsu Province, China

Han, Pingyi; Hu, Shenjian; Xu, Rui

doi:10.3390/su17041760

by Pingyi Han ¹, Shenjian Hu ¹,* and Rui Xu ²

¹ School of Architecture and Fine Art, Dalian University of Technology, Dalian 116024, China

² School of Computer Science & Technology, Henan Institute of Science and Technology, Xinxiang 453003, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(4), 1760; https://doi.org/10.3390/su17041760

Submission received: 26 November 2024 / Revised: 9 February 2025 / Accepted: 18 February 2025 / Published: 19 February 2025

Abstract

As an important form of sustainable architecture, vernacular architecture strongly influences both regional and contemporary architecture. Vernacular architecture is the traditional, natural way of building, shaped by necessary changes and continuous adjustment, and its formal characteristics have accumulated over a long process of sustainable development. However, most research on vernacular architecture and its formal features relies on qualitative analysis, which needs to be complemented by scientific, quantitative methods. Based on object detection, this paper proposes a quantitative model that can effectively recognize and detect the formal features of architecture. First, the Chinese traditional architecture image dataset (CTAID) is constructed and used to train the model. In each image, experts annotated the formal features “deep eave”, “zheng wen”, “gable”, and “long window”. Then, to identify the formal features of vernacular architecture in Jiangsu Province, the Jiangsu traditional vernacular architecture image dataset (JTVAID) is created as the object dataset. It contains images of vernacular architecture from three regions: northern, middle, and southern Jiangsu. The trained model is then applied to the object dataset to predict the architectural features of each region. The results show that the architectural features of northern, middle, and southern Jiangsu differ. The “deep eave”, “zheng wen”, “gable”, and “long window” features are all very prominent in the vernacular architecture of southern Jiangsu. Compared with middle Jiangsu, northern Jiangsu shows more evident “zheng wen” and “gable” features, with recognition rates of 45.8% and 27.5%, respectively, whereas the “deep eave” and “long window” features are more prominent in middle Jiangsu, with recognition rates of 50.9% and 73.5%, respectively. In addition, images of contemporary vernacular architecture projects in the Jiangsu region are input into the AOD R-CNN model proposed in this paper, and the results show that the model can effectively identify the stylistic features of Jiangsu vernacular architecture. The deep-learning-based approach proposed in this study can be used to identify the formal features of vernacular architecture and can also serve as an effective method for assessing regional features in the sustainable development of vernacular architecture.

Keywords:

vernacular architecture; deep learning; object detection; architectural form; sustainable design; architectural heritage

1. Introduction

Vernacular architecture has important historical and cultural value and plays an important role in cultural inheritance. It is not only tangible cultural heritage but also a carrier of intangible cultural heritage. Through its distinctive architectural styles, vernacular architecture expresses local ethnic characteristics and regional culture. It also contributes significantly to ecological conservation and cultural diversity: it exists in harmony with the natural environment and forms a distinctive local style, and its diversity reflects the different cultural traditions of each region and enriches cultural diversity.

Under the influence of globalization, architectural homogenization is becoming more pronounced, and the regional style of architectural form has again become a matter of concern. Because vernacular architecture has a distinctive regional style, it offers ideas for addressing the problem of architectural homogenization [1]. Vernacular architecture has long attracted architectural scholars, both because its harmonious coexistence with the natural environment embodies the concept of sustainability and because it carries a wealth of local information and has important historical and cultural value [2]. It is an important form of sustainable architecture and an important part of rural tangible cultural heritage. Over the course of sustainable development, vernacular architecture has recorded the history of local development and daily life and has formed distinctive formal features [3]. Its construction favors local resources and spontaneous building by local people: folk artisans typically build houses according to geographic conditions, ecological resources, and other factors, combined with traditional cultural concepts [4]. The form of vernacular architecture thus meets the needs of local life and production as well as traditional cultural concepts and aesthetic preferences [5].

In China, the construction techniques and water village culture of vernacular architecture in the Jiangsu region are influential in terms of their artistic value, cultural value, and environmental features. The Jiangsu region has well-preserved areas of vernacular architecture and many “Chinese famous towns and villages in history and culture”, such as the famous Tongli Ancient Town and Zhouzhuang Ancient Town. Both have retained a great deal of their vernacular architectural heritage, built during the Ming and Qing dynasties, and these architectural forms fully display the regional style of China’s Jiangnan water towns. The history of Tongli Ancient Town can be traced back to the Songze and Liangzhu cultures 5000 to 6000 years ago, and in 2000 the “Tuisi Garden” in Tongli was inscribed on the World Heritage List. The history of Zhouzhuang Ancient Town can be traced back to the first year of the Yuanyou era of the Northern Song Dynasty (1086), more than 900 years ago [6]. Thanks to convenient water transport, Zhouzhuang gradually developed into a center of commerce and culture in the southern Jiangnan region and an important meeting place of the Wu-Yue culture. In 2003, Zhouzhuang received the UNESCO Award for the Protection of World Cultural Heritage in the Asia-Pacific Region. Therefore, this paper takes vernacular architecture in the historical villages and towns of Jiangsu Province as its main research object.

Research on “architectural form” is now well established. In existing research, architectural form is often associated with symbolism, meaning, and other elements [7]. A related line of work is “architectural typology”, which proposes the concepts of “archetype” and “type” [8], and the meaning of architectural forms has also been analyzed from a “semiotic” point of view [9]. Research on the formal features of vernacular architecture focuses on the stylistic features of each region, and these features depend largely on expert refinement and summarization [10]. All of these studies are qualitative. For example, the Jiangnan architectural style in China is characterized by white walls and grey tiles, but it is difficult to state precisely how white the walls or how grey the tiles are. Studying the formal features of vernacular architecture with deep learning therefore differs from the previous qualitative approach.

The continuing maturation of artificial intelligence technology provides more convenient and scientific tools for vernacular architecture research. Artificial intelligence techniques, especially deep learning, now support more in-depth and comprehensive studies of vernacular architecture [11]. Deep learning has greatly advanced object detection [12], an important task in computer science whose aim is to identify specific targets in an image and determine their location and category [13]. Bao et al. used a deep learning object detection algorithm to establish a classification system for vernacular architecture in China. Deep learning has a wide range of applications in the construction field. Guo proposed a model for generating architectural micro-visual morphology by improving a generative adversarial network (GAN), which gives designers more ideas [14]. Deep learning also excels at classifying architectural styles, enabling the study of traditional architecture in a variety of styles and the categorization of the design work of different architects [1,15]. From the perspective of architectural heritage, digital heritage preservation supported by deep learning is becoming an inevitable trend [16,17,18]. In the architectural field, deep learning is mainly applied through its image classification [19] and object detection [20,21] capabilities.
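Because an object detector must get both the category and the location right, a predicted box is usually compared with an annotated box using intersection over union (IoU). The sketch below makes that concrete. It assumes a 0.5 IoU threshold, which is a common convention (for example in PASCAL VOC), and a hypothetical `Detection` record. This section does not state the matching criterion used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical record): a category label, an
    axis-aligned box from (x1, y1) top-left to (x2, y2) bottom-right in
    pixels, and a confidence score."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    score: float = 1.0

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def matches(pred: Detection, truth: Detection, threshold: float = 0.5) -> bool:
    """Count a prediction as correct only if category AND location agree
    (the 0.5 threshold is an assumed convention)."""
    return pred.label == truth.label and iou(pred, truth) >= threshold


# Toy boxes, not data from this study.
truth = Detection("gable", 0, 0, 100, 100)
pred = Detection("gable", 50, 0, 150, 100, score=0.9)
print(round(iou(pred, truth), 3))  # 0.333
print(matches(pred, truth))        # False: right category, too little overlap
```

In the toy case the category is correct but only a third of the combined area is shared, so the prediction would not count as a correct detection.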

Therefore, this study uses a deep learning object detection algorithm to study the formal features of vernacular architecture and proposes the AOD R-CNN model. AOD R-CNN is a quantitative method based on the visual elements of vernacular architecture images, in contrast to the qualitative research of previous studies [22]. By adopting deep learning, this study seeks to provide new perspectives on the inheritance of regional features and the sustainable development of vernacular architecture. Its results can serve as a theoretical complement and methodological refinement for research on vernacular architecture and architectural form. The approach differs from previous analyses of architectural form features but can complement traditional research methods and findings. The proposed methodology is applicable not only to the formal features of vernacular architecture in the Jiangsu region but also to the analysis of the formal features of vernacular or regional architecture in other regions.
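This section does not describe the internals of AOD R-CNN. However, detectors in the R-CNN family, such as Faster R-CNN, commonly post-process their raw predictions with non-maximum suppression (NMS), which removes duplicate boxes for the same object. The sketch below shows generic greedy per-class NMS. It illustrates the standard technique and should not be read as the authors' implementation. The threshold and the toy boxes are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(boxes: Sequence[Box], scores: Sequence[float],
                    labels: Sequence[str], iou_threshold: float = 0.5) -> List[int]:
    """Greedy per-class NMS: visit boxes from highest to lowest score and keep
    a box unless it overlaps an already-kept box of the SAME class by at least
    iou_threshold. Returns kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        suppressed = any(
            labels[k] == labels[i] and iou(boxes[i], boxes[k]) >= iou_threshold
            for k in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


# Toy detections (hypothetical): two near-identical "long window" boxes, a
# separate window, and an overlapping box of a different class.
boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 0, 300, 100), (0, 0, 120, 120)]
scores = [0.9, 0.8, 0.7, 0.6]
labels = ["long window", "long window", "long window", "gable"]
print(class_aware_nms(boxes, scores, labels))  # [0, 2, 3]
```

Running NMS per class means an overlapping "gable" box is never suppressed by a "long window" box. This matters for facade images, where several features occupy the same region.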

2. Literature Review

2.1. Research on the Form of Vernacular Architecture

The study of “form” exists in many disciplines, and its definition varies from field to field. For the sustainable development of vernacular architecture, form is one of the central issues [23]. On the one hand, studying the formal features of vernacular architecture aims to foster awareness of sustainable development and to re-establish a model of living in harmony with nature. On the other hand, such studies involve people, architecture, culture, and the environment, especially in the case of architectural forms specific to their natural environment [24].

The study of vernacular architecture has its origins in the 1964 book Architecture without Architects by the famous American architect Bernard Rudofsky, which marks the beginning of research on vernacular architecture and its formal features. Through a series of case studies, the book showcases architectural forms that were gradually being forgotten and emphasizes their historical and cultural importance [25]. In 1969, the renowned American architect and anthropologist Amos Rapoport published House Form and Culture. Using case studies and fieldwork, the book examines the formation and typical features of architectural forms from the perspectives of anthropology and cultural geography [26]. In the late 1980s, Chinese architectural scholars set up a research group on vernacular architecture that took a cross-disciplinary perspective spanning architecture and sociology. Replacing the earlier study of “residential houses” with “vernacular architecture”, the group linked vernacular architecture to everyday life. Its members collated and summarized materials obtained from genealogies, inscriptions, and interviews, and conducted in-depth research on the formal features and the historical and cultural values of vernacular architecture [27]. From the perspective of vernacular architectural heritage, research should address vernacular architectural forms as a whole to achieve a comprehensive understanding of their conservation, revitalization, and development [28].

Research on vernacular architecture is now well developed, and researchers have paid increasing attention to the formal features of vernacular architecture in different regions. They explore how these features formed and developed from a wider range of perspectives, continually extending the field to lay a foundation for the preservation and sustainable development of vernacular architecture. Existing studies mostly combine field research with survey mapping and sampling to analyze the regional features of vernacular architectural forms. Drawing on grounded theory, the concept of the “gene”, and other ideas, they provide a theoretical and practical basis for the protection and inheritance of vernacular architecture [29,30]. Because vernacular architecture is regional and functional, its formal features are analyzed in terms of the specific characteristics and distribution of the area where the object of study is located [31,32]. Vernacular architectural forms are related to their function and construction techniques, and because different regions have different resources and climates, the construction methods used also vary. Grain silos, for example, did not remain in their original state during sustainable development but underwent a fundamental evolution in architectural form and technology [33]. In China, the formal features of vernacular architecture also differ by ethnicity. Tao et al. use Hakka vernacular architecture as an example to analyze the relationship between culture and architectural form [34]. Combining architectural form with aesthetic studies is also very common. Wei and Cho study the aesthetic value and morphogenesis of architectural morphologies using cognitive science and neuroaesthetics, evaluating architectural form aesthetically on the basis of human visual perception [35]. Studying the formal features of vernacular architecture can help summarize appropriate design methods, which influence both regional and contemporary architecture. Vernacular architectural form should be designed in light of site attributes, regional features, and architectural function. In addition, design approaches to energy, architectural form, and space creation can be explored through the sustainability of vernacular architecture [36].

In sum, most current research on the formal features of vernacular architecture relies on qualitative methods such as case studies and field research. Quantitative methods offer an alternative perspective on vernacular architecture and architectural form. Although they cannot replace traditional qualitative research, their results can complement traditional theoretical research and help it explore unknown aspects. With technological advances, research tools and methods are becoming more diverse, which facilitates the study of the formal features of vernacular architecture. This paper therefore uses deep learning to study the formal features of vernacular architecture from images of vernacular buildings, an approach that differs from previous research methods.

2.2. The Application of Deep Learning in the Field of Architecture

Deep learning is currently a major research focus in artificial intelligence. Its most important technical feature is the ability to extract features automatically, and the features it learns are far more representative than traditional hand-crafted ones. As deep learning matures and spreads, it opens up more possibilities for architectural research. Zou et al. applied machine learning to construct a recognition model for regional architectural form features, which serves as an effective tool for evaluating architectural form in urban renewal [37]. Zeppelzauer et al. proposed an intelligent method for automatically estimating the age of buildings using deep learning and visual pattern extraction [38]. With the continuing innovation of large language models, deep learning has progressed rapidly, and generative artificial intelligence techniques are now widely used in architectural design. Models are trained to generate images and derive architectural silhouettes that inform the design of facade forms [39,40]. In another line of work, a convolutional neural network is trained on morphological information encoded from synthetic data and then used for classification, offering a new approach to classifying and predicting architectural styles [41,42,43,44]. Classification can also be performed through image segmentation. An image of an architectural scene is parsed into regions, segmentation identifies the doors, windows, roofs, walls, and other parts of the building, and the facades in the image are then classified [45,46,47]. Architectural forms develop different features according to the settings and needs they serve. In digital architectural heritage preservation, deep learning can be used to identify significant elements in architectural images and to classify architectural heritage images [16,48].

As new technologies spread, the research paradigms for identifying and classifying architectural formal features have become more diverse. Alongside the development of different algorithmic models and recognition techniques, the shortage of training samples is gradually being addressed, which has become key to optimizing these techniques and improving accuracy [49]. In existing research, artificial intelligence techniques recognize and classify architectural images objectively, scientifically, and efficiently. However, traditional image recognition methods depend on high-quality image data and must be trained on large numbers of image samples to acquire feature information, which limits this research paradigm [50]. Overall, techniques for segmenting, classifying, and recognizing architectural images are maturing, but they focus on differences between buildings of the same or different types. A generic model that can effectively identify and extract the local features of a single building is still lacking [51]. In this paper, an architectural image dataset is constructed to train an object detection model, which is then used to detect formal features in images of vernacular architecture. The trained model is applied to the object dataset to derive the formal features of vernacular architecture. This study thus applies deep learning to vernacular architecture research, identifying its formal features through artificial intelligence and providing evaluation tools and design references for its preservation and sustainable development.

3. Research Area and Data

The application of deep learning in architecture is gradually maturing. The accuracy of image segmentation and object recognition achieved with AI has improved markedly, and analysis performed with neural network models can rival manual analysis [52]. This study aims to recognize important features in vernacular architecture images by training neural network models. The approach can serve as a quantitative tool for extracting the formal features of vernacular architecture and as an effective method for assessing local features during its sustainable development. In this study, images of vernacular architecture in Jiangsu Province are recognized, and the specific application of object detection is introduced and demonstrated. Finally, the formal features of vernacular architecture in the region are identified. The Chinese traditional architecture image dataset (CTAID) and the Jiangsu traditional vernacular architecture image dataset (JTVAID) are used to train and test the model, respectively.

3.1. Study Area

Jiangsu Province is located in the middle of China’s eastern coastal region and is one of China’s economically developed regions. Geographically, it is bordered by Anhui to the west, Zhejiang and Shanghai to the southeast, and Shandong Province to the north. The Changjiang (Yangtze) River, the Huaihe River, and the Yuntai Mountain Range in the north divide the province into several regions, forming the Sunan Plain, the Jianghuai Plain, the Huanghuai Plain, and the Eastern Coastal Plain. Among them, the Sunan (southern Jiangsu) Plain is centered on Taihu Lake, forming the distinctive Taihu Lake landscape [53]. With its favorable natural environment and climate, Jiangsu has been an ideal place to live since ancient times. Because the province straddles rivers and the coast and has many lakes, a dense water network, and abundant rainfall, its vernacular architecture has water village features. This favorable geography has also fostered rich cultural exchange and a long cultural history: the Chu and Han cultures in the north, the Jinling and Huaiyang cultures in the middle, the Wu culture in the south, and maritime culture in the east all meet here. Over a long period of historical evolution, vernacular architecture has adapted dynamically to the local environment and gradually formed architectural forms with regional features [54]. Even more valuable, the area’s vernacular architecture is well preserved, with thousands of waterfront homes still being handed down. Although the region is economically developed, its preservation and sustainable development of vernacular architecture emphasize retaining regional features and passing on local culture. Therefore, this study selects vernacular architecture in Jiangsu Province as the research object and analyzes and discusses its formal features to promote the protection and sustainable development of vernacular architecture.

Because Jiangsu Province lies at the crossroads of the Central Plains, Qilu, and Wu-Yue cultures, it is the junction of China’s northern and southern cultural spheres. Under the influence of these many complex factors, the form of vernacular architecture differs across the regions of the province. Based on the topographical features and cultural zoning of Jiangsu Province, this study divides it into three regions: northern, middle, and southern Jiangsu. The vernacular architecture of the historical and cultural villages and towns in each of the three regions is selected as the research object (Figure 1).

Table 1 lists the historical and cultural villages and towns in the three regions of Jiangsu Province. Chinese historical and cultural villages and towns are selected by the government and professional organizations. They are rich in cultural relics, have significant historical value or commemorative significance, and fully reflect the traditional style of their historical period and local ethnic features [55]. Therefore, this paper takes the vernacular architecture of historical and cultural villages and towns in Jiangsu Province as the research object. This helps to pass on traditional culture, protect traditional villages, towns, and historical features, and promote the continuation of traditional architectural features.

3.2. Overview of the Vernacular Architecture Form Features in Jiangsu Province

Jiangsu folk houses are an important reference and guide for the formal features of Jiangsu vernacular architecture, and the two are closely related. Therefore, this study takes “Jiangsu Folk Houses” as its main basis for refining the formal features of vernacular architecture in Jiangsu Province [56]. These formal features can be grouped into the following six categories.

(1) Deep Eave

In traditional Chinese architecture, the large roof is the most distinctive feature. Common traditional roof forms include the “wudian roof”, “xieshan roof”, “xuanshan roof”, “yingshan roof”, “juanpeng roof”, “cuanjian roof”, “shiziji roof”, “lu roof”, and “kui roof”. Of these, the “yingshan roof”, “xuanshan roof”, and “juanpeng roof” are often used in folk vernacular architecture. In ancient China, the choice of roof style was related to social rank. These roof designs produce the architectural feature of the “deep eave” (Figure 2). Such eaves keep out sun and rain and are both practical and aesthetically pleasing [57,58].

(2) Zheng Wen (Decoration at the two ends of the roof’s main ridge)

Zheng wen is an important component of traditional Chinese architectural roofs. It usually serves as decoration at the ends of the roof ridge and also helps hold the ridge in place. This decoration is often combined with elements such as upturned eave corners and flying eaves, producing a roof style with traditional features (Figure 3). Zheng wen is very common in both the official architecture and the folk vernacular architecture of ancient China [59,60].

(3) Gable

A gable usually refers to an exterior transverse wall, that is, a wall along the short axis of the building. In traditional architecture it serves mainly for separation and fire protection. In ancient times, people observed that fires usually spread upward from the base of the house to its pillars, so gables were designed to guard against fire. This form was perpetuated and produced staggered formal features. Traditional Chinese vernacular architecture has various types of gables that vary from region to region. In traditional northern vernacular architecture, gables are mostly used to enclose and separate spaces, whereas most gables in the south are overhanging gables (Figure 4). Examples include the Matou wall and Guanyindou of Jiangnan residences, the saddle walls of the Fuzhou area, and the Huoer walls of Lingnan residential architecture [61,62].

(4) Long Window

The Jiangsu region has a humid climate, so besides admitting light, the most important function of windows is ventilation. Windows on the facades of vernacular architecture in the Jiangsu region usually take the form of long windows, “half windows”, “hengfeng windows”, and “he windows”. Influenced by the layout of the facade, the windows are distributed in a balanced and symmetrical way. Long and half windows are generally opened in even numbers, and the exact number and the way they are opened are determined by the width of the facade. This arrangement has become one of the characteristic features of traditional vernacular facade forms (Figure 5) [59,62].
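If a detector returns boxes for the long windows on a roughly frontal facade photo, the even-count, symmetrical arrangement described above can be checked with a simple heuristic. The sketch below is illustrative only. The function name, the box format, and the 5% tolerance are assumptions rather than part of the paper's method.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def even_and_symmetric(windows: Sequence[Box], facade_x1: float, facade_x2: float,
                       tol: float = 0.05) -> bool:
    """Hypothetical heuristic: True if there is an even number of windows whose
    horizontal centres mirror each other about the facade's vertical midline,
    within tol * facade width."""
    if not windows or len(windows) % 2 != 0:
        return False
    width = facade_x2 - facade_x1
    axis = (facade_x1 + facade_x2) / 2
    centres = sorted((w[0] + w[2]) / 2 for w in windows)
    # Pair the outermost windows first, then move inward.
    for left, right in zip(centres, reversed(centres)):
        if abs((axis - left) - (right - axis)) > tol * width:
            return False
    return True


facade = (0.0, 400.0)  # left and right edges of the facade in the image (toy values)
print(even_and_symmetric([(40, 60, 100, 180), (300, 60, 360, 180)], *facade))  # True
print(even_and_symmetric([(40, 60, 100, 180), (270, 60, 330, 180)], *facade))  # False
```

A real photo would need perspective correction before such a check is meaningful. The sketch assumes the facade is already viewed head-on.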

(5) Elaborate Construction

Most of the vernacular architecture in the Jiangsu region is freely and flexibly scattered around the water system according to the topography. The layout is compact, the building volume is small, and the building structure is light and elegant. Traditional artisans are sophisticated in their construction techniques, designing structures that are appropriate to the local environment and in line with the function of their use (Figure 6a) [63]. Exquisite residential buildings are arranged in a tightly staggered pattern, creating a water village character [61,64].

(6) Exquisite Decoration

Traditional Chinese architecture attaches great importance to fine decorative detail. Carved decoration appears everywhere, with varied themes (Figure 6b) [63]. In the Jiangsu region, wood carving, brick carving, and stone carving are known as the “three carvings”. The “three carvings” are very common in vernacular architecture and have become a local feature, appearing on building portals, wall doors, doors and windows, beams, square columns, railings, and steps. These elaborate decorations give the region’s vernacular architecture a cultural and artistic atmosphere [60,64].

4. Methods

4.1. Data Collection

Architectural form is a broad concept that encompasses all aspects of a building’s appearance, such as its facade, colors, and materials. The facade in particular is a concrete expression of the formal features of the building’s structure and volume [65]. Analyzing facade form reveals appearance features such as the compositional elements and proportions of the facade, from which other features such as volume, proportion, and building structure can be explored [66]. The formal features of building facades strongly influence the built environment. Whether in renovating vernacular architecture or restoring vernacular architectural heritage, the facade is the most important element. The datasets for this study are therefore constructed from images that show the form of building facades.
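One way to turn the facade "composition ratio" mentioned above into a number is to compute how much of a facade box is covered by detected window boxes. The following is a minimal sketch, assuming axis-aligned boxes in pixel coordinates and non-overlapping window boxes. The function names and toy values are hypothetical, and the paper does not state that it computes this ratio.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def clip(b: Box, frame: Box) -> Box:
    """Restrict box b to the frame; a box entirely outside gets zero area."""
    return (max(b[0], frame[0]), max(b[1], frame[1]),
            min(b[2], frame[2]), min(b[3], frame[3]))


def window_to_facade_ratio(facade: Box, windows: Sequence[Box]) -> float:
    """Share of the facade box covered by window boxes.
    Assumes the window boxes do not overlap one another."""
    facade_area = area(facade)
    if facade_area == 0:
        return 0.0
    return sum(area(clip(w, facade)) for w in windows) / facade_area


facade = (0, 0, 400, 200)                            # toy facade box
windows = [(40, 60, 100, 180), (300, 60, 360, 180)]  # toy window boxes
print(window_to_facade_ratio(facade, windows))  # 0.18
```

Because the result is a proportion rather than a pixel count, it can be compared across photos taken at different distances.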

This study constructs three datasets: CTAID, JTVAID, and JCVAID. CTAID is used to train the model to recognize the formal features of vernacular architecture in Jiangsu. JTVAID is the object dataset, to which the trained model is applied to obtain quantitative recognition results. JCVAID is used to assess whether contemporary vernacular architecture is consistent with local features.
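This section does not define how the quantitative recognition results for JTVAID are aggregated. The sketch below assumes one plausible definition. For each region, the recognition rate of a feature is taken as the share of that region's images in which the feature is detected at least once above a confidence threshold. Both that definition and the 0.5 threshold are assumptions. The toy records are invented for illustration and are not results from this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FEATURES = ("deep eave", "zheng wen", "gable", "long window")

# One record per image: (region, [(feature_label, confidence), ...]).
ImageResult = Tuple[str, List[Tuple[str, float]]]


def recognition_rates(results: Iterable[ImageResult],
                      min_score: float = 0.5) -> Dict[str, Dict[str, float]]:
    """For each region, the fraction of its images in which each feature is
    detected at least once with confidence >= min_score (assumed definition)."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for region, detections in results:
        totals[region] += 1
        found = {label for label, score in detections if score >= min_score}
        for feature in FEATURES:
            if feature in found:
                hits[region][feature] += 1
    return {region: {f: hits[region][f] / n for f in FEATURES}
            for region, n in totals.items()}


# Toy detector output, not results from this study.
toy = [
    ("southern", [("gable", 0.9), ("long window", 0.8)]),
    ("southern", [("gable", 0.4)]),  # below min_score, so not counted
    ("northern", [("zheng wen", 0.7)]),
]
rates = recognition_rates(toy)
print(rates["southern"]["gable"], rates["northern"]["zheng wen"])  # 0.5 1.0
```

Normalizing by the number of images per region keeps regions with more photos from appearing to have "more" of a feature.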

4.1.1. Chinese Traditional Architecture Image Dataset (CTAID)

The choice of training samples is important for the model. Because no suitable dataset is directly available, we first construct a dataset of traditional Chinese architectural images to train the model. Images of traditional Chinese architecture are retrieved and filtered from Google, Baidu, Bing, and other Internet platforms. To ensure the relevance of the image data, we mainly select images with the regional features of Jiangsu from the search results. We use the keywords “deep eave”, “zheng wen”, “gable”, and “long window” and collect the corresponding images separately. Experts then screen the retrieved images and select those that clearly contain the feature named by the keyword. After this initial screening and the removal of duplicate images and images with insufficient clarity, a total of 2106 traditional architecture images are obtained.
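The removal of duplicate images described above can be partly automated before expert review. The sketch below catches only byte-identical files by hashing their contents with SHA-256 from Python's standard `hashlib`. Resized or re-encoded copies of the same photo would need perceptual hashing or manual review. The paper does not say how its duplicates were found, so this should be treated as one possible preprocessing step. The file names are placeholders.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, List


def drop_exact_duplicates(paths: Iterable[Path]) -> List[Path]:
    """Keep the first file for each distinct SHA-256 content hash.
    Only byte-identical copies are detected."""
    seen = set()
    kept: List[Path] = []
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(path)
    return kept


# Demonstration with stand-in files (placeholder names and contents).
with tempfile.TemporaryDirectory() as tmp:
    files = {"a.jpg": b"image-1", "b.jpg": b"image-1", "c.jpg": b"image-2"}
    paths = []
    for name, content in files.items():
        p = Path(tmp, name)
        p.write_bytes(content)
        paths.append(p)
    kept_names = [p.name for p in drop_exact_duplicates(paths)]

print(kept_names)  # ['a.jpg', 'c.jpg']
```

Clarity screening is left to the experts here, because judging whether a feature is legible is harder to automate reliably than detecting exact copies.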

4.1.2. Jiangsu Traditional Vernacular Architecture Image Dataset (JTVAID)

Jiangsu Province has 13 cities. Following the tripartite division commonly used in existing studies, this paper divides the province by geographical location and cultural features into three parts: northern, middle, and southern Jiangsu. Northern Jiangsu comprises five cities: Xuzhou, Lianyungang, Suqian, Huai’an, and Yancheng. Middle Jiangsu comprises Yangzhou, Taizhou, and Nantong. Southern Jiangsu comprises Nanjing, Zhenjiang, Changzhou, Wuxi, and Suzhou.

To construct the image dataset of traditional vernacular architecture in Jiangsu Province, we first compile the list of historical and cultural villages and towns in the province. Images of vernacular architecture in these villages and towns are then collected comprehensively through search engines such as Google, Baidu, and Bing, as well as tourism information-sharing websites. After an initial screening removes duplicate and insufficiently clear images, the remaining images are organized according to the northern, middle, and southern Jiangsu zoning. In total, 153 images are collected from 47 ancient villages and towns in northern Jiangsu, 106 images from 27 in middle Jiangsu, and 186 images from 68 in southern Jiangsu. The dataset thus contains 445 images of vernacular architecture from 142 historical and cultural villages and towns (Table 2).

The image dataset of traditional vernacular architecture in northern Jiangsu, middle Jiangsu, and southern Jiangsu is shown in Figure 7.

4.1.3. Jiangsu Contemporary Vernacular Architecture Image Dataset (JCVAID)

Contemporary vernacular architecture refers to new architecture that responds to the geography, natural resources, climate, and functional needs of the region in which it is located. The sustainable development of vernacular architecture requires preserving regional features and cultural meaning, and contemporary vernacular architecture is a present-day expression of that sustainability. It integrates traditional vernacular architecture with the contemporary context so that it reflects the values and lifestyles of today’s society. Its forms should carry regional features that can be recognized quickly. Contemporary vernacular architecture retains the qualities of vernacular architecture; it differs from traditional vernacular architecture in the missions and responsibilities assigned to it by its era. Its design concept builds on the features of traditional vernacular architecture and combines them with present-day development needs to achieve the sustainable development of vernacular architecture. The preservation and sustainable development of vernacular architectural heritage help maintain cultural–ecological balance. By constructing a dataset of contemporary vernacular architecture and applying image recognition, the formal features of these new vernacular buildings can be assessed scientifically and quantitatively.

The first step in constructing the contemporary vernacular architecture dataset is to screen built projects with regional features. In China, the sustainable development of vernacular architecture receives broad attention across society, and remedial action for vernacular architecture is being carried out widely. Many experts and scholars are actively involved in its preservation through practical projects such as renewal and renovation, maintenance, and restoration. Second, an initial screening of architectural images from these projects is performed. Then, images are selected that represent the exterior form of the architecture comprehensively and objectively; images that are heavily occluded or show incomplete forms are excluded so that the recognition model's evaluation is sound and efficient. Finally, the dataset is completed by organizing and archiving the high-quality image data.

4.2. Processing of Datasets

Before model training and object detection, the original images in the dataset must be labeled. Image annotation attaches labels to an original image or to a set of its pixels; because it determines the accuracy and validity of the data, it directly affects how well the deep learning model trains. Considering the accuracy requirements of architectural images and their application scenarios, LabelImg v1.8.1 is selected as the annotation tool, and the original images in the dataset are annotated manually [67]. Annotations are stored in the Pascal VOC format, with the annotation information for each original image saved in an XML file. The original images are stored in the “JpgImages” folder and the annotation files in the “Annotation” folder, and each annotation file has the same name as its original image.
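For readers reproducing the annotation pipeline, the following Python sketch reads one annotation in the Pascal VOC layout that LabelImg writes: an `<annotation>` root with one `<object>` per box, each holding a `<name>` and a `<bndbox>` with `xmin`, `ymin`, `xmax`, and `ymax`. The function name `read_voc_boxes` and the sample file content are illustrative assumptions, not the authors' code or data.

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from a Pascal VOC XML string."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        # VOC stores pixel coordinates as text; convert via float in case decimals appear
        coords = [int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return filename, boxes

# Hypothetical annotation for one image (values invented for illustration)
sample = """<annotation>
  <folder>JpgImages</folder>
  <filename>house_001.jpg</filename>
  <size><width>1024</width><height>768</height><depth>3</depth></size>
  <object><name>deep eave</name>
    <bndbox><xmin>40</xmin><ymin>60</ymin><xmax>980</xmax><ymax>210</ymax></bndbox></object>
  <object><name>long window</name>
    <bndbox><xmin>300</xmin><ymin>320</ymin><xmax>420</xmax><ymax>700</ymax></bndbox></object>
</annotation>"""

print(read_voc_boxes(sample))
```

Because each XML file shares its name with its image, pairing them reduces to matching file stems across the two folders.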

The formal features of Jiangsu vernacular architecture mainly comprise “deep eave”, “zheng wen”, “gable”, “long window”, “elaborate construction”, and “exquisite decoration”. The labeling standard is that all pixels in an image belonging to a feature are enclosed in its ground-truth box, and the same standard is applied to every original image. Because “elaborate construction” and “exquisite decoration” manifest differently from building to building, they cannot be labeled consistently by hand. Therefore, only the “deep eave”, “zheng wen”, “gable”, and “long window” features are labeled in the dataset images.

4.3. AOD R-CNN Model

Observation and analysis of the CTAID dataset show that its traditional building images have two characteristics. (1) Architectural forms differ greatly from one building to another, and shooting angles vary widely, so the same architectural element appears at very different angles. (2) The buildings sit in complex environments and blend into the surrounding landscape, so occluded targets, foreground noise, and indistinct target boundaries are common in the images. These characteristics reflect the real environments in which architectural forms exist, but they also make object detection of architectural forms considerably harder. By introducing the Region Proposal Network (RPN), Faster R-CNN (Faster Region-based Convolutional Neural Network) significantly improves the speed and accuracy of object detection and has become one of the benchmark methods for modern object detection tasks. To improve the precision and accuracy of architectural form detection in real environments, this paper improves the two-stage general-purpose object detection model Faster R-CNN [68].

This paper proposes the AOD R-CNN model, whose flowchart is shown in Figure 8. First, AOD R-CNN is trained on CTAID. Then, the JTVAID images are input into the trained model to obtain quantitative recognition results for architectural forms. Finally, the model is used to identify the features of the new buildings in the JCVAID dataset; by analyzing the formal features of these new buildings, we evaluate whether they reflect the style of the local architecture.

4.3.1. Faster R-CNN Network Model

Faster R-CNN is a mainstream object detection algorithm with a two-stage framework built around the Region Proposal Network (RPN) and an effective feature extraction mechanism. Using the RPN to generate candidate boxes removes the time-consuming separate proposal-generation step of earlier detectors, so the algorithm detects objects faster, saves computation time, and avoids redundant computation. As shown in Figure 9, the Faster R-CNN network consists of three main parts. The first is the backbone feature extraction network, which extracts features from the original image and produces a feature map through convolution, pooling, and other operations. The second is the RPN, which uses the feature maps from the backbone to generate proposal boxes and perform preliminary classification and localization. The third is the detection network, which uses RoI Pooling to resize the RPN proposals to a common size and then predicts categories and regresses bounding boxes through fully connected layers, yielding the final predicted boxes.

4.3.2. Optimization of Backbone Network

The feature extraction capability of the backbone network strongly affects the detection performance of the whole model. Given the characteristics of the images in the building dataset, a deeper neural network is needed to learn their complex semantic information; at the same time, training time must be kept reasonable. AOD R-CNN therefore replaces the original VGG backbone of Faster R-CNN with ResNet50, a widely used deep convolutional neural network with excellent performance [67]. ResNet50 uses residual blocks to overcome the vanishing-gradient problem during training. Unlike VGG, ResNet50 replaces the large fully connected layers with a global average pooling layer, which greatly reduces the number of model parameters and the risk of over-fitting [49]. The network structure of ResNet50 is shown in Table 3. Used as the backbone of an object detection algorithm, ResNet50 improves detection accuracy through its greater depth and, relative to VGG, improves detection speed [12]. In addition, ResNet50 improves the extensibility and flexibility of the model through its choice of convolution kernel sizes and its optimized network structure. Overall, through residual learning and global average pooling, ResNet50 effectively improves model performance and strengthens the network's ability to learn from complex images [13].
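The sketch below recomputes two properties of the standard ResNet50 design (He et al.) that matter for the following sections: where the “50” comes from (a 7 × 7 stem convolution, 3 + 4 + 6 + 3 bottleneck blocks of three convolutions each, and one final fully connected layer) and the output size of stages C2 to C5, which feed the FPN. The stride and channel values are those of the standard architecture; the ceiling division assumes the usual padding, and the helper names are hypothetical.

```python
import math

# Standard ResNet50 stages: (bottleneck blocks, cumulative stride w.r.t. the input, output channels)
STAGES = {
    "C2": (3, 4, 256),
    "C3": (4, 8, 512),
    "C4": (6, 16, 1024),
    "C5": (3, 32, 2048),
}

def weighted_layer_count():
    """Stem 7x7 conv + three convs (1x1, 3x3, 1x1) per bottleneck block + final FC layer."""
    return 1 + 3 * sum(blocks for blocks, _, _ in STAGES.values()) + 1

def stage_shapes(height, width):
    """(channels, H, W) of each stage output; ceil division assumes standard padding."""
    return {name: (ch, math.ceil(height / s), math.ceil(width / s))
            for name, (_, s, ch) in STAGES.items()}

print(weighted_layer_count())          # 50
print(stage_shapes(800, 800)["C5"])    # (2048, 25, 25)
```

The shrinking spatial size of C2 to C5 is what motivates the feature pyramid: small architectural details survive in C2 but are nearly lost in C5.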

4.3.3. Design of the Feature Pyramid Structure

The Feature Pyramid Network (FPN) is a feature pyramid structure widely used in object detection. Through feature fusion, FPN markedly improves the detection of multi-scale targets and performs especially well on small objects. FPN builds a multi-scale feature pyramid in which a top-down pathway and lateral connections fuse the semantic information of high-level feature maps with the spatial information of low-level feature maps, so that every level carries rich information.

In this paper, FPN is integrated into the Faster R-CNN network, and the outputs of the second (C2) through fifth (C5) feature extraction stages of ResNet50 serve as the inputs to FPN. Starting from the smallest map, each higher-level feature map is upsampled by bilinear interpolation to the size of the next lower stage and fused with it, producing feature maps at several scales: (P2, P3, P4, P5). Here, P2, P3, and P4 are the newly fused feature maps, P5 is the feature map output from C5, and an additional level P6 is the output of C5 after max pooling. The proposed feature pyramid structure is shown in Figure 10. Combining ResNet50 with FPN improves the learning ability of the network, makes full use of deep and shallow semantic features, and strengthens its ability to recognize small targets in architectural forms.
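A minimal NumPy sketch of the top-down fusion described above, under simplifying assumptions: the 1 × 1 lateral convolutions use random (untrained) weights, upsampling is 2× nearest-neighbour rather than the bilinear interpolation used in the paper, the 3 × 3 smoothing convolution of the original FPN is omitted, and P6 is taken by stride-2, 1 × 1 max pooling (i.e. subsampling) of the projected C5 map, which equals P5 here. The function `fpn_top_down` and its parameters are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """2x nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(feats, out_channels=256, seed=0):
    """feats: [C2, C3, C4, C5] as (C_i, H_i, W_i) arrays, each level half the size of the
    one before. Returns [P2, P3, P4, P5, P6], all with out_channels channels."""
    rng = np.random.default_rng(seed)
    # Lateral 1x1 convolutions are per-pixel linear maps; random weights stand in for trained ones.
    laterals = []
    for f in feats:
        w = rng.standard_normal((out_channels, f.shape[0])) / np.sqrt(f.shape[0])
        laterals.append(np.einsum("oc,chw->ohw", w, f))
    # Top-down pathway: start at the coarsest level and add each upsampled coarser map.
    pyramid = [laterals[-1]]                      # P5 from C5
    for lat in reversed(laterals[:-1]):           # C4, C3, C2
        pyramid.insert(0, lat + upsample2x(pyramid[0]))
    # Extra level: 1x1 max pooling with stride 2, i.e. keep every second pixel.
    p6 = pyramid[-1][:, ::2, ::2]
    return pyramid + [p6]

# Toy backbone outputs (small channel counts and sizes to keep the example fast)
rng = np.random.default_rng(1)
feats = [rng.standard_normal((c, s, s)) for c, s in [(8, 16), (16, 8), (32, 4), (64, 2)]]
for name, p in zip(["P2", "P3", "P4", "P5", "P6"], fpn_top_down(feats, out_channels=16)):
    print(name, p.shape)
```

Every pyramid level ends up with the same channel count, which lets a single RPN head and RoI head be shared across scales.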

4.3.4. Deformable Optimization Strategy

Because the candidate regions extracted by the RPN in Faster R-CNN vary in size, RoI Pooling converts them into feature maps of the same size for the subsequent detection task. To overcome the limited ability of convolutional neural networks to model geometric transformations, this paper introduces Deformable Convolutional Networks (DCN) and deformable RoI Pooling. Both modules add offsets to the fixed sampling positions so that the receptive field is no longer constant, and these offsets are learned from the target task.

(1) DCN allows adaptive change in receptive fields according to the object scale, which helps to solve the target recognition problem in complex scenes. The standard convolution and deformable convolution structures are shown in Figure 11.

The offsets in deformable convolution are obtained by applying an additional convolutional layer to the input feature map. Its kernel has the same spatial resolution and dilation as the current convolutional layer, and its output gives a 2D offset for every sampling position, which allows the sampling grid to deform freely. The implementation principle of deformable convolution is shown in Figure 12.

For a location p₀ on the output feature map, a standard convolution computes the new feature map as shown in Equation (1).

y (p_{0}) = \sum_{p_{n} \in R} w (p_{n}) \cdot x (p_{0} + p_{n})

(1)

where R is the regular sampling grid of the kernel (for a 3 × 3 kernel, R = {(−1, −1), (−1, 0), …, (1, 1)}), p_n enumerates the positions in R, w(p_n) is the kernel weight at p_n, and x(p₀ + p_n) is the value of the input feature map at position p₀ + p_n.

In deformable convolution, the regular grid R is augmented with offsets {∆p_n | n = 1, …, N}, where N = |R|, as shown in Equation (2). Because ∆p_n is typically fractional, x(p₀ + p_n + ∆p_n) is computed by bilinear interpolation.

y(p_{0}) = \sum_{p_{n} \in R} w(p_{n}) \cdot x(p_{0} + p_{n} + \Delta p_{n})

(2)
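To make Equations (1) and (2) concrete, the NumPy sketch below evaluates one output location of a single-channel 3 × 3 deformable convolution, sampling x by bilinear interpolation because the offsets are fractional. For illustration the offsets are passed in directly rather than predicted by the extra convolutional layer; `deform_conv_at` and `bilinear` are hypothetical names. With all offsets set to zero, the function reduces to the standard convolution of Equation (1).

```python
import numpy as np

# Regular sampling grid R of a 3x3 kernel: positions (dy, dx) relative to p0.
R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def bilinear(x, py, px):
    """Bilinearly interpolate 2-D array x at fractional position (py, px); zero outside."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy, xx]
    return val

def deform_conv_at(x, w, p0, offsets):
    """Equation (2) at one output location p0 of a single-channel map.
    w: (3, 3) kernel weights; offsets: (9, 2) array of (dy, dx) values, one per grid
    position in R. All-zero offsets give Equation (1)."""
    y = 0.0
    for n, (dy, dx) in enumerate(R):
        py = p0[0] + dy + offsets[n][0]
        px = p0[1] + dx + offsets[n][1]
        y += w[dy + 1, dx + 1] * bilinear(x, py, px)
    return y

x = np.arange(25, dtype=float).reshape(5, 5)
w = np.ones((3, 3))
print(deform_conv_at(x, w, (2, 2), np.zeros((9, 2))))              # 108.0
print(deform_conv_at(x, w, (2, 2), np.tile([0.5, 0.0], (9, 1))))   # 130.5
```

In the second call every sampling point moves half a row down, so each sample becomes the average of two vertically adjacent pixels; in a trained network each of the nine points would receive its own learned shift.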

(2) Deformable RoI Pooling consists of classical RoI Pooling plus a fully connected layer that learns offsets. First, RoI Pooling divides the RoI into regular sub-regions (bins) and pools the values in each bin to generate a pooled feature map. This pooled map is passed through the fully connected layer, which outputs an offset for each bin, and the bins are then pooled again at the shifted positions to produce the final feature map. The offsets are learned from the features of the previous layer and the RoI, so they adapt to the shape of the object.

Classical RoI Pooling works as follows. Given an input feature map x, the RoI Pooling layer divides the RoI into k × k bins and outputs a k × k feature map y, as shown in Equation (3).

y(i, j) = \sum_{p \in \mathrm{bin}(i, j)} \frac{x(p_{0} + p)}{n_{ij}}

(3)

where p₀ is the top-left corner of the RoI, p enumerates the positions in bin (i, j), and n_ij is the number of pixels in that bin.

In deformable RoI Pooling, an offset ∆p_ij is added to the positions in each bin, as shown in Equation (4).

y(i, j) = \sum_{p \in \mathrm{bin}(i, j)} \frac{x(p_{0} + p + \Delta p_{ij})}{n_{ij}}

(4)
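The NumPy sketch below applies Equations (3) and (4) to a single-channel feature map. To keep it short it assumes an integer RoI whose height and width are divisible by k, averages over the integer pixel positions of each bin, and takes the per-bin offsets ∆p_ij as given; in deformable RoI Pooling they would come from the fully connected layer. The names `roi_avg_pool` and `bilinear` are hypothetical.

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample 2-D array x at fractional (py, px); zero outside the map."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy, xx]
    return val

def roi_avg_pool(x, roi, k, offsets=None):
    """Equation (3) (offsets=None) or Equation (4): pool the RoI (top, left, height, width)
    into k x k bins. offsets: (k, k, 2) array of per-bin (dy, dx) shifts."""
    top, left, h, w = roi                  # p0 = (top, left), the RoI's top-left corner
    bh, bw = h // k, w // k                # bin size; assumes h and w divisible by k
    y = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            dy, dx = (0.0, 0.0) if offsets is None else offsets[i, j]
            samples = [bilinear(x, top + i * bh + a + dy, left + j * bw + b + dx)
                       for a in range(bh) for b in range(bw)]
            y[i, j] = sum(samples) / len(samples)   # divide by n_ij
    return y

fmap = np.arange(64, dtype=float).reshape(8, 8)
print(roi_avg_pool(fmap, (0, 0, 4, 4), k=2))
shift = np.tile([1.0, 0.0], (2, 2, 1))     # move every bin down by one row
print(roi_avg_pool(fmap, (0, 0, 4, 4), k=2, offsets=shift))
```

Because the output is always k × k, RoIs of any size produce fixed-length inputs for the fully connected detection layers, with or without offsets.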

Deformable RoI Pooling extracts more of the structural features of architectural forms. The parameters of its fully connected layer are learned and adjusted by backpropagation, and the downsampling performed by pooling in turn speeds up training. The resulting AOD R-CNN model is shown in Figure 13.

4.3.5. Adaptive Anchor Selection Algorithm Based on K-Means++

The default anchors of Faster R-CNN are set according to the target sizes in public datasets. Comparing CTAID with these datasets shows that their target sizes differ greatly. Training on CTAID with the default anchors therefore tends to lower recall, slow the convergence of the loss function, and reduce detection precision [13]. To generate anchors suited to the sizes of architectural forms and improve the accuracy of bounding box regression, an adaptive anchor selection algorithm based on K-means++ is used to determine the anchor widths and heights, and hence their aspect ratios. Let D be the set of (width, height) pairs of all architectural form labeling boxes in CTAID and n the number of clusters. The algorithm proceeds as follows.

(1) A sample from the dataset D is randomly selected as the initial clustering center c₁.

(2) For each sample x_j (j ∈ {1, 2, …, i}) in D, calculate the shortest distance D(x_j) to the cluster centers c_k chosen so far (k ∈ {1, 2, …, n}), as shown in Equation (5).

D(x_{j}) = \min_{k} \lVert x_{j} - c_{k} \rVert

(5)

(3) Calculate the probability P(x) that each sample x_j in the dataset D is the next clustering center, as shown in Equation (6).

P (x) = \frac{D {(x)}^{2}}{\sum_{x \in D} D {(x)}^{2}}

(6)

(4) Compute the cumulative sum of P(x) over the samples, which splits [0, 1) into one interval per sample; draw a uniform random number in [0, 1) and take the sample whose interval contains it as the next cluster center.

(5) Repeat steps (2) to (4) until n clustering centers C = {c₁,c₂,…,c_n} are obtained.

(6) Compute the distance from each sample in D to every cluster center in C, assign each sample to the cluster of its nearest center, and recompute each cluster center as the mean of its cluster.

(7) Repeat step (6) until the change in the position of the cluster center reaches the convergence condition.
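Steps (1) to (7) can be sketched in NumPy as follows. The sketch follows the text in using Euclidean distance on (width, height) pairs with Equations (5) and (6), roulette-wheel selection for step (4), and Lloyd iterations for steps (6) and (7). The function name, tolerance, and seed are hypothetical, and the example boxes are toy values, not CTAID statistics.

```python
import numpy as np

def kmeanspp_anchors(boxes, n, max_iter=100, tol=1e-6, seed=0):
    """Cluster (width, height) pairs into n anchor shapes with K-means++ seeding."""
    X = np.asarray(boxes, dtype=float)          # dataset D, one (w, h) row per labeled box
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]         # step (1): random first center
    while len(centers) < n:                     # step (5): repeat steps (2)-(4)
        C = np.array(centers)
        # step (2), Eq. (5): distance from each sample to its nearest chosen center
        d = np.min(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)
        p = d ** 2 / np.sum(d ** 2)             # step (3), Eq. (6)
        cum = np.cumsum(p)                      # step (4): roulette-wheel selection
        r = rng.random() * cum[-1]
        centers.append(X[np.searchsorted(cum, r, side="right")])
    C = np.array(centers)
    for _ in range(max_iter):                   # steps (6)-(7): assign and update
        labels = np.argmin(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)
        new_C = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else C[k]
                          for k in range(n)])
        shift = np.max(np.linalg.norm(new_C - C, axis=1))
        C = new_C
        if shift < tol:                         # convergence condition
            break
    return C

# Toy labeled boxes (w, h): three well-separated shapes, four boxes each
boxes = [(20, 40)] * 4 + [(100, 60)] * 4 + [(200, 300)] * 4
centers = kmeanspp_anchors(boxes, n=3)
print(centers)
print(centers[:, 1] / centers[:, 0])            # height-to-width ratios of the anchors
```

The D² weighting makes samples far from all existing centers more likely to seed the next cluster, which is what distinguishes K-means++ from plain random initialization.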

Unlike traditional subjective methods grounded in architectural history or theory, the proposed AOD R-CNN model can draw on a large number of images to extract the features of architectural forms automatically, overcoming the insufficient feature descriptions of traditional methods [51].

4.3.6. Comparison Between AOD R-CNN and Faster R-CNN

To achieve efficient detection of architectural forms, AOD R-CNN makes four improvements to Faster R-CNN, summarized in Table 4: optimization of the backbone network, design of the feature pyramid structure, a deformable optimization strategy, and adaptive anchor selection based on K-means++.

(1) Optimization of the backbone network. ResNet50 replaces the classic VGG as the backbone feature extraction network. Compared with VGG, ResNet50 is deeper while mitigating vanishing and exploding gradients, so it can learn higher-level information and extract more features from the input image, further improving model performance. Moreover, its bottleneck blocks use 1 × 1 convolutions to reduce and then restore the number of channels around each 3 × 3 convolution, which reduces the number of parameters and the complexity of the model while leaving the receptive field unchanged.

(2) Design of the feature pyramid structure. Because the feature map tends to lose the location and feature information of small targets during repeated downsampling in the backbone, FPN is integrated into the Faster R-CNN network. By combining low-level spatial location features with high-level semantic features, FPN allows targets of different sizes to be detected on feature maps at every resolution level.

(3) Deformable optimization strategy. The architectural forms studied are highly irregular and many target objects have variable shapes, while a standard convolution kernel has a fixed size and samples only fixed positions in the feature map, which limits detection performance. Deformable convolution is therefore used to learn offsets for the sampling points of the receptive field so that they cover the target more accurately. By adaptively fusing information from similar structures around each pixel, the resulting deformable feature maps improve detection accuracy.

(4) Adaptive anchor selection based on K-means++. Anchor boxes are prior boxes that must be set in the network before training. The original anchors of Faster R-CNN derive from labeling experience on public datasets: nine anchors combining three scales (box areas of 128², 256², and 512² pixels) with three aspect ratios (1:1, 1:2, and 2:1). The scales and aspect ratios need to be adjusted to the characteristics of the architectural form dataset, so the K-means++ algorithm is used to cluster the labeled boxes and compute prior boxes of sizes and shapes suited to architectural form detection.

Through the above improvements, the AOD R-CNN model can effectively overcome the problems of diversified architectural forms and indistinguishable boundaries.
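For comparison with the clustered priors, the sketch below generates the nine default Faster R-CNN anchors described in improvement (4): box areas of 128², 256², and 512² pixels combined with height-to-width ratios of 1:2, 1:1, and 2:1, centred at the origin. The helper name is hypothetical; the scales and ratios are the published defaults.

```python
import math

def default_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred at (0, 0); ratio = height / width."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)      # chosen so that the area w * h equals s ** 2
            h = s * math.sqrt(r)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

anchors = default_anchors()
print(len(anchors))   # 9
```

One way to plug in clustered priors is to pass scales and ratios derived from the K-means++ centres (for example, the square root of w · h and the ratio h / w of each centre); this is an illustrative option, not necessarily the authors' exact procedure.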

5. Results and Discussion

This study begins with the identification and analysis of architectural formal features in order to analyze the underlying differences between vernacular architectural styles. Past studies of architectural form or style have mostly focused on overall appearance. This paper instead evaluates the formal features of vernacular architecture in the Jiangsu region quantitatively through object detection, with the aim of examining how local features contribute to the overall form. Combining geographic location with cultural history, it studies the formal features of vernacular architecture in Jiangsu by zone. The experimental results show that the formal features of vernacular architecture differ between regions, but that the regions also share common features.

5.1. Identifying Formal Features of Vernacular Architecture in the Jiangsu Region

Object detection requires not only locating each recognized target accurately but also distinguishing between the types of recognized objects. Each detection usually carries a confidence score, which expresses how credible the model considers the result and is mainly used to decide whether the object in the detection box is a valid or invalid sample. Table 5 shows the feature detection results for the Jiangsu vernacular architecture image dataset. For example, “deep eave” is detected as a valid sample 262 times: 46 times with a confidence score from 0.68 to below 0.78, 59 times from 0.78 to below 0.88, and 157 times at or above 0.88. The results show that “deep eave”, “zheng wen”, “gable”, and “long window” are the main features of vernacular architecture in Jiangsu Province. When the dataset was constructed, images showing local features were selected, including both front and side elevations of buildings. However, “zheng wen” is difficult to recognize in these elevations: it is defined as the small facade area at the top ends of the building’s roof and does not appear in side views. In other words, not every detection target appears in all 445 sample images, so the detection results are better than the raw counts suggest.
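Because 46 + 59 + 157 = 262, the three “deep eave” counts read as disjoint confidence bands. A short Python sketch of such a tally, assuming the band edges 0.68, 0.78, and 0.88 quoted above and assuming that scores below 0.68 are not counted; the scores in the example are invented for illustration.

```python
from bisect import bisect_right

EDGES = (0.68, 0.78, 0.88)                 # lower edges of the bands quoted in the text
BANDS = ("[0.68, 0.78)", "[0.78, 0.88)", ">= 0.88")

def tally_confidence(scores):
    """Count detections per confidence band; scores below 0.68 are ignored (assumption)."""
    counts = dict.fromkeys(BANDS, 0)
    for s in scores:
        idx = bisect_right(EDGES, s) - 1   # index of the highest edge <= s
        if idx >= 0:
            counts[BANDS[idx]] += 1
    return counts

print(tally_confidence([0.70, 0.81, 0.95, 0.90, 0.50, 0.88]))
```

Using `bisect_right` places a score that falls exactly on an edge, such as 0.88, in the higher band, matching the “at or above” reading of the bands.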

5.2. Vernacular Architecture Formal Features Zoning Identification

Table 6 shows the identification results for “deep eave”, “zheng wen”, “gable”, and “long window” in the three regions of Jiangsu. Across the three regions, “long window” is identified most often and has the highest identification rate, while “gable” is identified least often and has the lowest identification rate.

Comparing northern and middle Jiangsu, the recognition rates of “zheng wen” and “gable” are higher in northern Jiangsu, whereas middle Jiangsu has higher recognition rates for “deep eave” and “long window”. Of the three regions, southern Jiangsu has the highest identification rates for the formal features of vernacular architecture (Figure 14).

Geographically, northern Jiangsu borders Anhui and Shandong, so its vernacular architecture blends features of the vernacular architecture of both southern and northern China [29]. On the one hand, influenced by Qilu culture, it has the simplicity and solidity of northern architecture and emphasizes practicality. On the other hand, influenced by the “Huizhou” architectural style, its “gable” feature is more pronounced than in middle Jiangsu [69]. Middle Jiangsu is a transition zone between the northern and southern Jiangsu styles, and its architecture is closer to the Jiangnan style than that of northern Jiangsu, as reflected in its “deep eave” and “long window” features. Of the three regions, the vernacular architecture of southern Jiangsu is the most characteristic of Jiangnan, so its “deep eave”, “zheng wen”, “gable”, and “long window” features are all very pronounced. The Jiangsu region has a well-developed water system and abundant rainfall, and under these geographical and climatic conditions the “deep eave” of the roof and the “long window” became essential architectural features [63]. The “deep eave” of a traditional roof protects the structure beneath the eaves and prolongs the life of the building. It also addresses practical problems such as drainage and lighting and shelters people from sun and rain. In ancient China, the eave form of traditional roofs also carried symbolic meaning and artistic value [70]. Because of the geographical environment, indoor spaces in the Jiangsu region tend to be humid. The “long window” effectively improves ventilation and daylighting and can also serve as a door.
In addition, the “long window” is an important decorative element of traditional vernacular facades and reflects the aesthetic preferences of local people. Accordingly, the test results show that “deep eave” and “long window” are very prominent in northern, middle, and southern Jiangsu: the recognition rates of “deep eave” in the three regions are 49.7%, 50.9%, and 73.1%, respectively, and those of “long window” are 53.6%, 73.5%, and 78.4%, respectively.

5.3. Identification of Contemporary Vernacular Architecture Formal Features

5.3.1. The d-u DUCAL Coffee & Culture

The d-u DUCAL Coffee & Culture in Wuxi City is an adaptive reuse of vernacular architecture. At the renovated entrance, fine wooden grille windows replace the traditional lattice long windows. The overall appearance inherits the formal features of traditional vernacular architecture, such as the roof, the gable, and the deep eave, and the traditional architectural massing is preserved. Both the overall form and the architectural ambiance retain the local character. Although the facade uses floor-to-ceiling glazing, it continues the traditional combination of wood with black, white, and gray. The d-u DUCAL Coffee & Culture innovates while highlighting the local traditional style and can serve as an excellent example of vernacular architecture preservation and renewal.

As shown in Figure 15, the “deep eave”, “zheng wen”, and “gable” features are labeled in the architectural image with confidence scores of 0.89, 0.83, and 0.92, respectively, whereas the confidence score for “long window” is only 0.66. Because the model was trained on images of traditional vernacular architecture, whose materials and forms differ from those of modern architecture, it does not mark this feature with high confidence. Even so, the result is still a useful sample: three of the four features are evident and one is weak, so the building can be judged to have regional features. From the perspective of sustainability in contemporary vernacular architecture, the form of the d-u DUCAL Coffee & Culture has both traditional and regional features. The design innovates while retaining traditional features, allowing the vernacular building to be reused, and it shows clearly how traditional features influence contemporary vernacular architectural practice.

5.3.2. The Project of the Folk Song Culture Center in Fengmenglong Village

The folk song culture center in Fengmenglong Village is located in Suzhou, and the design consists of several building groups. Its appearance inherits regional features, and its layout of scattered small volumes echoes the traditional village texture. The new buildings highlight the regional style through their “deep eave”, “zheng wen”, and “long window” features and their black, white, and gray exterior coloring. In terms of materials, they continue the traditional combination of wood with white walls and gray tiles. The project is in keeping with traditional vernacular architecture and rich in the graceful beauty of southern Jiangsu; from its architectural form to its ambiance, one can experience the refined and elegant character of Jiangnan.

As shown in Figure 16, the “deep eave”, “zheng wen”, and “long window” features are marked in the architectural image with confidence scores of 0.91, 0.89, and 0.93, respectively. Although the “gable” feature scores only 0.68, these buildings can still be judged to have regional features, since they preserve the “deep eave”, “zheng wen”, and “long window” that characterize vernacular architecture. This indicates that the formal characteristics of traditional vernacular architecture can provide important inspiration for the sustainable design of contemporary vernacular architecture.

5.4. The Sustainability of the Formal Features of Vernacular Architecture

5.4.1. Retention of Roof Features and Highlighting of Regional Characteristics

The roof is an important part of architectural form, and roof shapes can convey the regional character of a building. Retaining roof features is therefore a direct and effective way to highlight the formal features of vernacular architecture, as the design of contemporary vernacular architecture in Jiangsu Province shows: many new buildings in the area use traditional roof forms as an icon of regional style. The “zheng wen” and “deep eave” are important features of vernacular roofs in the Jiangsu region and are distinctly regional in both function and aesthetics. They are localized details of the roof form. Ancient Chinese craftsmen paid close attention to detail and used their skill to build architectural forms with local features; the “zheng wen” and “deep eave” are results of this careful design and represent the local features of vernacular architectural form. In the identification results for contemporary vernacular architecture in Jiangsu Province, the confidence scores of “zheng wen” and “deep eave” are both high, which suggests that these features have been retained in contemporary practice. Retaining the “zheng wen” and “deep eave” of the roof in contemporary vernacular architecture can therefore highlight the regional characteristics of traditional vernacular architectural forms.

5.4.2. Optimization of Facade Features and Enrichment of Architectural Forms

In China, Jiangsu is considered an important part of the Jiangnan water-town region and therefore shares the Jiangnan architectural character. A distinctive feature of the Jiangnan style is a richly layered facade, in which the “gable” and “long window” play a central role. In the identification results for contemporary vernacular architecture, the confidence scores of “gable” and “long window” are lower than those of “zheng wen” and “deep eave”. This suggests that, in contemporary practice, the “gable” and “long window” depart from their traditional forms. New materials and techniques now offer more diverse options for vernacular facade design, and contemporary facades are generally more concise and less decorated than traditional ones. For example, the traditional Chinese “long window” emphasizes exquisite craftsmanship: beyond meeting functional needs, it places strong emphasis on ornament. Contemporary vernacular buildings may replace the “long window” with floor-to-ceiling windows. These are less detailed than the wood-carved traditional “long window”, but they increase the transparency of the facade and combine a modern aesthetic with greater practicality. Optimizing the facade in this way enriches the architectural form and can help achieve the sustainable development of vernacular architecture.

5.4.3. Continuation of Traditional Features and Moderate Innovation

The formal features of vernacular architecture in the Jiangsu region are shaped by geographic, cultural, and technical factors, among others. Over a long historical process, this architecture has developed distinctive formal features. In China, contemporary vernacular architecture practice emphasizes innovation grounded in the inheritance of traditional features. A common approach is to introduce the formal features, construction techniques, and traditional materials of vernacular architecture into contemporary practice. The underlying idea is to continue tradition while innovating moderately, rather than simply collaging traditional and modern elements together. In contemporary vernacular architecture design in the Jiangsu region, tradition and modernity can coexist. Traditional vernacular buildings can be preserved as architectural heritage. The sustainability of vernacular architecture can also be achieved by retaining traditional features in new buildings. Applying tradition to contemporary practice is not limited to formal features; it also extends to traditional materials and colors, and the use of traditional materials is itself a route to sustainability. The formal features of traditional vernacular architecture clearly influence contemporary architecture, so their value should be reflected in contemporary architectural practice.

6. Conclusions

The main contribution of this study is an object detection model for identifying the formal features of vernacular architecture. At the theoretical level, the proposed method helps to refine theories in the study of vernacular architecture, and it can serve as a tool for studying the formal features of architecture in a particular region. At the practical level, the model can identify and extract the formal features of vernacular architecture, which provides a reference for its conservation and sustainable development. It can also be used to measure how well contemporary architecture fits regional styles, supporting the overall control of architectural style. The results show that, after training, the AOD R-CNN model can identify formal features of vernacular architecture such as “deep eave”, “zheng wen”, “gable”, and “long window”.

The main purpose of this study is to identify the formal features of vernacular architecture in the Jiangsu region through deep learning and to present these features objectively. The results show that vernacular architecture differs across the three geographical regions of Jiangsu Province. The “deep eave”, “zheng wen”, “gable”, and “long window” features are all very pronounced in southern Jiangsu. The “deep eave” and “long window” features are more pronounced in middle and southern Jiangsu than in northern Jiangsu. The “gable” feature, by contrast, is more pronounced in northern Jiangsu than in middle Jiangsu. Contemporary vernacular architectural forms are more flexible and varied, but they still inherit some features of the region’s traditional vernacular forms.
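The regional comparison can be checked against Table 6 by normalizing each feature count by the number of architectural images in that region. The sketch below assumes that a region’s “recognition rate” is the number of detected instances divided by its number of images. This normalization reproduces the 45.8% (“zheng wen”) and 27.5% (“gable”) reported for northern Jiangsu in the abstract, but it is a reconstruction, not the authors’ code.

```python
# Counts copied from Table 6: images per region and detected instances per feature.
TABLE6 = {
    "Northern Jiangsu": {"images": 153, "deep eave": 76, "zheng wen": 70, "gable": 42, "long window": 82},
    "Middle Jiangsu": {"images": 106, "deep eave": 54, "zheng wen": 39, "gable": 20, "long window": 78},
    "Southern Jiangsu": {"images": 186, "deep eave": 136, "zheng wen": 107, "gable": 61, "long window": 146},
}
FEATURES = ("deep eave", "zheng wen", "gable", "long window")


def recognition_rates(table):
    """Return {region: {feature: detections per image, in percent}} rounded to 0.1."""
    rates = {}
    for region, row in table.items():
        # Assumption: rate = detected instances / architectural images in the region.
        rates[region] = {f: round(100 * row[f] / row["images"], 1) for f in FEATURES}
    return rates


rates = recognition_rates(TABLE6)
for region, row in rates.items():
    print(region, row)
```

Computed this way, the “long window” rate rises from 53.6% in northern Jiangsu to 73.6% in middle and 78.5% in southern Jiangsu. The “gable” rate is 27.5% in northern Jiangsu against 18.9% in middle Jiangsu. The “deep eave” rates of northern (49.7%) and middle (50.9%) Jiangsu are close.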

Vernacular architecture has an influential role in contemporary architecture, and its preservation and sustainable development are a focus of attention in the architectural field. Renovating vernacular architecture in today’s context is complex. It involves architectural and technological issues as well as artistic, cultural, and philosophical ones. The formal features of vernacular architecture are important for both heritage conservation and cultural transmission. For contemporary architectural design, these formal features act as “prototypes” and “cultural genes” [71]. Applying deep learning in architecture and design helps us understand and master these “genes” and can provide a reference for the regional expression of contemporary architecture. Although the approach presented in this paper offers a new perspective, it has limitations. The main limitation is that the number of vernacular buildings in the study area is small, which limits the sample size and could bias the results. Although we collected as many qualifying buildings as possible, both sample size and sample quality can still be improved. In future research, the model will be further improved and optimized to enhance its discriminative and computational capabilities.

The methodology and model proposed in this paper are a preliminary attempt, but we hope they will provide useful ideas for the preservation and sustainable development of vernacular architectural heritage. The approach can equally be applied to other regions and architecture types. Using architectural images as a database also makes it easier to collect and organize information [45]. Deep learning techniques can both assist designers and support architectural evaluation. For example, in the renewal of vernacular or regional architecture, architectural images can be used to determine whether a building’s formal features are in line with regional styles and design objectives. In summary, applying deep learning to the recognition of vernacular architecture formal features contributes to the conservation and sustainable development of vernacular architecture. It is also important for contemporary architectural design and regional style planning.

Author Contributions

Conceptualization, P.H. and S.H.; methodology, P.H. and R.X.; software, R.X.; formal analysis, P.H.; investigation, P.H. and S.H.; resources, R.X.; data curation, R.X.; writing—original draft preparation, P.H.; writing—review and editing, S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Obeso, A.M.; Benois-Pineau, J.; Acosta, A.Á.R.; Vázquez, M.S.G. Architectural style classification of Mexican historical buildings using deep convolutional neural networks and sparse features. J. Electron. Imaging 2017, 26, 011016.
2. Karahan, F.; Davardoust, S. Evaluation of Vernacular Architecture of Uzundere District (Architectural Typology and Physical Form of building) in Relation to Ecological Sustainable Development. J. Asian Archit. Build. Eng. 2020, 19, 490–501.
3. Zhao, X.X.; Greenop, K. From ‘neo-vernacular’ to ‘semi-vernacular’: a case study of vernacular architecture representation and adaptation in rural Chinese village revitalization. Int. J. Herit. Stud. 2019, 25, 1128–1147.
4. Martínez, P.G. The ‘preservation by relocation’ of Huizhou vernacular architecture: Shifting notions on the authenticity of rural heritage in China. Int. J. Herit. Stud. 2022, 28, 200–215.
5. Ji, F.Y.; Zhou, S.Y. Dwelling Is a Key Idea in Traditional Residential Architecture’s Sustainability: A Case Study at Yangwan Village in Suzhou, China. Sustainability 2021, 13, 6492.
6. Hu, X.H.; Huang, Z.F. Study on Influence Mechanism of Culture Protection Behavior of Residents in Tourism Destination: A Case Study of Zhouzhuang. Mod. Urban Res. 2016, 10, 116–120.
7. Kukina, I. The Architecture of the Conflicts. In Proceedings of the 2nd International Conference on Architecture—Heritage, Traditions and Innovations (AHTI); Atlantis Press: Moscow, Russia, 2020; Volume 471, pp. 386–391.
8. Wang, Y.T.; Hu, W.J. Cultural geography meets architectural typology: A mixed-methods study of traditional Bayu dwellings in Southwestern China. J. Asian Archit. Build. Eng. 2024, 1–25.
9. Chu, Y.C. Hypoiconicity in the architecture of Suzhou: Authentic resemblance, diagrammatic reduction, and metaphoric displacement. Soc. Semiot. 2020, 30, 114–132.
10. Wang, D.G.; Lu, Q.Y.; Wu, Y.F.; Fan, Z.Q. The characteristic of regional differentiation and impact mechanism of architecture style of traditional residence. J. Nat. Resour. 2019, 34, 1864–1885.
11. Bao, S.H.; Zhuo, X.L.; Tao, J. Using semi-supervised machine learning to assist classification and recognition of Chinese vernacular architecture. J. Build. Eng. 2024, 98, 111327.
12. Wu, X.W.; Sahoo, D.; Hoi, S.C.H. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64.
13. Dong, J.Y. Object Detection based on Deep Learning. In Proceedings of the International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV), Sanya, China, 19–21 November 2021; p. 12153.
14. Guo, Y. The microscopic visual forms in architectural art design following deep learning. J. Supercomput. 2021, 78, 559–577.
15. Yoshimura, Y.; Cai, B.; Wang, Z.; Ratti, C. Deep learning architect: Classification for architectural design through the eye of artificial intelligence. In Proceedings of the International Conference on Computers in Urban Planning and Urban Management, Espoo, Finland, 16–18 June 2019; pp. 249–265.
16. Llamas, J.; Lerones, P.M.; Medina, R.; Zalama, E.; Gómez-García-Bermejo, J. Classification of Architectural Heritage Images Using Deep Learning Techniques. Appl. Sci. 2017, 7, 992.
17. Lee, J.; Yu, J.M. Automatic Surface Damage Classification Developed Based on Deep Learning for Wooden Architectural Heritage. In Proceedings of the 9th CIPA Symposium on Documenting, Understanding, Preserving Cultural Heritage—Humanities and Digital Technologies for Shaping the Future, Florence, Italy, 25–30 June 2023; pp. 151–157.
18. Gao, L.; Wu, Y.; Yang, T.; Zhang, X.; Zeng, Z.; Chan, C.K.D.; Chen, W. Research on Image Classification and Retrieval Using Deep Learning with Attention Mechanism on Diaspora Chinese Architectural Heritage in Jiangmen, China. Buildings 2023, 13, 275.
19. Angeline, R.; Nambiar, A.S.; Samuel Jacinth, K.; Alan Christo, P.; Joseph, P.A. AI Art Authenticator: Deep Learning Image Classification. In Proceedings of the 8th Smart Trends in Computing and Communications (SmartCom), Pune, India, 12–13 January 2024; Volume 947, pp. 107–119.
20. Fan, T. Research and realization of video target detection system based on deep learning. Int. J. Wavelets Multiresolution Inf. Process. 2020, 18, 1941010.
21. Wang, H.; Liu, C.; Yu, L.; Zhao, J. Research on Target Detection and Recognition Algorithm Based on Deep Learning. In Proceedings of the 38th Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8483–8487.
22. Xu, H.; Sun, H.; Wang, L.; Yu, X.; Li, T. Urban Architectural Style Recognition and Dataset Construction Method under Deep Learning of Street View Images: A Case Study of Wuhan. ISPRS Int. J. Geo-Inf. 2023, 12, 264.
23. Jin, T.; Youjia, C.; Geng, L.; Dawei, X.; Huashuai, C.; Jiaping, H. Juxtaposition or integration: The formation mechanism of architectural form in a cultural transition zone. J. Asian Archit. Build. Eng. 2023, 22, 2690–2703.
24. Le, V.A.; Cao, D.S. Study on Vietnamese Design Methods of Traditional Vernacular Architecture and Discussion on Their Technical Origins. Int. J. Arch. Herit. 2024, 18, 622–651.
25. Landi, S. Historical centers in Sabine, Italy: Links between architecture and environment. In Proceedings of the International Conference on Vernacular Heritage, Sustainability and Earthen Architecture, Valencia, Spain, 11–13 September 2014; pp. 419–424.
26. Septian, N. Cultural Microcosm in Focus: Landscape and Identity in Small Village Explored Through Design and Environment-Behavior Theories. In Proceedings of the 8th Art, Craft and Design in Southeast Asia International Symposium (ARCADESA), Yogyakarta, Indonesia, 27–28 September 2024; Volume 9(SI), pp. 259–265.
27. Luo, D.Y. They collated and summarised the materials obtained from genealogies, inscriptions and inquiries, and conducted in-depth research on the formal features, historical and cultural values of vernacular architecture. New Archit. 2024, 5, 1223–1226.
28. Li, J.H.; Bao, H.Y. Thoughts on Vernacular Architecture Research and Contemporary Regional Architectural Creation. In Proceedings of the 2nd International Conference on Civil Engineering, Architecture and Building Materials (CEABM 2012), Yantai, China, 25–27 May 2012; Volume 174–177, pp. 1656–1659.
29. Zhang, M.H.; Zhang, J.Y.; Liu, Q.; Li, T.S.; Wang, J. Research on the Strategies of Living Conservation and Cultural Inheritance of Vernacular Dwellings—Taking Five Vernacular Dwellings in China’s Northern Jiangsu as an Example. Sustainability 2022, 14, 12503.
30. Li, G.Q.; Chen, B.Q.; Zhu, J.; Sun, L. Traditional Village research based on culture landscape genes: A Case of Tujia traditional villages in Shizhu, Chongqing, China. J. Asian Archit. Build. Eng. 2023, 23, 325–343.
31. Chen, W.W.; Du, Y.M.; Cui, K.; Fu, X.L.; Gong, S.Y. Architectural Forms and Distribution Characteristics of Beacon Towers of the Ming Great Wall in Qinghai Province. J. Asian Archit. Build. Eng. 2017, 16, 503–510.
32. Saleh, M.A.E. The decline vs the rise of architectural and urban forms in the vernacular villages of southwest Saudi Arabia. Build. Environ. 2001, 36, 89–107.
33. Wang, Y.S.; Yi, Y.; Zhang, N.; Du, J.A. Study of the Forms and Technology of Traditional Granary Buildings in the Middle and Lower Reaches of the Fu River. J. Asian Archit. Build. Eng. 2018, 17, 175–182.
34. Tao, J.; Chen, H.S.; Zhang, S.W.; Xiao, D.W. Space and Culture: Isomerism in Vernacular Dwellings in Meizhou, Guangdong Province, China. Space Cult. 2018, 17, 15–22.
35. Wei, R.R.; Cho, T.Y. A Study of the Influence of Shape Grammar on Architectural Form. Des. Res. 2021, 6, 426–438.
36. Forwood, B. Expressing Sustainability In Architectural Form-Energy And Environment As Architectural Metaphors. Renew. Energy 1994, 5, 1132–1134.
37. Zou, H.; Ge, J.; Liu, R.C.; He, L. Feature Recognition of Regional Architecture Forms Based on Machine Learning: A Case Study of Architecture Heritage in Hubei Province, China. Sustainability 2023, 15, 3504.
38. Zeppelzauer, M.; Despotovic, M.; Sakeena, M.; Koch, D.; Döller, M. Automatic prediction of building age from photographs. In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Yokohama, Japan, 11–14 June 2018; pp. 126–134.
39. Zhang, L.; Zheng, L.; Chen, Y.; Huang, L.; Zhou, S.H. CGAN-Assisted Renovation of the Styles and Features of Street Facades—A Case Study of the Wuyi Area in Fujian, China. Sustainability 2022, 14, 16575.
40. Zhang, Y.; Yin, H. Application of AIGC Technology in Vernacular Architecture Design. In Proceedings of the 4th International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 12–14 January 2024; pp. 492–497.
41. Xia, B.; Li, X.; Shi, H.; Chen, S.C.; Chen, J.M. Style Classification and Prediction of Residential Buildings Based on Machine Learning. J. Asian Archit. Build. Eng. 2020, 19, 714–730.
42. Cai, C.Y.; Li, B. Training Deep Convolution Network With Synthetic Data For Architectural Morphological Prototype Classification. Front. Archit. Res. 2021, 10, 304–316.
43. Yi, Y.K.; Zhang, Y.; Myung, J. House style recognition using deep convolutional neural network. Autom. Constr. 2020, 118, 103307.
44. Mathias, M.; Martinovic, A.; Weissenberg, J.; Haegler, S.; Van Gool, L. Automatic architectural style recognition. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2011, 3816, 171–176.
45. Berg, A.C.; Grabler, F.; Malik, J. Parsing images of architectural scenes. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio De Janeiro, Brazil, 14–21 October 2007; pp. 1–8.
46. Yang, M.Y.; Förstner, W. Regionwise classification of building facade images. In Proceedings of the ISPRS Conference on Photogrammetric Image Analysis, Munich, Germany, 5–7 October 2011; pp. 209–220.
47. Shalunts, G.; Haxhimusa, Y.; Sablatnig, R. Architectural style classification of building facade windows. In Proceedings of the International Symposium on Visual Computing, Las Vegas, NV, USA, 26–28 September 2011; pp. 280–289.
48. Kevseroglu, O.; Kurban, R. Re-exploring the Kayseri Culture Route by Using Deep Learning for Cultural Heritage Image Classification. In Proceedings of the Cognitive Models and Artificial Intelligence Conference (AICCONF’24), Istanbul, Turkey, 25–26 May 2024; pp. 196–201.
49. Zhao, P.; Miao, Q.; Song, J.; Qi, Y.; Liu, R.; Ge, D. Architectural style classification based on feature extraction module. IEEE Access 2018, 6, 52598–52606.
50. Lee, S.; Maisonneuve, N.; Crandall, D.; Efros, A.A.; Sivic, J. Linking past to present: Discovering style in two centuries of architecture. In Proceedings of the IEEE International Conference on Computational Photography, Houston, TX, USA, 24–26 April 2015.
51. Wang, B.; Zhang, S.; Zhang, J.; Cai, Z. Architectural style classification based on CNN and channel–spatial attention. Signal Image Video Process. 2022, 17, 99–107.
52. Porretta, P.; Pallottino, E.; Colafranceschi, E. Minnan and Hakka Tulou. Functional, Typological and Construction Features of the Rammed Earth Dwellings of Fujian (China). Int. J. Arch. Herit. 2022, 16, 899–922.
53. Wang, X.Y.; Guo, W.M.; Yang, Z.; Li, X.Y.; Zhang, B.W. Localisation of Composite capital designs in modern Jiangsu, China, based on formal and social analysis. Humanit. Soc. Sci. Commun. 2023, 10, 560.
54. Tang, S.S.; Feng, J.X.; Li, M.Y. Housing tenure choices of rural migrants in urban destinations: A case study of Jiangsu Province, China. Hous. Stud. 2017, 32, 361–378.
55. Li, X.L.; Wang, Y. A Study on the Protection and Planning of Historical and Cultural Cities. In Proceedings of the 4th International Conference on Energy and Environmental Protection (ICEEP), Shenzhen, China, 2–4 June 2015; pp. 3830–3834.
56. Yong, Z. Jiangsu Folk Houses; China Architecture & Building Press: Beijing, China, 2009.
57. Shi, Y.B.; Shi, F.; Zhang, Z.J. Conservation, Utilisation and Rural Revitalisation of Traditional Residence Villages in Nantong, Jiangsu, China. Delta 2023, 21, 205–210.
58. Huang, H. Re-interpretation of New Jiangnan Style: The Design of Jiataowan Residential Community in Zhujiajiao Town. Huazhong Archit. 2014, 32, 84–87.
59. Jiang, X.H. Research on the Application of Jiangsu Canal Cultural Elements in the Renewal and Reconstruction of Traditional Residences. Chutzpah 2024, 4, 78–80.
60. Wang, X.Y.; Guo, W.M.; Yang, Z. A Study on the Research Methods Used in Modern Chinese Architectural Decoration Design. In Proceedings of the 9th Congress of the International-Association-of-Societies-of-Design-Research (IASDR); Springer: Hong Kong, 2021; pp. 2724–2738.
61. Yang, S.G. Exploring the Aesthetic Connotation of Xuzhou at Hubushan Ancient Dwellings. In Proceedings of the 2nd International Conference on Social Science and Health (ICSSH); Atlantis Press: Taipei, Taiwan, 2014; Volume 56, pp. 318–321.
62. Weng, W.F.; Wu, J.X.; Bao, L. The Regeneration of Traditional Residential Block with Typological Approach-Taking Zhongnongli in Nanjing as an Example. In Proceedings of the 5th World Multidisciplinary Civil Engineering-Architecture-Urban Planning Symposium (WMCAUS), Prague, Czech Republic, 1–5 September 2020; Volume 960, p. 42037.
63. Liu, Q.; Liao, Z.; Wu, Y.; Mulugeta Degefu, D.; Zhang, Y. Cultural Sustainability and Vitality of Chinese Vernacular Architecture: A Pedigree for the Spatial Art of Traditional Villages in Jiangnan Region. Sustainability 2019, 11, 6898.
64. Wang, Z.Q. Research on the Historical and Aesthetic Value of Nanjing Republican Architecture and Decorative Style. Designs 2020, 9, 14–17.
65. Malewczyk, M.; Taraszkiewicz, A.; Czyz, P. Composition Patterns of Contemporary Polish Residential Building Facades. Nexus Netw. J. 2022, 24, 767–785.
66. Meddahi, K.; Boussora, K. Aesthetic Measures of Algiers’ Colonial Facades. Nexus Netw. J. 2021, 23, 667–688.
67. Ramalingam, S.P.; Kumar, V. Building usage prediction in complex urban scenes by fusing text and facade features from street view images using deep learning. Build. Environ. 2025, 267, 112174.
68. Philbin, J.; Chum, O.; Isard, M.; Sivic, J.; Zisserman, A. Object retrieval with large vocabularies and fast spatial matching. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8.
69. Jiao, M.; Lu, L. Spatiotemporal distribution of toponymic cultural heritage in Jiangsu Province and its historical and geographical influencing factors. Herit. Sci. 2024, 12, 377.
70. Bao, L.; Li, H.; Liu, C.; Jin, H. The Strategy of Integrated Promotion of Function and Performance in the Renovation of Vernacular Dwelling: The Case of the Renovation of Vernacular Dwelling in the Historical City of Yixing, Jiangsu. New Archit. 2017, 5, 12–17.
71. Zhang, Y.; Luo, X.; Xu, X.; Mak, K.; Ruan, D. Construction and Application of Cultural Gene Library of Ancestral Hall in Canton Region. Teh. Vjesn. 2024, 31, 993–1004.

Figure 1. Definition of the study area.

Figure 2. Deep eave on vernacular architecture.

Figure 3. The zheng wen of vernacular architecture roof.

Figure 4. The gables of vernacular architecture.

Figure 5. Long window in vernacular architecture.

Figure 6. Elaborate construction and exquisite decoration: (a) Elaborate construction; (b) Exquisite Decoration.

Figure 7. Jiangsu traditional vernacular architecture image dataset (partly): (a) Northern Jiangsu; (b) Middle Jiangsu; (c) Southern Jiangsu.

Figure 8. The flowchart of the proposed method.

Figure 9. Faster R-CNN Network Architecture.

Figure 10. Feature Pyramid Structure.

Figure 11. Standard convolution and deformable convolution: (a) Standard convolution; (b–d) Deformable convolutions.

Figure 12. Principle of deformable convolution.

Figure 13. AOD R-CNN Network Architecture.

Figure 14. Recognition results of vernacular architecture formal features: (a) Results of feature identification in the northern Jiangsu; (b) Results of feature identification in the middle Jiangsu; (c) Results of feature identification in the southern Jiangsu; (d) Comparison of identification results by region.

Figure 15. The identification result for the d-u DUCAL Coffee & Culture.

Figure 16. The identification result for the project of the folk song culture center in Fengmenglong village.

Table 1. Historical and Cultural Ancient Villages and Towns in Jiangsu Province.

Northern Jiangsu	Northern Jiangsu	Middle Jiangsu	Southern Jiangsu	Southern Jiangsu
Hexia Ancient Town	Taierzhuang Ancient Town	Yudong Ancient Town	Zhouzhuang Ancient Town	Yanjiaqiao Ancient Town
Yaodi Ancient Town	Anfeng Ancient Town	Yuxi Ancient Town	Tongli Ancient Town	Nanchang Street
Jiangba Ancient Town	Zhuxi Ancient Town	Shixiang Ancient Town	Luzhi Ancient Town	Ganlu Ancient Town
Pingqiao Ancient Town	Miaowan Ancient Town	Bencha Ancient Town	Mudu Ancient Town	Jiaoxi Ancient Town
Xuyi Ancient Street	Yandu Ancient Town	Baipu Ancient Town	Jinxi Ancient Town	Yangqiao Ancient Town
Lvliang Ancient Town	Wuyouzhuxi Ancient Town	Tangzha Ancient Town	Lili Ancient Town	Benniu Ancient Town
Banzha Ancient Town	Yukou Ancient Town	Dingyan Ancient Town	Qiandeng Ancient Town	Menghe Ancient Town
Guishan Ancient Town	Maliang Ancient Town	Qutang Ancient Town	Zhenze Ancient Town	Xueyan Ancient Town
Zaohe Ancient Town	Xixi Ancient Town	Lvsi Ancient Town	Shaxi Ancient Town	Baoyan Ancient Town
Yanghe Ancient Town	Caoyan Ancient Town	Erjia Ancient Town	Liuhe Ancient Town	Xijindu Ancient Town
Shuanggou Ancient Town	Dongjin Ancient Town	Haian Ancient Town	Guangfu	Yanling Ancient Town
Chuancheng Ancient Town	Lianyun Ancient Town	Shaobo Ancient Town	Luxu Ancient Town	Qianhua Ancient Town
Wang’s Hometown	Haizhou Ancient Town	Guazhou Ancient Town	Shuangfeng Ancient Town	Ruli Ancient Town
Yaowan Ancient Town	Banpu Ancient Town	Daqiao Ancient Town	Luxiang Ancient Town	Gecun Ancient Town
Buzi Ancient Town	Phoenix	Zhenzhou Ancient Town	Zhengyi Ancient Town	Qixia Ancient Town
Hanwang Ancient Town	Nancheng Ancient Town	Jieshou Ancient Town	Xiemaqiao Ancient Town	Wuxiang Water Town
Panan Ancient Town	Donghaiquyang Ancient Town	Wantou Ancient Town	Pingmen	Jinling Ancient Town
Tushan Ancient Town	Yanhe Lane	Linze Ancient Town	Guli Ancient Town	Guchengwan
Dashahe Ancient Town	Erdao Street	Dayi Ancient Town	Huangjing Ancient Town	Gaochun Ancient Street
Xiapi Ancient Town	Yankesi	Sanduo Ancient Town	Pingwang Ancient Town	Lishuishiqiu Ancient Town
Wushao Ancient Town	Anran Ancient Town	Fanshui Ancient Town	Luyuan Ancient Town	Jiangninghushu Ancient Town
——	——	Qintong Ancient Town	Yangwan Ancient Town	Pukoutangquan Ancient Town
——	——	Shagou Ancient Town	Wenzhao Ancient Town	Qiqiao Ancient Town
——	——	Huangqiao Ancient Town	Tangshi Ancient Town	Chunxi Ancient Town
——	——	Chaixu Ancient Town	Huishan Ancient Town	Dongmen Ancient Town
——	——	Daohe Ancient Town	Xuntang Ancient Town	Guabu Ancient Town
——	——	Jindongmen Ancient Street	Yuantouzhu	Hushu Ancient Town
——	——	——	Rongxiang Ancient Town	Moling Ancient Town
——	——	——	Dangkou Ancient Town	Taowu Ancient Town
——	——	——	Meili Ancient Town	Tangshan Ancient Town
——	——	——	Changjing Ancient Town	Banqiao Ancient Town
——	——	——	Nanquan Ancient Town	Lukou Ancient Town
——	——	——	Yuqi Ancient Town	Yulongtanmingqing Ancient Town

Table 2. Image Statistics of Traditional Vernacular Architecture in Jiangsu Province.

Regional Division	Number of Historical Villages and Towns	Number of Architectural Images
Northern Jiangsu	47	153
Middle Jiangsu	27	106
Southern Jiangsu	68	186
Total	142	445

Table 3. ResNet50 Network Architecture.

Layer Name	Output Size	ResNet50
Conv1	112 × 112	7 × 7, 64, stride = 2
Conv2_x	56 × 56	3 × 3, max pool, stride = 2 $[\begin{array}{l} 1 \times 1, 64 \\ 3 \times 3, 64 \\ 1 \times 1, 256 \end{array}] \times 3$
Conv3_x	28 × 28	$[\begin{array}{l} 1 \times 1, 128 \\ 3 \times 3, 128 \\ 1 \times 1, 512 \end{array}] \times 4$
Conv4_x	14 × 14	$[\begin{array}{l} 1 \times 1, 256 \\ 3 \times 3, 256 \\ 1 \times 1, 1024 \end{array}] \times 6$
Conv5_x	7 × 7	$[\begin{array}{l} 1 \times 1, 512 \\ 3 \times 3, 512 \\ 1 \times 1, 2048 \end{array}] \times 3$
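The output sizes in Table 3 follow from the standard output-size formula for convolution and pooling layers, out = ⌊(n + 2p − k)/s⌋ + 1. The sketch below assumes a 224 × 224 input and the padding used in common ResNet implementations: 3 for the 7 × 7 Conv1, and 1 for the 3 × 3 max pool and the stride-2 3 × 3 convolutions that open Conv3_x to Conv5_x. The sketch also shows how the block counts 3, 4, 6, 3 give the “50” in ResNet50. That count includes the final fully connected layer of the classification network, which a Faster R-CNN style backbone does not use.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_stage_sizes(input_size: int = 224) -> dict:
    """Output size after each stage in Table 3 (assumed input and padding)."""
    sizes = {}
    s = conv_out(input_size, kernel=7, stride=2, padding=3)  # Conv1: 7x7, 64, stride 2
    sizes["Conv1"] = s
    s = conv_out(s, kernel=3, stride=2, padding=1)  # 3x3 max pool, stride 2
    sizes["Conv2_x"] = s  # Conv2_x blocks keep the spatial size
    for stage in ("Conv3_x", "Conv4_x", "Conv5_x"):
        # Assumption: the first block of each stage downsamples with stride 2.
        s = conv_out(s, kernel=3, stride=2, padding=1)
        sizes[stage] = s
    return sizes


# Bottleneck blocks per stage, from the "x 3", "x 4", "x 6", "x 3" in Table 3.
BLOCKS = {"Conv2_x": 3, "Conv3_x": 4, "Conv4_x": 6, "Conv5_x": 3}
# Each bottleneck has three conv layers; add Conv1 and the classifier's fc layer.
depth = 1 + 3 * sum(BLOCKS.values()) + 1

print(resnet50_stage_sizes())
print(depth)
```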

Table 4. Comparison between Faster R-CNN and AOD R-CNN.

	Faster R-CNN	AOD R-CNN
Backbone network	VGG	ResNet50
Design of the feature pyramid structure	No	Addition of FPN network
Convolution structure	Standard convolution	Deformable convolutional networks and deformable RoI Pooling structures
Anchor selection algorithm	Default anchor box with fixed aspect ratio	Adaptive anchor selection algorithm based on K-means++
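Table 4 lists an “adaptive anchor selection algorithm based on K-means++”, but this section does not give its details. The sketch below assumes the common recipe for data-driven anchors popularized by YOLOv2. It clusters the widths and heights of ground-truth boxes with k-means, seeds the clusters with k-means++, and uses 1 − IoU rather than Euclidean distance so that large boxes do not dominate. The distance, the value of k, and how the resulting anchors are fed to the region proposal network in AOD R-CNN are all assumptions, and the box sizes in the example are hypothetical.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes that share a corner, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def distance(box: Box, centroid: Box) -> float:
    """Assumed clustering distance: 1 - IoU."""
    return 1.0 - iou_wh(box, centroid)


def kmeans_pp_init(boxes: List[Box], k: int, rng: random.Random) -> List[Box]:
    """k-means++ seeding: each new centroid is drawn with probability proportional to D^2."""
    centroids = [rng.choice(boxes)]
    while len(centroids) < k:
        d2 = [min(distance(b, c) for c in centroids) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box coincides with a centroid; fall back to uniform
            centroids.append(rng.choice(boxes))
        else:
            centroids.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centroids


def kmeans_anchors(boxes: List[Box], k: int, iters: int = 50, seed: int = 0) -> List[Box]:
    """Cluster box shapes into k anchors, returned sorted by area."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(boxes, k, rng)
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda i: distance(b, centroids[i]))
            clusters[nearest].append(b)
        updated = []
        for old, members in zip(centroids, clusters):
            if members:  # mean width and height of the assigned boxes
                updated.append((sum(w for w, _ in members) / len(members),
                                sum(h for _, h in members) / len(members)))
            else:  # keep the centroid of an empty cluster unchanged
                updated.append(old)
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])


# Hypothetical box sizes in pixels: a tall, narrow group and a wide, low group.
boxes = [(20, 60), (22, 58), (18, 62), (200, 50), (210, 55), (190, 48)]
print(kmeans_anchors(boxes, k=2))
```

The IoU-based distance groups boxes by shape and scale rather than by absolute pixel difference. This is the usual motivation for clustering anchors this way instead of using fixed-ratio default anchors.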

Table 5. The test results of vernacular architecture formal features in Jiangsu Province.

Features	Training Samples	0.68 ≤ Confidence Score < 0.78	0.78 ≤ Confidence Score < 0.88	Confidence Score ≥ 0.88	Total
Deep Eave	445	46	59	157	262
Zheng Wen	445	33	39	110	182
Gable	445	36	47	95	178
Long Window	445	50	56	139	245

Table 6. Identification results of vernacular architecture formal features in the Jiangsu region.

Regional Division	Architectural Images	Positive Sample	Deep Eave	Zheng Wen	Gable	Long Window
Northern Jiangsu	153	270	76	70	42	82
Middle Jiangsu	106	191	54	39	20	78
Southern Jiangsu	186	450	136	107	61	146
Total	445	911	266	216	123	306

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Han, P.; Hu, S.; Xu, R. Formal Feature Identification of Vernacular Architecture Based on Deep Learning—A Case Study of Jiangsu Province, China. Sustainability 2025, 17, 1760. https://doi.org/10.3390/su17041760
