Urban Architectural Style Recognition and Dataset Construction Method under Deep Learning of Street View Images: A Case Study of Wuhan

Xu, Hong; Sun, Haozun; Wang, Lubin; Yu, Xincan; Li, Tianyue

doi:10.3390/ijgi12070264


by Hong Xu ¹,²,*, Haozun Sun ¹, Lubin Wang ³, Xincan Yu ¹ and Tianyue Li ¹

¹ School of Urban Construction, Wuhan University of Science and Technology, Wuhan 430065, China

² Hubei Provincial Engineering Research Center of Urban Regeneration, Wuhan University of Science and Technology, Wuhan 430065, China

³ State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2023, 12(7), 264; https://doi.org/10.3390/ijgi12070264

Submission received: 5 April 2023 / Revised: 13 June 2023 / Accepted: 29 June 2023 / Published: 2 July 2023


Abstract

The visual quality and spatial distribution of architectural styles represent a city’s image, influence inhabitants’ living conditions, and can have positive or negative social consequences, which makes them critical to urban sensing and design. Conventional methods of identifying architectural styles rely on human labor and are frequently time-consuming, inefficient, and subjective. These issues significantly hinder the large-scale management of urban architectural styles. Fortunately, deep learning models have robust feature expression abilities for images and have achieved highly competitive results in object detection in recent years, offering a new approach to supporting traditional architectural style recognition. This paper first summarizes 22 architectural styles in the study area that can be used to define and describe urban architectural styles in most Chinese urban areas. It then introduces a Faster-RCNN general framework for architectural style classification with a VGG-16 backbone network, which is the first machine learning approach to identifying architectural styles in Chinese cities. Finally, it introduces an approach to constructing an urban architectural style dataset by mapping the architectural styles identified in continuous street view imagery onto vector map data of top-down building contours. The experimental results show that the resulting architectural style dataset had a precision of 57.8%, a recall rate of 80.91%, and an F1 score of 0.634. This dataset can, to a certain extent, reflect the geographical distribution characteristics of a wide variety of urban architectural styles. The proposed approach could support urban design aimed at improving a city’s image.

Keywords:

deep learning; street view images; architectural style recognition; dataset; construction method

1. Introduction

The construction of refined architectural style datasets is vital for guiding new urban architectural styles, renewing and transforming old communities, and controlling urban style. Many cities have begun implementing control requirements for architectural styles, landscape garden resources, cultural landmarks, and other landscape-related components. Urban style control does not mean unifying urban buildings into one style; rather, it means providing style usage standards so that the existing styles of neighboring buildings are fully considered when new structures are designed or existing buildings are repaired. This promotes stylistic harmony in urban areas and reflects the continuity, development, and regionality that characterize urban architectural style. The objective, rapid, and accurate identification of architectural style has long been a research problem in architecture and urban planning. In practice, several issues complicate it: different architectural styles share key architectural elements, multiple styles are often combined within a city’s buildings, and the characteristics of old residential areas or self-built houses are not prominent. In street view data, the properties of the image itself (noise, darkness, shadow, deformation, etc.) can also obscure the characteristic elements of a style. Identifying architectural styles in urban street environments is therefore difficult.

CNN models have made great strides in computer vision in recent years owing to better hardware performance and the rapid development of deep learning technologies. In face recognition, vehicle identification, and remote sensing target recognition [1], general recognition frameworks represented by the R-CNN and YOLO series of algorithms have received much research attention. Compared to traditional images [2], panoramic street view images (SVIs) can provide a 360-degree view similar to that of a pedestrian. SVIs depict façade structure in enough detail to portray the landscape from the ground up, capturing aspects that historically could not be seen from aerial and satellite platforms. Deep learning on SVIs, which can be applied to architectural style recognition, is increasingly valued in the fields of architecture and urban planning.

This paper proposes an approach to building an urban architectural style dataset using deep learning on SVIs. Its contributions are summarized as follows. First, it summarizes 22 architectural styles in the study area that can be used to define and describe urban architectural styles in most Chinese urban areas. Second, it implements a Faster-RCNN general framework for architectural style classification with a VGG-16 backbone network, which is the first machine learning approach for identifying architectural styles in Chinese cities. Third, it introduces an approach for constructing urban architectural style datasets by mapping the architectural styles identified in continuous street view imagery onto vector map data of top-down building contours. This is valuable for urban landscape planning and maintenance in sustainable and smart cities.

2. Related Works

2.1. Street View Image-Based Urban Data

Urban data based on street view images have historically been used in urban environmental assessment studies [3], but such data are difficult to evaluate and improve at a broad scale. Although the value of large data volumes is increasingly recognized, obtaining these data efficiently and cost-effectively remains a difficult challenge. Advanced surveying, mapping, and geographic information technologies are frequently employed to gather building locations and attribute data [4].

Large-scale building mapping projects rely heavily on human surveys and use aerial photogrammetry to build attribute data. Aerial photogrammetry, however, primarily captures the top-down outline of a structure and omits important façade information. Moreover, the collection of building attribute data is poorly automated and cannot support large-scale operations. The question of how to rapidly and automatically evaluate the varied geographic information of structures has therefore emerged. SVIs [5] provide an opportunity to solve this problem.

SVIs, which have been used to study urban environmental assessment [6,7], provide vast sample data sources and fresh research ideas because of their extensive coverage, their ability to provide street-level landscape information, and their low data-collection costs. SVIs are superior to traditional images in the following ways. First, they offer a broader perspective and full panoramic coverage. Second, their attribute data are more detailed and carry geographic labels, which lays the groundwork for later automation of building attribute data collection. Third, data can be acquired through the Application Programming Interfaces (APIs) provided by map service providers, which is significantly more efficient than manual data collection. Lastly, SVI data cover a vast area; current imagery includes historical and current street view photography for more than 100 countries, accounting for more than half of the world’s population. Free access to SVIs allows urban researchers to easily understand a city’s present status [8,9], and SVIs have become a crucial data source for geospatial data collection and urban analysis [10]. SVIs may eventually develop into a competitive big data source [11]. The above research shows that SVIs can be used as a data source for perceptual research at the macro scale of cities. However, because manual investigation is costly, processing a large number of images by hand is very challenging.

Combining SVI data with deep learning [12,13,14,15,16,17] for large-scale, in-depth evaluation and analysis of the physical environment at the urban level has become a trend. Especially in large metropolitan areas, the widespread availability of SVIs offers new opportunities to determine building styles efficiently and cost-effectively. The coverage of SVIs and the learning power of deep learning offer significant advantages for assessing urban spaces, streets, and environments [18,19], which has contributed substantially to the construction of smart cities [20]. Deep learning has high potential for extracting architectural features from images [21]. Its ability to learn textures, patterns, and colors makes it suitable for visual analytical tasks such as architectural style classification and the detection of architectural elements on building façades.

2.2. Deep Learning Technology in the Field of Architectural Study

As computer hardware has improved, deep learning technology has been increasingly used in the field of construction, for example, in building disaster prevention [22,23,24,25,26], building energy consumption [27,28,29,30], and building extraction [31,32]. In recent years, deep learning has also been applied to building façade color research [33], building management [34], and architectural psychology [35].

Some scholars have proposed identifying building components with deep learning, aiming to address the low accuracy and limited automation of ridge beast recognition in ancient buildings. Ji [36] proposed an automatic recognition method for ridge beasts based on a convolutional neural network, and the results could meet the application needs of fine 3D roof reconstruction, maintenance management, and generation. Other scholars [37] added group convolution and dilated convolution to the ResNet model and introduced an improved channel attention mechanism to construct a new CA-MSResNet model. This model captured the characteristics of local building components better and identified the façades of ten Western buildings and two traditional Chinese buildings. These studies attempt to identify specific visual features of buildings (e.g., windows, domes, columns). In contrast, Yoshimura paid more attention to capturing spatial design features: reference [38] identified the work of 34 different architects and used CNNs to predict their architectural styles. However, most of these architects were Pritzker Prize winners with distinctive architectural languages and expressions.

In recent years, some scholars have tried to apply deep learning technology to the field of architectural style, proposing deep learning-based identification methods for characteristic local buildings such as traditional Chinese buildings. Zhang et al. [39] proposed a deep learning model for automatically predicting the presence of traditional Chinese buildings and developed two view metrics to quantify pedestrians’ visual perception of buildings. To address the development and protection of traditional Chinese architectural settlements, the authors of [40] proposed a new deep learning classification framework and screened out six representative ethnic architectural styles: Min, Wan, Su, Jin, Jing, and Chuan. Overfitting was overcome with transfer learning and learning-based data augmentation (i.e., automatic augmentation). In addition, class activation mapping (CAM) visualization techniques can be used to help understand how CNN classifiers abstract patterns from inputs.

In terms of urban disaster prevention, Rony Kalfarisi [41] trained deep convolutional neural networks to identify urban soft-story buildings in order to minimize the risk of damage in the event of an earthquake and to develop mitigation plans, so that the required resources can be allocated to vulnerable buildings. However, such deep learning models are designed to segment undamaged buildings and may not be suitable for segmenting damaged ones. Chang [42] trained a squeeze-and-excitation dual-resolution network (HRNet) model using the xBD dataset and images from the 2010 Haiti earthquake labeled by the University of New South Wales to provide a safe and rapid method for classifying post-disaster building damage. The above methods select characteristic buildings as experimental objects; these buildings have more representative features than ordinary urban buildings, so satisfactory accuracy is relatively easy to achieve.

Some scholars have selected urban architecture as their research object. The authors of [43] combined sparse features (SF) at the network’s input with primary color pixel values and used CNN models to predict the architecture of Mexican historical buildings, dividing Mexican architecture into three categories to promote Mexican culture. Others [44] proposed a general network framework for classifying individual building functions using remote sensing images that show roof structures together with SVIs that show building façade structures. This method has been applied in Canada and the United States, generating eight regional and city-scale building classification maps. Some scholars [45] have explored the possibilities of several advanced image recognition algorithms for house style recognition, analyzed their limitations and possibilities, and used CNN models to classify house types, although the absolute accuracy achieved was limited. Xu et al. [46] extracted 19 architectural styles from images of four study areas. They generated a cityscape map based on Faster-RCNN combined with the spatial geometric constraints and image features of spherical panoramic images, reflecting the geographical distribution characteristics of a wide range of urban architectural styles. Sun et al. [47], by compiling collections of architectural chronology metadata in Amsterdam and Stockholm, showed that it is necessary to understand the evolution of architectural elements and the relationship between architectural chronology and style in space and time. The above scholars have applied deep learning technology to the classification of buildings in cities, usually classifying them according to building function or construction period.

Architectural styles have rich interclass relationships, and some scholars have made the following adjustments to improve the recognition performance of deep learning models. To discover the features shared within a style and the differences between styles, Zhao [48] proposed a feature extraction module based on a deformable part model (DPM), which adopts an improved integrated projection method to maximize interclass distance and minimize intra-class distance. Several classifiers were tested, and a support vector machine was selected as the best performer. Wang et al. [49] proposed an architectural style classification method based on CNNs and channel-spatial attention.

In summary, scholars have applied deep learning technology to architecture and urban planning by selecting designated experimental areas or representative architectural features as samples. Training can yield good recognition accuracy, which shows that deep learning can recognize architectural styles rapidly and efficiently. However, some studies have limited applicability and are unsuitable for large-scale urban applications. This paper proposes using CNN-based deep learning combined with Wuhan SVIs to identify and verify the dominant styles of Wuhan architecture [50,51]. Taking Wuhan architecture as the research object, this study aims to identify and classify Wuhan’s architectural styles and construct a Wuhan architectural style dataset, demonstrating the viability of the proposed approach for large-scale architectural style recognition in cities.

3. Materials and Methods

3.1. Study Area

Wuhan, the main city of the Yangtze River Economic Belt, is also a significant industrial base, a center for science and education, and a central transportation hub in China. We concentrate on the experimental area inside the Third Ring Road (Figure 1). It specifically contains the Western-style architectural landscape area centered on Jianghan Road and the Hankou Concession; the traditional Chinese architecture research area, represented by the Yellow Crane Tower and the classic buildings of Wuhan University; and the fully urbanized area extending from the central urban area to the Third Ring Road. Wuhan offers a wide range of architectural samples for research. While it has many historical sites and traditional buildings with Chinese characteristics, it also has Western-style buildings dating back to the concession era, so it encompasses both Chinese and foreign-style buildings. The study region has a total area of around 684 square kilometers, a resident population of approximately 6.38 million, and a high residential population density. For these reasons, we chose Wuhan as our experimental research area.

3.2. Methodology

This paper introduces an approach for building an urban architectural style dataset using deep learning on SVIs (Figure 2). The approach has three main parts. The first part defines labels for urban architectural styles that can describe urban architectural styles in most Chinese urban areas. This study summarized 22 architectural styles in the study area and collected SVIs of these styles to train the Faster-RCNN-based model to identify them. The images were obtained through Baidu map APIs to create the Wuhan SVI dataset; manual curation and preprocessing were then used to select images that met the specified criteria, and the architectural styles in these images were annotated. The second part builds a Faster-RCNN-based model for identifying these urban architectural styles. The Faster-RCNN model is trained on a dataset in PASCAL VOC format that is labeled with the architectural styles defined in the first part. The labels are compared with the deep features so that the network converges faster and “learns” the relationship between architectural construction features and architectural style. The third part constructs the urban architectural style dataset by combining the recognition results of the trained Faster-RCNN model with a panoramic image reference method, which uses continuous geographic tags to match each detection to the same building, determine its architectural style, and build the architectural style dataset.

3.2.1. Architectural Style Classification Based on Building Components and Characteristics

In recent years, many scholars have classified buildings at the urban level; the bases of their classifications can be summarized in the categories listed in Table 1.

This paper summarized 22 architectural styles in the study area for training a Faster-RCNN-based model to identify these urban architectural styles. The first step is to determine the differences between architectural styles within the region: if the relationships between styles cannot be distinguished, even an advanced classification approach or algorithm will be less robust. Gothic doors and windows, Roman columns, arches, and similar elements are representative architectural elements that can be used to identify different architectural styles, because such symbolic elements reflect the stylistic characteristics of a building. At the same time, compared with whole buildings, building components have simple structures and distinctive features, which makes them well suited to identifying architectural styles in SVIs. This paper proposes a new classification standard for architectural style datasets based on the architectural style classification systems of Chinese and foreign architectural development, combined with the distribution and quantity of existing buildings in the city, so that each style can be distinguished directly. Analysis of the relevant literature and field research shows that the most common buildings in Chinese cities fall into three categories: traditional Chinese architecture, Western-style architecture, and modern architecture. Ancient Chinese architecture has a rich history and outstanding, widely recognized achievements. There are many types of Western-style architecture in China, most of which were introduced after China opened its doors in the modern era. Trade activities led to large-scale concession construction, which brought about a sharp collision between Chinese and Western civilizations and the development of Western-style architecture in China.
Unlike instantly identifiable traditional Chinese buildings, modern urban buildings can be constructed in a variety of styles, or even a combination of styles, depending on the preferences of owners and clients. Dozens of architectural style types can be found on Baidu, and different locations may use different names for the same architectural style. To reduce complexity, we selected styles that are clearly distinct from one another.

Based on our existing datasets and an enumeration of the characteristics of each style, 22 architectural styles were summarized (Table 2). These 22 styles may not include every architectural style, but they are sufficient to support architectural style analysis in different cities and are applicable in most Chinese regions.

The above classification system of architectural styles and building components can be used to categorize building styles and components in SVIs. The CNN model extracts the high-dimensional features of these architectural styles and building components, and the high-dimensional image feature vector set of representative architectural styles is established to identify architectural features and classify architectural styles based on SVIs.

3.2.2. Deep Learning Network for Street View Architectural Style

In recent years, with the rapid development of digital image processing technology, methods have been developed to evaluate cities [52] based on digital images taken at sampling points. Traditional image-based approaches to urban style perception rely heavily on manual image acquisition, which is time-consuming, labor-intensive, and suitable only for small-scale evaluation of cities. When architectural styles are evaluated at the metropolitan level, the cost of on-site investigations makes it challenging to collect enough images to cover the entire urban area. This study therefore compiles image data for architectural styles using Baidu map APIs. Because façades within a given area may share similar decoration styles, the collected images should be spatially scattered and cover a wide area, so that the trained model adapts well to detection across the city.

To classify and identify prevalent architectural styles in SVIs, the experimental network model employed in this research uses the Faster-RCNN general framework for architectural style classification (Figure 3). Instead of the conventional selective search algorithm [53], the Faster-RCNN target detection algorithm introduces a Region Proposal Network (RPN), which predicts candidate bounding boxes together with a score indicating whether each box contains an object. This step requires little time and is highly efficient: proposal extraction, bounding-box regression (rectangle refinement), and classification are all integrated into a single network, which dramatically improves overall performance. The algorithm also relies on an important concept, the anchor box. Before objects in an image are classified, a series of candidate detection boxes is generated to facilitate classification and recognition by the neural network. The anchors of the model in this paper have three scales (128 × 128, 256 × 256, and 512 × 512) and three aspect ratios (1:1, 1:2, and 2:1), which combine into 9 anchor boxes of different shapes and sizes. Candidate boxes of various sizes can thus be generated on the original input image. Compared with other architectures, models with anchor boxes can locate the target area in a shorter time, select targets of different sizes more accurately, and achieve faster and more accurate classification and detection. We therefore use this algorithm to capture the candidate object regions of the target architectural images.
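The anchor configuration described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the common Faster-RCNN convention that an aspect ratio r means height/width and that each anchor keeps the area of its scale (w = s/√r, h = s·√r), and it assumes a feature-map stride of 16 pixels, which is typical for a VGG-16 backbone but is not stated in the text. The function names are hypothetical.

```python
import math

def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) for every scale/ratio pair.

    Assumption: ratio = height / width, and each anchor keeps an area of scale**2.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            anchors.append((s / math.sqrt(r), s * math.sqrt(r)))
    return anchors

def grid_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Centre every base anchor on every feature-map cell.

    Returns boxes as (x1, y1, x2, y2) in input-image pixels; stride=16 is an
    assumed VGG-16 downsampling factor.
    """
    shapes = base_anchors(scales, ratios)
    boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for w, h in shapes:
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

print(len(base_anchors()))          # 9 anchor shapes
print(len(grid_anchors(64, 128)))   # 73728 anchors for a 64 x 128 feature map
```

Under these assumptions, a 2048 × 1024 SVI processed at full resolution would give a 64 × 128 feature map and 64 × 128 × 9 = 73,728 anchors; many Faster-RCNN implementations rescale the input first, so the actual count depends on the configuration.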

Within the constructed Faster-RCNN, this approach uses VGG-16 as its backbone network. The VGG-16 model comprises 13 convolutional layers, 5 pooling layers, and 3 fully connected layers. Because the convolutional and fully connected layers contain weight coefficients, they are also known as weight layers; there are 16 of them in total, which is the origin of the name VGG-16. In the VGG-16 network, the 13 convolutional layers and 5 pooling layers are responsible for feature extraction, and the last 3 fully connected layers complete the classification task. The resulting detection model, with VGG-16 as its backbone, is shown in Figure 3. Its four primary components are the feature extraction module (VGG-16), the region proposal network, the proposal layer, and the ROI pooling layer.
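As a quick check of the layer count described above, the sketch below encodes the standard VGG-16 layer plan (configuration D of the original VGG work) as a Python list and counts its layers. It also derives the downsampling factor at the last convolutional layer, which is where Faster-RCNN implementations typically take the VGG-16 feature map; that usage is an assumption rather than a detail given in this paper, and the helper names are hypothetical.

```python
# Standard VGG-16 layer plan ("configuration D"): integers are 3x3 convolutions
# with that many output channels, "M" is a 2x2 max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
NUM_FC = 3  # fully connected layers of the original classifier head

def count_layers(cfg):
    """Return (number of conv layers, number of pooling layers)."""
    convs = sum(1 for v in cfg if v != "M")
    return convs, len(cfg) - convs

def last_conv_stride(cfg):
    """Downsampling factor at the last conv layer: 2 ** (pools that precede it)."""
    last_conv = max(i for i, v in enumerate(cfg) if v != "M")
    return 2 ** sum(1 for v in cfg[:last_conv] if v == "M")

convs, pools = count_layers(VGG16_CFG)
print(convs, pools, convs + NUM_FC)   # 13 5 16
print(last_conv_stride(VGG16_CFG))    # 16
```

The 13 convolutional layers plus 3 fully connected layers give the 16 weight layers behind the name; the 5 pooling layers carry no weights and are not counted.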

The specific workflow of this model is as follows. First, a feature map is extracted from the original image by a series of convolutional and pooling layers. Second, the region proposal network is applied to the feature map to obtain the approximate positions of targets, and training continues. Third, using these proposed positions, each candidate target region is cropped from the feature map and pooled into fixed-length data (ROI pooling). Finally, a Softmax classifier predicts the class scores of each target.
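The pooling and classification steps can be illustrated with a small NumPy sketch. It assumes a (channels, height, width) feature map, a stride of 16 pixels between the feature map and the input image, and a 7 × 7 output grid (a common choice for VGG-16 in Faster-RCNN, not a value given in this paper). `roi_max_pool` and `softmax` are hypothetical helper names, and the rounding scheme is a simplification of what production libraries do.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(7, 7), stride=16):
    """Max-pool one proposal into a fixed out_size grid.

    feature_map: array of shape (C, H, W); roi: (x1, y1, x2, y2) in image pixels.
    Coordinates are rounded onto the feature grid (a simplification).
    """
    _, h, w = feature_map.shape
    x1, y1, x2, y2 = (int(round(v / stride)) for v in roi)
    x1, y1 = min(max(x1, 0), w - 1), min(max(y1, 0), h - 1)
    x2, y2 = max(min(x2, w), x1 + 1), max(min(y2, h), y1 + 1)  # keep at least one cell
    region = feature_map[:, y1:y2, x1:x2]
    rh, rw = region.shape[1:]
    out_h, out_w = out_size
    out = np.empty((feature_map.shape[0], out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        ys = (i * rh) // out_h
        ye = max(ys + 1, -(-((i + 1) * rh) // out_h))  # ceiling division
        for j in range(out_w):
            xs = (j * rw) // out_w
            xe = max(xs + 1, -(-((j + 1) * rw) // out_w))
            out[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

def softmax(logits):
    """Numerically stable softmax over a 1-D vector of class scores."""
    z = np.asarray(logits, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()
```

Whatever the size of the proposal, the pooled output always has shape (C, 7, 7) here, which is what allows the fully connected layers that follow to accept every candidate region before the Softmax scores are computed.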

To train and test the model, this study split the city-level images collected for the Wuhan architectural style dataset into training and test sets at a ratio of 4:1. The training set was used to train the deep learning model until it converged to a level that met the training requirements, and the test set was used to assess the accuracy of the trained model. Each original training image was preprocessed into a raw pixel image and a label information map, both of which were fed into the model for training. The training set was then used to extract low- and high-level image features layer by layer and to calculate the precision (P). The trained model was evaluated on the test set to calculate the recall (R). P and R were checked against the accuracy requirements; if they were not met, the parameters were adjusted and the model retrained until they were. The final model can divide the building façades in the images into 22 categories.
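A 4:1 split like the one described above can be reproduced with a few lines of standard-library Python. The paper does not say how images were assigned to each set, so this sketch assumes a seeded random shuffle; `split_train_test` is a hypothetical name.

```python
import random

def split_train_test(image_ids, train_parts=4, test_parts=1, seed=0):
    """Shuffle image IDs reproducibly and split them train_parts:test_parts."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * train_parts // (train_parts + test_parts)
    return ids[:n_train], ids[n_train:]

train, test = split_train_test(range(43670))
print(len(train), len(test))  # 34936 8734
```

Because adjacent SVIs are only 8–20 m apart, neighboring images often show the same building; splitting by location rather than by individual image would keep such near-duplicates out of the test set, although the text does not indicate which approach was used.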

This study uses several evaluation criteria to estimate the performance of the constructed model. The first is precision (P), the proportion of samples predicted to be positive that are actually positive. Together with recall and the F1 score, it is used as the reference standard in this article. Its expression is:

\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (1)

The second is recall (R), which is defined with respect to the actually positive samples: it is the proportion of actually positive samples that are predicted to be positive, expressed as:

\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (2)

where TP (True Positive) is the number of samples that are predicted positive and are actually positive; FP (False Positive) is the number of samples that are predicted positive but are actually negative; TN (True Negative) is the number of samples that are predicted negative and are actually negative; and FN (False Negative) is the number of samples that are predicted negative but are actually positive. By default, model training is considered to have reached its limit when the improvement in P and R is less than 1%.

The third is the F1 score, commonly referred to as the F-measure, a metric of classification accuracy that accounts for both precision and recall. It is defined as their harmonic mean:

F_1 = \frac{2 \times P \times R}{P + R} \qquad (3)
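Equations (1) to (3) translate directly into code. The sketch below also encodes one reading of the stopping rule stated above, treating "less than 1%" as an absolute improvement of under 0.01 in both P and R between evaluations; that interpretation, and all function names, are assumptions.

```python
def precision(tp, fp):
    """Equation (1): share of predicted positives that are truly positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Equation (2): share of actual positives that are predicted positive."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1(p, r):
    """Equation (3): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

def has_converged(prev, curr, tol=0.01):
    """Hypothetical stopping rule: (P, R) pairs improved by less than tol."""
    return (curr[0] - prev[0]) < tol and (curr[1] - prev[1]) < tol

p, r = precision(80, 20), recall(80, 40)
print(p, r, f1(p, r))
```

When scores are computed per class and then averaged over the styles, averaging the per-class F1 values generally gives a different number from applying Equation (3) to the averaged P and R, so the two figures need not agree.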

3.2.3. Street View Architectural Style Dataset Construction

SVI data have gained popularity in recent years as proxies for real-world experience and perception, and they are frequently used in scientific research. Using Google Street View (GSV), a feature of Google Maps, is less time-consuming and labor-intensive than traditional on-site data collection or drone photography. GSV contains a substantial quantity of data and covers a wide geographic range (street view services are available in more than 100 countries), and map providers offer APIs for development, so SVIs and related parameters can be acquired at the desired location. SVIs contain comprehensive information on urban infrastructure, allowing street-level architectural images to be retrieved and urban façade information to be reflected intuitively and accurately [54]. Commonly used SVI data sources in the literature are Google Street View (GSV) [55], Baidu Street View (BSV), and Tencent Street View (TSV) [56]. The data in this study were derived from the Baidu Street View database. To improve recognition accuracy, the images in the dataset should satisfy four conditions: sufficient light, weak distortion, complete content, and good resolution. During data cleaning, images that were too deformed, damaged, or unclear were excluded, and all images were resized to 2048 × 1024 pixels using OpenCV (Figure 4).
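The paper applies the four image conditions above through manual curation; simple automatic screens could support that review. The NumPy sketch below is an assumption-laden illustration, not the authors' pipeline: it treats mean grayscale intensity as a proxy for "sufficient light", the variance of a Laplacian response as a proxy for sharpness, and pixel dimensions as a proxy for resolution, with hypothetical threshold values that would need tuning on real Baidu SVIs. The resizing step itself would be a single call to OpenCV's `cv2.resize(img, (2048, 1024))`, where the target size is given as (width, height).

```python
import numpy as np

def to_gray(img):
    """Convert an H x W x 3 RGB array (or an H x W array) to float grayscale."""
    arr = np.asarray(img, dtype=np.float64)
    if arr.ndim == 3:
        # ITU-R BT.601 luma weights; assumes RGB channel order (OpenCV loads BGR)
        arr = 0.299 * arr[..., 0] + 0.587 * arr[..., 1] + 0.114 * arr[..., 2]
    return arr

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry image."""
    lap = (gray[1:-1, :-2] + gray[1:-1, 2:] + gray[:-2, 1:-1] + gray[2:, 1:-1]
           - 4.0 * gray[1:-1, 1:-1])
    return float(lap.var())

def passes_screening(img, min_brightness=60.0, min_sharpness=50.0,
                     min_width=1024, min_height=512):
    """Hypothetical thresholds: reject dark, blurry, or low-resolution images."""
    gray = to_gray(img)
    height, width = gray.shape
    if width < min_width or height < min_height:
        return False
    return bool(gray.mean() >= min_brightness
                and laplacian_variance(gray) >= min_sharpness)
```

Such screens can only flag candidates for review; conditions like "complete content" and "weak distortion" still need human judgment.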

After image processing was completed, LabelImg, an open-source graphical image annotation tool obtained from GitHub, was used to annotate the architectural style content. LabelImg is commonly used to label image targets for object detection networks such as Faster-RCNN, YOLO, and SSD, and the XML files it generates follow the PASCAL VOC format. The annotated data, including the label type, coordinate points, and number of objects, were stored in these XML files. The information in the files could be used to generate binary images for training the CNN models to a fitted state. The completed PASCAL VOC dataset was fed into the CNNs to “learn” the deep spatial composition information of different architectural styles. Repeated cycles of parameter tuning, training, and validation minimized the network loss until the model converged. The acquired SVI dataset was then fed into the trained CNN model to predict architectural styles. Continuous SVIs were used to match each building with the map vector data [4] (Figure 5). Each single building region in an image was matched to its corresponding building outline by establishing the correspondence between the azimuth of the building in the continuous SVIs and its azimuth in the planar building outline map. The mapping between each target building in the image and its planar outline was thereby established, and the architectural style dataset was constructed from it.
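Because LabelImg stores each annotation as a PASCAL VOC XML file, the boxes and style labels can be read back with Python's standard `xml.etree.ElementTree` module, for example to count annotations per style or to check box sizes. The sample annotation and the label `example_style` below are hypothetical placeholders, not taken from the Wuhan dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-object annotation in the PASCAL VOC layout written by LabelImg
SAMPLE_XML = """<annotation>
  <filename>sample_0001.jpg</filename>
  <size><width>2048</width><height>1024</height><depth>3</depth></size>
  <object>
    <name>example_style</name>
    <bndbox><xmin>300</xmin><ymin>200</ymin><xmax>900</xmax><ymax>700</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return ((width, height), [{'name': ..., 'box': (xmin, ymin, xmax, ymax)}, ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    image_size = (int(size.findtext("width")), int(size.findtext("height")))
    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        box = tuple(int(float(bndbox.findtext(tag)))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({"name": obj.findtext("name"), "box": box})
    return image_size, objects

size, objects = parse_voc(SAMPLE_XML)
print(size, objects)
```

For files on disk, `ET.parse(path).getroot()` would replace `ET.fromstring` while the rest of the function stays the same.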

We introduced a method that maps the orientation of a single building area in the street view image in order to match the building in the image with the building's top-down contour map. The basic flow of the algorithm is as follows: (1) Obtain the area of a single building in the street view image (represented by the pixel coordinates of the corners of its rectangular bounding box). (2) Transform the pixel coordinates of the rectangle's corner points into panoramic spherical coordinates, construct a rotation matrix from the azimuth and pitch angles of the street view, and rotate the panoramic spherical coordinates so that they are aligned with the geodetic coordinate system. (3) Calculate, centered on the collection point, the azimuth range of the target building in panoramic spherical coordinates and the azimuth ranges of all buildings near the collection point. Theoretically, the azimuth range of the target building in the panoramic spherical coordinate system is the same as that of its corresponding top-down contour. Based on this principle, we designed an intersection-over-union (IoU) index to select the building contour with the highest overlap in azimuth range, thereby matching the target building with its corresponding top-down contour.
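The three steps can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an equirectangular 2048 × 1024 panorama whose centre column faces the street view's heading (clockwise from north), uses simple sign conventions for the heading and pitch rotations, represents azimuth ranges as (start, width) intervals in degrees, and gives building footprints as (east, north) vertex coordinates in metres relative to the collection point. All function names are hypothetical.

```python
import numpy as np

def pixel_to_unit_vector(u, v, width=2048, height=1024):
    """Equirectangular pixel -> unit vector in an assumed camera frame (x right, y forward, z up)."""
    lon = (u / width - 0.5) * 2.0 * np.pi  # 0 at the centre column, grows to the right
    lat = (0.5 - v / height) * np.pi       # +pi/2 at the top row
    return np.array([np.cos(lat) * np.sin(lon), np.cos(lat) * np.cos(lon), np.sin(lat)])

def rotation_matrix(heading_deg, pitch_deg):
    """Camera frame -> local (east, north, up) frame; sign conventions are assumptions."""
    h, p = np.radians(heading_deg), np.radians(pitch_deg)
    pitch = np.array([[1, 0, 0],
                      [0, np.cos(p), -np.sin(p)],
                      [0, np.sin(p), np.cos(p)]])
    heading = np.array([[np.cos(h), np.sin(h), 0],   # clockwise about "up", so the
                        [-np.sin(h), np.cos(h), 0],  # forward axis points at heading_deg
                        [0, 0, 1]])
    return heading @ pitch

def azimuth_deg(vec):
    """Clockwise angle from north for an (east, north, ...) vector."""
    return float(np.degrees(np.arctan2(vec[0], vec[1])) % 360.0)

def angular_span(azimuths, ref):
    """Interval (start, width) around `ref` covering all azimuths; assumes a span < 180 deg."""
    rel = (np.asarray(azimuths) - ref + 180.0) % 360.0 - 180.0
    return float((ref + rel.min()) % 360.0), float(rel.max() - rel.min())

def box_interval(xmin, ymin, xmax, ymax, heading_deg, pitch_deg, width=2048, height=1024):
    """Steps (1)-(2): box corners -> rotated spherical directions -> azimuth range."""
    R = rotation_matrix(heading_deg, pitch_deg)
    corners = [(xmin, ymin), (xmax, ymin), (xmin, ymax), (xmax, ymax)]
    az = [azimuth_deg(R @ pixel_to_unit_vector(u, v, width, height)) for u, v in corners]
    centre = azimuth_deg(R @ pixel_to_unit_vector((xmin + xmax) / 2, (ymin + ymax) / 2, width, height))
    return angular_span(az, centre)

def footprint_interval(vertices_en):
    """Step (3): azimuth range of a footprint seen from the collection point."""
    pts = np.asarray(vertices_en, dtype=float)
    az = np.degrees(np.arctan2(pts[:, 0], pts[:, 1]))
    cx, cy = pts.mean(axis=0)
    return angular_span(az, float(np.degrees(np.arctan2(cx, cy))))

def interval_iou(a, b):
    """IoU of two circular azimuth intervals given as (start, width) in degrees."""
    (sa, wa), (sb, wb) = a, b
    rel = (sb - sa) % 360.0  # start of b measured clockwise from start of a
    overlap = 0.0
    for shift in (rel, rel - 360.0):  # b may wrap past north
        overlap += max(0.0, min(wa, shift + wb) - max(0.0, shift))
    union = wa + wb - overlap
    return overlap / union if union > 0 else 0.0

def match_building(box_iv, footprints):
    """Index and IoU of the footprint whose azimuth range best overlaps the box."""
    scores = [interval_iou(box_iv, footprint_interval(fp)) for fp in footprints]
    best = int(np.argmax(scores))
    return best, scores[best]

# Toy example: a box spanning the centre column to +45 degrees, camera facing north.
iv = box_interval(1024, 400, 1280, 624, heading_deg=0.0, pitch_deg=0.0)
candidates = [
    [(10, -5), (20, -5), (20, 5), (10, 5)],  # footprint to the east
    [(0, 10), (10, 10), (20, 20), (0, 20)],  # footprint between north and north-east
]
best, score = match_building(iv, candidates)
```

Because only directions are compared, no absolute position of the building has to be estimated. A real pipeline would additionally need to convert footprint coordinates from the map projection into this local east-north frame around each collection point.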

This process does not require computing the absolute position of the building area in the street view, which avoids a variety of errors associated with absolute positioning. Nevertheless, the accuracy of the algorithm needed to be assessed. We therefore manually selected 500 building areas in the street views and marked their corresponding building outlines on the map as ground-truth labels. The algorithm's matched building outlines were then compared with these labels to obtain the matching accuracy. The results are shown in Table 3.

4. Results

4.1. Street View Image Annotation

No public dataset classifying architectural styles in Wuhan is currently available, so this paper uses the central urban area of Wuhan as the research object. We obtained a total of 133,114 Baidu SVIs captured from 2020 to the present, together with vector map data in shapefile format containing roughly 130,743 top-down building outlines (Figure 6). These were used to create a Wuhan SVI dataset for network model training and testing. Twenty-three different architectural styles prevalent in urban architecture were compiled into the dataset. Each SVI has a resolution of 2048 × 1024 pixels, and the distance between two adjacent image locations is 8–20 m. Each SVI carries a geographic information label, which includes the latitude and longitude of the location where the image was taken, the azimuth angle of true north in the image, the shooting attitude of the camera, and unique identifiers of the adjacent street views. A PASCAL VOC sample set of 43,670 images was generated through manual annotation by undergraduate and graduate students with professional architectural training. Regular quantitative inspections of the completed annotations were also conducted; after manual checking and confirmation, the label accuracy reached 100%. The size of the extracted building affects annotation accuracy to some extent. Because the preprocessed SVIs all had a resolution of 2048 × 1024 pixels, we kept the annotation boxes within the range of 200 × 100 to 1000 × 500 pixels to ensure the accuracy and validity of the annotations. The preprocessed image data were divided into training and test sets at a ratio of 4:1. Figure 7 shows the number of original annotations for each architectural style.
To balance the training samples, images of building areas from under-represented classes were horizontally flipped, effectively doubling their sample size.
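The box-size constraint, the 4:1 split, and the flip augmentation can be sketched as below. The sketch assumes that the 200 × 100 to 1000 × 500 range bounds box width and height independently, that the split is a seeded random shuffle (the text does not say how samples were assigned), and that boxes use continuous pixel coordinates (PASCAL VOC's 1-based indexing would shift flipped coordinates by one pixel). Function names are hypothetical.

```python
import random
import numpy as np

MIN_BOX = (200, 100)   # (width, height) lower bound in pixels, from the text
MAX_BOX = (1000, 500)  # (width, height) upper bound in pixels, from the text

def box_size_ok(box):
    """Check an (xmin, ymin, xmax, ymax) box against the annotation size range."""
    xmin, ymin, xmax, ymax = box
    w, h = xmax - xmin, ymax - ymin
    return MIN_BOX[0] <= w <= MAX_BOX[0] and MIN_BOX[1] <= h <= MAX_BOX[1]

def split_train_test(samples, seed=0):
    """Shuffle and split samples 4:1 into training and test lists."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = len(samples) * 4 // 5
    return samples[:cut], samples[cut:]

def hflip(image, boxes):
    """Mirror an H x W (x C) image left-to-right and move its boxes accordingly."""
    img_w = image.shape[1]
    flipped = image[:, ::-1].copy()
    new_boxes = [(img_w - xmax, ymin, img_w - xmin, ymax)
                 for xmin, ymin, xmax, ymax in boxes]
    return flipped, new_boxes
```

Flipping is applied only to classes with few samples, so that rare styles gain extra training examples without changing the frequent classes.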

4.2. Model Comparison

This study used the Wuhan SVI dataset to fine-tune all convolutional layers in order to find a model suitable for creating the Wuhan architectural style dataset. Three typical CNN-based detection models, namely Faster-RCNN, YOLOX_X, and SSD, were tested to compare their accuracy on each style. The three sets of accuracy line charts in Figure 8 show the differences in the models' abilities to distinguish between architectural styles. The SSD model's detection accuracy across the 22 architectural styles shows no clear ability to identify specific style features, and its recognition of all styles needs improvement. It is therefore unsuitable for constructing the Wuhan architectural style dataset and was not considered further. The YOLOX_X model performs similarly to Faster-RCNN on some styles, such as French classicism, Gothic, Hui, and Su, and even slightly outperforms it on the Ancient Greek and Art Deco styles. However, across all 22 styles, Faster-RCNN is still the best model, with a clear advantage in recognizing the Folk Residence, Functionalism, Western, Yuan, New Chinese, and Byzantium styles.

For the Functionalism style, which has a large number of labels, all three models show good recognition accuracy, suggesting that a sufficient number of labels helps improve model accuracy. However, some traditional architectural styles, such as Byzantium, Jing, and Tang, have far fewer labels than Functionalism yet still show good recognition accuracy. Our analysis suggests that styles with distinctive external features are easier to classify from those features, which may explain why these styles achieve good accuracy despite having no advantage in the number of labels. Conversely, for styles without eye-catching features, increasing the number of annotations may be the way to obtain higher recognition accuracy.

To produce a city-scale architectural style dataset, we chose the Faster-RCNN network, which had the best overall performance of the three. The setup for the test was as follows: the operating system was Windows 10, the programming language was Python 3.6, and the deep learning platform was TensorFlow-GPU 1.13. The machine learning library used was scikit-learn, and image processing was carried out with OpenCV. The CPU was a quad-core Intel Core i7-7700HQ running at 2.8 GHz, and the GPU was an NVIDIA GeForce GTX 1050 (Dell). Memory capacity was 16 GB (Kingston DDR4 2400 MHz and Micron DDR 2400 MHz), and the SSD was a Toshiba THNSNK128VN8 M.2 2280 128 GB.

After training, the detection performance of the model was tested in two steps: machine detection on the test set using the trained deep learning model (Figure 9), followed by manual review. During the review, the numbers of true positive (TP), false positive (FP), and false negative (FN) samples were recorded through manual observation. The model's architectural style recognition was evaluated according to Equations (1)–(3), and the average precision (AP) on the test set was used as the accuracy index of Faster-RCNN. The mean average precision (mAP) of the detection results was the arithmetic mean of the AP values over all architectural style categories. An example of the test set detection results with the IoU threshold set to 0.7 is shown in Figure 10. Table 4 lists the style types in the dataset and the corresponding detection precision for each style. For the detection and training of urban architectural images, the parameters of the architectural style task were set as follows: a classification threshold of 0.7, 50,000 training iterations, a weight decay of 0.0005, a learning rate of 0.001, and 300 candidate detection boxes generated by the region proposal network (RPN) layer [41,46,57]. The experimental results show high precision (P), recall (R), and training efficiency, and the training results stabilized as the number of training iterations increased.
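The AP and mAP computation described above can be sketched as follows. We assume the common PASCAL VOC all-point interpolation; the text does not state which AP variant was used, and the function names are illustrative. A detection counts as a true positive when its IoU with a not-yet-matched ground-truth box of the same style reaches the threshold (0.7 in Figure 10); that matching step is left to the caller here.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(scores, is_tp, n_ground_truth):
    """All-point interpolated AP for one style class.

    scores: detection confidences; is_tp: 1 if the detection matched a
    ground-truth box at the IoU threshold, else 0.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest confidence first
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = tp_cum / n_ground_truth
    precision = tp_cum / (tp_cum + fp_cum)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):  # make precision non-increasing in recall
        mpre[i] = max(mpre[i], mpre[i + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]  # indices where recall changes
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))

def mean_average_precision(ap_per_class):
    """mAP as the arithmetic mean of per-class AP values."""
    return float(np.mean(list(ap_per_class.values())))
```

For example, three detections with confidences 0.9, 0.8, 0.7 of which the first and third are correct, against two ground-truth boxes, give an AP of 5/6.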

Table 4 and the 22 × 22 confusion matrix (Figure 11) show that, of the 22 architectural styles, 18 are detected with an accuracy of at least 0.5, 10 reach at least 0.76, and none falls below 0.25. The lower precision of several styles (Art Deco, New Chinese, French Classicism, and Baroque) can be explained as follows. First, few French Classicism and Baroque buildings exist within the city limits. Nevertheless, these styles perform well in the confusion matrix, indicating that they can be identified reliably; their data should still be expanded to improve accuracy when conditions permit. Second, the confusion matrix shows that a considerable number of Art Deco buildings were misidentified as Western style. Art Deco originated at the Paris Exposition and matured during the construction of skyscrapers in the United States in the 1920s and 1930s, and its characteristics include some European stylistic elements. Conversely, European-style buildings are rarely misidentified as Art Deco. Extracting feature elements such as setbacks and diverse geometric patterns could therefore significantly improve the detection of Art Deco. Third, the confusion matrix shows that New Chinese buildings are mostly misidentified as Functionalism. In appearance alone, the two styles share some similarities, such as simple, streamlined, function-oriented features. New Chinese is a style that adds abstract Chinese cultural symbols to modernist architectural design, and one of its representative features is the Chinese roof; adjusting the model to focus more on such representative features may therefore be the key to improving New Chinese detection accuracy. Finally, some buildings of these styles appear only as background architecture in the images, which strongly affects the detection of the New Chinese style and, indeed, of all styles.
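The confusion-matrix reading used above (which style Art Deco or New Chinese buildings are most often mistaken for) can be automated. The sketch below builds a confusion matrix with rows as true styles and columns as predicted styles, and reports for each style its most frequent wrong prediction and the share of that style's samples it accounts for. The function names are illustrative, and the labels and counts in the example are toy values, not the results in Figure 11.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true class index; columns: predicted class index."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def top_confusions(cm, labels):
    """Map each true style to (most frequent wrong prediction, share of that style's samples)."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    result = {}
    for i, name in enumerate(labels):
        if off_diag[i].sum() > 0:
            j = int(np.argmax(off_diag[i]))
            result[name] = (labels[j], float(off_diag[i, j] / cm[i].sum()))
    return result

# Toy data only: indices refer to positions in `styles`.
styles = ["Art Deco", "Western style", "Functionalism", "New Chinese"]
cm = confusion_matrix([0, 0, 0, 1, 1, 3, 3, 3], [0, 1, 1, 1, 1, 2, 2, 3], len(styles))
confusions = top_confusions(cm, styles)
```

An asymmetric pattern, where one style is often predicted as another but not the reverse, is the kind of evidence used above to argue for adding style-specific features rather than more data.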

4.3. Results and Analysis of Architectural Style Dataset

Figure 12 shows, for each style, the number of buildings in Wuhan that completed the style identification task and were successfully matched into the architectural style dataset.

We produced a schematic map of Wuhan's architectural style dataset matched with street view imagery. As Figure 13 shows, the architectural style dataset can reflect the actual distribution of architectural styles in the city, although the experimental results also contain some misjudgments. Our analysis shows that the convolutional neural network model predicts Hui- and Qing-style buildings accurately because their characteristics are prominent within the city: the distinctive horse-head walls, tiles, and white walls of Hui-style buildings and the flush-gable (“hard mountain”) roofs of Qing-style buildings are not difficult to identify. In Figure 14, most of the erroneous predictions, shown as dark areas, occurred with the New Chinese style. One possible reason is that an essential criterion for identifying the New Chinese style is its Chinese roof, yet the New Chinese buildings that the model mispredicted are concentrated primarily in high-density areas, where the street views captured by the collection vehicle cannot fully show the buildings' characteristics. According to the statistics, the selected area contains 613 building units, of which 465 were correctly predicted and 148 were incorrectly predicted, giving a prediction accuracy of 75.86% (465/613). The prediction validity should be improved further in subsequent studies, but to some extent these results demonstrate the practical validity of the architectural style dataset.

Analysis of the dataset shows that the distribution of buildings in Wuhan has the following characteristics. First, Ancient Roman, Ancient Greek, and Western-style buildings are mainly concentrated along the river around Hankou (Figure 15). Western-style buildings in other areas are scattered, possibly because real estate developers chose a Western style when designing individual projects, and they have not yet formed an identifiable distribution pattern. Second, the dataset includes a limited number of traditional historical styles, such as Folk Residences and Chu-, Han-, Ming-, and Qing-style buildings. These are mainly concentrated around Wuhan's most popular tourist destinations, such as the Hankou Historical and Cultural Scenic Area, the Yellow Crane Tower Scenic Area, and the East Lake Ecotourism Scenic Area (Figure 16), or in less developed areas near the Third Ring Road. Third, expressionist architecture is widely distributed in the more economically developed and densely populated Hankou riverside area and Wuchang District; elsewhere in the city, most expressionist buildings are large-scale public buildings, such as high-end shopping malls or stadiums in each district. Lastly, Functionalism and New Chinese make up the majority of the architectural styles identified in Wuhan in this study. The architectural distribution in most Chinese cities may be similar to that in Wuhan.

Based on the spatial distribution of buildings in the Wuhan architectural style dataset, we conclude that the architectural and cultural characteristics of Wuhan may have been shaped by the following factors.

Ancient Roman, Ancient Greek, and Western-style buildings are mainly concentrated in the former concessions leased to five foreign countries. At that time, Western countries relied on their colonial privileges, their financial and material resources, and their advanced science and technology to build the riverside area of Hankou into a modern city. The characteristic semi-colonial Western buildings lining both sides of the riverside avenues, with their sense of era and history, are protected today; they have become tourist attractions and reflect the history of urban development.

Traditional Chinese historical buildings are primarily concentrated in Wuhan's major scenic spots. “Moshan Chucheng” (Moshan Scenic Area) was built in the East Lake Ecotourism Scenic Area in 1992. It contains significant elements reflecting Chinese national culture, has a distinct regional character, and holds enormous economic and cultural development value. Its buildings are simple, magnificent, and historical in character and possess “Jingchu” cultural characteristics. Urban development and renewal have significantly affected the preservation of traditional buildings, which have been subject to different development strategies over time. In the early days, the choice between renewal and redevelopment was mainly influenced by attitudes toward traditional architecture and by its economic value at the time. Historic districts were undervalued in the early stages of urban growth, which led to numerous demolitions in areas with more active urban construction. In recent years, some classic buildings have been conserved as planners have increasingly recognized their historical importance.

The expressionist aesthetic strongly emphasizes linearity, rejects the conventional models of modern industry, pursues novel volumes, and uses unique architectural forms. Achieving an expressionist appearance typically requires realizing individualized artistic expression with contemporary science and technology. Our examination of the distribution of expressionist buildings in Wuhan found that they are mostly characterized by streamlined and asymmetrical forms. These buildings are more common in areas with strong economic power or advanced scientific and technological levels, and relatively uncommon in regions with slower development or where production and manufacturing are the primary activities.

The Functionalism and New Chinese styles make up most of the architectural style dataset for Wuhan created in this study. This is likely not unique to Wuhan: the buildings in most Chinese cities probably follow a similar distribution. In the early years after the founding of New China, Functionalism was the easiest style to adopt given the lack of resources and the pressing need for reconstruction, since it could quickly satisfy people's basic housing needs. The master architect Sicheng Liang once ranked buildings that are “Chinese and new” as the best, “Western and new” as second, “Chinese and ancient” as third, and “Western and ancient” as the worst. This concept has persisted to the present day, contributing to the New Chinese style's dominance in the city. The New Chinese style preserves the essence of traditional architecture while integrating modern architectural elements and design factors, changing the practical use of traditional buildings and repositioning them.

The following characteristics of Wuhan’s architectural style can be deduced from the analysis of the dataset: (1) from the standpoint of architectural distribution, different architectural styles coexist in Wuhan, fusing Chinese and Western elements and coordinating with one another; (2) development and innovation continuously improve the architectural form; (3) high-rise buildings or other public buildings in the city have been influenced by foreign architectural culture, showing a diversity of architectural styles, which is also the most crucial feature of Wuhan’s contemporary architectural culture.

4.4. Dataset Validity

To examine the dataset's correctness objectively, 657 distinct building footprints were chosen from the generated results, and their architectural styles were manually identified based on prior knowledge. The experimental results were then cross-validated against these manually assigned categories to produce the classification accuracy map of Wuhan's architectural styles shown in Figure 17. No buildings of the Yuan, Song, Han, Su, French Classicism, Byzantium, Gothic, or Baroque styles were available for verification in the experimental area, so these eight styles were excluded. As a result, the average precision of the dataset was 57.8%, the average recall was 80.91%, and the average F1-score was 0.634. These results show that the accuracy of the Wuhan architectural style dataset created in this experiment needs further improvement. Nevertheless, the dataset reflects the geographic distribution of Wuhan's architectural styles, and the method can be used to create datasets for other cities with only minor adjustments. In addition, the accuracy of the Ming and Tang styles is much lower than the average. Considering geographical location, however, Ming- and Tang-style buildings are more common in the north, for example in Beijing or Xi'an, and may be recognized better if evaluated there. Combining the precision, recall, F1-score, and other indicators from the Faster-RCNN test results with a visual analysis of the architectural style dataset shows that styles with poor classification accuracy either have few training samples or are inherently difficult to identify from their stylistic characteristics. Manual inspection found that the model recognizes distinctive styles such as Hui, Ancient Greek, and Ancient Roman well. Although Functionalism is identified efficiently, the model's ability to distinguish among Functionalism, New Chinese, and other Western-style high-rises still needs further improvement.
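The per-style evaluation can be sketched with the standard definitions P = TP/(TP + FP), R = TP/(TP + FN), and F1 = 2PR/(P + R), which we assume correspond to Equations (1)–(3), macro-averaged over the evaluated styles. One point worth keeping in mind when reading averaged figures: the mean of per-style F1-scores is generally not equal to the F1-score computed from the mean precision and mean recall. The function names are illustrative, and the counts below are toy values, not the study's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from TP, FP, and FN counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro_average(counts):
    """counts: {style: (TP, FP, FN)} -> ((mean P, mean R, mean F1), per-style metrics)."""
    per_style = {s: precision_recall_f1(*c) for s, c in counts.items()}
    n = len(per_style)
    means = tuple(sum(m[k] for m in per_style.values()) / n for k in range(3))
    return means, per_style

toy_counts = {"Hui": (8, 2, 0), "New Chinese": (1, 4, 0)}  # toy numbers only
(mean_p, mean_r, mean_f1), per_style = macro_average(toy_counts)
```

With these toy counts, the mean F1 is about 0.611, whereas the F1-score of the mean precision (0.5) and mean recall (1.0) is about 0.667, so the two ways of summarizing should not be mixed when comparing studies.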

5. Discussion

This study introduced deep-learning-based computer vision techniques to automatically create urban building datasets, including building facades, orientations, and geographic locations, from publicly available SVIs. Once the object detection model was obtained, we created an urban building dataset using only publicly available SVIs, without expensive field surveys by experts or time-consuming manual site investigations. Whereas other studies using street view images usually focus on building height, the comfort of urban spaces, or the rationality of streets, our model focuses on urban architectural styles. We combined building detection and classification in SVIs with the estimation of architectural styles through image analysis, producing a comprehensive dataset of urban architectural styles. Unlike other SVI-based approaches to analyzing urban space, we use SVI-based deep learning image analysis to automate the creation of urban architectural style datasets.

5.1. Street View Imagery as an Agent for Deep Learning

With the advent of the big data era, street view imagery has become more readily available from map providers (Google, Mountain View, CA, USA; Baidu, Beijing, China) and crowdsourcing platforms, providing opportunities to measure and understand the built environment and human–environment interactions. For acquiring urban building information, street view imagery has so far proven more cost-effective and scalable than manual collection. Combined with deep learning, SVIs can be used to explore the distribution of urban architectural styles and the reasons behind it. In this study, SVIs are used to classify architectural styles with good accuracy, and the dataset construction method proposed here is easily transferable to other cities. More importantly, the proposed method constructs a city-wide architectural style dataset at very low cost. To our knowledge, this is the first time that SVIs and convolutional neural networks have been combined to study the architectural styles of Wuhan.

5.2. The Model’s Performance in Architectural Style Detection and Classification

For detecting and classifying urban building styles in SVIs, the method achieves a mAP of 0.57, a recall of 0.80, and an F1-score of 0.63. A confidence threshold of 0.7 was used because we prioritized accurate classification of objects over detection coverage. In terms of recall and F1-score, this performance is slightly higher than that reported by Kang et al. [44] in 2017, whose best models achieved a precision of 0.59, a recall of 0.58, and an F1-score of 0.58 for building classification. Our recognition performance is lower than that of Sun et al. [47] in 2022, whose mAP values for nine building categories mostly exceeded 0.80. In addition, Shan et al. [37] designed a higher-accuracy deep learning model that identified buildings in Harbin with an average accuracy of 0.87 across 12 categories. It is worth noting, however, that our classification system is richer and more comprehensive than those of the above studies.

5.3. Sensitivity Analysis of the CNN Model

The results show the potential of our proposed dataset construction method for predicting urban architectural styles from SVIs and mapping the resulting datasets. The model's quantitative evaluation confirms mixed performance in predicting urban architectural styles. Across the 22 architectural categories, classical Chinese architectural styles (Tang, Song, Hui, etc.) are easier to identify than other categories because their structures are more distinctive. However, our model does not perform entirely satisfactorily on modern architectural styles that lack distinctive features and mix multiple styles, such as Art Deco. Although we tried to distinguish architectural styles as much as possible, in reality it is difficult to define precisely which style each building represents.

Several factors can reduce the model's accuracy. First, architectural styles have evolved over time, so the architectural features of adjacent eras (such as the Ming and Qing dynasties) share some similarities. This poses a serious challenge for the object detection capabilities of convolutional neural networks, although human experts can distinguish such styles by their representative architectural components. Second, the city contains many old neighborhoods, most of which have undergone later renovation that has destroyed the original architectural styles to some extent.

5.4. Promoting the Development of Research in the Field of Architectural Styles

The contribution of this work is that it not only demonstrates the learning and prediction capabilities of convolutional neural networks but also uses SVIs to show the decision-making process and analytical ability of deep learning models in the study of urban architectural styles. The proposed method offers a new approach to research on urban architectural styles. Traditional architectural style research mostly focuses on a specific style in a specific area, analyzing its development, characteristics, and the reasons for its formation. Cheng [58] analyzed the buildings constructed by Japanese architects in Qingdao around 1914 and interpreted Qingdao's architectural heritage style during this particular period. Chai [59] selected the building complex of the Bund in Shanghai as the object of study and analyzed how the development of building structures influenced architectural style. Harbin's architectural style shifted from Russian architecture to an eclectic style incorporating Chinese architectural elements, and Yang [60] studied Harbin's Chinese Baroque architectural district. Qin [61] categorized and summarized the style types of Christian church architecture in Nanjing through site visits, research, and analysis, and also analyzed the churches' architectural characteristics in terms of orientation, plan form, façade form, and interior space, providing a useful supplement to the study of traditional and contemporary Christian churches in Nanjing. In contrast, the dataset construction method proposed in this paper relies on more economical SVI data, avoids high expert fees, and reduces the cost of manual research. It transforms the task of perceptual style judgment into rational data analysis, forming an automated dataset construction process that runs from data acquisition through model analysis to dataset mapping.

6. Conclusions

This paper introduces an approach for building an architectural style dataset by applying machine learning to SVIs. It summarizes 22 architectural styles in the study area that could be used to define and describe urban architectural styles in most Chinese urban areas. Within this approach, a Faster-RCNN general framework for architectural style classification with a VGG-16 backbone network was introduced. To our knowledge, this framework is the first attempt to use a machine learning approach to identify architectural styles in Chinese cities. Furthermore, the approach constructs an urban architectural style dataset by mapping the architectural styles identified in continuous street view imagery onto vector map data of top-down building contours.

The introduced approach offers an alternative to existing manual methods of architectural style recognition, which are time-consuming, rely on subjective judgment, and are difficult to apply at large scale. The dataset generated in this study helps us gain a more thorough understanding of urban areas, which should be valuable in maintaining distinctive features for sustainable and smart cities.

This research can be further developed in two directions. First, the deep learning side of architectural style research deserves more attention. This experiment compared only three classical deep learning frameworks, and more models and evaluation metrics should be compared; there may be CNN models more effective than these three. In addition, since the model has limited ability to recognize small building details, its small-target recognition could be enhanced in the future. Second, the classification and definition of architectural styles within a city may require further discussion. When technology permits, the datasets should be expanded to include more representative architectural style groups and to broaden image coverage. Because defining architectural style is a subjective task, it is currently challenging to promote such exploratory, non-digital assessment techniques and combine them with computer-aided work.

This study verifies the feasibility of using this method to identify urban architectural styles. With appropriate modifications, the proposed architectural style dataset construction method can provide a practical reference for architectural style control in other cities. The methods and techniques of this research could also be extended to other fields in the future.

Author Contributions

Conceptualization, Hong Xu and Haozun Sun; methodology, Hong Xu and Haozun Sun; software, Lubin Wang and Haozun Sun; validation, Lubin Wang and Haozun Sun; formal analysis, Hong Xu and Haozun Sun; annotation, Haozun Sun, Xincan Yu and Tianyue Li; writing—original draft preparation, Hong Xu and Haozun Sun; writing—review and editing, Hong Xu; visualization, Hong Xu and Haozun Sun; supervision, Hong Xu; project administration, Hong Xu; funding acquisition, Hong Xu. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by the National Natural Science Foundation of China (41771473) and the Hubei Changjiang National Cultural Park Construction Research Project (HCYK2022Y20).

Data Availability Statement

Not applicable. The data were annotated by architectural professionals; please contact the authors if access is needed.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep Learning for Computer Vision: A Brief Review. Comput. Intell. Neurosci. 2018, 2018, 7068349.
2. Chen, X.; Meng, Q.; Hu, D.; Zhang, L.; Yang, J. Evaluating Greenery around Streets Using Baidu Panoramic Street View Images and the Panoramic Green View Index. Forests 2019, 10, 1109.
3. Zhang, L.; Pei, T.; Chen, Y.; Song, C.; Liu, X. A review of urban environment assessment based on streetscape images. J. Geo-Inf. Sci. 2019, 21, 46–58.
4. Guan, F.; Fang, Z.; Yu, T.; Feng, M.; Yang, F. Detecting visually salient scene areas and deriving their relative spatial relations from continuous street-view panoramas. Int. J. Digit. Earth 2020, 13, 1504–1531.
5. Cinnamon, J.; Jahiu, L. Panoramic Street-Level Imagery in Data-Driven Urban Research: A Comprehensive Global Review of Applications, Techniques, and Practical Considerations. ISPRS Int. J. Geo-Inf. 2021, 10, 471.
6. Li, Y.; Guo, J.; Chen, Y. A New Approach for Tourists’ Visual Behavior Patterns and Perception Evaluation based on Multi-source Data. J. Geo-Inf. Sci. 2022, 24, 2004–2020.
7. Li, Y.; Jingxiong, H.; Liang, J.; Zhang, Y.; Chen, Y. Research on Visual Attraction and Influencing Factors of Perception of Commercial Street Space in Cultural Heritage Site: Taking Gulangyu Longtou Road as an Example. West. J. Hum. Settl. 2022, 37, 114–121.
8. Biljecki, F.; Ito, K. Street view imagery in urban analytics and GIS: A review. Landsc. Urban Plan. 2021, 215, 104217.
9. Ning, H.; Li, Z.; Ye, X.; Wang, S.; Wang, W.; Huang, X. Exploring the vertical dimension of street view image based on deep learning: A case study on lowest floor elevation estimation. Int. J. Geogr. Inf. Sci. 2022, 36, 1317–1342.
10. Liu, Z.; Lv, J.; Yao, Y.; Zhang, J.; Kou, S.; Guan, Q. Research Method of Interpretable Urban Perception Model based on Street View Imagery. J. Geo-Inf. Sci. 2022, 24, 2045–2057.
11. Mahabir, R.; Schuchard, R.; Crooks, A.; Croitoru, A.; Stefanidis, A. Crowdsourcing Street View Imagery: A Comparison of Mapillary and OpenStreetCam. ISPRS Int. J. Geo-Inf. 2020, 9, 341.
12. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 6999–7019.
13. Liu, J.e.; An, F.P. Image Classification Algorithm Based on Deep Learning-Kernel Function. Sci. Program. 2020, 2020, 7607612.
14. Barbosa, R.C.; Ayub, M.S.; Rosa, R.L.; Rodriguez, D.Z.; Wuttisittikulkij, L. Lightweight PVIDNet: A Priority Vehicles Detection Network Model Based on Deep Learning for Intelligent Traffic Lights. Sensors 2020, 20, 6218.
15. Khalil, R.A.; Jones, E.; Babar, M.I.; Jan, T.; Zafar, M.H.; Alhussain, T. Speech Emotion Recognition Using Deep Learning Techniques: A Review. IEEE Access 2019, 7, 117327–117345.
16. Zhou, H.; He, S.; Cai, Y.; Wang, M.; Su, S. Social inequalities in neighborhood visual walkability: Using street view imagery and deep learning technologies to facilitate healthy city planning. Sustain. Cities Soc. 2019, 50, 101605.
17. Yang, L.; Liu, J.; Liang, Y.; Lu, Y.; Yang, H. Spatially Varying Effects of Street Greenery on Walking Time of Older Adults. ISPRS Int. J. Geo-Inf. 2021, 10, 596.
18. Li, X.; Ratti, C.; Seiferling, I. Quantifying the shade provision of street trees in urban landscape: A case study in Boston, USA, using Google Street View. Landsc. Urban Plan. 2018, 169, 81–91.
19. Wang, W.X.; Guo, H.; Li, X.M.; Tang, S.J.; Xia, J.Z.; Lv, Z.H. Deep learning for assessment of environmental satisfaction using BIM big data in energy efficient building digital twins. Sustain. Energy Technol. Assess. 2022, 50, 101897.
20. Chen, D.; Wawrzynski, P.; Lv, Z. Cyber security in smart cities: A review of deep learning-based applications and case studies. Sustain. Cities Soc. 2021, 66, 102655.
21. Zeng, N.; Zhang, H.; Song, B.; Liu, W.; Li, Y.; Dobaie, A.M. Facial expression recognition via learning deep sparse autoencoders. Neurocomputing 2018, 273, 643–649.
22. Guo, Y.H.; Wang, C.F.; Yu, S.X.; McKenna, F.; Law, K.H. AdaLN: A Vision Transformer for Multidomain Learning and Predisaster Building Information Extraction from Images. J. Comput. Civ. Eng. 2022, 36, 04022024.
23. Wang, C.F.; Hornauer, S.; Yu, S.X.; McKenna, F.; Law, K.H. Instance segmentation of soft-story buildings from street-view images with semiautomatic annotation. Earthq. Eng. Struct. Dyn. 2022, 52, 2520–2532.
24. Yu, Q.; Wang, C.F.; McKenna, F.; Yu, S.X.; Taciroglu, E.; Cetiner, B.; Law, K.H. Rapid visual screening of soft-story buildings from street view images using deep learning classification. Earthq. Eng. Eng. Vib. 2020, 19, 827–838.
25. Wang, C.F.; Yu, Q.; Law, K.H.; McKenna, F.; Yu, S.X.; Taciroglu, E.; Zsarnoczay, A.; Elhaddad, W.; Cetiner, B. Machine learning-based regional scale intelligent modeling of building information for natural hazard risk management. Autom. Constr. 2021, 122, 103474.
26. Wang, C.F.; Antos, S.; Goldsmith, J.; Triveno, L. Visual Perception of Building and Household Vulnerability from Streets. arXiv 2022, arXiv:2205.14460.
27. Mocanu, E.; Mocanu, D.C.; Nguyen, P.H.; Liotta, A.; Webber, M.E.; Gibescu, M.; Slootweg, J.G. On-Line Building Energy Optimization Using Deep Reinforcement Learning. IEEE Trans. Smart Grid 2019, 10, 3698–3708.
28. Fan, C.; Xiao, F.; Zhao, Y. A short-term building cooling load prediction method using deep learning algorithms. Appl. Energy 2017, 195, 222–233.
29. Runge, J.; Zmeureanu, R. A Review of Deep Learning Techniques for Forecasting Energy Use in Buildings. Energies 2021, 14, 608.
30. Fan, C.; Sun, Y.J.; Zhao, Y.; Song, M.J.; Wang, J.Y. Deep learning-based feature engineering methods for improved building energy prediction. Appl. Energy 2019, 240, 35–45.
31. Shon, D.; Noh, B.; Byun, N. Identification and Extracting Method of Exterior Building Information on 3D Map. Buildings 2022, 12, 452.
32. Yin, J.C.; Wu, F.; Qiu, Y.; Li, A.P.; Liu, C.Y.; Gong, X.Y. A Multiscale and Multitask Deep Learning Framework for Automatic Building Extraction. Remote Sens. 2022, 14, 4744.
33. Zhang, J.X.; Fukuda, T.; Yabuki, N. Development of a City-Scale Approach for Facade Color Measurement with Building Functional Classification Using Deep Learning and Street View Images. ISPRS Int. J. Geo-Inf. 2021, 10, 551.
34. Pardamean, B.; Muljo, H.H.; Cenggoro, T.W.; Chandra, B.J.; Rahutomo, R. Using transfer learning for smart building management system. J. Big Data 2019, 6, 110.
35. Han, X.; Wang, L.; Seo, S.H.; He, J.; Jung, T. Measuring Perceived Psychological Stress in Urban Built Environments Using Google Street View and Deep Learning. Front. Public Health 2022, 10, 1295.
36. Ji, Y.; Dong, Y.; Hou, M.; Qi, Y.; Huo, P. Automatic identification method of ancient building ridge beast based on convolutional neural network. Geomat. World 2021, 28, 54–60.
37. Shan, L.; Zhang, L. Application of Intelligent Technology in Facade Style Recognition of Harbin Modern Architecture. Sustainability 2022, 14, 7073.
38. Yoshimura, Y.; Cai, B.; Wang, Z.; Ratti, C. Deep Learning Architect: Classification for Architectural Design through the Eye of Artificial Intelligence. In Computational Urban Planning and Management for Smart Cities; Geertman, S., Zhan, Q., Allan, A., Pettit, C., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 249–265.
39. Zhang, L.; Pei, T.; Wang, X.; Wu, M.; Song, C.; Guo, S.; Chen, Y. Quantifying the Urban Visual Perception of Chinese Traditional-Style Building with Street View Images. Appl. Sci. 2020, 10, 5963.
40. Han, Q.; Yin, C.; Deng, Y.; Liu, P. Towards Classification of Architectural Styles of Chinese Traditional Settlements Using Deep Learning: A Dataset, a New Framework, and Its Interpretability. Remote Sens. 2022, 14, 5250.
41. Kalfarisi, R.; Hmosze, M.; Wu, Z.Y. Detecting and Geolocating City-Scale Soft-Story Buildings by Deep Machine Learning for Urban Seismic Resilience. Nat. Hazards Rev. 2022, 23, 04021062.
42. Liu, C.; Sepasgozar, S.M.E.; Zhang, Q.; Ge, L.L. A novel attention-based deep learning method for post-disaster building damage classification. Expert Syst. Appl. 2022, 202, 117268.
43. Obeso, A.M.; Benois-Pineau, J.; Acosta, A.A.R.; Vazquez, M.S.G. Architectural style classification of Mexican historical buildings using deep convolutional neural networks and sparse features. J. Electron. Imaging 2017, 26, 011016.
44. Kang, J.; Koerner, M.; Wang, Y.; Taubenboeck, H.; Zhu, X.X. Building instance classification using street view images. ISPRS J. Photogramm. Remote Sens. 2018, 145, 44–59.
45. Yi, Y.K.; Zhang, Y.; Myung, J. House style recognition using deep convolutional neural network. Autom. Constr. 2020, 118, 103307.
46. Xu, H.; Wang, L.; Fang, Z.; He, M.; Hou, X.; Zuo, L.; Guan, F.; Xiong, C.; Gong, Y.; Pang, Q.; et al. Mapping of street-facing architectural style and map generation method under street view imagery. Geomat. Inf. Sci. Wuhan Univ. 2021, 46, 13.
47. Sun, M.; Zhang, F.; Duarte, F.; Ratti, C. Understanding architecture age and style through deep learning. Cities 2022, 128, 103787.
48. Zhao, P.; Miao, Q.; Song, J.; Qi, Y.; Liu, R.; Ge, D. Architectural Style Classification Based on Feature Extraction Module. IEEE Access 2018, 6, 52598–52606.
49. Wang, B.; Zhang, S.; Zhang, J.; Cai, Z. Architectural style classification based on CNN and channel-spatial attention. Signal Image Video Process. 2022, 17, 99–107.
50. Lamas, A.; Tabik, S.; Cruz, P.; Montes, R.; Martinez-Sevilla, A.; Cruz, T.; Herrera, F. MonuMAI: Dataset, deep learning pipeline and citizen science based app for monumental heritage taxonomy and classification. Neurocomputing 2021, 420, 266–280.
51. Sun, H.; Hong, X.; Wei, Q. The Classification Method of Urban Architectural Styles Based on Deep Learning and Street View Imagery. Hydraul. Civ. Eng. Technol. VII 2022, 31, 823–830.
52. Revaud, J.; Heo, M.; de Rezende, R.S.; You, C.; Jeong, S.G. Did It Change? Learning to Detect Point-Of-Interest Changes for Proactive Map Updates. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4081–4090.
53. Bao, X.; Wang, S. Survey of object detection algorithm based on deep learning. Sens. Microsyst. 2022, 41, 5–9.
54. Zhang, F.; Liu, Y. Street view imagery: Methods and applications based on artificial intelligence. J. Remote Sens. 2021, 25, 1043–1054.
55. Anguelov, D.; Dulong, C.; Filip, D.; Frueh, C.; Lafon, S.; Lyon, R.; Ogale, A.; Vincent, L.; Weaver, J. Google Street View: Capturing the World at Street Level. Computer 2010, 43, 32–38.
56. Cheng, L.; Chu, S.; Zong, W.; Li, S.; Wu, J.; Li, M. Use of Tencent Street View Imagery for Visual Perception of Streets. ISPRS Int. J. Geo-Inf. 2017, 6, 265.
57. Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
58. Cheng, S.; Ma, H.; Liu, S. Acceptance Derivation and Local Presentation in Transplantation: An Analysis of Modern Japanese Architects’ Architectural Style in Qingdao (1914–1945). Urban Archit. 2023, 20, 191–195.
59. Chai, X.; Qin, J.; Ying, C.; Jin, S.; Rao, X. Study on the façade style of the Bund building complex in Shanghai. Urban Archit. 2022, 19, 119–122.
60. Yang, D. A brief analysis of the formation and characteristics of Harbin’s modern architectural style. Ind. Des. 2022, 8, 95–97.
61. Qin, Z.; Zhang, W.; Wen, Y.; Lu, J. A Study on the Style Types and Characteristics of Christian Church Architecture: A Case Study of Nanjing. Urban Archit. 2023, 20, 200–203.

Figure 1. Study area. The green line marks the area within Wuhan's Third Ring Road, which is the subject of this study. The red areas are manually produced building vector map data; the blue portion shows the continuous SVI collection points.

Figure 2. Approach for building an architectural style dataset. The red part is the data preprocessing for the “learning” process of the convolutional neural network. The blue part shows how the deep learning model predicts and classifies architectural styles. Finally, the dataset is constructed by matching the predictions to the outer contours of buildings.

Figure 3. Faster-RCNN model detection flowchart. (a) The VGG-16 backbone network. (b) The process of detecting and predicting objects in street view imagery with the Faster-RCNN model; the part inside the red line is the VGG-16 backbone shown in (a).
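Faster R-CNN (Ren et al.) produces many overlapping candidate boxes. Its region proposal network and its final detection stage both prune these candidates with greedy non-maximum suppression (NMS) based on intersection-over-union (IoU). The stdlib-only sketch below illustrates that pruning step for the boxes of a single class, as in per-class NMS. It is not the configuration used in this study: the `(x1, y1, x2, y2)` box format, the function names, and the 0.5 IoU and 0.05 score thresholds are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_thr: float = 0.5, score_thr: float = 0.05) -> List[int]:
    """Greedy NMS for one class: indices of kept boxes, highest score first."""
    # Drop low-confidence candidates, then visit the rest by descending score.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two near-duplicate detections of one facade and one separate building.
    boxes = [(10, 10, 110, 210), (15, 12, 112, 205), (300, 40, 420, 260)]
    scores = [0.92, 0.85, 0.78]
    print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.9, so it is suppressed. The third box does not overlap the others and survives.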

Figure 4. Schematic diagram of the dataset construction process, covering data acquisition, preprocessing, and labeling; model network adjustment; training; testing; and dataset construction.

Figure 5. Schematic diagram of matching continuous street view imagery with vector map data. (a) The coordinate system; (b,c) the target building in the continuous street view imagery and on the planar map.
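Figure 5 shows how continuous street view imagery is matched to building outlines, but it does not give the exact computation. One common way to do this, sketched here as an assumption rather than as the authors' procedure, has three steps:

1. Cast a bearing ray toward the detected building from each of two consecutive capture points.
2. Intersect the two rays (forward intersection).
3. Assign the intersection point to the building outline polygon that contains it.

The sketch assumes coordinates in a projected, metre-based system. All function names, the panorama-to-azimuth convention, and the toy outlines are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # planar coordinates in metres: x = east, y = north


def pixel_to_azimuth(x_px: float, width_px: float, heading_deg: float) -> float:
    """Hypothetical helper: azimuth (degrees clockwise from north) of a panorama
    column, assuming an equirectangular panorama whose centre column faces
    the capture heading."""
    return (heading_deg + (x_px / width_px - 0.5) * 360.0) % 360.0


def _cross(a: Point, b: Point) -> float:
    """z-component of the 2D cross product a x b."""
    return a[0] * b[1] - a[1] * b[0]


def intersect_bearings(p1: Point, az1: float,
                       p2: Point, az2: float) -> Optional[Point]:
    """Forward intersection of two rays cast from consecutive capture points."""
    d1 = (math.sin(math.radians(az1)), math.cos(math.radians(az1)))
    d2 = (math.sin(math.radians(az2)), math.cos(math.radians(az2)))
    denom = _cross(d1, d2)
    if abs(denom) < 1e-9:  # (nearly) parallel rays: no stable intersection
        return None
    w = (p2[0] - p1[0], p2[1] - p1[1])
    t1 = _cross(w, d2) / denom  # distance along ray 1
    t2 = _cross(w, d1) / denom  # distance along ray 2
    if t1 <= 0 or t2 <= 0:  # target must lie in front of both capture points
        return None
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Even-odd ray-casting test against a building outline."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (xa, ya), (xb, yb) = poly[i], poly[(i + 1) % n]
        if (ya > y) != (yb > y):  # edge straddles the horizontal line through pt
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


def match_building(pt: Point, outlines: Dict[str, List[Point]]) -> Optional[str]:
    """Return the id of the first outline containing pt, or None."""
    for building_id, poly in outlines.items():
        if point_in_polygon(pt, poly):
            return building_id
    return None


if __name__ == "__main__":
    hit = intersect_bearings((0.0, 0.0), 45.0, (10.0, 0.0), 315.0)  # about (5, 5)
    outlines = {"B1": [(3, 3), (7, 3), (7, 7), (3, 7)],
                "B2": [(20, 20), (25, 20), (25, 25), (20, 25)]}
    print(hit, match_building(hit, outlines))
```

In practice the intersection may fall just outside an outline, for example in the street or a courtyard. A nearest-polygon fallback within a distance tolerance would then likely be needed; that refinement is omitted here.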

Figure 6. Building outline vector.

Figure 7. Number of annotations for each style.

Figure 8. Model accuracy comparison: accuracy scores of the three trained networks on the 22 building classes. YOLOX-X achieved the highest score for the Ancient Rome and Art Deco classes; Faster-RCNN performed best on all other classes.

Figure 9. Model iteration parameters. (a) Training-Rpn-Class-and-Box-loss of the model. (b) Training-Class-and-Box-loss of the model. (c) Training-Total-loss of the model. (d) Training-Average-Time of the model.
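Panels (a) to (c) track the standard Faster R-CNN training objectives. In the original formulation (Ren et al.), the region proposal network minimizes the multi-task loss below. The symbols are:

- p_i: the predicted objectness of anchor i.
- p_i^*: the ground-truth label of anchor i (1 for a positive anchor, 0 otherwise).
- t_i and t_i^*: the predicted and target box offsets.
- N_cls and N_reg: normalization terms.
- λ: the weight balancing the two terms.

L_cls is a log loss and L_reg a smooth L1 loss. The detection head uses an analogous classification-plus-box-regression loss. The second line is an assumption about how the plotted total loss is aggregated: it sums the RPN and detection-head terms, as is common in Faster R-CNN implementations. The caption does not state this.

```latex
L(\{p_i\},\{t_i\}) = \frac{1}{N_{\mathrm{cls}}}\sum_i L_{\mathrm{cls}}(p_i, p_i^{*})
  + \lambda \frac{1}{N_{\mathrm{reg}}}\sum_i p_i^{*}\, L_{\mathrm{reg}}(t_i, t_i^{*})

L_{\mathrm{total}} = L_{\mathrm{cls}}^{\mathrm{RPN}} + L_{\mathrm{box}}^{\mathrm{RPN}}
  + L_{\mathrm{cls}}^{\mathrm{det}} + L_{\mathrm{box}}^{\mathrm{det}}
```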

Figure 10. Example of test set detection results. Each detection box encloses a target building and shows its predicted category and detection probability.

Figure 11. 22 × 22 confusion matrix of actual versus predicted styles, showing the misclassifications for each of the 22 styles.
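A confusion matrix such as the one in Figure 11 is a table of counts of (actual, predicted) style pairs. The sketch below builds such a matrix from toy labels, row-normalizes it, and lists the most frequent confusions. It assumes rows are actual styles and columns are predicted styles, since the caption does not specify the orientation. The function names and the three-style toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count (actual, predicted) pairs; rows index the actual style, columns the predicted one."""
    idx = {label: i for i, label in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        m[idx[a]][idx[p]] += 1
    return m


def row_normalize(m: List[List[int]]) -> List[List[float]]:
    """Divide each row by its total so the diagonal becomes per-style recall."""
    out = []
    for row in m:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out


def top_confusions(m: List[List[int]], labels: Sequence[str],
                   k: int = 3) -> List[Tuple[str, str, int]]:
    """Largest off-diagonal cells as (actual, predicted, count), most frequent first."""
    cells = [(labels[i], labels[j], m[i][j])
             for i in range(len(labels)) for j in range(len(labels))
             if i != j and m[i][j] > 0]
    return sorted(cells, key=lambda c: c[2], reverse=True)[:k]


if __name__ == "__main__":
    labels = ["Hui", "Qing", "Ming"]
    actual = ["Hui", "Hui", "Qing", "Qing", "Ming", "Ming"]
    predicted = ["Hui", "Qing", "Qing", "Ming", "Ming", "Ming"]
    m = confusion_matrix(actual, predicted, labels)
    print(m)                          # [[1, 1, 0], [0, 1, 1], [0, 0, 2]]
    print(top_confusions(m, labels))  # [('Hui', 'Qing', 1), ('Qing', 'Ming', 1)]
```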

Figure 12. Statistical chart of Wuhan’s architectural style dataset.

Figure 13. Illustration of the dataset generation results matching SVIs. It includes the Qing, Hui, Functionalism, New Chinese, and Folk Residence styles.

Figure 14. Schematic diagram of dataset validity. Pink indicates correctly predicted individual buildings; black indicates incorrectly predicted ones.

Figure 15. Enlarged view of the riverside area, showing a large number of red Western-style buildings as well as Ancient Roman and Ancient Greek style buildings.

Figure 16. Schematic map of East Lake Ecotourism Landscape Park.

Figure 17. Histogram of dataset accuracy by style. Although accuracy reflects overall correctness, it is a poor indicator when samples are imbalanced. We therefore evaluate the constructed dataset using precision, recall, and F1 score.
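Because precision, recall, and F1 are used here instead of accuracy, it may help to spell out how such scores are usually computed per style and then averaged. The sketch assumes macro-averaging, an unweighted mean over styles; the caption does not say which averaging was used. The function names and the toy counts are hypothetical.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one style; 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_scores(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Macro average: unweighted mean of per-style precision, recall and F1."""
    per_style = [prf(*c) for c in counts.values()]
    n = len(per_style)
    p = sum(s[0] for s in per_style) / n
    r = sum(s[1] for s in per_style) / n
    f = sum(s[2] for s in per_style) / n
    return p, r, f


if __name__ == "__main__":
    toy = {"Hui": (8, 2, 2), "Baroque": (1, 9, 0)}  # hypothetical counts
    p, r, f = macro_scores(toy)
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.45 0.9 0.491
    print(round(2 * p * r / (p + r), 3))          # 0.6
```

Under macro-averaging, the mean of the per-style F1 scores generally differs from the F1 recomputed from mean precision and mean recall. In the toy example the former is about 0.49 and the latter 0.60.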

Table 1. Two common architectural classification methods.

Chronological [47]	Functional [44]
This method divides buildings into groups by age, using intervals of more than ten years. Attention is paid to the differences between buildings, with more emphasis on aesthetic experience than on the actual age of the building.	This method classifies buildings into several functional categories (residential, office, industrial, and shop) and focuses on the function of the building. However, a building's actual function can be seriously misjudged from its appearance alone, because the form of the facade depends heavily on the architect's personal, subjective vision.

Table 2. The 22 architectural styles summarized for the study area.

Number	Category	Feature
1.	Functionalism	One of the most common architectural styles in the city, it focuses more on the function of the building than on the exterior features.
2.	Expressionism	Buildings are mostly streamlined and attract attention through exaggerated shapes.
3.	New Chinese	An architectural form that combines traditional Chinese and modern architecture. It retains the charm and essence of Chinese houses, follows the layout of traditional houses, and continues their tiled sloping roofs.
4.	Western style	This style features simple and precise lines, emphasizes symmetry, and makes use of color, light, and shade. Representative features include Roman columns, moldings, bay windows, and dormer windows.
5.	Art Deco	The Art Deco style incorporates traditional craft motifs with the colors of the machine age. Rich colors, striking geometric patterns, setbacks, and many decorations often characterize this style.
6.	Folk Residence	Representative features include Chinese corner eaves, Five mile towers, and European champagne stone columns. This style is known for its comfortable spatial layout and rich, Republican-style decoration.
7.	Han	The foundations, structures, and carvings follow the archetype of ancient Chinese architecture. Common features include straight ridges and timber components such as dougong brackets. The main colors are red, black, and yellow, and buildings stand on a base or high platform.
8.	Tang	This style is magnificent and rigorous. The dougong brackets are large, the eaves project far, the upturn of the roof corners is gentle, the roof appears stable, and the color painting is complex yet elegant; vermilion and white are the most commonly used colors.
9.	Song	Delicate and intricate, with attention to decoration. Roofs form rich, staggered three-dimensional silhouettes. Ridges and roof corners are upturned, and colorful paintings, carvings, doors, and windows are varied.
10.	Yuan	Rough and unrestrained. Affected by the economy and culture of the period, architectural development declined; the Yuan dynasty mainly used logs as beams, giving buildings a rough appearance. Yuan architecture makes frequent use of column shifting and column reduction and of simplified timber frame structures.
11.	Ming	The Ming style is rigorous and steady in its decoration, color painting, and decorative archetypes. Many works in masonry, glazed tile, and hardwood survive, and brick was commonly used for residential walls. The building hierarchy was strict: commoners were forbidden to use brackets and colored painting.
12.	Qing	Representative forms include hard mountain architecture, hanging mountain style architecture, hill building, cutting-edge architecture, etc.
13.	Su	This is the architectural style of the north and south garden layout. Representative features include roofs with high ridge angles, brick gatehouses, tile windows, and street-crossing buildings.
14.	Chu	This style is the embodiment of “the unity of heaven and man” and “ritual law and patriarchal system”. The high-platform buildings are typical of the characteristics and level of Chu architecture, which occupies an important position in the history of architecture.
15.	Jing Residence	The most typical example is the Beijing courtyard house. The courtyard is spacious yet enclosed and private; the houses on all four sides are independent, regular in appearance, and arranged symmetrically about a central axis.
16.	Hui	The main features are the distinctive horse-head walls (stepped gable walls), high enclosing walls, horse-head corner walls, staggered horse-head tiles, and white walls.
17.	Gothic	Tall spires, pointed arches, large windows, and stained glass depicting Biblical stories.
18.	Baroque	This style features freedom of shape, pursuit of movement, preference for rich decoration, carvings, and intense colors.
19.	French classicism	This style employs the classical orders, rejects national traditions and local characteristics, and conveys majesty. Representative works are mostly large, majestic palace buildings and monumental square complexes.
20.	Ancient Greece	Characteristic features include triangular pediments, high pedestals, and neatly arranged colonnades.
21.	Ancient Rome	Ancient Roman architecture inherited the styles of ancient Greece and developed new characteristics, chiefly the broad use of arch structures (including barrel vaults and cross vaults) and of Tuscan and Composite order columns.
22.	Byzantium	This style is characterized by the widespread use of vaulted roofs that give the overall composition a prominent center. Its use of color balances variety and unity.

Table 3. Precision of the building mapping method.

Mapping Method	Correct Mappings	Incorrect Mappings
Monographic Mapping	419	81
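If the two columns of Table 3 are read as counts of correctly and incorrectly mapped buildings, the mapping precision they imply is:

```latex
\mathrm{Precision}_{\mathrm{map}} = \frac{419}{419 + 81} = \frac{419}{500} = 0.838
```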

Table 4. Detection accuracy (AP) of each architectural style after model training, grouped by AP range.

Category	AP	Range
None	–	Q4 (0–25%)
Art Deco	0.4128	Q3 (25–50%)
New Chinese	0.4361	Q3 (25–50%)
French Classicism	0.4385	Q3 (25–50%)
Baroque	0.4828	Q3 (25–50%)
Expressionism	0.5379	Q2 (50–75%)
Han	0.5717	Q2 (50–75%)
Gothic	0.6316	Q2 (50–75%)
Western-style	0.6539	Q2 (50–75%)
Chu	0.6923	Q2 (50–75%)
Su	0.7219	Q2 (50–75%)
Ancient Rome	0.7336	Q2 (50–75%)
Yuan	0.7361	Q2 (50–75%)
Ancient Greece	0.7697	Q1 (75–100%)
Ming	0.7852	Q1 (75–100%)
Folk Residence	0.7861	Q1 (75–100%)
Qing	0.7959	Q1 (75–100%)
Functionalism	0.8412	Q1 (75–100%)
Hui	0.8437	Q1 (75–100%)
Song	0.8673	Q1 (75–100%)
Jing Residence	0.8835	Q1 (75–100%)
Byzantium	0.8921	Q1 (75–100%)
Tang	0.8943	Q1 (75–100%)
mAP	0.7003
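As a consistency check on Table 4: mAP is conventionally the unweighted mean of per-class AP. Averaging the 22 AP values reproduces the reported 0.7003 to within rounding (about 0.7004). The sketch also assigns each class to the quartile ranges used in the table. The bucket boundaries (lower bound inclusive) and the function names are assumptions.

```python
from collections import Counter
from typing import Dict

# AP values transcribed from Table 4.
AP: Dict[str, float] = {
    "Art Deco": 0.4128, "New Chinese": 0.4361, "French Classicism": 0.4385,
    "Baroque": 0.4828, "Expressionism": 0.5379, "Han": 0.5717,
    "Gothic": 0.6316, "Western-style": 0.6539, "Chu": 0.6923,
    "Su": 0.7219, "Ancient Rome": 0.7336, "Yuan": 0.7361,
    "Ancient Greece": 0.7697, "Ming": 0.7852, "Folk Residence": 0.7861,
    "Qing": 0.7959, "Functionalism": 0.8412, "Hui": 0.8437,
    "Song": 0.8673, "Jing Residence": 0.8835, "Byzantium": 0.8921,
    "Tang": 0.8943,
}


def mean_ap(ap: Dict[str, float]) -> float:
    """mAP as the unweighted mean of per-class AP."""
    return sum(ap.values()) / len(ap)


def ap_range(value: float) -> str:
    """Quartile bucket as used in Table 4 (lower bounds assumed inclusive)."""
    if value >= 0.75:
        return "Q1 (75-100%)"
    if value >= 0.50:
        return "Q2 (50-75%)"
    if value >= 0.25:
        return "Q3 (25-50%)"
    return "Q4 (0-25%)"


if __name__ == "__main__":
    print(f"mAP = {mean_ap(AP):.4f}")  # mAP = 0.7004
    print(Counter(ap_range(v) for v in AP.values()))
```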

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
