Method for Constructing a Façade Dataset through Deep Learning-Based Automatic Image Labeling

Abstract: The construction industry has made great strides in recent decades by utilizing computer programs, including computer-aided design (CAD) software. However, compared to the manufacturing sector, its labor productivity is low because of the high proportion of knowledge-based and simple repetitive tasks. Knowledge-based task efficiency should therefore be improved through the visual recognition of information by computers. To recognize visual information, a computer requires a large amount of training data, such as that of the ImageNet project. As part of constructing image datasets for visual recognition by computers in construction, this paper proposes façade datasets that are efficiently constructed by quickly collecting façade data from road-view images generated by web portals and automatically labeling them using deep learning. We attempted to automatically label façade images to quickly generate large-scale façade datasets with much less effort than existing research methods. We also constructed datasets for a part of Dongseong-ro, Daegu Metropolitan City, and analyzed their utility and reliability. It was confirmed that the computer could extract significant façade information from road-view images by recognizing the visual information of façade images. In addition, we verified the characteristics of the building construction image datasets. This study suggests the possibility of securing quantitative and qualitative façade design knowledge by extracting façade design information from façades anywhere in the world. Previous studies mainly collected façade images through camera photography to construct databases; in this study, a significant part of the database construction process was shortened through automation. Studies on automatic façade image labeling have primarily addressed façade-based automatic 3D modeling, and it is difficult to find a study that extracts data for façade design research.


Introduction
The current construction industry has made great progress through the accumulation of various computer programs and construction data, including computer-aided design (CAD) [1]. However, compared with the manufacturing sector, labor productivity in the construction industry is low [1,2]. To increase productivity, artificial intelligence (AI) must be used so that intelligent computer systems can take on areas of architecture that are considered human tasks. However, the level of AI research and technology development in the field of architectural design only reaches optimizing and streamlining tasks in the design-construction phase or automating some simple architectural designs. The development of AI-related technologies based on design uniqueness is insufficient [3].
In this study, we determined that large-scale façade datasets can be constructed with considerably less effort than by existing methods through the application of AI. One of the most important causes of the low level of AI research in the field of architecture is the lack of large-scale, construction-specialized image datasets.

Background and Related Work
For a computer to be able to recognize visual information at the same level as humans, it is necessary to construct and learn the same amount of data that humans have accumulated over the years through their eyes. To this end, Fei-Fei (2009) built an image dataset called the ImageNet project, for which approximately 1 billion images were labeled by 50,000 people in 167 countries. This provided the computer with a basis for recognizing visual information [11]. However, although the ImageNet project covers a vast array of general objects and information, it contains only a limited number of architectural objects; it is therefore difficult to apply in the construction field. The construction of a construction-specialized image dataset that labels construction-related objects with construction information is an essential early step. Hence, it is necessary to build big data from the various mixed data formats in the construction field, such as drawings, images, and text, and convert these into a construction dataset for use in construction AI research. Accordingly, this section examines construction dataset cases and studies related to façade labeling, and describes the research gap between previous studies and this study.

1. Park (2006) built a building façade database manually to manage and provide the colors of façades. Using this database, façade information was calculated, and the color characteristics of each façade composition type were summarized and analyzed.
2. Park (2007) manually built a building façade database to provide quantified data. Using this database, façade information was calculated to interpret the characteristics of the current color range of existing street-side building façades, and the color range for each street was presented through the setting of a color evaluation model.
As described above, to understand the characteristics of the façade, the existing research in Table 1 mainly collected façade images by photographing buildings directly with a camera, calculated information manually, and accumulated the data to establish a database. However, this method is costly and time-consuming when constructing a building façade database larger than the block unit; therefore, it is necessary to reduce the effort of part or all of the existing database construction process through automation. Accordingly, we examine research related to façade labeling, a way to shorten the existing database construction process.

1. Teboul, Simon, Koutsourakis, and Paragios improved façade image labeling performance by using shape grammar, random walks, MAP classification, and machine learning methods to improve urban understanding and façade image-based 3D modeling. In particular, shape grammar made it possible to obtain a right-angled label image, and machine learning improved the labeling performance [12].
2. Martinović (2012) used three layers to improve building façade labeling performance. Each layer was algorithmized using (1) an RNN, a machine learning technique; (2) enhanced labeling to improve the recognition of façade components; and (3) structural knowledge to make the label image structurally valid [13].
3. Riemenschneider, Krispel, Thaller, Donoser, Havemann, Fellner, and Bischof improved façade labeling performance by combining low-level classification models with medium-level object recognition models to compensate for the complexity of labeling building façades with existing shape grammars [14].
4. Jampani, Gadde, and Gehler improved the efficiency of façade segmentation using an auto-context approach (Table 2).
As such, previous studies on architectural elements using computer vision technology were mainly conducted to construct spatial information indoors and outdoors. In building façade automatic labeling research, the main areas of study in the field of computer vision were the automatic generation of 3D models based on the façade and the improvement of façade labeling performance; it is difficult to find a case of data collection for architectural design studies. The studies in Table 2 used the CMP database as training data, and these studies recorded the highest performance. This shows that the CMP database satisfies the prerequisite for improving building façade automatic labeling performance: an accurate and large amount of training data.

Table 2. Automatic labeling of the façade based on machine learning.
2. Teboul, O., "Segmentation of building façades using procedural shape priors" (2010)
3. Martinović, A., "A Three-Layered Approach to Façade Parsing" (2012)
4. Riemenschneider, H., "Irregular lattices for complex shape grammar façade parsing" (2012)
5. Jampani, V., "Efficient Façade Segmentation Using Auto-context" (2015)
6. Jampani, V., "Efficient 2D and 3D Façade Segmentation Using Auto-Context" (2018)

Image processing methods include image classification, localization/detection, and image segmentation, as shown in Table 3 [17].

Table 3. Automatic labeling technology analysis.

- Classification: predicts the class of the input image.
- Localization/detection: predicts the class and location of objects in the image.
- Segmentation: predicts a class for every pixel of the image.
In calculating the façade information, the prediction of multiple labels and the area of each object is necessary. As shown in Figure 1, semantic segmentation is the most suitable for automatic labeling of façade images, which requires information such as the area and number of instances of the same class. Semantic segmentation classifies each pixel according to a specified class and does not distinguish between objects of the same class, whereas instance segmentation distinguishes each object individually.
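Because a semantic segmentation map assigns a class to every pixel, per-class areas fall out directly by counting pixels. A minimal sketch (the class IDs and the toy label map are illustrative, not from the study's dataset):

```python
import numpy as np

# Hypothetical class IDs for a semantic segmentation output
CLASSES = {0: "background", 1: "wall", 2: "window", 3: "door"}

def class_areas(label_map: np.ndarray) -> dict:
    """Return the pixel area of each class in a semantic label map."""
    ids, counts = np.unique(label_map, return_counts=True)
    return {CLASSES.get(int(i), f"class_{i}"): int(c) for i, c in zip(ids, counts)}

# Toy 4x4 label map: 10 wall pixels, 4 window pixels, 2 door pixels
label_map = np.array([
    [1, 1, 1, 1],
    [1, 2, 2, 1],
    [1, 2, 2, 1],
    [3, 3, 1, 1],
])
print(class_areas(label_map))  # {'wall': 10, 'window': 4, 'door': 2}
```

Because pixels of the same class are merged, this yields class areas but not per-building counts, which is exactly the semantic (rather than instance) behavior described above.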


Research Gap
Previous studies mainly collected façade images through camera photography to construct databases; in this study, a significant part of the database construction process is shortened through automation. Studies on automatic façade image labeling have mainly addressed façade-based automatic 3D modeling, and it is difficult to find a study that extracts data for façade design research.

Research Direction
In this study, we attempted to automatically label façade images to quickly generate large-scale façade datasets with much less effort than by the existing research methods. Therefore, the main direction of this study is to utilize deep learning technology to accurately auto-label non-quantified street façade images existing in road-view images generated from web portals such as Google Street View [9] and Naver Street View [10]. It is also necessary to verify that the required data can be obtained using the façade database to identify the characteristics of the building design on the street.
The direction of this research, shown in Figure 2, was as follows:
1. Definition of Façade Dataset Information: defining the required information in a dataset based on the existing characteristics of the façade.
2. Acquisition Method of Dataset Sources: selection of the source of the dataset from road-view images and image distortion-correction technology.
3. Conversion Method of Dataset Sources: for the selection of converted datasets using deep learning model-based automatic labeling technology, the road-view images were corrected using image distortion-correction technology and converted into computer-recognizable datasets. The deep learning method included SegNet model training data acquisition using the CMP database [18], training data optimization, and securing a façade automatic labeling algorithm based on the SegNet model [19].
4. Dataset Acquisition: the dataset was secured according to the conversion and acquisition methods.
5. Reliability Analysis of Dataset: a reliability analysis was conducted by comparing the façade information obtained with the ground truth for the four categories of façade information that can be calculated from the datasets (dominant color, secondary color, number of floors, and number of buildings).
6. Verify Efficiency and Application of Dataset: validation of façade dataset efficiency based on the reliability analysis, and derivation of a means to utilize the information that can be calculated from the datasets.
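The reliability analysis in step 5 can be sketched as a per-category comparison of calculated façade information against ground truth. The field names and sample records below are illustrative, not the study's actual data:

```python
# Four categories of façade information compared against ground truth
CATEGORIES = ["dominant_color", "secondary_color", "num_floors", "num_buildings"]

def agreement_rates(predicted: list, ground_truth: list) -> dict:
    """Per-category fraction of façades where the prediction matches ground truth."""
    rates = {}
    for cat in CATEGORIES:
        matches = sum(p[cat] == g[cat] for p, g in zip(predicted, ground_truth))
        rates[cat] = matches / len(ground_truth)
    return rates

# Two made-up façades: predictions vs. manually labeled ground truth
pred = [
    {"dominant_color": "gray", "secondary_color": "white", "num_floors": 3, "num_buildings": 2},
    {"dominant_color": "brown", "secondary_color": "gray", "num_floors": 5, "num_buildings": 1},
]
truth = [
    {"dominant_color": "gray", "secondary_color": "beige", "num_floors": 3, "num_buildings": 2},
    {"dominant_color": "brown", "secondary_color": "gray", "num_floors": 5, "num_buildings": 1},
]
print(agreement_rates(pred, truth))
# {'dominant_color': 1.0, 'secondary_color': 0.5, 'num_floors': 1.0, 'num_buildings': 1.0}
```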

SegNet Model
Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla investigated deep learning-based automatic image labeling using semantic segmentation and concluded that, among other models, the SegNet model [19] had higher performance in terms of accuracy. Models primarily used for classification, such as VGG and AlexNet, maintain layers that reduce the number and dimension of parameters and consequently lose some location information. However, the SegNet model efficiently preserves location information using an encoder-decoder structure. As shown in Figure 3, the encoder-decoder structure outputs the image compressed through the encoder and labels each pixel according to the designated class [19]. Figure 4 shows the input and output labels according to the specified class.

To implement automatic labeling of images based on the SegNet model, a high level of programming knowledge is required, but various frameworks, such as TensorFlow, PyTorch, and Caffe, are available to make this easier. Among these, TensorFlow is the most frequently used framework in image processing. In this study, we implemented an automatic image-labeling algorithm based on the SegNet model using TensorFlow.
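The reason SegNet preserves location information is that each encoder max-pooling layer records the positions of its maxima, and the corresponding decoder layer uses those stored indices to place feature values back at their original coordinates (unpooling). A framework-free numpy sketch of this 2×2 pooling/unpooling mechanism (toy values, not the full network):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling that also records the argmax position in each window,
    as SegNet's encoder does."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    indices = np.zeros((h // 2, w // 2), dtype=int)  # flat index inside each window
    for i in range(h // 2):
        for j in range(w // 2):
            window = x[2*i:2*i+2, 2*j:2*j+2]
            pooled[i, j] = window.max()
            indices[i, j] = window.argmax()
    return pooled, indices

def unpool_2x2(pooled, indices):
    """SegNet-style unpooling: restore each value to its recorded max position."""
    h, w = pooled.shape
    out = np.zeros((h * 2, w * 2))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(indices[i, j], 2)  # unravel the stored window index
            out[2*i + di, 2*j + dj] = pooled[i, j]
    return out

x = np.array([[1., 9., 2., 3.],
              [4., 5., 8., 6.],
              [7., 0., 1., 2.],
              [3., 6., 5., 4.]])
pooled, idx = max_pool_2x2(x)
restored = unpool_2x2(pooled, idx)  # maxima return to their original coordinates
```

A classifier-style network would discard `idx` and lose the positions; keeping the indices is what lets the decoder reconstruct spatially accurate per-pixel labels.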

CMP Database
The greater the amount of correctly labeled data, the better the learning results. As shown in Table 2, previous studies on machine learning-based automatic façade labeling used the CMP database, which possesses a relatively large amount of data and contains information on façades, as training data. Therefore, we also used the CMP database as training data to obtain an accurate and large number of datasets in this study.


Framework
The ultimate purpose of this study was to propose a construction method that vastly reduces the build-up period for the construction of façade datasets by applying artificial intelligence technologies such as deep learning. The framework of the proposed construction method is presented in Figure 5 and consists of five modules.

Module 1: Raw Data Collection
In this study, façade images extracted from road-view images were used as raw data for the construction of the datasets. This is because images of buildings worldwide, including domestic ones, are being updated regularly, and most façade images overlooking the street can be secured. However, because the road-view images were obtained through a 360° camera, perspective distortion occurred, as shown in Figures 6 and 7. A correction method must therefore be devised [9,10].
Figure 5. Framework of the proposed construction method.


Module 2: Image Distortion Correction
To use the portal site road view as the façade data for buildings, there are two methods: labeling after performing distortion correction, as shown in Figure 6, and labeling directly in a distorted image state. In the latter case, there is the advantage of not having to go through the process of photo distortion correction, but there are disadvantages, such as the degree of distortion of the façade image being different from the road-view images. Hence, the amount of training data needs to be increased. In this study, as a result of testing the labeling accuracy of both methods, the accuracy of the former was higher. Therefore, the former method of undergoing a distortion correction process was used [10].
Image distortion correction requires an understanding of image processing and the overall camera system, but various tools have been provided to make it easier to process images.
Adobe's Lightroom [21] is the most commonly used tool for correcting image distortion. In addition, an automatic distortion-correction function is provided that reduces the processing time. In this study, the image distortion correction process was performed using Lightroom.
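Conceptually, distortion correction is a coordinate remapping. The sketch below applies a one-term radial model as an illustration of the principle only; Lightroom and similar tools use their own calibrated lens profiles, and the coefficient and points here are made up:

```python
import numpy as np

def undistort_points(points, k1, center):
    """Map distorted pixel coordinates toward their undistorted positions using
    a one-term radial model: p_u = c + (p_d - c) * (1 + k1 * r^2).
    Illustrative only; real tools use calibrated, multi-term lens profiles."""
    pts = np.asarray(points, dtype=float) - center
    r2 = (pts ** 2).sum(axis=1, keepdims=True)  # squared distance from center
    return pts * (1.0 + k1 * r2) + center

center = np.array([100.0, 100.0])  # assumed optical center
pts = [[100.0, 100.0], [110.0, 100.0], [100.0, 120.0]]
print(undistort_points(pts, k1=1e-4, center=center))
```

Points farther from the center are displaced more, which matches the visual effect of correcting a wide-angle road-view frame: edges stretch outward while the center stays put.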

Module 3: Training Data Selection
In this study, a database containing both façade information and the corresponding labeling images was used as training data. As shown in Table 4, the façade information of a building was limited to the dominant color, secondary color, number of floors, and number of buildings [22].

Table 4. Definition of façade information.
- Dominant color: the largest color in the façade.
- Secondary color: the second largest color in the façade.
- Number of floors: single-story building (1 floor); low-rise building (2-3 floors); medium-rise building (4-5 floors); high-rise building (6 or more floors).
- Number of buildings: the number of buildings facing the street in one horizontal block.

The CMP database [23] consists of the 12 most basic façade classes (wall, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, and background) with approximately 600 labeled data inputs. In this study, excluding the classes with little influence on calculating the façade information, similar classes were combined and re-designated into four classes, as shown in Table 5 [18].
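The re-designation of the 12 CMP classes into four can be implemented as a lookup-table remap over the label image. The grouping below is an assumed illustration; the study's actual scheme is given in Table 5:

```python
import numpy as np

# The 12 basic CMP classes, in an assumed id order
CMP_CLASSES = ["wall", "molding", "cornice", "pillar", "window", "door",
               "sill", "blind", "balcony", "shop", "deco", "background"]

# Assumed grouping for illustration -- the study's actual scheme is in Table 5
REGROUP = {
    "wall": 0, "molding": 0, "cornice": 0, "pillar": 0, "deco": 0,
    "window": 1, "sill": 1, "blind": 1,
    "door": 2, "shop": 2, "balcony": 2,
    "background": 3,
}

# Lookup table indexed by the original 12 class ids
lut = np.array([REGROUP[name] for name in CMP_CLASSES])

label_map = np.array([[0, 4, 4], [5, 9, 11]])  # toy 12-class label image
remapped = lut[label_map]                      # vectorized remap to 4 classes
print(remapped)  # [[0 1 1] [2 2 3]]
```

Because the remap is a single fancy-indexing operation, it runs in one pass over the whole CMP database regardless of image size.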

Module 4: Training Data Optimization
CMP databases with the re-designated classes require preprocessing to ensure optimal learning in the SegNet model. The preprocessing consists of three operations [19]:
1. Resizing, to meet the input conditions of the SegNet model, as shown in Figure 8;
2. Cropping, to remove elements that impair the quality of the image and to make the image similar to the actual environment;
3. Augmentation, to increase the effectiveness of learning by enlarging the training data, e.g., by horizontally flipping the image or adjusting the gamma value.
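The flip and gamma augmentations can be sketched in a few lines of numpy (pixel values assumed normalized to [0, 1]):

```python
import numpy as np

def augment(image: np.ndarray, gamma: float = 1.2):
    """Generate two simple augmentations: a horizontal flip and a
    gamma-adjusted copy. `image` holds float pixel values in [0, 1]."""
    flipped = image[:, ::-1]    # mirror left-right
    gamma_adj = image ** gamma  # gamma correction on normalized values
    return flipped, gamma_adj

img = np.array([[0.0, 1.0], [0.25, 0.5]])
flipped, gamma_adj = augment(img, gamma=2.0)
print(flipped)    # [[1.0, 0.0], [0.5, 0.25]]
print(gamma_adj)  # [[0.0, 1.0], [0.0625, 0.25]]
```

Note that when an image is flipped horizontally, its label map must be flipped identically so that image and label stay aligned; the gamma adjustment leaves the label map unchanged.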


Module 5: Test Data Selection
In this study, a part of Dongseong-ro, Daegu Metropolitan City, a central commercial street, was used as the test data. The building façade information was limited to buildings with low noise; elements such as street trees, which generate a high level of noise, were excluded.
The selected building images were collected according to the raw data collection method, and the collected images were distortion-corrected to obtain the test data. Because the test data were also needed to calculate the optimal information from the SegNet-based automatic façade image labeling algorithm, the resize and crop tasks from the preprocessing of the training data were also performed. In addition, a ground-truth labeling image was required to measure the labeling accuracy of the test images.

Façade Information Calculation Criteria
Using the labeling image obtained in the test data selection, the building façade information was calculated as follows in Table 6:

1. Dominant color: the label areas of the same class are combined, and the color of the largest class in the corresponding façade image is calculated as the dominant color.

2. Secondary color: the label areas of the same class are combined, and the color of the second-largest class in the corresponding façade image is calculated as the secondary color.

3. Number of floors.

4. Number of buildings.
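The dominant and secondary color criteria can be sketched as follows. This is a minimal numpy illustration under stated assumptions: the class ids, the skipped background id, and the use of a per-class mean color are all assumptions, not the authors' exact procedure.

```python
import numpy as np

def dominant_colors(rgb: np.ndarray, labels: np.ndarray, ignore=(0,)):
    """Combine the label areas of each class and return the mean colors of
    the largest and second-largest classes (dominant, secondary).
    `labels` holds one class id per pixel; ids in `ignore`
    (assumed background id 0) are skipped."""
    classes, counts = np.unique(labels, return_counts=True)
    order = [i for i in np.argsort(counts)[::-1] if classes[i] not in ignore]
    out = []
    for i in order[:2]:                     # largest and second-largest class
        mask = labels == classes[i]
        out.append(tuple(int(v) for v in rgb[mask].mean(axis=0).round()))
    return out                              # [dominant color, secondary color]

# Toy example: left part wall (class 1, grey), right part window (class 2, blue)
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[:, :3] = (128, 128, 128)                # wall pixels
rgb[:, 3:] = (30, 60, 200)                  # window pixels
labels = np.zeros((4, 4), dtype=int)
labels[:, :3] = 1
labels[:, 3:] = 2
colors = dominant_colors(rgb, labels)       # wall color first, window second
```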

Training Data Output and Analysis
In the CMP database, the SegNet model-based automatic façade image-labeling algorithm was tested on images that were not used for learning. Table 7 shows the labeling image and output. Tables 8-10 show the test results.
A total of 185 labeling tests were conducted using 37 test images, and the results were analyzed. The time required for automatic labeling of the 37 images was 85.47 s, i.e., each image was labeled in 2.31 s.
The mIoU in Table 8 is an index of the labeling accuracy for each class, obtained by comparing the ground truth image with the extracted labeling image; it is 1 when the accuracy is 100%. It therefore indicates the reliability of the number and location of the objects in each class. Walls and backgrounds were labeled most successfully, with mIoU values of 0.70 or higher. Windows, with an mIoU of 0.66, showed relatively acceptable accuracy, whereas entrances, with an mIoU of 0.51, showed relatively low accuracy compared with the other classes. The average mIoU over all classes was 0.63.
The pixel accuracy listed in Table 9 is an index of the similarity between two images, obtained by comparing all pixels of the ground truth image with those of the extracted labeling image; it thus indicates the reliability of the entire area of the extracted labeling image. The average pixel accuracy of the CMP test data was 0.77.
The accuracy of the façade information listed in Table 10 is an index of the accuracy of the information calculated from the output labeling image, obtained by comparing the output labeling image with the ground truth image. The façade information accuracy was 0.84, which was higher than both the pixel accuracy and the mIoU of the CMP database. This is because the class mIoU of the walls and windows, which occupy an overwhelming proportion of the image area and therefore determine the dominant and secondary colors, was high.
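The two evaluation indices can be written down concretely. This is a generic numpy sketch of pixel accuracy and mean IoU over a per-pixel label map, not the authors' exact implementation.

```python
import numpy as np

def pixel_accuracy(gt: np.ndarray, pred: np.ndarray) -> float:
    """Fraction of pixels where the prediction matches the ground truth."""
    return float((gt == pred).mean())

def mean_iou(gt: np.ndarray, pred: np.ndarray, n_classes: int) -> float:
    """Average per-class intersection-over-union; 1.0 is a perfect match."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(gt == c, pred == c).sum()
        union = np.logical_or(gt == c, pred == c).sum()
        if union:                           # skip classes absent from both images
            ious.append(inter / union)
    return float(np.mean(ious))

# Tiny 2x4 example with two classes
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1]])
pred = np.array([[0, 0, 1, 0],
                 [0, 0, 1, 1]])
print(pixel_accuracy(gt, pred))   # 7/8 = 0.875
```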

Test Data Output and Analysis
The test was conducted on the façade images (raw data) extracted from the road-view images, and the results were analyzed. The test data were intended to determine whether the pixel accuracy, mIoU, and façade information accuracy could be maintained on data whose characteristics differed from the training data, and how they could be improved through distortion correction. Thirty façade images extracted from the original road views (referred to as the original data) and 30 façade images extracted from the distortion-corrected road views (referred to as the distortion-corrected data) were used as test data.
As shown in Tables 11 and 12, when the original data and the distortion-corrected data were compared, the average mIoU and pixel accuracy of the distortion-corrected data were higher by 0.05 and 0.09, respectively. In addition, as shown in Table 13, the façade information accuracy was also 0.07 higher than that of the original data. These results show that, compared with the original data, the distortion-corrected data can be labeled effectively using the CMP database. We therefore took the distortion-corrected data, which showed the higher façade information accuracy, as representative of the façade images extracted from road views and compared them with the CMP database. As shown in Tables 14 and 15, the average mIoU and pixel accuracy of the distortion-corrected data were lower than those of the CMP database by 0.09 and 0.02, respectively.
Table 16 shows the façade information accuracy obtained from the output and ground truth images. The façade information accuracy of the distortion-corrected data was 0.09 lower than that of the CMP database in the dominant color, but 0.07, 0.04, and 0.09 higher in the secondary color, number of floors, and number of buildings, respectively, and the average façade information accuracy was 0.03 higher. This result differs somewhat from the mIoU and pixel accuracy values and is assumed to stem from the façade characteristics of Dongseong-ro, which are as follows.

1. The accuracy of the secondary color calculation was high because the window class, which has a relatively high mIoU, occupies a large proportion of the façade area, so the color of the windows is often calculated as the secondary color.

2. Compared with the CMP database, the windows have clear shapes and are designed floor by floor; therefore, the accuracy of the number-of-floors calculation was high.

3. The boundaries between buildings were ambiguous in the CMP database but clear in the test data. Therefore, despite the higher mIoU of the wall and background classes in the CMP database, the accuracy of the number-of-buildings calculation was higher for the test data.

Discussion of the Results
The distortion-corrected data scored higher than the original data on all indicators. This means that if the raw data image is distorted, the façade information accuracy can be improved through distortion correction. It is also clear that, although mIoU and pixel accuracy affect the façade information accuracy, the façade information accuracy varies with the characteristics of the data. The factors confirmed to affect it include the clarity of the shapes of the classes appearing in each image, the clarity of the floor-by-floor boundaries of the window class, and the clarity of the boundaries between buildings. Therefore, to increase the façade information accuracy, detailed learning criteria for mIoU and pixel accuracy must be established, and training data must be secured that raise the mIoU and pixel accuracy according to the characteristics of the data.

Limitations
Several limitations are acknowledged in this study. First, automatic labeling was difficult in cases of high noise, such as street trees and streetlights. To address this, we are studying ways to increase the façade information accuracy by using NVIDIA Image Inpainting [24], a technology that automatically removes noise, and by securing high-quality images.
Second, the SegNet model exhibited low-quality learning because only 600 images of the CMP database were used as training data. To address this, once the labeling accuracy reaches a certain level, we will build large-scale databases from the road-view images.
These issues partly reflect general limitations of the deep learning approach, which requires careful data collection and preparation.

Practical Implications
Researchers and the government can use automatic façade labeling-based façade recognition to extract the window ratio at a specific distance from the road-view images, thereby providing a basis for using the window ratio as an indicator of energy consumption. This will be comparable to existing window ratio studies and government regulations. The architectural industry will be able to collect façade images automatically and use them for modeling, raising the level of architectural services by promoting the digital transformation of architecture.
Through digital transformation, this study will contribute to research that, conversely, extracts exterior building information from city-scale architecture digitized on 3D maps [25]. Furthermore, the deep learning-based window recognition methodology for road-view images developed in this study will contribute to a novel automated method for simulating wind damage to building windows at the city scale [26].
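The window-ratio extraction mentioned above could be sketched as follows. The class ids are hypothetical, and defining the ratio as window area over wall-plus-window area is an assumption for illustration.

```python
import numpy as np

WALL, WINDOW = 1, 2                     # hypothetical class ids

def window_ratio(labels: np.ndarray) -> float:
    """Window area divided by the total facade (wall + window) area,
    computed from a per-pixel class label map."""
    window = (labels == WINDOW).sum()
    wall = (labels == WALL).sum()
    return float(window / (window + wall))

# Toy label map: 2 window pixels out of 8 facade pixels
labels = np.array([[1, 1, 2, 2],
                   [1, 1, 1, 1]])
print(window_ratio(labels))  # 2 / 8 = 0.25
```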

Directions for Future Work
In this study, we proposed the possibility of obtaining quantitative and qualitative façade design information by extracting façade design patterns from anywhere in the world. In the future, based on improved labeling accuracy, we will build datasets of automatically labeled images from road-view images for research on automated color collection and on estimating energy consumption according to the window area ratio [1,22,27-29].

Conclusions
This study proposed a method for efficiently constructing façade datasets through the process of collecting façade data from road-view images and automatically labeling them using deep learning. The goal of this study was to automatically label façade images to quickly generate large-scale façade datasets with much less effort than the existing research methods. We tested the possibility of calculating the data required by the façade database to comprehend the design characteristics of the street.
The semantic segmentation method was used for automatic image labeling, with a SegNet model offering high accuracy. The labeling subjects in this study were walls, windows, entrances, and background, and the façade information calculated comprised the dominant color, secondary color, number of floors, and number of buildings. Six hundred images of CMP database-based training data were used to train the SegNet model-based automatic image-labeling algorithm, and 30 road-view images were used to test it. Automatic labeling of the test images proceeded at one image per 2.31 s, and the automatic labeling and façade information calculation achieved an mIoU of 0.54, a pixel accuracy of 0.75, and a façade information accuracy of 0.87.
This study confirmed that the computer could recognize the visual information of the façade image and calculate meaningful façade information from the road-view images at high speed. In addition, the automatic labeling results and the calculated façade information were analyzed using the façade information calculation method proposed in this study. Detailed criteria for the required mIoU and pixel accuracy values, based on the characteristics of the data, will be prepared and applied in further studies to increase the façade information accuracy.

Conflicts of Interest:
The authors declare that they have no known competing financial interest or personal relationships that could have influenced the work reported in this study.